This content was uploaded by our users and we assume good faith they have the permission to share this book. If you own the copyright to this book and it is wrongfully on our website, we offer a simple DMCA procedure to remove your content from our site. Start by pressing the button below!
{w,u) is u in the real world w. Let O be a weighting function, Q : W— [0, 1] such that V £2. =\. The model of qualitative fuzzy logic with the weighting function is a quaternion M=< W, R, I, Q>, where W is a non-empty set of possible worlds; J? is a binary equivalence relation on W and O is the weight of possible world. Let / be a mapping: I:W* {
(w„ ui) is a formula;
105 (2) If G/, Cw"are qualitative fuzzy logic formulae then ~C W ', G/VG/', G/A G/', Gv,-*Gv" and Gv'<-> Cw" are qualitative fuzzy logic formulae. (3) If Cw(x) is a formula, * is a free variable of Cjx), then (Vx) (Gv(x)), (3x) (Cw(x)) are formulae. (4) All the propositional formulae are generated by applying the above rules with finite times. Remark 2.1 If there is one element in the possible world and « e [0, 1], then the WQFL proposition is a classical fuzzy proposition. So the WQFL is an extension of the classical fuzzy logic. Let 4> (w\ it1) and V (w", u") be atom formulae in WQFL where w'e [w"] R. The valuation of the logical connectives is defined as follows, for a given w e W: (1) V£~ 4>(w',u'))=l-u' (2) FA <*>««')V w(w",u"))=max{u',u"} (3) V£
" can be defined by "~," "V," "A." Definition 2.4 For a given equivalence relation R in W, the truth value of a WQFL formulae Gv is defined as follows:
Let Cw' and Dw»be two formulae of WQFL where w' e [w"] R. The valuation of the logical connectives is defined as follows, for a given wG W; S is the set of individual symbols: (4) FK-CVH-FAC,/); (5) FXClvVDlv..)=max{ P^CV), V,(DW.)}; (6) K/(Cl/AAv..)=min{ FXCW.), FXZV)}; (7) F A C v ' - A ^ FA-CVVZV); (8) ViCw <-> ZV)= Vi{Cw^Dw.) A (£)W.^CW.)); (9) KX( Vx)(C„M)=inf { ^ ( C J x ) ) } ; (10) V,((3x) (Cw(x))=sup{Vl(Cw(x))}
•
xeS
Theorem 2.1 Let
(2)EWtA(CWiADnj)
V g V D ;
CWVDW=DWJACWI;
= (EWtACWi)ADWi,EWiV(CwyDWi)
= (EWk
106 (3) Kk
V (CWIADWJ)-(EWIVCWI)A
(Ewt
V
DW/),
EWt A ( C „ , V ^ H £ , A C W i ) V (E„t ADWj);
(4)cWiAcwrcWi,cwycWi=cWi; (5)CWiAT=CWi,Cwy?=CWi; (6)(CWV
DWj) ACWi =
V)~~CWI
CWi,(CWiADWj)VCwrCWi;
= CWI;
( 8 ) ~ ( C , A ^ ) = ~ C „ V~D W > , ~ ( C „ V D „ ; ) = ~ C „ A~/>„,; (9) (VxXQ, (*)) A (Vx)(DWj (x)) = (Vx)(CWj (x) A DWj (x)); (10) (Bx)(CWi (x)) v (3xXZ\ (x)) = (3x)(CWi (x) v Z^ (x)) ; (11) (VxXC, (x)) v (VxXA., (*)) = (VXXVJO(CW, (*) A DW/ Cv)); (12) pxXQ, (x)) A (3x)(D^ to) = (3x)(3j)(Q: (X) A £>„y (y» Definition 2.5 Let C„, be a PFgFZ. formula and let a e [0, 1]. If there exist Tfla ^
a
f° r a ^ w i e M «
m a
possible world w, then the WQFL formula Cw
is called a -fuzzy true in the possible world [w]R , denoted it by (w, a )-true; Whereas, if there exist STQU ^ a ^or a " w i G H « m t n e possible world w then the WQFL formula Cw is callled a -fuzzy false in the possible world [w]R, denoted as (w, a )-false. The WQFL formula Cw is called a -identically true if for all possible worlds [wi]R in Wthere exist V Q . U . >
a
.
i
The WQFL formula Cw is a -identically false if for all possible worlds [w{]R in Wthere exist V Q . M . ^
a
•
i
Definition 2.6 Two WQFL formulae C n
are said to be equal if w, e
n
[wy]« and V Q M = y1 Q i
and D
U
for all interpretations /, denoted as Cw = Dw .
J
Definition 2.7 R is an equivalence relation on W, w^W, (f(£/)//?={Cw|w,e [wfo.i'e {1,2,...,«}. From above C„, n C„, = ^ and U c^ = (F(U) hold. So
107 (l)If the possible world w £ W is considered a real number on [0,1] and denoted the qualitative fuzzy proposition P(w,u) by wP(u) then the operator fuzzy logic system can be held1'21; n
(2)Assume V[>]. = 1 and w, 5=0 then V(CW)= V[w]. x «. hence the weighting fuzzy logic holds; i=l
(3)Assume fF=[0,l] and w+w^l, so intuitionistic fuzzy logic system can be held where w and u represent true degree and false degree of intuitionistic fuzzy proposition respectively1131. Therefore WQFL is a generalization of classical fuzzy logic and it is more flexible in reasoning. 3. A resolution method Since the resolution principle of classical logic is presented Robinson'141, many methods of fuzzy logic resolution have been discussed'151, such as the resolution method of operator fuzzy logic and the resolution method of intuitionistic fuzzy logic'161 and so on. We will extend the method of fuzzy logic resolution to qualitative fuzzy logic. Definition 3.1 Let
0 has solutions in the interval: (2 — y/E, 2 + i/5), therefore, for c e (0,1], y?(c) — 3 > 0, that is, ai > 3. In addition, inequality
/3) U (4 + 2\/3, +00). Therefore, for c € (0,4 - 2\/3), y(c) - 4 < 0, that is, a\ < 4. - - > e m accordingly. Therefore, Z) is the combination of those variables that explains the greatest amount of variation, and the m-dimensional hyper-plane is the m-dimensional subspace which retains the maximum amount of information about the input data set [9, 10]. The steps involved in PCA are as follows: (1) Normalize the original data set to avoid the much variance of variables and the difference of the measurement units. *j=(* s -*/) / C T ', (' = l,2,"-,/?;y' = l,2,--,n) where xtj is the original data of the /* variable and/ h class, and xt and ai are the sample mean and standard deviation of j * index, respectively. (2) According to the standardized data set (**) , calculate the correlation matrix R = (riJ)pxp, where n *=i
135 Define 0(c) = | (2(c + 4) - Vc 2 + 8c + 4). Inequality 0(c) - 3 > 0 has solutions in the set: (—oo, 2 - \/5) U ( 2 + \ / 5 , +OO). Therefore, for c G (0,1], 0(c) - 3 < 0, that is, a2 < 3. Consequently, for c € (0,4 - 2%/3) and a £ (ai,4], it follows from (7) that
2)L
(Uj=iPj\)-
For
1(2 - a)(l - 2e)| > 1 and a > ax > 3, that is,
fy ^ 0 (i = 1, 2,..., L), and 1 - 2z 2 < 0, one has det[£>F(y 2 )] ^ 0. Let xo = y 2 , x i = y j , for |(2 — a ) ( l — 2e)\ > 1, a G (oi,4] and c £ (0,4 - 2%/3), one has that F ( x i _ i ) = x, and d e t D F ( x i _ 1 ) / 0 for i = 1,2, and moreover, F m ( x 0 ) = x*, xo / 2:' with m — 2. Thus, condition (2) of Theorem 1 is also satisfied. Consequently, CML (2) is chaotic in the sense of Li-Yorke. The proof is completed. 4. S i m u l a t i o n According to Theorem 2, e, a and L are assumed as 0.95, 4 and 32, respectively. The Lyapunov exponents of the CML with these parameters are plot in Fig. 1. It is shown that all the Lyapunov exponents are positive. Thus, the CML is chaotic. The state phase, i.e., the values of the state variables of all sites , is plot in Fig. 2. State variables distribute uniformly in the interval [0,1], which is desirable for applying CML in cryptography. 5. Conclusions A rigorous proof of Li-Yorke chaos in a spatiotemporal chaotic system has been presented in this paper. Meanwhile, a sufficient condition for chaos in the spatiotemporal CML was also derived, which gives a criteria of constructing chaotic CMLs for their applications in some cases where chaos is benefit. 6. A c k n o w l e d g e This research was support by the Germany/Hong Kong Joint Research Scheme (9050180). References 1. K. Kaneko(ecL), Theory and Application of Coupled Map Lattices 1, (1993).
136
5
10
15
F i g u r e 1.
• • : • ; 5
25
30
Lyapunov exponents
= : : ! • : 10
Figure 2.
2. 3. 4. 5. 6.
20
15
' ; : • 20
: ;
i
: : •
25
30
State phase
K. Kaneko, Phys. D34, 1 (1989) K. Kaneko, Phys. Lett. A119, 397 (1987) F.H. Willeboordse and K. Kaneko, Phys. Rev. Lett. 73, 533 (1994) K. Kaneko, Prog. Theor. Phys. Suppl. 99, 263 (1989) H. Shibata, Phys. A292, 182(2001)
137 7. H.P. Lu and S.H. Wang and X.W. Li and G.N. Tang and J.Y. Kuang and W.P. Ye and G. Ku,Chaos 13, 617(2004) 8. P. Li and Z. Li and W.A. Halang and G.R. Chen, Phys. Lett. A, accepted in 2005. 9. P. Li and Z. Li and W.A. Halang and G.R. Chen, Int. J. Bifurcation and Chaos, 16, 2006. 10. P. Li and Z. Li and W.A. Halang and G.R. Chen, Chaos, Solitons and Fractals, accepted in 2005. 11. T.Y. Li and J.A. Yorke, Amer. Math. Monthly 82, 985(1975) 12. F.R. Marotto, J. Math. Analysis and Appl. 63 199(1978) 13. H. Luekepohl, Handbook of Matrices 1998 14. C.P. Li and G.R. Chen, Chaos, Solitons and Fractals 18 69(2003) 15. Y. Shi and G. Chen, Science in China Ser. A Mathematics 34 595(2004) 16. Y. Shi and P. Yu, Dynamics of Continuous, Discrete and Impulsive Systems in press (2005)
ON THE PROBABILITY A N D R A N D O M VARIABLES
ON
IF E V E N T S
B. RIECAN. * Matej Bel University, Tajovskeho 40 SK-97401 Banska Bystrica Mathematical Institute of Slovak Academy of Sciences Stefdnikova 49, SK-81473 Bratislava E-mail: [email protected]
One of the important problems of the theory of IF sets in the creation of probability theory. Recently the family J7 of IF-events was embedded in an MV-algebra, hence many results of the probability theory on MV-algebras can be applied. Of course, the mentioned MV-algebra has some special properties. The aim of the paper is the description of the basic notion of the probability theory on the special MV-algebra.
1.
Introduction
F i r s t recall some basic definitions. B y a n I F - set ([1]) we consider a pair
A = {^A, I>A) of functions fiA, VA : fl —> [0,1] such that VA + VA
< 1-
HA is called a membership function of A, VA a nonmembership function of A. If (Q,,S,P) is a probability space and HA^A are S-measurable, then A is called and IF - event and the probability of A is defined axiomatically (see Definition 2.1). Denote by T the set of all IF-events on fixed probability space (£2, S, P). After some constructive definitions of the notion of probability on F ([5], [6], [15]) a descriptive characterization was given ([8]) and then an axiomatic definition was presented ([9]). Now it is known the general form of all probabilities on the set T ([10]). In Section 2 we show that any probability on T can be described by the help of a probability on the Lukasiewicz triangle ([3], [7]). •WORK PARTIALLY SUPPORTED BY GRANT VEGA 2/2005/02
138
139 The second important notion of the probability theory is the notion of a random variable = a measurable function, hence such / : fl —> R that f~1{A) e. S for any Borel set A € B(R). Following the probability theory on MV-algebras instead of the notion of a random variable we use the notion of an observable as a morphism x : B(R) —> T ([4], [13], [14]). In Section 3 we describe all IF observables in T and prove that to any IF observable there exists their joint observable. 2. Probability Denote by T the family of all IF - events, and by J the family of all compact intervals. In the following definition we shall assume that [a, b] + [c, d] = [a + c, b + d] and [an, bn] / [a, b] if an / a,b„ f b. On the other hand we define (ftA, VA) © (ftByVB) = (l*A © fiB, "A 0 VB) {HA,VB)Q{\XB,VB)
=
{HAQ>IJ-B,VA®VB)
where f®9
= m i n ( / + g, 1)
/ © 3 = max(/ +
fl-l,0).
Moreover {VAn,VAn)
/
{VA,VA)
means fJ-An /
HA,VA„
\
vA.
Definition 2 . 1 . ([9]). An IF - probability on T is a mapping V : T -> J satisfying the following conditions: (i)
P((0,1)) = [0,0])7>((1>0)) = [1,1];
(it)
V{{HA,VA))+V{(IIB,VB))
V({HA,UA))
for any (nA,i/A),
© (HB,UB))
+ V((HA,VA)
= ©
( ^ B , ^ B ) £ T\ («0
(»An,VAn)
/
(»A,VA)
V{{»An,VAn))/V{{llA,VA))-
=*
(»B,VB))
140 In [10] it was proved that to any probability V : T' —• J there exist a,(3 e R,0
= [(1 -a)
f HAdP + a [(1
( 1 - / 3 ) / fiAdP + 0
-
vA)dP,
f(l-uA)dP}.
Moreover, in [12] an MV-algebra M was constructed such that T can be embedded to M. Of course, the aim of the paper is to show that the probability theory on T can be realized without using of MAs a laboratory for T the Lukasiewicz triangle can serve ([3], [7], [11]) A = {(«, v); u, v e R, 0 < u, 0 < v, u + v < 1} endowed with the ordering ( u i . U l ) < (U2,V2)
<^=> Ui < U2,Vi > V2,
and two operations ©, © (ui,Vl)
© (U2,V2) = («1 ®U2,V!
(ui,Vi)0(U2,V2)
= (lil QU2,V!
G>V2) ®V2)
where as before s © t — min(s + t, 1), s © £ = max(s + £ — 1,0). Definition 2.2. Probability on A is any function p : A —> J such that the following properties are satisfied: (i)
p((0,l)) = [0,0] ( P ((l,0)) = [l,l]; (ii)
p({ui,vi))+p({u2,v2))
p({ui,vi))@(u2,v2))+p((u1,vi)0(u2,V2)) for any (wi.wj), {u2,v2) e A; (Hi)
(un,vn)
P((Un,Vn))
/
(u,v) =$> Sp((u,v)).
=
141 If (fi,S, P) is a probability space, and A = ((IA,VA)
—> .T7 we put
I M / M . ^ A ) ) = ( / M^d-P, / i^d-P). Then evidently i>:T -^ A. T h e o r e m 2.3. ^4 mapping V : T' —> J is a probability if and only if there exists a probability p : A —> J such that P = p o tp. Proof. By Theorem of [10] every probability V : T—> J has the form V{(MA,
"A)) = [(1 -a)
f liAdP + a [ (1 Jn Jn
( 1 - / 3 ) / nAdP + 0 Jn Jn
uA)dP,
f{l-vA)dP].
By Theorem 2 of [7] the function p : A —> J given by p{u, v) = [(1 - a)u + Q ( 1 - v), (1 - /?)K + /3(1 - v)] is a probability measure, and evidently P ° ^((AM, »A)) = PW>((/M, VA))) =
= p([ iiAdP, f vAdP) = P((IJ.A,VA))Jn Jn On the other hand, if p : A —* J is any probability, then by Theorem 2 of [7] there exist a, (5 such that p(u, v) = [(1 - a)u + a{\ - v), (1 - 0)u + /3(1 - v)}. Then 7> -> J defined by V{(HA,VA))
= P(TP{{VA,"A)))
=
= p( / / ^ d P , / ^ d P ) = Jn Vn = [(1 - a ) / /MdP + Q ( 1 - / Jn Jn
(1-/3) / iiAdP + P(l-
Jn is a probability by Theorem of [10].
vAdP),
I vAdP)\
Jn
142 3.
Observable
As we have mentioned yet instead of measurable functions / : ft —> R one can consider observables B(R) —> S,A >—> f~1(A). Generalizing this approach we define the notion of IF-observable. Definition 3.1. An IF-observable is a mapping x : B(R) —> J (B(R) being the family of all Borel subsets of R) satisfying the following properties: (t)
x(fl) = ( l n , O n ) ;
(ii) A,BeB(R),AnB
=
x{A) ©x{B) = (0, l),x(AUB) (Hi)
A„ /
A=>
= x(A) ©x(B);
x{A„) /
x(A).
Definition 3.2. The joint IF observable of IF observables x,y : B(R) —> T is a mapping h : B(R2) —> T satisfying the following conditions h(R2) = (ln.On);
(0
(ii) A,B£B(R2),Ar)B
= ®=>
x(A) © h{B) = (0,1), h(A UB)= (Hi)
h(A) © h(B);
An/A^h(An)/h(A).
(iv)
h(C x D) =
for any C,£> G B(fl). (Here (f,g).(h,k)
=
x(C).y(D) (f.h,g.k).)
T h e o r e m 3.3. To any two IF observables x,y : B(R) —» J" iftere exists t/jeir joint ZF observable. Proof. Put x(vl) = (z b (A), 1 - x*(A)),y(B) for fixed u> € fl
= ( ^ ( B ) , 1 - y»(B)), and
At(>l) = x b (yl)( W ),Al(^) = x»(A)(u;), K t ( 5 ) = ^(J5)M,K»,(B) = y»(B)(a,). Then A^,AJ[,,K^,K^ : B(fl) -> [0,1] are probability measures. For C e B(R2) define h(C) =
(h\C),l-h»(C)),
143 where h\C){u)
= (\l
x
KI)(0,
ft'(C)M
= (A«,XK«,)(C)
First we must prove that /^(CO./i^C) : fi —> [0,1] are measurable. If C = A x B, then ftb(^l x B)( W ) = (Xl x , £ ) ( A x B) = = Aj,(A)./£(B) = x\A)(co).y\B)(uj)
=
= ^(A).^(B)M. Since xi(A),y]'(B) are 5-measurable, ft^A x B) : fi -» [0,1] is <Smeasurable as the product of two 5-measurable functions. Since /C = {C € i ? ^ 2 ) ; /i b (C) : fi -> [0,1] is <S-measurable} is a q — cr-algebra containing the family C = {A x B; A € B{R),B e B(R)}, i.e. £ is closed under difference of own subsets and countable unions of disjoint sets. Therefore K, contains the smallest q — cr-algebra over C and it coincides with the cr-algebra B(R2) of all two-dimensional Borel sets ([14], 1.1). Since K. D B(R2), we obtain that hb(C) is <S-measurable for any C e B(R2). The proof for /i»(C) is the same. For to prove that h(C) e T it is necessary to show that ht?{C) + 1 - h*{C) < 1, hence h\C) < 0(C). Of course, we know xb(A) < x*(A), y\B) < yi(B), hence A^ < Aj,, «£ < «£, for any w € fi. We have / ^ ( C ) M = A^ x K ^ ( C ) =
= f nl{Cu)d\l{u) < JR
< f Kl(C")d\i(u) < JR
< [ Kl(Cnd\l(u) JR
=
h\C)(uj).
144 4.
Conclusion
T h e r e is k n o w n a m e t h o d how t o o b t a i n new results of t h e probability t h e o r y on I F events by t h e corresponding results of t h e t h e o r y of M V algebras. In this p a p e r we have shown t h a t t h e m a i n notions of t h e p r o b ability t h e o r y c a n b e described in t e r m s of I F events only. It gives some b e t t e r possibilities for direct applications of t h e probability t h e o r y o n I F events a n d also for image processing p r o b l e m s .
References 1. Atanassov, K.: Intuitionistic Fuzzy Sets: Theory and Applications. Physica Verlag, New York (1999). 2. Cignoli, R., D'Ottaviano, I.M.L., Mundici, D.: Foundations of Many - Valued Reasoning. Kluwer, Dordrecht (2000). 3. Deschrijver, G. - Cornelis, Ch. - Kerre, E.E. Triangle and square: a comparison, Proceedings of the Tenth International Conference IPMU, Perugia, Italy 2004, 1389-1395 (2004). 4. Dvurecenskij, A., Pulmannova, S.: New Trends in Quantum Structures. Kluwer, Dordrecht (2000). 5. Gerstenkorn, T., Manko, J.: Probabilities of intuitionistic fuzzy events. In: Issues in Intelligent Systems: Paradigms (O. Hryniewicz et al. eds.). EXIT, Warszawa, 63 - 68, (2005). 6. Grzegorzewski, P. - Mrowka, E.: Probability of intuitionistic fuzzy events. In: Soft Methods in Probability, Statistics and data Analysis (P. Grzegorzewski et al. eds.). Physica Verlag, New York, 105 - 115, (2002). 7. Lendelova, K., Riecan, B.: Probability on triangle and square. IPMU'2006, Paris, to appear. 8. Riecan, B.: A descriptive definition of the probability on intuitionistic fuzzy sets. In: Proc. EUSFLAT'2003 (Wagenecht, M. and Hampet, R eds.), ZittauGoerlitz Univ. Appl. Sci, 263 - 266, (2003). 9. Riecan, B.: Representation of probabilities on IFS events. Advances in Soft Computing, Soft Methodology and Random Information Systems (M.LopezDiaz et. al. eds). Springer, Berlin, 243 - 246 (2004). 10. Riecan, B.: On a problem of Radko Mesiar: general form of IF - probabilities. Accepted to Fuzzy Sets and Systems. 11. Riecan, B.: On the entropy on the Lukasiewicz square. Joint EUSFLAT LFA 2005, Barcelona, September 7 - 9, 330 - 333, (2005). 12. Riecan, B.: On the entropy of IF dynamical systems. Issues in the representation and Processing of Uncertain and Imprecise Information, EXIT, Warszawa, 328 - 336 (2005). 13. Riecan, B. - Mundici, D.: Probability on MV-algebras. In: Handbook on Measure Theory (E.Pap ed.). Elsevier, Amsterdam (2002).
145 14. Riecan, B. - Neubrunn, T,: Integral, Measure, and Ordering. Kluwer, Dordrecht (1997). 15. Schmidt, E., Kacprzyk, J.: Probability of intuitionistic fuzzy events and their applications in decision making. Proc. EUSFLAT'99, Palma de Malorca, 457 - 460, (1999).
ANOTHER APPROACH TO TEST THE RELIABILITY OF A MODEL FOR CALCULATING FUZZY PROBABILITIES*
C H O N G F U HUANG
D O N G Y U N JIA
Institute of Disaster and Public Security College of Resources Science and Technology, Beijing Normal University Beijing 100875, China. E-mail: [email protected]
In this paper, we suggest a new approach to test the reliability of the interiorouter-set model for calculating fuzzy probabilities. With a sample drawn from a population, we use the model to obtain a possibility distribution on a probability universe with respect to a histogram interval. Then, with N samples drawn from the same population, we obtain N histogram estimates that an event occurs in the same interval. Because the distribution constructed by the histogram estimates is similar to the possibility distribution, according to the consistency principle of possibility/probability, we infer that the model is basically reliable.
1. Introduction It is impossible to precisely estimate a probability distribution of a population with a sample when the probability distribution function of the population is continuous. Using a fuzzy model to deal with the given sample, such as the interior-outer-set model (IOSM)4, we can obtain a fuzzy probability distribution. It is very important to test if the model is reliable. Executing some computer simulation experiments, we have demonstrated5 the reliability of IOSM in terms of the fuzzy expected value. In other words, the demonstration is available only for comparing expected values of a fuzzy probability distribution and a classical probability distribution. Plentiful information in a fuzzy probability distribution has not played an important role in the demonstration. The purpose of this paper is to propose a new approach to test the reliability of IOSM for calculating a possibility-probability distribution (PPD) 6 . * Project Supported by National Natural Science Foundation of China, No. 40371002, and the China-Flanders project BIL 011sll05 entitled "Intelligent systems for data mining and information processing"
146
147 It could be extended to test other models for calculating fuzzy probabilities. Notions related to fuzzy probabilities are introduced in section 2. In section 3, we give a brief survey of the interior-outer-set model. The new approach to test IOSM's reliability is described in section 4. A numerical simulations is shown in sections 5.
2. B a s i c Terminologies 2.1. Uncertainty,
probability
and
possibility
"Uncertainty" has a broad semantic content. When we use the dictionary again to examine these various meanings, two major types of uncertainty emerge quite naturally. They are well captured by the terms "vagueness" and " ambiguity". The concept of uncertainty is closely connected with the concept of information. The amount of information obtained by the action may be measured by the reduction of uncertainty that results from the action. "Probability" is a mathematical measure of the possibility of the event occurring as the result of an experiment. Probability theory is capable of conceptualizing only one type of uncertainty: conflict. Axioms of probability theory do not allow any imprecision in characterizing situations under uncertainty, be it imprecision in the form of nonspecificity or vagueness. "Possibility" is a mathematical measure of the possibility of the object being as a typical object. Possibility theory is capable of conceptualizing another uncertainty: nonspecificity (lack of informativeness). Whenever a basic probability assignment function in the DempsterShafer theory 1 0 induces a nested family of focal elements, we obtain a special belief measure, which is called a necessity measure, and the corresponding special plausibility measure, which is called a possibility measure. The only body of evidence they share consists of one focal element that is a singleton. Since the additivity axiom of probability theory is replaced with the maximum axiom of possibility theory, which guarantees the nested structure of focal elements in possibility theory, the two theories are complementary. Probability is suitable for characterizing the number of persons that are expected to ride in a particular car each day. Possibility theory, on the other hand, is suitable for characterizing the number of persons that can ride in that car at any one time. Since the physical characteristics of a person (such as size or weight) are intrinsically vague, it is not realistic to describe the situation by a sharp distinction between possible and impossible instances.
148 Possibility theory can be formulated not only in terms of consonant bodies of evidence within the Dempster-Shafer theory, but also in terms of fuzzy sets where possibility distributions are in a one-to-one correspondence with fuzzy sets, it is also meaningful to characterize possibility distributions by their degrees of fuzziness. In this paper, we refer to this interpretation of possibility theory as a fuzzy-set interpretation. Klir and Harmanec 8 examined bridge between standard possibility theory 1 2 and probability theory and studied transformations between these two theories. Possibility distribution r and probability distribution p are said to be consistent if it holds that for any event u Prob{u) < Poss{u),
(1)
where Prob denotes the probability measure corresponding to p and Poss denotes the possibility measure corresponding to r. 2.2. Fuzzy
probability
The theory of fuzzy probability was born into a fuzzy community where several researchers started thinking about the probability of a fuzzy event 11 . Gert de Cooman has presented a sound and deep approach 3 to vague probability. Cooman's model follows an approach to modelling uncertainty that was pioneered by Ramsey 9 . The present model also has formal connections with Zadehs fuzzy probabilities 1 3 ' 1 4 , although Cooman believes his model to be fundamentally different, since it has a clear behavioural interpretation, and a calculus that is very different from the one suggested by Zadeh. Engineers refer to fuzzy probability as imprecise probability due to that they haven't any basic probability assignment function in many cases. In this paper, we refer to this interpretation of fuzzy probability as the numerical probability that is a fuzzy quantity defined on the unit interval [0, 1]. The sum of these probabilities is not 1 by the rules of standard fuzzy arithmetic. And, fuzzy probability is characterized by a possibility distribution of probability. 2.3. Possibility-probability
distribution
To avoid any confusion, we restrict ourselves here to study the fuzzy probability that can be represented by a possibility-probability distribution. Definition 1 Let (Q, A, P ) be a probability space, and P be the universe of discourse of probability. Let irx(p) be the possibility that an event occurs
with probability p. Then,
n n ,p = {7rx(p) | x e n , p e ? } is called a possibility-probability
distribution
(2)
(PPD).
It is important to note that, in Definition 1, P is employed to represent a probability measure defining a probability space, and P the universe of discourse of probability. The P P D is a model of the second-order uncertainty 1 along with the first-order uncertainty they may form hierarchical models 2 . 2.4.
Histogram
estimate
Histogram is a model to estimate the probability distribution of an event occurring in some intervals. Let X = {x\,X2, • • • ,xn} be a given sample drawn from a population with P D F p{x). Given an origin XQ and a bin width h, we define the bins of the histogram to be the intervals [xo + mh,Xo + (m + l)/i) for positive and negative integers m. The intervals are chosen to be closed on the left and open on the right. p{x £ Ij) = —(number of xi in the same bin as x),
(3)
is called a histogram estimate (HE) of p{x). 3. Interior-Outer-Set M o d e l Interior-outer-set model (ISOM) is a hybrid model that consists of information distribution method 7 and possibility inference 12 . ISOM is suggested to calculating, with a sample X = {xi,x^, • • • ,xn}, a P P D defined on I x P, where, I = {h,h,
•••Jm)
(4)
and P = {pfc|* = 0Il,---,n} = { 0 , ^ > - - - > l } .
(5)
Let Uj be the midpoint of intervals Ij, A = u3•+1 — ttj, j = 1,2, • • • , j' — 1. Let _ / 1~\0,
9iJ
| Xi - Uj | / A , if | xt - Uj | < A; if \xi-uj\>A.
, . W
Where qtj is called the information gain of that observation Xi distributes to controlling points Uj. It is apparent that for samples within
the intervals Ij, the one with smallest value of q have the highest probability of leaving its interior interval and drift to a neighbor interval. On the contrary, out of interval Ij samples with highest value of q have highest probability of getting in this interval. Let Qj be the list of complemented membership degrees with respect to the information gains, from observations within interval Ij, and let Q t be the list of membership degrees with respect to the information gains, from observations outside interval. Furthermore let sort f [Qj) order the elements of the list according to ascending magnitude and similarly, let sort i (Qj') order the elements of the list according to their descending magnitude. When | Qj | = rij, we can use the formula to calculate a PPD as follows, which is called IOSM. 1st (smallest) element of QJ, if p = 0; 2nd element of Qj, if p = A;
TT/.,
(P)
Last (largest) element of Q , if p ifp=2i; 1, C i 1st (largest) element of Qj i f p = ^n,+2. T 2nd element of Qf, •
Tl
(7)
+ •
if p =
Last element of Q^,
if p = 1.
Then, from a given sample we can obtain a PPD. 4. Description of the New Approach Consider the following case; In an experiment group there are three researchers: (1) A computer scientist who draws N + 1 samples X\, X2, •••, X^, Xiv+i. A sample X consists of n random numbers Xi,i = 1,2, •• • ,n and n is not large. The computer scientist knows that the samples are drawn from a population with density p(x),x 6 R; (2) A statistician who is good at extracting statistical laws through Histogram Method. He does not know where the N samples Xi,X2, • • • ,XN comes from. Studying a sample, he obtain an estimate p(x) to estimate p{x). (3) A fuzzy engineer who is interested in calculating a PPD with a fuzzy model. He also does not know where the sample X^+i comes from. Analyzing the sample, the engineer gives a PPD, i.e., UQ,P to estimate p(x).
151 Let / be an interval with respect to a HE. Vz € / . There is no loss in generality when we supposed that the statistician obtains an estimate pj (x) resulted from sample Xj. Let u>k — Pk{x), the computer scientist obtains a sample: W = {wi,w2,---
,WN}
(8)
Employing a reasonable mathematical statistics method (again histogram model is the simplest method), with W the computer scientist can obtain an estimate f(p) of a probability distribution with respect to probability values in P. When N is large, f(x) is a quality function depicting the scattered field to estimate p(x) with a sample. For the same interval / , there is no loss in generality when we supposed that the fuzzy engineer obtains a possibility distribution irx(p) as a fuzzy probability of t h a t x occurs in / . According to the consistency principle of possibility/probability shown in Eq.(l), Prob could be used to infer if Poss is basically reasonable. Therefore, if irx(p) is similar to / ( p ) , it is natural for the computer scientist to infer that the model to calculate the P P D is basically reliable. The approach to test a model with many HEs is called HistogramCovering Approach. To test a specific model, we might employ a specific form of the approach to accomplish our task.
5. A N u m e r i c a l Simulation E x p e r i m e n t In this section, we employ Histogram-Covering Approach to test the reliability of IOSM for fuzzy probabilities. For the intelligibility we suppose t h a t the samples are drawn from normal distribution AT(6.86,0.372 2 ). Firstly, running Program 2 in paper [5], a generator of random numbers, with MU=6.86, SIGMA=0.372, N = l l and SEED=82495, we obtain 11 random numbers: X = {xi,x2,--,xu} = {0.91,6.59,6.31,6.50,7.03,6.49,7.27,7.13,6.72,7.42,6.34}.
(9)
Employing IOSM, we obtain a P P D : h
h h
P0 / 1.00 0.06 0.04 \ 1.00
Pi 0.41 0.09 0.19 0.45
P2 0.35 0.10 0.19 0.19
P3 0.10 0.29 0.39 0.00
P4 0.09 0.35 0.45 0.00
P5 P6 0.00 0.00 0.41 1.00 1.00 0.29 0.00 0.00
VI 0.00 0.39 0.06 0.00
P8 0.00 0.19 0.00 0.00
P9 0.00 0.04 0.00 0.00
P10 Pll 0.00 0.00\ 0.00 0.00 (10) 0.00 0.00 0.00 0.00 )
152 where h = [5.65,6.25), h = [6.25,6.85), h = [6.85,7.45), h = [7.45,8.05), Pt = i / H . * = 0 , l , - " , 1 1 . Secondly, running the same generator with MU=6.86, SIGMA=0.372, N = l l , and 90 SEEDs, respectively, we obtain 90 samples, X\, Xi, • • • , Xgo. For example, with SEED=876905, we obtain: X3 = {7.14,6.98,6.83,7.00,7.34,6.47,7.65,6.99,6.71, 7.47,6.26}. Employing Histogram Method, with the same intervals used in Eq.(10) and Xk, k = 1,2, • • • ,90, we obtain 90 HEs p\(x £ Ij),p2{x € Ij), • • • ,£90(2: £ Ij),j = 1,2,3,4. Thirdly, for an interval Ij, we obtain 90 estimate values of probability p(x € Ij), forming a sample Wiy For example, for I^ = [6.25,6.85), we obtain: Wi2 = {wi,w2,--,w 9 0 } = {0.55,0.55,0.36,0.45,0.09,0.45,0.64,0.45,0.55,0.27,0.27,0.55,0.64, 0.36,0.36,0.73,0.36,0.45,0.36,0.27,0.64,0.27,0.45,0.36,0.36,0.55, 0.64,0.45,0.45,0.64,0.55,0.27,0.45,0.36,0.45,0.55,0.45,0.36,0.36, 0.45,0.27,0.45,0.45,0.45,0.18,0.36,0.36,0.27,0.73,0.55,0.36,0.45, 0.55,0.64,0.64,0.27,0.55,0.36,0.64,0.45,0.36,0.36,0.36,0.55,0.36, 0.36,0.18,0.27,0.27,0.55,0.55,0.64,0.55,0.45,0.45,0.45,0.55,0.09, 0.27,0.36,0.55,0.45,0.45,0.73,0.45,0.36,0.18,0.36,0.36,0.36} Then, employing the method of information distribution 7 ,with samples Wijtj = 1,2,3,4 and controlling points pt = t/ll,t = 0 , 1 , • • • , 1 1 , we obtain an estimate F(p) shown in E q . ( l l ) . PO Pi V2 h ( 1-00 0.40 0.14 F(p) = I2 0.00 0.08 0.12 h 0.00 0.05 0.19 h \ 1-00 0.45 0.17
P3 0.02 0.46 0.38 0.06
Pi 0.02 1.00 0.90 0.02
P5 P6 0.00 0.00 0.92 0.67 1.00 0.90 0.00 0.00
P7 0.00 0.38 0.71 0.00
P8 0.00 0.12 0.05 0.00
P9 0.00 0.00 0.10 0.00
PlO Pll 0.00 0.00 \ 0.00 0.00 (11) 0.00 0.00 0.00 0.00/
where Ij,j = 1,2,3,4 and pt,t = 0, !,••• ,11 are the same as ones in Eq.(10). Finally, comparing Eq.(10) and E q . ( l l ) , we know that I I ^ p is similar to F(p). According to the consistency principle 8 , we know that IOSM is basically reliable with respect to this computer simulation experiment. Executing a lot of computer simulation experiments with different seed numbers, sample sizes, populations, respectively, we have results showing that IOSM is basically reliable.
153 Acknowledgment The work on this paper was done in Key Laboratory of Environmental Change and Natural Disaster, The Ministry of Education of China. References 1. Gert de Cooman, Possibilistic previsions, in: Proceedings of the 7th International Conference on Information Processing and Management of Uncertainty in Knowledge-Based System, Paris, France, 1998, Vol.1, pp. 2-9. 2. Gert de Cooman, Lower desirability functions: a convenient imprecise hierarchical uncertainty model, in: Proceedings of the 1th International Symposium on Imprecise Probabilities and Their Applications, Ghent, Belgium, 1999, pp. 111-120 3. G.de Cooman, A behavioural model for vague probability assessments, Fuzzy Sets and Systems, 154(3), (2005), 305-358. 4. C.F. Huang, An application of calculated fuzzy risk, Information Sciences, 142(1), (2002), 37-56. 5. C.F. Huang, A demonstration of reliability of the interior-outer-set model, International Journal of General Systems, 33(2-3), (2004), 205-222. 6. C.F. Huang, C. Moraga, A fuzzy risk model and its matrix algorithm, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 10(4), (2002), 347-362. 7. C.F. Huang, and Y. Shi, Towards Efficient Fuzzy Information Processing— Using the Principle of Information Diffusion, Physica-Verlag, Heidelberg, 2002. 8. G.J. Klir, D. Harmanec, On some bridges to possibility theory, in: Gert de Cooman, D Ruan, E.E. Kerre (Eds.), Foundations and Applications of Possibility Theory, World Scientific, Singapore, 1995, pp. 3-19. 9. F.P. Ramsey, Truth and probability, in: R.B. Braithwaite (Ed.), The Foundations ofMathematics, Routledge & Kegan Paul, London, 1931, pp. 156-198. 10. G. Shafer, A Mathematical Theory of Evidence, Princeton Univ. Press, Princeton, NJ, 1976. 11. L.A. Zadeh, Probability measures of fuzzy events, Journal of Mathematics Analysis and Applications, 23(1), (1968), 421-427. 12. L.A. Zadeh, Fuzzy sets as a basis for a theory of possibility, Fuzzy Sets and Systems 1(1) (1978) 3-28. 13. L.A. Zadeh, Fuzzy probabilities, Information Processing Management, 20(3), (1984), 363-372. 14. L.A. Zadeh, Toward a perception-based theory of probabilistic reasoning with imprecise probabilities, Journal of Statistical Planning and Inference, 105(1), (2002), 233-264.
A NOVEL GAUSSIAN PROCESSES MODEL FOR REGRESSION AND PREDICTION YATONG ZHOU f Dept. Information and Communication Engineering, Xi 'an Jiaotong University, Xianning Road XVan, 710049, P. R. China TAIYI ZHANG Dept. Information and Communication Engineering, Xi 'an Jiaotong University, Xianning Road Xi'an, 710049, P. R. China ZHAOGAN LU Dept, Information and Communication Engineering, Xi 'an Jiaotong University, Xianning Road Xi'an, 710049, P. R. China A novel multi-scale Gaussian processes (MGP) model is proposed for regression and prediction. Motivated by the ideas of multi-scale representations in the wavelet theory, in the new model a Gaussian process is represented at a scale by a linear basis that is composed of a scale function and its different translations. Finally the distribution of the targets can be obtained at different scales. Compared with the standard GP model, the MGP model can control its complexity conveniently just by adjusting the scale parameter. So it can trade-off the generalization ability and the empirical risk rapidly. Experiments show that the performance of MGP is significantly better than GP if appropriate scales are chosen.
1. Introduction In this paper we consider the regression as a problem of finding a desired dependence using a limited number of samples. Once such dependence has been accurately estimated, it can be used for prediction. One goal of the prediction is the prediction accuracy for future data, also known as the generalization ability. And another goal is a lower empirical risk that measures the discrepancy between the true and the estimated targets for the given samples. But these two goals are contradictory if the given sample size is finite. In order to trade-off the * Work partially supported by grant 90207012 of the China National Science Foundation.
154
155 generalization ability and the empirical risk, any model for learning from finite samples needs to have some complexity control. A typical example is the VCtheory that provides a general framework for complexity control called Structure Risk Minimization (SRM) [1]. A Gaussian process is a stochastic process whose joint distribution is a Gaussian. The Gaussian processes are a recent development for solving regression problems, though they have a longer history in spatial statistics [2]. We call a Gaussian process a GP model when it is integrated into a regression modeling. The GP model has emerged as one of the most popular regression and prediction tools [3]. This is perhaps because of its impressive generalization ability over a range of applications. A GP model is specified by a mean and a covariance function. For simplicity, we will only consider GP model with zero mean. Once the covariance function is fixed, it is easy to carry out regression and prediction. Usually the covariance function contains an undetermined parameter G, the GP model controls its complexity by using conjugate gradient method to find the maximum likelihood values of 8. However the evaluation of the gradient of the likelihood requires the inversion of the covariance matrix. Hence calculating gradients becomes time consuming for a large sample size. Additionally, covariance functions most frequently used in the GP model are Gaussian functions. It can be proven that employing this kind of covariance functions is equivalent to representing a Gaussian process by a set of Gaussian radial basis functions centered at different points [4]. But this set of basis functions are not complete, some times the representation is only an approximation. Motivated by the ideas of multi-scale representations in the wavelet theory [5], we represent a Gaussian process at a scale by a linear basis that is composed of a scale function and its different translations, and then a novel multi-scale Gaussian processes (MGP) model is proposed. Comparing to the standard GP model, the new model enjoys three advantages. Firstly, it can control the complexity conveniently just by adjusting the scale parameter. As a consequence it can trade-off the generalization ability and the empirical risk rapidly. Secondly, the scale function and its translations is a complete basis, so the Gaussian processes can be accurately represented at a scale. And thirdly, experiments show that the performance of MGP is significantly better than GP if appropriate scales are chosen. The remainder of this paper is organized as follows. In Sec.2 we review the standard GP model. Following that, in Sec.3 a novel MGP model is proposed.
156 Experiments are reported in Sec.4 where we make comparisons between the two models. Finally some conclusions are presented in Sec. 5. 2. A review on GP model In the following we concentrate to the regression problem assuming that the value of the target function /(x) is generated from an underlying / ( x ) corrupted by Gaussian noise f(x) with mean 0 and variance erj /(i) = / ( x ) +
ff(x).
(1)
Now given a collection of N training samples Z? = {(xn,/n),tt = l,---,Af] , we would like to construct an estimate / ( x ) to the true function / ( x ) which can serve as a reasonable approximation. For convenience, we define sample vectors XN =(xux2,---xN) and corresponding target vector tN =(t[,t2,---tN) . The empirical risk that measures the discrepancy between the true and the estimated targets is defined by
In this section we give a review on the standard GP model and the details may be found in Ref. [4]. The GP model places a Gaussian process prior directly on / ( x ) and /(x). Now assume / ( x ) be generated by a fixed basis functions with a random weight vector w = (w,, w2, • • • wH )
/«=2>/A(4
(2)
In terms of Eq. (1) and (2) we can infer that t„~N(0,C„),
(3)
where CN is a covariance matrix. The next question to address is how to predict/(x JV+ ,) given the test sample x w+l . Let /N+1 denote the target of xN+1, the inference is simple since the joint density P(tN+t,tN) is a Gaussian, the predictive distribution P(tN^ | tN) is also a Gaussian. The distribution P(tNt] 11^) can be obtained by using Baye's rule. After that, it's mean is regarded as / ( x N + 1 ) . To use conjugate gradient method to find the maximum likelihood values of 9 we need to calculate the log likelihood Tand its derivatives. The partial derivatives of T with respect to 0 can be expressed analytically by
51739 = -Q.5Trace[C^ -(dCN /de)\ + 0.5tTNC~lj (dCNld9)C^tN. 3.
Proposed MGP model
One of the areas of investigation in multi-scale analysis has been the emerging theory of multi-scale representations of signals and wavelet transforms [5]. These theories lead naturally to the investigation of multi-scale representations of a stochastic process. Basseville etc. have outlined a mathematical framework for the multi-scale modeling and analysis of stochastic processes [6], [7]. Differ from Basseville's work, the MPG model seeks a multi-scale representation as following
/,(*)=Lw<'Vx)
(4)
where R w is an N x Hj matrix with the element RJ* =
(5)
The matrix C)/ gets the following form C # = ( o M ) 2 R M R M r +
(6)
where I is a unit matrix with the rank N. The («,«') entry of C^' is given by C. (x„,x„,) = alSn, +(a^)2 ^M2'J^
-kW~'*<
~k)>
<7)
k
where 8„„, =1 if n ~ ri and 0 otherwise. tin
Now our task is to infer fj(xNt]), the prediction value of the test sample xN+1 at the scale j . Let t^\ denote the target of %N+] at the scale j . In terms of the assumption that /• (x) is a Gaussian process, it is derived that ^ ,
|t„)ccexp
(8)
2 ( ^ where the mean and variance are given by
)2
158
and k w is the sub block of C(^+1). As anticipated, the Eq. (8) gives the distribution of target t^\x at the scale j . Finally we let ) = tu)
f (x The interval
is called the error bar. It is a confidence interval of 4+i that represents how uncertain we are about the prediction at xw+1 assuming the model is correct. 4. Experiments 4.1. One dimensional toy example Nabney has illustrated the operation of the GP model for regression using 10 training samples drawn from a sine function / ( x ) = sin2;rx with the noise variance a] =1 [8]. Fig. 1(b) shows his results. Similarly, we implement the MGP model for solving the above regression problem. The results at different scales 7(y' = 0,-l,-2,-3) are illustrated in Fig. l(c)-(f). The MGP model involves a numerical calculation of covariance function (7). The function ^ we chosen is the scale function of the Daubechies wavelet with order 10 (DB10). It is smooth and compactly supported. We define the average width of the error bars (A WEB) A
M
where er^ is the error bar of the m-th point which is picked uniformly from the interval of x. AWEB is a meaningful measure of generalization ability because it represents the uncertainty of prediction averagely whereas the generalization ability reflects the prediction accuracy of the model. Assuming the parameter M 50 we obtain the AWEB at different scales in Fig. 1. Performance comparisons between the two models in terms of the empirical risk and the AWEB can be observed in the Table 1.
159
4
;
\
4
\
ft
ft'i
ft*
1V>
*>
\
0*
J 08
s
SS
!
&
0?
H
tt«
(a)
i*
t
1*
«
fiS
B*
(b) 1.
i
8»
(c)
y**-
1
s.
1
V
\ !
> \
/
J.
I
11
\
t
(d)
(e)
(f)
Figure 1, The one dimensional toy example: regression and prediction (solid line) with error bars (dot lines) are presented in the figure, (a) 10 samples (circles) and sine function (dashed line), (b) the GP model, (c)-(0 the MGP model with (c)j=0, (d)j=-l, (e) j= -2 and (f) j= -3
A quick inspection of Fig. 1 and Table 1 shows the scale j plays a key role in controlling the complexity of the MGP model. When j is too large (j = 0), AWEB is small whereas the empirical risk is high. It means that an underfitting has happened. The complexity of MGP model is too low to exhibit the intrinsic characteristics of the estimated function. However, AWEB is large whereas the empirical risk is low when j is too small (j = -3). It means that the complexity of MGP model is so high that an overfitting has happened. Only if appropriate j is chosen (y' = -l or j = -2), i.e., an appropriate complexity is chosen, can good generalization ability and low empirical risk be achieved at the same time. 4.2. Two dimensional toy example The training set are constructed by generating 81 samples randomly on the [8,8]x[-8,8] square and the targets is calculated by Eq. (1), where f(x) is drawn from N(0, 0.25) and /(x) = /(x(l),x,2)) = ((^l))2-(^2))2jsin(o.5^l)) . The test set comes from generating 289 samples on a 17 by 17 grid over the square. The results of two models are shown in Fig. 2 and Table 1. Under the conditions of y' = 2, the performance of MGP is significantly better than that of GP. However, an overfitting and underfitting has happened when 7 = 0 and 7 = 3 respectively.
160
4*^ (c)
Figure 2. The two dimensional toy example: (a) shows the function from which 81 noisy samples are generated and the noisy data points in relation to the function. The offset of each datum due to noise is shown as a dashed line, (b) the results of GP. (c)-(f) the result of MGP with (c) j=0, (d) j=l, (e)j=2and(f)j=3. Table 1. Performance comparisons between the two models in two toy examples. In the table the symbols ER represents empirical risk. Example ID toy
Perform.
GP model
ER*100 AWEB*100
0.11 28.82
ER*10 AWEB*10
18.92 5.17
2D toy
j=-3 0.03 204.23 j=0 2.28 30.76
MGP model j=-2 j=-l 0.06 0.07 44.81 19.53 j=2 j=l 4.53 5.19 4.01 2.68
j=0 6.48 12.80 j=3 22.76 1.23
4.3. Real-word prediction problem Can the good results for the MGP model on toy examples carry over to realworld data? To answer this question, experiment has been performed on the laser data. We use the laser data to illustrate the error bar in making predictions. The laser data has been used in the famous Santa Fe Time Series Prediction Competition. A total of 1000 points are used as the training sample and 100 following points are used for prediction. Fig. 3 plots the predictions and the error bar. The predictions of the MGP model (j = l) match the targets very well except on the region [1065, 1080], Reasonably the model can provide larger error bars for these predictions.
161
,_ , A , v
-"
\
t '
'11te
•••'•"•'•• E n o t
ioS
m5~~~lm
io»
itoo
''rW>
km
1020
./
•41
M h\ I
c
-~
""•'•*••'-'•
V
H\\V
MHO ."" ieeo
\
IOSO
' itoo
Figure 3. Graphs of our predictions on laser data. In the left graph laser data (dashed line), prediction (solid line) and empirical risk (dotted line) are presented. In the right graph the error bar (solid line) and the empirical risk divided by 50 (dotted line) are presented.
5. Conclusions In this work we have proposed a MGP model for function regression and prediction. AWEB is used to act as a meaningful measure of the generalization ability. Experiments indicate that the MGP model can control the complexity conveniently just by adjusting the scale parameter. Additionally, its performance is significantly better than the GP model if appropriate scales are chosen. Future work could include the automatically relevance determination (ARD) of the MGP model. References 1. V. Vapnik, The Nature of Statistical Learning Theory. 112 (1998). 2. C. K. I. Williams, Learning and Inference in Graphical Models. 11 599 (1998). 3. P. Sollich and A. Halees, Neural Computation. 14, 1393 (2002). 4. C. K. I.Williams, Machine Learning. 40, 77 (2000). 5. S. Mallat, IEEE Trans. PAMI. 11, 674 (1989). 6. M. Basseville, A. Basseville and K. C. Chou, IEEE Trans. Information Theory. 38 766 (1992). 7. K. C. Chou, S. A. Golden and A. S. Willsky, Proc. IEEE Int. Conf. ASSP. 1, 1709(1991). 8. I. T. Nabney, NETLAB: Algorithms for Pattern Recognition. 369 (2001).
ON PCA ERROR OF SUBJECT CLASSIFICATION* LIHUA FENG Department of Geography, Zhejiang Normal University, Jinhua 321004, China E-mail: fenglh@zjnu. en FUSHENG HU, LI WAN School of Water Resources and Environment, China University Beijing 100083, China
ofGeosciences,
Since subjective chose could cause the loss of valuable original information, statistics method is employed to deal with multi-variable problem. After normalization, original variables are reduced to several independent synthetic variables on which evaluation is based. Principal Component Analysis provides a good example for this. But the real data calculation shows that the result of Principal Component Analysis is not always complied with the real situation. Sometimes, it can be totally messed up. Some problems exist regarding to this classification. They are as follows: (1) The discrimination ability of PCA is limited, (2) For those samples with big variables, PCA losses its ability of discrimination, (3) When the value of variable increases, on the contrary, class level decreases, (4) The same samples, while different classifications, (5) Variables change a lot, while classification keeps unchanged, (6) While variables change arbitrarily, there are only two different classifications, (7) The position change of variables causes the change of classification, (8) The change of a variable causes the change of the classification. These problems are caused by the nature of Principal Component Analysis itself.
1. Introduction In subject classification, Principal Component Analysis (PCA) is implemented as an objective and practical validation method [1, 2, 3]. It has been used to simplify the high dimensional problems while retaining as much as possible of the information present in the data set [4]. Meanwhile, the weight assigned to each variable that takes into account is obtained objectively to avoid subjective judgment. Therefore it is widely used in finding the synthetic factors, sorting samples, classification, etc. It is especially useful in insect classification, flora classification, environmental quality classification, deposit classification, geology sample classification, etc [5, 6], However, the real data calculation * This work was supported by Zhejiang Provincial Science and Technology Foundation of China (No. 2006C23066).
162
163 shows that sometimes the result of PCA is not complied with the real situation, and problems exist regarding to this classification. These problems will be discussed in this paper. 2. PCA Theory and Method PCA has in practice been used to reduce the dimensionality of problems while retaining as much as possible of the information present in data set, i.e., by linear transformation, PCA reduces dimensionality by extracting the smallest number of synthetic and uncorrelated components that account for most of the variation in original correlated multivariate data and summarizes the data with little loss of information. It simplifies the problem by catching the main features of it [7, 8]. Assume original variables are X\, x2, •••, xp, and the new synthetic variables obtained by PCA are zu z2, •••, zm, which are the linear combination of the original variables x\, x2> ••\xp (m
(3) Compute the characteristic value and characteristic vector of R. According to characteristic equation \R - Al\ = 0, find the eigen values A, of the matrix and sort them in the order of decreasing magnitude A, > X1 > • • • > Xp. At the same time, find the corresponding eigen vector u\, u2, •••, up. They are orthonormal and called principal axes.
164 (4) Compute contribution rate em=XllYJXl and cumulative contribution
rate E.^Aj/f.*, ,=1
•
(=1
(5) Compute principal component z„ = X 2 M / X y=l 1=1
(6) Synthetic analysis. In order to retain as much of the variability in data as possible, what is the necessary accuracy of an m-dimensional system to substitute the original system? This can be estimated by calculating the cumulative contribution rate Em . Usually the minimum m with Em >85% (m< p ) is chosen. Once m is determined, instead of working on all the original variables x\, x2, • • •, xp, we could simply analyze those m principal components. From above analysis, it can be seen that retaining more principal components not only increases calculation, but also decreases the effect of the main principal component. Therefore, it is important to choose several representative principal components from existing principal components [11, 12]. Since subjective chose could cause the loss of valuable original information, statistics method is employed to deal with multi-variable problem. After normalization, original variables are reduced to several independent synthetic variables on which evaluation is based. Principal Component Analysis provides a good example for this [3]. For simplicity, Table 1 lists 5 samples y, with the same 8 variables x:. According to above steps in principal component analysis, first, we normalize 8 variables of 5 samples to get normalized data set (x'j) (i = 1,2,• • • ,8; j = 1,2, • • • ,5). Then we calculate the correlation matrix R = (ru)M and principal components zm . According to the criteria of minimum m (Em> 85%), top 3 principal components are chosen. From this, the data structure is simplified. Based on the product of principal components z\, z2, and z3 and the correspondent weights e,, 3
e2 and e-j, (synthetic principal component Z = ^emzm ), the final synthetic m=l
principal component Z of each sample is sorted as Table 2. According to the synthetic principal component Z of Table 2, the samples are classified as 5 classes. Class I ( Z < - 3 ) , class II ( - 3 < Z < - 1 ) , class III (-1
165 Table 1. 8 variables X, of 5 samples yt Sample
x,
x2
X,
x<
xs
X6
x,
xa
y>
2
2
2
2
2
2
2
2
y2 y> y. y>
4
4
4
4
4
4
4
4
6
6
6
6
6
6
6
6
8
8
8
8
8
8
8
8
10
10
10
10
10
10
10
10
Table 2. Synthetic principal components of each sample and its classification. Sample
Z
Sort
Class
y,
-4
1
I
yi
-2
2
y, y< y>
0
3
n m
2
4
IV
4
5
V
3. Error Discussion of PC A Classification The result of real data calculation shows that some problems exist regarding PCA classification. They are: 3.1. The Discrimination Ability of PCA is Limited In Table 1, let the variables of sample^ are y3 = (38, 2, 2, 2, 2, 2, 2, 2). The synthetic principal components of 5 samples are Z = (-2.33, -0.87, 2.33, 2.04, 3.50). The synthetic principal component Z3 of y3 is the same as Zt (Z3 = Z\ = -2.33), although y\ and _y3 are different samples (x3) = 38, xu = 2). Thus, it is concluded that the discrimination ability of PCA is limited.
166 3.2. For Those Samples with Big Variables, PCA Losses its Ability of Discrimination In Table 1, let >>, =(1000,2,2,2,2,2,2,2), herexn is large. The result synthetic principal components are Z= (-3.85, -1.62, 0.10, 1.82, 3.55). yx still belongs to class I (Z, = -3,85). PCA here is not suitable to those samples with very big variable. 3.3. When the Value of Variable Increases, on the Contrary, Class Level Decreases In Table 1, assume y3 = (100, 100, 100, 100, 2, 2, 2, 2). Comparing with yx, the values of variables of y3 increase (x3\=x32 = *33 = xi4 = 100), and the class level should increase accordingly. But the result synthetic principal components are Z = (-1.03, -0.19, -2.59, 1.49, 2.33), the level of the class it belongs to is lower than that ofy\ (Z3 = -2.59
167 3.6. While Variables Change Arbitrarily, There Are Only Two Different Classifications In Table 1, let y\ =y2=yi=y4 = (2,2,2,2,2,2,2,2) In y5, when variable *5,>0.2, the output of synthetic principal components is Z = (-1.41, -1.41, -1.41, -1.41, 5.66). When variable * 5 ,<0.2, the output of synthetic principal components is Z = (1.41, 1.41, 1.41, 1.41, -5.66). In this example, it is showed that while variables of ys changed arbitrarily, there are only two different classifications. 3.7. The Position Change of Variables Causes the Change of Classification Assume the variables of five samples are as follows: yt = (1.1, 2.2, 3.3, 4.4, 5.5, 6.6, 7.7, 8.8) ^ = (11,12,3,4,5,6,7,8) ^ = (11,12,13,4,5,6,7,8) ^4 = (H, 12, 13, 14,5,6,7,8) ^ = (11,12,13,14,15,6,7,8) The output of synthetic principal components is Z= (3.44, -0.64, -0.87, -0.94, 1.00). Of y\, if we exchange x\\ =1.1 and xn =2.2, the output of synthetic principal components becomes Z= (-3.25, -0.09, 0.57, 1.17, 1.62). The change of synthetic principal components is tremendous {Z\ = 3.44 —> -3.25), and the correspondent classification of vi changes from class V to class I. Thus we concluded that the position change of variables could cause the change of classification. 3.8. The Change of a Variable Causes the Change of the Classification Let variables of 5 samples are: j>i = ( l l , 12, 13, 14, 15,6,7,8) ^ = ^ = ^ = (1,2,3,4,5,6,7,8) ys = (\. 1,2.1, 3.1, 4.1, 5.1, 6.1, 7.1, 8.1) The output of synthetic principal components is Z= (3.28, -0.90, -0.90, -0.90, 0.59). Of.Fi, if *j5 = 15 is changed to x]s = 5, we have Z= (-1.39, -0.53, -0.53, 0.53, 2.99) (Z\ =3.28 changes to Z, = -1.39). As the result, the classification of Vi changes from class V to class II. This indicates that the change of a variable causes the change of classification.
168 4. Conclusion PCA is a simple and useful classification technique. By linear transformation with minimal information loss, PCA reduces multidimensional variables to a small set of synthetic variables while retain the maximum information of the original data set. Thus, PCA simplifies the data structure and the weights are also obtained objectively. But the real data calculation shows that the results of PCA are not always comply with the real situation, and sometimes this method may fail completely. These drawbacks are associated with the nature of PCA itself. The study of these problems will further our understanding of PCA. As to how to improve, it requires further study. References 1.
X. Li and J. A. Ye. Application of principal component analysis and Cellular Automata in policy-making of space and urban simulation. Science in China (series D), 31(8), 683-690(2001). 2. R. E. Ren and H. W. Wang. Data Analysis of Multivariate Statistic. Publishing House of National Defense Industry, 92-110 (1997). 3. C. Zhang and B. G. Yang. Basis of Quantitative Geography. Publishing House of Higher Education, 145-159(1993). 4. S. Bermejo and J. Cabestany. Oriented principal component analysis for large margin classifiers. Neural Networks, 14(10), 1447-1461 (2001). 5. X. P. Wang. The principal component analysis method of the water quality assessment in rivers. Application of Statistics and Management, 20(4), 49-52 (2001). 6. T. X. Cheng, H. G. Wu and X. H. Sun. A method of tender evaluation based on the PCA. Systems Engineering— Theory & Practice, 20(2), 118-121 (2000). 7. R. J. Bolton, D. J. Hand and A. R. Webb. Projection techniques for nonlinear principal component analysis. Statistics and Computing, 13(3), 267-276 (2003). 8. M. Bilodeau and P. Duchesne. Principal component analysis from the multivariate familial correlation matrix. Journal of Multivariate Analysis, 82(2), 457-470 (2002). 9. A. Pedro and D. Silva. Discarding variables in a principal component analysis: algorithms for all-subsets comparisons. Computational Statistics, 17(2), 251-271 (2002). 10. B. B. Li and E. B. Martin. On principal component analysis in LI August 2002. Computational Statistics & Data Analysis, 40(3), 471-474 (2002). 11. P. Giordani and H. A. L. Kiers. Principal Component Analysis of symmetric fuzzy data. Computational statistics & data analysis, 45(3), 519-548 (2004). 12. M. K. Shukla, R. Lai and M. Ebinger. Principal component analysis for predicting corn biomass and grain yields. Soil Science, 169(3), 215-224 (2004).
OPTIMIZED ALGORITHM OF DISCOVERING FUNCTIONAL DEPENDENCIES WITH DEGREES OF SATISFACTION* QIANG WEI", GUOQING CHEN School of Economics and Management,
Tsinghua University, Beijing 100084, China
In order to tolerate partial truth due to imprecise or incomplete data that may often exist in massive databases, or due to a very tiny insignificance of tuple differences in a huge volume of data, the notion of functional dependency with degree of satisfaction, denoted as (FD)d, has been proposed in [5], along with Armstrong-like properties and the concept of minimal set. This paper discusses and presents several optimization strategies and inference properties for discovering the minimal set of (FD)d and incorporates them into the corresponding algorithm so as to improve the computational efficiency.
1. Introduction Data mining is one of the important and interesting fields in computer science and computational intelligence, and is used to discover hidden, novel and potentially useful knowledge to support decisions. This paper concentrates on a particular type of association knowledge, called functional dependency, in relational databases, which are categorized as a mainstream of data models in current research and applications. For two collections X and Y of data attributes, a functional dependency (FD) X—»Y means that X values uniquely determine Y values. An example of X—»Y is (Student#, Course#)-»Grade, meaning that the value of grade can be uniquely determined by a given value of Student# and a given value of Course#. However, FDs are not explicitly known or are hidden, and therefore need to be discovered. This partly stems from the fact that several decades of IT applications had resulted in a large number of databases that were constructed and maintained in which useful and interesting FDs might have already been hidden. Since the 1990s, an increasing effort has been devoted to mining FDs [3, 5-10].
* Partly supported by the National Natural Science Foundation of China (79925001/70231010), the Tsinghua Research Center of Contemporary Management and the Bilateral Scientific and Technological Cooperation between China and Flanders. 1 Corresponding author: [email protected].
169
170 Formally, let 9?(II; I2, ..., In) be an n-ary relation scheme on domains D 1; D2 Dn with Dom(Ij) = Dj, X and Y be subsets of the attribute set I = {I1; I2,..., I n }, i.e., X, Y c I, and R be a relation of scheme 91, R c DixD2x...xDn. X functionally determines Y (or Y is functionally dependent on X), denoted by X->Y, if and only if VR e 5R, Vt, f e R, if t(X) = t'(X) then t(Y) = t'(Y), where t and t' are tuples of R, and t(X), t'(X), t(Y) and t'(Y) are values of t and t' for X and Y respectively [1-2, 4]. It is important to note that functional dependency possesses several desirable properties, including so-called Armstrong axioms that constitute a FD inference system [1,4]. In discovering functional dependencies, there still exist some open problems. First, in large existent databases, noises often pertain, such as conflicts, nulls, and errors that may result from, for instance, inaccurate data entry, transformation or updates. Apparently, by definition, FDs do not tolerate such noisy or disturbing data. Second, even without noisy data, sometimes a partial truth of a FD may still make sense. For instance, "a FD almost holds in a database" expresses a sort of partial knowledge, meaning that the FD satisfies the relational databases of concern to a large extent. Third, in developing corresponding mining methods, FD inference is desirable but still needs to be further investigated. That is, deriving a FD by inference from discovered FDs without scanning the database may help improve the computational efficiency of the mining process. For example, if both A-»B and B—>C satisfy a relational database, and if A-»C could be inferred directly, then the effort in scanning the database for checking whether A-»C holds can be saved. In 2002, Huhtala et al [3] consider using the concept of approximate dependency to deal with so-called error tuples. In the mean time, Wei and Chen [5] presented a notion of functional dependency with degree of satisfaction (FDd: (X-»Y)0) to reflect the semantic that equal Y values correspond to equal X values at a certain degree (a). Moreover, in 2002, Wei and Chen presented the Armstrong-like inference rules, along with an inference system, based on which the minimal set of (FDs)d has been proposed [5]. Furthermore, a fuzzy relation matrix-based algorithm has been analyzed to perform transitivity-type FD inference. Accordingly, the algorithm for mining (FDs)d, called MFDD, has been provided, which can discover the minimal set of (FDs)d efficiently. In this paper, we will further investigate some important properties of (FD)a, and present two strategies to optimize MFDD. The paper is organized as follows, some preliminaries will be reviewed in Section 2. Section 3 will discuss how to improve the sub-algorithm of computing the degree of satisfaction of a (FD)d. Moreover, some further inference rules will be discussed in Section 4. Accordingly, the optimized algorithm of discovering minimal set of (FDs)d will be presented in Section 5. An illustrative example will be provided in Section 6.
171 2. Preliminaries Definition 1: Let 5R(Ii, I2, ..., In) be a relation scheme on domains D b D2, ..., Dn, X , Y c I , and R be a relation of 5R(I), R c D!xD2x.. .xDn, where tuples tj, tj e R and tj * tj. Then Y is called to functionally depend on X for a tuple pair (tj, tj), denoted as (tMD(X->Y), if tj(X) = tj(X) then tj(Y) = t/Y). It can easily be seen that the FD for a tuple pair could be represented in terms of the truth value, TRUTH(ti,tj)(X->Y), where if ti(X)=tj(X) and ti(Y) * tj(Y), then TRUTH(ti, tj)(X-»Y) = 0; otherwise 1. Subsequently, FD for relation R can be defined in terms of degree of satisfaction. Definition 2: Let 9?(I) be a relation scheme, X, Y c I, and R be a relation of
J^li
, NTP
where NTP represents the number of tuple pairs in R and equals n(n-l)/2. Then given a minimum satisfaction threshold 0, 0 < 0 < 1, if TRUTHR(X—»Y) > 0, then X—>Y is called a satisfied functional dependency. For the sake of convenience, a (FD)d X->Y with TRUTHR(X-»Y) = a is denoted as (X—>Y)a. Moreover, some properties could be derived. Let R be a relation on 5R(I) a n d X , Y , Z c I, we have: Al: If Y c X, then TRUTHR(X-> Y) = 1. A2: If TRUTHR(X->Y) = a, then TRUTHR(XZ->YZ) > a, 0 < a < 1. A3: If TRUTHR(X-»Y) = a and TRUTHR(Y->Z) = p, then TRUTHR(X->Z) > a+p-1. A4: If TRUTHR(X->Y) = a, then TRUTHR(Y-»Z) > 1 - a. The first three properties are similar to the three classical Armstrong inference rules, except for A3 in that it guarantees a lower-bound TRUTH value for a transitive (FD)d that could be inferred without scanning database. Moreover, A4 is important to guarantee that invalid values less than 0 will not be generated in transitive inference. Based on the Al, A2 and A3, the Armstrong-like inference system could be defined, as well as the 0-equivalence and minimal set of (FDs)d. Accordingly, an algorithm called MFDD based on fuzzy relation matrix operation has been proposed, by which the minimal set of satisfied (FDs)d could be discovered efficiently. For details, please refer to [5],
172 However, the algorithm could be further optimized on two aspects. First, the algorithm of computing the degree of satisfaction of a certain (FD)d could be further optimized, which will be discussed in Section 3. Second, we will further investigate the four inference rules, especially including A4, and see how to use them to a greater extent for efficiency purposes. 3. Optimized Sub-Algorithm of Computing the Degree of Satisfaction As presented in [5], the sub-algorithm of computing the degree of satisfaction of a certain (FD)d is very direct and easy based on Definitions 1 and 2. For example, given A—>B, the process is to scanning all the tuples and comparing each pair of tuples. Suppose the number of tuples is n, the computational complexity of the sub-algorithm is n(n-l)/2 = 0(n 2 ). However, it could be found that, if and only if t,(A) = (j(A) and tj(B) * tj(B), then TRUTH(tUj)(A->B) = 0, else TRUTH(tii tj)(A->B) = 1. Thus, the number of tuple pairs whose TRUTH value is 1 is equal to n(n-l)/2 subtracted by the number of tuple pairs whose TRUTH value is 0. In this situation, we can focus on different groups with identical A values, since the TRUTH value of A-»B on a certain tuple pairs with different A values will be definitely 1. In brief, only the tuple pairs with identical A values and different B values are worth considering. Given A—>B, the process of computing the degree of satisfaction is as follows. First, categorize the n tuples into k groups, each of which has an identical A value, denoted as (Aj)-group, 1 < / < k. Second, in each A value group, only the tuple pairs with different B values will result in 0 TRUTH value. Then, we can categorize the tuples in an (Aj)-group into /, sub-groups, each of which has an identical B value, denoted as (Ai; Bj)-group, 1 < j < /,. Accordingly, the number of tuples in each (Ai; Bj)-group, denoted as n,y, could be counted. Third, the TRUTH value of A—>B could be computed with the following function:
TRUTHM->B) = \-Y,
Z (H,y,x„,2)/(„(«-l)/2)-
MiikMjfJiil, j\*h
The optimized sub-algorithm is listed in Table 1. The computational complexity of the algorithm contains two parts. The first part is to categorize all the n tuples into groups according to A and B values. The second part is to count the number of tuples in each groups. In part one, if all the tuples could be categorized into a A groups, and each A group could be further categorized into b B groups. Clearly, there always exists a x b < n. So the computational complexity of part one is no more than 0(a/2xb/2xn) = 0(abn/4) < 0(« 2 /4). In
173 part two, the computational complexity is 0(axb(b-l)/2) < 0(n{b-\)l2). Then the total computational complexity is 0(abn/4+ab(b-l)/2), which is less than the computational complexity 0(«(n-l)/2) of the original algorithm in most situations. In the worst situation, where a=\ and b=n, the computational complexity of optimized algorithm is 0(«/2+n(n-l)/2) which is a little higher than the original algorithm. Fortunately, however, the worst situation could be rarely seen, which represents that all the tuples in the databases have identical A values and identical B values. So generally, the optimized sub-algorithm of computing the degree of satisfaction is more efficient than the original sub-algorithm. This optimization is quite important and will improve the performance of the whole algorithm, since the computation of degree of satisfaction is the basic operation. Table 1. Optimized Sub-Algorithm Degree_Satisfaction(A-»B) N[][] = 0; // Initiate a two-dimension array to store the number in each group. SELECT COUNT(t) FROM R GROUP BY A AND B INTO N[a][b] Non_Truth_Number = 0; FORp=lTOa { FOR q = 1 TO b { FORr = q+nOb { Non_Truth_Number = Non_Truth_Number + N[p][?]xN[/)][r]; } } } Deg_FD = 1 - NonTruthNumber / ((« - 1) x n 12);
4.
(FDs)d Inference Rules
In [5], Al, A2 and A3 have been considered partially to improve the mining efficiency. In this paper, Al, A2 and A3 will be further considered, while A4 will also be utilized. Since these important properties could be further deducted and incorporated into the MFDD algoritiim to improve the efficiency. Based on Al, A2, A3 and A4, some important inference rules could be derived, denoted as Dl, D2, D3 and D4. Dl: Let R be a relation on
174 Based on the above inference rules, some optimized strategies could be inferred and incorporated into the algorithm. Let R be a relation on 31(1), X, Y c I and given any Zn(XuY) = 0 , and given a threshold 0, we have: Strategy 1: if X->Y is satisfied for R, then XZ-»Y is satisfied. Strategy 2: if X-»Y is not satisfied for R, then X->YZ is not satisfied yet. Strategy 3: if TRUTHR(X->Y) < 1 - 6, then for any Z c I, Y->Z is satisfied R. In mining the minimal set of satisfied (FDs)d, Strategies 1 and 3 could be utilized as inference strategies, while Strategy 2 could be regarded as filtering strategy. In [5], only Strategies 1 and 2 have been incorporated into MFDD algorithm. In Section 5, Strategy 3 will be further incorporated as inference strategy to optimize the MFDD algorithm. 5. Optimized MFDD Algorithm The optimized MFDD algorithm is listed in Table 2. Table 2. The Optimized MFDD Algorithm SC_FP = 0 ; IN_FP = 0 ; DIS_FP = 0 ; CA_FP = 0 ; f = {X->Y with a and flag 10 < a < 1, flag = 0, 1,2 or 3, X c I, Y = Ij, 1 < j < m} CA_F,= {f | X = Ij, Y = Ii, 1 G) {f.flag = 1 and Fp = Fp u {f}; } ELSE {f.flag = 2;} + IF (f.a < 1 - G) + { FOR ALL f e F p A N D f . X = f.Y + { f.flag = 3; }} // Inferred satisfied according Strategy 3. F p =Fp®(Fi)°'"'; //Please refer to [5]. FOR ALL f e Fp { IF (f.a > G) {f.flag = 3;} ELSE {f.degree = 0;}} } SC_FP = {f s Fp | f.flag = 1}; DIS_FP = {f € Fp | f.flag = 2};IN_FP = {f e Fp | f.flag = 3}; CA_Fp+, = Generate_Candidate(DIS_Fp); // Please refer to [5]. DIS_FP = 0 ; p ++; } M_F = u , 5llSpSC_Fk; MULTI_FD = Generate_Multi(M_F); F At = M_F u MULTI_FD;
The line marked with "*" are the optimized sub-algorithm of computing the degree of satisfaction. The three lines marked with "+" are the process incorporated with Strategy 3. The process is as follows. If an (FD)d f is found such that its TRUTH value a is no more than 1 - 9 . Then scan the current Fp, and mark all the f e F p with antecedent equal to the consequent of f with flag =
175 3, which means these (FDs)d could be inferred satisfied without scanning the database. The analysis on computational complexity of this optimization process contains two aspects. First, this process is very efficient if there exist any (FD)d, e.g., A->B, with TRUTH value no more than 1 - 0, then all the B->L., 1 < j < m, could be inferred satisfied without any database scanning. Second, whether this process could take effect highly relates to the values of attributes in databases, depending on number of the tuples with different B values and identical A values. 6. A Small Example Suppose a database as listed in Table 3. Notice there is a null value in Location. Table 3. An Example of Database ID 1 2 3 4 5
Department CS IS CS CS CS
Location
.......
#
.
Building 2 Building 2 Building 2 Building 2
Given 0 = 60%, then according to original MFDD algorithm, it could be discovered that the minimal set of satisfied (FDs)d MF = {(ID—»Deparment)i o, (Department—>Location)0.7, (Location—»Department)0.7}, all these 3 (FDs)d could be derived only by scanning database. The set of inferred satisfied (FDs)d rN_F={ID->Location}, and the set of scanned dissatisfied (FDs)d DIS_F={(Department-»ID)o.4, (Location->ID)0.4}. So totally 5 (FDs)d could be determined whether they are satisfied or not by scanning the database. With the optimized MFDD algorithm, the minimal set is die same, however, since it could be scanned that the TRUTH value of Department-»ID is 0.4, which is no more than 0.4 = 1 - 60%, which means that ID->X, X could represent any of Department or Location, will be definitely satisfied according to Theorem 3. Then ID-»Location and ID->Department could be inferred satisfied without scanning the database. So finally, only 4 (FDs)d could be determined whether they are satisfied or not by scanning the database, which could save more time than original MFDD algorithm. 7. Concluding Remarks In this paper, we have further discussed the functional dependency with degree of satisfaction, which could tolerate noisy data and express partial
176 knowledge. Moreover, in order to further improve the performance of the discovering algorithm, this paper has focused on two aspects. First, a group operation based sub-algorithm of computing the degree of satisfaction has been proposed, which can improve the efficiency of basic operation of discovering (FDs)a. Second, some inference rules based on the Al, A2, A3 and A4 properties have been analyzed, especially for A4. Accordingly, the original MFDD algorithm has been further optimized with Strategy 3, based on which some satisfied (FDs)d could be inferred by some highly dissatisfied (FDs)d without scanning the database. The example also illustrates the ideas. It is worth mentioning that in computing the degree of satisfaction of X-»Y, all the tuple pairs will be considered according to Definition 2 (including the tuple pairs with different X values), which is consistent with the notion of functional dependency in classical relational data models. References 1. Chen GQ. Fuzzy logic in data modeling: Semantics, constraints and database design. Boston, MA: Kluwer Academic Publishers; 1998. 2. Codd EF, A Relational Model for Large Shared Data Banks. Communications of the ACM 1970,13(6): 377-387. 3. Huhtala, Y.; Karkkainen, J.; Porkka, P.; & Toivonen, H., 1998. Efficient Discovery of Functional and Approximate Dependencies Using Partitions. Proc. 14th Int. Conf. on Data Engineering, IEEE Computer Society Press. 4. Ullman, Jeffrey D. Principles of Database and Knowledge-Based Systems. Maryland, Computer Sciences Press Inc., 1988. 5. Wei, Q, Chen, GQ, Efficient Discovery of Functional Dependencies with Degrees of Satisfaction, J. of Intelligent Systems, Vol. 19, 1089-1110, 2004. 6. Baudinet M, Chomicki J, Wolper P. Constraint-generating dependencies. J Comput Syst Sci 1999; 59(1):94-115. 7. Bell S, Brockhausen P. Discovery of data dependencies in relational databases. LS-8 Report 14. University of Dortmund, Germany; 1995. 8. Wyss C, Giannella C, Robertson E. FastFDs: A heuristic-driven depth-first algorithm for mining functional dependencies from relation instances. Technical Report 551, CS Department, Indiana University, July 2001. 9. Castellanos M, Saltor F. Extraction of data dependencies. Report LSI-93-2-R. Barcelona: University of Catalonia; 1993. 10. Flach PA, Savnik I. Database dependency discovery: A machine learning approach. AI Commun 1999;12(3): 139-160.
FROM A N A L O G Y REASONING TO INSTANCES B A S E D LEARNING*
PAN W U M I N G College of Computer
Science, Sichuan University, Chengdu 610065, E-mail: pan.wumingQgmail.com
P.R.China
LI T I A N R U I Department
of Mathematics,
Southwest Jiaotong University, Chengdu 610031, P. R. China Belgian Nuclear Research Centre(SCK»CEN), 2400 Mol, Belgium E-mail: [email protected]
The principle of local structure mapping in analogy reasoning is introduced and applied to the problem of learning from data. A conceptually straightforward approach is presented for the classification problem on real-valued data. An experiment on Iris data set is provided to illustrate that the principle of local structure mapping can be an effective mechanism and viewpoint for tasks of learning from data as well as analogy reasoning.
1. I n t r o d u c t i o n Most learning algorithms based on feature vectors of real-valued and discrete-valued numbers, and in all cases there has been a natural measure of distance between such vectors. For example, the fc-nearest neighbor algorithm 1 assumes all instances correspond to points in the n-dimensional space. The nearest neighbour is normally calculated using a distance measure such as the Euclidean distance. Most learning methods address problems of this sort — two input vectors sufficiently "close" lead to similar outputs. The principle that similar inputs lead to similar outputs has been widely applied in the algorithms of learning from data as well as rules-based reasoning. In this paper, we indicate that this principle can be extended to a more general principle, the local structure mapping principle, which "The work was supported by the national natural science foundation of china (60474022) and the natural science foundation of sichuan province, china (05jy029-021-2).
177
178 is manifested in analogy reasoning. The local structure mapping principle is still not applied to data analysis directly. In this paper, we attempt to design new learning algorithms in terms of this principle. The remaining of the paper is organized as follows. In section 2, we discuss the general representation of structures. We investigate how the local structure mapping principle can be used in analogy reasoning in Session 3. In Section 4, a learning algorithm is presented with reference to the local structure mapping principle intuitively, and experiments on Iris data set are provided. Section 5 concludes the research work of this paper. 2. The Representation of Structures A structure is defined as S = {S, 61,62, • • •) , where S is the underlying set and 6i,62,--- are operators or relations which satisfy some properties or constraints. A local structure SL = (S1,6[,6'2,---) of S is a structure such that S' C S, and 6[, 5'2, • • • are the operators or relations restricted on 5". A structure mapping M : Si ——> S2 is a function between structures which hold the rules of the operators or relations of these structures, such as homomorphisms of algebras. For example, lattice is a structure. Let L be a lattice, then L has a underlying set L together with two binary operations V (join) and A (meet), and the operations of L satisfy commutative laws and other three kinds of laws. 3. Structures mapping in Analogy Reasoning Analogy is a process of finding and using correspondences between concepts, and plays a fundamental and ubiquitous role in human cognition2,3'7. Analogy is also used as a machine learning and automated deduction method in AI and other application domain4,5. Analogy facilitates better understanding of old knowledge and the formation and inference of new knowledge. Understanding analogy requires several processes include retrieval, mapping and application6. Retrieval is a process to find the sources adapted for the target. Mapping is to find the consistence between source problem and target problem and to map concepts from base to target domain. Application is to apply correspondence of representation elements from base to target domain to describe and explain new topics. As a reasoning method, analogy contains a deep logic problem3. However, formal rules haven't been found so far for analogy reasoning. Owen suggested that analogy mapping (called analogy matching) can be expressed as a set of positional associations between symbols in logical terms 4 , i.e. consist of elements of form as
179 follows, ((symboli,positioni),
(symbol,positiori2))
(1)
where symboli has positioni in statementi and symbol has positioni in statement?,. A position in (1) is represented as a sequence of successive argument position of a statement. Local structure mapping here is the mapping of sequence structures. There are still other ways of local structure mappings. If we think that the analogy is between two logic sentences, then they may be associated with algebra structures. And if we think the statements have a syntax structure, then these sentences may be parsed to tree structures, then there may be mappings between tree structure. Chatterjee and Campbell proposed an reasoning method called knowledge interpolation with setting number on knowledge to build order structure on knowledge 8 . Knowledge Interpolation is a kind of local structure mappings of order structures. 4. Classification on Real-valued D a t a by Local Structure Mapping In analogy reasoning, local structure mapping uses the structure information to predict new situations according to reference knowledge. Suppose a classification problem involves non-numeric data. For instance, descriptions are discrete and without any natural notion of measures. A learning algorithm used for this kind of data is decision trees 1 . Even they may have structure information, decision trees algorithms do not use it. However, if we can build structure on these data, the local structure mapping may be a more appropriate approach to learn from these kinds of data. We consider the standard classification problem, where Xi,i — 1,2, ••• ,n are real valued properties and C is the class variable having m possible classes Cj, j = 1 , 2 , . . . , m, where Cj is a nominal data. We will build structure on n-dimension real-valued training data set. Then the set of all labels can be viewed as a trivial discrete structure which only contains a unary relation just contains all labels. When a new query instance is given, we can add this instance to training data and re-build the structure. To predict the most appropriate label of the query instance, we map the local structure of the query instance to the discrete structure of the label set. Because the elements of the label set have no relations to each other, every label is naturally a single point structure. If we have chosen the local structure around the query instance d, we can map it to a label which represent a one point structure. For there are m labels, we must
map it m-times. Now we must determine which label is most appropriate to the query instance. For each label Cj, we count the number of instance correctly mapped to label Cj, we can use A (Cj, d) to denote this number. So evidently if some Ck have the biggest A (Ck, d), Ck may be most appropriate label to the query instance d. A example is show in Fig.l, where most appropriate label to d is C\. If there are many kinds of local structures, we must consider the effect of the mapping of all local structures. The ultimate process of classification is in the following section. d^d^.d^
are labeled C\; d5 is labeled C2;dt is labeled C3.
a) local structure around a? mapping to Cj
b)
local structure around d mapping to C2
c) local structure around a? mapping to C3
Figure 1. An example of local structure around query instance mapping to all three class labels.
4 . 1 . Selecting
the Structure
for
Training
Data
For each Xi, we consider the order structure P j = (Xi, ^ ) , where ^ is the order relation on domain of Xi, the set of real numbers. The product of two order structures P = (P, ^ ) and P ' = (P', ^ ' ) is also an order structure P x P ' = ( P x P',^PxP,), where ( P x P') is the Cartesian product of P and P', and (2:1,2:2) ^ P x P ' (i/i,2/2) iff £1^2/1 and x2 ^'2/2- The order structure is dual. Let P * = (Xit ^*) be the dual order structure of P j = (Xi, ^ ) , then for any a, b G Xi, a^b iff b ^ * a. Therefore, let P i and P2 be the order structures on Xi and X2, there are four order structures, P i XP2, P i x P 2 , P i x P j and P * x P ^ , on Xx x X2- Similarly, there are 2" order structures o n X j x l 2 x ••• x Xn: Mj = P f x P f x • • • x P£" = (Xi x X2 x ... x Xn,
^j)
(2)
where j = 1, 2, • • • , 2 n , and • _ U ~ 1) mod 2i - (j - 1) mod 2 ~ 2( i ~ 1 )
Ji
(3)
181 and for each i e {1,2, • • • , n } , P ? = P i = (Xu » , P j = P J = (Xu >*). If we also write ^ as ^ ° and ^ * as ^ ' , then (xi,x2, • • • , x„) > j (2/1,2/2, • • • , j/ n ) iff Xi^jiyi , i = 1,2, ••• , n . Let the collection D of data cases D\, D2, • •., Di be the training data sets. The data item D^ has the form (k, xf, a;*,...,x*, C fc ), where fc is the key label of the instance for identifying each sample instance uniquely, Ck € { C i , C 2 , . • • , C m } and xk belongs to the domain of the predictive variable Xi. We also use dk to denote (k,xk,xk,.. •, x£) and vk = ( x ^ x * , . . . , x * ) , k k k k then D/i; is also written as (d , C ) or (k, v ,C ). We call D x the instances set of D . However, the partial order structures M j can not be reduced to the corresponding structures on the set D x directly. Suppose we can define relation P'j on the set D x = {d 1 ,d 2 , • • • , d ' } such that for any ds,dr £ D x , ds >j d r if and only if vs >j vr.
(4)
There may be duplicate vectors in D x , but they are associated with different instances. If fci 7^ k^ and x / = xt2,i = 1,2, • • • , n, then we have dkl *=j-d*2 and d*2£=$ d A l , but dkl and d*2 are not the same elements in D x This breaks the antisymmetry of partial order. When we classify a new query instance, the duplicate vectors of it in D x must be considered because they are the nearest samples of it. Hence we want to built some structures on D x such that the nature of structures M j are still exist and the duplicate vectors are also have appropriate connections in these structures. A finite partial order structure P = (P, >) can be represented by a Hasse diagram in which two elements a\ and 1x2 are connected if and only if one is a cover of another. Definition 4 . 1 . Let HJ 3 * = ( D x , ^ , ) be a structure that we call the Hasse structure of the instances set D x such that for any ds,dr e D x , dsPjdr if and only if vs is a cover of vr according to partial order relation ^j or vs = vr. A finite partial order set is equivalent to it's corresponding Hasse diagram, so that the Hasse structure of a data set is essentially a partial order structure except that any element p and all duplicate elements of p are apart but "equal" t o each other. 4.2. Determining the Local Structures Related to Training Data
for an
Instance
In a finite partial order structure, the local structure of an element p of it is the structure on the set just containing all it's adjacent elements in
182 Hasse diagram and p itself. One can extend the local structure to include the local structures of all it's original elements. For a Hasse structure the local structure of an element p may intuitively be the set just containing all it's adjacent elements and p itself. When a new query instance is input, it's local structures is retrieved from training data and will be used to classify the new query instance. Suppose the query instance is d = (I + 1,x\,X2, •. •,xn) — (I + 1, v) where I + 1 is given as the key label, the local structure of d in Hasse structure H£>xu{d} =
^Dx
y
^
^^
is d e f i n e d
ag
Loc, (DJC, d) = (loCj ( D x , d), Pj) , where
IOCJ
(Dx,d)
s
s
(5)
s
= {d \dPjd or d'Pjd, d &DXU {d} } .
Proposition 4.1. IOCJ (Dx,(i) = locin_j ( D x , d ) . 4.3. Approximating New Instances' on Their Local Structures
Target Values
Based
There are 2" local structures of the query instance d in D x U {d}. Definition 4.2. Let D be the training data set, C;, i = 1,2, • • • , m, be the class labels, and Xj (Cj, d) be the number of the instances in IOCJ (Dx, d) — {d} whose class label is Q , the support of assigning the query instance d to class d in the partial order structure M j , j = 1, 2, • • • , 2™, is defined as 3
"
#_of-totalJnstances_inJoCj(Dx,d) — 1
Definition 4.3. For class labels Ct, i = 1,2,•••,m, let X(d,d)
=
2"
J2 Xj(Ci,d), the support of assigning the query instance d to class C,, according to data set D, is defined as support (Citd) = -n ^ ^ . J2 (#.of.totaLinstancesJnJocj (Dx,d) — 1)
(7)
Corollary 4.1. For j = 1,2, • • • , 2 n , we have support j (Ci, d) = support 2 n_j (C*, d)
(8)
Proposition 4.2. For j — 1,2, • • • ,2™, we have m
m
YJ support^ (Ci,d) = I, y ^ support (Cj, d) = 1
(9)
The conceptually straightforward approach to predict the class label of d is to assign the query instance d the class label Cd such that support (Cd,d) is the maximum of all support (d,d), i — 1,2, • • • , m. We use LSM(D,d) to denote this predicted class label of d, then LSM(D,d) is obtained from the following equation LSM(D,d)=
argmax C i 6{C 1 ,C 2
4.4. Experiment
support {Cud)
(10)
Cm}
Study
We consider the classification problem of Iris Plants Database. We use 50 instances as prototype samples, in which there are 16 instances labelled Iris Setosa, 17 instances labelled Iris Versicolour and 17 instances labelled Iris Virginica, and 50 instances (15 instances labelled Iris Setosa, 17 instances labelled Iris Versicolour and 18 instance labelled Iris Virginica) for experiment. There are only two classification errors( 96% recognition rate). A fraction of the experiment results containing 2 classification errors are shown on Table 1. From Table 1, for each testing instance the class label with highest support is prominently higher than the supports of other two classes. The last two columns are two misclassified instances. For one the support of correct class label Versicolor (C2) is 0.36 and the highest support is that of Virginica (C3) -0.40, both are higher than the support of Setosa (Ci) -0.24. For another the support of correct class label Versicolor (C2) is 0.38 and the highest support is that of Virginica (C3) -0.43, both are higher than the support of Setosa (Ci) -0.19. For these two misclassified instances the support of the correct class label is very close to the highest support, and prominently higher than the support of another class label. These results illustrate that the presented classification method is robust. Table 1.
A fraction of the testing results on Iris Data, the testing results of 10 instances
d)
0
0
0
0.14
0.11
0.14
0.19
0
0.24
support(C2, d)
0
0.47
0.70
0.60
0.69
0.08
0.12
0.20
0.36
0.38
support(Ci,d)
1.00
0.53
0.30
0.26
0.20
0.78
0.69
0.80
0.40
0.43
Testing result
c3 c3
c3 c3
Ci
Ci
Ci
Cz
Ci
C3
C3
d
Ci
Ci
Ci
Cz
C3
d
Ci
Ci
support(Ci,
Correct class
0.19
5. Conclusion In this paper, we introduced the local structure mapping principle in analogy reasoning, then we applied the principle to instance-based learning problem, and proposed the basic learning algorithm that can be an alternative algorithm for multi-instances based learning tasks. The local structure mapping principle also can applied to data mining tasks other than classification. We haven't discussed the computational complexity of computing LSM(D,d) by (10) yet. One may think that computational complexity is very high for computing LSM(D,d). For prototype feature vectors of n-dimensions, we must consider 2" local structures of the query instance d, hence the computational complexity seems to grow exponentially with the dimensionality of the feature space. But the exponential complexity is only the guise of the problem, we can reduce the computational complexity by building of some data structure on the sample data before classifying new instances. This will be presented in our future papers. And the further study of local structure mapping principle used in knowledge discovery from data will be our future work.
References 1. Richard O. Duda, Petor E. Hart and David G. Stork: Pattern Classification. John Wiley & Sons (2001) 2. Holyoak, K. J., & Thagard, P. R.: Mental leaps: analogy in creative thought. Cambridge, MA: MIT Press (1995) 3. Salvucci D. Anderson J.: Integrating analogical mapping and general problem solving: the path-mapping theory. Cognitive Science 25 (2001) 67-110 4. Stephen Owen: Analogy for Automated Reasoning. Academic Press (1990) 5. Charles Dierbach and Daniel L. Chester: Abstraction Concept mapping: a foundation model for analogical reasoning. Computational Intelligence 13 (1997) 33-81 6. Arthur B. Markman: Constraints on analogical Inference. Cognitive Science 21 (1997) 373-418 7. Michael A. Arbib, Eds, The handbook of brain theory and neural networks. MIT Press (2003) 8. Nilardri C , Campbell J.: Knowledge Interpolation: A simple approach to rapid symbolic reasoning, Computers and Artificial Intelligence 17 (1998) 517551 9. Jocob E., and Joseph O. Eds.: Handbook of Discrete and Computational Geometry. CRC Press LLC (1997)
A KIND OF WEAK RATIO RULES FOR FORECASTING UPPER BOUND* QING WEI, BAOQING JIANG 1 , KUN WU Institute of Data and Knowledge Engineering, Henan University Kaifeng, Henan, 475001,China weiqing@henu. edu. en WEI WANG School of Electrical Eng., Southwest Jiaotong University Chengdu 610031, Sichuan, China This paper deals with the problem of a kind of weak ratio rules for forecasting upper bound, namely upper bound weak ratio rules. Upper bound weak ratio rules is parallel to Jiang's weak ratio rules and has such a reasoning meaning, that if the spending by a customer on bread is 2, then that on butter is at most 3. By discussing the mathematical model of upper bound weak ratio rules problem, we come to the conclusion that upper bound weak ratio rules are also a generalization of Boolean association rules and that every upper bound weak ratio rule is supported by a Boolean association rule. We propose an algorithm for mining an important subset of upper bound weak ratio rules and construct an upper bound weak ratio rule uncertainty reasoning method. Finally an example is given to show how to apply upper bound weak ratio rules to reconstructing lost data, to forecasting and to detecting outliers.
1. Introduction As proposed in paper [l], Weak Ratio Rules can be used to reconstruct lost data. For example, Suppose Butter: Bread 3= 5: 4 (1) be a weak ratio rule that we obtained, that is to say, if the spending on bread is $4, then according to the reasoning meaning of weak ratio rules, the lost data— the spending on butter— is $5 at least. In fact, the WRR method in paper [l] is a way to forecast the lower bound of lost values, so the Weak Ratio Rule'11 is called Lower Bound Weak Ratio Rule (LBWRR for short), and * This work is supported by the National Natural Science Foundation of China (60474022) and the Natural Science Foundation of Henan Province (G2002026, 200510475028). Corresponding author, email: [email protected].
f
185
correspondingly, the WRR method' ] is called LBWRR method here. While in this paper, an Upper Bound Weak Ratio Rule (UBWRR for short) is dealt with, such as Butter: Bread^7:4, (2) which means if the spending on bread is $4, then that on butter is at most $7. In this way, an UBWRR method is proposed. By LBWRR method, we can forecast the lower bound of the lost value; on the contrary, by UBWRR method we can forecast the upper bound of the lost value. Thus, by averaging lower bound and upper bound of the lost value, an average value can be obtained. Therefore, the accuracy of forecasting the lost data will be improved. This paper consists of four main sections: problem statement, mining algorithm, uncertainty reasoning and application. 2. Problem Statement Let D denote a PQTD[1] (pure quantitative transactional database). Let / denote the set of all items but T the set of all transactions. D(t, f)( also £>,('))> a t t n e rth row and the ;th column of A refers to the amount spent by a customer on item i in transaction t. Let A be a nonnegative real-valued function on /. We use supp^ to denote the set {x|*e /, A(x) > 0}, which is called the support set of A. If suppA = {x\, X2,..., xp), then we express/i as A(xi)/xi +A(x2)/x2 +...+A(xp)/xp or AQa) | A(x2) | XI
X2
|
A(») Xp
Let A,B be nonnegative real-valued functions on / and supp^4 O suppS = 0 ; We say that a transaction t supports A if Z),(/) > 0 for any / e supp/1. Let support_count(A) denote the number of transactions supporting A. We say a transaction / supports (A =>B)ift supports A,B and there exists a e (0, + oo), such a D,(i) ^ A{i) for any /esupp/i and B(f) ^ a D,{j) for anyy esupp5. Let support_count(A => B) denote the number of transactions supporting (A => B). Given minimum support threshold ms and minimum confidence threshold mc, if support_count(A => B) ;> ^g
m
"
support_count(A=> B) > mc support_co unt(A) then {A => B; ms; mc),simply (A => B) or A => B, which is called an Upper Bound Weak Ratio Rule (UBWRR for short). If the values in D are integer 0 or 1, A(t)=\ for any ie suppA and B(j)=\ for any j G supp5, then (A => B) is a upper bound weak ratio rule if and only if (supp^ => supp5) is a Boolean association rule(BAR for short in this paper).
So, we come to the conclusion that UBWRR problem is a generalization of BAR problem. We can prove that if (A => B; ms; mc) is a UBWRR then (supp^ => suppB; ms; mc) is a BAR. We can say that (supp/f => suppi?; ms; mc) is the support rule of (A => B; ms; mc), or (A =>£; ms; mc)\s supported by (supp/1 => suppS; ms; mc). If (A :=> B; ms; mc) is an UBWRR, we can get the following proposition "If the spending on item i by a customer is A(i) for any i G suppA then there is mc possibility that the spending by the customer on item j is at most B(J) for any jesuppB" which is called the reasoning meaning of (A =>B; ms; mc). 3. Mining Algorithm If a PQTD D, a minimum support threshold ms, a minimum confidence threshold mc and a Boolean Associate Rule(BAR) (X=$> Y ) is Given, then the set, Ruw (£>; ms; mcj(=> Y ) , of all UBWRRs supported by (X => Y) is determined. The following Algorithm3.1 give a method of finding Qm(D; ms; mc; X=> Y). An element in QUW(D; ms; mc; X=> Y) is called quasi-minimal UBWRR supported by (X=> Y). Without special statement, concepts in this paper will follow that of paper[i]. Algorithm 3.1:UBWRR, finding all quasi-minimal UBWRR Input: PQTD D, minimum support threshold ms, minimum confidence threshold mc andaBAR(X=>Y)X= {*,,x2, ••• ^P} Y={yx^2, ••• Jq}Output: QUW(D; ms; mcj(=> Y), the set of all quasi- minimal UBWRRs supported by (X=> Y). Method: (1) 17]:= record count in D; (2) Scan all the transactions t ofD, if there exists element x,of X cause D,(;CJ) = 0, then delete transaction t. support_count{X) := record count of left transactions; (3) SCLB:= max(ms * \T], mc * support_count{X)); (4) Scan all the transactions t, if there exists element^ of Y cause Dfyj) = 0, then delete transaction t; (5) For any x,GZ(i=l,2,...,p), sort the set Vx, :={D,(xJ\t supports^ U 7} a s { ^ ° , al°,..., a j } }, cause fl<°< o, (0 <...< a^,i=l,2,...p; (6) For any y^eY (/'= 1,2,... ,q), sort the set Vyt := {D,(yj)\t supports I U K ) a s {b(J\ W\..., b[J) }, cause b(0J)> & 0) >...> b[j) 7=1,2,...,q, m :=/> + q. (7) For any transaction t, calculate the matrix M(D,(Y)/D,{X)) :
188 Dt(y,)/Dt(x:)Dt(y2)/Dt(x,)
••• Di(yq)/D,(x,)
Dt(y,)/Dt(x2)Dt(yi)/Dt(xi)
••• Dt(yq)/Dt(x2)
KDt(yi)/Dt(xP)Dt(y2)/Dt(xP)
••• Dt(yq)/Dt(xp)^
(8) posMUB-0 ; (9) getmax(< >); (10) Build Quw(£>; ms; mc-J(=> Y) by posMUB. The following procedure is to find the maximal elements of a lower segment set. The procedure adopts deep-first algorithm, whose speed is quicker than that of breadth-first algorithm. Theoretically, for the UBWRR model, the minimal elements of an upper segment set should be got. However, by keeping the procedure's frame fixed for the portability of the program, but just overturning the data in the position lattice in Algorithm 3.1, we can prove that the effect is the same as that of rewrite the procedure. procedure getmax(h : a vector of natural numbers) (I) U ~{t\t e.posMUB; and the d(h)-prefix, oft, is greater than h}; ( 2 ) i f t / * 0 then (3) b0 := the biggest of (d(h) + l)th component of vectors in U (4) else b0 ~ 0; (5)*:=A„+1; (6) while b^ndm+l and IsUBWRRPos(
B(yO/A(x,)B(y2)/A(xi)
•••
B(yq)/A(x,f
B(y,)/A(x2)B(yi)/A(x2)
•••
B(yq)/A(X2)
•••
B(yq)/A(xP)J
yB(yi)/A(xP)B(y2)/A(xP)
(3) support_count(A => B) := 0; (4) Scan all the transactions t, if the values in matrix (M(B(Y)/A(X)M(D,(Y)/Dt(X)) axe all nonnegative, then add 1 to support_count(A => B); (5) if support_count(A-=$ B) 5s SCLB then (6) return TRUE (7) else return FALSE; Example 3.1: Consider the PQTD D in Table 1. Table 1. The PQTD in example 3.1 XI 10 1 10 5 2
T, T2 T3 T4 T5
X2 20 2 20 10 4
Yl 30 3 40 15 4
Let ms =1/3; mc = 2/3, X = {xhx2}; Y = 0>,}; then (X=>Y) is a BAR. We use Algorithm 3.1 to mine QUW(A' ms; mc;X=>Y), and the results can be got as follows: posMUB={<3,3,\>,<2,2,2>,<0,0,4>}, Qm(D; ms; mc;X=> Y) = {r,: (10/il+20/i2+30/i3), r2:(5/il+10/i2+15/i3), r3: (l/il+2/i2+3/i3)}. However, if we use Algorithm 3.1 in paper[i], the results are posM= {<3,3,0>,<2,2,1 >,< 1,1,2>,<0,0,3>}, Mqq(D; ms; mc;X^> Y) = {r,: (l/il+2/i2+3/i3), r2: (2/il+4/i2+4/i3), r3: (5/il+10/i2+15/i3), r4: (10/il+20/i2+30/i3)} 4. Uncertainty reasoning Using the reasoning meaning of UBWRR, we can reach UBWRR uncertainty reasoning method: rule 1: A, => B, rule n: A„ => B„ fact: P conclusion: Qu In this model, Ah Bh P, Qu are nonnegative real-valued functions on /, and suppP = suppAk = X, supp£* = Y, for any k =1,2...,«; Vy el,
QJJV
A > 0,
J y
*
jiY
190 The UBWRR uncertainty reasoning method has an intuitive meaning: If facts are Given as follows: UBWRRs (Ak=>Bk; ms; mc); k =l,2,...,n, and the fact that " spending on item /' by a customer is P(f) for any i e X\ then there is mc probability that the spending on item j by the customer is at most A«V^O-))forany;eY5. Application The LBWRR111 can be applied to data cleaning, data forecasting and outlier detecting. The reconstructed data underlined in Table 2 are shown in Table 3[1], Table 2. Original data
Rec# 1 2 3 4 5 6 7 8 9 10 11
11 149.3 161.2 171.5 175.5 180.8 190.7 202.1 212.4 226.1 231.9 239.0
12 4.2 4.1 3.1 3.1 1.1 2.2 2.1 5.6 5.0 5.1 0.7
13 108.1 114.8 123.2 126.9 132.1 137.7 146.0 154.1 162.3 164.3 167.6
14 15.9 16.4 19.0 19.1 18.8 20.4 22.7 26.5 28.1 27.6 26.3
Table 3. Reconstructed data by LBWRR Rec# 1 5 7 9 11
11 143.51 180.8 202.1 226.1 239.0
12 4.2 3.08 2.1 5.0 0.7
13 108.1 132.1 146.0 162.3 167.77
14 15.9 18.8 12.87 24.19 26.3
which showed that the reconstructed data by LBWRR are much closer to the original data than other methods. To ensure clear understanding of the new approach, the approach in section V of paper[l] is called LBWRR method here. Parallel to LBWRR method, our new approach for reconstructing lost data — the UBWRR method is as follows: 1. For any record t (corresponding a transaction) with the known data (lossless data) and the lost data, let A" be the set, in which all items corresponding with known data on row / are not UB outliers, and let Y be the set, in which all items corresponding with data on row t are lost. 2. Let ms, mc be 0.5 or smaller than 0.5 to ensure that (X=> Y ) is a BAR. 3. Mining QuW(£>; ms; mc-JC=> Y) by Algorithm 3.1.
191 4. Let the quasi-minimal UBWRRs mined in step 3 be the rules in the UBWRR uncertainty reasoning method in Section IV, and let 2_, x*x (D^x)/x) be the fact P. By applying the UBWRR uncertainty reasoning method, a conclusion Qu can be reached. 5. Let the values in conclusion Qu be the reconstructed values of/ on items of Y. According to the LBWRR method above, we can reconstruct the underlined data in Table 2. The results are shown in Table 4. By integrating UBWRR method with LBWRR method, a WRR method is proposed as follows: 1. Let Lva/=the reconstructed values by LBWRR method; 2. Let t/va/=the reconstructed values by UBWRR method; 3. Let Aval=(Lval+Uval)/2. Table 4. Reconstructed data by UBWRR Rec# I 5 7 9 II
II 161.48 180.8 202.1 226.1 239.0
12 4.2 3.068 2.1 5.0 0.7
13 108.1 132.1 146.0 162.79 172.08
14 15.9 18.8 227 24.87 26.3
By the WRR method above, the new values can be got, which are much closer to the original. The final results are shown as in Table 5. Table 5. Final results by WRR method ~Rec# I 5 7 9 II
II 152.50 180.8 202.1 226.1 239.0
12 4.2 107 2.1 5.0 0.7
13 108.1 132.1 146.0 162.54 169.92
14 15.9 18.8 17.78 24.53 26.3
Wang' ] discussed the root-mean-square error of ratio rule method and that of column average method. The RMS is defined as
V
'=' v'=i
Where dij is the reconstructed value, and dy is the lost original value. By the WRR method above, the RMS= 1.15. Results show that the WRR method is more exact than Jiang's LBWRRm(RMS=1.84),Flip Korn's ratio rule131 (RMS=2.04) and column average method (RMS=8.77).
192 6. Conclusion Parallel to the LBWRR[1I,we have proposed a new association relation (called upper bound weak ratio rules, UBWRR for short). Contrary to LBWRR's lower-bound-guess function, the UBWRR can guess the upper bound in pure quantitative transactional database. The UBWRR problem is also a generalization of BAR problem. Through discussing the mathematical model of UBWRR, we come to the conclusion that every UBWRR can induce a BAR as its support rule. We present an algorithm for mining an important subset of all UBWRRs supported by a given BAR. By the reasoning meaning of UBWRR, we propose an UBWRR uncertainty reasoning method. The UBWRR can be applied to reconstructing lost data, to forecasting and to detecting outlier. The WRR method is based on the average of reconstructed values' lower bound by LBWRR and upper bound by UBWRR, the final reconstructed value by WRR method is limited in a smaller range, which is much more exact than the simplex LBWRR method. Experiments demonstrate that the reconstructed data by the WRR method are much closer to the original data than that by Jiang's LBWRR method'11, and by Flip Korn's ratio rule' 3 ', or by column average method. References 1.
2. 3.
4. 5. 6. 7.
Baoqing Jiang, Yang Xu, Qing Wei and Kun Wu. Weak Ratio Rules between Nonnegative Real-valued Data in Transactional Database. The IEEE International Conference on Granular Computing, Beijing, Chinajuly 25-27, 2005.pp.488-491. S. Guillaume, A. Khenchaf and H. Briand, Generalizing Association Rules to Ordinal Rules, In The Conference on Information Quality (IQ2000), (MIT, Boston, MA, 2000), 268-282. Flip Korn, Alexandras Labrinidis, Yannis Kotidis, and Christos Faloutsos, Ratio Rules: A New Paradigm for Fast, Quantifiable Data Mining, In Proc. of the 24th International Conference on Very Large Data Bases (VLDB), pages 582-593, New York, USA, August 1998. A. Marcus, J.I. Maletic and K. Lin, Ordinal Association Rules for Error Identification in Data Sets, CIKM, pages 589-591, 2001. R. Srikant and R. Agrawal, Mining quantitative association rules in large relational tables, In Proc. 1996 ACM-SIGMOD Int. Conf. Management of Data (SIG-MOD'96), pages 1-12. Montreal, Cannada, June 1996. Qingyi Wang, Donghui Shi, Bing He and Qingsh-eng Cai, Research on Linear Association Rules, Mini-Micro System, Vol. 22 No. 11 Nov. 2001. (Chinese) Y. Xu, D. Ruan, K. Qin and J. Liu, Lattice-Valued Logic, Springer- Verlag, 2003.
C O M B I N I N G VALIDITY I N D E X E S A N D MULTI-OBJECTIVE OPTIMIZATION B A S E D CLUSTERING
TANSEL OZYER1'3 AND REDA ALHAJJ1-2 Dept.
of Computer
Science, University of Calgary, Calgary, Alberta, Canada { ozyer, alhajj} @cpsc. ucalgary. ca Dept. of Computer Science, Global University, Beirut, Lebanon TOBB Economics and Technology University, Dept. of Computer Eng, Ankara, Turkey
In this study, we present a clustering approach that automatically determines the number of clusters before starting the actual clustering process. This is achieved by first running a multi-objective genetic algorithm on a sample of the given dataset to find the set of alternative solutions for a given range. Then, we apply cluster validity indexes to find the most appropriate number of clusters. Finally, we run CURE to do the actual clustering by feeding the determined number of clusters as input. The reported test results demonstrate the applicability and effectiveness of the proposed approach.
1. Introduction Clustering is the process of classifying a given set of objects into groups by taking into account two main criteria: objects in each group should be homogeneous and the different groups should be separate. To do this, it is necessary first to decide on the characteristics or attributes based on which to cluster the objects because the same set of objects may produce different clustering based on different combinations of the attributes. This is an application oriented decision. For instance, people may be classified based on any combination of age, sex, nationality, etc. It is necessary to point out that clustering is different from classification. While the latter is a supervised process, the former is unsupervised. In classification, it is necessary first to specify the classes and label the data to train the system to be able to classify new coming data objects. Clustering has several practical applications in biology, finance, webuser analysis, etc. Thus, several research groups developed different clustering algorithms, such as k-means 6 , PAM 5 , CURE 3 and ROCK 4 . However, most of the existing clustering approaches require the number of clusters to 193
194 be specified before the clustering process can start. Others are application oriented and find the clustering based on density analysis without having the number of clusters pre-specified. Identifying knowing the number of clusters as a major requirement for successful clustering and realizing that it is not realistic to have it known apriori in general, we started a project that utilizes multi-objective optimization to achieve better clustering. The motivation is that clustering is not a single objective process. Rather, the intention of clustering is to find homogeneous instance groups and separate the clusters as much as possible to clarify the distinction between them 2 . We tested different combinations of parameters and algorithms to be combined into the multi-objective optimization process. The process by itself delivers multiple solutions and hence it is necessary to rank them and decide on the most appropriate solution for the analyzed dataset. For this purpose, we utilize some of the major indexes already developed for cluster validity analysis. We have successfully tested different forms of the objectives and achieved promising results on different datasets 8 » 7 ' 11 ' 10 . Further, we realized, by testing on datasets from different domains, that not every index gives good result for every dataset. So, we discovered that the more indexes we use the better results we obtain because we decide on the best solution as the one favored by the majority of the indexes. The work described in this paper integrates multi-objectivity and CURE clustering. We use multi-objective optimization to find the alternative clustering, validity analysis to find the most appropriate number of clusters and CURE to do the final clustering based on the determined number of clusters. As compared to the previous stages of this project, in the first trial described in 8>7'11, we used the number of clusters and homogeneity as the two main objectives with the whole dataset as input. In the second stage described in 10 , we homogeneity and separateness as the objectives. We also used iterative approach to find the most natural clustering for each particular number of clusters within a prespecified range. It gives better results than the first approach, but suffers scalability problem. So, in the work described in this paper, we use three objective functions, namely the number of clusters, separateness and homogeneity. Also, we use a sample representative of the dataset and CURE for the final clustering. The conducted experiments demonstrate the applicability and effectiveness of the proposed approach. The rest of the paper is as follows. The proposed system is described in Section 2. Test results are reported in Section 3. Section 4 is conclusions.
195 2. The Clustering Process The clustering process applied in this study consists of three main phases. The first phase applies multi-objective genetic algorithm to find the alternative clustering results; the second phase applies different validity indexes to find the most appropriate number of clusters; and the third phase applies CURE to obtain the final clustering.
2.1. Finding Alternative
Solutions
During the first phase, we first generate a sample of the dataset. This is done in a way to speedup the multi-objective genetic algorithm (GA) process, which is known to be slow. Considering the sample a good representative of the original dataset, we run the GA with three main objectives, minimize the number of clusters, maximize the homogeneity within each cluster and maximize the heterogeneity between the clusters. We tested two alternatives for the heterogeneity or separateness, namely average linkage and average to centroid linkage. For homogeneity we used the total within cluster variation. For the separateness we used the following inter-cluster separability formulas, where C and D are candidate clusters. Average Linkage : d(C, D) =
Average t o Centroid:
Yl
|(?|
-=-7
d x
i1)
( ^v)
^2 d{x, vD) + ^
d(y, vc)
y€D
xSC
(2) For the homogeneity we used the following intra-cluster distance formula:
TWCV = J2 J2 Xld- j ^ Y E n=ld=l
fc=l
k
SF
™
(3)
d=l
where Xi, X2,.. ,XN are the N objects, Xnd denotes feature d of pattern Xn (n= 1 to N). SFkd is the sum of the d—th features of all the patterns in cluster fc(Gfc) and Z& denotes the number of patterns in cluster fc(Gfc)and SFkd is:
SFkd = y2^
Xnd
(d = 1,2, ...£>).
(4)
196 T h e GA process requires specifying a set of parameters, including the coding scheme for t h e individuals, the number of individuals in the population, t h e fitness, cross-over, mutation, and termination criteria. T h e termination criteria may be specified as a threshold on t h e progress achieved between different populations obtained during consecutive runs of t h e algorithm or as a maximum number of iterations t o be reached in case the first condition fails. Each individual in t h e population is represented by a chromosome of length n, where n is the number of d a t a points in t h e sample to be analyzed. Every gene is represented by an allele, where allele i is t h e corresponding cluster of instance i. In other words, each allele in t h e chromosome takes a value from the set {1, 2, . . . , K}, where K is t h e maximum number of clusters for which we t r y t o get an optimal partitioning. In this study, we choose K to be y/n. However, in case the validity analysis step favors y/n, then K is incremented by 5 in a repetitive process with the G A and validity analysis reapplied until the favored number of clusters is smaller t h a n t h e current value of K. T h e employed GA process works as follows. T h e current generation is assigned t o zero and a population with the specified number of chromosomes P is created. This is done by using the ordered initialization as follows. In round order, each allele took values 1 t o K in order, and then those allele value assignments are shuffled within t h e chromosome randomly by processing t h e random pairwise swap operation inside the chromosome. This way, we can avoid generating illegal strings, i.e., we avoid having some clusters without any pattern in t h e string. One-point crossover operator is applied on randomly chosen two chromosomes t o generate new chromosomes. Crossover is carried out with probability pc. To decide on candidate chromosomes t h a t will survive to t h e next generation, t h e selection process considers t h e optimization of the three objectives total within-cluster variation fitness value, separateness and number of clusters. Only the best P chromosomes are kept in t h e population for the next iteration. T h e aim of mutation is t o introduce new genetic material in an existing chromosome. T h e mutation operator replaces each gene value an by an' with respect to the probability distribution; for n = 1 , . . . , N. an' is a cluster number randomly selected from {1, . . . , K} with probability
197 distribution {pi,P2,- • • ,PK} defined as: p-d(X„,c)
(5)
ft = -T J2 e-d(Xn,c3) 3= 1
where i G [l..k] and d(Xn, C^) denotes Euclidean distance between pattern Xn and the centroid Ck of the k—th cluster; pi represents the probability interval of mutating gene assigned to cluster i (e.g., Roulette Wheel). Finally, if the maximum number of generations is reached, or the prespecified threshold is satisfied then exit; otherwise the next generation is produced.
2.2. Deciding
on Number
of
Clusters
The result obtained from the previous step is a set of alternative solutions. Each solution satisfies the criteria employed by the multi-objective optimization process. So, we need to decide on the most appropriate solution from the set of alternatives, we used the clustering validation schema described in 1. As a result, we apply the following validity indexes: scott, friedman, ratkowsky, calinski, rubin, Hubert, db, ssi, dunn and silhouette.
2.3. Applying
CURE for Actual
clustering
After the validity indexes suggests the most appropriate value for the number of clusters k, it is used as input to the CURE clustering algorithm to decide on the actual clustering of the whole dataset. The process of CURE can be summarized as follows. Starting with individual values as individual clusters, at each step the closest pair of clusters are merged to form a new cluster. This is repeated until only k clusters are left. As a result, individuals in the database are distributed into k clusters. The input parameters to this CURE are: The input data set D containing |Devalues in n-dimensional space, where \D\ is the number of values in the database and n is the number of attributes. (1) The desired number of clusters k (2) Starting with individual values as individual clusters, at each step the closest pair of clusters are merged to form a new cluster. The process is repeated until only k clusters are left.
198 3. E x p e r i m e n t s and R e s u l t s We conducted our experiments on Intel 4, 2.00 GHz C P U , 512 M B RAM running Windows X P Dell P C . T h e proposed process has been implemented based on t h e integrated version of GAlib ( C + + Library of Genetic Algorithm Components 2.4.6) 9 and NSGA-II source code (Compiled with g + + ) . Necessary or needed p a r t s have been (re)implemented for t h e multiobjective case. T h e approach and t h e utilized cluster validity algorithms have been conducted by using t h e cclust and fpc packages of t h e R Project for Statistical Computing 1 2 . We have run our implementation 20 times for the tested d a t a set with parameters: population size—100; tournament size during the increment the no of clusters^ approximately t h e noof items/5 (20% of t h e entire d a t a set); p(crossover) for t h e selection=.9; we tried single and two point crossover in order; single point crossover gave better results; p(mutation)= 0.05 and for t h e mutation itself the allele number is not changed randomly but with respect t o Equation 5. We used I R I S in t h e evaluation process: a classification of the iris plant in different species having 4 features, 150 examples and 3 classes, 50 instances each, without missing values. We executed t h e clustering process for different combinations of t h e homogeneity and separateness choices. Our termination criteria is chosen as t h e average of the population when each objective is no more minimized. We use 50 as t h e sample from the dataset; and hence \f%0 = 7 is used by t h e first step of the process as the upper limit for t h e candidate number of clusters t o test.
Table 1. 2 3 4 5 6 7
avg.silwidth 0.702591569 0.732155823 0.607591569 0.573189899 0.533915125 0.513803139
- Results of Average Linkage
hubertgamma 0.088402459 0.101366423 0.050971161 -0.058402459 -0.042051062 -0.040205191
dunn 0.007012415 0.011056519 0.00204998 0.004012415 0.003535411 0.003419531
w b . ratio 0.081916363 0.08141178 0.09017462 0.095016363 0.082071856 0.082062071
Reported in Tables 1, 2, 3, and 4 contain the results obtained by applying the different validity indexes on t h e outcome from t h e multi-objective optimization process. In average linkage, t h e four indexes in Tables 1 report t h e right optimal number of clusters as 3; and all of the four indexes report 2 as t h e next candidate number of clusters. Note t h a t 2 is accepted as a candidate solution because two of t h e Iris classes are very similar and
199 Table 2. calinski 1693 -234 527 257 1712 257
2 3 4 5 6 7
db 0.19 0.18 0.31 0.30 0.29 0.28
Table 3. 2 3 4 5 6 7
avg.silwidth 0.608993661 0.65936671 0.563333035 0.548893601 0.509930437 0.489900031 Table 4. calinski -3 1762 502 254 242 236
- Results of Average Linkage and TWCV ratkowsky 0.11 0.92 0.04 0.03 0.04 0.045
scott 105.16 185.37 49.82 20.27 114.11 14.24
Friedman 16.42 12.45 1.33 1.01 1.61 1.18
rubin -64.65 -9.10 -5.46 46.39 46.85 48.77
ssi 1.07 0.99 0.70 0.57 0.86 0.69
- results of Average to Centroid Linkage hubertgamma 0.200449112 0.362859 0.19052666 0.004449414 -0.100814738 -0.178814111
dunn 0.03711746 0.009238841 0.001631388 0.001814346 0.002227689 0.002020080
w b . ratio 0.097762442 0.100437027 0.09785464 0.098762442 0.098922179 0.099977143
- Results of Average to Centroid Linkage and TWCV db 0.29 0.18 0.31 0.42 0.59 0.98
ratkowsky 0.042 0.05 0.04 0.03 0.026 0.021
scott 113.78 181.17 53.77 18.90 18.76 17.29
Friedman 16.09 11.95 5.88 3.92 1.70 1.56
rubin -42.10 -7.65 -6.68 -6.54 -6.35 -6.23
ssi 0.06 0.12 0.07 0.06 0.08 0.09
some researchers report them as single class. As the convex clustering validity indexes reported in Table 2 are concerned, both 2 and 3 are reported as possible numbers of clusters, with the reported second candidate number of clusters as 3 and 2, respectively; except calinski index. The same applies to the analysis of the results using average to centroid linkage; both 2 and 3 are reported as either first or second candidate possible number of clusters as reported in Tables 3 and 4. 4. Conclusions Clustering is unsupervised process to classify instances from a given dataset into classes. It is in general required that an expert provides in advance the intended number of clusters in addition to other parameters. However, this is not possible especially as the dataset to be analyzed becomes more challenging with increased features. Knowing this, we developed an approach that utilizes multi-objective GA and validity indexes to report the most ap-
200 propriate number of clusters. This process is applied on a sample representative subset of t h e original dataset. After getting t h e number of clusters, we feed it as input to C U R E , which effectively clusters the original dataset. We demonstrated using t h e IRIS dataset t h a t t h e proposed approach works effectively and produces t h e intended clustering result. Currently, we are considering partitioning a dataset into different disjoint subsets; then cluster each subset alone; and at the end consider each cluster as a single point and cluster all t h e clusters to obtain the final clustering. This will help in handling t h e scalability problem t h e best.
References 1. E. Dimitriadou, S. Dolnicar, and A. Weingessel. An examination of indexes for determining the number of clusters in binary data sets. Psychometrika, 67(1):137-160, March 2002. 2. J. Grabmeier and A. Rudolph. Techniques of cluster algorithms in data mining. Data Mining and Knowledge Discovery, 6:303-360, 2003. 3. S. Guha, R. Rastogi, and K. Shim. Cure: An efficient clustering algorithm for large databases. In Proc. of ACM SIGMOD, pages 73-84, 1998. 4. S. Guha, R. Rastogi, and K. Shim. Rock: A robust clustering algorithm for categorical attributes. In Proc. of IEEE ICDE, pages 512-521, 1999. 5. L. Kaufman and P.L. Rouseeauw. Finding group in data: An introduction to cluster analysis. John Wiley & Sons., New York, 1990. 6. A. Likas, N. Vlassis, and J. Verbeek. The global k-means clustering algorithm. Technical Report IAS-UVA-01-02., Computer Science Institute, University of Amsterdam, Netherlands, February 2001. 7. Y. Liu, T. Ozyer, R. Alhajj, and K. Barker. Cluster validity analysis of alternative solutions from multi-objective optimization. Proc. of SIAM DM, 2005. 8. Y. Liu, T. Ozyer, R. Alhajj, and K. Barker. Validity analysis of multiple clustering results on the pareto-optimal front. European Journal of Informatica, 29(1), 2005. 9. Massachusetts Institute of Technology and Matthew Wall. GAlib Documentation. MIT, USA, 2005. 10. T. Ozyer and R. Alhajj. Effective clustering by iterative approach. In Proc. of ISCIS. Springer-Verlag LNCS, 2005. 11. T. Ozyer, Y. Liu, R. Alhajj, and K. Barker. Multi-objective genetic algorithm based clustering approach and its application to gene expression data. Proc. of AD VIS, 2004. 12. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2004.
A METHOD FOR REDUCING LINGUISTIC TERMS IN SENSORY EVALUATION USING PRINCD7LE OF ROUGH SET THEORY* XIAOHONG LIU College of management, Southwest University for nationalities, Chengdu, Sichuan, 610041, China XIANYI ZENG LUDOVIC KOEHL The ENSAIT Textile Institute^ rue de I ,Ermitage,F-59100 Roubaix France YANG XU College of science, Southwest Jiaotong University, Sichuan Chengdu,
610041,China
This paper presents a method for reducing linguistic terms in sensory evaluation using the principle of rough set theory. Using this method, inconsistent and insensitive evaluation terms are removed and then the related sensory evaluation work can be simplified. The effectiveness of this method has been validated through an example in fabric hand evaluation.
1. Introduction Initially, sensory evaluation or sensory analysis was developed for studying the reactions to certain characteristics of food products. In today's industrial companies, especially in manufacturing fields, sensory evaluation has been widely used. It also concerns other specialized areas such as risk evaluation, investment evaluation, human resource evaluation and safety evaluation and so on. This concept is defined as follows (Stone and Sidel, 2004). Sensory evaluation is a scientific discipline used to evoke, measure, analyze and interpret reactions to chose characteristics ofproducts or materials as they are perceived by the senses of sight, smell, taste, touch and hearing.
* This work is partially supported by National Natural Science Foundation of China (Grant No. 60474022).
201
In the field of sensory evaluation, a linguistics term is used to describe one attribute of the related product quality or the consumer's preference related to this attribute (Zeng, 2004). The number of linguistic terms directly affects the result and the cost of sensory evaluation. This paper presents a new method for reducing linguistic terms in sensory evaluation using rough set theory. Using this method, inconsistent and insensitive evaluation linguistic terms can be removed according to sensory data provided by of experts while relevant evaluation terms can be preserved in the data based for the future evaluation work. Our strategy is composed of two steps. At the first step, the linguistic terms are divided into two sets according to the result of consistency for identical samples. At the second step, the linguistic terms are divided into two sets according to the sensitivity for different samples. The preserved relevant terms can be obtained using operations on theses sets and they correspond to the lower approximation of the rough set of the initial evaluation terms. The basic principle of our method is briefly presented as follows. First, we express the sensory evaluation system using a weighted knowledge table, i.e. S= in which U denotes a set of evaluation targets (U= {1,2,..., n}) and A is a set of linguistic terms each describing one attribute of the products of interest. Next, we calculate the index of individual consistency of evaluation terms for identical samples and the index of individual sensibility of evaluation terms for different samples. For the j-th term of the i-th product, these two performance indices are denoted as r? and r* respectively with r°,r' e[0,l]Next, we calculate the index of aggregated consistency and index of aggregated sensitivity of linguistic terms, denoted as pc and p* according to the following relations: p c = £
W
V >p ' z ^
w
y ,
c s p pP .
e[0,1]
• The whole set of
linguistic evaluation terms A can be then divided into two subsets satisfying the conditions pc > Ec and pc <Ect respectively. These subsets are denoted as V.c ,V,C • In the same way, the set of evaluation terms A can also be divided into two subsets V.'.V,' according to the conditions p" >s" and „» <E\. In this situation, 1
1
* j
J
H
i
J
we consider the result of yc n Vt" as lower approximation of the rough set of A. The linguistic terms belonging to the lower approximation should be preserved as relevant terms. More details are given in the following sections. 2. A knowledge table of linguistic terms in sensory evaluation Linguistic terms play an important role in the course of sensory evaluation. For example, in the field of fabric hand sensory evluation, the linguistic evaluation terms include soft, smooth, pleasant, etc. They describe different attributes
related to the product quality or consumer's preference related to these attributes. In practice, the efficiency and the cost of evaluation are strongly related to the number of linguistic terms. In each sensory evaluation, it is necessary to adopt relevant terms and remove irrelevant terms. The flow chart of evaluation terms generation is shown as Figure 1. Start
if Original terms No Test
Yes Practical terms Application
End Figurel. Flow chart of evaluation terms generation In Figure 1, we present the procedure for reducing linguistic evaluation terms using rough set theory. The sensory evaluation carried out by the i-th panel can be expressed using a weighted knowledge table mentioned in Section 1. Its formal representation is shown in table 1. Table 1. A knowledge table of linguistic evaluation terms Evaluation terms N. Producte>« ^ j \
A,
Am
1
Rn
Rln
n
Rml
Kmn
In table 1, A (j = \,...,m) denotes the j-th linguistic term, and R the evaluation result on the j-th evaluation term for the k-th product, Rk e {l,2,...,r}, i
r the maximal evaluation score used for the j-th term, and wj e [0,1], V w = 1 >
w and t are the weights of the i-th panel and the number of all panels respectively. In practice, there are too many terms in one system of sensory evaluation after the procedure of "brain storming", which generates an exhaustive list of evaluation terms. Therefore, we need to remove the inconsistent and insensitive evaluation terms according to the results of evaluation on all terms. However, if more than two panellists or experts give different evaluation results, one big problem is how to aggregate these results in a simple and suitable way. 3. Computing the indices of consistency and sensibility for evaluation terms In order to improve the efficiency in sensory evaluation and reduce the number of evaluation terms reasonably, we compute the indices of individual consistency and individual sensitivity of evaluation terms respectively. In a sensory evaluation, the consistency of evaluation terms is considered as the degree of resemblance of evaluation scores given by different experts when evaluating identical products. The sensitivity of evaluation terms is considered as the variation of evaluation scores when evaluating different products. Formally, these two indices are denoted as r c and r*. They represent the consistency and the sensibility of the j-th evaluation term for the i-th product respectively. We have rc.,r* e[0,l]- Their definitions are given as follows. Let V be the set of evaluation terms, V: Rr
•[0,1]
for the same product i, 1
d=0 0
for different products,
(1)
d = dm
0 —
d =0 m 0
1
d = dm
(2)
In eq.(l) and eq.(2), d is determined as follows: 1
(3)
where xt denotes the result of evaluation of the l-th panel, and x, e{l,2,...,r},3t = £ x . .
dm=max{dk}
(4)
k
k and dmare determined according to the maximal value (r) of the evaluation scores for the j-th evaluation term, for example, if r - 2, then k = 0,1, d0 = 0, dm=d],rij=\ orO. We suppose that evaluation scores of evaluation terms are 1,2,3 r, and the value of dm varies with r. Some results are shown as table 2. Table 2. The results of d
r dm
2 0.5
3 1.33
4 2.25
5 4.8
6 7.5
7 10.29
8 14
9 17.14
10 22.5
m
4. A method for reducing linguistic terms in sensory evaluation Let pc and p" be the index of aggregated consistency and aggregated sensitivity of the j-th evaluation term for all panels respectively, and p< = y
W
V ,p'=Y
W
V pc,p) e [0,1], wc and w* are the weights of the
individual consistency and individual sensibility of the i-th panel respectively, t is the number of existing panels. The j-th evaluation term does not play an important role in sensory evaluation only when the two following conditions are satisfied. In the first condition, the sensory evaluation results for the j-th term and the k-th product provided by different experts are quite different. In the second condition, the evaluation results for the j-th term and all products are very similar and concentrated. For the other cases, we remove the j-th term according to the values of pc and p". In rough set theory (Pawlak 1991), an approximation space can be denoted as A =, where U and R represent the domain of discussion and an elementary set in A respectively. For any element xeA , let [x]R be the equivalence class of R containing x. For any set X defined on the domain U, it can be characterized using two sets (the upper approximation and the lower approximation in A). The corresponding definitions are given below.
Aupp(X) = {xeU,[x]RnX*t\
(5)
Ahw(X) = \xeU,[x]lt
Al 7 5 6 7 7 6
A2 1 1 1 1 2 2
A3 1 1 1 2 2 1
A4 1 2 1 2 2 1
A5 3 2 2 3 3 2
A6 1 1 1 2 2 1
A7 1 1 1 2 1 2
A8 7 6 5 7 7 6
A9 3 2 3 2 3 3
A10 1 1 1 2 2 1
All 6 4 5 6 5 5
7
5
1
1
1
1
1
1
6
2
1
5
Another example of fabric hand evaluation is shown in Table 4. In this example, aggregated evaluation results of one expert panel are given for 18 different products and all the terms. Table 4. Sensory data on fabric hand evaluation Code sample 4-1 10-1 14-1 14-2
Al
A2
A3
A4
A5
A6
A7
A8
A9
A10
All
7
10
1
1
3
1
1
7
3
1
5
1 1 1
1 1 2
2 1 1
1 1 1
1 1 5
6 3 2
3 1 1
1 1 1
6 3 2
7 6 4
10 10 8
16-1 16-2
6 4
9 9
22-1 22-2 24-1
6 4 5
8 7 ~~8~
1 1 1 ~1 1 1 1
1
1 1
I I 2 1 1 1
2
1 1
4 5
1
2 ~2~
1
4 4
1
1 ~ i ~ 3 ~ 1 ~ 1 1 4 2 1 1 j ~ 1 ~~3~ 1 1
]
]
1
2__J
1
3 ~ 2 3
24-2
4
8
1
26-1
3
4
2
3
26-2
3
6
2
4
1 2
28-1 28-2
3 2
5 5
2 2
5 5
1 3 ~T~ 1 1 3 1 1
32-1 ~
2
~2~
3
~6~
32-2
2
3
4
6
34-1 34-2
1 1 5 7 1 5__J 1 1 5 1 | l | l | 6 | 7 | l [ 5 | l [ l | l | 5 | l
2
1 4
1
2
1
1
2
\
1
1
1
1
1 1
5
1
1
1
1
1 ~ 2 1 2
1~ 1
~1~
3
~ I
1
4
1
1
Using the equations (1) and (2), we calculate the values of rc and rs • The corresponding results are shown as table 5 (from table 3.) and table 6 (from table 4,) respectively. Table 5. The result of r° V
Grade D Dm rc
Al
A2
7 0.81 10.29 0.92
2 0.24 0.5 0.52
A3 2
A4 2
A5 3
0.24 0.5 0.52
0.29 0.5 0.42
0.57 1.33 0.57
A6 2 0.24 0.5 0.52
A7
A8
A9
A10
All
2 0.24 0.5 0.52
7 0.57 10.29 0.94
3 0.29 1.33 0.78
2 0.24 0.5 0.52
6 0.48 7.5 0.94
According to the values of rc shown in Table 5, we can see that the •J
evaluation terms can be ranked according to their values of sensitivity. Using the symbols of » and >- to denote the order of consistency of evaluation for all linguistic terms, i.e. likeness and preferment, and the obtained ranking result is AgwA 11 >-A 1 >-A9>-A 5 >-A2«A3«A6»A7«A 1 o>-A4 Table 6. The result of rs Al
A2
A3
A4
A5
7
10
6
7
3
3.75
9.53
2.35
5.28
0.26
2.5
1.44
3.56
10.29
22.5
7.5
10.29
1.33
4.8
4.8
10.29
0.42
0.31
0.51
0.17
0.52
0.25
0.35
Grade d dm
0.36
A6 5
A7 5
A8 7
A9
A10
All
3
5
6
0.47
2.03
2.47
1.33
4.8
7.5
0.35
0.42
0.33
According to the values of r* shown in Table 6, the evaluation terms can be ranked according to the values of sensitivity. We obtain A6>- A,;- A 2 «Aio>- A]>- A 8 »A 9 >- An>- A3>- A 7 VA 5 We determine the final linguistic terms in this experiment according to the result of table 5 and table 6. For given the values of the thresholds, we divide the
208 result of table 5 and table 6 into two subsets respectively, and they are denoted as V{ , Vc, V}< and Vs. Because of K,c > Vc and V{ >• Vs, set V* and Vts to build rough sets, i.e. V° u K,'5 and F^ n V* are upper approximation and lower approximation of rough sets. We use the lower approximation of rough sets to select the linguistic terms In this application, the values of the thresholds are given by experts. We obtain E° = 0.52 and E". - 0.35 respectively. With these two thresholds, we divide the whole set of the evaluation terms into 4 subsets using the method presented in Section 4. '\ = \A],A2,AJ,A5,A6,A1,As,A9,Aw,A]iy,y2 =v4 4 / "\ =\A\>A2,A4,A6,Ag,Ag,A]0i,v2 = \A3,A5,A1,Auj We obtain then the relevant evaluation terms preserved for future evaluation work by calculating the intersection between the subset of consistency and the subset of sensitivity. K, OK, =\Ai,A2,A6,Ai,A9,A\0j In this application, the relevant evaluation terms that should be preserved are: soft, smooth, compact, pleasant, fresh and dense. 6. Conclusion This paper discussed a method for reducing the number of evaluation linguistic terms using rough set theory. This method has been validated using a real application on fabric hand evaluation. In this application, we obtain the following results: for the consumers, only three to six terms are needed for characterizing the market behaviour but for experts determining the quality of products and product design, the number of terms needed is more than ten. Reference 1. G.B.Dijksterhuis, Multivariate data analysis in sensory and consumer science, Food & nutrition press, Inc. Trumbull, Connecticut 06611 USA. (1997) 2. H. Stone and J.L Sidel, Sensory Evaluation Practices (Third Edition). Academic Press, Inc., San Diego, CA. (2004) 3. Z.Pawlak, Rough sets, theoretical aspects of reasoning about data, Dordrecht, Holland: Klumer Academic Publisher groups. (1991) 4. X.Y.Zeng and Y.S.Ding. An introduction to intelligent evaluation, Journal of Donghua University. 3, 1-4 (2004)
T H E SPECIFICITY OF N E U R A L N E T W O R K S IN E X T R A C T I N G RULES FROM DATA
MARTIN H O L E N A Institute of Computer Science, Academy of Sciences of the Czech Republic, Pod voddrenskou vezi 2, CZ-18207 Praha 8, e-mail: martin@cs. cas. cz
The paper provides a survey of methods for logical rules extraction from data, and draws attention to the specificity of rules extraction by means of artificial neural networks. The importance of rules extraction from data for real-world applications was illustrated on a case study with EEG data, in which 5 rules extraction methods were used, including one ANN-based method.
1.
Introduction
The wide spectrum of nowadays existing data mining methods entails a wide spectrum of various formal reperesentations for expressing the extracted knowledge, e.g., association and decision rules, classification hierarchies, clusters, regression functions, probability networks. Most of those representations are specific for certain classes of methods, for example classification hierarchies for classification, clusters for cluster analysis, regression functions for linear and nonlinear regression. There is, however, one important exception to that general characterization - knowledge representation with sentences of some formal logic. Such sentences, usually called rules, are used to express the knowledge from data in many methods, and those methods rely on various principles. The main objective of this paper is to provide a short survey of rules extraction methods and to point out the specificity of methods based on artificial neural networks (ANNs). Five rules extraction methods, including one ANN-based method, are illustrated on a case study with EEG data.
2.
O v e r v i e w of R u l e s E x t r a c t i o n from D a t a
The probably earliest method aimed specifically at the extraction of logical rules from data is the General unary hypotheses automaton (Guha), developed in the seventies within the framework of observational logic, which is 209
a Boolean logic extended with generalized quantifiers . In Guha actually only sentences of the form (~ x)(ip(x),ip(x)) are extracted, in simplified notation tp ~ tp, with Boolean predicates tp and tp, and with a binary generalized quantifier ~ . Moreover, nearly all generalized quantifiers encountered in the existing implementations of Guha have been inspired by statistical estimation methods (e.g. —>c, the founded implication with threshold c £ (0,1]) and hypotheses testing methods (e.g. —>[., the likely implication with threshold c and significance level a G (0,1), or ~ £ , the Fisher quantifier with significance level a). The closest relatives of Guha are various methods extracting from data association rules, i.e., Boolean implications valid with at least a given confidence c and supported by at least a given proportion s of data 2 ' 3 . A closer look reveals that such an association rule is actually a Guha sentence ip -> c tp, provided s = ^f- 4 . For the extraction of association rules, there is no harm if the antecedents of two or more implications overlap. On the contrary, a substantial part of data typically corresponds to the antecedents of several association rules simultaneously. However, rulesets with overlapping antecedents are undesirable if used for classification, and especially if used for decision making. In such situations, other methods need to be employed, leading to sets of Boolean implications without overlapping antecedents. Rules from such sets are called decision rules. The most important representatives of methods for the extraction of decision rules from data are AQ 5 , CN2 6 , and a large group of methods known as decision trees 7 , s . Their name refers to the fact that the extracted rulesets have a hierarchical structure, due to which they can be easily visualized as tree graphs. Somewhere between association rules and decision rules are fuzzy decision rules, which are implications of some fuzzy logic. Although they too are used for decision making and classification, it is in general not possible to avoid overlapping antecedents in a fuzzy logic. The best known methods extracting fuzzy decision rules are ANFIS 9 and NEFCLASS 1 0 . Inductive logic programming (ILP) consists basically in constructing an intensional definition of a relation from tuples known to belong or not to belong to it, while other relations can be used as background knowledge in the induced definition 11>12. Rules extraction with genetic algorithms (GA) is nowadays the probably most deeply elaborated GA application in data mining 13 - 14 . In that application, GA are used to optimize some quantifiable property or weighted combination of quantifiable properties of the extracted ruleset, e.g., its ac-
211 curacy, completeness, or some measure of its novelty or interestingness.
3.
R u l e s E x t r a c t i o n w i t h Artificial N e u r a l N e t w o r k s
All the rules extraction methods mentioned so far share one important common feature - rules are obtained directly from the input data, the knowledge contained in them is immediately expressed with sentences of some formal logic, without using any additional knowledge representation. Nevertheless, this is not a universal feature of all rules extraction methods, it does not pertain to one important class of such methods - methods for the extraction of rules from data by means of artificial neural networks. Actually, already the mapping computed by the network incorporates knowledge transferred t o the ANN from the data used for training, knowledge about implications that certain values of network inputs have for the values of its outputs. That knowledge is captured in the ANN architecture, and especially in a multidimensional parameter vector, which together with the architecture uniquely determines the computed mapping. It is this distributed knowledge representation that accounts for the excellent approximation properties of multilayer perceptrons. For humans, however, it is not as easily comprehensible as logical rules. T h a t is why methods for rules extraction from trained neural networks and from the approximations they compute have been developed since the late eighties. Up to now, already several dozens such methods exist (cf. the survey papers 15 > 16 ). They differ in a number of aspects, the most important among which are: expressive power of the extracted rules (Boolean and fuzzy rules), relationship between rules and network architecture, computational complexity of the method, its universality both with respect to acceptable kinds of neural networks, and with respect to acceptable kinds of inputs, as well as accuracy, fidelity and completeness of the rules. Nevertheless, all those methods have the common feature that they employ not only those input-output pairs that have been employed already for network training, but also additional pairs, obtained through the mapping computed by the network. Some methods actually employ only pairs obtained through that mapping, and do not need the original training pairs any more. T h a t feature sometimes allows ANN-based methods to find rules that can not be found with other methods. Hence, the distributed knowledge representation by means of the network architecture and the parameter vector is always an intermediate knowledge representation in ANN-based methods.
4.
A Case S t u d y w i t h E E G D a t a
In collaboration of neurophysiologists and transportation scientists, a research into EEG signals corresponding to somnolence has been performed at the Czech Technical University Prague. Its ultimate objective is to provide an empirical knowledge base for a system automatically detecting impaired vigilance, a cause of severe traffic accidents. A survey of that research and a description of the collected data have been presented in 17 . During data preprocessing, Gabor spectral analysis has been performed for EEG signals measured in 35 healthy volunteers and corresponding always to three vigilance levels - full vigilance, mental activity, and somnolence caused by sleep deprivation. The knowledge about the specificity of individual kinds of the signals was primarily obtained through visual inspection of the EEG records and of the corresponding spectrograms by
Table 1. Example rules extracted from the EEG spectra of 2 electrodes O and T with the methods (i.)-(v.) above. In the table, the abbreviations Of and Tj with / being an integer between 1 and 14 stand for the value of the spectrum from the respective electrode for the frequency / Hz
rule Oio 6 (6,30) & Tj e (3, 7.5) & T3 e (2,4) -> vigilance OioG (6,30)&T! 6 (1,5) —> vigilance Oio e (6,30) & T6 € <0,1.5)
extracted
specializes a rule
with method
extracted with method
Guha ->o. 65i (u, _^F —*o.oi Guha -+b.65,o.i.
Guha -+o.9, AQ,
~>0.01 Guha -+£oi
—> vigilance Oio G (0.5,1.5) k Ou e (3,7.5) & Til £ (2,15) —> mental activity Oi2e(l,3)&T3e(2,3)
Guha ->b.65,o.i> F ^O.Ol Guha -»o.65,0.1
—» somnolence 0 i € (1.5,11.5) k02e
(2,13)&
0 3 6 (0.4,7.5) & O12 e (3,11.5) k Ti € (2,9.5) & T 3 e (0,4.5) k T n e (0.5,6.5) - • vigilance
ANN
CART Guha —*o.9, AQ Guha -» 0 .9, -*o.65,o.i> CN2 Guha -> 0 .9, CART Guha -» 0 .9, CN2
expert physiologists. In addition, to 14 frequencies 1-14 Hz of EEG spectra from two selected electrodes, also 5 particular methods for rules extraction from data were applied: (i) the method Guha, in particular the generalized quantifiers —>o.9, "^o.65,o.i a n d ^ a o i ' m t n e s o called LISP-Miner implementation by the Laboratory of Intelligent Systems and Programming of the University of Economy in Prague; (ii) the version AQ21 of the AQ method, in an implementation by the Machine Learning and Inference Laboratory of the George Mason University in Fairfax; (iii) the CN2 method, implemented anew for this study; (iv) the Classification and Regression Trees (CART), one of the classical decision trees methods, proposed 1984 by Breiman et al. 7 , in their implementation in the Matlab Statistics Toolbox; (v) a method for the extraction of Boolean rules from data by means of piecewise-linear neural networks 18 , in an implementation by the author, making use of the system Rebup by the Machine Learning Research Center of the Queensland University of Technology. Several example rules extracted by those methods from the EEG data are given in Table 1. To better illustrate how different or similar are rules extracted with the individual methods, the table lists not only the methods with which the rules have been extracted, but also methods with which their generalizations have been extracted. 5.
Conclusion
The paper recalled the importance of methods for the extraction of logical rules from d a t a in data mining. To this end, a survey of most common methods of that kind was given, and a case study with EEG data was briefly sketched, in which 5 rules extraction methods were used. Moreover, the specificity of methods based on artificial neural networks was pointed out. T h a t specificity consists in the fact that the mapping computed by the network is always inserted as an intermediate knowledge representation between the input data and the extracted rules. This sometimes allows those methods to find rules that can not be found with other rules extraction methods, as also the reported case study has shown. Acknowledgments This research has been supported by the Czech Ministry for Education grant ME701, "Building Neuroinformation Bases, and Extracting Knowledge from them", and by the Institutional Research Plan AVOZ10300504.
214
References 1. P. Hajek and T. Havranek. Mechanizing Hypothesis Formation. Springer Verlag, Berlin, 1978. 2. R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A.I. Verkamo. Fast discovery of association rules. In U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 307-328. AAAI Press, Menlo Park, 1996. 3. M.J. Zaki, S. Parathasarathy, M. Ogihara, and W. Li. New parallel algorithms for fast discovery of association rules. Data Mining and Knowledge Discovery, 1:343-373, 1997. 4. P. Hajek and M. Holena. Formal logics of discovery and hypothesis formation by machine. Theoretical Computer Science, 292:345-357, 2003. 5. R.S. Michalski and K.A. Kaufman. Learning patterns in noisy data. In G. Paliouras, V. Karkaletsis, and C.D. Spyropoulos, editors, Machine Learning and Its Applications. Lecture Notes in Computer Science 2049, pages 22-38. Springer Verlag, New York, 2001. 6. P. Clark and R. Boswell. Rule induction with cn2: Some recent improvements. In Y. Kodratoff, editor, Machine Learning - EWSL-91. Lecture Notes in Computer Science 482, pages 151-163. Springer Verlag, New York, 1991. 7. L. Breiman, J.H. Friedman, R.A. Olshen, and C.J. Stone. Classification and Regression Trees. Wadsworth, Belmont, 1984. 8. J. Quinlan. C4-5: Programs for Machine Learning. Morgan Kaufmann Publishers, San Francisco, 1992. 9. J.S.R. Jang and C.T. Sun. Neuro-fuzzy modeling and control. The Proceedings of the IEEE, 83:378-406, 1995. 10. D. Nauck. Fuzzy data analysis with NEFCLASS. International Journal of Approximate Reasoning, 32:103-130, 2002. 11. L. De Raedt. Interactive Theory Revision: An Inductive Logic Programming Approach. Academic Press, London, 1992. 12. S. Muggleton. Inductive Logic Programming. Academic Press, London, 1992. 13. A.A. Freitas. Data Mining and Knowledge Discovery with Evolutionary Algorithms. Springer Verlag, Berlin, 2002. 14. M.L. Wong and K.S. Leung. Data Mining Using Grammar Based Genetic Programming and Applications. Kluwer Academic Publishers, Dordrecht, 2000. 15. A.B. Tickle, R. Andrews, M. Golea, and J. Diederich. The truth will come to light: Directions and challenges in extracting rules from trained artificial neural networks. IEEE Transactions on Neural Networks, 9:1057-1068, 1998. 16. S. Mitra and Y. Hayashi. Neuro-fuzzy rule generation: Survey in soft computing framework. IEEE Transactions on Neural Networks, 11:748-768, 2000. 17. J. Faber, M. Novak, P. Svoboda, and V. Tatarinov. Electrical brain wave analysis during hypnagogium. Neural Network World, 13:41-54, 2003. 18. M. Holena. Extraction of logical rules from data by means of piecewise-linear neural networks. In Proceedings of the 5th International Conference on Discovery Science, pages 192-205. Springer Verlag, Berlin, 2002.
STABLE NEURAL ARCHITECTURE OF DYNAMIC NEURAL UNITS WITH ADAPTIVE TIME DELAYS IVO BUKOVSKY, JIRI BILA, Department of Instrumentation and Control Engineering, Czech Technical University, Technicka 4, Prague, 166 07, Czech Republic MADAN M. GUPTA Intelligent System Research Laboratory, Department of Mechanical Engineering, University of Saskatchewan, 57 Campus Drive Saskatoon SK, S7N 5A9, CANADA The paper introduces the concept of continuous-time dynamic neural units with adaptable input and state variable time delays (TmDNU - Time Delay Neural Units). Two types of TmDNUs are proposed as they introduce adaptable time delays either into the neural inputs or both the neural inputs and the neural unit state variable. Robust capabilities of TmDNU for time delay identification and approximation of linear systems with dynamics of higher orders is shown for standalone single-input TmDNUs with linear neural output function (somatic operation). A simple dynamic BackPropagation learning algorithm is shown for continuoustime adaptation of the time delay parameters. The units also represent elements for building novel artificial neural network architectures.
1. Introduction Through simple principles, this article introduces the concept and basic types of linear dynamic neural units with adaptive time delays which we call TmDNU Time-Delay Neural Units. Contrary to conventional Tapped-Delay Neural Networks (TDNN) [1] where time delays are implemented within inter-neuron inter-layer feedback connections, the concept of TmDNU consists in time delays implemented within the neural units themselves. Further, it will be shown that neural weights of TmDNU are well adaptable by a simple dynamic BackPropagation (for dynamic BP based upon classical minimization of quadratic error function (performance index) [1] [2]. Examples of implementation of two basic types of TmDNU as applied to system identification and to higher order system approximation will be shown. TmDNUs as standalone working neural units with linear synaptic operation
215
216 (activating function) represent simple adaptive mechanisms for linear approximation (identification) of dynamic systems which optionally include time delays, and thus can also serve for identification of time delays instead of seeking for Pade approximation. When used in a network mode, TmDNUs, especially TmDNU-Type 2 constitutes a novel type of artificial dynamic neural networks (DNN) that could be called Time-Delay Neural Networks (TmDNN). In particular, a "super stable" designs of internal neural architectures for both TmDNU - Type 1 and TmDNU Type 2, are proposed as they prevent neural unit from converging toward values resulting in instability of standalone TmDNUs and learning algorithm.
ith TmDNU- Type 2
"l(0 ^2l\ •
Td=™ii
f(V,X0,W,)
dx(i) dt »
T
d=™$2
u„(t) ',
T
2
Xi=M>A2
Figure 1: 1 Linear Time-Delay Neural Unit - (Type 1 for W4=0), (Type 2 for Wa, ^ 0) with adaptable time delays represented by weights W3J on its input and W4 in state variable x. For simplicity, the letter /' indexing the neural unit instance is omitted as for indexing the neural parameters on the picture.
2. Dynamic Neural Units with Time Delays: TmDNU, and TmDNU2 The nature of the (dynamic) Time-Delay Neural Units (TmDNU) originates from linear dynamic neural units (Pineda, Hopfield, Cohen-Grossberg) [1] [3] [4]. TmDNU can be viewed as an adaptive mechanism capable of approximation of a dynamic system in form of a linear differential equation. The analogy with a differential equation indicates that we are going to deal with continuous dynamic neural units (DNUs), which are working in continuous-time where the fastest sampling period of a whole neural architecture is practically determined by the capabilities of a particularly used numerical method. We will classify TmDNUs into
217 two major types. Namely, TmDNU-Type 1 (w4=0) [2] and TmDNU-Type 2 (H>4^0) (Figure 1) will be introduced in this work. In case of TmDNUs, their architecture corresponds to the structure of the first order differential equation with time delayed input and optionally with time-delayed state variable. dx{t), -^-(
2 w
/
2 v-> + 7 min) + * ( ' - w 4 )=ZuW2j
2 ' » / ' ~W3j
)
(J)
Let's consider modification of single input DNU [1] given in Figure 1 where, w\, wi, wy (J~-/•••"), and w4 are neural weights, n is the number of neural inputs, the square root of w^ represents the time delay of/* input the square root of w4 represents the time delay of internal state variable x(t), constant x0 is a minimum time constant of the unit (analogy to neural bias), u(t) is vector of neural inputs, )( ) is somatic (transfer) function, y{f) is the neural output, and s is the Laplace operator. To assure the maximum (super) stability of TmDNU, function / ( ) from Figure 1 has been designed as follows f(y,wux0)
= v— w,
(2) +x0
The somatic operation ()){) of a neuron will be kept as linear. For in-this-way designed dynamic neural architecture (Figure 1), the neural bias X0 represents the minimal time constant, so it can be assigned *0^*min>0-
(3)
Researchers dealing with control engineering applications may sometimes deal with linear dynamic systems containing not only the input delays but also time-delayed state variables, where the introduction of time delay, here denoted as w42, into state variable results in the increase of capability to better approximate higher order dynamic systems, and thus it might potentially result in more robust dynamic neural networks with even less number of neural parameters and less complicated structure suitable for a given problem. We propose the above introduced dynamic structure Eq. (1) to be the basis of the both types of Time Delayed Neural Units that we also denote as TmDNU, (w4=0) [2] and TmDNU2 (w 4 #)).
218 3.
Dynamic BackPropagation for Time-Delay Neural Units
Dynamic neural units with time delays (TmDNU - Time-Delay Neural Unit) may be adapted by very simple dynamic version of the BackPropagation (BP) learning algorithm based on common gradient minimization of performance index (error function) [1]. The well known principle of this supervised learning and the weight adaptation is recalled in Eq. (4) (5) Eq. (5) then represents a simple dynamic version of BP which can be used to adapt also the neural weights representing time delays both on inputs and delay of the internal state variable of the proposed TmDNUs. + Aw,(*)
aw,
(4)
d0(QL-i\dGTmDNU(s)u^
•He(i)-
dx(t)
{
(5)
dwj
where jU a n d /J denotes learning rates related as (1 = 2 JU , u{t) denotes neural (and system) inputs, x(t) is state variable of the neural unit, y(t) is neural output and yr(i) is real system output, <J){) is neural output function (somatic operation of a neuron), w,- are neural weights, e(t) is an actual output error, GTmDNu (?) is the transfer function of a dynamic system represented by the internal dynamic structure of T m D N U , S is the Laplace operator, L~ denotes the inverse Laplace transformation, and t is the continuous parameter of time. In case of T m D N U (Figure 1) with a single input u{t), the neural output can be expressed using continuous-time transfer function as
y(Q =
r-i 2 (s) • U(s)}) = L
w2 -e 2 ,~
-{w,)2-s •U(s)
\ „ . o-K)2'*
(6)
where e is the Euler's number and
<j>{x)=x
=fle(t)L
A
-
— — • Y(s)
(7)
219 The realization of Eq. (7) is then depicted in Figure 2 bellow.
REAL SYSTEM «(/) TmDNU, 2w4y(t-w4
) 2w 4
&
(wl
s
+*min)
w/t/V Figure 2: The mechanism for generating the neural weight increment AW4 of neural weight W4 which represents the adaptable time-delay of TmDNU-Type 2 (Figure 1). Besides the purposely designed stable internal dynamic structure of both TmDNU) and T111DNU2, no special measures about stability of the units had to be taken to assure the stability of learning algorithm except the appropriate choice of learning rate and initial conditions. The choice of initial neural weights for TmDNU-Type 2, Eq. (1), should follow the condition
w4 w
\
2
71
+*min * 2 -
(8)
which is the stability condition for such a class of dynamic systems [5]. Recall, the TmDNUs are focused as standalone neural units in this work, and only the simplest and pure learning rule (the dynamic BackPropagation) is shown.
220
4. Approximating Capabilities of Time Delay Neural Units- Type 2 (TmDNU2) TmDNU-Type 2 for Approximation - Adaptation
— y(t) is neural output from TmDNU — yr(t)\s output from identified plant u(t) is input signal into the plant and TmDNU
U(t) -0.5
Figure 3: Detail of adaptation of TmDNU-Type 2 with linear output function ^>( ) . The neural unit performs approximation of a dynamic system (9) with time-delayed both input and state variable.
As an example, the application of TmDNU demonstrates its capabilities to approximate time-delayed systems and higher order dynamic systems by a simple dynamic system which both types of TmDNU represent. In this experimental part, the 'real' system-to-be-approximated is chosen as a linear plant of 10th order with the transfer function G(*) =
(2s+ 1)10
(9)
where S is the Laplace operator. In fact, the system (9) has been selected for it can be well approximated by system (1) in both time andfrequencydomain [5]. 5. Conclusions Two types of linear (continuous-time) Time Delay Neural Units including adaptable time delays have been proposed in this work. The units are denoted as TmDNU - Type 1 respectively Type 2 (or just TmDNU, respectively TmDNU2). Their standalone single-input modifications with the simplified linear neural output
221 (somatic) operation have been focused in this work in order to demonstrate their capabilities to identify time delays in dynamic systems or to approximate dynamic systems with dynamics of higher orders. TmDNU-Type 1 has been capable of both identifying time delays within linear plants or approximating higher order systems through input delays and can be also useful in applications where Pade approximation would have to be used otherwise. TmDNU-Type 2 has been introduced as an extension of the dynamic structure of TmDNUi where another adaptable time delay is introduced into the state variable of TmDNU; therefore, the approximation capabilities of TmDNU are enhanced. TmDNU-Type 2 for Approximation - Neural Weight Convergence 1
4
4
3
co
2 CM"
5
1 o>"
/ ^
1
i
, |
r
w1 ] i
w3~i
1
! w4| initial weights: w10:=w30 =w40=3 ! i I w2l j
J
1
f
1
i
f
i
1
> e... output error
J
1 1 i
|
|
200
300
400
—>-—i,
0
100
i .-j
500 time [sec]
Figure 4: Example of convergence of neural weights of TmDNU-Type 2 for approximation of system of 10* order (9). Neural weights wj and wt represent the continually adaptable parameters of time-delays
The disadvantage of TmDNUs for identification of time delays and system approximation can be seen in (relatively) longer time of adaptation which can be, however, significantly reduced by the choice of another set of initial conditions or by more sophisticated methods. In parallel, the problem of weight convergence toward local minima of error function is reduced due to the naturally robust approximating capability of TmDNU-Type 2 and can be further eliminated by the choice of another set of initial weights for adaptation. According to our experiments, the basin of initial neural weights for which the units converge to a minimum, which
222 provides the approximation with a sufficient degree of accuracy, is practically large enough. One of the advantages of TmDNUs can be seen in the capability to identify time delays within linear dynamic systems, robust capability to approximate linear dynamic systems with higher order dynamics (e.g. the approximation of 10th order is shown) in continuous-time domain. Further advantages are the simplicity of the learning algorithm, algorithm extendibility to neural units with higher order dynamics including more time delays, and excellent stability during adaptation due to the proposed design of the novel super stable-structure (1) of an artificial dynamic neuron. The novelty can also be seen, contrary to common artificial neural networks handling time delays in discrete time domain, in turning to continuous parameter of time, thus to continuous adaptation of time delays within artificial neural architectures, i.e. to adaptation of continuous transfer functions of dynamic neural units with time delays. Further, a possible implementation of multiple-input TmDNUs with nonlinear somatic operation 0 ( ) (Figure 1) into networks represents novel directions in continuous-time artificial neural networks. References 1. M.M. Gupta, Jin L., and N. Homma : Static and Dynamic Neural Networks: From Fundamentals to Advanced Theory, IEEE Press and Wiley-Interscience, published by John Wiley & Sons, Inc., 2003, ISBN 0471-21948-7 2. Bukovsky, I., Bila, J., Gupta, M , M : Linear Dynamic Neural Units with Time Delay for Identification and Control (in Czech), In: Automation, Vol. 48, No. 10, Prague, Czech Republic, Oct 2005, p. 628-635. ISSN 0005125X 3.
J. Hopfield: Neural Networks and Physical Systems with Emergent Collective Computational Abilities, Proc. Nat. Sci. USA, Vol. 79, pp.25542558 4. F. J. Pineda: Dynamics and Architecture for Neural Computation, J. Complexity, Vol.4, pp.216-245, Sept. 1988 5. Zitek, P. - Vyhlidal, T.: Low Order Time Delay Approximation of Conventional Linear Model. In: 4th MATHMOD Vienna Proc, Vienna 2003, p. 197-204
EVALUATION CHARACTERISTICS FOR MULTILAYER PERCEPTRONS A N D TAKAGI SUGENO MODELS
WOLFGANG KAESTNER, T O M FOERSTER, CORNELIA LINTOW, RAINER HAMPEL Institute
University of Applied Sciences Zittau/Gorlitz of Process Technology, Automation and Measuring Technology Theodor Koerner-Allee 16, 02763 Zittau, Germany E-mail: [email protected]
(IPMj
In this contribution a comparative analysis of the evaluation for the two Soft Computing methods Multilayer Perceptron (MLP) and Takagi Sugeno model (TSM) will be described. Above all the developed characteristics of the linear connection between the input values and output values are to be compared with regard to the model quality. In addition, a correlation with the characteristic of the linear relationship of two characteristics of the process data is derived.
1. Introduction Multilayer Perceptrons [4] and Takagi Sugeno models [3] are well suited to approximate nonlinear process connections. In particular, this concerns processes, which can be described by analytic methods only insufficiently or not at all, since the analytical relationship between input values and output values is not or only poorly known. If an MLP or TS model has been established then it may be hard to describe the quality of such models. A possibility consists in evaluating various error characteristics. Here, the measuring range and stability problems are to be taken into account. Thus relatively small errors around zero can cause a large relative error leading to wrong conclusions. In addition, error values permit only local information about the model's accuracy in each sample point. For mean errors their distribution should be known in order to get a sound interpretation. Hence, for making statements about the global quality of a model several error characteristics should be consired. To do so, evaluation characteristics correlation weights tyxy [1] for the MLP and TS correlations Kxy 2 for the TSM were developed. They indicate
223
the linear connection between individual characteristics of the inputs and outputs of the models (like the empirical correlation coefficient rxy does for databases). Under certain boundary conditions (see section 3), the two evaluation characteristics of the models are directly comparable with the empirical correlation coefficient rxy. 2. Evaluation m e t h o d s 2.1. Analysis
by means
of internal
characteristics
Here we want to show that the global connection between the characteristics of the inputs X_i and the outputs Y_j is mirrored in the weight distribution of the MLP and in the coefficients of the TSM, respectively, thus enabling further analysis. To this aim, characteristics had to be derived in certain analogy to classical data analysis. Finally, we want to draw global conclusions about the linear behavior of the models in the state space in order to get a comparison with the linear connection of the databases. The Pearson correlation coefficient r_XY may serve as such a (dimensionless) measure, ranging in [-1, 1]. Note, however, that r_XY = 0 does not imply independence (the relationship may be strongly nonlinear). For two quadratically integrable variables X and Y the correlation coefficient is given by

r_{XY} = \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)} \cdot \sqrt{\mathrm{Var}(Y)}} .   (1)
If only two series of measurements x_1, x_2, ..., x_n and y_1, y_2, ..., y_n are known, the empirical correlation coefficient is computed by

r_{xy} = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2} \cdot \sqrt{\sum_i (y_i - \bar{y})^2}} ,   (2)

with

\bar{x} = \frac{1}{n} \sum_i x_i   (3)      and      \bar{y} = \frac{1}{n} \sum_i y_i ,   (4)

the latter being the empirical means of X and Y.
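As an illustration of Eqs. (2)-(4), the following minimal Python sketch (our own illustration, not part of the original paper; the function and variable names are assumptions) computes the empirical correlation coefficient of two measurement series.

# Minimal sketch: empirical correlation coefficient r_xy of Eqs. (2)-(4).
def empirical_correlation(x, y):
    n = len(x)
    x_bar = sum(x) / n                      # Eq. (3)
    y_bar = sum(y) / n                      # Eq. (4)
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sx = sum((xi - x_bar) ** 2 for xi in x) ** 0.5
    sy = sum((yi - y_bar) ** 2 for yi in y) ** 0.5
    return cov / (sx * sy)                  # Eq. (2)

# Example: a strongly linear relation gives r close to 1.
print(empirical_correlation([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]))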
2.1.1. Weight analysis of the MLP

The connection between process inputs and output values is stored in the architecture of the (trained) MLP by the individual weights w_ij (weight distribution).
Figure 1. Context of error and process characteristics

In Fig. 1 the connection between the weight distribution and the relation of the process characteristics given by the data, as well as their evaluation, is schematically represented. It can be seen that the error characteristics only refer to the outputs, whereas the correlation characteristics describe the process relationships between data and model. That is, the weight correlation provides information about the behavior of the model in the state space. To analyse the weight distribution, so-called weight factors have been introduced, whereby the following simplifications were made:
• Decoupling of the net structure
• Linearization of the transfer function
The former is applied in order to enable an individual analysis of the connection between input X_i and output value Y_j. The network architecture is therefore divided into i x j subnetworks.
Figure 2. Uncoupling of the net by the example of a two-layered MLP
Consider, for example, a Multilayer Perceptron with two layers of trainable weights, one input neuron X, one output neuron Y and two hidden neurons a and b. Hence, we get four connecting weights w_ij, denoted by w_Xa, w_Xb, w_aY and w_bY. The hyperbolic tangent is used as transfer function. The bias values are denoted by B_i. The output of the trained net is calculated through

Y = \tanh(w_{aY} \cdot H_a + w_{bY} \cdot H_b + B_Y)   (5)

with

H_a = \tanh(X \cdot w_{Xa} + B_a) ,   (6)
H_b = \tanh(X \cdot w_{Xb} + B_b) .   (7)
The linearization of the transfer function leads to simple (linear) connections between the input and the output value. For small arguments the transfer function can be linearized by tanh(z) ≈ z. Hence, from (5)-(7) we obtain

Y = w_{aY} (X \cdot w_{Xa} + B_a) + w_{bY} (X \cdot w_{Xb} + B_b) + B_Y ,   (8)

Y = X (w_{Xa} \cdot w_{aY} + w_{Xb} \cdot w_{bY}) + B_a \cdot w_{aY} + B_b \cdot w_{bY} + B_Y  (bias portion) .   (9)

From (9) one can see that the equation consists of a linear part (with respect to the input X) containing the weights w_ij, and a bias part.
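To make the linearization step concrete, the following Python sketch (our own illustration with hypothetical weight values, not taken from the paper) evaluates the small MLP of Eqs. (5)-(7) and its linearized form (9), showing that the two nearly coincide for small inputs.

import math

# Hypothetical trained weights and biases of the 1-2-1 MLP from Eqs. (5)-(7).
w_Xa, w_Xb, w_aY, w_bY = 0.10, -0.20, 0.30, 0.15
B_a, B_b, B_Y = 0.01, -0.02, 0.05

def mlp_output(x):
    """Exact output, Eqs. (5)-(7)."""
    h_a = math.tanh(x * w_Xa + B_a)
    h_b = math.tanh(x * w_Xb + B_b)
    return math.tanh(w_aY * h_a + w_bY * h_b + B_Y)

def linearized_output(x):
    """Linearized output, Eq. (9): linear part in x plus bias portion."""
    slope = w_Xa * w_aY + w_Xb * w_bY
    bias = B_a * w_aY + B_b * w_bY + B_Y
    return slope * x + bias

for x in (-0.1, 0.0, 0.1):
    print(x, mlp_output(x), linearized_output(x))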
The weight measure G_{X_iY_j} indicates the linear connection between input X_i and output Y_j. For an MLP of arbitrary size it holds

G_{X_iY_j} = \sum_{\vec{w}_{X_iY_j}} \; \prod_{w_{kl} \in \vec{w}_{X_iY_j}} w_{kl} ,   (10)

that is, the individual weights w_kl along a vector path \vec{w}_{X_iY_j} between the input X_i and the output Y_j are multiplied and the products are summed up afterwards. There are a_1 · a_2 · ... · a_l vector paths \vec{w}, with a_1 neurons in the first hidden layer, a_2 in the second hidden layer, ..., a_l neurons in hidden layer l. In general, for o inputs and p outputs one gets o · p weight measures G. They are independent of the starting initialization and of training repetitions. For any training repetition we obtained nearly identical weight measures (for similar or identical net quality) even for different weight distributions w_ij. The transfer functions, however, may not vary (otherwise only the signs of the w_ij are preserved). In order to compare or to generalize the weight measures for a process (with various MLP of different architectures) a normalisation is necessary. For this the correlation weight Ψ_{X_iY_j} has been introduced [1]:

Ψ_{X_iY_j} = \frac{G_{X_iY_j}}{\sqrt{\sum_{k=1}^{i} G_{X_kY_j}^2}} .   (11)
A further characteristic is given by the weight measure

ξ_{X_iY_j} = Ψ_{X_iY_j}^2 .   (12)

Obviously,

\sum_{k=1}^{i} ξ_{X_kY_j} = 1 .   (13)
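The following Python sketch (our own, not from the paper) computes the weight measures G, the correlation weights Ψ and the normalized measures ξ of Eqs. (10)-(13) for a feedforward MLP given as a list of weight matrices; the layer layout and the names are assumptions made for illustration.

import numpy as np

def weight_measures(weight_matrices):
    """weight_matrices[l][k, m]: weight from neuron k of layer l to neuron m of layer l+1.
    Returns G (inputs x outputs), Psi and Xi as in Eqs. (10)-(13)."""
    # Summing the products of weights over all paths from input i to output j
    # is exactly the chained matrix product of the weight matrices (Eq. (10)).
    G = weight_matrices[0]
    for W in weight_matrices[1:]:
        G = G @ W
    # Normalisation over all inputs feeding one output (Eq. (11)).
    Psi = G / np.sqrt((G ** 2).sum(axis=0, keepdims=True))
    Xi = Psi ** 2                              # Eq. (12); each column sums to 1 (Eq. (13))
    return G, Psi, Xi

# Example: one input, two hidden neurons, one output (the net of Eqs. (5)-(7)).
W1 = np.array([[0.10, -0.20]])                 # input -> hidden
W2 = np.array([[0.30], [0.15]])                # hidden -> output
print(weight_measures([W1, W2]))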
2.1.2. Cluster analysis of the TSM

For the computation of the Takagi Sugeno model, the clustering procedure based on [3] was used. The cluster algorithm consists of structure identification and parameter identification. In the structure identification the database is classified into the clusters c. In the parameter identification the parameters of the fuzzy rules (one rule per cluster) are determined.
Now we perform the input-output analysis for the TSM in a similar way as for the MLP. In order to regard the connection between the input value X_i and the output value Y_j, the structure of the model is divided here also into i x j "subnetworks". Figure 3 shows the uncoupling for i inputs, j = 1 output value and the corresponding number of clusters c. The uncoupling supplies i subnetworks, which show the connection between the respective input X_i and the output value Y_j.
Figure 3. Uncoupling of the structure of a TSM
The subnetworks can then be described as follows:

Y_j(X_i) = (b_{1i} \cdot X_i + b_{10}) + ... + (b_{ci} \cdot X_i + b_{c0}) .   (14)

Simplified, one can write for (14):

Y_j(X_i) = F_{ij} \cdot X_i + B_j   (15)

with

F_{ij} = b_{1i} + b_{2i} + ... + b_{ci} = \sum_{m=1}^{c} b_{mi} ,   (16)

B_j = b_{10} + b_{20} + ... + b_{c0} = \sum_{m=1}^{c} b_{m0} .   (17)

The F_{ij} in equation (15) are the slope factors over all clusters; they are determined by the coefficients b_{mi} of the clusters c for X_i.
The value B_j summarizes the absolute terms of the linear functions of the individual clusters. For the evaluation the TS correlation K_XY was introduced, which is built from the individual parameters of the clusters. It is computed as

K_{ij} = \frac{F_{ij}}{\sqrt{\sum_{k=1}^{i} F_{kj}^2}} .   (18)

The TS correlation is also a standardized characteristic, which represents the linear connection between X_i and Y_j within the range [-1, 1]. As a further characteristic the TS measure M_{ij} is introduced, which is computed by squaring K_{ij}:

M_{ij} = K_{ij}^2 .   (19)

Again,

\sum_{k=1}^{i} M_{kj} = 1 .   (20)
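Analogously to the MLP case, the TS characteristics of Eqs. (14)-(20) can be computed directly from the consequent coefficients of the clusters. The following Python sketch is our own illustration; the coefficient array layout is an assumption.

import numpy as np

def ts_characteristics(b):
    """b[m, i]: consequent coefficient of cluster m for input X_i,
    with column 0 holding the absolute terms b_m0 (c clusters, i inputs)."""
    B_j = b[:, 0].sum()                     # Eq. (17): sum of the absolute terms
    F = b[:, 1:].sum(axis=0)                # Eq. (16): slope factor per input
    K = F / np.sqrt((F ** 2).sum())         # Eq. (18): TS correlation per input
    M = K ** 2                              # Eq. (19); sums to 1 (Eq. (20))
    return F, B_j, K, M

# Example: two clusters, two inputs (hypothetical coefficients).
b = np.array([[0.5, 1.2, -0.3],
              [0.1, 0.8, -0.1]])
print(ts_characteristics(b))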
3. Comparative analysis

In the Introduction, boundary conditions were mentioned which should be fulfilled for a reasonable comparison of the characteristics r_XY, K_XY and Ψ_XY grasping the linear relationship between inputs and outputs. These conditions shall be discussed in more detail.
(1) As described above, the empirical correlation coefficient r_XY is a measure for the linearity between two series (e.g. of measurements) (x_1, y_1), (x_2, y_2), ..., (x_n, y_n). For the evaluation of multiple stochastic dependencies, partial and multiple correlation coefficients are applied. While the former accounts for each pair of series (neglecting the others), the latter evaluates the relationship of one series with respect to all others. The empirical correlation coefficient, however, ignores these relationships, which of course may be erroneous. On the other hand, it is not possible to determine the influence of individual characteristics on the weight matrices of an MLP or the clusters of a TSM. Thus r_XY is used.
(2) For the computation of r_XY one usually assumes that the given k-dimensional random vector has a k-dimensional normal distribution. However, this may be unrealistic in applications; in such cases r_XY may be unreliable. Contrariwise, for MLP or TSM the distribution
is of no meaning with respect to the state space mapping. Recall that the models interpolate between the given nodes. This takes place also for non-normally distributed databases.

4. Conclusion

The derived characteristics Ψ_XY and K_XY are suitable measures for the linear relationships between the inputs X and outputs Y of MLP and TSM. They possess the same expressiveness as the empirical correlation coefficient r_XY determined from a database. The concordance of these characteristics shows that the interrelationship of the data can be reproduced by the weights w_ij of the MLP and by the coefficients of the TS models, respectively.

References
1. T. Foerster, W. Kaestner: Analyse von Gewichtsstrukturen in Multilayer Perzeptren. Technical Report, IPM, 2003.
2. C. Lintow: Modellierung/Simulation mittels Soft Computing Methoden. Diploma Thesis, IPM, Zittau, 2005.
3. C. Wong, C. Chen: A Clustering-Based Method for Fuzzy Modeling. Tamkang University, Taipei, Taiwan, 1999.
4. A. Zell: Simulation Neuronaler Netze. Addison-Wesley Publishing Company, Bonn, 1994.
RESEARCH ON IMPROVED MULTI-OBJECTIVE PARTICLE SWARM OPTIMIZATION ALGORITHMS

DUO ZHAO*
School of Electrical Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P.R. China

WEIDONG JIN
School of Electrical Engineering, Southwest Jiaotong University, Chengdu, Sichuan 610031, P.R. China
As a novel multi-objective optimization technique, multi-objective particle swarm optimization (MOPSO) has gained much attention and some applications during the past decade. In order to enhance the performance of MOPSO with respect to the diversity and the convergence of the solutions, this paper introduces new methods to update the personal guide and to select the global guide for each swarm member from the particle set and the Pareto front set. In order to validate the proposed method, simulation results and comparisons with several multi-objective evolutionary algorithms and a MOPSO-based algorithm, which are representative of the state-of-the-art in this area, are presented. The article concludes with a discussion of the obtained results as well as ideas for further research.
1. Introduction

Since its inception in 1995, particle swarm optimization (PSO), which mimics the social behavior of a flock of birds or a school of fish in order to guide a swarm of particles towards promising solutions, has gained rapid popularity as a technique for solving single-objective optimization problems [1,2]. In the past, evolutionary algorithms (EAs) have become established as the method of choice for multi-objective optimization problems (MOP) [3]. Recently, researchers have paid more and more attention to PSO for solving MOP. Several multi-objective particle swarm optimization (MOPSO) algorithms have been proposed in the last few years [4],[6],[8]. The MOPSO methods have the property that particles move towards the Pareto-optimal front during the generations. Some of these works mainly focus on the design of novel selection or archiving
mechanisms [7]. But when considering the diversity and the convergence of the solutions, some redefinitions and new methods are needed when extending PSO to MOPSO. For instance, in MOPSO the Pareto-optimal solutions should be used as the global guide for each particle of the swarm [8], and each particle should have a personal guide which represents the best position found by itself. The selection of the global guide and the updating of the personal guide have a great impact on the performance of MOPSO. Thus how to choose the global guide from the Pareto front and how to update the personal guide is a key problem for MOPSO. Several main approaches to maintain the diversity of optimal solutions for MOPSO have been reported: the ε-dominance method [9], the Sigma method [6], the Subswarms method [8], and the Stripes method [4]. In this paper, we present a new method for global guide selection and personal guide updating to maintain the diversity and convergence of MOPSO.

* Work partially supported by grant 60572143 of the China National Science Foundation and by grant 2005A13 of the Southwest Jiaotong University Science Foundation.

2. Multi-objective Optimization

The multi-objective optimization problem can be expressed as:

Minimize  F(x) = [f_1(x), f_2(x), ..., f_m(x)]^T
s.t.      g_j(x) ≥ 0,  j = 1, 2, ..., p,
          h_k(x) = 0,  k = 1, 2, ..., q,   (1)

where m is the number of conflicting objective functions f_i: R^n → R that we want to optimize simultaneously, and x = [x_1, x_2, ..., x_n]^T ∈ X ⊆ R^n is the decision vector belonging to the feasible region X, which is formed by the p inequality and q equality constraint functions.

Definition 1: A decision vector x' ∈ X is Pareto-optimal iff for every x ∈ X: either f_i(x) = f_i(x'), ∀ i ∈ {1, 2, ..., m}, or f_i(x) > f_i(x') for at least one i ∈ {1, 2, ..., m}.

Definition 2: A decision vector x_1 ∈ X is said to dominate x_2 ∈ X (denoted x_1 ≺ x_2) iff:
• x_1 is not worse than x_2, i.e. f_i(x_1) ≤ f_i(x_2) ∀ i ∈ {1, 2, ..., m}, and
• x_1 is strictly better than x_2 in at least one objective, i.e. f_i(x_1) < f_i(x_2) for some i ∈ {1, 2, ..., m}.

Definition 3: The Pareto-optimal set P* is defined as:

P* = {x ∈ X | ¬∃ x' ∈ X, F(x') ≺ F(x)} .   (2)

Definition 4: The Pareto front PF* is defined as:

PF* = {F(x) = [f_1(x), f_2(x), ..., f_m(x)] | x ∈ P*} .   (3)
In a practical problem it is impossible to obtain an explicit description of the line or surface forming the Pareto front. In order to approximate the Pareto front, the usual way is to compute the nondominated points, which together form the front.
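For illustration, a minimal Python sketch of Definition 2 and of the nondominated filtering used to build the Pareto front follows (our own code, assuming minimization of all objectives).

def dominates(f1, f2):
    """True if objective vector f1 dominates f2 (Definition 2, minimization)."""
    return all(a <= b for a, b in zip(f1, f2)) and any(a < b for a, b in zip(f1, f2))

def nondominated(points):
    """Keep only the points not dominated by any other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Example in a 2-objective space.
pts = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0)]
print(nondominated(pts))   # (3.0, 4.0) is dominated by (2.0, 3.0)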
3. Related work

PSO and MOPSO move a swarm of n-dimensional particles through the problem space, in search of the single optimal solution of a single-objective optimization problem or of the Pareto front of a multi-objective optimization problem. Each particle has its own velocity, a memory of the best position it has obtained so far, called personal guide or personal best position (pBest), and the knowledge of the best position achieved by all particles of the swarm, referred to as global guide or global best position (gBest). In PSO, each particle adjusts its velocity for the next position according to its previous velocity, pBest and gBest:

v_{i,t+1}^j = ω v_{i,t}^j + c_1 r_1 (pBest_{i,t}^j − x_{i,t}^j) + c_2 r_2 (gBest_{i,t} − x_{i,t}^j) .   (4)

And the position of the particle is updated as:

x_{i,t+1}^j = x_{i,t}^j + v_{i,t+1}^j ,   (5)

where i = 1, 2, ..., n, j is the index of the particle in the swarm, ω is the inertia weight of the particle, c_1 and c_2 are two positive constants, r_1, r_2 ∈ [0,1] are random values, and t denotes the generation index. Equations (4) and (5) can also be applied to MOPSO, but the definition of the global guide (gBest) has to be revised. In MOPSO, the global guide is no longer a single point but a set of non-dominated solutions. Therefore, the global guide must be selected from the archive, an external population in which the updated set of non-dominated solutions is kept. The personal guide (pBest) of particle j is a memory that keeps the result of comparing the last non-dominated position pBest_t with the new position x_{t+1}. How to select the global guide from the archive and how to perform the comparison for the personal guide have a great impact on the convergence and the diversity of the solutions. This has been studied in [4], [6], [8], [9].
• Laumanns's ε-dominance method [9]: The main principle is the relaxation of dominance. The main reason for using ε-dominance is to keep a certain number of particles in the archive, because the size of the archive depends on the ε value, which reduces the computation time.
• Mostaghim and Teich's Sigma method and Subswarms method [6], [8]: Both concepts are based on clustering; depending on the elite particles in the archive, clusters are created around the elites and thus cover the Pareto front.
• Villalobos-Arias's Stripes method [4]: The main mechanism is based on stripes applied in the objective function space. From the minimum values of each objective function (2 objectives), a line similar to the Pareto-optimal front is formed.
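A compact Python sketch of the canonical update (4)-(5) for one particle is given below (our own illustration; the parameter values are arbitrary).

import random

def pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5):
    """One velocity/position update per Eqs. (4)-(5) for an n-dimensional particle."""
    new_v, new_x = [], []
    for i in range(len(x)):
        r1, r2 = random.random(), random.random()
        vi = w * v[i] + c1 * r1 * (p_best[i] - x[i]) + c2 * r2 * (g_best[i] - x[i])
        new_v.append(vi)
        new_x.append(x[i] + vi)
    return new_x, new_v

x, v = [0.5, 0.5], [0.0, 0.0]
x, v = pso_step(x, v, p_best=[0.4, 0.6], g_best=[0.1, 0.9])
print(x, v)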
4. Description of our proposal

We adopt an elitism scheme with an archive that contains the historical nondominated solutions the particles have found so far. The main idea of our method is to add global information to the MOPSO. The global information is the vector of average objective function values (denoted as G_t), calculated over the nondominated solutions in the archive at generation t as follows:

G_t = [g_{1,t}, ..., g_{m,t}] = \left[ \frac{1}{N} \sum_{j=1}^{N} f_1(x_j), \; ..., \; \frac{1}{N} \sum_{j=1}^{N} f_m(x_j) \right] ,   (6)

where j ∈ {1, 2, ..., N} is the index of the archive member.

4.1. Global Guide Selection

We consider the position of each solution in the archive or in the swarm in the two-dimensional objective space. In the Delta method, an angle δ_{A_j} is calculated from each archive member's
objective function values (f_{1,j}, f_{2,j}) and the global information (g_1, g_2) as follows:
δ_{A_j} = \tan^{-1}\!\left( \frac{f_{2,j} - g_2}{f_{1,j} - g_1} \right) ,   (7)

δ_{P_i} = \tan^{-1}\!\left( \frac{f_{2,i} - g_2}{f_{1,i} - g_1} \right) .   (8)
The angle δ_{P_i} of each particle in the swarm is calculated via Equation (8), where i ∈ {1, 2, ..., M} is the index of the particle in the swarm. Figure 1 shows the core idea of finding the global guide among the archive members for each particle in the swarm. First, calculate the angle δ_{A_j}, j ∈ {1, 2, ..., N}, for each member j in the archive. Second, for each particle i in the swarm calculate the differences between the δ_{A_j} and δ_{P_i}, ∀ i ∈ {1, 2, ..., M}. Then, the member j in the archive whose δ_{A_j} has the minimum difference to δ_{P_i} is selected as the global guide for particle i. In the figure it can be seen that the particle distribution in the search space is divided into two main areas, the left-bottom corner and the upper-right area. According to the Delta method, particles in the left-bottom corner are assigned a global guide A_k from the archive members, where |δ_{A_k} − δ_{P_i}| ≤ |δ_{A_j} − δ_{P_i}|, (k ≠ j). This drives the particles directly towards the archive members, in order to form the Pareto-optimal front. Particles in the right-upper area, because of their distribution, are assigned one of the two archive members with maximum or minimum angle. In this way, it is ensured that the particles in the right-upper area are able to explore the ends of the Pareto-optimal front and to overcome the drawbacks of the Sigma method.
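The following Python sketch is our own reading of the Delta selection rule, using the angle form reconstructed in Eqs. (7)-(8) (atan2 is used instead of the arctangent of the quotient for numerical robustness); it is an illustration, not the authors' reference implementation.

import math

def delta_angle(f, g):
    """Angle of a 2-objective point f = (f1, f2) relative to the archive mean g = (g1, g2)."""
    return math.atan2(f[1] - g[1], f[0] - g[0])

def select_global_guide(particle_f, archive_f):
    """Pick the archive member whose angle is closest to the particle's angle (Section 4.1)."""
    N = len(archive_f)
    g = (sum(a[0] for a in archive_f) / N, sum(a[1] for a in archive_f) / N)  # Eq. (6)
    dp = delta_angle(particle_f, g)
    return min(range(N), key=lambda j: abs(delta_angle(archive_f[j], g) - dp))

archive = [(0.1, 0.9), (0.5, 0.5), (0.9, 0.1)]
print(select_global_guide((0.6, 0.7), archive))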
Figure 1. Global Guide Selection
Figure 2. Personal Guide Updating
4.2. Personal Guide Updating

The personal guide P_{i,t} of each particle i is a memory which stores the best position that the particle has found so far at generation t. When the particle gains a new position P_{i,t+1} in the next generation, an update should be performed. The personal guide update for each particle is another open problem in MOPSO, as discussed in [8]. In our method, we adopt another scheme for personal guide updating, which is illustrated in Figure 2 (for a 2-objective problem). The archive members' average objective function value G_t is also used in the personal guide updating scheme. There are four possible relations between the personal guide P_{i,t} and G_t; only when the particle position P_{i,t+1} of the next generation falls into the shaded area is the personal guide updated with P_{i,t+1}.

5. Comparison of results

The Delta method has been validated using several test functions taken from the specialized literature [3]. In this paper, we choose three test functions, ZDT1, ZDT2 and ZDT3, described in Table 1. In order to know how competitive our approach is, comparisons against two multi-objective evolutionary algorithms, NSGA-II [5] and ε-MOEA [9], and one multi-objective particle swarm optimization algorithm, ST-MOPSO, which are representative of the state-of-the-art, are presented. In order to allow a quantitative assessment of the performance of these multi-objective algorithms,
we adopted the following two metrics: Inverted Generational Distance (IGD) and Success Counting (SC), which were proposed by Villalobos-Arias et al.

Table 1. Test functions (each 2-objective, 30 parameters, x_i ∈ [0,1], n = 30)

ZDT1:  f_1(x) = x_1;  f_2(x) = g(x) · h(f_1, g);  g(x) = 1 + 9 (Σ_{i=2}^{n} x_i)/(n−1);  h(f_1, g) = 1 − sqrt(f_1/g)
ZDT2:  f_1(x) = x_1;  f_2(x) = g(x) · h(f_1, g);  g(x) = 1 + 9 (Σ_{i=2}^{n} x_i)/(n−1);  h(f_1, g) = 1 − (f_1/g)^2
ZDT3:  f_1(x) = x_1;  f_2(x) = g(x) · h(f_1, g);  g(x) = 1 + 9 (Σ_{i=2}^{n} x_i)/(n−1);  h(f_1, g) = 1 − sqrt(f_1/g) − (f_1/g) · sin(10π f_1)
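The ZDT functions of Table 1 are standard benchmarks; a short Python sketch of their evaluation (our own code) follows.

import math

def zdt(x, variant=1):
    """Evaluate ZDT1/2/3 for a decision vector x with x_i in [0, 1] (Table 1)."""
    f1 = x[0]
    g = 1.0 + 9.0 * sum(x[1:]) / (len(x) - 1)
    if variant == 1:
        h = 1.0 - math.sqrt(f1 / g)
    elif variant == 2:
        h = 1.0 - (f1 / g) ** 2
    else:  # ZDT3
        h = 1.0 - math.sqrt(f1 / g) - (f1 / g) * math.sin(10.0 * math.pi * f1)
    return f1, g * h

print(zdt([0.5] + [0.0] * 29, variant=1))   # a point on the ZDT1 Pareto front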
For each test function, 30 independent runs have been performed. Figures 3, 4 and 5 show the graphical results produced by DMOPSO on the test functions ZDT1, ZDT2 and ZDT3, respectively. The true Pareto-optimal fronts of the problems are shown as continuous lines. In order to illustrate the overall performance of the method over the 30 independent runs, all solutions are drawn in the left plot of each figure; the middle and right plots contain the solutions with the worst and the best performance with respect to the IGD metric, respectively.
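The IGD metric is not defined in the paper itself; the sketch below implements the commonly used definition (mean distance from a reference Pareto front to the obtained set) and is therefore only an assumption about the exact variant used.

import math

def igd(reference_front, obtained_set):
    """Inverted Generational Distance: mean distance from each reference point
    to its nearest obtained solution (smaller is better)."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    return sum(min(dist(r, s) for s in obtained_set) for r in reference_front) / len(reference_front)

ref = [(0.0, 1.0), (0.5, 0.29), (1.0, 0.0)]
print(igd(ref, [(0.0, 1.0), (1.0, 0.05)]))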
Figure 3. Pareto fronts obtained by our method for test function ZDT1
Figure 4. Pareto fronts obtained by our method for test function ZDT2
Figure 5. Pareto fronts obtained by our method for test function ZDT3
Tables 2, 3 and 4 show the comparison of results between DMOPSO and the results obtained in [4] for the other three algorithms, considering the previously described metrics for the three test problems. In Table 2, for test function ZDT1, although the performance of DMOPSO for the SC metric is not as good as that of STMOPSO, it is much better than NSGA-II and ε-MOEA. It can also be seen that the average performance of DMOPSO is as good as that of STMOPSO with respect to IGD, which means the solutions obtained by our method are much closer to the true Pareto-optimal front than those of the other methods. In Table 3, DMOPSO shows a better performance for the IGD metric and a better performance for the SC metric; in all 30 independent runs, there was only one run in which the solutions did not cover the Pareto-optimal front entirely. In Table 4, for test function ZDT3, there exist local Pareto-optimal fronts, and the worst SC value is zero, which means that in that run the solutions found by DMOPSO did not reach the Pareto-optimal front at all. This is related to the properties of the test function; the same behavior can be seen for the other three algorithms.

Table 2. Results of the IGD and SC metrics for the ZDT1 test function

IGD        DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       1.98e-05   3.43e-04   1.67e-03   2.04e-03
Worst      6.18e-03   6.70e-04   1.90e-02   2.76e-02
Mean       2.52e-04   4.30e-04   7.95e-03   6.41e-03
St. dev.   6.06e-04   7.39e-05   5.05e-03   5.22e-02
Median     2.32e-04   4.19e-04   6.55e-03   4.95e-03

SC         DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       100        100        2          8
Worst      49         95         0          0
Mean       92.57      99.3       0.3        1.1
St. dev.   6.978      1.208      0.53       1.668
Median     98         100        0          1
Table 3. Results of the IGD and SC metrics for the ZDT2 test function

IGD        DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       4.56e-03   5.11e-02   5.16e-02   5.50e-02
Worst      2.54e-04   1.04e-02   1.68e-02   0.37373
Mean       4.44e-03   1.87e-02   1.28e-02   2.03e-02
St. dev.   1.75e-04   4.39e-04   1.12e-02   5.18e-02
Median     4.56e-03   5.11e-02   5.16e-02   5.50e-02

SC         DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       100        100        0          0
Worst      9          1          0          0
Mean       93.4       75.6       0          0
St. dev.   8.375      41.838     0          0
Median     100        100        0          0
Table 4. Results of the IGD and SC metrics for the ZDT3 test function

IGD        DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       1.05e-04   7.05e-04   2.36e-03   1.63e-02
Worst      2.75e-02   3.97e-02   0.2357     2.31e-02
Mean       7.29e-03   3.69e-03   8.42e-03   6.51e-03
St. dev.   4.27e-03   7.08e-03   3.97e-03   4.45e-03
Median     3.49e-03   2.04e-03   8.04e-03   5.95e-03

SC         DMOPSO     STMOPSO    eMOEA      NSGA-II
Best       100        100        4          4
Worst      0          0          0          0
Mean       65         82.733     0.567      0.667
St. dev.   37.546     25.81      1.165      0.254
Median     93         88.5       0          0
6. Conclusion and future works

This paper has described a method called DMOPSO for multi-objective optimization problems. The core idea is the new method for global guide selection and the personal guide updating scheme in the multi-objective particle swarm optimization algorithm. This new method ensures better performance in terms of diversity and convergence of MOPSO. The proposed technique was tested on several multi-objective test functions and compared against three algorithms. The results show promising improvements over the evolutionary approaches for multi-objective optimization, and almost the same performance as STMOPSO among the MOPSO-based algorithms. This paper mainly considered 2-dimensional multi-objective optimization problems. How to extend the Delta method to higher-dimensional multi-objective optimization problems will be studied in future work. Another aspect that we would like to explore in the future is the application of the DMOPSO method to train suspension control system optimization.

References
1. J. Kennedy and R. Eberhart, "Particle swarm optimization," in Proc. IEEE Int. Conf. Neural Networks, vol. 4, pp. 1941-1948, (1995).
2. Y. Shi and R. C. Eberhart, "Particle Swarm Optimization: developments, applications and resources," in Proc. Congress on Evol. Comput., Piscataway, NJ, pp. 81-86, (2001).
3. E. Zitzler, K. Deb, and L. Thiele, "Comparison of multi-objective evolutionary algorithms: Empirical results," Evol. Comput., vol. 8, no. 2, pp. 173-195, (2000).
4. M. A. Villalobos-Arias, G. T. Pulido, and C. A. Coello Coello, "A proposal to use stripes to maintain diversity in a multi-objective particle swarm optimizer," in Proc. IEEE Swarm Intelligence Symposium (SIS 2005), pp. 22-29, (2005).
5. Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan, "A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II," IEEE Trans. Evol. Comput., vol. 6, pp. 182-197, Apr. (2002).
6. S. Mostaghim and J. Teich, "Covering Pareto-optimal fronts by subswarms in multi-objective particle swarm optimization," Evol. Comput., vol. 2, pp. 19-23, June (2004).
7. J. E. Fieldsend, R. M. Everson, and S. Singh, "Using unconstrained elite archives for multi-objective optimization," Evol. Comput., vol. 7, pp. 305-323, June (2003).
8. S. Mostaghim and J. Teich, "Strategies for finding good local guides in multi-objective particle swarm optimization," in IEEE Swarm Intelligence Symposium, pp. 26-33, (2003).
9. Marco Laumanns, Lothar Thiele, Kalyanmoy Deb, and Eckart Zitzler, "Combining Convergence and Diversity in Evolutionary Multi-objective Optimization," Evol. Comput., 10(3):263-282, (2002).
PART 2
Decision Making and Knowledge Discovery
KNOWLEDGE DISCOVERY FOR CUSTOMER CLASSIFICATION ON THE PRINCIPLE OF MAXIMUM PROFIT

CHUANHUA ZENG 1,2
College of Automobile and Transportation Engineering, Xihua University, Chengdu 610039, P.R. China, Phone: +86-028-89829140, E-mail: zchfirst@263.net

YANG XU
Intelligent Control Development Center, Southwest Jiaotong University, Chengdu 610031, P.R. China

WEICHENG XIE 3
College of Electrical & Information Engineering, Xihua University, Chengdu 610039, P.R. China

Managing customers according to their classification is one of the most important strategies in customer relationship management. A new method to classify customers is presented in this paper. Firstly, we find the key background information by simplifying the decision table based on rough set theory; secondly, we determine the profit by analyzing the sales and the costs of customers; and finally, we derive the decision rules on the principle of maximum profit. As such, we can reason out to which class a new customer belongs and select a good way to serve him, thus achieving the optimal economic benefit for enterprises.
1. Introduction

It is essential to identify a customer's class by analyzing his background information. The reason is that as long as we know to which class the customer belongs, we can choose a proper method to serve him, so as to achieve the goal of using the enterprise's resources efficiently. Such a method, on the one hand, improves the service level and attracts more customers; on the other hand, it economizes resources by gaining maximum output with minimum input. Take current methods for instance. The decision tree method, put forward in reference [4], first classifies financial customers by their credibility and contribution, then identifies the relationship between the classification results and the background information through the decision tree method, and finally infers the probability of the new customer's class so as to select a proper
strategy for him. Although the probability is given in reference [4], the exact class of the customer remains uncertain. Besides, the activity-based costing method presented in reference [5] suggests a way to identify the most valuable customers by calculating the activity cost and then selecting a suitable strategy according to that cost. Since we know the background of customers and the profit resulting from different classifications, provided we classify customers according to our experience, two questions arise: can we find the correlation underlying our choices? Are these choices bound to yield maximum profit? Taking these questions into consideration, we may classify a new customer into a certain group. A new method based on rough set theory to classify customers is set forth in this paper. Firstly, we obtain the decision table of the customers' classification. Secondly, we simplify the attributes of the decision table and find the essential background information of customers. Thirdly, we obtain the profit corresponding to each decision by analyzing the relationship between costs and sales. And finally, we derive the classification knowledge under the principle of maximum profit. As such, we can reason out a new customer's class and accordingly achieve the optimal economic benefit for enterprises.

2. The simplification of a disharmonious decision table

The decision table of customers' classification depends on how people adjust the classification of customers. Many factors that may affect the result are taken into account; however, these factors cannot all be included in the condition attributes of the table, so the decision table may not be harmonious. Accordingly, we employ rough set theory to simplify the non-harmonious decision table.

Definition 1 (Decision table [1]). (U, A, F, {d}, {g_d}) is a decision table, where U is the set of objects, U = {x_1, x_2, ..., x_n}; A is the set of condition attributes; d is the decision attribute; F is the set of relations between A and U, i.e. F = {f_j : j ≤ m}, where f_j : U → V_j (j ≤ m) and V_j is the value field of a_j; g_d : U → V_d is the relation between U and d, and V_d is the finite value field of g_d. R_A and R_d are the equivalence relations generated by the condition attribute set A and the decision attribute set {d}, respectively. For any B ⊆ A: U/R_B = {[x]_B : x ∈ U}, U/R_d = {D_1, D_2, ..., D_r}, where [x]_B = {y ∈ U : (x, y) ∈ R_B}, R_B = {(x_i, x_j) : f_l(x_i) = f_l(x_j) (∀ a_l ∈ B)}.
<E B)}.
Let EKDj/[x\B)=
\D;r\[x]B\ ' ' *'
(/Sr),
I L*JB I
£> is the inclusion degree. Let ftB(x) = (D(Di/[x]B),...,D(Dr/[x]B)) (xeU), fiB(x) is the generalized decision distribute function of x. Definition 2 Suppose (JJ,A,F,{d\{gd}) is a decision table, and Be A. As to any \fxeU : if /uB(x) = ftA(x), then B is distribution harmonious decision set. If B is distribution harmonious set and any subset of B is not harmonious set, then B is the distribution simplification. Definition 3 Suppose (U,A,F,{d},{gd}) is a decision table, and UIRA ={q,C2,...,C,}.Let Z>* = {(W / 4 ,bb):// / < W^// B W}, and /t(C,) is the value of C, 's attribute a^. To definition ({«* e ^ : /*(C,) * MCJMC„CJ)
e D*
[A,(C„Cj)*D Then Di(ChCj) is the distribution identifiable attributes set of C, and Cj. A = (D(ChCj),i,j
244 3
R(.n\W)'^(n\'Oj)Pi(Oj\lxYi. 7=1
For a certain customer group described by [x], let τ(x) be a decision rule, i.e. τ(x) ∈ {r_1, r_2, r_3}, and let R be the total expected risk over all decision rules, so that

R = \sum_{[x]} R(τ(x) | [x]) P([x]) .

We want to know which decisions optimize the total risk. According to the Bayes decision procedure and the principle of maximum profit, we get the decision rules as follows:
(1) r_1: [x] → a, if R(r_1 | [x]) ≥ R(r_i | [x]), i = 2, 3;
(2) r_2: [x] → b, if R(r_2 | [x]) ≥ R(r_i | [x]), i = 1, 3;
(3) r_3: [x] → c, if R(r_3 | [x]) ≥ R(r_i | [x]), i = 1, 2.

4. How to calculate the profit

The enterprise classifies its customers into different classes according to their characteristics and selects a different service combination for each class. This leads to a different cost expended on each customer. The activity cost of a customer consists of product cost, selling cost and other costs. The selling cost includes strategy cost and service cost, etc., and the other costs comprise after-sales service cost and management cost, etc. By subtracting the total costs from the purchasing sum of each customer, we get the profit gained from him. Let E_i be the average cost of class ω_i, and I_i be the average purchasing sum of ω_i. We classify customers into classes of different levels according to their contribution to the enterprise. For a customer with state ω_i: if we make decision r_i, then the profit gained from the customer is the average purchasing sum of class ω_i minus the average cost of class ω_i; if we make a decision r_j of a higher level than class ω_i, then the cost of the customer is the average cost of class ω_j, while the average purchasing sum still stays at the level of class ω_i; if we make a decision r_j of a lower level than class ω_i, then the cost of the customer is the average cost of class ω_j, while the average purchasing sum drops to the level of class ω_j because of the limited service. So we get the following formula:
λ(r_j | ω_i) = I_i − E_i   if j = i,
λ(r_j | ω_i) = I_i − E_j   if j < i,
λ(r_j | ω_i) = I_j − E_j   if j > i.
5. Algorithm for classification rule discovery

We get the classification rules by the following steps: (1) obtain the equivalence classes according to the decision attribute; (2) obtain the equivalence classes according to the condition attributes; (3) compute the generalized decision distribution function of each object in the decision table; (4) build the distribution identifiable attribute matrix; (5) obtain the distribution simplification B; (6) derive the decision rules on the principle of maximum profit.

6. An example

Choosing some customer classification information from a certain enterprise, we get a decision table, where U = {x_1, x_2, ..., x_20} is the set of customers and A = {a_1, a_2, a_3, a_4} is the set of condition attributes; here a_1 is the income level of the customer, a_2 is the age range of the customer, a_3 is the consumption level of the customer, and a_4 refers to the frequency of business, while d is the classification result (decision attribute). N means AVERAGE, H means HIGH, L means LOW.

TABLE 1. THE DECISION TABLE
No    a1    a2    a3    a4    d
1     N     H     L     H     a
2     N     H     H     H     a
...
20
Firstly we get the identifiable attribute matrix; then we know that the simplification is {a_1, a_3, a_4}. Classifying the customers according to the simplified condition attributes, we get C_1 = {x_1, x_10}, C_2 = {x_2, x_12, x_18}, ..., C_11 = {x_17}.
The average costs and the average purchasing sums are as follows: I_1 = 8000, I_2 = 4000, I_3 = 1000, E_1 = 2000, E_2 = 1500, E_3 = 500. So λ(r_1 | ω_1) = 6000, and the other values λ(r_j | ω_i) follow from the formula above.
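As an illustration of Sections 3 and 4, the short Python sketch below (our own code, not from the paper) builds the profit values λ(r_j | ω_i) from the stated averages and picks, for a given class-membership distribution, the decision with the maximum expected profit; the example distribution is hypothetical.

I = [8000, 4000, 1000]   # average purchasing sums I_1..I_3
E = [2000, 1500, 500]    # average costs E_1..E_3

def profit(j, i):
    """lambda(r_j | omega_i): profit of decision r_j for a customer in class omega_i."""
    if j == i:
        return I[i] - E[i]
    if j < i:                 # decision of a higher level than the true class
        return I[i] - E[j]
    return I[j] - E[j]        # decision of a lower level than the true class

def best_decision(p):
    """p[i] = P(omega_i | [x]); choose the decision maximizing the expected profit R(r_j | [x])."""
    expected = [sum(profit(j, i) * p[i] for i in range(3)) for j in range(3)]
    return max(range(3), key=lambda j: expected[j]), expected

print(profit(0, 0))                      # 6000, matching lambda(r_1 | omega_1)
print(best_decision([1.0, 0.0, 0.0]))    # a pure omega_1 group -> decision r_1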
Calculating R(r_j | C_i), we get, for example, R(r_1 | C_1) = 6000, R(r_2 | C_1) = 2500, R(r_3 | C_1) = 500. Finally, we get the decision rules on the principle of maximum profit: C_1 → a, C_2 → a, C_3 → a, C_4 → a, C_5 → c, C_6 → c, C_7 → c, C_8 → b, C_9 → b, C_10 → a, C_11 → b, i.e. (a_1, N) ∧ (a_3, L) ∧ (a_4, H) → a, (a_1, N) ∧ (a_3, H) ∧ (a_4, H) → a. When new customers turn up, we may classify them by these rules. Take the first rule as an example: it means that if a customer has an average income and low consumption but is a regular customer of the enterprise, we consider him a member of group "a".

Conclusion

By applying the algorithm put forward in this paper, we can easily obtain the relationship between the background and the classification of customers. We can also reason out the class of a new customer according to the knowledge mined from the decision table. More importantly, such customer management satisfies the principle of maximum profit, so as to achieve the optimal economic benefit for enterprises.

Acknowledgments

This paper is supported by the National Natural Science Foundation of P.R. China (Grant no. 60474022).

References
1. Wenxiu Zhang, Yi Liang and Weizhi Wu. Information System & Knowledge Discovery. Beijing: Publishing House of Science, 2003: 22-56.
2. Wenxiu Zhang, Weizhi Wu and Jiye Liang et al. Rough Set Theory and Its Application. Beijing: Publishing House of Science, 2001: 142-157.
3. Qing Liu. Rough Set and Rough Reasoning. Beijing: Publishing House of Science, 2001.
4. Jian Kang. The Application of Classification Data Mining in Financial Customer Relationship Management. Journal of Beijing Institute of Technology, 2003(23)2: 207-211.
5. Yingfei Liu. The Application of Activity Based Costing of Customer in Customer Relationship Management. Business Research (274).
AN INTEGRATED ANALYSIS METHOD FOR BANK CUSTOMER CLASSIFICATION

JIE ZHANG**, JIE LU*, GUANGQUAN ZHANG*, XIANGBIN YAN*
*Management School, Harbin Institute of Technology, Harbin, 150001, P.R. China {zhangjie; xbyan}@hit.edu.cn
Faculty of IT, University of Technology Sydney, PO Box 123, Broadway, NSW 2007, Australia {jielu; zhangg; zhangjie}@it.uts.edu.au

Customer classification is one of the major tasks in customer relationship management. Customers have both static characteristics and dynamic behavioral features. Applying both kinds of data in a comprehensive analysis can enhance the reasonability of customer classification. In this paper, customer dynamic data is clustered using a hybrid genetic algorithm and then combined with customer static data to give a reasonable customer segmentation by using a neural network technique. A novel classification method which considers both the static and the dynamic data of customers is proposed. Applying the proposed method to a bank's datasets obviously improves the accuracy of customer classification compared with traditional methods where only static data is used.

Keywords: Customer classification, bank, time series data, genetic algorithm
1. Introduction

Classification is an important task in bank customer relationship management (CRM). Credit and behavioral scoring models have been used to classify bank customers [1]. Bank customer management involves both static and dynamic data. The static data is associated with customer demographics, and the dynamic data is associated with customer purchasing behavior, i.e. transaction data. Most studies only use static data to conduct customer segmentation; obviously, this is intuitive and easy to operate [2]. Some studies use dynamic data [3,4] for segmentation. However, because of privacy and update-speed issues, customer demographic data is either difficult to obtain or difficult to keep updated [5]. This paper proposes an integration method using both dynamic and static datasets to identify customer groups without the limiting assumption that customers with similar demographics have the same purchasing behavior [3]. Customer dynamic data is clustered using a hybrid genetic algorithm and then combined with the static data to give an effective customer segmentation by a neural network. The paper is structured as follows. Section 2 describes the framework of the method. In Section 3, the data preparation process is given in detail. Clustering steps using GAs are proposed in Section 4. In Section 5, the final neural network structure is given. Finally, a bank example is used to test the effectiveness of the proposed method.

2. Framework of Bank Customer Classification Method

Customer segmentation is a classification process using one or more customer attributes. The results depend on the selection of attributes and classification methods. The values of the different attributes come from the enterprise databases, including data on the nature and the transactions of customers. We classify these data as static data and dynamic data as follows:
• Static data: It reflects the basic profile of the customers, which is stable for a period of time. Static data is used to describe the natural situation of customers, such as a customer's gender, age, background, income, etc.
• Dynamic data: It changes frequently with time, such as the transaction records of a customer. Such data can effectively reflect customers' behaviors.
This study develops a framework which shows how a method can use both kinds of data for customer classification, shown in Fig. 1.
Fig. 1. A framework of the combined classification method using dynamic and static data

Under this framework, the method is applied in three phases. We use bank customer data to illustrate the method, as presented in Sections 3, 4 and 5.
3. Phase I: Customer Data Preparation

We mainly focus on the preprocessing of the dynamic data. First, the customer transaction data is weighted by profit; the method then obtains the profit-weighted time series. The original and the weighted time series reflect different properties of customers' purchasing behavior, so both time series are used to cluster the buying behavior groups. Second, the method applies descriptive statistical values of the time series to deal with the dynamic data. We only use some common features of the data: the trend, average, variance, kurtosis and skewness of the time series. This reduces the difficulty of data transformation and avoids unsuitable assumptions about the time series. Using this method, transaction data can be of different time periods and durations, so it is easy to extend to other fields. After weighting the transaction data, we obtain two data sets.

4. Phase II: Customer Transaction Data Clustering Using a Hybrid GA

The literature shows that GA performs better in cluster analysis than traditional algorithms [6]. This study uses a hybrid GA combined with a simulated annealing (SA) algorithm to make the GA more effective; the hybrid GA can prevent the GA from premature convergence. The key points of the proposed algorithm are described as follows.

4.1. Coding Method

We first code the cluster centers as the chromosome. So the chromosome is composed of K cluster centers of real numbers; this is not the basic GA coding of a 0/1 string. For the two time series of each customer we have 10 features, so each cluster center has ten genes. The chromosome is thus composed of 10·K real numbers, or 10·K genes, representing the K cluster centers; every ten real numbers represent one cluster center. The initial chromosomes are generated randomly.

4.2. Fitness Function

The K cluster centers are denoted as C_1, C_2, ..., C_K. The total measure of distance within the clusters can be calculated by formula (1):

M(C_1, C_2, ..., C_K) = \sum_{k=1}^{K} \sum_{X_j \in C_k} \| X_j - C_k \| ,   (1)

where X_j is the feature vector of customer X_j and the distance is the Euclidean distance. The fitness function is F = 1/M. The GA is then used to search for the K cluster centers that maximize F.
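A brief Python sketch of the cluster-distance measure M and the fitness F = 1/M of formula (1) follows (our own illustration; assigning each customer to its nearest center is an assumption about how cluster membership is determined).

import numpy as np

def cluster_measure(X, centers):
    """M of formula (1): each feature vector is counted towards its nearest center,
    and the Euclidean distances are summed over all clusters."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (n, K) distance matrix
    return d.min(axis=1).sum()

def fitness(X, centers):
    return 1.0 / cluster_measure(X, centers)   # F = 1/M, to be maximized by the GA

X = np.random.rand(100, 10)          # 100 customers, 10 time-series features
centers = np.random.rand(6, 10)      # a chromosome encoding K = 6 cluster centers
print(fitness(X, centers))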
4.3. Genetic Operators

(1) Selection: The selection operator used in this study is based on the widely used roulette wheel selection method.
(2) Crossover: We use one crossover point in the chromosomes. If the length of the chromosome is l, an integer is generated randomly in [1, l−1] to act as the crossover point. The genes to the right of the point are exchanged between the parents' chromosomes to generate the new generation.
(3) Mutation: Every chromosome is assigned a very small mutation probability. The mutation position is chosen in the same way as the crossover point. The mutation is performed as in (2):

x' = x + δ (x_max − x)   if δ ≥ 0,
x' = x + δ (x − x_min)   if δ < 0,   (2)

where x is the original value of the mutated gene and x_min, x_max is the range of x. δ is a coefficient associated with the fitness function; it is a random number drawn from the range (−R, R), with R calculated as in formula (3):

R = \frac{M − M_min}{M_max − M_min} ,   (3)

where M is the distance measure of the chromosome's cluster centers calculated by formula (1), and M_max and M_min are the maximum and minimum measures over all chromosomes' cluster centers. This kind of mutation keeps the cluster centers within the range of the data, and prevents well-fitted chromosomes from being destroyed. The algorithm terminates after the planned number of generations; the best individual in the population is the final solution. The termination condition can also be defined by the satisfaction of certain rules, or by the maturity of the population.

5. Phase III: Customer Classification Using Static and Dynamic Features

It is necessary to combine the cluster results of the dynamic data with the customer static attribute data to conduct the segmentation of customers. A back propagation (BP) neural network is used in this study to conduct the final classification. The neural network used here has one hidden layer. The number of input neurons equals the number of static attributes of the customer plus the dynamic attribute obtained from the GA; so the input comprises the basic profile of a customer and the dynamic feature from the GA results. The number of output neurons equals the number of customer groups. The accuracy rate of the classification is calculated as
P = \frac{1}{N} \sum_{i=1}^{N} δ(e_i, \hat{e}_i) ,   (4)
where δ = 1 only if e_i and ê_i are equal, otherwise δ = 0; e_i is the actual customer group label, ê_i is the group label assigned by the BP neural network, and N is the total number of customers.

6. Experiment Data Analysis for Bank Customer Classification

In order to test the classification framework and the proposed algorithm in detail, we use a sample customer dataset of a bank branch in China. The dataset has 1932 records of customers from 2002/1/1 to 2003/12/13. The customers had been classified into a high value group, a middle value group and a low value group. Each customer record has 34 attributes, such as name, gender, age, date of birth, income and education background. In total, 16 attributes are selected as the static inputs in the experiment. All the data is transformed into integer codes. For two-valued attributes we use '1' and '2'; for gender, '1' means male and '2' means female, etc. Other values, such as age, are transformed into ranges. Table 1 shows a part of the data used.

Table 1. A customer static attribute data example (excerpt; columns: customer no, gender, age range, city type, background, marriage, income, house loan, label)

2000317   1 2 2 1 3 ...
2000318   1 1 3 0 1 ...
2000319   2 1 2 2 ...
The bank provides 32 different kinds of financial products and services. As some of them are treated similarly in China (for example, withdrawing money at the counter or at an ATM), 23 kinds of products and services are considered. The profit rates of the different products and services are then calculated. Part of the transaction records of one customer is shown in Table 2. The transaction code indicates the transaction content. From the table, we can see that the time intervals are not equal.

Table 2. A customer dynamic data example

T date      T code   T amount
20030116    CS       -44268
20030120    MC       -559
20030128    F0       -7.6
20030203    CS       -10000
20030205    LN       -39000
20030205    LN       39000
20030206    SA       794.46
20030208    CS       -5000
20030210    CS       78000
We use the method proposed above to cluster the dynamic data. The hybrid GA control parameters are: population N = 50, mutation probability Pm = 0.03, initial temperature T0 = 100, and cooling parameter a = 0.9; the GA parameters are: number of generations E = 100 and crossover probability Pc = 1. All the customers' purchasing behaviors are classified into six groups, and the cluster number is finally assigned to each customer.

From the static customer data, 16 attributes are selected for the final classification. The dynamic data cluster result is then added as one further input to the neural network, and its effect is tested. The results are shown in Table 3. Compared to the classification using only static data, the proposed method obviously improves the classification accuracy.

Table 3. Comparison of classification accuracy rates

                          Combined   Only static data   Ratio
Training accuracy rate    85.3%      73.1%              1.17
Verifying accuracy rate   82.9%      69.4%              1.19
From Table 3 we can see that the combined method improves the accuracy on the training data by 17% and on the verifying data by 19%.

7. Conclusion and Further Study

This paper reports a new classification method combining the static and the dynamic data of customers. The test results show that the method can improve the classification accuracy ratio by nearly 20%. The method applies descriptive statistical features to handle data with different time intervals and time periods. As further study, more features can be included in the clustering process according to the characteristics of an application, such as chaos features, to further improve the classification rate.

References
1. L. C. Thomas, A survey of credit and behavioral scoring: forecasting financial risk of lending to consumers. Intl. J. of Forecasting, 16, 149-172 (2000).
2. K. Hammond, A. S. C. Ehrenberg and G. J. Goodhardt, Market segmentation for competitive brands. Eur. J. of Marketing, 30, 39-49 (1996).
3. C. Y. Tsai and C. C. Chiu, A purchase-based market segmentation methodology, Expert Systems with Applications, 27, 265-276 (2004).
4. N. C. Hsieh, An integrated data mining and behavioral scoring model for analyzing bank customers, Expert Systems with Applications, 27, 623-633 (2004).
5. R. G. Drozdenko and P. D. Drake, Optimal database marketing: Strategy, development, and data mining. London: Sage (2002).
6. C. A. Murthy and N. Chowdhury, In search of optimal clusters using genetic algorithms. Pattern Recognition Letters 17, 825-832 (1996).
TWO STAGE FUZZY CLUSTERING BASED ON KNOWLEDGE DISCOVERY AND ITS APPLICATION *

YEQIAN
School of Finance, Zhejiang University of Finance & Economics, Hangzhou, China

In order to reflect characteristic-type knowledge and mine data in the credit market, a two-stage classification method is adopted and a fuzzy clustering analysis is presented. First of all, the paper carries out attribute normalization of the multiple factors which influence bank credit, computes the fuzzy similarity relation coefficients, sets the threshold level α by considering the competition and social credit risk state in the credit market, and selects borrowers through the transitive closure algorithm. Second, it makes an initial classification of the samples according to the coefficient characteristics of the fuzzy relation. Third, it improves the fuzzy clustering method and its algorithm. Finally, the paper studies a case of credit knowledge mining in the financial market.
1. Introduction

Data mining is the procedure of identifying effective, novel, potentially useful and understandable patterns from large amounts of incomplete, noisy, fuzzy and random data. It adopts related techniques such as machine learning, mathematical statistics, neural networks, databases, pattern recognition, rough sets, fuzzy mathematics, etc. It is applied to classify or predict with model-based data mining, to summarize data, to cluster data, and to discover association rules, sequence patterns, dependence relations or models, etc. According to the maximum attribute rule, or adopting classification by degree of closeness, pattern recognition classifies a sample according to the principle of choosing the closest (degree-of-closeness law) over multiple factors (features). However, it is unnecessary to classify in this way when there are relatively many samples or when the cost of discerning classes is high; it suffices to group according to certain characteristics, which involves another kind of categorization method, namely clustering. Clustering plays an important role in data mining. There are many papers on knowledge discovery; the methods adopted focus on fuzzy methods, neural networks and their combination, for instance, flexible neuro-fuzzy

* This work is supported by Zhejiang University of Finance & Economics (Grant No. YJZ02), and partially supported by the National Science Foundation of China (Grant No. 70571068).
systems,hierarchical neuro-fuzzy systems hybrid, rough-neuro- fuzzy systems are mentioned(D. Rutkowska,2003);Vicen9 Torra (2003) describes fuzzy knowledge based systems and intelligent control on the light of chance discovery,proposes a multi-stage classification algorithm and a multi-expert classifier. Grzegorz Drwal and Marek Sikora (2004) presents system which tries to combine the advantages of rough sets methods and fuzzy sets methods to get better classification. A neuro-fuzzy System for the extraction of knowledge directly from data, and a toolbox developed in the Matlab environment for its implementation is discussed (G. Castellano, etc. 2003). 2. Description of the model and algorithm of its classification 2.1. Clustering analysis on fuzzy relation The basic thought of setting up model with fuzzy clustering is that given sample U determine a level, and classify U by considering level a .Its fuzzy uncertainty is expressed by a . The classification of U is confirmed after confirming a. proposition 1. let/? e F(Ux U) ,witha € ( 0 , 1 ] , R - is cut sets. If R is fuzzy equivalent relation on U, then R- is an equivalent relation on U . So R- can classify U, and then the classification obtained from this is called classification on level of a . The steps related to fussy cluster relation algorithm are as follows: (1) Normalization of the characteristic data (2) Similarity relation coefficient Here we establish fuzzy similar matrix which reflects analogical relation among each target, namely, R = (r )nxn Where r is correlation coefficient between object Ui and U,. There are many kinds of common methods. For example, the coefficient correlation law of index, fuzzy method, etc. Here we give solutions to the multi attributes between two targets with degree of close law, namely, | [(«,,«,) = ( VJUik A Ujk)) A ( 1 - A K . V K ^ ) )
i*j
(3) The classification based on fuzzy similar matrix can adopt three kinds of methods: First, law of weave network. Let R be a similar matrix with a e [0,1], then the procedure of categorizing a level: set matrix Ra , insert the diagonal the corresponding symbol, in the diagonal, replace below "*" with 1 and replace 0 by blank. Regarding "*" position as a joint, proceeds from guide vertical line
and horizontal line to the diagonal and bind them together. The sample elements that bind together while still can connect each other belong to one type; second is turning into tree with maximum; third is transmitting close bag law to make cluster analysis on basis of R ° R . 2.2. Dynamic cluster model based on uncertainty knowledge 2.2.1 Dynamic clustering of fuzzy ISODATA Classification obtained from above-mentioned methods is only a classification of taking extreme values. It is a kind of roughly and comparatively static division.While the law of dynamic cluster allows the mode sample to move from a connection type to another, it is an initial inaccurate division that is improved progressively, and it is a kind of heuristic method, striving to be optimal divided and reducing the calculating. Definition 1. let/?, n be two given straight integers with/?<w, and D = (,) is a fuzzy matrix which satisfies the following conditions: (1) For each k, £ dlk = 1 (2) For each i, ]T dlk > 0 Then D is called (p, «)- Fuzzy division. We denote all (P, n) - Fuzzy division &f (p, n) which can be bridged to be Af. Definition 2. let U= {«,,w2,••-,«,} cz 9?" be limited subset and \eip
;
D = «,)„„„ , C e 5R and C > 1. We define 2
•/(An = I2X)'l|v-*j
(2)
Then J(D, V) is called cluster membership function divided by Fuzzy. Here we provide the concrete cluster algorithm .In accordance with policymaker's intention and adopt fuzzy similar matrix to determine p.Go on to classify " with the simplest way and then get a classification which is regarded as initial Fuzzy classification Dm. We suppose D
e A ; (p,n) \\ J(D
,V
)-J(D
,V )\\<£
.„
then we define V-l)
V,
. *=i
(4)
Here we can obtain j / ( " = {v, ,v2 ,...,v } We go on to revises it and plan to take it place next time in order to have, Uk — vt and meanwhile we let
If for each /, we have
uk*v, t h e n let
rf'"
=•
(6) (7)
triiv-.ii Calculate J(D"'", V"~") and •/(£>'", K1"). If for each chosen £ , there is
\j{D^\V^)-J{D"\V(,))\<£ Then we stop and regard z>(,) and K"' as optimum classification and optimum cluster center. Otherwise we repeat above-mentioned steps. 3. Case Study In a certain period of 2004, it is quite difficult for a credit department of Zhejiang commercial bank to value the 10 borrower applicants from its own borrower applicants the credit grades directly from their public information. But in order to expand its business, the bank hopes to investigate their characteristic state of the 10 borrowing enterprises from the information that the bank has already known and considers to classify them in order to excavate valuable knowledge. By choosing a large number of information of this mode space and carrying on statistical analysis on historical credit materials and data that it owns, the bank examines and excavates five factors that influence the credit decision appraise: Financial statement, project or enterprise production and process technology, market supply and demand state of the products, commercial credit, credit function of management level, If the value of a borrower is positive, which indicates that the borrower is credible and safe. The dealing with ten applicants of this bank and their five respects attribute index can be described briefly as follows: (l)Collect, deal with and make statistical analysis on bank's historical credit materials and data.
(2) By carrying on regular treatment to characteristic datum of borrower, receive mode space which is formed by categorized targets (the samples ) U={w ; ,w 2 ,w 3 ,w 4 ,w 5 ,w 6 ,w 7 ,u 8 ,w 9 ,w 10 }. (3) Calculate the coefficient matrix R of the fuzzy relation between every borrower. Adopt transfer closure algorithm to calculate R ° R . 0.65
(10 × 10 fuzzy equivalence matrix t(R): unit diagonal, off-diagonal similarity coefficients ranging from 0.47 to 0.96.)
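As a hedged illustration of step (3) and of the α-cut classification used below (our own sketch, not the paper's software; R here stands for any symmetric fuzzy similarity matrix with unit diagonal):

```python
import numpy as np

def maxmin_compose(A, B):
    """Max-min composition: (A o B)_ij = max_k min(A_ik, B_kj)."""
    return np.max(np.minimum(A[:, :, None], B[None, :, :]), axis=1)

def transitive_closure(R, tol=1e-12):
    """Repeat R <- R o R until it stops changing, giving the fuzzy equivalence matrix t(R)."""
    while True:
        R2 = maxmin_compose(R, R)
        if np.max(np.abs(R2 - R)) < tol:
            return R2
        R = R2

def alpha_cut_groups(T, alpha):
    """Group sample indices whose similarity in t(R) is >= alpha."""
    n = T.shape[0]
    groups, seen = [], set()
    for i in range(n):
        if i in seen:
            continue
        members = [j for j in range(n) if T[i, j] >= alpha]
        seen.update(members)
        groups.append(members)
    return groups
```

Raising α from 0.85 towards 0.96 in `alpha_cut_groups` reproduces the progressively finer groupings described next.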
Classification. Different values of the level α give different numbers of classes and different classifications. At level α = 0.96, u_1 and u_5 belong to one group and the others make up the other group; at level α = 0.95, u_4 and u_8 join the group formed by u_1 and u_5; at level α = 0.92, u_3 and u_6 can be grouped together with the above types; at level α = 0.91, the above groups also absorb u_2 and u_9; at level α = 0.87, u_7 can be added to this group; and at level α = 0.85, u_10 is finally grouped into one type together with the other borrowers. According to the state of the credit market, we finally set the number of types to p = 3, namely: borrowers who are creditable or have security; borrowers whose prestige requires close attention (and who must offer security or a mortgage); and borrowers who should be refused a loan (cancelled by the red line). Setting C = 2 and ε = 0.001, we can compute the optimal (3, 10)-fuzzy partition matrix D:

(3 × 10 fuzzy partition matrix D; each column gives one borrower's membership degrees in the three groups, with 0/1 entries for the crisply assigned borrowers and partial memberships such as 0.523/0.477, 0.187/0.813, 0.088/0.912 and 0.306/0.694 for the others.)
The result shows that borrowers u_1 and u_5 belong to the "credit group"; borrowers u_7 and u_10 belong to the "cancelled by the red line" group; u_3 and u_6 belong to the "paying close attention" group; to a greater degree, u_2, u_4 and u_8 also belong to the "paying close attention" group; and to a much greater degree u_9 belongs to the "cancelled by the red line" group.
4. Conclusions

In the financial theory of information asymmetry, dividing borrowers into groups effectively is a comparatively effective way to reduce credit rationing. In the credit market, however, the data are incomplete and the information is fuzzy and uncertain, so such a complicated system is difficult to handle with classical and traditional theory. This paper has studied the problem of grouping borrowers in the credit market with fuzzy system theory. The case study shows that two-stage fuzzy clustering analysis is a workable approach to clustering a bank's customers.
APPLICATION OF SUPPORT VECTOR MACHINES TO THE MODELLING AND FORECASTING OF INFLATION*

MILAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & MEDIS Nitra, Ltd., Pri Dobrotke 659/81, 949 01 Nitra-Drazovce, Slovak Republic

DUSAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & Faculty of Management Science and Informatics, University of Zilina, 010 26 Zilina, Slovak Republic
In Support Vector Machines (SVM's), a non-linear model is estimated by solving a Quadratic Programming (QP) problem. Based on the work [1], we investigate the quantification of the parameters of an econometric structural model of inflation in the Slovak economy. The theory of the classical Phillips curve [7] is used to specify a structural model of inflation. We fit the models based on the econometric approach to inflation over the period 1993-2003 in the Slovak Republic, and use them as a tool to compare their approximation and forecasting abilities with those obtained using the SVM method. Some methodological contributions are made for SVM implementations in causal econometric modelling. The SVM methodology is extended to economic time series forecasting.
1. Introduction
This contribution considers the econometric modelling of inflation in the Slovak Republic. The main tools, techniques and concepts involved in the econometric modelling of inflation are based on the Phillips concept [7]. According to the Phillips inflation theory the inflation variable is generated on a set of underlying assumptions. In any case, the analysed inflation rates are explained by the behaviour of another variable or a set of variables, in our case by wages and unemployment as independent variables (see [1], [9]). In this paper the resulting SVM's are applied using the ε-insensitive loss function developed by V. Vapnik [11]. We motivate the approach by seeking a function which approximates the mapping from an input domain to the real numbers based on a small subset of training points. The paper is organized as follows.
* This work was supported by grants GACR 402/05/2768 and VEGA 1/2628/05.
The next section will provide a quick overview of the concepts of SVM theory. Section 3 analyses the data, discusses the statistical and SVM estimators, presents the inflation rate values fitted by the classical statistical method and by SVM models, discusses the circumstances under which SV regression outputs are conditioned, and considers the corresponding interpretation of the SV regression results. Section 4 extends the SVM methodology to economic time series forecasting. A section of conclusions closes the paper.

2. Support Vector Machines for Functional Approximation

This section briefly presents a relatively new type of learning machine, the SVM, applied to regression (functional approximation) problems. For details we refer to [2]. The general regression learning task is set as follows. The learning machine is given n training data, from which it attempts to learn the input-output relationship y = f(x), where {x_i, y_i ∈ ℝ^m × ℝ}, i = 1, 2, ..., n, consists of n pairs {(x_i, y_i)}_{i=1}^{n}. Here x_i denotes the i-th input and y_i is the i-th output. The SVM considers regression functions of the form

f(x) = Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x) + b        (1)

where α_i, α_i* are positive real constants (Lagrange multipliers, calculated by solving the Quadratic Programming (QP) problem via the saddle point of the Lagrangian [3]), b is a real constant, and ψ(·,·) is the kernel function. Admissible kernels have the following forms: ψ(x_i, x_j) = (x_i^T x_j + 1)^d (polynomial SVM of degree d), ψ(x_i, x_j) = exp(−θ ||x_i − x_j||²) (radial basis SVM), where θ is a positive real constant, and others (spline, B-spline, etc.). The SV regression approach is based on defining a loss function. There are different error (loss) functions in use, and each one results in a different final model. Next we will use Vapnik's ε-insensitive loss function [11]. Formally, this leads to solving the QP problem [3]. After computing the Lagrange multipliers α_i and α_i*, one obtains the form of (1) [5], i.e.

f(x) = Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x) + b = f(x, w) = w^T x + b        (2)

where w = (w_1, ..., w_n) are weights that are the subject of learning. Finally, b is computed by exploiting the Karush-Kuhn-Tucker (KKT) conditions [3], i.e.

b = y_k − Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x_k) − ε   for α_k ∈ (0, C),        (3)
b = y_k − Σ_{i=1}^{n} (α_i − α_i*) ψ(x_i, x_k) + ε   for α_k* ∈ (0, C).
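As a hedged illustration of this machinery (our own sketch using scikit-learn's SVR rather than the authors' modified Gunn software, with made-up data and parameter values), an ε-insensitive SV regression with an RBF kernel can be fitted and evaluated roughly as follows:

```python
import numpy as np
from sklearn.svm import SVR

# toy data standing in for the (x_i, y_i) training pairs of Section 2
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 4.0, size=(60, 1)), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=60)

# epsilon-insensitive SV regression with RBF kernel psi(x_i, x_j) = exp(-gamma ||x_i - x_j||^2)
model = SVR(kernel="rbf", C=10.0, epsilon=0.2, gamma=0.5)
model.fit(X, y)

y_hat = model.predict(X)
print("number of support vectors:", model.support_vectors_.shape[0])
print("R^2 on the training data:", model.score(X, y))
```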
3. Causal Models, Experimenting with Non-linear SV Regression
To study the modelling problem of inflation quantitatively, quarterly data from 1993Q1 to 2003Q4 were collected concerning the consumer price index CPI, aggregate wages W and unemployment U.
Figure 1. Natural logarithm of quarterly inflation from January 1993 to December 2003
Experimenting with linear transfer function models [1], the following reasonable causal model formulation was found:

CPI_t = 0.292 + 0.856 CPI_{t−1}        (4)
A graph of the historical values (CPI_t) and the fitted values of the causal inflation model (4) is presented in Figure 1. If CPI_t exhibits a curvilinear trend, one important approach to generating an appropriate model is to regress CPI_t against time. In Table 1 the SVR results for inflation were also calculated using an alternative time series model expressed by the following SVR form

CPI_t = Σ_{i=1}^{m} w_i ψ_i(x_t) + b        (5)

where, if x_t = (CPI_{t−1}, CPI_{t−2}, ...), Eq. (5) plays the role of the causal model (4), and if x = (1, 2, ..., 43), Eq. (5) is a time series model.
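A minimal sketch of how the two input choices in (5) might be prepared (our own illustration; `cpi` is assumed to be the quarterly log-inflation series as a NumPy array, and the SVR estimator from the previous sketch is reused):

```python
import numpy as np
from sklearn.svm import SVR

def causal_design(cpi, lags=1):
    """x_t = (CPI_{t-1}, ..., CPI_{t-lags}) with target CPI_t (causal form of Eq. (5))."""
    X = np.column_stack([cpi[lags - k - 1:len(cpi) - k - 1] for k in range(lags)])
    y = cpi[lags:]
    return X, y

def time_index_design(cpi):
    """x = (1, 2, ..., T) with target CPI_t (time series form of Eq. (5))."""
    X = np.arange(1, len(cpi) + 1, dtype=float).reshape(-1, 1)
    return X, cpi

cpi = np.log(np.linspace(1.2, 3.2, 44))          # placeholder data: 44 quarters, 1993Q1-2003Q4
X_causal, y_causal = causal_design(cpi, lags=1)
X_time, y_time = time_index_design(cpi)

svr_causal = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X_causal, y_causal)
svr_time = SVR(kernel="rbf", C=10.0, epsilon=0.2).fit(X_time, y_time)
print(svr_causal.score(X_causal, y_causal), svr_time.score(X_time, y_time))
```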
One crucial design choice is deciding on a kernel. Creating good kernels often requires lateral thinking: many measures of similarity between inputs have been developed in different contexts, and understanding which of them can provide good kernels depends on insight into the application domain. Figure 2 shows SVM learning using various kernels.
Figure 2. Training results for different kernels, loss functions and σ of the SV regression (see Table 1). The original function values (plus points), the estimated functions (full line) and the ε-tube (dotted lines) are shown. Figs. 2a, c, d, e and f correspond to a good choice of the parameters; Fig. 2b corresponds to a bad choice.
In Fig. 2a we have a piecewise-linear approximating function, while in Figs. 2b and 2c we have more complicated approximating functions. Both functions agree with the training points, but they differ in the y values they assign to other x inputs. The functions in Figs. 2d and 2e apparently ignore some of the example points but are good for extrapolation. The true f(x) is unknown, and without further knowledge we have no way to prefer one of them, and hence no way to resolve the design problem of choosing an appropriate kernel in our application. For example, the objective in pattern classification from sample data is to classify and predict new data successfully, while the objective in control applications is to approximate non-linear functions, or to make unknown systems follow the desired response. Table 1 presents the results of finding the proper model by using the quantity R² (the coefficient of determination) for the best approximation of the inflation rate in our application. As shown in Table 1, the "best" R² is 0.9999, for the time series model with the RBF kernel and quadratic loss function. Among the causal models the best R² is 0.9711, obtained with the exponential RBF kernel and the ε-insensitive loss function (standard deviation σ = 0.52). The choice of σ was made in response to the data: in our case, the CPI time series has σ = 0.52. The radial basis function defines a spherical receptive field in the input space and the variance σ² localises it. The results shown in Table 1 were obtained using the ε-insensitive loss function (ε = 0.2), with different kernels and capacity C = 10. We used partly modified software developed by Steve R. Gunn [4] to train the SV regression models. SV regression is a powerful tool for the solution of many economic problems: it can provide extremely accurate approximation of time series, and the solution to the problem is global and unique. However, these approaches have several limitations. In general, as can be seen from the QP formulation, the size of the matrix involved in the QP problem is directly proportional to the number of training data.

Table 1. The SV regression results for different choices of kernel on the training set (1993Q1 to 2003Q4). In the last two columns the approximation and extrapolation performances are analysed. See text for details.

Fig. 2   MODEL             KERNEL        LOSS FUNCTION    R²       RMSE
a        causal (5)        exp. RBF      ε-insensitive    0.9711   0.0456
b        causal (5)        RBF           ε-insensitive    0.8525   0.0090
c        causal (5)        RBF           ε-insensitive    0.9011   0.0497
d        causal (5)        polynomial    ε-insensitive    0.7806   0.0191
e        causal (5)        polynomial    ε-insensitive    0.7860   0.0179
f        time series (5)   RBF           quadratic        0.9999   0.5556
-        dynamic (4)       -             -                0.7762   0.0187
For this reason there are many computing problems in which general quadratic programs become intractable in their memory and time requirements. To solve these problems, many modified versions of SVM's have been introduced. For example, a generalized version of the decomposition strategy is proposed by Osuna et al. [6]; the so-called SVM^light proposed by Thorsten [10] is an implementation of an SVM learner which addresses the problem of large tasks; and finally, in [8] a modified version of SVM's, the so-called least squares SVM's (LS-SVM's), is introduced for classification and non-linear function estimation.

4. Forecasting with SV-regression Models

Unfortunately, the SVM method does not explicitly define how a forecast is determined; the point estimates of the fitted model are simple values without any degree of confidence attached to the results. Despite this fact, point estimates for large data sets can be calculated. The entire data set is partitioned into two distinct data sets: the training data set, i.e. the sample period for analysis, and the validation data set, the time period from the first observation after the end of the sample period to the most recent observation. The parameters C, σ, ε must be tuned as follows. First, an SV machine is estimated on the training set by solving the QP. Second, its performance is evaluated on the validation set. The parameter set with the best performance on the validation set is chosen. With these parameters, the point estimates of the ex-post forecast may be calculated by simply putting the values of the validation vectors x_j^v and the training vectors x_i into the following SV regression

f(x) = Σ_{i=1}^{T} (α_i − α_i*) ψ(x_i, x_j^v) + b,   j = 1, ..., τ        (6)
where T denotes the end of the sample training period, τ is the forecasting horizon, i.e. the number of data points in the validation set, α_i, α_i* are known real constants (Lagrange multipliers), b is a known parameter (bias), x_j^v is a vector of the inputs, f(x) are the point estimates or forecasts of the series y_t predicted at the point x^v = (x_1^v, x_2^v, ..., x_m^v), and ψ(·,·) denotes the admissible kernel function used in the fitting phase of the SV regression model. An obvious limitation to the use of causal models is the requirement that the independent variables must be known at the time the forecast is made. In our case, the new CPI value is correlated with the CPI value one quarter previous. This fact may be used to obtain one-step-ahead forecasts of the CPI value operating on a moving-horizon basis.
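Before turning to the details, a rough sketch of such a moving-horizon, one-step-ahead forecasting loop (our own illustration, assuming a fitted SVR object `svr` from the earlier sketches and a NumPy array `cpi` holding the quarterly series; this is not the authors' code):

```python
import numpy as np

def rolling_one_step_forecasts(svr, cpi, n_train, horizon):
    """One-step-ahead forecasts on a moving horizon: each quarter the newest observed
    CPI value becomes the regressor for the next forecast (cf. Eq. (6))."""
    forecasts = []
    for step in range(horizon):
        t = n_train + step                     # current period T, then T+1, ...
        x_new = cpi[t - 1].reshape(1, 1)       # regressor CPI_T used to forecast CPI_{T+1}
        forecasts.append(svr.predict(x_new)[0])
    return np.asarray(forecasts)

def rmse(actual, predicted):
    return float(np.sqrt(np.mean((np.asarray(actual) - np.asarray(predicted)) ** 2)))

# ex-ante evaluation over, say, the last three quarters of the validation set:
# preds = rolling_one_step_forecasts(svr_causal, cpi, n_train=41, horizon=3)
# print(rmse(cpi[41:44], preds))
```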
Generally, we denote the current period by T and suppose that we will forecast the series y_t in period T + τ (T = n, τ = 1). The forecast of the future observation CPI_{T+1} is generated from Eq. (6) by replacing the vector of the independent variable x^v with CPI_T. As a new observation becomes available, we set the new current period T + 1 to T and compute the next forecast again according to Eq. (6). In situations where the independent variables are mathematical functions of time, the point estimates or forecasts of CPI_{T+τ} are simply the values of Eq. (6) at the points x^v = (T + 1, T + 2, ..., T + τ). As three new observations became available, the ex ante summary forecast statistics (RMSE) could be calculated. The RMSE statistics generated by each SVM model and by the dynamic model (4) are given in Table 1. As illustrated in Table 1, a curve fitted with many parameters follows all fluctuations (the R² values increase) but is poor for extrapolation (the RMSE values increase too, i.e. the forecast accuracy decreases). The model in Fig. 2b gives the best predictions outside the estimation period and clearly dominates the other models. It should be pointed out that we are ranking the seven models within one category of forecast summary statistics. This is not a statistical test between models, but one way of trying to determine subjectively which of the models best generates the data of the inflation process.

5. Conclusion

In this paper we have examined the SVM approach to studying linear and non-linear models on a time series of inflation in the Slovak Republic. To assess approximation abilities we evaluated eight models. Two models are based on causal multiple regression in time series analysis, and six models are based on the Support Vector Machines methodology. Using the available data, a very appropriate econometric model is the regression (4), in which the lagged dependent variable CPI_{t−1} can substitute for the inclusion of other lagged independent variables (W_t, U_t). The benchmarking was performed between traditional statistical approaches and SVMs in regression approximation tasks. The SVM approach was illustrated on the regression function (4), which was developed by statistical tools. As is visually clear from Figure 2, this problem was readily solved by SV regression with excellent approximation. Finally, the paper has made some methodological contributions for SVM implementations in causal econometric modelling and has extended the SVM methodology to economic time series forecasting.
References
1. J. Adamda, M. Marcek, L. Pancikova, Some results in econometric modelling and forecasting of the inflation in Slovak economics, Journal of Economics, 52, No. 9, 1080-1093 (2004).
2. N. Cristianini, J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University Press (2000).
3. R. Fletcher, Practical Methods of Optimization, John Wiley and Sons, Chichester and New York (1987).
4. S. R. Gunn, Support Vector Machines for Classification and Regression, Technical Report, Image Speech and Intelligent Systems Research Group, University of Southampton (1997).
5. V. Kecman, Learning and Soft Computing, The MIT Press, Cambridge, Massachusetts, London, England (2001).
6. E. Osuna, R. Freund, F. Girosi, An improved training algorithm for support vector machines, in J. Principe, L. Gile, N. Morgan and E. Wilson (eds.), Neural Networks for Signal Processing VII - Proceedings of the 1997 IEEE Workshop, New York (1997).
7. A. W. Phillips, The relation between unemployment and the rate of change of money wages in the United Kingdom, 1861-1957, Economica, November (1958).
8. J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, J. Vandewalle, Least Squares Support Vector Machines, World Scientific Pub. Co., Singapore (2002).
9. V. Pankova, Do wages affect inflation in the Czech Republic?, Mathematical Methods in Economics - MME'97, VSB Technical University Ostrava, 156-160 (1997).
10. J. Thorsten, Making large-scale SVM learning practical, in Advances in Kernel Methods - Support Vector Learning, chapter 11, MIT Press (1999).
11. V. Vapnik, The support vector method of function estimation, in Nonlinear Modelling: Advanced Black-Box Techniques, J. A. K. Suykens, J. Vandewalle (eds.), Kluwer Academic Publishers, Boston, 55-85 (1998).
ASSESSING THE RELIABILITY OF COMPLEX NETWORKS: EMPIRICAL MODELS BASED ON MACHINE LEARNING

CLAUDIO M. ROCCO S.
Universidad Central de Venezuela, Facultad de Ingenieria, Caracas, Venezuela. crocco@reacciun.ve

MARCO MUSELLI
Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni, Consiglio Nazionale delle Ricerche, Genova, Italy. marco.muselli@ieiit.cnr.it
Abstract In this paper three models derived using Machine Learning techniques (Support Vector Machines, Decision Trees and Shadow Clustering) are compared for approximating the reliability of real complex networks, such as for water supply, electric power or gas distribution systems or telephone systems, using different reliability criteria.
1. Introduction
In the reliability community, two main categories of evaluation techniques are considered: in the analytical approach the system is analyzed by deriving a closed-form expression for its reliability, for example by determining the minimal cut or path sets. On the other hand, simulation techniques, among which are methods based on Monte Carlo estimation, are adopted when complex operating conditions are considered [1]. This last approach is usually employed to evaluate the reliability of real engineering systems, since analytical approaches are computationally complex (NP-hard). The expected value of a System Function [2] or of an Evaluation Function (EF) [3], depending on the system state x (the vector representing the state of each element), is normally used as a reliability index. In fact, the EF determines whether a specific configuration x corresponds to an operating state or to a failed one [4]. In simulation techniques based on Monte Carlo estimation, the evaluation of the system reliability is performed by: 1) randomly sampling a large number of
states x, 2) applying an appropriate EF to assess whether in each sampled state x the system succeeded or failed, and 3) estimating the expected value of the EF. The definition of the EF depends on the success criterion to be used. For example, to evaluate the connectivity between two nodes, a depth-first search procedure can be used. Other criteria can require more time-consuming procedures. Since in Monte Carlo estimation a large number of EF evaluations must be performed [4], it is convenient to obtain a valid approximation f(x) of the EF using a Machine Learning (ML) technique, and then apply f(x) to assess the system behavior in each sampled state x. Two different ML approaches have been used: predictive methods (e.g. Neural Networks or Support Vector Machines (SVM)), which adopt a black-box device whose functioning is not directly comprehensible, and descriptive methods (e.g. Decision Trees (DT) or Shadow Clustering (SC)), which provide a set of intelligible rules underlying the problem at hand. The rest of the paper is organized as follows: in Sec. 2 the analyzed problem is introduced together with the three machine learning techniques (SVM, DT and SC) considered for approximating the EF; Sec. 3 compares the results obtained by each method for three networks; and Sec. 4 contains the conclusions.

2. The Machine Learning approach to reliability evaluation

It is assumed that system components have two states, operating and failed, coded by the integers 1 and 0, respectively, and that component failures are independent events. The state x_i of the i-th component is defined as [5]:

x_i = 1 (operating state) with probability P_i
x_i = 0 (failed state)    with probability q_i = 1 − P_i

where P_i is the probability of success of component i. The state of a system containing d components is then expressed by a vector x = (x_1, x_2, ..., x_d). To establish whether x is an operating or a failed state for the network, we employ a proper Evaluation Function (EF):

EF(x) = 1 if the system is operating in state x
EF(x) = 0 if the system is failed in state x
Suppose that a sample, called the training set, containing N pairs (x_j, y_j) is available, where y_j = EF(x_j); a Machine Learning (ML) technique can then be used to retrieve a good approximation f(x) of the unknown evaluation function EF(x) of the system at hand.
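As a hedged sketch of the setting just described (our own illustration on a small made-up network, not the authors' code), a two-terminal connectivity EF based on depth-first search and a crude Monte Carlo reliability estimate might look like this:

```python
import random

EDGES = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]   # toy network, components = links
P = [0.9] * len(EDGES)                              # success probability of each link

def ef_connectivity(x, source=0, terminal=3):
    """EF(x): 1 if source and terminal are connected using only operating links."""
    adj = {}
    for (u, v), state in zip(EDGES, x):
        if state:                                   # keep only operating components
            adj.setdefault(u, []).append(v)
            adj.setdefault(v, []).append(u)
    stack, seen = [source], {source}
    while stack:                                    # depth-first search
        node = stack.pop()
        if node == terminal:
            return 1
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return 0

def monte_carlo_reliability(n_samples=20000, seed=1):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        x = [1 if rng.random() < p else 0 for p in P]   # sample a system state
        hits += ef_connectivity(x)
    return hits / n_samples                             # estimate of E[EF]

print(monte_carlo_reliability())
```

Sampled pairs (x, EF(x)) of exactly this kind form the training set used by the ML approximators discussed next.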
The behavior of one predictive method, Support Vector Machines (SVM), and of two descriptive methods, Decision Trees (DT) and Shadow Clustering (SC), is analyzed in the following.

2.1. Support Vector Machines [6]

Suppose the values +1 and −1 are adopted for the output y of the evaluation function EF(x) instead of 1 and 0. Denote by S⁺ (resp. S⁻) the convex hull of the points x_j in the training set with corresponding output y_j = +1 (resp. y_j = −1). If S⁺ and S⁻ are linearly separable, we can construct the optimal hyperplane w·x + b = 0, which has maximum distance from these two convex hulls. The vector w and the quantity b, usually referred to as the weight vector and the bias, can be derived by solving the following quadratic programming problem:

min_{w,b} (1/2) w·w
subject to y_j (w·x_j + b) ≥ 1   for every j = 1, ..., N
Once we have found the optimal hyperplane, we simply determine on which side of the decision boundary a given test pattern x lies and assign the corresponding class label, using the function sgn(w·x + b). Equivalently, the weight vector w and the bias b of the optimal hyperplane can be found by searching for the values of the Lagrange multipliers α_j in the Wolfe dual problem. In this case we have w = Σ_j α_j y_j x_j. Only those points which lie closest to the hyperplane have α_j > 0 and contribute to the above sum. These points are called support vectors and capture the essential information about the training set at hand. If the two convex hulls S⁺ and S⁻ are not linearly separable, the optimal hyperplane can still be found by accepting a small number of misclassified points in the training set. A regularization factor C accounts for the trade-off between training error and distance from S⁺ and S⁻. To adopt non-linear separating surfaces between the two classes, we can project the input vectors x_j into another high-dimensional feature space through a proper mapping φ(·). If we employ the Wolfe dual problem to retrieve the optimal hyperplane in the projected space, it is not necessary to know the explicit form of the mapping φ. We only need the inner product φ(x)·φ(x') for every pair of input vectors x, x'; a proper symmetric positive definite kernel function K(x, x') = φ(x)·φ(x') can be used for this purpose.
The need to properly choose the kernel is a limitation of the support vector approach. In general, the SVM with lower complexity should be selected.

2.2. Decision Trees

Decision-tree-based methods represent a non-parametric approach that turns out to be useful in the analysis of large data sets for which complex data structures may be present [7]. A DT uses a divide-and-conquer strategy: it attacks a complex problem by dividing it into simpler sub-problems and recursively applying the same strategy to solve each of these sub-problems [8]. Every node in a DT is associated with a component of the network, whose current state is to be examined. From each node start two branches, corresponding to the two different states of that component. Every terminal node (or leaf node) is associated with a class, determining the network state: operating or failed. Conventionally, the false branch (failed state of the component) is positioned on the left and the true branch (operating state of the component) on the right. DT methods usually exploit heuristics that locally perform a one-step lookahead search; once a decision is taken it is never reconsidered. This hill-climbing search without backtracking is susceptible to the usual risk of converging to locally optimal solutions that are not globally optimal. On the other hand, this strategy allows building decision trees in a computation time that increases linearly with the number of examples [8]. Different algorithms for constructing decision trees essentially follow a common approach, called top-down induction; the basic outline is [9]:
1. If all the examples in the training set belong to one class, then halt.
2. Consider all the possible tests that divide the training set into two or more subsets. Score each test according to how well it splits up the examples.
3. Choose the test that achieves the highest score.
4. Divide the examples into subsets and run this procedure recursively on each subset, considering it as the current training set.

2.3. Shadow Clustering [10-11]

Shadow Clustering (SC) is a rule generation method, based on monotone Boolean function reconstruction, which is able to achieve performances comparable to those of the best classification techniques. The decision function built by SC can be expressed as a collection of intelligible rules in the if-then form, underlying the classification problem. In addition, as a byproduct of the training process, SC is able to determine redundant input variables for the analysis at
hand, thus allowing a significant simplification of the data acquisition process. SC proceeds by grouping together binary strings that belong to the same class and are close to each other according to a proper definition of distance. A basic concept in the procedure followed by SC is the notion of cluster. A cluster is the collection of all the binary strings having the value 1 in a fixed subset of components; as an example, the eight binary strings '01001', '01011', '01101', '11001', '01111', '11011', '11101', '11111' form a cluster since all of them have the value 1 in the second and in the fifth component. The procedure employed by SC consists of the following four steps:
1. Choose at random an example (x_j, y_j) in the training set.
2. Build a cluster of points including x_j and associate that cluster with the class y_j.
3. Remove the example (x_j, y_j) from the training set. If the construction is not complete, go to Step 1.
4. Simplify the set of clusters generated and build the corresponding monotone Boolean function.
An important characteristic of this technique is that the execution of SC does not involve the tuning of any parameter.

3. Example

To evaluate the performance of the methods presented in the previous sections, the three networks shown in Figs. 1-3 have been considered. It is assumed that all links have a reliability of 0.90. For Network 1 [12], it is assumed that each link has a capacity of 100 units. A system failure occurs when the flow at the terminal node t falls below 200 units (a max-flow min-cut algorithm is used to establish the value of the EF). Network 2 [13] has 20 nodes and 30 double links. The goal is to evaluate the connectivity between the source node s and the terminal node t. Finally, Network 3 represents the 52 nodes and 72 double links of the Belgian telephone network [14]. The success criterion used is the all-terminal reliability (defined as the probability that every node of the network can communicate with every other node). In order to apply a classification method it is first necessary to collect a set of examples (x_j, y_j), where y_j = EF(x_j), to be used in the training phase and in the subsequent performance evaluation of the resulting set of rules. To this aim, 50000 system states have been randomly selected without replacement and for each of them the corresponding value of the EF has been retrieved. To analyze how the size of the training set influences the quality of the solution provided by each method, 13 different cases were analyzed, with 1000 to 25000 examples in the training set.
Fig. 1. Network 1 [12]
Fig. 2. Network 2 [13]
Fig. 3. Network 3 [14]

These examples were randomly extracted with uniform probability from the whole collection of 50000 system states; the remaining pairs were then used to test the accuracy of the model produced by the machine learning technique. An average over 30 different choices of the training set for each size value was then performed to obtain statistically relevant results. The performance of each model is evaluated using the standard measures of sensitivity, specificity and accuracy [15]. For reliability evaluation, sensitivity gives the percentage of correctly classified operational states and specificity gives the percentage of correctly classified failed states. Different kernels were tried when generating the SVM model and it was found that the best performance is achieved with a Gaussian radial basis function (GRBF) kernel having parameter 1/(2σ²) = 1/d. Figure 4 shows the comparison of results regarding accuracy, sensitivity and specificity during the testing phase. As expected, the index under study for each model increases with the size of the training set. SC almost always has the best behavior for all the indices. However, for Network 2 the specificity index obtained by DT behaves better. In [16] the previous ML methods are compared in terms of the reliability of the networks. As expected, the best assessment is obtained using SC.
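A small sketch of how these three indices might be computed from a test set (our own illustration; `y_true` are EF values and `y_pred` the model's predictions, both coded 0/1):

```python
def sensitivity_specificity_accuracy(y_true, y_pred):
    """Sensitivity: fraction of correctly classified operating states (1s);
    specificity: fraction of correctly classified failed states (0s)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    accuracy = (tp + tn) / len(y_true)
    return sensitivity, specificity, accuracy

print(sensitivity_specificity_accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```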
Fig. 4. Performance results during the testing phase for each network.
4. Conclusions
This paper has evaluated the capability of three machine learning techniques (SVM, DT and SC) as approximating tools that can be used to assess the reliability of a complex network. For the three networks studied, the SC procedure seems to be more stable when the three indices are considered simultaneously. It is important to realize that SVM produces a model that cannot be written in the form of a logical sum-of-products involving the system components, whereas DT and SC are able to obtain it, even from a small training set, thus providing information about minimum paths and cuts [16].

References
1. Billinton R., Li W.: Reliability Assessment of Electric Power Systems Using Monte Carlo Methods, Plenum Press, 1994.
2. Dubi A.: Modeling of realistic systems with the Monte Carlo method: A unified system engineering approach, Proceedings of the Annual Reliability and Maintainability Symposium, Tutorial Notes, 2001.
3. Pereira M. V. F., Pinto L. M. V. G.: A new computational tool for composite reliability evaluation, IEEE Power System Engineering Society Summer Meeting, 1991, 91SM443-2.
4. Pohl E. A., Mykyta E. F.: Simulation modeling for reliability analysis, Proceedings of the Annual Reliability and Maintainability Symposium, 2000.
5. Billinton R., Allan R. N.: Reliability Evaluation of Engineering Systems, Concepts and Techniques (second edition), Plenum Press, 1992.
6. Cristianini N., Shawe-Taylor J.: An Introduction to Support Vector Machines, Cambridge University Press, 2000.
7. Breiman L., Friedman J. H., Olshen R. A., Stone C. J.: Classification and Regression Trees, Belmont: Wadsworth, 1994.
8. Portela da Gama J. M.: Combining Classification Algorithms, PhD Thesis, Faculdade de Ciencias da Universidade do Porto, 1999.
9. Quinlan J. R.: C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, 1993.
10. Muselli M., Quarati A.: Reconstructing positive Boolean functions with Shadow Clustering, in Proceedings of the 17th European Conference on Circuit Theory and Design (ECCTD 2005), Cork, Ireland, 2005.
11. Muselli M.: Switching neural networks: A new connectionist model for classification, Proceedings of the 16th Italian Workshop on Neural Networks, Vietri sul Mare, Italy, 2005.
12. Yoo Y. B., Deo N.: A comparison of algorithms for terminal-pair reliability, IEEE Transactions on Reliability, 37, 1988, 210-215.
13. Chaturvedi S. K., Misra K. B.: An efficient multi-variable algorithm for reliability evaluation of complex systems using path sets, International Journal of Reliability, Quality and Safety Engineering, 3, 2002, 237-259.
14. Manzi E., Labbe M., Latouche G., Maffioli F.: Fishman's sampling plan for computing network reliability, IEEE Transactions on Reliability, R-50, 2001, 41-46.
15. Veropoulos K., Campbell C., Cristianini N.: Controlling the sensitivity of Support Vector Machines, Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 1999, 55-60.
16. Rocco C. M., Muselli M.: Machine learning models for reliability assessment of communication networks, submitted to IEEE Transactions on Neural Networks.
FUZZY TIME SERIES MODELLING BY SCL LEARNING*

MILAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & MEDIS Nitra, Ltd., Pri Dobrotke 659/81, 949 01 Nitra-Drazovce, Slovak Republic

DUSAN MARCEK
Faculty of Philosophy and Science, Silesian University, 746 01 Opava, Czech Republic & Faculty of Management Science and Informatics, University of Zilina, 010 26 Zilina, Slovak Republic
Based on the works [8] and [16], a fuzzy time series model is proposed and applied to predict a chaotic financial process. The general methodological framework of classical and fuzzy modelling of economic time series is considered. A complete fuzzy time series modelling approach is proposed. To generate fuzzy rules from data, a neural network with Supervised Competitive Learning (SCL)-based product-space clustering is used.
1. Introduction
Much of the literature in the field of fuzzy logic and technology is focused on modelling dynamic processes with linguistic values as their observations (see e.g. [11]). Such a dynamic process is called a fuzzy time series. This type of dynamic process plays a very important role in practical applications. Economic and statistical time series analysis is concerned with the estimation of relationships among groups of variables, each of which is observed at a number of consecutive points in time. The relationships among these variables may be complicated. In particular, the value of each variable may depend on the values taken by many others in several previous time periods. Very often it is difficult to express these dependencies exactly, or no hypothesis for them is known. In such cases more sophisticated approaches are frequently considered. These approaches are based on human expert knowledge and consist of a series of linguistic expressions, each of which takes the form of an 'if ... then ...' fuzzy rule; they are well known under the common name fuzzy controllers. However, an expert is usually unable to describe linguistically the behaviour of economic processes in particular situations.
* This work was supported by grants GACR 402/05/2768 and VEGA 1/2628/05.
Hence, most recent research on the design of fuzzy controllers for deriving linguistically interpreted fuzzy rules has centred on developing automatic methods to build these fuzzy rules using a set of numerical input-output data. The majority of these models and data-driven techniques rely on the use of Takagi-Sugeno type controllers and fuzzy/non-fuzzy neural networks [6], [7], [18], and on clustering/fuzzy-clustering and genetic algorithm approaches [4], [6], [8], [9], [17], [19]. The goal of this paper is to illustrate that two distinct areas, i.e. fuzzy sets theory and computational networks, may be used for economic time series modelling. We show how to use and how to incorporate both fuzzy sets theory and computational networks to determine the fuzzy relational equations. As an application of the proposed method, an estimate of inflation is carried out in this paper. The characterisation of time series is introduced in Section 2. Quantitative modelling methods of time series are presented in Sections 3 and 4. Concluding remarks are offered in Section 5.

2. Conventional and fuzzy time series

Time series models are based on the analysis of a chronological sequence of observations on a particular variable. Typically, in conventional time series analysis, we assume that the generating mechanism is probabilistic and that the observed values {x_1, x_2, ..., x_t, ...} are realisations of stochastic processes {X_1, X_2, ..., X_t, ...}. In contrast to conventional time series, the observations of a fuzzy time series are fuzzy sets (the observations of a conventional time series are real numbers). Song and Chissom [16] give a thorough treatment of these models. They define a fuzzy time series as follows. Let Z_t (t = ..., 1, 2, ...), a subset of ℝ, be the universe of discourse on which fuzzy sets x^i_t (i = 1, 2, ...) are defined, and let X_t be the collection of the x^i_t (i = 1, 2, ...). Then X_t (t = ..., 1, 2, ...) is called a fuzzy time series on Z_t (t = ..., 1, 2, ...).

3. Quantitative time series modelling methods

In practice, there are many time series in which successive observations are dependent. This dependence can be treated here as an observational relation

R_O = {(y_{t−1}, y_t), (y_{t−2}, y_{t−1}), ...} ⊆ Y_{t−1} × Y_t        (1)

where Y_t, Y_{t−1} denote the variables and y_t, y_{t−1}, ... denote the observed values of Y_t and Y_{t−1} respectively.
In most real economic processes it is assumed that there exists a functional structure between Y_{t−1} and Y_t, i.e.

f: Y_{t−1} → Y_t        (2)
belonging to a prespecified class of mappings [5]. In practice many real models of this functional structure are represented by the linear relation

y_t = f(y_{t−1}).        (3)

Suppose that for the observations there exists a fuzzy relation R_O(t, t−1) such that

y^j_t = y^i_{t−1} ∘ R_O(t, t−1),        (4)

where y^j_t ∈ Y_t, y^i_{t−1} ∈ Y_{t−1}, i ∈ I, j ∈ J, I and J are index sets for Y_t and Y_{t−1} respectively, "∘" is the sign for the max-min composition, and R_O(t, t−1) is the fuzzy relation among the observations at times t and t−1. Then Y_t is said to be caused by Y_{t−1} only, i.e.

y^i_{t−1} → y^j_t        (5)
Y_{t−1} → Y_t        (6)

and

Y_t = Y_{t−1} ∘ R(t, t−1),        (7)

where R(t, t−1) denotes the overall relation between Y_t and Y_{t−1}. In the fuzzy relational equation (7) the overall relation R(t, t−1) is calculated as the union of the fuzzy relations R_ij(t, t−1), i.e. R(t, t−1) = ∪_{ij} R_ij(t, t−1), where "∪" is the union operator. In the following we will use Mamdani's method [10] to determine these relations. For simplicity, in the following discussion, we can also
express y^i_{t−1} and y^j_t as the values of the membership functions of the fuzzy sets y^i_{t−1} and y^j_t respectively. Since Eq. (4) is equivalent to the linguistic conditional statement

"if y^i_{t−1} then y^j_t",        (8)

we have R_ij(t, t−1) = y^i_{t−1} × y^j_t, where "×" is the Cartesian product, and therefore

R(t, t−1) = max_{i,j} { min(y^i_{t−1}, y^j_t) }.        (9)
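A small numerical sketch of (9) and of the composition (7) (our own illustration with made-up membership vectors, not the paper's data):

```python
import numpy as np

def mamdani_relation(mu_prev, mu_next):
    """R_ij = min(mu_prev_i, mu_next_j): Cartesian product of two fuzzy sets, cf. Eq. (9)."""
    return np.minimum.outer(mu_prev, mu_next)

def maxmin_compose(mu, R):
    """Forecast memberships: (mu o R)_j = max_i min(mu_i, R_ij), cf. Eq. (7)."""
    return np.max(np.minimum(mu[:, None], R), axis=0)

# membership degrees of the observations y_{t-1} and y_t over seven linguistic values
mu_t_minus_1 = np.array([0.0, 0.2, 0.8, 1.0, 0.3, 0.0, 0.0])
mu_t         = np.array([0.0, 0.0, 0.3, 1.0, 0.7, 0.1, 0.0])

R = mamdani_relation(mu_t_minus_1, mu_t)    # one rule's relation; several rules combine by max (union)
mu_forecast = maxmin_compose(mu_t, R)       # membership vector of the forecast for the next period
print(mu_forecast)
```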
Referring to the above definition by Song and Chissom of a fuzzy time series, in the fuzzy time series model Y_t, Y_{t−1} can be understood as linguistic variables and y^j_t, y^i_{t−1} as the possible linguistic values of Y_t and Y_{t−1} respectively. Equation (7) is called a first-order model of the fuzzy time series Y_t with lag p = 1. This first-order model can be extended to the p-th order model; see [16] for details.

4. Quantitative time series modelling methods

All the above fuzzy time series models can be determined if the fuzzy relations in the particular models are known. Since finding the exact solution of the fuzzy relations is generally very difficult and in practice unrealistic, more sophisticated approaches are considered very frequently. In a fuzzy system, neural networks are a powerful tool for generating fuzzy rules purely from data. Neural networks can adaptively generate the fuzzy rules in a fuzzy system by the SCL-based product-space clustering technique [8]. Next, in a numerical example, we will illustrate and show how to obtain fuzzy rules using fuzzy sets theory and neural networks. Let us consider a simple example. The data set used in this example consists of the 514 monthly inflation rates in the U.S. A graph of the historical values of inflation is presented in Fig. 1. To build a forecast model, the sample period for analysis y_1, ..., y_344
was defined. The following statistical model was specified:

y_t = ξ + φ y_{t−1} + ε_t        (10)
where the variable y_t is explained only by its previous values and ε_t is a white noise disturbance term. Using the Levinson-Durbin algorithm [2], [13], the model (10) is statistically fitted as

y_t = −0.1248 y_{t−1}.        (11)
The fuzzy time series modelling procedure consists of an implementation of several steps.
Figure 1. Natural logarithm of monthly inflation (514 observations).
First, we specified the input and output variables. The input variable x_{t−1} is the lagged first difference of the inflation values {y_t}. The output variable x_t is the first difference of the inflation values {y_t}. The variable ranges are −0.75 ≤ x_t, x_{t−1} ≤ 0.75. These ranges define the universe of discourse within which the data of x_{t−1} and x_t lie, and on which the fuzzy sets have to be specified. Next, we specified the fuzzy-set values of the input and output fuzzy variables. Each fuzzy variable assumed seven fuzzy-set values, as follows: NL: Negative Large, NM: Negative Medium, NS: Negative Small, Z: Zero, PS: Positive Small, PM: Positive Medium, PL: Positive Large. Fuzzy sets contain elements with degrees of membership. Fig. 2 shows the membership function graphs of the fuzzy sets above.
Figure 2. Fuzzy membership functions for each linguistic fuzzy-set value.
The input and output spaces were partitioned into the seven fuzzy sets. From the membership function graphs μ_{t−1}, μ_t in Fig. 2 it can be seen that the seven intervals [−0.75, −0.375], [−0.375, −0.225], [−0.225, −0.075], [−0.075, 0.075], [0.075, 0.225], [0.225, 0.375], [0.375, 0.75] correspond respectively to NL, NM, NS, Z, PS, PM, PL. Next, we specified the fuzzy rule base, i.e. the bank of fuzzy relations. The appendix describes the neural network which uses supervised competitive learning to derive fuzzy rules from data. As shown in Fig. 4(b), the bank contains 5 fuzzy rules. For example, the fuzzy rule of block 34 corresponds to the following fuzzy relation: if x^i_{t−1} = PM then x^j_t = PS. Finally, we determined the output action given the input conditions. We used Mamdani's implication [13]. Following the above principles, we obtained the predicted fuzzy value for the inflation x_t = x_345 = 0.74933. To obtain a simple numerical value in the output universe of discourse, a conversion of the fuzzy output is needed. The simplest defuzzification scheme was used. Following this method, we obtained the predicted value x_345 = −0.15. The remaining forecasts for the ex post forecast period t = 346, 347, ... may be generated similarly.

5. Conclusion

The method may be of real usefulness in practical applications, where the expert usually cannot explain linguistically what control actions the process takes or where there is no knowledge of the process. In principle a neural network can derive this knowledge from data. In practice this is usually necessary. Although the method has been demonstrated in the time series modelling field, it is also suitable for other applications such as data mining systems, information access systems, etc.

Appendix

GENERATING FUZZY RULES BY SCL-BASED PRODUCT-SPACE CLUSTERING

The neural network pictured in Fig. 3 was used to generate structured knowledge of the form "if A, then B" from a set of numerical input-output data. In Section 4 we defined cell edges with the seven intervals of the fuzzy-set values in Fig. 2. The interval −0.75 ≤ x_t, x_{t−1} ≤ 0.75 was partitioned into seven non-uniform subintervals that represent the seven fuzzy-set values NL, NM, NS, Z, PS, PM, and PL assumed by the fuzzy variables x_{t−1} and x_t. The Cartesian product of these subsets defines 7 × 7 = 49 fuzzy cells in the input-output product space ℝ².
As mentioned in [8], these fuzzy cells equal fuzzy rules. Thus there are in total 49 possible rules and thus 49 possible fuzzy relations. We can represent all possible fuzzy rules as a 7-by-7 linguistic matrix (see Fig. 4). The idea is to categorise a given set or distribution of input vectors X_t = (x_{t−1}, x_t), t = 1, 2, ..., 344, into 7 × 7 = 49 classes, and then represent any vector just by the class into which it falls. We used SCL (Supervised Competitive Learning) [10], [14] to train the neural network in Fig. 3. The software was developed at the Institute of Computer Science of the Faculty of Philosophy and Science, Opava. We used 49 synaptic quantization vectors. For each random input sample X_t = (x_{1t}, x_{2t}), the winning vector W_{i'} = (w_{1i'}, w_{2i'}) was updated by the SCL algorithm (the winner is moved towards the sample when their classes agree, and away from it otherwise) according to

w_{1i'} ← w_{1i'} + η (x_{1t} − w_{1i'})
w_{2i'} ← w_{2i'} + η (x_{2t} − w_{2i'})     if X_t is classified correctly,

w_{1i'} ← w_{1i'} − η (x_{1t} − w_{1i'})
w_{2i'} ← w_{2i'} − η (x_{2t} − w_{2i'})     if X_t is classified incorrectly,

where i' is the winning unit, defined by ||W_{i'} − X_t|| ≤ ||W_i − X_t|| for all i, W_i and X_t denote normalized versions of W_i and X_t respectively, and η is the learning coefficient.
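A compact sketch of this SCL update loop (our own illustration with placeholder class labels standing in for the 7 × 7 cell indices; this is not the Opava software):

```python
import numpy as np

def scl_train(X, labels, n_units=49, eta=0.05, epochs=20, seed=0):
    """Supervised competitive learning: move the winning weight vector towards the
    sample if their classes agree, away from it otherwise."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(X.min(), X.max(), size=(n_units, X.shape[1]))
    unit_class = np.arange(n_units)              # assumption: unit i represents fuzzy cell i
    for _ in range(epochs):
        for x, c in zip(X, labels):
            winner = np.argmin(np.linalg.norm(W - x, axis=1))   # closest quantization vector
            sign = 1.0 if unit_class[winner] == c else -1.0
            W[winner] += sign * eta * (x - W[winner])
    return W

# X_t = (x_{t-1}, x_t) pairs; labels = index of the fuzzy cell each pair falls into
X = np.random.default_rng(1).uniform(-0.75, 0.75, size=(344, 2))
labels = np.random.default_rng(2).integers(0, 49, size=344)     # placeholder cell labels
W = scl_train(X, labels)
print(W.shape)
```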
Figure 3. The topology of the network for fuzzy rule generation by SCL-based product-space clustering.
Figure 4. Distribution of the input-output data (x_{t−1}, x_t) in the input-output product space X_{t−1} × X_t (a). Bank of the time series modelling system (b).
Supervised Competitive Learning (SCL)-based product-space clustering classified each of the 344 input-output data vectors into 9 of the 49 cells, as shown in Fig. 4(a). Fig. 4(b) shows the fuzzy rule bank. For example, the most frequent rule corresponds to cell 34. From most to least important (frequent), the fuzzy rules are (PM; PS), (PS; PL), (NL; NS), (PS; PL), and (PS; PS).

References
1. G. E. P. Box and G. M. Jenkins, Time Series Analysis, Forecasting and Control, Holden-Day, San Francisco, CA (1970).
2. P. J. Brockwell and R. A. Davis, Time Series: Theory and Methods, Springer-Verlag, New York (1987).
3. B. Carse, T. C. Fogarty and A. Munro, "Evolving fuzzy based controllers using genetic algorithms", Fuzzy Sets and Systems, Vol. 89: 273-293 (1996).
4. M. Delgado, A. F. Gomez-Skarmeta and F. Martin, "A fuzzy clustering-based rapid prototyping for fuzzy rule-based modelling", IEEE Trans. Fuzzy Systems, Vol. 5, No. 2: 223-233 (1997).
5. M. Fedrizzi, M. M. Fedrizzi and W. Ostasiewicz, "Towards fuzzy modelling in economics", Fuzzy Sets and Systems, 54: 259-268 (1993).
6. J. Q. Chen, Z. G. Xi and Z. J. Zhang, "A clustering algorithm for fuzzy model identification", Fuzzy Sets and Systems, Vol. 98: 319-329 (1998).
7. J. S. R. Jang and C. T. Sun, "Neuro-fuzzy modelling and control", Proceedings of the IEEE, Vol. 83, No. 3: 378-406 (1995).
8. B. Kosko, Neural Networks and Fuzzy Systems - A Dynamical Systems Approach to Machine Intelligence, Prentice-Hall International, Inc. (1992).
9. R. Li and Y. Zhang, "Fuzzy logic controller based on genetic algorithms", Fuzzy Sets and Systems, Vol. 83: 1-10 (1996).
10. E. H. Mamdani, "Application of a fuzzy logic to approximate reasoning using linguistic synthesis", IEEE Trans. Comput., 26: 1182-1191 (1997).
11. D. Marcek, "Stock Price Forecasting: Autoregressive Modelling and Fuzzy Neural Network", Mathware & Soft Computing, Vol. 7, No. 2-3: 139-148 (2000).
12. D. C. Montgomery, L. A. Johnson and J. S. Gardiner, Forecasting and Time Series Analysis, McGraw-Hill, Inc. (1990).
13. A. Morettin, "The Levinson algorithm and its applications in time series analysis", International Statistical Review, 52: 83-92 (1984).
14. J. J. Saade, "A defuzzification based new algorithm for the design of Mamdani-type fuzzy controllers", Mathware & Soft Computing, Vol. 7, No. 2-3: 159-173 (2000).
15. P. Siarry and F. Guely, "A genetic algorithm for optimising Takagi-Sugeno fuzzy rule bases", Fuzzy Sets and Systems, Vol. 99: 37-47 (1998).
16. Q. Song and B. S. Chissom, "Fuzzy time series and its models", Fuzzy Sets and Systems, 54: 269-277 (1993).
17. M. Sugeno and T. Yasukawa, "A fuzzy-logic-based approach to quantitative modelling", IEEE Trans. Fuzzy Systems, Vol. 1, No. 1: 7-31 (1993).
18. H. Takagi and I. Hayashi, "NN-driven fuzzy reasoning", Int. Journal of Approximate Reasoning, Vol. 5, No. 3: 191-212 (1991).
19. Y. S. Tarng, Z. M. Yeh and C. Y. Nian, "Genetic synthesis of fuzzy logic controllers in turning", Fuzzy Sets and Systems, Vol. 83: 301-310 (1996).
INVESTMENT ANALYSIS USING GREY AND FUZZY LOGIC

CENGIZ KAHRAMAN
Department of Industrial Engineering, Istanbul Technical University, 34367 Macka, Istanbul, Turkey

ZIYA ULUKAN
Department of Industrial Engineering, Galatasaray University, 34357 Macka, Istanbul, Turkey
The theory of fuzzy logic founded by Zadeh in 1965 has been proven useful for dealing with uncertain and vague information. The grey theory first proposed by Deng (1982) avoids the inherent defects of conventional statistical methods and requires only a limited amount of data to estimate the behavior of unknown systems. In this paper, we use fuzzy set theory and grey theory to develop an efficient method for predicting the cash flows of an investment. The cash flows obtained are used in a present worth analysis to determine whether the investment is acceptable. Illustrative examples are given.
1. Introduction
Fuzzy theory, originally explored by Zadeh in 1965, describes linguistic fuzzy information using mathematical modeling. Because the existing statistical time series methods could not effectively analyze time series with small amounts of data, fuzzy time series methods were developed. Grey theory, originally developed by Deng (1982), focuses on model uncertainty and information insufficiency in analyzing and understanding systems via research on conditional analysis, prediction and decision making. In the field of information research, deep or light colours represent information that is clear or ambiguous, respectively. Meanwhile, black indicates that the researchers have absolutely no knowledge of the system structure, parameters and characteristics, while white represents information that is completely clear. Colours between black and white indicate systems that are not clear, such as social, economic or weather systems. The grey forecasting model adopts the essential part of grey system theory and has been successfully used in finance, the integrated circuit industry and the market for air travel. The grey forecasting model uses the operations of accumulated generation to build differential equations. It has the characteristic of requiring less data.
Wang (2002) predicts the stock price instantly at any given time. Two problems in predicting stock prices are that (1) there may be a large or small difference between two consecutive sets of data and (2) the volume of stock data is so large that it affects our ability to use it. To solve these problems, Wang (2002) constructs a data mart to reduce the size of the stock data and combines fuzzification techniques with grey theory to develop a fuzzy grey prediction as one of the predicting functions in the system, predicting the possible answer immediately. Lin and Lin (2005) report the use of grey-fuzzy logic based on an orthogonal array for optimizing the electrical discharge machining process with multiple responses. An orthogonal array, grey relational generating, grey relational coefficients, the grey-fuzzy reasoning grade and analysis of variance are applied to study the performance characteristics of the machining process. The machining parameters (pulse on time, duty factor and discharge current) with consideration of multiple responses (electrode wear ratio, material removal rate and surface roughness) are effective. In this paper, the cash flows of an investment will be estimated using grey and fuzzy logic, and these cash flows will be used to calculate the fuzzy present worth. Fuzzy interest rates for the future periods will also be estimated by using grey fuzzy logic and will be used in the analysis. Comparisons of the results with other methodologies and sensitivity analyses are components of this study.

2. Fuzzy and Grey Time Series

2.1. Fuzzy time series

Fuzzy theory, originally explored by Zadeh in 1965, describes linguistic fuzzy information using mathematical modeling. Because the existing statistical time series methods could not effectively analyze time series with small amounts of data, fuzzy time series methods were developed. Song and Chissom (1993a, b) proposed a first-order time-invariant model and a time-variant model of fuzzy time series in 1993. They fuzzified the enrollment at the University of Alabama in 1993 in the first application of fuzzy time series to forecasting. Then, in 1994, they proposed a new fuzzy time series and compared three different defuzzification models. The empirical result showed that the best prediction results are obtained when the neural network method is applied to defuzzify the data (Song & Chissom, 1994). Chen (1996) considered the neural network method too complicated to apply; he therefore presented arithmetic operations instead of the logical max-min composition. The arithmetic operations have a
robust specification and are superior to those applied in Song and Chissom's model. Hwang et al. (1998) defined a fuzzy set for each year, established the fuzzy relationship, and finally forecast the enrollment at the University of Alabama using the relation matrix. Empirical analysis revealed that the average error rate of Hwang's model was smaller than those of Chen and of Song and Chissom. The following steps construct Hwang's fuzzy time series (Hwang et al., 1998):

Step 1. Calculate the variations using the historical data.

Step 2. Separate the universe of discourse U into several even-length intervals. In this step, the universe must first be defined: it includes the minimum number of units (D_min) and the maximum number of units (D_max), according to the known historical data. Based on D_min and D_max, the universe U is defined as [D_min − D_1, D_max + D_2], where D_1 and D_2 are two proper positive numbers. Then U is divided into intervals of equal length.

Step 3. Define the fuzzy time series F(t). The fuzzy time series is expressed as follows:

F(t) = I_{C1}/u_1 + I_{C2}/u_2 + ... + I_{Cm}/u_m        (1)

where the I_{Ci} are the memberships and 0 ≤ I_{Ci} ≤ 1. Thus, the fuzzy sets A_i are expressed as:

A_i = I_{C1}/u_1 + I_{C2}/u_2 + ... + I_{Cm}/u_m        (2)
Step 4. Fuzzify the variations of historical data. This step determines a fuzzy set equivalent to each set of data. If the variation falls within ut; then the degree of each historical datum belongs to each At is determined. Step 5. Calculate the relation matrix R(t): Two variables, which the operation w (w=2, 3, ..., n) is the window base and t is year. The operation matrix is expressed as follows:
F{t-2) O w (<) =
'
11
Fit-!) F(t-w-\)
21 O
'12 On
^22
O
o2m
(3)
O wl °w2 wn The criterion matrix is expressed as follows: C\t) = F\t -\) = \C\,C1,. ..,Cm J where C/ represents "decreases"; C2 represents "increases a little", and Cm represents "increases too much". The
286 relation for changing the degree of period t is thus obtained. The relation matrix R(t) is expressed as follows:
R„(t)=Cj{t)xo;(tl
R(t) =
F(t)-
\
\<j< m
(4)
O, xC
On x C, 0„xC,
0„ x C,
0,xC_
O.xC
0,xC,
O
xC
(5)
/?., /?_
Ma4^Ru,R2i,...,RjMa4K>R2i^R.2l---Max{K'R2^---^J] = k>>- 2 ,...,rj
(6)
Step 6. Defozzify the fuzzified predicted variations in Step 5. The principles of denazification are as follows: 1. If the membership of an output has only one maximum ui ; then select the midpoint of the interval that corresponds to the maximum forecast value. 2. If the membership of an output has one or more consecutive maximum, then select the midpoint of the corresponding conjunct interval as the forecast. 3. If the membership of an output is zero, then no maximum exists. Thus, the predicted degree of change is zero. Step 7. Calculate the outputs. The actual value of change for the preceding year is added to the forecast degree of change, yielding the forecast value for this year. 2.2. Grey forecasting model Grey theory, originally developed by Deng (1982), focuses on model uncertainty and information insufficiency in analyzing and understanding systems via research on conditional analysis, prediction and decision making. In the field of information research, deep or light colours represent information that is clear or ambiguous, respectively. Meanwhile, black indicates that the researchers have absolutely no knowledge of system structure, parameters, and characteristics; while white represents that the information is completely clear. Colours between black and white indicate systems that are not clear, such as social, economic, or weather systems. The grey forecasting model adopts the essential part of the grey system theory and it has been successfully used in finance, integrated circuit industry and the market for air travel (Hsu & Wang, 2002; Hsu, 2003; Hsu & Wen, 1998). The grey forecasting model uses the operations of accumulated
287 generation to build differential equations. Intrinsically speaking, it has the characteristics of requiring less data. The GM(1,1), can be denoted by the function as follows (Hsu, 2001): » be Step 1. Assume an original to series Step 2. A new sequence x(' operation (AGO).
is generated by the accumulated generating
* W , X W = (x^),x^(2),x^{3),...,x^{n)\
where * « ( * ) = J > ( 0 ) W f=i
Step 3. Establish a first-order differential equation, (dx^ldt) + az = u where = U {x {x {x z \k) = ca \k) + {}-a)x \k + \) k = \,2,...,n-\. adenotes dene a horizontal adjument coefficient, and 0 -< a < 1. The selecting criterion of a value is to yield the smallest forecasting error rate (Wen et al., 2000). Step 4. From Step 3, we have
x^(k + \) =
(x^{l)-^ e
U
ak
+—.
(7)
where
6=
a
= {BTB)'XBTY, 1 (2)
l
W B = -Z (3)
1
—
(8)
(9)
fii" (10)
Y = )
Step 5. Inverse accumulated generation operation (IAGO). Because the grey forecasting model is formulated using the data of AGO rather than original data, IAGO can be used to reverse the forecasting value. Namely
x{:\k) = xw{k)-x[l){k-\),
k = 2,3,.-,>
(11)
288 3. Forecasting Net Cash Flows and Fuzzy Present Worth of an Investment In the crisp case, to forecast the future cash flows (revenues and costs), various quantitative forecasting techniques are used. Among these techniques, linear regression analysis and exponential smoothing technique are the frequently used ones. In this paper, both fuzzy time series and grey forecasting model will be used to forecast the net cash flows of an investment. Later these forecasts will be compared and some sensitivity analyses will be made A net cash flow is the difference between total cash receipts (inflows) and total cash disbursements (outflows) for a given period of time. The assumptions we make are the followings: 1. the first cost of the prospective investment will almost be the same as the one of existing investments in the same sector, 2. the same trend in the existing investment will be saved in the prospective investment, 3. very few data are obtainable to forecast the future cash flows. Table 1 shows the net actual values of the reference investment. Table 1. Net actual values of the reference investment Year Net Actual values (x$1000) l
* (O) 0)
2
x®{2)
3
x<°>(3)
*
•
n
* % )
Using the fuzzy time series and grey forecasting technique above, we obtain the forecasted cash flow intervals. The forecasted series of the net cash flows are represented by symmetric triangular fuzzy numbers. Then, using the fuzzy present worth formula in Eq. (12) (Kahraman, 2001), we calculate the fuzzy present worth of the prospective investment.
/„,(#„ )= ftWWfM7)) for i =1,2 and fi(n,r) = (l + r) ". Both
02) F and 7 are positive fuzzy
numbers. Here fx (.) and f2 (.) stand for the left and right representations of the fuzzy numbers, respectively
289 4. Conclusions Fuzzy set theory has been used to develop quantitative forecasting models such as time series analysis and regression analysis, and in qualitative models such as the Delphi method. In these applications, fuzzy set theory provides a language by which indefinite and imprecise demand factors can be captured. The structure of fuzzy forecasting models are often simpler yet more realistic than non-fuzzy models which tend to add layers of complexity when attempting to formulate an imprecise underlying demand structure. When demand is definable only in linguistic terms, fuzzy forecasting models must be used. Cash flows can also be forecasted using fuzzy or grey theory when too few past data exist. Otherwise, crisp statistical techniques should be used. Acknowledgment We are grateful to Galatasaray University Research Foundation for supporting this paper. References 1. C. Goh, R. Law, Modeling and forecasting tourism demand for arrivals with stochastic nonstationary seasonality and intervention, Tourism Management, 23,499-510,(2002). 2. C.H. Wang, Predicting tourism demand using fuzzy time series and hybrid grey theory, Tourism Management,2i, 361-31 A, (2004). 3. C. Kahraman, Capital budgeting techniques using discounted fuzzy cash flows, in Da Ruan, Janusz Kacprzyk and Mario Fedrizzi eds., Soft Computing for Risk Evaluation and Management: Applications in Technology, Environment and Finance, (Physica Verlag, Heidelberg), 375396, (2001). 4. C.L. Hsu, Y.U. Wen, Improved grey prediction models for trans-pacific air passenger market, Transportation Planning and Technology, 22, 87-107, (1998). 5. C. Lim, M. McAleer, Time series forecasts of international travel demand for Australia, Tourism Management, 23, 389-396, (2002). 6. J.C. Wen, K.H. Huang, K.L. Wen, The Study of a in G M ( U ) Model, Journal of the Chinese Institute of Engineers, 23(5), 583-589, (20009. 7. J. Hwang, S.M. Chen, C.H. Lee, Handling forecasting problems using fuzzy time series, Fuzzy Sets and Systems, 100, 217-228, (1998). 8. J.L. Deng, Control problem of grey system, Systems and Control Letters, 1, 288-294, (1982).
290 9. J.L. Lin, C.L. Lin, The use of grey-fuzzy logic for the optimization of the manufacturing process, Journal of Materials Processing Technology, 160, 9-14, (2005). 10. K.H. Huarng, Effective lengths of intervals to improve forecasting in fuzzy time series, Fuzzy Sets and Systems, 123, 387-394, (2001). 11. L.C. Hsu, The comparison of three residual modification model, Journal of the Chinese Grey System Association, 4(2), 97-110, (2001). 12. L.C. Hsu, Applying the grey prediction model for the global integrated circuit industry, Technological Forecasting and Social Change, forthcoming, (2003) 13. L.C. Hsu, C.H. Wang, Grey forecasting the financial ratios, The Journal of Grey System, 14(4), 399^108, (2002) 14. Q. Song, B.S. Chissom, Forecasting enrollments with fuzzy time series,Part I. Fuzzy Sets and Systems, 54, 1-9, (1993a). 15. Q. Song, B.S. Chissom, Fuzzy time series and its models, Fuzzy Sets and Systems, 54, 269-277, (1993b). 16. Q. Song, B.S. Chissom, Forecasting enrollments with fuzzy time series,Part II. Fuzzy Sets and Systems, 62, 1-8, (1994). 17. R. Law, Back-propagation learning in improving the accuracy of neural network-based tourism demand forecasting, Tourism Management, 21, 331340, (2000). 18. R. Law, N. Au, A neural network model to forecast Japanese demand for travel to Hong Kong, Tourism Management, 20, 89-97, (1999). 19. S.M. Chen, Forecasting enrollment based on fuzzy time series, Fuzzy Sets and Systems, 81, 311-319, (1996). 20. Y.F. Wang, Predicting stock price using fuzzy grey prediction system, Expert Systems with Applications.,22, 33-39, (2002). 21. Y.P. Huang, H.C. Chu, Simplifying fuzzy modeling by both gray relational analysis and data transformation methods, Fuzzy Sets and Systems, 104, 183-197, (1999).
AN EXTENDED BRANCH-AND-BOUND ALGORITHM FOR FUZZY LINEAR BILEVEL PROGRAMMING GUANGQUAN ZHANG, JIE LU, THARAM DILLON Faculty of Information Technology University of technology Sydney, POBox 123, Broadway, NSW2007, Australia email: fzhangg, jielu, tharam}@it.uts.edu.au
Abstract: This paper presents an extended Branch-and-Bound algorithm for solving fuzzy linear bilevel programming problems. In a fuzzy bilevel programming model, the leader attempts to optimize his/her fuzzy objective with a consideration of overall satisfaction, and the follower tries to find an optimized strategy, under himself fuzzy objective, according to each of possible decisions made by the leader. This paper first proposes a new solution concept for fuzzy linear bilevel programming. It then presents a fuzzy number based extended Branch-and-bound algorithm for solving fuzzy linear bilevel programming problems. Keywords: Bilevel programming; Branch-and-bound algorithm; Fuzzy sets; Fuzzy optimization; Decision making
1. Introduction Bilevel programming (BLP) has been developed for mainly solving decentralized planning problems with decision makers in a two level organization [1, 2, 10]. Decision maker at the upper level is termed as the leader, and at the lower level, the follower. Each decision maker (leader or follower) tries to optimize their own objective function, but the decision of each level affects the objective value of the other level [6]. Bilevel programming theory and method have been applied with remarkable success in different domains in decision making, for example, decentralized resource planning, electric power market, logistics, civil engineering, chemical engineering and road network management [7, 11], The vast majority of research on BLP has centered on the linear version of the problem, i.e., linear BLP. A set of approaches and algorithms have been well developed such as well known KuhnTucker approach [2] and Branch-and-bound algorithm [3,4], It has been observed that, in most real-world situations, the possible values of parameters in a bilevel programming model can be only imprecisely or ambiguously known and therefore are determined by model builders' understanding of the situations during model establishment process. It would be 291
292 certainly more appropriate to interpret the model builders' understanding of the parameters as fuzzy numerical data which can be represented by means of fuzzy sets [12]. Such a BLP problem in linear version is called a fuzzy linear bilevel programming (FLBLP) problem. The FLBLP problem was well researched by Sakawa et al. [8], Lai [5], and Shih [9]. However, to deal with some limitations of these approaches, this study proposes a new solution concept for FLBLP model. Under the solution concept, an extended Branch-and-Bound algorithm is developed to solve fuzzy linear bilevel programming problems. 2. A Fuzzy Bilevel Programming Model Uncertainty and imprecision are naturally appearing in various decision making including bilevel decision making problems. For example, logistics managers often imprecisely know the values of related constraints and evaluation criteria in making a logistics plan. They can only estimate inventory carrying costs and transportation costs of a particular set of goods. Also, for the evaluation of any alternative facilities, logistics managers can only assign values according to their experience, and these values assigned are often in linguistic terms, such as 'about 100', or 'about two times of the original cost". Obviously, when building a bilevel programming model for a decision problem, the parameters of the objective function or constraints of both the leader and the follower are hard to be set by precise numbers. The normal bilevel programming model which involves these issues is not efficient to express such decision problem as uncertain information and imprecise linguistic expressions are involved. This study therefore develops a FLBLP model as follows. ForxeXczR", yeYaRm, F:XxY -+F*(R), a n d / : it consists of finding a solution to the upper level problem: min F(x,y) = 'clx + dly
XxY-+F*(R), (2.1a)
xeJf
subject to Axx + Bxy-
(2.1b)
where v , for each value of x, is the solution of the lower level problem: min f(x, y) = c2x + d2y
(2.1c)
subject to A2x + B2y
(2.Id)
where cx,c2sF\Rn\
dx,d2eF\Rm\
\ = ^~e^F\R\
B2=fa)
bxeF'(Rp),
SysF'iR).
b2eF*(Rq),
293 In this model whatever in \heF(x,y), the leader's objective function, or in the/(x,j>), the follower's objective function, or their constraints, all parameters are allowed to be a fuzzy value. 3. The solution concept for the fuzzy bilevel programming model This section gives a necessary and sufficient condition for an optimal solution of a FLBLP problem defined by expression (2.1a)-(2.1d) so as to solve this problem. Associated with the FLBLP problem shown in (2.1a)-(2.1d), we now consider the following linear multi-objective multi-follower bilevel programming (LMMBLP) problem: F o r x e l c i ! " , yeY
F: XxY -+F*(R), and f:XxY
min (F(x, y)fx - c]xx + d]xy,
-+F*(R),
X e [0,1]
xeX
(3.1a)
min (F(x, yjfx = cfix + dixy,
A e [0,1]
xsX
subjectto A^xx+ B]xy
+ B]xy
A e[0,l]
mm(f(x,y))x=c2xx
+ d2xy,
Ae[0,l]
mm(f(x,y))x=c2Rxx
+ d2xy,
Ae[0,l]
(3.1b) (3.1c)
yeY
subjectto A2xx + B2xy
c^x,clx,c2x,c2Rx
b2Lx,b2ReR<,
<=k) <=kh
eR",
dlx,dlx,
<=fe) RP™>
d2x,d2R<=Rm,
^x,b^xeRp,
,
Obviously, the two groups of followers shown in (3.1c) are sharing of same variables. Based the model, we give the following theorems and lemmas in order to introduce an extended Branch-and-bound algorithm. Theorem 3.1 [13] Let ( * * , / ) be the solution of the LMMBLP problem (3.1). Then it is also a solution of the FLBLP problem defined by (2.1).
294 Lemma 3.1 [13] If there is (x ,y ) such that cx + dy>cx CQX + dQy>cl;x*+doy*
+ dy ,
and CgX + d^y >CgX* + d*y*, for any (x, y) and
isosceles triangle fuzzy numbers c and d, then c x
i
+ dJLy>c^x*
clx +
+dj[y',
dZy>c$x*+d«y\
for any X e (0,1), where c and d are the centre of "c and d respectively. Theorem 3.2 [13] For* e X c R" , yeYaRm,
If all the fuzzy coefficients
ay, by, ety, 7y, c, and dt have triangle membership functions of the FLBLP problem (2.1). t
/*?(') =
-t + z«
(3.2)
z
z0 - z
4<<
0
where z
denotes 2L-, 6,y, e«, 7y, c, and dj and z are the centre of
z respectively. Then, it is the solution of the problem (2.1) that (x ,y ) e R"xRm
satisfying m(F(x,y)) c min
=c]x + diy,
xeX
mm{F(x,y))o
=c^x + d^y,
xeX
(3.3a)
mm(F(x,y))o
=cl$x + dl%y,
xeX
subject to A)X + Bxy < 6,, (3.3b)
mm{f(x,y))c
= c2x + d2y,
yeY
mm(f(x, yj% = c2\x + d2L0y, yeY
mm(f(x,y)YA=c2*x yeY
(3.3c)
+ d2*y,
295 subject to A2x + B2y
(3.3d)
<*+
+ d]y)+(c]L0x + dlLQy)+(c]Rx + d]Ry)
(3.4a)
subject to A}x + Bxy
Kx
+
Ky
(3.4b)
Ao* + B\oy<[>\o, A2x + B2y
(3.4c)
B2Ry
u{Bx +B}LQ+ < ) + V(B2 + B2L0 + B2R)- w = -(d2 + d2l0 + d2R) «((*, + * , o + * i o ) - U + ^ i o + ^ i o ) « - (B. + K R
L
+ v f e + 62 J + b2 ) - (/f2 + A2 0 + A2
R
)K-(B2+
+ 5io M B2
L
(3 4e)
R
0 +B2
(3.4d)
)y)+wy
=0
x>0,y>0,u>0,v>0,w>0.
(3.4f)
Theorem 3.3 provides a theoretical foundation for extending exist Branchand-bound algorithm to handle the FLBLP problems. We now describe the basic idea of the extended Branch-and-bound algorithm. 4. An extended Branch-and-bound algorithm for FLBLP problems We first write all the inequalities (except of the leader's variables) of (2.1 a)(2.Id) as gj(x,y)>0,i = l,...,p + q + m , and note that complementary slackness simply means ujgj(x,y) = 0 (i = \,...,p + q + m). We suppress the complementary term and solve the resulted linear sub-problem. At each time of iteration the condition (3.4e) is checked. If it is satisfied, the corresponding point is in the inducible region and hence a potential solution to (2.1). Otherwise, a Branch-and-bound scheme is used to implicitly examine all combinations of the complementarities slackness. We give some notations for describing the details of the extended Branch-and-bound algorithm.
296 Let W = {\,...,p + q + m} be the index set for the terms in (3.4e), F be the incumbent upper bound on the objective function of the leader. At the kth level of an search tree we define a subset of indices Wk c W , and a path Pk corresponding to an assignment of either M; = 0 or g, = 0 for ieWk. Now let St={i:iefrk,ul=0) Sk ={i:ieWk,gl
= 0}
S°k={i:itWk}. For / € Sk , the variables uf or g, are free to assume any nonnegative value in the solution of (3.4) with (3.4e) omitted, so complementary slackness will not necessarily be satisfied. By using these notations we give all steps of the extended Branch-to-bound algorithm. Step 1 Step 2 Step 3 Step3.0 Step 3.1
The problem (2.1) is transferred to the problem (3.3) by using Theorem 3.2 The problem (3.3) is transferred to the following linear BLP problem (3.4) by using the method of weighting [7]. To solve the problem (3.4) (initialization) Set k = 0, S£ =>, Sk =(/>, S°k={\ p + q + m), and F = °o . (iteration k) Set u, = 0 for i e Sk and g, = 0 for ieSk . It first attempts to solve (3.4) without (3.4e). If the resultant problem is infeasible, go to Step 3.5; otherwise, put &<-& + l and label the solution (xk, yk,
Step 3.2 Step3.3
uk).
(Fathoming) If F(xk,yk k
(Branching) \fu gi(x
k
)>F,
then go to Step 3.5.
k
,y ) = Q, i = \,...,p + q + m, then go to Step
3.4. Otherwise select / for which w, g,(x ,yk)=A0 is the largest and label it /, . Put S+k <- Sk+ u{i,} , 5,° <- 5t° \{/,} , Sk <- Sk , append /, to Pk , and go to Step 3.1. Step 3.4
(Updating) Let F <- F(xk,
Step 3.5
(Backtracking) If no live node exists, go to Step 3.6. Otherwise branch to the newest live vertex and update Sk ,Sk , Sk and Pk as discussed below. Go back to Step3.1. (Termination) I f F = o o , there is not feasible solution to (2.1a)-
Step 3.6
yk).
297
Step 4
(2.1 d). Otherwise, declare the feasible point associated with F which is the optimal solution to (2.la)-(2.Id). Show the result of problem (2.1).
Some explanations are given for these steps and their working process as follows. After initialization, Step 3.1 is designed to find a new point which is potentially bilevel feasible. If no solution exists, or the solution does not offer an improvement over the incumbent (Step 3.2), the algorithm goes to Step 3.5 and backtracks. Step 3.3 checks the value of ufgj(xk,yk)
to determine if the complementary
slackness conditions are satisfied. In practice, if \ukgA < 10~6 it is considered to be zero. Confirmation indicates that a feasible solution of a bilevel program has been found and at Step 3.4 the upper bound on the leader's objective function is updated. Alternatively, if the complementary slackness conditions are not satisfied, the term with the largest product is used at Step 3.3 to provide a branching variable. Branching is always completed on the Kuhn-Tucker multiplier [2]. At Step 3.5, the backtracking operation is performed. Note that a live node is one associated with a sub-problem that has not yet been fathomed at either Step 3.1 due to infeasibility or at Step 3.2 due to bounding, and whose solution violates at least one complementary slackness condition. To facilitate bookkeeping, the path Pk in the Branch-and-bound tree is represented by a vector, its dimension is the current depth of the tree. The order of the components of Pk is determined by their level in the tree. Indices only appear in Pk if they are in either Sk or Sk with the entries underlined if they are in Sk~ . Because the algorithm always branches on a Kuhn-Tucker multiplier first, backtracking is accomplished by finding the rightmost non-underlined component ifPk, underlining it, and erasing all entries to the right. The erased entries are deleted from Sj~ and added to Sk . 5. Conclusions The issue addressed in this study is how to derive an optimal solution for the upper level's decision making for a bilevel programming problem. This paper proposes a fuzzy number based extended Branch-and-bound algorithm to solve fuzzy linear bilevel programming problems. Further study includes the development of models and approaches for fuzzy bilevel multi-follower
298 programming and fuzzy bilevel multi-objective programming problems. In fuzzy bilevel multi-follower programming, the relationships among these multiple followers will be classified into cooperative and non-cooperative situations. Therefore a set of models and algorithms need to be developed. In fuzzy bilevel multi-objective programming, fuzzy multi-objective programming and fuzzy bilevel programming approaches will be integrated to lead a satisfactory solution for the decision makers. Acknowledgments The work presented in this paper was supported by Australian Research Council (ARC) under discovery grants DP0557154 and DP0559213. References [I] G. Anandalingam and T. Friesz, Hierarchical optimization: An introduction, Annals of Operations Research Vol. 34 (1992), 1-11 [2] J. Bard, Practical Bilevel Optimization: Algorithms and Applications (Kluwer Academic Publishers, Amsterdam, 1998) [3] J. Bard and J. Falk, An explicit solution to the programming problem, Computers and Operations Research 9(1982), 77-100 [4] P. Hansen, B. Jaumard, and G. Savard, New branch-and-bound rules for linear bilevel programming. SIAM Journal on Scientific and Statistical Computing 13(1992), 1194-1217. [5] Y.J. Lai, Hierarchical optimization: A satisfactory solution, Fuzzy Sets and Systems 77(1996), 321-335 [6] T. Miller, T. Friesz and R. Tobin, Heuristic algorithms for delivered price spatially competitive network facility location problems, Annals of Operations Research 34(1992), 177-202. [7] M. Sakawa, Fussy sets and interactive mulitobjective optimization (Plenum Press, New York, 1993). [8] M. Sakawa, I. Nishizaki, and Y. Uemura, Interactive fuzzy programming for multilevel linear programming problems with fuzzy parameters, Fuzzy Sets and Systems 109(2000), 3-19 [9] H.S. Shih, YJ. Lai, and E.S. Lee, Fuzzy approach for multilevel programming problems, Computers and Operations Research, 23(1996) 73-91 [10] H. Stackelberg, The Theory of the Market Economy (Oxford University Press, New York, Oxford, 1952) [II] D. White and G. Anandalingam, A penalty function approach for solving bi-level linear programs, Journal of Global Optimization, 3(1993), 397-419. [12] L. A Zadeh, Fuzzy sets, Inform & Control 8(1965), 338-353 [13]G. Zhang and J. Lu, The definition of optimal solution and an extended Kuhn-Tucker approach for fuzzy linear bilevel programming, The IEEE Computational Intelligence Bulletin 2 (2005), 1-7.
FUZZY MULTI-OBJECTIVE INTERACTIVE GOAL PROGRAMMING APPROACH TO AGGREGATE PRODUCTION PLANNING
TIJEN ERTAY Istanbul Technical University Faculty of Management, Management Engineering Department Macka, 34367, Istanbul, Turkey. Abstract: In this paper, we consider an interactive goal programming approach for fuzzy multi objective linear programming application to aggregate production planning problems. Our aim is to determine the overall degree of decision maker satisfaction with the multiple fuzzy goal values and to give the exactly satisfactory solution results for decision maker in illustrative example.
1. Introduction As known, aggregate planning is concerned with the simultaneous establishment of a firm's production, inventory and employment levels over a finite time horizon. In other words, aggregate production planning (APP) is a medium range capacity planning method that encompasses a time horizon. This plan provides the basic input from which more detailed, product-specific production plans are derived and with which longer term strategic decisions are made. APP is an important upper level planning activity in a production management system, other forms of product family disaggregating plan, material requirements plan all depend on APP in a hierarchical way. APP has attracted considerable interest from academics for a long time. Holt et al. (1960) proposed a continuous timevaried model for an optimal employment, production and inventory policy in a single manufacturer under the assumption of a given sales forecast. Much considerable attention has been directed towards aggregate production problems and different optimization models have been developed. (Charnes and Cooper, 1961; Singhal and Adlakha, 1989; Bergstrom and Smith, 1970) But, in realworld APP problems, the input data or parameters such as demand, resources, cost and the objective function are often imprecise because of being incomplete or unobtainable information. Conventional mathematical programming cannot solve all fuzzy programming problems. Zimmermann (1976) first introduced fuzzy set theory into conventional linear programming. This study considered LP problems with a fuzzy goal and fuzzy constraints. Hintz and Zimmermann 299
300 (1989) proposed an approach based on fuzzy linear programming (FLP) and approximate reasoning to solve APP. Lee(1990) discussed fuzzy aggregate production planning problems with single product type, under the environment of fuzzy objective, fuzzy workforce levels and fuzzy demands in each period. A linear programming model with fuzzy objective and fuzzy constraints is developed, and fuzzy solutions under different levels can be achieved through parametric programming technology. Wang and Fang (2001) presented a novel FLP method for solving the APP problem with multiple objectives where the product price, unit cost to subcontract, workforce level, production capacity and market demands are fuzzy in nature. Fung et al. (2003) proposed a fuzzy multiproduct aggregate production panning (FMAPP) model to cater to different scenarios under various decision making preferences by applying integrated parametric programming, best balance and interactive methods. Wang and Liang (2004) developed a fuzzy multi-objective linear programming model with the piecewise linear membership function to solve multi-product APP decision problems in a fuzzy environment. Tang et al. (2000) focused on a novel approach to modeling multi-product aggregate production planning problems with fuzzy demands and fuzzy capacities. The objective of this study is to minimize the total costs of quadratic production costs and linear inventory holding costs. By means of formulation of fuzzy demand, fuzzy addition and fuzzy equation, the production inventory balance equation in single stage and dynamic balance equation are formulated as soft equations in terms of a degree of truth and interpreted as the levels of satisfaction with production and inventory plan in meeting market demands. This study considers an interactive goal programming approach for fuzzy multi-objective linear programming to solve APP problems. This model considers to minimize production cost, inventory carrying and backordering cost and hire and lay off worker costs.
2.1 Problem Formulation and assumptions The multi product APP problem can be described based on assumptions that a firm produces N type of products to satisfy the market demand in each period t (t=l,2,....T). The decision making problem is related to determine the compromise solution for meeting forecast demand by considering the Model 1 and Model II. All of those in Model II are fuzzy with indefinite goal level while all of the considered objective functions are crisp values in Model I. Besides, all of the considered objective functions are linear. The values of all parameters are definite over the planning time horizon. The forecast demand over time period is either met or backordered, if there is a backorder in a period, this backorder should be procured in the following period. The used nomenclatures are as follows.
D, : forecasted demand for nth product in period t (units) C,
: Cost to hire one worker in period t ($/man - hour)
CL, : Normal production cost per unit for nth product in period t Cy
: Cost to layoff one worker in period t ($/man - hour)
Wh,: Workers hired in period t (man - hour) (decision variable) WL : Workers laid off in period t (man - hour)(decision variable) Qt : Normal time productionof nth product in period t (units)(decision variable) Otn: Overtime production of nth product in period t (units)(decision variable) Cfjj: Overtime production cost per unit of nth product in period t ($/unit) C: : Inventory carrying cost per unit of nth product in period t($/unit) L : Inventory level in period t of nth product (units) (decision variable) C!f : Backorder cost per unit of nth product in period t ($/unit) B[n : Backorder level of nth product in period t (units) (decision variable) a>tn : Hours of worker per unit of nth product in period t (man - hour/unit) /itn : Hours of machine usage per unit of nth product in period t (machine - hour/unit) Wtmax: Maximum worker level available in period t (machine - hour) Mtmax : Maximum machine capacity available in period t (machine - hour) I.n min : Minimum inventory level of nth product in period t (units) B,n
max:
Maximum backorder level available of nth product in period t (units)
Model I formulations are as follows.
302
n=l 1=1
n=l 1=1
1=1
(Minimize Costs of total production) Min&i = £ ^jfL
*/,„ +C* *Bln]
(2)
n=l 1=1
( Minimize Inventory Carrying and Back ordering Costs) Ming3=Jjcr*WH+Cr-*Wl, 1=1
I
(3)
( Minimize Costsof Changes in Vorker levels) subjecto l,n-Bu =l,-,„-B,^+a,+Om- Dm V(,Vn 'tnmin
(4) (5) (6)
V/,Vn Vf.Vn
—'In
B
mmax-Bm N
fJG>,-jQ,-l,n+0,-J+Wfi-Wl,
<7)
n=l
n=l N
V;
^^(Qn+Oj^Wt^
(8)
n=l N
Vf
imax
(9)
n=l
Non- negative decision variables; Qm:0,„;la;Bm;Wh,;Wl, >0;
Vf; Vn
2.2 The Proposed Fuzzy Multi-objective LP Model (Model 11) In this study, multi-objective linear programming model can be converted into the multiple fuzzy objective linear programming models by considering a linear membership function to indicate the fuzzy goals of decision maker. (Bellman and Zadeh, 970). The proposed model in this study is mainly inspired from the model considered by Wang and Liang (2004). The present model differs from Wang and Liang's model, in terms of the considered constraints and objective functions. The our model is not considered maximum machine capacity and warehouse capacity and the escalating factors in each of the costs categories over the next T planning horizon. In this study, the linear membership function considering for each objective function is determined as follows.
fki)=
8
'~~8/» / gi
0
u -gi
slight ft*rf
do)
303 where giamtgf indicate the lover and upper bounds of objective function respectively. The interval \g',,g?\ of goals values can be determined ith according to decision maker's view. If decision maker is not satisfied according to the initial solution, the model should be revised until a suitable solution is obtained. First, the APP problem is solved using crisp multi objective linear programming (MOLP) model. Second, the linear membership function is determined for each value of the obtained objective function. Third, the variable X is determined as the overall degree of decision maker satisfaction with the multiple fuzzy goal values. The value X is maximized. The new constraints for upper and lower values of each objective are added to the above model. The new obtained fuzzy multi objective linear programming problem should be solved. If the new solution is not acceptable according to DM considering initial solution, then the model must be changed as a satisfaction solution is procured. Model II Maxk
(11)
subjecto
(i&-8,W£- -«,0>A Vi l
B
tn~ m~h-lr,
~B,-I.n+Q„+Om-
(12) Dm
Vt.Vn
(13)
'tnmm—'tn
Vf.Vn
(14)
B
V*,Vn
(15)
,nmax^Bln N
N
2>-;.„(Q-;.,+Q-J+Wt- -VV( n=l N
(16)
n=l
J^u
\/t
n=l N
YfiniQn+OjlM,^
V»
(17)
(18)
3. An application of interactive goal programming approach for fuzzy multi-objective linear programming problem 3.1. A Case Study The considered fuzzy model is applied to an illustrative case study. The APP strategy of this case study is related to procure fluctuating demand to be met using inventories, overtime, backorders based on constant work force level over the planning time horizon. The planning horizon time is six months long, there are three types of products in illustrative case study. The considered data for case study are shown in Table 1.
304 Table 1. The data for Illustrative case study Period
Product
1
2
3
4
D
C
C°
cL
c,B„
1
1500
20
30
0.20
2 3 1 2 3 1 2 3 1 2 3
1200 2400 800 3000 2000 4000 3000 2000 2400 5000 3000
26 18 20 26 18 20 26 18 20 26 18
39 27 30 39 27 30 39 27 30 39 27
0.15 0.10 0.20 0.15 0.10 0.20 0.15 0.10 0.20 0.15 0.10
*m
Mm
40
0.010
0.12
200
52 36 40 52 36 40 52 36 40 52 36
0.012 0.014 0.010 0.012 0.014 0.010 0.012 0.014 0.010 0.012 0.014
0.06 0.08 0.12 0.06 0.08 0.12 0.06 0.08 0.12 0.06 0.08
400 300 200 400 300 200 400 300 200 400 300
"nraax
W
M
itran
250
350
400
600
300
700
200
400
The other data are shown as follows: • Initial inventory of each product is 400, 300, and 200 respectively. • End inventory of each product is 200, 150, and 100 respectively. • The initial worker level is 200 man-hours. • The costs related to hiring and layoffs are $8 and $4 per worker-hour, respectively. 3.2 Interactive solution procedure First, APP problem for illustrative case study should be solved according to Model I considering data in Tablel. The initial solutions for each objective function are obtained based on crisp Model 1. The results are g, =596897.4 g2 = 1374.667 g3 =499.933 To determine the linear membership function of each objective function, the upper and lover level of each objective function are decided by asking the decision maker. The fuzzy multi objective linear programming model can be formulated as in Model II. The related model is solved using the LINDO computer software package. The obtained results are given in Table 2. According to decision maker's view, the results can be modified interactively by changing the parameters and membership functions. In this study, essentially, it has been considered membership function's change ranges. This proposed model provides the overall levels of decision maker satisfaction for X values. For example, if X equal to zero, none of the goals are satisfied.
305 Table 2. Optimal Production Plan according to Model II Period 1
Product
e™
Om
1
990 900 2200 2666 3000 2000 2042 3000 3569 0 5150 1530
0 0 0 0 0 0 0 0 0 0 0 0
2 2
3 1
2 3
4
3 1 2 3 1
2 3 X =1.0
g]
=597306.43 g2 = 1606.44
'TN
90 0 0 1957 0 0 2600 0 1569 0 0 0
B
TN
0 0 0 0 0 0 0 0 0 0 0 0
Whj-
WlT
0
148
39
0
0
0
0
23
g3 =999.99
Using Model II to simultaneously minimize total production costs, carrying and backordering costs and costs of changes in labor levels, yields total production cost of $ 597306.43, carrying and backordering costs of $ 1606.44 and cost of changes in labor levels of $ 999.99. In this situation, the upper and lover level of objective functions are as follows: gl(upper): 700.000 $ ; gl(lover)=500.000 $; g2(upper)= 1600 $ ; g2(lover)=1000 $ ; g3(upper)=1000 $ ; g3(lover)=0. In our example, the exactly satisfaction solution results for decision maker are given in below line in Table 2. 4. Conclusion The proposed model yields an efficient compromise solution and the overall level of decision maker satisfaction given determined multiple fuzzy objective values. The major limitations of the proposed Model II concern the assumption made in determining each of the decision parameters. However, the proposed model constitutes a systematic framework that facilities the decision -making process, enabling the decision maker interactively to modify the fuzzy data and related model parameters until a satisfactory solution is found. In our example, the exactly satisfactory solution results for decision maker are given. References l.Holt, C. C. et al., 1960 Planning Production Inventories and Workforce. New Jersey: Prentice Hall 2.Zimmermann H.J., 1976 Description and optimization of fuzzy systems. International Journal of General Systems, 2,209-215.
306 3.Lee Y. Y., 1990 Fuzzy set theory approach to aggregate production planning and inventory control. PhD. Dissertation. Department of IE., Kansas State University. 4.Wang R.C. and Hsiao- Hua Fang, 2001 Aggregate production planning with multiple objectives in a fuzzy environment. European Journal of Operational Research, 133, 521-536. 5.Fung R.Y.K., Tang J., Wang D., 2003 Multi product Aggregate production planning with fuzzy demands and fuzzy capacities. IEEE Transactions on Systems, Man and Cybernatics-Part A: Systems and Humans 33, 3, 302-313. 6.Wang, R.C., Liang T.F., 2004 Application of fuzzy multi objective linear programming to aggregate production planning. Computers and Industrial Engineering, 46, 1, 17-41. 7.Tang, J., D. Wang, R.Y.K. Fung, 2000. Fuzzy formulation for multi product aggregate production planning. Production Planning and Control, 11, 670-676. 8.Bellman R.E., L.A. Zadeh 1970, Decision-making in a fuzzy environment, Management Science, 17, 141-164.
FUZZY LINEAR PROGRAMMING MODEL FOR MULTIATTRIBUTE GROUP DECISION MAKING TO EVALUATE KNOWLEDGE MANAGEMENT PERFORMANCE* Y. ESRA ALBAYRAK Engineering and Technology Faculty.Galatasaray University, Ciragan Cad. No: 36, 34357 Ortakoy, Istanbul Turkey Tel: 90 212 227 44 80 (435) Fax: 90 212 259 55 57
YASEMIN CLAIRE ERENSAL Engineering Faculty, Dogus University, Acibadem Zeamet Sokak 34722 Kadikoy, Istanbul Turkey Tel: 90 216 327 11 04 (1377) Fax: 90 216 326 33 Abstract: In this paper, we develop a linear programming technique for multidimensional analysis of preference (LINMAP) method for solving multiattribute group decision making (MAGDM) problems with preference information on alternatives in fuzzy environment. Our aim is to develop a fuzzy LINMAP model to evaluate and to select of knowledge management (KM) tools. KM decision-making problems are often associated with evaluation of alternative KM tools under multiple objectives and multiple criteria.
1. Introduction In this paper, we investigate the fuzzy linear programming technique (FLP) for multiple attribute group decision making (MAGDM) problems with preference information on alternatives. In multiple attribute decision-making (MADM) problems, a decision maker (DM) is often faced with the problem of selecting, evaluation or ranking alternatives that are characterized by multiple, usually conflicting, attributes [1]. In this paper, to reflect the decision maker's subjective preference information and to determine the weight vector of attributes, the technique for order preference by similarity to ideal solution (TOPSIS) developed by Hwang &Yoon and the linear programming technique for multidimensional analysis of preference (LINMAP) developed by Srinivasan and Shocker [2] are used. LINMAP method is based on pairwise comparisons of alternatives given by decision makers and generates the best compromise alternative as the solution that has the shortest distance to the positive ideal This research has been financially supported by Galatasaray University Research Fund.
307
308 solution. In this paper, according to the concept TOPSIS, we define the fuzzy positive ideal solution (FPIS) and fuzzy negative ideal solution (FNIS). Because organizations operate in different business contexts and drivers of knowledge management are often unique for each company, KM decision-making problems are often associated with evaluation of alternative KM tools under multiple objectives and multiple criteria. We proposed a linear programming technique for multidimensional analysis of preferences under fuzzy environment in evaluating KM tools. The use of fuzzy linear programming (FLP) to KM will be discussed and this approach to KM problems has not been appeared in the literature. The weights are estimated using fuzzy linear programming model based on group consistency and inconsistency indices. Through the proposed methodology in this research, enterprises can reduce the mismatch between the capability and implementation of the KM system, and greatly enhance the effectiveness of implementation of the KM systems. Finally, the developed model is applied to a real case of assisting decision-makers in a leading logistics company in Turkey to illustrate the use of the proposed method, 2. The Basic Model The main focus of this paper is to provide a fuzzy linear programming model [3], for multidimensional analysis of preferences (Fuzzy LINMAP). Consider a MADM problem with n alternatives A.,i = l,2,....n, and m decision attributes (criteria),
c .,j = l,2,....m. xy,
by z> = (*..)
x
component of a decision matrix denoted
,is the rating of alternative At with respect to attribute C. . Let
w=(wr wr ..., w / be the vector of weights, where £ w. = i,w.>o,j = l,2,....m and j=i
J
J
w. denotes the weight of attribute C. [4]. In this methodology, linguistic variables are used to model human judgments. These linguistic variables can be described by triangular fuzzy numbers. x..=ia..,b..,cA [5],[6]. 2.1 Distance between two triangular fuzzy numbers Let m = (m1,m2,w3) and n=(«1,«2»«3) be two triangular fuzzy numbers, then the vertex method is defined to calculate the distance between them as [7], d(m,n) = J - l (m, - n})2 + (m2 - n2)2 + (m3
-n})2\
(1)
309 2.2
Normalization
Suppose the rating of alternative A.(i = l,2,...n) on attribute C.(j = l,2,...m) given by DM P (p = l,2,...P) is xp =iap ,bp ,cpA . A fuzzy multiattribute group decision making problem can be expressed in matrix format ( D p = [ xP. ]
).
c\ap;ap E X , ' = (ap.,bp ,cp. ) , i = l,2,..,n;p r = 1,2,..,P\ A y ij v ij IJ y y --min)ap;ap exp = (ap ,bp ,cp ), i = l,2,...n; p = 1,2 P] ij
U
ij
ij
V
r
'J
I
(2) bmax^min cmax cmax h a v e
als0
game
meaning
In
MADM problems, there are
benefit (B) and cost (C) attributes. Using the linear scale transformation, the various criteria scales are transformed into a comparable scale. forjeB
max
J
mm ,mm mm a . 0. c. J J J
and yp =
for j e c
'ij
J
V V
)
V
(3)
V)
We can obtain the normalized fuzzy decision matrix denoted by Yp. Yp=\yp) p = l,2,..,P; wherey P ={yp ,yp,,yp\ are normalized triangular 6 V V )„xm 'J VvLyiJM yUR) h fuzzy numbers and denote the location of the i' alternative in the /n-dimensional space (criteria). 2.3 Fuzzy group LINMAP Let X
=\xJ,x2...
model
..x i is the fuzzy positive ideal point, i.e., the alternative
location most preferred by the individual, the square of the weighted Euclidean distance between Y.p and x , where x.=(x*L,x.M,x.R)dre
triangular fuzzy
numbers, can be calculated as ~X)L>2
"Srpj^&vL
+
(ym ~*jM>2 +(y0R ~X)R>2]/2
for
ieA
(4) The squared distance s. = d2 is given by S? = E v
i>$4 (5)
p
S can
s=
be
rewritten
using
triangular
fuzzy
l hfjltyvL-*V+<»w-*wt+(yw-xV\ y
ijL~XjL'
'"ijM
"jM/
'KJ,ijR
numbers
x*
as [8],
Suppose that the DM
310 P (p = l,2,...,P)gives the preference
relations
between
alternatives
by
np = {{k,l); Akp Ar k,l = l,2,...,n)]where pp is a preference relation given by the DM P„
s
*--}AdW%•s''lrk
#€
are squared
weighted
Euclidean distances between each pair of alternative {k,l) and the fuzzy positive ideal solution (x ) . For every ordered pair ( t , / ) e Q , the solution would be consistent with the weighted distance model if sf > sf and there is no error attributable to the solution [2]. If sf < S%, (sf -sf) an index (Sf-Sff
gives the error. We define
to measure inconsistency between the ranking of
alternatives and the preferences, i.e., to denote the error of the pair (k,l); (Sf - Sff = 0 if Sf > Sf and (Sf -Sff=Sf-
Sf if Sf < Sf (6)
Then the inconsistency index can be rewritten as, (Sf-Sff
=max[o,Sf-Sf
(7) For all the pairs in Q , the total inconsistency is B"= Y
(Sf-Sff (8)
and the total poorness of fit for the group is
B=ZB" = J: p=\
Y
(sf-sfy
p=\ (A,7)efi
(9) Our objective is to minimize the sum of errors for all pairs in Q Similarly, the total goodness (G) of fit for the group is
G=J:Gp = J: p=l
Y
1
PY (Sf-Sf)
(10)
p=\ (i,7)en
Substituting for B and G from (9) and (10), we get;
I (kj)&np
(Sf-Sff-
S
(Sf-Sff
= G-B = h
(k,i)enp
(11) h is arbitrary positive number. The constraint imposes the condition that the goodness of fit G should be greater the poorness of fit B. Let
311 Z^ = max [0, S? - sf} for each (k,l)e Q." and with z£ > 0, we have z£ > S^ - ^ . The problem of finding the best solution(w,x ) reduces to finding the solution(w,v) [9], which minimizes Eq.(12) subject to the constraints [8]. minimize^ £
Z
Zyf
subject to the constraint I m
P
yfjL -^Mi-yiMi-^R 2 P 3p=l
5> H
-I*.
£ (kJtef
(
«
•*#£ 'IqLyYljM
>
!
x
•
jM
kjM\\ \yljR
(kjjeif
ytiM-^wY^jR '
J=>
^ (k,i)enf
'fw-^
=h
y
kjR
M */-=' ZP>0, m
Iw=;, H J w. >0,
j=lZ...m j=l2...m
Using K = }v.|= fw .x*) we can write as vy.,l =wy.x.., v ... =wy.x... and vjR.„ = wJ .xjR.„ y £ ' /A/ yAf By solving this linear programming, w., v i , v
v (eq. 20) can be obtained and
J* is computed. 3. Application 5./ Evaluation Criteria for the KM tools In order to formulate the multiattribute evaluation model, it is necessary to identify the factors that influence KM practitioners' choice of KM tools. After discussions with four KM consultants and the operations manager, we studied the features of the KM tools provided by vendors, reviewed the literature for selecting software, and identified three essential evaluation criteria to use in selecting the best KM tools: cost, functionality and vendors.
312 3.1.1 Cost Cost is a common factor influencing the purchaser to choose the software [10]. It is the expenditure associated with KMS and includes product, license, training, maintenance and software subscription costs. 3.1.2 Functionality Functionality refers to those features that the KM tool performs and, generally, to how well the software can meet the user's needs and requirements. 3.1.3 Vendor The quality of vendor support and its characteristics are of major importance in the selection of software, such as in [11]. It is also critical for the successful installation and maintenance of the software. 3.2 KM tools (Alternatives) Alternative 1. Knowledger: Knowledger consists of components that support personal KM, team KM, and organizational KM. The benefit of these components is that, through the knowledge portal, it is possible to manage, collaborate, capture and convey information and so forth to the teams or organization. Alternative 2. eRoom; The eRoom software is a digital workplace that allows organizations to quickly assemble a project team, wherever people are located and to manage the collaborative activities that drive the design, development and delivery of their products and services. Alternative 3. Microsoft SharePoint Portal Server; SharePoint Portal Server software is a KM tool that is an end-to-end solution for managing documents, developing custom portals and aggregating content from multiple sources into a single location. The proposed method is currently applied to solve KM tools selection problem and the computational procedure is summarized as follows: Stepl: The experts P (p = 1,2,3) give their preference judgments between alternatives with paired comparisons as n1 = \(l,2),(2,3)}, Q2 = {(1,2),(1,3)}, n3 = {(2,i),(3,2)} i.e., 1 is preferred to 2, 2 is preferred to 3, etc. Step2: The experts use the linguistic rating variables (shown in Table 1) to evaluate the rating of alternatives with respect to each attribute. The data and ratings of all alternatives on every attribute are given by the three experts PrP2,P3 as in Table 2.
313 Table 1 Linguistic variables for the ratings Very Poor (VP) (0, 0.1,0.3) Poor(P) (0.2 0.3,0,4) (0.4 0.5,0.6) Fair(F) Good (G) (0.6 0.7,0.8) Very Good (VG) (0.8 0.9, 1.0) Table 2 Decision information and ratings of the three alternatives Criteria
Alternatives
C, (SxlO3)
Decision Makers Pi Pi Pi 50,000 50,000 50,000 35,000 35,000 35,000 25,000 25,000 25,000 Fair Good VeryG Poor Fair Poor Very G Good Good Fair VeryG Good Good Good VeryG Good Fair Good
A, A2 A, A, A2 Ai A, A2 A3
C2
C3
Step3: Constructing the normalized fuzzy decision matrix Y for expert 1 (using Eqs.(2 ) and (3)) X
X
1
Y
=A2 A
3
(0.5,0.5,0 .5) (0.71,0.71 ,0.71) (1.0,1.0,1 .0)
X
3
2
(0.6,0.77, 1.0) (0.8,1.0,1 .0) (0.2,0.33, 0.5) (0.6,0.77, 1.0) (0.8,1.0,1 .0) (0.6,0.77, 1.0)
We can obtain the normalized decision matrices Y2 and Y3 of the experts P2andP To obtain the best weights and ideal point, taking h = 1.0 and using Yp and w ; =0.284,
fipwe
solve
linear
programming
problem
(Eq.
(12)).
w2= 0.398, w} = 0.318 and
x* =((0.27, 0.27, 0.27), (0.19, 0.20, 0.22), (0.23, 0.24, 0.25)).
Using
Eq.
(6),
the
distances between YP and the positive ideal x* can be obtained. According to distances, the ranking orders of the three alternatives for the three experts are as follows: ForP, :A2pA3pAj ForP 2 : A 3 pA, pA2 ForP 3 : A 3 pA2pA, The group ranking order of all alternatives can be obtained using social choice functions such as Copeland's function [12]. Copeland's function ranks the alternatives in the order of the value of / (x), Copeland score, that is the number of alternatives in alternative set that x has a strict simple majority over, minus the number of alternatives that have strict simple majorities over X . Alternatives
A, A2 A,
Table 3 Copeland's scores Decision Makers P/
Pj
Pj
-1,-1 1,1 1,-1
-1 , 1 -1 ,-1 -1 J
-1,-1 -1,1 1,1
Copeland's scores -4 0 2
314 According to the Copeland's scores, the ranking order of the three alternatives is A3, A2, Aj. The best alternative is A3. 4. Conclusion This paper offers a methodology for analyzing individual and multidimensional preferences with linear programming technique under fuzzy environments. In this paper, a systemic approach is proposed using fuzzy linear programming to evaluate an appropriate KM tool for the organization. To reflect the DM's subjective preference information, a fuzzy LINMAP model is constructed to determine the weight vector of attributes and then to rank the alternatives. The development of a KMS is still relatively new to many organizations. This study has several implications for KM practitioners who intend to evaluate KM tools to build a KMS. Through the proposed methodology in this research, enterprises can reduce the mismatch between the capability and implementation of the KM system, and greatly enhance the effectiveness of implementation of the KMS. References 1. Hwang, K. P. and Yoon, K. P., (1995). Multiple Attribute Decision Making, Sage University Paper, Iowa. 2. Sinnivasan, V., Shocker, A.D., (1973). Linear Programming Techniques for Multidimensional Analysis of Preferences, Psychometrica, 38 (3), 337-369. 3. Hwang, C.-L., and S.-J. Chen, in collaboration with F.P. Hwang (1992). Fuzzy Attribute Decision Making: Methods and Applications. Springer-Verlag, Berlin. 4. Wang, Y., M., Parkan, C, (2005). Multiple Attribute Decision Making Based on Fuzzy Preference Information on Alternatives: Ranking and Weighting, Fuzzy Sets and Systems, 153, 331-346. 5. Van Laarhoven, P.J.M., and W. Pedrycz (1983). A fuzzy extention of Saaty's priority theory. Fuzzy Sets and Systems, 11 (3), 229-241. 6. Zadeh, L.A. (1965). Fuzzy Sets. Information and Control, 8 (3), 338-353. 7. Chen, C.T., (2000). Extensions of the TOPSIS for Group Decision-Making under Fuzzy Environment, Fuzzy Sets and Systems, 114, 1-9. 8. Li, D.,F., Yang, J.,B., (2004). Fuzzy Linear Programming Technique for Multiattribute Group Decision Making in Fuzzy Environments. Information Sciences, 158, 263-275. 9. Fan, Z.PHu, G.F., Xiao, S.H., (2004). A Method for Multiple Attribute DecisionMaking with the Fuzzy Preference Relation on Alternatives, Computers&Industrial Engineering, 46,321-327. 10.Davis,L.,&Williams,G.(1994).Evaluating and selecting simulation software using the analytic hierarachy process. Integrated Manufacturing Systems, 5 (1), 23-32. ll.Byun,D.H.,&Suh,E.H.(1996).A methodology for evaluation EIS software packages. Journal ofEnd User Computing, 8 (21), 31. 12. Hwang, C.,L., Lin, M., J., (1987). Group Decision Making under Multiple Criteria, Springer-Verlag, Berlin.
PRODUCT-MIX DECISION WITH COMPROMISE LP HAVING FUZZY OBJECTIVE FUNCTION COEFFICIENTS (CLPFOFC) SANI SUSANTO* Senior Lecturer, Department of Industrial Engineering, Faculty of Industrial Technology, Parahyangan Catholic University, Jin. Ciumbuleuit 94, Bandung - 40141, Indonesia. PANDIAN VASANT* Research Lecturer, EEE Program, Universiti Teknologi Petronas, 31750 Tronoh.BSI, PerakDR, Malaysia. ARIJIT BHATTACHARYA § Examiner of Patents & Designs, The Patent Office, Bouddhik Sampada Bhawan, CP-2, Sector V, Salt Lake, Kolkata 700 091, West Bengal, India. CENGIZ KAHRAMAN** Professor, Department of Industrial Engineering, Istanbul Technical University, 34367 Macka Besiktas, Istanbul, Turkey. This paper outlines, first, a compromise linear programming (LP) having fuzzy objective function coefficients (CLPFOFC) and thereafter, a real-world industrial problem for product-mix selection involving 29 constraints and 8 variables is solved using CLPFOFC. This problem occurs in production planning management in which a decision-maker (DM) plays a pivotal role in making decision under a highly fuzzy environment. Authors have tried to find a solution that is flexible as well as robust for the DM to make an eclectic decision under real-time fuzzy environment.
1. Introduction The theory of fuzzy LP was developed to tackle imprecise or vague problems using the fundamental concept of artificial intelligence, especially in reasoning and modelling linguistic terms. Conventional mathematical programming techniques fail to solve fuzzy programming problems (Kolman and Beck, 1995). Thus, the CLPFOFC approach is best suited to solve some of real-life problems. E-mail: [email protected] [email protected] [email protected] ' [email protected]
315
316 Some previous attempts to set fuzzy intervals, where coefficients of the criteria are given by intervals, were reported by Bitran (1980), Jiuping (2000) and Sengupta, Pal and Chakroborty (2001). Wang (1997) used triangular MF for LP modelling. A real-life industrial problem for optimal product-mix selection involving 29 constraints and 8 variables has been delineated in this paper. 2. Product-Mix Problem of Chocoman, Inc. The firm Chocoman, Inc. manufactures 8 different kinds of chocolate products. There are 8 raw materials to be mixed in different proportions and 9 processes (facilities) to be utilized having limitations in resources of raw materials. Constraints, viz., product-mix requirement, main product line requirement and lower and upper limit of demand for each product, are imposed by the marketing department. All the above requirements and conditions are fuzzy. The objective is to obtain maximum profit (z) with certain degree of LOS of the DM. 2.1. Fuzzy Objective Coefficients and Non-Fuzzy Constraints The two sets of non-fuzzy constraints are raw material availability, and facility capacity constraints. These constraints are inevitable for each material and facility, based on material consumption, facility usage and resource availability. The decision variables for the product-mix problem are: X] to x8 (viz., milk chocolate of 250g to be produced (in '000); milk chocolate of lOOg (in '000); crunchy chocolate of 250g (in '000); crunchy chocolate of 1 OOg (in '000); chocolate with nuts of 250g (in '000); chocolate with nuts of lOOg (in '000); chocolate candy (in '000 packs); chocolate wafer (in '000 packs)). The following constraints are established by the sales department of Chocoman, Inc.: Product mix requirements: Large-sized products (250g) of each type should not exceed 60% (non fuzzy value) of the small-sized product (lOOg) such that: Constraint-1: xj < 0.6 x2, Constraint-2: X3 < 0.6 X4, and Constraint-3: x5 < 0.6 x6 Main product line requirement: The total sales from candy and wafer products should not exceed 15% (non fuzzy value) of the total revenues from the chocolate bar products, such that: Constraint-4: 400x7 + 150x8 < 0.15(375x, + 150x2 + 400x3 + 160x4 + 420x5 + 175x6) 3. Rest of the identified 29 constraints, i.e., material requirement and facility usages, are given below: Constraint-5 (cocoa usage): 87.5x, + 35x2 + 75x3 + 30x4 + 50x5 + 20x6 + 60x7 + 12x8 < 100000 Constraint-6 (Milk usage): 62.5x, + 25x2 + 50x3 + 20x4 + 50x5 + 20x6 + 30x7 + 12x8 < 120000
Constraint-7 (nuts usage): 0x1 + 0x2 + 37.5x3 + 15x4 + 75x5 + 30x6 + 0x7 + 0x8 ≤ 60000
Constraint-8 (confectionery sugar usage): 100x1 + 40x2 + 87.5x3 + 35x4 + 75x5 + 30x6 + 210x7 + 24x8 ≤ 200000
Constraint-9 (flour usage): 0x1 + 0x2 + 0x3 + 0x4 + 0x5 + 0x6 + 0x7 + 72x8 ≤ 20000
Constraint-10 (aluminium foil usage): 500x1 + 0x2 + 500x3 + 0x4 + 0x5 + 0x6 + 0x7 + 250x8 ≤ 500000
Constraint-11 (paper usage): 450x1 + 0x2 + 450x3 + 0x4 + 450x5 + 0x6 + 0x7 + 0x8 ≤ 500000
Constraint-12 (plastic usage): 60x1 + 120x2 + 60x3 + 120x4 + 60x5 + 120x6 + 1600x7 + 250x8 ≤ 500000
Constraint-13 (cooking facility usage): 0.5x1 + 0.2x2 + 0.425x3 + 0.17x4 + 0.35x5 + 0.14x6 + 0.60x7 + 0.096x8 ≤ 1000
Constraint-14 (mixing facility usage): 0x1 + 0x2 + 0.15x3 + 0.06x4 + 0.25x5 + 0.10x6 + 0x7 + 0x8 ≤ 200
Constraint-15 (forming facility usage): 0.75x1 + 0.3x2 + 0.75x3 + 0.30x4 + 0.75x5 + 0.30x6 + 0.90x7 + 0.36x8 ≤ 1500
Constraint-16 (grinding facility usage): 0x1 + 0x2 + 0.25x3 + 0.10x4 + 0x5 + 0x6 + 0x7 + 0x8 ≤ 200
Constraint-17 (wafer-making facility usage): 0x1 + 0x2 + 0x3 + 0x4 + 0x5 + 0x6 + 0x7 + 0.30x8 ≤ 100
Constraint-18 (cutting facility usage): 0.50x1 + 0.10x2 + 0.10x3 + 0.10x4 + 0.10x5 + 0.10x6 + 0.20x7 + 0x8 ≤ 400
Constraint-19 (packaging-1 facility usage): 0.25x1 + 0x2 + 0.25x3 + 0x4 + 0.25x5 + 0x6 + 0x7 + 0.1x8 ≤ 400
Constraint-20 (packaging-2 facility usage): 0.05x1 + 0.30x2 + 0.05x3 + 0.30x4 + 0.05x5 + 0.30x6 + 2.50x7 + 0.15x8 ≤ 1000
Constraint-21 (labour usage): 0.30x1 + 0.30x2 + 0.30x3 + 0.30x4 + 0.30x5 + 0.30x6 + 2.50x7 + 0.25x8 ≤ 1000
Constraints 22 to 29, which give the demand limits for MC 250, MC 100, CC 250, CC 100, CN 250, CN 100, Candy and Wafer respectively, are x1 ≤ 500, x2 ≤ 800, x3 ≤ 400, x4 ≤ 600, x5 ≤ 300, x6 ≤ 500, x7 ≤ 200 and x8 ≤ 400. The non-negativity constraints are x1, ..., x8 ≥ 0.

3. Algorithm for the CLPFOFC
The CLPFOFC algorithm used to arrive at an eclectic product-mix decision under a fuzzy environment is as follows:
Step 1: Formulation of the crisp linear programming problem;
Step 2: Determination of the type of fuzzy number (e.g., a triangular fuzzy number) to be chosen for each of the objective function coefficients c_j;
Step 3: Defining the objective function coefficient vector c, the lower bound vector c⁻ and the upper bound vector c⁺ of the objective function coefficients;
Step 4: Formulation of the following LP model with multiple objectives for the triangular fuzzy objective:

maximise z = (c⁻x, cx, c⁺x)
subject to Ax ≤, =, ≥ b, x ≥ 0                                         (1)

Step 5: Transforming the problem formulated in Step 4 into the following form:

min z1 = (c - c⁻)x, max z2 = cx, max z3 = (c⁺ - c)x
subject to Ax ≤, =, ≥ b, x ≥ 0                                         (2)

Step 6: Determination of the following set of compromise solutions over X = {x | Ax ≤, =, ≥ b, x ≥ 0}:

z1_min = min_{x∈X} (c - c⁻)x                                           (3)
z1_max = max_{x∈X} (c - c⁻)x                                           (4)
z2_min = min_{x∈X} cx                                                  (5)
z2_max = max_{x∈X} cx                                                  (6)
z3_min = min_{x∈X} (c⁺ - c)x                                           (7)
z3_max = max_{x∈X} (c⁺ - c)x                                           (8)

Step 7: Defining the following set of three MFs:

μ_z1(x) = 1,                                               if (c - c⁻)x ≤ z1_min
μ_z1(x) = (z1_max - (c - c⁻)x)/(z1_max - z1_min),          if z1_min ≤ (c - c⁻)x ≤ z1_max
μ_z1(x) = 0,                                               if (c - c⁻)x ≥ z1_max        (9)

μ_z2(x) = 1,                                               if cx ≥ z2_max
μ_z2(x) = (cx - z2_min)/(z2_max - z2_min),                 if z2_min ≤ cx ≤ z2_max
μ_z2(x) = 0,                                               if cx ≤ z2_min               (10)

μ_z3(x) = 1,                                               if (c⁺ - c)x ≥ z3_max
μ_z3(x) = ((c⁺ - c)x - z3_min)/(z3_max - z3_min),          if z3_min ≤ (c⁺ - c)x ≤ z3_max
μ_z3(x) = 0,                                               if (c⁺ - c)x ≤ z3_min        (11)

Step 8: Defining the following max-min problem:

max_{x∈X} min{μ_z1(x), μ_z2(x), μ_z3(x)}                               (12)

Step 9: Converting the problem of Step 8 into the following compromise LP problem by introducing

α = min{μ_z1(x), μ_z2(x), μ_z3(x)}                                     (13)

Step 10: Obtaining an equivalent compromise solution to Step 8 by solving the following LP problem:

max α                                                                  (14)
subject to
μ_z1(x) ≥ α, μ_z2(x) ≥ α, μ_z3(x) ≥ α, Ax ≤, =, ≥ b                    (15)
or, equivalently,
(c - c⁻)x + α(z1_max - z1_min) ≤ z1_max                                (16)
cx - α(z2_max - z2_min) ≥ z2_min                                       (17)
(c⁺ - c)x - α(z3_max - z3_min) ≥ z3_min                                (18)
0 ≤ α ≤ 1                                                              (19)
x ≥ 0                                                                  (20)
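The steps above can be sketched computationally as follows. This is a minimal illustration that assumes a pure maximisation problem with inequality constraints Ax ≤ b, uses scipy.optimize.linprog rather than the WinQSB® package employed by the authors, and all function and variable names are ours.

```python
# Sketch of Steps 5-10 of the CLPFOFC algorithm for a profit-maximisation LP
# with triangular fuzzy objective coefficients (c_low, c, c_up) and crisp
# constraints A x <= b, x >= 0.  Illustrative only.
import numpy as np
from scipy.optimize import linprog

def compromise_lp(c_low, c, c_up, A, b):
    c_low, c, c_up = map(np.asarray, (c_low, c, c_up))
    objs = [c - c_low, c, c_up - c]                # z1, z2, z3 of Step 5
    n = len(c)

    def extreme(v, sense):
        # linprog minimises, so a maximum is obtained by negating v
        res = linprog(v if sense == "min" else -v, A_ub=A, b_ub=b,
                      bounds=[(0, None)] * n, method="highs")
        return res.fun if sense == "min" else -res.fun

    z_min = [extreme(v, "min") for v in objs]      # Step 6, Eqs. (3), (5), (7)
    z_max = [extreme(v, "max") for v in objs]      # Step 6, Eqs. (4), (6), (8)

    # Step 10: maximise alpha subject to the membership constraints (16)-(18)
    obj = np.zeros(n + 1); obj[-1] = -1.0          # minimise -alpha
    rows = [np.append(r, 0.0) for r in np.asarray(A)]
    rhs = list(b)
    rows.append(np.append(objs[0],  z_max[0] - z_min[0])); rhs.append(z_max[0])
    rows.append(np.append(-objs[1], z_max[1] - z_min[1])); rhs.append(-z_min[1])
    rows.append(np.append(-objs[2], z_max[2] - z_min[2])); rhs.append(-z_min[2])
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * n + [(0, 1)], method="highs")
    return res.x[:n], res.x[n]                     # optimal x and alpha
```

For the Chocoman problem, A and b would be assembled from Constraints 1 to 29 above, and c_low, c, c_up from the lower, modal and upper profit coefficients of the eight products.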
4. Results, Discussions and Conclusion
The WinQSB® software has been used to obtain the results (Table 1). Solving Eqs. (9) to (11), the following values are obtained, respectively: μ_z1(x) = 0.5030, μ_z2(x) = 0.5030 and μ_z3(x) = 0.5039. From the definition of α in Eq. (13) one gets α = min{μ_z1(x), μ_z2(x), μ_z3(x)} = 0.5030. The value of α corresponds to the two MFs μ_z1(x) and μ_z2(x). From these two MFs, the interpretation of the value of α can be obtained through the following steps: (i) from the definitions given in Step 5, the optimal solution in Table 1 yields z1 = 33404.033, z2 = 133899.404 and z3 = 33504.823; (ii) secondly, from the definition one gets μ_z1(x) = 0.5030. This value represents the LOS of the DM achieved by the optimal solution in Table 1. The highest and the lowest LOS are achieved when the difference between the values of cx and c⁻x is 0 and 67215.600, respectively. By linear interpolation, (c - c⁻)x = z1 = 33404.033 corresponds to a LOS of the DM of μ_z1(x) = 0.5030; (iii) thirdly, from the definition, μ_z2(x) = 0.5030. The lowest and the highest LOS are achieved when the value of cx is 0 and 266184.900, respectively. Using linear interpolation, cx = z2 = 133899.404 corresponds to a LOS of the DM of μ_z2(x) = 0.5030.

Table 1. Optimal combination of products from the WinQSB® software (quantities to produce, in '000 units)
MC 250: 46.037    MC 100: 76.728    CC 250: 360.000    CC 100: 600.000
CN 250: 0.000     CN 100: 0.000     Candy: 100.790     Wafer: 0.000
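The quoted membership values can be verified directly by linear interpolation from the figures given above (a small check; the spans 67215.600 and 266184.900 are the ones stated in the text).

```python
# Verification of the interpolated membership (LOS) values quoted above
z1, z1_span = 33404.033, 67215.600    # (c - c_low) x at the optimum and its worst-case span
z2, z2_span = 133899.404, 266184.900  # c x at the optimum and its best-case value
mu_z1 = 1.0 - z1 / z1_span            # decreasing membership of Eq. (9)
mu_z2 = z2 / z2_span                  # increasing membership of Eq. (10)
print(round(mu_z1, 4), round(mu_z2, 4))   # both approximately 0.5030
```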
The solution using the developed fuzzified multi-objective compromise LP algorithm considers the imprecision of the given information. The non-fuzzy solution of Tabucanon (1996) results in an optimal value of z of 266,157 (assuming that the LOS of the DM always remains constant at 100%), while the fuzzy solution of Vasant (2003) using a modified S-curve MF results in an optimal z = 318,000 (for 100% LOS of the DM, which is an ideal assumption) and z = 254,400 (at 50% LOS with a pre-determined vagueness value of 13.8). The CLPFOFC model uses a triangular MF, with an optimal profit z = 266,184 (at 100% LOS of the DM) and z = 133,899 (at 50.30% LOS of the DM). The model presented by Tabucanon (1996) was not flexible enough to incorporate the DM's LOS. Inter alia, Vasant (2003) did not use such a fuzzified compromise model. A further extension of the present model using a suitably designed smooth logistic MF (which is a more realistic assumption) may increase the profit of Chocoman, Inc. by trading off suitably among the decision variables and the other constraints in making the product-mix decision.

References
1. Bitran, G.R., 1980, Linear multiple objective problems with interval coefficients. Management Science 26, 694-706.
2. Jiuping, X., 2000, A kind of fuzzy linear programming problems based on interval-valued fuzzy sets. A Journal of Chinese Universities 15(1), 65-72.
3. Kolman, B. and Beck, R.E., 1995, Elementary Linear Programming with Applications. Academic Press, USA.
4. Sengupta, A., Pal, T.K. and Chakraborty, D., 2001, Interpretation of inequality constraints involving interval coefficients and a solution to interval linear programming. Fuzzy Sets and Systems 119, 129-138.
5. Tabucanon, M.T., 1996, Multi-objective programming for industrial engineers. In: Mathematical Programming for Industrial Engineers. Marcel Dekker, Inc., New York, 487-542.
6. Vasant, P., 2003, Application of fuzzy linear programming in production planning. Fuzzy Optimization and Decision Making 2(3), 229-241.
7. Wang, L.-X., 1997, A Course in Fuzzy Systems and Control. Prentice-Hall Int., London.
MODELING THE SUPPLY CHAIN: A FUZZY LINEAR OPTIMIZATION APPROACH NUFER YASIN ATES Industrial Engineering Department, Istanbul Technical University, 80680 Macka, Istanbul, Turkey SEZI CEVIK Industrial Engineering Department, Istanbul Technical University, 80680 Macka, Istanbul, Turkey A supply chain is a network of suppliers, manufacturing plants, warehouses, and distribution channels organized to acquire raw materials, convert these raw materials to finished products, and distribute these products to customers. Linear programming is a widely used technique for optimizing supply chain decisions. In the crisp case every parameter value is certain, whereas in real life the data are fuzzy rather than crisp. Fuzzy set theory has the capability of modeling problems with vague information. In this paper, a fuzzy optimization model for supply chain problems is developed under vague information. A numerical example is given to show the usability of the fuzzy model.
1. Introduction
A crucial component of the planning activities of a manufacturing firm is the efficient design and operation of its supply chain. A supply chain is a network of suppliers, manufacturing plants, warehouses, and distribution channels organized to acquire raw materials, convert these raw materials to finished products, and distribute these products to customers. Strategic-level supply chain planning involves deciding the configuration of the network, i.e., the number, location, capacity, and technology of the facilities. Tactical-level planning of supply chain operations involves deciding the aggregate quantities and material flows for purchasing, processing, and distribution of products. The strategic configuration of the supply chain is a key factor influencing efficient tactical operations and has a long-lasting impact on the firm. Meanwhile, tactical-level planning, which determines the operational efficiency of the strategic configuration, is also very important and must be handled attentively throughout the operation of the chain. Supply chain management (SCM) is the term used to describe the management of the flow of materials, information, and funds across the entire
supply chain, from suppliers to component producers to final assemblers to distribution (warehouses and retailers), and ultimately to the consumer. In fact, it often includes after-sales service and returns or recycling. In contrast to multi-echelon inventory management, which coordinates inventories at multiple locations, SCM typically involves coordination of information and materials among multiple firms. Supply chain management has generated much interest in recent years for a number of reasons. Many managers now realize that actions taken by one member of the chain can influence the profitability of all others in the chain. Firms are increasingly thinking in terms of competing as part of a supply chain against other supply chains, rather than as a single firm against other individual firms. Also, as firms successfully streamline their own operations, the next opportunity for improvement is through better coordination with their suppliers and customers. The costs of poor coordination can be extremely high (Johnson and Pyke, 1999). Beginning with the seminal work of Geoffrion and Graves (1974) on multicommodity distribution system design, a large number of optimization-based approaches have been proposed for the design of supply chain networks. However, the majority of this research assumes that the operational characteristics of, and hence the design parameters for, the supply chain are deterministic. Unfortunately, critical parameters such as customer demands, prices, and resource capacity are quite uncertain. Moreover, the arrival of regional economic alliances, for instance the Asian Pacific Economic Alliance and the European Union, has prompted many corporations to move more and more towards global supply chains, and therefore to become exposed to risk factors such as exchange rates, reliability of transportation channels, and transfer prices. Unless the supply chain is designed to be robust with respect to the uncertain operating conditions, the impact of operational inefficiencies such as delays and disruptions will be larger than necessary (Goetschalckx et al., 2003). In decision making, especially when a high degree of fuzziness and uncertainty is involved due to imperfections and complications of information processes, the theory of fuzzy sets is one of the best tools for systematically handling uncertainty in decision parameters. The supply chain problem is complex in nature and involves strategic decisions with long-term implications. Much of the information in the decision process is not known with certainty. Because of this, the supply chain problem inherits the characteristics of impreciseness and fuzziness. Fuzzy set theory is employed because of the vagueness and imprecision in the supply chain problem and is used to transform imprecise and vague information about the objective and the constraints into a fuzzy objective and fuzzy constraints (Kumar et al., 2006). Zadeh (1965) suggested the concept of fuzzy sets as one possible way of improving the modeling of vague parameters. Bellman and Zadeh (1970)
suggested a fuzzy programming model for decision making in a fuzzy environment. In this paper, a supply chain problem under incomplete information is solved using fuzzy set theory. In the crisp problem all supplier, production, inventory and market constraints are certain. This is not the real case; the real problem includes vagueness rather than certainty. The technique used for the solution of this problem is fuzzy linear programming. This paper is organized as follows. Section 2 defines the crisp supply chain problem. Section 3 gives the basics of fuzzy linear programming and the fuzzy model of the supply chain problem. A numerical example is demonstrated in Section 4; the example problem is solved for both the crisp and the fuzzy case. Section 5 gives the concluding remarks.

2. Crisp Supply Chain Problem
A crisp logistic problem can be formulated as the following linear programming (LP) model. To simplify the presentation, only one product is considered in this paper. It is also assumed that the costs and qualities of the raw material from the different suppliers are the same, which is why only the transportation costs from the suppliers are considered.

Crisp logistic model:
Minimize  Σ_i Σ_k c_ik x_ik + Σ_k c_k x_k + Σ_k Σ_j c_kj x_kj          (1)
subject to
Ax ≥ b
Σ_k x_kj - D_j ≥ g_j   for each j
all x_ik, x_k, x_kj, D_j, g_j ≥ 0
where x_ik denotes the amount of raw material shipped from location i to plant k, x_k represents the amount of product produced at plant k, x_kj is the amount of product shipped from plant k to market j, c_ik denotes the unit cost of raw material shipped from location i to plant k, c_k represents the unit cost of producing the product at plant k, c_kj is the unit cost of shipping the product from plant k to market j, D_j denotes the demand of market j for this product, and g_j represents the safety stock at marketplace j, which is specified by the decision maker (Yu and Li, 2000). The objective function in Eq. (1) expresses the cumulative cost, which consists of transportation, production and inventory costs. The first constraint group expresses the general constraints on flow balance, workers, materials, funds and other resource requirements in the related locations, plants and markets. The last constraint group expresses the constraints of supply and demand in the markets. In real life, much of the input information related to the supply chain is not known with certainty. For example, how a supplier will respond to a new
design cannot be ascertained. Often, situations are expressed in imprecise terms like 'very poor in late deliveries', 'hardly any rejected quantities', 'the capacity of supplier X is somewhere between 2000 and 2500', etc. Also, the inventory, market and supplier constraints cannot be known in advance. Moreover, decision makers predetermine an interval in their minds for the value of the goal. Such vagueness in the critical information cannot be captured in a deterministic problem, and therefore the optimal results of these deterministic formulations may not serve the real purpose of modeling the problem. For this reason, we have considered the model as a fuzzy model. For this model, it is desired to maximize the overall aspiration level rather than strictly satisfying the constraints.

3. Fuzzy Linear Optimization
Fuzzy mathematical programming is defined as the term that is usually used in operations research, i.e. as an algorithmic approach to solve models of the type

Maximize f(x) such that g_i(x) ≤ 0                                      (2)

Here, a special model of the problem "maximize an objective function subject to constraints", namely the linear programming model, will be considered:

Maximize z = c^T x such that Ax ≤ b, x ≥ 0                              (3)

where c and x are n-vectors, b is an m-vector, and A is an m×n matrix. It is assumed that the decision maker has lower and upper bounds c_i and c̄_i for the attainment of the objectives. The decision maker can establish these aspiration levels himself, or they can be computed as a function of the solution space. The constraints can be hard or soft. If soft, the right-hand side b_j can be exceeded by an amount p_j, which is also at the discretion of the decision maker (Jaroslav, 2001). The membership function of fuzzy objective i, μ_{G_i}(x), should be 0 for aspiration levels equal to or less than c_i, 1 for aspiration levels equal to or greater than c̄_i, and monotonically increasing from 0 to 1 in between, that is,
μ_{G_i}(x) = 0,                           if c_i^T x ≤ c_i
μ_{G_i}(x) = (c_i^T x - c_i)/(c̄_i - c_i), if c_i < c_i^T x < c̄_i
μ_{G_i}(x) = 1,                           if c_i^T x ≥ c̄_i              (4)
The membership function of the fuzzy set representing constraint j, μ_{C_j}(x), should be 0 if the constraint is strongly violated (i.e., if (Ax)_j exceeds b_j + p_j), 1 if it is satisfied in the crisp sense (i.e., if (Ax)_j is equal to or less than b_j), and should decrease monotonically from 1 to 0 over the tolerance interval (b_j, b_j + p_j):

μ_{C_j}(x) = 1,                           if (Ax)_j ≤ b_j
μ_{C_j}(x) = (b_j + p_j - (Ax)_j)/p_j,    if b_j < (Ax)_j ≤ b_j + p_j
μ_{C_j}(x) = 0,                           if (Ax)_j > b_j + p_j          (5)
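As a small illustration of Eqs. (4) and (5), the two membership functions can be coded as piecewise-linear functions (a sketch; the names are ours):

```python
# Piecewise-linear membership functions corresponding to Eqs. (4) and (5)
def mu_objective(cTx, c_lower, c_upper):
    """Degree of attainment of fuzzy objective i with aspiration bounds."""
    if cTx <= c_lower:
        return 0.0
    if cTx >= c_upper:
        return 1.0
    return (cTx - c_lower) / (c_upper - c_lower)

def mu_constraint(Ax_j, b_j, p_j):
    """Degree of satisfaction of soft constraint j with tolerance p_j."""
    if Ax_j <= b_j:
        return 1.0
    if Ax_j >= b_j + p_j:
        return 0.0
    return (b_j + p_j - Ax_j) / p_j
```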
The membership function of the decision set, μ_D(x), is given by

μ_D(x) = min{μ_{G_i}(x), μ_{C_j}(x)},  i = 1, ..., k;  j = 1, ..., m,  for all x ∈ X       (6)
The min-operator is used to model the intersection of the fuzzy sets of objectives and constraints. Since the decision maker wants a crisp decision proposal, the maximizing decision corresponds to the value of x, x_max, that has the highest degree of membership in the decision set:

μ_D(x_max) = max_{x≥0} min{μ_{G_i}(x), μ_{C_j}(x)},  i = 1, ..., k;  j = 1, ..., m          (7)

This problem is equivalent to solving the following crisp LP problem:

Max λ
subject to
(c_i^T x - c_i)/(c̄_i - c_i) ≥ λ,      i = 1, 2, ..., k
(b_j + p_j - (Ax)_j)/p_j ≥ λ,         j = 1, 2, ..., m
x ≥ 0                                                                                       (8)

which can be rewritten as

Max λ
subject to
λ(c̄_i - c_i) - c_i^T x ≤ -c_i,        i = 1, 2, ..., k
λ p_j + (Ax)_j ≤ b_j + p_j,           j = 1, 2, ..., m
x ≥ 0                                                                                       (9)

Now we can convert the crisp LP for the logistics problem into a fuzzy LP problem. Since our problem is a minimization problem, we first convert it into a maximization problem with a slight modification. The results obtained from the strictest crisp case can be used to determine the c and c̄ values. The fuzzy LP model of the logistic problem is given in Eq. (10):
Max λ
subject to
λ(c̄ - c) - ( -Σ_i Σ_k c_ik x_ik - Σ_k c_k x_k - Σ_k Σ_j c_kj x_kj ) ≤ -c
λ p_j + (-Ax)_j ≤ -b_j + p_j
λ p_j - Σ_k x_kj ≤ -(D_j + g_j) + p_j     for each j
all x_ik, x_k, x_kj, D_j, g_j ≥ 0                                                           (10)
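A generic computational sketch of the crisp equivalent (9) is given below. It assumes a maximisation problem max c^T x subject to Ax ≤ b with a single fuzzy objective (aspiration bounds c_low and c_up) and soft constraints with tolerances p_j; the solver and all names are our own choices, not those used by the authors.

```python
# Max-lambda formulation of Eq. (9):
#   max  lambda
#   s.t. lambda*(c_up - c_low) - c^T x <= -c_low
#        lambda*p_j + (A x)_j          <= b_j + p_j     for every soft constraint j
#        0 <= lambda <= 1,  x >= 0
import numpy as np
from scipy.optimize import linprog

def fuzzy_lp(c, A, b, p, c_low, c_up):
    c, A, b, p = map(np.asarray, (c, A, b, p))
    n, m = len(c), len(b)
    obj = np.zeros(n + 1); obj[-1] = -1.0              # minimise -lambda
    rows = [np.append(-c, c_up - c_low)]
    rhs = [-c_low]
    for j in range(m):
        rows.append(np.append(A[j], p[j])); rhs.append(b[j] + p[j])
    res = linprog(obj, A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(0, None)] * n + [(0, 1)], method="highs")
    return res.x[:n], res.x[n]                         # decision vector x and lambda
```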
4. Numerical Example
In this section, a numerical example is solved to demonstrate the usability of the model and the approach. A supply chain consisting of 3 suppliers, 3 production plants and 4 marketplaces is to be optimized with the parameters shown in Table 1.

Table 1: Parameters of the numerical example

Supplier capacities:        supplier 1 = 81,  supplier 2 = 79,  supplier 3 = 97
Plant capacities:           plant 1 = 95,     plant 2 = 55,     plant 3 = 60
Safety stocks in markets:   market 1 = 32,    market 2 = 40,    market 3 = 59,   market 4 = 27
Demands in markets:         market 1 = 8,     market 2 = 12,    market 3 = 11,   market 4 = 8

Supplier-to-plant transportation costs (c_ik):
  supplier 1:  plant 1 = 2.10,  plant 2 = 3.70,  plant 3 = 2.30
  supplier 2:  plant 1 = 1.90,  plant 2 = 3.80,  plant 3 = 2.10
  supplier 3:  plant 1 = 2.80,  plant 2 = 3.10,  plant 3 = 2.75
Production costs (c_k):  plant 1 = 1.25,  plant 2 = 1.98,  plant 3 = 1.65
Plant-to-market transportation costs (c_kj):
  plant 1:  market 1 = 3.45,  market 2 = 3.20,  market 3 = 3.40,  market 4 = 2.25
  plant 2:  market 1 = 2.35,  market 2 = 2.00,  market 3 = 4.50,  market 4 = 2.60
  plant 3:  market 1 = 3.80,  market 2 = 4.05,  market 3 = 3.65,  market 4 = 1.95
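The crisp model of Section 2 can be set up directly from the data in Table 1. The sketch below does this with scipy.optimize.linprog and assumes a 1:1 raw-material-to-product ratio, which the paper does not state explicitly; it should reproduce an optimal total cost close to the value reported in Table 2.

```python
# Crisp logistic model of Section 2 with the data of Table 1.
# Variable order: x_ik (supplier-to-plant, 3x3), x_k (production, 3),
# x_kj (plant-to-market, 3x4).  A 1:1 raw-material-to-product ratio is assumed.
import numpy as np
from scipy.optimize import linprog

c_ik = np.array([[2.10, 3.70, 2.30],
                 [1.90, 3.80, 2.10],
                 [2.80, 3.10, 2.75]])       # supplier i -> plant k
c_k = np.array([1.25, 1.98, 1.65])          # production cost at plant k
c_kj = np.array([[3.45, 3.20, 3.40, 2.25],
                 [2.35, 2.00, 4.50, 2.60],
                 [3.80, 4.05, 3.65, 1.95]]) # plant k -> market j
S = np.array([81, 79, 97])                  # supplier capacities
P = np.array([95, 55, 60])                  # plant capacities
g = np.array([32, 40, 59, 27])              # safety stocks in markets
D = np.array([8, 12, 11, 8])                # demands in markets

nI, nK, nJ = 3, 3, 4
n = nI * nK + nK + nK * nJ                  # 24 decision variables
ik = lambda i, k: i * nK + k
kk = lambda k: nI * nK + k
kj = lambda k, j: nI * nK + nK + k * nJ + j
cost = np.concatenate([c_ik.ravel(), c_k, c_kj.ravel()])

A_ub, b_ub = [], []
def add(indices, coeffs, rhs):
    row = np.zeros(n); row[indices] = coeffs
    A_ub.append(row); b_ub.append(rhs)

for i in range(nI):                         # supplier capacity
    add([ik(i, k) for k in range(nK)], 1.0, S[i])
for k in range(nK):                         # plant capacity
    add([kk(k)], 1.0, P[k])
for k in range(nK):                         # production <= raw material received
    add([kk(k)] + [ik(i, k) for i in range(nI)], [1.0] + [-1.0] * nI, 0.0)
for k in range(nK):                         # shipments <= production
    add([kj(k, j) for j in range(nJ)] + [kk(k)], [1.0] * nJ + [-1.0], 0.0)
for j in range(nJ):                         # demand plus safety stock must be covered
    add([kj(k, j) for k in range(nK)], -1.0, -(D[j] + g[j]))

res = linprog(cost, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              bounds=[(0, None)] * n, method="highs")
print(round(res.fun, 3))                    # expected to be close to 1315.45
```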
First, the model is solved with the crisp LP for the strictest case, where all capacity and demand constraints are set at their lowest levels. The solution obtained, shown in Table 2, will be used as a basis for the fuzzy LP by setting the c and c̄ values.

Table 2: Crisp LP solution

Supplier-to-plant shipments (x_ik):
  supplier 1:  plant 1 = 16,  plant 2 = 0,   plant 3 = 47
  supplier 2:  plant 1 = 79,  plant 2 = 0,   plant 3 = 0
  supplier 3:  plant 1 = 0,   plant 2 = 55,  plant 3 = 0
Production quantities (x_k):  plant 1 = 95,  plant 2 = 55,  plant 3 = 47
Plant-to-market shipments (x_kj):
  plant 1:  market 1 = 37,  market 2 = 0,   market 3 = 58,  market 4 = 0
  plant 2:  market 1 = 3,   market 2 = 52,  market 3 = 0,   market 4 = 0
  plant 3:  market 1 = 0,   market 2 = 0,   market 3 = 12,  market 4 = 35
Total cost = 1315.450
Then the same problem is solved for the case where the capacity and demand constraints are fuzzy. In this case, the suppliers' and plants' capacities can be increased by up to 9% to compensate for a 12% increase in demand at all markets, without increasing the total cost by more than 15%. For this fuzzy case, λ is obtained as 0.557 and the solution is given in Table 3.

Table 3: Fuzzy LP solution

Supplier-to-plant shipments (x_ik):
  supplier 1:  plant 1 = 69,  plant 2 = 0,   plant 3 = 0
  supplier 2:  plant 1 = 30,  plant 2 = 0,   plant 3 = 52
  supplier 3:  plant 1 = 0,   plant 2 = 57,  plant 3 = 0
Production quantities (x_k):  plant 1 = 100,  plant 2 = 58,  plant 3 = 52
Plant-to-market shipments (x_kj):
  plant 1:  market 1 = 40,  market 2 = 0,   market 3 = 60,  market 4 = 0
  plant 2:  market 1 = 3,   market 2 = 55,  market 3 = 0,   market 4 = 0
  plant 3:  market 1 = 0,   market 2 = 0,   market 3 = 15,  market 4 = 37
Total cost = 1425.476

5. Conclusion
In this paper, a fuzzy approach to the supply chain problem is developed using fuzzy linear programming. It is proposed first to solve the crisp optimization problem for the strictest case to obtain a basis for setting c and c̄, and then to solve the problem for the fuzzy case, in which the tolerance values are determined by the subjective assessments of the decision makers. The usability of the proposed technique is shown via a numerical example in which the suppliers' capacities, the plants' capacities, and the market demands are fuzzy. Without violating the cost constraint by more than a decision-maker-specified increase, the optimum values of the decision variables are obtained by the proposed method for a supply chain actually operating in a fuzzy environment. The model therefore seems quite useful for supply chain practitioners in the business world.
References
1. Bellman, R.E. and Zadeh, L.A., Decision-making in a fuzzy environment, Management Science, 17, B141-B164, 1970.
2. Geoffrion, A.M. and Graves, G.W., Multicommodity distribution system design by Benders decomposition, Management Science, 20, 822-844, 1974.
3. Jaroslav, R., Soft Computing: Overview and Recent Developments in Fuzzy Optimization, Ostravska univerzita Press, Listopad, 151-176, 2001.
4. Johnson, M.E. and Pyke, D.F., Supply Chain Management, Dartmouth College Press, Hanover, 2-11, 1999.
5. Kumar, M., Vrat, P. and Shankar, R., A fuzzy programming approach for vendor selection problem in a supply chain, International Journal of Production Economics, 101, 273-285, 2006.
6. Goetschalckx, M. et al., A stochastic programming approach for supply chain network design under uncertainty, Alaaddin Workshop, March, Pittsburgh, 2003.
7. Yu, C.S. and Li, H.L., A robust optimization model for stochastic logistic problems, International Journal of Production Economics, 64, 385-397, 2000.
8. Zadeh, L.A., Fuzzy sets, Information and Control, 8, 338-353, 1965.
A FUZZY MULTI-OBJECTIVE EVALUATION MODEL IN SUPPLY CHAIN MANAGEMENT* XIAOBEI LIANG* Shanghai Business School, Shanghai, 200235, China; School of Management, Fudan University, Shanghai, 200433, China XINOHUA LIU School of Business Administration, South China University of Technology, Guangzhou, 510641, China; School of Information Management, Shandong Economic University, Jinan, 250014, China DAOLI ZHU School of Management, Fudan University, Shanghai, 200433, China BINGYONG TANG, HONGWU ZHUANG Glorious Sun School of Business and Management, Dong Hua University, Shanghai, 200051, China Nowadays the selection of suppliers in supply chain management (SCM) is becoming more and more important, and the problem of which evaluation method to use in this area must be addressed. This paper proposes a worst-point-based multi-objective supplier evaluation method. The main idea is as follows: first, n different supplier evaluation criteria are chosen to form an n-dimensional space, in which every supplier's information is a point; then the Euclidean distances between these points are calculated, and the suppliers are ranked by their distance from the best or the worst point.
1. Introduction
Under the supply chain environment, the competition among companies develops from individual firms to whole chains. Within the supply chain, the relationship between individual firms develops into cooperation. The characteristic of this kind of cooperation is that an enterprise strengthens its ability to promote its key business by outsourcing non-core business. When it
This work is supported by grant 05ZRI4091 of the Shanghai Natural Science Foundation. Corresponding author. Tel.: 0086-21-62373937; Fax: 0086-21-62708696. E-mail address: [email protected] (X.B. Liang)
comes to a specific relationship between suppliers and manufacturers, a win-win status can be reached on this cooperative foundation. This kind of cooperation will be further developed in the direction of a cooperative strategy in due course. The supplier's conduct is closely related to the manufacturer's profit. Taking Honda as an example, about 80% of an automobile's cost at Honda is spent on buying parts from suppliers, and the purchase volume from suppliers reaches 6 billion dollars every year; that is to say, 13,000 staff account for only 20% of the cost of all vehicles in the company. From this point of view, Honda implements an important purchasing and management action, the "best partner" project. The development of supplier cooperation is very important to a manufacturer; it is embodied mainly in raising product quality, improving stock levels, shortening lead times, reacting quickly to the market, and better product design [1-7]. Supplier appraisal is one of the basic decisions for a company, and the selection process always involves many criteria. Dickson first studied appraisal criteria systematically, enumerating at least 50 criteria in a single document. Weber (1991) summarized 23 criteria in ref. 8; these 23 criteria are included within Dickson's. Because of the many criteria involved, the essence of supplier appraisal is multi-objective, but there is not much literature on supplier appraisal by multi-objective programming. Satisfying all criteria often produces conflicts between the goals; Weber et al. proposed using multi-objective programming to solve the problem in 1993 [11], and at the same time discussed the balance between different criteria [9-11].

2. A Fuzzy Multi-objective Evaluation Model of Suppliers
The fundamental idea of the model is: first, decide on the evaluation criteria for the suppliers and use them to constitute a set of vectors; second, construct an n-dimensional space in which each supplier is represented by a point, and determine the optimal or the worst supplier; finally, compute the distance of each supplier to the optimal or the worst supplier on the basis of the Euclidean distance, and sort the suppliers according to the value of these distances. The explanations of the variables and symbols are as follows:
X = (x1, x2, ..., xn): the vector of suppliers to be evaluated;
f = (f1, f2, ..., fm): the vector of supplier evaluation criteria;
f(X) = (f1(X), f2(X), ..., fm(X)): the set of supplier evaluation targets.
Suppose the preferences of the manufacturers are completely known; they can be expressed through the target weights ω. Generally speaking, the following models exist.
Model 1:  min z = ω f(X)

According to the literature, some supplier evaluation criteria are "the larger the better", such as on-time delivery rate and quality, while others are "the smaller the better", such as price and return rate. Developing Model 1 accordingly gives Model 2. Assume f⁻ = (f1(X), f2(X), ..., fk(X)) represents the k "smaller the better" criteria, and f⁺ = (f_{k+1}(X), f_{k+2}(X), ..., fm(X)) represents the m-k "larger the better" criteria. Then:

Model 2:  min f⁻(X) = (f1(X), f2(X), ..., fk(X));  max f⁺(X) = (f_{k+1}(X), f_{k+2}(X), ..., fm(X))

In multi-objective decision making, weights are usually employed to convert the multi-objective problem into a single-objective one. This paper follows that approach and uses the relative inferior subordination degree together with a quadratic norm to convert Model 2 into a single-objective problem. The relative inferior subordination degree a_ij = v_i(x_j), i = 1, 2, ..., m, x_j ∈ X, represents the degree of quality of supplier j on criterion i; replacing the raw target values by the relative subordination degree matrix as the decision matrix reduces the effect of differing scales on the decision results:

a_ij = f_i(x_j) / f_i^max,   i = 1, 2, ..., k,        x_j ∈ X
a_ij = f_i^min / f_i(x_j),   i = k+1, k+2, ..., m,    x_j ∈ X

where f_i^max = max{f_i(x1), f_i(x2), ..., f_i(xn)}, i = 1, ..., k, and f_i^min = min{f_i(x1), f_i(x2), ..., f_i(xn)}, i = k+1, k+2, ..., m.
ij mxn
=<Ji.{x.))
i j mxn
=(^.<-
Assuming s(x ) = (co(\-a ),a> (\-a j 1 ly 2 2j
-
2 ),
A
)
n
r\(n-r}]
a (\-a )) m mj
Setting the quadratic normal form of s(x ) to be J 2 m 2 ,. ,2
II '(*,> = z«.(i-«..r) ' \i ||2
Letting d(x ) = * III sO ) , its geometric meaning is the Euclidean J V J H distance of supplier j to the worst supplier. The greater d(x ) is, the most J excellent integrated index. And then one could get:
r^
Model 3:  max d(x_j) = ||s(x_j)||

According to existing theory, the optimal solution of Model 3 is an efficient solution of Model 2; thus the multi-objective problem is changed into a single-objective one.

3. An Example for the Fuzzy Multi-objective Evaluation Model of Suppliers
The key step before applying the supplier evaluation model is to decide on the evaluation criteria. Referring to the research results of Dickson and Weber, and after consulting a few professionals in the relevant field, this model takes preset time (T), procurement cost (C), quality (Q), after-sales service (S) and supply ability (F) as the critical evaluation factors. Since procurement cost is hard to calculate, the procurement price is used here instead; if the company has a sound cost accounting system, it is better to adopt the procurement cost. The supply ability can be computed from the average annual capacity. Quality and after-sales service are targets that are difficult to quantify, so the datum correlation method is employed to obtain them. The supplier set is X = {x1, x2, x3}, and the target weights are

ω = [0.5128, 0.2615, 0.1289, 0.0634, 0.0333],

where f1(X) is the value of the preset-time target, f2(X) the procurement-cost target, f3(X) the quality target, f4(X) the after-sales-service target, and f5(X) the supply-ability target.
Table 1: Target values of the suppliers (preset time T, after-sales service S, procurement cost C, quality Q, supply ability F)

             T    S     C     Q     F
Supplier 1   3    90    100   80    1200
Supplier 2   4    100   120   90    800
Supplier 3   6    80    150   100   1100

Building the model:

min f⁻(X) = (f1(X), f2(X));  max f⁺(X) = (f3(X), f4(X), f5(X))

On the basis of the most inferior principle, the worst point is

(f1, f2, f3, f4, f5) = (6, 150, 80, 80, 800).
The relative subordination degree matrix (rows: criteria f1-f5 = T, C, Q, S, F; columns: suppliers 1-3) is

A = | 0.5    0.67   1    |
    | 0.67   0.8    1    |
    | 1      0.89   0.8  |
    | 0.89   0.8    1    |
    | 0.67   1      0.73 |

Computing the weighted Euclidean distances to the worst point gives the evaluation results in Table 2.

Table 2: Evaluation results of the suppliers on the basis of the most inferior subordination degree

Supplier 1: 0.0733     Supplier 2: 0.0339     Supplier 3: 0.0006
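The worked example can be reproduced with a short script (a sketch; small differences from Table 2 can arise from the rounding of the subordination degrees used in the paper).

```python
# Worst-point supplier evaluation of Section 2 applied to the data of Table 1.
import numpy as np

# Rows: suppliers 1-3; columns ordered as f1..f5 = T, C, Q, S, F
F = np.array([[3.0, 100.0,  80.0,  90.0, 1200.0],
              [4.0, 120.0,  90.0, 100.0,  800.0],
              [6.0, 150.0, 100.0,  80.0, 1100.0]])
w = np.array([0.5128, 0.2615, 0.1289, 0.0634, 0.0333])
smaller_better = [0, 1]          # preset time, procurement cost
larger_better = [2, 3, 4]        # quality, after-sales service, supply ability

A = np.zeros_like(F)
A[:, smaller_better] = F[:, smaller_better] / F[:, smaller_better].max(axis=0)
A[:, larger_better] = F[:, larger_better].min(axis=0) / F[:, larger_better]

# Squared weighted Euclidean distance of each supplier from the worst point
d2 = ((w * (1.0 - A)) ** 2).sum(axis=1)
print(np.round(d2, 4))   # approx. [0.0735, 0.0323, 0.0007]; cf. Table 2
```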
From Table 2 one can see that the optimal supplier is Supplier 1: Supplier 1 is better than Supplier 2, and Supplier 2 is better than Supplier 3.

4. Conclusions
The appraisal of suppliers serves not only to choose promising suppliers with whom to set up strategic partnerships, but also to assess the performance of suppliers with which the company already has business contact. It thus offers guidance for setting up effective incentive mechanisms for suppliers. The appraisal method based on the most unsatisfactory (worst) point presented in this paper is easy to apply, and it may play a practical role in supply chain management.
References
1. Gregory, A. The Road to Integration: Reflections on the Development of Organizational Evaluation: Theory and Practice. Omega, 1996, 24(3): 295-307.
2. Wang, F.K., Hubele, N.F., Lawrence, F.P. Comparison of Three Multivariate Process Capability Indices. Journal of Quality Technology, 2000, 32(3): 263-275.
3. Stoumbos, Z.G. Process capability indices: overview and extensions. Nonlinear Analysis: Real World Applications, 2002, 3(2): 191-210.
4. Quigley, J.M. A Simple Hybrid Model for Estimating Real Estate Price Indexes. Journal of Housing Economics, 1995, (4): 1-12.
5. Eklof, J.A., Westlund, A.H. Customer Satisfaction Index and Its Role in Quality Management. Total Quality Management, 1998, 9(4): 80-86.
6. Hall, M.-J. The American Customer Satisfaction Index. Public Manager, 2002, 31(1): 23-27.
7. Lall, S. Competitiveness Indices and Developing Countries: An Economic Evaluation of the Global Competitiveness Report. World Development, 2001, 29(9): 1501-1525.
8. Weber, C.A., Current, J.R., Desai, A. Non-cooperative negotiation strategies for vendor selection. European Journal of Operational Research, 1991, (108): 208-223.
9. Lin, H.-F., Lee, G.-G. Impact of organizational learning and knowledge management factors on e-business adoption. Management Decision, 2005, 43(2): 171-188.
10. Sameer, K., Thomas, R., Mark, A. Systems thinking, a consilience of values and logic. Human Systems Management, 2005, 24(4): 259-274.
11. Weber, C.A., Current, J.R. A multi-objective approach to vendor selection. European Journal of Operational Research, 1993, (68): 173-184.
EVALUATING RADIO FREQUENCY IDENTIFICATION INVESTMENTS USING FUZZY COGNITIVE MAPS ALP USTUNDAG Department of Industrial Engineering, Istanbul Technical University, Macka 80680, Istanbul, Turkey MEHMET TANYAS Department of Industrial Engineering, Istanbul Technical University, Macka 80680, Istanbul, Turkey RFID (Radio Frequency Identification) is an Auto-ID technology which uses radio waves to automatically identify individual items. Using RFID systems for identifying and tracking objects, it is possible to improve the performance of a supply chain process in terms of operational efficiency, accuracy and security. RFID systems can be implemented at different levels, such as item, case or pallet level, and these various applications create different impacts on supply chain processes. RFID investments are very important strategic decisions, so they require a comprehensive evaluation process in which the tangible and intangible benefits are integrated. Fuzzy cognitive mapping (FCM) is a suitable tool for modeling causal relations in a non-hierarchical manner for RFID investment evaluation. In this paper, the FCM method is used to measure the impact of an RFID investment on a supply chain process.
1. Introduction
Radio Frequency Identification (RFID) is an Auto-ID technology consisting of a microchip with a coiled antenna and a reader. Using radio-frequency waves, data and energy are transferred between a reader and a tag to identify, categorize and track objects. The RFID system consists of tags, readers and a computing infrastructure for storing and analyzing the data received from the reader. Supply chain processes gain many benefits from RFID technology. According to a study by A.T. Kearney, retailers expect benefits in three primary areas: reduced inventory, store/warehouse labor reduction and reduction in out-of-stock items [1]. Using RFID technology, companies can achieve better order fill rates, shorter order lead times, less inventory shrinkage and improved customer service. RFID provides efficiency, accuracy and security in the supply chain.
Corresponding author: Tel.: +0 (212) 293 13 00-2759, Email: [email protected]
So far, in many studies, the benefits of RFID have been explained qualitatively, but there is a lack of quantitative evaluation models of this technology. In this paper, using the fuzzy cognitive mapping approach, we try to build a model which quantifies the impact of RFID technology on a supply chain process.

2. Literature Overview
2.1. RFID Technology
Most published research papers about RFID focus on explaining what the technology is and what it is not. Wu et al. [2] examine the existing challenges that RFID technology is facing, its future development directions and the likely migration paths to realize its promises. In McFarlane's paper on "The Intelligent Product in Manufacturing Control" [3], the basic concepts of RFID and its implications for discrete event control are examined. Sheffi [4] speculates on the possible future adoption of RFID technology considering the innovation cycles of several technologies. Karkkainen [5] discusses the potential of utilizing RFID technology for increasing efficiency in the supply chain of short-shelf-life products. RFID-based system designs for specific applications are also examined in some research papers. Chow et al. [6] propose an RFID-based warehouse resource management system. Ngai et al. [7] propose a system architecture capable of integrating mobile commerce and RFID applications in a container depot. Ni et al. [8] present a location-sensing prototype system that uses RFID technology for locating objects inside buildings. Goodrum et al. [9] propose an RFID-based tool tracking and inventory system capable of storing operation and maintenance (O&M) data on construction job sites. An RFID system implementation can be seen as an IT/IS investment, and there are many studies about IT/IS investment evaluation in the literature. However, it is really difficult to calculate the returns of an RFID system deployment. Patil [10] argues that Discounted Cash Flow (DCF) and Net Present Value (NPV) calculations are too limited a basis on which to make RFID investment decisions because they undervalue returns and focus management attention on short-term cash flow. In our study, we propose a fuzzy cognitive mapping approach to evaluate the impact of RFID on business processes.
2.2. Fuzzy Cognitive Maps (FCM)
A fuzzy cognitive map (FCM) is a method for drawing a graphical representation of a dynamical system, connecting the state variables of the system by links that symbolize cause-and-effect relations. According to Kosko [11], an FCM ties facts and things and processes to values and policies and objectives. An FCM is a non-hierarchic flow graph in which changes to each statement (concept, node) are governed by a series of causal increases and decreases in fuzzy weight values [12]. Given an FCM with a number of concepts C_i, i = 1, ..., n, the value of each concept can be calculated using the following equation:

C_i^{t+1} = f( Σ_{j=1, j≠i}^{n} w_{ji} C_j^{t} + C_i^{t-1} )          (1)
where C_i^{t+1} is the value of concept C_i at step t+1, C_i^{t-1} is its value at step t-1, f(x) is a threshold function, and w_ji is the weighted link from concept C_j to C_i. The threshold function f(x) can be hyperbolic (tanh x) or sigmoid (f(x) = 1/(1 + e^{-x})). If concepts can be negative and their values belong to the interval [-1, 1], the hyperbolic function is used; if the concept value interval is [0, 1], the sigmoid function is used. The initial row vector can be written with the notation {C1, C2, C3, ..., Cn} for n concepts, and the weights of the edges can be written in an n×n matrix W, where each element w_ij gives the weight of the edge from concept C_i to C_j. Iterating the FCM, we either approach a limit cycle or a fixed point. In the literature, several extensions to the FCM method have been proposed, such as rule-based fuzzy cognitive maps, extended FCMs and evolutionary FCMs. FCM models have been used in numerous areas of application such as medicine, political science, international relations, military science, supervisory systems, etc.

3. Cost and Benefits of RFID Deployment
Investment in RFID is of a strategic nature since it is clearly linked to business strategy in the following ways [10]:
• by process innovation;
• by the importance of its adoption in business process reengineering;
• by enabling IT capabilities.
Because the decision to deploy RFID technology in an enterprise is a strategic business decision, not a technology decision, cost-benefit analysis is a key component of this decision. To measure the value of an RFID investment, we have to understand the elements of cost as well as the business- and customer-related benefits comprehensively. The cost of an RFID deployment can be examined in three key areas: hardware, software and services. Hardware costs include the cost of tags, readers, antennas, host computers and network equipment. Software costs include the cost of creating or upgrading middleware and other applications. Service costs include the cost of installation, integration of the various components, training, support and maintenance, and business process engineering. RFID benefits can be broken down into two parts: the first is cost reduction (labor cost reduction, inventory cost reduction, process automation and efficiency improvements), and the second is value creation (e.g. increases in revenue, increases in customer satisfaction due to responsiveness, anti-counterfeiting, etc.) [2]. To identify and track products, RFID systems can be deployed at the item, case or pallet level. RFID can also be used to track and identify capital assets like forklifts, cranes or racks. The various options for RFID deployment create different levels of efficiency, accuracy or security in business processes. As an example, in a distribution center, receiving operations can benefit greatly from RFID. The labor saved is directly related to the type of receiving performed. Since case-level receiving requires more scans, it should benefit more from RFID than pallet-level operations. While receiving and shipping are common areas of interest for RFID deployment, an entirely scan-free operation is the ultimate goal from an efficiency perspective. From the perspective of accuracy, RFID has the ability to provide an inventory tracking mechanism that is not dependent on human-initiated scans. Outbound load confirmation is also a good example of how RFID can improve accuracy. An RFID alternative that reads all case tags on a pallet as it moves through an outbound door would eliminate both the need for the palletizing scan and the error that would occur if the scan did not take place. Security is also an important performance measure in a supply chain process. Since RFID can passively track the movement of an individual object, it can be used in a similar manner to Sensormatic and other loss-prevention technologies to help reduce theft.
4. Fuzzy Cognitive Map to Evaluate the RFID Investment
In our study, we propose a model to examine the impact of an RFID investment in a distribution center. After the RFID investment, the company will have labor and inventory cost reductions, and customer demand (sales) will increase due to the increase in customer satisfaction (Table 1).

Table 1. Costs and benefits of an RFID investment
Investment costs: hardware, software, service
Benefits (+): inventory cost reduction, labor cost reduction, sales increase
Due to automation, labor reductions will be seen in the following areas:
• clerical employees: daily data entry hours spent on transactions and operations;
• material handlers: personnel involved in picking, receiving, put-away, shipping and cycle counting;
• customer service: total personnel involved in customer service activities associated with distribution operations.
Labor will also be reduced due to the decrease in inventory level and in shipping/data entry errors. Using the RFID system, the accuracy and security levels will increase; these can be measured by receiving, shipping and cycle-counting errors, the stock-out ratio and the theft ratio. Customer satisfaction, which can be measured by the number of customer complaints, will increase due to the decrease in order delivery time, and increased satisfaction will trigger customer demand. In our FCM model, seven concepts are determined (Table 2).

Table 2. Description of the concepts in the model
C1: Labor
C2: Accuracy
C3: Security
C4: Inventory Level
C5: Cost
C6: Customer Satisfaction
C7: Customer Demand
The different degrees of influence are shown in Table 3. In the weight matrix (Table 4), the weights between any two concepts are given.

Table 3. Degrees of influence
Negatively very high: -1     Negatively high: -0.8     Negatively medium: -0.6
Negatively low: -0.4         Negatively very low: -0.2  Zero: 0
Positively very low: 0.2     Positively low: 0.4        Positively medium: 0.6
Positively high: 0.8         Positively very high: 1
Table 4. The weight (influence) matrix (entry w_ij is the influence of the row concept C_i on the column concept C_j)

        C1      C2     C3     C4      C5      C6     C7
C1      0.00    0.00   0.00   0.00    0.20    0.00   0.00
C2     -0.40    0.00   0.00  -0.20    0.00    0.20   0.00
C3      0.00    0.00   0.00   0.00   -0.20    0.20   0.00
C4      0.20    0.00   0.00   0.00    0.40    0.00   0.00
C5      0.00    0.00   0.00   0.00    0.00    0.00   0.00
C6      0.00    0.00   0.00   0.00    0.00    0.00   0.80
C7      0.00    0.00   0.00   0.00    0.00    0.00   0.00
The graphical representation of the dynamical system can be seen in figure 1.
Figure 1. Graphical representation of the model (concepts C1-C7 and their causal links)
The initial labor, accuracy and security levels should be determined according to the characteristics of the RFID investment (case or pallet level, etc.) considered by the managers. The expected initial labor level can be determined by considering the workforce (man-hours) that the company no longer needs due to automation. The initial accuracy level can be assessed by considering the expected error reductions, and for the security level the expected reduction in the theft ratio should be taken into account. The model then shows how the labor and inventory levels change under the influence of the accuracy and security values, so the changes in the cost level and in customer demand can be examined. In our example, the initial vector is assumed to be [0.6, 0.8, 0.8, 0, 0, 0, 0]. Using Eq. (1), the concept values [-0.76, 0.3, 0.3, -0.54, -0.95, 0.66, 0.66] are obtained after 14 iterations. The hyperbolic function f(x) = tanh x is used as the threshold function. In our study, as seen from Figure 2, each concept value reaches an equilibrium state. The labor and inventory values decrease for the given accuracy and security levels, and because of the decrease in the labor and inventory levels the cost level also decreases, reaching -0.95 at equilibrium. It is notable that the customer demand value increases from 0 to 0.66.
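The iteration of Eq. (1) can be sketched as below. The weight matrix follows the reconstruction in Table 4, the memory term C^{t-1} is initialised with the initial vector, and since such details are not fully specified the equilibrium reached may differ somewhat from the values quoted above.

```python
# FCM iteration of Eq. (1) with the hyperbolic tangent threshold function.
import numpy as np

concepts = ["Labor", "Accuracy", "Security", "Inventory", "Cost",
            "Satisfaction", "Demand"]
# W[i, j] = influence of concept C_{i+1} on concept C_{j+1} (Table 4)
W = np.zeros((7, 7))
W[1, 0] = -0.40   # accuracy     -> labor
W[3, 0] =  0.20   # inventory    -> labor
W[1, 3] = -0.20   # accuracy     -> inventory
W[0, 4] =  0.20   # labor        -> cost
W[2, 4] = -0.20   # security     -> cost
W[3, 4] =  0.40   # inventory    -> cost
W[1, 5] =  0.20   # accuracy     -> satisfaction
W[2, 5] =  0.20   # security     -> satisfaction
W[5, 6] =  0.80   # satisfaction -> demand

c_prev = c_curr = np.array([0.6, 0.8, 0.8, 0.0, 0.0, 0.0, 0.0])
for _ in range(14):
    c_next = np.tanh(W.T @ c_curr + c_prev)   # Eq. (1) with memory term C^{t-1}
    c_prev, c_curr = c_curr, c_next
print(dict(zip(concepts, np.round(c_curr, 2))))  # compare with the reported equilibrium
```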
Figure 2. Equilibrium of concept values over the iterations (curves shown for Labor, Inventory Level, Cost, Customer Satisfaction and Customer Demand)
5. Conclusion
In this study, we presented an FCM-based evaluation tool for RFID investments. The model helps decision makers to understand the complex relationships among the cost and benefit factors of an RFID investment. It can also be used to perform what-if analyses regarding the results. For example, we can use the model to answer the following questions:
• What is the impact of a sudden increase in the accuracy level on sales and inventory?
• What is the impact of a decreasing inventory level on the labor level?
In further research, a decision support system can be developed which helps managers to decide about RFID investment in a specific business process.

References
1. Heinrich, C., RFID and Beyond, Wiley Publishing, Indianapolis, 2005.
2. Wu, N.C., Nystrom, M.A., Lin, T.R., Yu, H.C., Challenges to global RFID adoption, Technovation, in press (available online 25 October 2005).
3. McFarlane, D., Sarma, S., Chirn, J.L., Wong, C.Y., Ashton, K., Auto-ID systems and intelligent manufacturing control, Engineering Applications of Artificial Intelligence, 16 (2003) 365-376.
4. Sheffi, Y., RFID and the innovation cycle, The International Journal of Logistics Management, 15 (2004) 1-10.
5. Karkkainen, M., Increasing efficiency in the supply chain for short shelf life goods using RFID tagging, International Journal of Retail & Distribution Management, 31 (2003) 529-536.
6. Chow, K.H., Choy, K.L., Lee, W.B., Lau, K.C., Design of a case based resource management system for warehouse operations, Expert Systems with Applications, in press (available online 6 September 2005).
7. Ngai, S.M.T., Cheng, T.C.E., Au, S., Lai, K., Mobile commerce integrated with RFID technology in a container depot, Decision Support Systems, in press (available online 16 June 2005).
8. Ni, L.M., Yunhao, L., Lau, Y.C., Abhishek, P.P., LANDMARC: Indoor Location Sensing Using Active RFID, Wireless Networks, 10 (2004) 701-710.
9. Goodrum, P.M., McLaren, M.A., Durfee, A., The application of active radio frequency identification technology for tool tracking on construction job sites, Automation in Construction, in press (available online 27 July 2005).
10. Patil, M., Investments in RFID: A Real Options Approach, published whitepaper, Patni Computer Systems Ltd, 2004.
11. Kosko, B., Fuzzy Thinking: The New Science of Fuzzy Logic, Flamingo Press/Harper-Collins, London, 1990.
12. Sharif, A.M., Irani, Z., Exploring Fuzzy Cognitive Mapping for IS Evaluation, European Journal of Operational Research, in press (available online 26 August 2005).
ANALYSING SUCCESS CRITERIA FOR ICT PROJECTS
KOEN MILIS EHSAL, European University College Brussels, Centre for external cooperation, Campus Economische Hogeschool, Stormstraat 2, 1000 Brussels
KOEN VANHOOF Transportation Research Institute (IMOB), Universiteit Hasselt, Wetenschapspark 5 bus 6, 3590 Diepenbeek Since the 1960s many authors have accepted the triple constraints (time, cost, specification) as a standard measure of success, and these still appear to be extremely important in evaluating the success of ICT (information and communication technology) projects. However, an ICT project cannot always be seen as a complete success or a complete failure. Moreover, the parties involved may perceive the terms "success" or "failure" differently. A quasi-experiment (gaming) was developed in order to determine the measures of success used by the different parties involved to judge an ICT project. The results of this quasi-experiment were analysed using aggregation theory and validated by probabilistic feature models. In general the figures do not contradict each other. This research indicates that the impact of the triple constraints on the judgement of success is rather small. Other criteria, such as user happiness and financial or commercial success, are far more important. Surprisingly, whether or not a project was able to meet the predefined specifications was of little importance for the appreciation of the project's success. Keywords: Multi-criteria analysis, gaming, project management and scheduling
1. Introduction
In order to lead an ICT project towards high levels of success, a manager should know the criteria by which success is measured (i.e. the success criteria); fulfilling these criteria should be the manager's prime concern. Since the 1960s many authors have accepted the triple constraints (time, cost, specification) as standard success criteria. It is assumed that if a project's completion time exceeds its due date, or expenses overrun the budget, or the outcomes do not satisfy the company's predetermined specifications, the project is a failure (Ingram, 2000; Wright, 1997; Turner, 1993). However, determining whether an ICT project is a success or a failure is far more complex (Belassi & Tukel, 1996). Unlike a construction project, an ICT
project cannot always be seen as completely successful or completely failed (Wateridge, 1998). Moreover, the different parties involved (e.g. management, project team, users, supporters, stakeholders) might perceive the project's success differently (Pinto & Slevin, 1989). Even among individuals of the same party opinions may vary, since every individual has his or her own set of criteria against which the project is measured, and these may be very subjective (Fowler & Walsh, 1999). Furthermore, not every criterion can be measured at the same time. Some criteria can only be assessed long after the termination of the project, such as the financial or commercial success of an ICT implementation (Wateridge, 1996). The aim of this research is to determine the sets of success criteria used by the different parties involved in an ICT project.

2. Research design
In contrast to most studies on the subject, a quantitative approach was selected. The data were gathered using a type of experiment referred to as gaming. The participants of the "game" were asked to rate the success of ICT projects based on information (i.e. project descriptions) provided by the researchers. Seven possible success criteria were selected based on a literature review (Milis & Mercken, 2001). The list of criteria consisted of the triple constraints extended with four other criteria: on time, within budget, to specification, user happiness, project team happiness, management happiness, and financial or commercial success. The selected experts were all well acquainted with ICT projects and were either employees of one of the two large electricity-distributing companies that participated or consultants working for these companies. Based on the roles the different experts fulfilled, they could be classified into four groups: managers, project team members (benefactors), project team members (no benefactors), and end-users. During five consecutive days, the experts received an email with five project descriptions and were asked to judge the projects' success based solely on the information provided. They were asked to reply by email within 24 hours (i.e. before the next set of descriptions arrived) to avoid comparison between answers.
Note that due to the absence of a "control group" and a "calibration measurement", this research approach cannot be classified as an experiment and should therefore be regarded as a quasi-experiment.
They were asked to state whether the project was a success or a failure and to rate its success on a scale from 1 to 100. This resulted in a dataset with 650 binary data points (success or failure) and a dataset with 650 scores.

3. Data Analysis Method
The data are analysed using a technique proposed by Vanhoof et al. (2005) in a customer satisfaction study. The technique evaluates the contribution of the success criteria in a two-stage evaluation process. First the evaluation process is modelled; then, in the second stage, the model is used to determine and quantify the contributions of the criteria.

3.1. Aggregation theory: uninorms
Aggregation operators serve as a tool for combining various scores into one numerical value. An important class of aggregators, called representable uninorms, possess additive generators g : [0,1] -> [-inf, +inf] which define the uninorm via

U(x, y) = g⁻¹(g(x) + g(y))                                                                   (1)

Dombi (1982) showed that if g(x) is the generator function of the uninorm operator, then the displaced function g_a(x) = g(x + a) also possesses the properties of a generator function. The neutral value e then varies accordingly, which allows the formation of uninorm operators with different neutral values from one generator function. The generator function used contains one parameter whose value needs to be determined from the data. Consequently, for every expert evaluation the neutral value can be determined and the individual evaluation function, which is a uninorm, can be constructed. This approach has the advantage of a higher sensitivity to differences between experts.

3.2. Calculating contributions of criteria, based on the full set of project evaluations
The contribution of criterion x_j for expert i can be defined by the following difference:

Contrib(i, j) = E_i(x_1, ..., x_n) - E_i(x_1, ..., x_{j-1}, e_i, x_{j+1}, ..., x_n)          (2)
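A sketch of a representable uninorm with a chosen neutral element, and of the contribution measure of Eq. (2), is shown below; the logistic generator used here is only one possible choice and is not necessarily the one fitted in the study.

```python
# Representable uninorm built from an additive generator, and the
# contribution of a criterion as in Eq. (2).
import math

def make_uninorm(e):
    """Uninorm with neutral element e, generated by g(x) = ln(x/(1-x)) shifted so g(e) = 0."""
    shift = math.log(e / (1.0 - e))
    g = lambda x: math.log(x / (1.0 - x)) - shift
    g_inv = lambda y: 1.0 / (1.0 + math.exp(-(y + shift)))
    def U(scores):
        total = sum(g(min(max(s, 1e-9), 1 - 1e-9)) for s in scores)
        return g_inv(total)
    return U

def contribution(U, scores, j, e):
    """Effect of replacing criterion j by the neutral score e (Eq. (2))."""
    neutral = list(scores); neutral[j] = e
    return U(scores) - U(neutral)

# Example: an expert with neutral value 0.55 rating a project on 7 criteria
U = make_uninorm(0.55)
scores = [0.8, 0.6, 0.4, 0.7, 0.5, 0.9, 0.3]
print([round(contribution(U, scores, j, 0.55), 3) for j in range(7)])
```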
with E_i the uninorm of expert i. In fact, the effect of replacing the criterion score by the neutral score is calculated. This effect can be positive or negative. As a consequence, the histogram of all the contributions for a certain criterion will be bimodal. This histogram is characterized by three numbers: the total average value (called mean), the average value of the positive contributions (called pos) and the average value of the negative contributions (called neg). The results of all evaluations are presented in Table 1.

Table 1: Overall results (per group: mean; neg, pos)

Criterion               Users                 Projectteam           Projectteam           Management
                                              (no benefactors)      (benefactors)
on time                 -0.3; -19.9, 16.5     1.6; -17.5, 16.1      1.8; -18.5, 17.4      0.9; -19.1, 16.6
within budget           -0.5; -18.5, 14.5     0.6; -17.1, 14.1      1.0; -16.9, 14.8      0.2; -16.9, 13.6
to specifications        3.3; -14.7, 15.9     4.6; -12.6, 15.5      4.8; -13.5, 16.7      3.6; -13.0, 14.9
management happiness    -2.6; -21.0, 17.5    -0.7; -18.0, 17.5     -1.0; -18.6, 17.7     -1.9; -19.5, 17.1
projectteam happiness    0.2; -17.2, 14.4     1.5; -15.8, 14.7      1.7; -15.9, 15.2      0.5; -16.3, 13.7
user happiness           0.5; -22.8, 18.7     2.6; -19.2, 20.2      1.8; -21.2, 19.9      0.7; -22.3, 18.7
fin/com success         -4.4; -22.6, 18.9    -2.6; -19.8, 18.1     -3.0; -20.2, 18.4     -3.0; -19.9, 18.4
In order to understand the mean values, the percentages of perceived successful and failing projects are given in Table 2.

Table 2: Global evaluations

Group                          Failing    Mean neg. scores    Successful    Mean pos. scores
Users                          38%        34.9                62%           62.3
Projectteam (no benefactors)   37%        32.8                72%           70.0
Projectteam (benefactors)      36%        36.1                70%           67.9
Management                     32%        30.7                68%           68.0
The mean value should be considered as a total impact measure. Table 1 indicates, for example, that for this set of project descriptions the criterion 'to specifications' has in general a positive impact for the users, while the criterion 'fin/com success' has in general a negative impact. The absolute values of the negative contributions are greater than the absolute values of the positive contributions, which indicates that, in absolute terms, the negative (penalty) effects exceed the positive (reward) effects: the reward for fulfilling a criterion is smaller than the punishment received for not fulfilling it. The span between the positive and negative contributions (= |pos - neg|) provides an insight into the impact of the different criteria on the judgement of the project. A large span implies that fulfilling a criterion contributes largely to the perception of success, while failing to fulfil the criterion contributes to a perceived failure. Consequently, the larger the span, the more impact the criterion has on the judgement of the project. Table 1 indicates, for example, that for the users the spans for the criteria "user happiness" (22.8 + 18.7 = 41.5) and "fin/com success" (22.6 + 18.9 = 41.5) are equal and larger than the spans of the other criteria. Consequently, the impact of both criteria on the judgement of a project is similar, and the large spans indicate that these are the most important criteria for the users. "User happiness", "fin/com success" and "management happiness" are the three most important criteria for all groups examined, although the proportions between the criteria differ. This signifies that the groups involved use similar sets of criteria, but the impact of every criterion in the set differs depending on the group examined. Note that the criteria "to specifications" and "project team happiness" have a low span. Consequently, these criteria can be regarded as of little importance to the judgement of ICT projects. 3.3. Comparing results Table 3 combines the results of a probability matrix decomposition model (Maris, De Boeck & Van Mechelen, 1996) with those of the aggregation model. For every party involved and for every criterion, the median of the PMD model and the positive contribution of the aggregated model are represented. The first indicates the probability that a criterion is perceived as necessary for success, the latter reflects the affirmation power of the criterion.
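As a small illustration of how the three summary numbers and the span could be derived from one criterion's list of per-expert contributions (the function and the closing comment values come from Table 1; nothing else is from the study's data):

```python
def summarise(contributions):
    """Summarise the bimodal histogram of one criterion's contributions:
    overall average ('mean'), average of the positive contributions ('pos'),
    average of the negative contributions ('neg') and the span |pos - neg|."""
    positives = [c for c in contributions if c > 0]
    negatives = [c for c in contributions if c < 0]
    mean = sum(contributions) / len(contributions)
    pos = sum(positives) / len(positives) if positives else 0.0
    neg = sum(negatives) / len(negatives) if negatives else 0.0
    return {"mean": mean, "pos": pos, "neg": neg, "span": abs(pos - neg)}

# e.g. the users' figures for 'user happiness' in Table 1 correspond to
# mean = 0.5, neg = -22.8, pos = 18.7, span = 41.5
```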
Table 3: Comparing the results of the PMD model and the aggregation models

                         Users            Project team -     Project team -     Management
                                          no benefactors     benefactors
Criterion                median  pos      median  pos        median  pos        median  pos
On time                   .45    16.5      .28    16.1        .32    17.4        .53    16.6
Within budget             .18    14.5      .25    14.1        .12    14.8        .06    13.6
To specifications         .06    15.9      .09    15.5        .12    16.7        .06    14.9
Management happiness      .17    17.5      .15    17.5        .12    17.7        .18    17.1
Project team happiness    .04    14.4      .12    14.7        .05    15.2        .06    13.7
User happiness            .24    18.7      .43    20.2        .42    19.9        .50    18.7
Fin/com success           .48    18.9      .28    18.1        .43    18.4        .41    18.4
In general the figures do not contradict each other. Both techniques indicate that "user happiness" and "fin/com success" are the two most important criteria. They have a high median in the PMD model and, at the same time, a large positive contribution, indicating that their impact on the judgement of the project is important. Similarly, the criteria "to specification" and "project team happiness" are the least important factors. The impact of the criteria "management happiness" and "on time" is less clear-cut: depending on the technique used, they take a slightly different place in the ranking of the criteria within the different groups.

4. Conclusions
Since none of the groups examined bases its judgement solely on the triple constraints, fulfilling them does not guarantee that the project is perceived as a success. Moreover, satisfying the predefined specifications appeared to have little impact on the judgement of a project. This clearly demonstrates that other sets of success criteria should be applied. The results of this research indicate that the criteria "on time", "user happiness" and "fin/com success" should be incorporated in any set of criteria developed to evaluate the success of ICT projects. This research confirms that user satisfaction is a prime criterion for the end users. They want to work with the best (not optimum) application. They should
be happy with the project's results. However, in contrast to the literature (see above), this is not their sole criterion: financial or commercial success equally influences their judgment. This indicates that, besides their personal desires, the corporate goals are a user's concern as well. The literature indicates that project team members focus on short-term operational criteria. This could only be confirmed partially: not exceeding the due date (criterion "on time") appeared to be a very important criterion for this group, while the other operational criteria such as "within budget" and "to specifications" have far less impact. Apparently, satisfying users and delivering fin/com success prevails over budgetary constraints and predefined specifications. Note that the emphasis on long-term gains (fin/com success) is more outspoken for the project team benefactors compared to the project team no benefactors, as could be expected based on the literature, since the involvement of the latter ends at the handover of the project. The management focuses on the long-term gains (financial or commercial success). Their company needs to make a profit and every project should contribute. However, the criteria "on time" and "user happiness" appear to be important as well. Possibly this is caused by the fact that the gains an ICT project generates are often not fully tangible.

References
1. Belassi, W., Tukel, O. I., 1996. A new framework for determining critical success/failure factors in projects, International Journal of Project Management, vol. 14, pp. 141-151.
2. Dombi, J., 1982. Basic concepts for the theory of evaluation: The aggregative operator, European Journal of Operational Research, vol. 10, pp. 282-293.
3. Fowler, A., Walsh, M., 1999. Conflicting perceptions of success in an information systems project, International Journal of Project Management, vol. 17, pp. 1-10.
4. Gelfand, A.E., Smith, A.F.M., 1990. Sampling based approaches to calculating marginal densities, Journal of the American Statistical Association, vol. 85.
5. Ingram, G., 2000. The way to enlightened project management, Project Manager Today.
6. Maris, E., De Boeck, P., Van Mechelen, I., 1996. Probability matrix decomposition models, Psychometrika, vol. 61, pp. 7-29.
7. Milis, K., Mercken, R., 2001. Implementing IS/IT technology: success factors, in Proceedings of the 13th International Society for Professional Innovation Management Conference.
8. Pinto, J.K., Slevin, D.P., 1989. Critical success factors in R&D projects, Research Technology Management.
9. Turner, J.R., 1993. The handbook of project-based management, McGraw-Hill.
10. Vanhoof, K., Pauwels, P., Dombi, J., Brijs, T., Wets, G., 2005. Penalty-reward analysis with uninorms: a study of customer (dis)satisfaction, in: Ruan, D., Chen, G., Kerre, E., Wets, G. (Eds.), Intelligent Data Mining. Techniques and Applications, pp. 237-252.
11. Wateridge, J., 1995. IT projects: a basis for success, International Journal of Project Management, vol. 13, pp. 169-172.
12. Wateridge, J., 1996. Delivering successful IS/IT projects: eight key elements from success criteria to review via appropriate management, methodologies and teams, PhD thesis, Henley Management College, Brunel University.
13. Wateridge, J., 1997. Training for IS/IT project managers: a way forward, International Journal of Project Management, vol. 15, pp. 283-288.
14. Wateridge, J., 1998. How can IS/IT projects be measured for success?, International Journal of Project Management, vol. 16, pp. 59-63.
15. Wright, J.N., 1997. Time and budget: the twin imperatives of a project sponsor, International Journal of Project Management, vol. 15, pp. 181-186.
MULTI-ATTRIBUTE COMPARISON OF ERGONOMICS MOBILE PHONE DESIGN BASED ON INFORMATION AXIOM GULCIN YUCEL Department of Industrial Engineering, Istanbul Technical University, Macka, Istanbul 34367, Turkey EMEL AKTAS Department of Industrial Engineering, Istanbul Technical University, Macka, Istanbul 34367, Turkey Axiomatic Design (AD) is a guide for understanding design problems, while establishing a scientific foundation to provide a fundamental basis for the creation of products and processes. The most important concept in axiomatic design is the existence of the design axioms. AD has two design axioms: independence axiom and information axiom. The independence axiom maintains the independence of functional requirements, and information axiom proposes the selection of the best alternative that has minimum information. In this study AD is proposed for multi-attribute comparison of mobile phones regarding their ergonomic design.
1. Introduction Today, mobile phones are not only used for making and receiving calls. Mobile phones now provide functions such as short messaging service, internet connectivity, mobile camera, video recording, etc. Since mobile phone functions have expanded, the use of a mobile phone has become a complex issue and various usability problems have arisen. In order to determine an easy-to-use mobile phone, an analysis of the current models in the market is conducted. Then several selected models are compared according to their ergonomic design and properties. In this study, to make this comparison, the dimensions of an ergonomic mobile phone are first decided, then the sub-factors of these dimensions are listed. Finally, six phones are evaluated using fuzzy AD. The comparison of the mobile phones according to their ergonomic properties is conducted under predetermined physical and mental criteria. Since the comparison of mobile phones regarding ergonomic concerns involves incomplete information, fuzzy AD approaches are exploited. In this paper, a fuzzy multi-attribute axiomatic design approach for the selection of the most ergonomic mobile phone is introduced and the implementation process is represented by a real-world example.
2. Principles of Axiomatic Design Axiomatic Design is a guide for understanding design problems, while establishing a scientific foundation to provide a fundamental basis for the creation of products and processes [1]. The most important concept in axiomatic design is the existence of the design axioms. AD has two design axioms: the Independence Axiom and the Information Axiom [2]. The Information Axiom indicates that the best design is the one with the least information content. In order to apply axiomatic design theory, the information content for a given functional requirement FR_i must first be calculated. The information content I_i is calculated according to the following equation:

I_i = log2(1 / p_i)    (1)

In this formula, p_i is the probability of satisfying FR_i, and it is determined by the design range and the system range. The design range shows what the designer wishes to achieve in terms of tolerance, and the system range shows the system capability. The intersection between the design range and the system range is the region where an acceptable solution exists and is called the common range; p_i is then defined as

p_i = common range / system range    (2)

After obtaining I_i for each FR_i, because there are n FRs, the total information content is the sum of all the individual information contents. If I_i approaches infinity for any FR_i, the system will never work [1]. 3. Ergonomics in Mobile Phone Previous research on ergonomic mobile phone evaluation has been done along two separate lines: a physical approach and a cognitive approach. The physical approach focuses on design elements such as weight, dimension, screen size and arrangement of buttons. The cognitive approach is interested in usability criteria such as learnability, memorability, efficiency, and image. Three usability dimensions are defined in ISO/IEC 9241-11: effectiveness, efficiency and satisfaction. Effectiveness is defined as the accuracy and completeness with which users achieve specific goals. Efficiency concerns the resources expended in relation to the accuracy and completeness with which users achieve specific goals, and satisfaction is the subjective assessment of how pleasurable the product is to use. In addition to these factors, learnability (the ability to reach a reasonable level of performance) and memorability (the ability to remember how to use a product) are defined by Nielsen [3].
Another reference for usability dimensions is SUMI, which provides a usability profile according to five scales: affect, control, efficiency, helpfulness, learnability [4][5]. Also, Han et al. [6] divided usability into two main groups. The first is defined as the performance dimensions that measure user performance. The second group is defined as the image/impression dimensions that measure the user's perception of the image and impression of the product. Moreover, the MPUQ (developed by Ryu) includes new criteria such as pleasurability and specific task performance [5]. In this study, ergonomic features are divided into two aspects: physical and cognitive. These aspects are further classified into sub-factors, which are listed in Table 2. The cognitive aspect's sub-criteria are formed by the common factors in the existing usability questionnaires shown in Table 1. Lai et al. [7] determined three representative image word pairs: Simple-Complex, Handsome-Rustic, Leisure-Formal. Since experts' evaluations involve many factors, these three representative word pairs are used in order to evaluate the image of the mobile phone. Moreover, physical attributes are found to be among the most important features in mobile design in product catalogs.

Table 1. Usability dimensions by usability questionnaires.

Han et al., 2000: 1. Performance dimensions (1.1 Perception/Cognition, 1.2 Memorization/Learnability, 1.3 Control/Action); 2. Image/Impression dimensions (2.1 Basic sense, 2.2 Description of image, 2.3 Evaluative feeling)
Nielsen, 1993: 1. Learnability, 2. Efficiency, 3. Errors, 4. Memorability, 5. Satisfaction
ISO 9241-11: 1. Effectiveness, 2. Efficiency, 3. Satisfaction
SUMI: 1. Affect, 2. Efficiency, 3. Control, 4. Helpfulness, 5. Learnability
MPUQ: 1. Ease of learning and use, 2. Helpfulness and problem solving, 3. Affective aspect and multimedia properties, 4. Commands and minimal memory load, 5. Control and efficiency, 6. Typical tasks for mobile phone
Table 2. Physical and cognitive attributes used in the ergonomic comparison.

Physical attributes: 1. Weight; 2. Dimension; 3. Function button style; 4. Number buttons arrangement; 5. Screen size
Cognitive attributes: 1. Ease of use; 2. Learnability; 3. Image (3.1 Simple-Complex, 3.2 Handsome-Rustic, 3.3 Leisure-Formal)
4. Fuzzy Axiomatic Design Approach In the fuzzy case, there is incomplete information about the system and design ranges, so the available data are fuzzy. All the previous points regarding uncertainty are very important to incorporate into ergonomic studies. The main advantages of using fuzzy sets are not only a gain in precision, but also the reduction of model complexity. There are limits to using crisp values for the evaluation process. First of all, some criteria cannot be measured by crisp values, so in the selection process they are neglected [8][9]. Furthermore, real-world problems are complex and not all of the decision data can be precisely assessed [8]. Humans are unsuccessful in making quantitative predictions, but they have capabilities for qualitative prediction which computers do not have. The use of fuzzy set theory allows us to incorporate unquantifiable, incomplete and non-obtainable information. Since ergonomic mobile phone selection involves incomplete information, fuzzy AD is chosen for the selection process. The system contains five conversion scales and triangular fuzzy numbers. Firstly the experts decide the design range of each alternative for each criterion with the help of linguistic expressions, and then the linguistic expressions are transformed into fuzzy numbers. After that, the common area is found as the intersection area of triangular fuzzy numbers. 5. Evaluation of Mobile Phones Using Fuzzy AD Six mobile phones in the same price range and having typical mobile phone functions are evaluated using fuzzy AD. Two of the selected mobile phones are of the sliding type shown in Figure 1.a, two of them are of the folding type (Figure 1.b) and the last two are of the block type (Figure 1.c). The criteria considered in the ergonomic mobile phone selection are decided and five conversion scales are produced for them. For evaluating the intangible criteria (ease of use, ease of learning, complexity, fashionability, formality, function button style and number buttons arrangement), five triangular fuzzy numbers between 0 and 20 are used. For the tangible criteria (the phones' dimensions, weights and screen sizes), the related features are collected from magazines and product catalogs, and the fuzzy numbers are produced according to these data. The linguistic variables and the fuzzy numbers assigned to these criteria are shown in Table 3. The FRs that should be satisfied by a mobile phone are given below: FR1 = Weight must be light, FR2 = Dimension must be medium, FR3 = Screen size must be large, FR4 = Function button style must be moderate, FR5 = Number buttons arrangement must be irregular, FR6 = Usability must be high, FR7 = Learnability must be very good, FR8 = Appearance must be moderate, FR9 = Fashionability must be handsome, FR10 = Perception of appearance must be moderate.
Fig. 1. Alternative mobile phones' designs.
After the design ranges are decided, the experts produce the system range data, using the linguistic expressions in Table 3. In order to obtain the information content for the alternatives, the common area is calculated. When M_1 = (l_1, m_1, u_1) and M_2 = (l_2, m_2, u_2), and when l_1 <= u_2, d is the ordinate of the highest intersection point D between M_1 and M_2, calculated according to the following formula [10]:

d = (l_1 - u_2) / ((m_2 - u_2) - (m_1 - l_1))

and when l_1 > u_2, the membership value at the intersection is zero.
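The closed-form ordinate above presumes triangular membership functions that cross once. As an alternative sketch, the common area and the resulting information content I = log2(system area / common area) can also be obtained numerically; the system range in the last line is a hypothetical illustration, not one of the paper's phones.

```python
import math

def tfn_membership(x, tfn):
    """Membership degree of x in a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    if x <= l or x >= u:
        return 0.0
    return (x - l) / (m - l) if x <= m else (u - x) / (u - m)

def area(tfn):
    l, _, u = tfn
    return 0.5 * (u - l)                        # triangle of height 1

def common_area(design, system, steps=10000):
    """Numerical area of the intersection of the design- and system-range
    triangular fuzzy numbers (zero when the ranges do not overlap)."""
    lo, hi = max(design[0], system[0]), min(design[2], system[2])
    if hi <= lo:
        return 0.0
    dx = (hi - lo) / steps
    return sum(min(tfn_membership(lo + (i + 0.5) * dx, design),
                   tfn_membership(lo + (i + 0.5) * dx, system))
               for i in range(steps)) * dx

def information_content(design, system):
    """I = log2(system area / common area); infinite when the ranges
    do not intersect at all."""
    ca = common_area(design, system)
    return math.inf if ca == 0.0 else math.log2(area(system) / ca)

# design range 'light' = (80, 100, 120) from Table 3 against a
# hypothetical system range (90, 110, 130)
print(information_content((80, 100, 120), (90, 110, 130)))
```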
Table 3. Triangular fuzzy conversion scales and linguistic variables.

Weight:                     Very light (70,70,80); Light (80,100,120); Medium (110,140,170); Heavy (160,190,220); Very heavy (210,240,240)
Dimension:                  Very small (50,50,70); Small (60,80,110); Medium (90,125,160); Big (140,175,190); Very big (180,200,200)
Screen size:                Very small (65,65,160); Small (120,160,200); Medium (170,240,310); Large (290,350,410); Very large (390,700,700)
Function button style:      Very regular (0,0,6); Regular (4,7,10); Moderate (8,11,14); Irregular (12,15,18); Very irregular (16,20,20)
Number buttons arrangement: Very regular (0,0,6); Regular (4,7,10); Moderate (8,11,14); Irregular (12,15,18); Very irregular (16,20,20)
Usability:                  Very low (0,0,6); Low (4,7,10); Medium (8,11,14); High (12,15,18); Very high (16,20,20)
Learnability:               Poor (0,0,6); Fair (4,7,10); Good (8,11,14); Very good (12,15,18); Excellent (16,20,20)
Appearance:                 Very simple (0,0,6); Simple (4,7,10); Moderate (8,11,14); Complex (12,15,18); Very complex (16,20,20)
Fashionability:             Very rustic (0,0,6); Rustic (4,7,10); Moderate (8,11,14); Handsome (12,15,18); Very handsome (16,20,20)
Perception of appearance:   Very formal (0,0,6); Formal (4,7,10); Moderate (8,11,14); Leisure (12,15,18); Very leisure (16,20,20)
Table 4. System range data for the mobile phones.

Phone  FR1     FR2     FR3         FR4             FR5             FR6        FR7        FR8       FR9       FR10
P1     Light   Small   Large       Moderate        Regular         Very high  Good       Moderate  Handsome  Moderate
P2     Medium  Medium  Very large  Irregular       Regular         Very high  Good       Complex   Handsome  Formal
P3     Light   Medium  Large       Irregular       Very regular    Very high  Fair       Simple    Moderate  Moderate
P4     Light   Small   Large       Irregular       Regular         Very high  Good       Simple    Moderate  Moderate
P5     Medium  Big     Large       Very irregular  Very irregular  Medium     Very good  Complex   Rustic    Leisure
P6     Medium  Medium  Medium      Irregular       Irregular       High       Very good  Moderate  Handsome  Very formal
After the d points are obtained, the common area is calculated, and then the information content can be found. The resulting information contents for all alternatives are listed in Table 5. According to Table 5, the phone with the minimum information content is P1. As the alternative with minimum information content is the best, P1, which is a sliding phone, is selected as the most ergonomic mobile phone for the physical and mental attributes taken together. Considering only the physical aspect, P3, which is a folding phone, is the best, whereas for the cognitive aspect P1 is again found to be the best.
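The selection rule itself is straightforward; the sketch below applies it to the total information contents in the last column of Table 5 (an infinite value means at least one FR cannot be met at all).

```python
import math

# total information contents from the last column of Table 5
totals = {"P1": 12.99, "P2": 27.41, "P3": math.inf,
          "P4": 23.65, "P5": math.inf, "P6": math.inf}

# the information axiom: the alternative with the least (finite) total
# information content is the best design
feasible = {p: v for p, v in totals.items() if math.isfinite(v)}
print(min(feasible, key=feasible.get))   # P1
```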
Table 5. Information content for the alternatives.

Phone  FR1    FR2    FR3    FR4    FR5    Total   FR6    FR7    FR8    FR9    FR10   Total
P1     0      5.02   0      0      5.17   6.19    3.4    3.4    0      0      0      12.99
P2     1.32   0      5.78   3.17   0      14.27   3.4    3.4    5.17   0      3.17   27.41
P3     0      0      0      5.17   0      3.17    1      inf    5.17   5.17   0      inf
P4     •.32   3.02   0      5.17   0      10.51   3.4    3.4    5.17   5.17   0      23.65
P5     •.32   5.61   0      inf    3.4    inf     3.4    0      5.17   inf    3.17   inf
P6     •.32   0      5.29   5.17   0      12.78   5.17   0      0      0      inf    inf
6. Conclusion In this paper, we try to find the most ergonomic mobile phone to produce. For this aim, we use the fuzzy axiomatic design method rather than crisp AD. If we had complete information, crisp AD would be sufficient to solve the decision model. If the data in the decision model have little uncertainty, it is not required to convert the data into a fuzzy format. According to Ross [12], the aim should be matching the model type with the character of the uncertainty exhibited in the problem. According to the uncertainty level, three different types of model can be used. If a system has little uncertainty, a closed-form mathematical expression is the suitable method. For systems with more uncertainty, but for which significant data exist, model-free methods should be used. However, for systems with incomplete, non-obtainable or unquantifiable information, fuzzy modeling provides a way to understand the system [12]. In our case, ergonomic mobile phone selection has many attributes, and these attributes are generally conflicting with each other and measured on different scales. Also, it is difficult to measure the intangible criteria quantitatively. Therefore, the fuzzy AD method is used in this study. In the fuzzy AD method, the experts' opinions about the alternatives' design ranges are obtained through linguistic variables. One advantage of using linguistic variables is that this kind of expression is more intuitive, and it is easier for experts to give their opinions in an ambiguous situation where numerical estimations are hard to obtain. As a result, P1 is found to be the most ergonomic mobile phone from both the physical and the cognitive perspective. Considering only the physical aspect, P3 is the best, while from the cognitive perspective P1 is the best. The AD
method for the selection process has advantages over other multi-attribute decision-making methods. Firstly, the designer may want a criterion to be satisfied within a specified design range rather than at its best possible level; this is not possible when working with other existing models like AHP, fuzzy AHP and scoring models. Also, the AD method rejects an alternative that does not meet the design range of any criterion, while the other methods do not. Finally, the identified dimensions of ergonomic mobile phones can help in developing more ergonomic mobile phones.

References
1. Suh, N.P., Axiomatic Design: Advances and Applications, Oxford University Press, New York (2001).
2. Lai, Y.C., A stochastic model for product development process, Ph.D. Thesis, Iowa State University (2002).
3. Lee, Y.S., Hong, S.W., Smith-Jackson, T.L., Nussbaum, M.A. and Tomioka, K., Systematic evaluation methodology for cell phone user interfaces, Interacting with Computers (2001) 1-22.
4. Kirakowski, J. and Corbett, M., SUMI: The Software Usability Measurement Inventory, British Journal of Educational Technology 24 (1993) 210-212.
5. Ryu, Y.S., Development of usability questionnaires for electronic mobile products and decision making methods, Ph.D. Thesis, Virginia Polytechnic Institute and State University (2005).
6. Han, S.H., Yun, M.H., Kim, K. and Kwahk, J., Evaluation of product usability: development and validation of usability dimensions and design elements based on empirical models, International Journal of Industrial Ergonomics 26 (2000) 477-488.
7. Lai, H.H., Lin, Y.C., Yeh, C.H. and Wei, C.H., User-oriented design for the optimal combination on product design (2005).
8. Kulak, O. and Kahraman, C., Multi-attribute comparison of advanced manufacturing systems using fuzzy vs. crisp axiomatic design approach, International Journal of Production Economics 95(3) (2005) 415-424.
9. Kulak, O. and Kahraman, C., Fuzzy multi-attribute selection among transportation companies using axiomatic design and analytic hierarchy process, Information Sciences 170(2-4) (2005) 191-210.
10. Chang, D.Y., Applications of the extent analysis method on fuzzy AHP, European Journal of Operational Research 95 (1996) 649-655.
11. Zhu, K.J., Jing, Y. and Chang, D.Y., A discussion on extent analysis method and applications of fuzzy AHP, European Journal of Operational Research 116 (1999) 450-456.
12. Ross, T.J., Fuzzy Logic with Engineering Applications (1995).
FACILITY LOCATION SELECTION USING A FUZZY OUTRANKING METHOD IHSAN KAYA Department of Industrial Engineering, Istanbul Technical University Macka 80680, Istanbul, Turkey,[email protected] DIDEM CINAR Department of Industrial Engineering, Istanbul Technical University Macka 80680, Istanbul, Turkey,[email protected] Most decision-making problems deal with uncertain and imprecise data so conventional approaches cannot be effective to find the best solution. To cope with this uncertainty, fuzzy set theory has been developed as an effective mathematical algebra under vague environment. When the system involves human subjectivity, fuzzy algebra provides a mathematical framework for integrating imprecision and vagueness into the decision making models. The main subject of this study is a fuzzy outranking model, and a numerical example of a facility location selection problem with fuzzy data is considered using this model.
1. Introduction The facility location problem, whose optimization is a central area of operations research, is to determine the best region for a facility. Typical applications of facility location include the placement of factories, warehouses, schools, ATMs and proxy servers in content distribution networks on the internet. The selection of facility locations among alternative locations is a decision problem including quantitative and qualitative criteria simultaneously. In this study, we analyze an outranking preference method to generate strategic concepts, evaluate them and select the best ones. During the decision phase, eight main attributes are considered: travel distance, travel cost, political decision, convenience of access, material handling cost, working condition, cost of renting & maintenance and other characteristics. Since the criteria of the determined locations are subjective, we use fuzzy numbers. The rest of this paper is organized as follows. Section 2 provides some approaches to facility location selection methods. In Section 3, the proposed fuzzy outranking method is examined, and it is shown how such a model can assist in analyzing a multi-criteria decision-making problem when the information
available is vague, imprecise and subjective. An application of this model and its discussion is presented in Section 4. Finally, concluding remarks are made in Section 5. 2. Fuzzy Sets Approaches to Facility Location Selection Facility location is one of the most important aspects of logistics. The goal of research in this area is to support decisions regarding building facilities, e.g., plants and warehouses, among a set of possibilities such that all demands can be served. Usually the main objective is minimizing cost or maximizing profit. As a result of a good location decision, a company or an organization can save millions of dollars. Because of the imprecision or vagueness of linguistic assessments, the conventional methods of location selection tend to be less effective. Fuzzy sets have been widely used for the facility location problem in recent years. Certain types of uncertainties are encountered in a variety of areas and fuzzy set theory has proved to be very efficient in considering these. Tzeng and Chen [1] propose a location model which helps to determine the optimal number and sites of fire stations at an international airport, and also assists the relevant authorities in drawing up optimal locations for fire stations. Kuo et al. [2] develop a decision support system to locate a new convenience store; this system is integrated with the analytic hierarchy process using fuzzy set theory. Chen [3, 4] investigates the distribution center location selection problem under a fuzzy environment and proposes a new multiple criteria decision-making method to solve it. Kahraman et al. [5] investigate the solution approaches of four different fuzzy multi-attribute group decision-making methods: a fuzzy model of group decision proposed by Blin, fuzzy synthetic evaluation, Yager's weighted goals method and the fuzzy analytic hierarchy process, applied to facility location problems. Al-bader [6] provides an introduction to the concept of decision-making, presents the prerequisites and the need for a methodology for analyzing certain problems under a fuzzy environment, and deals with rating models under a fuzzy environment with an application to a facility location selection problem. 3. Fuzzy Outranking Method Multi-Attribute Decision Making (MADM) refers to making a selection among given and predetermined alternatives in the presence of multiple, usually conflicting and sometimes interactive attributes [7]. In MADM, the selection attributes are determined, and all alternatives are evaluated by rating these attributes and comparing them with each other. The aggregation phase is followed by an
exploitation phase, which allows the decision maker to obtain a rank ordering, a choice or a sorting among the alternatives [7]. In general, the set A = {a, b, c, ...} is used to denote the alternatives of the multi-criteria decision-making problem. These alternatives are evaluated by n criteria g_1, g_2, ..., g_n. The best alternative in the set A is selected based on the criteria vectors g(k), k in A. In this paper, instead of crisp numbers, the performance ratings are described with triangular fuzzy numbers. Since the available information is very subjective or not sufficient, it is difficult to determine which alternative is the best, and incomparable criteria may need to be tolerated until sufficient information is collected. For modeling the imprecise preference relations between location alternatives, the fuzzy outranking relation proposed by Roy [8] is used. Let P_i(a, b) be the fuzzy preference relation between a and b, where a, b in A, for criterion i. g_i(a) and g_i(b) are the linguistic performances of alternatives a and b according to criterion i and are represented by fuzzy numbers. According to Tseng and Klein [9], the fuzzy preference relationship is given as follows:

P_i(a, b) = (D(a, b) + D(a ∩ b, 0)) / (D(a, 0) + D(b, 0))    (1)
where D(a, b) is the area in which a dominates b; D(a, 0) is the area of a; D(b, 0) is the area of b; and D(a ∩ b, 0) is the intersection area of a and b. Preference relations are thus obtained from the relevant areas under the fuzzy membership functions. The three preference models applied in this study are as follows [9]. 3.1. Pseudo-order preference model The pseudo-order preference model separates the set of alternatives into two sets, a dominance set and a nondominance set. During this discrimination, the relative importance of each criterion is not considered. 3.2. Semi-order preference model When the relative importance of each criterion is predictable, the semi-order preference model is used to identify the nondominance set. 3.3. Complete-preorder preference model The complete-preorder preference model, in which the most promising "best" alternative is selected, is a special type of the pseudo-order preference model. Threshold values are not used, so q_i = p_i = 0, i in C.
The degree of dominance is used to determine the complete-preorder preference model and to rank the set of alternatives in a complete order. 4. Application This example is taken from the study of Al-bader [6]. In Al-bader's study, the problem was solved not only with fuzzy sets but also with crisp sets, and the results were compared. In this study, we use the three preference-ranking methods explained in Section 3 with fuzzy numbers. Problem. An industrial engineering location selection problem is considered. Four alternatives (L1, L2, L3, L4) are compared. The selection of the best facility location is related to eight attributes:
- Travel distance: distance between the location and the market or the other layouts.
- Travel cost.
- Political decision.
- Convenience of access: situation of the ways, etc.
- Material handling cost: handling cost resulting from long distances, long times, etc.
- Working condition: this criterion covers the size, comfort, car parking, etc.
- Cost of renting and maintenance.
- Other characteristics: other factors that may affect the decision on the new facility location, such as climate, social conditions, etc.
Ratings of the location alternatives for each attribute are given in Table 1 and the weights of the attributes are summarized in Table 2. Triangular fuzzy numbers are used for each alternative to model the selection problem (Table 3).

Table 1. Ratings for each location alternative

Attribute                      L1  L2  L3  L4
Travel distance                 9   8   9   7
Travel cost                     8   7   7   8
Political decision              6   7   5   6
Convenience of access           5   6   6   6
Material handling cost          5   5   5   5
Working conditions              4   5   4   4
Cost of renting & maintenance   5   4   5   6
Other characteristics           5   5   5   5
Table 2. Weights and normalized weights for each location attribute

Attribute                      Weight  Normalized weight
Travel distance                   8        0.222
Travel cost                       8        0.222
Political decision                3        0.083
Convenience of access             4        0.111
Material handling cost            2        0.056
Working conditions                4        0.111
Cost of renting & maintenance     3        0.083
Other characteristics             4        0.111
Table 3. Fuzzy ratings for each location alternative

Attribute                      L1        L2        L3        L4
Travel distance                (8,9,10)  (7,8,9)   (8,9,10)  (6,7,8)
Travel cost                    (7,8,9)   (6,7,8)   (6,7,8)   (7,8,9)
Political decision             (5,6,7)   (6,7,8)   (4,5,6)   (5,6,7)
Convenience of access          (4,5,6)   (5,6,7)   (5,6,7)   (5,6,7)
Material handling cost         (4,5,6)   (4,5,6)   (4,5,6)   (4,5,6)
Working conditions             (3,4,5)   (4,5,6)   (3,4,5)   (3,4,5)
Cost of renting & maintenance  (4,5,6)   (3,4,5)   (4,5,6)   (5,6,7)
Other characteristics          (4,5,6)   (4,5,6)   (4,5,6)   (4,5,6)
4.1. Pseudo-order preference model q_i = 0.25 and p_i = 0.85 are applied as thresholds to the location selection problem [8]. The pseudo-order preference model is used to separate the alternatives into a dominance set and a nondominance set. The fuzzy preference relations among the four location alternatives are given in Table 4.

Table 4. Fuzzy preference relations between alternatives for each attribute

Travel distance        L1     L2     L3     L4
L1                     0.50   0.875  0.50   1.00
L2                     0.125  0.50   0.125  0.875
L3                     0.50   0.875  0.50   1.00
L4                     0.00   0.125  0.00   0.50

Travel cost            L1     L2     L3     L4
L1                     0.50   0.875  0.875  0.50
L2                     0.125  0.50   0.50   0.125
L3                     0.125  0.50   0.50   0.125
L4                     0.50   0.875  0.875  0.50

Political decision     L1     L2     L3     L4
L1                     0.50   0.125  0.875  0.50
L2                     0.875  0.50   1.00   0.875
L3                     0.125  0.00   0.50   0.125
L4                     0.50   0.125  0.875  0.50

Convenience of access  L1     L2     L3     L4
L1                     0.50   0.125  0.125  0.125
L2                     0.875  0.50   0.50   0.50
L3                     0.875  0.50   0.50   0.50
L4                     0.875  0.50   0.50   0.50

Material handling cost L1     L2     L3     L4
L1                     0.50   0.50   0.50   0.50
L2                     0.50   0.50   0.50   0.50
L3                     0.50   0.50   0.50   0.50
L4                     0.50   0.50   0.50   0.50

Working conditions     L1     L2     L3     L4
L1                     0.50   0.125  0.50   0.50
L2                     0.875  0.50   0.875  0.875
L3                     0.50   0.125  0.50   0.50
L4                     0.50   0.125  0.50   0.50

Cost of renting & maintenance  L1     L2     L3     L4
L1                             0.50   0.875  0.50   0.125
L2                             0.125  0.50   0.125  0.00
L3                             0.50   0.875  0.50   0.125
L4                             0.875  1.00   0.875  0.50

Other characteristics  L1     L2     L3     L4
L1                     0.50   0.50   0.50   0.50
L2                     0.50   0.50   0.50   0.50
L3                     0.50   0.50   0.50   0.50
L4                     0.50   0.50   0.50   0.50
According to Fig. 1, the dominance and nondominance sets are as follows: S_ND = {2} and S_D = {1, 3, 4}. Since only location 2 (L2) outranks the other alternatives, L2 seems to be the result of this preference model.
Fig. 1. The outranking graph according to the pseudo-order preference model. 4.2. Semi-order preference model Normalized weights are calculated using the weights of the attributes in Table 2. For the semi-order preference model, the threshold q_i (q_i = 0.25) is enough to determine the best result. According to Fig. 2, the dominance and nondominance sets are as follows: S_ND = {1} and S_D = {2, 3, 4}.
Fig. 2. The outranking graph according to the semi-order preference model. Since only location 1 (L1) outranks the other alternatives, it seems to be the result of the semi-order preference model.
4.3. Complete-preorder preference model The weighted preferences and the degrees of dominance are calculated using

MD(a) = Σ_{b in A, b ≠ a} P_w(a, b),

where P_w(a, b) is the weighted preference of a over b, and are shown in Tables 5 and 6.
Table 5. Weighted preference matrix

      L1     L2     L3     L4
L1    0.50   0.58   0.57   0.54
L2    0.42   0.50   0.47   0.53
L3    0.43   0.53   0.50   0.47
L4    0.46   0.47   0.53   0.50

Table 6. Degree of dominance

Alternative   Degree of dominance
L1            1.69
L2            1.42
L3            1.43
L4            1.46
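As a cross-check of Tables 4-6, the sketch below rebuilds the pairwise preferences from the crisp ratings of Table 1, weights them with the normalized weights of Table 2 and sums the off-diagonal entries to obtain the degrees of dominance. Because every rating here is a unit-spread triangular number (x-1, x, x+1), the preference of Eq. (1) collapses to a simple function of the difference of the modal values (0 -> 0.50, ±1 -> 0.875/0.125, ±2 or more -> 1.00/0.00); this shortcut is specific to these data and is not a general implementation of Eq. (1).

```python
RATINGS = {          # crisp ratings of Table 1 (attribute -> [L1, L2, L3, L4])
    "travel distance": [9, 8, 9, 7], "travel cost": [8, 7, 7, 8],
    "political decision": [6, 7, 5, 6], "convenience of access": [5, 6, 6, 6],
    "material handling": [5, 5, 5, 5], "working conditions": [4, 5, 4, 4],
    "renting & maintenance": [5, 4, 5, 6], "other characteristics": [5, 5, 5, 5],
}
WEIGHTS = {          # normalized weights of Table 2
    "travel distance": 0.222, "travel cost": 0.222, "political decision": 0.083,
    "convenience of access": 0.111, "material handling": 0.056,
    "working conditions": 0.111, "renting & maintenance": 0.083,
    "other characteristics": 0.111,
}

def preference(ra, rb):
    """Pairwise preference for unit-spread triangular ratings, following the
    value pattern visible in Table 4."""
    diff = ra - rb
    if diff >= 2:  return 1.0
    if diff == 1:  return 0.875
    if diff == 0:  return 0.5
    if diff == -1: return 0.125
    return 0.0

n = 4
weighted = [[sum(WEIGHTS[att] * preference(r[i], r[j]) for att, r in RATINGS.items())
             for j in range(n)] for i in range(n)]                     # Table 5
dominance = [sum(row[j] for j in range(n) if j != i) for i, row in enumerate(weighted)]
print([round(d, 2) for d in dominance])
# close to Table 6 (1.69, 1.42, 1.43, 1.46), up to small weight-rounding differences
```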
The dominance degree of location 1 is the highest value among the four alternatives, followed by location 4. Since the degrees of locations 2 and 3 are almost equal, there are two possible arrangements, but in both of them the first and second positions do not change. Therefore, the best alternative is location 1. The ordering of the alternatives is shown in Fig. 3.
Fig. 3. The outranking graph according to complete-preorder preference model
5. Conclusion In this study a fuzzy outranking model is proposed and a numerical example of a facility location selection problem with fuzzy data is considered. Three preference models are analyzed for the facility location problem: (1) the pseudo-order preference model, (2) the semi-order preference model, and (3) the complete-preorder preference model. When q_i is chosen as 0.25 and p_i as 0.85, location 2 outranks the other alternatives according to the pseudo-order preference model. However, when the semi-order and complete-preorder preference models are applied, location 1 appears to be the best alternative. The results of the models are summarized in Table 7.
Table 7. Results of the three preference models

Model                               Nondominance set   Dominance set
Pseudo-order preference model       {2}                {1, 3, 4}
Semi-order preference model         {1}                {2, 3, 4}
Complete-preorder preference model  {1}                {2, 3, 4}
In prospective studies, the effect of varying the threshold values (p_i and q_i) on the results can be investigated as a sensitivity analysis; we may examine whether the best selection changes when p_i and q_i are changed. Other fuzzy outranking methods may also be used and compared with the results of this method.

References
1. Tzeng, G.H., Chen, Y.W., Optimal location of airport fire stations: a fuzzy multi-objective programming and revised genetic algorithm approach, Transportation Planning and Technology 23(1) (1999) 37-55.
2. Kuo, R.J., Chi, S.C., Kao, S.S., A decision support system for locating convenience store through fuzzy AHP, Computers & Industrial Engineering 37 (1999) 323-326.
3. Chen, C.T., A fuzzy approach to select the location of the distribution center, Fuzzy Sets and Systems 118 (2001) 65-73.
4. Chen, S.M., Fuzzy group decision-making for evaluating the rate of aggregative risk in software development, Fuzzy Sets and Systems 118 (2001) 75-88.
5. Kahraman, C., Ruan, D., Dogan, I., Fuzzy group decision-making for facility location selection, Information Sciences 157 (2003) 135-153.
6. Al-Bader, N., Certain models for facility location and production planning under fuzzy environment, M.Sc. Thesis, Department of Mechanical and Industrial Engineering, University of Manitoba, Winnipeg, Manitoba.
7. Roubens, M., Fuzzy sets and decision analysis, Fuzzy Sets and Systems 90 (1997) 199-206.
8. Roy, B., Vincke, P.H., Relational systems of preference with one or more pseudo-criteria: some new concepts and results, Management Science 30 (1984) 1323-1335.
9. Güngör, Z., Arıkan, F., A fuzzy outranking method in energy policy planning, Fuzzy Sets and Systems 114 (2000) 115-122.
EVALUATION OF SUPPLIERS' ENVIRONMENTAL MANAGEMENT PERFORMANCES BY A FUZZY COMPROMISE RANKING TECHNIQUE GULCIN BUYUKOZKAN
ORHAN FEYZIOGLU
Department of Industrial Engineering, Galatasaray University, Qragan Caddesi, No:36 Ortakoy, 34357, Istanbul-Turkey Traditionally, when evaluating supplier performance, companies have considered factors such as price, quality, flexibility etc. However, with environmental pressures increasing, many companies have begun to consider environmental issues and the measurement of their suppliers' environmental performance. This paper presents a performance evaluation model based on a multi-criteria decision-making method, known as VIKOR, for measuring the supplier environmental performance. The original VIKOR method has been proposed to identify compromise solutions, by providing a maximum group utility for the majority and a minimum of an individual regret for the opponent. In its actual setting, the method treats exact values for the assessment of the alternatives, which can be quite restrictive with unquantifiable criteria. This will be true especially if the evaluation is made by means of linguistic terms. For this reason we extend the VIKOR method so as to process such data and to provide a more comprehensive evaluation in a fuzzy environment. The extended method is used in a real industrial application.
1. Introduction In the last years "green" movements, institutions and governments have forced many companies to improve their environmental performance. To respond to this growing concern for "green" issues, firms have carried out a great number of environmental programs. Following this, today, the environmentally conscious firms—mainly multinational corporations—are developing "green" programs aimed at organizing their supply value chains according to an ecoefficiency perspective [1]. In particular, pro-active companies are seeking to develop co-operative links with supply chain partners, particularly small and medium sized enterprises, in order to accelerate the diffusion of environmental management initiatives and to design and develop new "green" products [2, 3]. In spite of the growing importance of the supplier's role in the new product development process, there are relatively few models for supplier selection that effectively take environmental performances into account [4, 5, 6, 7]. An effective "green" supplier selection approach should be able to link all the 367
members of a supply chain. Furthermore, the selection decision should be driven by environmental factors as well as by financial and other tangible and intangible performance criteria. As the importance degrees of the evaluation criteria are different, the procedure should also allow prioritization of the criteria used in the evaluation process. For these reasons, this study aims to develop a supplier selection approach that is able to (1) consider environmental issues, (2) incorporate both tangible and intangible performance indicators, (3) handle different weights for each evaluation criterion, (4) handle multiple criteria decision making (MCDM), and (5) provide multiple desirable solutions to the decision maker (DM). The paper is organized as follows. Section 2 gives a brief description of the green supplier evaluation criteria. Section 3 presents the linguistic VIKOR method in a group decision-making context. Section 4 applies the suggested approach to measure the performance of suppliers. Section 5 gives some concluding remarks. 2. Evaluation Criteria for Environmentally Conscious Suppliers Historically, several methodologies have been developed for evaluating, selecting and monitoring potential suppliers that take into account factors dealing with, for example, quality, logistics and cost. However, none of these methodologies has considered the importance of environmental factors, such as life cycle analysis or design for environment, in the decision-making process. In recent years, a number of researchers have begun to identify some relevant criteria. Sarkis [8] grouped environmental criteria such as "design for the environment", "life cycle analysis", "total quality environmental management", "green supply chain" and "ISO 14000 environmental management system requirements", but used them only to evaluate the existing internal company operations for their environmental performance. Focusing on supplier selection, Noci [7] identified four environmental categories including "green competencies", "current environmental efficiency", "supplier's green image" and "net life cycle cost". Enarsson [4] proposed a fishbone-diagram-based instrument, similar to the ones used in quality assessment within companies, for the evaluation of suppliers from an environmental viewpoint; four main factors are identified: "the supplier as a company", "the supplier's processes", "the product itself" and "transportation". By consolidating several studies, Humphreys et al. [5] propose seven environmental categories. The categories "environmental costs (pollutant effects)" and "environmental costs (improvement)" are grouped together under the title "quantitative environmental
369 criteria". The other five categories named "management competencies", "green image", "design for environment", "environmental management systems", and "environmental competencies" are in a separate group termed "qualitative environmental criteria". In a recent work, Kongar [6] introduces environmental consciousness indicators such as "recyclability of goods", "decreased amount of hazardous substances" and "compatibility with health and safety regulations" into the supplier evaluation process. Based on the mentioned studies and the contribution of industrial experts who actually work in the environmental management related departments of three international companies' Turkish branches, the following criteria are to be considered for the assessment of the supplier: (a) environmental management competencies, (b) existing environmental management systems, (c) effort for the "design for environment", (d) effort for the "production for environment", (e) effort for the "logistics for environment", and (f) environmental costs. 3. Fuzzy VIKOR Method in a Group Decision-Making Setting Most common approaches in supplier selection include expert evaluation, principal components analysis, factor analysis, cluster analysis, discriminant analysis, data envelopment analysis, fuzzy logic based evaluation approaches [9]. Supplier selection usually involves comparisons of alternative solutions on the basis of multiple conflicting criteria and hence can be considered as a MCDM problem. One type of MCDM methods is the distance-based techniques like compromise and composite programming seek to find a solution that is close to an ideal solution, or like the Nash cooperative game concept - a solution as far as possible from the worst solution. As a method belonging to the compromise programming category, VIKOR was introduced as an applicable technique to implement within MCDM [10, 11]. The VIKOR method determines the compromise ranking-list and the compromise solution by introducing the multi criteria ranking index based on the particular measure of "closeness" to the "ideal" solution. The compromise solution is a feasible solution, which is the closest to the ideal, and here "compromise" means an agreement established by mutual concessions. With this ability, VIKOR is selected in this work as a suitable method for evaluating suppliers. Meanwhile, the method requires crisp evaluation of alternatives. Owing to the availability and uncertainty of the information, it is not always possible to obtain exact numerical data for decision criteria. Moreover, most evaluators tend to give assessments based on their knowledge, past experience and subjective judgments. As an example, "quality" is a linguistic variable since its
values are linguistic values rather than numerical ones, i.e., poor, fair, good, very good, etc. Fuzzy set theory plays a significant role in dealing with the vagueness of human thought. The approximate reasoning of fuzzy set theory can properly represent linguistic terms [12]. The value of a linguistic variable can be quantified and extended to mathematical operations using fuzzy set theory [13, 14]. As the suppliers' environmental management performance contains hardly quantifiable factors, the VIKOR method is extended in this study with fuzzy logic to process such data and to provide a more comprehensive evaluation. Recently, Opricovic and Tzeng [15] have also suggested using fuzzy logic for the VIKOR method. However, they simply used fuzzy values to define the attributes' ratings and their importance in a first phase; the obtained results are then defuzzified in a second phase to obtain crisp values, which are used as such in the original VIKOR method. Here, we suggest also making use of fuzzy logic in the subsequent phases of the VIKOR method so as not to lose any important information in the mapping process. Let us denote the m alternatives under consideration as a_1, a_2, ..., a_m, and the n evaluation criteria as c_1, c_2, ..., c_n. The suggested procedure is as follows. Step 1. Construct a committee of K experts and identify the alternatives and evaluation criteria. Step 2. Identify the evaluation base, i.e. the linguistic variables used to weight the criteria and rate the alternatives. Step 3. Determine the aggregated fuzzy weight w_i of criterion c_i, i = 1, 2, ..., n, and the aggregated fuzzy rating r_ij of alternative a_j, j = 1, 2, ..., m, under criterion c_i. To achieve this, we use the weighted fuzzy Delphi method [16]. Delphi-Step 1: The K experts are asked to provide their evaluations using the linguistic variables given in Tables 1.a-b, which correspond to the triangular fuzzy numbers r_ij^k and w_i^k. For a triangular fuzzy number (l, m, u), m is the modal (most likely) value. Each expert has a weight lambda_k determined according to his/her degree of experience.

Table 1. Linguistic variables to rate (a) criteria importance and (b) alternatives.

(a) Criteria importance            (b) Alternatives
Very Low (VL)   (0.0, 0.0, 0.3)    Very Poor (VP)   (0.0, 0.0, 0.2)
Low (L)         (0.0, 0.3, 0.5)    Poor (P)         (0.0, 0.2, 0.4)
Medium (M)      (0.2, 0.5, 0.8)    Fair (F)         (0.3, 0.5, 0.7)
High (H)        (0.5, 0.7, 1.0)    Good (G)         (0.6, 0.8, 1.0)
Very High (VH)  (0.7, 1.0, 1.0)    Very Good (VG)   (0.8, 1.0, 1.0)
Delphi-Step 2: First, the weighted averages r_ij of all the r_ij^k and w_i of all the w_i^k are computed as

r_ij = (lambda_1 ⊗ r_ij^1 ⊕ ... ⊕ lambda_K ⊗ r_ij^K) / (lambda_1 + ... + lambda_K),
w_i  = (lambda_1 ⊗ w_i^1 ⊕ ... ⊕ lambda_K ⊗ w_i^K) / (lambda_1 + ... + lambda_K).

Then the deviations between r_ij and each r_ij^k, and between w_i and each w_i^k, are computed with the method presented in Fortemps and Roubens [17] for each expert. Delphi-Step 3: A threshold value is defined, and the evaluation is sent back to the expert if the distance between the weighted average and the expert's evaluation is greater than this value. If the threshold is exceeded, the process loops back to Delphi-Step 2 until no threshold-exceeding value is encountered, i.e. until two successive averages are reasonably close to each other. It is assumed that a distance less than or equal to 0.2 corresponds to two reasonably close fuzzy estimates [18]. Step 4. If the supports of the triangular fuzzy numbers expressing the linguistic variables (Tables 1.a-b) do not belong to the interval [0, 1], scaling is needed to transform them back into this interval. Here, we use a linear scale transformation to obtain comparable numbers. For example, for the ratings of the alternatives we take

r'_ij = (l_ij / u_i*, m_ij / u_i*, u_ij / u_i*),  where r_ij = (l_ij, m_ij, u_ij), u_i* = max_j u_ij, i = 1, 2, ..., n.
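A minimal sketch of the two operations above on triangular fuzzy numbers stored as (l, m, u) triples: the expertise-weighted average of Delphi-Step 2 and the linear scale transformation of Step 4. The linguistic ratings come from Table 1.b; the equal expert weights are an illustrative assumption.

```python
def weighted_average(tfns, lambdas):
    """Delphi-Step 2: (lambda_1*A_1 + ... + lambda_K*A_K) / (lambda_1 + ... + lambda_K),
    computed component-wise on triangular fuzzy numbers (l, m, u)."""
    total = sum(lambdas)
    return tuple(sum(lam * tfn[k] for lam, tfn in zip(lambdas, tfns)) / total
                 for k in range(3))

def normalise(tfn, u_max):
    """Step 4: linear scale transformation, dividing each component by the
    largest upper bound observed for the criterion."""
    return tuple(component / u_max for component in tfn)

# three equally weighted experts rating one alternative as G, VG and F (Table 1.b)
ratings = [(0.6, 0.8, 1.0), (0.8, 1.0, 1.0), (0.3, 0.5, 0.7)]
agg = weighted_average(ratings, [1.0, 1.0, 1.0])
print(agg, normalise(agg, u_max=1.0))
```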
Step 5. Compute the values S_j and R_j, j = 1, 2, ..., m, by the relations

S_j = ⊕_{i=1..n} w_i ⊗ d(1, r_ij)   and   R_j = max_i { w_i ⊗ d(1, r_ij) },

where S_j and R_j are used to formulate the ranking measures of "group utility" and "individual regret" respectively. Here, d(1, r_ij) represents the distance of an alternative's rating from the positive ideal solution 1 = (1, 1, 1), calculated by the area compensation method [17], which has reasonable ordering properties and is computationally easy. Note that the maximum among the w_i ⊗ d(1, r_ij) values corresponds to the rating that is the most distant from 1. Step 6. Compute the values Q_j, j = 1, 2, ..., m, by the relation Q_j = v(S'_j) ⊕ (1 - v)(R'_j), where S'_j and R'_j are the normalized S_j and R_j values obtained with the linear scale transformation. Here, v is introduced as the weight of the "majority of criteria" strategy. The compromise can be selected with "voting by majority" (v > 0.5), with "consensus" (v = 0.5), or with "veto" (v < 0.5). Step 7. The ranking order of the alternatives is determined with the help of the area compensation method. First, the S'_j, R'_j and Q_j values are defuzzified into crisp S'_j, R'_j and Q'_j values. Then the alternatives are ranked by sorting the S'_j, R'_j and Q'_j values in increasing order, as in the original VIKOR method. The result is a set of three ranking lists. The alternative j_1 corresponding to the smallest Q'_j value is proposed as the compromise solution if
C1. The alternative j_1 has an acceptable advantage, in other words Q'_{j2} - Q'_{j1} >= DQ, where DQ = 1/(m - 1) and m is the number of alternatives. C2. The alternative j_1 is stable within the decision-making process, in other words it is also the best ranked according to S'_j and/or R'_j. If one of the above conditions is not satisfied, then a set of compromise solutions is proposed, which consists of: • alternatives j_1 and j_2 if only condition C2 is not satisfied, or • alternatives j_1, j_2, ..., j_k if condition C1 is not satisfied, where j_k is determined by the relation Q'_{jk} - Q'_{j1} < DQ for the maximum k. 4. Application of the Proposed Approach Step 1 & 2. The authors have collaborated with a committee of 3 experts from the investigated company to undertake this study. Since there were no significant differences in the degrees of experience between the experts, all are assumed to be equally important for the decision process. Six different suppliers are evaluated against the six criteria given at the end of Section 2. Step 3. The linguistic terms presented in Tables 1.a-b are used in the assessment process. The aggregated fuzzy weights of the criteria and the aggregated fuzzy ratings of the alternatives are calculated through the weighted sum of the individual evaluations and are shown in Table 2. In this and the subsequent tables, c_i and W_j stand for the labels of criterion i and alternative j respectively. The column named "Weight" contains the criteria importance evaluations.

Table 2. The aggregated fuzzy weights and ratings.

     W1             W2             W3             W4             W5             W6             Weight
c1   (.8, 1, 1)     (.8, 1, 1)     (.1, .3, .5)   (.8, 1, 1)     (.57, .77, .9) (.3, .5, .7)   (.57, .8, 1)
c2   (.2, .4, .6)   (.4, .6, .8)   (.73, .93, 1)  (.6, .8, 1)    (.73, .93, 1)  (.57, .77, .9) (.63, .9, 1)
c3   (.1, .3, .5)   (.57, .77, .9) (.67, .87, 1)  (.73, .93, 1)  (.1, .3, .5)   (0, .2, .4)    (.3, .57, .87)
c4   (.67, .87, 1)  (.5, .7, .9)   (.4, .6, .8)   (.8, 1, 1)     (.4, .6, .8)   (.73, .93, 1)  (0, .3, .5)
c5   (.5, .7, .9)   (.2, .4, .6)   (.67, .87, 1)  (.8, 1, 1)     (.3, .5, .7)   (.1, .3, .5)   (0, .3, .5)
c6   (.1, .3, .5)   (.5, .7, .9)   (.5, .7, .9)   (0, .2, .4)    (.6, .8, 1)    (.5, .7, .9)   (.57, .8, 1)
Table 3. S'_j, R'_j and Q_j values for v = 0.6.

        W1               W2               W3               W4               W5               W6
Q_j     (.17, .29, .38)  (.27, .45, .59)  (.48, .70, .88)  (.12, .19, .24)  (.35, .55, .72)  (.49, .77, 1)
R'_j    (.21, .30, .38)  (.28, .40, .50)  (.57, .80, 1)    (.14, .20, .25)  (.50, .70, .88)  (.57, .80, 1)
S'_j    (.14, .28, .39)  (.26, .48, .64)  (.42, .63, .80)  (.11, .18, .24)  (.26, .45, .61)  (.44, .75, 1)
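Step 6 combines the two normalized measures component-wise on the triangular numbers. The short check below reproduces the first column of Table 3 from its S'_j and R'_j rows with v = 0.6 (small differences are rounding in the printed table).

```python
def fuzzy_Q(S, R, v=0.6):
    """Q = v*S' (+) (1-v)*R', applied component-wise to (l, m, u) triples."""
    return tuple(v * s + (1.0 - v) * r for s, r in zip(S, R))

# supplier W1 in Table 3: S'_1 = (.14, .28, .39), R'_1 = (.21, .30, .38)
print(fuzzy_Q((0.14, 0.28, 0.39), (0.21, 0.30, 0.38)))
# (0.168, 0.288, 0.386); Table 3 reports (.17, .29, .38)
```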
Step 4. As the supports of the fuzzy numbers given in Tables 1.a-b lie in the [0, 1] interval, the obtained results remain in that interval and the scaling step is skipped.
373 Step 5 & 6. The normalized "group utility" measure Sj and "individual regret" measure R'j are calculated for each alternativej = 1, ..., m. Based on the results of these two measures, Qs values are computed by selecting v = 0.6 (Table 3). Step 7. Table 4 gives the defuzzified scores of alternatives computed with area compensation method and their corresponding rankings. Table 4. Ranking of alternatives for v = 0.5.
               S'              R'              Q
Alternatives   Dist.   Rank    Dist.   Rank    Dist.   Rank
W1             0.27    2       0.30    2       0.28    2
W2             0.46    4       0.40    3       0.44    3
W3             0.62    5       0.79    5       0.69    5
W4             0.18    1       0.20    1       0.19    1
W5             0.44    3       0.69    4       0.54    4
W6             0.74    6       0.79    6       0.76    6
It is not possible to declare alternative 4 as the winner. We observe that this alternative satisfies condition C2 but not Ci given that gL, - gL = 0.09 < DQ = 0.2. Since only condition C\ is not satisfied, we propose alternatives 4 and 1 as the set of compromise solutions due to the following inequity: g,3, - gL = 0.25 > DQ = 0.2. The weight v has a central role in the ranking and a sensitivity analysis can be undertaken by systematically setting v to some values between 0 and 1. The results of such an analysis are presented in Table 5. Table 5. Ranking of alternatives for different values of v. V
0.00 0.25 0.50 0.75 1.00
Set of compromise solutions W,, W2, W4 W,,W„ W,,W4 W,,W4 Wi,W4
5. Conclusions This study proposed a fuzzy MCDM framework which a company can consider during their "green" supplier selection process. The approach basically extends the VIKOR method that helps DMs to achieve an acceptable compromise. In the extended method, the importance weights of criteria and the ratings of alternatives are assessed in linguistic terms. By using the suggested approach, the ambiguities involved in the assessment data could be effectively represented and processed to assure a more convincing and effective evaluation process. Although the extended method presented in this paper is applied to the
environmentally conscious supplier evaluation problem, it can also be used to identify acceptable compromises in many supplier evaluation problems. Acknowledgements The authors acknowledge the financial support of the Galatasaray University Research Fund. References 1. J-B. Sheu, Y-H. Chou and C-C. Hu, An integrated logistics operational model for green-supply chain management, Transport. Res. E-Log., 41 (4), 287-313, (2005). 2. G. Biiyukozkan, An analytic approach for strategic analysis of green product development, Proceeding of the 13th International Working Seminar on Production Economics, Innsbruck, Austria, Vol. 2, 87-96, (2004). 3. L. Li and K. Geiser, Environmentally responsible public procurement (ERPP) and its implications for integrated product policy (IPP), J. Clean. Prod, 13 (7), 705-715, (2005). 4. L. Enarsson, Evaluation of suppliers: how to consider the environment, Int. J. Phys. Distr. Logist. Manag., 28 (1), 5-17, (1998). 5. P.K. Humphreys, Y.K. Wong and F.T.S. Chan, Integrating environmental criteria into the supplier selection process, J. Mat. Proc. Tech., 138, 349-356, (2003). 6. E. Kongar, "A comparative study on multiple criteria heuristic approaches for environmentally benign 3PLs selection", Proceeding of the 3rd International Logistics and Supply Chain Congress, Istanbul, 23-24, (2005). 7. G. Noci, Designing 'green' vendor rating systems for the assessment of a supplier's environmental performance, Eur. J. Pur. Supply Manag., 3 (2), 103-114, (1997). 8. J. Sarkis, Evaluating environmentally conscious business practices, Eur. J. Oper. Res., 107, 159-174,(1998). 9. L. de Boer, E. Labro and P. Morlacchi, A review of methods supporting supplier selection, Eur. J. Pur. Supply Manag, 7 (2), 75-89, (2001). 10. S. Opricovic and G.H. Tzeng, Compromise solution by MCDM methods: a comparative analysis of VIKOR and TOPSIS, Eur. J. Oper. Res., 156,445^55, (2004). 11. G.H. Tzeng, C.W. Lin and S. Opricovic, Multi-criteria analysis of alternative-fuel buses for public transportation, Energy. Policy, 33, 1373-1383, (2005). 12. L.A. Zadeh, The concept of a linguistic variable and its applications to approximate reasoning, Inform. Sciences, Part I, 8, 199-249; Part II, 8, 301-357; Part III, 9,43-80, (1975). 13. A. ICaufmann and M.M. Gupta, Introduction to fuzzy arithmetic theory and applications, Van Nostrand Reinhold, New York, (1991). 14. L.A. Zadeh, Fuzzy sets, Inform. Control, 8, 338-353, (1965). 15. S. Opricovic and G.H. Tzeng, Defuzzification within a multicriteria decision model, Int J. Uncertain Fuzz., 11, 635-652, (2003). 16. G. Bojadziev and M. Bojadziev, Fuzzy logic for business, finance, and management: advances in fuzzy systems, World Scientific Pub., (1997). 17. P. Fortemps and M. Roubens, Ranking and defuzzification methods based on area compensation, Fuzzy Set. Syst., 82, 319-330, (1996). 18. C.H. Cheng and Y. Lin, Evaluating the best main battle tank using fuzzy decision theory with linguistic criteria evaluation, Eur. J. Oper. Res., 142, 174-186, (2002).
A FUZZY MULTIATTRD3UTE DECISION MAKING MODEL TO EVALUATE KNOWLEDGE BASED HUMAN RESOURCE FLEXD3DLITY PROBLEM* MUJDE EROL GENEVOIS Industrial Engineering Department, Galatasaray University, Ciragan Cad. No: Ortakoy 34357 Istanbul, Turkey Y. ESRA ALBAYRAK Industrial Engineering Department, Galatasaray University, Ciragan Cad. No: Ortakoy 34357 Istanbul, Turkey In multiple attribute decision-making (MADM) problems, a decision maker (DM) is often faced with the problem of selecting or ranking alternatives associated conflicting attributes. In this paper, a MADM with fuzzy pairwise information is used to solve human resource flexibility problem. In many manufacturing systems human resources are the most expensive, but also the most flexible factors. Therefore, the optimal utilization of human resources is an important success factor contributing to long-term competitiveness.
1. Introduction Human resources are one of the main sources of flexibility. In many manufacturing systems human resources are the most expensive, but also the most flexible factors. Therefore, the optimal utilization of human resources is an important success factor contributing to long-term competitiveness. For example annualizing working hours is a tool that provides flexibility to organizations; it enables a firm to adapt production capacity to fluctuations in demand. In multiple attribute decision-making (MADM) problems, a decision maker (DM) is often faced with the problem of selecting or ranking alternatives associated conflicting attributes. In this paper, through the multicriteria decision making (MCDM) analysis, we illustrate the relationships between human resource policies and characteristics of knowledge-based management. The purpose of this paper is to develop a theoretical framework for flexible modes of strategic human resource management within a global knowledge-based view organisation configuration. Using the proposed fuzzy multi-criteria approach, the ambiguities * This work is supported by research foundation of Galatasaray University.
375
376 involved in the assessment data can be effectively represented and processed to assure a more convincing and effective decision-making. Our aim is to combine subjective fuzzy preference information with objective information to derive the relative importance weights of the attributes and to derive a subjective ranking of alternatives from fuzzy pairwise relations. This approach is based on a fuzzy muhicriteria decision making (FMCDM) model to solve the problem of evaluating the flexibility of Knowledge Based Human Resource Flexibility Management System (KBHRFMS) face to internal and external problems. 2. Basic model In this paper the approach FAHP is introduced, with the use of triangular fuzzy numbers for pairwise comparison scale of FAHP, and the use of the extent analysis method for the synthetic extent value S. of the pairwise comparison. By applying the principle of the comparison of fuzzy numbers, V[MJ>M2)=1
if
m/>m2
a n d v\M2>M])=hgt\M1l
M2)=fiM
(d)
(1) the weight vectors with respect to each element under a certain criterion can be represented by d(A.) = minv[s.2LSk), k = l
,n, k*i.
(2)
In the following, first the steps of the extent analysis method on fuzzy AHP are given and then the method is applied to the evaluation of the human resource flexibility modes face the internal and external firm's problems. The overall FAHP approach can be summarized as follows [1]: a. Construct fuzzy pairwise comparison matrices; the decision-maker needs relative fuzzy values of decision criteria and alternatives based on each criteria. b. Solve fuzzy eigenvalues for each matrix; the eigenvector of the matrix is the relative importance of the alternatives and criteria. c. Determine the total weights The triangular fuzzy number and the linguistic variable are the two main concepts used in this paper to assess the preference ratings of linguistic variables, 'importance' and appropriateness'. In order to assess the relative importance of various criteria, an assumed weighting set W =tyeryLow, Low, Medium, High, Very High) has been developed. To evaluate the appropriateness of the alternatives versus various criteria, the decision makers can employ the linguistic rating set S = {Very Poor, Poor, Fair, Good, Very Good}. The triangle fuzzy conversion scale is shown in Table 1. Assume that X = |x;,x ...,x ) is an object set, and u = \i.,u2,...,u ) is a goal set. According to the method of Chang's [2] fuzzy extent analysis, each object is taken and extent analysis is performed for each goal respectively.
377 Table 1. The triangular fuzzy conversion scale Triangular Fuzzy Numbers
Linguistic Values Very Low (VL) Very Poor (VP)
m
Low(L) Poor(P)
U>)
Exactly Equal
(U.1)
Medium (M) Fair(F) High(H) Good(G) Very High (VH) Very Good (VG)
(U5) {3,5,5)
Therefore, m extent analysis values for each object can be obtained, with the following signs: Mi M2 .,Mg.i = l,2 nwhere all the M{ (j = 1,2 m) are triangular fuzzy numbers representing the performance of the object x with regard to each goal „ . By using fuzzy synthetic extent analysis, the value of fuzzy synthetic extent with respect to the i* object x.(i = 1,2,
,n) that
represents the overall performance of the object across all goals can be determined by m
;
j=l
8
,-
i
(3)
I S Mig
i
The degree of possibility of
n m
i=lj=l
M
> M is defined as, (4)
^ 2 ^ ) = sup min\ fiM (x),MM (j) xi.y
and can be equivalently expressed as follows: When a pair (x,y) exists such that x>y and juu/x) = HU2(y), then we have v(Mx>M1)=\- Since W ; and M2 are convex fuzzy numbers we have that V{M2
>MJ)=
V{M. >M2)=l if m >m ,
hgt[M]I M2)= nM (d),
(5)
where d is the ordinate of the highest intersection point D between jUM and HM . When M] =(//,m/M/) and M 2=\f2,m2,u2),
the ordinate ofD is given by :
378 v{M2>_M^hg{Mx
n
*J
=
p _ i L ^ ^
(6)
To compare M\ and M2, we need both the values of V(M, > M2) and V\M . >M,). The degree possibility for a convex fuzzy number to be greater than
k
v(M>MrM2
convex
fuzzy
numbers
Mk)= V\M > M 1}nnd{M > M 2)md
Assume that, /(A^mmv^zsJ
M.(i = 1,2,...,k)can be
and[MZMk)\ = minV[M >M . ) / = 1,2
for k = l,2,...,n;k*i.
,
7-
defined
by k•
Then the weight vector is
given by, w' = (d (A)d'(A)...,d'(A J where A.(i = l,2
n) are n elements. Via
normalization, the normalized weight vectors are, w = [d{A\d[A\..,d\A f where Wis a non fuzzy number. 3. Human Resource Flexibility 3.1.
WhyHRF
Traditional competitive mechanisms have become less effective as competitors meet or copy each other's corporate initiatives [3]. In response, firms constantly search for newer sources of competitive advantage, one of the most important being human resource management (HRM) [4]. Managers have long understood that the structure of work—for example, the duration of the employment contract, die number of hours typically worked, or the method of compensation—affects employees' wages, opportunities for promotion, the likelihood of unemployment, and other labor market outcomes. In recent years, this managers of labor markets have turned their attention to parttime work, on-call work, independent contracting, and other "nonstandard" forms of employment, meaning those in which work is not done "on a fixed schedule—usually full-time—at the employer's place of business, under the employer's control, and with the mutual expectation of continued employment" [5]. Academic response to these increasingly fashionable ways of organizing work has largely been critical, and with good reason: deviations from "standard" employment relationships are often associated with lower hourly wages, fewer benefits, greater risk of unemployment, fewer chances for promotion, and reduced wage growth ([6]; [7]; [8]; [5]).
379 3.2. Human-Resource-based Flexible Criteria 3.2.1. Human Nature (CI). The human resource's qualifications (CI 1) and the relationship type (contract) (CI2) are the mainly criteria which constitute human nature. The human resource's qualifications can be explain by education (CI 11), skills (CI 12), intelligence (CI 13) and motivation (CI 14). The human-firm relationship consists in a variety of flexible staffing arrangements. When the adoption of these different internal and external staffing arrangements is driven by a firm's uncertainty in its demand for labor we refer to them as contingent work arrangements, using the definition in Polivka [9]: "Any job in which an individual does not have an explicit or implicit contract for long-term employment or one in which the minimum hours worked can vary in a nonsystematic manner". We assume that the employer may draw upon three sources of labor: fixed (C121), contingent (C122) and statue-related (C123) sources like part-time workers and temporary contracts to minimize costs of labor and backlogged work. The decision whether to take on temporary workers in lieu of hiring permanent employees is a decision that involves significant risk. 3.2.2. Organization Nature (C2). There are otiier tools that can provide flexibility within organizations: organization structure (C21) and operation organization (C22). Organizations can be structured in two possible ways: in matrix (C211) or in hierarchical (C212). We can find many possibilities in operation organization like overtime (C221), annualizing working hours (C222), equip-working (C223), multi-skilled workforce, variations in the distribution of working time, shift work. By using annualizing working hours (AH), costs due to a lack of capacity can be diminished and, in some cases, eliminated. However, AH often implies a worsening of the staffs working conditions and the need to solve a complicated problem in planning working time. 4. Application of fuzzy AHP (FAHP). 12 evaluation criteria for the hierarchical structure were used in this study. The aim of the evaluation is to construct the better flexibility modes portfolio for deal whit three problems sets, two external and one internal: PI (needs for volume flexibility), P2 (needs for mixte flexibility) and P3 (needs for process flexibility). The fuzzy evaluation matrix relevant to the goal is given in Table 2.
380 Table 2. The fiizzy evaluation matrix M with respect to the goal Qoal
Human
Organizational
Human(Ci)
(1,1,1)
(1/3,1,3)
Organization^
(1/3,1,3)
(1,1,1)
In a similar way, we can construct the fuzzy pairwise (evaluation) comparison matrices for all criteria among all the elements/criteria and subcriteria in the dimensions of the hierarchy system. Via pairwise comparison, the fuzzy evaluation matrix M, which is relevant to the goal, Mi and M2, matrices with respect to Human resource and Organizational perspective respectively, are constructed (See Tables 3 and 4). Table 3. The fuzzy evaluation matrix of sub-criteria, Mi, with respect to Human resource Human resource Qualification(C„)
Qualification (1,1,1)
Contract style (1,3,5)
Contract style (C,2)
(1/5,1/3,1)
(1,1,1)
Table 4. The fuzzy evaluation matrix of sub-criteria, M2, with respect to Organizational Organization
Organization
Operation
The procedure for determining the evaluation criteria weights by FAHP can be summarized as follows: First step: Table 2 (matrix M) gives the fuzzy comparison data of the subcriteria of goal. Construct pairwise comparison matrices by using formula (3), we obtain SCj =(1.3,2,4)®(I I ^
= (0.16,0.5,1.54)
S^ = (l.3, 2 , 4 ) ® ^ , - , — J = (0.16,0.5,1.54)
Using formulas (4) and (5) y(sr is ) = -, ™-)M , = ,.00 c L cn n) (0.5-1.54)-(0.5-0.16)
v[sr (, cn
ZSr ] = 7 °- 1 '-!- 5 4 , = 1.00 c n J (O.S-1.54)-(O.S-0.16)
The normalized weight vector from Table 2 is calculated as, WM = (0.50,0.50)T • In a similar way, according to hierarchical structure, we can obtain the normalized weight vectors for all criteria and sub-criteria, (omitted).
381 Second step: At the second level of the decision procedure, the committee compares alternatives P, P and P under each criteria separetely. The normalized weight vectors of alternatives with respect to sub-criteria are calculated and shown in Table 5. Table 5. The normalized weight vectors of alternatives Criteria Cm C\n Cm Cll4 Cl21 Cl22 Cl23 C211 C212 C221 C222 C223
Pi 0.182 0.333 0.274 0.333 0.182 0.670 0.420 0.274 0.415 0.908 0.415 0.415
Pi 0.409 0.333 0.311 0.333 0.409 0.179 0.315 0.415 0.274 0.046 0.311 0.311
Pi 0.409 0.333 0.415 0.333 0.409 0.152 0.265 0.311 0.311 0.046 0.274 0.274
The overtime tool is used to reduce the needs of the volume flexibility. The requirement of volume flexibility can be increased also with using contingent labor. The matrix structure tool, educated and experienced workers are the main factors to deal with the mixte flexibility problem. For the process flexibility problem the intelligent, educated and experienced labor forces are the most important factors. Finally, adding the weights per alternative multiplied by the weights of the corresponding criteria (Table 6) a final score is obtained for each alternative. Table 6. Main criteria of the goal and final scores
Weight Pi P2
Ps
c,
C2
Alt. vector
0.5 0.326 0.332 0.342
0.5 0.421 0.305 0.274
0.374 0.318 0.308
priority
5. Conclusion In this paper, the complex, multi-criteria nature of appropriate Knowledge Based Human Resource Flexibility Management System decision has been brought out. A multi-criteria methodology, called the fuzzy AHP, has been suggested in this paper for the purpose of considering the opinions of different managers in order to solve the problem of appropriate flexibility management style. It is well known that the modeling of complex human problems into purely qualitative terms limit the effectiveness of decisions. The decision of appropriate knowledge
382 based decision or evaluation flexibility management is very often, vague and uncertain. To address these concerns, the concepts of fuzzy numbers and linguistic variables are used to evaluate the human factors. An example of appropriate flexibility management style selection is presented to illustrate the proposed framework. The effectiveness of human resource factors are depended to their qualifications. Also, the subcriteria contract type is useful to deal with the quantitative variation of demands. The organizational nature is the most important criteria for the flexibility. When the volume and mixte flexibility problems appear, the organizational structure factors are used, and the volume flexibility need is achieved with operation organization tools. The proposed decision algorithm cannot only manipulate the conventional precision-based (non-fuzzy) problem but also help decision-makers to make suitable decision under fuzzy environment. Therefore, by conducting fuzzy or non-fuzzy assessments, the decision-makers can obtain the appropriate portfolio of flexible factors to resolve specific firm's problems. References 1. Triantophyllou, E., and C.T. Lin (1996). Development and Evaluation of Five Fuzzy Multiattribute Decision-Making Methods, /. J. ofApprox. Reasoning, 14, 281-310. 2. Chang, D.Y., (1996). Applications of the Extent Analysis Method on Fuzzy AHP. European Journal of Operational Research, 95, 649-655. 3. Ulrich, D., (1987). Organizational capability as a competitive advantage: human resource professionals as strategic partners. Human Res. Planning 10 4, 169-184. 4. Schuler, R.S., I.C. MacMillan (1984). Gaining competitive advantage through human resource management practices. Human Resource Management 23 3, 241-255. 5. Kalleberg, A.L., C.F. Epstein, B. Reskin, K. Hudson, (2000). Bad jobs in America: standard and nonstandard employment relations and job quality in the United States. American Sociological Review, 65 (2), 256-278. 6. Blank, R., (1994). Social Protection versus Economic Flexibility. University of Chicago Press, Chicago. 7. Ferber, M., and J. Waldfogel, (1998). The Long-term consequences of nontraditional employment. Monthly Labor Review, 3-12. 8. Kalleberg, A.L., (2000). Nonstandard employment relations: part-time, temporary, and contract work. Annual Review ofSociology, 26, 341-394. 9. Polivka, A.E., (1989). On the definition of contingent work. Mon.Lab. Rev., 112, 9/16.10. Corominas, A., A. Lusa and R. Pastor, (2004). Characteristics and classification of the annualised working hours planning problems, /. J. of Services Tech. and Man., 5, 435-447.
FUZZY EVALUATION OF ON THE JOB TRAINING ALTERNATIVES IN INDUSTRIAL COMPANIES GULGUN KAYAKUTLU T Istanbul Technical University, Department of Industrial Engineering, 34367 Macka Istanbul, Turkey GULCIN BUYUKOZKAN Industrial Engineering Department, Galatasaray University, 34357, Ortakoy, Istanbul, Turkey BURCIN CAN METIN, SAMI ERCAN Institute of Analytical Sciences, Istanbul Commerce University, 34378, Eminonii, Istanbul, Turkey This paper presents a combined fuzzy analytic hierarchy process (AHP) and fuzzy goal programming (GP) to determine the preferred compromise solution for Enterprise Resource Planning (ERP) training package selection in terms of value creation by multiple objectives. The problem is formulated to include six primary goals: maximize financial benefits, maximize effective software utilisation, maximize employee satisfaction, minimize cost, minimize training duration and minimize risk of operation. Fuzzy AHP is used to specify judgments about the relative importance of each goal. A case study performed for a small manufacturing company running ERP software training is included to demonstrate the effectiveness of the proposed model.
1. Introduction Enterprises consider creating a learning organization essential for long time survival. Since the initiation of the concept by Argyrs and Schon [1], and prominence of The Fifth Discipline by Senge [2], improving compatibility in learning is observed to depend on perceived need, learning mechanisms, learning processes and resource allocation. [3]. Enterprise Resource Planning (ERP) training has become a critical part of organizational learning which presents benefits of integrated information, if successfully implemented. Most of the small companies are not in rush to implement ERP because of the inadequacy for the rigorous training and development requirements besides budget [4]. For this reason, this paper suggests an analytic framework for * contact author: kavakutlu(aiitu.edu.tr.
383
effective ERP training package selection. Since ERP training package selection problem is multi objective in nature, the goal programming (GP) approach may be applicable. In addition, ERP training package implementation decision faces many constraints, some of these are related to organizations' internal policy and externally imposed system requirements. In such decision making situations, high degree of fuzziness and uncertainties are involved in the data set. Fuzzy set theory [5] provides a framework for handling the uncertainties of this type. For this reason, the ERP training packages selection problem has been formulated in this study as a fuzzy GP problem and fuzzy analytic hierarchy process (AHP) is also used to specify judgments about the relative importance of each goal in terms of its contribution to the achievement of the overall goal [6]. The concepts, the model, the case study and conclusions will be handled in the following sections. 2. Fuzzy goal programming GP is a powerful multi criteria decision making introduced by Charnes and Cooper [7], Difficulty of defining priorities for goals arises by subjectivity, that can be overcome by fuzzy set theory defined by Narashimhan [8], improved by Hannan [9], Ignizio [10], Tiwari et al. [11], and Mohammed [12]. Formulation of fuzzy GP shows an exception of assigning aspiration levels allowing integration of different decision makers' views. Zimmerman [13] defined two sets of membership functions: for the goals and the constraints. The maximum of discrepancies on expected goal values are to be minimized, hence, Flavell's solution [14] leads to expression of K fuzzy goals and L fuzzy constraints as in (1), where dk are the deviations in goal k and dj are the deviation in constraint / [15]. max X s.t X<\-^Zk'dCkX)
A,l-1^)
* = 1,2 / = 1>2,...,L
K
CD
d
l
Q<X<\ and X>0
3. ERP training package selection model development The selection of ERP training package includes five main steps: identifying the decision variables, describing the goals, formulating the objective function, weighting the importance of the goals by fuzzy AHP and solving the fuzzy GP using fuzzy objective function and the constraints. The first three steps, which
are the model development, are summarized as follows. 3.1. Decision variables Decision variables are xit with n alternatives /= 1, 2, .., «,; x, = 1 if training package i is selected, and zero otherwise. The coefficients are as follows: 6, : Benefits expected from training package i ; c, : Cost associated with training package ;' ; r, : Risk associated with training package i ; est : Employee satisfaction expected from thr training package i; /;: Estimated completion time for training package /'; oc,: Software operation capability proposed by package /. Positive and negative deviations are associated with each goal as in benefit represented by db+ and db. 3.2. Goals Maximize Financial Benefits (FB): Total benefit expected from the training package is to be maximized, to the expected value BEN. Y.biXi+dl -d+ =BEN '=1 (2) Maximize Operation Capability (OC): Contributions of the software training in operations by improving the employee skills is to be maximised to CAPAB. ZocjXj +d~c -dgC = CAPAB M
(3)
Maximize Employee Satisfaction (ES): Maximum satisfaction of the employees anticipated as PERF, will increase the motivation and efficiency. n Z eSjXj + des - des = PERF (4)
i=\
Minimize Risk (R): The risk representing the likelihood of failure in learning or teaching activities is to be minimised. Yet, the real objective is to maximize benefits in the presence of risks; thus, the risk-related objective function includes benefits in addition to risks with a target BEN* denoting the expected benefit given in the presence of risk. Z nbjXi + d; -d, = BENR <=> (5) Minimize Cost (C): We include the costs of resources in our model as an objective to be minimized to fit the budgeted expenses denoted by BDG. ^CjXj+d;
z=i
-d+
=BDG
(6)
Minimize Training Duration (T): Duration of training is to be minimised with a minimum expectation of 2 days, the reality of ERP training. "
-
"EtjXi +dt
+
-dt
=2
'=•
(7) 3.3. The objective function The objective function will attempt to minimize the sum of deviations associated with the constraints in the model as follows: Mm
Z = Pb{d^)+P0C{doc)+Pes{d~esYPr{dt)+Pc{dcYPt[dt)
(8)
4. A case study A small size steel cabinet manufacturer is chosen while it was in process of selecting the ERP software based on training packages proposed by four vendors. Priority of goals is determined by using Chang's fuzzy AHP method [16] with the improvement proposed by Zhu et al. [17] due to effectiveness and simplicity. The goals' importance degrees are given in Table 1. Levels of data sets using fuzzy GP are shown in Table 2 where employee satisfaction factor is subjectively scored on a scale of 0-100, the risk factor subjectively scored on a scale of 0-10, and OC is in a range of 50-90 % with an aspiration level at 80%. Table 1. The relative importance of the goals Training selection objectives
Weights
Financial Benefits (Wb)
0.28
Operation Capability (w„c)
0.22
Training duration (wt)
0.15
Risk(wr)
0.10
Cost(wc)
0.15
Employee Satisfaction (wes)
0.10
Table 2. Data set for fuzzy GP model
Alt. 1 Alt. 2 Alt. 3 Alt. 4
FB ($ 000 /year) AL=135 LT=100 AL=140 LT=120 AL=125 LT=75 AL=150 LT=100
OC
ES
R
AL=80 LT=50 AL=80 LT=50 AL=80 LT=50 AL=80 LT=50
AL=100 LT=90 AL=100 LT=75 AL=100 LT=50 AL=100 LT=80
AL=1 UT=3 AL=2 UT=4 AL=3 UT=5 AL=2 UT=6
C ($ per day per person) AL=1000 UT=1750 AL=1250 UT=1500 AL=750 UT=1250 AL=500 UT=1000
AL is Aspiration Level; LT is Lower Tolerance and UT is Upper Tolerance.
T (in days) AL=28 UT=32 AL=16 UT=35 AL=24 UT=45 AL=12 UT=25
By using the obtained objectives' weights and data set given in Table 2, we formulate the ERP training package selection problem as follows1 min Z = 0.28 ( ^ " + dbj ~ + d^ ~ + dbf~)
+ 0.22(doc>" + dOCi " + 0Cj ~ + d^ ")
+ 0-\o(deSl-+deSi- + des-+des-)+
OAslfi^+d^+d^+d^) + o.io(^++dr;+dr;+dr+)
+Q.\s{dc;+dc;+dc;+dc;)
(3.86-0.028x1 ,)+rf 6 ~ -db+
=1
s.t. M2\.
{2.67-0M3x2i)+d
~-dOCi+=l '
'
+
(l0-0.1x 3 1 )+rf < a i --rf
=1 (9)
(0.5x 4 1 -0.5)+c? r ~-J -"41 •
'
+
=1
r
' +
(0.002x5l-2)+dc~-dc
=1
-"51 •
(0.25x61 -l)+d, -"61 • d
b~>db+>doc i
i
'
>doc >des~>des + i
J
i
i
+
~-d,
=2
'
>di~~,di+,dc~,dc+,dr~~,dr+,xi/>0 i
J
i
J
J
i
J
Following the proposed procedure, we obtained the results of the fuzzy GP model using the software LINDO that all the deviations are null. Based on the results, the ERP training package 4 was selected and the decision makers were satisfied by this selection. 5. Concluding remarks Due to conflicting nature of the multiple objectives and vagueness in the information related to the parameters of the decision variables, the deterministic techniques are unsuitable to obtain an effective solution. The combined fuzzy AHP and fuzzy GP approach formulated in this paper is then extremely useful for solving the ERP training package selection problem when the goals are not clearly stated. The formulation can effectively handle the vagueness and imprecision in the statement of the objectives. The proposed formulation has also the advantages that any commercially available software such as LINDO may be used for solving it. FGP model constraints are given here only for j=l because of the page limits.
References 1.
C. Argyris and D. Schon, Organisational Learning: A Theory of Action Perspective, Addison Wesley, N.Y, N.Y. (1978). 2. P. Senge, The Fifth Discipline: The Art and Practice of Organisational Learning, Doubleday, New York, N.Y. (1990). 3. D. Whittington and T. Dewar, A strategic approach to organizational Learning, Industrial and Commercial Training, 36 (7), 265-268 (2004). 4. J.R. Muscatello, M.H. Small and I.J. Chen, Implementing enterprise resource planning (ERP) systems in small and midsize manufacturing firms, International Journal of Operations and Production Management, 23(8), 850-871 (2003). 5. L.A. Zadeh, Fuzzy Sets, Information and Control, 8, 338-353 (1965). 6. C. Kahraman, and G. BUyukozkan, A Combined Fuzzy AHP and Fuzzy Goal Programming Approach For Effective Six-Sigma Project Selection, Proceedings of UthlFSA World Congress, Volume III, 28-31 July 2005, Beijing, China. 7. A. Charnes and W.W. Cooper, Management Models and Industrial Applications of Linear Programming, John Wiley and Sons, New York, (1961). 8. R. Narasimhan, Goal Programming in A Fuzzy Environment, Decision Science, 13, 331-336(1982). 9. E.L. Hannan, On fuzzy goal programming, Decision Sciences, 12,522-531 (1981). 10. J.P. Ignizio, On the rediscovery of fuzzy goal programming, Decision Sciences, 13, 331-336(1982). 11. R.N. Tiwari, S.. Dharmar and J.R. Rao, Fuzzy goal programming- an additive model, Fuzzy Sets and Systems, 24,27-34 (1987). 12. R.H. Mohammed, The relationship between goal programming and fuzzy programming, Fuzzy Sets and Systems, 89,215-222 (1997). 13. H.J. Zimmerman, Fuzzy programming and linear programming with several objective functions, Fuzzy Sets and Systems, 1, 45-55 (1978). 14. R.B. Flavell, A new goal programming formulation, Omega, The International Journal of Management Science, 4, 731-732 (1976). 15. C.-C. Lin, A weighted max-min model for fuzzy goal programming, Fuzzy Sets and Systems, 142,407-420 (2004). 16. D-Y. Chang, Applications of the extent analysis method on fuzzy AHP, European Journal of Operational Research, 95, 649-655 (1996). 17. K-J. Zhu, Y. Jing and D-Y. Chang, A discussion on extent analysis method and applications of fuzzy AHP, European Journal of Operational Research, 116, 450456(1999).
A STUDY OF FUZZY ANALYTIC HIERARCHY PROCESS: AN APPLICATION IN MEDIA SECTOR*
MELISA OZYOL Galatasaray University, Industrial Engineering Department, Ciragan Cad. No;36 Ortakoy 34357 Besiktas /Istanbul TURKEY Y. ESRA ALBAYRAK Galatasaray University, Industrial Engineering Department, Ciragan Cad. No:36 Ortakoy 34357 Besiktas /Istanbul TURKEY The media sector is a sector which encompasses the creation, modification, transfer and distribution of media content for the purpose of mass consumption; therefore it is a very active and, when managed properly, a very effective sector. In this paper the three management methods Mbl, MbO, and MbV are evaluated and the most adequate management method for the media company A* is determined using Fuzzy Analytic Hierarchy Process (FAHP). This study shows the power of FAHP in capturing experts' knowledge.
1. Introduction To be effective and increase its organizational competitiveness, a media company should be able to maximize the performance of its most important assets, its employees. And to do so, the company should guarantee the satisfaction and the engagement of the employees. One of the most important factors which influence the engagement of the employees is the method of management of the company. In this paper three management methods (Management by Instructions, Management by Objectives, and Management by Values) will be evaluated using the extent method of Chang.
This research has been financially supported by Galatasaray University Research Fund. Company A, which's name will not be given for reasons of confidentiality, is an important Turkish media company.
389
2. Management Methods There are three methods of management selected for evaluation: Management by Instructions (Mbl), Management by Objectives (MbO), and Management by Values (MbV). The Mbl is the traditional model described by F W. Taylor [1] at the beginning of the 20th century, and it is based on the hierarchically arranged control of the employees. In Mbl, the communication is fast, information is specialized, and there is recourse to specialists. MbO, conceptualized by Peter Drucker [2], is a management method that clearly indicates to employees the results expected from them, it facilitates planning since the managers define their objectives and fix the expiries. MbV makes it possible to direct an organization by using its intangible values the most effectively possible. MbV can absorb the organizational complexity caused by the need of adaptation to changes, and can ease the achievement of a strategic vision in a company. 3. Evaluation Criteria The seven evaluation criteria that will be used are taken from the work of Albayrak and Erensal [3]. These criteria can be gathered in three principal criteria: conditional criteria, managerial criteria and individual criteria. 3.1.
Conditional Criteria
One of the two conditional criteria is the Physical Conditions. The complexity of work and individual coping behavior must be taken into consideration in setting up the workplace. And the other conditional criterion is the Corporate Conditions. This criterion examines how the work force is enabled to develop and use its full potential, aligned with the objectives of the company. 3.2. Managerial Criteria Leadership is one of the three managerial criteria and it examines the personal leadership of the senior leaders and their involvement in creating and sustaining values, company directions, performance expectations, and a system of leadership that promotes excellence of performance for human resources. Corporate Culture, which shapes up the way how people are used to think, act, make decisions and participate in an organization, must maintain effective Environmental fluctuations may have a negative effect on the number of
391 mechanisms focused on developing the personal and professional potential of each and every member. Therefore it is a managerial criterion. The last managerial criterion is Participation. The participation of the employees is becoming more essential in the companies, especially when a problem or a decision requires multiple skills and knowledge in various fields the participation becomes more desirable. 3.3. Individual Criteria The first one of the two individual criteria is Capability. Knowledge, skills and abilities of the individuals in an organization constitute the capability of the human performance. The skills and competency of individuals in a company are generally improved by educating and training the human after having designed a system to support human capabilities and limitations. The other individual criterion is Attitude. Development of a workforce with positive work attitudes, including loyalty to the organization, pride in work, a focus on common organizational goals and the ability to work with employees from other departments, facilitates team work and flexibility [4]. 4. Choice of Methodology Analytical Hierarchy Process (AHP) is often used in solving multi-criteria analysis problems involving qualitative data. Where there are a limited number of choices but each has a number of attributes AHP can be used to formalize decision-making. In AHP, instead of exact numbers, expressions like "more important than" are used to show the preferences of decision. However this method may fail in reflecting the uncertainty and imprecision of human thinking style and therefore it is often criticized. On the other hand, fuzzy logic offers a more natural way of dealing with these preferences instead of exact values [5]. The fuzzy set theory, introduced by Zadeh [6], deals with uncertainty due to imprecision and vagueness and it enables decision makers to give interval judgments instead of fixed value judgments, which they think is more confident. The evaluation of the three methods of management according to seven criteria and the selection of the best is a multi-criteria decision problem. The Fuzzy Analytical Hierarchy Process (FAHP) is abundantly used to deal with the problems including the multiple criteria evaluation or the selection of alternative. It is adequate to use the FAHP to determine the weights of the criteria according to subjective judgments of each expert. As the AHP cannot reflect the human thinking style in capturing experts' knowledge it is going to be developed a model, as shown in Figure 1, based on FAHP which would determine the total weights for the three different methods of management while
examining how, according to certain criteria, the properties of each method of management affect the human performance in a company. There are various ways to treat FAHP [7]. In this application, the extent FAHP (EFAHP) will be used. Level 1: Goal Level 2: Criteria
Level 3: Sub-criteria
Human Performance
/ Conditional Criteria •Physical Conditions •Corporate Conditions
1 Managerial Criteria -•Leadership
Capacity
-•Participation
Attitude
—•Corporate Culture
T Level 4: Alternatives
Management bv Instructions
\ Individual Criteria
Management bv Objectives
I Management bv Values
Figure 1. Hierarchical Structure
5. Application In this part the three methods of management will be evaluated by a group of five experts, two from service sector, and three others from production sector, and the best management method for Company A will be determined. Some abbreviations used in the application are: EH: Excellence of the Human Performance, C: Conditional criteria, L: Leadership, M: Managerial criteria, Pa: Participation, I: Individual criteria, Cc: Corporate Culture, Ph: Physical Conditions, Ca: Capacity, Co: Corporate Conditions, At: Attitude. 5.1. Application of FAHP with the Extent Method (EFAHP) The group of five experts is required to evaluate the alternatives according to the selected criteria and is required to present a common decision. The procedure to determine the weights of the evaluation criteria by the FAHP using the extent method can be summarized in two stages. In the first stage, the pair-wise comparison matrices between all the criteria and the sub-criteria of the hierarchical system are constructed.
Table 1. The fiizzy comparison matnx for the goal EH EH
C
M
I
C
(1,1,1) (1,3,5) (3,5,5)
(1/5, 1/3, 1)
(1/5, 1/5, 1/3)
(1,1,1)
(1/3,1,3)
(1/3, 1,3)
0,1,1)
M I
After having built the pair-wise comparison matrix for the goal (Table 1) the vectors of value synthetic extent S are calculated: Sc = (1.40,1.53,2.33)® (.0492,.0739,. 1240) = (.069,. 113,.289) SM = (2.33,5,9)
VEH(S,>SU)
.115-.289 : .41 C-l 13 - .289) - (.369 -.115)
.069-1.116 = 1.32 (.369-1.116)-(.113-.069) .115-1.116 = = 1-17 (.517-1.116)-(.369-.115)
VEH(S, ZSC) =
.069-1.116 = 1.63 (.517-1.116)-(.113-.069)
.213-.289 16 (.113-.289)-(.517-.213)~' .213-1.116 = .86 fEH(Su>S,) " (.369-1.116)-(.517-.213)
VEH(SC^SI) =
Finally, the values d' and then the weight vector Ware obtained: c/'(Q=16 cT(M)=.86 => W ' E H =(.16, .86, 1.17)T rf'(I)=1.17 By standardizing W ' EH the weight vector of Table 1 in respect to the decision criteria C, M, and I is obtained: W EH = (.072, .392, .535)T In the second stage, the alternatives Mbl, MbO, and MbV are compared separately according to each criterion. After obtaining the W vector for each comparison matrix, the final evaluation matrix given in Table 2 is obtained.
Table 2. Final scores of the alternatives
C 072
Mbl MbO MbV
PH .5 .333 .333 .333
Co .5 .333 .333 .333
L .415 .222 .389 .389
EH M .392 Pa .311 .023 .313 .664
535 CC .274 .072 .392 .535
Ca .5 .274 .311 .415
At .5 .023 .313 .664
W total .150 .334 .514
6. Conclusion The goal of this work was to find the best management method for Company A to maximize the performance of its employees. According to the results of the application, to achieve this goal it is necessary to implement in the company a system of management which is a mixture of the three methods presented: Mbl, MbO, and MbV. This mixture can be considered as an ideal method of management to reach the excellence of human performance in Company A. References 1. F.W. Taylor, The Principles of Scientific Management. Harper & Row, London (1947, original edition, 1911). 2. P. Drucker, The Principles of Management. NY HarperCollins Publishers, New York (1954). 3. E. Albayrak and Y.C. Erensal. "Using analytic hierarchy process (AHP) to improve human performance: An application of multiple criteria decision making problem". Journal of Intelligent Manufacturing, 15, 491-503 (2004). 4. V.L. Huber and K.A. Brown, "Human resource issues in cellular manufacturing: Associotechnical analysis". Journal of Operations Research,\\, 138-159(1991). 5. T.J. Ross, Fuzzy Logic with Engineering Applications. (Internationale Edition). McGraw-Hill, NY (1995). 6. L.A. Zadeh, "Fuzzy Sets". Information and Control, 8 (3), 338-353 (1965). 7. C.-L. Hwang and S.-J. Chen, in collaboration with F.P. Hwang. Fuzzy Attribute Decision Making: Methods and Applications. Springer-Verlag, Berlin, (1992).
PRIORITIZATION OF RELATIONAL CAPITAL MEASUREMENT INDICATORS USING FUZZY AHP AHMET BESKESE Department of Industrial Engineering, Bahcesehir University, 34538, Istanbul, Turkey
Bahcesehir,
F. TUNC BOZBURA Department of Industrial Engineering, Bahcesehir University, 34538, Istanbul, Turkey
Bahcesehir,
Relational capital (RC) is a sub-dimension of the intellectual capital which is the sum of all assets that arrange and manage the firm's relations with the environment. It contains the relations with outside stakeholders (te. customers, shareholders, suppliers and rivals, the state, governmental institutions and society). Although the most important component of RC is customer relations, it is not the only one to be taken into consideration. Measuring the RC is related to how the environment perceives the firm. To control and manage this perception, the companies must measure it first. This study aims at defining a methodology to improve the quality of prioritization of RC measurement indicators under uncertain conditions. To do so, a methodology based on the extent fuzzy analytic hierarchy process (AHP) is applied. Within the model, main attributes, their sub-attributes and measurement indicators are defined. To define the priority of each indicator, preferences of experts are gathered using a pair-wise comparison based questionnaire.
1. Introduction Today, IC is widely recognized as the critical source of true and sustainable competitive advantage [1]. Knowledge is the basis of IC and is therefore at the heart of organizational capabilities. Successfully utilizing that knowledge contributes to the progress of society [2], Intellectual capital (IC) is the pursuit of effective use of knowledge (thefinishedproduct) as opposed to information (the raw material) [3]. See Fig. 1 [4] for an illustrative definition of IC. 395
396
INTELLECTUAL CAPITAL
T
HUMAN CAPITAL
ir
ORGANIZATIONAL CAPITAL
ir
RELATIONAL CAPITAL
Individual-level knowledge
Mission-vision
Customers
Competence
Strategical values
Customer's loyalty
Leadership ability
Working systems
Market
Risk-taking and problem
Culture
Shareholders
Solving capabilities
Management system
Suppliers
Education
Use of knowledge
Official institutions
Experience
Databases
Society
Fig. 1. Components of Intellectual Capital
Knowledge of environmental relationship defines the relational capital in the intellectual capital. The relational capital contains the relations with customers, shareholders, suppliers and rivals, the market, the state, governmental institutions and society. Although the most important component of RC is customer relations, it is not the only one to be taken into consideration. Measuring the RC is related to how the environment perceives the firm. The relational capital is the reflection of the firm. It is a knowledge database which has brands, customer loyalty scales, and the image in society, suppliers and customer feedback systems. Mc Kenna [5] states that, there are three steps to establish relations with the environment: a) To understand the market, b) To move with it, c) To establish relations. In the value chain, there is the obligation that the firms should establish relations with all the sections from the customer to the supplier. Many researches show that, being market-focused has an effect on the profit rate of the company and on the increase of the market share [6]. The relational capital, defines the relations of the elements that are in the value chain with the firm. It is obvious that, the essential criteria of the relational capital are related to customer and market. Except these criteria, the stockholders that are important elements of the firm environment, suppliers and society should be defined in the context of relational capital. In a research held in Canada industry, features like the growth rate, sales rate to permanent customers, customer loyalty, customer satisfaction, customer complaint rate, and market share are defined as the criteria of RC [7]. In another
397 research regarding the intellectual capital of companies in Sweden, features like the rate of re-purchasing, and market capital index are defined [8]. 2. Methodology of the study The problem has a subjective and intangible nature where the Analytic Hierarchy Process (AHP) is usually considered the most appropriate method. In this paper, Fuzzy AHP is preferred in the prioritization of relational capital indicators since this method uses a hierarchical structure among goal, attributes and alternatives. Usage of pair-wise comparisons is another asset of this method that lets the generation of more precise information about the preferences of decision makers. AHP method which is developed by Saaty [9] is uses pair-wise comparisons of the elements of each hierarchy by means of a nominal scale. Then, comparisons are quantified to establish a comparison matrix, after which the eigenvector of the matrix is derived, signifying the comparative weights among various elements of a certain hierarchy. Finally, the eigenvalue is used to assess the strength of the consistency ratio of the comparative matrix and determine whether to accept the information. There are several fuzzy AHP methods explained in the literature. In this paper, we prefer Chang's extent analysis method [10, 11] since the steps of this approach are relatively easier than the other fuzzy AHP approaches and similar to the conventional AHP. 3. A Hierarchical Model for Prioritization of RC Measurement Indicators The goal of this study is prioritization of relational capital indicators. According to [12], RC refers to the organization's establishment, maintenance, and development of public relations matters, including the degree of customer, supplier, and strategic partner satisfaction, as well as the merger of value and customer loyalty. The flexibility of the external organizational links, ethics in relations and efficiency of relation-channels have been decided as main attributes of the system. Flexibility of the external organizational links (FE) are characterized by four sub-attributes: Knowledge flow from customer (KF), knowledge flow from suppliers (KS), relations between the company and its shareholders (RCS) and relations between the company and society (RS). Ethics in relations (ER) is characterized by two sub-attributes: Social responsibility (SR) and business ethics (BE). Efficiency of relation-channels (EC) is
398 characterized by two sub-attributes: Easy and quick access to knowledge (EK) and reliability of relation-channels (RCh). In this research, RC measurement indicators can be categorized in three main groups as the indicators related to the customers, to the market and to the other elements of environment [13]. These three main groups can be summarized with nine indicators: IND1: Customer satisfaction; IND2: Environmental consciousness; IND3: Emphasizing customer request; IND4: Customer loyalty; IND5: Preference in competition, IND6: Being the sponsor for the social activities; IND7: Market and customers to be understood by employee; IND8: Efficient relations with shareholders; IND9: Participating social activities that are not sponsored. Fig. 2 shows the hierarchical structure of criteria. A group of experts consisting of academics and professionals are asked to make pair-wise comparisons for main and sub-attributes, and indicators. A questionnaire is provided to get the evaluations. The overall results could be obtained by taking the geometric mean of individual evaluations. However, since the group of experts came up with a consensus by the help of the Delphi Method in this case, a single evaluation could be obtained to represent the group's opinion.
4. Conclusion In this paper, the authors proposed a Fuzzy AHP method to prioritization of relational capital measurement indicators under uncertain conditions. The model proposed in this study consists of three main attributes, eight sub-attributes, and nine indicators. The model is verbalized in a questionnaire form including pairwise comparisons. After analyzing the results, it is decided that indicator 4 (ie. Customer Loyalty) is the most important indicator for relational capital measurement. The sequence of the next two indicators according to their importance weights is as follows: IND1: Customer satisfaction, and IND2: Environmental consciousness. For further research, other fuzzy multi-criteria evaluation methods like fuzzy TOPSIS or fuzzy outranking methods can be used and the obtained results can be compared with the ones found in this paper.
Selection of the most efficient indicators
RCh
IND1
Fig. 2. Hierarchical structure of criteria
IND2
IND9
400
References 1.
B. Marr, G. Schiuma and A. Neely, International Journal of Business Performance Management 4, 279 (2002).
2.
A. Seetharaman, K.L.T. Low and A. S. Saravanan, Journal of Intellectual Capital 5, 522 (2004).
3.
N. Bontis, Management Decision 36,63(1998).
4.
F. T. Bozbura and A. Beskese, in Proceedings of the IIth International Fuzzy Systems Association World Congress (IFSA 2005), 1756 (2005).
5.
R. McKenna, The Regis Touch. Addison-Wesley, Massachusetts, USA (1986).
6.
J. C. Narver, and S. F. Slater, Journal of Marketing 54, 20 (1990).
7.
M. Miller, B. DuPont, V. Fera, R. Jeffrey, B. Mahon, B. Payer, and A. Starr, International Symposium: Measuring and Reporting Intellectual Capital, Amsterdam, Holland, June 9-10, (1999).
8.
U. Johanson, M. Martensson and M. Skoog, International Symposium: Measuring and Reporting Intellectual Capital, Amsterdam, Holland, June 910, (1999).
9.
T. L. Saaty, The analytic process: Planning, priority setting, resources allocation, McGraw-Hill, London, (1980).
10. D-Y. Chang, European Journal of Operational Research 95, 649 (1996). 11. D-Y. Chang, Optimization Techniques and Applications, Vol. 1, World Scientific, Singapore, 352 (1992). 12. P. Y. Chu, Y. L. Lin, H. Hsiung and T. Y. Liu, Technological Forecasting & Social Change, Forthcoming. 13. F. T. Bozbura, The Learning Organization: An International Journal 11, 357 (2004).
MULTICRITERIA MAP OVERLAY IN GEOSPATIAL INFORMATION SYSTEM VIA INTUITIONISTIC FUZZY AHP METHOD TOLU SILAVI MOHAMMAD REZA MALEK MAHMOUD REZA DELAVAR Center of Excellence in Geomatics Eng. and Disaster Management, Dept. of Surveying and Geomatics Eng., Engineering Faculty, University of Tehran, Tehran, Iran. tsilavi@ut. ac. ir, malek@ncc. neda. net. ir, mdelavar@ut. ac. ir
Decision making within the real-world inevitably includes the consideration of evidence based on several criteria, rather than a preferred single criterion. Solving a multi-criteria decision problem offers the decision maker a recommendation, in terms of the best decision alternatives. Within the framework of this article we are attempt to reach a new method in making the comparison matrices in AHP approach to consider some aspects of uncertainties in process of multi criteria decision making. To do this, rules and logic of intuitionistic fuzzy is applied. We provide numerical illustrations of using four major criteria, namely population, age of construction, type of construction and mean number of floors in a census block of an urban region. The result is earthquake vulnerability map which presented with two raster layers, where the degree of membership and the degree of non-membership are the values of each layer so the regions with high membership degree in first map and low membership degree in second map can be determined as high vulnerable regions with more reliability degree.
1. Introduction Geospatial information system (GIS) is a decision support system involving the integration of spatially referenced data in a problem solving environment [4]. One of the basic classes of operations for spatial analysis is attribute operations which are operations on one or more attributes of multiple entities that overlap in space [3]. An overlay procedure generates a new layer (output layer) as a function of two or more input layers. Specifically, the attribute value assigned to every location (or set of locations) on the output layer is a function of the independent values associated with that location in the input layers. Overlay operations may involve any combination of points, lines, areas or pixels [10]. The ultimate aim of GIS is to provide support for making spatial decision so the 401
402 GIS capabilities for supporting spatial decisions can be analyzed in that context of the decision-making process. Simon suggests that any decision making process can be structured into three major phases: intelligence (is there a problem or an opportunity for change?), design (what are the alternatives?) and choice. One of the stages of design is criteria weighting that at this stage, the decision maker's preferences with respect to the evaluation criteria are incorporated into the decision model. One of the most famous and usable approach for doing this stage is the analytical hierarchy process (AHP). The AHP elicits a corresponding priority vector interpreting the preferred information from the decision-maker, based on the pair wise comparison values of a set of objects. Levary and Wan [8] clearly state the need to consider uncertainty within AHP: "Since in most cases it is unrealistic to expect that the decision maker will have either complete information regarding all aspects of the decision making problem or full understanding of the problem, a degree of uncertainty will be associated with some or all of the pair-wise comparisons" [8]. One of the most important source of uncertainty involved in this area, is contradictory of experts' opinion. That means, determining the preferences are imperfect. Using of intutionistic fuzzy (IF) method in such a case is reasonable because on the one hand it supports linguistic variables related to doubt and hesitancy and on the other hand, it could manage given contradictory reasons [2, 11]. Therefore, a multi-criteria decision making approach based on intuitionistic fuzzy AHP has been implemented in order to solve a map overlay problem in GIS to find earthquake vulnerability of an urban region with some simulated data. In this area, there are some spatial criteria which have effect on urban earthquake vulnerability and can be shown as a raster map layers [1]. In chapter 2 of this paper principles of AHP and fuzzy AHP have been stated, chapter 3 considers the intuitionistic fuzzy concepts as a tool for modeling some aspects of uncertainty in AHP, such as contradictory of decision maker's opinions. Chapter 4 is concerned on implementing the fuzzy AHP and presented intuitionistic fuzzy approach in chapter 3; in order to weighting the factors determine earthquake vulnerability. 2. Basic Principle of AHP and Fuzzy AHP The AHP, originally developed by Saaty [14, 15], is a widely applied multicriteria decision making tool which utilizes the concept of pair-wise comparisons to arrive at a scoring and rank ordering of the alternatives under consideration. The decision maker provides a subjective cardinal judgment
about the intensity of his preference for each alternative over the other alternatives under each of a number of criteria or properties [19]. In AHP, three important components, namely the aim, the criteria and the alternatives, are represented hierarchically. There are many approaches for extracting weights from this hierarchy, and all of them use a pair-wise comparison matrix filled in by the decision maker. As mentioned above, there are aspects of uncertainty in expressing preferences during pair-wise comparison. This paves the way for the incorporation of fuzzy logic into the AHP [6]. All the rules of traditional AHP can be used for fuzzy AHP, but the pair-wise comparisons must be expressed as fuzzy numbers, which are in most cases triangular. In order to calculate the weights from such a pair-wise comparison matrix, operations on triangular fuzzy numbers are necessary. The basic fuzzy arithmetic operations on triangular fuzzy numbers are as follows. Let $A=(l_a, m_a, u_a)$ and $B=(l_b, m_b, u_b)$ be two triangular fuzzy numbers; then
• $A+B=(l_a+l_b,\ m_a+m_b,\ u_a+u_b)$
• $A-B=(l_a-l_b,\ m_a-m_b,\ u_a-u_b)$
• $AB=(l_a l_b,\ m_a m_b,\ u_a u_b)$
• $A/B=(l_a/u_b,\ m_a/m_b,\ u_a/l_b)$
After obtaining the fuzzy performances, the ultimate aim is to obtain the final results in crisp form. Therefore, the fuzzy performance matrices are transformed into interval performance matrices using the α-cut concept: an α-cut yields an interval of values from a fuzzy number. For example, α=0.5 yields the interval $a_{0.5}=[0.3, 0.7]$ for a triangular fuzzy number whose start and end points are 0.1 and 0.9. The crisp performance matrix is then obtained by applying the optimism index λ over the interval performance set, as shown in equation (1), resulting in a crisp performance $C_\lambda$:
$$C_\lambda = \lambda\, p_{r\alpha} + (1-\lambda)\, p_{l\alpha} \qquad (1)$$
where $C_\lambda$ is the crisp performance of a fuzzy performance $p$ after extracting $[p_{l\alpha}, p_{r\alpha}]$ as one of its α-cuts. The optimism index λ is a number between 0 and 1; in most cases the values 0, 0.5 and 1 are used [12].

3. Uncertainty modeling in AHP using intuitionistic fuzzy logic
As mentioned above, since pair-wise comparison values are judgments obtained from an appropriate semantic scale, in practice the decision maker usually gives some or all pair-to-pair comparison values with an uncertainty
degree rather than precise ratings. Among several higher-order fuzzy sets, the intuitionistic fuzzy sets introduced by Atanassov [2] have been found to be well suited to dealing with vagueness. Traditional fuzzy logic has two important deficiencies. First, to apply fuzzy logic we need to assign a crisp membership function to every property and every value. Second, fuzzy logic does not distinguish between the situation in which there is no knowledge about a certain statement and the situation in which the evidence for and against the statement is equally strong. For this reason it is not recommended for problems with missing data or where grades of membership are hard to define [11]. According to the above arguments, intuitionistic fuzzy sets can be expected to simulate human decision-making processes and any activities requiring human expertise and knowledge, which are inevitably imprecise or not totally reliable [9]. Therefore, we attempt to extend the AHP, as a multi-criteria decision-making approach, so that hesitancy in the decision maker's opinions is taken into account. The starting point of this method is to assume a finite set of criteria or alternatives X = {x_1, ..., x_n}. The set R is defined as the set of all possible ordered pairs of members of X. A membership degree and a non-membership degree are assigned to each of these pairs by a decision maker; for a pair in which the first element is maximally preferred to the second, the membership and non-membership degrees are 1 and 0, respectively. At the next stage, two sets of membership and non-membership degrees, similar to fuzzy binary preference relations [7], are produced for the members of R. Then, for each criterion, all the degrees expressing its preference (respectively non-preference) over the other criteria are summed and the result is divided by n-1. So for every criterion there are two weights, indicating the preference and non-preference of that criterion relative to the others. Applying these weights in an overlay analysis finally yields two output maps, which indicate the preference index and the non-preference index of each region with respect to a specific class.

4. Implementing fuzzy and intuitionistic fuzzy AHP for earthquake vulnerability mapping
Natural disasters are extreme events within the Earth's system (lithosphere, hydrosphere, biosphere and atmosphere), resulting in death or injury to humans and in damage to or loss of goods such as buildings, communication systems, agricultural land, forests and the natural environment. Mitigation of natural disasters
can be successful only when detailed knowledge is obtained about the expected frequency, character and magnitude of hazardous events in an area. Many types of information needed in natural disaster management have an important spatial component, such as maps [16]. The risk associated with each hazard consists of two components: hazard and vulnerability [18]. Accordingly, in order to produce a risk map, preparing a reliable vulnerability map is essential. Since most of the criteria that affect earthquake vulnerability have spatial characteristics, the assessment of earthquake vulnerability requires the spatial distribution of these parameters [1]. In GIS, each criterion can be described by a map layer which can be represented directly or indirectly in raster format [3]. Consequently, the criteria integration procedure can be carried out by overlay analysis on weighted layers (see e.g. [5, 17]). Table 1 shows a pair-wise comparison matrix for fuzzy AHP, used to weight the four most popular parameters that affect earthquake vulnerability. These parameters are the mean population of each block, the age of the buildings in each block, the type of construction and the mean number of floors [1]; the labels A, B, C and D are assigned to these parameters, respectively. The preferences were determined by a decision maker based on the rating scale of traditional AHP introduced in Table 2; in other words, they are his/her personal opinions about the preference of each criterion over the others.
Table 1. Pair-wise comparison matrix for fuzzy AHP (earthquake vulnerability)

        A                   B                   C              D
A       (1, 1, 1)           (3, 4, 5)           (7, 8, 9)      (2, 3, 4)
B       (1/9, 1/8, 1/7)     (1, 1, 1)           (1, 2, 3)      (1/6, 1/5, 1/4)
C       (1/4, 1/3, 1/2)     (1/3, 1/2, 1)       (1, 1, 1)      (2, 3, 4)
D       (1/5, 1/4, 1/3)     (1/4, 1/3, 1/2)     (4, 5, 6)      (1, 1, 1)
Table 2. The AHP pair-wise comparison continuous rating scale [13]. The scale runs from 9 (extremely more important) through 5 (strongly), 3 (moderately) and 1 (equally important) down to 1/3 (moderately), 1/5 (strongly) and 1/7 to 1/9 (extremely less important); the numbers between these anchor points can be used too.
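Before turning to the intuitionistic case, the short sketch below shows how a triangular judgment such as those in Table 1 is reduced to a crisp value with an α-cut and the optimism index λ of equation (1); the function names and the example call are illustrative assumptions rather than the authors' implementation.

```python
def alpha_cut(tfn, alpha):
    """Interval [p_l_alpha, p_r_alpha] of a triangular fuzzy number (l, m, u)."""
    l, m, u = tfn
    return (l + alpha * (m - l), u - alpha * (u - m))

def crisp(tfn, alpha=0.6, lam=0.5):
    """Crisp performance C_lambda = lam * p_r_alpha + (1 - lam) * p_l_alpha."""
    p_l, p_r = alpha_cut(tfn, alpha)
    return lam * p_r + (1 - lam) * p_l

# Row A of Table 1, compared against criteria A, B, C and D.
row_A = [(1, 1, 1), (3, 4, 5), (7, 8, 9), (2, 3, 4)]
crisp_row_A = [crisp(t) for t in row_A]   # e.g. (3, 4, 5) -> 4.0 with alpha = 0.6, lam = 0.5
print(crisp_row_A)
```

A full fuzzy AHP weighting would repeat this for every row of Table 1 and normalise the resulting crisp matrix into a priority vector, which is, in principle, how the fuzzy AHP weights reported later in Table 3 are obtained.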
In the case of intuitionistic fuzzy AHP, for the parameters A, B, C and D the set R can be produced as follows:
R = {(A, B), (A, C), (A, D), (B, A), (B, C), (B, D), (C, A), (C, B), (C, D), (D, A), (D, B), (D, C)}
This set is produced under the assumption that no criterion has a preference over itself, so the pairs (A, A), (B, B), (C, C) and (D, D) are not included. The sets PRE and NPR, shown below, consist of the membership and non-membership degrees of each pair of R, which are the personal opinions of a decision maker.
PRE = {0.80, 0.95, 0.45, 0.17, 0.22, 0.09, 0.02, 0.70, 0.35, 0.45, 0.84, 0.55}
NPR = {0.17, 0.02, 0.45, 0.80, 0.70, 0.84, 0.95, 0.22, 0.55, 0.45, 0.09, 0.35}
The mean of all the preference degrees and of all the non-preference degrees of each criterion must then be calculated separately. Table 3 shows the results of this process together with the results of fuzzy AHP with α = 0.6 and λ = 0.5.

Table 3. The weights resulting from intuitionistic fuzzy AHP and from fuzzy AHP

            Intuitionistic fuzzy AHP                                Fuzzy AHP
Criteria    Preferences           Weights    Non-preferences       Weights
A           {0.80, 0.95, 0.45}    0.72       {0.17, 0.02, 0.45}    0.528
B           {0.17, 0.22, 0.09}    0.16       {0.80, 0.70, 0.84}    0.106
C           {0.02, 0.70, 0.35}    0.36       {0.22, 0.95, 0.55}    0.147
D           {0.45, 0.84, 0.55}    0.61       {0.45, 0.09, 0.35}    0.217
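The intuitionistic fuzzy weights in Table 3 follow directly from averaging, for each criterion, the entries of PRE and NPR in which it appears as the first element of the pair. A minimal sketch, assuming the ordering of PRE and NPR matches the listing of R above, reproduces the preference weights and also recomputes the corresponding non-preference weights:

```python
criteria = ["A", "B", "C", "D"]
# Ordered pairs of R, matching the listing in the text.
R = [(a, b) for a in criteria for b in criteria if a != b]

PRE = [0.80, 0.95, 0.45, 0.17, 0.22, 0.09, 0.02, 0.70, 0.35, 0.45, 0.84, 0.55]
NPR = [0.17, 0.02, 0.45, 0.80, 0.70, 0.84, 0.95, 0.22, 0.55, 0.45, 0.09, 0.35]

def weights(degrees):
    """Average, for each criterion, the degrees of the pairs in which it comes first."""
    n = len(criteria)
    return {c: round(sum(d for (a, _), d in zip(R, degrees) if a == c) / (n - 1), 2)
            for c in criteria}

print(weights(PRE))  # {'A': 0.73, 'B': 0.16, 'C': 0.36, 'D': 0.61} (Table 3 rounds A to 0.72)
print(weights(NPR))  # {'A': 0.21, 'B': 0.78, 'C': 0.57, 'D': 0.3}, recomputed for comparison
```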
The main difference between fuzzy AHP and the intuitionistic fuzzy concepts used here lies in the preference numbers employed. In fuzzy AHP the numbers follow the rating scale of traditional AHP, described in Table 2, whereas in the intuitionistic fuzzy approach the preferences are expressed as fuzzy binary preference degrees. Figures 1 and 2 present the vulnerability maps resulting from weighted overlay analysis on the four raster map layers; one district of the city of Tehran was considered as the case study. Figure 1 shows the result of applying the preference weights, and Figure 2 the result of applying the non-preference weights. Maps like these help decision makers to make more reliable decisions as quickly as possible. In these two raster maps, regions with a high membership degree in Figure 1 and a low membership degree in Figure 2 can be identified as highly vulnerable regions with high consistency (see the regions inside the circles in Figures 1 and 2).
5. Conclusions and further work
The intuitionistic fuzzy AHP method is an effective way to merge different experts' viewpoints and to deal with the interoperability of different systems, combining different data sets. Our results show that it enables the disaster manager to make more reliable decisions with uncertain data and uncertain ideas about the preferences among the parameters affecting earthquake risk and vulnerability. The AHP, a popular approach in multi-criteria decision making, was selected, and a procedure for combining intuitionistic fuzzy concepts with it was developed. The results of this method were compared with those of fuzzy AHP, although these techniques are inherently different because the preference numbers used in them differ. Combining the resulting IF risk map with an intuitionistic knowledge-base system is our future work.
Figure 1. The left map is the earthquake vulnerability map resulting from the preference weights in intuitionistic fuzzy AHP, based on Table 3; the right map is the earthquake vulnerability map resulting from the non-preference weights in intuitionistic fuzzy AHP, based on Table 3.
References
1. R. Aghataher, M. R. Delavar and N. Kamalian: Weighing of contributing factors in vulnerability of cities against earthquakes, Proceedings of Map Asia, Jakarta, Indonesia (2005).
2. K. Atanassov: Intuitionistic fuzzy sets, Fuzzy Sets and Systems, 20: 87-96 (1986).
3. P. A. Burrough and R. A. McDonnell: Principles of Geographical Information Systems, Oxford University Press (1998).
4. D. J. Cowen: GIS versus CAD versus DBMS: what are the differences?, Photogrammetric Engineering and Remote Sensing, 54: 1551-4 (1988).
5. M. N. Demers: GIS Modeling in Raster, John Wiley & Sons, Inc. (2002).
6. H. Deng: Multicriteria analysis with fuzzy pairwise comparison, International Journal of Approximate Reasoning, 21: 215-231 (1999).
7. F. Herrera, L. Martinez and P. J. Sanchez: Managing non-homogeneous information in group decision making, European Journal of Operational Research, 166: 115-132 (2005).
8. R. R. Levary and K. Wan: A simulation approach for handling uncertainty in the analytical hierarchy process, European Journal of Operational Research, 106(1): 116-122 (1998).
9. D. F. Li: Multi-attribute decision making models and methods using intuitionistic fuzzy sets, Journal of Computer and System Sciences, 70: 73-85 (2005).
10. J. Malczewski: GIS and Multi-criteria Decision Analysis, John Wiley & Sons (1999).
11. M. R. Malek and F. Twaroch: An introduction to intuitionistic fuzzy spatial region, GeoInfo Series 28a, Proc. of the ISSDQ'04, Vienna (2004).
12. T. N. Prakash: Land Suitability Analysis for Agricultural Crops: A Fuzzy Multicriteria Decision Making Approach, Master Thesis, International Institute for Geo-information Science and Earth Observation, Enschede, The Netherlands (2003).
13. K. Rashed and J. Weeks: Assessing vulnerability to earthquake hazards through spatial multi-criteria analysis of urban areas, International Journal of Geographic Information Science, 17(6): 547-576 (2003).
14. T. L. Saaty: A scaling method for priorities in hierarchical structures, Journal of Mathematical Psychology, 15: 37-57 (1977).
15. T. L. Saaty and L. G. Vargas: Uncertainty and rank order in the analytical hierarchy process, European Journal of Operational Research, 32: 107-117 (1987).
16. A. Skidmore: Environmental Modeling with GIS and Remote Sensing, Taylor & Francis (2002).
17. C. D. Tomlin: Geographic Information System and Cartographic Modeling, Prentice Hall, Englewood Cliffs, NJ (1990).
18. UN: Mitigating Natural Disasters: Phenomena, Effects, and Options: A Manual for Policy Makers and Planners, UNDRO (United Nations Disaster Relief Organization), New York (1991).
19. R. C. Van den Honert: Stochastic group preference modelling in the multiplicative AHP: A model of group consensus, European Journal of Operational Research, 110: 99-111 (1998).
A CONSENSUS MODEL FOR GROUP DECISION MAKING IN HETEROGENEOUS CONTEXTS
LUIS MARTINEZ, FRANCISCO MATA
Dept. of Computer Science, University of Jaen, 23071 Jaen, Spain. Email: martin,fmata@ujaen.es
ENRIQUE HERRERA-VIEDMA
Dept. of Computer Science and A.I., University of Granada, 18071 Granada, Spain. Email: vieda@decsai.ugr.es
The consensus process in Group Decision Making (GDM) problems helps to achieve solutions that are shared by the different experts involved in such problems. Because different experts take part in the decision process, it is common that they need to express their information in different domains 5,2. In this contribution we focus on GDM problems defined in heterogeneous contexts with numerical, linguistic and interval-valued information. Our aim is to define a consensus model that includes an advice generator to assist the experts in the consensus reaching process of GDM problems with heterogeneous preference relations. This model provides two important improvements: (i) the ability to cope with group decision-making problems with heterogeneous preference relations, and (ii) the replacement of the moderator, traditionally present in the consensus reaching process, by an advice generator, so that the whole group decision-making process can easily be automated.
1. Introduction
Before obtaining a final solution, two processes are carried out in GDM problems 3,4,6,8: the Consensus Process and the Selection Process (see Fig. 1). The first refers to how to obtain the maximum agreement between the set of experts on the solution set of alternatives; normally this process is guided by a human figure called the moderator 4,8. The second obtains the solution set of alternatives.

Figure 1. Resolution process of a group decision-making problem (the experts' preferences pass through the consensus process, guided by the moderator, before the selection process produces the solution set of alternatives).

It has been shown in the literature that in GDM problems it can be necessary or suitable for the experts to express their knowledge in different expression domains, such as numeric, linguistic and/or interval ones, and different Selection Processes 2,5 have been proposed to solve them, but no specific consensus processes have been defined for this type of problem. Consequently, in this contribution we focus on the Consensus Process in heterogeneous GDM problems. Consensus is defined as a state of mutual agreement among members of a group where all opinions have been heard and addressed to the satisfaction of the group 10. The consensus reaching process is defined as a dynamic and iterative process composed of several rounds, in which the experts express and discuss their opinions. Traditionally this process is coordinated by a human moderator, who computes the agreement among experts in each round using different consensus measures 9,7. If the agreement is not acceptable, the moderator recommends that the experts change their opinions that are furthest from the group opinion, in an effort to bring their preferences closer in the next consensus round 1,11. The moderator is usually a controversial figure, because experts may complain about his lack of objectivity; additionally, in heterogeneous contexts it is difficult for him to understand all the different domains and scales in a proper way. Therefore, the aim of this contribution is to present a consensus model for GDM problems such that:
• The experts can express their preferences by means of linguistic, numerical or interval-valued preference relations.
• The moderator's tasks are carried out by an automatic advice generator.
The rest of the paper is set out as follows. The scheme of a heterogeneous GDM problem is described in Section 2. The intelligent consensus model is presented in Section 3. Finally, in Section 4 we draw our conclusions.
2. A Heterogeneous GDM Problem
A group decision-making (GDM) problem may be defined as a decision situation in which there are a finite set of alternatives, $X = \{x_1, x_2, \ldots, x_n\}$ ($n \geq 2$), and a group of experts, $E = \{e_1, e_2, \ldots, e_m\}$ ($m \geq 2$); each expert $e_i$ provides his/her preferences on X by means of a preference relation, $\mu_{P_{e_i}}: X \times X \rightarrow D_i$, where $D_i$ is the expression domain used by the expert $e_i$ to provide his/her preferences. The ideal situation in a GDM problem is that all the experts have precise knowledge about the alternatives and provide their opinions on a precise numerical scale. However, in some cases experts may belong to distinct research areas and have different levels of knowledge about the alternatives. A consequence of this is that preferences can be expressed by means of numbers, interval values or linguistic terms, so $D_i \in \{N, I, L\}$. In this contribution we deal with heterogeneous GDM problems, i.e., GDM problems where each expert $e_i$ may express his/her opinions on the set of alternatives using a different expression domain $D_i \in \{N, I, L\}$, by means of a preference relation $P_{e_i} = (p_i^{lk})$, where $p_i^{lk} \in D_i$ represents the preference of alternative $x_l$ over alternative $x_k$ for that expert:
$$P_{e_i} = \begin{pmatrix} p_i^{11} & \cdots & p_i^{1n} \\ \vdots & \ddots & \vdots \\ p_i^{n1} & \cdots & p_i^{nn} \end{pmatrix}$$
This type of context implies the necessity of adequate tools to manage and model heterogeneous information 5.

3. A Consensus Model for Heterogeneous GDM Problems
In this section a consensus model is presented for GDM problems defined in heterogeneous contexts; it automates the moderator's functions (see Fig. 2) and is developed in four phases (a high-level sketch of these phases follows the list):
(1) Making the information uniform: it unifies all the different preferences into a single domain.
(2) Computing consensus degrees: these values measure the agreement amongst all the experts.
(3) Checking the agreement: these values are used to learn how close the collective and the individual experts' preferences are.
(4) Generating advices: an automatic advice generator guides the experts in order to improve the consensus, recommending which opinions should change.
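As an orientation, the following sketch strings the four phases together in the order just described; the helper callables, the consensus threshold gamma and the maximum number of rounds are placeholders for the components detailed in Sections 3.1 to 3.5, not code from the paper.

```python
def consensus_process(relations, make_uniform, consensus_degrees,
                      generate_advices, collect_revisions,
                      gamma=0.85, max_cycles=10):
    """Illustrative driver for the four phases; gamma and max_cycles are assumed values."""
    for _ in range(max_cycles):
        uniform = [make_uniform(p) for p in relations]      # phase 1: unify domains
        cp, ca, cr = consensus_degrees(uniform)             # phase 2: consensus degrees
        if cr >= gamma:                                     # phase 3: check agreement
            break
        advices = generate_advices(uniform, cp, ca)         # phase 4: generate advices
        relations = collect_revisions(advices, relations)   # experts revise and resubmit
    return relations
```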
Figure 2. Resolution process of a heterogeneous group decision-making problem (the moderator of Figure 1 is replaced by the automatic consensus process, after which the selection process produces the solution set of alternatives from the experts' heterogeneous preferences).
The above model will be described in detail in the following subsections.

3.1. Making the Information Uniform
We must keep in mind that we are dealing with heterogeneous contexts composed of numerical, interval-valued and linguistic information, so we need to unify the heterogeneous information into a common utility space in order to operate on it easily. To do so we propose the use of the process proposed in 5, which transforms the heterogeneous input values into fuzzy sets on a linguistic term set $S_T = \{s_0, \ldots, s_g\}$. Each numerical, interval-valued and linguistic evaluation is transformed into a fuzzy set in $S_T$, $F(S_T)$:
$$\tau_{D S_T} : D \rightarrow F(S_T), \qquad \tau_{D S_T}(p_i^{lk}) = \{(c_h, \alpha_h^{lk}) \mid h = 0, \ldots, g\},$$
where at least one $\alpha_h^{lk} > 0$.
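As an illustration of what this unification can look like for a numerical assessment, the sketch below matches a value in [0, 1] against triangular membership functions of a term set with g+1 labels; the triangular shape and the uniform partition are assumptions of this sketch, not a prescription from the paper.

```python
def triangular(x, a, b, c):
    """Membership of x in a triangular fuzzy set with support [a, c] and peak b."""
    if x <= a or x >= c:
        return 1.0 if x == b else 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

def to_fuzzy_set(value, g=4):
    """Transform a numerical value in [0, 1] into a fuzzy set on S_T = {s_0, ..., s_g}."""
    result = []
    for h in range(g + 1):
        peak = h / g
        alpha = triangular(value, peak - 1.0 / g, peak, peak + 1.0 / g)
        result.append((f"s{h}", round(alpha, 2)))
    return result

print(to_fuzzy_set(0.6))  # [('s0', 0.0), ('s1', 0.0), ('s2', 0.6), ('s3', 0.4), ('s4', 0.0)]
```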
After this unification process, and assuming that each fuzzy set is represented by means of its membership degrees $(\alpha_{i0}^{lk}, \ldots, \alpha_{ig}^{lk})$, the preference relation of each expert, $P_{e_i}$, can be rewritten with fuzzy sets as its elements:
$$P_{e_i} = \begin{pmatrix} p_i^{11} = (\alpha_{i0}^{11}, \ldots, \alpha_{ig}^{11}) & \cdots & p_i^{1n} = (\alpha_{i0}^{1n}, \ldots, \alpha_{ig}^{1n}) \\ \vdots & \ddots & \vdots \\ p_i^{n1} = (\alpha_{i0}^{n1}, \ldots, \alpha_{ig}^{n1}) & \cdots & p_i^{nn} = (\alpha_{i0}^{nn}, \ldots, \alpha_{ig}^{nn}) \end{pmatrix}$$

3.2. Computation of Consensus Degrees
The consensus degree measures the agreement among all the experts. To compute these degrees it is necessary to build a consensus matrix, obtained by aggregating the distances among the experts' preferences, comparing them pairwise. The distance between two experts, $e_i$ and $e_j$, is computed using distance matrices, $DM_{ij} = (d_{ij}^{lk})$. The values $d_{ij}^{lk}$ express the distance between two preferences $p_i^{lk}$, $p_j^{lk}$ and are calculated as
$$d_{ij}^{lk} = d(p_i^{lk}, p_j^{lk}) = 1 - \left| cv_i^{lk} - cv_j^{lk} \right| \qquad (1)$$
where $cv_i^{lk}$ is the central value of the fuzzy set that represents the preference $p_i^{lk}$, calculated as
$$cv_i^{lk} = \frac{\sum_{h=0}^{g} index(s_h)\, \alpha_{ih}^{lk}}{\sum_{h=0}^{g} \alpha_{ih}^{lk}}, \quad \text{with } index(s_h) = h. \qquad (2)$$
The computation of the consensus degrees is carried out as follows:
(1) Compute the central values $cv_i^{lk}$ for each $p_i^{lk}$: $\forall i = 1, \ldots, m;\ l, k = 1, \ldots, n \wedge l \neq k$. (3)
(2) Compute the distance matrix $DM_{ij} = (d_{ij}^{lk})$ for each pair of experts: $d_{ij}^{lk} = d(p_i^{lk}, p_j^{lk})$. (4)
(3) A consensus matrix, $CM = (cm^{lk})$, is obtained by aggregating all the distance matrices at the level of pairs of alternatives:
$$cm^{lk} = \phi(d_{ij}^{lk});\quad i, j = 1, \ldots, m \ \wedge\ l, k = 1, \ldots, n \ \wedge\ i < j,$$
where $\phi$ is an aggregation operator. This matrix CM is used to compute the consensus degrees.
(4) Computation of the consensus degrees, carried out at three levels:
(a) Consensus on pairs of alternatives, $cp^{lk}$: it measures the agreement on the pair of alternatives $(x_l, x_k)$ amongst all the experts:
$$cp^{lk} = cm^{lk}, \quad \forall l, k = 1, \ldots, n \ \wedge\ l \neq k.$$
The closer $cp^{lk}$ is to 1, the greater the agreement.
(b) Consensus on alternatives, $ca^l$: it measures the agreement on an alternative $x_l$ amongst all the experts:
$$ca^l = \frac{\sum_{k=1,\ k\neq l}^{n} cm^{lk}}{n-1}.$$
(c) Consensus on the relation, $cr$: it measures the global consensus degree amongst the experts' opinions:
$$cr = \frac{\sum_{l=1}^{n} ca^l}{n}. \qquad (6)$$
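A compact sketch of steps (1) to (4), for preferences already unified into membership vectors, could look as follows; the arithmetic mean used for the aggregation operator and the division by g that keeps the distance of equation (1) inside [0, 1] are assumptions of this sketch, not prescriptions of the paper.

```python
from itertools import combinations

def central_value(fuzzy_set):
    """cv = sum(h * alpha_h) / sum(alpha_h) for a membership vector (alpha_0, ..., alpha_g)."""
    return sum(h * a for h, a in enumerate(fuzzy_set)) / sum(fuzzy_set)

def distance(p_i, p_j):
    """d = 1 - |cv_i - cv_j| / g, normalised by the granularity g to stay in [0, 1]."""
    g = len(p_i) - 1
    return 1 - abs(central_value(p_i) - central_value(p_j)) / g

def consensus_degrees(relations):
    """relations[i][l][k] is the unified fuzzy preference of expert i on the pair (x_l, x_k)."""
    m, n = len(relations), len(relations[0])
    pairs = list(combinations(range(m), 2))
    cm = [[sum(distance(relations[i][l][k], relations[j][l][k]) for i, j in pairs) / len(pairs)
           if l != k else 1.0
           for k in range(n)] for l in range(n)]        # phi taken as the arithmetic mean
    cp = cm                                             # consensus on pairs of alternatives
    ca = [sum(cm[l][k] for k in range(n) if k != l) / (n - 1) for l in range(n)]
    cr = sum(ca) / n                                    # consensus on the relation
    return cp, ca, cr
```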
3.3. Checking the Agreement
The consensus model controls the agreement in each discussion round. Before starting the model, a consensus threshold $\gamma \in [0, 1]$ is fixed, which will depend on the particular problem being dealt with. When the consensus measure $cr$ reaches $\gamma$, the consensus process ends and the selection process is applied to obtain the solution. Additionally, a parameter Maxcycles controls the maximum number of discussion rounds.
3.4. Generating Advices
When the agreement is not good enough, $cr < \gamma$, the experts should modify their preferences to increase the agreement. To do so, this model computes which experts are furthest from the collective opinion (proximity measures) and generates advices for them, recommending which preferences to change and how. Both processes are presented in detail below.
3.4.1. Computation of Proximity Measures
Proximity measures evaluate the agreement between the individual experts' opinions and the group opinion. Thus, firstly a collective preference relation, $P_{e_c} = (p_c^{lk})$, is calculated by aggregating the individual preference relations $\{P_{e_i} = (p_i^{lk});\ i = 1, \ldots, m\}$:
$$p_c^{lk} = \psi(p_1^{lk}, \ldots, p_m^{lk}),$$
with $\psi$ an aggregation operator. We use equation (1) to measure the agreement between each individual expert's preferences, $P_{e_i}$, and the collective preferences, $P_{e_c}$. Therefore, the measurement of proximity is carried out in two steps:
(1) A proximity matrix, $PM_i = (pm_i^{lk})$, is obtained for each expert $e_i$, with $pm_i^{lk} = d(p_i^{lk}, p_c^{lk})$. These matrices will be used to compute the proximity measures.
(2) Computation of proximity measures at different levels:
(a) Proximity on pairs of alternatives, $pp_i^{lk}$: it measures the proximity between the preferences of the expert $e_i$ and of the group on each pair of alternatives:
$$pp_i^{lk} = pm_i^{lk}, \quad \forall l, k = 1, \ldots, n \ \wedge\ l \neq k.$$
(b) Proximity on alternatives, $pa_i^l$: it measures the proximity between the preferences of the expert $e_i$ and of the group on each alternative $x_l$.
(c) Expert's proximity, $pe_i$: it measures the global proximity between the preferences of each expert $e_i$ and the group:
$$pe_i = \frac{\sum_{l=1}^{n} pa_i^l}{n}. \qquad (8)$$
If the above values are close to 1 then they make a positive contribution towards a high consensus, while if they are close to 0 then they make a negative contribution to consensus.
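Reusing the same distance, the proximity measures can be sketched as follows; here the collective relation is obtained with an arithmetic mean and the preferences are represented by central values already scaled to [0, 1], both of which are assumptions of this sketch.

```python
def collective_relation(relations):
    """psi taken as the arithmetic mean of the experts' scaled central values, position by position."""
    m, n = len(relations), len(relations[0])
    return [[sum(rel[l][k] for rel in relations) / m for k in range(n)] for l in range(n)]

def proximity_measures(relations, collective):
    """relations[i][l][k] and collective[l][k] are central values scaled to [0, 1]."""
    n = len(collective)
    pp, pa, pe = [], [], []
    for expert in relations:
        pm = [[1 - abs(expert[l][k] - collective[l][k]) if l != k else 1.0
               for k in range(n)] for l in range(n)]
        pa_i = [sum(pm[l][k] for k in range(n) if k != l) / (n - 1) for l in range(n)]
        pp.append(pm)
        pa.append(pa_i)
        pe.append(sum(pa_i) / n)        # expert's global proximity, equation (8)
    return pp, pa, pe
```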
3.5. Advice Generator
Finally, the consensus model generates advices automatically in order to increase the agreement, indicating who should change his/her opinions and how. This generation is carried out as follows:
• Identify the experts furthest from the agreement (e.g. those with the lowest proximity values $pe_i$).
• Identify which alternatives must be changed: those alternatives whose consensus degree $ca^l < \gamma$.
• Identify the pairs of alternatives that must be changed. Once the expert $e_i$ and the alternatives $x_l$ to change have been identified, all the pairs $p_i^{lk}$ ($k = 1, \ldots, n$) such that $pp_i^{lk} < \theta$ must be changed. The parameter $\theta$ is a proximity threshold that helps to choose which alternatives are furthest from the collective opinion.
• Changing Direction Rules (CDR): finally, the advice generator computes whether the values of the pairs of alternatives to change should increase or decrease. Taking into account that the $p_i^{lk}$ are fuzzy sets, the advice generator defines two direction parameters, $ml$ (main) and $sl$ (secondary). These parameters are used both for the experts ($eml$, $esl$) and for the collective opinion ($cml$, $csl$). Each parameter consists of the value and position of one of the two highest membership values of the expert's preference ($eml_{pos}, eml_{val}, esl_{pos}, esl_{val}$) and of the collective preference ($cml_{pos}, cml_{val}, csl_{pos}, csl_{val}$). These parameters are used by the following direction rules:
DR.1. IF $eml_{pos} > cml_{pos}$ THEN $e_i$ should decrease the value of $p_i^{lk}$.
DR.2. IF $eml_{pos} < cml_{pos}$ THEN $e_i$ should increase the value of $p_i^{lk}$.
DR.3. IF $eml_{pos} = cml_{pos}$ THEN apply DR.1 and DR.2 but with $eml_{val}$ and $cml_{val}$.
DR.4. IF $eml_{pos} = cml_{pos}$ AND $eml_{val} = cml_{val}$ THEN apply DR.1 and DR.2 but with $sl$.
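Interpreting these rules operationally, a small sketch of the changing-direction logic could be written as follows; the tuple-based representation of ml/sl and the returned strings are assumptions of this sketch, not the authors' code.

```python
def direction_parameters(fuzzy_set):
    """Positions and values of the two highest membership degrees: (ml_pos, ml_val, sl_pos, sl_val)."""
    ranked = sorted(enumerate(fuzzy_set), key=lambda pv: pv[1], reverse=True)
    (ml_pos, ml_val), (sl_pos, sl_val) = ranked[0], ranked[1]
    return ml_pos, ml_val, sl_pos, sl_val

def change_direction(expert_pref, collective_pref):
    """Apply DR.1 to DR.4 to decide whether the expert should increase or decrease a preference."""
    eml_pos, eml_val, esl_pos, esl_val = direction_parameters(expert_pref)
    cml_pos, cml_val, csl_pos, csl_val = direction_parameters(collective_pref)
    for e_pos, e_val, c_pos, c_val in ((eml_pos, eml_val, cml_pos, cml_val),
                                       (esl_pos, esl_val, csl_pos, csl_val)):
        if e_pos > c_pos or (e_pos == c_pos and e_val > c_val):   # DR.1 / DR.3
            return "decrease"
        if e_pos < c_pos or (e_pos == c_pos and e_val < c_val):   # DR.2 / DR.3
            return "increase"
    return "keep"                                                  # DR.4 exhausted: no change

print(change_direction([0.0, 0.2, 0.8, 0.0], [0.0, 0.7, 0.3, 0.0]))  # -> "decrease"
```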
4. C o n c l u d i n g R e m a r k s A consensus model to manage the consensus process of heterogeneous GDM problems has been presented. There are two main features of this model: (i) it is able to manage consensus processes in problems where experts may have different levels of background or knowledge to solve the problem, and (ii) it is able to generate advices on the necessary changes in the experts' opinions in order to reach consensus, which makes the figure of the moderator, traditionally present in the consensus reaching process, unnecessary. References 1. N. Bryson. Group decision making and the analytic hierarchy process: Exploring the consensus-relevant information content. Computers and Operational Research, 23:27-35, 1996. 2. M. Delgado, F. Herrera, E. Herrera-Viedma, and L. Martmez. Combinig numerical and linguistic information in group decision making. Information Sciences, 107:177-194, 1998. 3. J. Fodor and M. Roubens. Fuzzy preference modelling and multicriteria decision support. Kluwer Academic Publishers, Dordrecht, 1994. 4. F. Herrera, E. Herrera-Viedma, and J.L. Verdegay. A model of consensus in group decision making under linguistic assessments. Fuzzy Sets and Systems, 79:73-87, 1996. 5. F. Herrera, L. Martinez, and P.J. Sanchez. Managing non-homogeneous information in group decision making. European Journal of Operational Research, 166(1):115-132, 2005. 6. E. Herrera-Viedma, F. Herrera, and F. Chiclana. A consensus model for multiperson decision making with different preference structures. IEEE Transactions on Systems, Man and Cybernetics-Part A, 32:394-402, 2002. 7. E. Herrera-Viedma, F. Mata, L. Martinez, F. Chiclana, and L.G. Perez. Measurements of consensus in multi-granular linguistic group decision making. Lecture Notes in Artificial Intelligence, Vol. 3131, 3131:194-204, 2004. 8. J. Kacprzyk, H. Nurmi, and M. Fedrizzi. Consensus under Fuzziness. Kluwer Academic Publishers, 1997. 9. L.I.Kuncheva. Five measures of consensus in gdm using fuzzy sets. In Proc. IFSA 91, pages 141-144, 1991. 10. S. Saint and J.R. Lawson. Rules for Reaching Consensus. A Modern Approach to Decision Making. Jossey-Bass, San Francisco, 1994. 11. S.Zadrozny. Consensus under Fuzziness, chapter An Approach to the Consensus Reaching Support in Fuzzy Environment, pages 83-109. Kluwer Academic Publishers, 1997.
A LINGUISTIC 360-DEGREE PERFORMANCE APPRAISAL EVALUATION MODEL
R. DE ANDRES
Dep. de Fundamentos del Análisis Economico e H. I. E., Universidad de Valladolid, Avda. Valle de Esgueva 6, 47011 Valladolid, Spain. E-mail: [email protected]
J. L. GARCIA-LAPRESTA
Dep. de Economia Aplicada (Matemáticas), Universidad de Valladolid, Avda. Valle de Esgueva 6, 47011 Valladolid, Spain. E-mail: [email protected]
L. MARTINEZ
Dep. de Informática, Universidad de Jaen, Campus Las Lagunillas s/n, 23071 Jaen, Spain. E-mail: [email protected]
Performance appraisal is a process used by some firms in order to evaluate the efficiency and productivity of their employees when planning their promotion policy. Initially this process was carried out just by the executive staff, but recently it has evolved into an evaluation process based on the opinions of different reviewers: supervisors, collaborators, clients and the employee himself (the 360-degree method). In such an evaluation process the reviewers evaluate a number of indicators related to the employee's performance appraisal. These indicators are usually subjective and qualitative in nature, which implies vagueness and uncertainty in their assessment. However, most performance appraisal models force reviewers to provide their assessments about the indicators in a unique, precise quantitative domain. We consider that this obligation leads to a lack of precision in the final results, so in this paper we propose a linguistic evaluation framework to model qualitative information and manage its uncertainty. Additionally, because the different sets of reviewers taking part in the evaluation process have different knowledge about the evaluated employee, it seems suitable to offer a flexible framework in which different reviewers can express their assessments in different linguistic domains according to their knowledge. The final aim is to compute a global evaluation for each employee that can be used by the management team to make their decisions regarding their incentive and promotion policy.
1. I n t r o d u c t i o n One of the main challenges of companies and organizations is the improvement of productivity and efficiency. Performance appraisal is essential for the effective management and evaluation of corporations. Recently more and more companies are trying to increase their productivity through the human performance measurement. Performance appraisal is used for the evaluation of employees estimating their contributions to the goals of the organization, behavior and results. This evaluation process has been accomplished from different points of view can be found in 2, 4, 6, 7, 10, 11, 15, 17 and 18. In classical performance appraisal methods just supervisors evaluated employees. However, corporations are adopting new methods that use information from different people (reviewers) connected with each evaluated worker. In fact, the 360° appraisal or integral evaluation is a methodology for evaluating worker's performance that includes the opinions of supervisors, collaborators, clients and himself (see 9 and 16). Then, each reviewer from the different reviewers collectives (supervisors, collaborators, clients, employee) evaluates indicators used for measuring the performance appraisal of the evaluated worker. Usually these indicators have a qualitative nature and involves uncertainty. However most of evaluation process force the reviewers to manifest their assessments in a unique quantitative precise scale (see 3). Finally the method generates a global evaluation value according to all the indicators and all the reviewers aggregating their assessments. The use of a precise scale to express qualitative information can produce a lack of precision in the assessments provided by the reviewers due to the difficulty of expressing uncertain knowledge in a precise way. In the literature the use of the Fuzzy Linguistic Approach 1 9 to model and manage the qualitative and uncertain information has provided successful results 1,8
Although, it could be logical that in the initial performance appraisal methods there was a unique expression domain for all the reviewers (supervisors) that belonged to the same collective the addition of new reviewers collectives (collaborators, clients, etc) implies that different collectives can have totally different knowledge about the evaluated worker. So each collective or even more each reviewer could express his assessments in different expression domains 1 2 , 1 3 . Therefore the aim of this paper is provide a performance appraisal
419 method that take into account the above problems. Hence the aim of this contribution is to develop a linguistic evaluation method in which different linguistic expression domains can be used by the reviewers to express their assessments. Subsequently, in order to aggregate all the opinions, it is necessary to unify them in a common domain. In this way, the proposed method will conduct each linguistic label provided by reviewers as a fuzzy set in the common domain to compute collective assessments that will allow to the management team to make the final decision. Thus, the problem falls, in a natural way, into the collective decision making context. The paper is organized as follows. Section 2 is devoted to introduce the notation and the structure of the arisen problem. In Section 3 we introduce the multi-granular linguistic evaluation model. Finally, some concluding remarks are included in Section 4.
2. P r e l i m i n a r i e s In this section is introduced a scheme for a 360° evaluation methodology for performance appraisal evaluation and afterwards is showed a classical evaluation method for it. The aim of this problem is to evaluate the employees taking into account the opinions of different collectives related to them. We now present the main features and terminology we consider for the arisen problem. • A set of all employees X = {x\,... following collectives:
,xn}
to be evaluated by the
— A set of supervisors (executive staff): A = {ai,..., ar}. - A set of collaborators (fellows): B = {bi,..., bs}. - A set of clients (customers): C = { c i , . . . , ct}. — X (the opinion of each employee about himself can be taken into account). • Different criteria: Y\,..., Yp. • The assessments of a« £ A, bj £ B and Cj £ C on the employee Xj according to the criterion V*: o'fc, 6*fc and cj*, respectively. Moreover, ar? is the assessment of Xj on himself with respect to Yk- Therefore, there are (r + s + t + l ) p assessments for each employee provided by the different collectives. In the classical approach to performance appraisal, the assessments are usually numerical 7 . However, in this paper we will consider multi-granular
linguistic assessments. However, at this stage we do not specify neither the assessments nature nor the domain expressions. To obtain a global evaluation value of each employee, carried out an evaluation method with the following steps: (1) Computing reviewers collective criteria values, V^_{XJ): for each reviewers collective are aggregated their assessments about a given criterion Yk, by means of an aggregation operator, u*, that can be different for each reviewers collective: vkA(xj) = < « , • • • > < )
VUXJ)
= «c( c "' • • •' cf)
vkB(Xj)
=
ukB{bf,...,bf)
v
j (xi) = Ak
k
(2) Computing global criteria values, v (xj): the previous collective assessments, V^_(XJ), are aggregated by means of an aggregation operator uk obtaining a global value for each criterion Y^: vk(xj) = uk(vkA(xj),vkB(xj),vl^(xj),vk(xj)). (3) Computing a final value, V(XJ): it is obtained aggregating the global criteria values related to the employee Xj, by means of an aggregation operator u: V(XJ)
= U{V1(XJ),
...
,VP(XJ)).
These final outcomes, V(XJ), are used for ranking the employees in order to establish the promotion policy. 3. A multi-granular linguistic performance appraisal m o d e l In Section 2 we have seen that usually a performance appraisal problem is defined in a numerical scale in spite of the most of evaluated indicators are qualitative that are difficult to express in a precise way, the use of linguistic values facilitates the modelling and managing of these qualitative indicators. The Fuzzy Linguistic Approach 1 9 provides a systematic way to represent linguistic variables. Due to the fact that in performance appraisal take part in the evaluation process different reviewers that belong to different collectives, it seems logical that they have different information about the indicators so they can express their values in different linguistic expression domains 1 2 , 1 3 . Therefore a suitable way to model the performance appraisal problem to offer a greater expression flexibility is by means of a multi-granular linguistic framework. In this section we present the scheme of the problem in such a framework and an evaluation model to manage this type of information.
3.1. The scheme The schema we present now is similar to that given in Section 2, but in this case the reviewers express their assessments by means of linguistic labels. Additionally we assume that each collective can use different linguistic term sets 12,13 to assess each criterion Yk, fc = 1,... ,p: • aljk e CkA for each i e { 1 , . . . , r} and each j € { 1 , . . . , n). • bf £ CkB for each i € {l,...,s}
and each j € { 1 , . . . , n}.
l
• c f € CQ for each i g { 1 , . . . ,£} and each j 6 { 1 , . . . ,n}. • x>k eCkx for each j e {1,. ,.,n). We note that any appropriate linguistic term set £* is characterized by its cardinality or granularity, |£^|. The granularity represents the level of discrimination among different degrees of uncertainty. Additionally, each £* is ranked by a linear order5. For a further review of the Fuzzy Linguistic Approach and Multigranular linguistic contexts see 12, 13, 19. Since there are p criteria and 4 collectives, we have at most 4p different sets of linguistic labels. From them, we consider the global set of appeared linguistic labels:
c = c\ u • • • u cpA u clB u • • • u CB u clc u • • • u cpc u cxxU£ux.• 3.2. The method Here we present our method to carry out the performance appraisal in a multi-granular linguistic framework (see Fig. 1). The main difference with the evaluation method presented in Section 2 is that here the information to evaluate is expressed in different domains. Therefore it must be unified into a unique linguistic domain before its aggregation. MULTI -GRANULAR LINGUISTIC PERFORMANCE APPRAISAL Supervisor*
—
Collaborators " ClienU Employoc
EVALUATION METHOD Information
—
UNIFYING INFORMATICS
AGGREGATION PROCESSES
• Employee Evaluatioi
—•
Figure 1.
A Multi-Granular Linguistic Performance Appraisal Model
The different phases of the evaluation method are presented in further detail in the following subsections.
3.2.1. Make the information uniform To operate with linguistic terms assessed in different linguistic term sets, first of all we have to conduct the multi-granular linguistic information provided by the different collectives into a unique expression domain, called Basic Linguistic Term Set (BLTS), £ = {1\,... ,Jg}, with g > m a x { | £ i | , . . . , \CA\, \CB\,..., \CB\,\Clc\,...,
\&cl I ^ U • • • > 1^1}-
Once it has been chosen the BLTS, the multi-granular linguistic information must be conducted in it. To do so, we propose to transform this information into fuzzy sets in C by means of the following function14 that computes the matching between the reviewers labels and the labels of the BLTS: T : C —» F(Z) r(s) = {(ii,ai),...,(lg,ag)} at = maxy mm {fii(y),/j.-[((y)}, i =
l,...,g
where •?•"(£) is the set of fuzzy sets on C, m and fij are the membership functions of the linguistic labels / 6 C and ~U € C, respectively. The function r is used for transforming individual assessments into fuzzy sets in the BLTS F(C). Once we have converted all the individual assessments into fuzzy sets on C, the evaluation is easy to carry out. 3.2.2. Evaluation method This aggregation process is similar to that presented in Section 2 but in this case the information aggregated will be fuzzy sets. So, the aggregation operators must aggregate this type of information14'12. (1) Computing reviewers collective criteria values: for each reviewers collective are aggregated their assessments about a given criterion, Yk: vkA^)-ukA{T(af),
,r{af)),
where u\ : (F{C))r —» T{C)
vkB(Xj) = a%(T{b}k),
Mb'")),
where u% : (F(Z))S —» T{C)
vkc(xj) =
,T(cf)),
where ukc : (JF(Z))' —» F(£)
ukc(r(c)k),
vk{Xj) = r{xik).
(2) Computing global criteria values: the previous collective assessments are aggregated by means of an aggregation operator obtaining a global value for each criterion Y^:
vk(Xj)
uk(vkA{xj),vkB{xj),v£:(xj),v*{xj)),
=
where uk : (T(C))4 —+ F{E). (3) Computing a final value: it is obtained aggregating the global criteria values related to the employee xy. V(XJ)
where u : (Jr(C))P
—»
= u(w1(i;,),...,iip(a;j)),
F(E).
These final outcomes, V(XJ), are used for ranking the employees either to establish the promotion policy by means of a fuzzy ranking 1 2 .
4. C o n c l u d i n g remarks The performance appraisal process is more and more important in nowadays companies. In this contribution has been presented a linguistic method to carry out this process that offers a greater flexibility to the reviewers in order to express their values. In the future we shall evolve this process using a linguistic computational model based on linguistic hierarchies 13 to carry out the processes of computing with words in multi-granular contexts without loss of information and to obtain linguistic outcomes.
Acknowledgments The contribution has been partially supported by the Projects MTM200508982-C04-02/03, E R D P and VA040A05.
References 1. B. Arfi, Fuzzy decision making in politics: A linguistic fuzzy-set approach (LFSA). Political Analysis 13, pp. 23-56 (2005). 2. C. G. Banks and L. Roberson, Performance appraisers as test developers, Academy of Management Review 10, pp. 128-142 (1985). 3. J. N. Baron and D. M. Kreps, Strategic Human Resources. Frameworks for General Managers. Wiley k. Sons, New York (1999).
424 4. H. J. Bernardin, J. S. Kane, S. Ross, J. D. Spina and D. L. Johnson, Performance appraisal design, development, and implementation, In: G. R. Ferris, S. D. Rosen and D. T. Barnum (eds.), Handbook of Human Resources Management, Blackwell, Cambridge, pp. 462-493 (1995). 5. P. P. Bonissone and K. S. Decker, Selecting uncertainty calculi and granularity: An experiment in trading-off precision and complexity. In: L. H. Kanal and J. F. Lemmer (eds.), Uncertainty in Artificial Intelligence, North-Holland, pp. 217-247 (1986). 6. R. D. Bretz, G. T. Milkovich and W. Read, The current state of performance appraisal research and practice: Concerns, directions and implications, Journal of Management 18, pp. 321-352 (1992). 7. R. L. Cardy and G. H. Dobbins, Performance Appraisal: Alternative Perspectives, South-Western, Cincinati (1994). 8. C-H Cheng and Y. Lin, Evaluating the best main battle tank using fuzzy decision theory with linguistic criteria evaluation, European Journal of Operational Research 142, pp. 174-186, 2002. 9. M. Edwards and E. Ewen, Automating 360 degree feedback, HR Focus 70, p. 3 (1996). 10. G. R. Ferris and T. A. Judge, Personnel/human resources management: A political influence perspective, Journal of Management 17, pp. 1-42 (1991). 11. C. Fletcher, Performance appraisal and management: The developing research agenda, Journal of Occupational and Organization Psychology 74, pp. 473-487 (2001). 12. F. Herrera, E. Herrera-Viedma and L.Martinez, A fusion approach for managing multi-granularity linguistic term sets in decision making, Fuzzy Sets and Systems 114, pp. 43-58 (2000). 13. F. Herrera and L. Martinez, A model based on linguistic 2-tuples for dealing with multigranularity hierarchical linguistic contexts in multiexpert decisionmaking, IEEE Transactions on Systems, Man and Cybernetics. Part B: Cybernetics 31, pp. 227-234 (2001). 14. F. Herrera, L. Martinez and P.J. Sanchez, Managing non-homogeneous information in group decision making, European Journal of Operational Research 166, pp. 115-132 (2005). 15. J. L. Kerr, Diversification strategies and managerial rewards: An empirical study, Academy of Management Journal 28, pp. 155-179 (1985). 16. S. Marshall, Complete turnaround 360-degree evaluations gaining favour with workers management, Arizona Republic, Dl (1999). 17. J. B. Miner, Development and application of the rated ranking technique in performance appraisal, Journal of Occupational Psychology 6, pp. 291-305 (1988). 18. K. R. Murphy and J. N. Cleveland, Performance Appraisal: An Organizational Perspective, Allyn & Bacon, Boston (1991). 19. L. A. Zadeh, The concept of a linguistic variable and its applications to approximate reasoning, Information Sciences, Part I, II, HI, 8, pp. 199-249; 8, pp. 301-357; 9, pp. 43-80 (1975).
A N INTERACTIVE S U P P O R T SYSTEM TO AID EXPERTS TO EXPRESS CONSISTENT PREFERENCES
S. ALONSO, E. HERRERA-VIEDMA, F. HERRERA AND F.J. CABRERIZO Department of Computer Science and Artificial Intelligence University of Granada, 18071, Granada, Spain E-mail: {salonso, viedma, herrera}@decsai.ugr.es, [email protected] F. CHICLANA Centre for Computational Intelligence, School of Computing De Montfort University, Leicester LEI 9BH, UK. E-mail: [email protected] In Group Decision Making, the expression of preferences is often a very difficult task for the experts, specially in decision problems with a high number of alternatives. The problem is increased when they are asked to give their preferences in the form of preference relations: although preference relations have a very high level of expressivity and they present good properties that allow to operate with them easily, the amount of preference values that the experts are required to give increases exponentially. This usually leads to situations where the expert is not capable of properly express all his/her preferences in a consistent way (that is, without contradiction), so finally the information provided can easily be either inconsistent or incomplete (when the expert prefers not to give some particular preference values). In this paper we develop a transitivity based support system to aid experts to express their preferences (in the form of preference relations) in a more consistent way. The system works interactively with the expert making recommendations for the preference values that the expert have not yet expressed. Those recommendations are computed trying to maintain the consistency level of the expert as high as possible.
1. I n t r o d u c t i o n One of the key issues when solving Group Decision Making (GDM) problems is to obtain the preferences of the different experts in order to lately combine them and find which solution Xj among the feasible set of alternatives X = {x\, ...,xn} is the best. There exist several different representation formats in which experts can express their preferences but, among others, Fuzzy Preference Relations (FPR) 5 ' 6 , 8 have been widely used because they are a very expressive format and also they present good properties that allow to operate with them easily 6 ' 8 .
425
426 Preference relations may also present some disadvantages. As it is required to express a preference degree among all possible pairs of different alternatives, the amount of information that the experts have to provide increases exponentially. Clearly, when the cardinality of the problem is high then we may find situations where the experts do not provide good (consistent and complete) preference relations. In this cases, an expert might choose not to provide all the preference values that he is required to, or the expert might provide his/her preferences in an inconsistent way, i.e., his/her preferences might be contradictory. In a previous paper * a procedure to compute the missing values of an incomplete F P R taking into account the expert consistency level has been developed. Nevertheless, that procedure could not deal with the initial contradiction that the expert could have introduced in his/her preferences, and what could be worse, the expert might not accept the estimated values (even if they increase the overall consistency level). Thus, when designing a computer driven model to deal with GDM problems where the information is given in the form of FPR, software tools to aid the experts to express their preferences avoiding the mentioned problems should be implemented. As experts might not be familiar with preference relations, the aiding tools should be easy enough to use and they should follow the general principles of interface design 4 . In this paper we present an interactive support system to aid experts to express their preferences using fuzzy preference relations. The system will give recommendations to the expert while he/she is providing the preference values in order to maintain a high level of consistency in the preferences, as well as trying to avoid missing information. Also, the system will provide measures of the current level of consistency and completeness that the expert has achieved, which can be used to avoid situations of self contradiction. The system has been programmed using Java technologies, which allows its integration in web-based applications which are increasingly being used in GDM and Decision Support environments 3 ' 1 0 . The rest of the paper is set as follows: In Section 2 we present our preliminaries. In Section 3 we describe in detail our support system. Finally in Section 4 we point out our conclusions and future improvements.
2. Preliminaries In this section we present the preliminaries concepts needed for the rest of the paper: the notion of Incomplete Linguistic Preference Relation, the
427 Additive Transitivity Property and how this transitivity property can be used to estimate missing values in a fuzzy preference relation.
2.1. Incomplete
Fuzzy Preference
Relations
One of the most frequently used formats to represent preferences are Fuzzy Preference Relations 5 , 6 ' s . They present a very high level of expressivity and good properties that allow to operate with them easily 6 , s . Definition 1: A fuzzy preference relation P on a set of alternatives X is a fuzzy set on the product set X x X, i.e., it is characterized by a membership function p,p: X x X —» [0,1]. When cardinality of X is small, the preference relation may be conveniently represented by the n x n matrix P = (j>ik), being pik = fj,p(xi, Xk) (Ve, k € { 1 , . . . , n}) interpreted as the preference degree or intensity of the alternative Xi over xy. p^ = 1/2 indicates indifference between Xi and Xk (xi ~ Xk),p%k = 1 indicates that x; is absolutely preferred to x^, and pik > 1/2 indicates that x^ is preferred to Xk (xi y xk)- Based on this interpretation we have that pa = 1/2 \/i G { 1 , . . . , n} (xi ~ Xi). Usual models to solve GDM problems assume that experts are always able to provide all the preferences required, that is, to provide all pu- values. This situation is not always possible to achieve. Experts could have some difficulties in giving all their preferences due to lack of knowledge about part of the problem, or simply because they may not be able to quantify some of their degree of preference. In order to model such situations, we define the concept of an incomplete fuzzy preference relation 7. Definition 2 A function / : X — • Y is partial when not every element in the set X necessarily maps onto an element in the set Y. When every element from the set X maps onto one element of the set Y then we have a total function. Definition 3 An incomplete fuzzy preference relation P on a set of alternatives X is a fuzzy set on the product set X x X that is characterized by a partial membership function.
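Definitions 1 to 3 can be mirrored directly in code; the dictionary-of-known-values representation below is an assumption of this sketch, not the paper's data structure.

```python
class IncompleteFPR:
    """Fuzzy preference relation over n alternatives with possibly missing values."""

    def __init__(self, n):
        self.n = n
        self.values = {(i, i): 0.5 for i in range(n)}   # p_ii = 0.5 (indifference)

    def set(self, i, k, value):
        assert 0.0 <= value <= 1.0
        self.values[(i, k)] = value

    def get(self, i, k):
        return self.values.get((i, k))                  # None for a missing preference

    def missing_pairs(self):
        return [(i, k) for i in range(self.n) for k in range(self.n)
                if i != k and (i, k) not in self.values]

p = IncompleteFPR(3)
p.set(0, 1, 0.7)          # x_0 is preferred to x_1
print(p.missing_pairs())  # the pairs the expert has not assessed yet
```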
2.2.
Additive
Transitivity
Property
For GDM problems where the preferences are given as fuzzy preference relations, some properties about the preferences expressed by the experts are usually assumed desirable to avoid contradictions in their opinions, that is, to avoid inconsistent opinions. One of them is the additive transitiv-
iij/property 6 ' 9 : [pa - 0.5) + (Pjk - 0.5) = (Pik - 0.5) Vt,i, k e { 1 , . . • , n } 2 . 3 . Estimating
Missing
Values
Using
Additive
(1)
Transitivity
Expression (1) can be used to calculate an estimated value of a preference degree using other preference degrees in a fuzzy preference relation. Indeed, the preference value pu, {i ^ k) can be estimated using an intermediate alternative Xj in three different ways: • From p^ = pij + pjk - 0.5 we obtain the estimate (cPikY1 = Pij + Pjk - 0.5
(2)
• From pjk = Pji +Pik — 0.5 we obtain the estimate {cpikY2 = Pjk ~ Pji + 0.5
(3)
• From p^ = pik + pkj — 0.5 we obtain the estimate {cpik)j3 = Pij - Pkj + 0.5
(4)
As we have already said, and expert can choose to not provide complete preference relations, thus, the above equations may not be possible to be applied for every alternative Xi,xk,Xj. If expert e/, provides an incomplete fuzzy preference relation Ph, the following sets are defined 7 : A = {(i,j) I i,j e h
MV
h
EV
{l,...,n}Ai^j}
= | (i,j) € A | p^j is h
=A\
MV
H% =
lj^i,k\(i,j),(j,k)eEVh]
unknown} H% = \j # i,k I O'.O.O'.fc) e EV". H?k3 =
\j?i,k\(i,j),(k,j)eEVh'
MVh is the set of pairs of alternatives whose preference degrees are not given by expert e^, EVh is the set of pairs of alternatives whose preference degrees are given by the expert e/,| H^k, H^k2, H^k are the sets of intermediate alternative Xj(j ^ i,k) that can be used to estimate the preference value p^k (i ^ k) using equations (2), (3), (4) respectively. The final estimated value of a particular preference degree p\k ((i, k) £ EVh) can be calculated only when #(i// l fc 1 + H$ + H^) ± 0: h
_ ZjeHyicptkV1
+ Sj6fl
EjeH»>(cP?k)J3
In the case of being ( # # # + # # ^ 2 + #H?kz) = 0 then the preference value P'ik ((*> k) e EVh) cannot be estimated using the rest of known values.
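A brief sketch of this estimation procedure, under the same additive-transitivity reading of equations (2) to (5), might look as follows; it treats any pair not present in the dictionary as missing and is an illustration rather than the authors' code.

```python
def estimate(prefs, i, k, n):
    """Estimate p_ik of an incomplete FPR from known values via additive transitivity."""
    estimates = []
    for j in range(n):
        if j in (i, k):
            continue
        if (i, j) in prefs and (j, k) in prefs:                      # (cp_ik)^j1
            estimates.append(prefs[(i, j)] + prefs[(j, k)] - 0.5)
        if (j, k) in prefs and (j, i) in prefs:                      # (cp_ik)^j2
            estimates.append(prefs[(j, k)] - prefs[(j, i)] + 0.5)
        if (i, j) in prefs and (k, j) in prefs:                      # (cp_ik)^j3
            estimates.append(prefs[(i, j)] - prefs[(k, j)] + 0.5)
    return sum(estimates) / len(estimates) if estimates else None    # cp_ik, eq. (5)

known = {(0, 1): 0.7, (1, 2): 0.6}       # p_01 = 0.7, p_12 = 0.6
print(estimate(known, 0, 2, 3))          # 0.7 + 0.6 - 0.5 = 0.8
```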
3 . Interactive S u p p o r t S y s t e m t o A i d E x p e r t s t o E x p r e s s C o n s i s t e n t Preferences In this section we describe in detail our interactive support system to aid experts t o express their fuzzy preference relations in a consistent way. Firstly we will enumerate all the design goals and requirements that we have taken into account and secondly we will describe the actual implementation of every requirement in the system.
3 . 1 . Design
Goals and
Requirements
Our design goals and requirements could be split in two different parts: Interface Requirements, and Logical Goals. Interface R e q u i r e m e n t s : These requirements deal with the visual representation of the information and the different controls in the system. We want our system to comply the so called "Eight Golden Rules"4 for interface design: • • • • • • • •
GR GR GR GR GR GR GR GR
1. 2. 3. 4. 5. 6. 7. 8.
Strive for consistency. Enable frequent users to use shortcuts. Offer informative feedback. Design dialogues to yield closure. Offer simple error handling. Permit easy reversal of actions (undo action). Support internal focus of control (user is in charge). Reduce short-term memory load of the user.
Logical Goals: • Goal 1. Offer recommendations to the expert to guide him toward a highly consistent and complete fuzzy preference relation. • Goal 2. Recommendations must be given interactively. • Goal 3. Recommendations must be simple to understand and to apply. • Goal 4. The user must be able to refuse recommendations. • Goal 5. The system must provide indicators of the consistency and completeness level achieved in every step. • Goal 6. The system should be easy to adapt to other types of preference relations. • Goal 7. The system should be easy to incorporate to Web-based GDM models and decision support systems 3 , 1 0
3.2. Actual
Implementation
We will now detail how we have dealt with every requirement and goal that we have presented in the previous section. To do so we will make use of a snapshot of the system (figure 1) where we will point out every implementation solution. Implementation of the Interface Requirements: LhoDiIng a Small Car
a
gfe €& c=* e& * » . °:r ^
r *t
J
CAT
I
a
ttoA
..ra
'".
£3* ^ •-L'" ( iimpltniif i * I evil
Figure 1.
Snapshot of the Support System
GR 1. The interface has been homogenised in order to present a easy to understand view of the process which is being carried. We have introduced 3 main areas: In area number (1) we present the fuzzy preference relation that the expert is introducing, as well as a brief description of every alternative. Area number (2) contains several global controls to activate/deactivate certain functions, as well as to finish the input process. Area number (3) contains different measures that show the overall progress (see below). G R 2. Shortcuts have been added to the most frequent options and the input text areas for the preference values have been ordered to access to them easily using the keyboard. G R 3. Our systems provides recommendations (4) and consistency and completeness measures (5) (see below). All controls have tooltips. G R 4. With every change that the user makes to his/her preferences the system provides new recommendations and measures. G R 5. Incorrect inputs are prompt with error messages. G R 6. We have introduced undo and redo buttons (6). G R 7. The user can choose at every moment which preference value
431 wants to give or update, as well as enabling/disabling options. • G R 8. All information is presented in a single screen. Logical Goals: • Goal 1. To offer recommendations, the system computes all the missing values that could be estimated by using equation 5 and it presents them in area (1). As the values are computed taking into account the additive transitivity property, the recommendations should tend to increment the overall consistency level. They are presented in a different color (gray) (4) to be easily distinguishable from the proper expert values (7). • Goal 2. When the expert introduces or updates a preference value all possible recommendations are recomputed and presented. • Goal 3. Recommendations are given in the same manner as the user inputs his/her preferences. There is also a button that enables the user to accept or validate a given recommendation (8). • Goal 4. A user can choose any value for a particular preference degree ignoring all the recommendations. • Goal 5. In previous works 2 we provided some measures of the consistency and completeness of fuzzy preference relations (5). The consistency measure for a particular F P R Ph (called clh) is based on the error that can be computed between the p^k values that the expert e^ provides and the cp\k values that can be estimated using expression 5. The completeness measure (Ch) is obtained as a ratio between the number of values given by the expert (#EVh) and the total number of values that the expert should give to have a complete FPR. In our system we also combine these two measures into a global consistency/completeness measure that informs the expert with his/her current degree of consistency and completeness: CCh = clh • Ch
(6)
• Goal 6. As the system is programmed following the principles of Object Oriented Programming, to adapt it to new kinds of preference relations is an easy task. • Goal 7. As the system is Java based, it is easy to incorporate it into a web-based environment. 4. Conclusions and Future I m p r o v e m e n t s In this paper we have presented an interactive support system which aids experts to provide consistent preferences and to help them to avoid incom-
plete information situations in GDM environments where the opinions must are provided as fuzzy preference relations. The system works providing easy recommendations while the expert gives his/her preference values, always trying to maximize the consistency of the expert's opinions. In the future we will extend the system to allow the use of different preference relations (linguistic, interval-valued and multiplicative preference relations, for example) and we will integrate it into a complete consensus reaching process to enrich the preference acquisition step in the process. Acknowledgments This work has been supported by the Research Project TIC2003-07977, and the EPSRC research project "EP/C542215/1". References 1. Alonso, S., Chiclana, F., Herrera, F., Herrera-Viedma, E.: A consistency based procedure to estimate missing pairwise preference values. International Journal of Intelligent Systems, in press. 2. Alonso, S., Herrera-Viedma, E., Chiclana, F., Herrera, F., Managing Incomplete Information in Consensus Processes. Proc. of Simposio sobre Logica Fuzzy & Soft Computing (LFSC2005), Granada (Spain) (2005) 175-182. 3. Bhargava, H.K., Power, D.J., Sun, D.: Progress in Web-based decision support technologies. Decision Support Systems, In Press. 4. Chen, Z.: Interacting with Software Components. Decision Support Systems 14 (1995) 349-357. 5. Chiclana, F., Herrera, F., Herrera-Viedma, E.: Integrating three representation models in fuzzy multipurpose decision making based on fuzzy preference relations. Fuzzy Sets and Systems 97 (1998) 33-48. 6. Herrera-Viedma, E., Herrera, F., Chiclana, F., Luque, M.: Some issues on consistency of fuzzy preference relations. European Journal of Operational Research 154 (2004) 98-109. 7. Herrera-Viedma, E., Chiclana, F., Herrera, F., Alonso, S.: A group decision making model with incomplete fuzzy preference relations based on additive consistency. Technical Report #SCI2S-2004-U- University of Granada (2004). http://sci2s. ugr. es/publications/ficheros/TechnicalReportSCI2S2004-ll.pdf 8. Kacprzyk, J.: Group decision making with a fuzzy linguistic majority. Fuzzy Sets and Systems 18 (1986) 105-118. 9. Tanino, T.: Fuzzy preference orderings in group decision making. Fuzzy Sets and Systems 12 (1984) 117-131. 10. Zhang S., Goddard, S.: A software architecture and framework for Webbased distributed Decision Support Systems. Decision Support Systems, In Press.
A M O D E L OF D E C I S I O N - M A K I N G W I T H LINGUISTIC INFORMATION B A S E D O N LATTICE-VALUED LOGIC
J U N MA, SHUWEI CHEN, YANG XU Department E-mail:
of Mathematics, Southwest Jiaotong University Chengdu 610031, China [email protected], [email protected], [email protected]
In this paper, a model for decision-making with linguistic information is discussed based on uncertainty reasoning in the framework of lattice-valued logic through an example. In this model, decision-making process is treated as an uncertainty reasoning problem, in which decision-maker's background knowledge about the problem at hand and consultancy experts' assessments on alternatives are regarded as the antecedents of the uncertainty reasoning, the final decision is taken as the conclusion of the uncertainty reasoning, respectively.
1. I n t r o d u c t i o n Linguistic decision-making study is an important issue in linguistic information processing. A great number of works have been presented and applied in various fields, such as multi-source information fusion and aggregation [2,3,9,11]. As far as the primary processing strategies are concerned, most existing approaches are computational means and always have a common presupposition that the final decision is hidden in the evaluated alternatives, which can be picked out by computation. However, in our opinion, there is a great deal of common ground between this process strategy and uncertainty reasoning: 1) the final result is predefined and no unexpected conclusion need to be evaluated; 2) some transcendent knowledge is taken as starting point of processing, which is accumulated and solidified from human's experience; 3) operating is established in a certain interpretation and application environment, which links the transcendent knowledge and the final result. So, it is rational to study decision-making in terms of uncertainty reasoning in the framework of the classical logic or some non-classical logics. In this paper, we shall discuss a linguistic model for a kind of multicriteria decision-making problem based on lattice-valued logic. Concretely,
433
the rest of paper is organized as follows: in Section 2, we give some notations and assumptions firstly, and then illustrate the model through an example; in Section 3, we discuss the merits and disadvantage of the model. 2. M a i n R e s u l t s 2 . 1 . Model
Description
Firstly, we shall suppose that Q is a real decision-making problem, L is a lattice implication algebra (LIA) [5], E = { e i , e 2 , . . . , e/t} is the set of consultancy experts, A = {a\,a2,... ,o„} is the set of alternatives, T = {*ii *2; • - •} is the set of all linguistic terms, F = {/i, /2, • • •, fm} is the set of evaluation factors; G is the decision goal, W = {ui\,W2, • • •, wm] is the set of generalized weights for evaluation factors, W C T, and fi = {u>i,u>2,... ,Wfc} is the set of generalized weights for consultancy experts, OCT. Secondly, we give following assumptions. A s s u m p t i o n 2.1. For each e G E, it is associated with three
mappings:
(1) &^ : T<e) —> Int(L), where T^ CT is the set of linguistic terms the expert e prefers to use, and Int(L) is the set of all intervals of a lattice implication algebra L [4J- This mapping means intuitionally the expert e 's understanding of each linguistic term t £ T^. (2) W^ : W —> T^, which means the expert e's opinion on the importance of each factors. (3) s*V> : AxF —> 0>{TM), which means the expert e 's assessment to all alternatives on all evaluation factors. For convenience, we shall denote (/(a),TJ e J ) as the assessment of the expert e on a in terms of the factor f, T^f G &>{T^). A s s u m p t i o n 2.2. Each f G F is linked to a fact ((\fx)f(x) =» G(x),w), where w G W is the assigned weight by the decision maker to the factor f. A s s u m p t i o n 2.3. Each a £ A is associated with a set of synthesized assessments 5 ( e i ) ( a ) , S ( e 2 ) ( a )> . . . , S ( e f c ) (a), and S(a), where S^(a) is the synthesized assessment of the expert e, to a, and S(a) is the synthesized assessment of all consultancy experts. These synthesized assessments have the following forms: m
S^(a):
(hGvWMa)),
i = l,2,...,fc
435
and S(a):
(SM{a)A---AS(-ek)(a)^G{a),t{a)),
where t(a) 6 T is a set of linguistic evaluation factor fj, j = 1 , 2 , . . . , m. A s s u m p t i o n 2.4. The decision to T*- ' C T, such that each u t e T^ which means the effect decision. Hence, we can describe
term(s),
(1)
Gij(o) is the decision by
maker has defined a mapping from CI G fl is attached to a linguistic term of the consultancy expert e on the final the effect of the consultancy expert e by
((^X)S^{X)=>G{X),LJ).
So, the decision process is aiming at getting 5^ 6 l '(a) A • • • A 5^e*^(a) => G(a) and t(a) from the facts ((Va:)S<e<>(:r) => G{x),Vi), {{\fx)fj(x) => G{x),Wj), and ( / j f a p ) , ^ ^ ) , where i = 1, 2, . . . , k, j = 1, 2, . . . , m, p= 1, 2, . . . , n. 2.2. Example
for
Illustration
Due to the limitation of paper length, we shall illustrate the model through the following example, which is taken from [7]. E x a m p l e 2.1. Consider the evaluation of university faculty for tenure and promotion. The evaluation factors used at some universities are / i : teaching,
fa
: research,
fa
: service.
Five alternatives ai, a?, 03, 04, and 05 are to be evaluated using the linguistic terms: TE = {^o = extremely poor,
t\ = very poor,
£2 = poor,
£3 = slightly poor,
i 4 = fair,
£5 = slightly good,
te = good,
t-j = very good,
is = extremely good},
by four consultancy experts e\, e-z, e$, and e 4 . Step 1. (selecting LIA) The selected LIA L — {ZQ < z\ < • • • < z$} is shown in Figure 1. Here, we shall not distinguish the difference between the consultancy experts' preference and the evaluation factors' importance, and take the LIA as the common universe of discourse for convenience. In fact, we can select different LIAs for them.
436
Int(t) Figure 1.
Linguistic terms for evaluation.
Step 2. (interpreting linguistic terms) Each expert e £ E gives the definition for each linguistic term who prefers to use. Without loss of generality, suppose all consultancy experts' select the same linguistic terms and give the same definition for each, which are also marked in Figure 1.
weights for evaluation factors
Figure 2.
weights for consultancy experts
Weights for evaluation factors and consultancy experts.
Step 3. (interpreting linguistic weights) Each weight w £ W (u> £ £1) is a linguistic term, which reflects the effect of the corresponding factor (consultancy expert) on the final decision. Suppose the weights for evaluation factors and for consultancy experts are shown in Figure 2, where Si is "very important", S2 is "important", u\ is "very important", and u?, is "slightly important". The weights for each evaluation factor and for each consultancy expert are listed in Table 1.
Table 1. perts.
Weights for evaluation factors and consultancy ex-
expert weight
ei
e2
e3
U2
Ul
U2
e4 ui
factor
/i
h
h
weight
S2
si
S2
Step ^. (evaluating alternatives) Suppose the consultancy experts assessments are shown in Table 2 [7]. Table 2.
Assessments of all experts.
Expert ei's assessments: fi ax 12
03
14
15
fl
*4,*5
t7i*8
*7,t 8 *6,*7.*8
t3,t4,*5 *5|t6
t7,t8 *5,t6,*7 t6,*7
"3
14
15
*6,t7
*6,t7,*8
t6,*7,*8
*6,t7
t7,*8 *6,*7 t4,t5,t6it7
t7,ta
f2 *5,*6 *6,t7 h *5,*6>*7 t6,*7i<8 Expert e2's assessments: fi oi «2 fl h,t6 t4,45,*6 J2 t7i*8 /3 *4,t5 *6,*7,*8 Expert e3's assessments: fi ox 12 f\ *6,t7 *4,t5,*6 /2 /3
t7,*8 *5,*6
*6)*7
Expert e4's assessments: fi "l «2
/i fl /3
h,te t6 1 *7i*8 *S)t6i*7
*7.<8 *6,*7
13
04
15
*7,*8 *6,*7 *5>t6
*6,*7 t4,<5 *7,'8
*5,*6,t7 t4,t5,t6,t7
13
04
15
*6,t7,*8 *6>*7 *7,*8
t4,t5,*6 *6.*7 *5,*6
*5>*6,t7 *4,*5
Step 5. (rearranging assessments) We shall select the smallest interval of the selected LIA L to cover all linguistic terms for each expert's assessments. For example, the expert ej's assessment on the alternative a\ in terms of the evaluation factor f\ is covered by the interval [25,ZQ}. The rearranged assessments are shown in Table 3. Step 6. (constructing deduction) Taking expert e's assessments for alternative a and background knowledge for evaluation factors as antecedents, we can construct deduction of S^ia) from (/}'(a),&/W{a,fi)) and (iyx)f\e'(x) =4> G\e'(x), u>i), i— 1 , . . . , m. The construction has two stages: (1) { ( / | e ) ( « ) y W ( « , / i ) ) } ^ ! e ) ( 4 (2) {G<e)(a),i = l,...,m}r-S( e )(a).
i = l,...,m.
Table 3.
Rearranged assessments of each expert.
Rearranged expert e\ 's assessments. a$ 0-2 "3 h a\ h [25, 26] [23, 25) [23,,24] [25, 26] h [23,.25) [24, 26) 1*5,,26) 1*2, 24) h 1*3, 2 6 | [24, 26) [24,, 26J 1*3, 25] Rearranged expert e2's assessments. 04 a\ as 12 fi
as
h h h
[25, 26] [24, 26J [23, 26]
[24, 26] [25, 26] [23, 24]
[23, 25]
1*3, 26) [24, 26)
[24, 26] [23, 25] [24, 26]
[24, 26] [23, 25) [24, 26]
"5
[25, 26]
1*3, 26] [*4, ,2 6 j
Rearranged expert e3's assessments. (24 a\ 12 a.3 fi
as
h h h
[24, 26] [25, 2 6 ] [23, 2 5 ]
[23, 25] 1*3,,*6] [*3, 2 6 ]
fi
a\
<*2
h h h
[23, 25] [24, 26) [23, 26 J
[25, 26) [24, 26) [24, 26)
[25,,26] [24, z6] [23, 251 [24,,26] [*3, 24] [23, 25) [23,,25] [25, 26] [24, 26) Rearranged expert e4's assessments.
az [24, 26] [24, 2 6 )
N, 2 ) 6
04
«5
[23, 25] [24, 26)
[23, 25] [23, 26) [23, 24]
1*3, 25]
By properties of lattice-valued logic based on LI A, the construction is trivial. Readers can refer to [4,6] for more details. Step 7. (computing t(a)) According to the constructed deduction sequences of S^{a), the synthesized assessments for each alternative are listed in Table 4. Table 4.
Synthesized assessments for each alternatives.
ei
ai
02
«3
04
05
ei
[23,25] [20,20] [24,24] [23,24]
[23,24] [22,24] [22,24] [23,25]
[20,20] [22,25] [23,24) [23,25]
[23,24] [22,25] [23,24] [23,24]
[22,25] [23,25] [22,24) [22,23]
e2 e3 e4
Step 8. (aggregating consultancy experts assessments) To aggregate the synthesized assessments of all consultancy experts, we shall construct another deduction of S^ei)(a) A • • • A 5 ( e t ) ( a ) => G(a) and t(a) from the facts ((yx)S^6i^(x) =>• G(x), u>i), i = 1, 2, ..., k. Because the construction is trivial, we only give the aggregated results in Table 5, where the numbers Vij are occurrence of the corresponding linguistic terms Sj in the synthesized assessments for a{.
Table 5. Aggregated synthesized assessments for all alternatives. Vij
so
«1
«2
«3
S4
S5
S6
S7
S8
en
1 1 1 0 2
0 1 0 0 2
1 3 2 3 3
3 4 3 4 4
3 4 3 4 4
2 2 2 2 2
0 1 2 1 1
0 0 0 0 0
0 0 0 0 0
12
a.3 a.4
as
Step 9. (selecting appropriate decision) To make the final decision, the decision maker has many methods. Here, we number each linguistic term's index as its score, i.e. [SJ] — j (j — 0 , 2 , . . . , 8), and compute the average score for each alternative: s(a,i) = 5^i = o( v u case the average score for each alternative is s(oi) = 3.30,
s(a2) = 3.19,
s(a3) = 3.62,
x
lsj])/Ylj=ovij-
S° in this
s(a 4 ) = 3.57,
s(a5) = 2.89.
Hence 0,3 is the best alternative. The conclusion is the same to that in [7]. If we take the sum of scores as the selecting criteria, then 02 is the best, which is the same to [1]. From this example, it can be concluded that different selecting criteria will lead to different results. 3. Conclusion In the present work, we discussed a model for processing linguistic information in a kind of decision-making problem based on lattice-valued logic. In this model, linguistic information needn't be placed symmetrically in linear order, although the example is illustrated based on a linear placed linguistic term sets. Moreover, each experts, as in fact a source of information, can use his/her own operating logic to present his/her opinions. All these strategies are easy to be applied to multi-source linguistic information processing. Notice t h a t the reliability of the conclusion in an uncertainty reasoning will decrease rapidly with the increasing of the length of inference sequence, we should construct as short as possible inference sequences in order to keep the reliability above a satisfying level. Hence, the choice of the underlying logic system plays an crucial role in the model. More work is needed. Acknowledgement We sincerely appreciate the anonymous referees for their valuable comments and suggestions. The work is supported by the National Natural
440 Sciences Foundation of China with granted number 60474022, and ChinaFlanders Bilateral Scientific Cooperation Joint Project with granted number 011S1105. References 1. N. Bryson and A. Mobolurin, An action learning evaluation procedure for multiple criteria decision making problem, European J. Operational Research, 1995, 96: 379-386 2. F. Herrera, E. Herrera-Viedma, and L. Martinez, A Fusion approach for managing multi-granularity linguistic term set in decision making, Fuzzy Sets and System, 2000, 114: 43-58. 3. F. Herrera and E. Herrera-Viedma, Linguistic decision analysis: steps for solving decision problems under linguistic informaiton, Fuzzy Sets and Systems, 2000, 115: 67-82 4. J. Ma, Studies on Lattice-Valued Logic and Its Applications, Post-Doc Research Report, Southwest Jiaotong University, Chengdu, 2005 5. Y. Xu, Lattice implication algebra, J. Southwest Jiaotong University, 1993, 28(1): 20-27 6. Y. Xu, D. Ruan, K. Qin, and J. Liu, Lattice-Valued Logic - An Alternative Approach to Treat Fuzziness and Incomparability, Germany: Springer, 2003 7. Z. Xu, Uncertain linguistic aggregation operators based approach to multiple attribute group decision making under uncertain linguistic environment, Information Science, 2004, 168: 171-184 8. R.R. Yager, Inference in a multiple-valued logic system, Internal. J. ManMachine Stud., 1985, 23: 27-34 9. R.R. Yager, On ordered weighted averaging aggregation operators in multicriteria decision making, IEEE Trans. Systems, Man, and Cybernetics, 1988, 18: 183-190 10. R.R. Yager, An Approach to Ordinal Decision Making, Internal. J. Approximate Reasoning, 1995, 12: 237-261. 11. X. Zeng, Y. Ding, and L. Koehl, A 2-Tuple Fuzzy Linguistic Model for Sensory Fabric Hand Evaluation, in: D. Ruan and X. Y. Zeng, eds., Intelligent Sensory Evaluation-Methodologies and Applications, Germany: SpringerVerlag, 2003.
INFORMATION INTEGRATION BASED TEAM SITUATION ASSESSMENT IN AN UNCERTAIN ENVIRONMENT JIE LU, GUANGQUAN ZHANG Faculty of Information Technology University of technology Sydney, POBox 123, Broadway, NSW 2007, Australia Email
{jielu,zhangg}@it.uts.edu.au
Abstract: Understanding a situation requires integrating many pieces of information which can be obtained by a group of data collectors from multiple data sources. Uncertainty is involved in situation assessment. How to integrate multi-source multi-member uncertain information to derive situation awareness is an important issue in supporting decision making for crisis problems. The study focuses on how uncertain situation information is presented, integrated and finally how situation awareness information is derived. A multisources team information integration approach is developed in the study to support a team's assessment for a situation in an uncertain environment. A numerical example is then shown for illustrating the proposed approach. Keywords: Information integration, situation assessment, fuzzy numbers, team
1. Introduction Decision making for a crisis problem often depends on the awareness of decision makers for a situation. Situation awareness (SA) is defined by Endsley [3] "as the perception of elements in the environment, the comprehension of their meaning in terms of task goals, and the projection of their status in the near future." The process of achieving a SA is called situation assessment or situation analysis. Situation assessment is based on acquired situation information that can be implicit or explicit. Awareness information is derived as the results of situation assessment. SA has been largely studied as an important element in diverse military and pilot systems using observation, experiments and empirical methods. Recently SA has been recognized as an element to support emergency management and crisis problem finding. This trend requires the development of general situation assessment models and approaches. Although the elements of SA vary widely between domains, the nature of SA, as the output of a kind of information systems, and the mechanisms used for achieving SA can be described generically. The enhancement of SA information processing approaches and techniques has become a major design goal for situation assessment systems [6]. However, comparing with the results conducted in aviation environments by experiment methods, the research in general situation information processing has received less exploration. This study will combine the main features of both descriptive and prescriptive assessment approaches to show a general process for team situation 441
442 assessment under some uncertain factors in deriving situation awareness for supporting decision making. SA is aware of a situation, can be an objective or an environment, based on the perception of what is happening or has the potential to happen [9]. If a department or a command centre is to make the right decisions for a crisis problem, it must be able to assess the situation, the threats and the opportunities faced, and create awareness for the situation. Information about a situation often comes from multiple data sources. Awareness or decisions based on information from a single source may be sub-optimal or incorrect. It is often the convergence of evidence from various sources that provides an accurate and reliable result [2]. Also, information about a situation is often collected by multiple observers or data collectors as a team. A collector observing a situation and collecting date from the same source but in different time slots may report different results. Because the collector may have difference personal views for a situation in different time slots and a situation may change in different time slots. Therefore, team SA is proposed to concern how to integrate awareness of individuals in such a complex situation [1]. The data fusion community is more commonly to refer such a situation as situation assessment [4]. Uncertainty is involved in both situation assessment elements and situation assessment process. First, information about a situation is hard to be captured precisely and completely. Situation observers or collectors may don't know exactly what information will relate to a possible problem and may put their personal views into the information they obtained. Accepting as a given that the situation information is incomplete and uncertain, the perception of SA can therefore only be obtained through processing the uncertain or incomplete information. Second, when multiple data collectors are involved in a situation assessment, they can be given different relevant degrees or weights based on their experiences and knowledge for the situation. Experienced collectors will have higher weights, while inexperienced collectors, with lower weights. The third, when multiple information sources are used for assessing a situation, these sources may have different believe degrees. Data collected from some sources may be more reliable and truthful than of others. Therefore, an appropriate aggregation of information from different sources and different collectors with a consideration of the effects of uncertain factors contributes crucially to the correct assessment for situations. Many aggregation or fusion approaches have been reported in literature. This research aims to develop a team situation assessment approach to integrate uncertain team assessment information for a situation. Following the introduction, Section 2 proposes multisources, multi-member information integration issue. An information integration approach for team situation awareness is presented in Section 3. A numerical
443 example is shown in Section 4 for illustrating the proposed approach. Conclusions and further study are discussed in Section 5. 2. Multiple data sources and team information integration Understanding a situation requires mentally integrating many pieces of information, including both that information exists for a relevant situation, how it is interrelated to the situational context, the information sources' situation, and the information collection team members' background. Information integration is an increasingly important element of SA systems. The process of information integration uses overlapping information to detect, identify and track relevant objects in an environment which may be reflected in multiple sources. A team's situation assessment is based on individuals' assessment result which aggregates each individual's assessment for multiple sources. Performing assessment individually on each data source will minimize the potentially harmful impact of background noise derived from various sources. As this study will incorporate uncertainty of information collected into an integration process, the information integration process can be completed under four levels which can be implemented by four steps: 1) each individual collector's information integration for his/her multiple observations to each single data source (or multiple sources but with the same believe degree); 2) each individual collector's information integration for his assessment to all data sources; 3) individuals who have the same relevant degree (weight) is aggregated; and 4) all individual collectors' situation assessment information is weighted integrated to conduct a synthesis assessment of the team. This approach first focuses on determining individual assessment from single data source to multiple sources for a situation. It then considers the synthesis information assessment result of the team. Workload, time, stress, inexperience at assessing a problem all affect the assessment results. Some team members in some situations may not either know what information relate to the problem or may not provide all relevant information to the team. Also the team members may obtain different information from the same source at the same time, or have different understanding for an object. As a result, they may have different judgments and awareness for a situation. Therefore, the synthesis assessment results of a team must consider both the information sources' believe degree and members' relevance degree (weight). This approach also considers 'time-scale' issue to analyze individual information gathering. In actual work situations, such as an emergency coordination centre, a team often works exclusively on one or few time-scale, and distributes goals and information to higher or lower levels in the hierarchical web, who are then able to consider other time-scales. The team situation forming process is a consequence of the information filtering, coordination and aggregation. This proposed approach identifies team situation
444 assessment factors, including the believe degree of data sources, the time-scale of data collections, and the weight of collectors. It will apply target recognition approach [5] to conduct team assessment's coordination with consideration of uncertain factors. As team members may share many similar characteristics with persons functioning as individuals in dynamic environments, this approach adopts similar approaches for group uncertain issues, such as weights, as of a lone collector when the group of members has similar backgrounds. This approach uses these concepts to be developed for coordinating team information and deriving a team's situation assessment. This approach uses fuzzy numbers to deal with uncertain information obtained from multiple data sources, believe degrees of these sources and relevance degrees of individuals. To consider the believe degrees of sources and weights of members is to minimize the influence of incorrect understanding for a situation. The proposed situation assessment approach can therefore help teams achieve better decision-making processes in a crisis problem solving. 3. Multi-Sources Team Uncertain Information Integration Approach Let S = {S\,S2, ...,Sm},m > 2, be a given finite set of data source; C = {CUC2, ..., C„} be a given finite set of data collectors. We suppose that these data sources can be divided into s groups by their believe degrees. Such as, there are m{ sources with a high reliability, m2 sources with a rather high reliability, and ms sources with a low reliability, m=m\+..+ms. Similarly, these collectors is devised in to t types such as there are ti\ strong collectors, m rather strong collectors, and wt weak collectors, etc. n=n\+..+ns. Collector C, (/' = 1,2, ..., ri) obtains information ay from source 5} (/'= 1, 2, ...,m),ay(i= 1, 2, ...,n;j= 1, 2, ..., rri) is a fuzzy number and can take values from five fuzzy subsets (linguistic terms): SN {surely have not), MN {may have not), NS {not sure), MH {may have), and SH {surely have). It was recognized that subsets with normal-type membership functions ^:[0,1]->[0,1] The centers of membership functions z of the associating numerical values to linguistic labels of the fuzzy subsets in an intuitive way: 0: SN, 0.25: MN, 0.5: NS, 0.75: MH, 1: SH. The individual collectors' assessment values (linguistic) are effectively combined to produce a synthesis values. The approach is described by following four steps: Step 1: integration of each individual's information of multiple observations for single source (or multiple but with same believe degree) The approach firstly integrates the assessment information of each collector obtained from one or multiple data sources with same believe degree. The multiple values of each collector present the collector's observation for a object in different
445 time slots, and for multiple objectives which have equivalent believe degree. These values are combined according to the believe degree (reliability) of the respective data sources. These data sources can be classified into high reliability sources, medium reliability sources and low reliability sources for example. The step aims to obtain individuals' assessment for a situation from one or few equivalent data sources the average operator is therefore used as follows. m
\ a.,
m
m
i a.,
'-t a.
.
m
* a
C , 1 = E ^ ; C 1 , = S - ^ ; . . , Q _ 1 = 2 : ^ ^ and Cls = £ ^ - ,
/ = 1, 2, . , „,w
here m\ = mi — lt, } . m, = m, /,- is the number of empty value for ith source. Cy (i = 1,2,..., n,j = 1,2,..., s) is the ith collector's average assessment value for they'th group of data source. Step 2: integration of each individual's information for multiple sources We suppose Rh R2, .... Rs represent the believe degrees of these group of data sources using fuzzy numbers with normal-type membership functions. Obviously, from data high reliability sources will be integrated with a higher priority to the integrated result for each collector. This step aims at producing a value for each collector by combining all integrated values obtained in Step 1. By the step, each collector has obtained a value as his/her assessment. C^t^fCy,
i = l,2,-,«.
H
Step 3: integration of equivalent individuals' information To integrate all collectors' assessment for an objective/environment, the difference between team members should be considered. Each group of members who have same relevant degree (weight) will be integrated as follows. J\ a„ i a„ vi a. J^, a„ a n dC C^Z—'Cm-lL — '-'C..,-!,^ » , = Z —> where £| = I nt - n. Step 4: integration of all weighted individuals' information to conduct a synthesis assessment for the situation We suppose wh wj,..., w, represent the weights of collector groups from strong to weak respectively. To achieve a value as the team situation assessment result, the step combines the results obtained in Step 3 with weights. Of course, higher priority is given to most strong collectors. i=l
The synthesis value obtained is as the assessment result of the team for a situation. Obviously, it is a fuzzy set. We then provide a crisp value through defuzzyfication
446 (centre of mass) as a team assessment result by using the following formula [7,8]: i
\xftR(x)dx \HR{x)dx o
4. An illustrate Example There are three data collectors to assess an environment from three data sources through multiple times observations. The nine collectors have three members with weights 'high', three 'middle', and three 'low'. The three sources have believed degree 'high', 'middle' and 'low' respectively. Basically, each collector visits source 1, 2 and 3 for three, two and three times respectively. Therefore, each collector has an assessment result: (SI, S2, S3) from source 1 with a high believe degree, (S4, S5) medium believe degree and (S6, S7, S8) low believe degree. As possible ata missing, some values could be empty. 0
x<0
SN = -4x + \
0<x
i.e.,
SN = u
X
,le[0,l]
0 0
1/4 < x x<0
4x - 4JT + 2
0 < x < 1/4 , , ..e„ 1/4 < x < 1/2
0
1/2 < x
MN--
0 NS--
MN=
4
4
X_ _X_ J_
u X
4
2
x < 1/4
4x-\
1/4 < x < 1/2 ' , , - 4* + 3 \/2<x< 3/4 3/4<x 0 x<\/2 0 4x - 2
MH = {
-4x + 4 0
\/2<x< '
3/4<x
3/4 ' ,
i.e.,
i.e.,
NS=
SH = \4x-3
— I _i. 2
u X ,»e|0,i] 4
MH =
4'
4
4
X I X , +1 4 2 4
<J X — + —,
Ae[o,ij
\<x
x<3/4 0
o,-A+I
3/4<x<\, \<x
i.e.,
A
SH = u X WW \_4
A. 4
Now we use the proposed approach to calculate the integrated situation assessment result for the team. Step 1: integration of each individual's information for single source C.„ =-NS '•"3
X 1 + -MH + -NS= <j X 3 3 -"(".'I —+ - , 4 3
X
5 +— 4 6
C,U=-SN '• W
+ -MN=
2
2
u X
C „ =-NS + -MN + -SH= u /I 1 ' 3 3 3 ^o,i| 4
4 + 8.
8'
Ae[0,l]
i. 1 -1 1 4 8
C1H=-NS • 2
+ -MN= u /t • + 2 *«[o.i] L 4 8'
CiL=-DY ' 2
C, H = DX =
+ ~DN= u J " i . + l , - i + i l 2 A«|o,i] [8 8 8 8 J
C,U=UN=
A 3
A* -', u A—+ -U[o.i] [ 4
+1 -,
4
4
2
2
-KM] [ 4
i6[o,i] [ 4 C.6 H=RY= '"
4
8
46[0,IJ [ 4
8
]•
2
'
- 3]
-uio.i] [ 4
4
4
c7U=-w+-Rr=
4
-111 4 2J
J
4 4J
C6M=D>'= w A | - + 2,I],
u 4 - +2 . - - + 1
J'
2
J40.il L4 4
A«to.ij [ 4 1
1
X\— + —,
io,-A+J
2 4
4
,U 3
u
J] L 4 < 1 ™, ' , , » , ,[•* 1 • 2 2 A6[o,i] [ 8 8 C,„=RY = u i i . + I , - i . + i l
c4,=-!-«r + 2Dr= u J i - + - , - - + i ,,t
6
C,L=DN= u n \_4 4 • ;u(0,
4J
Jx 5 A + -DY = u J A —+ - , o.i) L4 8 8
C.H=-RY
3'
+ -NS= u ^ 1 ++ 2 _ i+ 2 2 -t=lo,i] 4 8 ' 4 8
C2M=-MH ' 2
+
L -A + 1
c7H =iDK + -!-Dy+-oy = Dr= uu x\± + lA
4J
7,«
u 4-+-.--+-
3
3
4
J
Ciu=-RN "•" 2
+ -DY + -UN= V i - + - , - i - + i i 3 3 *«[».u L4 12 12 12 Jx I /I + -UN = u 2 -wo,J) L4 8 4
' * +1 Cw8 , = i i w + i DK= u XJ* — +—, —, 2 2 iMOj) [ 4 2 8 8J
C,„=-DN '"3
+ -UN + -DN = u / — + — . - - + 3 3 w Ll2 12 4 12
C,„=*r=
c,,=/(y= u 4-+-,--+)]
'
2
2
CSH=-DN '" 2
Ae[0,i] [ 4
8
4
C1L=-DY • 3
[o.i] L4
3
8
+ ^-UN = u / i . + I . - i . + i 2 Aeio.i] [ 8 8 4 2 7
u i | - +2 , - - + l
•ie|o.i] [ 4
2
4
w
^o.i] [4
2
4
J
Step 2: integration of each individual's information for multiple sources 1
2 '•"
3 '•"
6 ''
KMiT_24
9
72
„ 1„ 1„ 1„ J5A 11 C, = - C j „ + - C . „ + - C , , = u M—+—, i 2 3jt 3 jjr 6 u ^o.l]^24 24 _ 1„ 1„ !,, SX 7 2 3 6 a<*o,ij ^4 24 r -lr
*'r
-wjf'1*43
J r
7,« 3 7.K
2
6
7J. ^ 0 | ] |^4
?2.
3J
2
!
"
3
2l
*
2,i
6
3A 19l _ 1„ 1_ 1_ +—L C =-C +-C ,+-C ,= 24 24J 4 2 4K'•" 4t3 **4 6 w A 19] „ 1_ 1_ 1_ 4 24J 2 3 6 n
68 +
?2
1
7 2
r-'r
j.
«
2
.V
J r
>.« j I.M
6
i « i ] [48
-..jf"* .,t
4
48
,[5.1 11 u X]— + — , i.+ 1 /.fo.il [24 24 6 6 J i 13 1 23] •uio.i] [4 24 6 i<)0|]
|^ )6
3 ]6.
IU 4g
4.»1 4 g
j.
C,=-C,j,+-C,u+-C,,u A [ - + — , - - + —1 ' 2 " 3 " ( '-1 ^o.i] L6 24 4 24j
Step 3: integration of equivalent individuals' information j
1 3 2
3
1^, 3
,r3U 67 _ 8 M 52
J MX ^ 31 _7£ + 62' M
3 4 3 4 3 A«(o,i) [ 7 2 72 36 72 1„ 1„ 1^ J29A 155 83i 325
C„ + , + w =-C,+-C. +-C. = u A 3 3 ' 3 i
Step 4: integration of all weighted equivalent individuals' information /? = - Q + - C M + 2 c „ = u A 2
3
6
^l».'i
95/t
929
506/t
2005
432
6x432'
6x432
6x432
24 _
448 After defuzzyfication, a crisp synthesis value obtained for the team's assessment: x'R =0.5783. This describes the assessment result of the team for a situation. It could be the possibility of a risk which has the potential to happen. 5. Conclusions and further study This project develops a multi-sources team uncertain information integration approach by applying fuzzy set technique handing uncertain factors involved. It aims to minimize the influence of incorrect assessment results for a situation, and therefore support achieving better decision-making in crisis problems. Acknowledgements The work presented in this paper was supported by Australian Research Council (ARC) under discovery grants DP0559213. References [1] H. Artman, Team situation assessment and information distribution, Ergonomics, Vol. 43, No. 8, 1111-1128,2000. [2] S.Chang and S. Greenberg, Application of fuzzy-integration-based multipleinformation aggregation in automatic speech recognition, The IEEE Conference on Fuzzy Integration Processing, Beijing, 2003. [3] M. Endsley, Toward a theory of situation awareness in dynamic systems, Human Factors, Vol. 37, No.l 32-64, 1995. [4] M. Endsley and D. Garland, Situation awareness analysis and measurement, Lawrence Erlbaum, Mahway, New Jersey, 2000. [5] M M. Kokar and J.Wang, An example of using ontologies and symbolic information in automatic target recognition. SPIE Conference on Sensor Fusion: Architectures, Algorithms, and Applications VI, April, 40-50, 2002. [6] J. McCarley, C. Wickens, J. Goh and W. Horrey, A computational model of attention/situation awareness, The 46lh annual meeting of the human factors and ergonomics society, Human Factors and Ergonomics Society, 2002. [7] S. Murakami, S. Maeda and S. Imamura, Fuzzy decision analysis on the development of centralized regional energy control systems, The IFSA Symposium on Fuzzy Information, Knowledge Representation and Decision Analysis, Pergamon Press, New York, 363-368,1983. [8] R. Yager, and D. Filev, On the issue of denazification and selection based on a fuzzy set, Fuzzy Sets and Systems, Vol. 55, No. 3, 255-272, 1980. [9] W. Zhang and R. Hill, A template-based and pattern-driven approach to situation awareness and assessment in virtual humans, The fourth international conference on Autonomous agents, Spain, ACM Press New York, USA, 116123,2000.
SCHEDULING A FLOWSHOP PROBLEM WITH FUZZY PROCESSING TIMES USING ANT COLONY OPTIMIZATION SEZGlN KILIC Department
of Industrial Engineering, Air Force Academy, Hava Harp Okulu, Istanbul, 34149, Turkey, [email protected]
Yesilyurt
CENGIZ KAHRAMAN Department of Industrial Engineering, Istanbul Technical University, Islt. Fak., Macka Istanbul, 34367,Turkey, [email protected] Most of the work about flowshop problems assumes that the problem data are known exactly at the advance or the common approach to the treatment of the uncertainties in the problem is use of probabilistic models. However, the evaluation and optimization of probabilistic model is computationally expensive and the application of the probabilistic model is rational only when the descriptions of the uncertain parameters are available from the historical data. In this paper we deal with a permutation flowshop problem with fuzzy processing times. First we explain how to compute start and finish time of each operation on related machines for a given sequence of jobs using fuzzy arithmetic. Next we used a fuzzy ranking method in order to select the best schedule with minimum fuzzy makespan. We proposed an ant colony optimization algorithm for generating and finding good (near optimal) schedules.
1.
Introduction
Flowshop problems are made up of n similar jobs which have the same order of processing on m machines. The objective is to find a sequence of jobs which minimizes some measure of production cost such as makespan or mean flow time. In this paper we are interested in the permutation flowshop scheduling (PFS) problem similar to most of the research on flowshops and real-world production management practices, where the same job order is chosen on every machine. Hence a schedule is uniquely represented by a permutation of jobs. The problem is NP-hard, only some special cases can be solved efficiently [1], Although the PFS problem has often been investigated, very little of this research is concerned with the uncertainty characterized by the impression in problem variables. In most of the work about PFS it is assumed that the problem data are known precisely at the advance or the prevalent approach to the treatment of the uncertainties in the PFS problem is use of probabilistic models. 449
450 However, the evaluation and optimization of probabilistic models is computationally expensive and the use of probabilistic models is realistic only when description of the uncertain parameters is available from the historical data. In this paper we propose a schedule generation algorithm for the cases where the problem data are not known precisely and probabilistic models are not suitable because of its computational expense or absence of historical data. It is assumed that the planner is able to approximate the imprecise data by using fuzzy sets. The first application of fuzzy set theory on a flowshop problem as means of analyzing performance characteristics was by McCahon and Lee [2]. Ishibuchi et al. [3] examined flowshop problems with fuzzy due dates. Balasubramanian and Grossmann [4] applied a fuzzy approach to the treatment of processing time uncertainty. We proposed an ant colony algorithm for fuzzy permutation flowshop scheduling (FPFS) problem. As discussed above there is a little search on FPFS problem and there have not been any work on FPFS using an ant colony optimization (ACO) approach. 2. Formulation of the fuzzy permutation flowshop problem Using fuzzy numbers to represent the uncertainty in processing times is very plausible for real world applications. If a decision maker estimates the processing time (fa) of the job j on machine k as an interval rather than a crisp value then the interval can be represented as a fuzzy number. The use of interval \tjk — At ,ki, tJk + Atjk2 J is more appropriate than the crisp fa value. The fuzzy processing time, fk, can be represented by a triangular fuzzy number (TFN); TJk=(tjk-&Jtt,tJk,tJk+AtJk2)-
The addition and the maximum operators are required for calculating the fuzzy completion time. The addition operator does not distort the shape of triangular fuzzy numbers. In contrast, max {A,B} is not always a triangular fuzzy number while both A and B are triangular fuzzy numbers. Sakawa and Kubota [5] proposed an approximation for max {A,B} which keeps the triangularity as follows; max{^,5}= (a\a2 ,c?)v (b\b\tf)
* (a1 vb\a2 vb\a' vfe3)
(1)
When fuzzy data are incorporated in to the scheduling problem the measure of the makespan of alternative schedules are also fuzzy numbers. There exists a large body of literature that deals with the comparison of fuzzy numbers. More recently, Fortemps and Roubens (1996) have proposed the "area compensation (AC)"method for comparing fuzzy numbers based on compensation of the areas determined by the membership functions. They have shown that AC is a robust ranking technique which has compensation, linearity and additivity properties,
451 compared to other ranking techniques, yields results consistent with the human intuition. The AC of a fuzzy number is defined by; i
AC(X) = 0.5 \{xLa + xRa)da
(2)
0
If we are interested in minimizing the makespan, a schedule with fuzzy makespan Xj will be preferred over a schedule with fuzzy makespan X2 if AC(X!)
be the fuzzy completion time of job n(j);
c ... be the fuzzy completion time of job n(j) on machine k; We can find the fuzzy start and completion times for each job on each machine for a permutation n as follows;
c , ( A t = m a x f c y . ^ . c , ( A t _,) + TK{j)k y/ > 1, V* > 1 C
n(\),k -
C
+ £
n(\),k-\
-K(m ~ c«-(y-D,i C
+ L
«•(!),*
\/k > 1
*o),i
v/ > l
~ln(\\\
n(\\\
(3) (4) (5)
(6)
Once all jobs have been scheduled, makespan (M) can be obtained as follows: C
=c
M = maxC„,„ j=l,-.n '»)
(7) (8)
3. Proposed algorithm 3.1. Ant colony optimization approach Ant Colony Optimization (ACO) is a population based, cooperative search metaphor inspired by the foraging behavior of real ants. Real ants leave on the ground a deposit called pheromone as they move about, and they use pheromone trail to communicate with each other for finding shortest path between food resource and their nest. Ants tend to follow the way in which there is more pheromone trail. In ACO algorithms, artificial ants with the above described
452 characteristics and some specific additional features collectively search for good quality solutions to the optimization problem [7], The seminal work on ACO is Ant System (AS) [8] that was first proposed for solving the Traveling Salesman Problem (TSP). In Ant System, the ants are simple agents that are used to construct solutions, guided by the pheromone trail and heuristic information based on intercity distances. Since the work on AS, several extensions of the basic algorithm have been proposed with different names. The main difference between AS and these extensions are the ways the pheromone update is performed and, some additional details in the management of pheromone trails. Common structure of these ACO algorithms can be illustrated as follows [7]; Step 1. Initialize the pheromone trails and parameters. Step 2. While (termination condition is not met) do the following: Construct a solution, Improve the solution by local search, Update the pheromone trail or trail intensities. Step 3. Return the best solution found. 3.2. Description of the proposed ACO algorithm We proposed an algorithm based on MAX-MIN Ant System (MMAS) [9]. We made modifications as described below in order to schedule a FPFS problem. 3.2.1. Initializing the pheromone trails and parameters One of the main differences of MMAS from other ACO algorithms is that it limits the possible range of pheromone trail values to the interval \Tnin , Tmax J in order to avoid search stagnation. r and T max = 1 / Zgb' min = r max / a > w h e r e ZgTis t n e makespan of the best solution found up to now and a is a parameter. r max and Trrin are updated each time when a new ZJ£ is found. Tlp denotes the quantity of trail substance for job i on plh position. It gives the degree of desirability for an ant to choose job i for position p for the schedule being generated by itself. Initial values for Tip are set to r max , we used an initial solution randomly generated. 3.2.2. Construction of a solution by an artificial ant In ACO algorithms solutions are constructed by artificial ants and each of them is capable of generating a complete solution itself. In order to schedule the FPFS problem, each ant starts with a null sequence and makes use of trail intensities for selecting a job for the first position, followed by the choice of an
453 unscheduled job for the second position, and so on. Each selection is made with the application of a probabilistic choice rule, called random proportional rule, to decide which job to select for the next position. In particular, the probability with which the ant will choose job i for the/>,A position of its schedule is;
Pip=Jj£-,XieS
(9)
9 is the set of unscheduled jobs for the ant. After a complete sequence of jobs is generated by the ant, the completion time of each job on each machine and the makespan of the schedule are found by the equations (3)-(8). Approximation in Eq. (1) is used for maximum operation. 3.2.3. Improving the solution by Local Search We tried to improve the solution generated by the ant by searching its neighborhood. The set of possible moves is defined by a neighborhood of the current sequence. The insertion-move operates on a sequence of jobs and removes a job placed at p,h position and inserts it in (p-l)th or (p+l)th position of the sequence while relative order of other jobs are conserved. 3.2.4. Updating trail intensities After a complete sequence is constructed by each ant (an iteration) and possibly improved by the insertion-move, the trails are updated. In MMAS only one of the ants is allowed to add pheromone. The ant which generated the best tour in the current iteration or the ant which generated the best tour since the start time of the algorithm is chosen for trail updating. Let zf£! denotes the best makespan found by the ants in current iteration. The ant which generated the best schedule in the current iteration or the ant which have generated the best-so-far schedule updates the trails as follows,
^new __ J H'"ip
{P-Tip
| ybest ' rybest [
if job i is placed in position p in the best sequence otherwise
where p denotes the persistence of W
trail (0
T, = T"* will be used in Eq. (9) for selecting jobs for positions.
454 4. Computational experiments There exist many test problems for crisp case of flowshop problem and results of the different models can be compared easily. However the case is not similar and easy for fuzzy case. Same job sequence can give different results because of the approximation techniques used in fuzzy arithmetic or the ranking of the two schedules may change because of the selected ranking technique. We used the test problems in [4]. They proposed a mixed integer linear programming model and also applied Reactive Tabu Search (RTS) algorithm for the FPFS problem. They used seven and 21 point approximations for fuzzy arithmetic operations and AC for fuzzy ranking. They described 5 problems all with 4 machines and job numbers varying between 5 and 20. TFN's are used to represent the fuzzy processing time of jobs on machines. The first problem with five jobs is handled in order to inspect the process more in depth and to find parameter values for the proposed MMAS without a local search process. Parameter values are used as follows; number of ants=10, p=0.90, a=5 and 30% of the ants make global trail updating. Figure 1 displays the search process of the proposed algorithm for 100 iterations.
40
50 Iteration
Figure 1. Search process of proposed MMAS for the first problem
Best solution is found on the third iteration with the sequence [ 5 2 3 1 4]. As seen on Figure 1, diversity of search lessens as die iteration number increases because of the intensification of trails on specific positions for specific jobs. Figure 2 displays the trail quantities for jobs on each position of the sequence after 25 iterations, as seen on the figure there is not a clear intensification of jobs for specific positions.
455
0.005
Figure 2. Trail intensities after 25 iterations
On the other hand, as seen on Figure 3 after 75 iterations there is a clear intensification of trails for the best solution found up to now. The process is expected to stagnate close around the best solution and seemingly there is not need to wait for a new best solution. 0.045 0.04 0.035 0.03
I f 0.025 lT
0.02 0.015
nd i in
\ \
position
Figure 3 Trail intensities after 75 iterations
Table 1 gives the computational results for the other test problems. Each problem is solved with the proposed algorithm five times for 100 iterations. Best solutions and the averages of the found at iterations and CPU times are represented in Table 1 beside with the results in [4]. It should be noticed that, for the same sequence of jobs, different AC values may be computed because of the different approximation methods for maximum operation is used in proposed MMAS and RTS.
456 Table 1 MMAS and RTS performance for test problems
Number of Jobs 8 12 15 20
Best Solution (AC) 189.5 285.75 317.75 451
MMAS Found at iteration (average) 10.2 49 74.6 27
RTS CPU(s) for one iteration 0.091 0.146 0.207 0.276
Best Solution (AC) 187.46 285.75 317.08 451
Found at iteration 43 50 38 555
CPU(s) for one iteration 0.016 0.057 0.112 0.271
5. Conclusion The flowshop problem is NP-hard in crisp case and the complexity of the problem seriously increases when it is fuzzified. So, there is a demand for methods which can approach to the optimum solution in a reasonable time period. Even if we tried to investigate the performance of simple artificial ants on the solution space we were able to access good solutions in reasonable time periods. The performance of the proposed model can be increased (especially in means of solution time) by adding extra capabilities to the artificial ants like look ahead information or local search techniques. References 1. A.H.G. Rinnooy Kan, Machine scheduling problems: Classification, complexity and computations. The Hagues: Martinus Nijhoff, (1976). 2. C.S. McCahon, E.S. Lee, Fuzzy job sequencing for a flow shop, European Journal of Operational Research, 62 (1992). 3. H. Ishibuchi, N. Yamamoto, T. Murata, H. Tanaka, Genetic algorithms and neighborhood search algorithms for fuzzy flowshop scheduling problems, Fuzzy Sets and Systems, 67 (1996) 4. J. Balasubramanian, I.E. Grossmann, Scheduling optimization under uncertainty - an alternative approach, Computers and Chemical Engineering, 27 (2003). 5. M. Sakawa, R. Kubota, Fuzzy programming for multiobjective job shop scheduling with fuzzy processing time and fuzzy duedate through genetic algorithms, European Journal of Operational Research, 120 (2000). 6. P. Fortemps, M. Roubens, Ranking and denazification methods based on area compensation, Fuzzy Sets and Systems,S2 (1996). 7. M. Dorigo, T. Stutzle, Ant Colony Optimization, The MIT Press (2004). 8. M. Dorigo, V. Maniezzo, A. Colorni, The Ant System: Optimization by a colony of cooperating agents, IEEE Tansactions on System, Man and Cybernetics, 26 (1996). 9. T. Stutzle, H.H. Hoos, MAX-MIN Ant System, Future Generation Computer Systems, 16 (2000).
TIME DEPENDENT VEHICLE ROUTING PROBLEM WITH FUZZY TRAVELING TIMES UNDER DIFFERENT TRAFFIC CONDITIONS TUFAN DEMIREL* Department of Industrial Engineering, Yildiz Technical University, Yildiz-Istanbul, 34349, Turkey NIHAN CETIN DEMIREL Department of Industrial Engineering, Yildiz Technical University, Yildiz-Istanbul, 34349, Turkey Time dependent vehicle routing problem is a vehicle routing problem in which travel costs along the network are dependent upon the time of day during which travel is to be carried out. Most of the models for vehicle routing reported in the literature assume constant and deterministic travel times. This paper describes a route construction method for time dependent vehicle routing problem with fuzzy traveling times according to different traffic conditions.
1. Introduction Distribution and transportation networks become an important part of our daily life gradually. Vehicle Routing Problems (VRP) are concerned with the delivery of some commodities from one or more depots to a number of geographically scattered customers with known demand. The goal is to find routes for the vehicles, each starting from a given depot to which they must return, such that every customer is visited exactly once. Usually there is also an objective that needs to be optimized, e.g. minimizing the travel cost or the number of vehicles needed. Time dependent vehicle routing problem (TDVRP) is a VRP in which travel costs along the network are dependent upon the time of day during which travel is to be carried out. This problem has constrained time for delivery. The objective function of TDVRP becomes a composite function: • Maximize total number of customers served. • Minimize total number of customers unserved (if allowed).
* Corresponding author Tel.: +90-212-2597070 (2547) E-mail address: [email protected] (T. Demirel)
457
458 • • • • •
Minimize total backorder costs (if allowed). Minimize total lateness duration (if allowed). Minimize total number of vehicle used. Minimize total distance travelled. Minimize total costs.
The difference of ours study is acceptance of travelling times as fuzzy numbers. We allowed the travelling times as fuzzy numbers because in real life the travelling times between to nodes is not constant. Our aim for this paper is to improve nearest neighbour based heuristic algorithm for a problem with time constraint and fuzzy travelling time. The rest of this paper is organized as follows. Section 2 presents a brief literature review dedicated to different vehicle routing problems. Section 3 describes time dependent vehicle routing problem with fuzzy traveling times. Section 4 explains nearest neighbour based a heuristic method for our defined problem. And also in this section algorithm is used and results are given for a problem with different traffic conditions. Finally, section 5 concludes and proposes future avenues of research. 2. Literature Review Many studies on vehicle routing problem have been published. Liu and Shen [2] presented a route construction method for the vehicle routing problem with multiple vehicle types and time window constraints. They extended several insertion-based savings heuristics. Tan et al. [3] investigated and developed various advanced artificial intelligent (AI) techniques including simulated annealing (SA), and genetic algorithm (GA) to effectively solve the Vehicle Routing Problem with Time Windows to near optimal solutions. Moin [4] explanted two hybrid genetic algorithms developed to solve vehicle routing problems with time windows. Czech and Czarnas [5] presented a parallel simulated annealing algorithm to solve the vehicle routing problem with time windows. Hapke and Wesolek [6] suggested a mathematical model taking into account real constraints and goals of a concrete Polish transportation company. In their model both flexibility and uncertainty were handled by means of fuzzy sets. Lau et al. [7] explained a variant of the vehicle routing problem with time windows where a limited number of vehicle is given. Ichoua et al.[8] presented a model based on time-dependent travel speeds which satisfies the "first-in-firstout" property. An experimental evaluation of the proposed model was performed in a static and a dynamic setting, using a parallel tabu search heuristic. Donati et al. [9] described a time dependent model of the vehicle
459 routing problem, with delivery time windows, TDVRP, and the optimization algorithms used, based on the Ant Colony System and local search procedures. 3. Problem Description The VRP can be represented with an incomplete directed graph G(V, A), where A is a set of oriented arcs connecting pairs of nodes, and Fis die set of nodes of which one represent a depot, and the rest the customers. In this paper we consider the Time Dependent Vehicle Routing Problem (TDVRP) having the following features: 1. A single commodity to be distributed from a single depot to customers with known demand. 2. Each customer must be visited by exactly one vehicle. 3. Each vehicle has the same capacity. 4. Total service time of day has a time constraint. 5. Total service time of day is divided into time intervals which depend on traffic conditions of city. 6. Travelling, waiting, and unloading times include an uncertainty 7. The objective is to minimize the total number of vehicle used. As shown in Figure 1 traveling time in a day is changing according to a continuous function [10]. If we appropriate this time constant and deterministic, some issues will be appeared. As a sample; time constraint will be exceed, can not visit some customer or visit lately. These results will be occurred because of unforecast of the traveling times. Clearly, in real life according to different traffic intensity, the traveling times will be variable. Acceptance of this time as fuzzy numbers will approximate to a right conclusion. A
Traveling Time
Time of a day Figure 1. Travel time variation as a continuous function
In our studies we prevent all customer demands in time dependent by choosing the needed vehicle and routing. Because of the traveling times uncertainty we accept these times as a triangular fuzzy numbers (TFNs). A triangular fuzzy
460 number can be defined by a triplet (a, b, c). The parameters a, b, and c, respectively, denote the smallest possible value, the most promising value, and the largest possible value that describe a fuzzy event. The membership function is defined as: 0, l(x)--
x
(x - a)/(b -a),
a<x
(c - x)/(c -b),
b<x
(1)
x>c
0,
4. Nearest Neighbour Based Heuristic Algorithm Our designed heuristic algorithm is similar to nearest neighbour. In our studies we take into consideration, the time of nodes in order to distance between the nodes. Time between nodes are fuzzy triangular numbers. For choosing the minimize time intervals v/e used the Chui and Park's [1] fuzzy ranking method. The weighted method compares traveling times by assigning relative weights. The evaluation of the traveling time in the form of a TFN (a, b, c) is determined by assigning relative weight [1]. a+b+c
+ wb
(2)
where w=0.2 represent the relative weight as the magnitude. 4.1. A heuristic algorithm The basis logic of the algorithm is choosing the min. time interval node between nodes. During selection of the node, vehicle capacity and time constraint have been taken into consideration. The notation used in this paper is defined as follow: S
: set of unvisited nodes.
k
: numbers of vehicle used.
Dj
: demand of jth node (customer).
UC
: update capacity for kth vehicle.
Ty
: fuzzy traveling time \Ty = (cty, by, ctj )) from i toy.
461
Tip
: minimize fuzzy traveling timefromi to p for kth vehicle.
CT0p : cumulative fuzzy traveling timefrom0 (depot) top for kth vehicle. In our studies the scheme of the heuristic algorithm is designed as follow: k =\ Repeat f=0 Repeat j e SI S = {/ —> Y/', j is unvisited node and i & y J
k if (c/C* >o)and(uC
>min(Dj)] J*S
(cf0k < Total Time) then\peS\T*=
min
jf„}l
endif if (cfQkp © fip )< TotalTime
then Update
I let S = S {p} letUCk = UCk k
letCT
p
-D.
CTOp r
\let i = p
endif Until UCk <min(Dj) or^CTkp ®fip)> Total Time] jzS
letk = k + \ Until S = <j> A time interval and coefficient of traffic intensity have been taken into consideration in determining fuzzy traveling time as shown Eq.(3). Where ^ is a coefficient depending on traffic intensity for ith time interval and a time interval is definite time length depending on time of a day.
462
T- = v
A\(ly>mij>uij)
,Time interval 1
A2(lij,miJ,Uy)
.Timeinterval
^3Kij'mij>uij)
, Time interval n
2 (3)
4.2. A numerical example The case study will be seen in Figure 2. In this figure there are 6 customer nodes and single depot. The customer demands are given in Figure 2. In this case study the vehicles are unlimited and each vehicle has same capacities as 10 units. All nodes at traveling time intervals will be seen as fuzzy triangular numbers in Table 1.
0
©
3 units
6 units (£) 10 units
depot
©
©
5 units
6 units f O 4 units
Figure 2. The demand for each customer. Table 1. Fuzzy traveling times for all nodes. 0 0
1
2
3
4
5
6
(30,40,55)
(35,55,65)
(20,30,45)
(30,35,45)
(50,60,75)
(40,45,55)
(40,50,70)
(50,55,70)
(40,60,75)
(30,50,60)
(40,50,60)
(30,40,60)
(25,35,45)
(10,30,40)
(50,60,80)
(15,25,40)
(30,50,65)
(45,50,65)
(20,30,50)
(30,50,55)
1
(30,40,55)
2
(35,55,65)
(40,50,70)
3
(20,30,45)
(50,55,70)
(30,40,60)
4
(30,35,45)
(40,60,75)
(25,35,45)
(15,25,40)
5
(50,60,75)
(30,50,60)
(10,30,40)
(30,50,65)
(20,30,50)
6
(40,45,55)
(40,50,60)
(50,60,80)
(45,50,65)
(30,50,55)
(20.40,50) (20,40,50)
Obtained result using developed heuristic algorithm is showed in both Figure 3 and Table 2. On the result we determined 4 vehicle and found the route for each. And also for each vehicle's service times occurred including the last customer.
463 The service times from the last customer to the depot will not take into the consideration.
Figure 3. A solution for case study. Table 2, Detail results for case study. Vehicle Route 1 2 3 4
Depot-*3-»4->Depot Depot-* l-»5->Depot Depot->6-»Depot Depot->2->Depot
Used Capacity 9 9 6 10
Total Traveling Time (35,55,85) (60,90,115) (40,45,55) (35,55,65)
Time Constraint < 150 < 150 < 150 < 150
5. Conclusion The vehicle routing problem is a very important problem in distribution and logistic systems. Satisfying the customer demands exactly and just in time will be increasing the customer service quality. Because of the uncertain traveling times we used the fuzzy triangular numbers. Development of heuristic algorithm has come to a conclusion for time dependent vehicle routing problem with fuzzy traveling times with an empirical sample. For the future models, inserting the traveling times according to different traffic conditions will be allowed more real conclusion. References 1. 2.
C.Y. Chiu and C.S. Park, Fuzzy cash flow analysis using present worth criterion, The Engineering Economist, 39,2, 113-138, (1994). F.H. Liu and S.Y.Shen, A Method for Vehicle Routing Problem with Multiple Vehicle Types and Time Windows, Proc.Natl.Sci.Counc, 23, 4, 526-536 (1999).
3.
K.C. Tan, L.H. Lee, Q.L. Zhu and K. Ou, Heuristic methods for vehicle routing problem with time windows, Artificial Intelligence in Engineering 15,281-295,(2001). 4. N. H. Moin, Hybrid Genetic Algorithms for Vehicle Routing Problems with Time Windows, International Journal of the Computer, the Internet and Management 10, (2002). 5. Z.J. Czech and P.Czarnas, Parallel simulated annealing for the vehicle routing problem with time windows, In Proceedings of 10th Euromicto Workshop on Parallel Distributed and Network-Based Processing, Spain, 376-383, (2002). 6. M. Hapke and P. Wesolek, Handling Imprecision and Flexible Constraints in Vehicle Routing Problems:Fuzzy Approach, Report RA-005/2003, Politechnica Poznanska, (2003). 7. H. C. Lau, M. Sim and K. M. Teo, Vehicle routing problem with time windows and a limited number of vehicles, European Journal of Operational Research 148 559-569, (2003). 8. S. Ichoua, M. Gendreau and J. Y. Potvin, Vehicle dispatching with timedependent travel times, European Journal of Operational Research 144, 379-396, (2003). 9. A. V. Donati, R. Montemanni, N. Casagrande, A. E. Rizzoli and L. M. Gambardella, Time Dependent Vehicle Routing Problem with a Multi Ant Colony System, Technical Report IDSIA-17-03, (2003). 10. A. Haghani and S. Jung, A dynamic vehicle routing problem with timedependent travel times, Computers&Operations Research, 32, 2959-2986, (2005).
A PROGRAMMING MODEL FOR VEHICLE SCHEDULE PROBLEM WITH ACCIDENT CHUANHUA ZENG 1 ' 2 'College of Auto-mobile and Transportation Engineering, Xihua University, ChengDu City PR. China 610039, Phone: +86-028-89829140, E-mail: zchfirstm63.net YANGXU 1
Intelligent Control Development Center, Southwest Jiaotong University, Chengdu P.R. China 610031 WEICHENG XIE
}
College of Electrical & Information Engineering, Xihua University, ChengDu City P.R. China 610039 A good transport schedule plan may be discomfited when a transportation accident happens. In order to get an adjusted transport schedule, we put forward a model for the vehicle routing problem with stochastic in contingency, and discuss the algorithm by combining stochastic simulation with genetic algorithm. We verify this algorithm by applying it to a specific case, and eventually reach the conclusion that it is better than the ordinary way.
1. Introduction Logistics distribution plays a pivotal role in logistics system. It involves assigning goods, assembling goods and delivering goods to customers in time. In order to obtain the highest serving level with the lowest cost in this work, various VRP(Vehicle Routing problem) and VSP(Vehicle Schedule problem) problems are studied and many optimization algorithms are put forward, such as the accuracy Algorithm including Brach and Bound Approach121, Cutting Planes Approach, Network Flow Approach and Dynamic Programming Approach'51; the Heuristics Algorithm including Constructive Algorithm, Two-phase Algorithm151, etc. All the Algorithms above are studied for the optimization of the future vehicle schedule problem, while there is scanty literature dealing with the vehicle schedule in contingency. In effect, accidents in transportation often 465
466 affect the plan. For instance,(l) the vehicle team can't set up if driver is absent; (2) the vehicle can't undertake work if it's under repair; (3) the vehicle can't reach destination on time if it breaks down midway or gets stuck in traffic jam; (4) suppose there are some extra tasks prop up, need to be done as soon as possible; (5) and also, the transportation time may be affected by the vehicle's capacities, performances of workers and road conditions. On the whole, all these factors involved may pose big problem to the schedule. Since the transportation task may be interrupted by contingency, we've got to adjust the transport schedule plan under these circumstances by applying an optimization algorithm based on SVRP (Stochastic Vehicle Route Problem). 2. Problem Describing Suppose the transport schedule plan has been set, it needs to be adjusted by applying an optimization algorithm based on SVRP (Stochastic Vehicle Route Problem). Here SVRP means that VRP with some stochastic restrictions. We put forward a model for the vehicle routing problem with stochastic in contingency'11, and discuss here the algorithm by combining stochastic simulation with genetic algorithm. The supposed conditions are showed as follows: (1) The usual transportation schedule plan exists; (2) The plan is being performed while a vehicle goes wrong; (3) Each vehicle's loading capacity is limited; (4) Each vehicle serves various customers while carrying out its tasks along a certain route, and the vehicles in good conditions are expected to accomplish the assigned tasks; (5) Each customer's freight could be transported by one vehicle only; (6) Each customer's freight should be delivered in a certain time window; (7) The transportation time between two customers is stochastic. 3. Model 3.1. Some Notations Some notations in model are described as follows: n : The total number of customers served by vehicles in good condition; L : The total number of customers served by vehicles in contingency; n: Final total customers' number, here the customers served by vehicles in contingency are classified into two types : one is load-customers and another is unload-customers, and n = n + 1L ; m : The vehicles' number after contingency, m = m'+l,herem' is the number of vehicles in good conditions, and " 1 " means a new vehicle;
Qk: Capacity of vehicle k; o,: Weight of customer's freight, 1 = 1,2,..., n; Dy : Distance between customer Jand customer j . There are three vectors x0, y0 and t0: x 0 =(x ,x ,...,x ) is an integer vector, \<x'
0 0 ,
•
°
,0
I
J>
>J
> >
'
Yo =(yo.yo.-".yo I ) is an integer vector, y{,,£ = l,2,...,m is corresponding to the superscript of xo(i = l,2,...,n'), and .y,
= z
n+(21)
l •
The other two decision vectors x and y: x = (xi,x 2 ,...,x n ) is an integer vector, l<x,<«and xi*Xj,\,i = \,2,...,n . y = (y 1 ,y 2 ,...,y m ) is an integer vector, yk (k = l,...,m) is corresponding to the superscript ofx;(i = l,2,...,n,...,n + 21), andyx
/, Jx
yk-i*JK
(x,y,t) = fx '•"
+ TX
'
(x,y,t)vax
•/x«-i+y-iv
, y
'
+ SX * « - I + /-I
*,k-)+J-ixn-i+J
2<=i<=yk-yk]
Let g(x, y) be the total distance covered by all vehicles, then we get that m
g(x,y) =
^gk(x,y), k=\
+£ here gk(x,y)= Z A V . + I W
+D
*yko>yk >yk-\
(i).
tyk =yk-4 We know that v £
' (>'i-l+1.>'i-J+2,...,>'it),
yk
yi-\
£ « - * - J^qxj
For each customer, we get that PT{fi(x,y,te[ai,bi]}>/3„i = l,2,...,n.
j=yk-i+i
(2)
3.2.
Model
We get the final model: min g(x,y) s.t. Pr{// (*, y,0 e K .*<]}* ft, i = 1,2,...,« 1 < *, < «,;' = 1,2,..., n x,- * Xj,;' * y,;',y = 1,2,...,« Q
>*0
i—>*o ' — l y»-i+l' ^t-i+l
if z) e{Xyt_l+i,Xyt_l+1,...,xyk), ^i e{Xytj+uXyk]+u...,xyi:},
>"*'
then and z's is behind z,
yk
yi-\
y=r*-i+i
J=yk-t+l
and i =
yk-i+lyk-]+2,...,yk
Xj, yt e Z, i = 1,2,..., n, j -1,2,..., m -1. {*o'~l+1'*o*",+2•••••*<>*} certain order.
is
corresponding with {*yt_,+i,*r*-,+i >->*>'*} by
4. To Construct the Chromosome We are considering solving this problem by combining stochastic simulation with genetic algorithm1'1, and how to construct the chromosome is described as follows. We take the chromosome as an operation plan, while genes x, y have the same meaning as decision vector, here we omit t because t is certain(s , here t k is the time when a new decision needs to be make after accident happens). 1 2
m
To initialize the chromosome randomly. According to yn =(y ,y ,—,y ) , we divide x0=(x ,x ,...,x ) into m sets * • .,,*• .,,...,* • ,K = 1,2,...,/W and "
o
o
o
^t-i+i
yk-\+i
yk
add an empty set 0 into it to denote the set of customers served by new vehicle, here the 0 is marked as k +1 th set. To each remainder / of accident vehicle, we generate a random position k between 1 and k + \ to express that customer / will be served by vehicle k , then add zhz] into k'th set. To each set, we assembly it randomly and get new genes x = (xi,x2,—,x„), y = (yi,y2>—>ym,+\)> here yk(k = \,2,...m +1) is corresponding to the sub mark of the first x, in each set. We verify the chromosome, and if it is feasible we will accept it, otherwise we go through the procedure again till we get a feasible one.
469 To cross the chromosomes, suppose two chromosomes (VUV2) will be employed to make cross, here F, = (*,-, v,,/,),»'= 1,2 • Firstly, we verify whether they are feasible to be made cross: to each Vt , divide it into m +1 sets according to p O ' ] , ^ y„' + |), then compare them one after another, if all sets of each Vi are equal respectively, then select the best part of chromosomes to make cross, and get a new chromosome; continue this work by combining the sets of (VUV2) in line with 2x2,3x3,...,w'x/n', if all sets of each Vt are equal respectively, then select the best part of chromosomes to make cross, and get a new chromosome. To mutate the chromosomes, suppose there are two random numbers /,, l2 for gene x , to exchange z; and z\ of customer /, with customer l2 and verify it, if it is feasible, we will accept it, otherwise we go through the whole procedure again till we get a feasible one. 5. An example Here we taking the problem in [1] as an example to verify the model and algorithm, and the initial best plan is showed as follows: Vehicle 1: 0 -> 10 -> 19 -> 17 -> 18 -> 0; Vehicle 2: 0 - > 1 3 - > 8 - > 2 - > 1 6 - > 0;Vehicle 3: 0 - > l - > 1 4 - > 6 - > 7 - > 4 - > 0 ; Vehicle 4:0->3->9-> 15-> l l - > 5 - > 12->20->0; Here "0" means distribution center, and the other numbers mean the customers. The start time of four vehicles are 8:32, 8:05, 8:14 and 8:32 respectively. At 10:00 vehicle 1 has an accident while it is on the way to customer 18 (we name this place as A), so the task for customer 18 is not finished and its demand is 200. Meanwhile, vehicle 1 has finished its all tasks and has been in distribution center; vehicle 3 is unloading the goods for customer 6, and its start time will be at /0=10:13, while it has its own tasks such as customer 7 and customer 4 yet to be finished, also vehicle 3 has capability of 640 left; vehicle 4 is unloading the goods for customer 9, and its start time will be at t0=10:02, while it has its own tasks such as customer 15, customer 11, customer 5, customer 12, and customer 20 yet to be finished, also vehicle 3 has capability of 600 left. The matrix of time between customers, the time widows, demands for customers and the matrix of distances can be found in [1]. The time of unloading goods for customer 4, 5, 7, 11, 12, 15, 18 and A are 10, 13, 20, 18, 20, 20, 14, 10 respectively. We expect that the service level for customers in its time window is more than 90%, so we get that Pr{/} (x,y, t) e [a,-,bt],i = 1,2,...,«} > 0.90 .
470 We construct the model in the way put forward above, then solve this model by combining stochastic simulation with genetic algorithm111, and finally we get the transportation schedule plan showed as follows. Vehicle 3: 6 -» 7 -> 4->0; Vehicle 4: 9 -> A -> 18 -> 15 -> 11 -> 5 -> 12 -• 20 -> 0, The total distance is 415. If we send a new vehicle from distribution center to transport the goods at A, the transportation schedule plan would be: Vehicle 3: 6 -> 7 -> 4 -> 0; Vehicle 4: 9 -> 15 -> 11 -> 5 -> 12 -> 20 -> 0; New vehicle: 0 -» A -> 18 —> 0; Here the total distance would be 515. So the first plan is much better than the second plan. Conclusion To VSP under the accident, firstly we construct a stochastic programming model. Secondly we solve this model by combining stochastic simulation with genetic algorithm put forward in [1], and give a way of how to construct, cross and mutate the chromosomes particularly. At last, we apply the algorithm to a specific case, get a new transportation schedule plan, and verify that the new plan has more advantages than the old one. Acknowledgements This paper is supported by the National Natural Science Foundation of P.R. China (Grant no. 60474022). References 1. Liu Baoding, Zhao Ruiqing, Wang Gang. Uncertain Programming With Applications. Beijing: Publishing House of Qinghua university, 2003.8. 2. Guo yaohuang, Qian Songdi etc. Operational research. Beijing: Publishing House of Qinghua university, 1990:266 ~ 267. 3. Liu B and Lai K K. Stochastic programming models for vehicle routing problems. Asia Information-Science-Life, 2002,l(l):13-28. 4. Laporte G, Nobert Y. Exact Algorithm for the vehicle route problem[M]. Amsterdam: North-Holland publishing, 1987. 47-84. 5. Christofides N. A new Exact Algorithm for the vehicle route problem based on q-path and k shortest path relaxationsfR], London: Imperial College, 1993. 6. Fisher M L. Vehicle routing with time windows: Two optimization algorithms[J]. Operation research, 1997,45(3): 488-492.
A WEB DATA EXTRACTION MODEL BASED ON XML AND ITS IMPROVEMENT WEICHENG XIE College of Electrical & Information Engineering, Xihua University, Cheng Du, Sichuan Province, China CHUANHUA ZENG Intelligent Control Development Center, Southwest Jiaotong Chengdu P.R. China
University,
A web data extraction model based on HTML or XML Web pages is provided. Firstly, read the Web document from the web server with STOCK, and check format of the Web document, transform the existing HTML web page into XML or XHTML (a subset of XML); secondly, an "operation" on a Web page can generate series of XML documents, integrating these documents will lead to data storing; thirdly, the absolute path in Xpath and the anchors can extract interest data with tools of XML data format; finally, retrieve the data and construct XML output, display the inquiry result on the browser. The result show the implementing Web data extract with the model is effect, but its limitations and defects is existed, an improved semantic web data extraction model is provided.
1. Introduction There are so many data on the web that how to make full use of them has become a hot subject in the field of database technology research. How to obtain information from the web is becoming a hot talk, and various data mining models have been put forward to solve this problem. Web data extraction is a key process of web data mining. Web data extraction is the process to obtain information, including texts and multimedia, from the web, to meet the clients' needs. Therefore, web data extraction is critical to web data mining, and it is necessary to examine the way to do so. Data Mining is to discover rule-governed information from plenty of data, improving the quality of data use [1]. KDW (Knowledge Discovery in Web) covers three different data mining tasks: Content-based data mining; Structure-based data mining and Record-based data mining. While traditional databases are highly structured, web data are typically half-structured and overwhelmingly written by HTML. So web data extraction can't be executed fully automatically so far. But XML is semantic-based and 471
472 can be tackled well by the program, so it is highly likely to extract data automatically from XML data. 2. XML and Web Data Extraction Technology XML is especially for the web application service [2], XML comes up with solutions, which HTML fails to: Internet's lower-speed connection despite its high-speed development; and Difficulty in getting the desired information from desultory data on the web [3]. XML can provide structural and semantic information, and make computers and servers process information in different forms in time. So the new web space based on XML is web data oriented, compatible well with existing web applications, and easier to share and exchange web information. XML makes it possible to map the document descriptions with relation database properties and to inquire information and extract models accurately. 3. Web Data Extraction Model and Extracting Procedures 3.1. Web Data Extraction Model In our development experiments, we create a web data extraction model, the model we propose functions this way: firstly, transform the HTML page into XML format; then, inquire the XML document; finally, display the inquiry result on the browser, and game over. Figure 1 demonstrates the web data extraction model base on XML, which covers the following tasks: 3.2. Web Page A ccess and Data Extraction In the process of data extraction, two kinds of web pages will rise: the page with desired data and the page with hyperlinks pointing to the desired data. By carefully analyzing the navigation rules of the website, we can describe the data manually or with some helpful tools. The key of data extraction is transforming the existing web page into XML or XHTML, and using XML data formatting tools to search for related data. Currently many HTML pages on most websites are format incomplete. Browsers like Internet Explorer can tolerate the ill format. Therefore, the first step is to transform the ill-defined HTML web page into well-defined XML document, following data extraction. Some tools can help to format the HTML page in an organized way, among them is Tidy, which can filter the errors in the HTML page, and is charge free. We can call the XMLHelper.tidyHTML() method to realize our purpose, with a URL as its parameter, and call XMLHelperger.outputXMLToFile() method to make an XML format document.
Web Pages (HTML.XML)
RDF API Transforming the HTML pages into XML format
Data extraction executer
XML DOC storing
Data extraction executing
X
Showing extraction result on browser
Showing extraction result on browser via Internet Figure 1. A web data extraction model faced on HTML or XML pages.
Figure 2. An improved model of Web data extraction based on XML
3.3. Structure Integration and Data Storing An "operation" on an HTML page can generate series of XML documents. Integrating these documents will lead to data storing. The storing technology of XML documents has been widely researched. Besides some general storing systems, some exclusive storing system is introduced one after another. There are three ways to store the XML data: in file systems, in databases (including relational databases and object-oriented databases), and in exclusive systems. Each way has its advantages and disadvantages, and a proper choice of storing is determined by the specific situation. 3.4. Data Mapping-XML Document Inquiry Data extraction is characterized with inquiring and manipulating the data sets out of the web pages, following the integration of the data sets. The XML inquiry language can manipulate the content, so we adopt UnQL and StruQL, designed by AT & T Laboratory, which can fulfill such tasks as inquiring, constructing, transforming and integrating. XML-QL (XQL) integrates inquiry language technology and XML grammar [4]. Declaring the path expression and pattern, providing the Where clause to point out the inquiry condition and XML data module, the final form is still XML. For example: >in www.xhu.edu.cn/library/newbook.xml
Construct
475 of an XML document. After the previous process, the tags in the document are nested correctly. Using JAVA's event-driven method, such as Document_start() (on starting receiving the document);Document_end() (on finishing receiving the document);Element_start() (on starting an XML tag);Element_end() (on finishing an XML tag);Characters() (on retrieving an XML character) and Comment () (on giving comments). By calling these methods properly, a document can be properly traversed. Characters returns the content of an XML document, compares it with the desired content, judges whether it is the desired data, If yes, such methods as Elementstart and Element_end can be called to get the current path, which is the Xpath that is the data reference. Another method is to extract data from the anchors. Due to HTML pages' constant change, absolute path method will trigger an error. Changes are mainly involved in the location of the information, which is often included in such tags as ,
and . Therefore, we should construct location information independent of absolute path, which involves seeking the anchor including the extraction information. Generally, anchors, based on the information content, have nothing to do with HTML paths. For example, if we want to extract the information about books recently-published, as soon as finding the word "book_name", we have make an anchor independent of the path, codes are as follows: <xsl:template match="td[contains(.,'Last Tade')]">
476 A new file will come into being if it is the first time to extract data. If there is another file, we can use the Merge function to merge the two files, and we can check the correctness of data extraction. 5. Defects of The Model and an Improved Model The model is designed for HTML document or XML document. But data on the web are not limited in HTML format or XML document; there are other forms, such as databases, logs, and files. How to extract data from these sources is a great challenge. Adopting XML grammar, RDF can easily realize automatic search without manual tagging interference, improving the rate of search coverage and veracity. Logically, the data extraction system base on XML and RDF has three layers: Information layer; Middle layer (RDF and XML) and Application layer. Considering its logical structure, the previous model can be modified and an improved model is Figure 2. 6. Conclusion A web data extraction model based on XML and its framework implementation are introduced. Based on the discussion about its limitations and defects, an improved semantic web data extraction model is provided by the authors. Data mining is composed of repetitive data extraction processes. We should consider the specialty of data mining, extract data from the web time and again, merge the products, and construct a practical data mining system. References 1. Jiawei Han, Micheline Kamber, DATA MINING Concepts and Techniques. 1st ed., Springfield: High Education Press and Morgan Kaufmann Press, pp.6-12 (2001). 2. MyllymakiJussi.Effective, Web Data Extraction with Standard XML Technologies. International Journal of Computer and Telecommunication Networking In: 10th intl. World Wide Web Conf. Hong Kong, (2001). 3. V. Baran, M. Colonna, M. Di Toro and A. B. Larionov, Intelligent Web Mining System Based on MLDB. Computer Engineering, vol.30, no.5, pp93-94, 101 (2003). 4. Chamberiin D D.Robie J, Florescu D. Quilt, An XML Query Language for Heterogeneous Data Sources. In: Proc. Of the Third Intl. Workshop on the Web and Database, Dallas, Texas, U.S.A., May 2000,pp53-62 (2001). 5. Zhang Chenghong, Gu Xiaohong, Bai Yanhong, The Progress of Web Data Extraction Technology. Computer Science, vol.31, no.2, pp 129-131, 151, (2004).
EVALUATION OF E-SERVICE PROVIDERS USING A FUZZY MULTI-ATTRIBUTE GROUP DECISION- MAKING METHOD CENGIZ KAHRAMAN Istanbul Technical University, Department of Industrial Engineering, 34367 Macka Istanbul, Turkey GULgiN BUYUKOZKAN Galatasaray University, Department of Industrial Engineering, 34357 Ortakoy Istanbul Turkey Since most of the companies prefer outsourcing for e-activities, the selection of e-service provider becomes a crucial issue for those companies. E-service evaluation is a complex problem in which many qualitative attributes must be considered. These kinds of attributes make the evaluation process hard and vague. Cost-benefit analyses applied to various areas are usually based on the data under certainty or risk. In case of uncertain, vague, and/or linguistic data, the fuzzy set theory can be used to handle the analysis. This paper presents the evaluation and selection process for e-service provider alternatives. Many main and sub-attributes are considered in the evaluation. The e-service providers are evaluated and prioritized using a fuzzy multi-attribute group decision-making method. This method first defines group consistency and inconsistency indices based on preferences to alternatives given by decision makers and construct a linear programming decision model based on the distance of each alternative to a fuzzy positive ideal solution which is unknown. Then the fuzzy positive ideal solution and the weights of attributes are estimated using the new decision model based on the group consistency and inconsistency indices. Finally, the distance of each alternative to the fuzzy positive ideal solution is calculated to determine the ranking order of all alternatives.
1. Introduction As it is open seven days a weeks, 24 hours a day, in fact, according to industry observers, Web-based customer service that is also known as "E-Service", is one of the biggest business opportunities on the Net. Internet's influence in creating e-services has been revolutionary for providers and their customers. Unfortunately, there has been a wide gap between inspiring applications of the Internet that help increase service customization while maintaining or even improving delivery efficiency. The firms are not able to determine the scale of the e-service they will provide [1] and select the best Internet service provider, which meets their needs for e-service applications. For firms concerned with e477
service, one central issue is to determine the decision criteria for proper selection of an e-service provider. The lack of accurate decision criteria reveals benefit/cost imbalance such as high investment and inadequate return. This article aims at developing a model for evaluation of different e-service providers for a particular company. Multiple attribute decision-making (MADM) problems find a best compromise solution from all feasible alternatives assessed on multiple attributes, both quantitative and qualitative. Suppose the decision makers have to choose one of or rank n alternatives, A A ••• A , based on m attributes, C ,C •••Cm. An alternative set is denoted by A = \A A •••,Anj and an attribute set is denoted by c = jc , C , • • •, C }• Let Xy be the score of alternative A, (i = 1,2,•••,«) on attribute Cj(j = 1,2, • • •,m), and suppose CO is the relative weight of attribute Cj, where a > o (j = 1,2, —,m) and g a = j . A weight vector H j is denoted by a> = (a>, ,co2,--,co J • A MADM problem can then be expressed as the following decision matrix: c, A D =
(x)
=
4
X
u
c
ca X
n
'"
** *n -
X
\m
*~
(1)
Crisp data are often inadequate or insufficient to model real-life decision problems. Human judgments are vague or fuzzy in nature and as such it may not be appropriate to represent them by accurate numerical values. A more realistic approach is to use linguistic variables to model human judgments. In this paper we evaluate e-services using Li and Yang's approach [2]. In their approach, linguistic variables are used to capture fuzziness in decision information and group decision-making processes by means of a fuzzy decision matrix. They propose a new vertex method to calculate the distance between triangular fuzzy scores. Group consistency and inconsistency indices are defined on the basis of preferences between alternatives given by decision makers. Each alternative is assessed on the basis of its distance to a fuzzy positive ideal solution (FPIS), which is unknown. The fuzzy positive ideal solution and the weights of attributes are then estimated using a new linear programming model based upon the group consistency and inconsistency indices defined. Finally, the distance of each alternative to FPIS is calculated to determine the ranking order of all alternatives. The lower value of the distance for an alternative indicates that the alternative is closer to FPIS.
The paper is organized as follows. In next section, the basic definitions and notations of fuzzy numbers and linguistic variables are defined as well as the fuzzy distance formula and the normalization method. Section 3 defines fuzzy group decision-making model. Fuzzy multi-attribute e-service provider selection is explained in Section 4 and illustrated with a numerical example in Section 5. The paper is concluded in Section 6. 2. Related Terms 2.1. Triangular Fuzzy numbers Let ? = {l,m,u) be a triangular fuzzy number (TFN). Its membership function /*.(*) is given by I \
x- -l l<x<m m -/' \m-i u— u-x x u- -m m<x
(2)
2.2. Linguistic variables A linguistic variable is a variable whose values are linguistic terms. The concept of linguistic variable is very useful in situations where decision problems are too complex or too ill defined to be described properly using conventional quantitative expressions. 2.3. Distance between two TFNs Let S^ and S2 be two TFNs. The vertex method is used to calculate the distance between them as follows:
4tth$t-hJ+fa-mJ
+ (»,-uj]
(3)
2.4. Normalization method Suppose there exist n possible alternatives A ,A ,---,A from which P decision makers
Pp{p = 1,2, •••,/•)
have
to
choose
on
the
basis
of
m
attributes^ ,c , — ,c • Suppose the rating of alternative A. (/ = 1,2, •••,«) on attribute
c .{j = 1,2, -,m) gi v e n by decision maker
p ( p = 1,2,•••,/•) is
x;
=(a'J,b*,c*).
A multi attribute group decision-making problem can be
expressed in matrix format as follows: C, C, - C
DF-{x,±
P=
\X-,P
A x (4) Now let minja'|< « ; = («;, 6;, c; )}, i = 1,2, - , « ; / > = 1,2, • • •, P = max{i;\a', ex; ={a'„b;,c;)}, i = 1,2,-,n;p
= 1,2,-,P
(5) (6)
= min^'|6; ex; =(a;,b;,c;)},
; = 1,2,-,K;/? = 1 , 2 , - , P
(7)
= mvJp;\b;ex;=(a>,b>,c>%
i = \,2,-,n;p
= \,2,-,P
(8)
= 1,2,-, P
(9)
, = m i n ^ c ; e x,' = {a'„b'„c',% i = \,2,-,n;p
C- = min{c;|c; ex;= {a'„b;,c',)}, i = l,2,-,/i;/> = 1,2,-,P
(10)
Then the following normalization formulas are used: Al
forjeC
(11)
-Al
forjeC 1
(12)
and
Now, the fuzzy decision matrices D" {p- \,2,---,P)
are transformed into the
normalized fuzzy decision matrix Rp as follows:
c, 4
'=(r')
c2
• ••
r»
c
= 4 K'
K:
• - r;m P= • •• K:
4, r;
?:
• ••
rtl'
\2,-,P
(13)
r±
3. Fuzzy Group Decision-Making Model Li linear programming based multiple VA and ana Yang iang [3] \p\ propose propose the me following ionowing un -"••''• • - *• ' ' 'on-makine model: attribute fuzzy group decision-making model:
481
Max £ £ 4
(14)
s.t. J l-i
3tr
r'1 (i,/).n''
2X Ek-<)
•J y=i
ft>. =e,j
j
y=i
= l,2,---,m
£
(15)
^ ' W - and v.„(y = 1,2, •••,/«) can be obtained by solving Eq. (14). Then a)L,dm,and a'm (j = 1,2,• • •,m)are computed using Eq. (15) 4. Fuzzy Multi-attribute e-Service Provider Selection Fuzzy sets were introduced by Zadeh in 1965 [3] to represent/manipulate data and information possessing non-statistical uncertainties. Fuzzy logic provides an inference methodology that enables approximate human reasoning capabilities to be applied to knowledge-based systems. The theory of fuzzy logic provides a mathematical strength to capture the uncertainties associated with human cognitive processes, such as thinking and reasoning. Fuzzy multi-criteria decision-making has been widely used to deal with decision making problems involving multiple criteria evaluation/selection of alternatives [4, 5, 6]. These studies show the advantages in handling unquantifiable/qualitative criteria, and obtained quite reliable results. E-service provider selection is a complex problem in which many qualitative attributes must be considered. These kinds of attributes make the evaluation process hard and vague. Since most of the companies prefer
outsourcing for e-activities, the selection of e-service provider becomes a crucial issue for those companies. In this paper, taking into consideration the literature on e-services [7, 8, 9], the following cost-benefit attributes are preferred in the evaluation of the best e-service provider. • Cost Attributes: 1. Setup costs (SC): These costs include installation costs of the required system such as operating system, database system, e-payment systems etc., 2. Hosting Costs (HC): Periodic cost of hiring the service and/or system, 3. Design & Programming Costs (DPC): Costs for designing the Web, programming procedures and preparing necessary scripts, 4. Backup & Maintenance Costs (BMC): The cost of backing up the important data on the system and maintenance and upgrading costs to provide better service, 5. Advertising & Promotion Costs (APM), 6. Personnel (Staff) Cost (PC): The cost of the staff that will primarily work for this system. • Benefit Attributes: 1. Service Guarantee, References & Familiarity (SGRF), 2. Stability, reliability, and uptime (SRU): The e-service should be available online 24 h a day. That is, the operation system of e-services should not present failures at anytime, 3. Ease of control and use (ECU): The system provided should have control panel easy to use, 4. Security & Privacy (SP): Because e-commerce is operated on an open network, encryption technologies must be developed to deter hacker attacks. The system has maximum security for credit card applications. It is very important that stored data should also be kept private, 5. Speed (S), 6. Physical Location (PL), 7. Customer Service Level (CSL): E-service provider should have good customer relations, respond to the matters quickly. 5. A Numerical Application Suppose a firm plans to select a e-service provider among four candidates A\, A2, A-i and A4. There are four experts Pp (p=l, 2, 3, 4) who agree to take into consideration the following six attributes in evaluating the e-service providers: Setup costs (SC), Hosting Costs (HC), Backup & Maintenance Costs (BMC), Security & Privacy (SP), Speed (S), and Customer Service Level (CSL). Assume that the experts provide their preferences between alternatives as a 1 ={(2,l),(l,3),(2,4),(3,4)}, « 2 ={(2)l),(l,3),(l,4),(3,4)}, n 3 = {(2,1),(2,3)>(3,l),(3,4)}) and
n ={(l,3), (2,4), (3,4), (3,2)}. The corresponding relations between linguistic variables and positive triangular fuzzy numbers are given in Table 1. The data and ratings of all alternatives on every attribute are given by the four experts Ph P2, P3, and P4 as in Tables 2 respectively.
Table 1. Linguistic variables and their representation with fuzzy numbers Triangular fuzzy numbers (0.8,0.9,1.0) (0.7, 0.8, 0.9) (0.6, 0.7, 0.8) (0.3, 0.5, 0.7) (0.2, 0.3, 0.4) (0.1,0.2,0.3) (0,0.1,0.2)
Linguistic variables Excellent (E) Very good (VG) Good (G) Fair (F) Poor (P) Very poor (VP) Too poor (TP)
Table 2. Evaluations of experts Attributes
e-service providers
A, A2 A3 A,
HC
SC E2
E3
E4
El
E2
E3
E4
El
E2
E3
E4
33 38 78 64
43 32 78 62
41 39 76 74
42 45 68 55
44 33 45 52
40 33 41 59
45 33 35 52
37 36 37 42
30 34 39 30
25 36 29 30
32 34 35 30
45 28 45 34
El
E2
E3
E4
El
E2
E3
E4
El
E2
CSL E3
E4
30 21 21 24
32 24 18 22
32 21 20 24
28 34 22 25
F VG G VG
G P F VG
G G P E
G VG P VG
G P G P
G G VG F
P G VG F
G P VG G
S
SP
A, A2 A, A4
BMC
El
Now, using Eqs. (11), (12), and (13), we obtain the normalized decision matrices and then the transpose of these for each expert. Accepting e = 0.01 and A = 1.0 and substituting the normalized decision matrices and Q' [p = 1,2,3,4) into Eqs. (14) and (15), the linear programming problem is solved. The final ranking for evaluating and selecting among e-service providers is obtained as A, >- A, y A, > A,. 6. Conclusions In this paper the model developed by Li and Yang [2] has been used for evaluating and selecting among e-service providers. The selection among eservice providers is a complex problem in which many qualitative attributes must be considered. These kinds of attributes make the evaluation process hard and vague. The judgments from experts are always in vague rather than in crisp numbers. It is suitable and flexible to express the judgments of experts in fuzzy number instead of in crisp number. For further research, the results of this study
may be compared with other fuzzy multi-attribute methods like fuzzy outranking methods, fuzzy utility theory and fuzzy AHP. Acknowledgments The authors acknowledge the financial support of the Galatasaray University Research Fund. References 1. K.B. Kenneth, H. Roger and V.R. Aleda, E-services: operating strategy—a case study and a method for analyzing operational benefits, Journal of Operations Management, 20, 175-188 (2002). 2. D-F. Li and J-B. Yang, Fuzzy linear programming technique for multi attribute group decision making in fuzzy environments, Information Sciences, 158, 263-275 (2004). 3. L. Zadeh, Fuzzy sets, Information Control, 8, 338-353, (1965). 4. C. Kahraman, D. Ruan and I. Dogan, Fuzzy group decision-making for facility location selection, International Journal of Production Economics, 87(2), 171-184(2004). 5. G. Buyukozkan, C. Kahraman and D. Ruan, A fuzzy multi-criteria decision approach for software development strategy selection, International Journal of General Systems, 33 (2-3), 259-280 (2004). 6. O. Kulak and C. Kahraman, Multi-attribute comparison of advanced manufacturing systems using fuzzy vs. crisp axiomatic design approach, International Journal ofProduction Economics, 95 (3), 415-424 (2005). 7. P.A. Dabholkar, D.I. Thorpe and J.O. Rentz, A measure of service quality for retail stores: scale development and validation, Journal of the Academy of Marketing Science, 24 (1), 3-16 (1996). 8. C. Liu and K.P. Arnett, Exploring the factors associated with Web site success in the context of electronic commerce, Information and Management, 38 (1), 23-33 (2000). 9. M. Wolfinbarger and M.C. Gilly, eTailQ: Dimensionalizing, Measuring and Predicting eTail Quality, Journal of Retailing, 79, 183-198 (2002).
A CASE BASED RESEARCH ON THE DIRECTIVE FUNCTION OF WEBSITE INTELLIGENCE TO HUMAN FLOW ZILU, ZHUOPENG DENG, YANG WANG Faculty ofResource and Environment Science, Hebei Normal University, Shijiazhuang, 050016, P. R.China Email: luzi@mail. hebtu. edu. en Abstract: More and more websites have started to apply intelligent techniques to conduct effective and flexible operations and provide high-quality and personalized online services. One of the explicit manifestations of website intelligence is that web-based information flow directs and guides a related human flow. This paper chooses Sino-Australian as a case to explore how information flow directs a study-abroad human flow through providing intelligent e-services. An evaluation system with the case is conducted to prove the rationality of the directing function and process. Keywords: website intelligence; e-services; study-abroad; directive functions; human flow; China
1. Introduction Online services (e-services) have been growing rapidly, keeping pace with the web. More and more websites are applying intelligent techniques to provide high-quality online services. These intelligent e-services have conducted an impact on the user decision-making for their activities in the real life. Therefore, some important website information may promote the movement of human flow in a real space. Taking websites as a front-office, some education agents/companies apply intelligent approaches and system to provide Chinese students full-stop services to study abroad from information search to visa application online. The rest of the paper is organized as follows. Literature review and background in related areas are shown in Section 2. The directing process and mechanism of information flow to related human flow are deeply revealed in Section 3. Section 4 focuses on an empirical analysis of how information flow directing human flow by taking www.ozstudynet.com as a case. An evaluation result is displayed and discussed to the directing effect of information flow to human flow. Conclusions are drawn in Section 5. 485
2. Literature Review and Background E-service intelligence (ESI) is an integration of intelligent technologies and e-services [1]. E-service intelligence has been right now identified as a new direction and next stage of e-services. Intelligent online services can provide with much higher quality information, personalized recommendations and more integrated seamless link services. Brazier and Cookson [2] discuss how real-time intelligent technique is applied to the business world. In the context of providing solutions for real-time communications services, their paper puts forward the following design patterns: service node, intelligent network and service delivery platform. Valerie et al [3] points out that enabling the intelligent e-services means that consumers will be provided with universal and immediate access to available content and services, together with the ways of effectively exploiting them. Intelligent online techniques and approaches have been received more and more attention. In the meantime, some researchers have shown their interests in how invisible information flow directing visible material flow. Graham and Marvin [4] identify four effects of information technology to a city, which include cooperation effect, substitution effect, derivation effect and enhancement effect. Moss [5] analyses the information and spatial distribution according to the Internet infrastructure of American. Adams and Ghose [6] have a qualitative analysis to the real effect of Internet in promoting migrations from India to American from the view of new ICTs. Some Chinese researchers have particularly conducted research in this area with China's cases. For example, Yao et al [7] give an explanation and empirical analysis to the four effects mentioned by Graham and Marvin. Zhen et al [8] develops related theories between information flow and transportation. Lu and Zhang [9] analyses the positive effect of implementation of intelligent Traffic Guiding System (TGS). Lu and Sun [10] put forward their perspectives to the nature of flow space. However, review results show that the research on the directive functions of invisible information flow to visible material flow is limited in simple qualitative analysis and lack of deep analysis to its mechanism generation. With the full-scale exchanges and cooperations of politics, economy, trade, culture, science and technology between China and Australia in last decade, education exchange and cooperation have been developed and extended gradually. As shown in Table 1, the number of Chinese students in Australia is rapidly increasing. Besides the various attractions from Australian education systems and policy impetus from Chinese government, one of the important reasons is the extensive establishment of study-abroad websites. Table 1 The increased human flow of Sino-Australia student abroad during 1997-2002 China to Australia
1997 1700
1998 3100
1999 4100
2000 8909
2001 13452
2002 14215
Data source: Chinese Ministry of Education, Australian statistic bureau and federal immigrant agency
The study-abroad websites of Sino-Australian is a kind of virtual platform constructed in a network space for Chinese students study abroad. This platform contributes to the construction of a "study-abroad bridge" between China and Australia. With the websites, a virtual community for study abroad is formed, in which various information exchanges can be freely conducted between website users and owners or among users. This approach is efficient in ultimate result, it thus stands for the future development trends. Therefore, websites can produce an effect to human flow through online information flow provided. 3. Directing Process of Web Intelligence to Human Flow 3.1 The directing process of web intelligence to human flow Stage 1
Stage 2
The destination confirmation of study abroad
The selection and interview of agent for study abroad
4
1
Stage 3
Stage 4
The application and transaction for study abroad
The completion and derivation of study abroad
•
*
t
Online information exchange community
Figure 1 Intelligent services provided by study-abroad websites at different stages
The process of information flow directing human flow under website intelligence includes four stages as shown in Figure 1. We will discuss it in detail below. 3.1.1 Stage 1: The destination confirmation of study abroad Previous research [11] indicates that to enhance the website usability is a key issue. The enhancement of website intelligence can help handling this issue. In this stage, the most significant application of intelligent technology to websites is intelligent information retrieval function. (l)Web Intelligence enhances the proportion of useful information. Some advanced intelligent techniques, such as cooperative filtering, multi-keywords search, are employed to enhance the understanding of "natural language" of users. (2)Website intelligence enhances the pertinence of users' search. In www.ozstudynet.com, it provides an intelligent engine for searching information of study abroad destination. Users can choose countries, cities, even universities and majors through using it. Users therefore can have a quick and full understanding to everything they may concern the process of study abroad. (3)Website intelligence also enhances the utility and pertinence of website links. Many previous links in www.ozstudynet.com are constructed in the absence of analysing their users.
3.1.2 Stage 2: The selection and interview of an agent for study abroad
At this stage website intelligence is mainly embodied in the services provided, and the improvement of service quality depends mainly on that of website intelligence, including: (1) An intelligent online feedback function: the website periodically and automatically sends information related to the study-abroad destination by e-mail, according to the different demands of users. (2) An intelligent online consultation function: users can obtain further detailed information about study abroad through intelligent means such as e-mail or online chat tools (QQ, MSN); the www.ozstudynet.com online consultation function is available to its users at any time. (3) An intelligent online evaluation function: prospective students can send their specific circumstances to the website via e-forms, and the intelligent evaluation system and experienced staff return accurate and authoritative results within a short time and offer suggestions suited to different users.
3.1.3 Stage 3: The application and transaction of study abroad
In this stage the website intelligence is reflected in three aspects. (1) An intelligent application function for study abroad: users can log in to the website to submit their set of application forms, and the website's service staff help accomplish the necessary procedures; meanwhile, users can log in at any time to check the current state of their application, and any problems that arise can be solved online without delay. (2) An intelligent visa transaction function, typified by the e-visa application: all visa application materials can be transmitted and transacted in the form of e-files between the Overseas Procedure Centre (OPC) of the Australian government and its study-abroad agents. This makes the visa procedure simpler and ensures objectivity and impartiality to the largest degree; Table 2 lists the differences between the traditional and the intelligent visa transaction. (3) A follow-up service for directing human flow: the website makes use of its service network and continues providing intelligent services to users who plan to study in Australia, so that the directing effect on human flow is ultimately realized.
Table 2 Differences between traditional and intelligent visa transactions
Transaction procedure of the traditional visa: applicants or agents post the whole set of application materials to the OPC; after the OPC receives the materials and fees, the file number and other forms are sent back to the agents (a wait of more than one month); the applicant then waits about another three months and, if the pre-evaluation is passed, its results are posted to the agents; the agents post the results to the school together with the tuition fee, and the school returns a confirmation and the enrolment notice; the agents send the confirmation to the OPC again; the OPC posts the letter of visa approval to the Australian general consulate in Shanghai; the agents post the applicant's passport to Shanghai to transact the visa, and the general consulate in Shanghai posts the visa back to the agents. Notes: the whole process takes about four months, and the related materials are posted at least six times.
Transaction procedure of the e-visa: the application materials are scanned by agents authorized for e-visa transaction, the required passwords are entered, and the materials are sent to the OPC as e-files; the e-visa is a smart system in which every application is automatically assigned a TRN (Transaction Reference Number) that applicants can use to check their application state, while the body-check form can be printed for an examination at an appointed hospital; the OPC sends the results and the letter of visa approval to the agents as e-files as well; the agents post the applicant's passport to Shanghai to transact the visa and, when finished, the general consulate in Shanghai posts the visa to the agents. Notes: most of the process is done online, saving the time otherwise spent on pre-evaluation and material posting.
3.1.4 Stage 4: The completion and derivation of study abroad
The fourth stage is the completion and derivative stage. Its significance is that, through continuous online exchanges, new information flow and human flow may be derived, which further enlarges the circulative size of the directive functions. (1) A custom-made service function: one important aspect of website intelligence here is support for personalization; the online services also include study accompaniment and bridging-course transfer, providing personalized student-abroad services. (2) An online exchange community function: a virtual space in which website owners and online members exchange their thoughts, emotions and experiences, and in which members can discuss topics of interest according to their own characters, interests and tastes. The website www.ozstudynet.com provides such a free exchange community, which includes the four forums shown in Table 3. It should be noted that intelligent interactive online exchange in the virtual community is not limited to the fourth stage; as a means of service and personalization, it functions throughout the whole process.
Table 3 The topics of the exchange forums on www.ozstudynet.com
Special area for study abroad: study abroad in Australia; school transfer.
Ally of study abroad to Australia: IELTS ally; major ally; school ally; homemate forum; meeting friends from the same city; middle school ally; scholar ally.
Living information: accommodation; living in Australia; return tickets; second-hand market; migration to Australia; employment information; matchmaking; recreation.
Website affairs administration: latest activities; website affairs administration.
3.2 Mechanisms
The directing mechanism of intelligent information flow to human flow is shown in Figure 2. Intelligent information affects the human brain directly and has an indirect impact on human decision-making; in turn, e-services are provided to direct and guide the related human flow. On the condition that human cognition is fixed, the availability, precision and abundance of information and intelligent e-services directly determine human behaviour. We examine this process and mechanism in the next section through an empirical analysis.
4. The Empirical Analysis: Taking www.ozstudynet.com as a Case
4.1 Research objective, method and result
The website www.ozstudynet.com was established in May 2000 in Melbourne, Australia. It has developed its own intelligent e-study system and, based on a new operation model, provides intelligent e-services that cover all steps of a study-abroad application. Therefore, this study selects it as an example for exploring how a website with e-service intelligence directs the human flow of studying in Australia. Based on a continuous investigation of www.ozstudynet.com for three
weeks in December 2005, the evaluation system was created as shown in Table 4. The selection of the factors is based on the literature review results and on data availability, and synthesizing these factors in an orderly way formed the evaluation system. The evaluation system is divided into four levels: three factors are considered in level 1, eight in level 2, 17 in level 3 and 33 in level 4. The reason for dividing the factors into four levels is to facilitate scoring and to obtain a deep and systematic view of website intelligence. A Likert-style scale was employed for the quantitative analysis, and the score of each factor was divided into five levels (2, 4, 6, 8 and 10) according to the data we observed and collected. First, the scores of the factors on level 4 were obtained. Second, the scores of the level 3 indexes were obtained from their corresponding level 4 factors. The level 2 and level 1 factors were then calculated in turn.
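As a rough illustration of this bottom-up scoring, the sketch below rolls leaf-level Likert scores up to their parent factors; the nested dictionary fragment and the use of a simple arithmetic mean as the aggregation rule are assumptions made for illustration, since the paper does not state its exact aggregation formula.

    # Hypothetical fragment of the four-level index tree; leaves hold Likert scores (2-10).
    index_tree = {
        "level of website content": {
            "intelligent information retrieval function": {
                "information serviceability": {"rate of useful information": 6,
                                               "number of useful information": 6},
                "personalized retrieval": {"destination retrieval": 10,
                                           "college retrieval": 8,
                                           "major retrieval": 8},
            },
        },
    }

    def aggregate(node):
        """Average child scores recursively (assumed aggregation rule)."""
        if isinstance(node, (int, float)):
            return float(node)
        return sum(aggregate(child) for child in node.values()) / len(node)

    for factor, children in index_tree.items():
        print(factor, round(aggregate(children), 1))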
Figure 2 The directing mechanism of information flow to related human flow (the figure links the different human flows to Australia with the web-based information flow and transaction process, covering application, notarization, body check, fee paying and visa application, together with services such as instruction, tickets, accommodation, study instruction and school transfer)
Table 4 The evaluation index system of website intelligence (the table arranges the indexes in four levels. Level 1: level of website content (7.9), level of website service (8.4), level of personalization (6.2). Level 2 factors include the intelligent information retrieval function, the intelligent online feedback function, the intelligent online consultation function, the intelligent evaluation function, the intelligent application function, the intelligent visa application function, other custom-made service functions and the online exchange function. Levels 3 and 4 decompose these into scored indexes such as information serviceability, personalized retrieval, related links, rate and number of useful information, destination/college/major retrieval, number of links, multi-means consultation (e-mail, QQ, MSN), response time, consultation satisfaction, non-profit operation, charging and extra fees, number of professionals, related working experience, professional evaluation, dynamic query, time-saving, procedure simplification, success rate, e-mail interval and content, junk-mail screening, accompany-visa and transfer services, plane-ticket purchase, airport pick-up, accommodation, number of members, online number at peak time, number of topics and coverage degree, each scored on the five-point scale.)
4.2 Evaluation result analysis
(1) The score of the level-1 factor "website content" is 7.9, that of "website service" is 8.4 and that of "personalization" is 6.2. According to the scoring criterion, "website service" receives the best evaluation, with all four of its level-2 factors scoring above 8; "website content" receives a good evaluation, with one of its level-2 factors scoring above 7; "personalization" receives the weakest evaluation, with only one level-2 factor reaching a score of 7. (2) On the basis of the website's intelligent e-services, the directive function of information flow on the related human flow becomes more effective and orderly. (3) The research also offers some valuable references for website owners seeking to improve their sites. First, the results suggest that multi-keyword, collaborative filtering and association rule techniques should be used together in future information retrieval. Second, the personalization of the website is still weak, and more emphasis should be placed on its real application effect. (4) Website intelligence is a relatively new concept, and its directive function is not equally effective for all human groups; the effective exertion of its intelligence must target specific sub-groups. (5) The significance of intelligent online exchange lies not only in forming a favourable cycle between web-based information flow and the related human flow, but also in enlarging the circulative size of its directive functions. Therefore, intelligent online exchange can play an important role in the whole process of directing the human flow of studying in Australia.
5. Conclusions
Intelligent online services in the study abroad industry can break down regional barriers, enhance real-time responsiveness and create high efficiency. Web-based information flow is affecting the related human flow; in particular, with the development of the intelligent e-visa, online services for study abroad have become very effective and efficient. Study-abroad websites offer an approach that lets applicants enter a virtual world, and this virtual world can direct and guide the movement of student flow from China to Australia. The research has also found that: (1) as the front office of study-abroad companies, study-abroad websites meet users' demands well through intelligent e-services; (2) the Sino-Australian study-abroad website directs and guides the human flow effectively; (3) the development of Sino-Australian study-abroad websites has shifted from information provision and services to one-stop intelligent e-services.
Acknowledgement This research is supported by the National Science Foundation of China (40571042).
GENETIC ALGORITHM FOR INTERVAL OPTIMIZATION AND ITS APPLICATION IN THE WEB-ADVERTISING INCOME CONTROL
QIN LIAO, XIWEN LI
School of Mathematical Sciences, South China University of Technology, Guangzhou, Guangdong 510640, P. R. China
Abstract: The control problem of web-advertising income is discussed in this paper. Firstly, the evaluation index system of web advertising and the neural-network evaluation model are built. Secondly, in order to effectively control the factors influencing web-advertising income, a genetic algorithm for interval optimization is proposed to find the corresponding intervals of those factors according to an anticipated income lying in a certain interval. Finally, an appropriate allocation of the advertising cost is made by using a clustering algorithm to classify the keywords. Numerical experiments are given and the results show that the novel method is effective.
Keywords: Interval optimization, Genetic algorithm, Web advertisement, Advertising income
1. Overview
With the rapid development of the Internet, the evaluation of web-advertising effect, income and influencing factors attracts more and more concern. By building the evaluation indexes and the evaluation model of the web advertisement, a genetic algorithm for interval optimization is proposed to study the control of the intervals of the web-advertising influencing factors. Furthermore, by using a clustering algorithm to classify the keywords, the keywords yielding the maximum advertising income are found and an appropriate allocation of the web-advertising cost is made.
2. Evaluation indexes and evaluation model of the web advertisement
CTR (Click Through Rate), CPC (Cost Per Click), and CVR (Conversion Rate) are chosen to be the evaluation indexes of the web-advertising effect. ROI (Return On Investment) is chosen to be the evaluation index of the web-advertising income. CTR is the number of clicks divided by the number of impressions, i.e. the number of times the advertisements have been displayed. CPC is how much the
advertisers should be charged for a click on their ads. CVR is the number of conversions, divided by the number of ad clicks; a conversion occurs when someone clicks on an ad and performs a desired behavior on the website the ad points to. ROI is calculated as the revenue from sales, minus the advertising costs, all divided by the costs. Supposing the ROI is evaluated by the three evaluation indexes, the B-P model is built by choosing CTR, CVR and CPC as the 3 input neurons and ROI as the output neuron. The B-P model is shown as follows:
ROI = F(x_1, x_2, x_3) = f\Big(\sum_{i=1}^{3} v_i \Big(\sum_{j=1}^{3} w_{ij} x_j - \theta_i\Big) - r\Big)    (1)
where f(x) = 1 / (1 + e^{-x}), x_1 = CPC, x_2 = CTR, x_3 = CVR; the v_i and w_{ij} are connection weights, and theta_i and r are thresholds. By training the network on the samples, the total error between the model output and the practical sample results is brought within the required tolerance, and the weights and thresholds of Eq. (1) are obtained; the evaluation model is thereby built. Using the evaluation model, the ROI can be forecast by inputting the values of the three indexes. For a given target ROI, however, it is too burdensome to try many combinations of the index values in order to reach it. Thus, the genetic algorithm for interval optimization is proposed to find the corresponding intervals of those factors according to the anticipated income in a given interval.
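For concreteness, a minimal sketch of evaluating the trained model of Eq. (1) is given below; the parameter values passed in are placeholders, since the trained weights and thresholds are not reported in the paper.

    import numpy as np

    def evaluate_roi(cpc, ctr, cvr, W, theta, v, r):
        """Forward pass of the B-P evaluation model, following Eq. (1) as written.

        W is the 3x3 matrix of weights w_ij, theta the hidden thresholds,
        v the output-layer weights and r the output threshold (all placeholders here).
        """
        f = lambda z: 1.0 / (1.0 + np.exp(-z))   # f = 1 / (1 + e^{-x})
        x = np.array([cpc, ctr, cvr])
        hidden = W @ x - theta                   # inner sums of Eq. (1)
        return float(f(v @ hidden - r))

    # Example call with placeholder parameters:
    # evaluate_roi(2.0, 0.03, 0.5, np.zeros((3, 3)), np.zeros(3), np.zeros(3), 0.0)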
3. Genetic algorithm for interval optimization and the web-advertising income control model
A traditional genetic algorithm searches for the best individual X = {x_1, x_2, ..., x_n}. Randomly generate N individuals X^(i) = {x_1^(i), x_2^(i), ..., x_n^(i)}; the population S = {X^(1), X^(2), ..., X^(N)} then comprises N individuals. The outputs of the individuals are Y = {Y^(1), Y^(2), ..., Y^(N)}, with Y^(i) in [Y_min, Y_max], i = 1, 2, ..., N. The midpoint is Y_mid = (Y_max + Y_min)/2, and the fitness function, Eq. (2), measures how close an individual's output is to Y_mid.
By using the genetic algorithm the first time, when the fitness reaches its maximum value, the corresponding individual X = {x_1, x_2, ..., x_n} is obtained. This means finding the best combination X = {x_1, x_2, ..., x_n} of the independent variables that makes the output close to the midpoint Y_mid of the practical output. Based on this best individual, random intervals (x_i - e_i, x_i + e_i), e_i > 0, are generated for each independent variable (gene) x_i, where each e_i is a positive number chosen randomly. M genes x_i^(k), k = 1, 2, ..., M, i = 1, 2, ..., n, are then generated randomly from the n intervals, so that M individuals X^(k) = {x_1^(k), ..., x_n^(k)}, k = 1, 2, ..., M, are obtained, where x_i^(k) lies in (x_i - e_i, x_i + e_i). The fitness function is again defined as Eq. (2), and the genetic algorithm is carried out a second time with the M individuals. After several iterations of the defined crossover and mutation operators, all the individuals that satisfy |Y^(k) - Y_mid| < (Y_max - Y_min)/2 are chosen. Suppose P individuals are chosen and the ith genes of the P individuals are ranked from small to large as x_i^1, x_i^2, ..., x_i^P; then the corresponding interval of the ith gene is [x_i^1, x_i^P], i = 1, 2, ..., n. In order to determine the possibility that the output Y calculated from a combination of random values chosen from the n intervals [x_i^1, x_i^P] is bounded in [Y_min, Y_max], we randomly test H times, choosing each x_i from its interval [x_i^1, x_i^P] to constitute individuals X^(k) = {x_1^(k), x_2^(k), ..., x_n^(k)}, k = 1, 2, ..., H, and then count how many of the outputs Y = {Y^(1), Y^(2), ..., Y^(H)} are bounded in [Y_min, Y_max], thereby calculating the frequency with which the outputs of the H individuals fall in the interval of Y. This frequency represents the approximate possibility that an output calculated from a combination of x_i bounded in [x_i^1, x_i^P] and the other genes lies in the appointed interval. Overall, the interval [x_i^1, x_i^P] of each independent variable x_i is obtained, which makes the corresponding output Y lie in [Y_min, Y_max], and the possibility of this relationship is also obtained. Suppose CTR and CVR are bounded in [0, 1], CPC_1 depends on the lowest cost demanded by the website, and CPC_2 is determined by the maximum cost the advertisers are willing to pay.
Based on Eq. (1), ROI = F(CPC, CTR, CVR). In order to control the factors CTR, CVR and CPC influencing the ROI, an interval [ROI_1, ROI_2] is defined for the ROI and ROI_mid = (ROI_1 + ROI_2)/2. The individual constituted by CPC, CTR and CVR is defined as X^(k) = (CPC^(k), CTR^(k), CVR^(k)) and the fitness function is defined as:

G(X^{(k)}) = \frac{1}{1 + |ROI^{(k)} - ROI_{mid}| / a}, \quad a > 0    (3)
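A direct transcription of the fitness function in Eq. (3) might look as follows; roi_of stands for the trained evaluation model of Eq. (1) and the default value of a is an assumption, since the paper only requires a > 0.

    def fitness(individual, roi_of, roi_mid, a=1.0):
        """Fitness of an individual X = (CPC, CTR, CVR) according to Eq. (3)."""
        cpc, ctr, cvr = individual
        roi = roi_of(cpc, ctr, cvr)            # evaluation model of Eq. (1)
        return 1.0 / (1.0 + abs(roi - roi_mid) / a)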
The best combination X = (CPC, CTR, CVR) of the independent variables is obtained, which makes the output of Eq. (1) reach, or come close to, the midpoint ROI_mid of the practical output. By using the genetic algorithm for interval optimization, the corresponding optimal intervals of CTR, CVR and CPC are determined, which make ROI lie in [ROI_1, ROI_2]: CPC in [CPC_1, CPC_2], CVR in [CVR_1, CVR_2], CTR in [CTR_1, CTR_2]. According to these three intervals the ROI can be controlled effectively.
4. Test of the web-advertising income control model
Table 1 gives the web-advertising sample data of a campaign over 7 weeks. Using the sample data in Table 1, the B-P model for the ROI can be built; the parameters of the model are shown in Table 2. When the ROI is required to lie between 90% and 100%, combining the evaluation model and the genetic algorithm for interval optimization yields three control intervals for the advertising evaluation indexes, as Table 3 shows. According to Table 3, in order to keep the ROI within [90%, 100%], CTR, CVR and CPC should be bounded respectively in [0.030, 0.049], [0.63, 0.99] and [1.61, 2.41]. 100 samples bounded in these three intervals were randomly chosen, and the ROI of all 100 samples was found to lie in [90%, 100%]. This means that when the values of the three indexes lie in their corresponding intervals, the probability that the outputs calculated from such combinations fall in the appointed interval is 100%.
Table 1 Web-advertising sample data of a campaign in 7 weeks
Week  CTR    CVR    CPC   ROI
1     0.029  0.718  2.39  0.79
2     0.027  0.396  2.08  0.16
3     0.033  0.472  1.78  0.61
4     0.031  0.651  2.03  0.87
5     0.029  0.546  2.1   0.59
6     0.030  0.525  1.98  0.62
7     0.035  0.311  2.17  0.12
Table 2 The parameters of the B-P model
Input neurons: 3; Hidden neurons: 3; Output neurons: 1; Learning times: 7566; Total error: 0.0009; Smoothness factor: 0.15; Adjusting factor: 1
Table 3 The result of the genetic algorithm for interval optimization
Control interval:  CTR [0.030, 0.049]   CVR [0.63, 0.99]   CPC [1.61, 2.41]   ROI [0.9, 1]
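Taken together, the two GA stages of Section 3 and the final random check reported above could be sketched as below; the population sizes, the perturbation widths e_i, the simplified search operators and the variable bounds are illustrative assumptions, and roi_of again denotes the trained evaluation model of Eq. (1).

    import random

    def interval_optimize(roi_of, roi_low, roi_high, bounds, n_seed=50, m=200, h=100):
        """Sketch of the interval-optimization procedure (operators simplified)."""
        roi_mid = (roi_low + roi_high) / 2.0
        fitness = lambda x: 1.0 / (1.0 + abs(roi_of(*x) - roi_mid))
        rand_point = lambda: [random.uniform(lo, hi) for lo, hi in bounds]

        # Stage 1: find the best single individual (random search stands in for the GA here).
        best = max((rand_point() for _ in range(n_seed)), key=fitness)

        # Stage 2: perturb the best individual, keep candidates whose ROI stays in range,
        # and take per-variable minima/maxima as the control intervals.
        eps = [(hi - lo) * 0.2 for lo, hi in bounds]      # assumed perturbation widths e_i
        kept = []
        for _ in range(m):
            cand = [min(max(x + random.uniform(-e, e), lo), hi)
                    for x, e, (lo, hi) in zip(best, eps, bounds)]
            if roi_low <= roi_of(*cand) <= roi_high:
                kept.append(cand)
        if not kept:
            return None, 0.0
        intervals = [(min(c[i] for c in kept), max(c[i] for c in kept))
                     for i in range(len(bounds))]

        # Stage 3: the random check of Table 3 -- estimate how often points drawn
        # from the intervals keep the ROI inside [roi_low, roi_high].
        hits = sum(roi_low <= roi_of(*[random.uniform(lo, hi) for lo, hi in intervals]) <= roi_high
                   for _ in range(h))
        return intervals, hits / h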
5. Effective allocation of web-advertising cost
In order to raise the advertising income, besides applying the method proposed above to control the values of the factors influencing the advertising ROI, the quality of the influencing factors also has to be controlled. Hence, the relation between the keywords of the website and the evaluation indexes of the advertising income has to be studied in order to control the advertising ROI. To raise the CTR and CVR, the advertiser should make the keywords of the campaign more pertinent. If a keyword is too general, the advertiser runs the risk of getting clicks to his site that are not really relevant, and of having a lower conversion rate. The advertiser can increase a keyword's profitability by adjusting its CPC: for keywords that show a profit, increase the CPC to increase exposure and generate more traffic; for keywords that are not profitable, decrease the CPC to lower the costs. Clustering is therefore applied to the keywords to find out which ones are profitable and which are not. Two indexes, CTR and CVR, are used to determine the quality of the keywords.
Table 4 Sample data of the advertising keywords
Keyword                              CTR   CVR
business intelligence data mining    29%   42.4%
data mining                          45%   61.5%
data mining analysis                 3%    0.0%
data mining solutions                6%    0.0%
data mining technologies             5%    2.8%
data mining technology               87%   21.3%
intelligent data mining              3%    3.9%
multimedia data mining               5%    0.0%
The 8 samples in Table 4 are clustered by the hierarchical (system) clustering method, with the between-cluster distance defined as the minimum Euclidean distance (single linkage). If the keywords are clustered into two groups, the result is represented as U1 and U2: U1 = {business intelligence data mining, data mining, data mining technology}; U2 = {data mining analysis, data mining solutions, data mining technologies, intelligent data mining, multimedia data mining}. U1 contains the keywords that contribute a lot to the CTR and CVR and thus have higher quality, while U2 contains the keywords of lower quality. Therefore, the CPC of the first cluster can be increased and the CPC of the second cluster should be decreased in order to allocate the costs of the web advertisement, which helps increase the advertisers' web-advertising income.
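A sketch of this clustering step is given below, using SciPy's single-linkage hierarchical clustering on the (CTR, CVR) pairs of Table 4; whether the percentages are rescaled or standardised before clustering is not stated in the paper, so the grouping produced by this sketch may differ from the U1/U2 partition reported above.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    keywords = ["business intelligence data mining", "data mining", "data mining analysis",
                "data mining solutions", "data mining technologies", "data mining technology",
                "intelligent data mining", "multimedia data mining"]
    # (CTR, CVR) pairs from Table 4, expressed as fractions.
    X = np.array([[0.29, 0.424], [0.45, 0.615], [0.03, 0.0], [0.06, 0.0],
                  [0.05, 0.028], [0.87, 0.213], [0.03, 0.039], [0.05, 0.0]])

    Z = linkage(X, method="single", metric="euclidean")   # minimum-distance (single linkage)
    labels = fcluster(Z, t=2, criterion="maxclust")       # cut the tree into two clusters

    for cluster_id in sorted(set(labels)):
        members = [kw for kw, lab in zip(keywords, labels) if lab == cluster_id]
        print(cluster_id, members)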
6. Conclusion
The genetic algorithm for interval optimization can not only be applied to controlling the intervals of the factors influencing web-advertising income, but can also be validated on controlling multiple factors in general multi-objective problems. By combining the evaluation indexes of web-advertising income with the neural-network evaluation model, the ROI is not only forecast by changing the values of CPC, CTR and CVR, but the optimal control is also processed in reverse according to the anticipated advertising income. Within the optimal control intervals, the advertising income of any combination of CPC, CTR and CVR is bounded in the anticipated income interval. Clustering the advertising keywords of the website assists the optimal allocation of the advertising cost and effectively controls both the influencing factors and the cost of the web advertisement. The results of the application indicate that the models are feasible and can be extended.
DESIGN AND IMPLEMENTATION OF AN E-COMMERCE ONLINE GAME FOR EDUCATION AND TRAINING
PEILI ZHANG 1, MEIQI FANG 1, YI ZENG 1, JIA YU 2
1. Economy & Science Lab, Renmin University of China, 1000872, P.R. China
2. School of Electrical Engineering and Telecommunications, University of New South Wales (UNSW), Sydney, NSW 2052, Australia
In this paper, we describe our design and implementation of a flexible, efficient way of E-Commerce education that utilizes an online game mode. We are building a multi-player E-Commerce simulating online game, ECGAME, which anyone around the world can log in to via the Internet. The platform provides participants with a virtual integrated business world to experience and helps them learn business rules more effectively. It also motivates participants to learn E-Commerce and sparks competitiveness.
1. Introduction
As the online game business grows, we realize that massive multiplayer online games can greatly inspire people's interest, and we find that they offer a new way to help students experience the real business world. Therefore, during this year, we are building a multi-player E-Commerce online game, ECGAME. There are four basic roles in the game: consumers, manufacturers, retailers and transporters. The participants can run their own companies with various strategies and compete with the other companies in the virtual world. In ECGAME, each player can act in a different role: consumer, manufacturer, retailer or transporter. The tasks of each role are different. We calculate the scores based on their performance, for example, the money they earn. The game platform helps students, business planners and e-commerce product designers train their skills, and it is also an inexpensive, safe way to evaluate the potential consequences of business strategies and market models. In addition, we cooperate with an art school to decorate our system with attractive scenes in order to draw participants' interest (Fig. 1).
Fig. 1. Consumer's Home in ECGAME
2. Roles Assignment
2.1. Consumer
Fig. 2. Consumer (activities: buy products, earn money, work, auction, study, buy stock)
Description: Participants who act as consumers have various attributes. They need to consume products to enhance their abilities, such as eating food to gain strength. There are several ways to earn money: they can auction products with other consumers, work for companies, buy stock, and study in the school to get better jobs.
2.2. Manufacturer
Fig. 3. Manufacturer (activities: research & development, buy materials, manufacture products, employment, bargain with retailers, contact transporters, accomplish the contract)
Description: Participants who act as manufacturers need to research and develop new products first. They can manufacture products as long as they buy enough materials and employ enough workers. They also bargain with retailers to sell their products. After signing the contracts, they need to contact transporters to transfer their products to the retailers.
2.3. Retailer
Fig. 4. Retailer (activities: investigation, bargain with manufacturers, buy the goods, manage storage, sell goods to consumers)
Description: Participants who act as retailers should investigate consumers' interests at the beginning. Then, they bargain with manufacturers to buy the goods they want, based on the investigation reports. Finally, they need to manage the warehouse and sell the products to consumers.
2.4. Transporter
Fig. 5. Transporter (activities: develop new lanes, receive orders, buy vehicles, employment, transfer goods)
Description: Participants who act as transporters should develop new lanes first. Then they need to buy vehicles and employ workers; each vehicle has a different capacity. At the same time, they receive orders from manufacturers to transfer goods to retailers, and they decide the optimal routes for transferring goods so as to save time.
2.5. Auto Roles
The four basic roles (consumers, manufacturers, retailers and transporters) alone are far from enough to simulate the business market. Hence, we add several auto roles, such as Supplier, Bank, College, Supermarket, Job Center, Stock Center, Custom Center, Insurance Center, Auction Center, and Arbitration Center. Each auto role has its own rules to assist participants in accomplishing their tasks. The primary events in ECGAME are listed in Figure 6.
Fig. 6. The events among basic roles and auto roles (events such as bargaining between participants and the auto roles)
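As a rough sketch of how the basic roles, auto roles and the events between them might be represented in an implementation, consider the following; the class and field names, and the use of earned money as the score, are purely illustrative assumptions, since the paper does not disclose ECGAME's actual data model.

    from dataclasses import dataclass

    @dataclass
    class Player:
        name: str
        role: str                  # "consumer", "manufacturer", "retailer" or "transporter"
        money: float = 0.0

        def score(self) -> float:
            # Scores are based on performance, e.g. money earned (assumed rule).
            return self.money

    @dataclass
    class Event:
        kind: str                  # e.g. "bargain", "work", "transfer", "auction"
        source: Player
        target: str                # another player or an auto role such as "Bank"
        amount: float = 0.0

    AUTO_ROLES = ["Supplier", "Bank", "College", "Supermarket", "Job Center",
                  "Stock Center", "Custom Center", "Insurance Center",
                  "Auction Center", "Arbitration Center"]

    alice = Player("alice", "consumer")
    work_event = Event("work", alice, "Job Center", amount=50.0)
    alice.money += work_event.amount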
3. Game Scenarios
3.1. Overview
ECGAME offers an integrated market simulation platform for participants to experience the real business world. As shown in Figure 7, there are three general game scenarios in our system.
Fig. 7. Overview of game scenarios in ECGAME (Stage 1: initializing and accumulating; Stage 2: running a new company; Stage 3: entering the stock market)
Stage 1: Initializing and Accumulating. In ECGAME, participants act as consumers with little money and low abilities at the beginning. They participate in various activities to earn money and enhance their abilities. Stage 2: Running a new company. After their skills and personal savings reach the level scenario II requires, participants can enter scenario II to run a new company. They can act as manufacturer, retailer or transporter; the tasks of each role are different, so that participants experience distinct business modes. In this scenario, each participant can manage his company with his own strategies and trade with other participants to earn money. Stage 3: Coming into the stock market. When participants run their companies well enough, they can apply to act as super enterprises. A super enterprise can enter the stock market to accumulate more money from the public. In ECGAME, we establish a simplified stock market for participants to experience.
3.2. Enterprise resource planning (ERP)
We add ERP (Enterprise Resource Planning) components into our online game platform for business management, catering to the complex task requirements in ECGAME. Participants who act as enterprisers (manufacturers, retailers or transporters) can easily use the ERP technology to manage their virtual companies while playing the game. In our ERP software, there are five crucial modules: Finance Management, Human Resource Management, Manufacturing Management, Strategy Decision and Materials Management. Participants can learn how to plan enterprise resources efficiently using this platform.
3.3. Enterprise Alliance
Although the game provides participants with opportunities to compete with one another, experienced participants tend to cooperate with potential partners. We find that participants' behavioral patterns keep changing and that the relationships among people are dynamic. When they benefit from integrating enterprises' resources, they become partners. ECGAME provides a platform for those participants to form alliances. The members of an Enterprise Alliance can share information with each other and come to agreements with discounts. It is even more challenging that participants need to change their strategies and find partners in order to survive in the game.
3.4. How to win the game
In ECGAME, there are many activities and targets for participants to achieve, involving various decision factors. In order to win, there are four primary factors that participants should take into account:
1. Money: How to use the money to invest in the company?
2. Time: When to manufacture? When to transfer?
3. Human Resource: How to schedule the workers for manufacturing?
4. Information: What are consumers' needs now? How about the opponents?
Therefore, when participants make decisions on their business strategies, they must consider all the factors above; ignoring any factor will lead to failure. On the whole, only when a student plans everything carefully can he or she get a high score.
4. Conclusion
In conclusion, ECGAME is an original and efficient approach to learning and experiencing E-Commerce. It is an excellent review of many business strategies, management methods and marketing principles, and it provides an approachable way to gain hands-on management experience. Currently, we have accomplished a prototype of ECGAME with most functions, and we have described the main innovations of our system in this paper. For an online game, efficiency is a significant factor in attracting participants, who become impatient if the game runs slowly; thus, evaluating the efficiency of our system is a crucial task for the next step. Furthermore, we will add more components to the game for participants to experience other recent technologies, such as Supply Chain Management (SCM), Client Resource Management (CRM) and Supply Resource Management (SRM).
SELECTION MODEL OF SEMANTIC WEB SERVICES
X. WANG, Y. ZHAO, AND W. A. HALANG
Faculty of Electrical and Computer Engineering, FernUniversitat, 58084 Hagen, Germany
E-mail: {xia.wang, yi.zhao, Wolfgang.halang}@fernuni-hagen.de
Widely supported by industry and research developments, Semantic Web Services appear to be the next generation of the service-oriented paradigm. Their discovery constitutes a great challenge. Although much work has been done on selecting semantic services, an exact selection model, able to describe services and service selection in a powerful way, is still not defined. Moreover, little consideration has been given to quantifying the selection of services based on both the capabilities and the quality of service (QoS). Here, a QoS-based selection model of semantic services is proposed, and a service matching algorithm working under this model is presented. Two real-life examples are also discussed to validate its implementation.
1. Introduction
With the development of the Semantic Web 11, the future web could provide not only static information understood by human beings, but also semantic services which implement series of tasks and can be accomplished automatically by machines. More and more semantic web services will be developed and published in business fields such as travel agencies, on-line trade, e-commerce, or manufacturing. Thus, the automatic discovery or selection of desired services is becoming critical. The achievements in the area of Semantic Web Services (SWS) so far can be summarised as follows. Currently, OWL-S 12 is one important web ontology language for services, providing strong semantic support for web services. There is also some work on service matching algorithms. Sycara et al. 6, for instance, proposed a dynamic matchmaking algorithm based on the agent capability description language LARKS. Similar work 1,7 adopts these ideas to different degrees. Most current work, however, deals with the selection problem only with respect to the similarity of non-functionality and functionality; the quality attributes of services 10 are considered to a lesser extent. Moreover, the similarity of services is computed by matching the whole description of
services published in OWL-S or WSDL documents, which may render the selection process erroneous 5, as the documents contain much redundant information. Therefore, as a remedy, this paper proposes a concise and practical QoS-based service model inherently supporting selection. In the context of this model, two real-life examples of implementing the matching process are presented, which consider both the qualities and the capabilities of services.
2. Model of Semantic Web Services
To describe a service, its non-functionality and functionality should be defined. Based on 2,12, a service is stated as a tuple s = (NF, F, Q, C), where NF is a set defining its non-functionalities, F is a set describing the functionalities, Q defines the quality of the service, and C is the overall cost of its invocation. The details of this service model, presented in OWL Description Logic (DL) 3, are as follows:
- NF, an array of String, is denoted as NF = {serviceName, serviceCategory, textDescription}. According to the OWL-S specification, they are: (1) serviceName, a String giving the name of a service; (2) serviceCategory, a String array giving the categories of a service, classified according to application fields, kinds of services, or some industry taxonomy, e.g., NACIS a or UN-SPSC b; (3) textDescription, a short human-readable text briefly describing the service.
- F is a set of service functionalities defined as F = {f_1, f_2, ..., f_i}, i in N, with the ith function f_i = (op_i, Sigma_Ii, Sigma_Oi, Cons_i, P_i, E_i) described by:
(1) op_i: a String naming the operation i;
(2) Sigma_Ii: an array of String consisting of the input parameters;
(3) Sigma_Oi: an array of String consisting of the output parameters;
(4) Cons_i: a set of assertions expressed in DL syntax as constraints;
(5) P_i: a set of assertions in DL syntax as pre-conditions;
(6) E_i: a set of assertions in DL syntax as effects;
(7) the relation delta: (op_i x Sigma_Ii x Cons_i x P_i) -> (Sigma_Oi x E_i), being the logical implication of service execution.
a North American Cartographic Information Society (NACIS), www.nacis.org; b United Nations Standard Products and Services Code (UNSPSC), www.unspsc.org
The relation delta follows the logical statement "if (op_i x Sigma_Ii x Cons_i x P_i) is true, then (Sigma_Oi x E_i) is true". That means, if the inputs, constraints and pre-conditions required by operation i are fulfilled, its relevant outputs and effects are yielded.
- Q denotes the quality attributes of a service. In 8, all possible quality requirements of services were identified. For the sake of efficiency of service selection, and from the point of view of clients, our model categorises QoS metrics into two parts only, a necessary set (Qn) and an optional one (Qo). We explicitly define Q in our model as Q = Qn = {q_1, q_2, ..., q_j}, j in N, which is formalised as a set of attribute assertions expressed in DL syntax, see Fig. 1.
- C is the overall cost of invoking a service, generally an assertion formulated in DL syntax.
Fig. 1 takes as examples two real flight-booking systems c, which have been re-described using the above service model.
Figure 1. Two real-world examples of the service model: (a) Advertisement_1 and (b) Advertisement_2, two real flight-booking offers (London to Orlando, 14-17 July 2005, one adult, return trip) re-described in the model, with ticket costs of $645.70 and $2280.35 respectively, together with constraints, pre-conditions, effects and QoS assertions on response time, execution time, reliability, exception handling, accuracy and security.
c Advertisement_1 at www.squareroutetravel.com/Flights/cheap-flight-ticket.htm and Advertisement_2 at www.travelbag.co.uk/flights/index.html
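To make the structure of the tuple s = (NF, F, Q, C) more concrete, a possible in-memory representation is sketched below; the Python types and the plain-string encoding of the DL assertions are simplifications, and the partial instance of Advertisement_1 only transcribes a few of the details shown in Figure 1.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class Functionality:          # one f_i = (op_i, Sigma_Ii, Sigma_Oi, Cons_i, P_i, E_i)
        op: str
        inputs: List[str]
        outputs: List[str]
        constraints: List[str]    # DL assertions kept as plain strings in this sketch
        preconditions: List[str]
        effects: List[str]

    @dataclass
    class Service:                # s = (NF, F, Q, C)
        name: str
        categories: List[str]
        description: str
        functionalities: List[Functionality]
        qos: List[str]            # the necessary QoS assertions Q = Qn
        cost: str                 # overall invocation cost assertion

    # Partial, simplified transcription of Advertisement_1 from Figure 1:
    advertisement1 = Service(
        name="CheapFlightSearch",
        categories=["Flight"],
        description="Search for a cheap flight ticket, UK citizens",
        functionalities=[Functionality(
            op="CheapFlightSearch",
            inputs=["LondonUK", "OrlandoUSA", "ReturnTrip"],
            outputs=["TicketDepart", "TicketReturn"],
            constraints=["cost.Ticket = 645.70"],
            preconditions=["valid.PaymentCard"],
            effects=["money.InsuranceCard - cost.Ticket"],
        )],
        qos=["qualityResponseTime <= 8", "Reliability >= 8"],
        cost="CostService",
    )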
3. Selection Algorithm
Usually, there is a service requester (sR) and a number of service providers (sA). A selection algorithm matches the published services potentially meeting the requirements of the service requester. Denoted as sR = (NF_R, F_R, Q_R, C_R) and sA = (NF_A, F_A, Q_A, C_A), the service requirement and the service advertisement form the inputs to the matching engine. Pairs of sA and sR are then interpreted and processed conforming to the matching algorithm, and the output of the matching engine is a sorted set of candidate services returned to the requester as the result. The matching process can flexibly be organised with various kinds of filters. Here, we sequentially combine three filters to illustrate this.
I. NF - This filter is based on service name, service category and service description to select the available services and obtain a collection of relevant candidates, named resultSet_n. The process is to run a program implementing the pseudo-code in Fig. 2 twice, using serviceName and serviceCategory as input parameters, respectively. The distance of two different concepts is computed according to 6. The outputs are the similarities of serviceName and serviceCategory, each with a value between 0 and 1.
matchConcept(ConceptR, ConceptA){
  double simConcept;
  simConcept = distance(ConceptR, ConceptA);
  return simConcept; }  // simConcept in [0, 1]
Figure 2. Matching serviceName and serviceCategory
matchDescription(DescriptionR, DescriptionA){
  double simDes;
  for each term t_j in DescriptionR and DescriptionA {
    DesR[t_j] = number of occurrences of term t_j in DescriptionR;
    DesA[t_j] = number of occurrences of term t_j in DescriptionA; }
  simDes = (DesR . DesA) / (|DesR| * |DesA|);
  return simDes; }  // simDes in [0, 1]
Figure 3. Matching serviceDescription
To measure the similarity of short text information, like serviceDescription, the cosine coefficient is a simple and effective approach. In Fig. 3, serviceDescription is the input, and the similarity of descriptions is the output, which ranges between 0 and 1.
II. op, Sigma_I, Sigma_O - This filter works on the functionalities of the service and obtains a result set resultSet_f. First, the operation op is compared using matchConcept, as defined in Fig. 2: matchConcept(OperationR, OperationA); // returns simOP in [0, 1]. Second, if the service requester's input information subsumes the parameters required by the service advertisement, request and advertisement are considered similar, and the value of simPara is set to 1. Otherwise, the requester's input parameters are regarded as insufficient to fulfil the implementation of the advertised service, and simPara is set to 0.
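A runnable counterpart of the matchDescription pseudo-code in Fig. 3, using the cosine coefficient on term-count vectors, might look as follows; the whitespace tokenisation is an assumption, since the paper does not specify how descriptions are split into terms.

    import math
    from collections import Counter

    def match_description(description_r: str, description_a: str) -> float:
        """Cosine similarity of two short service descriptions, in [0, 1]."""
        des_r = Counter(description_r.lower().split())
        des_a = Counter(description_a.lower().split())
        dot = sum(des_r[t] * des_a[t] for t in des_r)
        norm_r = math.sqrt(sum(c * c for c in des_r.values()))
        norm_a = math.sqrt(sum(c * c for c in des_a.values()))
        return dot / (norm_r * norm_a) if norm_r and norm_a else 0.0

    # e.g. match_description("search for a cheap flight ticket",
    #                        "travel expertise, unrivalled knowledge, excellent value")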
The function matchOutput, however, does the opposite. If the service requester's output cannot be subsumed in the service advertisement, then the results of the advertisement's execution meet the service requester's requirement only partly; in this case the match fails, noted as 0.
III. Cons, P, E, Q and C - This filter processes resultSet_f to refine the selection and similarly obtains resultSet. In this step, the matchmaking is processed on sets of expressions in DL format. Here, one cannot simply consider subsumption, because the similarity degree is more important than the subsumption relationship. Considering cost.Ticket in Fig. 1, the requester R is assumed to state that the ticket must cost less than $800; Advertisement_1 asks for $645.70 and A_2 for $2280.35. In this case, an approach should be used to scale the close-degree rather than the values' difference. This is addressed by the algorithm as: closeExpCost(R, A_1) = (800 - 645.70)/800 = 0.192875; closeExpCost(R, A_2) = (800 - 2280.35)/800 = -1.8504375. Obviously, the bigger the numerical value, the better the result; Advertisement_1 is a better choice than Advertisement_2. If extended to a set of expressions (in this context, Cons, P, E, Q, and C), multi-attribute utility theory 2,5 should be applied. That is, after computing the close-degree based on fuzzy computing 4, the expressions are assigned utility scores 2 between 1 and 5 based on the attribute definitions. In the above example, the closeness values give utilityScore(R, A_1) = 5 and utilityScore(R, A_2) = 1. A set of expressions E = {E_1, E_2, ..., E_n} is assumed, whose related utility scores are u = {u_1, u_2, ..., u_n}, n in N. If a weight v_i between 0 and 1 is assigned to each attribute, then
simExpression = \frac{v_1 \cdot u_1 + v_2 \cdot u_2 + \ldots + v_n \cdot u_n}{v_1 + v_2 + \ldots + v_n}

Cons, P, E, Q, and C are all sets of expressions, whose corresponding similarities can be calculated as simCons, simP, simE, simQoS and simC. Similarly, to quantify the similarity between service requirements and advertisements, a weight between 0 and 1 is assigned to each attribute involved in the matchmaking. Then,
sumSimilarity = \Big(\sum_{i=1}^{11} w_i\Big)^{-1} \cdot (w_1 \cdot simName + w_2 \cdot simCategory + w_3 \cdot simDes + w_4 \cdot simOP + w_5 \cdot simInput + w_6 \cdot simOutput + w_7 \cdot simCons + w_8 \cdot simP + w_9 \cdot simE + w_{10} \cdot simQoS + w_{11} \cdot simC)
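The weighted aggregation above could be computed as in the sketch below; the equal default weights are an assumption, since the paper leaves the choice of the eleven weights w_1, ..., w_11 to the matchmaker.

    SIM_KEYS = ["simName", "simCategory", "simDes", "simOP", "simInput", "simOutput",
                "simCons", "simP", "simE", "simQoS", "simC"]

    def sum_similarity(sims, weights=None):
        """Weighted average of the eleven similarity scores (each in [0, 1])."""
        weights = weights or {k: 1.0 for k in SIM_KEYS}   # equal weights by default
        total_w = sum(weights[k] for k in SIM_KEYS)
        return sum(weights[k] * sims[k] for k in SIM_KEYS) / total_w

    # e.g. sum_similarity({k: 0.8 for k in SIM_KEYS})  ->  0.8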
4. Conclusion
Semantic Web Services appear to be the next generation of the service-oriented paradigm, and the discovery of semantic services constitutes a great challenge. The problems currently encountered lead to the conclusion that an effective semantic service model for selection is missing, one that provides powerful expressiveness with respect to the capabilities and qualities of services, and that a quantified matching algorithm is needed as well. For these problems, this paper proposed a selection model which effectively implements QoS-based matchmaking of semantic web services; the matching process was discussed with two real examples. Future work is to optimise the method of assessing each quality attribute of a service, and to consider how to combine various QoS attributes.
References
1. C. Zhou, L. Chia and B. Lee, DAML-QoS Ontology for Web Services, Proc. ICWS04, pp. 472-479 (2004).
2. D. Menasce, Composing Web Services: A QoS View, IEEE Internet Computing, 8(6) (2004).
3. D. Nardi, F. Baader, D. Calvanese, D.L. McGuinness and P.F. Patel-Schneider, The Description Logic Handbook, Cambridge (2003).
4. E. Herrera-Viedma et al., Evaluating the Informative Quality of Web Sites by Fuzzy Computing with Words, Proc. Atlantic Web Intelligence Conference, LNAI 2663 (2003).
5. E.M. Maximilien and M.P. Singh, A Framework and Ontology for Dynamic Web Services Selection, IEEE Internet Computing, 8(5), 84-93 (2004).
6. K. Sycara, S. Widoff, M. Klusch and J. Lu, LARKS: Dynamic Matchmaking Among Heterogeneous Software Agents in Cyberspace, Autonomous Agents and Multi-Agent Systems, 5(2), 172-203 (2002).
7. M. Paolucci, T. Kawamura, T. Payne and K. Sycara, Semantic Matching of Web Services Capabilities, Proc. ISWC02, pp. 333-347 (2002).
8. QoS for Web Services: Requirements and Possible Approaches, W3C Working Group Note (2003).
9. R. Akkiraju et al., Web Service Semantics - WSDL-S, A joint UGA-IBM Technical Note, Version 1.0 (2005).
10. S. Ran, A Model for Web Services Discovery With QoS, ACM SIGecom (2003).
11. T. Berners-Lee, J. Hendler and O. Lassila, The Semantic Web, Scientific American, 284(5), 34-43 (2001).
12. The OWL Services Coalition, OWL-S Semantic Markup for Web Services, W3C Member Submission (2004).
13. Web Services Description Language (WSDL) Version 2.0, Part 0: Primer, W3C Working Draft (2004).
A TRUST ASSERTION MAKER TOOL
PAOLO CERAVOLO, ERNESTO DAMIANI, MARCO VIVIANI
University of Milan, via Bramante 65, 26013 Crema (CR), Italy
E-mail: {ceravolo, damiani, viviani}@dti.unimi.it
ALESSIO CURCIO, MICOL PINELLI
University of Milan, via Bramante 65, 26013 Crema (CR), Italy
E-mail: {acurcio, mpinelli}@crema.unimi.it
In this paper we outline the architecture of a distributed Trust Layer that can be superimposed on metadata generators, like our Trust Assertion Maker. TAM is a tool that allows producing metadata from multiple sources in an authority-based environment, where users' roles are certified and associated with a trust value. These metadata can complement metadata automatically generated by document classifiers. Our ongoing experimentation is aimed at validating the role of a Trust Layer as a technique for automatically screening high-quality metadata in a set of assertions coming from sources with different levels of trustworthiness.
1. Introduction
Nowadays, communication technologies have created a space of flows that substitutes for the space of places: geographical proximity is giving way to relational proximity. These phenomena greatly impact the architecture of a community of practice, giving it a distributed extension of its internal processes and dynamics. CoPs exist within businesses and across business units and company boundaries; even though they are informally constituted, these self-organizing systems share the capability of creating and using organizational knowledge through informal learning and mutual engagement 5, putting users at the center of the space of interactions. Moreover, in current business contexts that require multidisciplinary approaches and competencies, this stresses the relevance of users' roles, reputation, and trust. For these reasons, generic knowledge management techniques in CoPs have
to be evolved towards a source-oriented evaluation of the acquired knowledge. The knowledge extracted during the analysis of the information flow produced by the community must be filtered by the relevance of the node producing it. Moreover, the composition of nodes can evolve, and the knowledge is continuously under evolution pressure. Typically, knowledge management techniques use metadata in order to specify content, quality, type, creation, and spatial information of a data item. A number of specialized formats for the creation of metadata exist; a typical example is the Resource Description Framework (RDF), but metadata can be stored in any format such as free text, Extensible Markup Language (XML), or database entries. All of these formats must rely on a vocabulary that can have a different degree of formality. If this vocabulary is compliant with a set of logical axioms it is called an ontology. There are a number of well-known advantages in using information extracted from data instead of the data themselves. On the one hand, because of their small size compared to the data they describe, metadata are more easily shareable than data. Thanks to metadata sharing, information about data becomes readily available to anyone seeking it; thus, metadata make data discovery easier and reduce data duplication. On the other hand, metadata can be created by a number of sources (the data owner, other users, automatic tools) and may or may not be digitally signed by their author. The present paper briefly outlines our current research work (for a more detailed description, see 2) on how to validate such assertions by means of a Trust Layer, including a Trust Manager able to collect votes from the different nodes and to compute variations to the trust values on metadata. We then focus our discussion on the description of the Trust Assertion Maker (TAM), a tool especially designed to allow the manual production of trust assertions in distributed environments. Trust assertions are metadata assertions associated with a trust value; the trust level of an assertion is based on user roles. The adoption of a manual tool allows a detailed level of description and represents an important complement to the assertions produced by automatic classifiers, which simply associate resources with domain concepts. This paper is organized as follows: in Section 2 we outline the architecture of our Trust Layer; in Section 3.1 we focus our attention on TAM; finally, in Section 3.2 we define in detail the types of assertions supported by TAM.
2. The Trust Layer architecture
Before describing our proposed Trust Layer, let us make some short remarks on related work. Current approaches distinguish between two main types of trust management systems 1, namely Centralized Reputation Systems and Distributed Reputation Systems. In centralized reputation systems, trust information is collected from members of the community in the form of ratings on resources; the central authority collects all the ratings and derives a score for each resource. In a distributed reputation system there is no central location for submitting ratings and obtaining resources' reputation scores; instead, there are distributed stores where ratings can be submitted. In our approach trust is attached to metadata in the form of assertions rather than to generic resources. While trust values are expressed by clients, our Trust Layer includes a centralized Metadata Publication Center that acts as an index, collecting and displaying metadata assertions, possibly in different formats and coming from different sources. It is possible to assign different trust values to assertions depending on their origin: assertions manually provided by a domain expert are more reliable than automatically generated ones. Metadata in the Publication Center are indexed, and clients interact with them by navigating them and providing implicitly (with their behavior) or explicitly (by means of an explicit vote) an evaluation of the metadata's trustworthiness. This trust-related information is provided by the Publication Center to the Trust Manager in the form of new assertions. Trust assertions, which we call Trust Metadata, are built using the well-known technique of reification. This choice allows our system to interact with heterogeneous sources of metadata: our Trust Metadata do not depend on the format of the original assertions. Also, all software modules in our architecture can evolve separately; taken together, they compose a complete Trust Layer, whose components communicate by means of web services interfaces. This makes it possible to test the whole system even though single modules can evolve at different speeds. Summarizing our architecture, the Trust Manager is composed of two functional modules: the Trust Evaluator, which examines metadata and evaluates their reliability, and the Trust Aggregator, which aggregates all the inputs coming from the trust evaluators by means of a suitable aggregation function. This system allows the integration of large amounts of assertions produced by different sources. Trust aggregation algorithms provide a self-running mechanism allowing high-quality assertions to emerge from the whole set of produced assertions. Fig. 1 describes the architecture of our Trust Layer. More details
on the Trust Manager can be found in 3.
Figure 1. The Trust Layer Architecture.
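As an illustration of the Trust Aggregator's role, the sketch below combines the votes collected on a single metadata assertion into an updated trust value through a weighted average, where each voter's weight is the trust attached to its role; this particular aggregation function is only a placeholder, since the paper merely requires "a suitable aggregation function" and defers the details to reference 3.

    def aggregate_trust(votes):
        """Combine (vote, role_trust) pairs on one assertion into a single trust value.

        votes: iterable of (vote, role_trust), both in [0, 1].
        A role-trust-weighted average is used here purely for illustration.
        """
        weighted = [(v * w, w) for v, w in votes]
        total_weight = sum(w for _, w in weighted)
        if total_weight == 0:
            return 0.0
        return sum(vw for vw, _ in weighted) / total_weight

    # e.g. aggregate_trust([(1.0, 0.9), (0.4, 0.3)])  ->  0.85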
3. Trust Assertion Maker
3.1. Architecture and requirements
The functional requirements of TAM are to manage the role-based creation of metadata assertions. This means that each user must be associated with a role. Roles are obtained from an authority tasked with certifying users' expertise. For these reasons TAM is organized according to a client-server architecture: the client provides the structure for editing assertions, while the server manages user access and maintains the metadata base, synchronizing updates. A user's expertise is represented by associating with a role the portions of the ontology describing a domain in which his expertise can be considered reliable. Portions of ontologies can be computed simply by listing the concepts for which the expertise of a role is maximal. These concepts are associated with a trust value, and the trust values of the other concepts of the ontology are derived by progressively decreasing the trust value according to the distance from the closest concept among those in the list. In a knowledge management system the role of a manual metadata editor such
as TAM is to complement automatic classifiers. Tools implementing automatic or semi-automatic algorithms for indexing resources are essential in order to support a satisfactory number of resources, but these tools are able to provide only simple assertions connecting a resource to a concept. For this reason they can be used only for producing a first base of metadata. In order to enrich the metadata base, users must be provided with easy instruments for the production of semantically complex assertions. As shown in Fig. 2, TAM guides the user with step-by-step dialogs, allowing an assertion to be produced without any expertise in the format used by the system for storing and sharing metadata assertions. According to Stojanovic, Staab, and Studer 4, it is possible to distinguish two types of ontologies: a domain ontology, or content ontology, that provides the vocabulary valid for a particular domain, and a structure ontology, or context ontology, that provides general concepts, cross-functional to specific domains, describing the type/structure of the resource. Our tool supports both types of ontologies. We call Direct the assertions made on structural concepts, because we assert directly on the resources, and Indirect the assertions made on the content of a resource. The tool also allows simple or complex assertions to be produced. In summary we have four types of metadata assertions, as shown in Table 1; in Section 3.2 we describe these assertions in more detail.
Table 1. The four types of metadata assertions
          Simple   Complex
Direct    DS       DC
Indirect  IS       IC

3.2. Editing assertions
Metadata produced by TAM are editable by means of a user-friendly interface. The interface proposed in TAM leads the user to the creation of an assertion using step-by-step dialogs. In the first step the user has to select the role he wants to hold, the resource he wants to relate to the metadata assertion, and the ontology he wants to use for creating the metadata. Fig. 2 shows a screenshot of the TAM interface. In the second step the user has to choose the type of assertion he wants to create. As previously mentioned, four types of assertions are supported by TAM.
Figure 2. The TAM interface.
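The four assertion types detailed below (IS, IC, DS, DC) could be represented as in the following sketch; the class and field names and the plain subject-predicate-object encoding are illustrative assumptions, not TAM's actual storage format.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Assertion:
        resource: str                     # the resource being described
        role: str                         # role held by the author of the assertion
        direct: bool                      # True: structure ontology; False: domain ontology
        subject: str                      # a concept (simple) or the triple subject (complex)
        predicate: Optional[str] = None   # present only for complex assertions
        obj: Optional[str] = None
        trust: float = 0.0                # trust value derived from the author's role

    # IS: "Document_X speaks about Enterprise"
    is_example = Assertion("Document_X", "domain expert", direct=False, subject="Enterprise")
    # IC: "Document_X speaks about an enterprise that invests in technology"
    ic_example = Assertion("Document_X", "domain expert", direct=False,
                           subject="Enterprise", predicate="invests in", obj="Technology")
    # DS: "Document_X is an image"
    ds_example = Assertion("Document_X", "domain expert", direct=True, subject="Image")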
IS: Indirect Simple assertion. An Indirect Simple assertion associates single concepts of the domain ontology with a resource. To obtain this, it is necessary to select the resource and then to choose, from the contextual menu, the option related to the indirect simple assertion. It is then necessary to identify the concept of the ontology to be related to the resource. For example, an indirect simple assertion can be the following: "Document_X speaks about Enterprise", where Enterprise is a concept belonging to the domain ontology.
IC: Indirect Complex assertion. An Indirect Complex assertion associates with a document not only single concepts of the ontology, but whole logical assertions, structured according to the subject-predicate-object model. To obtain this, it is necessary to select the resource and then to choose, from the contextual menu, the option related to the indirect complex assertion. It is then necessary to identify the subject of the assertion by selecting a concept from the ontology. After that, the predicate must be specified, choosing it among all the possible predicates compatible with the subject (considering also the inheritance among the concepts). Finally, the object of the assertion must be selected. The following statement illustrates an example: "Document_X speaks about an enterprise that invests in technology", where Enterprise
517 and t e c h n o l o g y are concepts belonging to the domain ontology. DS: Direct Simple assertion. Direct Simple assertion is different from the previous ones, because it is built relying on a structure ontology. This assertion specifies the type of the current resource the user is indexing. This is an example of a simple no-semantic assertion: "Document-X is an image". IC: Direct Complex assertion. Direct Complex assertion is well explained by the following example: "Document_X is an image that has a white background". The structure is similar to the indirect complex assertion; the only difference is in the typology of utilized ontology, that is a structural ontology, instead of a domain ontology. 4. Conclusions In this paper, we have presented an approach for developing a Trust Layer service, aimed at improving the quality of automatically generated metadata. Such a system must be supported by a tool for manual creation of complex metadata assertions. A specific solution to this requirement is illustrated in the paper, introducing TAM: a tool allowing to produce different type of metadata assertions in a distributed environment.
Acknowledgments This work was partly funded by the Italian Ministry of Research Fund for Basic Research (FIRB) under projects RBAU01CLNB_001 "Knowledge Management for the Web Infrastructure" (KIWI), and RBNE01JRK8_003 "Metodologie Agili per la Produzione del Software" (MAPS).
References 1. J. Audun, I. Roslan, C.A. Boyd, Survey of Trust and Reputation Systems for Online Service Provision. Decision Support Systems (to appear) (2005) 2. P. Ceravolo, E. Damiani, M. Viviani, Adding a Trust Layer to Semantic Web Metadata To appear in Soft Computing for Information Retrieval on the Web, F. Crestani, E. Herrera-Viedma, G. Pasi, eds., Elsevier. 3. P. Ceravolo, E. Damiani, M. Viviani, Adding a Peer-to-Peer Trust Layer to Metadata Generators, Lecture Notes in Computer Science, vol 3762, pages 809-815, (2005).
518 4. L. Stojanovic, S. Staab, R. Studer, E-Learning based on the Semantic Web. Proceedings of the World Conference on the WWW and Internet, Orlando, Florida, USA, (2001). 5. E. Wenger, Communities of Practice: The Key to Knowledge Strategy, Knowledge Directions, 48-64, (1999).
W E B ACCESS LOG MINING WITH SOFT SEQUENTIAL PATTERNS
C. F I O T , A. L A U R E N T A N D M. T E I S S E I R E LIRMM - CNRS - UMII, Montpellier, France E-mail: {fiot, laurent, teisseire} @lirmm.fr
Mining the time-stamped numerical data contained in web access logs is interesting for numerous applications (e.g. customer targeting, automatic updating of commercial websites or web server dimensioning). In this context, the algorithms for sequential patterns mining do not allow processing numerical information frequently. In previous works we defined fuzzy sequential patterns to cope with the numerical representation problem. In this paper, we apply these algorithms to web mining and assess them through experiments showing the relevancy of this work in the context of web access log mining.
1. Introduction The quantity of data from the World Wide Web is growing dramatically: requested URLs, number of requests or session duration, etc. are gathered automatically by web servers and stored in access log files. Analysing these data can provide useful information for performance enhancement or customer targeting. In this context, many works have been proposed to mine usage patterns and user profiles [1, 2, 3]. Particularly, [4] provides knowledge from database of visited page sequences. However this method, based on sequential pattern mining, cannot be used to mine numerical data contained in these log files, such as number of requests for the same page, transfer rates, number of downloaded kilobytes or duration of sessions. Few works have been carried out to process such numerical data and most of them are restricted to association rules [5, 6]. Sequential patterns are more adapted to time-stamped data. In order to cope with this problem, we propose here to apply an efficient fuzzy approach for sequential pattern mining to mine time-stamped numerical data from web access logs. This approach, defined in our previous works [7], is based on the definition of fuzzy intervals. Obtained patterns are of the type "60% of users visiting a lot the Disneyland website and a few Eiffel Tower pages visit later a lot 519
520 of traveling websites". These patterns are characterized by their support, which is the percentage of users who have followed this rule. Three approaches are proposed to mine such rules: SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY, differing by the support computation. The end-user is allowed to choose between the speed of result extraction and the accuracy of the obtained frequent patterns. Implementation of these solutions is based on a method, which extends the PSP algorithm proposed in [4]. Experiments were carried out on synthetic datasets and on real-world data. They highlight the feasibility and robustness of a fuzzy approach. This paper gives an overview of our algorithms focusing on the processing of web log data. Section 2 introduces sequential patterns and fuzzy sequential patterns. Section 3 presents the experiments on web access logs. Section 4 concludes on the perspectives associated to this work.
2. From Crisp to Fuzzy Sequential Patterns In this section, we briefly describe the basic concepts of sequential patterns and fuzzy sequential patterns. Let T be a set of object records where each record consists of three information elements: an object-id, a timestamp and a set of items. An itemset, (iii-z . • -ik), is a non-empty, unordered set of items taken from / = {ii,i2, •••,*m}- A sequence s is a non-empty ordered list of itemsets, denoted by < siS2-..sp >. The support of a sequence s is the percentage of objects having s in their records. To decide whether a sequence is frequent or not, a minimum support value (minSupp) is specified by the user. A sequence s is said to be frequent if support(s) > minSupp. The problem of sequential pattern mining is to find all maximal frequent sequences [8]. In this context, fuzzy sequential patterns were defined to handle quantitative attributes. [9] and [10] proposed methods to mine fuzzy sequential patterns without providing experimental evaluations. We thus here consider the complete approach from [7], defining three fuzzification levels through the algorithms, SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY. We consider the fuzzy extension of item, itemset and sequence. The quantity universe of each attribute is discretized into fuzzy subsets (see next paragraph). A fuzzy item is the association of one item and one fuzzy set. For instance, [/php/tutor.htm, lot] is a fuzzy item where lot is a fuzzy set defined by a membership function on the access quantity universe of the item /php/tutor.htm. A fuzzy itemset is a set of fuzzy items, e.g. ([/php/tutor.htm,lot][/php/functions.php, little]). Note that the fuzzy itemset {{/php/tutor.htm, lot]{/php/tutor.htm, little])
521 is not a valid fuzzy itemset, since no item can be repeated within an itemset. A fuzzy sequence is a sequence of fuzzy itemsets, e.g. < ([/php/faq.htm,lot})(\/php/eg.php,lot]) >. Table 1.
Access grouped by IPs, temporaly ordered (empty cells for unaccessed pages)
Cust./IP CI 82.228.151.02
86.197.153.12
82.226.199.47 82.226.199.48
| Date dl d2 d3 d4 d5 dl d2 d3 d4 dl d2 d3 d4
|
/php/tutor.htm 2 1
|| 1 j
/php/eg.htm
||
/php/functions.php
3
/php/faq.php
1 1 1 2
4
2
2 1
4 1
||
|
3 1 5
1
dl
2 2
1
First, the quantitative database is converted into a membership degree database. These partitions are automatically built by dividing the universe of quantities into intervals. Each interval groups the same proportion of users. It is then fuzzified in order to enhance generalization. From these membership functions we get the membership degrees for each record and for each fuzzy set. An example is given for CI in Table 2. Table 2. D. dl d2 d3 d4 d5
Membership degrees for customer 1
/php/tutor.htm U. | L. || ,, j , 1 :• J".
Items /php/eg.htm L. || m. li 5
V 7S
/php/functions.php li.
L.
/php/faq.php ||
m
|
T.
i j
0.5
0.3 0.5 ••
fM
.
0.5 II.fi 1
1
The support of a fuzzy itemset (X, A) is the proportion of objects supporting it. We propose three definitions depending on the fuzzification level: (1) SPEEDYFUZZY counts objects recording, for each item of the itemset, a membership degree not null at least once; (2) MINIFUZZY is based on a thresholded count, incrementing the number of objects supporting the fuzzy itemset when each of its item has a membership degree greater than a specified threshold in the data sequence; (3) TOTALLYFUZZY carries out a thresholded S-count. The support is computed as a weighted sum of the membership degrees greater than a relevancy threshold u>. The support of a fuzzy sequence is computed as the ratio of the number of objects supporting this fuzzy sequence compared to the total number of objects in
522 database. This support degree is computed algorithms detailed in [7]. Let us consider the membership database for customer 1 (Table 2) and the support of < ([/php/tutor.htm, lot])([/php/f unctions.php, lot]) >. With SPEEDYFUZZY, we consider the items underlined into account. With MINIFUZZY (w=0.49), we take the items boldfaced into account. With TOTALLYFUZZY (o;=0.49), customer 1 supports the sequence, the best occurrence of the sequence is kept, twice underlined. Table 3. SPEEDYFUZZY MINIFUZZY
TOTALLYFUZZY
Sequential patterns extracted with
minsupp=55%
<([/php/tutor.htm, little])([/php/functions.php, lot])> <([/php/tutor.htm, lot]}([/php/functions.php, lot])> <([/php/tutor.htm, little])([/php/functions.php, lot])> <([/php/functions.php, lot])> <([/php/tutor.htm, little])([/php/functions.php, lot])> <([/php/functions.php, Iot])>
75% 75% 75% 75% 69% 56%
Table 3 shows the sequential patterns respectively extracted by SPEEDYand TOTALLYFUZZY. Note that the frequent items are the same for all counting methods. The difference is in the number and length of the sequences. For a same minSupp, the number and length of the mined patterns are indeed greater with MINIFUZZY or SPEEDYFUZZY than with TOTALLYFUZZY (due to the thresholded E-count). This reduction in the number of patterns can be used for a database containing very high amount of frequent patterns to find the most relevant ones, the user will thus be provided with a selection of patterns, and not only have to assess a selection of patterns and not a really large quantity of them. So SPEEDYFUZZY could be used to identify user profiles, whereas TOTALLYFUZZY for mining detailed downloading rate. FUZZY, MINIFUZZY
3. E x p e r i m e n t s In this section, we present performances of the algorithms SPEEDYFUZZY, MINIFUZZY and TOTALLYFUZZY compared to PSP [4] and results of web access log mining with TOTALLYFUZZY. We show that soft sequential pattern mining brings more relevant information than crisp methods. Access logs from a laboratory website were prepared and mined to find frequently visited pages - such as in crisp sequential pattern mining - but also repeatedly visited pages. The access logs were pre-processed and the dataset recorded the number of accesses to one page, the same half-day by one user. For example, record "1500 5067 10 6" means that "visitor 21500" on half-day
523 5067 visited 6 times the URL coded by 10. This dataset contained 27,209 web pages visited by 79,756 different IPs over 16 days (32 half-days). As explained previously, we fuzzify the number of access per visitors on each page 3 fuzzy sets using a tool based on the DiscretizeFilter module of Weka [11]. Note that all data modeling choices have impact on what can be extracted from the data. Next experiments should thus be carried out using a fuzzification more adapted to the dataset, based on the approach of [12]. First, we compared performances of the fuzzy algorithms to those of PSP [4]. Figure 1(a), it can be noted that SPEEDYFUZZY is almost as fast as PSP despite the fact that it scans three times more items. Figure 1(b) shows that MINIFUZZY and TOTALLYFUZZY extract less frequent sequences than SPEEDYFUZZY and PSP. MINIFUZZY and ToTALLYFuzzYonly keep the items which have a degree greater than w and so which are considered as relevant by the user. The number of frequent sequences is then necessarily reduced compared to SPEEDYFUZZY or PSP. Figure 1(c) shows the extraction time according to the number of data sequences in database for minSupp = 0.2. These preliminary experiments show that results on fuzzy logs are consistent with the one on crisp values. The same frequent URLs can indeed be found using crisp or soft sequential pattern mining. The advantages of our method concern the additional knowledge supplied by quantities. Indeed, while the crisp algorithm PSP found URL 139 was frequently accessed, TOTALLYFUZZY found that it was frequently accessed between 2 and 5 times during the same period.
Runtime according to the minsup value
(a)
t of frequent sequences according to minSupp
(b)
Runtime according to the sequence number
(c)
Figure 1. (a)Runtime according to rninSupp for 79756 sequences; (b)Number of frequent sequences according to minSupp for 79756 sequences; (c)Runtime according to the number of sequences in the datasets, for minSupp=0.2%
524 4. C o n c l u s i o n a n d p e r s p e c t i v e s Historical analysis of web logs and more especially sequential pattern extraction from web databases is highly interesting for customer targeting, server dimensioning or transfer optimization. In this paper we propose to mine web logs using fuzzy sequential patterns handling three fuzzification levels thanks to three algorithms. This choice allows the extraction of frequent sequences by making a trade-off between relevancy and performance. Experiments on web access logs have highlighted the relevance of our proposal. This work builds many perspectives, for instance, to mine web-purchases for e-marketing.
References 1. M. Spiliopoulou and L. C. Faulstich. WUM: A tool for Web utilization analysis. Lecture Notes in Computer Science, 1590, 1999. 2. O.-R. Zaiane, M. Xin and J. Han. Discovering web access patterns and trends by applying olap and data mining technology on web logs. In Proc. of the Advances in Digital Libraries Conf., pages 19-30, 1998. 3. E. Damiani, B. Oliboni, E. Quintarelli and L. Tanca. Modeling users' navigation history. In WS. on Int. Tech. for Web Personalisation, 2001. 4. F. Masseglia, P. Poncelet and R. Cicchetti. An efficient algorithm for web usage mining. Networking and Information Systems Journal, 2(5-6), 1999. 5. C. M. Kuok, A. W.-C. Fu and M. H. Wong. Mining Fuzzy Association Rules in Databases. SIGMOD Record, 27(1), pages 41-46, 1998. 6. R. Srikant and R. Agrawal. Mining Quantitative Association Rules in Large Relational Tables. In Proc. of the ACM SIGMOD Int. Conf. on Management of Data, pages 1-12, 1996. 7. C. Fiot, A. Laurent, M. Teisseire and B. Laurent. Why Fuzzy Sequential Patterns can Help Data Summarization: An Application to the INPI Trademark Database. In Proc. of the 2006 Fuzz-IEEE Conference, to appear. 8. R. Agrawal and R. Srikant. Mining Sequential Patterns. In Proc. of the 11th Int. Conf. on Data Engineering, pages 3-14, 1995. 9. T.P. Hong, K.Y. Lin and S.L. Wang. Mining Fuzzy Sequential Patterns from Multiple-Items Transactions. In Proc. of the Joint 9th IFSA World Congress and 20th NAFIPS Int. Conf., pages 1317-1321, 2001. 10. Y.-C. Hu, R.-S. Chen, G.-H. Tzeng and J.-H. Shieh. A Fuzzy Data Mining Algorithm for Finding Sequential Patterns. Int. J. of Uncertainty Fuzziness Knowledge-Based Systems, 11(2), pages 173-193, 2003. 11. I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools with Java Implementations, 2000. 12. A. Gyenesei and J. Teuhola. Multidimensional fuzzy partitioning of attribute ranges for mining quantitative data: Research articles. Int. J. Intell. Syst., 19(11), pages 1111-1126, 2004.
AN IMPROVED ECC DIGITAL SIGNATURE ALGORITHM AND APPLICATION IN E-COMMERCE XU XIAO-PING Guangdong
Polytechnic
Normal
University
Guangzhou
Electron-information
Department,
510633
Abstract: It is well known that Elliptic Curve Cryptograph (ECC) features the highest bit security. The author analyses the advantages of ECC by comparison, explores the mathematics base on elliptic curve and the complexity of discrete logarithm, and makes improvements on existing Elliptic Curve Digital Signature Algorithm (ECDSA) to accelerate the arithmetic speed and shorten time for data transmission. Better effects have been achieved by applying ECC into E-commerce to realize digital signature algorithm. Keyword: Elliptic Curve Cryptograph, Digital Signature, E-commerce
1 Introduction The popularity of the Internet and the sharing of the information have now reached a new level, which have made information security protruded day by day. Key technology is critical in solving network information security. The study on key has been increasing in an explosive way over the past 20 years. RSA, a popular technology, has matured and has higher requirements on bit length. The increase on key module has slowed down its speed in encryption and decryption. Its realization by hardware also becomes more and more unbearable. Therefore the application of RAS has become a heavy burden, especially its application in E-commerce, a substantive way of safe trading. ECC adopts small module and still reaches the same level of security as RAS. Thus the study on ECC has become hot for domestic and overseas scholars. The author has made some improvements on existing ECDSA. Its arithmetic speed has been accelerated and its data transmission has been shortened. The author also studies the application of its improved algorithm in E-commerce and makes E-commerce faster and safer. 2 ECC Systems and its Comparison Since the birth of public key cryptography, people have presented various public key cryptography methods. The following three types of systems have been thought to be safe and effective: Big Integer Factorization System (The representative one is RSA), ECC and Discrete Logarithm System (The representative one is DSA.) 2.1 RSA RSA was firstly put forward in 1978 by three professors from Massachusetts Institute of Technology: Rivest, shamir and Adleman. This is a 525
526 unilateral trapdoor function, an exponential function based on factorization. An exponential function is a unilateral trapdoor function based on seeking discrete logarithm. It is a matured and widely used key system nowadays. The security of RSA is based on the difficulty of factorizing big integer in number theory. Therefore if the integer is bigger, the factorizing will be more difficult. It will be more difficult to decrypt the key. The level of encryption will be much higher. The advantages of RSA methods are: it is simple in theory and easy to use. But with the advancement and perfection of big integer factorization, the raising of computer operation speed and the advancement of computer network, the integer for safeguarding RSA encryption and decryption is required to be bigger and bigger. To ensure RSA safety, the length of its key is expanding. For example, today it is thought RSA needs 1024-bit length to protect its safety'11. Therefore it brings a heavy burden to use RSA and its application range is limited. 2.2 DSA DSA has been presented by National Institute of Standards and Technology of U.S. in August 1991. It is a digital signature standard based on discrete logarithm which is used for Digital Signature Standard (DSS). This is a DSS of public key. DSA uses a public key to verify data integrity and consistency. DSA only provides digital signature. It does not provide data encryption'21. 2.3 ECC In 1985, Neil Koblitz and Victor Miller respectively put forward Elliptic Curve Cryptograph (ECC). The security of ECC not only relies on the factorization of discrete logarithm at elliptic curve, but also on the selection of the curve and system of curve. At present, 200 bits ECC already has very high security. The mathematic theory of ECC is that'31: at a binary finite field F, an elliptic curve is developed based on certain rule where addition and multiplication are defined. Suppose that two points P and Q are known at elliptic curve: Q=xP Seek x? The seeking of X is the well-known discrete logarithm on elliptic curve. At one side, it maps multiplication and exponent arithmetic on field of real numbers into addition on elliptic curve. It is faster and easier than other public key systems whether ECC is realized by hardware or software. The cost is also very low. At the other side, seeking X at elliptic curve is involved in both integer factorization and discrete logarithm, which is undoubtedly more difficult. It is quite natural that this system has higher security.
527 Table 1: Comparison among Various Public Key Systems in key length Protected key (bit)
80
E C C (bit) n
161
D S A (bit) R S A (bit) n
112 224
128 256
192 384
256 512
q
160
224
256
384
512
P
1024
2048
3072
7680
15360
1024
2048
3072
7680
15360
The security of encryption is reflected by its ability to resist attacking. Compared to other public key systems. ECC has absolute advantages in resist attacking. For example, 160-bit ECC has the same level of security as 1024-bit RSA or DSA. 210-bit ECC has the same level of security as 204-bit RSA or DSA.
Decryption Time (MIPS Years) Figure 1: Comparison among ECC, RSA and DSA in Resisting Attacking
ECC has the following technical advantages: 1 .Smaller computational complexity and it is quick in operation. Though RSA can raise key processing speed by selecting smaller key (can be as small as 3) to speed up processing of encryption and signature verification and make RSA compatible in encryption and signature verification speed, ECC is much more faster in processing key (decryption and signature verification) than RSA or DSA. Therefore the comprehensive speed of ECC is much faster than that of RSA or DSA. 2.Smaller storage space Key size and system parameter of ECC is much smaller than RSA or DSA. This means that its storage space is much smaller. It is very significant in applying encryption on IC card.
528 3. Lower bandwidth requirement. When long message is encrypted or decrypted, these three key systems have the same requirements on bandwidth. But ECC has much lower bandwidth requirements when short message is processed. While key encryption is mainly used for processing short message, for example, for digital signature or conversation key transmission in symmetrical system. The low requirements on bandwidth make ECC have very wide prospect in application at wireless network. These features from ECC will make it take the place of RSA and become a popular public key decryption algorithm. For example, SET framer has decided to make ECC a default public key algorithm for next SET protocol. 3 ECC Digital Signature Algorithms and its Improvement Digital signature is used to guarantee data integrity in transmission and provide identity verification and non-repudiation of sender. Using public key algorithm is the main technology to realize digital signature. The flow chart of digital signature and verification • Plaint Text Hashed Value Plaint lext
^
Hash Algorithm
Signature Algorithm
uigitai signature
Private Key | Figure 2: Digital Signat ure Flow chart Plaint Text Plaint Text
^
Hash Algorithm
Hashed Value ..
Verification Algorithm ^
Digital Signature
Valid or not ^
iL
Public Key
Figure 3: Signature Verification Process
By existing elliptic curve digital signature algorithm, we have made some improvements in ECC to make its operation burden decreased and its speed accelerated. Inversion is the main operation burden in existing ECC encryption or signature process. For example, in verifying signature, the algorithm of w= s"1 (mod n), which uses extended Euclid algorithm and needs to complete 0.8431og2(n)+1.47 divisions in average, is very slow'41. We developed a totally
529 new signature equation, which does not need inversion in its algorithmic process. The signature and verifying equations are as following: Signature Equation: k=s+mrx Verifying Equation: r=sg+mry Operation: sg+mry=(k-mrx)g+mrxg=kg=r From above process, we can understand that it is same to generation of elliptic curve and distribution of private keys in existing IEEE PI363. The difference lies in its signature generation and verification process. The improved algorithm reduces the operation burden and speeds up the operation, which has been verified by experimental results. Under the premium that the same security will be provided, this solution has lower requirements on system resources and is much more suitable for using at environment that is limited by algorithm or storage capability, such as intelligent card etc'51. 4 Application of Improved ECC in E-commerce E-commerce is the inevitable result from social and technology development. The network transaction needs a lot of information, such as information on product manufacturing, supply, product demand, competition etc. Compared to traditional transaction, E-commerce has higher requirements on information security'61. The core and key of E-commerce security is the security of e-transaction. At present, the popular safe on-line payment protocols used in e-transaction are SSL and SET. SET (Safe Electronic Transaction) is one of the protocols to guarantee safe transaction on Internet. Because of its strict design and high security, it has been widely used. In SET protocol, various encryption algorithms, including symmetrical and unsymmetrical encryption, have been used to realize safety. In text, we will apply improved ecliptic curve digital signature algorithm into SET. 4.1 SHA Application SI.Add the text for abstracting; S2.Reckon abstract S3.Send your information and abstract to other (business or bank) S4.The other side (business or bank) initializes in the same way. Added text, reckon abstract and compare if abstracts are same. 4.2 Signing Process SI.Cardholder uses SHA to generate abstract H(OI), H(PI) from (OlorPI) . S2.Card holder decides elliptic curve parameters F= (P, a, b, g, n, h) or (m, f(x), a, b, g> n, h) . S3.Cardholder sends decided hash function (SHA) and parameters of elliptic parameters to business and bank.
530 S4.Cardholder chooses private key X based on decided limited field G (P) and elliptic curve. Public key Y will be got from the public point g: y=xg. The signer makes y in public. S5.Cardholder chooses random parameter K, l ^ K ^ n - 1 . S6.Reckon r=kg. If r=0, go to step S5. S7.Reckon s=mrx-k. ( m is text abstract). S8.Use (s, r) to sign m. Send (s, r) with m to verifier. 4.3 Verifying Process S9.The business reckon r'=sg+mry. SlO.If r'=r, then signature is valid. Otherwise the signature will be turned down. 4.4 Code The selection of elliptic curve adopts IEEE P2363 standard. The selection of parameters171 is as following: // Various fields: p(t) = tA163 + tA7 + tA6 + tA3 + 1 inline void use_NIST_B_163 () {F2X pt=Pentanomial (163, 7, 6, 3, 0); setModulus (pt);} // Degree 163 Binary Field from fips 186-2 // (fake random curve E) : yA2 + xy = xA3 + xA2 + b, // b = 2 0a601907 b8c953ca 1481ebl0 512f7874 4a3205fd // (Order of base point) : // r = 5846006549323611672814742442876390689256843201587 // (Base point G) : // Gx = 3 f0ebal62 86a2d57e a0991168 d4994637 e8343e36 // Gy = 0 d51fbc6c 71a0094f a2cdd545 bllc5c0c 797324A // remaining factor h = 2 #define NIST_B_163 EC_Domain_Parameters (163, 3, 7, 6, 3, Curve ("1", "20a601907b8c953cal481ebl0512f78744a3205fd"), decto_Bigint ("5846006549323611672814742442876390689256843201587"), Point ("3f0ebal6286a2d57ea0991168d4994637e8343e36", "0d51 fbc6c71 a0094fa2cdd545b 11 c5c0c797324f 1"), decto_BigInt ("2")); void ecdsa_ex () { // Degree 163 Binary Field from fips 186-2 use_NIST_B_163 (); EC_Domain_Parameters dp = NIST_B_163; ECPrivKey sk (dp);// (generate key pair from elliptic curve parameters) std::string M ("correct message"); ECDSAsigl (sk,OS2IP(SHAl(M)));// (generate signature) // (DER encode) DERder_str(sigl);
531 HexEncoder hex_str (der_str); std::cout« "DER Encoding: " « hex_str « std::endl; ECDSA sig2; try {// (analyze and check the error from DER) sig2 = der_str.toECDSA (); // (decode) } catch (borzoiException e) {// (Print error message and exit) e.debug_print (); return;} ECPubKey pk (sk); std::cout« "Checking signature against M: " « M.c_str ( ) « "\n->"; std::cout« "SHA1(M):" « OS2IP(SHAl(M)) « std::endl; if (sig2.verify(pk, OS2IP(SHAl(M))» // (verify and sign) std::cout«"valid signature\n"; else std::cout« "invalid signature\n"; M = "in" + M; // (falsify data) s t d . x o u t « "Checking signature against M: " « M.c_str ( ) « "\n->"; if (sig2.verify(pk, OS2IP(SHAl(M)))) // (verify signature) std::cout« "valid signature\n"; else std::cout« "invalid signature\n";} The improved algorithm has higher performance now and it is good for implementing key, but not good to crack key. The improved algorithm has raised key practicability as well as guaranteed key security, simplified operation complexity and accelerated operation speed. These have been verified by experiment.
i^iiirK , 'P* , ' Hi 7^ J *^Kag.5iri!,~* zva,," -- •*•'-!™ .•• •
- Pi xf
deyfil: 3a3id53081a20btt72a8648ce3d82B130819602Biai3Blaa28288a30f>092a8648ce3dBia2U 3033O0?e2@18302010C02ei8?:i82e04i50000OB8000@0800000008008fl80BB008B0B008000104i!;B 2B«6B1907b8c953cal48iebl0512f 78V44a328SFdB42MM«3F0eljal6286a2dS7eaB9¥l.:1.68d4V9463 7i;8343e360Bd51fbc6r.7:ta0094f«2c:dd545hllc5c0i:797324ri82:lS04B8080Baa08800808800292f e77e70ci2a4234c33020102032eB8042bB48:l9bc9ba«e09S34M045224S'»92dF5dM>834b86SFfBiB e26418028aa7S9ae3376ai92c69a8014e34FS8c Aei-d?.-- 3881dS3081«.206872*8648<;B3da201388i96028:1013Bl»82028B»306092a8648ce3dai82M 3033009028103828184020J.07302ea41S0BB0800000fl0B8008fl0000000BBB8888000008BB0i041S0 28a6019a?b8c953cal481eM0S'12F?8?44a328£Fd042b0403F0ebal628Ga2dS7ea899U68d499463 7c8343e360Brt5lFhc6r.7la0B94fa2cdd51KhllcHc0i;797324Fl.02i504BBBB8fl0000B00000a00292F e7?e78cl2a4234c330201B2U32sB0842hB40:l.9bc9bd«e09!i34blM45224S492dF5db6834bS65FF01W e2«41.8B28aa7S9ae3376Bl92c69aSB14e34fSI»r: OK Pi'ess anij key to continue^
Figure 4: the experiment and its results.
532 5 Conclusions With continuous popularity of Internet, E-commerce has become social development trend. As a network payment protocol, SET is in continuous perfection. After ECC becomes a hot issue, there are a lot of studies on its application. The author has made some improvements on existing elliptic curve digital signature algorithm and has verified the theory and practical applications on improved algorithm. The improved algorithm has simplified operation complexity, accelerated operation and guaranteed key security. It is suitable for application in SET. Reference [1] Li Ke-hong, Wang Da-ling and Dong Xiao-mei. Applied Password Study and Computer Data Security. Northeast University Publishing House. Shengyang. 1997. [2] Written by Bruce Schneier; Translated by Wu Shi-zhong, Zhu Shi-xiong and Zhang Wen-zheng. Applied Password Study —Protocol, Algorithm and C Source Program. Machine Press. Beijing. 2002. [3] National Institute for Standards and Technology, "Digital signature standard', FIPS Publication 186, 1993 [4] Qing Si-han. Password Study and Computer Network Safety. Qinghua University Publishing House. Beijing. 2001. [5] Lu Sheng, Miao Quan-xing, Bian Zheng-zhong, Luo Rong-tong. Design and Realization of Elliptic Curve Intelligent Card Algorithm. 2003(9):25-2 [6] Han Bao-ming, Du Peng and Liu Hua. Safety and Payment of E-commerce. People's Post and Telecommuniations Publishing House. Beijing. 2001. [7] G Agnew, R. Mullin and S. Vanstone, "An implementation of elliptic curve cryptosystems over F2155", IEEE Journal on Selected Areas in Communications, 11 (1993), 804-813. Note: This topic is a natural science study project for universities from Education Department of Guangdong Province. The project number is 203061. Author: Xu Xiao-ping, Associate Professor of Computer Applications. Mainly involved in the study of Internet Environment and Applications. Address: Room C903, New Teachers Village, South China Normal University, Guangzhou P C : 510631 Email: cathy.xu@ 163.com
AN IMMUNE SYMMETRICAL NETWORK-BASED SERVICE MODEL IN PEER-TO-PEER NETWORK ENVHIONMENT XIANGFENG ZHANG1, LIHONG REN1, AND YONGSHENG DING1'2, * I) College of Information Sciences and Technology 2) Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education Donghua University, Shanghai 201620, P. R. China * Email: [email protected] Requirements for future Internet are implemented possibly in peer-to-peer network environments where all network nodes are equal to each other. Inspired by the similar features between immune systems and future Internet, based on the immune symmetrical network theory, a network service model in a peer-to-peer network environment is designed on our bio-network platform. To validate the feasibility of the model, we do experiments with different network service distributions and user requests towards each node. The results of the hops per request with time show that the users can acquire network services flexibly and efficiently.
1. Introduction The development of Internet technologies enables more and more man-made devices to access Internet and act as its components, which shows us a bright prospect of Internet. Obviously, future Internet must be capable of extensibility, survivability, mobility, and adaptability to network environments. It is necessary to optimize Internet architecture and its applications to address the challenges of the above key features. On the other hand, biological immune systems are adaptive systems and learning behaviors take place through evolutionary mechanisms similar to biological evolutions. They are distributed systems without central control. They can survive local failures and external attacks and maintain balance because of emergent behaviors of the interactions of many local elements, like immune cells. Inspired by the similar features between the immune systems and future Internet, a bio-network platform architecture has been designed in our former work. Bio-entities in the bio-network architecture are regarded as autonomous immune cells and they possess the characteristics of immune cells, such as interaction, no central control, diversity, mobility, and evolution. 533
534 Requirements for future Internet are implemented possibly in peer-to-peer (P2P) network environments where all network nodes are equal to each other and provide network application and services for other nodes in a distributed mode. Hoffmann [1] has proposed a symmetrical network theory for the immune system, which is a tractable first approximation. Principal lymphocytes that are involved in the response to a particular antigen are classified into just two specificity classes or sets. The interaction between these two sets maintains the balance of the immune system. So we develop a network service model in a P2P network environment based on the immune symmetrical network theory. This paper aims at the design a network service model in a P2P network environment. To this end, it is organized as follows: Section 2 will introduce some basic theories about immune symmetrical system and P2P network. Section 3 will present briefly our bio-network platform architecture. Section 4 will emphasize on the creation of the network service model. In order to validate its feasibility, some experiments are made on the bio-network platform with different network service distributions and user requests towards each node. The results show that the users can acquire network services flexibly and efficiently. Section 5 concludes the paper by discussing advantages of the service model. 2. Background 2.1. Immune Symmetrical Network The natural immune system is a very complex system with several functional components. It plays an important role in coping with dynamically changing environment through constructing self-non-self recognition networks among different species of antibodies. According to the immunologists, the components such as cells, molecules and organs in the immune system can prevent a body from being damaged by pathogens, known as antigens. The basic components of the immune system are lymphocytes that have two major types, B lymphocytes (B cells) and T lymphocytes (T cells). These two types play different roles in the immune response, but they act together and control or affect each other. The immune system is an adaptive system and learning behaviors take place through evolutionary mechanisms similar to biological evolution. It is a distributed system without central control. It can survive local failures and external attacks and maintains balance because of emergent behaviors of many local elements, like immune cells. Hoffmann [1] has proposed a symmetrical network theory for the immune system. The principal lymphocytes are classified into just two specificity classes. The first class is the antigen-binding set, denoted T+ andB+ for T
535 cells and B cells respectively. The second set is minus or anti-idiotypic set, T and B -. There are three types of interaction between the plus and minus sets as follows: stimulation, inhibition, and killing. Stimulation can occur when two lymphocytes encounter each other. The receptors of one lymphocyte ('+') can cross-link the receptors of a second lymphocyte ('-'), the converse is also true, so stimulation is assumed to be symmetrical in both directions between these two sets. Specific T cells factors could inhibit receptors. Finally, antibody molecules are assumed to kill in a symmetrical mode. According to interactions among B cells and T cells, we can receive a set of four stable steady states for the system of T+, B +, T - and B - cells. The steady states are initial, suppressive, responsive, and anti-responsive. 2.2. Peer-to-Peer Network Although the traditional client-server model first establishes Internet's backbone, more and more clients enter Internet and loads on the servers are steadily rising, resulting in long access times and server breakdowns. The user requests and communications among users are completed through application server. While P2P systems offer an alternative to such traditional systems for a number of application domains. In P2P systems, every node (peer) of the system acts as both client and server and provides part of the overall resources/information available from the system. In a pure P2P system, no central coordination or central database exists and no peer has a global view of the system. Participating peers are autonomous and self-organize the system's structure, i.e., global behavior emerges from local interactions. P2P technologies have many applications, such as file sharing and exchanging, distributed computing, collaborative system, peer-to-peer computing, and enterprise application. Like most other P2P applications, Gnutella builds, at the application level, a virtual network with its own routing mechanisms. Reference [2] extracts the topology of Gnutella's application level network. A content location solution is also proposed in which peers loosely organize themselves into an interest-based structure on top of the existing Gnutella network and this method makes Gnutella a more competitive solution [3]. Yang [4] further evaluates a non-forwarding peer-to-peer search. 3. A Bio-Network Platform Architecture In our previous work [5,6], we regarded services and applications on Internet as a number of interacting entities and applied some key principles and mechanisms of the immune systems to build network computation models. Based on these models, we have built a new bio-network platform architecture
536 (see Figure 1.) and its simulation platform which has the capability of service emergence, evolution etc. The simulation platform can be used to simulate some complex services and applications for Internet or distributed network.
Bio-network Low-level Functional Modules
Figure 1. The bio-network platform architecture.
In the bio-network platform architecture, a basic and important entity is called bio-entity. A bio-entity is an autonomous mobile agent and analogous to a cell or an antibody in the immune system. The bio-entity consists of its attribute, behaviors, services information, and communication. Interactions among a group of bio-entities can form an emergent service or application, called a society-entity. Apart from the bio-entities, the layered infrastructure consists of bio-entities' survivable environment, bio-network core services, and bio-network low-level functional modules established in a network node. Obviously, the bio-network platform hosts in a network node. The bio-network architecture has the advantages of extensibility, survivability, mobility, and adaptability to the changes of different users and network environments over the current Internet. Bio-entity survivable environment. The Bio-entity survivable environment is a runtime environment for deploying and executing bio-entities and protects the node from attacking with some security policies. It exists on different platforms so that the bio-entities could migrate in the heterogeneous network and access resources of different systems. Bio-network core services. The bio-network core service layer provides a set of general-purpose runtime services that are frequently used by bio-entities. They include event processing service and some basic services such as lifecycle
537 service, directory service, naming service, community sensing service, bio-entities migration service, evolution state management service, interaction control service, credit-driven control service, security authentication service, and application service. All these services alleviate bio-entities from low-level operations and also allow bio-entities to be lightweight by separating them from routine work. Bio-network low-level functional modules. In the bio-network low-level functional modules, there are six main modules: local resource management, bio-entity registration, bio-entity state control, local security, message transport, and class loader. The ideal model would place a bio-network platform on every device as a network node. The modules are just a bridge to maintain access to local resources. 4. A Service Model in Peer-to-Peer Network Environment 4.1. The Creation ofP2P Network Environment We implement the P2P network model on the designed bio-network simulation platform according to the symmetrical network theory of immune systems. The network can provide services and applications to users. Because of complexity and heterogeneity of future Internet, the network structure is very changeable, from just few nodes to incalculable nodes. Users can communicate directly, share resources and collaborate with each other. Users or services, represented by bio-entities in the network nodes, can be regarded as antibodies or anti-antibodies in the immune system. The bio-entities in different nodes interact equally. They have three actions between two neighbor nodes and keep four steady states. There are several bio-entities on the nodes to provide network services. The inter-connecting nodes are called symmetrical nodes. As an example, Node 1(N1) and Node 2 (N2) are regarded as two sets and bio-entities in the bio-network are regarded as antibodies in the symmetrical immune network, thus interactions of bio-entities in these two nodes exit stimulation, inhibition, and killing. The two nodes have four stable states: initial state, suppressive state, responsive state, and anti-responsive state, as shown in Figure 2.These states are shown in details as follows. (1) Initial state. When user requests to nodes are few, the bio-entities in the node can provide enough services to the users so that the users need not send requests to other nodes. At the same time, bio-entities have not enough credits to provide services to the other nodes.
538 (2) Suppressive state. With the increment of user requests, bio-entities in the network nodes require more and more credits to reward their services and they will evolve to produce next generations. At the same time, bio-entities in the node cannot provide enough services to the request users towards the node. For instance, the bio-entities on Nl have not enough credits for some users; while the bio-entities on N2 suppress the users of Nl to access their services because they have to provide services to their own users. The two nodes suppress each other to use their own resources and services, so the bio-entities in the nodes keep suppressive state. Nl
oo o o o
(1) Initial state (2) Suppressive state (3) Responsive state
(4) Anti-responsive state
Fig. 2.
Interaction of two end nodes in P2P network.
(3) Responsive state. When the requests towards Nl are increasing while the requests towards N2 are decreasing, a lot of unused resources or bio-entities exist on N2, bio-entities on Nl stimulus bio-entities on N2 and the latter provides services directly or migrates to Nl to provide services. When the services are provided, the bio-entities return their nodes and establish relationship with Nl. The state of the two nodes is in responsive state. With the decrement of bio-entities on N2, the services to N2 reduce and the credit level of Nl is higher than that of N2 so that Nl and N2 cannot keep the relationship, they will look for new nodes, and return the initial state. But the relationship between these nodes still exits and the bio-entities on the nodes interact with each other to provide services. If a bio-entity, denoting user requests, needs network services on other nodes, it sends a request to all of its known neighbors. The neighbors then check to see whether they can reply to the request by matching it to the service type. If they find a match, then they will reply; otherwise, they will forward the query to their own neighbors and increase the message's hop count. For instance, there are three bio-entities A, B, and C hosting on Nl, N2, and N3 respectively. Bio-entity A sends requests to other bio-entities of its neighbors. If B can provide service, it replies to A. At the same time, B acquires credits from A and establishes a relationship with B. If B cannot provide services to A, it sends the
539 request to C, and C will migrate to Nl to provide service for A. Then C returns its node and establishes a relationship with A. At the same time, bio-entities A and C announce their relationships to odier bio-entities on Nl. Bio-entities on Nl may access services on N3 directly and save time because of the decrease of the hops next time. (4) Anti-responsive state. The state is opposite to responsive state. The bio-entities on Nl provide services to requests from N2 and receive the credits. 4.2. Simulation Experiments In the simulation platform, we implement the P2P network model with two thousand nodes and simulate the service access. Before starting the experiments, we model the following parameters: (1) service information distribution; (2) request distribution encapsulates information on users' behavior. First, we set simulation parameters and set up simulation environments. We assume two request distributions: an unbiased distribution, with all requests having the same probability of being submitted, and a biased distribution, with a small number of requests are frequently asked. Suppose ten kinds of different network services distribute randomly on different nodes. In these experiments, we assume static resource distributions and no failures. The resource distributions are balanced and unbalanced. 100
H
m
|
40
O - -"— -13' -
Unbalanced & biased Unbalanced & unbiased Balanced & biased Balanced & unbiased
I 20
10
20 30 40 Simulator Time (generations)
50
Figure 3. Average number of hops per request with the change of simulation time.
Average number of hops per request with time is shown in the Figure 3. We can see that biased user request can access more easily services than biased uses request under unbalanced or balanced resource distribution. As to a type of user request, bio-entities can migrate much more efficiently resource nodes and achieve much more easily resource services under the unbalanced resource distribution.
540 5. Conclusions In this paper, a network service model in P2P network environment is designed which based on the immune symmetrical network theory. Bio-entities on two nodes can interact through stimulation, inhibition, and killing to provide some services. From the simulation experiments, the nodes are designed in four stable states as those in the immune symmetrical network. Different request distribution affects the average number of hops. The interaction among bio-entities maintains the balance of bio-network and makes the resources utilized reasonably and optimizes architectures of bio-network. The characteristics of biological immune systems can satisfy service evolution, adaptability, and security of future Internet, so it is necessary to study other bio-network computational models to improve the exiting network architecture and to make future network services become much more intelligent and individual. Acknowledgments This work was supported in part by the Key Project of the National Nature Science Foundation of China (No. 60534020), the National Nature Science Foundation of China (No. 60474037 and 60004006), and Program for New Century Excellent Talents in University (No. NCET-04-415). References 1. G. W. Hoffmann, A neural network model based on the analogy with the immune system, J. Theoretical Biology, 122, 33-67(1986). 2. M. Ripeanu, I. Foster, and A. Iamnitchi, Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design, IEEE Internet Computing Journal. 6(1),(2002). 3. K. Sripanidkulchai, B. Maggs, and H. Zhang, Efficient content location using Interest-based locality in peer-to-peer systems, Twenty-second Annual Joint Conference of the IEEE Computer and Communications Societies, IEEE INFCOM2003, 30 March- 3 April, San Francisco, California, USA(2003). 4. B. Yang, P. Vinograd, H. G. Molina, Evaluating GUESS and non-Forwarding peer-to-peer search, The 24th IEEE Internet Conference on Distributed Computing Systems, Hachioji, Tokyo, Japan, 24-26, March 2004 (2004). 5. Y. S. Ding and L. H.Ren, Design of a bio-network architecture based on immune emergent computation, Control and Decision (in Chinese). 18(2), 185-189(2003). 6. L. Gao, Y. S. Ding, and L. H. Ren , A Novel Ecological Network-Based Computation Platform as Grid Middleware System, Int. J. Intelligent Systems. 19(10), 859-884 (2004).
M A C H I N E L E A R N I N G A N D S O F T - C O M P U T I N G IN BIOINFORMATICS - A SHORT J O U R N E Y F.-M. SCHLEIF a AND T. VILLMANN University of Leipzig, Department of Mathematics and Computer Science, Institute of Computer Science, Leipzig, Germany T. ELSSNER AND J. DECKER AND M. KOSTRZEWA Bruker Daltonik GmbH, Fahrenheitstrafie 4, D-28359 Bremen,
Germany
Bioinformatics is a promising and innovative research field which gains hope to develop new approaches for e.g. medical diagnostics. Appropriate methods for pre processing as well as high level analysis are needed. In this overview we highlight some aspects in the light of machine learning and neural networks. Thereby, we indicate crucial problems like the curse of dimensionality and give possibilities for overcoming. In an examplary application we shortly demonstrate a possible scenario in the field of mass spectrometry analysis in cancer detection. Despite of a high number of techniques dedicated to bioinformatic research as well as many successful applications, we are in the beginning of a process to massively integrate the aspects and experiences in the different core subjects such as biology, medicine, computer science, chemistry, physics, and mathematics.
1. Introduction Bioinformatics is a challenging field in research on the border between computer science and biology. It brings together scientists from mathematics, physics, chemistry, medicine and biology. The applications of bioinformatic research ranging from food industry, medical analytic, drug development in pharmacy to agriculture. From computer science point of view a broad variety of methods contribute to these applications, including artificial neural networks, statistical and Bayes methods, tree-based systems, image processing, statistical pattern recognition, visualization models to name just a few 1 " 8 . In the following we will shortly outline basic directions of methods in softcomputing for bioinformatics and link the contributions from this special session therein. a
Corresponding author: Frank-Michael Schleif: Bruker Daltonik GmbH, Permoserstr. 15, D-04318 Leipzig, Germany, Tel/(Fax): +49 341 24 31-408(404), [email protected]
541
542 2. Data analysis - clustering and classification Data analysis in the field of bioinformatics takes place after maybe advanced data processing 9 . Main issues of data mining and data analysis are knowledge extraction, modelling of biological processes, generating of classification and decision systems featuring the biological knowledge, which can be used for classification, regression and prediction 10 . Thereby, the data frequently are noisy, high-dimensional/complex and, in particular in medicine, sparse. These aspects cause several difficulties. Especially, many of the traditional statistical methods can not be applied in medical or bioinformatic applications due to these restrictions u . Methods of soft-computing offer alternative ways to handle these difficulties. Clustering as an unsupervised paradigm plays a central role. It can have different purposes depending on the context: data compression, identification of typical patterns, efficient data description. Standard method is the commonly used agglomerative hierarchical clustering, which can be realized in several ways. A more robust class of methods are prototype based vector quantizer as k-means or its fuzzy counterpart fuzzy-k-means 12 , 13 . A robust type of vector quantizer is the family of neural maps 14 . Thereby, the self-organizing map (SOM, 15 ) and the neural gas algorithm (NG, 16 ) are prominent examples, which are successfully used in bioinformatic clustering tasks l r . These unsupervised algorithms allow information optimum compression and clustering of data together with robust noise tolerant behavior 18
In close relation to clustering and compression is complexity reduction of data. Standard methods are principal component analysis (PCA), Fourierand Wavelet analysis for feature extraction 5 or feature extraction by optimization of mutual information 19 or other objective functions. The growing variant of the above mentioned SOM can be taken as non-linear PCA 20 . Another paradigm is classification 21 . In machine learning context, it belongs to supervised learning methods. After training the classification model by pre-classified examples, the model should decide for unknown data the (probably) respective class. Classical statistics uses Bayes inference, which usually require knowledge or assumptions about the mathematical type of distribution of the data 22 . Alternatively, decision trees are a commonly applied method in biological or clinical classification systems 23 . Again, several objectives are possible. Usually some kind of information measures is used (Gini-index or Renyi-index) 24 . In bioinformatics, decision trees are applied for phylogenetic trees, for example 25 . This can be combined with neural networks approaches 26 . A popular prototype based classification method is the family of kernel methods, and, in particular, the support vector machines (SVMs) 27 . Based on the usage of separation properties of huge-dimensional spaces, classification is obtained by mapping of data into such high-dimensional
543 spaces and subsequent separation margin maximization. SVMs are successfully applied to complex data in bioinformatics as gene expression data in micro-array analysis 2 8 , 3 . A combination of SVMs together with decision trees for classification of plant spectra is provided in 29 . Learning vector quantization as another prototype based classification method allows an intuitive interpretation of the class dependent prototypebased classification decision as well as of the adaptation process (learning), which tries to approximate the Bayes borders of classification 14 . Several methods have been established to improve the standard algorithm LVQ2.1 concerning aspects like overlapping classes (Generalized LVQ - GLVQ) or insensitivity to initial conditions by neighborhood cooperativeness using the framework of NG (Supervised NG) 3 0 , 3 1 . Recent approaches also include probabilistic (fuzzy) classification extensions 32 , which have been successfully applied in proteomics and image segmentation of barley grains 3 3 , 3 4 . A challenging issue is the use of non-standard metric for classification 3 5 . Instead of the widely used Euclidean metric, other, more task specific metric should be applied. Examples are the Tanimoto-distance 2 1 widely applied in taxonomy, LIK-kernels or correlation measures in splice site recognition and microarray-analysis 3 6 , 3 7 . Further, parametrized distance measure can be applied. They allow an optimal adaptation to the given classification or clustering tasks. Adaptive metric were applied to cluster gene expression data 38 and cancer spectral data 39 . 3. D a t a visualization Data visualization of high-dimensional data may offer new insights in data structures and, therefore, is of increasing interest for biological experts, too. Standard tools for distance preserving mapping are curvilinear mapping or multidimensional scaling (MDS) 4 0 . Other methods like (linear) principal component analysis (PCA) obtain low-dimensional models by dimensionality reduction, which easily can be visualized 21 . A non-linear PCA can be realized by the growing self-organizing map (GSOM) 20 . It was demonstrated that this method is suitable to visualize the internal data structure of large-dimensional data by faithful data representation covering both statistical and structural (topological) properties, due to the fact, that most high-dimensional data have a low-dimensional internal representation caused by high inner correlations 41 . Applications in bioinformatics comprise visualization of micro-array analysis, gene expression data 42 , and proteomics 33 . Another type of dimensionality reduction scheme is based on the search for independent components based on blind source separation 43 . Thereby, usually a linear mixing of the unknown sources is assumed. The method determines the inverse mixing matrix such that the original low-dimensional sources can be reconstructed and thereafter may be visualized.
Another complicated task is the visualization of decision processes or of structured non-metric data. The former can be taken as the visualization of decision trees which, hence, leads to the more general problem of visualizing trees. This also includes the above-mentioned phylogenetic trees and, therefore, the visualization of phylogenetic structures and dependencies 44,45.
4. Exemplary clinical proteomics application in the light of bioinformatics
We present in the following an exemplary application: the classification of mass-spectrometry (MS) data for cancer prediction. It comprises several of the above addressed problems, which are typical in bioinformatics: 1.) The data are high-dimensional. 2.) Only a small sample set is available. 3.) The data need to be preprocessed carefully. 4.) A visualization is highly recommended to detect internal relations and to extract them for biomarker search. 5.) The ability to generalize is demanded for prediction. The exemplary data set is the LEUK data set generated by 46. It was obtained by matrix-assisted laser desorption ionization time-of-flight (MALDI-TOF) mass spectrometry analysis of blood plasma of patients suffering from acute lymphatic leukemia. Additionally, a group of control volunteers was under consideration. A mass range between 1 and 10 kDa was used. The spectra were first processed using the standardized workflow as given in 47. The particular sample preparation is described in 48. After preprocessing, the spectra are obtained as 145-dimensional vectors, i.e. the feature vectors to be classified are high-dimensional. The data set consists of 30 cancer samples and 30 control samples, reflecting the problem of small sample sets, see Fig. 1. The data were mapped onto their first two non-linear principal components spanned by a two-dimensional SOM 14. A re-mapping of the SOM lattice into the data subspace spanned by the first two linear principal components is depicted in Fig. 2. A more appropriate probabilistic model of the SOM is obtained by incorporation of fuzzy classification learning (FL-SOM, 33), such that the responsibilities of the lattice nodes give probability values for cancer prediction, Fig. 3.
Acknowledgement
The processing of the proteomic data was supported by the Bruker Daltonik GmbH using the CLINPROT(TM) system and Sächsische Aufbaubank grant 7495/1187.
References
1. U. Seiffert, B. Hammer, S. Kaski, and Th. Villmann. Neural networks and machine learning in bioinformatics - theory and applications. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2006), pages 521-532, Brussels, Belgium, 2006. d-side publications.
Figure 1. Mass spectrum samples from LEUK. The above spectrum is a sample from the cancer class, the spectrum below is taken from the healthy control group.
2. M. Amos. Theoretical and Experimental DNA Computation. Natural Computing. Springer, Berlin, 2005.
3. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene selection for cancer classification using support vector machines. Mach. Learn., 46:389-422, 2002.
4. L. Li, H. Tang, Z. Wu, J. Gong, M. Gruidl, J. Zou, M. Tockman, and R.A. Clark. Data mining techniques for cancer detection using serum proteomic profiling. Artificial Intelligence in Medicine, 32:71-83, 2004.
5. Pietro Lio. Wavelets in bioinformatics and computational biology: state of art and perspectives. Bioinformatics, 19(1):2-9, 2003.
6. M. Kussmann, M. Affolter, and L.B. Fay. Proteomics in nutrition and health. Comb. Chem. High Throughput Screen., 8(8):679-696, December 2005.
7. Jeffrey S. Morris, Philip J. Brown, Richard C. Herrick, Keith A. Baggerly, and Kevin R. Coombes. Bayesian analysis of mass spectrometry proteomics data using wavelet based functional mixed models. UT MD Anderson Cancer Center Department of Biostatistics and Applied Mathematics Working Paper Series. Bepress, 2006. http://www.bepress.com/mdandersonbiostat/paper22.
8. N. Benoudjit, D. Francois, M. Meurens, and M. Verleysen. Spectrophotometric variable selection by mutual information. Chemometrics and Intelligent Laboratory Systems, 74:243-251, 2004.
9. M. Strickert, T. Czauderna, S. Peterek, A. Matros, H.-P. Mock, and U. Seiffert. Full-Length HPLC signal clustering and biomarker identification in tomato plants. In Proc. of FLINS 2006, 2006.
Figure 2. Re-mapping of the 3 x 2 SOM lattice into the data subspace spanned by the first two linear principal components of the data set. Different colors and shapes refer to different responsibilities according to volunteers and patients.
Figure 3. Probabilistic fuzzy labels for the 3 x 2 (FL-)SOM. The prototypes represent the responsibilities for the healthy and the patient class, respectively. The first bar visualizes the probability for cancer whereas the second one refers to healthy group membership.
10. W. Timm, S. Boecker, T. Twellmann, and T. W. Nattkemper. Peak intensity prediction for PMF mass spectra using support vector regression. In Proc. of FLINS 2006, 2006.
11. T. Villmann, G. Blaser, A. Korner, and C. Albani. Relevanzlernen und statistische Diskriminanzverfahren zur ICD-10 Klassifizierung von SCL90-Patienten-Profilen bei Therapiebeginn. In G. Plottner, editor, Aktuelle Entwicklungen in der Psychotherapieforschung, pages 99-118. Leipziger Universitatsverlag, Leipzig, Germany, 2004.
12. Y. Linde, A. Buzo, and R.M. Gray. An algorithm for vector quantizer design. IEEE Transactions on Communications, 28:84-95, 1980.
13. J.C. Bezdek. Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum, New York, 1981.
14. T. Kohonen. Self-Organizing Maps, volume 30 of Springer Series in Information Sciences. Springer, Berlin, Heidelberg, 1995. (2nd Ext. Ed. 1997).
15. Helge Ritter, Thomas Martinetz, and Klaus Schulten. Neural Computation and Self-Organizing Maps: An Introduction. Addison-Wesley, Reading, MA, 1992.
16. Thomas M. Martinetz, Stanislav G. Berkovich, and Klaus J. Schulten. 'Neural-gas' network for vector quantization and its application to time-series prediction. IEEE Trans. on Neural Networks, 4(4):558-569, 1993.
17. Udo Seiffert, Lakhmi C. Jain, and Patrick Schweizer. Bioinformatics using Computational Intelligence Paradigms. Springer-Verlag, 2004.
18. T. Villmann and J.-C. Claussen. Magnification control in self-organizing maps and neural gas. Neural Computation, 18(2):446-469, February 2006.
19. C. Krier, D. Francois, V. Wertz, and M. Verleysen. Feature scoring by mutual information for classification of mass spectra. In Proc. of FLINS 2006, 2006.
20. Th. Villmann and H.-U. Bauer. Applications of the growing self-organizing map. Neurocomputing, 21(1-3):91-100, 1998.
21. R.O. Duda and P.E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, 1973.
22. Marina Vannucci, Naijun Sha, and Philip J. Brown. Nir and mass spectra classification: Bayesian methods for wavelet-based feature selection. Chemometrics and Int. Lab. Systems, 77:139-148, 2005.
23. M.K. Markey, G.D. Tourassi, and C.E. Floyd Jr. Decision tree classification of proteins identified by mass spectrometry of blood serum samples from people with and without lung cancer. Proteomics, 3:1678-1679, 2003.
24. J.R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
25. J. Dopazo and J.M. Carazo. Phylogenetic reconstruction using an unsupervised growing neural network that adopts the topology of a phylogenetic tree. Journal of Molecular Evolution, 44(2):226-233, 1997.
26. E.V. Samsonova, T. Back, M.W. Beukers, A.P. Ijzerman, and J. Kok. Combining and comparing cluster methods in a receptor database. In Proceedings of the 5th International Conference on Intelligent Data Analysis (IDA), volume 2810 of Lecture Notes in Computer Science. Springer, 2003.
27. Vladimir N. Vapnik. The nature of statistical learning theory. Springer New York, Inc., New York, NY, USA, 1995.
28. M.P.S. Brown, W.N. Grundy, D. Lin, N. Christianini, C.W. Sugnet, T.S. Furey, M. Ares Jr., and D. Haussler. Knowledge-based analysis of microarray gene expression data using support vector machines. PNAS, 97(1):262-267, 2000.
29. P.M. Granitto, F. Biasioli, C. Furlanello, and F. Gasperi. Rf-rfe on ptr-ms fingerprinting of agroindustrial products. In Proc. of FLINS 2006, 2006.
30. A. Sato and K. Yamada. Generalized learning vector quantization. In D. S. Touretzky, M. C. Mozer, and M. E. Hasselmo, editors, Advances in Neural Information Processing Systems 8. Proceedings of the 1995 Conference, pages 423-9. MIT Press, Cambridge, MA, USA, 1996.
31. B. Hammer, M. Strickert, and Th. Villmann. Supervised neural gas with
general similarity measure. Neural Processing Letters, 21(1):21-44, 2005.
32. T. Villmann, B. Hammer, F.-M. Schleif, and T. Geweniger. Fuzzy labeled neural gas for fuzzy classification. In M. Cottrell, editor, Proc. of Workshop on Self-Organizing Maps (WSOM) 2005, pages 283-290, 2005.
33. F.-M. Schleif, T. Elssner, M. Kostrzewa, T. Villmann, and B. Hammer. Analysis and visualization of proteomic data by fuzzy labeled self-organizing maps. In Proc. of CBMS, in press. IEEE Computer Society Press, Los Alamitos, 2006.
34. C. Brüß, F. Bollenbeck, F.-M. Schleif, W. Weschke, Th. Villmann, and U. Seiffert. Fuzzy image segmentation with fuzzy labeled neural gas. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2006), pages 563-568, Brussels, Belgium, 2006. d-side publications.
35. B. Hammer and Th. Villmann. Classification using non-standard metrics. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2005), pages 303-316, Brussels, Belgium, 2005. d-side publications.
36. Barbara Hammer, Marc Strickert, and Thomas Villmann. Prototype based recognition of splice sites. In U. Seiffert, L.A. Jain, and P. Schweitzer, editors, Bioinformatics using Computational Intelligence Paradigms, pages 25-56. Springer-Verlag, 2005.
37. M. Strickert, U. Seiffert, N. Sreenivasulu, W. Weschke, T. Villmann, and B. Hammer. Generalized relevance LVQ (GRLVQ) with correlation measures for gene expression analysis. Neurocomputing, 69(6-7):651-659, March 2006. ISSN: 0925-2312.
38. Samuel Kaski. SOM-based exploratory analysis of gene expression data. In Nigel Allinson, Hujun Yin, Lesley Allinson, and Jon Slack, editors, Advances in Self-Organizing Maps, pages 124-131. Springer, London, 2001.
39. V. Cheng, C.-H. Li, J.T. Kwok, and C.-K. Li. Dissimilarity learning for nominal data. Pattern Recognition, 37(7):1471-1477, 2004.
40. M. Strickert, N. Sreenivasulu, and U. Seiffert. Sanger-driven MDSLocalize - a comparative study for genomic data. In M. Verleysen, editor, Proc. of European Symposium on Artificial Neural Networks (ESANN'2006), pages 265-270, Brussels, Belgium, 2006. d-side publications.
41. Th. Villmann, E. Merenyi, and B. Hammer. Neural maps in remote sensing image analysis. Neural Networks, 16(3-4):389-403, 2003.
42. S. Kaski, J. Nikkila, M. Oja, J. Venna, P. Toronen, and E. Castren. Trustworthiness and metrics in visualizing similarity of gene expression. BMC Bioinformatics, 4:48, 2003.
43. A. Hyvarinen, J. Karhunen, and E. Oja. Independent Component Analysis. J. Wiley & Sons, 2001.
44. F. Schreiber. Visual comparison of metabolic pathways. Journal of Visual Languages and Computing, 14(4):327-340, 2003.
45. A. Drummond and K. Strimmer. PAL: an object-oriented programming library for molecular evolution and phylogenetics. Bioinformatics Applications Note, 17(7):662-663, 2001.
46. IKP Stuttgart, MHH Hannover, and Bruker Daltonik Leipzig. Internal results on leukaemia, 2004.
47. B.L. Adam et al. Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men. Cancer Research, 62(13):3609-3614, July 2002.
48. E. Schaffeler, U. Zanger, and M. Schwab. Magnetic bead based human plasma profiling discriminate acute lymphatic leukaemia from non-diseased samples. In 52nd ASMS Conf. 2004, page TPV 420, 2004.
FULL-LENGTH HPLC SIGNAL CLUSTERING AND BIOMARKER IDENTIFICATION IN TOMATO PLANTS
M. STRICKERT 1, T. CZAUDERNA 1, S. PETEREK 2, A. MATROS 2, H.-P. MOCK 2, AND U. SEIFFERT 1
1 - Pattern Recognition Group, 2 - Applied Biochemistry Group, Leibniz Institute of Crop Plant Research, Corrensstr. 3, D-06466 Gatersleben, Germany
E-mail: stricker@ipk-gatersleben.de
High resolution HPLC data of a tomato germplasm collection are studied: Analysis of the molecular constituents of tomato peels from 55 experiments is conducted with focus on the visualization of the plant interrelationships, and on biomarker extraction for the identification of new and highly abundant substances at a wavelength of 280nm. 3000-dimensional chromatogram vectors are processed by state-of-the-art and novel methods for baseline correction, data alignment, biomarker retrieval, and data clustering. These processing methods are applied to the tomato data set and the results are presented in a comparative manner, thereby focusing on interesting clusters and retention times of nutritionally valuable tomato lines.
1. Introduction
High-performance liquid chromatography (HPLC) enables the detection of molecular compounds in probe material. Here, substances of tomato peel from 36 accessions from the in-house germplasm collection, providing 55 partly redundant data sets, are studied in order to identify those lines with highly abundant phenolic compounds. The available HPLC multi-channel device records absorption rates in a wavelength range between 200nm and 600nm using a diode array detector with 1.2nm resolution. Specific interest lies in the components displayed at about 280nm for a 50 min sampling at 1Hz. Thus, absorption values for 3000 retention time points are considered, corresponding to the substance composition in the peel of each tomato fruit. The available 55 high-dimensional chromatogram vectors are further analyzed in a processing pipeline of baseline correction, vector alignment, and clustering. Additionally, a new feature selection method is introduced for the identification of distinctive retention times. The extracted time points serve as biomarkers of specific differences within the set of chromatograms. Instead of utilizing the standard software tools accompanying the HPLC device, the analytic steps are realized by alternative state-of-the-art and novel methods here. Thereby, entire chromatograms, not a selection of manually determined peaks, are processed. This maintains all information and it is a particular advantage if the chemical compounds are a priori unknown.
2. Chromatogram data processing pipeline
The goal, identification of possibly interesting tomato lines and relevant retention times, is addressed after raw data export from the delivered chromatogram software, such as Empower or MetaboliteTools. The data processing steps proposed here are summarized in Fig. 1:
0. Raw Data → 1. Baseline Correction (QuantileSpline) → 2. Alignment (DTW, PTW, COW) → 3. Biomarker Extraction (FUSE, optional) → 4. Clustering (PCA, HiT-MDS)
Figure 1. Pipeline for chromatogram data processing.
1. Baseline correction is an essential step to remove low-frequency non-stationarities occurring during the measuring procedure. Here, correction is obtained by a moving window approach. Within fixed time frames the q-quantiles (q < 20%), regarded as noise-induced baseline thresholds, are calculated and interpolated by a spline (QuantileSpline); this background spline is subtracted from the original signal in order to obtain the corrected data. This approach implements a high-pass filter and is, for example, part of the MATLAB Bioinformatics Toolbox.
2. Chromatogram alignment is essential to make the measured signals intercomparable. During the HPLC recording, usually slight delays or accelerations are observed because of specific processes in the probe columns. Three approaches to chromatogram alignment have been studied: (restricted) dynamic time warping (DTW), parametric time warping (PTW), and correlation optimized warping (COW). Dynamic time warping (DTW) 1 yields the optimum alignment in terms of the shortest way, corresponding to the smallest sum of distances, in the component-wise source vs. target distance matrix. Thereby, the source signal becomes stretched by constant replica of adjacent components. This undesired effect of constant induction is attenuated by upper limit constraints, but it cannot be completely avoided. The constant sections are unnatural for subsequent chromatogram peak integrations and, moreover, the stretched signal has to be mapped back to the original time scale. For these reasons, DTW has been excluded from the following considerations. Parametric time warping (PTW) 2 is an alternative alignment method without unnatural constant stretching. It provides very fast source-to-target time scale mappings by fitting the parameters of a quadratic transfer function via an iterative least squares approach. Different from that, correlation optimized warping (COW) 3 computes local timescale adaptations by interval-based correlation maximization. Within a fixed range, all possible shifts are tested around either
discretely sampled or manually selected retention times to find the best match. A crucial alignment choice for both PTW and COW is the alignment target: all chromatograms need a common reference signal by means of which they become intercomparable. Possible choices of such a prototypic reference signal are the average (mean) chromatogram values or the medians at each time point. The alignment quality is very important, because the subsequent steps, biomarker identification and clustering, are based solely upon the preprocessed data.
3. Biomarker extraction is an optional step after alignment for the identification of retention times that are characteristic of the data set. Unsupervised feature selection (FUSE) is proposed which, in an exhaustive manner, iteratively isolates those retention times i that maximally dis-correlate the original chromatogram distance matrix D and the time-reduced distance matrix D_S:
S(k) = S(k-1) ∪ { arg min_{i ∈ (1...T)\S(k-1)} r²(D, D_{S(k-1)∪i}) },   k = 1...T-1
S(k) is the growing set of index pointers to retention times which have been isolated until iteration number k; by definition S(0) := {}, and by construction |S(k)| = k. D_{S(k-1)∪i} is the distance matrix calculated using the chromatogram vectors, thereby skipping the time indices given in the set S(k-1) ∪ i. The total number of retention times is T = 3000 in the present study; in larger applications early stopping can be considered if the remaining squared correlation r² drops below a critical near-zero threshold or when reaching a plateau. FUSE is analogous to a sensitivity analysis of system input response. The reference inter-relationships of the preprocessed source chromatograms are calculated once as distance matrix D and compared with the simplified models reflected by D_S. Thereby, matrix entries in D and D_S are not necessarily Euclidean distances. They may denote any reasonable similarity measure between input chromatograms and their time-reduced counterparts. FUSE is thus more than a mere variance analysis of T independent time slots or a difficult-to-interpret PCA-loading factor analysis. For example, the variance of components might be high just due to noise without contributing fruitfully to the reconstruction of the original data relationships. To summarize, the indices S(k) from FUSE denote in descending manner the relevance of data components for faithful relationship reconstruction. Particularly, these biomarkers represent intuitive components responsible for systematic differences between the source vectors.
4. Visual clustering is the last step of the data processing pipeline to view the inter-relationships of aligned chromatograms and their biomarker representations. This helps to find groups of tomatoes with similar chemical
compounds and it helps to identify outliers with specific substance combinations. Principal component analysis (PCA) is the standard visual clustering method for inspecting data proximities by projecting chromatogram vectors onto the two axes of maximum variance. PCA, however, is a linear approach with the implicit assumption of a Euclidean input space. Moreover, as mentioned before, the axes of maximum variance are not necessarily the axes of maximum interest. The general multi-dimensional scaling (MDS) methodology aims at a distance-preserving reconstruction of the source data in a low-dimensional space. High-throughput multidimensional scaling (HiT-MDS) is a highly optimized realization of MDS, specifically designed for dealing with very high-dimensional data 4. Analogous to the biomarker detection approach, the optimization goal of HiT-MDS is to find positions of points in a low-dimensional, interpretable Euclidean space in such a way that the matrix of their mutual distances is maximally correlated with the original data distance matrix. HiT-MDS iteratively improves the locations of randomly initialized points by update rules derived from an efficient dis-correlation stress minimization criterion.
3. Results
Data screening of the 280nm-chromatograms is obtained by pseudo-gelplots of the raw data. The top panel of Fig. 2 displays in their recording order the 55 unprocessed experiments. Three groups are visually identified, experiments 1-19, 20-41, and 42-55. The first group 1-19, separated by the solid horizontal line, is one entire set of continuous HPLC recordings. Experiments 20-55, also continuously taken, were conducted one month later for testing and validation purposes. Runs 20-41 exhibit systematic recording artefacts between 250s and 500s, but a recalibration phase in the late stage of record 41 leads to a recovery of the measuring process for the subsequent runs, starting with the dashed horizontal line.
1. Baseline correction is obtained with the QuantileSpline method. The time frame size is fixed to 50s, providing a number of 3000/50 = 60 base points for the interpolating spline. A quantile value of q = 15% is chosen. The setting of the quantile value and the window width is determined by visual validation of the background curve and the source data under the constraints of average peak width and avoidance of too many negative baseline-corrected target points. A data example with its corresponding baseline is given in Fig. 3 for the focus of interest between 500s and 2750s: the baseline smoothly follows the original signal without interfering with the time scale of the retention dynamic. Thus, the summation of peak areas of baseline-corrected chromatograms provides more comparable standards.
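As an illustration of the QuantileSpline idea, the following minimal sketch (an assumption-laden re-implementation, not the authors' code) computes the 15% quantile in fixed 50 s windows, interpolates the resulting base points with a cubic spline, and subtracts this background from the chromatogram.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def quantile_spline_baseline(signal, window=50, q=0.15):
    """Estimate a smooth baseline from windowed quantiles and return the
    baseline-corrected signal together with the baseline itself."""
    t = np.arange(len(signal))
    centers, base_points = [], []
    for start in range(0, len(signal), window):
        chunk = signal[start:start + window]
        centers.append(start + len(chunk) / 2.0)      # window midpoint
        base_points.append(np.quantile(chunk, q))     # noise-induced baseline threshold
    baseline = CubicSpline(centers, base_points)(t)
    return signal - baseline, baseline

# usage on a synthetic chromatogram of 3000 points sampled at 1 Hz
chromatogram = np.abs(np.random.randn(3000)) + np.linspace(0, 2, 3000)
corrected, baseline = quantile_spline_baseline(chromatogram, window=50, q=0.15)
```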
Figure 2. Pseudo-gelplots of tomato chromatograms at 280nm. Dark bands denote high substance concentrations. Top panel: raw data; as indicated by horizontal lines, the 55 experiments fall into three temporal categories, records 1-19, 20-41, and 42-55. Two bottom panels: plots for aligned chromatograms from PTW and COW; they are already baseline corrected.
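Before turning to the alignment results, a minimal sketch of the parametric time warping step described in Section 2 is given below. It is an illustrative least-squares fit of a quadratic time transfer function against an assumed mean reference chromatogram, not the implementation used in the study.

```python
import numpy as np
from scipy.optimize import least_squares

def ptw_align(signal, reference):
    """Warp `signal` onto `reference` with a quadratic time transfer
    w(t) = a0 + a1*t + a2*t**2, fitted by least squares."""
    t = np.arange(len(signal), dtype=float)

    def warped(params):
        a0, a1, a2 = params
        w = a0 + a1 * t + a2 * t ** 2
        # resample the source signal at the warped time points
        return np.interp(w, t, signal, left=0.0, right=0.0)

    fit = least_squares(lambda p: warped(p) - reference, x0=[0.0, 1.0, 0.0])
    return warped(fit.x), fit.x

# usage: align every chromatogram to the mean reference signal
# chromatograms: array of shape (n_runs, 3000)
# reference = chromatograms.mean(axis=0)
# aligned = np.array([ptw_align(c, reference)[0] for c in chromatograms])
```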
2. Chromatogram alignment is studied for PTW and COW. When all 55 experiments are processed at once, PTW produces alignments of substantially worse quality than COW. The reason for this is the three temporal groups of chromatograms identified visually: they constitute a data set too heterogeneous for the quadratic warping model. Once the three groups are independently processed, much better alignments are obtained for PTW and COW. Further considerations focus on the first group of runs 1-19, because they are supposed to contain the most trustworthy data. Alignments are shown in the two bottom panels of Fig. 2 for PTW and COW. As can be seen, the bands of the aligned chromatograms are significantly straightened in comparison to the raw data in the upper panel above the
Figure 3. Chromatogram baseline determination as spline interpolation of 15%-quantile of 50s-time-windows.
solid line. For these results, biological knowledge has been used in COW to roughly specify 39 intervals of retention times with potentially interesting substances - for these manually set focuses the correlation is maximized, whereby the search is restricted to empirically determined maximum shifts of 20s. Conveniently, PTW does not require further parameter choices. However, for both PTW and COW the alignment reference must be given. Although the differences are small, the mean chromatogram generally performs better than the median signal. This has been confirmed for all three groups. Comparing the alignment quality of PTW and COW it turns out that, for the favored mean-alignment, PTW shows higher squared deviations from the reference and lower correlation values than COW in all three cases. To conclude, COW takes more calculation time, but it yields the best overall alignment quality. Apart from one obvious COW-misalignment for experiment 17 at 1600s, this result is visually confirmed by close inspection of the two bottom panels in Fig. 2.
Figure 4. Biomarker extraction. Dots are feature ranks 1-3000 scaled to [0;1]. The upper line connects the top 20% most discriminative retention times. Vertical lines refer to the equivalent view on the corresponding correlation loss of r² ∈ [0;1] during feature exclusion.
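A compact sketch of the FUSE selection loop defined in Section 2 is shown below. It is a straightforward, unoptimized re-implementation under the Pearson-distance assumption used in this study (D_ij = 1 - r(x_i, x_j)), with early stopping after a fixed number of isolated retention times; variable names are illustrative.

```python
import numpy as np

def pearson_distance_matrix(X):
    # D_ij = 1 - r(x_i, x_j) between the rows (chromatograms) of X
    return 1.0 - np.corrcoef(X)

def squared_corr(D1, D2):
    # squared Pearson correlation between the upper triangles of two distance matrices
    iu = np.triu_indices_from(D1, k=1)
    return np.corrcoef(D1[iu], D2[iu])[0, 1] ** 2

def fuse(X, n_select=20):
    """Return the n_select most relevant retention-time indices, most relevant first."""
    n_runs, T = X.shape
    D = pearson_distance_matrix(X)
    skipped = []                                   # S(k): indices isolated so far
    for _ in range(n_select):                      # early stopping instead of k = 1..T-1
        remaining = [i for i in range(T) if i not in skipped]
        losses = []
        for i in remaining:
            keep = [j for j in range(T) if j not in skipped and j != i]
            losses.append(squared_corr(D, pearson_distance_matrix(X[:, keep])))
        # isolate the index whose removal destroys the correlation the most
        skipped.append(remaining[int(np.argmin(losses))])
    return skipped

# usage on the aligned chromatogram matrix (rows: runs 1-19, columns: 3000 retention times)
# biomarkers = fuse(aligned_chromatograms, n_select=600)
```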
3. Biomarker extraction is an optional step after alignment. The goal is to reduce the overall data complexity to only relevant retention time intervals. By focusing on the most discriminative features of the data set according to the FUSE protocol, the noise influence of unimportant components is discarded. For the chromatograms x_1,...,x_19 of runs 1-19, FUSE has been calculated for the Pearson similarity with matrix entries D_ij = 1 - r(x_i, x_j), and analogously for the time-dropped chromatogram vectors. This yields Fig. 4, zooming on an interesting time interval. Vertical lines correspond to the correlation loss r²(D, D_{S(k-1)∪i}) during successive time point dropping. The dashed horizontal line separates the 600, i.e. 20%, top ranked, most discriminative retention times from the less important features (dots). As expected, these feature ranks and the correlation loss are highly correlated. The top-rated retention times are of high biochemical interest for the characterization of the tomato plant compounds.
4. Visual clustering is the last step of the data processing pipeline to inspect relationships between the tomato lines. Since peak areas are proportional to compound concentrations, integral differences, i.e. Manhattan L1-distances, of chromatograms provide particularly meaningful comparisons. These distances of peak-aligned data are used for 2D-embeddings by HiT-MDS of the 3000-dimensional chromatograms and of the 600-dimensional biomarkers obtained by FUSE. Figure 5 displays the visualization results. Within each plot, two visual clusters can be identified, as well as the clear outliers 6, 16, 18, and 19. The elliptic cluster corresponds to tomato lines with specifically high compound concentrations at 780s and 960s, the circular cluster comprises tomatoes with more subtle differences, but with a tendency to higher rutin concentrations. In a chromatogram overlay, outlier 16 matches the circular group pretty well, except for an exceptionally high rutin level, which is one of the desired compounds in the study. Runs 6, 18, and 19 have a more complex structure and show up as outliers. The high similarity of both panels in Fig. 5 must be pointed out: although the biomarker embedding uses only 20% of the original chromatogram length, the results are essentially the same. Since FUSE and HiT-MDS are founded on the same principles, i.e. the consideration of correlation between the original data and the dimension-reduced models, such a result is expected.
4. Conclusions and future directions
HPLC signals of probes from tomato peels have been successfully analyzed by means of a data processing pipeline with the stages QuantileSpline baseline correction, chromatogram alignment to the chromatograms' mean signal, optional FUSE biomarker identification, and HiT-MDS-based
Figure 5. Chromatogram and biomarker embedding by HiT-MDS and L1-distance. Left panel: HiT-MDS full chromatogram clustering; right panel: HiT-MDS chromatogram subspace clustering.
chromatogram and biomarker embedding. The obtained results help significantly to focus on experiment outliers and groups of experiments as well as on relevant retention time intervals. The processing pipeline points out an important direction to semi-automatic chromatogram processing, especially interesting for handling massive data sets. Future research will focus on improving the alignment procedure by reducing free parameters in COW and by enabling reiteration, i.e. by realignment of already aligned data to the recomputed mean signal. FUSE biomarker detection shows promising results, but is yet only at its starting point. Subsequent experiments are necessary to fully assess the power of this method. Finally, the visual arrangement after embedding is not only determined by the signal alignment quality, but also essentially by the underlying similarity measure for signal comparison. Therefore, meaningful measures must be reconsidered, which is a crucial topic already for most alignment procedures. Taking these issues into account, the presented pipeline can be canonically extended for dealing with mass spectra, which is subject to future investigations.
References
1. V. Pravdova, B. Walczak, and D.L. Massart, "A comparison of two algorithms for warping of analytical signals", Analytica Chimica Acta 456, Issue 1, pp. 77-92, 2002.
2. Paul H.C. Eilers, "Parametric time warping", Analytical Chemistry 76, Issue 2, pp. 404-411, 2004.
3. N.V. Nielsen, J.M. Carstensen, J. Smedsgaard, "Aligning of single and multiple wavelength chromatographic profiles for chemometric data analysis using correlation optimised warping", J. of Chromatography A 805, pp. 17-35, 1998.
4. M. Strickert, S. Teichmann, N. Sreenivasulu, and U. Seiffert, "High-Throughput Multi-Dimensional Scaling (HiT-MDS) for cDNA-Array Expression Data", in W. Duch et al. (Eds.): Artificial Neural Networks: Biological Inspirations, LNCS 3696, pp. 625-634, Springer, 2005.
FEATURE SCORING BY MUTUAL INFORMATION FOR CLASSIFICATION OF MASS SPECTRA
C. KRIER 1, D. FRANCOIS 2, V. WERTZ 2, M. VERLEYSEN 1
1 Universite catholique de Louvain, Machine Learning Group DICE, Place du Levant 3, B-1348 Louvain-la-Neuve, Belgium, {krier, verleysen}@dice.ucl.ac.be
2 CESAME Research Center, Av. G. Lemaitre 4, B-1348 Louvain-la-Neuve, Belgium, {francois, wertz}@inma.ucl.ac.be
Selecting relevant features in mass spectra analysis is important both for classification and for the search for causality. In this paper, it is shown how using mutual information can help to answer both objectives in a model-free, nonlinear way. A combination of ranking and forward selection makes it possible to select several feature groups that may lead to similar classification performances, but that may lead to different results when evaluated from an interpretability perspective.
1. Introduction
Mass spectrometry allows identifying chemicals in a substance by their mass and charge. It produces spectra that plot the quantity of chemicals in the substance as a function of their mass to charge ratio (m/z). Typically, several thousand m/z values are considered. Such spectra are said to be high-dimensional. For illustration purposes, the detection of cancer will be considered. The interesting question for researchers is of course which chemicals are involved in the process and which biomolecules are affected by the disease process. Two objectives are thus sought together: classification performances should be high, and the method should identify which chemicals are affected. Focusing only on features (m/z) that allow building an efficient classification model is not sufficient; indeed, several sets of features could lead to similar classification performances, while one set could be of much greater interest for causality interpretability than the other ones. Another way of identifying relevant features is to examine the statistical dependency between each of them (taken individually) and the class label. While the statistical dependency concept does not make any assumption on the model that is further used for classification, it will discard features that are only relevant in a group, and not individually.
In this paper, we suggest to overcome these limitations by using the mutual information measure between features and the class label. Mutual information is a nonparametric, model-free method for scoring a set of features. It can be used to spot all features relevant to the classification, and to identify groups of features that allow building a valid classification model. It is applied to the detection of ovarian cancer through spectra of human serum. The process allows identifying feature sets that can be later assessed from a clinical perspective. The paper is organized as follows: Section 2 reviews the existing literature, Section 3 introduces the concept of the mutual information, Section 4 proposes some experiments with the Ovarian Cancer dataset and Section 5 concludes.
2. Previous work
Several mass spectrometry classification algorithms have been proposed in the literature [1, 2, 3]; yet only a few studies focus on feature selection. In a comparative study, Liu [4] considers the Chi-squared test and the t-test, making the assumption that the class populations are normally distributed. He furthermore uses an entropy-based method, considering the mutual information between pairs of features, and between each feature and the class label. Those methods will however eliminate features that are relevant only in conjunction with each other. Petricoin uses a genetic algorithm [5] prior to probabilistic classification. This allows finding a (sub-)optimal feature subset, but fails at scoring each feature individually. The method may thus find a set of features adequate for classification, but not all sets that could be of interest. Furthermore, the procedure is model-dependent and prone to convergence issues. Lilien [6] proposes the use of Linear Discriminant Analysis and Back Projection to score the features. The LDA results in a discriminant vector that is then normalized according to the initial feature variances. The score associated to each feature is the corresponding element in the normalized discriminant vector. The classification model is thus constructed on all features; therefore, a lot of poorly scored (by the LDA criterion) features may have the same weight in the classification process as a single high-scoring feature. Here again, two features that are relevant only when paired will not be identified as such. In the following section, we will see that the mutual information allows scoring groups of features, independently from a subsequent classification model, and without making any assumptions about the class sample distributions.
3. The mutual information
The mutual information (MI) between two random variables or random vectors measures the "amount of information", i.e. the "loss of uncertainty", that one can bring to the knowledge of the other, and vice versa.
3.1. Definition of the mutual information
The concept of uncertainty of a random variable is expressed by its entropy [7]. Although the notion of entropy has first been developed for discrete variables, it can be extended to continuous variables rather easily. The entropy H(Y) of a random variable Y with probability density function (pdf) μ_Y is defined by
H(Y) = -∫ μ_Y(y) log μ_Y(y) dy.   (1)
The entropy of a random variable or vector Y when the value of some other random variable X is known is the conditional entropy:
H(Y|X) = -∫ μ_X(x) ∫ μ_Y(y|X=x) log μ_Y(y|X=x) dy dx.   (2)
The mutual information is the difference between the entropy of a variable and the conditional entropy, I(X, Y) = H(Y) - H(Y|X). The mutual information can be expressed as the Kullback-Leibler divergence between the joint distribution μ_{X,Y} of the variables and the product of the marginal distributions μ_X and μ_Y:
I(X, Y) = ∫∫ μ_{X,Y}(x, y) log [ μ_{X,Y}(x, y) / (μ_X(x) μ_Y(y)) ] dx dy.   (3)
When X and Y are independent, the mutual information is zero; the higher the dependency between the variables X and Y, the higher their mutual information. Contrary to the correlation, the mutual information measures any relationship between variables, and not only linear relations. In the above equations, X and Y can be random vectors instead of random variables. If Y is a binary class label, definition (3) holds. Its extension to the multi-class problem is not obvious though, as an adequate class labeling has to be provided.
3.2. Estimation
Equations (1) to (3) are not applicable as such, as the pdfs are not known in practice. The estimation of the mutual information given finite samples is thus a problem of density estimation. Density estimation can be achieved in several ways [8], for instance with histograms, kernels, B-splines, or Nearest
Neighbours. The latter has the major advantage of being reasonably efficient for the estimation of a multivariate density (when a random vector is involved), while the other ones suffer more dramatically from the 'curse of dimensionality' (the required number of samples needed for the estimation grows exponentially with the dimension of the random vector). There exists an extensive literature on density-based entropy estimators [9, 10]; recently, they have been extended to the mutual information by Kraskov et al. [11]. The latter estimator is used in the experimental part of this paper.
3.3. Using the mutual information for feature scoring/selection
A high mutual information between a feature X and the class label Y thus means that feature X is relevant, regardless of the classification algorithm. However, the mutual information can be used in several ways to select (sets of) features. First, the mutual information scores can be estimated between each feature individually (m/z) and the class label. The highest scores correspond to features that are most relevant in discriminating between the two classes. In contrast, the features with a mutual information near zero are statistically independent from the class label. The drawback of this method is that features that are relevant together but useless individually cannot be accurately spotted. Secondly, the mutual information can be used to search for the optimal feature subset (which may or may not be the subset of optimal features) in a forward manner: the feature with the highest mutual information with the class label is chosen first. Then, pairs of features containing the already selected one and any remaining one are built. The mutual information between each of these pairs and the class label is measured; the second chosen feature is the one contained in the pair with the highest mutual information score. The procedure is then iterated until the adequate number of features has been reached. Although this procedure, which is greedy in the sense that the choice of a feature is never questioned afterwards, can lead to a sub-optimal feature subset, it performs most often efficiently, and definitely better than the previous option. While the second procedure is good at identifying the most relevant subset, it will probably not select all features that could be relevant for the problem, as redundancy between features is avoided. Both procedures have advantages; therefore, in order to identify all features relevant both individually and in conjunction with others, they are merged into a single one, inspired from [12] (see the sketch after the list):
1. N features are selected by individual mutual information.
2. M features are selected by the forward procedure.
3. All 2^(N+M) possible feature subsets are constructed and their mutual information with the class label is estimated.
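The following minimal sketch illustrates the combined procedure. It is not the authors' code: the Kraskov estimator is replaced here by a simpler k-NN (Kozachenko-Leonenko) entropy estimator combined as I(X;Y) = H(X) - Σ_c p(c) H(X|Y=c), the data are random placeholders, and function names are made up; additive constants of the entropy estimate do not affect the ranking of subsets.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import cKDTree
from scipy.special import digamma, gammaln

def knn_entropy(X, k=3):
    """Kozachenko-Leonenko k-NN entropy estimate (in nats) for samples X of shape (n, d)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    eps = cKDTree(X).query(X, k=k + 1)[0][:, k]            # distance to the k-th neighbour
    eps = np.maximum(eps, 1e-12)                           # guard against duplicate points
    log_cd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_cd + d * np.mean(np.log(eps))

def mi_subset(X, y, k=3):
    """I(X; Y) for continuous features X and a discrete class label y."""
    h = knn_entropy(X, k)
    for c in np.unique(y):
        h -= np.mean(y == c) * knn_entropy(X[y == c], k)
    return h

def select_feature_groups(X, y, n_rank=4, m_forward=3, k=3):
    T = X.shape[1]
    # step 1: rank features by individual mutual information with the class label
    individual = np.array([mi_subset(X[:, [j]], y, k) for j in range(T)])
    ranked = list(np.argsort(individual)[::-1])
    # step 2: greedy forward selection of m_forward features
    forward = []
    for _ in range(m_forward):
        candidates = [j for j in range(T) if j not in forward]
        forward.append(max(candidates, key=lambda j: mi_subset(X[:, forward + [j]], y, k)))
    # pool: forward picks plus the n_rank best individually ranked features not already chosen
    pool = forward + [j for j in ranked if j not in forward][:n_rank]
    # step 3: score every non-empty subset of the pool by its MI with the class label
    scored = [(mi_subset(X[:, list(s)], y, k), s)
              for r in range(1, len(pool) + 1) for s in combinations(pool, r)]
    return sorted(scored, reverse=True)

# usage on random placeholder data (200 spectra, 50 m/z features, binary label)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))
y = (X[:, 3] + 0.5 * X[:, 7] + 0.3 * rng.normal(size=200) > 0).astype(int)
best_score, best_subset = select_feature_groups(X, y)[0]
```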
The subset with the highest mutual information can be chosen for classification purposes; however, all other subsets associated to a high value of the mutual information with the class label can be considered as relevant for the problem too. In this way, several subsets of features can be identified, hopefully allowing spotting all features relevant to the problem, either individually or in conjunction with others. The subsets can thus be ranked and further application-dependent investigations performed. The values of N and M should be chosen as high as possible, while keeping the 2^(N+M) MI estimations tractable. Despite the fact that the complexity of the method is proportional to the square of the number of features in the worst case, the average number of computations is linear with the number of features. In practice, the computation of all MI values does not exceed a few tens of minutes on a standard computer if, as an example, N+M is limited to 7. Furthermore, the whole variable selection process can be performed off-line, as it does not need to be repeated to classify a new sample.
4. Example
The method is illustrated on an ovarian cancer dataset from the Clinical Proteomics Program of the U.S. National Cancer Institute [13]. The spectra result from SELDI-TOF experiments. The healthy samples come from women showing risks of cancer from a clinical perspective, while the positive cancer samples come from women with various tumor types and severity (see [5] for details). To get a tractable number of feature subsets to assess, three features were chosen by the forward selection method; then, four other ones were chosen among the highest scored features not already selected. To assess the relevance of the selected features, a linear classification is performed on a test set, as in [6].
4.1. Results
Figure 1 shows the mutual information score for each m/z value. The vertical lines indicate which features were chosen by the forward strategy and not by the ranking procedure (in this case, the m/z ratios 2.7921 and 24.2851). Few features have really high mutual information scores. Note that negative values are obviously the result of the estimates bias (without consequence on the ranking) and variance (that gives an idea of the estimator accuracy). The final set of selected features is given in Table 1, with the corresponding m/z values. Feature 1679 has the highest mutual information with the class label. The features selected by the ranking method are obviously highly correlated; nevertheless, we will see that they are not totally equivalent for classification.
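To assess a candidate subset in the same spirit as the linear classification used here, one could, as a rough sketch with hypothetical variable names and an arbitrary train/test split, train a simple linear model on the selected m/z columns only:

```python
from sklearn.model_selection import train_test_split
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def subset_accuracy(X, y, subset, test_size=0.3, seed=0):
    """Train a linear classifier on the given feature subset and report test accuracy."""
    Xs = X[:, list(subset)]
    X_tr, X_te, y_tr, y_te = train_test_split(Xs, y, test_size=test_size,
                                              random_state=seed, stratify=y)
    clf = LinearDiscriminantAnalysis().fit(X_tr, y_tr)
    return clf.score(X_te, y_te)

# e.g. compare the best-MI subset against a single highly ranked feature
# print(subset_accuracy(X, y, best_subset), subset_accuracy(X, y, (3,)))
```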
Figure 1. Mutual information for each m/z feature.
Table 1. The seven selected features, along with their corresponding m/z values. An O in regard of the name of a method indicates that the feature was selected by the method.
Feature   181      530       1678       1679       1680       1681       1682
m/z       2.7927   24.2851   244.6604   244.9525   245.2447   245.5370   245.8296
Forward   O        O                    O
Ranking                      O          O          O          O          O
The mutual information of each of the 128 possible feature subsets is given in Figure 2, along with the performance of a linear classifier built using that subset. The feature groups are ordered by increasing mutual information. Table 2 presents some of those feature groups. Figure 2 confirms that the classification performances are highly correlated with the mutual information (0.9041).
Figure 2. The mutual information and classification performances for a linear classifier built on every possible subset of the selected features.
4.2. Discussion
Table 2. Some values from Figure 2.
Group ID   Feature group                              Mutual information   % Correct classification
126        181                                         0.3694               74.86
127        530                                        -0.0349               58.00
118        1678                                        0.3694               75.86
113        530; 1678                                   0.5644               90.00
38         181; 530; 1678                              0.6571              100.00
1          181; 1678; 1679; 1680; 1682                 0.7026               98.43
33         181; 530; 1678; 1679; 1680; 1682; 1681      0.6585               98.43
59         1678; 1679; 1680; 1681; 1682                0.6347               95.31
34         181; 530; 1680                              0.6583               98.43
35         1678; 1679                                  0.6581               95.23
From the analysis of Table 2, it appears that:
• The group of features achieving the best classification is not the group of most individually relevant features, nor is it the group identified by the forward procedure. Group 38 is the best group and contains only one of the highest-ranked features. Furthermore, the group discovered by the forward procedure achieves less good classification performances; this is because the choice of the first feature was never questioned. Using a forward or ranking procedure alone does not lead to the optimal feature subset.
• Some individually less relevant features help building more accurate classifiers than if using individually relevant features only. Features 530 (Group 127 - low MI) and 1678 together reach 90% of correct classification (Group 113), while feature 1678 alone (Group 118) classifies only 76% of the samples correctly. It can thus be assumed that feature 530 is involved in the process. Only ranking features may prevent from spotting relevant ones.
• There are groups of different features that achieve very similar results. For example, Groups 34 and 35 share no variable, although their classification performances (95 and 98%) are rather close. Simply relying on the best feature subset according to the mutual information or to some model-based algorithm does not allow recovering all features involved in the process.
• The performances in classification reached by the method are similar to the results obtained by Lilien with LDA and Back Projection [6].
• In this problem, it appears that only features with low m/z ratio are relevant.
5. Conclusion
This paper shows that using the mutual information between features and the class label in mass spectra analysis helps choosing relevant feature sets. The method, based on the combination of feature ranking and forward selection, and using mutual information on sets of features rather than individually, makes it possible to rank feature subsets. Then, an application-driven procedure can be used to
assess the (clinical in this example) relevance of the feature sets, starting from the highest-ranked ones by the proposed procedure.
Acknowledgments
M. Verleysen is Research Director of the Belgian FNRS. C. Krier and D. Francois are funded by a Belgian FRIA grant. Parts of this research result from the Belgian Program on Interuniversity Attraction Poles, initiated by the Belgian Federal Science Policy Office. The scientific responsibility rests with its authors.
References
1. B.L. Adam et al., Serum protein fingerprinting coupled with a pattern-matching algorithm distinguishes prostate cancer from benign prostate hyperplasia and healthy men, Cancer Research, 62, 3609 (2002).
2. G. Ball et al., An integrated approach utilizing artificial neural networks and SELDI mass spectrometry for the classification of human tumours and rapid identification of potential biomarkers. Bioinformatics, 18, 395 (2002).
3. H. Zhou et al., Quantitative proteome analysis by solid-phase isotope tagging and mass spectrometry, Nature Biotechnology, 20, 512 (2002).
4. H. Liu, J. Li and L. Wong, A Comparative Study on Feature Selection and Classification Methods Using Gene Expression Profiles and Proteomic Patterns, Genome Informatics 13, 51 (2002).
5. E. Petricoin III et al., Use of proteomic patterns in serum to identify ovarian cancer, The Lancet, 359, 572 (2002).
6. R. H. Lilien, H. Farin, B. R. Donald, Probabilistic Disease Classification of Expression-Dependent Proteomic Data from Mass Spectrometry of Human Serum, Journal of Computational Biology 10(6), 925 (2003).
7. C.E. Shannon, W. Weaver, The Mathematical Theory of Communication, University of Illinois Press, Urbana, IL, 1949.
8. D.W. Scott, Multivariate Density Estimation: Theory, Practice, and Visualization, Wiley, New York, 1992.
9. R.L. Dobrushin, A simplified method of experimental evaluation of the entropy of stationary sequence, Theory Prob. Appl. 3(4), 462 (1958).
10. O. Vasicek, Test for normality based on sample entropy, J. Royal Statist. Soc. B38, 54 (1976).
11. A. Kraskov, H. Stogbauer, P. Grassberger, Estimating mutual information, Phys. Rev. E69:066138 (2004).
12. F. Rossi, A. Lendasse, D. Francois, V. Wertz, M. Verleysen, Mutual information for the selection of relevant variables in spectrometric nonlinear modeling, Chemometrics & Intelligent Lab. Systems, 80, 215-226 (2006).
13. http://home.ccr.cancer.gov/ncifdaproteomics/ppatterns.asp
PEAK INTENSITY PREDICTION FOR PMF MASS SPECTRA USING SUPPORT VECTOR REGRESSION
W. TIMM 1,2,3, S. BÖCKER 2, T. TWELLMANN 3, T. W. NATTKEMPER 3
1. International NRW Graduate School in Bioinformatics and Genome Research
2. Junior Research Group Informatics for Mass Spectrometry, Genome Informatics Group, Faculty of Technology
3. Applied Neuroinformatics Group, Faculty of Technology
Bielefeld University, Postfach 100131, 33501 Bielefeld, Germany
E-mail: wtimm
1. Introduction
Mass spectrometry has become the method of choice to analyze the proteome of a cell. One widely-used approach is based on separating proteins via two-dimensional electrophoresis, then digesting each protein using an endopeptidase such as trypsin, and finally analyzing the peptide mixture by MALDI-TOF mass spectrometry. Proteins are identified by comparison of the resulting protein mass fingerprints (PMFs) with those in a database of known proteins. With the increasing amount of data produced in this area, automated approaches for reliable protein identification are highly desirable. Shadforth et al. 1 give an overview of currently available techniques. The most established programs for this purpose are ProFound 2
and MASCOT 3. These query tools require human interaction to evaluate the identification results, which significantly slows down the process of analyzing whole proteomes. In PMF analysis, cleaved peptides stay intact during MS analysis and are only detected if they are ionized. During such an experiment, not only peak masses, but also their intensities (peak height, area under curve) can be assessed. However, current analysis tools for PMF data usually only take a list of peak masses as input. If peptides are ionized all alike and the digestion procedure works perfectly, peaks of equal intensity can be found at all peptide masses, taking into account multiplicities of peptides in a protein. In reality, this is not the case: due to the variety of ionization processes and the unequal specificity of trypsin for different amino acids, peak intensities can vary significantly. Models have been proposed for these processes 4,5, but currently these models do not allow to precisely predict peak intensities. Elias et al. 6 show that the use of peak intensities enhances the reliability of protein identification for tandem mass spectra. It is reasonable to assume the same to be true for PMF spectra. As noted above, software for PMF interpretation usually ignores intensity information. Modeling of intensities in tandem mass spectra has been dealt with by multiple authors 7,8,9,10. Such models mainly consider the fragmentation probabilities of molecules. For MALDI PMF spectra, where molecules stay intact in the analysis procedure and the intensities highly depend on the ionization probability of the molecules, Gay et al. 11 applied a number of different regression and classification methods. They found the M5' decision tree algorithm to perform best among the regression methods tested and derived a few rules regarding the influence of the occurrence frequencies of some amino acids on the peak intensities.
2. Goal
The goal of this work is to examine approaches for predicting the intensities of peaks from measured protein MALDI-TOF mass spectra using a numerical representation of the peptide sequences of theoretically cleaved proteins. In a first step towards a solution of this problem, our question is whether this string representation enables us to find a correlation between the peptide sequence and its peak intensity. In the long run, the prediction of intensities should enhance the reliability of protein identification results by calculating more realistic theoretical spectra.
For the experiments as presented in this work, we assumed that the peak intensities are reproducible under the same experimental conditions. This assumption is supported by the work of Coombes et al. 12, who found peak intensities of multiple experiments under the same conditions with proteins from the same sample to be highly correlated. Another assumption we make is that the intensities only depend on the corresponding peptide's sequence. This means that interactions between peptides inside the device are neglected. Also, the effect of incomplete cleavage is not considered.
3. Materials and methods
3.1. Data
The experimental spectra were taken from biological studies on Corynebacterium glutamicum which involved 2D-PAGE experiments with subsequent tryptic digestion and measurement in a Bruker Ultraflex device. The whole data set consists of 369 raw spectra, of which 20% (66), belonging to 43 different proteins, were selected. These were identified by MASCOT 3 with the highest distance to the score of the second-best match. The minimal score of the selected spectra is 66 and the minimal distance 37. In the further process the MASCOT identification is taken to be true. Based on this identification, theoretical tryptic digestion was performed to determine the theoretical peptides. The masses of these peptides lead to theoretical peak lists that are then used to retrieve the corresponding peaks from the preprocessed raw spectra. The preprocessing involves denoising, baseline correction, peak detection, filtering of peaks, and deconvolution (de-isotoping). If a peak is found within 1.2 Da of the mass of a theoretical peptide, it is declared a match (in MALDI experiments, usually only singly charged ions are observed; therefore, when "Da" or "mass" is written in this paper, "Da/z" or "mass per charge" is meant and would be more accurate, but the difference is not important for our application). For each match, the amino acid sequence and the intensity before and after the deconvolution step are taken. The extracted peaks form an initial set. All sequences from this set are embedded into a vectorial feature space to allow for processing by a machine learning algorithm. The transformation and algorithms used are described in the Techniques section (3.2) below. Three data sets were compiled from the initial set. The set T_{>800} includes all peaks with a mass above 800 Da, a match accuracy (distance between theoretical and real peak's mass) of at most 1.0 Da, and the intensities from the deconvoluted spectra as target values. The set T_{0+} consists
of the same peaks as the set T_{>800}, but additionally includes peaks with masses below 800 Da because of the low signal-to-noise ratio in that range. The T_{nd} set consists of the same peaks as the set T_{>800}, but the intensities before deconvolution were used as target values instead. All peak intensities are scaled according to I_s = I_orig / ((1/N) Σ_i (I_i - B_i)/N_i). Here, I_s is the scaled intensity, I_orig the unscaled one, N the number of data points in the raw spectrum, I_i the raw intensity value at index i, B_i the baseline value, and N_i the noise determined in the denoising step. In the remainder, intensities refers to the scaled intensities. Most peptide sequences occur in the data set multiple times but with different intensities. Therefore, the target values for the regression are calculated as the α-trimmed mean (i.e. the mean of the central 50% of an ordered list) of all intensities per distinct sequence. The skewness of the distribution of intensities suggests normalizing them. The results are training sets with different target values:
T_raw: Target values are those of the original data set.
T_ln: The natural logarithm of the intensities is calculated before the α-trimmed mean is applied to these values.
T_rank: All peaks from T_ln are sorted by target value, then bins are created so each bin holds twenty consecutive peaks. The bin index is assigned as the target value for the corresponding peak.
T_bins: Intensities are sorted into five bins, such that the lowest bin takes all peaks from T_ln with target values < 2, the highest bin those > 5, and all bins in between span an equal range. Again, the target values are assigned according to the corresponding bin index.
T_bb: This is a balanced variant of T_bins. Here, peaks were randomly selected from T_bins so each bin holds the same number of peaks.
3.2. Prediction of peak intensities by machine learning
The feature vectors consist of the relative frequencies of mono- and trimers over the amino acid alphabet, without positional information. No relative frequencies of dimers were used. Instead, the first and the last dimer of a sequence were encoded, scaled by the length of the sequence, because these positions might be the most important locations for ionization. In earlier experiments on a DNA data set the described setup performed best and can be considered a trade-off between dimensionality and amount of information. This encoding yields very sparse 8820-dimensional feature vectors.
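A rough sketch of this encoding (20 monomer frequencies, 8000 trimer frequencies, and two length-scaled boundary dimers, i.e. 8820 dimensions); the helper below is our own illustration, not the authors' code:

```python
from itertools import product

AA = "ACDEFGHIKLMNPQRSTVWY"
MONO = {a: i for i, a in enumerate(AA)}
TRI = {"".join(t): i for i, t in enumerate(product(AA, repeat=3))}
DI = {"".join(d): i for i, d in enumerate(product(AA, repeat=2))}

def encode(seq):
    """Sparse vector: 20 monomer + 8000 trimer frequencies + 2*400 boundary dimers = 8820 dims."""
    n = len(seq)
    vec = [0.0] * (20 + 8000 + 800)
    for a in seq:                                   # monomer frequencies
        vec[MONO[a]] += 1.0 / n
    for i in range(n - 2):                          # trimer frequencies
        vec[20 + TRI[seq[i:i + 3]]] += 1.0 / (n - 2)
    vec[8020 + DI[seq[:2]]] = 1.0 / n               # first dimer, scaled by length
    vec[8420 + DI[seq[-2:]]] = 1.0 / n              # last dimer, scaled by length
    return vec
```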
For the regression task at hand we applied a support vector machine for regression, the ν-SVR 13,14,15, with both a Gaussian kernel and a linear kernel. To find the optimal values for the SVR regularization parameter C, the Gaussian kernel's bandwidth γ, and ν, grid searches were performed in the three-dimensional parameter space log2(C) ∈ [−5, 15], log2(γ) ∈ [−15, 7], and ν ∈ [0.1, 0.9] for each training set separately. For the evaluation, a ten-fold cross-validation was performed using the optimal SVR parameters found in the grid search. Because of the low amount of data available, no separate validation set was used for this. The accuracy of the predicted values was assessed by measuring the root mean square error (RMSE) and Spearman's rank correlation coefficient (cor).
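Such a grid search and cross-validation could be reproduced, for example, with scikit-learn's NuSVR; the sketch below assumes X and y hold the encoded feature vectors and target intensities (the original work used LIBSVM¹⁵ directly):

```python
import numpy as np
from sklearn.svm import NuSVR
from sklearn.model_selection import GridSearchCV, cross_val_predict
from scipy.stats import spearmanr

param_grid = {
    "C":     [2.0 ** e for e in range(-5, 16)],   # log2(C) in [-5, 15]
    "gamma": [2.0 ** e for e in range(-15, 8)],   # log2(gamma) in [-15, 7]
    "nu":    np.arange(0.1, 1.0, 0.1),            # nu in [0.1, 0.9]
}
search = GridSearchCV(NuSVR(kernel="rbf"), param_grid, cv=10,
                      scoring="neg_root_mean_squared_error")
search.fit(X, y)

# Ten-fold cross-validated predictions with the optimal parameters.
pred = cross_val_predict(search.best_estimator_, X, y, cv=10)
rmse = np.sqrt(np.mean((pred - y) ** 2))
cor, _ = spearmanr(pred, y)
print(search.best_params_, rmse, cor)
```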
4. Results and Discussion

Table 1 shows the prediction performance of the ν-SVR on all data sets as well as some statistical properties. A positive correlation between the predicted and measured intensities (target values) can be observed. The highest correlations are obtained for almost all T_0+ sets. The T_nd sets are predicted worst. The highest correlation overall (0.55) was achieved for T_0+, ln. All other logarithmic sets of T_0+ have a correlation close to that value. In general, logarithmizing improves the prediction. The ranking/binning as well as the balancing of the data does not improve the correlation, except for T_nd, rank, which nevertheless has low correlation. The choice of the kernel function does not have any clearly visible influence on the correlation. Figure 1 shows plots of the predicted values against the target values of different training sets. High densities are shown as dark, low densities as light areas. There is an almost diagonal structure, but low values cannot be predicted well, as indicated by a wide spread. In addition, high values are predicted too low in most cases, regardless of the normalization used. The T_rank and the T_bb sets are balanced, so this result cannot be due to a low number of training examples in these ranges but must be caused by the data and the chosen representation. Pearson's correlation coefficient (which assumes a normal distribution) was also calculated (data not shown). The values found are similar (max. 0.01 difference) to Spearman's rank correlation coefficients, except for the non-logarithmic modes, for which it was noticeably lower. A leave-one-out cross-validation was done for the T_raw and the T_ln sets with the Gaussian kernel to take into account the low number of training
Figure 1. Plots of measured (target) vs. predicted intensities as estimated 2D density using a Gaussian kernel. SVR parameters, RMSE, and correlation are according to Table 1. Left: T_0+, ln; middle: T_0+, rank; right: T_>800, bb. An almost diagonal structure is visible, but a wide spread for low target values can be observed. High target values are predicted too low.
examples available. This improved Pearson's correlation values by 0.03 to 0.14, but only for the T_>800 and T_nd data sets (more so for the non-logarithmic ones), and did not change the correlation for T_0+, raw and T_0+, ln. This suggests that the number of examples available for T_>800 and T_nd is by far not sufficient. It can be assumed that an increase in the number of data points of T_0+, bb will result in a correlation even higher than that of T_0+, ln.

5. Conclusion

The presented results show a positive correlation between the predicted values and the scaled and logarithmized intensities, although only little data was available, motivating more detailed experiments. The use of a larger data set with known protein identities and more sophisticated feature vectors would be promising.

6. Acknowledgments

Thanks to Martina Mahne and Joern Kalinowski for providing spectra from their studies, and to Andreas Wilke, who manually selected spectra and did the MASCOT identification. W. Timm is currently supported by the International NRW Graduate School in Bioinformatics and Genome Research.
Table 1. Results of the ν-SVR's intensity prediction.

Data set       N    μ      median  σ       | Gaussian kernel: ν  log2(C)  log2(γ)  RMSE    cor   | Linear kernel: ν  log2(C)  RMSE    cor
T_>800, raw    353  74.58  28.7    105.84  | 0.2  13    3    103.77  0.32  | 0.4  11  105.19  0.37
T_>800, ln     353   3.37   3.3      1.43  | 0.5  13  -11      1.28  0.45  | 0.6   3    1.29  0.45
T_>800, rank   353   9.33   9        5.10  | 0.5   7    3      4.57  0.45  | 0.5   5    4.48  0.49
T_>800, bins   353   1.87   2        1.34  | 0.8  11   -9      1.18  0.47  | 0.9   3    1.17  0.49
T_>800, bb     265   2.00   2        1.42  | 0.8   3   -1      1.23  0.48  | 0.5   3    1.24  0.47
T_0+, raw      448  61.11  19.8     98.22  | 0.6   9    3     94.46  0.49  | 0.3   9   97.00  0.40
T_0+, ln       448   2.96   2.9      1.59  | 0.7   3    1      1.32  0.55  | 0.6   3    1.33  0.54
T_0+, rank     448  11.69  12        6.45  | 0.5   5    1      5.40  0.54  | 0.6   5    5.46  0.53
T_0+, bins     448   1.56   1        1.37  | 0.8   3    1      1.17  0.53  | 0.7   3    1.22  0.48
T_0+, bb       265   2.00   2        1.42  | 0.8   9    1      1.19  0.54  | 0.6   3    1.21  0.53
T_nd, raw      353  34.34  14.6     47.28  | 0.8  13    3     45.78  0.33  | 0.3   9   44.70  0.41
T_nd, ln       353   2.81   2.6      1.13  | 0.3  15    3      1.07  0.35  | 0.4   3    1.01  0.45
T_nd, rank     353   9.33   9        5.11  | 0.5  15  -11      4.56  0.47  | 0.5   5    4.74  0.42
T_nd, bins     353   1.29   1        1.15  | 0.5   3   -1      1.05  0.39  | 0.7   3    1.06  0.38
T_nd, bb       252   1.55   1.5      1.21  | 0.7   9   -7      1.12  0.38  | 0.7   3    1.10  0.41

Note: N: number of items in the data set; μ: mean intensity value; σ: standard deviation; ν: trade-off parameter of the ν-SVR; C: regularization parameter of the ν-SVR; γ: width of the Gaussian kernel function of the ν-SVR; RMSE: root mean square error; cor: Spearman's rank correlation coefficient as calculated by the R statistical environment¹⁶.
References
1. I. Shadforth, D. Crowther, C. Bessant, Protein and peptide identification algorithms using MS for use in high-throughput, automated pipelines. Proteomics 5(16), pp. 4082 (2005).
2. W. Zhang, B. T. Chait, ProFound: An Expert System for Protein Identification Using Mass Spectrometric Peptide Mapping Information. Anal. Chem. 72, pp. 2482 (2000).
3. D. N. Perkins, D. J. C. Pappin, D. M. Creasy and J. S. Cottrell, Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis 20(18), pp. 3551 (1999).
4. R. Zenobi and R. Knochenmuss, Ion formation in MALDI mass spectrometry. Mass Spec. Reviews 17, pp. 337 (1998).
5. M. Karas, M. Glücksmann and J. Schäfer, Ionization in matrix-assisted laser desorption/ionization: singly charged molecular ions are the lucky survivors. J. Mass Spectrom. 35, pp. 1 (2000).
6. J. E. Elias, F. D. Gibbons, O. D. King, F. P. Roth and S. P. Gygi, Intensity-based protein identification by machine learning from a library of tandem mass spectra. Nature Biotechnology 22(2), pp. 214 (2004).
7. R. J. Arnold, N. Jayasankar, D. Aggarwal, H. Tang and P. Radivojac, A machine learning approach to predicting peptide fragmentation spectra. Pac. Symposium on Bioinformatics (2006).
8. E. A. Kapp, F. Schütz, G. E. Reid, J. S. Eddes, R. L. Moritz, R. A. J. O'Hair, T. P. Speed and R. J. Simpson, Mining a Tandem Mass Spectrometry Database To Determine the Trends and Global Factors Influencing Peptide Fragmentation. Anal. Chem. 75, pp. 6251 (2003).
9. F. Schütz, E. A. Kapp, R. S. Simpson and T. P. Speed, Deriving statistical models for predicting peptide tandem MS product ion intensities. Biochem. Soc. Trans. 31, pp. 1479 (2003).
10. A. Bonner and H. Liu, Predicting Protein Levels from Tandem Mass Spectrometry Data. NIPS'04 Workshop on New Problems and Methods in Computational Biology (2004).
11. S. Gay, P.-A. Binz, D. F. Hochstrasser and R. D. Appel, Peptide mass fingerprinting peak intensity prediction: Extracting knowledge from spectra. Proteomics 2, pp. 1374 (2002).
12. K. R. Coombes, H. A. Fritsche Jr., C. Clarke, J.-N. Chen, K. A. Baggerly, J. S. Morris, L.-C. Xiao, M.-C. Hung and H. M. Kuerer, Quality Control and Peak Finding for Proteomics Data Collected from Nipple Aspirate Fluid by Surface-Enhanced Laser Desorption and Ionization. Clin. Chem. 49:10, pp. 1615 (2003).
13. B. Schölkopf, P. Bartlett, A. Smola and R. Williamson, Shrinking the Tube: A New Support Vector Regression Algorithm. Neural Computation 12, pp. 1207 (2000).
14. B. Schölkopf, P. Bartlett, A. Smola and R. Williamson. In M. S. Kearns, S. A. Solla and D. A. Cohn (Eds.), Advances in Neural Information Processing Systems 11, Cambridge, MA: MIT Press (1999).
15. C.-C. Chang, C.-J. Lin, LIBSVM: a library for support vector machines, http://www.csie.ntu.edu.tw/~cjlin/libsvm (2001).
16. R Development Core Team, R: A Language and Environment for Statistical Computing, http://www.R-project.org, ISBN 3-900051-07-0, Vienna, Austria (2005).
LEARNING COMPREHENSIBLE CLASSIFICATION RULES FROM GENE EXPRESSION DATA USING GENETIC PROGRAMMING AND BIOLOGICAL ONTOLOGIES

BEN GOERTZEL, LUCIO DE SOUZA COELHO, CASSIO PENNACHIN, IZABELA FREIRE GOERTZEL, MURILO SARAIVA DE QUEIROZ, FRANCISCO PROSDOCIMI, FRANCISCO PEREIRA LOBO
Biomind LLC, 1405 Bernerd Place, Rockville MD 20851, USA

We consider the problem of how to use automated techniques to learn simple and compact classification rules from microarray gene expression data. Our approach employs the traditional "genetic programming" (GP) algorithm as a supervised categorization technique, but rather than applying GP to gene expression vectors directly, it applies GP to "enhanced feature vectors" obtained by preprocessing the gene expression data using the Gene Ontology and PIR ontologies. On the two datasets considered, this "GP + enhanced feature vectors" combination succeeds in producing compact and simple classification models with near-optimal classification accuracy. For the sake of comparison, we also give results from the combination of support vector machine classification and enhanced feature vectors on the same datasets.
1. Introduction

The analysis of microarray data is still somewhat problematic [10,13,14,15]. One popular approach is to apply supervised categorization technology [1,3], a strategy with two purposes: to find classification models that can be used to develop diagnostics, and to find models that can be used as guidelines for ongoing experimental and theoretical work. Unfortunately, regarding the latter purpose, it is common for highly accurate machine learning algorithms to generate models that cannot be easily understood. Techniques exist that alleviate the understandability problem [4,9,18] by providing lists of "important genes or GOs" related to the categorization process, but this sort of approach ignores the issue of how genes relate to each other in the operation of a classifier. The approach we describe here utilizes the genetic programming algorithm [11] in an unusual way: by giving it additional input features beyond the gene expression values, which are derived from the latter using knowledge resources such as the Gene Ontology (GO) [6] and the Protein Information Resource (PIR) [17]. This "enhanced feature vectors" approach often produces classification models that are simple and compact and have clear biological
meaning. Here we report results obtained by applying this approach to two different microarray datasets, using both GP and support vector machines.

2. Feature Vector Enhancement

The basic idea of the enhanced feature vectors approach is to produce feature vectors containing additional entries besides the usual (normalized, transformed) gene expression values. The simplest way to produce additional features is to use the GO, but the same approach applies with the PIR or other resources (generically referred to as Background Information, or BI). For each entity whose gene expression profile is under study, we may create a single feature value corresponding to each GO category. In the simplest approach, these GO-derived feature values may be computed by averaging. Let G be the set of genes whose expression values have been measured (i.e., the original feature set), and GO the set of gene ontology categories. If we have a GO category C ∈ GO and an entity E, then the value of the feature corresponding to C in the feature vector of E may be defined as the average expression in E of all the genes g_j ∈ G annotated to belong to C. More formally, assume we have an entity (e.g. an organism or a tissue sample, evaluated at a single time point) E_i, and let gene_exp_ij denote the (perhaps normalized and transformed) expression level of gene g_j in entity E_i. Let GO_k denote the k-th GO category under consideration. Let G_k denote the set of genes contained in GO_k. We may consider a number w_jk in [0, 1] associated with each element g_j in G_k, which is the confidence with which it is known that g_j ∈ G_k. As a default this may be set close to 1 (e.g. 0.99), but in cases where GO category membership is determined via automated learning, the confidence may be significantly lower. In the results reported here we set all w_jk constant, but we have done other work, to be reported elsewhere, in which the w_jk vary significantly because some gene-GO assignments are made via machine learning. Given all these preliminaries, we now define the amalgamated expression value of the GO category GO_k as

GO_exp_ik = Σ_{j: g_j ∈ G_k} w_jk · gene_exp_ij    (1)
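A minimal sketch of this amalgamation, assuming gene_exp maps gene identifiers to expression values for one entity and go_genes maps each GO category to (gene, w_jk) pairs; the names are ours:

```python
def go_features(gene_exp, go_genes):
    """Amalgamated expression per GO category (equation (1)):
    GO_exp_ik = sum over genes g_j in G_k of w_jk * gene_exp_ij."""
    return {cat: sum(w * gene_exp[g] for g, w in members if g in gene_exp)
            for cat, members in go_genes.items()}

def enhanced_vector(gene_exp, go_genes, gene_order, go_order):
    """Extended feature vector of length |gene_set| + |GO_cat_set| (equation (2))."""
    go = go_features(gene_exp, go_genes)
    return [gene_exp[g] for g in gene_order] + [go.get(c, 0.0) for c in go_order]
```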
Using this approach, one may obtain, for each entity being categorized, an extended feature vector of length

|gene_set| + |GO_cat_set|    (2)
where |gene_set| is the number of genes measured by the microarrays in use, and |GO_cat_set| is the number of gene ontology categories being utilized. Alternatively, one may choose to utilize only the GO-based feature vector entries, thus obtaining a feature vector of length |GO_cat_set|.

3. Experimental Results

3.1. Biological Test Data

In order to test the enhanced feature vectors methodology, we have experimented with two publicly available datasets:

1. Lung cancer [7]. This dataset has expression data on samples of lung malignant pleural mesothelioma (MPM) and adenocarcinoma (ADCA), the goal being to classify between the two tumor types. The training data has 32 samples, equally split between the two classes. The test set has 134 ADCA samples and 15 MPM samples. This dataset has already been submitted to supervised categorization analyses using ensembles of decision trees, which obtained test set accuracies of 93.29% [15].

2. Aging Human Brain [12]. This dataset contains samples of three categories of subjects divided by age: "young" (less than 42 years old); "middle age" (between 42 and 72 years); "old" (over 72 years). In our experiments, we used a subset of this original dataset containing only the categories "young" (9 samples) and "old" (11 samples).
3.2. Methods

First of all, the datasets underwent a normalization based on log-transform and Z-score, in such a way that, after normalization, all features had mean zero and variance 1. Then, the GP and SVM classification methods available in the categorization framework of the Biomind ArrayGenius software were used on all datasets. In particular, we used the metatasking capability of ArrayGenius: the experiments for each dataset were actually composed of 500 GP tasks and 500 SVM tasks running with random parameters; the results presented here are the best ones found via this process. All parameters not mentioned below were left at their ArrayGenius default values. All combinations of use of direct and derived features (see the section above on feature vector enhancement) were allowed. In SVM tests, only the kernel parameter was varied among all available alternatives. In GP tests, the fitness function was varied across all available alternatives. For each test, a random number N between 10 and 1000 was chosen, and the test used only the top N features with values most differentiated among the categories in the problem. In the case of the Lung Cancer dataset, separate training and test sets were used for statistical validation, in order to allow comparison with previously published results. For the Aging Brain dataset, we used 10x10 cross-validation, since it had no prior division into test and training datasets.

3.3. Results

Table 1. Summary of the accuracy we obtained in our tests using enhanced feature vectors. We compare these results with previous reports on the same dataset when those are available. Accuracy values are for the test sets only.
Dataset       Method   Accuracy with Enhancement   Accuracy without Enhancement   Accuracy in Literature
Lung Cancer   SVM      100.0%                      100.0%                         93.3%
Lung Cancer   GP       97.0%                       91.3%
Aging Brain   SVM      100.0%                      95.0%                          —
Aging Brain   GP       95.0%                       70.0%
Tables describing our results on these datasets in more detail are available in the Supplementary Information, online at http://www.biomind.com/cibb06.html, where results on other datasets are also noted. Tests using only derived features showed comparatively poor performance. For the lung cancer dataset, the use of derived-only features achieved only 89.9% for both GP and SVM. For the ageing brain dataset, the use of derived-only features achieved only 80.0% accuracy using GP and 90.4% using SVM. Notably, SVM achieved 100% classification accuracy on both of them, beating the best result in the literature on the lung cancer dataset, namely 93.3% accuracy, which had been achieved through supervised classification technology. The ageing brain dataset does not have any published benchmark against which to compare our results. GP achieved 97% and 95% accuracies on the lung cancer and ageing brain datasets, respectively. These are the numbers for GP with enhanced features, which beat the results for GP without enhanced features in both cases.
The models learned using genetic programming together with enhanced feature vectors are compact and informative. Below, we show the best model found for the aging brain dataset using genetic programming, in algebraic form.

1. Aging Brain: (((GO:0015671 − FAM0010221) * GO:0001565) * (0.849917 + SF000628)) / (FAM0040135 / GO:0001775)
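To make the use of such a model concrete, the learned expression can be evaluated directly on a sample's feature values; the feature values below are invented placeholders, and the helper is only an illustration:

```python
def aging_brain_score(f):
    """Evaluate the GP-learned aging brain model on one sample's (enhanced) feature values."""
    return (((f["GO:0015671"] - f["FAM0010221"]) * f["GO:0001565"])
            * (0.849917 + f["SF000628"])) / (f["FAM0040135"] / f["GO:0001775"])

# Hypothetical example; real values would come from the enhanced feature vector.
sample = {"GO:0015671": 0.4, "FAM0010221": -0.1, "GO:0001565": 1.2,
          "SF000628": 0.3, "FAM0040135": 0.8, "GO:0001775": 0.5}
print(aging_brain_score(sample))
```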
At this point one can ask some questions to further validate those results: 1) are the BI features selected coherent with the traditional features, i.e., do they represent the genes that are differentially expressed, and 2) do they assist with understanding the biological phenomena underlying the data?

3.3.1. Aging Brain data

To validate the coherent utilization of BI features versus normal ones (i.e., to check whether the BI features are somehow correlated with the traditional ones) we performed a curated analysis of the 20 features most utilized by the models in our GP ensembles, compared to the genes found to be differentially expressed in the original work [12]. In this work the greatest changes observed in expression occurred in genes related to synaptic function, neuronal plasticity, signal transduction, lipid metabolism, vesicular transport, protein metabolism, Ca2+ homeostasis, microtubule cytoskeleton, amino acid modification, hormones and immune response. The vast majority of BI features most used by our ensembles are clearly correlated with the gene functions differentially expressed in the original work, notably long-term neuron survival (features GO:0048169 and GO:0008582), lipid metabolism (GO:0004063 and GO:0004064), and immune response (GO:0004915, GO:0045917, GO:0048143 and GO:0019981). It is interesting to notice that BI features correlated with common diseases of the aging process, notably Parkinson's, were also found (GO:0048154 and GO:0048155).

3.3.2. Lung Cancer data

In this dataset there is no clear indication of specific differentiated categories in the original work [7], so we analyzed the feature utilization directly, comparing the normal and BI features generated by our algorithms. The presence in the "important feature lists" of several features related to the cytoskeleton (like
NM_001614, NM_005775 and GO:0005519), and to fibroblast growth (NM_002010, NM_000604 and SF000628) is a good sign. Several characteristics of cancer are clearly represented by BI features: locomotion, both of tumor cells and of immune ones [2], and replication, represented by fibroblast growth.

References
1. A. Ben-Dor, L. Bruhn, N. Friedman, I. Nachman, M. Schummer and Z. Yakhini, J. Comput. Biol. 7(3-4), 559-83 (2000).
2. A. Besson, R. K. Assoian and J. M. Roberts, Nat. Rev. Cancer 4, 948-55 (2004).
3. M. Brown, W. Grundy, D. Lin, N. Cristianini and C. Sugnet, Proc. Natl. Acad. Sci. USA 97, 262-267 (2000).
4. J. Cho, D. Lee, J. Park and I. Lee, FEBS Letters 571, 93-98 (2004).
5. N. Cristianini and J. Shawe-Taylor, Support Vector Machines, Cambridge University Press, 2000.
6. Gene Ontology Consortium, Nat. Genet. 25, 25-29 (2000).
7. G. Gordon, R. Jensen, L. Hsiao, S. Gullans, J. Bluemnstock, S. Ramaswamy, W. Richard, D. Sugarbaker and R. Bueno, Cancer Res. 62, 4963-4967 (2002).
8. I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Machine Learning 46, 389-422 (2002).
9. T. Hvidsten, A. Laegreid and J. Komorowski, Bioinformatics 19, 1116-1123 (2003).
10. R. Kothapalli, S. J. Yoder, S. Mane and T. P. Loughran, BMC Bioinformatics 3(1), 22 (2002).
11. J. Koza, Genetic Programming, MIT Press, 1992.
12. T. Lu, Y. Pan, S. Y. Kao, C. Li, I. Kohane, J. Chan and B. A. Yankner, Nature 24, 883-891 (2004).
13. J. Lyons-Weiler, Applied Bioinformatics 2(4), 193-195 (2003).
14. D. Singh, P. Febbo, K. Ross, D. Jackson, J. Manola, C. Ladd, P. Tamayo, A. Renshaw, A. D'Amico, J. Richie, E. Lander, M. Loda, P. Kantoff, T. Golub and W. Sellers, Cancer Cell 1, 203-209 (2002).
15. A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardins, S. Levy, Bioinformatics 21(5), 631-643 (2005).
16. A. Tan and D. Gilbert, Appl. Bioinformatics 2, S75-S83 (2003).
17. J. Wang, T. Bø, I. Jonassen, O. Myklebost and E. Hovig, BMC Bioinformatics 4 (2003).
18. C. Wu, L. Yeh, H. Huang, L. Arminski, J. Castro-Alvear, Y. Chen, Z. Hu, R. Ledley, P. Kourtesis, B. Suzek, C. Vinayaka, J. Zhang and W. Barker, Nucleic Acids Res. 31, 345-347 (2003).
PROTEIN SECONDARY STRUCTURE PREDICTION: HOW TO IMPROVE ACCURACY BY INTEGRATION

LUIGI PALOPOLI
DEIS, Universita della Calabria, Italy. E-mail: [email protected]

SIMONA E. ROMBO
DIMET, Universita "Mediterranea" di Reggio Calabria, Italy. E-mail: [email protected]

GIORGIO TERRACINA
Dip. di Matematica, Universita della Calabria, Italy. E-mail: [email protected]

GIUSEPPE TRADIGO
ICAR-CNR, Rende, Italy. E-mail: [email protected]

PIERANGELO VELTRI*
Universita "Magna Graecia" di Catanzaro, Italy. E-mail: [email protected], ph: +39 0961 3694149, fax: +39 0961 3694112

Keywords: Proteomics, Protein Structure Prediction, Data Integration
In this paper a technique to improve protein secondary structure prediction is proposed. The approach is based on the idea of combining the results of a set of prediction tools, choosing the most correct parts of each prediction. The correctness of the resulting prediction is measured referring to accuracy parameters used in several editions of CASP. Experimental evaluations validating the proposed approach are also reported.
* contact author
1. Introduction

Biological functions of proteins depend on the spatial disposition of the amino acids composing them. Even if new protein amino acid sequences are continuously discovered, identifying their spatial disposition requires considerable effort. Experimental, or exact, methods such as X-ray crystallography or solution nuclear magnetic resonance (NMR) are very expensive and time-consuming. Thus, computer-based automatic tools have been designed to predict protein structures, and such methods have received great attention in the last few years³,⁵,⁴. Recently, many tools have been proposed and are available on-line⁶,¹³, achieving good prediction accuracies. Nevertheless, quality is still not comparable with that obtained by exact methods, and research on prediction quality improvements is considered an important research topic². Moreover, most of the existing prediction tools have high accuracy only on specific groups of proteins. Thus, a challenging problem is to devise prediction methods capable of achieving high levels of accuracy independently of the input proteins they are applied to. Recently, to improve the quality of prediction and to reduce the input dependency, methods based on a joint use of available prediction tools have been proposed⁶,⁷,⁸. In this paper we focus on secondary structure prediction, presenting a novel approach based on the integration of prediction results obtained by several existing prediction tools. The idea is to select and integrate the best predictions in order to obtain higher accuracy than using a single prediction tool. Such an idea is similar to what has been done for tertiary structure prediction⁸, but focusing on secondary structures. The following example shows the basic idea of the proposed approach. Given a protein p, composed of k amino acids, its secondary structure can be represented as a string of length k over the alphabet of three symbols Σ = {E, H, L}, meaning that the corresponding amino acid stands respectively on an α-helix, a β-strand or a non-regular conformation. Let T_1, ..., T_n be the predictions for a protein p obtained by using n different prediction tools. The idea is to combine (integrate) T_1, ..., T_n to obtain a new prediction. Figure 1(a) schematically shows the foregoing when n = 5. Each prediction is represented by a bar filled using three different textures, one for each possible secondary structure configuration (black for α-helix (H), striped for β-strand (E) and white for non-regular shapes (L)). The bottom part of the figure reports the real (i.e., obtained by exact methods) secondary structure of p, using the same notation. Figure 1(b) shows that, combining three out of the five predictions (namely, T_1, T_3 and T_5), the
Figure 1. (a) "Composition" of predictions; (b) "composition" of predictions discarding the worst predictions.
result is closer to the real structure than the one reported in Figure 1(a). Note, by the way, that predictions T_2 and T_4 are less accurate than T_1, T_3 and T_5 when compared with the real structure. The main contribution of the paper consists in the definition of a method for the integration of different predictions; this is carried out by applying an appropriate criterion to locate and combine the "best" parts of the various predictions.

2. Parameters Definition

To measure the accuracy of a prediction, some parameters have been defined in the literature¹²,¹¹. Given a prediction tool and the amino acid sequence of a protein p, the three-state prediction accuracy Q3 represents the percentage of secondary structure configurations (i.e., states) correctly predicted by the prediction tool. The per-segment accuracy SOV measures the percentage of segments of secondary structure correctly predicted, where a segment is a contiguous set of amino acids. Q3 and SOV can be evaluated once the real (observed) protein secondary structure is available. Using such parameters, we define new ones in order to evaluate the accuracy of a prediction tool w.r.t. a set of proteins. In particular, given a secondary structure prediction tool T_i and a set P of m proteins whose observed secondary structure is known, we define the average per-segment accuracy coefficient SOV_(i) as follows:

SOV_(i) = ( Σ_{j=1..m} SOV_(i,j) / m ) × 100    (1)

where SOV_(i,j) indicates the value of SOV corresponding to the prediction of T_i for the j-th protein in P. SOV indicates the ability of a prediction tool
to correctly predict entire sections of secondary structures. Such information is necessary to evaluate how much the "opinion" of such a prediction tool is to be considered accurate whenever a situation of disagreement among the prediction tools occurs. Given a set T of n prediction tools and a protein p with k amino acids, we define a consensus parameter to measure the agreement among the prediction tools in T while predicting the secondary structure of p. In particular, let T_i ∈ T be a prediction tool and k_j the j-th amino acid in p; we define the consensus percentage C_(i,j) as follows:
C_(i,j) = ( N_c(i,j) / n ) × 100    (2)

where N_c(i,j) is the number of prediction tools in T that have predicted the same result as T_i for the j-th amino acid of p. The consensus percentage indicates how much a prediction tool agrees with the remaining n − 1 ones in predicting a single amino acid state. Similarly, given a prediction tool T_i and a segment s_j in the predicted structure for p, in order to evaluate the consensus of the prediction tool T_i with the remaining n − 1 prediction tools in T w.r.t. the segment s_j, we define the superposition mutual coefficient for segments, SOV_mutual, as:

SOV_mutual(i) = ( Σ_{l=1..n, l≠i} SOV^{T_i T_l} ) / (n − 1)    (3)
where SOV^{T_i T_l} is analogous to SOV, with the difference that SOV is evaluated by considering a predicted and an observed secondary structure, whereas SOV^{T_i T_l} measures the segment overlap existing between two predicted structures.

3. Integrating Prediction Results

The integration approach exploits a set T of n prediction tools. The inputs are the amino acid sequence (primary structure) of a protein p whose secondary structure is unknown, and a set P of m proteins whose structures are known (observed), and such that they are related to the protein p. More precisely, the proteins in P are required to be homologous to p in their structures and in their biological functions. We notice that several tools and databases that classify proteins in families, referring to their biological functions, mutations or protein structures, are available on-line⁹,¹⁰.
3.1. Assigning a Vote to Each Prediction Tool
Let p be a protein with k amino acids and let T be a set of n prediction tools that have expressed n distinct predictions for the secondary structure of p. The proposed approach combines the n predictions in order to obtain a predicted secondary structure for p with a better accuracy than each single prediction. It consists in selecting, for each amino acid of p, the most probable state (helix, strand or irregular state) from the n predictions. To integrate predictions we define a voting matrix M (n × k) where M[i, j] represents a vote for the prediction of the i-th prediction tool w.r.t. the j-th amino acid of p. M is defined by using the parameters defined in Section 2, evaluated by running the prediction tools in T on a set P of proteins whose secondary structure is known and such that the proteins in P are homologous to p. The voting matrix M is then defined as follows:

M[i, j] = SOV_(i) + C_(i,j) + SOV_mutual(i)    (4)

In particular, SOV_(i) can be considered as a reliability score for the i-th prediction tool in predicting the secondary structures of the proteins in P. C_(i,j) represents a punctual agreement between the i-th prediction tool and the other ones in predicting the j-th amino acid of p, whereas SOV_mutual(i) represents a structural agreement index comparing the i-th prediction with the remaining ones. The following section reports the integration algorithm exploiting the voting matrix.
3.2. The Integration Algorithm

Let p be a protein, P a set of known proteins homologous to p and T a set of n prediction tools. Suppose the n secondary structure predictions for p given by the prediction tools in T are known, and let M be the voting matrix computed as in equation (4) w.r.t. p, P and T.

procedure Integrate(p, T, M, s)
begin
  for each amino acid j of p do
  begin
    if all the prediction tools in T agree on the amino acid j then
      s[j] = the j-th symbol of the prediction of one prediction tool in T;
    else begin
      select i such that M[i, j] is the maximum of the column j of M;
      s[j] = the j-th symbol of the prediction of the i-th prediction tool;
    end
  end
end procedure
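A runnable Python sketch of the voting matrix of equation (4) and of the Integrate procedure, where predictions is a list of n strings over {E, H, L} of length k, and sov_i and sov_mutual_i are the precomputed SOV_(i) and SOV_mutual(i) values (all names are our own):

```python
def consensus(predictions, i, j):
    """C_(i,j): percentage of tools predicting the same state as tool i at position j (eq. (2))."""
    n = len(predictions)
    same = sum(1 for p in predictions if p[j] == predictions[i][j])
    return 100.0 * same / n

def voting_matrix(predictions, sov_i, sov_mutual_i):
    """M[i][j] = SOV_(i) + C_(i,j) + SOV_mutual(i)  (equation (4))."""
    n, k = len(predictions), len(predictions[0])
    return [[sov_i[i] + consensus(predictions, i, j) + sov_mutual_i[i]
             for j in range(k)] for i in range(n)]

def integrate(predictions, M):
    """Keep the unanimous state at each position, otherwise the state of the best-voted tool."""
    n, k = len(predictions), len(predictions[0])
    s = []
    for j in range(k):
        states = {p[j] for p in predictions}
        if len(states) == 1:
            s.append(predictions[0][j])
        else:
            best = max(range(n), key=lambda i: M[i][j])
            s.append(predictions[best][j])
    return "".join(s)
```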
The integration algorithm obtains a secondary structure prediction s for p as follows. For each amino acid j in p, the prediction of the i-th tool is chosen, where i is obtained by determining the maximum value M[i, j] in the column j. Finally, the prediction s is the sequence obtained by concatenating the predictions chosen for each amino acid in p.

Table 1. Comparison between accuracy measures of the prediction tool scoring the maximum values of SOV (resp. Q3) and the integration tool. T^S (resp., T^Q) indicates that the tool T scored the best SOV (resp., Q3) for that protein.
Protein  Tools                                                 | Max SOV tool    | Max Q3 tool     | Integration tool
                                                               | SOV     Q3      | SOV     Q3      | SOV     Q3
1dlw     porter, psipred^S, rosetta^Q                          | 83.62   86.21   | 83.21   91.38   | 85.20   88.79
1mwb     prof, jufo, rosetta^SQ, yaspin                        | 91.87   96.00   | 91.87   96.00   | 99.05   95.12
1idr     prof, porter, jufo, psipred^Q, yaspin                 | 98.77   91.91   | 96.95   94.12   | 99.51   94.12
1i78     psipred^S, rosetta^Q, sam                             | 68.80   78.45   | 67.10   81.48   | 77.89   82.83
1k24     rosetta^SQ, porter                                    | 67.11   78.66   | 67.11   78.66   | 67.11   78.66
1ilz     psapredict, prof, porter, psipred, rosetta^Q, yaspin  | 84.20   79.64   | 71.89   82.91   | 86.71   80.00
1ivs     prof, porter, psipred^SQ, rosetta, sam, yaspin        | 72.94   78.77   | 72.94   78.77   | 75.64   80.74
1lrz     porter, psipred^S, rosetta, sam^Q                     | 82.28   80.75   | 81.02   82.63   | 84.50   82.63
1eiy     prof, porter, psipred, rosetta, sam^SQ, yaspin        | 62.48   73.43   | 62.48   73.43   | 63.69   71.14
1set     prof^S, porter, sam^Q                                 | 79.57   76.96   | 75.78   78.38   | 82.43   78.15
1faf     psipred, jufo, yaspin, rosetta                        | 85.42   83.54   | 82.44   87.34   | 87.54   93.67
1bq0     porter, jufo^Q, yaspin, rosetta, sam                  | 56.98   71.84   | 52.55   79.61   | 60.18   74.76
1xbl     porter, jufo^SQ, rosetta                              | 76.85   89.72   | 76.85   89.72   | 71.11   86.92
1hdj     porter^SQ, yaspin, rosetta                            | 92.21   90.91   | 92.21   90.91   | 92.95   92.21
1fpo     porter^S, jufo, prof, yaspin, sam^Q                   | 92.73   91.80   | 91.71   92.98   | 93.31   92.40
1mm4     prof, porter^SQ                                       | 81.43   82.35   | 81.43   82.35   | 81.43   82.35
1p4t     porter, yaspin^SQ, sam                                | 81.02   79.35   | 81.02   79.35   | 82.07   80.65
1qj8     porter, psipred^S, sam^Q                              | 76.08   76.35   | 67.15   77.70   | 79.12   77.70
1g90     prof, psipred^Q, sam                                  | 79.98   82.95   | 77.67   83.52   | 86.31   86.36
1bxw     prof, psipred^SQ, sam                                 | 89.68   87.79   | 89.68   87.79   | 97.66   87.79
1qjp     prof, psipred^SQ, sam                                 | 88.08   85.96   | 88.08   85.96   | 95.51   86.55
4. Experiments

To validate our approach we tested the algorithm of Section 3.2 on proteins whose secondary structures are published in the Protein Data Bank (PDB)¹, using a set of 9 available prediction tools, namely porter, psipred, psapredict, jufo, prof, rosetta, sam, and yaspin. Results of the experimental evaluations are reported in Table 1. Here, each
row corresponds to a test on a protein p (assumed unknown as far as the test was concerned) whose PDB identifier is reported in the first column. For each p the table shows: (i) the prediction tools considered in the integration process; in this column we use the notation T^S (resp., T^Q) to highlight the tool T scoring the best SOV (resp., Q3) on p among the considered ones; (ii) values of SOV and Q3 for T^S; (iii) values of SOV and Q3 for T^Q; (iv) values of SOV and Q3 for our integration tool. Integration improves the SOV parameter in 85.7% of cases, and the Q3 parameter in 66.6% of cases. Note that, usually, the prediction tool scoring the maximum SOV value does not obtain the maximum Q3, and vice versa (see, e.g., rows 1, 3 and 4). On the contrary, the integration tool scores the best accuracy for both SOV and Q3 in many cases and shows accuracy values that are very close to the maximum ones in the other cases. As a consequence, it is possible to conclude that our integration tool tends to improve the overall accuracy of the prediction, considering both measures. The selection of the prediction tools used for the integration process is currently semi-automatic. We are working to define an appropriate technique for the automatic selection of the best set of prediction tools, to provide a fully automated tool for the prediction of protein secondary structures. The integration procedure (described in Section 3.2) and the voting matrix evaluation are fully implemented in Java. Automatic querying of available prediction tools and results normalization are currently under development. All these modules are part of a more complex architecture for the automatic prediction of secondary structures whose prototype will soon be available. Finally we plan to face a further challenge, that is, the use of the presented tool in combination with tertiary structure prediction tools to further improve overall accuracy.

5. Related Work

In⁶ an interactive protein secondary structure prediction Internet server is presented. The server allows a single sequence or a multiple alignment to be submitted, and returns predictions from six secondary structure prediction algorithms that exploit evolutionary information from multiple sequences. The main difference w.r.t. our approach is that they aim at individuating the best results among the available predictions by exploiting a consensus technique, whereas our system integrates only the subset of available predictions that allows improving the prediction accuracy. In⁷ a method based on the cooperative exploitation of different tertiary structure prediction tools is proposed. The tool is based on the selection
of models predicted by a number of independent fold recognition servers, by confidence assignment. In⁸ an approach for tertiary structure prediction is proposed. This approach considers the characterization of the performance of a team of prediction tools jointly applied to a prediction problem, choosing the best team for a prediction problem and integrating the prediction results of the tools in the team in order to obtain a unique prediction. Differently from our approach, the methods in⁷,⁸ face the problem of protein tertiary structure prediction, thus the technique to combine predictions, the voting matrices and the reference measures of precision are completely different from our own.

References
1. H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E. Bourne, The Protein Data Bank. Nucleic Acids Research, 28:235-242 (2000).
2. Critical assessment of techniques for protein structure prediction. http://predictioncenter.llnl.gov/casp6/Casp6.html.
3. D. M. Webster, "Protein Structure Prediction: Methods and Protocols (Methods in Molecular Biology)", Humana Press, 15 August (2000).
4. C. Guerra, S. Istrail, "Mathematical Methods for Protein Structure Analysis and Design: Advanced Lectures (Lecture Notes in Bioinformatics)", Springer, 13 August (2003).
5. A. Tramontano, "Protein Structure Prediction: Concepts and Applications", John Wiley & Sons (2006).
6. J. A. Cuff, M. E. Clamp, A. S. Siddiqui, M. Finlay and G. J. Barton, "Jpred: a consensus secondary structure prediction server", Bioinformatics 14, 892-893 (1998).
7. D. Fischer, "3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor", Proteins 51:3, 434-41 (2003).
8. L. Palopoli and G. Terracina, "CooPS: a system for the cooperative prediction of protein structures", J. of Bioinf. and Comp. Biol. 14, 14-16 (2004).
9. A. G. Murzin, S. E. Brenner, T. Hubbard and C. Chothia, "SCOP: a structural classification of proteins database for the investigation of sequences and structures", Journal of Molecular Biology 475, 536-540 (1995).
10. S. F. Altschul, T. L. Madden, A. A. Schäffer, J. Zhang, Z. Zhang, W. Miller and D. J. Lipman, "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Research 25, 3389-3402 (1997).
11. C. Venclovas, A. Zemla, K. Fidelis and J. Moult, "Some measures of comparative performance in the three CASPs", Proteins: Structure, Function, and Genetics 34, 220-223 (1999).
12. B. Rost, C. Sander and R. Schneider, "Redefining the goals of protein secondary structure prediction", Journal of Molecular Biology 235, 13-26 (1994).
13. G. Pollastri and A. McLysaght, "Porter: a new, accurate server for protein secondary structure prediction", Bioinformatics, in press (Advance Access) (2004).
THE STABILIZATION EFFECT OF THE TRIPLEX VACCINE*
F. PAPPALARDO, S. MOTTA, E. MASTRIANI, M. PENNISI
Dept. of Mathematics & Computer Science and Faculty of Pharmacy, University of Catania, V.le A. Doria 6, I-95125 Catania, Italy
E-mail: {francesco,motta,mastriani}@dmi.unict.it; [email protected]

P.-L. LOLLINI
Sezione di Cancerologia, Dipartimento di Patologia Sperimentale, and Centro Interdipartimentale di Ricerche sul Cancro "Giorgio Prodi", University of Bologna, Viale Filopanti 22, I-40126 Bologna, Italy
E-mail: [email protected]
Cancer immunoprevention vaccines are based, like all vaccines, on drugs which give the immune system the information necessary to recognize tumor cells as harmful. In endogenous tumors, the vaccine cannot eliminate all tumor cells, but maintains them at a non-dangerous level. In this paper we show that the vaccine's administration acts as a stabilizing perturbation for the immune system. The results suggest that it is possible to model such an effect using an ODE-based technique with an "external input".

* This work was supported in part by the IMMUNOGRID project, under EC contract FP6-2004-IST-4, No. 028069. F.P. and S.M. acknowledge partial support from a University of Catania research grant and from MIUR (PRIN 2004: Problemi matematici delle teorie cinetiche). This work was done while F.P. was a research fellow of the Faculty of Pharmacy of the University of Catania. P.-L.L. acknowledges financial support from the University of Bologna, the Department of Experimental Pathology ("Pallotti" fund) and MIUR.
1. Introduction

The Immune System (IS) is a complex adaptive system of cells and molecules, distributed in the vertebrate body, that provides a basic
defense against pathogenic organisms. A vaccine is a drug which provides the immune system with a first encounter with a pathogen, using inactivated ones. In such a way the system produces memory cells that will be able to destroy the active pathogen at the next encounter. A special kind of pathogen are cancer cells. Being self cells, they are usually not recognized by the immune system as dangerous. Tumors are caused by a combination of exogenous and endogenous factors. Elimination of exogenous factors (industrial carcinogens, tobacco, and so on) is in principle quite easy. This is achieved by governmental decrees and laws or by changes in lifestyle. Other tumors are mainly due to endogenous or unknown factors; for example, the major risk factors for breast cancer are related to hormonal (estrogenic) stimulation of the mammary gland during fertile life. Cancer immunoprevention is based on the use of immunological approaches to prevent solid tumors, rather than to cure cancer. This is most important in endogenously originated tumors, in which cancer cells are continuously formed from corrupted normal cells. Cancer immunoprevention vaccines are based (like all vaccines) on drugs which give the immune system the information necessary to recognize tumor cells as harmful. For this reason, cancer vaccines need to be administered to the host for his entire life. The vaccine cannot eliminate all tumor cells, but stabilizes them at a non-dangerous level. A comprehensive discussion on vaccines, the immune system and models can be found in Lollini et al.³. In this paper, we treat immunoprevention vaccines looking in particular at the stabilizing effect of the vaccine's administration. The plan of the paper is the following. In Section 2 we recall the model which reproduces the effect of an immunoprevention vaccine for mammary carcinoma and present the analysis of the model results in an appropriate states' space describing the cancer - immune system competition. In Section 3 we draw conclusions and plans for future work.
2. Analysis of the Triplex stabilizing effect

A typical, and unfortunately very widespread, endogenous cancer is the mammary carcinoma. Research on immunoprevention cancer vaccines for this carcinoma started in the mid-'90s. Various attempts were made by Lollini et al.² and others to prevent mammary carcinoma in HER-2/neu transgenic mice using immunological maneuvers. A complete prevention of mammary
carcinogenesis with the Triplex vaccine was obtained when vaccination cycles started at 6 weeks of age and continued for the entire duration of the experiment, at least one year (chronic vaccination)². The question whether the Chronic protocol is the minimal vaccination protocol yielding complete protection from tumor onset, or whether a lower number of vaccination cycles would provide a similar degree of protection, is still open. Finding an answer to this question via a biological solution would be too expensive in time and money, as it would require an enormous number of experiments, each lasting at least one year. For this reason we developed an accurate model of the immune system responses to vaccination. A detailed description and applications of this model to vaccine schedules can be found in Pappalardo et al.⁵,⁶ and in Motta et al.⁴. From the point of view of the description of the entities, the model is inspired by Boltzmann equations. The simulator is based on lattice Boltzmann automata, and what one observes in the figures of the quoted papers are the moments of the distribution functions of the various entities with respect to time. However, the fight of the immune system against harmful cancer cells is a competition which recalls the well-known Lotka-Volterra equations. As a matter of fact, there is a population of prey (the cancer cells) with infinite food resources (the host blood) and different populations of predators (effector cells) which try to recognize and eliminate them. At variance with classical predator-prey models, the predator survival is not determined by the prey: the predators exist at a certain level as the normal state of the host, using the same food resources, the blood. As tumor cells are self cells, they are hardly recognized by the immune system, so that without any vaccine treatment the tumor takes over, destroys the immune system efficiency and kills the host. When the vaccine is administered, effector cells recognize tumor cells, their number rises in order to eliminate the tumor cells, and then goes back to the normal level. However, in endogenously originated tumors new cancer cells will continuously be formed, and a new vaccine administration is then needed to stabilize the cancer - immune system competition. Thus the system including cancer cells is unstable and will be stabilized by an external action (the vaccine). The situation is similar, but reversed, to mechanical systems where external forces induce instability (like the wind effect on bridges). The state of the system describing the immune system - cancer competition can be summarized in three fundamental parameters: the number of cancer cells, the number of cytotoxic cells (representing the cellular response) and the number of antibodies (representing the humoral response).
These parameters can be represented in a 3-D states' space. Curves in this space will then represent the evolution of the system; time is the curves' parameter. To show the effect of the vaccine in stabilizing the system, we first look at the ground state, i.e. the behavior of the immune cells when no cancer cells are present. Then we analyze three cases previously studied in Motta et al.⁴.
Figure 1. Ground states for the normal immune system.
The ground state behavior is shown in Figure 1, where we plotted the number of B and cytotoxic T cells along the normal host's life. Figure 1 shows that, due to the stochastic nature of cell birth and death, the ground state is a region and not a single point. We then analyze the three cases quoted above. First we analyze the untreated case. In this case there is no action of the immune system, and the number of cancer cells grows with no control until solid tumor formation (Figure 2a). Figure 2b shows that both the humoral and the cellular immune responses are absent.
Figure 2. 3-D states' space for the untreated case.
Then we consider the Early schedule, which consists of three vaccination cycles starting at week 6. One vaccination cycle consisted of four intraperitoneal administrations of non-replicating (mitomycin-treated) vaccine cells over two weeks, followed by two weeks of rest¹. The effect of the schedule is to reduce the initial growth of tumor cells (Figure 3a). This effect is shown in Figure 3b as a large loop in which the cancer cell growth is reduced by the immune response, represented by the increase of cytotoxic T cells and antibodies. After this initial phase (~160 days), the cancer cells grow with no constraints and the straight line is similar to the plot of the untreated case.
Figure 3.
3-D states' space for Early treatment
Finally we consider the Chronic schedule. This schedule consists of repeated Early cycle administrations for the entire life of the host. Looking at Figure 4a one can see that, after an initial burst, the cancer cells are kept under a safe threshold and solid tumor formation is inhibited by the Triplex vaccine. Figure 4c shows (for the initial phase, ~160 days) the same behavior as the Early schedule. After this initial phase, the system (immune system - cancer) is stabilized (Figure 4b) and the equilibrium region is better shown in Figure 4d.
3. Conclusions

We presented an analysis of the effect of a cancer immunoprevention vaccine modeled by computer simulations. This analysis shows that the effect of the vaccine is to stabilize the immune system - cancer competition around values which are safe for the host. This result suggests that it is possible to model this effect using an ODE-based technique including "external inputs". Work in this direction is in progress and results will be published in due course.
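Purely as an illustration of the direction indicated here, and not of the authors' model, a predator-prey-like system with a periodic external input standing for vaccine administrations can be integrated with SciPy; all rates and the input shape below are invented placeholders:

```python
import numpy as np
from scipy.integrate import solve_ivp

def vaccine_input(t, period=28.0, dose=5.0, width=2.0):
    """Periodic external stimulation standing for vaccination cycles (placeholder shape)."""
    return dose if (t % period) < width else 0.0

def competition(t, y, a=0.1, b=0.02, c=0.05, d=0.03):
    cancer, effector = y
    u = vaccine_input(t)
    d_cancer = a * cancer - b * cancer * effector            # prey: unconstrained growth minus kill term
    d_effector = -d * effector + c * cancer * effector + u   # predator: decay, stimulation, external input
    return [d_cancer, d_effector]

sol = solve_ivp(competition, (0.0, 400.0), [1.0, 1.0], max_step=0.5)
print(sol.y[:, -1])
```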
Figure 4. 3-D states' space for Chronic treatment (axes: CC×10⁴, TC×10², Ab×10⁰).
References
1. C. De Giovanni, G. Nicoletti, L. Landuzzi, A. Astolfi, S. Croci, A. Comes, S. Ferrini, R. Meazza, M. Iezzi, E. Di Carlo, P. Musiani, F. Cavallo, P. Nanni, P.-L. Lollini, Immunoprevention of HER-2/neu transgenic mammary carcinoma through an interleukin 12-engineered allogeneic cell vaccine, Cancer Res. 64(11), 4001 (2004).
2. P.-L. Lollini, G. Nicoletti, L. Landuzzi, C. De Giovanni, P. Nanni, New target antigens for cancer immunoprevention, Curr. Cancer Drug Targets, 5(3), 221 (2005).
3. P.-L. Lollini, S. Motta, F. Pappalardo, Modeling models in tumor immunology, Mathematical Models and Methods in Applied Sciences, to appear, 2006.
4. S. Motta, P.-L. Lollini, F. Castiglione, F. Pappalardo, Modelling Vaccination Schedules for a Cancer Immunoprevention Vaccine, Immunome Research, 1(5), doi:10.1186/1745-7580-1-5 (2005).
5. F. Pappalardo, P.-L. Lollini, F. Castiglione, S. Motta, Modelling and Simulation of Cancer Immunoprevention Vaccine, Bioinformatics, 21(12), 2891 (2005).
6. F. Pappalardo, E. Mastriani, P.-L. Lollini, S. Motta, Genetic Algorithm against Cancer, Proceedings of the Second International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2005), Lecture Notes in Computer Science, 3849, 223 (2006).
LEARNING CLASSIFIERS FOR HIGH-DIMENSIONAL MICRO-ARRAY DATA

ANDREA BOSIN
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy

NICOLETTA DESSI
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy

BARBARA PES
Department of Mathematics and Computer Science, University of Cagliari, Via Ospedale 72, 09124 Cagliari, Italy
In this paper, we address the challenging task of learning accurate classifiers from microarray datasets involving a large number of features but only a small number of samples. We present a greedy step-by-step procedure (SSFS) that can be used to reduce the dimensionality of the feature space. We apply the Minimum Description Length principle to the training data for weighting each feature and then select an "optimal" feature subset by a greedy approach tuned to a specific classifier. The Acute Lymphoblastic Leukemia dataset is used to evaluate the effectiveness of the SSFS procedure in conjunction with different state-of-the-art classification algorithms.
1. Introduction

The advent of DNA micro-array technology [1] has made it possible to record broad patterns of gene expression simultaneously in a single experiment, and several data sets have become publicly available on the Internet. These data sets present multiple challenges, including a large number of gene expression values per experiment (several thousands of genes, usually referred to as features) and a relatively small number of samples (a few dozen patients). Micro-array data can be analyzed from many different viewpoints. Recent works [2][3][4] apply supervised learning for cancer classification: in this case available training samples, coming from patients whose pathological condition (e.g. cancer or normal) is known, are used for building classifiers suitable for medical diagnosis. In addition, the identification of discriminatory genes is of
fundamental and practical interest since medical diagnostic tests may benefit from the examination of a small subset of relevant genes. This paper goes further in this direction and proposes a greedy step-by-step feature selection heuristic (SSFS). Specifically, the Minimum Description Length criterion [5] is applied for weighting each feature according to its correlation with the target class. The resulting weights serve as input to an iterative procedure that evaluates different feature subsets by measuring the predictive accuracy of a classifier built on them. The smallest subset leading to the best accuracy is selected as "optimal". An experimental study is carried out to evaluate the proposed heuristic in conjunction with different classification algorithms. Specifically, we investigate the effectiveness of Naive Bayes [6], Adaptive Bayesian Network [7], Support Vector Machines [8] and k-Nearest Neighbor [9] in predicting many different classes or targets (i.e. diagnoses). The paper also shows how the knowledge of a domain expert makes it possible to replace one multi-target classifier with a set of binary classifiers, one for each target, with a substantial improvement in accuracy and performance. The Acute Lymphoblastic Leukemia (ALL) dataset [10] has been used as a testbed for the experiments presented here. The paper is organized as follows. Section 2 illustrates our learning strategy. The experiments and the related results are described in Section 3. Finally, Section 4 presents a brief discussion and concluding remarks.

2. Learning strategy

A known problem in classification, and in machine learning in general, is to reduce the dimensionality of the feature space to overcome the risk of "overfitting" that arises when the number of training patterns (i.e. instances) is small and the number of features is comparatively large. In such a situation, we can easily learn a classifier that correctly describes the training data but performs poorly on an independent set of test data. Given a particular classifier, the selection of the best subset of features by exhaustive search and evaluation of all the possible subsets is impractical for a large dimensional input space, but it can be used in combination with another method that first reduces the number of features to a manageable size. In this paper, we evaluate a wrapper technique [11][12] based on the MDL principle [5] that removes many of the original input features and retains a minimal subset of features that yields the best classification performance.
595 The MDL principle states that the best theory to infer from training data is the one that minimizes the length (i.e. the complexity) of the theory itself and the length of the data encoded with respect to it. MDL provides a criterion to judge the quality of a classification model and, as a particular case, it can be employed to address the problem of feature selection, by considering each feature as a simple predictive model of the target class [13][7][4], In order to select an "optimal" set of features, we use a greedy approach. First we rank and weight each feature according to its description length, that reflects the strength of its correlation with the target class, but any other ranking method can be used as well. In particular, we assume that the most informative features are those with the largest weights. Even if it is good in ranking features, MDL alone cannot be used as a feature selection criterion. To explicitly build a good feature subset and to evaluate its effectiveness, we adopt the iterative procedure, named Step-by-Step Feature Selection (SSFS), whose steps are detailed in Figure 1. 1
1. Rank all features according to their description length (MDL).
2. Select the set of the N top-ranked features (e.g. start with N = 20).
3. Build a classifier from a training set D, filtered according to the selected features.
4. Test classifier accuracy on a test set T, filtered according to the selected features.
5. Extend the set by adding the next k top-ranked features (e.g. k = 10) and put N = N + k.
6. Repeat steps 2 to 5 and stop if the accuracy has not increased over the last J iterations (or when all the original attributes are appended to the subset).
Figure 1. Steps of the greedy SSFS procedure.
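As an illustration of this procedure, the following is a minimal Python sketch of the greedy loop in Figure 1, assuming the MDL-based ranking has already been computed and is passed in as `ranked_idx`; the scikit-learn Naive Bayes classifier, the starting size `n_start`, the step `k` and the `patience` parameter (playing the role of J) are placeholders for illustration, not the implementation used in the paper.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def ssfs(X_train, y_train, X_test, y_test, ranked_idx,
         n_start=20, k=10, patience=3):
    """Greedy step-by-step feature selection over a pre-computed ranking."""
    best_acc, best_subset, stalled = -1.0, None, 0
    n = n_start
    while n <= len(ranked_idx):
        subset = ranked_idx[:n]                       # top-N ranked features
        clf = GaussianNB().fit(X_train[:, subset], y_train)
        acc = accuracy_score(y_test, clf.predict(X_test[:, subset]))
        if acc > best_acc:                            # keep the smallest best subset
            best_acc, best_subset, stalled = acc, subset, 0
        else:
            stalled += 1
        if stalled >= patience:                       # no gain over the last J rounds
            break
        n += k                                        # add the next k top-ranked features
    return best_subset, best_acc
```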
We have applied different classification methods in conjunction with the same feature selection heuristic (by performing a set of independent experiments), in order to obtain evidence of the intrinsic effectiveness of the heuristic itself. In particular, we compare a Bayesian classification approach [6], Support Vector Machines (SVM) [8] and k-Nearest Neighbor (k-NN) [9]. These techniques may be useful to identify expression patterns from microarray data, as witnessed by recent literature [3][4][14]. Specifically, in the context of Bayesian classifiers, we compare the performances of the Naive Bayes (NB) [6] and the Adaptive Bayesian Network (ABN) [7].
3. Experiments
Our experimental study is carried out on the Acute Lymphoblastic Leukemia (ALL) dataset [10]. ALL is a heterogeneous disease consisting of various leukemia sub-types that remarkably differ in their response to chemotherapy [15]. The ALL dataset contains all known ALL sub-types (including T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdip > 50) and consists of 327 samples, each one described by the expression level of 12558 genes. This dataset includes 215 training samples and 112 testing samples. In a first phase, we evaluate the SSFS procedure with NB, ABN, SVM and k-NN classifiers, by addressing the multi-class problem in one shot. That is, we look for multi-target models that are capable of distinguishing between all six ALL sub-types. For each classifier, we measure the errors on the test dataset, i.e. the number of patterns that are misclassified (corresponding to a diagnostic error). The results are shown in Figure 2 for an increasing number of features selected by MDL.
Figure 2. Misclassifications of multi-target classifiers for an increasing number of features selected by MDL.
As we can see, the number of features needed to achieve a stabilization in accuracy is very high, and the resulting classifiers have a relatively high level of misclassifications (5-10%). This confirms the difficulty of multi-target classification, in agreement with recent literature [16]. To circumvent this problem, a domain-specific heuristic can be useful to decompose a multi-target classification problem into a structured set of binary classification problems, one for each target.
Specifically, we adopt a divide-and-conquer methodology based on clinical knowledge, experience and observation [15]. Indeed, when approaching the ALL diagnosis, doctors first look for evidence of the most unambiguous sub-type (i.e. T-ALL) against all the others (referred to as OTHERS1). If there is no T-ALL evidence, they look for the next most unambiguous sub-type (i.e. E2A-PBX1) against the remaining (OTHERS2). Then the process steps through TEL-AML1 vs. OTHERS3, BCR-ABL vs. OTHERS4, MLL vs. OTHERS5 and finally Hyperdip > 50 vs. OTHERS (which groups all samples not belonging to any of the previous sub-types). This approach can be reproduced in a machine learning process that builds and evaluates six binary classifiers, each of which is responsible for only one ALL sub-type, i.e. is capable of distinguishing a single sub-type from all the others. When constructing such a binary classifier, all the samples belonging to a sub-type different from the one considered have to be reassigned, in both training and test datasets, to a generic "OTHER" sub-type. In such a way, we only have two targets: the sub-type considered and "OTHER". On this basis, two distinct experiments have been performed. In the first one (referred to as E1), we learn each binary classifier in turn, according to the order specified above, leaving out from the training and test datasets the patterns belonging to the sub-types already considered. Table 1 summarizes the cardinality of the training and test sets for each binary classifier; a sketch of this cascade is given after the table.

Table 1. Experiment E1: number of training and test patterns for each binary classifier.

ALL sub-type                  Training set    Test set
T-ALL vs. OTHERS1             28 vs. 187      15 vs. 97
E2A-PBX1 vs. OTHERS2          18 vs. 169      9 vs. 88
TEL-AML1 vs. OTHERS3          52 vs. 117      27 vs. 61
BCR-ABL vs. OTHERS4           9 vs. 108       6 vs. 55
MLL vs. OTHERS5               14 vs. 94       6 vs. 49
Hyperdip > 50 vs. OTHERS      42 vs. 52       22 vs. 27
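The following short Python sketch illustrates the E1-style cascade described above: the six binary classifiers are learned in the stated order, each one against a generic "OTHER" class, and the patterns of sub-types already handled are removed before the next classifier is built. The `train_binary` callback and the label strings are assumptions used only for illustration, not the authors' code.

```python
SUBTYPE_ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdip>50"]

def build_binary_cascade(samples, labels, train_binary):
    """samples: list of feature vectors; labels: ALL sub-type of each sample."""
    classifiers = {}
    remaining = list(zip(samples, labels))
    for subtype in SUBTYPE_ORDER:
        # Relabel: the current sub-type vs. a generic "OTHER" class
        X = [x for x, _ in remaining]
        y = [subtype if l == subtype else "OTHER" for _, l in remaining]
        classifiers[subtype] = train_binary(X, y)
        # E1 setting: drop the patterns of the sub-type just handled
        remaining = [(x, l) for x, l in remaining if l != subtype]
    return classifiers
```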
We learn each binary classifier by applying the SSFS procedure to the corresponding training and test datasets. This results in six different "optimal" subsets of 20 features. Each subset contains the attributes, i.e. the genes, that characterize a specific ALL sub-type. Figure 3 shows the errors of the NB, ABN, SVM and k-NN binary classifiers, in conjunction with MDL ranking.
Figure 3. Misclassifications of binary classifiers in experiment E1.
On the contrary, in the second experiment (referred to as E2) we learn each binary classifier without leaving out the patterns belonging to the sub-types already considered, i.e. we retain the complete training and test sets (consisting of 215 and 112 patterns) for learning and evaluating each classifier. In this way, we can verify whether the use of smaller and smaller training datasets in experiment E1 (Table 1) influences the accuracy of the classifiers. As in experiment E1, the SSFS procedure results in six "optimal" subsets of 20 features. We observe that the six feature subsets selected in experiment E2 differ from those selected in experiment E1, even if we adopted the same ranking criterion (MDL). For each sub-type, Table 2 shows the number of different features in the corresponding subsets (out of 20).

Table 2. Number of features that are not common to both experiments E1 and E2.

T-ALL   E2A-PBX1   TEL-AML1   BCR-ABL   MLL   Hyperdip > 50
0       3          4          11        9     13
The ABN classifiers trained using these two different feature sets (E1 and E2) result in the errors shown in Table 3. The performance is slightly better in E2 and the majority of errors occurs for different ALL sub-types.

Table 3. Misclassifications of ABN binary classifiers in experiments E1 and E2.

      T-ALL   E2A-PBX1   TEL-AML1   BCR-ABL   MLL   Hyperdip > 50   overall
E1
E2
4. Discussion and concluding remarks
In the multi-target problem (Figure 2), the behavior of all classifiers is similar: the accuracy has some initial oscillation and a large number of features (between 300 and 700) is necessary to reach convergence. On the contrary, binary classifiers achieve maximum accuracy with very few features: only 20 features are enough in all cases, meaning that we can identify every ALL sub-type by the corresponding set of 20 genes. Moreover, the number of misclassified samples is lower for binary models (Figure 3). Interestingly enough, the pre-processing of the training data needed to learn the binary models can influence the results. The comparison of different ABN binary classifiers gives some insight on this point. In both experiments E1 and E2 there are no errors in classifying T-ALL, E2A-PBX1 and TEL-AML1 (Table 3), and the subsets of features are almost the same (Table 2). We can reasonably expect that also from a medical point of view the subsets of features selected by SSFS correspond to genes characterizing these ALL sub-types. Accuracy is also good for MLL, in both the E1 and E2 settings, while it is variable for BCR-ABL and Hyperdip > 50 (Table 3). In these last cases, the subsets of features show significant differences (Table 2). Leaving out or not leaving out part of the samples from the training datasets has important consequences on feature selection: this can be due both to the small number of training samples (e.g. for BCR-ABL) and to a not very sharp genetic characterization of these sub-types. In this case feature subsets correspond to genes that are not necessarily relevant from a medical point of view and further investigation is needed. When compared to recent studies on the same dataset [14], our results show some improvement in terms of predictive accuracy. This confirms that MDL can be useful in selecting relevant features, as we have suggested in a previous study on a different dataset [4]. As a last point, Table 4 reports the computational effort, measured in minutes of CPU time on a 1.8 GHz Intel Pentium 4 processor, for the basic learning tasks, i.e. feature ranking, model training and model test.

Table 4. CPU times (Intel Pentium 4, 1.8 GHz).

                    MDL ranking   Model training   Model test
20 attributes       -             < 1 min.         < 1 min.
400 attributes      -             < 10 min.        2 min.
12558 attributes    100 min.      -                -
References 1. G. Hardimann., Microarray methods and applications: Nuts & bolts. DNA Press (2003). 2. T.R. Golub et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science, 286 (1999). 3. I. Guyon, J. Weston, S. Barnhill and V. Vapnik, Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46 (1-3): 389 - 422 (2002). 4. A. Bosin, N. Dessi, D. Liberati and B. Pes, Learning Bayesian Classifiers from Gene-Expression MicroArray Data, Proceedings of WILF 2005, LNAI, vol. 3849, Springer-Verlag (2005). 5. A. Barron, J. Rissanen and B. Yu, The minimum description length principle in coding and modelling, IEEE Transactions on Information Theory, 44: 2743-2760 (1998). 6. N. Friedman, D. Geiger and M. Goldszmidt, Bayesian Network Classifiers, Machine Learning, 29: 131-161 (1997). 7. J.S. Yarmus, ABN: A Fast, Greedy Bayesian Network Classifier (2003). http://otn.oracle.com/products/bi/pdf/adaptive bayes net.pdf. 8. V. Vapnik, Statistical Learning Theory, Wiley-Interscience, New York, NY, USA (1998). 9. T.M. Cover and P.E. Hart, Nearest neighbor pattern classification, IEEE Transactions on Information Theory, 13:21-27 (1967). 10. http://www.stjuderesearch.org/data/ALL 1 /. 11. A. Blum and P. Langley, Selection of relevant features and examples in machine learning, Artificial Intelligence, 97:245-271 (1997). 12. R. Kohavi and G. John, Wrappers for feature subset selection, Artificial Intelligence, 97:273-324 (1997). 13. I. Kononenko, On biases in estimating multi-valued attributes, IJCAI95, 1034-1040(1995). 14. H. Liu, J. Li and L. Wong, A Comparative Study on Feature Selection and Classification Methods Using Gene Expression Profiles and Proteomic Patterns, Genome informatics 13: 51-60 (2002). 15. E. J. Yeoh et al., Classification, sub-type discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell, 1:133-143 (2002). 16. S. Mukherjee, Classifying Microarray Data Using Support Vector Machines, Understanding And Using Microarray Analysis Techniques: A Practical Guide. Kluwer Academic Publishers, Boston, MA (2003).
PREDICTION OF RESIDUE EXPOSURE AND CONTACT NUMBER FOR SIMPLIFIED HP LATTICE MODEL PROTEINS USING LEARNING CLASSIFIER SYSTEMS
MICHAEL STOUT, JAUME BACARDIT, JONATHAN D. HIRST, JACEK BLAZEWICZ AND NATALIO KRASNOGOR* (* corresponding author)
Automated Scheduling, Optimisation and Planning Research Group, School of Computer Science and IT, University of Nottingham, Jubilee Campus, Wollaton Road, Nottingham, NG8 1BB, UK. Email: {jqb,mqs,nxk}@cs.nott.ac.uk
School of Chemistry, University of Nottingham, University Park, Nottingham NG7 2RD, UK. Email: [email protected]
Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 3a, 60-965 Poznan, Poland. Email: [email protected]
The performance of a Learning Classifier System (LCS) applied to the classification of simplified hydrophobic/polar (HP) lattice model proteins was compared to other machine learning (ML) algorithms. The GAssist LCS classified functional HP model proteins on the 3D diamond lattice as folding or non-folding at 88.3% accuracy, significantly outperforming three out of the four other methods. GAssist correctly classified HP model protein instances on the basis of Contact Number (CN) and Residue Exposure (RE) on both 2D square and 3D cubic lattices at a level of between 27.8% and 80.9%. Again, the LCS performed at a level comparable to the other ML technologies in this task, significantly outperforming them in 24 out of 180 cases and being outperformed just six times. The benefits of using LCS for this problem domain are discussed and examples of the LCS-generated rules are described.
1. Introduction
Prediction of structural properties of proteins such as residue exposure (RE) and coordination number (CN) based solely on protein sequence has recently received renewed attention. In other studies, simplified protein
models such as the HP model have been used to understand protein folding and protein structure prediction. These models represent the sequence of a protein using two residue types, hydrophobic and polar, and restrict the residue locations to those of a lattice. This paper compares CN and RE prediction for simplified HP model proteins using machine learning technologies, in particular Learning Classifier Systems (LCS). LCS apply Evolutionary Computation to Machine Learning problems. Four questions were examined: 1) Is it possible to predict, from sequence alone, which proteins will and will not fold? 2) Is it possible to predict which residues have above or below average CN and RE? 3) Is it possible to predict the detailed CN and RE states? and 4) Are LCS suitable tools for these tasks?
2. Background
2.1. Protein Structure Prediction
The prediction of the 3D structures of proteins is a fundamental and difficult problem in computational biology. Popular approachs include predicting specific attributes of proteins, such as secondary structure, solvent accessibility or coordination number. The contact/coordination number (CN) problem is defined as the prediction, for a given residue, of the number of residues from the same protein that are in contact with it. Two residues are said to be in contact when the distance between the two is below a certain threshold. This problem is closely related to contact map (CM) prediction. While protein structure prediction remains unsolved, researchers have resorted to simplified protein models to try to gain understanding of both the process of folding and the algorithms needed to predict i t 1 . Approaches have included fuzzy sets, cellular automata, L-systems and memetic algorithms (for references see 2 ) . One common simplification is to focus only on the residues (C-alpha or C-beta atoms) rather than all the atoms in the protein. A further simplification is to reduce the number of residue types to less than twenty by using residue sequence representations based, for instance, on physical properties such as hydrophobicity, as in the so called hydrophobic/polar (HP) models. Another simplification is to reduce the number of spatial degrees of freedom by restricting the atom or residue locations to those of a lattice 1 ' 3 ' 4 . Lattices of various geometries have been explored, e.g., two-dimensional triangular and square geometries or threedimensional diamond and face centered cubic. Idealized models have been used, among other things, to study the nature of the energy landscape, the uniqueness of the native state or associated degenerate sequences, the origin of the two-state thermodynamic behavior of globular proteins (i.e.
first folding into secondary structures and later into a three-dimensional shape), the existence of cooperative folding (i.e. an energy gap between the native conformation and the closest non-native one) and structure-function relations (for further references see [5,2]).
2.2. HP Models
In the HP model (and its variants) the 20 amino acids are reduced to two classes: non-polar or hydrophobic (H) amino acids and polar (P) or hydrophilic amino acids. An n amino acid protein is represented by a sequence s ∈ {H, P}+ with |s| = n. The sequence s is to be mapped to a lattice, where each residue in s occupies a different lattice cell and the mapping is required to be self-avoiding. The energy potential in the HP model reflects the fact that hydrophobic amino acids have a propensity to form a hydrophobic core. In the standard HP model, contacts that are HP and PP are assigned an energy of 0 and an HH contact is assigned an energy of -1, whilst in the functional model protein (FMP), HP and PP contacts receive a value of 1 and HH a value of -1. For an FMP sequence to be viable it must fold into a unique native state (unlike Dill's model [6], where the same sequence could have a variety of minimum energy states), and the native structure is required to have a binding pocket, i.e. at least one hole in the conformation [5]. Moreover, there must exist an energy gap between the minimum energy conformation and the next excited state. In this paper, rather than applying optimisation methods [7,8] to minimise the energy of the structures, we concentrate on classification of models. We employ a class of machine learning techniques called Learning Classifier Systems, and in particular we use the GAssist system [9], which is based on a binary representation of rules [10] (see section 3 for more details).
3. Methodology
Three datasets were employed (Table 1): a 3D HP diamond lattice dataset used for the Fold/Non-fold experiments (3DFNF), a 3D HP cubic lattice dataset used for the CN and RE experiments (3DCNRE) and a 2D square lattice dataset used for the CN and RE experiments (2DCNRE). Datasets are available on-line at http://www.cs.nott.ac.uk/~nxk/hppdb.html. The experimental design was as follows: 1) For all residues, calculate CN and RE. CN is typically defined as the number of non-contiguous residues within a given radius (r = 1.0 lattice unit) of each residue. RE was defined as the distance of each residue from the center of mass of the protein. 2) Create instance sets by moving a window of fixed length over the sequence-attribute
Table 1. Details of the data sets used in these experiments.

                          3DFNF             3DCNRE            2DCNRE
Lattice Dimensions        3D                3D                2D
Lattice Type              Diamond           Cubic             Square
Coordination Number       4                 6                 4
Model Type                FMP               HP                FMP
Number of Sequences       4196352           15                4428
Number of Structures      893               15                4428
Maximum Sequence Length   23                48                20
Minimum Sequence Length   23                27                20
Total Residues            96516096          640               92988
Total Hydrophobic         48258049          316               42638
Total Polar               48258047          309               45922
Source                    Taken from [11]   Taken from [12]   Taken from [13]
vectors, assigning a class to each instance: the value of that attribute for central residue in the window. 3) Split the instance sets into Training and Test sets. 4) Apply machine learning tools to predict the classes in Test Sets. 5) Extract classification accuracies for each algorithm. 6) For the non-deterministic algorithms (GAssist) iterate 10 times with different random number seeds. 7) Calculate the mean prediction accuracy. 8) Perform student t-tests on the mean prediction accuracies to determine which algorithms significantly outperformed the others (using a confidence interval of 95 and Bonferroni correction 14 for multiple pair-wise comparisons). Windows were generated for one, two and three residues at each side of a central residue. For each attribute and for each window size, three class assignment levels (Two State, Three State and Five State) were explored. For two state assignment residues were assigned the class 1 (high) or 2 (low) according to whether their attribute value was below or above the average for that attribute value in that particular the protein. For three states the class assignments were 1 (low), 2 (intermediate) or 3 (high) for the lower, middle or upper third of the range respectively. In five state assignments the classes were 1, 2, 3, 4 or 5 for the first, second through fifth portion of the range respectively. Composed of a rule learning algorithm and a rule inference engine, LCSs have the ability to balance multiple, potentially conflicting, constraints (e.g. formation of local structures vs global structures) and can produce high quality predictions. Moreover, LCS can produce human understandable explanations of the rules they have used to make their classifications, unlike, for example, neural network based systems. GAssist 9 is a Pittsburgh learning classifier system descended from GABIL 10 . The system applies a near-standard Genetic Algorithm (GA) that evolves individuals that represent complete problem solutions. Each individual consists of a variable length rule set. We used the rule-based knowledge representation of the
GABIL [10] system (see section 5 for an example of a generated rule set). The experimental parameters used for the GAssist experiments were the default values [9], except that for the larger dataset (2DCNRE) 25 strata were used rather than the two strata used by default. One thousand iterations of the LCS were used. GAssist was compared against Naive Bayes, C4.5, IBk (k=3) and JRip, all of them taken from the WEKA machine learning package.
4. Experimental Results
4.1. Results of Fold/Non-Fold Classification Experiments
Table 2 summarises the results of the Fold/Non-fold classification experiments on the 3D diamond lattice structure dataset. For each algorithm the overall average and deviation of test accuracy is shown. GAssist was the best method on this dataset, significantly outperforming three of the four other tested methods.

Table 2. Averaged classification accuracies (%) for 3D HP Fold/Non-Fold experiments. A • means that GAssist significantly outperformed the algorithm to the left.

Algorithm      Total
Naive Bayes    74.8±3.1 •
GAssist        88.3±1.7
IBk            81.8±2.7 •
JRip           86.9±3.1 •
C4.5           87.9±2.5
4.2. Results of CN and RE Classification Experiments
Table 3 summarises the results of the classification experiments for CN and RE on the 3DCNRE and the 2DCNRE datasets. For each algorithm the overall average and deviation of test accuracy is shown. GAssist performed at a similar or better level than the other tested machine learning methods. It significantly outperformed other methods 24 times and it was outperformed in just six of the tested datasets.
5. Discussion
The performance of the GAssist LCS was equal to or better than the other tested methods, especially on the fold/non-fold dataset. It was outperformed significantly very few times. From a general point of view we can say that CN is easier to classify than RE, and that the 2D lattice data are also more difficult to classify than the 3D data. On the 3D lattice, CN can be classified around 80%, 67% and 52% for two, three and five states, and
606 Table 3. Averaged Classification Accuracies (%) for 2D and 3D HP CN and RE Experiments. A • means that GAssist significantly outperformed the Algorithm to the left, a o means that the Algorithm on the left outperformed GAssist Exper. States Alg.\Win. Size Naive Bayes GAssist 2
IBk
JRip
C4.5 Naive Bayes GAssist CN
3
IBk JRip C4.5 Naive Bayes GAssist
5
IBk
JRip C4.5 Naive Bayes GAssist 2
RE
3
IBk
JRip C4.5 Naive Bayes GAssist
IBk JRip C4.5 Naive Bayes GAssist
5
3 79.7±5.8 79.9±6.0 80.1±6.0 80.1±6.0 80.2±fi.O 67.1±5.6
67.1±6.0» 66.1±6.3 60.7±5.2 67.5±5.6 51.6±4.4 51.4±4.5 51.3±4.6 45.5±3.7» 51.7±4.5 77.8±5.5 77.9±5.5 78.2±5.3 78.1±5.3 77.8±5.4 63.0±5.7 62.0±5.5 61.1±4.9 59.7±3.0 61.6±5.2 37.3±6.6
37.6±5.9
IBk
37.0±5.7
JRip
34.5:1:2.9
C4.5
38.2±6.8
3D Data 5 79.9±5.2 80.2±5.4 79.0±5.4 80.1±5.8 79.9±5.7 67.2±4.6
7 80.2±4.5 79.6±4.7 78.0±5.1 79.9±5.0 79.8±4.6 67.3±4.9 67.7±4.6 67.3±5.0 66.7±5.3 64.9±5.7 64.8±5.2 64.5±4.9 67.7±4.7 65.8±5.1 S2.2±4.4 51.8±5.8 51.3±4.4 52.9±5.3 49.6±4.6 48.8±5.8 46.9±4.3» 49.0±6.0 50.7±4.2 52.3±5.1 78.6±4.4 79.7±4.4 78.1±4.8 78.2±4.2 76.7±S.l 76.2±4.3 77.8±4.8 78.3±4.6 T T i i i l J 77.9±4.1 63.3±5.2 62.5±5.5 61.7±5.5 62.1±4.7 61.0±5.0 61.8±5.2 5 9 . 0 ± 3 . 3 . 61.4±3.9 61.7±5.3 64.1±4.1 38.6±6.1 37.6±6.1 36.2±5.9 39.2±g.3 36.7±5.9 38.5±6.1 33.6±3.8« 3<S.2±S.S 36.8±6.3 38.9±4.9
2D D a t a 5 63.9±0.4 64.1±0.4 64.1±0.4 63.8±0.4
7 62.6±0.4« 64.9±0.3 65.1±0.4 64.7±0.4
61.2±0.3 64M0.4
es.i±o.4
3 61.2±0.3 61.2±0.3 61.2±0.3 61.2±0.3 70.9±0.2
70.9±0.2 6 8 . 5 ± 0 . 2 .
70.S±0.4 7i.0±o.4 7l.0±0.4 70.9±0.2 70.9±0.2 70.9±0.2 58.1±0.2 S8.1±0.2 58.1±0.2 58.1±0.2 58.1±0.2 56.9±0.5 56.9±0.4
^.§±0.4'
56.9±0.4 56.9±0.4 43.3±0.3 43.3±0.3 43.3±0.3 43.3±0.3 43.3±0.3 27.8±0.2 27.8±0.3 27.8±0.3 2EJ.3±0.0. 27.8±0.3
71.1±0.3 70.5±U.3« 7l.l±0.3 56.8±0.2« 58.7±0.3 58.7±0.3 57.6±0.3» 58.6±0.3 60.0±0.4» 60.4±0.5 6O.B±0.5 60.2±0.5 60.5±U.4 45.4±0.3» 46.5±0.3 46.5±0.3 45.6±0.3» 46.5±0.3 27.8±0.3»
3o.8±0.5 31.1±0.5 28.4±0.3» 31.2±0.5o
71.0±0.2 70.5±0.3» 71.0±0.2 56.4±0.3» 58.8±0.3 58.9±0.3 57.6±0.3. 58.8±0.2 58.7±0.5» 61.4±0.5
61.9±0.6o
61.1±0.5 61.7±0.6 44.2±0.3» 47.2±0.6 47.8±0.So 46.5±0.4. 47.8±0.4o 28.1±0.4» 32.0±0.6 33.1±0.4o 28.0±0.3« 33.0±0.4o
RE can be classified around 78%, 62% and 38%. For the 2D lattice data, CN can be classified around 65%, 71% and 59%, and RE can be classified around 62%, 47% and 33% for two, three and five states. The fold/non fold domain can be classified with an 88% accuracy. Beside its performance, GAssist has another advantage, which is the generation of compact and interpretable solutions. GAssist generated on average rule sets consisting of 52.8, 9.6 and 3.5 rules for the 3DFNF, 2DCNRE and 3DCNRE datasets, respectively. As an example, we show a rule set from an individual generating 87.3% accuracy for two state prediction with a window size of seven (three residues either side of the residue being predicted) for the CN domain using 3D lattice. An X symbol is used to represent positions at the end of the chains, that is beyond the central residue being studied, H means high CN, L means low CN. The rule set only had three rules, and at most three of the seven input attributes were expressed. The rules are interpreted in order, therefore all examples not matched by the first or second rules are assigned class L.
(1) If Position(i-1) ∈ {p}, Position(i) ∈ {h}, Position(i+1) ∉ {h} then class is H
(2) If Position(i-2) ∈ {X}, Position(i) ∈ {h} then class is H
(3) Default class is L
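To make the interpretation concrete, the following small Python sketch (not the authors' GAssist code) applies an ordered rule list like the one above: rules are tried in sequence, the first match assigns the class, and unmatched windows fall through to the default class. The exclusion test in the third condition of rule 1 is read here from a partially garbled source and should be taken as an assumption.

```python
def classify_window(window):
    """window: dict mapping relative positions (-3..3) to 'h', 'p' or 'X' (chain end)."""
    # Rule 1: polar before, hydrophobic centre, non-hydrophobic after -> high CN
    if window[-1] == 'p' and window[0] == 'h' and window[1] != 'h':
        return 'H'
    # Rule 2: chain end two positions before and hydrophobic centre -> high CN
    if window[-2] == 'X' and window[0] == 'h':
        return 'H'
    # Rule 3: default class -> low CN
    return 'L'

# Example: a hydrophobic residue near the start of a chain
print(classify_window({-3: 'X', -2: 'X', -1: 'p', 0: 'h', 1: 'p', 2: 'h', 3: 'p'}))
```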
Moving from highly abstract (2 class) to more informative predictions (5 class) more input data (larger windows) are required in order to facilitate learning. The 3D structures on the cubic lattice have less than 50 residues, as a result the training data has an unnaturally high proportion of exposed/low-CN residues (including hydrophobic residues which are more usually found buried). Analysis (not shown) of the distribution of residues by class showed that for the 2D square lattice structures this bias in the input data distributions is less pronounced. We have extended these studies to real proteins (papers submitted) and HP representations of real proteins 2 . In the future we will investigate computation and prediction of other structural properties such secondary structures and disulfide bridges. 6. Conclusions These studies have shown that: a) it was possible to discriminate at around 80% accuracy, from sequence alone, which proteins will and will not fold b) It was also possible to predict which residues have above or below average CN and RE c) it is possible to predict the detailed CN and RE states of residues and d) The GAssist LCS performs at a level comparable to other ML algorithms on these problems. Of the WEKA algorithms studied, those based on orthogonal representations perform slightly better than those which are not. Minimalist lattice structure models focus on the essential details of protein structure prediction. Moving from highly abstract predictions (above/below mean for a given attribute) to more detailed structural predictions (eg. five state CN), accuracy can be increased by incorporating more local residue pattern information in the inputs (increased window size). However, in real proteins, only some contacts (secondary structure contacts) arise from local residue sequence patterns that may be recognizable in short fragments/windows. Other contacts arise from long-range global features of proteins and these may not be evident in short local sequence patterns. Future studies will extend these investigations with classifications based on other structural attributes and studies of real protein datasets. 7. Acknowledgments We acknowledge the support provided by the UK Engineering and Physical Sciences Research Council (EPSRC) under grants GR/T07534/01 and
608 GR/62052/01 and the Biotechnology and Biological Sciences Research Council (BBSRC) under grant BB/C511764/1. References 1. Yue, K., Fiebig, K.M., Thomas, P.D., Sun, C.H., Shakhnovich, E.I., Dill, K.A.: A test of lattice protein folding algorithms. Proc. Natl. Acad. Sci. USA 92 (1995) 325-329 2. Stout, M., Bacardit, J., Hirst, J.D., Krasnogor, N., Blazewicz, J.: From hp lattice models to real proteins: coordination number prediction using learning classifier systems. In: 4th European Workshop on Evolutionary Computation and Machine Learning in Bioinformatics 2006 (to appear). (2006) 3. Blazewicz, J., Dill, K., Lukasiak, P., Milostan, M.: A tabu search strategy for finding low energy structures of proteins in hp-model. (Computational Methods in Science and Technology) 4. Blazewicz, J., Lukasiak, P., Milostan, M.: Application of tabu search strategy for finding low energy structure of protein. Artificial Intelligence in Medicine 35 (2005) 135-145 5. Hirst, J.D.: The evolutionary landscape of functional model proteins. Protein Engineering 12 (1999) 721-726 6. Dill, K., Bromberg, S., Yue, K., Fiebig, K., Yee, D., Thomas, P., Chan, H.: Principles of protein folding: A perspective from simple exact models. Prot. Sci. 4 (1995) 561 7. Krasnogor, N., Blackburne, B., Burke, E., Hirst, J.: Multimeme algorithms for protein structure prediction. In: Proceedings of the Parallel Problem Solving from Nature VII. Lecture Notes in Computer Science. Volume 2439. (2002) 769-778 8. Hart, W., Istrail, S.: Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. Journal of Computational Biology 3 (1996) 53-96 9. Bacardit, J.: Pittsburgh Genetics-Based Machine Learning in the Data Mining era: Representations, generalization, and run-time. PhD thesis, Ramon Llull University, Barcelona, Catalonia, Spain (2004) 10. DeJong, K., Spears, W., Gordon, D.: Using genetic algorithms for concept learning. Machine Learning 13 (1993) 161-188 11. Blackburne, B.P., Hirst, J.D.: Three dimensional functional model proteins: Structure, function and evolution. Journal of Chemical Physics 119 (2003) 3453-3460 12. Hart, W.: (www.cs.sandia.gov/tech_reports/compbio/tortilla-hpbenchmarks.html) Tortilla HP Benchmarks. 13. Blackburne, B.P., Hirst, J.D.: Evolution of functional model proteins. Journal of Chemical Physics 115 (2001) 1935-1942 14. Miller, R.G.: Simultaneous Statistical Inference. Springer Verlag, New York (1981) Heidelberger, Berlin.
A STUDY ON THE EFFECT OF USING PHYSICO-CHEMICAL FEATURES IN PROTEIN SECONDARY STRUCTURE PREDICTION
G. L. JAYAVARDHANA RAMA, M. PALANISWAMI
Dept of Electrical and Electronics Engineering, The University of Melbourne, Parkville, Victoria - 3010, Australia. [email protected] and [email protected]
DANIEL LAI
Dept of Electrical and Computer Systems Engineering, Monash University, Clayton, Victoria - 3168, Australia. daniel.lai@eng.monash.edu.au
MICHAEL W. PARKER
St. Vincent's Institute of Medical Research, 9 Princes Street, Fitzroy, Victoria - 3065, Australia. mparker@svi.edu.au
Protein structure prediction is a powerful tool in today's drug design industry as well as in the molecular modeling stage of x-ray crystallography research. This paper proposes a redefined encoding scheme based on the combination of Chou-Fasman parameters, physico-chemical parameters and the position specific scoring matrix for protein secondary structure prediction. A new method of calculating the reliability index, based on the number of votes and the SVM decision value, is also proposed and has been shown to assist the design of better filters. The proposed features are then tested on the RS126 and CB513 datasets and shown to give better cross-validation results compared to existing techniques.
1. Introduction
The dependence on experimental methods of protein structure prediction may not yield protein structures fast enough to keep up with the requirements of today's drug design industry. With the availability of abundant
proteomic data, it has been shown that it is possible to predict the structure through machine learning techniques. As the prediction of tertiary structure from protein sequence is a very difficult task, the problem is usually sub-divided into secondary structure prediction and super secondary structure prediction leading to tertiary structure. This paper concentrates on secondary structure prediction using position specific scoring matrix and physico-chemical properties as features. Secondary structure prediction is based on prediction of the 1-D structure from the sequence of aminoacid residues in the target protein 22 . Several methods 23 have been proposed to find the secondary structure including PHD 21 , PROF-King 17, PSIPred 12, JPred 4 and SAMT99-Sec 14. Recently, significant work has been done on secondary structure prediction using Support Vector Machines. Hua and Sun 24 used SVMs and profiles of the multiple alignments and reported Q3 score as 73.5% on the CB513 dataset 4 . In 2003 Ward et. al. n reported 77% with P SI-BLAST profiles on a small set of proteins. In the same year Kim and Park 15 reported an accuracy of 76.6% on the CB513 dataset using PSI-BLAST Position Specific Scoring Matrix (PSSM). Nguyen and Rajapakse 18 reported a highest accuracy of 72.8% on RS126 dataset 2 using a two stage SVM. Guo et. al 7 used dual layered SVM with profiles and reported a highest accuracy of 75.2% on CB513 dataset. In this paper we make use of the Chou-Fasman parameters 19, physicochemical properties including Kyte-Dolittle Hydrophobicity 8 , Grantham Polarity 20 and Rigidity of Proline and compare it with existing techniques which mainly use only position specific scoring matrix (PSSM) obtained from PSI-BLAST. We investigate the performance when the PSSMs are used with physico-chemical properties as features. We propose a new method to calculate the Reliability Index (RI) based on the number of votes each class receives in combination with the decision value of the SVM classifier. We suggest an improvement to the tertiary classifier proposed by Hu et. al. 6 by calculating the posterior probability 10 of SVM decision value.
2. Methods
Non-homologous CB513 [4] and RS126 [2] datasets were used for the experiments, as these are the most commonly used datasets in the literature. The secondary structure definitions used in our experiments were based on the DSSP [27] algorithm. The 8-to-3 state reduction method used was H to H, E to E and all others to C, where H stands for a Helix, E for a β Strand and
C for Coil. We used six parameters derived from physico-chemical properties and the probability of occurrence of amino acids in each state: the Chou-Fasman conformational parameters [19] (3 parameters), the Kyte-Doolittle hydrophobicity scale [8], the Grantham polarity [20] and the presence of Proline in the vicinity (1 parameter each) were used as the features in this set. Kyte-Doolittle hydrophobicity values and Grantham polarity values were taken from the ProtScale website (http://au.expasy.org/tools/protscale.html). The last parameter in the set is used to represent the information of rigidity Ri due to Proline residues. If a Proline residue is present at a particular position, Ri is given by 1, otherwise 0. We call this D2C-PC. In the rest of the paper, the term physico-chemical refers to the six features from D2C-PC. A second set containing the position specific scoring matrices (PSSMs) generated by PSI-BLAST [25] using the non-redundant (NR) database was used. pfilt [5] was used to filter the low complexity regions, coiled-coil regions and transmembrane helices before subjecting the sequences to PSI-BLAST. After getting the PSSM, a window of length w was considered around every residue and this is used as a feature for the classifier. The PSSM has 20 * L elements, where L is the length of the protein chain. We used the following function to scale the profile values from the range (-7,7) to the range (0,1) [15]:

g(x) = \begin{cases} 0.0 & x < -5 \\ 0.5 + 0.1x & -5 \le x \le 5 \\ 1.0 & x > 5 \end{cases} \qquad (1)
where x is the value of the PSSM matrix. All the values within the window of length w were considered [12]. The final feature length for each residue of this set is w*20. We call this D2C-PSSM. The third set comprises the combination of D2C-PC and D2C-PSSM as the feature vector and we call this D2C-PCPSSM.
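As a minimal sketch of how such D2C-PSSM features can be assembled, assuming the PSSM is available as an (L, 20) NumPy array, eq. (1) is applied element-wise and a window of w residues is flattened into one feature vector per position. The zero-padding at the chain termini is an assumption made here for illustration and is not specified above.

```python
import numpy as np

def scale_pssm(x):
    """Eq. (1): map raw PSI-BLAST PSSM scores to the range (0, 1)."""
    return np.clip(0.5 + 0.1 * np.asarray(x, dtype=float), 0.0, 1.0)

def window_features(pssm, w):
    """Return one w*20 feature vector per residue (window centred on the residue)."""
    scaled = scale_pssm(pssm)
    half = w // 2
    padded = np.vstack([np.zeros((half, 20)), scaled, np.zeros((half, 20))])
    return np.stack([padded[i:i + w].ravel() for i in range(len(scaled))])
```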
Support Vector Machines (SVM)
The Support Vector Machines (SVM) developed by Vapnik [26] have been shown to be a powerful supervised learning tool for binary classification problems. The data to be classified are formally written as

D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}, \quad x_i \in \mathbb{R}^m, \; y_i \in \{-1, 1\} \qquad (2)
(x,x;)+ b\
(3)
3. Classification Prediction of secondary structure by the proposed technique reduces to a three class (H, E, C) pattern recognition problem. The idea is to construct six classifiers which include three one vs one classifiers {H/E, E/C, C/H) and three one vs rest classifiers (H/H,E/E,C/C). To adjust the free parameters of binary SVMs, we selected a small set of proteins containing about 20 chains and performed cross validation with different sets of parameters. Based on these experiments, we selected Radial Basis Function (RBF) kernel with C = 2.5 and a = 4 for all six classifiers. We designed three one vs one classifiers H/E, E/C and C/H and three one vs rest classifiers H/H, E/E, C/C. An ensemble of classifiers used by Hua and Sun 24 and SVMJFLepresent method proposed by Hu et. al. 6 are combined using our voting procedure. The ensemble of classifiers used is as shown in Figure 1. The classifier which gives absolute maximum value amongst H/E, E/C, C/H classifiers is used for decision making. However, the output of the SVM classifier is uncalibrated and it is not wise to use it directly for comparison. We can convert the output of SVM to a posterior probability 10,13 . We use Piatt's method to moderate the uncalibrated output to posterior probabilities Px which would range between 0 and 1. Finally the classifier is chosen according to eq. 4. Vi is incremented based on b
http://www-personal.monash.edu.au/~dlai/
613
V, =» Vole Class i
F i g u r e 1.
Classification Voting Scheme.
classification of the chosen classifier arg
max
\PX — 0.51
(4)
ie{HE,EC,CH}
Reliability
Index and Post
Processing
The reliability index (RI) we propose is based on the highest vote the class gets as well as the posterior probability of the one vs rest classifier. The vote Vi any class can get is in the range (0,4). As discussed earlier, the posterior probability of the winning class P is in the range (0,1). We define RI by eq. 5. RI(k) = (0.5 *
VH/A)
+ (0.5 * Pki)
(5)
where k is the residue number and i represents the winning classifier. 4. Evaluation Methods We use standard Q% accuracy, SOV * and Matthew's correlation coefficients for comparing the proposed technique with the existing results in literature. The procedure described by Rost and Sander 2 was used for calculation of Q3 accuracies and Matthew's correlation coefficients. Q$ was calculated as follows: 3
E Mi Q3 = ^ ^ 5 — X 100 where, Aij = N u m b e r of residues p r e d i c t e d t o b e s t r u c t u r e j a n d observed in t y p e i b = T o t a l n u m b e r of residues in d a t a b a s e
Matthew's
Correlation y-»
where,
pt = A»
Coefficient
Cj
was
calculated
,,.•,
using
Pj.n,;— UjOj
V(Pi +"i 1(Pi +°i )("• +"M"i +°i ) 3 3 3 m; = £ ) £ Ajk for « f « , f t 7 0i = X) A?» jjLik^i
jyti
3 i = 12
u
j^i
(7) A
ij
We also use the Segment Overlap (SOV) Score proposed by Zemla et. al. * (SOV99) which is denned in eq. 8 SOV = 100 x
N
itiB.cy k)
(8)
len{si
™™^*>
\
where S(i) is the set of all overlapping pairs of segments ( s j , s 2 ) i n conformation state i, len(si) is the number of residues in segment s i , minov(si, s 2 ) is the length of the actual overlap and maxov(si,S2) is the total extent of the segment. 5. R e s u l t s a n d D i s c u s s i o n We tested our method on the RS126 and CB513 datasets. Ten fold cross validation was performed for the RS126 dataset and Seven fold cross validation was performed on the CB513 dataset for all experiments. Three sets of features (D2C-PC, D2C-PSSM and D2C-PCPSSM) were used to evaluate RS126 dataset and D2C-PCPSSM feature for CB513 dataset. Results for datasets RS126 and CB513 are tabulated in the following tables respectively. Method Kim and Park 1 5 Nguyen and Rajapakse PHD 2 1 JPred 4 f D2C-PC D2C-PSSM D2C-PCPSSM Method PHD " j PSIPred 1 2 t t JNet 9 f Hua and Sun 2 4 t Kim and Park 15 Guo et. al. 7 D2C-PCPSSM
Q3 70.8 76.5 76.4 73.5 76.6 75.2 77.9
18
Q3 76.1 72.8 72.5 74.8 61.47 74.6 76.9
SOV 79.6 66
QH
QE
77.2 66.1
63.9 57.8
Qc 81.5 81.9
74.5 60.3 70.18 75.2
60.5 70.29 74.56
59.7 65.03 68.2
61.2 79.28 79.1
CH
CE
CC
-
-
-
0.68 0.71 0.67
0.60 0.61 0.65
0.56 0.61 0.6
SOV 73.5
QH
QE
72
66
Qc 72
-
-
-
-
74.2 76.2 80.1 80 76.17
78.4 75 78.1 80.4 77.6
63.9 60 65.6 71.5 69.8
80.6 79 81.6 72.8 81.1
C: Matthew's correlation coefficients fSOV94 3 ^Results not for CB513 dataset
Overall we have looked at several aspects of protein secondary structure prediction including the use of physico-chemical properties as features, fast trainable support vector machines, reliable tertiary classifier and calculation of reliability index. From the cross validation experiments it is clear that the use of physico-chemical parameters will improve the performance of secondary structure prediction. As a fair comparison we have experimented with PSSM alone as feature set as well as PSSM along with physico-chemical properties. We found that the improvement in accuracy was about 3% (on RS126 dataset) demonstrating the role played by the physico-chemical properties.
References 1. Zemla A, venclovas C, Fidelis K, and Rost B. A modified definition of sov, a segment based measure for protein secondary structure prediction assessment. Proteins: Structure, Functions and Genetics, 34:220-223, 1999. 2. Rost B and Sander C. Prediction of protein secondary structure at better than 70% accuracy. Journal of Molecular Biology, 247:584-599, 1993. 3. Rost B, Sander C, and Schneider R. Redefining the goal of protein secondary structure prediction. Journal of Molecular Biology, 235:584-599, 1994. 4. J A Cuff and G J Barton. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins, 34:508-519, 1999. 5. Jones DT, Taylor WR, and Thornton JM. A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33:30383049, 1994. 6. Hu HJ, Pan Y, Harrison R, and Tai PC. Improved protein secondary structure prediction using support vector machine with a new encoding scheme and an advanced tertiary classifier. IEEE Transaction on Nanobioscience, 3(4):265:271, 2004. 7. Guo J, Chen H, Sun Z, and Lin Y. A novel method for protein secondary structure prediction using dual layer svm and profiles. Proteins: Structure, Function and Bioinformatics, 54:738-743, 2004. 8. Kyte J and Doolittle RF. A simple method for displaying the hydropathic character of a protein. Journal of Molecular Biology, 157:105-132, 1982. 9. Cuff JA and Barton GJ. Application of multiple sequence alignment profiles to improve protein secondary structure prediction. PROTEINS: Structure, Function and Genetics, 40:502-511, 2000. 10. Piatt JC. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in Large Margin Classifiers, MIT Press, pages 61-74, 2001. 11. Ward JJ, McGuffin LJ, Buxton BF, and Jonese DT. Secondary structure prediction with support vector machiness. Bioinformatics, 19(13):1650-1655, 2004.
12. DT Jones. Protein secondary structure prediction based on position-specific scoring matrices. Journal of Molecular Biology, 292:195-202, 1999. 13. Kwok JTY. Moderating the outputs of support vector machine classifiers. IEEE Transaction on Neural Networks, 10(5):1018-1031, 1999. 14. K Karplus, C Barrett, and R Hughey. Hidden markov models for detecting remote protein homologies. Bioinformatics, 14:846-856, 1998. 15. Hyunsoo Kim and Haesun Park. Protein secondary structure prediction based on an improved support vector machines approach. Protein Engineering, 16(8):553-560, 2003. 16. D Lai, N Mani, and M Palaniswami. Effect of constraints on sub-problem selection for solving Support Vector Machines using space decomposition. In The 6th International Conference on Optimization: Techniques and Applications (ICOTA6) accepted, Ballarat,Australia, 2004. 17. Ouali M and King RD. Cascaded multiple classifiers for secondary structure prediction. Protein Science, 9:1162-1176, 2000. 18. Nguyen MN and Rajapakse JC. Multi-class support vector machines for protein secondary structure prediction. Genome Informatics, 14:218-227, 2003. 19. Chou PY and Fasman GD. Conformational parameters for amino acids in helical, b-sheet, and random coil regions calculated for proteins. Biochemistry, 13(2):211-222, 1974. 20. Grantham R. Amino acid difference formula to help explain protein evolution. Science, 185:862-864, 1974. 21. B Rost. Phd: predicting one-dimensional protein structure by profile based neural networks. Methods in Enzymology, 266:525-539, 1996. 22. Burkhard Rost. Protein Structure Prediction in ID, 2D and 3D, volume 3. 1998. 23. Burkhard Rost. Review: Protein secondary structure prediction continues to rise. Journal of Structural Biology, 134:204-218, 2001. 24. Hua S and Sun Z. A novel method ofprotetin secondary structure prediction with high segment overlap measure: Support vector machine approach. Journal of Molecular Biology, 308:397-407, 2001. 25. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, and Lipman DJ. Gapped blast and psi-blasat: a new generation of protein database search programs. Nucleic Acid Research, 27(17):3389-3402, 1997. 26. V. N. Vapnik. The nature of statistical learning theory. Statistics for engineering and information science. Springer, New York, 2nd edition, 2000. 27. Kabsch W and Sander C. Dictionary of protein secondary structure: Pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22:2577-2637, 1983.
Acknowledgements: The authors would like to thank Prof. David Jones for kindly providing the pfilt program. We are also grateful to NCBI for access to the PSI-BLAST program and the Barton Group for making CB513 and RS126 datasets available on the web.
GENE EXPRESSION DATA ANALYSIS IN THE MEMBERSHIP EMBEDDING SPACE: A CONSTRUCTIVE APPROACH*
M. FILIPPONE, F. MASULLI AND S. ROVETTA
Department of Computer and Information Sciences, University of Genova and CNISM, Via Dodecaneso 35, I-16146 Genova, Italy
{filippone, masulli, rovetta}@disi.unige.it
Exploratory analysis of genomic data sets using unsupervised clustering techniques is often affected by problems due to the small cardinality and high dimensionality of the data set. A way to alleviate those problems lies in performing clustering in an embedding space where each data point is represented by a vector of its memberships to fuzzy sets characterized by a set of probes selected from the data set. This approach has been demonstrated to lead to significant improvements with respect the application of clustering algorithms in the original space and in the distance embedding space. In this paper we propose a constructive technique based on Simulated Annealing able to select sets of probes of small cardinality and supporting high quality clustering solutions.
1. Introduction Clustering methods provide an useful tool to explore genomic data sets, but often the crude application of classical clustering algorithms leads to poor results. Actually, many clustering approaches suffer from being applied in high-dimensional spaces, as clustering algorithms often seek for areas where data is especially dense. However, sometimes the cardinality of the data sets available is even less than the number of variables. This means that the data span only a subspace within the data space. In these conditions, it is not easy to define the concept of volumetric density. Moreover, when space dimensionality is high or even moderate (as low as 10-15), the distance of a point to its farthest neighbor and to its nearest neighbor tend to become equal3'1. Therefore the evaluation of distances, •Work funded by the MIUR grant code 2004062740
and the concept of "nearest neighbor" itself, become less and less meaningful with growing dimension. Defining clusters on the basis of distance requires that distances can be estimated. For instance, one of the most common methods, c-means (CM) clustering, is based on iteratively computing distances and cluster averages. Increasing the data space dimensionality may introduce a large number of suboptimal solutions (local minima), and the nearest-neighbor criterion which is the basis of the method may even become useless. This problem is not avoided even when CM is modified in the direction of incorporating fuzzy concepts, e.g. as for the FCM (Fuzzy c-Means) algorithm 5 , 2 . If the cardinality of the data set is small compared to the input space dimensionality, then the matrix of mutual distances or other pairwise pattern evaluation methods such as kernels 13 may be used to represent data sets in a more compact way. P§kalska and Duin 1 2 have developed a set of methods based on representing each pattern according to a set of similarity measurements with respect to other patterns in the data set. In this framework the data set is embedded in a lower dimensional space called embedding space, in which, in the presence of large-dimensional data sets, a notable complexity reduction is achieved. Following this approach, the data matrix is replaced by a pairwise dissimilarity matrix D. Let X = {2:1, £ 2 . . . . ,xn} be a data set of cardinality n. We start by computing the dissimilarity matrix D:
d_{ik} = d(x_i, x_k) \quad \forall i, k \qquad (1)
according to an assigned dissimilarity measure d(x, y) between points x and y (e.g., using Euclidean distance). Applications of projection into dissimilarity embedding spaces to clustering are reported in 7 ' 1 0 . As pointed out in 1 2 , the dissimilarity measure should be a metric, since metrics preserve the reverse of the compactness hypothesis: "objects that are similar in their representation are also similar in reality and belong, thereby, to the same class". Often non-metric distances are used as well. In the following, we will adopt the Euclidean distance as the dissimilarity measure. In case of a data set with dimensionality N there is the upper bound of N + 1 probes (or support data)12 that we can use in order to build the dissimilarity matrix. In the case of genomic data this upper bound is often un-realistic, since the cardinality is much lower than the dimensionality. However, for data having some structure, it is not necessary to reach this
upper bound for good representation. We only require that the dimension of the embedding space is large enough to preserve the reverse of the compactness hypothesis. On the other hand, if the embedding dimension n is lower than N + l, some points could have an ambiguous representation and, moreover, clustering could be affected by the high metrical contribution of farthest points. In order to avoid those problems, in6 we proposed a different kind of embedding based on the space of memberships to fuzzy sets centered on the probes, that we will call Membership Embedding Space (MES) . Following this approach, a point in the embedding space will be represented by a vector containing only few non-null components (depending on the width of the membership function), in correspondence of the closer probes in the original feature space. In our experiments, the memberships of fuzzy sets centered on the probes were modeled using the following normalized function:
v* =
~r
K
(2)
where i = 1 , . . . , n and k = 1 , . . . , s. Note that the parameter p regulates the spread of the membership function and it is related to the average distance between the data points. For large values of (3 the memberships tends more rapidly to zero than for little (3. In the MES each data point Xi is represented as a row of v^. In this paper, we propose a constructive method to obtain the set of probes leading to optimal clustering in the MES using Simulated Annealing. 2. Simulated Annealing for Probe Selection The proposed method for probe selection makes use of the Simulated Annealing (SA) technique9 that is a global search method technique derived by Statistical Mechanics. SA is based on the work by Metropolis et al. 11 aimed to simulate the behavior and small fluctuations of a system of atoms starting from an initial configuration, by the generation of a sequence of iterations. In the Metropolis algorithm each iteration is composed by a random perturbation of the actual configuration and the computation of the corresponding energy variation {AE). If AE < 0 the transition is unconditionally accepted, otherwise the transition is accepted with probability given by the Boltzmann distribution:
P(\Delta E) = \exp\left( -\frac{\Delta E}{KT} \right) \qquad (3)
where K is the Boltzmann constant and T the temperature. In SA this approach is generalized to the solution of general optimization problems9 by using an ad hoc selected cost function (generalized energy), instead of the physical energy. SA works as a probabilistic hill-climbing procedure searching for the global optimum of the cost function. The temperature T takes the role of a control parameter of the search area (while K is usually set to 1), and is gradually lowered until no further improvements of the cost function are noticed. SA can work in very high-dimensional searches, given enough computational resources.
(1) Initialize parameters (see list in Tab. 1);
(2) Initialize the binary mask g at random;
(3) Perform clustering and evaluate the generalized system energy E;
(4) do
(5) Initialize l = 0 (number of iterations), h = 0 (number of successes);
    (a) do
    (b) Increment the number of iterations l;
    (c) Perturb mask g;
    (d) Perform clustering and evaluate the generalized system energy E;
    (e) Generate a random number rnd in the interval [0,1];
    (f) if rnd < P(ΔE) then
        i. Accept the new g mask;
        ii. Increment the number of successes h;
    (g) endif
    (h) loop until h < h_min and l < l_max
(6) update T = αT;
(7) loop until h > 0;
(8) end.

Figure 1. Simulated Annealing Probe Selection (SA-PS) Algorithm.
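A hedged Python sketch of the SA-PS loop in Figure 1 follows. The clustering step and its quality measure are abstracted into a `cluster_quality` callback, the perturbation switches a single bit off and a single bit on per move (a simplification of the w/v-bit move described in the paper), and all names and defaults are illustrative rather than the authors' implementation.

```python
import math
import random

def sa_ps(n, cluster_quality, s0=3, lam=1e-2, T0=1.0, alpha=0.9,
          l_max=2000, h_min=200):
    """Select probes by simulated annealing over a binary mask of length n."""
    g = [0] * n
    for i in random.sample(range(n), s0):          # start with s0 random probes
        g[i] = 1

    def energy(mask):
        s = sum(mask)                              # number of selected probes
        return cluster_quality(mask) + lam * s     # E = epsilon + lambda * s

    E, T = energy(g), T0
    while True:
        l = h = 0
        while h < h_min and l < l_max:
            l += 1
            cand = list(g)
            ones = [i for i, b in enumerate(cand) if b]
            zeros = [i for i, b in enumerate(cand) if not b]
            if ones:
                cand[random.choice(ones)] = 0      # switch one selected bit off
            if zeros:
                cand[random.choice(zeros)] = 1     # switch one unselected bit on
            E_new = energy(cand)
            dE = E_new - E
            if dE < 0 or random.random() < math.exp(-dE / T):
                g, E, h = cand, E_new, h + 1       # accept the move
        if h == 0:                                 # no accepted moves: stop
            break
        T *= alpha                                 # cool the temperature
    return g, E
```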
In Fig. 1 the proposed Simulated Annealing Probe Selection (SA-PS) algorithm is shown. In our approach the state of the system is represented by a binary mask g = (pi, g2,..., gn), where each bit gt (with i = l,...,n) corresponds to the selection (ffj = 1) / deselection (gi = 0) of a probe. The initialization of the vector mask g (Step 2) is done by generating so integer
numbers with uniform distribution in the interval [1,n] and setting the corresponding bits of g to 1 and the remaining ones to 0. At each step only s probes are selected from the original set of n patterns. A perturbation or move is done in the following way: (1) choose randomly w ∈ [w_min, w_max] and v ∈ [v_min, v_max]; (2) w bits of g set to 1 are switched to 0; (3) v bits of g set to 0 are switched to 1. The values w_min, w_max, v_min, v_max can be used to reduce or to increase the variability of each perturbation. Once a set of probes is selected, it is possible to represent each pattern in the Membership Embedding Space (MES) and to perform clustering. In the experiments reported in the remainder of this paper, we performed clustering using the FCM algorithm [2], but many other clustering algorithms can be employed. The generalized energy E is computed as a linear combination of an assigned clustering quality measure ε and the number of selected probes s:

E = \varepsilon + \lambda s \qquad (4)
The clustering quality measure e can be a function of either the cost function associated to the clustering algorithm, a clustering validation index, or, in the case of labeled data sets, the Representation Error (RE). RE is the count of data points in each cluster disagreeing with the majority label in that cluster, summed over all clusters and expressed as a percentage. Note that the introduction of the number of selected probes s in the computation of E penalizes situations in which the number of selected probes is high. This choice of E leads to the minimization of the cardinality of the set of probes able to achieve a good clustering quality measure. The balance between these two terms is controlled by A (regularization coefficient). 3. Experimental setup The method was tested on the publicly available Leukemia data by Golub et al. 8 . The Leukemia problem consists in characterizing two forms of acute leukemia, Acute Lymphoblastic Leukemia (ALL) and Acute Myeloid Leukemia (AML). The original work proposed both a supervised classification task ("class prediction") and an unsupervised characterization task ("class discovery"). Here we obviously focus on the latter, but we exploit the diagnostic information on the type of leukemia to assess the goodness of the clustering obtained.
Table 1.
Choice of parameters.
Meaning
Symbol
Value
Number of random perturbations of g used to estimate the initial value of T
P
10000
Number of probes to be initially selected
so a
3
Cooling parameter Membership width parameter
0
10~ 6
Maximum number of iteration at each T
Jmax
2000
Minimum number of success for each T
flmin
200
Regularization coefficient
A
0.9
io-2
Minimum number of bits to be switched
^ m i n i min
Maximum number of bits to be switched
t^maxi ^max
1,1 s, 5
Number of clusters
C
3
FCM fuzziness parameter
m
2
FCM trials
r
10
u
The training data set contains 38 samples for which the expression level of 7129 genes has been measured with the DNA microarray technique (the interesting human genes are 6817, and the other are controls required by the technique). These expression levels have been scaled by a factor of 100. Of these samples, 27 are cases of ALL and 11 are cases of AML. Moreover, it is known that the ALL class is in reality composed of two different diseases, since they are originated from different cell lineages (either T-lineage or Blineage). In the data set, ALL cases are the first 27 objects and AML cases are the last 11. Therefore, in the presented results, the object identifier can also indicate the class (ALL if id < 27, AML if larger). In 6 we presented an extended experimentation using the FCM algorithm 2 and comparing the following approaches: (1) FCM on the original data set (RD); (2) FCM in the Distance Embedding Space (DES) with different probe/data ratios; (3) FCM in the Membership Embedding Space (MES) with different probe/data ratios. For each experiment we made 1000 independent trials, each of them using a different random initialization of the membership in the FCM algorithm. In all trials probes were extracted at random (using an uniform pdf) from the data set without replacement, the number of clusters was set to 3, and the fuzziness parameter m of FCM was set to 2. The last approach (3), projecting the data set into the membership embedding space, lead to better results. Moreover, increasing the parameter f3 from 1 0 - 8 to 10~ 6 we obtained for increasing probe/data ratios (from .8 to .4) a shift of the optimal error ratio.
0
5
1 10
1 15
(a) Figure 2. rithm.
1 20
1 25
1 30
r 35
1 0
i 5
i 10
i 15
i 20
i 25
r 30
35
(b)
RE (a) and number of probes selected (b) during a run of the SA-PS algo-
Starting from those previous results, we ran the SA-PS algorithm in the MES with the assumptions shown in Tab. 1. The value of the parameter (3 used in the experiments (/? = 10~ 6 ) was about the reciprocal of the mean distance between patterns. As a clustering quality measure we used the Representation Error (RE) evaluated as the best value obtained on r = 10 independent trials of FCM. Each independent run of the SA-PS algorithm finds a different small subset of probes leading to a clustering Representation Error equal 0. In Fig. 2, the Representation Error and the number of selected bits of g are plotted versus the iteration number during a run of the SA-PS algorithm, where each iteration corresponds to a different value of temperature T. In this case, at iterations 31, 33, 34 and 35 we obtained 4 different sets of 3 probes giving clustering RE equal 0. 4. Conclusions Exploratory analysis of genomic data sets using unsupervised clustering techniques, are often affected by problems due to the small cardinality and high dimensionality of the data set. A way to alleviate those problems lies in performing clustering in an embedding space where each data point is represented by a vector of its memberships to fuzzy sets centered on a set of probes selected from the data set. In previous work, this approach has been demonstrated to lead to significant improvements with respect the
624 application of clustering algorithms in the original space and in the distance embedding space. In this paper we have presented a constructive technique based on Simulated Annealing able to select sets of probes for clustering in the embedding space of fuzzy memberships. The application of the proposed probe selection algorithm combined with FCM to the Leukemia data by Golub et al 8 leads to high quality clustering solutions. References 1. C.C. Aggarwal and P.S. Yu, Redefining clustering for high-dimensional applications, IEEE Transactions on Knowledge and Data Engineering 14 210-225 (2002). 2. J.C. Bezdek, Pattern recognition with fuzzy objective function algorithms. Plenum, New York (1981). 3. K. Beyer, J. Goldstein, R. Ramakrishnan and U. Shaft, When is nearest neighbor meaningful? In: 7th International Conference on Database Theory Proceedings (ICDT'99), Springer-Verlag 217-235 (1999). 4. R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, Wiley, New York (1973). 5. J.C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact well-separated clusters. Journal of Cybernetics 3 32-57 (1974). 6. M. Filippone, F. Masulli and S. Rovetta, Clustering Genomic Data in the Membership Embedding Space. In: CI-BIO Workshop on Computational Intelligence Approaches for the Analysis of Bioinformatics Data, MontrealCanada, IEEE, Piscataway, NJ, USA (2005), http://ci-bio.disi.unige.it/CIBIO-booklet/CI-BIO.html. 7. A. Fred, J. Leitao, A new cluster isolation criterion based on dissimilarity increments. IEEE Trans, on PAMI, 25(8) 944-958 (2003). 8. T. Golub, et al., Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286 531-537 (1999). 9. S. Kirkpatrick, C D . Gelatt and M.P. Vecchi, Optimization by simulated annealing. Science, 220 661-680 (1983). 10. F. Masulli and S. Rovetta, A New Approach to Hierarchical Clustering for the Analysis of Genomic Data. In: Proc. I.J.C. on Neural Networks, MontrealCanada, IEEE, Piscataway, NJ, USA (2005). 11. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller and E. Teller, Equation of state calculations for fast computing machines. Journal of Chemical Physics, 21 1087-1092 (1953). 12. E. P§kalska, P. Paclik and R.P.W. Duin, A generalized kernel approach to dissimilarity-based classification. Journal of Machine Learning Research 2 175-211 (2001). 13. J. Shawe-Taylor and N. Cristianini, Kernel Methods for Pattern Analysis. Cambridge University Press (2004).
BICA A N D R A N D O M SUBSPACE ENSEMBLES FOR D N A MICROARRAY-BASED DIAGNOSIS
B. APOLLONI AND G. VALENTINI Dipartimento di Scienze dell'Informazione, Universita degli Studi di Milano, Via Comelico 39/41, 20135 Milano, Italy {apolloni, valentini}Qdsi.unimi.it A. BREGA Dipartimento di Matematica "F. Enriquez", Universita degli Studi di Milano Via Saldini 50, 20133 Milano, Italy andrea. [email protected] We compare two ensemble methods to classify DNA microarray data. The methods use different strategies to face the course of dimensionality plaguing these data. One of them projects data along random coordinates, the other compresses them into independent boolean variables. Both result in random feature extraction procedures, feeding SVMs as base learners for a majority voting ensemble classifier. The classification capabilities are comparable, degrading on instances that are acknowledged anomalous in the literature.
1. I n t r o d u c t i o n The traditional taxonomy of malignancies, based on their morphological, histopathological, and clinical characteristics, may be sometimes ineffective for a correct diagnosis and prognosis of tumors 1. Indeed a more refined diagnosis may be achieved exploiting the genome-wide bio-molecular characteristics of tumors, using high throughput bio-technologies based on large scale hybridization techniques (e.g. DNA microarray) 5 . One of the main drawbacks that characterizes DNA microarray data is represented by their very high dimensionality and low cardinality (problem known as curse of dimensionality). Hence several works pointed out the importance of feature selection methods to reduce the dimensionality of the input space 7 . An alternative approach is represented by data compression techniques that can reduce the dimensionality of the data, while approxi625
626 mately preserving their information content. As for their processing, several authors recently proposed to apply ensemble methods for improving the performance of state-of-the-art classification algorithms in the context of gene expression data analysis 4 . In this paper we compare two ensemble methods based on datacompression techniques for DNA-microarray-based diagnosis. The first one exploits random projections to lower dimensional subspaces 8 , while the second performs data compression through a Boolean Independent Component Analysis (BICA) algorithm 13 . While the first method has just been applied to gene expression data analysis 3 , BICA has never been previously applied to DNA microarray data analysis. In the next two sections we introduce the methods, and in Sect. 4 we experimentally analyze the effectiveness of the two approaches, applying them to DNA microarray-bases diagnosis of tumors.
2. R S E : R a n d o m S u b s p a c e E n s e m b l e The dimensionality reduction in the context of supervised analysis of data is usually pursued through feature selection methods. Many methods can be applied, such as filter ones, wrapper methods, information theory based techniques and "embedded" methods (see e.g. 6 for a recent review). We recently experimented a different approach 3 based on random subspace ensemble methods 8 . For a fixed n, n features (genes) are randomly selected, according to the uniform distribution. Then the d a t a of the original cf-dimensional training set is projected to the selected n-dimensional subspace. The resulting data set is used to train a suitable base learner and this process is repeated v times giving raise to an ensemble of v learning machines trained on different randomly selected subsets of features. The resulting set of classifiers are then combined by using majority voting. This method, that can be implemented easily in a parallel way, avoids some computational difficulty of feature selection (that is an NP-hard problem). Anyway feature selection methods can explicitly select sets of relevant features, while this information cannot be directly obtained through RS ensembles. On the other hand, with different random projections of the data we can improve diversity between base learners 9 , while the overall accuracy of the ensemble can be enhanced through aggregation techniques. As a consequence the performance of a given classification algorithm may be enhanced. A high-level pseudo-code of the method is summarized in Fig. 1. In particular, S u b s p a c e _ p r o j e c t i o n procedure selects a n-subset
R a n d o m Subspace Ensemble Algorithm Input: - A data set V = { (XJ , tj) 11 < j < m], x 3 € X C Md, tj € C = { 1 , . . . , A:} - a learning algorithm C - subspace dimension n < d - number of the base learners m Output: - Final hypothesis hTan : X —> C computed by the ensemble, begin for i = 1 to v begin Di = Subspace_projection(X>, n) hi = C(Di) end ftron(x) = argmax t g ccard({i|/jj(x) = t}) end. Figure 1.
High-level pseudo-code of the RSE method
A = { a i , . . . , a „ } from { 1 , 2 , . . . ,d}, and returns as output the new data set Di = {(PA(XJ), tj)\l <j< m}, where PA(xu ...,xd) = (xai ,...,xaJ. The new data set Di is then given as input to a learning algorithm C which outputs a classifier hi. All the classifiers obtained are finally aggregated through majority voting, where cardQ measures the cardinality of a set. 3. B I C A network A suitable way of taking decisions based on data is to split the decision process in two steps. The first is devoted to preprocessing data in a feasible way such that they can be interpreted in the second one. As for the former, it mirrors real vectors into boolean ones, that should reflect relevant features of the original data patterns. Stressing the fact that independence is a property of the representation of the data that we use, we search for this property precisely on a concise Boolean representation of them suitable for their correctly partition into positive and negative inputs of our decision rule. Accordingly, we call the mirroring method Boolean Independent Component Analysis, BICA for short. 3 . 1 . The
architecture
We split the mirroring of the original data into the target Boolean vector in two parts: a true mirroring of the patterns and a projection of a compressed representation of them (obtained as an aside result of the first part) into
the space of Boolean assignments. The whole process is done by a neural network with an architecture shown in Fig. 2 sharing the same input and hidden layer with the two output segments A and B computing the Boolean assignments and a copy of the input, respectively. Part A: Propositional Variable Vector v = (v\,vi,... ,vn)
Part B: Mirroring of Pattern Vector
Pattern Vector x = (x\ , #2, • • •, xj) Figure 2. Layout of the neural network mapping features to symbols.
3.2.
The learning
algorithm
We train this network with a backpropagation algorithm 10 as follows. Error backpropagation in part B . As customary with this functionality 11 , we structured our network as a three-layer network with the same number of units in both input and output layers and a smaller number of units in the hidden layer. Therefore the hidden layer constitutes a bottleneck which collects in the state of its nodes a compressed representation of the input. This part of the network is trained according to a quadratic error function and usual formulas 12 . Error backpropagation in part A . Things are different for the units of part A of the output. In this case we require that the network minimizes the following error:
Es = In (f[ z-*"*(l - z. lfc )- (1 -*" fc) J
(1)
where zsj is the output of the unit j upon presentation of s-th pattern. This function, which we call the edge pulling function, has the shape of an entropy measure that finds its minima in the vertices of the neural network output space (see Fig. 3).
Es
Figure 3. Graph of the function Es with n = 2.
The error which is backpropagated from the units of part A is: S
a,k = / a c t ( n e t s , f c ) a s , k
(2)
where netSjj is a weighted sum of the inputs to j - t h unit on s-th pattern, / a c t is the sigmoid function, and dEs ( a*,* = --a =ln
za
\
(3)
In addition,we insert a syntactic feedback into eq. 3 through an extra term which has the form of a 'directed noise' 6Sik added to the initial value of Q when we are not satisfied with the 'correctness' of the result. Namely, when the Hamming distance between vectors corresponding to patterns belonging to different classes falls below a given threshold, we assume patterns with the minority label incorrect. Then, denoting rStk the specific punishment to the neuron k for an incorrect pattern s, 0s>k reads: 9a,k = {l-2T{zB,k))re>k
(4)
where T is a threshold function. The first term in the brackets specifies the sign of 6s,k so that the contribution to the network parameters is in the opposite direction from the one the unit is moving in. Finally, using a tuning parameter TTA to balance parts B and A, as%k reads: <*s,k = TA (flg.fc+ In f
J*
•))
(5)
The joint goal of minimizing Es and maintaining patterns well separated in two categories brings the Boolean assignments to figure as samples of independent random variables, thus we may say that these variables are expectedly independent. More precisely, the following lemma has been proved in 13 :
L e m m a 3 . 1 . With reference to the neural network and training algorithm described above, if the neural network outputs are correct and all close to the vertices of the Boolean hypercube then their values stretched to the vertices constitute assignments to expectedly independent Boolean variables. We repeat v times also this process getting different maps, as a consequence of the random initialization of the network parameters, and different base learners trained on the encoded training sets. Finally, we compute for each sample of the training set the frequency with which base learners answer 1, and we gather frequencies corresponding to either positive or negative samples. In the lose assumption that frequencies in each group follow a Gaussian distribution we locate a threshold at the cross of their p.d.f.s 14 , i.e.
A-<7++£+<7(7_ + < 7 +
where /t_ and CT_ are the sample estimate of parameter fi- and CT_ of the negative distribution; idem for the positive distribution. With this threshold we classify test set records giving label 1 to those whose 1 frequency according to trained base learners overcome the threshold.
4. C o l o n T u m o r Classification 4 . 1 . Experimental
setup
In order to compare the two approaches, we applied the two ensemble methods to the classification of DNA microarray data relative to colon tumor samples 2 , composed of 2000 genes and 62 samples, 40 tumoral tissue samples and 22 normal samples. We evaluated the generalization performances of the two ensembles using multiple hold-out techniques: we randomly split the data in two equally-sized training and test sets, repeating this process 50 times. Then the average error on the test set has been computed. In both ensembles we used 60 Support Vector Machines (SVMs) as base learners. With RS ensembles we applied different projections into random subspace with dimension from 16 to 1024, and used linear SVMs, tuning their regularization parameter. With BICA network we mapped from R 2 0 0 0 to {0,1} space, and we used a second order kernel SVMs, as a result of a model selection procedure.
631 4.2. Results
and
discussion
Comparing the results obtained with the ensemble methods with those obtained with single SVMs, we can register a significant enhancement achieved with the ensemble approach w.r.t. the single SVMs (Tab. 1). On the other Table 1. Classification accuracy in single SVMs, BICA and RSE ensembles. single SVM
RSE
BICAe
test accuracy
0.67
0.828
0.792
a
0.07
0.05
0.09
train accuracy
0.80
1.000
0.98
a
0.07
0.00
0.02
hand, there is no a substantial difference between the performances of the two ensemble approaches, with only a slight improvement obtained with RSE ensembles. In order to understand if the errors of BICA and RSE ensembles are approximately distributed on the same examples, we also analyzed the frequencies of their errors in function of the pattern examined across the 50 test sets used in the multiple hold-out experiments. Interestingly enough, the two ensemble methods show their largest errors on the same examples (apart a few discrepancies). The largest errors are concentrated on samples 45, 49, 51, 55 and 56 for both the ensemble methods. As explained in 2 , most normal samples are enriched in muscle cells, while tumor samples are enriched in epithelial cells. The above samples consistently misclassified by both ensemble methods present an "inverted" tissue composition: normal samples are rich in epithelial cells, tumor samples are rich in muscle cells. This fact shows that the separation between normal and tumoral samples is also made on the basis of tissue composition, as observed in 7 . The best results with RSE have been obtained through random projections into 64-dimensional subspaces. BICA requires only 20 bits. As a matter of fact both encodings do not represent a real strong compression of DNA data, since we need 60 different maps to obtain a satisfactory classification. We note however that the 63% of the database is well classified using a single SVM and 93% using only 3 SVMs. Moreover, only 14 variables are used by the mentioned single SVM involving in own turn only 151 features uniformly distributed within the topology of the bench of the 2000 features supplied by the micro-array.
These results suggest that BICA technique could be in perspective applied to discover genes relevant for tumor discrimination that may be validated by the RSE ensembles. Acknowledgments We would like to thank the anonymous reviewers for their comments and suggestions. References 1. A. Alizadeh et al. Towards a novel classification of human malignancies based on gene expression. J. Pathol, 195:41-52, 2001. 2. U. Alon, et al. Broad patterns of gene expressions revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. PNAS, 96:6745-6750, 1999. 3. A. Bertoni, R. Folgieri, and G. Valentini. Bio-molecular cancer prediction with random subspace ensembles of support vector machines. Neurocomputing, 630:535-539, 2005. 4. S. Dudoit, J. Fridlyand, and T. Speed. Comparison of discrimination methods for the classification of tumors using gene expression data. JASA, 97(457):7787, 2002. 5. M. Eisen and P. Brown. DNA arrays for analysis of gene expression. Methods Enzymol, 303:179-205, 1999. 6. I. Guyon and A. Elisseeff. An Introduction to Variable and Feature Selection. Journal of Machine Learning Research, 3:1157-1182, 2003. 7. I. Guyon, J. Weston, S. Barnhill, and V. Vapnik. Gene Selection for Cancer Classification using Support Vector Machines. Machine Learning, 46(l/3):389-422, 2002. 8. T.K. Ho. The random subspace method for constructing decision forests. IEEE Transactions on PAMI, 20(8):832-844, 1998. 9. L.I. Kuncheva and C.J. Whitaker Measures of diversity in classifier ensembles. Machine Learning, 51:181-207, 2003. 10. D.E. Rumelhart, G.E. Hinton and R.J. Williams. Learning Internal Representations by Error Propagation. MIT Press, Cambridge, 318-362, 1987. 11. J. Pollack. Recursive distributed representation. Artificial Intelligence, 46:77105, 1990. 12. C M . Bishop. Neural networks for pattern recognition. Clarendon Press, Oxford, 1995. 13. B. Apolloni, A. Esposito, D. Malchiodi, C. Orovas, G. Palmas, J.J Taylor. A general framework for learning rules from data. IEEE Trans, on Neural Networks, 11:6, 2004. 14. R.O. Duda,P.E. Hart. Pattern classification and scene analysis. John Wiley & Sons, New York, 1973
PREDICTION OF SCLEROTINIA SCLEROTIORUM flJB) DE BAEY DISEASE ON WINTER RAPESEED {B. NAPUS) BASED ON GREY GM (1,1) MODEL* GUIPING LIAO Institute of Agricultural Information Technology, Hunan Agricultural Changsha, Hunan, 410128, China
University,
FEN XIAO College of Bio-Safety Science and Technology, Hunan Agricultural Changsha, Hunan, 410128, China
University,
Abstract: A novel Grey forecasting model for predicting S. sclerotiorum (Lib) de Bary disease on winter rapeseed (B. napus) is built based on Grey GM (1,1) model. The residual error test and the posterror test methods were used for calibration of the model. Different from other conventional forecasting methods, the GM (l,l)-based Grey calamity prediction forecasts a prediction Ar to infer the probable year of Sclerotinia disease outbreaks according to the origin, and then uses the result to recommend spraying a field or not in order to avoid unnecessary fungicide application. Based on practical experiments in Hunan province, the threshold (^) of the disease rate at the time when winter rapeseed begins to flower is defined as 5, and^ < 5 is called a down-calamity. The Grey forecasting model was tested at the 7 stations in 2004 and 2005 and predicted the probable year of Sclerotinia disease outbreaks and the need for fungicide application with thefirst-classgrade and high accuracy.
1. Introduction Sclerotinia sclerotiorum (Lib) de Bary causes diseases and significant yield losses on oil crops. The grain yield losses can be 100%[1]. In China, the yield reduction can reach 50%[2]. Applying fungicides, crop rotation and selecting disease resistant cultivars are currently the major methods of controlling this disease. However, fungicide treatments are often ineffective, because they are either not well-timed or unnecessary, and fungicidal chemicals are expensive and not all environmentally safe. Therefore, growers desire a new and convenient approach to predict Sclerotinia disease on winter rapeseed and to tell them whether or not to apply fungicides spray. ' This work is supported by Provincial Natural Science Foundation of Hunan (BK.04X7-04TT3021) and the Scientific Research Foundation of Department of Education of Hunan Province (04C290).
633
Different approaches have been applied to forecast S. sclerotiorum disease on rapeseed, such as testing petals for Sclerotinia infestation on agar plates [3], serological test [4], a heat unit system, and the checklists method [5]. Recently, decision support system (DSS) was used to forecast plant disease [6]. However, these systems require that many factors be taken into account, which is inconvenient for practical applications to growers. The Grey forecasting model is the core of the theory. GM (1,1) has been successfully applied for solving time series data in finance, physical control, management, engineering and economics with insufficient data [7,8,9]. Lin and Wang [10] proved that a Grey-forecasting model has much higher prediction validity than the Markov chain model. In this paper, using the Grey Systematic Theory we develop a Grey GM (1,1) model for predicting Sclerotinia disease on winter rapeseed (B. napus) based on the data observed in a time series from 1995 to 2005. 2. Grey GM (1,1) modeling principle Assume an original series of a given datax(0)is defined as
xm=(Xr,xi°\...,x^)
(i)
Where xm is the number of data observed at time t, t = 1, 2, . . ., n. A new data sequence xm can be generated by one-order accumulated generating operation (AGO) from the original^(0) sequence, as follows: xm=(x?\xV,...,x™)
(2)
Where *<» = ,<«, and jf> = £_ f 4°>, / = 1, 2 n. From the accumulated sequence xm we can form GM (1,1), which corresponds to the following whitened first-order differential equation: dxm/dt + cdl)=b
(3)
Where a represents the developed coefficient, b represents the grey controlled variable, which are both unknown variables. Therefore, the solution can be written as *<;> = (x<°> -b/a)e-' +b/ayt>l
(4)
Taking the inverse accumulated generating operation (IAGO) to Jc^,, and then the predictive value of original sequence will be obtained as: *%=i%S?\Vt}>l
(5)
The variables a and b can be solved by the ordinary least-square method as: = (BTB)"'BTy
(6)
Where matrix B is -(*
1
-(* 3 (,) +^")/2
1
B=
*<»> (7)
. V -(*I' ) +^.)/2
-<°>
Finally, using the residual error test method and the posterror test method can test the error between the forecasted value and the actual value. The forecasting errors were defined as ^=*<°>-Jc<°\/ = l,2,..,»i
(8)
And the forecasting percent errors were defined as (9) The standard deviation of the original time series (Si) and the forecasting errors (S2) are as below:
*.=^L>' m - 3f(0, >7»
(10)
do Where x (0) and a are the mean of the original time series and the forecasting errors. The posterror ratio C is derived by dividing S2 by Su i.e. C= S2/S\. The lower the C is, the better the model is. The posterror ratio can indicate the change rate of the forecasting error. The probability of small error is defined as p = prob.{krr -q\<0.6745.$,},f = 2,3,...,«
(12)
That p is another indicator of forecasting accuracy shows the probability that the relative bias of the forecasting error is lower than 0.6745. p is commonly required to be larger than 0.95. The pairs of the forecasting
indicators p and C can characterize four grades of forecasting accuracy, as shown in table 1. Table 1. The grades of forecasting accuracy [11] Forecasting indicators P C
Grade Good >0.95 <0.35
Qualified >0.8 <0.5
Just >0.7 <0.65
Unqualified 0.7 0.65
3. GM (1,1) Modeling 3.1. Data Collection The data (table 2) collected from 1995(SN=1) to 2005(SN=11) are the average from 7 long-term agricultural experimental stations for winter rapeseed production distributed at Changsha, Changde, Yueyang, Hengyang, Huaihua, Nan, and Cili, in Hunan Province. The sampling date was at the beginning of rapeseed flowering at all 7 sites. And the sampling method was counted in 3 fields, with 50 plants sampled at random per field (5 points), for a total of 150 plants sampled at every site. Disease rate (%) is defined as the percentage of the disease plants in all sampling plants. Table 2. The disease rate of S. scleroliorum disease on winter rapeseed in 1995-2005 SN 1 2 3 4 5 DR 5.37 6.40 3.80 5.30 1.38 SN: Serial Number, DR: Disease Rate (%).
6 3.60
7 2.66
8 3.19
9 4.05
10 3.49
11 5.65
3.2. Data Processing 3.2.1. Data Processing Method According to practical experiments in Hunan province, if the disease rate is at or above 5 during investigation, i.e. the beginning flowering stage of winter rapeseed, fungicide spraying is necessary; otherwise disease outbreaks usually ensues. So, we let the threshold ( / ) be 5, a n d / < 5 is called the down-calamity year of the disease. Mapping the down-calamity year of the disease, f • {xm} _»{x(0)}, the downcalamity set can be obtained as:
,.(0)
: x
(0)
(0)
„(0)
( /m>x/(2)
\
(13)
Where ^ ,and* 0 e * « V = l,2,...,6. So, based on table 1, the mapping data series x'm is expressed as *'(0) = {4,5,6,7,8,9}
(14)
Here we only take the data from 1995 to 2003 to construct the Grey model, and keep the data of 2004 and 2005 for test. 3.2.2.
Data smoothing
verification
Before modeling GM (1,1) based on down-calamity set*'10', the set of x,(0) must meet the condition of data smoothing:
*'!7x;>i°: <s
(15)
W h e r e O < £ < U = l,2,..,6. When /=3, 4, 5,6 then e =0.67, 0.467, 0.36, 0.3 respectively. The results obviously reveal that setting t sufficiently large (t£3), the smoothing condition can be meet. So, modeling can be done based on x'(m •
3.3. Model construction Fromx' <0) , one-order AGO sequence ofx'(l) is obtained as follows: x'm = {4,5,15,22,30,39}
(16)
From Eq. (7), matrix B and constant vector YN are accumulated as below:
B
-6.5 -12 •17.5 -26 •34.5
(17)
And then (BTB)"' and BTYN are easily calculated as follows: (B T B)-'
0.0020 0.0395
From Eq. (6), a is obtained as
0.0395 0.9705
B T r„
-752.5 35
(18)
-0.1418 4.2340
(19)
Therefore, the Grey prediction model is acquired by substituting a and b into Eq. (4): JC'JVI = 33.8500e01418' - 29.8500(? > 1)
(20)
3.4. Model testing 3.4.1. Residual error testing of model The actual value, the forecasted value, the residual error, the percent residual error and the average residual error can be obtained from Eq. (20), (14), (8) and (9). Table 3 shows that the average residual error is 0.0106; the maximal residual error is -0.1585, the minimal residual error is 0.0551 and the origin residual error is -0.0979( "•" f>6 is called prediction, •'• t=6 is defined as the origin). These statistics indicate the efficiency of the proposed Grey prediction model. Table 3. Residual error test of the Grey prediction model
/ value 1 2 3 4 5 6
Actual value (Eq. (14)) 4 5 6 7 8 9
Forecasted Value (Eq. (5)) 4 5.1585 5.9449 6.8508 7.8948 9.0979
Residual error (Eq. (8)) 0 -0.1585 0.0551 0.1492 0.1052 -0.0979
Percent residual error (%) (Eq. (9)) 0 -3.0722 0.9261 2.1785 1.3326 -1.0765
Average residual error 0.0106
3.4.2. Posteriori deviation testing of model Posteriori deviation testing is a kind of statistic testing based on probabilities distribution of residual error. From Eq. (10) and (11), the posterror ratio C can be calculated as: C = s2/st =0.0698 When t=2,..., 6, the probability of small error can be obtained from Eq. (12) respectively. Obviously p = \ and all the probabilities of small error are met: q^-q\<
1.1519,/ = 2,3,...,6
639 The values of C and p indicate that the forecasting accuracy of the Grey prediction model (Eq. (20)) is the first-class "Good" according to table 1, and also show that the Grey prediction model has a good generality extrapolation, and so it can be applied to forecast. 3.5. Prediction and verification Let 2003 (i.e.*6 = 9 ) be the origin (a reference point), from the Grey prediction model (Eq. (20)), we can easily computed: x'f = 3 8.9469, X''0 = 49.4313, x''" = 61.5136 We can also obtained according to Eq. (5): *'<0) = 10.4844, i'<0) = 12.0822 Therefore AT" is generated from A T - x'J0) -x'f 0>6). Let? = 7 , thenAF = 10.4844-9 * 1. The result indicates that the first year of the down-calamity is 2004 compared the origin (2003+1). Similarly let/ = 8, thenAr = 12.0822 - 9 * 3. The result shows the second year of the down-calamity will be 2006 (2003+3); by contrast, it reveals that 2005 is a calamity year. Obviously the predicted results agree with the actual data in 2004 and 2005. Also we can go on predicting in the same way. 4. Discussion and conclusion The Grey GM (1,1) model described in the paper resolves the problem and focuses on information insufficiency in analyzing and forecasting the disease on winter rapeseed. Compared to previous models mentioned above (introduction) the Grey predicting model is very easy to use, distribute and program into the computer, and requires no laboratory facilities and special training; moreover it is a highly accurate predictor. The results of testing and verification of the model has demonstrated that the Grey GM (1,1) model for forecasting the Sclerotinia disease is correct and accurate by statistics and practical prediction in 2004 and 2005. In conclusion, the Grey GM (1,1) model developed herein is only one of the novel means available to winter rapeseed producers in Hunan province for accurately forecasting the next calamity year of S. sclerotiorum outbreaks. The results illustrate that the residual error of the forecasting model is lower than 10%. Furthermore, the model has better quality prediction validity and is clearly a viable means of forecasting Sclerotinia disease of winter rapeseed. Meanwhile,
640 the proposed forecasting model requires only few data, compensating for the limitations of earlier studies. Also the method requires no biological samples from the rapeseed field, and hence no processing of such samples. Consequently, it is more convenient for growers to operate, and growers in China can better predict the demand for spraying against Sclerotinia disease by using the new Grey-predicting model. Fungicides can be sprayed only when necessary. Therefore, any adverse impacts of fungicides on the environment can be avoided, permitting the sustainable development of winter rapeseed production. Acknowledgments The authors are grateful to Dr. Pierre Fobert from Plant Biotechnology Institute of National Research Council of Canada for his very helpful comments. We would also like to thank the Provincial Department of Agriculture for sharing data. References 1. Purdy LH, Phytopathology, 69, 875(1979). 2. GUAN C.Y., LI F.Q., LI X., CHEN S.Y., WANG G.H., Liu Z.S., Acta Agronomica Sinica, 29(5), 715(2003). 3. Turkington, T.K. and Morrall, R.A.A., 1993. Phytopathology, 83, 682(1993). 4. Jamaux, I. and Spire, Y)., Plant Pathology, 43, 847(1994). 5. Eva Twengstrijm, Roland Sigvald, Christer Svensson and Jonathan Yuen, 1 Crop Protection, 17, 405(1998). 6. Shtienberg D., Crop Protection, 19, 747(2000). 7. Hsu, C.I., and Wen, Y.H., Transportation Planning and Technology, 22, 87(1998). 8. Lin, C.T. and Yang, S.Y., Technological Forecasting and Social Change, 70, 177(2003). 9. Tsaur, R.C., International Journal of Computer Mathematics, 82, 141(2005). 10. Lin, C.T., and Wang, S.M., International Journal of Information Management Science. 11, 13(2000). 11. Deng, J.L., System & Control Letters, 5, 288(1982). 12. Deng, J.L., Journal of Grey System, 1, 1(1998).
PART 3
Applied Research And Nuclear Applications
This page is intentionally left blank
IDENTIFICATION OF SEISMIC ACTIVITIES THROUGH VISUALIZATION AND SCALE-SPACE FILTERING* CHENGZHIQIN' Institute of Geographic Sciences and Natural Resources Research, CAS Beijing 100101, China YEE LEUNG Department of Geography and Resource Management, Center for Environmental Policy and Resource Management and Institute of Space and Earth Information Science, The Chinese University of Hong Kong, Shatin, Hong Kong JIANGSHE ZHANG Institute for Information and System Sciences, Xi 'an Jiaotong University XVan, Shaanxi 710049, China The identification of seismic active periods and episodes in spatio-temporal data is a complex scale-related clustering problem. Clustering by scale-space filtering is employed to give a quantitative basis for their identification. Visualization methods are employed to facilitate researchers to interactively assess and judge the clustering results by their domain specific experience in order to obtain the optimal segmentation of the seismic active periods and episodes. The real-life applications in strong earthquakes occurred in Northern China confirms the effectiveness of such an integrative approach. 1. Introduction Finding natural clusters in spatio-temporal databases is a major issue in geographical analysis. A typical example in seismology is the identification of seismic active periods and episodes in the temporal domain. In larger spatial context, the temporal sequence of strong earthquakes is not quite stochastic but a certain pattern of clustering with interspersed quiescence and active periods [1,2]. The regional seismic activity can be segmented into the seismic active periods and the seismic active episodes on the finer temporal scale [2]. * This work is supported by National Natural Science Foundation of China (No. 40225004). * Work partially supported by National Natural Science Foundation of China (No. 40501056).
643
644 Quantitative analysis of seismic active periods and episodes has important implications to the understanding and forecasting of long- and medium-term earthquakes. Due to the complexity of earthquakes, the studies of seismic activities often call for the seismologists' expertises with simple statistical indices [3,4]. To make the analysis more rigorous and results easier to evaluate, quantitative methods are often needed in conjunction with domain expertise [5]. Cluster analysis has thus become a common approach to study seismic activities [6], Due to the complexity and scale feature of earthquakes, the black-box-style clustering algorithms could not be generally applied. They also fail to give satisfactory answers to the following questions: 1) how many clusters? 2) how to evaluate the validity of a clustering result? An appropriate solution to these problems requires a powerful analytical method which gives a quantitative description, and a visualization method which assists experts to interactively visualize the quantitative results and judge according to their experience. Most of the clustering methods are sensitive to initialization and incapable of determining the optimal number of clusters. To overcome these difficulties, Leung et al. [7] proposed a clustering algorithm simulating our visual system which considers a dataset as an image with each light point located at a datum position. By modeling the blurring effect of our lateral retinal interconnections based on scale space theory, smaller light blobs merge into larger ones until whole image becomes one light blob at a low enough level of resolution. By identifying each blob with a cluster, the blurring process generates a family of clusterings along the hierarchy. This approach is computationally stable and insensitive to initialization. Since visualization was proposed in 1986 [8], geo-scientists have employed it in a variety of studies p]. The impetus for the use of visualization in geographical analysis is to interactively integrate the qualitative analysis based on geo-knowledge, which is difficult to depict by mathematics, and the quantitative computation which caters for massive geo-datasets and complex geo-processes [9,10], The identification of seismic active periods and episodes is actually the problem to which the interactive use of visualization and clustering by scale-space filtering can be aptly applied. The objective of this paper is to achieve such an integration. In Sec. 2, we describe the visualization of clustering by scale-space filtering and indices for cluster validity check. It is applied to segment seismic active periods and episodes in Sec. 3. The paper is then concluded by a discussion.
2. Methodology Scale-space theory was originated in digital signal processing [11]. It applies multiple-scale filtering (often a Gaussian filter with a scale parameter) to describe a digital signal or image on different scales resulting in a hierarchical structure. The process is similar to the visual process. By simulating our visual system, Leung et al. [7] developed a clustering algorithm based on scale-space filtering. To facilitate discussion, we first review the algorithm. 2.1. Clustering by Scale-Space Filtering Without prior assumptions, it is proven that one can blur the image in a unique and sensible way via the convolution of an image p(x), P(x, s), with the Gaussian kernel, i.e., P(x,o)=p(x)*g{x,o)=\p(x-y)-—re
*'ay
(1)
2jca
where g(x,a) is the Gaussian function; s is the scale parameter; (x^)-plane is the scale space; and P(x,s) is the scale-space image. There is a direct relation with neurophysiological findings in animals and psychophysics in man supporting this theory [12]. Leung et al. [7] deduced a numerical solution for following the blob centers (or maxima which represent the cluster centers) at each given scale s. The numerical solution can be interpreted as an iterative local centroid estimation. For higher computational efficiency, the algorithm uses the centroid of a cluster instead of data points in the cluster and becomes N
*(" + D = ^ Ikje
\*W-PJ\2
3-
±<±d_
(2)
2
°
where p} is cluster center j obtained at scale s,; TV, is the number of clusters at scale s{, kj is the number of data points in cluster j whose center is p/, and s =s,• s M . In practice, the {s,} sequence is constructed by S,-SM ~kSj.\ which guarantees the accuracy and stability of clustering [13]. The constant k is called the Weber fraction. Based on Weber's law in psychophysics, there is a lower bound for k since we cannot sense the difference between two images p{x, s^x)
646 and p(x, s,) when k is less than its Weber fraction. For instance, &=0.029 is enough in one-dimensional applications [7]. Based on Eq. 2, the hierarchical clustering algorithm is as follows: 1. Given a sequence of scales {s,} (/=0,1,2,...) with s o=0. 2. At s0=0, each datum is a cluster and its blob center is itself. Let j=l. 3. Cluster the data at scale sh Find the new blob center at j , for each blob center obtained at scale s,.i by Eq. 2. If two new blob centers are close enough, the old clusters disappear and a new cluster is formed. 4. If there are more than one cluster, let i=i+\ and go to step 2. 5. Stop when there is only one cluster. 2.2. Cluster Validity Checks in Clustering by Scale-Space
Filtering
The followed indices were defined in [7] to evaluate the validity of an obtained cluster and to facilitate the selection of the optimal clustering: • Lifetime. Lifetime of a cluster is the range of the logarithm scales over which the cluster survives. The mean of lifetime of clusters in a clustering is taken as the clustering lifetime. Longer lifetime means a more stable clustering because the real cluster should be perceivable over a wide range of scales. • Isolation of a clustering. This index measures the total isolation degree of clusters in a clustering. • Compactness of a clustering. This index measures the total degree of compactness of clusters in a clustering. Intuitively, a cluster is good if the distances between the data inside the cluster are small and those outside are large. For a good cluster, the isolation and compactness of the cluster are close to one. And the isolation and compactness of a good clustering should be as large as possible. 2.3. Visualization of Clustering by Scale-Space
Filtering
The visualization of clustering by scale-space filtering includes two phases which are inclined to visual representation and interactive analysis respectively. Different visualization techniques can be used because of their task-oriented characteristics [14]. In the first phase, the construction process of the scale space for clustering can be visualized naturally by a top-to-bottom tree-growing animation in 2D/3D views. Animation and interaction facilitate the generation of the original, qualitative cognition about the clustering in the whole scale-space. This phase suits the visual representation of the scale space. After the scale space is constructed, visualization based on the scale space and the indices for cluster validity can assist us to interactively construct, verify and revise at any scale our cognition (or assumption) about the optimal
clustering until the final result is obtained. Based on the quantitative indices, we can use the slider technique to select the scale of interest in free-style. The corresponding clustering result is shown by both the view of the scale space and the map or time sequence graph for interactively obtaining the optimal result. 3. Applications 3.1. Experimental Data In this application, we attempt to segment the periodic seismic activity of strong earthquakes in Northern China (34~42°N, 109~124°E), which forms a comparatively regional unit in whole for seismological analysis, via visualizing the clustering process by scale-space filtering. Considering the completeness of the strong earthquake catalogue, the datasets are chosen as the strong earthquakes (Ms>6.0) of 1290A.D.-2000A.D. which have 71 records [15]. 3.2. Temporal segmentation of Strong Earthquakes The scale space for the time sequence of earthquakes is depicted in Fig. 1. The number of clusters and the indices including Lifetime, Isolation and Compactness of clustering are shown in Fig. 2.
.
--..^LJCUD,
Figure 1. Temporal scale-space for earthquakes Scrutinization of the scale-space graph and indices calls for special attention to the patterns appearing in both the 59th~95th and the 6th scale steps (Fig. 2). In the 59lh~95th scale range, there are three clusters in the clustering with the maximum Lifetime, Isolation and Compactness of spreading across the longest scale range. It B thus the seismic active period recognized by the clustering algorithm (Fig. 3a). It actually corresponds to the Second, Third, and Fourth Seismic Active Periods singled out by the seismologists [16] (Table 1). In the 6* scale step, the number of clusters changes dramatically. After the 6th step, the change in clustering becomes comparatively smooth. This can be explained as that the earthquakes, which are temporally frequent preceding the 6th
648
10
20
30
50
M
60
70
80
90
scale step
a) Number of clusters changed with the scale step M 50 c
i i0
1 10
i ° (3 -10 o -20
a a
•3 -30
0
10
3
30
40
50
60
70
80
90
100
scale step b) Lifetime, Isolation and Compactness of clustering Figure 2. Indices of clustering along the time scale for earthquakes
II
( )K«
l n£<
1M
>X
a) 3 clusters in the 59,h~95,h scale range -t H
t—1 H H M
Ml.
™|j™»
*™«
Ml•
"
*
"
\
I j
1
r 1;
b) 17 clusters at the 6th scale step Figure 3. Ms-time plot of clustering results for earthquakes
i
"
'
step, merged rapidly as clusters when the observation scale increases in this scale range. When the time scale is larger than 6 and 7, however, clusters formed in more apparent relative isolations. Less clusters are formed in a relatively long scale range. The clustering result in the 6th scale step corresponds to the seismic active episodes recognized by the seismologists [16] (Fig. 3b; Table 1). Table 1 tabulates the seismic active periods and episodes recognized by our clustering algorithm versus that of the seismologists. It can be observed that the seismic active periods and episodes obtained by scale-space clustering are consistent with the results identified by the seismologists via their domain specific expertise, with the exception that the episodes of the Fourth Seismic Active Period recognized by our approach is not so consistent. It seems that there is quasi-periodicity of about 10~15 years for active episodes. Table 1. Seismic active periods and episodes obtained by the clustering algorithms and the Seismologists' expertise Active Period II III IV II
III
IV
Clustering result
Ref. [16]
Ref. [4]
1484-1730 1815-
1481-1730 1812-
2(?) 1 2 3 4 5 6 7 8 9 1 2 3 4 5
1484-1487 1497-1506 1522-1538 1548-1569 1578-1597 1614-1642 1658-1683 1695-1708 1720-1730 1815-1820 1829-1835 1855-1862 1880-1898 1909-1923
1481-1487 1501-1506 1520-1539 1548-1569 1580-1599 1614-1642
6
1929-1952
1921-1952
7
1966-1978
1965-1976
Active Episode
K?)
(Afe>6) 1290-1340(6) 1484-1730(31) 1815-(34) 1290-1314(5) 1337(1) 1484-1502(3) 1524-1536(2) 1548-1568(4) 1587-1597(2) 1614-1642(8)
1658-1695
1658-1695 (10)
1720-1730 1812-1820 1827-1835 1846-1863 1880-1893 1909-1918
1720-1730(2) 1815-1830(4) 1861 (1) 1879-1888(3) 1903-1918(4) 1922(1) 1929-1945(6) 1966-1983 (13) 1998- (2)
4. Conclusion Clustering by scale-space filtering can provide a series of clustering in different scales by simulating our visual systems. It is computationally stable and insensitive to initialization. It is also free from solving difficult optimization problems encountered by many clustering algorithms. The proposed algorithm constructs the clustering in scale space and computes the indices for cluster and clustering validity checks. And visualization assists researchers to derive clustering that is optimal in terms of computation and physical interpretation. The seismic active periods and episodes can actually be viewed as time scopes in which strong earthquakes in certain spatial region cluster as subsets on different temporal scales. The identification of seismic periods and episodes is a typical scale-related clustering problem to which clustering by scale-space filtering can be appropriately applied. The visualization of clustering provides a promising way to tightly couple background information and domain expertise with quantitative computation to make our solutions natural and meaningful. Since Gaussian scale-space theory is designed to be totally noncommittal, clustering by scale-space filtering cannot take into account a priori information on structures being worthy of preserving. In clustering by scale-space filtering, clusters tend to be spherical along the changing scale. This shortcoming can be alleviated by incorporating structure information in the visualization process. References 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16.
Y. Kagan and D. Jackson, Geophys. J. Int. 104,117 (1991). Z. Ma and M. Jiang, Earthquake Research in China 3,47 (1987). M. Matthews and P. Reasenberg, Pure Appl. Geophys. 126,2 (1988). F. Gu, C. Zhang and J. Li, Earthquake Research in China 11, 341 (1995). Y. Kagan, Pure Appl. Geophys. 155,233(1999). C. Frohlich and S. Davis, Geophys. J. Int. 100,19 (1990). Y. Leung, J.-S. Zhang and Z.-B. Xu, IEEE T. Pattern Anal. 22,1396 (2000). A. McCormick,T. Defanti and M.Brown, Comput. Graphicsll, 103(1987). A. MacEachren and M.-J. Kraak, Comput. Geosci. 23,335 (1997). M. Gahegan, M. Wachowicz, M. Harrower and T.-M. Rhyne, Cartography and Geographic Information Science 28,29 (2001). T. Linderberg, IEEE T. Pattern Anal. 12,234 (1990). D. Hubel, Eye, Brain, and Vision, Scientific Am. Library, New York (1995). J. Koenderink, Bio. Cybern. 50,363 (1984). C. Qin, C. Zhou and T. Pei, Conf. AsiaGIS'2003, (2003). W. Huang, W. Li and X. Cao, Acta Seismological Sinica 7, 351 (1994). M. Jiang and Z. Ma, Earthquake 6, 5 (1985).
FUZZY APPROXIMATION NETWORK PERTURBATION SYSTEMS AND ITS APPLICATION TO RISK ANALYSIS IN TRANSPORTATION CAPACITY* ZOU KAIQI University key Lab of Information Science and Engineering of Dalian College of Information Engineering, Dalian University Dalian, Liaoning,
116622
University
P.R.China
By embedding the fuzzy neural networks, we construct a fuzzy approximation network perturbation system based on the human knowledge. A survey of the principle of this system is presented including the architectures and hybrid learning rules. In order to enhance the traffic productivity, using Fuzzy Logic Toolbox of MATLAB, wc establish a prediction modeling and apply it to risk analysis in transportation capacity. In terms of the simulation result, the modeling is quite good and decreases risk in transportation capacity.
1. Introduction Vagueness is a distinct characteristic of human thinking. It is not a defect of language, but rather an important source of clarity. Since Zadeh's fuzzy set theory was proposed, fuzzy logic has been successfully applied to control a pilot scale steam engine by Professor E.H. Mamdani in 1974, and a batch of products related to fuzzy technology was explored and applied in Japan. By now, fuzzy set theory and application has been focused on. In sec.2, the structure and algorithm of fuzzy inference system is reviewed. We promote the catastrophe fuzzy neural network system based fuzzy inference system in sec.3. In sec.4, the prediction transportation capacity of this system is explored using the MATLAB language in the field of traffic market to dress the problem that system modeling based on conventional mathematical tools is not well suited for dealing with ill-defined and uncertain systems such as traffic market.
This work is supported by National Nature Science Foundation of China, No. 60573072
651
652 2. Fuzzy Inference System Firstly, we introduce fuzzy if-then rules. Fuzzy if-then rules are the core part of a whole fuzzy inference system. Fuzzy sets and fuzzy operators are the subjects and verbs of fuzzy logic. But in order to say anything useful we need to make complete sentences. Fuzzy if-then rules are the things that make fuzzy if-then rules are the things that make fuzzy logic useful. Fuzzy if-then rules or fuzzy conditional statements are expressions of the form if x is A then y is B, where A and B are lingual values of the linguistic variables x and y in the universes of discourse X and Y, respectively. The if-part of the rule "x is A" is called the antecedent or premise, while the then-part of the rule "y is B" is called the consequent or conclusion. Note that the antecedent is an interpretation that returns a single number between 0 and 1, whereas the consequent is an assignment that assigns the entire fuzzy set B to the output variable y. Depending on the types of fuzzy reasoning that fuzzy if-then rules employed, most fuzzy inference systems can be classified into three types. Here we only discuss the first-order Sugeno fuzzy model, which has rules of the form if x is A then y=kx+r, where A and B are fuzzy sets in the antecedent, while k and r are all constants. Its merits lie in compactness and efficiency. Secondly, a fuzzy inference system is composed of five functional blocks (see Figure 1): Knowledge bases Data bases Fuzzification interface
Rule bases
X
Defuzzification interface
±
Decision-making unit Figure 1. Fuzzy inference system where, a rule base containing a number of fuzzy if-then rules; a database defines the membership functions of the fuzzy sets used in the fuzzy rules; a decision-making unit performs the inference operations on the rules; a fuzzification interface transforms the crisp inputs into degrees of matching with linguistic values; a defuzzfication interface transforms the fuzzy results of the inference into a crisp output.
653 In general, the rule base and the database are jointly referred to as the knowledge base. The steps of fuzzy reasoning (inference operations upon fuzzy if-then rules) performed by first-order Sugeno fuzzy model are: 1. Compare the input variables with the membership functions on the premise part to obtain the membership values of each linguistic label.(This step is often called fuzzification) 2. Through multiplication operator combine membership values on the premise part to get firing strength (weight) of each rule. 3. Generate the qualified consequent with a crisp value on the weight of each rule. 4. Aggregate the qualified consequent to produce a crisp output derived by weighted average defuzzification (This step is called defuzzification) For simplicity, we assume the fuzzy inference system under consideration has two inputs x and y and one output z. There are two rules: Rule 1: if x is Al and y is Bl then fl=plx+qly+rl, Rule 2: if x is A2,andyisB2 then f2=p2x+q2y+r2. The fuzzy reasoning by first-order Sugeno fuzzy model is illustrated in Figure 2
f1 = p1 x + q1 y + r1,   f2 = p2 x + q2 y + r2,
z = (w1 f1 + w2 f2) / (w1 + w2)
Figure 2. Fuzzy reasoning in the first-order Sugeno fuzzy model (firing strengths w1, w2; weighted-average defuzzification)
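To make the two-rule computation above concrete, here is a minimal Python sketch (not the authors' MATLAB code); the Gaussian membership functions, rule parameters and input values are illustrative assumptions.

```python
import math

def gauss(x, c, sigma):
    """Gaussian membership function with center c and width sigma."""
    return math.exp(-0.5 * ((x - c) / sigma) ** 2)

def sugeno_two_rules(x, y):
    # Rule 1: if x is A1 and y is B1 then f1 = p1*x + q1*y + r1
    # Rule 2: if x is A2 and y is B2 then f2 = p2*x + q2*y + r2
    w1 = gauss(x, c=2.0, sigma=1.0) * gauss(y, c=1.0, sigma=1.5)  # firing strength of rule 1
    w2 = gauss(x, c=5.0, sigma=1.0) * gauss(y, c=4.0, sigma=1.5)  # firing strength of rule 2
    f1 = 0.6 * x + 0.2 * y + 1.0   # consequent of rule 1 (p1, q1, r1 assumed)
    f2 = -0.3 * x + 0.8 * y + 2.0  # consequent of rule 2 (p2, q2, r2 assumed)
    # weighted-average defuzzification: z = (w1*f1 + w2*f2) / (w1 + w2)
    return (w1 * f1 + w2 * f2) / (w1 + w2)

print(sugeno_two_rules(3.0, 2.5))
```

Multiplication is used as the AND operator here, matching step 2 of the reasoning procedure above.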
3. Fuzzy Approximation Network Perturbation Systems Based on the Fuzzy Inference System
By embedding the first-order Sugeno fuzzy inference system into the framework of the fuzzy approximation network perturbation system, a multilayer feedforward network in which part or all of the nodes are adaptive (their outputs depend on parameters pertaining to these nodes), we obtain the architecture of the fuzzy inference system based on the first-order Sugeno fuzzy model. Figure 3 shows the equivalent of the fuzzy inference system shown in Figure 2. No weights are associated with the links between the nodes of two adjacent layers. The system employs a hybrid learning rule [1] that combines the gradient method and the least-squares estimate to identify the premise and consequent parameters, respectively.
Figure 3. Fuzzy approximation network perturbation system (five layers; layer 1 holds the membership functions A1, A2, B1, B2)
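The hybrid learning rule mentioned above fixes the premise (membership-function) parameters during the forward pass and estimates the consequent parameters by least squares. The following Python sketch illustrates only that least-squares step for the two-rule system of Figure 2; the normalized firing strengths and training data are hypothetical, and the gradient update of the premise parameters is omitted.

```python
import numpy as np

# Hypothetical training data: inputs (x, y) and target outputs z.
X = np.array([[1.0, 2.0], [2.0, 1.5], [3.0, 3.0], [4.0, 2.5],
              [5.0, 4.0], [2.5, 2.0], [3.5, 3.5], [4.5, 1.0]])
z = np.array([1.9, 2.2, 3.1, 3.6, 4.4, 2.6, 3.4, 3.0])

# Normalized firing strengths, assumed to come from fixed premise
# membership functions (layers 1-3 of the network).
w1_bar = np.array([0.8, 0.7, 0.5, 0.3, 0.2, 0.6, 0.4, 0.25])
w2_bar = 1.0 - w1_bar

# For z = w1_bar*(p1*x + q1*y + r1) + w2_bar*(p2*x + q2*y + r2),
# the consequent parameters enter linearly, so stack the regressors:
A = np.column_stack([
    w1_bar * X[:, 0], w1_bar * X[:, 1], w1_bar,   # p1, q1, r1
    w2_bar * X[:, 0], w2_bar * X[:, 1], w2_bar,   # p2, q2, r2
])
theta, *_ = np.linalg.lstsq(A, z, rcond=None)     # least-squares estimate
p1, q1, r1, p2, q2, r2 = theta
print(p1, q1, r1, p2, q2, r2)
```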
The system is called simplified if its fuzzy if-then rules have the form "if x is big and y is small, then z is d", where d is a crisp value.
4. Application to Risk Analysis in Transportation Capacity
At present, many prediction methods are used in the field of traffic. Because of the uncertainty of the traffic market, however, some prediction results do not stand up against the facts. In the following, we establish a prediction model in MATLAB using the transportation capacity data of the world bulk fleet for 1976-2004 (see Table 1).
Table 1. Transportation capacity of world bulk fleet (Million Dwt)

Year     1976   1977   1978   1979   1980   1981   1982   1983
Tonnage  234.6  268.1  303.7  348.5  399.9  449.0  489.9  514.2

Year     1984   1985   1986   1987   1988   1989   1990   1991
Tonnage  516.6  520.2  520.6  526.8  519.2  505.3  496.7  475.3

Year     1992   1993   1994   1995   1996   1997   1998   1999
Tonnage  468.9  467.0  473.4  486.9  500.1  511.4  517.4  524.4

Year     2000   2001   2002   2003   2004
Tonnage  525.9  537.7  549.6  560.4  563.2
The procedure for establishing the prediction model is as follows: 1. Treat the original data to obtain the yearly growth percentages. 2. Build the input-output prediction model by extracting 24 input-output data pairs of the format [x(i) x(i+1) x(i+2) x(i+3) x(i+4)]. 3. Use the MATLAB language to run the simulation on a PC 586; the simulation result is given in Table 2 below. 4. Analyze the result. Steps 1 and 2 are sketched in the code below.
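A minimal Python sketch of steps 1 and 2 (the authors used MATLAB; this only illustrates the data treatment, with the tonnage series taken from Table 1 and the last element of each window interpreted as the target):

```python
import numpy as np

# Yearly tonnage from Table 1 (1976-2004), Million Dwt.
tonnage = np.array([234.6, 268.1, 303.7, 348.5, 399.9, 449.0, 489.9, 514.2,
                    516.6, 520.2, 520.6, 526.8, 519.2, 505.3, 496.7, 475.3,
                    468.9, 467.0, 473.4, 486.9, 500.1, 511.4, 517.4, 524.4,
                    525.9, 537.7, 549.6, 560.4, 563.2])

# Step 1: growth percentages x(i) = (T(i) - T(i-1)) / T(i-1).
growth = np.diff(tonnage) / tonnage[:-1]          # 28 values (1977-2004)

# Step 2: sliding-window pairs [x(i) ... x(i+3)] -> x(i+4), 24 pairs in total.
inputs = np.array([growth[i:i + 4] for i in range(len(growth) - 4)])
targets = growth[4:]
print(inputs.shape, targets.shape)                # (24, 4) (24,)
```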
Table 2. The simulation result

Year | Original | Treatment | Output | Prediction
1976 | 234.6 | -      | -      | -
1977 | 268.1 | 0.1428 | -      | -
1978 | 303.7 | 0.1328 | -      | -
1979 | 348.5 | 0.1475 | -      | -
1980 | 399.9 | 0.1475 | -      | -
1981 | 449.0 | 0.1228 | 0.1227 | 448.99
1982 | 489.9 | 0.0911 | 0.0911 | 489.9
1983 | 514.2 | 0.0496 | 0.0496 | 514.2
1984 | 516.6 | 0.0047 | 0.0047 | 516.6
1985 | 520.2 | 0.0070 | 0.0070 | 520.2
1986 | 520.6 | 0.0008 | 0.0008 | 520.6
1987 | 526.8 | 0.0119 | 0.0119 | 526.8
1988 | 519.2 | -0.014 | -0.014 | 519.2
1989 | 505.3 | -0.027 | -0.027 | 501.3
1990 | 496.7 | -0.017 | -0.017 | 496.7
1991 | 475.3 | -0.043 | -0.043 | 475.3
1992 | 468.9 | -0.014 | -0.014 | 468.9
1993 | 467.0 | -0.004 | -0.004 | 467.0
1994 | 473.4 | 0.0137 | 0.0137 | 473.4
1995 | 486.9 | 0.0285 | 0.0285 | 486.9
1996 | 500.1 | 0.0271 | 0.0272 | 500.1
1997 | 511.4 | 0.0226 | 0.0225 | 511.4
1998 | 517.4 | 0.0117 | 0.0117 | 517.4
1999 | 524.4 | 0.0135 | 0.0134 | 524.3
2000 | 525.9 | 0.0029 | 0.0030 | 525.9
2001 | 537.7 | 0.0224 | 0.0223 | 537.7
2002 | 549.6 | 0.0221 | 0.0221 | 549.5
2003 | 560.4 | 0.0197 | 0.0197 | 560.3
2004 | 563.2 | 0.0050 | 0.0051 | 563.2
Figure 4 shows that after 146 epochs the RMSE (root mean squared error) was 0.000068. The desired and predicted transportation capacity curves in Figure 5 are essentially the same. The predicted values for the coming five years are given in Table 3, with transportation capacity increasing by an average of 0.0139 per year over that period. Therefore, this system can model the relationship between input and output quickly and precisely, and the prediction for the coming five years fits the trend of the world bulk fleet.
Figure 4. RMSE curve of the prediction model
Figure 5. Expected and predicted transportation capacity curves
Table 3. Prediction result for the coming 5 years (Million Dwt)
Year Tonnage Year Tonnage
2005 575.9 2008 593.9
2006 590.8 2009 603.4
2007 594.4 2010 613.4
5. Conclusions
We have described the neural-network-based fuzzy inference system, briefly introduced its architecture and algorithm, and in Sec. 4 applied it to predict transportation capacity. The prediction model adapts well to the uncertainty of the traffic market. However, the system has some limitations: 1. It supports only first-order Sugeno fuzzy inference systems. 2. It employs only the weighted-average defuzzification strategy. Furthermore, we have so far established prediction models only for the transportation capacity of the traffic market. For freight volume and freight rate, historical data are not easy to collect, and they are disturbed more easily than transportation capacity by all kinds of factors, such as political and economic ones, so predicting them will involve some difficulty. Further research remains a challenge.
References
1. Jang, J. R. 1993. ANFIS: Adaptive-Network-Based Fuzzy Inference System. IEEE Transactions on Systems, Man, and Cybernetics, vol. 23, No. 3: 665-685.
2. Mamdani, E. H. 1977. Application of Fuzzy Logic to Approximate Reasoning Using Linguistic Synthesis. IEEE Transactions on Computers, vol. 26, No. 12: 1182-1191.
3. Takagi, T. & Sugeno, M. 1985. Fuzzy identification of systems and its applications to modeling and control. IEEE Trans. Syst., Man, Cybern., vol. 15: 116-132.
4. Cichocki, A. 1996. New neural network for solving linear programming problems. European Journal of Operational Research, 93(2): 244-256.
5. Draye, J. 1996. Dynamic recurrent neural networks: a dynamical analysis. IEEE Transactions on Systems, Man and Cybernetics, Part B: Cybernetics, 26(5): 692-706.
6. Feng, S. & Xu, L. 1996. A hybrid knowledge-based system for urban development. Expert Systems with Applications, 10(1): 157-163.
7. Gen, M. 1998. Neural network technique for fuzzy multiobjective linear programming. Computers and Industrial Engineering, 35(3-4): 543-546.
8. Jacobsen, H. 1998. Genetic architecture for hybrid intelligent systems. Proceedings of the IEEE International Conference on Fuzzy Systems, New York: 709-714.
9. Wang, P. Z. & Li, H. X. 1997. Fuzzy Information Processing and Fuzzy Computers. Science Press: New York, Beijing.
10. Li, H. X. 2001. Fuzzy Neural Intelligent Systems. CRC Press, FL: New York.
11. Zou, K. Q. 2003. Fuzzy Decision Support Systems and Its Application. Advances in Systems Science and Application, 3(1): 52-54.
12. Zou, K. Q. 2003. Catastrophe fuzzy neural network model for natural fire. IEEE-ICMLC-2003: 552-554.
13. Zou, K. Q. 2003. Fuzzy symmetric group categories in system sciences. IEEE-NAFIPS-2003: 73-76.
APPLICATION OF ARTIFICIAL NEURAL NETWORKS IN THE FLOOD FORECAST*
LIHUA FENG
Department of Geography, Zhejiang Normal University, Jinhua 321004, China. E-mail: fenglh@zjnu.cn
JIA LU
Computer Science and Information System, University of Phoenix, FL 33076, USA
The "memory" and "simulation" abilities of artificial neural networks are adopted for flood forecasting, because a neural network can learn and record the complex functional relationship between input and output through training on historical data, without any explicit mathematical model. In this research the authors propose a new flood forecast system with related applications based on neural network methods. It shows good performance and efficiency, and it is expected that this system will become more sensitive and perform even better for flood forecasting.
1. Introduction
In 1998, China suffered serious flood damage in the Changjiang (Yangtze) areas, with losses of more than 2000 billion RMB. Therefore, a precise technique for flood forecasting has to be developed. In traditional forecasting, scientists usually set up mathematical models, graphics and related historical data to analyze and deal with the pattern recognition [1]. Many scientists have been studying the relationship between the dependent variable y and the independent variable x in y = f(x) to find models for flood research [2]. Artificial neural networks (ANN) have been successful in pattern recognition because of their advantages in self-learning, self-organizing, self-adapting and fault tolerance [3]. In this research, we propose a system for flood forecasting based on the principles and methods of artificial neural networks [4].
" This work was supported by Zhejiang Provincial Science and Technology Foundation of China (No. 2006C23066).
2. The Principles and Methodology
An ANN is a complex network consisting of a large number of simple neural cells. It was proposed on the basis of biological research on the human brain and can be used to simulate the activity of brain neurons. An ANN has a topological structure for information processing with parallel distribution; the mapping between input and output responses is obtained through combinations of nonlinear functions. Pattern recognition methodology can be used to analyze the algorithms of neural networks using past experience, neural cells, memory, fuzzy theory, nonlinearity and noisy data, without any mathematical model [5]. The neural network with error back propagation is called a BP network; its learning process consists of forward and backward propagation. In the forward pass, the sample signals go forward through each layer with the sigmoid function F(x) = 1/(1 + e^(-x)); the neural cells of each layer affect only the status of the next layer. If the expected output signals cannot be obtained at the output layer, the weights of each layer must be modified: the error of the output signals is propagated backward along the same path, and after repeated propagation the error falls within the required range. Suppose the network has m layers, and let y_j^m denote the output of node j in layer m, with y_j^0 = x_j, where x_j are the inputs. Let W_ij^m be the connection weight from y_i^(m-1) to y_j^m, and let θ_j^m be the threshold of node j in layer m. The training steps of the BP network are as follows:
① Set the weights and thresholds randomly in (-1, 1).
② Select (x^k, T^k) from the training data and feed the input variables into the input layer (m = 0), so that
y_j^0 = x_j^k  (for all nodes j),    (1)
where k indexes the training samples.
③ The signals propagate forward through the network with the following formula:
y_j^m = F(S_j^m) = F( Σ_i W_ij^m y_i^(m-1) + θ_j^m )    (2)
The outputs y^m of each layer are computed node by node, starting from the first layer, until the calculation is completed; F(s) is the sigmoid function (Equation 2).
④ Compute the error value of each node of the output layer (Equation 3); this error is obtained from the difference between the actual output and the required output:
δ_j^m = F'(S_j^m) (T_j^k - y_j^m)    (3)
⑤ Compute the error value of each node of the previous layers (Equation 4):
δ_j^(m-1) = F'(S_j^(m-1)) Σ_i W_ij^m δ_i^m    (4)
The error is back-propagated in this way layer by layer (m, m-1, ..., 1).
⑥ Update the weights and thresholds of each layer backward (Equations 5 and 6):
W_ij^m(t+1) = W_ij^m(t) + η δ_j^m y_i^(m-1) + α [ W_ij^m(t) - W_ij^m(t-1) ]    (5)
θ_j^m(t+1) = θ_j^m(t) + η δ_j^m + α [ θ_j^m(t) - θ_j^m(t-1) ]    (6)
where t is the iteration index, η is the learning rate (η ∈ (0, 1)), and α is the momentum constant (α ∈ (0, 1)).
⑦ Take the next training sample and repeat steps ② to ⑦ until the error (Equation 7) reaches the required precision:
E_k = (1/2) Σ_j (T_j^k - y_j^m)^2    (7)
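A minimal Python sketch of this training loop for a single hidden layer is given below (not the authors' code): the sigmoid, the delta rule of Equations (3)-(4), and the momentum updates of Equations (5)-(6) are implemented directly, while the network size and data are placeholders.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def train_bp(X, T, hidden=5, eta=0.85, alpha=0.60, epochs=10000):
    """Train a (n_in, hidden, n_out) BP network with momentum."""
    rng = np.random.default_rng(0)
    n_in, n_out = X.shape[1], T.shape[1]
    W1 = rng.uniform(-1, 1, (n_in, hidden));  b1 = rng.uniform(-1, 1, hidden)
    W2 = rng.uniform(-1, 1, (hidden, n_out)); b2 = rng.uniform(-1, 1, n_out)
    dW1 = np.zeros_like(W1); db1 = np.zeros_like(b1)
    dW2 = np.zeros_like(W2); db2 = np.zeros_like(b2)
    for _ in range(epochs):
        for x, t in zip(X, T):
            # forward pass, Eq. (2); F'(s) = y*(1-y) for the sigmoid
            y1 = sigmoid(x @ W1 + b1)
            y2 = sigmoid(y1 @ W2 + b2)
            # output-layer and hidden-layer errors, Eqs. (3)-(4)
            d2 = y2 * (1 - y2) * (t - y2)
            d1 = y1 * (1 - y1) * (W2 @ d2)
            # weight/threshold updates with momentum, Eqs. (5)-(6)
            dW2 = eta * np.outer(y1, d2) + alpha * dW2; W2 += dW2
            db2 = eta * d2 + alpha * db2;               b2 += db2
            dW1 = eta * np.outer(x, d1) + alpha * dW1;  W1 += dW1
            db1 = eta * d1 + alpha * db1;               b1 += db1
    return W1, b1, W2, b2

# Example with toy normalized data (one input, one output):
W1, b1, W2, b2 = train_bp(np.array([[0.2], [0.5], [0.8]]),
                          np.array([[0.3], [0.6], [0.9]]), epochs=2000)
```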
After the training procedure of the neural network, the forecast information can be analyzed with the resulting weights and thresholds.
3. Results and Discussions
We use the peak stages at the upper and lower reaches stations on the Dadu River of China (without tributary rivers) as an example to describe flood forecasting with neural networks. Table 1 lists the peak stages of 16 measured floods at the upper and lower reaches stations [6]. We use the peak stage of the upper reaches station as the input variable and the peak stage of the lower reaches station as the output variable, so the input and output layers each have one node. The hidden layer is given 5 nodes according to the Kolmogorov law, and the topology of the flood-forecast neural network is therefore (1, 5, 1). For the speed of convergence, it is necessary to normalize the original data H_i (Equation 8):
H_i' = (H_i - H_min) / (H_max - H_min)    (8)
where H_max and H_min are the maximum and minimum values of the peak stage. From Equation 8, H_i' lies in the interval [0, 1]. After the original values were normalized and fed to the BP network, training was carried out with learning rate η = 0.85 and momentum constant α = 0.60. Samples 1-12 were used as training samples and samples 13-16 as test samples in order to test the trained network. After 100 thousand training iterations on the training samples, the network error was E = 0.006, below the expected precision, so the network converged. The fitted results are shown in Table 1, where the mean error e is 0.08 m and the maximum error e_max is -0.21 m.
Table 1. Peak stages at the upper and lower reaches stations on the Dadu River and the calculated results (unit: m).
NO. | Upper peak stage | Lower peak stage | Output | Fit value | Error
Training sample
1  | 2280.96 | 1309.75 | 0.5552 | 1309.76 | 0.01
2  | 2283.21 | 1311.05 | 0.9574 | 1311.12 | 0.07
3  | 2277.93 | 1307.88 | 0.0217 | 1307.95 | 0.07
4  | 2278.16 | 1308.04 | 0.0389 | 1308.01 | -0.03
5  | 2280.41 | 1309.44 | 0.4716 | 1309.47 | 0.03
6  | 2280.59 | 1309.39 | 0.4991 | 1309.57 | 0.18
7  | 2281.93 | 1310.30 | 0.7315 | 1310.35 | 0.05
8  | 2279.68 | 1309.05 | 0.3316 | 1309.00 | -0.05
9  | 2283.00 | 1310.93 | 0.9328 | 1311.03 | 0.10
10 | 2280.97 | 1309.94 | 0.5472 | 1309.73 | -0.21
11 | 2283.09 | 1311.26 | 0.9433 | 1311.07 | -0.19
12 | 2281.54 | 1310.09 | 0.6525 | 1310.09 | 0.00
Test sample
13 | 2280.99 | 1309.74 | 0.5599 | 1309.77 | 0.03
14 | 2282.00 | 1310.44 | 0.7491 | 1310.41 | -0.03
15 | 2281.59 | 1310.12 | 0.6644 | 1310.13 | 0.01
16 | 2278.21 | 1308.02 | 0.0436 | 1308.03 | 0.01
Because of this learned functional relationship between input and output, obtained through the "simulation" and "memory" of the trained network, we can use the network to forecast the peak stage at the lower reaches station. The results are shown in Table 1, where the test errors e are 0.03, -0.03, 0.01 and 0.01 m respectively. For flood motion there exists a relationship y = f(x) between the dependent variable y and the independent variable x, but it is practically impossible to describe this relationship with an explicit function, even for the simplest case of the peak stage of a non-tributary river [H_lower = f(H_upper)]. Therefore, we adopt the "memory" and "simulation" of neural networks for flood forecasting, because a neural network can learn and record the complex input-output relationship through training on historical data, without any mathematical model. In fact, neural network technology can be used to analyze both simple and complex situations in flood forecasting as follows:
(1) Forecast of the related peak stage (discharge) based on the parameters of different factors (equivalent to the related-variable graphics). These factors include the simultaneous stage (discharge) of the lower reaches station, the flood difference of the upper reaches station (discharge), the precipitation of the interval area, the lower backwater, and multiple tributary inflows, etc.; they are used only as input variables. Table 2 shows the forecast results for three peak stages at the Gushui and Shigou Stations on the Suijiang River in China. The first, second and third input variables are the peak stage at Gushui Station H_Gushui, the stage H_s at Shigou Station simultaneous with the Gushui peak, and the precipitation P of the interval area; the output variable is the peak stage at Shigou Station H_Shigou [6]. The mean error e is 0.07 m, the maximum error e_max is 0.19 m, and the forecast errors e of the three peak stages are 0.06 m, 0.06 m and -0.09 m respectively, so Table 2 shows satisfactory results.
Table 2. Forecast of the peak stages at the Gushui and Shigou Stations on the Suijiang River (unit: m).

NO. | H_Gushui | H_s | P (mm) | H_Shigou | Output | Value | Error
78 | 30.26 | 12.34 | 44  | 13.36 | 0.2416 | 13.42 | 0.06
79 | 30.22 | 13.15 | 72  | 13.70 | 0.3136 | 13.76 | 0.06
80 | 32.19 | 14.68 | 136 | 15.44 | 0.6597 | 15.35 | -0.09
(2) Forecast of the propagation time of the flood peak based on the parameters of the factors. The input variables are the same as above, but the output variable is the propagation time of the flood peak. (3) Forecast of precipitation-runoff from the factors. The input variables include the precipitation of each single station P_i (i = 1, 2, ..., n), the mean areal precipitation, the antecedent precipitation P_a, the water storage of the drainage basin before the precipitation W_0, the duration of the rainfall T, the rainfall intensity, the evaporation in the interval of the precipitation, and the flood discharge Q_0, etc. The output variable is the runoff. (4) Flood routing through a reservoir. The input variables include the precipitation of each single station within the time interval P_i (i = 1, 2, ..., n), the mean areal precipitation within the time interval, the mean discharge flowing into the reservoir within the time interval, and the previous water stage within the time interval, etc. The output variables are the stage at the front of the reservoir at the end of the time interval and the mean discharge flowing out of the reservoir within the time interval.
Neural network technology can also be used to analyze the relationships of flood peak-volume and stage-discharge. In this research the historical data were analyzed and computed, and the results show that the forecasts are satisfactory when the parameters are selected reasonably.
4. Conclusions
As described above, it is practically impossible to describe the relationship between the independent variable x and the dependent variable y with explicit functions for flood motion, and neural network technology can be used as an alternative method to solve this problem. An artificial neural network completes the information processing of the network through the interaction of its neural cells; the mapping between stimuli and estimated responses of input and output is obtained through combinations of nonlinear functions. It has advantages in self-learning, self-organizing, self-adapting and fault tolerance, and is well suited to application in flood forecasting, as this research has demonstrated. In this research the authors proposed a new flood forecast system and developed new software with related applications based on neural network methods, which showed good performance and efficiency. Nevertheless, there still exist some problems for further study on the flood forecast, and it is expected that this system will become more sensitive and perform even better for flood forecasting with neural networks.
References
1. U. R. Acharya and P. S. Bhat. Classification of heart rate data using artificial neural network and fuzzy equivalence relation. Pattern Recognition, 36(1), 61-68 (2003).
2. M. Amitabha. Application of visual, statistical and artificial neural network methods in the differentiation of water from the exploited aquifers in Kuwait. Hydrogeology Journal, 11(3), 343-356 (2003).
3. L. Konstantin and G. L. Norman. Application of an artificial neural network simulation for top-of-atmosphere radiative flux estimation from CERES. Journal of Atmospheric and Oceanic Technology, 20(12), 1749-1757 (2003).
4. J. C. Zhou, Q. S. Zhou and P. Y. Han. Artificial Neural Networks, the Sixth Generation Computation Accomplishment. Scientific Popularization Publisher, 47-51 (1993).
5. G. M. Brion and S. Lingireddy. Artificial neural network modelling: a summary of successful applications relative to microbial water quality. Water Science and Technology, 47(3), 235-240 (2003).
6. H. L. Li, L. C. Li and J. F. Yan. Hydrological Forecast. China Waterpower Press, 10-13 (1979).
INTEGRATED MANAGEMENT PATTERN OF MARINE SECURITY SYNTHESIS RISK
WANG YUE
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
REN XUE-HUI
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
DING YONG-SHENG
Institute of Marine Science, Shanghai Marine University, Pudong Road 1550, Shanghai, China
YU CHANG-YING
Department of Public Administration, Dongbei University of Finance and Economics, Jianshan Street 217, Dalian, Liaoning, China
The meaning of marine security is interpreted from the point of view of synthesis risk, and a management system is designed by integrating 4S (GIS, GPS, RS and DSS). On this basis, an integrated management pattern of marine security synthesis risk is provided, with the management system as the management tool. Through risk calculation and fuzzy appraisal, the harm and loss brought by a risk can be estimated, and forecasting or prewarning of the risk can also be achieved. Through the response mechanism of emergency management, responding to and reducing the risk influence can be realized. The whole process depends on the system that has been built.
1. Introduction
Security and risk are two aspects of one thing, and it is necessary to strengthen the management and prevention of risk for the sake of security.¹ The sea, an important constituent of the global life system, is not only a strategic development base of biological resources, water resources, mineral resources and sea energy, but also a new space for the sustainable development of humanity.² However, while people obtain rich economic benefits, the sea suffers serious safety threats at the same time. Therefore, it is urgent to manage the sea with scientific methods, reduce the synthesis risk coefficient of marine
security, and realize the sustainable development of the marine economy.
2. Marine Security Synthesis Risk
2.1. Marine Security
Marine security is not limited to the security of resources and environment; it also involves all kinds of social relations and social problems related to marine resources and the environment,³ especially territory security and national sovereignty. Marine security can be divided into territory security, environment security, resource security, marine activity security and so on.
Territory security. The sea is usually taken as an extension of the national territory, and its security directly concerns national sovereignty and territorial integrity. As the front platform of national security, whether or not an incident has occurred, marine security leaves various hidden troubles, including significant war harm, sea disasters and so on.*
* DING De-wen, LV Ji-bin, etc. Think about the National Marine Ecological Environment Security. Report on Marine Economic Geography Forum in 2004. Dalian. 2004, 9.
Resource security. At present, marine resources are strongly affected and destroyed in the course of development. Take petroleum as an example: scientists have roughly estimated that about 500,000 tons of petroleum enter the world's seas by natural processes every year, but the petroleum discharged by human activities such as petroleum transport, ship accidents and marine petroleum mining greatly surpasses this number.
Environment security. Environmental pollution originates mainly from two aspects: life activities and production activities. Pertinent data indicate that the larger fishery pollution and damage accidents in China exceeded 1000 in 2001, with direct economic losses amounting to 350 million Yuan.⁴ Increasingly serious pollution has brought extremely disadvantageous consequences to the marine environment.
Marine activity security. Three kinds of marine activity mainly influence marine security: sea transportation, sea traveling and marine production. In the process of transportation, besides the ships' own fuel polluting the sea, accidents such as petroleum leaks, dangerous-material explosions and random dumping of waste may also create security threats. Sea traveling seriously influences and destroys marine resources and pollutes the marine environment. As the main arena of marine cultivation, people face security risks not only from production but also
from the sea itself.
2.2. Synthesis Risk Management
At present, Chinese marine security synthesis risk management is still in its infancy. The existing risk management pattern has the following characteristics:⁵ 1) Each individual or department often independently adopts certain countermeasures against risk, lacking a systematic and global character. 2) Risk management is basically in a passive position. 3) Risk management is often instantaneous and discontinuous; management is carried out only when the risk arrives. 4) Systematic and scientific theory and methods to guide risk management are lacking.
3. Integrated Management of Marine Security Synthesis Risk
The integrated management pattern of marine security synthesis risk should contain such parts as monitoring, prewarning, appraisal, emergency response, administration and so on; it should also involve people, wealth, material, etc.
3.1. Integrated Management Idea
The integrated management idea of marine security synthesis risk is shown in Fig. 1.
Fig. 1 Management idea (marine security synthesis risk: monitoring of the risk course; risk assessment, risk calculation and risk forecasting; promulgating prewarning alarms; creating and implementing the pre-planned scheme; corresponding departments combining for emergency response via the Internet; tactics of rescue and recovery; continual monitoring of risk and loss)
3.2. Integrated Management Tool
The management and emergency decision-making system for marine security synthesis risk has been constructed by seamlessly integrating 4S: Geographic Information System (GIS), Three-dimensional Visible System (VS), Expert System (ES) and Decision-making Support System (DSS). MapInfo 7.0 is taken as the foundational platform, SQL Server 2000 stores the resource entity data, and secondary development was carried out with VC++ under Windows XP. A C/S + B/S structure is applied as the system running environment, with a server (SUN 3000, 450) as the hardware. The system mainly has the functions of risk monitoring, risk appraisal and prewarning, management decision, and emergency response.
3.3. Systemic Database Design
The systemic database design is shown in Table 1.
Table 1. Design of database

Database | Data table | Line name
User | Usertable | ID, User's Name, Password, Create Date
Data of Study Area | Environmental Foundational Information; Online Information; Archives Information | Atmosphere, Seawater, Biota, Weather, Human Activity, Public Security, Medical Treatment, Traffic, Salvage, Army, Community, Input Record, Output Record, Historic Files, Policy Files
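A minimal sketch of how the user table in Table 1 might be created, using Python's built-in sqlite3 purely for illustration (the actual system stores its data in SQL Server 2000; the column names follow Table 1 and are otherwise assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # stand-in for the SQL Server 2000 database
conn.execute("""
    CREATE TABLE Usertable (
        ID          INTEGER PRIMARY KEY,    -- line names taken from Table 1
        UserName    TEXT NOT NULL,
        Password    TEXT NOT NULL,
        CreateDate  TEXT
    )
""")
conn.execute("INSERT INTO Usertable (UserName, Password, CreateDate) VALUES (?, ?, ?)",
             ("admin", "secret", "2005-09-01"))
for row in conn.execute("SELECT ID, UserName, CreateDate FROM Usertable"):
    print(row)
conn.close()
```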
3.4. Integrated Management Pattern
3.4.1. Risk Monitor
With technology developing unceasingly, people are already able to monitor and forecast security risks. The system can monitor marine disaster risk. Furthermore, the probability that a disaster may occur can be forecasted by the fuzzy judgment model built into the system. The risk computation is:
T(q) = [ t_ij(q) ]  (i = 1, ..., n; j = 1, 2, 3, 4),    W = (ω_1(q), ω_2(q), ω_3(q), ω_4(q))
Here t_ij(q) (i = 1, 2, ..., n; j = 1, 2, 3, 4) expresses the probability that a disaster of rank j occurs in sea area i when the disaster factor P_m appears. Based on this result, the superintendent may forecast the occurrence of disaster risk and arrange preparation at an earlier time.
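As a rough illustration of this fuzzy judgment step, the Python sketch below aggregates a judgment matrix [t_ij(q)] into grade probabilities; the sea-area weights and membership values are hypothetical, and since the paper does not specify the aggregation operator, a weighted average is assumed (a max-min composition is also shown for comparison).

```python
import numpy as np

# Hypothetical judgment matrix t[i][j]: probability that a disaster of rank j
# (j = 1..4) occurs in sea area i (i = 1..n) when the disaster factor appears.
T = np.array([
    [0.10, 0.30, 0.40, 0.20],
    [0.05, 0.25, 0.50, 0.20],
    [0.20, 0.40, 0.30, 0.10],
])

# Hypothetical weights of the n sea areas (assumed, not from the paper).
A = np.array([0.5, 0.3, 0.2])

# Weighted-average aggregation into W = (w1(q), w2(q), w3(q), w4(q)).
W = A @ T
print(W)

# Max-min composition, common in fuzzy comprehensive evaluation:
W_maxmin = np.max(np.minimum(A[:, None], T), axis=0)
print(W_maxmin)
```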
3.4.2. Risk Estimation and Prewarning
In order to alleviate or reduce the safety and economic threats brought by an accident or disaster, the risk harm degree and loss must also be appraised using the systematic management tool:
π_L(l, p) = sup_{d ∈ D} { r^(3)(d, β, l) ∧ π_D(d, p) ∧ π_B(β) }
where π_D(d, p) is the fuzzy risk of the object of the hazard effect, and r^(3)(d, β, l) is the relation between destructiveness and loss degree,
the working efficiency may be improved effectively. However, because the area concerned with risk management is wide and many departments are involved, the pattern needs to be perfected and supplemented when it is applied in practice.
References
1. Ren XH, Wang Y. Design of the administrative decision system of marine environmental disasters' emergency & rescue. Proceedings of the Fifth Annual IIASA-DPRI Forum on Integrated Disaster Risk Management: Innovations in Science and Policy, 2005.9; BNU, NDRCC, DPRI, IIASA.
2. Jia YQ, Li R. Marine tour and marine tour resource class. Ocean Development and Management 2005; 2: 77-81.
3. Zhang XL. A brief research on the causes and mechanism in tour security accidents. Economic Geography 2003; 1: 542-546.
4. Zhao JH. Investigation and analysis of fishery production and ecological environment. Chinese Marine Lives 2004; 2: 28-29.
5. Yang WM, Yang ND, Jiang JJ. Study on integrated risk management pattern and references of Canada. Chinese Science and Technology Forum 2004; 3: 133-137.
6. Huang CHF. Risk assessment of natural disaster theory & practice. Beijing: Science Press, 2005: 122-123.
7. Xie HX, Hu QH. Research and development of environment emergency management. Environment Pollution and Prevention 2004; 26(1): 44-45.
8. Ren XH, Wang Y. Designing the system of precaution and emergency rescue decision for touring safety accidents of coastal city. Progress in Geography 2004; 24(4): 123-128.
RISK ANALYSIS AND MANAGEMENT OF URBAN RAINSTORM WATER LOGGING IN TIANJIN
HAN SUQIN
College of Environmental Science and Engineering, Nankai University, 94 Weijin Road, Hexi District, Tianjin 300071, China; Tianjin Meteorological Institute, Tianjin 300074, China
XIE YIYANG
Tianjin Meteorological Institute, Tianjin 300074, China
LI DAMING
College of Civil Engineering, Tianjin University, Tianjin 300072, China
Abstract: This paper analyzes the urban rainstorm water logging disaster in Tianjin city based on statistics and numerical simulation. Firstly, the basic theory of the urban rainstorm water logging mathematical model is introduced and used to simulate various rain processes according to the features of the rainstorm and the draining rules. Secondly, the water logging disaster distribution and its influence on traffic are preliminarily evaluated. Finally, some management and mitigation measures for the urban rainstorm water logging disaster are discussed.
Key words: urbanization, urban rainstorm water logging, disaster, mathematical model, risk analysis
1. Introduction
On Dec. 11, 1989, the United Nations declared the last ten years of the 20th century the International Decade for Natural Disaster Reduction, and the reduction of natural disasters has attracted worldwide attention. Flood and water logging is the most predominant disaster, whose threats and damages are the most striking. Most countries and regions in the world, such as the US, Canada, Japan and China, have been suffering from flood and water logging hazards. Flood hazards are caused by many factors such as rainstorms, mountain torrents and the bursting of dams, among which the rainstorm is the most frequent factor with the biggest damage [1]. Since the industrial revolution, many countries have stepped up the process of urbanization, which has brought economic prosperity while at the same time raising the risk of flood hazards [2-3]. Tianjin, a mega-city in north China, is a region with a high risk of urban storm water logging. Human activity changes the characteristics of urban hydrology as the local economy develops at high speed. With the reduction in the number of ponds and lakes in the area, their capacity for regulating and storing
rain water has weakened. The increase of concrete area also reduces the infiltration of rain water into the soil, and the resistance coefficient of the flow is lowered when rain water flows along streets. Furthermore, the urban heat island effect increases the frequency and intensity of rainfall in the city. The old design standards can no longer meet the needs of supplying and draining water in the city. Urban rainstorm water logging may have serious impacts on people's life, e.g. traffic jams, flooded houses, etc.; thus risk analysis for urban water logging plays an important role in urban planning and sustainable development [4-5]. This paper analyzes the distribution of the rain grades in Tianjin based on the mathematical model of urban water logging developed by the Tianjin Meteorological Institute and Tianjin University; the accumulated surface water under different rainfall intensities is simulated, and the risks of urban water logging are analyzed and estimated.
2. Risk Analysis of Urban Rainstorm Water Logging
The methods for assessing disasters from rainstorm water logging include probability, investigation and statistics, and mathematical simulation. With the development of computer technology, mathematical models have been used more and more frequently to assess the disasters. In the model of this research, the observed rainfall or radar products are interpolated to each grid cell to obtain the areal precipitation, acting as dynamic raining boundary conditions, so the model can not only reduce the inaccuracy but also forecast storm water logging.
2.1 The Basic Theory of the Mathematical Model
The ground flows and river flows in the city are the main objects to be simulated, so the governing equations are the two-dimensional unsteady flow equations, i.e. the continuity equation and the momentum equations:
∂H/∂t + ∂M/∂x + ∂N/∂y = q    (1)
∂M/∂t + ∂(uM)/∂x + ∂(vM)/∂y + gH ∂Z/∂x + g n² u √(u² + v²) / H^(1/3) = 0    (2)
∂N/∂t + ∂(uN)/∂x + ∂(vN)/∂y + gH ∂Z/∂y + g n² v √(u² + v²) / H^(1/3) = 0    (3)
Here H is the water depth; Z = Z0 + H, where Z0 is the height of the underlying surface; q is the source and sink term, the sum of the effective rain intensity and the draining intensity; M and N are the flow discharges in the x and y directions, respectively; n is the roughness coefficient; and g is the gravitational acceleration. In the model a non-structural irregular grid technique is adopted to divide the whole computation area into cells; a grid cell may be a triangle, a quadrangle or a pentagon. The boundaries around the cells are defined as passages, and their normal direction can be chosen arbitrarily. The continuity equation is discretized according to the finite volume method, while the momentum equations are simplified and then discretized, and a time-alternating method is adopted to calculate water depths and discharges. The waterproof constructions in the city, such as buildings, levees and railroads, are simplified and simulated as continuous or gapped dikes lying on the passages. Discharges through this type of passage are calculated by the broad-crested weir flow equation, that is,
Q_t = m σ_s √(2g) H_w^(3/2)    (4)
where Q_t is the discharge per unit width, m is the discharge coefficient, σ_s is the submergence coefficient, and H_w is the water depth at the weir crest.
2.2 Frequency Distribution of Rain Grade in Tianjin City
In the analysis, disasters from different rain grades receive particular attention, so the paper analyzes the distribution features of the rainfall grades in Tianjin. For convenience, precipitation above heavy rain is divided into seven grades of 25 mm each. The frequency distribution of the rain grades is obtained from the observed rainfall from May 1918 to October 1998 (Table 1), and it shows a good linear relation between the logarithm of the rain frequency and the rain grade (Fig. 1). The empirical expression relating frequency and rain grade is obtained through further analysis:
Ln(N) = 6.623028 - 0.9906966 × D    (5)
where D is the rainfall grade and N is the frequency of that grade. The return period of the rainstorm (T) is the reciprocal of the storm probability (P):
T = 1 / P    (6)
With the above expressions we can calculate the frequency and the return period of each rainfall grade, as sketched below.
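A small Python sketch of this calculation (not from the paper): it evaluates Equation (5) for each grade and converts the result into a return period, here assuming the 81-year record length so that T = 81/N, which reproduces the order of magnitude of the values in Table 1.

```python
import math

RECORD_YEARS = 81           # May 1918 - October 1998, as stated in the text
GRADE_RANGES = ["25-49.9", "50-74.9", "75-99.9", "100-124.9",
                "125-149.9", "150-174.9", "175-199.9"]

for D, label in enumerate(GRADE_RANGES, start=1):
    N = math.exp(6.623028 - 0.9906966 * D)   # Equation (5): fitted frequency
    T = RECORD_YEARS / N                     # return period in years (assumed T = record / N)
    print(f"grade {D} ({label} mm): N = {N:6.2f}, T = {T:7.3f} a")
```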
Table 1. Frequency Distribution of Rain Grade in Tianjin City

Precipitation (mm) | Frequency (N) | Calculated frequency (N) | Return period (a)
25-49.9   | 327 | 279.3 | 0.2477
50-74.9   | 96  | 103.7 | 0.78
75-99.9   | 38  | 38.5  | 2.103
100-124.9 | 10  | 14.3  | 5.664
125-149.9 | 7   | 5.3   | 15.255
150-174.9 | 2   | 1.97  | 41.083
175-199.9 | 0   | 0.732 | 110.642
Table 1 shows that the calculated frequency of each rain grade is similar to the observed one, so the fitted equation can be applied to forecasting extreme storm rainfalls. Grade 2 (50-74.9 mm) appears on average more than once a year, while grade 6 (150-174.9 mm) appears about once every 40 years. Although grade 7 (175-199.9 mm) never appeared in the 81-year record, its occurrence cannot be excluded, and its return period is estimated as about 110 years.
2.3 Evaluating the Schemes of Rainstorm Water Logging
Based on the above analysis, schemes for calculating the maximum water depth in Tianjin are established as follows: (1) Intensive precipitation processes are chosen to simulate the maximum water depths for grades 2, 3, 4, 5 and 6 respectively. The selection principle is that hourly precipitation equal to or greater than 20 mm occurs at least once, because the rainfalls that cause urban water logging disasters are short-duration precipitation. (2) Because grade 7 rainfall never appeared in the 81 years of statistical data, it is assumed that 180 mm of rainfall is distributed evenly over 9 hours when simulating the grade 7 rainfall process. (3) The degrees of water logging in different areas are compared on the assumption that the rainfall is distributed uniformly over the area of Tianjin city; although precipitation is not actually distributed uniformly, rainfalls of similar grade may occur anywhere from the viewpoint of risk analysis. Note: the simulation experiments did not take into consideration the influence of secondary circulation on the precipitation distribution.
2.4 Analysis of the Results
The simulation results reveal that the rainfall intensity results in urban water logging disasters of different degrees in different regions. When the rainfall increases
in intensity, the extent of the disaster expands in most regions and the disaster risk is enhanced as well. The distribution of water logging risk for urban rainfalls in Tianjin is shown in Table 2. The disaster risks are mainly in classes I and II; class III risk does not appear until the rainfall amount reaches 150 mm. The return period for rainfall amounts of 50-124.9 mm is 1-5 years, so these risks occur easily; the corresponding ratio of class II risk usually reaches 19-36%, so domestic life and city transportation suffer seriously, and drainage becomes very important for reducing the disasters.
Table 2. Distribution of Water Logging Risk for Urban Rainfall in Tianjin

Disaster extent | 50-74.9 mm | 75-99.9 mm | 100-124.9 mm | 125-149.9 mm | 150-174.9 mm | 175-199.9 mm
I   | 80.5% | 69.4% | 63.9% | 25%  | 16.7% | 19.4%
II  | 19.5% | 30.6% | 36.1% | 75%  | 80.5% | 75%
III | 0     | 0     | 0     | 0    | 2.8%  | 5.6%
3. Loss Evaluation of Storm Water Logging's Impact on Traffic
Along with the expansion of the city, the task of reducing urban rainstorm water logging disasters becomes more and more important, so it is necessary to carry out risk analysis in order to alleviate the damage of the disasters. Different urban rainstorm water loggings cause different disasters, which can be indicated by the maximum water depth and the subsiding time of the storm water logging. In this paper, 0.3 m is adopted as the depth threshold for affecting traffic and 0.8 m as the depth for complete interruption. The subsiding time can be separated into several grades: 1-3 h, 3-8 h, 8-24 h, and >24 h. Because long-duration storm water logging has seldom appeared in recent years thanks to the improved drainage capability, only the maximum water depths are considered in establishing the classes of disaster extent (Table 3).
Table 3. Classification of Disaster Extent

Class of disaster | Calculated water depth | Influence on transportation
0   | <0.05 m    | No influence
I   | 0.05-0.3 m | Traffic jam (cars can drive, but speed is reduced)
II  | 0.3-0.8 m  | Partly interrupted
III | >0.8 m     | Completely interrupted
With help from the Tianjin Transportation Bureau, we investigated the traffic discharges on the main highways, calculated the accepted disaster index (Eq. 7), and divided the acceptability into 3 grades (Table 4):
index = 1 - auto discharge ÷ maximum discharge    (7)

Table 4. Acceptability of the Main Highways in Tianjin City

Acceptability grade | Accepted index | Car discharges
I   | >0.6    | <4000/hour
II  | 0.3-0.6 | 4000-7000/hour
III | <0.3    | >7000/hour
The risk analysis of water logging on traffic is based on the acceptabilities and the disaster extents. The losses from the disasters are calculated by the following expression:
Losses from disasters = Class of disaster extent × Acceptability grade    (8)
Fig. 1: Assessment of water logging losses to traffic (loss index, 0-10, versus grade of disaster, 0-3, shown for acceptability levels 1-3)
4. Managing the Urban Storm Water Logging Disaster
It is important to strengthen defensive construction to alleviate water logging disasters. This work should begin with city planning and improve the acceptability of the city. The suggestions are as follows: (1) rebuild the urban drainage system; (2) avoid filling lakes for new land and preserve natural lakes as far as possible; (3) build reservoirs to alleviate the burden on the rivers; (4) forbid over-extraction of groundwater and avoid ground subsidence; (5) provide enough underground facilities to ensure the safety of personnel and property; (6) increase green land, which ameliorates the microclimate of the city.
5. Conclusions
The reduction of urban rainstorm water logging is of great importance to sustainable development. This paper analyzes the risk of urban rainstorm water logging facing Tianjin city through probability, investigation and statistics, and numerical simulation. Firstly, the basic theory of the urban rainstorm water logging mathematical model was introduced and used to simulate various rain processes according to the features of the rainstorm and the draining rules. Secondly, the water logging disaster distribution and its effects on traffic were preliminarily evaluated. Finally, some management and mitigation measures for the urban rainstorm water logging disaster were discussed.
References
1. HUANG Ping and ZHAO Jiguo, A study on distributed hydrological model of basin and applied prospect [J]. Hydrology, (5): 5-9 (1997, in Chinese).
2. Qiu Jinwei, Chen Hao, Liu Shukun, Urbanization and urban flood in Shenzhen city [J]. Journal of Natural Disasters, 7(2): 67-73 (1998, in Chinese).
3. M. H. Hsu, S. H. Chen and T. J. Chang, Inundation simulation for urban drainage basin with storm sewer system [J]. Journal of Hydrology, 234(2): 21-37 (2000).
4. QIU Jinwei, LI Na, CHENG Xiaotao, XIA Xiangao, The simulation system for heavy rainfall in Tianjin City. Journal of Hydraulic Engineering, (11): 34-42 (2000, in Chinese).
5. Li Daming, Zhang Hongping, Li Bingfei et al., Basic theory and mathematical modeling of urban rainstorm water logging. Journal of Hydrodynamics, Ser. B, 16(1): 17-27 (2004).
STUDY ON ENVIRONMENTAL RISK INFLUENCE FACTOR OF TONGLIAO*
REN XUE-HUI
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
LI YUAN-HUA
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
TIAN HONG-XIA
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
WANG YUE
College of City & Environment, Liaoning Normal University, Huanghe Road 850, Dalian, Liaoning, China
Tongliao faces such environmental risk problems as soil erosion, land desertification and so on. The factors affecting the region's environmental risk are analyzed quantitatively using the principal components analysis method. The results indicate that the dominant factors causing the environmental risk are industrial gross output, sewage treatment rate, chemical fertilizer application, cultivated area and so on; the environmental risk is the joint result of natural and human factors, with the human factor being the more important; and both natural and human factors influence the environment through the threatening function.
1. Introduction
Tongliao is located at the center of the Horqin sandy land in Inner Mongolia, in the interlaced farming and animal husbandry area of northeast China. Its environmental resource system displays strong sensitivity and inherent instability under natural and artificial perturbation; the ecological equilibrium often suffers destruction, and environmental degeneration is obvious.¹ With industry and agriculture developing rapidly, the irrational use of water and land resources has become even more serious, and drought occurs frequently.
* This work is supported by the National Natural Science Foundation of China, No. 40271051.
This makes the environmental risk problems prominent, such as the dropping of the groundwater level, the destruction of vegetation, and the sanding and salinization of land,² and the sustainable development of the regional environment has been seriously threatened. As Tongliao is a typical area of the northeast farming and animal husbandry interlaced zone, the dominant factors are obtained by analyzing its environmental risk influence factors, which can provide a scientific basis for environmental risk appraisal and management in this region.
2. Characteristics of Environmental Risk
2.1. Shortage Risk of Water Resources
2.1.1. Quantitative Shortage of Water Resources
According to the relevant statistics,³ the total quantity of Tongliao's surface water resources is 264 million cubic meters, of which the available water comprises approximately 30%-50%. The groundwater resource is only 352 million cubic meters, of which the exploitable quantity is only 17-21 hundred million cubic meters. The per capita water resource is 1460 m³, less than 2/3 of the national level, so Tongliao is one of the cities in China that is seriously short of water. Because of this situation, it is extremely sensitive to climatic change: once the precipitation decreases slightly, or the transpiration rate rises slightly, a serious insufficiency of water resources may be caused. Tongliao's aridity index is between 1.25 and 1.5; the yearly average rainfall is between 338 and 446.8 mm, but the yearly transpiration rate is between 1760 and 2000 mm, 3-6 times the precipitation, and the transpiration from February to May is 11.6 times the precipitation of the same period.¹ Since the founding of the nation, the temperature of Tongliao has risen obviously, at a rate of 0.32 °C/10a, while the yearly precipitation has been decreasing at a rate of 4.5 mm/10a.³ As a result of successive droughts the groundwater recharge has declined, while the water demand of daily life and industry increases and groundwater mining expands; therefore the groundwater level has dropped year by year, funnel areas have appeared in some places, and the ecological environment has worsened. Analysis of the dynamic data of 80 groundwater monitoring wells from 1990 to 2002 shows that the over-mined area of shallow groundwater amounts to 2400 square kilometers, accounting for 75% of the total area; within it, the level in the Horqin sandy land has dropped 10-15 m, and the other banners have all dropped 2-3 m.
2.1.2. Qualitative Shortage of Water Resources
Because the water environment is seriously polluted, the available water resources are decreasing quickly. With the expansion of industrial production, the wide use of chemical fertilizer and the leakage of domestic sewage, both the surface water and the groundwater of Tongliao suffer from pollution. For example, in 2003 the discharges of industrial waste water, waste gas and waste residue in Tongliao reached 1.07794 million tons, 254.78 cubic meters and 1,630,700 tons respectively, increases of 53.4%, 9.4% and 6.4% compared with 2001. In 2003 the chemical fertilizer application was 203,800 tons, 4.5 times that of 1990. With the water consumption of the population increasing, the sewage discharge also increases correspondingly; the multi-year mean leakage reaches 5,000,000 m³ per year.² Without doubt, water pollution has enlarged the contradiction of the regional resource shortage.
2.2. Soil and Vegetation Degeneration Risk
Tongliao's soils are made up of wind-sandy soil, marsh soil, meadow soil, saline-alkaline soil and chestnut soil. The wind-sandy soil is thick and easily cultivated, but its organic content is low, its texture is loose and its granular structure is poor, so it is extremely easily eroded by wind; Tongliao is thus one of the areas most seriously eroded by sand. Because of compound wind-water erosion, the eroded area in southern Tongliao reaches 23,483 km², accounting for 79.3% of the total area, and the yearly soil erosion modulus reaches 1000-8000 t per square kilometer.² The total area of Tongliao is 60,000 square kilometers, of which forest accounts for 14.45%, with low coverage and poor protective capability. The grassland accounts for 58.8% of the total area; the area is large but the quality is poor, with low livestock-carrying capacity, and the degenerated lawn area accounts for 45.4% of the total.⁵ According to the data,⁶ the sanded prairie area amounted to 1,558,800 hectares in 1997, accounting for 54.83% of the total prairie area, and by 2000 the total prairie area was 29.34 million acres with a degenerated proportion of about 70%. Along with the ceaseless increase of population, random cultivation, denudation and overgrazing to meet survival needs and the pursuit of economic benefit have sped up land desertification and prairie degradation and strengthened the environmental vulnerability.
3. Dominant Factors of Environmental Risk
3.1 Factor Selection
Through on-the-spot investigation and data analysis, it is found that Tongliao mainly faces two big environmental risks: water resource shortage, and soil and vegetation degeneration. After clarifying the main components of the environmental risk, risk factors that are easily quantified are selected (see Figure 1), following the principles of science, independence, arrangement, dynamics and maneuverability, and the dominant risk factors are then discovered using principal components analysis. Without losing the main, mutually independent information, principal components analysis can make a smaller number of objects reflect the information carried by the original variables, realizing the best synthesis and simplification of a high-dimensional variable system. Environmental risk is the probabilistic consequence of unfortunate events, caused by natural and human activities acting together, that destroy and harm human society and the natural environment through diffusion in environmental media, so it exists widely in human activities. At present, the biggest barrier to implementing environmental risk management is the lack of quantitative risk analysis results, so it is very important to analyze the risk factors quantitatively. A risk factor is a factor that increases the loss frequency or the loss degree; only by correctly appraising the dominant factors that cause environmental risk can risk prevention measures be adopted scientifically.
3.2 Risk Factors of Water Resource Shortage
Using SPSS 11.5 software, the 13 factors related to Tongliao's water resource shortage risk were analyzed by principal components analysis. The dominant factors affecting the water resource shortage risk can be identified through the loads on the newly synthesized principal components. The eigenvalues, contribution rates and accumulated contribution rates, and the principal component load matrix, are shown in Table 1 and Table 2. Analysis of Table 1 shows that the accumulated contribution rate of the first, second and third principal components reaches 90.141%; according to the principle that the accumulated contribution rate should reach 85%-95%, the principal components Z1, Z2 and Z3 can be extracted.
Figure 1. Tongliao environmental risk influence factors and their relationships: Industrial wastewater drainage (X1), Industrial waste solid (X2)(t), Per capita GDP (X3)(yuan), Industrial gross output (X4)(yuan), Sewage treatment rate (X5)(%), Effective irrigation area (X6)(hm²), Population per square kilometer (X7), Precipitation (X8)(mm), Annual average temperature (X9)(°C), Transpiration rate (X10)(mm), Relative humidity (X11)(%), Maximum wind speed (X12)(m/s), Accumulative temperature (X13)(°C), Applied quantity of chemical fertilizer (X14)(t), Cultivated area (X15)(hm²), First industry proportion (X16)(%), Mechanization of agriculture (X17)(kW), Forestation area (X18)(km²), Mowed grass area (X19)(hm²), Natural lawn area (X20)(hm²), Livestock count (X21)
Table 2 indicates that among the factor loads of the first principal component, industrial gross output (X4), transpiration rate (X10), per capita GDP (X3) and sewage treatment rate (X5) have the larger loads in the positive direction, with values of 0.979, 0.975, 0.970 and 0.849 respectively, while the loads in the negative direction are all small. In the second principal component the larger positive loads belong to accumulative temperature (X13) and industrial waste solid (X2), with values 0.852 and 0.796, and the negative loads are all small. In the third principal component the larger positive load belongs to annual average temperature (X9), with value 0.759, and the negative loads are small. By this analysis, the factor with the biggest load in the first principal component is industrial gross output, a human factor, whose contribution rate is 47.776%, while the biggest factors in the second and third principal components are the yearly accumulative temperature and the yearly average temperature, natural factors, whose contribution rates sum to 42.302%. Obviously, the water resource shortage risk is the joint result of natural and human factors, but the human factor occupies the leading position.
Tab. 1 Eigenvalues, contribution rates and cumulative contribution rates

Main component factor | Eigenvalue | Contribution (%) | Cumulative contribution (%)
Z1 | 6.211 | 47.776 | 47.776
Z2 | 3.519 | 27.072 | 74.848
Z3 | 1.988 | 15.293 | 90.141
Z4 | 0.772 | 5.936  | 96.078
Z5 | 0.247 | 1.901  | 97.979
Z6 | 0.159 | 1.221  | 99.200
Z7 | 0.104 | 0.800  | 100.000
* (Z8-Z14 are omitted)

Tab. 2 Principal component factor load matrix

X   | Z1     | Z2     | Z3
X1  | -0.652 | -0.396 | 0.564
X2  | -0.373 | 0.796  | 0.414
X3  | 0.970  | 0.083  | 0.223
X4  | 0.979  | 0.042  | 0.194
X5  | 0.849  | 0.059  | 0.313
X6  | 0.501  | 0.567  | -0.510
X7  | 0.786  | 0.489  | -0.206
X8  | -0.540 | 0.740  | 0.378
X9  | 0.181  | 0.452  | 0.759
X10 | 0.975  | 0.014  | -0.080
X11 | -0.514 | 0.768  | -0.116
X12 | 0.774  | -0.294 | 0.490
X13 | 0.146  | 0.852  | -0.176
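For readers who want to reproduce this kind of analysis, the following Python sketch (the authors used SPSS 11.5) computes eigenvalues, contribution rates and component loadings from a standardized data matrix; the data here are random placeholders standing in for the 13 yearly indicator series.

```python
import numpy as np

# Placeholder data matrix: rows = years, columns = the 13 indicators X1..X13.
rng = np.random.default_rng(1)
data = rng.normal(size=(14, 13))

# Standardize each indicator, then eigendecompose the correlation matrix.
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
R = np.corrcoef(Z, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]            # sort components by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

contribution = eigvals / eigvals.sum() * 100  # contribution rate (%)
cumulative = np.cumsum(contribution)          # cumulative contribution (%)
loadings = eigvecs * np.sqrt(eigvals)         # factor loads, as in Tab. 2

k = int(np.searchsorted(cumulative, 85) + 1)  # keep components up to roughly 85-95%
print("eigenvalues:", np.round(eigvals[:k], 3))
print("cumulative %:", np.round(cumulative[:k], 3))
print("loadings of first component:", np.round(loadings[:, 0], 3))
```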
3.3 Risk Factors of Soil and Vegetation Degeneration
The 14 factors influencing the soil and vegetation degeneration risk were analyzed in the same way; the eigenvalues, contribution rates and accumulated contribution rates of the principal components and the principal component load matrix are shown in Table 3 and Table 4. Analysis of Table 3 shows that the accumulated contribution rate of the first, second, third and fourth principal components reaches 91.460%, so the four principal components Z1, Z2, Z3 and Z4 are extracted. Analysis of Table 4 shows that the larger positive loads of the first principal component belong to the applied quantity of chemical fertilizer (X14), the transpiration rate (X10) and the mechanization of agriculture (X17), with values 0.945, 0.939 and 0.913 respectively, while the first industry proportion (X16) has the biggest negative load, -0.931. The larger positive loads of the second principal component belong to accumulative temperature (X13), cultivated area (X15) and
relative humidity (XI1) in positive direction, the value is separately 0.873, 0.709and 0.705, and the natural lawn area (X20) is biggest in the negative direction, the value is 0.775. The load of the third principal components is biggest in annual average temperature (X9) in positive direction, the value is 0.735, and the load of other factor values is all smaller in the negative direction. No factor of the fourth principal components load surpasses 0.500. The results of analyzing Table 3 and Table 4 indicate that the biggest value factor of the first principal components load is the chemical fertilization rate, which belongs to the human factor, the contribution rate is 46.299%, but the biggest factor of the second and the third principal components in the load value are separately yearly accumulative temperature and yearly average temperature, all of which belong to the natural factor, the contribution rate sum is 37.903%. All the factors that the fourth principal components factors load are smaller, total contribution rate is only 7.257%, so they do not affect the result. Obviously, the soil and the vegetation degeneration risk is also the result that the natural factor and human factor affected together, and the intensity that human factors acted is higher than the natural factors. Tab.3 Eigenvalue, variance, cumulative Main factor
Component   Eigenvalue   Contribution (%)   Cumulative contribution (%)
Z1          6.482        46.299             46.299
Z2          3.157        22.550             68.850
Z3          2.149        15.353             84.203
Z4          1.016        7.257              91.460
Z5          0.637        4.547              96.007
Z6          0.496        3.573              99.544
Z7          0.064        0.456              100.000
* (Z8-Z14 is omitted)
Tab. 4 Main component factor load matrix
      X8       X9       X10      X11      X12      X13      X14      X15      X16      X17      X18      X19      X20      X21
Z1    -0.498   -0.290   0.939    -0.561   0.853    0.079    0.945    -0.336   -0.931   0.913    0.718    0.647    -0.401   -0.647
Z2    0.558    0.247    0.115    0.707    -0.364   0.873    -0.049   0.709    -0.291   0.357    -0.042   -0.029   -0.775   -0.248
Z3    0.626    0.725    -0.205   0.253    0.291    0.076    0.299    -0.180   0.100    -0.104   0.284    0.550    0.329    0.622
Z4    -0.140   -0.464   0.146    0.273    0.152    0.412    -0.080   -0.200   0.054    0.089    -0.391   0.439    0.248    0.167
4. Conclusion
The quantitative analysis of the environmental risk influence factors of Tongliao indicates that the environmental risk results from natural and human factors acting together, with the human factor being the more important. The contribution rate of the human factor causing the water resource shortage risk reaches 47.776%, while that of the natural factor reaches 42.302%, which is lower; the contribution rate of the human factor causing the soil and vegetation degeneration risk reaches 46.299%, while that of the natural factor reaches only 37.903%, again lower than the human factor. The dominant factors causing the environmental risk include the industrial gross output, the sewage treatment rate, the chemical fertilization rate, the cultivated area and so on. These findings suggest that reducing or alleviating the environmental risk of Tongliao must start from human activity itself, so that the social economy and the environment develop in step, and they may provide a scientific basis for environmental risk management.
References
1. Wang YJ, Ao BC, Zang YH. Discuss genesic foundation and countermeasure of Tongliao deserted land. Inner Mongolia Forestry Investigation and Design (Supplement), 2002; 25: 31-43.
2. Zhang ZF, Wu WQ, Liu DSH. Situation and prevented idea of Tongliao land area soil-water erosion. Soil and Water Conservation in China, 2001; 6: 29-30.
3. Se YWLJ. Thinking on sustainable development of Tongliao City's economy. Journal of Inner Mongolia University for Nationalities (Social Sciences), 2002; 28(2): 67-71.
4. Zheng YSH, Li JG, Qiao JL. Utilized situation and continual developing way of Tongliao. Modern Agriculture, 2004; 5: 35.
5. Zheng SL, Ao TG, Sun CHF. Inner Mongolia forestry investigation and design. Inner Mongolia Forestry Investigation and Design, 2003; 26(4): 31-32.
6. Jia YP. The existing problems and suggestion of ecological environment protection and construction in West Liao River Basin of Tongliao City. Inner Mongolia Environmental Protection, 2004; 16(1): 38-40.
PRACTICAL RESEARCH OF THE FLOOD RISK BASED ON INFORMATION DIFFUSION THEORY*
XINGCAI ZHANG
Institute of Remote Sensing, Zhejiang Normal University, Jinhua 321004, China. E-mail: jhqxzxc@zjnu.cn
LIHUA FENG
Department of Geography, Zhejiang Normal University, Jinhua 321004, China
Because flood data series for small drainage basins are short, the data that can be used for flood risk analysis are insufficient (incomplete information), so the problem is one of risk analysis under small-sample conditions. One method for analysing problems of this kind is to treat the small sample as fuzzy information; the optimized fuzzy information can then be processed using information diffusion theory to obtain a more reliable result for risk analysis. Small samples supply only limited and incomplete information, from which statistical rules cannot be demonstrated clearly. Fortunately, such incomplete information, especially the fuzzy information supplied by a small sample, can be treated with the fuzzy information optimizing technology based on information diffusion theory. In this paper the risk analysis method based on information diffusion theory is used to develop a model for flood risk analysis, and the application of the model is illustrated taking the Jinhuajiang and Qujiang drainage basins of China as examples. The study indicates that the model gives a fairly stable analytical result even under small-sample conditions. The method is easy to apply, its result is easy to understand, and it may play a guiding role in disaster prevention to some extent.
1. Introduction
Flooding is a common natural disaster in China, one that causes serious damage and harm: it is estimated that about 40% of the total economic loss caused by all kinds of disasters is due to floods. Floods occur frequently and affect large areas, and hence constitute a huge threat to human life and property. Flood damage has become even more severe with the rapid economic development of recent years.
This work was supported by Zhejiang Provincial Science and Technology Foundation of China (No. 2005C23070).
Because flood data series for small drainage basins are short, the data that can be used for flood risk analysis are insufficient (incomplete information); this is risk analysis under small-sample conditions [1]. One method for analysing problems of this kind is to treat the small sample as fuzzy information. The optimized fuzzy information can then be processed using information diffusion theory to obtain a more reliable result for risk analysis [2]. Information diffusion theory helps to extract as much of the underlying useful information as possible and thus improves the accuracy of system recognition; the technology can therefore also be called fuzzy information optimized processing technology [3, 4]. In this paper the risk analysis method based on information diffusion theory [5] is employed to assess flood risks.
2. Risk Analysis Method Based on Information Diffusion Theory
Information diffusion is a fuzzy-mathematical process that treats the samples with the set-valued method: a single-valued sample is transformed into a set-valued sample. The simplest model of this kind is the normal diffusion model. Suppose the index universe of flood damage is represented as
$U = \{u_1, u_2, \ldots, u_n\}$   (1)
Then the information carried by a single-valued observation sample $y_j$ can be diffused to each point of the universe $U$ according to
$f_j(u_i) = \frac{1}{h\sqrt{2\pi}} \exp\left[-\frac{(y_j - u_i)^2}{2h^2}\right]$   (2)
Here $h$ is the diffusion coefficient, which is determined from the maximum value $b$, the minimum value $a$ and the number $m$ of samples in the sample set:
$h = \begin{cases} 1.4230\,(b-a)/(m-1), & m < 10 \\ 1.4208\,(b-a)/(m-1), & m \ge 10 \end{cases}$   (3)
If we let
$C_j = \sum_{i=1}^{n} f_j(u_i)$   (4)
then the corresponding membership function of the fuzzy subset can be represented as
$\mu_{y_j}(u_i) = f_j(u_i)\,/\,C_j$   (5)
The function $\mu_{y_j}(u_i)$ is called the normalized information distribution of the sample $y_j$ [6, 7]. A good result for risk analysis can be obtained by processing $\mu_{y_j}(u_i)$. Let
$q(u_i) = \sum_{j=1}^{m} \mu_{y_j}(u_i)$   (6)
The physical meaning of this function is the following: if the observed value of flood damage could only take one of the values $u_1, u_2, \ldots, u_n$, then, by diffusing the information of the observation set $\{y_1, y_2, \ldots, y_m\}$, the number of samples with observed value $u_i$ is estimated to be $q(u_i)$. The value of $q(u_i)$ is generally not a positive integer, but it is certainly a number no less than zero. Furthermore, let
$Q = \sum_{i=1}^{n} q(u_i)$   (7)
In fact, $Q$ is the sum of the sample numbers over all points $u_i$. It is then easy to see that
$p(u_i) = q(u_i)\,/\,Q$   (8)
is the frequency of samples falling at the point $u_i$, and it can be taken as an estimate of the probability. For the flood damage index set $X = \{x_1, x_2, \ldots, x_n\}$, the index universe in equation (1) is normally chosen as $X$, so that an element $u_i$ of the universe $U$ corresponds to $x_i$. The exceedance probability of $u_i$ is then
$P(u \ge u_i) = \sum_{k=i}^{n} p(u_k)$   (9)
The value of $P(u \ge u_i)$ is the value required for the risk assessment.
3. Flood Risk Analysis
The application of the risk analysis method to the evaluation of flood damage is illustrated taking the Jinhuajiang and Qujiang drainage basins in Zhejiang Province of China as examples. The Jinhuajiang River lies in the middle of Zhejiang Province, and its annual maximum peak discharge has a significant influence on industry and agriculture in the region. Table 1 lists the annual maximum peak discharges surveyed at the Jinhua hydrometric station from 1981 to 2004. According to the variation range of the annual maximum peak discharge, the interval [0, 10000] on the one-dimensional real line can be taken as the universe of $x_i$. This continuous universe is transformed into a discrete one by selecting equidistant points; considering the required calculation accuracy, 21 points were selected to form the discrete universe, which can be represented as
$U = \{u_1, u_2, \ldots, u_n\} = \{0, 500, 1000, \ldots, 10000\}$
The risk assessment values $p$ for the annual maximum peak discharge at the Jinhua hydrometric station, obtained with equations (2) to (9), are shown in Table 2.
Table 1. The annual maximum peak discharge Qm (m³/s) in the Jinhua hydrometric station in 1981-2004.
Year   Qm (m³/s)    Year   Qm (m³/s)
1981   1730         1993   3590
1982   1690         1994   3110
1983   1880         1995   2520
1984   1830         1996   2390
1985   1200         1997   3280
1986   1640         1998   2790
1987   2490         1999   3810
1988   2570         2000   4200
1989   3250         2001   1200
1990   1700         2002   3380
1991   1880         2003   1400
1992   3540         2004   2500
Table 2. Risk assessment value p for the annual maximum peak discharge in the Jinhua hydrometric station.
Risk level (m³/s)   p        Risk level (m³/s)   p
500                 1        3000                0.3662
1000                0.9999   3500                0.2530
1500                0.9394   4000                0.0768
2000                0.7384   4500                0.0136
2500                0.5756   5000                0
In the calculation the time unit is one year. The row for 3000 in the table therefore means that the probability that the annual maximum peak discharge at the Jinhua hydrometric station exceeds 3000 m³/s in any given year is p = 0.3662; in other words, an annual maximum peak discharge at this risk level can be expected to be encountered every two or three years (the return period equals 1/p). The Qujiang River lies in the middle west of Zhejiang Province. The Qujiang drainage basin has a climate abundant in light and heat and is therefore rich in various natural resources; however, the region is frequently hit by storms and floods. Here we take the Misai and Yanggang hydrometric stations as the
typical hydrometric stations of the upper and lower reaches of the Qujiang River, respectively, and regard the annual maximum seven-day precipitation as the damage index. The interval [0, 1000] is then taken as the universe of $x_i$. Following the same calculation as above, the risk assessment values of the annual maximum seven-day precipitation in the Qujiang drainage basin are obtained as shown in Table 3.
Table 3. Risk assessment value p for the annual maximum seven-day precipitation in the Qujiang drainage basin.
Risk level (mm)   p (Misai)   p (Yanggang)
50                1           1
100               0.9991      1
150               0.9610      0.8216
200               0.7362      0.2939
250               0.4883      0.0919
300               0.2945      0.0402
350               0.1933      0
400               0.1301      0
450               0.0557      0
500               0.0314      0
550               0.0011      0
600               0           0
The table shows that the storm and flood risk levels differ greatly within the drainage basin. At the risk level of an annual maximum seven-day precipitation larger than 200 mm, the upper reaches of the Qujiang River experience such an event every one or two years, whereas in the lower reaches it may occur every three or four years. At the risk level of more than 300 mm, the event may occur every three to four years in the upper reaches and only every twenty-four or twenty-five years in the lower reaches. At the risk level of more than 400 mm, the event may occur every seven or eight years in the upper reaches and may not occur at all in the lower reaches. Consequently, storms and floods are more frequent in the upper reaches of the Qujiang River and can cause more serious damage there than elsewhere.
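As an illustration of Eqs. (2)-(9), the following minimal Python sketch applies the normal information diffusion method to the Jinhua discharges of Table 1. Variable names are ours, and because the discretization and rounding choices of the original computation are not fully specified here, the printed exceedance probabilities will be close to, but not necessarily identical with, the values in Table 2.

import numpy as np

# Annual maximum peak discharges (m3/s) at the Jinhua station, 1981-2004 (Table 1).
y = np.array([1730, 1690, 1880, 1830, 1200, 1640, 2490, 2570, 3250, 1700, 1880, 3540,
              3590, 3110, 2520, 2390, 3280, 2790, 3810, 4200, 1200, 3380, 1400, 2500],
             dtype=float)

u = np.arange(0.0, 10001.0, 500.0)               # discrete universe U, Eq. (1): 21 points

m, a, b = len(y), y.min(), y.max()
h = (1.4230 if m < 10 else 1.4208) * (b - a) / (m - 1)   # diffusion coefficient, Eq. (3)

# Normal information diffusion, Eq. (2): f_j(u_i) for every sample j and point i.
f = np.exp(-(y[:, None] - u[None, :]) ** 2 / (2 * h ** 2)) / (h * np.sqrt(2 * np.pi))
mu = f / f.sum(axis=1, keepdims=True)            # normalized distribution, Eqs. (4)-(5)
q = mu.sum(axis=0)                               # Eq. (6)
p = q / q.sum()                                  # probability estimate, Eqs. (7)-(8)
risk = p[::-1].cumsum()[::-1]                    # exceedance probability P(u >= u_i), Eq. (9)

for level, r in zip(u, risk):
    print(f"{level:6.0f}  {r:.4f}")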
4. Conclusion
Small samples supply only limited and incomplete information, from which statistical rules cannot be demonstrated clearly. Fortunately, such incomplete information, especially the fuzzy information supplied by a small sample, can be treated with the fuzzy information optimizing technology based on information diffusion theory. In short, information diffusion is a data treatment process that changes a traditional point-valued data sample into a fuzzy set. In this paper the risk analysis method based on information diffusion theory was used to develop a model for flood risk analysis, and the application of the model was illustrated taking the Jinhuajiang and Qujiang drainage basins as examples. The study indicated that the model gives a fairly stable analytical result even under small-sample conditions, and that the information diffusion technology is highly capable of extracting the useful information and therefore improves the accuracy of system recognition. The method is easy to apply and its result is easy to understand; it may play a guiding role in disaster prevention to some extent. Combining this risk analysis method with genetic analysis methods would further improve the level of flood risk analysis and help to achieve the goal of flood control and disaster reduction.
References
1. B. Suzanne. Managing toxic chemicals in Australia: a regional analysis of the risk society. Journal of Risk Research, 7(4), 399-412 (2004).
2. C. F. Huang. Demonstration of benefit of information distribution for probability estimation. Signal Processing, 80(6), 1037-1048 (2000).
3. E. A. Chatman. Diffusion theory: A review and test of a conceptual model in information diffusion. Journal of the American Society for Information Science, 37(6), 377-386 (1986).
4. K. Mossberger and K. Hale. "Polydiffusion" in intergovernmental programs: Information diffusion in the school-to-work network. The American Review of Public Administration, 32(4), 398-422 (2002).
5. C. F. Huang, X. L. Liu and G. X. Zhou et al. Agricultural natural disaster risk assessment method according to the historic disaster data. Journal of Natural Disasters, 7(2), 1-8 (1998).
6. D. A. Novikov and A. G. Chkhartishvili. Information equilibrium: Punctual structures of information distribution. Automation and Remote Control, 64(10), 1609-1619 (2003).
7. H. Steven and L. Mark. Information distribution within firms: evidence from stock option exercises. Journal of Accounting and Economics, 34(1-3), 3-31 (2003).
RISK ANALYSIS FOR AGRICULTURAL DROUGHT BASED ON NEURAL NETWORK OPTIMIZED BY CHAOS ALGORITHM*
LIN QIU
Water Conservancy Department, North China Institute of Water Conservancy and Hydroelectric Power, Zhengzhou City, Henan Province, China
XIAONAN CHEN, CHUNQING DUAN, QIANG HUANG
Institute of Water Resources and Hydroelectric Engineering, Xi'an University of Technology, Xi'an City, Shaanxi Province, China
Based on the idea of fitting a function curve with a neural network, this paper establishes neural network models for estimating agricultural drought quantitatively and for describing its probability distribution. Models built on this idea avoid the inconvenience of establishing explicit mathematical formulas and calculating their parameters, and the method can fit the probability function even when the theoretical distribution of the random variable is unknown. For network training, the gradient descent method is combined with a chaos algorithm, which helps the network find the global minimum quickly. Finally, the paper calculates the probability distribution of the agricultural drought extent in the Qucun irrigation area, Puyang city, Henan province, validating the correctness of the models.
1. Introduction
Crop drought is a phenomenon of serious water deficit in a crop caused by the imbalance between water absorption and consumption, and agricultural drought is the synthetic reflection of the drought extents of all crops1. We have argued that correctly reflecting the agricultural loss arising from drought is the most important point in the assessment of agricultural drought, and we have established a new model for evaluating the drought extent of agriculture based on the Jensen model; that model can not only calculate the drought extent quantitatively but also reflect the agricultural loss accurately2. In addition, we have studied the probability distribution of agricultural drought with a simulation method, supposing that the distribution of precipitation follows the P-III distribution3. However, although many crop water production functions exist at present, none of them fits every area well; moreover, that approach requires the distribution of precipitation to be P-III, and the distribution of agricultural drought itself is not known at all. How can the distribution function of agricultural drought be obtained? This paper solves these problems with a neural network optimized by a chaos algorithm.
* This work is supported by the foundation of creative talents in Henan province, China.
2. Model of Agricultural Drought Based on Neural Network Optimized by Chaos Algorithm
2.1. Traditional Gradient Descent Method
The BP network has been used in many fields, but its training speed is unsatisfactory and the network is inclined to fall into a local minimum during training4. Suppose that the weight matrix of the output layer is $W$, where $w_{ij}$ is the weight joining the $i$-th neuron of the hidden layer and the $j$-th neuron of the output layer, $i = 1, 2, \ldots, h$, $j = 1, 2, \ldots, m$, and that the weight matrix of the input layer is $V$, where $v_{ij}$ is the weight joining the $i$-th neuron of the input layer and the $j$-th neuron of the hidden layer, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, h$. Let $(X, Y)$ be a sample with $X = (x_1, x_2, \ldots, x_n)$ and $Y = (y_1, y_2, \ldots, y_m)$. The activation function is
$f(net) = 1/(1 + e^{-net})$   (1)
Suppose that the actual output for the sample is $O = (o_1, o_2, \ldots, o_m)$ and that $\bar{O} = (\bar{o}_1, \bar{o}_2, \ldots, \bar{o}_h)$ is the output of the hidden layer. The total error over all samples is
$E = \frac{1}{2} \sum_{s} \sum_{j=1}^{m} \left( y_j^{(s)} - o_j^{(s)} \right)^2$   (2)
According to the gradient descent method, the updates of $w_{ij}$ and $v_{ij}$ are
$\Delta w_{ij} = \alpha\,\delta_j\,\bar{o}_i = \alpha\,(y_j - o_j)(1 - o_j)\,o_j\,\bar{o}_i$   (3)
$\Delta v_{ij} = \alpha\,\delta_j'\,x_i = \alpha \left[ \sum_{k=1}^{m} \delta_k w_{jk} \right] (1 - \bar{o}_j)\,\bar{o}_j\,x_i$   (4)
where $\alpha$ is the learning rate.
2.2. Chaos Optimization Algorithm
The general non-linear program can be described as follows:
$\min f(X)$
$\text{s.t. } g_i(X) \ge 0,\ i = 1, 2, \ldots, m; \quad h_j(X) = 0,\ j = 1, 2, \ldots, l$   (5)
where $f(X)$ is the objective function and $g_i(X)$, $h_j(X)$ are the constraint functions, at least one of which is non-linear. The constraints define the feasible region
$S = \{X \mid g_i(X) \ge 0,\ i = 1, 2, \ldots, m;\ h_j(X) = 0,\ j = 1, 2, \ldots, l\}$
The logistic equation is one of the most classic models in chaos research5. It can be expressed as
$x_{k+1} = \lambda\,x_k\,(1 - x_k), \qquad x_k \in [0, 1]$   (6)
where $\lambda$ is a control parameter with a value between 0 and 4. When $\lambda = 4$ the system enters a chaotic state and, regardless of the initial point, passes through the whole interval [0, 1]. Suppose that the dimension of $X$ in Eq. (5) is $M$, with $X = (x_1, x_2, \ldots, x_M)$ and $x_i \in [d_i, e_i]$. The main steps of solving the non-linear program with the chaos algorithm are as follows.
Step 1: Initialize the system. Let $k = 1$, generate a random vector $X_k = (x_{1k}, x_{2k}, \ldots, x_{Mk})$ with each component in the interval [0, 1], and transform it as
$x_{ik}' = d_i + (e_i - d_i)\,x_{ik}$   (7)
If the transformed vector $X_k' \in S$, let $X^* = X_k'$ and $f^* = f(X^*)$; otherwise, repeat Step 1 until an $X_k'$ satisfying the constraints of Eq. (5) is found.
Step 2: Set the maximal iteration number $N$. Substitute $X_1$ into Eq. (6) to generate the chaos vector sequence $X_k$, $k = 2, 3, \ldots, N$, and map each vector through Eq. (7) to obtain $X_k'$. Check the feasibility of each $X_k'$; if $X_k'$ passes the test and $f(X_k') < f^*$, let $X^* = X_k'$ and $f^* = f(X_k')$.
When the $N$ iterations are finished, $X^*$ is the optimal solution and $f^*$ is the optimal value.
2.3. Weights Optimization with the Chaos Algorithm
When the initial weights are generated, the total error $E$ of Eq. (2) is taken as the objective function ($\min E$) and all weights are constrained to [0, 1]. The network is first trained with the gradient descent method. When the network plunges into a local minimum, the weights are recalculated with the chaos algorithm, but now they are constrained to $[W_{\min}, W_{\max}]$, where $W_{\min}$ and $W_{\max}$ are the minimum and maximum of the current weights.
2.4. Model of Crop Drought Based on Neural Network
Suppose that there are $t$ phases during the growth process of a crop. Then the number of neurons in the input layer is $t$ and that of the output layer is one. Let the number of neurons in the hidden layer be $n$ and let the $s$ samples be
$\{(X_1, y_1), (X_2, y_2), \ldots, (X_s, y_s)\}$
where $X_i = (x_{i1}, x_{i2}, \ldots, x_{it})$, $i = 1, 2, \ldots, s$, and $x_{ik}$ is the relative water stress of the $i$-th sample in the $k$-th phase, $k = 1, 2, \ldots, t$. It is defined as
$x_{ik} = \left[ (W_N)_k - (W_C)_{ik} \right] / (W_N)_k$   (8)
where $(W_N)_k$ is the sum of the evapotranspiration of the crop in the $k$-th phase, $(ET)_k$, and the allowed minimal soil water content in that phase, $(w)_k$ (mm); $(W_C)_{ik}$ is the sum of the water supply of the $i$-th sample in the $k$-th phase and the preliminary soil water content of the $i$-th sample in that phase, $(w_0)_{ik}$ (mm); and $y_i$ is the loss extent of production, that is, the drought extent, calculated as
$y_i = 1 - Y_i / Y_{\max}$   (9)
where $Y_i$ is the actual production of the $i$-th sample (kg/hm²) and $Y_{\max}$ is the potential production of the crop (kg/hm²). The network can now be trained with these samples; running the successfully trained chaos network yields the crop drought extent.
2.5. Model of Agricultural Drought Based on Neural Network
After the crop drought model is established, the agricultural drought model can be built with weights that reflect the economic loss. Suppose that $A_i$ (hm²) is the area of the $i$-th crop, $Pr_i$ (Yuan/kg) is the price of the $i$-th crop and $Y_{\max}^{i}$ is the potential production of the $i$-th crop. The weight of the $i$-th crop, $W_i$, is
$W_i = \left( Y_{\max}^{i} A_i Pr_i \right) \Big/ \left( \sum_{j=1}^{Num} Y_{\max}^{j} A_j Pr_j \right)$   (10)
The extent of agricultural drought, $Dr$, is therefore
$Dr = \sum_{i=1}^{Num} W_i\,(Dr)_i$   (11)
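The chaos optimization of Section 2.2 can be sketched as follows. This is a minimal illustration of Eqs. (5)-(7), not the authors' implementation: the function names, the bound handling and the toy problem at the end are our own assumptions.

import numpy as np

def chaos_search(f, feasible, d, e, n_iter=1000, seed=1):
    """Logistic-map search for min f(X) over the box [d, e], Eqs. (5)-(7).

    f        -- objective function of a vector X
    feasible -- predicate implementing the constraint set S
    d, e     -- lower and upper bounds of each coordinate
    """
    rng = np.random.default_rng(seed)
    d, e = np.asarray(d, float), np.asarray(e, float)
    x = rng.uniform(0.05, 0.95, size=d.shape)     # chaos variables in (0, 1), Step 1
    best_X, best_f = None, np.inf
    for _ in range(n_iter):
        X = d + (e - d) * x                       # carrier transform, Eq. (7)
        if feasible(X) and f(X) < best_f:         # feasibility test and update, Step 2
            best_X, best_f = X.copy(), f(X)
        x = 4.0 * x * (1.0 - x)                   # logistic map with lambda = 4, Eq. (6)
    return best_X, best_f

# Toy usage: minimize a quadratic subject to x1 + x2 >= 1 on [0, 1]^2.
X_opt, f_opt = chaos_search(lambda X: ((X - 0.7) ** 2).sum(),
                            lambda X: X[0] + X[1] >= 1.0,
                            d=[0.0, 0.0], e=[1.0, 1.0])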
3. Probability Distribution of Agricultural Drought Based on Neural Network
Calculate the sequence of agricultural drought extents and arrange it in descending order, $Dr_1 \ge Dr_2 \ge \cdots \ge Dr_n$. The frequency of a given drought extent is
$P_i = i / (n + 1)$   (12)
where $i$ is the sequence number, $n$ is the number of elements in the sequence and $P_i$ is the frequency with which the drought extent is larger than or equal to $Dr_i$. $Dr_i$ is the input-layer sample and $P_i$ the output-layer sample, so the number of training samples is $n$. After successful training, the network outputs the corresponding probability $P$ when a drought extent $Dr$ is input.
4. Example
We study the probability distribution of agricultural drought, without considering irrigation, in the Qucun irrigation area, Puyang city, Henan province. The statistical parameters of precipitation are known and other detailed data about the crops can be found in Ref. 3. The crop models are established with 20 groups of data and their accuracy is checked with another 10 groups; the number of hidden-layer neurons is 10. The accuracy of the crop models is 0.013, 0.011 and 0.017 respectively, and the trained networks provide the weight matrices of the crops. Then 38 values of agricultural drought are calculated from the data of the area and the crop models and taken as the inputs of the distribution network; the ideal outputs are calculated by Eq. (12). With 30 hidden-layer neurons, the training error over the 38 groups of data is 0.019.
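The construction of the training pairs of Section 3 can be sketched as follows. The drought-extent array below is a synthetic placeholder (the 38 computed values are not listed in the paper), and scikit-learn's MLPRegressor stands in for the chaos-optimized BP network, so this is only an illustration of the idea.

import numpy as np
from sklearn.neural_network import MLPRegressor

drought_extent = np.random.default_rng(0).beta(2, 5, size=38)   # placeholder values
dr = np.sort(drought_extent)[::-1]               # Dr_1 >= Dr_2 >= ... >= Dr_n
n = len(dr)
p = np.arange(1, n + 1) / (n + 1)                # empirical frequency, Eq. (12)

# Fit P = g(Dr) with one hidden layer of sigmoid units (a stand-in for the
# chaos-optimized BP network of Section 2).
net = MLPRegressor(hidden_layer_sizes=(30,), activation='logistic',
                   max_iter=5000, random_state=0)
net.fit(dr.reshape(-1, 1), p)

print(net.predict(np.array([[0.3]])))            # estimated P(drought extent >= 0.3)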
Now the distribution of agricultural drought can be calculated with the network. Table 1 shows the probability of each interval of drought extent.
Table 1. Interval distribution of agricultural drought
Drought extent   0.0-0.2   0.2-0.4   0.4-0.6   0.6-0.8   0.8-1.0
Probability      0.056     0.511     0.375     0.045     0.013
The first row gives the intervals of drought extent and the second row the corresponding probabilities. The probability is largest, 0.511, in the interval 0.2-0.4. The irrigation area lies in a semi-arid and semi-humid region, so irrigation is necessary.
5. Conclusion
This paper establishes a model of agricultural drought based on a neural network optimized by a chaos algorithm and fits the probability distribution function with a neural network. The research shows that the neural network model of agricultural drought not only keeps the merit of the Jensen model of reflecting production loss but also avoids the inconvenience of establishing an explicit mathematical expression and calculating its parameters. In addition, the paper integrates the chaos algorithm with the BP network, which allows the network to reach the global minimum without slowing down the training speed.
References
1. Yuanhua Li. The theory and technology of water-saving irrigation. Wuhan: Wuhan University of Hydro-electric Engineering Press, 1999.
2. Lin Qiu, Xiaonan Chen, Chunqing Duan. The quantitative analysis of drought extent for agriculture. Journal of Irrigation and Drainage, 2004(3): 34-37.
3. Lin Qiu, Xiaonan Chen, Chunqing Duan. Study on probability distribution of agricultural drought extent. Journal of Northwest Sci-tech University of Agriculture and Forestry, 2005(1): 105-108.
4. Z. Sen. Probabilistic formulation of spatio-temporal drought pattern. Theoretical and Applied Climatology, 1998, 197-206.
5. Zongli Jiang. The introduction of artificial neural network. Beijing: Advanced Education Press, 2001, 40-46.
A COMPUTER SIMULATION METHOD FOR HARMONY AMONG DEPARTMENTS FOR EMERGENCY MANAGEMENT*
FUPING YANG, CHONGFU HUANG
Institute of Disaster and Public Security, College of Resources Science and Technology, Beijing Normal University, Beijing, 100875, China. Email: [email protected]
A response mechanism for harmony among departments for emergency management is presented for the first time. A computer simulation model for request-response is then built on this response mechanism. Finally, some mathematical methods involved in implementing the simulation model are discussed.
1. Introduction
The development of information technology provides an excellent approach to representing disasters by simulating, in a virtual environment, events that happen in the real world. Research on computer simulation methods and theories for the management of disaster emergency and rescue can be regarded as technological work that is significant for disaster mitigation and risk estimation. It is well known that effective and efficient cooperation between vertical and horizontal departments can greatly improve the effect of rescue and mitigate the loss and risk caused by a disaster. Harmony among departments for emergency management is so critical that many researchers have worked on it, including the revision of institutions, legal safeguards and the improvement of communication technologies1. However, computer simulation has not yet been widely applied to risk assessment in disaster management. In this paper, a method concerning harmony among departments for emergency management is discussed; the purpose of this effort is to establish a foundation for further simulation research on disaster rescue. Section 2 describes the problem under research, Section 3 introduces the basic simulation ideas and model, Section 4 focuses on mathematical methods for simulation modeling and, finally, conclusions are drawn in Section 5.
* Project Supported by National Natural Science Foundation of China, No.40371002, and National Key Scientific and Technological Project, No. h02110.
2. Description of the Problem under Research
The main problem discussed in this paper is cooperation and harmony among departments for emergency management with respect to disasters such as earthquakes and floods. A common organizational form of the departments for emergency management can be induced from the typical frameworks and organizations of disaster management in China, Japan, Russia and America, in which a central department is responsible for commanding the emergency action during a disaster rescue2. This core institution can be called the centre of emergency command. Its main function is to organize and coordinate all parts of the emergency and rescue action so that all resources for disaster rescue can be utilized effectively and efficiently. At the same time, there are many sub-departments under the command centre, and below the sub-departments there are usually other subordinate units, so this kind of organization structure can be considered as a tree. It is illustrated in Figure 1.
Figure 1. A tree figure for the organization of emergency management and rescue (the command centre at the root, sub-departments 1 to n below it, and units 1 to m below the sub-departments).
Assessing the harmonious degree of this tree structure is now a major task. In practice, the harmony of all departments shows itself dynamically in the course of a rescue action, and vertical and horizontal cooperation runs through the whole emergency action. Furthermore, the interactions among these departments usually form a complex network, illustrated in Figure 2.
Figure 2. A network figure for the interaction among departments (the command centre linked to all sub-departments, with further links among the departments themselves).
It is necessary to define sets of estimation indexes for each node of the tree. Yet how to use those well-defined indexes to assess the harmonious degree of the entire emergency organization system is a difficult task.
3. Basic Theoretical Idea
In a rescue action, the interactions among the sections of emergency management are extremely complex and network-like, and it would be very hard to deal with the rescue process directly. A new method is therefore introduced in this section to analyze the process.
3.1 Principle of the Response Mechanism for Harmony
In a harmonious system made up of many units, each piece of request information from any unit can receive at least one reasonable response from the other units of the system. This fact exists universally in our lives. In particular, in a rescue action all departments for emergency management have to remain highly harmonious: for a well-performing emergency and rescue organization, where there is a request, there is a good response. In practice, the response among the elements of a system is mainly influenced by three factors: the attributes of the elements themselves, the internal and external environment, and stochastic factors. The attributes of the elements determine the main properties of the response, while the environmental and stochastic factors influence the response strength and fashion.
3.2 Model Hypotheses
To study harmony among departments for emergency management, some critical theoretical hypotheses are needed:
1. The performance of a system in its response mechanism can approximately represent the degree of harmony of the system.
2. The request-response among the elements of a system is always one-to-one; that is, every request-response happens between two elements. This is reasonable because interactions among departments over a longer period can be considered as a combination of interactions between two departments over shorter periods.
3. The more harmonious all the departments are, the stronger the response shown between two departments. This is common sense in our lives.
4. The more harmonious a certain department is, the more active it is in an action, which can be measured by recording the number of its responses. This is also true in the course of a rescue action.
5. All information exchanged between any two departments should pass through the command centre. This condition should be understood from two views. One is that the command centre is the core of an emergency management organization and almost all request-response information should go through it and receive instructions from it. The other is that information exchanged directly between two
certain units is regarded as passing through the centre and receiving an empty instruction. The former accords with reality to a certain degree, and the latter makes the process of information interaction clearer and simplifies the simulation of this procedure. The hypotheses above form the basis of the following simulation model.
3.3 Simulation Model for Request-response
According to the above response mechanism and hypotheses, the request-response model among departments is illustrated in Figure 3, which is a simplification of Figure 2.
Figure 3. A simulation model for request-response among departments (sub-departments 1 to n exchange request information through the command centre, and every unit is linked to the responser by a single-arrowed line).
In the above simulation model, each unit can randomly send request information to any other unit except the responser, which is only in charge of calculating the response strength and collecting the corresponding data. During a simulation, whenever a unit receives request information it activates the responser, so all units are directly linked to the responser by a straight line with a single arrowhead. In Figure 3 only the top two management layers are drawn, because any two neighbouring layers have the same topological structure. The horizontal interaction among departments is not eliminated; only the route of the interaction information is changed, i.e. horizontal interaction information always passes through the command centre. This change does not affect the function of the whole system at all; on the contrary, it enhances the feasibility of simulating the system.
4. Mathematical Theoretical Methods for Simulation Modeling
A series of estimation indices must be defined for the whole system and all the units. The difficulty is that most well-defined indices are recorded in qualitative words such as good, better, best and so on. How can they be transformed into quantitative indices that a computer can recognize? And how is the response strength calculated? Answering these questions is the aim of the following subsections.
4.1 Flow for Simulation
The flow of the simulation of the model is shown in Figure 4:
Qualitative indices -(I)-> Quantitative indices -(II)-> Simulation indices -(III)-> Response indices
Figure 4. A simple flow figure for simulation
Figure 4 shows that the assessment indexes are transformed or calculated three times in the course of the simulation, and each step is different from the others. The mathematical methods for these steps are introduced in the following subsections.
4.2 Fuzzy Transform for the Estimation Indexes
At step I in Figure 4, the indexes need to be transformed from qualitative variables into quantitative variables so that the computer can recognize and process them. An estimation index of an organization is usually described by several fuzzy words such as good, better, etc., so it is natural to realize the transform of step I with fuzzy set methods. Let $x$ be any qualitative estimation index of some department and let $F(x)$ be a function whose values are real sub-intervals of [0, 1]. The transform of this step can be realized by defining $F(x)$ as
$F(x) = \begin{cases} [0, x_1], & x \in D_1 \\ [x_1, x_2], & x \in D_2 \\ [x_2, x_3], & x \in D_3 \\ [x_3, x_4], & x \in D_4 \\ [x_4, 1], & x \in D_5 \end{cases}$   (1)
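A minimal sketch of this step-I transform follows. The five category labels and the breakpoint values are illustrative assumptions only; the paper leaves D1-D5 and x1-x4 unspecified.

# Illustrative breakpoints 0 < x1 < x2 < x3 < x4 < 1 and category labels D1..D5;
# both are assumptions, not values from the paper.
BREAKPOINTS = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
CATEGORIES = ["very poor", "poor", "fair", "good", "very good"]

def F(rating):
    """Map a qualitative rating in D1..D5 to its sub-interval of [0, 1], Eq. (1)."""
    k = CATEGORIES.index(rating)
    return BREAKPOINTS[k], BREAKPOINTS[k + 1]

print(F("good"))   # -> (0.6, 0.8)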
4.3 Generation of Stochastic Inputs
At step II, statistical methods and fuzzy information processing come into play. Three situations arise. First, if only one interval is obtained for an index, the input transform is easy: a random number is generated on the corresponding interval. Second, if several intervals are collected for an index but the data are incomplete, the fuzzy information diffusion methods presented by Prof. Chongfu Huang3,4 perform well. Third, if the data for an index are sufficient, probabilistic and stochastic theories can be employed. Consequently, the final random inputs for the simulation can be generated by these different approaches.
4.4 Measurement Function for the Harmonious Degree
At step III the outputs are calculated. A series of harmony indices, such as the strength and time of response, need to be collected; here only the response strength is discussed. Let the response function be $R(F(a), F(e), \varepsilon)$. This function is affected by three variables, $F(a)$, $F(e)$ and $\varepsilon$, representing the department itself, the environment and an uncertain stochastic factor respectively. $R(F(a), F(e), \varepsilon)$ can be expressed as
$R(F(a), F(e), \varepsilon) = a_1 F(a) + a_2 F(e) + \varepsilon$   (2)
where $F(a)$ and $F(e)$ are written as
$F(a) = a_{11} x_{a1} + a_{12} x_{a2} + \cdots + a_{1n} x_{an}$   (3)
$F(e) = a_{21} x_{e1} + a_{22} x_{e2} + \cdots + a_{2m} x_{em}$   (4)
In the above equations $a_i$ is a constant coefficient, $a_{ij}$ is the weight of the corresponding index and $x_{ij}$ is one of the indices of a unit.
5. Conclusion
In the sections above, the principle of the response mechanism for harmony was presented for the first time; the simulation model for request-response was then built on this response mechanism; finally, the study focused on the mathematical methods for implementing the simulation model. The work reported in this paper is preliminary research, and much remains to be done in the future on harmony among organizations, among all kinds of forces for disaster rescue and among all rescue resources.
References
1. Teng Wuxiao. Establishment of the emergency response system of China by referring to earthquake disaster emergency systems of Japan and US. Journal of Disaster Prevention and Mitigation Engineering, 24(3), (2004), 323-329.
2. Cui Qiuwen and Miao Chonggang. Overview of International Earthquake Emergency and Rescue. Meteorology Press, 2004.
3. C. F. Huang. Information diffusion techniques and small sample problem. International Journal of Information Technology and Decision Making, 1(2), (2002), 229-249.
4. C. F. Huang and Y. Shi. Towards Efficient Fuzzy Information Processing Using the Principle of Information Diffusion. Physica-Verlag, Heidelberg, 2002.
AN APPROACH OF MOBILE ROBOT ENVIRONMENT MODELING BASED ON ULTRASONIC SENSORS ARRAY PRINCIPAL COMPONENTS
YONG-QIAN ZHANG, FANG LI, HONG-MING WANG, ZENG-GUANG HOU, MIN TAN
The Key Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, China. E-mail: yongqian.zhang@ia.ac.cn
MADAN M. GUPTA, PETER N. NIKIFORUK
Intelligent Systems Research Laboratory, College of Engineering, University of Saskatchewan, Saskatoon, Saskatchewan S7N 5A9, Canada. E-mail: [email protected]
This paper presents a novel approach to mobile robot environment modeling based on principal component analysis of ultrasonic sensors array data. A principal components space, which has a lower dimensionality than the raw data space, is constructed from the principal components of a large number of ultrasonic sensor data sets. Subsequent ultrasonic data sets from the environment are projected as points in this principal components space. By applying the SVM (support vector machine) method, these projections are classified into typical local structures of the environment so that the robot can discriminate them. Experimental results are provided to show that the proposed method is satisfactory for mobile robot navigation applications.
1. Introduction
The navigation of an intelligent mobile robot is generally defined as controlling its motion to arrive at a known location [1]. Ultrasonic sensing methods using multiple receivers or using the wave shapes of the echoes have been investigated for localization and navigation tasks with known environmental features [2]. Many conventional approaches to sensor-based navigation of mobile robots in unknown, unstructured and complicated environments assume that robots are able to measure precise information [3]. Some researchers have focused on the localization issue, extracting the obstacle-free contour around the robot from a local evidence grid [4]. With those methods an accurate bearing angle measurement can be achieved. However, they are limited to measuring only within the directivity area of the transducer, so
mechanical rotation of the sensor head is required when such sensors are used on a mobile robot, and hence it takes time to measure the environment around the robot. In this paper, we propose an approach to environment modeling based on the principal components of an ultrasonic sensors array. In this approach, a principal components space, which has a lower dimensionality than the raw sonar data space, is constructed from the principal components of a large number of sonar readings. Subsequent ultrasonic data sets from the environment are projected as points in this principal components space. By applying the SVM (support vector machine) method, these projections are classified into typical local structures of the environment so that the robot can discriminate them. The remainder of this article is structured as follows. In the next section, a brief overview of our mobile robot CASIA-I and its sensing unit is given; then the procedure of environment modeling based on principal components analysis and the classification by SVM is presented; finally, some environment modeling experiments are performed and conclusions are drawn.
2. Environment Modeling Based on PCA of Ultrasonic Data
2.1. Mobile Robot CASIA-I and Its Sensor System
We designed and developed a mobile robot, called CASIA-I, as the experimental platform for testing various methods and algorithms [1]. Because of the variety of applications, there is no unique way to select the type and number of sensors for a mobile robot. In CASIA-I, one CCD camera, one electronic compass, sixteen ultrasonic, sixteen infrared and sixteen touch sensors are employed.
2.2. Procedure of Environment Modeling
Figure 1. A segment of the environment of our laboratory.
During navigation, the mobile robot cannot build a global environment model because of unknown, dynamic, unstructured and uncertain factors; it can only obtain a real-time local environment model by means of the sensors mounted on it. Hence, building the local environment model reliably and in real time determines whether the mobile robot is able to move safely, continuously and smoothly. Figure 1 shows the typical environment of our laboratory. Environment modeling is described as classifying the environment into typical models by information fusion and identification algorithms applied to the data of the ultrasonic sensors array. The typical environment models are shown in Figure 2: (a) moving along the corridor, (b) moving towards the end of the corridor, (c) moving backwards from the end of the corridor, (d) moving with the branch way on the left, (e) moving with the branch way on the right, (f) moving towards the branch way, (g) moving backwards from the branch way.
Figure 2. The environment models of the corridor, (a)-(g).
During the procedure, sets of ultrasonic data are employed instead of discrete individual sonar readings, because if the robot uses a set of sonar data acquired from a previous time up to the current one, it can take account of the changes in the ultrasonic data and reduce the uncertainty in discriminating the local structures. In our experiments one step corresponds to approximately 15 cm, and one set of ultrasonic sensor data consists of three single rings of 16 ultrasonic readings taken at three positions along a line; hence each sample of a set of ultrasonic sensor data has 48 dimensions. The environment modeling procedure is summarized as follows:
Step 1: CASIA-I explores the corridor with the sonar system enabled in order to store sets of ultrasonic sensors array range data.
Step 2: Principal components analysis of the sets of ultrasonic sensor data is employed to reduce their dimensionality. As a result, a principal components space is constructed and each set of ultrasonic sensor data is projected as a point in this space.
Step 3: The SVM identification algorithm is applied to classify these projections into the seven typical models shown in Figure 2.
2.3. Principal Components Analysis of Sets of Ultrasonic Sensors Data
As mentioned above, each individual ultrasonic sensor measurement can be considered as an independent dimension. Such a representation is not commonly used directly because of the characteristic errors of ultrasonic waves and the obvious correlation between adjacent measurements. Principal component analysis provides a method to automatically identify the dependence structure behind a multivariate stochastic observation in order to obtain a compact description of it. In this paper, PCA is employed to process the original ultrasonic data so as to enhance the signal-to-noise ratio and improve the anti-jamming capability.
2.4. Support Vector Machines Based Classifier Design
Support vector machines, proposed by Vapnik [6][7], have been widely used for pattern classification and nonlinear regression problems in the last few years. For a classification problem, support vector machines transform the original problem into a higher-dimensional feature space, which makes a difficult problem tractable. Different kernel functions result in different support vector machine algorithms; the kernel functions commonly investigated for pattern recognition are a polynomial of degree q, the Gaussian radial basis function and the sigmoidal neural network kernel. In our experiments the three types of kernel function are employed to classify the projections in the principal components space of the ultrasonic sensor data, and they all give a satisfactory recognition rate.
3. Real Robot Simulations and Experiments
3.1. Experiments on the Construction of the Principal Components
In our experiments we drive the robot in the corridor shown in Figure 1. The robot acquires 80 sets of ultrasonic data for each of the environment models shown in Figure 2, so we obtain a sample matrix of dimension 48 x 80 for each environment and 48 x 560 in total. Of the 80 sample sets for each model, 60 are used for training and the remaining 20 for testing.
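The pipeline of Sections 2.3-2.4 can be sketched as follows. Scikit-learn stands in for the authors' implementation, the data array is a random placeholder with the same shape as the experiment (560 samples of 48 readings, labels (a)-(g) coded 0-6), and the kernel width used below plays the role of the paper's parameter without being numerically identical to it.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Placeholder data: 560 sample sets of 48 sonar readings with labels 0..6.
rng = np.random.default_rng(0)
X = rng.normal(size=(560, 48))
y = rng.integers(0, 7, size=560)

pca = PCA(n_components=5)                 # keep the first five principal components
X_proj = pca.fit_transform(X)

clf = SVC(kernel='rbf', gamma=0.001)      # radial basis kernel, cf. Section 3.2
clf.fit(X_proj[:420], y[:420])            # 60 training sets per model
accuracy = clf.score(X_proj[420:], y[420:])   # 20 testing sets per model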
Figure 3. The contribution rates of the single principal components and their accumulation.
Applying principal component analysis to all 420 sample sets, we find that the largest five principal components together contribute more than 84.85% (see Figure 3). Hence the first five principal components are extracted to span the projection space of the original data. Because the projections of all seven environment shapes on the principal components space cannot be visualized at once, we illustrate 5 and 30 sets of projections for models (a), (b) and (c) on the first three principal components, plotted with the signs '*', 'o' and '+' respectively (see Figure 4).
Figure 4. Projections on the first 3 principal components.
3.2. Experiments of the SVMs Classifier
Table 1. Recognition rates (%) of the SVMs classifier with various kernel functions and parameters, for the seven environment models (a)-(g) on the training and testing samples. The kernels compared are the polynomial kernel (q = 2, 3, 4), the sigmoid kernel ((v, c) = (2, 0.9), (2, 1), (2, 1.1)) and the radial basis kernel (δ = 0.01, 0.006, 0.001). The average recognition rates of the nine classifiers range from 96.79% to 98.57%; the maximum average, 98.57%, is obtained with the radial basis kernel at δ = 0.001, while the best polynomial and sigmoid averages are 97.32% (q = 3) and 97.50% ((v, c) = (2, 1)).
The SVMs classifier is applied to identify the projections of the 60 training sample sets and the 20 testing sample sets respectively. Analyzing the experimental results in Table 1, we find that when the kernel function is a radial basis function with δ = 0.001 the recognition rate over all seven local environment models reaches its maximum of 98.57%, and that when the kernel functions are the polynomial with q = 3 and the sigmoid function with (v, c) = (2, 1) the highest recognition rates obtained are 97.32% and 97.50% respectively. By comparison of the various kernel functions, therefore, the radial basis function is chosen for the final SVMs classifier.
4. Conclusion and Perspective
Ultrasonic sensors are commonly used to measure, perceive and communicate with the environment in an active way. In this paper, a PCA-based ultrasonic sensors array information processing approach to mobile robot environment modeling has been presented. A principal components space is constructed from a large number of ultrasonic sensor data sets, and by applying an SVM classifier the projections are classified into typical local structures of the environment. The experimental results show that the proposed method is satisfactory for mobile robot navigation and localization applications. Because of space limitations, this paper is restricted to explaining the foundation of the approach and presenting examples. On this basis, conventional path planning or reinforcement learning algorithms can further be applied to the real-time navigation task.
Acknowledgments
This research was supported partially by the National Natural Science Foundation of China (Grants 60205004, 50475179 and 60334020) and the Hi-Tech R&D Program (863) of China (Grant 2002AA423160).
References
1. Z. G. Hou, M. Tan, M. M. Gupta, P. N. Nikiforuk and N. Homma. 18th Annual Canadian Conference on Electrical and Computer Engineering, CCECE05 (2005).
2. H. Choset, K. Nagatani and A. Rizze. Proc. of SPIE Conf. on System and Manufacturing, Pittsburgh, USA, pp. 72-83 (1998).
3. L. Kleeman. Proc. of IEEE/RSJ Int. Conf. on IROS, Osaka, Japan, pp. 96-103 (1996).
4. A. Poncela, C. Urdiales, C. Trazegnies and F. Sandoval. Proc. of the 6th International FLINS Conference, pp. 456-462 (2004).
5. K. I. Diamantaras and S. Y. Kung. Principal Component Neural Networks: Theory and Applications. John Wiley & Sons, Inc. (1996).
6. V. N. Vapnik. The Nature of Statistical Learning Theory, 2nd Ed., New York: Springer-Verlag (2000).
7. C. J. C. Burges. Data Mining and Knowledge Discovery, Vol. 2, No. 2, pp. 121-167 (1998).
SLAM WITH CORNER FEATURES FROM A NOVEL CURVATURE-BASED LOCAL MAP REPRESENTATION
R. VAZQUEZ-MARTIN, P. NUNEZ, J. C. DEL TORO, A. BANDERA AND F. SANDOVAL
Grupo de Ingenieria de Sistemas Integrados, Departamento de Tecnologia Electronica, Universidad de Malaga, Campus de Teatinos, 29071-Malaga (Spain). E-mail: rvmariiniSuma.es
This paper presents a solution to the Simultaneous Localization and Map Building (SLAM) problem for a mobile agent which navigates in an indoor environment and is equipped with a conventional laser range finder. The approach is based on the stochastic paradigm and employs a novel feature-based map representation. Stochastic SLAM is performed by storing the robot pose and landmark locations in a single state vector and estimating it by means of a recursive process; in our case this estimation process is based on an extended Kalman filter (EKF). The main novelty of the described system is the efficient approach for natural feature extraction, which employs the curvature information associated with every planar scan provided by the laser range finder. In this work, corner features are considered. Real experiments carried out with a mobile robot show that the proposed approach acquires corners of the environment in a fast and accurate way; these landmarks permit the robot to be localized and a corner-based map of the environment to be built simultaneously.
1. Introduction
The difficulty of the simultaneous localization and map building (SLAM) problem lies in the fact that an accurate estimate of the robot trajectory is required to obtain a good map, while minimizing the unbounded growth of odometry errors requires associating sensor measurements with a precise map5. To increase the efficiency and robustness of the process, sensor data have to be transformed into a more compact form before attempting to compare them with the ones present in a map or to store them in the map being built. In either case, the chosen map representation heavily determines the precision and reliability of the whole task4. Typical choices for the map representation include occupancy grids, topological maps and feature maps1.
712 In this paper, a feature based approach is employed to solve the SLAM problem. Feature maps are a suitable representation for long-term convergent SLAM in medium-scale environments 1 . It allows the use of multiple models to describe the measurement process for different parts of the environment and it avoids the data smearing effect5. In order to achieve consistent estimation of the robot pose, a basic stochastic SLAM algorithm is used. This algorithm stores robot and landmarks locations in a state vector and updates these estimates using an extended Kalman filter (EKF). This approach suffers from three main disadvantages: high computation and storage costs, fragile data association and inconsistent treatment of non-linearity 1 , 5 , 2 . In this work, we will demonstrate that all these weaknesses can be alleviated if a fast and reliable algorithm to extract landmarks for the large set of noisy and uncertain data is employed. The proposed landmark acquisition algorithm is based on the curvature information associated to every scan provided by the laser range finder. Particularly, in this work we only consider corner features. These corner features will permit to simultaneously localize the robot and build a map of the environment. The rest of the paper is organized as follows: Section 2 describes the proposed EKF-SLAM algorithm, where the curvature-based corner acquisition algorithm is included. Section 3 presents experimental results and, finally, Section 4 summarizes conclusions and future work. 2. D e s c r i p t i o n of t h e P r o p o s e d S y s t e m In the standard EKF-based approach to SLAM, the robot pose and landmark locations at time step k are represented by a stochastic state vector x£ with estimated mean x£ and estimated error covariance P * . The mean vector x j contains the estimated robot pose, x£, and the estimated environment landmarks positions, x*,, all with respect to a base reference W. This concatenation is necessary as consistent SLAM relies on the maintenance of correlations PJj m between the robot and the map 1 . In this work, we use the robot pose at step k=Q as the base reference (W = x ° ) . Thus, the map can be initialized with zero covariance for the robot pose, x° = ( 0 , 0 , 0 ) T , P ° = 0. Previous work has showed that this improves the consistency of the EKF-SLAM algorithm 2 . For convenience, the k notation can be dropped in this Section as the sequence of operations is apparent from its context. Then, the mean x 0 and covariance P a of the state vector can be defined as *vv
x_a = [x_v; x_m],   P_a = [[P_vv, P_vm], [P_vm^T, P_mm]]    (1)
When the robot pose and map landmarks are stored in a single state vector, stochastic SLAM 1,5,2 is performed by estimating the state parameters via a recursive process of prediction and correction. The prediction stage deals with robot motion based on incremental dead reckoning estimates, and increases the uncertainty of the robot pose estimate. Then, new landmarks are acquired from the environment. These landmarks are associated with the previously stored ones. The update stage employs this data association to improve the overall state estimate. Finally, if a landmark is observed for the first time, it is added to the state vector through an initialization process called state augmentation. The next subsections deal with the stages of the described EKF-SLAM algorithm. The proposed landmark acquisition stage will be explained in subsection 2.2.
2.1. Prediction stage
When the robot moves from its pose at step k-1 to its pose at step k, its motion is estimated by odometry. In our case, the system has been tested on a four-wheeled robot, where left and right wheels are mechanically coupled and, thus, encoders only return right and left speeds. Assuming that the robot state is represented by its pose, x_v = (x_v, y_v, φ_v)^T, the prediction stage only changes the robot pose part of the state vector and the P_vv and P_vm submatrices in the state covariance matrix. Map landmarks remain stationary. Therefore, the predicted state is given by
x_v^+ = f(x_a, u) = [x_v + D·c;  y_v + D·s;  φ_v + Δφ],   P_a^+ = ∇f_xa P_a ∇f_xa^T + Q    (2)

where u = (D, Δφ)^T is the incremental odometry estimate, c = cos(φ_v) and s = sin(φ_v).
∇f_xa = [[∇f_xv, 0_vm], [0_vm^T, I_mm]],   ∇f_xv = [[1, 0, -D·s], [0, 1, D·c], [0, 0, 1]]    (3)
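A minimal sketch of this prediction step is given below, assuming the state layout and the Jacobians of Eqs. (2)-(3); the variable names are illustrative and not taken from the original implementation.

```python
import numpy as np

def ekf_slam_predict(x_a, P_a, u, Q):
    """EKF-SLAM prediction sketch following Eqs. (2)-(3).
    x_a: state [x_v, y_v, phi_v, landmarks...], P_a: full covariance,
    u = (D, dphi): odometry increment, Q: motion noise (3x3)."""
    D, dphi = u
    c, s = np.cos(x_a[2]), np.sin(x_a[2])
    # Only the robot pose part of the state is propagated; landmarks stay fixed.
    x_a = x_a.copy()
    x_a[0] += D * c
    x_a[1] += D * s
    x_a[2] += dphi
    # Jacobian of f with respect to the full state (identity on the map part).
    n = len(x_a)
    F = np.eye(n)
    F[0, 2] = -D * s
    F[1, 2] = D * c
    # Additive motion noise only affects the robot block.
    Qa = np.zeros((n, n))
    Qa[:3, :3] = Q
    P_a = F @ P_a @ F.T + Qa
    return x_a, P_a
```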
2.2. Curvature-based landmark acquisition stage
In this work, we characterize each range reading of the laser scan by a curvature index. This index is adaptively filtered according to the distance between possible corners in the whole laser scan. This filtering removes noise while scan features are still detected regardless of their natural scale. For each range reading i = (x_i, y_i) of a laser scan, the proposed method for corner acquisition consists of the following steps:
(1) Calculation of the maximum length of laser scan presenting no discontinuities on the right and left sides of the range reading i: K_f(i) and K_b(i), respectively. K_f(i) is calculated by comparing the Euclidean distance from i to its K_f(i)-th neighbour, d(i, i + K_f(i)), to the real length of the laser scan between both range readings, l(i, i + K_f(i)). Both distances tend to be the same in the absence of corners, even if laser scans are noisy. Otherwise, the Euclidean distance is considerably shorter than the real length. Thus, K_f(i) is the largest value that satisfies

d(i, i + K_f(i)) > l(i, i + K_f(i)) - U_k    (4)
where U_k is a constant value that depends on the noise level tolerated by the detector. K_b(i) is also set according to Eq. (4), but using i - K_b(i) instead of i + K_f(i).
(2) Calculation of the local vectors f_i and b_i associated with each range reading i. These vectors represent the variation along the x and y axes between range readings i and i + K_f(i) and between i and i - K_b(i). They are defined as

f_i = (x_{i+K_f(i)} - x_i, y_{i+K_f(i)} - y_i),   b_i = (x_{i-K_b(i)} - x_i, y_{i-K_b(i)} - y_i)    (5)

(3) Calculation of the angle associated with i. According to previous works 3, the angle K_θ(i) at range reading i can be estimated from the local vectors f_i and b_i.
(4) Detection of corners over |K_θ(i)|. Corners are those range readings which satisfy the following conditions: i) they are local peaks of the curvature function, and ii) their |K_θ(i)| values are over the minimum angle required to be considered a corner instead of a spurious peak due to remaining noise (θ_min).
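The four steps above can be summarised in the following sketch. It assumes that the angle at reading i is taken as the angle between the local vectors f_i and b_i (the exact formula of the original work is not reproduced here); the parameter names U_k and theta_min follow the text, everything else is illustrative.

```python
import numpy as np

def corner_indices(pts, U_k=0.05, theta_min=np.deg2rad(30)):
    """Sketch of the curvature-based corner detector (steps 1-4).
    pts: (N, 2) array of laser readings in Cartesian coordinates."""
    N = len(pts)
    seg = np.linalg.norm(np.diff(pts, axis=0), axis=1)   # consecutive scan lengths

    def max_window(i, direction):
        # Largest K such that the chord stays close to the scan length (Eq. 4).
        K, scan_len = 0, 0.0
        while 0 <= i + direction * (K + 1) < N:
            j = i + direction * (K + 1)
            scan_len += seg[min(i + direction * K, j)]
            if np.linalg.norm(pts[j] - pts[i]) <= scan_len - U_k:
                break
            K += 1
        return K

    angles = np.zeros(N)
    for i in range(N):
        kf, kb = max_window(i, +1), max_window(i, -1)
        if kf == 0 or kb == 0:
            continue
        f, b = pts[i + kf] - pts[i], pts[i - kb] - pts[i]      # Eq. (5)
        cosang = np.dot(f, b) / (np.linalg.norm(f) * np.linalg.norm(b) + 1e-9)
        angles[i] = np.pi - np.arccos(np.clip(cosang, -1.0, 1.0))
    # Step 4: local peaks of the curvature angle above theta_min.
    return [i for i in range(1, N - 1)
            if angles[i] > theta_min
            and angles[i] >= angles[i - 1] and angles[i] >= angles[i + 1]]
```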
2.3. Data association stage
Once corner features have been acquired, they must be associated with previously stored ones. Correct correspondence of observed landmarks to map
ones is essential for consistent map building because a single failure may invalidate the whole process. In our case, landmarks are distinguishable only by their positions. Therefore, correspondences established by the data association stage are constrained by statistical geometric information. In this work, the normalised innovation squared (NIS) defines the validation gate, i.e. the maximum discrepancy between a measurement z and a predicted observation h(x_j) for target x_j 1. Given an observation innovation v_ij with covariance S_ij, the NIS forms a χ² distribution. The gate is applied as a maximum NIS threshold, γ_n. Then,

v_ij = z - h(x_j),   S_ij = ∇h_xa P^- ∇h_xa^T + R,   NIS = v_ij^T S_ij^{-1} v_ij < γ_n    (7)

The integral of the χ² distribution from 0 to γ_n specifies the probability that, if z is a true observation of target x_j, the association will be accepted. In our experiments, the innovation vector is of dimension 2 and the gate γ_2 is equal to 6.0; if z is truly an observation of landmark x_j, the association will be accepted with 90% probability.
The validation gate defines an ellipsoid in the observation space centred about the predicted observation h(x_j). Then, an acceptable observation must fall within this ellipse. Data association ambiguity occurs if either multiple observations fall within the validation gate of a particular target, or a single observation lies within the gates of multiple targets. The most common ambiguity resolution method is nearest neighbour data association. Given a set of observations, Z, within the validation gate of target x, a normalised distance ND_i can be calculated for each z_i ∈ Z:

ND_i = v_i^T S_i^{-1} v_i + log|S_i|    (8)
Nearest neighbour data association then chooses the observation that minimizes ND_i. This is the simplest data association algorithm and it can only associate a single observation at each step k.
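A small sketch of the gating and nearest-neighbour rules of Eqs. (7)-(8); the function names and the way candidate pairs are passed in are illustrative.

```python
import numpy as np

def nis_gate(v, S, gate=6.0):
    """Validation gate of Eq. (7): accept if the normalised innovation
    squared is below the chi-square threshold."""
    return float(v.T @ np.linalg.inv(S) @ v) < gate

def nearest_neighbour(v_S_pairs):
    """Nearest-neighbour association (Eq. 8): among gated candidates,
    pick the one minimising the normalised distance ND_i."""
    best, best_nd = None, np.inf
    for idx, (v, S) in enumerate(v_S_pairs):
        if not nis_gate(v, S):
            continue
        nd = float(v.T @ np.linalg.inv(S) @ v) + np.log(np.linalg.det(S))
        if nd < best_nd:
            best, best_nd = idx, nd
    return best
```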
2.4. Updating stage
If an observation z is correctly associated with a map landmark estimate (x_i, y_i), then the perceived information is related to the map by

z_i = h_i(x_a) = [Δx·c + Δy·s;  -Δx·s + Δy·c],   Δx = (x_i - x_v),   Δy = (y_i - y_v)    (9)
The Kalman gain K_i can be obtained as

v_i = z - h_i(x_a),   S_i = ∇h_xa P^- ∇h_xa^T + R,   K_i = P^- ∇h_xa^T S_i^{-1}    (10)
where R is the observation covariance and the Jacobian ∇h_xa is given by

∇h_xa = [[-c, -s, -s·Δx + c·Δy, 0 ... c  s ... 0], [s, -c, -c·Δx - s·Δy, 0 ... -s  c ... 0]]    (11)
It can be noted that the Jacobian ∇h_xa only presents non-zero terms aligned with the positions of the robot states and the observed feature states in the augmented state vector. The posterior SLAM estimate is determined from

x_a^+ = x_a^- + K_i v_i,   P^+ = P^- - K_i S_i K_i^T    (12)
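A compact sketch of this update step, written directly from Eqs. (9)-(12); the state layout and variable names are illustrative assumptions.

```python
import numpy as np

def ekf_slam_update(x_a, P_a, z, landmark_idx, R):
    """EKF-SLAM update sketch following Eqs. (9)-(12).
    z: associated corner observation in robot coordinates,
    landmark_idx: index of the landmark's x-coordinate in the state vector."""
    xv, yv, phi = x_a[:3]
    xi, yi = x_a[landmark_idx], x_a[landmark_idx + 1]
    c, s = np.cos(phi), np.sin(phi)
    dx, dy = xi - xv, yi - yv
    # Predicted observation, Eq. (9).
    h = np.array([dx * c + dy * s, -dx * s + dy * c])
    # Sparse Jacobian of Eq. (11): robot block and landmark block only.
    H = np.zeros((2, len(x_a)))
    H[:, :3] = [[-c, -s, -s * dx + c * dy],
                [ s, -c, -c * dx - s * dy]]
    H[:, landmark_idx:landmark_idx + 2] = [[c, s], [-s, c]]
    # Innovation, gain and posterior estimate, Eqs. (10) and (12).
    v = z - h
    S = H @ P_a @ H.T + R
    K = P_a @ H.T @ np.linalg.inv(S)
    return x_a + K @ v, P_a - K @ S @ K.T
```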
2.5. State augmentation stage
As the environment is explored, new landmarks are observed and must be added to the map. To initialise new landmarks, the state vector and covariance matrix are augmented with the values of the new observation and its covariance, R, as measured relative to the observer:
x_aug = [x_a; z],   P_aug = [[P_vv, P_vm, 0], [P_vm^T, P_mm, 0], [0, 0, R]]    (13)

A function g_i is employed to translate z = (x_c, y_c)^T to a global location. This transformation is defined as

g_i(x_v, z) = [x_v + x_c·c - y_c·s;  y_v + x_c·s + y_c·c]    (14)
Then, the augmented state can be initialized by performing a transformation to global coordinates by the function f_i as follows

x_a^+ = f_i(x_aug) = [x_a;  g_i(x_v, z)],   P_a^+ = ∇f_xaug P_aug ∇f_xaug^T    (15)

The Jacobian ∇f_xaug can be derived as

∇f_xaug = [[I_v, 0, 0], [0, I_m, 0], [∇g_xv, 0, ∇g_z]]    (16)

where

∇g_xv = [[1, 0, -x_c·s - y_c·c], [0, 1, x_c·c - y_c·s]],   ∇g_z = [[c, -s], [s, c]]    (17)
The posterior SLAM covariance matrix, P^+, is as follows

P^+ = [[P_vv, P_vm, P_vv ∇g_xv^T], [P_vm^T, P_mm, P_vm^T ∇g_xv^T], [∇g_xv P_vv, ∇g_xv P_vm, ∇g_xv P_vv ∇g_xv^T + ∇g_z R ∇g_z^T]]    (18)
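A short sketch of this augmentation step, assuming the block structure of Eqs. (13)-(18); names are illustrative.

```python
import numpy as np

def ekf_slam_augment(x_a, P_a, z, R):
    """State augmentation sketch following Eqs. (13)-(18): append a newly
    observed corner z = (xc, yc), given in robot coordinates, to the map."""
    xv, yv, phi = x_a[:3]
    xc, yc = z
    c, s = np.cos(phi), np.sin(phi)
    # Global position of the new landmark, Eq. (14).
    g = np.array([xv + xc * c - yc * s, yv + xc * s + yc * c])
    # Jacobians of g with respect to the robot pose and the observation, Eq. (17).
    Gv = np.array([[1.0, 0.0, -xc * s - yc * c],
                   [0.0, 1.0,  xc * c - yc * s]])
    Gz = np.array([[c, -s], [s, c]])
    n = len(x_a)
    x_new = np.concatenate([x_a, g])
    # Posterior covariance with the new landmark block, Eq. (18).
    P_new = np.zeros((n + 2, n + 2))
    P_new[:n, :n] = P_a
    P_new[n:, :n] = Gv @ P_a[:3, :]       # cross-covariance with all existing states
    P_new[:n, n:] = P_new[n:, :n].T
    P_new[n:, n:] = Gv @ P_a[:3, :3] @ Gv.T + Gz @ R @ Gz.T
    return x_new, P_new
```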
Figure 1. a) Estimated trajectory of the robot using the proposed landmark detection method; and b) Estimated trajectory of the robot using a slower landmark acquisition rate.
3. Experimental Results
Figs. 1.a and 1.b show the experimental results obtained by running two different landmark acquisition rates. The figures illustrate the estimated trajectory. The robot pose uncertainty has been drawn over the trajectory. Fig. 1.a has been generated using the proposed landmark acquisition algorithm. The whole EKF-SLAM algorithm runs every 200 ms on the 400 MHz VersaK6 PC104+ embedded on our Pioneer 2AT mobile platform. The landmark acquisition algorithm only takes 25 ms, including the 180° laser data acquisition. It can be noted that the robot pose uncertainty is bounded due to more frequent updating. To obtain the estimated trajectory of Fig. 1.b, the robot has been moved through the same path. However, in this case the landmark acquisition algorithm is slower. Then, fewer landmarks are acquired and the updating rate decreases. It can be appreciated that the robot pose uncertainty is higher and its pose estimation is poorer than in Fig. 1.a.
4. Conclusions and Future Work
Experiments show that EKF-based SLAM problems can be alleviated when a fast and reliable landmark acquisition algorithm is employed. The proposed landmark extraction algorithm reduces the computational cost associated with the whole process. This fact is especially interesting because
it avoids long periods without a suitable update process. Increasing the updating rate reduces the robot pose uncertainty and prevents data association from becoming very fragile 1. On the other hand, it has been shown that in the basic EKF-SLAM framework, linearization errors produce inconsistency problems 1. These problems can be reduced using local maps or robocentric mapping 2. Future work will be focused on the combination of these techniques with the proposed fast landmark extraction approach in order to further improve map consistency. Besides, a batch data association method would improve the updating stage because it provides more information in the innovation 1.
Acknowledgments
This work has been partially supported by the Spanish Ministerio de Educación y Ciencia (MEC), project no. TIN2005-01349.
References
1. T. Bailey. Mobile robot localisation and mapping in extensive outdoor environments, PhD Thesis, Australian Centre for Field Robotics, University of Sydney (2002).
2. J. A. Castellanos, J. Neira and J. D. Tardós. Limits to the consistency of EKF-based SLAM, 5th IFAC Symp. on Intelligent Autonomous Vehicles (IAV'04), Lisbon, Portugal (2004).
3. P. Reche, C. Urdiales, A. Bandera, C. Trazegnies and F. Sandoval. Corner detection by means of contour local vectors, Electronics Letters, 38(14), pp. 699-701 (2002).
4. S. Roumeliotis and G. A. Bekey. SEGMENTS: A layered, dual Kalman filter algorithm for indoor feature extraction, Proc. IEEE/RSJ Int. Conf. on Intelligent Robots and Systems, pp. 454-461 (2000).
5. J. D. Tardós, J. Neira, P. M. Newman and J. J. Leonard. Robust mapping and localization in indoor environments using sonar data, Int. Journal of Robotics Research, pp. 311-330 (2002).
OBSTACLE AVOIDANCE LEARNING FOR BIOMIMETIC ROBOT FISH
ZHIZHONG SHEN, MIN TAN, ZHIQIANG CAO, SHUO WANG, ZENGGUANG HOU
Laboratory of Complex Systems and Intelligence Science, Institute of Automation, Chinese Academy of Sciences, Beijing, 100080, China
Abstract - Biomimetic robot fish can play an important role in underwater object reconnaissance and tracking. In order to fulfill more complex tasks, the capability of obstacle avoidance is indispensable for a robot fish. The design of a biomimetic robot fish with multiple ultrasonic and infrared sensors is presented in this paper. An obstacle avoidance strategy based on reinforcement learning is proposed and state-behavior pairs are obtained. Computer simulation shows the validity of the learning results.
1. Introduction
Bionics has extended into many domains since the 1960s. As one of its main applications, bionics has been closely integrated with robotics. Fish, the earliest vertebrates in nature, have evolved extraordinary swimming abilities. This provides humans with new ideas for combining the advantages of fish with robot technology to develop new styles of underwater thrusters [1]. MIT successfully developed a biomimetic robot fish (RoboTuna) [2] in 1994, which originated the research and development of robot fish. Since then, a lot of experimental platforms and samples of biomimetic robot fish have appeared for different purposes [3][4][5]. Some reviews concerning fish swimming and the analytical methods that have been applied to some of their propulsive mechanisms have appeared [6][7]. It is probable that robot fish will play an important role in many domains thanks to characteristics such as flexible maneuverability, high efficiency and low noise. Nevertheless, many robot fish must currently be controlled by host computers, which limits their applications. Therefore, the ability of autonomous swimming for robot fish becomes an important research topic. In this paper, a kind of biomimetic robot fish with ultrasonic and infrared sensors is designed and an autonomous obstacle avoidance strategy based on reinforcement learning is put forward.
2. Biomimetic Robot Fish with Ultrasonic and Infrared Sensors
Based on the volume of the robot fish and the requirements of the task, three ultrasonic sensors are mounted on the robot fish. The beam angle of each sensor is 30 degrees. These three sensors are located on the right, frontal and left sides of the robot fish head. In addition, in order to make up for the measurement blind area of the ultrasonic sensors, three infrared sensors are added. The infrared sensors are also used as signals to take action in urgent situations. Obstacle avoidance for the robot fish is realized by controlling the turning mode of the tail fin. As shown in Figure 1, three turning modes are defined for the robot fish, described as follows:
• Mode A, turning in the course of swimming. The robot fish swings its tail only to one side during a turn.
• Mode B, swift turning: the robot fish swings its tail to one extreme position, keeps the body in a C posture, and then turns depending on the hydrodynamic force.
• Mode C, stationary turning: the robot fish swings its tail rapidly to one side from a stationary state, and then turns rapidly depending on the hydrodynamic force and inertia. Rapid turning can be obtained with this mode.
Figure 1. Three turning modes for robot fish
The above three turning modes are the basic ones. Different turning control can be realized by combining these modes.
3. Obstacle Avoidance Control Based on Reinforcement Learning
The task of obstacle avoidance for the robot fish is as follows. The robot fish swims in an environment scattered with obstacles. By processing the information provided by the ultrasonic and infrared sensors, the robot fish performs different turning modes to accomplish autonomous swimming. An obstacle avoidance strategy based on reinforcement learning is put forward in the following.
3.1. Reinforcement Learning
Reinforcement learning is a real-time, on-line learning method. It learns behaviors through trial-and-error interactions and does not need a mathematical model of the environment or the task. Hence, only the goal is needed; the robot fish does not have to know in advance how to reach it. Through learning, the robot fish can obtain a set of optimal strategies from its experience of states, actions and rewards. There are many learning algorithms for different problems; Dyna, Sarsa and Q-learning are the common ones. The Sarsa algorithm is adopted in this paper to learn strategies.
3.2. State and Behavior Sets of Robot Fish
The combination of an ultrasonic sensor and an infrared sensor can obtain relatively complete information about one direction. The data provided by the sensors are divided into sections. Here, we label L_m the maximum detection distance of the ultrasonic sensor. The critical distance to perform obstacle avoidance behavior is L_v. L_h denotes the dangerous distance between the robot fish and an obstacle. Assume that the blind-area distance of the ultrasonic sensor is equal to the maximum detection distance L_i of the infrared sensor. The relation of these four distances is L_m > L_v > L_h > L_i. The urgent situation in which the distance between the robot fish and an obstacle is less than L_i is discussed in [8]. The obstacle distribution has three states according to the sensor value L_d: s_0, the robot fish does not detect any obstacle or the detected obstacle is very far away, that is, L_d > L_m or L_d > L_v; s_1, the robot fish is far from the detected obstacle, that is, L_v ≥ L_d > L_h; s_2, the robot fish is near an obstacle, that is, L_h ≥ L_d > L_i. The combination of the three sensors is expressed as s_L s_F s_R, where s_L is the state of the left ultrasonic sensor. Similarly, s_F and s_R are the states of the frontal and right ultrasonic sensors, respectively. The number of possible states for the robot fish is 27. However, the states of the three sensors are not of the same importance. Because the main swimming direction of the robot fish is forward, whether there is an obstacle in front of the head and the distance between the frontal ultrasonic sensor and the obstacle are more important. Moreover, in order to decrease the size of the state set and increase the learning speed, part of the states are combined in this paper. The rule to combine states is given by:
s_L/R = s_1   when   s_L/R = s_1  or  s_L/R = s_2    (1)
There are 12 states after combination: S_1 = s_0 s_0 s_0, S_2 = s_0 s_0 s_1, S_3 = s_0 s_1 s_0, S_4 = s_0 s_1 s_1, S_5 = s_0 s_2 s_0, S_6 = s_0 s_2 s_1, S_7 = s_1 s_0 s_0, S_8 = s_1 s_0 s_1, S_9 = s_1 s_1 s_0, S_10 = s_1 s_1 s_1, S_11 = s_1 s_2 s_0, S_12 = s_1 s_2 s_1. Considering the performance of the robot fish, the obstacle avoidance behaviors are designed as follows: b_1, the robot fish turns right while stationary; b_2, the robot fish turns left while stationary; b_3, the robot fish turns right with velocity V; b_4, the robot fish swims forward; b_5, the robot fish turns left with velocity V; b_6, the robot fish swims randomly. Among these behaviors, b_1 to b_5 are the learning aims for the robot fish; behavior b_6 is not learnt. The robot fish chooses b_6 automatically when no sensors detect obstacles.
3.3. Rewards
Because an ultrasonic sensor cannot accurately determine the orientation of obstacles relative to the robot fish, obstacles within the beam angle of the sensor may generate the same signal. The status of the three ultrasonic sensor signals is therefore integrated. We define ψ as the angle of the obstacle relative to the robot fish body, that is:
ψ = 30°              if s_L = 1 ∩ s_F = 0 ∩ s_R = 0
    15°              if s_L = 1 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 0
    0°               if s_L = 0 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 0
    -15°             if s_L = 0 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 1
    -30°             if s_L = 0 ∩ s_F = 0 ∩ s_R = 1
    -30° ∪ 30°       if s_L = 1 ∩ s_F = 0 ∩ s_R = 1
    -30° ∪ 0° ∪ 30°  if s_L = 1 ∩ (s_F = 1 ∪ s_F = 2) ∩ s_R = 1        (2)
If the distance values of the three sensors are d_L, d_F and d_R, d is defined as the distance between the obstacle and the robot fish, that is,

d = min(d_L, d_F, d_R)    (3)
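A sketch of the state/orientation mapping of Eqs. (1)-(3) is given below; the threshold names follow the text, while the function layout is purely illustrative.

```python
def sensor_state(L_d, L_m, L_v, L_h, side=False):
    """Discretise one ultrasonic reading into s0/s1/s2; for the side sensors,
    Eq. (1) merges s1 and s2 into s1. The urgent case L_d < L_i is handled
    separately, as in [8]."""
    if L_d > L_m or L_d > L_v:
        s = 0
    elif L_d > L_h:
        s = 1
    else:
        s = 2
    return min(s, 1) if side else s

def obstacle_angle_and_distance(d_L, d_F, d_R, L_m, L_v, L_h):
    """Integrated obstacle orientation psi (Eq. 2) and distance d (Eq. 3)."""
    sL = sensor_state(d_L, L_m, L_v, L_h, side=True)
    sF = sensor_state(d_F, L_m, L_v, L_h)
    sR = sensor_state(d_R, L_m, L_v, L_h, side=True)
    table = {(1, False, 0): [30], (1, True, 0): [15], (0, True, 0): [0],
             (0, True, 1): [-15], (0, False, 1): [-30],
             (1, False, 1): [-30, 30], (1, True, 1): [-30, 0, 30]}
    psi = table.get((sL, sF >= 1, sR), [])     # Eq. (2); empty when no obstacle
    return psi, min(d_L, d_F, d_R)             # Eq. (3)
```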
Based on ψ, ξ is defined as the angle between the next swimming direction of the robot fish and the obstacle orientation ψ. In [9], Tucker Balch gives a set of descriptors for rewards. For the obstacle avoidance of the robot fish, immediate reward and delayed reward are combined in this paper. The following instances may produce positive reinforcement:
1. The robot fish swims from a state in which there are obstacles around it to a state in which there are none.
2. The behavior the robot fish performs makes the angle ξ increase.
3. When the robot fish's heading is far from an obstacle, the state-behavior pair that enlarges the exploration scope is encouraged.
The instances that produce negative reinforcement are as follows:
4. The behavior the robot fish performs makes the angle ξ decrease.
5. When the robot fish's heading is near an obstacle, the distance between the obstacle and the robot fish decreases after performing the behavior.
All these rewards are immediate rewards. On condition that reward 1 is satisfied, the former two state-behavior pairs are encouraged; this is the delayed reward.
4. Simulations
The ε-greedy strategy is adopted to choose behaviors and update the Q values. The initial value of ε is 0.95 and its final value is 0.05. The total number of episodes T is 65. The learning rate α is 0.03. The fixed discounting factor γ is 0.9. The maximum step number of every episode is 3100. Through learning, the state-behavior pairs are obtained. From the learning results, the following rules are derived.
Rule 1. The robot fish turns right when there is an obstacle on the left side and turns left when there is an obstacle on the right side. If there are obstacles on both the right and left sides, the robot fish swims forward.
Rule 2. When the robot fish is far from an obstacle, it performs Mode A to avoid it. When the obstacle is in the dangerous area of the robot fish, it performs Mode C to avoid it.
The strategies learned by the Sarsa algorithm are tested through simulation in the environment depicted in Fig. 2. From the trajectory of the robot fish, we can see that the robot fish can avoid obstacles.
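A minimal Sarsa sketch with the hyperparameters reported above (α = 0.03, γ = 0.9, ε decaying from 0.95 to 0.05 over 65 episodes); the simulator interface `env` is a hypothetical placeholder, not the authors' code.

```python
import random
from collections import defaultdict

def epsilon_greedy(Q, state, behaviors, eps):
    """Pick a behavior with the epsilon-greedy rule."""
    if random.random() < eps:
        return random.choice(behaviors)
    return max(behaviors, key=lambda b: Q[(state, b)])

def sarsa(env, behaviors, episodes=65, alpha=0.03, gamma=0.9,
          eps_start=0.95, eps_end=0.05, max_steps=3100):
    """Tabular Sarsa sketch; env exposes reset() and step(state, behavior)."""
    Q = defaultdict(float)
    for ep in range(episodes):
        eps = eps_start + (eps_end - eps_start) * ep / (episodes - 1)
        s = env.reset()
        b = epsilon_greedy(Q, s, behaviors, eps)
        for _ in range(max_steps):
            s2, r, done = env.step(s, b)
            b2 = epsilon_greedy(Q, s2, behaviors, eps)
            # On-policy Sarsa update.
            Q[(s, b)] += alpha * (r + gamma * Q[(s2, b2)] - Q[(s, b)])
            s, b = s2, b2
            if done:
                break
    return Q
```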
Figure 2. A typical swimming trajectory
5. Conclusion
The obstacle avoidance capability is indispensable for a robot fish operating in an unknown environment. With the increase of the category and quantity of sensors, it becomes difficult for a human to determine the swimming rules manually. The design of a kind of biomimetic robot fish with multiple ultrasonic and infrared sensors is presented in this paper. By using reinforcement learning, the robot fish can learn the obstacle avoidance strategies without the need of an accurate mathematical model. The number of states is reduced and the learning speed is enhanced through state combination. The validity of the learning results is tested through computer simulation. In the near future, we will focus on applying the strategies to the designed robot fish.
Acknowledgments
This work is funded by research grants from the 973 Project (No. 2002CB312200), NSFC (No. 50475179) and the 863 Project (No. 2004AA4201104 and No. 2005AA420040).
References
1. Colgate J E, Lynch K M. Mechanics and control of swimming: a review. IEEE Journal of Oceanic Engineering, 29(3): 660-673 (2004).
2. Triantafyllou M, Triantafyllou G S. An efficient swimming machine. Scientific American, 272(3): 64-70 (1995).
3. Ayers J, Davis J, Rudolph A. Neurotechnology for biomimetic robots. Cambridge, MA: MIT Press (2002).
4. CALibot. http://www.me.berkeley.edu/~lwlin/mel02B/project.html.
5. Kato N, Furushima M. Pectoral fin model for maneuver of underwater vehicles. Proc. of the 1996 Symposium on Autonomous Underwater Vehicle Technology (1996).
6. Sfakiotakis M, Lane D M, and Davies J B C, "Review of fish swimming modes for aquatic locomotion," IEEE Journal of Oceanic Engineering, 24(2): 237-252 (1999).
7. Cheng J, Zhuang L, and Tong B, "Analysis of swimming of three-dimensional waving plates," J. Fluid Mech., vol. 232, 341-355 (1991).
8. Sang Haiquan, Wang Shuo, Tan Min, Zhang Zhigang. Autonomous obstacle avoidance of biomimetic robot fish based on infrared sensor. Journal of System Simulation, 17(6): 1400-1404 (2005).
9. Tucker Balch. Behavioral diversity in learning robot systems. Ph.D. Dissertation, Georgia Institute of Technology (1998).
SNAKE-LIKE BEHAVIORS USING MACROEVOLUTIONARY ALGORITHMS AND MODULATION BASED ARCHITECTURES
J. A. BECERRA, F. BELLAS, AND R. J. DURO*
Grupo de Sistemas Autónomos, Universidade da Coruña, Spain
J. DE LOPE
Grupo de Percepción Computacional y Robótica, Universidad Politécnica de Madrid, Spain
In this paper we describe a methodology for automatically obtaining modular artificial neural network based control architectures for snake-like robots. This approach is based on the use of behavior modulation structures that are incrementally evolved using macroevolutionary algorithms. The method is suited for problems that can be solved through a progressive increase of the complexity of the controllers and for which the fitness landscapes are mostly flat with sparse peaks. This is usually the case when robot controllers are evolved using life simulation for evaluating their fitness.
1. Introduction
The main goal in the design of snake-like robots is to imitate the body configuration and, therefore, the motion of biological snakes [1]. Snakes are able to crawl on almost any surface, including slopes, slippery ground or both. With more or less difficulty, the snake will arrive at the goal zone. This kind of robot is mainly indicated when the objective is to reach zones that are difficult to access, or where the workspace and the environment do not allow the operation of conventional, wheeled or legged, robots. For a snake-like robot, a small space slightly larger than its body section is enough for moving forward. A classical application example of this kind of robot is motion in channels or pipes. From the energy consumption point of view, snake-like robots represent very efficient solutions. The total balance of consumed energy is comparable to other animals of similar mass [2]. Probably, the main contributions to snake-like robots are due to Hirose and his group [1] and more recently Chen et al. [3]. Basically they describe the implementation of several prototypes that emulate the behavior and gaits of biological snakes.
* This work was supported by the MEC of Spain through project CIT-370300-2005-24 and Xunta de Galicia through project PGIDIT03TIC16601PR.
In order to define the motion as realistically as possible, Hirose formulated the serpenoid function, which is a sinusoidal function that associates parameters such as amplitude, frequency or phase with the curvature of the spine of the snake. Another approach is to define pattern generators whose outputs are applied consecutively and cyclically to the robot to produce the desired motion. Some of the most representative papers in this area are those by Shan and Koren [4], in which a set of patterns is established for controlling the robot and making it move forward and turn, and Nilsson [5] with his pattern for climbing. The rest of the paper is organized as follows. In the next section we explain the structure and construction of the compound artificial neural network based behavior structure. Then we provide a description of the macroevolutionary algorithm used for the evolution of the controllers. After this we provide some examples of results obtained through the application of this approach and finally we provide some conclusions.
2. Constructing Control Architectures with Modulation
Evolving artificial neural network based control structures is commonplace in behavior based robotics research. The problem has become how to obtain complex multi-ANN structures that allow for the seamless combination of multiple ANNs operating together in an autonomous robot. A very promising approach is to contemplate the construction of the control system as an inter-modulating structure where the very basic behaviors are evolved individually and then, through the introduction of modulating structures, they are modified on line in order to adapt them to slightly changing situations. Thus, in the case of the construction of a control system for snake-like robots, one can obtain controllers for the generation of different types of snaking strategies when moving in a straight line and then use a modulating architecture in order to generate turns or intermediate strategies through their mutual modulation. If this process is carried out carefully, most of the structures thus obtained will be reusable for other tasks. We consider two types of modulating structures: sensor modulators and actuator modulators. In our case, both types of modulators are constructed using ANNs and are obtained through evolution. Formally:
• A module X is an ancestor of a module Y if there is a path from X to Y.
• X is a descendant of Y if there is a path from Y to X.
• X is a direct descendant if there is a path of length 1 from Y to X.
• X will be called a Root node (denoted as R) if it has no ancestors.
• X is an actuator node (A) if its outputs establish values for the actuators.
• X is a selector node (S) if its output selects one of its direct descendants as the branch to follow, short-circuiting the others.
• X is an actuator modulating node (AM) if its outputs modify (multiplying by a value between 0 and 2) the outputs of its descendant nodes of type A. The modulations propagate through the controller hierarchy. If between R and A there is more than one AM that modulates one output of A, the resulting modulating value will be the product of the individual modulations in the path. Assuming that an AM modulates the values of n actuators, its number of outputs must necessarily be n times the number of direct descendants, as the modulation propagated to each descendant is usually different. When more than one node A provides values for the same actuator, the actuator receives the sum of these values. An AM does not necessarily modulate all the actuators over which the set of nodes acts, just any subset of them.
• X is a sensor modulating node (SM) if its outputs modify (multiplying by a value between 0 and 2) the inputs of its descendant nodes. The modulations propagate through the controller hierarchy to the actuator nodes. If between R and Y there is more than one SM that modulates one input of Y, the resulting modulating value will be the product of the individual modulations in the path. Assuming that an SM modulates the values of n sensors, its number of outputs must necessarily be n times the number of direct descendants, as the modulation propagated to each descendant is usually different. An SM does not necessarily modulate all the sensors over which the nodes act, just any subset (a short sketch of how these modulation values combine is given after Fig. 1).
Fig. 1. Example of a controller with all the elements of the architecture.
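The multiplicative propagation of these modulations can be summarised in a small sketch. The node interface (`kind`, `net`, `children`, `gains_for`) is purely illustrative and not the authors' implementation; only the combination rules (gains in [0, 2], products along paths, sums per actuator) come from the text.

```python
def modulated_outputs(node, sensors, act_gain=None, sens_gain=None):
    """Sketch of the modulation rules: actuator modulations multiply along the
    path from the root, sensor modulations multiply the inputs, and actuator
    values coming from several A nodes are summed."""
    act_gain = dict(act_gain or {})
    sens_gain = dict(sens_gain or {})
    inputs = [v * sens_gain.get(i, 1.0) for i, v in enumerate(sensors)]
    if node.kind == 'A':                      # leaf: produce actuator values
        return {a: v * act_gain.get(a, 1.0) for a, v in enumerate(node.net(inputs))}
    totals = {}
    for child in node.children:
        c_act, c_sens = dict(act_gain), dict(sens_gain)
        if node.kind == 'AM':                 # multiply actuator gains on this branch
            for a, m in node.gains_for(child, inputs).items():
                c_act[a] = c_act.get(a, 1.0) * m
        elif node.kind == 'SM':               # multiply sensor gains on this branch
            for s, m in node.gains_for(child, inputs).items():
                c_sens[s] = c_sens.get(s, 1.0) * m
        for a, v in modulated_outputs(child, sensors, c_act, c_sens).items():
            totals[a] = totals.get(a, 0.0) + v    # sum contributions per actuator
    return totals
```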
The use of actuator modulators leads to a continuous range of behaviors for the transitions between those determined by the individual controllers. This is due to the fact that actuator values can now be a linear combination of those produced by every low level module.
Sensor modulators permit changing how a module reacts under a given input pattern, transforming it into a different one. This way, it is very easy to make changes in the reaction to already learnt input patterns. In addition to increasing the architecture's possibilities, modulation results in a very interesting secondary effect: there can be more than one sub-tree being executed simultaneously in the controller. So, the architecture is not really different from a distributed architecture where modules are categorized into different groups, because actuator modulators can be put together in the same level and sensor modulators can be set aside from the hierarchy and attached to the appropriate inputs where necessary. Figure 2 displays an alternative representation for a controller taking this equivalence into account.
Due to the large computational requirements for calculating the fitness of each solution (a possible robot controller that must live its life out in a real or simulated environment), and in order to make computing times bearable, most processes in evolutionary robotics imply relatively small populations. The usual fitness landscapes with these types of life fitness functions imply large areas of mostly flat fitness values and some very sparse peaks where all the action takes place. This makes it very difficult for traditional genetic or evolutionary algorithms, which tend to converge to suboptimal solutions after under-exploring these very large solution spaces. In this work we address this issue through the application of macroevolutionary algorithms, which were proposed in [7]. The authors consider a new temporal scale, the "macroevolutionary" scale, in which the extinctions and diversification of species are modeled. The population is interpreted as a set of species that model an ecological system. The species may become extinct if their survival ratio with respect to the others is lower than a "survival coefficient". This ratio measures the fitness of a species with respect to the fitness of the others. When species become extinct, a diversification operator colonizes the vacancies with species derived from those which survived or with completely new ones.
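A minimal sketch of the extinction/colonization cycle just described. The survival-ratio formula, the parameter names and the perturbation used for colonization are illustrative simplifications, not the exact operators of [7].

```python
import random

def macroevolutionary_step(population, fitness, tau=0.5, exploit_prob=0.7):
    """One generation of a macroevolutionary-style algorithm: species whose
    relative survival ratio falls below the survival coefficient tau become
    extinct, and vacancies are colonized from survivors or new random species.
    `population` is a list of real-valued genotypes; `fitness` maps genotype -> float."""
    scores = [fitness(p) for p in population]
    lo, hi = min(scores), max(scores)
    ratios = [(s - lo) / (hi - lo + 1e-12) for s in scores]   # illustrative survival ratio
    survivors = [p for p, r in zip(population, ratios) if r >= tau]
    if not survivors:                          # always keep at least the best species
        survivors = [population[scores.index(hi)]]
    new_population = list(survivors)
    while len(new_population) < len(population):
        if random.random() < exploit_prob:     # colonize from a survivor (small perturbation)
            parent = random.choice(survivors)
            new_population.append([g + random.gauss(0.0, 0.1) for g in parent])
        else:                                  # completely new random species
            new_population.append([random.uniform(-1.0, 1.0) for _ in population[0]])
    return new_population
```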
Fig. 2. Alternative representation for the architecture (sensor modulators, actuator modulators and internal state modifying modules grouped into separate levels).
3. Experiments and Results
To test the approach, we designed a very simple testing environment where, as in [4], we have a 7-segment snake with inter-segment joints that can turn in angles between -120 and 120 degrees and a set of actuators (solenoids) that allow the snake to establish friction with the surface (similar to thrusting sticks into the ground). These actuators can either be on or off. The snake operates in an environment with a light source it must reach. To obtain an unbiased fitness function for the evolutionary process, during evolution we lay pieces of food on the ground in the direction of the light and every time the snake goes over a piece of food its energy is increased. When it moves, its energy is decreased. This type of fitness function favours reaching the light as fast as possible. All of the controllers evolved are artificial neural networks whose outputs provide the angles for the joints and the state of the solenoids, and whose inputs are the output angles and solenoid states at the previous instant of time. In the case of modulators, the input is the sensor value for the position of the light and the outputs are the modulation values for the angles. We have run multiple evolution processes under different physical and energetic constraints where, initially, the snake had to reach the light following a straight path in order to obtain the basic snaking behavior. After the behavior was obtained, we ran evolutionary processes to produce the modulators that would allow the snake to reach a light even if it had to turn. Figures 3a and 3b display the results of two evolutions of a snaking behavior under different constraints. The populations used in the tests were between 800 and 16000 individuals. Each chromosome was 320 genes long, containing real numbers. The controller obtained by the process was a synaptic delay based network with two 6-node hidden layers, whose 13 inputs were the outputs of the controller at the previous instant of time in a recursive topology. The gait the snake obtained in the first case is very similar to the typical sidewinder
Fig. 3. Best individual in basic straight line sidewinding motion behavior and in a regular snaking behavior (a, b). Turning behavior through the modulation of the previous straight line motion behavior (c). Evolution of fitness when evolving the modulator (d).
motion found in some real snakes. In the second case (figure 3b) the motion is more similar to a traditional snaking behavior. Figure 3c displays the behavior obtained through the modulation of the robot's main behavior controller (shown in 3b) in order to turn towards the objective. This modulating controller is also a synaptic delay based neural network with 2 input nodes, one 6-node hidden layer and 6 output nodes (modulating the angles of the basic snaking behavior). It was evolved for 300 generations using a 72-gene chromosome and 16000 individuals and, as shown in Figure 3c, provides a successful modulated behavior. Figure 3d shows the fitness for the best individual and the average of the population when evolving the modulator where, as we can see, the fitness increases to a stable level in a few generations due to the intrinsic simplicity of the modulators that arise using this architecture.
4. Conclusions
In this paper we propose an approach for constructing multiple module behavior architectures for snake-like robots through a modulation structure using macroevolutionary algorithms. The application of this method implies first the generation of the basic behavior controller in a simplified environment and the consequent generation of the necessary modulators in order to achieve the complex behavior by gradually increasing the complexity of the environment. The results show that this way it is quite simple to generate all types of intermediate behaviors in a very natural manner.
References
1. S. Hirose, Biologically Inspired Robots. Snake-like Locomotors and Manipulators. Oxford University Press (1993).
2. M. Walton and B. C. Jayne, "The Energetic Cost of Limbless Locomotion", Science, 524-527 (Aug. 1990).
3. L. Chen, Y. Wang, S. Ma and B. Li, "Analysis of Traveling Wave Locomotion of Snake Robot", Proc. IEEE Int. Conf. RISSP, 365-369 (2003).
4. Y. Shan and Y. Koren, "Design and Motion Planning of a Mechanical Snake", IEEE Trans. on Syst., Man, and Cyb., 23(4):1091-1100 (1993).
5. M. Nilsson, "Snake Robot Free Climbing", IEEE Control Systems, 21-26 (1998).
6. Duro, R. J., Santos, J., Becerra, J. A. (2000), "Evolving ANN Controllers for Smart Mobile Robots", Future Directions for Intelligent Systems and Information Sciences, Springer-Verlag, pp. 34-64.
7. Marin, J. and Sole, R. V. (1999), "Macroevolutionary Algorithms: A New Optimization Method on Fitness Landscapes", IEEE Transactions on Evolutionary Computation, Vol. 3, No. 4, pp. 272-286.
DECISION TREE AND LIE ALGEBRA METHOD IN THE SINGULARITY ANALYSIS OF PARALLEL MANIPULATORS*
KUANGRONG HAO 1 AND YONGSHENG DING 1,2
1 College of Information Sciences and Technology, 2 Engineering Research Center of Digitized Textile & Fashion Technology, Ministry of Education, Donghua University, Shanghai 201620, P. R. China
The singularity analysis of a parallel manipulator is often very complicated. There exist multiple criteria to determine the singular behavior of a parallel manipulator, for example the rank condition of a screw set χ of the parallel mechanism, the second-order criterion of the screw set χ, the transverse criterion, etc. This paper aims to explain, for the first time, a general method to analyze the relationship between the different criteria of singular configurations with the help of the decision tree method. Through the analysis of a large number of parallel manipulators, we find that the second-order criterion is the most important one to determine whether a singular configuration is bifurcated.
1. Introduction
Parallel manipulators, also known as closed-chain manipulators such as parallel robots, are widely used in many domains due to their high load capacity and high precision. At the same time, they present many complicated singular configurations to be analyzed. The singular behavior of such a manipulator depends not only on the variation of the rank of the associated Jacobian matrix, but also on geometric conditions of the second order. In this paper, we study a screw set χ composed of n skew-symmetric fields, which is equivalent to studying the Jacobian matrix. By virtue of dual numbers, the rank of χ is calculated by defining a free maximal list of χ, avoiding the calculation of the sub-determinants of matrices of dimension 6 [1]. The behaviour of singular configurations of parallel manipulators can be determined with the rank condition of a free maximal list of χ, the transverse condition and the second-order property of the screw set χ.
* This work was supported in part by the Key Project of the National Nature Science Foundation of China (No. 60534020), the National Nature Science Foundation of China (No. 60474037), and the Program for New Century Excellent Talents in University (No. NCET-04-415).
The decision tree method is frequently used in knowledge processing for robot control [2]. It can also be used to aid engineering design [3, 4]. This paper presents, for the first time, an application of the ID3 algorithm to knowledge acquisition for the recognition of the singularity of mechanisms. This method can be used for robot real-time control and singular configuration identification, avoiding the complicated analysis of screw sets.
2. Rank Problem of a Screw Set and Generated Lie Sub Algebras
2.1. System Definitions
• e: the affine Euclidean space of 3 dimensions. E: the vector space attached to e.
• D: the displacement group; e: its unit element. D: the Lie algebra of dimension 6 of D.
• [.,.]: the Lie bracket, defined by
∀X, Y ∈ D, [X, Y](m) = ω_X ∧ Y(m) - ω_Y ∧ X(m), ∀m ∈ e.
• exp: D → D, the exponential map.
• The module structure on D: the set of dual numbers ẑ = z + εz°, with z, z° ∈ R and ε² = 0, is the dual number ring Δ. A skew-symmetric vector field X of D can be expressed in a basis of D over Δ. ẑ = (ẑ^1, ẑ^2, ẑ^3) is called the coordinates of X in this basis. Furthermore, z = (z^1, z^2, z^3) is the real part, and z° = (z^10, z^20, z^30) is the dual part.
2.2. Rank of a screw set
The most classical method to calculate the rank is to evaluate determinants of matrices. The notion of rank for screw systems is described as linear dependence over R. Let χ = (x_1, ..., x_n) be a set of skew-symmetric vector fields which is a subset of the Lie algebra D. We find a list composed of some linearly independent elements of χ over R, the number of which is equal to the rank of χ. Such a list L_R = {x_b1, x_b2, ..., x_brR} is then a basis of the space generated by χ over R. χ contains a free maximal list over Δ; its number of elements is r_Δ, and this free maximal list is denoted by L_Δ = {x_a1, x_a2, ..., x_arΔ}. L_Δ is a free maximal list of the set χ. In order to analyze the rank, we must build a basis B_Δ of D over Δ from a free maximal list L_Δ of χ. We can choose B_Δ composed of r_Δ elements of χ and of 3 - r_Δ elements which are not in χ. So B_Δ is not the same for different χ, and we distinguish the particular cases as follows:
1. Maximal case: r_Δ = 3, B_Δ = L_Δ = {x_1, x_2, x_3};
2. Medium case: r_Δ = 2, B_Δ = {y_1, y_2, y_3}, where y_1 = x_1, y_2 = x_2 and y_3 ∉ χ;
3. Minimum case: r_Δ = 1, B_Δ = {y_1, y_2, y_3}, where y_1 = x_1 and y_2, y_3 ∉ χ.
2.3. Second order condition of parallel manipulator
Consider a parallel manipulator consisting of rigid bodies, with n links connected by n joints. To every joint, a Lie subgroup of D is associated, and its Lie algebra is a subalgebra of D. q_i denotes the parameter of joint i. The closure equation of a parallel manipulator is f(q) = e, with f(q) = exp(q_1 ξ_1) ∘ ... ∘ exp(q_n ξ_n), q = (q_1, ..., q_n) ∈ R^n. The subset M = f^{-1}(e) of R^n is
the kinematically admissible space of the manipulator. x_k(q) = Ad(exp(q_1 ξ_1) ∘ exp(q_2 ξ_2) ∘ ... ∘ exp(q_{k-1} ξ_{k-1})) ξ_k for k = 1, ..., n, where ξ_k is the generator of the corresponding one-parameter subgroup and Ad denotes the adjoint map. r = rank(χ) = rank(x_1(q), x_2(q), ..., x_n(q)). By the classical result of differential geometry, the DOF is equal to n - r in the regular cases, i.e., when r is locally constant. Let F_q = (x_1(q), ..., x_n(q)) be the sub vector space of dimension r of D in which x_a(q) is a basis of F_q, with a ∈ I_1 = {1, ..., r}. Then x_i(q) = Σ_a c_i^a x_a with i ∈ I_2 = {1, ..., n - r}, where the c_i^a are functions of q. With F_q ⊕ G_q = S_q, v_p(q) is a basis of G_q with p ∈ I_3 = {1, ..., s - r}, where dim(S_q) = s. In fact, χ = (x_1(q), ..., x_n(q)) is a map from R^n into L(R^n, S_q), and a second-order form H_q is associated with it; H_q ∈ F_q is a necessary condition for the related configuration to be regular.
2.4. Transverse condition
The set E^r of u with rank(u) = r is a submanifold of co-dimension equal to (n - r)(s - r) of the vector space L(R^n, S_q), with 0 < r < min(n, s), and E^r = {u ∈ L(R^n, S_q) | rank(u) = r}.
For q ∈ R^n, χ(q) ∈ E^r is called transverse to E^r at q if the image of T_q χ is transverse to the subspace T_{χ(q)} E^r. Let q be a point of R^n such that χ(q) ∈ E^r. We suppose that the subspaces χ'(q)·R^n and T_{χ(q)} E^r are transverse; then Σ^r(f) = χ^{-1}(E^r) is a submanifold of R^n at q and we have T_q(χ^{-1}(E^r)) = χ'(q)^{-1}(T_{χ(q)} E^r). Writing u = χ'(q)·h + v, with u ∈ L(R^n, S_q), h ∈ R^n and v ∈ T_{χ(q)} E^r, the transverse condition is obtained by verifying whether the subspaces χ'(q)·R^n and T_{χ(q)} E^r are transverse.
Proposition 1. A is defined as the matrix of dimension (s - r)(n - r) × n by
A = (A_{(p,i),l}) = (G_{li}^p),   ∀p ∈ I_3, ∀i ∈ I_2, ∀l ∈ (1, ..., n).
χ is transverse to E^r if and only if rank(A) = (n - r)(s - r), where G_{li}^p is defined by [x_l, x_i] = Σ_a C_{li}^a x_a + Σ_p G_{li}^p v_p.
Proposition 2. If M ⊂ Σ^r(f), then M is a submanifold of Σ^r(f) and of R^n, with r = rank(x_1, ..., x_n) = dim(F_q). For all q ∈ M, we have T_qM = Ker(f') ∩ T_q Σ^r(f), DOF = dim(M) = dim(Ker(f') ∩ T_q Σ^r(f)), and T_qM = {h ∈ R^n | H_q·h ∈ F_q}.
If Propositions 1 and 2 are satisfied, the mechanism has a DOF at the point q.
3. An application of the ID3 algorithm
3.1. Fundamental theory of the ID3 algorithm
The ID3 method works as follows. Suppose T = PE ∪ NE, where PE is the set of positive examples (e.g., the transverse condition is verified), and NE is the set of negative examples (e.g., the transverse condition is not verified), pe = |PE| and ne = |NE|. An example will be determined to belong to PE with probability pe/(pe + ne) and to NE with probability ne/(pe + ne). By employing the information-theoretic heuristic, a decision tree is considered as a source of a message, PE or NE, with the expected information needed to generate this message given by

I(pe, ne) = - pe/(pe + ne) · log2(pe/(pe + ne)) - ne/(pe + ne) · log2(ne/(pe + ne))   when pe ≠ 0 and ne ≠ 0,
I(pe, ne) = 0   otherwise.
If an attribute At with the value domain {A_1, ..., A_N} is used for the root of the decision tree, it will partition T into {T_1, ..., T_N}, where T_i contains those examples in T that have value A_i of At. Let T_i contain p_i examples of PE and n_i of NE. The expected information required for the sub-tree for T_i is I(p_i, n_i). The expected information required for the tree with At as the root, EI(At), is then obtained as a weighted average:

EI(At) = Σ_i (p_i + n_i)/(pe + ne) · I(p_i, n_i),

where the weight for the i-th branch is the proportion of the examples in T that belong to T_i. ID3 examines all candidate attributes, chooses At to minimize EI(At), constructs the tree, and then uses the same process recursively to construct decision trees for the residual subsets {T_1, ..., T_N}.
3.2. Generated decision tree for singularity identification
In the following, an implementation of the ID3 algorithm is given. Table 1 gives examples of singularity identification of parallel manipulators: ex_1: 4-link chain; ex_2: Bricard chain; ex_3: Bennett chain; ex_4: Saltcellar [5];
ex_5: Star parallel robot [6]; ex_6: UPU parallel robot [7]; ex_7: R-cube1 parallel robot [8]; ex_8: R-cube2 parallel robot [8]; ex_9: Delta parallel robot [9]; ex_10: Tsai's parallel robot [7]. Based on the above definitions, four candidate attributes are used for collecting cases: the maximal rank of the screw list χ (Mrank(χ) = r_max), the rank of the free maximal list L_Δ (rank(L_Δ) = r_Δ), whether the transverse condition is satisfied (T.C.S.), and whether the second order H_q of the screw list χ is in the space F_q generated by χ (H_q ∈ F_q). The Bennett chain and the Bricard chain are well-known mechanisms that work only in their singular configurations, and the obtained results also prove that they possess 1 DOF in their work space. The Star robot, the UPU manipulator, Tsai's manipulator, the R-Cube manipulator and the Delta robot have 3 DOF. The Star robot and the UPU manipulator are bifurcated at their home position, which means that these robots do not possess any DOF at this position. Tsai's manipulator is an improved version of the UPU; its home position is no longer singular after the change of the axis orientation of the revolute joints [7]. The R-Cube mechanism is a non-symmetric manipulator, so it is interesting to find that the result is different when a different chain is selected. Calculations:
T = {ex_1, ex_2, ex_3, ex_4, ex_5, ex_6, ex_7, ex_8, ex_9, ex_10}, pe = 4, ne = 6.
Then I(4, 6) = -0.4 log2(0.4) - 0.6 log2(0.6) = 0.9710.
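This calculation can be reproduced with a few lines of code; the function name is illustrative.

```python
from math import log2

def I(pe, ne):
    """Expected information of a PE/NE message source (Section 3.1)."""
    if pe == 0 or ne == 0:
        return 0.0
    total = pe + ne
    p, n = pe / total, ne / total
    return -p * log2(p) - n * log2(n)

print(round(I(4, 6), 4))   # 0.971, matching I(4, 6) = 0.9710
```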
Table 1: Knowledge cases obtained about the singularity of closed mechanisms

Example   rank(χ) maximal   r_Δ   T.C.S.   H_q ∈ F_q   DOF
ex_1      No                1     No       No          No
ex_2      No                3     Yes      Yes         Yes
ex_3      No                2     No       Yes         Yes
ex_4      Yes               2     No       No          No
ex_5      No                2     No       No          No
ex_6      No                2     No       No          No
ex_7      Yes               3     Yes      Yes         Yes
ex_8      No                2     No       No          No
ex_9      No                2     No       No          No
ex_10     Yes               3     Yes      Yes         Yes
Figure 1. Decision tree of the DOF of parallel manipulators (decision nodes: H_q ∈ F_q; rank(L_Δ) is maximal; rank(χ) is maximal; transverse condition satisfied).
By using the expression of EI(At), the information measure needed for generating a decision tree with each attribute as the root is computed; the attribute H_q ∈ F_q is chosen because it minimizes the entropy. Similarly, we then choose rank(L_Δ) to separate the subsets T_1 = {ex_2, ex_3, ex_7, ex_10} and T_2 = {ex_1, ex_4, ex_5, ex_6, ex_8, ex_9}. The complete decision tree is shown in Fig. 1 and Table 2.
Table 2. Calculations

Attribute    EI(At)                                     Result
H_q ∈ F_q    0.4·I(4,0) + 0.6·I(0,6)                    0
rank(L_Δ)    0.3·I(3,0) + 0.6·I(1,5) + 0.1·I(0,1)       0.3901
T.C.S.       0.3·I(3,0) + 0.7·I(1,6)                    0.4142
Mrank(χ)     0.3·I(2,1) + 0.7·I(2,5)                    0.8797
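The entries of Table 2 can be checked with a short script; the attribute encoding below simply transcribes the rows of Table 1, and the output matches the table up to rounding.

```python
from math import log2
from collections import defaultdict

def I(p, n):
    if p == 0 or n == 0:
        return 0.0
    t = p + n
    return -(p / t) * log2(p / t) - (n / t) * log2(n / t)

# Rows of Table 1: (rank(chi) maximal, r_Delta, T.C.S., Hq in Fq, DOF)
cases = [("No", 1, "No", "No", "No"),   ("No", 3, "Yes", "Yes", "Yes"),
         ("No", 2, "No", "Yes", "Yes"), ("Yes", 2, "No", "No", "No"),
         ("No", 2, "No", "No", "No"),   ("No", 2, "No", "No", "No"),
         ("Yes", 3, "Yes", "Yes", "Yes"), ("No", 2, "No", "No", "No"),
         ("No", 2, "No", "No", "No"),   ("Yes", 3, "Yes", "Yes", "Yes")]

def EI(attr_index):
    """Weighted average entropy when splitting on one attribute."""
    branches = defaultdict(lambda: [0, 0])
    for row in cases:
        branches[row[attr_index]][0 if row[-1] == "Yes" else 1] += 1
    total = len(cases)
    return sum((p + n) / total * I(p, n) for p, n in branches.values())

for name, idx in [("Hq in Fq", 3), ("rank(L_Delta)", 1), ("T.C.S.", 2), ("Mrank(chi)", 0)]:
    print(name, round(EI(idx), 4))   # prints ~0.0, 0.39, 0.4142, 0.8797 (cf. Table 2)
```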
From the result, it can be seen that the attribute H_q ∈ F_q is the most important one to decide the behaviour of singular points; theoretically, it is only a necessary condition for the manipulator to have a DOF, not a sufficient one. However, as far as we know, we have not found an example satisfying H_q ∈ F_q that does not possess a DOF, although this has not been theoretically demonstrated yet. At the same time, the rank of the screw list χ, the rank of the free maximal list L_Δ and the transverse condition are also important requirements in the identification of the singularity of closed mechanisms.
4. Conclusions
In this paper, the application of the integration of the differential geometry method and the ID3 algorithm to knowledge acquisition for the identification of the singularity of parallel manipulators has been presented. The main advantages of such a method lie in two aspects. First, it is not necessary to assign properties to attributes for the knowledge case collection; the result generated by the ID3 algorithm gives a very clear hierarchy. Second, this method can be adopted to analyze mechanism designs where large amounts of knowledge and experience are scattered.
References
1. K. R. Hao, Mech. Mach. Theory, 33, 1063 (1998).
2. D. Wilking and T. Rofer, Realtime, 8th International Workshop in RoboCup, Lecture Notes in Artificial Intelligence, Springer (2004).
3. X. Shao, G. Z. Zhang, P. Li and Y. Chen, J. of Material Processing Technology, 117, 66 (2001).
4. D. McSherry, Knowledge Based Systems, 12, 269 (1999).
5. K. R. Hao, These de doctorat, ENPC, France (1995).
6. J. M. Herve, European Patent EP0494565A1, July 15 (1992).
7. G. Liu, Y. Lou and Z. Li, IEEE Transactions on Robotics and Automation, 19, 579 (2003).
8. W. M. Li, F. Gao and J. J. Zhang, Mech. Mach. Theory, 40, 467 (2005).
9. R. Clavel, United States Patent 4976582, Dec. 11 (1990).
COMBINING ADABOOST WITH A HILL-CLIMBING EVOLUTIONARY FEATURE SEARCH FOR EFFICIENT TRAINING OF PERFORMANT VISUAL OBJECT DETECTORS
Y. ABRAMSON
Transportation Research Institute, Technion - Israel Institute of Technology, Haifa 32000, Israel
E-mail: [email protected]
F. MOUTARDE, B. STANCIULESCU AND B. STEUX
Ecole des Mines de Paris, Robotics Laboratory, F-75272 Paris Cedex 06, France
E-mail: {Fabien.Moutarde, Bruno.Steux, Bogdan.Stanciulescu}@ensmp.fr
This paper presents an efficient method for the automatic training of performant visual object detectors, and its successful application to the training of a back-view car detector. Our method for training detectors is adaBoost applied to a very general family of visual features (called "control-point" features), with a specific feature-selection weak-learner: evo-HC, a hybrid of hill-climbing and evolutionary search. Very good results are obtained for the car-detection application: a 95% positive car detection rate with less than one false positive per image frame, computed on an independent validation video. It is also shown that our original hybrid evo-HC weak-learner makes it possible to obtain detection performances that are unreachable in reasonable training time with a crude random search. Finally, our method seems to be potentially efficient for training detectors of very different kinds of objects, as it was previously shown to provide state-of-the-art performance for pedestrian-detection tasks.
1. Introduction
The seminal work of Viola and Jones [1][2] introduced a new and powerful framework for the training of visual object detectors. It is based on the adaBoost algorithm, where each "weak classifier" assembled in the final strong classifier uses a single and simple image feature. In most works inspired by [1], these features are localized filters similar to Haar wavelet basis functions. At each adaBoost step, one of them is selected whose weighted error on the
training set is as low as possible. Following this work, several authors tried different approaches for further improvement of the algorithm, either for speeding up the training and/or improving the performance of the final detector. Two main directions have been explored: extending or completely changing the set of features, and/or trying to replace the exhaustive search for feature selection by a more efficient process. For instance, McCane and Novins [3] proposed an alternative non-exhaustive search based on a simple "local search" heuristic, and found that they could obtain nearly as good a classifier as in [2], but with a much faster training. Bartlett et al. [4] also used a custom heuristic based on an initial random selection of a small subset (5%) of all possible features, from which the best one is further refined by some kind of local search (over a set of features obtained from the best initial one by applying various shifting, scaling and reflecting operations). Treptow and Zell [5] proposed to extend the feature family proposed by Viola and Jones to a more generalized set of similar features, and to use a specific evolutionary search as the "weak learner". They found that adaBoost training with their evolutionary search over their larger feature set produced better detectors than exhaustive search applied to the initial limited feature set. Simultaneously, we proposed and tested in [6] and [7] a radically different set of image features: the so-called "control-points" features (see 2.1), and designed a custom "evolutionary" weak-learner for selecting, at each boosting step, an individual feature from the huge control-points feature space. In this paper, we detail our specific weak-learner heuristic, show that it makes it possible to reach detection performances unattainable by a crude random search, and present the very good results of our method on a different application: car detection in images from an on-vehicle camera (instead of face or pedestrian detection).
features
The very general family of features we use are called "control points" features. This family was first designed and proposed by us in [6], where we already described it. Therefore we hereafter only briefly recall what these features are. Each feature is defined by two sets of "control points", {p\.. .p,} and {rix. ..rij}, where i,j < K (we use K = 6 as in [6]), all placed either within the W x H detection window, or on a half-resolution or quarter
739
Figure 1. Three examples of control point features: the left one is "full resolution" with 2 "positive" points and 3 "negative" points located in the 36 x 36 pixels detection window; the middle one is " half-resolution" with 3 positive and 2 negative points testing the 18 X 18 pixels down-sampled detection window; the last one on the right is "quarterresolution" with 5 positive and 2 negative points chosen in the 9 x 9 pixels of the twice down-sampled detection window. The upper row shows the control points by themselves, and the lower row illustrates the application of the same 3 features to a given sub-window extracted from one movie image.
resolution version of the same image. The feature examines the pixel values in {p_1, ..., p_i} and {n_1, ..., n_j} in the relevant image (full, half or quarter resolution), and answers "yes" if and only if for every control point p ∈ {p_1, ..., p_i} and every control point n ∈ {n_1, ..., n_j}, val(p) - val(n) > θ is true, where θ is some feature-dependent minimal margin. Note that this last condition is a generalization of the original control-points features defined in [6] (where θ = 0). For the experiments presented here, the detection window size W × H is 36 × 36. We refer the reader to [6] for more details, justification and advantages of this family of features.
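The decision rule above can be transcribed directly as a short sketch; the image representation (one 2-D pixel array per resolution) is an assumption, not the authors' data structure.

```python
def control_point_feature(images, positives, negatives, theta=0, resolution="full"):
    """Evaluate a control-point feature: answer True iff every positive
    control point is brighter than every negative one by more than theta.
    `images` maps "full"/"half"/"quarter" to 2-D pixel arrays of the
    (possibly down-sampled) detection window; points are (x, y) tuples."""
    img = images[resolution]
    return all(img[py][px] - img[ny][nx] > theta
               for (px, py) in positives
               for (nx, ny) in negatives)
```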
Datasets
We used data recorded by ourselves with an on-vehicle 320x240 high-end CCD color webcam (LogiTech QuickCam Pro 4000). The training and validation sets were built partly manually (essentially for the positive examples), and partly using the "active learning" approach, with the semiautomated selective sampling implemented by the "SEVILLE" software (see [7] for more details). For creating the positive examples, the subimage was carefully positioned and sized in order to have the back of the car centered, with margins of about 15% around it, as can be checked on positive examples shown on top of figure 2.
740
Figure 2. Some examples from our training set: various positive car examples on the upper row, and several negative examples on the lower row (note that we define non-car as non-"correctly centered rear of car"). Image examples are all square but of various sizes; they are all warped before training to the same 36 x 36 detection window, and its 2 sub-sampled versions.
Our final total dataset includes 3291 labeled square images of various sizes, among which 1224 positive examples. This dataset was split in two: 2/3 of randomly chosen examples constituting the 2204 examples (including 791 positive) of our training set, the remaining 1/3 just being used as a validation set to monitor for overtraining. For final testing, in order to ensure an unbiased measure of accuracy, we used an independant recording together with a manually built "ground truth" information. The latter specifies, for each successive image in the movie, the exact position and size of actual "rear of car" positive detections that a perfect detector should output. 2.3. AdaBoost
2.3. AdaBoost training
For detector training, we use the adaBoost algorithm [8], as in [2] and most subsequent papers. AdaBoost requires a "weak learner", i.e. an algorithm which will select and provide, for each adaBoost step, a "good" feature (i.e. one with a "low-enough" weighted error measured on the training set). The weak learner used by Viola and Jones is just an exhaustive search of all ≈ 180,000 possible features in their set of features. But in our work, exhaustive search is definitely not possible, because we use the "control points" features described in 2.1, and the total number of possible different features in this family is absolutely huge (there are more than 10^35 of them for our 36 x 36 detection window size). One of the goals of the present work is precisely to present in detail our weak-learner heuristic specifically designed for efficient exploration of this huge space.
2.4. Evo-HC hybrid search weak-learner
The general scheme of our custom-designed hybrid search is the following:
• we start with a first generation of 70 random simple features (i.e. with only 2 positive and 2 negative points);
• at each generation: (1) we select the 30 best features of the previous generation (those with the lowest weighted error on the training set); (2) we apply to those 30 best features a hill-climbing consisting in applying to each of them a maximum of 5 successive specifically-designed "mutations", each mutation being cancelled if it did not improve the feature; (3) we complete the population with 40 new simple random features;
• we stop the algorithm when there has been no improvement during 40 consecutive generations, and choose the best feature of the last generation as the "good" selected feature for this adaBoost round.
We call this custom exploration search evo-HC, where "evo" stands for evolutionary, and "HC" for Hill-Climbing. Indeed, our search algorithm is largely driven by the "random mutation hill-climbing" step (2), where the best features of the previous generation are further refined by several successive only-improving "mutations". At the same time it is somewhat hybridized with an evolutionary search, since for each new generation the 40 worst features (among 70) are discarded and replaced by new random simple features. It should be noted however that only selection (and no crossover) occurs during our evolution. But, as found some time ago by Mitchell et al. in [9], Random Mutation Hill-Climbing (RMHC) by itself outperforms Genetic Algorithms in some difficult optimization problems. The "mutation" operator is specifically designed for the control-points features. It is a random choice between one of the 6 following alterations:
• randomly move one of the existing control-points within a maximal radius of 5 pixels around its initial position;
• add a new random (positive or negative) control-point;
• remove one of the existing control-points;
• change the working channel (red, blue or green) of the feature;
• change the feature "margin" value θ (see section 2.1) by ±2;
• change the working resolution (full, half or quarter) of the feature.
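The evo-HC loop just described can be summarized in code. The sketch below is a schematic re-implementation under stated assumptions: random_simple_feature, mutate and weighted_error are assumed callables (respectively producing a 2+2-point feature, applying one of the six alterations above, and evaluating a feature against the current boosting weights); they are not the authors' code.

```python
# Schematic evo-HC weak learner following the population sizes given in the text
# (70 features, 30 survivors, 5 mutations, stop after 40 generations without gain).
def evo_hc_weak_learner(random_simple_feature, mutate, weighted_error,
                        pop_size=70, keep_best=30, n_mutations=5, patience=40):
    population = [random_simple_feature() for _ in range(pop_size)]
    best, best_err = None, float("inf")
    stall = 0
    while stall < patience:
        # (1) keep the features with the lowest weighted error
        population.sort(key=weighted_error)
        survivors = population[:keep_best]
        # (2) random-mutation hill-climbing on the survivors
        for idx, feat in enumerate(survivors):
            err = weighted_error(feat)
            for _ in range(n_mutations):
                candidate = mutate(feat)
                cand_err = weighted_error(candidate)
                if cand_err < err:          # keep only improving mutations
                    feat, err = candidate, cand_err
            survivors[idx] = feat
        # (3) refill the population with new random simple features
        population = survivors + [random_simple_feature()
                                  for _ in range(pop_size - keep_best)]
        gen_best = min(population, key=weighted_error)
        gen_err = weighted_error(gen_best)
        if gen_err < best_err:
            best, best_err, stall = gen_best, gen_err, 0
        else:
            stall += 1
    return best
```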
Figure 3. Monitoring of adaBoost training using our evo-HC search: on the left, a typical training curve (error on the training and validation sets versus boosting step, i.e. number of features assembled); on the right, evolution of the ROC curve (positive detection rate versus false detection rate) during training, with curves shown for 20-, 50-, 200- and 500-feature detectors, generally moving left and up along boosting iterations.
3. Results
As is customary for the assessment of detectors (cf. for instance [2]), most of our evaluations and comparisons are based on the "Receiver Operating Characteristic" (ROC) curves of the detectors. These ROC curves are graphical plots of sensitivity vs. specificity (here, positive detection rate vs. false detection rate) as the discrimination threshold is varied (see e.g. [10]).
3.1. Detection performance with our evo-HC weak-learner
Our training method works very well for the car detection problem, as illustrated by the very good ROC curve obtained for the 500-features detector (see upper curve on the right of figure 3). The positive detection rate for this classifier is ≈ 95% for a false detection rate of 1:40,000. The latter rate approximately corresponds to one false detection per image frame in the camera stream, as in our setup 33,227 sub-windows of various locations and sizes are tested in each image. The typical computation time for adaBoost training on our 2204 car/non-car examples set using our evo-HC search weak-learner is ≈ 28 minutes (on a Pentium IV 3 GHz) for each 100 boosting cycles (and thus for each 100 features added). The total training time for a 500-features detector is therefore ≈ 2.5 hours.
3.2. Comparing evo-HC to "random search" weak-learners
In order to assess the exploration power of our evo-HC search weak-learner, we conducted a systematic comparison with an alternative trivial weak-learner: a simple random search. The latter is extremely simple: for each adaBoost step, we simply try successively maxTrials randomly generated features. Note that, contrary to what we do in our evo-HC, the feature randomization procedure naturally covers all possible features.
Figure 4. Comparison of the detection performance attainable with our evo-HC search to that of detectors obtained using random search: left, the ROC curves of the latter are always below, when comparing detectors with the same or a smaller number of features; right, slow improvement of random-search detectors with increasing maxTrials, reaching a plateau below the performance obtained with our evo-HC search.
The only parameter of the random search weak-learner is maxTrials, the number of randomly generated features tried at each adaBoost step. It is clear from the ROC curves of figure 4 that the detection performance of the best detector using our evo-HC search weak-learner is definitely higher than that of any of the detectors we obtained using the random search weak-learner. This is true not only for the most complex detector (i.e. with 500 features), but also for any given maximal number of features allowed for the detector. Also, even though some detectors obtained with random search with a very high value of maxTrials reach "lower but acceptable" detection performance, it should be noted that the corresponding training time was already much higher (nearly 10 times) than the training time with our evo-HC weak-learner.
4. Conclusions and discussion
In this paper, we have shown that adaBoost training with control-points features and a specially-designed weak-learner (evo-HC, a hybrid of Hill-Climbing and evolutionary search) can produce a very good "car detector" reaching a 95% positive detection rate for a 1:40,000 false detection rate (i.e. less than 1 false alarm per video frame). Moreover, we have conducted a series of tests to compare these good results to what can be obtained using an alternative, very simple "random search" weak-learner. It was shown that it is apparently not possible (even when increasing the number of tested features to values implying unreasonable training times) to get as good a final classifier as the one that was obtained using our evo-HC search weak-learner. It should also be noted that the training time of a 500-features detector on our ≈ 2000-image training set is only ≈ 2.5 hours on a typical desktop.
A deeper study and comparison of our weak-learner with other sophisticated search algorithms should now be conducted. Nevertheless, the evidence presented here already indicates that our evo-HC search is indeed a quite efficient weak-learner, with good exploratory power, allowing efficient selection of several hundreds of discriminative features among the ≈ 10^35 possible "control-points" features. Finally, as we had previously successfully applied our method to training an acceptable face detector [6] and a state-of-the-art pedestrian detector [7], it seems that our adaBoost training with the evo-HC weak-learner exploring the "control-points" feature space can be an efficient and general method for training visual detectors of any kind of objects.
References
1. P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR'01), Kauai, Hawaii, USA (2001).
2. P. Viola and M. Jones, Robust real-time object detection. International Journal of Computer Vision 57(2) (2004).
3. B. McCane and K. Novins, On training cascade face detectors. In Image and Vision Computing, pages 239-244, New Zealand (2003).
4. M.S. Bartlett, G. Littlewort, I. Fasel and J.R. Movellan, Real time face detection and facial expression recognition: Development and application to human-computer interaction. In CVPR Workshop on Computer Vision and Pattern Recognition for Human-Computer Interaction, Vancouver (2003).
5. A. Treptow and A. Zell, Combining adaBoost learning and evolutionary search to select features for real-time object detection. In Proc. of IEEE Congress on Evolutionary Computation (CEC 2004), Portland, Oregon, vol. 2, pp. 2107-2113, IEEE Press (2004).
6. Y. Abramson, B. Steux and H. Ghorayeb, YEF real-time object detection. In Proc. of Intl. Workshop on Automatic Learning and Real-Time (ALART'05), Siegen, Germany (2005).
7. Y. Abramson and Y. Freund, SEmi-automatic VIsuaL LEarning (SEVILLE): a tutorial on active learning for visual object recognition. In Proc. of IEEE Intl. Conf. on Computer Vision and Pattern Recognition (CVPR'05), San Diego (2005).
8. Y. Freund and R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting. In Computational Learning Theory: EuroCOLT'95, pages 23-37, Springer-Verlag (1995).
9. M. Mitchell, J.H. Holland and S. Forrest, When will a genetic algorithm outperform hill-climbing? In Advances in Neural Information Processing Systems, vol. 6, pages 51-58, Morgan Kaufmann Publishers Inc. (1994).
10. J.A. Hanley, Receiver operating characteristic (ROC) methodology: The state of the art. Critical Reviews in Diagnostic Imaging, 29:307-335 (1989).
INTELLIGENT SYSTEM SUPPORTING NON-DESTRUCTIVE EVALUATION OF SCC USING EDDY CURRENT TEST
SHIGERU KANEMOTO
The University of Aizu, Tsuruga, Ikki-machi, Aizuwakamatsu City, 965-8580, Japan
WEIYING CHENG, ICHIRO KOMURA, MITSUHARU SHIWA
NDE Center, Japan Power Engineering and Inspection Corporation, 14-1, Benten-cho, Tsurumi-ku, Yokohama City, 230-0044, Japan
SHIGEAKI TSUNOYAMA
The University of Aizu, Tsuruga, Ikki-machi, Aizuwakamatsu City, 965-8580, Japan
Two premier issues in eddy current testing (ECT), namely the automatic identification of cracks from their noisy backgrounds and the sizing of cracks with complex morphologies, are dealt with in this paper. A signal processing method based on statistical pattern recognition is established to realize automatic crack identification, and a new method for estimating crack depth is proposed by utilizing a new concept, the depth sizing index, which is analytically constructed from raw measurement signals based on system optimization theory.
1. Introduction
Nondestructive inspection is one of the main measures to ensure the safe and reliable operation of nuclear power plants. Among the nondestructive evaluation methods, eddy current testing (ECT) is widely used for the inspection of surface-breaking cracking [1-3]. Along with the enhancement of ECT sensors and the development of numerical analysis methods, ECT has achieved significant success in the detection and sizing of clearly separated, crack-like artificial defects, such as EDM (Electro Discharge Machining) notches and fatigue cracks [4,5]. A variety of stress corrosion cracks (SCC), which are induced by the combined influence of a tensile stress, a corrosive environment and a material property, have been found at critical reactor components in recent years. Because of the morphological complexity of SCCs, such as a narrow width, bridges between crack surfaces or complex branch shapes, the detection and sizing of SCCs in their early stages remain difficult. New detection and sizing methodologies, such as the development of new sensors and the establishment of new
signal processing and crack sizing methods, are needed in order to realize automatic, accurate detection and sizing. In this study, two premier issues in eddy current testing, namely the automatic identification of cracks from their noisy backgrounds and the sizing of cracks with complex morphologies, are dealt with. A statistical pattern classification technique is applied for the identification of cracks. A new concept of a depth sizing index, which is sensitive only to crack depth and insensitive to other parameters, is constructed based on system optimization theory. These two algorithms help human experts to identify and size cracks from ECT signals.
2. Automated Crack Detection
Figure 1 shows ECT signals which measure a 10 mm long, 2 mm deep and 0.3 mm wide EDM notch fabricated near the welding line of a SUS304 pipe. The pancake-type ECT probe with 400 kHz frequency is scanned on the surface of the test piece with 0.5 mm pitch in the X-Y direction. Then, the real and imaginary components of the measured signals are displayed as so-called C-scan images. Also, in Fig. 1, Lissajous plots are shown for the x-direction (horizontal) and y-direction (vertical) scan data which are indicated by the crosshair cursor of the C-scan figures. Due to the existence of the welding line, it is hard to distinguish the crack parts from the background signals. Human experts usually examine the Lissajous plot by manipulating the ECT probe and distinguish the crack signals by referring to the ECT signal of a calibration notch with known location and size.
Figure 1. Typical ECT signal plots (C-scan (left) and Lissajous (right) figures)
In order to automate this procedure, we introduce the following new multi-dimensional signals, which are constructed by embedding the signals at neighboring sampling points:
z(i,j) = [z_re(i+n, j+m), z_im(i+n, j+m)]  (n = -N, ..., +N; m = -M, ..., +M)  (1)
where (i,j) indicates a target position and n and m are neighboring positions along the x- and y-directions. This embedding procedure corresponds to a human
expert's probe manipulation behavior, and is also reminiscent of Takens' embedding theorem in time-series analysis. After embedding the ECT signals in the multi-dimensional state space, we can apply various kinds of classification algorithms. Here, we utilize Fisher's discriminant theory. In this theory, the defect and noise classes are defined respectively by
1) class of defect signals: ω_S : {z(i,j) | (i,j) ∈ defect region},
2) class of noise signals: ω_N : {z(i,j) | (i,j) outside the defect region},
where the subscripts S and N stand for defect signal and noise respectively. The optimized linear discriminant function g to distinguish these two classes is
g(z(i,j)) = w^T z(i,j) + W_0,  (2)
where w is a vector representing the axis of the projection from the multi-dimensional space onto a one-dimensional space, and W_0 is an optional scalar value which fixes the origin of the discriminant function along the projection axis. According to Fisher's theory, w can be calculated by
w = (Σ_S + Σ_N)^(-1) (m_S - m_N),  (3)
where m_S and m_N are the means of the 2 x (2N+1) x (2M+1)-dimensional vectors of the two classes, and Σ_S and Σ_N are the covariance matrices of the two classes respectively. The constant W_0 is optional but necessary for determining the origin of the discriminant function. If W_0 is calculated by
W_0 = -w^T (σ_N m_S + σ_S m_N) / (σ_S + σ_N),  (4)
then the two classes can be classified by the sign of the discriminant function g, that is,
g(z(i,j)) > 0 : z(i,j) ∈ ω_S,
g(z(i,j)) < 0 : z(i,j) ∈ ω_N,  (5)
where σ_S^2 and σ_N^2 are the variances of the two classes, which can be calculated by σ_S^2 = w^T Σ_S w and
σ_N^2 = w^T Σ_N w.
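A compact sketch of this discriminant computation is given below, purely as an illustration; the array shapes, function names and the expression used for W_0 follow the description above, and none of it is the authors' implementation.

```python
import numpy as np

def fisher_discriminant(defect_vectors, noise_vectors):
    """Compute w and W_0 for the two-class linear discriminant described above.

    Each argument is an array of shape (n_samples, n_features), where a feature
    vector is the embedded ECT signal z(i, j) of Eq. (1).
    """
    m_s, m_n = defect_vectors.mean(axis=0), noise_vectors.mean(axis=0)
    cov_s = np.cov(defect_vectors, rowvar=False)
    cov_n = np.cov(noise_vectors, rowvar=False)
    w = np.linalg.solve(cov_s + cov_n, m_s - m_n)                     # Eq. (3)
    sigma_s = np.sqrt(w @ cov_s @ w)                                  # sigma_S
    sigma_n = np.sqrt(w @ cov_n @ w)                                  # sigma_N
    w0 = -w @ (sigma_n * m_s + sigma_s * m_n) / (sigma_s + sigma_n)   # Eq. (4)
    return w, w0

def discriminant(z, w, w0):
    """g(z) > 0 assigns z to the defect class, g(z) < 0 to the noise class."""
    return z @ w + w0
```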
The value of the discriminant function g can be presented as a C-scan image. Then, the acceptance or rejection of a defect identification can be decided automatically from the value of g. The ECT signals shown in Fig. 1 are utilized and the supervised defect is identified according to Eqs. (1)-(4). The supervised defect is defined in a rectangular region near the crosshair cursor point in Fig. 1. Signals inside and outside of this region are assigned to the defect class and the noise class respectively. A 22-dimensional vector space (N=0, M=5) is constructed. Then, the C-scan image of the discriminant function g is calculated and shown in Fig. 2(a). Only the positive values, that is, g > 0, are plotted in Fig. 2(b). Although there is very strong welding noise, as indicated in Fig. 1, the notch is clearly extracted and indicated in Fig. 2(b). False alarm probability and miss alarm probability are measures of the defect identification ability of the method. Figure 3(a) shows the probability of identification obtained by utilizing the above 22-dimensional vector. The horizontal axis labels the discriminant function value g(z). The identification probability of a defect is denoted by a circle (o) and that of noise by a cross (x). The classification boundary is indicated by the vertical line. Normalized histograms are constructed based on the mean and variance of the distribution of the two classes. The solid and broken lines represent the normal distributions of the two classes respectively. The results using a 2-dimensional vector space (N=0, M=0) are shown in Fig. 3(b). Also, the summary of the false and miss alarm probabilities calculated from the normal distributions is presented in Table 1. This comparison clearly demonstrates that the defect identification capability is enhanced by using the multi-dimensional state-space parameters.
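The false and miss alarm probabilities of Table 1 can be derived from the fitted normal distributions of the two classes; one possible computation is sketched below (thresholding at g = 0, as in Eq. (5)), purely as an illustration.

```python
from statistics import NormalDist

def alarm_probabilities(g_defect, g_noise, threshold=0.0):
    """Estimate false/miss alarm probabilities from normal fits of g-values."""
    defect = NormalDist.from_samples(g_defect)   # discriminant values, defect class
    noise = NormalDist.from_samples(g_noise)     # discriminant values, noise class
    false_alarm = 1.0 - noise.cdf(threshold)     # noise classified as defect
    miss_alarm = defect.cdf(threshold)           # defect classified as noise
    return false_alarm, miss_alarm
```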
Figure 2. C-scan images of (a) discriminant function values and (b) the extracted flaw signal parts indicated by g(z) > 0.
Figure 3. Comparison of discrimination performance in different state spaces: (a) discrimination results in the 22-dim. space (N=0, M=5); (b) discrimination results in the 2-dim. space (N=0, M=0).
Table 1. Summary of false and miss alarm probabilities.
                           22-dim. vector   2-dim. vector
False alarm probability    3.09E-04         0.0139
Miss alarm probability     3.09E-04         0.0807
3. Depth Sizing Index
The depth sizing of SCC is one of the major concerns of ECT. However, the complexity of SCC, especially partial contacts between SCC crack surfaces, yields a bypass of the eddy current and a bias in the depth estimation. For this reason, a variety of methods have been proposed [6,7]. One of the approaches assumes that the conductivity in the SCC region is constant, so that the ECT inversion is reduced to computing the constant crack conductivity and the crack's geometrical profile [8,9]. Correspondingly, a modified two-layer SCC model has also been proposed, in which the SCC region is divided into two layers with different constant conductivities in the depth direction [10]. Nevertheless, both the crack conductivity and the crack geometrical profile have to be taken into consideration in these two approaches. However, from the point of view of mechanical strength assessment, there is not much interest in the crack conductivity distribution; the crack profile, especially the crack depth, is the main concern. If we can find a characteristic parameter that is sensitive to the crack geometrical profile but not to the conductivity distribution, we can assess the depth of a crack without knowing its conductivity. In accordance with this idea, a depth sizing index, which has the lowest sensitivity to crack conductivity and the highest sensitivity to crack depth, is designed based on system optimization theory, and the depth of SCC is estimated according to this depth sizing index. First, we define the depth sizing index S by a linear combination of the ECT signals, S = c^T z. Here, c is an unknown parameter vector which is optimized later, and z = [z_fk(x_j)]^T denotes the measured signals as a function of frequency f_k and probe position x_j. The measured ECT signal z depends on the crack depth d and the crack conductivity σ. Hence, if we can find a parameter c which satisfies the conditions
∂S(c, z(d,σ))/∂d = max,  ∂S(c, z(d,σ))/∂σ = min,  (6)
then the depth sizing index S depends on just the crack depth. This optimum coefficient vector c_opt is concretely derived from
c_opt = arg max { w_1 · max(∂S(c, z(d,σ))/∂d) − w_2 · max(∂S(c, z(d,σ))/∂σ) },  (7)
where w_1 and w_2 are weighting parameters. In order to solve Eq. (7), we approximate z as a non-linear function of the depth d and the conductivity σ as follows:
z_i = χ_1·d + χ_2·σ + ...  (8)
In our depth sizing index search, the ECT signals of a 22 mm long crack with conductivities of 0, 0.5, 1, 1.5, and 2 percent of σ_0, and depth values of 1 to 7 mm, are calculated in advance by FEM analysis. Only the peak of the signal, that is, the signal sampled at the crack length center, is utilized. Figure 4 shows the normalized amplitude and phase of the peaks of all the cases. Subscripts L and H indicate the low frequency (50 kHz) and high frequency (400 kHz) test cases in this study, respectively.
Figure 4. Amplitude and Phase of the high and low frequency signals with respect to different cases.
As an example of a conventional heuristic depth sizing index, the difference between the amplitudes of the 400 kHz and 50 kHz signals, that is, S = z_H^A − z_L^A (where the superscript A denotes the amplitude component), is examined first. Figure 5(a) shows that the dependency of this S on conductivity is quite small when the crack depth ranges from 1 to 3 mm, whilst with the increase of crack depth, ∂S/∂σ increases. In order to estimate the crack depth, we have to prepare a so-called master curve which defines the relation between S and the depth estimation value d_pred. Here, we utilize the following second-order polynomial curve fitted from the FEM analysis data of 1-7 mm deep, 0% conductivity cracks:
d_pred = χ_1·S^2 + χ_2·S + χ_3.  (9)
The depth estimation results using this master curve are shown in Fig. 6(a) using the 1-7 mm depth and 0-2% conductivity crack data. Here, 2% white noise is added to the original FEM analysis data. The numbers on the horizontal and vertical axes denote the true crack depth and the estimated depth, respectively. The estimated depth of a 1 mm deep crack ranges from 1 to 1.3 mm; however, the estimated depth of a 7 mm deep crack varies from 2.5 to 8.0 mm. Obviously it is impossible to estimate the crack depth using this heuristic depth sizing index for cracks having different conductivities. Then, optimization is carried out using Eqs. (7) and (8). Here both the amplitude and the phase of the high and low frequency signals are utilized, that is,
z = {z_H^A, z_L^A, z_H^P, z_L^P},  (10)
where the superscript P denotes the phase component.
The coefficient vector c and the index S obtained from the optimization are as follows:
S = 0.68·z_H^A − 0.69·z_L^A − 0.12·z_H^P + 0.24·z_L^P.  (11)
Figure 5(b) shows the dependence of the optimized depth sizing index on d and σ. Here, we can see that ∂S/∂σ is optimized to its minimum. The depth estimated by Eqs. (11) and (9) is shown in Fig. 6(b). Here, the same data as in Fig. 6(a) are used in the depth estimation. The circles indicate the estimation using the 0-conductivity ECT data, and the crosses indicate the estimation using the other, non-zero conductivity ECT data. The average depth and the standard deviation of the estimates are listed in Table 2. The estimation error is less than 1 mm and is quite acceptable. Experimental verification of this algorithm will be shown in our forthcoming paper [11].
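As an illustration of how such a master curve can be built and applied, the sketch below fits the second-order polynomial of Eq. (9) to calibration pairs (S, d) and then predicts depths for new index values; the sample numbers are invented for the example and are not the paper's data.

```python
import numpy as np

def fit_master_curve(index_values, true_depths):
    """Fit d_pred = chi1*S^2 + chi2*S + chi3 (Eq. (9)) to calibration data."""
    return np.polyfit(index_values, true_depths, deg=2)

def predict_depth(coefficients, index_values):
    """Evaluate the master curve for new depth sizing index values."""
    return np.polyval(coefficients, index_values)

# Hypothetical calibration data: index S from 0%-conductivity FEM cases of 1-7 mm depth.
S_calibration = np.array([0.10, 0.22, 0.35, 0.47, 0.58, 0.68, 0.77])
d_calibration = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
coeffs = fit_master_curve(S_calibration, d_calibration)
print(predict_depth(coeffs, np.array([0.30, 0.60])))
```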
Figure 5. Contours of the fitting plane, fitted using the two depth sizing indices: (a) S = z_H^A − z_L^A; (b) S = 0.68·z_H^A − 0.69·z_L^A − 0.12·z_H^P + 0.24·z_L^P.
Figure 6. Estimation of depth using second-order polynomial fitting of the two depth sizing indices: (a) S = z_H^A − z_L^A; (b) S = 0.68·z_H^A − 0.69·z_L^A − 0.12·z_H^P + 0.24·z_L^P.
Table 2. Comparison of the estimated and true depth.
true (mm)      1     2     3     4     5     6     7
average (mm)   1.3   1.5   3     4.5   5.4   5.9   6.2
stdev (mm)     0.01  0.04  0.16  0.22  0.13  0.14  0.34
4. Summary
Two kinds of intelligent signal processing algorithms for the automatic detection and depth sizing of SCC are presented in this paper. The preliminary results of the present methods suggest a large potential to improve present inspection procedures, and also to clarify the criteria of expert judgment for standardizing non-destructive inspection procedures. Of course, since a lot of tacit knowledge remains in expert judgment, we should pursue it in the future. Also, the present methods should be verified with actual SCC data. Since the present paper just shows the possibility of the most basic signal processing methodologies, a number of other state-of-the-art algorithms, especially non-linear signal processing and classification algorithms, should be studied in the future to establish more intelligent and high-performance NDT support systems.
References
1. A. Dogandzic, N. Eua-anant and B. Zhang, Review of Progress in Quantitative Nondestructive Evaluation, Vol.24, 704 (2005).
2. N. Zavaljevski, S. Bakhtiari, A. Miron, D.S. Kupperman, Review of Progress in Quantitative Nondestructive Evaluation, Vol.24, 728 (2005).
3. B. Shin, P. Ramuhalli, L. Udpa and S. Udpa, Review of Progress in Quantitative Nondestructive Evaluation, Vol.23, 597 (2004).
4. T. Takagi, M. Uesaka and K. Miya, Electromagnetic Nondestructive Evaluation, IOS Press, 9 (1997).
5. W. Cheng and K. Miya, International Journal of Applied Electromagnetics and Mechanics, 495 (2001/2002).
6. J.A. Bieber, C.C. Tai and J.C. Moulder, Review of Quantitative Nondestructive Evaluation, 17 (1998).
7. N. Yusa, W. Cheng, T. Uchimoto and K. Miya, NDT & E International, Vol.35(1), 9 (2002).
8. M. Kurokawa, T. Kamimura and S. Fukui, Proceedings of the 13th International Conference on NDE in Nuclear and Pressure Vessel Industries, Kyoto (1995).
9. Z. Chen, K. Miya and M. Kurokawa, Electromagnetic Nondestructive Evaluation (III), Studies in Applied Electromagnetics and Mechanics, vol.15, 233 (1999).
10. Z. Chen, K. Aoto and K. Miya, IEEE Mag, Vol.36(4), 1018 (2000).
11. W. Cheng, S. Kanemoto, I. Komura and M. Shiwa, to be published in NDT & E International (2006).
THE CONTINUOUS-SENTENTIAL KSSL RECOGNITION AND REPRESENTATION SYSTEM USING DATA GLOVE AND MOTION TRACKING BASED ON THE POST WEARABLE PC*
JUNG-HYUN KIM
School of Information and Communication Engineering, Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, KyungKi-do, 440-746, Korea
KWANG-SEOK HONG
Sungkyunkwan University, 300 Chunchun-dong, Jangan-gu, Suwon, KyungKi-do, 440-746, Korea
Sign language is a method of communication that uses facial expressions and gestures. However, natural learning and interpretation of sign language are very difficult, and it takes a hearing person a long time to represent and translate it fluently. Consequently, we design and implement a real-time sentential Korean Standard Sign Language (hereinafter, KSSL) recognition system using fuzzy logic and wireless haptic devices based on the post wearable PC. Experimental results show an average recognition rate of 93.9% for dynamic and continuous KSSL.
1. Introduction
In recent years, there have been many innovations and evolutions in the areas of human-computer interaction and sign language recognition as a part of natural language understanding. However, traditional studies on sign language recognition cannot secure the mobility of a sign language recognition system needed for an efficient dialog between a deaf person and a hearing person, because they aim at general-purpose control of optional hand signals and sign language recognition based on desktop computers and wired communication technology. Furthermore, correct measurement of the KSSL gesture data is impossible and
This work is supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment)-(IITA-2005- CI090-0501-0019)
complex computation algorithms are needed, because traditional sign language recognition systems use an image capture or video processing system for the acquisition of sign language. Consequently, in this paper, we design and implement a KSSL recognition system using fuzzy logic and wireless haptic devices based on the post wearable PC (WPS: Wearable Personal Station) for clear communication between the hearing-impaired and hearing people.
2. Regulation of the KSSL Gestures and the KSSL input module
Since the KSSL is very complicated and consists of a considerable number of gestures, motions and so on, it is impossible to recognize all dialog components that are represented by the hearing-impaired. Therefore, we regard this paper as a fundamental study toward perfect dialog and communication between the hearing-impaired and hearing people, and selected 25 basic KSSL gestures connected with a travel information scenario according to the "Korean Standard Sign Language Tutor" (hereinafter "KSSLT") [1]. The 23 hand gestures necessary for travel information KSSL are classified by hand shape, pitch and roll degree. That is, we constructed 44 sentence recognition models according to the associability and presentation of hand gestures and basic KSSL gestures. For the KSSL input module, we adopted a Bluetooth module for the wireless sensor network, 5DT's wireless data (sensor) gloves and the Fastrak®, which are popular input devices in the haptic application field. The wireless data gloves are basic gesture recognition equipment that can acquire and capture various haptic information (e.g. the bending degree and direction of the hand or fingers) using fiber-optic flex sensors. The Fastrak® is an electromagnetic motion tracking system, a 3D digitizer and a quad receiver motion tracker, and it provides dynamic, real-time measurements of six degrees of freedom [2], [3]. The architecture and composition of the KSSL input module are shown in Figure 1.
Figure 1. The architecture of the sign language input module (5DT data glove system, Fastrak® motion tracking system, HMD (Head Mounted Display), Oracle RDBMS).
3. The Training and Recognition Models using RDBMS
The process of implementing the training and recognition models is to analyze the input data from the data glove system and to segment the data into a valid gesture record set and a status transition record set. Therefore, the RDBMS is used to classify the sign language gesture data (hand gestures and motions) input from the sign language gesture input module into valid and invalid gesture record sets (that is, significant and status transition record sets) and to efficiently analyze the valid record set. The rule used to segment the valid gesture record set and the invalid record set (changing gesture set) is shown in Figure 2 and given below.
• If the difference between the preceding average (of the preceding 3 and 1 values) and the current row value is over 5, the current value is regarded as a transition sign language gesture record.
• If the value from either the 5DT data glove system or the Fastrak® exceeds the threshold of 5, the current data value is also regarded as a changing sign language gesture record.
Figure 2. The rule of segmentation and record set
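A minimal sketch of the segmentation rule above is given below; the record layout (one numeric value per sensor sample), the window size and the helper names are assumptions made for illustration, not the authors' implementation.

```python
def is_transition_record(history, current, threshold=5):
    """Flag a record as a status-transition (invalid) record.

    'history' holds the preceding samples of one sensor channel; a record is a
    transition record when the current value deviates from the average of the
    preceding samples by more than the threshold.
    """
    if not history:
        return False
    preceding_average = sum(history) / len(history)
    return abs(current - preceding_average) > threshold

def segment_records(samples, threshold=5, window=3):
    """Split a stream of sensor values into valid and transition record sets."""
    valid, transition = [], []
    for i, value in enumerate(samples):
        history = samples[max(0, i - window):i]
        if is_transition_record(history, value, threshold):
            transition.append(value)
        else:
            valid.append(value)
    return valid, transition

# Hypothetical glove channel: a steady posture, a fast change, then a new posture.
print(segment_records([50, 51, 50, 52, 70, 88, 90, 89, 91]))
```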
4. Fuzzy Max-Min Composition for Sign Language Recognition
For the design of a fuzzy membership function, many types of curves can be used, but triangular or trapezoidal membership functions are the most common because they are easier to represent in embedded controllers [4]:
μ_A(x) = 0 for x ≤ a; (x − a)/(b − a) for a ≤ x ≤ b; 1 for b ≤ x ≤ c; (d − x)/(d − c) for c ≤ x ≤ d; 0 for x ≥ d.  (1)
Figure 3. Trapezoidal fuzzy number set A = (a, b, c, d).
So, we applied trapezoidal membership functions for the representation of fuzzy number sets; this shape originates from the fact that there are several points whose membership degree is maximum. To define and describe trapezoidal membership functions, we define a trapezoidal fuzzy number set A as A = (a, b, c, d), and the membership function of this fuzzy number set is interpreted as in Equation (1) and Figure 3. Examples of the proposed fuzzy membership functions are shown in Figure 4.
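For illustration, the trapezoidal membership function of Eq. (1) can be written as a small function; the parameter values in the usage line are invented and only stand in for an actual KSSL membership function.

```python
def trapezoid_membership(x, a, b, c, d):
    """Membership degree of x in the trapezoidal fuzzy number set A = (a, b, c, d)."""
    if x <= a or x >= d:
        return 0.0
    if a < x < b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0
    return (d - x) / (d - c)   # c < x < d

# Hypothetical parameters for a "pitch is roughly level" membership function.
print([round(trapezoid_membership(v, -20, -5, 5, 20), 2) for v in (-25, -10, 0, 10, 25)])
```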
Figure 4. The fuzzy membership functions for KSSL recognition.
Also, we utilized the fuzzy max-min composition to extend the crisp relation concept to fuzzy propositions and to reason approximate conclusions by the composition arithmetic of fuzzy relations. Two fuzzy relations R and S are defined on the sets A, B and C (we prescribed the accuracy of the hand gestures, the basic KSSL gestures and the target KSSL recognition models as the sets of events that occur in KSSL recognition with the sets A, B and C). The composition S ∘ R of the two relations R and S is expressed by the relation from A to C, and this composition is defined in Equation (2) [4], [5]. For (x, y) ∈ A×B and (y, z) ∈ B×C,
μ_S∘R(x, z) = Max_y { Min( μ_R(x, y), μ_S(y, z) ) }.  (2)
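A direct way to compute Eq. (2) for relations stored as matrices is sketched below; the two small matrices are made-up examples, not the paper's data.

```python
import numpy as np

def max_min_composition(R, S):
    """Fuzzy max-min composition of relations R (A x B) and S (B x C), Eq. (2)."""
    n_a, n_b = R.shape
    n_b2, n_c = S.shape
    assert n_b == n_b2, "inner dimensions must match"
    T = np.zeros((n_a, n_c))
    for i in range(n_a):
        for k in range(n_c):
            T[i, k] = np.max(np.minimum(R[i, :], S[:, k]))
    return T

# Hypothetical membership matrices: hand-gesture accuracy vs. basic KSSL gestures (R)
# and basic KSSL gestures vs. sentence recognition models (S).
R = np.array([[0.8, 0.3], [0.4, 0.9]])
S = np.array([[0.7, 0.2, 0.5], [0.1, 0.6, 0.9]])
print(max_min_composition(R, S))
```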
5. Experiments and Results
The proposed fuzzy sign language gesture recognition system's file size is 251 Kbytes and it can process 200 samples per second on the WPS. The overall process of the embedded KSSL recognition system using fuzzy logic consists of three major steps. In the first step, while the user inputs the prescribed KSSL into the WPS using the data gloves and the motion tracker based on the Bluetooth module, the KSSL input module captures the user's KSSL. In the second step, the KSSL recognition system transforms the characteristics of the data into parameters for the fuzzy recognition module. In the last step, it calculates and produces a fuzzy value for the user's dynamic KSSL through a fuzzy reasoning and composition process. The recognition model built with the RDBMS is used as the independent variable of the fuzzy reasoning, and we give a weight to each parameter. The produced fuzzy value is used as a judgment of the user's action, and the KSSL recognition system decides the user's dynamic KSSL according to the degree of the fuzzy value. The flow chart of the embedded KSSL recognition system using fuzzy logic is shown in Figure 5.
Figure 5. The flow chart of the fuzzy sign language recognition system.
"I go to airport": 95.6%
Figure 6. The average recognition rate of the KSSL recognition system over the 44 KSSL recognition models (e.g. "I go to airport": 95.6%).
Each of the 15 test subjects repeated every KSSL sentence and word 20 times. As the experimental results show, the KSSL recognition system achieves a 93.9% average recognition rate over the 44 sentence recognition models, as illustrated in Figure 6. The root causes of the recognition errors are various: imprecision of the prescribed actions, the user's inexperience with the actions, and changes in the experimental environment such as the physical transformation of the fiber-optic flex sensors of the 5DT data glove system.
6. Conclusions
The post-wearable PC is a subset of ubiquitous computing, that is, the embedding of computers in the world around us. In this paper, we implemented a hand gesture recognition system that analyzes the user's intention more efficiently and more accurately, and that can recognize and represent 44 significant, dynamic and continuous KSSL expressions of users with flexibility in real time, using the fuzzy max-min recognition algorithm and the RDBMS module on the WPS. Also, because the sentential sign language recognition system has a very superior ability of communication and representation, we can expect it to help hearing people understand the life and culture of deaf people (and aphasiacs) and to connect them with modern society through the sign language recognition system developed in this study.
Acknowledgement
This research was supported by the MIC (Ministry of Information and Communication), Korea, under the ITRC (Information Technology Research Center) support program supervised by the IITA (Institute of Information Technology Assessment) (IITA-2005-CI090-0501-0019).
References
1. Seung-Guk Kim, Korean Standard Sign Language Dictionary, Osung publishing company, 107-112 (1995).
2. J.-H. Kim, et al., Hand Gesture Recognition System using Fuzzy Algorithm and RDBMS for Post PC. FSKD 2005, Lecture Notes in Artificial Intelligence, Vol. 3614, Springer-Verlag, Berlin Heidelberg New York, 170-175 (2005).
3. 5DT Data Glove 5 Manual and FASTRAK® Data Sheet, http://www.5dt.com
4. W. B. Vasantha Kandasamy, Smarandache Fuzzy Algebra. American Research Press, Seattle (2003).
5. C. H. Chen, Fuzzy Logic and Neural Network Handbook. 1st edn. McGraw-Hill, New York (1992).
ON THE INTUITIONISTIC DEFUZZIFICATION OF DIGITAL IMAGES FOR CONTRAST ENHANCEMENT
I. K. VLACHOS AND G. D. SERGIADIS
Aristotle
University of Thessaloniki, Faculty of Technology, Department of Electrical & Computer Engineering, Telecommunications Division, University Campus, GR-54124, Thessaloniki, GREECE. E-mail: [email protected], [email protected]
This paper addresses the issue of the intuitionistic defuzzification of digital images. Based on a recently introduced framework for intuitionistic fuzzy image processing, the validity conditions and properties of different intuitionistic defuzzification schemes are studied under the scope of performing contrast enhancement.
1. Introduction Since Zadeh introduced the concept of fuzzy sets (FSs) 1 , various notions of higher-order FSs have been proposed based on different intuitive points of departure. Among them, Atanassov's intuitionistic fuzzy sets (A-IFSs) 2 provide a flexible framework to handle the inherent ambiguity present in digital images, springing out of imperfect or/and imprecise information. In this paper we extend the task of de-constructing the A-IFS representing an image in the intuitionistic fuzzy domain, recently introduced in 3 , in order to obtain the image in the gray-level domain. The study involves a thorough investigation of different intuitionistic defuzzification schemes and their properties, as well as their validity conditions. Finally, application of the proposed intuitionistic defuzzification schemes to low-contrasted digital images is carried out, in order to perform contrast enhancement.
2. Elements of Intuitionistic Fuzzy Sets Theory We briefly describe the basic notions of A-IFSs that are going to be used throughout the paper.
Definition 2.1. An FS A defined on a universe X may be given as1
A = {(x, μ_A(x)) | x ∈ X},  (1)
where μ_A : X → [0,1] is the membership function of A. The membership value μ_A(x) describes the degree of belongingness of x ∈ X in A. Definition 2.2. An A-IFS A defined on a universe X is given by2 A = {(x, μ_A(x), ν_A(x)) | x ∈ X},
(2)
where »A:X-*
[0,1]
and
vA : X -> [0,1],
(3)
with the condition 0 ^ fiA{x) + vA(x) ^ 1,
(4)
for all x € X. The numbers /x>i(:r) and vA{x) denote the degree of membership and the degree of non-membership of x to ^4 respectively. For an A-IFS A in X we call the intuitionistic index of an element x E X in A the following expression 7>Vi(z) = l-nA(x)
-fA{x).
(5)
We can consider irA(x) as the hesitancy degree of x to A 2 . From (5) it is evident that 0 ^ irA(x) ^ 1, for all x € X. 3. Intuitionistic Fuzzy Image Processing Framework Intuitionistic fuzzy image processing involves in general a sequence of operations carried out using the concepts of A-IFSs theory, for performing image processing tasks. Fig. 1 illustrates an overview of the aforementioned framework. In the first stage the image is transferred into the fuzzy domain and then into the intuitionistic fuzzy domain by a suitable selection of membership and non-membership functions. After the modification of the intuitionistic fuzzy components according to the desired image operation, the inverse procedure is carried out to obtain the image in the gray-level domain. In this work we are focusing on the stage of decomposing the A-IFS corresponding to the image to the appropriate FS; that is the intuitionistic defuzzification.
761
Input image
Figure 1.
Overview of the intuitionistic fuzzy image processing framework.
4. Intuitionistic Fuzzification In 3 , a method for constructing the A-IFS corresponding to an image has been proposed, based on the optimization of its intuitionistic fuzzy entropy. We briefly describe the aforementioned approach in this section. Let us consider an image A of size M x N pixels having L gray levels g ranging between 0 and L — 1. The image can be considered as an array of fuzzy singletons 4 ' 5 ' 6 . Each element of the array denotes the membership value of the gray level g^ corresponding to the (i, j)-th pixel, with respect to a pre-defined image property. For the task of contrast enhancement we consider the property "brightness" of gray levels. Therefore, the image can be represented using the following FS A = {{9ij,VA{9ij))\9ij G { 0 , . . . , L - 1}} .
(6)
In order to transfer the image from the fuzzy to the intuitionistic fuzzy domain, we apply the method of maximum intuitionistic fuzzy entropy principle^, which involves the selection of an appropriate combination of membership and non-membership functions to describe the image in terms of elements of A-IFSs theory. The optimal image representation in the intuitionistic fuzzy domain is derived by varying a free parameter A that controls the shape of the membership and non-membership functions and considering the maximization of the intuitionistic fuzzy entropy as defined in7. This optimization criterion can be formulated as \opt = arg max {E(A; A)} ,
(7)
where E(A; A) is the intuitionistic fuzzy entropy of the image given by F M [
. , '
= ;
_ L v \
l-max{l-(l-MA(g))\(l-^(g))MA+D}
MJV^/^l-min{l-(l-^(5))\(l-^(g))MA+i)}' (8)
762 where A ^ 0 and h^ is the histogram of the fuzzified image. The membership function of gray levels, denoted as HA{g) is calculated according to / \
9 ~ 9min
PAti) = ~
—
9max
/n\
.
(9)
9min
where gmin and flWx are the minimum and maximum gray levels of the image respectively. It should be mentioned that alternative methods for generating the FS corresponding to image A can be applied. After obtaining the optimal parameter Xopt, the image is represented as the following A-IFS Aopt = {(9,»Aopt(9),VAopM)\9 € {0,... ,L - 1}} , (10) where ^ opt () = l - ( l - M < ? ) ) A ° p t
(11)
">W,(s) = (1 - ^ ( 3 ) ) A o p t ( A o p t + 1 ) -
(12)
and
In this paper we will demonstrate that the construction of the A-IFS corresponding to an image using the aforementioned method, results in enhancing the overall contrast of the image under processing. 5. Intuitionistic Defuzzification: Embedding Hesitancy In order for the image to be transferred back to the gray-level domain, the process of intuitionistic defuzzification should be first carried out. This task involves the de-construction of the A-IFS representing the image into its corresponding FS and is performed using the following operator. Definition 5.1. If A € J&y{X), where Da(A) = {{xi,nA(xi)
then Da : J?&y{X)
+ aTrA{xi),uA(xi)
+ (1 - a)irA(xi))\xi
-»
&y(X),
e X} , (13)
with a e [0,1]. It should be mentioned that the family of all FSs associated with A by the operator Da, denoted by {Da(A)}ae<0<1,, constitutes a totally ordered family of FSs. We will call Da the Atanassov's operator. Different values of a generate different FSs and therefore different representations of the processed image in the fuzzy domain are possible. Thus, a criterion must be employed in order to select the optimal parameter aopt.
763 5.1. Maximum
Index of Fuzziness
Defuzzification
3
In , the maximization of the index of fuzziness criterion was employed. In image processing it is sometimes useful to increase the grayness ambiguity, since images with high fuzziness are expected to be more suitable in terms of human brightness perception8. For an FS A defined on a universe X, the index of fuzziness, using the product operator to implement the intersection, is given by 1 |X| -K^) = 7 M 7 i £ M * 0 ( l - ^ f o ) ) .
(W)
where \X\ = Cardinal(X). Applying Atanassov's operator to the A-IFS Aopt describing the image in the intuitionistic fuzzy domain, we obtain the representation of the image in the fuzzy domain as the FS Da(Aopt), with a corresponding index of fuzziness given by 7 (Da{Aopt))
1 L~X = — - Y, hA(9)Vna(Aopt){g)
(l - HDa(Aopt){g)) •
(15)
In order to find the optimal parameter aopt that maximizes the index of fuzziness, we set d-y (Da(Aopt)) /da = 0, which yields that ,
_ E g Jo hA{g)nAoi>t(g) (l 2
Eg=o
2fiAovt(g))
hA{g)7TAopt(g)
Moreover, since d2j{Da(Aopt))
1
^
2
(17)
g=0
it is evident that the extremum of (15) is a global maximum. However, (16) does not guarantee that the optimal parameter a'opt of Atanassov's operator will be in the [0,1] interval. Therefore, the optimal parameter aopt is obtained as 0, aopt
= <
ifa'opt<0,
a'opt,
if0
1,
ifa'opt>l-
(18)
Finally, the image in the gray-level domain is obtained as g' = {L-\)^D
(Aopt){g),
(19)
764 where tJ-Daopt(Aopt)(g) = aopt + (1 - <xopt)HAopM ~ aoPtVABpt{g)
(20)
and g', g are the new and initial intensity levels respectively. 5.2. Validity
Condition
A critical constraint in most cases of contrast enhancement is the monotonicity of the gray-level transformation function employed, which is required to be increasing, in order to preserve the ordering of the intensity levels. In the case of (20), generally we obtain that dfJ.pa(A)(g) _ ,, dg ~{1
^dVAJg) dg
„dvA{g) dg '
a)
(21)
Since a G [0,1] and /XA() and VA{9) are increasing and decreasing functions of g respectively, it follows immediately that ^Da(A){g) is also an increasing function of the gray levels, for any a G [0,1]. The aforementioned analysis, constraints our selection of appropriate ways to embed hesitancy and is the driving force when considering a to vary with g. 5.3. Generalized
Intuitionistic
Defuzzification
9
Burillo and Bustince proposed a generalized form of Atanassov's operator, so that for each i j ^ I w e take a value of the parameter a according to that point. By extension we will denote this operator by Dax and we will call it Atanassov's point operator. Definition 5.2. If A € J&S?{X), where D
aXi{A)
then Dax. : J&Sf{X)
= {(xi>^A(xi) + aXiTTA(xi),i/A(xi)
->
&Sf{X),
+ {1 - aXi)irA{xi))\xi
G X} , (22)
witha X i e [0,1]. Consequently, (21) can be generalized as dnDag(A){g)
dg
dy.A{g) = (1
-
a(fl))
, sdvA{g)
-ir~
a{g)
, /n +{l
..
,
^g~ ^
A{9)
^da{g)
~
l/A{9))
^g~-
(23) Obviously, if a = const for all g € { 0 , . . . , L - 1}, then (23) reduces to (21). From (23) it is evident that if a is an increasing function of the gray levels
765
(a)
(b)
(c)
(d)
Figure 2. (a) Gray-scale test image and (b) contrast-enhanced image obtained using histogram equalization. Images processed using the proposed intuitionistic fuzzy image processing framework using (c) maximization of index of fuzziness criterion and (d) cumulative histogram approach.
(a)
(b)
(c)
(d)
Figure 3. (a) Gray-scale test image and (b) contrast-enhanced image obtained using histogram equalization. Images processed using the proposed intuitionistic fuzzy image processing framework using (c) maximization of index of fuzziness criterion and (d) cumulative histogram approach.
and additionally a(g) € [0,1] for all g € { 0 , . . . , L - 1}, then fJ,Da(A)(g) is also an increasing function of g, since it holds that da(g)/dg > 0. Using increasing functions for the parameter a of Atanassov's point operator in the intuitionistic defuzzification stage, has the effect of keeping darker levels dark, while at the same time brightening higher intensity levels. 6. Experimental Results For experimental evaluation, we considered gray scale images of size 256 x 256 pixels with 8 bits-per-pixel gray-tone resolution that have undergone extreme contrast degradation. Based on the analysis of Sec. 5, we propose a method for calculating a using the cumulative histogram of the image, given by athA (fl) = jm £?=o M*)> f o r all # e { 0 , . . . , L - 1}. The cumulative nature of the function, ensures that a will be an increasing function of the
766 gray levels. Fig. 2(a) depicts an image with low contrast, while images of Figs. 2(c) and 2(d) illustrate the results of the proposed intuitionistic fuzzy image processing framework for contrast enhancement. In Fig. 2(b) the histogram equalization technique is demonstrated, while in Figs. 2(c) and 2(d) the results of the methods based on the maximization of index of fuzziness and the cumulative histogram are shown respectively. Another example is illustrated in Fig. 3. From the images of Figs. 2(d) and 3(d), one can observe t h a t the proposed approach using the generalized intuitionistic defuzzification scheme delivers better results, since it also considers image statistics during hesitancy embedding. 7. D i s c u s s i o n a n d C o n c l u s i o n s In this paper we studied the intuitionistic defuzzification of digital images in the framework of A-IFSs theory. Different intuitionistic defuzzification approaches were studied and their properties were investigated. Finally, application of the proposed approach to contrast enhancement demonstrated the potential of the described intuitionistic fuzzy image processing framework and the various intuitionistic defuzzification schemes. References 1. L. A. Zadeh, Fuzzy sets, Inf. Control 8, 338-353 (1965). 2. K. T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20, 87-96 (1986). 3. I. K. Vlachos and G. D. Sergiadis, Intuitionistic fuzzy image processing, in: M. Nachtegael, D. V. der Weken, E. E. Kerre, and W. Philips (Eds.), Soft Computing in Image Processing: Recent Advances, Studies in Fuzziness and Soft Computing, Springer-Verlag (2006) (to appear). 4. S. K. Pal and R. A. King, Image enhancement using fuzzy set, Electron. Lett. 16, 376-378 (1980). 5. S. K. Pal and R. A. King, Image enhancement using smoothing with fuzzy sets, IEEE Trans. Syst. Man Cybern. 11, 495-501 (1981). 6. S. K. Pal and R. A. King, A note on the quantitative measure of image enhancement through fuzziness, IEEE Trans. Pattern Anal. Mach. Intell. 4, 204-208 (1982). 7. E. Szmidt and J. Kacprzyk, Entropy for intuitionistic fuzzy sets, Fuzzy Sets Syst. 118, 467-477 (2001). 8. H. R. Tizhoosh, Fuzzy image enhancement: An overview, in: E. E. Kerre and M. Nachtegael (Eds.), Fuzzy Techniques in Image Processing, Studies in Fuzziness and Soft Computing, vol. 52, 137-171, Springer-Verlag (2000). 9. P. Burillo and H. Bustince, Construction theorems for intuitionistic fuzzy sets, Fuzzy Sets Syst. 84, 271-281 (1996).
A HEURISTIC APPROACH TO INTUITIONISTIC FUZZIFICATION OF COLOR IMAGES
I. K. VLACHOS AND G. D. SERGIADIS
Aristotle
University of Thessaloniki Faculty of Technology Department of Electrical & Computer Engineering Telecommunications Division University Campus, GR-54124, Thessaloniki, GREECE E-mail: [email protected], [email protected]
In this work we present a heuristic approach to the intuitionistic fuzzification of color images; that is constructing the intuitionistic fuzzy set corresponding to a color digital image. Exploiting the physical properties and drawbacks of the imaging and acquisition chain, we model the uncertainties present in digital color images, using the concept of fuzzy histogram using fuzzy numbers.
1. Introduction Atanassov's intuitionistic fuzzy sets (A-IFSs) 1,2 ' 3 ' 4 constitute a generalization of the concept of fuzzy sets (FSs) proposed by Zadeh5, by considering the imprecise or/and imperfect nature of information. A-IFSs are described using two characteristic functions, namely the membership and nonmembership, denoting the degree of belongingness and non-belongingness of an element of the universe to the A-IFS respectively. In this paper we present a novel approach for constructing the A-IFS corresponding to a color image, based on an extension of the method for gray-scale images recently presented in 6,7 . The task of intuitionistic fuzzification is the first and, presumably, the most important in the intuitionistic fuzzy image processing framework, since it involves the definition of suitable membership and non-membership functions to describe the image in the intuitionistic fuzzy domain. To model the hesitancy present in digital images we consider all those sources of indeterminacy characterizing every physical system, and especially digital image acquisition systems, such as the quantization noise and the dynamic range suppression. Finally, application of the introduced scheme to real-world color images, demonstrates 767
768 its efficiency in modelling the hesitancy associated with image pixels. 2. Intuitionistic Fuzzy Sets Theory Definition 2.1. An FS A defined on a universe X may be given as 5 A = {{x,lxA{x))\xeX},
(1)
where n^ : X —> [0,1] is the membership function of A. The membership value fiA(x) describes the degree of belongingness of x £ X in A. Definition 2.2. An A-IFS A defined on a universe X is given by 1,2,3,4 A = {(x,iiA(x),i/A(x))\x€X},
(2)
where fiA:X^[0,l]
and
vA: X-*
[0,1],
(3)
with the condition 0 ^ HA(x) + vA(x) ^ 1,
(4)
for all x £ X. The numbers \iA {x) and vA (x) denote the degree of membership and the degree of non-membership of a; to A respectively. For an A-IFS A in X we call the intuitionistic index of an element x € X in A the following expression 7rA(x) = 1 - nA(x) - vA{x).
(5) 123
We can consider nA(x) as the hesitancy degree of a; to A ' ' '*. is evident that 0 < TTA(X) ^ 1, for all x £ X.
From (5) it
3. On the Intuitionistic Fuzzification of Images 3.1. Sources of Hesitancy
in Real-World
Images
Hesitancy in real-world images springs out of various sources, which in their majority are due to the inherent weaknesses of the imaging mechanisms. Limitations introduced by the acquisition chain affect our certainty regarding the actual "grayness" or "edginess" of a specific pixel. As sources of hesitancy we can identify, as also illustrated in Fig. 1, the imaging sensor that introduces a suppression of the dynamic range and also the A/D converter, which is mainly responsible for the quantization noise present in any digital image processing system. In the following section a model for estimating the hesitancy associated with pixels of gray-scale images is described6'7.
769
Dynamic range suppression
§§f Quantization noise
Figure 1. Image acquisition model. 3.2. A Fuzzy Histogram
Approach
to Hesitancy
Modelling
A fuzzy number g : R —• [0,1] is an FS of the real line that is normal and convex. Symmetrical fuzzy numbers of the form Hg(x) = max I 0,1 —
(6) P where p is a positive real parameter, are conceptually suitable to represent the notion of gray level "approximately g". Using the concept of fuzzy numbers, to represent the gray levels, the notion of histogram can be extended into a fuzzy setting 8,9 . The fuzzy histogram of an image A of size M x N pixels having L gray levels g ranging between 0 and L — 1 is defined as
hfA(g) ^WWJh^igWe
{o,...,M
-l},
j e {o,...,N
-i}}\\,
(7)
where || • || stands for the cardinality of an FS. However, fuzzy histogram itself fails to be a probability density function. Therefore, a normalized fuzzy histogram is defined as hfA(9)
hfA(9)
EfJo 1 ^)'
(8)
A limitation of "hard" first-order statistics when applied to real-world images, is that there exists a number of gray levels with zero or almost zero frequency of occurrence, while gray levels in their vicinity possess high frequencies8'9. Therefore, it seems proper that the hesitancy originating out of quantization noise to be proportional to the normalized absolute difference between the crisp and fuzzy normalized histograms; i.e. nA{g)
a
m9)-hA{g)\ maxg {\hcA(g) - hfA(g)\j
(9)
770 where hcA is the normalized crisp histogram of the image. From (5) it follows immediately that for any gray level g, the maximum hesitancy value n™ax(g) that can be assigned is *TX(9)
=
(10)
1-HA(9),
which describes the fact that the maximum hesitancy of a gray level decreases as the corresponding membership value increases. Considering that TrA(g) oc (1 - nA{g)),
(11)
ensures that the constraint imposed by (10) is satisfied. The aforementioned mathematical constraint turns out to describe a very interesting property of physical systems. Due to the intrinsic noise in the acquisition and imaging chain, lower gray levels are more affected by noise than higher ones. Therefore, hesitancy associated with lower gray levels should by definition be larger than the one corresponding to higher levels. Last, but not least, the hesitancy associated with a gray level should also decrease as the dynamic range of the image decreases, since the more suppressed the dynamic range of the image, the less certain we are regarding the actual grayness of the intensity levels. The normalized dynamic range is given by
$$\hat{\Delta}r = \frac{g_{\max} - g_{\min}}{L - 1}, \qquad (12)$$
where g_min and g_max are the minimum and maximum gray levels of the image, respectively. Consequently, by considering the aforementioned sources of indeterminacy and their appropriate modelling, the hesitancy margin associated with the gray level g of a digital image is approximated by
$$\pi_A(g) = \left(1 - \mu_A(g)\right)\,\frac{\left|h^c_A(g) - h^f_A(g)\right|}{\max_g\left\{\left|h^c_A(g) - h^f_A(g)\right|\right\}}\,\left(1 - k\hat{\Delta}r\right), \qquad (13)$$
with k ∈ (0,1). Parameter k controls the overall influence of the dynamic range on the hesitancy of the gray level g. One can easily verify that the hesitancy degree computed by (13) satisfies the constraint 0 ≤ π_A(g) ≤ 1. Finally, the membership value of a gray level g, denoting the degree of brightness, is considered to be its normalized intensity level; that is
$$\mu_A(g) = \frac{g}{L - 1}, \qquad (14)$$
where g ∈ {0, ..., L − 1}. It should be mentioned that any other method for calculating μ_A can also be used.
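The per-gray-level hesitancy of Eqs. (9)-(14) can be computed directly from the two histograms. The sketch below is a minimal illustration, reusing the histogram helpers above; the function name and the default value of k are assumptions, not part of the original paper.

```python
import numpy as np

def gray_level_hesitancy(img, levels=256, rho=5.0, k=0.9):
    """Membership mu_A(g) per Eq. (14) and hesitancy pi_A(g) per Eq. (13)."""
    hc = crisp_histogram(img, levels)            # normalized crisp histogram
    hf = fuzzy_histogram(img, levels, rho)       # normalized fuzzy histogram
    diff = np.abs(hc - hf)
    diff = diff / diff.max()                     # Eq. (9): normalized difference
    g = np.arange(levels, dtype=np.float64)
    mu = g / (levels - 1)                        # Eq. (14): brightness membership
    dr = (int(img.max()) - int(img.min())) / (levels - 1)  # Eq. (12)
    pi = (1.0 - mu) * diff * (1.0 - k * dr)      # Eq. (13)
    return mu, pi
```

Since π_A(g) never exceeds 1 − μ_A(g), the pair (μ_A(g), π_A(g)) respects the A-IFS constraint of (4).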
The A-IFS with membership function μ_A and hesitancy index π_A turns out to be a proper A-IFS; i.e. it satisfies the fundamental constraint of (4).

4. Intuitionistic Fuzzification of Color Images

Color images can be represented in various ways using different color spaces. A color space is the means for representing the colors and the relationships among them. There exists a number of different color spaces, each one using a different model depending on the imaging system used and the current application. One of the most commonly used color spaces is the RGB model, which is widely applied in digital image processing applications. The RGB color space utilizes an additive model in which the primary colors red, green, and blue are combined in different amounts to produce different colors. Therefore, in a color image the color of a pixel x is defined as the triplet x = (x_R, x_G, x_B), where each element x_R, x_G, and x_B corresponds to the intensity of the R, G, and B channel respectively. Let us now consider a color image A in the RGB color space. A straightforward extension of the aforementioned scheme to color images involves its application to each color component of the image and then the combination, in an intuitive way, of the intermediate results, in order to obtain the overall hesitancy associated with a specific pixel of the image. Decomposing the color image into its color components allows us to apply the intuitionistic fuzzification framework for gray-scale images in a straightforward manner. Consequently, by using (13) in each of the R, G, and B channels we obtain the corresponding hesitancy functions π^R_A, π^G_A, and π^B_A. Each one of the hesitancy functions describes the hesitancy margin associated with a specific pixel of the image in a specific channel of the RGB color space. However, it is useful to assign a single hesitancy value to an image pixel, in order to have an overall estimation of its corresponding hesitancy margin. Even though we have considered that each channel of the RGB color space is independent from the other two, the total hesitancy of a specific pixel should also carry the information of the relative brightness of a pixel for each of the channels with respect to the overall color of that pixel. Therefore, we define for each pixel of the image the following weight associated with each component of the RGB color space. In the case of the R channel we have
$$w_R(x) = \frac{x_R}{x_R + x_G + x_B}. \qquad (15)$$
In a similar manner we compute the weights of the pixels corresponding to the G and B channels. Moreover, for color images obtained under poor illumination conditions, the color information is lost, either for the whole image or for parts of it, and the resultant image can be considered as a gray-tone one. In order to take this situation into account, we also consider the hesitancy corresponding to the gray-level version of the color image, whose gray levels x̄ are computed as
$$\bar{x} = \frac{1}{3}\left(x_R + x_G + x_B\right). \qquad (16)$$
Finally, the total hesitancy associated with a pixel of a color image is obtained as
$$\pi^{total}_A(x) = \lambda\left(w_R(x)\pi^R_A(x) + w_G(x)\pi^G_A(x) + w_B(x)\pi^B_A(x)\right) + (1-\lambda)\,\pi_{\bar A}(\bar x), \qquad (17)$$
where λ ∈ [0,1) and Ā denotes the gray-level version of image A obtained using (16). Parameter λ controls the influence of the two hesitancy components, corresponding to the color and the gray-scale images, on the overall hesitancy of the specific pixel. The value of λ depends on the illumination of the image under processing. For low-illuminated images smaller values of λ result in a more accurate estimation of pixel hesitancy. Finally, the total membership function μ^{total}_A is constructed in a similar manner. The analysis of Sect. 3 ensures that the set A^{total} will be a proper A-IFS.
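A possible implementation of the channel weights (15), the gray-level version (16) and the total hesitancy (17) is sketched below. It reuses the gray_level_hesitancy helper sketched earlier; the default λ and the per-pixel mapping of the per-level hesitancies are assumptions made for illustration.

```python
import numpy as np

def total_hesitancy(rgb, lam=0.5, levels=256, rho=5.0, k=0.9):
    """Combine per-channel hesitancies into pi_total per Eqs. (15)-(17)."""
    r, g, b = (rgb[..., c].astype(np.float64) for c in range(3))
    s = r + g + b + 1e-12                        # avoid division by zero on black pixels
    w_r, w_g, w_b = r / s, g / s, b / s          # Eq. (15) and its G, B analogues
    gray = ((r + g + b) / 3.0).astype(np.uint8)  # Eq. (16)

    def channel_pi(channel):
        ch = channel.astype(np.uint8)
        _, pi = gray_level_hesitancy(ch, levels, rho, k)
        return pi[ch]                            # map per-level hesitancy onto pixels

    pi_r, pi_g, pi_b = channel_pi(r), channel_pi(g), channel_pi(b)
    _, pi_gray_levels = gray_level_hesitancy(gray, levels, rho, k)
    pi_gray = pi_gray_levels[gray]
    return lam * (w_r * pi_r + w_g * pi_g + w_b * pi_b) + (1.0 - lam) * pi_gray  # Eq. (17)
```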
4.1. Hesitancy Image Processing (HIP)

Intuitionistic fuzzification of color images allows for the handling of the image source defects and also for the compensation of errors according to the output devices. Fig. 2 illustrates sources of hesitancy associated with an image, but this time originating due to the physical properties and drawbacks of the output mechanisms. Having obtained, as described in Sect. 4, the hesitancy associated with the different channels of the image, as well as the total hesitancy corresponding to the image pixels, allows us to treat the hesitancy component in such a way as to overcome errors propagating through the input and output chain.

5. Experimental Results

In order to demonstrate the applicability of the proposed intuitionistic fuzzification scheme for color images, we tested the presented approach with various synthetic and real-world images. We considered color images, using
Figure 2. Hesitancy image processing (HIP) framework (output-side sources of hesitancy: dynamic range suppression, quantization noise).
the RGB color space representation, of size 256 × 256 pixels with 8-bits-per-pixel color resolution for each channel. Figs. 3(a) and 3(c) depict two of the color images considered, while Figs. 3(b) and 3(d) show the derived normalized total hesitancy maps, with dark pixels corresponding to low hesitancy values and white to high ones. Finally, Fig. 4 illustrates the hesitancy functions for each of the R, G, and B channels of the image of Fig. 3(c). It should be mentioned that in both examples we have considered the same value of λ, while the parameters for calculating the fuzzy histograms were set to ρ = 5 and k = 0.9.
6. Discussion and Conclusions

In this paper a method for the intuitionistic fuzzification of color images was presented, which is the first stage of the intuitionistic fuzzy image processing framework. A fuzzy histogram-based approach was used in order to model the sources of indeterminacy present in digital images. Finally, the notion of the hesitancy image processing (HIP) framework was briefly presented, which exploits the potential of the proposed method for the selective processing of color images.

References
1. K. T. Atanassov, Intuitionistic fuzzy sets, Fuzzy Sets Syst. 20, 87-96 (1986).
Figure 3. (a) Color test image with its corresponding (b) normalized total hesitancy map. (c) Another color image and (d) normalized total hesitancy map.
Figure 4. Hesitancy functions for each of the (a) R, (b) G, and (c) B channels for the color image of Fig. 3(c).

2. K. T. Atanassov, More on intuitionistic fuzzy sets, Fuzzy Sets Syst. 33, 37-45 (1989).
3. K. T. Atanassov and G. Gargov, Interval valued intuitionistic fuzzy sets, Fuzzy Sets Syst. 31, 343-349 (1989).
4. K. T. Atanassov, Intuitionistic Fuzzy Sets: Theory and Applications, Studies in Fuzziness and Soft Computing, Physica-Verlag, Heidelberg (1999).
5. L. A. Zadeh, Fuzzy sets, Inf. Control 8, 338-353 (1965).
6. I. K. Vlachos and G. D. Sergiadis, Intuitionistic fuzzy image processing, in: M. Nachtegael, D. V. der Weken, E. E. Kerre, and W. Philips (Eds.), Soft Computing in Image Processing: Recent Advances, Studies in Fuzziness and Soft Computing, Springer-Verlag (2006).
7. I. K. Vlachos and G. D. Sergiadis, Towards intuitionistic fuzzy image processing, in: Proc. International Conference on Computational Intelligence for Modelling, Control and Automation, Vienna, Austria (2005).
8. C. V. Jawahar and A. K. Ray, Fuzzy statistics of digital images, IEEE Signal Process. Lett. 3, 225-227 (1996).
9. C. V. Jawahar and A. K. Ray, Incorporation of gray-level imprecision in representation and processing of digital images, Pattern Recognit. Lett. 17, 541-546 (1996).
INTUITIONISTIC FUZZY FEATURE EXTRACTION FOR QUERY IMAGE RETRIEVAL FROM COLOUR IMAGES

K. SURESH BABU
Department of Electronics and Communications Engineering, Birla Institute of Technology, Ranchi, Jharkhand 835215, India

R. SUKESH KUMAR
Department of Electronics and Communications Engineering, Birla Institute of Technology, Ranchi, Jharkhand 835215, India

In this paper we propose a query image retrieval technique using fuzzified features extracted from co-occurrence matrices. Among the fourteen features proposed by Haralick, three features, Angular Second Moment (ASM), Entropy and Contrast, are computed from crisp and fuzzy co-occurrence matrices. These features are also computed by the proposed new formulae based on fuzzy and intuitionistic fuzzy concepts. The fuzzy features are compared with the crisp features by sensitivity analysis and the superiority of the former over the latter is established. The above features are used for query image retrieval on monochrome as well as color images. It is established that in a noisy environment, the proposed fuzzy features successfully retrieve the query, where the crisp features either fail to retrieve the query or result in multiple faulty retrievals.
1. Introduction:

Various query image retrieval techniques have been proposed earlier, where textural features are used for comparing the query image with the original image. However, when the original image is corrupted with noise, these techniques fail to successfully retrieve the query in most cases. Since fuzzy techniques inherently consider the imprecision present in the pixel intensity values, retrieval using fuzzy features with a similarity measure is expected to yield a better success rate. In this paper, new formulae are proposed for textural features such as Angular Second Moment (ASM), Entropy and Contrast based on fuzzy and intuitionistic fuzzy concepts, and the fuzzy features thus extracted are successfully employed to retrieve a query image from a noisy original image. Gray level co-occurrence matrices have been widely used for texture analysis, since the features [1] derived from such matrices have been observed to possess good discrimination capability among textural images. These features also find application in image retrieval algorithms. The pixels always
carry a reasonable amount of inherent fuzziness due to the imprecision of pixel intensities, and hence fuzzy statistics have been observed to behave better in representing the spatial gray distribution of a digital image [2]-[4]. Also, reasoning based on fuzzy logic is found to be closer to the human thinking mechanism and hence is more appropriate for enhancing machine vision capabilities. Because of the capability of fuzzy logic to represent the imprecision inherent in pixels, it is better suited to image processing, especially in a noisy environment. Given a query, its features are compared with the features of different regions of a given image, to find whether there is a match or not. This allows the query to be retrieved from the image. Retrieval using a single feature may result in multiple retrievals. By combining several features together, with proper thresholds for the different features, an exact retrieval can be achieved.

1.1. Prior work:

The basic technique of image retrieval is to compare some prominent features of the query with the same features of the image segments using some kind of similarity measure. Many researchers have proposed different techniques for query image retrieval. A 3-D object-based image retrieval system, where human image-understanding cues are used to represent the object structure and its detection, is investigated in [5]. Six Haralick features computed from co-occurrence matrices, with normalized correlation adopted as a similarity function, are used for query image retrieval in [6]. The study in [7] proposes a technique where color histograms of the patch windows of an image are compared with the query color histogram using a similarity measure. Integration of a color histogram and a texture histogram is extracted for the query, and then the multi-histogram used to measure similarity is inspected to retrieve the query in [8]. Textural features extracted from the colour HSV model are used with human visual perception as the similarity measure in [9]. Recent research shows that application of fuzzy logic in image retrieval yields better results. Extraction of features using fuzzy logic for image retrieval is investigated in [10].

1.2. Present Work:

In the computation of co-occurrence matrices, the repetitive property of textural images in intensity, distance and direction is calculated. Due to the imprecision in the pixel intensity, all the above parameters can be fuzzy in nature. Various computations of fuzzy co-occurrence matrices, by fuzzifying the different parameters like intensity (I), distance (δ) and direction (θ), individually and
a combination of distance and direction, are carried out using three fuzzy membership functions, viz. Triangular, Trapezoidal and Gaussian. The textural features extracted from fuzzy co-occurrence matrices are found to be less sensitive to impreciseness. Generally, in the computation of fuzzy co-occurrence matrices, it is observed that fuzzification of the combination of distance (δ) and direction (θ) using the Trapezoidal membership function yields the best result for color images. Those fuzzy co-occurrence matrices are used for further computations. Further, as explained earlier, formulae based on fuzzy and intuitionistic fuzzy concepts have been proposed for the three features. The features extracted thus are termed fuzzy features. These features are also extracted from crisp and fuzzy co-occurrence matrices using these formulae. The fuzzy features are compared with features extracted from crisp and fuzzy co-occurrence matrices using the earlier existing formulae [1], using a measure called sensitivity [4]. The comparison results well establish the superiority of fuzzy features in handling the impreciseness embedded in digital images. To further establish the principle, the query image is retrieved from images corrupted with noise using the features extracted with four techniques, viz. crisp formulae applied on crisp co-occurrence matrices (crisp-crisp), crisp formulae applied on fuzzy co-occurrence matrices (fuzzy-crisp), proposed formulae applied on crisp co-occurrence matrices (crisp-fuzzy) and proposed formulae applied on fuzzy co-occurrence matrices (fuzzy-fuzzy). The application of a single feature among the three features, namely ASM, Entropy and Contrast, results in multiple retrievals. Applying all three features in combination results in a single exact query image retrieval. All four techniques yield successful retrievals when applied to uncorrupted original images. Progressively corrupting the image by adding Gaussian noise reveals an interesting observation: when the noise content reaches a higher level, the retrieval attempted using the crisp-crisp and fuzzy-crisp techniques either fails to retrieve the query image or results in multiple retrievals, while the fuzzy features can still successfully retrieve the exact query image.

2. Crisp and fuzzy features

The co-occurrence matrix represents the statistics of pairs of geometrically related image points. Co-occurrences, the second order statistics of a digital image, represent the joint probability or frequency of occurrence of pixels with gray values m and n (m, n ∈ L) separated by a distance δ at a specified direction θ, and are expressed in the form of a two dimensional matrix C = [C_mn]_{L×L} = f(I, δ, θ).
In general, the co-occurrence matrix of an image I = [I_ij] quantized into L gray values is an L × L matrix C = [C_mn] with (m, n)-th element
$$C_{mn} = \left|\left\{\left((i,j),(k,l)\right) : I_{ij}=m,\ I_{kl}=n,\ d\left((i,j),(k,l)\right)=\delta,\ \angle\left((i,j),(k,l)\right)=\theta\right\}\right|, \qquad (1)$$
where d(x, y) is the distance between points x and y and ∠(x, y) represents the angle made by the line joining x and y with the horizontal. The fuzzy co-occurrence matrix, the second order statistics of a digital image, is an array of real numbers R = [r_mn], where r_mn represents the frequency of occurrence of the gray value "around m" separated from another pixel with gray value "around n" by a distance δ at a specified direction θ, that is, R = f(I, δ, θ). Here the computation can be performed by fuzzifying the parameter I (henceforth mentioned as pixel intensity), or δ (henceforth mentioned as distance) or θ (henceforth mentioned as direction). The computations can also be performed by fuzzifying two or three parameters combined. In the general case, the fuzzy extension of Eq. (1) leads to
$$R_{mn} = \left|\left\{\mu\left((i,j),(k,l)\right) : I(i,j)=m,\ I(k,l)=n,\ d\left((i,j),(k,l)\right)=\delta,\ \angle\left((i,j),(k,l)\right)=\theta;\ \forall\left((i,j),(k,l)\right)\in G\right\}\right|, \qquad (2)$$
where μ can be one among μ1, μ2, μ3 or μ4 as explained below:
μ1 = min{μ_m(I_ij), μ_n(I_kl)} for fuzzification of pixel intensity;
μ2 = μ_δ(d((i,j),(k,l))) for fuzzification of distance δ;
μ3 = μ_θ(∠((i,j),(k,l))) for fuzzification of direction θ;
μ4 = μ2 · μ3 for fuzzification of δ and θ.

2.1. Intuitionistic Fuzzy Sets

Let a set E be fixed. An intuitionistic L-fuzzy set (ILFS) A* in E is an object having the form
$$A^* = \left\{\left\langle x, \mu_A(x), \nu_A(x)\right\rangle \mid x \in E\right\},$$
where the functions μ_A: E → L and ν_A: E → L define the degree of membership and the degree of non-membership of the element x ∈ E to A ⊆ E (for simplicity below we shall write A instead of A*), respectively. Properties of ILFS:
$$(\forall x \in E)\ \left(\mu_A(x) \le N(\nu_A(x))\right), \text{ where } N: L \to L,$$
$$(\forall x \in E)\ \left(0 \le \mu_A(x) + \nu_A(x) \le 1\right),$$
$$\min_{x \in E}\left(1 - \mu_A(x)\right) = 1 - \max_{x \in E}\mu_A(x).$$
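Before moving to the fuzzified features, the sketch below makes Eqs. (1)-(2) concrete: it computes a crisp gray-level co-occurrence matrix and a fuzzy variant in which each pixel pair contributes the minimum of its triangular intensity memberships (μ1 above). The offset parameterization, function names and membership spread are illustrative assumptions.

```python
import numpy as np

def crisp_cooccurrence(img, levels, dx, dy):
    """Crisp GLCM, Eq. (1): count pairs (m, n) at pixel offset (dx, dy)."""
    C = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    for i in range(max(0, -dy), min(h, h - dy)):
        for j in range(max(0, -dx), min(w, w - dx)):
            C[img[i, j], img[i + dy, j + dx]] += 1.0
    return C

def fuzzy_cooccurrence(img, levels, dx, dy, spread=2.0):
    """Fuzzy GLCM, Eq. (2) with mu1 = min of triangular intensity memberships."""
    R = np.zeros((levels, levels), dtype=np.float64)
    h, w = img.shape
    g = np.arange(levels, dtype=np.float64)
    for i in range(max(0, -dy), min(h, h - dy)):
        for j in range(max(0, -dx), min(w, w - dx)):
            mu_p = np.maximum(0.0, 1.0 - np.abs(g - img[i, j]) / spread)
            mu_q = np.maximum(0.0, 1.0 - np.abs(g - img[i + dy, j + dx]) / spread)
            R += np.minimum.outer(mu_p, mu_q)   # "around m" vs "around n" contribution
    return R
```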
2.2. Fuzzy feature extraction

Applying fuzzy rules, the formulae for ASM, Entropy and Contrast given by Haralick et al. [1] can be modified using the Triangular membership function.

2.2.1. Extension to Second order Fuzzy ASM
$$temp(i,j) = \left\{(\mu_m(k))_{\min}\, P(i,j)\right\}^2 + \left\{1 - (\mu_m(k))_{\max}\, P(i,j)\right\}^2,$$
where k = [x][y], m = [r][s], x = [0,1,...,L-1], y = [0,1,...,L-1], r = [0,1,...,L-1], s = [0,1,...,L-1], and
$$f_{ASM}(I) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1}\left\{temp(i,j)\right\}. \qquad (3)$$
2.2.2. Extension to Second order Fuzzy Entropy
$$f_E(I) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1}\Big[\left\{(\mu_m(k))_{\min}\, P(i,j)\right\}\log\left\{(\mu_m(k))_{\min}\, P(i,j)\right\} + \left\{1-(\mu_m(k))_{\max}\, P(i,j)\right\}\log\left\{1-(\mu_m(k))_{\max}\, P(i,j)\right\}\Big]. \qquad (4)$$
2.2.3. Extension to Second order Fuzzy Contrast
$$temp(i,j) = \left\{(\mu_m(k))_{\min}\, P(i,j)\right\} + \left\{1-(\mu_m(k))_{\max}\, P(i,j)\right\},$$
$$f_{CN}(I) = \sum_{i=0}^{L-1}\sum_{j=0}^{L-1}\left(\sum_{t=0}^{L-1} t \cdot temp(i,j)\right), \qquad (5)$$
where t = |i − j| ∈ {0, 1, ..., L − 1}.
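For reference, the classical Haralick definitions of the three features [1], which serve as the baseline that the fuzzified Eqs. (3)-(5) extend by membership-weighted terms, can be applied to either a crisp or a fuzzy normalized co-occurrence matrix as in the short sketch below.

```python
import numpy as np

def haralick_features(P):
    """Classical Haralick ASM, Entropy and Contrast from a co-occurrence matrix P."""
    P = P / P.sum()                                  # normalize to joint probabilities
    eps = 1e-12
    asm = np.sum(P ** 2)                             # angular second moment
    entropy = -np.sum(P * np.log(P + eps))           # entropy
    i, j = np.indices(P.shape)
    contrast = np.sum(((i - j) ** 2) * P)            # contrast
    return asm, entropy, contrast
```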
3. Experiments and Results

3.1. Computation of sensitivity

The performance of the co-occurrence features computed by each method is compared by measuring the sensitivity in each case. Because the co-occurrence based features are extremely useful, it would be preferable if these features were
less sensitive to imprecision. For verification, the features are computed on ideal as well as imprecise textures. Let f be a particular feature computed from the ideal texture and f′ be the same feature computed from the corresponding imprecise texture. Ideally, the difference between f and f′ must be minimal if the features are to be insensitive to imprecision. To establish that fuzzy statistics offer a better method of computation of features, a measurement parameter called sensitivity is computed as follows:
$$S = \frac{f - f'}{1 + f \cdot f'}. \qquad (6)$$

3.2. Query Image Retrieval:

To retrieve the query from the image, the image is divided into query-sized windows in all possible ways. The features ASM, Entropy and Contrast extracted from the query are compared with the same features extracted from each window, with sensitivity as the similarity measure. Finally, the window which gives a sensitivity less than the given thresholds for all three features is treated as a retrieval. By progressively adding Gaussian noise to the original image, the retrieval is done as per the above algorithm with crisp and fuzzy features extracted from both crisp co-occurrence and fuzzy co-occurrence matrices.
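A minimal sketch of the sensitivity measure (6) and of the window-matching retrieval loop described above follows; it reuses the co-occurrence and Haralick helpers sketched earlier, and the threshold values and sliding-window step are assumptions for illustration.

```python
import numpy as np

def sensitivity(f_ideal, f_noisy):
    """Sensitivity of a feature to imprecision, Eq. (6)."""
    return (f_ideal - f_noisy) / (1.0 + f_ideal * f_noisy)

def retrieve_query(image, query, thresholds=(0.05, 0.05, 0.05), step=4):
    """Keep windows whose ASM, Entropy and Contrast sensitivities are all below threshold."""
    qh, qw = query.shape
    fq = haralick_features(crisp_cooccurrence(query, 256, 1, 0))
    matches = []
    for i in range(0, image.shape[0] - qh + 1, step):
        for j in range(0, image.shape[1] - qw + 1, step):
            window = image[i:i + qh, j:j + qw]
            fw = haralick_features(crisp_cooccurrence(window, 256, 1, 0))
            s = [abs(sensitivity(a, b)) for a, b in zip(fq, fw)]
            if all(v < t for v, t in zip(s, thresholds)):
                matches.append((i, j))
    return matches
```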
4. Analysis of Results

The sensitivity analysis is carried out on many color textural images. As an example, the analysis of one color image is given. The precise image, Fig. 1.1, has been made imprecise (Fig. 1.2) by adding 50% uniform noise. Sensitivities of all three features ASM, Entropy and Contrast computed using crisp-crisp, fuzzy-crisp, crisp-fuzzy and fuzzy-fuzzy, plotted as bar charts against the respective computation techniques, are shown in Figs. 1.3, 1.4 and 1.5. The best result is given by the computation technique which gives the least sensitivity. The effect of applying the fuzzy features (application of Eqs. 3, 4 and 5) either on crisp co-occurrence matrices or fuzzy co-occurrence matrices proved to give better sensitivity results than applying the crisp set of equations. This is true for colour textural images. The results generally prove the superiority of fuzzy feature extraction techniques over the conventional feature extraction techniques. The extent of the improvement can be observed from the order of the results, i.e. the sensitivities computed by fuzzy feature extraction have decreased to a very large extent. Analysis of the query image retrieval results is shown in Figs. 2.1 to 2.8. Fig. 2.1 is an apple image and Fig. 2.2 is the query given. By adding Gaussian noise progressively to the original image, it is observed that crisp-crisp could retrieve till 6% noise (Fig. 2.3), fuzzy-crisp could retrieve till 11% noise (Fig. 2.4), crisp-fuzzy could retrieve till 15% noise (Fig. 2.5) and fuzzy-fuzzy could retrieve till 18% noise (Fig. 2.6). Figs. 2.7 and 2.8 show the failure (multiple faulty retrievals) of crisp-crisp and fuzzy-crisp at 15% noise, respectively.
Fig. 1.1 Precise image. Fig. 1.2 Imprecise image. Fig. 1.3 ASM. Fig. 1.4 Entropy. Fig. 1.5 Contrast. (Y-axis: scaled sensitivity.)
Fig. 2.1 Original image. Fig. 2.2 Query image. Fig. 2.3 Crisp-crisp at 6%. Fig. 2.4 Fuzzy-crisp at 11%. Fig. 2.5 Crisp-fuzzy at 15%. Fig. 2.6 Fuzzy-fuzzy at 18%. Fig. 2.7 Crisp-crisp at 15%. Fig. 2.8 Fuzzy-crisp at 15%.
5. Conclusion

Uncertainty pervades every aspect of image retrieval because of ill-posed and noisy images or queries. Fuzzy concepts are better suited for feature extraction in such cases. Sensitivity analysis shows that fuzzy feature
extraction gives lower sensitivity compared to crisp feature extraction, which implies that fuzzy features are less sensitive to impreciseness. The results obtained further confirm that when fuzzy features and crisp features are used for query image retrieval, the fuzzy features retrieve the query successfully in a noisy environment where the crisp features fail to retrieve it.

Acknowledgments

The authors would like to acknowledge Mr. Vineet Kumar Singh, who has kindly sponsored the registration fees.

References
1. R.M. Haralick, K. Shanmugam and I. Dinstein, Textural Features for Image Classification, IEEE Trans. Systems Man Cybernetics, Vol. 3, pp. 610-621, 1973.
2. C.V. Jawahar and A.K. Ray, Fuzzy statistics of digital images, IEEE Signal Processing Lett. 3:225-227, 1996.
3. C.V. Jawahar and A.K. Ray, Incorporation of gray level imprecision in representation and processing of digital images, Pattern Recognition Lett. 17:541-546, 1996.
4. C.V. Jawahar and A.K. Ray, Techniques and applications of fuzzy statistics in Digital Image Analysis, Fuzzy Theory Systems: Techniques and Applications, Vol. 2, 1999, pp. 759-771.
5. Linhui Jia and Zhi-Qiang Liu, A technique for 3-D object representation and detection in knowledge-based and content driven image retrieval, IEEE International Conference on Intelligent Processing Systems, pp. 979-983, 1997.
6. Kyuheon Kim, Seyoon Jeong, Byung Tae Chun, Jae Yeon Lee and Younglae Bae, Efficient video images retrieval by using local co-occurrence matrix texture features and normalised correlation, IEEE TENCON, pp. 934-937, 1999.
7. P. Lewis, D. Dupplaw, and K. Martinez, Content-Based Multimedia Information Handling, Cultivate Interactive, issue 6, 11 February 2002.
8. Jinshan Tang and Scott Acton, An image retrieval algorithm using multiple queries to images, IEEE, pp. 193-196, 2003.
9. Ying Dai, Intention-based image retrieval with or without query image, IEEE Proceedings of the 10th International Multimedia Modelling Conference, 2004.
10. Raghu Krishnapuram, Swarup Medasani, Sung-Hwan Jung, Young-Sik Choi and Rajesh Balasubramaniam, Content-based image retrieval based on a fuzzy approach, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 10, October 2004, pp. 1185-1199.
CLASSIFICATION WITH INTUITIONISTIC FUZZY REGION IN GEOSPATIAL INFORMATION SYSTEM

M. REZA MALEK 1,2
JALAL KARAMI 1
SHAMSOLMOLOOK ALIABADY 2

1- Dept. of GIS, Faculty of Geodesy & Geomatics Eng., K.N. Toosi Univ. of Technology, Tehran, Iran
2- Research Institute of National Cartographic Center, Tehran, Iran
Abstract Although fuzzy logic methods are of great interest in many GIS applications, the traditional fuzzy logic has two important deficiencies. First, to apply the fuzzy logic, we need to assign, to every property and for every value, a crisp membership function. Second, fuzzy logic does not distinguish between the situation in which there is no knowledge about a certain statement and a situation that the belief to the statement in favor and against is the same. Due to this fact, it is not recommended for problems with missing data and where grades of membership are hard to define. In this paper, a simple fuzzy region and fundamental concepts for uncertainty modeling of spatial relationships are analyzed from the view point of intuitionistic fuzzy (IF) logic. We demonstrate how it can provide model for fuzzy region; i.e., regions with indeterminate boundaries. As a proof of our idea, the paper will discuss the process of creating thematic maps using remote sensing satellite imagery.
1. Introduction

A rational assessment of the usability of spatial data for the many tasks requires a proper understanding of the role uncertainty plays. As Geographic Information System (GIS) analyses are often a basis for decision making, ignoring the uncertainty may lead to wrong decisions and also undermine any trust. Describing spatial objects in the current mapping of the real world as precisely determined phenomena is an insufficient abstraction process, since the feature of spatial indeterminacy or spatial vagueness is often inherent to many geometric and geographic data [4]. Sources of uncertainty are the inexact or incomplete definition of objects, and the inability to observe precise and complete relevant data (see [4]). The description of objects (static or dynamic) is not only uncertain in the above mentioned sense, but also contradictory in different contexts [8].
784 Over the past few years, there has been considerable work done on modeling of fuzzy spatial objects (see e.g. [7]). Models based on fuzzy set theory, allow a much more finegrained modeling of vague spatial objects. Although fuzzy logic methods are of great interest in many GIS applications, the traditional fuzzy logic has two important deficiencies. First, to apply the fuzzy logic, we need to assign, to every property and for every value, a crisp membership function. Second, fuzzy logic does not distinguish between the situation in which there is no knowledge about a certain statement and a situation that the belief to the statement in favor and against is the same. Due to this fact, it is not recommended for problems with missing data and where grades of membership are hard to define [11]. Using not enough or irrelevant data like aged satellite images are one example for the aforementioned problem. Another example is the definition of objects. They can be very different for the same object. Some experts may have different views of an object e.g., 'forest'. This problem is emerging whenever one has to deal with interoperability of different systems, combining different data sets. In this paper, a simple fuzzy region and fundamental concepts for uncertainty modeling of spatial relationships are analyzed from the view point of intuitionistic fuzzy (IF) logic. We demonstrate how it can provide model for fuzzy region; i.e., regions with indeterminate boundaries. As a proof of our idea, the paper will discuss the process of creating thematic maps using remote sensing satellite imagery. The remainder of this paper is structured as follows: Section 2 reviews the related works. Section 3 introduces necessary concepts. This section proposes novel concepts and their properties to define a simple intuitionistic fuzzy spatial region (IFSR). The contribution of the intuitionistic fuzzy logic to satellite image classification is discussed in the section 4. The concluding section 5 gives a summary and conclusion. 2. Related Work During recent years, fuzzy logic has been much investigated in different geoinformation application domain. Zhan [16] has proposed using fuzzy logic for urban land use classification. Wang [14] has proposed a fuzzy supervised classification method for classify remote sensing images. The intersection model is extended to vague regions by three main approaches: the work of Clementini and Di Felice [5] on regions with "broad boundary", the work of Zhan [15] who developed a method for approximately analyzing binary topological relations between geographic regions with indeterminate boundaries based on fuzzy sets, and Tang and Kainz [13] that provided a 3*3, a 4*4, and a 5*5 intersection matrix based on different
topological parts of two fuzzy regions. A good review of vague objects can be found in [7]. The notion of intuitionistic fuzzy sets (IFS) was introduced by Atanassov [1, 2, 3] as a generalization of fuzzy sets. Later the concept of intuitionistic fuzzy topology was introduced by Coker [6]. Malek in his works [9, 10] provided a theoretical framework for both dominant ontologies used in GIS, namely point-set topology and the Region Connection Calculus.

3. Intuitionistic Fuzzy Region

First we shall present the fundamental concepts and definitions given by Atanassov.

Definition 3.1 [3]: Let X be a nonempty fixed set. An intuitionistic fuzzy set (IFS) A in X is an object having the following form
$$A := \left\{\left\langle x, \mu_A(x), \nu_A(x)\right\rangle \mid x \in X\right\},$$
where the functions μ_A: X → [0,1] and ν_A: X → [0,1] define the degree of membership and the degree of non-membership of the element x ∈ X, respectively. For every x ∈ X, μ_A and ν_A satisfy 0 ≤ μ_A(x) + ν_A(x) ≤ 1.
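A direct way to represent Definition 3.1 in code is a small record holding the two degrees and the induced hesitation; the sketch below is purely illustrative and not part of the paper's formalism.

```python
from dataclasses import dataclass

@dataclass
class IFSElement:
    mu: float   # degree of membership
    nu: float   # degree of non-membership

    def __post_init__(self):
        if not (0.0 <= self.mu + self.nu <= 1.0):
            raise ValueError("IFS constraint 0 <= mu + nu <= 1 violated")

    @property
    def hesitation(self) -> float:
        return 1.0 - self.mu - self.nu

# e.g. a pixel judged 0.6 'forest', 0.1 'not forest', leaving 0.3 hesitation
p = IFSElement(mu=0.6, nu=0.1)
```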
In contrast to traditional fuzzy sets, the sum of μ_A and ν_A does not necessarily have to be 1. This is particularly useful when the system may lack complete information. Definition 3.2 [6]: We define 0∼ and 1∼ as follows:
$$0_\sim := \left\{\left\langle x, 0, 1\right\rangle \mid x \in X\right\}, \qquad 1_\sim := \left\{\left\langle x, 1, 0\right\rangle \mid x \in X\right\}.$$
Consequently, an intuitionistic fuzzy topology (IFT for short) on a nonempty set X is a family Tof IFSs in X satisfying the following axioms:
(T1) 0∼, 1∼ ∈ T;
(T2) G_i ∩ G_j ∈ T for any G_i, G_j ∈ T;
(T3) ∪ G_i ∈ T for an arbitrary family {G_i | i ∈ I} ⊆ T.
The pair (IFS(X), T) is called an intuitionistic fuzzy topological space (IFTS) and any IFS in T is known as an intuitionistic fuzzy open set (IFOS for short) in X. The complement A^c of an IFOS A is called an intuitionistic fuzzy closed set (IFCS) in X.
Definition 3.3 [6]: Let (X, T) be an IF topological space and A = ⟨X, μ_A, ν_A⟩ be an IFS in X. Then the fuzzy closure and fuzzy interior are defined by:
$$A^- = \bigcap\left\{K : K \text{ is an IFCS in } X \text{ and } A \subseteq K\right\},$$
$$A^\circ = \bigcup\left\{G : G \text{ is an IFOS in } X \text{ and } G \subseteq A\right\}.$$
Corollary 3.1T61:
A°nB° A° oc
A
=(AnB)°,A-uB-
=(AuBY
^A^A~ =AC~,A~C =AC0.
Now, we add some further definitions and propositions. Definition 3.4: We define a IF boundary A=<X,JUA,VA >by:
(IFB)
of
an
IFS
8A = A~ nAc'. The following theorem shows the intersection methods no longer guarantees a unique solution. Corollary 3.2 dAnA" = 0_ iff A" is crisp (i.e., A" = 0_ or A" = 1_ ). Proof. =>) If A" = {< x,0 < HA„ < 1, 0 < VA„ < 1 >| x € X}, then
A~ = {< x,0 <juA. < 1, 0 < vA. < 1 >| x e X) and Aoc = {< x,0 < vA„ < 1, 0 < jur <\>\xeX}.
Then,
e ^ n ^ ° =A'nAc-nA° =A~nA°c nAc~ = {x,0< min(^.,//,.,v,_)< 1,0<max(y r , v,_ , ^ „ ) < 11x e l } . Therefore, ifdA nA°
=0„, then A" = 0 , o r ^ ° = 1_.
<=) If A° is crisp, then A" =0„orA° =l_.lf A" = L then A°c = 0_ and .4" = 1_, then 8AnA° = 0 . If A" = 0 then it immediately results
that S/f n A" = 0_ . Definition 3.5: Let A =< X,//^, V^ > be an IFS in (X,T). Suppose that the family of IFOS's contained in A are indexed by the family < X,/JG ,VG >: i G / } and the family of IFOS's containing A are indexed by the family < X, JUK , VK >: j e J} . Then two interiors, closures, and boundaries are defined as following:
4] :=< x, max(//G ), min(l - juGj) > Al :=< x, max(l - vc ),min(yG ) > 4~ :=< x, rmn(jiKj ), max(l - pKj) > AQ :=< x, min(l - vK ), m a x ^ ) > dA
[] := AU n AU Proposition 3.1:
8A
o '-=Al
nA
l~-
(c)A;m={[lO}A°andAm)={[],0}AProof. We shall only prove (c), and the others are obvious. []A° = < X , m a x ( / / G ) , l - m a x ( / / G . ) > . Based on
1 - max(//G ) = min(l - //G )
'
,
knowing
that
then
[]y4° =< X,max(// G ),min(l - // G ) >= ^ . In a similar way the others can be proved. Definition 3.6: Let A =< X,fJ.A,VA >be an IFS in (X,T). We define exterior of A as follows:
AE
=XnAc.
Definition 3.7: An IFOS A is called regular open iff A = A⁻°, and an IFCS A is called regular closed iff A = A°⁻. Now, we shall obtain a formal model for a simple spatial fuzzy region based on the IF C5-connectedness defined in [6]. A simple geographic object is assumed to be a one-component set which does not have holes, cuts, or punctures. Therefore, the next definition gives us the explanation of a simple IF region. Definition 3.8: An IFS A is called a simple fuzzy region in a C5-connected IFTS if: (1) A⁻, A⁻_[] and A⁻_⟨⟩ are regular closed, (2) A°, A°_[] and A°_⟨⟩ are regular open, and (3) ∂A, ∂A_[] and ∂A_⟨⟩ are C5-connected.

4. Satellite Image Classification

Land use and land cover are important examples of the use of fuzzy logic in GIS, because most of what is obtained from the classification procedure concerns not precisely defined and indeterminate spatial objects such as urban and rural, or forest and grassland. In satellite images some pixels might remain unclassified due to clouds or absorption of the sensor's signal. Missing data causes problems in the
classification of pixels. Another major source of uncertainty in satellite images is age. Frequently there is a need to combine aged images with new data. The procedure will be faced with cases in which arguments indicate that part of, e.g., a grassland area did not change, while there might be some arguments against this. In such situations using intuitionistic fuzzy logic would be reasonable. The suitability of using intuitionistic fuzzy logic is shown in the remaining sections.

4.1. Materials and Study Area

The location of the present study is situated in the north of Iran (south of the Caspian Sea). For this study a Landsat-7 ETM+ 2002 image (figure 2) was used. The supervised crisp and fuzzy classification procedures were performed using FUZCLASS, one of the soft classifiers available in the IDRISI for Windows image processing software. Training data for each class can possess both pure and fuzzy signatures. The fuzzy classification allows different levels of pixel membership (0.00-1.00). The crisp classification was done using the maximum likelihood method. The information classes were "forest" and "road".

4.2. Data Analysis and Discussion

As can be seen from the image (figure 1), the image is partly cloudy and shadowed. The two aforementioned phenomena cause the image to be partly unclassified. The results of the fuzzy classification are illustrated in figures 2 and 3. Based on the classification uncertainty calculated by IDRISI, it can be easily seen that the shadowed or cloudy area has maximum uncertainty (figure 4). The result of the intuitionistic classification for the forest class is illustrated in figure 5. The details of the intuitionistic computation are left for another article. Now, the output images coming from traditional and intuitionistic fuzzy classification can be compared. The intuitionistic method gives an output result for the uncertain area based on introducing the following simple rule: "The area surrounded by forest is more probable to be forest". The membership degree for those areas is calculated as a function of distance.

5. Conclusion

In contrast to traditional fuzzy logic, IF logic is well equipped to deal with missing data. By employing IF logic in spatial data models, we can express a hesitation concerning the object of interest, because it distinguishes between the situation in which there is no knowledge about a certain statement and a situation in which the belief in the statement in favor and against is the same. Using the IF method is reasonable because on the one hand it supports linguistic variables related to doubt and hesitancy and, on the other hand, it can manage given
contradictory reasons. This article has gone a step forward in developing methods that can be used to define fuzzy and vague spatial regions. Classification procedure for satellite images is strongly influenced by the presence of cloud and shadow. For example, cloudy area is lighter, so it cannot be properly classified. In the case of poor relevant data, IF method can be used. The IF decision rules for determining class membership degree are the framework in which the doubt and hesitancy of expert knowledge has been embedded.
Fig. 1. The original cloudy and shadowed image (cloudy area indicated).
Fig. 4. The classification uncertainty. Fig. 5. The IF membership of the forest class.
790 References 1. Atanassov, K.T.: "Intuitionistic Fuzzy Sets", Fuzzy Sets and Systems, 20: pp. 87-96, 1986. 2. Atanassov, K.T.: "More on Intuitionistic Fuzzy Sets. Fuzzy sets and Systems, 33:pp. 37-45, 1989. 3. Atanassov, K.T.: "Intuitionistic Fuzzy Logic: Theory and Application", Studies in Fuzziness and Soft Computing., Heidelberg, Physica-Verlag, 1999. 4. Burrough, P.A. and Frank, A.U. (eds.): "Geographic Objects with Indeterminate Boundaries", GISDATA Series, ed. I. Masser and Salge, F. Vol. II., Taylor & Francis: London, 1996. 5. Clementini, E. and Di Felice, P.: "An Algebraic Model for Spatial Objects with Indeterminate Boundaries", In: [4], pp. 155-169, 1996. 6. Coker, D.: "An introduction to intuitionistic fuzzy topological space", Fuzzy sets and Systems, 88: pp. 81-89, 1997. 7. Frank, A.U., Grum, E. (compilers): ISSDQ '04, Vienna, Dept. for Geoinformation and Cartography, Vienna University of Technology, 2004. 8. Kokla, M. and Kavouras, M.: "Fusion of Top-level and Geographic Domain Omtologies based on Context Formation and Complementarity", International Journal of Geographical Information Science, 15(7): pp. 679687,2001. 9. Malek, M.R.:" Spatial Object Modeling in Intuitionistic Fuzzy Topological Spaces", Lecture Notes in Artificial Intelligence, 3066, pp. 427-434, 2004. 10. Malek, M.R. and Twaroch, F. :"An Introduction to Fuzzy Spatial Region", In: Frank, A.U., Grum, E. (compilers): ISSDQ '04, Vienna, Dept. for Geoinformation and Cartography, Vienna University of Technology, 2004. 11. Roy, A.J.: "A Comparison of Rough Sets, Fuzzy sets and Non-monotonic Logic", University of Keele: Staffordshire, 1999. 12. Stell, J.G. and Worboys, M.F.: "The Algebraic Structure of Sets of Regions", In: Proceding of Spatial Information Theory (COSIT '97), Laurel Highlands, PA: Springer, 1997. 13. Tang, X. and Kainz, W.: "Analysis of Topological relations between Fuzzy Regions in a General Fuzzy Topological space", In: Proceeding of Symposium on Geospatial Theory, Processing and Applications, Ottawa, 2002. 14. Wang, F.: "Fuzzy supervised classification of remote sensing images", IEEE Transactions on Geoscience and Remote Sensing, 28, pp. 194-201, 1990. 15. Zhan, F.B.: "Approximate analysis of binary topological relations between geographic regions with indeterminate boundaries", Soft Computing, 2: p. 28-34, 1998. 16. Zhan, Q. and Molenaar M. and Gorte, B.: "Urban land use classes with fuzzy membership and classification based on integration of remote sensing and GIS", Proceeding of ISPRS, 2000.
ON-LINE TRAINING EVALUATION IN VIRTUAL REALITY SIMULATORS USING FUZZY BAYES RULE

RONEI MARCOS DE MORAES
Department of Statistics, Federal University of Paraiba, Joao Pessoa, PB, Brazil, [email protected]

LILIANE DOS SANTOS MACHADO
Department of Informatics, Federal University of Paraiba, Joao Pessoa, PB, Brazil, [email protected]

Simulators based on Virtual Reality (VR) provide significant benefits over other methods of training, mainly in critical procedures. The assessment of training performed in this kind of system is necessary to know the training quality and to provide some feedback about the user performance. Because VR simulators are real-time systems, on-line evaluation tools attached to them must have a low-complexity algorithm so as not to compromise the performance of the simulators. This work presents a new approach to on-line evaluation using an assessment tool based on the Fuzzy Bayes Rule for modeling and classification of a simulation into pre-defined classes of training. This method allows the use of continuous variables without loss of information. Results of its application are provided and compared with another evaluation system based on the classical Bayes rule.
1. Introduction

Virtual Reality (VR) systems for training provide significant benefits over other methods of training, mainly in critical medical procedures. In some cases, those procedures are done without visualization for the physician, and the only information he has is the touch sensation provided by a robotic device with force feedback. These devices can measure forces and torque applied during the user interaction [1] and these data can be used for an assessment or evaluation [3,10]. This is especially interesting in medical applications to simulate some invasive procedures. The first methodologies for training evaluation were proposed just a few years ago. They can be divided into off-line and on-line methods. In medicine, some models for off-line [6,10,11] or on-line [3,5,7,8,9] evaluation of training have been proposed. The evaluation methodologies for training through VR simulators are still more recent. Because VR simulators are real-time systems, an evaluation tool must continuously monitor all user interactions and compare his
performance with pre-defined expert's classes of performance. For didactic reasons, the use of on-line evaluation tools is more interesting because the user can remember his mistakes and learn how to correct them more easily. The main problems related to on-line training evaluation methodologies applied to VR systems are computational complexity and accuracy. An on-line evaluation tool must have low complexity so as not to compromise VR simulation performance, but it must have high accuracy so as not to compromise the user evaluation. For this case, an evaluation tool based on the Fuzzy Bayes Rule reaches those requirements and can obtain better results than the application of the Classical Bayes Rule [7].

2. Assessment in VR Simulators

The assessment of simulations in VR-based systems for training is necessary to know the training quality and to provide some feedback about the user performance. User actions, such as spatial movements, can be collected from mouse, keyboard and any other tracking device. Applied forces, angles, position and torque can be collected from haptic devices [1]. So, VR systems can use one or more variables, such as the ones mentioned above, to assess the simulation performed by the user. Recently, models and methods for off-line [6,10,11] and on-line [3,5,7,8,9] assessment of training have been proposed. In this paper, we propose the use of fuzzy statistical classification based on the Fuzzy Bayes Rule for an on-line training evaluation tool for virtual reality simulators. The system uses a vector of information with data collected from user interactions with the virtual reality simulator. These data are compared by an evaluation system with M pre-defined classes of performance. To test it, we are using a bone marrow harvest simulator [4]. The proposed evaluation tool supervises the user movements and assesses the training according to M possible classes of performance.

3. Evaluation Tool Based on Fuzzy Bayes Rule

This section presents the method for training evaluation, based on the Fuzzy Bayes Rule. For the reader's better understanding, we first present a short review of the Classical Bayes Rule. After that, we present Fuzzy Sets and the Fuzzy Bayes Rule.

3.1. Classical Bayes Rule

Formally, let the classes of performance be in the space of decision Ω = {1, ..., M}, where M is the total number of classes of performance. Let w_i, i ∈ Ω, be the class
of performance for a user. We can determine the most probable class of a vector of training data X by conditional probabilities [2]:
$$P(w_i \mid X) = \frac{P(w_i \cap X)}{P(X)}, \quad i \in \Omega. \qquad (1)$$
The probability given by (1) is the likelihood that, for a data vector X, the correct class is w_i. The classification rule is performed according to
$$X \in w_i \ \text{ if } \ P(w_i \mid X) > P(w_j \mid X) \ \text{ for all } i \neq j,\ i, j \in \Omega. \qquad (2)$$
However, all the probabilities given by (1) are unknown. So, if we have sufficient information available for each class of performance, we can estimate the probabilities denoted by P(X | w_i). Using the Bayes Theorem:
$$P(w_i \mid X) = \frac{P(X \mid w_i)\,P(w_i)}{P(X)}, \quad \text{where} \quad P(X) = \sum_{i=1}^{M} P(X \mid w_i)\,P(w_i). \qquad (3)$$
As P(X) is the same for all classes w_i, it is not relevant for data classification. In Bayesian theory, P(w_i) is called the a priori probability for w_i and P(w_i | X) is the a posteriori probability for w_i given X. Then, the classification rule given by (2) is modified:
$$X \in w_i \ \text{ if } \ P(X \mid w_i)\,P(w_i) > P(X \mid w_j)\,P(w_j) \ \text{ for all } i \neq j,\ i, j \in \Omega. \qquad (4)$$
Equation (4) is known as the Bayesian decision rule of classification. However, it can be convenient to use [4]:
$$g_i(X) = \ln\left[P(X \mid w_i)\,P(w_i)\right] = \ln\left[P(X \mid w_i)\right] + \ln\left[P(w_i)\right], \quad i \in \Omega, \qquad (5)$$
where g_i(X) is known as the discriminant function. We can use (5) to modify the formulation given by the Bayesian decision rule in equation (4):
$$X \in w_i \ \text{ if } \ g_i(X) > g_j(X) \ \text{ for all } i \neq j,\ i, j \in \Omega. \qquad (6)$$
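Assuming Gaussian class-conditional densities, as the text suggests, the discriminant rule (5)-(6) can be sketched as follows; the diagonal-covariance simplification and the function names are assumptions made for brevity.

```python
import numpy as np

def gaussian_discriminant(x, mean, var, prior):
    """g_i(X) = ln P(X|w_i) + ln P(w_i) for a diagonal Gaussian class model, Eq. (5)."""
    log_lik = -0.5 * np.sum(np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)
    return log_lik + np.log(prior)

def classify(x, means, variances, priors):
    """Bayesian decision rule, Eq. (6): pick the class with the largest discriminant."""
    scores = [gaussian_discriminant(x, m, v, p)
              for m, v, p in zip(means, variances, priors)]
    return int(np.argmax(scores))
```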
It is important to note that if the statistical distribution of the training data can be assumed to be multivariate Gaussian, the use of (6) has interesting computational properties [2]. If the training data cannot be assumed to follow that distribution, equation (6) can still provide a significant reduction of the computational cost of implementation.

3.2. Fuzzy Sets

In classical set theory a set A of a universe X can be defined by a membership function μ_A(x), with μ_A: X → {0,1}, where 1 means that x is included in A and 0 means that x is not included in A:
$$\mu_A(x) = \begin{cases} 1, & \text{if } x \in A \\ 0, & \text{if } x \notin A \end{cases} \qquad (7)$$
A fuzzy set can be seen as a representation, in classical set theory, of something of which we only have an imperfect knowledge. In this case, the membership function is not given by only one value 0 or 1, but by a value in the interval [0,1]. The probability of a fuzzy event is defined by [14]: let (R^n, Φ, P) be a probability space where Φ is a σ-algebra in R^n and P is a probability measure over R^n. Then a fuzzy event in R^n is a set A in R^n, with membership function μ_A(x), where μ_A: R^n → [0,1] is Borel-measurable. The probability of a fuzzy event A is defined by the Lebesgue-Stieltjes integral:
$$P(A) = \int_{R^n} \mu_A(x)\,dP = E(\mu_A). \qquad (8)$$
In other words, the probability of a fuzzy event A with membership function μ_A is the expected value of the membership function μ_A.

3.3. Fuzzy Bayes Rule

Again, let the classes of performance for a user be w_i, i = 1, ..., M, where M is the total number of classes of performance. However, now we assume that the w_i are fuzzy sets over the space of decision Ω. Let μ_{w_i}(X) be the fuzzy membership function for each class w_i given by a fuzzy information source (for example, a rule composition system of the expert system, or a histogram of the sample data), according to a vector of data X. In our case, we assume that the fuzzy information source is a histogram of the sample data. By use of fuzzy probabilities and the fuzzy Bayes rule [14] in the classical Bayes rule [12], we have the fuzzy probability of the class w_i, given the vector of data X:
$$P(w_i \mid X) = \frac{\mu_{w_i}(X)\,P(w_i)\,P(X \mid w_i)}{\sum_{j=1}^{M} \mu_{w_j}(X)\,P(w_j)\,P(X \mid w_j)}, \quad \text{with} \quad \sum_{i=1}^{M} \mu_{w_i}(X) = 1. \qquad (9)$$
However, as the denominator is independent of i, the Fuzzy Bayes classification rule is to assign the vector of training data X from the user to the class of performance w_i if:
$$\mu_{w_i}(X)\,P(w_i)\,P(X \mid w_i) = \max_j\left\{\mu_{w_j}(X)\,P(w_j)\,P(X \mid w_j)\right\}. \qquad (10)$$
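The fuzzy Bayes rule (9)-(10) only adds the class membership μ_{w_i}(X) as an extra factor on top of the classical rule. A minimal sketch, reusing the Gaussian discriminant above and assuming that the memberships come from pre-computed histograms of expert data, is:

```python
import numpy as np

def fuzzy_bayes_classify(x, means, variances, priors, memberships):
    """Assign X to the class maximizing mu_wi(X) * P(w_i) * P(X|w_i), Eq. (10).

    memberships: callable returning the fuzzy membership of x to each class
    (e.g. read from sample-data histograms; an assumption for this sketch).
    """
    mu = np.asarray(memberships(x), dtype=np.float64)   # mu_wi(X), summing to 1
    log_scores = [np.log(mu[i] + 1e-12) + gaussian_discriminant(x, m, v, p)
                  for i, (m, v, p) in enumerate(zip(means, variances, priors))]
    return int(np.argmax(log_scores))
```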
4. The Evaluation Tool The evaluation tool proposed should supervise the user's movements and other parameters associated to them. The system must collect information about positions in the space, forces, torque, resistance, speeds, accelerations, temperatures, visualization position and/or visualization angle, sounds, smells
795 and etc. The VR simulator and the evaluation tool are independent systems, however they act simultaneously. The user's interactions with the simulator are monitored and the information is sent to the evaluation that analyzes the data and it emits a report on the user's performance at the end of the training. The VR system used for the tests is a bone marrow harvest simulator [4]. In a first movement on the real procedure, the trainee must feel the skin of the human pelvic area to find the best place to insert the needle used for the harvest. After, he must feel the tissue layers (epidermis, dermis, subcutaneous, periosteum and compact bone) trespassed by the needle and stop at the correct position to do the bone marrow extraction. In our VR simulator the trainee interacts with a robotic arm and his movements are monitored in the system by variables as acceleration, applied force and spatial position. For reasons of general performance of the VR simulator, were chosen to be monitored the following variables: spatial position, velocities, forces and time on each layer. Previously, the system was calibrated by an expert, according to M classes of performance defined by him. The number of classes of performance was defined as M=3: 1) correct procedures, 2) acceptable procedures, 3) badly executed procedures. So, the classes of performance for a trainee could be: "you are well qualified", "you need some training yet", "you need more training". The information of variability about these procedures is acquired using fuzzy membership functions and Gaussian probability models. In our case, we assume that the font of fuzzy information for construction of the fuzzy membership function for w( classes is the histogram of the sample data. The user makes his training in the VR simulator and the Evaluation Tool based on Fuzzy Bayes Rule collects the data from his manipulation. All probabilities of that data for each class of performance are calculated by (9) and at the end the user is assigned to a Wj class of performance by (10). So, when a trainee uses the system, his performance is compared with each expert's class of performance and the Evaluation Tool based on Fuzzy Bayes Rule assigns him the better class, according to the trainee's performance. At the end of training, the evaluation system reports the classification to the trainee. The calibration of the Evaluation Tool based on Fuzzy Bayes Rule was performed off-line, before any evaluation of training. For that, an expert executed the procedure twenty times for each class of performance. After, for a controlled and impartial analysis, several users used the system and 150 training procedures were monitored. The data collected from these trainings were manually labeled according to the expert specifications. These same cases were labeled using the Evaluation Tool based on Fuzzy Bayes Rule and it generated the classification matrix showed in Table 1. The diagonal of that matrix shows
the correct classification. In the other cells, we can observe the misclassifications.

Table 1. Classification matrix for the Evaluation Tool based on Fuzzy Bayes Rule.

Class of performance      Class of performance according to the
according to experts      Evaluation Tool based on Fuzzy Bayes Rule
                          1      2      3
1                         44     6      0
2                         0      50     0
3                         1      3      46
The Kappa coefficient was used to perform the comparison of the classification agreement, as recommended by the pattern recognition literature [13]. From the classification matrix obtained, the Kappa coefficient for all samples was K = 90.00% with variance 9.3082 × 10⁻⁴ %. In only 10 cases did the evaluation tool make mistakes. It is important to note that for the class "acceptable procedures", all classifications were correct. That performance is very acceptable and it shows the good adaptation of the Evaluation Tool based on Fuzzy Bayes Rule to the solution of this evaluation problem. Another important result is the computational performance of the evaluation tool: with a Pentium IV PC compatible, 2GB of RAM and 80GB of hard disk, the average CPU time consumed by the evaluation was 0.0310 seconds. Then, we can affirm that the Evaluation Tool based on Fuzzy Bayes Rule has low computational complexity. It allows the inclusion of other variables in the evaluation tool without degradation of the performance of the virtual reality simulation.

5. Comparison with an Evaluation Tool based on Classical Bayes Rule

A comparison was performed between the Evaluation Tool based on Fuzzy Bayes Rule and the Evaluation Tool based on Classical Bayes Rule, proposed by Moraes and Machado [7]. The Evaluation Tool based on Classical Bayes Rule was configured and calibrated by the expert for the same three classes used before. The same sixty samples of training (twenty of each class of performance) were used for calibration of the two evaluation systems. In the same way, the data of the same 150 procedures from users' training were used for a controlled and impartial comparison between the two evaluation systems. The classification matrix obtained for the Evaluation Tool based on Classical Bayes Rule is presented in Table 2. The Kappa coefficient was K = 81.00% with variance 0.0016%. In 19 cases, the evaluation tool made
mistakes, and at least one classification was made incorrectly in every class. That performance is good and shows that an Evaluation Tool based on Classical Bayes Rule is a competitive approach to the solution of evaluation problems.

Table 2. Classification matrix for the Evaluation Tool based on Classical Bayes Rule.

Class of performance      Class of performance according to the
according to experts      Evaluation Tool based on Classical Bayes Rule
                          1      2      3
1                         46     4      0
2                         1      49     0
3                         6      8      36
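The Kappa coefficients reported for Tables 1 and 2 can be reproduced from the confusion matrices; a minimal sketch of Cohen's Kappa computation is given below.

```python
import numpy as np

def cohen_kappa(confusion):
    """Cohen's Kappa agreement coefficient from a confusion matrix."""
    C = np.asarray(confusion, dtype=np.float64)
    n = C.sum()
    po = np.trace(C) / n                                # observed agreement
    pe = np.sum(C.sum(axis=1) * C.sum(axis=0)) / n**2   # chance agreement
    return (po - pe) / (1.0 - pe)

print(cohen_kappa([[44, 6, 0], [0, 50, 0], [1, 3, 46]]))  # Table 1 -> 0.90
print(cohen_kappa([[46, 4, 0], [1, 49, 0], [6, 8, 36]]))  # Table 2 -> 0.81
```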
When the Evaluation Tool based on Classical Bayes Rule performed the classification, few mistakes were observed. However, it is possible to see from Tables 1 and 2 and from the Kappa coefficients that the performance of the classification based on the Classical Bayes Rule is lower than that of the one based on the Fuzzy Bayes Rule. In statistical terms, the difference in performance between those evaluation methods is significant. Regarding the computational performance of the Evaluation Tool, the one based on the Classical Bayes Rule was faster than the one based on the Fuzzy Bayes Rule. The average CPU time consumed for evaluation of training based on the Classical Bayes Rule was 0.0160 seconds on a Pentium IV PC compatible.

6. Conclusions and Future Works

In this paper we presented a new approach to on-line training evaluation in virtual reality simulators. This approach uses an Evaluation Tool based on the Fuzzy Bayes Rule and addresses the main requirements of evaluation procedures: low complexity and high accuracy. Systems based on this approach can be applied in virtual reality simulators for several areas and can be used to classify a trainee into classes of learning, giving him a status about his performance. A bone marrow harvest simulator based on virtual reality was used as the basis for the performance tests. The performance obtained by an evaluation tool based on the Fuzzy Bayes Rule was compared with an evaluation tool based on the Classical Bayes Rule. From the obtained data, it is possible to conclude that the evaluation tool based on the Fuzzy Bayes Rule presents significantly better results when compared with an evaluation tool based on the Classical Bayes Rule for the same case. However, the second one has better computational performance in terms of CPU time.
Acknowledgments This work is partially supported by the process 506480/2004-6 of the Brazilian National Council for Scientific and Technological Development and by the process 01-04-1054-000 of the Brazilian Research and Projects Financing. References 1. G. Burdea and P. Coiffet, VR Technology, 2nd ed., Wiley (2003). 2. R. Johnson and D. Wichern, Applied Multivariate Statistical Analysis. Prentice Hall, 4th ed., 1998. 3. L. Machado et al., Fuzzy Rule-Based Evaluation for a Haptic and Stereo Simulator for Bone Marrow Harvest for Transplant. 5th PUG Proc. (2000). 4. L. Machado et al. A VR Simulator for Bone Marrow Harvest for Pediatric Transplant. Studies in Health Tech. and Informatics, 81, 293-297 (2001). 5. L. Machado and R. Moraes, Online Training Evaluation in VR Simulators Using Evolving Fuzzy Neural Networks, 6th FUNS Proc, 314-317 (2004) 6. P. McBeth et al., Quantitative Methodology of Evaluating Surgeon Performance in Laparoscopic Surgery. Studies in Health Tech. and Informatics, 85, 280-286 (2002). 7. R. Moraes and L. Machado, Maximum Likelihood for On-line Evaluation of Training Based on VR. GCETE'2005 Proc, 299-302 (2005). 8. R. Moraes and L. Machado. Fuzzy GMM for On-line Training Evaluation in VR Simulators. Annals of FIP'2003, 2, 733-740 (2003). 9. R. Moraes and L. Machado, GMM and Relaxation Labeling for On-line Evaluation of Training in VR Simulators. GCETE'2005 Proc. (2003). 10. J. Rosen et al., Hidden Markov Models of Minimally Invasive Surgery. Studies in Health Tech. and Informatics, 70, 279-285 (2000). 11. J. Rosen et al., Objective Laparoscopic Skills Assessments of Surgical Residents Using HMM Based on Haptic Information and Tool/Tissue Interactions. Studies in Health Tech. and Informatics, 81,417-423 (2001). 12. T. Terano et al., Fuzzy systems theory and it's applications (1987). 13. B. Tso and P. Mather, Classif Methods For Remotely Sensed Data (2001). 14. L. Zadeh, Probability Measures of Fuzzy Events. Journal of Mathematical Analisys and Applications, 10, 421-427 (1968).
ASSESSMENT OF GYNECOLOGICAL PROCEDURES IN A SIMULATOR BASED ON VIRTUAL REALITY

LILIANE DOS SANTOS MACHADO
Department of Informatics, Federal University of Paraiba, Joao Pessoa, PB, Brazil

MILANE CAROLINE DE OLIVEIRA VALDEK
Center of Biological and Health Sciences, Federal University of Campina Grande, Campina Grande, PB, Brazil

RONEI MARCOS DE MORAES
Department of Statistics, Federal University of Paraiba, Joao Pessoa, PB, Brazil

Gynecological cancer is one of the most common causes of cancer-related deaths in women. The training of new professionals depends on the observation of cases in real patients. To improve this training, a simulator based on virtual reality was developed that presents pathologies that can cause gynecological cancer. This paper presents an evaluation tool developed for this simulator to assess the student's knowledge of the situation presented and to classify his training. This on-line evaluation tool uses two fuzzy rule-based expert systems to monitor the two stages of the virtual training.
1. Introduction
Virtual Reality (VR) applied to medicine is probably one of the most promising areas in VR development. Basically, the research is related to surgery planning, assistance (augmented reality) and training. Medical training systems include the simulation of procedures in realistic environments. These simulations can be used for the practice of procedures, the evaluation of students' abilities and professional certification processes [1]. The first VR-based simulators for medical training were developed in the 90's for the interactive visualization of procedures. Since then, systems have approached the simulation of subcutaneous tumor identification, prostate examination, ocular surgery, bone marrow harvest and laparoscopic training [2]. For procedures dependent on touch, systems for internal organ examination can be found for training prostate detection [1].
Recently, models for off-line or on-line evaluation of training performed in VR systems have been proposed [5,6,7]. During the simulation, those evaluation systems supervise the user's movements and other parameters associated with them. In this paper, we extend the concept presented by [1] to a computational assessment that automatically allows us to understand which information led the user to the diagnosis and also to evaluate his final decision.

2. Basic Concepts
2.1. Fuzzy Set Theory
In (classical) set theory, each subset A of a universe X can be expressed by means of a membership function μ_A: X → {0,1}, where, for a given a ∈ X, μ_A(a) = 1 and μ_A(a) = 0 respectively express the presence and absence of a in A. A fuzzy set or fuzzy subset is used to model an ill-known quantity. A fuzzy set A on X is characterized by its membership function μ_A: X → [0,1]. The intersection and union of two fuzzy sets are performed through the use of t-norm and t-conorm operators respectively, which are commutative, associative and monotonic mappings from [0,1]×[0,1] to [0,1]. Moreover, a t-norm T (resp. t-conorm ⊥) has 1 (resp. 0) as neutral element (e.g.: T = min, ⊥ = max).

2.2. Expert Systems
Expert systems use the knowledge of an expert in a given specific domain to answer non-trivial questions about that domain. For example, an expert system for medical diagnosis uses knowledge about the characteristics of the symptoms present in a patient to recognize a disease. This knowledge also includes the "how to do" methods used by the human expert. Usually, the knowledge in an expert system is represented by rules of the form:

IF <condition> THEN <conclusion>

A simple rule for medical diagnosis could then be:

IF Temperature > 39 °C THEN the patient has fever

Most rule-based expert systems allow for the use of connectives AND or OR in the premise of a rule, and of the connective AND in the conclusion. In several cases, we do not have precise information to express the knowledge about conditions or conclusions in the rules. In those cases, it can be interesting to use a fuzzy rule-based expert system. An example of a simple fuzzy rule could then be:

IF Temperature is High THEN the patient has Fever
801 where "High" and "Fever" can be characterized by fuzzy sets. It is important to note that a crisp set can be interpreted as a fuzzy set with a particular membership function. The connectives AND and OR are implemented by a tnorm and a t-conorm, respectively. The implication operator THEN is implemented by t-norm min. This particular configuration of operator characterizes a fuzzy inference engine called Mamdani type. 2.3. The Gynecological Exam The gynecological exam is one of the most important exams to female health and allows to detect pathologies that can evolve to cervix cancer. The gynecological examination is performed in two steps. In the first stage is used an instrument named speculum to allow the visualization of the vagina walls and of the cervix to check color and surface of these structures. In the second stage there isn't visual information. The doctor uses a lubricated glove to touch the vagina walls and identify any wounds or lumps. When touch the cervix, the doctor will feel its elasticity and will check to any irregularity. In general, this kind of exam presents some difficulties. One example can be the patient's discomfort when this exam is performed and a medicine student is present. This occurs because the only way of training is by the observation and experimentation. Other difficulty is related to the students' absence of opportunity to enter in contact with all possible cases, what result in an incomplete training. 3. Simulator for Gynecological Exam Procedure The Simulator for Gynecological Exam (SITEG) allows the training a gynecological exam and simulates different phases of pathologies. At this time, the SITEG can simulate normal, HPV or Herpes and inflamed cases presented at random to the user. The two stages of a real exam were divided to compose a visual and a touch exam. In the visual exam, the user must observe the vagina walls and cervix and notice their coloring. After the visual exam, the user will have only the external view of the vagina and must perform a touch examination perceive the texture and detect if there is wounds or lumps. The haptic device is also used in the first stage to simulate a small lantern that can be positioned by the user to provide a better internal visualization of the vagina (Figure 1). Based on his experience, a doctor described the haptic properties of the vagina walls and cervix in normal, Herpes or HPV and inflamed cases (Table 1). For each pathology, he used a calibration system to touch spheres visually identical and points out the one that best described the real case. The chosen property was successively refined until the doctor could identify the one that best
described the pathology. The same happened to describe the vagina walls and cervix color of each pathology. All the properties were exported to the SITEG.
Figure 1. (left) Visual exam of vagina walls and cervix in the SITEG; the light gray object is the representation of the speculum. (right) External view of the unidigital exam in the SITEG.

Table 1. Visual and haptic properties description.

Property          | Normal                            | Herpes / HPV     | Inflamed
Color             | rosy                              | white with warts | red
Texture           | similar to buccal mucous membrane | irregular        | similar to buccal mucous membrane
Viscosity         | smooth                            | with bubbles     | smooth
Cervix elasticity | similar to an orthopedic rubber   | very soft        | hard/tense
4. Assessment of Gynecological Procedures in the SITEG
An evaluation system must supervise the user's movements and other parameters associated with them. However, the case of exam simulators is not similar to those presented in the literature for surgical procedures. It requires an approach based on the knowledge the user employed to build his diagnosis and on whether this diagnosis is right or wrong. [1] partially used this concept in a prostate exam simulator, and we extend it to a computational evaluation which makes it possible to know the relevant information that contributed to the final diagnosis and to assess that diagnosis. The evaluation system must therefore capture relevant information about the exam during its several stages and interact with the user to learn the reasons for his final diagnosis. That diagnosis must be evaluated according to the case presented by the simulation system, whose pathologies are randomly presented to the user. The assessment in the simulator for the gynecological exam follows the real format of the exam that the physician will execute in his professional life. A well-trained professional must recognize normal and abnormal cases. The evaluation system for SITEG is based on two fuzzy rule-based expert systems, each one built according to the knowledge obtained from an expert, as
presented in Figure 2. The first corresponds to the visual stage of the exam and the second corresponds to the touch stage. In the visual stage, the user must identify the cervix coloring according to a diagnosis of normality, Herpes/HPV or inflamed. Each coloring has its own fuzzy membership function: normal (rosy), Herpes or HPV (white) and inflamed (red).

Figure 2. Diagram of the Evaluation System: in each of the two stages (visual and tactile) the VR simulator produces observations and a diagnosis, which feed fuzzy rule-based expert systems 1 and 2; the intermediate results are stored in a database and combined into the final evaluation returned to the user.
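To make the Mamdani-type evaluation of Section 2.2 concrete for this setting, the sketch below shows how coloring membership functions and a single fuzzy rule could be evaluated in Python. It is only a hedged illustration: the membership function shapes, the normalized "redness" scale, the parameter values and the function names are our own assumptions, not the SITEG implementation.

# Hedged sketch (not the SITEG code): triangular membership functions for the
# cervix coloring and a Mamdani-style rule using min for AND/THEN and max for
# OR, as described in Section 2.2. All numbers are illustrative assumptions.

def tri(x, a, b, c):
    # Triangular membership function with support (a, c) and peak at b.
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical normalized "redness" of the observed coloring in [0, 1].
redness = 0.8
mu_rosy = tri(redness, 0.0, 0.25, 0.55)    # normal
mu_white = tri(redness, -0.3, 0.0, 0.30)   # Herpes/HPV (white with warts)
mu_red = tri(redness, 0.45, 1.0, 1.30)     # inflamed

# Hypothetical membership of "texture is irregular" from the touch stage.
mu_irregular = 0.4

# Rule: IF coloring is red AND texture is irregular THEN the case is inflamed.
inflamed_strength = min(mu_red, mu_irregular)   # AND and THEN: t-norm min
# Several rules concluding "inflamed" would be combined with the t-conorm max.
print(mu_rosy, mu_white, mu_red, inflamed_strength)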
At the end of the first stage the user must provide to the system his opinion about the coloring and his diagnosis. The evaluation system compares it to the information related to the case presented to the user and stores it in a database that will be used by the second fuzzy rule-based expert system in the second stage. In the second stage, the user performs the touch exam of the vagina walls and cervix and again emits an opinion according to the possible diagnoses: normal, Herpes/HPV or inflamed. This information will be stored and used by the second fuzzy rule-based expert system. At the end of both stages, the evaluation system will request from the user a final diagnosis that should be one among: D = {normal, inflamed, herpes or HPV, doubt}. In this case, the "doubt" option is pertinent because it is the combination of information gathered during a real exam that allows the doctor to define a final diagnosis. Contradictions can occur between what the doctor concludes about the patient's condition in the first and in the second stage of the exam. Internally, the evaluation system executes in real time the rules of each stage according to the case presented to the user. The evaluation system is also capable of deciding whether the training was successful by combining the two stages and the final diagnosis with the case presented by the simulator. This way, the evaluation system can classify the user into classes of performance. In this work, we use five classes of performance, according to: a) user is qualified to execute a real
procedure; b) user is almost qualified to execute a real procedure, performance is good, but it can be better; c) user needs training to be qualified, performance is regular; d) user needs more training to be qualified, performance is bad; e) user is a beginner, performance is very poor. It is important to mention that in cases of a wrong diagnosis, the evaluation system is able to detect where the user made a mistake by analyzing the degrees of pertinence of his information. All this information, including the user's class of performance, is provided to the user in the evaluation report (Figure 2). Due to all those features, this assessment methodology can be used in the continuous evaluation of training.

5. Conclusions and Future Works
In this paper we presented a new approach to the assessment of a gynecological exam procedure performed in a simulator based on virtual reality. Because that exam must be performed in two steps, this approach uses two fuzzy rule-based expert systems to assess each step in real time. It is also capable of assessing the success of the training. If the user's diagnosis is wrong, the system is able to identify the user's failures and alert him about them. As future work, we intend to add other pathologies to the simulator. We also intend to make a statistical comparison of performance between groups of users that use the simulator and groups that do not.

Acknowledgments
This work is partially supported by the process 506480/2004-6 of the Brazilian National Council for Scientific and Technological Development and by the process 01-04-1054-000 of the Brazilian Research and Projects Financing.

References
1. G. Burdea et al., IEEE Trans. on Biomedical Eng. 46(10), 1253 (1999).
2. L. Machado, PhD Thesis, USP (2003).
3. A. Crossan et al., Proc. Eurohaptics 2001, 17 (2001).
4. S. Baillie et al., Studies in Health Tech. and Informatics, 111, 33 (2005).
5. J. Rosen et al., Studies in Health Tech. and Informatics, 81, 417 (2001).
6. R. Moraes and L. Machado, LNCS, 3773, 778 (2005).
7. L. Machado and R. Moraes, Proc. FUNS, 314 (2004).
SCREAMING RACERS: COMPETITIVE AUTONOMOUS DRIVERS FOR RACING GAMES*

FRANCISCO GALLEGO, FARAON LLORENS, ROSANA SATORRE
Departamento de Ciencia de la Computación e Inteligencia Artificial, University of Alicante, Apdo. Correos 99, 03080, Alicante, Spain
Online videogames are not just entertainment products but can also be used as environments for Multi-agent Systems (MAS) where agents can reach multiple sources of knowledge from which to learn. In particular, videogames are especially well suited for learning Human-Level Artificial Intelligence, because they strengthen human-agent interaction. In this paper we present Screaming Racers, a simple car-racing online videogame designed to experiment with MAS which learn to drive racing cars along competition tracks. We also present our car-driving learning results with several Neuroevolution techniques, including Neuroevolution of Augmenting Topologies (NEAT).
1. Introduction
Videogames have evolved exponentially in the last twenty-five years. They have changed from simple bricks made of single-colour flashing pixels into extremely detailed 3D models with thousands of polygons, and from simple left-to-right computer-controlled characters into complicated BOTs¹ with adaptive personalities, capable of making their own decisions and even able to devise strategies. Present computer games bring complete virtual worlds to life with complex physical and social rules. This creates a fantastic framework within which to conduct research in various fields such as Multi-agent Systems (MAS), goal-directed behaviour, knowledge representation and reusability, machine learning, temporal reasoning, and many others [1]. Moreover, due to the fact that in videogames it is not necessary to develop complex and computationally expensive mechanisms to acquire, filter and understand the knowledge provided by the sensors, we can focus our work on building Human-Level Artificial Intelligence [2]. Therefore, online videogames are unexplored mines of knowledge to dig into and explore with our intelligent agents.

* This work is supported by the Spanish Generalitat Valenciana, project number GV05/165.
¹ The term BOT comes from robot, and refers to a piece of software which acts as a virtual robot.

In this paper we present some ideas
on how to make the most of an online videogame to face the problem of driving a race car, considered from the point of view of Human-Level Artificial Intelligence.

2. Screaming Racers
Screaming Racers (SR) is a car-racing simulation online videogame where cars are controlled by intelligent agents, which try to learn and improve their skills as drivers. The aim of the game is to create the best artificial group of drivers, which will become our personally managed motor-racing team. In order to accomplish this task, we have to train our agents using the tools provided by the game. Once we have created and trained our personal motor-racing team, we can test its effectiveness and efficiency by participating in tournaments against other teams. In order to train our agents, SR offers a set of possibilities we can explore. It lets us create our own custom-defined set of tracks, car types and complex training plans. Its AI system is designed to let us include AI algorithms as plug-ins. Finally, it also lets us select a different AI algorithm for every agent being trained, and tweak the parameters for performance improvement. By playing around with all of these possibilities we can construct specially-designed training plans with the intention of maximizing the learning rate of our driving agents. Once our agents are trained, we can save their brains to files, which lets us continue training them later, or they can be grouped together to create a team and so challenge other teams of agents or human players.

2.1. Internal design
SR has been designed as a multi-agent online videogame where players use a client application to connect to a server which runs the simulations of the races. This multi-agent design of the agent population provides several advantages:
• Agents can migrate from client to server and vice versa whenever needed, thus minimizing the problems related to communication latency.
• Agents can sense the environment using a set of defined sensors, which decouples them from the internal representation of the environment.
• Agents can communicate with each other, letting teams share knowledge in order to benefit from a collaborative approach.
In our client/server architecture (Figure 1), the main components of SR are inside the Server Kernel. These components are responsible for creating and maintaining the virtual environment and the multi-agent system, as well as communicating with clients. The description of the role of the most important
components is as follows:
Figure 1. Architecture of Screaming Racers
Physics engine: applies the laws of physics to the virtual environment. It allows the virtual physics laws to be tuned for convenience.
AI System: manages all the AI algorithms and agents of the virtual environment. It lets agents use different AI algorithms (Figure 2).
Simulator Agent: this agent is responsible for creating and maintaining the virtual environment needed for every race or training session. It manages the center of simulation where the action occurs, so it has to deal with the track, cars, time control, race statistics, environmental objects, etc.
Driver Agent: each instance of this class of agents is the implementation of the brain of one of the drivers participating in the simulation. These agents receive the information about the environment from their sensors (via the Simulator Agent) and have to respond with the desired actuator changes, i.e. accelerating by "x" m/s².

Figure 2. Class diagram of the AI System: an AI System that can select among the registered algorithms and manage the agent population, and a common algorithm interface with operations to create agents, take decisions for an agent, process the population and draw an agent, implemented by the different plugged-in algorithms.
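To make the plug-in design of the AI System more concrete, the following sketch outlines one possible shape of the algorithm interface suggested by Figure 2. The method names follow the operations shown in the class diagram and should be read as assumptions about the design, not as the actual Screaming Racers API; Python is used only for illustration.

# Hedged sketch of a pluggable AI-algorithm interface for the AI System.
# Method names are assumptions based on the class diagram in Figure 2.
from abc import ABC, abstractmethod

class AIAlgorithm(ABC):
    """Base class that every pluggable learning algorithm would implement."""

    @abstractmethod
    def create_agent(self):
        """Build and return a new driver agent (its 'brain')."""

    @abstractmethod
    def take_decision(self, agent, sensors):
        """Map the agent's sensor readings to actuator values."""

    @abstractmethod
    def process(self, population, fitness_scores):
        """Update the whole population after an evaluation episode."""

    def draw(self, agent):
        """Optionally visualize the agent (no-op by default)."""

class AISystem:
    """Keeps the registered algorithms and lets a trainer pick one per agent."""

    def __init__(self):
        self._algorithms = {}

    def register(self, name, algorithm):
        self._algorithms[name] = algorithm

    def select_algorithm(self, name):
        return self._algorithms[name]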
2.2. The environment of Screaming Racers
Here we describe how the environment in which our agents have to test their abilities has been designed. The environment of an agent i participating in SR is defined by the tuple (L, C, c_i): a totally ordered set L of vectors describing track sectors (1), a set C of cars taking part in the race, and the car c_i driven by agent i. Each car c_j has a set of parameters describing its state (2) (location, velocity and angle).

L = {l_j : |L| = n and i < j implies that sector l_i precedes sector l_j},  i, j, n ∈ N    (1)
c_j = (location_j, velocity_j, angle_j)    (2)

Each agent i obtains a value s_ij ∈ [0,1] from every sensor j, representing the distance to an obstacle (the sensor model is depicted in Figure 3). For instance, a value of s_12 = 1 means that sensor number 2 of agent 1 is detecting no obstacles (black lines), whereas a value of s_12 = 0.1 means that the same sensor is detecting an obstacle very close (white lines). Therefore, supposing we have p sensors, we have a vector s_i^t (3) for our agent i at time t, which is the input to its artificial brain.

Figure 3. Sensor model selected for our experiments

Once the agent has processed the information of its environment, it produces a vector o_i^{t+1} (4) with the values which represent the decisions of the agent for the next time step. In this case, this vector is composed of 2 real values, a_i^{t+1} and α_i^{t+1}, which represent the pressure the agent puts on the accelerator and the degree of turn applied to the steering wheel.

s_i^t = (s_ij^t),  i = 0..m, j = 0..p,  m, p ∈ N, t ∈ R    (3)
o_i^{t+1} = (a_i^{t+1}, α_i^{t+1}),  i = 0..m,  a_i^{t+1}, α_i^{t+1} ∈ [0,1],  m ∈ N, t ∈ R    (4)
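Equations (3) and (4) suggest a very small interface for a driver agent's brain: p sensor readings in [0, 1] go in, and an accelerator pressure and a steering value in [0, 1] come out. The sketch below is a hedged illustration of that mapping; the single-layer network used here is an arbitrary stand-in and not one of the Neuroevolution algorithms evaluated in Section 3.

# Hedged sketch: a driver 'brain' mapping p sensor readings in [0, 1] (Eq. 3)
# to an accelerator pressure and a steering value in [0, 1] (Eq. 4).
# The single-layer network below is an illustrative stand-in only.
import math
import random

class DriverBrain:
    def __init__(self, p, seed=0):
        rnd = random.Random(seed)
        # One weight vector per output (accelerator, steering), plus a bias.
        self.weights = [[rnd.uniform(-1.0, 1.0) for _ in range(p + 1)]
                        for _ in range(2)]

    def act(self, sensors):
        outputs = []
        for w in self.weights:
            z = w[-1] + sum(wi * si for wi, si in zip(w, sensors))
            outputs.append(1.0 / (1.0 + math.exp(-z)))  # squash into [0, 1]
        accelerator, steering = outputs
        return accelerator, steering

brain = DriverBrain(p=6)
print(brain.act([1.0, 0.9, 0.1, 0.8, 1.0, 0.7]))  # sensor 3 reads an obstacle close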
2.3. Feedback for agents
Taking into account the decisions taken by an agent i at a given time step t_k, this agent will be given a fitness score r_{t_k} ∈ R which will reward or punish its driving effectiveness, as a feedback result of its decisions. This score is calculated as r_{t_k} = Σ_l b_l, defining each b_l as follows:
• Reward for driving faster: b_0 = k_0 |v_i|
• Reward for driving aligned with the track: b_1 = k_1 · (alignment of the car with the track axis)
• Reward for arriving at the end of the current track sector: b_2 = k_2 if agent i reaches the end of the current track sector, and 0 in the other case    (5)
• Punishment for over-steering: b_3 = -k_3 |α_i^{t_k}|
• Punishment for colliding with an obstacle: b_4 = -k_4 · e(c_i), where e(c_i) = 1 if c_i collides and 0 in the other case    (6)
The k_l coefficients which appear in these definitions are those which can be tweaked by the player of SR in order to govern the agent's learning aims. Therefore, when the training session is over (t_end), the final score of each agent is calculated as the integral of r_t over the session; this integral is calculated assuming linear growth between every two consecutive t_k.

3. Experimentation
In order to experiment with SR and create agents capable of learning to drive, we have developed and tested several Neuroevolution algorithms [3, 4, 5, 6, 7]. We chose Neuroevolution as a starting point for two main reasons:
• Neuroevolution is a generic learning algorithm. It starts with no knowledge of the problem to solve and learns in a non-supervised fashion. It is not necessary to change even a single line of code to use Neuroevolution to learn in different environments.
• A relevant breakthrough has been made in this field recently with the creation of the Neuroevolution of Augmenting Topologies (NEAT) algorithm [3].
To develop our experiments, we have selected a set of different tracks and cars and carried out the experiments using a set of 220 different training plans, with varying parameters for every Neuroevolution algorithm. The overall statistic, the fitness points obtained by the agents, measures the quality of their driving. To put this in context, we have obtained fitness points for various human drivers: an average human driver obtains a fitness of around 1750 points, while an expert human driver achieves up to 2317. In our experiments with different learning algorithms, GENITOR and BME obtained a fitness mean of approximately 400 points, Schiffman Encoding got 650, NEAT without speciation arrived at 1450 and full NEAT reached 1800 points. Results obtained from the learning experience of the agents are depicted in Figure 4, which shows their driving skills. As the car moves forward it leaves a
tail which lets us infer the path taken and the approximate speed.

4. Conclusions and further work
We have discussed interesting characteristics which online games offer to AI research. We think that the complex environments which online games create nowadays, and the ones they will create in the future, are suitable both for experimenting with AI algorithms in order to improve them and for AI research in general. We have also described our recently developed game prototype, Screaming Racers, which has allowed us to generically train agents using different Neuroevolution approaches and to compare their results. The final results of the experiments show how NEAT can be useful in bringing driving agents to life. The best individuals obtained are even able to outperform human players.
Figure 4. 100-generations-trained agents (arrows added for clarifying the path followed by cars)
In future research, we will add more complex physics, more complex environments and more realistic cars. We also plan to create virtual cars which emulate radio control racing cars in order to also use SR as a research environment where we can conduct racing experiments with emulated cars, and finally try to put our trained agents in the real world.

References
1. van Lent, M.; Laird, J. E.; Buckman, J.; Hartford, J.; Houchard, S.; Steinkraus, K.; and Tedrake, R. (1999): Intelligent Agents in Computer Games. Proceedings of the National Conference on Artificial Intelligence, Orlando, FL, pp. 929-930.
2. Laird, John E.; van Lent, Michael (2000): Interactive Computer Games: Human-level AI's Killer Application. National Conference on Artificial Intelligence (AAAI), Austin, Texas.
3. Stanley, K. O.; Miikkulainen, R. (2002): Evolving neural networks through augmenting topologies. Evolutionary Computation 10, pages 99-127.
4. Whitley, D. (1989): The Genitor Algorithm and selection pressure: Why rank-based allocation of reproductive trials is best. Proceedings of the Third International Conference on Genetic Algorithms and their Applications. Massachusetts Institute of Technology, 116-121.
5. G. F. Miller, P. M. Todd, and S. U. Hedge (1989): Designing Neural Networks Using Genetic Algorithms. In J. D. Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 379-384. Morgan Kaufmann.
6. Moriarty, D. E. (1997): Symbiotic Evolution of Neural Networks in Sequential Decision Tasks. Ph.D. thesis, Department of Computer Sciences, The University of Texas at Austin. Technical Report UT-AI97-257.
7. Moriarty, D. E. and Miikkulainen, R. (1996): Efficient reinforcement learning through symbiotic evolution. Machine Learning, 22:11-32.
URBAN SIGNAL CONTROL USING INTELLIGENT AGENTS

MOHAMMAD AMIN ALIPOUR
Department of Computer Engineering, University of Kashan, Isfahan 87317-51167, Iran

SAIEED JALILI
Department of Computer Engineering, Tarbiat Modarres University, Tehran, Iran
Urban traffic jams are a daily problem in large cities, and urban traffic control systems aim to solve it. The major difficulty in urban traffic control is the great number of variables needed to represent the traffic state, such as flow, speed and density. Another problematic characteristic of urban traffic is the lack of precise relations between these variables. These problems have led researchers to develop several traffic models to describe traffic behavior and to build controls on top of them. In this paper, we present a new model of traffic flow and introduce a control for this model that helps agents to control traffic autonomously.
1. Introduction
Urban road networks serve a significant part of traffic demand. For example, in Germany 30% of some 600 billion car kilometers per annum are traveled within metropolitan road networks. Because of the high demand, many urban road facilities are frequently oversaturated and congested. Through congestion the capacity of the road infrastructure is in fact reduced, and particularly during rush hours, when the maximum capacity is most urgently needed, the performance deteriorates considerably [1]. Three approaches may be considered to solve this problem: (1) reduction of the urban population by making citizens migrate to other, less populated areas, which eventually decreases the demand for urban traffic, (2) development of infrastructures and (3) better use of the existing infrastructures. The first approach merely delegates the problem to other areas, so we discard it. The second proposal is hard to apply in some high-density areas like downtowns. The most favorable option is the third approach. One prescription for the third approach is better control of urban traffic flows in order to achieve an efficient use of urban transportation. Traffic control management is generally subdivided into two different classes [2]: (1) direct control measures using traffic lights and variable message signs and (2) indirect control measures like recommendations for the drivers by
means of VDS (variable direction signs and text panels), warning messages (via broadcast, RDS/TMC or handy-based services), pre-trip information (e.g. via Internet) and individual driver information systems. In the subsequent sections we advocate a new traffic light control method for an urban network; it is described in Section 2, and in subsections 2.1 and 2.2 the underlying formulations and the control algorithm are presented. The proposed method was simulated and the results are presented in Section 3.

2. Traffic Control via Resource Scarcity Measurement
In fact, traffic control is the problem of fairly allocating scarce resources (i.e. streets) among users (i.e. drivers), which is what economics tries to do. But what is the meaning of "fair" in an economy? In the economic view, every good in an economy has a real price (showing the scarcity of the good) and each customer of the good has a valuation for it (the essentiality of its use). The good must be allocated to the user with the highest valuation. In economics, and more specifically in market science, several types of mechanisms (e.g. several auction types) and theoretical foundations have been provided for this allocation. Many other allocation problems in engineering environments have used the idea of building a market in order to achieve optimal resource allocation [3,4]. But can urban traffic make use of market initiatives?

2.1. Resource Scarcity Measure
All efforts in economics revolve around two questions: 'how scarce is a resource?' and 'where is this scarcity going to go?', i.e. the trend of its scarcity. Although no market mechanisms have been proposed for urban traffic, if a comparable criterion for resource scarcity can be presented, control can operate based on it. Let us call this criterion 'price'. The price of a resource should express the state of scarcity of the resource. In urban traffic, two parameters show the dynamism of the scarcity of a resource, i.e., a street. These parameters are the 'load of street_j (ls)' and the 'normalized mean speed of street_j (nss)', in which
ls_j = (current number of cars inside street_j) / (capacity of street_j)
nss_j = (average velocity of all cars in street_j) / (desired maximum velocity in street_j)

We developed function (1) for the price of street_j:

price_j = 1000 · log((ls_j + ε) / (nss_j + ε))    (1)

in which 0 < ε ≪ 1.
Eq. (1) forces the price to rise sharply as we approach high ls and low nss values. This gives a clear indication of the "dangerous zone" to avoid, while at the same time encouraging a relatively high utilization. Fig. 1 helps to inspect the correctness of Eq. (1). All three cases in Fig. 1 have equivalent loads but differ in location. In Fig. 1a the bulk of cars is at the beginning of the street. Obviously, this bulk is a bottleneck for cars arriving at the street, and an increase of velocity will remove the bottleneck. In this case the price given by Eq. (1) will decrease if the velocity of the bulk is increasing; otherwise, it will remain unchanged. In Fig. 1b the bulk of cars is neither a bottleneck for incoming cars nor prone to delay at the next intersection. If the bulk velocity is decreasing the price will increase, and if it is increasing the price will decrease. In Fig. 1c the bulk of cars is at the end of the street and will wait until permission to use the intersection is granted. If the bulk is stopped, the price will increase sharply. The proposed Eq. (1) thus seems to be a good criterion for the scarcity of the resource and for its trend.
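A direct implementation of the price function of Eq. (1) is straightforward; the sketch below is only illustrative, since the value of ε and the base of the logarithm are not specified in the text and are therefore assumptions here.

# Hedged sketch of the street price of Eq. (1); epsilon and the logarithm base
# are assumptions (the text only requires 0 < epsilon << 1).
import math

EPSILON = 1e-3

def street_price(num_cars, capacity, mean_speed, max_speed):
    ls = num_cars / capacity        # load of the street
    nss = mean_speed / max_speed    # normalized mean speed of the street
    return 1000.0 * math.log10((ls + EPSILON) / (nss + EPSILON))

# A nearly full, nearly stopped street is far "scarcer" than a fluid one.
print(street_price(45, 50, 1.0, 14.0))   # congested: high price
print(street_price(10, 50, 12.0, 14.0))  # fluid: low (negative) price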
2.2. Greedy Control with Scarcity Measure (GCSM)
Having the price function, the question is how to use this scarcity measure in order to control the traffic. In urban traffic, the control is applied at the intersection level. Traffic control decisions make changes in the state of the environment (the traffic flows connected to the intersection). The changes can be divided into:
> Bettering changes
> Worsening changes
The state change of the environment is defined in Eq. (2) and the changes are categorized by relation (3):

StateChange = Min{|Δprice_open streets|} - Max{ Max{|Δprice_open streets|}, Max{|Δprice_downstream streets|} }    (2)

if StateChange < 0  →  Bettering change
if StateChange > 0  →  Worsening change    (3)
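The sketch below illustrates how an intersection agent could apply the bettering/worsening classification in a greedy way. The aggregation mirrors the reconstruction of Eq. (2) above and should be read as an approximation; the green-time bounds follow the restrictions described later in this subsection, and all names and numeric values are assumptions rather than the authors' implementation.

# Hedged sketch of a greedy GCSM-style decision at an intersection.
# The aggregation mirrors the reconstruction of Eq. (2); names, thresholds
# and green-time bounds are illustrative assumptions.

MIN_GREEN, MAX_GREEN = 10, 60   # seconds

def state_change(delta_prices_open, delta_prices_downstream):
    # Negative result -> bettering change; positive -> worsening change (Eq. 3).
    best_relief = min(abs(d) for d in delta_prices_open)
    worst_rise = max(max(abs(d) for d in delta_prices_open),
                     max(abs(d) for d in delta_prices_downstream))
    return best_relief - worst_rise

def extend_green(current_green, delta_open, delta_downstream):
    # Greedy rule: extend the current green phase only if doing so is not a
    # worsening change, while respecting the [min, max] green-time bounds.
    if current_green >= MAX_GREEN:
        return False
    if current_green < MIN_GREEN:
        return True
    return state_change(delta_open, delta_downstream) < 0

print(extend_green(25, [-120.0, -40.0], [15.0, 30.0]))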
Fig. 1. Resource scarcity behavior: the bulk of cars at the beginning (a), in the middle (b) and at the end (c) of the street; each panel marks the head and tail of the street and indicates whether the bulk bothers the next incoming cars and whether it is prone to delay at the end of the street.
The intersection control should also obey the environment control restrictions, which are as follows:
1- The street lights change to green in a pre-defined order.
2- The green light length of a street must be in the interval [min green time, max green time].
The GCSM control algorithm prohibits the control agents from making worsening changes. Here a question may be raised: "How will the changes be measured when the price is at its maximum value (i.e. ls = 1 and nss = 0)?" In this case, whenever a street reaches the maximum price, its last biggest change a few minutes earlier will be considered as a worsening change.

3. Results
GCSM has been compared with pre-timed intersection control (PIC), in which the phases always have the same green time length. The controls have been applied in an urban traffic micro-simulator based on [5]. Fig. 2a and Fig. 3a show the simulated urban network topologies and, respectively, the diagrams in Fig. 2b and Fig. 3b show the average delay per car for both control strategies, GCSM and PIC. In
PIC two timings have been considered, t = 50 sec. and t = 120 sec. Inspecting the results shows that GCSM is superior to PIC with both timings.
Fig. 2. (a) Evaluated topology; (b) evaluation results: average delay per car (sec) for PIC (t=50), PIC (t=120) and GCSM.

Fig. 3. (a) Evaluated topology; (b) evaluation results: average delay per car for PIC (t=50), PIC (t=120) and GCSM.
4. Conclusion
In this paper, a control algorithm for urban traffic was presented. Our GCSM algorithm is similar to control methods based on feedback; the feedback in the context of GCSM is a quantized value of resource scarcity. GCSM knows that its actions make changes in its environment, hence it classifies changes into bettering changes and worsening changes. It assumes that prohibiting worsening changes is itself a bettering change, and it always tries to make bettering changes. The algorithm used in GCSM is therefore somewhat greedy. Like other greedy algorithms, it is time efficient, but it may be unable to reach the optimal control if it gets trapped in local price fluctuations. Another feature of GCSM is the reduction of the decision-making parameters to one parameter (the price). In computer networks a variation of this method has been used [6], but it has not been formally proved in which environments control can be achieved by quantizing resource scarcity.
References
1. B. Friedrich, Adaptive Signal Control - An Overview. Proc. of the 9th Meeting of the Euro Working Group Transportation, Italy (2002).
2. H. Kirschfink, M. Boero, J. Hernandez, Intelligent Traffic Management Models. 7th World Congress on Intelligent Transport Systems, Turin (2000).
3. D. Helbing, Modeling supply networks and business cycles as unstable transport phenomena. New Journal of Physics (2003).
4. J. Feigenbaum, C. H. Papadimitriou, S. Shenker, Sharing the Cost of Multicast Transmissions. Journal of Computer and System Sciences (2000).
5. P. Hidas, Modeling lane changing and merging in microscopic traffic simulation. Transportation Research Part C (2002).
6. L. Yamamoto and G. Leduc, Resource Trading Agents for Adaptive Active Network Applications. Network and Information Systems Journal (2000).
CONSIDERATIONS ON UNCERTAIN SPATIO-TEMPORAL REASONING IN SMART HOME SYSTEMS
JUN LIU, JUAN C. A U G U S T O AND HUI WANG School of Computing and Mathematics University of Ulster at Jordanstown Newtownabbey BT37 OQB, Co. Antrim, UK E-mail: [j.liu, jc.augusto, h.wang]@ulster.ac.uk
Smart Homes are the subject of intensive research exploring the different ways in which they can be used to provide home-based preventive and assistive technology to patients and other vulnerable sectors of the population, such as the elderly and frail. Current studies of Smart Homes focus on the technological side (e.g., sensors and networks), but little effort has been devoted to what we consider a key aspect of this kind of system, namely its capability to intelligently monitor situations of interest and to advise or act in the best interest of the home occupants. This paper investigates the importance of spatio-temporal reasoning and uncertainty reasoning in the design of Smart Homes. Accordingly, a framework is outlined in which a methodology referred to as Rule-base Inference Methodology using the Evidential Reasoning (RIMER) is applied in conjunction with a Smart Home framework that considers the spatio-temporal aspects of monitoring human activities.
1. Introduction
Smart Homes aim to help people at risk in their living place by preventing hazards and by assisting them as much as possible when they need services from the health system [1]. Although technology has made significant advances in developing sensors and networks that allow the monitoring of different environments, progress on how to take full advantage of these technologies has been slow. Spatio-temporal reasoning and uncertainty handling underlie the working of a desirable Smart Home system, and their confluence gives rise to the concept of uncertain spatio-temporal reasoning. This paper shows how this confluence can improve the ways in which Smart Homes can be applied to improve health care. Then a framework is provided showing how a methodology referred to as Rule-base Inference Methodology using the Evidential Reasoning (RIMER) can be combined with an active database framework [2] considering the spatio-temporal aspects of monitoring human activities. Although the scenarios considered in this article are mainly related to increasing the independent living of the elderly, the concepts and methodologies developed here can be applied in many ways to other aspects of health-related issues. Due to space limitations, the framework is only outlined in general terms; details can be found in another paper [3].

2. An Investigation into the Rule-Based Design of Smart Home Systems
2.1. Smart Homes
A Smart Home [1] can be briefly described as a house that is supplemented with technology in order to increase the range of services provided to its inhabitants by reacting in an intelligent way. The technology can be diverse but will generally have two main components: a set of sensors and a networking layer linking those sensors with some computing facilities. Typical sensors are carbon monoxide (or heat) sensors, motion sensors used for burglary alarms and sensors detecting if a window or a door has been opened. All these sensors send out signals and some of them can also receive signals so that, for example, the cooker can be turned off automatically. An obvious way to turn off a device would be with a timer, but this is usually a very rigid mechanism. A more useful and flexible use of the device demands the intelligent analysis of several factors in order to decide whether turning off the cooker is meaningful in a given context.
2.2. ECA rules and uncertainty
Dynamic systems like Smart Homes can be modelled by considering the occurrence of meaningful events and the contexts in which those events occur. Then, based on the detection of situations of interest defined by events occurring in particular contexts, decisions can be taken. Active databases can be used to store information gathered from a Smart Home. A characteristic feature of Active Databases is their use of so-called Event-Condition-Action (ECA) rules as a way to react to the incoming information. ECA rules have a syntax of the following format:

ON <Event> IF <Condition> DO <Action>
This means that whenever an occurrence of the event described in the ON clause is detected, if the condition described in the IF clause (usually imposing constraints on different aspects of the events described in the ON clause) is true, the action described in the DO clause is obeyed by the system. When the ON clause is satisfied the rule is said to be 'triggered', and if, additionally, the IF clause is satisfied then the rule is 'fired'. Here we focus on their use with respect to Smart Homes: specifically, the monitoring of activities carried out by patients, the diagnosis of situations of interest from a health and safety perspective, and the recommendation of actions to follow on behalf of the caring environment for the patients living in those homes. For example, we can monitor the events that are triggered inside a house with the aim of enhancing safety by reacting as quickly as possible to hazards. Below we provide an informal description of an ECA rule as a first approximation of the idea. This rule is triggered when the smoke alarm in the kitchen is activated, and under the condition that the person is known to be at home (this conclusion is the consequence of another rule or set of rules which can have as a resulting action 'set status variable patient_at_home=true'), the action recommended is to initiate a fire hazard procedure that explicitly takes into account the assumption that there are people at risk and will, for example, involve contacting a nearby hospital to request an ambulance.

ON kitchen smoke alarm triggered FOLLOWEDBY
   ((does not go out in a 'short' time) OR
    (does not call security to state it is not dangerous))
IF patient is known to be at home
DO call fire brigade AND initiate procedure to rescue patient

Event descriptions can be provided in very different shapes, depending on the type of events which are being detected. Actions are very much dependent on the application, and proposals for their languages are far less prescriptive. Several languages have been proposed for ECA rule definitions and ours is along the lines of that proposed by Augusto et al. [2]. In our framework we are passing from an ECA-like database system enriched with temporal reasoning capabilities to an 'IF-THEN'-like knowledge-based system (KBS). The diagnosis mechanisms achieved with the ECA-based approach can be connected with reasoning systems under uncertainty, which bring different advantages to the diagnostic system. Events and conditions of the ECA rules are subsumed in the 'If' part of
rules in the KBS and actions of the ECA rules are passed to the 'Then' part of rules in the KBS, e.g., the ECA rule given above will look like:

IF kitchen smoke alarm triggered and
   ((does not go out in a 'short' time) OR
    (does not call security to state it is not dangerous)) and
   patient is known to be at home
THEN call fire brigade AND initiate procedure to rescue patient

The mechanisms that were previously located in the ON clause of the ECA rules will now be in the IF part of the rules of the KB. Events or conditions involved in the IF part of rules may not be of the same type; for example, they could be quantitative or qualitative in nature. It is possible that some events or conditions can be measured numerically (e.g., age and medicine intakes) and others can only be described subjectively (e.g., how often the house occupant prepares food). Smart Home systems, which rely on data gathered by sensors, have to deal with the storage, retrieval, and processing of ambiguous and uncertain data. Although ECA rules allow us to reason more neatly in terms of the relation between the occurrence of events, their context of occurrence and the actions that should follow as a response to them, there is a practical need to complement them with the representation of other important aspects of knowledge. We aim at complementing the ECA-based framework with the possibility to incorporate uncertainty into the rule execution of a rule-based system derived from the original ECA-based system. Sources of uncertainty in ECA rules include:
a) Uncertain events. Sometimes, an occurrence of the event described in the ON clause can only be detected with some uncertainty. An uncertain event can be "It is most likely that the patient has fallen asleep" or "The patient is in the kitchen with 80% certainty".
b) Uncertain conditions. Uncertain conditions might include uncertain queries like "a sensor can be considered activated (with 'high' confidence)".
c) Uncertain relationships between the event/condition and the actions. In the design and implementation of rule-based systems, uncertainty may be caused by a weak implication, which may occur when an expert is unable to establish a precise correlation between the event/condition and the action except by using degrees of belief or credibility. One such situation may lead
to the specification of a rule expressing that if some events are detected in a context suggesting an elderly patient is active, and they are followed by other events suggesting a sudden suspension of activities, then there is a significant chance that the patient may be in a compromised situation (e.g., has fallen or fainted). In later sections we will structure these kinds of situations in a way similar to the following sketch of a rule:

IF at_kitchen_on with 'high' confidence
   Followed_by tdRK_on with 'medium' confidence
   Followed_by no_movement_detected for 10 units of time
THEN assume with 80% confidence that the patient has fainted

In addition, different kinds of uncertainty may coexist in real Smart Home systems, e.g., fuzzy information may coexist with uncertain information, leading to the inference of knowledge without certainty but only with degrees of belief or credibility regarding a hypothesis.

2.3. Time dependent rules
Monitoring activities in a Smart Home is a time dependent activity in the sense that being able to represent and reason about the order in which activities develop and their duration is essential for a correct diagnosis of the situations. Instantaneous events are associated with points in time. This in turn has an effect on how rules are triggered. Actions can also have a time attached. The time of the action is always the time when the rule advising a particular course of action is fired. We represent in our IF-THEN rules the logical operator "ANDlater", which differs from the classical AND in that a ANDlater b is true if the time of detecting a precedes that of detecting b. The operator "ANDsim" is such that a ANDsim b is true if the times of detecting a and b are the same. RIMER offers support to represent several important notions of uncertainty and incompleteness, but there are no specific primitives for temporal reasoning. Hence we extend the RIMER framework with a temporal dimension which will combine the best of both approaches (RIMER and spatio-temporal reasoning [3]).
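A minimal sketch of how the temporal connectives ANDlater and ANDsim could be evaluated over timestamped event detections is given below; the pair-based event representation is an assumption made only for illustration and is not the paper's formalism.

# Hedged sketch: evaluating ANDlater / ANDsim over timestamped detections.
# A detection is represented here as (truth_value, detection_time); this
# representation is an illustrative assumption.

def and_later(a, b):
    # a ANDlater b: both hold and the detection of a precedes that of b.
    (va, ta), (vb, tb) = a, b
    return va and vb and ta < tb

def and_sim(a, b):
    # a ANDsim b: both hold and are detected at the same time.
    (va, ta), (vb, tb) = a, b
    return va and vb and ta == tb

at_kitchen_on = (True, 100)   # seconds since the start of monitoring
no_movement = (True, 160)
print(and_later(at_kitchen_on, no_movement))  # True
print(and_sim(at_kitchen_on, no_movement))    # False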
2.4. Conflict resolution - Aggregation scheme
Depending on incoming sensor-related events, the system evaluates all the ECA-rules to identify which Event parts match the actual situation. These selected rules may conflict with each other if the Event parts of more than
one rule are matched simultaneously. How to resolve the conflict is a crucial issue in a rule-based inference formalism, especially when uncertainty is involved. Within the RIMER framework [5], rule aggregation using an evidential reasoning approach resolves the conflict and yields the aggregated conclusion. In addition, the input for an antecedent attribute may not be available or may be only partially known. In the inference process, such incompleteness should be considered because it is related to the strength of a conclusion.

3. RIMER as a system to design Smart Homes
In the above sections, we have presented a general Smart Home environment and explained how diagnosis in such cases is based on spatio-temporal considerations. Here, using the RIMER framework, the rule-based Smart Home system for supporting decision making is outlined. The general architecture of the system is illustrated in Figure 1.
Figure 1. General architecture: the sensors and the experts (e.g., caring personnel such as nurses) provide the available data and the facts; the belief rule base feeds the RIMER engine, which performs time-related rule matching, activation weight determination and rule combination by ER, and produces the assessments and the corresponding actions.
The rule base is defined and generated by the experts based on the activity database. Depending on the incoming events captured by the sensors and the experts (e.g., the caring personnel in a smart home housing elderly people), the rules-matching component (including the time-related ordering
matching and the activation weight determination) then searches through combinations of facts to find those combinations that satisfy the antecedents of the rules and selects the rules that should be fired. The time-related ordering matching is responsible for using the time ordering strategy to decide which of the rules, out of all those that apply, have the highest priority and should be fired first, and which rules cannot be used. The activation weight determination is used to calculate the matching degree of the facts to the IF part of the rules. The selected rules may conflict with each other if the IF parts of more than one rule are matched simultaneously. Then the rule combination scheme based on the Evidential Reasoning (ER) algorithm [4] is applied to get the final aggregated assessment, which resolves the rule conflicts. The database will be updated based on the new assessment and fed back into the rule base and the new situation. The concept of a belief rule base and its associated inference methodology were proposed in [5] as a formalism based on the ER approach [4]. In a belief rule base, each possible consequent of a rule is associated with a belief degree. Take for example the following informal description of a belief rule for the specification of a smart home:

Rk: IF at_kitchen_on with 'high' confidence
    ANDlater tdRK_on with 'low' confidence
    ANDlater no_movement_detected with 'high' confidence
    THEN the estimation of the confidence that the patient has fainted is
         {(H, 0); (M, 0.4); (L, 0.6); (N, 0)}
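The following is a hedged sketch of how such a belief rule could be encoded and how an incompletely known antecedent could leave part of the output belief unassigned; it is a deliberately simplified illustration, not the ER algorithm used by RIMER, and the linguistic terms and belief degrees appearing in the rule are explained next.

# Hedged sketch: encoding a belief rule and propagating incompleteness.
# This is a simplified illustration, not the RIMER/ER combination algorithm.

rule_Rk = {
    "antecedents": ["at_kitchen_on is high",
                    "tdRK_on is low",
                    "no_movement_detected is high"],
    # Belief distribution over {H, M, L, N} for "the patient has fainted".
    "consequent_belief": {"H": 0.0, "M": 0.4, "L": 0.6, "N": 0.0},
}

def activate(rule, antecedent_matching_degrees):
    # Scale the consequent beliefs by a simple activation weight (here the
    # product of the matching degrees); belief that is not assigned to any
    # grade reappears as 'unknown' in the output.
    weight = 1.0
    for degree in antecedent_matching_degrees:
        weight *= degree
    out = {grade: weight * b for grade, b in rule["consequent_belief"].items()}
    out["unknown"] = 1.0 - sum(out.values())
    return out

# The third input is only partially known (total belief 0.7, cf. Section 3).
print(activate(rule_Rk, [1.0, 1.0, 0.7]))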
The linguistic terms {high (H), medium (M), low (L), none (N)} are used as the referential values for the attributes "at_kitchen_on" and "tdRK_on", and also for "the patient has fainted". Here "no_movement_detected" is the consequent of another IF-THEN rule in the rule base, which reaches that conclusion as a consequence of analyzing other sensors, e.g. in the reception area, for some amount of time, e.g. 10 minutes. And {(H, 0); (M, 0.4); (L, 0.6); (N, 0)} is a belief distribution representation of the patient's health status (e.g. fainted), indicating that we are 40% sure that the level of confidence that the patient has fainted is medium, and 60% sure that it is low. Space constraints do not allow us to give a full account of the steps. Instead we just provide an outcome based on the following assumption: let us assume that some of the main events in the antecedent of our IF-THEN rule are not fully known. For example, we know "at_kitchen_on" with high confidence and "tdRK_on" with high confidence, but we only have partial evidence
that after some time units the person is not moving, i.e., we are not 100% sure. We can then assume the belief distribution is {(H, 0.7); (M, 0); (L, 0); (N, 0)}. This could be because of a sensor fault, a human being's inability to provide precise judgments, or information not being transmitted properly over the network from the Smart Home to the computing centre. That means the information is incomplete. If we apply our methodology, then the conclusion from the system will be: (high, 0.59); (medium, 0.13); (low, 0.01); (nothing, 0); (unknown, 0.27), where "unknown" in the above result means that the output is also incomplete due to the incomplete input. Hence, both complete and incomplete inference can be accommodated in a unified manner within the proposed RIMER-based Smart Home framework.

4. Conclusions
This article shows the importance of the combination of spatio-temporal and uncertainty reasoning for designing Smart Homes based on the belief rule-based system RIMER. The combination led us to extend it by adding uncertainty handling capabilities to spatio-temporal ECA rules. For a more detailed technical explanation, please refer to another paper [3]. Much remains to be done, but this is a first attempt to bring to the attention of future developers the importance of these concepts and the need to provide systems which are built on solid theoretical foundations.

References
1. S. Giroux, H. Pigot (Eds.), Proc. of Int. Conf. on Smart Homes and Health Telematics, IOS Press, 2005.
2. J. Augusto, C. Nugent, The use of temporal reasoning and management of complex events in smart homes, in: R. L. de Mantaras, L. Saitta (Eds.), Proc. of European Conf. on AI (ECAI 2004), IOS Press (Amsterdam, The Netherlands), 2004, pp. 778-782, August 22-27.
3. J. Augusto, J. Liu, H. Wang, and J. B. Yang, Management of uncertainty and spatio-temporal aspects for monitoring and diagnosis in a Smart Home, Technical Report, University of Ulster, 2005 (see http://www.infj.ulst.ac.uk/ jcaug/uandstr4sh.pdf).
4. J. Yang, D. Xu, On the evidential reasoning algorithm for multiple attribute decision analysis under uncertainty, IEEE Trans. on Sys., Man, and Cyb. (Part A: Systems and Humans), 32 (3) (2002) 289-304.
5. J. Yang, J. Liu, J. Wang, H. Sii, H. Wang, A generic rule-base inference methodology using the evidential reasoning approach - RIMER, IEEE Trans. on Sys., Man, and Cyb. (Part A: Systems and Humans), 36 (2) (2006) 266-285.
NEURO-FUZZY MODELING FOR FAULT DIAGNOSIS IN ROTATING MACHINERY

ENRICO ZIO
Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, 20133 Milano, Italy

GIULIO GOLA
Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, 20133 Milano, Italy, giulio.gola@polimi.it

Malfunctions in machinery are often sources of reduced productivity and increased maintenance costs in various industrial applications. For this reason, machine condition monitoring has been developed to recognize incipient fault states. In this paper, the fault diagnostic problem is tackled within a neuro-fuzzy approach to pattern classification. Besides the primary purpose of a high rate of correct classification, the proposed neuro-fuzzy approach also aims at obtaining a transparent classification model. To this aim, appropriate coverage and distinguishability constraints on the fuzzy input partitioning interface are used to achieve the physical interpretability of the membership functions and of the associated inference rules. The approach is applied to a case of motor bearing fault classification.
1. Introduction
Monitoring of rotating machine systems can be highly effective in minimizing maintenance downtime by providing advance warning and lead time to prepare the appropriate corrective actions upon an adequate fault diagnosis. This paper addresses this problem by means of a Neuro-Fuzzy modeling technique which combines Neural Networks (NNs) [1] and Fuzzy Logic (FL) [2] to exploit the advantages of both, namely the simple learning procedures and computational power of the former and the high-level, human-like thinking and reasoning of the latter. This kind of neuro-fuzzy method [3-7] has been proven to be a powerful framework for tackling practical classification problems. The models developed within this framework are both accurate and transparent, i.e. based on a low number of physically readable rules in the Fuzzy Knowledge Base (FKB). This is achieved by imposing some semantic properties on the Membership Functions (MFs) of the fuzzy input partitioning interface [8].
The paper is organized as follows. Section 2 illustrates the MFs' properties introduced to achieve a transparent model [8]. Section 3 presents the basic steps of the neuro-fuzzy algorithm [9]. Section 4 reports the results obtained from the application of the proposed modeling technique to the classification of motor bearing faults [10]. Some conclusions are drawn in the last Section of the paper.

2. Optimal input partitioning interface for a transparent model
The intelligibility of a neuro-fuzzy model can be enforced by means of semantic constraints which guarantee the physical transparency of the input space partitioning, i.e. of the input variables' MFs, so as to obtain an "optimal" input interface [8]. The following semantic properties of the MFs define an "optimal" input partitioning interface [8]:
1. Moderate number of MFs. The number of MFs must be kept low: a high number of MFs generally increases the precision of the model, but it also causes a loss of system intelligibility.
2. Normality. The partitioning MFs should be normal [2]: this requirement is motivated by the fact that a MF represents a linguistic term of specified semantic meaning; for each MF in the fuzzy partition of the Universe Of Discourse (UOD) U_X of the input variable X, at least one of the numerical values x ∈ U_X should exhibit full semantic matching with that MF.
3. Coverage. The MFs should cover the entire U_X in order to provide a linguistic representation of the whole range of possible values x of X. Generally, a strong coverage is imposed by setting a minimum coverage level ε > 0, so that for each value x of X at least one of the n_X MFs which make up the fuzzy partition of U_X is such that μ_l(x) > ε.
4. Distinguishability. This property relates to the need for physical interpretability. Each linguistic term must be associated with a clear semantic meaning, and therefore the corresponding MF must not excessively overlap with the others.
While the number of normal MFs can be directly implemented in the model, the coverage and distinguishability of the UOD must be enforced by some semantic constraints within the model construction phase itself.

3. The neuro-fuzzy modeling approach to fault classification
The neuro-fuzzy approach proposed is sketched in Figure 1.
[Figure 1 here: block diagram with TRAINING SET, FORWARD COMPUTING, PARAMETERS OPTIMIZATION, "NEW RULE CREATED?" decision and POSSIBILISTIC CLASSIFICATION MODEL blocks.]
Figure 1. Sketch of the neuro-fuzzy algorithm
3.1. The initial fuzzy knowledge base
In this work, an empirical procedure for the construction of the initial FKB is implemented on the basis of the available training data set [9,10]. The UOD of each of the n_i input variables is a priori subdivided into n_X = 3 linguistic terms X_l, l = 1, 2, 3, bearing the semantic meaning of "small", "medium" and "big". These linguistic terms are quantitatively expressed in terms of a corresponding number of Fuzzy Sets (FSs) with quasi-Gaussian, bell-shaped MFs μ_X(x) [11]. The centers of the MFs associated to each FS are positioned at the lower extreme, at the center and at the higher extreme of the normalized UOD of the input variable, for the "small", "medium" and "big" linguistic terms, respectively, whereas the spreads are taken equal for all MFs and set so as to provide a minimal coverage ε of the UOD (Fig. 2).
Figure 2. Initial normalized UOD partitioning of an input variable with minimum coverage ε = 0.1
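As a side illustration of how such a partition can be set up, the following minimal sketch (not from the paper; the Gaussian analytical form, the normalized [0,1] UOD and all names are assumptions) builds the three quasi-Gaussian MFs centered at the extremes and at the center of the UOD and picks the common spread that yields a prescribed minimum coverage ε:

```python
import numpy as np

def gaussian_mf(x, center, sigma):
    """Quasi-Gaussian membership function (assumed analytical form)."""
    return np.exp(-(x - center) ** 2 / (2.0 * sigma ** 2))

def initial_partition(eps=0.1):
    """Build the a-priori 'small'/'medium'/'big' partition of a normalized UOD.

    The common spread is chosen so that at the worst-covered points (midway
    between adjacent centers) the best-matching MF still reaches the level eps.
    """
    centers = {"small": 0.0, "medium": 0.5, "big": 1.0}
    half_gap = 0.25  # half the distance between adjacent centers
    sigma = half_gap / np.sqrt(2.0 * np.log(1.0 / eps))
    return centers, sigma

if __name__ == "__main__":
    centers, sigma = initial_partition(eps=0.1)
    xs = np.linspace(0.0, 1.0, 101)
    coverage = np.max([gaussian_mf(xs, c, sigma) for c in centers.values()], axis=0)
    print("spread sigma = %.4f" % sigma)
    print("minimum coverage over the UOD = %.3f" % coverage.min())  # ~0.1
```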
This a priori fuzzy partitioning interface generates 3^n_i fuzzy relations among the n_i input variables which can in principle be taken as the antecedents of an equivalent number of corresponding fuzzy rules in the FKB. To select the fuzzy rules which will constitute the initial FKB, the "firing strength" with which each of the 3^n_i fuzzy relations is activated by each crisp training input pattern is firstly computed. Considering a generic rule r fed by a generic input pattern x = (x_1, x_2, ..., x_{n_i}), the rule strength s_r(x) is taken as the minimum of the rule antecedent membership values μ_{X_q^l}(x_q), q = 1, ..., n_i, where X_q^l denotes the l-th FS, l = 1, 2, 3 (small, medium, big), characterizing the q-th antecedent variable X_q of the r-th fuzzy rule and x_q is the value of the q-th component of the input pattern x considered. With respect to the problem of classifying patterns into C classes, the cumulative strength of the r-th fuzzy rule for the j-th class, S_rj, r = 1, ..., 3^n_i, j = 1, ..., C, is then computed by adding the firing strengths s_r(x_k) of the n_j training patterns of class j:

S_rj = Σ_{k=1}^{n_j} s_r(x_k)    (1)
Finally, the fuzzy relation with the highest S_rj is retained as the antecedent part of the only fuzzy rule to be kept in the FKB for the j-th class. By doing so, only C rules, one for each class, are kept in the FKB, instead of 3^n_i, to summarize, in first approximation, the input space relational characteristics with respect to the different classes. The corresponding consequent part of the rule is achieved by a possibilistic approach which provides the degree of membership of a pattern to each class [12]. In this respect, the consequent part of rule r is a vector O_r = {o_rj}, j = 1, ..., C, with o_rj ∈ [0,1], given by:

o_rj = Σ_{k=1, x_k ∈ class j}^{n_j} s_r(x_k) / Σ_{k=1}^{n} s_r(x_k)    (2)

where n is the total number of training patterns of any class. Summarizing, the form of the r-th fuzzy rule is:

If X_1 is X_1^r and X_2 is X_2^r ... and X_{n_i} is X_{n_i}^r, then O_1 is o_r1 and O_2 is o_r2 and ... O_C is o_rC,
where X_q, q = 1, 2, ..., n_i, and O_j, j = 1, 2, ..., C, are the input and output variables, respectively.
3.2. The forward algorithm for computing the model output
Given an input vector x of n_i measured quantities, its possibilistic membership grade to the j-th class can be inferred as follows from all the p(h) rules in the FKB at the h-th step of the iterative training procedure [12]:

o_j^{p(h)}(x) = Σ_{r=1}^{p(h)} o_rj s_r(x) / Σ_{r=1}^{p(h)} s_r(x)    (3)
where o_rj is the consequent of the r-th rule for the j-th class given by Eq. (2) and s_r(x) ∈ [0,1] is the firing strength of the r-th rule.
3.3. The rule creation module
At the beginning of the training phase, the model initial FKB contains p(0) = C rules, one for each class. During the training, if the incoming pattern x is such as to call for a new rule to be added to the current FKB [5,6,13], a new MF is introduced in U_{X_q} to function as the antecedent part of the new rule for the input variable X_q only if the maximum membership of the component x_q of x to the FSs X_q^l, l = 1, ..., n_q, of the current partition of U_{X_q} is lower than a predefined threshold. If this condition is not satisfied, the antecedent of the new rule for the input variable X_q is given by the FS in U_{X_q} bearing the highest MF value at x_q. This criterion for rule creation bears the advantage of being directly connected with the membership values and of creating a new MF only when necessary, thus controlling the number of MFs and keeping the model transparent.
3.4. The constrained optimization for parameter tuning
The parameter tuning phase is the last step of an iteration of the training procedure. The centers and spreads of the input bell-shaped MFs and the output membership grades of the rules currently in the FKB are tuned by a backpropagation algorithm [14] which exploits the penalty function method [8].
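Before detailing the error function used in this tuning phase, the following minimal sketch pulls together the initial FKB construction of Eqs. (1)-(2) and the possibilistic forward inference of Eq. (3), as reconstructed above; the Gaussian membership functions, the toy data and all names are illustrative assumptions rather than the authors' implementation:

```python
import itertools
import numpy as np

def rule_strength(x, antecedents, mfs):
    """s_r(x): minimum of the antecedent membership values."""
    return min(mfs[q][label](x[q]) for q, label in enumerate(antecedents))

def build_initial_fkb(X, y, mfs, n_classes):
    """One rule per class: antecedent with highest cumulative strength (Eq. (1)),
    possibilistic consequents as in Eq. (2)."""
    labels = list(mfs[0].keys())
    candidates = list(itertools.product(labels, repeat=len(mfs)))  # 3**n_i relations
    fkb = []
    for j in range(n_classes):
        S = {r: sum(rule_strength(x, r, mfs) for x, c in zip(X, y) if c == j)
             for r in candidates}
        best = max(S, key=S.get)
        s_all = np.array([rule_strength(x, best, mfs) for x in X])
        o = [s_all[np.array(y) == c].sum() / s_all.sum() for c in range(n_classes)]
        fkb.append((best, o))
    return fkb

def possibilistic_output(x, fkb, mfs):
    """Eq. (3): strength-weighted average of the rule consequents."""
    s = np.array([rule_strength(x, ants, mfs) for ants, _ in fkb])
    O = np.array([o for _, o in fkb])
    return s @ O / s.sum()

if __name__ == "__main__":
    # toy 2-feature, 2-class data and an assumed Gaussian partition of each UOD
    g = lambda c, sig: (lambda v: float(np.exp(-(v - c) ** 2 / (2 * sig ** 2))))
    mfs = [{"small": g(0.0, 0.12), "medium": g(0.5, 0.12), "big": g(1.0, 0.12)}] * 2
    X = [(0.1, 0.1), (0.2, 0.15), (0.8, 0.9), (0.9, 0.85)]
    y = [0, 0, 1, 1]
    fkb = build_initial_fkb(X, y, mfs, n_classes=2)
    print(possibilistic_output((0.15, 0.1), fkb, mfs))
```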
The global error function to be optimized, i.e. minimized, is defined as the sum of the Root Mean Square Error between the model possibilistic output and the true class membership [9] (equal to 1 for the true class and 0 for the others), for the model accuracy objective, and two weighted error contributions related to the coverage and distinguishability constraints, for the model transparency objective [8,9]. The parameter tuning is first applied to optimize the initial FKB. Then, during the iterative training phase, it is activated as soon as a new rule is created, thus allowing the model to adjust its parameters with respect to the new FKB [9].
3.5. The pruning of the FKB
For further improvement of the model transparency, a pruning procedure [9] is applied to the FKB resulting at the end of the training iterations. In particular, those rules which are not sufficiently fired, with respect to a predefined firing threshold, by any of the training patterns are eliminated, and the antecedents of an input variable which do not appear in any of the remaining rules are deleted. Finally, the UOD of each input variable is checked for too similar MFs, which are then reduced to one by elimination [9]. The training ends when the processing of the whole set of training patterns does not initiate a new rule, the model having reached the optimal configuration.
4. Classification of motor bearing faults: a case study
In this work, the problem of motor bearing fault classification presented in [10] has been considered. From [10], n = 64 data patterns are available, each one characterized by two features: the bearing vibration frequency (X_D) and time amplitude (X_T), obtained as explained in [10]. The model, fed by the normalized two-feature input pattern (X_D, X_T) (n_i = 2), computes the possibilistic membership grades (Eq. 3) to the three classes (C = 3), "none", "some" and "severe", related to the magnitude of the bearing defects. Finally, in order to obtain a crisp classification, the model assigns the input pattern to the class for which it bears the highest possibilistic output. The neuro-fuzzy classification model has been trained as explained previously by firstly generating the initial FKB (Fig. 2) with ε set equal to 0.1 and then updating the rules and optimally tuning its parameters. Table 1 shows the resulting fuzzy knowledge base, characterized by a low number of rules which are easily interpretable when associated to the consistent coverage and high distinguishability of the antecedent partitioning FSs (Fig. 3).
Table 1. FKB built by the neuro-fuzzy model

Rule r   Input vibration features        Bearing defect possibilistic membership o_j
         X_D        X_T                  None     Some     Severe
1        low        low                  0.860    0.204    0
2        low        med-low              0.283    0.616    0.100
3        low        med-high             0        0.400    0.660
4        medium     low                  0.374    0.519    0.100
5        medium     med-low              0        0.485    0.560
6        medium     med-high             0        0        1
7        medium     high                 0        0        1
8        med-high   low                  0        0.411    0.600
9        med-high   med-high             0        0.012    1
10       med-high   high                 0        0        1
Figure 3. MFs built by the neuro-fuzzy model for Input 1 (vibration frequency, X_D) and Input 2 (time amplitude, X_T).

The final classification accuracy (Table 2) is slightly higher than that achieved in [10], with the overall diagnosis accuracy, calculated over the entire training set, increased from 90.63% to 93.75%.

Table 2. Classification performance using the initial FKB and the final model

Class j   Number of      Correctly classified                 Misclassified
          patterns n_j   Initial FKB  Final model  [10]       Initial FKB  Final model  [10]
None      4              4            4            4          0            0            0
Some      12             5            8            12         7            4            0
Severe    48             34           48           42         14           0            6
Total     64             43           60           58         21           4            6
Finally, from the analysis of the misclassifications, it turns out that 4 patterns of class 2 ("some") are wrongly assigned to class 3 ("severe"), with the output possibilistic membership values to classes 2 and 3 being very close, the
latter only slightly larger than the former. This manifests an uncertainty of the model in assigning these 4 patterns to a class and provides a physical insight in the classification problem in terms of overlapping decision boundaries for classes 2 and 3 in the feature space: patterns falling in this region have comparable possibilistic membership to both classes.
5. Conclusions
A neuro-fuzzy approach to pattern classification has been propounded for tackling fault diagnostic tasks. In order to obtain a transparent and interpretable model, semantic constraints are introduced into the fuzzy input partitioning interface and enforced during the model parameter optimization. The application of the approach to the diagnosis of motor bearing faults has returned satisfactory results concerning both the accuracy and the interpretability of the model. Furthermore, the possibilistic output has offered a useful tool to understand the uncertainties of the classification problem.
References
1. B. Muller and J. Reinhardt, Neural Networks - An introduction, (1991).
2. L.A. Zadeh, Information and Control, 8, 338 (1965).
3. J.S. Jang, IEEE Trans. Sys. Man, Cybern., 23(3), 665 (1993).
4. C.F. Juang, H.W. Nein and C.T. Lin, Fuzzy Theory Systems: Techniques and Applications, 3, 1265 (1999).
5. C.T. Lin and C.S.G. Lee, IEEE Trans. Comput., 40(12), 1320 (1991).
6. M. Marseguerra, E. Zio and P. Avogadri, Progress in Nuclear Energy, 44(3), 237 (2004).
7. D. Nauck and R. Kruse, Fuzzy Sets and Systems, 89, 277 (1997).
8. J.V. De Oliveira, IEEE Trans. on Systems, Man, Cybernetics - Part A: Systems and Humans, 29(1), 128 (1999).
9. E. Zio and G. Gola, Annals of Nuclear Energy, 33, 415 (2006).
10. G. Goddu, B. Li, M.-Y. Chow and J.C. Hung, IEEE Trans. on Systems, Man, Cybern., 1961 (1998).
11. L.X. Wang and J.M. Mendel, IEEE Trans. Neural Networks, 3(5), 807 (1992).
12. G. Castellano, A.M. Fanelli and C. Mencar, IEEE Conference on Systems, Man, Cybern., (2003).
13. C.T. Lin and C.S.G. Lee, IEEE Trans. Fuzzy Systems, 2(1), 46 (1994).
14. D.E. Rumelhart and J.L. McClelland, MIT Press, 1 (1986).
FLC DESIGN FOR ELECTRIC POWER STEERING AUTOMATION*
J. E. NARANJO, C. GONZALEZ, R. GARCIA, T. DE PEDRO
Instituto de Automática Industrial, Ctra. Campo Real Km. 0,200, La Poveda, Arganda del Rey, Madrid 28500
Phone +34 918711900, Fax +34 918717050
{jnaranjo, gonzalez, ricardo, tere}@iai.csic.es
The automatic control of the steering wheel for autonomous vehicles is presently one of the most interesting challenges in the intelligent transportation systems field. A few years ago, researchers had to adapt motors or hydraulic systems in order to automatically manage the trajectory of a vehicle but, owing to the technological development of the automotive industry, a new set of tools built into mass-produced cars now makes computer control feasible. This is the case of electronic fuel injection, the sequential automatic gearbox and the Electric Power Steering (EPS). In this paper we present the development of an autonomous vehicle's EPS control, based on a two-layer fuzzy controller. The necessary computer and electronic equipment has been installed in a Citroen C3 Pluriel mass-produced testbed vehicle and a set of experiments has been carried out to demonstrate the feasibility of the presented controllers in real situations.
1. Introduction
The aim of Intelligent Transportation Systems (ITS) is to apply computer science, electronics and communication techniques to develop a new generation of components that improve safety, enhance mobility and reduce pollution in the transportation field. The range of this concept is broad and affects every transportation mode in the short, medium and long term. One of these long-term tasks is the development of intelligent vehicles, whose final objective is to substitute the human driver with an artificial one that will improve the safety and comfort of the passengers. We center this work on intelligent vehicles for road transport, the area to which the discipline of autonomous vehicles belongs. The autonomous road vehicle research field can be divided into three areas: control and automation, sensors and communications. The first area refers to the automatic control of the three fundamental actuators of a vehicle: the steering
* This work is supported by MICYT ISAAC project CICYT DPI2002-04064-C05-02 and MFOM COPOS.
wheel, the throttle and the brake pedal. The second area deals with the sensing equipment necessary to allow the control system to correctly perceive the environment, and the third one concerns the data interchange among the control systems, sensors and every element involved in the autonomous driving task. The Autopia Program focuses mainly on autonomous driving using fuzzy logic controllers. The steering wheel of two Citroen Berlingo vans has been robotized, using electrical motors engaged to the steering bar through a gear system [1]. There are other examples of automating the steering wheel of road vehicles. At the University of Parma, the ARGO vehicle has been automated and its steering wheel is automatically controlled using a DC motor attached with a pulley [2]. In the University of Sevilla "Heavy Vehicles Automatic Guidance" project, directed by Anibal Ollero, the steering of a truck is moved using a DC motor, a clutch and a pulley [3]. The new generation of vehicles incorporates a different system of steering assistance: the Electric Power Steering (EPS). The advantage of this kind of power steering for automation purposes is that no external actuator has to be added. The aim of this paper is to present the fuzzy logic-based EPS control system developed for automatic driving that has been installed in a Citroen C3 Pluriel vehicle, whose steering wheel is electrically powered.
2. Onboard Equipment
Electric power steering consists of a torque sensor and motor actuator couple. The sensor is attached to the steering column and measures the torque applied by the driver when he moves the steering wheel. This torque signal is transmitted to a control/power card that sends an amplified proportional power signal to the DC motor, which is engaged to the steering rack bar. The first step for achieving automatic steering control is to manage the wheels from a computer that we have installed in the vehicle and which is powered by the vehicle battery. The solution for this automation is to completely bypass the sensor and control/power card equipment and send a power signal directly to the motor. An onboard computer running a fuzzy logic-based control system generates an analog signal that mimics the control card, and an additional power card takes this analog signal as input and outputs a power signal that supplies the assistance motor. Two sensorial sources are needed to manage the vehicle steering: a carrier phase differential GPS receiver, which generates to-the-centimeter accurate
positioning, and a CAN bus interface, which is connected to the internal CAN network of the C3 and can read control data such as, for example, the steering turning angle, essential for closing the control loop.
3. Steering Control System
The objective of the steering controller is to calculate the angle that the steering must be turned to track the desired trajectory correctly. To do this, a computational trajectory representation has to be defined that can be used as a reference for tracking in order to perform map matching [4].
3.1. Fuzzy steering controller
A two-layer fuzzy controller has been defined for steering control (Figure 1). The high-level layer calculates the target position of the steering wheel to fit the vehicle to the desired route. The low-level layer generates the optimum torque that must be exerted by the EPS assist motor to move the steering wheel.
Figure 1. Two layer fuzzy controller schema.
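To make the cascade of Figure 1 concrete, here is a minimal sketch of the data flow only: the two fuzzy layers are replaced by crude saturated-proportional stand-ins (all names, sign conventions and gains are hypothetical), since the point is simply that the high-level layer produces a steering-wheel target and the low-level layer turns it into a normalized torque command for the EPS motor:

```python
from dataclasses import dataclass

@dataclass
class SteeringState:
    lateral_error: float   # signed distance to the reference segment
    angular_error: float   # signed heading error
    wheel_angle: float     # current steering-wheel position (normalized)
    wheel_speed: float     # steering-wheel angular speed (normalized)

def flc_position(lateral_error, angular_error):
    """High-level layer stand-in: returns a target steering-wheel position."""
    # placeholder for the fuzzy rules R1.1-R1.4 described below
    return max(-1.0, min(1.0, 0.6 * lateral_error + 0.4 * angular_error))

def flc_torque(target, state):
    """Low-level layer stand-in: returns a normalized torque command in [-1, 1]."""
    pos_error = target - state.wheel_angle
    raw = 1.5 * pos_error - 0.3 * state.wheel_speed   # crude damping term
    return max(-1.0, min(1.0, raw))

def control_step(state):
    """One pass of the two-layer cascade: position target, then EPS torque."""
    target = flc_position(state.lateral_error, state.angular_error)
    torque = flc_torque(target, state)
    return target, torque   # torque would be converted to the analog motor signal

if __name__ == "__main__":
    s = SteeringState(lateral_error=0.8, angular_error=0.1,
                      wheel_angle=0.0, wheel_speed=0.0)
    print(control_step(s))
```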
3.1.1. FLC I: Steering Position controller
Its mission is to calculate the target turning angle through which the steering wheel must be moved to track the reference trajectory segment. Two variables are used as input for the fuzzy steering control system, namely the lateral and angular errors. Having calculated the crisp values of these variables, they are fuzzified for use in the fuzzy controller. To do this, we define two fuzzy variables, also named angular error and lateral error, each of which has two linguistic labels, left and right, that indicate where the vehicle is located with respect to the reference segment. The output of the system is the target turning angle through which the steering wheel must be moved to correct the trajectory deviation indicated by the input variables. There is only one fuzzy output variable, named Steering, with two linguistic labels, called left and right, whose membership functions are singletons.
The qualitative actions for the human driver (rules) are the same in every driving situation, and only the quantitative part varies, which is defined in the fuzzy control by the fuzzification of the variables:
R1.1: IF Lateral_Error Left THEN Steering Right
R1.2: IF Lateral_Error Right THEN Steering Left
R1.3: IF Angular_Error Left THEN Steering Right
R1.4: IF Angular_Error Right THEN Steering Left
These rules may look very simple, but this is where the power of fuzzy computing lies. Note that driving is an easy task for humans, which everybody can do without long computations or mathematical equation expansions. When executing the inference, we use the minimum as t-norm (AND) and the maximum as t-conorm (OR).
3.1.2. FLC II: Steering Torque controller
The aim of the low-level fuzzy controller is to manage the application of torque to the steering wheel to move it to the target position commanded by the high-level fuzzy controller. A fuzzy controller has to be used because of the characteristics of the actuation components. There are multiple and, in any event, unknown factors affecting EPS control, for example, variable pressure on the rack depending on speed or tire sliding. An artificial intelligence-inspired controller, in this case based on fuzzy logic, means that none of these elements have to be taken into account as it just mimics human behavior. In this case, three input variables are needed to control the torque applied to the steering wheel. The first is the angular position error of the steering wheel, that is, the difference between the target position generated by the high-level fuzzy controller and the real position. The second input variable is the real position of the steering wheel, and the last one is the angular speed at which the steering wheel is turning. When these variables are fuzzified for use in the fuzzy controller, they are transformed into fuzzy variables called Pos_Error for the angular position error, Pos_Abs for the real steering wheel position and Ang_Speed for the angular speed, respectively. The output of the fuzzy controller indicates the voltage that must be sent to the motor power card, which applies a proportional amperage to the motor to move the steering wheel with the optimum torque to correctly achieve its target position. Two linguistic labels have been defined, Positive (right) and Negative (left), whose membership functions have been defined as singletons. In this way, this output is normalized from -1 to 1, where the negative values represent a torque to the left and the positive values a torque to the right.
In this case, we have defined six rules for controlling the applied torque:
R2.1: IF Pos_Error Pos_Large THEN Torque Positive
R2.2: IF Pos_Error Neg_Large THEN Torque Negative
R2.3: IF Pos_Abs Negative AND Pos_Error Neg_Small THEN Torque Negative
R2.4: IF Pos_Abs Positive AND Pos_Error Pos_Small THEN Torque Positive
R2.5: IF Ang_Speed MORE THAN Null THEN Torque Positive
R2.6: IF Ang_Speed LESS THAN Null THEN Torque Negative
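A minimal sketch of how such a rule base can be evaluated is given below for the simpler position controller (FLC I): each rule fires with the membership degree of its antecedent, rules sharing a consequent are aggregated with the max t-conorm, and the crisp output is the firing-strength-weighted average of the output singletons. The membership-function shapes and singleton positions are assumptions (the paper only states the labels and that the outputs are singletons); the torque controller would follow the same pattern with its three inputs and the six rules above.

```python
import math

def mu_left(e, scale=1.0):
    """Assumed 'left' membership: smooth step rising for positive (leftward) error."""
    return 1.0 / (1.0 + math.exp(-4.0 * e / scale))

def mu_right(e, scale=1.0):
    return 1.0 - mu_left(e, scale)

# Output singletons for the Steering variable (hypothetical positions)
SINGLETON = {"left": -1.0, "right": +1.0}

def flc1_steering(lateral_error, angular_error):
    """Rules R1.1-R1.4 with min as t-norm, max as t-conorm and
    weighted-average defuzzification over the output singletons."""
    left_lat, right_lat = mu_left(lateral_error), mu_right(lateral_error)
    left_ang, right_ang = mu_left(angular_error), mu_right(angular_error)
    # each rule has a single antecedent, so its firing degree is that membership;
    # rules sharing a consequent are aggregated with max (t-conorm)
    fire = {
        "right": max(left_lat, left_ang),    # R1.1, R1.3
        "left":  max(right_lat, right_ang),  # R1.2, R1.4
    }
    num = sum(fire[lbl] * SINGLETON[lbl] for lbl in fire)
    den = sum(fire.values())
    return num / den if den else 0.0

if __name__ == "__main__":
    # vehicle drifted to the left of the segment -> positive output = steer right
    print(flc1_steering(lateral_error=0.7, angular_error=0.2))
```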
4. Experiments
Having installed the described controller in the instrumented testbed car, we ran some automatic steering control experiments. Figure 2 includes a trace of controller behavior while taking a bend to the left, to show how the system works. The top graph shows the input variable values for taking the bend. The next graph plots the output of the steering position fuzzy controller. The third graph contains the values of the input variables of the torque fuzzy controller: the steering wheel position error, the steering wheel real position and the steering wheel angular speed. Finally, the bottom graph shows the output torque, normalized from -1 to 1, to be applied to the EPS motor controlling the steering wheel.
Figure 2. Detail of the control input and output variables for the first turning to the left of the automatic tracking experiment.
At the beginning of the experiment, the car is driving centered along a reference segment of the route. As the car approaches the curve, the lateral error increases towards the positive part of the graph (left) and the angular error increases to the right. These input values are fuzzified, and the rule inference of the high-
level controller is executed, generating a left-turning command that is illustrated by the steering output variable. To execute this turn, effective power must be applied to the EPS motor. The low-level fuzzy controller, whose input variables are the position error, the real position and the steering angular speed, calculates this power. The output of the low-level controller (torque) shows that the maximum effort is applied at the beginning of the turn, when a peak is needed to initiate the steering movement (4-5.5 sec). Once the movement is under way, the torque decreases rapidly. However, as the steering rack moves away from the center, the effort it takes to move the rack is bigger, and the output torque has to increase. Finally, the controller maintains the steering position and moves the steering wheel back to the center when the turn has finished. Although the target position commanded by the high-level controller is never exactly achieved, the behavior of the vehicle is correct. The reason for this is that both controllers are closely related and fitted, the output torque being adapted by the second controller to generate the correct behavior.
5. Conclusions
In this paper, we have presented a two-layer fuzzy controller architecture for automatic electric power steering control, on which we have run the automatic driving experiments discussed in the last section. These results showed that electromechanical systems, like an EPS, can be managed in a human-like way using artificial intelligence techniques, in this case fuzzy logic. This method allows the user to mimic human behavior by extracting knowledge from experts, in this case drivers. An additional advantage of fuzzy logic is that complex nonlinear vehicle models do not need to be developed.
References
1. R. Garcia et al., "Frontal and Lateral Control for Unmanned Vehicles in Urban Tracks", IEEE Intelligent Vehicles Symposium, France, 2002.
2. A. Broggi, M. Bertozzi, A. Fascioli, et al., "The ARGO Autonomous Vehicle's Vision and Control Systems", International Journal of Intelligent Control and Systems, Vol. 3, No. 4, pp. 409-441, 1999.
3. A. Rodriguez-Castano, G. Heredia and A. Ollero, "Fuzzy path tracking and position estimation of autonomous vehicles using differential GPS", Mathware and Soft Computing 7 (2000), pp. 257-264.
4. J.E. Naranjo, et al., "Fine Tuning for Autonomous Vehicle Steering Fuzzy Control", FLINS 2004, Blankenberge, Belgium, 2004, pp. 450-455.
STUDYING ON ACCELERATION SENSOR'S FAULT-TOLERANCE TECHNOLOGY OF TILTING TRAINS*
JIANHUI LIN¹, YUMING ZHANG², YAN GAO², TIANRUI LI³
¹ National Traction Power Lab., Southwest Jiaotong University, China
² Electronic Engineering Department, Chengdu Electromechanical College, China
³ Department of Mathematics, Southwest Jiaotong University, China
The acceleration sensor's fault-tolerance principles and methods for tilting trains are discussed based on the practical running conditions of the first tilting train in China. The 2/3(G) vote redundant fault-tolerance strategy is feasible for the acceleration sensor's fault-tolerance technology. All the algorithms are tested under simulated conditions using data from line tests. A program based on them has been developed; it performs well in the DSP system and has passed preliminary testing on the line.
1. Introduction
Fault-tolerance is an important technology to enhance the reliability of a system. The main approaches to fault-tolerance may be divided into two parts: hardware redundancy and analytic redundancy. Hardware redundancy refers to using redundant components in the key positions of a system to perform identical tasks. It detects and diagnoses system faults by voting. Analytic redundancy refers to using functional relations among systems or components to constitute functional redundancy. It detects and diagnoses system faults through variations of the system status, output and parameters with the least hardware redundancy [1,2]. A tilting train is a train with a tilting mechanism that enables increased speed on regular railroad tracks. Many of the problems with motion sickness are related to the fact that a traditional servo system cannot respond instantaneously to the change in trajectory forces, and even slight discrepancies, whilst not being
* This work is supported by the Fund of Education Ministry of China (NO.705044 and IRT0452).
noticeably perceivable, cause nausea due to their unnatural nature [3]. Therefore, it is essential to develop the signal processing technology of tilting trains to increase passenger comfort. In this paper, we mainly discuss the acceleration sensor's fault-tolerance technology of tilting trains based on the practical running conditions of the first tilting train in China. We employ hardware redundancy and parameter estimating methods to realize the acceleration sensor's fault-tolerance since hardware redundancy is easier and more reliable than analytic redundancy in real applications [4-6].
2. Mathematical model of the reliability of hardware parallel redundant systems in tilting trains
A parallel redundant system composed of several identical sensors is helpful to improve the reliability of measuring parameters, since the system will perform well as long as there remains one good sensor. Generally, sensors cannot be replaced or maintained on line while tilting trains are running. Such a system is called an un-maintainable system, whose reliability can be calculated according to its corresponding mathematical model. Assume the acceleration sensor's life-span has a negative exponential distribution, the mean time to failure (MTTF) is T and the failure rate is λ = 1/T; then its reliability is R(t) = e^(−t/T), t > 0. The reliability of a parallel system including n units is

R_s(t) = 1 − (1 − e^(−t/T))^n,  t > 0    (1)
This shows that parallel redundant systems attain high reliability when t < 0.2T. The system's reliability increases nonlinearly with the increasing number of redundant components; when n reaches a certain value, the reliability increases only slowly. Since the weight, volume, cost and power loss of the system rise in proportion with the increasing number of redundant components, it is important to minimize the number of redundant components while still satisfying the reliability requirement.
Example 1: Assume t = 0.1T and require the reliability of the system to be no less than 99.9%. Then, according to equation (1), the smallest number of units must satisfy:

1 − (1 − e^(−0.1))^n ≥ 0.999    (2)
From equation (2), it is easy to show that three sensors are enough to keep the system running with a reliability of 99.9%.
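The numbers of Example 1 are easy to reproduce; the sketch below (illustrative, not from the paper) evaluates Eq. (1) and searches for the smallest n meeting the 99.9% target:

```python
import math

def parallel_reliability(t_over_T, n):
    """Eq. (1): R_s(t) = 1 - (1 - e^(-t/T))^n for n identical sensors in parallel."""
    r = math.exp(-t_over_T)
    return 1.0 - (1.0 - r) ** n

def smallest_n(t_over_T=0.1, target=0.999):
    n = 1
    while parallel_reliability(t_over_T, n) < target:
        n += 1
    return n

if __name__ == "__main__":
    print(smallest_n())                    # 3, as in Example 1
    print(parallel_reliability(0.1, 3))    # ~0.9991
```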
Hardware redundancy detects and diagnoses system faults by voting. A voting redundant system with k out of n, denoted as k/n(G), requires at least k units out of n to work normally. If k = 1, it is a parallel system; k is greater than n/2 for a majority vote, and n is often odd. Since a system with three sensors can provide very high reliability, as shown in Example 1, we select a voting redundant system with two out of three sensors, 2/3(G), in our experiments. Assume that a sensor has only two states: normal and invalid. Then the reliability of the 2/3(G) system is

R_a(t) = Σ_{i=2}^{3} C(3,i) R^i (1 − R)^(3−i) = 3R² − 2R³    (3)
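A one-line check of Eq. (3) against the single-sensor reliability, under the same exponential life-span assumption (names hypothetical):

```python
import math

def sensor_reliability(t_over_T):
    return math.exp(-t_over_T)

def vote_2_of_3_reliability(R):
    """Eq. (3): at least 2 of 3 identical sensors working, R_a = 3R^2 - 2R^3."""
    return 3.0 * R ** 2 - 2.0 * R ** 3

if __name__ == "__main__":
    R = sensor_reliability(0.1)
    print(R, vote_2_of_3_reliability(R))   # the voted system beats a single sensor
```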
3. Fault-tolerance methods based on hardware redundancy
In fact, parallel redundancy itself does not guarantee the realization of the reliability; in other words, hardware redundancy does not automatically implement hardware fault-tolerance. A set of fault decision-making logics is required to accomplish the hardware fault-tolerance. Therefore, it is necessary to develop methods to judge and locate a faulty sensor and to remove the data from the faulty sensor in time.
3.1. The decision-making logic of the 2/3(G) vote redundant system
Majority vote is the simplest and most common decision-making logic of the multi-unit parallel redundant system. Through majority vote, we can decide whether a unit fails in the system. Moreover, if a unit fails, it can automatically find the faulty unit and take corresponding actions. Therefore, multi-unit parallel redundant systems based on hardware in fact become k/n(G) voting redundant ones; here a 2/3(G) voting redundant system is utilized. Majority vote implements diagnosis by building parity equations and calculating residua. Assume the outputs of the three sensors are y_1, y_2 and y_3 respectively; then the parity equations are

e_k = |y_i − y_j|,  i, j, k = 1, 2, 3,  i ≠ j ≠ k    (4)
A reasonable residual threshold e_T is given according to the measuring error among sensors. When all the sensors work normally, we have e_k < e_T (k = 1, 2, 3). If one sensor fails, the residua involving it change so drastically that e_k goes beyond e_T. From the parity equations (4), sensor faults are judged and located merely according to the differences between the sensor outputs, without considering the factors leading to the faults. So any fault can be detected by this method.
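The voting logic just described can be sketched as follows (an illustrative reading of Eq. (4), not the authors' code): the faulty sensor is the one that appears in both residuals exceeding the threshold while the remaining pair still agrees, and the fused output falls back to the agreeing pair (or the median when all three agree):

```python
def vote_2_of_3(y, e_T):
    """2/3(G) vote: locate a faulty sensor from pairwise residuals |y_i - y_j| (Eq. (4))."""
    e01, e02, e12 = abs(y[0] - y[1]), abs(y[0] - y[2]), abs(y[1] - y[2])
    over = {(0, 1): e01 > e_T, (0, 2): e02 > e_T, (1, 2): e12 > e_T}
    if not any(over.values()):
        return sorted(y)[1], None                 # all consistent: take the median
    for k in range(3):
        others = [i for i in range(3) if i != k]
        # sensor k is suspect if every residual involving k is over the threshold
        # while the remaining pair still agrees
        if all(over[tuple(sorted((k, i)))] for i in others) and not over[tuple(others)]:
            return sum(y[i] for i in others) / 2.0, k
    return sorted(y)[1], None                     # ambiguous pattern: no decision

if __name__ == "__main__":
    print(vote_2_of_3([1.02, 0.98, 15.0], e_T=0.9))   # -> (1.0, 2): third sensor flagged
```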
3.2. Decision-making logic in the parallel redundant system
If a sensor fails with a permanent fault in the 2/3(G) vote redundant system, it should be shielded in order to keep it from participating in the vote. Otherwise, faulty units would make up the majority if another sensor also failed, and eventually a wrong conclusion would be drawn. Once an invalid sensor is shielded, there are only two sensors left in the system. As a result, equation (4) includes only one residual equation. If one more sensor fails, the residual threshold method can only judge whether there is a fault and cannot decide which sensor has failed; it is therefore unknown whether the received signal is true. In this case, the parallel hardware is changed into a logical series connection, which leads to a lower reliability. However, the system can still work if the second faulty unit is detected from the signal characteristics. Consequently, the parallel redundancy can be implemented based on the 2/3(G) vote redundancy.
3.3. Determination of the residual threshold
The question now is how to set an appropriate residual threshold for the majority vote, since this value has a direct influence on the performance of detecting and diagnosing sensor faults [4-6]. A larger threshold raises the missed-detection rate, while a smaller one increases the false-diagnosis rate. The threshold can be chosen according to the characteristics of the residua themselves. In our experiments, we randomly select two acceleration signals from data recorded on a certain track line and calculate the distribution of the difference between the two signals. Under the same speed and railway status, the probability of the residuum exceeding the limit is

P(e > e_T) = α    (5)
where e = |Δy|. Usually let α = λ, where λ is the failure rate of a sensor. The residual threshold e_T can then be calculated from equation (5). However, it is unreasonable to set only one threshold for different train speeds and lines. Based on the analysis of large amounts of data from the test line, we find that if the amplitude of the signals is large, their difference becomes larger; however, the difference no longer decreases after the signal amplitude falls below a certain value, which shows that the difference is related not only to the difference of the sensors' sensitivities but also to random factors. Therefore, a mathematical model for the difference of two acceleration signals could be the following equation:

Δy = Δk·x + Δe    (6)
where x is the input of the sensors, Δk is the difference of the two sensors' sensitivities, and Δe is the part of the difference that is unrelated to the inputs. If the fixed difference between the two signals is eliminated, Δe is a random variable with zero mean. Assume the residual threshold e_T is

e_T = k_T·|y| + e_0    (8)
where k_T is a dynamic threshold coefficient, e_0 is the minimal threshold, and y is the middle (median) value among y_1, y_2 and y_3. We can obtain the distribution of the two signals' residuum based on testing data from a practical track line when |y| = 0. Therefore, the probability of a small signal's residuum exceeding the limit can be obtained from the following equation:

P(|Δy| > e_0) = α    (9)
Equation (9) is also employed to compute the probability of false fault diagnosis for small signals. e_0 can be calculated by the same method as that used for the fixed residual threshold. The dynamic threshold coefficient k_T can hardly be calculated by analytic or statistical methods. A feasible approach is to first find a piece of signal with maximal variance in the testing data, then calculate its residual threshold and regard it as an upper limit. Therefore, we can let

k_T = (e_m − e_0) / y_m    (10)
where y_m stands for the maximal value of the signal and e_m is the maximum threshold.
4. Simulation example
A simulation example of the fault-tolerance algorithms is based on experimental data from a section of a track line. Because no fault happened during the experiments, in order to verify the algorithms, faults are assumed to appear at samples 3000, 4000 and 5000, respectively, corresponding to the failure of the first, second and third sensors. The faults are represented by outputs of 0, 15 and 1.5, simulating an open-circuit fault, an overflow fault and a stuck fault, respectively. The two parameters of the residual threshold in this algorithm, e_0 and k_T, are 0.9 and 0.03, respectively. The lower limit of the variance is 0.005 and the number of consecutive votes is C = 3. Here the direct method is utilized to calculate the variance.
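As an aside, the dynamic threshold of Eq. (8) with the parameter values just quoted (e_0 = 0.9, k_T = 0.03) can be sketched as follows; the sample values are hypothetical:

```python
def dynamic_threshold(y, k_T=0.03, e_0=0.9):
    """Eq. (8): e_T = k_T * |y_mid| + e_0, with y_mid the median of the three outputs.

    The parameter values are the ones quoted for the simulation example; k_T could
    also be derived from Eq. (10) as (e_m - e_0) / y_m.
    """
    y_mid = sorted(y)[1]
    return k_T * abs(y_mid) + e_0

if __name__ == "__main__":
    samples = [1.1, 1.0, 14.8]               # hypothetical accelerations at one instant
    e_T = dynamic_threshold(samples)
    print(e_T)                               # 0.9 + 0.03*1.1 = 0.933
    print([abs(samples[i] - samples[j]) > e_T
           for i, j in ((0, 1), (0, 2), (1, 2))])   # residuals vs the threshold
```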
The width of the data window is 32 and the sampling frequency of the signals is 256 Hz. The simulation results show that the first sensor is detected as failed at the 3009th sample; another fault is detected at the 4002nd sample, which comes from the second or the third sensor. Finally, the fault of the second sensor is successfully detected at the 4033rd sample and the fault of the third sensor is located at the 5031st sample. In other words, each fault is detected 0.0078, 0.1289 and 0.1211 seconds, respectively, after it occurs. This basically meets the demand of real-time fault diagnosis.
5. Conclusions
In this paper, the acceleration sensor's fault-tolerance principles and methods are discussed based on the practical running conditions of the first tilting train in China. We conclude that the 2/3(G) vote redundant fault-tolerance strategy is feasible for the acceleration sensor's fault-tolerance technology. All the algorithms are simulated based on data from a practical track line. A program has been developed in standard C for the selected algorithms, which can be transplanted to a DSP system easily. This program performs well in the DSP system especially designed for tilting trains and has passed the preliminary testing on the line.
References
1. R.M. Goodall: Active Railway Suspensions: Implementation Status and Technological Trends, Vehicle System Dynamics, 28 (1997), 87-117.
2. A.C. Zolotas, R.M. Goodall and G.D. Halikias, New control strategies for tilting trains, IAVSD 2001, Copenhagen, Denmark, Sept. 2001.
3. http://en.wikipedia.org/wiki/Tilting_train.
4. R.M. Goodall, A.C. Zolotas, J. Evans: Assessment of the Performance of Tilt System Controllers, RAILTEX 2000, Birmingham, UK, Nov. 2000.
5. A.C. Zolotas, G.D. Halikias, R.M. Goodall: A Comparison of Tilt Control Approaches for High Speed Railway Vehicles, Proceedings of ICSE 2000, Coventry, UK, 2 (2000), 632-636.
6. J.T. Pearson, R.M. Goodall, I. Pratt: Control System Studies of an Active Anti-Roll Bar Tilt System for Railway Vehicles, Proceedings, Institution of Mechanical Engineers, 212 F1 (1998), 43-60.
7. T. Hill, P. Lewicki: STATISTICS Methods and Applications. StatSoft, Tulsa, OK, 2006.
A RISK-RISK ANALYSIS BASED ABSTRACTION APPROACH TO SOCIETAL PROBLEM-SOLVING IN NUCLEAR SYSTEMS
SUMAN RAO
NIS Sparta Limited, Dhirubhai Ambani Knowledge City, G Block, 2nd Floor, Wing 3, Koparkhairane, Navi Mumbai, Maharashtra 400 709, India
[email protected]
Societal problems in nuclear systems are typically complex and characterized by scientific uncertainties. This paper describes an abstraction approach to solving such problems by using Risk-Risk Analysis (RRA). A Fuzzy Analytic Hierarchy Process (FAHP) framework is used to combine RRA elements with the normative elements of the Precautionary Principle (PP) and inform regulators on the relative ranking of possible regulatory actions/safety alternatives. To illustrate the utility of this framework, the deep geological repository experience (1991-2006) of the French Nuclear Safety Authority, Autorité de sûreté nucléaire (ASN), is abstracted and solved with hypothetical fuzzy rankings. The results demonstrate that such a framework could indeed help demonstrate objectivity and transparency in regulatory decision-making to multiple stakeholders.
1. Introduction
1.1. Radioactive Waste Management in France [1], [2]
One of the key societal nuclear problems characterized by scientific uncertainties is the management of high-level and long-lived intermediate-level radioactive wastes (referred to as radwastes in this paper). Nuclear safety authorities like the ASN of France are required by society to demonstrate their objectivity and transparency in radwaste management decision-making, due to the long-term Health, Safety and Environment implications involved for the current and future generations. In France, a draft radioactive waste management regulation has currently been tabled in the French National Assembly that formally declares a deep geological repository as the reference solution for managing radwastes in the long term. Similar to all key nuclear regulatory decisions, this decision is also a subject of much public debate and stakeholder concern.
2. Stakeholder Concerns And Abstraction Using RRA
Table 1 outlines a sample of societal stakeholder concerns. These need to be incorporated objectively in the regulatory decision-making framework. The
conceptual and computational complexity involved in objectively incorporating such concerns requires a suitable form of abstraction in problem solving. In this paper, the societal problem-situations are abstracted using a RRA approach. This approach is based on the premise that reducing risk in one area can often lead to increases in risk in other areas, or at other times [3], and hence any action should ideally be taken keeping in view the net risk-tradeoffs. This is at the core of the 'against-arguments' expressed by stakeholders (see Table 1).
Table 1. Stakeholder Concerns and Abstraction Using RRA
S.No
1
2
3
Issue
Technical feasibility [4] of the storage site
Economic incentives and devpt. at local site [5], [6]
Sustainable Devpt. (Underground Water Resources) [7]
Against arguments
Risk-Risk perceptions identified
Derived R-R Elements that need to be further analyzed for tradeoff decisions.
IRSN: underground storage technique 'appears feasible'
Campaign group 'Get out nuclear' calls technical feasibility 'worse than dubious' and 'a crime against future generations'
R-R 1: Risk of regulatory measures not being commensur ate with desirable levels of protection to society.
Internal Hazards (IH), External Hazards (EH), Cultural Hazards (CH)
General CouncilMeuse: 'an unexpected opportunity for area to develop its local economy' subject to the reservation that disposal, if any, should be reversible CNE: Aquifer layers are 'little permeable' and part of the water is 'very salty'. (Earlier CNE gave a different impression).
Economic incentives provided for hosting not clear-is it a compensation for 'hosting' the lab site or 'bribery'. Presence of a "nuclear waste dump" may be dangerous for the local industry like agriculture, Champagne etc. CNE observations are inconclusive as regards the threat of underground water contamination. Is it not necessary to protect all Aquifers-present & future?
R-R 2:Risk of Triple Bottom Line (TBL) benefits not commensur ate with costs.
TBL elements Cost Benefit Analysis (CBA), Environmental Sustainability (ES), Social Responsibility (SR)
R-R 3: Risk of inequity.
Current Generation People (CGP), Future Generation People (FGP), Current Generation Environment (CGE), Future Generation Environment (FGE)
For arguments
3. FAHP Problem Formulation
The goal is to aid ASN in choosing the 'safest' option for radwaste management from the following available alternatives: a1 - maintain the status quo with existing radwaste management methods, a2 - construct a deep geological radwaste repository in France, a3 - participate in an international deep geological repository.
3.1. Integrating Precautionary Elements in the problem formulation
Formally spelt out beginning from the Maastricht Treaty [8], a key precept that guides regulatory decision-making under scientific uncertainty in Europe is the PP. Fig. 1 displays a 'normative' concept net prepared from the communication of the Commission of the European Communities (EC) on the PP [9], from which normative, precautionary elements have been extracted.
Fig. 1 Normative Concept net of the PP based on yr.2000 EC Communication [8]
3.2. Integrating R-R Elements in the problem formulation Table 2. Linking R-R elements with the normative PP elements in the Hierarchy.
S.No.
L2 Elements (based on linguistic marker 'should' and semantics as per the EC PP communication)
Correspondi ng Risk-Risk element from Table 1
1
Proportionality
R-Rl
2
Non-Discrimination/ Consistency
R-R 3
3
Examining Benefits and Costs (Triple Bottom Line)
R-R 2
Associated L3 elements
Internal Hazards (IH), External Hazards (EH), Cultural Hazards (CH) Current Generation People (CGP), Future Generation People (FGP), Current Generation Environment (CGE), Future Generation Environment (FGE) Cost Benefit Analysis (CBA), Environmental Sustainability (ES), Social Responsibility (SR)
848
Analytic Hierarchy Process view of the French Deep Geo Disposal Repository Problem Selecting The Safest* Long Trim M»n>igrineiil Option for High le\el mid Long IJved Bndvmsle in France
NiMi-dte'Ciiiituiiilioii
Proportionality
..aMcoiw^teuej:.
Z,, F
*safety iu the face of scientific uncertainty as characterized by the Precautionary Principle Fig. 2 FAHP Decision-making Hierarchy Table 3. Sample Questions to obtain Fuzzy Ranking from Regulatory DecisionMakers
Ql
Q2
Q3
How important is the criteria of proportionality as compared to the NonDiscriminatory Nature and Consistency? How important is the impact of unforeseen Internal Hazards on safety levels as compared to unforeseen Cultural Hazards? How beneficial is a2 over a3 in terms of economic cosl-benefits?
Ranking Preferences (7/2,4,9/2)- Absolute, (5/2,3,7/2)- Very Strong/Important/beneficial (3/2,2,5/2) - Fairly Strong /Important/beneficial, (2/3,1,3/2)- Weak /weak importance/weak benefits, (1,1,1 )-Equal/Equal Importance/equally beneficial.
It is important that this ranking is not arbitrary and reflects the norms and consensus of the decision-making organization (ASN). 4. Solution To The FAHP Problem With Hypothetical Fuzzy Rankings Table 4. The Summary Fuzzy Evaluation Matrix of the French Radwaste management alternatives. Alt. Priority wt
P
NDC
TBL
Wts.
0.33
0.33
0.33
al
0.41
0.24
0.16
0.25
a2
0.34
0.53
0.69
0.51
a3
0.24
0.23
0.15
0.20
849 The solution to the FAHP problem has been arrived at using Chang's extent analysis as illustrated by Kahraman et al. [10] using hypothetical fuzzy preference rankings. From Table 4 the alternative that ranks highest in terms of safety as per the normative precautionary approach is a2-i.e. A deep geological repository option for the radwastes. 4.1. Advantages Of Using The Above Method 1.
2. 3.
It increases objectivity and transparency in decision-making by enabling objective comparison of all feasible alternatives against the same decisionmaking criteria. It uses a 'normative' Precautionary approach, which is Europe's stated regulatory precept of dealing with scientific uncertainties. Through the RRA abstraction, the method objectively integrates societal concerns into regulatory decision-making criteria thereby lending credence to multi-stakeholder involvement in regulatory decision-making.
5. Further Directions And Improvement Further directions involve using more rigorous methods like Force Field Analysis for analyzing societal concerns, developing suitable clustering methods for classification of risk-risk issues at L2 level, incorporating the French national understanding of the Precautionary Principle into concept net (which is currently based at EC level), developing objective guidelines for fuzzy preference ranking in the decision-making organization ASN, etc. Also the above FAHP model could be extended to form a scenario-building model, which could then seamlessly integrate the impact of other developments that arise in the strategic context of nuclear safety regulation for radwastes. E.g. the success of transmutation research. 6.
Conclusion
This paper described how a RRA based abstraction approach could be used in solving societal problems in nuclear systems that are typically characterized by scientific uncertainties. The approach involves combining the normative elements of the Precautionary Principle with the Risk-Risk elements in a Fuzzy AHP decision-making hierarchy. Such an approach demonstrates the objectivity and transparency in regulatory decision-making by ranking the 'safest' alternative in comparison to competing alternatives based on net risk-risk tradeoffs. To demonstrate the use of this approach, the French (ASN) experience
850 in choosing the safest long-term alternative in radwaste management (deep geo repository) was modeled using hypothetical fuzzy preferences. 7. Acknowledgments I am grateful to M. Ludo VEUCHELEN (SCK.CEN), Prof (Dr.) Da RUAN (SCK.CEN), Prof. (Dr.) Etienne Kerre (Ghent Univ.) and Ms. Marianne LAVERGNE (ASN) for their help in making this paper happen. I owe my gratitude to my husband for his help in checking computations as occurring in this paper. I devote this paper to the precious affection bestowed on me by my Father and my elder Brother. References 1.
Fact Sheets on the regulatory control of radioactive waste management in NEA member countries, The Control Of Safety Of Radioactive Waste Management In France, NEA, OECD, Apr 2005 2. Web Translations of the draft French Bill PROJET DE LOI de programme relatifa la gestion des matieres et des dechets radioactifs, March 2006 3. R.M.Males, Beyond Expected Value: Making decisions under Risk and uncertainty A report submitted to: U.S. Army Corps of Engineers, 2002, 74 4. Staff writers, Terradaily News about Planet Earth, Article French Nuclear Watchdog Gives Thumbs-Up To Deep Waste Burial, Jan 2006 http://www.terradailv.com/reports/French Nuclear Watchdog GivesThum bs Up ToDeep Waste Burial.html 5. COW AM Secretariat, Bure case Study, Paris, Feb 2005 http://www.cowam.com/article.php37id article=32. 6. A. Ferron, Confrontations Europe, Summary of the Entretiens Conference, 'Debate on the Management of Nuclear Waste, Projects Against Fear', Nov 2004 http://www.confrontations.org/publications/lettres/69/artferron_en.php 7. M.B.Davis Nuclear France: materials and sites, 2004, http://www.francenuc.org/en sites/lorr bure e.htm 8. J.L. da Cruz Vilaca, The Precautionary Principle in EC law, European Public Law, Vol 10 Issue 2, Kluwer Law International, Jose Luis, 2004 369 9. Summary of Communication from the Commission on the Precautionary Principle, Communication of the European Communities, Brussels, 2000 10. C.Kahraman, U.Cebeci, D.Ruan, Multi-Attribute Comparison Of Catering Service Companies Using Fuzzy AHP: The Case Of Turkey, International Journal of Production Economics, Elsevier, 2003,174-177
A FUZZY LOGIC METHODOLOGY FOR OPEN SOURCE INFORMATION SYNTHESIS IN A NON-PROLIFERATION FRAMEWORK7 ISABELLA MASCHIO Department ofNuclear Engineering, Politecnico di Milano, via Ponzio 34/3, 20133 Milano, Italy Abstract The international nuclear proliferation control regime is facing the necessity to handle increasing amounts of information in order to assess States' compliance with their safeguards undertakings. An original methodology is proposed that supports the analyst in the State evaluation process by providing a synthesis of open source information. The methodology builds upon IAEA nuclear proliferation indicators and develops within the framework of fuzzy sets. In order to aggregate the indicators' values, both the reliability of the information sources and the relevance of the indicators are taken into account. Two distinct logical systems based on fuzzy sets are addressed to identify appropriate aggregation tools: possibility theory and fuzzy logic.
1.
Introduction
In the last decade, the International Atomic Energy Agency experienced a substantial reinforcement of its role: from verifying the non-diversion of safeguarded nuclear material to giving reasonable assurance about the absence of undeclared nuclear material. As a consequence, under the Additional Protocol, States provide a wider range of information about materials, equipment, know how and sites, directly or indirectly related to the nuclear fuel cycle. By confronting the information supplied in States declarations with inspections' results and other information sources, including open source, analysts assess both the correctness and completeness of the declarations. A decision support system has been developed to provide a synthesis of open source information. Based on the IAEA's Physical Model [5. ], it is constituted by an acquisition and analysis module, where information is drawn from open sources and values are assigned to proliferation indicators, and an aggregation and output module, where the information is synthesized. In order to adapt to the specific nature of open source information, indicators are assigned linguistic values. As a consequence, fuzzy sets [8. ] are adopted to compute with them and the universe of discourse, the interval of values each proliferation indicator can assume, is subdivided into five fuzzy sets: DN, definitively not [present], RN, rather not [present], UN, uncertain, RY,
1
Excerpt from PhD thesis [7. ]
851
852 rather yes (present), DY, definitively yes (present). Moreover, Gaussian-type -(x-c)2 membership functions (ux = e ) are adopted.. For the purpose of the present work, two characteristics of information are further considered: relevance and reliability. As far as relevance is concerned, the Physical Model gives a detailed description of all known elements for the acquisition of sensitive nuclear material or technology. Elements have been assigned a strength considering the importance they assume in the acquisition path. Hence, each indicator can be of strong, medium or weak strength: e.g. a centrifuge is a strong indicator of the presence of the UF6 enrichment process by gas centrifugation, a molecular pump is a medium indicator and titanium alloys is a weak indicator. Clearly, relevance is a technical characteristic of the indicator. Differently, reliability of information is related to the source providing it: a list of referenced sources has been established by experts and each of them has been ranked according to its reliability level: high, medium or low [7. ]. Finally, in order to compensate somehow the arbitrariness introduced in setting up the methodology, three principles have been outlined that ensure a) the transparency of methodology, b) the multiplicity of sources (indicators can be assigned multiple values from multiple sources, not necessarily coherent with each other) and c) the traceability of information. Hereinafter, the aggregation methodology is outlined (2.), followed by an example (3.) and conclusions (4.). 2. The Aggregation Methodology Shortly, the scope of the present methodology is to combine a number of linguistic indicators' values into one synthesis numerical value. The aggregation methodology proceeds in two phases: phase 1: to synthesize the information about the presence of each single indicator; taking into account the reliability of the sources; phase 2: to synthesize the information about the present of a single technology, taking into account the relevance of the indicators. 2.1.
Phase 1: aggregation according to reliability
A narrow examination of the first aggregation phase allows identifying two substeps: step 1: for each indicator, to gather the values provided by information sources of equivalent reliability level, thus obtaining three components: a high-reliability component, a medium-reliability one and a lowreliability one;
853 step 2: to combine those three components, giving priority to that of higher reliability. In the end, one single value is obtained for each indicator. Possibility Theory, as derived from Zadeh's Theory of Approximate Reasoning [9. ], is an appropriate framework to cope with this issue. Further to the definition of fuzzy sets [11. ], Zadeh introduced fuzzy relations and fuzzy restrictions [10. ] so that a possibility distribution (nK) is defined as the set of all the possible values assumed by a variable. Hence a possibility distribution can assume the same analytical formulation as a membership function (ux). In the present case, the membership functions associated with the linguistic values attributed to indicators are considered as possibility distributions (7ix = ux). Possibility distributions are the starting point of Dubois and Prade Possibility Theory [1. ] [2. ] [3. ]. As it has been stated in early works, the semantic associated with the Possibility Theory intrinsically differs from that of probability theory or belief theory, and specific tools are needed to operate in this framework. Among them, combination operators have been studied in depth [4. ]. Hereafter, the selection of combination operators based on the characteristics of the information to be aggregated is 4. briefly illustrated. 2.1.1.
Step 1: conjunctive operator
The aim of the first aggregation step is to combine values assigned by sources with equivalent reliability level, i.e. interchangeable sources: it is assumed that only the information common to both sources is valuable. Actually, the most appropriate operators for this task are conjunctive operators that perform the intersection between two sets of values. Simple conjunctive operators are min and prod. In addition, operators fulfilling the idempotence property (like min) produce a reinforcement effect on the result whenever input values are repeated. However, reinforcement may be dropped when no information is available about the independence of the sources. When the positions of the sources are distant, the result of a conjunctive combination is sub-normalised. A normalisation operation can be performed in order to smooth the amplitude of the discrepancy between the sources, which is measured through the "conflict index" [2. ]. These few considerations drive to the selection of the normalised min operator: n
"
W
=
min(n,n 2 ) urn rr \
where II i and n 2 are the distribution function associated to two information sources and h (n^ n 2 ) is the conflict index:
854 /2(n,,n 2 ) = max(min(n,,n 2 )) As a result of this first aggregation step each indicator is associated with three components: one of high, one of medium and one of low reliability level. 2.1.2. Step 2: disjunctive operator The aim of the second step is to produce one single value for each indicator, by combining the three components obtained in step 1. According to the principle of multiplicity of sources, none of them is disregarded, but higher priority is given to more reliable ones. Disjunctive operators perform the union between two sets of values. Adopting a disjunctive operator is a conservative attitude, aiming at taking into account all the information available. A typical disjunctive operator is max. Yet, since higher reliability sources are to be privileged, a prioritised combination rule [2. ] is assumed, that takes into account the whole of the information provided by the most reliable source, whereas the less reliable source is considered only as far as it is in agreement with the first one. A prioritised disjunctive operator, here max, is selected: ^PR-MAX
= max
l n,,min
where ri| is the most reliable distribution. The first phase of the aggregation process is concluded when a single value is obtained for each indicator. 2.2. Second phase: aggregation according to strength In the second phase, data are first homogeneously combined by strength, then the components obtained are aggregated. Therefore the second phase is subdivided in two steps: step 3: for each technology, all the strong indicators are aggregated in one unique strong component, medium indicators in a medium component and weak indicators in a weak component; step 4: strong, medium and weak components are aggregated considering their respective strength. The final result of the second phase is a value for the synthesis indicator of the presence of a given technology. 2.2.1. Step 3: conjunctive operator The third step is similar to the first one in that it aims at combining equivalent pieces of information, here in terms of strength. Therefore a conjunctive operator is selected. However, and differently from step 1, it is assumed that the
855 larger the number of indicators involved, the more precise the result obtained: hence a reinforcement effect is sought. The prod operator (conjunctive, non idempotent) fulfils these criteria. Moreover, normalisation is performed, as in step 1. The normalised prod operator is chosen: n
1Vra
_ j?roJ(rii,n 2 )
"
Aai„n2)
2.2.2. Step 4: fuzzy inference system. At step 4, the aggregation of the previously obtained components is performed. A knowledge-driven instrument is necessary in order to interpret how the simultaneous presence of components of difference relevance combine to draw out a value for the synthesis indicator of the presence of a given technology. Within the fuzzy logic theory, fuzzy inference systems have been developed as rule-based systems where the logical knowledge about the process is enclosed in a set of rules [6. ]. Here, the three components that have to be combined represent the input to the fuzzy inference system; the synthesis indicator of the presence of a given technology is the output. The set of rules specifies how the simultaneous presence, to various extent, of the three input components results in the final assessment of the presence of a given technology. It establishes how uncertainty and relevance of information combine in order to produce an assessment of the presence of a given technology. The rule-base is obtained by allowing the three components (S: strong, M: medium, W: weak) assuming the five linguistic values in the universe of discourse: DN, RN, UN, RY, DY. The 125 rules are shown in table 1, table 2, table 3, table 4 and table 5. The result of the fourth aggregation step is the synthesis indicator's value (synth), a fuzzy set that is converted into a crisp value, by means of the centre of mass defuzzyfication method [6. ]. table 1- Rules for W=DN
table 2 - Rules for W = RN
M\S
DN
RN
UN
RY
DY
DN
DN
RN
RN
UN
DY
RN
DN
RN
RN
UN
DY
UN
RN
RN
UN
RY
DY
RY
UN
UN
RY
RY
DY
DY
UN
RY
RY
RY
DY
M\S
DN
RN
UN
RY
DY
DN RN UN RY DY
DN DN RN UN UN
RN RN RN UN RY
RN RN UN RY RY
UN UN RY RY RY
DY DY DY DY DY
856 table 3 - Rules for W=UN DN RN MS UN DN RN
DN
RN
RN
DN
RN
RN
RY
DY
UN
DY
UN
DY
table 5-Rules for W = DY DN RN UN
ms DN RN
RY
DY
DN
RN
UN
RY
DY
RN
RN
UN
RY
DY DY
UN
RN
RN
UN
RY
DY
UN
RN
UN
UN
RY
RY
UN
UN
RY
RY
DY
RY
UN
UN
RY
RY
DY
DY
UN
RY
RY
RY
DY
DY
UN
RY
RY
RY
DY
table 4 - Rules for W = RY MS DN RN UN
RY
DY
DN
DN
RN
UN
RY
DY
RN
RN
RN
UN
RY
DY
UN
RN
UN
UN
RY
DY
RY
DY
RY
DY
RY
UN
UN
RY
DY
UN
RY
RY
3. Example 3.1. Input data It has been supposed that a given technology (e.g. UF6 enrichment by gas centrifugation) is described through nine indicators: three strong (i314, i315, i316, e.g.: frequency changer, rotor tube, magnetic suspension bearings), three medium (i200, i201, i202, e.g.: molecular pump, motor stator, centrifuge housing) and three weak (il50, il51, il52, e.g.: high strength aluminium, titanium alloys, pressure measurement instruments). It was also assumed that the information sources supplying information about these indicators are: three high-reliability sources (SI, S2, S3), two medium-reliability (S4, S5) and three low-reliability (S6, S7, S8). Fictitious values are given in table 6. Table 6 - Input values MEDIUM reliability
HIGH reliability SI
II II It
1314
1
UN
1315 1316
S4
RY
UN
UN
RN
RY
DY
RY
|
DY DN
LOW reliabhty
1
S7
S5
S6
DN
RN
UN
RN
RY
UN
DY
DN
UN
RY
DY
DN
|
S8 DY
DN ,
RN
UN
RY
1151 i152
S3
UN
i202 1150
I
DY
i200 1201
S2
DY
UN
1
DY
DY
UN
RY
DY
i
DN
UN
RN
UN
UN
)
UN
DN
I
RY
DY
;
RY
i
UN DY
1
857 3.1.1. Step 1 The normalised min operator is applied to the three high reliability values for the i314 indicator (UN, RY, UN); the result is the high reliability component i314H ( figure 1). The same operation is performed for the two medium reliability values and for the three low reliability values, resulting in the i314M and i314L components. 3.1.2. Step 2. Yet, the three components of the i314 indicator obtained are combined through a prioritised max operator. The result is given in figure 2. Due to low agreement between high reliability sources (i314H) and medium reliability sources (i314M), die latter contribution to the result is sensibly reduced. Moreover the low reliability component (1314L) is totally absorbed by the high reliability one (i314H). 3.1.3. Step 3. At the third step, the single values obtained for each strong indicator (i314, i315, i316) are combined into a strong component (iS) by means of the normalised prod operator (figure 3); analogously for the medium indicators (i200, i201, i202) which are combined in iM and for the weak indicators (il50, il51, il52) in iW. 3.1.4. Step 4. At the fourth and last aggregation step, the input variables to the inference system are the strong (iS), the medium (iM) and the weak components (iW); the output is the synthesis indicator (synth) represented in figure 4. After defuzzyfication (centre of mass), the crisp value obtained for the synthesis indicator of the presence of UF6 enrichment by gas centrifugation is: 0.4472. 4. Conclusions The present work outlines an original methodology for open source information synthesis based on the IAEA Physical Model indicators' and on fuzzy logic. A decision support system has been developed based on this methodology [7. ] as a contribution to enhance the effectiveness of the information analysis process, and, to some extent, the efficiency of the State evaluation process which is today at the centre of safeguards systems. By providing a synthesis of conspicuous amounts of information, the decision support system reduces analytical work at
858 the lower level, while experts' evaluation capacity at a higher level is not brought into question.
figure 1 - First aggregation step SrdagareaBlienlwiS
figure 2 - Second aggregation step
flgure 3 . Third aggregation step J
' i;ytgtfon sB»p: lyrtlieiia Indicator
figure 4 - Fourth aggregation step
References 1. D. Dubois, H. Prade, J. Approx. Reasoning, 2(1), 65-87 (1988). 2. D. Dubois, H. Prade, Control Eng. Practice, 2 (5), 881-823 (1994). 3. D. Dubois, H. Prade, Data Fusion in Robotics and Machine Intelligence, M. A. Abidi, R. C. Gonzalez (eds.), Academic Press Inc. (1992). 4. D. Dubois, H. Prade, Fuzzy sets and Systems, 142, 143-161 (2004). 5. IAEA, Physical Model, STR-314,1 to 8 (1999) and 9 (2000) (restricted). 6. M. Marseguerra, Notes on Introductory Fuzzy Reasoning, Politecnico di Milano (2003). 7. I. Maschio, "A Fuzzy Logic Decision Support System for Open Source Information Analysis in a Non-Proliferation Framework", PhD thesis, Politecnico di Milano, Italy (2005). 8. L. A. Zadeh, IEEE transactions on Fuzzy Systems, 4 (2), 103-111 (1996). 9. L. A. Zadeh, Machine Intelligence, vol. 9, J. Hayes, D. Michie, L. Mikulich (eds), Wiley and sons( 1979). 10. L. A. Zadeh, Fuzzy sets and Systems, 100, 9-34 (1999). 11. L. A. Zadeh, Information and Control, 8, 338-353 (1965)
A FINANCIAL-OPTION METHODOLOGY FOR DETERMINING A FUZZY DISCOUNT RATE IN RADIOACTIVE WASTE MANAGEMENT P. L. KUNSCH ONDRAF/NIRAS,
14 Avenue des Arts BE-1210 Brussels, Belgium
p.kimsch&mirond.be
ABSTRACT. With the present probabilistic analysis the author proposes a procedure based on financial-option theory for determining a triangular fuzzy number representing the discount rate for use in the economic calculus of radioactive waste management. The adequate present funding to be set aside for the future is equal to the net present value (NPV) of the future costs, including technical-scenario uncertainties. For taking into account the financial uncertainties, the NPV is identified with the strike price of a European put option calculated with the Black-Scholes formula, and the asset value in the managed fund is identified with the current price.
1. Scope of the analysis The choice of a discount rate for comparing costs distributed over long periods of time is a well-known but difficult task, discussed by many authors. Some review of the existing literature on this topic can be found in Lind [1], Kunsch [2], Sumaila et al. [3]. In this paper the author addresses the discount-rate choice problem from the funding perspective of radioactive-waste management. In this particular setting the open literature is to our knowledge limited, if not inexistent. The proposed original approach has been used by ONDRAF/NIRAS [4], the Belgian radioactive waste management organization (WMO) for determining the funding level of its activity. In section 2 the basic assumptions are discussed for applying the proposed methodology. Section 3 describes the principles of the probabilistic approach based on option theory which is used. Section 4 details how to define the uncertain discount rate as a triangular fuzzy number. Section 5 gives short conclusions. 2. Basic assumptions of the model In this paper a general nuclear-waste-management project is considered to be properly funded at the inception time, i.e., just before the project starts at time 859
860 t=0. It is assumed that the overall budget of the project, including uncertainty margins, is completely known. A fuzzy approach to describe technical uncertainties can be found in [5] and references therein. Therefore the fuzzy evaluations in this paper will not at all deal with these aspects, which have to be taken into account in addition. The present paper exclusively deals with the evaluation problem of the discount rate (D.R.) for calculating the Net Present Value (NPV) at t=0 of the total cost distributed over a given extended time horizon T. Although it may seem unrealistic to consider funding over time periods of one century or more, this is indeed the case ONDRAF/NIRAS is facing with the so-called Long-Term Fund: the latter is supposed to finance the whole future cost streams of the high-level-waste repository, inclusively some institutional control for several decades after the site closure. The initial funding S0 is to be set aside at t=0 in a fund, managed according to a specific strategic asset allocation (AA). The stochastic rates of return fj. are expressed in constant currency units at t=0; possible AA's are as follows: A low-risk portfolio with 100% bonds. The couple (rate of return //=2,30%; standard deviation (7=3,90%) is estimated from historical data series; A medium-risk portfolio with 50% equities/50% bonds. The couple (rate of return jU =5%; standard deviation <7 =10%) is estimated from historical data series. Etc. Further, the assumption is made of Brownian motion for the asset evolution, so that the resulting distribution of the asset price ST at t=T is a lognormal distribution. T is the portfolio liquidation date. Several T-values are considered for the funding: T= 5, 15, 25, 50, 100 years are typical for radioactive-waste management. More formally, In S(T) is normally distributed (see [6] and [7]):
\nST ~0> l n S 0 + ( / / - c r 2 / 2 ) 7 \ c r V r
(1)
where S0, ST represents the value of the value of assets in t=0 and t=T respectively; (&(m, s) represents the cumulated normal distribution of mean value m, and standard deviation s. Calling a the discount rate (D.R.), and X(T) the liability at the time horizon t=T, the funding S0 at initial time t=0 set aside for financing X(T) will have to satisfy the equality: assets = liabilities, thus:
Sa = X(T)eal
(2)
Note that the D.R.=a will always be smaller than the real mean rate of return fj, of the portfolio, even for an infinite time horizon T, because of the return volatility measured by the standard deviation G. The available assets in t=T are represented by the lognormal distribution defined in (1). Adequate funding situations in T will impose:
861
ST>X(T)
(3)
Under the given assumptions, it is easy to calculate the probability that the asset value, which is represented by the stochastic variable ST, will be inferior to the supposedly deterministic liability value X(T), covered by the funding, i.e., to evaluate the risk of insufficient funding ('underfunding') at time T. This is illustrated in Fig. 1. Given the compounded return and standard-deviation data of each specific portfolio, and using the initial asset value S0, calculated from the known values of X, T, and the D.R., it is now very easy to calculate the "Confidence Level" ('CL'), i.e., the actual probability for X being sufficiently on the left-side of the probability distribution shown in Fig. 1. This probability is represented by the surface underneath the distribution to the right of X; the surface to the left of X thus represents the risk of not achieving the liability target at time T.
probability distribution o o o o o o o o o ro -£*• 05 Co
Lognormal distribution
X
ji )
0,5
i I
/ ~ \
:
v
COG 1
1,5
1 • i
i
\ 2
2,5
asset value Figure 1. The log-normal asset distribution in arbitrary units; the cost value (liabilities) X must be smaller than the average fund value (assets). The confidence level (CL) for having sufficient funding is given by the surface below the distribution to the right of X. As explained in section 3, the surface to the left is equal to the probability of exercising the put option. The value of the option can be estimated by the distance between X and the projected center of gravity (COG) of the described surface on the asset axis.
A convenient way of performing the risk calculation is to use the formalism of option theory explained in the following section. 3. The put-option theory and the Black-Scholes formula A put option as explained in [6], [7] is an asset protection instrument: it covers the risk that the price ST, i.e., the asset value at time T, falls below the liability value X(T), called the strike price in financial-option theory. The value of the
862 put option is equal to the expected value E[.] of the difference between those two prices, resp. values, as follows: Put option price=P=E[max(0,X-ST)]
(4)
The price of the put option P is given by the celebrated Black-Scholes formula, we write here for the put pricing (rather than for the call pricing, as usually done):
P = XerJ [1 - N(d2)] - S[\ - N(dt)] di = d2 +
7= In
ayff
(5) =-
Xe~rT
CT-N/T
2
where: N(.) is the standardised normal cumulative distribution of mean value 0 and standard deviation 1 r is the risk-free rate of return. In our model, r= fJ., the expected portfolio return; N(d{)represents the so-called 'delta', i.e. the derivative of the callprice with respect to the price of the assets. 1 - N(d2) represents the probability of exercising the put option, i.e. the probability that S < X . In our interpretation, we have the probability that the assets will be inferior to the liabilities X(T) at time t=T. This probability gives the confidence level for having sufficient assets for the funding. S = 5 0 is the value of the assets placed in the fund at t=0. The strike price X=X(T) represents the future value of the total liability of the fund at T, called in option theory the exercise time. Remember that in this paper we consider only financial uncertainties. We assume that all technical uncertainties have been taken into account when calculating the X(T)-value using the methodology described in [5]. With this assumption the liability value is deterministic for each time horizon T, but the asset value is a stochastic variable. Financial-option theory it is only used for obtaining statistical properties: of course there is no need here to buy options on the specialised market. No additional assumptions beyond those explained in section 2 are necessary for using the Black-Scholes formula. The put-option price P is shown to be equal to the average expected asset value represented by the surface left to the strike price X in the distribution of Fig. 1. As discussed before, this average value represents at the same time the
863 underfunding risk at time T. The price is equal to the distance between X and the center-of-gravity (COG) of the left wing of the probability distribution at time T: it provides a measure of this risk. An important result for the present discussion is that the BS-formula gives at the same time the confidence level CL=N(d2), which measures the probability of not exercising the option because there is no underfunding; the probability of exercising the option for protection is thus 1-CL, equal to the surface of the normalised probability distribution to the left of X. The maximum expected amount by which underfunding occurs at time T corresponds to a value farthest left of the log-normal asset distribution, which can be satisfactorily approximated as follows: Maximum expected underfunding ~ 3* P
(6)
The option price P is given, as discussed above by the distance of the COG from X in the left wing of the asset distribution at time T (Fig. 1): the latter can be approximated by a rectangular triangle: from the property of the COG it directly comes that the vertex farthest left of this triangle is located at three times the distance of the COG from X. By means of the Black-Scholes formula, acceptable values for the discount rate can be identified, which fulfill some minimum requirements of the WMO, in order to keep low the funding risk at any time T. The first minimum requirement is that the confidence level for sufficient funding would be at least 50%, because otherwise, as can easily be shown, the average asset value at time T would come below the liabilities X(T), which is clearly unacceptable:
CL(T,D.R)> 50%
(7)
CL should ideally be increasing with increasing T as the relative volatility of the portfolio, and thus the funding risk, both decrease with T:
dfCL(T,D.R.)] -± 11 > o dT
(8)
It can be decided in addition that the maximum expected underfunding given by Eq. (6) represents a additional Initial Funding Premium (IFP) to be included into the initial funding, herewith reducing the underfunding risk to a low value: IFP = 3*P
(9)
To be acceptable, it seems logical to impose that the funding premium would not be larger than a maximum percentage of X, e.g., 20%.
864 IFP<20%X
(10)
An additional, but less stringent, condition could be imposed on the IFP, which should ideally decrease with T:
driFP(T,D.R.)l -i
^ < 0
(11)
4. Defining a triangular fuzzy number for the discount rate A discount-rate range can be established on the basis of the requirements given in eqs. (7) to (11), in a way which is either just acceptable (maximum value), or reasonably conservative (minimum value). Also a centrally located value with average quality regarding those requirements can be obtained. In this way, the D.R. can be represented by a triangular Fuzzy Number (FN) [D.R.(min), D.R.(central), D.R.(max)]. As an example the analysis results are presented in the case of the 100% bond portfolio: The maximum value D.R.(max)=2% is adopted has being just acceptable: the CL increases from 55% to 72%, which is a rather poor but just acceptable performance; note that the subsidiary eq. (11) is not fulfilled. The central value with reasonable properties is obtained for this portfolio as being D.R.(central)=1.7%. The CL increases from 62% to 91%; eq. (11) is fulfilled, except for T<15. The choice of a minimum D.R. value depends on how strict one will be with the CL: a minimum CL=70% results in D.R.(min)=1.3%. The triangular FN [1.3%; 1.7%; 2%] is used to compute the funding using VENSIM© [9], a simulation code based on System Dynamics: the approach consists in integrating the discounted costs over the time-period from year 2005 up to year 2100. Note that, because of this integration, the final result is only available at this termination date. All operations performed during the simulations are arithmetic ones, so that the fuzzy funding is also represented by a triangular-like FN. To better visualise this FN, many funding traces are generated with VENSIM© according to the methodology developed in [8], Consider some Ct -cut of the triangular FN representing the D.R., and its application to the interval la =\cc,2 — a] centered on 1. A random number V e Ia distributed according to a triangular probability law replicating the membership grades within the or -cut is immediately obtained as follows:
865 Firstly, a random variable interval[a m i n ,6 m a x ], such that: a = # / • / ) u min /2 '
max
=1_or
g
is
uniformly
sampled
/ /2
in
the
(12)
Secondly, the random variable V e /^ is calculated as being: (13)
v = 2-V2(l-^)
500,000
400,000
300,000
200,000 2095
2096
2098 Time (Year)
2099
2100
Figure 2. The visualisation of the fuzzy D.R. by means of random sampling within the CC -cut values (OC =0). The central darkest trace corresponds to the central D.R. =1.7%
866 5. Conclusions In this paper an option methodology has been presented for calculating a fuzzy discount rate to be used for funding radioactive-waste management. The BlackScholes formula has been used because it is convenient for evaluating the risk of insufficient funding. It is assumed that the liabilities are deterministic, because they are supposed to include sufficient contingency factors like described in [5], while the asset value in the fund is stochastic, because its volatility depends on the adopted asset allocation (AA). It results from computational experiences with different AA's that it is advisable to use a D.R. values significantly smaller than the average portfolio return. Even with a low D.R. the risk of underfunding is not eliminated. It is recommended to include in the funding a premium, the value of which is equal to three-times the put-option price. References 1. Lind, R.C. (Ed.) Discounting for Time and Risk in Energy policy, John Hopkins University press for resources for the Future, Baltimore (1982). 2. Kunsch, P.L. Environment and Multigenerational Social Choices", in, M. Paruccini (Ed.), Eurocourses Environmental Management, Vol. 3, Kluwer Acad. Publishers, 199-211 (1994). 3. Sumaila, U.R., and Walters, C. Intergenerational discounting: a new intuitive approach, Ecological Economics, 52, 135-142 (2005). 4. ONDRAF/NIRAS (2004) Website www.nirond.be 5. Fiordaliso, A., and Kunsch, P. A Decision Supported System Based on the Combination of Fuzzy Expert Estimates to Assess the Financial Risks in High-Level Radioactive Waste Projects, Progress in Nuclear Energy, 46, N° 3-4, pp. 374-387 (2005). 6. Beninga, S. Financial Modeling, 2nd edition, MIT Press, Cambridge, MA (2000). 7. Hull, J.C. Options, Futures & Other Derivatives, Prentice-Hall International, London (2000). 8. Kunsch, P., and Springael, J. Application of Fuzzy Set Theory and Logic to System Dynamics Modelling. Short title: Fuzzy System Dynamics Modelling. VUB/STOO work document, Brussels (2002), to be published. 9. VENSIM Professional32 Version 5.4.a ©Copyright 1998-2003, The Ventana Simulation Environment, Ventana Systems, Inc.
APPLICATION OF INTELLIGENT DECISION SYSTEM TO NUCLEAR WASTE DEPOSITORY OPTION ANALYSIS DONG-LING XU, JIAN-BO YANG Manchester Business School The University of Manchester Manchester Ml 5 6PB Ling.Xu@Manchester. ac. uk
BENNY CARLE, FRANK HARDEMAN, DA RUAN Belgian Nuclear Research Centre (SCK'CEN) Boeretang 200, 2400 Mol, Belgium This paper describes how the Evidential Reasoning approach for multi-criteria decision analysis, with the support of its software implementation, Intelligent Decision System (IDS), can be used to analyse whether low level radioactive waste should be stored at the surface or buried deep underground in the territory of the community of Mol in Belgium. Following an outline of the problem and the assessment criteria, the process of using IDS for data collection, information aggregation and outcome presentation is described in detail. The outcomes of the analysis are examined using the various sensitivity analysis functions of IDS. It is demonstrated that the analysis using IDS can generate informative outcomes to support the decision process.
1. Introduction Mols Overleg Nucleair Afval vzw (MONA) is a partnership between the municipality of Mol and ONDRAF/NTRAS, the Belgian Agency for Radioactive Waste and Enriched Fissile Materials, set up to examine whether disposal of lowand medium-level short-lived waste is feasible in Mol, a municipality located in the Belgian province of Antwerp. MONA consists of four working groups: Sitting and Design (S&D), Environment and Health (E&H), Safety Assessment (SA), and Local Development (LD) [6], By the request of MONA, multi-criteria decision analysis is carried out in the Belgian Nuclear Research Centre (SCK'CEN) to evaluate the options for a repository of low level radioactive waste. The objective of the analysis is to support a number of representatives from the various working groups within MONA in their selection between the two acceptable options for the repository on the territory of Mol. The first option is a surface repository according to a concept further developed by the working groups of MONA, and the other option is a deep 867
868 repository in the clay layers underneath the nuclear site of Mol [2]. To simplify the description, the two options are code named Surface and Deep respectively. The role assigned to and accomplished by the researchers at SCK'CEN was to assist the MONA members to analyse the two options in an as objective manner as possible, according to the ethical charter of SCK>CEN, without any efforts to influence the participants. This study should facilitate MONA to select between both options, or in the case this appears to be difficult, at least get a well-structured overview of all factors (criteria) of importance to the judgment, and get insight into the degree to which the various criteria contribute to the selection [3]. This paper describes how the data collection and analysis process can be supported using the Intelligent Decision System1 (IDS) software. IDS is a general-purpose multi-criteria decision analysis (MCDA) tool, developed in Manchester UK on the basis of the Evidential Reasoning approach [9]. The ER approach uses concepts from several disciplines, including value theory [4] in decision sciences, theory of evidence [8] in artificial intelligence, and statistical analysis. It models multiple-criteria decision analysis problems using belief decision matrices, in which traditional decision matrixes are a special case. Belief decision matrices allow different formats of data, whether subjective or objective, random or deterministic, and with or without uncertainties including missing data, to be used in the decision analysis. A weighted evidence combination algorithm [9], a nonlinear statistical process, is used for aggregating the information in the belief decision matrix, so that more insightful information can be generated to support the decision making process. The outcomes of the ER algorithm include not only average performance scores of alternatives on each criterion, but also the distributions of their performance variations which reveal the strengths and weaknesses of each alternative. When there are uncertainties in the input data, ER algorithm can calculate the lower and upper bound of the score ranges of alternatives. Such score ranges show the global sensitivity [7] of rankings of different options to uncertainties. Theoretically, the ER algorithm for information aggregation requires only the satisfaction of value independence condition [4], which is easier to check and satisfy than the stringent additive independence condition as defined by Keeney and Raiffa [4] required by the multiple value function theory (MAVT) [1,4,5]. Therefore, to the ER approach the number of the attributes is less a concern than to the weighted sum approaches based on MAVT. During the past few years, the ER approach supported by the IDS software has been used in a number of areas2. In this paper the decision problem and process are first outlined. Then the process of how to use IDS for data collection and outcome analysis is described in detail. Global and local sensitivity analysis is illustrated by
2
A free test version of IDS is available from the website: www.e-ids.co.uk. Publications about the applications can be found in www.personal.mbs.ac.uk/jyang/joumals.htm.
869 examples. The features of the IDS for supporting such processes are summarised in the conclusion remarks. 2. Outline of the Decision Problem To compare and analyse Deep and Surface options, a group of volunteers were recruited from the various working groups of MONA. A number of meetings were organised and several forms were created to allow participants to express their individual opinions. During some of the meetings, representatives from ONDRAF/NIRAS or colleagues of SCK-CEN were present, mainly to observe the process, but they did not influence the process. These processes were led by the SCK'CEN staff with the support from MONA [2, 3]. During the meetings, a tree of relevant criteria (as partly shown in Fig. 1(a)), the evaluation scales for assessing each option on each criterion, and the relative importance of all criteria in the format of criterion weights were determined. Then, a form was prepared and sent to each participant. The form provided clear description (definition) of each criterion and its scale for scoring. Participants filled in the form individually and anonymously by scoring both options on all criteria and subcriteria. The evaluation scales or grades for assessing the options on the criteria are three different 5-point scales. For most of the criteria, the values of the five points are linearly distributed and they take the values 0, 25, 50, 75, and 100 respectively. For criterion 4.2 "Effect of inaccurate knowledge of inventory," a logarithmic scale (0, 35, 65, 85, and 100) is used because for this criterion it is regarded that a small decrease of the effects at the lower end is considered to be more significant and more appreciated than the same decrease at the higher end. For three other criteria, it is believed that an exponential scale (0, 15, 35, 65, and 100) is more appropriate. 3. Using IDS to Support the Process IDS provides support to multi-criteria decision analysis (MCDA) in problem structuring, data collection, infonnation and group opinion aggregation, outcome presentation and sensitivity analysis. They are discussed below.
870 ^ti E*» E * Jfiw*
frying
.iDl2l| lr*U analysis 6ff»rt SmtfW/ M n w ^ec *.JLJJJ"
• D B S H ; * 3**9:11 Alternative Nam< U Group Surface ZJGroup Deep
: i'l Lib
4.2 Effect of inaccurate knowledge of inventory
K - • 1.2 Safety of Population a; • 1.3 External risks S 1.4 Accessibility K v 1.5 Controllability & 1.6Retrievability
«i
t
>i
Define Evaluation Grades and Assiqn Utilities tf NecsBsaryfor
-,,|v 1. Safety , w, . 1.1 Safety of the workforce
ULiiy[0 1]
Help
1
Dsfinfl
I
OK
|
wads i 3 I025 C oJp 3 * ••
331.7 Collective memory m <. 2. Environment f] . . 3 . Health a .: 4. Technical feasibility .. 5. Economy v 6. Social acceDtabilltv
ij _
C-a la 1»-
l°= J J _ |075 - | [
j'aae'i I"
|01
j j Cancel
Fig. 1 IDS Main Window with Tree of Assessment Criteria and Assessment Scale Definition Window
33333233333
.jniisl
E3BB3B3235&BB1I
"3
Attribute 4 A Flexibiliiy
A.A Flexibility
Name:
Attribute Type:Quantitative or Qualitative OK f
Quantitative <* Qualitative Number of Grades:
Cancel
(^^crtoT^H—> Define Grades Now? les
Mo I
*]
Describe the following Attribute..
"3
Flexibility (32%). defined by MONA as such (author's translation): 'the expression "flexible" means that during the development and the realization of a technical solution the possibility remains to step back easily from decisions taken before or to postpone certain decisions during a certain amount of time. Flexibility refers to decisions related to policy or management, but also to technical decisions
"3
help
Oi!
Copy
Undo
OK
Advanced. »j
Canqgl
Fig. 2 Criterion Definition Windows
3.1. Problem Structuring — Model Implementation Using IDS, the construction of a skeleton criterion tree is straightforward as shown in Fig. 1(a), where there are a tree view window for displaying the criterion tree and a list view window to display the alternatives: Deep and Surface. After the tree is structured, the definition of each criterion, including its explanation, whether it is a quantitative or qualitative, and the number of grades (or points) used for assessing it, can be specified. The explanation recorded (Fig. 2) can be retrieved later to aid the assessment processes. The different five point scales and their corresponding standards (if available) for assessing the criteria can be specified through the interfaces such as the one shown in Fig. 1 (b). When participants assess the options, the standards are displayed (see Fig. 4) as scoring guidelines. This helps to reduce subjectivity and inconsistency.
871 In IDS, weights can be assigned to attribute through either pairwise comparison or interactive graph such as the one shown in Fig. 3, where the bars can be drag and dropped to the desired heights to represent the weight of the criteria. This is useful in both individual and group decision situations where users can see their views on criterion importance graphically. 3.2. Information Collection The assessment model implemented using IDS as described above can be distributed to individual participants to assess each option and record their scores and opinions. The assessment of each option normally involves collecting and recording evidence, comparing evidence with available standards, scoring and assigning belief degrees to the scores. A belief degree represents the strength to which the score is believed or evidenced to be appropriate for describing the option on the criterion. IDS supports the assessment process both technically and cognitively. Fig. 4 shows the interface for assessing an option on the criterion "1.2.1 Exposure to radiation." Technically, an individual participant needs only to tick appropriate grades and a belief degree will automatically appear beside each answer (Fig. 4). By default, IDS equally divides 100% belief degrees and assigns them to the ticked answers automatically. However, the participant may modify them and IDS will validate that the sum of the belief degrees is not greater than 100%. 3.3. Group Decision Support Individual participants may record their assessments of each option independently and anonymously using IDS. When an assessment is completed, individual files can be either examined separately or imported to a single file. After imported, the assessments made by individuals can be compared graphically or a collective assessment can be generated for each option by combining all the individual assessments made for the option. Fig. 4 shows an assessment generated by the software by combining the 16 participants' assessments. The belief degrees (0.3125, 0.5 and 0.1875) assigned to the three grades (0, 15 and 35) reflect the fact that there were 5, 8 and 3 participants scored the risk of Surface on "Exposure to radiation" criterion to be 0, 15 and 35 respectively. Such combination is carried out automatically for every criterion in the bottom (leaf) level of the criteria tree. 3.4. Information Aggregation The aggregation of the assessment information is through the ER algorithm [9] which is built into IDS. In IDS, the aggregation process is automatic and is updated in the background whenever assessment information is entered or modified.
872 Therefore, the users of IDS are able to see the aggregated outcomes at any stage of the assessment, even before the assessment is complete on some criteria. Relative Weights of Attributes
1 2 Safety of Popula 1 4 Accessibility 1 6 Retnevability 1 1 Safety of the wo 1 3 External risks 1 5 Controllability 1 7 Collective memor
Attributes Fig. 3 Interactive windows for Assigning Weights to Criteria
EEESEaEEJ
ill
mzw.
Father Attribute 1.2 Safety of Population Name. Current Attribute 1.2,1 BqjosuretD radiation Name.
zl
Grade Definition: Assessment standards for the selected grade (by mouse click on one of the grades below) appear hear^J if available. . ' <,
zJ Belief Degree
Hoffi to Assess
Grade Name:
\3
Attribute Definition Provide Evidence Provide Comments
Fig. 4 An Interface for Inputting Assessment Information 0.6000
Rmklmg ®f Alternatives on Overall Risks
O o CO
Group Surface Group Deep Fig. 5 Aggregated Assessment Scores with Surface Option Preferred
873 3.5. View Assessment Results IDS can provide a variety of graphics to support data presentation and decision communication. For example, Fig. 5 displays the aggregated overall risk scores showing that Surface is slightly more risky. Fig. 6 shows the distribution of the group opinions. IDS also provides a quick search function to graphically reveal the high and low risk areas, and the criteria with largest and smallest discriminating power. Dmrttotit&tf Assessment on Qmmti risks 100.00%; 90.00% \
70.00% i
ffl
50.00% < 43.00%;
(3
3000%20.00%10.00%0.00%
.-.r
|
'
.:.-.]
- . •*•«
|
iM2>a
jD 60.00%;
"5
|
V:*I*J
-
1
1
I-.,
" 1 \TLt -i.-'"1; 25
0
75
IX
30
Evaluation grades
Fig. 6 Aggregated Assessment Distribution Showing Composition of Low and High Risk Areas Avwags scores ofei&matlves on Owmit Risks
Rfinxtoy QlfJtemvuves m Overall ifste . Score Range Due to , Global Uncertainty in "1 Safety"
100%
2000
4000
6000
8000
Weight of 5. Economy
Fig. 7 Sensitivity of Preferences to Weights of the Six Main Criteria
3.6. Sensitivity Analysis Sensitivity analysis examines the robustness of outcomes under uncertainties, such as the value ranges in scores and criteria weights due to group opinion diversity. IDS provides of variety of graphs to aid the analysis. For example, Fig. 7 (a) shows that for criteria "5 Economy," no matter how its weight changes, Surface is always slightly more preferred than Deep. Fig. 7 (b) shows the combined effect of many simultaneous unknowns assuming that for Deep, the scores on "1. Safety," including all of its sub-criteria, are unknown. For the other main criteria, the scores are the same as the combined group scores as discussed in Section 3.3. The effects of the uncertainties are shown in the grey area.
874 4. Concluding Remarks This paper described how the multi-criteria decision software IDS can support the analysis of waste depository options, in both individual and group situations. IDS generates a range of outcomes graphically, including the sensitivity of outcomes to various uncertainties both locally and globally. It can search for and display the strong and weak areas of each option or the areas where the two options are most different or similar. Such visualised information can help the decision makers to compare the options side by side from different aspects and help to understand what are the risks involved in each alternative course of actions. Acknowledgements This work is the result of a request by MONA vzw to the Belgian Nuclear Research Centre formalized in contract CO-90 04 1176.00. This work is partly supported by the UK Engineering and Physical Science Research Council (EPSRC) under the grant number Grant No: GR/S85498/01 and GR/S85504/01. References 1. B.G. Buchanan, E.H. Shortliffe, Rule-Based Expert Systems, Addison-Wesley: Reading, MA, 1984. 2. B. Carle and F. Hardeman, Multi-criteria analysis as a support for MONA vzw at the choice between a surface or a deep depository for low radioactive nuclear wastes in municipality of Mol, Scientific Report of the Belgian Nuclear Research Centre sck'cen-blg-994, October 2004. 3. B. Carle and F. Hardeman, A multi-criteria analysis process to clarify preferences in the choice for a low radioactive waste repository in Belgium. 14th SRA-Europe Annual meeting , Como, Italy. Book of Abstracts, 75-76. 4. R.L. Keeney, H. Raiffa, Decisions with Multiple Objectives, Cambridge University Press, 1976. 5. R. Lopez de Mantaras, Approximate Reasoning Models, Ellis Horwood Limited: Chichester, England, 1990. 6. B. Meus and H. Ceulemans, MONA, Public participation in the siting of a LLW repository in Mol, Belgium, Proceedings oflCEM '03: The 9th International Conference on Radioactive Waste Management and Environmental Remediation, September 2 1 - 2 5 , 2003, Examination School, Oxford, England. 7. Saltelli, S. Tarantola and K. Chan (1999). "A quantitative, model independent method for global sensitivity analysis of model output". Technometrics, 41 (1), 39-56. 8. G.A. Shafer, Mathematical Theory of Evidence, Princeton University Press, 1976. 9. J.B. Yang, D.L. Xu, On the evidential reasoning algorithm for multiple attribute decision analysis under uncertainty, IEEE Transactions on Systems, Man and Cybernetics Part A: Systems and Humans 32 (2002), 289-304.
MODEL OF FUZZY EXPERT SYSTEM FOR THE CALCULATION OF PERFORMANCE AND SAFETY INDICATOR OF NUCLEAR POWER PLANTS KELLING CABRAL SOUTO 1 Programa de Engenharia Nuclear, Universidade Federal do Rio de Janeiro COPPE/UFRJ, Caixa Postal 68509 Rio de Janeiro, RJ, 21945-970, Brasil ROBERTO SCHIRRU COPPE/UFRJ, Laboratorio de Monitoracao de Processos, Caixa Postal 68509 Rio de Janeiro, RJ, 21945-970, Brasil This work develops the basis for a expert system, which is able to infer about a generic structure of indicators, in such a way that it monitor, measures and evaluates questions related to project, operational safety and human performance according to politics, objectives and goals for the nuclear power plant Angra 2. This structure organized in graphs and in a context of objects orientation, represents the knowledge of the expert system and is mapped out in a fuzzy concept. The inference engine is of a backward chaining type associated to a process of search in depths, in such a way that it establishes the representative status of the plant, making possible to analyze and manage of its mission and situation.
1. Introduction Monitoring a nuclear power plant in order to promote a secure operation is the objective of all involved in this industry However, it's considered a critical process, once the number of variables to be continually observed is big as well as the difficulty in assuring performance with safety above anything else. The operational experience, developed for the past 30 years, has lead the industry to understand that safety is assurance of production and to identify a set of characteristics a nuclear power plant must have in order to operate safely. The present challenge is then, to measure these characteristics. Such a challenge made the so called performance indicators as indices to monitor, measure e evaluate these characteristics come to pass, making it possible to manage the
'Supported by FAPERJ (Fundacao Carlos Chagas de Amparo a Pesquisa do Estado do Rio de Janeiro).
875
876 plant, analyze the business activities, as well as lead to decisions in strategic planning. This way, several works related to performance indicators have been published. The International Agency of Atomic Energy (AIEA) presents a general guide of indicators [1] which suggests actions, conditions e procedures in order to reach the safety requests, as well as assure a high performance rate. This guide gives a generic structure for identification and organization of the performance indicators directly related to the requested safety characteristics, serving as an instrument to compare the indicators to the goals and objectives identified to evaluate positive and negative performance aspects. However, the concept in this guide goes beyond simple treatment of the indicator's manipulation. The AIEA states that the numerical value of any indicator may not be meaningful if taken solely into consideration, but may be enhanced when taken together with other indicators. It permits an analysis of macro problems of the plant and consequently the establishment of its representative status. The Institute of Nuclear Plants Operations (INPO) [2] also presents works related to performance indicators, and identifies a valuable set of indices. In this context, the making of an intelligent system in real time was taken into consideration, which would be able to infer a generic structure of indicators, build with a specialist and according to the concept suggested by AIEA and INPO, in order to establish the representative status, in colors, of the nuclear power plant Angra 2 making an analyses of the management of the mission and its situation possible. This system was supported by expert systems theory [3] and fuzzy logic [4], for they permit the elaboration of systems able to reason, from the usage of knowledge about the specific domain of a problem with an effective presentation of reality in relation to the human reasoning behavior. It refers to a expert system that works with the basic compounds of knowledge and inference engine and which uses fuzzy information, once it permits the treatment of uncertainty and inexactness using a set of membership functions and fuzzy rules, to reason about the data. About the fuzzy rules leading directly to the establishment of the color each indicator, they are produced by the system through four proposed models: conservative, proportional, deterministic and specialist. The logical structure of the expert system, called inference engine is of backward chaining type [3] associated to a process of search in depths [3], structured in order to make inferences about a set of fuzzy indicators.
877 With the aim of test the functions of the developed system, an environment of tests was implemented trying to show the viability of development of a expert system together with a well structured and mapped out in fuzzy form set of indicators, as a tool for the monitoring, management and analysis of several parameter of the plant. Fulfilling the needs of the nuclear area concerning managing models that permit the strategic planning of the business. 2. Model of Fuzzy Expert System 2.1. Generic Structure of Indicators Concerning the performance indicators, a generic structure of these indices was built with the help of three information sources: the guide established by the AIEA, the suggested indicators from INPO and the knowledge of the Angra 2 specialists. The development of de indicators structure started when the main concept was taken, the nuclear power plant safety performance. AIEA states that in order to obtain a high level of safety operational performance it is necessary to interact design, safety operation and human performance. This concept is presented as a main indicator and worked in a highest level in the hierarchical structure, as it is shown in Figure 1. In the next level, safety characteristics were established and mapped out by the following key indicators: Plant operates safely; Plant operates in low risk; and Plant operates with a positive safety attitude. Three other levels of indicators (Figure 1) are suggested by the guideline, expanding the structure up to a level possible to be measured through the general indicators (concepts that express the key indicators of the above level), strategic indicators (detail the general indices and link to the indicators in the same level) and the specific indicators (represent the last structural level and are the measurements of the plant).
Safety Operational Performance
4... Keys Indicators
j
:!::;::::;;:;:;:::;:::t:;:::::::::::::;:::!:;:::::::::::; General Indicators
•
:!;::::i:i:::!::::::i:iir::i:::: Strategics Indicators
;
Specifics Indicators Figure 1. Generic structure of indicators recommended by AIEA.
The indicators structure in this work, up to the strategic level was constructed exactly equal to the one suggested by the Agency, whereas the specific level presents the particular indices of Angra 2 power plant, done with the help of specialists and INPO suggestions. 2.2.
Knowledge Representation
The generic indicator structure that is part of the system knowledge was organized in a graph, what naturally leads to the concept of object classes and consequently to the establishment of knowledge presentation by the use of general class rules, based, therefore, on the Object Orientation theory [3], in which each indicator (node) is treated as an object orientation paradigm, getting its characteristics and methods Generically, two kinds of classes were created, in which the indicators are set according to their functions. These classes are: sensor nodes (represent the specific indicators responsible to the acquisition of the plant data) and connector nodes (represent the other indicators of the generic structure). They are father nodes, which relate their sons by means of an and type arc. Besides the general class rules, fuzzy rules are used, from the fuzzy logic technique. These rules are directly addressed to the establishment of each indicator color (status), given by the system for four proposed models. 1. Conservative model: gives the fuzzy rules (If-Then), in which the preceding structure (If) is the color's combination of the sons indicators and the following one (Then) is the result of the most dangerous color among the colors in the preceding one.
2.
Proportional model: the preceding of the rule is obtained as described in the conservative model and the following one is the result of a conservative proportion among the previous colors. 3. Deterministic model: the preceding is obtained as in the previous models and the consequent part is given from the application of a simple numerical procedure (weighted mean). 4. Specialist model: the preceding is obtained the same way it is in the previous models, whereas the following one is the result of the indication of a specialist. The use of the fuzzy logic in this work, concerning the knowledge presentation, is due to the possibility of establishing performance zones (fuzzy sets) for each of the indicators values, considering the inherent uncertainties of the procedure. It also permits to associate color to these performance zones what makes easier to have a graphic view of each plant indicator status and to relate them. The used color were: green, white, yellow and red, as it is recommended by AIEA, which represents a scale of performance that ranges form good to unsatisfactory, respectively. 2.3. Inference Engine The inference engine developed for the system is of a backward chaining type associated to a search in depth, what makes it possible to get a chaining process backwards and in depth, beginning with evidence and getting to a conclusion. Its procedure is sketched as it follows: Let it be the graph presented in Figure 2 an indicator structure form which is intended to obtain the indicator A color.
toot B
•Jul.
ka
Figure 2. Example - indicator structure.
The inference engine starts the process by the A indicator trying to establish its color. However, it is necessary to establish the colors of the sons B and C. As they are specific indicators and belong to the sensor node class, they get
acquisition data right from the plant, being, therefore, the determination of its colors done only by data reading and their fuzzification (fuzzy logic procedure that transforms a numeric input into a membership of the fuzzy set) of the same in the fuzzy sets of each color indices. Once the color in B and C nodes is established, it is necessary to give the fuzzy rule(s) to indicator A, according to its kind of model. This way, let A present the conservative model, so the preceding of the given rule will be the result of the combination of the B-C colors and the following will be the most dangerous color among the color in the preceding ones. With the B and C color, the A fuzzy rule(s) and using the general class rules (If color node B and color node C Then gives color node A) it is possible to determine A color. It is only necessary to start the fuzzy rules, and then defuzzificate (fuzzy logic procedure that transforms the output fuzzy set into a numeric value - MoM method [4]), the result of these rules into a numeric value and, at last, use this output value as the input for node A. This input will be fuzzificated and will give the color of this indicator The inference engine was programmed in CLOS (Common Lisp Object System) [5] which working process is given mainly through the basic definition (GOAL) and two main methods (EVALUATE and PROCESSING-MODEL). Further details of the structure and functioning of the inference engine and the format of the fuzzy rules and general class rules, are described in the reference [3]. 3. Fuzzy Expert System Implementation The basic definition developed was called GOAL function implementation can be seen below:
and it
(defun goal (track) (unless (null track) (progn (setf nodo (car(last track))) (nome nodo) (avalia nodo) (cond ((equal (estado-cor-pertinencia nodo) '(Null)) (setf track (append track (busca-nodo-pelo-nome (busca-profundidade-esquerda nodo))))) ((not(equal (estado-cor-pertinencia nodo) '(Null))) (progn (unless (null (cdr track)) (atribui-estado-filho (first (cdr (reverse track))))) (setf track (reverse (cdr (reverse track)))) ))) (goal track) ) )
881 This function manages the system inference process about the indicators structure, establishing the color of each node running the rules, in a backwards chaining process. The GOAL function gets to its operation through the EVALUATING method that runs the rules, when called by each node, determining the color of each indicator. During this process, EVALUATING starts another method, called PROCESSING-MODEL, that permits the production of essential fuzzy rules in the production of the node color. The running of the fuzzification and defuzzification procedures, characteristics of the fuzzy logic theory, and used in the establishment of the indicators status are part of the EVALUATING method, as it is described in item 2.3. Right after the indicator structure if defined and the fuzzy expert system is implemented, an environment for test is developed, with the aim of verify and valid the functioning, concerning viability and approach, as computational performance. The test are now running and the methodology use to evaluate the presented results of the system for the simulations taken are being based on the comparison of given results to the desired ones, as well as their analyses by specialists. As a matter of illustration, Figure 3 shows the output presented by the test environment of the fuzzy expert system for the example given in item 2.3. The values 5 and 6 were used as inputs of B and C indicators, respectively, and the conservative model was used for given the fuzzy rules in node A. It is important to notice the presence of a graph presentation with the node reference A color (status) and its indication of tendency. iiJiiiJimjui.i.iJBriaai j f" N o d o R a l z -
EstadoNodoRalz-
gwm
f Referenda r^"*
Fttridp Nutto Rafz ~
AitAlise Estado Nods Rate
1
"VT-" A - AMARELO '- -.OB-AMARELO ' '.• DC-VERDE
Figure 3. Result of the fuzzy expert system for the example given in item 2.3.
882 Using the structure of indicators constructed to Angra 2, the status of the nuclear power plant was investigated in January/2005, where it was stopped with its energy production and May/2005, when reconquested its normal process. The results showed by model of fuzzy expert system, were: red and yellow, respectively, as soon as the whished, since in January the nuclear power plant was prejudiced its main activity and in May it was with reflexes of the stopping period. Further details of the constructed model and of the tests realized are described in the reference [3]. 4. Conclusions Consistence and performance presented by the intelligent system in the initial test, permit to come to the following conclusions: It is viable to develop a expert system that uses fuzzy knowledge to manage measure and evaluate a nuclear power plant from a well defined structure of performance and safety indicators. It also presents as an efficient help managerial tool to the strategic planning of a nuclear plant. The combination of the expert systems technique with the success of fuzzy logic creates a progress in technology for this kind of system, for it permits better presentation of reality in relation to the reasoning human behavior Modeling the object orientation rules makes it possible to have an efficient implementation strategy for the project of intelligent systems. Acknowledgments I'd like to express my gratitude to the specialists of Angra 2 nuclear power plant for all the help in elaboration of the indicators structure and validation in the results presented by the expert fuzzy system developed. References 1. IAEA-TECDOC, "Operational Safety performance Indicators for Nuclear Power Plants", TECDOC-1141, (2000). 2. INPO, "Indicators of Changing Performance", INPO 01-005 (2001). 3. K. C. Souto, "Sistema Especialista com Logica Nebulosa para o Calculo em Tempo Real de Indicadores de Desempenho e Seguranca na Monitoracao de Usinas Nucleares", Tese de D.Sc, COPPE/UFRJ, RJ (2005). 4. K. M. Passino e S. Yurkovich, "Fuzzy Control", pp. 21-103 (1998). 5. Manual Allegro CL (Versao 5.01 for Windows) - Allegro CL/Dynamic Object Oriented Programming System, Franz Inc.
ARTIFICIAL INTELLIGENCE APPLIED TO SIMULATION OF RADIATION DAMAGE IN FERRITIC ALLOYS* ROBERTO P. DOMINGOS(1), GENNARO M. CERCHIARA(2), FLYURA DJURABEKOVA(1>, LORENZO MALERBA(1). (1)
Reactor Materials Research Dept, SCK-CEN .Boeretang 200 Mol, 2400, Belgium,
(2)
Dipartimento di Ingegneria Meccanica, Nucleare e della Produzione (DIMNP), Universita degli Studi di Pisa, Via Diotisalvi n 2 - CAP 56100, Pisa - Italia.
Molecular dynamics (MD) is the only computational tool that, without any approximation, can be applied to study atomic level phenomena involving radiationproduced point-defects and their evolution driven by mutual interaction. Kinetic Monte Carlo (KMC) is suitable to extend the timeframe of an atomic level simulation to simulate an irradiation process, but all mechanisms involved must be known in advance, e.g. from MD studies. The present work is an effort to integrate artificial intelligence (AI) techniques, essentially an artificial neural network (ANN) and a fuzzy logic system (FLS), in order to build an adaptive model capable of improving both the physical reliability and the computational efficiency of a KMC model fully parameterized on MD results. 1. Problem Statement The integrity of reactor pressure vessels (RPV) of current nuclear power plants can be compromised due to changes in the mechanical properties of steels caused by the accumulation of radiation induced defects (essentially vacancies - i.e. empty lattice sites - and interstitial atoms - i.e. atoms out of their lattice site). To improve the understanding of the creation and evolution of these defects and their impact on the mechanical properties of RPV steels, computational atomic scale models are built [1]. Molecular dynamics (MD) [2] is a statistical mechanics numerical technique capable of providing a precise description of how radiation-induced defects behave in a certain material at the atomic level. However, the MD timespan is limited to the scale of tens of nanoseconds, so that slow diffusion driven phenomena cannot be studied with this tool. Kinetic Monte Carlo (KMC) techniques [3], parameterised on MD results, can be used instead to extend the timeframe of the simulation. In particular, vacancy diffusion driven phenomena can be studied using atomistic KMC (AKMC) models [4], where atoms of any chemical species are distributed on a rigid lattice and the physical process determining the This work is supported by the EC-funded FP6 Integrated Project PERFECT (Prediction of Irradiation Damage Effect in Reactor Components). 883
884 evolution of the system is the exchange of position between an atom and a vacancy (diffusion jump). In this work the physical system taken as a reference is Fe-Cu, due to its relevance in connection with the problem of RPV steel degradation [5]. The basic formula on which the AKMC method is based is the Arrhenius expression for the frequency of vacancy jumps, r=v-exp(-£,0/ftfl7), where kB is Boltzmann's constant, T the absolute temperature, v an attempt frequency, here considered constant (the adopted value is 6.010 12 s"', close to the Debye frequency for Fe), and Ea is the energy barrier to be overcome for the exchange of a vacancy with a neighbouring atom to occur, which is an a prori unknown function of the local atomic configuration. Up to now, heuristic formulae, which explicitly correlate the migration energy to the initial and final energy state, before and after the jump [4], have been used to determine it. This procedure, however, in addition to being approximated, requires multiple total energy calculations at each Monte Carlo step and totally neglects the effect of atomic relaxations. Given a good interatomic potential, these barriers can be exactly calculated by MD, allowing also for atomic relaxation, but the necessary computing time makes it impossible to think of launching at each AKMC step a multiple, full-MD barrier calculation. The objective of the present work is to show that it is possible to use artificial intelligence (AI) to substitute, after appropriate training, full MD calculations in AKMC simulations. In this way the AKMC simulation will be at the same time computationally more efficient and physically more accurate. The prerequisite to any method to solve our problem is a numerical way of defining the local atomic configuration, as a function of which the energy barrier should be provided. This has been done by associating to each atomic site around the exchanged vacancy (V) an integer number (0 for a V, 1 for an Fe atom, 2 for a Cu atom). If only the first nearest neighbors (Inn) of the V and the jumping Atom (JA) are included, we are working in the Inn approximation and the string will contain (being in a body centred cubic, bcc, lattice) 14 positions (see Figure 1). Given a coding system for the local atomic configuration, the simplest solution to the problem is to pre-tabulate the energy barries corresponding to all possible configurations. However, it is reasonable to produce these tables only when the number of configurations (possible table rows) needed to approach the physical problem is limited, i.e. so long as the size of the tables remains manageable. Note that in Inn approximation the total number of configurations is 3 M : although this number can be reduced by about a factor six allowing for symmetries, it gives rise to a fairly large table. In addition, the Inn approximation may turn out to be insufficient from the physical point of view.
0 AT
1 A
2 Bl
3 B2
4 B3
5 CI
6 C2
7 C3
8 Dl
9 D2
10 D3
11 El
12 E2
Figure 1 - Coding for configurations in Inn approximation
13 E3
14 F
885 An alternative method should be to find patterns in the dependence of the energy barriers on the configuration and use a model instead of tables. From the mathematical point of view these barriers are a discrete function of 14 integer variables. No classical regression approach has any hope of being able to interpolate such a function and even less to provide an extrapolable model. It is in this framework that AI has been considered to provide a solution to the problem. 2. Model design and application The problem of substituting an AI system to the tables of energy barriers versus configurations, including the case where these tables cannot be produced for all possible configurations (beyond Inn approximation), passes through two steps: (i) classification of the configurations into groups providing energy barriers in a certain range and (ii) actual regression within those groups. This was made necessary in particular by the fact that the sequence of the configurations according to which they have been tabulated is arbitrary and does not have any reason to be considered physically meaningful. The AI system first considered to solve the problem consisted of a hybrid model, composed of an artificial neural network (ANN) [6] - together with a genetic programming regressed model [7], The reason for trying to apply symbolic regression comes from the fact that it offers the unique benefit of giving the possibility to human beings to get insight and interpretability of model results. This hybrid model was supposed to learn first how to classify atomic configurations and subsequently how to obtain, by means of evolutionary computation, an adequate regression model to describe the migration energy using a non-linear equation inside each class. Yet, the performance of the regressed equation turned out to be insufficient, despite our effort to improve the genetic programming model. On the other hand, the use of ANN for the classification and eventually for interpolation provided a reasonable performance. It was hence decided to build the AI system architecture by substituting the regressed model by a specialized ANN, extensively applied to interpolation tasks (Generalized Regressed Model), as shown in Figure 2. classification
regression
model
Figure 2 - AI System Architecture The data used for training and validating the AI architecture in the present work was taken from two tables, produced by MD using a recent interatomic potential for the Fe-
886 Cu system [8]. These tables contain, when combined, all the configurations that a single vacancy may encounter during Cu precipitation due to thermal annealing. Of all data, 15% was used for training, 20% for testing and the rest for full validation. For the classification task the used architecture was a multilayer perceptron and the applied learning process was back propagation. The performance of this was considered acceptable and is presented in Figure 3 for a specific subset of data. In this figure all data are production data, i.e. data corresponding to cases never seen before by the system. It can be appreciated that the ANN succeeds in correctly classifying a given configuration inside the corresponding energy range. In the worst case it classifies a configuration in a class immediately above or immediately below the correct one, but this behavior does not compromise the performance, since a class immediately nearby the correct class represents a close energy range, which has little effect in the interpolation process. Inside each energy range a generalized regressed neural network was trained to correctly map input configurations into migration energies. The data set for this ANN was selected in the same fashion as for back propagation. In Figure 4 a graph for the performance of the interpolation task in a low energy range (production data) is shown for both tables. The maximum absolute error in the whole energy range never exceed 0.08 eV, with a mean absolute error below 0.03 eV. It must be emphasized that this performance was achieved for configurations never seen before by the network. To have an idea of how the obtained error bar may affect the value of the parameters defining diffusion processes, a series of simulations of Cu-V pair diffusion in the 250-550 K temperature range were performed, using the AKMC code with both the migration barriers predicted by the ANN in the Inn approximation and those tabulated (denoted as C05). The result is shown in Figure 4(c) and demonstrates that there is excellent agreement between both simulation approaches. The advancement of Cu precipitation was also seen to be equally described by tables and ANN.
Class calculated o
1
225
449
Class predicted
673 897 1121 1345 1569
Figure 3 - Classification of configurations according to energy range 3. Fuzzy Model While developing the ANN, a fuzzy logic system (FLS) coupled to the ANN has been developed to predict the expected error for a given configuration. The FLS is essential to assess the uncertainty inherent to the use of the ANN and to evaluate the "risk" associated with its use instead of the tabulation, so as to be able to build an integrated system, capable of feedback. This will become necessary to work beyond the Inn approximation. Since in those cases the full table containing data for all possible configurations cannot be produced, each time the ANN is called to provide the required energy barrier the FLS will evaluate the expected error; if this is not acceptable, the code
887 will use tabulated data instead (if available) or will launch an ad hoc MD simulation whose result will be stored. The additional tabulated data thereby created will then be used for a further round of training of the ANN, until the convergence is found acceptable, with the backup of tabulations for critical configurations, if needed. The ANN-FLS for error prediction has been developed for all Inn approximation cases. The error on the predicted energy is first calculated for the tabulated set using the ANN results compared to the tabulated data, as shown in Figure 4(b). The first step for its realization is the determination of the important parameters in terms of type and number: these are the factors that govern the problem which are to be described through the membership functions [9], defined in the continuous interval [0,1] and chosen to be triangular in the present case. For our application the FLS provides one output (the error) and is built from four input data, namely, with reference to Figure 1: type of atom in A position, type of jumping atom, position of first copper atom in the string, energy barrier value. The first and second input give four possible combinations; the third input is a natural number between 1 and 13; the fourth input is built calculating the ranges of values in which the energy changes meaningfully, as shown in Figure 5 (s = small; p = positive; n = negative; a = average so vs = very small). The other essential element of the FLS is the set of linguistic rules (e.g. Table 2), which are built by counting the frequencies of event occurrences [10,13] obtained from the analysis of the error trend initially produced as a function of the above mentioned parameters, which completes the "expert opinion" used in the focusing of the main parameters[l 1,12].
1 41 81 121 161 201 241 281 321 361401 441 481 521 561 601 641 681 721 Error /En*rgy J.5E-02 2.0E-02 • 1.5E-03 i
<•:«*«"
•i,0E-03 -•
•. • £4v&^|yKSfef4^*'" : *"'"
•".•".'" .''""'* ''v ''•<•¥'' '-*'• * X " ' < ' * ^ W X 5 ' ' >'•"'• •
.-l,GF.-02 -
'
'
'
• '*•
. . . . - •
(b)
••••
.-. ,
-t,.<E-«2 * ,-MI-O:
0,20
1
1
1
.
.
>
.
1
.
.
0,2S
0,30
'3,3!
0,40
6,45
0,J0
0,55
0,60
0,61
0,70
.
8,7} {,vj
888
Diffusion coefficient • -H
(.05 niip -OOJ'ieV S N 1 mi.e
^
0 u.< i.-V
-
(C) •<;V-
V
Q
SD
•15
•10 1
l/k B T ( e V ) Figure 4 - (a) Predictions of the AI system versus reference; (b) absolute error versus energy (in red the error trend function); (c) Comparison of results for the diffusion coefficient of a Cu-V pair obtained by AKMC simulation using the barriers predicted by the ANN and the tabulated ones. Table 2 - Example of fuzzy rules event tree. A-atom
J-atom
•"u-position
Low
Iron
Iron
Average
Energy ws vs s as a ab b vb wb
Error ws vvs vvs vvs ws vs vsp vsn sn
vvs vs s as a
vvs vvs ws vvs vs
ay
fA'.-lBlt'B'ilrt'Sr
Figure 5 - Fuzzy input "energy-level" and its partitions.
Figure 6 - Error vs. energy level and Cu position (with J = Cu and A = Fe).
889 Finally, by elaborating the data with graphical inference methods [13] we obtain the error trend as a function of the main parameters, as shown in Figure 6. 4. Summary, conclusions and outlook In order to simulate Cu precipitation in Fe using an AKMC approach a correct evaluation of the energy barriers for the vacancy exchange with an atom as functions of the local atomic configuration is needed. This can be done by MD, but the associated computing time prevents this solution from being systematically used on-the-fiy during the AKMC simulation. Tables containing all possible cases can be built, but they become quickly too large to be managed. An AI system, whose architecture is based on an ANN, has then been developed, which demonstrated to be able, after being trained on a subset of data (15% of a complete tabulation) to correctly classify atomic configurations according to the corresponding energy barrier and to find the correlation between atomic configurations in the lattice, so as to predict the value of the energy barrier itself for configurations never seen before, with a mean absolute error of 0.03 eV and a maximum of 0.08 ev. A FLS for the prediction of the error, to be used for risk assessment and for the construction of a system with feedback, has also been developed; the error is considered in this scheme as a multifunction output dependent on energy range, type of atoms and position in the array. Both the ANN and the FLS modules are now being integrated into the AKMC code to form a single package and the subsequent objective is to improve the system to work beyond Inn approximation, with the smallest possible training set. This work demonstrates that adequately trained AI systems can be of help to find patterns in magnitudes strongly dependent on the local atomic arrangement, which in principle would require in each case a dedicated MD study and for which it is difficult find a generalisable physical law based on first principles. It is thus believed that similar schemes can be used also for other problems of radiation damage and, more generally, materials science. References [ 1 ] See e.g. https://www.fp6perfect.net/perfect [2] F. Ercolessi "A molecular dynamics primer", available at http://www.fisica.uniud.it/~ercolessi/md/rnd/ [3] K.A. Fichthorn and W.H. Weinberg, J. Chem. Phys. 95(2) (1991) 1090. [4] C. Domain, C.S. Becquart and J.-C. van Duysen, in: "Microstructural Processes in Irradiated Materials", Editors: S.J. Zinkle, G.F. Lucas, R.C. Ewing and J.S. Williams, Mater. Res. Soc. Symp. Proc. 540 (1999) 643. [5] G. R. Odette, Scripta Met. 11 (1983) 1183. [6] S. Haykin "Neural Networks: A Comprehensive Foundation", New York: MacMillian(1994). [7] J.R. Koza "Genetic Programming - On the Programming of Computers by Means of Natural Selection", Cambridge (1992). [8] R.C. Pasianot and L. Malerba, submitted to Phys. Rev. B; see also L. Malerba and R.C. Pasianot, SCK-CEN External Report ER-6, February 2006. [9] Zadeh L.A "Fuzzy sets as a basis for a theory of possibility", F.S.S., 1978.
890 [10] D.Dubois, H.Prade "Unfair Coins and Necessity Measures: Towards a Possibilistic interpretation of Histograms"- England Publishing Company, F.S.S. 10 (April 1983). [11] Glenn Shafer "A mathematical theory of evidence"- Princeton University Press 1976. [12] Didier Dubois and Henri Prade "Possibility Theory, an Approach to Computerized Processing of Uncertainty"- Plenum Press, New York - 1988. [13] T.J. Ross "Fuzzy logic with engineering applications" - McGraw-Hill, New York, 1995. [14] S. Ferson et al. "Constructing Probability Boxes and Dempster - Shafer Structures" Sandia National Laboratories - SAND2002-4015 Jan 2003.
PARTICLE SWARM OPTIMIZATION APPLIED TO THE COMBINATORIAL PROBLEM IN ORDER TO SOLVE THE NUCLEAR REACTOR FUEL RELOADING PROBLEM ANDERSON ALVARENGA DE MOURA MENESES Programa de Engenharia Nuclear (COPPE/UFRJ), Universidade Federal do Rio de Janeiro, Caixa Postal 68509, 21945-970, Rio de Janeiro, RJ, Brasil, email: ameneses@con. ufrj. br ROBERTO SCHIRRU Programa de Engenharia Nuclear (COPPE/UFRJ), Universidade Federal do Rio de Janeiro, Caixa Postal 68509, 21945-970, Rio de Janeiro, RJ, Brasil, email: schirru@lmp. ufrj. br
This work focuses on the use of the Artificial Intelligence metaheuristic technique Particle Swarm Optimization (PSO) to optimize a nuclear reactor fuel reloading. This is a combinatorial problem, in which the goal is to find the best feasible solution, minimizing a specific objective function. However, in the first moment it is possible to compare the fuel reloading problem with the Traveling Salesman Problem (TSP), since both of them are combinatorial and similar in terms of complexity, with one advantage: the evaluation of the TSP objective function is more simple. Thus, the proposed method has been applied to two TSPs: Oliver 30 and Rykel 48. In 1995, KENNEDY and EBERHART presented the PSO technique to optimize non-linear continuous functions. Recently some PSO models for discrete search spaces have been developed for combinatorial optimization, although all of them have different formulation from the one presented in this work. Here we use the PSO theory associated with to the Random Keys (RK) model, used in some optimizations with Genetic Algorithms, as a way to transform the combinatorial problem into a continuous space search. The Particle Swarm Optimization with Random Keys (PSORK) results from this association, which combines PSO and RK. The adaptations and changings in the PSO aim to allow the appliance of the PSO at the nuclear fuel reloading problem. This work shows the PSORK applied to the TSP and the obtained results as well.
1. Introduction One of the problems that stimulate new techniques research in Nuclear Engineering is the nuclear reactor fuel reloading optimization problem. The reloading operation substitutes for fresh nuclear fuel assemblies part of the 891
892 burned nuclear fuel assemblies, which is taken off the reactor core. Some specific criteria are followed to place fresh fuel assemblies, as well as to reorganize the old ones which remain in the core, in order to optimize the fuel burn up. Thus, it is a combinatorial problem, which objective function to be optimized depends on many factors. The multiobjective characteristic, the great number of feasible solutions and the non-linearity of the problem difficult its optimization. The reloading objective function is evaluated with specific codes of Reactor Physics and they take a considerable amount of computational cost. However, we can use the Traveling Salesman Problem (TSP) for a previous analysis of the new technique, since both a 30~50 nodes (cities) TSP and the Reloading (with 1/8 core simmetry) are equivalent combinatorial problems with no repetitions allowed on its feasible solutions and similar in complexity, with one advantage: the evaluation of the objective function of the TSP is more simple. The TSP is a NP-Hard combinatorial problem [1, 2] whose feasible solutions are discrete sets, corresponding to a permutation of the sequence of visited cities (nodes). The objective is to find the shortest length itinerary, passing throughout all the cities only once and turning back to the first one. The PSO technique was first presented at 1995 by KENNEDY and EBERHART [3,4] and the algorithm follows a collaborative search model, taking into account the social aspects of intelligence. According to this model, many applications to optimization problems of continuous search spaces have been developed [5]. There are some contributions for discrete search spaces as those of KENNEDY and EBERHART [6]; WANG et al. [7], showing results for a 14 nodes symmetrical TSP; CLERC [8], who adapted PSO to solve a 17 nodes TSP; YIN [9], who uses PSO to obtain optimal approximations of digital curves; and SALMAN et al. [10], who applied PSO to the Task Assignment Problem (TAP), obtaining discrete sets as feasible solutions by truncating real numbers. We must say that although PSO is an efficient algorithm to optimize continuous functions, there is no satisfatory model to its discrete version. Here we have associated the PSO philosophy to the Random Keys (RK) model [11] in order to optimize two TSPs: Oliver 30 and Rykel 48. This article has the following structure: section 2 shows TSP formal definition; section 3 explains the PSORK (Particle Swarm Optimization with Random Keys); the obtained results are in section 4 and conclusions and future work proposal are done in section 5.
893 2. TSP Formal Definition TSP is a reference problem in the Computational Complexity Theory [1, 2], considered of difficult solution although having a simple formulation: given a number n S 3 of cities (or nodes) and the distance between them, the goal is to determine the shorter total distance path, visiting each city once and turning back to the first one visited. There are different kinds of TSP. The symmetrical one is that in which given two different cities i and j , the distance dy to go from one to another is the same in the inverse path, it means dy = dp V i, j = 1, ..., n, and it has [(n-l)!]/2 feasible solutions. For an asymmetrical one, dy * djj V i * j and there are (n-1)! different paths. BURIOL [12] presents its mathematical formulation as a minimization problem. 3. PSORK (Particle Swarm Optimization with Random Keys) 3.1. Particle Swarm Optimization (PSO) The concept of Particle Swarm was proposed in 1995 by KENNEDY e EBERHART to optimize non-linear continuous functions [3]. PSO is a search process based on the social learning metaphor. Each individual guarantee success from its own experience, as from the experience of the group, balancing exploration and exploitation. Each particle of a swarm has a position and a velocity. The positions are Xy = (XJI, XQ, ... , Xjn) with i = 1, 2, ..., P, where P is the number of particles of the swarm and n is the number of dimensions of the search space. The position of each particle represents a feasible solution to the problem. Thus, the objective function f(xyt+1) is evaluated at each iteration t+1 and the best individual and global positions are updated. The global best position is gbestj = (gbest1; gbest2, ... , gbestj- The best position obtained by each particle i, we have pbesty = (pbestn, pbestj2, ... , pbestin). The velocities are vy = (vn, vi2, ... , V;,,). The velocity and the position of each particle are updated according to the equations (6) and (7). Vijt+1 = wvij' + cir,' (pbesty - xy') + c2r2' (gbestj - Xj/) and
1
x ^ x y ' + vij* .
(1) (2)
At the right side of eq. (1), the first term represents the influence of the own particle motion, where w is the inertia weight; the second term represents the individual cognition, which depends on the particle's previous behavior; and the third term represents the social aspect of intelligence, based on a comparison between the particle's position and the best result obtained by the swarm. Eq. (7)
894 describes how the positions are simply updated. Both ci and c2 are acceleration constants; ri e r2 are uniform distributed random numbers. The positions and velocities are inicialized randomly at implementation. 3.2. Random Keys (RK) Random Keys model was proposed by BEAN [11], It encodes and decodes a solution with random numbers. These numbers obtained randomly in a (0,1) uniform probabilistic distribution are keys for sorting other numbers, in order to form feasible solutions into a problem. Here we use the Single Machine Scheduling Problem (SMSP) approach [11]. If the key sequence SA = (0,39; 0,12; 0,54; 0,98; 0,41) had been obtained randomly, the resulting decoding would be VA = (2, 1, 5, 3, 4), since 0,12 is the minimum and it is in the 2nd. position in SA; 0,39 corresponds to the first position and so on. Hence the decoding gives discrete sets that can solve the problem. It means that a search on a real continuous space is generating feasible solutions to a combinatorial problem. 3.3. PSORK (Particle Swarm Optimization with Random Keys) SALMAN et al. [10] apllied PSO to the optimization of the Task Assignment Problem (TAP) truncating the components of the positions in order to obtain feasible solution in the discrete optimization. In the TAP case, each feasible solution has the tasks executed by the processors, and repetition is allowed. For example for 5 processors, the list (3, 2, 1,3, 1) representing tasks for each one of them would be a possible solution. The velocities Vy would be obtained by eq. (6) and so the positions xy by the eq. (7), but it would only be a feasible solution if its components had been truncated. Then it would be evaluated by the objective function. For the TSP, since the repetition of cities invalidate its solutions we use RK to generate feasible solutions [13, 14], as figure 1 shows. Position XJI1 —>
4.3
2.3
1.2
2.6
4.2
Velocity VH' -
-0.2
+0.4
+0.2
-0.7
+0.2
4.1
2.7
1.4
1.9
4.4
1
2
3
4
5
New position xu**
(a)
i
(b)
' 1
1
Auxiliar vector uu -
3
4
2
1
5
(c)
Figure 1. (a) Position and velocities for a 5 cities TSP. (b) New positions as random keys, (c) Auxiliary vector indicating the visited cities order, which will be evalueted by the fitness.
895 In short, the main adaptation is the interpretation of the position. It does not represent the order of the cities itself, but the set of keys that allows the decoding of the information acquired along the iterations. Thus, the positions do not need to be rounded or truncated as in the PSO model for TAP [10] for example. The positions vector informations are decoded by the RK and it gives a feasible solution to be evaluated by the objective function. Thus, we obtain a discrete search space (with feasible solutions to the TSP or the reloading problem) from a key-search continuous space, where PSO reaches good results. This method was implemented in order to optimize the TSPs Oliver 30 and Rykel 48, as it will be seen in the next section. 3.3.1. PSORK applied to Oliver 30 The eqs. (6) and (7) are used but pbesty e gbestj are obtained from the auxiliar vectors Uy having the best evaluations (uy* is a feasible solution containing integers from 1 to 30, indicating the order of the visited cities, given by RK); Xy' contains the keys to be decoded by RK. 3.3.2. PSORK applied to Rykel 48 For populations of 80~140 particles, we have implemented some strategies in order to provide diversity for the swarm [13, 14]. These strategies were not necessary with 500 and 1,000 particles and in such cases we have found more regularity in the results. 4. Experimental Results 4.1. Oliver 30 (500 particles) Here we have the best result for swarm with 500 particles (table 1) with the constants w = 0.06; C] = c2 = 0.1. Such constants are not usually found in literature, however, they gave the best results in the various tests. Table 1. Best results with 500 particles for Oliver 30. Iteration
Best Fitness
20 40 60 80
531.59 425.82 423.95 423.73*
Swarm's Average Fitness 988.81 604.92 515.14 476.91
100
423.73*
464.96
* Global minimum [15].
896 The average results for 10 experiments (with 10 differents seeds), but with the same constants w, Ci and c2 are shown in table 2. Table 2. Average results for 10 experiments with 500 particles.
20
564.41
Average of the Avera &e Fitnesses 884.90
40
453.60
614.53
60
443.33
539.60
80
441.63
512.05
100
441.57
502.22
Iteration
Average of the B e s t F i, n e s S es
Swarms
4.2. Rykel 48 (1000 particles) Tables 3 shows the experiment with the best result (constants w = 0.7 and C] = c2 = 1.8). Rykel 48's minimum is 14,422 [16]. Table 3. Best results with 1000 particles for Rykel 48. Iteration
Best Fitness
2,000
15,672
Swarm's Average Fitness 26,035
4,000
15,504
24,792
6,000
15,504
24,347
8,000
15,504
23,886
10,000
15,504
23,497
Table 4 shows the average results for 10 experiments (with 10 differents seeds), but with the same constants w, ci and c2 of the previous table. Table 4. Average results for 10 experiments with 1000 particles. Average of the Iteration 2,000 4,000 6,000 8,000 10,000
Best Fitnesses
17,886 17,235 16,895 16,834 16,794
Average of the Swarms Average Fitnesses 27,253 27,309 27,372 27,307 27,169
897 5. Conclusions and Future Work The applications show that the tecnique reaches reasonable results when applied to TSPs such as Oliver 30 and Rykel 48. The results 423.74 (global minimum) for Oliver 30 and 15,504 for Rykel 48 are satisfatory compared to the ones obtained by Genetic Algorithms (which reaches 423.74 for Oliver 30 and 16,535 for Rykel 48) and Population Based Incremental Learning (PBIL, which reaches 423.74 for Oliver 30 and 15,430 for Rykel 48) [15]. In addition, we can also notice that PSO tends to reach near global optimum results for TSP within relatively few iterations at several experiments, demonstrating its robustness. Figure 2 shows how the average fitnesses decrease within the first 1,000 iterations by using this model (table 4 data). 50000 40000 8
30000
~ b-
20000
v c
10000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
Iterations -Average of the Best Fitnesses —•—Average of the Swarm's Average Fitnesses
Figure 2. Graphic for Average results for the optimization of the TSP Rykel 48 (relative to the table 4 data).
Further research on neighboorhood topology, parameters and parallel computing must be done. It is needed a sensibility study in relation to the constants used and to improve the technique in order to apply it to the real-world problem of a nuclear reactor fuel reloading. References 1.
2. 3.
Lawler, E. L., Lenstra, J. K., Kan, A. H. G. R., and Shmoys, D. B. (Org.). The Traveling Salesman Problem: a guided tour of combinatorial optimization. 4a. Ed. John Wiley & Sons, Wiltshire, Great-Bretain (1985). Papadimitriou, C. H., and Steiglitz, K. Combinatorial Optimization. Prentice-Hall, Inc., New Jersey, USA (1982). Kennedy, J., and Eberhart, R. "A New Optimizer Using Particles Swarm Theory", Proceedings of Sixth International Symposium on Micro Machine
898
4. 5.
6.
7.
8.
9.
10.
11. 12.
13.
14.
15.
16.
and Human Science, Nagoya, Japan. IEEE Service Center, Piscataway, NJ, pp. 39-43(1995). Kennedy, J. and Eberhart, R. C. Swarm intelligence. San Diego, EUA: Academic Press (2001). Parsopoulos, K. E., and Vrahatis, M. N. "Recent aproaches to global optimization problems through Particle Swarm Optimization", Natural Computing 1 (pp. 235-306), Netherlands, Kluwer Academic Publishers (2002). Kennedy, J., and Eberhart R. C. "A discrete binary version of the particle swarm algorithm", Conference on Systems, Man and Cybernetics, pp.41044109 (1997). Wang, K.-P., Huang, L., Zhou, C.-G.., and Pang, W. "Particle Swarm Optimization for Traveling Salesman Problem", Proceedings of the Second International Conference on Machine Learning and Cybernetics, Xi'an, 2-5 November 2003, pp. 1583-85 (2003). Clerc, Maurice. Discrete Particle Swarm Optimization, Illustrated by the Traveling Salesman Problem. France Telecom Recherche & D6veloppement (2004). Yin, Peng-Yeng. "A discrete particle swarm algorithm for optimal polygonal approximation of digital curves". Journal of Visual Communication and Image Representation, pp. 241-260 (2004). Salman, A., Ahmad, I., and Al-Madani, S. "Particle Swarm Optimization for Task Assignment Problem", Microprocessors and Microsystems, Vol. 26, Issue 8, pp. 363-371 (2002). Bean, J. C. "Genetics Algorithms and Random Keys for Sequencing and Optimization". ORSA Journal of Computing, Vol. 6, No. 2 (1994). Buriol, L. S., "Algoritmo Memetico para o Problema do Caixeiro Viajante Assimetrico como parte de Framework para Algoritmos Evolutivos", M.Sc. Dissertation, DENSIS/FEE/UNICAMP, Campinas, SP (2000). Meneses, A. A. M., and Schirru, R. "Particle Swarm Optimization Aplicado ao Problema Combinatorio com Vistas a Solucao do Problema de Recarga em um Reator Nuclear". Proceedings of the International Nuclear Atlantic Conference - INAC (2005). Meneses, A. A. M. "Otimizacao por Enxame de Particulas Aplicado ao Problema Combinatorio da Recarga de um Reator Nuclear", M. Sc. Dissertation, Rio de Janeiro, COPPE/UFRJ (2005). Machado, M. D. "Um Novo Algoritmo Evolucionario com Aprendizagem LVQ para Otimizacao de Problemas Combinatories como a Recarga de Reatores Nucleares", M.Sc. Dissertation, COPPE/UFRJ, Rio de Janeiro, RJ (1999). TSPLib. http://elib.zib.de/pub/Packages/mp-testdata/tsp/tsplib/tsplib.html
USE OF GENETIC ALGORITHM TO OPTIMIZE SIMILAR PRESSURIZER EXPERIMENTS DAVID A. BOTELHO Institute de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil PAULO A. B. DE SAMPAIO Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil CELSO M. F. LAPA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil CLAUDIO M. N. A. PEREIRA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil and Universidade Federal do Rio de Janeiro - PEN/COPPE/UFRJ, Centro de Tecnologia, Ilha do Fundao, bloco G, sola 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil MARIA DE LOURDES MOREIRA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil
ANTONIO CARLOS DE O. BARROSO Instituto de Pesquisas Energeticas e Nucleares - IPEN/CNEN, Avenida Professor Lineu Prestes 2242 Cidade Universitdria, Sao Paulo, Brazil
A genetic algorithm (GA) is used to search the parameters of a scaled experiment similar to a reactor pressurizer. The dimensionless similarity numbers are multipliers in the nondimensional conservation equations of the pressurizer. A "fitness function" of the similarity numbers evaluates the quality of the parameters of the scaled experiment in relation to the full-size pressurizer. Once the parameters are defined, the operation of experiment is verified comparing the non-dimensional pressure of a typical transient in the pressurizer and experiment.
899
1. Introduction The main function of a nuclear reactor pressurizer (a vessel containing liquid water and steam volumes, connected to the primary cooling system) is to absorb and control volume changes of the cooling fluid due to operation transients [1]. Pressurizers can be tested in small-scale similar experiments with reduced pressure. Similar systems are those represented by the same dimensionless equations, which contain the same similarity numbers, that represent important thermal-hydraulic processes. The similarity numbers are used to optimize the parameters of the scaled experiment. Reference [2] presents a derivation of the similarity numbers of a pressurizer and the definition of the symbols used here. 2. The Dimensionless Equations of a Two-Region Pressurizer With the choice of some "reference parameters" (with subscript "0"), the procedure to obtain the dimensionless conservation equations of a pressurizer generates a certain dimensionless "similarity numbers" (with superscript "0"). The respective equations of mass conservation in the vapor and liquid volumes, in dimensionless form, are:
±-,{m'>W'F1-W'RO - {NW^S'^rT:) at
^W=Ko dt
(l)
hf„
+ {NKc)h;vcS'v¥
T:)
-W'F,+msurge
(2)
hfg
The respective equations of energy conservation in the vapor and liquid volumes, in dimensionless form, are:
at
-W'm [{NWh0RO}+(NWh°RO\(p'-\)] (3)
- hwhlc \ + {Nmlc \ (p' - I)]^ESEZZL) h
Jk
7(mX)= W'm
off
[ K ) + (AW^K//-!)]
- ^.[(iV^Xw^.Kp'-o]
v*
( 4) dt'
+ (^)(l-/)+(^,°)f(l-/K where (NQ°h)=
Q°h/m^
2.1. The Local Phenomena and Corresponding Similarity Numbers The main local phenomena in a pressurizer are the wall condensation of steam and the rainout (condensation) of liquid drops in the steam volume, which ultimately fall into the liquid volume. And flashing (evaporation) in the liquid volume producing bubbles of vapor that rises into the vapor volume. The respective similarity numbers for these processes are:
N0={NW:c)={NQ°h)(NQl)
(5)
rr
(6)
un xl/4 r
^2=Ov05r)=-
4//° it0 r°
X
A°k°T0/L0^
(7)
\ ( A
^ = {^K.)=A/7
^
oL"o
/>/
(8)
<x
2.2. 77»e Pressure and Control Similarity Numbers The pressure number and the integral and proportional controller numbers that appear in the dimensionless energy conservation equations in the pressurizer are: - 0 0
(9)
^5=K0)-^fe)
(10)
N6={NK°p)=Kp{NQ°h)
(11)
2.3. The Enthalpy Transport Similarity Numbers A first order expansion of enthalpy about pressure is used to transform the enthalpy transport by local phenomena into the pressure variable. The respective rainout, flashing, and wall condensation enthalpy transport numbers are:
(12) h
J*
(13)
h
*{dP Jo
(14)
dh\ (15)
dp
P (16)
(17) h
Jk
dhf dp Ys
N„=(NWh°wc\ = {NW°cW
Nu =
{Nm°wc\={NW°cW
(18) /o
fdh\
dp
(19)
Jo
3. Genetic Algorithm Optimization Genetic algorithms (GA) [3] are optimization methods inspired in the evolution theory, in which artificial chromosomes composed of binary numbers (genes) encode solution candidates. In this work, the chromosome encodes the list of search variable. In the GA, initially a population of chromosomes is randomly generated. Then, guided by a fitness function (the objective function of the optimization problem), the evolution takes place by simulated natural selection, crossovers and mutations. By such operations, the solution candidates (chromosomes) are improved from generation to generation. One of the first applications of GA to design a thermal-hydraulic experiment can be seen in [4],
In this work, the pressure of a scaled experiment is fixed, but exist complex dependencies of the similarity numbers on the size and operation parameters. The search variables are the radius of the hemisphere of the pressurizer, the surge mass flow rate, and the heater thermal power. The Eq. (20) below defines the fitness function as the overall difference between all corresponding similarity numbers of the scaled experiment and the original pressurizer. The similarity numbers with superscript, "P", are for the full-scale pressurizer, and those with superscript, "m", are of the scaled experiment.
FIT= V (l-iV, m /<J/l5
(20)
3.1. Results of the Optimization The parameters of the PI controller were calculated from the exact match of the PI controller similarity numbers, Eqs. (10) and (11). The size data that resulted from the GA optimizations for various pressures are presented in Table 1. Table 1 - Size Data of the Pressurizer and Scaled Experiments Pressure (MPa)
Hemispheric radius(m)
Surge mass flow rate Heating power (Maximum), (kg/s)
(Maximum), (kW)
15.5
3.1115
54.84
1000.0
10.0
2.6434
51.1719
558.4416
5.0
1.6512
30.0
150.0
2.5
1.2394
20.0
60.0
The similarity numbers that reflect the quality or degree of the similarity that is possible to attain at the various pressures are presented in Tables 2 through Table 4. As expected, it can be seen in Table 2 through Table 4, that the discrepancy in the similarity numbers increase when the pressure decreases. It is shown by the relative deviation in the wall condensation numbers, that the local phenomenon most difficult to attain similarity in a much-reduced pressure is wall condensation. The deviations in the similarity numbers cause the distortion in the solution of the pressurizer equations for the very small-scaled experiments.
Table 2 - Pressure, Heating, and Local Phenomena Similarity Numbers
Pressure
)
(NP° V
(MPa)
pres)
N<£c
lyrr
NW°FL
RO
FIT
15.5
0.13038
17.388
8.5569e5
35.805
10.0
0.12581
27.031
5.8344e5
40.921
0.36950
5.0
0.11636
46.147
2.0586e5
68.354
0.81692
2.5
0.10699
66.517
1.2737e5
104.67
1.2915
Table 3 - Enthalpy Transport Similarity Numbers by Rainout and Flashing
{NWh°RO\
{NWhH
{NWh0Fl\
(NWh%\
15.5
1.6866
0.63489
2.6866
0.47436
10.0
1.0685
0.32678
2.0685
0.13717
5.0
0.70408
0.18991
1.7041
0.024904
2.5
0.52280
0.13346
1.5228
0.006555
Pressure (MPa)
Table 4 - Enthalpy Transport Similarity Numbers by Wall Condensation
Pressure
(NWh°wc\
(™4)
{NWh°wc\
[NWh°wc; {NWh°wc\
15.5
0.32811
0.55338
0.20831
0.88148
0.15564
10
0.51007
0.54501
0.16668
1.10551
0.06995
5
0.87080
0.61312
0.16537
1.4839
0.021687
2.5
1.2552
0.65621
0.16751
1.9114
0.0008228
(MPa)
Figure 1 illustrates the degree of agreement that can be obtained for the nondimensional pressure curve of the out-surge transient of the similar experiment with reduced pressure. The experiment with 2.5 MPa has a certain degree of distortion in the non-dimensional pressure between 0.05MPa and 0.25MPa due to the large difference in the similarity numbers (mainly wall condensation). But it still permits experimental measurement for code validation.
• Full Scale (15.5 MPa) • Reduced Scaled (2.5 MPa)
0,920 0,00
I
I
I
I
I
I
I
I
I
0,10
0,20
0,30
0,40
0,50
0,60
0,70
0,80
0,90
1,00
NONDIMENSIONAL TIME
Figure 1 -Dimensionless Pressure for Pressurizer and Experiment (2.5 MPa) 4.
Conclusions
This GA optimization is a much valuable tool to obtain the parameters of an experimental pressurizer. The improvement of the similarity numbers as the pressure of the experiment increases, demonstrates the usefulness of this GA optimization to design scaled experiments of reactor pressurizers. References 1. Barroso, A. C. O. and Batista Fo., B. D., "Refining the Design of the IRIS Pressurizer," Proceeding of the 5th International Conference on Nuclear Option in Countries with Small and Medium Electricity Grids, Dubrovnik, Croatia (2004). 2. Botelho, D. A., De Sampaio, P .A. B., Lapa, C. M., Pereira, C. M. N. A., Moreira, M. de L., and Barroso, A. C. O., Optimization procedure to design pressurizer experiments, International Nuclear Atlantic Conference (INAC), Santos, SP, Brazil, (2005). 3. Goldberg, D. E., Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley, 1989. 4. Lapa, C. M. F., De Sampaio, P. A. B., and Pereira, C. M. N. A., "A new approach to designing reduced scale thermal-hydraulic experiment," Nuclear Engineering and Design, Vol. 229, No. 2/3, pp. 205-212 (2004).
PARTICLE SWARM OPTIMIZATION APPLIED TO THE NUCLEAR CORE RELOAD PROBLEM MARCEL WAINTRAUB Institute) de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil RAFAEL P. BAPTISTA Universidade Federal do Rio de Janeiro - PEN/COPPE/UFRJ, Centro de Tecnologia, llha do Fundao, block G, room 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil ROBERTO SCHIRRU Universidade Federal do Rio de Janeiro -PEN/COPPE/UFRJ, Centro de Tecnologia, Ilha do Fundao, block G, room 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil
CLAUDIO M. N. A. PEREIRA Instituto de Engenharia Nuclear - IEN/CNEN, Ilha do Fundao s/n, PO Box 68.550, 21945-970, Rio de Janeiro, Brazil and Universidade Federal do Rio de Janeiro - PEN/COPPE/UFRJ, Centro de Tecnologia, Ilha do Fundao, block G, room 206, PO Box 68.509, 21945-970, Rio de Janeiro, Brazil
This work proposes the use of the Particle Swarm Optimization (PSO) algorithm as an alternative tool for solving the nuclear core reload problem. The key of this work is the use of PSO for solving such kind of combinatorial problem. As the PSO (original version) is only skilled for numerical optimization, an indirect (floating-point) encoding of solution candidates is obtained by applying the random-keys approach. Here, the method is introduced and its application to a practical example is described. Computational experiments demonstrate that, for the sample case, PSO performed better than a Genetic Algorithm (GA). Obtained results are shown and discussed in this paper.
1. Introduction The nuclear core reload problem (NCRP) is a complex combinatorial optimization problem, which consists in finding the fuel loading pattern that maximizes the burnup period of a nuclear power plant (NPP). As its main motivation is the reduction of costs, it has been one of the most outstanding 907
908 optimization problem related to NPP operations, and many approaches have been proposed. One of the most popular one is the FORMOSA [1], which uses Simulated Annealing (AS) as optimization tool. Other successful approaches use Genetic Algorithms (GA) ([2], [3] and [4]), Ant Colony Systems (ACS) [5] and Population-Based Incremental Learning (PBIL) [6]. In this work, it is proposed the use of Particle Swarm Optimization (PSO) [7] as an alternative method for the NCRP. PSO is a population-based optimization metaheuristic, which has demonstrating to be efficient in finding (near-) global optimum solutions in other complex multimodal problems ([8], [9]). The main challenge in applying PSO to the NCRP is, however, its combinatorial nature. Therefore, the use the PSO in its standard form requires an indirect (floating-point) encoding of solution candidates. Here, such indirect encoding is obtained by the use of the random-keys (RK) approach [10]. Previous research [11] demonstrated that PSO with RK can be applied to the Travelling Salesman Problem (TSP), and the results are comparable (maybe slightly better in the chosen experiments) with those obtained by a GA. In the next section the PSO algorithm is introduced. The optimization problem is described in section 3, while method application with results and concluding remarks are seen in sections 4 and 5 respectively.
2. Particle Swarm Optimization Overview Particle Swarm Optimization (PSO) [7] is a recently developed optimization algorithm inspired by the behavior of biological swarms and aspects of social adaptation. In some sense it can be seen as a kind of Evolutionary Computation (EC) technique, but while traditional EC models has its strength in competition (Darwinian competition), PSO algorithm choose collaboration as its strategy to evolve. In PSO, a swarm of structures comprising solution candidates, called "particles", is simulated. They "fly" in an n-dimensional space (the search space of the optimization problem), looking for optima or near-optima region. The position of a particle represents a solution candidate itself, while the search space topology is given by the problem's objective function. Each particle has also a velocity attribute, which have information about direction and changing rate of its position, and Hie. performance (or fitness) attribute obtained by the evaluation of the objective function on the particle's position.
The particle's position and velocity changes are guided by its own experience (historical information of good and bad regions through which the particle has passed), as well as by the observation of their well-succeeded neighbors. Let X,(0 = {xl,i(0,...,xl^t)} and Vi(t) = {vll(t),...,vi,„(t)} be, respectively, the position (the solution candidate vector itself) and the velocity (its changing rate) of particle / in time t, in an n-dimensional search space. Consider also, pBestt(t) = {pBestiX(Q,...,pBestin(t)}, the best position already found by particle / until time / and gBestj(t) = {gBestjl(t),—, gBestin(t)} the best position already found by a neighbor until t. The PSO updating rules for velocity and position are given by: v,.„a+l) = w . v ; i M ( 0 + c 1 7 - 1 . ( p 5 e ^ ( 0 - ^ ( 0 ) + c 2 r 2 . ^ 5 e ^ ( 0 - x , > ! ( 0 )
(1)
*,,„ (f + 1) = *,-,„ ( 0 + v,,,, (*+ 1)
(2)
where rt and r2 are random numbers between 0 and 1. Coefficients c, and c2 are given acceleration constants (often called cognitive and social acceleration, respectively) towards pBest and gBest respectively and w is the inertia weight. In PSO algorithm, the swarm is randomly initialized (positions and velocities). Then, while the stopping criterion (in this case a maximum number of iteration) is not reached, a loop containing the following steps takes place: i) particles are evaluated according to the problems objective function, and fitness values are assigned to each particle; ii) pBest and gBest values are updated; iii) particles moves according to the updating equations for velocity and positions (equations 1 and 2). 3. The Optimization Problem The Brazilian Angra-1 PWR core has been used as the sample case for this investigation. Using an eighth-core symmetry, the number of fuel assemblies to be shuffled falls from 121 (the whole core) to 20 (central element is fixed), as shown in Figure 1. In order to test the PSO approach, a simplified problem has been proposed, considering: i) no constraint about the allowed positions for the assemblies, ii) no burnable poisons, iii) no rotation in the fuel assemblies. The goal is to find the best loading pattern for the 20 fuel assemblies that maximize the Boron concentration at end-of-cycle (which implies in cycle length
maximization). As a constraint, the radial peaking-factor must be inferior to 1.435. 1 2
3
4
5
6
7
8
0
in
11
12
13
14
16
17
19
•
18
20
15
Figure 1 - Angra-1 PWR eighth-core with core positions enumerated.
The objective function,/ to be maximized is, then, given by Equation 3.
/ =
\B, \B-k.F.AT'
Fxy < 1.435 otherwise
(3)
where B is the Boron concentration, FXY m e radial peaking-factor and k a penalization multiplier.
4. Method Application and Results As the standard PSO is skilled to continuous numerical optimization (real search spaces), to solve the NCRP by PSO implies in one of the following two options: i) modifying the original PSO, in order to adapt it to combinatorial optimization, or ii) transform the combinatorial search space into a continuous numerical one. In this work, it was chosen the second approach.
911 4.1. The Random Keys approach To transform the combinatorial search space into a continuous numerical one, it is proposed the use of the Random Keys approach [10], originally applied to a GA for solving the Travelling Salesman Problem (TSP). The method consists in encoding solution candidates into real-coded vectors in which every element may range between 0 and 1. Such vector is called the "random keys" (RK). In case of the NCRP, the dimension of the vector is the number of fuel assemblies to be shuffled (20 in the proposed sample case). The decoding of the RK into a fuel loading pattern is by sorting the fuel assemblies by the RK vector. As illustrated in the example of Figure 2. Random Keys (not sorted) 0.12 0.23 0.01 0.63 0.29 0.99 0.54 0.81 0.44 0.20 0.11 0.79 0.05 0.33 0.77 0.67 0.16 0.49 0.91 0.66 Core position (fixed) 1
2
1
2
3
4
5
6
7
8
9
8
9
10 11 12
13 14 15 16 17 18 19 20
10 11 12
13 14 15 16 17 18 19 20
Fuel assemblies (before sorting) 3
4
5
6
7
Decoding process
\7
Random Keys (sorted)
0.01 0.05 0.11 0.12 0.16 0.20 0.23 0.29 0.33 0.4< 0.49 0.54 0.63 0.66 0.67 0.77 0.79 0.81 0.91 0.99 Core position (fixed) 1
2
3
4
5
6
7
8
9
10 11 12
13 14 15 16 17 18 19 20 Fuel assemblies (after sorting)
3
13 11
1
17 10 2
5
14
9
18
7
4
20 16 15 12
8
19
Figure 2 - Decoding the random keys vector into a fuel loading pattern (an example)
6
4.2. Experiments and Results The proposed approach has been tested along several experiments. The number of particles was fixed in 50. The coefficients e, and c2 were set to 2.0 and FFhas been decreased from 0.8 to 0.2 along the PSO evolution. In order to verify consistence, different random seeds have been used. The reactor physics calculations have been made by the RECNOD [12] code, which runs on the same computational platform used by the PSO. Due to several limitations of the code, the constraint FXY< 1.435 (FXY is not provided by RECNOD) is substituted by PMAX<139, where P^A-is the maximum normalized assembly power. Value 1.39 in RECNOD means F^yc 1.435 in the code used in practice. Results were compared to those obtained by a GA with population size equal to 50 and different sets of parameters (mutation rate between 0.001 and 0.01; crossover rate between 0.6 and 0.8) and random seeds. Table 1 shows results (fitness values) obtained along 9 experiments. Remark that, for all experiments FXY values have found by both PSO and GA satisfies the constraint (FXY<1-435), hence, fitness values shown in Table 1 are exactly the Boron concentrations. Table 1 - Results and comparisons
Experiment 1 2 3 4 5 6 7 8 9 Average
GA 1219.00 1244.00 1297.00 1211.00 1150.00 1021.00 1299.00 1331.00 1214.00 1220.67
PSO 1310.00 1402.00 1404.00 1226.00 1329.00 1388.00 1271.00 1231.00 1335.00 1321.78
Note that PSO has been much more efficient and consistent than the GA in the proposed problem and experiments.
5. Conclusions In this work, it has been demonstrating the feasibility of using PSO with RK as an alternative method for solving the NCRP. Moreover, it has outperformed the GA (standard algorithm). The use of RK has allowed the use of a standard PSO, to which parameterization and convergence studies can be found in literature. Current investigations point to the application of niching or neighborhood restrictions as a promising improvement. However, such improvements would imply in a great computational overhead. Hence, nowadays, current research is on parallel PSO approaches, comprising niching and neighborhood restrictions model. Besides, investigating the application of PSO in a more realistic problem (without the simplifications made here) is an important technological contribution. References 1. Kropackzek, D.J., Turinsky, P.J., July. In-core Nuclear Fuel Management Optimization for Pressurized Water Reactors Utilizing Simulated Annealing, Nuclear Technology, 95, n.9, pp.9-31 (1991). 2. Poon, P.W., Parks, G.T. Optimizing PWR Reload Core Design, Parallel Solving from Nature, 2, pp371-380 (1992). 3. DeChaine, M.D., Feltus, M.A., Nuclear Fuel Management Optimization Using Genetic Algorithms, Nuclear Technology, 3, pp.109-114 (1995). 4. Chapot, J.L.C , Da Silva, F.C, Schirru, R. A New Approach to the Use of Genetic Algorithms to Solve the Pressurized Water Reactor's Fuel Management Optimization Problem, Annals of Nuclear Energy, 26, pp.641655 (1999). 5. Machado, L. And Schirru. R., The Ant Algorithm Applied to the Nuclear Reload Problem, Annals ofNuclear Energy, 29, pp. 1455-1470 (2002). 6. De Lima, A.M.M. and Machado, M.D., Modelo de Ilhas para a Implementacao Paralela do Algoritmo Evolucionario de Otimizacao PBIL no Problema da Recarga de Reatores Nucleares PWR, International Nuclear Atlantic Conference (INAC), Brazil, (2002). 7. Kennedy, J., Eberhart, R.C., Particle Swarm Optimization, Proceedings of IEEE International Conference on Neural Networks, 4, pp.1942-1948, Australia (1995) 8. Domingos, R. P. Schirru, R. and Pereira, C. M. N. A. Particle Swarm Optimization in Reactor Core Designs, Nuclear Science and Engineering, 152, pp.1-7 (2006).
914 9.
Siqueira, N.N., Pereira, C.M.N.A., Lapa, C.M.F. The Particle Swarm Optimization Algorithm Applied to Nuclear Systems Surveillance Test Planning, INAC, 2005 10. Bean J. C. Genetic Algorithms and Random Keys for Sequencing and Optimization, ORSA Journal on Computing, 6, 2, pp 154-160, 1994. 11. Menezes, A. A. M. Otimizacao por Enxame de Particulas Aplicado ao Problema Combinatorio da Recarga de um Reator Nuclear, M.Sc. Thesis, COPPE/UFRJ (2005). 12. Chapot, J.L. C. . Otimizacao Automatica de Recargas de Reatores de Agua Pressurizada utilizando Algoritmos Geneticos, D.Sc. Thesis, COPPE/UFRJ (2000).
PARALLEL EVOLUTIONARY METHODS APPLIED TO A PWR CORE RELOAD PATTERN OPTIMIZATION
ROBERTO SCHIRRU
PEN/COPPE, Universidade Federal do Rio de Janeiro, Caixa Postal 68509, Rio de Janeiro, Rio de Janeiro 21945-970, Brasil
ALAN M. M. DE LIMA, MARCELO D. MACHADO
PEN/COPPE, Universidade Federal do Rio de Janeiro, Caixa Postal 68509, Rio de Janeiro, Rio de Janeiro 21945-970, Brasil
The nuclear reactor core reload pattern optimization problem consists in finding a pattern of partially burned-up and fresh fuels that optimizes the next operation cycle. This problem has been traditionally solved using an expert's knowledge, but recently artificial intelligence techniques have also been applied successfully. This problem is NP-hard, meaning that its complexity grows exponentially with the number of fuel assemblies in the core. Besides that, the problem is non-linear and its search space is highly discontinuous and multimodal. The aim of this work is to apply parallel computational systems based on the Population-Based Incremental Learning (PBIL) algorithm and on the Ant Colony System (ACS) to the core reload pattern optimization problem and compare the results to those obtained by the genetic algorithm and by a pattern obtained by an expert. The case study is the optimization of cycle 7 of the Angra 1 Pressurized Water Reactor.
1. Introduction
At the end of a period of time called operation cycle, it is no longer possible to sustain the nuclear plant's nominal power. Consequently, the reactor is turned off and all fuel assemblies in the core are unloaded and put in a spent fuel pool. The most burned-up assemblies are replaced with fresh ones. These fresh assemblies and the usable spent ones form the set that will be used in the nuclear reactor core reload optimization problem. The aim of the PWR core reload optimization problem [1] is to find a pattern of fresh and partially burned fuel assemblies that optimizes the performance of the reactor over the next operating cycle, while ensuring that various operational and safety constraints are always satisfied. This problem is NP-hard, which means that its difficulty grows exponentially with the number of fuel assemblies in the reactor core. For a 121 fuel-assembly reactor, there are approximately 10^273 patterns,
a number which falls, using a 1/8 symmetry and some positioning rules, to a value that is still extremely high to solve by enumeration [2]. Generally, in order to solve this problem a cyclic reloading scheme is used, in which only a part of the fuel assemblies is reloaded in each cycle (generally 1/3). Among the reloading strategies, the most traditional are the out-in and low-leakage ones.
2. PBIL Algorithm
The PBIL algorithm [3] is a method that combines the mechanism of the Genetic Algorithm (GA) [4] with simple competitive learning, creating an important tool for the optimization of numerical functions and combinatorial optimization problems. PBIL works with a set of solutions of the problem, called population, encoded as bit strings. The most commonly used form is binary, i.e. strings of 0s and 1s. The aim is to create a probability vector, containing real values in each position, that when used in a decoding procedure generates a solution for the function to be optimized. In order to obtain high diversity for the population at the beginning of the search procedure, each position of the probability vector is initialized to 0.5. This makes the probability of getting the value "0" or "1" in each position of the bit string the same, generating a random initial population. Similarly to competitive network training, the values of the probability vector are gradually changed from the initial value 0.5 to values near 0.0 or 1.0, in order to represent the best values found in the population at each generation. During the search procedure, at each generation, the values of the probability vector are updated using the following rule:
P(i) = P(i) × (1 − Ta) + X(i) × Ta        (1)

where:
• Ta: learning rate
• P(i): value of the probability vector in the i-th position
• X(i): value of the best solution vector of the population in the i-th position
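A minimal sketch of this update, and of how the probability vector is used to sample a binary population, is given below (NumPy assumed; the learning-rate value is an illustrative choice, not the one used by the authors).

import numpy as np

def pbil_update(p, x_best, ta=0.1):
    """PBIL probability-vector update of Eq. (1): P(i) = P(i)*(1 - Ta) + X(i)*Ta,
    pulling each position toward the best solution found in the generation."""
    return p * (1.0 - ta) + x_best * ta

def sample_population(p, pop_size, rng=np.random.default_rng()):
    """Sample a binary population: bit i of each individual is 1 with probability P(i)."""
    return (rng.random((pop_size, p.size)) < p).astype(int)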
3. Ant Colony System
In the early 90s, a new algorithm was developed specifically for combinatorial optimization problems. It was inspired by the observation of ant colonies, and was named Ant System [5]. Ant System was successfully applied to complex combinatorial problems such as the Traveling Salesman Problem (TSP) [5] and the Quadratic Assignment Problem (QAP) [6], and it has been applied to many problems in network-telecommunication optimization, vehicle routing, allocation tasks and many other combinatorial optimization problems. ACS is mathematically described by Eqs. (2) and (3) below.
s = arg max_{z ∈ Jk(r)} { [FE(r,z,p)]^α × [HE(r,z,p)]^β }   if q ≤ q0
s = Roulette                                                 if q > q0        (2)

Roulette_k(r,s,p) = [FE(r,s,p)]^α × [HE(r,s,p)]^β / Σ_{z ∈ Jk(r)} [FE(r,z,p)]^α × [HE(r,z,p)]^β   if s ∈ Jk(r)
Roulette_k(r,s,p) = 0                                                                              if s ∉ Jk(r)        (3)

where:
• r - ant's current position
• s - ant's next position
• p - ant's position in the core
• FE - pheromone matrix
• HE - heuristic matrix
• q - random parameter between 0 and 1
• α and β - parameters that determine the relative importance of FE and HE
• Jk(r) - list of unused fuel assemblies
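A minimal sketch of this state-transition rule follows. Here FE and HE are taken as 3-D arrays indexed by (current assembly, candidate assembly, core position); this indexing, and the values of α, β and q0, are illustrative assumptions rather than the authors' settings.

import numpy as np

def acs_next_assembly(r, p, FE, HE, unused, alpha=1.0, beta=2.0, q0=0.9,
                      rng=np.random.default_rng()):
    """Choose the next fuel assembly s for an ant at assembly r and core position p,
    following Eqs. (2)-(3): greedy choice with probability q0, otherwise
    roulette-wheel selection over the unused assemblies Jk(r)."""
    scores = np.array([FE[r, s, p] ** alpha * HE[r, s, p] ** beta for s in unused])
    if rng.random() <= q0:                       # exploitation branch of Eq. (2)
        return unused[int(np.argmax(scores))]
    probs = scores / scores.sum()                # roulette probabilities of Eq. (3)
    return rng.choice(unused, p=probs)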
4. Parallel Model
PBIL and ACS were parallelized using the Island Model [7] [8], which was originally developed for the Genetic Algorithm. This model consists of connected islands that search for the best solution and periodically exchange information in a procedure called migration, as shown in Figure 1.
Figure 1. Topology for 5 islands.
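One way such a migration step can be implemented is sketched below for a ring of islands; the migration size, frequency and topology are illustrative assumptions, not necessarily the settings used by the authors.

def migrate(islands, k=1):
    """Ring migration for an island model: each island is a list of
    (fitness, genome) pairs; copies of its k best individuals are sent to the
    next island in the ring, where they replace the k worst."""
    n = len(islands)
    best = [sorted(isl, key=lambda ind: ind[0], reverse=True)[:k] for isl in islands]
    for i in range(n):
        islands[i].sort(key=lambda ind: ind[0])      # worst individuals first
        islands[i][:k] = list(best[(i - 1) % n])     # receive from the previous island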
5. Case Study The parallel versions of PBIL and ACS were tested using as case study the cycle 7 of the Angra 1 nuclear power plant. This plant is a 626 MW PWR designed by Westinghouse and operated by ELETRONUCLEAR. It is located in the southeast of Brazil. The reactor core is divided in four symmetric quadrants by two main axes. There are also two secondary axes, called diagonals, that together with the main axes divide the reactor in eight symmetric parts. Considering a 1/8 core symmetry and excluding the central fuel assembly, there are 20 fuel assemblies, 10 belonging to the symmetry axes (quartets), and 10 out of the axes (octets). The optimization is performed in this set and the pattern is repeated in their symmetric counterparts. In our formulation, there are two objectives:
• Maximization of the cycle length;
• Minimization of the Maximum Average Relative Power (MARP).
It must be taken into account that the assemblies that did not belong to the symmetry axes cannot be put on the symmetry axes. The reactor core neutronic calculations were performed using the RECNOD [11] computational program.
6. Genotype and Fitness Description
For PBIL, each fuel assembly is represented by a set of bits in the genotype. As in the TSP, the fuel assemblies cannot be repeated. The decoding mechanism is described by Machado [12]. In the case of ACS, there is no genotype, and each loading pattern is represented by an ant [10]. Chapot [11] defined 1.395 as the upper allowable limit for the MARP. The fitness function was developed in such a way that, if this constraint is satisfied, its value is equal to the reciprocal of the boron concentration; otherwise, it is equal to the MARP.
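Read as a quantity to be minimized, this fitness can be sketched as follows; the minimization reading and the function name are ours.

def fitness(marp, boron_ppm, marp_limit=1.395):
    """Fitness to be minimized: if the MARP constraint is met, return the
    reciprocal of the boron concentration (higher boron, i.e. a longer cycle,
    gives a lower fitness); otherwise return the MARP itself as a penalty."""
    return 1.0 / boron_ppm if marp <= marp_limit else marp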
7. Results
In order to evaluate the performance of the PBIL and ACS parallel models, comparative experiments with their serial versions, with the GA and with the expert's optimization were performed. A population of 50 individuals for 10,000 iterations was used for all algorithms. From Table 1, it can be noticed that:
• The parallel models outperformed their serial counterparts in all the experiments.
• Increasing the number of islands improves the results.
• PBIL and ACS outperformed the GA and the loading pattern generated by the expert.
It is worth mentioning that, in the case of Angra 1, a gain of one Effective Full Power Day (EFPD) is worth a profit of US$ 600,000. Each 4 ppm of boron concentration in the core is equivalent to 1 EFPD. Table 1 shows that PBIL obtained 206 ppm more than the GA and 338 ppm more than the expert, or 51 and 84 EFPDs, respectively. ACS, in turn,
obtained a gain of 398 ppm (99 EFPDs) and 530 ppm (132 EFPDs) in relation to the GA and to the expert, respectively.

Table 1. Comparison among GA, PBIL and ACS for serial, 3, 5 and 7 islands (MARP / Boron Conc.).

Algorithm   Serial          3 islands       5 islands       7 islands
Expert      1.430 / 894     —               —               —
GA          1.390 / 1026    —               —               —
PBIL        1.385 / 1083    1.382 / 1106    1.380 / 1152    1.389 / 1232
ACS         1.379 / 1263    1.384 / 1297    1.382 / 1368    1.388 / 1424
References
1. Poon, P.W., Parks, G.T., Optimizing PWR Reload Core Designs, In: Parallel Problem Solving from Nature 2, Elsevier Science Publishers B.V., pp. 371-380, 1992.
2. Galperin, A., Exploration of the Search Space of the In-Core Fuel Management Problem by Knowledge-Based Techniques, Nuclear Science and Engineering, Volume 119, pp. 144-152 (1995).
3. Baluja, S., Caruana, R., Removing the Genetics from the Standard Genetic Algorithm, Technical Report CMU-CS-95-141, May 1995.
4. Goldberg, D.E., 1989, Genetic Algorithms in Search, Optimization & Machine Learning, Reading, Addison-Wesley.
5. Dorigo, M., Gambardella, L.M., Ant Colony System: A Cooperative Learning Approach to the Traveling Salesman Problem, IEEE Transactions on Evolutionary Computation, Volume 1, n. 1, pp. 53-66 (1997).
6. Gambardella, L.M., Taillard, E.D., Dorigo, M., Ant Colonies for the Quadratic Assignment Problem, Journal of the Operational Research Society, Volume 50, pp. 167-176 (1999).
7. Cantú-Paz, E., Topologies, Migration Rates, and Multi-population Parallel Genetic Algorithms. In: IlliGAL Report No. 99007, Illinois Genetic Algorithms Laboratory, University of Illinois at Urbana-Champaign, Urbana, January (1999).
8. Cantú-Paz, E., A Survey of Parallel Genetic Algorithms, Calculateurs Paralleles, v. 10, n. 2, 1998.
9. Lima, A.M.M., Modelo de Ilhas para a Implementacao Paralela do Algoritmo Evolucionario de Otimizacao PBIL, Tese de M.Sc., COPPE/UFRJ, Rio de Janeiro, Brasil, Marco de 2000.
10. Lima, A.M.M., Recarga de Reatores Nucleares Utilizando Redes Conectivas de Colonias Artificiais, Tese de D.Sc., COPPE/UFRJ, Rio de Janeiro, Brasil, Junho de 2005.
11. Chapot, J.L.C., Otimizacao Automatica de Recargas de Reatores a Agua Pressurizada Utilizando Algoritmos Geneticos, Tese de D.Sc., Programa de Engenharia Nuclear, COPPE/UFRJ, Rio de Janeiro, Brasil, Junho (2000).
12. Machado, M.D., Um novo algoritmo evolucionario com aprendizado LVQ para a otimizacao de problemas combinatorios como a recarga de reatores nucleares, Tese de M.Sc., COPPE/UFRJ, Rio de Janeiro, Brasil, Abril de 1999.
ROBUST DISTANCE MEASURES FOR ON-LINE MONITORING: WHY USE EUCLIDEAN?
DUSTIN R. GARVEY AND J. WESLEY HINES
The University of Tennessee, Knoxville, TN 37771
Traditionally, the calibration of safety critical nuclear instrumentation has been performed at each refueling cycle. However, many nuclear plants have moved toward condition-directed rather than time-directed calibration. This condition-directed calibration is accomplished through the use of on-line monitoring which commonly uses an autoassociative predictive modeling architecture to assess instrument channel performance. An autoassociative architecture predicts a group of correct sensor values when supplied with a group of sensor values that is corrupted with process and instrument noise, and could also contain faults such as sensor drift or complete failure. This paper introduces two robust distance measures for use in nonparametric, similarity based models, specifically the L1-norm and the new robust Euclidean distance function. In this paper, representative autoassociative kernel regression (AAKR) models are developed for sensor calibration monitoring and tested with data from an operating nuclear power plant using the standard Euclidean (L2-norm), L1-norm, and robust Euclidean distance functions. It is shown that the alternative robust distance functions have performance advantages for the common task of sensor drift detection. In particular, it is shown that the L1-norm produces small accuracy and robustness improvements, while the robust Euclidean distance function produces significant robustness improvements at the expense of accuracy.
1. Introduction
In the U.S. nuclear power industry, millions of dollars are spent annually on the calibration of instrument chains that are performing within the required specifications. For the past twenty years, several nuclear utilities, along with the Electric Power Research Institute (EPRI), have investigated methods to monitor the calibration of safety critical process instruments. In 2000, the U.S. Nuclear Regulatory Commission (NRC) issued a safety evaluation report (SER) [1] on an EPRI-submitted Topical Report (TR) 104965, "On-Line Monitoring of Instrument Channel Performance" [2]. This SER concluded that the generic concept of on-line monitoring (OLM) for tracking instrument performance as discussed in the topical report is acceptable. However, they also listed 14 requirements that must be addressed by plant specific license amendments if the TS-required calibration frequency of safety-related instrumentation is to be relaxed. Since the applicability of an OLM system is directly related to the ability of an empirical model to correctly predict sensor values when supplied with faulty data, methods must be developed to ensure that robust empirical models can be developed. In order to satisfy this requirement, two robust distance functions have been developed for use in nonparametric, similarity based models, such as
kernel regression [3] and the multivariate state estimation technique (MSET) [4].
2. Nonparametric Modeling
An empirical model's architecture may be either defined by a set of parameters and functional relationships (parametric) or a set of data and algorithmic estimation procedures (nonparametric). In a parametric model, training data is used to fit the model to the data according to a pre-defined mathematical structure. For example, consider the following polynomial model:

y = b0 + b1·x1 + b2·x2 + b3·x1·x2 + b4·x1² + b5·x2²        (2-1)
In order to completely define this model for a given set of training observations, the polynomial coefficients bi are optimized to minimize some objective function, usually the sum of the squared error (SSE). Once the optimal polynomial coefficients have been estimated, the model is completely specified by Equation 2-1 and the estimated coefficients. Therefore, a parametric model may be roughly defined as a model that may be completely specified by a set of parameters and a functional relationship for applying these parameters to new data in order to estimate the response. A non-parametric model, by contrast, stores historical data exemplars in memory and processes them when a new query is made. For instance, rather than modeling a whole input space with a parametric model such as a neural network or linear regression, local non-parametric techniques may be used to construct a local model in the immediate region of the query. These models are constructed "on the fly", not beforehand. When the query is made, the algorithm locates historical exemplars in its vicinity and performs a weighted regression with the nearby observations. The observations are weighted with respect to their proximity to the query point. In order to construct a robust local model, one must define a distance function to measure what is considered to be local to the query, implement locally weighted regression, and in some cases consider additional regularization techniques. For this work, we will examine the predictive performance of an autoassociative kernel regression (AAKR) empirical model [5, 6]. Since descriptions of AAKR do not readily appear in the open literature, the following derivation is based upon multivariate, inferential kernel regression as derived by Wand and Jones [7]. The mathematical framework of this modeling technique is composed of three basic steps. First, the distance between a query vector and each of the historical exemplar (memory) vectors is computed using the conventional Euclidean distance or L2-norm:
u_j = sqrt( Σ_{i=1}^{n} (x^i − m_j^i)² )        (2-2)
where u_j is the distance between the query vector (x) and the j-th memory vector, n is the number of variables in the data set, x^i is the i-th variable of the query vector, and m_j^i is the i-th variable of the j-th memory vector. Second, these distances are used to determine weights by evaluating the standard Gaussian kernel, expressed by:
w_j = K(u_j, h) = (1 / sqrt(2πh²)) · exp(−u_j² / h²)        (2-3)
where h is the kernel's bandwidth. Finally, these weights are combined with the memory vectors to make predictions according to:
x̂ = ( Σ_{j=1}^{nm} w_j · m_j ) / ( Σ_{j=1}^{nm} w_j )        (2-4)
Here, w_j are the weights, m_j are the memory vectors, nm is the number of memory vectors, and x̂ is the prediction for the query vector. Since the monitoring system's objective is to detect and quantify sensor drift, the model should be made as immune as possible to sensor drift. In order to improve the robustness of the AAKR modeling routine, distance functions other than the standard Euclidean distance may be used. Before discussing the alternative distance functions, the parameters used to measure model performance must be discussed. The performance of autoassociative OLM systems is measured in terms of accuracy, robustness, and spillover. Accuracy measures the ability of the model to correctly and accurately predict sensor values and is normally presented as the mean squared error (MSE) between the prediction and the correct sensor value. Robustness measures the ability of the model to make correct sensor predictions when the respective sensor value is incorrect due to some sort of fault. Spillover measures the effect a faulty sensor input has on the other sensor predictions in the model. An ideal system would be accurate and would not have sensor predictions affected by degraded inputs. These metrics are explained in detail by Hines and Usynin [8].
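The three steps of Eqs. (2-2)–(2-4) can be sketched compactly as follows (NumPy assumed; the kernel's normalization constant cancels in the weighted average and is kept only for completeness).

import numpy as np

def aakr_predict(x, memory, h=0.5):
    """AAKR prediction for a single query vector x: Euclidean distances to the
    memory vectors (Eq. 2-2), Gaussian kernel weights with bandwidth h (Eq. 2-3),
    and a weighted average of the memory vectors as the corrected estimate (Eq. 2-4)."""
    d = np.sqrt(((memory - x) ** 2).sum(axis=1))                  # Eq. (2-2)
    w = np.exp(-d ** 2 / h ** 2) / np.sqrt(2 * np.pi * h ** 2)    # Eq. (2-3)
    return (w[:, None] * memory).sum(axis=0) / w.sum()            # Eq. (2-4)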
3. Robust Distance Measures
Now that the performance metrics have been defined, the distance measures will be presented. The most basic form of the AAKR modeling technique makes use of the Euclidean distance or L2-norm, already given in Eq. (2-2):

u_j = sqrt( Σ_{i=1}^{n} (x^i − m_j^i)² )        (3-1)
Since this distance function squares the individual differences, the effects of a faulty input may be amplified, resulting in parameter predictions which are more affected by input variations and therefore less robust. In order to improve robustness, we desire distance measures which are not affected by errant sensor readings, and two robust distance functions have been investigated. The first robust distance function is the L1-norm, which is defined by the following equation.
u_j = Σ_{i=1}^{n} | x^i − m_j^i |        (3-2)
Notice that rather than square the individual differences, the L1-norm uses the absolute value. This alteration will be shown to provide a modest improvement in robustness, but the distance will still be affected by faulty input. The next robust distance function attempts to remove faulty input from the distance calculation and therefore should provide the largest improvement to model robustness. The final robust distance function is named robust Euclidean distance, and is defined by the following equation:
u_j = sqrt( Σ_{i=1}^{n} (x^i − m_j^i)² − max_{i=1,...,n} (x^i − m_j^i)² )        (3-3)

Here, max_{i=1,...,n} (x^i − m_j^i)² is the maximum squared difference of the query
vector from the j-th memory vector. Simply speaking, one "bad performer" is assumed to exist and its influence is removed from the calculation. To more clearly illustrate Equation 3-3, consider the following example vectors:

x1 = [0.9501 0.2311 0.6068 0.4860]
m1 = [0.8913 1.7621 0.4565 0.0185]
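A small sketch of this calculation (NumPy assumed; the function name is ours) reproduces the numbers worked out next.

import numpy as np

def robust_euclidean(x, m):
    """Robust Euclidean distance of Eq. 3-3: sum the squared differences after
    discarding the single largest one (the assumed 'bad performer')."""
    sq = (x - m) ** 2
    return np.sqrt(sq.sum() - sq.max())

x1 = np.array([0.9501, 0.2311, 0.6068, 0.4860])
m1 = np.array([0.8913, 1.7621, 0.4565, 0.0185])
print(robust_euclidean(x1, m1))   # approximately 0.4946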
The squared differences are found to be:

(x1 − m1)² = [0.0035 2.3438 0.0226 0.2185]

Notice that the largest squared difference is 2.3438. Therefore, the robust Euclidean distance is the square root of the sum of the squared differences minus the largest squared difference:

u_j = sqrt(2.5884 − 2.3438) = 0.4946

In conclusion, the robust Euclidean distance is the Euclidean distance with the largest distance or worst performer removed.
4. Results
In this section, data collected from an operating nuclear power plant steam system is used to compare and evaluate the robust distance metrics. The model (variable grouping) chosen was developed during the EPRI OLM Implementation Project and is currently being used to monitor steam system sensor calibration at an operating plant; thus, it is an appropriate model to evaluate. The steam system model contains 5 plant sensors, primarily from one loop, which include 2 turbine pressure sensors and 3 steam pressure sensors. The quoted sensor units are as follows: 1) turbine pressure in pounds per square inch absolute (PSIA) and 2) steam pressure in pounds per square inch gauge (PSIG). The training data for each of the sensor types is presented in Figure 1.
Figure 1: Training data for (a) turbine pressure and (b) steam pressure
The data presented in Figure 1 were selected from data collected every two minutes over a two-month period. Overall, the training and test data span approximately 2 weeks of data, observing every 5th sample or every 10 seconds.
927 The training data was chosen to be 1,600 observations from steady state plant operation. The test data were chosen to be a successive set of 400 observations sampled from steady state plant operation. The training data were used to develop the empirical models and the test data were used to evaluate the performance of the empirical models. For completeness, the AAKR model was developed with 800 memory vectors and a bandwidth of 0.5. The resulting accuracy, robustness, and spillover performance metrics are listed in Table 1 and presented in Figure 2.
Table 1: Accuracy, robustness, and spillover performance for robust distance functions

                                    Turbine Pressure      Steam Pressure              Average
Metric             Distance         #1        #2          #1      #2      #3
Accuracy (×100)    Euclidean        0.23      0.60        0.44    0.21    0.29        0.35
                   L1-norm          0.08      0.20        0.28    0.07    0.02        0.17
                   Robust Euclidean 0.59      2.80        0.89    0.42    0.36        1.10
Robustness         Euclidean        0.56      0.63        0.29    0.33    0.37        0.44
                   L1-norm          0.64      0.73        0.21    0.25    0.24        0.41
                   Robust Euclidean 0.20      0.23        0.23    0.18    0.13        0.19
Spillover          Euclidean        0.11      0.11        0.18    0.18    0.16        0.15
                   L1-norm          0.11      0.12        0.12    0.15    0.12        0.13
                   Robust Euclidean 0.09      0.12        0.06    0.08    0.09        0.09
The plots show a decrease in the robustness and spillover metrics for the robust distance functions. In other words, the models that use the robust distance functions are less affected by faulty input and are considered to be more robust. This increased robustness is not without consequence, though, as all of the variable accuracy metrics (MSE) for the robust Euclidean distance function are larger than those of the model with the L2-norm. Even though there may be an increase in the accuracy metric (the predictive error of the model) relative to the normal L2-norm, the decreases in the robustness and spillover metrics obtained with the L1-norm and robust Euclidean distance more than validate their effectiveness in detecting sensor drift.
Figure 2: Illustration of the accuracy, robustness, and spillover performance for robust distance functions
5. Conclusions
This paper has introduced two robust distance measures for use in nonparametric, similarity based models, specifically the L1-norm and the new robust Euclidean distance function. In this paper, representative autoassociative kernel regression (AAKR) models were developed and tested with data from an operating nuclear power plant using the standard Euclidean (L2-norm), L1-norm, and robust Euclidean distance functions. It was shown that the alternative robust distance functions have performance advantages for the common task of sensor drift detection. In particular, it was shown that the L1-norm produces small accuracy and robustness improvements, while the robust Euclidean distance function produces significant robustness improvements at the expense of accuracy.
6. Acknowledgments
We would like to acknowledge the U.S. Nuclear Regulatory Commission funding through a Cooperative Agreement between the NRC and The Ohio State University on "Research on Instrumentation and Control Reliability, Thermal-Hydraulics, and Waste-Management": Grant/Agreement No. NRC-RES-04-076. The information and conclusions presented herein are those of the authors and do not necessarily represent the views or positions of the NRC. Neither the U.S.
Government nor any agency thereof, nor any employee, makes any warranty, expressed or implied, or assumes any legal liability or responsibility for any third party's use of this information.
References 1. NRC Project no. 669 (2000), "Safety evaluation by the office of nuclear reactor regulation: Application of on-line performance monitoring to extend calibration intervals of instrument channel calibrations required by the technical specifications - EPRI Topical Report (TR) 104965 OnLine Monitoring of Instrument Channel Performance" U.S. Nuclear Regulatory Commission: Washington, D.C., July, 2000. 2. EPRI TR-104965 (2000), "On-Line Monitoring of Instrument Channel Performance," EPRI, Palo Alto, CA: September 2000. 3. Fan, J. and I. Gijbels (1996), Local Polynomial Modeling and Its Applications, Chapman & Hall/CRC, New York, NY: 1996. 4. Singer, R.M., K.C. Gross, J.P. Herzog, R.W. King, and S.W. Wegerich (1996), "Model-Based Nuclear Power Plant Monitoring and Fault Detection: Theoretical Foundations," Proc. 9th Intl. Conf. on Intelligent Systems Applications to Power Systems, Seoul, Korea, 1996. 5. Nadaraya E.A.(1964), "On estimating regression", Theory of Probability and its Applications, vol. 10, pp. 186-190, 1964. 6. Watson, G. S. (1964), "Smooth Regression Analysis", The Indian Journal of Statistics, Series A, Vol. 26, pp. 359-372: 1964. 7. Wand, M.P., and M.C. Jones (1995), Kernel Smoothing, Monographs on Statistics and Applied Probability, Chapman & Hall, London: 1995. 8. Hines, J. Wesley and Alexander Usynin (2004), "On-Line Monitoring Robustness Measures and Comparisons", International Atomic Energy Agency Technical Meeting on "Increasing instrument calibration interval through on-line calibration technology", OECD Halden Reactor Project, Halden, Norway, 27th-29th September 2004.
MULTIPLE OBJECTIVE EVOLUTIONARY OPTIMISATION FOR ROBUST DESIGN
DANIEL E. SALAZAR A.
Division de Computacion Evolutiva (CEANI), Instituto de Sistemas Inteligentes y Aplicaciones Numericas en Ingenieria (IUSIANI), Universidad de Las Palmas de Gran Canaria, Canary Islands, [email protected]
CLAUDIO M. ROCCO S.
Universidad Central de Venezuela, Facultad de Ingenieria, Caracas, Venezuela, [email protected]
ENRICO ZIO
Dipartimento di Ingegneria Nucleare, Politecnico di Milano, Milano, Italy, [email protected]
This paper proposes the use of Multiple Objective Evolutionary Algorithms (MOEA) for robust system design. A numerical example relative to the evaluation of the reliability of a Residual Heat Removal system of a Nuclear Power Plant illustrates the approach.
1. Introduction
The robustness of a design is defined as the maximum deviation from its specifications that can be tolerated with the product still meeting all the requirements [1]. The final goal is to limit the design parameters' uncertainty so as to maintain the required system performance within specified bounds. As an example, consider the reliability of a system Rs = f(R1, R2, ..., Rn), where Ri is the reliability of component i: the designer is interested in knowing the maximum allowed reliability deviation for each component consistent with a predefined system reliability requirement, e.g., 0.990 < Rs < 0.999. Given the unknown feasible zone F specifying the required performance bounds of a generic design (dotted line in Figs. 1a and 1b for a case of a design depending on two variables Ra and Rb), the objective of robust design can be achieved by identifying an approximate description by using simply shaped sets like boxes contained in the allowed domain [2]. More specifically, the objective of optimal robust design can be satisfied by identifying the Maximum Volume
Inner Box (MIB) whose points all satisfy the system performance requirements, up to the boundary (solid line in Figs. 1a and 1b).
a) Maximum volume Inner Box (fixed center)
b) Maximum volume Inner Box (variable center)
Fig. 1. Feasible Solution Set (dotted line) and approximate description MIB (solid line) for the case of two-dimensional design space [3]
In [3] the authors propose an indirect approach to identify the MIB by formulating the problem as a single objective optimisation of the MIB volume. The approach uses an Evolutionary Strategy to optimise the hypervolume whereas Interval Arithmetic (IA) is used to guarantee the feasibility of the design. Indeed, IA allows obtaining the exact range of a function with variables defined in a range with only one "interval" evaluation. In practice, there are situations, in which the decision-maker (DM) needs to optimise two or more objectives, e.g. the reliability and the cost of a given system design. Adopting a single objective (SO) formulation in which the multiple objectives are combined into a weighted sum, the DM must a priori select the arbitrary weights in reflection of his/her preferences on the different objectives, or repetitively solve the optimisation problem with different weight values to obtain a group of alternatives from which to choose a posteriori the final solution. On the contrary, a multiple objective (MO) approach allows determining directly the Pareto set of alternatives from which the DM can choose a posteriori the preferred one. In this paper, the latter MO approach is embedded within an evolutionary algorithm (EA) scheme aimed at determining the MIB in the design variables space. Two different optimization problems are considered, depending on whether or not the centre of the approximating box is a priori specified. The remainder of the article is organised as follows: Section 2 introduces the multiple objective formulation for robust design as well as a short description of the heuristic tool here employed and the new proposed approach to solve the "centre unspecified" case. Section 3 brings the results for the different types of robust design relative to a reliability design of a Residual Heat Removal system of a Nuclear Plant. Finally, Section 4 presents the conclusions.
2. Robust Design Multiple Objective Formulation
A MO optimisation problem (MOP) consists of optimising a vector F(x) of objective functions f_i(x), i = 1, 2, ..., k, possibly under specified equality (h(x)) and inequality (g(x)) constraints:

Opt [ F(x) = (f_1(x), f_2(x), ..., f_k(x)) ]
s.t.: g_j(x) ≤ 0, j = 1, 2, ..., q;  h_j(x) = 0, j = 1, 2, ..., r  (q + r = m)

where x = (x_1, x_2, ..., x_n) ∈ X is the vector of decision variables, and X is the feasible domain. In the case of robust design, the decision variables are the design variables and the optimization problem consists in the determination of the MIB. Let B be a box of feasible solutions x defined as B: {x, c ∈ R^n | x_i ∈ [x_i^l, x_i^u], c_i = (x_i^l + x_i^u)/2}, where c is the centre. Two cases are considered [3]:
1) Centre specified. The idea is to identify a symmetrical MIB using a known point c as symmetry centre.
2) Centre unspecified. In this case, the centre of coordinates c is unknown and it is considered as an additional decision variable to be determined.
In both cases, the goal is to produce a symmetrical MIB around c. The robust design cases 1) and 2) can be formulated as MOP by including the original design objective function in the box search and transforming the constraints into objectives. For example, transforming the maximal cost Csmax (i.e. the maximal cost associated to solutions belonging to the inner box) of a design into an objective yields:
1. Centre Specified:
max Π_{i=1}^{n} | x_i^B − c_i |
min Csmax        (1)

2. Centre Unspecified:

max Π_{i=1}^{n} | x_i^B − c_i |
min Csmax        (2)
The first objective function measures the MIB hyper-volume, in which x^B represents a vertex of the optimal MIB. From it, the range of each variable is easily determined. Note that in the second case the centre c is also a design variable to be determined within the intersection of the feasible domain F and the box B. The MOP approach is general and can be used for any type and number of objectives, depending on the problem under study and the DM criteria.
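A minimal sketch of how the two objectives of formulations (1)-(2) can be evaluated for a candidate symmetric box is given below; the cost routine is a placeholder (e.g. an interval-arithmetic bound, as in [3]) and the function names are ours.

import numpy as np

def mib_objectives(c, x_low, cost_max):
    """Evaluate a candidate box with centre c and lower vertex x_low (the upper
    vertex follows from symmetry as 2c - x_low). Returns the product of
    half-widths, Π |x_i^B - c_i| (to be maximized), and the maximal cost over
    the box (to be minimized), computed by the user-supplied routine cost_max."""
    c, x_low = np.asarray(c, dtype=float), np.asarray(x_low, dtype=float)
    size = np.prod(np.abs(x_low - c))              # hyper-volume measure of Eqs. (1)-(2)
    return size, cost_max(x_low, 2.0 * c - x_low)  # cost over [x^l, x^u]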
In this research, the solution to the robust design MOP is tackled directly using Multiple Objective Evolutionary Algorithms (MOEA). This family of Evolutionary Algorithms is designed to deal with MOP as well as handling constraints. The approach does not guarantee the determination of the exact Pareto frontier, nor does any other heuristic approach for that matter. Nevertheless, a number of comparisons performed in Evolutionary Multicriteria Optimisation on benchmark problems have shown that results obtained using different instances of MOEA are very close to the exact solution (e.g. [4]).
2.1. Multiple Objective Evolutionary Algorithms
In the Evolutionary Multicriteria Optimisation field, the term Multiple Objective Evolutionary Algorithms (MOEA) refers to a group of evolutionary algorithms tailored to deal with MOP. This group of algorithms conjugates the basic concepts of dominance with the general characteristics of evolutionary algorithms. Therefore, MOEA are able to deal with non-continuous, non-convex and/or non-linear spaces, as well as problems whose objective functions are not explicitly known (e.g. the output of Monte Carlo simulation codes). In this work, the Non-dominated Sorting Genetic Algorithm (NSGA-II) is used [5]. It is a very efficient MOEA, which incorporates an elitist archive and a rule for adaptation of the population chromosomes that takes into account both the rank and the distance of each solution with respect to its neighbours in the population. NSGA-II is implemented following the pseudo-code presented in Alg. 1 below. Such implementation allows integer, real and mixed chromosomes. For the particular problem studied here, only real variables were used. The recombination mechanism is one-point crossover. For real variables, the crossover is performed as a linear combination whereas the mutation operation is performed as Gaussian mutation of type Rnew = Rold + N(0, σ²) (for more details see [6]). In the most general case of unspecified centre, the decision variables of the n-dimensional problem are the centroid c_i and the lower vertex x_i^l, where x_i^l ∈ [c_i^min, c_i^max], 0 ≤ c_i^min, c_i^max ≤ 1, i = 1, 2, ..., n. For each c_i, the following constraints stand: the lower vertex must verify c_i^min ≤ x_i^l ≤ c_i, whereas the upper vertex, which can be calculated as x_i^u = 2c_i − x_i^l due to the symmetry condition imposed, is restricted to c_i ≤ x_i^u ≤ c_i^max. A simple MOEA approach can be used directly to analyse the centre-specified case. On the contrary, the centre-unspecified case requires more attention. As a matter of fact, there exists a dependency between the bounds of
x_i^l and the value of c_i. At least two strategies can be employed to solve the problem with MOEA. The first one consists in a double-loop configuration, where the external loop controls the value of c_i and the nested loop finds the best vertex for each prefixed c_i. The process is repeated for different values of c_i, until a representative number of different centroids have been visited.

ALGORITHM 1. Pseudo-code for NSGA-II
Input:
• M (population size)
• N (archive size)
• tmax (max. number of generations)
Begin:
• Randomly initialize the non-dominated solutions archive PA^0, and set the population P^0 = ∅, t = 0.
• While t < tmax:
  - P^t = P^t + PA^t
  - Assign adaptation to P^t
  - PA^(t+1) = {N best individuals from P^t}
  - MP (mating pool) = {M individuals randomly selected from PA^(t+1) using a binary tournament}
  - P^(t+1) = {M new individuals generated by applying recombination operators on MP}
  - t = t + 1
• End loop
Output:
• Non-dominated solutions from PA^t
The other alternative is to transform the search space by means of a percentage representation that codifies each individual as a group of n pairs {c_i, %X_i^max}, where %X_i^max represents a percentage of the maximal distance between c_i and its limits [6,7]. Therefore, the mathematical relation used to determine the value of x_i is:
%X_i^max = | c_i − x_i^l | / min{ | c_i − c_i^min |, | c_i − c_i^max | }        (3)
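As an illustration of how this representation is decoded back into box vertices, a small sketch is given below (NumPy assumed; the bounds 0 and 1 and the function name are illustrative choices for a reliability-type variable).

import numpy as np

def decode_percentage(c, pct, c_min=0.0, c_max=1.0):
    """Decode the percentage representation of Eq. (3): each pair (c_i, %X_i^max)
    is mapped to a symmetric interval [x_i^l, x_i^u] around c_i whose half-width
    is the given percentage of the distance from c_i to its nearest bound,
    so the decoded box is feasible by construction."""
    c = np.asarray(c, dtype=float)
    half = np.asarray(pct, dtype=float) * np.minimum(c - c_min, c_max - c)
    return c - half, c + half          # lower and upper vertices (x^l, x^u)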
Since now the vertex and centroids relationship is relative, only feasible individuals can be produced when the recombination operators are applied [6]. 3. Computational Example The Residual Heat Removal system (RHR) is a low pressure system (400 psi) directly connected to the primary system which is at higher pressure (1200 psi). The RHR constitutes an essential part of the low-pressure core flooding system
which is part of the Emergency Core Cooling System (ECCS) of a nuclear reactor [8]. In Figure 2, a schematic of the system is shown. The reliability of the system, Rs, is modeled by 16 third-order cut sets originated by the combination of 8 basic events. The problem is to obtain the maximum allowed ranges of variation of the reliability of each basic component R_i, i = 1, 2, ..., 8, such that the system reliability remains bounded as 0.99 < Rs < 1. The component reliability values are subject to the constraint 0.80 < R_i < 1 and the centre of each variability interval is constrained to c_i = 0.90, i = 1, 2, ..., 8. Normally, the design problem seeks to constrain a cost function, e.g. Cs = Σ_{i=1}^{8} K_i · R_i^{α_i}. In this example, K = {100, 100, 100, 150, 100, 100, 100, 150} and α_i = 0.6 ∀i. The MO formulation is obtained considering the cost as an objective function.
Figure 2: Schematic of the RHR system for a BWR [12]
The centre-unspecified case is first illustrated with reference to the double loop approach [7,6]. Six sets of solutions were generated with different predefined centroids (Fig. 3), after a time-consuming search requiring 12550 evaluations of the objective functions for each centroid value. For each set it is assumed that all centroids are equal. Figure 4 shows the results obtained using the "percentage representation" (PR), where each c_i is not fixed in advance. In this figure we also included the non-dominated front for the set of solutions obtained earlier. Note that the Pareto front is enlarged using the PR, thus providing a bigger MIB. The obtained results show that the percentage representation formulation reduces the computational burden and facilitates the solution of the centre-unspecified case in a very efficient way, which leads in our example to the possibility of finding more robust and less expensive design alternatives. Note that better results are due not only to the evolutionary approach adopted, but as
previously mentioned, to the fact that the PR formulation transforms the search space, allowing a better exploration of it. The DM can now select any solution of the Pareto-optimal frontier, analyse its characteristics (position of the centroids, vertexes, and associated costs) and choose the preferred one.
Fig. 3. Trade-off between MIB and Csmax for selected centroids (c_i = 0.88, 0.89, 0.90, 0.91, 0.92, 0.93)
Fig. 4. Non-dominated front using the percentage representation
4. Conclusions This paper analyses a MO formulation to obtain robust system designs. The MO formulation extends the possibilities of the robust design approach providing the
DM with a wider horizon of non-dominated alternatives. The Multiple Objective Evolutionary Algorithm approach employed to solve the MOP formulation provides an excellent way to approximate the Pareto frontier, for both the centre-specified and the centre-unspecified cases. The approach based on the percentage representation remarkably improves the efficiency of the MO centre-unspecified solution technique.
References 1. Hendrix EMT, Mecking CJ, Hendriks ThHB (1996) Finding Robust Solutions for Product Design Problems. EJOR 92: 28-36 2. Milanese M, Norton J, J. Piet-Lahanier J (Eds.) (1998) Bounding Approaches to System Identification. Plenum Press, New York, USA 3. Rocco C, Moreno JA, Carrasquero N (2003) Robust Design using a HybridCellular-Evolutionary and Interval-Arithmetic Approach: A Reliability Application. Reliab Engnng Sys Safety 79(2): 149-159 4. Zitzler E, Thiele L (1999) Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Trans Evol Comput 3(4): 257-271 5. Deb K, Pratap A, Agarwal S, Meyarivan T (2001) A Fast and Elitist MultiObjective Genetic Algorithm: NSGA-II. KanGAL Report No. 200001. Kanpur Genetic Algorithms Laboratory (KanGAL). IIT India 6. Salazar D, Rocco C (IN PRESS) Solving Advanced Multi-Objective Robust Designs by means of Multiple Objective Evolutionary Algorithms (MOEA): A Reliability Application. Reliab Engnng Sys Safety 7. Martorell S, Carlos S, Villanueva JF, Sanchez AI, Galvan B, Salazar D, Cepin M (IN PRESS): Special Issue: Use of Multiple Objective Evolutionary Algorithms in Optimizing Surveillance Requirements. Reliab Engnng Sys Safety. 8. Marseguerra M, Zio E, Bosi F (2002) Direct Monte Carlo Availability Assessment Of A Nuclear Safety System With Time-Dependent Failure Characteristics, Proceedings of MMR 2002, Third International Conference on Mathematical Methods in Reliability, Trondheim, Norway.
FEATURE SELECTION FOR TRANSIENTS CLASSIFICATION BY A NICHED PARETO GENETIC ALGORITHM
E. ZIO, P. BARALDI AND N. PEDRONI
Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, Milan, 20133, Italy
Multi-objective genetic algorithms can be effective means for choosing the process features relevant for transient diagnosis. The technique allows identifying a family of equivalently optimal feature subsets, in the Pareto sense. However, difficulties in the convergence of the standard Pareto-based multi-objective genetic algorithm search in large feature spaces may arise in terms of representativeness of the identified Pareto front whose elements may turn out to be unevenly distributed in the objective functions space. To overcome this problem, a modified Niched Pareto Genetic Algorithm is embraced in this work. The performance of the feature subsets examined during the search is evaluated in terms of two optimization objectives: the classification accuracy of a Fuzzy K-Nearest Neighbors classifier and the number of features in the subsets. During the genetic search, the algorithm applies a controlled "niching pressure" to spread out the population in the search space so that convergence is shared on different niches of the Pareto front. The method is tested on a diagnostic problem regarding the classification of simulated transients in the feedwater system of a Boiling Water Reactor.
1. Introduction
In this paper, the search for an optimal feature subset upon which to perform diagnostics of nuclear transients is carried out via a multi-objective genetic algorithm (MOGA) within a wrapper approach [1]. The objective functions used for evaluating and comparing the feature subsets during the search are the recognition rate achieved by a Fuzzy K-Nearest Neighbors classifier [2] and the number of features forming the subsets. Correspondingly, the goal of the MOGA search is to converge on a family of feature subsets representative of the true nondominated solutions which form the Pareto front in the two-dimensional objective functions space [3, 4]. In this respect, a standard Pareto-based MOGA [4] may encounter difficulties in maintaining genetic diversity during a search in a high-dimensional feature space [3], so that the solutions found at convergence may not evenly represent the Pareto front [5]. To overcome this problem, in the present paper a modified Niched Pareto MOGA [5] is adopted to exploit its capability of evolving the population
towards alternative, equivalent solutions of feature subsets which give a well distributed, representative description of the Pareto front of nondominated solutions. This is achieved by applying a "niching pressure" in the parents selection step of the algorithm, such that those individuals with less crowded neighborhoods are preferentially selected as parents, and thus allowed to create more offspring in the following generations: this results in a population more evenly distributed in the objective functions space [5]. The proposed search scheme is compared with a standard Pareto-based MOGA in a task of classification of simulated transients in the feedwater system of a Boiling Water Reactor [6]. Haar wavelet decomposition [7] is applied to the transient signals for capturing their dynamic behaviour: this actually triples the number of features to be initially considered, significantly increasing the complexity of the problem. The paper is organized as follows. In Section 2, the way GA search can be applied to the feature selection task for classification is illustrated. In Section 3, the modified niched Pareto-based MOGA is introduced. In Section 4, the nuclear case study is presented. Finally, some conclusions are drawn in the last Section.
2. Genetic algorithms for feature selection
Given n features, the problem of selecting a subset of m relevant ones can be formulated as an optimization problem. In this view, given a set A of n-dimensional input patterns, a GA can be devised to find an optimal binary transformation vector Vf, of dimension n, which maximizes/minimizes a set of optimization criteria, i.e. the objective functions. Let m be the number of 1's in Vf and n - m the number of 0's. Then, a modified set of patterns B = Vf(A) is obtained in an m-dimensional space (m < n). Figure 1 shows the structure of a multi-objective GA feature selector that uses the final classification accuracy (to be maximized) and the dimension m of the transformed patterns (to be minimized) as optimization criteria. The GA creates a population of competing transformation vectors V_l, l = 1, 2, ..., which are evaluated as follows [8]:
i. The vector V_l is applied to each pattern of set A, giving a modified pattern which is then sent in input to the classifier.
ii. The set B of modified patterns thereby obtained is divided into a training set, used to train the classifier, and a testing set, used to evaluate the classification accuracy on new patterns.
iii. The classification accuracy obtained and the number of selected features, m, are used by the GA as a measure of the goodness of the transformation vector V_l used to obtain the set of transformed patterns.
iv. On the basis of this feedback, the GA conducts its search for a vector or a set of vectors which give rise to the best compromise between classification accuracy and parsimony in the selection of features.
The organization of the chromosome is quite straightforward [8]: each bit of the chromosome is associated with a parameter and interpreted such that if the i-th bit equals 1, then the i-th parameter is included as a feature in the pattern for classification, and excluded if the bit is 0. Concerning the fitness function of classification accuracy, each subset of features encoded in a chromosome is evaluated on a set of testing data using a fast-running nearest neighbor classifier. More specifically, in the applications which follow, the total number of pre-labelled patterns available is randomly subdivided into training and test sets consisting of 75% and 25% of the data, respectively. The Fuzzy K-Nearest Neighbor algorithm (FKNN) [3], with K = 5, has been applied to classify the test data on the basis of the location of the labelled training data. The random subdivision of the available patterns into training and test sets is repeated 10 times (10 cross-validations): the mean recognition rate, i.e. the average fraction of correct classifications over the 10 cross-validation tests, is calculated and sent back to the GA as the fitness value of classification accuracy of the transformation chromosome used to produce the transformed set of patterns B.
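A minimal sketch of this wrapper-style fitness evaluation is given below. It assumes scikit-learn and uses a crisp K-nearest-neighbour classifier in place of the Fuzzy KNN used in the paper; the 75%/25% splits, K = 5 and 10 repetitions follow the text, while the function and variable names are ours.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

def chromosome_fitness(bits, X, y, n_splits=10, k=5):
    """Evaluate a binary transformation vector: keep the features whose bit is 1,
    then estimate the mean recognition rate over repeated random 75%/25% splits.
    Returns (mean recognition rate, number of selected features)."""
    mask = np.asarray(bits, dtype=bool)
    if not mask.any():
        return 0.0, 0                              # empty subset: no diagnostic power
    rates = []
    for seed in range(n_splits):
        Xtr, Xte, ytr, yte = train_test_split(X[:, mask], y, test_size=0.25,
                                              random_state=seed)
        rates.append(KNeighborsClassifier(n_neighbors=k).fit(Xtr, ytr).score(Xte, yte))
    return float(np.mean(rates)), int(mask.sum())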
Figure 1. GA-based feature selection using classification accuracy and number of selected features as optimization criteria. Each binary chromosome from the GA population is used to transform the original patterns x, which are then passed to a classifier. The objective function values of the chromosome are the classification accuracy attained on the transformed patterns and their dimension m.
3. The Niched Pareto Genetic Algorithm (NPGA) with random sampling tournament selection
The Niched Pareto-based random sampling tournament selection procedure is adopted for selecting from the population the individuals that are going to reproduce [5]. The procedure is based on the random sampling of two groups of individuals from the entire population. The first one is named dominance tournament group and contains n_t chromosomes which are the candidates for selection as parents, whereas the second one, named dominance tournament sampling group and made of n_s chromosomes, is used for comparison of the individuals of the first group with respect to dominance. Each of the n_t individuals in the dominance tournament group is tested for domination against all the n_s individuals in the dominance tournament sampling set. Three different situations may occur:
i. only one of the individuals in the dominance tournament group is non-dominated by all the individuals in the dominance tournament sampling set. In this case, the non-dominated individual is selected for reproduction;
ii. all individuals in the dominance tournament group are dominated by individuals in the dominance tournament sampling set;
iii. at least two of the individuals in the dominance tournament group are non-dominated.
In cases ii) and iii) the individual which best seems to maintain diversity is selected for reproduction by using the equivalence class sharing method [8]. This method is based on the selection of the individual with the smallest niche count (see definition below) among all the individuals of the tournament group in case ii) and all the non-dominated individuals in case iii). The niche count m_i of the i-th individual in the tournament group, i = 1, 2, ..., n_t, is an estimate of how crowded its neighbourhood (niche) is. It is calculated over all the n_p individuals in the current population:
m_i = Σ_{j=1}^{n_p} s(d_ij)        (1)
where d_ij is the distance, either in the genotype or phenotype spaces (in the latter case, with respect to either the decision variables or the fitness functions), between the i-th candidate for selection and the j-th individual in the population, and s(d_ij) is the sharing function. This is a function decreasing with d_ij and such that s(0) = 1 and s(d_ij) = 0 for d_ij > σ_s, where σ_s is the niche radius, i.e. the
942 distance threshold below which two individuals are considered similar enough to affect the niche count. Typically, a triangular sharing function is used such 1foats(d,j)= \-d,jlcTs for dlj
using a large population size (np = 200) and a high probability of mutation (pm = 0.008). In a single run, the Pareto-based MOGA identifies a range of nondominated solutions with different classification performance (FKNN mean recognition rate)/complexity (number of features) trade-offs (Figure 2). The feature selector shows difficulties in exploring the solution space in the region with number of features m < 10: individuals with m = 4, 5, 6, 7 present unsatisfactory recognition rates and individuals with m = 0, 1, 2, 3 are not even found. The Niched Pareto-based approach of Section 3 is then investigated to improve the uniformity of coverage of the Pareto front by the optimal feature subsets at convergence. The following set of parameters has turned out by crude search to give the best results in terms of both classification accuracies and coverage and distribution of the individuals on the Pareto front: {n_t = 4, n_s = 20, σ_s = 0.1}. The population size (np) and the probability of mutation (pm) have been set equal to the values of the previous case. Figure 2 shows the comparison between the Pareto sets obtained by the Niched Pareto and the standard Pareto-based MOGA. The niching "pressure" applied by the equivalence class sharing method succeeds in spreading the population out along the Pareto optimal front: indeed, the NPGA Pareto solutions cover from m = 0 to m = 22 with only individuals with m = 8, 9, 15 not present; on the contrary, the standard Pareto-based MOGA front extends from m = 4 to m = 25, with elements with m = 15, 18, 19, 22-24 missing. Moreover, all the elements of the NPGA set have larger recognition rates than those of the set found by the standard Pareto-based MOGA, with particularly significant differences for m = 4-7. These results prove that the Pareto domination tournament and equivalence class sharing are efficient in preserving good individuals and maintaining genetic diversity in the population throughout.
Figure 2. Comparison between the Pareto optimal fronts obtained by the Niched Pareto (o) and the standard Pareto-based MOGA (*)
5. Conclusions A Niched Pareto-optimal tournament selection genetic algorithm search has been embraced for the selection of the features relevant to a nuclear transient classification task. Two objectives, i.e. the maximization of the classification accuracy in terms of mean recognition rate and the minimization of the number of features forming the subsets, have been used to drive the algorithm towards the identification of a representative and evenly distributed set of alternative, equivalent feature sets offering different trade-offs in terms of diagnostic power and complexity. The Niched Pareto GA has been compared to a standard MOGA, with respect to its ability of maintaining genetic diversity by means of niches during the search. The results obtained on the case study of selecting features relevant for diagnosing transients in the feedwater system of a Boiling Water Reactor, prove that the Niched Pareto approach is more effective than the standard Pareto-based MOGA in the selection of features in a high-dimensional space. The number of features contained at convergence in the NPGA Pareto solutions has turned out to range from m = 0 to m = 22 and the corresponding recognition rates from 0.0667 to 0.9549, whereas for the standard Pareto-based MOGA m goes from 4 to 25 with recognition rates from 0.7506 to 0.9329. Thus, the NPGA is superior in producing a diverse set of solutions with differing performance versus complexity trade-off characteristics.
Acknowledgements
The authors wish to thank Drs. Paolo Fantoni and Davide Roverso of the IFE, Halden Reactor Project, for providing the transient simulation data.
References
1. R. Kohavi, G. John, Artificial Intelligence 97, 273 (1997).
2. J. M. Keller, M. R. Gray, J. A. Givens, IEEE Trans. Syst., Man, Cybern. SMC-15, 4, 580 (1985).
3. C. Emmanouilidis, A. Hunter, J. MacIntyre, C. Cox, Proc. of ICANN'99, 9th International Conference on Artificial Neural Networks, 2 (1999).
4. D.E. Goldberg, Genetic Algorithms in Search, Optimization & Machine Learning, Addison-Wesley Publ. Co., 1989.
5. J. Horn, N. Nafpliotis, and D.E. Goldberg, Proc. of the IEEE Conference on Evolutionary Computation, ICEC '94, 1, 82 (1994).
6. E. Puska, S. Normann, Enlarged Halden programme group meeting, 2, 2002.
7. Strang, G. & Nguyen, T., Wavelets and Filter Banks (1996).
8. M. L. Raymer, W. F. Punch, E. D. Goodman, L. A. Khun, A. K. Jain, IEEE Transactions on Evolutionary Computation 4, No. 2 (2000).
9. D. Roverso, Proceedings of System Diagnosis and Prognosis: Security and Condition Monitoring Issues III, AeroSense2003, Aerospace and Defense Sensing and Control Technologies Symposium (2003).
10. E. Zio, P. Baraldi, D. Roverso, Annals of Nuclear Energy 32, 1649 (2005).
OPTIMIZED DIRECT FUZZY MODEL REFERENCE ADAPTIVE CONTROL APPLIED TO NUCLEAR REACTOR DYNAMICS FRANCESCO CADINI AND ENRICO ZIO Department of Nuclear Engineering, Polytechnic of Milan Via Ponzio 34/3 Milano, 20133, Italy In this work, the stable Direct Fuzzy Model Reference Adaptive Control is integrated with a Genetic Algorithm search of the optimal values of the critical controller parameters. The optimized controller is applied to a model of nuclear reactor dynamics.
1. Introduction The complexity and nonlinearity of nuclear systems render difficult the application of advanced control approaches relying on analytical representations of the underlying plant dynamics. For this reason, control methods are being studied which do not require an analytical model of the system dynamics [1] [2]. Nevertheless, the range of applicability of such controllers has thus far remained limited to simple cases, mainly because they demand advanced mathematics which is not easily accepted in practical applications [3], In order to reduce the gap between theory and practice, computational intelligence techniques, such as fuzzy logic, genetic algorithms and neural networks, are being introduced for establishing efficient, empirical input/output mappings of the system dynamics for use in the optimal control of complex systems [3] [4] [5] [6] [7]. In this paper, we address the problem of effectively controlling a plant resorting to the Direct Fuzzy Model Reference Adaptive Control (DFMRAC) method [3], which exploits a Takagi-Sugeno fuzzy description of the plant dynamics to extend to nonlinear systems the applicability of classic Model Reference Adaptive Control (MRAC) [8] [9]. An issue to be resolved in practice is the determination of the values of the parameters governing the fuzzy controller, rendered particularly difficult by the fact that the control approach does not rely on an explicit representation of the plant dynamics. 946
To tackle this problem, a Genetic Algorithm (GA) search procedure [10] [11] is here adopted for determining the 'best' parameter values that minimize a properly defined objective function which measures the goodness of the control. The paper is organized as follows. In Sec. 2, the structure of the DFMRAC controller is briefly introduced. In Sec. 3, the application of the proposed approach is offered with respect to the control of a nuclear reactor whose dynamics is described by a simplified, nonlinear model of literature [12]. Finally, some conclusions are drawn in Sec. 4.
2. The DFMRAC Controller for dynamic trajectory tracking
The tracking control of a system (called 'plant', in control terminology) amounts to determining a closed-loop function ('control action') for driving the plant output as close as possible to a desired reference trajectory, with guaranteed stability. Often, in practice, the system to be controlled is too complex and its underlying processes are poorly understood, so that a reliable mathematical model of its dynamics cannot be formulated. Recently, fuzzy control has gained popularity as a model-free approach which often outperforms other conventional approaches, such as PID controllers [4] [6]. A fuzzy controller relies on a Knowledge Base (KB) made up of fuzzy rules which capture the experience of human operators and/or are constructed on the basis of measured data representative of the plant controlled operation. The Direct Fuzzy Model Reference Adaptive Control method here adopted belongs to the direct adaptive control family, based on a fuzzy logic representation of the plant dynamics and on the Lyapunov redesigned scheme for guaranteed stability [8] [9]. In this method, the antecedents of the inference logic rules are taken from the fuzzy partition of the ranges of the relevant input variables measured by the plant sensors whereas the consequents are the crisp values of two parameters (gains) in the control laws. In the following, we provide a brief summary of the concepts underlying the stable DFMRAC control method introduced in [3].
2.1. Overview of the DFMRAC controller
The general control scheme is shown in Figure 1. The aim of the tracking control strategy is to drive the evolution of the output y_p of the Plant, whose dynamic model is unknown, as close as possible to the output y_m of the known, properly chosen Reference Model. The Reference Model is fed with the actual, desired reference signal y_ref and its output y_m represents a more realistically trackable Plant trajectory. The unknown dynamics of the Plant is represented by means of
a Takagi-Sugeno (T-S) fuzzy logic system [13]. Adaptive Laws are introduced to compute and update the control gains f_i's and q_i's (with i = 1, ..., k and k = number of fuzzy rules) based on the tracking error e, the Plant output, the reference signal and the control action u at the previous time step. In the Controller, the control action is computed as a linear combination of the reference signal and the Plant output, weighted by the control gains f_i's and q_i's, multiplied by the strengths β_i's of the rules activated in the fuzzy logic system, as it will be shown in the next Section.
Figure 1: General DFMRAC control scheme (blocks: Reference Model, GA (off-line), Adaptive Laws, Fuzzy System, Controller, Plant)
2.2. The DFMRAC algorithm
In what follows, a brief description of the different blocks of Figure 1 is given. The interested reader may refer to [3] for further details.
2.2.1. The Plant and the Fuzzy Logic System
In the DFMRAC framework, it is assumed that the relevant Plant dynamics can be described as a nonlinear first order model, whereas the higher order components are not modeled, but do not cause instability thanks to the introduction of appropriate, robust adaptive laws (Sec. 2.2.4 below). A Takagi-Sugeno (T-S) Fuzzy Logic System is employed for modeling the non-linear Plant behavior. If a first order representation of the Plant is assumed and the nonlinearity depends on a vector φ of n measurable quantities φ = (z_1, ..., z_n), then the process can be described by k 'if-then' rules (fuzzy rules) of the kind

    if z_1 is A_{i_a} and z_2 is B_{i_b} and ... and z_n is N_{i_n}, then  ẏ_p = -a_i y_p + b_i u

    i_a = 1, ..., m_a;  i_b = 1, ..., m_b;  ... ;  i_n = 1, ..., m_n;  i = 1, ..., k = m_a × m_b × ... × m_n
where A_{i_a}, B_{i_b}, ..., N_{i_n} are the fuzzy sets which describe the antecedents z_1, ..., z_n in the i-th rule, among those which partition the corresponding Universes of Discourse (UODs), and a_i and b_i are unknown Plant parameters. For a given input vector φ, the output of the T-S system is modeled as

    ẏ_p = [ Σ_{i=1}^{k} β^i(φ) (-a_i y_p + b_i u) ] / [ Σ_{i=1}^{k} β^i(φ) ]                (1)

where β^i(φ) is the strength of the i-th rule in correspondence of φ and it is obtained resorting to a T-norm which, in this case, is the simple algebraic product of the membership functions. Upon normalization of the strengths of the rules and inclusion of the parasitic dynamics (i.e., linear higher-order perturbations of the Plant), assumed to be stable, and external disturbances (i.e., exogenous signals interfering with the Plant functioning), the above equation reads
    ẏ_p = -(β^T a) y_p + (β^T b) u - Δ_y(p) y_p + Δ_u(p) u + d                (3)
where a and b are the vectors of unknown parameters appearing in the fuzzy rule implication (for a list of assumptions on the plant parameters refer to [3]), β is the vector collecting the normalized strengths of the rules defined above and it is used to determine the fuzzified control gains (see Sec. 2.2.4), p is the differential operator d/dt and Δ_y(p), Δ_u(p) are the time-domain linear operators corresponding to the parasitic dynamics. The model is a first order, nonlinear system, since the coefficients β vary dynamically. It can be shown [3] that satisfactory results can be achieved even when a higher order plant is treated as a first order plant.
2.2.2. The Controller
A direct control strategy is adopted which implies the fuzzy estimation of the controller parameters. In this view, the control law is a simple extension of the one already devised for the classic MRAC control scheme
    u = (β^T f) y_ref - (β^T q) y_p                (4)
where f and q are vectors of the fuzzified control gains corresponding to each rule. In the MRAC control scheme, where the Plant is assumed to be a first order linear system, there are only two scalar control gains f and q. They are updated resorting to ad hoc adaptive laws which guarantee stability by means of the Lyapunov redesign scheme [8] [9]. In the DFMRAC algorithm, the Plant is assumed to be a combination of first order linear systems (in the T-S fuzzy model) driven by the strengths of the rules activated by the Fact, i.e. a vector φ of signals measured from the Plant. The first order linear system related to the generic i-th rule gives rise to two control gains, f_i and q_i (i = 1, ..., k), and two corresponding adaptive laws. The control action u in Eq. 4 has the same expression as in the MRAC scheme, but the scalar control gains are replaced by a weighted combination of the 2×k new control gains updated by 2×k adaptive laws.
2.2.3. The Reference Model
In classic MRAC theory, the design of the Reference Model requires the condition of perfect model matching, i.e. the relative degree (pole excess) should equal that of the Plant. Thus, in the DFMRAC case, where the Plant is modeled as a first order Takagi-Sugeno fuzzy system, a good choice is a first order Linear Time Invariant (LTI) system, with stability, controllability and minimum phase guaranteed. It is advisable to choose the reference model neither quicker than the parasitic dynamics, due to robustness issues, nor too slow, for obvious performance reasons. In the case study considered in this work, the Reference Model is the same as in [3], its transfer function in the Laplace domain being:

    G_m(s) = 3 / (s + 3)                (5)
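For illustration only, the quantities entering Eqs. (1), (4) and (5) can be evaluated numerically as in the following Python sketch. The triangular fuzzy partition of a single antecedent, the zero initial gains and the time step are assumptions of the sketch; the unit steady-state gain of the reference model in Eq. (5) is likewise an assumption made here for tracking purposes, not a value taken from [3].

import numpy as np

def triangular_mf(z, a, b, c):
    # triangular membership function with support [a, c] and peak at b
    if z <= a or z >= c:
        return 0.0
    return (z - a) / (b - a) if z <= b else (c - z) / (c - b)

# illustrative fuzzy partition of one antecedent (e.g. normalized flux on [0.5, 3])
peaks = np.linspace(0.5, 3.0, 11)
width = peaks[1] - peaks[0]

def rule_strengths(z):
    # normalized strengths beta_i of the k rules for the measured antecedent z
    beta = np.array([triangular_mf(z, p - width, p, p + width) for p in peaks])
    return beta / beta.sum()

def control_action(beta, f, q, y_ref, y_p):
    # Eq. (4): u = (beta^T f) y_ref - (beta^T q) y_p
    return beta @ f * y_ref - beta @ q * y_p

def reference_model_step(y_m, y_ref, dt):
    # forward-Euler step of G_m(s) = 3/(s+3), i.e. dy_m/dt = -3 y_m + 3 y_ref
    return y_m + dt * (-3.0 * y_m + 3.0 * y_ref)

# one illustrative evaluation at a single time step
k = len(peaks)
f = np.zeros(k); q = np.zeros(k)        # control gains, updated on-line by the adaptive laws
beta = rule_strengths(1.0)
u = control_action(beta, f, q, y_ref=1.0, y_p=1.0)
y_m_next = reference_model_step(y_m=1.0, y_ref=1.0, dt=1.0)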
2.2.4. The Adaptive Laws In analogy with the classic MRAC method [3] [8] [9], the Lyapunov redesigned scheme for guaranteeing the global stability of the controlled system leads to the following expressions for updating the control gains:
    ḟ_i = -γ_fi ε y_ref β_i b_sign - γ_fi |ε m| ν_0 f_i ,      i = 1, 2, ..., k
    q̇_i =  γ_qi ε y_p  β_i b_sign - γ_qi |ε m| ν_0 q_i ,      i = 1, 2, ..., k                (6)
The parameters involved in the definition of the adaptive laws above are:
• the sign of the elements of b in (3), b_sign. The elements of b have the same signs (refer to [3] for the model assumptions);
• the positive scalar adaptive gains γ_fi and γ_qi;
• the design parameter ν_0, which determines the intensity of the leakage term;
• the parameter m, which can be found solving the system

    m² = 1 + n_s² ,   n_s² = m_s ,   ṁ_s = -δ_0 m_s + u² + y_p² ,   m_s(0) = 0                (7)
with δ_0 being another design parameter;
• ε, which is related to the tracking error through the differential equation
    ε = e - G_m(p)(ε n_s²)                (8)
where n_s² has been defined above and G_m(p) is the Reference Model operator in the time domain corresponding to G_m(s). Of the above parameters, the positive scalar adaptive gains γ_f and γ_q (here assumed to be the same for the i = 1, 2, ..., k fuzzy rules), the leakage term ν_0 and the design parameter δ_0 are particularly critical for the success of the control strategy and the task of determining their values is a non trivial one. In this work, we addressed this problem resorting to a genetic algorithm (GA) optimization search [10] [11]. The cost function (objective function or fitness in GA terminology) to be minimized is typical for tracking a reference trajectory with a constrained control action:

    J_2 = (1/N) Σ_{l=1}^{N} [ e²(l) + R u²(l) ]                (10)

    e(l) = y_p(l) - y_m(l)                (11)
where y_p(l) is the Plant observable output delivered by the system at the l-th of N time steps, y_m(l) is the output of the Reference Model and u(l) is the control action, also assumed to be measurable. The coefficient R functions as a weight and needs to be calibrated.
3. Power Level Control in a Nuclear Reactor
The DFMRAC method illustrated in Sec. 2 has been applied to the control of the power level (neutron flux) of a nuclear reactor. A simplified model of the Plant, taken from literature [12], has been adopted for describing the neutron flux
evolution, based on a one-group, point kinetics equation with nonlinear reactivity feedback and Xenon and Iodine balance equations. The model accounts only for the neutronics of the reactor. The control action u(t) is the reactivity. The reference trajectory y_ref considered for tracking is a combination of two quasi-steps representing a decrease to 75% of the nominal power followed by an increase restoring the initial power level. The mission time considered is 500 minutes. In the fuzzy rules a single antecedent describes the neutron flux (output of the Plant) at the current time step, normalized with respect to its steady state value. The corresponding UOD, [0.5, 3], has been divided into 11 half-overlapping Fuzzy Sets defined by triangular Membership Functions. A GA optimization procedure has been carried out to determine the values of the four controller parameters introduced in Sec. 2, so as to minimize the cost function of Eq. 10 with R = 0, i.e. only the mean quadratic tracking error is considered. The time step has been chosen equal to 1 s, taking into account the limitations on the typical sampling frequencies of operation of the sensors and control devices in such systems. Table 1 contains the parameter ranges considered in the search and the optimal values found. The tracking capabilities of the optimized controller are highly satisfactory (Figure 2), the mean quadratic error being J_2 = 3.336 × 10⁻⁵. The optimal values of the two control gains γ_f and γ_q are close to the border of the search range. Yet, an extension of the search space for these variables has not led to any improvement.

Table 1. Genetic algorithm data.
Parameter    Search range      Optimal value
ν_0          (10⁻³, 10²)       4.941 × 10⁰
δ_0          (10⁻³, 10²)       2.627 × 10⁻¹
γ_f          (10⁻⁵, 10⁻²)      9.966 × 10⁻³
γ_q          (10⁻⁵, 10⁻²)      9.966 × 10⁻³
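As a purely illustrative sketch of how the GA fitness of Eqs. (10)-(11) can be evaluated for a candidate set of controller parameters, the following Python fragment assumes that a user-supplied closed-loop simulator returns the recorded trajectories; the simulator itself and the dictionary keys used for the parameters are assumptions of the sketch.

import numpy as np

def tracking_cost(y_p, y_m, u, R=0.0):
    # Eq. (10): J2 = (1/N) * sum(e^2 + R*u^2), with e = y_p - y_m (Eq. 11)
    e = np.asarray(y_p) - np.asarray(y_m)
    u = np.asarray(u)
    return float(np.mean(e**2 + R * u**2))

def ga_fitness(controller_params, simulate_closed_loop, R=0.0):
    # fitness of one chromosome: run the closed loop with the candidate
    # parameters (nu0, delta0, gamma_f, gamma_q) and return J2
    y_p, y_m, u = simulate_closed_loop(controller_params)
    return tracking_cost(y_p, y_m, u, R)

# dummy simulator standing in for the reactor + DFMRAC loop
dummy = lambda params: (np.ones(500), np.ones(500), np.zeros(500))
print(ga_fitness({"nu0": 4.941, "delta0": 0.2627, "gamma_f": 9.966e-3, "gamma_q": 9.966e-3}, dummy))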
4. Conclusions When designing a controller for a real plant, the major concern is that an accurate mathematical model of the plant behavior is often not available, or, when it is, it may be characterized by high nonlinearities and complications difficult to capture accurately. For this reason, resorting to model-free controllers is becoming more and more attractive in many practical situations. In this paper, a Genetic Algorithm was employed to optimize the critical parameters of a recently proposed stable, adaptive, model-free fuzzy controller
953 (DFMRAC). The optimization has been carried out with respect to an objective function which in general can include the tracking error and the cost of the control action, thus being capable of taking into account the typical constraints which appear in practice in the controller operation.
Figure 2: Output of the Plant (normalized neutron flux) vs. the Reference signal, with optimized parameters, R = 0 and 1 antecedent. Only the portions in correspondence of the two quasi-step transients (around 40-65 min and 430-455 min) are shown.
References
1. K.S. Tsakalis and P.A. Ioannou, Automatica, 23, 459 (1987).
2. M. Krstic and P.V. Kokotovic, Systems Control Letters, 26, 17 (1995).
3. S. Blazic, I. Skrjanc and D. Matko, Int. Journal of Syst. Sci., 33, 995 (2002).
4. L.-X. Wang, IEEE Transactions on Fuzzy Systems, 1, 146 (1993).
5. M. Marseguerra and E. Zio, Annals of Nuclear Energy, 30, 953 (2003).
6. H. Habibiyan, S. Satayeshi and H. Arab-Alibeik, Annals of Nuclear Energy, 30, 1765 (2004).
7. M. Marseguerra, E. Zio and F. Cadini, Ann. of Nuc. Energy, 32, 712 (2005).
8. R.V. Monopoli, IEEE Transactions on Automatic Control, 474 (1974).
9. K.S. Narendra, Y.H. Lin and L.S. Valavani, IEEE Transactions on Automatic Control, AC-25, 440 (1980).
10. J.H. Holland, The University of Michigan, Ann Arbor (1975).
11. D. Goldberg, Addison-Wesley Publishing Company (1989).
12. J. Chernick, Nuclear Science and Engineering, 8, 233 (1960).
13. T. Takagi and M. Sugeno, IEEE Trans. Syst., Man, Cybern., 15, 116 (1985).
A FUZZY-LOGIC-BASED METHODOLOGY FOR SIGNAL TREND IDENTIFICATION ENRICO ZIO Department of Nuclear Engineering, Polytechnic of Milan Via Ponzio 34/3, 20133 Milan, Italy enrico.zio@polimi.it
IRINA CRENGUTA POPESCU Department of Nuclear Engineering, Polytechnic of Milan Via Ponzio 34/3, 20133 Milan, Italy
The present work addresses the problem of on-line signal trend identification within a fuzzy logic-based methodology previously proposed in the literature. A modification is investigated which entails the use of singletons instead of triangular fuzzy numbers for the characterization of the truth values of the six parameters describing the dynamic trend of the evolving process. Further, calibration of the model parameters is performed by a genetic algorithm procedure.
1. Introduction The present work investigates the practical use of a modified version of the fuzzy logic-based methodology for on-line signal trend identification proposed in [1]. The procedure for the application of the original methodology is as follows: six parameters are computed from the current and historical values of the measured signal; the trend information carried by each one of the six parameters is evaluated fuzzily by mapping its values into a properly constructed truth decision curve (tdc); the truth values are then again fuzzyfied into triangular fuzzy numbers (TFN); a final fuzzy number (FFN) is computed by means of the max operator to integrate the partial information carried by the individual parameters; the FFN is compared to pre-constructed fuzzy prototypes of trend to make the final decision on the signal dynamic behavior. In view of a simplification of the above procedure for its practical application, in the present work singletons are used, instead of TFNs. The corresponding max operator used for the integration of the information of the six parameters becomes trivial and the decision making task for the final trend 954
identification is performed by simply comparing the obtained singleton values with pre-established numerical thresholds indicating steady state or transient behaviors. Furthermore, to improve early detection and transients identification accuracy, the truth decision curves are pre-calibrated via a single-objective genetic algorithm optimization. The optimized method here proposed is tested on a case study concerning a distillation column. The paper is organized as follows. Section 2 synthesizes the fuzzy approach of [1] and its propounded modification. In Section 3 the modified method is tested on artificial signals similar to those in [1]. The calibration of the tdcs by a single-objective genetic algorithm is presented in Section 4. In Section 5, the method is applied for the identification of signal trends developing from the anomalous functioning of a distillation column. Finally, some conclusions are proposed in the last Section.
2. The fuzzy logic-based method of trend identification [1]
Let us consider a monitored signal x(t), sampled at equal time intervals of length Δt, and its associated measurement noise, generally normally distributed with a mean value equal to 0 and a standard deviation σ. The measured signal can then be assumed normally distributed with different means during steady-state and transient conditions [1]. In this view, the objective of a trend identification method becomes that of distinguishing between two Gaussian distributions. To this aim, six parameters are computed from the current and historical values of the signal to capture evidence on its trend [1]: the probability density function (p_x(t)); the cumulative probability density function (c_x(t)); the probability density function of the average of the signal (p_x̄(t)); the average exponentially weighted derivative (d_x^e(t)); the relative deviation of the signal average from its steady state (d_x̄(t)); the sample derivative (d_x(t)). To account for the increasing or decreasing signal trend, to x(t) is associated a parameter, sign(t), equal to -1 if x(t) is below the steady state mean value x_st and to 1 otherwise. The information on the trend carried by the value of each of the six parameters is evaluated in fuzzy terms via a mapping into a properly constructed tdc whose values represent the degrees of truth of the evidence carried by the parameter with respect to the presence or absence of a dynamic trend. The tdcs are functions of the noise amplitude and truncated at a truth value corresponding to a pre-assigned level of importance of the parameter with respect to the trend identification (Fig. 1) [1].
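For illustration only, the Python sketch below shows how two of the six parameters (the sample derivative and the relative deviation of the running average) and the sign(t) indicator can be extracted from a signal window and mapped into a truth value. The window length, the logistic-like stand-in for a truth decision curve and all numerical values are assumptions of the sketch; the actual parameter definitions and tdc shapes are those of [1].

import numpy as np

def trend_evidence(x, x_steady_mean, window=20):
    # illustrative evidence extraction from the last `window` samples of signal x
    w = np.asarray(x[-window:], dtype=float)
    d_x = w[-1] - w[-2]                                            # sample derivative (per time step)
    d_bar = (w.mean() - x_steady_mean) / max(abs(x_steady_mean), 1e-12)  # relative deviation of the average
    sign_t = -1 if w[-1] < x_steady_mean else 1                    # decreasing / increasing indicator
    return d_x, d_bar, sign_t

def stand_in_tdc(value, noise_sigma, importance=0.9):
    # placeholder truth decision curve: maps |value| / sigma to a truth in [0, importance]
    z = abs(value) / max(noise_sigma, 1e-12)
    return importance * (1.0 - np.exp(-z))

# toy usage: steady state followed by a slow ramp
x = list(np.random.normal(10.0, 0.01, 50)) + list(10.0 + 0.05 * np.arange(20))
d_x, d_bar, sign_t = trend_evidence(x, x_steady_mean=10.0)
truth = sign_t * max(stand_in_tdc(d_x, 0.01), stand_in_tdc(d_bar, 0.01))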
Figure 1. The truth decision curves of the six parameters for artificial signals similar to those used in [1] and different amplitudes of noise: 0.1% (dashed line), 1% (dotted line) and 3% (continuous line).
The tdc of each parameter depends on one or two coefficients, here identified after a number of tests (Fig. 1). The truth values of the parameters are fuzzified into triangular fuzzy numbers [1]. The membership functions are set to 1 for the computed truth values and to 0 for the truth values equal to 0 and 1 (sign(t) positive) or -1 (else). The max operator [2] is then adopted for the integration of the information carried by each of the six parameters into a FFN [1]. A practical simplification here propounded amounts to skipping the fuzzification of the truth values, thus considering them directly as singletons. In this case, the use of the max operator for integrating the information carried by the truth values of the six parameters becomes trivial. To make the final decision on the trend underlying the evolving signal, in [1] the distances between the FFN and properly defined prototype fuzzy numbers are computed in terms of the Dissemblance Index (DI) [1,3]. Two symmetric pairs of triangular prototypes are considered (Fig. 2a): prototypes FN0+ and FN0- correspond to steady state and FN1+, FN1- represent increasing and decreasing transients, respectively. The calculations become trivial when using the truth values (singletons) directly. In this case, the decision making task for the trend identification is performed by simply comparing the obtained max singleton truth value with pre-established thresholds indicating steady state, T0 = 0, or transient behaviors, T1+ = 1 for an increasing trend or T1- = -1 for a decreasing trend (Fig. 2b).
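A minimal sketch of the simplified singleton-based decision step follows; the numerical threshold separating steady state from transient behavior is an assumption of the sketch (the text only fixes the prototype values T0 = 0 and T1± = ±1), and the full confid(t)-based strategy of [1] is not reproduced.

def classify_trend(truth_values, steady_threshold=0.3):
    # truth_values: six signed truths in [-1, 1]
    # returns +1 (increasing), -1 (decreasing) or 0 (steady state)
    dominant = max(truth_values, key=abs)       # trivial 'max' integration of the six singletons
    if abs(dominant) < steady_threshold:        # closer to the steady-state prototype T0 = 0
        return 0
    return 1 if dominant > 0 else -1            # closer to T1+ = 1 or T1- = -1

# toy usage
print(classify_trend([0.05, -0.1, 0.02, 0.6, 0.4, 0.1]))   # -> 1 (increasing trend)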
05
(a) (b) Figure 2.(a) Prototype triangular fuzzy numbers (solid line) and final fuzzy number (dashed line); (b) Singletons prototype numbers (solid lines) and max truth value (dashed line)
In [1], the decision on the undergoing trend is taken with respect to a parameter confid{t) which is a function of DI. Confid(t) quantifies the trust in the information on the signal dynamic trend: a high value means a high confidence that the signal represents a steady state. In analogy with the original method [1], the parameter confid(t) is computed using directly the truth values. A parameter decrease^) is also computed to register the trend of confid[t). Two strategies (termed nonfuzzy and fuzzy, respectively) are then used in [1] for making the decision about the signal trend on the basis of the information carried by the parameters confidit) and decreased). 3. Verification of the method The modified procedure with singletons instead of fuzzy sets has been verified on four artificial signals similar to those analyzed in [1]: decreasing, steady state, slow increasing and step signal. The signals are considered affected by a normally distributed noise with standard deviation equal to 0.1%, 1% or 3% of the mean value. The results are represented in Fig. 3 for the case of 0.1% noise and fuzzy decision strategy. For the representation, a 0 is assigned to steady states, a -1 to decreasing trends and a +1 to increasing trends. Table 1 reports the percentages of correct identification. As expected, a lower performance in trend identification is achieved for the slow increasing signal due to the fact that the noise hides the small increasing trend of the considered signal. The overall performance of the simplified method is equivalent to that of the more general method proposed in [1],
(^jlf^rf^M1^
Figure 3. The artificial signals and the obtained trend identification results (0.1% noise)
4. The calibration of the truth decision curves
As explained in Section 2, the information carried by the six parameters is fuzzily evaluated through properly constructed tdcs. Each one of the six tdcs contains one or two coefficients whose values must be arbitrarily set also depending on the noise level. This can be done effectively by resorting to a single-objective genetic algorithm [4]: the coefficient values selected during the search are evaluated using as single criterion (fitness) the percentage of correct identification achieved by the fuzzy identification algorithm itself. Since the truth decision curves are functions of noise, the determination of their coefficients must be realized separately for every considered noise level. The optimized tdcs for the artificial signals used for verification (Section 3) are presented in Fig. 4 in comparison with the original curves of [1].
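As an illustrative sketch of such a calibration (not the GA of [4]), the fragment below maximizes a user-supplied evaluation function returning the correct-identification percentage for a candidate set of tdc coefficients; the number of coefficients, their bounds and the GA operators are assumptions of the sketch.

import random

def calibrate_tdc_coefficients(evaluate_percent_correct, n_coeff=8, bounds=(0.0, 5.0),
                               pop_size=30, generations=50, p_mut=0.1):
    # minimal single-objective GA maximizing the fitness of one noise level
    lo, hi = bounds
    pop = [[random.uniform(lo, hi) for _ in range(n_coeff)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=evaluate_percent_correct, reverse=True)
        parents = scored[: pop_size // 2]                        # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_coeff)
            child = a[:cut] + b[cut:]                            # one-point crossover
            if random.random() < p_mut:
                child[random.randrange(n_coeff)] = random.uniform(lo, hi)   # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=evaluate_percent_correct)

# toy fitness standing in for "run the trend identifier and count correct labels"
best = calibrate_tdc_coefficients(lambda c: -sum((ci - 1.0) ** 2 for ci in c))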
-1
Figure 4. Optimized (solid lines) and original (dashed lines) truth decision curves
The optimized truth decision curves for the probability density function p_x, the cumulative probability density function c_x and the probability density function for the average value p_x̄ are modified in such a way to associate higher truth values to transients. Indeed, for the transients considered the three parameters bear small values and the detection method is made more sensitive by assigning them higher truth values. On the contrary, to keep the balance for the steady state identification, the tdcs of the other three parameters (d_x^e, d_x̄, d_x) are smoothed
out to assign smaller truth values. The detection results obtained using the optimized tdcs are represented in Fig. 5 for the 0.1% noise case and reported in Table 1 for all cases. Decreasing signal
Steady state signal
^^WV*Y\^Mv-wvVv,l 10
0
120
20 Results
10 20 Slow increasing signal
10
-* 30
^*rfy^v
20 Results
*—
10 20 Step signal 120
20 Results
W*/*'^ 10
A r i n l
^^^
20 Results
Figure 5. Trend identification results obtained using the optimized truth decision curves (0.1% noise) Table 1. Correct identification percentage before and after the tdcs calibration 0.1% noise 1 % noise 3% noise Signal Original Optimized Original Optimized Original
r%i Decreasing Steady state Slow increasing Step
95.00 100.00 70.50 95.00
r%i 98.00 96.00 90.00 98.50
F%1 75.00 89.34 33.00 49.50
r%i 96.00 72.00 82.50 99.00
r%i 47.00 98.67 30.50 23.00
Optimized
r%i 71.00 52.00 51.00 71.50
Early transient detection is clearly improved (for example, the slow increasing signal is correctly classified after only 4 seconds instead of the previous 13 seconds). The higher sensitivity to the transients trend is paid with a minor (safe) worsening in the steady-state recognition.
960 5. Application: identification of transients trends in a distillation column A diagnostic problem is considered to exemplify the application of the proposed approach to trend identification. The system considered is a 12-plates distillation column for separating a flow of methanol and water [5]. The aim is to identify die anomalous functioning regimes due to a variation of three key process parameters, namely the reflux ratio, the heat duty of the reboiler and the feed flow [5]. The time evolution of 270 different signals (temperatures, compositions, liquid and vapor factors for each one of the 12 plates of the distillation column) was simulated in steady state as well as in transient conditions originating in response to three different types of process anomalies: variation of the heat duty (Q), variation of the reflux ratio (R) and variation of the input mass flow (F). Gaussian noise of 3% amplitude was applied to the signals. The results of die optimized fuzzy trend identification are reported in Table 2 for the signals measured in correspondence of the first 6 plates. Table 2. Percentage of correct identification atioilary. rr=transitory, T=temperature, C composition, LF= 2 3 4 Plate St Tr St Tr St Tr St Tr T 0 100 0 100 0 100 0 100 98 6 C 100 29 100 29 97 58 F LF - 98 30 99 40 99 47 15 VF 99 99 99 49 100 57 30 0 100 T 0 0 100 100 100 0 0 C 100 100 0 100 0 100 2 Q LF - 100 92 100 97 100 96 9 VF 100 100 85 100 83 100 78 T 000 100 0 100 0 100 0 100 94 100 92 100 95 C 100 58 99 R LF - 100 2 100 8 100 7 99 99 100 99 VF 100 99 100 99
iquid factor, 5 St Tr 0 100 97 63 99 40 100 70 0 100 97 6 100 93 100 72 0 100 99 95 100 2 99 99
VF=vapor fac 6 St Tr 0 100 95 72 95 53 100 99 0 100 99 3 100 97 100 7 0 100 100 90 100 1 99 96
6. Conclusions In this paper, a modified version of the fuzzy logic-based method for signal trend identification previously proposed in [1] has been presented. In the original procedure, six parameters related to the signal time-history are computed, then fuzzily quantified in truth values by properly constructed truth decision curves and finally summarized into a final fuzzy number. The trend identification is based on the analysis of this final fuzzy number with respect to given fuzzy
961 prototypes representing the signal trend. Because the fuzzification of the truth decision curves is arbitrary, the information integration and the decision strategy may become somewhat cumbersome. In this respect, a simplification may be achieved by using directly the truth values as singletons, as proposed in the present work. The results obtained on artificial data indicate that the simplified method is capable of obtaining the same performance as the original method in detecting transients accurately and identifying their trends reliably, with no misinterpretation of steady-state signals as transients and vice versa. In order to improve the identification of the transients, a calibration of the parameters of the truth decision curves has been proposed. To this aim, a singleobjective genetic algorithm has been exploited. The optimized method turns out to be more sensitive to the presence of a dynamic regime, with the time needed for the correct transient identification being reduced of up to 70%. The simplified and optimized method has been successfully applied on simulated signals of a 12-plates distillation column. The results indicate that the proposed method is capable of early and reliable identification of signal trends in the presence of noisy data. Acknowledgments The authors whish to thank Professors Jose Gozalvez and Juan Carlos Garcia for providing the simulation for the case study of Section 5. References 1. 2. 3. 4. 5.
X. Wang, et al., Nucl. Tech., 135, 67, (2001). L.H. Tsoukalas, R.E. Uhrig, John Wiley and Sons, New York, 101, (1997). G. Bojadziev, M. Bojadziev, WorldSc. Publ. Co. Pte. Ltd., 85, (1995). M. Marseguerra, E. Zio, L. Podofillini, Nucl. En., 30, 1437, (2003). J.M. Gozalvez Zafrilla, J. C. Garcia-Diaz, "Optimization of Distillation Columns using Genetics Algorithms", Milano, (2006).
IDENTIFICATION OF TRANSIENTS IN NUCLEAR SYSTEMS BY A SUPERVISED EVOLUTIONARY POSSIBILISTIC CLUSTERING APPROACH E. ZIO, P. BARALDI AND D. MERCURIO Department of Nuclear Engineering, Polytechnic of Milan, Via Ponzio 34/3, Milan, 20133, Italy In this paper, the task of identifying transients in nuclear systems is tackled by means of a possibilistic fuzzy classifier devised in such a way to recognize the transients belonging to a priori foreseen classes while filtering out unforeseen plant conditions, independently from the operational state of the plant before the transient occurrence. The classifier is constructed through a supervised evolutionary procedure which searches geometric clusters as close as possible to the real physical classes. The proposed approach is applied to the classification of simulated transients in the feedwater system of a boiling water reactor.
1. Introduction Two important issues for the practical implementation of model-based fault diagnostic systems in Nuclear Power Plants (NPPs) regard the possibility of defining and controlling the boundaries of their utilization and their capability to diagnose a fault independently from the plant operational state before its occurrence [1]. In this work, these issues are tackled by means of a novel possibilistic clustering classifier. The possibilistic viewpoint considers the memberships to a given cluster as degrees of compatibility, or 'typicality' measured with respect to the cluster prototypical members [2, 3]. In this view, the memberships of representative (typical) patterns are high, while unrepresentative (atypical) points bear low membership to all clusters. The approach embraced in this work exploits i) a possibilistic clustering algorithm for classifying the transients or labeling them as "unknown" if the associated feature values are located far away, in the feature space, from those characteristics of the training data; ii) a supervised evolutionary procedure for optimizing a different Mahalanobis metric for each of the possibilistic clusters by exploiting a priori known information regarding the true classes which a set of available labeled patterns belong to [4], [5]. 962
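For illustration, the typicality-style membership used in possibilistic clustering [2] can be written in the form sketched below, here combined with a cluster-specific Mahalanobis metric M as in the method of Section 2; the fuzzifier m = 2, the reference distance eta and the toy data are assumptions of this sketch, not values used in the case study.

import numpy as np

def possibilistic_membership(x, center, M, eta, m=2.0):
    # typicality of pattern x to a cluster with prototype `center`,
    # positive definite metric M and reference distance eta
    diff = np.asarray(x) - np.asarray(center)
    d2 = float(diff @ M @ diff)                     # squared Mahalanobis distance
    return 1.0 / (1.0 + (d2 / eta) ** (1.0 / (m - 1.0)))

# toy usage: identity metric, i.e. Euclidean distance
x = np.array([1.0, 0.5]); v = np.array([0.0, 0.0]); M = np.eye(2)
print(possibilistic_membership(x, v, M, eta=1.0))

Note that, unlike probabilistic fuzzy memberships, these values are not constrained to sum to one over the clusters, which is what allows atypical patterns to receive low membership to all classes.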
963 The proposed possibilistic clustering scheme is verified with respect to a problem regarding the early identification of a predefined set of faults in a Boiling Water Reactor (BWR). The corresponding transients have been simulated by the HAMBO simulator of the Forsmark 3 BWR plant in Sweden [6]. 2. The supervised evolutionary possibilistic clustering algorithm for classification In this Section, a possibilistic clustering algorithm is developed to perform the diagnostic identification of transients. The traditional, unsupervised possibilistic algorithm based on a Euclidean metric to measure compatibility leads to spherical clusters that rarely are adequate to represent the data partition in practice. A significant improvement in classification performance is achieved by considering a different Mahalanobis metric for each cluster, thus obtaining different ellipsoidal shapes and orientations of the clusters that more adequately fit the a priori known data partition [4,5]. The information on the membership of the N available patterns xk,k=\,..., N, to the c a priori known classes, can be used to supervise the algorithm for finding the optimal Mahalanobis metrics such as to achieve geometric clusters as close as possible to the a priori known physical classes. Correspondingly, the possibilistic clustering algorithm is said to be constructed though an iterative procedure of 'training' based on a set of available patterns, pre-labeled with their possibilistic memberships to the a priori classes. The training procedure for the optimization of the metrics is carried out via an evolutionary procedure, presented in the literature within supervised fuzzy clustering schemes [4] and further extended to diagnostic applications [5]. Here, the procedure is employed within the possibilistic clustering scheme [2]. The target of the supervised optimization is the minimization of the distance D(r',r) between the a priori known physical class partition r ' =(ri,r' 2 ,...,r^.) and the obtained geometric cluster partition
Γ ≡ (Γ_1, Γ_2, ..., Γ_c):

    D(Γ', Γ) = Σ_{i=1}^{c} ||Γ'_i - Γ_i|| / (N·c) = Σ_{i=1}^{c} Σ_{k=1}^{N} |μ'_ik - μ_ik| / (N·c)                (1)
964 corresponding geometric cluster in the feature space, N is the number of available patterns and c the number of classes. The overall iterative training scheme can be summarized as follows: 1. At the first iteration ( r = 1), initialize the metrics of all the c clusters to the Euclidean metrics, i.e. Ml{l) = i , i-l,2,...,c, where / is the identity 2.
matrix. At the generic iteration step r, run the possibilistic clustering algorithm [2] to partition the N training data into c clusters of memberships T (r) = | r ] ( r ) , . . . , r c ( r ) j , based on the current metrics Az~,(r) and on the
3.
"supervising" initial partition T' which sets the initial memberships of the N patterns to c clusters equal to die true memberships to the a priori known classes. Compute the distance D(T' ,r(r)) between the a priori known physical classes and the geometric possibilistic clusters by eq.(l). At the first iteration (r = 1) initialize the best distance D.+. to D ( r ' , r ( l ) ) , £>_,_+. to D(T'l,Ti (1)) and the best metrics M,+ toM,.(l) and go to step 5.
4.
If F(r) is close to T', i.e. D(r',T(r))
is smaller than a predefined
threshold e, or if the number of iterations r is greater than the predefined maximum allowed number of iterations T.max., stop: T(r) is the optimal cluster partition T*; otherwise, if D(r',r(r))
is less than D.+. upgrade D.+-
to £>(r',r(r)), M; to M,(r) and D? =£>(r;,r,(r)) . 5.
Increment r by 1. Update each matrix Mt
by exploiting its unique
decomposition into Cholesky factors [5], M* =\&.\
G+ , where G* is a
lower triangular matrix with positive entries on the main diagonal. More precisely, at iterationr, the entries g'lih (r) of the Cholesky factor G ( (r) are updated as follows: gU W = gi;,,2 +Kk(°'3+>
Xl-i-
8U(r) = m a x ( l ( T X / 2 + A ^ ( ° ' 3 + ) )
if L
'-=1^
-(2) -(3>
where 8* = aD*, a is a parameter that controls the size of the random step of modification of die Cholesky factor entries g'*t ,N\,
denotes a
965 Gaussian noise with mean 0 and standard deviation 5, and eq.(3) ensures that all entries in the main diagonal of the matrices Gi (r) are positive numbers and so Mt (r) are definite positive distance matrices. Notice that the elements of the j-th Mahalanobis matrix are updated proportionally to the distance D* between the i-th a priori known class and the J-th cluster found. In this way, only the matrices of those clusters which are not satisfactory for the classification purpose are modified. 6. Return to step 2. At convergence, the supervised evolutionary possibilistic clustering algorithm provides the c optimal metrics M' with respect to the classification task, the possibilistic cluster centers v* and the possibilistic membership values jujk of the patterns xk, k=l, ...,N, to the clusters i=l,2, ,..,c. When fed with a new pattern X, the classification algorithm provides the values of the membership functions [i'(x), i = 1 , 2 , . . . , c , to the possibilistic clusters. These values give the degree of compatibility or "typicality" of 3c to the c clusters. In practice, three situations may arise: i) x does not belong to any cluster with enough membership, i.e. all the membership values /u'(x) are below a given threshold ef (degree of ignorance): this means that x is an atypical pattern with respect to the training patterns; ii) at least two membership values are above the threshold sc (degree of confidence): x is thus ambiguous. In this case, the ambiguity must be regarded as "equal evidence", i.e. the pattern is typical of more than one class and thus cannot be assigned to a class with enough confidence. This situation occurs if x is at the boundary between two classes, iii) x belongs only to a cluster with a membership value greater than the threshold ec: in this case, it is assigned to the corresponding class. 3. Application of the possibilistic classifier to the identification of nuclear transients In this Section, the possibilistic classifier described in the previous Section is applied to the early identification of a predefined set of faults in the feedwater system of a Boiling Water Reactor (BWR) (see [7] for a detailed description of the faults). The corresponding transients have been simulated by the HAMBO simulator of the Forsmark 3 BWR plant in Sweden [6]. Here, the diagnosis considers three power operation levels, i.e. 50%, 80% and 108% of full power. Transient data were made available for each of the fault
966 types, with varying degrees of leakage and valve closures. All transients start after 60 seconds of steady state operation. tau« class 1
+T
ooooooooooococoeeeee >
OOOCCOO
•ftmoooooofcrawowfr
Figure 1: Time profiles of the pattern assignment to the different classes. Memberships to: (+) class Fl, (o) class F2, (*) class F3, (.) class F4, (x) class F5 and (0) class F7. Upper solid line, ec = 0.7; lower dashed line, e./ .= 0.2
Among the 363 measured signals, only 5 signals, i.e. Temperature of drain 4 before valve VB3, Water level of tank TDl, feedwater temperature after preheater EA2, feedwater temperature after preheater EB2, Position level of control valve for preheater EA1, have been chosen for the transient classification using the feature selection algorithm proposed in [8]. 3.1. Case study 1: filtering out unknown transients In this analysis, the patterns used for building the classification system have been taken from the six faults Fl, F2, F3, F4, F5 and F7 that regard line 1 of the feedwater system [7]. For each type of fault, the simulated transients with the plant at 80 % of full power have been considered, taking patterns every 6 seconds from t = 80s to t = 200s. After the training of the possibilistic classifier, its performance has been tested using patterns taken every second from t = 0s to t = 300s from both the training transients and from an unknown transient caused by F13. Figure 1 shows the obtained transient classification as time progresses. Considering a degree of confidence £<..= 0.7 and a degree of ignorance e./= 0.2, the results are quite satisfactory, even though at the beginning of the transient the possibilistic classifier assigns the steady state patterns to the class of fault F2 albeit with low membership. This is explained by the fact that for transients of
967 class F2 there are no significant effects on the selected input signals so that understandably the steady state may be confused with a fault of class 2. Also note that, the possibilistic classifier is able to assign to the right class the foreseen transients also at times well beyond the temporal domain of training of 200s, due to the increased significance of the signals as the transients continue evolving away from their initial steady state. Finally, the algorithm is very efficient in filtering out the patterns of the unknown fault F13 as atypical, by assigning them membership values to all classes less than £./.= 0.2 (Figure 1, bottom). 3.2. Case study 2: classification of transients at different power levels In this Section, the capability of the classifier to identify faults that initiate from different plant operational conditions is investigated. In this respect, the possibilistic classifier is trained using patterns taken from classes Fl, F2, F3, F4, F5 and F7 at 50 % and 108 % power whereas in the test phase also patterns taken at 80 % power are considered. From each of the 12 training transients considered (6 transients for each of the 2 power levels), patterns taken every 6 seconds from 80s to 200s have been used, for a total of 252 patterns. The performance of die classifier has been tested using the training transients (belonging to the foreseen classes of faults at 50 % and 108 %) and the new transients (belonging to the same classes of faults but at 80 % power), taking a pattern every second from 0s to 300s. The behavior of the memberships with respect to all the classes at 80% power is shown in Figure 2. Considering a degree of confidence e^. = 0.7 and a degree of ignorance s.f. = 0.2, the performance of the classification at the new power level 80 % is very satisfactory for the first five classes of faults whereas the transients caused by the fault F7 at 80 % is filtered as atypical at all times. This happens even for the training patterns of class F7 taken at 50 % of full power. To further investigate this situation, a sensitivity analysis has been performed, based on the technique reported in [5]. Figure 3 shows the disposition of the patterns in the subspace formed by two of the three signals identified as most important (signal 320 and signal 195).
968
Figure 2: Time profiles of the pattern assignment to the different classes for power level 80%. Memberships to: (+) class F1, (o) class F2, (*) class F3, (.) class F4, (x) class F5 and (◊) class F7. Upper solid line, ε_c = 0.7; lower dashed line, ε_f = 0.2
0
' .
V
feature 195
0
• ' -5
t
1
A
J'A
*•>
-•
.
+ cluster 7 center
I
• patterns classes 1,2, 3, 4 and 5 D patterns class 7 (50%) 0 patterns class 7 (80%) * patterns class 7 (108%)
feature 320
Figure 3: Patterns and cluster centers in the subspace formed by features 320 and 195 In this case, the evolutionary algorithm cannot find an optimal metric M'7 that results in a cluster that contains exclusively all the patterns of class F7, without containing those patterns of the other classes which are close to the patterns of class F7 at 50% power level. The reason for this is that two patterns having nearly the same distance from the center of cluster 7 (+), for example pattern A of class F7 and pattern B of class F5 in Figure 3, have nearly the same membership values to class F7, measuring their compatibility with it (eq. A2). In this situation, it is impossible to have possibilistic clusters with sharp borders and the target of the evolutionary algorithm of minimizing the distance
D(Γ'_7, Γ_7), between the true known physical class memberships Γ'_7 and the possibilistic cluster memberships Γ_7, is better satisfied by a small cluster centred on the patterns of class F7 at 108% than by a big cluster that contains all patterns of class F7 but also patterns of other classes.
4. Conclusions
In the present paper a supervised, evolutionary possibilistic clustering algorithm is proposed for building a diagnostic system for the classification of transients in nuclear systems. Within the possibilistic clustering scheme, a supervised evolutionary algorithm finds a Mahalanobis metric for each cluster which is optimal with respect to the classification of an available set of labeled patterns. The approach distinguishes itself from the existing ones because it allows identifying the boundaries of application of the classification model so that 'unbeknownst' transients are filtered out. Also it is capable of correctly classifying faults that occur with the plant in operating conditions different from those of the patterns used for the construction of the classifier model. The approach has been successfully verified with respect to the classification of simulated nuclear transients in the feedwater system of a BWR plant.
Acknowledgements
The authors wish to thank Drs. Paolo Fantoni and Davide Roverso of the IFE, Halden Reactor Project for providing the transient simulation data.
References
1. J. Reifman, Nuclear Technology 119, 76-97 (1997).
2. R. Krishnapuram and J.M. Keller, IEEE Trans. on Fuzzy Systems 1, 98-110 (1993).
3. P. Fantoni, International Journal of General Systems 9, 305-320 (2000).
4. B. Yuan, G. Klir and J. Swan-Stone, Proc. Fourth IEEE International Conference on Fuzzy Systems, 2221-2226 (1995).
5. E. Zio and P. Baraldi, Annals of Nuclear Energy 32, 1068-1080 (2005).
6. E. Puska and S. Normann, Enlarged Halden Programme Group Meeting 2 (2002).
7. D. Roverso, Proceedings of PNPIC and HMIT (2004).
8. E. Zio, P. Baraldi and N. Pedroni, Submitted to IEEE Transactions on Nuclear Science (2005).
SIGNAL GROUPING ALGORITHM FOR AN IMPROVED ON-LINE CALIBRATION MONITORING SYSTEM MARIO HOFFMANN OECD Halden Reactor Project, P.O. Box 173, NO-1751 Halden, Norway
On-Line Monitoring evaluates instrument channel performance by assessing its consistency with other process indications. In nuclear power plants the elimination or reduction of unnecessary field calibrations can reduce associated labor costs, reduce personnel radiation exposure, and reduce the potential for calibration errors. A signal validation system based on fuzzy-neural networks, called PEANO [1], has been developed to assist in on-line calibration monitoring. Currently the system can be applied to a limited set of channels. In this paper we will explore different grouping algorithms to make the system scalable and applicable in large applications.
1. Introduction On-line calibration monitoring provides information about the performance and calibration state of the monitored channel while a process, e.g. a nuclear power plant, is in operation. The instrument channel performance is evaluated by assessing its consistency with other plant indications. PEANO [1] is a system for on-line calibration monitoring and signal validation, which makes use of empirical modeling techniques, e.g. fuzzy-neural networks. The system utilizes auto-associative neural networks and fuzzy logic to calculate estimates of the monitored signals. One of the limiting factors of the PEANO system in real plant application so far is the number of signals that can be handled within a single application. Due to computational limitations as they exist today, the practical limit of the number of channels that can be handled with a single model lies around 60-65 signals. Considering the number of sensors installed in an operational nuclear power plant, an application able to deal with a maximum of around 60-65 signals will not be sufficient. In these types of application numbers of 300-1000 signals or even more are common. This means that one would have to create multiple models and run several instances of PEANO to handle such an application. This brings along a considerable strain on model handling, configuration and computational requirements. By and large it is desirable that an on-line 970
971 calibration monitoring system is capable to deal with the large number of signals, common in a real plant application. One obvious approach to address this problem is to have the system divide a large number of signals into smaller groups. After the grouping is performed, a model is developed for each of the sets of signals. The final results of these separate models are then compiled together, as is shown in Figure 1, and handled further as one set. It is important to recognize that this grouping is completely independent from the clustering technique, which will still be applied. For each of the identified groups a complete PEANO model will be developed. This means that the clustering technique is applied within each of the groups and an auto-associative neural network (AANN) will be developed for each cluster. Model Group 1
Model Group 2
Model Group...
Figure 1 Overall PEANO model with grouping
2. Signal grouping When dividing a large group of signals into smaller sets, one has to take several aspects into consideration. The modelling techniques applied in PEANO, i.e. specifically the auto-associative neural network (AANN), require that a sufficient level of correlation exists between the signals that are used in the model or sub-models. This criterion should be the basis for any type of grouping routine that will be applied. If the basic requirement is not fulfilled, one will not be able to develop a proper model for the group. However, it has to be noted that we are not looking for a solution where a few of the groups have a very high cross-correlation between the signals and one or two groups are composed of badly cross-correlated signals. The PEANO modelling techniques will not be suited for the groups with the low correlated signals. A grouping solution where
972 the average cross-correlation within the groups is of a sufficient level is much more desirable in that case. Other requirements that have to be considered for a grouping function are: • Any single sensor can be included in more than one group • Not all groups need to have the same number of sensors • The grouping algorithm should produce groups within an acceptable minimum and maximum size • Manual addition of sensors to groups must possible, based on physical process dependencies and expert knowledge When one decides to apply a grouping algorithm as described here, where a single signal can be included in more than one group, one also has to consider how to combine the results. Each sub-model that contains the sensor will generate a prediction for that sensor and there will be some variation in these values. In addition, each of the predictions will have different accuracy bands and there will be a reliability measure associated with each sub-model. Furthermore, the uncertainty for the prediction will be unique with each submodel. To achieve the most reliable prediction for a particular sensor that is validated by more than one model, all these aspects have to be taken into account when merging the prediction results produced by the different sub-models for that one sensor. 3. Grouping Algorithms 3.1. Genetic Algorithm This grouping routine that has been developed and implemented in the NNPLS model is based on genetic algorithm theory [2]. With this technique the problem is described in terms of a so-called chromosome and by applying genetic concepts such as a crossover or mutation operator, the chromosome is mutated. The operations are performed randomly, so when the same operation is applied twice on the same chromosome the offspring that result will not be similar. In Figure 2 it can be seen how the grouping problem is described in terms of a chromosome and in what way a mutation operation works on the chromosome to create the next generation.
31111 3 2 2 1 1 1 3 - 1 2 1 31212
t Signal 1 belongs to group 3 Figure 2 Signal grouping in terms of a chromosome
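For illustration only, the chromosome encoding of Figure 2 and a simple mutation operator can be sketched as follows in Python; the group count, the mutation probability and the fitness ingredient based on the average within-group cross-correlation are assumptions of this sketch, not the implemented NNPLS routine.

import itertools, random, statistics

def random_chromosome(n_signals, n_groups):
    # one gene per signal; the gene value is the group the signal is assigned to
    return [random.randrange(1, n_groups + 1) for _ in range(n_signals)]

def mutate(chromosome, n_groups, p_mut=0.05):
    # reassign each signal to a random group with probability p_mut
    return [random.randrange(1, n_groups + 1) if random.random() < p_mut else g
            for g in chromosome]

def mean_group_correlation(chromosome, corr):
    # fitness ingredient: average cross-correlation between signals sharing a group
    vals = [abs(corr[i][j])
            for i, j in itertools.combinations(range(len(chromosome)), 2)
            if chromosome[i] == chromosome[j]]
    return statistics.mean(vals) if vals else 0.0

# toy usage with a simple 10x10 correlation matrix
corr = [[1.0 if i == j else 0.5 for j in range(10)] for i in range(10)]
c = random_chromosome(10, 3)
print(mean_group_correlation(mutate(c, 3), corr))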
973 A grouping routine based on a genetic algorithm has a number of drawbacks. Most importantly one is not sure that the final solution is the absolute best result. In principal a genetic algorithm performs a search in a finite, but very large, space of possible solutions. Since a search of the complete space would take far too much time, the search is stopped after a predefined number of iterations. The best result that was found within the searched sub-space is presented as the final solution. The iterative search is computationally intensive and the required time increases as the search space expands. This means that the time needed for the algorithm will rapidly become longer to perform the grouping for an application with a larger number of sensors. This is a common drawback with empirical modelling techniques and should be minimized and preferably prevented as much as possible. The way the genetic algorithm has been implemented the user specifies the desired group size. The solution will consist of groups that in principal will all contain the same number of sensors. When the total number of signals cannot be divided evenly over the number of group, the remaining sensors will be divided over groups, which will have the desired group size plus one. As was identified earlier, splitting the total list of sensors in evenly sized groups will most likely not produce an optimal solution. However, if one would extend the genetic algorithm to include unevenly sized groups, the search space would be increased dramatically. This means that the iterative search would have to be expanded accordingly and the routine would have to run for a long time to make sure that a reasonable solution has been found. 3.2. Symmetric Reverse Cuthill-McKee ordering An alternative approach to the grouping problem has been developed together with the University of Tennessee in Knoxville [3]. This algorithm is also based on the cross-correlation matrix of the signals and makes use of the symmetric reverse Cuthill-McKee ordering [4]. This is a permutation that tends to have its nonzero elements of a matrix closest to the diagonal. This means that the signals with highest correlations are reordered around the diagonal. An example of a typical result from this grouping algorithm is shown in Figure 3. The user specifies a cut-off value for the minimum cross-correlation value that is permitted within a group. The algorithm first determines the crosscorrelation matrix for the signals and setting all values that are smaller than the specified cut-off value to zero. Next the Cuthill-McKee reordering is applied on
the resulting matrix. After the appearing blocks have been solidified, groups of signals can be identified along the diagonal.
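One possible realization of the described reordering, using the reverse Cuthill-McKee routine available in SciPy, is sketched below for illustration; the cut-off value and the test data are assumptions, and the final block (group) extraction along the diagonal is left out.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import reverse_cuthill_mckee

def rcm_grouping(signals, cutoff=0.7):
    # signals: (n_samples, n_signals) array; returns the RCM permutation of the
    # signal indices and the thresholded correlation matrix reordered accordingly
    corr = np.abs(np.corrcoef(signals, rowvar=False))
    corr[corr < cutoff] = 0.0                          # drop weak cross-correlations
    perm = reverse_cuthill_mckee(csr_matrix(corr), symmetric_mode=True)
    reordered = corr[np.ix_(perm, perm)]
    return perm, reordered

# toy usage: 200 samples of 12 signals
data = np.random.rand(200, 12)
perm, block_matrix = rcm_grouping(data, cutoff=0.3)
print(perm)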
50
60
Figure 3 Grouping result after a Cuthill-McKee re-ordering
The described algorithm works very fast compared to the genetic algorithm approach, since it is not based on an iterative search of a large and complex space but on a deterministic permutation of the cross-correlation matrix. The fact that the algorithm is deterministic is also an advantage, since one can be sure that the best solution achievable with the algorithm is found every time it is applied. The proposed groups are not necessarily of the same size, and the user is not required to specify a desired group size, since the algorithm determines the sizes itself. Groups also tend to overlap, allowing a sensor to be assigned to multiple groups. The initial algorithm did not limit the minimum or maximum number of signals in each group. As was noted earlier, it is essential for the PEANO approach that a group has a certain minimum number of signals for the modelling technique to be applicable. A maximum group size is desired to ensure that not all signals are assigned to one group, which would mean that the total modelling problem is not divided into multiple sub-models. The initial experience with the Cuthill-McKee based grouping algorithm is that the cut-off value for the cross-correlation is a very sensitive parameter. When the cut-off value is too large, the proposed solution will contain some very small groups; groups with only one sensor quickly appear as the cut-off value is increased. When the cut-off value is chosen too small, the resulting groups tend to cover all the sensors available for modelling, as can be seen in the grouping result shown in Figure 4. These results undermine the purpose of the grouping routine and are not desired.
Figure 4. Grouping result with a small cut-off cross-correlation value (7 groups covering 66 variables).
Overall, it seems difficult to find a cut-off value that produces a satisfying result with respect to maximum and minimum group sizes. Furthermore, the algorithm tends towards solutions in which a few groups have high correlations between their sensors while one or two groups are left with poorly correlated sensors. For example, merging the signals in the lower right corner of Figure 3 into one group will result in a sub-model with very poorly correlated signals. As we know, this will not result in good model performance and should be avoided.

4. Conclusions

It is known that On-Line Calibration Monitoring can improve the safety and efficiency of industrial processes, e.g. nuclear power plants, and the PEANO system has been successfully applied in many small-scale feasibility studies to perform this task. From recent applications it has become clear that the system will have to be made scalable to handle large-scale applications and satisfy the industry's needs. It is clear that a robust and autonomous signal grouping algorithm is needed to achieve this scalability. Neither of the two grouping solutions presented here allows one signal to be assigned to multiple groups. It could be left to the user to add specific signals to multiple groups, but an automated approach is desired. These two routines and several other grouping techniques are under investigation and will be developed further to implement a grouping routine that satisfies all the specified requirements. A solution still needs to be found for combining the prediction results for a sensor that has been assigned to and validated by multiple sub-models. These issues do need to be addressed, since it is very desirable to be able to include certain sensors in multiple sub-models.

References
1. P. Fantoni, M. Hoffmann, R. Shankar and E. Davis, On-Line Monitoring of Instrument Channel Performance in Nuclear Power Plant Using PEANO, ICONE-10 (2002).
2. J.H. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI (1975).
3. J.W. Hines and A. Usynin, Autoassociative Model Input Variable Selection for Process Monitoring, International Symposium on the Future I&C for Nuclear Power Plant, November 1-4, 2005, Tongyeong, Gyungnam, Korea.
4. E. Cuthill and J. McKee, Reducing the bandwidth of sparse symmetric matrices, in Proc. 24th Nat. Conf. ACM, pages 157-172 (1969).
INTELLIGENT TRANSIENT NORMALIZATION FOR IMPROVED EMPIRICAL DIAGNOSIS

DAVIDE ROVERSO
OECD Halden Reactor Project, PO Box 173, NO-1751 Halden, Norway

A fundamental requirement on data used for the training of an empirical diagnostic model is that the data be sufficiently similar to, and consistent with, what will be observed during on-line monitoring. In other words, the training data has to "cover" the data space within which the monitored process operates. The coverage requirement can quickly become a problem if the process to be monitored has a wide range of operating regimes, leading to large variations in the manifestation of the faults of interest in the observable measurement transients. In this paper we propose a novel technique aimed at reducing the variability of fault manifestations through a process of "intelligent normalization" of transients. The approach proposed here is to use a neural network to perform this mapping: the network uses the information contained in all monitored measurements to compute the appropriate mapping into the corresponding normalized state. The paper includes the application of the proposed method to a nuclear power plant transient classification case study.
1. Fault Detection and Diagnosis by Transient Classification

A large number of industrial processes are characterized by long periods of steady-state operation with occasional shorter periods of a more dynamic nature, corresponding either to normal events, such as minor disturbances, planned interruptions or transitions to different operating states, or to abnormal events, such as major disturbances, equipment failures, instrumentation failures, etc. The second class of events represents a challenge to the smooth, safe, and economical operation of the monitored plant and its equipment. The prompt detection and recognition of such an event is essential for the most effective and informed response to the challenge. The task of dynamic event recognition can be approached from different perspectives, such as symptom-based deductive inference (i.e. expert systems), residual generation, state classification (i.e. static pattern recognition), or transient classification (i.e. dynamic pattern recognition). The latter is the approach that we adopted and implemented in the neural network based Aladdin system [1, 2, 3] developed at the OECD Halden Reactor Project. The basic assumption behind this approach is that an event or fault will generate, in time, unique changes in monitored plant measurements. The discrimination of such changes can then in principle lead back to the originating event via an inverse mapping process.

2. The Coverage Problem of Empirical Modeling Techniques

Neural network based approaches, such as the one adopted in Aladdin, fall into the category of empirical (i.e. data-driven) modelling techniques. These techniques have the key advantage of not requiring extensive and detailed knowledge either of the process being monitored or of the underlying physical principles involved; the data is in itself sufficient for successful modelling. Then again, accurate and representative data is also a necessary prerequisite for successful empirical modelling. What this means in practice is that the data used for the training of an empirical model has to be sufficiently similar to, and consistent with, the data that will be observed during on-line monitoring. In other words, the training data has to "cover" the data space within which the monitored process operates. This is because empirical models cannot in general extrapolate to unknown situations in a reliable way. The coverage requirement can quickly become a problem if the process to be monitored has a wide range of operating regimes, leading to large variations in the manifestation of the faults of interest in the observable measurements. In the specific case of transient classification, where a time series of process measurements is analysed to detect and diagnose faults and anomalies, it is clear that a fault manifestation in the observed measurements can vary significantly depending on the specific condition in which the fault occurs. This variability has two main consequences:
1. The number of fault transient examples that will have to be acquired (or simulated) to achieve sufficient coverage will quickly grow to potentially unmanageable proportions.
2. The specific empirical modeling technique used might run into problems while trying to learn to classify as the same fault a set of widely different transient manifestations.
Both of these problems have been experienced in different applications of the Aladdin system mentioned above. One example, which we will also use in the case study reported in Section 4, involved the recognition of a number of faults and anomalies in a boiling water reactor (BWR) simulator [4]. The simulator used is an experimental simulator of the Swedish Forsmark-3 BWR nuclear power plant.
The initial set of faults considered, as reported in [4], involved a number of malfunctions and leakages in the high-pressure preheating section of the feedwater system, all simulated at full power operation. This task proved manageable with Aladdin, but the extension of the case study to a full range of power operation levels (from 50% upwards) proved much more difficult. This was mainly due to the large influence that reactor power has on most monitored measurements, leading to significantly different fault manifestations. In this paper we propose a novel technique aimed at reducing the variability of fault manifestations through a process of "intelligent normalization". Section 3 will describe an intelligent normalization technique based on neural networks, while Section 4 will show its application to the problem of power normalization in the case of nuclear power plant transient classification.

3. Intelligent Transient Normalization with Neural Networks

Transient normalization can be defined as the process of transforming or mapping instances or examples of a transient class into a common (normalized) transient prototype. An example of this is given in Figure 1, which shows a case in which three different instances of the same fault are manifested in one observed measurement in three different ways due to different initial conditions*. The task of transient normalization is to transform these so as to make them as similar as possible to the given prototype. Obviously, this is in general not possible by observing a single measurement, but when observing simultaneous changes in a number of measurements it should be possible to infer the current process state and compute an appropriate transformation. The approach that we propose in this paper is to use a neural network to perform this mapping. The neural network uses the information contained in all monitored measurements to compute the appropriate mapping into the corresponding normalized state. The proposed neural network model has N inputs and N outputs, where N is the dimensionality of the transient data. The network is then trained using as input data all the available transient examples, while the output (target) data consists of corresponding repetitions of a chosen prototype transient. An example taken from the data shown in Figure 1 is given in Figure 2.
* Please note that in all figures the abscissa indicates time in seconds while the ordinate is in the physical units of the particular measurement being shown. These physical units are not shown for simplicity since they have no direct relevance to the discussion.
Figure 1. Transient Normalization (three transient examples of the same fault).
Figure 2. Training Data for one Input of the Neural Network Transient Normalizer (input training data and output training data).
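To make the training setup concrete, the following sketch builds an N-input, N-output regressor and trains it to map every time step of each example transient onto the corresponding time step of the chosen prototype. The array layout and the use of scikit-learn's MLPRegressor are illustrative assumptions, not the implementation used for the results reported here.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

def train_normalizer(examples, prototype, hidden=(30,)):
    """examples: list of (T, N) transients of one fault class; prototype: the chosen (T, N) transient."""
    X = np.vstack(examples)                     # every time step of every example is one input pattern
    Y = np.vstack([prototype] * len(examples))  # target: the prototype repeated once per example
    net = MLPRegressor(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    net.fit(X, Y)                               # learn the N-to-N mapping into the normalized state
    return net

# A new transient is then normalized row by row:
# normalized = net.predict(new_transient)      # (T, N) -> (T, N)
```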
In the preparation of the training data it is of course essential that the input examples be correctly matched with the output prototypes: both the onset of the faults and their size and character need to be the same for the inputs and for the target outputs. The following section shows an application of this approach to the normalization of the set of nuclear power plant transients already mentioned in Section 2.
4. Power Normalization for Nuclear Power Plant Transient Classification

We discuss here the implementation of a neural network based transient normalization function for the case of 13 fault classes simulated in an experimental simulator of the Forsmark-3 BWR nuclear power plant in Sweden. For each of the 13 fault classes, 15 transients were simulated, corresponding to 5 fault variations, each simulated at 3 different power levels, namely 50%, 80%, and 108% power. In all simulations 12 selected measurements were logged and constituted the basis for the diagnosis. The 5 fault variations were the following:
1. ss - step small (e.g. a small abrupt leakage)
2. sm - step medium (e.g. a medium-size abrupt leakage)
3. sb - step big (e.g. a larger abrupt leakage)
4. rst - ramp short time (e.g. a gradual leakage developing relatively quickly)
5. rlt - ramp long time (e.g. a gradual leakage developing relatively slowly)
The size of the fault in the rlt and rst cases corresponded to the ss case, i.e. it was a small fault. For the development of the neural network transient normalizer, the 13 sb fault examples at 108% power were chosen as transient prototypes, while the sb fault examples at all three power levels were used for training. In the following we present a selection of the results obtained.
Figure 3. Training Results of Input 1 of the three sb Cases of Fault 1 (input training data vs. output training data and output results).
First we show in Figure 3 the training result on the first input of the three sb cases of Fault 1, and in Figure 4 the training results on the third input of the three sb cases of Fault 8. In all these graphs the clearly visible sequence of three transients corresponds to the three power levels 50%, 80%, and 108%. In all figures the training and target output data are shown as bold lines, while the neural network output is shown as a thin line.
Figure 4. Training Results of Input 3 of the three sb Cases of Fault 8 (input training data vs. output training data and output results).
We can easily observe in both cases how the neural network model has almost perfectly learned to map transients at different power levels to the prototype transient at 108% power. Similar results were also obtained for the other measurements and the other fault classes. Tests were then performed on transient cases not used during training, namely all the ss, sm, rlt, and rst transients. A selection of the obtained results is presented in the following. First we show in Figure 5 the test results of the first output of the three rst cases of Fault 1.
Figure 5. Test Results of Input 1 of the three rst Cases of Fault 1 (input test data vs. output test data and output results).
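A simple way to quantify what such test figures show is to compare the spread of a held-out transient across the three power levels before and after normalization. The spread measure in the sketch below is only an illustration, not a metric used in this study; net is the model fitted in the earlier sketch.

```python
import numpy as np

def spread_across_power_levels(versions):
    """versions: list of (T, N) arrays of the same fault recorded at different power levels."""
    stacked = np.stack(versions)                    # shape (n_levels, T, N)
    return float(np.mean(np.std(stacked, axis=0)))  # average point-wise spread between the levels

def variability_reduction(net, versions):
    before = spread_across_power_levels(versions)
    after = spread_across_power_levels([net.predict(v) for v in versions])
    return before, after  # e.g. for the three rst cases of Fault 1 at 50%, 80% and 108% power
```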
The results are still more than satisfactory, greatly reducing the variability of the original transients. Next we show in Figure 6 the test results of the twelfth output of the three rst cases of Fault 13.
Figure 6. Test Results of Input 12 of the three rst Cases of Fault 13 (input test data vs. output test data and test results).
Here we can see how even very small changes are well mapped by the trained neural network (one should notice that the apparently large normalization errors appearing in the right-hand graph are due to the scaling of the y axis), a fact also demonstrated by Figure 7, where a similar case is shown.
Figure 7. Test Results of Input 9 of the three rst Cases of Fault 6 (input test data vs. output test data and output results).
Comparable results to the ones shown here were obtained for most of the tested transients.

5. Final Remarks and Future Work

From this set of tests it appears that the proposed technique could be a promising approach for addressing the coverage and modeling problems of empirical techniques when applied to fault detection and diagnosis tasks. Additional work is needed to better evaluate the proposed method. One obvious extension of the tests conducted so far would be validation on transient data obtained at power levels different from the ones used for training (e.g. at 60% or 90%). Furthermore, one should compare the actual training performance of an empirical diagnostic system like Aladdin on the normalized data versus the performance on the original data. The utilization of ordinary transient data, i.e. data not generated by faults, to train the neural network normalizer could also be considered. If successful, this would have the very desirable consequence of further reducing the amount of fault examples necessary for covering the operational regimes of the monitored process. The availability of an ideal normalizer would mean that a single example (i.e. a prototype) for each fault class would in principle be sufficient to train an adequate classifier, because any other instance of the same fault would be correctly mapped to the learned prototype by the normalizer. If a particular type of fault, when occurring in different operating regimes, were to lead to very different manifestations in the observed transients, then the proposed approach cannot be expected to provide sufficient normalization and alternatives would have to be sought. One obvious possibility would be to split the specific fault type into a sufficient number of fault subtypes, depending on the number of fundamentally different manifestations observed. Our experimentation so far has not shown clear examples of this happening; however, practical experience with more applications of the proposed technique will guide further developments in this direction.

References
1. D. Roverso, "Plant Diagnostics by Transient Classification: The ALADDIN Approach", International Journal of Intelligent Systems, Vol. 17, No. 8, pp. 767-790, Wiley Periodicals Inc., 2002.
2. D. Roverso, "On-Line Early Fault Detection & Diagnosis with the Aladdin Transient Classifier", in Proceedings of NPIC&HMIT-2004, the 4th American Nuclear Society International Topical Meeting on Nuclear Plant Instrumentation, Control and Human-Machine Interface Technologies, September 19-22, 2004, Columbus, Ohio.
3. D. Roverso, "Dynamic Empirical Modelling Techniques for Equipment and Process Diagnostics in Nuclear Power Plants", IAEA Technical Meeting on On-Line Condition Monitoring of Equipment and Processes in Nuclear Power Plants Using Advanced Diagnostic Systems, 27-30 June 2005, Knoxville, TN, USA.
4. D. Roverso, "Fault diagnosis with the Aladdin transient classifier", in Proceedings of System Diagnosis and Prognosis: Security and Condition Monitoring Issues III, AeroSense 2003, Aerospace and Defense Sensing and Control Technologies Symposium, Orlando, FL, 21-25 April 2003.
USER INTERFACE FOR VALIDATION OF POWER CONTROL ALGORITHMS IN A TRIGA REACTOR

JORGE S. BENITEZ-READ*, CESAR L. RAMIREZ-CHAVEZ
Instituto Nacional de Investigaciones Nucleares (ININ), Carretera Mexico-Toluca S/N, La Marquesa, Ocoyoacac, Mexico, C.P. 52750, jsbr@nuclear.inin.mx
and Instituto Tecnologico de Toluca and UAEM, Mexico

DA RUAN
Belgian Nuclear Research Centre (SCK-CEN), Boeretang 200, 2400 Mol, Belgium
Phone: +32-14-332272, Fax: +32-14-321529, E-mail: [email protected]

The development of a user interface whose main purpose is the testing of new power control algorithms in a TRIGA Mark III reactor is presented. The interface is fully compatible with the current computing environment of the reactor's operating digital console. The interface, developed in Visual Basic, has been conceived as an aid in the testing and validation of new and existing algorithms for the ascent, regulation and decrease of the reactor power. The interface calls a DLL file that contains the control program, makes the plant and controller parameters available to the user, and displays some of the key variables of the closed-loop system. The system also displays the condition of the reactor with respect to the nuclear safety constraints imposed by the Mexican Nuclear Regulatory Commission (CNSNS). One of the algorithms under test is based on a control scheme that uses a variable state feedback gain and prediction of the state gain to guarantee compliance with the safety constraints.
1. Introduction

Different power control algorithms have been developed for the TRIGA Mark III research nuclear reactor of Mexico. One of them is the exact input-output linearizing control [1], which presents good reference-signal tracking characteristics. Another controller, based on a Mamdani-type fuzzy inference machine [2], efficiently deals with the uncertainties and the slow variations of the plant parameters, mainly due to its intrinsic interpolation among operating regions. These controllers satisfactorily bring the reactor power from 50 W (source level region) to 1 MW (full power). However, the first controller was not designed to avoid the automatic reactor shutdown (scram) caused by small reactor period values, which occur when the power increase rate exceeds a safety limit. On the other hand, the second controller and its sub-products have not been exhaustively tested for initial and final power levels other than 50 W and 1 MW, respectively. Also, these controllers do not consider any kind of regulation during power descent, a situation that can lead to unexpected oscillations. One control scheme that considers minimization of reactor period scrams, variable initial and final power levels, and controlled descent of power is the variable state feedback gain controller [3], which combines different techniques: state feedback to attain a given set point, a numerical method used to speed up the power increase, prediction of the best state gain that guarantees compliance with the reactor period constraint, and a stage based on fuzzy logic that minimizes oscillations. The performance of these control schemes has been simulated using Matlab. The next step is the testing and validation of these controllers in real time. To this end, it was proposed to design a computing system, similar to the reactor digital console, together with a simulator of the reactor dynamics, to perform such testing. Two problems initially arise: (a) the Matlab .m files are not directly compatible with the operating software used in the reactor console, which was developed in Visual Basic; and (b) the software of the reactor console is not sufficiently modular; for instance, the current control algorithm (PID) [4] is part of the main program and cannot be replaced by new algorithms as modules of the system. The proposed interface windows are designed to look similar to the windows that the reactor operators are used to seeing. Thus, the control buttons and visualization of parameters are similar to those presented by the reactor console. Likewise, the system will have the capability to manage nuclear instrumentation to test its response in real time. The interface, developed in Visual Basic, has been conceived as an aid in the testing and validation of new and existing algorithms for the ascent, regulation and decrease of the reactor power.

2. Current Reactor Digital Console

The operation of the reactor digital console is briefly described here. The console software was ergonomically designed: the user, through a friendly computational environment [4], can easily get information about the operating condition of the reactor. The control rods are managed by the Rod Control System (RCS), which is shown in Fig. 1. The RCS can be operated manually by means of a keypad located on the control panel or automatically from the reactor digital console. The console also contains electronic modules through which control commands are sent to the actuators that move the control rods, and through which it receives data from the sensors to determine the operating condition of the reactor.
Figure 1. Rod Control System (RCS)
When the reactor is operated in automatic mode, a PID controller regulates the reactor power.
Figure 2. Block diagram of the control system of the TRIGA Mark III Reactor (digital console with PID controller, speed regulator and jump limiter; rod control system and rod drivers; detection system; reactor core).
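Purely as an illustration of the automatic mode described above, a discrete PID loop acting on the error between the demanded power percentage and the measured linear-channel power could be sketched as follows; the gains, sampling period and sign convention are placeholder assumptions, not the console's actual tuning.

```python
class DiscretePID:
    """Illustrative discrete PID; kp, ki, kd and dt are placeholder values, not the console settings."""
    def __init__(self, kp=1.0, ki=0.1, kd=0.05, dt=0.1):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, demand_percent, measured_percent):
        error = demand_percent - measured_percent      # error between set point and measured power
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # The output would drive the up/down motion of the regulation and fine control rods.
        return self.kp * error + self.ki * self.integral + self.kd * derivative
```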
The controller output governs the up and down motion of the regulation and fine control rods to attain and maintain an acceptable error between the percentage of demand (set point) and the instantaneous power measured through the linear channel. A block diagram of the subsystems of the digital console system and their relation with the reactor is shown in Fig. 2.

3. Variable State Feedback Control Algorithm

This control scheme (Fig. 3) has been simulated under different initial and final power levels, and different values of the reactor period for the scram condition, to guarantee a safe ascent of power. The period scram condition states that the power at time t_i + Δt should not exceed the level n(t_i)·exp(Δt/T), that is, n(t_i + Δt) ≤ n(t_i)·exp(Δt/T), or equivalently T ≤ Δt / ln(n(t_i + Δt)/n(t_i)). For instance, let T = 3 s; if Δt / ln(n(t_i + Δt)/n(t_i)) falls below the 3-second limit, the power is increasing too fast and the result is an automatic shutdown of the reactor (scram).
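The period condition can be checked directly in code; the 3-second threshold comes from the example above, while the function itself is only an illustrative sketch.

```python
import math

def period_scram(n_prev, n_now, dt, t_min=3.0):
    """Instantaneous period for a power step from n_prev to n_now over dt seconds, plus the scram flag."""
    if n_now <= n_prev:            # power not increasing: the period condition cannot be violated
        return float('inf'), False
    period = dt / math.log(n_now / n_prev)
    return period, period < t_min  # scram if the instantaneous period falls below the t_min limit
```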
Figure 3. Block diagram of the variable state feedback controller (state feedback gains, predictor, and Takagi-Sugeno fuzzy stage).
The design procedure started with the analytical definition, based on the mathematical model of the reactor, of the steady-state scaling factor between the external reactivity ρext(t) and the reactor power n(t). Then, using state-space theory, a state feedback control law was proposed. The resulting closed-loop control system, although it eliminates the overshoot, renders an excessively slow neutron power response. The solution to this problem was to incorporate a stage that predicts the required feedback gain k3. The predictor, derived from the theory of first-order numerical integration, produces very good results during the first stage of the ascent of power. However, the neutron power presents an abnormal behavior (including irregular oscillations) when it approaches the desired reference level. To cope with this problem, a Takagi-Sugeno fuzzy stage was added to regulate the predictor action. The results obtained include a fast response and independence from the wide variety of potential operating conditions, something not easy and even impossible to obtain with other procedures. The point kinetic equations of the reactor were used as the mathematical reactor model in all stages of the controller design. These equations describe the dynamic behavior of small reactors where the spatial variations of neutron flux are also small [5].

4. Development of the User Interface

The algorithm coded in Matlab was converted into a shared DLL using the Matlab compiler tool, which generates C source code; the Visual C++ compiler then converts the C code into the DLL. One of the modifications consisted in carrying out the storage of the state gain k3 and the solution of the plant model outside the DLL. The methodology used to convert the algorithm to a shared DLL follows these steps:
a) The control algorithm is defined as a Matlab function (userfunction); the source and header files are generated with the Matlab compiler tool using the instruction mcc -W lib:userfunctionlib -L C -t -T link:lib userfunction.m, which creates shared C libraries that perform specialized computations. These libraries can be called from a program developed in another programming language. The options of this command are: (i) -W main to create a wrapper file for a Matlab-independent application; (ii) -L C to generate C code as the object language; (iii) -t to convert .m code into C code; and (iv) -T link:lib to create a C library as output.
b) The source (.c) and header (.h) files generated in a) are stored in the working directory where the DLL is to be created. The libraries used by the Matlab function are also copied into the working directory, for instance libmx.lib, libmmfile.lib, libmatl.lib, libmwsglm.lib, and sgl.lib. These libraries can vary, depending on the tasks performed by the Matlab function.
c) Visual C++ generates the Win32 DLL; within the DLL project (in the Project menu) the files and libraries in the working directory are added.
d) A new source file (userfunctionWrap) is added to the project, created as a .cpp file. This file is then renamed userfunctionWrap.c, since the libraries generated by the Matlab compiler are .c files. It defines the pointers to the input and output variables of the DLL; the number of pointers is determined by the input and output arguments of the original Matlab function. This file also calls some of the C source files generated by the Matlab compiler.
e) A DEF file is created for the DLL in the DLL project using the Text file option.
f) To create the DLL of the original Matlab function, the following paths are added to the Visual C++ compiler: c:\MATLAB6p5\extern\include and c:\MATLAB6p5\extern\lib\win32\Microsoft\msvc60.
Once the control algorithm has been converted to a DLL, the interface was developed in Visual Basic [8], where the DLL is declared as a function with its corresponding inputs. Thus, the reactor console can call the DLL controller to regulate the neutron power. Fig. 4 shows the user interface of the reactor digital console. The main features of this interface are: (a) numeric display of the following parameters: demanded power, current power, reactor period, and external reactivity; (b) graphical display of the current power; (c) selection of the operating mode and of the start-up and shutdown of the reactor; and (d) selection of the controller to be used in the automatic mode.
Figure 4. User interface of the reactor digital console.
In order to test the controller in a computational environment as close as possible to the real operating conditions of the reactor, a simulator of the reactor dynamics was also developed in Visual Basic. The interaction between the user interface and the reactor simulator is through serial communication. Fig. 5 shows the main window of the reactor simulator. Notice that the control is being carried out exclusively with the regulation rod.
Figure 5. Simulator of the TRIGA Mark III reactor.
The ActiveX control MSComm [7] was used to develop the communication module. The serial communication rate is 56000 bps; at this speed the complete transmission and processing of the control variables are guaranteed. Fig. 6 shows a block diagram of the interface/simulator system.
Figure 6. Block diagram of the interface/simulator system (digital console system on PC1 with the VB user interface and the control algorithm DLL; reactor simulator on PC2 with the reactivity-to-rod-position conversion algorithm and the point kinetic equations; the two PCs connected through serial communication modules).
The variables transmitted between the modules are the external reactivity (control signal), the neutron power (controlled variable), and the reactor period. After each module was independently tested, they were interconnected via RS232.
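The implementation itself uses the ActiveX MSComm control in Visual Basic; purely to illustrate the message exchange between the two PCs, an equivalent loop in Python with the pyserial library might look as follows, where the port name and the comma-separated framing are assumptions made only for this sketch.

```python
import serial  # pyserial

# Placeholder port name and framing; the real system uses MSComm in Visual Basic over RS-232.
link = serial.Serial(port='COM1', baudrate=56000, timeout=1)

def exchange(external_reactivity):
    """Send the control signal; receive the neutron power and the reactor period from the simulator."""
    link.write(f"{external_reactivity:.6f}\n".encode())
    reply = link.readline().decode().strip()       # e.g. "250000.0,12.5"
    power, period = (float(x) for x in reply.split(','))
    return power, period
```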
5. Results and Conclusions

The execution of the control algorithm in C++ takes a few milliseconds, thus permitting its application in real time. The communications protocol works at 56000 bps, allowing data acquisition, control signal computation and transmission as in the real console. A proportional control has been tested with satisfactory results. The testing of the real-time performance of the variable state feedback controller is still under development. Further work is related to the training of the reactor's personnel and the addition of electronic instrumentation to the interface system.

Acknowledgments

This work has been partially funded by the Mexican National Council for Science and Technology under grant 33797.

References
1. Perez Carvajal, Victor, "Control linealizador entrada-salida de un reactor nuclear", Tesis Ingenieria Electronica, I.T. Toluca, 1994.
2. Benitez-Read, J.S.; Velez-Diaz, D., "Controlling neutron power of a TRIGA Mark III research nuclear reactor with fuzzy adaptation of the set of output membership functions", Ch. 5, Fuzzy Systems and Soft Computing in Nuclear Engineering, pp. 83-114, Physica-Verlag, 2000.
3. Perez Cruz, J.H.; Benitez Read, J.S., "Controlador con Retroalimentacion Adaptable de Estado para la Regulacion de la Potencia de un Reactor Triga MARK III", Electro 2004, Vol. XXVI, pp. 49-54.
4. Rivero Gutierrez, T.; Gonzalez Marroquin, J.L.; Sainz Mejia, E., "Manual de Operacion, Consola de Control Digital del Reactor TRIGA Mark III", ININ, mayo de 1997.
5. Hetrick, D.L., "Dynamics of Nuclear Reactors", The University of Chicago Press, 1971.
6. Perez Martinez, C., "Simulador de la cinetica puntual de un reactor nuclear TRIGA Mark III con control difuso de potencia en un ambiente visual", Tesis Ing. Computacion, UAEM-UAP Valle de Chalco, Edo. de Mexico, 26-MAR-2004.
7. Gurewich, N.; Gurewich, O., "Aprendiendo Visual Basic 5", Prentice Hall, 1998.
8. Ying, B., "Applications Interface Programming Using Multiple Languages, A Windows Programmer's Guide", Prentice Hall, 2003.
9. Laney, J., "Writing Test Modules Using Standard Interfaces and Languages", 0-7803-5868-6/00/2000, IEEE Press Series, 2000.
FLINS, originally an acronym for Fuzzy Logic and Intelligent Technologies in Nuclear Science, is now extended to Applied Artificial Intelligence for Applied Research. The contributions to the seventh in the series of FLINS conferences contained in this volume cover state-of-the-art research and development in applied artificial intelligence for applied research in general and for power/nuclear engineering in particular.
www.worldscientific.com