Multicriteria Decision Aid Classification Methods
Applied Optimization, Volume 73
Series Editors: Panos M. Pardalos, University of Florida, U.S.A.; Donald Hearn, University of Florida, U.S.A.
The titles published in this series are listed at the end of this volume.
Multicriteria Decision Aid Classification Methods
by
Michael Doumpos and Constantin Zopounidis
Technical University of Crete, Department of Production Engineering and Management, Financial Engineering Laboratory, University Campus, Chania, Greece
KLUWER ACADEMIC PUBLISHERS NEW YORK, BOSTON, DORDRECHT, LONDON, MOSCOW
eBook ISBN: 0-306-48105-7
Print ISBN: 1-4020-0805-8
©2004 Kluwer Academic Publishers, New York, Boston, Dordrecht, London, Moscow
Print ©2002 Kluwer Academic Publishers, Dordrecht
All rights reserved. No part of this eBook may be reproduced or transmitted in any form or by any means, electronic, mechanical, recording, or otherwise, without written consent from the Publisher.
Created in the United States of America
Visit Kluwer Online at: http://kluweronline.com and Kluwer's eBookstore at: http://ebooks.kluweronline.com
To my parents Christos and Aikaterini Doumpos
To my wife Kleanthi Koukouraki and my son Dimitrios Zopounidis
Table of contents
PROLOGUE xi

CHAPTER 1: INTRODUCTION TO THE CLASSIFICATION PROBLEM
1. Decision making problematics 1
2. The classification problem 4
3. General outline of classification methods 6
4. The proposed methodological approach and the objectives of the book 10

CHAPTER 2: REVIEW OF CLASSIFICATION TECHNIQUES
1. Introduction 15
2. Statistical and econometric techniques 15
2.1 Discriminant analysis 16
2.2 Logit and probit analysis 20
3. Non-parametric techniques 24
3.1 Neural networks 24
3.2 Machine learning 27
3.3 Fuzzy set theory 30
3.4 Rough sets 32

CHAPTER 3: MULTICRITERIA DECISION AID CLASSIFICATION TECHNIQUES
1. Introduction to multicriteria decision aid 39
1.1 Objectives and general framework 39
1.2 Brief historical review 40
1.3 Basic concepts 41
2. Methodological approaches 43
2.1 Multiobjective mathematical programming 45
2.2 Multiattribute utility theory 48
2.3 Outranking relation theory 50
2.4 Preference disaggregation analysis 52
3. MCDA techniques for classification problems 55
3.1 Techniques based on the direct interrogation of the decision maker 55
3.1.1 The AHP method 55
3.1.2 The ELECTRE TRI method 59
3.1.3 Other outranking classification methods 64
3.2 The preference disaggregation paradigm in classification problems 66

CHAPTER 4: PREFERENCE DISAGGREGATION CLASSIFICATION METHODS
1. Introduction 77
2. The UTADIS method 78
2.1 Criteria aggregation model 78
2.2 Model development process 82
2.2.1 General framework 82
2.2.2 Mathematical formulation 86
2.3 Model development issues 96
2.3.1 The piece-wise linear modeling of marginal utilities 96
2.3.2 Uniqueness of solutions 97
3. The multi-group hierarchical discrimination method (MHDIS) 100
3.1 Outline and main characteristics 100
3.2 The hierarchical discrimination process 101
3.3 Estimation of utility functions 105
3.4 Model extrapolation 111
Appendix: Post optimality techniques for classification model development in the UTADIS method 113

CHAPTER 5: EXPERIMENTAL COMPARISON OF CLASSIFICATION TECHNIQUES
1. Objectives 123
2. The considered methods 124
3. Experimental design 126
3.1 The factors 126
3.2 Data generation procedure 131
4. Analysis of results 134
5. Summary of major findings 143
Appendix: Development of ELECTRE TRI classification models using a preference disaggregation approach 150

CHAPTER 6: CLASSIFICATION PROBLEMS IN FINANCE
1. Introduction 159
2. Bankruptcy prediction 161
2.1 Problem domain 161
2.2 Data and methodology 164
2.3 The developed models 172
2.3.1 The model of the UTADIS method 172
2.3.2 The model of the MHDIS method 174
2.3.3 The ELECTRE TRI model 176
2.3.4 The rough set model 178
2.3.5 The statistical models 179
2.4 Comparison of the bankruptcy prediction models 181
3. Corporate credit risk assessment 185
3.1 Problem domain 185
3.2 Data and methodology 188
3.3 The developed models 194
3.3.1 The UTADIS model 194
3.3.2 The model of the MHDIS method 196
3.3.3 The ELECTRE TRI model 199
3.3.4 The rough set model 200
3.3.5 The models of the statistical techniques 201
3.4 Comparison of the credit risk assessment models 202
4. Stock evaluation 205
4.1 Problem domain 205
4.2 Data and methodology 209
4.3 The developed models 215
4.3.1 The MCDA models 215
4.3.2 The rough set model 220
4.4 Comparison of the stock evaluation models 222

CHAPTER 7: CONCLUSIONS AND FUTURE PERSPECTIVES
1. Summary of main findings 225
2. Issues for future research 229

REFERENCES 233

SUBJECT INDEX 251
Prologue
Decision making problems, according to their nature, the policy of the decision maker, and the overall objective of the decision, may require the choice of an alternative solution, the ranking of the alternatives from the best to the worst, or the assignment of the considered alternatives into predefined homogeneous classes. This last type of decision problem is referred to as classification or sorting. Classification problems are often encountered in a variety of fields including finance, marketing, environmental and energy management, human resources management, medicine, etc. The major practical interest of the classification problem has motivated researchers to develop an arsenal of methods for studying such problems, with the aim of building mathematical models that achieve the highest possible classification accuracy and predictive ability. For several decades, multivariate statistical techniques such as discriminant analysis (linear and quadratic) and econometric techniques such as logit and probit analysis, the linear probability model, etc., have dominated this field. However, the parametric nature and the statistical assumptions/restrictions of such approaches have been a source of major criticism and skepticism regarding their applicability and usefulness in practice. The continuous advances in other fields, including operations research and artificial intelligence, have led many scientists and researchers to exploit the new capabilities of these fields in developing more efficient classification techniques. Among the attempts made one can mention neural networks, machine learning, fuzzy sets, as well as multicriteria decision aid. Multicriteria decision aid (MCDA) has several distinctive and attractive features, chief among them its decision support orientation. The significant advances in MCDA over the last three decades make it a powerful non-parametric alternative methodological approach for studying classification problems.
Although MCDA research until the late 1970s was mainly oriented towards the fundamental aspects of the field and the development of choice and ranking methodologies, during the 1980s and the 1990s significant research was undertaken on the study of the classification problem within the MCDA framework. Following the MCDA framework, the objective of this book is to provide a comprehensive discussion of the classification problem, to review the existing parametric and non-parametric techniques together with their problems and limitations, and to present the MCDA approach to classification problems. Special focus is given to the preference disaggregation approach of MCDA. The preference disaggregation approach refers to the analysis (disaggregation) of the global preferences (judgement policy) of the decision maker in order to identify the criteria aggregation model that underlies the preference result (classification). The book is organized in seven chapters as follows. Chapter 1 presents an introduction to the classification problem. The general concepts related to the classification problem are discussed, along with an outline of the procedures used to develop classification models. Chapter 2 provides a comprehensive review of existing classification techniques. The review covers parametric approaches (statistical and econometric techniques), such as linear and quadratic discriminant analysis and logit and probit analysis, as well as non-parametric techniques from the fields of neural networks, machine learning, fuzzy sets, and rough sets. Chapter 3 is devoted to the MCDA approach. Initially, an introduction to the main concepts of MCDA is presented along with a panorama of the MCDA methodological streams. Then, the existing MCDA classification techniques are reviewed, including multiattribute utility theory techniques, outranking relation techniques and goal programming formulations. Chapter 4 provides a detailed description of the UTADIS and MHDIS methods, including their major features, their operation and model development procedures, along with their mathematical formulations. Furthermore, a series of issues is also discussed involving specific aspects of the functionality of the methods and their model development processes. Chapter 5 presents an extensive comparison of the UTADIS and MHDIS methods with a series of well-established classification techniques, including linear and quadratic discriminant analysis, logit analysis and rough set theory. In addition, ELECTRE TRI, a well-known MCDA classification method based on the outranking relation theory, is also considered in the comparison, and a new methodology is presented to estimate the parameters of classification models developed through ELECTRE TRI. The comparison is performed through a Monte Carlo simulation, in order to investigate
the classification performance (classification accuracy) of the considered methods under different data conditions. Chapter 6 is devoted to the real-world application of the proposed methodological framework for classification problems. The applications considered originate from the field of finance, including bankruptcy prediction, corporate credit risk assessment and stock evaluation. For each application a comparison is also conducted with all the aforementioned techniques. Finally, Chapter 7 concludes the book, summarizes the main findings and proposes future research directions with respect to the study of the classification problem within a multidimensional context. In preparing this book, we are grateful to Kiki Kosmidou, Ph.D. candidate at the Technical University of Crete, for her important notes on an earlier version of the book and her great help in the preparation of the final manuscript.
Chapter 1 Introduction to the classification problem
1. DECISION MAKING PROBLEMATICS
Decision science is a very broad and rapidly evolving research field at both the theoretical and the practical level. The post-war technological advances, in combination with the establishment of operations research as a sound approach to decision making problems, created a new context for addressing real-world problems through integrated, flexible and realistic methodological approaches. At the same time, the range of problems that can be addressed efficiently has also been extended. The nature of these problems is widely diversified in terms of their complexity, the type of solutions that should be investigated, as well as the methodological approaches that can be used to address them. Providing a full categorization of decision making problems on the basis of the above issues is a difficult task depending upon the scope of the categorization. A rather straightforward approach is to define the two following categories of decision making problems (Figure 1.1):
Discrete problems, involving the examination of a discrete set of alternatives. Each alternative is described along some attributes. Within the decision making context these attributes have the form of evaluation criteria.
Continuous problems, involving cases where the number of possible alternatives is infinite. In such cases one can only outline the region where the alternatives lie (feasible region), so that each point in this region
corresponds to a specific alternative. Resource allocation is a representative example of this type of problem.
When considering a discrete decision making problem, there are four different kinds of analyses (decision making problematics) that can be performed in order to provide meaningful support to decision makers (Roy, 1985; cf. Figure 1.2):
to identify the best alternative or select a limited set of the best alternatives,
to construct a rank-ordering of the alternatives from the best to the worst ones,
to classify/sort the alternatives into predefined homogeneous groups,
to identify the major distinguishing features of the alternatives and describe them on the basis of these features.
The first three forms of decision making problems (choice, ranking, classification) lead to a specific result regarding the evaluation of the alternatives. Both choice and ranking are based on relative judgments, involving pair-wise comparisons between the alternatives. Consequently, the overall evaluation result has a relative form, depending on the alternatives being evaluated. For instance, an evaluation result of the form "product X is the best of its kind" is the outcome of relative judgments, and it may change if the set of products that are similar to product X is altered. On the contrary, the classification problem is based on absolute judgments. In this case each alternative is assigned to a specific group on the basis of a pre-specified rule. The definition of this rule usually does not depend on the set of alternatives being evaluated. For instance, the evaluation result "product X does not meet the consumer needs" is based on absolute judgments, since it does not depend on the other products that are similar to product X. Of course, these judgments are not always absolute, since
they are often defined within the general context characterizing the decision environment. For instance, under specific circumstances of the general economic and business environment a firm may fulfill the necessary requirements for its financing by a credit institution (these requirements are independent of the population of firms seeking financing). Nevertheless, as the economic and business conditions evolve, the financing requirements may change towards being stricter or more relaxed. Therefore, it is possible that the same firm is refused credit in a different decision environment. Generally, despite any changes that may be made to the classification rule used, this rule is always defined independently of the existing decision alternatives. This is the major difference distinguishing the classification problem from the problems of choice and ranking.
2. THE CLASSIFICATION PROBLEM
As already mentioned, classification refers to the assignment of a finite set of alternatives into predefined groups; this is a general description. Several more specific terms are often used to refer to this form of decision making problem. The most common ones are the following three:
Discrimination.
Classification.
Sorting.
The first two terms are commonly used by statisticians as well as by researchers in the artificial intelligence field (neural networks, machine learning, etc.). The term "sorting" has been established by MCDA researchers. Although all three terms refer to the assignment of a set of alternatives into predefined groups, there is a notable difference in the kinds of problems that they describe. In particular, from the methodological point of view the above three terms describe two different kinds of problems. The terms "discrimination" and "classification" refer to problems where the groups are defined in a nominal way. In this case the alternatives belonging to different groups have different characteristics, without it being possible to establish any kind of preference relation between them (i.e., the groups provide a description of the alternatives without any further information). One of the most well-known problems of this form is the iris classification problem used by Fisher (1936) in his pioneering work on linear discriminant analysis. This problem involves the distinction between three species of flowers, iris setosa, iris versicolor and iris virginica, given their physical characteristics (length and width of the sepal and petal). Obviously, each group (species) provides a description of its member flowers, but this description does not incorporate any preferential information. Pattern recognition is also an extensively studied problem of this form, with numerous significant applications in letter recognition, speech recognition, and the recognition of physical objects and human characteristics. On the other hand, sorting refers to problems where the groups are defined in an ordinal way. A typical example of this form of problems is the bankruptcy risk evaluation problem, which will be extensively studied later in this book (Chapter 6). Bankruptcy risk evaluation models typically involve the assignment of a firm into the group of healthy firms or into the group of bankrupt ones. This is an ordinal definition of the groups, since it is rather obvious that the healthy firms are in a better situation than the bankrupt ones. Therefore, the definition of the groups in sorting problems does not only provide a simple description of the alternatives, but it also
incorporates additional preferential information, which could be of interest in the decision making context. For simplicity, henceforth only the general term "classification" will be used in this book; however, a distinction will be made between sorting and classification when required. Closing this introduction to the main concepts related to the classification problem, it is important to emphasize the difference between classification and clustering: in classification the groups are defined a priori, whereas in clustering the objective is to identify clusters (groups) of alternatives sharing similar characteristics. In other words, in a classification problem the analyst knows in advance what the results of the analysis should look like, while in clustering the analyst tries to organize the knowledge embodied in a data sample in the most appropriate way according to some similarity measure. Figure 1.3 outlines this difference graphically.
The significance of the classification problem extends to a wide variety of practical fields. Some characteristic examples are the following:
Medicine: medical diagnosis, assigning patients into groups (diseases) according to the observed symptoms (Tsumoto, 1998; Belacel, 2000).
Pattern recognition: recognition of human characteristics or physical objects and their classification into properly defined groups (Ripley, 1996; Young and Fu, 1997; Nieddu and Patrizi, 2000).
Human resources management: personnel evaluation on the basis of skills and assignment to appropriate working positions.
Production management: monitoring and control of complex production systems for fault diagnosis purposes (Catelani and Fort, 2000; Shen, 2000).
Marketing: selection of appropriate marketing policies for penetrating new markets, analysis of customer characteristics, customer satisfaction measurement, etc. (Dutka, 1995; Siskos et al., 1998).
Environmental management and energy policy: analysis and timely diagnosis of environmental impacts, examination of the effectiveness of energy policy measures (Diakoulaki et al., 1999).
Financial management and economics: bankruptcy prediction, credit risk assessment, portfolio selection (stock classification), country risk assessment (Zopounidis, 1998; Zopounidis and Doumpos, 1998).
3. GENERAL OUTLINE OF CLASSIFICATION METHODS
Most methods proposed for the development of classification models operate on the basis of a regression philosophy, trying to exploit the knowledge that is provided through the a priori definition of the groups. A general outline of the procedure used to develop a classification model is presented in Figure 1.4. This procedure is common to most of the existing classification methods. In traditional statistical regression, the objective is to identify the functional relationship between a dependent variable Y and a vector of independent variables X, given a sample of existing observations (Y, X). Most of the existing classification methods address the classification problem in a similar way. The only actual difference between statistical regression and the classification problem is that in the latter case the dependent variable is not a real-valued variable, but a discrete one. Henceforth, the dependent variable that determines the classification of the alternatives will be denoted by C, while its discrete levels (groups) will be denoted by $C_1, C_2, \ldots, C_q$,
where q is the number of groups. Similarly, g will be used to denote the vector of independent variables, i.e., $\mathbf{g} = (g_1, g_2, \ldots, g_n)$, where n is the number of independent variables. Henceforth the independent variables will be referred to as criteria or attributes. Both terms are quite similar. However, an attribute defines a nominal description of the alternatives, whereas a criterion defines an ordinal description (i.e., a criterion can be used to specify whether an alternative is preferred over another,
whereas an attribute cannot provide this information). A more detailed discussion of the criterion concept is given in Chapter 3. The term "attribute" will only be used in the review presented in Chapter 2 regarding the existing parametric and non-parametric classification techniques, in order to comply with the terminology used in the disciplines discussed in the review. The remaining discussion in this book will use the term "criterion", which is established in the field of multicriteria decision aid, the main methodological approach proposed in this book. The sample of observations used to develop the classification model will be referred to as the training sample or reference set. The number of observations of the training sample will be denoted by m. The observations will be referred to as alternatives. Each alternative x is considered as a vector consisting of the performance of the alternative on each criterion, i.e., $\mathbf{g}(x) = [g_1(x), g_2(x), \ldots, g_n(x)]$, where $g_i(x)$ denotes the performance of alternative x on criterion $g_i$. On the basis of the above notation, addressing the classification problem involves the development of a model of the form $\hat{C} = f(\mathbf{g})$, which can be used to determine the classification of the alternatives given their characteristics described through the criteria vector g. The development of such a model is performed so that a predefined measure of the differences between the a priori classification C and the estimated classification $\hat{C}$ is minimized. If the developed model performs satisfactorily in the training sample, it can be used to decide upon the classification of any new alternative that comes under consideration. This is the major point of interest in implementing the above process: to be able to organize the knowledge embodied in the training sample so that it can be used for real-time decision making purposes. The above model development and implementation process is common to the majority of the existing classification techniques, at least as far as its general philosophy is concerned. There are several differences, however, in specific aspects of this process, involving mainly the details of the parameter estimation procedure and the form of the classification model. Figure 1.5 gives a general categorization of the existing methodologies on the basis of these two issues. The developed classification model most commonly has a functional form, expressed as a function combining the alternatives' performances on the criteria vector g to estimate a score for each alternative. The estimated score is a measure of the probability that an alternative belongs into a specific group. The objective of the model development process, in this case, is to minimize a measure of the classification error involving the assignment of alternatives by the model to an incorrect group. The classification models of functional form are referred to in Figure 1.5 as "quantitative", in the sense that they
rely on the development and use of a quantitative index to decide upon the assignment of the alternatives. (The term "quantitative models" does not necessarily imply that the corresponding approaches handle only quantitative variables; the developed function can also consider qualitative variables, as will be demonstrated later in this book through the presentation of multicriteria decision aid classification methodologies.)
As an alternative to the functional form, classification models can also have a symbolic form. The approaches that follow this paradigm lead to the development of a set of "IF conditions THEN conclusion" classification
rules. The conditions part of each rule involves the characteristics of the alternatives, thus defining the conditions that should be fulfilled in order for an alternative to be assigned into the group indicated in the conclusion part of the rule. Apart from a classification recommendation, in some cases the conclusion part also includes a numerical coefficient representing the strength of the recommendation (conclusion). Procedures used to develop such decision rules are referred to as rule induction techniques. Generally, it is possible to develop an exhaustive set of rules covering all alternatives belonging to the training sample, thus producing a zero classification error. This, however, does not ensure that the developed rules have the necessary generalizing ability. For this reason, in order to avoid the development of rules of limited usefulness, a more compact set of rules is often developed. The plethora of real-world classification problems encountered in many research and practical fields has been the major motivation for researchers towards the continuous development of advanced classification methodologies. The general model development procedure presented above reflects the general scheme and objective of every classification methodological approach, i.e., the elicitation of knowledge from a sample of alternatives and its representation in a functional or symbolic form, such that reality can be modeled as consistently as possible. A consistent modeling ensures the reliability of the model's classification recommendations.
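To make the distinction between the two model forms concrete, the following short sketch contrasts a functional (scoring) classifier with a symbolic rule-based one. It is only an illustration of the two representations discussed above; the criteria weights, the cut-off value and the rule conditions are hypothetical and are not taken from the methods presented in this book.

```python
# Illustrative sketch of the two model forms discussed above.
# All numeric values and rule conditions are hypothetical examples.

def functional_model(g):
    """Quantitative model: combine the criteria vector g into a score
    and assign the alternative to a group via a cut-off rule."""
    weights = [0.5, 0.3, 0.2]               # hypothetical criteria weights
    score = sum(w * gi for w, gi in zip(weights, g))
    return "C1" if score >= 0.6 else "C2"   # hypothetical cut-off

def symbolic_model(g):
    """Symbolic model: 'IF conditions THEN conclusion' rules examined in order;
    the elementary conditions of each rule are joined with AND."""
    g1, g2, g3 = g
    if g1 >= 0.7 and g2 >= 0.5:             # hypothetical rule 1
        return "C1"
    if g3 < 0.2:                            # hypothetical rule 2
        return "C2"
    return "C2"                             # default conclusion

alternative = [0.8, 0.6, 0.4]               # performances on three criteria
print(functional_model(alternative), symbolic_model(alternative))
```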
4. THE PROPOSED METHODOLOGICAL APPROACH AND THE OBJECTIVES OF THE BOOK
Among the different methodological approaches proposed for addressing classification problems (see Chapter 2 for an extensive review), MCDA is an advanced field of operations research providing several advantages from both the research and the practical point of view. At the research level, MCDA provides a plethora of methodological approaches for addressing a variety of decision making situations. Many of these approaches are well-suited to the nature of the classification problem. The major characteristic shared by all MCDA classification approaches is their focus on the modeling and addressing of sorting problems. This form of classification problem is of major interest within a decision making context, given that the concept of preference lies at the core of every real-world decision. (Recall from the discussion in section 2 of this chapter that sorting problems involve the consideration of the existing preferences with regard to the specification of the groups, i.e., an ordinal specification of the groups, while discrimination problems do not consider this special feature.) Furthermore, there has recently been a number of studies on the use of
MCDA approaches for addressing discrimination problems too (Perny, 1998; Belacel, 2000). Outside the MCDA approach, the research made in other fields on the special features of sorting problems is still quite limited. This characteristic of MCDA can be considered a significant advantage within a decision making context. The above issue also has practical implications. The main objective of many classification methodologies is to develop "optimal" classification models, where the term optimal is often restricted to the statistical description of the alternatives, or to the classification accuracy of the developed model given a training sample. In the first case, the discrimination of a given set of alternatives on the basis of a pre-specified statistical discrimination measure is often inadequate in practice. This is because such an approach assumes that the decision maker is familiar with the necessary theoretical background required for the appropriate interpretation of the developed models. In the second case, the development of classification models of high accuracy is obviously of major interest from a practical perspective. This objective, however, should be accompanied by the objective of developing models that are easily interpretable and that comply with the concepts used by the decision maker. This ensures that the decision maker can judge the logical consistency of the developed model, assess it according to his/her decision making policy and argue upon the model's recommendations. The objective of MCDA is to address the above issues taking into consideration the decision maker's preferential system. The major part of the research on the development of MCDA classification methodologies has been devoted to the theoretical aspects of the model development and implementation process, given that the decision maker is willing to provide substantial information regarding his/her preferential system. However, this is not always a feasible approach, especially within the context of repetitive decision making situations, where the time required to make decisions is often crucial. The preference disaggregation approach of MCDA (Jacquet-Lagrèze and Siskos, 1982, 1983) is well-suited for addressing this problem following the general regression-based scheme outlined in section 3 of this chapter. The preference disaggregation approach constitutes the basis of the methodology proposed in this book for addressing classification problems. In particular, the present book has two major objectives:
1. To illustrate the contribution of MCDA in general, and preference disaggregation in particular, in addressing classification problems: Towards the accomplishment of this objective, Chapter 4 presents in detail two
MCDA methods that employ the preference disaggregation paradigm: the UTADIS method (UTilités Additives DIScriminantes) and the MHDIS method (Multi-group Hierarchical DIScrimination). Both methods constitute characteristic examples of the way that the preference disaggregation approach can be used for model development in classification problems. Furthermore, the preference disaggregation philosophy is a useful basis which can be used in conjunction with other MCDA streams (see Chapter 3 for a discussion of the existing MCDA methodological streams). Among these streams, the outranking relation approach (Roy, 1991) is the most widely studied MCDA field for developing classification models. The new methodology presented in Chapter 5 for specifying the parameters of the ELECTRE TRI method (a well-known MCDA classification method based on the outranking relation approach) illustrates the capabilities provided by the preference disaggregation paradigm in applying alternative MCDA approaches in a flexible way; it also illustrates the interactions that can be established between preference disaggregation analysis and other methodological streams of MCDA.
2. To perform a thorough investigation of the efficiency of MCDA classification approaches in addressing classification problems: Despite the significant theoretical developments in MCDA classification methodologies, there is still a lack of research studies investigating the efficiency of these methodologies as opposed to other approaches. Of course, the applications presented to date in several practical and research fields illustrate the high level of support that MCDA methodologies provide to decision makers through:
The promotion of the direct participation of the decision maker in the decision making process.
The interactive development of user-oriented models that facilitate a better understanding of the major structural parameters of the problem at hand.
In addition to these supporting features, the efficiency of MCDA classification methodologies is also a crucial issue for their successful implementation in practice. The analysis of classification efficiency cannot be performed solely on the basis of case study applications; experimental investigation is also required. The extensive simulation presented in Chapter 5 addresses this issue by considering the classification performance of MCDA classification methods (UTADIS, MHDIS, ELECTRE TRI) compared to other well-known methodologies. The encouraging results of this comparison are further complemented by the practical applications presented in Chapter 6. These applications involve
classification problems from the field of financial management. Over the last decades, financial management has become a field of major importance for the sustainable development of firms and organizations, due to the increasing complexity of the economic, financial and business environments worldwide. These new conditions, together with the complexity of financial decision making problems, have motivated researchers from various research fields (operations research, artificial intelligence, computer science, etc.) to develop efficient methodologies for financial decision making purposes. In this context, three financial decision making problems are examined: bankruptcy prediction, credit risk assessment, and portfolio selection and management. These applications highlight the contribution of MCDA classification techniques in addressing significant real-world decision making problems of high complexity. The results obtained through the conducted experimental investigation and the above applications illustrate the high performance capabilities of MCDA classification methodologies compared to other well-established techniques.
Chapter 2 Review of classification techniques
1. INTRODUCTION
As mentioned in the introductory chapter, the major practical importance of the classification problem has motivated researchers to develop a variety of different classification methodologies. The purpose of this chapter is to review the most well-known of these methodologies for classification model development. The review is organized into two major parts, involving respectively:
1. The statistical and econometric classification methods, which constitute the "traditional" approach to developing classification models.
2. The non-parametric techniques proposed during the past two decades as innovative and efficient classification model development techniques.
2. STATISTICAL AND ECONOMETRIC TECHNIQUES
Statistics is the oldest science concerned with the analysis of given samples in order to make inferences about an unknown population. The classification problem is addressed by statistical and econometric techniques within this context. These techniques include both univariate and multivariate methods. The former involve the development and implementation of univariate statistical tests which are mainly of a descriptive character; for this reason, such techniques will not be considered in this review. The foundations of multivariate
techniques can be traced back to the work of Fisher (1936) on linear discriminant analysis (LDA). LDA has been the most extensively used methodology for developing classification models for several decades. Approximately a decade after the publication of Fisher's paper, Smith (1947) extended LDA to the more general quadratic form (quadratic discriminant analysis, QDA). During the subsequent decades the focus of research moved towards the development of econometric techniques. The most well-known methods from this field include the linear probability model, logit analysis and probit analysis. These three methods are actually special forms of regression analysis for cases where the dependent variable is discrete. The linear probability model is only suitable for two-group classification problems, whereas both logit and probit analysis are applicable to multi-group problems too. The latter two methodologies have several significant advantages over discriminant analysis, which has been one of the main reasons for their extensive use. Despite the criticism of these traditional statistical and econometric approaches, they remain quite popular both as research tools and for practical purposes. This popularity is supported by the existence of a plethora of statistical and econometric software packages, which contribute to the easy and rapid use of these approaches. Furthermore, statistical and econometric techniques are quite often considered in comparative studies investigating the performance of newly developed classification techniques. In this regard, statistical and econometric techniques often serve as a reference point (benchmark) in such comparisons. It is also important to note that under specific data conditions, statistical techniques yield the optimal classification rule.
2.1 Discriminant analysis
Discriminant analysis was the first multivariate statistical classification method and has been used for decades by researchers and practitioners in developing classification models. In its linear form it was developed by Fisher (1936). Given a training sample consisting of m alternatives whose classification is a priori known, the objective of the method is to develop a set of discriminant functions maximizing the ratio of among-groups to within-groups variance. In the general case where the classification involves q groups, q-1 linear functions of the following form are developed:

$$d_{kl}(\mathbf{g}) = c_{kl} + w_{kl,1}\, g_1 + w_{kl,2}\, g_2 + \cdots + w_{kl,n}\, g_n$$

where $g_1, g_2, \ldots, g_n$ are the attributes describing the alternatives, $c_{kl}$ is a constant term, and $\mathbf{w}_{kl} = (w_{kl,1}, \ldots, w_{kl,n})$ are the attributes' coefficients in the discriminant function. The indices k and l refer to a pair of groups denoted as $C_k$ and $C_l$, respectively. The estimation of the model's parameters involves the estimation of the constant terms $c_{kl}$ and the vectors $\mathbf{w}_{kl}$. The estimation procedure is based on two major assumptions: (a) the data follow the multivariate normal distribution, and (b) the variance-covariance matrices of the groups are equal. Given these assumptions, the estimation of the constant terms and the attributes' coefficients is performed as follows:

$$\mathbf{w}_{kl} = \mathbf{\Sigma}^{-1}(\bar{\mathbf{g}}_k - \bar{\mathbf{g}}_l), \qquad c_{kl} = -\tfrac{1}{2}(\bar{\mathbf{g}}_k + \bar{\mathbf{g}}_l)'\,\mathbf{\Sigma}^{-1}(\bar{\mathbf{g}}_k - \bar{\mathbf{g}}_l)$$

where $\bar{\mathbf{g}}_k$ is a n×1 vector consisting of the attributes' mean values for group $C_k$ and $\mathbf{\Sigma}$ is the within-groups variance-covariance matrix. Denoting by m the number of alternatives in the training sample, by $\mathbf{g}(x_j)$ the vector of the attribute values of alternative $x_j$, and by q the number of groups, the matrix $\mathbf{\Sigma}$ is specified as follows:

$$\mathbf{\Sigma} = \frac{1}{m-q} \sum_{k=1}^{q} \sum_{x_j \in C_k} \left[\mathbf{g}(x_j) - \bar{\mathbf{g}}_k\right]\left[\mathbf{g}(x_j) - \bar{\mathbf{g}}_k\right]'$$
The parameter estimates in LDA are not unique. In particular, it is possible to develop alternative discriminant functions in which the coefficients and the constant terms are defined as linear transformations of $\mathbf{w}_{kl}$ and $c_{kl}$. This makes it difficult to ascertain the contribution of each attribute to the classification of the alternatives. (In contrast to traditional multivariate regression analysis, in discriminant analysis statistical tests such as the t-test are rarely used to assess the significance of the discriminant function coefficients, simply because these coefficients are not unique.) One approach to tackle this problem is to use the standardized discriminant function coefficients, estimated using a transformed data set in which the attributes have zero mean and unit variance. Once the parameters (coefficients and constant terms) of the discriminant functions are estimated, the classification of an alternative is decided on the basis of the discriminant score assigned to the alternative by each
discriminant function $d_{kl}$. In particular, an alternative is classified into group $C_k$ if for all other groups $C_l$ the following rule holds:

$$d_{kl}(\mathbf{g}) \geq \ln\!\left[\frac{K(k\,|\,l)\,\pi_l}{K(l\,|\,k)\,\pi_k}\right]$$

In the above rule, K(k | l) denotes the misclassification cost corresponding to an incorrect decision to classify an alternative into group $C_k$ while it actually belongs into group $C_l$, and $\pi_k$ denotes the a priori probability that an alternative belongs into group $C_k$. Figure 2.1 gives a graphical representation of the above linear classification rule in the two-group case, assuming that all misclassification costs and a priori probabilities are equal.
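As a concrete illustration of the estimation and classification steps described above, the following Python sketch computes the pairwise linear discriminant function and applies the cut-off rule with misclassification costs and prior probabilities. It is a minimal reconstruction of the standard two-group case under the usual LDA formulas, not code from this book; the sample data are hypothetical.

```python
import numpy as np

def fit_lda_pair(Gk, Gl):
    """Estimate w and c of the pairwise linear discriminant d(g) = c + w'g,
    using the pooled within-groups variance-covariance matrix."""
    mk, ml = Gk.mean(axis=0), Gl.mean(axis=0)
    m, q = len(Gk) + len(Gl), 2
    pooled = (np.cov(Gk, rowvar=False) * (len(Gk) - 1) +
              np.cov(Gl, rowvar=False) * (len(Gl) - 1)) / (m - q)
    w = np.linalg.solve(pooled, mk - ml)
    c = -0.5 * (mk + ml) @ w
    return w, c

def classify(g, w, c, cost_k_given_l=1.0, cost_l_given_k=1.0, pi_k=0.5, pi_l=0.5):
    """Assign to group C_k if the discriminant score exceeds the cost/prior cut-off."""
    cutoff = np.log((cost_k_given_l * pi_l) / (cost_l_given_k * pi_k))
    return "C_k" if c + w @ g >= cutoff else "C_l"

# Hypothetical training sample: 2 attributes, two groups
rng = np.random.default_rng(0)
Gk = rng.normal([2.0, 3.0], 1.0, size=(20, 2))
Gl = rng.normal([0.0, 1.0], 1.0, size=(20, 2))
w, c = fit_lda_pair(Gk, Gl)
print(classify(np.array([1.5, 2.5]), w, c))
```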
In the case where the group variance-covariance matrices are not equal, QDA is used instead of LDA. The general form of the quadratic discriminant function developed through QDA, for each pair of groups $C_k$ and $C_l$, is the following:

$$d_{kl}(\mathbf{g}) = c_{kl} + \mathbf{w}_{kl}'\,\mathbf{g} + \mathbf{g}'\,\mathbf{W}_{kl}\,\mathbf{g}$$

The estimation of the coefficients and the constant term is performed as follows:

$$\mathbf{W}_{kl} = -\tfrac{1}{2}\left(\mathbf{\Sigma}_k^{-1} - \mathbf{\Sigma}_l^{-1}\right), \qquad \mathbf{w}_{kl} = \mathbf{\Sigma}_k^{-1}\bar{\mathbf{g}}_k - \mathbf{\Sigma}_l^{-1}\bar{\mathbf{g}}_l,$$
$$c_{kl} = -\tfrac{1}{2}\ln\frac{|\mathbf{\Sigma}_k|}{|\mathbf{\Sigma}_l|} - \tfrac{1}{2}\left(\bar{\mathbf{g}}_k'\,\mathbf{\Sigma}_k^{-1}\bar{\mathbf{g}}_k - \bar{\mathbf{g}}_l'\,\mathbf{\Sigma}_l^{-1}\bar{\mathbf{g}}_l\right)$$

where $\mathbf{\Sigma}_k$ and $\mathbf{\Sigma}_l$ denote the within-group variance-covariance matrices for groups $C_k$ and $C_l$, estimated as follows:

$$\mathbf{\Sigma}_k = \frac{1}{m_k - 1} \sum_{x_j \in C_k} \left[\mathbf{g}(x_j) - \bar{\mathbf{g}}_k\right]\left[\mathbf{g}(x_j) - \bar{\mathbf{g}}_k\right]'$$

where $m_k$ denotes the number of alternatives of the training sample that belong into group $C_k$. Given the discriminant score of an alternative on every discriminant function corresponding to a pair of groups $C_k$ and $C_l$, the quadratic classification rule (Figure 2.2) is similar to the linear case: the alternative is classified into group $C_k$ if and only if for all other groups $C_l$ the following inequality holds:

$$d_{kl}(\mathbf{g}) \geq \ln\!\left[\frac{K(k\,|\,l)\,\pi_l}{K(l\,|\,k)\,\pi_k}\right]$$
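For illustration, here is a minimal sketch of the quadratic scoring just described, assuming the standard Gaussian log-likelihood-ratio form of QDA with per-group covariance matrices; the data are hypothetical, and a score cut-off of zero corresponds to equal misclassification costs and prior probabilities.

```python
import numpy as np

def qda_score_pair(g, Gk, Gl):
    """Quadratic discriminant score d_kl(g) between two groups, using
    per-group mean vectors and variance-covariance matrices."""
    mk, ml = Gk.mean(axis=0), Gl.mean(axis=0)
    Sk = np.cov(Gk, rowvar=False)      # estimated with divisor (m_k - 1)
    Sl = np.cov(Gl, rowvar=False)
    Sk_inv, Sl_inv = np.linalg.inv(Sk), np.linalg.inv(Sl)
    quad = -0.5 * g @ (Sk_inv - Sl_inv) @ g
    lin = (Sk_inv @ mk - Sl_inv @ ml) @ g
    const = (-0.5 * np.log(np.linalg.det(Sk) / np.linalg.det(Sl))
             - 0.5 * (mk @ Sk_inv @ mk - ml @ Sl_inv @ ml))
    return const + lin + quad

# Hypothetical two-group data with unequal covariance structures
rng = np.random.default_rng(1)
Gk = rng.normal([2.0, 2.0], [0.5, 2.0], size=(30, 2))
Gl = rng.normal([0.0, 0.0], [2.0, 0.5], size=(30, 2))
g = np.array([1.0, 1.5])
print("assign to C_k" if qda_score_pair(g, Gk, Gl) >= 0.0 else "assign to C_l")
```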
In practice, in both LDA and QDA the specification of the a priori probabilities $\pi_k$ and the misclassification costs K(k | l) is a cumbersome process. To overcome this problem, trial and error processes are often employed to specify the optimal cut-off points in the above classification rules. Beyond this issue, LDA and QDA have been heavily criticized for a series of other problems regarding their underlying assumptions, involving mainly the assumption of multivariate normality and the hypotheses made on the structure of the group variance-covariance matrices. A comprehensive discussion of the impact that these assumptions have on the obtained discriminant analysis results is presented in the book of Altman et al. (1981). Given that the above two major underlying assumptions are valid (multivariate normality and known structure of the group variance-covariance matrices), the Bayes rule indicates that the two forms of discriminant analysis (linear and quadratic) yield the optimal classification rule (LDA in the case of equal group variance-covariance matrices and QDA in the opposite case). In particular, the developed classification rules are asymptotically optimal (as the training sample size increases, the statistical properties of the considered groups approximate the unknown properties of the corresponding populations). A formal proof of this finding is presented by Duda and Hart (1978), as well as by Patuwo et al. (1993). Such restrictive statistical assumptions, however, are rarely met in practice. This raises a major issue regarding the real effectiveness of discriminant analysis under realistic conditions. Several studies have addressed this issue. Moore (1973), Krzanowski (1975, 1977) and Dillon and Goldstein (1978) showed that when the data include discrete variables, the performance of discriminant analysis deteriorates, especially when the attributes are significantly correlated (correlation coefficient higher than 0.3). On the contrary, Lachenbruch et al. (1973) and Subrahmaniam and Chinganda (1978) concluded that even in the case of non-normal data the classification results of discriminant analysis models are quite robust, especially in the case of QDA and for data with a small degree of skewness.
2.2 Logit and probit analysis
The aforementioned problems regarding the assumptions made by discriminant analysis motivated researchers to develop more flexible methodologies. The first such methodologies to be developed include the linear probability model, as well as logit and probit analysis. The linear probability model is based on a multivariate regression using the classification of the alternatives of the training sample as the dependent variable. Theoretically, the result of the developed model is interpreted as the
probability that an alternative belongs into one of the pre-specified groups. Performing the regression, however, does not ensure that the model's result lies in the interval [0, 1], thus posing a major model interpretation problem. Ignoring this problem, a common cut-off used to decide upon the classification of the alternatives is 0.5. In the multi-group case, however, it is rather difficult to provide an appropriate specification of the probability cut-off point. This, combined with the aforementioned problem on the interpretation of the developed model, makes the use of the linear probability model quite cumbersome, both from a theoretical and a practical perspective. For these reasons the use of the linear probability model is rather limited and consequently it will not be further considered in this book.
Logit and probit analysis originate from the field of econometrics. Although these approaches are not new to the research community (the first studies on probit and logit analysis can be traced back to the 1930s and the 1940s, with the works of Bliss, 1934, and Berkson, 1944, respectively), their use was boosted during the 1970s by the works of Nobelist Daniel McFadden (1974, 1980) on discrete choice theory. Discrete choice theory provided the necessary basis for understanding the concepts regarding the interpretation of logit and probit models. Both logit and probit analysis are based on the development of a nonlinear function measuring the group-membership probability for the alternatives under consideration. The difference between the two approaches involves the form of the function that is employed. In particular, logit analysis employs the logistic function, whereas the cumulative distribution function of the normal distribution is used in probit analysis. On the basis of these functions, and assuming a two-group classification problem, the group-membership probability P of an alternative is defined as follows (if a binary 0-1 variable y is used to designate the groups, with y = 1 for group $C_2$ and y = 0 for group $C_1$, then equations (2.1)-(2.2) provide the probability that an alternative belongs into group $C_2$; if the binary variable is used in the opposite way, they provide the probability that an alternative belongs into group $C_1$):

Logit analysis: $P = \dfrac{1}{1 + e^{-(a + \mathbf{b}'\mathbf{g})}}$  (2.1)

Probit analysis: $P = \Phi(a + \mathbf{b}'\mathbf{g}) = \displaystyle\int_{-\infty}^{a + \mathbf{b}'\mathbf{g}} \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\, dz$  (2.2)

The estimation of the constant term a and the vector b is performed using maximum likelihood techniques. In particular, the parameter estimation process involves the maximization of the following likelihood function:

$$L(a, \mathbf{b}) = \prod_{j=1}^{m} P_j^{\,y_j}\left(1 - P_j\right)^{1 - y_j}$$

where $P_j$ denotes the probability given by equation (2.1) or (2.2) for alternative $x_j$ and $y_j$ is its binary group indicator.
The maximization of this function is a non-linear optimization problem which is often difficult to solve. Altman et al. (1981) report that if there exists a linear combination of the attributes that accurately discriminates the pre-specified groups, then the optimization process will not converge to an optimal solution. Once the parameter estimation process is completed, equations (2.1) and (2.2) are used to estimate the group-membership probabilities for all the alternatives under consideration. The classification decision is taken on the basis of these probabilities. For instance, in a two-group classification problem, one can impose a classification rule of the following form: "assign an alternative to group $C_2$ if $P \geq 0.5$; otherwise assign the alternative into group $C_1$". Alternative probability cut-off points, other than 0.5, can also be specified through trial and error processes. In the case of multi-group classification problems, logit and probit analysis can be used in two forms: as multinomial or as ordered logit/probit models. The difference between multinomial and ordered models is that the former assume a nominal definition of the groups, whereas the latter assume an ordinal definition. In this respect, ordered models are more suitable for addressing sorting problems, while traditional discrimination/classification problems are addressed through multinomial models. The ordered models require the estimation of a vector of attributes' coefficients b and a vector of constant terms a. These parameters are used to specify the probability that an alternative belongs into each group $C_k$ in the way presented in Table 2.1. The constant terms are defined such that $a_1 \leq a_2 \leq \cdots \leq a_{q-1}$. The parameter estimation process is performed similarly to the two-group case using maximum likelihood techniques. The multinomial models require the estimation of a set of coefficient vectors $\mathbf{b}_k$ and constant terms $a_k$ corresponding to each group $C_k$. On the basis of these parameters, the multinomial logit model estimates the probability that an alternative belongs into group $C_k$ as follows:

$$P(C_k \mid \mathbf{g}) = \frac{e^{\,a_k + \mathbf{b}_k'\mathbf{g}}}{\sum_{l=1}^{q} e^{\,a_l + \mathbf{b}_l'\mathbf{g}}}$$
For normalization purposes, $a_1$ and $\mathbf{b}_1$ are set such that $a_1 = 0$ and $\mathbf{b}_1 = \mathbf{0}$, whereas all other $a_k$ and $\mathbf{b}_k$ (k = 2, …, q) are estimated through maximum likelihood techniques. Between the logit and probit models, the former is usually preferred, mainly because the development of logit models requires less computational effort. Furthermore, there are no strong theoretical or practical results to support a comparative advantage of probit models in terms of their classification accuracy.
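As an illustration of the maximum likelihood estimation described above for the two-group logit model, the following sketch maximizes the log-likelihood by simple gradient ascent. It is a minimal, hypothetical example (the data, learning rate and iteration count are illustrative), not the estimation procedure used in this book.

```python
import numpy as np

def fit_logit(G, y, steps=5000, lr=0.05):
    """Estimate the constant term a and coefficient vector b of the logit model
    P = 1 / (1 + exp(-(a + b'g))) by gradient ascent on the log-likelihood."""
    m, n = G.shape
    a, b = 0.0, np.zeros(n)
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(a + G @ b)))   # group-membership probabilities
        grad_a = np.sum(y - p)                   # d logL / da
        grad_b = G.T @ (y - p)                   # d logL / db
        a += lr * grad_a / m
        b += lr * grad_b / m
    return a, b

# Hypothetical training sample: y = 1 for group C2, y = 0 for group C1
rng = np.random.default_rng(2)
G = np.vstack([rng.normal(1.0, 1.0, size=(25, 2)),
               rng.normal(-1.0, 1.0, size=(25, 2))])
y = np.array([1] * 25 + [0] * 25)
a, b = fit_logit(G, y)
p_new = 1.0 / (1.0 + np.exp(-(a + np.array([0.5, 0.2]) @ b)))
print("assign to C2" if p_new >= 0.5 else "assign to C1")
```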
During the last three decades both logit and probit analysis have been extensively used by researchers in a wide range of fields as efficient alternatives to discriminant analysis. However, despite the theoretical advantages of these approaches over LDA and QDA (logit and probit analysis do not impose assumptions on the statistical distribution of the data or the structure of the group variance-covariance matrices), several comparative studies have not clearly shown that these techniques outperform discriminant analysis (linear or quadratic) in terms of classification performance (Krzanowski, 1975; Press and Wilson, 1978).
3. NON-PARAMETRIC TECHNIQUES
In practice the statistical properties of the data are rarely known, since the underlying population is difficult to specify fully. This poses problems for the use of statistical techniques and has motivated researchers towards the development of non-parametric methods. Such approaches have no underlying statistical assumptions and consequently are expected to be flexible enough to adjust themselves to the characteristics of the data under consideration. In the subsequent sections the most important of these techniques are described.
3.1 Neural networks
Neural networks, often referred to as artificial neural networks, have been developed by artificial intelligence researchers as an innovative methodology for modeling complex problems. The foundations of the neural network paradigm lie in the emulation of the operation of the human brain. The human brain consists of a huge number of neurons organized in a highly complex network. Each neuron is an individual processing unit. A neuron receives an input signal (a stimulus from body sensors or the output signal of other neurons), which after a processing phase produces an output signal that is transferred to other neurons for further processing. The result of the overall process is the action or decision taken in accordance with the initial stimulus. This complex biological operation constitutes the basis for the development of neural network models. Every neural network is a network of parallel processing units (neurons) organized into layers. A typical structure of a neural network (Figure 2.3) includes the following structural elements:
1. An input layer consisting of a set of nodes (processing units, neurons), one for each input to the network.
2. An output layer consisting of one or more nodes, depending on the form of the desired output of the network. In classification problems, the number of nodes of the output layer is determined by the number of groups. For instance, for a two-group classification problem the output layer may include only one node taking two values: 1 for group $C_1$ and 2 for group $C_2$ (these are arbitrarily chosen values and any other pair is possible). In the general case where there are q groups, the number of nodes in the output layer is usually defined as the smallest integer which is larger than $\log_2 q$ (Subramanian et al., 1993). Alternatively, it is also possible to set the number of output nodes equal to the number of groups.
3. A series of intermediate layers referred to as hidden layers. The nodes of each hidden layer are fully connected with the nodes of the subsequent
and the preceding layer. Furthermore, it is also possible to consider more complicated structures where all layers are fully connected to each other. Such general network structures are known as fully connected neural networks. The network presented in Figure 2.3 is an example of such a structure. There is no general rule for defining the number of hidden layers; this is usually determined through trial and error processes. Recently, however, a significant part of the research has been devoted to the development of self-organizing neural network models, that is, neural networks that adjust their structure to best match the given data conditions. Research on the use of neural networks for classification purposes has shown that, generally, a single hidden layer is adequate (Patuwo et al., 1993; Subramanian et al., 1993). The number of nodes in this layer may range between q and 2n+1, where q is the number of groups and n is the number of attributes. Each connection between two nodes of the network is assigned a weight representing the strength of the connection. The determination of these weights (training of the network) is accomplished through optimization techniques. The objective of the optimization process is to minimize the differences between the recommendations of the network and the actual classification of the alternatives belonging to the training sample.
The most widely used network training methodology is the back-propagation approach (Rumelhart et al., 1986). Recently, advanced nonlinear optimization techniques have also contributed to obtaining globally optimal estimates of the network's connection weights (Hung and Denton, 1993). On the basis of the connection weights, the input to each node is determined as the weighted average of the outputs of all other nodes with which a connection is established. In the general case of a fully connected neural network (cf. Figure 2.3), the input $I_i^{(r)}$ to node i of the hidden layer r is defined as follows:

$$I_i^{(r)} = \sum_{j < r} \sum_{k=1}^{n_j} w_{ik}^{(rj)}\, O_k^{(j)} + \varepsilon_i^{(r)}$$

where $n_j$ is the number of nodes at the hidden layer j, $w_{ik}^{(rj)}$ is the weight of the connection between node i of layer r and node k of layer j, $O_k^{(j)}$ is the output of node k at layer j, and $\varepsilon_i^{(r)}$ is an error term. The output of each node is specified through a transformation function. The most common form of this function is the logistic function:

$$O_i^{(r)} = \frac{1}{1 + e^{-I_i^{(r)}/T}}$$
where T is a user-defined constant. The major advantage of neural networks is their parallel processing ability, as well as their ability to represent highly complex, nonlinear systems. Theoretically, this enables the approximation of any real function with arbitrary accuracy (Kosko, 1992). These advantages have led to the widespread application of neural networks in many research fields. On the other hand, the criticism on the use of neural networks focuses on two points:
1. The increased computational effort required for training the network (specification of the connection weights).
2. The inability to provide explanations of the network's results. This is a significant shortcoming, mainly from a decision support perspective, since in a decision making context the justification of the final decision is often a crucial point.
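To make the forward computation described earlier in this section concrete, here is a minimal sketch of a feed-forward pass through a single hidden layer with the logistic transformation and its user-defined constant T. The weights, layer sizes and the value of T are hypothetical, and the sketch omits the back-propagation training step.

```python
import numpy as np

def logistic(z, T=1.0):
    """Node transformation function with user-defined constant T."""
    return 1.0 / (1.0 + np.exp(-z / T))

def forward_pass(g, W_hidden, W_output, T=1.0):
    """Propagate an input vector g through one hidden layer and an output layer.
    Each node's input is the weighted sum of the previous layer's outputs."""
    hidden_in = W_hidden @ g           # inputs to the hidden nodes
    hidden_out = logistic(hidden_in, T)
    output_in = W_output @ hidden_out  # inputs to the output nodes
    return logistic(output_in, T)

# Hypothetical network: 3 inputs (attributes), 4 hidden nodes, 1 output node
rng = np.random.default_rng(3)
W_hidden = rng.normal(size=(4, 3))
W_output = rng.normal(size=(1, 4))
g = np.array([0.2, 0.7, 0.1])          # performances of one alternative
score = forward_pass(g, W_hidden, W_output)
print("group C1" if score[0] >= 0.5 else "group C2")
```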
Beyond the above two problems, research studies investigating the classification performance of neural networks as opposed to statistical and econometric techniques have led to conflicting results. Subramanian et al. (1993) compared neural networks to LDA and QDA through a simulation experiment using data conditions that were in accordance with the assumptions of the two statistical techniques. Their results show that neural networks can be a promising approach, especially in cases of complex classification problems involving more than two groups and a large set of attributes. On the other hand, LDA and QDA performed better when the sample size was increased. A similar experimental study by Patuwo et al. (1993) leads to the conclusion that there are many cases where statistical techniques outperform neural networks. In particular, the authors compared neural networks to LDA and QDA, considering both the case where the data conditions are in line with the assumptions of these statistical techniques and the opposite case. According to the obtained results, when the data are multivariate normal with equal group variance-covariance matrices, LDA outperforms neural networks. Similarly, in the case of multivariate normality with unequal variance-covariance matrices, QDA outperformed neural networks. Even in the case of non-normal data, the results of the analysis did not show any clear superiority of neural networks, at least compared to QDA. The experimental analysis of Archer and Wang (1993) is also worth mentioning. The authors discussed the way that neural networks can be used to address sorting problems, and compared their approach to LDA. The results of this comparison show a higher classification performance for the neural network approach, especially when there is a significant degree of group overlap.
3.2 Machine learning
During the last two decades machine learning has evolved into a major discipline within the field of artificial intelligence. Its objective is to describe and analyze the computational procedures required to extract and organize knowledge from existing experience. Among the different learning paradigms (Kodratoff and Michalski, 1990), inductive learning through examples is the one most widely used. In contrast to the classification techniques described in the previous sections, inductive learning introduces a completely different approach to modeling the classification problem. In particular, inductive learning approaches organize the extracted knowledge in a set of decision rules of the following general form:
IF elementary conditions THEN conclusion
The first part of such rules examines the necessary and sufficient conditions required for the conclusion part to be valid. The elementary conditions are connected using the AND operator. The conclusion consists of a recommendation on the classification of the alternatives satisfying the condition part of the rule. One of the most widely used techniques developed on the basis of the inductive learning paradigm is the C4.5 algorithm (Quinlan, 1993). C4.5 is an improved modification of the ID3 algorithm (Quinlan, 1983, 1986). Its main advantages over its predecessor include:
1. The capability of handling qualitative attributes.
2. The capability of handling missing information.
3. The elimination of the overfitting problem⁴.
The decision rules developed through the C4.5 algorithm are organized in the form of a decision tree such as the one presented in Figure 2.4. Every node of the tree considers an attribute, while the branches correspond to elementary conditions defined on the basis of the node attributes. Finally, the leaves designate the group to which an alternative is assigned, given that it satisfies the branches’ conditions.
4 Overfitting refers to the development of classification models that perform excellently in classifying the alternatives of the training sample, but whose performance in classifying other alternatives is quite poor.
The development of the classification tree is performed through an iterative process. Every stage of this process consists of three individual steps:
1. Evaluation of the discriminating power of the attributes in classifying the alternatives of the training sample.
2. Selection of the attribute having the highest discriminating power.
3. Definition of subsets of alternatives on the basis of their performances on the selected attribute.
This procedure is repeated for every subset of alternatives formed in the third step, until all alternatives of the training sample are correctly classified. The evaluation of the attributes’ discriminating power in the first step of the above process is performed on the basis of the amount of new information introduced by each attribute in the classification of the alternatives. The entropy of the classification introduced by each attribute is used as the appropriate information measure. In particular, assuming that each attribute introduces a partitioning of the training sample of m alternatives into t subsets S_1, S_2, …, S_t, each consisting of m_s alternatives, the entropy of this partitioning is defined as follows:
E = − Σ_{s=1}^{t} (m_s / m) Σ_{k=1}^{q} (m_sk / m_s) log₂ (m_sk / m_s)
where m_sk denotes the number of alternatives of subset S_s that belong to group C_k. The attribute with the minimum entropy is selected as the one with the highest discriminating power. This attribute adds the highest amount of new information in the classification of the alternatives. The above procedure may lead to a highly specialized classification tree with nodes covering only one alternative. This is the result of overfitting the tree to the given data of the training sample, a phenomenon which is often related to limited generalizing performance. C4.5 addresses this problem through the implementation of a pruning phase, so that the decision tree’s size is reduced, in order to improve its expected generalizing performance. The development and implementation of pruning methodologies is a significant research topic in the machine learning community. Some characteristic examples of pruning techniques are the ones presented by Breiman et al. (1984), Gelfand et al. (1991) and Quinlan (1993). The general aspects of the paradigm used in C4.5 are common to other machine learning algorithms. Some well-known examples of such algorithms include CN2 (Clark and Niblett, 1989), the AQ family of algorithms (Michalski, 1969) and the recursive partitioning algorithm (Breiman et al., 1984).
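The following sketch illustrates the entropy-based attribute selection step described above, in the spirit of ID3/C4.5; the training sample, the attribute names and the group labels are hypothetical:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy (in bits) of a list of group labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def partition_entropy(sample, attribute, labels):
    """Weighted entropy of the partition induced by a (discrete) attribute."""
    subsets = {}
    for row, label in zip(sample, labels):
        subsets.setdefault(row[attribute], []).append(label)
    total = len(labels)
    return sum(len(part) / total * entropy(part) for part in subsets.values())

# Hypothetical training sample: two qualitative attributes and a group label
sample = [{"size": "large", "risk": "low"},
          {"size": "large", "risk": "high"},
          {"size": "small", "risk": "high"},
          {"size": "small", "risk": "high"}]
groups = ["C1", "C1", "C2", "C2"]

# Select the attribute with the minimum partition entropy
best = min(["size", "risk"], key=lambda a: partition_entropy(sample, a, groups))
print(best)  # 'size' separates the two groups perfectly in this toy example
```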
The main advantages of machine learning classification algorithms involve the following capabilities:
1. Handling of qualitative attributes.
2. Flexibility in handling missing information.
3. Exploitation of large data sets for model development purposes through computationally efficient procedures.
4. Development of easily understandable classification models (classification rules or trees).
3.3 Fuzzy set theory
Decision making is often based on fuzzy, ambiguous and vague judgments. The daily use of verbal expressions such as “almost”, “usually”, “often”, etc., provides simple yet typical examples of this remark. The fuzzy nature of these simple verbal statements is indicative of the fuzziness encountered in the decision making process. The fuzzy set theory developed by Zadeh (1965) provides the necessary modeling tools for the representation of uncertainty and fuzziness in complex real-world situations. The core of this innovative approach is the fuzzy set concept. A fuzzy set is a set with no crisp boundaries. In the case of a traditional crisp set A, a proposition of the form “alternative x belongs to the set A” is either true or false; for a fuzzy set, however, it can be partly true or false. Within the context of the fuzzy set theory the modeling of such fuzzy judgments is performed through the definition of membership functions. A membership function defines the degree to which an object (alternative) belongs to a fuzzy set. The membership degree ranges in the interval [0, 1]. In the aforementioned example a membership degree equal to 1 indicates that the proposition “alternative x belongs to the set A” is true. Similarly, if the membership degree is 0, then it is concluded that the proposition is false. Any other value for the membership degree between 0 and 1 indicates that the proposition is partly true. Figure 2.5 presents an example of a typical form for the membership function µ of the proposition “according to attribute g_i, alternative x belongs to the set A”. The membership function corresponding to the negation of this proposition is also presented (the negation defines the complement set of A; the complement set includes the alternatives not belonging to A). In order to derive an overall conclusion regarding the membership of an alternative in a fuzzy set based on the consideration of all attributes, one must aggregate the partial membership degrees for each individual attribute. This aggregation is based on common operators such as the “AND” and “OR” operators. The former corresponds to an intersection operation, whereas the latter
corresponds to a union operation. A combination of the two operators is also possible.
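A minimal sketch of these ideas is given below: trapezoidal membership functions are defined on two hypothetical attribute scales and the partial membership degrees are aggregated with the min (intersection) and max (union) operators; all fuzzy sets, attribute names and values are illustrative assumptions:

```python
def trapezoidal(a, b, c, d):
    """Membership function of a trapezoidal fuzzy set on an attribute's scale."""
    def mu(x):
        if x <= a or x >= d:
            return 0.0
        if b <= x <= c:
            return 1.0
        return (x - a) / (b - a) if x < b else (d - x) / (d - c)
    return mu

# Hypothetical fuzzy sets "low debt" and "high profitability" on two attributes
low_debt = trapezoidal(0.0, 0.0, 0.3, 0.5)       # fully true below a 30% debt ratio
high_profit = trapezoidal(0.05, 0.10, 1.0, 1.0)  # fully true above a 10% margin

firm = {"debt": 0.4, "profit": 0.08}

# AND -> intersection (min), OR -> union (max) of the partial membership degrees
mu_and = min(low_debt(firm["debt"]), high_profit(firm["profit"]))
mu_or = max(low_debt(firm["debt"]), high_profit(firm["profit"]))
mu_not = 1.0 - low_debt(firm["debt"])            # negation (complement set)
print(mu_and, mu_or, mu_not)
```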
In the case of classification problems, each group can be considered as a fuzzy set. Similarly to the machine learning paradigm, classification models developed through approaches that implement the fuzzy set theory have the form of decision rules. The general form of a fuzzy rule used for classification purposes is the following:
IF g₁ is A₁ AND g₂ is A₂ AND … AND gₙ is Aₙ THEN the alternative belongs to group C_k
where each A_i corresponds to a fuzzy set defined on the scale of attribute g_i. The strength of each individual condition is defined by the membership degree of the corresponding proposition “according to attribute g_i, alternative x belongs to the set A_i”. The rules of the above general form are usually associated with a certainty coefficient indicating the certainty about the validity of the conclusion part. Procedures for the development of fuzzy rules in classification problems have been proposed by several researchers. Some indicative studies in this field are the ones of Ishibuchi et al. (1992, 1993), Inuiguchi et al. (2000), Bastian (2000), Oh and Pedrycz (2000). Despite the existing debate on the relation between the fuzzy set theory and the traditional probability theory, fuzzy sets have been extensively used to address a variety of real-world problems from several fields. Furthermore, several researchers have exploited the underlying concepts of the fuzzy set theory in conjunction with other disciplines such as neural networks (neurofuzzy systems; Von Altrock, 1996), expert systems (fuzzy rule-based expert systems; Langholz et al., 1996), mathematical programming (fuzzy mathematical programming; Zimmermann, 1978) and MCDA (Yager, 1977;
Dubois and Prade, 1979; Siskos, 1982; Siskos et al., 1984a; Fodor and Roubens, 1994; Grabisch, 1995, 1996; Lootsma, 1997).
3.4 Rough sets
Pawlak (1982) introduced the rough set theory as a tool to describe dependencies between attributes, to evaluate the significance of attributes and to deal with inconsistent data. As an approach to handle imperfect data (uncertainty and vagueness), it complements other theories that deal with data uncertainty, such as probability theory, evidence theory, fuzzy set theory, etc. Generally, the rough set approach is a very useful tool in the study of classification problems, regarding the assignment of a set of alternatives into prespecified classes. Recently, however, there have been several advances in this field to allow the application of the rough set theory to choice and ranking problems as well (Greco et al., 1997). The rough set philosophy is founded on the assumption that with every alternative some information (data, knowledge) is associated. This information involves two types of attributes: condition and decision attributes. Condition attributes are those used to describe the characteristics of the objects. For instance, the set of condition attributes describing a firm can be its size, its financial characteristics (profitability, solvency, liquidity ratios), its organization, its market position, etc. The decision attributes define a partition of the objects into groups according to the condition attributes. On the basis of these two types of attributes an information table S = ⟨U, Q, V, f⟩ is formed, as follows: U is a finite set of m alternatives (objects). Q is a finite set of n attributes. V is the union of the domains of all attributes (the domain of each attribute q is denoted by V_q). The traditional rough set theory assumes that the domain of each attribute is a discrete set. In this context every quantitative real-valued attribute needs to be discretized⁵, using discretization algorithms such as the ones proposed by Fayyad and Irani (1992), Chmielewski and Grzymala-Busse (1996), Zighed et al. (1998). Recently, however, the traditional rough set approach has been extended so that no discretization is required for quantitative attributes. Typical examples of the new direction are the DOMLEM algorithm (Greco et al., 1999a) and the MODLEM algorithm (Grzymala-Busse and Stefanowski, 2001).
5 Discretization involves the partitioning of an attribute’s domain [a, b] into h subintervals [a, t₁), [t₁, t₂), …, [t_{h−1}, b], where a < t₁ < t₂ < … < t_{h−1} < b.
f : U × Q → V is a total function such that f(x, q) ∈ V_q for every q ∈ Q and x ∈ U, called the information function (Pawlak, 1991; Pawlak and Slowinski, 1994). Simply stated, the information table is an m×n matrix, with rows corresponding to the alternatives and columns corresponding to the attributes. Given an information table, the basis of the traditional rough set theory is the indiscernibility between the alternatives. Two alternatives x and y are considered to be indiscernible with respect to a subset of attributes P ⊆ Q, if and only if they are characterized by the same information, i.e. f(x, q) = f(y, q) for every q ∈ P. In this way, every P ⊆ Q leads to the development of a binary relation I_P on the set of alternatives. This relation is called the P-indiscernibility relation; I_P is an equivalence relation for any P. Every set of indiscernible alternatives is called an elementary set and it constitutes a basic granule of knowledge. Equivalence classes of the relation I_P are called P-elementary sets in S, and I_P(x) denotes the P-elementary set containing alternative x. Any set of objects that is a union of some elementary sets is referred to as crisp (precise); otherwise it is considered to be rough (imprecise, vague). Consequently, each rough set has a boundary-line consisting of cases (objects) which cannot be classified with certainty as members of the set or of its complement. Therefore, a pair of crisp sets, called the lower and the upper approximation, can represent a rough set. The lower approximation consists of all objects that certainly belong to the set and the upper approximation contains the objects that possibly belong to the set. The difference between the upper and the lower approximation defines the doubtful region, which includes all objects that cannot be classified into the set with certainty. On the basis of the lower and upper approximations of a rough set, the accuracy of its approximation can be calculated as the ratio of the cardinality of its lower approximation to the cardinality of its upper approximation. Assuming that Y ⊆ U and P ⊆ Q, the P-lower approximation P_*(Y), the P-upper approximation P^*(Y) and the P-doubtful region Bn_P(Y) of Y are formally defined as follows:
P_*(Y) = {x ∈ U : I_P(x) ⊆ Y}
P^*(Y) = {x ∈ U : I_P(x) ∩ Y ≠ ∅}
Bn_P(Y) = P^*(Y) − P_*(Y)
On the basis of these approximations it is possible to estimate the accuracy of the approximation of the rough set Y, denoted by α_P(Y).
The accuracy of the approximation is defined as the ratio of the number of alternatives belonging to the lower approximation to the number of alternatives of the upper approximation:
α_P(Y) = |P_*(Y)| / |P^*(Y)|
Within the context of a classification problem, each group C_k is considered as a rough set. The overall quality of the approximation of the classification C = {C₁, C₂, …, C_q} by a set of attributes P is defined as follows:
γ_P(C) = ( Σ_{k=1}^{q} |P_*(C_k)| ) / |U|
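The following sketch computes the lower and upper approximations and the accuracy of the approximation for a small hypothetical information table; the objects, attributes and group are illustrative assumptions, not data from the book:

```python
def elementary_sets(table, attributes):
    """Partition the universe into P-elementary sets (indiscernibility classes)."""
    classes = {}
    for obj, description in table.items():
        key = tuple(description[a] for a in attributes)
        classes.setdefault(key, set()).add(obj)
    return list(classes.values())

def approximations(table, attributes, Y):
    """Lower and upper approximation of a set of objects Y."""
    lower, upper = set(), set()
    for elem in elementary_sets(table, attributes):
        if elem <= Y:
            lower |= elem           # elementary set certainly contained in Y
        if elem & Y:
            upper |= elem           # elementary set possibly overlapping Y
    return lower, upper

# Hypothetical information table: objects described by two condition attributes
table = {"x1": {"size": "big", "risk": "low"},
         "x2": {"size": "big", "risk": "low"},
         "x3": {"size": "small", "risk": "high"},
         "x4": {"size": "small", "risk": "low"}}
group_A = {"x1", "x3"}              # decision attribute: membership in group A

low, up = approximations(table, ["size", "risk"], group_A)
accuracy = len(low) / len(up)       # accuracy of the approximation
print(low, up, accuracy)
```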
Having defined the quality of the approximation, the first major capability that the rough set theory provides is to reduce the available information, so as to retain only the information that is absolutely necessary for the description and classification of the alternatives. This is achieved by discovering subsets R of the complete set of attributes P, which can provide the same quality of classification as the whole attribute set, i.e. γ_R(C) = γ_P(C). Such subsets of attributes are called reducts and are denoted by RED_P(C). Generally, there may be more than one reduct. In such a case the intersection of all reducts is called the core, i.e. CORE_P(C) = ∩ RED_P(C). The core is the collection of the most relevant attributes, which cannot be excluded from the analysis without reducing the quality of the obtained description (classification). The decision maker can examine all obtained reducts and proceed to the further analysis of the considered problem according to the reduct that best describes reality. Heuristic procedures can also be used to identify an appropriate reduct (Slowinski and Zopounidis, 1995). The subsequent steps of the analysis involve the development of a set of rules for the classification of the alternatives into the groups where they actually belong. The rules developed through the rough set approach have the following form:
IF conjunction of elementary conditions THEN disjunction of elementary decisions
The procedures used to construct a set of decision rules employ the machine learning paradigm. Such procedures developed within the context of the rough set theory have been presented by Grzymala-Busse (1992), Slowinski and Stefanowski (1992), Skowron (1993), Ziarko et al. (1993),
Stefanowski and Vanderpooten (1994), Mienko et al. (1996), and Grzymala-Busse and Stefanowski (2001). Generally, the rule induction techniques follow one of the following strategies:
1. Development of a minimal set of rules covering all alternatives of the training sample (information table).
2. Development of an extensive set of rules consisting of all possible decision rules.
3. Development of a set of strong rules, even partly discriminant⁶, which do not necessarily cover all alternatives of the training sample.
The first rule induction approaches developed within the rough set theory assumed that the attributes’ domain was a set of discrete values; otherwise a discretization was required. The most well-known approach within this category of rule induction techniques is the LEM2 algorithm (Grzymala-Busse, 1992). This algorithm leads to the development of a minimal set of rules (i.e., rules which are complete and non-redundant)⁷. The elementary conditions of decision rules developed through the LEM2 algorithm have an equality form (q = v), where q is a condition attribute and v ∈ V_q. Recently, new rule induction techniques have been developed that do not require the discretization of quantitative condition attributes. The elementary conditions of rules induced through these techniques have an inequality form (q ≥ v) or (q ≤ v) (strict inequalities are also possible). Typical examples of such techniques are the DOMLEM algorithm (Greco et al., 1999a) and the MODLEM algorithm (Grzymala-Busse and Stefanowski, 2001). Both these algorithms are based on the philosophy of the LEM2 algorithm. The DOMLEM algorithm leads to the development of rules that have the following form:
IF q₁ ≥ v₁ AND q₂ ≥ v₂ AND … THEN the alternative belongs to C_k^≥ (or, with conditions of the form q ≤ v, to C_k^≤)
6 Rules covering only alternatives that belong to the group indicated by the conclusion of the rule (positive examples) are called discriminant rules. On the contrary, rules that cover both positive and negative examples (alternatives not belonging to the group indicated by the rule) are called partly discriminant rules. Each partly discriminant rule is associated with a coefficient measuring the consistency of the rule. This coefficient is called the level of discrimination and is defined as the ratio of positive to negative examples covered by the rule.
7 Completeness refers to a set of rules that cover all alternatives of the training sample. A set of rules is called non-redundant if the elimination of any single rule from the initial rule set leads to a new set of rules that does not have the completeness property.
C_k^≥ and C_k^≤ denote the sets of alternatives belonging to the groups that are at least as good as C_k and at most as good as C_k, respectively. In this context it is assumed that the groups are defined in an ordinal way, such that C₁ is the group of the most preferred alternatives and C_q is the group of the least preferred ones. The rules developed through the MODLEM algorithm have a similar form to the ones developed by the DOMLEM algorithm. There are two differences, however:
1. Each elementary condition has the form (q ≥ v) or (q < v).
2. The conclusion part of the rules indicates a specific group for the alternatives rather than a set of groups.
Irrespective of the rule induction approach employed, a decision rule developed on the basis of the rough set approach has some interesting properties and features. In particular, if all alternatives that satisfy the condition part belong to the group indicated by the conclusion of the rule, then the rule is called consistent. In the case where the conclusion part considers only a single group, the rule is called exact; otherwise the rule is called approximate. The conclusion part of approximate rules involves a disjunction of at least two groups. Approximate rules are developed when the training sample (information table) includes indiscernible alternatives belonging to different groups. Each rule is associated with a strength measure, indicating the number of alternatives covered by the rule. For approximate rules the strength is estimated for each individual group considered in their conclusion part. Stronger rules consider a limited number of elementary conditions; thus, they are more general. Once the rule induction process is completed, the developed rules can be easily used to decide upon the classification of any new alternative not considered during model development. This is performed by matching the condition part of each rule to the characteristics of the alternative, in order to identify a rule that covers the alternative. This matching process may lead to one of the following four situations (Slowinski and Stefanowski, 1994):
1. The alternative is covered only by one exact rule.
2. The alternative is covered by more than one exact rule, all indicating the same classification.
3. The alternative is covered by one approximate rule or by exact rules indicating different classifications.
4. The alternative is not covered by any rule.
The classification decision in situations (1) and (2) is straightforward. In situation (3) the developed rule set leads to conflicting decisions regarding the classification of the alternative. To overcome this problem, one can consider the strength of the rules that cover the alternative (for approximate rules
the strength for each individual group in the conclusion part must be considered). The stronger rule can then be used to make the final classification decision. This approach is employed in the LERS classification system developed by Grzymala-Busse (1992). Situation (4) is the most difficult one, since using the developed rule set one has no evidence as to the classification of the alternative. The LERS system tackles this problem through the identification of rules that partly cover the characteristics of the alternative under consideration⁸. The strength of these rules as well as the number of elementary conditions satisfied by the alternative are considered in making the decision. This approach will be discussed in more detail in Chapter 5. An alternative approach proposed by Slowinski (1993) involves the identification of a rule that best matches the characteristics of the alternative under consideration. This is based on the construction of a valued closeness relation measuring the similarity between each rule and the alternative. The construction of this relation is performed in two stages. The first stage involves the identification of the attributes that are in accordance with the affirmation “the alternative is close to rule r”. The strength of this affirmation is measured on a numerical scale between 0 and 1. The second stage involves the identification of the characteristics that are in discordance with the above affirmation. The strengths of the concordance and discordance tests are combined to estimate an overall index representing the similarity of a rule to the characteristics of the alternative. Closing this brief discussion of the rough set approach, it is important to note the recent advances made in this field towards the use of the rough set approach as a methodology of preference modeling in multicriteria decision problems (Greco et al., 1999a, 2000a). The main novelty of the recently developed rough set approach concerns the possibility of handling criteria, i.e. attributes with preference ordered domains, and preference ordered groups in the analysis of sorting examples and the induction of decision rules. The rough approximations of decision groups involve a dominance relation, instead of the indiscernibility relation considered in the basic rough set approach. They are built from reference alternatives given in the sorting example (training sample). Decision rules derived from these approximations constitute a preference model. Each “if ... then ...” decision rule is composed of: (a) a condition part specifying a partial profile on a subset of criteria to which an alternative is compared using the dominance relation, and (b) a decision part suggesting an assignment of the alternative to “at least” or “at most” a given class⁹.
8 Partly covering involves the case where the alternative satisfies only some of the elementary conditions of a rule.
9 The DOMLEM algorithm discussed previously in this chapter is suitable for developing such rules.
The decision rule preference model has also been considered in terms of conjoint measurement (Greco et al., 2001). A representation theorem for multicriteria sorting proved by Greco et al. states an equivalence of a simple cancellation property, a general discriminant (sorting) function and a specific outranking relation (cf. Chapter 3), on the one hand, and the decision rule model on the other hand. It is also shown that the decision rule model resulting from the dominance-based rough set approach has an advantage over the usual functional and relational models because it permits handling inconsistent sorting examples. The inconsistency in sorting examples is not unusual, due to instability of preferences, incomplete determination of criteria and hesitation of the decision maker. It is also worth noting that the dominance-based rough set approach is able to deal with sorting problems involving both criteria and regular attributes whose domains are not preference ordered (Greco et al., 2002), and missing values in the evaluation of reference alternatives (Greco et al., 1999b; Greco et al., 2000b). It also handles ordinal criteria in a more general way than the Sugeno integral, as proved in Greco et al. (2001). The above recent developments have attracted the interest of MCDA researchers in the use of rough sets as an alternative preference modeling framework to the ones traditionally used in MCDA (utility function, outranking relation; cf. Chapter 3). Therefore, the new extended rough set theory can be considered as an MCDA approach. Nevertheless, in this book the traditional rough set theory based on the indiscernibility relation is considered as an example of rule-based classification techniques that employ the machine learning framework. The traditional rough set theory cannot be considered as an MCDA approach since it is only applicable with attributes (instead of criteria) and with nominal groups. This is the reason for including rough sets in this chapter rather than in Chapter 3, which refers to MCDA classification techniques.
Chapter 3 Multicriteria decision aid classification techniques
1. INTRODUCTION TO MULTICRITERIA DECISION AID

1.1 Objectives and general framework
Multicriteria decision aid (MCDA) is an advanced field of operations research which has evolved rapidly over the past three decades at both the research and the practical level. The development of the MCDA field has been motivated by the simple finding that resolving complex real-world decision problems cannot be performed on the basis of unidimensional approaches. However, when employing a more realistic approach considering all factors relevant to a decision making situation, one is faced with the problem of aggregating the multiple factors involved. The complexity of this problem often prohibits decision makers from employing this attractive approach. MCDA’s scope and objective is to support decision makers in tackling such situations. Of course, MCDA is not the only field involved with the aggregation of multiple factors. All the approaches presented in the previous chapter are also involved with the aggregation of multiple factors for decision making purposes. The major distinctive feature of MCDA, however, is its decision support orientation (decision aid) rather than simple decision model development. In this respect, MCDA approaches are focused
on the model development aspects that are related to the modeling and representation of the decision makers’ preferences, values and judgment policy. This feature is of major importance within a decision making context, bearing in mind that an actual decision maker is responsible for the implementation of the results of any decision analysis procedure. Therefore, developing decision models without considering the decision maker’s preferences and system of values may be of limited practical usefulness. In such cases, the decision maker is given a rather passive role in the decision analysis context. He does not participate actively in the model development process and his role is restricted to the implementation of the recommendations of the developed model, whose features are often difficult to understand. The methodological advances made in the MCDA field involve any form of decision making problem (choice, ranking, classification/sorting and description problems). The subsequent sub-sections describe the main MCDA methodological approaches and their implementation in addressing classification problems.
1.2 Brief historical review
Even from the early years of mankind, decision making has been a multidimensional process. Traditionally, this process has been based on empirical approaches rather than on sound quantitative analysis techniques. Pareto (1896) first set the basis for addressing decision problems in the presence of multiple criteria. One of the most important results of Pareto’s research was the introduction of the efficiency concept. During the post-war period, Koopmans (1951) extended the concept of efficiency through the introduction of the efficient set concept: Koopmans defined the efficient set as the set of non-dominated alternatives. During the 1940s and the 1950s Von Neumann and Morgenstern (1944) introduced utility theory, one of the major methodological streams of modern MCDA and decision science in general. These pioneering works inspired several researchers during the 1960s. Charnes and Cooper (1961) extended the traditional mathematical programming theory through the introduction of goal programming. Fishburn (1965) studied the extension of utility theory to the multiple criteria case. These were all studies from US operations researchers. By the end of the 1960s, MCDA had attracted the interest of European operations researchers too. Roy (1968), one of the pioneers in this field, introduced the outranking relation approach; he is considered the founder of the “European” school of MCDA. During the next two decades (1970–1990) MCDA evolved both at the theoretical and practical (real-world applications) levels. The advances made
in information technology and computer science contributed to this evolution. This contribution extends in two major directions: (1) the use of advanced computing techniques that enable the implementation of computationally intensive procedures, and (2) the development of user-friendly decision support systems implementing MCDA methodologies.
1.3 Basic concepts
The major goal of MCDA is to provide a set of criteria aggregation methodologies that enable the development of decision support models considering the decision makers’ preferential system and judgment policy. Achieving this goal requires the implementation of complex processes. Most commonly, these processes do not lead to optimal solutions/decisions, but to satisfactory ones that are in accordance with the decision maker’s policy. Roy (1985) introduced a general framework describing the decision aiding process that underlies the operation of all MCDA methodologies (Figure 3.1).
The first level of the above process involves the specification of a set A of feasible alternative solutions to the problem at hand (alternatives). The objective of the decision is also determined. The set A can be continuous or discrete. In the former case it is specified through constraints imposed by the
decision maker or by the decision environment, thus forming a set of feasible solutions, a concept that is well-known within the mathematical programming framework. In the case where the set A is discrete, it is assumed that the decision maker can list some alternatives which will be subject to evaluation within the given decision making framework. The determination of the objective of the decision specifies the way that the set A should be considered to take the final decision. This involves the selection of the decision problematic that is most suitable to the problem at hand:
Choice of the best alternative.
Ranking of the alternatives from the best to the worst.
Classification/sorting of the alternatives into appropriate groups.
Description of the alternatives.
The second stage involves the identification of all factors related to the decision. MCDA assumes that these factors have the form of criteria. A criterion is a real function g measuring the performance of the alternatives on each of their individual characteristics, defined such that:
g(x) > g(x′) ⇔ x is preferred to x′ (x ≻ x′)    (3.1)
g(x) = g(x′) ⇔ x is indifferent to x′ (x ∼ x′)    (3.2)
These properties define the main distinctive feature of the criterion concept compared to the attribute concept often used in other disciplines such as statistics, econometrics, artificial intelligence, etc. (cf. the previous chapter). Both an attribute and a criterion assign a description (quantitative or qualitative) to an alternative. In the case of a criterion, however, this description entails some preferential information regarding the performance of an alternative compared to other alternatives. The set of criteria identified at this second stage of the decision aiding process must form a consistent family of criteria. A consistent family of criteria is a set of criteria having the following properties:
1. Monotonicity: every criterion must satisfy the conditions described by relations (3.1) and (3.2). Some criteria satisfy (3.1) in the opposite way: g(x) > g(x′) ⇔ x′ is preferred to x. In this case the criterion g is referred to as a criterion of decreasing preference (lower values indicate higher preference). Henceforth, this book will not make any distinction between criteria of increasing or decreasing preference (any decreasing preference criterion can be transformed into an increasing preference criterion through sign reversal). A specified criteria set is considered to satisfy the monotonicity property if and only if: for every pair of alternatives x and x′ for which there exists a criterion g_j such that g_j(x) > g_j(x′) and g_i(x) = g_i(x′) for every i ≠ j, it is concluded that x is preferred to x′.
2. Completeness: a set of criteria is complete if and only if for every pair of alternatives x and x′ such that g_i(x) = g_i(x′) for every criterion g_i, it is concluded that x is indifferent to x′. If this condition does not hold, then it is considered that the chosen criteria set does not provide enough information for a proper evaluation of the alternatives in A.
3. Non-redundancy: if the elimination of any single criterion from a criteria set that satisfies the monotonicity and completeness conditions leads to the formation of a new criteria set that does not meet these conditions, then the initial set of criteria is considered to be non-redundant (i.e., it provides only the absolutely necessary information for the evaluation of the alternatives).
Once a consistent family of criteria has been specified, the next step of the analysis is to proceed with the specification of the criteria aggregation model that meets the requirements of the objective/nature of the problem (i.e., choice, ranking, classification/sorting, description). Finally, in the fourth stage of the analysis the decision maker is provided with the necessary support required to understand the recommendations of the model. Providing meaningful support is a crucial issue for the successful implementation of the results of the analysis and the justification of the actual decision taken on the basis of the model’s recommendations.
2. METHODOLOGICAL APPROACHES
As already noted, MCDA provides a plethora of methodologies for addressing decision making problems. The differences between these methodologies involve both the form of the models that are developed and the model development process. In this respect, MCDA researchers have defined several categorizations of the existing methodologies in this field. Roy (1985) identified three major methodological streams considering the features of the developed models:
1. Unique synthesis criterion approaches.
2. Outranking synthesis approaches.
3. Interactive local judgment approaches.
Pardalos et al. (1995) suggested an alternative scheme considering both the features of the developed models as well as the features of the model development process¹:
1. Multiobjective mathematical programming.
2. Multiattribute utility theory.
3. Outranking relations.
4. Preference disaggregation analysis.
Figure 3.2 illustrates how these four main MCDA methodological streams contribute to the analysis of decision making problems, both discrete and continuous. In this figure the solid lines indicate a direct contribution and dashed lines an indirect one. In particular, multiattribute utility theory, outranking relations and preference disaggregation analysis are traditionally used in discrete problems. All these three approaches lead to the development of a decision model that enables the decision maker to evaluate the performance of a discrete set of alternatives for choice, ranking or classification purposes. On the other hand, multiobjective mathematical programming is most suitable for continuous problems.
As indicated, however, in addition to the easily identifiable direct contribution (solid lines) of each MCDA stream in addressing specific forms of decision making problems, it is also possible to identify an indirect contribution (dashed lines). In particular, multiattribute utility theory, outranking
1 Henceforth, all subsequent discussion made in this book adopts the approach presented by Pardalos et al. (1995).
relations and preference disaggregation analysis can also be used within the context of continuous decision problems. In this case, they provide the necessary means to model the decision maker’s preferential system in a functional or relational model, which can be used in a second stage in an optimization context (multiobjective mathematical programming). A well-known example where this framework is highly applicable is the portfolio construction problem, i.e. the construction of a portfolio of securities that maximizes the investor’s utility. In this case, multiattribute utility theory or preference disaggregation analysis can be used to estimate an appropriate utility function representing the investor’s decision making policy. Similarly, the multiobjective mathematical programming framework can be used in combination with the other MCDA approaches to address discrete problems. Within this context, multiobjective mathematical programming techniques are commonly used for model development purposes. This approach is employed within the preference disaggregation analysis framework, discussed later on in this chapter (cf. sub-section 2.4). The following sub-sections outline the main concepts and features of each of the aforementioned MCDA approaches. This discussion provides the basis for reviewing the use of MCDA for classification purposes.
2.1 Multiobjective mathematical programming
Multiobjective mathematical programming (MMP) is an extension of the traditional mathematical programming theory to the case where multiple objective functions need to be optimized. The general formulation of an MMP problem is as follows:
max { f₁(x), f₂(x), …, fₙ(x) }
subject to: x ∈ B
where:
x is the vector of the decision variables,
f₁, f₂, …, fₙ are the objective functions (linear or non-linear) to be optimized,
B is the set of feasible solutions.
In contrast to the traditional mathematical programming theory, within the MMP framework the concept of optimal solution is no longer applicable. This is because the objective functions are of conflicting nature (the opposite is rarely the case). Therefore, it is not possible to find a solution that optimizes simultaneously all the objective functions. In this regard, within the
MMP framework the major point of interest is to search for an appropriate “compromise” solution. In searching for such a solution one does not need to consider the whole set of feasible solutions; only a part of the feasible set needs to be considered. This part is called the efficient set. The efficient set consists of solutions which are not dominated by any other solution on the pre-specified objectives. Such solutions are referred to as efficient solutions, non-dominated solutions or Pareto optimal solutions. In the graphical example illustrated in Figure 3.3 the efficient set is indicated by the bold line between the points A and E. Any other feasible solution is not efficient. For instance, solution Z is not efficient because the feasible solutions C and D dominate Z on both objectives f₁ and f₂.
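Although MMP problems are typically continuous, the dominance concept underlying the efficient set can be illustrated on a small discrete set of solutions; the following sketch filters out dominated solutions, assuming both objectives are to be maximized (the solution values are hypothetical):

```python
def dominates(a, b):
    """a dominates b if it is at least as good on all objectives and strictly
    better on at least one (both objectives are assumed to be maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def efficient_set(solutions):
    """Return the non-dominated (Pareto optimal) solutions of a discrete set."""
    return [s for s in solutions
            if not any(dominates(other, s) for other in solutions if other != s)]

# Hypothetical solutions evaluated on two objectives (f1, f2), both maximized
solutions = [(6, 1), (5, 3), (3, 5), (1, 6), (2, 2)]
print(efficient_set(solutions))   # (2, 2) is dominated, e.g. by (5, 3)
```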
In solving MMP problems, the optimization of a linear weighted aggregation of the objectives is not generally appropriate. As indicated in Figure 3.3, if the feasible set is not convex, then using an aggregation of the form w₁f₁(x) + w₂f₂(x) + … + wₙfₙ(x) may lead to the identification of only a limited part of the existing efficient solutions. In the example of Figure 3.3 only solutions B and D will be identified. Therefore, any MMP solution methodology should accommodate the need for searching the whole efficient set. This is performed through interactive and iterative procedures. In the first stage of such procedures an initial efficient solution is obtained and presented to the decision maker. If this
solution is considered acceptable by the decision maker (i.e., if it satisfies his expectations on the given objectives), then the solution procedure stops. If this is not the case, then the decision maker is asked to provide information regarding his preferences on the pre-specified objectives. This information involves the objectives that need to be improved as well as the trade-offs that he is willing to undertake to achieve these improvements. The objective of defining such information is to specify a new search direction for the development of new improved solutions. This process is repeated until a solution is obtained that is in accordance with the decision maker’s preferences, or until no further improvement of the current solution is possible. In the international literature several methodologies have been proposed that operate within the above general framework for addressing MMP problems. Some well-known examples are the methods developed by Benayoun et al. (1971), Zionts and Wallenius (1976), Wierzbicki (1980), Steuer and Choo (1983), Korhonen (1988), Korhonen and Wallenius (1988), Siskos and Despotis (1989), Lofti et al. (1992). An alternative approach to address constrained optimization problems in the presence of multiple objectives is the goal programming (GP) approach, founded by Charnes and Cooper (1961). The concept of a goal is different from that of an objective. An objective simply defines a search direction (e.g., profit maximization). On the other hand, a goal defines a target against which the attained solutions are compared (Keeney and Raiffa, 1993). In this regard, GP optimizes the deviations from the pre-specified targets, rather than the performance of the solutions. The general form of a GP model is the following:
Min/Max g(d₁⁻, d₁⁺, d₂⁻, d₂⁺, …, dₙ⁻, dₙ⁺)
subject to: fᵢ(x) + dᵢ⁻ − dᵢ⁺ = tᵢ,  i = 1, 2, …, n
x ∈ B,  dᵢ⁻, dᵢ⁺ ≥ 0
where fᵢ(x) is goal i, defined as a function (linear or non-linear) of the decision variables x, tᵢ is the target value for goal i, and dᵢ⁻, dᵢ⁺ are the deviations from the target value, representing the under-achievement and over-achievement of the goal, respectively.
g is a function (usually linear) of the deviational variables.
The above general formulation shows that an objective function of an MMP formulation is effectively transformed into a constraint within the context of a GP formulation. The right-hand side of these constraints includes the target values of the goals, which can be defined either as some satisfactory values of the goals or as their optimal values. The simplicity of GP formulations has been the main reason for their wide popularity among researchers and practitioners. Spronk (1981) provides an extensive discussion of GP as well as its applications in the field of financial planning.
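A minimal numerical sketch of such a GP formulation is given below, using SciPy's linear programming routine: two goals are modeled as equality constraints with under- and over-achievement deviational variables, and the sum of the deviations is minimized; all target values, coefficients and the resource constraint are hypothetical:

```python
from scipy.optimize import linprog

# Decision variables x1, x2 and deviational variables (d1-, d1+, d2-, d2+)
# Goal 1: 2*x1 + x2 = 10      Goal 2: x1 + 3*x2 = 12
# Hard constraint: x1 + x2 <= 5 (so the two targets cannot both be met exactly)
c = [0, 0, 1, 1, 1, 1]                       # minimize the sum of deviations
A_eq = [[2, 1, 1, -1, 0, 0],                 # 2*x1 + x2 + d1- - d1+ = 10
        [1, 3, 0, 0, 1, -1]]                 #  x1 + 3*x2 + d2- - d2+ = 12
b_eq = [10, 12]
A_ub = [[1, 1, 0, 0, 0, 0]]                  # x1 + x2 <= 5
b_ub = [5]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 6)
x1, x2, d1m, d1p, d2m, d2p = res.x
print(x1, x2, d1m + d1p + d2m + d2p)         # attained solution and total deviation
```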
2.2 Multiattribute utility theory
Multiattribute utility theory (MAUT) extends the traditional utility theory to the multidimensional case. Even from the early stages of the MCDA field, MAUT has been one of the cornerstones of the development of MCDA and its practical implementation. Directly or indirectly all other MCDA approaches employ the concepts introduced by MAUT. For instance, the underlying philosophy of MMP and GP is to identify an efficient solution that maximizes the decision maker’s utility. Obviously, this requires the development of a utility function representing the decision maker’s system of preferences. Some MMP methodologies employ this philosophy; they develop a utility function and then maximize it over the feasible set to identify the most suitable solution. The methodology presented by Siskos and Despotis (1989), implemented in the ADELAIS system (Aide à la DEcision pour systèmes Linéaires multicritères par AIde à la Structuration des préférences), is a typical example of this approach. The objective of MAUT is to model and represent the decision maker’s preferential system in a utility/value function U(g), where g is the vector of the evaluation criteria (g₁, g₂, …, gₙ). Generally, the utility function is a non-linear function defined on the criteria space, such that:
U(g(x)) > U(g(x′)) ⇔ x ≻ x′ (alternative x is preferred to x′)
U(g(x)) = U(g(x′)) ⇔ x ∼ x′ (alternative x is indifferent to x′)
The most commonly used form of utility function is the additive one:
U(g) = p₁u₁(g₁) + p₂u₂(g₂) + … + pₙuₙ(gₙ)
where,
u₁, u₂, …, uₙ are the marginal utility functions corresponding to the evaluation criteria. Each marginal utility function uᵢ(gᵢ) defines the utility/value of the alternatives for the individual criterion gᵢ; p₁, p₂, …, pₙ are constants representing the trade-off that the decision maker is willing to take on a criterion in order to gain one unit on another criterion. These constants are often considered to represent the weights of the criteria and they are defined such that they sum up to one:
p₁ + p₂ + … + pₙ = 1
The form of the additive utility function is quite similar to simple weighted average aggregation models. Actually, such models are a special form of an additive utility function, where all marginal utilities are defined as linear functions of the criteria’s values. The main assumption underlying the use of the additive utility function involves the mutual preferential independence condition of the evaluation criteria. To define the mutual preferential independence condition, the concept of preferential independence must first be introduced. A subset of the evaluation criteria is considered to be preferentially independent of the remaining criteria if and only if the decision maker’s preferences over the alternatives that differ only with respect to the criteria of this subset do not depend on the remaining criteria. Given this definition, the set of criteria g is considered to be mutually preferentially independent if and only if every subset of criteria is preferentially independent of the remaining criteria (Fishburn, 1970; Keeney and Raiffa, 1993). A detailed description of the methodological framework underlying MAUT and its applications is presented in the book of Keeney and Raiffa (1993). Generally, the process for developing an additive utility function is based on the cooperation between the decision analyst and the decision maker. This process involves the specification of the criteria trade-offs and the form of the marginal utility functions. The specification of these parameters is performed through interactive procedures, such as the midpoint value technique proposed by Keeney and Raiffa (1993). The realization of such interactive procedures is often facilitated by the use of multicriteria decision support systems, such as the MACBETH system developed by Bana e Costa and Vansnick (1994). The global utility of the alternatives estimated on the basis of the developed utility function constitutes an index used for choice, ranking or classification/sorting purposes.
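The sketch below illustrates an additive utility model with piece-wise linear marginal utility functions and weights summing to one; the criteria, breakpoints, weights and alternatives are hypothetical and are not taken from any specific method described in this book:

```python
import numpy as np

def marginal_utility(breakpoints, utilities):
    """Piece-wise linear marginal utility on a criterion's scale, in [0, 1]."""
    return lambda g: float(np.interp(g, breakpoints, utilities))

# Hypothetical two-criterion model: profitability (%) and a solvency ratio
u1 = marginal_utility([0, 5, 10, 20], [0.0, 0.4, 0.8, 1.0])
u2 = marginal_utility([0.0, 0.5, 1.0, 2.0], [0.0, 0.3, 0.7, 1.0])
weights = [0.6, 0.4]                    # criteria trade-off constants, sum to 1

def global_utility(alternative):
    """Additive aggregation U(g) = sum_i p_i * u_i(g_i)."""
    return sum(p * u(g) for p, u, g in zip(weights, [u1, u2], alternative))

# Global utilities can then be used to rank or sort a set of alternatives
alternatives = {"firm A": (12.0, 1.4), "firm B": (6.0, 0.8)}
ranking = sorted(alternatives, key=lambda a: global_utility(alternatives[a]),
                 reverse=True)
print(ranking)
```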
2.3 Outranking relation theory
The foundations of the outranking relation theory (ORT) were laid by Bernard Roy during the late 1960s through the development of the ELECTRE family of methods (ELimination Et Choix Traduisant la REalité; Roy, 1968). Since then, ORT has been widely used by MCDA researchers, mainly in Europe. All ORT techniques operate in two major stages. The first stage involves the development of an outranking relation, whereas the second stage involves the exploitation of the outranking relation in order to perform the evaluation of the alternatives for choice, ranking or classification/sorting purposes. The concept of the outranking relation is common to both these stages. An outranking relation is defined as a binary relation used to estimate the strength of the preference for an alternative x over an alternative x′. This strength is defined on the basis of: (1) the existing indications supporting the preference of x over x′ (concordance of criteria), and (2) the existing indications against the preference of x over x′ (discordance of criteria). Generally, an outranking relation is a mechanism for modeling and representing the decision maker’s preferences based on an approach that differs from the MAUT framework on two major issues:
1. The outranking relation is not transitive: In MAUT the evaluations obtained through the development of a utility function are transitive. Assuming three alternatives x, y and z, the transitivity property is formally expressed as follows:
x ≻ y and y ≻ z ⇒ x ≻ z (and similarly for the indifference relation: x ∼ y and y ∼ z ⇒ x ∼ z)
In contrast to MAUT, ORT enables the modeling and representation of situations where transitivity does not hold. A well-known example is the one presented by Luce (1956) (see also Roy and Vincke, 1981): obviously no one can tell the difference between a cup of coffee containing x grams of sugar and a cup of coffee with x + ε grams of sugar, for a sufficiently small ε; therefore there is an indifference relation between these two situations. Similarly, there is indifference between x + ε grams and x + 2ε grams of sugar. If the indifference relation is transitive, then x grams and x + 2ε grams of sugar should be considered as indifferent. Following the same line of inference, it can be deduced that there is no difference between a cup of
coffee containing x grams of sugar and a cup of coffee that is full of sugar, irrespective of x. Obviously, this is an incorrect conclusion, indicating that there are cases where transitivity is not valid.
2. The outranking relation is not complete: In the MAUT framework only the preference and indifference relations are considered. In addition to these two relations, ORT introduces the incomparability relation. Incomparability arises in cases where the considered alternatives have major differences with respect to their characteristics (performances on the evaluation criteria), such that their comparison is difficult to perform.
Despite the above two major differences, both MAUT and ORT use similar model development techniques, involving the direct interrogation of the decision maker. Within the ORT context, the decision maker specifies several structural parameters of the developed outranking relation. In most ORT techniques these parameters involve:
1. The significance of the evaluation criteria.
2. Preference, indifference and veto thresholds.
These thresholds define a fuzzy outranking relation such as the one presented in Figure 3.4. Furthermore, the introduction of the veto threshold facilitates the development of non-compensatory models (models in which the significantly low performance of an alternative on an evaluation criterion is not compensated by its performance on the remaining criteria).
The combination of the above information enables the decision analyst to measure the strength of the indications supporting the affirmation “alternative x is at least as good as alternative x′”, as well as the strength of the indications against this affirmation.
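The following sketch, in the spirit of an ELECTRE-type concordance test, computes a weighted concordance index for the affirmation "x is at least as good as x′" using indifference and preference thresholds; the criteria weights, thresholds and performance values are hypothetical:

```python
def partial_concordance(gx, gy, q, p):
    """Degree to which a criterion supports "x is at least as good as y",
    with indifference threshold q and preference threshold p (q <= p)."""
    if gx >= gy - q:
        return 1.0                       # no significant disadvantage for x
    if gx <= gy - p:
        return 0.0                       # y is strictly preferred on this criterion
    return (gx - (gy - p)) / (p - q)     # linear interpolation in between

def concordance(x, y, weights, q, p):
    """Weighted concordance of the affirmation "x is at least as good as y"."""
    total = sum(weights)
    return sum(w * partial_concordance(gx, gy, qj, pj)
               for w, gx, gy, qj, pj in zip(weights, x, y, q, p)) / total

# Hypothetical alternatives evaluated on three criteria (higher is better)
x, y = [14, 7.0, 3], [16, 6.5, 4]
print(concordance(x, y, weights=[0.5, 0.3, 0.2], q=[1, 0.5, 1], p=[3, 1.0, 2]))
```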
Once the development of the outranking relation is completed on the basis of the aforementioned information, the next stage is to employ the outranking relation for decision making purposes (choice, ranking, classification/sorting of the alternatives). During this stage heuristic procedures are commonly employed to decide upon the evaluation of the alternatives on the basis of the developed outranking relation. The most extensively used ORT techniques are the ELECTRE methods (Roy, 1991), as well as the PROMETHEE methods (Brans and Vincke, 1985). These two families of methods include different variants that are suitable for addressing choice, ranking and classification/sorting problems. Sections 3.1.2 and 3.1.3 of the present chapter discuss in more detail the application of the ORT framework in classification problems.
2.4 Preference disaggregation analysis
From the preceding discussion on MAUT and ORT, it is clear that both these approaches are devoted to the modeling and representation of the decision maker’s preferential system in a pre-specified mathematical model (functional or relational). On the other hand, the focus in preference disaggregation analysis (PDA) is the development of a general methodological framework which can be used to analyze the actual decisions taken by the decision maker, so that an appropriate model can be constructed representing the decision maker’s system of preferences as consistently as possible. Essentially, PDA employs an opposite decision aiding process compared to MAUT and ORT (cf. Figure 3.5). In particular, both MAUT and ORT support the decision maker in aggregating different evaluation criteria on the basis of a pre-specified modeling form (utility function or outranking relation). This is a forward process performed on the basis of the direct interrogation of the decision maker. The decision maker specifies all the model parameters with the help of the decision analyst, who is familiar with the methodological approach that is employed. On the contrary, PDA employs a backward process. PDA does not require the decision maker to provide specific information on how the decisions are taken; it rather asks the decision maker to express his actual decisions. Given these decisions, PDA investigates the relationship between the decision factors (evaluation criteria) and the actual decisions. This investigation enables the specification of a criteria aggregation model that can reproduce the decision maker’s decisions as consistently as possible. PDA is founded on the principle that it is, generally, difficult to elicit specific preferential information from decision makers. This difficulty is due to time constraints and the unwillingness of the decision makers to
participate actively in such an interactive elicitation/decision aiding process. Instead, it is much easier for decision makers to express their actual decisions, without providing any other information on how these decisions are taken (e.g., significance of criteria). PDA provides increased flexibility regarding the way that these decisions can be expressed. Most commonly, they are expressed on an ordinal scale involving a ranking or a classification of the alternatives. Alternatively, a ratio scale can also be employed (Lam and Choo, 1995). More detailed information is also applicable. For instance, Cook and Kress (1991) consider the ranking of the alternatives on each evaluation criterion and the ranking of the evaluation criteria according to their significance.
The objective of gathering such information is to form a set of examples of decisions taken by the decision maker. These examples may involve:
1. Past decisions taken by the decision maker.
2. Decisions taken for a limited set of fictitious but realistic alternatives.
3. Decisions taken for a representative subset of the alternatives under consideration, which are familiar to the decision maker and consequently he can easily express an evaluation for them.
These decision examples incorporate all the preferential information required to develop a decision support model. PDA’s objective is to analyze these examples in order to specify the parameters of the model as consistently as possible with the judgment policy of the decision maker. Henceforth, the set of examples used for model development purposes within the context of PDA will be referred to as the reference set. The reference set is the equivalent of the training sample used in statistics, econometrics and artificial intelligence (see the discussion in the previous chapters). Generally, the PDA paradigm is similar to the regression framework used extensively in statistics, econometrics and artificial intelligence for model development purposes (cf. Chapter 2). In fact, the foundations of PDA have been set by operations researchers in an attempt to develop non-parametric regression techniques using goal-programming formulations. The first studies on this issue were made during the 1950s by Karst (1958), Kelley (1958) and Wagner (1959). During the 1970s Srinivasan and Shocker (1973) used goal programming formulations for the development of ordinal regression models. In the late 1970s and the beginning of the 1980s, Jacquet–Lagrèze and Siskos (1978, 1982, 1983) introduced the PDA concept for decision aiding purposes through the development of the UTA method (UTilités Additives). A comprehensive review of this methodological approach of MCDA and the developments made over the past two decades is presented in the recent paper of Jacquet–Lagrèze and Siskos (2001). The first of the aforementioned studies on PDA employed simple linear weighted average models:
Y′ = w₁g₁ + w₂g₂ + … + wₙgₙ
The aim of these approaches was to estimate the scalars w_i (the criteria weights), so that the model’s estimations Y' were as consistent as possible with the observed evaluations Y. From a decision aiding point of view, however, the use of weighted average models has two major disadvantages:
1. The weighted average model represents a risk-neutral behavior. Risk-prone or risk-averse behaviors cannot be modeled and represented in such a model.
2. The consideration of qualitative criteria is cumbersome in weighted average models. In several practical decision making problems from the fields of marketing, financial management, environmental management, etc., the consideration of qualitative information is crucial. The introduction of qualitative criteria in weighted average models requires that each level of their qualitative scale is assigned a numerical value. Such a
quantification, however, alters the nature of the qualitative information, while, furthermore, the selection of the quantitative scale is arbitrary. The tools provided by MAUT are quite useful in addressing these problems. Jacquet–Lagrèze and Siskos (1978, 1982) were the first to introduce the use of utility functions within the context of PDA. In particular, the authors used linear programming techniques to estimate an additive utility function that can be used in ordinal regression decision making problems (ranking). Of course, the use of additive utility functions in a decision making context has been criticized, mainly with regard to the fact that interactions among the criteria are not considered in such an approach (Lynch, 1979; Oral and Kettani, 1989). These interactions can be modeled and represented in a multiplicative utility function, as proposed by Oral and Kettani (1989). Nevertheless, the estimation of multiplicative utility functions within the framework of PDA is a computationally intensive process involving the solution of non-linear mathematical programming problems.
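To illustrate the goal-programming origins of PDA mentioned above, the following short Python sketch estimates the weights of a simple weighted average model from a set of observed overall evaluations by minimizing the sum of absolute deviations with linear programming. The data, the non-negativity of the weights and their normalization to a unit sum are illustrative assumptions rather than elements of the original formulations.

```python
# Sketch: estimate weights of Y' = w_1 g_1 + ... + w_n g_n from observed Y
# by minimizing the sum of absolute deviations (goal programming / L1 regression).
import numpy as np
from scipy.optimize import linprog

G = np.array([[7.0, 4.0, 6.0],     # criterion values of five hypothetical alternatives
              [5.0, 8.0, 5.0],
              [6.0, 6.0, 7.0],
              [3.0, 5.0, 4.0],
              [8.0, 7.0, 8.0]])
Y = np.array([5.8, 6.1, 6.3, 4.0, 7.7])          # hypothetical observed overall evaluations
m, n = G.shape

# Decision variables: [w_1..w_n, e_1^+ .. e_m^+, e_1^- .. e_m^-]  (deviation variables)
obj = np.concatenate([np.zeros(n), np.ones(2 * m)])
# G w + e^+ - e^- = Y  (each pair of deviations absorbs under-/over-estimation)
A_eq = np.hstack([G, np.eye(m), -np.eye(m)])
b_eq = Y
# Weights normalized to sum to one (illustrative normalization)
A_eq = np.vstack([A_eq, np.concatenate([np.ones(n), np.zeros(2 * m)])])
b_eq = np.append(b_eq, 1.0)

res = linprog(obj, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * (n + 2 * m), method="highs")
print("estimated weights:", np.round(res.x[:n], 3))
```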
3.
MCDA TECHNIQUES FOR CLASSIFICATION PROBLEMS
Having defined the context of MCDA and the main methodological approaches developed within this field, the subsequent analysis focuses on reviewing the most characteristic MCDA techniques proposed for addressing classification problems. The review is performed in two phases:
1. Initially, the techniques based on the direct interrogation of the decision maker are discussed. Such techniques originate from the MAUT and ORT approaches.
2. In the second stage, the review extends to the contribution of the PDA paradigm in the development of classification models.
3.1
Techniques based on the direct interrogation of the decision maker
3.1.1
The AHP method
Saaty (1980) first proposed the AHP method (Analytic Hierarchy Process) for addressing complex decision making problems involving multiple criteria. The method is particularly well suited for problems where the evaluation criteria can be organized in a hierarchical way into sub-criteria. During the last two decades the method has become very popular among operations researchers and decision scientists, mainly in the USA. At the same time, however,
it has been heavily criticized for some major theoretical shortcomings involving its operation. AHP models a decision making problem through a process involving four stages:
Stage 1: Hierarchical structuring of the problem.
Stage 2: Data input.
Stage 3: Estimation of the relative weights of the evaluation criteria.
Stage 4: Combination of the relative weights to perform an overall evaluation of the alternatives (aggregation of criteria).
In the first stage the decision maker defines a hierarchical structure representing the problem at hand. A general form of such a structure is presented in Figure 3.6. The top level of the hierarchy considers the general objective of the problem. The second level includes all the evaluation criteria. Each criterion is analyzed in the subsequent levels into sub-criteria. Finally, the last level of the hierarchy involves the objects to be evaluated. Within the context of a classification problem the elements of the final level of the hierarchy represent the choices (groups) available to the decision maker regarding the classification of the alternatives. For instance, for a two-group classification problem the last level of the hierarchy will include two elements corresponding to group 1 and group 2.
Once the hierarchy of the problem is defined, in the second stage of the method the decision maker performs pairwise comparisons of all elements at each level of the hierarchy. Each of these comparisons is performed on the basis of the elements of the preceding level of the hierarchy. For instance, considering the general hierarchy of Figure 3.6, at the first level no comparisons are required (the first level involves only one element). In the second level, all elements (evaluation criteria) are compared in a pairwise way on the basis of the objective of the problem (first level of the hierarchy). Then, the sub-criteria of the third level are compared each time from a different point of view, considering each criterion of the second level of the hierarchy: they are initially compared on the basis of the first criterion, then on the basis of the second criterion, and so on. The same process is continued until all elements of the hierarchy are compared. The objective of all these comparisons is to assess the relative significance of all elements of the hierarchy in making the final decision according to the initial objective. The comparisons are performed using the 9-point scale presented in Table 3.1.
The results of the comparisons made by the decision maker are used to form an n_k × n_k matrix A_k for each level k of the hierarchy, where n_k denotes the number of elements in level k:

A_k = [w_i / w_l], i, l = 1, 2, ..., n_k

where w_i denotes the actual weight assigned to element i at level k of the hierarchy as opposed to a specific element
of the level k-1. Assuming that all comparisons are consistent, the weights can be estimated through the solution of the following system of linear equalities:

A_k w = n_k w

If A_k is known, then this relation can be used to solve for w. The problem of finding a nonzero solution to this set of equations is known as the eigenvalue problem:

A_k w = λ_max w

where A_k is the matrix formed by the comparisons made by the decision maker, λ_max is the largest eigenvalue of A_k, and w is the vector of the estimates of the actual weights. The last stage of the AHP method involves the combination of the weights defined in the previous stage, so that an overall evaluation of the elements belonging in the final level of the hierarchy (level k) is performed on the basis of the initial objective of the analysis (first level of the hierarchy). This combination is performed as follows:

W = B_k B_{k-1} ... B_2
where W is a vector consisting of the global evaluations for the elements of level k, and B_j is the matrix of the weights of the elements in level j as opposed to the elements of level j–1. For a classification problem the global evaluations of the elements in the last level of the hierarchy are used to decide upon the classification of an alternative. Since the elements of the last level correspond to the prespecified groups, an alternative is assigned to the group for which the evaluation of the corresponding element is higher. Srinivasan and Kim (1987) used AHP in the credit granting problem in order to classify a set of firms into two groups: the firms that should be granted credit and the ones that should be denied credit. Despite the extensive use of AHP for addressing a variety of decision making problems (for a review of AHP applications see Zahedi, 1986; Vargas, 1990), the method has been heavily criticized by researchers. The focal point of this criticism is the “rank reversal” problem. This problem was first noted by Belton and Gear (1983), who found that when a new alternative is added to an existing set of alternatives A such that it is identical to an
existing alternative, then the evaluations on the new set of alternatives are not consistent with the evaluations on the initial set of alternatives A. An example of this problem is given by Harker and Vargas (1990). The authors considered a set of three alternatives evaluated along three criteria, applied the AHP method, and concluded on a ranking of the alternatives. When they introduced an additional alternative in the analysis that had the same description as one of the existing ones, it would be expected that the relative ranking of the original alternatives would remain unchanged. However, the new evaluation of the alternatives was not consistent with this expected result (cf. Harker and Vargas, 1990). Several researchers have proposed methodologies for addressing the rank reversal problem (Schoner and Wedley, 1989, 1993; Dyer, 1990); however, an overall solution to this limitation of the method is still not available.
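As an illustration of the eigenvalue-based weight estimation described above, the following short Python sketch computes the principal eigenvector of a hypothetical 3×3 pairwise comparison matrix expressed on the 9-point scale. The consistency index reported at the end is a standard AHP diagnostic added here for completeness; it is not elaborated in the text above.

```python
# Sketch: AHP weight estimation via the principal eigenvector of the comparison matrix.
import numpy as np

def ahp_weights(A):
    """Estimate weights from a reciprocal pairwise comparison matrix A."""
    eigenvalues, eigenvectors = np.linalg.eig(A)
    k = np.argmax(eigenvalues.real)            # index of the largest eigenvalue
    w = np.abs(eigenvectors[:, k].real)        # principal eigenvector
    w = w / w.sum()                            # normalize so the weights sum to 1
    lambda_max = eigenvalues[k].real
    n = A.shape[0]
    ci = (lambda_max - n) / (n - 1)            # consistency index (standard AHP diagnostic)
    return w, lambda_max, ci

# Hypothetical comparisons of three criteria (entry a_il = relative importance of i over l)
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, lambda_max, ci = ahp_weights(A)
print("weights:", np.round(w, 3), "lambda_max:", round(lambda_max, 3), "CI:", round(ci, 3))
```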
3.1.2
The ELECTRE TRI method
The family of ELECTRE methods, initially introduced by Roy (1968), is founded on the ORT concepts. The ELECTRE methods are the most extensively used ORT techniques. The ELECTRE TRI method (Yu, 1992) is a member of this family of methods, developed for addressing classification problems. The ELECTRE TRI method is based on the framework of the ELECTRE III method (Roy, 1991). The objective of the ELECTRE TRI method is to assign a discrete set of alternatives into q groups C_1, C_2, ..., C_q. Each alternative x_j is considered as a vector consisting of the performances of the alternative on the set of evaluation criteria g. The groups are defined in an ordinal way, such that group C_1 includes the most preferred alternatives and C_q includes the least preferred ones. A fictitious alternative r_k is introduced as the boundary between each pair of consecutive groups C_k and C_{k+1} (Figure 3.7). Henceforth, any such fictitious alternative will be referred to as a reference profile or simply a profile. Essentially, each group C_k is delimited by the profile r_k (the lower bound of the group) and the profile r_{k-1} (the upper bound of the group). Each profile r_k is a vector consisting of partial profiles defined for each criterion g_i. Since the groups are defined in an ordinal way, the partial profiles must satisfy the condition g_i(r_k) ≥ g_i(r_{k+1}) for every pair of consecutive profiles and every criterion i=1, 2, ..., n.
The classification of the alternatives into the pre-specified groups is performed through a two stage process. The first stage involves the development of an outranking relation used to decide on whether an alternative outranks a profile or not. The second stage involves the exploitation of the developed outranking relation to decide upon the classification of the alternatives. The development of the outranking relation in the first stage of the process is based on the comparison of the alternatives with the reference profiles. These comparisons are performed for all pairs (x_j, r_k), j=1, 2, ..., m and k=1, 2, ..., q–1. Generally, the comparison of an alternative x_j with a profile r_k is accomplished in two stages, involving the concordance and the discordance test respectively. The objective of the concordance test is to assess the strength of the indications supporting the affirmation “alternative x_j is at least as good as profile r_k”. The measure used to assess this strength is the global concordance index C(x_j, r_k). This index ranges between 0 and 1; the closer it is to one, the higher is the strength of the above affirmation and vice versa. The concordance index is estimated as the weighted average of partial concordance indices defined for each criterion:

C(x_j, r_k) = Σ_i w_i c_i(x_j, r_k) / Σ_i w_i
where w_i denotes the weight of criterion g_i (the criteria weights are specified by the decision maker), and c_i(x_j, r_k) denotes the partial concordance index defined for criterion g_i. Each partial concordance index measures the strength of the affirmation “alternative x_j is at least as good as profile r_k on the basis of criterion g_i”. The estimation of the partial concordance index requires the specification of two parameters: the preference threshold p_i and the indifference threshold q_i. The preference threshold p_i for criterion g_i represents the largest difference compatible with a preference in favor of x_j on criterion g_i. The indifference threshold q_i for criterion g_i represents the smallest difference that preserves indifference between an alternative x_j and profile r_k on criterion g_i. The values of these thresholds are specified by the decision maker in cooperation with the decision analyst. On the basis of these thresholds, the partial concordance index is estimated as follows (Figure 3.8):

c_i(x_j, r_k) = 1, if g_i(r_k) – g_i(x_j) ≤ q_i
c_i(x_j, r_k) = 0, if g_i(r_k) – g_i(x_j) ≥ p_i
c_i(x_j, r_k) = [p_i – g_i(r_k) + g_i(x_j)] / (p_i – q_i), otherwise
The discordance index d_i(x_j, r_k) measures the strength of the indications against the affirmation “alternative x_j is at least as good as profile r_k on the basis of criterion g_i”. The estimation of the discordance index requires the specification of an additional parameter, the veto threshold v_i. Conceptually, the veto threshold represents the smallest difference between a profile and the performance of an alternative on criterion g_i above which the criterion vetoes the outranking character of the alternative over the profile, irrespective of the performance of the alternative on the remaining criteria. The estimation of the discordance index is performed as follows (Figure 3.9):

d_i(x_j, r_k) = 0, if g_i(r_k) – g_i(x_j) ≤ p_i
d_i(x_j, r_k) = 1, if g_i(r_k) – g_i(x_j) ≥ v_i
d_i(x_j, r_k) = [g_i(r_k) – g_i(x_j) – p_i] / (v_i – p_i), otherwise
Once the concordance and discordance indices are estimated as described above, the next stage of the process is to combine the two indices so that an overall estimation of the strength of the outranking degree of an alternative over the profile can be obtained considering all the evaluation criteria. This stage involves the estimation of the credibility index σ(x_j, r_k), measuring the strength of the affirmation “alternative x_j is at least as good as profile r_k according to all criteria”. The estimation of the credibility index is performed as follows:

σ(x_j, r_k) = C(x_j, r_k) Π_{i∈F} [1 – d_i(x_j, r_k)] / [1 – C(x_j, r_k)]

where F denotes the set of criteria for which the discordance index is higher than the concordance index:

F = {i : d_i(x_j, r_k) > C(x_j, r_k)}

Obviously, if F is empty, then σ(x_j, r_k) = C(x_j, r_k).
The credibility index provides the means to decide whether an alternative x_j outranks profile r_k or not. The outranking relation is considered to hold if σ(x_j, r_k) ≥ λ. The cut-off point λ is defined by the decision analyst in cooperation with the decision maker, such that it ranges between 0.5 and 1. The outranking relation developed in this way is used to establish three possible outcomes of the comparison of an alternative x_j with a profile r_k. In particular, this comparison may lead to the following conclusions:
1. Indifference (I): the alternative x_j outranks the profile r_k and r_k outranks x_j.
2. Preference (P): only one of the two outranks the other (either x_j outranks r_k but not the converse, or r_k outranks x_j but not the converse).
3. Incomparability (R): neither x_j outranks r_k nor r_k outranks x_j.
The modeling of the incomparability relation is one of the main distinguishing features of the ELECTRE TRI method and ORT techniques in general. Incomparability arises for alternatives that have exceptionally good performance on some criteria and at the same time quite poor performance on other criteria. The above three relations (I, P, R) provide the basis for developing the classification rule. ELECTRE TRI employs two assignment procedures, the optimistic and the pessimistic one. Both procedures begin by comparing an alternative x_j to the lowest (worst) profile r_{q-1}. If x_j outranks r_{q-1}, then the procedure continues with the comparison of x_j to the next profile r_{q-2}. The same procedure continues until one of the two following situations appears:
1. The profile r_k is preferred to the alternative x_j.
2. The alternative x_j and the profile r_k are incomparable.
In the first case, both the optimistic and the pessimistic procedures will assign the alternative into group C_{k+1}. In the second case, however, the pessimistic procedure will assign the alternative into group C_{k+1}, whereas the optimistic procedure will assign the alternative into group C_k. Overall, the key issue for the successful implementation of the above process is the elicitation of all the preferential parameters involved (i.e., criteria weights, preference, indifference and veto thresholds, profiles). This elicitation is often cumbersome in real-world situations due to time constraints or the unwillingness of the decision makers to actively participate in a direct interrogation process managed by an expert decision analyst. Recently, Mousseau and Slowinski (1998) proposed a methodology to infer all this preferential information using the principles of PDA. The main features, advantages and disadvantages of this methodology will be discussed in the Appendix of Chapter 5, together with the presentation of a new approach to address this problem.
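To make the above computations concrete, the following Python sketch implements the concordance, discordance and credibility indices and a pessimistic assignment in their standard textbook form. It is only an illustrative sketch: all criteria are assumed to be of increasing preference, the pessimistic rule is stated in its usual form (scanning the profiles from the best one downwards), and the weights, thresholds, profiles and cutting level are hypothetical values.

```python
# Sketch of the ELECTRE TRI computations described above (simplified assumptions).
import numpy as np

def partial_concordance(g_a, g_r, p, q):
    """c_i(a, r): 1 if the profile exceeds the alternative by at most q, 0 beyond p."""
    diff = g_r - g_a
    if diff <= q:
        return 1.0
    if diff >= p:
        return 0.0
    return (p - diff) / (p - q)

def partial_discordance(g_a, g_r, p, v):
    """d_i(a, r): 0 below the preference threshold, 1 beyond the veto threshold."""
    diff = g_r - g_a
    if diff <= p:
        return 0.0
    if diff >= v:
        return 1.0
    return (diff - p) / (v - p)

def credibility(a, r, weights, p, q, v):
    c = np.array([partial_concordance(a[i], r[i], p[i], q[i]) for i in range(len(a))])
    d = np.array([partial_discordance(a[i], r[i], p[i], v[i]) for i in range(len(a))])
    C = np.dot(weights, c) / weights.sum()          # global concordance
    sigma = C
    for i in range(len(a)):                         # weaken C where discordance exceeds it
        if d[i] > C:
            sigma *= (1.0 - d[i]) / (1.0 - C)
    return sigma

def pessimistic_assignment(a, profiles, weights, p, q, v, lambda_cut=0.75):
    """Profiles ordered from the best (r_1) to the worst (r_{q-1}); group 1 is best."""
    for k, r in enumerate(profiles, start=1):       # scan from the best profile downwards
        if credibility(a, r, weights, p, q, v) >= lambda_cut:
            return k                                # x_j outranks r_k -> group C_k
    return len(profiles) + 1                        # outranks no profile -> worst group

weights = np.array([0.4, 0.35, 0.25])
p = np.array([2.0, 2.0, 1.0])     # preference thresholds
q = np.array([0.5, 0.5, 0.25])    # indifference thresholds
v = np.array([4.0, 4.0, 2.0])     # veto thresholds
profiles = [np.array([7.0, 7.0, 3.5]),   # r_1: lower bound of the best group
            np.array([4.0, 4.0, 2.0])]   # r_2: lower bound of the middle group
alternative = np.array([5.5, 6.0, 2.4])
print("assigned to group", pessimistic_assignment(alternative, profiles, weights, p, q, v))
```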
3.1.3
Other outranking classification methods
The N–TOMIC method
The N–TOMIC method, presented by Massaglia and Ostanello (1991), performs an assignment (sorting) of the alternatives into nine pre-specified groups indicating: exceptionally high performance, high performance, relatively high performance, adequate performance, uncertain performance, inadequate performance, relatively low performance, low performance and significantly low performance. These nine groups actually define a trichotomic classification of the alternatives, i.e., good alternatives (high performance), uncertain alternatives and bad alternatives (low performance). The assignment of the alternatives into the above groups is performed through the definition of two reference profiles, one defining the concept of a “good” alternative and the other the concept of a “bad” alternative. Every alternative that is at least as good as the “good” profile is considered certainly good, whereas an alternative that is at most as good as the “bad” profile is considered certainly bad. These two cases correspond to the following two affirmations:
1. “Alternative x_j is certainly good” (affirmation 1)
2. “Alternative x_j is certainly bad” (affirmation 2)
The method’s objective is to estimate the credibility of these affirmations using the concordance and discordance concepts discussed previously for the ELECTRE TRI method. The realization of the concordance and discordance tests is based on the same information used in the context of the ELECTRE
TRI method (criteria weights, preference, indifference and veto thresholds). The outcomes of the concordance and discordance tests involve the estimation of a concordance and a discordance index for each one of the above affirmations. On the basis of these indices the assignment procedure is implemented in three stages:
Stage 1: In this first stage it is examined whether an alternative can be assigned into one of the groups considered at this stage (these groups do not pose any certainty on whether an alternative is good, uncertain or bad). Denoting the credibility indices of affirmations 1 and 2 for an alternative by σ_1 and σ_2, respectively, the assignment is performed by comparing these indices with cut-off levels (see footnote 2); depending on the outcome of the comparisons, the alternative may be assigned into the uncertain performance group, the significantly low performance group, the low performance group, the exceptionally high performance group, or the high performance group.
Stage 2: In this stage the assignment of an alternative into one of the following sets is considered:
1. {Good} = {alternatives belonging into the groups of good (high) performance}
2. {Uncertain} = {alternatives belonging into the groups of uncertain performance}
3. {Bad} = {alternatives belonging into the groups of bad (low) performance}
The assignment into one of these sets is performed by comparing the credibility indices σ_1 and σ_2 with a cut-off level (see footnote 3).
Stage 3: This final stage extends the analysis of stage 2 through the consideration of the discordance test. The assignment of the alternatives is performed through decision trees constructed for each of the sets {Good}, {Uncertain}, {Bad}. These trees enable the specific classification of the alternatives into the groups that are members of the above sets. A detailed description of the trees used to perform the classification is presented in Massaglia and Ostanello (1991).
2 The cut-off levels used in stage 1 denote two profiles ranging between 0.5 and 1, which are defined by the decision analyst.
3 The cut-off level used in stage 2 denotes a profile ranging between 0.5 and 1.
The PROAFTN method and the method of Perny (1998)
Both the ELECTRE TRI method and the N–TOMIC method are suitable for addressing sorting problems where the groups are defined in an ordinal way. The major distinguishing feature of the PROAFTN method (Belacel, 2000) and the method of Perny (1998) is their applicability in classification problems with nominal groups. In such cases, the reference profiles distinguishing the groups cannot be defined such that they represent the lower bound of each group. Instead, each reference profile is defined such that it indicates a representative example of each group. On the basis of this approach, both the PROAFTN method and Perny’s method develop a fuzzy indifference relation measuring the strength of the affirmation “alternative x_j is indifferent to profile r_k”. The development of the fuzzy indifference relation is based on procedures similar to the ones used in ELECTRE TRI. Initially, the indications supporting the above affirmation are considered through the concordance test. Then, the discordance test is employed to measure the indications against the above affirmation. The realization of the two tests leads to the estimation of the credibility index σ(x_j, r_k) measuring the indifference degree between an alternative x_j and the profile r_k. The credibility index is used to decide upon the classification of the alternatives. The assignment (classification) procedure consists of comparing an alternative to all reference profiles, and assigning the alternative to the group for which the alternative is most similar (indifferent) to the corresponding profile. This is formally expressed as follows:

x_j is assigned into group C_k such that σ(x_j, r_k) = max{σ(x_j, r_1), σ(x_j, r_2), ..., σ(x_j, r_q)}
Comprehensive descriptions of the details of the model development process and the assignment procedures used in the above methods are provided in the works of Perny (1998) and Belacel (2000).
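As a minimal illustration of this nominal assignment rule, the following sketch assumes that the indifference credibilities between each alternative and the representative profile of each group have already been computed (the values shown are hypothetical) and simply selects, for each alternative, the group with the maximum credibility.

```python
# Sketch: assign each alternative to the group whose profile it is most indifferent to.
import numpy as np

sigma = np.array([[0.2, 0.7, 0.4],     # alternative 1 vs the profiles of groups 1..3
                  [0.8, 0.3, 0.1]])    # alternative 2
assignments = sigma.argmax(axis=1) + 1  # 1-based group indices
print(assignments)                      # -> [2 1]
```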
3.2
The preference disaggregation paradigm in classification problems
All the aforementioned MCDA classification/sorting methods contribute towards the development of a methodological context for decision support, modeling and representation purposes. Nevertheless, they all share a common problem: they require that the decision maker or the decision analyst specifies several technical and preferential parameters which are necessary for the development of the classification model.
The methodological framework of preference disaggregation analysis (PDA) constitutes a useful basis for specifying this information using regression-based techniques. Such an approach minimizes the cognitive effort required by the decision maker as well as the time required to implement the decision aiding process. Sub-section 2.4 of the present chapter discussed some of the early studies made during the 1950s on using the PDA paradigm for decision making purposes. During the 1960s there were the first attempts to develop classification models using regression techniques based on mathematical programming formulations. One of the first methods to employ this approach was the MSM method (Multi-Surface Method) presented by Mangasarian (1968). The model development process in the MSM method involves the solution of a set of linear programming problems. The resulting model has the form of a set of hyperplanes discriminating the alternatives of a training sample as accurately as possible into the pre-specified groups. Essentially, the developed classification models introduce a piece-wise linear separation of the groups. Recently, Nakayama and Kagaku (1998) extended the model development process of the MSM method using goal programming and multiobjective programming techniques to achieve higher robustness and increased generalizing performance for the developed models. In the subsequent years there were some sparse studies by Smith (1968), Grinold (1972), Liittschwager and Wang (1978), and Hand (1981). Despite the innovative aspects of these pioneering studies, it was the works of Freed and Glover (1981a, b) that really boosted this field. The authors introduced simple goal programming formulations for the development of a hyperplane w · g' = c that discriminates two groups of alternatives. Essentially, such a hyperplane is a linear discriminant function similar to the one used in LDA. The linear discriminant function looks similar to the additive utility function defined in section 2.2 of this chapter. There are three major differences, however, between these two modeling forms: (1) the discriminant function cannot be considered as a preference model, because it considers neither a preference order among the decision groups nor a preference order in the criteria domains, (2) all criteria are assumed to be quantitative (the qualitative criteria should also be quantified), (3) the above discriminant function is always linear, whereas the additive utility function can be either linear or non-linear depending on the form of the marginal utility functions. These differences can be considered as disadvantages of the discriminant function over the use of a utility function. Nevertheless, many researchers use this modeling form for two reasons: (1) its development is much easier than the development of a utility function, since only the coefficients w need to be estimated, (2) it is a convenient modeling form when nominal groups are considered (the use of
a utility function is only applicable in sorting problems where the groups are defined in an ordinal way). Furthermore, using the Bayes rule it can be shown that the linear discriminant function is the optimal classification model (in terms of the expected classification error) when the data are multivariate normal with equal group dispersion matrices (Patuwo et al., 1993). These assumptions are strong, however, and only rarely satisfied in practice. On the basis of the linear discriminant function, Freed and Glover (1981a) used the following simple classification rule for two-group classification problems (3.3): an alternative is assigned into the first group if its score w · g' exceeds the cut-off point c, and into the second group otherwise.
The first approach proposed by Freed and Glover (1981a) introduced the minimum distance d between the alternatives’ scores and the cut-off point c as the model development criterion. This is known as the MMD model (maximize the minimum distance; cf. Figure 3.10):

Max d
Subject to:
w · g'_j ≥ c + d, for every alternative j of the first group
w · g'_j ≤ c – d, for every alternative j of the second group
d unrestricted in sign
c user-defined constant
Soon after the publication of their first paper, Freed and Glover published a second one (Freed and Glover, 1981b) describing an arsenal of similar goal-programming formulations for developing classification models. The most well-known of these is the MSD model (minimize the sum of deviations), which considers two measures for the quality of the classification obtained through the developed models (Figure 3.11): (1) the violation of the classification rules (3.3) by an alternative of the training sample, and (2) the distance (absolute difference) between a correctly classified alternative and the cut-off point that discriminates the groups. On the basis of these two
measures, the optimal discriminating hyperplane is developed through the solution of the following linear programming problem:
Subject to:
where h_1 and h_2 are constants representing the relative significance of the two goals of the problem (the minimization of the violations and the maximization of the distances). These constants are specified by the decision maker.
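The following Python sketch illustrates the spirit of these goal programming formulations on hypothetical two-group data, using scipy.optimize.linprog. It is a deliberately simplified variant: only the sum of the violations of the classification rule is minimized (the reward term for the distances of correctly classified alternatives is omitted), and a fixed gap of one unit around the cut-off point is used as a crude normalization; the normalization issue is discussed further below.

```python
# Sketch: simplified MSD-style goal programming model for the two-group case.
import numpy as np
from scipy.optimize import linprog

X1 = np.array([[6.0, 7.0], [5.5, 8.0], [7.0, 6.5]])   # alternatives of the first group
X2 = np.array([[3.0, 4.0], [2.5, 3.5], [4.0, 2.0]])   # alternatives of the second group
n = X1.shape[1]
m1, m2 = len(X1), len(X2)
m = m1 + m2

# Decision variables: [w_1..w_n, c, alpha_1..alpha_m]
obj = np.concatenate([np.zeros(n + 1), np.ones(m)])   # minimize the sum of violations

A_ub, b_ub = [], []
for j, x in enumerate(X1):       # w.x_j + alpha_j >= c + 1  ->  -w.x_j + c - alpha_j <= -1
    row = np.zeros(n + 1 + m)
    row[:n], row[n], row[n + 1 + j] = -x, 1.0, -1.0
    A_ub.append(row); b_ub.append(-1.0)
for j, x in enumerate(X2):       # w.x_j - alpha_j <= c - 1  ->  w.x_j - c - alpha_j <= -1
    row = np.zeros(n + 1 + m)
    row[:n], row[n], row[n + 1 + m1 + j] = x, -1.0, -1.0
    A_ub.append(row); b_ub.append(-1.0)

bounds = [(None, None)] * (n + 1) + [(0, None)] * m    # w, c free; alpha_j >= 0
res = linprog(obj, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds, method="highs")
w, c = res.x[:n], res.x[n]
print("weights:", np.round(w, 3), "cut-off:", round(c, 3), "violations:", round(res.fun, 3))
```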
As an alternative to the linear discriminant function, the above goal-programming formulations are also applicable when a quadratic discriminant function is employed:

Σ_i w_i g_i + Σ_i Σ_l q_il g_i g_l = c
The quadratic discriminant function has been proposed by several authors (Duarte Silva and Stam, 1994; Östermark and Höglund, 1998; Falk and Karlov, 2001) as an appropriate approach to consider the correlations between the criteria. Essentially, the quadratic discriminant function can be considered as a simplified form of the multiplicative utility function, which can be used to address nominal classification problems. Using the Bayes rule it can be shown that the quadratic discriminant function is the optimal classification model (in terms of the expected classification error) when the data are multivariate normal with unequal group dispersion matrices. The above two studies by Freed and Glover (1981a, b) motivated several other researchers towards employing similar approaches. The subsequent research made in this field focused on the following issues:
1. Investigation of problems in goal programming formulations for the development of classification models: This issue has been a major research topic mainly during the 1980s. Soon after the studies of Freed and Glover (1981a, b) researchers identified some problems in the formulations that the authors proposed. Markowski and Markowski (1985) first identified two possible problems that may occur in the use of the goal programming formulations proposed by Freed and Glover (1981a, b).
– All coefficients in the discriminant function (hyperplane) are zero (unacceptable solution). In such a case all alternatives are classified in the same group.
– The developed classification models are not stable to data transformations (e.g., rotation).
Later, Ragsdale and Stam (1991) noted two additional problems which can be encountered:
– The development of unbounded solutions. In such cases the objective function of the goal programming formulations can be increased or decreased without any limitation and the developed model for the classification of the alternatives is meaningless.
– The development of solutions for which all alternatives are placed on the hyperplane that discriminates the groups (improper solutions).
2. The development of new goal programming formulations: Soon after the identification of the problems mentioned above, it was found that these problems were mainly due to the lack of appropriate normalization constraints. To address this issue several authors proposed improved formulations, including hybrid models (Glover et al., 1988; Glover, 1990), nonlinear programming formulations (Stam and Joachimsthaler, 1989), mixed-integer programming formulations (Choo and Wedley, 1985; Koehler and Erenguc, 1990; Rubin, 1990a; Banks and Abad, 1991; Abad and Banks, 1993; Wilson, 1996) and multiobjective programming formulations (Stam, 1990). Recently, there have been several studies proposing the use of advanced optimization techniques such as genetic algorithms (Conway et al., 1998) and tabu search (Fraughnaugh et al., 1998; Yanev and Balev, 1999).
3. The comparison with other classification techniques: The first comparative study was performed by Bajgier and Hill (1982). The authors compared the classification models developed through a mixed-integer programming formulation with the models developed by the two formulations of Freed and Glover (1981a, b) and the ones of linear discriminant analysis. The comparison was based on a simulation experiment and the obtained results showed that the models developed through the mathematical programming formulations provide higher classification accuracy compared to the statistical models, except for the case where the groups’ variance-covariance matrices are equal. Freed and Glover (1986) reached similarly encouraging results, despite their observation that goal programming techniques were more sensitive to outliers. Joachimsthaler and Stam (1988) performed a more extensive comparison considering a goal programming formulation, linear and quadratic discriminant analysis, as well as logit analysis. They found that all methods provide similar
results, even though the performance of discriminant analysis (linear and quadratic) deteriorates as kurtosis increases. Apart from these comparative studies that reached encouraging results, there were also other studies that reached the opposite conclusions. Markowski and Markowski (1987) investigated the impact of qualitative variables on the classification accuracy of models developed using goal programming techniques and linear discriminant analysis. Qualitative variables do not comply with the distributional assumptions of linear discriminant analysis (multivariate normality). The results of the authors, however, showed that linear discriminant analysis performed better than goal programming techniques when qualitative variables were introduced in the data. In particular, the performance of linear discriminant analysis improved with the consideration of qualitative variables, while the performance of goal programming techniques remained unaffected. The simulation study performed by Rubin (1990b) is even more characteristic. The author compared 15 goal programming formulations to quadratic discriminant analysis. On the basis of the results, the author concluded that in order for goal programming techniques to be considered a promising alternative for addressing classification problems, they must be shown to outperform discriminant analysis at least in cases where the data are not multivariate normal. Table 3.2 summarizes some of the most significant comparative studies performed during the last two decades. Comprehensive reviews of this field are presented in the work of Joachimsthaler and Stam (1990), as well as in a recent special issue of Annals of Operations Research (Gehrlein and Wagner, 1997). On the basis of the above discussion, the main existing problems regarding the use of goal programming techniques for developing classification models involve the following issues:
1. The form of the developed models. The majority of the existing research employs simple linear models (linear discriminant functions), which often fail to represent adequately the complexity of real-world classification problems.
2. The consideration of qualitative criteria. Using a simple weighted average or a simple discriminant function makes it difficult to consider qualitative criteria. This requires that for each level of the qualitative scale a 0–1 variable is introduced, or that each qualitative criterion is “quantified” by introducing a numerical scale for its qualitative measurement (e.g., good = 1, medium = 2, bad = 3, etc.). However, both these solutions alter the nature of qualitative criteria and the way that they are considered by the decision maker.
3. The number of groups. Most of the existing research on the use of mathematical programming techniques for developing classification models is restricted to two-group classification problems. The multi-group case still needs further research. There have been some sparse studies on this issue (Choo and Wedley, 1985; Wilson, 1996; Gochet et al., 1997), but further analysis is required towards the investigation of the peculiarities of multi-group classification problems within the context of mathematical programming techniques.
4. The nature of the problems that are addressed. The existing research is heavily focused on classification problems where the groups are defined in a nominal way. However, bearing in mind that sorting problems (ordinal groups) are of particular interest in many real-world decision making fields, it is clear that this area is of major practical and research interest and deserves further investigation.
The MCDA methods that will be presented in detail in the next chapter address most of the above issues in an integrated and flexible framework.
Chapter 4 Preference disaggregation classification methods
1.
INTRODUCTION
The review of MCDA classification methods presented in the previous chapter reveals two major shortcomings:
1. Several MCDA classification methods require the definition of a significant amount of information by the decision maker. The process involving the elicitation of this information is often cumbersome due to: (1) time constraints, (2) the unwillingness of the decision maker to participate actively in this process, and (3) the ability of the decision analyst to interact efficiently with the decision maker.
2. Other MCDA techniques that employ the preference disaggregation philosophy usually assume a linear relationship between the classification of the alternatives and their characteristics (criteria). Such an approach implicitly assumes that the decision maker is risk-neutral, which is not always the case.
This chapter presents two MCDA classification methods that respond satisfactorily to the above limitations. The considered methods are the UTADIS method (UTilités Additives DIScriminantes) and the MHDIS method (Multi-group Hierarchical DIScrimination). Both methods combine a utility function-based framework with the preference disaggregation paradigm. The problems addressed by UTADIS and MHDIS involve the sorting of the alternatives into q predefined groups defined in an ordinal way:

C_1 ≻ C_2 ≻ ... ≻ C_q
where C_1 denotes the group consisting of the most preferred alternatives and C_q denotes the group of the least preferred alternatives. The subsequent sections of this chapter discuss in detail the model development aspects of the two methods, as well as the important issues arising in the model development and implementation process.
2.
THE UTADIS METHOD
2.1
Criteria aggregation model
The UTADIS method was first presented by Devaud et al. (1980), while some aspects of the method can also be found in Jacquet–Lagrèze and Siskos (1982). The interest of MCDA researchers in this method was rather limited until the mid 1990s. Jacquet–Lagrèze (1995) used the method to evaluate R & D projects, while after 1997 the method has been widely used for developing classification models in financial decision making problems (Zopounidis and Doumpos, 1997, 1998, 1999a, b; Doumpos and Zopounidis, 1998; Zopounidis et al., 1999). Recently, the method has been implemented in multicriteria decision support systems, such as the FINCLAS system (Zopounidis and Doumpos, 1998) and the PREFDIS system (Zopounidis and Doumpos, 2000a). The UTADIS method is a variant of the well-known UTA method (UTilités Additives). The latter is an ordinal regression method proposed by Jacquet–Lagrèze and Siskos (1982) for developing decision models that can be used to rank a set of alternatives from the best to the worst ones. Within the sorting framework described in the introductory section of this chapter, the objective of the UTADIS method is to develop a criteria aggregation model used to determine the classification of the alternatives. Essentially this aggregation model constitutes an index representing the overall performance of each alternative along all criteria. The objective of the model development process is to specify this model so that the alternatives of group C_1 receive the highest scores, while the scores of the alternatives belonging into the other groups gradually decrease as we move towards the worst group C_q. Formally, the criteria aggregation model is expressed as an additive utility function:

U(g) = Σ_{i=1}^{n} w_i u_i(g_i)     (4.1)
where:
– g = (g_1, g_2, ..., g_n) is the vector of the evaluation criteria.
– w_i is a scaling constant indicating the significance of criterion g_i.
– u_i is the marginal utility function of criterion g_i.
The marginal utility functions are monotone functions (linear or nonlinear) defined on the criteria’s scale, such that the following two conditions are met:

u_i(g_i*) = 0 and u_i(g_i^*) = 1
where g_i* and g_i^* denote the least and the most preferred value of criterion g_i, respectively. These values are specified according to the set A of the alternatives under consideration, as follows: for increasing preference criteria (criteria for which higher values indicate higher preference, e.g., return/profitability criteria):

g_i* = min_{a∈A} g_i(a) and g_i^* = max_{a∈A} g_i(a)
For decreasing preference criteria (criteria for which higher values indicate lower preference, e.g., risk/cost criteria):

g_i* = max_{a∈A} g_i(a) and g_i^* = min_{a∈A} g_i(a)
Essentially, the marginal utility functions provide a mechanism for transforming the criterion’s scale into a new scale ranging in the interval [0, 1]. This new scale represents the utility of each value of the criterion for the decision maker. The form of the marginal utility functions depends upon the decision maker’s preferential system (judgment policy). Figure 4.1 presents three characteristic cases. The concave form of the utility function presented in Figure 4.1(a) indicates that the decision maker considers as quite significant even small deviations from the worst performance. This corresponds to a risk-averse attitude (i.e., the decision maker is satisfied with “acceptable” alternatives and does not necessarily seek alternatives of top performance). On the contrary, the case presented in Figure 4.1(b) corresponds to a risk-prone decision maker who is mainly interested in alternatives of top performance. Finally, the linear marginal utility function of Figure 4.1(c) indicates a risk-neutral behavior.
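The following small sketch generates the three characteristic marginal utility shapes of Figure 4.1 on a normalized [0, 1] scale; the exponents used to produce the concave and convex curves are arbitrary illustrative choices.

```python
# Sketch: concave (risk-averse), convex (risk-prone) and linear (risk-neutral)
# marginal utility functions, normalized so that u(worst) = 0 and u(best) = 1.
import numpy as np

def marginal_utility(g, g_worst, g_best, shape="linear"):
    t = (g - g_worst) / (g_best - g_worst)       # rescale performance to [0, 1]
    if shape == "risk_averse":                   # concave: small improvements near the
        return t ** 0.4                          # worst value already yield high utility
    if shape == "risk_prone":                    # convex: only near-top performance
        return t ** 2.5                          # receives substantial utility
    return t                                     # linear: risk-neutral behavior

for shape in ("risk_averse", "risk_prone", "linear"):
    values = [round(marginal_utility(g, 0.0, 10.0, shape), 2) for g in (2.0, 5.0, 8.0)]
    print(shape, values)
```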
Transforming the criteria’s scale into utility terms through the use of marginal utility functions has two major advantages:
1. It enables the modeling and representation of the nonlinear behavior of the decision maker when evaluating the performance of the alternatives.
2. It enables the consideration of qualitative criteria in a flexible way. Consider, for instance, a qualitative corporate performance criterion representing the organization of a firm, measured through a three-level qualitative scale: “good”, “medium”, and “poor”. Using such a qualitative criterion in simple weighted average models requires the a priori assignment of a numerical value to each level of the qualitative scale (e.g., good = 3, medium = 2, poor = 1). Such an assignment is often arbitrary and misleading. On the contrary, the specification of the marginal utility function provides a sound methodological mechanism to identify the value (in quantitative terms) that the decision maker assigns to each level of the qualitative scale. Within the context of the UTADIS method and the preference disaggregation framework in general, the form of the
criteria’s marginal utility functions is specified through a regression-based framework, enabling the a posteriori assignment of a numerical value to each level of a qualitative scale rather than an arbitrary a priori specification. Given the above discussion on the concept of marginal utilities, the global utility of an alternative specified through eq. (4.1) represents a measure of the overall performance of the alternative considering its performance on all criteria. The global utilities range in the interval [0, 1] and they constitute the criterion used to decide upon the classification of the alternatives. Figure 4.2 illustrates how the global utilities are used for classification purposes in the simple two-group case. The classification is performed by comparing the global utility of each alternative with a cut-off point defined on the utility scale between 0 and 1. Alternatives with global utilities higher than the utility cut-off point are assigned into group C_1, whereas alternatives with global utilities lower than the cut-off point are assigned into group C_2.
In the general case where q groups are considered, the classification of the alternatives is performed through the following classification rules (4.2):

U(g_j) ≥ u_1 ⟹ a_j ∈ C_1
u_k ≤ U(g_j) < u_{k-1} ⟹ a_j ∈ C_k, k = 2, 3, ..., q–1
U(g_j) < u_{q-1} ⟹ a_j ∈ C_q
where u_1 > u_2 > ... > u_{q-1} denote the utility cut-off points separating the groups. Henceforth, these cut-off points will be referred to as utility thresholds. Essentially, each utility threshold u_k separates the two consecutive groups C_k and C_{k+1}.
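The sketch below illustrates these classification rules for hypothetical global utilities and utility thresholds; in the UTADIS method both the utilities and the thresholds are estimated from the reference set, as described in the next section.

```python
# Sketch: sorting rule (4.2) given a global utility and ordered utility thresholds.
def assign_group(global_utility, thresholds):
    """thresholds: decreasing cut-off points u_1 > u_2 > ... > u_{q-1};
    returns k such that the alternative is assigned into group C_k (C_1 is best)."""
    for k, u_k in enumerate(thresholds, start=1):
        if global_utility >= u_k:
            return k
    return len(thresholds) + 1          # below every threshold -> least preferred group

thresholds = [0.70, 0.45]               # three groups: C1, C2, C3 (hypothetical values)
for u in (0.82, 0.55, 0.30):
    print(u, "-> C%d" % assign_group(u, thresholds))
```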
2.2
Model development process
2.2.1
General framework
The main structural parameters of the classification model developed through the UTADIS method include the criteria weights, the marginal utility functions and the utility thresholds. These parameters are specified through the regression-based philosophy of preference disaggregation analysis described in the previous chapter. A general outline of the model development procedure in the UTADIS method is presented in Figure 4.3. Initially, a reference set consisting of m alternatives described along n criteria is used as the training sample (henceforth the training sample will be referred to as the reference set in order to comply with the terminology used in MCDA). The alternatives of the reference set are classified a priori into q groups. The reference set should be constructed in such a way that it includes an adequate number of representative examples (alternatives) from each group. Henceforth, the number of alternatives of the reference set belonging into group C_k will be denoted by m_k. Given the classification C of the alternatives in the reference set, the objective of the UTADIS method is to develop a criteria aggregation model and a set of utility thresholds that minimize the classification error rate. The error rate refers to the differences between the estimated classification C' defined through the developed model and the pre-specified classification C for the alternatives of the reference set. Such differences can be represented by introducing a binary variable E representing the classification status of each alternative:

E_j = 0, if alternative a_j is correctly classified
E_j = 1, if alternative a_j is misclassified
On the basis of this binary variable, the classification error rate is defined as the ratio of the number of misclassified alternatives to the total number of alternatives in the reference set:

e = (1/m) Σ_{j=1}^{m} E_j     (4.3)
This classification error rate measure is adequate for cases where the number of alternatives of each group in the reference set is similar along all groups. In the case, however, where there are significant differences, the use of the classification error rate defined in (4.3) may lead to misleading results. For instance, consider a reference set consisting of 10 alternatives, 7 belonging into group C_1 and 3 belonging into group C_2. In this case a classification that assigns correctly all alternatives of group C_1 and incorrectly all alternatives of group C_2 has an error rate e = 3/10 = 0.3.
This is a misleading result. Actually, what should be the main point of interest is the expected classification error. This is expressed in relation to the a priori probabilities π_1 and π_2 that an alternative belongs into groups C_1 and C_2, respectively, as follows:

expected error = π_1 e_1 + π_2 e_2

where e_1 and e_2 denote the error rates within the two groups.
In the above example the error rates for the two groups (0% for C_1 and 100% for C_2) can be considered as estimates of the misclassification probabilities for the two groups. Assuming that the a priori probabilities for the two groups are equal (π_1 = π_2 = 0.5), the expected error of the classification is 0.5:

expected error = 0.5 × 0 + 0.5 × 1 = 0.5
Such a result indicates that the obtained classification corresponds to a random classification. In a random classification the probabilities are determined based on the proportion of each group to the total number of alternatives in the reference set. In this respect, in the above example a naïve approach would be to assign, at random, 7 out of the 10 alternatives into group C_1 (i.e., with probability 0.7) and 3 out of the 10 alternatives into group C_2 (i.e., with probability 0.3).
The expected error of such a naïve approach (random classification) is 0.5:

expected error = 0.5 × 0.3 + 0.5 × 0.7 = 0.5
To overcome this problem, a more appropriate measure of the expected classification error rate is expressed as follows:

ER = Σ_{k=1}^{q} π_k (1/m_k) Σ_{a_j∈C_k} E_j     (4.4)
Even though this measure takes into consideration the a priori probabilities of each group, it assumes that all classification errors are of equal cost to the decision maker. This is not always the case. For instance, the classification error regarding the assignment of a bankrupt firm to the group of healthy firms is much more costly than an error involving the assignment of a healthy firm to the bankrupt group. The former leads to capital cost (loss of the amount of credit granted to a firm), whereas the latter leads to opportunity cost (loss of the profit that would result from granting a credit to a healthy firm). Therefore, it would be appropriate to extend the expected classification error rate (4.4) so that the costs of each individual error are also considered. The resulting measure represents the expected misclassification cost (EMC), rather than the expected classification error rate:

EMC = Σ_{k=1}^{q} π_k (1/m_k) Σ_{a_j∈C_k} Σ_{l≠k} c_kl e_jl     (4.5)
where c_kl is the misclassification cost involving the classification of an alternative of group C_k into group C_l, and e_jl is a binary 0–1 variable defined such that e_jl = 1 if alternative a_j is classified into group C_l and e_jl = 0 if a_j is not classified into group C_l. Comparing expressions (4.4) and (4.5) it becomes apparent that the expected classification error rate in (4.4) is a special case of the expected misclassification cost, when all costs c_kl are considered equal for every k, l = 1, 2, ..., q. The main difficulty related to the use of the expected misclassification
cost as the appropriate measure of the quality of the obtained classification is that it is often quite difficult to have reliable estimates for the cost of each type of classification error. For this reason, all subsequent discussion in this book concentrates on the use of the expected classification error rate defined in (4.4). Furthermore, without loss of generality, it will be assumed that all a priori probabilities are equal to 1/q. If the expected classification error rate, regarding the classification of the alternatives that belong into the reference set, is considered satisfactory, then this constitutes an indication that the developed classification model might be useful in providing reliable recommendations for the classification of other alternatives. On the other hand, if the obtained expected classification error rate indicates that the classification of the alternatives in the reference set is close to a random classification, then the decision maker must check the reference set regarding its completeness and adequacy for providing representative information on the problem under consideration. Alternatively, it is also possible that the criteria aggregation model (additive utility function) is not able to provide an adequate representation of the decision maker’s preferential system. In such a case an alternative criteria aggregation model must be considered. However, it should be pointed out that a low expected classification error rate does not necessarily ensure the practical usefulness of the developed classification model; it simply provides an indication supporting the possible usefulness of the model. On the contrary, a high expected classification error rate leads with certainty to the conclusion that the developed classification model is inadequate.
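The following sketch computes the error measures discussed above for the 10-alternative example (seven alternatives of group C_1, three of group C_2, all assigned into C_1), assuming equal a priori probabilities.

```python
# Sketch: naive error rate vs. expected classification error rate with a priori probabilities.
import numpy as np

def error_rates(actual, predicted, n_groups, priors=None):
    actual, predicted = np.asarray(actual), np.asarray(predicted)
    if priors is None:
        priors = np.full(n_groups, 1.0 / n_groups)      # equal a priori probabilities
    overall = np.mean(actual != predicted)              # ratio of misclassified alternatives
    per_group = np.array([np.mean(predicted[actual == k] != k)
                          for k in range(1, n_groups + 1)])
    expected = float(np.dot(priors, per_group))         # expected classification error rate
    return overall, per_group, expected

actual    = [1] * 7 + [2] * 3
predicted = [1] * 10                                    # everything assigned into C1
overall, per_group, expected = error_rates(actual, predicted, n_groups=2)
print("overall:", overall, "per group:", per_group, "expected:", expected)
# -> overall: 0.3  per group: [0. 1.]  expected: 0.5
```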
2.2.2
Mathematical formulation
The objective of the model development process in the UTADIS method, i.e., the maximization of the consistency between the estimated classification C' and the predefined classification C, is pursued through mathematical programming techniques. In particular, the minimization of the expected classification error rate (4.4) requires the formulation and solution of a mixed-integer programming (MIP) problem. The solution, however, of MIP formulations is a computationally intensive procedure. Despite the significant research that has been made on the development of computationally efficient techniques for solving MIP problems within the context of classification model development (cf. sub-section 3.2 of Chapter 3), the computational effort still remains quite significant. This problem becomes more pronounced in cases where the reference set includes a large number of alternatives.
To overcome this problem, an approximation of the error rate (4.4) is used, denoted by (4.6): the binary error variable E_j is replaced by a positive real variable σ_j, defined such that σ_j > 0 if and only if alternative a_j is misclassified. Essentially, σ_j represents the magnitude of the classification error for alternative a_j. On the basis of the classification rule (4.2), the classification error for an alternative of group C_k involves the violation of the utility threshold u_k that defines the lower bound of group C_k. For the alternatives of the last (least preferred) group C_q the classification error involves the violation of the utility threshold u_{q-1} that defines the upper bound of group C_q. For any other intermediate group the classification error may involve either the violation of the upper bound u_{k-1} of the group or the violation of the lower bound u_k. Henceforth, the violation of the lower bound of a group will be denoted by σ+, whereas σ– will be used to denote the violation of the upper bound of a group. Figure 4.4 provides a graphical representation of these two errors in the simple two-group case. By definition it is not possible that the two errors occur simultaneously. Therefore, the total error for an alternative a_j is defined as σ_j = σ_j+ + σ_j–. At this point it should be emphasized that the error functions (4.4) and (4.6) are not fully equivalent. For instance, consider a reference set consisting of four alternatives classified into two groups. Assume that for this reference set an additive utility classification model (CM1) is developed that misclassifies two of the alternatives, with error magnitudes summing to 0.075. Then according to (4.6) the total classification error is 0.075, whereas (4.4) counts two misclassified alternatives. An alternative classification model (CM2) that classifies one of these two alternatives correctly, but retains the misclassification of the other with a larger error magnitude, has a total error according to (4.6) that exceeds 0.075, while it misclassifies only one alternative. Obviously the model CM1 outperforms CM2 when the definition (4.6) is considered, but according to the expected classification error rate (4.4) CM2 performs better.
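The short sketch below reproduces the logic of this example. Since the individual error magnitudes and the composition of the groups are not restated here, the values used are assumptions: the reference set is split equally between the two groups, CM1’s two violations are chosen so that they sum to 0.075, and CM2’s single violation is assumed to be slightly larger.

```python
# Sketch: criterion (4.6) (sum of violation magnitudes) vs. criterion (4.4) (error rate).
def total_sigma(errors):                 # criterion (4.6): sum of violation magnitudes
    return sum(errors.values())

def expected_error_rate(errors, groups): # criterion (4.4) with equal a priori probabilities
    misclassified = {a for a, e in errors.items() if e > 0}
    rates = [len(misclassified & set(members)) / len(members) for members in groups.values()]
    return sum(rates) / len(rates)

groups = {"C1": ["a1", "a2"], "C2": ["a3", "a4"]}               # assumed 2-2 split
cm1 = {"a1": 0.0, "a2": 0.05, "a3": 0.025, "a4": 0.0}           # two small violations (assumed)
cm2 = {"a1": 0.0, "a2": 0.0,  "a3": 0.08,  "a4": 0.0}           # one larger violation (assumed)
for name, errors in (("CM1", cm1), ("CM2", cm2)):
    print(name, "total sigma =", total_sigma(errors),
          "expected error rate =", expected_error_rate(errors, groups))
# CM1: total 0.075 but error rate 0.5;  CM2: total 0.08 but error rate 0.25
```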
Despite this limitation, the definition (4.6) provides a good approximation of the expected classification error rate (4.4), while reducing the computational effort required to obtain an optimal solution. The two forms of the classification errors can be formally expressed on the basis of the classification rule (4.2) as follows:

σ_j+ = max{0, u_k – U(g_j)}, for a_j ∈ C_k, k = 1, 2, ..., q–1
σ_j– = max{0, U(g_j) – u_{k-1}}, for a_j ∈ C_k, k = 2, 3, ..., q

These expressions illustrate better the notion of the two classification error forms. The error σ_j+ indicates that, to classify correctly a misclassified alternative that actually belongs into group C_k, its global utility should be increased by σ_j+. Similarly, the error σ_j– indicates that, to classify correctly a misclassified alternative that actually belongs into C_k, its global utility should be decreased by σ_j–. Introducing these error terms in the additive utility model, it is possible to rewrite the classification rule (4.2) in the form of the following constraints:

U(g_j) + σ_j+ ≥ u_k, for all a_j ∈ C_k, k = 1, 2, ..., q–1
U(g_j) – σ_j– < u_{k-1}, for all a_j ∈ C_k, k = 2, 3, ..., q
These constraints constitute the basis for the formulation of a mathematical programming problem used to estimate the parameters of the additive utility classification model (utility thresholds, marginal utilities, criteria weights). The general form of this mathematical programming model is the following (MP1):
subject to:
In constraints (4.11)–(4.12), δ is a positive constant used to avoid cases where U(g_j) = u_k when a_j ∈ C_k. Of course, u_k is considered as the lower bound of group C_k. In this regard, the case U(g_j) = u_k typically does not pose any problem during model development and implementation. However, assuming the simple two-group case, the specification δ = 0 may lead to the development of a classification model for which U(g_j) = u_1 for all a_j ∈ C_1 and U(g_j) < u_1 for all a_j ∈ C_2. Since the utility threshold u_1 is defined as the lower bound of group C_1, it is obvious that such a model performs an accurate classification of the alternatives. Practically,
however, since all alternatives of group C_1 are placed on the utility threshold, the generalizing ability of such a model is expected to be limited. Therefore, to avoid such situations a small positive (non-zero) value for the constant δ should be chosen. The constant δ' in (4.12)–(4.13) is used in a similar way. Constraints (4.14) and (4.15) are used to normalize the global utilities in the interval [0, 1]. In these constraints, g* and g_* denote the vectors consisting of the most and the least preferred values of the evaluation criteria. Finally, constraint (4.16) is used to ensure that the utility threshold discriminating groups C_k and C_{k+1} is higher than the utility threshold discriminating groups C_{k+1} and C_{k+2}. This specification ensures the ordering of the groups from the most preferred to the least preferred ones; in this ordering of the groups, higher utilities are assigned to the most preferred groups. In constraint (4.16), s is a small positive constant. Introducing the additive utility function (4.1) in MP1 leads to the formulation of a nonlinear programming problem. This is because the additive utility function (4.1) has two unknown parameters to be specified: (a) the criteria weights and (b) the marginal utility functions. Therefore, constraints (4.11)–(4.15) take a nonlinear form and the solution of the resulting nonlinear programming problem can be cumbersome. To overcome this problem, the additive utility function (4.1) is rewritten in a simplified form as follows:

U(g) = Σ_{i=1}^{n} u_i'(g_i)     (4.18)

where:

u_i'(g_i) = w_i u_i(g_i)

Both (4.1) and (4.18) are equivalent expressions for the additive utility function. Nevertheless, the latter requires only the specification of the marginal utility functions u_i'. As illustrated in Figure 4.1, these functions can be of any form. The UTADIS method does not pre-specify a functional form for these functions. Therefore, it is necessary to express the marginal utility functions in terms of specific decision variables to be estimated through the solution of MP1. This is achieved through the modeling of the marginal utilities as piece-wise linear functions, through a process that is graphically illustrated in Figure 4.5.
The range of each criterion is divided into subintervals. A commonly used approach to define these subintervals is based on the following simple heuristic: define equal subintervals such that there is at least one alternative belonging in each subinterval.
Henceforth this heuristic will be referred to as HEUR1. Following this piece–wise linear modeling approach, the estimation of the unknown marginal utility functions is performed by estimating the marginal utilities at the break–points. As illustrated in Figure 4.5, this estimation provides an approximation of the true marginal utility functions. On this basis, it would be reasonable to assume that the larger the number of subintervals specified, the better the approximation of the marginal utility functions. The definition of a large number of subintervals, however, provides increased degrees of freedom to the additive utility model. This increases the fitting ability of the developed model on the data of the reference set, but the instability of the model is also increased (the model becomes sample–based).
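To make the idea behind HEUR1 concrete, the following minimal Python sketch divides one criterion's range into equal-width subintervals and reduces their number until every subinterval contains at least one alternative of the reference set. The function and parameter names (values, max_sub) are illustrative assumptions, not the book's notation.

```python
import numpy as np

def heur1_breakpoints(values, max_sub=10):
    """Equal-width subintervals of one criterion's range, keeping the largest number
    of subintervals (up to max_sub) for which every subinterval contains at least
    one alternative of the reference set (the HEUR1 idea)."""
    values = np.asarray(values, dtype=float)
    for k in range(max_sub, 0, -1):
        edges = np.linspace(values.min(), values.max(), k + 1)
        # assign every alternative to one of the k subintervals
        bins = np.minimum(np.digitize(values, edges[1:-1]), k - 1)
        if len(np.unique(bins)) == k:   # all subintervals are populated
            return edges
    return np.array([values.min(), values.max()])
```

If a criterion has many repeated values, the requested number of populated subintervals may not exist, in which case the loop simply falls back to fewer subintervals.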
Once the marginal utilities for every break–point are estimated, the marginal utility of any criterion value can be found using a simple linear interpolation between the marginal utilities of the two adjacent break–points.
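As a small sketch of this interpolation step (Python; names are illustrative — edges holds the break-points of one criterion and u_at_breakpoints the corresponding estimated marginal utilities):

```python
import numpy as np

def marginal_utility(x, edges, u_at_breakpoints):
    """Piece-wise linear marginal utility of a criterion value x, obtained by linear
    interpolation between the utilities estimated at the two adjacent break-points."""
    return float(np.interp(x, edges, u_at_breakpoints))
```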
On the basis of this piece–wise linear modeling of the marginal utility functions, MP1 is re–written in a linear form as follows (LP1):
subject to:
Constraints (4.21)–(4.27) of LP1 correspond to constraints (4.11)–(4.17) of MP1; therefore, the two problems are equivalent. Table 4.1 presents the dimensions of LP1. According to this table, the number of constraints in LP1 is defined by the number of alternatives in the reference set and the number of evaluation criteria. The latter defines the number of monotonicity constraints (4.27), which can be quite significant when there is a large number of criteria subintervals. For instance, consider a two–group classification problem with a reference set of 50 alternatives evaluated along five criteria. Assuming that each criterion's values are divided into 10 subintervals, LP1 has the following constraints:
1. 50 classification constraints [constraints (4.21)–(4.23)],
2. 2 normalization constraints [constraints (4.24)–(4.25)], and
3. (5 criteria)×(10 subintervals)=50 monotonicity constraints [constraint (4.27)].
Almost half of the constraints in this simple case are monotonicity constraints determined by the number of criteria and the definition of the subintervals. The increased number of these constraints increases the computational effort required to solve LP1. This problem can easily be addressed if the monotonicity constraints are transformed into non–negativity constraints (non–negativity constraints do not increase the computational effort in linear programming). This transformation is performed using the approach proposed by Siskos and Yannacopoulos (1985). In particular, new variables w are introduced, representing the differences between the marginal utilities of two consecutive break–points.
On the basis of these new incremental variables, constraints (4.27) are transformed into non–negativity constraints. The marginal utilities and the global utilities can now be expressed in terms of the incremental variables as follows. Marginal utilities:
Global utilities:
where the relevant subinterval is the one into which the performance of the alternative on the corresponding criterion falls. Other changes made in LP1 through the introduction of the incremental variables w include the elimination of constraint (4.25) and the transformation of constraint (4.24) as follows:
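A minimal sketch of this change of variables (Python; names are illustrative, not the book's notation): the marginal utility at each break-point is the cumulative sum of the non-negative increments, so monotonicity follows from the sign restriction alone.

```python
import numpy as np

def marginals_from_increments(w_i):
    """Marginal utilities at the break-points of one criterion, recovered from the
    non-negative increments between consecutive break-points: 0, w1, w1+w2, ..."""
    w_i = np.asarray(w_i, dtype=float)
    assert np.all(w_i >= 0), "non-negativity of the increments replaces monotonicity"
    return np.concatenate(([0.0], np.cumsum(w_i)))
```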
According to all the above changes LP1 is now rewritten in a new form (LP2) presented below. Table 4.2 illustrates the dimensions of the new problem.
subject to:
Comparing Tables 4.1 and 4.2, it is clear that LP2 has fewer constraints and n fewer variables than LP1. Thus, the computational effort required to solve LP2 is significantly reduced. LP2 is the formulation used to develop the additive utility classification model within the context of the UTADIS method.
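To illustrate how an LP2-type formulation can be set up in practice, the following sketch builds and solves a simplified two-group version with the open-source PuLP modeller (an external library, not used in the book). Everything here is an assumption made for illustration — the function and variable names, the equal-width break-points, the value of the separation constant — and the sketch omits the multi-group thresholds and the post-optimality stage.

```python
import numpy as np
import pulp

def fit_utadis_two_groups(X, y, n_sub=3, delta=0.01):
    """X: (m, n) criterion values (larger = better); y[j] = 1 for the preferred group, 2 otherwise.
    Returns the fitted increments w, the utility threshold and the break-points."""
    m, n = X.shape
    edges = [np.linspace(X[:, i].min(), X[:, i].max(), n_sub + 1) for i in range(n)]

    prob = pulp.LpProblem("UTADIS_LP2_sketch", pulp.LpMinimize)
    w = [[pulp.LpVariable(f"w_{i}_{t}", lowBound=0) for t in range(n_sub)] for i in range(n)]
    u = pulp.LpVariable("u_threshold", lowBound=0, upBound=1)
    sp = [pulp.LpVariable(f"sigma_plus_{j}", lowBound=0) for j in range(m)]   # errors of group-1 alternatives
    sm = [pulp.LpVariable(f"sigma_minus_{j}", lowBound=0) for j in range(m)]  # errors of group-2 alternatives

    def global_utility(x):
        """Piece-wise linear global utility expressed directly through the increments w."""
        terms = []
        for i in range(n):
            for t in range(n_sub):
                lo, hi = edges[i][t], edges[i][t + 1]
                frac = float(np.clip((x[i] - lo) / (hi - lo), 0.0, 1.0)) if hi > lo else 1.0
                if frac > 0:
                    terms.append(frac * w[i][t])   # share of subinterval t covered by x[i]
        return pulp.lpSum(terms)

    for j in range(m):                             # classification constraints with error variables
        U = global_utility(X[j])
        if y[j] == 1:
            prob += U + sp[j] >= u + delta         # preferred group: utility above the threshold
        else:
            prob += U - sm[j] <= u - delta         # other group: utility below the threshold
    prob += pulp.lpSum(w[i][t] for i in range(n) for t in range(n_sub)) == 1   # normalization

    n1, n2 = max(1, int(np.sum(y == 1))), max(1, int(np.sum(y == 2)))
    prob += (1.0 / n1) * pulp.lpSum(sp) + (1.0 / n2) * pulp.lpSum(sm)          # weighted error sum
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return w, u, edges
```

A model fitted this way is applied by computing the global utility of a new alternative with the same piece-wise linear scheme and comparing it with the estimated threshold.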
2.3
Model development issues
2.3.1
The piece–wise linear modeling of marginal utilities
The way that the piece–wise linear modeling of the marginal utility functions is performed is quite significant for the stability and the performance of the additive utility classification models developed through the UTADIS method. This issue is related to the subintervals defined for each criterion's range and, consequently, to the number of incremental variables w of LP2. In traditional statistical regression it is known that, to obtain statistically meaningful estimates for a regression model consisting of n independent variables, the model development sample should have at least n+1 observations. Horsky and Rao (1984) emphasize that this observation also holds for mathematical programming approaches. In the case of the UTADIS method, every basic solution of LP2 includes as many variables as the number of constraints, and the optimal basic solution also includes the utility thresholds (q–1 variables); therefore, the optimal solution includes only a limited number of the incremental variables w. It is obvious that if a large number of subintervals is determined, such that the number of incremental variables w exceeds this limit, then some incremental variables w will not be included in any basic solution of LP2 (they will be redundant). Such a case affects the developed model negatively, increasing the instability of the estimates of the true significance of the criteria. One way to address this issue is to increase the number of constraints of LP2. Such an approach has been used by Oral and Kettani (1989), and the appendix of this chapter also presents a way in which it can be implemented. Increasing the number of constraints, however, results in increased computational effort to obtain an optimal solution. An alternative approach that is not subject to this limitation is to consider appropriate techniques for determining how the criteria scale is divided into subintervals. The heuristic HEUR1 presented earlier in this chapter is a simple technique that implements this approach. However, this heuristic does not consider how alternatives of different groups are distributed along each criterion's scale. To accommodate this valuable information, a new simple heuristic can be proposed, which will be referred to as HEUR2. This heuristic is performed for all quantitative criteria in five steps, as follows (a rough code sketch is given after the steps):
Step 1:
Rank–order all alternatives of the reference set according to their performances on each quantitative criterion, from the least to the most preferred ones. Set the minimum acceptable number of alternatives belonging in a subinterval equal to zero.
Step 2:
Form all non–overlapping subintervals such that consecutive subinterval endpoints correspond to alternatives belonging to different groups.
Step 3:
Check the number of alternatives that lie in each subinterval formed in step 2. If the number of alternatives in a subinterval is less than the minimum acceptable number, merge this subinterval with the preceding one (this check is skipped when the minimum is set to zero).
Step 4:
Check the total number of subintervals formed after step 3 for all criteria against the size of the linear program LP2 (i.e., the number of constraints). If the number of subintervals leads to the specification of more incremental variables w than can enter a basic solution, increase the minimum acceptable number of alternatives per subinterval and repeat the process from step 3.
Step 5:
Otherwise, the procedure ends.
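A rough Python sketch of HEUR2 for a single quantitative criterion is given below. The names, the use of mid-points as cut positions and the exact stopping rule are illustrative assumptions rather than the book's specification.

```python
import numpy as np

def heur2_breakpoints(values, groups, max_increments):
    """Cut the criterion scale only where consecutive (rank-ordered) alternatives change
    group; if the resulting number of subintervals is too large for LP2, require at least
    h alternatives per subinterval and increase h until the size constraint is met."""
    order = np.argsort(values)
    v = np.asarray(values, dtype=float)[order]
    g = np.asarray(groups)[order]
    h = 0
    while True:
        cuts, count = [v[0]], 0
        for j in range(1, len(v)):
            count += 1
            if g[j] != g[j - 1] and count >= max(h, 1):
                cuts.append(0.5 * (v[j - 1] + v[j]))   # cut between the two alternatives
                count = 0
        cuts.append(v[-1])
        if len(np.unique(cuts)) - 1 <= max_increments or h > len(v):
            return np.unique(cuts)
        h += 1                                          # too many subintervals: demand larger ones
```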
The recent study of Doumpos and Zopounidis (2001) showed that under several data conditions HEUR2 increases the stability of the developed additive utility classification models and contributes positively to the improvement of their classification performance.
2.3.2
Uniqueness of solutions
The simple linear form of LP2 ensures the existence of a global optimum. Often, however, there are multiple optimal solutions; in linear programming theory this phenomenon is known as degeneracy. The existence of multiple optimal solutions is most common when the groups are perfectly separable, i.e., when there is no group overlap; in such cases all error variables are zero. The determination of a large number of criteria subintervals is positively related to the existence of multiple optimal solutions (as already mentioned, as the number of subintervals increases, the degrees of freedom of the developed additive utility model also increase, and so does the fitting ability of the model). Even if the subintervals are defined in an appropriate way, on the basis of the remarks pointed out in the previous sub–section, this does not necessarily eliminate the degeneracy phenomenon for LP2 and the existence of multiple optimal solutions. In addition to the degeneracy phenomenon, it is also important to emphasize that even if a unique optimal solution does exist for LP2, its stability needs to be carefully considered. A solution is considered to be stable if it is not significantly affected by small trade–offs on the objective function (i.e., if near–optimal solutions are quite similar to the optimal one). The instability of the optimal solution is actually the result of overfitting the developed additive utility model to the alternatives of the reference set. This may negatively affect the generalizing classification performance of the developed
classification model. In addition to the classification performance issue, the instability of the additive utility model also raises interpretation problems. If the developed model is unstable, then it is clearly very difficult to derive secure conclusions on the contribution of the criteria to the classification of the alternatives (the criteria weights are unstable and, therefore, difficult to interpret). These issues are considered in the UTADIS method through a post–optimality analysis that follows the solution of LP2. The objective of the post–optimality analysis is to explore the existence of alternate optimal and near optimal solutions. There are many different ways to perform the post–optimality stage, considering the parameters that are involved in the model development process; these parameters include the small positive constants of the classification constraints as well as the number of criteria subintervals. The use of mathematical programming techniques provides increased flexibility in considering a variety of different forms for the post–optimality analysis. Some issues that are worth considering in the post–optimality stage include:
1. The maximization of the small positive constants of the classification constraints. This implies a maximization of the minimum distance between the correctly classified alternatives and the utility thresholds, thus resulting in a clearer separation of the groups.
2. Maximization of the sum of the differences between the global utilities of the correctly classified alternatives and the utility thresholds. This approach extends the previous point by considering all differences instead of the minimum ones.
3. Minimization of the total number of misclassified alternatives using the error function (4.4).
4. Determination of the minimum number of criteria subintervals.
The appendix of this chapter describes the mathematical programming formulations that can be used to address these issues during the post– optimality stage. The formulations presented in the appendix can also be used instead of LP2 to develop the optimal additive utility classification model. These post–optimality approaches consider either the technical parameters of the model development process (cases 1 and 4) or alternative ways to measure the quality of the developed classification model (cases 2 and 3). Considering, however, the aforementioned issues regarding the stability of the developed model and its interpretation, none of these approaches ensures the existence of a unique and stable solution. Consequently, the uncertainty on the interpretation of the model is still an issue to be considered.
To overcome this problem, the post–optimality stage performed in the UTADIS method focuses on the investigation of the stability of the criteria weights, rather than on the consideration of the technical parameters of the model development process. In particular, during the post–optimality stage n+q–1 new linear programs are solved, each having the same form as LP2. The solution of LP2 is used as input to each of these new linear programs to explore the existence of other optimal or near optimal solutions. The objective function of each problem t involves the maximization of each criterion weight (for t=1, 2, …, n) and of the value of the utility thresholds (for t > n). All new solutions found during the post–optimality stage are optimal or near optimal for LP2. This is ensured by imposing the following constraint:
where: the first quantity is the optimal value of the objective function of LP2; the second is the value of the objective function of LP2 evaluated for any new solution obtained during the post–optimality stage; and the third is a small portion of the optimal value (a trade–off made on the optimal value of the objective function in order to investigate the existence of near optimal solutions). This constraint is added to the formulation of LP2, and the new linear program that is formed is solved to maximize either the criteria weights or the utility thresholds, as noted above. Finally, the additive utility model used to perform the classification of the alternatives is formed from the average of all solutions obtained during the post–optimality stage. Overall, despite the problems raised by the existence of multiple optimal solutions, it should be noted that LP2 provides consistent estimates for the parameters of the additive utility classification model. The consistency property for mathematical programming formulations used to estimate the parameters of a decision making model was first introduced by Charnes et al. (1955). The authors consider a mathematical programming formulation to satisfy the consistency property if it provides estimates of the model's parameters that approximate (asymptotically) the true values of the parameters as the number of observations (alternatives) used for model development increases. According to the authors, this is the most significant property that a mathematical programming formulation used for model development should have, since it ensures that the formulation is able to identify the true values of the parameters under consideration, given that enough information is available.
LP2 has the consistency property. Indeed, as new alternatives are added in an existing reference set and given that these alternatives add new information (i.e., they are not dominated by alternatives already belonging in the reference set), then the new alternatives will add new non–redundant constraints in LP2. These constraints reduce the size of the feasible set. Asymptotically, for large reference sets, this will lead to the identification of a unique optimal solution that represents the decision maker’s judgment policy and preferential system.
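Returning to the post–optimality stage described above, the following conceptual Python sketch shows how it could be organised with PuLP. It assumes a hypothetical helper build_lp2() returning the problem, its increment variables grouped by criterion, its utility thresholds and its objective expression, together with the optimal value F_star of LP2 and a small tolerance eps; none of these names come from the book.

```python
import pulp

def post_optimality(build_lp2, F_star, eps):
    """Solve one LP per criterion weight and per utility threshold, each maximized subject
    to near-optimality of the original LP2 objective, and average the resulting solutions."""
    prob, w_by_criterion, thresholds, lp2_objective = build_lp2()
    prob += lp2_objective <= F_star + eps * F_star, "near_optimality"   # stay (near) optimal
    # the weight of a criterion equals the sum of its marginal utility increments
    targets = [pulp.lpSum(w_i) for w_i in w_by_criterion] + list(thresholds)
    solutions = []
    for target in targets:
        prob.setObjective(-target)          # PuLP minimizes, so minimize the negative
        prob.solve(pulp.PULP_CBC_CMD(msg=False))
        solutions.append({v.name: v.varValue for v in prob.variables()})
    names = solutions[0].keys()
    # the final additive model averages the parameter values over all n+q-1 solutions
    return {name: sum(s[name] for s in solutions) / len(solutions) for name in names}
```

Averaging over these solutions makes the final model less sensitive to the particular optimal solution returned by the solver for LP2.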
3.
THE MULTI–GROUP HIERARCHICAL DISCRIMINATION METHOD (MHDIS)
3.1
Outline and main characteristics
People often employ, sometimes intuitively, a sequential/hierarchical process to classify alternatives into groups using available information and holistic judgments: for example, examine whether an alternative can be assigned to the best group; if not, try the second best group, and so on. This is the logic of the MHDIS method (Zopounidis and Doumpos, 2000c) and its main distinctive feature compared to the UTADIS method. A second major difference between the two methods involves the mathematical programming framework used to develop the classification models. Model development in UTADIS is based on a linear programming formulation followed by a post–optimality stage. In MHDIS the model development process is performed using two linear programs and a mixed integer one that gradually calibrate the developed model so that it accommodates two objectives: (1) the minimization of the total number of misclassifications, and (2) the maximization of the clarity of the classification1. These two objectives are pursued through a lexicographic approach, i.e., initially the minimization of the total number of misclassifications is sought and then the maximization of the clarity of the classification is performed. The common feature shared by MHDIS and UTADIS involves the form of the criteria aggregation model that is used to model the decision maker's preferences in classification problems, i.e., both methods employ a utility–based framework.
1
This objective corresponds to the maximization of the variance among groups in traditional discriminant analysis; cf. chapter 2.
3.2
The hierarchical discrimination process
The MHDIS method proceeds progressively in the classification of the alternatives into the predefined groups. The hierarchical discrimination process used in MHDIS consists of q–1 stages (Figure 4.7). Each stage k is considered as a two–group classification problem, where the objective is to discriminate the alternatives of one group from the alternatives of the remaining groups. Since the groups are defined in an ordinal way, this is translated into the discrimination of the group considered at stage k from the set of all less preferred groups. Therefore, at each stage of the hierarchical discrimination process two choices are available for the classification of an alternative:
1. To decide that the alternative belongs into the group considered at this stage, or
2. To decide that the alternative belongs at most in the next, less preferred, group (i.e., it belongs into one of the less preferred groups).
Within this framework, the procedure starts from the most preferred group. The alternatives found to belong into this group (correctly or incorrectly) are excluded from further consideration. In a second stage the objective is to identify the alternatives belonging into the second most preferred group. Once again, all the alternatives found to belong into this group (correctly or incorrectly) are excluded from further consideration, and the same procedure continues until all alternatives are classified into the predefined groups. The criteria aggregation model used to decide upon the classification of the alternatives at each stage k of the hierarchical discrimination process has the form of an additive utility function, similar to the one used in UTADIS.
The utility function in (4.35) denotes the utility of classifying any alternative into the group considered at stage k, on the basis of the alternative's performance on the set of criteria g, while the corresponding marginal utility functions refer to the classification of an alternative into this group according to each specific criterion. Conceptually, this utility function provides a measure of the similarity of an alternative to the characteristics of the group. Nevertheless, as noted above, at each stage k of the hierarchical discrimination process there are two choices available for the classification of an alternative: the classification into the group considered at this stage and the classification at most into the next, less preferred, group. The first utility function measures the utility (value) of the first choice. To make a classification decision, the utility of the second choice (i.e., classification at most into the next group) also needs to be considered. This is measured by a second utility function that has the same form (4.35). Based on these two utility functions, the classification of an alternative is performed using the following rules:
where the relevant set consists of the alternatives belonging into the lower (less preferred) groups. During model development, the case where the two global utilities are equal is considered to be a misclassification. When the developed additive utility functions are used for extrapolation purposes, such a
2
As noted for the UTADIS method, in this expression of the additive utility function the marginal utilities of each criterion range in an interval whose upper bound is the weight of that criterion. The criteria weights sum up to 100%.
case indicates that the classification of the alternatives is not clear and additional analysis is required. This analysis can be based on the examination of the marginal utilities of the two functions, to determine how the performance of the alternatives on each evaluation criterion affects their classification. In both utility functions the corresponding marginal utilities are monotone functions on the criteria scale: the marginal utility functions of the first utility function are increasing, whereas those of the second are decreasing. This specification is based on the ordinal definition of the groups. In particular, since the alternatives of the group considered at stage k are preferred to the alternatives of all lower groups, it is expected that the higher the performance of an alternative on a criterion, the more similar the alternative is to the characteristics of this group (increasing form of the first marginal utility function) and the less similar it is to the characteristics of the lower groups (decreasing form of the second marginal utility function). The marginal utility functions are modeled in a piece–wise linear form, similarly to the case of the UTADIS method. The piece–wise linear modeling of the marginal utility functions in the MHDIS method is illustrated in Figure 4.8. In contrast to the UTADIS method, the criteria scale is not divided into subintervals. Instead, the alternatives of the reference set are rank–ordered according to their performance on each criterion, and the performance of each alternative is considered as a distinct criterion level. For instance, if the reference set includes m alternatives, each having a different performance on a criterion, then m distinct criterion levels are considered, ordered from the least preferred one to the most preferred one. Denoting two consecutive levels of a criterion, the monotonicity of the marginal utilities is imposed through the following constraints (t is a small positive constant used to define the smallest difference between the marginal utilities of two consecutive levels):
where,
Thus, it is possible to express the global utility of an alternative in terms of the incremental variables w, on the basis of the position of the alternative's performance within the rank ordering of the criterion levels from the least preferred one to the most preferred one:
While both UTADIS and MHDIS employ a utility–based modeling framework, it should be emphasized that the marginal utility functions in MHDIS do not indicate the performance of an alternative with regard to an evaluation criterion; they rather serve as a measure of the conditional similarity of an alternative to the characteristics of a group (on the basis of a specific criterion) when the choice between this group and all the lower (worse) groups is considered. In this regard, a high marginal utility indicates that, considering the performance of the alternative on this criterion, the most appropriate decision would be to assign the alternative into the group rather than into the set of lower groups (the overall classification decision depends upon the examination of all criteria).
This simple example indicates that the utilities used in MHDIS do not correspond to the alternatives themselves, but rather to the appropriateness of the choices (classification decisions) available to the decision maker, measured on the basis of the alternatives' performances on the evaluation criteria.
3.3
Estimation of utility functions
According to the hierarchical discrimination procedure described above, the classification of the alternatives in q classes requires the development of 2(q–1) utility functions. The estimation of these utility functions in MHDIS is accomplished through mathematical programming techniques. In particular, at each stage of the hierarchical discrimination procedure, two linear programs and a mixed–integer one are solved to estimate "optimally" both utility functions3. The term "optimally" refers to the classification of the alternatives of the reference set, such that: (1) the total number of misclassifications is minimized and (2) the clarity of the classification is maximal. These two objectives are addressed lexicographically through the sequential solution of two linear programming problems (LP1 and LP2) and a mixed–integer programming problem (MIP). Essentially, the rationale behind the sequential solution of these mathematical programming problems is the following. As noted in the discussion of the UTADIS method, the direct minimization of the total classification error (cf. equations (4.4) or (4.5)) is quite a complex and hard problem to face from a computational point of view. To cope with this problem, an approximation was introduced in UTADIS (cf. equation (4.6)) considering the magnitude of the violations of the classification rules, rather than the number of violations, which defines the classification error rate. As noted, this approximation overcomes the computational intensity of optimizing the classification error rate; nevertheless, the results obtained from this new error function are not necessarily optimal when the classification error rate is considered. To address these issues, MHDIS combines the approximation error function (4.6) with the actual classification error rate. In particular, initially an error function of the form of (4.6) is employed to identify the alternatives of the reference set that are hard to classify correctly (i.e., they are misclassified). This is performed through a linear programming formulation (LP1). Generally, the number of these alternatives is expected to be a small portion of the number of alternatives in the reference set. Then, a more direct error
3
Henceforth, the discussion focuses on the development of a pair of utility functions at stage k of the hierarchical discrimination process. The first utility function characterizes the alternatives of the group considered at this stage, whereas the second utility function characterizes the alternatives belonging in the set of lower (less preferred) groups. The same process applies to all stages k=1, 2, ..., q–1 of the hierarchical discrimination process.
minimization approach is used, considering only this reduced set of misclassified alternatives. This approach considers the actual classification error (4.4). The fact that the analysis at this stage focuses only on a reduced part of the reference set (i.e., the misclassified alternatives) significantly reduces the computational effort required to minimize the actual classification error function (4.4). The minimization of this error function is performed through a MIP formulation. Finally, given the optimal classification model obtained through the solution of MIP, a linear programming formulation (LP2) is employed to maximize the clarity of the obtained classification without changing the groups into which the alternatives are assigned. The details of this three–step process are described below, along with the mathematical programming formulations used at each step.
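At a high level, one stage of this lexicographic scheme can be sketched as follows (Python; solve_lp1, solve_mip and solve_lp2 are hypothetical callables standing for the three formulations described below, passed in as parameters — they are not functions defined in the book).

```python
def fit_mhdis_stage(reference_set, stage_k, solve_lp1, solve_mip, solve_lp2):
    """One stage of MHDIS model development: LP1, then MIP, then LP2 (lexicographically)."""
    # Step 1: LP1 -- minimize the *magnitude* of the classification errors (real-valued, cheap).
    model, misclassified = solve_lp1(reference_set, stage_k)
    if misclassified:
        # Step 2: MIP -- minimize the *number* of misclassifications, but only re-arrange the
        # errors of the alternatives misclassified by LP1, keeping every correct classification.
        model, misclassified = solve_mip(reference_set, stage_k,
                                         keep_correct=True, candidates=misclassified)
    # Step 3: LP2 -- maximize the minimum distance between the two utilities of the correctly
    # classified alternatives, without changing any group assignment.
    model = solve_lp2(reference_set, stage_k, start_from=model)
    return model
```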
LP1: Minimizing the overall classification error
The initial step in the model development process is based on a linear programming formulation. In this formulation the classification errors are considered as real–valued variables, defined similarly to the error variables used in the UTADIS method. In the case of the MHDIS method these error variables are defined through the classification rule (4.36):
Essentially, the first error indicates the misclassification of an alternative towards a lower (worse) group than the one to which it actually belongs, whereas the second error indicates a misclassification towards a higher (better) group. Both errors refer to a specific stage k of the hierarchical model development process. On the basis of the above considerations, the initial linear program (LP1) to be solved is the following:
Subject to:
s, t small positive constants
Constraints (4.39) and (4.40) define the classification error variables. These constraints are formulated on the basis of the classification rule (4.36) and the global utility functions (4.37). In the right–hand side of these constraints a small positive constant s is used to impose the inequalities of the classification rule (4.36); this constant is similar to the constants used in the linear programming formulation of the UTADIS method. The set of constraints defined in (4.41) is used to ensure the monotonicity of the marginal utility functions, whereas the set of constraints in (4.42) normalizes the global utilities to range between 0 and 1.
MIP: Minimizing the number of misclassifications
The solution of LP1 leads to the development of an initial pair of utility functions that discriminate the group considered at stage k from the lower groups. These utility functions define a classification of the alternatives in the reference set that is optimal with respect to the classification error measured in terms of the real–valued error variables. When the classification error rate is considered, however, these utility functions may lead to sub–optimal results. Nevertheless, this initial pair of utility functions enables the identification of the alternatives that can easily be classified correctly and of the "hard" alternatives, i.e., the ones misclassified by the pair of utility functions developed through the solution of LP1. Henceforth, the set of alternatives classified correctly by LP1 will be denoted by COR, whereas the set of misclassified alternatives will be denoted by MIS. Given that the set MIS includes at least two alternatives, it is possible to achieve a "re–arrangement" of the magnitude of the classification errors of the misclassified alternatives (alternatives of MIS) that will lead
to the reduction of the number of misclassifications. The example discussed earlier in sub–section 3.3 of this chapter is indicative of this possibility. However, as already noted, to consider the number of misclassifications, binary 0–1 error variables need to be introduced in a MIP context. To avoid the increased computational effort required to solve MIP problems, the MIP formulation used in MHDIS considers only the misclassifications that occur through the solution of LP1, while retaining all the correct classifications. Thus, it becomes apparent that LP1 is actually an exploratory problem whose output is used as input information to MIP. This significantly reduces the number of binary 0–1 variables, which are associated with the misclassified alternatives, thus alleviating the computational effort required to obtain a solution. While this sequential consideration of LP1 and MIP considerably reduces the computational effort required to minimize the classification error rate, it should be emphasized that the obtained classification model may be near optimal instead of globally optimal. This is due to the fact that MIP inherits the solution of LP1. Therefore, the number of misclassifications attained after solving MIP depends on the optimal solution identified by LP1 (i.e., different optimal solutions of LP1 may lead to a different number of misclassifications by MIP). Nevertheless, using LP1 as a pre–processing stage that provides an input to MIP is an efficient mechanism (in terms of computational effort) to obtain an approximation of the globally minimum number of misclassifications. Formally, MIP is expressed as follows:
Subject to:
s, t small positive constants
The first set of constraints (4.45) is used to ensure that all correct classifications achieved by solving LP1 are retained. The second set of constraints (4.46) is used only for the alternatives that were misclassified by LP1 (set MIS). Their interpretation is similar to that of constraints (4.39) and (4.40) in LP1; their only difference is the transformation of the real–valued error variables of LP1 into binary 0–1 variables that indicate the classification status of an alternative. Constraints (4.46) define these binary variables as follows: the first binary variable takes the value one when an alternative of the group considered at stage k is classified by the developed model into the set of lower groups, whereas the second takes the value one when an alternative belonging into one of the lower groups is classified by the developed model into the group considered at stage k. Both cases are misclassifications; when both variables are zero, the alternative is classified correctly. The interpretation of constraints (4.47) and (4.48) has already been discussed for the LP1 formulation. The objective of MIP involves the minimization of a weighted sum of the binary error variables, where the weighting is performed considering the number of alternatives of the set MIS that originate from each group.
LP2: Maximizing the minimum distance
Solving LP1 and then MIP leads to the "optimal" classification of the alternatives, where the term "optimal" refers to the minimization of the number of misclassified alternatives. However, it is possible that the correct classification of some alternatives is "marginal". This situation appears when the classification rules (4.36) are marginally satisfied, i.e., when there is only a slight difference between the two global utilities. For instance, assume a pair of utility functions developed such that, for an alternative of the group considered at stage k, the two
global utilities are almost equal. Given these utilities and considering the classification rules (4.36), the alternative is classified in the correct group; this is, however, a marginal result. Another pair of utility functions yielding a clearly larger difference between the two global utilities would be preferred, since it provides a more specific conclusion. This issue is addressed in MHDIS through a third mathematical programming formulation used on the basis of the optimal solution of MIP. At this stage, the minimum difference d between the global utilities of the correctly classified alternatives identified after solving MIP is introduced.
where COR' denotes the set of alternatives classified correctly by the pair of utility functions developed through the solution of MIP. The objective of this third phase of the model development procedure is to maximize d. This is performed through the following linear programming formulation (LP2).
Subject to:
s, t small positive constants
The first set of constraints (4.51) involves only the correctly classified alternatives. In these constraints, d represents the minimum absolute difference between the global utilities of each alternative according to the two utility functions. The second set of constraints (4.52) involves the alternatives misclassified after the solution of MIP (set MIS') and is used to ensure that they will be retained as misclassified. After the solution of LP1, MIP and LP2 at stage k of the hierarchical discrimination process, the "optimal" discrimination is achieved between the alternatives of the group considered at this stage and the alternatives of the lower groups; the term "optimal" refers to the number of misclassifications and to the clarity of the obtained discrimination. If the current stage k is the last stage of the hierarchical discrimination process (i.e., k=q–1), then the model development procedure stops, since all utility functions required to classify the alternatives have been estimated. Otherwise, the procedure proceeds to stage k+1, in order to discriminate between the alternatives of the next group and the alternatives of the remaining lower groups. In stage k+1, all alternatives classified by the pair of utility functions developed at stage k into the group of that stage are no longer considered. Consequently, a new reference set A' is formed, including all alternatives that remain unclassified in a specific group (i.e., the alternatives classified at stage k into the set of lower groups). According to the set A', the corresponding quantities are updated, and the procedure proceeds with solving LP1, MIP and LP2 once again.
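As a small illustration of the quantity d maximized by LP2 at each stage, the following Python sketch computes it from two arrays of global utilities (array names are illustrative).

```python
import numpy as np

def classification_clarity(U_group, U_rest, correctly_classified):
    """Minimum absolute difference d between the two global utilities over the correctly
    classified alternatives of the current stage; LP2 pushes this value as high as possible."""
    diffs = np.abs(np.asarray(U_group, float) - np.asarray(U_rest, float))
    return float(diffs[np.asarray(correctly_classified, bool)].min())
```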
3.4
Model extrapolation
The classification of a new alternative is performed by descending the hierarchy of Figure 4.7. Initially, the first pair of additive utility functions is used to determine whether the new alternative belongs into the most preferred group or not. If the first utility exceeds the second, the alternative is assigned to the most preferred group and the procedure stops; otherwise the procedure proceeds with the consideration of the next pair of utility functions, and it continues in the same way until the classification of the new alternative is achieved. To estimate the global utility of a new alternative, the partial value (marginal utility) of this alternative on each one of the evaluation criteria needs to be determined. Assuming that the performance of the alternative on a criterion lies between the performances of two alternatives of the reference set (i.e., between two consecutive criterion levels defined at stage k of the hierarchical discrimination process), the corresponding marginal utilities need to be estimated through linear interpolation (cf. Figure 4.8).
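A minimal Python sketch of this descent is given below; utility_pairs[k] is assumed to hold two callables returning the global utilities of the two choices at stage k (illustrative names only, not the book's notation).

```python
def mhdis_classify(x, utility_pairs):
    """Descend the MHDIS hierarchy: assign x to the first group whose 'belongs here' utility
    exceeds the 'belongs lower' utility; otherwise fall through to the least preferred group."""
    q = len(utility_pairs) + 1                 # q groups require q-1 pairs of utility functions
    for k, (U_group, U_lower) in enumerate(utility_pairs, start=1):
        if U_group(x) > U_lower(x):
            return k                           # groups numbered 1 (best) ... q (worst)
    return q
```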
APPENDIX
POST–OPTIMALITY TECHNIQUES FOR CLASSIFICATION MODEL DEVELOPMENT IN THE UTADIS METHOD
This appendix presents different techniques that can be implemented during the post–optimality stage within the context of the UTADIS method. Furthermore, the presented mathematical programming formulations can also be employed in the first stage of the model development instead of LP2. Throughout the discussion made in this appendix, the following notation will be used:
z: a trade–off made on the optimal (minimum) value of the objective function of LP2, used to explore the existence of near optimal solutions (z is a small portion of this optimal value),
COR: the set of alternatives classified correctly according to the additive utility model defined by the solution of LP2,
MIS: the set of alternatives misclassified by the additive utility model defined on the basis of the solution of LP2.
Maximization of the minimum difference between the global utilities of the correctly classified alternatives from the utility thresholds
The objective in this approach is to identify an alternative optimal or near optimal additive utility classification model that maximizes the minimum difference between the global utilities of the correctly classified alternatives from the utility thresholds. Formally, the minimum difference d is defined as follows:
The maximization of the difference d eliminates the problem of defining the values of the small positive constants of LP2. Initially, these constants can be set to an arbitrarily small positive value (for instance 0.001). The maximization of d is then performed through the following linear program:
Max d
Subject to:
Constraints (4.30)–(4.32) of LP2, Constraints (4.33)–(4.34) of LP2
where:
Constraints (A1)–(A3) define the minimum difference d. These constraints apply only to alternatives that are correctly classified according to the solution (additive utility model) of LP2 (set COR). For the misclassified alternatives (set MIS) the constraints (4.30)–(4.32) of LP2 are retained. Constraints (4.33)–(4.34) of LP2 are also retained to normalize the developed utility functions and to ensure the ordering of the groups. Constraint (A4) is
used to ensure that the classification error of the new additive utility model developed through the solution of the above linear program does not exceed the trade–off made on the optimal classification error defined on the basis of the optimal solution of LP2. The maximization of the minimum difference d can also be incorporated into the formulation of LP2 as a secondary goal for model development (the primary goal being the minimization of the classification error). In this case, a few revisions are required to the above linear program, as follows:
1. No distinction is made between the sets COR and MIS and, consequently, constraints (A1)–(A3) apply to all the alternatives of the reference set.
2. In constraints (A1)–(A3) the classification errors are introduced, similarly to constraints (4.30)–(4.32) of LP2.
3. Constraint (A4) is eliminated.
4. The new objective takes the form:
where the two weighting parameters of the goals (minimization of the classification error and maximization of the minimum difference) are defined such that higher significance is attributed to the minimization of the classification error.
5. During the post–optimality stage described in sub–section 2.2.3 a new constraint is added, where: the first quantity is the maximum difference defined by solving the above linear program considering the revisions 1–4; the second is a trade–off made over this maximum to explore the existence of near optimal solutions (a small portion of the maximum); and d is the maximum difference defined through the solution of each linear program formed during the post–optimality stage described in sub–section 2.2.3.
Similarly to the consideration of the minimum difference d (minimum correct classification distance), it is also possible to consider the maximum difference d' between the global utilities of the misclassified alternatives and the utility thresholds. Essentially, the maximum difference d' represents the maximum individual classification error. A combination of the two differences is also possible.
Maximization of the sum of differences between the correctly classified alternatives from the utility thresholds
The consideration of the differences between the global utilities of the alternatives and the utility thresholds on the basis of maximum or minimum operators, as in the previous approach, has been shown to be rather sensitive to outliers (Freed and Glover, 1986). To address this issue, it is possible to use, instead of the maximum/minimum operator, a distance measure based on the sum of all differences between the alternatives' global utilities and the utility thresholds. This involves only the alternatives classified correctly by the additive utility classification model developed through LP2. The differences are defined similarly to the classification errors, as follows:
The development of an additive utility classification model that maximizes the sum of these differences, given the classification of the alternatives belonging into the reference set by LP2, is sought through the solution of the following linear program:
Subject to:
Constraints (4.30)–(4.32) of LP2, Constraints (4.33)–(4.34) of LP2
where:
Similarly to the case of LP2, the objective function of the above linear program considers a weighted sum of the differences. In particular, the differences are weighted to account for variations in the number of alternatives of each group in the reference set. Constraints (A5)–(A7) define the differences for the alternatives classified correctly after the solution of LP2 (set COR). The uncontrolled maximization of these differences, however, may lead to unexpected results regarding the classification of alternatives belonging into intermediate groups. In particular, given that any intermediate group is defined by an upper and a lower utility threshold, the maximization of the difference from the lower threshold for an alternative classified correctly by LP2 may lead to the estimation of a global utility that exceeds the upper threshold (the upper boundary of the group); in that case the alternative is misclassified. A similar phenomenon may also appear when the difference from the upper threshold is maximized (the new estimated global utility may violate the lower boundary of the group). To avoid these cases, the differences should not exceed the range of each group, where the range is defined by the difference between its two utility thresholds. Constraints (A8) introduce these appropriate upper bounds for the differences.
For the alternatives misclassified by LP2 (set MIS), the constraints (4.30)–(4.32) of LP2 apply. Constraints (4.33)–(4.34) of LP2 are also retained to normalize the developed utility functions and to ensure the ordering of the groups. Constraint (A9) is used to ensure that the classification error of the new additive utility classification model complies with the trade–off specified on the minimum classification error defined according to the solution of LP2. Doumpos and Zopounidis (1998) presented a similar approach (the UTADIS I method), which introduces the sum of differences as a secondary goal in the objective function of LP2 for the development of the optimal additive utility classification model. In that case, however, appropriate weights should be specified for the two goals (minimization of the classification errors and maximization of the differences), such that higher significance is attributed to the minimization of the classification errors. The above approach overcomes the requirement for the specification of appropriate weighting parameters.
Minimization of the total number of misclassifications
As already mentioned in sub–section 2.2.1 of this chapter, the error function considered in the objective of LP2 is an approximation of the actual classification error defined by equation (4.4). The use of this approximation is imposed by the increased computational effort required for the optimization of the classification error function (4.4). Zopounidis and Doumpos (1998) presented a variation of the UTADIS method, the UTADIS II method, that considers the number of misclassified alternatives instead of the magnitude of the classification errors. In this case, all the error variables used in LP2 are transformed into binary 0–1 variables designating the classification status of each alternative (0 designates correct classification and 1 misclassification), and the resulting mathematical programming formulation is a mixed integer one with binary 0–1 variables (cf. Table 4.2 for the dimensions of LP2). Nevertheless, when there is significant group overlap (even for small reference sets), the minimization of the actual classification error (4.4) is a quite cumbersome process from a computational point of view. This problem becomes even more significant considering that, during the post–optimality analysis stage described in sub–section 2.2.3 of this chapter, a series of similar mixed integer programming problems needs to be solved. Considering the minimization of the classification error function (4.4) in a post–optimality context, given that a solution of LP2 has been obtained, provides a good approach for reducing the required computational effort. In this case, the MIP problem to be solved is formulated as follows:
Subject to:
Constraints (4.33)–(4.34) of LP2
where:
Essentially, the above MIP formulation explores the possibility of reducing the number of alternatives misclassified by LP2 (set MIS), without affecting the classification of the correctly classified alternatives (set COR). This is a similar approach to the one used in MHDIS. Constraints (A10)–(A12) apply only to the alternatives of the set COR (alternatives classified correctly by LP2); these constraints ensure that the classification of these alternatives will remain unaffected. Constraints (A13)–(A15) apply to the misclassified alternatives of the set MIS (i.e., alternatives
misclassified by LP2). For these alternatives, it is explored whether it is possible to correct the classification for some of them. The binary 0–1 variables designate the classification status of each of these alternatives, as follows:
According to the solution of LP2, it is expected that the set MIS is a small portion of the whole reference set. Therefore, the number of binary 0–1 variables is considerably lower than the number of the corresponding variables that would be needed if all the alternatives of the reference set were considered. This reduction in the number of binary variables is associated with a significant reduction in the computational effort required to obtain an optimal solution. However, the computational complexity problem still remains for large data sets. For instance, considering a reference set consisting of 1,000 alternatives for which LP2 misclassifies 100 alternatives (10% error), then 100 binary 0–1 variables should be introduced in the above mixed integer programming formulation (assuming a two–group classification problem). In this case, significant computational resources will be required to find an optimal solution. The use of advanced optimization techniques, such as genetic algorithms or heuristics (tabu search; cf. Glover and Laguna, 1997), constitutes a promising approach to tackle this problem. Overall, however, it should be emphasized that the optimal fit of the model to the data of the reference set does not ensure high generalizing ability. This issue needs careful investigation.
Determination of the minimum number of criteria subintervals
The specification of the criteria subintervals during the piece–wise linear modeling of the marginal utility functions is an issue of major importance in the UTADIS method. The discussion of this issue in this chapter has been focused on two simple heuristic approaches (HEUR1 and HEUR2; cf. sub–section 2.2.3). Alternatively, it is also possible to calibrate the criteria subintervals through a linear programming approach. Initially, the scales of the quantitative criteria are not divided into any subinterval. LP2 is used to
develop an optimal additive utility classification model and then, at the post–optimality stage, the following linear program is solved:
Subject to:
Constraints (4.30)–(4.34) of LP2
The objective of the above linear program is to minimize the differences in the slopes of consecutive linear segments of the piece–wise linear marginal utility functions (cf. Figure 4.5), such that the classification error of the new additive utility model complies with the trade–off specified for the optimal error (cf. constraint (A17)). The minimization of the differences in the slopes corresponds to the elimination of unnecessary criteria subintervals (subintervals with equal slopes of the marginal utility segments can be merged into one subinterval). The slope of a linear segment of the piece–wise linear marginal utility function between two consecutive criterion values is defined as follows:
On the basis of this expression, constraint (A16) defines the difference between the slopes of two consecutive linear segments of
the piece–wise linear marginal utility function of each criterion. These differences are accounted for through corresponding deviational variables.
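For reference, the slope of each linear segment can be computed as in the short Python sketch below (illustrative names); segments whose slopes turn out to be (near-)equal can then be merged into a single subinterval.

```python
def segment_slopes(edges, u_at_breakpoints):
    """Slopes of the linear segments of a piece-wise linear marginal utility function."""
    return [(u_at_breakpoints[t + 1] - u_at_breakpoints[t]) / (edges[t + 1] - edges[t])
            for t in range(len(edges) - 1)]
```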
Nevertheless, assuming that the performance of each alternative of the reference set defines a new level on the criterion scale (a new point for the specification of the piece–wise linear form of the marginal utility function), the above linear program will have too many variables (incremental variables w), which will increase the computational effort for determining the optimal solution. This problem can easily be addressed by using one of the heuristics HEUR1 and HEUR2 for the determination of an initial set of criteria subintervals, and then proceeding with the solution of the above post–optimality approach to determine a minimum number of subintervals.
Chapter 5 Experimental comparison of classification techniques
1.
OBJECTIVES
The existing research in the field of MCDA classification methods has mainly focused on the development of appropriate methodologies for supporting the decision making process in classification problems. At the practical level, the use of MCDA classification techniques in real-world classification problems has demonstrated the capabilities that this approach provides to decision makers. Nevertheless, the implementation of any scientific development in practice is always the last stage of a research effort. Before this stage, experiments need to be performed in a laboratory environment, under controlled data conditions, in order to investigate the basic features of the scientific development under consideration. Such an investigation and the corresponding experimental analysis enable the derivation of useful conclusions on the potential that the proposed research has in practice and on the possible problems that may be encountered during its practical implementation. Within the field of MCDA, experimental studies are rather limited. Some MCDA researchers have conducted experiments to investigate the features and peculiarities of MCDA ranking and choice methodologies (Stewart, 1993, 1996; Carmone et al., 1997; Zanakis et al., 1998). Comparative studies involving MCDA classification techniques have been heavily oriented towards the goal programming techniques discussed in Chapter 3. Such comparative studies tried to evaluate the efficiency of goal programming classification formulations as opposed to traditional statistical classification techniques, such as LDA, QDA and LA.
The present chapter follows this line of research to investigate the classification performance of the preference disaggregation methods presented in Chapter 4, as opposed to other widely used classification techniques, some of which have been discussed in Chapters 2 and 3. The investigation is based on an extensive Monte Carlo simulation experiment.
2.
THE CONSIDERED METHODS
Every study investigating the classification performance of a new methodology relative to other techniques should consider techniques which are: (1) representative of a wide set of alternative methodological approaches, (2) well-established among researchers, and (3) non-overlapping (i.e., the considered techniques should rely on different underlying assumptions/functionality). On the basis of these remarks, the experimental investigation of the classification performance of the UTADIS and MHDIS methods considers five other classification techniques:
1. Linear discriminant analysis.
2. Quadratic discriminant analysis.
3. Logit analysis.
4. Rough sets.
5. ELECTRE TRI.
The two forms of discriminant analysis (linear and quadratic) are among the most widely used classification techniques. Despite their shortcomings (cf. Chapter 2), even today they are still often used in many fields for studying classification problems. They also often serve as benchmarks in comparisons investigating the classification performance of new techniques developed in the fields of operations research and artificial intelligence. The use of these techniques in comparative studies should come as no surprise: considering that LDA and QDA provide the optimal classification rule when specific assumptions are met regarding the statistical properties of the data under consideration (multivariate normality, known group variance-covariance matrices), their consideration in comparative studies enables analysts to investigate the ability of new techniques to compete with a theoretically optimal approach. Logit analysis (LA) has been developed as an alternative to LDA and QDA, following an econometric approach. Its consideration in the experimental comparison presented in this chapter is due to its theoretical advantages over the two forms of discriminant analysis, in combination with its extensive use in addressing classification problems in many disciplines. The ordered logit model is used to apply LA in the present analysis.
The last two methods considered in this experimental comparison are examples of non-parametric classification techniques. The rough set theory has evolved rapidly over the last two decades as a broad discipline of operations research and artificial intelligence. During its development, the rough set theory has found several connections with other disciplines such as neural networks, fuzzy set theory, and MCDA. In contrast to all the other methods considered in the conducted experimental comparison, the rough set approach develops a symbolic classification model expressed in the form of decision rules. Therefore, the consideration of rough sets enables the investigation of an alternative approach to address the classification problem on the basis of rule induction. The implementation of the rough set approach in this study is performed through the MODLEM algorithm (Grzymala–Busse and Stefanowski, 2001) and the LERS classification system (Grzymala–Busse, 1992). As noted in Chapter 2, the MODLEM algorithm is well suited to classification problems involving quantitative criteria (attributes), without requiring the implementation of a discretization process. On the other hand, the LERS system provides the basis to overcome problems such as the conflicts that may be encountered in the classification of an alternative covered by rules providing different recommendations, or the classification of an alternative that is not covered by any rule. Of course, the value closeness relation (Slowinski, 1993) could have been used instead of the classification scheme of the LERS system. Nevertheless, the implementation of the value closeness relation approach requires that the decision maker specify some information that is necessary to construct the closeness relation. Since this is an experimental comparison, there is no decision maker who can specify this information and, consequently, the use of the value closeness relation is quite cumbersome. Finally, the ELECTRE TRI method is the most representative example of the MCDA approach in addressing the classification problem. In contrast to the UTADIS and the MHDIS methods, ELECTRE TRI originates from the outranking relation approach of MCDA. Its major distinctive features as opposed to the UTADIS and the MHDIS methods involve its noncompensatory character1 and the modeling of the incomparability relation. None of the other methods considered in this experimental design has these two features. Typically, the application of the ELECTRE TRI method
1
Compensatory approaches lead to the development of criteria aggregation models considering the existing trade-offs between the evaluation criteria. Techniques based on the utility theory approach have a compensatory character. On the other hand, non-compensatory approaches involve techniques that do not consider the trade-offs between the evaluation criteria. Typical examples of non-compensatory approaches are lexicographic models, conjunctive/disjunctive models, and techniques based on the outranking relation approach that employ the veto concept.
126
Chapter 5
quires the decision maker to specify several parameters (cf. Chapter 3). This is impossible in this experimental comparison, since there is no decision maker to interact with. To tackle this problem a new procedure has been developed allowing the specification of the parameters of the outranking relation constructed through ELECTRE TRI, using the preference disaggregation paradigm. The details of this procedure are discussed in the appendix of this chapter. Of course, in addition to the above techniques, other classification methodologies could have also been used (e.g., neural networks, goal programming formulations, etc.). Nevertheless, introducing additional classification techniques in this experimental comparison, bearing in mind the already increased size of the experiment would make the results difficult to analyze. Furthermore, as already noted in Chapters 2 and 3, there have been several comparative studies in these fields involving the relative classification performance of the corresponding techniques as opposed to the statistical techniques used in the analysis of this chapter. Therefore, the results of this comparative analysis can be examined in conjunction with the results of previous studies to derive some conclusions on the classification efficiency of the MCDA classification techniques compared to a variety of other nonparametric techniques.
3. EXPERIMENTAL DESIGN
3.1 The factors
The comparison of the MCDA classification methods presented in Chapter 4 to the methods noted in the previous sub-section (LDA, QDA, LA, rough sets, ELECTRE TRI) is performed through an extensive Monte Carlo simulation. The simulation approach provides a framework to conduct the comparison under several data conditions and to derive useful conclusions on the relative performance of the considered methods given the features and properties of the data. The term performance refers solely to the classification accuracy of the methods. Of course, it should be emphasized that, given the orientation of MCDA methods towards providing support to decision makers, a comparison of MCDA classification techniques should also consider the interaction between the decision maker and the method itself. However, the participation of an actual decision maker in an experimental analysis is rather difficult. Consequently, the experiment presented in this chapter is only concerned with the investigation of the classification accuracy of MCDA methods under controlled experimental data conditions.
In particular, the conducted simulation study investigates the performance of the methods on the basis of the following six factors (the classification method itself constitutes an additional factor, F1):
1. The statistical distribution of the data (factor F2).
2. The number of groups (factor F3).
3. The size of the reference set (training sample) (factor F4).
4. The correlation of the evaluation criteria (factor F5).
5. The structure of the group variance-covariance matrices (factor F6).
6. The degree of group overlap (factor F7).
Table 5.1 presents the levels considered for each factor in the simulation experiment. As indicated in the table the ELECTRE TRI and UTADIS methods are both applied in two ways. In particular, ELECTRE TRI is applied with and without the discordance test (veto) in order to investigate the impact of the veto concept on the efficiency of the method. The introduction of the veto concept is the major distinguishing feature of the ELECTRE TRI method (and the outranking relation approach in general) as opposed to other MCDA methodologies employing a compensatory approach (e.g., UTADIS, MHDIS methods).
For the UTADIS method, the two heuristic approaches (HEUR1, HEUR2) presented in Chapter 4 for the definition of the criteria sub-intervals during the piece-wise modeling of the marginal utility functions are employed. Using the two heuristic approaches enables the investigation of their impact on the classification accuracy of the developed additive utility classification models.
All methods defined by factor F1 are compared (in terms of their classification accuracy) under the different data conditions defined by the remaining factors F2 to F7.
Factor F2 specifies the statistical distribution of the data (i.e., the distribution of the performances of the alternatives on the evaluation criteria). Most of the past studies conducting similar experiments have concentrated on univariate distributions. Through univariate distributions, however, it is difficult to model the correlations between the criteria, which are an important issue in the model development process. In the multivariate case the majority of the existing studies focus only on the multivariate normal distribution, which is easy to model². On the contrary, this experimental comparison considers a richer set of multivariate distributions, including four cases. In particular, the first two of the multivariate distributions that are considered (normal and uniform) are symmetric, while the exponential³ and log-normal distributions are asymmetric, thus leading to a significant violation of multivariate normality. The methodology used to simulate these multivariate distributions is presented in the subsequent sub-section.
Factor F3 defines the number of groups into which the classification of the objects is made. Most of the existing experimental studies consider only the two-group case. In real-world problems, however, multi-group classification problems are often encountered. A multi-group classification scheme adds more flexibility to the decision-making process as opposed to the strict two-group classification framework. In this experimental design two-group and three-group classification problems are considered. This specification enables the derivation of useful conclusions on the performance of the methods in a wide range of situations that are often met in practice (many real-world classification problems involve three groups).
Factor F4 is used to define the size of the reference set (training sample), and in particular the number of alternatives that it includes (henceforth this number is denoted by m). The factor has three levels corresponding to 36, 72 and 108 alternatives, distributed equally among the groups defined by factor F3. In all three cases the alternatives are described along five criteria. Generally, small training samples contain limited information about the classification problem being examined, but the corresponding complexity of the problem is also limited. On the other hand, larger samples provide richer information, but they also lead to increased complexity of the problem. Thus, the examination of the three levels for this factor enables the investigation of the performance of the classification procedures under all these cases.
Factor F5 defines the correlations between the evaluation criteria. Two cases (levels) are considered for this factor. In the first case the correlations are assumed to be limited (the correlation coefficient ranges between 0 and 0.1). In the second case higher correlations are used (the correlation coefficient ranges between 0.2 and 0.5). In both cases the correlation coefficient between each pair of criteria is specified as a uniformly distributed random variable ranging in the appropriate interval. The specified correlation coefficients for every pair of criteria define the off-diagonal elements of the group variance-covariance matrices. The elements on the diagonal of these matrices, representing the variances of the criteria, are specified by the sixth factor (F6), which is considered at two levels. At the first level the variances of the criteria are equal for all groups, whereas at the second level the variances differ across groups. For the multivariate normal, uniform and exponential distributions the variances are set directly at common (level 1) or group-specific (level 2) values, whereas for the multivariate log-normal distribution the variances, in both the two-group and the three-group cases, are specified so as to ensure that the kurtosis of the data remains within reasonable levels⁴.

² If z is a vector of n random variables that follow the standard normal distribution N(0,1), then the elements of the vector y = Bz + µ follow the multivariate normal distribution with mean µ and variance-covariance (dispersion) matrix BB′.
³ This is actually a multivariate distribution that resembles the exponential distribution in terms of its skewness and kurtosis. Nevertheless, for simplicity, it will henceforth be referred to as the exponential distribution.
⁴ In the log-normal distribution the skewness and kurtosis are defined by the mean and the variance of the criteria for each group. The procedures for generating multivariate non-normal data can replicate satisfactorily the prespecified values of the first three moments (mean, standard deviation and skewness) of a statistical distribution. However, the error is higher for the fourth moment (kurtosis). Therefore, in order to reduce this error and consequently to have better control of the generated data, both the mean and the variance of the criteria for each group in the case of the multivariate log-normal distribution are specified so that the coefficient of kurtosis is lower than 40.
The last factor (F7) is used to specify the degree of group overlap. The higher the degree of group overlap, the more difficult it is to discriminate between the considered groups. The definition of the group overlap in this experiment is performed using Hotelling's T² statistic. For a pair of groups, with m1 and m2 denoting the number of alternatives of the reference set belonging to each group, this statistic is defined as follows:

T² = [m1·m2/(m1+m2)] (x̄1 − x̄2)′ S⁻¹ (x̄1 − x̄2)

where x̄1 and x̄2 are the vectors of the criteria averages for each group, and S is the within-groups (pooled) variance-covariance matrix:

S = [(m1−1)S1 + (m2−1)S2] / (m1+m2−2)

Hotelling's T² is a multivariate test for the differences between the means of two groups. To evaluate its statistical significance the statistic F = T²·(m1+m2−n−1)/[n·(m1+m2−2)] is computed, which follows the F distribution with n and m1+m2−n−1 degrees of freedom (Altman et al., 1981). The use of the Hotelling's T² statistic implies a multivariate normal distribution and the equality of the group variance-covariance matrices.
Studies investigating the multivariate normality assumption have shown that the results of the Hotelling's T² are quite robust for multivariate non-normal data, even for small samples (Mardia, 1975). Therefore, the use of the Hotelling's T² in this experiment in combination with non-normal data is not a problem. In the case where the group variance-covariance matrices are not equal, it is more appropriate to use the revised version of the Hotelling's T² as defined by Anderson (1958), which is computed from the group-specific variance-covariance matrices and the vectors consisting of the performances of the alternatives on the n evaluation criteria. In this revised version of the Hotelling's T², the corresponding statistic follows the F distribution with n and M−n−1 degrees of freedom.
The Hotelling's T² and its revised version are used in this experiment to define the average performance of the alternatives in each group. For the multivariate normal, uniform and exponential distributions the average performance of the alternatives of the first group on all criteria is set equal to one, whereas for the multivariate log-normal distribution the average performance of the alternatives of the first group is specified, together with the variances, so that the kurtosis of the data remains within reasonable levels. The average performance of the alternatives in the second group is specified so that the Hotelling's T² (or its revised version, when the group variance-covariance matrices are unequal) defined between the first two groups is significant at the 1% level (low overlap) or the 10% level (high overlap). The average performance of the alternatives in the third group is defined in a similar way.
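A short sketch of this overlap check follows; it is not the authors' code, and it computes only the pooled version of the statistic using standard NumPy/SciPy calls on artificial data.

```python
# Sketch of the group-overlap check described above: the pooled Hotelling's T2
# between two groups and the p-value of its F transform. Illustrative only.
import numpy as np
from scipy import stats

def hotelling_t2(X1, X2):
    """Pooled two-sample Hotelling's T2 and the p-value of its F transform."""
    m1, n = X1.shape
    m2, _ = X2.shape
    d = X1.mean(axis=0) - X2.mean(axis=0)
    S = ((m1 - 1) * np.cov(X1, rowvar=False) +
         (m2 - 1) * np.cov(X2, rowvar=False)) / (m1 + m2 - 2)
    t2 = (m1 * m2) / (m1 + m2) * d @ np.linalg.solve(S, d)
    f_stat = t2 * (m1 + m2 - n - 1) / (n * (m1 + m2 - 2))
    p_value = stats.f.sf(f_stat, n, m1 + m2 - n - 1)
    return t2, p_value

rng = np.random.default_rng(1)
X1 = rng.normal(loc=1.0, size=(18, 5))   # first group, mean 1 on all criteria
X2 = rng.normal(loc=0.4, size=(18, 5))   # second group, shifted means
print(hotelling_t2(X1, X2))              # low p-value -> low group overlap
```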
3.2 Data generation procedure
A crucial aspect of the experimental comparison is the generation of the data having the required properties defined by the factors described in the previous sub-section.
The generation of data that follow the multivariate normal distribution is a well-known process. On the other hand, the simulation of multivariate non-normal distributions is a more complex process. In this study the methodology proposed by Vale and Maurelli (1983) is employed. The general outline of this methodology is presented in Figure 5.1. The outcome of this methodology is the generation of a vector g′ consisting of n random variables (evaluation criteria) having the statistical properties described in the previous sub-section. In this experimental comparison the generated criteria vector g′ consists of five criteria.
Each criterion of the generated vector g′ follows the specified multivariate non-normal distribution with zero mean and unit variance. The criteria are subsequently transformed so that they have the desired mean and standard deviation defined by factors F7 and F6, respectively (cf. Table 5.1), through the relation g_i = μ_i + σ_i·g′_i. Each criterion of the vector g′ is defined through the power transformation g′_i = a_i + b_i·y_i + c_i·y_i² + d_i·y_i³, where y_i is a random variable following the multivariate standard normal distribution. The constants a_i, b_i, c_i and d_i are specified through the solution of a set of non-linear equations on the basis of the desired level of skewness and kurtosis (Fleishman, 1978).
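A minimal sketch of this coefficient-fitting step is given below; the moment equations are the Fleishman (1978) system as commonly stated in the literature and are assumed here rather than quoted from the book, and the numerical targets are illustrative.

```python
# Sketch: solving for the Fleishman power-method constants (b, c, d) that give
# a standardized variable the desired skewness and excess kurtosis, with
# a = -c. The equations below are the standard Fleishman (1978) system as
# usually reported; they are an assumption, not a quotation from the book.
import numpy as np
from scipy.optimize import fsolve

def fleishman_constants(skew, excess_kurtosis):
    def equations(p):
        b, c, d = p
        eq1 = b**2 + 6*b*d + 2*c**2 + 15*d**2 - 1.0
        eq2 = 2*c*(b**2 + 24*b*d + 105*d**2 + 2) - skew
        eq3 = 24*(b*d + c**2*(1 + b**2 + 28*b*d) +
                  d**2*(12 + 48*b*d + 141*c**2 + 225*d**2)) - excess_kurtosis
        return [eq1, eq2, eq3]
    b, c, d = fsolve(equations, x0=[1.0, 0.0, 0.0])
    return -c, b, c, d   # a = -c keeps the mean at zero

print(fleishman_constants(skew=2.0, excess_kurtosis=6.0))  # exponential-like shape
```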
The use of the traditional techniques for generating multivariate normal random variables is not adequate for generating the random variables y_i. This is because the desired correlations between the criteria of the vector g should be taken into consideration when generating the random vector y. To address this issue Vale and Maurelli (1983) proposed the construction of an intermediate correlation matrix. Each element of this matrix defines the correlation between the variables y_i and y_j corresponding to criteria g_i and g_j, and is calculated through the solution of a polynomial equation relating it to the desired correlation between the two criteria via the constants b, c and d of their Fleishman transformations.
The desired correlation between each pair of criteria is the one defined by factor F5. The intermediate correlation matrix is then decomposed so that the correlations between the random variables of the vector y correspond to the desired correlations between the criteria of the vector g. In this experimental study the decomposition of the intermediate correlation matrix is performed using principal components analysis. The data generation procedure ends with the transformation of the vector g′ to the criteria vector g with the desired mean and standard deviation defined by factors F7 and F6, respectively.
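A compact sketch of this generation step is shown below. It is an illustration, not the authors' Matlab code: the cubic relation between the intermediate and the target correlation is the one commonly attributed to Vale and Maurelli (1983) and is assumed here, and the decomposition of the intermediate correlation matrix is delegated to NumPy's multivariate normal sampler rather than an explicit principal components step.

```python
# Sketch: build the intermediate correlation matrix, draw correlated standard
# normals, and push them through the Fleishman polynomials. Illustrative only.
import numpy as np
from scipy.optimize import brentq

def intermediate_rho(rho_target, const_i, const_j):
    """Intermediate correlation solving the (assumed) Vale-Maurelli cubic."""
    _, bi, ci, di = const_i
    _, bj, cj, dj = const_j
    def f(r):
        return (r * (bi*bj + 3*bi*dj + 3*di*bj + 9*di*dj)
                + r**2 * (2*ci*cj) + r**3 * (6*di*dj) - rho_target)
    return brentq(f, -0.999, 0.999)

def generate_sample(size, constants, target_corr, mean, std, rng):
    n = len(constants)
    R = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            R[i, j] = R[j, i] = intermediate_rho(target_corr[i, j],
                                                 constants[i], constants[j])
    y = rng.multivariate_normal(np.zeros(n), R, size=size)   # correlated normals
    g = np.empty_like(y)
    for i, (a, b, c, d) in enumerate(constants):
        g[:, i] = a + b*y[:, i] + c*y[:, i]**2 + d*y[:, i]**3  # standardized, non-normal
    return mean + std * g    # rescale to the desired means / standard deviations

consts = [(-0.0, 1.0, 0.0, 0.0)] * 5        # degenerate (normal) case: identity polynomial
R_target = np.full((5, 5), 0.3); np.fill_diagonal(R_target, 1.0)
sample = generate_sample(36, consts, R_target, mean=np.ones(5), std=np.ones(5),
                         rng=np.random.default_rng(2))
print(sample.shape)   # (36, 5)
```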
Since all the considered MCDA methods assume an ordinal definition of the classes, it is important to ensure that the generated data meet this requirement. This is achieved through a constraint imposed on the generated alternatives, which ensures that an alternative of a less preferred group does not dominate the alternatives of the more preferred groups, so that each group consists of alternatives that are preferred to those of the groups that follow it.
For each combination of factors F2 to F7 the above data generation procedure is employed to produce two data samples. The first one is used as the reference set, while the second one is used as the validation sample. The number of alternatives in the reference set is specified by factor F4, whereas the validation sample consists of 216 alternatives in all cases. In both the reference set and the validation sample the alternatives are equally distributed among the groups. This experiment is repeated 20 times for each combination of the factors F2 to F7 (192 combinations). Overall, 3,840 reference sets are considered⁵, each matched to a validation sample. Each reference set is used to develop a classification model through the methods specified by factor F1 (cf. Table 5.1). This model is then applied to the corresponding validation sample to test its generalizing classification performance. The simulation was conducted on a Pentium III 600MHz PC, using Matlab 5.2 for data generation as well as for the application of LA and QDA. Appropriate codes for the other methods were written by the authors in the Visual Basic 6 programming environment. The results of the simulation have been analyzed using the SPSS 10 statistical package.
4. ANALYSIS OF RESULTS
The results obtained from the simulation experiment involve the classification error rates of the methods both in the reference sets and the validation samples. However, the analysis that follows is focused only on the classification performance of the methods on the validation samples. This is because the error rates obtained considering the reference set are downwardly biased compared to the actual performance of the methods, since the same sample is used both for model development and model validation. On the other hand, the error rates obtained using the validation samples provide a better estimate of the generalizing performance of the methods, measuring the ability of the methods to provide correct recommendations on the classification of new alternatives (i.e., alternatives not considered during model development).
⁵ (192 combinations of factors F2 to F7) × (20 replications).
The analysis of the results is based on a transformed measure of the error rate, proposed to stabilize the variance of the error rates (Bajgier and Hill, 1982; Joachimsthaler and Stam, 1988).
Table 5.2 presents the ANOVA results for this error rate measure defined on the basis of the validation samples. All main effects and the interaction effects presented in this table are significant at the 1% level. Furthermore, each effect (main or interaction) explains at least 0.5% of the total variance in the results (ω² statistic⁶). In addition to the 17 effects presented in Table 5.2, 64 more effects were found to be significant at the 1% level. None of these effects, however, explained more than 0.5% of the total variance and, therefore, in order to reduce the complexity of the analysis, they are not reported.
A first important note on the obtained results is that the main effects regarding the seven factors are all significant. This clearly shows that each of these factors has a major impact on the classification performance of the methods. The main effects involving the statistical distribution of the data (F2), the structure of the group dispersion matrices (F6) and the classification methods (F1) explain more than 48% of the total variance. The latter effect (classification methods) is of major importance to this analysis. It demonstrates that there are significant differences in the classification performances of the considered methods. Figure 5.2 presents the average error rates for each method in the validation samples for the whole simulation. The numbers in parentheses indicate the grouping of the methods according to the Tukey's test⁷ on the average transformed error rates. The homogeneous groups of classification methods formed by the Tukey's test are presented in an increasing order (i.e., 1, 2, ...) from the methods with the lowest error rate to those with the highest error rate.
⁶ Denoting by SS the sum of squares of an effect, by MSE the mean square error, by df the degrees of freedom of the effect and by TSS the total sum of squares, the ω² statistic is calculated as follows: ω² = (SS − df·MSE) / (TSS + MSE).
⁷ Tukey's honestly significant difference test is a post-hoc comparison technique that follows the results of ANOVA, enabling the identification of the means that contribute most to the considered effect. In this simulation study the Tukey's test is used to perform all pairwise comparisons among the average classification error rates (transformed error rates) of each pair of methods and to form homogeneous sets of methods according to their classification error rate. Each set includes methods that do not present statistically significant differences with respect to their classification error rates (see Yandell, 1977 for additional details).
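The following sketch illustrates this kind of analysis pipeline. It uses standard statsmodels calls on an artificial results table; the arcsine-square-root transform is a common variance-stabilizing choice and stands in here as an assumption for the transformed error rate referred to above, and the column names are purely illustrative.

```python
# Sketch: variance-stabilizing transform of the error rates, factorial ANOVA,
# omega-squared per effect, and Tukey HSD grouping of the methods.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(3)
results = pd.DataFrame({                       # artificial stand-in for the output
    "method": rng.choice(["UTADIS", "MHDIS", "ELECTRE", "LDA"], size=400),
    "distribution": rng.choice(["normal", "lognormal"], size=400),
    "dispersion": rng.choice(["equal", "unequal"], size=400),
    "error_rate": rng.uniform(0.05, 0.45, size=400),
})
results["t_error"] = np.arcsin(np.sqrt(results["error_rate"]))  # assumed transform

model = smf.ols("t_error ~ C(method) * C(distribution) * C(dispersion)",
                data=results).fit()
table = anova_lm(model, typ=2)

# Omega-squared for each effect: (SS - df*MSE) / (TSS + MSE)
mse = table.loc["Residual", "sum_sq"] / table.loc["Residual", "df"]
tss = table["sum_sq"].sum()
table["omega_sq"] = (table["sum_sq"] - table["df"] * mse) / (tss + mse)
print(table)

# Tukey's HSD grouping of the methods on the transformed error rates
print(pairwise_tukeyhsd(results["t_error"], results["method"]))
```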
The results indicate the increased performance of the considered MCDA classification methods as opposed to the other techniques. In particular, UTADIS provides the best performance (lowest error rate) compared to all the other methods. The use of the heuristic HEUR2 (UTADIS2) for the specification of the sub-intervals during the piece-wise formulation of the marginal utility functions in the UTADIS method provides better overall results compared to the use of the heuristic HEUR1 (UTADIS1). The difference between the two cases is significant at the 5% level according to the Tukey's test. UTADIS is followed by the MHDIS method, which provides similar results to ELECTRE TRI. With regard to the ELECTRE TRI method it should be noted that the procedure used to specify the parameters of the outranking relation provides quite satisfactory classification results (a detailed description of this procedure is given in the Appendix of this chapter). The differences between using the discordance test or not (ELEC1 vs ELEC2) are not significant. Regarding the other methods, the rough set approach provides the lowest error rate, followed by QDA, while LA and LDA provide similar results.
These results provide an overview of the overall performance of the considered methods in the experiment. Further insight can be gained by considering the significant interactions between factor F1 and the factors F2 to F7. The most significant of these interactions is the one involving the performance of the methods for the different statistical distributions considered in the experiment (F1×F2). This interaction explains 8.71% of the total variance of the results (cf. Table 5.2). Table 5.3 presents the corresponding results for all combinations of these two factors (similarly to Figure 5.2, parentheses indicate the grouping of the methods through the Tukey's test at the 5% significance level).
For all four statistical distributions the two implementations of the UTADIS method provide the best results. In the case of the multivariate normal distribution the error rates for UTADIS1 and UTADIS2 are slightly higher than the error rate of QDA, but the differences are not statistically significant (UTADIS1 and UTADIS2 provide similar results in this case). It is also important to note that the other MCDA classification methods (MHDIS and ELECTRE TRI) outperform both LDA and LA. For the multivariate uniform distribution UTADIS1 and UTADIS2 provide the lowest error rates, followed by the MHDIS method and QDA. The differences between the MCDA classification methods and the traditional statistical techniques increase significantly for the two asymmetric distributions, the exponential and the log-normal. In the exponential case, the implementation of the UTADIS method using the heuristic HEUR2 provides better results compared to all the other approaches. It should also be noticed that the difference from the use of the heuristic HEUR1 is significant according to the Tukey's test at the 5% level. MHDIS, ELECTRE TRI and the rough set approach all provide similar results, which are considerably better than those of the three statistical methods. Similar results are also obtained for the log-normal distribution. The use of heuristic HEUR2 in UTADIS once again provides the best results. The heuristic HEUR1 (UTADIS1), however, leads to results similar to those of MHDIS and ELECTRE TRI.
A second significant two-way interaction that is of interest to the analysis of the performance of the methods is the interaction involving factors F1 (classification methods) and F6 (structure of the group dispersion matrices). This interaction explains 4.46% of the total variance in the results of the experiment. The corresponding results presented in Table 5.4 show that in both cases (equal and unequal group dispersion matrices) the considered MCDA classification techniques provide quite satisfactory results. In particular, the two implementations of the UTADIS method provide the lowest error rates both when the group dispersion matrices are equal and when they are unequal. In the former case, the use of HEUR2 provides a significantly lower error rate compared to the use of HEUR1. In the case of equal group dispersion matrices, the UTADIS method is followed by the MHDIS method and the two implementations of the ELECTRE TRI method (with and without the discordance test). In this case the MHDIS method performs slightly better than the ELECTRE TRI method. In the case of unequal group dispersion matrices the differences between MHDIS and ELECTRE TRI are not significant. It is also worth noticing the performance of QDA in the two considered cases regarding the structure of the group dispersion matrices. When these matrices are equal across all groups, QDA performs worse than all the other methods used in this experiment. The performance of the method, however, improves significantly when the group dispersion matrices
are unequal. In this case the error rate of QDA is similar to that of the UTADIS method and significantly lower compared to all the other techniques. These results indicate that QDA is quite sensitive to changes in the structure of the group dispersion matrices.
The third two-way interaction found to be significant in this experiment for the explanation of the differences in the performance of the methods involves the size of the reference set (F1×F4). This interaction explains 1.22% of the total variance in the results of the experiment. The results of Table 5.5 show that the increase of the size of the reference set (number of alternatives) reduces the performance of all methods. This is an expected result, since in this experiment larger reference sets are associated with an increased complexity of the classification problem. The most sensitive methods to the size of the reference set are LDA, LA and UTADIS. On the other hand, QDA, rough sets and MHDIS appear to be the least sensitive methods. Nevertheless, it should be noted that irrespective of the reference set size the considered MCDA methods always perform better than the other methods. In particular, the two implementations of the UTADIS method provide the best results for small to moderate reference sets (36 and 72 alternatives). In both cases, the differences between UTADIS1 (HEUR1) and UTADIS2 (HEUR2) are not significant according to the Tukey's grouping at the 5% level. The UTADIS method is followed by MHDIS, ELECTRE TRI and rough sets. For larger reference sets (108 alternatives) UTADIS2 provides the best results. In this case, its difference from UTADIS1 is statistically significant, thus indicating that the use of the heuristic HEUR2 is less sensitive to the size of the reference set compared to HEUR1. UTADIS1, MHDIS and ELECTRE TRI all provide similar results in this case, followed by rough sets and QDA.
The last two-way interaction that is of interest in this analysis is the one involving the performance of the methods according to the number of groups (F1×F3). This interaction explains 0.64% of the total variance in the results of the experiment. The corresponding results are presented in Table 5.6. A first obvious remark is that the performance of all methods deteriorates significantly in the three-group classification problem as opposed to the two-group case. This is no surprise, since the number of groups is positively related to the complexity of the problems (i.e., the complexity increases with the number of groups). Nevertheless, in both the two-group and the three-group case the use of the heuristic HEUR2 in UTADIS is the approach that provides the lowest error rate. In particular, in the two-group case UTADIS2 performs similarly to UTADIS1 (use of HEUR1), whereas in the three-group case its differences from all the other methods (including UTADIS1) are statistically significant at the 5% level according to the grouping obtained from the Tukey's test. It should also be noticed that MHDIS and ELECTRE TRI are the least sensitive methods to the increase of the number of groups. In both cases, their increase in error rates for the three-group problem is the smallest compared to all the other methods. As a result, both MHDIS and ELECTRE TRI perform similarly to UTADIS1 in the three-group classification problem.
In addition to the above two-way interaction results, Table 5.2 also indicates some three-way interactions that are significant in explaining the results of this experiment regarding the performance of the considered classification methods. The first of these three-way interactions that is of interest involves the performance of the methods according to the form of the statistical distribution of the data and the structure of the group dispersion matrices (interaction F1×F2×F6). The corresponding results presented in Table 5.7 provide
more insight information on the remarks noted previously when the statistical distribution and the structure of the group dispersion matrices were examined independently from each other (cf. Tables 5.3 and 5.4); the interaction of these two factors is examined now.
The results of the above table indicate that when the data are multivariate normal and the group dispersion matrices are equal, LDA and LA provide the lowest error rates, whereas when the group dispersion matrices are unequal QDA outperforms all the other methods, followed by UTADIS. These results are to be expected considering that multivariate normality and the a priori knowledge of the structure of the group dispersion matrices are the two major assumptions underlying the use of both LDA and QDA. On the other hand, when the data are not multivariate normal and the group dispersion matrices are equal, the MCDA classification methods (UTADIS, MHDIS, ELECTRE TRI) provide the best results compared to the other methods considered in this experiment. In all these cases the use of the UTADIS method with the heuristic HEUR2 (UTADIS2) provides the best results. Its differences from all the other MCDA approaches are significant for the exponential and the log-normal distributions, whereas for the uniform distribution its results are similar to those of UTADIS1. The results obtained when the data are not multivariate normal and the dispersion matrices are unequal are rather similar. The differences, however, between the MCDA methods, rough sets and QDA are reduced in this case. In particular, for the uniform distribution QDA performs similarly to the UTADIS method, while outperforming both MHDIS and ELECTRE TRI. A similar situation also appears for the log-normal distribution. On the other hand, for the exponential distribution UTADIS outperforms all the other methods, followed by MHDIS and rough sets.
The second three-way interaction that is of interest involves the performance of the classification methods according to the form of the statistical distribution of the data and the size of the reference set (interaction F1×F2×F4). The results presented in Table 5.8 show that for low and moderate sizes of the reference set (36 and 72 alternatives) the MCDA classification methods compare favorably (in most cases) to the other techniques, irrespective of the form of the statistical distribution. Furthermore, it is interesting to note that as the size of the reference set increases, the performance of MHDIS and ELECTRE TRI relative to the other methods improves. The improvement is more significant for the two asymmetric distributions (exponential and log-normal). For instance, in the case of the log-normal distribution with a large reference set (108 alternatives) both MHDIS and ELECTRE TRI perform significantly better than the UTADIS method when the heuristic HEUR1 (UTADIS1) is used.
5. SUMMARY OF MAJOR FINDINGS
The experiment presented in this chapter provided useful results regarding the efficiency of a variety of MCDA classification methods compared to other established approaches. Additionally, it facilitated the investigation of the relative performance of the MCDA classification methods compared to each other. The methods UTADIS, MHDIS and ELECTRE TRI originate from different MCDA approaches. The conducted extensive experiment helped in considering the relative classification performance of these methods under a variety of different data conditions. Overall, the main findings of the experimental analysis presented in this chapter can be summarized in the following points:
1. The considered MCDA classification methods can be regarded as an efficient alternative to widely used statistical techniques, at least in cases where the assumptions of these techniques are not met by the data under consideration. Furthermore, the MCDA classification methods appear to be quite effective compared to other non-parametric classification techniques. Of course, in this analysis only the rough set approach was considered as an example of a non-parametric classification approach. Therefore, the obtained results regarding the comparison of MCDA methods and other non-parametric classification techniques should be further extended by considering a wider range of methods, such as neural networks, machine learning, mathematical programming, etc. Despite this shortcoming, it is important to consider the present results in conjunction with the results of other experimental studies on the comparison of multivariate statistical classification methods and non-parametric approaches (cf. Chapter 2). The fact that some of these studies often do not show a clear superiority of the existing non-parametric techniques over statistical classification methods provides, together with the results of the above analysis, a first positive indication on the performance of the considered MCDA classification methods as opposed to other non-parametric techniques.
Table 5.9 provides a synopsis of the results of the experiment in terms of pair-wise comparisons of the methods with regard to their error rates in the validation samples. Furthermore, Table 5.10 presents the methods with the lowest error rates (in the validation samples) for each combination of the four factors that were found to be the most significant for the explanation of the differences between the methods. These factors include the form of the statistical distribution of the data (F2), the number of groups (F3), the size of the reference set (F4) and the structure of the group dispersion matrices (F6). The methods presented in Table 5.10 as the ones with the lowest error rates do not have significant differences according to the grouping of the Tukey's test at the 5% level. The methods are presented in ascending order, from those with the lowest error rates to those with the highest error rates.
The results of Table 5.9 show that the MCDA classification methods (UTADIS, MHDIS, ELECTRE TRI) outperform, in most cases, the other approaches. The high efficiency of the considered MCDA methods is also illustrated in the results presented in Table 5.10. The analysis of Table 5.10 shows that the implementation of UTADIS with the heuristic HEUR2 provides the lowest error rates in most cases, especially when the data come from an asymmetric distribution (exponential and log-normal). In the same cases, the MHDIS method and ELECTRE TRI also perform well. The results of Tables 5.9 and 5.10 lead to the conclusion that the modeling framework of MCDA methods is quite efficient in addressing classification problems. The UTADIS and MHDIS methods, which employ a utility-based modeling approach, seem to outperform the outranking relation framework of the ELECTRE TRI method. Nevertheless, the differences between these approaches are reduced when more complex problems are considered (e.g., classification problems with three groups and problems with larger reference sets).
2. The procedure proposed for estimating the parameters of the outranking relation in the context of the ELECTRE TRI method (cf. the Appendix of this chapter for a detailed description of the procedure) seems to be well suited to the study of classification problems. Extending this procedure to consider also the optimistic assignment approach will contribute to the full exploitation of the particular features and capabilities of
ELECTRE TRI. This will enable the modeling of the incomparability relation, which provides significant information to the decision maker. Overall, during the whole experiment the discordance test in the ELECTRE TRI method was performed in 1,250 out of the 3,840 total replications conducted in the experiment (32.6%). In the proposed procedure used to specify the parameters of the outranking relation in the ELECTRE TRI method, the discordance test is performed only if it is found to improve the classification of the alternatives in the reference set. The limited use of the discordance test in this experiment is most probably due to the nature of the considered data. Generally, the discordance test is useful in the evaluation of alternatives that have good performance on some criteria but very poor performance on other criteria. In such cases, it is possible that a criterion on which the alternative has poor performance vetoes the overall evaluation of the alternative, irrespective of its good features on the other criteria. Such cases, where the performances of the alternatives on the criteria have significant fluctuations, were not considered in this experiment. Modeling such cases within an experimental study would be an interesting further extension of this analysis, in order to formulate a better view of the impact of the discordance test on the classification results of the ELECTRE TRI method. Table 5.11 presents the percentage of replications in which the discordance test was conducted for each combination of the four factors found to be the most significant in this experiment (i.e., the form of the statistical distribution, the number of groups, the size of the reference set and the structure of the group dispersion matrices). It should be noted that for each combination of these four factors 80 replications were performed.
The results of Table 5.11 indicate that the discordance test was most frequently used in the three-group case. Furthermore, it is interesting to note that the frequency of the use of the discordance test was reduced for larger reference sets. Finally, it can also be observed that the heterogeneity of the group dispersion matrices reduced the frequency of the use of the discordance test. Of course, these results on the use of the discordance test need further consideration. The discordance test is a key feature of the ELECTRE TRI method together with the ability of the method to model the incomparability relation. These two features are the major distinguishing characteristics of classification models developed through outranking relation approaches compared to compensatory approaches such as the UTADIS and the MHDIS methods. The analysis of the existing differences in the recommendations (evaluation results) of such methods will contribute to the understanding of the way that the peculiarities of each approach affect their classification performance. The experimental analysis presented in this chapter did not address this issue. Instead, the focal point of interest was the investigation of the classification performance of MCDA classification methods compared to other techniques. The obtained results can be considered as encouraging for the MCDA approach. Moreover, they provide the basis for further analysis along the lines of the above remarks.
APPENDIX
DEVELOPMENT OF ELECTRE TRI CLASSIFICATION MODELS USING A PREFERENCE DISAGGREGATION APPROACH
1. Prior research
As noted in the presentation of the ELECTRE TRI method in Chapter 3, the use of the method to develop a classification model in the form of an outranking relation requires the specification of several parameters, including:
1. The weight of each criterion.
2. The reference profiles distinguishing each pair of consecutive groups, for all k = 1, 2, …, q–1.
3. The preference, indifference and veto thresholds, for all criteria and for all k = 1, 2, …, q–1.
4. The cut-off threshold that defines the minimum value of the credibility index above which it can be ascertained that the affirmation "an alternative is at least as good as a profile" is valid.
The method assumes that all these parameters are specified by the decision maker in cooperation with the decision analyst through an interactive process. Nevertheless, this process is often difficult to implement in practice. This is due to two main reasons:
a) The increased amount of time required to elicit preferential information from the decision maker.
b) The unwillingness of the decision makers to participate actively in the process and to provide the required information.
These problems are often met in several fields (e.g., stock evaluation, credit risk assessment, etc.) where decisions have to be taken on a daily basis, and where time and cost are crucial factors for the use of any decision-making methodology.
To overcome this problem, Mousseau and Slowinski (1998) proposed an approach to specify the parameters of the outranking relation classification model of the ELECTRE TRI method using the principles of preference disaggregation. In particular, the authors suggested the use of a reference set for
the specification of the above parameters, so that the misclassifications of the alternatives in the reference set are minimized. This approach is similar to the one used in UTADIS and MHDIS (cf. Chapter 4). The methodology used by the authors implements only the pessimistic assignment procedure (cf. Chapter 3), without considering the discordance test. In the proposed methodology the partial concordance index is approximated through a sigmoid function, which provides a smooth (differentiable) approximation of its piece-wise linear form (cf. Mousseau and Slowinski, 1998).
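As an illustration of this kind of smoothing (a sketch under assumptions: the exact sigmoid parameterization of Mousseau and Slowinski (1998) is not reproduced here, and the threshold names are generic):

```python
# Sketch: the usual piece-wise linear partial concordance index of ELECTRE TRI
# and a sigmoid surrogate that makes it differentiable for non-linear
# optimization. Only meant to convey the idea; not the published formulation.
import numpy as np

def partial_concordance(perf, profile, q, p):
    """Piece-wise linear index: 1 above profile - q, 0 below profile - p."""
    return np.clip((perf - profile + p) / (p - q), 0.0, 1.0)

def partial_concordance_sigmoid(perf, profile, q, p, steepness=10.0):
    """Smooth surrogate centred between the indifference and preference zones."""
    centre = profile - (p + q) / 2.0
    return 1.0 / (1.0 + np.exp(-steepness * (perf - centre)))

x = np.linspace(-1.0, 1.0, 9)
print(partial_concordance(x, profile=0.0, q=0.1, p=0.5))
print(partial_concordance_sigmoid(x, profile=0.0, q=0.1, p=0.5))
```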
On the basis of this approximation, a mathematical programming problem with non-linear constraints is formulated and solved to specify optimally the reference profiles, the criteria's weights, the preference and the indifference thresholds, as well as the cut-off point. This non-linear mathematical programming problem includes 4m+3n(q–1)+2 constraints, where m is the number of alternatives of the reference set, n is the number of criteria and q is the number of groups; 2m of these constraints have a non-linear form. Therefore, assuming a reference set of 100 alternatives evaluated along five criteria and classified into three groups, the proposed non-linear mathematical programming problem consists of 4×100+3×5×(3–1)+2=432 constraints overall, including 200 (2×100) non-linear constraints. This simple example indicates that even for rather small reference sets, the optimization process in the methodology of Mousseau and Slowinski (1998) can be quite demanding in terms of the computational resources required to implement it efficiently. Subsequent studies using this methodology assumed that the reference profiles, as well as the preference and indifference thresholds, are known (i.e., specified by the decision maker) and focused only on the estimation of the criteria's weights (Dias et al., 2000; Mousseau et al., 2000). In this simplified context the resulting mathematical programming formulation has a linear form and consequently its solution is easy even for large reference
sets. Nevertheless, this simplification does not address the problem of specifying the parameters of the outranking relation adequately, since some critical parameters (the reference profiles and the preference and indifference thresholds) still need to be specified by the decision maker. Furthermore, it should be emphasized that using the pessimistic assignment procedure in ELECTRE TRI without the discordance test is quite similar to the utility-based approach used in the UTADIS method. In particular, the partial concordance index can be considered as a form of marginal utility function: the higher the partial concordance index for the affirmation that an alternative is at least as good as a reference profile on the basis of some criterion, the higher is the utility/value of the alternative on that criterion. These remarks show that the main distinguishing feature of the two approaches (outranking relations vs utility-based techniques) is the non-compensatory philosophy of the outranking relation approaches, which is implemented through the discordance test. In this regard, the use of the ELECTRE TRI method without the discordance test cannot be considered as a different approach to modeling the classification problem compared to compensatory techniques such as the use of additive utility functions.
2. The proposed approach
In order to address all the aforementioned issues, a new methodology has been developed to estimate the parameters of an outranking relation classification model within the context of the ELECTRE TRI method. Similarly to the approach of Mousseau and Slowinski (1998), the proposed methodology implements the pessimistic assignment procedure, considering both the concordance and the discordance tests. The methodology combines heuristic techniques for the specification of the preference, indifference and veto thresholds with linear programming techniques for the specification of the criteria's weights and the cut-off point. The major advantages of this methodology can be summarized in the following two points:
1. It is computationally efficient, even for large data sets.
2. It implements the most significant feature of the ELECTRE TRI method, i.e., the discordance test.
The proposed methodology is a regression-based approach that implements the preference disaggregation philosophy, leading to an indirect specification of the parameters involved in the construction and exploitation of an outranking relation within the context of the ELECTRE TRI method. In particular, the determination of the parameters involved in the ELECTRE TRI method is based on the analysis of a reference set of alternatives
which are classified into the pre-specified ordered groups (the first group contains the most preferred alternatives and the last group the least preferred ones). The outcome of the methodology involves the specification of the criteria's weights, the cut-off point, and a pair of parameters for each criterion and each reference profile, defined from the corresponding profile value and thresholds. The specification of these parameters provides information similar to the specification of the reference profiles and the thresholds themselves; in particular, using them, the partial concordance index and the discordance index can be computed directly.
To facilitate the presentation of the proposed approach, the global concordance index is henceforth defined on the basis of the partial concordance indices as their weighted sum⁸. Similarly, the credibility index is defined on the basis of the global concordance index and the discordance indices, as in the standard ELECTRE TRI method (cf. Chapter 3).

⁸ In contrast to the discussion of the ELECTRE TRI method in Chapter 3, in this presentation the criteria's weights are assumed to sum up to 1.
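For concreteness, the sketch below shows how these indices are typically computed in the standard ELECTRE TRI scheme; the notation and the textbook formulas are assumptions for illustration and do not reproduce the reparameterized version used in the book.

```python
# Sketch: global concordance and credibility indices for one alternative and
# one reference profile, with criteria weights summing to 1. Standard ELECTRE
# TRI formulas are assumed. Illustrative only.
import numpy as np

def concordance(perf, profile, q, p, weights):
    c = np.clip((perf - profile + p) / (p - q), 0.0, 1.0)   # partial indices
    return c, float(np.dot(weights, c))                     # weights sum to 1

def discordance(perf, profile, p, v):
    return np.clip((profile - perf - p) / (v - p), 0.0, 1.0)

def credibility(perf, profile, q, p, v, weights):
    c, C = concordance(perf, profile, q, p, weights)
    d = discordance(perf, profile, p, v)
    sigma = C
    for dj in d:
        if dj > C:                        # only strongly discordant criteria count
            sigma *= (1.0 - dj) / (1.0 - C)
    return sigma

w = np.array([0.4, 0.3, 0.3])
print(credibility(np.array([0.9, 0.2, 0.6]), np.array([0.5, 0.5, 0.5]),
                  q=0.05, p=0.2, v=0.6, weights=w))
```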
The specification of these parameters in the proposed methodology is performed in two stages. The first stage involves the concordance test, in order to determine the criteria's weights and the parameters used in the computation of the partial concordance indices. The latter parameters are specified through the following algorithm, which is applied on the reference set for each criterion separately⁹.
Step 1: Rank-order the performances of the alternatives of the reference set on the criterion, from the lowest to the highest of its distinct values.
Step 2: Break down the range of the criterion's values into sub-intervals whose endpoints are consecutive values corresponding to two alternatives from different groups, and calculate the midpoint of each such sub-interval.
Step 3: For each pair of consecutive groups, select, among the midpoints computed in Step 2, the values of the parameters that maximize a measure of the separation between the set of alternatives belonging to the groups above the corresponding profile and the set of alternatives belonging to the groups below it.
Steps 1 and 2 are inspired by the algorithm of Fayyad and Irani (1992), which is one of the most popular approaches for the discretization of quantitative criteria in machine learning algorithms. It should be noted that the above algorithm does not lead to the optimal specification of the parameters. Instead, it leads to the identification of "reasonable" values for these parameters, which can provide a useful basis for an interactive decision aiding process.
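A small sketch of the kind of interval-splitting heuristic described in Steps 1-3 is given below. It is an illustration only: the scoring rule used to pick the best midpoint is a simple separation count standing in for the exact difference maximized in the text, and all names are hypothetical.

```python
# Sketch: candidate cut points for one criterion, in the spirit of Steps 1-3
# above (Fayyad/Irani-style). Illustrative only.
def candidate_midpoints(values, labels):
    """Midpoints between consecutive distinct values coming from different groups."""
    pairs = sorted(set(zip(values, labels)))
    cuts = []
    for (v1, g1), (v2, g2) in zip(pairs, pairs[1:]):
        if g1 != g2 and v1 != v2:
            cuts.append((v1 + v2) / 2.0)
    return cuts

def best_cut(values, labels, upper_groups):
    """Pick the midpoint that best separates 'upper_groups' from the rest."""
    def score(cut):
        above = sum(1 for v, g in zip(values, labels)
                    if g in upper_groups and v >= cut)
        below = sum(1 for v, g in zip(values, labels)
                    if g not in upper_groups and v < cut)
        return above + below
    return max(candidate_midpoints(values, labels), key=score)

perf = [0.2, 0.5, 0.6, 0.9, 1.1, 1.4]            # performances on one criterion
grp = [3, 3, 2, 2, 1, 1]                         # group 1 = most preferred
print(best_cut(perf, grp, upper_groups={1, 2}))  # cut separating groups {1,2} from {3}
```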
⁹ All criteria are assumed to be of increasing preference.
Once these parameters are specified through the above procedure, a linear program is solved to estimate the criteria's weights, taking into account the number of alternatives of the reference set that belong to each group. The objective of this linear program is to minimize the total magnitude of the violations of the classification conditions over the reference set, with a small positive constant used to model the strict inequalities.
Linear programs of the above form often have multiple optimal solutions. Furthermore, it is even possible that near-optimal solutions provide a more accurate classification of the alternatives than the attained optimal solution (this is because the objective function of the above linear program does not consider the number of misclassifications). These issues clearly indicate the necessity of exploring the existence of alternative optimal or near-optimal solutions. However, performing a thorough search of the polyhedron defined by the constraints of the above linear program could be a difficult and time-consuming process. To overcome this problem, the heuristic procedure proposed by Jacquet-Lagrèze and Siskos (1982) for the UTA method is used (this procedure is also employed in the UTADIS method, cf. sub-section 2.3.2 of Chapter 4). This procedure involves the realization of a post-optimality analysis stage in order to identify a characteristic subset of the set of feasible solutions of the above linear program. In particular, the partial exploration of the feasible set involves the identification of solutions that
maximize the criteria's weights. Thus, during this post-optimality stage, n alternative optimal or near-optimal solutions are identified, corresponding to the maximization of the weights of the n criteria, one at a time. This enables the derivation of useful conclusions on the stability of the estimated parameters (criteria's weights). The criteria's weights that are used in building the outranking relation are then computed as the average of all solutions identified during the post-optimality process (Jacquet-Lagrèze and Siskos, 1982). Alternative procedures to aggregate the results of the post-optimality analysis are also possible (Siskos, 1982). At this point of the methodology all the information required to compute the concordance index is available. Assuming (for the moment) that no criterion has a veto capability, the assignment of the alternatives is performed using the pessimistic procedure, on the basis of the comparison of the global concordance index with the cut-off point (classification rule (A11)).
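A sketch of the pessimistic assignment rule in this concordance-only setting follows (generic form; group 1 is taken as the most preferred, and the exact statement of rule (A11) in the book may differ in details such as tie handling):

```python
# Sketch: pessimistic assignment using only the global concordance index.
# Profiles are ordered from the one separating groups 1/2 down to the one
# separating groups q-1/q; group 1 is the most preferred. Illustrative only.
import numpy as np

def pessimistic_assignment(perf, profiles, q, p, weights, cut_off):
    """Return the group index (1 = most preferred) for one alternative."""
    n_groups = len(profiles) + 1
    for k, profile in enumerate(profiles, start=1):     # best profile first
        partial = np.clip((perf - profile + p) / (p - q), 0.0, 1.0)
        if float(np.dot(weights, partial)) >= cut_off:
            return k                                     # outranks the lower profile of group k
    return n_groups                                      # outranks no profile -> worst group

weights = np.array([0.3, 0.3, 0.2, 0.2])
profiles = [np.array([0.8, 0.8, 0.8, 0.8]),              # lower profile of group 1
            np.array([0.4, 0.4, 0.4, 0.4])]              # lower profile of group 2
alt = np.array([0.9, 0.7, 0.5, 0.85])
print(pessimistic_assignment(alt, profiles, q=0.05, p=0.2,
                             weights=weights, cut_off=0.7))
```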
Before the end of the first stage of the methodology (the concordance test), the sets of misclassified alternatives are identified. These sets include the misclassified alternatives identified on the basis of the concordance test. In particular, each set includes the alternatives of a given group that are assigned (according to the concordance test and the classification rule (A11)) to more preferred or to less preferred groups.
The second stage of the process involves the discordance test and the specification of the parameter that corresponds to the veto threshold. The specification of this parameter is based on the three-step algorithm used also in the concordance test. Beginning from step 2 of this algorithm, a criterion is given veto ability only if this does not aggravate the classification of the alternatives obtained on the basis of the concordance test. In particular, for each profile (k = 1, 2, …, q–1) and every criterion, the veto-related parameter is initially set at one of the candidate values (sub-interval midpoints) produced by step 2. On the basis of this setting, (A2) is used to compute the discordance index and then the credibility index is computed using (A3). On the basis of the credibility index, new sets of misclassified alternatives are formed using (A12); in applying relation (A12) at this stage, the cut-off level is set equal to 0.5 and the credibility index is used instead of the global concordance index. If the cardinality of each new set of misclassified alternatives is smaller than or equal to the cardinality of
the corresponding set identified on the basis of the concordance test, then it is considered that the candidate value does not aggravate the classification of the alternatives obtained through the concordance test, and it is therefore an acceptable value for the veto threshold parameter. If this is not the case, the candidate value is unacceptable. In either case, the procedure proceeds by examining the next candidate value and the above process is repeated. If this process indicates that all candidate values for a criterion provide a worse classification of the alternatives compared to the concordance test, then no veto capability is given to the criterion regarding the comparison of the alternatives to the corresponding profile. If, overall, no criterion is given a veto capability, the procedure ends; otherwise the cut-off level needs to be determined. This is performed as follows: if the groups are perfectly separable on the basis of the credibility indices, the cut-off level is set directly between the credibility indices of the alternatives of consecutive groups; otherwise its specification is performed through a linear program that minimizes the violations of the corresponding separation conditions, where a small positive constant is again used to model the strict inequalities.
Chapter 6 Classification problems in finance
1. INTRODUCTION
Financial management is a broad and rapidly developing field of management science. The role of financial management covers all aspects of business activity, including investment, financing and dividend policy issues. During the last decades the globalization of the financial markets, the intensifying competition between corporate entities and the socio-political and technological changes have increased the complexity of the business, economic and financial environments. Within this new context the smooth financial operation of any corporate entity and organization becomes a crucial issue for its sustainable growth and development. Nevertheless, the increasing complexity of the financial environment poses new challenges that need to be faced. The plethora of new financial products that are now available to firms and organizations as risk management, investment and financing instruments is indicative of the transformations that have occurred in the finance industry over the past decades and of the existing complexity in this field. To address this complexity it is necessary to adjust the financial decision-making methodologies so that they meet the requirements of the new financial environment. Empirical approaches are no longer adequate. Instead, there is a gradually increasing worldwide trend towards the development and implementation of more sophisticated approaches based on advanced quantitative analysis techniques, such as statistics, optimization, forecasting, simulation, stochastic processes, artificial intelligence and operations research.
The roots of this new approach to financial decision-making can be traced back to the 1950s and the work of the Nobel laureate Harry Markowitz (1952, 1959) on portfolio theory and the use of mathematical programming techniques for portfolio construction. Since then, the contributions of applied mathematics, statistics and econometrics, operations research, artificial intelligence and computer science, in conjunction with the advances of finance theory, have played a major role in addressing the complexity of financial decision-making problems.
The application of the aforementioned quantitative analysis techniques in financial decision-making is of interest both to practitioners and to researchers. In particular, practitioners in the finance industry are interested in the development and implementation of efficient quantitative approaches that can provide support in their daily practice. On the other hand, researchers from the aforementioned fields often consider financial decision-making problems as an excellent field where the outcomes of ongoing theoretical research can be tested under complex and challenging real-world conditions.
Several quantitative analysis techniques applied in finance implement the classification paradigm. This should come as no surprise, since a variety of financial decisions need to be taken following the classification approach. Some typical examples include:
– Business failure prediction: discrimination between failed and non-failed firms.
– Credit risk assessment: discrimination of firms of low credit risk from firms of high credit risk (default).
– Corporate mergers and acquisitions: discrimination between firms that are likely to be merged or acquired and firms whose ownership status is not expected to change.
– Stock evaluation and mutual funds' assessment: classification of stocks or mutual funds into predefined groups according to their suitability as investment instruments for a particular investor. The classification can be performed in terms of their expected future returns, their risk or any other evaluation criterion that is considered relevant by the decision-maker/investor. Several investment firms have adopted this approach in their evaluation of stocks and mutual funds (Standard & Poor's Rating Services, 1997, 2000; Moody's Investors Service, 1998, 2000; Sharpe, 1998).
– Bond rating: evaluation of corporate or government bond issues according to the characteristics of the issuer and classification into rating groups. Several well-known financial institutions follow this approach in their bond ratings (e.g., Moody's, Standard & Poor's, Fitch Investors Service).
- Country risk assessment: evaluation of the performance of countries, taking into consideration economic measures as well as social and political indicators, in order to classify the countries into predefined groups according to their default risk. Such classifications are available from leading financial institutions including Moody's (Moody's Investors Service, 1999) and Standard & Poor's.
- Venture capital investments: evaluation of venture capital investment projects and classification into those that should be accepted, rejected or submitted to further analysis (Zopounidis, 1990).
- Assessment of the financial performance of organizations (banks, insurance companies, public firms, etc.): classification of the organizations into predefined groups according to their financial performance.
All the above examples are illustrative of the significance of developing efficient classification models for financial decision-making purposes. On this basis, the objective of this chapter is to explore the efficiency of the proposed MCDA paradigm in modeling and addressing financial classification problems. This analysis extends the results of the previous chapter through the consideration of real-world classification problems. Three financial decision-making problems are used for this purpose:
1. Bankruptcy prediction.
2. Corporate credit risk assessment.
3. Stock evaluation.
In each of these problems the methods considered in the experimental design of Chapter 5 are used to develop appropriate classification models. With regard to the application of the UTADIS method, it should be noted that all the subsequent results are obtained using the heuristic HEUR2, which was found to outperform HEUR1 in many of the cases considered in the simulation of Chapter 5.
2. BANKRUPTCY PREDICTION

2.1 Problem domain
In the field of corporate finance, any individual, firm or organization that establishes some form of relationship with a corporate entity (e.g., as an investor, creditor or stockholder) is interested in the analysis of the performance and viability of the firm under consideration. Financial researchers have explored this issue from different points of view, considering the different forms of financial distress, including default, insolvency and bankruptcy. Essentially, the term "bankruptcy" refers to the
termination of the operation of the firm following a filing for bankruptcy due to severe difficulties of the firm in meeting its financial obligations to its creditors. The other forms of financial distress, on the other hand, do not necessarily lead to the termination of the operation of the firm. Further details on the different forms of financial distress can be found in the books of Altman (1993) and Zopounidis and Dimitras (1998).
The consequences of bankruptcy are not restricted to the individuals, firms or organizations that have an established relationship with the bankrupt firm; they often extend to the whole economic, business and social environment of a country or a region. For instance, developing countries are often quite vulnerable to corporate bankruptcies, especially when the bankruptcy involves a firm with a major impact on the country's economy. Furthermore, taking into account the globalization of the economic environment, it becomes clear that such a case may also have global implications. The recent crisis in Southeast Asia is an indicative example. These findings demonstrate the necessity of developing and implementing efficient procedures for bankruptcy prediction. Such procedures are necessary for financial institutions, individual and institutional investors, the firms themselves, and even policy makers (e.g., government officers, central banks, etc.).
The main goal of bankruptcy prediction procedures is to discriminate the firms that are likely to go bankrupt from the healthy firms. This is a two-group classification problem. However, an additional group is often considered to add flexibility to the analysis. This intermediate group may include firms for which it is difficult to reach a clear conclusion. Some researchers place in such an intermediate group distressed firms that finally survive through restructuring plans, including mergers and acquisitions (Theodossiou et al., 1996). The classification of the firms into groups according to their bankruptcy risk is usually performed on the basis of their financial characteristics, using information derived from the available financial statements (i.e., balance sheet and income statement). Financial ratios calculated from the accounts of the financial statements are the most widely used bankruptcy prediction criteria. Nevertheless, making bankruptcy predictions solely on the basis of financial ratios has been criticized by several researchers (Dimitras et al., 1996; Laitinen, 1992). The criticism has mainly focused on the fact that financial ratios are only the symptoms of the operating and financial problems that a firm faces, rather than the cause of these problems. To overcome this shortcoming, several researchers have noted the significance of considering additional qualitative information in bankruptcy prediction. Such qualitative information involves criteria such as the management of the firms, their organization, the market niche/position, the market's trends, their
special competitive advantages, etc. (Zopounidis, 1987). However, this information is not publicly available and consequently quite difficult to gather. This difficulty justifies the fact that most existing studies on bankruptcy prediction are based only on financial ratios. The first approaches used for bankruptcy prediction were empirical. The most well-known approaches of this type include the “5 C method” (Character, Capacity, Capital, Conditions, Coverage), the “LAPP” method (Liquidity, Activity, Profitability, Potential), and the “Creditmen” method (Zopounidis, 1995). Later more sophisticated univariate statistical approaches were introduced in this field to study the discriminating power of financial ratios in distinguishing the bankrupt firms from the non-bankrupt ones (Beaver, 1966). However, the real thrust in the field of bankruptcy prediction was given by the work of Altman (1968) on the use of linear discriminant analysis (LDA) for developing bankruptcy prediction models. Altman used LDA in order to develop a bankruptcy prediction model considering several financial ratios in a multivariate context. This study motivated several other researchers towards the exploration of statistical and econometric techniques for bankruptcy prediction purposes. Some characteristic studies include the work of Altman et al. (1977) on the use of QDA, the works of Jensen (1971), Gupta and Huefner (1972) on cluster analysis, the work of Vranas (1992) on the linear probability model, the works of Martin (1977), Ohlson (1980), Zavgren (1985), Peel (1987), Keasey et al. (1990) on logit analysis, the works of Zmijewski (1984), Casey et al. (1986), Skogsvik (1990) on probit analysis, the work of Luoma and Laitinen (1991) on survival analysis, and the work of Scapens et al. (1981) on catastrophe theory. During the last two decades new non-parametric approaches have gained the interest of the researchers in the field. These approaches include among others, mathematical programming (Gupta et al., 1990), expert systems (Elmer and Borowski, 1988; Messier and Hansen, 1988), machine learning (Frydman et al., 1985), rough sets (Slowinski and Zopounidis, 1995; Dimitras et al., 1999), neural networks (Wilson and Sharda, 1994; Boritz and Kennedy, 1995), and MCDA (Zopounidis, 1987; Andenmatten, 1995; Dimitras et al., 1995; Zopounidis, 1995; Zopounidis and Dimitras, 1998). The results of these studies have shown that the aforementioned new approaches are well-suited to the bankruptcy prediction problem providing satisfactory results compared to the traditional statistical and econometric techniques. A comprehensive review of the relevant literature on the bankruptcy prediction problem can be found in the books of Altman (1993), Zopounidis and Dimitras (1998), as well as in the works of Keasey and Watson (1991), Dimitras et al. (1996), Altman and Saunders (1998).
2.2 Data and methodology
The data of this application originate from the study of Dimitras et al. (1999). Two samples of Greek industrial firms are considered. The first sample includes 80 firms and is used for model development purposes, while the second sample consists of 38 firms and serves as the validation sample. Henceforth, the first sample will be referred to as the basic sample, and the second one as the holdout sample.
The basic sample includes 40 firms that went bankrupt during the period 1986–1990. The specific time of bankruptcy is not common to all firms. In particular, among these 40 firms, 6 went bankrupt in 1986, 10 in 1987, 9 in 1988, 11 in 1989 and 4 in 1990. For each of the bankrupt firms, financial data are collected for up to five years prior to bankruptcy using their published financial statements. For instance, for the firms that went bankrupt in 1986, the collected financial data span the period 1981–1985. Consequently, the basic sample actually spans the period 1981–1989. To facilitate the presentation and discussion of the results, each year prior to bankruptcy will be denoted as year –1, year –2, year –3, year –4 and year –5. Year –1 refers to the first year prior to bankruptcy (e.g., for the firms that went bankrupt in 1986, year –1 refers to 1985); year –2 refers to the second year prior to bankruptcy (e.g., for the firms that went bankrupt in 1986, year –2 refers to 1984), etc. The bankrupt firms operate in 13 different industrial sectors, including food, textile, chemical, transport, clothing and footwear, and metallurgical industries, among others. To each of these bankrupt firms a non-bankrupt firm from the same business sector is matched. The matching is performed on the basis of the size of the firms, measured in terms of their total assets and the number of employees.
The holdout sample was compiled in a similar fashion. This sample includes 19 firms that went bankrupt in the period 1991–1993. The financial data gathered for these firms span a three-year period prior to bankruptcy. A matching approach, similar to the one used for the basic sample, has also been used for the selection of the non-bankrupt firms. The fact that the holdout sample covers a different period than the basic sample enables a better investigation of the robustness of the performance of the developed bankruptcy prediction models.
On the basis of the available financial data for the firms of the two samples, 12 financial ratios have been calculated to be used as bankruptcy prediction criteria. These financial ratios are presented in Table 6.1. The selection of these ratios is based on the availability of financial data, their relevance to the bankruptcy prediction problem as reported in the international financial literature, as well as on the experience of an expert credit manager of a leading Greek commercial bank (Dimitras et al., 1999).
Among the financial ratios considered, the first four measure the profitability of the firms. High values of these ratios correspond to profitable firms; thus, all of them are negatively related to the probability of bankruptcy. The ratios current assets/current liabilities and quick assets/current liabilities involve the liquidity of the firms and are commonly used to predict bankruptcy (Altman et al., 1977; Gloubos and Grammaticos, 1988; Zavgren, 1985; Keasey et al., 1990; Theodossiou, 1991; Theodossiou et al., 1996). Firms having enough liquid assets (current assets) are in a better liquidity position and are more capable of meeting their short-term obligations to their creditors. Thus, these two ratios are also negatively related to the probability of bankruptcy. The remaining ratios are related to the solvency of the firms and their working capital management. High values of the solvency ratios indicate severe indebtedness, in which case the firms have to generate more income to meet their obligations and repay their debt; consequently, both solvency ratios are positively related to the probability of bankruptcy. The working capital ratios are related to the working capital management efficiency of the firms. Generally, the higher the working capital of a firm, the less likely it is to go bankrupt. In that regard, the working capital ratios are negatively related to the probability of bankruptcy, except for the ratio involving inventories, which is positively related to bankruptcy (inventories are often difficult to liquidate, and consequently a firm holding a significant amount of inventory is likely to face liquidity problems).
Of course, the different industry sectors included in both the basic and the holdout samples are expected to have different financial characteristics, and thus to present differences in the financial ratios that are employed. Some researchers have examined industry effects on bankruptcy prediction models by adjusting the financial ratios to industry averages. However, the
obtained results are mixed. Platt and Platt (1990) concluded that an adjusted bankruptcy prediction model performs better than an unadjusted one, while Theodossiou (1987) did not find any essential difference or improvement. Furthermore, Theodossiou et al. (1996) argue that industry- or time-adjusted models implicitly assume that bankruptcy rates are homogeneous across industries and time, an assumption which is hardly ever the case. On this basis, no industry adjustment is made to the selected financial ratios.
Tables 6.2–6.6 present some descriptive statistics for the selected financial ratios in the two samples, for the bankrupt and the non-bankrupt groups of firms. In particular, Table 6.2 presents the means of the financial ratios, Tables 6.3 and 6.4 present the skewness and kurtosis coefficients, whereas Tables 6.5 and 6.6 present the correlation coefficients between the selected financial ratios. It is interesting to note from Table 6.2 that many of the considered financial ratios significantly differentiate the two groups of firms, at least in the case of the basic sample. However, in the holdout sample the differences between the two groups of firms are less significant. Actually, the only ratio that significantly differentiates the two groups for all three years of the holdout sample is the solvency ratio total liabilities/total assets, which measures the debt capacity of the firms.
On the basis of the two samples and the selected set of financial ratios, the development and validation of bankruptcy prediction models is performed in three stages:
1. In the first stage the data of the firms included in the basic sample for the first year prior to bankruptcy (year –1) are used to develop a bankruptcy prediction model. The predictions made with this model involve a time depth of one year: the model uses as input the financial ratios of the firms for a given year t and its output is an assessment of the bankruptcy risk of the firms in year t+1. Alternatively, it would be possible to develop a different model for each year prior to bankruptcy (years –1 up to –5); in this scheme, each model would use the financial ratios for a year t to produce an estimate of the bankruptcy risk in years t+1, t+2, …, t+5. It would also be possible to develop a multi-group bankruptcy prediction model considering not only the status of the firms (bankrupt or non-bankrupt) but also the time at which bankruptcy occurs. In such a multi-group scheme the groups could be defined as follows: non-bankrupt firms, firms that will go bankrupt in the forthcoming year t+1, firms that will go bankrupt in year t+2, in year t+3, in year t+4, and in year t+5. This approach has been used in the study of Keasey et al. (1990). Nevertheless, the fact that the
holdout sample involves a three-year period, as opposed to the five-year period of the basic sample, poses problems for the validation of the classification models developed through these alternative schemes.
2. In the second stage of the analysis the developed bankruptcy prediction models are applied to the data of the firms of the basic sample for the years –2, –3, –4 and –5. This enables the investigation of the ability of the developed models to provide early warning signals for the bankruptcy status of the firms used to develop these models.
3. In the final stage of the analysis, the developed models are applied to the three years of the holdout sample. This provides an assessment of the generalizing ability of the models when different firms for a different time period are considered. The above three-stage procedure was used for all the considered classification methods. The sub-sections that follow analyze and compare the obtained results.
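Before turning to the developed models, the following minimal sketch illustrates how such a three-stage development and validation procedure can be organized in code. The data files, column names and the classifier used here (a logistic regression standing in for any of the classification methods compared in this chapter) are assumptions for illustration only:

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical data layout: one file per year prior to bankruptcy, with
# columns R1..R12 for the financial ratios and "bankrupt" in {0, 1}.
ratios = [f"R{i}" for i in range(1, 13)]
basic = {y: pd.read_csv(f"basic_year_{y}.csv") for y in (-1, -2, -3, -4, -5)}
holdout = {y: pd.read_csv(f"holdout_year_{y}.csv") for y in (-1, -2, -3)}

def error_rates(model, df):
    """Type I and type II error rates of a fitted model on one data set."""
    pred = model.predict(df[ratios])
    bankrupt = (df["bankrupt"] == 1).to_numpy()
    type1 = (pred[bankrupt] == 0).mean()    # bankrupt firms classified as healthy
    type2 = (pred[~bankrupt] == 1).mean()   # healthy firms classified as bankrupt
    return type1, type2

# Stage 1: develop the model on year -1 of the basic sample (the reference set).
model = LogisticRegression(max_iter=1000).fit(basic[-1][ratios], basic[-1]["bankrupt"])

# Stage 2: apply it to years -2 ... -5 of the basic sample (early-warning ability).
for y in (-2, -3, -4, -5):
    print("basic", y, error_rates(model, basic[y]))

# Stage 3: apply it to the three years of the holdout sample (generalizing ability).
for y in (-1, -2, -3):
    print("holdout", y, error_rates(model, holdout[y]))
```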
2.3 The developed models

2.3.1 The model of the UTADIS method
To apply the UTADIS method the heuristic HEUR2 is used for the specification of the piece-wise linear form of the marginal utility functions. Following this approach, the additive utility model developed for bankruptcy prediction purposes has the following form:
The coefficients of the marginal utilities in this function show that the most significant ratios for the discrimination of the two groups of firms and the prediction of bankruptcy are the solvency ratios total liabilities/total assets and net worth/(net worth + long-term liabilities). For the other ratios there are no significant differences in their contribution to bankruptcy prediction. Only the ratio inventory/working capital has very low significance in the developed additive utility model. The specific form of the marginal utility functions of the developed model is illustrated in Figure 6.1. On the basis of this additive utility model, the classification of a firm as bankrupt or non-bankrupt is performed through the following rule: if the global utility of the firm is greater than or equal to the estimated cut-off utility level, the firm is classified as non-bankrupt; if it is lower than the cut-off level, the firm is classified as bankrupt.
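Since the estimated marginal utilities of the model cannot be reproduced here, the sketch below only illustrates how an additive utility model of this kind is applied once it has been developed; the break-points, marginal utility values and cut-off level are placeholders, not the estimates of the actual UTADIS model:

```python
import numpy as np

# Illustrative (placeholder) piecewise-linear marginal utilities: for each
# ratio, break-points of its scale and the (already weighted) marginal
# utility values at those break-points.
marginal_utilities = {
    "total liabilities/total assets": ([0.0, 0.5, 0.8, 1.5], [0.30, 0.20, 0.05, 0.00]),
    "net income/total assets":        ([-0.5, 0.0, 0.1, 0.4], [0.00, 0.10, 0.20, 0.25]),
    # ... one entry per remaining ratio
}
cutoff = 0.35  # placeholder for the estimated cut-off utility level

def global_utility(firm):
    """Additive utility: sum of the marginal utilities of the firm's ratios."""
    return sum(np.interp(firm[r], pts, vals)
               for r, (pts, vals) in marginal_utilities.items())

def classify(firm):
    return "non-bankrupt" if global_utility(firm) >= cutoff else "bankrupt"

firm = {"total liabilities/total assets": 0.85, "net income/total assets": 0.02}
print(classify(firm))  # -> "bankrupt" with these placeholder values
```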
2.3.2 The model of the MHDIS method
In the case of the MHDIS method, based on the financial ratios of the firms for year –1, two additive utility functions are developed, since there are only two groups (bankrupt and non-bankrupt firms). The first additive utility function characterizes the non-bankrupt firms, while the second one characterizes the bankrupt firms. The form of these two functions is the following:
According to the weighting coefficients of the marginal utilities in the above utility functions, the dominant factors for the estimation of bankruptcy risk are the profitability ratio net income/total assets and the solvency ratio total liabilities/total assets. The latter was also found to be significant in the bankruptcy prediction model of the UTADIS method. In the case of the MHDIS method, the ratio net income/total assets mainly characterizes the non-bankrupt firms, since its weight in the utility function describing the non-bankrupt firms is more than 85%. On the other hand, the fact that the weight of this ratio in the utility function describing the bankrupt firms is only 3.85% indicates that while high values of net income/total assets are a significant characteristic of non-bankrupt firms, low values do not necessarily indicate bankruptcy. The second ratio that is
found significant, i.e., the ratio total liabilities/total assets, mainly characterizes the bankrupt firms, since its weight in the utility function describing the bankrupt firms exceeds 88%. However, the weight of this ratio in the utility function describing the non-bankrupt firms is only 2.81%, indicating that while high values of total liabilities/total assets characterize the bankrupt firms, low values of this ratio are not a significant characteristic of non-bankrupt firms. The forms of the marginal utility functions for these two ratios (Figure 6.2) provide some insight into the above remarks. In particular, the form of the marginal utility function of the profitability ratio net income/total assets indicates that firms with net income/total assets higher than 1.29% are more likely to be classified as non-bankrupt. On the other hand, firms with total liabilities/total assets higher than 77.30% are more likely to go bankrupt. These results indicate that profitability and solvency are the two main distinguishing characteristics of non-bankrupt and bankrupt firms, according to the model developed through MHDIS.
The decision regarding the classification of a firm into one of the two considered groups (bankrupt and non-bankrupt) is based upon the global utilities obtained through the two developed additive utility functions. In that sense, a firm is considered bankrupt if its global utility in the function characterizing the bankrupt firms exceeds its global utility in the function characterizing the non-bankrupt firms, and non-bankrupt otherwise. Throughout the application there were no cases where the two global utilities were equal.
2.3.3 The ELECTRE TRI model
The application of the ELECTRE TRI method is based on the procedure presented in Chapter 5 for the specification of the parameters of the outranking relation classification model. In applying this procedure, the concordance test is performed first to specify the vector of preference thresholds and the vector of indifference thresholds for all criteria. At the same stage the criteria weights are estimated, and an initial estimate of the cut-off
point is also obtained. All this information is used to perform an initial classification of the firms of the reference set (year –1, basic sample) and to measure the classification error rate. Then, in a second stage, the impact of the discordance test on the classification results is explored. If the discordance test improves the classification of the firms, it is used in the construction of the outranking relation classification model; otherwise its results are ignored and the classification is performed considering only the results of the concordance test. In this bankruptcy prediction case study, the discordance test was not found to improve the classification results obtained through the concordance test. Consequently, all the presented results are the ones obtained from the first step of the parameter estimation procedure, involving only the concordance test. The results of the parameter estimation process for the considered bankruptcy prediction data are presented in Table 6.7.
According to the weights of the financial ratios in the outranking relation model of the ELECTRE TRI method, the most significant ratios for the prediction of bankruptcy are the profitability ratio net income/total assets and the solvency ratios net worth/(net worth + long-term liabilities) and current liabilities/total assets. The net income/total assets ratio was also found to be a significant factor in the bankruptcy prediction model of the MHDIS method. The classification of the firms as bankrupt or non-bankrupt according to the outranking relation model of the ELECTRE TRI method is performed
using only the results of the concordance test, since the use of the discordance test was not found to improve the classification results for the data of the reference set (i.e., year -1 of the basic sample). Therefore, the credibility index is defined on the basis of the global concordance index, as follows:
for all firms, the credibility index is set equal to the global concordance index, that is, the weighted average of the partial concordance indices of the firm with respect to the reference profile separating the two groups. On the basis of the credibility index calculated in this way, the classification of the firms is performed through the following rule (the cut-off point is estimated through the procedure described in the appendix of Chapter 5): if the credibility index of a firm is at least equal to the cut-off point, the firm is classified as non-bankrupt; if it is lower than the cut-off point, the firm is classified as bankrupt.
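A minimal sketch of this concordance-only classification scheme is given below. The thresholds, weights, reference profile and cut-off level are illustrative placeholders (the estimated values are those of Table 6.7), and all criteria are assumed, for simplicity, to be maximized:

```python
def partial_concordance(x, b, q, p):
    """Concordance with the assertion "the firm is at least as good as the
    profile b" on one criterion (to be maximized), with indifference
    threshold q and preference threshold p (0 <= q <= p)."""
    if x >= b - q:
        return 1.0
    if x <= b - p:
        return 0.0
    return (x - (b - p)) / (p - q)

def credibility(firm, profile, weights, q, p):
    """Concordance-only credibility index: weighted average of the
    partial concordance indices (no discordance test)."""
    total = sum(w * partial_concordance(firm[i], profile[i], q[i], p[i])
                for i, w in enumerate(weights))
    return total / sum(weights)

# Illustrative values only (not the estimated parameters of Table 6.7).
weights = [0.10, 0.25, 0.40, 0.25]
profile = [0.05, 0.12, 0.60, 0.30]   # reference profile separating the two groups
q = [0.01, 0.02, 0.05, 0.03]         # indifference thresholds
p = [0.03, 0.05, 0.10, 0.06]         # preference thresholds
cutoff = 0.60                        # cut-off level for the credibility index

firm = [0.02, 0.10, 0.75, 0.28]
label = "non-bankrupt" if credibility(firm, profile, weights, q, p) >= cutoff else "bankrupt"
print(label)
```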
2.3.4 The rough set model
Similarly to the experimental simulation of Chapter 5, the MODLEM algorithm is used to develop a rough set model for the prediction of bankruptcy. The resulting model consists of 10 decision rules, presented in Table 6.8. The rule-based model considers only six ratios, thus leading to a significant decrease in the information required to derive bankruptcy predictions. The ratios used in the developed rules include profitability ratios, solvency ratios and a working capital ratio. Some of these ratios were also found significant in the bankruptcy models of the MCDA classification techniques. In particular: (a) the net income/total assets ratio was found significant by the MHDIS method and the ELECTRE TRI method, (b) the total liabilities/total assets ratio was found significant by UTADIS and MHDIS, and (c) the net worth/(net worth + long-term liabilities) ratio was found significant by UTADIS and ELECTRE TRI. Six of the developed rules involve the bankrupt firms, whereas the remaining four rules involve the non-bankrupt ones. It is also interesting to
note that the rules for the non-bankrupt firms are stronger than the rules corresponding to the bankrupt firms. The average strength of the rules for the non-bankrupt firms is 15.75, as opposed to 11.17 for the rules of the bankrupt firms. These two findings (the number and the strength of the rules per group) indicate that, generally, it is more difficult to describe the bankrupt firms than the non-bankrupt ones.
2.3.5 The statistical models
The application of the three statistical classification techniques to the bankruptcy prediction data led to the development of three bankruptcy prediction models. The functional forms of these models are the standard linear discriminant, quadratic discriminant and logit functions, and the corresponding bankruptcy prediction rules are the following:
1. Linear discriminant analysis (LDA): if the linear discriminant score of a firm exceeds the estimated cut-off point, the firm is classified as non-bankrupt; otherwise it is classified as bankrupt.
2. Quadratic discriminant analysis (QDA): if the quadratic discriminant score of a firm exceeds the estimated cut-off point, the firm is classified as non-bankrupt; otherwise it is classified as bankrupt.
3. Logit analysis (LA): if the group-membership estimate provided by the logit model exceeds the corresponding cut-off point, the firm is classified as non-bankrupt; otherwise it is classified as bankrupt.
Tables 6.9 and 6.10 present the estimates of the parameters of the above models, including the constant terms, the discriminant coefficients and, in the case of QDA, the cross-product terms.
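A brief sketch of how models of these three types can be estimated with standard libraries; the data file and column names are hypothetical, and the sketch reproduces the types of models used rather than the reported coefficients:

```python
import pandas as pd
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)
from sklearn.linear_model import LogisticRegression

# Hypothetical reference set: year -1 of the basic sample, with a
# "bankrupt" column and one column per financial ratio.
train = pd.read_csv("basic_year_-1.csv")
X, y = train.drop(columns=["bankrupt"]), train["bankrupt"]

models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "LA": LogisticRegression(max_iter=1000),
}
for name, m in models.items():
    m.fit(X, y)

# The constant terms and coefficients of the linear models can be read
# from the fitted objects (QDA stores group means and covariances instead).
print(models["LDA"].intercept_, models["LDA"].coef_)
print(models["LA"].intercept_, models["LA"].coef_)
```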
2.4 Comparison of the bankruptcy prediction models
The detailed classification results obtained through all the above bankruptcy prediction models are presented in Tables 6.10 and 6.11 for the basic and the holdout sample, respectively. The presented results involve the two types of error rates. The type I error refers to the classification of bankrupt firms as non-bankrupt, whereas the type II error refers to the classification of non-bankrupt firms as bankrupt. Generally, the type I error leads to a capital loss (a firm that goes bankrupt cannot fulfill its debt obligations to its creditors), while the cost of the type II error has the form of an opportunity cost (the creditor loses the opportunity to gain revenues from granting credit to a healthy firm). In that sense, the type I error is clearly much more costly than the type II error. Altman (1993) argues that the cost of the type I error for banks in the USA is approximately 62% of the amount of the granted loan, whereas the cost of the type II error is only 2% (the difference between a risk-free investment and the interest rate of the loan). However, to obtain a good measure of the overall classification performance it is necessary to consider both the cost of each error type and the a-priori probability that an error of that type may occur. Generally, the number of non-bankrupt firms is considerably larger than the number of firms that go bankrupt. For instance, Altman (1993) notes that in the USA bankrupt firms constitute approximately 5% of the total population of firms. Of course, this percentage varies from country to country and over time. Nevertheless, it gives a clear indication that the probability that a firm will go bankrupt is considerably lower than the probability that it will not. On the basis of these remarks, and since the higher cost of the type I error is counterbalanced by its lower prior probability, it is reasonable to assume that the two error types contribute equally to the overall error rate. Therefore, the overall error rate is estimated as the average of the two error types. For more details on the manipulation of the probabilities and the costs associated with the type I and II errors see Theodossiou et al. (1996) and Bardos (1998).
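The computation of the two error types and of the overall error rate as their simple average can be summarized in a few lines; the toy labels in the usage example are, of course, only for illustration:

```python
import numpy as np

def error_rates(y_true, y_pred, positive=1):
    """Type I, type II and overall error rates, where `positive` marks
    the bankrupt (or high-risk) group."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos = y_true == positive
    type1 = np.mean(y_pred[pos] != positive)    # bankrupt classified as non-bankrupt
    type2 = np.mean(y_pred[~pos] == positive)   # non-bankrupt classified as bankrupt
    return type1, type2, (type1 + type2) / 2    # overall error = simple average

print(error_rates([1, 1, 0, 0, 0], [0, 1, 0, 1, 0]))  # (0.5, 0.333..., 0.416...)
```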
The comparison of the bankruptcy prediction results regarding the overall error rate in the basic sample shows that UTADIS, MHDIS and rough sets perform rather better than the other techniques. In particular, in year –1 rough sets perform better than all the other techniques, followed by UTADIS, MHDIS and ELECTRE TRI. This is of no surprise: recall that the data of the firms for the first year prior to bankruptcy are used as the reference set for model development. Rough sets and all the MCDA classification techniques have a higher fitting ability compared to the three statistical techniques and consequently they are expected to perform better in terms of the error rate in the reference set. In year –2 UTADIS and MHDIS provide the same overall error rate, which is significantly lower than that of all the other methods. In year –3 MHDIS provides the lowest error rate, followed by UTADIS (HEUR2). In year –4 the lowest overall error rate is obtained by the UTADIS model, whereas in year –5 the best result is obtained by the models of MHDIS, rough sets and LA.
As far as the holdout sample is concerned, it is clear that the overall error rate of all methods increases compared to the basic sample. The comparison of the methods shows that the bankruptcy prediction model of UTADIS performs better than the other models in years –1 and –2, whereas in year –3 the lowest overall error rate is obtained by the model of the MHDIS method. It is important to note that all three MCDA methods (UTADIS, MHDIS and ELECTRE TRI) outperform the three statistical techniques in all three years of the holdout sample. This is a significant finding with regard to the relative efficiency of the corresponding bankruptcy prediction models. It is also interesting to note that the bankruptcy prediction models of UTADIS and MHDIS provide significantly lower type I error rates than the other techniques. For both models the type I error rate is lower than 50% for all years of the holdout sample, whereas for the other methods it considerably exceeds 50% in many cases.
Overall, it should be noticed that the type I error rate is higher than the type II error rate for all methods in both the basic and the holdout samples. Nevertheless, this is not a surprising result. Generally, the process which leads a firm to bankruptcy is a dynamic one and it cannot be fully explained through the examination of the financial characteristics of the firm. In the beginning of this process the financial characteristics of non-bankrupt and bankrupt firms are usually similar (Dimitras et al., 1998). As time evolves, specific changes in the environment (internal and external) in which the firm operates, such as changes in the management of the firm or changes in the market, may lead the firm to face significant problems which ultimately lead to bankruptcy. Thus, non-bankrupt firms are rather easier to describe than bankrupt firms in terms of their financial characteristics (they remain in good position over time).
The above finding has motivated researchers to propose the consideration of additional qualitative strategic variables in bankruptcy prediction models, including among others the management of the firms, their organization, their market niche/position, their technical facilities, etc. (Zopounidis, 1987). Actually, as pointed out by Laitinen (1992), the inefficiency of the firms along these qualitative factors is the true cause of bankruptcy; the poor financial performance is only a symptom of bankruptcy rather than its cause.
Despite the significance of considering qualitative strategic variables in developing bankruptcy prediction models, their collection is a difficult process, since they are not publicly available to researchers and analysts. This difficulty was also encountered in this case study. The exclusion of qualitative information from the analysis can be considered one of the main reasons for the rather low performance of all the developed models that is apparent in some cases in both the basic and the holdout samples. However, similar results are obtained in most of the existing studies that employ the same methodological framework for bankruptcy prediction (i.e., the use of financial ratios), thus indicating that additional qualitative information is required to obtain a better description of the bankruptcy process.
Another form of information that could be useful for modeling and estimating bankruptcy risk involves the general economic conditions within which firms operate. The economic environment (e.g., inflation, interest
rates, exchange rates, taxation, etc.) often has a significant impact on the performance and the viability of firms. Consequently, considering the sensitivity of firms to the prevailing economic conditions could add a significant amount of useful information for bankruptcy prediction purposes, especially in cases of economic vulnerability and crises. Some studies have adopted this approach (Rose et al., 1982; Foster, 1986), but further research is still required on this issue. Finally, it would be useful to consider the bankruptcy prediction problem in a dynamic rather than a static context. As already noted, bankruptcy is a time-evolving event. Therefore, it could be useful to consider all the available bankruptcy-related information as time evolves in order to develop more reliable early warning models for bankruptcy prediction. Kahya and Theodossiou (1999) followed this approach and modeled the bankruptcy prediction problem in a time-series context.
All the above remarks constitute interesting research directions in the field of bankruptcy prediction. The fact that there is no research study that combines all the above issues to develop a unified bankruptcy prediction theory indicates the complexity of the problem and the significant research effort that still needs to be made. Despite this, the existing research on the development of bankruptcy prediction models should not be considered inadequate for meeting the needs of practitioners. Indeed, any model that considers publicly available information (e.g., financial ratios) and manages to outperform the estimations of expert analysts has an obvious practical usefulness. Studies on this issue have shown that bankruptcy prediction models such as the ones developed above often perform better than experts (Lennox, 1999).
3. CORPORATE CREDIT RISK ASSESSMENT

3.1 Problem domain
Credit risk assessment refers to the analysis of the likelihood that a debtor (firm, organization or individual)1 will not be able to meet its debt obligations to its creditors (default). This inability can be either temporary or permanent. This problem is often related to bankruptcy prediction; indeed, bankruptcy prediction models are often used within the credit risk assessment context. However, the two problems are slightly different: bankruptcy has mainly a legal interpretation, whereas default has a financial interpretation.
1 Without loss of generality, the subsequent analysis will focus on the case where the debtor is a firm or organization (corporate credit risk assessment).
Indeed, most authors consider that a firm is in a situation of default when the book value of its liabilities exceeds the market value of its assets (Altman, 1993). Apart from this difference in the definitions of bankruptcy and default, there is also a significant underlying difference in the practical context within which the two problems are addressed. Bankruptcy prediction simply involves the assessment of the likelihood that a firm will go bankrupt. Credit risk assessment decisions, on the other hand, need to be taken in a broader context, considering the following two issues:
1. The estimated loss from granting credit to a firm that will ultimately default (default risk).
2. The estimated profit from granting credit to a healthy firm.
The trade-off between the estimated losses and profits is a key issue for deciding on the acceptance or rejection of the credit, as well as on the amount of credit to be granted. Within this context, the credit risk assessment problem can be addressed within a three-stage framework (Srinivasan and Kim, 1987):
Stage 1: Estimation of the present value of the expected profits and losses for each period of the loan, on the basis of the background (credit history) of the firm.
Stage 2: Combination of the present value of the expected profits/losses with the probabilities of default and non-default to estimate the net present value of granting the credit.
Stage 3: If the net present value is negative the credit is rejected; otherwise it is accepted and the amount of the loan to be granted is determined.
The implementation of this framework assumes that the credit granting problem (and consequently credit risk assessment) is a multi-period problem (cf. stage 1). This is true considering that the repayment of the loan is performed through a series of interest payments (monthly, semi-annual or annual) spread over a period of time (usually several years). Over this period the credit institution has the opportunity to extend its cooperation with the firm. In this regard, profits are not derived only from the interest that the firm pays for the loan, but may also be derived through the extended cooperation between the bank and the firm.
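The logic of this framework can be illustrated with a short, simplified computation: the expected per-period profits and losses are weighted by the estimated probabilities of non-default and default, discounted, and summed, and the credit is accepted when the resulting net present value is positive. All figures below are purely illustrative, and the constant per-period default probability is a simplification of stage 1 of the framework:

```python
def expected_npv(p_default, profit_per_period, loss_given_default,
                 periods, discount_rate):
    """Expected net present value of granting a loan, in a simplified
    version of the three-stage framework (constant per-period figures)."""
    npv = 0.0
    for t in range(1, periods + 1):
        expected = (1 - p_default) * profit_per_period - p_default * loss_given_default
        npv += expected / (1 + discount_rate) ** t
    return npv

# p_default would come from a classification model (stage 2); the other
# figures are illustrative values for a hypothetical 5-year loan.
npv = expected_npv(p_default=0.08, profit_per_period=12_000,
                   loss_given_default=40_000, periods=5, discount_rate=0.05)
print("grant credit" if npv > 0 else "reject credit", round(npv, 2))
```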
The contribution of classification techniques to the implementation of the above framework lies in the second stage, involving the estimation of the probabilities of default and non-default. Default is often followed by a bankruptcy filing; thus the analysis presented in the previous section for the development of bankruptcy prediction models is similar to that often used for credit risk assessment purposes. Nevertheless, it should be emphasized that default does not necessarily mean that a firm will go bankrupt. For instance, a firm in default may implement a restructuring plan to recover from its problems and ultimately avoid bankruptcy (Altman, 1993).
The use of classification techniques for credit risk assessment aims at developing models which assign firms to groups according to their credit risk level. Usually, two groups are used: (a) firms for which the credit should be granted and (b) firms for which the credit should be rejected. The data required for the development of the appropriate credit risk model (i.e., the construction of a reference set/training sample) can be gathered from the existing credit portfolio of the financial institution for which the model is developed. The development of such a credit risk assessment model provides significant advantages for financial institutions (Khalil et al., 2000):
- It introduces a common basis for the evaluation of firms that request financing. Credit applications are usually evaluated at the branch (peripheral) level rather than centrally, particularly when the amount of the credit is limited. The practical implementation of a credit risk assessment model allows the use of a common evaluation system, thus reducing the arbitrariness and subjectivity that often characterize individual credit analysts.
- It constitutes a useful guide for the determination of the amount of credit that could be granted (Srinivasan and Kim, 1987).
- It reduces the time and cost of the evaluation procedure, which can be restricted to firms of high credit risk; further analysis of the credit applications of these firms can then be performed thoroughly by specialized credit analysts at a central level.
- It facilitates the management and monitoring of the whole credit portfolio of the financial institution.
The above four points justify the widespread use of credit risk assessment systems. At the research level, statistical approaches have been widely used up to today; an analytical presentation of the relevant applications is given in the book of Altman et al. (1981). More recently, however, alternative approaches have spread, such as machine learning and expert systems (Cronan et al., 1991; Tessmer, 1997; Matsatsinis et al., 1997), decision support systems (Srinivasan and Ruparel, 1990; Duchessi and Belardo, 1987; Zopounidis et al., 1996; Zopounidis and Doumpos, 2000b), genetic algorithms and neural networks (Fritz and Hosemann, 2000), multicriteria analysis (Bergeron et al., 1996; Zopounidis and Doumpos, 1998; Jablonsky, 1993; Lee et al., 1995; Khalil et al., 2000), etc.
The objective of the application presented in the subsequent sub-sections is to illustrate the potential of credit risk assessment models within the aforementioned credit granting framework, using data derived from the
credit portfolio of a leading Greek commercial bank. For this purpose, the classification methods used in Chapter 5 are employed and the obtained results are compared.
3.2 Data and methodology
The data of this application involve 60 industrial firms derived from the credit portfolio of a leading Greek commercial bank. The data span the three-year period 1993–1995. The firms in the sample are classified into two groups:
- Firms with high financial performance that cooperate smoothly with the bank and manage to fulfill their debt obligations. These firms are considered typical examples of firms with low credit risk that should be financed by the bank. The sample includes 30 such low credit risk firms.
- Firms with poor financial performance. The cooperation of the bank with these firms encountered several problems, since the firms were not able to meet their debt obligations adequately. These firms are therefore considered typical examples of firms in default for which credit should be rejected. The sample includes 30 such firms.
Based on the detailed financial data of the firms for the period under consideration, the 30 financial ratios included in the financial model base of the FINCLAS system (FINancial CLASsification; Zopounidis and Doumpos, 1998) were computed as an initial set of evaluation criteria describing the financial position and the credit risk of the firms. Table 6.12 presents these ratios. Of course, developing a credit risk assessment model involving such a large amount of information is of rather limited practical interest. Indeed, a broad set of ratios is likely to include ratios that provide similar information (i.e., highly correlated ratios). Such a situation poses both practical and model development problems. At the practical level, credit risk analysts do not feel comfortable examining ratios that provide the same information, and the collection of the corresponding data is time consuming and costly. Therefore, the analysis needs to be based on a compact set of ratios, avoiding the collection of unnecessary and overlapping data. From the model development point of view, the consideration of correlated ratios poses problems for the stability and the interpretation of the developed model. To avoid these problems a factor analysis was initially performed, through which nine factors were obtained. Table 6.13 presents the factor loadings of each financial ratio.
It was decided to include in the analysis the ratios with factor loadings greater than 0.7 (in absolute terms). Therefore, the ratios net income/sales, net worth/total liabilities, current assets/current liabilities, quick assets/current liabilities, cash/current liabilities, dividends/cash flow, working capital/total assets and current liabilities/inventories were selected. These selected ratios cover the profitability
of the firms, as well as their solvency, liquidity and managerial performance (Courtis, 1978). In addition to the ratios selected through factor analysis, it was decided in a second stage to incorporate in the analysis some additional financial ratios which are usually considered important factors in the assessment of credit risk, in order to obtain a more complete description of the firms' credit risk and financial performance. Through a cooperative process and discussion with the credit managers of the bank, four additional financial ratios were selected at this stage: the profitability ratios earnings before interest and taxes/total assets and net income/net worth, the solvency ratio total liabilities/total assets, and the managerial performance ratio interest expenses/sales. The final set of selected ratios is presented in Table 6.14.
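A sketch of this ratio-screening step using a standard factor analysis implementation and the 0.7 loading threshold mentioned above; the data file, the preprocessing and the choice of rotation are assumptions, since the text does not state which implementation or rotation was used, so the selected set would not necessarily coincide exactly with the one reported here:

```python
import pandas as pd
from sklearn.decomposition import FactorAnalysis
from sklearn.preprocessing import StandardScaler

# Hypothetical input: 60 firms x 30 financial ratios for 1995.
ratios = pd.read_csv("credit_ratios_1995.csv")
X = StandardScaler().fit_transform(ratios)

# Nine factors, as in the analysis above; varimax rotation is one common
# choice (the text does not state which rotation, if any, was used).
fa = FactorAnalysis(n_components=9, rotation="varimax", random_state=0).fit(X)
loadings = pd.DataFrame(fa.components_, columns=ratios.columns)

# Keep the ratios whose absolute loading on some factor exceeds 0.7.
selected = [c for c in ratios.columns if (loadings[c].abs() > 0.7).any()]
print(selected)
```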
Most of the selected financial ratios are negatively related to credit risk (i.e., higher values indicate lower credit risk). Only the ratios total liabilities/total assets, interest expenses/sales and current liabilities/inventories are positively related to credit risk (i.e., higher values indicate higher credit risk). Tables 6.15–6.17 present some descriptive statistics for the two groups of firms (i.e., low credit risk and high credit risk) with regard to the selected financial ratios. In particular, Table 6.15 presents the means of the financial ratios, Table 6.16 presents the skewness and kurtosis coefficients, whereas Table 6.17 presents the correlation coefficients between the selected financial ratios.
From the results of the descriptive statistical analysis it is interesting to observe that most financial ratios significantly differentiate the two groups of firms (cf. Table 6.15). The only exceptions involve the ratios cash/current liabilities and current liabilities/inventories, which are not found statistically significant in any of the three years of the analysis. It should also be noted that the skewness and kurtosis are similar for the two groups of firms for many ratios (cf. Table 6.16), while their changes over time are in many cases limited. On the basis of the considered sample of firms, the credit risk model development and validation process is similar to the one used in the bankruptcy prediction case study discussed in section 2 of this chapter. In particular,
the data of the firms in the sample for the most recent year (1995) are used as the reference set for the development of appropriate credit risk assessment models that distinguish the low risk firms from the high risk ones. In a second stage, the developed models are applied to the data of years 1994 and 1993 to test their ability to provide reliable early-warning signals of the credit risk level of the firms. Following this framework, the subsequent sub-sections present in detail the results obtained for all methods.
3.3 The developed models

3.3.1 The UTADIS model
Using the data of the firms for the most recent year (year 1995) as the reference set, the application of the UTADIS method (HEUR2) led to the development of the following additive utility function as the appropriate credit risk assessment model:
Figure 6.3 presents the form of the marginal utility functions in the above additive utility model.
The weighting coefficients of the marginal utility functions in the credit risk assessment model (6.2) of the UTADIS method indicate that the most significant ratios for assessing credit risk are the profitability ratios net income/net worth and net income/sales. The weights of most of the other ratios are similar, ranging between 6% and 9%. Only the ratios cash/current liabilities and dividends/cash flow have weights lower than 6%, and consequently they can be considered the least significant factors for determining the level of credit risk. On the basis of the developed additive utility credit risk assessment model of the UTADIS method, the distinction between low and high risk firms is performed through the following rule (the cut-off point 0.6198 is estimated by the method during the model development process): if the global utility of a firm is greater than or equal to 0.6198, the firm is classified as a low risk firm; if it is lower than 0.6198, the firm is classified as a high risk firm.
3.3.2 The model of the MHDIS method
Similarly to the case of the UTADIS method, the data of the firms in the sample for year 1995 are used to develop a credit risk assessment model through the MHDIS method. The resulting model consists of two additive utility functions: the former characterizes the firms of low credit risk, whereas the latter characterizes the firms of high credit risk. The analytic form of these two functions is the following:
The weighting coefficients of the marginal utilities in the above utility functions differ between the two groups. In particular, the main characteristics of the low risk firms are the ratios earnings before interest and taxes/total assets, current assets/current liabilities, quick assets/current liabilities, dividends/cash flow, interest expenses/sales and current liabilities/inventories. The weights of these ratios in the utility function characterizing the low risk firms are 15.74%, 11.58%, 14.38%, 14.89%, 16.21% and 15.26%, respectively. On the other hand, the ratios that best describe the high risk firms are net income/net worth, net income/sales and net worth/total liabilities, with weights in the corresponding utility function equal to 22.42%, 29% and 18.11%, respectively. The ratios net income/net worth and net income/sales were also found significant in the UTADIS credit risk assessment model. The marginal utility functions of all financial ratios are illustrated in Figure 6.4.
The decision regarding the classification of a firm into one of the two considered groups (low credit risk and high credit risk) is based upon the global utilities obtained through the two developed additive utility functions. In that sense, a firm is considered to be a low risk firm if its global utility in the function characterizing the low risk firms is higher than its global utility in the function characterizing the high risk firms; otherwise it is classified as a firm of high credit risk.
3.3.3 The ELECTRE TRI model
The main parameters of the credit risk assessment model developed through the ELECTRE TRI method are presented in Table 6.18. Similarly to the bankruptcy prediction model discussed earlier in this chapter, the developed credit risk assessment model does not employ the discordance test of the ELECTRE TRI method. Consequently, all the presented results and the classification of the firms are based solely on the concordance test. The classification is performed as follows: if the credibility index of a firm is at least equal to the estimated cut-off point, the firm is classified as a low credit risk firm; otherwise it is classified as a high credit risk firm.
The estimated weights of the financial ratios in the ELECTRE TRI model indicate that the most significant factors for assessing corporate credit risk include the ratios net income/net worth, net income/sales, current assets/current liabilities and dividends/cash flow. These results have some similarities with the conclusions drawn from the models developed through the UTADIS and the MHDIS methods. In particular, the ratios net income/net worth and net income/sales were found significant in the UTADIS model, and the same ratios were found significant in characterizing the firms of high credit risk according to the model of the MHDIS method. Furthermore, the ratios current assets/current liabilities and dividends/cash
flow were found significant in describing the firms of low credit risk according to the credit risk model of MHDIS.
3.3.4 The rough set model
The credit risk assessment model developed through the rough set approach consists of only three simple decision rules, presented in Table 6.19.
The first rule covers all the low risk firms included in the reference set (year 1995). Its condition part considers two ratios, namely net income/sales and working capital/total assets. The former ratio was found significant in all the credit risk models developed through UTADIS, MHDIS and ELECTRE TRI. This result indicates that the net profit margin (net income/sales) is indeed a decisive factor in discriminating the low risk firms from the high risk ones. Rules 2 and 3 describe the firms of high credit risk. It is interesting to note that these rules are actually the negation of rule 1, which describes the low risk firms. Therefore, the above rule set actually
consists of only one rule, namely rule 1: if rule 1 is fulfilled, the firm under consideration is concluded to be of low credit risk; otherwise it is a high risk firm. The major advantage of the credit risk assessment model developed through the rough set approach is that it is quite compact in terms of the information required to implement it. The analyst using this model needs to specify only two ratios. Therefore, credit risk assessment decisions can be taken in a very short time, without requiring the use of any specialized software to implement the developed model. This significantly reduces the time and the cost of the decisions taken by the credit analysts.
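Because the model effectively reduces to a single rule on two ratios, applying it takes only a few lines. The threshold values and the direction of the conditions below are placeholders consistent with the sign of the two ratios' relation to credit risk, not the cut-off values of the actual rule in Table 6.19:

```python
def classify_credit(net_income_to_sales, working_capital_to_total_assets,
                    t1=0.03, t2=0.15):
    """Single-rule classifier in the spirit of the rough set model:
    rule 1 -> low credit risk, its negation -> high credit risk.
    t1 and t2 are hypothetical thresholds, not the estimated cut-offs."""
    if net_income_to_sales >= t1 and working_capital_to_total_assets >= t2:
        return "low credit risk"
    return "high credit risk"

print(classify_credit(0.05, 0.22))  # -> low credit risk
print(classify_credit(0.01, 0.22))  # -> high credit risk
```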
3.3.5 The models of the statistical techniques
The application of LDA, QDA and LA to the sample used for credit risk model development led to the estimation of three credit risk assessment models. The form of these models is similar to that of the bankruptcy prediction models presented in sub-section 2.3.5 earlier in this chapter. Tables 6.20 and 6.21 present the parameters of these models (i.e., discriminant coefficients, constant terms and the cross-product terms of the QDA model). The classification of the firms as high or low risk is performed using the classification rules discussed in sub-section 2.3.5 for the bankruptcy prediction case, and consequently they are not repeated here.
3.4 Comparison of the credit risk assessment models
The detailed classification results for all the credit risk models presented in the previous sub-sections are presented in Table 6.22. Similarly to the bankruptcy prediction case, the results involve the two types of error rates, i.e., the type I and the type II error rates. In the credit risk assessment problem, a type I error corresponds to the classification of a high risk firm as a low risk one, resulting in a potential capital loss. On the other hand, a type II error corresponds to the classification of a low risk firm as a high risk one, resulting in a potential opportunity cost. The discussion presented in sub-section 2.4 of this chapter on the contribution of these two error types to the overall error is still valid for the credit risk assessment problem, due to the similarity of the two problems. Therefore, following the arguments made in sub-section 2.4, the overall error rate is estimated as the average of the type I and type II error rates, assuming that both contribute equally to the overall error of a credit risk assessment model.
In analyzing the obtained results, recall that the data of the firms for the year 1995 served as the reference set for the development of the credit risk assessment models. Consequently, it is not surprising that the models of most of the methods do not misclassify any firm; only the models of LDA and QDA have a small overall error rate of 5%. Since the results obtained for the reference set are only indicative of the fitting ability of the models, the results for years 1994 and 1993 are of major interest. These results are obtained by applying the developed models to the data of the firms in the corresponding two years to perform their classification,
and then comparing the models' classifications with the a-priori known classification of the firms into the two credit risk classes. Thus, the obtained results are indicative of the ability of the models to provide accurate early-warning estimations of the credit risk of the firms.
A close examination of the results indicates that the credit risk assessment models developed through the three MCDA classification methods (UTADIS, MHDIS and ELECTRE TRI) are more effective than the models of the other methods. In particular, in terms of the overall error rate, the UTADIS model outperforms the models of rough sets, LDA, QDA and LA in both years 1994 and 1993. Similarly, the MHDIS model outperforms the credit risk models of the three statistical methods in both 1994 and 1993. Compared to the rough set model, MHDIS provides the same performance in 1994, but in 1993 its overall error rate is significantly lower than that of the rough set model. Similar conclusions can also be derived for the performance of the ELECTRE TRI credit risk assessment model as opposed to the rough set model and the models of the three statistical methods. The comparison of the three MCDA models with each other shows that UTADIS provides the best result in 1994, but its performance in 1993 deteriorates significantly compared both to MHDIS and ELECTRE TRI. On the other hand, MHDIS seems to provide more robust results, since its overall error rate deteriorates more slowly from 1994 to 1993 than those of the two other MCDA models.
Another interesting point in the obtained results is that the credit risk models developed through the four non-parametric classification techniques (UTADIS, MHDIS, ELECTRE TRI and rough sets) provide significantly lower type I error rates in both 1994 and 1993 than the models of the three statistical techniques. This indicates that these models are able to identify the high risk firms with a higher success rate than the statistical methods. This finding has significant practical implications for the selection of the appropriate credit risk assessment model, since analysts often feel more comfortable with models that are efficient in identifying the firms of high credit risk.
A further analysis of the practical usefulness of these credit risk assessment models could be performed by comparing their results with the corresponding error rates obtained by the expert credit analysts of the bank from which the data have been derived. Obviously, a credit risk assessment model that performs consistently worse than the actual credit analyst cannot provide meaningful support in the credit risk assessment process. From a decision-aiding perspective such a model is not consistent with the credit analyst's evaluation judgment and is therefore of limited practical use. On the other hand, models that perform at least as well as the analysts' estimations can be considered consistent with the credit analyst's evaluation judgment and, furthermore, they have the ability to eliminate the inconsistencies that often arise in credit risk assessment based on human judgment. Therefore, the incorporation of such models in the bank's credit risk management process is of major help, both for assessing new credit applications submitted to the bank and for monitoring the risk exposure of the bank from its current credit portfolio. However, the information required to perform such a comparison between the developed models' estimations and the corresponding estimations of the bank's credit analysts was not available, and consequently the analysis was solely based on the comparison between the selected classification methods.
4. STOCK EVALUATION
4.1 Problem domain
Portfolio selection and management has been one of the major fields of interest in the area of finance for almost the last 50 years. Generally stated, portfolio selection and management involves the construction of a portfolio of securities (stocks, bonds, treasury bills, mutual funds, repos, financial derivatives, etc.) that maximizes the investor's utility. The term "construction of a portfolio" refers to the allocation of a known amount of capital to the securities under consideration. Generally, portfolio construction can be realized as a two-stage process:
1. Initially, in the first stage of the process, the investor needs to evaluate the available securities that constitute possible investment opportunities on the basis of their future perspectives. This evaluation leads to the selection of a reduced set consisting of the best securities. Considering the huge number of securities that are nowadays traded in the international financial markets, the significance of this stage becomes apparent. It is very difficult for the investor to manage a portfolio consisting of a large number of securities. Such a portfolio is quite inflexible, since the investor needs to gather and analyze a huge amount of daily information on the securities in the portfolio. This is a difficult and time-consuming process. Consequently, portfolio updates will be difficult to carry out in order to adjust to the rapidly changing market conditions. Furthermore, a portfolio consisting of many securities imposes increased trading costs, which are often a decisive factor in portfolio investment decisions. Therefore, a compact set of securities needs to be formed for portfolio construction purposes.
2. Once this compact set of the best securities is specified after the evaluation in the first stage, the investor needs to decide on the allocation of the available capital to these securities. The allocation should be performed so that the resulting portfolio best meets the investor's policy, goals and objectives. Since these goals/objectives are often diversified in nature (some are related to the expected return, whereas some are related to the risk of the portfolio), the resulting portfolio cannot be an optimal one, at least in the sense that the term "optimal" has in the traditional optimization framework where the existence of a single objective is assumed. Instead, the constructed portfolio will be a satisfying one, i.e., a portfolio that meets in a satisfactory way (but not necessarily optimally) all the goals and objectives of the investor.
The term “investor” refers both to individual investors as well as to institutional investors, such as portfolio managers and mutual funds managers. Henceforth, the term “investor” is used to refer to anyone (individual, firm or organization) who is involved with portfolio construction and management.
The implementation of the above two-stage process is based on a clear specification of how the terms "best securities" and "satisfying portfolio" are defined. The theory of financial markets assumes that the investor's policy can be represented through a utility function of some unknown form. This function is implicitly used by the investor in his/her decision making process. The pioneer of modern portfolio theory, Harry Markowitz, assumed that this unknown utility function is a function of two variables/criteria: (1) the expected return of the portfolio and (2) the risk of the portfolio (Markowitz, 1952, 1959). These two criteria define the two main objectives of the portfolio selection and management process, i.e.: (1) to maximize the expected return and (2) to minimize the risk of the investment. Markowitz proposed two well-known statistical measures for considering the return and the risk of a portfolio. In particular, he used the average as a tool to estimate the expected return and the variance as a measure of risk. Within this modeling framework, Markowitz proposed the use of a quadratic programming formulation in order to specify an efficient portfolio that minimizes the risk (variance) for a given level of return.
The mean-variance model of Markowitz provided the basis for extending the portfolio selection and management process over a wide variety of aspects. Typical examples of the extensions made to Markowitz's mean-variance model include single and multi-index models, average correlation models, mixed models, utility models, as well as models based on the concepts of geometric mean return, safety first, stochastic dominance, skewness, etc. A comprehensive review of all these approaches is presented in the book of Elton and Gruber (1995), whereas Pardalos et al. (1994) provide a review of the use of optimization techniques in portfolio selection and management.
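For reference, the classical mean-variance problem mentioned above can be stated as the following quadratic program (the notation is introduced here for illustration and is not used elsewhere in the text): with $w$ the vector of portfolio weights, $\Sigma$ the covariance matrix of security returns, $\mu$ the vector of expected returns and $R$ a target level of return,
$$
\min_{w}\; w^{\top}\Sigma w \quad \text{s.t.}\quad \mu^{\top}w \ge R,\qquad \mathbf{1}^{\top}w = 1,\qquad w \ge 0.
$$
Solving this program for different values of $R$ traces out the efficient frontier from which the investor selects a portfolio according to his/her attitude towards risk.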
Generally, the existing research on the portfolio selection and management problem can be organized into three major categories:
1. The studies focusing on the securities' risk/return characteristics. These studies are primarily conducted by financial researchers in order to specify the determinants of risk and return in securities investment decisions. The most well-known examples of studies within this category include Sharpe's study on the capital asset pricing model (CAPM; Sharpe, 1964), Ross' study on the arbitrage pricing theory (APT; Ross, 1976) and the Black-Scholes study on option valuation (Black and Scholes, 1973).
2. The studies focusing on the development of methodologies for evaluating the performance of securities according to different performance measures. These studies can be further categorized into two groups. The first group includes studies on the modeling and representation of the investor's policy, goals and objectives in a mathematical model, usually of a functional form. This model aggregates all the pertinent factors describing the performance of the securities to produce an overall evaluation of the securities that complies with the policy of the investor. The securities with the highest overall evaluation according to the developed model are selected for portfolio construction purposes in a later stage of the analysis. The developed model usually has the form of a utility function, following the general framework of portfolio theory, according to which the investor is interested in constructing a portfolio that maximizes his utility. Thus, making explicit the form of this utility function contributes significantly to the portfolio selection and management process, both as a security evaluation mechanism and as a basis for portfolio construction. MCDA researchers have been heavily involved with this line of research. Some characteristic studies employing MCDA methods to model the investor's policy include those of Saaty et al. (1980), Rios-Garcia and Rios-Insua (1983), Evrard and Zisswiller (1983), Martel et al. (1988), Szala (1990), Khoury et al. (1993), Dominiak (1997), Hurson and Ricci (1998), Zopounidis (1993), Hurson and Zopounidis (1995, 1996, 1997) and Zopounidis et al. (1999). A comprehensive review of the use of MCDA techniques in the field of portfolio selection and management is presented in the book of Hurson and Zopounidis (1997) as well as in the studies of Spronk and Hallerbach (1997) and Zopounidis (1999). The second group involves studies on the forecasting of securities' prices. The objective of this forecasting-based approach is to develop models that are able to provide accurate predictions of the future prices of the securities. Given that reliable predictions can be obtained from historical time-series data, the investor can select the securities with the highest anticipated future upward trend in their price. These securities are then considered for portfolio construction purposes. The development of such forecasting models is traditionally a major field of interest for researchers in econometrics and statistics. Nevertheless, recently the interest in the use of artificial intelligence techniques has significantly increased.
This is mainly due to the flexibility of these techniques in modeling and representing the complexity that describes securities' price movements and the highly non-linear behavior of the financial markets. Some examples based on this new approach include neural networks (Wood and Dasgupta, 1996; Trippi and Turban, 1996; Kohara et al., 1997; Steiner and Wittkemper, 1997), machine learning (Tam et al., 1991; John et al., 1996), expert systems (Lee et al., 1989; Lee and Jo, 1999; Liu and Lee, 1997), fuzzy set theory (Wong et al., 1992; Lee and Kim, 1997) and rough sets (Jog et al., 1999). With regard to the contribution of these new techniques to portfolio selection and management, it is important to note that their use is not solely restricted to academic research; they are also often used in the daily practice of investors worldwide.
3. The studies on the development of methodologies for portfolio construction. These methodologies follow an optimization perspective, usually in a multiobjective context. This complies with the nature of the portfolio construction problem. Indeed, portfolio construction is a multiobjective optimization problem, even if it is considered in the mean-variance framework of Markowitz. Within this framework, the investor is interested in constructing a portfolio that maximizes the expected return and minimizes the risk of the investment. This is a two-objective optimization problem. Furthermore, considering that both return and risk are actually multidimensional, it is possible to extend the traditional mean-variance framework so that all pertinent risk and return factors are considered. For instance, risk includes both systematic and non-systematic risk. The traditional mean-variance framework considers only the non-systematic risk, while within an extended framework the systematic risk (beta coefficient) can also be considered (e.g., construction of a portfolio with a pre-specified beta). Such an extended optimization framework can consider any goal/objective as perceived by the investor, not necessarily following a probabilistic approach such as the one of the mean-variance model. Actually, as noted by Martel et al. (1988), measuring risk and return in a probabilistic context does not always comply with the investors' perception of these two key concepts. This finding motivated several researchers to introduce additional goals/objectives in the portfolio construction process (e.g., marketability, dividend yield, earnings per share, price/earnings ratio, etc.). Following this line of research, the construction of portfolios within this extended optimization framework can be performed through multiobjective mathematical and goal programming techniques. Some typical studies following this approach have been presented by Lee and Chesser (1980), Nakayama et al. (1983), Rios-Garcia and Rios-Insua (1983), Colson and De Bruyn (1989), Tamiz et al. (1997), Zopounidis et al. (1998), Hurson and Zopounidis (1995, 1996, 1997), Bertsimas et al. (1999) and Zopounidis and Doumpos (2000d).
The use of classification techniques in the two-stage portfolio construction process discussed at the beginning of this sub-section can be realized during the first stage, and it falls within the second group of studies mentioned above. The use of a classification scheme is not an uncommon approach for practitioners who are involved with security evaluation. For instance, in the case of stock evaluation, most investment analysts and financial institutions periodically announce their estimations of the performance of the stocks in the form of recommendations such as "strong buy", "buy", "market perform", etc. Smith (1965) first used a classification method (LDA) in order to develop a model that can reproduce such experts' recommendations. A similar study was compiled by White (1975). Some more recent studies, such as the ones of Hurson and Zopounidis (1995, 1996, 1997) and Zopounidis et al. (1999), employ MCDA classification methods, including ELECTRE TRI and UTADIS, for the development of stock classification models considering the investor's policy and preferences. Of course, except for the evaluation and classification on the basis of experts' judgments, other classification schemes can also be considered. For instance, Klemkowsky and Petty (1973) used LDA to develop a stock classification model that classified stocks into risk classes on the basis of their historical return volatility. Alternatively, it is also possible to consider a classification scheme where the stocks are classified on the basis of their expected future return (e.g., stocks that will outperform the market, stocks that will not outperform the market, etc.). Jog et al. (1999) adopted this approach and used rough set theory to develop a rule-based model that uses past data to classify stocks into classes according to their expected future return: top performers (stocks with the highest future return), intermediate stocks and low performers (stocks with the lowest future return). A similar approach was used by John et al. (1996), who employed a machine learning methodology, whereas Liu and Lee (1997) developed an expert system that provides buy and sell recommendations (a two-group classification scheme) on the basis of technical analysis indicators for the stocks (Murphy, 1995). The results obtained through such classification models can be integrated in a later stage of the analysis with an optimization methodology (goal programming, multiobjective programming) to perform the construction of the most appropriate portfolio.
4.2 Data and methodology
Following the framework described in the above brief review, the application in this sub-section involves the development of a stock evaluation model that classifies the stocks into classes specified by an expert stock market analyst. The development of such a model has both research and practical implications, for at least two reasons. First, the model can be used by stock market analysts and investors in their daily practice as a supportive tool for the evaluation of stocks on the basis of their financial and stock market performance; this significantly reduces the time and cost of the analysis of financial and stock market data on a daily basis. Second, if the developed model has a specific quantitative form (utility function, discriminant function, outranking relation, etc.), it can be incorporated in the portfolio construction process. Assuming that the developed model is a stock performance evaluation mechanism representing the judgment policy of an expert stock market analyst, the construction of a portfolio that achieves the best performance according to the developed model can be considered an "optimal" one, in the sense that it best meets the decision maker's preferences.
From the methodological point of view, this application has several differences compared to the previous two applications on bankruptcy prediction and credit risk assessment:
1. The stock evaluation problem is considered as a multi-group classification problem (the stocks are classified into three groups), whereas both bankruptcy prediction and credit risk assessment were treated as two-group problems.
2. There is an imbalance in the size of the groups in the considered sample. In both bankruptcy prediction and credit risk assessment each group in the samples used consisted of half the total number of firms. On the other hand, in this application the number of stocks per group differs between the three groups. This feature, in combination with the consideration of more than two groups, increases the complexity of this application compared to the two previous ones.
3. The sample used in this application involves only one period and there is no additional holdout sample. Consequently, the model validation techniques used in the bankruptcy prediction and credit risk assessment cases are not suitable in this application. To tackle this problem, a jackknife model validation approach is employed (McLachlan, 1992; Kahya and Theodossiou, 1999; Doumpos et al., 2001) to obtain an unbiased estimate of the classification performance of the developed stock evaluation models. The details of this approach are discussed later.
Having in mind these features, the presented stock evaluation case study involves the evaluation of 98 stocks listed in the Athens Stock Exchange (ASE).
The data are derived from the studies of Karapistolis et al. (1996) and Zopounidis et al. (1999). All stocks in the sample were listed in the ASE during 1992, constituting 68.5% of the total number of stocks listed in the ASE at that time (143 stocks). The objective of the application is to develop a stock evaluation model that will classify the stocks into the following three groups:
Group C1: This group consists of the 9 stocks with the best investment potential in the medium/long run. These stocks are attractive to investors, while the corresponding firms are in a sound financial position and have a very positive reputation in the market.
Group C2: The second group includes 31 stocks. The overall performance and stock market behavior of these stocks is rather moderate. However, they could be used by the portfolio manager to achieve portfolio diversification.
Group C3: This is the largest group of stocks, since it includes 58 of the 98 stocks in the sample. The stocks belonging to this group do not seem to be good investment opportunities, at least for the medium and long term. The consideration of these stocks in a portfolio construction context can only be realized in a risk-prone investment policy seeking short-term profits.
This trichotomous classification approach enables the portfolio manager to distinguish the promising stocks from the less promising ones. However, the stocks that are found to belong to the third class (less promising stocks) are not necessarily excluded from further consideration. Although the portfolio manager is informed about their poor stock market and financial performance in the long term, he may select some of them (the best ones according to their performance measured on the basis of the developed classification model) in order to achieve portfolio diversification or to make short-term profits. In that sense, the obtained classification provides an essential form of information to portfolio managers and investors; it supports the stock evaluation procedure and leads to the selection of a limited number of stocks for portfolio construction.
The classification of the stocks in the sample was specified by an expert stock market analyst with experience in the ASE. The classification of the stocks by this expert was based on the consideration of 15 financial and stock market criteria describing different facets of the performance of the stocks. These criteria are presented in Table 6.23. Part of the criteria describe the stock market behavior of the stocks, whereas the remaining criteria are commonly used financial ratios similar to the ones employed in the previous two case studies. The combination of stock market indices and financial ratios enables the evaluation of all the fundamental features of the stocks and the corresponding firms.
For most of the criteria, the portfolio manager's preferences are increasing functions on their scale; this means that the greater the value of the criterion, the greater the satisfaction of the portfolio manager. On the contrary, the P/E ratio criterion has a negative rate, which means that the portfolio manager's preference decreases as the value of this criterion increases (i.e., a portfolio manager would prefer a stock with a low price that could yield high earnings). Furthermore, although two of the criteria are obviously correlated, the expert portfolio manager who collaborated in this case study indicated that both criteria should be retained in the analysis.
The definitions of several of the evaluation criteria are as follows:
Gross book value per share = Total assets / Number of shares outstanding
Capitalization ratio = 1 / (Price / Earnings per share)
Stock market value = (Number of shares outstanding) × (Price)
Marketability = Trading volume / Number of shares outstanding
Financial position progress = (Book value at year t) / (Book value at year t-1)
Dividend yield = (Dividend paid at time t) / (Price at time t)
Capital gain = (Price at time t - Price at time t-1) / (Price at time t-1)
Exchange flow ratio = (Number of days within a year when transactions for the stock took place) / (Number of days within a year when transactions took place in ASE)
Round lots traded per day = (Trading volume over a year) / [(Number of days within a year when transactions took place in ASE) × (Minimum stock negotiation unit)]
Transactions value per day = (Transactions value over a year) / (Number of days within a year when transactions for the stock took place)
Tables 6.24 and 6.25 present some descriptive statistics (group means, skewness, kurtosis and correlations) regarding the performance of the stocks in the sample on the considered evaluation criteria. The comparison of the criteria averages for the three groups of stocks presented in Table 6.24 shows that the three groups have significantly different performance as far as the stock market criteria are concerned (only one criterion is found insignificant). On the contrary, the existing differences for the financial ratios are not found to be significant (except for the net income/net worth ratio).
As noted earlier, the model validation techniques used in the bankruptcy prediction case study and in credit risk assessment (i.e., application of the models to earlier years of the data or to a holdout sample) cannot be used in this stock evaluation case. Instead, a jackknife model validation approach is employed. This approach is implemented in five steps, as follows:
Step 1: Select at random one stock from each of the three groups C1, C2 and C3. The three selected stocks form a random holdout sample. A random number generator is used to perform the selection of the stocks: for each group the generator produces a random integer between 1 and the number of stocks belonging to the group, and this integer indicates the stock to be selected.
Step 2: A stock classification model is developed using as the reference set the initial sample of stocks, excluding the three stocks selected randomly at step 1.
Step 3: The model developed at step 2 is used to classify the three stocks of the holdout sample formed at step 1.
Step 4: The classification error is estimated for both the reference set and the holdout sample.
Step 5: Steps 1-4 are repeated 150 times.
The procedure provides an unbiased estimate of the classification performance of the compared methods in classifying the stocks into the three groups.
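A minimal sketch of this validation loop is given below, assuming generic fit and predict functions for whichever classification method is being validated; the function names and data layout are illustrative and not part of the original study.

```python
import random

def jackknife_validation(stocks_by_group, fit, predict, replications=150):
    """stocks_by_group: dict mapping each group label to the list of its stocks.
    fit(reference_set) -> model;  predict(model, stock) -> predicted group label."""
    holdout_error_rates = []
    for _ in range(replications):
        # Step 1: draw one stock at random from each group to form the holdout sample.
        holdout = {group: random.choice(members) for group, members in stocks_by_group.items()}
        # Step 2: develop a model on the remaining stocks (the reference set).
        reference = [(stock, group) for group, members in stocks_by_group.items()
                     for stock in members if stock != holdout[group]]
        model = fit(reference)
        # Steps 3-4: classify the held-out stocks and record the holdout error rate.
        errors = sum(predict(model, stock) != group for group, stock in holdout.items())
        holdout_error_rates.append(errors / len(holdout))
    # Step 5: average over all replications.
    return sum(holdout_error_rates) / replications
```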
4.3 The developed models
For each replication of the above model validation process (150 replications overall) a different stock evaluation model is developed. Therefore, to facilitate the presentation of the developed models, the subsequent analysis is based on statistical information obtained for the models’ parameters and the classification error rates over all 150 replications.
4.3.1 The MCDA models
The most significant parameters of the models developed through UTADIS (HEUR2), MHDIS and ELECTRE TRI involve the criteria weights. Since a different model is developed at each replication of the jackknife model validation process, different estimates of the criteria weights are obtained at each replication. Tables 6.26 and 6.27 present some summary statistics for the different estimates of the stock evaluation criteria weights, including the average weights, their standard deviation and their coefficient of variation. It should be noted that for the MHDIS method these statistics involve all the additive utility functions that are developed. These include four utility functions, developed in two pairs. The first pair is developed at the first stage of the hierarchical discrimination process for the discrimination of the stocks belonging to group C1 from the stocks of groups C2 and C3; the former function characterizes the high performance stocks, whereas the latter characterizes the stocks of groups C2 and C3 (medium performance and low performance stocks, respectively). The second pair of utility functions is developed at the second stage of the hierarchical discrimination process for the distinction between the medium performance stocks (C2) and the low performance stocks (C3); the first function of this pair characterizes the medium performance stocks, whereas the second characterizes the low performance stocks.
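A sketch of how such a hierarchy of utility functions could be applied to classify a stock is given below, following the pairwise comparison logic described above; the function names and group labels are placeholders rather than the notation of the original method.

```python
def classify_hierarchically(stock, u_high, u_not_high, u_medium, u_low):
    """Two-stage assignment: stage 1 separates the top group from the rest,
    stage 2 separates medium performance from low performance stocks."""
    if u_high(stock) > u_not_high(stock):
        return "high performance"      # first group (C1)
    if u_medium(stock) > u_low(stock):
        return "medium performance"    # second group (C2)
    return "low performance"           # third group (C3)
```

Each argument is expected to be a utility-based scoring function estimated at the corresponding stage of the hierarchical discrimination process.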
According to the results of Tables 6.26-6.27, there are some interesting similarities among the models of the three MCDA methods as far as the criteria that contribute most to the performance and classification of the stocks are concerned. In particular, the results of both the UTADIS and ELECTRE TRI methods agree that the exchange flow ratio, the transactions value per day and the net worth/total assets ratio are significant factors for the classification of the stocks. Of course, the weights assigned by each method to these stock evaluation criteria differ, but they all lead to the above conclusion. In the MHDIS method the exchange flow ratio is found significant in discriminating between medium and low performance stocks (its weights in the two utility functions of the second stage exceed 10%), whereas the transactions value per day is found significant at all stages of the hierarchical discrimination process employed in MHDIS (its weights in all four utility functions are higher than 15%). Considering the coefficient of variation as a measure of the stability of the criteria weights, it is interesting to observe that the estimates of UTADIS and MHDIS are considerably more robust than those of ELECTRE TRI. For instance, in the case of the UTADIS method the coefficient of variation for the weights of 9 out of the 15 ratios is lower than one. In the ELECTRE TRI results only two weight estimates have a coefficient of variation lower than one, whereas in the MHDIS method the weight estimates for all the significant ratios (ratios with average weight higher than 10%) have a coefficient of variation lower than one.
Table 6.28 provides some further results with regard to the significance of the stock evaluation criteria in the classification models developed by the three MCDA classification methods. In particular, this table presents the ranking of the stock evaluation criteria according to their importance in the models of each method. The criteria are ranked from the most significant (lowest entries in the table) to the least significant ones (highest entries in the table). In each replication of the jackknife experiment, the criteria are ranked in this way for each of the models developed by the three MCDA methods. The different rankings obtained over all 150 replications of the jackknife experiment are then averaged; the average rankings are the ones illustrated in Table 6.28. Kendall's coefficient of concordance W is also reported for each method as a measure of the concordance of the rankings of the criteria over the 150 replications. The results indicate that the rankings of the UTADIS method are considerably more robust than those of ELECTRE TRI and MHDIS. In addition, the significance of the transactions value per day is clearly indicated, since in all methods this criterion has one of the highest positions in the rankings.
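For reference, Kendall's coefficient of concordance over the $m = 150$ rankings of the $n = 15$ criteria can be computed as (standard definition, ignoring tie corrections; notation introduced here):
$$
W = \frac{12\sum_{j=1}^{n}\Bigl(R_j - \tfrac{m(n+1)}{2}\Bigr)^{2}}{m^{2}\,(n^{3}-n)},
$$
where $R_j$ is the sum of the ranks assigned to criterion $j$ over the $m$ replications. $W$ ranges from 0 (no agreement among the replications) to 1 (identical rankings in every replication).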
In addition to the criteria weights, the ELECTRE TRI models involve three additional sets of parameters: the preference, indifference and veto thresholds. The corresponding results for these parameters are presented in Table 6.29 (the presented results involve the average estimates over all 150 ELECTRE TRI models developed during the replications of the jackknife procedure). The results of Table 6.29 show that most ratios are given a veto ability when the discrimination between the high performance stocks and the medium performance stocks is performed (comparison with the first reference profile). In particular, during the 150 replications one ratio was given a veto ability in 87 replications, another ratio in 10 replications, two further ratios in 4 replications, and one more ratio in two replications. In the case where the stocks are compared to the second reference profile in order to discriminate between the medium performance stocks and the low performance stocks (groups C2 and C3, respectively), the discordance test is used less frequently: one ratio was given a veto ability in four replications, whereas two other ratios were given a veto ability in only one replication.
4.3.2 The rough set model
The rule-based stock evaluation models developed at each replication of the jackknife process are analyzed in terms of the number of rules describing each group of stocks and the strength of the rules (i.e., the number of stocks covered by each rule). However, given that the sample is not balanced in terms of the number of stocks per group (9 stocks in C1, 31 stocks in C2 and 58 stocks in C3), it is obvious that the rules describing the smaller groups will have lower strength. To overcome this difficulty in using the traditional strength measure, defined as the number of stocks covered by each rule, a modified strength measure is employed in this application, referred to as the relative strength. The relative strength of a rule describing a given group (i.e., a rule whose conclusion part recommends classification into that group) is defined as the ratio of the number of stocks that belong to the group and are covered by the rule to the total number of stocks belonging to the group. The higher the relative strength of a rule corresponding to a group, the more general the description of this group of stocks. Tables 6.30 and 6.31 present the statistics on the developed rule-based stock classification models in terms of the three aforementioned features (strength, relative strength and number of rules). The results show that the number of rules describing the high performance stocks (group C1) is smaller than the number of rules corresponding to the medium and the low performance stocks (groups C2 and C3, respectively). This is of no surprise since, as noted earlier, there are only nine high performance stocks in the sample and consequently only a small number of rules is required to describe them. Despite this fact, it is interesting to note that the relative strength of the rules corresponding to the high performance stocks is considerably higher than the relative strength of the rules developed for the two other groups of stocks, thus indicating that the rules of group C1 are more general than the other rules.
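In symbols (notation introduced here for illustration), if $C_k$ denotes a group of stocks and $r$ a rule whose conclusion assigns stocks to $C_k$, the relative strength described above is
$$
RS(r) = \frac{\bigl|\{a \in C_k : a \text{ is covered by } r\}\bigr|}{|C_k|},
$$
so that a value close to 1 indicates a rule covering almost all stocks of its group.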
Figure 6.5 presents some results derived from the developed rule-based models with regard to the significance of the stock evaluation criteria. The presented results involve the number of replications for which each stock evaluation criterion was considered in the developed stock classification rules.
The more frequently a criterion appears in the developed rules, the more significant it is considered. On the basis of this remark, the results of Figure 6.5 show that the gross book value per share ratio and the exchange flow ratio are the most significant factors in the stock classification rules, since they are used in all 150 rule-based models developed over the replications of the jackknife process. The former ratio was found significant in the MHDIS models (cf. Table 6.27), while in the ELECTRE TRI models it was often given a veto ability. On the other hand, the exchange flow ratio was found significant by all three MCDA classification methods. Another ratio that was found significant in the models developed by the MCDA methods, the transactions value per day, is quite often used in the stock classification rules constructed through the rough set approach; in particular, this ratio is used in 127 out of the 150 rule-based models that are developed. On the other hand, the net worth/total assets ratio that was found significant in the UTADIS and ELECTRE TRI models is not found to be significant by the rough set approach; it is used in only 19 out of the 150 rule-based models.
4.4 Comparison of the stock evaluation models
The classification results obtained by all the stock evaluation models described above, as well as the corresponding results of the statistical classification techniques (LDA and LA), are summarized in Table 6.32. The presented results are averages over all 150 jackknife replications; they refer to the holdout sample formed at each replication, which consisted of three randomly selected stocks. Along with the classification results, Table 6.32 also presents (in parentheses) the grouping of the considered methods on the basis of their overall error rate using Tukey's test at the 5% significance level. It should be noted that QDA was not used in the stock evaluation case study due to some high correlations between the criteria for the high performance stocks (group C1), which posed problems in the estimation of the quadratic discriminant functions' coefficients.
In terms of the overall error rate UTADIS provides the best results, followed closely by MHDIS and LA. The results of these three methods do not differ significantly according to the grouping of Tukey's test. Nevertheless, it should be noted that UTADIS is the only method for which the overall error rate is lower than 30%. Furthermore, both UTADIS and MHDIS are quite effective in correctly classifying the high performance stocks as well as the low performance stocks. On the other hand, the classification performance of these two methods with regard to the medium performance stocks is rather poor compared to the other methods considered in the comparison. In contrast to UTADIS and MHDIS, ELECTRE TRI is quite effective in the group of medium performance stocks, both when the discordance test is employed and when it is not considered (ELECTRE TRI with veto vs. ELECTRE TRI without veto). In terms of the overall error rate, the performance of the ELECTRE TRI models that do not consider the discordance test is slightly inferior to the case where the discordance test is performed, but the differences between the two cases are not statistically significant. The results of the ELECTRE TRI models are similar to the ones of LDA. Finally, the rough set approach seems to provide the worst results compared to the other methods in terms of the overall error rate. One reason for the low performance of the rough set approach could be the imbalance of the three groups of stocks in the sample (9 stocks in group C1, 31 stocks in group C2 and 58 stocks in group C3).
As far as the individual error rates are concerned, it is interesting to note that the ELECTRE TRI models that consider the discordance test do not lead to any error of the two most extreme forms, i.e., classifying a high performance stock as a low performance one or a low performance stock as a high performance one. The former errors are associated with an opportunity cost for the investor/portfolio manager due to the decision not to invest in a high performance stock. On the other hand, the latter errors are likely to lead to capital losses in the medium to long term, since they correspond to an incorrect decision to invest in a low performance stock. The ELECTRE TRI models that do not consider the discordance test also lead to very limited errors of these two forms, and so do the models of the MHDIS method. Such errors are also limited for the other methods, although they are higher, especially in the case of LDA and LA.
Overall, the rather high error rates of all methods (28.89%-41.83%) indicate the complexity of the stock evaluation problem. The dynamic nature of the stock markets, in combination with the plethora of internal and external factors that affect stock performance as well as the huge volume of financial and stock market information that is available to investors and stock market analysts, all contribute to the complexity of the stock evaluation problem.
Chapter 7 Conclusions and future perspectives
1. SUMMARY OF MAIN FINDINGS
The classification problem has always been a problem of major practical and research interest. This remark is justified by the plethora of decision making problems that require absolute judgments/comparisons of the alternatives with explicit or implicit reference profiles in order to decide upon the classification of the alternatives into predefined homogeneous groups. Several typical examples of such problems have been mentioned throughout this book. Most of the existing research on classification problems is focused on the development of efficient methodologies for developing classification models that aggregate all the pertinent factors (evaluation criteria) describing the problem at hand, so that the following two main objectives can be met:
1. To support the decision maker in evaluating the alternatives in an accurate and reliable way, by providing correct recommendations on the classification of the alternatives.
2. To analyze the impact that the considered evaluation criteria have on the evaluation and classification of the alternatives.
Methodologies that meet these objectives satisfactorily are of major interest as tools to study complex decision making problems and provide efficient support to decision makers. The traditional methodologies used for developing classification models originate from the fields of multivariate statistics and econometrics. These methodologies have set the basis for understanding the nature of the classification problem and for the modeling and representation of classification problems in quantitative models. Nevertheless, similarly to any statistical procedure, these methodologies are based on specific statistical assumptions. The validity of these assumptions is often impossible to check in real world problems, since in most cases the analyst cannot have a full description of the actual phenomenon under consideration, but only a small sample. This restriction has motivated researchers to explore the development of more flexible and less restrictive methodologies for developing classification models. Towards this direction a significant part of the research has been focused on the use of artificial intelligence techniques and operations research methods.
This book followed the operations research approach and in particular the MCDA paradigm. MCDA has evolved over the past three decades as one of the most significant fields of operations research and decision sciences. The book was mainly focused on the preference disaggregation approach of MCDA. Two preference disaggregation methods were introduced to develop classification models, namely the UTADIS and the MHDIS methods. Both methods assume that the decision maker's system of preferences can be represented in the form of an additive utility function that is used to decide on the classification of the alternatives into the predefined groups. The main advantage of both methods compared to other MCDA techniques is that the development of the additive utility models requires only minimal information to be specified by the decision maker. In contrast to other methods (e.g., ELECTRE TRI), in preference disaggregation analysis the decision maker does not need to specify detailed preferential information in the form of criteria weights, reference profiles, indifference, preference and veto thresholds, etc. Instead, only a representative sample of the actual decisions that he takes is needed. The decisions taken by the decision maker are the outcome of his system of preferences and values, i.e., the outcome of the decision policy that he employs in his daily practice. In this regard, a sample of decisions taken on representative situations encompasses all the information required to describe the decision maker's system of preferences. Given that such a sample can be formed, the model development process is then focused on the development of a model that can reproduce the actual decisions taken by the decision maker. This approach is analogous to the well-known regression paradigm used in statistics and econometrics.
The use of mathematical programming techniques for model development purposes within the above context provides increased flexibility. In particular, the discussion made in Chapter 4 shows that the use of mathematical programming enables the use of different approaches to measure the efficiency of the classification model to be developed. Except for the increased model development flexibility, the implementation of the preference disaggregation paradigm within the context of the two proposed methods (UTADIS and MHDIS) has the following two main advantages:
1. The parameters of the additive utility models (criteria weights and marginal utility functions) have a clear interpretation that can be understood by the decision maker. This is a very important issue for understanding the results and the recommendations of the developed models with regard to the classification of the alternatives, and for the amelioration of the model so that it is as consistent as possible with the decision maker's system of preferences. Actually, the model development process in the context of the proposed MCDA methods should not be considered as a straightforward automatic process involving the solution of an optimization problem. Instead, the specification of the model's parameters through an optimization procedure is only the first stage of the model development process. The results obtained at this first stage constitute only an initial basis for the further calibration of the model through the interactive communication between the decision maker and the analyst. The implementation of this interactive process will clarify and eliminate the possible inconsistencies in the model or even in the decision maker's judgments.
2. The use of the additive utility function as the modeling and representation form enables the use of qualitative criteria. Many classification methods from the fields of statistics and econometrics, but also non-parametric classification techniques such as mathematical programming and neural networks, assume that all criteria (variables) are quantitative. For qualitative criteria two approaches are usually employed: (a) quantification of the qualitative scale by assigning an arbitrarily chosen real or integer value to each level of the scale (e.g., 0=low, 1=medium, 2=high); (b) consideration of each level of the qualitative scale as a distinct binary variable (criterion). For instance, the criterion market reputation of the firm, measured on the three-level scale {good, medium, bad}, would following this second approach be broken down into three binary criteria: market reputation good={0, 1}, market reputation medium={0, 1}, market reputation bad={0, 1}, where zeros correspond to no and ones to yes. Both these approaches alter the nature of the qualitative criteria and hardly correspond to the way that the decision maker perceives them. On the other hand, the proposed MCDA methods do not require any change in the way that the qualitative criteria are measured, and consequently the developed classification models can easily combine quantitative and qualitative criteria. This is an important advantage, mainly for real-world problems where qualitative information is vital.
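To make the two conventional encodings mentioned in point 2 concrete, the snippet below shows both for a hypothetical market reputation criterion; the variable names and values are illustrative only.

```python
# Hypothetical qualitative criterion with an ordered three-level scale.
scale = ["bad", "medium", "good"]
firms = ["good", "bad", "medium", "good"]           # observed levels for four firms

# (a) Arbitrary numeric coding: one integer per level (0=bad, 1=medium, 2=good).
ordinal = [scale.index(level) for level in firms]    # -> [2, 0, 1, 2]

# (b) Binary (dummy) coding: one 0/1 variable per level of the scale.
dummies = [{f"reputation_{lvl}": int(level == lvl) for lvl in scale} for level in firms]
print(dummies[0])   # {'reputation_bad': 0, 'reputation_medium': 0, 'reputation_good': 1}
```

As argued above, the preference disaggregation methods can instead work directly with the original ordinal scale, assigning a marginal utility to each level.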
Of course, additive utility functions are not the only available choice for the representation and modeling of the decision maker's preferences in classification problems within the preference disaggregation paradigm. In Chapter 5 the use of the outranking relation model was considered as the criteria aggregation mechanism for classification purposes. The methodology proposed in Chapter 5 (appendix) for the specification of the parameters of an outranking relation classification model (criteria weights, preference, indifference and veto thresholds) on the basis of the preference disaggregation paradigm is a research direction that is worth further investigation. The approach presented in Chapter 5 is a new attempt to address this issue that overcomes some of the problems of previous similar techniques (Mousseau and Slowinski, 1998), mainly with regard to computational complexity issues and the modeling of the discordance test. Nevertheless, further research is still required in order to take full advantage of the capabilities that the outranking relation modeling framework provides, such as the introduction of the incomparability relation.
Except for the above methodological issues, the book also focused on more "practical" issues regarding the comparison of MCDA classification methods with other well-known techniques. Most existing comparative studies in the field of MCDA involve the comparison of different MCDA methods in terms of their theoretical grounds and the kind of support that they provide to decision makers (Siskos et al., 1984b; Roy and Bouyssou, 1986; Carmone et al., 1997; Zanakis et al., 1998). Such studies provide valuable insight into the peculiarities and features that underlie the operation of MCDA methods, thus contributing to the understanding of the way that MCDA can be used to support and ultimately improve the decision making process. Nevertheless, there is an additional question that needs to be answered; this involves the analysis of the relative performance of MCDA classification methods as opposed to other existing and widely used techniques. Such an analysis is of major practical interest. An actual decision maker would not simply be interested in an approach that provides enhanced preference modeling capabilities; he would also be interested in using a method that meets a complex real-world problem as effectively as possible, providing accurate recommendations. The investigation of this issue in the present book was realized in two directions. The first involved an extensive experimental comparison of MCDA methods with other classification approaches, whereas at a second stage the analysis was based on real-world financial data (bankruptcy prediction, credit risk assessment, stock evaluation). The results obtained in both cases can be considered as encouraging for the MCDA classification methods (UTADIS, MHDIS, ELECTRE TRI). In most cases, their classification performance was found superior to that of widely used classification techniques such as linear and quadratic discriminant analysis, logit analysis and the non-parametric rule-based framework of the rough set approach.
2. ISSUES FOR FURTHER RESEARCH
In this book an effort was made to cover as comprehensively as possible a plethora of issues regarding the model development techniques for MCDA classification methods and their performance. Nevertheless, in the field of classification in general, and MCDA classification methods in particular, there are still quite a few topics that are worth further research and analysis. Two of the most characteristic future research topics involve the validation of the developed models and their uniqueness. Both issues are of major interest for MCDA methodologies that employ indirect model development approaches. Methodologies that employ direct interrogation procedures for model development cope with these issues using the information that the decision maker provides directly to the analyst during model development. However, when employing an indirect approach for preference elicitation purposes, several issues are raised. Most of the proposed methodologies rest with the development of a classification/sorting model that satisfies some optimality criteria defined on the reference set of alternatives. Nevertheless, no matter what these optimality criteria are, it is possible that there are other optimal or sub-optimal (near-optimal) classification/sorting models that can provide a more appropriate representation of the problem and the decision maker's preferences. This is because what is optimal according to the limited information included in the reference set cannot be guaranteed to remain optimal when the whole information becomes available. In some methods, such as the UTADIS method, this problem is addressed through post-optimality analysis. In MHDIS the sequential solution of three mathematical programming problems provides similar results. The further investigation of this issue is clearly of major importance towards performing a more thorough analysis of the information included in the reference set, thus providing significant support to the decision maker in selecting the most appropriate model not according to a pre-specified "optimality" criterion, but rather according to his/her preferential system.
The above issue is closely related to another possible future research direction. Since it is possible to develop many classification/sorting models (optimal or near-optimal) from a given reference set, it would be interesting to explore ways of combining their outcomes (classification/sorting recommendations). Such a combination of classification/sorting models of the same or different forms could have a positive impact on the accuracy of the classification/sorting decisions taken through these models. This issue has been widely studied by researchers in the field of machine learning through the development of voting algorithms (Breiman, 1996; Quinlan, 1996; Jelanek and Stefanowski, 1998). The consideration of similar approaches for MCDA classification/sorting models is also worth investigating.
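As a purely illustrative sketch of the kind of combination scheme studied in that literature, the function below aggregates the group assignments of several independently developed classification models by simple majority voting; the model interface (a classify method) is hypothetical.

```python
from collections import Counter

def majority_vote(models, alternative):
    """Combine the recommendations of several classification models by majority voting.
    Each model is assumed to expose a classify(alternative) -> group method."""
    votes = Counter(model.classify(alternative) for model in models)
    group, _ = votes.most_common(1)[0]    # ties are broken arbitrarily here
    return group
```

More refined schemes could weight each model's vote by its estimated accuracy on the reference set.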
Special treatment should also be given to the specification of the technical parameters involved in the model development process (e.g., the optimality criterion, the normalization constraints imposed on the development of discriminant functions, the number of break-points for the piece-wise linear formulation of marginal utility functions in utility-based techniques, etc.). The specification of these parameters affects both the performance of the developed classification/sorting models and their stability. Thus, investigating the way that these parameters should be specified will provide significant support in eliminating a source of arbitrariness during model development, thus facilitating the interactive development of classification and sorting models. Interactivity is an issue of major importance in MCDA, and its further consideration in the development of decision rules from assignment examples is also an important topic for future research.
Except for the above model development and validation issues, future research can also focus on the form of the criteria aggregation models that are used. In the present book only the additive utility function and the outranking relation have been considered. As noted above, the consideration of the outranking relation approach needs further investigation, so that outranking relation models that consider the incomparability between the alternatives can be developed through the preference disaggregation paradigm. The way that the discordance test is performed also needs careful consideration. The methodology proposed in the appendix of Chapter 5 implied the use of the discordance test only in cases where it was found to improve the classification results. Other ways of realizing the discordance test can also be explored, to comply better with its nature and with the introduction of the veto ability for the criteria. With regard to the utility function framework, it is worth investigating the use of forms other than the additive one. The multiplicative utility function is a typical example (Keeney and Raiffa, 1993), having some interesting advantages over the additive case, mainly with regard to the modeling of the interactions between the criteria (additive utility functions assume that the criteria are preferentially independent; cf. Chapter 3). However, the use of the multiplicative utility function in optimization problems such as the ones used in UTADIS and MHDIS leads to non-linear mathematical programming formulations with non-linear constraints, which can be quite difficult to solve, especially for large data sets. The use of advanced optimization algorithms, such as genetic algorithms and heuristic optimization techniques (e.g., tabu search), could be helpful at this point.
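For completeness, the multiplicative form referred to above can be written in the standard Keeney-Raiffa notation (introduced here only for illustration) as
$$
1 + k\,U(g_1,\dots,g_n) = \prod_{i=1}^{n}\bigl[1 + k\,k_i\,u_i(g_i)\bigr],
$$
where the $u_i$ are marginal utility functions, the $k_i$ are scaling constants and the master constant $k > -1$, $k \neq 0$ satisfies $1 + k = \prod_{i=1}^{n}(1 + k\,k_i)$; when $\sum_i k_i = 1$ the form collapses to the additive model. Estimating such a model from assignment examples is what leads to the non-linear formulations mentioned above.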
Another future research direction, which could be of interest at the levels of both research and practice, is the investigation of the similarities and differences of MCDA classification and sorting methods. Since there is an arsenal of different MCDA classification/sorting methods, the analyst should be able to recommend to the decision maker the most appropriate one according to the features of the problem at hand. Providing such a recommendation requires the examination of the circumstances under which different MCDA models give similar or different results, as well as the relative comparison of the classification and sorting performance of these models subject to different data conditions.
Finally, it is important to emphasize the necessity for the development of multicriteria decision support systems that do not focus only on model development, but also incorporate all the above future research directions. Such integrated systems, employing different classification/sorting models, will provide major support to decision makers in constructing appropriate classification/sorting models for real-world decision making purposes.
References
Abad, P.L. and Banks, W.J. (1993), “New LP based heuristics for the classification problem”, European Journal of Operational Research, 67, 88–100. Altman, E.I. (1968), “Financial ratios, discriminant analysis and the prediction of corporate bankruptcy”, Journal of Finance, 23, 589-609. Altman, E.I. (1993), Corporate Financial Distress and Bankruptcy, John Wiley and Sons, New York. Altman, E.I. and Saunders, A. (1998), “Credit risk measurement: Developments over the last 20 years”, Journal of Banking and Finance, 21, 1721–1742. Altman, E.I., Avery, R., Eisenbeis, R. and Stinkey, J. (1981), Application of Classification Techniques in Business, Banking and Finance, Contemporary Studies in Economic and Financial Analysis, Vol. 3, JAI Press, Greenwich. Altman E.I., Hadelman, R.G. and Narayanan, P. (1977), “Zeta analysis: A new model to identify bankruptcy risk of corporations”, Journal of Banking and Finance, 1, 29–51. Andenmatten, A. (1995), Evaluation du Risque de Défaillance des Emetteurs d’Obligations: Une Approche par l’Aide Multicritère à la Décision, Presses Polytechniques et Universitaires Romandes, Lausanne. Anderson, T.W. (1958), An Introduction to Multivariate Statistical Analysis, Wiley, New York. Archer, N.P. and Wang, S. (1993), “Application of the back propagation neural networks algorithm with monotonicity constraints for two-group classification problems”, Decision Sciences, 24, 60-75. Bajgier, S.M. and Hill, A.V. (1982), “A comparison of statistical and linear programming approaches to the discriminant problem”, Decision Sciences, 13, 604–618. Bana e Costa, C.A. and Vansnick, J.C. (1994), “MACBETH: An interactive path towards the construction of cardinal value functions”, International Transactions on Operations Research, 1, 489-500.
234 Banks, W.J. and Abad, P.L. (1991), “An efficient optimal solution algorithm for the classification problem”, Decision Sciences, 22, 1008–1023. Bardos, M. (1998), “Detecting the risk of company failure at the Banque de France”, Journal of Banking and Finance, 22, 1405–1419. Bastian, A. (2000), “Identifying fuzzy models utilizing genetic programming”, Fuzzy Sets and Systems, 113, 333-350. Beaver, W.H. (1966), “Financial ratios as predictors of failure”, Empirical Research in Accounting: Selected Studies, Supplement to Journal of Accounting Research, 5, 179-199. Belacel, N. (2000), “Multicriteria assignment method PROAFTN: Methodology and medical applications”, European Journal of Operational Research, 125, 175-183. Belton, V. and Gear, T. (1983), “On a short-coming of Saaty’s method of analytic hierarchies”, Omega, 11/3, 228-230. Benayoun, R., De Montgolfier, J., Tergny, J. and Larichev, O. (1971), "Linear programming with multiple objective function: Stem method (STEM)", Mathematical Programming, 1/3, 366-375. Bergeron, M., Martel, J.M. and Twarabimenye, P. (1996), “The evaluation of corporate loan applications based on the MCDA”, Journal of Euro-Asian Management, 2/2, 16-46. Berkson, J. (1944), “Application of the logistic function to bio-assay”, Journal of the American Statistical Association, 39, 357-365. Bertsimas, D., Darnell, C. and Soucy, R. (1999), “Portfolio construction through mixedinteger programming at Grantham, Mayo, Van Otterloo and Company”, Interfaces, 29, 49-66. Black, F. and Scholes, M. (1973), “The pricing of options and corporate liabilities”, Journal of Political Economy, 81, 659-674. Bliss, C.I. (1934), “The method of probits”, Science, 79, 38-39. Boritz, J.E. and Kennedy, D.B. (1995), “Effectiveness of neural network types for prediction of business failure”, Expert Systems with Applications, 9/4, 503-512. Brans, J.P. and Vincke, Ph. (1985), “A preference ranking organization method”, Management Science, 31/6, 647-656. Breiman, L., Friedman, J.H., Olsen, R.A. and Stone, C.J. (1984), Classification and Regression Trees, Pacific Grove, California. Carmone Jr., F.J., Kara, A. and Zanakis, S.H. (1997), “A Monte Carlo investigation of incomplete pairwise comparison matrices in AHP”, European Journal of Operational Research, 102, 538-553. Catelani, M. and Fort, A., (2000), “Fault diagnosis of electronic analog circuits using a radial basis function network classifier”, Measurement, 28/3, 147-158. Casey, M., McGee, V. and Stinkey, C. (1986), “Discriminating between reorganized and liquidated firms in bankruptcy”, The Accounting Review, April, 249–262. Charnes, A. and Cooper, W.W. (1961), Management Models and Industrial Applications of Linear Programming, Wiley, New York.
Charnes, A., Cooper, W.W. and Ferguson, R.O. (1955), “Optimal estimation of executive compensation by linear programming”, Management Science, 2, 138-151. Chmielewski, M.R. and Grzymala-Busse, J.W. (1996), “Global discretization of continuous attributes as preprocessing for machine learning”, International Journal of Approximate Reasoning, 15, 319-331. Choo, E.U. and Wedley, W.C. (1985), “Optimal criterion weights in repetitive multicriteria decision–making”, Journal of the Operational Research Society, 36/11, 983–992. Clark, P. and Niblett, T. (1989), “The CN2 induction algorithm”, Machine Learning, 3, 261283. Colson, G. and de Bruyn, Ch. (1989), “An integrated multiobjective portfolio management system”, Mathematical and Computer Modelling, 12/10-11, 1359-1381. Conway, D.G., Victor Cabot A. and Venkataramanan, M.A. (1998), “A genetic algorithm for discriminant analysis”, Annals of Operations Research, 78, 71-82. Cook, W.D. and Kress, M. (1991), “A multiple criteria decision model with ordinal preference data”, European Journal of Operational Research, 54, 191-198. Courtis, J.K. (1978), “Modelling a financial ratios categoric framework”, Journal of Business Finance & Accounting, 5/4, 371-387. Cronan, T.P., Glorfeld, L.W. and Perry, L.G. (1991), “Production system development for expert systems using a recursive partitioning induction approach: An application to mortgage, commercial and consumer lending”, Decision Sciences, 22, 812-845. Devaud, J.M., Groussaud, G. and Jacquet-Lagrèze, E. (1980), “UTADIS: Une méthode de construction de fonctions d’utilité additives rendant compte de jugements globaux”, European Working Group on Multicriteria Decision Aid, Bochum. Diakoulaki, D., Zopounidis, C., Mavrotas, G. and Doumpos, M. (1999), “The use of a preference disaggregation method in energy analysis and policy making”, Energy–The International Journal, 24/2, 157-166. Dias, L., Mousseau, V., Figueira, J. and Climaco, J. (2000), “An aggregation/disaggregation approach to obtain robust conclusions with ELECTRE TRI”, Cahier du LAMSADE, No 174, Université de Paris-Dauphine. Dillon, W.R. and Goldstein, M. (1978), “On the performance of some multinomial classification rules”, Journal of the American Statistical Association, 73, 305-313. Dimitras, A.I., Zopounidis, C. and Hurson, C. (1995), “A multicriteria decision aid method for the assessment of business failure risk”, Foundations of Computing and Decision Sciences, 20/2, 99-112. Dimitras, A.I., Zanakis, S.H. and Zopounidis, C. (1996), “A survey of business failures with an emphasis on prediction methods and industrial applications”, European Journal of Operational Research, 90, 487-513. Dimitras, A.I., Slowinski, R., Susmaga, R. and Zopounidis, C. (1999), “Business failure prediction using rough sets”, European Journal of Operational Research, 114, 263-280. Dominiak C. (1997), “Portfolio selection using the idea of reference solution”, in: G. Fandel and Th. Gal (eds.), Multiple Criteria Decision Making, Proceedings of the Twelfth Inter-
236 national Conference, Lectures Notes in Economics and Mathematical Systems 448, Hagen, Germany, Berlin-Heidelberg, 593-602. Doumpos, M. and Zopounidis, C. (1998), “The use of the preference disaggregation analysis in the assessment of financial risks”, Fuzzy Economic Review, 3/1, 39-57. Doumpos, M. and Zopounidis, C. (2001), “Developing sorting models using preference disaggregation analysis: An experimental investigation”, in: C. Zopounidis, P.M. Pardalos and G. Baourakis (Eds), Fuzzy Sets in Management, Economy and Marketing, World Scientific, Singapore, 51-67. Doumpos, M., Pentaraki, K., Zopounidis, C. and Agorastos, C. (2001), “Assessing country risk using a multi–group discrimination method: A comparative analysis”, Managerial Finance, 27/7-8, 16-34.. Duarte Silva, A.P. and Stam, A. (1994), “Second-order mathematical programming formulations for discriminant analysis”, European Journal of Operational Research, 74, 4-22. Dubois, D. and Prade, H. (1979), “Decision-making under fuzziness”, in: M.M. Gupta, R.K. Ragade and R.R. Yager (Eds), Advances in Fuzzy Set Theory and Applications, NorthHolland, Amsterdam, 279-302. Duchessi, P. and Belardo, S. (1987), “Lending analysis support system (LASS): An application of a knowledge-based system to support commercial loan analysis”, IEEE Transactions on Systems, Man, and Cybernetics, 17/4, 608-616. Duda, R.O. and Hart, P.E. (1978), Pattern Classification and Scene Analysis, John Wiley and Sons, New York. Dutka, A. (1995), AMA Handbook of Customer Satisfaction: A Guide to Research, Planning and Implementation, NTC Publishing Group, Illinois. Dyer, J.S. (1990), “A clarification of ‘Remarks on the analytic hierarchy process’”, Management Science, 36/3, 274-275. Elmer, P.J. and Borowski, D.M. (1988), “An expert system approach to financial analysis: The case of S&L bankruptcy”, Financial Management, 17, 66-76. Elton, E.J. and Gruber, M.J. (1995), Modern Portfolio Theory and Investment Analysis (5th edition), John Wiley and Sons, New York. Evrard, Y. and Zisswiller, R. (1982), “Une analyse des décisions d’investissement fondée sur les modèles de choix multi-attributs”, Finance, 3/1, 51-68. Falk, J.E. and Karlov, V.E. (2001), “Robust separation of finite sets via quadratics”, Computers and Operations Research, 28, 537–561. Fayyad, U.M. and Irani, K.B. (1992), “On the handling of continuous-valued attributes in decision tree generation”, Machine Learning, 8, 87-102. Fishburn, P.C. (1965), “Independence in utility theory with whole product sets”, Operations Research 13, 28-45. Fishburn, P.C. (1970), Utility Theory for Decision Making, Wiley, New York. Fisher, R.A. (1936), “The use of multiple measurements in taxonomic problems”, Annals of Eugenics, 7, 179-188.
Fleishman, A.I. (1978), “A method for simulating nonnormal distributions”, Psychometrika, 43, 521-532. Fodor, J. and Roubens, M. (1994), Fuzzy Preference Modelling and Multicriteria Decision Support, Kluwer Academic Publishers, Dordrecht. Foster, G. (1986), Financial Statements Analysis, Prentice Hall, London. Fraughnaugh, K., Ryan, J., Zullo, H. and Cox Jr., L.A. (1998), “Heuristics for efficient classification”, Annals of Operations Research, 78, 189-200. Freed, N. and Glover, F. (198la), “A linear programming approach to the discriminant problem”, Decision Sciences, 12, 68-74. Freed, N. and Glover, F. (1981b), “Simple but powerful goal programming models for discriminant problems”, European Journal of Operational Research, 7, 44-60. Freed, N. and Glover, F. (1986), “Evaluating alternative linear programming models to solve the two–group discriminant problem”, Decision Sciences, 17, 151–162. Fritz, S. and Hosemann, D. (2000), “Restructuring the credit process: Behavior scoring for German corporates”, International Journal of Intelligent Systems in Accounting, Finance and Management, 9, 9-21. Frydman, H., Altman, E.I. and Kao, D.L. (1985), “Introducing recursive partitioning for financial classification: The case of financial distress”, Journal of Finance, XL/1, 269-291. Gehrlein, W.V. and Wagner, B.J. (1997), Nontraditional Approaches to the Statistical Classification and Regression Problems, Special Issue in Annals of Operations Research, 74. Gelfand, S., Ravishankar, C. and Delp, E. (1991), “An iterative growing and pruning algorithm for classification tree design”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13/2, 163–174. Glover, F. (1990), “Improved linear programming models for discriminant analysis”, Decision Sciences, 21, 771–785. Glover, F. and Laguna, M. (1997), Tabu Search, Kluwer Academic Publishers, Boston. Glover, F., Keene, S. and Duea, B. (1988), “A new class of models for the discriminant problem”, Decision Sciences, 19, 269–280. Gloubos, G. and Grammatikos, T. (1988), “The success of bankruptcy prediction models in Greece”, Studies in Banking and Finance supplement to the Journal of Banking and Finance, 7, 37–46. Gochet, W., Stam, A., Srinivasan, V. and Chen, S. (1997), “Multigroup discriminant analysis using linear programming”, Operations Research, 45/2, 213-225. Grabisch, M. (1995), “Fuzzy integrals in multicriteria decision making”, Fuzzy Sets and Systems, 69, 279-298. Grabisch, M. (1996), “The application of fuzzy integrals in multicriteria decision making”, European Journal of Operational Research, 89, 445-456. Greco, S., Matarazzo, B. and Slowinski, R. (1997), “Rough set approach to multi-attribute choice and ranking problems”, in: G. Fandel and T. Gal (Eds), Multiple Criteria Decision Making, Springer-Verlag, Berlin, 318-329.
Greco, S., Matarazzo, B. and Slowinski, R. (1999a), “The use of rough sets and fuzzy sets in MCDM”, in: T. Gal, T. Hanne and T. Stewart (eds.), Advances in Multiple Criteria Decision Making, Kluwer Academic Publishers, Dordrecht, 14.1-14.59. Greco, S., Matarazzo, B., Slowinski, R. and Zanakis, S. (1999b), “Rough set analysis of information tables with missing values”, in: D. Despotis and C. Zopounidis (eds.), Integrating Technology & Human Decisions: Bridging into the 21st Century, Vol. II, Proceedings of the 5th International Meeting of the Decision Sciences Institute, New Technologies Editions, Athens, 1359-1362. Greco, S., Matarazzo, B. and Slowinski, R. (2000a), “Extension of the rough set approach to multicriteria decision support”, INFOR, 38/3, 161-196. Greco, S., Matarazzo, B. and Slowinski, R. (2000b), “Dealing with missing values in rough set analysis of multi-attribute and multi-criteria decision problems”, in: S.H. Zanakis, G. Doukidis and C. Zopounidis (eds.), Decision Making: Recent Developments and Worldwide Applications, Kluwer Academic Publishers, Dordrecht, 295-316. Greco, S., Matarazzo, B. and Slowinski, R. (2001), “Conjoint measurement and rough sets approach for multicriteria sorting problems in presence of ordinal data”, in: A. Colorni, M. Paruccini and B. Roy (eds.), AMCDA - Aide Multicritère à la Décision (Multiple Criteria Decision Aiding), EUR Report, Joint Research Centre, The European Commission, Ispra (to appear). Greco, S., Matarazzo, B. and Slowinski, R. (2002), “Rough sets methodology for sorting problems in presence of multiple attributes and criteria”, European Journal of Operational Research, 138, 247-259. Grinold, R.C. (1972), “Mathematical programming methods for pattern classification”, Management Science, 19, 272-289. Grzymala-Busse, J.W. (1992), “LERS: A system for learning from examples based on rough sets”, in: R. Slowinski (ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers, Dordrecht, 3-18. Grzymala-Busse, J.W. and Stefanowski, J. (2001), “Three discretization methods for rule induction”, International Journal of Intelligent Systems, 26, 29-38. Gupta, M.C. and Huefner, R.J. (1972), “A cluster analysis study of financial ratios and industry characteristics”, Journal of Accounting Research, Spring, 77-95. Gupta, Y.P., Rao, R.P. and Bagghi, P.K. (1990), “Linear goal programming as an alternative to multivariate discriminant analysis: A note”, Journal of Business Finance and Accounting, 17/4, 593-598. Hand, D.J. (1981), Discrimination and Classification, Wiley, New York. Harker, P.T. and Vargas, L.G. (1990), “Reply to ‘Remark on the analytic hierarchy process’ by J.S. Dyer”, Management Science, 36/3, 269-273.
Horsky, D. and Rao, M.R. (1984), “Estimation of attribute weights from preference comparisons”, Management Science, 30/7, 801-822. Hosseini, J.C. and Armacost, R.L. (1994), “The two-group discriminant problem with equal group mean vectors: An experimental evaluation of six linear/nonlinear programming formulations”, European Journal of Operational Research, 77, 241-252. Hung, M.S. and Denton, J.W. (1993), “Training neural networks with the GRG2 nonlinear optimizer”, European Journal of Operational Research, 69, 83-91. Hurson Ch. and Ricci, N. (1998), “Multicriteria decision making and portfolio management with arbitrage pricing theory”, in: C. Zopounidis (ed.), Operational Tools in The Management of Financial Risks, Kluwer Academic Publishers, Dordrecht, 31-55. Hurson, Ch. and Zopounidis, C. (1995), “On the use of multi-criteria decision aid methods to portfolio selection”, Journal of Euro-Asian Management, 1/2, 69-94. Hurson Ch. and Zopounidis C. (1996), “Méthodologie multicritère pour l’évaluation et la gestion de portefeuilles d’actions”, Banque et Marché 28, Novembre-Décembre, 11-23. Hurson, Ch. and Zopounidis, C. (1997), Gestion de Portefeuille et Analyse Multicritère, Economica, Paris. Ishibuchi, H., Nozaki, K. and Tanaka, H. (1992), “Distributed representation of fuzzy rules and its application to pattern classification”, Fuzzy Sets and Systems, 52, 21-32. Ishibuchi, H., Nozaki, K. and Tanaka, H. (1993), “Efficient fuzzy partition of pattern space for classification problems”, Fuzzy Sets and Systems, 59, 295-304. Inuiguchi, M., Tanino, T. and Sakawa, M. (2000), “Membership function elicitation in possibilistic programming problems”, Fuzzy Sets and Systems, 111, 29-45. Jablonsky, J. (1993), “Multicriteria evaluation of clients in financial houses”, Central European Journal of Operations Research and Economics, 3/2, 257-264. Jacquet-Lagrèze, E. (1995), “An application of the UTA discriminant model for the evaluation of R & D projects”, in: P.M. Pardalos, Y. Siskos, C. Zopounidis (eds.), Advances in Multicriteria Analysis, Kluwer Academic Publishers, Dordrecht, 203-211. Jacquet-Lagrèze, E. and Siskos, J. (1978), “Une méthode de construction de fonctions d’ utilité additives explicatives d’ une préférence globale”, Cahier du LAMSADE, No 16, Université de Paris-Dauphine. Jacquet-Lagrèze, E. and Siskos, Y. (1982), “Assessing a set of additive utility functions for multicriteria decision making: The UTA method”, European Journal of Operational Research, 10, 151-164. Jacquet-Lagrèze, E. and Siskos, J. (1983), Méthodes de Décision Multicritère, Editions Hommes et Techniques, Paris. Jacquet-Lagrèze, E. and Siskos, J. (2001), “Preference disaggregation: Twenty years of MCDA experience”, European Journal of Operational Research, 130, 233-245. Jelanek, J. and Stefanowki, J. (1998), “Experiments on solving multiclass learning problems by n2-classifier”, in: Proceedings of the 10th European Conference on Machine Learning, Chemnitz, April 21-24, 1998, Lecture Notes in AI, vol. 1398, Springer-Verlag, Berlin, 172-177.
240 Jensen, R.E. (1971), “A cluster analysis study of financial performance of selected firms”, The Accounting Review, XLVI, January, 36-56. Joachimsthaler, E.A. and Stam, A. (1988), “Four approaches to the classification problem in discriminant analysis: An experimental study”, Decision Sciences, 19, 322–333. Joachimsthaler, E.A. and Stam, A. (1990), “Mathematical programming approaches for the classification problem in two-group discriminant analysis”, Multivariate Behavioral Research, 25/4, 427-454. Jog, V., Michalowski, W., Slowinski, R. and Susmaga, R. (1999), “The Rough Sets Analysis and the Neural Networks Classifier: A Hybrid Approach to Predicting Stocks’ Performance”, in: D.K. Despotis and C. Zopounidis (eds.), Integrating Technology & Human Decisions: Bridging into the 21st Century, Vol. II, Proceedings of the 5th International Meeting of the Decision Sciences Institute, New Technologies Editions, Athens, 1386-1388. John, G.H., Miller, P. and Kerber, R. (1996), “Stock selection using RECONTM/SM,., in: Y. AbuMostafa, J. Moody, P. Refenes and A. Weigend (eds.), Neural Networks in Financial Engineering, World Scientific, London, 303-316. Karapistolis, D., Katos, A., and Papadimitriou, G. (1996), “Selection of a solvent portfolio using discriminant analysis”, in: Y. Siskos, C. Zopounidis, and K. Pappis (Eds.), Management of small firms, Cretan University Editions, Iraklio, 135-140 (in Greek). Kahya, E. and Theodossiou, P. (1999), “Predicting corporate financial distress: A time-series CUSUM methodology”, Review of Quantitative Finance and Accounting, 13, 323-345. Karst, O.J. (1958), “Linear curve fitting using least deviations”, Journal of the American Statistical Association, 53, 118-132. Keasey, K. and Watson, R. (1991), “Financial distress prediction models: A review of their usefulness”, British Journal of Management, 2, 89-102. Keasey, K., McGuinness, P. and Short, H. (1990), “Multilogit approach to predicting corporate failure-Further analysis and the issue of signal consistency”, Omega, 18/1, 85-94. Kelley, J.E. (1958), “An application of linear programming to curve fitting”, Journal of Industrial and Applied Mathematics, 6, 15-22. Keeney, R.L. and Raiffa, H. (1993), Decisions with Multiple Objectives: Preferences and Value Trade-offs, Cambridge University Press, Cambridge. Khalil, J., Martel, J-M. and Jutras, P. (2000), “A multicriterion system for credit risk rating”, Gestion 2000: Belgian Management Magazine, 15/1, 125-146. Khoury, N.T., Martel, J.M. and Veilleux, M. (1993), “Méthode multicritère de sélection de portefeuilles indiciels internationaux”, L’Actualité Economique, Revue d’Analyse Economique, 69/1, 171-190. Klemkowsky, R. and Petty, J.W. (1973), “A multivariate analysis of stock price variability”, Journal of Business Research, Summer. Koehler, G.J. and Erenguc, S.S. (1990), “Minimizing misclassifications in linear discriminant analysis”, Decision Sciences, 21, 63–85. Kohara, K., Ishikawa, T., Fukuhara, Y. and Nakamura, Y. (1997), “Stock price prediction using prior knowledge and neural networks”, Intelligent Systems in Accounting, Finance and Management, 6, 11-22.
Koopmans, T.C. (1951), Activity Analysis of Production and Allocation, John Wiley and Sons, New York. Kordatoff, Y. and Michalski, R.S. (1990), Machine Learning: An Artificial Intelligence Approach, Volume III, Morgan Kaufmann Publishers, Los Altos, California. Korhonen, P. (1988), “A visual reference direction approach to solving discrete multiple criteria problems”, European Journal of Operational Research, 34, 152-159. Korhonen, P. and Wallenius, J. (1988), “A Pareto race”, Naval Research Logistics, 35, 615623. Kosko, B. (1992), Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, New Jersey. Krzanowski, W.J. (1975), “Discrimination and classification using both binary and continuous variables”, Journal of the American Statistical Association, 70, 782-790. Krzanowski, W.J. (1977), “The performance of Fisher’s linear discriminant function under nonoptimal conditions”, Technometrics, 19, 191-200. Laitinen, E.K. (1992), “Prediction of failure of a newly founded firm”, Journal of Business Venturing, 7, 323-340. Lam, K.F. and Choo, E.U. (1995), “Goal programming in preference decomposition”, Journal of the Operational Research Society, 46, 205-213. Lanchenbruch, P.A., Snuringer, C. and Revo, L.T. (1973), “Robustness of the linear and quadratic discriminant function to certain types of non-normality”, Communications in Statistics, 1, 39-56. Langholz, G., Kandel, A., Schneider, M. and Chew, G. (1996), Fuzzy Expert System Tools, John Wiley and Sons, New York. Lee, S.M. and Chesser, D.L. (1980), “Goal programming for portfolio selection”, The Journal of Portfolio Management, Spring, 22-26. Lee, K.C. and Kim, H.S. (1997), “A fuzzy cognitive map-based bi-directional inference mechanism: An application to stock investment analysis”, Intelligent Systems in Accounting, Finance & Management, 6, 41-57. Lee, C.K. and Ord, K.J. (1990), “Discriminant analysis using least absolute deviations”, Decision Science, 21, 86-96. Lee, K.H. and Jo, G.S. (1999), “Expert system for predicting stock market timing using a candlestick chart”, Expert Systems with Applications, 16, 357-364. Lee, J.K., Kim, H.S. and Chu, S.C. (1989), “Intelligent stock portfolio management system”, Expert Systems, 6/2, 74-85. Lee, H., Kwak, W. and Han, I. (1995), “Developing a business performance evaluation system: An analytic hierarchical model”, The Engineering Economist, 30/4, 343-357. Lennox, C.S. (1999), “The accuracy and incremental information content of audit reports in predicting bankruptcy”, Journal of Business Finance & Accounting, 26/5-6, 757-778. Liittschwager, J.M., and Wang, C. (1978), “Integer programming solution of a classification problem”, Management Science, 24/14, 1515-1525.
242 Liu, N.K.. and Lee, K.K. (1997), “An intelligent business advisor system for stock investment”, Expert Systems, 14/4, 129-139. Lofti, V., Stewart, T.J. and Zionts, S. (1992), “An aspiration-level interactive model for multiple criteria decision making”, Computers and Operations Research, 19, 677-681. Lootsma, F.A. (1997), Fuzzy Logic for Planning and Decision Making, Kluwer Academic Publishers, Dordrecht. Luce, D. (1956), “Semiorders and a theory of utility discrimination”, Econometrica, 24. Luoma, M. and Laitinen, E.K. (1991), “Survival analysis as a tool for company failure prediction”, Omega, 19/6, 673-678. Lynch, J.G. (1979), “Why additive utility models fail as descriptions of choice behavior”, Journal of Experimental Social Phychology, 15, 397-417. Mangasarian, O.L. (1968), “Multisurface method for patter separation”, IEEE Transactions on Information Theory, IT-14/6, 801-807. Mardia, K.V. (1975), “Assessment of multinormality and the robustness of Hotelling’s T2 test”, Applied Statistics, 24, 163-171. Markowitz, H. (1952), “Portfolio selection”, Journal of Finance, 7/1, 77-91. Markowitz, H. (1959), Portfolio Selection: Efficient Diversification of Investments, John Wiley and Sons, New York. Markowski, C.A. (1990), “On the balancing of error rates for LP discriminant methods”, Managerial and Decision Economics, 11, 235-241. Markowski, E.P. and Markowski, C.A. (1985), “Some difficulties and improvements in applying linear programming formulations to the discriminant problem”, Decision Sciences, 16, 237-247. Markowski, C.A. and Markowski, E.P. (1987), “An experimental comparison of several approaches to the discriminant problem with both qualitative and quantitative variables”, European Journal of Operational Research, 28, 74-78. Martel, J.M., Khoury, N.T. and Bergeron, M. (1988), “An application of a multicriteria approach to portfolio comparisons”, Journal of the Operational Research Society, 39/7, 617-628. Martin, D. (1977), “Early warning of bank failure: A logit regression approach”, Journal of Banking and Finance, 1, 249-276. Massaglia, M. and Ostanello, A. (1991), “N-TOMIC: A decision support for multicriteria segmentation problems”, in: P. Korhonen (ed.), International Workshop on Multicriteria Decision Support, Lecture Notes in Economics and Mathematics Systems 356, SpringerVerlag, Berlin, 167-174. Matsatsinis, N.F., Doumpos, M. and Zopounidis, C. (1997), “Knowledge acquisition and representation for expert systems in the field of financial analysis”, Expert Systems with Applications, 12/2, 247-262. McFadden, D. (1974), “Conditional logit analysis in qualitative choice behavior”, in: P. Zarembka (ed.), Frontiers in Econometrics, Academic Press, New York.
McFadden, D. (1980), “Structural discrete probability models derived from the theories of choice”, in: C.F. Manski and D. McFadden (eds.), Structural Analysis of Discrete Data with Econometric Applications, MIT Press, Cambridge, Mass. McLachlan, G. J. (1992), Discriminant Analysis and Statistical Pattern Recognition, Wiley, New York. Messier, W.F. and Hansen, J.V. (1988), “Inducing rules for expert system development: An example using default and bankruptcy data”, Management Science, 34/12, 1403-1415. Michalski, R.S. (1969), “On the quasi-minimal solution of the general covering problem”, Proceedings of the 5th International Federation on Automatic Control, Vol. 27, 109-129. Mienko, R., Stefanowski, J., Toumi, K. and Vanderpooten, D., (1996), “Discovery-oriented induction of decision rules”, Cahier du LAMSADE no. 141, Université de Paris Dauphine, Paris. Moody’s Investors Service (1998), Moody’s Equity Fund Analyzer (MFA): An Analytical Tool to Assess the Performance and Risk Characteristics of Equity Mutual Funds, Moody’s Investors Service, New York. Moody’s Investors Service (1999), Moody’s Sovereign Ratings: A Ratings Guide, Moody’s Investors Service, New York. Moody’s Investors Service (2000), Moody’s Three Point Plot: A New Approach to Mapping Equity Fund Returns, Moody’s Investors Service, New York. Moore, D.H. (1973), “Evaluation of five discriminant procedures for binary variables”, Journal of the American Statistical Association, 68, 399-404. Mousseau, V. and Slowinski, R. (1998), “Inferring an ELECTRE-TRI model from assignment examples”, Journal of Global Optimization, 12/2, 157-174. Mousseau, V., Slowinski, R. and Zielniewicz, P. (2000), “A user-oriented implementation of the ELECTRE-TRI method integrating preference elicitation support”, Computers and Operations Research, 27/7-8, 757-777. Murphy, J. (1999), Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications, Prentice Hall Press, New Jersey. Nakayama, H. and Kagaku, N. (1998), “Pattern classification by linear goal programming and its extensions”, Journal of Global Optimization, 12/2, 111-126. Nakayama, H., Takeguchi, T. and Sano, M. (1983), “Interactive graphics for portfolio selection”, in: P. Hansen (ed.), Essays and Surveys on Multiple Criteria Decision Making, Lectures Notes in Economics and Mathematical Systems 209, Springer Verlag, BerlinHeidelberg, 280-289. Nieddu, L. and Patrizi, G. (2000), “Formal methods in pattern recognition: A review”, European Journal of Operational Research, 120, 459-495. Oh, S. and Pedrycz, W. (2000), “Identification of fuzzy systems by means of an auto-tuning algorithm and its application to nonlinear systems”, Fuzzy Sets and Systems, 115, 205230. Ohlson, J.A. (1980), “Financial ratios and the probabilistic prediction of bankruptcy”, Journal of Accounting Research, 18, 109–131.
244 Oral, M. and Kettani, O. (1989), “Modelling the process of multiattribute choice”, Journal of the Operational Research Society, 40/3, 281-291. Östermark, R. and Höglund, R. (1998), “Addressing the multigroup discriminant problem using multivariate statistics and mathematical programming”, European Journal of Operational Research, 108, 224-237. Pareto, V. (1896), Cours d’ Economie Politique, Lausanne. Pardalos, P.M., Sandström, M. and Zopounidis, C. (1994), “On the use of optimization models for portfolio selection: A review and some computational results”, Computational Economics, 7/4, 227-244. Pardalos, P.M., Siskos, Y. and Zopounidis, C. (1995), Advances in Multicriteria Analysis, Kluwer Academic Publishers, Dordrecht. Patuwo, E., Hu, M.Y. and Hung, M.S. (1993), “Two-group classification using neural networks”, Decision Sciences, 24, 825-845. Pawlak, Z. (1982), “Rough sets”, International Journal of Information and Computer Sciences, 11, 341–356. Pawlak, Z. (1991) Rough Sets: Theoretical Aspects of Reasoning about Data, Kluwer Academic Publishers, Dordrecht. Pawlak, Z. and Slowinski, R. (1994), “Rough set approach to multi-attribute decision analysis”, European Journal of Operational Research, 72, 443-459. Peel, M.J. (1987), “Timeliness of private company reports predicting corporate failure”, Investment Analysis, 83, 23-27. Perny, P. (1998), “Multicriteria filtering methods based on concordance and non-discordance principles”, Annals of Operations Research, 80, 137-165. Platt, H.D. and Platt, M.B. (1990), “Development of a class of stable predictive variables: The case of bankruptcy prediction”, Journal of Business Finance and Accounting, 17/1, 31– 51. Press, S.J. and Wilson, S. (1978), “Choosing between logistic regression and discriminant analysis”, Journal of the American Statistical Association, 73, 699-705. Quinlan, J.R. (1983), “Learning efficient classification procedures and their application to chess end games”, in: R.S. Michalski, J.G. Carbonell and T.M. Mitchell (eds.), Machine Learning: An Artificial Intelligence Approach, Tioga Publishing Company, Palo Alto, CA. Quinlan, J.R. (1986), “Induction of decision trees”, Machine Learning 1, 81–106. Quinlan J.R. (1993), C4.5: Programs for Machine Learning, Morgan Kaufmann Publishers, Los Altos, California. Ragsdale, C.T. and Stam, A. (1991), “Mathematical programming formulations for the discriminant problem: An old dog does new tricks”, Decision Sciences, 22, 296-307. Rios-Garcia, S. and Rios-Insua, S. (1983), “The portfolio problem with multiattributes and multiple criteria”, in: P. Hansen (ed.), Essays and Surveys on Multiple Criteria Decision Making, Lectures Notes in Economics and Mathematical Systems 209, Springer Verlag, Berlin Heidelberg, 317-325.
Ripley, B.D. (1996), Pattern Recognition and Neural Networks, Cambridge University Press, Cambridge. Rose, P.S., Andrews W.T. and Giroux, G.A. (1982), “Predicting business failure: A macroeconomic perspective”, Journal of Accounting and Finance, 6/1, 20-31. Ross, S. (1976), “The arbitrage theory of capital asset pricing”, Journal of Economic Theory, 13, 343-362. Roy, B. (1968), “Classement et choix en présence de points de vue multiples: La méthode ELECTRE”, R.I.R.O, 8, 57-75. Roy, B. (1985), Méthodologie Multicritère d’ Aide à la Décision, Economica, Paris. Roy, B. (1991), “The outranking approach and the foundations of ELECTRE methods”, Theory and Decision, 31, 49-73. Roy, B. and Vincke, Ph. (1981), “Multicriteria analysis: Survey and new directions”, European Journal of Operational Research, 8, 207-218. Roy, B. and Bouyssou D. (1986), “Comparison of two decision-aid models applied to a nuclear power plant sitting example”, European Journal of Operational Research, 25, 200215. Rubin, P.A. (1990a), “Heuristic solution procedures for a mixed–integer programming discriminant model”, Managerial and Decision Economics, 11, 255–266. Rubin, P.A. (1990b), “A comparison of linear programming and parametric approaches to the two–group discriminant problem”, Decision Sciences, 21, 373–386. Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986), “Learning internal representation by error propagation”, in: D.E. Rumelhart and J.L. Williams (eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, MIT Press, Cambridge, Mass. Saaty, T.L. (1980), The Analytic Hierarchy Process, McGraw-Hill, New York. Saaty, T.L., Rogers, P.C. and Pell, R. (1980), “Portfolio selection through hierarchies”, The Journal of Portfolio Management, Spring, 16-21. Scapens, R.W., Ryan, R.J. and Flecher, L. (1981), “Explaining corporate failure: A catastrophe theory approach”, Journal of Business Finance and Accounting, 8/1, 1-26. Schoner B. and Wedley, W.C. (1989), “Ambiguous criteria weights in AHP: Consequences and solutions”, Decision Sciences, 20, 462-475. Schoner B. and Wedley, W.C. (1993), “A unified approach to AHP with linking pins”, European Journal of Operational Research, 64, 384-392. Sharpe, W. (1964), “Capital asset prices: A theory of market equilibrium under conditions of risk”, Journal of Finance, 19, 425-442. Sharpe, W. (1998), “Morningstar’s risk adjusted ratings”, Financial Analysts Journal, July/August, 21-33. Shen, L., Tay, F.E.H, Qu, L. and Shen, Y. (2000), “Fault diagnosis using rough sets theory”, Computers in Industry 43, 61-72. Siskos, J. (1982), “A way to deal with fuzzy preferences in multicriteria decision problems”, European Journal of Operational Research, 10, 314-324.
246 Siskos, J. and Despotis, D.K. (1989), “A DSS oriented method for multiobjective linear programming problems”, Decision Support Systems, 5, 47-55. Siskos, Y. and Yannacopoulos, D. (1985), “UTASTAR: An ordinal regression method for building additive value functions”, Investigação Operacional, 5/1, 39-53. Siskos, J., Lochard, J. and Lombardo, J. (1984a), “A multicriteria decision-making methodology under fuzziness: Application to the evaluation of radiological protection in nuclear power plants”, in: H.J. Zimmermann, L.A. Zadeh, B.R. Gaines (eds.), Fuzzy Sets and Decision Analysis, North-Holland, Amsterdam, 261-283. Siskos, J., Wascher, G. and Winkels, H.M. (1984b), “Outranking approaches versus MAUT in MCDM”, European Journal of Operational Research, 16, 270-271. Siskos, Y., Grigoroudis, E., Zopounidis, C. and Saurais, O. (1998), “Measuring customer satisfaction using a survey based preference disaggregation model”, Journal of Global Optimization, 12/2, 175-195. Skogsvik, K. (1990), “Current cost accounting ratios as predictors of business failure: The Swedish case”, Journal of Business Finance and Accounting, 17/1, 137-160. Skowron, A. (1993), “Boolean reasoning for decision rules generation”, in: J. Komorowski and Z. W. Ras (eds.), Methodologies for Intelligent Systems, Lecture Notes in Artificial Intelligence vol. 689, Springer-Verlag, Berlin, 295–305. Slowinski, R. (1993), “Rough set learning of preferential attitude in multi-criteria decision making”, in: J. Komorowski and Z. W. Ras (eds.), Methodologies for Intelligent Systems. Lecture Notes in Artificial Intelligence vol. 689, Springer-Verlag, Berlin, 642–651. Slowinski, R. and Stefanowski, J. (1992), “RoughDAS and RoughClass software implementations of the rough sets approach”, in: R. Slowinski (ed.), Intelligent Decision Support: Handbook of Applications and Advances of the Rough Sets Theory, Kluwer Academic Publishers, Dordrecht, 445-456. Slowinski, R. and Stefanowski, J. (1994), “Rough classification with valued closeness relation”, in: E. Diday et al. (eds.), New Approaches in Classification and Data Analysis, Springer-Verlag, Berlin, 482–488. Slowinski, R. and Zopounidis, C. (1995), “Application of the rough set approach to evaluation of bankruptcy risk”, International Journal of Intelligent Systems in Accounting, Finance and Management, 4, 27–41. Smith, C. (1947), “Some examples of discrimination”, Annals of Eugenics, 13, 272-282. Smith, K.V. (1965), “Classification of investment securities using multiple discriminant analysis”, Institute Paper No. 101, Institute for Research in the Behavioral, Economic and Management Sciences, Perdue University. Smith, F.W. (1968), “Pattern classifier design by linear programming”, IEEE Transactions on Computers, C-17,4, 367-372. Spronk, J. (1981), Interactive Multiple Goal Programming Application to Financial Planning, Martinus Nijhoff Publishing, Boston. Spronk, J. and Hallerbach, W. (1997), “Financial modeling: Where to go? With an illustration for portfolio management”, European Journal of Operational Research, 99, 113-125.
Srinivasan, V. and Kim, Y.H. (1987), “Credit granting: A comparative analysis of classification procedures”, Journal of Finance, XLII/3, 665–683. Srinivasan, V. and Ruparel, B. (1990), “CGX: An expert support system for credit granting”, European Journal of Operational Research, 45, 293-308. Srinivasan, V. and Shocker, A.D. (1973), “Linear programming techniques for multidimensional analysis of preferences”, Psychometrika, 38/3, 337–396. Stam, A. (1990), “Extensions of mathematical programming-based classification rules: A multicriteria approach”, European Journal of Operational Research, 48, 351-361. Stam, A. and Joachimsthaler, E.A. (1989), “Solving the classification problem via linear and nonlinear programming methods”, Decision Sciences, 20, 285–293. Standard & Poor’s Rating Services (1997), International Managed Funds: Profiles, Criteria, Related Analytics, Standard & Poor’s, New York. Standard & Poor’s Rating Services (2000), Money Market Fund Criteria, Standard & Poor’s, New York. Stefanowski, J. and Vanderpooten, D. (1994), “A general two-stage approach to inducing rules from examples”, in: W. Ziarko (ed.) Rough Sets, Fuzzy Sets and Knowledge Discovery, Springer-Verlag, London, 317–325. Steiner, M. and Wittkemper, H.G. (1997), “Portfolio optimization with a neural network implementation of the coherent market hypothesis”, European Journal of Operational Research, 100, 27-40. Steuer, R.E. and Choo, E.U. (1983), “An interactive weighted Tchebycheff procedure for multiple objective programming”, Mathematical Programming, 26/1, 326-344. Stewart, T.J. (1993), “Use of piecewise linear value functions in interactive multicriteria decision support: A Monte Carlo study”, Management Science, 39, 1369-1381. Stewart, T.J. (1996), “Robustness of additive value function methods in MCDM”, Journal of Multi-Criteria Decision Analysis, 5, 301-309. Subramanian, V., Hung, M.S. and Hu, M.Y. (1993), “An experimental evaluation of neural networks for classification”, Computers and Operations Research, 20/7, 769-782. Subrahmaniam, K. and Chinganda, E.F. (1978), “Robustness of the linear discriminant function to nonnormality: Edgeworth series”, Journal of Statistical Planning and Inference, 2, 79-91. Szala, A. (1990), L’ Aide à la Décision en Gestion de Portefeuille, Diplôme Supérieur de Recherches Appliquées, Université de Paris Dauphine. Tam, K.Y., Kiang, M.Y. and Chi, R.T.H. (1991), “Inducing stock screening rules for portfolio construction”, Journal of the Operational Research Society, 42/9, 747-757. Tamiz, M., Hasham, R. and Jones, D.F. (1997), “A comparison between goal programming and regression analysis for portfolio selection”, in: G. Fandel and Th. Gal (eds.), Lectures Notes in Economics and Mathematical Systems 448, Multiple Criteria Decision Making, Proceedings of the Twelfth International Conference, Hagen, Germany, BerlinHeidelberg, 422-432. Tessmer, A.C. (1997), “What to learn from near misses: An inductive learning approach to credit risk assessment”, Decision Sciences, 28/1, 105-120.
248 Theodossiou, P. (1987), Corporate Failure Prediction Models for the US Manufacturing and Retailing Sectors, Unpublished Ph.D. Thesis, City University of New York. Theodossiou, P. (1991), “Alternative models for assessing the financial condition of business in Greece”, Journal of Business Finance and Accounting, 18/5, 697–720. Theodossiou, P., Kahya, E., Saidi, R. and Philippatos, G. (1996), “Financial distress and corporate acquisitions: Further empirical evidence”, Journal of Business Finance and Accounting, 23/5–6, 699–719. Trippi, R.R. and Turban, R. (1996), Neural Networks in Finance and Investing, Irwin, Chicago. Tsumoto, S. (1998), “Automated extraction of medical expert system rules from clinical databases based on rough set theory”, Information Sciences, 112, 67-84. Wagner, H.M. (1959), “Linear programming techniques for regression analysis”, Journal of the American Statistical Association, 54, 206-212. White, R. (1975), “A multivariate analysis of common stock quality ratings”, Financial Management Association Meetings. Wierzbicki, A.P. (1980), “The use of reference objectives in multiobjective optimization”, in: G. Fandel and T. Gal (eds.), Multiple Criteria Decision Making: Theory and Applications, Lecture Notes in Economic and Mathematical Systems 177, Springer-Verlag, Berlin-Heidelberg, 468-486. Wilson, J.M. (1996), “Integer programming formulation of statistical classification problems”, Omega, 24/6, 681–688. Wilson, R.L. and Sharda, R. (1994), “Bankruptcy prediction using neural networks”, Decision Support Systems, 11, 545-557. Wong, F.S., Wang, P.Z., Goh, T.H. and Quek, B.K. (1992), “Fuzzy neural systems for stock selection”, Financial Analysts Journal, January/February, 47-52. Wood, D. and Dasgupta, B. (1996), “Classifying trend movements in the MSCI U.S.A. capital market index: A comparison of regression, ARIMA and neural network methods”, Computers and Operations Research, 23/6, 611 -622. Vale, D.C. and Maurelli, V.A. (1983), “Simulating multivariate nonnormal distributions”, Psychometrika, 48/3,465-471. Vargas, L.G. (1990), “An overview of the AHP and its applications”, European Journal of Operational Research, 48, 2-8. Von Altrock, C. (1996), Fuzzy Logic and Neurofuzzy Applications in Business and Finance, Prentice Hall, New Jersey. Von Neumann, J. and Morgenstern, O. (1944), Theory of Games and Economic Behavior, Princeton, New Jersey. Yager, R.R. (1977), “Multiple objective decision-making using fuzzy sets”, International Journal of Man-Machine Studies, 9, 375-382. Yandell, B.S. (1977), Practical Data Analysis for Designed Experiments, Chapman & Hall, London.
Yanev, N. and Balev, S. (1999), “A combinatorial approach to the classification problem”, European Journal of Operational Research, 115, 339-350. Young, T.Y and Fu, K.-S. (1997), Handbook of Pattern Recognition and Image Processing, Handbooks in Science and Technology, Academic Press. Yu, W. (1992), “ELECTRE TRI: Aspects methodologiques et manuel d’utilisation”. Document du Lamsade No 74, Universite de Paris-Dauphine, 1992. Vranas, A.S. (1992), “The significance of financial characteristics in predicting business failure: An analysis in the Greek context,” Foundations of Computing and Decision Sciences, 17/4, 257-275. Zadeh, L.A. (1965), “Fuzzy sets”, Information and Control, 8, 338-353. Zahedi, F. (1986), “The analytic hierarchy process: A survey of the method and its applications”, Interfaces, 16, 96-108. Zanakis, S.H., Solomon, A., Wishart, N. and Duvlish, S. (1998), “Multi-attribute decision making: A simulation comparison of select methods”, European Journal of Operational Research, 107, 507-529. Zavgren, C.V. (1985), “Assessing the vulnerability to failure of American industrial firms. A logistic analysis”, Journal of Business Finance and Accounting, 12/1, 19–45. Ziarko, W., Golan, D. and Edwards, D. (1993), “An application of DATALOGIC/R knowledge discovery tool to identify strong predictive rules in stock market data”, in: Proceedings of the AAAI Workshop on Knowledge Discovery in Databases, Washington D.C., 89–101. Zighed, D., Rabaseda, S. and Rakotomala, R. (1998), “FUSINTER: A method for discretisation of continuous attributes”, International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, 6/3, 307-326. Zions, S. and Wallenius, J. (1976), “An interactive programming method for solving the multicriteria problem”, Management Science, 22, 652-663. Zimmermann, H.J. (1978), “Fuzzy programming and linear programming with several objective functions”, Fuzzy Sets and Systems, 1, 45-55. Zmijewski, M.E. (1984), “Methodological issues related to the estimation of financial distress prediction models”, Studies on current Econometric Issues in Accounting Research, 5982. Zopounidis, C. (1987), “A multicriteria decision making methodology for the evaluation of the risk of failure and an application”, Foundations of Control Engineering, 12/1,45–67. Zopounidis, C. (1990), La Gestion du Capital-Risque, Economica, Paris. Zopounidis, C. (1993), “On the use of the MINORA decision aiding system to portfolio selection and management”, Journal of Information Science and Technology, 2/2, 150-156. Zopounidis, C. (1995), Evaluation du Risque de Défaillance de l’Entreprise: Méthodes et Cas d’Application, Economica, Paris. Zopounidis, C. (1998), Operational Tools in the Management of Financial Risks, Kluwer Academic Publishers, Dordrecht.
250 Zopounidis, C. (1999), “Multicriteria decision aid in financial management”, European Journal of Operational Research, 119, 404-415. Zopounidis, C. and Dimitras, A.I. (1998), Multicriteria Decision Aid Methods for the Prediction of Business Failure, Kluwer Academic Publishers, Dordrecht. Zopounidis, C. and Doumpos, M. (1997), “A multicriteria decision aid methodology for the assessment of country risk”, European Research on Management and Business Economics, 3/3, 13-33. Zopounidis, C. and Doumpos, M. (1998), “Developing a multicriteria decision support system for financial classification problems: The FINCLAS system”, Optimization Methods and Software, 8, 277-304. Zopounidis, C. and Doumpos, M. (1999a), “Business failure prediction using UTADIS multicriteria analysis”, Journal of the Operational Research Society, 50/11, 1138-1148. Zopounidis, C. and Doumpos, M. (1999b), “A multicriteria decision aid methodology for sorting decision problems: The case of financial distress”, Computational Economics, 14/3, 197-218. Zopounidis, C. and Doumpos, M. (2000a), “PREFDIS: A multicriteria decision support system for sorting decision problems”, Computers and Operations Research, 27/7-8, 779797. Zopounidis, C. and Doumpos, M. (2000b), Intelligent Decision Aiding Systems Based on Multiple Criteria for Financial Engineering, Kluwer Academic Publishers, Dordrecht. Zopounidis, C. and Doumpos, M. (2000c), “Building additive utilities for multi-group hierarchical discrimination: The M.H.DIS method”, Optimization Methods and Software, 14/3, 219-240. Zopounidis, C. and Doumpos, M. (2000d), “INVESTOR: A decision support system based on multiple criteria for portfolio selection and composition”, in: A. Colorni, M. Paruccini and B. Roy (eds.), A-MCD-A (Aide Multi Critère à la Décision – Multiple Criteria Decision Aiding), European Commission Joint Research Centre, 371-381. Zopounidis, C., Matsatsinis, N.F. and Doumpos, M. (1996), “Developing a multicriteria knowledge-based decision support system for the assessment of corporate performance and viability: The FINEVA system”, Fuzzy Economic Review, 1/2, 35-53. Zopounidis, C., Despotis D.K. and Kamaratou, I. (1998), “Portfolio selection using the ADELAIS multiobjective linear programming system”, Computational Economics, 11/3 (1998), 189-204. Zopounidis, C., Doumpos, M. and Zanakis, S.H. (1999), “Stock evaluation using a preference disaggregation methodology”, Decision Sciences, 30/2, 313-336.
Subject index
Arbitrage pricing theory, 206
Bankruptcy prediction, 6, 161-163
Bayes rule, 20, 68, 70
Bond rating, 160
C4.5, 28-29
Capital asset pricing model, 206
Capital losses, 223
Classification error rate, 82, 84-88
Clustering, 5
Coefficient of variation, 216-217
Compensatory approaches, 125, 149
Consistency, 99
Consistent family of criteria, 42-43
Correlation coefficient, 129, 166, 191
Country risk, 6, 161
Credit granting, 58, 185-188
Credit risk assessment, 6, 13, 185-188
Decision problematics, 1-3
Decision rules, 27-28, 31, 34-37
Decision support systems, 41, 49, 78, 187
Decision trees, 28
Default risk, 161, 186
Degeneracy, 97
Descriptive statistics, 166, 191, 213
Discriminant analysis
    Linear, 16-18
    Quadratic, 18-19
Discriminant function
    Linear, 16
    Quadratic, 18
Dividend policy, 159
Dominance relation, 37
Efficient set, 40, 46
ELECTRE TRI
    Assignment procedures, 63-64
    Concordance index, 60
    Concordance test, 60
    Credibility index, 62
    Discordance index, 62
    Discordance test, 60
    Indifference threshold, 61
    Preference threshold, 61
    Reference profiles, 59-60
    Veto threshold, 62
Error types
    Type I error, 181
    Type II error, 181
Experimental design, 126-127
Expert systems, 31, 187, 208
Factor analysis, 188
Financial management, 13, 54, 159
Financial ratios, 162-171
Financial statements, 162
FINCLAS system, 178, 188
Forecasting, 159, 207
Fuzzy sets, 30-32
Genetic algorithms, 71, 120, 187
Goal programming, 47
Group overlap, 130
ID3, 28
Incomparability relation, 51
Jackknife, 215
Kurtosis, 72
LERS system, 37
Linear interpolation, 92
Linear probability model, 20
Logit analysis
    Logit model, 20-23
    Multinomial model, 22
    Ordered model, 23
Machine learning, 27-30
Mean-variance model, 206
Mergers and acquisitions, 160
MHDIS
    Classification rule, 102
    Hierarchical discrimination, 101-105
    Marginal utility functions, 102-104
    Model extrapolation, 111-112
Mixed-integer programming, 71
Model validation, 134, 210, 215
Monotonicity, 42-43
Multiattribute utility theory, 48-49
Multicriteria decision aid, 39-55
Multi-group classification, 22, 128, 210
Multiobjective mathematical programming, 45-48
Mutual funds, 160, 205
Net present value, 186
Neural networks, 24-27
Non-compensatory approaches, 125
Opportunity cost, 85, 181
Option valuation, 206
Ordinal regression, 54
Outranking relation theory, 50-52
Portfolio theory, 160, 206, 207
PREFDIS system, 178
Preference disaggregation analysis, 52-55
Preferential independence, 49
Principal components analysis, 133
Quadratic programming, 206
Random data generation, 131-134
Rank reversal, 58-59
Reference set, 54, 82
Regression analysis, 6
Risk attitude, 79
Rough sets
    Core, 34
    Decision rules, 34-37
    Discretization, 32
    DOMLEM algorithm, 35
    Indiscernibility relation, 33
    MODLEM algorithm, 36
    Reduct, 34
    Rule induction, 34-36
    Valued closeness relation, 37
Skewness, 133
Sorting, 4
Statistical distribution, 127
Stock evaluation, 205-209
Tabu search, 71, 120, 230
Time-series, 185, 207
Trade-off, 47, 49
Training sample, 8
UTADIS
    Additive utility function, 78, 90, 94
    Classification rules, 82
    Criteria subintervals, 91, 96-97
    Marginal utility functions, 79-81, 91-92
    Piece-wise linear modeling, 96-98
    Post-optimality analysis, 98-99, 113-122
    Utility thresholds, 82
Utility functions
    Additive utility function, 48-49
    Multiplicative utility function, 55
Variance-covariance matrix, 17, 19
Venture capital, 161
Voting algorithms, 229
Weighted average model, 54, 80
Applied Optimization

1. D.-Z. Du and D.F. Hsu (eds.): Combinatorial Network Theory. 1996 ISBN 0-7923-3777-8
2. M.J. Panik: Linear Programming: Mathematics, Theory and Algorithms. 1996 ISBN 0-7923-3782-4
3. R.B. Kearfott and V. Kreinovich (eds.): Applications of Interval Computations. 1996 ISBN 0-7923-3847-2
4. N. Hritonenko and Y. Yatsenko: Modeling and Optimization of the Lifetime of Technology. 1996 ISBN 0-7923-4014-0
5. T. Terlaky (ed.): Interior Point Methods of Mathematical Programming. 1996 ISBN 0-7923-4201-1
6. B. Jansen: Interior Point Techniques in Optimization. Complementarity, Sensitivity and Algorithms. 1997 ISBN 0-7923-4430-8
7. A. Migdalas, P.M. Pardalos and S. Storøy (eds.): Parallel Computing in Optimization. 1997 ISBN 0-7923-4583-5
8. F.A. Lootsma: Fuzzy Logic for Planning and Decision Making. 1997 ISBN 0-7923-4681-5
9. J.A. dos Santos Gromicho: Quasiconvex Optimization and Location Theory. 1998 ISBN 0-7923-4694-7
10. V. Kreinovich, A. Lakeyev, J. Rohn and P. Kahl: Computational Complexity and Feasibility of Data Processing and Interval Computations. 1998 ISBN 0-7923-4865-6
11. J. Gil-Aluja: The Interactive Management of Human Resources in Uncertainty. 1998 ISBN 0-7923-4886-9
12. C. Zopounidis and A.I. Dimitras: Multicriteria Decision Aid Methods for the Prediction of Business Failure. 1998 ISBN 0-7923-4900-8
13. F. Giannessi, S. Komlósi and T. Rapcsák (eds.): New Trends in Mathematical Programming. Homage to Steven Vajda. 1998 ISBN 0-7923-5036-7
14. Ya-xiang Yuan (ed.): Advances in Nonlinear Programming. Proceedings of the ’96 International Conference on Nonlinear Programming. 1998 ISBN 0-7923-5053-7
15. W.W. Hager and P.M. Pardalos: Optimal Control. Theory, Algorithms, and Applications. 1998 ISBN 0-7923-5067-7
16. Gang Yu (ed.): Industrial Applications of Combinatorial Optimization. 1998 ISBN 0-7923-5073-1
17. D. Braha and O. Maimon (eds.): A Mathematical Theory of Design: Foundations, Algorithms and Applications. 1998 ISBN 0-7923-5079-0
18. O. Maimon, E. Khmelnitsky and K. Kogan: Optimal Flow Control in Manufacturing. Production Planning and Scheduling. 1998 ISBN 0-7923-5106-1
19. C. Zopounidis and P.M. Pardalos (eds.): Managing in Uncertainty: Theory and Practice. 1998 ISBN 0-7923-5110-X
20. A.S. Belenky: Operations Research in Transportation Systems: Ideas and Schemes of Optimization Methods for Strategic Planning and Operations Management. 1998 ISBN 0-7923-5157-6
21. J. Gil-Aluja: Investment in Uncertainty. 1999 ISBN 0-7923-5296-3
22. M. Fukushima and L. Qi (eds.): Reformulation: Nonsmooth, Piecewise Smooth, Semismooth and Smoothing Methods. 1999 ISBN 0-7923-5320-X
23. M. Patriksson: Nonlinear Programming and Variational Inequality Problems. A Unified Approach. 1999 ISBN 0-7923-5455-9
24. R. De Leone, A. Murli, P.M. Pardalos and G. Toraldo (eds.): High Performance Algorithms and Software in Nonlinear Optimization. 1999 ISBN 0-7923-5483-4
25. A. Schöbel: Locating Lines and Hyperplanes. Theory and Algorithms. 1999 ISBN 0-7923-5559-8
26. R.B. Statnikov: Multicriteria Design. Optimization and Identification. 1999 ISBN 0-7923-5560-1
27. V. Tsurkov and A. Mironov: Minimax under Transportation Constrains. 1999 ISBN 0-7923-5609-8
28. V.I. Ivanov: Model Development and Optimization. 1999 ISBN 0-7923-5610-1
29. F.A. Lootsma: Multi-Criteria Decision Analysis via Ratio and Difference Judgement. 1999 ISBN 0-7923-5669-1
30. A. Eberhard, R. Hill, D. Ralph and B.M. Glover (eds.): Progress in Optimization. Contributions from Australasia. 1999 ISBN 0-7923-5733-7
31. T. Hürlimann: Mathematical Modeling and Optimization. An Essay for the Design of Computer-Based Modeling Tools. 1999 ISBN 0-7923-5927-5
32. J. Gil-Aluja: Elements for a Theory of Decision in Uncertainty. 1999 ISBN 0-7923-5987-9
33. H. Frenk, K. Roos, T. Terlaky and S. Zhang (eds.): High Performance Optimization. 1999 ISBN 0-7923-6013-3
34. N. Hritonenko and Y. Yatsenko: Mathematical Modeling in Economics, Ecology and the Environment. 1999 ISBN 0-7923-6015-X
35. J. Virant: Design Considerations of Time in Fuzzy Systems. 2000 ISBN 0-7923-6100-8
36. G. Di Pillo and F. Giannessi (eds.): Nonlinear Optimization and Related Topics. 2000 ISBN 0-7923-6109-1
37. V. Tsurkov: Hierarchical Optimization and Mathematical Physics. 2000 ISBN 0-7923-6175-X
38. C. Zopounidis and M. Doumpos: Intelligent Decision Aiding Systems Based on Multiple Criteria for Financial Engineering. 2000 ISBN 0-7923-6273-X
39. X. Yang, A.I. Mees, M. Fisher and L. Jennings (eds.): Progress in Optimization. Contributions from Australasia. 2000 ISBN 0-7923-6286-1
40. D. Butnariu and A.N. Iusem: Totally Convex Functions for Fixed Points Computation and Infinite Dimensional Optimization. 2000 ISBN 0-7923-6287-X
41. J. Mockus: A Set of Examples of Global and Discrete Optimization. Applications of Bayesian Heuristic Approach. 2000 ISBN 0-7923-6359-0
42. H. Neunzert and A.H. Siddiqi: Topics in Industrial Mathematics. Case Studies and Related Mathematical Methods. 2000 ISBN 0-7923-6417-1
43. K. Kogan and E. Khmelnitsky: Scheduling: Control-Based Theory and Polynomial-Time Algorithms. 2000 ISBN 0-7923-6486-4
44. E. Triantaphyllou: Multi-Criteria Decision Making Methods. A Comparative Study. 2000 ISBN 0-7923-6607-7
45. S.H. Zanakis, G. Doukidis and C. Zopounidis (eds.): Decision Making: Recent Developments and Worldwide Applications. 2000 ISBN 0-7923-6621-2
46. G.E. Stavroulakis: Inverse and Crack Identification Problems in Engineering Mechanics. 2000 ISBN 0-7923-6690-5
47. A. Rubinov and B. Glover (eds.): Optimization and Related Topics. 2001 ISBN 0-7923-6732-4
48. M. Pursula and J. Niittymäki (eds.): Mathematical Methods on Optimization in Transportation Systems. 2000 ISBN 0-7923-6774-X
49. E. Cascetta: Transportation Systems Engineering: Theory and Methods. 2001 ISBN 0-7923-6792-8
50. M.C. Ferris, O.L. Mangasarian and J.-S. Pang (eds.): Complementarity: Applications, Algorithms and Extensions. 2001 ISBN 0-7923-6816-9
51. V. Tsurkov: Large-scale Optimization – Problems and Methods. 2001 ISBN 0-7923-6817-7
52. X. Yang, K.L. Teo and L. Caccetta (eds.): Optimization Methods and Applications. 2001 ISBN 0-7923-6866-5
53. S.M. Stefanov: Separable Programming Theory and Methods. 2001 ISBN 0-7923-6882-7
54. S.P. Uryasev and P.M. Pardalos (eds.): Stochastic Optimization: Algorithms and Applications. 2001 ISBN 0-7923-6951-3
55. J. Gil-Aluja (ed.): Handbook of Management under Uncertainty. 2001 ISBN 0-7923-7025-2
56. B.-N. Vo, A. Cantoni and K.L. Teo: Filter Design with Time Domain Mask Constraints: Theory and Applications. 2001 ISBN 0-7923-7138-0
57. S. Zlobec: Stable Parametric Programming. 2001 ISBN 0-7923-7139-9
58. M.G. Nicholls, S. Clarke and B. Lehaney (eds.): Mixed-Mode Modelling: Mixing Methodologies for Organisational Intervention. 2001 ISBN 0-7923-7151-8
59. F. Giannessi, P.M. Pardalos and T. Rapcsák (eds.): Optimization Theory. Recent Developments from Mátraháza. 2001 ISBN 1-4020-0009-X
60. K.M. Hangos, R. Lakner and M. Gerzson: Intelligent Control Systems. An Introduction with Examples. 2001 ISBN 1-4020-0134-7
61. D. Gstach: Estimating Output-Specific Efficiencies. 2002 ISBN 1-4020-0483-4
62. J. Geunes, P.M. Pardalos and H.E. Romeijn (eds.): Supply Chain Management: Models, Applications, and Research Directions. 2002 ISBN 1-4020-0487-7
63. M. Gendreau and P. Marcotte (eds.): Transportation and Network Analysis: Current Trends. Miscellanea in Honor of Michael Florian. 2002 ISBN 1-4020-0488-5
64. M. Patriksson and M. Labbé (eds.): Transportation Planning. State of the Art. 2002 ISBN 1-4020-0546-6
65. E. de Klerk: Aspects of Semidefinite Programming. Interior Point Algorithms and Selected Applications. 2002 ISBN 1-4020-0547-4
66. R. Murphey and P.M. Pardalos (eds.): Cooperative Control and Optimization. 2002 ISBN 1-4020-0549-0
67. R. Corrêa, I. Dutra, M. Fiallos and F. Gomes (eds.): Models for Parallel and Distributed Computation. Theory, Algorithmic Techniques and Applications. 2002 ISBN 1-4020-0623-3
68. G. Cristescu and L. Lupsa: Non-Connected Convexities and Applications. 2002 ISBN 1-4020-0624-1
69. S.I. Lyashko: Generalized Optimal Control of Linear Systems with Distributed Parameters. 2002 ISBN 1-4020-0625-X
70. P.M. Pardalos and V.K. Tsitsiringos (eds.): Financial Engineering, E-commerce and Supply Chain. 2002 ISBN 1-4020-0640-3
71. P.S. Knopov and E.J. Kasitskaya: Empirical Estimates in Stochastic Optimization and Identification. 2002 ISBN 1-4020-0707-8
KLUWER ACADEMIC PUBLISHERS – DORDRECHT / BOSTON / LONDON