ANALYSING AND AIDING DECISION PROCESSES
ANALYSING AND AIDING DECISION PROCESSES

Editors
Patrick HUMPHREYS, London School of Economics and Political Science, England
Ola SVENSON, Department of Psychology, University of Stockholm, Sweden
Anna VARI, Bureau for Systems Analysis, State Office for Technical Development, Hungary

Co-editors
Tibor ENGLANDER and Janos VECSENYI, Hungary
Willem WAGENAAR, The Netherlands
Detlof VON WINTERFELDT, U.S.A.
1983
NORTH-HOLLAND PUBLISHING COMPANY AMSTERDAM NEW YORK OXFORD
Joint edition with Akadémiai Kiadó, Budapest. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the copyright owner.
ISBN: 0 444 86522 5
© Akadémiai Kiadó, Budapest 1983
Publishers:
NORTH-HOLLAND PUBLISHING COMPANY, AMSTERDAM * NEW YORK * OXFORD

Sole distributors for the U.S.A. and Canada:
ELSEVIER SCIENCE PUBLISHING COMPANY, INC.
52 Vanderbilt Avenue, New York, N.Y. 10017

Printed in Hungary
TABLE OF CONTENTS
Preface . . . 9
Acknowledgements . . . 11

Section I: SOCIETAL DECISION MAKING
Patrick Humphreys: Introduction . . . 15
Ralph L. Keeney: Evaluation of Mortality Risks for Institutional Decisions . . . 23
John Lathrop and Joanne Linnerooth: The Role of Risk Assessment in a Political Decision Process . . . 39
Howard Kunreuther: A Multi-Attribute Multi-Party Model of Choice: Descriptive and Prescriptive Considerations . . . 69
Rex V. Brown and Jacob W. Ulvila: The Role of Decision Analysis in International Nuclear Safeguards . . . 91
David H. Gustafson, Robert Peterson, Edward Kopetsky, Rich Van Koningsveld, Ann Macco, Sandra Casper, and Joseph Rossmeissl: A Decision Analytic System for Regulating Quality of Care in Nursing Homes: System Design and Evaluation . . . 105

Section II: ORGANIZATIONAL DECISION MAKING
Janos Vecsenyi and Detlof von Winterfeldt: Introduction . . . 123
Oleg I. Larichev: Systems Analysis and Decision Making . . . 125
Andrew R. Lock: Applying Decision Analysis in an Organisational Context . . . 145
Detlof von Winterfeldt: Pitfalls of Decision Analysis . . . 167
Anna Vari and Janos Vecsenyi: Decision Analysis of Industrial R & D Problems: Pitfalls and Lessons . . . 183

Section III: AIDING THE STRUCTURING OF SMALL SCALE DECISION PROBLEMS
Anna Vari: Introduction . . . 199
Gordon F. Pitz: Human Engineering of Decision Aids . . . 205
Helmut Jungermann, Ingrid von Ulardt, and Lutz Hausmann: The Role of the Goal for Generating Actions . . . 223
Dimiter S. Driankov and Ivan Stantchev: Fuzzy Structural Modelling - An Aid for Decision Making . . . 237
Patrick Humphreys: Use of Problem Structuring Techniques for Option Generation: A Computer Choice Case Study . . . 253
Fred Bronner and Robert de Hoog: Non-Expert Use of a Computerized Decision Aid . . . 281
Richard S. John, Detlof von Winterfeldt, and Ward Edwards: The Quality and User Acceptance of Multiattribute Utility Analysis Performed by Computer and Analyst . . . 301
Stuart Wooler and Alma Erlich: Interdependence Between Problem Structuring and Attribute Weighting in Transitional Decision Problems . . . 321

Section IV: TRACING DECISION PROCESSES
Ola Svenson and Patrick Humphreys: Introduction . . . 337
Henry Montgomery: Decision Rules and the Search for a Dominance Structure: Towards a Process Model of Decision Making . . . 343
Ola Svenson: Scaling Evaluative Statements in Verbal Protocols from Decision Processes . . . 371
Lennart Sjoberg: To Smoke or Not to Smoke: Conflict or Lack of Differentiation? . . . 383
Joshua Klayman: Analysis of Predecisional Information Search Patterns . . . 401
Rob Ranyard and Ray Crozier: Reasons Given for Risky Judgment and Choice: A Comparison of Three Tasks . . . 415
Eduard J. Fidler: The Reliability and Validity of Concurrent, Retrospective, and Interpretive Verbal Reports: An Experimental Study . . . 429
Oswald Huber: The Information Presented and Actually Processed in a Decision Task . . . 441
Robert W. Goldsmith and Nils-Eric Sahlin: The Role of Second-Order Probabilities in Decision Making . . . 455
Marian Smith and William R. Ferrell: The Effect of Base Rate on Calibration of Subjective Probability for True-False Questions: Model and Experiment . . . 469
Ruma Falk and Don MacGregor: The Surprisingness of Coincidences . . . 489

Section V: A SYMPOSIUM ON THE VALIDITY OF STUDIES ON HEURISTICS AND BIASES
Willem-Albert Wagenaar: Introduction . . . 505
Ward Edwards: Human Cognitive Capabilities, Representativeness, and Ground Rules for Research . . . 507
Baruch Fischhoff: Reconstructive Criticism . . . 515
Lawrence D. Phillips: A Theoretical Perspective on Heuristics and Biases in Probabilistic Thinking . . . 525

Author Index . . . 545
Subject Index . . . 553
PREFACE
The papers in this book are an edited selection from those presented at the Eighth Research Conference on Subjective Probability, Utility and Decision Making, held in Budapest in August 1981. Together they span a wide range of new developments in studies of decision making, the practice of decision analysis and the development of decision-aiding technology. The international, interdisciplinary nature of the work represented here makes it difficult and perhaps unwise to assign papers to categories according to the methodology or approach used, or the area surveyed. Nevertheless, we have arranged the book in five principal sections, according to the different fields of interest our readers may have, and placed in each section those papers which seem to be particularly significant in providing food for thought and new perspectives within that field. The titles we gave to the first four sections are "Societal Decision Making", "Organizational Decision Making", "Aiding the Structuring of Small Scale Decision Problems", and "Tracing Decision Processes". Throughout, the emphasis is on decision processes and structures and their applications, rather than formal modelling in isolation. We do not see this as a 'bias', but rather a reflection of current developments in research and practice which follow from the understanding of the nature and operation of decision theoretic models gained during the 1970's. Here you will find suggestions on how to bring these models alive and put them to work in a wide range of contexts. The fifth section is of a different nature. It presents papers given at a symposium on the validity of studies on heuristics and biases at the Budapest conference. These papers take stock of the considerable volume of work investigating 'heuristics and biases' in decision making over the past decade, and its implications for theory and practice. The papers give the authors' own viewpoints and are presented here unedited with the hope that they will stimulate discussion among a wider audience.
We do not propose to make any suggestions directing particular readers to particular sections, as this would run counter to the spirit of the book and the conference which gave rise to it. However, if you would like a general overview of the papers in a section before delving more deeply, you will find this in the introduction at the start of the section. The Eighth Research Conference on Subjective Probability, Utility and Decision Making was part of a biennial series, currently attracting over one hundred leading practitioners. Participation in these conferences is open to anyone working in relevant areas, and the conference is advertized through the mailing of information to active individuals and institutions. It is a European conference, and while participation is encouraged from all countries in the world, its principal aim is to stimulate the exchange of ideas and development of research and practice throughout Europe, east and west. Conferences have been held in Hamburg (1969), Amsterdam (1970), London (1971), Rome (1973), Darmstadt (1975), Warsaw (1977), Gothenburg (1979) and Budapest (1981). The Ninth conference will be held in Groningen, the Netherlands, in 1983. Procedures for setting up and running each conference fall under the responsibility of an organizing committee chosen at the closing session of the previous conference. The conference is not affiliated to or sponsored by any institution or organization, being supported each time by its participants and by national research agencies of the host country, and we consider that such independence has been a cornerstone in maintaining its vitality and acceptance across the complete range of European countries. Edited versions of the proceedings of most of the conferences in the series have been published in books, or in Acta Psychologica.* Reviewing earlier publications in the series, one can see how foundations laid earlier come to fruition and in turn set the scene for new developments in the field. We have compiled this volume with the hope that it will play its part in this process.

Patrick Humphreys
Ola Svenson
Anna Vari
* D. Wendt and C.A.J. Vlek (eds.), 1975. Utility, Probability and Human Decision Making. Amsterdam: Reidel (Proceedings of the 4th Conference). H. Jungermann and G. de Zeeuw (eds.), 1977. Decision Making and Change in Human Affairs. Amsterdam: Reidel (Proceedings of the 5th Conference). L. Sjoberg, T. Tyszka and J.A. Wise (eds.), 1983. Decision Analysis and Decision Processes. Lund: Doxa (Proceedings of the 6th Conference). L.R. Beach, P.C. Humphreys, O. Svenson and W.A. Wagenaar (eds.), 1980. Exploring Human Decision Making. Amsterdam: North Holland (Proceedings of the 7th Conference); also published as Acta Psychologica, 45, 1980 (complete volume).
ACKNOWLEDGEMENTS
The editors gratefully acknowledge the assistance of the following persons who acted as referees of contributions to this volume.

Berndt Brehmer (Sweden)
Vaclav Brichacek (Czechoslovakia)
Klara Farago (Hungary)
Baruch Fischhoff (U.K.)
Robert Goldsmith (Sweden)
Robin Hogarth (U.S.A.)
Robert de Hoog (The Netherlands)
Richard John (U.S.A.)
Howard Kunreuther (Austria)
Oleg Larichev (U.S.S.R.)
Sarah Lichtenstein (U.S.A.)
Lola Lopes (U.S.A.)
Jan Meisner (U.K.)
Henry Montgomery (Sweden)
Gerard de Zeeuw (The Netherlands)
Czeslaw Nosal (Poland)
Lawrence Phillips (U.K.)
Gordon Pitz (U.S.A.)
Rob Ranyard (U.K.)
Dieter Schroder (West Germany)
Zur Shapira (Israel)
Lennart Sjoberg (Belgium)
David Spiegelhalter (U.K.)
Ivan Stantchev (Bulgaria)
Tadeusz Tyszka (Poland)
Charles Vlek (The Netherlands)
Stephen Watson (U.K.)
Ayleen Wisudha (U.K.)
Stuart Wooler (U.K.)
Section I
SOCIETAL DECISION MAKING
INTRODUCTION

Patrick HUMPHREYS
Decision theory, as a theory of individual choice, has been largely concerned with the delineation of act-event sequences leading to consequences, the evaluation of those consequences, and the computation of the expected value (or expected utility) of the courses of action immediately available, through principles which enable the uncertainty with which consequences are linked to immediate acts to be taken explicitly into account. The papers in this section describe models and procedures which encompass these ideas, but demonstrate how they in themselves map out only a small part of the issues which must be considered in studying decision making within a social context, either descriptively or prescriptively. It is not sufficient to 'socialise' the study of decision processes by grafting ideas about communication, equity or bargaining onto models of personal decision making. The models explored here demonstrate how decision-theoretic models of the problem need to be refocussed, with goals concerning 'good' decision processes which are qualitatively quite different from those conventionally adopted in studying individual choice. Ralph Keeney sets the scene for this refocussing in considering the evaluation of mortality risks. Estimating fatalities consequent upon a socially controversial course of action, like siting a new energy plant, has often been proposed as an objective measure of the 'risk' involved in the enterprise. The "objectivity" is commonly thought to reside in the possibility of separating probabilistic estimates of magnitude of risk (number of fatalities expected) from severity of risk (the value of each life lost through such fatalities). However, the concept of "expected fatalities" hides the distribution of those fatalities in the population at risk. Keeney considers the two extremes which characterize 'real life' situations in various mixtures: loss of an identifiable life, where a single expected fatality refers to one known person, and loss of a statistical life, where a
single expected fatality refers to a 1/n chance of loss of life for each member of a population of n people. Keeney shows that there is no reason to assume the value of a statistical and an identifiable life to be the same. Arguments may be adduced for distinct preferences for avoiding either type of fatality over the other. Moreover, valuation of an identifiable life as greater than a statistical life implies a preference for risk equity: spreading the risk as equally as possible over the population, whereas valuation of a statistical life as greater than an identifiable life goes with a preference for catastrophe avoidance. It is not possible to find a valuation function which meets both preferences; one has to decide on the tradeoffs one makes between them in deciding how to value lives which may be lost in a particular situation. This analysis reveals two crucial flaws in risk assessments which try to avoid the subjectivity inherent in value judgements by describing risk solely in terms of probabilities of fatalities. First, without adequate information concerning the expected distribution of these fatalities, one cannot place a value on the lives in any unique way. Secondly, the values assigned depend of necessity on the relative preference for risk equity versus catastrophe avoidance of the assessor. The authors of the other papers in this section start from an acknowledgement of the essential subjectivity of risk assessment and focus on modelling the context of this subjectivity as an essential pre-requisite to any prescriptive analysis. John Lathrop and Joanne Linnerooth examine the role of risk assessments, as represented in risk studies commissioned by interested parties, within a political decision process: the siting of a Liquid Energy Gas plant in California. They show how the content of a risk assessment is largely determined by the intended use of the study in the current round of political debate on the decision. This creates the need to model the decision process sequentially, as the same assessment may be used in different ways in different rounds. Lathrop and Linnerooth describe how a risk assessment containing "worst case" scenarios, introduced in one round as part of a local government's environmental impact report, can be taken up in subsequent rounds by pressure groups from among the population shown at risk in these scenarios. The formulation of the problem may change between rounds, as sequential constraints operate. Lathrop and Linnerooth describe how by round four in a LEG siting decision this can result in the formulation being constrained to the question "is the single site under consideration safe?", despite changes in the wider problem context (fall in energy demand) which are left unconsidered as they now lie outside the problem formulation.
In this context, what should be the role of risk assessments within the decision making process? Lathrop and Linnerooth critically examine the implications of the idea commonly found in the literature on prescriptive models of decision making that the goal should be to concentrate on improving risk assessment reports (e.g., by establishing independently funded public research bodies to provide the analysts) or on improving their interpretation (e.g., by including technical experts among the judges). They suggest that a more realistic aim would be to improve the role that risk assessments actually play in the decision process by treating them as evidence, rather than attempting to promote them as fact. Some desirable features are indicated for rules of evidence: assessments covering different sites (policies) should be comparable, so that alternatives, rather than assessments, can be judged; they should discuss the modelling of uncertainties and tradeoffs "about which there cannot be any objectivity"; and they should address the assessments of risk as perceived by those at risk, not relying on pseudo-objective measures like "expected fatalities" when the political process is primarily sensitive to potential for catastrophe. In line with the change from "facts" to evidence, the advocacy role of experts should be recognized, and treated as a productive rather than a silent element in making public policy. In line with these ideas, Howard Kunreuther describes a multiattribute multiparty (MAMP) model for structuring social decision making processes which pass through a number of rounds and involve various parties with different concerns. Kunreuther shows how each round in the process can be characterised by a unique problem formulation, involving a particular set of interested parties. Interpreting the model involves the development of a party/concern matrix, studying the impact of exogenous events, and analysing the sequential constraints imposed on subsequent rounds by the outcome of the current round. Kunreuther shows how the MAMP model may be used to draw lessons from case studies of liquid energy gas siting decisions. Specifically, (i) there was little articulation of value judgements by different parties in the process, who would state objectives, but were reluctant to provide the importance weights which would reveal their value structures; (ii) the constraints guiding the decision process were unstable, consistent with the view that policies are determined through a process where each of the interested parties attempts to modify the rules of the game in a way that will serve to maximize the likelihood of the attainment of their own goals and objectives; (iii) parties exploited the fact that the siting of sophisticated technologies was "not well understood scientifically" as a licence to concentrate on whatever measures of risk they wished. Moving from descriptive to prescriptive analysis, Kunreuther notes
that within the MAMP model there is no reason why one cannot focus on how well different decision making procedures (rather than outcomes) score with respect to a well defined set of objectives. One can use Multi-Attribute Utility theory to examine how well a procedure scores on attributes like: does each party have an opportunity to voice its own position; were a wide enough set of attributes considered to feel a choice was actually made? Given that decision processes may be evaluated in this way, Kunreuther discusses how GERT (Graphical Evaluation and Review Technique) may be used to investigate how they might be improved. Questions which can be addressed through the use of GERT include: "What is the likely impact on the decision process if some of the existing constraints are relaxed; what would the impact be if certain parties were given power they currently do not have; what would happen if there was a change in the way alternative sites were introduced into the picture?" The MAMP model helps us to understand how decision analysis may be employed in providing inputs to decision making, conceived as an essentially political process. This orientation is essential when interested groups or parties to the decision are in conflict, where the goals of a decision analysis should not extend to trying to organize the decision process itself, as in such cases there is no formally 'equitable' way of doing this from all parties' points of view. In contrast are cases where decision analyses are commissioned as a way of promoting goals shared by society, excepting a minority of renegade groups or individuals whose actions or goals are generally viewed as going against consensus interests. Here the context is different from that addressed by the MAMP model and usually encompasses an agency constituted as the guardian of society's interest and granted appropriate executive powers. The final pair of papers in this section describe decision analyses having the goal of improving the regulatory activities of two such agencies in rather different fields. Rex Brown and Jacob Ulvila describe the role of decision analysis in international nuclear safeguards, examining the issues that need to be considered in improving the efficiency of the inspection activities of the International Atomic Energy Agency (IAEA). David Gustafson and his colleagues describe the development and evaluation of a decision analytic system for regulating the quality of care in nursing homes in the state of Wisconsin. While the overall goal is the same in each of these two cases, the very different contexts mean that their interpretations in practical terms are rather different, and the contrast between the two studies illustrates the diversity of potential applications of decision analysis in prescriptive social decision making.
In the IAEA decision analysis the goal was to allocate inspection activities in a way that would minimize the diversion of nuclear material from peaceful use; in the nursing home decision analysis the aim was to allocate inspection activities in a way that would maximize the quality of care. In both cases time and scope of inspection activities were an overall constraint, as inspection resources were severely limited. In the IAEA case further constraints resulted from the agreements with the participating nation states on which the acceptance of the legitimacy of the agency's activities rested. For example, the IAEA is constrained not to allocate its resources in a way that discriminates between states (hence indicating its view that particular states are more likely to contemplate diversion). No such constraint existed in the Wisconsin case, where nursing homes must undergo annual inspection, backed up by the possibility of state enforcement action. Here one task of the initial analysis was to develop a screening instrument which would indicate those homes of suspected inferior quality, so that more time could then be spent in these homes in subsequent inspection activities. In the initial phase of the analysis described by Brown and Ulvila, the idea of developing a screening instrument based upon judges' ratings is replaced by a diversion path analysis operating under the constraint of only analysing information which can be "uncontroversially and objectively" documented. This constraint rules out a classical decision analytic approach, which would typically involve modelling the likelihood of different types of diversion in a particular context: how a diverter would behave, who the diverter might be. But a formal analysis of these issues would force the innate logic out into the open and thereby lay it open to objection. Brown and Ulvila discuss ways of overcoming these problems. While, to avoid objections, the likelihood of a diversion along a path must be given equal weight in every context, one can vary the number of paths objectively identified in any particular context. Paths may also be weighted by their attractiveness, with these weights operating within the analysis as relative probabilities. Rather than make any attempt to identify "diverters", one can use "objective" surrogates for the seriousness of a diversion path: classifying it according to the type of nuclear material that may be diverted along it. Brown and Ulvila describe how these possibilities form the basis for an analysis in which diversion paths are defined by the material involved, its location and the method of concealing its removal. This information is determined for any particular facility from analysis of design information, operating and accounting procedures, and inspection histories. Determining a diversion path does not indicate that it will be used, hence the probability that it is in use must also be determined
"objectively", This is done by determining 'anomalies', unusual and observable conditions that might occur in the event of a diversion through the path concerned. What sort of decision aids should be developed on the basis of the types of analysis Gustafson et al., and Brown and Ulvila describe? Both groups of authors agree that these should involve (i) developing decision rules for when to initiate a special inspection of a facility and when t o proceed from one level of response open t o the regulating activity to a more severe one; (ii) how best t o allocate inspection resources a t a facility and between facilities. In the nursing home case, the rules for initiating special inspection are: if the home passes the initial (screening) review, its survey is complete; if only 3 few minor problems are found, surveyors would seek t o correct them; if many deficiencies or a few substantial problems are revealed, a complete survey (i.e., special inspection) would be implemented concerning the full set of 1,500 regulations for nursing homes, legal investigations, special review teams collecting data which might be useful in court proceedings. Choice of consequent executive action is not based o n the overall evaluative score for a home from the analysis, but rather involves the srlec.tion of the one intervention that would most likely lead to a change in the way care was delivered. In the case of nuclear safeguards, results of inspections are also not expressed as evaluation numbers, but for 3 different reason: to d o so would require weighting the importance of different levels of detection and. given the context, there are difficulties for the regulatory agency in analysing such weights explicitly. Rather, Brown and Ulvila suggest that analysis of anomalies be translated into detection probabilities for diversion paths. In the current version the single 'detection' response has been used corresponding t o the observation of an anomaly. If the anomaly remains unresolved, inspection activity may be stepped up, the relevant state missions may be consulted, or ultimately a report may be made t o the U.N. Security Council. A future possibility would be t o assess explicitly the detection probabilities t o be associated with different levels of IAEA response. Regarding allocation of inspection resources, Gustafson et al. stress thrt their goal was t o maximize the potential for contingent regulation : identifying and discharging problems in agencies delivering good care, while concentrating on those agencies which need assistance. Gustafson et al. describe how their Quality Assurance Process (QAP) inspection procedures (comprising multi-attribute utility based screening, together with interviews of 10% of residents chosen statistically) was better able t o meet these goals than the old regulatory process, since it led t o more time
being spent in poor quality homes, action being taken on more problems, improvement in the quality of care, and actions being more frequently "consistent with effective change agents". In the aid proposed for use by the IAEA, the fundamental goal in allocating inspection resources is one of deterrence by risk of detection. To this end a decision analytic model was used to develop a prioritized list of inspection activities, with priority depending on the value of the activity and its cost in terms of inspection time. Brown and Ulvila describe how value can be indexed through an aggregate measure involving explicit consideration of the probability of detecting a diversion, the amount and type of material diverted, the technical complexity, specific vulnerabilities of the system and the timeliness of the detection. Determining the optimal relation between the priority of an inspection activity and including it in the inspection regime at any facility was a much easier task in the case of ensuring quality of care in nursing homes than in ensuring nuclear safeguards. In each case the inspection procedures, as instruments of social policy, must of necessity be made public. In Gustafson et al.'s QAP procedure it was not considered that this would degrade the efficiency of the procedures. On the contrary, knowledge of the procedures was expected to aid the majority of nursing homes in their own attempts to improve the quality of care they offered. Public knowledge of the IAEA's prioritization of inspection activities might, however, aid potential diverters in their activities, and such aid would run precisely counter to the IAEA's goals. Any time-limited inspection strategy is sure to leave some diversion possibilities uncovered, and fixed prioritization serves to identify those uncovered paths on which the diverter might then focus his attention. Brown and Ulvila describe how in this context it is not possible to perform the sort of analysis that Kunreuther recommends in his multiattribute multiparty model, since it is not permitted to model a potential diverter (i.e., one of the parties in the social decision process) explicitly, so no party-concern matrix can be obtained and 'solved' for an ideal inspection strategy. Instead they recommend partial randomization as a source of deterrence: inspection activities covering those paths of low technical complexity and easy-to-use nuclear material should be covered in every inspection, together with inspection activities selected on a random basis to provide some coverage of paths that are more complex or involve hard-to-use material. In general, the papers in this section demonstrate how decision analyses for societal decision making must be sensitive to the social and political context in which they are employed. They also illustrate ways of modelling that context that aid the design and use of decision analyses in
meeting goals held by social consensus across parties to the decision. In many cases these goals involve the use of analyses to optimise procedures rather than outcomes. Conversely, in assessing the status of decision theory, we find here evidence that, while retaining characteristic emphasis on the assessment of probabilities of events and the utilities of consequences, the scope of this theory now extends well beyond models predicated on individual or corporate goals, providing the basis for both descriptive and prescriptive analysis of situations of social concern.
EVALUATION OF MORTALITY RISKS FOR INSTITUTIONAL DECISIONS¹

Ralph L. KEENEY
Woodward-Clyde Consultants, San Francisco, California, U.S.A.²

¹This work was conducted under a subcontract with Decision Research, a branch of Perceptronics, for Oak Ridge National Laboratory under ORNL subcontract No. 7656. It was performed for the U.S. Nuclear Regulatory Commission under NRC Interagency Agreement 40-550-75. This paper is a revised version of a presentation given at the Conference on the Value of Life sponsored by the Geneva Association, Geneva, March 30-April 1, 1981. The comments of Detlof von Winterfeldt were very helpful in the revision.
²The address of Woodward-Clyde Consultants is: Three Embarcadero Center, Suite 700, San Francisco, California 94111, U.S.A.
Abstract

This paper investigates the implications of various value judgments on the evaluation of public mortality risks. Using utility functions to quantify values indicates the mutual inconsistency of three reasonable goals: minimizing the expected number of fatalities, promoting an equitable distribution of public risk, and a preference for catastrophe avoidance. Utility analyses and some other approaches for assisting with decision making involving public risks are appraised from an institutional or governmental perspective.
Introduction

An important aspect of many institutional decision problems involves possible health and safety risks to members of the public. Decisions involving the licensing of drugs, medical research and service programs, use of existing and future technologies, road safety, and military programs can each have a significant effect on public risks. These risks may involve possible fatalities, sicknesses, and injuries, as well as psychological effects. For appraising alternatives in such decision contexts, the decision makers need to address public risks in addition to other economic, social, and environmental consequences. This paper addresses issues that an institution
should consider in order to evaluate responsibly the public risks of alternatives. Our focus is on the evaluation of mortality risks, those risks which could end the life of individuals. It must be understood that a period of pain or morbidity resulting in the fatality is perhaps worse than an immediate fatality. However, the difference is the undesirability of the pain or morbidity, which should be explicitly included as an additional factor in an evaluation process. In order to explicitly evaluate mortality risks, an index of the relative desirability (or undesirability) of any circumstance characterizing the specific risks is needed. Our task is to structure potential indices. The process will necessarily involve value judgements. Hence, we proceed by postulating various reasonable value judgements and then derive their implications for the resulting index. Examining the implications of different basic values results in some interesting insights about evaluating possible fatalities. The paper is outlined as follows. The first section characterizes the evaluation of mortality risks from an institutional decision making perspective. The next section introduces three common approaches: insurance, willingness-to-pay, and court awards for examining mortality risks from that perspective. As an alternative, we propose utility analysis. In the third section, the restrictions that various value assumptions place on the evaluation of potential fatalities are examined, and the contradictions between these assumptions are highlighted. Final sections consider the extension of the approach to multiple causes of risk, discuss implementation of utility analysis, and present a brief summary and conclusions.
Structuring the Problem

Over the past decade, several approaches have been suggested for evaluating decisions that involve potential fatalities (e.g., Acton, 1973; Jones-Lee, 1974; Howard, 1979; Keeney, 1980a). Linnerooth (1975, 1979) and Zeckhauser (1975) provide excellent surveys of much of this literature, which carries the label "value of life". One conclusion is that there is certainly no consensus on an appropriate approach for evaluating lives. A major reason for this lack of consensus is the complexity of the problem. But perhaps a more important explanation is the fact that there are several different problems categorized under the rubric "value of life". For instance, these problems might be characterized along the following dimensions:
1. prescriptive versus descriptive,
2. evaluation before or after fatalities occur,
3. whether many individuals or one individual is at risk,
4. whether contemplated actions affect the risks,
5. who is doing the evaluating.

Regardless of the problem addressed, value judgments must be made. However, there is a choice of whether these value judgments should be made implicitly or explicitly. Also, the value judgments appropriate for one characterization of the "value of life" problem may not be appropriate for other characterizations. Our orientation is prescriptive throughout this paper, since the ultimate concern is to identify appropriate actions given the risks and the basic value judgments articulated for the problem. We will be particularly interested in examining the desirability of spending funds to reduce (directly or indirectly) the mortality risks. Hence, we will require an index of costs, as well as the risks, as arguments in an analysis. By an institutional decision making perspective we mean that an institution provides the value judgments necessary to appraise various risks as well as the value tradeoffs between the risks and costs. There are two general circumstances where this model seems particularly appropriate. The most common is when individuals in the government evaluate programs to spend public funds to reduce mortality risks to members of society. The second circumstance is when individuals in an institution spend institutional funds to reduce the risks to members of that institution. An example is expending industry funds to make equipment safer to reduce the risks to workers. These circumstances are to be contrasted with those where individuals may appraise risks involving only themselves. We wish to evaluate mortality risks before any fatalities have occurred. Also, there will be numerous individuals potentially at risk. Hence, our problem can be characterized by a mortality risk vector p = (p1, p2, . . . , pn), where pi is the risk to individual i and n is the total number of individuals at risk. Specifically, pi will be the probability that individual i will be a fatality in the time period of concern due to the cause of concern. This probability is independent of all the other probabilities in the mortality risk vector. To describe circumstances involving risks to individuals where the risk of one person is related to others, lotteries over mortality risk vectors must be utilized. An appropriate manner to measure the desirability of lotteries is a utility function assessed in accordance with assumptions postulated alternatively by von Neumann and Morgenstern (1947) or Savage (1954). The cost that one might be willing to expend to reduce mortality risks will be denoted c. Thus, a possible consequence characterizing a
situation would be the vector (p; c) = (p1, p2, . . . , pn; c). The basic problem for evaluating mortality risks is then to determine an appropriate utility function u which assigns the utility u(p;c) to each consequence (p;c). Based on the assumptions of utility theory, consequences with higher utilities should be preferred to those with lower utilities, and when uncertainty is involved, situations corresponding to higher expected utilities should be preferred. Given this characterization of the problem, consequences which were indifferent to each other would have the same utility. Specifically, suppose we searched for a consequence of the form (0, . . . , 0; c*) that was indifferent to a consequence (1, 0, . . . , 0; 0). Then, because the first consequence involves no risks and a cost of c*, and the second consequence involves a risk of 1 to one individual and no costs, one could define c* to be the "value of life". One might assume that if there were a unique measure of the value of life, then the value of each life would be equal and the value of 2 lives lost would be equal to 2 times the value of 1 life lost. This would imply that consequences (1, 0, . . . , 0; c*) and (0, 1, . . . , 0; c*) are indifferent and that consequences (1, 1, 0, . . . , 0; 0) and (0, . . . , 0; 2c*) are indifferent. However, these are not reasonable assumptions for many decision problems. Our problem formulation, which does not make such simplifying assumptions, can characterize many different "value of life" problems using the utility function u(p;c). Problems are distinguished by whose lives are at risk, whose funds might be expended to reduce these risks, and whose values are utilized to evaluate the consequences. One situation is where the individual at risk is the same individual whose funds and value judgments are utilized. Another situation occurs when the risks are to one group, the costs are borne by another group, and the evaluation is done by only one of the two parties. A third situation is where the risks are to one set of individuals, the costs are to a second party or organization, and the value judgments are made by an individual or individuals presumably representing both parties. This latter situation is what we have referred to as the institutional perspective.
Insurance, Willingness-to-Pay, and Court Awards

Approaches utilizing insurance, willingness-to-pay schemes, and court awards are among those suggested for evaluating mortality risks to individuals (see, for example, Zeckhauser, 1975). As we will see, each of these schemes seems to have serious shortcomings for prescribing an
appropriate institutional evaluation of mortality risks. That is not to say that the schemes are inappropriate for other uses. It is simply the case that there are widely different “value of life” problems.
Insurance

It has been proposed that if an individual has an insurance policy of x dollars, then he or she implicitly values life at x dollars. Given that, the leap has sometimes been made to suggest that institutions should utilize the same value in examining options affecting risks to the insured. From the prescriptive institutional viewpoint, there are many shortcomings to this approach. First, decisions of whether or not to insure and for how much are not necessarily made with one's best judgments. Because one buys an insurance policy does not imply that he or she should have bought that policy, so an institution may not want to follow such individual actions. Second, the existence of an insurance policy usually does not affect the mortality risks to an individual. Consequently, the value tradeoff between mortality risks and costs is not considered. I would assume that a majority of the individuals with life insurance amount x would not gladly give up their lives for a return of slightly greater than x dollars.
Willingness-to-Pay

With the willingness-to-pay approach, individuals are asked how much they would be willing to pay to reduce a particular mortality risk by an amount ε. If the response is y dollars, this might then be extrapolated linearly to imply that y/ε dollars is an appropriate value of life for the respondent. One shortcoming often found in the assessment procedure is that the implications of the initial responses about willingness-to-pay are not investigated. Experience indicates that it is very difficult to respond consistently to questions involving very small changes in one's mortality risks. An additional problem is to find an appropriate manner to aggregate individual responses, which is necessary to obtain any institutional evaluation of mortality risks to several individuals. Because the responses about willingness-to-pay concern funds of the individual at risk and the institutional problem concerns funds of some other entity, a direct aggregation of the individual responses may be inappropriate.
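A numerical illustration of the linear extrapolation may be helpful; the figures are hypothetical and are not taken from any study cited here.

```latex
% Hypothetical respondent: willing to pay y = $50 for a risk reduction of 10^{-5}
\frac{y}{\varepsilon} \;=\; \frac{\$50}{10^{-5}} \;=\; \$5{,}000{,}000
```

The extrapolation assumes that willingness to pay is proportional to the size of the risk reduction; asking the same respondent about, say, ε = 10^-4 and checking whether the answer scales accordingly is one way of investigating the consistency of the initial responses.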
Court Awards

A third approach suggested for evaluating the value of a life involves court awards. In some circumstances after an individual has lost his or her life, the court awards the heirs a dollar amount z as restitution from a party responsible for the fatality. The manner in which the court selects an amount z could result from several factors and is often not clearly articulated. There seems to be no clear justification for utilizing such a value for prescriptive evaluation. Whereas insurance and willingness-to-pay involve situations where the individual at risk made value judgments about his or her own costs and risks, the judge has more of an institutional perspective with court awards. However, court awards involve circumstances where lives are evaluated after a fatality has resulted, so no contemplated action can affect the risks. Because we wish to evaluate circumstances where action can influence risks to individuals, the value of a life based on court awards may not be appropriate. Also, a court award is usually based on a specific individual's circumstances and would necessarily need to be extrapolated in some manner to deal with the institutional problem involving risks to several individuals. A reasonable manner for this extrapolation does not obviously present itself.
Evaluating Mortality Risks Using Utility Analysis

Utility analysis uses a utility function for evaluating mortality risks. In this section, several suggestions for an appropriate structure of such an institutional utility function are discussed. The implications of various basic value judgments will be examined. Rather than use a formal theorem-and-proof format, we will simply state some of the key ideas as observations. When appropriate, details will be referenced to other works. The observations in this section are appropriate for any separable utility function u(p;c) = f[uR(p), uC(c)], where f is any function, uR is the utility function for mortality risks, and uC is the utility function for costs. However, for simplicity, we will assume that an appropriate utility function for risks and costs is

u(p;c) = uR(p) + λuC(c),   (1)
where λ is a scaling constant to assure that uR and uC are consistently scaled. For convenience, we will set the origin of all the utility functions by

u(0, 0, . . . , 0; 0) = uR(0, 0, . . . , 0) = uC(0) = 0   (2)
and establish a scale for each of the utility functions by

u(1, 0, . . . , 0; 0) = uR(1, 0, . . . , 0) = uC(1) = -1,   (3)
where c is evaluated in millions of dollars.
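The scaling conventions make the interpretation of λ concrete. As a short illustrative derivation (the linear cost utility uC(c) = -c assumed below is an assumption of this sketch, consistent with (3) but not asserted in the paper), the value of an identifiable life c* introduced earlier follows directly from (1)-(3):

```latex
\begin{aligned}
u(1,0,\ldots,0;0) &= u(0,\ldots,0;c^{*})\\
-1 + \lambda\, u_C(0) &= 0 + \lambda\, u_C(c^{*})\\
u_C(c^{*}) &= -\tfrac{1}{\lambda},
\end{aligned}
```

so with the assumed linear uC(c) = -c this gives c* = 1/λ million dollars: the scaling constant λ directly encodes the tradeoff between costs and mortality risks.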
Identifiable versus Statistical Fatalities

Let us now examine the concept of value of life introduced earlier. Specifically, we suggested that the c* defined by (1, 0, . . . , 0; 0) indifferent to (0, . . . , 0; c*) can be interpreted as the value of life. However, the indifference of (1/n, . . . , 1/n; 0) and (0, . . . , 0; c') gives us another possible value of life. In the first situation, we know that individual 1 will be a fatality and there will be no costs, or that no individual will be a fatality and there will be c* costs. Hence, the life at risk is clearly identifiable; it is individual 1's life. Such a circumstance is sometimes referred to as an identifiable fatality, since the individual is identifiable. Hence, it might be more appropriate to refer to c* as the value of an identifiable life. To interpret the second pair of indifferent consequences, recall that there are n individuals at risk. Since each of these individuals has a 1/n chance of being a fatality in the first consequence, the expected number of fatalities resulting from the risks is 1. Hence, we could say that 1 expected fatality and no costs is indifferent to 0 expected fatalities and the cost of c'. However, at the time the evaluation must be made, it is clear that each individual has an equal chance of being a fatality and it is not possible to identify who will be fatalities (in fact, there could be zero, one, or more than one fatality). Such a circumstance is sometimes referred to in the literature as a statistical fatality, so c' could be referred to as the value of a statistical life. Many people feel that the risk (1, 0, . . . , 0) is worse than (1/n, 1/n, . . . , 1/n) because the former consequence is so unevenly distributed and at least the risk in the latter case is "somewhat equitable". If this is the case, the value c' should be less than the value c*. In essentially all situations involving an institutional evaluation of mortality risks, we would expect to be dealing with only small risks for each individual. These risks should certainly be small for each person and would much more likely be in the range of 10^-4 and smaller. However, we could certainly have situations where 20 percent of the individuals had a risk of 5/n each and the other 80 percent had a 0 risk. The expected fatalities corresponding to this risk would also be 1, although the risk in some sense would not be as equitable as the situation where each individual had a 1/n risk. Consequently, although this latter risk situation might also be characterized as having 1 statistical fatality, it may be appropriate to value it differently from the situation where each individual has a 1/n risk.
latter risk situation might be characterized as having 1 statistical fatality, it may be appropriate to value this statistical fatality less than the former case. Observation 1 . There is no unique definition for the term value of life. Aside from the lack of a clear definition, use of the term "value of life" can be misinterpreted to imply that the value of 2 lives is equal t o twice the value of 1 life. This may be the case but it requires additional value judgments. Furthermore, for decision making, we are interested in evaluating the risks prior to possible fatalities. Of course we are interested in risks because of the possible resulting fatalities, but it might be appropriate to refer t o the value of a specific risk or of a specific risky situation when considering decisions involving mortality risks rather than to the value of lives.
The Utility Function uR

To determine the utility function uR, we need to consider utility functions over each component pi and the integration of these utility functions. Concerning the former problem, it may be reasonable to assume that an organization should evaluate an individual's risks as the individual would want to evaluate them. Assuming that the individual wants to minimize his or her risks, such a utility function should be linear in pi. This linearity condition also follows from a consistency argument which assumes that the relative utility of mortality risk vectors must equal the expected utility of the implied set of fatalities (see Keeney, 1980c). Given the individual had a different value structure, we would need to use a nonlinear utility function ui(pi). In this paper, we concentrate on the evaluation of simultaneous individual risks, so we will employ the linear case for ui for simplicity. Similar developments are possible for nonlinear cases. We also make the assumption that risks to each individual should be evaluated identically. Furthermore, it may be appropriate to evaluate risks to any subset of individuals in the same manner regardless of fixed risks to those individuals not in the subset. In technical terms, these assumptions are equivalent to the following.
Basic Model Assumptions. The attributes measured by the individual risks are utility independent, all subsets of attributes are preferentially independent, and the utility function for each of these single attributes is linear (namely, minus pi for individual i).
As shown in Keeney (1980c), the resulting risk utility function uR must either be the additive form

uR(p) = -(p1 + p2 + . . . + pn),   (4)

or the multiplicative form

uR(p) = -(1/k)[(1 + kp1)(1 + kp2) . . . (1 + kpn) - 1],   (5)

where k is a scaling constant. We will now examine further implications on the form of uR given the assumptions above for each of three different objectives: that we want to minimize the expected number of fatalities, that we prefer equitable distributions of risk, and that we wish to avoid catastrophes. As we will see, these desires are mutually inconsistent.
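The two forms can be checked computationally. The sketch below is a minimal illustration of (4) and (5) as written above, with arbitrarily chosen test values; it is not code from the paper.

```python
from math import prod

def u_R_additive(p):
    # Additive form (4): the negative of the expected number of fatalities.
    return -sum(p)

def u_R_multiplicative(p, k):
    # Multiplicative form (5); k is a scaling constant with k > -1 and k != 0.
    return -(prod(1 + k * p_i for p_i in p) - 1) / k

# Both forms satisfy the scaling conventions (2) and (3).
n = 4
no_risk = [0.0] * n
one_sure_fatality = [1.0] + [0.0] * (n - 1)

assert u_R_additive(no_risk) == 0 and u_R_additive(one_sure_fatality) == -1
for k in (-0.5, 0.5):
    assert abs(u_R_multiplicative(no_risk, k)) < 1e-12                # u_R(0,...,0) = 0
    assert abs(u_R_multiplicative(one_sure_fatality, k) + 1) < 1e-12  # u_R(1,0,...,0) = -1
```

As k approaches zero the multiplicative form reduces to the additive one; the sign of k, as the observations below show, determines whether the function expresses a preference for risk equity or for catastrophe avoidance.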
Minimizing Expected Fatalities
Given specific funds to reduce the risks to individuals, a reasonable guideline might be to minimize the resulting total expected fatalities to save as many lives as possible.

Observation 2. Given the basic model assumptions and a preference to minimize the expected number of lives lost, the utility function for mortality risks must be the additive form (4).
This observation almost directly follows from the fact that the sum of the pi's is equal to the expected number of fatalities with any mortality risk vector. If we use expected fatalities as our criterion, the additive utility function must be appropriate. If we define f to be the number of fatalities resulting from any particular risk situation, a utility function uF for fatalities consistent with the additive uR is

uF(f) = -f.   (6)
An Equitable Distribution of Risk

We will define consequence (p1, . . . , pi, . . . , pj, . . . , pn), for any i and j, to be a more equitable distribution of risk than (p1, . . . , pi + ε, . . . , pj - ε, . . . , pn) if the difference between pi and pj is less than the difference
between pi + ε and pj - ε. Note that all ph, h ≠ i, j, are held fixed in the definition. This definition merely says that given all risks but two are fixed, the better balanced these two are, the more equitable the risk. A reasonable assumption might be that a more equitable distribution of risk is preferred to a less equitable distribution.
Observation 3. Given the basic model assumptions and a preference for a more equitable distribution of risk, the utility function for mortality risks must be the multiplicative form (5) where 0 > k > -1.
As shown in Keeney (1980c), the utility function for fatalities consistent with this preference for risk equity is

uF(f) = -(1/k)[(1 + k)^f - 1],   (7)
which is risk prone for the allowable values of the parameter k. With a risk prone utility function for fatalities, a lottery whose expected number of fatalities is f would be preferred to f fatalities for sure. In simple terms, one would gamble in an attempt to save more than the average possible fatalities. Thus, a risk prone utility function would not always result in evaluating alternatives in the manner that saves the most expected fatalities. It follows that to have the original risks more equitably spread among the individuals at risk, one must be willing to allow a few more expected fatalities. Whenever there is a preference to better achieve an aspect of risk other than the fatalities themselves, it is necessary to give up something in terms of fatalities.
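A numerical check with the illustrative value k = -0.5 (any admissible k would serve) makes the risk proneness explicit:

```latex
u_F(1) = -\tfrac{1}{k}\bigl[(1+k)-1\bigr] = -1,
\qquad
u_F(2) = -\tfrac{1}{-0.5}\bigl[(0.5)^{2}-1\bigr] = -1.5
```

A 50-50 lottery between 0 and 2 fatalities then has expected utility 0.5(0) + 0.5(-1.5) = -0.75, which is preferred to the utility -1 of one sure fatality, even though both situations involve one expected fatality.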
Catastrophe Avoidance

A number of papers, such as Ferreira and Slesin (1976) and Slovic et al. (1977), suggest that a small probability of a catastrophic loss of life is worse than a larger probability of a smaller loss of life, given that the expected number of fatalities is the same for each case. To be precise, let us say that one prefers catastrophe avoidance if a probability π of having f fatalities is preferred to a probability π' of having f' fatalities for any f less than f', given that πf = π'f'. Given this assumption, the following is proven using results in Keeney (1980b, 1980c).
Observation 4. Given the basic model assumptions and a preference for catastrophe avoidance, the utility function for mortality risk must be the multiplicative form (5) where k > 0.
The corresponding utility function for fatalities is (7) where k > 0. As can easily be demonstrated, this utility function is risk averse, which is consistent with a preference for f fatalities for sure over any lottery with f expected fatalities. With this preference to avoid a large loss of life (i.e., a catastrophe), one must be willing to accept more expected fatalities.
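With a positive k the arithmetic of the previous illustration reverses; for the equally illustrative value k = 0.5:

```latex
u_F(2) = -\tfrac{1}{0.5}\bigl[(1.5)^{2}-1\bigr] = -2.5
```

The 50-50 lottery between 0 and 2 fatalities now has expected utility -1.25, which is worse than the utility -1 of one sure fatality, so the sure outcome is preferred: the catastrophe-avoiding, risk averse pattern.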
Statistical versus Identifiable Fatalities Revisited

Combining the cost utility function in (1) with the utility functions for mortality risks in observations 2 through 4 leads to the following interesting result.
Observation 5. Given the basic model assumptions and (1), the value of an identifiable life equals the value of a statistical life if and only if there is a preference to minimize expected fatalities; the value of an identifiable life is greater than the value of a statistical life if and only if there is a preference for risk equity; and the value of a statistical life is greater than the value of an identifiable life if and only if there is a preference for catastrophe avoidance.

Many individuals might feel that more funds should be expended to eliminate an identifiable risk (1, 0, . . . , 0) than a statistical risk (1/n, 1/n, . . . , 1/n). It follows that the preference for risk equity and the risk prone utility function for fatalities are appropriate, given the basic model assumptions.
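Observation 5 can be illustrated numerically with the multiplicative form (5) for n = 2 and the same illustrative values of k used above. The identifiable risk (1, 0) has uR = -1 for any k, while for the statistical risk (1/2, 1/2):

```latex
\begin{aligned}
k = -0.5 &:\quad u_R(\tfrac{1}{2},\tfrac{1}{2}) = 2\bigl[(0.75)^{2}-1\bigr] = -0.875 > -1,\\
k = +0.5 &:\quad u_R(\tfrac{1}{2},\tfrac{1}{2}) = -2\bigl[(1.25)^{2}-1\bigr] = -1.125 < -1.
\end{aligned}
```

With k < 0 (risk equity) the identifiable risk is the worse of the two, so more would be paid to remove it (c* > c'); with k > 0 (catastrophe avoidance) the statistical risk is worse (c' > c*).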
Evaluating Circumstances with Multiple Causes of Risk
In the literature (see, for instance, Starr, 1969, and Slovic et al., 1977), one finds that the undesirability of various risks as perceived by the public seems to depend on many factors. Such factors include whether the risks are voluntary or involuntary, whether or not they are associated with catastrophic accidents, and whether the risks result in immediate or delayed fatalities. To the extent that these factors matter, the evaluation of risks needs to take them into account. However, it seems that if one is willing to utilize an individual's relative values for evaluating risks to themselves, the problem might appear differently.
Observation 6. Concerning risks to oneself or loved ones, many individuals do not seem to differentiate between different causes of risk.
Although I have not conducted any formal tests to support this observation, I have questioned a number of individuals to lend some insight about its appropriateness. For example, it has often been suggested that more funds should be expended to reduce risks from possible nuclear accidents than to reduce risks of the same magnitude from other types of accidents. In this regard I have asked many individuals whether they would prefer option A or option B. Option A is a 50-50 lottery involving an individual in a nuclear accident or not on the following day. If involved in the nuclear accident, painless death immediately results. If not involved, life goes on as usual. Option B involves a 51% chance of being in an automobile accident and a 49% chance of not being involved on the following day. If involved in the automobile accident, the individual will immediately die in a painless manner. If not involved in the accident, life will go on as usual. Almost every individual to whom I have posed this question prefers option A for themselves or for their loved ones. If the likelihood of the automobile accident in option B is reduced to 49%, all of the individuals prefer option B. This is a rather strong indication that an individual is indifferent to dying in a nuclear or an automobile accident, even though the nuclear accident has all the "undesirable characteristics", such as involuntary risk, associated with it. I have posed the same types of questions to several individuals involving circumstances other than nuclear accidents, where one of the types of death might be categorized as "a good kind" and the other type as "a bad kind". In all cases, the results seem to be the same, implying that the individual concerned wishes to maximize his or her likelihood of surviving past tomorrow unharmed. Thus, if an institution wishes to utilize an individual's own values for evaluating personal risk, it may be appropriate to evaluate risks in the same manner regardless of the cause or circumstances associated with that mortality risk.
Typically, individuals face risks from multiple independent sources. Thus, if an individual has risk q_1 from source 1, q_2 from source 2, . . . , and risk q_m from source m, we might say that the individual risk profile is [q_1, q_2, . . . , q_m]. With the independence assumption, the total risk to the individual is then
q = 1 − (1 − q_1)(1 − q_2) · · · (1 − q_m).
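As a numerical illustration of this formula (the source risks below are assumed values, not data from the paper), the combined risk is only slightly below the simple sum when each individual source risk is small:

# Sketch with assumed illustrative numbers: total annual mortality risk to one
# individual from m independent sources, q = 1 - (1 - q_1)(1 - q_2)...(1 - q_m).
from math import prod

source_risks = [1e-4, 5e-5, 2e-6]          # hypothetical annual risks q_1, q_2, q_3
total_risk = 1.0 - prod(1.0 - q for q in source_risks)
print(total_risk)                          # about 1.52e-4, close to the simple sum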
By utilizing the same types of questions as discussed above involving multiple risks, it seems as if many individuals prefer circumstances which maximize their likelihood of unharmed survival.
Observation 7. If an individual wishes to maximize his or her likelihood of survival, risk profiles with a smaller total risk should be preferred. If an institution cared to adopt such a preference for evaluating risks from several different sources, it should evaluate alternatives to reduce the overall mortality risks to an individual rather than to reduce risks from specific causes. If government programs involving safety to citizens were consistent with such a policy, funds should be expended based on the numerical risks only and not on the "type" of risk.
Implementation of Utility Analysis
A general comment that is sometimes heard about the "value of life" literature is "Does anybody ever expect to use that?" It is a reasonable question. And from a prescriptive viewpoint, the answer is most certainly yes. However, one must be careful about the meaning of "use". Numerous programs are aimed at reducing mortality risks (or causing mortality risks as an undesirable effect associated with a perceived beneficial action). To responsibly evaluate decisions on such programs, it is essential to evaluate the various mortality risk vectors. The evaluation might not be conducted in such a formal manner, but one must have some concept of what these vectors will be and make value judgments to appraise them. The choice is whether to make the assessments of the vectors and the value judgments formally or informally. Given the complexity of the problem, I think it is appropriate to utilize both procedures. The formal procedure is utilized as a device to assist one in understanding and examining various value structures and their appropriateness for institutional policy regarding mortality risks. The purpose of the models and the methodology is to provide critical insights and guidance for action. They should not be a substitute for the decision makers.
Use of the models requires the careful consideration of several value judgments. First, one must consider the general structure of the problem. There are implicit value judgments built into this structure, as they are built into any problem structure. Second, one needs to appraise the value judgments necessary to structure the utility function u. For illustration in this paper, we assumed by (1) that the utility function for mortality risks and costs was additive. This assumption could easily be relaxed and all the ideas in the paper would still be valid. The assumptions leading to functional forms for the mortality risk utility function need to be appraised for any specific use. Finally, even if a particular functional form is found to be
appropriate, parameters for that utility function must be assessed. These would require the decision makers to make a few specific value judgments to identify pairs of indifferent consequences. For these difficult judgments, it is best to ask redundant questions from different perspectives. Inconsistencies will no doubt be observed. However, this is part of the motivation for the process: to identify the inconsistencies and bring about an internal reconciliation after decision makers understand where and why these inconsistencies occurred. In Keeney (1980a), the assessment of some parameters for utility functions involving public risks is illustrated.
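As a sketch of how such an assessment might proceed (the indifference judgment and the resulting parameter are hypothetical, not values from Keeney's applications), a single stated indifference between a sure number of fatalities and a simple lottery pins down k numerically:

# Hypothetical illustration: recover k in u_F(f) = -(1/k)*((1+k)**f - 1) from one
# stated indifference judgment, e.g. "15 fatalities for sure is indifferent to a
# 50-50 lottery over 0 or 25 fatalities" (numbers assumed for exposition).
def u_F(f, k):
    return -(1.0 / k) * ((1.0 + k) ** f - 1.0)

def indifference_gap(k, f_sure=15, f_high=25):
    # Zero when the sure consequence and the lottery have equal utility; u_F(0, k) = 0.
    return u_F(f_sure, k) - 0.5 * u_F(f_high, k)

lo, hi = 0.001, 1.0                        # search the risk-averse range k > 0
for _ in range(60):                        # simple bisection on the indifference gap
    mid = 0.5 * (lo + hi)
    if indifference_gap(lo) * indifference_gap(mid) <= 0:
        hi = mid
    else:
        lo = mid
print(round(0.5 * (lo + hi), 4))           # roughly 0.033 under these assumed judgments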
Summary and Conclusion
There are many approaches which have been suggested for evaluating risks to life. One of the explanations for the large number of approaches is the fact that there are a large number of distinct generic problems involving risks to life. It seems as if different approaches to evaluating such risks would be appropriate for the different circumstances. To date, many approaches involve a "value of life" derived from situations where decisions were not necessarily made in a consistent manner, or from situations where only risks to single individuals were traded off against costs. These were the basic building blocks for models involving risks to several individuals. These models, based on insurance, willingness-to-pay, and court awards, seem inappropriate for determining prescriptive policy for institutional evaluation of mortality risks. As an alternative, we structure the risk problem using explicit value judgments and examine their implications for the evaluation of mortality risks.
It is evident from Observations 2 through 4 that three seemingly reasonable assumptions lead to three incompatible utility functions for mortality risks. Namely, a preference to minimize the expected number of fatalities leads to the linear utility function for fatalities, a preference for risk equity leads to a risk prone utility function for fatalities, and a preference for catastrophe avoidance leads to a risk averse utility function for fatalities. The important implication of these results is that three desirable objectives are to some degree mutually inconsistent. One might ask to what extent this inconsistency is due to the formalization of the problem using lotteries over mortality risk vectors (p_1, p_2, . . . , p_n) or to the specific definitions of equity and catastrophe avoidance. Unless one is willing to conclude that this formulation and these definitions are completely irrelevant to the problem of evaluating mortality risks, the inconsistency holds. If additional attributes are deemed appropriate
for characterizing mortality risks, or if additional concepts of equity and catastrophe avoidance seem reasonable, the degree (but not the fact) of the inconsistency may diminish. The implications of this paper provide some interesting insights, but can in no way be considered a solution to the general problem. The insights, however, do indicate the value of pursuing similar research to determine reasonable institutional evaluation structures. The search for such evaluation schemes is currently being pursued actively by U.S. governmental organizations, such as the Advisory Committee on Reactor Safeguards (1980). The arguments for a systematic approach to assist with the evaluation of mortality risks are compelling (see, for instance, Okrent, 1980). With better allocation of the available funds, we are likely to be able to reduce the risks to the public at large in a manner felt to be appropriate and responsive to the problem.
References
Acton, J. P., 1973. Evaluating a public program to save lives: The case of heart attacks. Santa Monica, California: Rand Corporation, Report R-950-RC.
Advisory Committee on Reactor Safeguards, 1980. An approach to quantitative safety goals for nuclear power plants. Washington, D.C.: U.S. Nuclear Regulatory Commission, NUREG-0739.
Ferreira, J., Jr. and L. Slesin, 1976. Observations on the social impact of large accidents. Cambridge, Massachusetts: Operations Research Center, Massachusetts Institute of Technology, Technical Report No. 122.
Howard, R. A., 1979. Life and death decision analysis. In: Lawrence Symposium on Systems and Decision Sciences. North Hollywood, California: Western Periodicals.
Jones-Lee, M., 1974. The value of changes in the probability of death or injury. Journal of Political Economy, 99, 835-849.
Keeney, R. L., 1980a. Evaluating alternatives involving potential fatalities. Operations Research, 28, 188-205.
Keeney, R. L., 1980b. Equity and public risk. Operations Research, 28, 527-534.
Keeney, R. L., 1980c. Utility functions for equity and public risk. Management Science, 26, 345-353.
Linnerooth, J., 1975. A critique of recent modeling efforts to determine the value of human life. Laxenburg, Austria: International Institute for Applied Systems Analysis, Research Memorandum RM-75-67.
Linnerooth, J., 1979. The value of human life: A review of the model. Economic Inquiry, 17, 52-74.
Okrent, D., 1980. Comment on societal risk. Science, 208, 372-375.
Savage, L. J., 1954. The Foundations of Statistics. New York: Wiley.
Slovic, P., B. Fischhoff, and S. Lichtenstein, 1977. Risk assessment: Basic issues. In: R. W. Kates (ed.), Managing Technological Hazards: Research Needs and Opportunities. Boulder, Colorado: Institute of Behavioral Science, University of Colorado.
Starr, C., 1969. Social benefits versus technological risk. Science, 165, 1232-1238.
von Neumann, J. and O. Morgenstern, 1947. Theory of Games and Economic Behavior, 2nd ed. Princeton, New Jersey: Princeton University Press.
Zeckhauser, R., 1975. Procedures for evaluating lives. Public Policy, 23, 419-464.
THE ROLE OF RISK ASSESSMENT IN A POLITICAL DECISION PROCESS¹

John LATHROP and Joanne LINNEROOTH²
International Institute for Applied Systems Analysis, Laxenburg, Austria
Abstract
In this paper, we examine the role risk assessments played in a political decision process: the siting of an LNG facility on the California coast. We find that the political process, where the decisions are made sequentially, bears little resemblance to the analyst's perspective, where objectives are traded off under conditions of uncertainty. A detailed comparison of three risk assessments used in this sequential process reveals that there are many degrees of freedom left to the analyst's judgment, and the results of an assessment may be determined as much by this judgment as by the site and technology considered. In addition, the effectiveness of a risk assessment is shown to depend not only on its analytic rigor, but on the persuasiveness of its presentation. In order to improve the use of risk assessments in setting public policies, we suggest that rules of evidence, or standards to which risk assessments must adhere in order to be admissible evidence, be considered.
1. Introduction
How did that get there? This is a question that might come to one's lips when driving along a beautiful section of the California coastline, spoiled, suddenly, by a number of large storage tanks. The analytically-minded person might suppose that this "place" has become a "site" only after an elaborate screening process, where careful tradeoffs have been made between the likes of spoiling his view and other socioeconomic-technical concerns. The politically-minded person, alternatively, might suppose that organizational thinking and political game-playing were more important factors in the siting decision.

¹ The research reported in this paper is supported by the Bundesministerium für Forschung und Technologie, F.R.G., contract No. 321/7591/RGB 8001. While support for this work is gratefully acknowledged, the views expressed are the authors' own and are not necessarily shared by the sponsor. We wish to thank Patrick Humphreys, Howard Kunreuther, and James Vaupel for helpful comments on earlier drafts.
² The authors' names are listed in alphabetical order.
We shall begin this paper by contrasting these two Weltanschauungen of the siting problem. The analyst's single decision maker, who balances the welfare and concerns of those affected by his actions, does not coincide with the reality of many conflicting parties who interact in a process that resolves the large problem sequentially, where early decisions tend to constrain the alternatives open for the next decision, and so on. The sequential and adversary nature of the process are important factors determining the role that formal analyses can play in influencing the decision outcome. In this paper, we will demonstrate the ways in which risk analyses have been used in a controversial siting issue, the siting of an LNG terminal in California. The conflicting and contradictory results of these studies, we will suggest, are a predictable and important element of the political debate. Not unlike many other areas of scientific investigation, it is difficult, if not impossible, to arrive at indisputable scientific truths, especially where the data are scarce and subjective. Yet, because the risk studies are highly quantitative, imitating in some sense technical, engineering studies, they generate false expectations regarding the conclusiveness of the results. While it would be desirable for risk assessments to provide conclusive and indisputable facts, this is an ideal that can never be fully achieved. For this reason, risk analyses should be regarded as introducing necessarily ambiguous evidence into the policy process. Viewing the results of a study as "evidence" instead of "facts" offers a more realistic perspective for improving the uses of these studies, and for improving the studies themselves. The intent of this paper is to describe the results, interpretations, and uses of three risk studies prepared during the course of the attempted siting of an LNG terminal in Oxnard, California. The decision process is briefly presented in Section II, and the three studies are described in the context of this process in Section III. In the final section, we draw some tentative conclusions regarding an improved role for technical analyses in aiding or improving siting decisions.
II. Siting an LNG Terminal in California
A. Nature of the Risk
Methane, or natural gas, becomes a liquid when cooled to −163 °C, with a density more than 600 times that of its gaseous phase. Liquefied natural gas (LNG) can be economically transported over long ocean distances; the economies of scale lead to large ships (e.g., 130,000 m³ LNG) and large onshore storage tanks (e.g., 77,500 m³ LNG each) for a base load
operation such as the one proposed for California. In the event of a ship or terminal accident, a significant amount of LNG could be spilled, which would "boil off" into a methane cloud possibly covering a sizeable area before igniting and burning. Since there exist almost no data concerning large LNG spills, and since the dispersion characteristics of methane clouds are poorly understood, there is a great deal of uncertainty involved in predicting accident consequences. Yet, the present state of knowledge indicates that at some very low probability an LNG accident could result in a cloud covering several miles before igniting. Depending on the population density of the area covered by the cloud, the possibility exists, albeit at a low probability, for a catastrophic accident.
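A rough sense of scale can be had from the expansion factor quoted above; the spill size below is assumed purely for illustration, and no dispersion behavior is modeled:

# Back-of-the-envelope sketch (assumed spill size; uses the ~600x liquid-to-gas
# expansion factor quoted above). This is not a dispersion model: real cloud
# behavior depends on weather, terrain, and mixing, which is exactly where the
# large uncertainties lie.
spill_liquid_m3 = 25_000                    # hypothetical spill, a fraction of one ship
expansion_factor = 600                      # approximate gas volume per unit liquid volume
gas_volume_m3 = spill_liquid_m3 * expansion_factor
print(f"{gas_volume_m3:.1e} m^3 of methane vapor")   # about 1.5e7 m^3 before any mixing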
B. The Siting Problem from the Perspective of the Decision Analyst
The general purpose of a risk assessment is to estimate the probability and consequences of the range of possible accidents. It is important to distinguish a risk analysis, which is intended to provide specific data on the safety of the operation, from a decision analysis, the intent of which is to aid an interested party in making a choice or taking a stand. A risk analysis supplies, but does not evaluate, one piece of the information needed to make an informed decision on the selection of a site. The decision analyst might view the problem as consisting of two decisions: whether or not to import LNG and, if so, where to site the plant. Both decisions involve different tradeoffs. The importation of LNG would reduce the risk of a shortage of natural gas and would improve air quality (due to an increased use of a clean-burning fuel). Yet these benefits would come at a financial cost (LNG is an expensive form of natural gas), an environmental cost (a large facility on the coast), and a cost in terms of population safety. Siting the plant at a remote and beautiful part of the coast reduces the population risk relative to siting the plant in a port, but increases environmental degradation and financial cost. Because of the uncertainty surrounding estimates of population risk, as well as estimates of the risk of a shortage in natural gas, in a decision-analytic sense the "whether" and "where" decisions involve trading off, under uncertainty, natural gas shortage risk, air quality, environmental degradation, financial cost, and population risk. From this perspective, estimates of population risk take their place among the myriad of uncertain inputs to the decision. Yet, it may come as no surprise that the actual political siting process bears little resemblance
to the decision-analytic framework just described. As we will show in this section, the sequential nature of the process and the multiple parties give risk analyses a special status.
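For concreteness, the decision-analytic framing just described can be written down as a simple weighted evaluation of alternatives; every alternative, weight, and score below is an invented placeholder rather than a value from the California case:

# Hypothetical multiattribute sketch of the analyst's framing (all numbers invented):
# score each siting alternative on 0-1 scales for the attributes named in the text,
# then rank by a weighted additive value. A real analysis would also model the
# uncertainty in each score (e.g., population risk) rather than use point values.
attributes = ["shortage risk reduced", "air quality", "environment", "cost", "population safety"]
weights    = [0.25, 0.15, 0.15, 0.20, 0.25]             # assumed importance weights

alternatives = {                                         # assumed 0-1 scores (1 = best)
    "no import":         [0.0, 0.3, 1.0, 1.0, 1.0],
    "remote coast site": [1.0, 0.9, 0.3, 0.5, 0.8],
    "port site":         [1.0, 0.9, 0.7, 0.7, 0.4],
}

for name, scores in alternatives.items():
    value = sum(w * s for w, s in zip(weights, scores))
    print(f"{name:18s} {value:.2f}")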
C. A Description of the Decision Process
In the late 1960s, faced with projections of decreasing natural gas supplies and increasing need, California gas utilities began to seek additional supplies. In 1974, Western LNG Terminal Company (Western), which was formed to represent the LNG interests of the gas utilities, applied for approval of three LNG import sites on the California coast: Point Conception (PC), located on a remote and attractive part of the coast; Oxnard (X), a port city; and Los Angeles (LA), a large harbor metropolis. The LNG would be shipped from Southern Alaska, Alaska's North Slope, and Indonesia. As of this writing, Point Conception, the one site remaining under active consideration, is still pending approval. This section describes the procedures, decisions, and events of this lengthy process (for a more complete review see Linnerooth, 1980, and Lathrop, 1980). Though much preliminary work had been done by the California utilities in negotiating a contract with Indonesia and in preselecting possible sites, for our purposes the process began with Western's application for approval of the three sites. This marked the beginning of the four-round process shown in Table 1, where each round can be characterized by a problem formulation as perceived by most if not all of the interested parties,³ by an event (proposal, request, etc.) initiating the discussions, and by a decision(s) or nondecision concluding the round (for a more detailed description of this characterization see Kunreuther et al., 1981).
A routine process for approving industrial facilities existed at the time of Western's applications. This siting procedure was, however, complex, involving three levels of government. The Federal Power Commission was responsible for assessing national need as well as environmental impact; the local authorities were required to grant the various licenses for land use, access, and so forth; and the California Coastal Commission (CCC) was mandated to give the final approval for any facility on the California coastline. As the application progressed through the approval channels, it became increasingly apparent that these routine procedures, especially on the local level, were ill suited to handle a large-scale facility having the potential for a catastrophic accident.

³ This does not preclude the possibility that some parties might object to this formulation and challenge it during the course of the debate.
Table 1. Summary of Rounds in the California LNG Siting Case

ROUND 1 (September 1974 - July 1977; 34 months)
  Problem Formulation: Should the proposed sites be approved? That is: Does California need LNG, and if so, which, if any, of the proposed sites is appropriate?
  Initiating Event: Applicant files for the approval of three sites (September 1974).
  Conclusion: Applicant perceives that no site is approvable without a long delay (July 1977).

ROUND 2 (July 1977 - September 1977; 2 months)
  Problem Formulation: How should need for LNG be determined? If need is established, how should an LNG facility be sited?
  Initiating Event: Applicant and others put pressure on the State Legislature to facilitate LNG siting (July 1977).
  Conclusion: A new siting process is established that assumes a need for LNG and is designed to accelerate LNG terminal siting (September 1977).

ROUND 3 (October 1977 - July 1978; 10 months)
  Problem Formulation: Which site is appropriate?
  Initiating Event: Applicant files for approval of the Point Conception site (October 1977).
  Conclusion: Site approved conditional on consideration of additional seismic risk data (July 1978).

ROUND 4 (initiated 1980)
  Problem Formulation: Is Point Conception seismically safe?
  Initiating Event: Regulatory agencies set up procedures to consider additional seismic risk data (1980).
  Conclusion: (Round still in progress.)
The mismatch between the scale of the project and the procedures designed to approve it was aggravated by the novelty of the technology. The risks were ill defined, the experts were not in agreement on the possible consequences of a spill, and there were no standard operating procedures or regulations for the technology. From the point of view of formal risk assessments, the first round of the California siting process was the most interesting. To support its applications to the Federal Power Commission, Western was required to submit an analysis of the safety of the facilities and their operations. For this purpose, it contracted with a consulting firm, Science Applications, Inc. (SAI). As required by state law, the local governments involved were required to submit an Environmental Impact Report (EIR); of most interest to us here is the Oxnard study, which was also prepared by a consulting firm, Socio-Economic Systems (SES). As will be shown in later sections, the Oxnard study incited a great deal of public resistance to the terminal plans. Finally, the Federal Power Commission was required to carry out an in-house Environmental Impact Statement (EIS). Though the approval appeared to be a routine matter, the low-probability consequences of this large-scale operation complicated the process considerably, resulting in the stalemate (at least as perceived by Western) concluding Round One. The apparent significance of the risks of the planned facilities appears surprising in view of the low estimates of these risks assessed by Western and the FPC. But the picture began to cloud when the FPC staff noted an earthquake fault near the Los Angeles site, when several worst-case scenarios for Oxnard were published in the SES report, and with the growing resistance on the part of the CCC and Santa Barbara County, which held approval authority for the Point Conception site. With the possibility of not receiving approval for any one of the three sites, Western turned to the California State Legislature for help, initiating Round Two of the process. Western perceived a better chance in changing the siting process in its favor than in fighting the multiple-approval, standard process, and successfully brought pressure to bear on the State Legislature (along with business and labor) to pass the 1977 California LNG Siting Act (Senate Bill 1081). This legislation concluded Round Two (see Table 1), which was effectively a "problem bounding" round, that is, a round for the purpose of narrowing the bounds of the problem to a proportion that could be handled by the relevant institutions. The Siting Act removed the decision authority from the local agencies and the CCC and vested sole state licensing authority with the California Public Utilities Commission (CPUC), a regulator with a history of sympathy for utility capacity expansion. The act also gave the CCC the role of ranking
alternative sites, including the applied-for site, but that ranking was not binding upon the CPUC. Finally, the act required that the site be remote and onshore. The applicant's decision to reapply for the Point Conception site under the new process initiated the third round of political negotiations; this round was formulated more narrowly than those preceding it. Essentially, the only question open for the political agenda was "Which site is appropriate?" While the CCC ranked Point Conception third out of its four top-ranked sites, the CPUC selected this site for conditional approval on the grounds that the higher-ranked sites would involve excessive delays, as the applicant would have to draw up new plans. The CPUC approval was conditional on analysis of wind and wave conditions, archeological data, and, most importantly, seismic risk. At the federal level, where both the Oxnard and Point Conception sites remained "alive", a reorganization had replaced the FPC with two agencies: the Economic Regulatory Administration (ERA), in charge of import approval, and the Federal Energy Regulatory Commission (FERC), in charge of site approval. The ERA approved the Indonesia import project. The FERC staff, which carried out detailed risk studies, favored Oxnard, but the Commissioners approved Point Conception to avoid a confrontation with the State, which had legislated against the nonremote Oxnard site. Conditional approvals on the part of the three mandated decision makers, the CPUC, the ERA, and the FERC, did not, however, resolve the siting issue. Opponents of the project petitioned a federal appellate court for a stay in the proceedings on the grounds that not all seismic risk evidence had been considered. The court has remanded the case back to the FERC. That ruling, and the subsequent procedures to investigate seismic risk set up by the relevant agencies, have initiated the fourth round of discussions (see Table 1). This round is tightly formulated as a technical risk issue. The single question open for discussion on the political agenda is whether Point Conception is seismically safe.
D. Siting Decisions as Public Policy
An examination of the California siting process reveals that siting an LNG terminal is not so much a decision, in the usual sense of individual goal-directed behavior, as a policy-making process. Majone (1981) gives three main reasons that differentiate private decisions from public policies: First, in the public domain actions must be justified with seemingly objective arguments. Second, policies, unlike individual decisions, need to gain a consensus in order to be viable. Finally, public choices are not
made by only one person. A consensus within and/or beyond an organization can be reached only with convincing and institutionally appropriate arguments. Because of this need on the part of an interested party, whether a citizens' group or a licensing agency, to justify its stand on the siting issue, risk assessments showing the plant to be safe or not safe are especially useful. It is also important to recognize that organizations often deal with current issues, not so much for their own sake as for their longer term implications for the institutions. Western, for instance, may have pursued a change in siting procedures (S. 1081) not so much because it perceived Oxnard to be blocked by local opposition, but because it recognized the longer term benefits of shifting power away from the local communities through a one-stop licensing procedure. The California State Legislature did not set a policy for remote siting as an analyst would prescribe, that is, making explicit the tradeoffs, but compromised instead among the pro- and the anti-Oxnard interests. The Federal Energy Regulatory Commission also had to consider the longer term implications of its siting policy, as was evident in its strategy not to pursue its preference for Oxnard and thus provoke a Federal-State confrontation. Another feature of the political siting process that frustrates attempts on the part of the decision analyst to set out clearly the tradeoffs involved is the sequential nature of the decisions. In California, resolution of the question whether a site was needed necessarily preceded the site selection phase,⁴ which in turn will precede the licensing process. Because of time and cost considerations, a decision on one level is often binding in that it cannot easily be reopened for political debate. Thus the process becomes tied or locked into certain courses of action. The responsible agencies have little alternative but to consider increasingly narrow aspects of the problem. As a case in point, during the seven-year course of the California proceedings, the need for imported natural gas in the State diminished greatly.⁵ Instead of reexamining this need, the process is presently locked into a commitment for an import facility. All efforts are now directed toward pursuing the narrower problem of seismic risk at Point Conception. A perhaps more important case in terms of the long-run implications is the remote siting constraint set by the California Legislature, which has effectively ruled out the possibility of trading off population risk for other economic or environmental benefits.

⁴ In the first round, the questions of need and site were considered simultaneously. This, however, did not lead to a decision on site. In the second round, the State Legislature effectively resolved the need question.
⁵ Gas prices were deregulated during this time, which increased the domestic supply of natural gas.
III. The Oxnard Risk Assessments
During the course of events in the California LNG terminal siting debate, there were seven major risk assessments carried out for the three prospective sites: Los Angeles, Oxnard, and Point Conception. To understand the role these assessments played in the process, as well as in the outcome of the debate, it is instructive to review their content and use. An important point of this paper is to demonstrate that the content of such a study is largely determined by the use of the study in the political debate. It is only with an understanding of the latter that recommendations can be made for improving the former. For the sake of brevity, and with no loss in generality, we will limit our discussion to the early studies concerning the Oxnard site. These studies, the Science Applications, Inc., risk assessment (SAI, 1975), the Federal Power Commission risk assessment (FPC, 1976), and the Socio-Economic Systems risk assessment (SES, 1976), will be discussed in turn.
A. An Overview

1. Science Applications, Inc., Risk Assessment
As part of its case for the Federal Power Commission, the applicant commissioned a consulting firm, Science Applications, Inc. (SAI), to do a risk assessment of its proposed Oxnard terminal. This assessment was elaborate, involving calculations of probabilities of vessel accidents, tank ruptures, LNG spill sizes, methane cloud dispersion and ignition, and the resulting fatalities. The computer model developed for cloud dispersion was deemed one of the two best in a Coast Guard review of several models (Havens, 1977). Ship collision calculations also involved a computer model, calibrated to statistics from several harbors. The SAI results were presented in the form of several different indices of risk. Individual annual probabilities of fatality due to the terminal were presented in the format of iso-probability contour maps of the site (see Figure 1). Those probabilities ranged from a maximum of 1.5 × 10⁻⁷ near the terminal to less than 10⁻¹⁰ beyond three miles, for the most conservative (risk-overstating) set of assumptions. Other contour maps were presented for less conservative assumptions. The maximum individual probability of LNG fatality was compared to other risks. The individual probability of dying in a fire generally was reported as 220 times greater; the maximum probability of having a plane fall on a person in the site vicinity was reported as 10 times greater than the LNG risk.
Figure 1. Iso-Probability Contour Map for Oxnard, Most Conservative Assumptions (scale in km). Source: SAI (1975)
Annual probabilities of catastrophes were also presented, including 10⁻⁸ for a 2,000 to 10,000 fatality year, and 1.4 × 10⁻⁵⁷, or "one chance in 710 septendecillion", for the maximum catastrophe: 113,000 fatalities. The study concluded that LNG risks at the Oxnard site were "extremely low". The results of the SAI study seem to have been persuasive at the FPC hearing. The FPC decision of July 1977 cited all the various numbers mentioned above and a few more, noted the conservative assumptions, pointed out that no party disputed the findings, and found that the Oxnard site involved levels of risk sufficiently low for FPC approval. While the SAI study seems to have been effective in its intended role before the FPC, as events transpired it did not have any bearing on the siting process, as shortly after the FPC decision a federal reorganization abolished the FPC and set up a new approval procedure.
2. Federal Power Commission Staff Risk Assessment
The staff of the FPC also carried out a risk assessment as part of the Environmental Impact Statement (EIS) to be presented to the Commission at the July 1977 hearing. This assessment generally used less elaborate models and fewer resources than SAI in reaching its conclusions. The logic of the report can be stated quite simply: All significant risks were seen as arising from ship accidents. While this is plausible on technical grounds, the assessment did not defend this assumption with analysis. Those accidents were assumed to happen at least as far from shore as the
end of the 6,000 ft (1.8 km) trestle of the Oxnard facility. Since the FPC staff determined that the maximum travel of the flammable vapor cloud and maximum distance of significant fire radiation effects were both less than 6,000 feet, the risk was deemed to be "negligible". The FPC assessment results included risk measures for the Point Conception and Los Angeles sites. In all three cases, risk was measured by two indices: annual expected fatalities and annual individual probability of LNG fatality. However, for the reasons discussed above, no numbers were given for those indices for Oxnard, only the abbreviation for "negligible". The report concluded that ship transport to the Oxnard site "constitute[s] an acceptable risk to the public". As with the SAI study, the results of the FPC staff assessment seem to have been accepted and persuasive at the FPC hearing. The decision of July, 1977, cites both the FPC and SAI results in support of its conclusions already discussed.
3. Socio-Economic Systems Risk Assessment
As part of its Environmental Impact Report, the city of Oxnard commissioned a consulting firm, Socio-Economic Systems, Inc. (SES), to carry out a risk assessment of the LNG terminal. This assessment took a broader look at the problem than the previous two assessments. Rather than characterize the risk solely in probabilistic terms, the report presented 26 "population risk scenarios", with maps of the Oxnard area with shaded maximum plume areas or fire radiation zones superimposed, for each of several wind directions, spill sizes, etc. (see Figure 2). Each scenario named a "population risk", in fact the number of people covered by the maximum plume or fire zone, which ranged from 0 to 70,000. These scenarios could be described as maximum credible accidents (though SES did not do so). They were not accompanied by any estimates of their probabilities, which would have been quite low. In the section immediately following the scenarios, the SES report presented a more probabilistic analysis, which in fact combined the most conservative numbers and assumptions from the SAI and FPC studies as well as a Coast Guard study. In tabulating these, the report pointed out wide differences in the numbers used in different studies. For example, the FPC used a probability of ship collision more than 5,600 times larger than the one used by SAI. The number of expected fatalities per year computed by SES in this way was 5.74, or 380 times larger than the SAI estimate. These numbers (the SES and SAI estimates) were compared with expected fatalities from other hazards.
Figure 2. SES Population Risk Scenario for Oxnard (map showing a hypothetical vapor cloud path over census tracts, possible and probable ignition sources among dwellings, and the ship course). Source: SES (1976)
While the SAI report estimates LNG to have 7 times more expected fatalities than a hypothetical Oxnard nuclear plant, the SES report estimates LNG to have 2,900 times more expected fatalities. The SES report plotted annual probabilities of catastrophes against the numbers of fatalities involved, for both its own and the SAI estimates, and compared these plots with other types of hazards (see Figure 3).
Figure 3. Probabilities vs. Size of Catastrophes: risk comparison of the LNG project with other natural and man-made hazards, showing the range of uncertainty spanned by the Oxnard LNG facility risk assessments (comparison data drawn from the Reactor Safety Study, WASH-1400, and the SAI Oxnard assessment). Source: SES (1976)
Once again, the SAI estimates for LNG were higher than the numbers for a nuclear plant, while the SES estimates were much higher still. In marked contrast to the other two assessments, the SES study concluded that "in view of the problems of estimating risks with very little experience base ... and the differences in risk estimates between reports, it is not now possible to state confidently that the proposed facility poses a 'low probability' of a high consequence accident". As it happened, the SES report was never directly used, because the California LNG siting process was changed by new legislation in 1977, which ruled out non-remote sites such as Oxnard. However, the SES report may have been influential in indirect ways. The population risk scenarios, which allowed local residents to see a deadly methane plume covering their own homes, in Ahern's (1980) words "electrified opposition to the terminal". In addition, the generally cautious tone of the report may have increased the sense of caution and dampened support for the terminal in the City Council. The report seems to have increased opposition to the terminal, opposition which led eventually to the remote siting provision of the 1977 legislation.⁶
B. Comparing the Assessments
In this section, key features of each of the three assessments are selected for comparison. Table 2 shows these features in summary form.
1. Use
As indicated in Table 2, each assessment was used differently in the siting process. The SAI study was used to defend Western's application. In the other two cases, the assessments could be seen as having advisory roles: the FPC study was used by the staff to advise the commissioners, and the SES study was part of an environmental impact report (EIR) with the purpose of providing a data base for all parties to the process.
⁶ As late as 1980 one of the authors was told by a state legislative aide that the state could not site a plant that could kill 40,000 people. The fatalities described without an associated probability coincide (in format and number) with one of the scenarios in the SES report.
Table 2. Summary Comparison of Risk Assessments

Use
  SAI: support applicant stand to regulator
  FPC Staff: support staff stand to commission
  SES: support SES stand to all parties

Scope
  SAI: thorough, but own models only
  FPC Staff: only ship accident at end of trestle
  SES: composite of several studies

Example parameters (1)
  Probability of ship accident: SAI: 5.6 × 10⁻⁶; FPC Staff: 3.1 × 10⁻²; SES: *
  Cloud travel distance, 25,000 m³ spill: SAI: 1.2 km; FPC Staff: 2 km; SES: 23 km

Example results (1)
  Maximum individual probability (p_i): SAI: 1.5 × 10⁻⁷; FPC Staff: "negligible"; SES: 3 × 10⁻⁶
  Catastrophe size : probability: SAI: 2,000-10,000 : 10⁻⁸ and 113,000 : 10⁻⁵⁷; FPC Staff: "negligible"; SES: 40,000 (2)
  Expected fatalities: SAI: 0.015; FPC Staff: "negligible"; SES: 5.74

Formats (besides tables)
  SAI: iso-p_i contour maps (see Figure 1)
  FPC Staff: tables only
  SES: maximum plume maps (see Figure 2); catastrophe size vs. probability, comparative (see Figure 3)

Conclusion
  SAI: "... risks ... are extremely low."
  FPC Staff: "... risks ... are negligible."; "... an acceptable risk to the public."
  SES: "... it is not now possible to state confidently that (it) poses a 'low probability' of a high-consequence accident."

Effect
  SAI: FPC persuaded to approve
  FPC Staff: FPC persuaded to approve
  SES: increased opposition, decreased support

(1) Several differing qualifying conditions apply to each number, so the data are appropriate only for very rough comparisons. All probabilities and expected values are annual.
(2) This figure was scaled off a low-resolution figure, and so is quite approximate.
2. Scope
The analysts for each assessment adopted widely varying assumptions and problem scopes, which explain many of the fundamental differences in the results and effects of the assessments. The SAI report was quite thorough, but used primarily in-house computer models for the important calculations concerning ship accidents and flammable cloud travel distances. While those models were impressive, by neglecting to acknowledge the existence of experts and models with conflicting results, the SAI report overstated the confidence with which its results should be accepted. The FPC report considered only one type of accident: ship accidents at the end of the trestle. There were reasons for believing that other accidents would not add significantly to the risk. However, because the staff did not carry out calculations to prove that contention, that important narrowing of the problem is defended only by the analysts' judgment. Other parties to the process are then apt to suspect that the risk is understated. The SES study combined widely varying results of several other studies. In doing so, the analysts were able to make abundantly clear the extent of uncertainty in expert opinions on LNG safety.
3. Parameters
As shown in Table 2, the critical parameters used in the risk calculations differed by one to four orders of magnitude among the assessments. This makes clear the large uncertainties in elements of the assessments, uncertainties which contributed to the grossly differing conclusions and effects of the assessments. Clearly, such uncertainties should be reflected in the assessment results. Yet the uncertainties were not addressed in the SAI report, and while the FPC report acknowledged them, they were not presented as qualifying factors for the results. (A later report on Oxnard by the Federal Energy Regulatory Commission, successor to the FPC, took some of the uncertainties into account.) If each risk assessment were in fact a probabilistic representation of the available technical knowledge concerning LNG risk, as a reader might reasonably have assumed, then each assessment should have acknowledged the existence of conflicting opinions about physical processes important to LNG risk. This is an especially important matter for low-probability catastrophic risks such as LNG, since the very low reported probabilities of catastrophe could be meaningless given conflicting expert opinions. For example, how meaningful is a very low reported probability of a catastrophe if there is a high probability
that the cloud dispersion model on which that number was based is so erroneous that the actual catastrophe probability is orders of magnitude larger (see Mandl and Lathrop, 1981)?
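A small sketch makes the point (all probabilities below are assumed for illustration): when there is even a modest chance that the underlying model is badly wrong, the overall catastrophe probability is governed by that possibility rather than by the very low figure the model reports.

# Illustrative mixture with assumed numbers: a reported catastrophe probability if
# the model is right, a much larger value if the model is badly wrong, and a modest
# probability that the model is in fact wrong.
p_if_model_right = 1e-8        # the kind of very low figure a report might quote
p_if_model_wrong = 1e-4        # plausible value if cloud dispersion is misjudged
p_model_wrong    = 0.1         # chance the model itself is badly in error

p_catastrophe = (1 - p_model_wrong) * p_if_model_right + p_model_wrong * p_if_model_wrong
print(f"{p_catastrophe:.1e}")  # about 1e-5: the model-error term swamps the reported 1e-8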
4. Results and Formats
It is instructive to ask what is being assessed by a risk assessment. As shown in Table 2, not only do the assessments differ by up to two orders of magnitude on the measurements of risk, they differ more fundamentally on how risk is characterized, that is, on the dimensions used to describe risk and the formats used to present the results. The maximum plume maps of the SES report (essentially maximum credible accidents) were terrifying in comparison to the more reassuring message of the other two reports. Underlying the problem of selecting an appropriate format to communicate the results of a risk assessment is the fact that there is no objective risk associated with a novel and complex technology. Beyond the problems of relying on subjective probabilities where no frequentistic data base is available, risk itself is a many-dimensioned concept, with no apparent consensus on how those dimensions should be combined into a social risk measure. In fact, as von Winterfeldt and Rios (1980) point out, groups may differ in their values and beliefs about technological risks in ways related to more general value orientation. It would seem, then, that it is important to define clearly the assessed risk. Yet, there is an odd lack of precision in many risk assessments as to just what it is that is being assessed. A survey of eighteen risk assessments found that only four included an explicit definition of risk, and those four differed greatly (Mandl and Lathrop, 1981). It appears that risk assessments are commissioned without any specification of what it is that is to be assessed. Operating in that vacuum, each risk assessment team sets out to characterize risk in whatever way it sees fit. Not only do risk assessments differ on what they are assessing; risk assessments of an identical physical plant, compared on the same measure, are often quite dissimilar. Compare, for example, estimates of annual expected fatalities for the Oxnard facility. The SAI report estimated 0.015; the SES report estimated 5.74, or 380 times the SAI number (which puts into question the two and three significant figures of the assessments). Underlying this lack of agreement is the fact that these probabilistic assessments are not entirely probabilistic. Gaps in the probabilistic models are filled by assumptions that often are not probabilistic in the sense of certainty-equivalent representations of incomplete knowledge. As a case in point, consider estimates of the probability
of ignition at the spill site in the event of an LNG ship collision. This estimate was not generated from a computer model, but was set by expert judgment. The number selected was not an expected value, modal value, or certainty equivalent, but was typically a conservative value, i.e., a number that would overstate the risk with some unstated confidence. Such an assessment is more defensible; the final measure is not apt to understate the risk, since all gaps in the model are filled by risk-overstating numbers. But with this procedure the expected fatalities measure is not, in fact, expected fatalities, but some odd mix of maximum and expected fatalities. In order to calculate actual expected fatalities, the uncertainty in underlying assumptions must be explicitly modeled by assessing a probability distribution over the various sets of assumptions that could be used in the analysis, and taking the expectation of the analysis results over that probability distribution. That approach would not only yield more meaningful expectations, but would also provide the basis for analyses to assess the value of research to improve the data base used for risk assessment. The conservatism described above has three important effects. First, the range of possible conservative assumptions contributes to differences among assessments. In this respect, the SES study simply compared the SAI, the FPC, and the U.S. Coast Guard studies, and adopted the most conservative assumptions from each to reach its 5.74 expected fatalities, 380 times the SAI estimate. Second, conservatism blurs the distinction between probabilistic and non-probabilistic measures. The population risk scenarios of the SES study, which are similar to maximum credible accidents, seem non-probabilistic. Yet, since each was generated by making many conservative assumptions, they are not in fact qualitatively different from the seemingly more probabilistic expected fatality measures. Finally, by ruling out the possibility of modeling the uncertainty (the uncertain quantity usually being fixed at some level of conservativeness rather than modeled), the possibility of using sensitivity analyses to examine the importance or redundancy of improving the modeling is also ruled out. In this way, a major decision aiding possibility of decision analysis is precluded.
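The difference between the two procedures can be sketched as follows (the scenario weights and the middle estimate are invented; the outer values echo the SAI and SES figures quoted in the text):

# Invented illustration of the two procedures. Each "assumption set" gives an
# expected-fatalities figure; the weights are judged probabilities that the set
# is the right description of the world.
assumption_sets = {                  # hypothetical results under competing assumption sets
    "assumption set A": 0.015,
    "assumption set B": 0.40,
    "assumption set C": 5.74,
}
weights = {"assumption set A": 0.5, "assumption set B": 0.4, "assumption set C": 0.1}

# Procedure 1: conservative composite -- adopt the worst figure outright.
conservative = max(assumption_sets.values())

# Procedure 2: expectation over the judged distribution of assumption sets.
expected = sum(weights[s] * v for s, v in assumption_sets.items())

print(conservative)   # 5.74: an odd mix of maximum and expected fatalities
print(expected)       # about 0.74: a genuine expected value, given the judged weights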
5. Conclusions
One of the most striking contrasts among the assessments is found by comparing their verbal conclusions, as summarized in Table 2. It appears that the SES study could hardly be describing the same site and technology as the SAI and FPC studies. Yet, with the exception of the FPC "acceptable risk" statement, in some sense the conclusions in Table 2 are correct and
even consistent. Because the SAI and FPC statements are prefaced by "it is the opinion of . . . that . . ." and there is no description of how confident the analysts are of the results, they do not conflict with the phrasing of the SES report, "it is not . . . possible to state confidently . . .". But, of course, confidence in results is an essential issue here, as has been discussed in preceding paragraphs. The "acceptable risk" opinion of the FPC staff raises the issue of how broad the mandate of the risk assessors should be. The other assessments only assess the level of risk, or the "facts" of the situation, and not the acceptability of the risk, which naturally brings values into the assessment in a more direct way. Several authors have made the point that the acceptability of a risk is an essentially political question, beyond the legitimate mandate of technical risk assessors (see, e.g., Fischhoff et al., 1981).
6. Effect
As shown in Table 2, there were two markedly different effects produced
by the three assessments: persuading authorities of the safety of the terminal, and increasing public opposition to the terminal. The opposition following publication of the SES study could be attributed to its larger measures of risk. However, according to a key participant in the process (Ahern, 1980), the more important contribution was made by the maximum plume map formats used by SES and the great deal of uncertainty acknowledged in the SES conclusions.
7. Other Factors
Perhaps one of the most important factors in all three of the risk assessments is the highly conditional nature of the results. Yet, without exception, this factor is not clearly presented as a qualifier in their conclusions. For instance, the assessments do not take sabotage or terrorist action into account. According to the FPC decision, SAI maintained that such risks cannot be quantified. Yet the appropriate response to that problem is either to use direct subjective judgment (as was used for ignition probabilities) or to make clear in the presentation of the results that such events are omitted. To do neither has the effect of understating the risks of the terminal, in contrast to the conservatism discussed earlier.
8. In Sum
In reviewing the differences among the assessments, it becomes clear that there is a large degree of flexibility, or freedom, left to engineering and analytic judgment. Among those degrees of freedom are: decisions concerning how to characterize risk, what formats to use for presentation, what gaps to fill with assumptions of what degree of conservatism, which of several conflicting models to use, how to portray the degree of confidence in the results, and what contingencies simply to leave out of the analysis. This leeway explains the differences among the three risk assessments examined here. It can push the risk measurement results in any direction. Very conservative assumptions can drive them up; omissions of inconvenient aspects such as terrorism can drive them down; clear presentations of expert disagreements can decrease the confidence in the results; particular formats can feature more or less salient aspects of the risk; and so on. This flexibility on the part of the analyst can have a major effect on the results of the analysis, over- or understating the risk over such a large range that the final result may have more to do with the predilections of the analyst than with any physical characteristics of the site or technology. Yet risk assessments are generally viewed more in the realm of scientifically determined facts than of subjectively determined evidence, which serves only to fog discussions on the safety of a terminal. This dual nature of subjective, probabilistic analyses, and its implications for improving the policy process, is the topic of the following sections.
C. Policy Context
In this section, we would like to turn to examining the assessments within the context of the policy process. In particular, we will describe the timing of the assessments in relation to how the problems were defined on the political agenda, the purpose of the assessments in relation to these problems, and the uses of the assessments. In the following section, we will reexamine the content of the reports from the perspective of their role in the policy process.
1. Timing of the Risk Assessments
The Oxnard risk assessments examined here were carried out in an early stage of the process. The problem (Round One) was defined in vague terms. Both questions, whether an LNG terminal is needed and which, if any, of
the proposed sites is acceptable, were yet to be resolved. The process, not locked into having a terminal or necessarily approving any of the sites, did not have a clearly defined direction. Many possible issues could have served as a focus or "handle" for the arguments pro and con any site. Since the assessments were commissioned during this unstructured and vague stage of the debate, it is not surprising that "risk" became an important focus for the discussions.⁷

⁷ The significance of the timing of these studies is especially evident when viewed in a cross-national context. In the F.R.G., for example, the risks to the surrounding population from a proposed LNG terminal were first considered after the site, but before the design of the terminal, had been approved (Atz, 1981). In the U.K., a risk assessment will be carried out for an export terminal planned in Scotland only after the terminal has been built (Macgill, 1981). It is not surprising that the safety of the plant has played a larger role in the U.S. than in the F.R.G. and the U.K.
2. Purpose of the Risk Assessments
A general purpose of each of the assessments was to establish the acceptability or nonacceptability of the Oxnard terminal from the standpoint of the safety risks it would impose on the surrounding population. From an analyst's point of view, it is striking that the question of safety was not viewed in terms of the benefits of the facility (a cost-benefit framework) or in terms of the safety of alternative energy sources. This threshold concept of safety (Is it safe enough, irrespective of benefits or alternatives?) is typical of debates on the assessment of new and novel technologies, and is not surprising when seen in relation to the political decision procedures. The assessments were not made as an input to a holistic analysis, such as that described in Section II, where tradeoffs are explicit and all alternatives are evaluated, but rather as support for an argument favoring or opposing less significant, incremental decisions at a particular point in time. The sequential nature of the decision procedures, as clearly demonstrated by the increasing concreteness of the problem formulations through the four rounds of discussions in California, limits the possibilities for comprehensive analyses. The risk studies were carried out, not as an input to a broad energy siting analysis in California, but to support a more narrowly defined problem (Should site x or site y be approved?). Since Round One in California was not defined in these narrow terms (the question of whether the terminal was needed was yet to be resolved), the analyses were ill-suited to address fully the issues on the table. In some sense, then, analyses designed to address the question of safety were prematurely introduced into a process that had not resolved higher-order
7 The significance of the timing of these studies is especially evident when viewed in a cross-national context. In the F.R.G., for example, the risks to the surrounding population from a proposed LNG terminal were first considered after the site, but before the design of the terminal, had been approved (Atz, 1981). In the U.K., a risk assessment will be carried out for an export terminal planned in Scotland only after the terminal has been built (Macgill, 1981). It is not surprising that the safety of the plant has played a larger role in the U.S. than in the F.R.G. and the U.K.
questions of energy policy. Though they served to focus the debate on the safety question, they could not offer (nor were they intended to offer) a panacea for the resolution of the siting question. It is not surprising, then, that Round One ended in a stalemate. The second round, where the State Legislature took center stage, narrowed the problem (by resolving the question of whether California needed a site) to proportions more amenable to technical risk studies. Whether the site was safe, and not whether it was needed, became the critical question.
3. The Uses Made of the Risk Assessments
It might be useful to distinguish between those risk assessments commissioned (or carried out in-house) to advise the client (or the agency) on a course of action (pre-decision) and those commissioned to support or rationalize a client's actions or intended actions (post-decision). The latter should not be viewed as falling outside the routine business of policy analysis. As Majone (1981) points out, because the policy-making process is driven by the clash of opposing arguments, "policy makers need retrospective (post-decision) analysis, including 'rationalization', as much as they need prospective (or pre-decision) analysis" (p. 17). Though the pre- and post-decision distinction is useful, it was not always a clear one in the case of Oxnard. The reason is that there were, as would be expected in the dynamic process described above, several audiences over time for each of the three studies. The SAI study, as commissioned by Western, can be fairly unambiguously classified as a post-decision rationalization; its expressed purpose was to defend Western's decision to site at Oxnard at hearings held on the question by the FPC, which was in charge of approving the application. The SES report, alternatively, was commissioned for the purpose of advising the Oxnard City Council of the accident risks for the Oxnard site. As could have been anticipated, its audience expanded in time to include local interest groups, the Sierra Club, and eventually the State Legislature, and the purpose of the report was transformed from advising to rationalizing arguments against Oxnard as a suitable site. The case of the FPC is the most ambiguous. Though the report was prepared to advise the staff in taking a position on Oxnard, it was carried out in full knowledge that it would be used to rationalize the staff's chosen position at the Commission hearings. An important point here is that all of the analyses were used, if not immediately, then at some time, to support a stand taken by one or more of the parties. Their function was in no case solely
advisory. Each was intended to go beyond that of planning a course of action (or stand) on the part of the client to that of defending the client's position in a public setting (usually a legalistic hearing). For this reason, it was important that each report not only inform the client but make its case as persuasively as possible.
D. The Role of Risk Assessments
The role of risk assessments can be viewed from two opposing perspectives. Because the assessments are quantitative, imitating the physical sciences, they are seen as objective facts. Yet, as we have shown in this paper, the large uncertainties involved necessarily push the evidence out of the realm of "facts" and into the realm of "opinion". This dual nature of formal risk studies has fogged discussions of their role in the policy process.
1. Facts vs. Judgment
The possibility that the form or content of scientific knowledge, as distinct from its incidence or reception, might in some way be socially determined has recently been put forth by sociologists of science. Indeed, the sociology of science as developed in Europe since the early 1970s has challenged the positivist view of science (for a review of this literature, see Mulkay, 1980). Though several authors have discussed the possible "pitfalls" of analysis, whereby values on the part of the analyst color his "methodologies" and results and whereby heuristics introduce biases into his work, Wynne (1981) reminds us of the importance of recognizing these biases as part and parcel of science and not as eradicable lapses from proper rational scientific analysis:
There is a pervasive myth about the nature of science which supports this false approach to the question of "analytic bias". The tendency in the literature is to regard bias or mistakes as individual and isolated in origin, which suggests that ideal objective scientific knowledge can be attained in professional practice and as an input to policy issues. ... This gives a fundamentally misleading and politically damaging picture of the role of expertise, and may make us part of the problems we analyze (pp. 1-2).
A recognition of the inevitability of intertwining facts and values leads us to examine critically recent notions of the desirability of separating information from judgment. The widely-held view of the scientist
producing "objective" information and keeping "facts" and "values" in separate, airtight containers, clearly reflects the acceptance of the "standard view", of science. An opposing view, as suggested in the above quote by Wynne, is t o recognize that there is a n important element of subjective judgment in all scientific experiments, which is especially apparent in policy analysis. There is a clear need for judgment in all steps of an analysis: (a) setting or defining the problem; (b) collecting the data; (c) choosing the tools and methods of analysis; (d) presenting the evidence and formulating the arguments; and (e) drawing conclusions, coniniunicating and implementing the results. According t o Vaupel (1981), since nearly all statistics used in policy analysis not only summarize a body of data, but also imply a policy thrust, statistical analysis for policy making is fundamentally different from statistical analysis for descriptive scientific research. As has been argued by Ravetz (1971) and Majone ( l 9 8 0 ) , policy analysis is inore craft than science. Seen in this light, it would be naive to suppose that a level of risk can be estimated and accepted as fact. Scientific truths are not proved but are the product of a process of general acceptance in the scientific community. Since rational methods for discovering the scientific information t o guide policy are often inconclusive, Majone (1981) points out that often non-rational (not t o be confused with irrational) methods are used, including bargaining, voting, delegation, material incentives, and procedures. Majone suggests further that persuasion is perhaps the most important of these non-rational methods. As Vaupel(l981) has illustrated in the case of the US. Environmental Protection Agency, the importance o f using the analyst's rational methods may lie as much in justifying the EPA's stxidards and regulations for the Federal Register and the courts as for liclping the staff t o set them. In this role as justifying or rationalizing policy positions, the analyst will naturally choose the statistics and methods that present a convincing case.
2. The Advocacy Role of Risk Assessments
A review of the three Oxnard risk assessments has revealed striking differences in their "scientific" content: the relative conservativeness of the assumptions, the completeness of the analysis, the characterization of safety or "risk", and the formats for presenting the results. A review of the policy process has revealed that the assessments were done to persuade either the client (advisory role) or a decision-making body (rationalization) of the safety or nonsafety of the Oxnard terminal. The SAI study was intended to rationalize to the FPC Western's choice of Oxnard; the
FPC study was intended to advise the FPC staff and then to persuade the commissioners of the merits of the staff's position; the SES study was intended to persuade the client, the Oxnard City Council, of the analyst's own reservations about the safety of the Oxnard terminal. It is clear from the nature of the problem, indeed from policy analysis in general (and maybe all "scientific" investigations), that there are many competent and respectable ways of analyzing the problem. No one set of assumptions is best; no analysis can be complete; and no assessment is "free" of judgment. In fact, the assessments express opinion, an opinion in support or rejection of a policy argument. As in any area of uncertainty, it would be expected that there is a range of opinions. Since the assessments are ultimately intended to support arguments in the policy process, it would be expected that these opinions are expressed as persuasively as possible. Seen in this light, it is not surprising that SAI found the risks of Oxnard to be "extremely low", that the FPC found those risks to be "negligible", and that SES, having concluded that there are large uncertainties involved, presented some of their results in the form of worst-case scenarios. It is also not surprising, in view of the incentives to present these results persuasively, that the reports did not elaborate in any detail on the uncertainty of the results. Working in a client-oriented environment, the analysts chose the format and presentation that made the best case for their and/or their client's position. As pointed out by Sjoberg (1980), risk analyses often increase polarization of the arguments rather than produce a consensus.
IV. Summary and Recommendations
A. Summary
In this paper we have examined the role risk assessment played in a political decision process: the siting of an LNG facility on the California coast. From an analytic perspective the siting problem was one of trading off several objectives under uncertainty: environmental quality, cost, gas supply interruption risk, and population risk. Yet the political decision process studied here bore no resemblance to such a rational structure: several interested parties with conflicting goals and different short- and long-term agendas came together in a series of structured and unstructured debates. That set of debates generated a sequential decision process, where bounds on the overall siting problem were successively narrowed as parts of the problem were resolved. An essential element of those debates was
risk. With few exceptions, some measure of risk was used in support of each party's stand. Yet risk is a multidimensional concept; party stands were often supported by diverse definitions of risk, based on all manner of ways of combining dimensions into a risk measure, of assessing each dimension, and of portraying the results. We examined and compared three risk assessments that were produced in the course of the California LNG siting process. That comparison established that, beyond the basic differences in definitions and measurements of risk, there are many other degrees of freedom in a risk assessment which are left to the analysts' judgment. Those factors can affect the results of the assessment in major ways, determining the level of risk measured over a very broad range (perhaps two orders of magnitude), the degree of confidence ascribed to the results, and the salience of the risk. As a consequence, the results of a risk assessment may be determined as much by the judgments of the analysts as by the site and technology considered. But risk assessments should not be considered as analyses existing in a vacuum. An understanding of the political context within which the assessments operate is vital to the development of improvements in those assessments. The purpose of a risk assessment in the political process we studied was not only to assess risk, but to support one side or another in debates concerning the acceptability of a risk, where those debates affected incremental decisions in a sequential process. For this reason, the timing of a risk assessment can be crucial to its nature and effectiveness, and its effectiveness depends not only on its analytic rigor, but on its persuasiveness.
B. A Suggestion for Improving Risk Assessments: Rules of Evidence
From some normative point of view, the ideal role of a risk assessment is to make such technical information as can be mustered available to the political decision process in such a way that the information is most effectively used. Yet our comparison of risk assessments found that the assessment results depended as much on analysts' judgments and presentation formats as they did on technical aspects of the site and technology. An examination of the role actually played by the assessments found that they were designed as much to persuade as to inform. Procedural reforms to improve the content and presentation of risk assessments, so that the users have a clear idea of the technical failure possibilities along with the uncertainty involved, could take several direc-
tions. First, the analysts could be changed, e.g., independent publicly-funded research bodies could be set up to work in a less adverse atmosphere. Alternatively, the users could be changed, e.g., the present hearing system, where the judges are not trained in quantitative methods, could be reformed to include technical experts. Along this same line, a Technical Advisory Board (see Ackerman et al., 1974) could be set up to critique the content of technical reports for the benefit of those using them. In the opinion of the authors, a more useful and implementable strategy for improving the content of risk assessments would involve not changing the analysts or the users, but changing the rules under which the analysts operate. Given the necessarily subjective content of risk assessments, it appears appropriate that they be received in the same manner as other, more obviously subjective, evidence in hearings and procedures. It follows that a promising strategy for improving risk assessments lies in the development of rules of evidence, or standards that assessments must meet in order to be used in a hearing or accepted as part of an Environmental Impact Report. The notion of introducing rules of evidence as a possibly useful procedural reform is based upon the belief on the part of the authors that the advocacy role of analysts, with the existing incentives for them to produce a persuasive case for their results, can be a productive element in making public policy. The presentation of polar views in a policy debate, by highlighting conflicting approaches to the problem and uncovering opposing evidence, can be more informative than a single analysis, whatever attempts are made to reduce the biases discussed in this paper. However, and this point cannot be overemphasized, the introduction of competing expertise to the policy process is useful only if those in the position of judging the merits of the evidence are able to recognize the assumptions and methodologies underlying the differences in this evidence. In other words, rather than proposing the impossible - that only objective reports concerning the risks of a terminal be allowed as inputs to siting decisions - we propose instead that more subjective and argumentative evidence be admissible, but only under the condition that those elements of the analysis which are left to the analysts' judgment are clarified, to the extent possible, in the presentation of the results.
1. Rules of Evidence: Desirable Features
A desirable feature of a risk assessment is that it communicate an appropriate degree of confidence in its results, and that it make clear the limitations of the analysis and current technical knowledge. If risk
assessments are used as arguments, one on each side of a perhaps many-sided debate, it would be desirable to those judging their merits that they be as comparable as possible. The debate can then focus on comparing aspects of the alternatives themselves, as opposed to unwittingly comparing aspects of the assessments and presentations of the various sides. In this way, the interested parties can discuss differences in the modeling of uncertainties and the tradeoffs involved, about which there cannot be any "objectivity" (see Humphreys, 1982). When the relevant physical processes are poorly understood, widely different sets of assumptions can be defended, so that two different analyses can deliver two different results, both following correctly from the assumptions adopted. While that source of difference cannot be eliminated, the other sources of difference listed in Section III.B can be minimized by procedural standards for risk assessments, or rules of evidence such as those suggested below.
2. One Suggested Set of Rules of Evidence
a. Clearly define the "risk" being assessed. As was stressed before, risk is a many-dimensioned concept that is characterized in different ways by different people. Differing characterizations of risk may be such an intrinsic part of the political debate that any consensus on a risk definition may not be feasible; or, more to the point, such a consensus may be more difficult to achieve than the resolution of the risk management debate itself. At the very least, then, risk assessments should clearly state what aspects of risk are being assessed, so that differences among assessments due to differing risk measures are recognized as such. As a case in point, it hardly helps to measure only expected fatalities when the political process is sensitive to a concern for the potential for catastrophe. While different political processes may be sensitive to different aspects of risk, one set of measures that takes into account sensitivity to catastrophe can be adopted from Keeney et al. (1979), which includes the following (a computational sketch of these measures is given after rule d below):
(1) Expected fatalities: allows cost/benefit and some value-of-life calculations to be made.
(2) Individual probabilities of fatality: allows comparison with non-decision "benchmarks" of individual risk, such as smoking, driving a car, etc.
(3) Individual probabilities of fatality (for members of groups): when grouped by occupation, neighborhood, or activity (recreation, living), allows consideration of equity.
(4) Risk of multiple fatalities: allows incorporation of sensitivity to catastrophe.
In addition, the analyst should be sensitive to the many other dimensions of the risk issue that are of political concern. Who are the people at risk? Are they young or old? How frightened are they of the risk situation? Are they voluntarily exposed? Are they familiar with the risk? And this list could continue (see Otway and von Winterfeldt, 1980).
b. Be clear on error bounds of results. Those bounds should include disagreements among experts. This requirement could have the effect of reducing differences among assessments, by forcing all assessments to take into account a similar data base and the set of relevant experts or models, as opposed to a single expert or model for each assessment.
c. Model uncertainty explicitly. If the assumptions made are clearly stated along with the results, debates over differences between assessments can often be converted into more meaningful debates over assumptions. For instance, if the assessment assumes "no terrorist actions", that should be clearly stated along with the results.
d. Wherever possible, state risk measures in relative terms, relative to an actual, agreed-upon alternative. Many problems could be mitigated by measuring relative risk, as opposed to absolute risk. This suggestion would require agreement on a particular alternative to the applicant's project, which may be difficult when the debate itself stems from two different ideas of what that alternative should be. Where possible, however, this requirement would lead to risk assessment results that are more easily incorporated into the actual political process.
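To make rules (a), (b) and (d) concrete, the following sketch computes the four measures adapted from Keeney et al. (1979) for a hypothetical set of accident scenarios, and reports expected fatalities relative to an agreed-upon alternative. All scenario probabilities, fatality counts, and population figures are invented for illustration; nothing here reproduces the SAI, SES, or FPC calculations.

# Illustrative sketch only: scenario data are hypothetical, not drawn from the
# SAI, SES, or FPC studies discussed in the text.
from dataclasses import dataclass
from typing import List

@dataclass
class Scenario:
    name: str
    annual_probability: float   # chance the scenario occurs in a given year
    fatalities: int             # fatalities if it occurs
    group: str                  # exposed group, e.g. "plant workers", "nearby residents"

def expected_fatalities(scenarios: List[Scenario]) -> float:
    """Measure (1): expected fatalities per year."""
    return sum(s.annual_probability * s.fatalities for s in scenarios)

def individual_risk(scenarios: List[Scenario], population: int) -> float:
    """Measure (2): average individual probability of fatality per year."""
    return expected_fatalities(scenarios) / population

def group_risk(scenarios: List[Scenario], group: str, group_population: int) -> float:
    """Measure (3): individual probability of fatality for members of one group."""
    group_scenarios = [s for s in scenarios if s.group == group]
    return expected_fatalities(group_scenarios) / group_population

def prob_multiple_fatalities(scenarios: List[Scenario], threshold: int) -> float:
    """Measure (4): annual probability of an accident killing at least `threshold`
    people (assuming rare, independent scenarios, so probabilities simply add)."""
    return sum(s.annual_probability for s in scenarios if s.fatalities >= threshold)

# Hypothetical terminal scenarios; per rule (c), the "no sabotage" assumption is explicit.
terminal = [
    Scenario("tanker spill, vapor cloud ignition", 2e-6, 40000, "nearby residents"),
    Scenario("storage tank rupture and fire", 1e-5, 300, "nearby residents"),
    Scenario("transfer-arm accident", 3e-4, 5, "plant workers"),
]
alternative = [  # an agreed-upon comparison case for rule (d), e.g. an existing facility
    Scenario("refinery fire", 5e-4, 20, "plant workers"),
]

ef_terminal = expected_fatalities(terminal)
ef_alternative = expected_fatalities(alternative)
print(f"Expected fatalities/yr: {ef_terminal:.4f}")
print(f"Individual risk (population 50,000): {individual_risk(terminal, 50000):.2e}")
print(f"Worker group risk (200 workers): {group_risk(terminal, 'plant workers', 200):.2e}")
print(f"P(accident with >= 100 deaths)/yr: {prob_multiple_fatalities(terminal, 100):.1e}")
print(f"Relative expected fatalities (terminal/alternative): {ef_terminal / ef_alternative:.2f}")

Reported this way, differences between competing assessments can be traced back to the scenario lists and probabilities assumed, which is precisely what rules (b) and (c) ask the analyst to expose.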
References
Ackerman, B., S. Ackerman, J. Sawyer, Jr., and D. Henderson, 1974. The Uncertain Search for Environmental Quality. London: Collier Macmillan.
Ahern, W., 1980. California meets the LNG terminal. Coastal Zone Management Journal, 7, 185-221.
Atz, H., 1981. Decision-making in LNG terminal siting: Wilhelmshaven, F.R.G. Draft Report. Laxenburg, Austria: IIASA.
Fischhoff, B., S. Lichtenstein, P. Slovic, S. Derby, and R. Keeney, 1981. Acceptable Risk. New York: Cambridge University Press.
FPC, 1976. Pacific-Indonesia project, final environmental impact statement. Bureau of Natural Gas, Federal Power Commission Staff. Federal Energy Regulatory Commission, Washington, D.C., December.
FPC, 1977. Initial decision on importation of liquefied natural gas from Indonesia. Federal Power Commission. Federal Energy Regulatory Commission, Washington, D.C., July.
Havens, J., 1977. Predictability of LNG vapor dispersion from catastrophic spills onto water: An assessment. U.S. Coast Guard, Washington, D.C.
Humphreys, P., 1982. Value structure underlying risk assessments. In: H. Kunreuther (ed.), Risk: A Seminar Series. Laxenburg, Austria: IIASA.
Keeney, R., R. Kulkarni, and K. Nair, 1979. A risk analysis of an LNG terminal. Omega, 7, 191-205.
Kunreuther, H., J. W. Lathrop, and J. Linnerooth, 1981. A descriptive model of choice for siting facilities: The case of the California LNG terminal. IIASA Working Paper WP-81-106. Laxenburg, Austria: IIASA.
Lathrop, J. W., 1980. The role of risk assessment in facility siting: An example from California. WP-80-150. Laxenburg, Austria: IIASA.
Linnerooth, J., 1980. A short history of the California LNG terminal. WP-80-155. Laxenburg, Austria: IIASA.
Macgill, S. M., 1981. Decision making on LNG terminal siting: Mossmorran-Braefoot Bay, United Kingdom. Draft Report. Laxenburg, Austria: IIASA.
Majone, G., 1980. An anatomy of pitfalls. In: G. Majone and E. Quade (eds.), Pitfalls of Analysis. IIASA Series. New York: Wiley.
Majone, G. The uses of policy analysis. Draft. Laxenburg, Austria: IIASA (in press).
Mandl, C. and J. W. Lathrop, 1981. Assessment and comparison of liquefied energy gas terminal risk. IIASA Working Paper WP-81-98. Laxenburg, Austria: IIASA.
Mulkay, M., 1980. Science and the Sociology of Science. London: Allen and Unwin.
Otway, H. and D. von Winterfeldt, 1981. Beyond acceptable risk: On the social acceptability of technologies. Policy Sciences (forthcoming).
Ravetz, J., 1971. Scientific Knowledge and its Social Problems. Harmondsworth, Middlesex: Penguin University Books.
SAI, 1975. LNG terminal risk assessment study for Oxnard, California. La Jolla, California: Science Applications, Inc., December.
SES, 1976. Environmental impact report for the proposed Oxnard LNG facilities. Draft EIR Appendix B: Safety. Los Angeles, California: Socio-Economic Systems, Inc., September.
Sjoberg, L., 1980. The risks of risk analysis. Acta Psychologica, 45, 301-321.
Vaupel, J. W., 1981a. On statistical insinuation and implicational honesty. Unpublished paper, Duke University and International Institute for Applied Systems Analysis, July.
Vaupel, J. W., 1981b. Analytic perspective on setting environmental standards. Draft report prepared for the Office of Air Quality Planning and Standards, U.S. Environmental Protection Agency, August.
von Winterfeldt, D. and M. Rios, 1980. Conflicts about nuclear power safety: A decision theoretic approach. Los Angeles: Social Science Research Institute, University of Southern California.
Wynne, B., 1981. Institutional mythologies and dual societies in the management of risk. Presented at IIASA Summer Study on Risk. To appear in proceedings. Laxenburg, Austria: IIASA.
A MULTI-ATTRIBUTE MULTI-PARTY MODEL OF CHOICE: DESCRIPTIVE AND PRESCRIPTIVE CONSIDERATIONS
Howard KUNREUTHER1
International Institute of Applied Systems Analysis, Laxenburg, Austria
Abstract
Societal decision making for siting facilities, such as liquefied natural gas (LNG) terminals, has two primary features which make these problems difficult to structure analytically. First, the decision affects many interested parties, each with their own objectives, attributes, data base and constraints. Secondly, there is the absence of a detailed statistical data base on the safety and environmental risks associated with a proposed project. This paper describes a multi-attribute multi-party model (MAMP) which has been developed at IIASA for structuring the process for siting LNG terminals in four different countries. The decision making process in each of these countries can be characterized as a sequence of decisions, subject to change over time, which are influenced by exogenous factors and legislation. The MAMP model separates this process into a series of rounds, each of which is characterized by a unique problem formulation and an interaction phase. The model is illustrated using detailed data on the California siting decision obtained from published and unpublished material as well as personal interviews. The final section of the paper discusses empirical lessons which follow from the MAMP model and proposes areas for future research on prescriptive analysis.
’
The research report in this paper is supported by the Bundesministerium fur Forschung und Technologic, F.R.G., contract No. 321/7591/RGB 8001. While support for this work isgratefully acknowledged, the views expressed are the author’s and not necessarily shared by the sponsor. This paper is part of a larger project a group of us at IIASA are undertaking with respect to siting decisions of Liquefied Natural Gas facilities. The ideas presented here reflect helpful discussions with my IIASA colleagues-John Lathrop, Joanne Linnerooth and Nino Majone-as well as with David Bell and Louis Miller. Helpful comments on an earlier version of this paper were provided by Patrick Humphreys and Detlof von Winterfeldt.
I. Introduction
Society has become increasingly concerned with the question of how one evaluates the siting of technologically sophisticated projects which provide social benefits over a wide region but may also impose significant costs on certain groups. The recent debates on the future of nuclear power plants as a source of energy throughout the world highlight this point. A less publicized set of decisions is the siting of liquefied energy gas terminals in different parts of the world, the particular technology which serves as an illustrative example in this paper. There are two primary features associated with these proposed projects which make them particularly difficult to structure analytically. First, the decision affects many different individuals and groups in society rather than being confined to the normal relationship of a private market transaction, such as when a consumer purchases food or an appliance from a store or firm. In the siting decision, each interested party has its own objectives, attributes, data base and constraints (Keeney, 1980). A second feature of these problems is the absence of a detailed statistical data base on the variety of different risks associated with either investing or not investing in a particular project. For example, if the construction of a nuclear power plant or LNG terminal is approved, then environmental and safety risks are created. By not building the project there are economic risks with respect to the future cost of energy to residences and businesses. Each interested party is thus likely to provide different estimates of the uncertainties and consequences of these risks. Hence it is particularly difficult to utilize what Majone and Quade (1980) call statistical rules of evidence to settle these differences. The purpose of this paper is to propose a framework for investigating societal problems which have the above two characteristics. Section II provides a set of concepts which are relevant for characterizing the decision making process. In Section III these concepts are integrated into a descriptive model of choice, a multi-attribute multi-party model (MAMP) which has been developed at IIASA for structuring the process for siting LNG terminals in four different countries: the F.R.G., the U.K., The Netherlands, and the U.S. (Kunreuther, Lathrop, and Linnerooth, 1981). The model will be illustrated using the California siting case. The MAMP model enables the policy analyst to understand more fully the dynamics of the political process and the relevant constraints which impact on final actions. It thus promises to be a useful tool for improving decision making for societal problems where there are conflicts between the parties. Section IV discusses lessons which follow from the descriptive analysis and proposes areas for future research on prescriptive analysis.
II. Relevant Concepts for a Descriptive Model of Choice
Different Interested Parties
Facility siting debates vary in detail, but there is a well-defined set of stakeholders who can be classified into one of the four general groups depicted in Figure 1. Let us briefly look at each of these interested parties in turn to better understand why potential conflict is likely to result when a specific project is proposed.
Figure 1. Relevant Interested Parties in Facility Siting Decisions (the applicant, local residents, government agencies, and public interest groups, linked by potential sources of conflict).
The Applicant Firms or developers who support the construction and operation of a facility have concluded that despite future uncertainties, the expected profits associated with the project exceed the potential costs. Their position is likely to be based on economic factors, although they may also be concerned with the safety risk.
Local Residents Residents in a community that has been proposed as a possible site will have differing views of the situation. Those who own the property where the project is to be constructed have to determine whether the price the
developer offers them is attractive enough. If the developer has eminent domain power (e.g., a public utility), then these residents may be concerned that a court will not award them a fair price for their property (O'Hare, 1977). Others in the community may focus on the reduced property taxes or increased employment that a facility is likely to bring and hence favor the action. A third group may be concerned with the increased safety risk created by the facility and oppose the project.
Government Agencies
State and federal government agencies normally play the role of referee or arbiter in the decision making process, even though in many cases they have an interest in a particular outcome. Their regulatory actions, which are often constrained by legislation, influence the nature and distribution of the public's preferences and provide advantages to some interests relative to others (Jackson and Kunreuther, 1981).
Public Interest Groups
Recently we have seen the rise of very intense public interest groups. These organizations generally represent the interests and preferences of one component of the public. For example, the membership of the Sierra Club is concerned with the effects that the siting of any new facility will have on the environment. Wilson (1975) and Mitchell (1979) have pointed out that those attracted to such organizations have strong, particular interests which dictate the agenda of the organization and influence the type of information that is collected and processed. It should be clear from these brief descriptions that there is considerable room for potential conflict between groups once a specific site is proposed as an option. The relative influence of each of the parties in the process will depend on their composition as well as on how well-defined their objectives are. Olson (1971) postulates that each person in a group allocates time and energy in proportion to the expected benefits he or she receives. If this assumption is true, then individuals are more likely to devote effort to supporting a group's cause as the size of the group decreases and the amount at stake for each individual increases.
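A minimal numerical illustration of Olson's proportionality postulate, with figures invented purely for this sketch (they describe no actual group in the siting case):

# Hypothetical illustration of Olson's (1971) postulate that effort is
# proportional to the expected benefit accruing to each member: the per-member
# stake, and hence the predicted effort, shrinks as the group grows.
def per_member_stake(total_benefit: float, group_size: int) -> float:
    return total_benefit / group_size

# A small neighborhood group facing a concentrated impact vs. a large diffuse public.
print(per_member_stake(total_benefit=1_000_000, group_size=50))       # 20000.0
print(per_member_stake(total_benefit=1_000_000, group_size=500_000))  # 2.0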
Sequential Decision Process
Another feature of the facility siting problem is that the process is characterized by sequential decisions. March (1978) notes that individuals and groups simplify a large problem into small subproblems because of the difficulty they have in assimilating all alternatives and information. Often constraints due to legislation and legal considerations dictate the order in which certain actions must be taken. If the process is sequential in nature, then the setting of an agenda is likely to play a role in determining the final outcome as well as the length of time it takes to reach it. There is strong empirical evidence from the field as well as from laboratory experiments (Levine and Plott, 1977) that the order in which different subproblems are considered frequently leads to different outcomes for the same larger problem. There are two principal reasons for this. First, once a particular decision has been made on a subproblem, it serves as a constraint for the next subproblem; if the order of the subproblems were reversed, there would likely be a different set of choices to consider. Secondly, each subproblem involves a different set of interested parties who bring with them their own set of data to bolster their cause. The timing of the release of this information may have an effect on later actions. For example, citizens' groups normally enter the scene with respect to siting problems only when their own community is being considered as a possible candidate. The data on the risks associated with siting would be released at a slower rate (but perhaps with greater emphasis and more political impact) if only one site were considered at a time than if all potential sites were evaluated simultaneously.
Role of Exogenous Events
Another important concept, which relates to the uncertainty of information on probabilities and losses, is the importance of exogenous events in influencing the decision process. Random events, such as disasters, often play a critical role in triggering specific actions to "prevent" future crises and in calling attention to the dangers associated with a particular technology. The small data base for judging the frequency of low probability events, coupled with systematic biases of individuals in dealing with concepts of chance and uncertainty, increases the importance of a salient event in the decision making process. Tversky and Kahneman (1974) describe this phenomenon under the heading of availability, whereby one
judges the frequency of an event by the ease with which one can retrieve it from memory. Fischhoff, Lichtenstein, and Slovic (in press) summarize their recent experimental studies on perceived risks by cataloguing the nature of individual estimates of the probability of occurrence and consequence of different types of hazards. One of their principal conclusions is that these estimates are labile and likely to change over time because of salient events which are highlighted by mass media coverage. In a similar spirit, March and Olsen (1976) suggest that random events and their timing play a role in many organizational decisions because of the ambiguity of many situations and the limited attention that can be given to any particular problem by the interested parties unless it is perceived as being critical. They provide empirical evidence to support their theory using studies of organizations in Denmark, Norway, and the United States. In another context, Holling (1981) has pointed out how specific crises in the short run can lead to changes in policies with respect to environmental and ecological problems (e.g., the suppression of the spruce budworm after it had destroyed forests in Canada). Kunreuther and Lathrop (in press) describe with specific examples how exogenous events triggered new coalitions and new legislation regarding LNG siting decisions in the United States. One reason for the importance of exogenous events, such as crises and disasters, in triggering societal interest in a specific problem is that they are easily understood evidence of trouble. Walker (1977) stresses the importance of this factor in setting the discretionary agenda of the U.S. Congress or a government agency. To support these points, Walker presents empirical evidence on the passage of safety legislation in the U.S. Numerous examples of this process are also provided by Lawless (1977) through a series of case histories of problems involving the impact of technology on society. He points out that frequently:
new information of an "alarming" nature is announced and is given rapid and widespread visibility by means of modern mass communications media. Almost overnight the case can become a subject of discussion and concern to much of the populace, and generate strong pressures to evaluate and remedy the problem as rapidly as possible. (p. 16)
In the case of decisions such as the siting of facilities, exogenous events such as an LNG explosion or an oil spill may be sufficiently graphic and affect enough people to cause a reversal of earlier decisions, inject other alternatives into the process and change the relative strength of
parties interested in the decision outcome. The mass media may play a critical role in focusing on these specific events and in many cases exaggerating their importance.
III. A Multi-Attribute Multi-Party Model of Choice
The above concepts are now incorporated into a model of sequential decision making for large-scale projects such as facility siting. The approach, which has been influenced by the work of Braybrooke (1974), focuses on more than one attribute and involves many interested parties. Hence we have called it the Multi-Attribute Multi-Party (MAMP) model.2 The MAMP model will be described using an illustrative example: the siting of a liquefied natural gas (LNG) terminal in California. The relevant data for developing the scenarios described below were obtained from published documents as well as personal interviews with key interested parties. For more detailed discussions see Lathrop (1981) and Linnerooth (1980). It is useful to provide a brief background on the nature of the siting problem. LNG is a potential source of energy which requires a fairly complicated technological process that has the potential, albeit with very low probability, of creating severe losses. To import LNG the gas has to be converted to liquid form at about 1/600 of its volume. It is shipped in specially constructed tankers and received at a terminal where it undergoes regasification and is then distributed. The entire system (i.e., the liquefaction facility, the LNG tanker, and the receiving terminal and regasification facility) can cost more than $1 billion to construct (Office of Technology Assessment, 1977).
Elements of the Model
Figure 2 provides a schematic diagram of the MAMP model. The decision process can be separated into different rounds, which are labeled by capital letters A, B, . . . A round is simply a convenient device to illustrate a change in the focus of discussions, either because (1) a key decision was taken (or a stalemate reached due to conflicts among parties) or (2) a change occurred in the context of the discussions due to an exogenous event, the entrance of a new party, or new evidence in the debate. A round is
2 For a more detailed description of the MAMP model see Kunreuther, Lathrop and Linnerooth (1981).
initiated by a formal or informal request by one or more of the interested parties. In California, Round A began in September 1974 when the applicant filed for approval of three sites on the California coast (Point Conception, Oxnard, and Los Angeles) to receive gas from Indonesia. No matter how a round is initiated, it is characterized by a unique problem formulation which is presented in the form of a set of alternatives. There can be several decisions made in any round, or there can simply be discussions of issues which revolve around the proposed set of alternatives. By definition, the activities during any given round are based on the same set of alternatives. Each alternative is characterized by a set of attributes which may be viewed differently by each of the interested parties. In Round A the alternatives were whether one or more of the proposed sites for an LNG terminal was acceptable. There were four primary attributes used in the ensuing debate among the parties. The need for LNG and the risk of an interruption in the supply of natural gas were arguments for supporting the location of a terminal in at least one of the three proposed sites. Environmental and land use considerations suggested a non-remote site (Los Angeles or Oxnard), while the risks to the population argued for siting the terminal in a remote area (Point Conception). Finally, concerns about earthquake risk brought about opposition to the Los Angeles site, which was found to be crossed by a significant fault. There were several interested parties in Round A which can be referenced to the four groups depicted in Figure 1. The applicant for the terminal was Western LNG Terminal Associates, a special company set up to represent the LNG siting interests of the three gas distribution utilities: Southern California Gas Company, Pacific Gas and Electric, and El Paso Natural Gas Company. With respect to local residents, each of the city councils evaluated the proposed terminal in its jurisdiction by looking at the tax revenues and jobs it promised to provide. These positive features had to be weighed against the negative impacts that the facility might have on land use and risk to the population. With respect to government agencies, the Federal Energy Regulatory Commission (FERC) determines whether a proposed LNG project is in the public interest and should be allowed, and the California Coastal Commission (CCC) has the responsibility of protecting the California coastline. Finally, the public interest groups, represented by the Sierra Club and local citizens' groups, were primarily concerned with safety and environmental issues. Each of the interested parties states its preference over the different alternatives and constructs arguments to defend its preference by focusing on different attributes. This descriptive view of the process
Figure 2. Multi-Attribute Multi-Party Model (MAMP) of Choice. The flowchart comprises, for each round: alternatives in round; relevant interested parties; preferences by parties; use of attributes to defend preferences; nature of conflicts between parties; decision in round; conclusion of round; and the question of whether there is a feasible solution or no solution is possible.
should be distinguished from the concept of elicitation of preferences as used by psychologists and decision analysts. During this interaction phase certain decisions are made. In the case of Round A in California two key decisions were taken. First, the CCC favored Point Conception over the non-remote sites due to concerns over the safety risk to the population. Second, the FERC disapproved of the Port of Los Angeles because a recently discovered earthquake fault increased the seismic risk above an acceptable threshold. Round A was concluded with a potential stalemate perceived by the gas industry. We have summarized the elements of Round A in Table 1.
Table 1. Elements of Round A

Problem Formulation: Should the proposed sites be approved? That is: Does California need LNG, and if so, which, if any, of the proposed sites is appropriate?

Initiating Event: Applicant files for approval of three sites.

Alternatives:
  Site at Point Conception: A1
  Site at Oxnard: A2
  Site at Los Angeles: A3
  Site at any combination of A1, A2, A3

Interaction:
  Involved parties: Applicant (P1), *FERC (P2), *CCC (P3), *City Councils (P4), Sierra Club (P5), Local Citizens (P6)
  Attributes used as arguments: X1, X2, X3, X5

Key Decisions:
  1. CCC concerns over population risk imply that A1 is preferred over the other two sites.
  2. FERC would not approve A3 because the seismic risk is greater than a prescribed acceptable level.

Conclusion: Applicant perceives a stalemate, i.e., that no site is approvable without long delay.

*Interested party with responsibility for decision(s).
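The elements summarized in Table 1 suggest a simple bookkeeping structure for a round. The sketch below is our own illustrative encoding, not part of the original model: the field names are invented, and the Round A entries paraphrase the table and the surrounding prose rather than the coded attribute labels.

# A minimal, illustrative encoding of one MAMP round; field names are our own,
# and the Round A content paraphrases Table 1 and the accompanying text.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Round:
    label: str
    problem_formulation: str
    initiating_event: str
    alternatives: Dict[str, str]          # code -> description
    parties: Dict[str, List[str]]         # party -> attributes used as arguments
    decision_makers: List[str]            # parties with responsibility for decisions
    key_decisions: List[str] = field(default_factory=list)
    conclusion: str = ""

round_a = Round(
    label="A",
    problem_formulation="Should the proposed sites be approved?",
    initiating_event="Applicant files for approval of three sites (September 1974)",
    alternatives={"A1": "Point Conception", "A2": "Oxnard", "A3": "Los Angeles"},
    parties={
        "Applicant": ["need for LNG"],
        "FERC": ["need for LNG", "earthquake risk"],
        "CCC": ["land use", "population risk"],
        "City Councils": ["local economy", "population risk"],
        "Sierra Club": ["land use", "population risk"],
        "Local Citizens": ["population risk"],
    },
    decision_makers=["FERC", "CCC", "City Councils"],
    key_decisions=[
        "CCC prefers A1 (Point Conception) on population-risk grounds",
        "FERC rejects A3 (Los Angeles) on seismic-risk grounds",
    ],
    conclusion="Applicant perceives a stalemate: no site approvable without long delay",
)

# A full siting history is then an ordered list of such rounds (A, B, C, D),
# each with its own problem formulation and interaction phase.
siting_process: List[Round] = [round_a]
print(round_a.key_decisions)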
The siting process in California can be characterized by four rounds (A . . . D), as shown in Table 2. Round B resulted in the passage of the LNG Siting Act of 1977, which was designed to break the stalemate at the end
Table 2. Summary of Rounds in California LNG Siting Case

ROUND A
  Problem Formulation: Should the proposed sites be approved? That is: Does California need LNG, and if so, which, if any, of the proposed sites is appropriate?
  Initiating Event: Applicant files for approval of three sites. (September 1974)
  Conclusion: Applicant perceives that no site is approvable without long delay. (July 1977; 34 months)

ROUND B
  Problem Formulation: How should need for LNG be determined? If need is established, how should an LNG facility be sited?
  Initiating Event: Applicant and others put pressure on state legislature to facilitate LNG siting. (July 1977)
  Conclusion: New siting process set up that essentially assumes a need for LNG, and is designed to accelerate LNG terminal siting. (September 1977; 2 months)

ROUND C
  Problem Formulation: Which site should be approved?
  Initiating Event: Applicant files for approval of Point Conception site. (October 1977)
  Conclusion: Site approved conditional on consideration of additional seismic risk data. (July 1978; 10 months)

ROUND D
  Problem Formulation: Is Point Conception seismically safe?
  Initiating Event: Regulatory agencies set up procedures to consider additional seismic risk data.
  Conclusion: (Round still in progress)
of Round A. Its principal feature was that the CCC nominates and ranks potential sites for an LNG terminal in addition to those which Western LNG Terminal Associates applies for. The California Public Utilities Commission (CPUC), the principal state body involved in power plant issues, selects a site from the CCC list, not necessarily the top-ranked site. In Round C, which occurred during the summer of 1977, the CCC ranked four sites (Camp Pendleton, Rattlesnake Canyon, Point Conception, and Deer Canyon) in that order, and the CPUC chose Point Conception, conditional on it being a seismically safe location. Round D is still in progress, with the FERC and CPUC examining seismic data which will determine whether Point Conception is seismically safe. Whether an LNG terminal will ever be sited at Point Conception is an open question, since the enthusiasm of the applicant for an LNG terminal has now waned considerably. In addition, there are two sets of wealthy landholders owning tracts of land adjacent to Point Conception: the Hollister and Bixby Ranches. These landholders are attempting to do everything in their legal power to prevent the siting process at Point Conception and so far have managed to stall any action.
Interpretation of the Model
The MAMP decision process in California reflects the basic concepts which were outlined in Section II. As indicated by the scenario of the four rounds, there were different interested parties who interacted with each other at each stage of the process. There were three broad categories of concern relevant to this problem: risk aspects, economic aspects, and environmental aspects. Each of these concerns can be described by a set of attributes. Table 3 depicts an interested party/concern matrix showing the main attributes considered by each of the relevant groups over the seven-year period, based on detailed interviews with key personnel (see Lathrop, 1981). It is clear from this table that each of the parties brought to the debate their own special interests. The applicant's primary concerns are earning profits for shareholders and delivering gas reliably to consumers; hence the emphasis on the need for gas, profit considerations, and price of gas as the relevant factors. The federal and state government agencies' concerns were specified by legislation; local governments compared the economic benefits with environmental and safety factors. Public interest groups, like the Sierra Club and local citizens' groups, focused their attention on the environmental aspects and safety risks associated with the project.
Table 3. Party/Concern Matrix. Columns (parties): the applicant utility; federal government (FERC); state government (CCC, CPUC, Legislature); local government; and interest groups (Sierra Club, local citizens' group). Rows (concerns): economic (need for gas, profit considerations, price of gas, local economy); environmental (air quality, land use); and risk (population, earthquake). Dots in the matrix mark the concerns emphasized by each party.
The case also illustrates the importance of a small but powerful interested party, the Oxnard citizens' group, in influencing legislative actions. Until the publication in 1976 of a worst-case scenario associated with a proposed $300 million terminal in Oxnard, there was almost unanimous agreement by all stakeholders that this community would be an ideal site for an LNG terminal. At the time even the Sierra Club was in favor of this location (they changed their feelings about Oxnard in 1977). The worst-case scenario indicated that a spill of 125,000 cubic meters of LNG from all five tanks on a tanker would cause a vapor cloud which would affect 50,000 people. Residents could look at a map to determine whether the cloud covered their own house (Ahern, 1980). No estimate of a probability was attached to this scenario. The graphic depiction of these consequences generated a public reaction by a small group organized by concerned citizens of Ventura County. The California legislature was influenced by this public reaction. One legislative staff member stressed that it was not possible to allow a site that would lead to a large number of deaths in a catastrophe.3 Hence new siting regulations were passed stating that no more than an average of 10 people per square mile could be within one mile of the terminal, and no more than 60 within four miles of the terminal.
3 This comment was made to John Lathrop in an interview in Sacramento, California, in July 1980, regarding the siting process of an LNG terminal.
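The density limits in these siting regulations amount to a simple screening rule. The following sketch, with entirely invented site populations, shows how such a criterion can be checked; it is an illustration only and is not drawn from any of the siting documents.

import math

# Illustrative check of the 1977 California siting criterion described above:
# no more than an average of 10 people per square mile within one mile of the
# terminal, and no more than 60 within four miles. Site figures are invented.

def average_density(population: float, radius_miles: float) -> float:
    """Average population density (people per square mile) within a circle."""
    return population / (math.pi * radius_miles ** 2)

def meets_siting_criterion(pop_within_1_mile: float, pop_within_4_miles: float) -> bool:
    return (average_density(pop_within_1_mile, 1.0) <= 10.0 and
            average_density(pop_within_4_miles, 4.0) <= 60.0)

# A hypothetical remote site versus a hypothetical urban-fringe site.
print(meets_siting_criterion(pop_within_1_mile=20, pop_within_4_miles=1500))    # True
print(meets_siting_criterion(pop_within_1_mile=400, pop_within_4_miles=90000))  # False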
The President's National Energy Plan incorporated similar population guidelines, which effectively ruled out any high-density areas as candidates for an LNG terminal. Interestingly enough, the risk assessment used by the citizens' group at Oxnard was only one of three commissioned by different interested parties for this site, each of which produced different estimates and conclusions (Lathrop and Linnerooth, this volume).
The sequential decision process is self-explanatory based on the four rounds depicted in Table 2. This process may facilitate decisions at each stage by limiting the number of parties, but it can have negative long-range consequences. For example, the need for imported natural gas has greatly diminished in California, but the possibility of siting a terminal is still alive. Point Conception has been deemed an acceptable site subject to a seismic risk study. Due to the nature of the siting process, the only way this site would be unacceptable is if the seismic risk were found to be too high. Rather than stating that California may not need LNG, the relevant interested parties have preferred to delay the findings of the seismic risk studies (Lathrop, 1981).
Another example of the long-range negative effects of the sequential constraints is the case of supply interruption risk. Initially, the applicant proposed three separate sites to minimize the risk of California having a shortage of natural gas. When the decision process eliminated two of the three proposed terminals, Western Associates proposed the construction of a large facility at Point Conception capable of producing a throughput of 58,000 m3 of LNG per day, equivalent in energy flow to roughly 15 modern nuclear reactor units (Mandl and Lathrop, 1981). By concentrating the facilities at one port, the supply interruption risk will now likely be increased rather than decreased, if Point Conception is approved and actually utilized.
Finally, turning to the role of exogenous events in California, there is one incident which had an impact on the decision making process. In December 1976 the Los Angeles City Council voted to allow work to begin on an LNG terminal in San Pedro Bay. The following day an explosion ripped the oil tanker Sansinena in Los Angeles harbor, leaving 9 dead and 50 injured. A week later the City Council commissioned a study as to the relative safety of the proposed site. They later approved the terminal. This explosion, although it had nothing to do with liquefied natural gas, alerted many Californians to the potential dangers of LNG. On a more general level, two disasters in other parts of the country illustrate the importance that exogenous events have had on the decision process with respect to LNG siting and regulations.
In 1973 an LNG tank in Staten Island, New York, exploded and the roof collapsed, burying 40 workers. There was no LNG in the tank, but it
had seeped through the insulation and caused a huge fire. A result of this explosion was the increased concern of Staten Island residents with the dangers of LNG. The neighborhood organization, which had been formed a year before the accident, attracted considerable attention and interest because of the media coverage of the tank explosion. In the context of the MAMP model, a new interested party played a key role because of an exogenous event. What may have been a foregone decision regarding the location of an LNG tank in Staten Island became problematical (Davis, 1979). The worst LNG accident occurred in 1944, when a storage tank operated by the East Ohio Gas Company in Cleveland ruptured, spilling LNG onto adjacent streets and sewers. The liquid evaporated, the gas ignited and exploded, resulting in 128 deaths, 300 injuries and approximately $7 million in property damage. An investigation of this accident indicated that the tank failed because it was constructed of 3.5% nickel steel, which becomes brittle when it comes in contact with the extreme cold of LNG. All plants are now built with 9% nickel steel, aluminium or concrete, and the storage tanks are surrounded by dikes capable of containing the contents of the tank if a rupture occurs (Davis, 1979). This example illustrates the impact of a particular incident on new regulations, which otherwise might not have been passed.
IV. Improving the Decision Process: Prescriptive Analysis

The siting process for LNG terminals in California has provided a graphic description of the conflicts which exist between different interested parties, each of which has its own goals and objectives. The party/concern matrix depicts the different attributes used to defend positions, while the MAMP model reveals the dynamics of the decision process and the relevant constraints which determined the outcomes at the end of each of the different rounds.
Lessons from the MAMP Model

A retrospective view of the situation through the eyes of the MAMP model provides the following insights which may have relevance for prescription.

1. There is little articulation of value judgments by the different parties. Each of the groups has a set of objectives and related attributes which they are willing to articulate, but there has been no statement by anyone as to the importance weights assigned to the different attributes in the
problem. This observation coincides with Ward Edwards' experience in attempting to use multi-attribute utility analysis in evaluating alternative school desegregation plans submitted by external groups to the Los Angeles School Board. He has noted that the interested parties in a societal decision problem are unlikely to reveal their value structures because this information would then be public and they would be accountable for numerical judgments (Edwards, 1981). For this reason it will be difficult to utilize this technique as a way of determining preferences between alternatives. Humphreys and McFadden (1980) have had similar problems using their MAUD interactive computer model.

2. Constraints guiding the decision process are not stable but may change over time as new information is injected into the process by one or more interested parties. An interesting example is the present concern that seismic risk is a potential problem for siting a facility at Point Conception, even though this risk had not surfaced in earlier discussions of the feasibility of the site. Another illustration is the ability of the Oxnard citizens' group to influence new legislation on siting criteria by focusing on the number of deaths from a catastrophic accident rather than on the extremely low probability of such a disaster actually occurring. These examples illustrate the point made by Majone (in press) that actual policies are determined through a process where each of the interested parties attempts to modify the rules of the game which constrain them from achieving their goals and objectives. This may further exacerbate the problem of eliciting the value structure of the different interested parties.

3. The siting of sophisticated technologies is a process that is not well understood scientifically, so that there are no measures of risk which can be pinpointed using statistical analysis. Hence each of the interested parties has an opportunity to focus on different measures to support their position. The conflicting risk assessments for evaluating the safety of an LNG terminal in proposed sites have been well documented by Mandl and Lathrop (1981) and Lathrop and Linnerooth (this volume) for the four IIASA case studies. Each of several different interested parties commissioned a special risk study and used the results for their own purposes.

Given these observations, what can be done to improve the situation? One of the most important aspects of the MAMP descriptive model is that it enables the policy analyst to focus on the actual siting process and to evaluate its success on the basis of several different dimensions. The standard analytic tools such as multi-attribute utility analysis or cost/benefit analysis have normally focused on outcomes rather than process. There is
no reason why one cannot focus on how well different procedures score with respect to a well-defined set of objectives. The first step in undertaking this type of analysis would be to specify the relative importance of different attributes one would like a process to satisfy. One of these attributes might be related to how well the final choice performs with respect to resource allocation, but there is also likely to be a set of attributes which reflect the way different interested parties feel about the process as well as the outcome. For example, did each interested party have an opportunity to voice its position? Were a wide enough set of alternatives considered so that the parties felt that a choice was actually being made? These factors may be important in some types of cultural settings but less relevant in others. The policy analyst can also point out that a more elaborate process takes time, another dimension to be considered in the evaluation procedure. By articulating the types of tradeoffs which have to be made in choosing one type of procedure over another, the analyst can provide guidance to policy makers as to what decision process they may want to consider in the future.

Future Research Needs

The Use of GERT

The MAMP model also may be a useful tool for analyzing how alternative procedures are likely to fare for a given problem context. In reality, the decisions made in any round are probabilistic, with the chances of different outcomes determined by the party/concern matrix and the procedures which one employs. One way to modify the MAMP model to incorporate these elements of uncertainty is to employ the concepts of another technique, GERT (Graphical Evaluation and Review Technique), to structure the process. GERT is a combination of network theory, probability theory and simulation, and was developed by Alan Pritsker (1966) to analyze the terminal countdown on an Apollo space system.4 The basic features of GERT in the context of the California siting decision appear in Kunreuther (1981). The use of GERT to structure the key questions and activities depicted in the MAMP model provides a vehicle for prescriptive analysis. It enables the policy analyst to develop alternative scenarios and likely outcomes by changing the nature of the decision process.

4 For an excellent description of the modeling features and capabilities of GERT, including its application in real world problems, see Moore and Clayton (1976).
Future research using this model could address the following types of questions:

What is the likely impact on the decision process if some of the existing constraints are relaxed? For example, suppose that experts were explicitly brought into the picture to attempt to arrive at consensual judgments regarding specific risks and that the interested parties have to abide by their findings. What impact would this have on the likely outcomes?

What would the impact be if certain parties were given power which they currently do not have? For example, suppose that a specific regulatory agency was given full authority to rank and approve a specific site in California. What difference would this make on the scenario and final outcome?

What would happen if there was a change in the way alternative sites were introduced into the picture? For example, suppose the gas companies decided to propose only one site at a time for locating a terminal. How would this affect the interaction between different interested parties and the alternative outcomes? In this type of scenario one would first have to determine the order of the sites to be introduced and the relevant nodes and activities should a particular site be approved or deemed infeasible.
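As a rough illustration of the kind of scenario analysis such a GERT-style model would support, the sketch below simulates a simplified sequential siting process in which each candidate site must survive a series of probabilistic approval rounds. The round structure, the site names, and all probabilities are invented placeholders rather than estimates from the California case, and the sketch does not implement the full GERT formalism; relaxing a constraint or shifting authority between parties would correspond to changing or removing the probabilities attached to particular rounds.

```python
import random

# Hypothetical approval probabilities per round for each candidate site.
# The rounds loosely mirror a sequential siting process (utility screening,
# local approval, state approval, federal approval).  All numbers are invented.
SITES = {
    "Site A": [0.9, 0.4, 0.6, 0.8],
    "Site B": [0.9, 0.7, 0.5, 0.8],
    "Site C": [0.8, 0.6, 0.7, 0.9],
}

def any_site_approved(sites, rng):
    """One simulated pass: a site is approved only if it survives every round,
    and the overall process succeeds if any site is approved."""
    for round_probs in sites.values():
        if all(rng.random() < p for p in round_probs):
            return True
    return False

def estimate_approval_probability(sites, trials=100_000, seed=1):
    rng = random.Random(seed)
    hits = sum(any_site_approved(sites, rng) for _ in range(trials))
    return hits / trials

print(f"P(some terminal approved) = {estimate_approval_probability(SITES):.3f}")
```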
Efficiency and Equity Tradeoffs

The MAMP model can provide insight as to when political considerations are likely to foreclose certain outcomes which may have desirable economic features. For example, a particular scenario may reveal that a community is likely to be opposed to a given site and will fight hard to stop its approval because they feel that the increased risks which they must bear are too high. If this project is socially beneficial, then it may be useful to investigate policy tools for compensating interested parties who feel they will suffer from the project. Insurance may provide a way of protecting potential victims against potential property losses and physical injury. Today there is limited insurance protection against large-scale accidents such as a catastrophic accident at an LNG terminal. A General Accounting Office report (1978) concluded that under present liability arrangements injured parties could not be fully compensated for a serious accident. Some of the research questions which could be appropriately addressed in future problem-focused studies are:
- Which of the interested parties is liable in the event that a specific disaster occurs after a project has been sited?
- What types of enforcement procedures can be invoked to assure that contract provisions are satisfied ex post?
- Are there historical lessons which shed light on the role of insurance as a tool for providing financial protection to potential victims?

With respect to more direct consequences of siting a new facility, O'Hare (1977) has proposed a compensation system for cases where there is opposition to proposed sites from certain interested parties, such as residents of the area, who felt that they would suffer losses in property values and would have to bear safety and environmental risks. The essence of his proposal is that each community determines a minimum level of per capita compensation so that it is willing to make a legal commitment to having the project in their backyard if the compensation is paid. Whether or not some type of compensation scheme is a useful policy prescription depends on the specifics of the situation. In this connection, it would be interesting to ask: What type of payments would have been required to appease the citizens of Oxnard so that an LNG terminal could have been located there? What would the Sierra Club require in payments so that they would support a site which might have adverse environmental effects? These questions can only be answered in a real world problem context. They do reflect an increasing concern of economists and lawyers in dealing with windfalls or wipeouts from specific actions which involve the public sector. Hagman and Misczynski (1978), in their comprehensive study of the subject, believe that windfalls should be partially recaptured to help compensate for wipeouts. They propose a number of alternative mechanisms for ameliorating this problem, ranging from special assessments to development permits. These types of policy instruments could also be investigated in the context of specific siting problems.

After all is said and done, the final outcome is likely to represent some type of balance between the political constraints and economic criteria. As Wildavsky (1981) has pointed out: "The criterion of choice in politics and markets is not being right or correct as in solving a puzzle, but agreement based on interaction among partially opposed interests" (p. 133). The MAMP model will not tell any politician how one should deal with the equity/efficiency dilemma, but at least it uncovers some of the specific causes of these conflicts. How one actually improves the process is a challenge for the future.
References

Ahern, W., 1980. California meets the LNG terminal. Coastal Zone Management Journal, 7, 185-221.
Braybrooke, D., 1974. Traffic Congestion Goes Through the Issue Machine. London: Routledge and Kegan Paul.
Davis, L.N., 1979. Frozen Fire (Friends of the Earth).
Edwards, W., 1981. Reflections on and criticisms of a highly political multiattribute utility analysis. In: L. Cobb and R. Thrall (eds.), Mathematical Frontiers of the Social and Policy Sciences. Boulder, Colorado: Westview Press, 157-189.
Fischhoff, B., P. Slovic, and S. Lichtenstein. Lay foibles and expert fables in judgments about risk. In: T. O'Riordan and R.K. Turner (eds.), Progress in Resource Management and Environmental Planning, Vol. 3. Chichester: Wiley (in press).
General Accounting Office, 1978. Need to improve regulatory review process for liquefied natural gas imports. Report to the Congress, ID-78-17, Washington, D.C., July.
Hagman, D. and D. Misczynski, 1978. Windfalls for Wipeouts. Chicago: American Society of Planning Officials.
Holling, C.S., 1981. Resilience in the unforgiving society. Working Paper R-24, March. Vancouver: Institute of Resource Ecology.
Humphreys, P.C. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-70.
Jackson, J. and H. Kunreuther, 1981. Low probability events and determining acceptable risk: The case of nuclear regulation. Professional Paper PP-81-7, May. Laxenburg, Austria: IIASA.
Keeney, R., 1980. Siting Energy Facilities. New York: Academic Press.
Kunreuther, H., 1981. A multi-attribute multi-party model of choice: Descriptive and prescriptive considerations. IIASA Working Paper, WP-81-123.
Kunreuther, H. and J. Lathrop. Siting hazardous facilities: Lessons from LNG risk analysis (in press).
Kunreuther, H., J. Lathrop, and J. Linnerooth, 1982. A descriptive model of choice for siting facilities. Behavioral Science, 27 (3).
Lathrop, J., 1981. Decisionmaking on LNG terminal siting: California, USA. Draft Report. Laxenburg, Austria: IIASA.
Lathrop, J. and J. Linnerooth. The role of risk assessment in a political decision process. In this volume, 39-68.
Lawless, J., 1977. Technology and Social Shock. New Brunswick, New Jersey: Rutgers University Press.
Levine, M.E. and C.R. Plott, 1977. Agenda influence and its implications. Virginia Law Review, 63 (4).
Linnerooth, J., 1980. A short history of the California LNG terminal. WP-80-155. Laxenburg, Austria: IIASA.
Majone, G. and E. Quade (eds.), 1980. Pitfalls of Analysis. Laxenburg: IIASA; Wiley.
Majone, G. The uses of policy analysis. Draft. Laxenburg, Austria: IIASA (in press).
Mandl, C. and J. Lathrop, 1981. Assessment and comparison of liquefied energy gas terminal risk. IIASA Working Paper, WP-81-98. Laxenburg, Austria: IIASA.
March, J.G., 1978. Bounded rationality, ambiguity and the engineering of choice. Bell Journal of Economics, 9, 587-608.
March, J. and J. Olsen, 1976. Ambiguity and Choice in Organizations. Bergen, Norway: Universitetsforlaget.
Mitchell, R.C., 1979. National environmental lobbies and the apparent illogic of collective action. In: C. Russell (ed.), Applying Public Choice Theory: What are the Prospects. Washington, D.C.: Resources for the Future.
Moore, L. and E. Clayton, 1976. GERT Modeling and Simulation. New York: Petrocelli.
Office of Technology Assessment (OTA), 1977. Transportation of Liquefied Natural Gas. Washington, D.C.: Office of Technology Assessment.
O'Hare, M., 1977. Not on my block you don't: Facility siting and the strategic importance of compensation. Public Policy, 25, 409-458.
Olson, M., 1971. The Logic of Collective Action. Cambridge, Mass.: Harvard.
Pritsker, A., 1966. Graphical Evaluation and Review Technique. Santa Monica, CA: The RAND Corporation.
Tversky, A. and D. Kahneman, 1974. Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
Walker, J., 1977. Setting the agenda in the U.S. Senate: A theory of problem selection. British Journal of Political Science, 7, 423-445.
Wildavsky, A., 1981. Rationality in writing: Linear and curvilinear. Journal of Public Policy, 1, 125-140.
Wilson, J.Q., 1975. Political Organization. New York: Basic Books.
THE ROLE OF DECISION ANALYSIS IN INTERNATIONAL NUCLEAR SAFEGUARDS*

Rex V. BROWN and Jacob W. ULVILA
Decision Science Consortium, Inc.

*The opinions and assertions contained in this paper are the private ones of the authors and do not reflect any policy of the International Atomic Energy Agency or the United States Government. This paper was supported by the U.S. Arms Control and Disarmament Agency under contract Number ACONCl 10.
Abstract

This paper discusses how the tools of personalist decision analysis (PDA) are being used to help the International Atomic Energy Agency to safeguard nuclear material against diversion from peaceful uses. PDA has two main roles: as a framework for reporting Agency effectiveness (notably the probability of its detecting diversion); and as a management aid (for example, in the allocation of limited inspection resources). A political requirement that no member state be discriminated against imposes interesting constraints on permissible analyses.
Institutional Background

The International Atomic Energy Agency (IAEA) is responsible for assuring the international community that nuclear material "under safeguards" is not diverted from peaceful use (IAEA, 1973; IAEA, 1976; IAEA, 1978). The conduct of inspections, and therefore their effectiveness, is constrained by agreements with the states concerned, by the desire of these states to protect their economic and other interests, by the need to respect state sovereignty, and by a very severe rationing of personnel and other Agency resources. One constraint is that the Agency should not allocate its resources in a way that discriminates between states, e.g., an allocation which implies that one state is more likely than another to contemplate diversion. This means that any direction of resources on a non-uniform basis must be based on the objective inspection information the IAEA has at hand. In addition, allocation between facilities might be based on technical characteristics, like the type of facility, the fuel involved, the nature of the fuel cycle pattern in the state concerned, the quality of the state's accounting
system in terms of accuracy and transparency, and the quality of the facility measurement system. Specific characteristics may include: whether all nuclear materials in the state are under safeguards; the history of anomaly observation at that facility or in that state; the quality of the facility measurement system as reflected in the level of material unaccounted for (MUF); and the documentation of the measurement quality control system.

This safeguards function of the Agency supports the more general objective of non-proliferation, shared by countries party to the Non-Proliferation Treaty and other agreements (IAEA, 1968; IAEA, 1972). (There are those who do not belong to the Treaty but nonetheless either support or purport to support non-proliferation objectives.) Together with other measures taken by states, singly or in concert, Agency safeguards help both to deter diversion and to mitigate its effects if it does occur.

The Agency has two administrative objectives requiring analytic help:
- Reporting measures of Agency success to member states, primarily through periodic Safeguards Implementation Reports (SIRs).
- Managing the Agency's limited inspection resources, for example, allocating an inspector's time to activities, facilities, and time periods.

As we know from analogies in financial and management accounting, the measures used for reporting may not be identical to those used for decision making. Reporting measures may be limited to what can be uncontroversially and objectively documented, whereas decision making (management) measures can incorporate whatever the decision maker judges to be relevant. Of course, some reasonable correspondence between the two must exist if the Agency is to serve the member states effectively.
Analytic Strategy

Decision analysis is a prescriptive modeling technique which incorporates quantitative measures of human judgment. For this reason, we will refer to it as "personalist decision analysis" (PDA), to distinguish it from other approaches to analyzing decisions. In principle, PDA has a role to play in supporting both objectives of the Agency noted above. It can support the reporting function by providing a systematic, non-arbitrary way of addressing significant measures of performance, like detection probability. Without personalist decision analysis, one is largely limited to purely informal subjective evaluations or to purely objective evaluations based on judgment-free measurements, like the number of inspection visits. It can support the management function
by enhancing the quality of decisions through more efficient use of available data and expertise, and by making decisions easier to defend within and without the Agency. Decision Science Consortium, Inc. is now in the second year of helping the Agency to adapt PDA approaches for both reporting and management purposes (Ulvila and Brown, 1981).

Reporting Aids

A general purpose aggregate measure of the effectiveness of Agency safeguards has been developed and is summarized in the Appendix. It will shortly be disseminated in an Agency report (IAEA, in preparation). Although designed primarily as a reporting measure, its primary application to date has been in the context of a resource allocation decision presented in the next main section, where its properties are discussed.

The head of the Department of Safeguards is also interested in using the probability of detecting a diversion as a primary evaluation measure, both for decision and reporting purposes (Gruemm, 1980). It is easy to understand and communicate, and is clearly a key feature of the Agency's success. However, it must logically take into account judgments about which types of diversions are most likely, in addition to how well each type of diversion is detected. It is perfectly acceptable to assess the latter explicitly since it is largely a technical matter, which is not to say it is easy to do. However, to assess the likelihood of different types of diversion requires taking some view of how a diverter would behave, and this in turn depends on who the diverter might be. Agency staff believe it is politically unacceptable to address such issues explicitly, for example, by directly imputing behavior to a potential diverter. A classical decision-analytic approach to this problem forces the innate logic out into the open, and thereby open to objection. The overall (marginal) probability of detection given any diversion is computed as the weighted average of the conditional detection probabilities for all diversion "paths", with weights proportional to path probabilities. It is making the latter explicit which is liable to raise sensitive issues.

Two approaches have been explored to get around this problem. One of them, which is the one currently favored by Agency staff, is to assign equal weight, and implicitly equal probability, to all diversion paths. On the face of it, this is objective. In fact, it depends critically on how diversion paths are distinguished. For example, splitting one path into two variants will double the weight for the original path. Implicit relative probabilities could be unobtrusively incorporated by partitioning the more
probable scenarios more finely than the less expected ones. Without this, one may end up with a very unrealistic detection probability (unless the probable and improbable scenarios have much the same conditional probability of detection). A second approach is to weight paths by some measure of "attractiveness", say, the technical complexity of a diversion path, which will approximate relative probabilities. This approach is incorporated into the aggregate measure presented in the Appendix, though that measure attempts to capture more concerns than the probability of detecting a diversion.

Professional integrity aside, any seriously unrealistic overall detection probability is liable to be shown up by the unfolding of events. The Agency would be embarrassed if diversions ultimately come to light and it turns out that the frequency with which they were detected proves at variance with the reported probability. (On the other hand, if the sample is small enough, as we must hope it will be in the case of nonproliferation, it will not be very diagnostic, and therefore not very embarrassing.)

Overall detection probability may not be a sufficient measure of detection performance. It will not reflect the relative importance of detecting different diversion paths. Presumably, the Agency would like to report the highest probability of detection for those potential diversions involving material which may be used directly in fabricating nuclear explosives. Detection probabilities could be reported for each level of diversion seriousness, but there may be some political sensitivity if "seriousness" depends on who the diverter is. Such factors again are handled loosely enough to be politically acceptable in the "aggregate measure", by the use of "objective" surrogates for seriousness, such as types of nuclear material.

Another refinement (not yet developed) would be to assess several detection probabilities for different levels of Agency response. In our current analyses, a single "detection" response has been used: the observation of an anomaly by an inspector. In fact, this weak form of detection may lead to stronger responses if the anomaly remains unresolved: inspection activity at a facility may be increased; the Director General may consult informally with state missions concerned; the unresolved anomaly may ultimately be reported to the U.N. Security Council. (Derived responses, say by individual member states, may be the actual instruments of deterrence and mitigation.) Any attempt to collapse such probabilities into a single evaluation number would require weighting the importance of different levels of detection. This would require getting into mitigation and deterrence effects which the Agency may not care to analyze explicitly.
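Returning to the weighted-average formulation above, the following minimal sketch (with invented path weights and conditional detection probabilities) shows the computation and the sensitivity of the equal-weighting convention to how paths are partitioned: splitting one path into two variants doubles its implicit weight and changes the reported overall probability.

```python
def overall_detection_probability(paths):
    """paths: list of (weight, conditional_detection_probability) pairs.

    Returns the marginal detection probability given a diversion, i.e. the
    weighted average of the conditional detection probabilities."""
    total_weight = sum(weight for weight, _ in paths)
    return sum(weight * p for weight, p in paths) / total_weight

# Hypothetical example: two diversion paths under equal weighting.
paths = [(1, 0.9), (1, 0.2)]
print(overall_detection_probability(paths))        # 0.55

# Splitting the well-covered path into two variants doubles its implicit
# weight under the equal-weighting convention and raises the reported figure.
split_paths = [(1, 0.9), (1, 0.9), (1, 0.2)]
print(overall_detection_probability(split_paths))  # ~0.67
```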
If response probabilities absent diversion are also incorporated to capture the risk of false alarm, the cost of false alarm would need to be balanced against the value of detection.
Decision Aids

A specific decision aid has been proposed for allocating inspection resources within a facility and is discussed in some detail in the next section (Shea et al., 1981; Ulvila, 1980). Any general measure, like the probability of detection, will inevitably represent only part of the considerations relevant to any particular decision. Even if "public" utility to the international community is fully captured, there will inevitably be more parochial "private" Agency considerations to take into account. A "tough" policy for inspectors to follow up on observed anomalies may be best for effective detection; but it may be ruled out on the grounds that it is intrusive, impairs relations with states accepting safeguards and might even lead to broken agreements, a severe embarrassment for the Agency and a reduction in the effectiveness of the nonproliferation effort. Multiattribute decision analysis may be used, but the need for discretion in disclosing private utility may dictate handling that informally.

Moreover, the appropriate way to measure performance (completely or incompletely) depends critically on the decision. Primary analytic and data-gathering effort should be devoted to what this decision may affect. When allocating inspection resources inside a light water reactor (LWR), one will only want to analyze diversion paths relevant in an LWR. When allocating resources across states, one will want to pay careful attention to political implications that are irrelevant to allocations within a state. A common error by beginning decision analysts is to develop a general purpose measuring procedure for a system (like the Agency) and expect it to be usable as it stands for any particular decision.

The most promising decisions to analyze will be those: where the stakes are high (heavy cost for a mistake); where the options are clear cut, but their comparison is perplexing; where human judgment is unavoidable, but others need to be persuaded of the rationality of the decision; and where substantially the same decision is made on a recurring basis. Immediately promising candidates appear to be:
- Developing decision rules for when to initiate a special inspection, and more generally, when to proceed from one Agency response level to the next (IAEA, 1972; IAEA, 1973).
- Allocating inspection resources at a facility (see next section), between facilities, and between states.
Personalist decision analysis and inference techniques appear to have an important potential role to play at IAEA. They may be able to ease the inherent difficulty of determinations the Agency must make, which have unavoidable judgmental components. They can also make those determinations more transparent to the international constituency served by the Agency. On the other hand, this very transparency may invite challenge in politically charged situations.
A Decision Aid for Allocating Inspection Resources

Since the IAEA's personnel resources are insufficient to meet all commitments, allocation of these resources is necessary. The IAEA is investigating the use of decision analysis to aid in the allocation of an inspector's time across activities within a routine inspection of a facility (IAEA, in preparation; Shea et al., 1981; Ulvila, 1980). The approach uses a decision-analytic model to develop a prioritized list of inspection activities. The general approach is illustrated in Figure 1. Prioritization is based on the value of the activity to the IAEA and its cost in terms of inspection time.
Figure 1. Approach to Prioritizing Inspection Procedures (inspection activities listed with their characteristics, values, costs in hours, value-to-cost ratios, and cumulative hours)
Safeguards Objective

Inspection activities have value to the extent that they contribute to the Agency's safeguards objective, which is stated as follows: "The objective of IAEA safeguards is the timely detection of diversion of significant quantities of nuclear material from peaceful nuclear activities to the manufacture of nuclear weapons or of other nuclear explosive devices or for purposes unknown, and deterrence of such diversion by the risk of early detection" (IAEA, 1972). This objective is composed of five significant elements, each of
which is important in the development of safeguards. These elements are: timely detection, detection of diversion, significant quantities, from peaceful purposes, and deterrence of diversion by the risk of detection. Each element is the subject of a continuing international discussion, but some guidelines have been agreed to and they serve to guide the current safeguards activity (IAEA, 1980).

Timeliness is currently addressed by a guideline that the maximum time that may elapse between a diversion and detection by IAEA safeguards should "correspond in order of magnitude to the time required to convert the material into the metallic components of a nuclear explosive device". This conversion time is different for different types of material, ranging from several days for plutonium metal to about a year for low-enriched uranium.

Since it is highly unlikely that an inspector would catch a diverter in the act of diverting nuclear material, a method had to be devised to detect diversion indirectly. The method currently under consideration makes use of three activities. First, a diversion path analysis is performed that identifies different methods of removing nuclear material from a facility and concealing the removal. A diversion path is defined by the material involved (e.g., spent fuel), its location (e.g., the spent fuel pond), and the method of concealing its removal (e.g., by falsifying accounting records). A diversion path is not limited to the physical route of removal. Diversion paths for a facility are determined from an analysis of design information, operating procedures, the state's system of accounting and control, and inspection histories (where available). Next, anomalies that would be generated by a diversion through the path are determined. Anomalies are "unusual observable conditions that might occur in the event of a diversion". For example, a diversion that was concealed by falsifying records at the facility would produce an anomaly of inconsistent accounting records. Then, inspection activities that would detect the anomalies are defined, as sketched below. For example, checking and comparing account balances would detect an inconsistency in accounting records.

A significant quantity is "the approximate quantity of nuclear material in respect of which, taking into account any conversion process involved, the possibility of manufacturing a nuclear explosive device cannot be excluded". Current guidelines provide specific estimates of significant quantities for different types of nuclear material. These range from about 8 kilograms for plutonium to about 20 tons for thorium.

Peaceful uses of nuclear materials occur at a variety of nuclear installations under IAEA safeguards. Categories of facilities subject to safeguards include power reactors, research reactors, conversion plants, fuel fabrication plants, reprocessing plants, enrichment plants, separate storage facilities and other locations (such as transit stores).
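The three-step method just described (defining diversion paths, listing the anomalies each would generate, and identifying inspection activities that would detect those anomalies) amounts to a simple mapping between paths, anomalies, and activities. The sketch below shows that structure only; the particular path, anomaly, and activity names echo the examples in the text but are otherwise illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class DiversionPath:
    material: str      # e.g. "spent fuel"
    location: str      # e.g. "spent fuel pond"
    concealment: str   # e.g. "falsified accounting records"
    anomalies: list = field(default_factory=list)

# One illustrative path and the anomaly it would generate.
path = DiversionPath(
    material="spent fuel",
    location="spent fuel pond",
    concealment="falsified accounting records",
    anomalies=["inconsistent accounting records"],
)

# Which inspection activity detects which anomalies (illustrative only).
activity_covers = {
    "check and compare account balances": {"inconsistent accounting records"},
    "item count of fuel assemblies": {"missing fuel assembly"},
}

covering = [activity for activity, detected in activity_covers.items()
            if detected & set(path.anomalies)]
print(covering)  # ['check and compare account balances']
```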
Deterrence by the risk of detection assumes two things: that the negative consequences of detection will act to deter a potential diverter and that a probability of detection will also provide a deterrent effect. This is the deterrence provided by IAEA’s international safeguards. It contrasts with deterrence provided by physical protection of facilities, which is part of a state’s national safeguards.
Aggregate Measure of Safeguards Effectiveness

In order to make managerial decisions affecting safeguards, such as evaluation and resource allocation, it is desirable to have an operational measure of the safeguards objective. For purposes of helping to allocate an inspector's time among possible inspection activities during the inspection of a facility, the IAEA is considering the use of the aggregate measure presented in the Appendix. This measure has been proposed by the IAEA staff and a group of international consultants invited by the IAEA to assist in formalizing the safeguards assessment methodology. The measure incorporates six explicit considerations related to the safeguards objective (IAEA, in preparation): (a) the probability of detecting a diversion; (b) the amount of nuclear material diverted; (c) the type of nuclear material; (d) the technical complexity involved in removing the material and concealing its removal; (e) specific vulnerabilities of the system; and (f) the timeliness of detection.

The probability of detecting a diversion is related to the objective of deterrence through the risk of detection. The specific probability measure used is the probability that an inspection activity would detect the presence of an anomaly (which is the easiest probability to assess). Estimated probabilities are high for simple activities (e.g., that an item count will detect that a fuel assembly is missing from the spent fuel pond) and lower for more difficult activities (e.g., that a non-destructive assay will detect the substitution of material in a spent fuel assembly). Value increases as probability increases, but not linearly. It was assessed that there is considerable deterrence value in achieving even a low probability of detection. This is reflected in the measure by raising the probability to the power of 0.3. This assessment implies, for example, that half of the value of covering a diversion path is achieved with a coverage probability of 0.1. This further represents an assessment that it is more important to cover a large number of paths with low probabilities than a small number of paths with high probabilities.
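The effect of the 0.3 exponent can be checked directly; the sketch below isolates only the probability-to-value transformation described above and ignores the class weights and timeliness factors treated later.

```python
def coverage_value(p, exponent=0.3):
    """Value ascribed to covering a diversion path with detection probability p."""
    return p ** exponent

for p in (0.0, 0.1, 0.5, 1.0):
    print(f"p = {p:.1f}  value = {coverage_value(p):.2f}")
# p = 0.0  value = 0.00
# p = 0.1  value = 0.50   (about half the maximum value, as stated in the text)
# p = 0.5  value = 0.81
# p = 1.0  value = 1.00
```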
The amount of material diverted is important in relation to the objective of detecting diversion of a "significant quantity of material". This is reflected in the method by defining diversion paths with one significant quantity of material and assessing an activity's probability of detecting an anomaly associated with the diversion of a single significant quantity. If the same activities were performed, probabilities of detection would generally be higher for larger quantities of material and lower for smaller quantities.

The type of nuclear material associated with the diversion path and the technical complexity of a diversion and concealment are important because they bear on the probability that a diverter would choose one path over another. To an extent, use of material type and technical complexity is a way to estimate the relative probability of paths without considering characteristics of a diverter explicitly. The aggregate measure, reflecting assessments of the international consultants, ascribes more value to the coverage of paths that involve easy-to-use material and simple concealments than to those that involve difficult-to-use material and complex concealments. This is consistent with the view that a potential diverter is less attracted to a path containing a hard-to-use material like low-enriched uranium than to a path containing an easy-to-use material like plutonium metal. Likewise, a diverter is less attracted to difficult concealment strategies (e.g., those requiring spent fuel dummies with correct gamma signatures to deceive a non-destructive assay) than to easy ones (e.g., those that involve concealment by falsification of accounting records).

Specific vulnerabilities of the safeguards system are considered in relation to the objective of deterrence through risk of detection. A diverter is likely to be attracted to a diversion path that is known not to be covered, which is represented by a path with zero probability of detection. The aggregate measure encourages coverage of all paths with some probability by raising value more quickly for increasing the probability of detection of paths that are not covered than for paths that have some probability of detection (by raising the probability to the power of 0.3). More importantly, the procedure suggested for implementing the method includes a recommendation for randomizing the choice of some inspection activities to provide some probability of coverage for every path during each inspection (see below).

The timeliness of detection is related to the goal of "the timely detection of diversion". Timeliness is reflected in the aggregate measure through a factor that reduces value if inspection activities are not performed often enough to detect a diversion within one conversion time.
Prioritization Procedure
The prioritization procedure begins with a calculation of the value of the aggregate measure for each inspection activity. Next, an estimate is made of the time required to perform each activity. With this information, an allocation of time can be determined by selecting activities that provide the most value per hour of inspection time. That is, activities are prioritized on the basis of their value-to-cost ratios. Figure 2 displays the cumulative value of inspection activities versus inspection time for a prioritized list. This example is for a pressurized water reactor. The figure shows, for instance, that an inspector can provide coverage that is about 80% effective by performing the first 7 of the possible 17 activities. These seven activities require only about 30% of the total time.
Figure 2. Value of Inspection Activities vs. Hours of Inspection Time
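A minimal sketch of the prioritization step follows, assuming each activity has already been given a value score (from the aggregate measure) and a time estimate; the activity names and numbers are invented for illustration.

```python
def prioritize(activities):
    """activities: dict of name -> (value, hours).

    Returns the activities sorted by value per hour of inspection time,
    together with cumulative value and cumulative hours."""
    ranked = sorted(activities.items(),
                    key=lambda item: item[1][0] / item[1][1],
                    reverse=True)
    schedule, cum_value, cum_hours = [], 0.0, 0.0
    for name, (value, hours) in ranked:
        cum_value += value
        cum_hours += hours
        schedule.append((name, cum_value, cum_hours))
    return schedule

# Hypothetical inspection activities: (value score, hours required).
activities = {
    "item count": (30, 2),
    "records audit": (25, 3),
    "seal verification": (20, 4),
    "non-destructive assay": (15, 8),
}

for name, cum_value, cum_hours in prioritize(activities):
    print(f"{name:22s} cumulative value {cum_value:5.1f}  hours {cum_hours:4.1f}")
```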
With a limited amount of inspection time, an efficient allocation would choose activities in order of priority that could be completed in the available time. However, such a strategy is certain to leave some diversion possibilities uncovered if there is insufficient time to cover all paths. Furthermore, since the method is likely to be public knowledge, a potential diverter could find out which paths would be left uncovered. The diverter might then focus attention on diversion through those paths. This problem might be addressed analytically as a two-person game between the IAEA and a diverter. This would involve constructing utility functions for the IAEA and the diverter over outcomes of different diversion strategies and solving the game using a concept like equilibrium.
However, the analysis is not permitted to model a potential diverter explicitly. (In addition, it is not clear that game theoretic concepts of solutions, which provide advice to both sides jointly, would be of much use for providing advice to IAEA individually.) A modified procedure that uses efficiency information but provides some coverage of all paths is under consideration. In this procedure, some activities would be designated to be performed during every inspection (e.g., the first seven activities in Figure 2). These activities would provide coverage of a large fraction of the total diversion paths, especially those that are of low technical complexity and contain easy-to-use nuclear material. The remaining inspection time would be devoted to activities selected on a random basis from those remaining. These activities would typically cover paths that are more complex or involve hard-to-use nuclear material. This procedure provides maximum coverage of the most desirable paths and some risk of detection for all paths.
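The modified procedure could be sketched as follows, again with invented activity names and numbers: a core set of high-priority activities is always performed, and the remaining time is filled by drawing at random from the other activities so that every path retains some chance of coverage.

```python
import random

def plan_inspection(ranked_activities, core_count, time_budget, seed=None):
    """ranked_activities: list of (name, hours) already in priority order."""
    rng = random.Random(seed)
    core = ranked_activities[:core_count]          # always performed
    plan = list(core)
    remaining_time = time_budget - sum(hours for _, hours in core)

    # Fill the rest of the time budget with randomly chosen activities.
    pool = list(ranked_activities[core_count:])
    rng.shuffle(pool)
    for name, hours in pool:
        if hours <= remaining_time:
            plan.append((name, hours))
            remaining_time -= hours
    return plan

ranked = [("item count", 2), ("records audit", 3), ("seal verification", 4),
          ("surveillance review", 3), ("non-destructive assay", 8)]
print(plan_inspection(ranked, core_count=2, time_budget=10, seed=7))
```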
Appendix: Aggregate Measure Definition

The following aggregate measure has been proposed for consideration in evaluating the effectiveness of an inspection regime at providing coverage of a facility (IAEA, in preparation). Technical terms included in this definition are further defined in IAEA (1980).
where
VF,R is the aggregate measure for facility F and inspection regime R;
i is the index for nuclear material type and technical complexity level class;
F is the facility;
Wi is the importance weight for class i (see below);
ni is the number of diversion paths in class i;
j is the index for diversion paths;
t' is an index for time (expressed in physical inventory verifications, PIVs);
T' is the time period covering the four most recent PIVs;
f(t') is the weighting function for time (see below);
1/2.4 is a normalization factor over T';
t is an index for time (expressed in conversion times, CTs);
T is the material balance period (excluding the earlier PIV);
m is the fractional value for multiple coverage of a path during T;
nTi is the number of CTs for class i in time period T;
pijt is the probability that the inspection activities that provide coverage for time t would detect an anomaly associated with a diversion along path j in class i if such an anomaly exists; and
T''j is the set of coverage times in T for path j, excluding the one containing the maximum of pijt over t.
Importance Weight
Values, in the form of importance weights, were assigned by the group of international consultants to the relative importance of providing coverage of different types and forms of materials for three levels of technical complexity. These values, on a 0 to 100 scale, are as follows (Shea et al., 1981):

                                                  Technical Complexity Level
Material Type                                       A      B      C (most complex)
Pu; HEU; U-233 metal                               100     58     24
Pu; HEU; U-233 separated from fission products      82     47     15
Pu; HEU; U-233 in spent fuel                        45     20      6
LEU; NU; DU; Th                                     25     12      4
Weighting Function for Time
The weighting function for time, f(t'), is the product of an importance weight and a coverage factor for the period. A proposed importance weight function is:

Time Period (in PIVs)    1      2      3      4 (most recent period)
Importance               0.2    0.4    0.8    1.0

A proposed coverage function is:

Time Period (in PIVs)    1      2      3 (most recent coverage)
Coverage                 0.25   0.5    1.00
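On one reading of the two tables above, the time weight f(t') for a PIV period is simply the product of its importance weight and the coverage factor reflecting how recently that period was covered. The sketch below encodes that product; the indexing convention is an assumption, since the published procedure defines the factors only in tabular form.

```python
# Importance weight by PIV period (4 = most recent period).
IMPORTANCE = {1: 0.2, 2: 0.4, 3: 0.8, 4: 1.0}
# Coverage factor by recency of coverage (3 = most recent coverage).
COVERAGE = {1: 0.25, 2: 0.5, 3: 1.00}

def time_weight(period, coverage_recency):
    """f(t') as the product of an importance weight and a coverage factor."""
    return IMPORTANCE[period] * COVERAGE[coverage_recency]

print(time_weight(4, 3))  # most recent period, most recently covered -> 1.0
print(time_weight(1, 1))  # oldest period, oldest coverage -> 0.05
```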
References

Gruemm, H., 1980. Designing IAEA safeguards approaches. Nuclear Materials Management, INMM Proceedings, 14-22.
International Atomic Energy Agency, 1968. The Agency's safeguards system (1965, as provisionally extended in 1966 and 1968). INFCIRC/66/Rev. 2. Vienna: Author.
International Atomic Energy Agency, 1972. Structure and content of agreements between the Agency and states required in connection with the treaty on the non-proliferation of nuclear weapons. INFCIRC/153. Vienna: Author.
International Atomic Energy Agency, 1973. Statute. Vienna: Author.
International Atomic Energy Agency, 1976. Short history of non-proliferation. Vienna: Author.
International Atomic Energy Agency, 1978. Non-proliferation and international safeguards. Vienna: Author.
International Atomic Energy Agency, 1980. IAEA safeguards glossary. IAEA/SG/INF/1. Vienna: Author.
International Atomic Energy Agency, in preparation. Safeguards assessments: Uses, concept, procedures. IAEA/SG/INF/x. Vienna: Author.
Shea, T.E., E.W. Brach, and J.W. Ulvila, 1981. Allocation of inspection resources. Proceedings of the 3rd Annual Symposium on Safeguards and Nuclear Material Management.
Ulvila, J.W., 1980. A Method for Determining the Best Use of Limited Time Available for the Inspection of a Nuclear Facility. Falls Church, Va.: Decision Science Consortium, Inc.
Ulvila, J.W. and Brown, R.V., 1981. Development of Decision Analysis Aids for Non-Proliferation Safeguards. Falls Church, Va.: Decision Science Consortium, Inc.
A DECISION ANALYTIC SYSTEM FOR REGULATING QUALITY OF CARE IN NURSING HOMES: SYSTEM DESIGN AND EVALUATION

David H. GUSTAFSON
University of Wisconsin Center for Health Systems Research and Analysis
Wisconsin Health Care Review, Inc.
Robert PETERSON, Edward KOPETSKY, Rich Van KONINGSVELD, Ann MACCO, Sandra CASPER Wisconsin Health Care Review, Inc.
and Joseph ROSSMEISSL University of Wisconsin Center for Health Systems Research and Analysis
Abstract

In an era of deregulation and diminishing resources, it is essential that effective methods of program evaluation be identified and implemented. This paper describes a methodology for program evaluation which was applied to monitoring the quality of care in nursing homes. While one component of this process involves a new sampling scheme for review of resident populations, this paper deals primarily with the development and testing of a multiattribute utility model of nursing home quality. Results indicate that the method presented here has several advantages, including less time spent applying the method and a higher number of more severe problems detected.
Introduction

Decision analysis has begun to demonstrate its potential as an evaluation methodology. Edwards et al. (1975) presented a decision theoretic approach to program evaluation which has since been used to evaluate the Community Anti-Crime Program of the U.S. Law Enforcement Assistance Agency (Snapper and Seaver, 1980). Von Winterfeldt (1978) has used decision analysis to evaluate the environmental effects of the North Sea oil drilling programs. Gustafson et al. (1981b) have constructed utility models to measure the severity of illness. That severity measure is being used to evaluate the effectiveness of emergency medical services in the U.S. (see Gustafson et al., 1981d).
The study reported here differs from previous research in that decision analysis itself is the intervention to be evaluated. This research uses decision analysis as a fundamental part of a new regulatory system and evaluates its effectiveness in monitoring the quality of care in nursing homes. This new regulatory process is now being field-tested in Wisconsin and is being used, with variations, in New York and Massachusetts.

A study by the State of Wisconsin's Medicaid Management Study Team (MMST) (see Gustafson et al., 1981a) found significant problems with the existing approach to monitoring quality of care in nursing homes. First, the regulatory process was very expensive. Each of the State's 500 nursing homes was required to undergo an annual survey monitoring compliance with over 1500 regulations. Each survey took a four-person team several days to complete. Second, since each nursing home was surveyed every year, substantial resources were invested in assessments of both good and deficient homes. Few regulatory resources remained for actual promotion of changes in quality of care. The MMST study, in fact, showed that enforcement action was taken against only 20% of nursing homes found in violation in 1977. Finally, the existing quality assurance process required an assessment of each nursing home patient's level of care to determine, in part, whether or not the needs of the residents were being met. This review required an average of 20 days of professional staff time to complete. Moreover, its focus on medical records meant that nursing home staff were rarely interviewed and residents seldom examined. This exacting review process seemed to have little impact on nursing home care.

So, while the process of regulating nursing home quality seemed ponderous and ineffective, there was a general consensus among professionals and lay persons alike that it was easy to tell a good home from a bad home. The first smell on entering a facility, the presence and treatment of bedsores, the daily variety of the food service menu, the attitudes of personnel toward residents, etc. could clearly identify the good home or the bad home.

Aware that an expensive, time-consuming, harassing regulatory process was not working and that a common wisdom regarding good nursing homes versus bad apparently existed, the Medicaid Management Study Team proposed a new approach to monitoring quality of care in nursing homes. That approach was based on the principles of decision analysis and statistical quality control. Briefly, the statistical quality control contribution was the design of a sampling scheme in which a sample of residents was reviewed rather than an entire population. The sampling scheme required that 10% of the
nursing home's population (or a minimum of 10 residents) be included in the survey of the home. Upon arrival at the facility, the survey team would assign a number to each patient in the population and select for review those patients whose number appeared on a list of random numbers generated earlier by computer. Under this new design, each resident in the sample was examined, staff were interviewed, and medical records were examined, whereas only record reviews had been conducted previously. Criteria for satisfactory performance were established. If the home failed to satisfy these criteria in the sampling study, then all its residents were reviewed. The idea was to focus the review on patient care and to leave the good nursing homes alone as much as possible.

This paper will focus on the decision analytic aspects of this new quality assurance process. The same basic idea of a contingent regulatory approach is embodied in the decision analytic component of the quality review. Whereas the statistical quality control component focuses on patient (or resident) review, the decision analytic component focuses on facility review. The decision analytic component is a multi-attribute utility model of nursing home quality. That model and the evaluation of its use in regulating nursing home quality are the focus of this report.
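The sampling rule described above (review 10% of the resident population, with a floor of 10 residents, selected by random numbers) can be stated directly in code. The sketch below illustrates that rule only; the rounding behavior and the treatment of homes with fewer than 10 residents are assumptions, and the pass/fail criteria are not modeled.

```python
import math
import random

def select_residents(population_size, fraction=0.10, minimum=10, seed=None):
    """Return the resident numbers chosen for review under the sampling scheme."""
    sample_size = max(math.ceil(fraction * population_size), minimum)
    sample_size = min(sample_size, population_size)  # very small homes: review everyone
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

print(select_residents(84, seed=3))        # 10% of 84 is fewer than 10, so the floor applies
print(len(select_residents(200, seed=3)))  # 20 residents in a 200-bed home
```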
A MAU Model of Nursing Home Quality

The process of developing a multi-attribute utility (MAU) model of nursing home quality began with the assembly of two panels of respected theoreticians and practitioners in the field of long-term care. The Integrative Group Process (see Gustafson et al., 1981c) was used to develop that model. The Integrative Group Process (IGP) is based on the assumption that the type of group process employed can affect the quality of the judgments elicited from the group (see McNamara, 1978). In the IGP approach, groups carry out two basic activities. First, in "qualitative modeling", groups identify the factors and measures to be used in evaluating quality of care. Second, in "quantitative estimation", groups estimate the strength of the relationships that those factors have with quality. The rather sparse literature (see Nemiroff and King, 1975; Hall and Watson, 1970; Gustafson et al., 1980; and Nemiroff et al., 1976) suggests that effective qualitative modeling and quantitative estimation group processes should include certain basic characteristics. These guidelines formed the basis of the IGP:

(1) Group members should individually formulate their own models prior to group discussion.
(2) Group discussion should be preceded by informal interaction allowing each group member to establish "legitimacy" in the group.
(3) The group should have an initial "straw man" model on which to focus attention in early stages of group discussion.
(4) Time should be available to interpret the ideas of each participant into a formal conceptual model.
(5) The modeling procedure should be divided into discrete phases to provide a task structure for the group.
(6) A group facilitator should be used to channel discussion in ways that keep the group task-oriented.
Index Development Strategy

Participating theoretical and applied long-term care experts were convened in two panels. Panels were comprised of gerontologists, therapists, nurses, social workers, and nursing home administrators from universities, state government, and the long-term care industry. Potential physician-panelists were nominated by recognized opinion leaders in the long-term care field. Those receiving several nominations were invited to participate with a panel.

Each panelist agreeing to serve was interviewed at length by telephone by the panel facilitator. First, the nature of the project and the use to which the Quality of Care Index would be put were reviewed. The panelist was then asked to identify the important indicators of nursing home quality (indicators for which data would be likely to be recorded) and to provide an example of a poor quality home and a good quality home for each criterion. The panelist was asked to list as few indicators as possible in order to focus attention on a small set of critical variables. (The more that marginally relevant variables are included, the higher the likelihood of their spurious effect on the final model.) Responses to the telephone interview were examined to identify indicators and concepts frequently mentioned by all panelists. From these a "tentative" quality model was prepared.

The first meeting of the group was for the purpose of qualitative modeling. In this meeting, panelists reviewed the tentative model, modified that model, and reached a consensus on a final set of variables from which the quality of care index would be constructed. They also specified operational definitions for use in measuring these variables during observations of the nursing home. The panel reconvened for quantitative estimation the next morning. At that time, the panel reviewed the model they had developed, provided
utility estimates, and made weight assessments for the MAU model. By the end of the session, sufficient quantitative data had been collected to develop additive MAU models for the group as a whole and for each individual in the panel. The IGP produced a list of factors, weights on those factors, and operational definitions for use in measuring the factors. A second panel was then convened to review and refine the model.

The model was placed into a format that could be used by surveyors to screen nursing homes (see Figure 1). The idea was that the MAU model could be used to guide a brief review of a nursing home. If the home passed the review, then its survey was complete. If only a few minor problem areas were found, then the surveyors would seek to correct those problems. If the screening process found many deficiencies or a few substantial problems, a complete survey could be implemented using the full array of 1,500 regulations mentioned earlier, legal investigators, and special review teams to collect data that would be useful in court proceedings. In the original regulatory method, surveyors were limited to citing violations (an action that leads to fines). In the new method, surveyors were to select the one intervention that would most likely lead to a change in the way care was delivered. The intervention could be a violation citation, but it could also be consultation, training, political pressure, etc. It should be noted that a score for each nursing home was not calculated as a result of this process. The MAU model as described above served only as the basis for this new quality assurance process.
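Although no numerical score was computed in the field procedure, the additive MAU form the panels produced combines single-factor utility scores with importance weights in the usual way. The sketch below shows only that generic form; the factor names are taken from the section headings of the screening instrument in Figure 1, while the weights and scores are invented.

```python
def additive_mau(weights, utilities):
    """Weighted additive aggregation: sum over factors of weight * utility.

    weights:   dict of factor -> importance weight (normalized here to sum to 1)
    utilities: dict of factor -> single-factor utility score in [0, 1]
    """
    total = sum(weights.values())
    return sum((w / total) * utilities[factor] for factor, w in weights.items())

# Hypothetical weights and scores for one facility.
weights = {"resident condition": 0.5, "care management": 0.3, "resident importance": 0.2}
scores = {"resident condition": 0.8, "care management": 0.6, "resident importance": 0.9}
print(round(additive_mau(weights, scores), 2))  # 0.76
```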
Quality Assurance Project: FACILITY ASSESSMENT
FACILITY ________   EVALUATOR ________   DATE ________
(Each item provides space for comments.)

RESIDENT CONDITION
- Personal appearance (groomed / not)
- Odor (no problem / problem)
- Clothing (appropriate / inappropriate; clean / not)
- Mood (happy / glum) (open / afraid to talk)
- Awareness (alert / drugged / disoriented, no activity)
- Physical condition (decubiti, catheters, geri-chairs / good)
- Behavior (appropriate / not)
- Appropriate safety devices (used / not)

CARE MANAGEMENT
- Plan of care (goals appropriate / not) (total needs / medical needs only)
- Resident involved in own care (yes / negligible)
- Records system (meaningful / routine) (self-helpful / only to meet regs)
- Restorative care (geared to needs / not) (utilized / not)
- Staff role in planning and evaluation (periodic reevaluation / not)
- Evidence of communication among interdisciplinary staffs (yes / no)

RESIDENT IMPORTANCE
- Between-resident communication (promote interactions, voluntary gatherings / inhibit, prevent interactions)
- Variety of activities (many, in- and outdoor / TV, bingo only)
- Residents' lifestyles and conditions matched to activities (community, facility, bedside / no effort to individualize)
- Facility activity space (adequate / inadequate)
- Religious services (ties to home churches, bedside / no services)
- Ties to community (yes / no)
- Volunteers (active / little role) (trained / untrained)
- Ties to family (aggressive attempt to involve family / no)
- Resident council (resident-run / staff-run / no council)
- Staff knows residents and provides some continuity of lifestyle (yes / no)
- Residents' rights respected (yes / no)
- Resident room assignments (appropriate / not)
- Handling of problem residents (discharge quickly / keep)

Figure 1. MAU Model as Adapted for Facility Review
A Pilot Test

The MAU model was pilot tested prior to implementation. Five teams of two qualified people each were gathered to pilot-test the model. Nine nursing homes (six proprietary homes, two church-affiliated homes, and one municipal home) agreed to open their doors to these teams of judges. The pilot-test judges were paired so that each of the five teams consisted of one 'activities person' and one nurse. One person on each team also had experience in nursing home administration.

The teams first met with the administrator of the nursing home, who answered questions raised by the team members. The teams next toured the facility. During this time the screening instrument served as a guide for investigation. After the tour, each team member individually completed his or her screening questionnaire and shared it with the other team member. When consensus did not exist and questions on the instrument could not be answered, the team members collected the necessary data by returning to the appropriate places in the nursing home. Discussion was encouraged between members of the same team; however, discussion was prohibited among teams to ensure that later comparisons of survey results would be valid. Part of the final survey day was spent suggesting revisions in the evaluation procedure.

In addition, each team provided a general assessment of the relative quality of care provided by each of the homes. This general assessment was performed in addition to the assessment done with the screening instrument. Later, all the judges from all the teams were given a list of the nine homes visited in the pilot test. Each judge individually ranked the homes in the order of the general quality of care delivered and assigned 100 to the "best" home and scores between 0 and 100 to the remaining homes. These ratings were assigned so as to reflect the relative quality of those other homes. For instance, a home receiving a score of 99 could be assumed to be about twice as "good" as a home receiving a score of 50 in this rating process. The judges were not permitted to discuss their scores with each other at any time. The ratings from both judges in each team were then averaged to give a team score to each home. This average general assessment was used later to test the strength of agreement among the evaluating teams.

Three different measures of effectiveness were used in the pilot test of the model: (1) General team assessments of nursing home quality were correlated with the values assigned by the MAU model. Strong agreement
would indicate that the screening model replicated the judgments of a team of respected surveyors. (2) Screening instrument ratings were correlated across teams. High correlations would suggest that the model was being reliably applied by several different groups. (3) Screening model ratings were correlated with the number of deficiencies cited in the home during the most recent standard or "old method" survey. Since the screening process had been developed as an alternative to what was considered to be an ineffective system, this correlation was expected to be poor.

The results of the pilot test, as summarized below, were encouraging: (1) It appeared that the average general assessments could be correlated with the screening instrument ratings and that the assessments could be used as one standard of comparison for the screening approach. (2) The screening process was reliable. When a composite model rating (obtained by averaging the scores given by all teams to each indicator in the screening model) was compared to the scores given by each team, the correlation averaged 0.78. (3) The correlation between the composite screening rating and the number of deficiencies in the most recent State survey was 0.53 (significant at the 0.10 level). When the number of deficiencies was compared to the average general assessment, the correlation was 0.11.
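The three comparisons reduce to ordinary correlations between sets of home-level scores. Below is a minimal sketch of the computation, assuming Pearson correlations and using invented numbers for four homes and two teams (the study itself used nine homes and five teams):

    # Sketch of the pilot-test comparisons; all numbers are invented.
    from statistics import correlation, mean    # correlation() requires Python 3.10+

    team_ratings = {                             # screening-instrument ratings per home
        "team_1": [72, 55, 90, 40],
        "team_2": [68, 60, 85, 45],
    }
    general_assessment = [70, 50, 100, 35]       # averaged 0-100 general quality judgments
    deficiencies_last_survey = [12, 20, 3, 25]   # deficiencies cited under the old method

    # Composite model rating: average of the teams' scores for each home.
    composite = [mean(scores) for scores in zip(*team_ratings.values())]

    print(correlation(composite, team_ratings["team_1"]))    # reliability across teams
    print(correlation(composite, general_assessment))        # agreement with the judges
    print(correlation(composite, deficiencies_last_survey))  # vs. old-method deficiencies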
A Field Test: Study Design
Based on the results of this pilot study and through the efforts of officials in Wisconsin's state government, the United States Government granted permission for Wisconsin to field-test the new Quality Assurance Process (QAP) in 320 nursing homes. The homes were randomly assigned to control groups and experimental groups according to a three-part design.

1. Rural. One group of 120 nursing homes in a rural area of Wisconsin was assigned to a four-cell design. Thirty homes were to be reviewed using the full QAP process (both MAU model and statistical quality control system). Thirty homes were reviewed by the original regulatory process. Thirty homes used the MAU model and the original resident review. Thirty homes used the original facility review and the new statistical quality control system (sampling). A picture of this design follows:
                               Facility Review
                             MAU          Original
    Patient     Sampling      30             30
    Review      Original      30             30
The same State survey teams used all four methods. So, in one home, they would use MAU and sampling. In another home, the same team would use the original methods of facility and patient review.

2. Urban. A group of 40 nursing homes in a major urban area was selected for the study. These homes were randomly assigned to two cells. Surveyors used the original patient and facility review process for homes in one cell. In the other cell, the new patient sampling/facility review MAU model was used (see Figure 1). One team was assigned to do "new method" homes only. One team did only "original method" homes.

3. Suburban. An additional 160 nursing homes in suburban areas of Wisconsin were assigned to two cells: the original patient and facility review method and the new sampling/MAU approach. Eighty of the homes were randomly assigned. The other 80 homes were geographically located in two separate areas of the state. These two areas and the homes in them were very similar in terms of socioeconomic and demographic character. Survey teams were assigned to apply one of the methods (either original or new MAU/sampling).
Hypotheses

Five hypotheses guided the Quality Assurance Process (QAP) field-test. The QAP was to:
(1) Lead to a reallocation of time such that more time would be spent in poor quality homes and less time spent in good homes.
(2) Identify and take action on more problems than the old method.
(3) Correct problems and prevent recidivism more effectively than the old method.
(4) Lead to an improvement in the quality of care in the nursing home to a greater extent than would the original method.
(5) Lead to actions consistent with effective change agents more frequently than the old method.
Measures of Effectiveness

1. Time Allocation. A time study of the surveyors applying the QAP method and those applying the original method was conducted. Each day for a four-week period, surveyors completed a self-report of their activities for the day. All surveyors in the state completed this self-report, on which they indicated the types of activities as well as the name of the nursing home in which the activities were applied. The activities were subdivided into three categories: annual visit, interim visit, and office activities. Annual visit referred to the time when the MAU/sampling method or the original method was applied to assess quality of care. The interim visit was used for consultation, training, surprise surveys, etc. Office-related work should be self-explanatory.

The research staff had collected quality of care ratings for each nursing home in the state by polling state surveyors, obtaining the advice of nursing home association directors, and eliciting the opinions of knowledgeable people in state government. This iterative process led to a classification of homes into good, medium, and bad groups. Finally, data were collected on-site describing how much time surveyors spent in the annual survey visit to the nursing home. These data were used to convert percent-time information into personnel hours. We investigated the amount of time spent in a home, the change in that time as a function of the home's quality, and the allocation of time to various activities.

2. Problem Detection. A random sample of 60 nursing homes was chosen to examine the effectiveness of the original method versus the MAU/sampling approach. The sample included 30 homes using each method, split evenly among urban, rural, and suburban areas of the state. Records were examined to determine the number of problems identified in each home. Those problems could result in a citation being issued or in another action, such as training, taking place. If a citation were given, it would be classified as A (most serious), B (serious), C (not a threat to patient well-being), or F (violation of federal but not state statutes). Each problem (even if no citation were given) was classified by research staff according to what federal or state code was violated.

Five panels of qualified people were convened to rate the relative seriousness of each problem identified. Inter-rater reliability studies of this rating process were conducted by including between 10 and 35 duplicate problems in the packets rated by each panelist. The correlations among the panelists indicating inter-rater reliability were high, ranging from 0.78 to 0.98. Two measures of effectiveness were developed for this study: (1) The number of problems detected by the original method versus the MAU/sampling process.
(2) The severity of problems detected by the original method versus the MAU/sampling process.

3. Recidivism. Obviously, the objective of a quality assurance program is not only to identify a problem but also to correct that problem. The recidivism measure of effectiveness is intended to examine the extent to which problems identified by the two quality assurance methods continue to exist when the nursing home survey is repeated one year later. Data on recidivism will eventually be presented in two ways: (1) Recidivism rate: the relative frequency with which a problem reoccurs. (2) Severity/recidivism rate: each problem that reoccurs will be weighted by the severity of that problem. Reoccurrence of a problem was determined by having a team of nurse-specialists review records of two years of visits to the sample of 60 nursing homes mentioned in the previous section. Problems (or types of problems) that occurred in both years were identified. Severity ratings of problems from the first year's visit were obtained as described in the previous section. Those severity scores were used to weight the reoccurring problems. Data on recidivism rate will be reported here; data on the severity-weighted recidivism rate are not yet available.

4. Quality of Care. The most important change to observe is a change in the overall quality of care delivered by a nursing home. To this end, an instrument for measuring quality of care has been developed that will be applied to a sample of 130 homes in the study. The reason a separate index was developed for use in this effectiveness measure was that an independent assessment of changes in quality of care was needed. A standard of comparison was required against which the new Quality Assurance Process could be evaluated. The Quality Instrument was designed to be applied in separate visits to the nursing home both before and after the new QAP was applied to the home. The Quality Instrument is built on the same conceptual model used to develop the MAU/sampling method. However, the instrument is quite different in its methods of measurement. First, it is intended to yield an overall measure of quality but is not intended to identify the source of specific quality problems. Second, the instrument is designed to be applied in one day rather than the two or three days required by the MAU/sampling method. Third, the Quality Instrument embodies a set of rigid guidelines for measurement designed to ensure inter-rater reliability. (This is currently being tested.) The variables in the model are combined using a MAU format.
Results

1. Time Allocation. Table 1 is indicative of the results of this study. The figures presented are for two nursing homes, each with 100 beds, the average size of a nursing home in Wisconsin. The data compare the original method and the MAU/sampling method across a "good" quality home and a "poor" quality home. The data are further divided into annual and interim visits, as well as office work. The data suggest that the total time required to conduct a survey is smaller in both "good" and "poor" homes when the MAU/sampling approach is employed. Moreover, the MAU/sampling approach leads to a much more impressive shift in time allocation based on the quality of the home. These results suggest the MAU/sampling method may be less expensive, yet more sensitive to the quality of the home. Both were intended results of the new Quality Assurance Process.
Table 1. Time Allocation (personnel hours per 100-bed home): annual visit, interim visit related to the home, office work, and total time, shown for a good home and a bad home under the Original Method and under MAU/Sampling.
2. Problem Detection. Table 2 represents the comparison of the number of problems detected using the two methods. The results suggest the MAU/sampling method results in fewer "C" level (patient well-being not threatened) citations and about the same number of "B" or "A" violations (threats to patient well-being). MAU/sampling also issues fewer federal violation notices. However, the MAU/sampling approach identified
and treated a greater number of problems handled in ways other than citations ("C" level problems where consultation or training was used in place of citations; problems not related to codes, etc.).
Figure 2. Cumulative Distribution of Severity Ratings by Method (Provider Panel). (Horizontal axis: increasing severity score, 20 to 100.)
Table 2. Problems identified, by type and method

                                                  MAU/Sampling        Original
                                                  (N = 30 homes)      (N = 30 homes)
    Noncode problems                               89 (25.8%)          62 (19.4%)
    F-level violations                             14 (4.06%)          32 (10.0%)
    C-level problems dealt with in other ways     146 (42.3%)           9 (2.8%)
    C-level violations                                --               202 (63.1%)
    A&B-level violations                           15 (4.3%)           15 (4.7%)
    Totals                                        345*                320 (100.0%)
Figure 2 plots the cumulative distribution of severity for the problems identified. The results indicate a superiority of the MAU/sampling approach over the original approach to quality assurance. These results suggest that, as intended, the MAU/sampling approach is identifying more problems and more severe problems. Moreover, the MAU/sampling approach is dealing with less serious problems in more flexible ways by using citations, consultation, and training rather than forcing intervenors to rely almost solely on citation procedures.

3. Problem Resolution. Results of this study are still incomplete. However, current results are shown in Table 3. Problems are listed along the rows according to their disposition in Year 1. If the problem was identified again in Year 2 and dealt with in any way, that problem was classified as a repeat. So, for instance, of the 68 citations issued by the MAU/sampling method in Year 1, five were identified as still being present in some form one year later. The results suggest that recidivism rates are about the same overall in the MAU/sampling and original methods. However, when one examines the source of recidivism, we see a much lower recidivism rate among cited problems in the MAU/sampling method. It is assumed the cited problems have higher severity, and we therefore conclude that the MAU/sampling method deals with more severe problems in a more efficient manner.
Table 3. Recidivism: problems repeated in Year 2 / problems identified in Year 1

                                                   MAU/Sampling         Original
    Citations Issued                                5/68 (7.4%)          23/148 (15.5%)
    Citable Problems Dealt with in Other Ways      14/92 (15.2%)          0/1 (0%)
    Non-Citable Problems Identified                 4/35 (11.4%)          0/14 (0%)
    Total                                          23/188 (12.2%)        23/163 (14.1%)
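The entries in Table 3 are simple repeat fractions; the severity-weighted version announced earlier weights each problem by its panel severity rating. A minimal sketch of both computations follows, with invented problem records and with the assumption (not stated in the chapter) that the severity-weighted rate divides the severity of repeated problems by the total severity of Year 1 problems:

    # Sketch of the recidivism measures; the records below are invented.
    # Each record: (severity weight from the rating panels, repeated in Year 2?)
    problems_year1 = [(8.5, True), (3.0, False), (6.0, True), (2.0, False), (9.0, False)]

    repeated = [sev for sev, again in problems_year1 if again]

    recidivism_rate = len(repeated) / len(problems_year1)             # 2/5 = 40.0%
    severity_weighted_rate = (sum(repeated) /
                              sum(sev for sev, _ in problems_year1))  # 14.5/28.5 = 50.9%

    print(f"{recidivism_rate:.1%}  {severity_weighted_rate:.1%}")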
Conclusion and Recommendations

In an era of deregulation, we run the risk of abdicating our responsibility to monitor the quality of services provided by regulated industries while pursuing the laudable cause of reducing red tape. Decision analysis offers
the prospect of an alternative (contingent regulation) in which a screening process can identify and quickly discharge problems in the agencies delivering good care while concentrating attention on those agencies that need assistance. The results of this study suggest that this prospect is very much alive. The quality assurance method described here seems to have several advantages over other, more cumbersome and expensive approaches. Time allocation does seem to be contingent upon the quality of the nursing home. More problems are being identified and corrected. Yet the methods used to deal with these problems are more selectively chosen. The current study is not completed. Data are still needed on a number of dimensions, including impact on quality of care. However, the results are promising indeed.
References

Edwards, W., M. Guttentag, and K. Snapper, 1975. Effective evaluation: A decision theoretic approach. In: E. Struening and M. Guttentag (eds.), Handbook of Evaluation Research, Vol. 1. Beverly Hills, CA: Sage Publications, Inc.
Gustafson, D.H., C. Fiss, J. Fryback, P. Smelser, and M. Hiles, 1981. Quality of care in nursing homes: The new Wisconsin Evaluation System. The Journal of Long-Term Care Administration, 9(2).
Gustafson, D.H., D. Fryback, and J. Rose, 1981. Severity index methodology development research project. Final Report of Grant No. HS 02621, funded by the U.S. National Center for Health Services Research.
Gustafson, D.H., D. Fryback, J. Rose et al., 1981. A decision theoretic methodology for severity index development. Working Paper. Center for Health Systems Research and Analysis, University of Wisconsin-Madison.
Gustafson, D.H., P. Jotwani, and G. Huber, 1981. The effects of disclosure as a conflict reduction technique on planning and priority setting. Working Paper. Center for Health Systems Research and Analysis, University of Wisconsin-Madison.
Gustafson, D.H., G. Juedes, J. Rose, D. Detmer et al., 1981. A model to measure and explain the effectiveness of emergency medical services. Working Paper. Center for Health Systems Research and Analysis, University of Wisconsin-Madison.
Hall, J. and W. Watson, 1970. The effects of a normative intervention on group decision making performance. Human Relations, 23, 299.
McNamara, D.E., 1978. The effects of structure, disclosure, closure, and mathematical form of the decision on the small group decision making process. Unpublished doctoral dissertation, University of Wisconsin-Madison.
Nemiroff, P.M. and D.C. King, 1975. Group decision making performance as influenced by consensus and self-orientation. Human Relations, 28, 1.
Nemiroff, P., W. Passmore, and D. Ford, 1976. The effects of two formative structural interactions on established and ad hoc groups: Implications for improving decision making effectiveness. Decision Sciences, 7, 841.
Snapper, K. and D. Seaver, 1980. The use of evaluation models for decision making: Applications to a community anti-crime program. Technical Report 80-1. Decision Science Consortium, Inc.
von Winterfeldt, D., 1978. A decision theoretic model for standard setting and regulation. Research Memorandum RM-78-7. Laxenburg, Austria: International Institute for Applied Systems Analysis.
Section II: ORGANIZATIONAL DECISION MAKING
INTRODUCTION
Janos VECSENYI and Detlof von WINTERFELDT
Decision analysis is based on the theory of individual decision making, and many of its applied techniques are derived from or related to psychological scaling methods. It is therefore not surprising that decision analysts often experience some stress when they apply decision analysis outside of the domain of individual decision making, e.g., to group or organisational decision problems. The four papers of this chapter share a concern about the difficulties in applying decision analysis to organisational problems, as well as a hope that through an understanding of the underlying organisational issues and an adaptation of analysis, many of these problems can be overcome.

Larichev presents a critical survey of systems analysis for organisational decision making. Systems analysis enlarged the domain of the earlier versions of operations research by including "softer" and ill-structured problems, by allowing subjective judgments to enter the analysis, and by focusing on evaluation and decision making. The problems that systems analysis faces in organisational decision problems are, therefore, as Larichev points out, not very different from those of decision analysis. They include limitations of numerical trade-off analysis, problems in the measurement of "qualitative" variables, problems in cases of group decisions and organisational conflicts, and difficulties in implementing the results of analysis effectively in organisational decisions. Larichev argues that if systems and decision analyses are to increase their range of applications they have to pay greater attention to issues of human factors affecting decision making, to the role of groups, and to the nature of organisational decision processes. This will most likely involve the development of procedures for problem analysis able to model and compare the decision maker's changing policies on objectives and the means of their accomplishment.
Many of these points are emphasized in Lock's paper, which stresses explicitly the institutional obstacles to carrying out and implementing decision analysis. Lock argues that many characteristics of organisational decision processes contradict the traditional decision analytic "ideology". For example, decision makers are hard to identify, and decision points are often vague and ill-defined events in a continuous organisational decision process. Goals are seldom stated explicitly, but rather emerge through the formation of coalitions among individuals. In contrast to decision analytic principles, organisational goals are often ambiguous, sometimes fuzzy, and almost always adaptive and subject to change. And finally, analysts, rather than being neutral enforcers of rationality, are likely to become immersed in organisational power games. Lock concludes his discussion with several concrete suggestions as to how analysts can raise their consciousness about organisational issues and develop (largely clinical) skills in coping with them.

The third paper in this chapter discusses a number of specific pitfalls of decision analysis in organisational settings. Von Winterfeldt summarised the experience of several decision analysts and the failures, problems, or other difficulties they encountered in applications of decision analysis. These pitfalls range from hidden agendas held by the clients of the analysis, through wrong problem formulations, to gaming in probability and utility elicitation, and failures of implementation. These experiences provide a background against which lessons are drawn for improving decision analysis; for example, suggestions are made for developing the client-analyst relationship, maintaining a flexible definition of the problem, recognizing and overcoming institutional obstacles, etc.

In the final paper, Vari and Vecsenyi describe their experiences in three applications of decision analysis to research and development problems. The pitfalls included being restricted by the decision maker to work only on part of the problem, the existence of dependent and overlapping attributes, problems with means-ends relationships in value trees, and the rejection of SEU-type considerations by some managers.
SYSTEMS ANALYSIS AND DECISION MAKING
Oleg I. LARICHEV
Institute for Systems Studies, Moscow, U.S.S.R.
Abstract

According to current definitions, systems analysis is a combination of procedures and analytical methods used for the study of ill-structured problems. The concept of systems analysis is broader than that of decision making, including also procedures of problem investigation known as a "systems approach". The systems approach is a train of logical stages: definition of a goal or a set of goals; identification of alternative ways of goal achievement; construction of the model presenting the interdependence of goals, means and parameters of the system; determination of the decision rule for selecting the preferred alternative. Thus, we may define systems analysis as a combination of the general framework of the systems approach with decision making tools. The last stage is in fact that commonly known as decision making. However, in earlier versions of systems analysis only a "cost-effectiveness" criterion was usually applied in the decision rule.

The paper examines the basic features of contemporary systems analysis methodology and its difference from operations research. The capabilities and limitations of systems analysis are also analyzed. The requirements for decision making methods appropriate for use with ill-structured problems are defined and an approach to the development of methods in line with these requirements is proposed. Ways of improving systems analysis methodology in the light of this approach are discussed.
Introduction

During the last two decades there have appeared a number of branches of research somehow or other connected with the problem of choice of the best decision alternative out of a set of potential options. Along with the traditional topic of our conference, decision making, we could also mention such branches as artificial intelligence, cybernetics, systems analysis and a host of others. Systems analysis is probably the most popular of
all thanks to a number of large-scale problems having been solved through its application. It is defined by Enthoven (1969) as a "reasonable approach to decision making correctly defined as quantitative common sense". One can easily spot the similarity of systems analysis and decision making: both deal with processes of complex human decision. A question arises: does it not come down to just a terminological difference? The question is far more important than this, as both areas of research are undergoing continuous change and, as we shall presently see, there is a set of common problems connected with their future development.

There are also other reasons for making such a comparison. As we shall see further, the potential of decision making methods can only be increased by the skillful use of problem analysis (that is, by the means of systems analysis). Then again, it is useful for specialists in decision making to look, as if from the side, at the characteristic features of their field of activity. From my point of view the best way to do this is to compare decision making with systems analysis, not least of all because for many people the frontier between them is rather diffuse.
Systems Analysis

Nowadays directly opposing attitudes toward systems analysis are often found: either a strong belief in the omnipotence of this new approach, capable of solving, at last, complex and large-scale problems, or accusations about the use of fashionable terminology failing to produce any specific recommendations. There are numerous books and articles illustrating the effective application of systems analysis in the U.S. Department of Defense in the sixties and, equally, books and articles describing the dismal failure of the approach in the civil departments of the United States administration. What then is systems analysis and what problems is it designed to cope with?

By the end of the 1960s the need was ripe for the application of some special analytic techniques to large-scale problems of organizational system management. The methods were designed to introduce consistency and logic into the problem solution. Also, successful application of operations research techniques (Wagner, 1969) resulted in attempts to extend techniques of scientific analysis to problems involving both quantitative and qualitative, though sometimes insufficiently defined, factors (the former are characteristic of operations research). Undoubtedly, there are
common features in the operations research and systems analysis approaches to problem studies. In the first place, both these approaches are supposed to involve five stages or logical elements (Quade, 1969). These are:
1. to identify the objective or a set of objectives;
2. to find alternative ways of achieving the objective(s);
3. to determine the resources required for each system employed;
4. to build a mathematical (under the operations research approach) or a logical (most often under the systems analysis approach) model, i.e., a chain of dependences between the objectives, alternative ways of their accomplishment, the environment and resources;
5. to define the choice criterion to be used in identifying the preferred alternative.
It is worth noting that these stages are by no means inherent exclusively to operations research or systems analysis. They are also related to systems engineering (Shinners, 1967), decision making (Young, 1966), recommendations to inventors (Altshuler, 1973) and the like; briefly, to cases implying a consistent approach to the investigation of complex problems. Various "systems" approaches are integrated primarily by a set of successive logical stages which can be referred to as a general systems approach pattern (Figure 1).
Figure 1. The general systems approach pattern: identify goals and resources → identify problem solution alternatives → analytically compare the alternatives → choose the most preferable alternative.
And what is the difference between the systems approaches to problem solutions? It is in the methods of analytical comparison of alternative decision options. Thus, for example, the operations research approach involves a set of quantitative mathematical programming techniques, network techniques, etc. From the very outset the systems analysis approach was associated with methods of comparison of decision alternatives subsumed under the terms "cost-effectiveness" (Heuston and Ogawa, 1966) and "cost-benefit" (Enthoven, 1969), with cost-effectiveness being the basic term.
Systems analysis was first applied to military engineering problems (Quade, 1964). An example of such a problem is given in Figure 2.
Figure 2. Cost-Effectiveness Modelling of a Military Problem. (Two model curves plotted against the number of missiles, 100 to 500: a cost curve and an effectiveness curve, combined through cost and effectiveness synthesis to yield recommendations.)
The model consists of two portions: the cost model and the effectiveness model. The models are utilized for the selection of a weapons system comprising a specific number of missiles. The cost model constitutes a total cost dependence on the number of missiles, and the effectiveness model constitutes a dependence of target-destruction probability on the number of missiles. Both models may be treated in this case as objective: they are built on the basis of factual data and reliable statistics. The output parameters of the models are not integrated, however, through a given dependence; instead an executive's judgement is employed: it is he or she who sets the marginal cost and the required effectiveness. Executives often utilize the cost-effectiveness ratio, but it is recommended that attention simultaneously be paid to the absolute values as well. In this case cost and effectiveness are the decision evaluation criteria, hence the problem becomes a multicriteria one. The tradeoff between the two criteria values is chosen by the decision maker.
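A minimal sketch of such a cost-effectiveness synthesis follows. The two curves are invented stand-ins for the objective cost and effectiveness models, and the marginal cost and required effectiveness stand for the executive's judgmental inputs:

    # Sketch of cost-effectiveness synthesis for a missile-force decision.
    # The model curves and the executive's limits below are all hypothetical.

    def cost(n):                        # total cost model (monetary units)
        return 2.0 + 0.04 * n

    def effectiveness(n):               # target-destruction probability model
        return 1.0 - 0.995 ** n

    max_cost = 20.0                     # marginal cost set by the executive
    required_effectiveness = 0.80       # required effectiveness set by the executive

    admissible = [n for n in range(100, 501, 50)
                  if cost(n) <= max_cost and effectiveness(n) >= required_effectiveness]

    # Favour the best effectiveness-to-cost ratio among admissible force sizes,
    # while also reporting the absolute values, as the text recommends.
    best = max(admissible, key=lambda n: effectiveness(n) / cost(n))
    print(best, round(cost(best), 1), round(effectiveness(best), 3))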
It is worth mentioning that the cost and effectiveness models were developed in a way similar to those traditionally used with operations research techniques. It was assumed in the majority of the initial applications of systems analysis that the analyst was in a position to identify objectively the major characteristics of the problem and reflect them in the model. The subjective element emerges only in the process of synthesizing the cost and effectiveness measurements. The orientation of systems analysis towards problems involving both quantitative and certain qualitative factors is well reflected in the popular definition of systems analysis offered by Enthoven (1969), who characterized it as an explicit, monitored, self-corrected and objective process. It is interesting that Quade, one of the founders of systems analysis, distinguished a quantitative rather than qualitative difference between operations research and systems analysis (cf. Quade, 1969).

However, representing the problem in a way analogous to that presented in Figure 2 is not a central requirement for systems analysis. Systems analysis was intended for the solution of what Simon defines as ill-structured or mixed problems, containing both quantitative and qualitative elements, where the qualitative, scarcely conceived and ill-defined facets of the problems tend to dominate. In the cases where qualitative and ill-defined facets prevail it is hardly possible to build an objective quantitative model. To exemplify a typical ill-structured problem, let us turn to the problem of R & D project choice. One of the standard representations of the dependences between parameters utilized in the solution of such problems (Dean, 1968) looks like:
Profitability indicator = (Research success probability × Commercial success probability × Annual sales volume × Product unit price × Steady sales period (years)) / (R & D cost + Engineering expenses + Marketing expenses)
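Read as a computation, the index simply divides expected revenue over the steady sales period by the total of the listed expenses. A minimal numerical sketch, with all figures hypothetical:

    # Sketch of the Dean-type profitability indicator shown above; all figures hypothetical.
    def profitability_indicator(p_research, p_commercial, annual_sales_volume,
                                unit_price, steady_sales_years,
                                rnd_cost, engineering_cost, marketing_cost):
        expected_revenue = (p_research * p_commercial * annual_sales_volume *
                            unit_price * steady_sales_years)
        return expected_revenue / (rnd_cost + engineering_cost + marketing_cost)

    print(round(profitability_indicator(
        p_research=0.7, p_commercial=0.5,
        annual_sales_volume=10_000, unit_price=40.0, steady_sales_years=5,
        rnd_cost=300_000, engineering_cost=150_000, marketing_cost=50_000,
    ), 2))   # 0.7 * 0.5 * 10000 * 40 * 5 / 500000 = 1.4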
The "profitability indicator", expressing the project value, does indeed depend on the listed factors. Yet it depends on many other variables which are missing from this formula, as for instance the qualifications of potential project executants. The lund of dependences between the variables contained in the formula is not objectively defined; it is just clear that some of them increase the project value while the others reduce it. It is not for nothmg that there is a host of such dependences and there are no objective reasons for singling out any one of them. The given model just reflects the belief of an organization executive that the project choice must be exercised on the presented dependences. Should there be no data necessary for construction of an objective model conceived as the only correct model, its parameters become the decision alternative evaluation criteria. The model itself, however, can be built only on the basis of the subjective preferences of the decision maker. The successful applications of systems analysis within the framework of the famous PPB system (Novick, 1965) can obviously be attributed 9
primarily to the successful application of the cost-effectiveness technique to tactical military-engineering problems. As far as the broader class of strategic problems is concerned, there is no basis for the construction of an objective cost-effectiveness model. Nevertheless, attempts at adapting the cost-effectiveness technique to such problems have been and are being made. As is well known, the success of the PPB system resulted in a much broader application of systems analysis. Various planning and decision problems could be found within the range of its application. The uselessness of the method of constructing "objective" cost-effectiveness models was one of the major reasons for the failure of the PPB system in civil institutions, as was convincingly illustrated in the interesting book by Hoos (1972).

Besides, practical experience of the application of systems analysis suggests that the general pattern of the systems approach is far from versatile. The "magic" of the logical stage sequence of Figure 1 disappears when it comes to complex practical, "wicked" problems (Rittel and Webber, 1972), often leading to trivial recommendations like: "formulate the goals; define subgoals; identify optional means of goal accomplishment", etc. It was found that objectives are significantly affected by the means, that problem setting can be affected by a successful guess concerning the way of its solution, and the achievement of the result by the possibility of re-structuring the organizational system (Gershuny, 1978).

New and rather significant difficulties were encountered in the course of the application of systems analysis to problems involving several persons or active groups affecting the decision. It is quite clear that equally influential proponents of conflicting standpoints can paralyze any possibility of reaching effective decisions. The objective truth is somehow convincing. And as the decision rules are inevitably subjective, a question arises as to whose preferences must provide the basis for decisions. Systems analysis, oriented at the preferences of a single decision maker, failed to answer this question.

Practical experience of systems analysis application revealed also that the very process of application is an art. It is an art to set a problem correctly "in the solvable manner", to undertake a creative approach in the search for decision alternatives and to select appropriate scientific aids for the analysis. This art is required both of the executive responsible for problem solution and of the systems analyst. Successful application of systems analysis depends on a variety of factors. The very criteria for success may be objective and subjective. An experienced analyst can, with a certain degree of success, solve the problem just through a simple discussion of its aspects with many of the contributors to the decision. In this case he acts as a diagnostician (or
psychoanalyst; cf. Fischhoff, 1981). Thus, it becomes clear how important the art of problem analysis is.

It should be noted that as far back as the sixties it was understood that the purpose of systems analysis is not a simple extension of operations research methodology to problems featuring a number of qualitative factors, that is, ill-structured problems. Thus, Schlesinger pointed out in 1963 that there are no inner reasons why subjective evaluation problems cannot be treated within systems analysis. He believed that "it was necessary to identify the area of systems analysis application as a problem where the conflict between the numerous incommensurable objectives is solved through judgement". Thus, there was some agreement that systems analysis can be utilized as a tool for problem investigation on the basis of subjective judgement. However, actual successful application of systems analysis to this end was greatly limited by the methods employed for the comparison of alternatives. These remained based on cost-effectiveness techniques, which were not suitable for this purpose.

The experience of systems analysis applications reviewed so far suggests the following methodological conclusions. First, each technique employed for alternative comparison is not versatile but suits a specific area of application. The cost-effectiveness technique, in its generally known version, is ill-suited for solving strategic choice problems with heterogeneous criteria. It is simply not suitable for the solution of ill-structured problems typically encountered in social systems (education, health care, environmental protection, and the like). Second, the alternative comparison technique employed affects the overall general systems approach pattern, influencing the content of its individual stages. A unified whole appears as a result. Due to this, the general success or failure of the analytical approach to choice between the alternatives strongly depends on the particular alternative evaluation technique which is employed.
Current Trends in Systems Analysis

The current state of the art of systems analysis presents a complex picture. Problems continue to be studied (and sometimes successfully) with objective cost and effectiveness models analogous to those used in the initial applications of systems analysis. As the boundaries between the various types of problems are fuzzy, there are continuing attempts to build "objective" models for problems where inadequate objective information
can be supplemented only through subjective judgement. Very often, owing to the lack of any data source, the consulting analyst "patches the holes" on the basis of his own knowledge of relationships between the system parameters. In complex models, information provided in this kind of way affects the final results in an unpredictable manner. The models developed reflect to a great extent the beliefs of their creators that the world is arranged in some particular way. Sometimes the qualitative dependences between the parameters in the model are quite explicit, but it proves difficult (or impossible) to determine the exact quantity of each of these dependences. Hence the consultant "fills the gap", and in so doing also strongly affects the result. The resulting pseudo-objective models are often then found unacceptable by the decision makers, as they are not based on the executives' experience, intuition, and preferences. As a result the model builders often do not exert any influence on decision making.

Though well-known and popular definitions of systems analysis emphasize its direct orientation toward decision making (Quade, 1964, 1969; Schlesinger, 1963), the same term, systems analysis, has in recent years often been associated with the development of pre-fabricated, "context-free" mathematical models, with a view toward creating "stores" of models for potential use in decision making. The understanding of the fact that the application of systems analysis represents a combination of art and scientific analysis was used in the service of the "separation" of the analytical aid from the art, together with the separation of their further study. This direction of research has resulted in the emergence of a great number of mathematical models, while there is much less data concerning anything to do with their application. We believe that practical problems possess characteristic features which can be reflected only in a model built specially for the problem. This is because in the very structure of a correct model all interconnections must reflect the subjective preferences of a decision maker. Many of the models (e.g., the so-called global models; Meadows and Meadows, 1973) contain a lot of assumptions and premises of their creators intermixed with certain objective dependences. Hence the application of such models in decision making is simply dangerous.

The experience gained in unsuccessful applications of systems analysis to problems with a subjective structure and list of parameters brought about two directions of research. The first one, "policy analysis", is concerned with the solution of public policy problems. Research conducted along these lines places a strong emphasis on the art of problem analysis, on problems of the organizational mechanism for decision
implementation, etc. The article by Rittel and Webber (1972) is an example of interesting research conducted in this field. The second direction is concerned with the systematization of systems analysis applications and typical errors and miscalculations. Here it is worth mentioning Quade and Majone's book, "Pitfalls of Analysis" (1980), recently published through the International Institute for Applied Systems Analysis. The book attempts to systematize unsuccessful approaches to problem analysis, model formulation, consultant-decision maker interrelationships, and so on.

For all this, extremely little attention has been paid to improving the methods and procedures for the analytical comparison of decision alternatives. As usual, one may encounter in the literature descriptions of cases applying "cost-effectiveness" techniques to such problems as the storage of radioactive waste and the construction of atomic power stations, though these problems undoubtedly involve a variety of subjective and objective factors. An impression is gained that the authors of these papers have overlooked popular critical articles and books (e.g., I. R. Hoos, 1972; I. Hoos, 1974) and well-known results of the theory of collective choice (Arrow, 1963). In actuality, the major research on methods of comparing complex decision alternatives is being conducted at present by decision making experts, and not by systems analysts.
Systems Analysis and Decision Making: Parallels
Let us compare some characteristics of systems analysis and decision making.

Actual Content

Judging by the actual content of research conducted under the terms of decision making and systems analysis, one can detect the following differences:
1. Systems analysis is generally of a prescriptive (or normative) nature; i.e., it looks for an answer to the question as to how decisions should be made. In decision making one can identify three branches of research:
(a) descriptive research into decision making processes (see the recent overviews by Slovic et al., 1977, and Einhorn and Hogarth, 1982);
(b) development of normative or prescriptive techniques for decision making (see, for example, Keeney and Raiffa, 1976; multi-criteria techniques are the most popular at present);
(c) mathematical research into the problems of choice (see, for example, Fishburn, 1970).
It is evident that the range of research directions in decision making is much wider than in systems analysis, though here too the normative techniques in use serve to justify the existence of the entire area as a unified whole (without them we would have had some mathematical and some psychological research).
2. The class of ill-structured systems subjected to systems analysis features both quantitative and qualitative characteristics and encompasses the business activities of man: decision making in organizational systems. Most frequently these are large-scale decisions in large organizations. The class of problems subjected to decision making analyses is much wider: it encompasses both problems possessing qualitative and quantitative factors and problems possessing only qualitative factors; ill-structured and non-structured problems. Decision making analyses have been concerned with a wide range of business decisions, from the planning of astronomical (Boichenko et al., 1977) or space (Dyer and Miles, 1976) research to consumer choice (Bettman, 1971). Decision making also covers personal, individual problems such as job choice (Cerds et al., 1979) and family planning (Jungermann, 1980).

Interrelation
Systems analysis is largely oriented towards procedures of organizational decision making, while decision making is primarily oriented at techniques for the comparison of decision alternatives. In this connection, ill-structured (quantitative-qualitative) problems can well be solved through the application of a systems approach, utilizing appropriate techniques from those developed for decision making analyses. The "cost-effectiveness" technique is not, in our opinion, a vital part of systems analysis, and so it can be replaced with some other technique for the comparison of alternatives. What is characteristic of systems analysis, in our opinion, is (a) its orientation towards ill-structured problems; (b) its utilization of a general systems approach pattern; (c) its use of a decision alternative comparison technique which allows the consideration of both qualitative and quantitative factors; (d) its normative, prescriptive nature. If we consider systems analysis from this standpoint, we see that a portion of decision making techniques makes up one of its component parts. Actually, when we look beyond the terminology
employed, we find that we are familiar with such a version of systems analysis. It is worth pointing out in this connection that systems analysis and decision making share common problems in the development of advanced techniques for comparing decision alternatives relative to ill-structured problems containing qualitative and quantitative factors. The development of these techniques is of paramount importance for systems analysis, as the technique employed to compare alternatives determines to a great extent the potential practical application of systems analysis. (We have seen how this has already been the case with systems analysis employing the cost-effectiveness technique.)

Subjective and Objective Models

As was shown above, systems analysis aims at replacing "objective" models (inherited from operations research) with subjective models better suiting the nature of ill-structured problems. On the other hand, papers on decision making have always recognized the central role of the subject, i.e., the decision maker, and the development of decision making as a branch of research connected with the problem of choice has obviously been strongly influenced by expected utility theory, based on the measurement of subjective values.
Validation of Practical Need
To validate the practical need for systems analysis the following advantages are often mentioned (Quade, 1964; Majone and Quade, 1980; Optner, 1965):
- systematization of complex decision making processes,
- more extensive information support of the decision maker,
- the services of an experienced consultant-analyst,
- quantitative comparison of decision alternatives.
Most often it comes down to the efficient utilization of the systems approach (Churchman, 1968, 1980). When validating the need for decision making analyses, the same factors are often mentioned. Besides these there is, in our opinion, yet another significant basis for validation, this time relating to the expansion of the actual capabilities of the decision maker in perceiving complex, multi-attributed information.

A skillful and experienced decision maker develops in due course his own approach to choice problems, his own policy. However, in complex decision problems involving a variety of factors this policy is but a tradeoff between the complexity of the problem and the capabilities of the decision maker. These capabilities are limited, and the
major constraints depend not so much on the individual traits of the decision maker as on the general characteristics of the human data processing system. Thus, the limited capacity of man's short-term memory makes one utilize various methods of data grouping (Granovskaya, 1974) and different heuristics (Russo and Rosen, 1975). Frequently a person knows the key factors to be considered in the process of decision making but employs them inconsistently by using simplified rules. The flexibility of man, his ability to adapt information to his potential, can lead to some undesirable consequences: compression of information can result in contradictory and erroneous estimates. For example, Tversky (1972) has shown convincingly that by utilizing a seemingly absolutely justified rule for choosing between alternatives, involving the neglect of insignificant differences in evaluations by criteria, one finds oneself in a "trap of contradictions". A more reasonable application of decision making techniques can, however, contribute to the consideration of many factors, to conscious tradeoffs being made between contradicting criteria, and to the reduction of contradictions. All this increases the actual capacity of the decision maker in processing complex, multi-attributed information.
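The "trap" Tversky demonstrated can be reproduced in a few lines. The rule and the alternatives below are hypothetical illustrations of the kind of heuristic he studied, not his original stimuli: differences on the more important criterion that fall below a threshold are ignored and the cheaper option is then taken, which produces a cycle of pairwise choices.

    # Sketch of how neglecting "insignificant" differences yields intransitive choices.
    # Alternatives and threshold are hypothetical: (quality score, price).
    alternatives = {"A": (3, 30), "B": (2, 20), "C": (1, 10)}

    def prefer(x, y, quality_threshold=1):
        (qx, px), (qy, py) = alternatives[x], alternatives[y]
        if abs(qx - qy) > quality_threshold:    # quality difference is "significant"
            return x if qx > qy else y
        return x if px < py else y              # otherwise take the cheaper option

    print(prefer("A", "B"), prefer("B", "C"), prefer("A", "C"))
    # -> B C A : B is preferred to A, C to B, yet A to C, a cycle of contradictions.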
Dissatisfaction with the Current State of the Art and Techniques

It is characteristic that both systems analysis and decision making have been criticized for their limited capacities. There are by now a lot of papers supporting the assertion that there is dissatisfaction with systems analysis in its traditional form (i.e., a general systems approach pattern incorporating the employment of cost-effectiveness techniques). This dissatisfaction ranges from complete negation (Hoos, 1972) to consenting to its application "for want of something better" (Fischhoff, 1977). It is worth noting that both proponents and opponents of systems analysis believe that it is inseparable from the cost-effectiveness technique. Actually, the major share of criticism concerns the application of the cost-effectiveness technique in situations where it is inadequate (Shuman, 1976).

There has also been some sharp criticism concerning various groups of decision making techniques. There is a multitude of techniques, generally subsumed under these five groups (Larichev, 1979):
1. Axiomatic methods (MAUT; e.g., Keeney and Raiffa, 1976).
2. Direct methods for estimating probability and utility in decision analysis (Raiffa, 1968) and for estimating weighted sums of criteria (e.g., Edwards, 1977).
3. Tradeoff techniques (e.g., indifference curves, McCrimmon and Siu, 1975; the additive difference model, Tversky, 1972).
4. Threshold comparability techniques such as ELECTRE (Roy, 1968).
5. Man-machine procedures in multi-criteria linear programming (Larichev and Polyakov, 1980).
Despite this abundance of methods it is rarely the case that a method actually influences the decisions made through its use. Though in formal terms the axiomatic methods are the best validated ones, the critique of such formal validation (von Winterfeldt, 1975; Rivett, 1977; Fischer, 1979) deserves consideration. With respect to practical application, the weighted sum of criteria estimates technique ranks high in use among the others (Edwards, 1977; n e e, 1971), although its sole advantage is the simplicity of using it. Hence, we cannot simply employ a decision making technique (or a group of techniques like one of the above) in systems analysis, declaring it to be "the most suitable tool for comparing decision alternatives in ill-structured (qualitative-quantitative) systems". Choosing the appropriate technique is a complex task indeed, as the areas of systems analysis and decision making are in a state of continuing development and there are a number of common problems affecting their future progress.
General Methodological Difficulties

Problems of Measurement

Both in systems analysis and in decision making, the decision rules employed (i.e., the models for comparing decision alternatives) are developed on the basis of data received from the decision maker and experts. Hence the methods used to obtain the data are of great, sometimes decisive, importance. Some 5-10 years ago it was considered possible to obtain reliable information from man in almost any form desired. Recent research has shed new light on the actual characteristics of man's performance under particular methods of data acquisition. Thus, Tversky and Kahneman (1974) analyzed the ability of man to provide probabilistic data. Slovic, Fischhoff and Lichtenstein (1977) studied the phenomena of overconfidence and retrospective confidence (Fischhoff and Beyth, 1975). A number of papers (e.g., Russo and Rosen, 1975; Marshak, 1968) indicate that man makes inconsistent, often contradictory holistic judgements of the worth of alternatives made up of partial estimates on multiple criteria.
The problem of assigning appropriate weights to multiple criteria is also difficult for man (Slovic and MacPhillamy, 1974). In comparing multi-criteria decision alternatives man employs heuristics leading to contradictions (Russo and Dosher, 1976; Tversky, 1972; Montgomery, 1977). According to some information, people make systematic mistakes in identifying preferences in lotteries (Dolbear and Lave, 1967). This would introduce errors when determining utilities by lottery methods (e.g., Raiffa, 1968). Systematizations of these results may be found in the overviews by Slovic et al. (1977), Einhorn and Hogarth (1981), Aschenbrenner (1979), and Larichev (1980).
Given all these results, what do they indicate? First of all, the methods employed in decision making techniques should be revised. If the methods are criticized for failing to consider the real constraints and informational capabilities of men, then it is necessary to take this criticism into account at the method development stage. For example, a significant problem encountered when measuring qualitative characteristics inherent to ill-structured problems is the impossibility of using the quantitative measurement scales employed in current methods in any reliable way. Unfortunately, there are at present no reliable methods of quantitative measurement of the multiple subjective factors involved in evaluating decision quality, such as the organizational image, the scientific level of the problem solution, the impact of a technological decision on society, etc. In measuring these factors we can employ only ordinal scales, with a verbal definition of each grade of quality on the scale. The same position that we face here probably existed for physical variables, such as heat, length and the like, up until the time when quantitative measurement techniques were devised (Carnap, 1965). We shall see whether quantitative measures will appear for "qualitative" factors like those listed above, but currently we have none. Accordingly, it is much better to use reliable ordinal scales than unreliable ratio or interval scales.
Along with qualitative factors, ill-structured problems also involve quantitative factors (i.e., those on which man usually makes quantitative evaluations). Generally speaking, it is desirable that, in describing factors, the measurement techniques employ a language which is natural to the decision maker and the experts who will potentially participate in their use (Larichev, 1979). From my point of view this point is very important. Using adequate language in forming the problem description can generally be counted upon to increase the trust of the decision maker and the experts in the results of the analysis. Any transformation of information within the model without proper utilization of the decision maker's preferences harms this trust.
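To make the contrast between scale types concrete, the following sketch represents alternatives on ordinal scales with verbally defined grades (the criteria and grades are invented for the illustration) and draws only the kind of conclusion that ordinal information can support, namely a dominance check, without pretending to interval or ratio measurement.

```python
# Ordinal scales with verbally defined grades (invented for illustration).
# The grade lists run from best to worst; only the order is meaningful.

SCALES = {
    "organizational_image": ["strengthens the image", "leaves the image unchanged", "harms the image"],
    "scientific_level":     ["above world level", "at world level", "below world level"],
    "impact_on_society":    ["clearly positive", "neutral", "negative"],
}

ALTERNATIVES = {
    "Option 1": {"organizational_image": "strengthens the image",
                 "scientific_level": "at world level",
                 "impact_on_society": "neutral"},
    "Option 2": {"organizational_image": "leaves the image unchanged",
                 "scientific_level": "at world level",
                 "impact_on_society": "negative"},
}

def rank(criterion, grade):
    return SCALES[criterion].index(grade)   # 0 = best grade

def dominates(a, b):
    """a is at least as good as b on every criterion and strictly better on one."""
    return (all(rank(c, a[c]) <= rank(c, b[c]) for c in SCALES)
            and any(rank(c, a[c]) < rank(c, b[c]) for c in SCALES))

print(dominates(ALTERNATIVES["Option 1"], ALTERNATIVES["Option 2"]))   # True in this example
```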
Development of Decision Rules for Ill-Structured Problems

The problem of measurement is closely associated with the problem of decision rules integrating quantitative and qualitative factors, on the basis of the decision maker's preferences, into a unified rule for the evaluation of alternatives. The problems of identification of preferences are closely linked with the psychology of decision making. As a general case, questions concerning the ability of man to provide the type of reliable information required for the development of a decision rule are the subject matter of specialized research. The word "reliable" implies here the following three characteristics: (1) recurrence in subsequent interviews; (2) transitivity; (3) the possibility of formulating complex rules for estimating decision alternatives (Larichev et al., 1980). Hypotheses concerning ways of obtaining the data must be proposed and checked in model experiments (Larichev, 1979).
Something has already been achieved to this end. It is known, for example, that the application of several heuristic rules substantially reduces the number of errors made when comparing multi-criteria systems (Montgomery, 1977). It is also known that man can be taught to evaluate subjective probabilities of events better (Lichtenstein et al., 1977). Also, given a certain number of criteria and binary estimates, man can classify multi-dimensional alternatives on ordinal scales (Larichev et al., 1980). It is well known that, using verbal evaluations on ordinally scaled criteria, man can compare the effects of decreasing estimates along the scales of two criteria while the estimates on all other criteria are held at the grades describing the best (or worst) decision (Larichev, 1979). This method lies behind the ZAPROS technique (Zuev et al., 1979).
Of particular significance for ill-structured problems is how to integrate qualitative and quantitative factors within a model. These problems also typically involve objective dependences between a number of factors and some objective models. The integration of all this on the basis of subjective preferences into a general model for the evaluation of the decision alternatives implies a host of complex methodological problems. Must consideration be given first to qualitative factors, followed by the quantitative ones (Emelyanov and Larichev, 1976)? How, for example, does one compare the cost (represented on a continuous quantitative scale) with undesirable spinoffs (represented on qualitative scales)? It should be pointed out that any simplifications introduced into problem solutions in the face of such difficulties will significantly affect the confidence of the decision maker in the results of the analysis.
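The kind of question described above (a one-grade drop on one criterion against a one-grade drop on another, with all remaining criteria held at their best grades) can be sketched as follows. The sketch only generates the pairwise questions and ranks the single-grade decrements from the answers; it is an illustration of the elicitation idea, not an implementation of the full ZAPROS procedure, and the criteria, grades and answers are all invented.

```python
from itertools import combinations

# Ordinal criteria with verbal grades, best grade first (invented for illustration).
CRITERIA = {
    "cost":                 ["low", "moderate", "high"],
    "scientific_novelty":   ["high", "average", "low"],
    "undesirable_spinoffs": ["none", "minor", "serious"],
}
BEST = {c: grades[0] for c, grades in CRITERIA.items()}

def degraded(criterion):
    """The reference alternative with every criterion at its best grade,
    except `criterion`, which is dropped by one grade."""
    profile = dict(BEST)
    profile[criterion] = CRITERIA[criterion][1]
    return profile

# Simulated answers: which one-grade drop is the lesser sacrifice?
# In a real interview these answers would be given verbally by the decision maker.
ANSWERS = {("cost", "scientific_novelty"): "cost",
           ("cost", "undesirable_spinoffs"): "cost",
           ("scientific_novelty", "undesirable_spinoffs"): "scientific_novelty"}

wins = {c: 0 for c in CRITERIA}
for c1, c2 in combinations(CRITERIA, 2):
    print(f"Q: which is preferable, {degraded(c1)} or {degraded(c2)}?")
    wins[ANSWERS[(c1, c2)]] += 1

# Decrements ranked from least to most painful (assumes the answers are consistent).
print(sorted(CRITERIA, key=lambda c: wins[c], reverse=True))
```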
Collective Decision Making

Both systems analysis and decision making face the problem of collective decisions, that is, the problem of the viewpoints held by different groups influencing the decisions. Despite the need here for practical methods, we find that results taken from purely theoretical models prevail in decision making practice (see Emelyanov and Nappelbaum, 1978, for an overview). From the pragmatic point of view it is clear that there is a need for procedures for the discussion and harmonization of the opinions contributing to a collective decision. It is clear, though, that no procedures will be helpful in cases where there are well-thought-through and entrenched contradictory policies (e.g., an environmentalist versus a proponent of industrial development).
We strongly believe that thorough consideration should be given to the problems which lie in between those of personal and collective decision making and their implementation. In contrast to problems characterized as individual decision making, however, here an individual decision maker is guided by the criterion of maximum satisfaction of the interests of those persons (or groups) who have an interest in the problem solution. Should there be a decision option satisfying all of the participants, then the preferences of the decision maker do not affect the outcome. Demonstrating the impossibility of attaining such a result influences the participants and reveals the true position of the decision maker. As was correctly noted in Thurow (1980), such a demonstration is very important. Moving further on, the executive compares the individual utilities from his own point of view but tries not to be autocratic and tries to find a decision which is close to being "natural". An overview of a number of works concerning this problem can be found in Rosenman (1978); an example of one of the papers is that by Larichev and Kozhukharov (1979).
Rational Procedures of Problem Analysis

Modern systems analysis is in need of additional research aimed at the development of procedures allowing the generalization of the results and discoveries of talented analysts concerning problem setting and ways of implementation. This is equally true for decision making, as it is rarely the case that a real-world problem consists simply of the comparison of decision alternatives. Usually, this is preceded by extensive work on the part of the consulting analysts. And should they succeed in achieving a situation allowing the application of a decision making technique, it would be safe to say that the problem is 70 per cent solved.
What must the real procedures of problem analysis be like? Most probably these must be iterative procedures for comparing the changing objectives (and the means of their accomplishment) which account for the major lines of the decision maker's policy. Certainly, the art of their application will play the central role, but a certain framework will appear permitting a less skillful executive (and/or consultant) to perform on a higher level.
Conclusion

The last 20 to 30 years have witnessed the emergence of decision making techniques for the analysis of problems that have traditionally been considered as unyielding to rational research. The methods of problem analysis are far from perfect and do not take into account the complex nuances of human decision making. The process of improving analysis techniques involves periods of excessive belief in the versatility of particular methods of problem solution (e.g., mathematical models), followed by periods of absolute scepticism concerning the efficiency of the said techniques. The truth is somewhere in the middle: there are groups of problems which are tractable with adequate analysis techniques, but the majority of real problems still lack such tools for their solution.
The current stage of development of systems analysis and decision making dictates the necessity of considering the human factors which affect decision making. To the forefront come psychological and sociological factors. The problems of cognition, the study of decision makers' policies and the role of active groups acquire an ever greater significance. We believe that the future of systems analysis depends on the successful solution of these problems: either it will flash and then vanish as the next "fashionable" trend, or it will gain strength, integrating knowledge from a variety of scientific disciplines with the art of problem analysis.
References

Altshuler, G. S., 1973. The Algorithm of Invention (in Russian). Moscow: Rabochi.
Arrow, K. J., 1963. Social Choice and Individual Values. New York: Wiley.
Aschenbrenner, M., 1979. Komplexes Wahlverhalten als Problem der Informationsverarbeitung. Universität Mannheim, Sonderforschungsbereich 24.
Bettman, J. R., 1971. A graph theory approach to comparing consumer information processing models. Management Science, 18, 114-128.
Boichenko, V. S., O. I. Larichev, V. A. Minin, and H. M. Moshkovich, 1977. The method of the development of scientific and technological policy for the planning of fundamental research. In: V. A. Trapeznikov (ed.), The Management of Research, Development and Introduction of New Technology (in Russian). Moscow: Economika.
Carnap, R., 1965. Philosophical Foundations of Physics. New York: Basic Books.
Churchman, C. W., 1968. The Systems Approach. New York: Delta Books.
Churchman, C. W., 1980. The Systems Approach and Its Enemies. New York: Basic Books.
Dean, B. V., 1968. Evaluating, Selecting and Controlling R & D Projects. New York: American Management Association.
Dolbear, F. T. and L. B. Lave, 1967. Inconsistent behavior in lottery choice experiments. Behavioral Science, 12 (1).
Dyer, J. C. and R. F. Miles, Jr., 1976. An actual application of collective choice theory for the Mariner Jupiter/Saturn 1977 Project. Operations Research, 27.
Edwards, W., 1977. How to use Multi-Attribute Utility Measurement for social decision making. IEEE Transactions on Systems, Man and Cybernetics, SMC-7 (5).
Einhorn, H. J. and R. M. Hogarth, 1981. Behavioral decision theory: Processes of judgement and choice. Annual Review of Psychology, 32.
Emelyanov, S. V. and E. L. Nappelbaum, 1978. The Principles of Rational Collective Choice (in Russian). Moscow: VINITI.
Emelyanov, S. V. and O. I. Larichev, 1976. A multicriteria approach to applied R & D planning: The case of qualitative criteria. Problems of Control and Information Theory, 5-6, 385-399.
Enthoven, A., 1969. The systems analysis approach. In: Program Budgeting and Benefit-Cost Analysis. Pacific Palisades, California: Goodyear Publishing Co.
Fischer, G. W., 1979. Utility models for multiple objective decisions: Do they accurately represent human preferences? Decision Sciences, 10, 451-479.
Fischhoff, B., 1977. Cost-benefit analysis and the art of motorcycle maintenance. Policy Sciences, 8, 177-202.
Fischhoff, B., 1981. Decision analysis: Clinical art or clinical science? In: L. Sjöberg, T. Tyszka and J. A. Wise (eds.), Decision Analyses and Decision Processes. Lund: Doxa.
Fischhoff, B. and R. Beyth, 1975. "I knew it would happen": Remembered probabilities of once-future things. Organizational Behavior and Human Performance, 13.
Fishburn, P. C., 1970. Utility Theory for Decision Making. New York: Wiley.
Gerds, U., K. Aschenbrenner, S. Ieromin, E. Kroh-Puschl, and M. Zaus, 1979. Problemorientiertes Entscheidungsverhalten in Entscheidungssituationen mit mehrfacher Zielsetzung. In: H. Ueckert and D. Rhenius (eds.), Komplexe menschliche Informationsverarbeitung. Bern: Huber.
Gershuny, J. I., 1978. Policymaking rationality: A formulation. Policy Sciences, 9, 295-316.
Granovskaya, R. M., 1974. Perception and Memory Models (in Russian). Leningrad: Nauka.
Heuston, M. C. and G. Ogawa, 1966. Observations on the theoretical basis of cost-effectiveness. Operations Research, 14 (2).
Hoos, I. R., 1974. Can systems analysis solve social problems? Datamation, June, 82-92.
Hoos, I. R., 1972. Systems Analysis in Public Policy. Los Angeles: University of California Press.
Jungermann, H., 1980. Speculations about decision theoretic aids for personal decision making. Acta Psychologica, 45, 7-34.
Keeney, R. and H. Raiffa, 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: Wiley.
Klee, A. J., 1971. The role of decision models in the evaluation of competing environmental health alternatives. Management Science, 18 (2).
Larichev, O. I., 1979. Science and Art of Decision Making (in Russian). Moscow: Nauka.
Larichev, O. I., 1980. Process tracing for the problems of estimation and choice of multicriteria alternatives in decision making. In: S. Emelyanov and O. Larichev (eds.), Descriptive Studies of Multicriteria Decision Making Processes (in Russian). Moscow: VINITI.
Larichev, O. I. and A. N. Kozhukharov, 1979. Multiple criteria assignment problem: Combining the collective criterion with individual preferences. Mathématiques et Sciences Humaines, 68, 63-77.
Larichev, O. I., V. S. Boichenko, H. M. Moshkovich, and L. P. Sheptalova, 1980. Modelling multiattribute information processing strategies in a binary decision task. Organizational Behavior and Human Performance, 26, 218-291.
Larichev, O. I. and O. A. Polyakov, 1980. Interactive procedures for the solution of multicriterial mathematical programming problems (in Russian). Economika i Matematicheskije Methodi, 1.
Lichtenstein, S., B. Fischhoff, and L. D. Phillips, 1977. Calibration of probabilities: The state of the art. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Amsterdam: Reidel.
McCrimmon, K. R. and J. K. Siu, 1974. Making trade-offs. Decision Sciences, 5.
Majone, G. and E. S. Quade (eds.), 1980. Pitfalls of Analysis. New York: Wiley.
Marshak, J., 1968. Decision making: Economic aspects. International Encyclopaedia of the Social Sciences. New York: Crowell, Collier, Macmillan, vol. 4.
Meadows, D. L. and D. H. Meadows (eds.), 1973. Toward Global Equilibrium. Wright-Allen Press.
Montgomery, H. A., 1977. A study of intransitive preferences using a think aloud procedure. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: Reidel.
Novick, D. (ed.), 1965. Program Analysis and the Federal Budget. Cambridge, Mass.: Harvard University Press.
Optner, S. L., 1965. Systems Analysis for Business and Industrial Problem Solving. Englewood Cliffs, N.J.: Prentice Hall.
Quade, E. S. (ed.), 1964. Analysis for Military Decisions. Chicago: Rand McNally and Co.
Quade, E. S., 1969. Systems Analysis Techniques for Planning-Programming-Budgeting: Systems Approach to Management. Chicago.
Raiffa, H., 1968. Decision Analysis: Introductory Lectures on Choices under Uncertainty. New York: Addison-Wesley.
Rittel, H. and M. Webber, 1972. Dilemmas in a general theory of planning. Policy Sciences, 4.
Rivett, P., 1977. The dog that did not bark. Engineering Economics, 2 (4).
Rosenman, M. I., 1978. Methods of collective decision making with participation of the decision maker. In: S. Emelyanov (ed.), Multicriteria Choice in Solving Ill-Structured Problems (in Russian). Moscow: VINITI, 38-48.
Roy, B., 1968. Classement et choix en présence de points de vue multiples (la méthode ELECTRE). Revue Française d'Informatique et de Recherche Opérationnelle, 2 (8).
Russo, J. E. and B. A. Dosher, 1976. An information processing analysis of binary choice. Technical Report, Carnegie-Mellon University.
Russo, J. E. and L. D. Rosen, 1975. An eye fixation analysis of multi-alternative choice. Memory and Cognition, 3, 267-276.
Schlesinger, J. R., 1963. Quantitative analysis and national security. World Politics, 15 (2).
Shinners, S. M., 1967. Techniques of System Engineering. New York: McGraw-Hill.
Shuman, J., 1976. Mathematical model building and public policy: The game some bureaucrats play. Technological Forecasting and Social Change, 9.
Slovic, P., B. Fischhoff, and S. Lichtenstein, 1977. Behavioral decision theory. Annual Review of Psychology, 28.
Slovic, P. and D. MacPhillamy, 1974. Dimensional commensurability and cue utilization in comparative judgment. Organizational Behavior and Human Performance, 11.
Thurow, L. C., 1980. The Zero-Sum Society. New York: Basic Books.
Tversky, A., 1972. Choice by elimination. Journal of Mathematical Psychology, 9 (4).
Tversky, A. and D. Kahneman, 1974. Judgment under uncertainty: Heuristics and biases. Science, 185.
von Winterfeldt, D., 1975. An overview, integration and evaluation of utility theory for decision analysis. Technical Report 75-9. Los Angeles: Social Science Research Institute, University of Southern California.
Wagner, H. M., 1969. Principles of Operations Research. New Jersey: Prentice Hall.
Young, S., 1966. Management: A Systems Analysis. Glenview, Illinois: Scott, Foresman and Co.
Zuev, Ju. A., O. I. Larichev, V. A. Filippov, and Ju. V. Chuev, 1979. The problems of R & D project estimation (in Russian). Vestnik Akademii Nauk, 8.
APPLYING DECISION ANALYSIS IN AN ORGANISATIONAL CONTEXT*
Andrew R. LOCK
Kingston Polytechnic, England
Abstract

The structures and political processes of client organisations present a number of potential pitfalls for decision analysis applications. Above the purely operational level, decisions appear within organisations as disruptions and imply changes in the relative power and status of participants. Uncertainty about such changes is a source of resistance to the analysis process. The balance of individual goals within overall organisational goals results from the internal power structures and political processes, and the nature of the dominant coalition. The presence of an 'executive' coalition with the power to take explicit decisions and to implement them markedly increases the likelihood of acceptance of 'rational' choice models such as decision analysis. In other situations, analysis takes place within a more complex political framework and has to be seen more as a contribution to a broader organisational decision-making process. In the light of this, clinical strategies are proposed for the decision analysis process. Particular emphasis is placed on the nature of the 'contract' before entering the organisation, and on problem definition and structuring.
Introduction

As Bross (1953) points out, decision consultancy has distinctly dubious antecedents. Decision analysts play similar roles to those played in the past by medicine men, soothsayers, prophets, wise men, sorcerers and even witches, and the field has always been vulnerable to charlatanry. It would be comforting to believe that resistances encountered in practical applications were due to the scepticism engendered by these predecessors, and
* The author would like to thank the editors and two anonymous referees for their extremely useful suggestions, and to express gratitude to the participants in the Pitfalls in Decision Analysis symposium in Budapest, notably Rex Brown, Ward Edwards, Dave Gustafson, Larry Phillips and Detlof von Winterfeldt, for providing many of the ideas which stimulated the revisions of this paper.
that a few demonstrations of the effectiveness of newer decision-making technologies would serve to quell unbelievers. Apart from the fact that some of this technology is cumbersome, testing the credulity of subjects and clients, I argue here that much of the decision analysis seed will be cast on stony ground without an awareness of the organisational environment and the inherently political process of applied decision analysis.
The decision analysis paradigm has generally met most success in organisational applications at the operational level. Despite the fact that practitioners have felt that the technology would be better applied to the less-structured problems of strategic directions for the organisation, applications in this area have not met with conspicuous success. Part of the problem is that it has rarely been admitted that resistance was encountered or that the recommendations were not taken up. Such admissions are exchanged between decision analysts but have to date received little or no attention in the mainstream of decision analytic literature. Exceptions are Fischhoff's (1977) comparison between decision analysis and psychoanalysis and Phillips' (1980) discussion of the relationship between organisational structure and the ability to cope with uncertainty, both of which stimulated many of the thoughts that have gone into this paper. Interestingly, similar concerns about the breadth of applications and the acceptability of recommendations are being expressed in the O.R. and Management Science literature (Ackoff, 1979a and b; Eilon, 1980).
As one studies problems at higher levels in organisations, the range of options and the intervention process itself have organisational implications, which links analysis to political processes within the organisation. If the objective of a study is acceptance of its prescriptions, analysis strategies have to include a deliberate consideration of the structure and processes of the organisation involved. The goal of this paper is to explore the issues of power and political decision-making processes within organisations and the extent to which these relate to decision analysis and its implicit ideology. Specific pitfalls are discussed and suggestions are made for approaches to the analysis process which might increase the probability of acceptance and successful implementation of the ensuing recommendations.
Controversy seems inevitable. Some of the views expressed in this paper might be dismissed as being mainly applicable to a conventional conservative U.K. corporate environment. It is true that the decision analysis paradigm has been found to be less successful in Europe than in North America. However, many of the pitfalls mentioned have been encountered in U.S. applications. Given the small number of decision makers and the disparate problems and situations with which analysts have
worked, any development of a theoretical framework for the organisational context for decision analysis has to be based on an eclectic selection of material from many areas.
Decision Analysis and Ideology

Decision analysts often see their work as disinterested and, in helping decision makers, as a value-free embryonic science, despite the many demonstrations that science does reflect a particular Weltanschauung. Fischhoff (1977), for example, has neatly identified the rationale for much of this work with the ideological biases of therapeutic interventions in the field of psychotherapy. The belief that analysis is de facto a valuable intervention is at the heart of the decision analyst's credo, and underlies the concept of intellectual technology promulgated by Bell (1979) and echoed by Phillips (1980). A structured approach to decision making reflects, among other things, both a desire for orderliness and a belief that an orderly solution exists. This can sometimes lead to analysis becoming a form of ritual that has to be gone through in making a decision. In some such cases, the impression is left that decision analysis becomes a form of (unwitting) casuistry for salving clients' consciences over morally difficult decisions.
Under decision analysis lies a model of rational choice. Its roots are explicitly normative and are based on the general proposition that there exists a set of clearly definable goals or objectives which can be modelled in terms of well-defined preferences. Findings that SEU models can be poor predictors of actual individual choice behaviour within an organisational context (Simon, 1978, 1979) have not led decision analysts to question whether they are a good basis for improving decision making. Consequently, 'better' decisions are often defined in the analyst's terms rather than those of the client. As well as carrying with them the normative content implicit in the paradigm, analysts frequently enter decision situations with preconceived ideas about the structure of problems and the preference structures of the individuals involved. Particularly in business organisations, verbal statements of goals and the structures elicited are influenced by external norms perceived by the subjects and/or the analyst. As an example, discounting is a frequently used technique to aggregate money or utility flows in different time periods; Lock (1982) reports the results of preference analyses for one corporate decision maker whose relative weights for money flows increased up to the third year out and only declined slowly thereafter.
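A small numerical sketch makes the point (the figures are invented for illustration and are not Lock's elicited data): a constant discount rate forces the time weights to decline geometrically from the first year onwards, whereas a weight profile of the kind described above peaks around the third year, and the two can value the same stream of flows quite differently.

```python
# Conventional discount weights versus a hypothetical elicited weight profile
# of the kind described in the text (rising to year 3, then declining slowly).
# All numbers are invented for illustration.

DISCOUNT_RATE = 0.10
YEARS = range(1, 6)

discount_weights = {t: 1 / (1 + DISCOUNT_RATE) ** t for t in YEARS}
elicited_weights = {1: 0.70, 2: 0.85, 3: 1.00, 4: 0.95, 5: 0.90}   # hypothetical profile

cash_flow = {t: 100 for t in YEARS}   # an equal flow in every year

value_discounted = sum(discount_weights[t] * cash_flow[t] for t in YEARS)
value_elicited = sum(elicited_weights[t] * cash_flow[t] for t in YEARS)

print(f"Aggregate value with constant 10% discounting: {value_discounted:.1f}")
print(f"Aggregate value with the elicited weights:     {value_elicited:.1f}")
```

Imposing the conventional formula in such a case quietly substitutes an external norm for a preference structure the decision maker actually holds, which is precisely the danger discussed above.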
This does not mean that analysts should avoid making any personal inputs. The passive role is rarely an option. Decision analysts should be aware of their own goals and be prepared to question whether the assumptions and structure being used are consistent with the broad perspective of the client.
Decisions, Procedural Rules and Organisational Upheavals
In discussing organisational decision processes it is convenient to distinguish between one-off decisions and the repeated application of procedural rules. The latter may be laid down by executive fiat or established custom. They represent ways of responding to situations or events that can be categorised in terms of those that have previously been encountered. There is an analogy here with the procedural scripts that individuals apply in repetitive choice situations (Abelson, 1976). Which procedure to use is not always easily determined, and is sometimes the subject of a procedure in itself, or has a stochastic component. The degree of development of such rules is largely dependent on the external environment and the type of problems that the organisation encounters. Organisations, or sections of organisations, in stable environments tend to have well-developed if not ossified procedural rules, and are often referred to as mechanistic (Burns and Stalker, 1961). As organisations become larger a higher level of programming, in the form of standardised procedures and control systems, takes place (Pugh and Hickson, 1976), although at the top of large organisations decision problems tend to become less structured, because of the lengthening of the time span of the decision maker as organisational size increases (Jaques, 1976).
Within the organisation, decisions appear as events or episodes, representing discontinuities in the organisational continuum. The more of these discontinuities that a section of an organisation and its members experience, the less likely they are to perceive them as major upheavals and the better they will be at handling uncertainty (Phillips, 1980). Thus, the best opportunities for decision analysis applications are offered by lateral or matrix structures, based on a task or project oriented culture, which develop as a response to the need for flexibility in a changing environment. In other types of organisational structure, the climate for decision analysis is less favourable. In vertical-hierarchical organisations, which are the dominant form in stable situations, decisions are perceived as significant upheavals and there is a low ability to cope with uncertainty. In power cultures, where power is concentrated at the centre, or person cultures,
where power is diffused according to expertise (Handy, 1981), decision-making is highly politicised and there is a tendency to reject explicit choice models.
Contrasting Perspectives
For most practising decision analysts, even internal consultants, the frequency with which they change problems or situations makes it difficult to appreciate the position of people in an organisation which has undergone relatively few upheavals, particularly as they do not experience the longer term consequences. Organisational upheavals imply changes in relationships and relative power positions. The more deeply established these are, the greater the obstacles in the path of change and of those who appear to be harbingers of change. The insecurity that one frequently observes in this type of situation reflects primarily the uncertainties relating to the organisational consequences, as opposed to the uncertainties that analysts are usually concerned with measuring. The uncertainties that engender insecurity concern organisational positions, relative power, interpersonal relations and interactions with the exterior environment.
As a newcomer to an organisation, the analyst lacks a perspective on the particular organisational culture. Although it is quite easy to list the factors that influence the type of culture (history and ownership, size, technology, goals and objectives, environment, and people; Handy, 1981), it is rather harder for an outsider to gain a full understanding in a short time. Things are further complicated by the differences in culture between groups: the dominant coalition and affected groups lower down the organisation, for instance. Outsiders frequently find that the most established parts of the culture are taken for granted and therefore not mentioned, and others may be taboo subjects.
Power in Organisations

Pettigrew (1973) observed that analysis of organisations as political systems had not been widespread. Indeed it has seemed that power in organisations has been a subject that people did not wish to address. There is a distinction between power and authority in organisations. Authority is voluntarily obeyed, being seen as a legitimate contribution to the functioning of an organisation which the recipient of orders has usually joined voluntarily. Organisation charts map the notional authority structure. Against this is the concept of power, whereby individuals or coalitions
are in a position to change, positively or negatively, the achievement of the goals of the organisation or of others within it. It may not be exercised directly but represent an unspoken internalised constraint upon decision-makers. The power of organisational actors is determined by their ability in performing their organisational role and the importance of that role (Pfeffer, 1981). There are a number of general forms and sources of power in organisations:
Power over resources. Clearly, possession of resources that others want and need confers bargaining power. This extends beyond the obvious examples of money and slack resources to the provision of jobs and career advancement of subordinates.
Coping with uncertainty. Individuals who are able to cope well with uncertainty, even to the point of being able to repair machines that are crucial to the functioning of a system, acquire power. Mintzberg (1978) suggests that consultants are hired to absorb uncertainty for the organisation.
Being irreplaceable. As technological specialisation increases, the more likely it is that certain actors possess skills or knowledge that render them irreplaceable. Strikes by key computer personnel, which prevented the government collecting large amounts of tax revenue, enabled unions in the U.K. Civil Service to maintain pressure on the government for some long time in 1981 at relatively little cost. Similar dependence patterns are increasingly to be found in organisations.
Controlling the decision process. This flows from control of information about alternatives, and will need little amplification for decision theorists. It can also stem from control over agendas (Plott and Levine, 1978).
Power of consensus. The degree of internal cohesion and shared world view markedly affects the influence a sub-group can have on the organisation and its credibility and consequent influence with other actors.
Possession of political skills. It is unlikely that an individual can exercise power purely by the possession of political skills, though that may have been the means by which he or she achieved a position with other sources of power. Political skills determine the effectiveness with which other sources of power are used. They become paramount in organisations where the other sources of power for individuals and coalitions result in a fairly even balance.
The concept of power and its sources is important in analysing situations, particularly those which are approached as a newcomer. Few participants in decision processes involving outside consultants are remote from the political processes in organisations, so that derived accounts of relative power are usually slanted and frequently very sketchy. Resistance to bringing power issues into the open is often encountered (Pettigrew, 1973). Thus much of the analysis of the political processes in the organisation has to be done by direct observation and inference rather than by relying on participants' accounts.
Neither is the decision analyst divorced from the political process. Studies are commissioned by those that already possess power and are often seen as means to reinforce that power. Analysts enter the situation with power based on special skills and external status and acquire additional internal power through association with the dominant coalition. The conventionally presented role of the decision analyst allocates a considerable implied degree of power to him or her. It is not assumed there that the decision maker can understand the techniques by which the probability and outcome inputs are translated into the model, nor is it assumed that the decision maker should understand what is going on in the utility assessment phase. If decision makers accept the full implications of this form of analysis, they are in fact abdicating control over it. In the long run this may well be perceived as a threat to their position. Indeed decision makers may well feel threatened by the notion that their 'real' policies may be captured by models, let alone by simple models. There may be criteria or goals to which they are unwilling to admit, and which they are unwilling to have revealed. Particularly in the case of middle managers, a subject might be placed in the position of covering up his or her true preferences and then being faced with a solution, based on declared preferences, that he or she finds unacceptable. It is remarkably difficult for a manager in a supposedly 'rational' role to argue against a supposedly rationally derived solution to a decision problem.
The willingness of subjects to go along without a reasonable understanding of the process is often dependent on the perceived status of the analyst. Analysts with lower perceived status may find it hard to persuade subjects to participate in some of the tasks required. For example, Lock (1979, 1982) reports a case study where some of the difficulties in persuading subjects to participate in certain elicitation exercises were due to the analyst's lack of perceived status, in that case as a doctoral student.
Organisational Decision Processes

Decisions in organisations emerge from the relative influence and interests of different individuals and coalitions. The specific individuals and coalitions involved in a particular decision will depend upon the groups of people affected and the level of decision which is being considered. These levels range from the strategic, concerned with the overall direction of the organisation, down to immediate operational decisions at the most basic level. The political nature of the bargaining processes implicit in making these decisions is usually only seen when the distribution of interest, expertise and power is different from the explicit authority structure.
In reading studies of organisational decision-making processes, one notices how little is written on the actual formal mechanisms and rituals as opposed to the line-ups in terms of coalitions and relative power. Little is said about the extent to which decisions are made by individual executive action, taken by groups or cabals in informal meetings, or taken in committee. Decisions are perceived to emerge rather than there always being a readily identified decision point. This reflects the extent to which 'decisions' in many instances are the cumulative consequence of a set of minor preceding actions (or inactions). Weiss (1980) has suggested the phrase 'decision accretion' for this pattern. Similarly, the power of other groups may only be identifiable as internalised constraints on the part of the decision maker. These form part of the 'frame of reference' or 'world view' of the decision maker and other organisational participants. Without an understanding of these viewpoints, as is noted by Boulding (1959), analysts will understand little of the process.
Some studies of organisational decision processes suggest that goals are largely imputed by observers. Weick (1979) and March (1978) question the usefulness of notions of preferences at the organisational as well as the individual level, observing that they result from choice rather than determining it. In this framework choices are acquired, learnt, or the result of the influence of other participants. Where does all this leave practitioners of decision analysis? It appears that in many organisations decisions are difficult to identify and that there tends to be a very distinct desire to adhere to and reinforce existing strategies (Staw, 1976). One possibility consists of redefining the goal of an analysis as making a contribution to the organisational process rather than specifically recommending a course of action and getting it adopted. Fischhoff (1977) suggested that this contribution is frequently the main benefit of decision analysis and saw this as a fundamental objection to Watson and Brown's (1978) proposals on the valuation of decision analysis. The rejection of any formal recommendations does not imply
that no useful result has been achieved; the process of structuring the problem and the information relating actions to outcomes may have been a significant influence on the organisational decision process. Rejection of formal proposals does, however, suggest that somewhere along the line there was a lack of understanding of organisational culture, decision processes, and political structure.
Organisational Goal Structures
"Organisations do not have objectives, only people have objectives" (Cyert and March, 1963). Despite this view, it is rare that the goals of an organisation can be clearly identified as those of one individual participant, and equally rare to find an individual all of whose goals are satisfied by one organisation. It is clear that individuals satisfy their goals by multiple means inside and outside the organisation and that they adjust them in the light of experience. What is less clear is the process by which organisational goal structures emerge from individual ones.
Goal structures serve functions of coordination and direction of organisations. Ideally, at each level in the organisation the overall goal structure should be broken down into goals appropriate to the problems that are encountered at that level while remaining consistent with the goal structure at higher levels. Decision problems are usually perceived in terms of a difference between some actual or forecast state and a desired state. Whilst the desired state or goal may be re-evaluated and adjusted in the light of experience, an absence of goals or aspiration levels removes the possibility of developing any coherent framework for action.
The extent to which members of organisations have similar individual goal structures is affected by the way in which the organisation selects and admits recruits, the self-selection of potential recruits and the extent to which newcomers are socialised into the organisational culture. Participants join internal coalitions to further personal goals, though the goals pursued by the coalition are an amalgam of members' goals where the relative importance of individuals' goals depends on their power. Some useful examples may be found in MacMillan (1978). The nature of the dominant coalition in the organisation depends on the patterns and diffusion of power in the organisation. Richards (1978) provides the following categorisation of dominant coalitions:
- an executive coalition, based on the chief executive and those to whom he or she delegates power,
- a bureaucratic coalition, where essentially the chief executive has lost power to the component departments,
- a political coalition, where no one group has sufficient power of itself, and political skills become dominant,
- an expert coalition, where power stems from individual skills and expertise; Handy (1981) describes this as a 'person culture'.
These different patterns of dominant coalitions result in markedly different goal structures. In the first case, there tend to be clearly stated corporate goals of a familiar conventional nature; in the second, overall goals are very much secondary to sub-unit interests; in the highly politicised organisation, it may be very difficult to identify any general goals at all; finally, in the expert dominated organisation, the overt pursuit of individual goals predominates.
Unless power between coalitions is evenly balanced or coalitions are potentially unstable, decision analysts are introduced into the organisation by members of the dominant coalition. It may be seen that the strategy of the analyst will vary according to the nature of that coalition. The easiest one to deal with is the executive coalition, where problems and preferences are likely to be sufficiently articulated to produce an analysis acceptable to the decision maker and with the top-down commitment which is seen as the most successful route to change by specialists in Organisational Development (Porter et al., 1975). In the bureaucratic coalition, analysis at the departmental level is likely to be similarly successful, but attempts to develop strategies at a higher level seem likely to fail. Within politicised and 'person culture' organisational contexts, there will be no single individuals with sufficient commitment and power to implement a set of recommendations, even if it were possible to discern an overall set of goals sufficiently to make formal recommendations at all, though it may be possible to contribute to organisational decision processes here by structuring available options and focussing attention on areas of agreement and disagreement.
Pulling Aside the Shroud of Ambiguity

While it is frequently claimed as an advantage of decision analysis that it throws light into areas that have previously been vague and shadowy, there remains what March (1978) has called the 'optimal clarity problem'. One aspect of this problem concerns the extent to which preferences need to be clarified. Explicit discussion of trade-offs and relative preferences has a tendency to accentuate differences in opinion and viewpoint. Sensitivity analysis on the final solution of an analysis frequently shows
that it is remarkably stable with respect to varying opinions and preference structures. The analyst may then concentrate on those differences of view to which the decision is sensitive, rather than creating unnecessary conflict at an earlier stage.
The requirement that goals be clearly articulated at the outset conflicts with views (e.g., Weick, 1979; Humphreys and McFadden, 1980) that goals are adjusted or changed as a consequence of investigating or experiencing the outcome of behaviour. Having 'fuzzy' goals may serve similar functions to other heuristics in a dynamic environment (Einhorn and Hogarth, 1981). The insistence on a clear definition of goals at a particular point of time may lead to an impoverished set, which will not produce a decision that is 'better' in whatever subjective terms clients may choose to apply.
One also has to consider the impact of attempts to structure decision problems on intermediate groups. The problem is particularly acute when one is dealing with middle managers, who are frequently unwilling to expose either their forecasts of actual outcomes or their own preferences to senior management for fear of ending up in a position they feel would be untenable. They would be responsible for the implementation of a chosen strategy and would carry the can for its success or failure. Conventionally, middle managers survive by the exploitation of ambiguity in terms of goals and strategy outcomes. The difficulties faced by them are discussed by Uyterhoeven (1971). The maintenance of some level of ambiguity enables subordinates to protect their interests to some extent vis-à-vis the dominant coalition in the organisation. The pressure from senior management to expedite decision processes has a tendency to become an extension of their control over the organisation and a reduction of the autonomy of middle management. Thus there are advantages in not placing too much emphasis on precision. The need to maintain access and not to provoke political resistance unnecessarily seems to dictate the need for a certain level of flexibility in the definition of goals and for certain guarantees to intermediate participants, in order to prevent the collapse of an analysis due to political pressure or the production of information later aimed at discrediting it.
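The sensitivity analysis referred to at the start of this section can be illustrated with a minimal sketch: vary one attribute weight over a range, rescale the others so the weights still sum to one, and see whether the preferred option changes. The options, attribute values and weights below are invented for the illustration.

```python
# One-way sensitivity analysis on an attribute weight in a simple additive model.
# Options, attribute values and weights are invented for illustration.

OPTIONS = {"Option A": [0.8, 0.4, 0.6],
           "Option B": [0.5, 0.9, 0.5],
           "Option C": [0.6, 0.6, 0.7]}
BASE_WEIGHTS = [0.5, 0.3, 0.2]   # attribute weights summing to 1

def preferred(weights):
    scores = {name: sum(w * v for w, v in zip(weights, values))
              for name, values in OPTIONS.items()}
    return max(scores, key=scores.get)

# Sweep the weight of the first attribute; keep the other weights in proportion.
for step in range(1, 10):
    w1 = step / 10
    rest_total = sum(BASE_WEIGHTS[1:])
    weights = [w1] + [w * (1 - w1) / rest_total for w in BASE_WEIGHTS[1:]]
    print(f"weight on attribute 1 = {w1:.1f}  ->  preferred option: {preferred(weights)}")
```

Where the preferred option changes only at weights no participant would defend, the differences of view are not worth fighting over; where it changes within the plausible range, those are exactly the differences on which the analyst should concentrate.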
Problem Areas Encountered in Practice

Who made the approach? Difficulties seem to occur when the initiative for a decision analysis does not come from the actual decision maker(s). Particularly in government organisations, the initial approach may come
from individuals higher up the organisation. In some cases, analysts have made the first approach.
The initial 'contract'. As with many services provided by consultants, there is a danger of promising too much to a client in order to secure a contract. It is also at this stage that the 'problem' to be studied is defined, when the analyst may not have a full understanding of the client organisation. The result is that the full or 'real' problem is not always defined or analysed and that it is difficult to reopen the issue of problem definition in order to explore it more deeply. The inhibitions about discussing real issues have been vividly highlighted by Rex Brown's anecdote about an exercise with a governmental agency, when the study only really took off after the definition of a significant attribute named 'administrative morale'.
The process. The degree of complexity of the required tasks is a recurrent issue. Both tasks that are seen as tedious and demeaning (Lock, 1979, 1982) and tasks whose complexity conceals their relevance or purpose have been advanced as causes of difficulty. Complex requisite models do not necessarily entail complex elicitation tasks.
The nature of the innovation. When the goal of a particular analysis is the development of a technology for dealing with repetitive decision problems, radical departures from previous practice are less likely to gain acceptance. This is dependent also on the type of culture of the organisation (Handy, 1981). Radical changes are most likely to be accepted in the task culture, for reasons previously discussed, and in the power culture if the dominant individuals favour them. The role culture (or mechanistic organisation) is singularly ill-adapted to radical changes and is only likely to accept them if imposed by an external agency under an extreme threat. Finally, the person culture, for example the medical profession, proves particularly resistant, as innovations like decision analysis come from outside narrow professional areas and erode their mystique.
The acceptability of the analyst. In organisations or situations with significant professionalised groups, it is important that the analyst is either of a similar professional background or has a status or social characteristics (e.g., in the U.K., social class or even belonging to the right club) that render him or her acceptable. Status may be derived from the organisation within which the analyst is based or from previous work of an academic or professional nature.
The target group and the level of control. With sufficient support from the top, business organisations find it relatively easy to impose innovations on individuals lower down the system. The more diffuse the target group (again medical decision makers are a prime example), the more difficult it is for proposals and innovations to be adopted.
Implementation. It is easy to assume that a client knows how to implement a particular option chosen by a decision analysis. Choices that may involve a greater level of conflict than the individuals in the organisation have previously faced will not be implemented unless those individuals can be shown how to confront it.
Developing Clinical Skills and Strategies

Freud (1953) considered that resistance on the part of the subject towards the psychoanalyst's interpretations was a healthy sign. Its absence would indicate complete vulnerability to outside influences. We do not all take the same view when it comes to rejection of the recommendations of a structured and apparently scientific approach to problem-solving. A frequent response is to grumble about irrationality or to make veiled comments about Luddites. One has to recognise, however, that techniques or technologies that tend to exclude decision makers from much of the process, or take over from them, will be perceived as threats to their autonomy and their conclusions will be resisted. Indeed 'irrational' resistance may be the only available strategy. There is also the more general resistance to change in particular organisational structures that has been discussed earlier. The problem is to develop skills that will reduce subject resistance. Fischhoff (1977) considers the problems that may arise in decision analysis as a result of low awareness of clinical issues, and defines the skills relevant to psychotherapists as follows: "... they must instil confidence in clients, choose the appropriate questioning procedures to elicit sensitive information, handle crises, understand what is not being said, avoid imposing their own values and perceptions, and develop co-operative solutions". The implication is that the skills required by decision analysts are similar. The problems involved may be broken down into seven areas: the initial 'contract'; the role adopted by the analyst; diagnosing, exploring and structuring the problem; the analysis process; presenting solutions; implementation; and dealing with conflict.
The initial 'contract'. At the outset, entry to an organisation is based on a 'contract', by no means always written and certainly more complex than a formal written one. It is clear that in a number of decision analyses the expectations and goals of the analyst and the client were markedly different. Some understanding of expectations and goals and a clarification of the likely process, procedures, and level of involvement are important for the overall success of the exercise. On the other hand, starting off with a rigid definition of the problem to be studied can prove to be a source of difficulty. The client's initial definition of the perceived problem
represents a set of symptoms or a 'felt need'. The nature of the underlying problem to be studied is likely only to emerge later. It seems a little naive to assume that the analyst can keep the terms on which he or she enters the organisation entirely flexible. Clients have distinct expectations of what should be considered and want to have some idea of the end result. Analysts sometimes also want to gain entry as an end in itself (for pecuniary or intellectual reasons) and as a consequence tend to commit themselves over-readily to some broad expected outcome. Some time often has to be spent by the analyst in the initial phase on explaining the requirements of the process in terms of inputs and the degree of access to others in the organisation. Even quite sophisticated managerial decision makers conceive of decision analysis as simply drawing decision trees. The need for extensive problem re-structuring and the elicitation of preferences and utilities should be established at the outset. It should also be made clear that the analyst may wish to interview other members of the organisation to gain additional inputs. In highly politicised organisations, sponsoring coalitions or individuals may attempt to control access to the analyst or the analyst's access to other participants. This should be considered in planning the initial structuring of the problem and in deciding which groups' views have to be taken into account in devising acceptable strategies and representing preferences. In a typical decision analysis, interested groups might include owners (the state, shareholders, community, etc.), employees, consumers, managers and society. It is important to realise that almost all the inputs come from a small subset of managers. Serious consideration has to be given to the extent to which one should include other groups and to the legitimacy of the goals of the sponsor (Kunreuther, this volume), but this can lead to a 'requisite' decision model rather different from that anticipated, or welcomed, by the sponsor.
The role adopted by the analyst. The literature on organisational change tends to take a philosophical stand on what should guide change agents. Bennis (1966) sees the goal as helping the system to become healthier, in terms of a better ability to test and adapt to reality. The change agent is faced with a spectrum of possible roles ranging from the 'expert', through the 'adviser', to the 'trainer'. In the expert role the structuring and analysis is performed for the system but outside it. In contrast, the goal of the trainer is that the organisation members should carry out the structuring, analysis and strategy formation for themselves, and should be able to use the techniques in the absence of the analyst. Adviser roles involve a joint process between the analyst and organisation members, with the former retaining some special expertise. Obviously,
there is a wide range of options within this framework and there is no necessity for the same role to be adopted throughout the interaction. Involvement and the resulting commitment on the part of decision makers increase as one moves from the 'expert' to the 'trainer' role. The tendency for conventional decision analysis to follow the first path partially explains the resultant low commitment to conclusions and recommendations in a number of cases. It is clear that the adviser and trainer roles are more likely to be required when there is not a dominant executive coalition.
Diagnosing, exploring and structuring the problem. Client definitions of 'felt needs' are based on discrepancies between some desired states and the actual current or future state. The question at this stage is to determine what problem or problems the analysis ought to consider. One view of the decision analyst is that of a passive encoder of client-provided information, the 'man with the slide-rule' as Rex Brown has put it. The problem with this is that it assumes that the decision maker has a fully developed representation of the problem. In fact, much of the value of decision analysis seems to come from the structuring phase (Jungermann, 1980; Humphreys, 1982), which develops subjects' representations of the situations and problems. This aids the retention, acquisition and use of information, which takes place within the structure of individual schemata (Abelson, 1976). We may point out several specific areas in which probing by the analyst is of considerable use. The first is the elicitation of the range of goals and decision criteria. These in turn assist the definition of the range of alternative actions (Jungermann et al., this volume). The second is exploring how actions are linked to outcomes. As well as specific questions of how different aspects of the situation are affected by particular actions, it is also necessary to identify who will be affected by a particular decision and their likely response. Crucial skills at this stage are the ability to listen and the ability to mirror and feed back the elicited information. The goal structuring phase starts with the elicitation of a list of criteria from the decision maker (it helps, particularly with business decision makers, to avoid terms like goals and objectives, whose normative overtones hinder the process). These can be extended by focussing on the aspects on which decision alternatives differ, as in the MAUD programme (Humphreys and McFadden, 1980). Bauer and Wegener (1975) and Humphreys (1982) have suggested that analysts should try to get subjects to imagine themselves in some future state or scenario and to come up with aspects they feel to be important. The analyst may also suggest criteria, while being careful not to impose these on the decision maker.
At this point, the elicited aspects need some degree of organising. Firstly, they can be split into more global and more specific ones. The decision maker can then explore the links between the various attributes, arranging them in some form of goal hierarchy. Some new attributes may be found; others may be merged or renamed. Frequently, the elicitation of the goal hierarchy adds alternatives to the set defined at the outset by the decision maker. Further ones may be found by exploring the possibility of breaking a large, difficult choice into a number of reversible stages, the possibility of mixtures of options, and the way in which the various options relate to the goals. The latter process is extended in exploring the full structure of the options and the particular areas about which the decision maker feels uncertain (Jungermann, 1980). The aim of the structuring phase is to generate a 'requisite' model, pruned as far as possible to leave a description that can be discussed with the decision maker to aid his or her understanding of the problem. As far as is possible, only diagnostic events and critical tradeoffs are left in. The remaining information gathered may be used in later sensitivity analyses.

The analysis process. Having arrived at a 'requisite' structure (conditionally, at least), one can move to the collection of data on the probabilities and preferences identified within the representation of the structured decision problem. Provided the decision maker has no pressing reason for withholding accurate information, better estimates are obtained when it is understood what they are to be used for. Hence care should be taken to explain the nature of the specific tasks involved, and multiple simple tasks rather than single complex ones should be employed.

Presenting solutions. Ultimately the analysis has to be useful to decision makers and they have to feel confident about it. The presentation of a single best option does not always inspire this confidence. Strict optimisation is less attractive than the ability to explore the problem. A major role for computer-based models is the facilitation of manager-based sensitivity analyses and the ability to respond to 'what-if' questions. Decision makers acquire commitment to the solution by feeling both that they have some control over the solution and that they have contributed to its development. An alternative approach, where such exploration is for various reasons infeasible, is to present the decision maker with a small number of alternatives with their predicted outcomes.

Implementation. One of the major problems of proposed changes is that ranges of fears and expectations are raised which largely define individuals' attitudes to those changes. Many of these hopes and fears will
not be realised, and need to be modified if possible. In political or expert coalition dominated organisational structures, any discussion taking place about the choice of particular options will be vitiated by the concerns of participants about possible personal gains or losses within the overall outcomes. Prior agreement about how these could be redistributed after a decision can markedly reduce the conflict involved in the actual choice. It is important to realise that the tacit consent, at least, of most individuals in an organisation and the open commitment of key individuals are required for the successful implementation of a chosen option (Porter et al., 1975). It is worth invoking at this point a little of the literature on planned change. Kurt Lewin defined three elementary stages to the process: unfreezing the existing situation, changing it, and refreezing it once the desired changes have occurred (see, for example, Zaltman, 1973). The process of unfreezing occurs when levels of certain stimuli rise above a given threshold (Boulding, 1963). Thus change may occur as a response to an external threat providing a sufficiently strong stimulus. A possible internal strategy is to raise uncertainty about the present situation sufficiently for participants to start contemplating the possibility of changes. The presence of an outsider and the decision analysis process can form the source of this uncertainty that unfreezes the existing situation. One should note that the implication of planned change models is that there is a high level of control and quality of feedback for the group seeking to initiate change. A rather more plausible view is that in most organisations the impact of changes and the subsequent organisational equilibrium are not readily predictable (Weick, 1979). This is an issue that seems to have been played down in the organisational development literature. The main elements in devising an implementation strategy relate to identifying the key groups and individuals, how they can be induced to contemplate change and how they will respond to any particular proposals. The person with the perceived responsibility for the decision should be in charge of the implementation strategy rather than the decision analyst. The reason for its inclusion here is that both the feasibility of implementing a particular decision within a particular organisation and the way it is actually implemented are not separable from the analysis process. In the first case there is little point recommending policies that cannot be carried out for various reasons, and in the second, some responsibility has to be taken for the impact of the chosen policy both within and outside the organisation.
Dealing with conflict. Real choice can be painful in organisational terms. In stagnant or declining organisations, change usually implies strong negative consequences. It is clear that decision makers often dislike conflicts, particularly highly personal ones, and seek to avoid them. In many situations, changes and decisions are postponed until they are imposed by an external agency. By this time, the organisation's survival may even be threatened. The problem of handling conflict may thus be a barrier to the adoption of otherwise 'optimal' choices. For example, the author has observed decision makers in a U.K. higher education institution whose strategy is to try to maintain the present balance of subjects, courses and faculty. If choices are to be made in terms of what areas to cut and which to expand, these will be made essentially by the intervention of outside bodies controlling the purse-strings. There is no implicit belief that the current balance of work is optimal, except in terms of the inertia of the status quo, an overriding desire to avoid compulsory redundancies, and concern about levels of conflict resulting from any other choice. It may be that in many cases where power is widely diffused and the organisation is highly politicised, this is the only feasible strategy. This unfortunate conclusion may mean that the organisation has little control over its own survival. The alternative is to consider to what extent it is possible to improve the client's or the client organisation's ability to deal with conflict. Porter et al. (1975), Thomas (1976) and MacCrimmon and Taylor (1976) discuss a number of ways of resolving conflict. In cases where conflict is not directly resolvable one has to consider whether it is feasible to assist people to handle overt conflict and to confront political issues openly. This appears to be a development strategy over a longer time horizon than is usually available in a decision analysis study.
Conclusions

Most of this paper has been concerned with developing the consciousness of analysts and would-be analysts about the organisational issues involved in practical applications. Analysts tend to enter decision problems with a large amount of normative and ideological baggage. Because of the tendency to be employed by the wealthier or more powerful segments of society, analysts have to accept responsibility for the use to which their techniques and technologies are put, and to recognise the legitimacy of interests other than those of the original client group. On the behavioural side, sensitivity to the organisational climate and the organisational consequences of any decision are likely to be crucial to
both the likelihood of implementation and the success of such an implementation. The view of the role of the analyst as a change agent enables one to focus on the strategy that should be adopted in this role and the degree of client involvement that should be sought. This in its own way also tends to be an ideological issue reflecting the analyst's goals. Considerable emphasis has been given to the need, in relation to problem structure and organisational goals, to avoid over-definition, particularly in the initial stages. This is a recurrent theme in the suggestions of important clinical issues and strategies at the different phases of the decision analysis process, both to avoid premature closure and as a means of maintaining sufficient room to manoeuvre for the analyst. In considering the overall organisational issues surrounding a major decision, it is appropriate to investigate what additional skills are required by an analyst and to what extent other specialist skills should be called upon. It would probably be inappropriate to define a specific professional training which should be undergone by all would-be decision analysts, but one should be aware of the very wide range of skills that they seem expected to possess. Decision analysis does not stand alone as a tool or skill; practitioners need to be aware of its and their own limitations and of the point at which other approaches and other skills are more suitable to the problem and situation under consideration. It is possible to take a less pessimistic view of the relevance of decision analysis by emphasising its role in assisting organisational decision processes rather than viewing the latter as regrettable constraints to be circumvented wherever possible.
References

Abelson, R. P., 1976. Script processing in attitude formation and decision making. In: J. S. Carroll and J. W. Payne (eds.), Cognition and Social Behaviour. Hillsdale, N.J.: Lawrence Erlbaum.
Ackoff, R. L., 1979a. The future of operational research is past. Journal of the Operational Research Society, 30, 93-104.
Ackoff, R. L., 1979b. Resurrecting the future of operational research. Journal of the Operational Research Society, 30, 189-199.
Argyris, C., 1970. Intervention Theory and Method. Reading, Mass.: Addison-Wesley.
Argyris, C., 1976a. Explorations in consulting-client relationships. In: W. G. Bennis et al., q.v.
Argyris, C., 1976b. Theories of action that inhibit learning. American Psychologist, 31, 638-654.
Bauer, V. and M. Wegener, 1975. Simulation, evaluation and conflict analysis in urban planning. Proceedings of the I.E.E.E., 63, 405-413.
Bell, D., 1979. Thinking ahead. Harvard Business Review, 57, May-June, 20-42.
Bennis, W. G., 1966. Changing Organisations. New York: McGraw-Hill.
Bennis, W. G., K. D. Benne, R. Chin, and K. E. Corey (eds.), 1976. The Planning of Change. New York: Holt, Rinehart and Winston, 3rd ed.
Boulding, K. E., 1959. National images and international systems. Journal of Conflict Resolution, 3, 120.
Boulding, K. E., 1963. The place of the image in the dynamics of society. In: G. Zollschan and W. Hirsch (eds.), Explorations in Social Change. London: Routledge, Kegan Paul.
Burns, T. and G. M. Stalker, 1961. The Management of Innovation. London: Tavistock.
Campbell, D. T., 1975. "Degrees of freedom" and the case study. Comparative Political Studies, 8, 178-193.
Cyert, R. M. and J. G. March, 1963. A Behavioural Theory of the Firm. Englewood Cliffs, N.J.: Prentice-Hall.
Dale, A., 1974. Coercive persuasion and the role of the change agent. Interpersonal Development, 5, 102-111.
Dunnette, M. D. (ed.), 1976. Handbook of Industrial and Organizational Psychology. Chicago: Rand McNally.
Eden, C., S. Jones, and D. Sims, 1979. Thinking in Organisations. London: Macmillan.
Eilon, S., 1980. The role of management science. Journal of the Operational Research Society, 31, 17-28.
Einhorn, H. J. and R. M. Hogarth, 1981. Behavioral decision theory: Processes of judgement and choice. Annual Review of Psychology, 32, 53-88.
Elbing, A. O., 1970. Behavioural Decisions in Organizations. Glenview, Ill.: Scott, Foresman.
Fischhoff, B., 1977. Decision analysis: Clinical art or clinical science? Paper presented at the 6th Research Conference on Subjective Probability, Utility and Decision Making, Warsaw, 1977.
Freud, S., 1953. Inhibitions, symptoms and anxiety. In: The Standard Edition of the Complete Works of Sigmund Freud. London: Hogarth, Vol. 20, 77.
Galbraith, J. R., 1973. Designing Complex Organizations. Reading, Mass.: Addison-Wesley.
Hammond, K. R. and L. Adelman, 1976. Science, values and human judgement. Science, 194, 389-396.
Handy, C. B., 1981. Understanding Organizations. Harmondsworth, Middlesex: Penguin, 2nd ed.
Humphreys, P. C., 1980. Decision aids: Aiding decisions. In: L. Sjöberg, T. Tyszka, and J. A. Wise (eds.), Decision Analysis and Decision Processes. Lund: Doxa.
Humphreys, P. C., 1982. Value structures underlying risk assessments. In: H. Kunreuther (ed.), Risk: A Seminar Series. Laxenburg, Austria: International Institute for Applied Systems Analysis.
Humphreys, P. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-69.
Hogarth, R. M., 1980. Judgement and Choice. Chichester, England: Wiley.
Hogarth, R. M., 1981. Decision making in organizations and the organization of decision making. Working paper. University of Chicago, Center for Decision Research.
Hogarth, R. M. and S. Makridakis, 1981. Forecasting and planning: An evaluation. Management Science, 27, 115-138.
Jaques, E., 1976. A General Theory of Bureaucracy. London: Heinemann.
Jungermann, H., 1980. Speculations about decision-theoretic aids for personal decision making. Acta Psychologica, 45, 7-34.
Jungermann, H., I. von Ulardt, and L. Hausmann, 1983. The role of the goal for generating actions. In this volume, 223-236.
Kunreuther, H., 1983. A multi-attribute multi-party model of choice: Descriptive and prescriptive considerations. In this volume, 69-89.
Lansley, P., P. Sadler, and T. Webb, 1974. Organization structure, management style and company performance. Omega, 2, 467-485.
Lock, A. R., 1979. Multiple criterion strategic marketing problems: An analytical approach. Unpublished Ph.D. Thesis, University of London.
Lock, A. R., 1982. A strategic business decision with multiple criteria: The Bally men's shoe problem. Journal of the Operational Research Society.
MacCrimmon, K. R. and R. N. Taylor, 1976. Decision making and problem solving. In: M. D. Dunnette (ed.), q.v.
MacMillan, I. C., 1978. Strategy Formulation: Political Concepts. St. Paul, Minnesota: West Publishing.
March, J. G., 1978. Bounded rationality, ambiguity and the engineering of choice. Bell Journal of Economics, 9, 587-608.
Mintzberg, H., 1978. Patterns in strategy formation. Management Science, 24, 934-948.
Neave, E. H. and E. H. Petersen, 1980. A comparison of optimal and adaptive decision mechanisms in an organizational setting. Management Science, 26, 810-822.
Neisser, U., 1976. Cognition and Reality: Principles and Implications of Cognitive Psychology. San Francisco: W. H. Freeman.
Nisbett, R. and L. Ross, 1980. Human Inference: Strategies and Shortcomings of Social Judgement. Englewood Cliffs, N.J.: Prentice-Hall.
Nord, W. R., 1974. The failure of current applied behavioural science: A Marxian perspective. Journal of Applied Behavioural Science, 10, 557-578.
Pettigrew, A. M., 1973. The Politics of Organizational Decision-Making. London: Tavistock.
Pettigrew, A. M., 1974. The influence process between specialists and executives. Personnel Review, 3, 24-31.
Pfeffer, J., 1981. Power in Organizations. London: Pitman.
Phillips, L. D., 1980. Organisational structure and decision technology. Acta Psychologica, 45, 247-264.
Plott, C. R. and M. E. Levine, 1978. A model of agenda influence on committee decisions. American Economic Review, 68, 146-160.
Porter, L. W., E. E. Lawler, and J. R. Hackman, 1975. Behaviour in Organizations. New York: McGraw-Hill.
Pugh, D. S. and D. J. Hickson, 1976. Organizational Structure in Its Context: The Aston Programme I. Farnborough, England: Saxon House.
Richards, M. D., 1978. Organizational Goal Structures. St. Paul, Minnesota: West Publishing.
Simon, H. A., 1978. Rationality as process and as product of thought. American Economic Review, 68 (2), 1-16.
Simon, H. A., 1979. Rational decision making in business organizations. American Economic Review, 69, 493-513.
Slovic, P., B. Fischhoff, and S. Lichtenstein, 1977. Behavioral decision theory. Annual Review of Psychology, 28, 1-39.
Staw, B. M., 1976. Knee-deep in the big muddy: A study of escalating commitment to a chosen course of action. Organizational Behavior and Human Performance, 16, 27-44.
Staw, B. M. and G. R. Salancik (eds.), 1977. New Directions in Organizational Behaviour. Chicago: St. Clair Press.
Thomas, K., 1976. Conflict and conflict management. In: M. D. Dunnette (ed.), q.v.
Uyterhoeven, H. E. R., 1972. General managers in the middle. Harvard Business Review, 50, March-April, 75-85.
Vroom, V. H. and P. W. Yetton, 1973. Leadership and Decision-Making. Pittsburgh, Pa.: University of Pittsburgh Press.
Watson, S. R. and R. V. Brown, 1978. The valuation of decision analysis. Journal of the Royal Statistical Society, Series A, 141, 69-78.
Weick, K. E., 1979. The Social Psychology of Organizing. Reading, Mass.: Addison-Wesley, 2nd ed.
Weiss, C. H., 1980. Knowledge creep and decision accretion. Knowledge: Creation, Diffusion, Utilization, 1, 381-404.
Zaltman, G. (ed.), 1973. Processes and Phenomena of Social Change. New York: Wiley.
PITFALLS OF DECISION ANALYSIS¹

Detlof von WINTERFELDT
Social Science Research Institute, University of Southern California, U.S.A.

¹ With contributions by R. V. Brown, J. S. Dyer, W. Edwards, D. H. Gustafson, P. Humphreys, L. D. Phillips, D. A. Seaver, A. Vári, and J. Vecsenyi.
Introduction
Much of the craft of decision analysis consists of recognizing and avoiding pitfalls, which exist at each stage of an application between problem formulation and model implementation. Consider the nightmare of solving the wrong problem for a misconstructed client with an inappropriate model and gamed numerical inputs from experts. In this case one would almost hope that the model is never used or implemented, which, incidentally, is another common pitfall of decision analysis. Unfortunately, the published literature on applications of decision analysis provides little information about such pitfalls or the process by which analysts try to avoid them. The published "success stories" are usually cut and dried and, except for a few cases, they appear to be relatively straightforward (and often dull) applications of standard techniques. The failures are, of course, never published. Yet there is much to be learned from failures and mastered problems, especially if one wants to go beyond the technical surface of decision analysis and understand the craft of its application. One purpose of this paper is to systematically explore some typical traps and pitfalls of decision analysis, and thereby increase the practitioner's awareness of these problems. Another purpose is to present examples and lessons of how pitfalls, once recognized, can best be avoided. Further stimulation for this paper came from a series of articles and books (Majone, 1977a, b; Ravetz, 1973; Majone and Quade, 1980) which discuss, on a more general level, the pitfalls of analysis. Much of this discussion applies directly to decision analysis, e.g., the pitfall of addressing the wrong problem. Other pitfalls
appear to be handled well by decision analysis, e.g., the neglect of intangibles. Still others seem unique to decision analysis, e.g., the "gaming" of utilities or probabilities. It is relatively easy to write about pitfalls in the abstract, but the most important lessons can be learned from problems and failures in real applications. To provide a rich set of pitfalls, several practitioners of decision analysis contributed examples of their own applied experiences.² This paper discusses their contributions, following the stages of decision analysis:

1. Developing an analyst-client relationship;
2. Defining the problem;
3. Organizing the analysis;
4. Structuring and modeling;
5. Elicitation of utilities and probabilities;
6. Using and implementing the model.

For each stage, we will describe the general nature of the possible pitfalls, give several examples, and briefly discuss the lessons.

² The authors of these contributions are listed in the title of this paper. I would like to thank them all for their frankness and for letting me use their cases and materials. They are the real authors of this paper.
Developing an Analyst-Client Relationship: Users and Hidden Agendas

Decision analysis is still an emerging discipline and therefore largely supply driven. Often the analyst identifies a problem area for which decision analysis is useful and then approaches the potential client. While this picture is rapidly changing, it is still true that in a majority of cases the analyst "sells" the decision analytic approach to governmental agencies, businesses, or individual clients. As a partial consequence of this supply orientation and of the resulting eagerness of the analyst to practice his or her profession, a number of pitfalls can occur. These include misconstruction of the "real" client, misrepresentation of the "real" motives of the client, or simply confusion about who is to be served by the analysis and why. The problem is aggravated if the sponsors themselves are not sure about the ultimate user of the analysis. An example of multiple clients was a decision analysis application for evaluating community anti-crime (CAC) programs (see Snapper and Seaver, 1980a). The sponsor of this evaluation was the Law Enforcement
Administration Agency (LEAA), which also funded the programs that were to be evaluated. One of the purposes of the evaluation was therefore soon identified: to aid LEAA in its planning and funding decisions. However, two other clients of the proposed evaluation emerged soon: the CAC project managers, who wanted to learn how their projects could be improved, and Congress, which had to set policy about programs and LEAA. A natural pitfall of decision analysis in this situation would be to focus the evaluation solely on the needs and concerns of LEAA, in essence becoming a watchdog of the subordinate project managers, and using the evaluation to "sell" the program as a whole to Congress. The analysts avoided this pitfall with a quite elegant solution: they constructed a three-level value tree whose upper level represented the values of Congress; one second-level branch included LEAA's overall program objectives; and one branch of LEAA's objectives coincided with the local CAC program objectives. This approach may have somewhat idealized the harmony of the objectives of these major stakeholders, but it helped maintain an open mind about the use and users of the analysis. As it turned out, one of the more interesting implementations of the evaluation system was for project management purposes (see Snapper and Seaver, 1980b), rather than planning.

Another pitfall in the client-analyst relationship is hidden agendas: a client wants a favorable evaluation of an activity; a decision, already made, needs justification; or the analyst's technical skills and professional stature are to enhance the image of a project. Not all of these are necessarily detrimental to the analysis, but they are better handled when brought out in the open from the start. Hidden agendas sometimes occur in risk analysis applications of decision analysis if the client's intention is to use the analysis to prove that the product or technology is safe (or unsafe), rather than to solve a decision problem related to the safety of the product. In many such instances the analysis is in response to public or governmental concern about safety, and the client expects results that favor his or her position and therefore contribute to resolving the controversy in his or her favor. Ironically, risk analyses have frequently stirred up more controversy, rather than consolidating opinions (Mazur, 1980), partially because the public suspected such hidden agendas.

A third pitfall is a direct consequence of the supply-driven nature of decision analysis: analysts like to please their clients and sometimes promise too much. An example is the American College of Radiology (ACR) study of the efficacy of X-rays (see Lusted et al., 1978). The sponsors of the research and the analysts originally agreed that many X-rays were a waste
of time, and the analyst in essence promised that the study would show exactly that. It was soon recognized that efficacy was a multi-level concept. To be efficacious, an X-ray has to have the potential for changing the medical doctor's opinion, then to change his or her decision, and finally to change the state of the patient's health. Recognizing that it is easiest to begin with the first link in this efficacy chain, the analysis began with a study of whether X-rays would change medical doctors' opinions (this would later be called efficacy I). The hypothesis was that they would not, and that hypothesis was apparently widely shared. But the large-scale study showed that X-rays were indeed meeting the efficacy I criterion: they did change the diagnostic likelihood ratio judgements of the physicians. While some of the researchers are still convinced that X-rays would not meet the efficacy II criterion (changing physicians' actions), the efficacy II study was never funded.

What is there to be learned from these examples? First, understand your client, probe his or her motives, and carefully define the purpose of the analysis. This is usually simple when the client comes with a problem, so be more alert when you, as an analyst, approach a client. Second, actively search for organizations or individuals with a stake in the decision to identify the conflicting interests and motives. Third, establish an atmosphere of trust, encourage the client to reveal hidden agendas, and stress that such agendas can become a legitimate part of the analysis. And finally: don't promise too much or draw conclusions too early.
Defining the Problem: Real Problems, Apparent Problems, and Side Problems

An important part of the early phases of decision analysis is devoted to the definition of the problem. A common pitfall is to take the client's problem at face value. Another trap is to accept the client's restriction of the part of the problem for which he or she wants an answer. In contrast, analysts like to reformulate problems to make them accessible to decision analytic modeling, thereby fitting the problem to the model. Common to many of these pitfalls are the difficulties of defining the level of abstraction of alternatives (e.g., strategic vs. tactical) and objectives (e.g., overall organizational vs. cost-benefit) in response to an initially vague problem formulation. The first pitfall is illustrated by a study of chronic oil pollution from North Sea oil production platforms (von Winterfeldt, 1982; CUEP, 1976). The decision maker was the U.K. Department of Energy, which had to set standards on oily water discharges from offshore oil production
platforms. These standards were to prevent negative environmental impacts, while at the same time avoiding costly restrictions on offshore oil development. This appears to be a straightforward "choice among standards" problem, and it was, indeed, handled that way informally by the CUEP, and formally, though experimentally, in von Winterfeldt (1982). During the course of interacting with the U.K. government agencies, the analyst began to suspect, however, that setting oil pollution standards was not really the problem of the U.K. government. Apparently, several countries of the European Community had been pushing very hard for a clean North Sea, enforced by uniform European standards. In the past, the U.K. government had avoided numerical standards, and instead stressed the flexible nature of pollution sources and of the local environmental carrying capacity, so setting numerical standards required a change in policy. Perhaps a better problem formulation would have been how best to counter the European Community's push for uniform standards. Obviously, the alternatives would then have been much different from the maximum levels of oil concentration that both the CUEP and von Winterfeldt analyzed.

Pitfalls in problem formulation frequently occur in analyses for personal decisions. In an experimental study of such decisions (see John, von Winterfeldt, and Edwards, 1981), a woman initially formulated her problem as an apartment selection problem. The real problem turned out to be much more complex and involved financial questions and family interactions. The problem could have been formulated, for example, as a problem of managing her relationship with her parents. Another problem formulation would have involved alternatives for managing her financial situation. This case was interesting because, having drawn out these deeper aspects of the "apartment selection" problem, the analysis incorporated them into the objectives and maintained "apartments" as the alternatives. It was clear, however, that each apartment stood for a complex alternative for managing the subject's lifestyle.

Another example of addressing only part of the "real" problem has been described by Vári and Vecsenyi (this volume). The decision maker, a Hungarian governmental agency, had to select among possible mixes of R & D projects in a typical resource allocation problem. The analysts intended to consider the complementarities and redundancies between projects in an explicit model of conditional utilities and probabilities. However, the decision makers preferred to consider the utilities of the single projects separately and to use their own heuristics in combining the single project utilities into an overall evaluation. One reason for this preference was, presumably, that the decision makers wanted to engage some of their own values about
projects and project interdependencies without making them an explicit part of the analysis. Brown and Ulvila (1981) provide an example of concentrating on those parts of a decision problem which lend themselves easily to decision analytic tools (see also Ulvila and Brown, 1981). They developed a decision analytic model to aid the International Atomic Energy Agency (IAEA) in allocating inspection resources. On later reflection, it seemed that it might have been more useful to develop and evaluate alternative strategies for motivating inspectors within a given resource allocation context.

One conclusion of these examples is that a decision analyst should keep an open mind about problem formulations, including those for which decision analysis does not seem especially suited. Another conclusion is that it is often necessary to examine different levels of problem formulation in the early stages of analysis. Problems usually come in hierarchies. In the oil pollution example the most general problem formulation would have involved alternative policies for dealing with international pressure on the U.K. government to clean up the North Sea. Once regulation strategies were accepted, the U.K. faced alternatives for regulating oil pollution: case by case regulation, emission taxes, standards, etc. Once standards were decided upon, the questions remained at what levels these standards should be set, how the monitoring and inspection procedures should be organized, etc. Decision makers sometimes like to "push down" the problem definition because it makes the analysis more technical, requires less sensitive political judgements, and keeps the alternatives within their decision making capacity. This tendency to suboptimize is not necessarily detrimental. A wider problem formulation does not always lead to a substantial improvement in the overall organizational objectives, and often leads to a more expensive analysis. Yet the analyst should be aware of higher order problem formulations and the tradeoffs involved.
Organizing the Analysis: Institutional Obstacles and Obstinate Individuals

Setting up an analysis in an institutional environment often requires substantial managerial skill and political savvy. Decision makers have to be convinced that the analysis is useful and does not interfere with their activities or threaten their jobs. Experts have to contribute valuable time to perform what they may consider tedious and boring assessments. In politically controversial decisions, opposing stakeholder groups have to be
involved in the analysis. Sometimes the institutional constraints prevent the analysis from being carried out or being effectively implemented. Sometimes individuals or organizations simply refuse to cooperate.

A particularly interesting case of institutional constraints was discovered in a descriptive case study of TVA's power plant siting process (see Knop, 1980). One purpose of the descriptive study was to examine how a decision analysis process could improve on the siting procedure then in use by TVA. That procedure was essentially a four-stage screening and evaluation process in which, first, candidate areas were selected on the basis of power systems requirements. In the second stage numerous candidate sites were screened mainly on the basis of engineering, construction, and land-use features. More detailed comparative evaluations of cost, engineering, and environmental factors were done on the 3-5 sites that survived the screening. Finally, the preferred site was studied in great detail, mainly to meet federal and state regulations for environmental impact statements. An interesting feature of this process was the organizational arrangement. In the first two stages the process was coordinated and, in effect, controlled by the Division of Power Resource Planning, in the last two stages by the Division of Engineering Design and Construction. The Environmental Division, while technically reporting to the Board of Directors, was in effect a subcontractor performing surveys and research for the lead divisions. In addition, the Environmental Division became involved only in the later stages of the siting process. This institutional arrangement seemed to contradict TVA's policy, which put environmental considerations on an equal footing with cost, engineering, and power systems considerations. The real process of decision making could best be described as a sequential elimination process with shifting priorities, in the order of power systems (first stage), engineering (second stage), cost (third stage), and environmental factors (fourth stage). Decision analysis, with its very emphasis on tradeoffs, could have been implemented only with organizational rearrangements. Interestingly, such reorganization was a focal point of discussion at TVA at the time of the case study: most divisions, headed by the Environmental Division, pushed strongly for more lead time and an earlier involvement in the siting process, an institutional form of redressing the balance between TVA's multiple objectives.

A more straightforward pitfall of decision analysis is the refusal of important experts or stakeholders to cooperate. An example is Ward Edwards' school board study (Edwards, 1979). In the late seventies and up to 1981 the board of the Los Angeles Unified School District (LAUSD) was presented with several court orders to develop and implement a desegregation plan for the district. One order asked the school board to develop a
scheme for evaluating alternative plans. Under the leadership of Edwards, a multiattribute utility analysis was developed that was meant to incorporate the concerns of the opposing stakeholders. Edwards developed a common value tree from inputs of all relevant stakeholder groups, school board experts performed the single attribute evaluations of the available plans, and school board members assigned the weights. While the general level of cooperation of stakeholders was remarkable, there were notable exceptions: a pro-bussing group and one member of the board refused to provide inputs to the analysis. This did not cause a failure of the analysis, but it highlighted a potentially severe problem with decision analysis in politically controversial situations.

There are many other examples of institutional constraints and lack of individual cooperation. Some can be very serious, such as the refusal of some courts to allow probabilistic testimony. Others are more amusing, such as the refusal of an expert to express his or her opinions probabilistically because of a lack of "hard" knowledge. What can be done about it? Awareness of the institution, of the political arrangements and of individual psyches is an important step. Don't try to force an analysis that does not respect institutional barriers; it is likely to fail. Meshing analysis, institutions, and individuals requires compromises from everybody, especially from the analyst.
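The TVA siting procedure described above is, in effect, a sequential screening process: each stage filters the surviving candidates on a single class of criteria, so tradeoffs across classes are never made explicitly. The sketch below only illustrates that logic; the sites, scores, thresholds and stage order are invented and do not come from the TVA case.

```python
# Sketch of a sequential screening / elimination process with shifting
# priorities. Sites, scores, and thresholds are all hypothetical.

candidate_sites = {
    "Site 1": {"power_system": 7, "engineering": 8, "cost": 5, "environment": 3},
    "Site 2": {"power_system": 8, "engineering": 6, "cost": 7, "environment": 8},
    "Site 3": {"power_system": 9, "engineering": 7, "cost": 4, "environment": 6},
    "Site 4": {"power_system": 6, "engineering": 9, "cost": 8, "environment": 9},
}

# Each stage screens on one criterion with its own cutoff.
stages = [("power_system", 7), ("engineering", 7), ("cost", 4), ("environment", 5)]

survivors = dict(candidate_sites)
for criterion, threshold in stages:
    survivors = {s: v for s, v in survivors.items() if v[criterion] >= threshold}
    print(f"after screening on {criterion}: {sorted(survivors)}")

# Note that Site 4, strong on environment, is eliminated at the first stage,
# while Site 1, weak on environment, survives until the last stage; at no
# point is a tradeoff between the two classes of criteria ever considered.
```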
Developing a Structure and Model: Bushy Messes and Analytic Myopia

The initial steps of decision analysis are largely political in nature. While the structuring, elicitation and implementation steps involve more of the craft and science of decision analysis, they nevertheless have their own traps. A frequent structuring pitfall is the acceptance of a traditional decision analytic paradigm that does not quite fit the problem. A problem is quickly labelled a "typical riskless multiattribute problem" or a "typical signal detection problem". Dynamic aspects and feedbacks in decision problems are often overlooked, because the traditional static structures of decision analysis do not handle them well. A related pitfall is to structure a problem too quickly in too much detail. Bushy messes in the form of overly detailed and often redundant decision, value, and inference trees are a frequent problem of beginning decision analysts. In the development of an appropriate inference, evaluation, and decision model, there are additional pitfalls: dependencies in inference models, redundancies in evaluation models, and dynamic features in decision models, to name only a few.
The previously mentioned school board study (Edwards, 1979) illustrates a structuring pitfall with value trees. Edwards' value tree had 144 twigs, far too many according to his own rules (Edwards, 1977). Edwards began the value tree building process by constructing a relatively modest "strawman" tree. The tree was then shown to members of various stakeholder groups, and their response was typically to add values. This addition of values not only increased the size of the tree but also created interdependencies and redundancies. The analyst accepted both the dependencies and the redundancies as a fact of life and the only reasonable way to deal with the political nature of the task he had set out to do.

A fascinating case of misstructuring and mismodeling is the analysis of Ford's decision whether to place the Pinto gas tank in front of or behind the rear axle. The decision was based on a probabilistic cost-benefit analysis in which the chances of fires in rear-end collisions were traded off against the costs of lives lost in these fires. As it turned out, the expected dollar value of lives saved by placing the tank in front of the rear axle was smaller than the cost of the tank relocation. Ford therefore chose not to relocate. The problem with this model and structure is its myopia. It neglected the possible negative publicity that resulted from frequent occurrences of fires in rear-end collisions and it did not consider the possibility of punitive damages in liability suits. The analysis derived the value of lives from insurance premiums and past court awards. At the time of the analysis that value was estimated to be between $300,000 and $400,000 (in 1970 dollars). From the company's point of view, this number represented the average loss it would face if sued for liability. Punitive damages could, of course, be in the millions. The case had an ironic twist. Ford was sued in several instances of fatalities and injuries caused by fires in rear-end collisions. In one suit the analysis was made public, and that publication was partly responsible for punitive damages in the millions. The irony is that had Ford considered such punitive damages, it might very well have concluded that the expected costs of not relocating the tank would be larger than the costs of relocating it.

Perhaps the best way to avoid the pitfalls of the above examples, and the many more that were mentioned at the beginning of this section, is to develop several alternative structures and models early in the decision analysis, and run several sensitivity analyses to see what matters and what does not. Further, the analyst should encourage creativity in structuring, and in inventing events and anticipating obscure secondary and tertiary impacts of the decisions. One way to improve this process is by involving groups of decision makers and experts covering a wide range of experiences, opinions, and values.
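The myopia in the Ford analysis is easy to see once the comparison is written out as expected costs. The sketch below is a back-of-the-envelope illustration only: apart from the $300,000-$400,000 liability value quoted above, every figure is invented and does not come from the actual Ford study.

```python
# Back-of-the-envelope sketch of the Pinto-style expected-cost comparison.
# Only the ~$300,000-$400,000 liability figure comes from the text; every
# other number here is invented purely to illustrate the logic.

units_sold           = 10_000_000     # hypothetical
relocation_cost_unit = 11             # $ per car, hypothetical
expected_fire_deaths = 180            # hypothetical incremental fatalities
loss_per_fatality    = 350_000        # mid-range of the figure quoted above

cost_of_fix   = units_sold * relocation_cost_unit
myopic_losses = expected_fire_deaths * loss_per_fatality
print(cost_of_fix, myopic_losses)     # on these terms the fix looks too expensive

# The neglected terms: punitive damages and adverse publicity.
punitive_awards  = 20 * 3_000_000     # hypothetical: a few suits in the millions
publicity_damage = 100_000_000        # hypothetical loss of sales and goodwill
full_losses = myopic_losses + punitive_awards + publicity_damage
print(full_losses > cost_of_fix)      # the conclusion can reverse
```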
Elicitation: Gamed Numbers and Numbers Games

Elicitation of utilities and probabilities is the technically best developed part of decision analysis. The main problems that an analyst encounters in this phase are inadvertent biases, misrepresentations, or even lies by respondents. A frequent bias is optimism or pessimism in assessments of probabilities of events in which the assessor has a stake. Misrepresentations can also occur in utility assessment, to bias the tradeoffs in order to make one alternative "come out better". And finally, there is the possibility of outright gaming of the numerical inputs of the analysis.

Perhaps the most intriguing example of this sort occurred in Dyer and Miles' (1976) evaluation of alternative trajectories for NASA's Jupiter/Saturn mission. In this study several science teams assigned utilities to alternative trajectories by considering the value of the research contribution of each trajectory. Since different science teams had different research tasks and interests, these utilities varied widely. Using the single team evaluations as inputs, several formal rules for collective decision making were applied to find an acceptable trajectory. The science teams involved in the assessment knew, of course, that the purpose of the study was to develop a compromise evaluation, and they were knowledgeable enough about collective decision rules to be able to game them if they wished to do so. In fact, two types of gaming occurred. The first was simply to give the trajectories which were considered good an extremely high evaluation, and to give all others an extremely low rating. This type of gaming gave the trajectories that a group preferred a better chance of surviving. Another type of gaming was coalition formation. Dyer and Miles report that some groups met during the evaluation process, and it is conceivable that they coordinated their assessments. The analysts were, of course, aware that such gaming could occur and they used several strategies to ensure that gamed numbers would not distort the analysis too much. Nevertheless, some groups, after the evaluation was completed, actually celebrated their "victory" in beating the evaluation system.

Humphreys and McFadden (1980) observed numbers games people play with the interactive computer program MAUD. In a group decision making task, a typical game was for individual members to argue for increased weights on those attributes on which their favored alternative scored most highly. If successful, the group's evaluation would be biased in favor of the alternative that the "game player" originally preferred. It turned out that such gaming was much more difficult with MAUD than in interactions with group members, leading some group members to play "machine games".
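The weight game Humphreys and McFadden describe is easy to reproduce with a toy additive model: a member who favours one alternative argues up the weight of the attribute on which it scores best, and the group ranking flips. The alternatives, attributes and numbers below are hypothetical.

```python
# Sketch of the "numbers game" of inflating one weight to favour one's
# preferred alternative. All alternatives and values are hypothetical.

alternatives = {
    "Plan X": {"effectiveness": 90, "cost saving": 30},
    "Plan Y": {"effectiveness": 55, "cost saving": 80},
}

def ranking(weights):
    score = lambda a: sum(weights[k] * alternatives[a][k] for k in weights)
    return sorted(alternatives, key=score, reverse=True)

honest_weights = {"effectiveness": 0.4, "cost saving": 0.6}
gamed_weights  = {"effectiveness": 0.8, "cost saving": 0.2}  # argued upward

print(ranking(honest_weights))  # -> ['Plan Y', 'Plan X']
print(ranking(gamed_weights))   # -> ['Plan X', 'Plan Y']
```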
While utilities are easier to game than probabilities, unintentional biases in probability judgments can occur quite easily. While much of the decision theoretic research literature focuses on cognitive biases in probability judgments, the more obvious biases are probably motivational. For example, researchers are notoriously optimistic in their estimates of the probability that a proposal will be funded or in their point estimates of the amount of time it takes to finish a paper. To counteract gaming strategies, the analyst first of all has to know the motives of the decision makers and experts. This brings us back to Section 2, and involves questions of client-analyst relationships and trust. In utility assessment it furthermore helps to carefully anchor scales, and to ask for relative judgments rather than absolute ones. In a probability context, unintentional optimism or pessimism can often be counteracted by phrasing the questions in terms of low stake lotteries. Other "debiassing" procedures that address specific cognitive illusions have been discussed in the literature (see, e.g., Kahneman and Tversky, 1979).

Implementation: Being Better is Not Good Enough

This section is about the unfortunate analyst who successfully avoided all the previous pitfalls only to fall into the last pit: the analysis is never implemented or used. There are, of course, many ways in which decision makers can ignore an analysis. One-shot decision analyses are often compiled in voluminous reports and shelved away. A milder form of ignoring decision analysis consists of using parts of the analysis to justify a decision that is favored by the decision makers. Perhaps the most severe forms of the implementation pitfall occur when a decision analysis system that is built for repeated use is never implemented. A particularly sad case of this nature is Gustafson et al.'s (1977) system for predicting suicide attempts. They developed a computer-based system which permitted a person complaining of suicidal thoughts to be interviewed by a computer prior to seeing the psychiatrist. The computer then prepared a narrative summary of that interview for the psychiatrist and also provided an estimate of the probability that the person would make a suicide attempt. That probability calculation used a subjective Bayesian model in which the person's characteristics are linked with the appropriate likelihood estimates and processed in a way that gives the probability of a serious suicide attempt versus the probability that the person will not make a serious suicide attempt. Both pilot studies and field test studies suggested that the Bayesian model was significantly superior to the unaided judgment of psychiatrists and
clinical psychologists. For instance, one analysis divided patients into two categories: those that made a serious suicide attempt and those that did not. The authors compared clinicians, who were asked to estimate whether the person would make an attempt on their life, to the computer, which was asked to make a similar prognostication. It turned out that the clinicians identified the serious attemptor about 37 per cent of the time and the non-attemptor about 94 per cent of the time. The computer system identified the serious attemptor about 74 per cent of the time and the non-attemptor about 94 per cent of the time. Thus, the ability of the computer to detect serious suicide attempts was much better than that of the expert clinicians. But even though this research has been published in several academic journals, even though the results have been described in a popular journal (Time Magazine), even though the system has been set up for full scale implementation and attempts have been made to implement it on a full scale basis, the technique has never been adopted in practice. Nobody is at this point using it, and while there have been expressions of interest in this system, those expressions have not led to implementation.

There are a number of reasons for this failure. The system developers had paid little attention to the organizational and psychological aspects that can hinder the implementation of a formal decision analytic system. While the patients liked to interact with the computer, clinicians appeared to resist the implementation of the system. One reason was that the clinicians and other users in mental health counseling and guidance were not familiar with computers, and thus sceptical about their use; another reason was that the computer challenged the authority of the clinician, because it could be viewed as replacing clinical psychologists and psychiatrists in a critical task of their practice. Further, while the research scientists were very fond of the complexity of the Bayesian processing model, this model was hard to explain and somewhat obscure to the clinicians, another reason for a lack of trust.

Edwards' Probabilistic Information Processing (PIP) system (Edwards et al., 1968) is another instance of an inference system that was never adopted in significant real world applications. PIP was developed mainly for intelligence purposes, to facilitate the task of processing intelligence information that bears on hypotheses of relevance to policy makers. PIP was built on a simple Bayesian inference structure, using likelihood ratios to quantify the diagnostic impact of information and prior odds to quantify the a priori knowledge. PIP then computes the posterior probability or odds of the relevant hypothesis. In a simulation study (Edwards et al., 1968) PIP was found to be superior to other on-line systems that were evaluated. Yet PIP was never used. One reason was that the inference structure in PIP was too simplistic.
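Both the suicide-risk system and PIP rest on the same one-stage Bayesian form: posterior odds are the prior odds multiplied by the likelihood ratio of each piece of evidence. The sketch below shows that structure only; the prior and the likelihood ratios are invented for illustration and are not the parameters of either system.

```python
# One-stage Bayesian updating with likelihood ratios, the structure
# underlying both the suicide-risk system and PIP. The prior odds and
# likelihood ratios below are invented for illustration only.

prior_odds = 0.05 / 0.95          # prior odds of the hypothesis (here, a serious attempt)

likelihood_ratios = {             # P(indicator | attempt) / P(indicator | no attempt)
    "previous attempt reported": 4.0,
    "detailed plan described":   3.0,
    "strong social support":     0.5,
}

posterior_odds = prior_odds
for indicator, lr in likelihood_ratios.items():
    posterior_odds *= lr          # assumes conditional independence of indicators

posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(round(posterior_prob, 3))   # posterior probability of the hypothesis
```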
Recent research by Schum (1977, 1979) has generated several generic and useful inference structures which are quite a bit more realistic than the one-stage Bayesian structure used in PIP. Another contributing factor was that PIP was an information processing system with no direct links into decision making. There are also less dramatic examples of the implementation pitfall. The previously mentioned study of oil pollution standards was never used. While the analysis sharpened the decision maker's eye for the sensitive places in the process (e.g., monitoring procedures, sampling methods), it did not have precise numerical inputs, nor were the environmental impacts modeled sufficiently. Consequently, the numerical analysis was not taken seriously by the decision makers. Edwards' school board study had a unique fate. The study was successful in that it produced an evaluation system and that the desegregation plans were actually evaluated. However, after the initial evaluation was completed, the school board developed its own plan, which was never evaluated by the multiattribute system. It is unclear why the school board did not use the system to evaluate the final proposal, in particular since the proposal was developed with knowledge of the system and would have been a likely winner.

What can be done to avoid the non-implementation pitfall? First, it helps to avoid all the others that precede it. Second, during the process of the analysis keep in mind that the decision maker has to use the analysis, not the analyst. This should bias against overly complex modeling, too extensive use of incomprehensible or inaccessible machinery, and use of ideas or devices that may threaten the client. Third, the analysis or model should be presented as an aid, not a substitute, for decision making, and the interaction between model and client should be user-friendly.

A Brief Conclusion
Larry Phillips made a most pertinent comment about avoiding any of the pitfalls described above: "Over the past five years, I have completed about 25 decision analyses for a great variety of clients at all levels in organizations. None has been an outright failure, and on reflecting why this has been so, I can see only one common feature to all of these analyses: the client always came to me." Fortunately for decision analysis and decision analysts, more and more clients are doing just that.
References

Brown, R. V. and J. W. Ulvila, 1983. The role of decision analysis in international nuclear safeguards. In this volume, 91-103.
Central Unit on Environmental Pollution, 1976. The separation of oil from water for North Sea oil operations. Pollution Paper No. 6. Department of the Environment, London: HMSO.
Dyer, J. S. and D. Miles, 1976. An actual application of collective choice theory to the selection of trajectories for the Mariner Jupiter/Saturn 1977 project. Operations Research, 24, 220-243.
Edwards, W., 1977. Use of multiattribute utility measurement for social decision making. In: D. E. Bell, R. L. Keeney, and H. Raiffa (eds.), Conflicting Objectives in Decisions. New York: Wiley.
Edwards, W., 1979. Multiattribute utility measurement: Evaluating desegregation plans in a highly political context. In: R. Perloff (ed.), Evaluator Interventions: Pros and Cons. Beverly Hills: Sage.
Edwards, W., 1980. Reflections on and criticism of a highly political multi-attribute utility analysis. In: L. Cobb and R. M. Thrall (eds.), Mathematical Frontiers of Behavioral and Policy Sciences. Colorado: Westview Press, 157-186.
Edwards, W., L. D. Phillips, W. L. Hays, and B. C. Goodman, 1968. Probabilistic information processing systems: Design and evaluation. IEEE Transactions on Systems Science and Cybernetics, SSC-4, 3, 248-265.
Gustafson, D. H., J. H. Greist, F. F. Strauss, H. Erdman, and T. Laughren, 1977. A probabilistic system for identifying suicide attemptors. Computers and Biomedical Research, 10, 83-89.
Humphreys, P. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-69.
Humphreys, P. and A. Wisudha, 1980. Multi-Attribute Utility Decomposition. Technical Report 79-2/2. Uxbridge, Middlesex, England: Brunel University, Decision Analysis Unit.
John, R. S., D. von Winterfeldt, and W. Edwards, 1981. The quality and user acceptance of decision analysis performed by computer vs. analyst. Technical Report No. 81-1. Social Science Research Institute, University of Southern California.
Kahneman, D. and A. Tversky, 1979. Intuitive prediction: Biases and corrective procedures. Management Science, 12, 313-327.
Knop, H., 1979. The Tennessee Valley Authority: A field study. IIASA-RR-79-2. International Institute for Applied Systems Analysis, Laxenburg, Austria.
Lusted, L. B., H. V. Roberts, D. L. Wallace, M. Lahiff, W. Edwards, J. W. Loop, R. S. Bell, J. R. Thornbury, D. L. Seale, J. P. Steele, and D. C. Fryback. Efficacy of diagnostic radiologic procedures. In: K. Snapper (ed.), Practical Evaluation: Case Studies in Simplifying Complex Problems. Washington, D.C.: Information Resources Press (in press).
Majone, G. and E. Quade (eds.), 1980. Pitfalls of Analysis. Chichester: Wiley.
Majone, G., 1977a. Pitfalls of analysis and the analysis of pitfalls. Research Memorandum IIASA-RM-77-1. International Institute for Applied Systems Analysis, Laxenburg, Austria.
Majone, G., 1977b. Technology assessment in a dialectic key. Professional Paper IIASA PP-77-1. International Institute for Applied Systems Analysis, Laxenburg, Austria.
Mazur, A., 1980. Societal and scientific causes of the historical development of risk assessment. In: J. Conrad (ed.), Society, Technology, and Risk Assessment. New York: Academic Press, 151-164.
Ravetz, J., 1973. Scientific Knowledge and Its Social Problems. Harmondsworth, Middlesex, England: Penguin.
Schum, D. A., 1977. The behavioral richness of cumulative and corroborative testimonial evidence. In: Castellan, Pisoni, and Potts (eds.), Cognitive Theory, Vol. 2. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Schum, D. A., 1979. On factors which influence the redundancy of cumulative and corroborative testimonial evidence. Technical Report No. 79-02. Houston, Texas: Rice University, Dept. of Psychology.
Snapper, K. and D. A. Seaver, 1980a. The irrelevance of evaluation research for decision making: Case studies from the community anti-crime program. Technical Report 80-12. Decision Science Consortium, Inc., Falls Church, VA.
Snapper, K. and D. A. Seaver, 1980b. The use of evaluation models for decision making: Application to the community anti-crime program. Evaluation and Program Planning, 197-208.
Ulvila, J. and R. V. Brown, 1981. Development of decision analysis for non-proliferation safeguards: Project summary and characterization of the IAEA decision structure. Technical Report No. 81-7. Decision Science Consortium, Inc., Falls Church, VA.
Vári, A. and J. Vecsenyi, 1983. Decision analysis of industrial R & D problems: Pitfalls and lessons. In this volume, 183-195.
von Winterfeldt, D. Setting standards for offshore oil discharges: A regulatory decision analysis. Operations Research (in press).
DECISION ANALYSIS OF INDUSTRIAL R & D PROBLEMS: PITFALLS AND LESSONS
Anna VÁRI and János VECSENYI
Bureau for Systems Analysis, State Office for Technical Development, Budapest, Hungary
Abstract

This paper draws attention to a number of pitfalls of multi-attributive decision analysis, viz.: (1) Decision makers very often do not wish the whole decision problem to be analyzed. Only a limited part, more or less independent of the value system of the decision makers, is subjected to analysis. The multi-attributive comparison does not, as a result, refer to the actual alternatives of the problem solution but, e.g., to certain states, objects, etc. characterizing the decision situation. (2) There is a contradiction encountered when structuring the decision problem: while, by increasing the number of attributes, the reliability of the assessments by attributes can be improved, the possibility for their simultaneous consideration thereby decreases. (3) During multi-attributive aggregation, the problem may arise that, if there are cause-effect or means-end relationships between the elements of the system of attributes, these cannot be treated independently of the value system of the decision makers. According to our experiences, the adequacy of utility models can be questioned when performing aggregations under these circumstances. (4) Another pitfall concerns the consideration of uncertainties. The models suggested by the decision analysts, like, e.g., SEU, are often not accepted by the decision makers as an appropriate way of taking uncertainties (probabilities of success) into account when evaluating projects.
Introduction
Is decision analysis a successful enterprise? We have attempted to answer this question by calling attention to some pitfalls of multi-attribute decision analyses applied to research and development decisions in Hungary. In the first part we present some illustrative examples of experiences familiar to decision consultants who find themselves analyzing not the real decision problem but only a limited part of it. We argue that the reasons for this usually stem from the efforts made by the decision makers aimed at simplifying and better understanding the problem, and at the same time hiding some of their goals and values, as well as some of the possible solutions.
In the second part we investigate some questions concerning the quantity and quality of the value-relevant attributes. John et al. (1981) point out that completeness and logical as well as value-wise independence of attributes are somewhat contradictory requirements. They also found that while 'completeness' usually increases the impressiveness of the analysis, the level of acceptance of results based on fewer but more independent attributes is higher. In the third and fourth parts we address some problems with aggregation rules. The usual methods for aggregating evaluations across multiple attributes are very poor at handling attribute interdependencies, like cause-effect, means-end relationships, etc. Besides this, we often face the problem of means-end confusion, which is a special case of incoherence of the preference structure, pointed out by Humphreys and McFadden (1981) as goal confusion, and which cannot be resolved by applying an appropriate decision rule. The problem of finding appropriate rules is even more complicated in cases involving high uncertainties. We found that managers do not accept SEU-based suggestions in the domain of low probabilities. We will illustrate these problems with our experience in decision analysis of the following three decisions:
A. Selection among R & D projects at the branch level.
B. Developing strategies for production mixes at the enterprise level.
C. Selecting among development alternatives for a given product (choice between home development and buying a licence from abroad).
Case Descriptions
In Case A, a fixed budget had to be allocated across ten R & D projects (Vári and Dávid, 1982). Due to considerable long-term uncertainties, the individual projects were compared on the basis of their subjective expected utility (SEU), computed by considering the subjective probabilities of the technical success of the research, of success in implementing the R & D results, and of success in applying the results. For each project, these probabilities were estimated using the certainty equivalence method. On the utility side an additive multi-attribute utility model with 14 attributes was used (Dávid et al., 1982).

In Case B, a choice had to be made concerning which of the products of an enterprise should be developed, constrained by existing or reduced budget levels (Vecsenyi, 1982). The products were evaluated by
the decision makers on ten attributes, and their uncertainties were expressed by using interval estimations. The weights of the attributes were determined by pair-wise comparison and Guilford transformation (Torgerson, 1967). The expected utilities were calculated from these estimations with the simulation model described by Kahne (1975).¹

¹ The stages involved in the decision analyses in Cases A and B, and the contexts and participants, are discussed in Humphreys et al. (in press).

In Case C, experts of an enterprise had to choose among three foreign licence offers and two home development strategies. A hierarchy of 28 means- and ends-type attributes was explored and transformed by canonical factor analysis into six independent factors. Alternatives were compared using the Kahne model (Vári and Füstös, 1981).

In all three analyses the comparison of alternatives (projects, products, development alternatives) was based on subjective expected utility and additive multi-attribute utilities. Evaluation attributes and input parameters of the decision model (weights of attributes, utilities, subjective probabilities) were elicited by applying various methods, depending on the particular problem. A common feature was that these operations were carried out in groups by experts from various fields and from different levels of the decision hierarchy. The group settings usually included feedback and discussions of the aggregation of individual evaluations, group characteristics (like indices of group agreement), etc.
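The interval-estimation step in Case B can be pictured with a small simulation. The sketch below illustrates only the general idea (interval estimates propagated through an additive multi-attribute model by Monte Carlo sampling); it is not Kahne's (1975) model or the procedure actually used, and the products, attributes, weights, intervals, and the uniform sampling assumption are all invented.

    # Illustration only: interval estimates propagated through an additive
    # multi-attribute model by Monte Carlo sampling. This is not Kahne's (1975)
    # model or the procedure used in Case B; products, attributes, weights,
    # intervals, and the uniform sampling assumption are all invented.
    import random

    random.seed(0)
    weights = {"market position": 0.40, "profitability": 0.35, "novelty": 0.25}

    # Interval estimates (low, high) of each product's score on each attribute (0-100 scale).
    products = {
        "product P": {"market position": (60, 80), "profitability": (40, 70), "novelty": (20, 50)},
        "product Q": {"market position": (30, 50), "profitability": (65, 90), "novelty": (55, 85)},
    }

    def simulated_expected_utility(intervals, n_draws=10000):
        total = 0.0
        for _ in range(n_draws):
            draw = {attr: random.uniform(lo, hi) for attr, (lo, hi) in intervals.items()}
            total += sum(weights[attr] * draw[attr] for attr in weights)
        return total / n_draws

    for name, intervals in products.items():
        print(f"{name}: simulated expected utility = {simulated_expected_utility(intervals):.1f}")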
Pitfall 1: The Domain of the Decision Problem vs. the Domain of Decision Analysis

In Case A, the decision makers faced an R & D resource allocation problem. They had to choose the best alternative from the possible combinations of the competing R & D projects. Because of several conditional, complementary and competitive relationships among the projects, we (the decision analysis consultants) suggested an evaluation method taking these interdependencies into consideration. This would have required estimating several conditional probabilities and utilities. Instead, the decision makers first independently evaluated the individual R & D projects and then several 'rational' combinations of projects. In this stage, peculiar heuristics were applied by the decision makers, which enabled them to engage values which so far had not been made explicit. The philosophy of these heuristics was explicated as the simultaneous consideration of the importance, cost and overlapping of the projects as well as the subjective expected utilities computed for each project and project combination. In fact, among the final choices they considered only importance and neglected all other parameters. (See the detailed description in Part 4.)

A similar situation occurred in Case B. Here, a choice among improvement, maintenance and reduction of production of each product of the enterprise had to be made. The actual alternatives of the decision were, therefore, the possible development strategies of the enterprise. Analysis, however, was confined to the comparison of the individual products. In order to make decisions about the actual development strategies, additional criteria were used, e.g., perspectives of the individual products related to governmental programs, costs required for development, capacity constraints, etc. Nevertheless, at this stage, the decision makers did not wish to apply formal analysis.

Von Winterfeldt (1980) has pointed out that approaches based on multi-attribute utility theory (MAUT) are often not adequate for solving resource allocation problems, since MAUT enables only assessments of the individual alternatives and is not suited to handling project interdependencies and the continuous character of the budget. In our experience, however, the decision makers still insisted on evaluating individual projects with MAUT. This could partly be due to the relative simplicity of MAUT and to the clarity of its results. But another similarly important reason for using MAUT for individual projects is the frequently observable phenomenon that decision makers prefer the analysis of particular elements of the decision problem to that of the decision problem as a whole. This provides them with additional information useful for surveying complex situations. At the same time, only a part of their values and preferences have to be made explicit and submitted to formal analysis, while implicit values can be taken into consideration intuitively by the decision makers during the actual decision. In hierarchical decision-making systems, this partial analysis helps the lower-level decision makers to meet the expectations of higher-level decision makers. This would otherwise be fairly difficult to incorporate into formal decision analysis, as the experience from other analyses has shown. In Case C, for example, the original intention of the decision makers of the corporation was to evaluate, as a first step, foreign licence offers only and then, in a later phase, to choose between the best foreign and the best domestic alternatives. They assumed that a comprehensive analysis and comparison of the foreign licence offers would justify to the higher-level decision makers the importance of buying a licence from abroad instead of home development. Only after some deliberation could we convince the decision makers to accept our proposal to compare all foreign and domestic alternatives using the same criteria.
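The gap between the two problem formulations discussed in this pitfall can be made concrete with a small sketch. The project names, costs, probabilities, and utilities below are invented, and the projects are treated as independent (the conditional probabilities and utilities that interdependent projects would actually require are ignored); the point is only the contrast between ranking individual projects and selecting a combination under a budget.

    # Illustration only (invented numbers): ranking individual projects by SEU
    # versus selecting the best feasible combination of projects under a budget.
    from itertools import combinations

    # name: (cost, probability of overall success, maximum feasible utility U)
    projects = {
        "I":   (30, 0.15, 90),
        "II":  (25, 0.40, 80),
        "III": (10, 0.70, 30),
        "IV":  (20, 0.50, 55),
        "V":   (15, 0.65, 35),
    }
    budget = 60

    def seu(project):
        cost, p, u = project
        return p * u  # subjective expected utility of a single project

    # (a) Individual evaluation: rank the projects by SEU.
    ranking = sorted(projects, key=lambda name: seu(projects[name]), reverse=True)
    print("SEU ranking of individual projects:", ranking)

    # (b) Resource-allocation view: the best feasible combination under the budget.
    # Projects are treated as independent here; a faithful model of Case A would
    # also need conditional probabilities and utilities for interdependent projects.
    best_value, best_set = 0.0, ()
    for r in range(1, len(projects) + 1):
        for subset in combinations(projects, r):
            cost = sum(projects[name][0] for name in subset)
            if cost <= budget:
                value = sum(seu(projects[name]) for name in subset)
                if value > best_value:
                    best_value, best_set = value, subset
    print("Best feasible combination under the budget:", best_set, round(best_value, 1))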
Pitfall 2: Dependent and Overlapping Attributes

According to our experiences, the development of attributes is one of the crucial phases of decision analysis. On the one hand, the attributes should allow reliable evaluations of alternatives, while on the other hand, they should facilitate the aggregation of values. These requirements have been more or less contradictory in our cases. The main contradiction in structuring the decision problem lies in the fact that while the reliability of evaluation in terms of the individual attributes can generally be improved by raising the number of attributes, the difficulties of their simultaneous consideration will increase.

In Cases A and B, high-level categories of criteria were defined by elementary sub-criteria. In Case A, sub-criteria were obtained by brainstorming. High-level categories of criteria were defined by clustering. Clustering was first done individually by the members of the group. Afterwards an automatic classification algorithm was used for creating the clusters best compatible with the group clustering. The results were interpreted and criterion categories were defined by the group. Figure 1a shows one of the criterion categories and its sub-criteria used in evaluating R & D projects. In Case B, high-level criteria had already been available in the form of recommendations. These criteria were adapted by the participants of the analysis to the concrete situation, and sub-criteria were determined in the interpretation process. Figure 1b shows one criterion category and its sub-criteria.

Figure 1. Examples of Criterion Categories and Sub-criteria. (a) Case A: a criterion used in evaluating R & D projects ('Up-to-date quality of the product', with sub-criteria such as technical parameters, novelty of the product, and up-to-dateness of the materials used). (b) Case B: a criterion used in evaluating products (with sub-criteria such as service and range of supply).

The examples show that the relationship between high-level criterion categories and elementary sub-criteria is generally of a whole-part character which is not necessarily additive. In neither case did we seek to reveal these complicated relationships. However, as a result, it was very difficult to interpret the criterion categories and to determine, at least qualitatively, the end-points of their scales. Since we used the sub-criteria for the purpose of definition only, and since we did not evaluate the alternatives in terms of these sub-criteria, we did not have the opportunity to use them for checking the evaluations in terms of the higher-level criteria. This resulted in substantial uncertainties in the evaluations, reflected in differences between the various decision makers' conceptions of the criteria, which we tried to reduce through group discussions.

In Case C, we acted differently. We tried to analyze the goals of the enterprise concerning the development strategies and the means of achieving them. Figure 2 shows a detail of the revealed goal hierarchy. We assigned criteria to each element of the goal hierarchy and asked the decision makers to evaluate the alternatives in terms of each criterion.

Figure 2. Goal Hierarchy for Development Strategies (Case C), with elements including home market position, exportability, quality of the product, novelty of the product, novelty of the technology, profitability, and technical development of the corporation.

Statistical analysis of the evaluations enabled us to check whether the supposed means-end relationships did actually exist, and whether the evaluations did not contradict them. Preliminary estimations of the strength of interdependencies among criteria were made by the decision makers, and these estimations were compared with the pair-wise product-moment correlation coefficients. By discussing the differences between the assessments obtained through the two approaches, we succeeded in reducing the contradictions between direct and indirect correlations. Making the arguments more explicit also raised consciousness and improved the reliability of the evaluations.
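The consistency check described above can be illustrated with a small computation. The evaluation scores and the directly estimated interdependence strengths below are invented; the sketch computes the pair-wise product-moment correlations from the scores and flags the criterion pairs where the two kinds of assessment diverge most, the kind of pair that was brought back to the group for discussion in Case C.

    # Illustration only (invented data): pair-wise product-moment correlations
    # computed from the alternatives' evaluations are compared with the
    # interdependence strengths estimated directly by the decision makers.
    import numpy as np

    # Rows: alternatives, columns: criteria (scores on a 0-100 scale).
    scores = np.array([
        [70, 60, 40, 55],
        [50, 45, 70, 60],
        [80, 75, 30, 50],
        [40, 35, 80, 70],
        [65, 55, 50, 60],
    ], dtype=float)

    # Decision makers' direct estimates of interdependence, scaled to [-1, 1].
    estimated_strength = np.array([
        [ 1.0,  0.8, -0.5,  0.0],
        [ 0.8,  1.0, -0.4,  0.1],
        [-0.5, -0.4,  1.0,  0.6],
        [ 0.0,  0.1,  0.6,  1.0],
    ])

    observed = np.corrcoef(scores, rowvar=False)          # product-moment correlations
    discrepancy = np.abs(observed - estimated_strength)

    # Criterion pairs where direct and indirect assessments diverge are candidates
    # for group discussion, as in Case C.
    rows, cols = np.triu_indices_from(discrepancy, k=1)
    for a, b in zip(rows, cols):
        if discrepancy[a, b] > 0.4:
            print(f"criteria {a} and {b}: observed r = {observed[a, b]:+.2f}, "
                  f"estimated strength = {estimated_strength[a, b]:+.2f}")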
Pitfall 3: Means Don't Aggregate to Ends

As mentioned earlier, an additive model was applied in all three analyses
for aggregating single attribute evaluations. This did not cause us any trouble in Cases A and B, because the criterion categories were formulated at a high level of abstraction, and met the requirements of utility independence relatively well. In Case C, we originally intended to aggregate the evaluations up to the highest level criteria. This approach turned out to be inappropriate, since the decision makers were not indifferent about the means by which the same goals might be achieved. We found it problematic to determine the importance of the means-criteria and to include such criteria in the summary model. Since the criteria located at lower levels of the goal
hierarchy are partly ends and partly means of fulfilling higher-level ends, their weights have a double character. In a multi-attribute utility model only the ends component should be considered, but we have no suitable method for separating this component from the one related to the means. Further analysis of the criteria revealed that not even the goals at the highest level of the goal hierarchy have a pure 'ends' character. They can often be each other's means. For instance, a greater profit can establish a better economic base for technological development in the long run, while technological development in turn may influence profit. The above goals cannot be logically combined since they are environmentally related and can even contradict each other under certain conditions. Separation of the means-components of goal-criteria is as difficult as that of the goal-components of means-criteria. If, as in Case C, these components cannot be carefully separated, we face a contradiction: on the one hand, we gain reliability in single attribute assessments through the decomposition; on the other, the aggregation becomes less meaningful. To eliminate this contradiction, it seemed useful to identify quasi-independent means-end sub-hierarchies. In Case C, we tried to obtain such sub-hierarchies through the use of canonical factor analysis, and the canonical factors obtained were used as inputs in the MAUT model. The results are shown in Figure 3.
Figure 3. Attributes, Canonical Factors, and High-Level Goals in Case C (attributes such as organizational conditions, novelty of the technology, content and level of support, range of the know-how, internal market position, exportability, quality of the product, risk of the development, and possibility of further development; three canonical factors; high-level goals including profitability, external benefits, organizational development, and technical development).
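The aggregation idea behind Figure 3 can be sketched in outline as follows. The ratings are invented, and ordinary factor analysis stands in for the canonical factor analysis actually used in Case C: correlated means- and ends-type attributes are compressed into a few quasi-independent factors, and an additive model with hypothetical weights is then applied to the factor scores.

    # Illustration only: ordinary factor analysis stands in for the canonical factor
    # analysis used in Case C, and all ratings and weights are invented.
    import numpy as np
    from sklearn.decomposition import FactorAnalysis
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Ten hypothetical alternatives rated on nine deliberately correlated attributes.
    ratings = rng.normal(size=(10, 9)).cumsum(axis=1)

    standardized = StandardScaler().fit_transform(ratings)
    factors = FactorAnalysis(n_components=3, random_state=0)
    factor_scores = factors.fit_transform(standardized)   # quasi-independent inputs to the MAUT model

    weights = np.array([0.5, 0.3, 0.2])                    # hypothetical importance weights
    overall_utility = factor_scores @ weights              # additive aggregation over the factors
    print("Alternatives ordered from best to worst:", np.argsort(-overall_utility))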
Pitfall 4: Discounting for Uncertainties vs. Striving for Success

In the assessment of R & D projects (Case A) overall uncertainties had to be considered: the technical success of the research, the success in implementing the R & D results, and the success in applying them. The subjective probabilities of these success categories were estimated for each project by the decision makers. The maximum feasible utility (U) calculated for each project was that which would be obtained in the case of total success in research, implementation, and application. Probabilities and utilities were combined to provide an SEU evaluation of each project.

Prior to being informed about the SEU and U values, the decision makers were asked to give a preliminary ranking of the projects. Comparing this ranking with rankings based on SEU as well as on U values, we found that the preliminary ranking showed a strong positive correlation with the ranking based on U, but a small negative correlation with the ranking according to SEU. Our interpretation of this result is that the preliminary judgments of the decision makers neglected the uncertainties and instead expressed a desire, a wish, or a necessity to obtain the maximum utility from the project.

What happens if the decision makers get to learn the U and SEU values and the ranking of projects based on them? Figure 4 shows the location of the ten R & D projects in a space whose axes define SEU and maximum feasible utility.

Figure 4. R & D Projects Located in terms of their SEU and Maximum Feasible Utility (U)

For some projects (II, X) both values are high, while for others the U and SEU values are contradictory. Projects VIII and I, for example, have a high U but their success is less probable (low SEU), while projects III, V, and VII have low U but their success is more probable (high SEU). The projects actually selected by decision makers aware of the results of the U/SEU analysis are encircled in Figure 4. Apparently, the decision makers favoured the high-U projects rather than the high-SEU projects. The only deviation from a strict U rank ordering is that projects I and II are substituted by project VI. This can be explained by the fact that there is some overlapping in content among these projects. The decision makers explained this final choice by stressing that these projects were very important and consequently their success should be important. The results of the analysis were utilized by alerting the organizations involved to the fact that in certain projects (VI and IX) "great attention should be paid to promoting the implementation of research projects and to the promotion of the use of the results in practice", as it was stated by a decision maker.

The paradoxical situation arose that, although the decision makers had acknowledged the necessity of probabilistic thinking, and had made great efforts to estimate subjective probabilities, SEU was not accepted for actual decision making. This might have been due to the low values of the probabilities of overall success (plotted in Figure 5 against U). Instead of differentiating among the risks of failure of the projects, the decision makers put forth suggestions designed to alter the social world of the implementation of the projects in such a way that uncertainty about future events would be reduced and hopefully eliminated.

Figure 5. R & D Projects Located in terms of Probabilities of Overall Success and Maximum Feasible Utility (U)
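A toy version of the comparison described in this pitfall is sketched below, with invented success probabilities, utilities, and a preliminary ranking. The overall success probability of a project is taken as the product of the three success probabilities, SEU as that probability times the maximum feasible utility U, and Spearman rank correlations are then computed between the preliminary ranking and the U-based and SEU-based orderings.

    # Illustration only (invented numbers): overall success probability as the
    # product of the three success probabilities, SEU as that probability times U,
    # and rank correlations of a preliminary ranking with the U- and SEU-orderings.
    from scipy.stats import spearmanr

    # name: (p_research, p_implementation, p_application, maximum feasible utility U)
    projects = {
        "I":   (0.6, 0.5, 0.4, 90),
        "II":  (0.7, 0.7, 0.6, 85),
        "III": (0.9, 0.8, 0.8, 25),
        "IV":  (0.8, 0.6, 0.5, 50),
        "V":   (0.9, 0.9, 0.7, 30),
    }

    names = list(projects)
    u_values = [projects[n][3] for n in names]
    seu_values = [projects[n][0] * projects[n][1] * projects[n][2] * projects[n][3] for n in names]

    preliminary_ranks = [1, 2, 5, 3, 4]   # hypothetical preliminary ranks (1 = best)

    # Spearman correlations between the preliminary ranking and the model-based
    # orderings (U and SEU are negated so that larger values correspond to better ranks).
    rho_u, _ = spearmanr(preliminary_ranks, [-u for u in u_values])
    rho_seu, _ = spearmanr(preliminary_ranks, [-s for s in seu_values])
    print(f"correlation with the U-based ordering:   {rho_u:+.2f}")
    print(f"correlation with the SEU-based ordering: {rho_seu:+.2f}")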
Conclusions

1. The first difficulty in our analyses was that being comprehensive (in terms of mapping all alternatives and their implications according to goals, sub-goals, means and their relationships) often created attribute structures which were incompatible with simple aggregation rules. In our opinion, there is no need, in every case, for achieving both comprehensiveness and simple aggregation. In cases where, due to the character and time-scale of the problem, the uncertainty of estimations will not be reduced by a more detailed analysis, it is often sufficient to form more or less independent attributes at a relatively high level of abstraction, and to aggregate these high-level goals. On the other hand, when the preferences of the decision makers cannot be reliably assessed (e.g., in the case of some top-level decision makers), it is not always necessary to aggregate assessments across all attributes. Sub-aggregation across some attributes can be quite useful in a comprehensive analysis of alternatives, consequences, and means-end relationships. If both the detailed analysis and the multi-attribute aggregation seem to be promising, the gap between the two kinds of analyses should be adequately "bridged". This means that a multi-attribute evaluation should consider each attribute as well as the relationships among them. For this purpose, certain tools of mathematical statistics, such as the canonical factor analysis employed here, may be useful.

2. The experiences of our case studies indicate that decision makers do not always accept the values aggregated from their "fragmentary" judgments and formed according to prescribed rules (e.g., SEU). If the results of decision analysis essentially contradict the preliminary ideas of the decision makers, the decision makers are inclined to restructure the problems in order to reduce these contradictions (e.g., to eliminate certain attributes, to use new ones, and to correct estimations and assessments). Humphreys and McFadden (1980) report similar findings. Their decision makers, using a computerized decision aid without the help of a decision analyst, frequently restructured their problem in the course of the interactive sessions with the computer. In the cases described above, however, the elicitation of opinions and the use of computer programs were controlled by decision analysts. It is possibly due to this, and to the group interactions in the decision analysis, that restructuring of the tasks and adaptation of the preliminary ideas were not attempted by the decision makers during the process of formal analysis of the projects.

3. Considering the observed reluctance of decision makers to allow an analysis of the whole problem, the question arises whether we can find methods which do not require the decision makers to "put their cards on
the table", but still help in solving the real decision problems as a whole. Alternatively, we could resign ourselves to help the decision makers in structuring, learning, and estimating problem elements only. Usually, there is a compromise between the intention of the decision makers and the aspiration of the analysts as to w h c h methods to apply when analyzing a given problem. This decision is also multi-attributed, considering such criteria as, e.g., the connection between decision makers and analysts, the individual or group character of the decision making, the time factor, the efforts required, etc. We assume that considerable differences exist about the usage of decision aids, depending upon cultural factors (e.g., the expected role of the consultants, the function of group discussions, etc.) as well. We believe that further investigation of the above topics,e.g.,through crossnational comparative analysis, would be necessary for supporting the choice of the appropriate decision aiding tools under different circumstances.
References

Humphreys, P. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-70.
Humphreys, P. C., O. Larichev, A. Vári, and J. Vecsenyi. Comparative analysis of use of decision support systems in R & D decisions. In: H. G. Sol (ed.), Processes and Tools for Decision Support. Amsterdam: North-Holland (in press).
John, R. S., D. von Winterfeldt, and W. Edwards. The quality and user acceptance of decision analysis performed by computer vs. analyst. In this volume, 301-319.
Kahne, S., 1975. A procedure for optimizing development decisions. Automatica, 11.
Torgerson, W. S., 1967. Theory and Method of Scaling. New York: Wiley.
Vári, A. and L. Füstös. Módszerek licenciavásárlási döntések megalapozásához (Methods for decision making in buying licences) (in Hungarian and Russian). Presented at the joint meeting of MNIIPU and OMFB REI. Budapest: Szigma (in press).
Vári, A. and L. Dávid, 1982. R & D planning involving multi-criteria decision analytic methods at the branch level. Collaborative Paper CP-82-73. Laxenburg, Austria: IIASA.
Vecsenyi, J., 1982. Product mix development strategy at the enterprise level. Collaborative Paper CP-82-74. Laxenburg, Austria: IIASA.
von Winterfeldt, D., 1980. Structuring decision problems for decision analysis. Acta Psychologica, 45, 71-94.
Section III
AIDING THE STRUCTURING OF SMALL SCALE DECISION PROBLEMS
INTRODUCTION
Anna VÁRI
The widespread use of microcomputers has been associated with an increasing interest in computerized decision aids which can assist individuals in solving their problems. This interest is well reflected by the relatively rapid spread and use of the programs discussed in this section, like that for "Multi-Attribute Utility Decomposition" (MAUD), developed by Humphreys and Wisudha, and that for "Goal-Directed Decision Structuring" (GODDESS), developed by Pearl, Leal, and Saleh. One of the aims of the authors in this section has been to provide an extensive survey of computerized decision aids based on different principles and developed for supporting the structuring of small-scale decision problems. At the same time, they have attempted to answer general questions of principle, as for instance, what the division of labour should be like between the decision maker, the decision analyst and the computer. Which activities of the human decision-making process should be automated and which should be augmented? To what extent can the help given by an analyst in resolving a decision problem be translated into software? How can the human representation of decision problems be modelled and what are the requirements to be met by the procedures used in decision aids?

In his paper, Gordon Pitz has sought an answer to the general questions on the division of labour between the decision maker and the decision aid. In approaching the above questions, the author has suggested carrying out a kind of cost-benefit analysis where "the costs of automation are based on the difficulty of automating the process accurately; its benefits depend on the difficulty faced by the human if left without mechanical assistance". In the case of certain types of abilities, like the integration of large quantities of information, the superiority of computers over the human is
quite obvious, and the advantages resulting from their use for such tasks can be easily determined. Concerning other abilities, like information retrieval from long-term memory, the evaluation of complex perceptual information and creative problem solving, the systems implemented at the present technical level do not even approach human abilities. However, the assistance lent by them may considerably increase (unaided) human abilities in these tasks. The degree of assistance depends, first of all, on the compatibility of the decision aid with human behaviour. Pitz's paper offers an analysis of some extensively used decision aids from the point of view of compatibility with human information processing procedures, comparing the applied techniques and procedures with recent results from psychological research on human information storage, retrieval and problem solving.

One of the most debated questions is how compatible the categories (terms) and procedures constructed during decision analysis are with the human mechanisms and strategies used to develop the structural representation of the decision problem. "Is the knowledge that is retrieved in decision situations actually structured in terms of actions, events, and outcomes, as decision theory assumes? How are goals connected to these elements?" These are the questions posed by Helmut Jungermann, Ingrid von Ulardt and Lutz Hausmann. In their paper they attach special importance to the goals, which they consider the key elements of problem representation. They propose a network model in which goals and actions are connected and the actions can be activated by the related goals. In order to validate the proposed model, they have investigated the following assumptions. When goals are made more explicit and specific, which means an increase in goal activation, the activation of the connected actions will also be increased. As a consequence, more options will be generated. On the other hand, when the problem is considered from the point of view of one's personal priorities, which means a limitation of goal activation compared with a less self-oriented approach, the number of actions generated will be limited as well. The results of the experiments support the hypotheses on the above two factors, i.e., goal explicitness and personal involvement, thus demonstrating the importance of goals in the process of generating options.

In Dimiter Driankov and Ivan Stantchev's paper a similarly great importance is attached to the goals in the process of option generation and evaluation. Goals are considered as complex wholes which are described in terms of constituent parts and interrelations among these parts. The authors suggest a fuzzy structural modelling approach which offers an efficient procedure for describing imprecise, ill-defined and/or
nonlinear interrelations among the constituent parts of the goals. The suggested procedure helps the decision maker in measuring the attainment of the goals, in assessing the strength of influence among the constituent parts of the goals, as well as in generating and testing options.

The case study discussed by Patrick Humphreys is concerned with otherwise rather neglected methodological problems of option generation. The case studied involved the selection of a set of computers intended to meet a relatively complex system of goals and requirements, by which, from a very extensive set of options, the specification of options worthy of consideration can be made. The suggested procedure is based on a thorough analysis of the goals to be achieved. The first step is the hierarchical decomposition of the goals into computing facilities. This is followed by the mapping out of three requirements spaces in which the interrelatedness of the computing facilities can be described from the point of view of the hardware and software requirements to be met, as well as from the point of view of the groups of potential users. Partitioning the requirements spaces in different ways is used for generating different configurations of systems meeting the whole set of requirements. First, options are generated under different philosophies (e.g., assuming the use of microcomputers, a "multi-micro" approach, etc.), and afterwards actually purchasable computers are specified which meet the requirements prescribed for the parts of the composite options. The case study demonstrates the practical applicability of certain procedures of goal analysis, e.g., the hierarchical decomposition of goals and the use of multidimensional scaling for mapping the requirements spaces, for aiding the generation of options. It also shows how flexibility analyses can be performed on the options under consideration.

While the above papers discussed the issues of the relationships between goals and options, with special regard to option generation, the other papers included in this section have focussed their attention on computerized procedures for aiding the evaluation of options. All the programs investigated are based on multi-attribute utility decomposition (MAUD), comprising automated procedures for the elicitation and weighting of attributes as well as for computing multi-attribute utilities.

The experiments described by Fred Bronner and Robert de Hoog have double aims. In part, they attempt to elaborate evaluation techniques which can help developers of interactive computer-based decision aids in revealing and eliminating the deficiencies of software still in the process of development. On the other hand, by using these techniques, they evaluate the experiences of the use of a MAUD-based decision aid (developed by themselves) by people without special training in decision analysis.
For the evaluation of the decision aid, two criteria were introduced: the overall ease of interaction with the program and the helpfulness of the aid. The value of the first criterion depends on how independently, i.e., without any technical or conceptual assistance, the subjects can go through the decision analysis, namely, how compatible the program is with the procedures of the individual problem structuring of the decision makers. In order to measure the degree of helpfulness, a so-called scale of consciousness raising was introduced. The scores on this scale depend on the subjects' judgements as to the importance of the program from the points of view of problem structuring, awareness raising and decision justification. The authors' experiment for testing the proposed criteria and measurement techniques in evaluating their program proved to be successful. For example, by investigating subjects' requests for (external) assistance in different phases of the analysis, the stages of the program requiring considerable improvement can be identified. By comparing different decision problems in terms of the "perceived applicability" scores provided by users of the program, guidelines for the application area of the software can be defined. Analysing the relationships between scores on the proposed evaluation criteria and characteristics of the subjects (education, age-group, sex, technical experience) can lead to important decisions related to the education and training of the potential users.

Similarly to Bronner and de Hoog, Richard S. John, Detlof von Winterfeldt, and Ward Edwards also consider the attribute elicitation procedure to be the most crucial phase of multi-attribute utility analysis. They compare the effectiveness of decision analyses performed by a computer running the MAUD 3 program (written by Humphreys and Wisudha) and by an analyst from the point of view of the number and quality of the elicited attributes. According to the results of their experiments, the number of attributes elicited ("completeness" of the analysis) was greater for analyst sessions than for MAUD 3 sessions. However, the attribute set elicited by MAUD 3 was judged to be more independent, both logically and value-wise. Judgements of the overall quality of the attribute set correlated highly with those of "completeness". Further findings from John, von Winterfeldt and Edwards' experiments point out the contradictory character of the evaluation criteria for decision analysis as well. The users' satisfaction with the process was better in analyst sessions, while the acceptance of the resulting alternatives was higher in MAUD 3 sessions. This points to the fact that the subjects' feelings were influenced by factors different from the quality of the results, e.g., by the supposed "completeness" of the elicited structure, the
need for personal interaction, the experience with or aptitude for computer-like tasks, etc. Consequently, instead of raising the dogmatic question of "computer or analyst", it is more to the point to create the most satisfactory division of labour between the computer and the analyst, considering the characteristics of each decision situation.

The paper by Stuart Wooler and Alma Erlich deals with another critical phase of multi-attribute utility analysis: attribute weighting. They made their investigations in connection with a MAUD-based set of interactive computer programs which had been developed for university students to structure and evaluate their own career options. The weighting assessments of two groups of students were investigated: members of the first group were at the stage in their decision processes where they had fully determined the option set they wished to evaluate, whereas the second group was at the stage where they had structured a preliminary option set, which they wished to evaluate with a view to determining those options worthy of further consideration. The weighting strategies of the two groups differed significantly. The weights assessed by the participants concerned with structuring correlated with the degree of satisfaction with the current option set on each attribute, namely, the degree of achievement of the sub-goals represented by the attributes. On the other hand, the weights assessed by those currently concerned with evaluating options were based on weight judgements across "grand scale" attributes, without incorporating a measure of the range of the option set on each attribute. The fact that the weighting strategy of both groups contradicts the prescriptions of decision theory highlights the point that, in every case, the choice of attribute weighting procedures should be made with due consideration. Special attention should be devoted to the anchors to be used in decision situations where, as here in the case of career choice, the decision makers usually face a lack of the knowledge necessary for constraining the attribute scales by reality and feasibility.
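The decision-theoretic prescription alluded to here is that attribute weights should reflect the swing of the actual option set on each attribute, not a context-free judgement of importance. The sketch below, with hypothetical career options and numbers, shows one simple way of making stated importance weights range-sensitive; it illustrates the principle rather than the procedure used in Wooler and Erlich's programs.

    # Illustration only (hypothetical options and numbers): making stated importance
    # weights sensitive to the range of the option set on each attribute.
    options = {                       # career options scored 0-100 on two attributes
        "option A": {"salary": 62, "job security": 30},
        "option B": {"salary": 58, "job security": 90},
    }

    importance_weights = {"salary": 0.7, "job security": 0.3}   # range-insensitive judgement

    def range_adjusted_weights(opts, importance):
        # Scale each stated importance by the swing (range) of the option set on that
        # attribute and renormalize: a simple way of making the weights range-sensitive.
        swings = {}
        for attr in importance:
            values = [scores[attr] for scores in opts.values()]
            swings[attr] = max(values) - min(values)
        raw = {attr: importance[attr] * swings[attr] for attr in importance}
        total = sum(raw.values())
        return {attr: weight / total for attr, weight in raw.items()}

    print("importance weights:    ", importance_weights)
    print("range-adjusted weights:", range_adjusted_weights(options, importance_weights))
    # With only a 4-point salary range in this option set, salary should carry little
    # weight in this particular choice, however important it is judged "in general".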
HUMAN ENGINEERING OF DECISION AIDS¹
Gordon F. PITZ
Southern Illinois University at Carbondale, U.S.A.
Broadly speaking, this paper is about helping people to make decisions when confronted with situations in which they are not sure what to do. When one reflects on the issues involved in this topic, it becomes clear that almost everything known about human behavior is potentially relevant. To make the topic manageable, I have defined certain boundary conditions on the kinds of decision problems to be considered, and on possible approaches to solving these problems. Someone interested in a more thorough survey of decision aiding methods and principles can sample from a variety of recent books and articles written from different points of view (e.g., Janis and Mann, 1977; Hogarth, 1980; Jungermann, 1980a; Humphreys, 1981). The situation I had in mind in writing this paper was one in which a problem of personal choice is to be assessed from the point of view of a formal model of decision analysis. I have assumed that external expertise is not available either in the form of a human expert or as part of the decision analysis. Several computer programs have recently appeared that can implement such a decision analysis (e.g., Humphreys and Wisudha, 1979; Weiss, 1980; Pearl et al., 1980). What makes these programs interesting to a psychologist is their generality; they make few assumptions about the nature of the person's problem, and include procedures for helping the person to describe any problem in terms that would be suitable for analysis. The programs attempt to formulate the problem according to some formal structure or representation; they vary in how they help a person to achieve this representation.
¹ I would like to thank Michael Franzen, Natalie Sachs, and three anonymous reviewers of the manuscript for their helpful comments on earlier versions of the paper.
Programs of this sort are likely to proliferate, and the ones I have mentioned will no doubt be modified on the basis of experience with their use. The purpose of this paper is to discuss some psychological issues involved in the design of decision aids of this kind. "Human Engineering" is the name given to that branch of applied psychology that uses general principles of human behavior to guide the design and construction of operational systems involving people (Alluisi and Morgan, 1976). I have used the term in the title of this paper because it captures quite precisely the issues involved in the design of decision aids. One approach used by human engineers to issues of design is to regard the human as an integral part of a larger system, and to investigate ways in which the contribution of each part of the system to its overall functioning can be made most efficient. One may divide this approach into two parts: first, find an appropriate division of labor between the human and the rest of the system; second, given that one has determined the responsibilities of the human component, design the total system in such a way that the person can meet these responsibilities most effectively. By focussing on the interaction between decision aid and decision maker I may give the impression that there are no other issues involved in the design of the decision aid. In typical applications of decision analysis there is usually a third party involved, namely, the decision analyst. Furthermore, many applications will incorporate other sources of information, especially when something is known about the substantive concerns of the decision maker. For example, an analysis of career decision making would certainly need to include relevant sources of vocational information. My failure to discuss these issues should not be taken as an indication that they are any less important than those that I have discussed.
Automation and the Division of Labor

The systems approach to design regards the decision maker and the decision aid as parts of a single decision making system. The overall goal of the system is to arrive at a choice that maximizes some explicit criterion, given certain assumptions about the nature of the problem. The importance of the conditionality implicit in this statement has been emphasized by Einhorn and Hogarth (1981). There may be few assertions to be made that are independent of the assumptions made about the problem. Nevertheless, it should still be possible to offer general guidelines that apply to the design of specific systems. Achieving the objective of the decision system is clearly dependent on input of some sort from the human decision maker. The person must
provide some information about the nature of the problem, about the uncertainties that are involved, about the values to be taken into consideration, etc. On the other hand, a mechanical component of some sort is necessary to provide an integration of all this information. The integration task is normally carried out by calculating utility functions, expected utilities, or other quantities. The output of the system, therefore, can be regarded as a set of descriptive statements calculated by the mechanical component. Between the initial human input and the final mechanical output, however, it is not clear which components of the decision task should be automated and which should be left to the individual decision maker. The important question is, what should a decision aid try to do for the decision maker? Deciding which parts of the process to automate requires a kind of cost-benefit analysis. The costs of automation are based on the difficulty of automating the process accurately; its benefits depend on the difficulty faced by the human if left without mechanical assistance. In some cases, the relative superiority of human and machine are clear cut. For example, the computational abilities of the machine are clearly far superior; once probabilities and utilities are known, no one would suggest leaving the calculation of expected utilities to the human. Machines are better able to follow formal algorithms without error, they can deal better with known information in a predetermined way, and they are able to ensure that no information is ignored that is known to be relevant. Conversely, we know that limitations on human short-term memory make it difficult to integrate large quantities of information (Newell, 1973). It is presumably for this reason that people revert to simplifying heuristics in making judgments (Lopes and Ekberg, 1980). Identifying the unique strength of humans is more difficult, since advances in artificial intelligence may make today's human strength tomorrow's mechanical advantage. Nevertheless, there are a number of respects in which no machine can yet approach the effectiveness of a person. One such area is the encoding, storage, and retrieval from long-term memory of large quantities of information organized in complex ways. In contrast to the limits on short-term memory, the capacity of, and the speed of information retrieval from, long-term memory are quite remarkable. As yet there is no mechanical substitute for human memory in this respect, even with quite narrowly defined content areas. Weiss (1980) has proposed a decision aid (GENTREE) that would have access to a very large data base for decision structuring. While such an effort is admirable, one must be pessimistic with respect to its success. The kind of data base with which computers deal efficiently is one in which a large number of items are organized in fairly simple ways (long lists of people, places, etc.). The
kind of data base that seems most useful for representing a decision problem is one involving multiple, complex interconnections among items of information. In recent years, some understanding of the organization of human memory has developed, and artificial models of memory organization have appeared (e.g., Anderson, 1976; Bobrow and Norman, 1975; Minsky, 1975). These models give some insight into the nature of human memory, but can hardly be said to replace it. One of the responsibilities, then, for the human component of the decision system will be to provide relevant information about the problem from knowledge stored in long-term memory. A second, uniquely human ability that may be important in designing decision aids is the rapid evaluation and categorization of complex perceptual information. While integrating complex information within short-term memory is awkward and error prone, the automatic integration of perceptual information is very fast and accurate (see Schneider and Shiffrin, 1977). This perceptual skill is related to the human ability to deal rapidly with large amounts of information stored in memory. For example, expert chess players derive much of their expertise from the rapid identification and evaluation of the dynamic features of a chess position. This is made possible by having in memory a very elaborate store of relevant information (Chase and Simon, 1973). Computer models of this process are not very impressive. For example, recent advances in computerized chess programs rely on the rapid search and calculation abilities of a computer, rather than on attempts to model human perceptual and mnemonic abilities.² Thus, insofar as decisions depend on perceptual evaluations, these appear to be best left to the human. Other areas in which human abilities seem to be unique include certain aspects of problem solving, inference and creativity. Pitz et al. (1980) have suggested that identifying new, creative options is an important aspect of decision aiding. While this creativity can be assisted, it is unlikely that it can be automated. As in the case of memory, computer models of human problem solving with a well-defined task can often be quite impressive. Nevertheless, I expect that it will be many years before generalized mechanical problem solvers can outperform humans if the problem is not well defined. This limitation will be particularly important if we are concerned with decision aiding systems that are intended to have general applicability.
² See Scientific American, April 1981, 244 (4), 83-85.
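As a concrete illustration of the kind of integration that is best handled mechanically, the sketch below computes expected utilities from elicited probabilities and utilities and ranks the options. The options, probabilities, and utilities are invented.

    # Illustration only (invented options, probabilities, and utilities): the purely
    # mechanical step of computing expected utilities and ranking the options.
    options = {
        "accept job offer":    [(0.7, 80), (0.3, 20)],   # (probability, utility) pairs
        "stay in current job": [(0.9, 55), (0.1, 40)],
        "return to school":    [(0.5, 95), (0.5, 10)],
    }

    expected_utility = {
        name: sum(p * u for p, u in outcomes)
        for name, outcomes in options.items()
    }

    for name in sorted(expected_utility, key=expected_utility.get, reverse=True):
        print(f"{name}: expected utility = {expected_utility[name]:.1f}")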
The design of a decision aid requires that much of the relevant information be made available to the decision system from the memory of the human decision maker. The remainder of the decision analysis process will probably not be fully automated. For any given substantive problem, it may be possible to provide useful technical information, as well as an overall structure for objectives and attributes. However, there will always be advantages to having human perceptual skills and creative problem solving abilities used to evaluate the information. Nevertheless, the use of human memory, perception, and problem solving within the decision system is not without need of mechanical assistance. There are characteristic limitations to human memory and problem solving. An important part of any decision aid, therefore, might consist of devices designed to overcome these limitations. A good decision aiding mechanism would help an individual retrieve from long-term memory (and, perhaps, from other sources) information that might be relevant for the analysis of the problem. In addition, it might help a person to recognize new connections between previously unassociated items in ways that lead to creative problem solving. In order to develop procedures for assisting humans in performing these tasks, it will be necessary to know something about the nature of the information retrieval and problem solving processes. For example, if we know how retrieval from memory occurs, it may be possible to design methods for assisting the retrieval process. If we know something about how inferences are made, it may be possible to construct automatic devices that help a person to make new and appropriate inferences. Thus, we need to address the issues of designing an interface between the human and the machine.

Communication Between Decision Maker and Decision Aid

Problems in designing a mechanical decision aid range from the traditional concern of human engineers with the design of displays and controls, to the more subtle problem of making sure that information stored in human memory is adequately represented to the system. It is helpful in discussing these problems to divide the decision making process into four separate stages. Stage one consists of selecting an appropriate abstract model that can be used for analyzing the problem. Stage two consists of formulating the original problem in terms of the structure that is required by the analytic model. Stage three is the quantification process, in which the decision maker's values and beliefs are represented in some form suitable for computation. Stage four consists of the feedback of results to the
decision maker, and possible reevaluation of the judgements made at the three earlier stages. In any complex analysis these stages will not be clearly distinct, and the decision making process will not follow such an orderly sequence. Nevertheless, the stages can be helpful in identifying some of the psychological issues involved. Design issues can be addressed separately for each of these four stages.
Model Selection

There is not a lot to be said about the model selection process. A characteristic of decision aids currently available is that they offer little flexibility in the kind of analytic model used for decision analysis. Such a limitation is perhaps inevitable. The demands made on the system at later stages of the analysis are frequently unique to one particular kind of model. To design a system capable of working with several different models would probably not be worth the effort involved. Some of the issues involved in model selection have been discussed by von Winterfeldt (1980). An effective choice of model depends upon having a useful taxonomy of problems that might be used for diagnostic purposes. Von Winterfeldt has discussed existent taxonomies, and he outlines an approach based on the construction of prototypical problem types that might describe broad categories of substantive problems. Subsequent sections make no particular assumptions about the type of decision model to be used, except that it falls into the category of "orthodox decision analysis". This consists of a combination of expected utility analysis of the sort described by Raiffa (1968), and its extension to the multiattribute utility models described by Keeney and Raiffa (1976). Issues of problem structuring and quantification that arise within the context of these models will be discussed.
Problem Structuring
Humphreys and McFadden (1980) suggest that one of the most important things a decision aid can do for a person is to provide some better understanding of the interrelationships among elements of the problem. In any event, structuring a problem appropriately, i.e., formulating it in such a way that it is amenable to analysis according to the analytic model, is far more important than the small amount of research devoted to the topic might imply (Jungermann, 1980b). There are many issues involved in this stage of the decision analysis that are not yet well understood. From a
human engineering point of view, it may be the most critical stage in the decision analysis. In discussing problem structuring, I shall assume that we are interested in eliciting from the decision maker two kinds of information. On the one hand, we need to know the range of options that might be available, and the possible outcomes that might result from each choice. On the other hand, we need information about the decision maker's goals or objectives, from which we can derive an attribute structure that can be employed to evaluate options and outcomes. Assume for simplicity's sake that the only source of relevant information is the individual decision maker. In some cases we might want to consider the possibility that the decision aid could direct the decision maker to external sources of information (see, for example, the career guidance system designed by Katz, 1973). However, the present discussion will not address these possibilities. The simplest approach that a decision aid can take is to ask the decision maker to provide a list of options and a set of attributes. The program QVAL (Weiss, 1980) does little more than use this direct approach. Humphreys and Wisudha's program MAUD possesses a more elaborate set of procedures for eliciting attributes. The elicitation approach taken by MAUD is based upon Kelly's personal construct theory (see Humphreys and McFadden, 1980). MAUD includes a number of techniques that help the decision maker to develop a multiattribute representation of the problem. An equally elaborate approach, but based on a very different set of principles, is described by Pearl et al. (1980). The program GODDESS is based on a goal-directed approach to decision structuring. It begins with an attempt to specify the decision maker's primary objective, and then attempts to describe the means-end relationships among elements of the problem. A decision structuring mechanism needs to address three separate issues, each of which has important psychological implications. First, it is necessary to retrieve from the decision maker's long-term memory information that might be relevant for generating options and for describing objectives and attributes. Second, it must help the decision maker to go beyond the available information. For example, it may be necessary to provide generalized, abstract characterizations of options, outcomes, and objectives. In addition, it may be useful to help the person exercise some creativity in generating options or outcomes that have not previously been considered. Third, the mechanism must help the decision maker to do all this in a way that is compatible with the analytic model.
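To make the contrast with a bare list-based dialogue more tangible, here is a minimal sketch of the kind of elicitation step that construct-theory-based aids build on: options are presented three at a time, and the stated way in which one option differs from the other two becomes a candidate attribute. The options, the canned replies, and the function names are invented; this is not a reproduction of MAUD's actual dialogue or procedures.

    # Illustration only: a triad-based elicitation step in the spirit of personal
    # construct theory. The options, canned replies, and function names are invented;
    # this is not MAUD's actual dialogue or procedure.
    from itertools import combinations

    def elicit_attributes(options, answer):
        """answer(triad) stands in for the decision maker's reply: it returns
        (odd_one_out, attribute_label) for a given triad of options."""
        attributes = []
        for triad in combinations(options, 3):
            _odd_one_out, label = answer(triad)
            if label not in attributes:
                attributes.append(label)
        return attributes

    # Example with a canned reply in place of an interactive dialogue.
    options = ["flat in town", "house in suburb", "shared flat"]
    canned = {("flat in town", "house in suburb", "shared flat"): ("house in suburb", "distance to work")}
    print(elicit_attributes(options, lambda triad: canned[triad]))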
Information Retrieval

Retrieval of information from memory is a process that is well understood by cognitive psychologists. Its role in structuring decision problems has been discussed by Gettys and Fisher (1979). It is possible to identify a number of reasons why important information might not be retrieved at the appropriate time. For example, retrieval is known to be context dependent; information stored under one set of conditions may not be retrievable in a different context (Light and Carter-Sobell, 1970). The best general solution to this problem is to devise procedures for encouraging a person to view problems from a range of different perspectives. Another way in which the decision aid might help in the retrieval of information is by providing cues that have a high probability of being associated with relevant information. In this way, the aid can direct the decision maker's search for information in long-term memory. Since we are assuming that initially the decision aid has no information about the specific problem faced by the decision maker, retrieval cues must obviously be derived from the decision maker. Any procedure for assisting retrieval must therefore include a bootstrapping process, in which preliminary information obtained from the decision maker is used to generate retrieval cues that might be helpful in generating new information. This approach to structuring has been discussed by Pitz, Sachs, and Heerboth (1980) and by Pitz, Sachs, and Brown (1981). We described two different techniques that seem to be effective in helping a person to generate options and objectives. The technique that we found to be most helpful is based on the theoretical concept of "schemata" (see below), and on the kind of means-end analysis that is represented by the GODDESS program (Pearl et al., 1980). It involves using information about options to suggest relevant objectives, and using information about objectives to elicit possible options. Since in many cases one cannot differentiate the retrieval of existent information from the generation of new information, a discussion of this technique must include a consideration of the inference process. Retrieval and inference appear to be closely interrelated mechanisms.
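The bootstrapping idea lends itself to a simple prompting loop: every answer already obtained becomes a retrieval cue for the next question. The sketch below is only an assumed illustration of that loop; the cue wording, the canned answers, and the function names are invented and do not reproduce the procedure reported by Pitz, Sachs, and Heerboth (1980).

```python
# Hypothetical sketch of bootstrapped retrieval: answers already obtained from
# the decision maker are turned into cues for further retrieval. The `ask`
# function is a stand-in with canned answers; in a real aid it would prompt
# the decision maker (e.g. via input()).

def ask(question: str) -> str:
    canned = {
        "What would choosing 'evening classes' help you achieve?": "more qualifications",
        "What else might help you achieve 'more qualifications'?": "on-the-job training",
    }
    return canned.get(question, "")

def bootstrap(seed_option: str, rounds: int = 1):
    options, objectives = [seed_option], []
    for _ in range(rounds):
        for option in list(options):        # options cue objectives ...
            answer = ask(f"What would choosing '{option}' help you achieve?")
            if answer and answer not in objectives:
                objectives.append(answer)
        for objective in list(objectives):  # ... and objectives cue further options
            answer = ask(f"What else might help you achieve '{objective}'?")
            if answer and answer not in options:
                options.append(answer)
    return options, objectives

print(bootstrap("evening classes"))
# -> (['evening classes', 'on-the-job training'], ['more qualifications'])
```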
Generating New Ideas

It is almost impossible to rely on retrieval alone to provide an adequate structure for a decision analysis. By specifying options, outcomes, and attributes, the decision maker will inevitably begin to explore new ideas. It is clear that memory processes cannot in general be separated from
inference processes (Bransford et al., 1972). Current theories of memory make extensive use of this interdependence. Several recent theories suggest that memory is organized around "schemata" or "scripts", which are abstract scenarios that describe typical sequences of events and actions in one's daily life. It is by using such schemata that a person is able to understand and remember information. Establishing the connections between generalized schemata and specific event sequences involves the use of inference processes (see, e.g., Schank and Abelson, 1977). By reference to the concept of schemata, we can understand how an individual might have difficulty in structuring a problem completely: the structure tends to be limited to the set of scripts that the person thinks are relevant for the problem. This kind of fixation can be seen in numerous studies of problem solving. The phenomenon of "set" (Luchins, 1942) refers to the continued use of strategies that at one time were useful, but are no longer relevant. The similar phenomenon of "functional fixity" (Duncker, 1945) is the identification of some component of the problem in terms of its most obvious function, and the consequent failure to recognize alternative functions that might be served by that component. In both cases, a person is limited by sequences of ideas that follow well established lines. Creative solutions to problems are typically found by breaking away from such structured thinking. One way of doing this is to stimulate alternative scenarios. Pitz et al. (1980) described how this might be done. They found that the number of options that a person could generate for a typical decision problem was increased by providing the person with a list of the relevant objectives, one at a time, and asking the person to think only of choices that might be helpful for achieving that objective. In a further study of the method, Pitz et al. (1981) found that the additional options generated in that way were not identifiably poorer choices. They also found that the same procedure could be used effectively in reverse. By presenting a list of options, one at a time, and asking the decision maker to think about possible outcomes of each choice, it was possible to generate a more complete attribute structure for analyzing the problem. The procedure used for generating objectives and attributes by Pitz et al. (1981) differs in some interesting ways from the procedure used by MAUD (Humphreys and Wisudha, 1979). The essence of MAUD's procedure is to focus on dimensions that define differences between options. The Pitz et al. procedure focuses on outcomes of individual options. In a recent experiment, Michael Brown and I compared the effectiveness of these two procedures. We were unable to find any difference between them in terms of the completeness of the attribute structure that results, or its ability to differentiate among the options. Nevertheless, there appear
to be some important psychological differences between the two procedures that would be worth exploring further. There may be other ways in which the retrieval and generation of relevant information can be assisted by a mechanized decision aid. As mentioned earlier, one of the reasons why human problem solving sometimes fails is the limited ability of short-term memory to deal simultaneously with several items of information. Many failures to recognize new solutions may occur because the individual never had the necessary pieces of information together in short-term memory. It should be possible to design display systems that present elements of the problem structure in groups in such a way that the decision maker can use them creatively. One must be cautious, however, in using such a procedure. The ability of a machine to present the decision maker with large quantities of information can actually interfere with the efficiency of the decision making process. For example, management information systems now exist that enable a decision maker to retrieve and display large quantities of information. However, one effect of increased information may be to increase the decision maker's confidence in his or her eventual decision without any necessary increase in accuracy (Oskamp, 1965). Einhorn and Hogarth (1978) describe how it is possible under some conditions to learn to be very confident about poor judgements, and never to discover this error.
Problem Representation

Generating a formal problem structure requires that a decision maker's knowledge, beliefs, and preferences be translated into a format appropriate for the analysis. The decision maker's description of the problem, even if complete, will not necessarily be suitable as a formal structure. For example, the typical multiattribute utility model considers outcomes and options to be points in a continuous multidimensional space. There may be times when the decision maker's description of the information can be characterized in this way, for example, when the important concerns for the decision maker can be described by variables such as cost, time, and distance. However, in most cases a spatial representation of the information may be very difficult to obtain. Recent theories of the organization of knowledge, and theories of preference, tell us something about how relevant information is organized for the decision maker. While there are some theories of knowledge that are spatial in character (e.g., Rips et al., 1973), most recent theories suggest that knowledge consists of a network of associations among discrete nodes. Information may consist of features, relations, hierarchical
structures, or production rules, but the way in which these are related to continuous quantities is not at all clear (see Pitz, 1980). Similarly, there are non-spatial theories of preference (e.g., Tversky and Sattath, 1979) that may help to explain how problems are represented cognitively. An important concern in the design of decision aids is to find procedures for translating non-spatial information about preferences and knowledge into the format required by a normative model. As an example of the difficulties involved in creating an appropriate problem representation, consider an application of multiattribute utility theory to the problem of selecting a contraceptive (Sachs and Pitz, 1981). Both cognitive theory and actual experience suggest that the most suitable representation of the problem is one that characterizes the options in terms of discrete features that are unique to each contraceptive. The pill and the IUD pose certain health hazards (different in each case), rhythm methods require a period of abstinence, and so on. There is no reason in principle why a multiattribute utility model could not use such features as attributes. In practice, however, the problem of assigning weights to the features is then unmanageable, due largely to the difficulty of establishing tradeoffs among attributes of this sort. The alternative approach is to use more abstract attributes that are continuously variable (we used such attributes as convenience, personal feelings, etc.). To establish such attributes, however, requires a great deal of effort; it took six of us several months to establish the attribute set, and we still ran into unexpected difficulties later. To ask a decision maker to perform this abstraction process for each separate problem is to violate the original purpose of decision analysis. The assumption behind the analysis is that one should minimize the amount of integration that the decision maker has to perform, by asking the individual to evaluate only the basic elements of the problem. There are other recent findings concerning the representation of information in long-term memory that may have implications for decision structuring. Wilks (1977) has suggested that knowledge can be divided into two broad categories, causally-oriented and goal-oriented. The former deals with cause and effect sequences among events, the latter with actions that have some intentional or goal-directed property. When representing a problem for purposes of decision analysis, goal-oriented knowledge may be most relevant when describing the objectives that are to be achieved; causally-oriented knowledge may be most important in describing tree structures that connect possible options with subsequent outcomes. A recent study by Graesser, Robertson, and Anderson (1981) suggests that these two kinds of information are organized in different ways. If this is the case, problem representation procedures that deal with objectives may
have to be different from those that deal with options and outcomes. Note that the program MAUD, which is primarily concerned with the attribute structure that relates to the objectives, has a set of procedures quite different from those used by GODDESS, which is concerned with the means-end relationships between options, outcomes, and goals. One important property of goal-oriented knowledge is that it is hierarchical. Achieving a single goal may depend on first achieving a number of sub-goals, while each sub-goal requires establishing intermediate goals, and so on. For this reason, goals at each of these levels may be regarded as options; one may establish as an option the pursuit of a given objective. For example, suppose that a college student is trying to select a career. One option is to pursue a career in medicine, which requires that the student first pass a number of science courses, then make application to, and be accepted by, a medical school, find the necessary financial support, and so on. In this situation the distinction between options and objectives becomes confused. Any option may require that certain conditions be met before it can be selected, which leads to the specification of new objectives. On the other hand, one may choose whether or not to include a certain objective in the analysis. Descriptive models of human problem solving suggest that people do think in terms of such means-end relationships (e.g., Atwood and Polson, 1976). Yet the usual form of decision analysis has difficulty in capturing this representation (see Pearl et al., 1980).

Problem Translation
Because a person's cognitive representation of a problem may be so different in character from that required by a decision analysis, it would be helpful if decision aids could somehow translate a direct description of the problem into a format compatible with the analysis. To some extent this is what the programs MAUD and GODDESS try to do. MAUD, however, still requires the decision maker to identify and name the relevant dimensions. GODDESS is more compatible with a means-end representation of problems, but it is not clear that it can capture the feature-based structure of many personal decisions. Problem translation would be a two-stage process. First it is necessary to assess the structure of the decision maker's representation of the problem; then that representation must be translated into an analytic structure. Several procedures described recently might be adaptable for the problem representation component. For example, Reitman and Rueter (1980) describe a technique for assessing
the structure of information in long-term memory that is based on an analysis of free recall; it uses item sequences and pauses between items to infer the underlying structure. Graesser et al. (1981) describe a procedure that uses a series of "how", "why", "when", and "where" questions to generate a representation of a person's knowledge structure. Both of these techniques lead to representations in terms of discrete nodes and associations among nodes, rather than as points in a multidimensional space. To be usable for decision analysis purposes, one must develop algorithms for mapping a representation of this sort onto a multiattribute space. In order to develop such a mapping procedure one must know something about the kinds of problems that people confront, and how these problems are represented cognitively. One might employ something like von Winterfeldt's concept of problem prototypes, but using a taxonomy that is defined in terms of cognitive representations rather than situational characteristics. One way to initiate such a taxonomy is to develop a syntax for problem representation. It may be possible to identify a set of rules that describe the form of a problem, independent of its content. Some preliminary research of mine has been concerned with the way in which people describe their problems. A person is asked to characterize the problem in a single sentence, beginning with the words "I do not know ..." One can construct a grammar for parsing the sentences that are produced this way. That is, each sentence can be described as a sequence of transitions between nodes of a syntactic net. What is of interest is the possibility of inferring important properties of the problem from a purely syntactic analysis of this sort. For example, a distinction that may be important for decision structuring is between problems in which the options are well defined, and problems in which the decision maker needs to expend some effort to discover suitable options. It appears that sentences beginning "where", "when", or "which" are more often associated with problems with well defined options. "Whether" typically identifies two-choice problems. "How", on the other hand, is usually associated with problems in which a search for suitable options would be an important part of the decision analysis. The purpose of a syntactic representation is to provide a formal mechanism for identifying problem characteristics that are important from a decision analytic point of view. The goal of such an approach is to describe a problem without having to use concepts or frameworks that are incompatible with the cognitive representation of the problem. Only if one can establish translation rules that map cognitive descriptions onto formal representations will it be possible to automate the most difficult part of the problem structuring process.
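To make the syntactic idea concrete, the sketch below classifies a problem statement by the interrogative that follows "I do not know ...". It is only an assumed, much-simplified stand-in for the grammar described above; the category labels and the single regular expression are invented for illustration.

```python
# A much-simplified, hypothetical version of the syntactic classification
# discussed above: only the first interrogative after "I do not know" is used.

import re

def classify_problem_statement(sentence: str) -> str:
    match = re.match(r"\s*i\s+do\s+not\s+know\s+(\w+)", sentence, re.IGNORECASE)
    if not match:
        return "unrecognised statement"
    head = match.group(1).lower()
    if head in ("where", "when", "which"):
        return "options appear to be well defined; focus on evaluation"
    if head == "whether":
        return "two-choice problem"
    if head == "how":
        return "a search for suitable options is likely to be needed"
    return "no classification rule for this form"

print(classify_problem_statement("I do not know which job offer to accept."))
print(classify_problem_statement("I do not know how to reduce my workload."))
```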
Quantification

Problems involved in quantifying a decision maker's values, preferences, and beliefs are widely known and have already been discussed by others (Tversky and Kahneman, 1981; Fischhoff et al., 1980). It is clear that one can obtain mutually inconsistent judgements from the decision maker by asking equivalent questions in different ways, or in the context of different statements of the problem. Apart from inconsistencies that stem from information processing limitations, there are other biases that might be due to the way in which an individual is asked to translate subjective feelings into numbers. Some of these issues have been discussed by Weiss (1980); I shall not pursue them here.
Evaluating the Results

The purpose of a decision aid is to provide an analysis of a problem that is acceptable to the decision maker, and that makes recommendations for future action. The final recommendation must be consistent with all other information provided by the decision maker. Presumably the decision aid contains procedures for detecting inconsistencies, whether they occur among the initial judgements, or between the final recommendations and the person's global evaluation of the options. However, when inconsistencies do exist it is not clear how one should proceed; they may arise for any number of reasons. Inconsistencies among quantifications of beliefs and values, for example, may result from an inappropriate representation of the problem rather than from biased judgements. The source of concern at this stage may be deeper than the identification and resolution of inconsistencies. The concept of consistency, the foundation stone of decision analysis, may itself be a chimera. Demonstrating the consistency of any sufficiently powerful formal system from within that system is known to be impossible; there must always be gaps in the demonstration (Gödel's theorem). In a fascinating discussion of this issue, Hofstadter (1979) suggests that the gaps specified by Gödel's theorem arise under conditions of "self-reference", that is, when a formal system somehow makes reference to itself. The formal system of decision analysis concerns the specification of beliefs and values. Since decision theory itself provides the rationale for evaluating any behavior, self-reference occurs whenever one attempts to evaluate the statements of value. An implication of Hofstadter's thesis is that such self-reference is bound to lead to paradox and confusion. One resolution of the paradox was provided by the Persians
over two thousand years ago. According to Herodotus, in Book One of The Histories:¹
"If an important decision is to be made, they discuss the question when they are drunk, and the following day the master of the house where the discussion was held submits their decision for reconsideration when they are sober. If they still approve it, it is adopted; if not, it is abandoned. Conversely, any decision they make when they are sober is reconsidered afterwards when they are drunk."

In other words, invariance of recommendation under changes in context may be the safest guide to action. Humphreys and McFadden (1980) have suggested that the value of a decision analysis lies in the insight that it provides into the decision maker's value system, rather than in specific recommendations for action. Their conclusion is consistent with observations of our own; following a decision analysis, individuals frequently make statements to the effect that they have a better understanding of their problem, and that they see more clearly what to do. However, in a world characterized by uncertainty and conflict, it is difficult to argue that such an outcome is inevitably desirable. Have we provided the person with insight, or have we converted realistic confusion into dogmatic determination? If the latter, is such dogmatism necessarily undesirable? In other words, can we ever know if, or how, the decision aid is aiding anything? A formal analysis of a decision problem may be an internally coherent representation of the problem, but translating the analysis into action requires a leap of faith that cannot be justified within the system.

¹ I would like to thank Roger Garberg for bringing to my attention this early form of decision aiding.

References

Alluisi, E. A. and B. B. Morgan, Jr., 1976. Engineering psychology and human performance. Annual Review of Psychology, 27, 305-330.
Anderson, J. R., 1976. Language, Memory, and Thought. Hillsdale, N.J.: Erlbaum.
Atwood, M. E. and P. G. Polson, 1976. A process model for water jug problems. Cognitive Psychology, 8, 191-216.
Bobrow, D. G. and D. A. Norman, 1975. Some principles of memory schemata. In: D. G. Bobrow and A. M. Collins (eds.), Representation and Understanding: Studies in Cognitive Science. New York: Academic Press.
Bransford, J. D., J. R. Barclay, and J. J. Franks, 1972. Sentence memory: A constructive versus interpretive approach. Cognitive Psychology, 3, 193-209.
Chase, W. G. and H. A. Simon, 1973. The mind's eye in chess. In: W. G. Chase (ed.), Visual Information Processing. New York: Academic Press.
Duncker, K., 1945. On problem solving. Translated by L. S. Lees from the 1935 original. Psychological Monographs, 58 (270).
Einhorn, H. J. and R. M. Hogarth, 1978. Confidence in judgment: Persistence of the illusion of validity. Psychological Review, 85, 395-416.
Einhorn, H. J. and R. M. Hogarth, 1981. Behavioral decision theory: Processes of judgment and choice. Annual Review of Psychology, 32, 53-88.
Fischhoff, B., P. Slovic, and S. Lichtenstein, 1980. Knowing what you want: Measuring labile values. In: T. S. Wallsten (ed.), Cognitive Processes in Choice and Decision Behavior. Hillsdale, N.J.: Erlbaum.
Gettys, C. F. and S. D. Fisher, 1979. Hypothesis plausibility and hypothesis generation. Organizational Behavior and Human Performance, 24, 93-110.
Graesser, A. C., S. P. Robertson, and G. A. Anderson, 1981. Incorporating inferences in narrative representations: A study of how and why. Cognitive Psychology, 13, 1-26.
Hofstadter, D. R., 1979. Gödel, Escher, Bach: An Eternal Golden Braid. New York: Basic Books.
Hogarth, R. M., 1980. Judgement and Choice: The Psychology of Decision. Chichester, England: Wiley & Sons.
Humphreys, P. C., 1981. Decision aids: Aiding decisions. In: L. Sjöberg, T. Tyszka, and J. A. Wise (eds.), Decision Analysis and Decision Processes. Lund: Doxa.
Humphreys, P. C. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-69.
Humphreys, P. C. and A. Wisudha, 1979. MAUD: An interactive computer program for the structuring, decomposition, and recomposition of preferences between multiattributed alternatives. Technical Report 79-2. Uxbridge, Middlesex: Decision Analysis Unit, Brunel University.
Janis, I. L. and L. Mann, 1977. Decision Making. New York: Free Press.
Jungermann, H., 1980a. Speculations about decision-theoretic aids for personal decision making. Acta Psychologica, 45, 7-34.
Jungermann, H., 1980b. Structural modeling of decision problems. Paper presented at the American Psychological Association, Montreal.
Katz, M., 1973. Career decision making: A computer-based system of interactive guidance and information (SIGI). Proceedings of the 1973 conference on testing problems: Measurement for Self-Understanding and Personal Development. Princeton, N.J.: Educational Testing Service.
Keeney, R. L. and H. Raiffa, 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: Wiley.
Light, L. L. and L. Carter-Sobell, 1970. Effects of changes in semantic context on recognition memory. Journal of Verbal Learning and Verbal Behavior, 9, 1-11.
Lopes, L. L. and P. H. Ekberg, 1980. Test of an ordering hypothesis in risky decision making. Acta Psychologica, 45, 161-167.
Luchins, A. S., 1942. Mechanization in problem solving. Psychological Monographs, 54 (248).
Minsky, M. A., 1975. A framework for representing knowledge. In: P. H. Winston (ed.), The Psychology of Computer Vision. New York: McGraw-Hill.
Newell, A., 1973. Production systems: Models of control structures. In: W. G. Chase (ed.), Visual Information Processing. New York: Academic Press.
Oskamp, S., 1965. Overconfidence in case-study judgements. Journal of Consulting Psychology, 29, 261-265.
Pearl, J., A. Leal, and J. Saleh, 1980. GODDESS: A goal-directed decision structuring system. UCLA-ENG-CSL-8034. School of Engineering and Applied Sciences, University of California, Los Angeles.
Pitz, G. F., 1980. The very guide of life: The use of probabilistic information for making decisions. In: T. S. Wallsten (ed.), Cognitive Processes in Choice and Decision Behavior. Hillsdale, N.J.: Erlbaum.
Pitz, G. F., N. J. Sachs, and M. T. Brown, 1981. Eliciting a formal problem structure for individual decision analysis. Unpublished manuscript. Southern Illinois University, Carbondale.
Pitz, G. F., N. J. Sachs, and J. Heerboth, 1980. Procedures for eliciting choices in the analysis of individual decisions. Organizational Behavior and Human Performance, 26, 396-408.
Raiffa, H., 1968. Decision Analysis. Reading, Mass.: Addison-Wesley.
Reitman, J. S. and M. R. Rueter, 1980. Organization revealed by recall orders and confirmed by pauses. Cognitive Psychology, 12, 554-581.
Rips, L. J., E. J. Shoben, and E. E. Smith, 1973. Semantic distance and the verification of semantic relations. Journal of Verbal Learning and Verbal Behavior, 12, 1-20.
Sachs, N. J. and G. F. Pitz, 1981. Choosing the best method of contraception: Application of decision analysis to contraceptive counseling and selection. Unpublished manuscript. Southern Illinois University, Carbondale.
Schank, R. C. and R. P. Abelson, 1977. Scripts, Plans, Goals, and Understanding. Hillsdale, N.J.: Erlbaum.
Schneider, W. and R. M. Shiffrin, 1977. Controlled and automatic human information processing: I. Detection, search, and attention. Psychological Review, 84, 1-66.
Tversky, A. and D. Kahneman, 1981. The framing of decisions and the psychology of choice. Science, 211, 453-458.
Tversky, A. and S. Sattath, 1979. Preference trees. Psychological Review, 86, 542-573.
von Winterfeldt, D., 1980. Structuring decision problems for decision analysis. Acta Psychologica, 45, 71-93.
Weiss, J. J., 1980. QVAL and GENTREE: Two approaches to problem structuring in decision aids. Technical Report 80-3-97. McLean, VA: Decisions and Designs, Inc.
Wilks, Y., 1977. What sort of taxonomy of causation do we need for language understanding? Cognitive Science, 1, 235-264.
THE ROLE OF THE GOAL FOR GENERATING ACTIONS*

Helmut JUNGERMANN, Ingrid von ULARDT, and Lutz HAUSMANN
Technical University, Berlin
Abstract

In this paper the initial phase of decision processes is conceptualized as the development of a structural representation of relevant knowledge. Goals are viewed as playing an important role in representing decision problems when they have some specific content and are not purely formal (e.g., maximize SEU). A network model is proposed for the representation of goals and actions, and several assumptions are made regarding the spread of activation through the network. In an experiment, hypotheses about the effects of two factors were investigated: goal explicitness (E) was varied by presenting to Ss goal hierarchies of different specificity (one to three levels), and goal importance (R) was varied by letting Ss either rank-order goals with respect to their personal priorities, or not. The results show that the number of actions generated increases with the degree of goal explicitness, thus supporting the Ss' creative search process, whereas the number of actions is lower for Ss who focus on their own values compared to Ss who do not, thus pointing to ego involvement as a factor restricting creativity. On the other hand, the actions generated by the personally involved group were rated higher on goal achievement scales than the actions generated by the other group. The results are in accordance with the model, which, however, needs elaboration.
The Structural Representation of Decision Problems

Any decision problem can be assumed to be structurally represented somehow. This representation may be more or less explicit, more or less precise, and the subject may be more or less aware of it, but in any case it reflects the subject's perspective of the situation from which the decision
* We would like to thank Susanne Dibbelt for her cooperation in the preparation of the experiment, and we are indebted to Robin Hogarth and Lennart Sjöberg for comments on a previous version of this paper. Requests for reprints should be sent to Helmut Jungermann, Institut für Psychologie, Technische Universität Berlin, Dovestr. 1-5, D-1000 Berlin 10, West Germany.
process starts. A structure is a set of components of a complex whole and their interrelations; developing a structure, then, implies generating the components of the problem and relating these components to each other. Both processes are closely intertwined and can be distinguished only analytically: the generating process is mostly guided by some implicit assumptions about the relations among the elements (e.g., their similarity or their mutual influences), and the structuring process often leads to a redefinition of the element set (e.g., adding or eliminating elements). In decision problems, the components may be possible actions, relevant events or states, potential outcomes, or goals and objectives; relations may be of a categorical or means-end sort (e.g., in goal hierarchies) or of a causal sort (e.g., in decision trees). A person's representation of a problem, whether it be 'real' or 'experimental', draws on two sources: (a) knowledge already stored in memory, which is activated and retrieved in a particular situation; (b) information that is searched for in the environment, and subsequently stored and integrated in permanent memory. Knowledge may be stored as relatively unconnected pieces of information that are structured only in a specific situation, or it might be stored in some already existing structure, e.g., as a schema or script. The mechanisms and strategies people use to generate and structure the components of a decision problem are not yet well understood. This is important, though, since it is in this initial phase of a decision process that an ill-defined problem becomes well-defined, and the definition of a problem, of course, strongly predetermines the subsequent process. For instance, the selection of actions taken into consideration, or not, is certainly an important decision made before the usual decision process (i.e., the selection of one of the actions) starts. This is particularly important when actions are not somehow given (e.g., sites for energy facilities), but must be designed or created (e.g., in urban planning), i.e., when imagination, fantasy, and creativity are required. Whereas many authors have pointed out the importance of studying the process of representation and the factors that affect it (e.g., Vlek and Wagenaar, 1979; Jungermann, 1980; von Winterfeldt, 1980; Einhorn and Hogarth, 1981; Pitz, this volume), little empirical research exists on the issue. Exceptions are Gettys and Fisher (1979), who studied how people generate hypotheses about possible states of the world, and Pitz, Sachs, and Heerboth (1980), who investigated the generation of options and goals.
The Goal Concept in Decision Theory

Of particular importance for the problem formulation is the goal concept, but its function for representing decision problems has received little theoretical attention in decision theory. 'Goals' have been used in different meanings and functions:
- Originally, the 'goal' of a decision maker was understood as nothing else than to 'maximize' something. A goal in this sense does not have any particular content; it is a purely 'formal' goal, it is context-free. Its function is to serve as a criterion for selecting a course of action.
- With the extension of the classic approach towards a 'multiple-objectives' approach, however, the goal concept acquired another meaning: goals stand for specific outcomes the decision maker tries to achieve (Keeney and Raiffa, 1976). A goal has a particular 'content'; it is context-bound, which has consequences for the understanding of 'rationality' and 'optimality' (cf. Einhorn and Hogarth, 1981; Pitz, this volume).
So far, the function of this goal concept has mostly been to generate and structure outcomes, i.e., to define attributes or dimensions on which potential outcomes may be evaluated. This is one role of the goal(s) in representing decision problems. Another role might be considered, however, namely, to generate and structure actions, i.e., prior to all further steps (evaluation and selection), to design possible alternative courses of action. It is surprising that this interpretation has not been explored more systematically, since we can think of no other way of defining the set of actions than to consider the goal or goals to be achieved. The reason for this neglect might be that most decision-theoretic approaches, particularly the prescriptive ones, assume a well-defined problem in the sense that the options are given; the focus of these 'option-driven' approaches is on evaluation and selection. Only recently, in what may be called 'goal-driven' approaches, have ill-defined problems come under closer study, in the sense that the options are not given but must be created (e.g., Toda, 1976; Vlek and Wagenaar, 1979; Hogarth, 1980; Pearl, Leal, and Saleh, 1980; Pitz, Sachs, and Heerboth, 1980). Pearl, Leal, and Saleh (1980) were apparently the first who, within a decision-theoretic framework, used the subject's goal(s) for generating actions. GODDESS is a computerized goal-directed decision structuring system for representing decision problems. The system allows the user to state relations among aspects, effects, conditions, and goals, in addition to actions and states, which are the basic components of the traditional decision-theoretic approach. The system begins with assessing a structure of goals and subgoals and then elicits possible actions that would help produce improvements in each of the subgoals.
In an experimental study, Pitz, Sachs, and Heerboth (1980) investigated the effect of different techniques on the generation of options and objectives, based on means-end analysis as used in the GODDESS system, and on script theory. Groups were given a decision problem and a list of objectives related to this problem, and were then asked to generate as many reasonable options as possible. Three groups, for example, were given the list of objectives and asked to generate options that satisfied the objectives one at a time, two at a time, or all simultaneously. Although the differences were small, there was a tendency for the group focusing on only one objective at a time to produce more options than the group who tried to meet all objectives simultaneously. Our present interest is to study this role of goals for creating actions, i.e., in situations in which actions are not given a priori. We want to know whether and how the explicit consideration of the goal, and various ways of thinking about the goal, affect the representation of the problem in terms of alternative actions. For instance, is the representation of a problem dependent on the framing of the goal (e.g., in terms of seeking positive or avoiding negative consequences; cf. Tversky and Kahneman, 1981)? Under which conditions does it help, and under which does it hinder, people to create actions when they think about their goal(s) beforehand (Pitz, this volume)? What effect does the degree of detail with which people think about their goals have on the representation of a problem? Are there situations in which thinking about the goal results in the creation of actions which do not need to be evaluated any more, because a solution has 'emerged' meanwhile (Beach and Wise, 1980)?
A Cognitive Approach to Study the Role of the Goal
Our approach to study the role of the goal is based on conceptions of the representation of knowledge in human memory. Specifically, we made use of the idea that human memory can be modeled in terms of an associative network of concepts and schemata. The basic unit of thought is a proposition. The basic process is activation of the network’s nodes, i.e., concepts; activation presumably spreads from one to another by associative linkages (Anderson and Bower, 1973; Collins and Loftus, 1975). We assume generally that goals, actions, events, and outcomes are components of the memory structure; in the present study, however, we focus on goals and actions only. We do not distinguish here between goal-directed and causally-directed knowledge (Wilks, 1977), but rather treat goals and actions as elements of the same knowledge structure.
A goal is assumed to be represented as a node that is connected to many subordinate and superordinate goals by associative pointers. The relation between a subordinate and a superordinate goal can be interpreted as 'is meant by' or 'is implied by'. This assumption entails the idea of a cognitive goal hierarchy. Connected to goals are actions that, by reasoning or by experience, are related to that goal. The relation between an action and a goal can be read as 'helps achieve'. These actions may differ in their efficiency with respect to the goal. For simplicity we assume further that actions are directly connected to the more specific (lower level) goals and only indirectly connected to the more abstract (higher level) goals (see Figure 1).
Figure 1. Schematic Representation of a Cognitive Network Structure with Hierarchically Ordered Goals G and Actions A as Components. (The numbers attached to action components indicate hypothetical efficiencies, and shaded actions indicate their belonging to a personal set.)
Activation of a goal node spreads throughout the nodes in the network to which it is connected, creating excitation at those nodes. Activation of a node might be interpreted as the attention the element is getting from the subject. The degree of excitation of an action node depends on the distance, i.e., the number of elements between the action node and the activated goal node; the greater the distance, the weaker the excitation. We postulate that whether and how goal nodes are activated influences which actions or sets of actions will be excited, retrieved, and generated. Specifically, the following assumptions are made with respect to the generation of actions:
(1) When the activation of a goal concept is increased (e.g., by activating more of the goal nodes constituting the concept), then the excitation of action nodes connected to the goal concept will also be increased. If the chance of an action being generated is a function of the excitation level of its node, which appears reasonable, the number of actions that can be generated increases with the degree of the general goal activation.
(2) For each individual, a goal concept has an impersonal (semantic) but also a personal (episodic) meaning, which forms a part of the semantic meaning. When a subject considers a goal from his or her own perspective, i.e., what it means to him or her personally, only the elements of the personal goal concept, including the associated action nodes, will be activated. Consequently, the number of actions that can be generated is lower when goals are personally interpreted than when they are not.
A further assumption refers to the quality of actions generated for the achievement of personal goals: relating goals to personal preferences might also induce an unequal distribution of activation among the goals, i.e., higher-valued goals might be more strongly activated than lower-valued goals. This implies, due to the spread-of-activation effect, that the action nodes connected with the higher-valued goals are also more excited than the action nodes connected with lower-valued goals. Since discrimination and search among a set of more excited actions should be easier than among a set of less excited actions, the quality of actions generated with respect to some goal is assumed to be a function of the value that goal has for the subject.
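The two assumptions can be mimicked with a few lines of code. The toy network below is invented (a vacation goal with two subgoals and four actions), as is the decay factor; the sketch only illustrates how activating more goal nodes raises the excitation, and hence the retrievability, of the attached action nodes.

```python
# A toy version of the spreading-activation assumptions stated above: activating
# goal nodes excites connected action nodes, and excitation falls off with the
# number of links between the activated goal and the action. The network, the
# decay factor, and the node names are all invented for illustration.

from collections import deque

# Associative links between goals (G_*) and actions (A_*), listed in both directions.
links = {
    "G_top": ["G_rest", "G_activity"],
    "G_rest": ["G_top", "A_beach", "A_spa"],
    "G_activity": ["G_top", "A_hiking", "A_city_tour"],
    "A_beach": ["G_rest"], "A_spa": ["G_rest"],
    "A_hiking": ["G_activity"], "A_city_tour": ["G_activity"],
}

def spread_activation(activated_goals, decay=0.5):
    """Return excitation of every action node reachable from the activated goals."""
    excitation = {}
    for goal in activated_goals:
        queue = deque([(goal, 1.0)])
        seen = {goal}
        while queue:
            node, level = queue.popleft()
            for neighbour in links.get(node, []):
                if neighbour in seen:
                    continue
                seen.add(neighbour)
                if neighbour.startswith("A_"):
                    excitation[neighbour] = excitation.get(neighbour, 0.0) + level * decay
                else:
                    queue.append((neighbour, level * decay))
    return excitation

# Activating more of the goal hierarchy raises the excitation of the attached
# actions, in line with assumption (1) above.
print(spread_activation(["G_top"]))
print(spread_activation(["G_top", "G_rest", "G_activity"]))
```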
The Experiment

Material and Variables

We chose an issue that was presumably familiar to all subjects: the goal was "I want to have a nice vacation", and the actions to be generated and evaluated were the possible means to achieve this goal. Two independent variables were manipulated in a 3 x 2 design. The first factor (E) was the goal explicitness, i.e., the degree of elaboration of the goal; there were three levels of E: In E1, Ss were only given the general goal, i.e., "I want to have a nice vacation". In E2, Ss were presented a two-level hierarchy in which the elements on the second level represented aspects of the top-level goal. In E3, Ss were presented a three-level hierarchy with elements on the third level representing aspects of the elements on the second level. The goal hierarchy had been developed beforehand with the help of 20 student Ss, using a sorting technique. A part of the three-level hierarchy is shown in Figure 2. The second factor (R) was the way in which Ss had to think about the goals; there were two levels: In condition R1, Ss had to rank-order the goals that were presented to them, according to their personal priorities. In R2, Ss were requested to think about the goals in a general way, i.e., as goals people commonly have for their vacation.

Procedure

In the first step, Ss were presented the goal, or goal hierarchy, and requested to think about it for a while. To ensure that they actually considered each element, they had to give a specific example for each. Depending on the experimental condition, they then had to rank-order the goals, or not. In the second step, Ss were asked to generate actions that would help to achieve the goal. They had to put together packages consisting of 8 elementary actions, one from each of 8 categories (location, transport, accommodation, company, activity, transport at destination, food, all other). Ss were given a booklet, and used a new sheet for each package. The first 'vacation package' was to represent their personal, ideal combination; they then had to form as many other packages as possible that might be attractive to other people. The number of different elementary actions generated was the first dependent variable. In the third step, all Ss were shown the complete hierarchy, had to rank-order the goals (if they had not done this before), and then evaluated their own ideal package on a 0 to 100 scale as to the degree to which it met each second-level goal. The rating on the respective scales was the second dependent variable.
Figure 2. Fragment of the Whole Three-Level Goal Hierarchy. (Recoverable node labels include the top-level goal "I want to have a nice vacation, i.e., I want ...", second-level goals such as "to recover" and "to engage in activity", and lower-level aspects such as recovering physically, improving some sports, hobbies, education, and sightseeing.)
Hypotheses

The hypotheses can be stated as follows:
H 1 (General Activation Hypothesis): With increasing goal explicitness, as operationalized through factor E, an increasing number of actions will be generated.
H 2 (Set Activation Hypothesis): Ranking of goals with respect to personal importance, as operationalized through factor R, will reduce the number of actions generated.
H 3 (Evaluation Hypothesis): The personal package of actions will have higher values on the goal achievement scales for the group who rank-ordered the goals with respect to personal importance before generating the actions than for the group who did not.
Subjects

Ss were 130 students, nurses, secretaries, and post office workers. The number of Ss was not the same in all groups: in group E1/R1, N = 21; in group E2/R1, N = 19; in group E3/R1, N = 17; in group E1/R2, N = 34; in group E2/R2, N = 19; in group E3/R2, N = 20.
Results

For each S, three scores were determined. The P-score represents the number of packages the S generated; all packages that were either incomplete (i.e., one or more categories left out) or impossible under the given contingencies (e.g., three weeks of vacation) were not counted. The A-score represents the number of generated actions, the variable of main interest in our study; only semantically or functionally different elements were counted (e.g., a VW and a Ford were not counted as two different means of transportation). The E-scores represent the evaluations of the personal package with respect to the goal achievement scales. Our two main hypotheses concerned the number of actions generated, depending on goal explicitness (factor E) and goal importance (factor R). The A-scores, which are of primary interest here, are given in Table 1.
Table 1. Mean Number of Generated Actions (A-score)

Importance of Goal      E1       E2       E3       Mean
R1                      29.90    27.58    32.18    29.81
R2                      28.21    38.11    42.55    34.71
Mean                    28.85    32.84    37.78
Goal explicitness (factor E). The General Activation Hypothesis (H 1) said that Ss who think more explicitly about the goal will produce more possible actions. The data support this assumption: the number of elementary actions (A-score) was significantly different among the three groups E1, E2, and E3 (Kruskal-Wallis test: H = 16.15; df = 2; p < 0.001). In particular, group E3 generated more actions (mean A-score 37.78) than group E2 (32.84), and group E2 more than group E1 (28.85). The main effect of E was also found when P-scores were analysed: the mean numbers of packages produced by E1, E2, and E3 were 5.36, 7.42, and 8.08, respectively. These differences are highly significant (Kruskal-Wallis H = 21.43; df = 2; p < 0.0001).

Goal importance (factor R). In our Set Activation Hypothesis (H 2) it was assumed that Ss who rank the goals with respect to personal importance will generate fewer actions than Ss who do not. The data provide evidence for this hypothesis also: group R1 produced significantly fewer elementary actions (mean A-score 29.81) than group R2 (34.71) (Kruskal-Wallis H = 6.16; df = 1; p < 0.02). The analysis of the P-scores shows the same effect: for group R1 the mean was 6.37, and for group R2 it was 7.03 (Kruskal-Wallis H = 7.75; df = 1; p < 0.01).

Some further evidence on E and R effects. Since an analysis of variance could not be applied to explore possible interactions, separate analyses with the Kruskal-Wallis test were performed of the effects of E on the two levels of R, and of the effect of R on the three levels of E. The analysis of E effects shows that only under condition R2 are the differences between E1, E2, and E3 significant (p < 0.0001), but not under condition R1, in which Ss ranked the goals with respect to personal importance. The analysis of R effects shows that only the differences on levels E2 and E3 are significant (for both, p < 0.01), but not the difference on level E1.

Evaluation. The Evaluation Hypothesis (H 3) postulated that Ss who relate the goals to themselves will produce better personal packages in terms of potential goal achievement than Ss who do not. Table 2 shows the mean evaluations (E-scores) of groups R1 and R2 with respect to each of the 8 second-level goals.
Table 2. Mean Evaluations (E-scores) of the Personal Ideal Package on the Eight Second-Level Goal Achievement Scales

Goal     1      2      3      4      5      6      7      8
R1      88.8   86.9   85.8   79.3   81.1   70.1   75.3   66.6    (N = 57)
R2      89.6   84.1   78.8   78.0   73.6   76.1   66.0   58.6    (N = 73)
As predicted, the values for group R1 are significantly higher than the values for group R2 (Kolmogorov-Smirnov test: chi-square = 11.91; p < 0.01). However, if one splits the data into evaluations for the four most important goals and the four least important goals, the groups differ significantly only on the latter scales.
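For readers who wish to see how such a group comparison is computed, the fragment below runs a Kruskal-Wallis test with SciPy on invented per-subject A-scores; the numbers are not the study's raw data, which are not reproduced here.

```python
# Illustrative only: a Kruskal-Wallis H test on invented A-scores for three
# explicitness groups, assuming SciPy is available.

from scipy.stats import kruskal

a_scores_E1 = [26, 29, 31, 27, 30, 28]   # hypothetical per-subject A-scores
a_scores_E2 = [31, 34, 33, 30, 35, 32]
a_scores_E3 = [36, 39, 37, 40, 35, 38]

H, p = kruskal(a_scores_E1, a_scores_E2, a_scores_E3)
print(f"Kruskal-Wallis H = {H:.2f}, df = 2, p = {p:.4f}")
```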
Discussion

The results support the hypotheses and are basically in accordance with our theoretical approach based on a network model of the representation of knowledge. The main effects of goal explicitness (factor E) and goal importance (factor R) were evident in our data. The finding that factor E is not effective when Ss rank the goals may be seen as an indicator of the strong effect of R irrespective of the other factor: relating goals to personal values, and thus interpreting them from a personal perspective, seems to level out the effect of goal explicitness. The finding that factor R is not effective on level E1 of the goal is relatively easy to understand: under condition E1, only one very general goal is presented to Ss, namely, the top element of the goal hierarchy. An instruction to relate it to the self, as given in R1, might actually be expected not to have any particular effect, although, of course, some Ss might implicitly rank aspects of this goal with respect to their importance. Seen this way, the lack of any difference between groups E1/R1 and E1/R2 is plausible. Finally, we have a very tentative explanation for the finding that evaluations differ only on the less important goal achievement scales. Ss in group R2, when asked to generate their personal ideal package, had only a very brief time to reflect on their own goals, and they might therefore have directed their attention primarily to the most important goals while neglecting the less important ones. Consequently, they could select relatively efficient actions with respect to the more important goals, thus
not differing from Ss working under condition R1, but produced only less efficient actions with respect to the other goals. Our results demonstrate the importance of goals for the process of generating options. On the one hand, the consideration of goals can support the creative activity of subjects, namely, when goals are made explicit and specific. On the other hand, goals may limit subjects' search for options when the perspective is strongly self-oriented. In both cases, however, goals direct the attention and effort subjects spend on generating actions for goal achievement; the effect differs only depending on the way goals are introduced and framed. This result may be linked to other psychological research. For instance, in problem-solving psychology it has always been said that one needs to define the goal state in order to be able to generate the operators that help transform the status quo into the desired state, and various theoretical and empirical approaches have studied problem-solving processes using this conception (e.g., Newell and Simon, 1972; Dörner, 1976). Effects of set and direction on creative problem solving have clearly been demonstrated in many studies (e.g., Hyman, 1961), and characteristics of people's handling of goals in complex decision situations (like running a town) have recently been investigated by Dörner (in press). Another area is the research on goal setting and task performance. A recent review (Locke, Shaw, Saari, and Latham, 1981) concluded that the "beneficial effect of goal setting on task performance is one of the most robust and replicable findings" (p. 145). One particular finding is that specific goals lead to higher output than vague goals such as "do your best". This, of course, sounds similar to the contrast between concrete, substantial goals and abstract, formal goals like "maximize your SEU", as discussed above. Of course, the model proposed in this paper is very simple. However, the use of models of long-term memory for research on decision processes is only beginning. More specific models of the acquisition, storage, and retrieval of decision-relevant knowledge must be developed and tested. Questions worth studying would include, for instance: Is the knowledge that is retrieved in decision situations actually structured in terms of actions, events, and outcomes, as decision theory assumes? How are goals connected to these elements? Which factors influence the retrieval process, and in which ways? Can the effects of heuristics such as availability or representativeness (Tversky and Kahneman, 1975) be explained with the help of models of the representation of knowledge? One problem for such research is that it seems difficult to design experiments and to develop measurement procedures for eliciting subjects' representation of decision problems or for studying factors that affect this
representation. New experimental paradigms are needed that parallel those in cognitive psychology, e.g., recognition and recall. Finally, what are the implications for the application of decision theory? In decision analyses, clients are required to retrieve, and then externalize, knowledge in terms of the prescriptive conception (i.e., actions, events, and outcomes), and this usually seems to work; at least it does not seem to be counterintuitive. However, this approach might not be sufficient when the representation of the problem is itself an important step of the decision process, as many decision analysts claim it is most of the time. But it is exactly this step in which attention and memory play an important role. Our conclusion is that, if we want to improve our understanding of people's decision behavior, and also if we want to improve our ability to aid people in making decisions, we will have to study how this process of representing problems is performed and what techniques we need to develop for adequately eliciting people's knowledge about the problem. If the representation of a problem depends on which knowledge is activated and retrieved, the procedures used for eliciting this knowledge and their effects on cuing attention and activation are as important as the techniques for eliciting utilities and subjective probabilities.
References

Anderson, J. R. and G. H. Bower, 1973. Human Associative Memory. Washington, D.C.: V. H. Winston & Sons.
Beach, L. R. and J. A. Wise, 1980. Decision emergence: A Lewinian perspective. Acta Psychologica, 45, 343-356.
Collins, A. M. and E. F. Loftus, 1975. A spreading-activation theory of semantic memory. Psychological Review, 82, 407-428.
Dörner, D., 1976. Problemlösen als Informationsverarbeitung. Stuttgart: W. Kohlhammer.
Dörner, D. Heuristics and cognition in complex systems. In: R. Groner, M. Groner, and W. F. Bischof (eds.), Methods of Heuristics. Hillsdale, N.J.: Lawrence Erlbaum (in press).
Einhorn, H. J. and R. M. Hogarth, 1981. Behavioral decision theory: Processes of judgment and choice. Annual Review of Psychology, 32, 53-88.
Gettys, C. F. and S. D. Fisher, 1979. Hypothesis plausibility and hypothesis generation. Organizational Behavior and Human Performance, 24, 93-110.
Hogarth, R. M., 1980. Judgement and Choice: The Psychology of Decision. Chichester, England: Wiley & Sons.
Hyman, R., 1961. On prior information and creativity. Psychological Reports, 9, 151-161.
Jungermann, H., 1980. Structural modeling of decision problems. Paper presented at American Psychological Association, Montreal.
Keeney, R. L. and H. Raiffa, 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: Wiley.
Locke, E. A., K. N. Shaw, L. M. Saari, and G. P. Latham, 1981. Goal setting and task performance: 1969-1980. Psychological Bulletin, 90, 125-152.
Newell, A. and H. A. Simon, 1972. Human Problem Solving. Englewood Cliffs, N.J.: Prentice-Hall.
Pearl, J., A. Leal, and J. Saleh, 1980. GODDESS: A goal-directed decision structuring system. UCLA-ENG-CSL-8034. School of Engineering and Applied Sciences, University of California, Los Angeles.
Pitz, G. F. Human engineering of decision aids. In this volume, 205-221.
Pitz, G. F., N. J. Sachs, and J. Heerboth, 1980. Procedures for eliciting choices in the analysis of individual decisions. Organizational Behavior and Human Performance, 26, 396-408.
Toda, M., 1976. The decision process: A perspective. International Journal of General Systems, 3, 79-88.
Tversky, A. and D. Kahneman, 1975. Judgment under uncertainty: Heuristics and biases. In: D. Wendt and C. Vlek (eds.), Utility, Probability, and Human Decision Making. Dordrecht/Boston: D. Reidel.
Tversky, A. and D. Kahneman, 1981. The framing of decisions and the psychology of choice. Science, 211, 453-458.
Vlek, C. and W. A. Wagenaar, 1979. Judgement and decision under uncertainty. In: J. A. Michon, E. G. Eijkman, and L. F. W. DeKlerk (eds.), Handbook of Psychonomics, II. Amsterdam: North-Holland, 253-345.
von Winterfeldt, D., 1980. Structuring decision problems for decision analysis. Acta Psychologica, 45, 71-93.
Wilks, Y., 1977. What sort of taxonomy of causation do we need for language understanding? Cognitive Science, 1, 235-264.
FUZZY STRUCTURAL MODELLING - AN AID FOR DECISION MAKING
Dimiter S. DRIANKOV and Ivan STANTCHEV
Higher Institute of Economics "K. Marx", Sofia, Bulgaria*
Introduction
An issue which has received little attention in decision theory is the role of the goal in representing decision problems. Our hypothesis is that in situations in which the alternatives and their consequences are not given a priori, goals play a major role in the process of generation and evaluation of the alternatives and their consequences. In the present paper goals are considered as complex wholes which are described in terms of constituent parts and interrelations among these parts. In the field of complex humanistic systems, practical experience shows that it is quite impossible to assess in precise terms all the quantitative and/or qualitative data necessary for elaborating a relevant representation of the goals. That is why we employ here a linguistic approach for describing the constituent parts of the goals, the values they take, and the interrelations among them. On the other hand, the structural modelling approach offers a relatively simple procedure appropriate to the imprecise and/or ill-defined interrelations among the constituent parts of the goal. It should be stressed here that the use of fuzzy conditional propositions helps us to retain the nonlinear character of these interrelations, while the classical structural modelling approach simply linearizes them. It should also be stressed that while in the classical structural modelling approach the strengths of the influences among the constituent parts are assessed subjectively, this is not so in our approach: the assessment of the strength of influence among the constituent parts of the goal is made on the basis of objectively existing relationships among the constituent parts.
* Authors' address: Department of Systems Analysis, Higher Institute of Economics "K. Marx", Ekzarh Iosif 14 St., Sofia, Bulgaria.
Statement of the Problem
In the process of decision making, once the Goal System (GS) has been constructed, an acute problem arises. This concerns the possible relationships of influence among the so-called Goal Indicators (GI), the strength of these relationships, and the mechanism and type of interaction among the goal indicators. According to Grüber and Strobel (1977), the goal indicators are placed at the lowest level of a tree-like structure which describes the goal system. They represent measurable system parameters which describe the behaviour of the system within which the decision maker takes and executes his decisions. Thus, the goal indicators and certain values taken by them, which we shall call "aspiration values", express in measurable terms the meaning of the GS in the context of a certain real-life system. The "aspiration values" of the goal indicators give the decision maker the conditions under which he measures the attainment of his goals. In general, there is more than one GI characterizing a certain goal. On the other hand, these goal indicators may describe some other goals as well. Thus, if there exist relationships of influence among these goal indicators, and some of them have been influenced by a particular decision, this may eventually change the degree of attainment not only of that particular goal but of the whole GS. Thus there is a necessity for a description of the GI structure in terms of:
- relationships of influence among the goal indicators;
- the strength and the type of these relationships;
- the mechanism of interaction.
The necessity for this description arises in order to help the decision maker to determine the full range of consequences of a particular decision with respect to the attainment of the whole GS.
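No code accompanies the original paper; purely as an illustration of the notions just introduced (goal indicators, aspiration values, and influence relationships), one might organize them as follows. All names in the sketch (GoalIndicator, term_set, aspiration_value, influence_propositions) are assumptions introduced here, not terminology defined by the authors.

```python
# Illustrative sketch only; the names used here are assumptions, not the authors'.
from dataclasses import dataclass, field

@dataclass
class GoalIndicator:
    name: str                                     # e.g., "X1"
    term_set: dict = field(default_factory=dict)  # linguistic value -> fuzzy set {element: grade}
    aspiration_value: str = ""                    # linguistic value the decision maker aims at

# Influence relationships are kept separately as fuzzy conditional propositions
# "IF Xi is <value> THEN Xj is <value>", indexed by the ordered pair (Xi, Xj).
influence_propositions: dict = {}                 # ("X1", "X2") -> [("low", "high"), ...]
```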
Constructing the Structure of the Goal Indicators
After the goal indicators expressing the meaning of the GS in measurable terms have been assessed, the construction of their structure follows. In our approach, this is done through a man-machine procedure consisting of the following stages:
Stage 1: Identification of the relationships of influence between each pair of goal indicators
Definition 1: The GI X^i is said to be fuzzy with respect to the decision maker if he describes the values taken by this GI in terms of fuzzy sets, i.e.,

T(X^i) = { L_k(X^i) = x^i_k },  k = 1, 2, ..., N_i                                  (1)

where X^i is the name of the i-th GI; X^i is a linguistic variable (Zadeh, 1975) with a universe of discourse U^i = {u^i}, and L_k(X^i) is one of the linguistic values assigned to it by the decision maker in order to describe the behaviour of X^i in measurable terms; T(X^i) denotes the so-called term-set of X^i; x^i_k is a fuzzy set expressing the meaning of the linguistic value L_k(X^i). It is defined on the universe of discourse U^i and is characterized by a membership function μ^i_k(u^i) such that μ^i_k(u^i): U^i → [0,1].
Notion 1: Informally, a linguistic variable X^i is a variable whose values are words or sentences in a natural or artificial language. For example, if "salary" is interpreted as a linguistic variable with term-set T(salary), the set of linguistic values L_k(salary), k = 1, 2, ..., might be:

T(salary) = {L_k(salary)}, k = 1, 2, ... = {high, low, above low, approximately high, between low and medium, very high, ...}

where each of the terms in T(salary) is a label of a fuzzy set on a universe of discourse, say U = {$7,000, ..., $40,000}. More generally, U may be an arbitrary collection of objects or constructs of objects. Then a finite fuzzy set x of U = {u} is expressed as:

x = μ_1/u_1 + μ_2/u_2 + ... + μ_n/u_n                                               (2)

where the μ_i, i = 1, 2, ..., n, represent grades of membership (compatibility) of the u_i in x. More generally, a fuzzy set x of U = {u} is expressed as:

x = μ_x(u)/u                                                                        (3)

where μ_x(u): U → [0,1] is the membership function (compatibility function) of x. Thus, the linguistic value, say L_k(X^i) = high salary, may be expressed by a fuzzy set as:
x^i_k = 1/$40,000 + 0.8/$32,000 + 0.7/$30,000 + 0.5/$21,000 + 0.2/$12,000

Definition 2: A fuzzy relationship of influence of X^i on X^j exists for the decision maker if he is able to construct a fuzzy conditional proposition expressing his cognition that certain values of X^i induce certain values of X^j, i.e.,

IF L_n(X^i) THEN L_m(X^j)                                                           (4)
Formally, (4) is expressed by a fuzzy relation R, where R is a fuzzy set defined on the Cartesian product U^i × U^j and characterized by a membership function defined in the following way:

μ_R(u^i, u^j) = 1 if μ_n(u^i) ≤ μ_m(u^j), and μ_R(u^i, u^j) = μ_m(u^j) otherwise     (5)

where μ_n and μ_m are the membership functions of the fuzzy sets expressing L_n(X^i) and L_m(X^j), respectively.
Example 1: Let

U^i = U^j = 1 + 2 + 3 + 4 + 5 + 6 + 7
L_n(X^i) = small integer = 1/1 + 0.8/2 + 0.6/3 + 0.4/4 + 0.2/5
L_m(X^j) = medium integer = 0.4/2 + 0.8/3 + 1/4 + 0.8/5 + 0.2/6

Then a fuzzy conditional proposition, say

IF X^i is small THEN X^j is medium

translates into a fuzzy relation R, characterized by a membership function μ_R obtained according to (5) (rows u^i = 1, ..., 5; columns u^j = 2, ..., 6):

        2    3    4    5    6
  1   0.4  0.8  1.0  0.8  0.2
  2   0.4  1.0  1.0  1.0  0.2
  3   0.4  1.0  1.0  1.0  0.2
  4   1.0  1.0  1.0  1.0  0.2
  5   1.0  1.0  1.0  1.0  1.0

Using the apparatus described above, the decision maker can construct all the fuzzy conditional propositions describing fuzzy relationships of influence between X^i and X^j. To do this he proceeds as follows:
Step 1: The decision maker defines linguistically all the expressions of type (4).
Step 2: The decision maker, using a man-machine procedure, constructs the fuzzy sets representing the meaning of the linguistic values taken by X^i and X^j.
Step 3: Using (5), for each expression of type (4) its relation R is built.
Thus, at the end of Stage 1, the decision maker has obtained both the linguistic and the formal representation of all the relationships of influence between each pair of goal indicators.
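A minimal sketch of Stage 1 is given below, under the assumption that (5) is the α (Gödel) implication; this form is consistent with the relation used in Example 2 and with the solvability of the fuzzy equations of Stage 2. Function and variable names are illustrative, not the authors'.

```python
# A minimal sketch, assuming (5) is the alpha (Goedel) implication:
# mu_R(u, v) = 1 if mu_n(u) <= mu_m(v), and mu_m(v) otherwise.
# Fuzzy sets over a finite universe are dicts mapping element -> membership grade.

def alpha(a: float, b: float) -> float:
    """Assumed form of (5): Goedel implication."""
    return 1.0 if a <= b else b

def build_relation(antecedent: dict, consequent: dict, U_i, U_j) -> dict:
    """Translate 'IF X_i is L_n THEN X_j is L_m' into a fuzzy relation on U_i x U_j."""
    return {(u, v): alpha(antecedent.get(u, 0.0), consequent.get(v, 0.0))
            for u in U_i for v in U_j}

# Example 1 of the text: 'IF X_i is small THEN X_j is medium' on U = {1, ..., 7}.
U = range(1, 8)
small  = {1: 1.0, 2: 0.8, 3: 0.6, 4: 0.4, 5: 0.2}
medium = {2: 0.4, 3: 0.8, 4: 1.0, 5: 0.8, 6: 0.2}
R = build_relation(small, medium, U, U)
# e.g. R[(1, 2)] == 0.4 and R[(2, 3)] == 1.0, as in the matrix of Example 1.
```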
Stage 2: Determining the degree of influence between each pair of goal indicators
The degree of influence of X^i on X^j is determined according to the following procedure:
Step 1: Given μ_R and μ^i_n(u^i), it is possible to obtain μ^j_m(u^j) by the use of a fuzzy inference rule and the so-called "composition" operation. This is always possible if μ_R has been constructed according to (5). This is done as follows:

μ^j_m(u^j) = μ^i_n(u^i) ∘ μ_R = ⋁_{u^i ∈ U^i} [ μ^i_n(u^i) ∧ μ_R(u^i, u^j) ]         (6)

where "∘" denotes the "composition" operation given by the expression on the right hand side of (6); "⋁" denotes "max"; "∧" denotes "min".
Notion 2: We often make an inference such that from the relationship between some objects X^i and X^j and information about X^i, we deduce some information about X^j as a consequence. For this, Zadeh (1975) suggested an inference rule named the "compositional rule of inference". In our case we consider an inference rule of the form:

Antecedent 1: IF X^i is L_n(X^i) THEN X^j is L_m(X^j)
Antecedent 2: X^i is L_n(X^i)
Consequence: X^j is L_m(X^j)

If we translate the antecedent of the inference which represents the relationship between the objects X^i and X^j (Antecedent 1) into a suitable fuzzy relation R, given by μ_R, and the antecedent which represents the information about X^i (Antecedent 2) into a suitable fuzzy set x^i,
characterized by μ^i_n(u^i), then the consequence can be obtained by the "composition" of μ^i_n(u^i) and μ_R, as given in (6). This may be expressed as follows:

X^i = L_n(X^i) and X^j = L_m(X^j) are related by R = μ_R
X^i = L_n(X^i) = μ^i_n(u^i)
X^j = μ^i_n(u^i) ∘ μ_R = μ^j_m(u^j), which is L_m(X^j)
Example 2: Let

U^i = U^j = 1 + 2 + 3 + 4 + 5 + 6 + 7
L_n(X^i) = very small integer = 1.0/1 + 0.6/2 + 0.4/3 + 0.1/4
L_m(X^j) = big integer = 0.2/4 + 0.4/5 + 0.8/6 + 1.0/7

Applying (5), we translate the antecedent of the inference, say:
IF X^i is very small THEN X^j is big

into a fuzzy relation R (rows u^i = 1, ..., 4; columns u^j = 4, ..., 7):

        4    5    6    7
  1   0.2  0.4  0.8  1.0
  2   0.2  0.4  1.0  1.0
  3   0.2  1.0  1.0  1.0
  4   1.0  1.0  1.0  1.0
The antecedent which represents the information about X^i, i.e., X^i = L_n(X^i) = very small, is translated into a fuzzy set as:

x^i = 1.0/1 + 0.6/2 + 0.4/3 + 0.1/4

which is characterized by a membership function μ^i_n(u^i) taking the values (1, 0.6, 0.4, 0.1). Then the consequence X^j = L_m(X^j) is obtained by the composition of μ^i_n(u^i) and μ_R:

                       | 0.2  0.4  0.8  1.0 |
(1, 0.6, 0.4, 0.1)  ∘  | 0.2  0.4  1.0  1.0 |  =
                       | 0.2  1.0  1.0  1.0 |
                       | 1.0  1.0  1.0  1.0 |
(max(min(1,0.2), min(0.6,0.2), min(0.4,0.2), min(0.1,1)),
 max(min(1,0.4), min(0.6,0.4), min(0.4,1), min(0.1,1)),
 max(min(1,0.8), min(0.6,1), min(0.4,1), min(0.1,1)),
 max(min(1,1), min(0.6,1), min(0.4,1), min(0.1,1)))
= (max(0.2, 0.2, 0.2, 0.1), max(0.4, 0.4, 0.4, 0.1), max(0.8, 0.6, 0.4, 0.1), max(1, 0.6, 0.4, 0.1))
= (0.2, 0.4, 0.8, 1)

Thus we have obtained the values of the membership function μ^j_m(u^j) which expresses the meaning of L_m(X^j) = big.
It is easy to see that if we consider the values taken by μ^i_n(u^i) as unknown, then (6) defines a system of fuzzy equations, as defined by Driankov and Petrov (1977). This can help the decision maker to obtain all values of X^i which give one and the same value of X^j, expressed by μ^j_m(u^j). This means that he or she can find those values of X^i which do not change a certain value taken by X^j. If μ_R is constructed according to (5), then (6) has a solution; i.e., there exist some values of X^i, represented by the set of membership functions {μ^i_t(u^i)}, which also includes μ^i_n(u^i), such that:

μ^i_t(u^i) ∘ μ_R = μ^j_m(u^j), for all t.
It has been proved that lower and upper bounds of the set of solutions of (6) exist. These are denoted by μ^i_{t,min}(u^i) and μ^i_{t,max}(u^i), respectively. Applying the so-called "linguistic approximation procedure" (Zadeh, 1975) to the elements of the set of solutions, we can obtain the linguistic value corresponding to each solution.
Example 3: Following the previous step, we deal with the inverse problem of (6): given a fuzzy relation R, defined on the Cartesian product U^i × U^j and characterized by a membership function μ_R obtained according to (5), find all μ^i_t(u^i) such that

μ^i_t(u^i) ∘ μ_R = μ^j_m(u^j), for all t.

Let

        | 0.4  0.0  0.9  0.6  0.8 |
  μ_R = | 0.7  0.8  0.3  1.0  0.5 |
        | 0.6  0.4  0.3  0.4  0.9 |
        | 0.2  1.0  0.5  0.8  0.4 |

  μ^j_m(u^j) = (0.6, 0.5, 0.9, 0.6, 0.8)
We look for μ^i_t(u^i) such that μ^i_t(u^i) ∘ μ_R = μ^j_m(u^j) for all t, and these are: (0.9, 0.5, 0.6, 0.0), together with all μ^i_t(u^i) satisfying

(0.9, 0.0, 0.6, 0.5) ≤ μ^i_t(u^i) ≤ (1.0, 0.5, 0.8, 0.5).
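The arithmetic of Examples 2 and 3 can be checked with a few lines of code; the sketch below implements the max-min composition of (6) in plain Python and re-derives the numbers above. It is an illustration, not part of the original method's implementation.

```python
# Illustrative sketch of the max-min composition in (6), re-deriving Examples 2 and 3.

def compose(x, R):
    """Max-min composition: y_j = max_i min(x_i, R[i][j])  -- equation (6)."""
    n_cols = len(R[0])
    return [max(min(x[i], R[i][j]) for i in range(len(x))) for j in range(n_cols)]

# Example 2: 'IF X_i is very small THEN X_j is big'; composing 'very small'
# with the relation R built via (5) gives back 'big'.
R2 = [[0.2, 0.4, 0.8, 1.0],
      [0.2, 0.4, 1.0, 1.0],
      [0.2, 1.0, 1.0, 1.0],
      [1.0, 1.0, 1.0, 1.0]]
very_small = [1.0, 0.6, 0.4, 0.1]
assert compose(very_small, R2) == [0.2, 0.4, 0.8, 1.0]   # = 'big'

# Example 3: the stated solutions and bounds of the inverse problem indeed
# satisfy x o R = mu_m.
R3 = [[0.4, 0.0, 0.9, 0.6, 0.8],
      [0.7, 0.8, 0.3, 1.0, 0.5],
      [0.6, 0.4, 0.3, 0.4, 0.9],
      [0.2, 1.0, 0.5, 0.8, 0.4]]
target = [0.6, 0.5, 0.9, 0.6, 0.8]
for candidate in ([0.9, 0.5, 0.6, 0.0], [0.9, 0.0, 0.6, 0.5], [1.0, 0.5, 0.8, 0.5]):
    assert compose(candidate, R3) == target
```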
Example 4: Suppose we have the following fuzzy conditional proposition:

IF X^i is low THEN X^j is medium

Then, applying the procedures described above and the procedure of linguistic approximation to the elements of the set {μ^i_t(u^i)}, we may obtain, for example:

IF X^i is slightly above low THEN X^j is medium
IF X^i is between low and medium THEN X^j is medium
IF X^i is slightly below medium THEN X^j is medium

where slightly above low, between low and medium, slightly below medium and low are fuzzy sets characterized by membership functions which are elements of the set {μ^i_t(u^i)}. In this case, μ^i_{t,min}(u^i) determines the fuzzy set 'low', and μ^i_{t,max}(u^i) determines the fuzzy set 'slightly below medium'.
Step 2: At this step, after μ^i_{t,min}(u^i) and μ^i_{t,max}(u^i) have been determined and linguistically defined, the decision maker is asked:
- to define that value of X^i which is next to the value determined by μ^i_{t,min}(u^i) and below it;
- to define that value of X^i which is next to the value determined by μ^i_{t,max}(u^i) and above it.
After these two values have been determined, the decision maker is asked to establish a fuzzy relationship of influence between X^i and X^j for the values mentioned above:

IF X^i = the value next to that determined by μ^i_{t,min}(u^i) and below it THEN X^j = μ^j_q(u^j)
IF X^i = the value next to that determined by μ^i_{t,max}(u^i) and above it THEN X^j = μ^j_p(u^j)
Example 5: In the case of Example 4, the decision maker also constructs the following fuzzy conditional propositions, say:

IF X^i is medium THEN X^j is between medium and high
IF X^i is slightly below low THEN X^j is low

Step 3: At this step, using an appropriate similarity measure, D, taking on values in the interval [0,1], the decision maker measures the similarity between μ^j_m(u^j) and μ^j_q(u^j) and the similarity between μ^j_m(u^j) and μ^j_p(u^j), obtaining D_mq and D_mp respectively. The smaller the D, the bigger the influence exerted. Thus the so-called "influence" set, describing the influence of X^i on X^j, is constructed:

Infl(X^i/X^j) = {D_kl},  k ∈ {1, 2, ..., N_i};  l ∈ {1, 2, ..., N_j}
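The similarity measure D is not specified in the text; as an illustration only, the sketch below uses one possible choice based on the normalized Hamming distance between fuzzy sets, together with invented membership values.

```python
# Illustrative only: one possible similarity measure D in [0,1] between two fuzzy
# sets on the same finite universe (1 for identical sets, lower as they differ).
# Per the text, a smaller D between the consequents of neighbouring propositions
# indicates a bigger influence of X_i on X_j.

def similarity_D(mu_a: dict, mu_b: dict, universe) -> float:
    elements = list(universe)
    diff = sum(abs(mu_a.get(u, 0.0) - mu_b.get(u, 0.0)) for u in elements)
    return 1.0 - diff / len(elements)

# Invented example values for 'high' and 'very high' on U = {1, ..., 7}:
U = range(1, 8)
high      = {4: 0.2, 5: 0.6, 6: 0.9, 7: 1.0}
very_high = {5: 0.3, 6: 0.7, 7: 1.0}
D_mq = similarity_D(high, very_high, U)   # close to 1: X_j changes little, weak influence
```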
Example 6: Suppose that by applying consecutively Step 1 and Step 2 the decision maker has obtained:
(a) IF X^i is very low THEN X^j is high
(b) IF X^i is slightly below low THEN X^j is high
(c) IF X^i is low THEN X^j is high
(d) IF X^i is above low THEN X^j is very high
(e) IF X^i is near medium THEN X^j is very high
(f) IF X^i is medium THEN X^j is medium
(g) IF X^i is above medium THEN X^j is medium
(h) IF X^i is near high THEN X^j is medium
(i) IF X^i is high THEN X^j is low
(j) IF X^i is absolutely high THEN X^j is low
(k) IF X^i is absolutely high THEN X^j is very low
Now the decision maker measures the degree of similarity between the right hand sides of, say, (d) and (c). D_dc gives the degree of influence of X^i = 'above low' on X^j = 'high', and at the same time the degree of influence of X^i = 'low' on X^j = 'very high'. It is easy to see that D_dc also gives the degree of influence of X^i = 'near medium' on X^j = 'very high' and the degree of influence of X^i = 'very low' or 'slightly below low' on X^j = 'very high'. Accordingly, the similarity values for the remaining pairs of neighbouring propositions are determined.
Step 4: Determining the max-influence path between each pair of goal indicators. At this step the so-called direct influence matrix, A, is constructed:
         X^1    X^2   ...   X^N
  X^1    a_11   a_12  ...   a_1N
  X^2    a_21   a_22  ...   a_2N
  ...     ...    ...  ...    ...
  X^N    a_N1   a_N2  ...   a_NN
with rows and columns representing goal indicators, and elements a_ij ∈ [0,1], a_ii = 1. a_ij corresponds to the degree of influence which X^i exerts on X^j when X^i changes its value in the direction of its aspiration level. In the context of Example 6 this means that if the aspiration level of X^1 is medium and the actual value of X^1 is low, then if the decision maker wants to change the value of X^1 from low to medium, this will exert an influence on the value taken by X^j equal to the corresponding degree of influence D determined above. It is easy to see that A represents a weighted directed graph with the goal indicators as nodes and directed weighted arcs a_ij. It must be stressed here that the notion of transitivity is not valid for the elements of A. This means that it is not assumed that the shorter the path, the greater the influence exerted through this path, i.e., that the strength of the link between each pair of nodes must be greater than or equal to the strength of any indirect chain connecting this pair of nodes. Thus, we face the following problem: find the chain which realizes the maximum influence of X^i on X^j. The above problem is solved as follows:
Step 4.1:
It has been proved (Dimitrov and Driankov, 1977) that

A ≤ A² ≤ A³ ≤ ... ≤ A^k ≤ ...

where A^k = A ∘ A^(k-1) and "∘" is the composition operation defined in (6). It has also been proved that if a_ii = 1 then

A ≤ A² ≤ ... ≤ A^r = A^(r+1) = ...

Here A^(k-1) ≤ A^k means that a^(k-1)_ij ≤ a^k_ij for all i, j, where a^(k-1)_ij ∈ A^(k-1) and a^k_ij ∈ A^k.
Thus, in obtaining A^r we are guaranteed that we have found the maximum possible influence of X^j on X^i, and this influence is exerted on X^i through a chain with length smaller than or equal to r.
Step 4.2: At this step, based on A and A^r, the decision maker builds the structure of the goal indicators. A detailed description of the procedure used can be found in Dimitrov and Driankov (1977).
Example 7: Let

      | 1.0  0.4  0.7  0.3  0.9  0.6 |
      | 0.7  1.0  0.8  0.4  0.5  0.3 |
  A = | 0.3  0.5  1.0  0.7  0.1  0.6 |
      | 0.1  0.3  0.6  1.0  0.8  0.2 |
      | 0.3  0.7  0.9  0.2  1.0  0.4 |
      | 0.7  0.5  0.1  0.3  0.4  1.0 |
Applying the procedure described in Step 4.1 we obtain

       | 1.0  0.7  0.9  0.7  0.9  0.6 |
       | 0.7  1.0  0.8  0.7  0.7  0.6 |
  A² = | 0.6  0.5  1.0  0.7  0.7  0.6 |
       | 0.3  0.7  0.8  1.0  0.8  0.6 |
       | 0.7  0.7  0.9  0.7  1.0  0.6 |
       | 0.7  0.5  0.7  0.4  0.7  1.0 |

       | 1.0  0.7  0.9  0.7  0.9  0.6 |
       | 0.7  1.0  0.8  0.7  0.7  0.6 |
  A³ = | 0.6  0.7  1.0  0.7  0.7  0.6 |
       | 0.7  0.7  0.8  1.0  0.8  0.6 |
       | 0.7  0.7  0.9  0.7  1.0  0.6 |
       | 0.7  0.7  0.7  0.7  0.7  1.0 |

       | 1.0  0.7  0.9  0.7  0.9  0.6 |
       | 0.7  1.0  0.8  0.7  0.7  0.6 |
  A⁴ = | 0.7  0.7  1.0  0.7  0.7  0.6 |
       | 0.7  0.7  0.8  1.0  0.8  0.6 |
       | 0.7  0.7  0.9  0.7  1.0  0.6 |
       | 0.7  0.7  0.7  0.7  0.7  1.0 |

where A ≤ A² ≤ A³ ≤ A⁴ = A⁵ = ...
Thus, the elements of A⁴ express the maximum degree of influence between each pair of goal indicators. If we choose 0.6 as a minimal level of the maximum influence exerted, the structure of goal indicators shown in Figure 1 is built.
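The authors' own procedure is described in Dimitrov and Driankov (1977); the sketch below only illustrates the max-min iteration of Step 4.1 in plain Python and reproduces A⁴ on the matrix A of Example 7.

```python
# Illustrative sketch of Step 4.1: iterate A^k = A o A^(k-1) (max-min composition)
# until the matrix stabilizes; with a_ii = 1 the entries are nondecreasing, so
# the iteration terminates.

def maxmin_product(P, Q):
    n = len(P)
    return [[max(min(P[i][k], Q[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]

def max_influence_matrix(A):
    current = A
    while True:
        nxt = maxmin_product(A, current)
        if nxt == current:          # A^r = A^(r+1): maximum influences reached
            return current
        current = nxt

A = [[1.0, 0.4, 0.7, 0.3, 0.9, 0.6],
     [0.7, 1.0, 0.8, 0.4, 0.5, 0.3],
     [0.3, 0.5, 1.0, 0.7, 0.1, 0.6],
     [0.1, 0.3, 0.6, 1.0, 0.8, 0.2],
     [0.3, 0.7, 0.9, 0.2, 1.0, 0.4],
     [0.7, 0.5, 0.1, 0.3, 0.4, 1.0]]
A_star = max_influence_matrix(A)    # equals A^4 of Example 7, e.g. A_star[0][1] == 0.7
```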
Figure 1. The Structure of the Goal Indicators
Thus, the element a_12 ∈ A tells us that if we want to change the value of X^2 in the direction of its aspiration value, this will exert an influence on the value taken by X^1 equal to 0.4, and this influence will be exerted directly on X^1. The element a⁴_12 ∈ A⁴ then tells us that, due to the interconnections among the goal indicators, this influence becomes greater, i.e., equal to 0.7. According to Figure 1, this influence will not be exerted directly, but through the chain X^2 -0.7- X^5 -0.9- X^1, or through the chain X^2 - X^5 - X^3 - X^1, or through the chain X^2 - X^5 - X^4 - X^3 - X^5 - X^1. Here the latter chain contains a circuit.
Stage 3: Determining the type of the circuits in the structure
As stated at the beginning of this paper, each GI has its aspiration value. Thus, the attainment of these aspiration values determines the degree to which the whole GS is attained.
Definition 4: It is said that X^i has a negative influence on X^j (δ = -1) for certain values of X^i if we can rearrange these values of X^i in an increasing or decreasing sequence with respect to the aspiration value such that the corresponding values of X^j form a decreasing sequence with respect to the aspiration value. If the corresponding values of X^j form an increasing sequence with respect to its aspiration value, then it is said that X^i has a positive influence on X^j (δ = +1).
Example 8: Suppose we have obtained

IF X^i = L_k1(X^i) THEN X^j = L_m1(X^j)
IF X^i = L_k2(X^i) THEN X^j = L_m2(X^j)
IF X^i = L_k3(X^i) THEN X^j = L_m3(X^j)
IF X^i = L_k4(X^i) THEN X^j = L_m4(X^j)
IF X^i = L_k5(X^i) THEN X^j = L_m6(X^j)
IF X^i = L_k6(X^i) THEN X^j = L_m5(X^j)

Then, if

L_k1(X^i) < L_k2(X^i) < L_k3(X^i) < L_k4(X^i) < L_asp.(X^i)

and

L_asp.(X^j) ≥ L_m1(X^j) > L_m2(X^j) > L_m3(X^j) > L_m4(X^j),

we say that X^i exerts a negative influence on X^j for these particular values taken by X^i.
If

L_asp.(X^i) ≥ L_k6(X^i) ≥ L_k5(X^i)  and  L_asp.(X^j) ≥ L_m6(X^j) ≥ L_m5(X^j),

we say that X^i exerts a positive influence on X^j for these particular values of X^i.
Step 1: Determining the type of influence for each two nodes of the structure. Using a simple search procedure for finding increasing or decreasing sequences in a list of fuzzy conditional propositions, determined by a particular degree of influence, we mark the links in the structure, thus determining the type of influence between each pair of nodes of the structure. This helps the decision maker to see the effect which X^i has on X^j with respect to the aspiration value of X^j for particular values taken by X^i.
Step 2: Determining the type of the circuits. The following types of cyclic relations may exist among the elements of a given circuit:
(1) "Snowball" type: all links in a certain circuit are marked with a +1 sign. The effect which this type of circuit causes is as follows: suppose we have a circuit consisting of three nodes, like that shown in Figure 2.
Figure 2. The "Snowball" Type of a Circuit
If the value of X^i is increased, this leads (according to Definition 4) to an increase of the value of X^j, and due to that, the value of X^k will also increase. This leads again to an increase of the value of X^i, etc. So, we can see that if we increase the value of an arbitrary element of the circuit, this leads to a "snowball"-like increase of the values of all elements of that circuit. It is easy to show that if we decrease the value of an arbitrary element of the circuit, this will cause a "snowball"-like decrease of the values of all elements of the circuit.
(2) "Collapse" type: this type of circuit exists if ∏_{k=1}^{q} δ_k > 0, where δ_k ∈ {-1, +1} and at least one δ_k = -1. An example is given in Figure 3. The effect which this type of circuit causes is as follows:
Figure 3. The "Collapse" Type of a Circuit
An increase of the value of X^i leads to an increase of the value of X^j. According to Definition 4, the value of X^k will then decrease, which leads to an increase of the value of X^l. Since the influence of X^l on X^i is positive, the value of X^i increases again and thus the value of X^j also increases, thus decreasing the value of X^k for the second time. So, we can see that the values of X^i, X^j and X^l increase permanently while the value of X^k decreases permanently. If we start with a decrease of the value of X^i, it is easy to see that the effect is a permanent increase of the value of X^k and at the same time a permanent decrease of the values of X^i, X^j and X^l.
(3) "Resistance-to-change" type: this type of circuit exists if ∏_{k=1}^{q} δ_k < 0, where δ_k ∈ {-1, +1}. An example is given in Figure 4.
The effect of this type of circuit is as follows:

Figure 4. The "Resistance-to-Change" Type of a Circuit
An increase of the value of X^i leads to an increase of the value of X^j. As a result, the value of X^k decreases. Since there is a positive influence of X^k on X^i, the value of X^i will decrease. Then the value of X^j also decreases. But due to the negative type of influence of X^j on X^k, this leads to an increase of the value of X^k, etc. Thus, we can see that in this case the values of the goal indicators pulsate around certain initial values. The effect will be the same if we start with a decrease of the value of X^i.
It is easy to see that if the structure contains circuits of the types described above, this may cause problems with respect to the attainment of the aspiration values of the goal indicators. One way out of this is to rebuild the structure in such a way that in the new structure there will be no circuits. Two alternative possibilities exist in this direction. The first
one is to change X^i in such a way that the direct influence it exerts on X^j is increased, i.e., to increase the value of a_ij. The second one is to determine some significant value γ for the degree of influence such that for each k the following holds: if a^k_ij ≤ γ then a_ij > γ, or the inverse.
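As a compact illustration of Stage 3, Step 2, the following sketch classifies a circuit from the signs of its links; the sign pattern used for the Figure 3 case is inferred from the verbal description above and is therefore an assumption.

```python
# Illustrative sketch: classify a circuit from the +1 / -1 signs of its links.
# All positive links -> "snowball"; an even, non-zero number of negative links
# -> "collapse"; an odd number of negative links -> "resistance-to-change".
from math import prod

def circuit_type(signs: list[int]) -> str:
    if all(s == +1 for s in signs):
        return "snowball"
    return "collapse" if prod(signs) > 0 else "resistance-to-change"

assert circuit_type([+1, +1, +1]) == "snowball"                # Figure 2
assert circuit_type([+1, -1, -1, +1]) == "collapse"            # Figure 3 (signs assumed)
assert circuit_type([+1, -1, +1]) == "resistance-to-change"    # Figure 4
```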
Conclusion
The structure of GIs thus built up is used by the decision maker in the following manner:
- the decision maker measures the attainment of the goals in terms of certain goal indicators, taking into consideration their aspiration levels and their actual values resulting from a particular decision;
- he or she inserts these actual values into the GI structure and relates them to each other, thus causing changes to some other goal indicators.
Thus, situations and problems are perceived and identified, and alternatives are sought and tested. Finally, the experiences following the decision are stored and inserted into the GI structure.

References
Dimitrov, V. and D. Driankov, 1977. Pointwise prediction based on fuzzy sets. Proceedings of the International Conference on Information Sciences and Systems, Patras, Greece, 1976. Hemisphere Publishing Corporation.
Driankov, D. and A. Petrov, 1977. Fuzzy equations with applications. Proceedings of the 4th Congress of the Balkan Mathematical Union, Varna, Bulgaria, 1975. Bulgarian Academy of Sciences.
Zadeh, L.A., 1975. The concept of a linguistic variable and its application to approximate reasoning, Parts 1, 2, 3. Information Sciences, 8, 9.
USE OF PROBLEM STRUCTURING TECHNIQUES FOR OPTION GENERATION: A COMPUTER CHOICE CASE STUDY
Patrick HUMPHREYS
London School of Economics and Political Science, England
Abstract
Structuring techniques developed so far for use in decision analyses have concentrated on structuring uncertainty or the decomposition of the worth of alternatives, while the specification of the options to be considered within the analysis has usually been treated as something which happens a priori to the decision analysis, which does not comprise activities specifically addressed to this problem. This case study of a decision analysis dealing with the choice of a set of computers to meet the overall requirements of a university psychology department demonstrates structuring techniques useful in providing a representation of a decision problem at the stage prior to the development of specific options. It shows how this representation can be used to generate alternative options, each of which could provide adequate solutions to the problem. The evaluation of individual elements within each option is discussed, together with the subsequent evaluation of complete sets of options, where properties of the configuration of elements within each option may be taken into consideration. In each case a computer-based interactive decision aid, MAUD4, was used in forming preferences on the basis of these evaluations. The advantages and limitations of the option generating methodology are discussed, together with a discussion of how one can use these techniques to assess the flexibility of a potential option set, in terms of the provision of a method for judging its responsiveness both (i) to future changes in requirements, under the assumption that there is at present total uncertainty about what such changes might be, so they cannot be structured, and (ii) to future changes in states of the world about which there is currently too much uncertainty to permit explicit act-event modelling.
Introduction: Problems of Option Selection in Purchasing Decisions
This paper is based on a case study of a decision analysis concerning the optimal set of computers to buy to completely re-equip a university psychology department.¹ Its principal aim, however, is to examine a method capable of aiding the decision making process at an earlier stage than that typically addressed by decision theoretic analyses: structuring the options which will subsequently be evaluated and chosen between. In text-book analyses any need for structuring rarely appears at this stage. Options somehow have already arrived on the scene; indeed it is often the awareness of a set of options being present which triggers the perceived need for a decision analysis. Emphasis is laid on structuring the uncertainty surrounding the links between the immediate acts involved in selecting an option and the ensuing consequences (e.g., Howard and Matheson, 1980), or on structuring one's objectives in a way that allows one to form preferences between options, either by clarifying goal relationships (Pearl, Leal, and Saleh, 1980) or by identifying relationships between objectives and preferred levels on attributes characterizing the various options (Keeney and Raiffa, 1976, Section 2). However, there are many decisions relating to the purchasing of goods within a commercial environment where a large part of the problem stems from the fact that at the outset there is considerable uncertainty about what should comprise the set of options to be evaluated. Methods traditionally recommended for clarifying the specification of the options to be considered through exploring goals or initiating schemata (Pitz, Sachs, and Heerboth, 1981) do not always help, since they lead to the specification of yet more options, whereas the principal problem here is that the potential range of options is too wide for the decision maker to be able to give proper consideration to the tradeoffs involved in making choices between them. Svenson (1983) and Montgomery (1983) discuss how, when faced with decisions of this type, people will try to find screening (non-compensatory) methods to remove less favourable options from consideration

¹ To preserve the realism of the material presented here, no attempt has been made to disguise the identity of the decision maker's institution or the computers considered. It should be remembered that all the characterizations, evaluations and preferences between computers identified here are subjective and made relative to the specific requirements of particular users. Hence the information presented here should not be taken as generally applicable in assessing the various computers. For this task, it is recommended that a new decision analysis be carried out using requirements and criteria specific to the new application.
before engaging in the more difficult task of making tradeoffs between the remaining, smaller set of options which have emerged as serious contenders for choice under the decision maker's various (and possibly conflicting) objectives.
In the decision situation discussed in this paper the option-selection problem is particularly acute: the decision maker was the computer engineer for the Department of Psychology at Brunel University, which has about 100 undergraduate students, 12 teaching staff and some 20 active research staff and students. The computer engineer (henceforth called the decision maker to reflect her role in this analysis) was charged with two objectives: (i) to replace an obsolete computer serving the department with computing facilities which would meet the various (and as yet ill-defined) requirements of users within the department, and (ii) to do so on a budget of £27,000. The second objective serves as a constraint on the development of options: all options need to comprise configurations of computers costing no more than £27,000. Beyond this, it does not require further elaboration. The first objective is more problematic. Figure 1 shows the decision maker's "first-cut" attempt at achieving this objective through breaking it down into a hierarchy of activities requiring computer facilities, with the pattern of distinctions within the hierarchy reflecting the formal organization of activities within the department.
Figure 1 highlights a special problem in the generation of potential options in this study. It is not just a question of selecting a number of computers which might be purchased, each constituting a separate option, but a question of how many separate computer configurations should comprise each particular option. One could consider candidates for a single multipurpose time-sharing machine, capable of meeting all requirements. Alternatively, a set of three computers could be purchased, each supporting one of the three activities shown at the first level of decomposition (Research, Secretarial, Teaching); or one could purchase a set of 18 small dedicated machines, each supporting one of the 18 facilities shown at the right hand side of Figure 1. In general, an option can comprise a set of n computers, each meeting the requirements of a cluster of activities formed by grouping the 18 facilities into n groups, and there are a vast number of ways of forming such groupings. When one couples this realization with the fact that there were computers available from over 40 different manufacturers which, on grounds of price and capabilities, could be candidates for an element in the set of computers comprising each option, one can see that the number of different options that could be generated is very large indeed, and far too
Overall objective: provide computer facilities for the department. Decomposed objectives and the facilities required:

Research
  Laboratory experiments: 1 Psychophysiological laboratory research; 2 Human experimental laboratory research
  Simulation: 3 Cognitive simulation research
  Conversational: 4 Interactive decision aiding research; 5 Interactive educational/interview research
  Statistics: 6 Statistical package development research; 7 Statistical package research user facility

Secretarial
  Word processing: 8 Secretarial word processing; 9 ... word processing
  Records: 10 Departmental student record system; 11 Departmental financial record system; 12 External records (mailing lists, etc.)

Teaching
  Classroom demonstration: 13 Undergraduate teaching lab demonstration; 14 Undergraduate stats teaching demonstration; 15 Undergraduate simulation teaching demonstration
  Coursework: 16 Undergraduate lab coursework; 17 Undergraduate stats coursework; 18 Undergraduate simulation coursework

Figure 1. The Decision Maker's Initial Breakdown of Objectives in terms of Activities Requiring Computer Facilities
large to permit comparative evaluation of all but a very small proportion of them. Faced with this perception of the problem facing her, the decision maker sought assistance from Brunel University's Decision Analysis Unit (DAU).

Stages in Structuring the Problem

The immediate motivation of the decision maker in approaching the DAU was for aid in option generation, but her major goals in the decision making process could be identified as:
(a) to select and purchase a set of computers "optimally" meeting the department's requirements, within the given cost constraint;
(b) to provide a rationale for the selection, so that arguments can be advanced to senior departmental staff justifying the release of the budgeted funds;
(c) to be able to explain clearly to users the appropriate use of components of the purchased system (post hoc): taking the decision did not remove the decision maker from the situation; as computer engineer she would be responsible for preparation of materials and courses introducing users to the system that was purchased and assisting them with problems encountered while using it.
Any decision analysis which could aid this decision maker has to address all three goals, and to provide a rationale for what is meant by "optimally" under goal (a). Hence a major part of the analysis would have to be directed at structuring requirements as well as options, and at showing how the chosen option maps onto these requirements. This would serve the option selection, rationalization and explanation goals outlined above, but a complete analysis must also provide a basis for choice between the selected options, and here one meets an unusual feature of this study: there are two levels at which such evaluation can be carried out. The first level involves choosing computers to comprise elements of each option, and the second level then involves choosing between complete options, each of which is defined in terms of a set of preferred computers identified at the first level. One decision analyst from the DAU worked with the decision maker in formulating and carrying out these tasks and, as described below, three decision support systems were employed at various stages in the analysis: one for structuring options, one for evaluating alternative options and option elements, and one for generating computers which could serve as option elements.
The sequence of stages in the analysis can be characterized as follows:
(1) construct "small worlds" locating the decomposition of requirements;
(2) map out elements comprising each option set in these small worlds;
(3) generate computers which might satisfy the "local requirements" for each option element;
(4) evaluate the computers identified in stage (3) in terms of how they meet the "local requirements";
(5) select optimal configurations of computers for each option, given the results of stage (4);
(6) evaluate complete options against each other.
These stages, the actions involved, and the use of decision support systems in the construction of structures "requisite" in aiding the decision process at each stage are discussed in detail below. Note, however, that while the decision analysis was complete at stage (6), which led to the specification of the set of computers to be purchased, for the decision maker there were two further steps in the process, viz.:
(7) use the results of the decision analysis to justify the proposed purchase in obtaining the release of funds; and
(8) use the results of the decision analysis in formulating strategies for introducing users to the appropriate use of the equipment purchased in the light of their own specific requirements.
Construction of Small Worlds Locating the Decomposition of Requirements

In discussion with the decision maker it soon transpired that her "breakdown of requirements" shown in Figure 1 was not particularly "requisite" in providing a decomposition of the objectives. One way of looking at decision analysis is to consider it to comprise procedures for manipulation of content (e.g., rating of utilities of consequences, probabilities of uncertain events, etc.) within defined structures (e.g., multiattribute utility hierarchies, decision trees) in a way that can provide a basis for transition to subsequent stages in the decision process (including action and accounting for the action taken). As any results of such manipulations are predicated on the nature of the structures employed, it follows that one criterion for a structure being "requisite" is the extent to which it permits the generation of results of use in subsequent stages.
This a posteriori criterion emphasises the essentially synthetic nature of the construction of "requisite" structures for aiding decision making (Humphreys, Wooler, and Phillips, 1980; Phillips, 1982). A priori criteria usually do not yield appropriate structures for this purpose. They should not, for example, simply be isomorphs of the pattern of relevant linkages in a person's semantic memory, as nothing is then gained by exploring them beyond what could be achieved just by asking the person to free-associate. Neither should they be attempts at modelling "external reality", as this does not provide a basis for moving towards action but only for justifying it (Sjoberg, 1980). The structure shown in Figure 1 is based on the "external reality" of the formal organization of patterns of activities in the psychology department in question, but how can one use it to identify options? Pitz, Sachs, and Heerboth (1980) suggest focussing on decomposed objectives one at a time and generating options specially good at meeting those single objectives, but that strategy is defeated here by the problem discussed earlier of how far to decompose objectives before formulating options. Our solution was to interpose an additional structuring phase between decomposing objectives and selecting options for evaluation, consisting of mapping out requirements spaces. The rationale for adopting this approach rested on a perceptual analogy: if the set of computers comprising an option should cover all requirements, then, provided that we can locate these requirements in space, we can partition that space by marking off the local area (i.e., local requirements) to be served by each computer in the set. In doing this, we can examine whether the complete set of partitions which serve to define the option divides up the requirements space in a way that indicates efficient and effective use of the various elements in the set. Once we have identified options defined in this way we can move to the next stage of the decision analysis: identifying the computers which could best serve as elements in the set comprising an option. However, all of this first requires locating the requirements within appropriately constructed "small worlds".²
To do this we used KYST (Kruskal, Young, and Seery, 1973) to perform non-metric multidimensional scaling. Kruskal (1964) states that "the goal of multidimensional scaling, broadly stated, is to find n points (in our case, 18) whose interpoint distances in m-dimensional space match in some sense the experimental distributions of aspects (which here are the "facilities" shown in Figure 1). Instead of dissimilarities, the experimental measurements may be ... measures of proximity or dissociation of the most diverse kind." Hence the first step was to develop with the decision maker a definition of the criterion to be used in establishing inter-facility distances. As the decision maker (i) was familiar with the whole range of activities in which the various users were involved, (ii) had considerable professional knowledge of the type of hardware and software required in support of each activity,³ and (iii) was the only person in the psychology department possessing an overview (based on previous consultancy) of all the activities, we decided that she should develop a subjective criterion codifying her current knowledge. The proximity criterion adopted was called "interrelatedness", combining both similarity (overlap) and interdependency of characteristics. It was scaled from 1 = unrelated (no similarity and/or complete independence) to 9 = strongly interrelated (close similarity and/or highly interdependent). In the next step in the analysis the decision maker prepared three sets of 18 record cards, summarizing respectively the users, hardware requirements and software requirements for each of the 18 facilities. As an example, the three cards prepared for the facility psychophysiological laboratory (1) are shown in the Table.

² "Small world" is a term introduced by Savage (1954) to define a temporarily closed system within which the current stage of a decision problem can be conceptualized (Toda and Shuford, 1965; Toda, 1976; and Humphreys, 1981, discuss the conditions under which it is reasonable to make such temporary closure in order to make a decision). Hence a requirements small world, like any closed system, must be coherent. In this case this means that the same criterion of distance/dissimilarity of elements must be used throughout.
³ To bolster this, the decision maker first identified the potential users of each of the 18 computing facilities, and then asked these people about their potential requirements. Some users could specify hardware/software requirements directly; for the rest these had to be inferred from the less technical accounts they gave of their activities.
Table. The Contents of the Cards Summarizing the Requirements for Facility No. 1: "Psychophysiological Laboratory"

Hardware requirements: fast processor; fast disk access; clock; A/D board; high reliability and precision; good V.D.U. and graphics.
Software requirements: FORTRAN; assembly language; fast operating system; specialized software (graphic functions, clock functions, interrupt control).
Users: Michael; research staff; outside skilled research workers.
The decision maker and a decision analyst familiar with the relevant computing technology each made independent pairwise comparisons of the interrelatedness of requirements between all 1 8 cards in each of the hardware requirements and software requirements sets. The results from these two judges were adjusted iteratively by comparing and refining scaling criteria until agreement was reached about values of all the cells in each of the matrices so generated. The comparisons were symmetric as it should not make any difference whether, for example, the interrelatedness of hardware requirements for psychological/laboratory research with human experimental laboratory research estimated or vice versa, and so only the lower halves of the resulting 18 x 1 8 "interrelatedness of requirements" matrices were obtained (cf., Coombs, Dawes, and Tversky, 1970, section 3.4.). A similar matrix was also constructed for user interrelatedness, this time by the decision maker only, as in this case she alone had the knowledge required for scaling interrelatedness comparisons. The next step consisted of using KYST to perform the non-metric regressions required to attempt to find a distance metric through which the rank order of numbers in the half matrices of interrelatedness requirements would be preserved in the rank order of distances between the points representing the 18 facilities in mdimensional space. For each
matrix, this was done with m = 3, 2, and 1. In each case an acceptable result was obtained in two dimensions.⁴ These three solutions for configurations of points in space serve to map the facilities identified in Figure 1 into three two-dimensional "small worlds", described in terms of a hardware requirements space, a software requirements space and a users space, respectively. Figures 2, 3, and 4 show these three spaces. Each facility is represented by the number assigned to it in Figure 1. In each space, these numbered points have been located in the configuration specified by the KYST solution.⁵ (The shaded envelopes grouping clusters of points into "elements" were superimposed later by the decision maker in the manner described in the next section.) We found that it was not possible to obtain a satisfactory solution by using separate regressions to place the relations contained in all three matrices in a single two-dimensional space, as the resulting stress (0.39) was unacceptably high.⁶ This implied that the three requirements spaces could not be topological transformations of each other. Thus we concluded that we would need to structure options simultaneously within the three separate requirements small worlds identified above.

⁴ "Acceptability" is determined by examining the computed "stress" associated with the result. "Stress" is a goodness-of-fit measure, summarizing the percent of variance in the data associated with those assigned ratings of interrelatedness which are not properly mapped by a monotone transformation into the distance matrix. Kruskal (1964) gives details of the rationale involved in computing "goodness of fit" in this way and also a table of the degree of acceptability of fit associated with the range of stress values. Later applications of the procedure usually compute stress in a slightly different way (corresponding to KYST's SFORM2, used here) for which values twice as large as those shown in Kruskal's table for the original method indicate the same degree of acceptability (see Kruskal, Young, and Seery, 1973, for details). The stress values for hardware, software and users (shown in Figures 2, 3, and 4 respectively) all fall in the category of "fair-to-good" fit.
⁵ The configurations were reflected and rotated as necessary to place "secretarial" facilities on the left and experimental facilities on the right in each configuration for comparison purposes. However, this orientation is arbitrary, as in multidimensional scaling solutions only the configuration is determined. (There is no separate delineation of "meaningful" axes, as in factor analysis.)
⁶ This was attempted through using KYST's options SPLIT BY GROUPS and BLOCKDIAGONAL = NO.
Figure 2. Option 1 Mapped on the Hardware Requirements Space
Figure 3. Option 1 Mapped on the Software Requirements Space
Figure 4. Option 1 Mapped on the User Requirements Space
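KYST itself is not reproduced in the chapter. As a rough modern equivalent of the scaling step just described, the sketch below runs non-metric multidimensional scaling on a dissimilarity matrix derived from 1-9 interrelatedness ratings, using scikit-learn; the toy 4-facility rating matrix and the conversion rule (10 minus the rating) are assumptions made purely for illustration and are not the study's data.

```python
# Illustrative sketch only: non-metric MDS on a precomputed dissimilarity matrix,
# standing in for the KYST runs described in the text. The ratings below are invented.
import numpy as np
from sklearn.manifold import MDS

interrelatedness = np.array([      # symmetric 1-9 ratings between four toy facilities
    [9, 7, 2, 3],
    [7, 9, 2, 2],
    [2, 2, 9, 6],
    [3, 2, 6, 9],
])
dissimilarity = 10 - interrelatedness     # assumed conversion: higher rating -> smaller distance
np.fill_diagonal(dissimilarity, 0)

mds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
          random_state=0, n_init=10)
coords = mds.fit_transform(dissimilarity)  # 2-D "requirements space" coordinates
print(mds.stress_)                         # goodness-of-fit measure, analogous to KYST's stress
```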
Generating Options in Three Small Worlds

The next stage in the decision analysis was to generate the sets of elements constituting options, with each element defined not in terms of named computers which might be purchased, but simply in terms of types of computer systems which could fulfil the local requirements identified in a particular shaded area. For each option the aim was to develop a philosophy which would partition each of the three requirements spaces in a way which generated a "good configuration" of systems meeting the total requirements. Judgements of whether or not a configuration is "good" could now be made perceptually, by examining the pattern of the areas mapped out by the partitions (expressed as envelopes around sets of points) made under a particular philosophy. Efficient partitioning of a requirements space is achieved when the envelopes can be seen to have smooth boundaries, and do not overlap in a convoluted way or divide each other into fragments.
The rationale underlying this perceptual principle for finding good configurations goes like this: as one moves around a requirements space, the local requirements vary considerably (just like, for example, local climate varies as one moves around the "grand" physical world we call
Earth). It follows that in different localities the salience of attributes of the computing machinery needed to meet the local requirements will vary considerably, as will the tradeoff ratios between those attributes. For example, in the region at the top right of Figure 2 (hardware requirements) one might be prepared to give up quite a lot on an attribute like "easy user interface" in order to gain a little in terms of "rapid computational power", which is especially salient for psychophysiological research (shown as point 1). Moving to the left of Figure 2, the tradeoff ratios change, as one is now in the locality of word-processing activities (points 8 and 9), where the quality of the user interface is much more important than is computational power. The "local requirements" for any computer can be characterized by (i) drawing an envelope around the points defining those facilities it is designed to serve, and (ii) noting that the attributes that the machine needs to possess are those which become salient at any position within the envelope. Hence large envelopes will enclose requirements which can only be met by more "general purpose" machines (as yet unidentified), that is, those with a high specification on a wide range of attributes. Partitioning the space into convoluted envelopes will serve to mark out within each envelope an unusual combination of requirements which is likely to be met only by machines which, in achieving a high specification on the required attributes, will also possess a high specification on many other attributes which are redundant in meeting the defined requirements. As an increased specification on any attribute is usually only achieved by paying a higher purchase price, the effect of convoluted partitions in designing a configuration to a fixed cost will be to achieve a lower overall level of performance. This is because any system chosen on this basis is likely to end up providing facilities in redundant areas at the expense of being able to concentrate resources where they are required. Using this perceptual criterion to determine a "good configuration" does not on its own solve the problem of how to generate an efficient set of computer options for evaluation, since there are still several different ways of partitioning the requirements spaces to meet this criterion. The only formal rule following from it is that each partition must envelop the same subset of points in all three spaces, since for each facility a specific set of users will be using a specific set of software on a specific set of hardware. Also required is a philosophy informing how one groups facilities within envelopes. The question arises here: whose philosophy? Recalling that one of the goals of the decision maker was to provide a rationale for the selection of computers, the decision analyst asked her to identify the philosophies concerning computer support which might be held by
those people who would need to be convinced by the rationale in agreeing to the allocation of funds. Three distinct philosophies were identified, and the requirements spaces were partitioned by the decision maker under each of these philosophies in the ways described below.
The traditional philosophy: this had been inherited from the days when systems based upon timesharing minis rather than microcomputers were the norm. Option 1, generated by the decision maker under this philosophy, neatly partitioned each requirements space (according to the perceptual criterion identified above) into three areas, each served by a different computer system (a business system, a laboratory system, and a statistics and simulation system). Note that the areas enveloped by the three elements constituting this composite option do not remain in the same relationship to each other across the three spaces (Figures 2, 3, and 4), but they meet the partitioning criterion in all three requirements spaces.
Option 2 was generated under a more bureaucratic and authoritarian philosophy, which the decision maker suspected might partition provision according to user status: one system for secretarial work, one for undergraduate work, and one for postgraduate and research work, with two special systems for groups developing computer software for external distribution. It resulted in highly convoluted partitions of the requirements spaces which thus failed to meet our "perceptual criterion" (Figure 5 shows this composite option mapped on the hardware requirements space). So we felt justified in
Figure 5. Option 2 Mapped on the Hardware Requirements Space
banishing this option, and the variety of functional fixity (Duncker, 1945; Pitz, 1982) which gave rise to it, from further consideration in the analysis.
Option 3 was based on a divide-and-conquer multi-micro philosophy espoused by those who believed that the "microcomputer revolution" would set the pattern for computing in the future. This generated eight microcomputer-based systems, each performing a different but restricted function. Its partitioning of the hardware requirements space is shown in Figure 6. The partitioning of all three spaces by this option was quite acceptable under the perceptual criterion, but the decision maker noted that large areas within the hardware requirements space were not enveloped by any option element. As there were no points indicating unmet requirements within these "out-of-bounds" spaces, this did not matter in terms of the current projection of requirements, but it did have implications for future, as yet unforeseen developments in computing requirements, in a way that will be discussed later.
Figure 6. Option 3 Mapped on the Hardware Requirements Space
Generating Computers Which Might Satisfy an Option Element's Requirements

The next task was to identify actual, purchasable computers which might satisfy the local requirements identified within each partition of the three requirements spaces under the two surviving philosophies. To do this, the decision analyst grouped together the record cards characterizing the hardware and software requirements for all the facilities enveloped by each element within each of the composite options. There are eleven such groupings for both hardware and software requirements: three for option 1 plus eight for option 3. This started the decision maker thinking about actual computers which might meet particular groupings of requirements, but it quickly became obvious that extensive "lateral thinking" would be required in dreaming up suitable computers meeting particular requirements, given the plethora of machines on the market. De Bono (1971) describes techniques for aiding and training intuitive lateral thinking, and these would usually suffice in cases where the decision maker had sufficient information about the characteristics of computers (or whatever) in his or her mind. However, in this case the decision maker expressed uncertainty about whether her knowledge of the available computer systems was sufficiently comprehensive. As we had available a computer-based system for the generation and lateral exploration of structured databases, we decided to adapt this system as a Decision Support System (cf. Bonczek, Holsapple, and Whinston, 1981) for this stage in the analysis. A database was generated relating 41 different computers from nearly as many manufacturers within a network defined in terms of both featural and hierarchical relations (with multiple superset membership allowed).⁷ The information used in specifying the links in the database was gleaned by Decision Analysis Unit personnel from reviews in computing journals, manufacturers' literature, hearsay and subjective impressions provided by the decision maker. The software which we employed to explore this database "laterally" used, as guidelines for its search, operations specified by Lacan (1977) as underlying displacement and condensation in primary process thought (Nagara, 1969). The "motivation" guiding the system in performing these operations was to seek computers possessing "desired" attributes, while
* The characteristics of this structure are analogous to those prescribed in "marker-search" models of human semantic memory (Glass and Holyoak, 1975; Smith, 1978).
The "motivation" guiding the system in performing these operations was to seek computers possessing "desired" attributes, while avoiding those possessing "undesirable" attributes; the specification of what was desirable and undesirable was supplied by the user each time the decision support system was used. The procedure was as follows: each of the 11 groupings of record cards specifying option elements was used by the decision maker to generate a specification of desired and undesirable attributes for computers which might implement that element. Under each specification the IBM 5110 running the decision support system printed out the names of the computers encountered through its displacements and condensations through the database, together with comments on those desires each computer could meet and the enhancements which would have to be made to meet the rest (e.g., "add hard disk"). From the suggestions provided by these printouts the decision maker drew up eleven "option element short lists", each specifying between six and nine computers for further evaluation.
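The search software itself is not reproduced here, but a minimal sketch may clarify the kind of operation involved: a small structured database of machines is scanned for candidates that possess the user's "desired" attributes and lack the "undesirable" ones, with unmet desires reported as suggested enhancements. The machine names, feature tags and tolerance parameter below are illustrative assumptions, not entries from the 1981 database, and the hierarchical relations and the displacement/condensation heuristics of the original system are not modelled.

```python
# A toy database: each machine carries a set of feature tags (an assumed representation,
# standing in for the featural links of the original network).
machines = {
    "Micro A": {"CP/M", "floppy disks", "graphics"},
    "Micro B": {"CP/M", "hard disk option", "S-100 bus"},
    "Mini C":  {"multi-user", "Fortran", "large core"},
}

def shortlist(desired, undesirable, max_missing=1):
    """Return candidate machines together with comments on the desires they miss."""
    suggestions = []
    for name, feats in machines.items():
        if feats & undesirable:           # reject machines with any undesirable attribute
            continue
        missing = desired - feats         # desires the machine cannot yet meet
        if len(missing) <= max_missing:   # tolerate a few, since enhancements may fix them
            comment = ", ".join(f"add {m}" for m in sorted(missing)) or "meets all desires"
            suggestions.append((name, comment))
    return suggestions

# e.g. shortlist({"CP/M", "hard disk option"}, {"multi-user"})
# -> [("Micro A", "add hard disk option"), ("Micro B", "meets all desires")]
```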
Evaluation of Computers Shortlisted for Option Elements
A primary process based system, while good at dreaming up short lists of potential candidates, in this case computers meeting particular requirements, is useless at identifying the relative importance of the various attributes on which shortlisted computers should be assessed and at making tradeoffs between those computers in terms of the extent to which they are variously preferred on salient attribute dimensions. For this task it is necessary to generate a structure suitable for multiattribute utility analysis (Keeney and Raiffa, 1976). However, multiattribute utility theory assumes fixed tradeoff ratios between attributes at any particular level. This is unwise as a universal assumption here since, as was pointed out earlier, the relative importance of attributes varies enormously as one moves around the requirements small worlds. Hence we decided that it would be necessary to compute valuewise importance of attribute dimensions (assuming fixed tradeoffs) separately for each of the three elements identified by the partitions of option 1, and for each of the eight elements similarly identified for option 3. However, the decision maker was uncertain at this point about how to specify the various sets of attributes which she would like to use in evaluating the various short lists of computers. To aid her in this task we used MAUD4 (Humphreys and Wisudha, 1981), an interactive computer program designed to aid the structuring of preferences between multiattributed alternatives in direct interaction with the decision maker, without the intervention of a decision analyst.
Descriptions of MAUD3 (a precursor of MAUD4) are given in Humphreys and McFadden (1980) and John, von Winterfeldt, and Edwards (this volume). A review of the improvements on MAUD3 incorporated in MAUD4 is given in Humphreys and Wooler (1981). The decision maker spent 11 sessions with MAUD4, one for each option element. In each session she worked in interaction with MAUD4 in identifying and modifying attribute dimensions salient in deciding between the particular computers on the short lists for an element and in computing their relative importance in choosing between these computers to implement the element. The sessions lasted from 30 to 45 minutes each, and were spread out over a period of one week. Each session resulted in MAUD4 producing a summary, and an extract from one of these 11 summaries, for the element "cognitive and educational research", grouping facilities 3 and 5 under option 3, is shown in Figure 7.
Figure 7. Extract from the Summary of the Session with MAUD4 for the Element "Cognitive and Educational Research" of Option 3. (The extract lists the current order of preference over the shortlisted computers from best to worst, headed by the Exorset and the Horizon, together with each machine's rating on the individual attribute scales elicited for this element, such as fast to slow, good graphics to poor graphics, expensive to cheap, portable to not portable, reliable to unreliable, large core to small core, good disks to poor disks and good software to poor software, and the relative importance computed for each scale.)
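As a rough illustration of how summaries of this kind are produced, the sketch below shows an additive aggregation in which each shortlisted machine's 0-1 ratings on the attribute scales are weighted by the scales' relative importances. This is a generic multiattribute calculation in the spirit of a MAUD4 summary, not its documented algorithm, and the machine list, ratings and weights are hypothetical rather than values taken from the (partly illegible) printout.

```python
# Hypothetical ratings: machine -> rating on each of four attribute scales (0 = worst pole).
ratings = {
    "EXORSET": [1.00, 0.25, 0.75, 0.90],
    "HORIZON": [0.80, 0.40, 0.80, 0.85],
    "APPLE 2": [0.60, 0.70, 0.50, 0.40],
}
weights = [0.40, 0.10, 0.30, 0.20]   # relative importances of the scales, summing to 1

def preference_values(ratings, weights):
    """Weighted additive preference value for each machine."""
    return {m: round(sum(r * w for r, w in zip(rs, weights)), 2) for m, rs in ratings.items()}

# preference_values(ratings, weights)
# -> {'EXORSET': 0.83, 'HORIZON': 0.77, 'APPLE 2': 0.54}, ordering the machines from best to worst
```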
The decision maker was content to use the relative preference values for computers shown in the MAUD4 summaries as a basis for selecting the computer implementing each element (e.g., from the summary shown in Figure 7 a Motorola Exorset was selected, as this received the highest preference rating). This led to the selection of two Zilog computers and one PDP-11 computer for the three elements of option 1, and seven North Star Horizon microcomputers and a Motorola Exorset* for the eight elements in option 3.
Generation of a New Option

The discovery that the "multimicro" philosophy led to the specification of an option with nearly all the computers coming from the same manufacturer (or indeed all of them, if we selected the Horizon that was a close runner up in preference to the Exorset in the summary shown in Figure 7) enabled the decision maker to overcome one aspect of the "functional fixity" of the multimicro philosophy: if a particular type of machine can perform well in meeting the various requirements identified within all elements of a particular composite option, then the partitions of the requirements spaces do not need to be rigid, but can be permeable. Given this realization, the decision maker decided that it would be a good idea to design a new composite option where the envelopes around elements are expressed at two levels: an inner core defining the primary function of the computer system implementing the element and an outer region describing the range of secondary functions which could be carried out on the same machine. Allowing secondary functions such as "do-it-yourself wordprocessing" (point 9) to lie within the broken line envelopes defining the secondary functions of more than one element would mean that the relevant "secondary" activities would be carried out on more than one system. The partitioning of the hardware requirements space generated by the decision maker under this new "permeable" philosophy is shown in Figure 8, where the envelopes defining primary functions of elements are heavily shaded and the outer regions of secondary functions of elements are lightly shaded.
* Note that the decision maker had particular configurations in mind for each computer; this of course affected her ratings of specifications, costs, and preferences.
Figure 8. Option 4 Mapped on the Hardware Requirements Space (Option 4a has the same mapping)
This way of partitioning the hardware requirements space turned out to have some nice characteristics in that there is considerable overlap of elements able to perform particular functions in the centre, providing a lot of backup for widespread activities like use of statistical packages. Some facilities, like "do-it-yourself wordprocessing", are present only as secondary functions, but available on a wide variety of elements.
The decision maker was particularly interested in being able to use the option structuring technique in developing this "permeable" philosophy since it could potentially serve all three of her goals. It provided for the possibility of selecting a set of computers designed to meet the requirements of the department through a mixture of primary and secondary computing functions (goal a); it provided a rationale for selection under a new philosophy which could be contrasted directly (by comparing pictures such as Figures 6 and 8) with the results generated under other philosophies held within the psychology department (goal b); and it provided a way of explaining clearly to users how to get the best use out of the available facilities in terms of both the primary and secondary functions of the elements (goal c). However, there was one problem. The partitioning of the software requirements space, which, as explained above, had to follow the same grouping rules as the partitioning of the hardware requirements space, did not look so promising under this philosophy. In the central region it was very convoluted, and appeared to be unacceptable under the "perceptual" criteria described earlier. However, the decision maker then realized that the problem could be solved by specifying a machine-independent common operating system (CP/M) for all activities in this area. In this case the central region partitions become so permeable that they dissolve, and so convolution is irrelevant. Following the same computer short list generation and evaluation procedure for this option as that described above for options 1 and 3, the decision maker identified the computers most preferred in implementing the six elements constituting this new option 4 as six North Star Horizons and a Research Machines 380Z (one option element was implemented by two Horizons). At this point the decision maker decided to consider also an alternative, option 4a, based on the same elements, but implemented on seven Horizons (a Horizon had been a close runner up to the 380Z in the MAUD4 summary of preferences for the "portable system" element, with a relative preference of 0.67 compared with 0.71 for the 380Z). This alternative option had different configurational properties: the reason the decision maker gave for considering it alongside the original option 4 was that it had the added benefits of making it easier to substitute machines, exchange software between machines, stock spare parts, etc. It had the risk, though, of reliance on a single manufacturer, and would provide less breadth of experience of hardware systems for students in the psychology department.
Choosing Between Options

The final task was to choose between the four options. Fortuitously, the cost for the alternative sets of computers chosen for the different options lay between £26,000 and £27,000 in each case, just inside the cost constraint. Therefore, since there was almost no difference between the capital cost of each option, cost did not enter into the analysis at this level (although it had previously entered the analysis during each session with MAUD4 determining preferences between computers for option elements).* Neither were the relative preference values computed in interaction with MAUD4 for computers within option elements a suitable basis for computing overall preferences across elements or across complete options. Therefore, the decision maker used MAUD4 once again, this time evaluating whole options defined in terms of sets of the preferred computers implementing option elements. Now criteria represented at the composite configuration level but not at the level of individual option elements dominated the decision maker's choice, with degree of software interchangeability between machines and substitutability of machines carrying 67% of the weight in a seven-attribute decomposition. As can be seen from the extract from MAUD4's summary for this session, reproduced in Figure 9, purchasing seven North Star Horizon microcomputers to provide facilities in the way specified in Figure 8 was clearly preferred overall to the other options, with a relative preference of 0.89.
* If the total cost of any option had needed adjustment so as to lie just inside the cost constraint, our procedure would have been to adjust the relative weights given to the cost attribute in the MAUD4 results for preferences within elements by similar proportions across all elements comprising the option until a set of preferred computers meeting the cost constraint was generated. A pareto-optimal "design to cost" solution (Buede and Peterson, 1977) is not appropriate here as tradeoff ratios between attributes vary across elements in the option.
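A minimal sketch of the adjustment procedure described in this footnote is given below: the weight attached to the cost attribute is scaled up by the same proportion in every element until the set of machines preferred under the re-weighted models falls inside the budget. The scoring rule, step size and data layout are illustrative assumptions, not part of the MAUD4 implementation.

```python
# Hypothetical data layout: each element is a list of candidate machines, each machine a
# dict with a non-cost "value" score and a "price" in pounds.

def preferred_set(elements, cost_weight):
    """For each element, pick the machine scoring best once the cost penalty is applied."""
    return [max(machines, key=lambda m: m["value"] - cost_weight * m["price"] / 1000.0)
            for machines in elements]

def design_to_budget(elements, budget, start_weight=0.1, step=1.25, max_rounds=40):
    """Scale the cost weight up proportionally across all elements until the budget is met."""
    w = start_weight
    chosen = preferred_set(elements, w)
    for _ in range(max_rounds):
        if sum(m["price"] for m in chosen) <= budget:
            break
        w *= step                          # the same proportional increase in every element
        chosen = preferred_set(elements, w)
    return chosen, w
```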
Figure 9. Extract from the Summary of the Session with MAUD4 Evaluating Complete Options. (The extract lists the current order of preference over the complete options: option 4, seven Horizons, best with a preference value of 0.89; option 4a, six Horizons plus the 380Z, and option 3, seven Horizons plus the Exorset, intermediate at around 0.70; and option 1, two Zilog MCZs plus the PDP-11, worst at 0.09. The attribute scales elicited at this level include maintenance costs, training generalization, reliance on a single firm, breadth of hardware experience, substitutability of machines, ease of exchanging software and software interchangeability, the last two carrying the largest relative importances.)
Advantages and Limitations of the Option Structuring Procedure

The six stages outlined for the decision analysis were now complete. The decision maker expressed satisfaction with the result. She intended to follow the prescriptions of option 4 and had also gained support for the two further steps in the decision process outlined in the introduction: stage 7, using the results of the decision analysis to justify the proposed purchase in obtaining the release of funds, and stage 8, using the results of the decision analysis in formulating strategies for introducing users to the use of the specific facilities. The decision maker subsequently reported that generating the information displayed in Figure 8 was the part of the decision analysis that had been of particular use in implementing each of these two steps. Seven Horizon microcomputers were purchased, but perhaps the real value of the decision came from a result that neither the decision maker nor the analyst had anticipated at the start of the analysis. In a follow-up one year after stages 7 and 8 had been implemented we found there had been a change in the activities of the users of the computing facilities in the psychology department. They, like the personnel involved in the decision analysis, no longer kept to the functional fixity stemming from being brought up on "traditional" or "user status" computing philosophies, but started using the computers interchangeably for a wide variety of secondary activities, ensuring more efficient and evenly spread use of the available computing resources.
Another interesting property of the option structuring technique is that it can be used to assess the flexibility of an option (defined as judged responsivity to future changes in requirements) even when there is at present total uncertainty about what such future changes might be. We can examine the way in which options partition the requirements small worlds, noting, for example, that the "multimicro" option 3 leaves much of the hardware requirements space outside the areas covered by current option elements. If a new activity emerges with requirements located outside the bounds of any current option element, then the "flexibility" of this option depends on being able to purchase a new microcomputer to service this activity without disturbing other computing activities. One can take advantage of up-to-date developments in computer technology, but at the cost of having to raise money to buy the machines. The flexibility of option 4, generated under the "permeable" philosophy, is rather different. Here most of each requirements space is already covered in terms of secondary functions of available computers so servicing any new, as yet unforeseen activity should be possible, at least
initially, with the resources presently specified under this option. Additional expenditure becomes necessary only when the new activity reaches the level where it needs to occupy the primary function of a new computer. The cost of this flexibility is reliance on the continued development of computers which are compatible in hardware and software terms with those currently implementing this option.
Investigation of the nature of the "flexibility" of a plan or option is a major preoccupation of decision makers with responsibility for long-range planning, but has been largely ignored in the development of formal techniques supporting decision analyses. One cannot always foresee the future in the way decision trees require and, after the initial decision has been taken, one can find oneself trapped by their straitjacket when subsequent events and acts unfold in unanticipated ways (Brown, 1978; Humphreys, 1981). The alternative "flexible" strategy of refusing to act so that one can "keep all options open" (i.e., one does not commit oneself to any model of the future) brings with it its own pitfalls. The decision analysis described here may open the way to examining the future adequacy of immediate acts in cases where there is too much uncertainty about the structure of future act-event linkages to permit the employment of more traditional techniques such as act-event trees and influence diagrams. However, much more work is needed before we can feel confident that the technology is available to carry out a "flexibility analysis" at the end of any decision analysis, in the way that today a "sensitivity analysis" is usually carried out.
The success of the option generating technique described here depended crucially on the nature of the decomposition of objectives (Figure 1), whose structure in this cube permitted successful scaling in two dimensions in easily identifiable small worlds. The next steps are (i) to assemble alternative option structuring methods for use in cases where it is not appropriate to represent options in multidimensional space and (ii) to learn how to select the most appropriate method for a particular problem.
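One way to make such a flexibility analysis concrete, sketched below under strong simplifying assumptions, is to treat each element's primary and secondary envelopes as regions of the two-dimensional requirements space and estimate how much of that space each kind of envelope already covers; a high secondary coverage then indicates that many as yet unforeseen requirements could be absorbed without new purchases. The circular envelopes and the Monte Carlo sampling are illustrative devices only, not part of the technique as applied in the case study.

```python
import random

def coverage(elements, n_samples=10000, seed=1):
    """Fraction of the unit requirements space lying inside any primary / secondary envelope."""
    rng = random.Random(seed)
    primary_hits = secondary_hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()            # a hypothetical future requirement point
        d2 = [(x - cx) ** 2 + (y - cy) ** 2 for cx, cy, _, _ in elements]
        if any(d <= rp ** 2 for d, (_, _, rp, _) in zip(d2, elements)):
            primary_hits += 1
        if any(d <= rs ** 2 for d, (_, _, _, rs) in zip(d2, elements)):
            secondary_hits += 1
    return primary_hits / n_samples, secondary_hits / n_samples

# Each element is (centre_x, centre_y, primary_radius, secondary_radius).  For two toy
# elements with wide secondary envelopes, e.g.
#   coverage([(0.3, 0.3, 0.15, 0.35), (0.7, 0.6, 0.15, 0.40)])
# the secondary coverage comes out several times larger than the primary coverage,
# which is the sense in which a "permeable" option is judged more flexible.
```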
References

Bonczek, R.H., C.W. Holsapple, and A.B. Whinston, 1981. Foundations of Decision Support Systems. New York: Academic Press.
Brown, R.V., 1978. Heresy in decision analysis: Modelling subsequent acts without rollback. Decision Sciences, 9, 543-554.
Buede, D.M. and C.R. Peterson, 1977. An application of cost-benefit analysis to the USMC Program Objective Memorandum (POM). Technical Report 77-8-72. McLean, Virginia: Decisions & Designs, Inc.
Coombs, C.H., R.M. Dawes, and A. Tversky, 1970. Mathematical Psychology: An Elementary Introduction. Englewood Cliffs, N.J.: Prentice Hall.
de Bono, E., 1971. The Use of Lateral Thinking. Harmondsworth: Penguin Books.
Duncker, K., 1945. On problem solving. Psychological Monographs, 58 (5).
Glass, A.L. and K.J. Holyoak, 1975. Alternative conceptions of semantic memory. Cognition, 3, 313-339.
Howard, R.A. and J.E. Matheson, 1980. Influence Diagrams. Menlo Park, California: SRI International.
Humphreys, P.C., 1981. Decision aids: Aiding decisions. In: L. Sjoberg, T. Tyszka, and J.A. Wise (eds.), Decision Analyses and Decision Processes, vol. 1. Lund: Doxa.
Humphreys, P.C. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-70.
Humphreys, P.C., S. Wooler, and L.D. Phillips, 1980. Structuring decisions: The role of structuring heuristics. Technical Report 80-1. Uxbridge, Middlesex: Decision Analysis Unit, Brunel University.
Humphreys, P.C. and A. Wisudha, 1981. MAUD4. Decision Analysis Unit Technical Report 81-5. Uxbridge, Middlesex: Brunel University.
Humphreys, P.C. and S. Wooler, 1981. Development of MAUD4. Decision Analysis Unit Technical Report 81-4. Uxbridge, Middlesex: Brunel University.
John, R.S., D. von Winterfeldt, and W. Edwards. The quality and user acceptance of multiattribute utility analysis performed by computer and analyst. In this volume, 301-319.
Keeney, R.L. and H. Raiffa, 1976. Decisions with Multiple Objectives: Preferences and Value Tradeoffs. New York: Wiley.
Kruskal, J.B., 1964. Multidimensional scaling by optimizing goodness of fit to a non-metric hypothesis. Psychometrika, 29, 1-26.
Kruskal, J.B., F. Young, and J. Seery, 1973. How to use KYST, a very flexible program for multidimensional scaling and unfolding. Murray Hill, N.J.: Bell Telephone Laboratories.
Lacan, J., 1977. The agency of the letter in the unconscious. In: J. Lacan, Ecrits. London: Tavistock.
Montgomery, H., 1983. Decision rules and the search for a dominance structure: Towards a process model of decision making. In this volume, 343-369.
Nagara, H., 1969. Basic Psychoanalytic Concepts on the Theory of Dreams. London: George Allen and Unwin.
Pearl, J., A. Leal, and J. Saleh, 1980. GODDESS: A goal-directed decision structuring system. UCLA-ENG-CSL-8034. School of Engineering and Applied Sciences, University of California, Los Angeles.
Phillips, L.D., 1982. Requisite decision modelling: A case study. Journal of the Operational Research Society, 33, 303-311.
Pitz, G.F., 1983. Human engineering of decision aids. In this volume, 205-221.
Pitz, G.F., N.J. Sachs, and M.T. Heerboth, 1980. Eliciting a formal problem structure for individual decision analysis. Organizational Behavior and Human Performance, 26, 396-408.
Savage, L.J., 1954. The Foundations of Statistics. New York: Wiley.
Sjoberg, L., 1980. Volitional problems in carrying through a difficult decision. Acta Psychologica, 45, 123-132.
Smith, E.E., 1978. Theories of semantic memory. In: W.K. Estes (ed.), Handbook of Learning and Cognitive Processes, vol. 6. Hillsdale, N.J.: Lawrence Erlbaum Associates.
Svenson, O., 1983. Scaling evaluative statements in verbal protocols from decision processes. In this volume, 371-382.
Toda, M., 1976. The decision process: A perspective. International Journal of General Systems, 3, 79-88.
Toda, M. and E.H. Shuford, 1965. Utility, induced utilities and small worlds. Behavioral Science, 10, 238-254.
NON-EXPERT USE OF A COMPUTERIZED DECISION AID*
Fred BRONNER
Veldkamp/Marktonderzoek, Amsterdam, The Netherlands
and
Robert de HOOG
Instituut voor Wetenschap der Andragogie, University of Amsterdam, The Netherlands
Abstract

This paper describes the results of an experiment in which a wide variety of subjects (according to age, education, sex, political preference) made a choice from a set of cars and political parties by means of an automated decision aiding technique for personal decision making. It is shown that computer-based aids can, in principle, be handled reasonably well by people without a specialist training in decision analysis. Nonetheless, some groups of people handled the program better than others and some stages in the program were found to need considerable improvements. Subjects had well-defined opinions on the applicability of the program, preferring this kind of aid for problems at the intermediate level (not too trivial, not too emotional), such as choosing a job, an education or consumer durables. In measuring the helpfulness of the aid, we were able to show that for political party choice there is a logical hierarchical structure in the concept: structuring, raising awareness, decision justification. The paper concludes with thoughts about future developments of this kind of program (e.g., connections with external information systems and construction of new, more 'user-friendly' versions).
* This research was made possible by the Instituut voor Wetenschap der Andragogie of the University of Amsterdam. Valuable assistance was provided by Hajee van Houten, José Vogels and Nicole Diamand. We thank Veldkamp Marktonderzoek for offering facilities for conducting the experiment.
Introduction

Interactive computerized decision aids are a relatively recent development in the field of decision research. Although the concept of 'Decision Support System' has been in existence for a fairly long time (see, e.g., Scott Morton and Huff, 1980), knowledge about this kind of decision support is mainly limited to more or less 'professional' decision makers who are confronted with 'familiar' decision problems (Humphreys and McFadden, 1980; see for exceptions, among others, Beach et al., 1976; Humphreys and Wooler, 1981). Due to the increasing availability of cheap home computers, the question arises whether aids of this type can be made more accessible to large groups of people. In principle this goal can be achieved quite easily. It is no great feat of programming to write a computer program that performs the basic components of a decision analysis (see, for an example, the program 'Decide' in Rugg and Feldman, 1980). The issue of the quality of the aid that has been provided seems more important. The design of such a system requires insight into the way ordinary people interact with decision aids of this type. A neat way to approach this problem is to start with a kind of 'prototype' of a program, conduct a number of experiments with a wide variety of subjects and improve the prototype with the experience gained (Shneiderman, 1980). This approach is analogous to the one described by Turoff (1980, p. 3):
"..., we do not have an environment where we can hope to derive natural laws from the experiences observed. What we can do is observe case studies in an uncontrolled environment and hope that some implications emerge that appear to be consistent over similar cases."
Our problem can be formulated as follows: what knowledge can be gained from experiments in order t o improve the quality of interactive computer-based aids for intelligent and effective use by people without a sfxcialist training in decision analysis. The decision aid we are aiming at is a so-called situation-based system. This is a domain-independent system that provides an 'empty structure' for the user. Persons interacting with the program are supposed to possess the relevant knowledge and expertise and only map into the machine that section of knowledge perceived as relevant t o the problem at hand. The structure of the aid is that of decision analysis and aims at aiding the decision process; and is not directed at predecisional structuring such as discovering alternatives (for this approach, see Pearl et al., 1980). In addition, one could say that the aid must have
’individual manageability’ too, which means that a wide variety of individuals can use the program fruitfully. As a first step towards such an aid we carried out an experiment in which subjects of different educational levels and age-groups had to make a choice in interaction with a computer program of the general type just described.
Method

Design of the Experiment
One of the main difficulties in testing the quality of a computerized decision aid is the interference between problems which are due to the (in)adequacy of the underlying model, and problems that result from weaknesses in the design and coding of the computer program. If the program is not optimally designed, failure to aid decisions effectively can be attributed to the suboptimal design as well as to the model. This implies that before testing a computerized decision aid with people with real decision problems, a number of trial runs must be conducted in order to detect and correct design and coding mistakes. Experiments of this type are characterized by the fact that the focus is on improving the program instead of aiding the decision maker. That is why the design of these experiments does not necessarily call for subjects with real decision problems. The only thing that is required is first and foremost a group of subjects that reflects roughly the group of intended users of the aid. Secondly, the task given to subjects must be clear and relate to relatively well-known choice objects. Clear, because in testing the program at the level indicated one must avoid problems with pre-decisional structuring. The capability of the program to cope with pre-decisional structuring problems is a question about the adequacy of the underlying MAUT model. Well known, because interactions with the program that make sense in terms of the possibility to detect errors require at least some knowledge by the user about the domain selected. Apart from gaining insight into design errors, experiments of this type can also be used for other goals. Two of them are especially important in our case:
- The subjects can be asked whether they can give an indication of domains they feel are promising for computerized decision aiding. These domains can be employed in future research concerning the decision aiding capability of the program.
- Developing measuring instruments for evaluating the effects and side-effects of computerized decision aiding processes.
There are a number of criteria on which the success of an aid can be determined, but in most cases one needs operationalized instruments for doing so. Taken together these requirements broadly define a class of experiments that can give us knowledge about possibilities for improvement of interactive computer-based aids. The restriction to non-expert users implies also a specific selection procedure for subjects.
The Actual Experiment

The prototype we selected was an adapted version of Humphreys' MAUD program (Humphreys and Wisudha, 1979), translated into Dutch, written in BASIC and run on a TRS-80 microcomputer. Our program was based on MAUD2, which has since been supplanted by improved and more comprehensive versions,* MAUD3 and MAUD4. In our version the instructions were often completely rephrased, while the weighting procedure was changed fundamentally. Perhaps the program we used can best be described as a program with a common ancestry with MAUD. The program decomposes the decision process into six stages: elicitation of attributes by means of a repertory grid procedure; rating of alternatives on the relevant attributes; positioning ideal points for each of the attributes; a test of independence of the attributes; a prescribed preference order; and an optional weighting procedure for the attributes. As has already been mentioned, this last stage was the main difference from the original MAUD2 program, which employed a Basic Reference Lottery Ticket (BRLT) procedure. After some preliminary sessions it became clear that this BRLT procedure was too difficult for most subjects. Bauer and Wegener (1977) stated earlier that simple estimating techniques proved to be nearly as effective as more complex assessment techniques. Furthermore, we wanted to provide our subjects with a simple procedure to 'fit' prescribed and intuitive preference orders, if desired. Because the philosophy underlying MAUD-like decision aiding systems does not imply any restriction of intended users, we aimed at a wide variety of subjects. That is why we decided to test the program on a group of 40 subjects selected from the population of Amsterdam. Selection was done by interviewers on the basis of a quota scheme. Four selection criteria were used: educational level, sex, age-group, and political preference.
* An evaluation of MAUD3 is given in John et al. (this volume); the development and evaluation of MAUD4 is described in Humphreys and Wooler, 1981.
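The six stages can be summarized in a compressed sketch. The ideal-point rescaling and the weighted aggregation shown below are one plausible reading of how stages 2, 3, 5 and 6 fit together, not the program's documented algorithm, and all names, ratings and weights are invented for illustration.

```python
# Minimal sketch of the later stages of the aid, under assumed 1-9 rating scales.

def value(rating, ideal, lo=1, hi=9):
    """Closeness to the attribute's ideal point, rescaled to 0-1 (1 = at the ideal)."""
    return 1.0 - abs(rating - ideal) / (hi - lo)

def prescribe_order(alternatives, ratings, ideals, weights):
    """Stages 2-6 in compressed form: rate, fold ratings around ideal points, weight, and rank."""
    total_w = sum(weights)
    scores = {
        a: sum(w * value(ratings[a][i], ideals[i]) for i, w in enumerate(weights)) / total_w
        for a in alternatives
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Stage 1 (repertory-grid elicitation of the attributes) and stage 4 (the independence
# check between attribute ratings) would precede this; here the attributes are taken as
# given, e.g.
# prescribe_order(["party A", "party B"],
#                 {"party A": [7, 3], "party B": [4, 8]},
#                 ideals=[8, 6], weights=[2, 1])
```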
Table 1. Sample Characteristics (N = 40)

(i)   Political preference             N
      Socialists (PvdA)               11
      Christian Democrats (CDA)       10
      Conservatives (VVD)             10
      Liberals (D'66)                  9

(ii)  Education                        N
      low                              9
      medium-low                      10
      medium-high                     15
      high                             6

(iii) Age-groups                       N
      18-24 years                     12
      25-34 years                      6
      35-49 years                      9
      ≥ 50 years                      13

(iv)  Sex                              N
      male                            21
      female                          19
From Table 1 it is clear that we achieved the variety in subjects. Each of the subjects was invited to come to a central location where the session took place. A session lasted about an hour on average and in this period the subject had to perform three tasks: first, he or she was to choose a car from a set of six different cars preselected by us; secondly, choosing a political party from six well-known Dutch parties; and thirdly, he or she had to answer a self-administered questionnaire. Cars and parties were selected by us because they both are clear and well-known objects for most people, so no problems concerning predecisional structuring and lack of knowledge were anticipated. The questionnaire contained questions pertaining to evaluation of the procedure, perceived applicability of the aid, suggestions for improvement of the program and usefulness of the program. For all subjects these tasks were carried out in the same order (cars, parties, questionnaire). Each subject generated three types of data: an observation scheme tallied by the observer/experiment leader; a protocol of the conversation between the subject and the program; and the questionnaire presented after the session with the computer. Results are based on all three data types and relate mainly to political parties.
Results

We will present our results in three subsections: an overview of some salient findings; the capacity of the program to act as a supportive partner in the decision process; and the question of the perceived applicability of the aid and the helpfulness of the program.
Overview

The average length of conversation between subject and computer program was 15-20 minutes, which is relatively brief considering that a large number of subjects had had no previous experience with computers or even with manipulating keyboards. Concerning the prescribed preference order, 86% of the subjects expressed satisfaction with the program's advice for cars (84% for parties). Moreover, for 95% of the subjects the party recommended for a vote coincided with the actual political preference of the subject. The high convergence is in this case an indication that the underlying model obviously captures some essentials of the subjects' decision process. At least for cars and parties no radical changes in the approach seem necessary. Another interesting finding was the great variety of attributes used by the subjects. For parties 80 different attributes were used: 40 variants of a kind of 'left-right' distinction, 12 variants of the capacity of parties to participate in coalition governments, 9 proxy measures of preference (e.g., good-bad) and 8 variants of religiousness of a party. Apparently the program gives the subjects sufficient scope to employ their own subtle distinctions in perceiving parties. This is precisely what is required of a situation-based aid. Nonetheless, some things did not go so smoothly with the attributes, because 25 subjects did in fact use attributes with one or two poles in common, for example, left-right together with left-conservative. Even if these attributes turned out to be highly dependent, only few subjects were willing to drop one or substitute them with a single, more general attribute. The prevention of these 'double' attributes forms a point for improvement in the design of our program because they lower the quality of the aid. Another problem with the grid procedure for eliciting attributes is a preponderance of purely cognitive attributes which are not very relevant for the choice. In choosing between cars subjects made a first distinction between Japanese and German cars, which was unlikely to be a very salient attribute in buying a car.
Supportive Partnership
For most of the subjects the interaction with the program proceeded surprisingly smoothly. From Table 2 it can be concluded that in the case of parties 70% of the interactions were classified as (very) easy.

Table 2. Ease of Interaction with the Program: subjects' reports (very easy, easy, not (so) easy) of the interaction when rating cars and when rating political parties; 100% = 40 subjects.
According to their own evaluation the great majority judged the procedure as simple, clear and understandable. It seems from Table 2 that the second session went better than the first, which could be attributed to the subjects' quickly gaining expertise. From the questionnaire, however, it became clear that this was mainly due to the fact that most subjects thought they were better informed about parties than about cars. This suggests that the quality of the program is not entirely situation- or domain-independent, but is related to knowledge possessed by the subjects (or lacking in the program). Although the overall picture seems satisfactory, a detailed analysis of problems occurring in the six different stages in the program reveals some bottlenecks, as shown in Table 3. Most problems occurred in the elicitation stage, which required a lot of technical assistance (e.g., retyping, explication of actions requested by the program) and a considerable amount of conceptual assistance (e.g., clarifying the meaning of actions). The other stages proceeded more satisfactorily. As has already been mentioned in the previous subsection, improvements in the repertory-grid procedure implemented in our version of the program are clearly necessary, because we expect these problems to return in other situations in which the aid may be used.
Table 3. Difficulties Experienced by Subjects in Each Phase of Interaction with the Program

Phase                                    At least 1 request for     At least 1 request for
                                         technical assistance       conceptual assistance
Elicitation                                      61                         42
Rating of alternatives on attributes             23                         25
Ideal points                                      3                         15
(In)dependence check                              0                         12
Prescribed preference order                       0                          3
Importance weights                                5                          7
One could object that it is difficult to discern whether the difficulties are caused by the formulations used or by the problem that people do not think about choice objects the way programs based on multi-attribute utility theory require. But previous research has shown that political parties and cars are seen by voters and consumers as multi-attributed bundles of characteristics (political parties: Budge et al., 1976, who compare results of different countries; cars: Green et al., 1969; Green, 1973). So it seems plausible to suppose that the difficulties are caused by formulations in the program.
Concerning the individual manageability of the aid we found some marked differences between several groups of subjects. For five important variables characterizing individuals, we tried to discover differences in ease of interaction with the program. The main differences (see Table 4) were found in education (of the highly educated 80% found it very easy, contrasted to 22% of the less educated) and age-group. Sex and technical experience turned out to be of little importance. Another clear relation was that between the time required and the ease of interaction: the faster the procedure went the easier the interaction, a relation that even held despite differences between number of attributes used, weighting procedure used and other variables influencing time. From these findings it is obvious that the present system does not yet exhibit individual manageability, at least not for the domains mentioned here.
Table 4. Percentages of Subjects Experiencing Easy Interaction with the Program

Classification of subjects        % of subjects classified as "very easy" in rating parties
Total sample                              31
(i)   By education
        low                               22
        medium-low                        30
        medium-high                       36
        high                              80
(ii)  By age-group
        18-24 years                       64
        25-34 years                       40
        35-49 years                       22
        ≥ 50 years                        23
(iii) By sex
        male                              31
        female                            31
(iv)  By technical experience
        a lot                             40
        average                           33
        little                            36
(v)   By time required
        5-10 minutes                     100
        10-15 minutes                     15
        15-20 minutes                     36
        20-25 minutes                     25
        ≥ 25 minutes                      12
F. Bronner and R. de Hoog
290
Perceived Applicability

Another problem is the dependency of the applicability of the program on the nature of the choice alternatives: some decision situations can be more suitable than others. As a first indication of the suitability of our program for aiding the evaluation of sets of alternatives from different situations, we asked if people considered each of the situations shown in Table 5 suitable for application of computerized decision aids. The existence of well-defined opinions on applicability (hardly any 'no opinions') and a lot of variation over the different choice alternatives surprised us.

Table 5. Perceived Applicability of Programs like Ours in Various Areas of Decision Making

Decision problem          % of subjects judging program as applicable
Job choice                        67
Washing machines                  67
Education                         62
Holiday destinations              46
Butter                            32
Lottery                           29
Family doctor                     18
Divorce                            8
Choosing a partner                 8
Getting children                   8
Our subjects thought these kinds of programs are applicable in the following decision situations:
- For decision problems at the 'intermediate level', so not for too trivial problems (like buying daily consumer goods) or too emotionally important decisions (like divorce, choosing a partner or getting children), but for problems like choosing a job, an education, consumer durables (cars, washing machines) or holiday destinations. The high applicability scores for job and education are in line with those of Aschenbrenner et al. (1980), who reported positive experiences with non-computerized decision aids for children's occupational choices.
Jungermann (1979) goes even further; according to his view, career and vocational counseling seem to him the only field of personal problems in which computerized technology would probably be worthwhile and appropriate; the situation is usually not overly emotionally stressful and the structure of the problem, for the most part, can be determined a priori.
- When one has to choose one alternative from a large number of alternatives (67% thought the program applicable here, whereas only 34% thought it applicable when only few alternatives were involved). Choice from a large number requires more cognitive effort and strain. Apparently, people would like to use such programs to reduce this strain.
- When a number of people with conflicting opinions are in a group decision making situation (64% thought the program applicable here). One can take individuals' preferences and opinions and provide the participants with information about differing points of view or mental models within the group (e.g., Johnson-Lenz et al., 1978; Turoff, 1980).
Of course, these statements represent opinions; further research is needed into the actual applicability of the aid in these situations.

Didactic Potential or Consciousness Raising

The results presented in the section about perceived applicability do not answer the question at which specific points computerized decision aids offer any help. Broadly speaking, help can be directed at two aspects:
- 'solving' the decision problem, i.e., giving a first preference advice,
- making the user of the aid more aware of his own decision processes.
Humphreys (1977) has called this the consciousness raising approach; Bauer and Wegener (1977) speak about the didactic potential of a decision aid. In which choice situations does the first aspect dominate, and in which situations does the second? To us this seems dependent upon the presence of an intuitive preference order. If such an order is completely absent it seems plausible that the main help people expect and get from a decision aid is prescription of choice. Afterwards one can determine the degree of satisfaction with the machine's advice, as expressed by the user. But when an intuitive preference order is available, the consciousness raising aspect seems to be more important. A distinction can be made between situations in which with a
decision aid the preference order can be easily reconstructed and in which there is a divergence between intuitive and prescribed preference order. In the last case (intuitive order, divergence), the basis for divergence can be discussed and the resulting change in convergence indexes the results of consciousness raising (Humphreys and McFadden, 1980). If a convergence between intuitive and prescribed preference order can be easily achieved, the focus is on helping people to think explicitly about their beliefs, value-wise importances and trade-offs. Especially in this case, the helpfulness of the computerized decision aid seems difficult to measure. If there is no intuitive preference order we can use the satisfaction expressed by the user; if there is an intuitive order and divergence we can use the change in divergence. But what can we use as a criterion for helpfulness if there is an intuitive order and easily achieved convergence? As shown earlier, in our experiment nearly all subjects were satisfied with the prescribed preference order and there was strong convergence between party advised and party preferred, so in our experiment we deal with the third case (intuitive preference order, easily achieved convergence between intuitive and prescribed order), the case in which no clear instruments for measuring aiding facilities are available. As a first step in developing an indicator of helpfulness of a decision aid in this kind of prestructured decision situation we explored the merits of scaling techniques. Our interest lies primarily in the possibility of constructing such an instrument and not so much in the absolute levels of consciousness raising found in this experiment. Statements about the helpfulness of the aid were formulated and the subjects were asked to rate each statement on a verbal scale from 'agree strongly' through a 'no opinion' midpoint to 'disagree strongly'. To discover whether some of the answers form a scalogram pattern the Mokken-scale technique was used (Mokken, 1971). This technique is based on a stochastic-response model resembling Guttman scaling. The scale criterion H does not exhibit the deficiencies of the well-known Rep-coefficient. Six statements formed a scale (H = 0.51, which means a strong scale according to Mokken); a computational sketch of this coefficient is given at the end of this subsection. We present the scale items according to popularity:
1. The program makes me realize I cannot get everything at once from a party (65% agrees);
2. Everyone should consult such a program before voting (55% agrees);
3. The program is an aid for me in clarifying what I want in politics (53% agrees);
4. The program shows my subconscious decision process (40% agrees);
5. The program facilitates my steadfastness against others if we discuss political convictions (30% agrees);
6. The program makes me more independent of others in judging political parties (30% agrees).
This continuum of consciousness raising has a logical hierarchical structure:
- The most frequently endorsed effect is to make people think explicitly about their utilities and trade-offs within an a priori defined problem structure. This can be called the structuring aspect.
- Next in the consciousness raising hierarchy is a growing realization of subconsciously used decision rules. We found the same in an earlier study in which we asked people if they were able to indicate whether choice rules verbalizing formal combination models resemble the way they usually reach their decisions (Bronner and de Hoog, 1981).
- At the top of the hierarchy is the facilitation of discussing political opinions with others. The use of the program forces everybody to think about utilities, trade-offs, weights, etc., which can facilitate discussion with others. A clear understanding of one's own opinions makes these more readily explicable and justifiable to others.
The first two levels of the hierarchy are known as 'structuring decision processes' and 'raising awareness'; the third is mostly neglected and can be labelled 'decision justification' (see also de Hoog and van Houten, 1980). Every subject could be located at a point on this continuum of consciousness raising. The maximum score is 6, the minimum score is 0 (from now on we use the abbreviation CR-scale). The average score in the total sample is 2.8. Operationalized in this way, we can search for differences in scale scores: which subjects have a high CR, in other words, for which subjects is the consciousness raising effect the strongest. In Table 6 we present some illustrations of differences in CR-scores.
Table 6. Mean Consciousness Raising Scores for Various Groups of Subjects (6-item scale: 0 = low consciousness raising; 6 = high consciousness raising)

Subject grouping                  Mean score
Total sample                          2.8
(i)   By education
        low                           4.3
        medium-low                    3.0
        medium-high                   2.3
        high                          1.5
(ii)  By age-group
        18-24 years                   1.7
        25-34 years                   1.2
        35-49 years                   3.7
        > 50 years                    4.0
(iii) By ease of interaction
        very easy                     2.1
        easy                          3.5
        not (so) easy                 3.6
A striking fact is that lower educated and elderly people had the highest CR. We noted above that it was these groups who frequently experienced problems with handling the program. This finding is also expressed in the relation between ease of interaction and CR: subjects with the largest amount of difficulties exhibited the strongest CR. An explanation could be that higher educated young people are a priori so conscious about their political choice process that the program could not offer any help at that point. This stresses the fact that if we want to know something about the absolute level of consciousness raising which can be achieved with this kind of aid, an experiment in which individuals are faced with real choice problems is necessary. Replication of the experiment reported here with an improved program limited to, for example, individuals who hesitate between three or more political parties one month before an election, can offer more insight into the didactic potential of our program.
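The scalability criterion used above, Loevinger's H as applied in Mokken scaling, can be computed directly from dichotomized responses: the Guttman errors observed for each item pair are compared with the number expected if the items were statistically independent. The sketch below shows this computation on an invented miniature response matrix; it is not the experiment's data and omits the refinements of Mokken's full procedure (e.g., item-level coefficients and significance tests).

```python
def loevinger_h(responses):
    """responses: one row per subject, one 0/1 column per item (same item order throughout)."""
    n = len(responses)
    k = len(responses[0])
    p = [sum(row[i] for row in responses) / n for i in range(k)]   # proportion agreeing per item
    obs = exp = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            easy, hard = (i, j) if p[i] >= p[j] else (j, i)        # more vs. less popular item
            obs += sum(1 for row in responses if row[hard] == 1 and row[easy] == 0)
            exp += n * p[hard] * (1 - p[easy])                     # errors expected under independence
    return 1.0 - obs / exp

# Example with 4 subjects and 3 items:
# loevinger_h([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]) -> 1.0 (a perfect Guttman pattern)
```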
Consciousness Raising in Relation to Perceived Applicability

It may be expected that people with a high CR-scale score are more enthusiastic about using programs like ours and, consequently, see more possibilities for applications. To investigate this relation we took the four items from Table 5 with the highest percentages of applicability: job, washing machines, education, holiday. People with a low CR have a mean applicability score of 41% for these four situations, those with a medium CR 62% and those with a high CR 77%. We can thus conclude that there is a relation between CR and perceived applicability. In Table 7 the relation for each of the four items is given.
Table 7. Perceived Applicability of the Four Most Applicable Decision Problems by Level of Consciousness Raising (CR)

Decision problem       Total sample applicability     Level of consciousness raising (CR)
                       score (see Table 5)            low        medium       high
Job                            67%                    39%*        82%         80%
Washing machines               67%                    54%         64%         80%
Education                      62%                    39%         64%         80%
Holiday destination            46%                    31%         36%         67%

* i.e., 39% of subjects with low CR indicate that programs like ours are suitably applicable to job choice.
The absolute level of applicability varies for different levels of CR. But the rank order of the four items also changes. Individuals with low CR found durables most apt for interactive decision aids; individuals with a medium or high CR assigned first place to job/education and durables (together). For the choice of durables our program seemed to have the broadest support.
Conclusions

This paper reports on the use of a computerized decision aid by non-professional decision makers. Some of the main conclusions are:
- Computer-based aids like ours can in principle be handled reasonably well by people without a specialist training in decision analysis. We found a high level of respondent interest, which seems to be the usual finding in studies of computer-interactive interviewing techniques (MacBride and Johnson, 1980; Johnson, 1979). Perhaps the combination of a new challenge and novelty makes these kinds of tasks more interesting for the respondent; perhaps electronic techniques will in the future also remain more stimulating for respondents than conventional approaches.
- The previous point does not imply that the program is flawless as a prototype. Some groups of people can handle the program better than others and concerning decision situations there is a varying level of applicability. The implications of these findings will be treated in the context of improvements.
- Non-professional decision makers have well-defined ideas about the application of interactive decision aids: problems at the 'intermediate' level (job, education, durables), choosing from a large number of alternatives, group decision making.
- By means of a new operationalization of the consciousness raising concept, we were able to show that at least for political party choice there is a logical hierarchical structure in the concept: structuring the choice process, subconscious process becomes conscious, own opinions are easier to defend and/or justify.
Future Possibilities
Improvements in the Program

The attribute elicitation procedure we used has pros and cons, as was shown in an earlier section: there were problems with attempts to specify double attributes and poles, but on the other hand everyone was able to use his own specific formulation of a general attribute. So we think we should keep the repertory grid procedure but will have to build in safeguards against mentioning aspects more than once and provide more definite procedures for adding or deleting attributes. Also, confrontation with a set of the most commonly used attributes can be a safeguard against forgetting important attributes. For the rating procedure our program currently uses bipolar attributes, which pose no problems for the 'easy handlers'. Unipolar attributes would perhaps be better for some individuals showing difficulties in handling the program, but this would produce problems with the scaling metric on the resulting attributes within an additive multi-attribute utility theory model, such as that used here. Finally, people using the computer expect the same system reliability as they obtain from a telephone, for example. This means that the program and the system must be made 'foolproof' through built-in error checks and flexibility of answering possibilities. Reliability of this kind enhances the acceptability of the aid.
Improvements in the Model
In the program we used, the combination process is assumed to be compensatory. That means that trade-offs among attributes are possible. In non-compensatory models a choice object is rejected because of one bad
evaluation of an attribute or, in the reverse case, accepted because of one good evaluation of an attribute. Confronted with a choice between compensatory and non-compensatory rules, our subjects were clearly segmented into different groups. Obviously, the preference model is not the same for all individuals and depends upon choice alternatives (Bronner and de Hoog, 1980) or decision style (Bronner, 1981). Advocates of compensatory rules say: "I like comparing pros and cons on important points for me", "the more nuances the better", "otherwise choice is too black and white". Supporters of non-compensatory rules say: "In reality I sometimes reject a party because of one point", "matters of principle can come forward", "it is easier, I do not have to rate all parties on all attributes". Earlier research has shown that for political parties compensatory rules are a plausible description of the choice process. But for other choice situations other rules can be plausible. The program might be improved by incorporating threshold values on each attribute, the use of which can be optional, dependent upon the choice alternatives.
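A minimal sketch of this suggested improvement is given below: an optional, non-compensatory screening pass rejects any alternative falling below a threshold on some attribute before the usual compensatory (weighted-additive) rule is applied to the survivors. The party names, attribute labels, weights and thresholds are illustrative assumptions only.

```python
def evaluate(alternatives, weights, thresholds=None):
    """Return weighted-additive scores, optionally after conjunctive screening."""
    survivors = {}
    for name, ratings in alternatives.items():
        if thresholds and any(ratings[a] < t for a, t in thresholds.items()):
            continue                      # non-compensatory: one bad value rejects the option
        survivors[name] = sum(w * ratings[a] for a, w in weights.items())
    return survivors

parties = {
    "Party A": {"left-right fit": 0.9, "coalition capacity": 0.2},
    "Party B": {"left-right fit": 0.6, "coalition capacity": 0.7},
}
weights = {"left-right fit": 0.7, "coalition capacity": 0.3}

# Compensatory only:            evaluate(parties, weights)
#   -> {'Party A': 0.69, 'Party B': 0.63} (rounded)
# With an optional threshold:   evaluate(parties, weights, {"coalition capacity": 0.4})
#   -> {'Party B': 0.63}; Party A is rejected on a single attribute.
```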
Interfacing with Knowledge Bases

As stated before, our subjects indicated that applications like choices between jobs, types of education or consumer durables seem to be very apt for programs like ours. In talks after the experimental session some subjects indicated that, in forming the basis for choice between these kinds of alternatives, retrieval of 'knowledge' would be desirable. Based on this information, utilities can be assigned. This 'knowledge' could be incorporated in the program itself (see, e.g., Shaw, 1979) or obtained by means of external information systems like consumer guides, viewdata systems or other data banks. The volume of data in computer-processible form will continue to grow, and the on-line availability of such data banks will vastly increase. This leads to greater opportunities for consulting them in decision situations. This can be a solution for the problem sketched by von Winterfeldt (1980) that most aids developed so far rely on empty structuring concepts without adding problem substance. Generalized structures with connection possibilities to problem-specific data banks might help here.
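The kind of coupling envisaged here might look roughly like the sketch below, where a hypothetical consumer-guide table supplies factual attribute levels which are then rescaled into ratings for the aid; the data, the attribute names and the linear rescaling are all illustrative assumptions, not a description of any existing system.

# Hypothetical external data bank, e.g. abstracted from a consumer guide.
CONSUMER_GUIDE = {
    "Computer X": {"price": 2400, "memory_kb": 64},
    "Computer Y": {"price": 1800, "memory_kb": 48},
}

def rescale(level, worst, best):
    # Map a factual attribute level onto a 0-100 rating, assuming (for
    # illustration only) that desirability is linear between the two anchors.
    return 100 * (level - worst) / (best - worst)

# Price: lower is better, so the anchors are reversed.
for name, facts in CONSUMER_GUIDE.items():
    print(name, round(rescale(facts["price"], worst=3000, best=1500)))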
'Peopleware'

Finally, we note that attention in computer use has for a long time been concentrated on hardware and software. However, we agree with Kochen (1978, p. 22) that the "peopleware" of information systems will continue to increase in its relative importance.

References

Aschenbrenner, K. M., D. Jaus, and C. Villani, 1980. Hierarchical goal structuring and pupils' job choices: Testing a decision aid in the field. Acta Psychologica, 46, 35-49.
Bauer, V. and M. Wegener, 1977. Applications of multi-attribute utility theory: Comments. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: Reidel, 209-214.
Beach, L. R., B. D. Townes, F. L. Campbell, and G. W. Keating, 1976. Developing and testing a decision aid for birth planning decisions. Organizational Behavior and Human Performance, 15, 99-116.
Bronner, A. E. Decision styles in transport mode choice. Journal of Economic Psychology (forthcoming).
Bronner, A. E. and R. de Hoog, 1981. Party choice in the Netherlands. In: L. Sjoberg, T. Tyszka, and J. A. Wise (eds.), Decision Analysis and Decision Processes. Lund: Doxa.
Budge, I., I. Crewe, and D. Farlie, 1976. Party Identification and Beyond: Representations of Voting and Party Competition. London: Wiley & Sons.
de Hoog, R. and H. J. van Houten, 1980. Keuzeagogie per computer. Tijdschrift voor Agologie, 9, 5, 381-404.
Green, P. E., A. Maheshwari, and V. R. Rao, 1969. Dimensional interpretation and configuration invariance in multidimensional scaling: An empirical study. Multivariate Behavioral Research, 159-180.
Green, P. E., 1973. Generalized approaches to product-features mapping. Internal Paper. University of Pennsylvania.
Humphreys, P. C. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-69.
Humphreys, P. C. and A. Wisudha, 1979. MAUD: An interactive computer program for the structuring, decomposition and recomposition of preferences between multi-attributed alternatives. Decision Analysis Unit, Technical Report 79-2. Uxbridge, Middx.: Brunel University.
Humphreys, P. C. and S. Wooler, 1981. Development of MAUD4. Decision Analysis Unit, Technical Report 81-4. Uxbridge, Middx.: Brunel University.
John, R. S., D. von Winterfeldt, and W. Edwards, 1983. The quality and user acceptance of multiattribute utility analysis performed by computer and analyst. In this volume, 301-319.
Johnson, R. M., 1979. Advances in computer interactive interviewing techniques. Paper delivered at the ESOMAR Conference.
Johnson-Lenz, P., T. Johnson-Lenz, and J. M. Scher, 1978. How groups can make decisions and solve problems through computerized conferencing. Bulletin of the American Society for Information Science, 4, 5, 15-17.
Jungermann, H., 1979. "Decisionetics": The art of helping people to make personal decisions. Paper presented at the 7th Research Conference on Subjective Probability, Utility and Decision Making, Göteborg.
Kochen, M., 1978. Long-term implications of electronic information exchanges for information science. Bulletin of the American Society for Information Science, 4, 5, 22-23.
MacBride, J. N. and R. M. Johnson, 1980. Respondent reaction to computer-interactive interviewing techniques. Proceedings of the 33rd ESOMAR Congress, Monte Carlo, 39-54.
Mokken, R. J., 1971. A theory and procedure of scale analysis with applications in political research. 's-Gravenhage: Mouton.
Pearl, J., A. Leal, and J. Saleh, 1980. GODDESS: A goal directed decision structuring system. Technical Report UCLA-ENG-CSL-803U. University of California, Los Angeles.
Rugg, T. and P. Feldman, 1980. 32 Basic Programs for the TRS-80 (Level II) Computer. Portland, Oregon: Dilithium Press.
Scott Morton, M. and M. Huff, 1980. The impact of computers on planning and decision making. In: H. T. Smith and T. R. Green (eds.), Human Interaction with Computers. New York: Academic Press.
Shaw, M. L. G., 1979. On Becoming a Personal Scientist. London: Academic Press.
Shneiderman, B., 1980. Software Psychology. New York: Winthrop.
Turoff, M., 1980. Management issues in human communication via computer. Paper prepared for the Stanford Conference on Office Automation.
von Winterfeldt, D., 1980. Structuring decision problems for decision analysis. Acta Psychologica, 45, 71-95.
THE QUALITY AND USER ACCEPTANCE OF MULTIATTRIBUTE UTILITY ANALYSIS PERFORMED BY COMPUTER AND ANALYST

Richard S. JOHN, Detlof von WINTERFELDT, and Ward EDWARDS

Social Science Research Institute, University of Southern California, Los Angeles, California, U.S.A.
Abstract

Two potential problems challenge the computerization of decision analysis. First, to what extent can the often ill-defined art of structuring be transformed into software? Second, to what extent is past consumers' satisfaction with decision analysis a function of the formal methods, procedures, and rationale of decision theory, and to what degree do other factors, such as personal interaction and the establishment of rapport, account for client approval? We compared multiattribute utility analyses of personal decision problems of undergraduates performed by a human analyst vs. those performed by a stand-alone software package, Multi-Attribute Utility Decomposition (MAUD3). Although subjects favored the analyst session over the MAUD3 session, agreement with and acceptance of the analyst and MAUD3 results (implied ordering and most preferred alternative) did not differ. We did find that subjects feel better taken care of when more attributes are included in the analysis, but that subjects' holistic ratings are better accounted for by analyses with smaller rather than larger numbers of attributes. Although the analyst attribute sets were more often judged more nearly complete and better in overall quality, the MAUD3 attribute sets were more often judged more nearly independent, both logically and valuewise. Furthermore, the attribute set of each pair with the greater number of dimensions was overwhelmingly chosen as being more complete, less independent, and of higher quality than the other attribute set. Judgments of overall quality were virtually identical to those of completeness.
This research was supported by the Advanced Research Projects Agency of the Department of Defense and was monitored by the Office of Naval Research, prime contract MDS903-80-C-0194, under subcontract from Decisions and Designs, Inc. Special thanks go to Peggy Giffin, Greg Griffin, J. Robert Newman, and William G. Stillwell, who served as analysts and who made valuable contributions to the conceptual continuity of this research. We would also like to thank Patrick Humphreys and Larry Phillips for giving us access to MAUD3. Grateful acknowledgement is given to two anonymous reviewers for helpful comments on an earlier draft of this paper.
Introduction

Computerized decision aids have become an indispensable tool in decision analysis. The majority of the early aids were designed to perform such functions as data storage, information display, and sensitivity analysis. For example, Decisions and Designs, Inc. developed several aids for analyzing decision structures involving multiattribute alternatives (EVAL), uncertainty (DECISION), or both (OPINT). Such aids typically require the services of a decision analyst, or a team of analysts, including one who directs the execution of the program. The primary emphasis of these decision aids is on augmenting the efficiency and power of the analyst, not on the direct automation of critical analyst functions such as option invention, problem structuring, or parameter elicitation.
The purpose of the present paper is to focus on two problems in the further computerization of decision analysis. First, most of what goes on in decision analysis, especially during the early phases of option generation and problem structuring, is more accurately described as "art" than as "science". To what extent can this often ill-defined art be translated into software? Secondly, past consumers of decision analysis have expressed both satisfaction with the process and acceptance of the conclusions of analyses. To what extent are this satisfaction and acceptance mediated by factors such as personal interaction and the establishment of rapport?
Elsewhere in this volume, Pitz describes a number of algorithms for option generation and problem structuring. Most of these require that certain parts of the problem be given a priori; other parts are then derived. For example, Humphreys and Wisudha (1980) implemented a procedure for deriving attributes from a prespecified list of options in a computer program called "Multi-Attribute Utility Decomposition" (MAUD). Pearl, Leal, and Saleh (1980) developed the Goal-Directed DEcision Structuring System (GODDESS), which utilizes the opposite method: alternatives are generated to achieve subsets of predetermined goals and objectives, while remaining unconstrained with respect to the remaining goal structure. The basic idea of fleshing out a structure by building on initial skeletal elements and using decision analytic relations seems promising. So far, this idea has mainly been used to formalize the derivation of options from goals, and the derivation of attributes from options. It is conceivable that this approach could be extended to hypothesis generation and act-event structuring as well.
There are relatively few data from controlled experiments on our second problem relevant to the computerization of decision analysis: user
satisfaction with the process of decision analysis and acceptance of recommended courses of action. Fischhoff (1981) speculates on sources of resistance to personal decision analysis:
"People who need decision analysis may reject it (1) because they are personally threatened by having to face and acknowledge their own doubts and desires, (2) because they wish to avoid decision analysis' full public disclosure requirement, (3) because they feel uncomfortable and incompetent to deal with probabilities and multi-attribute certainty equivalents, (4) because they are afraid to innovate."
Presumably, the anonymity of computerized decision aids could prove to be a major benefit with regard to the first two concerns voiced by Fischhoff. The absence of another person of whom to ask questions and from whom to seek reassurance could prove to be a disadvantage with respect to his last two concerns. Presently we have no good data, only speculation.
On the issue of user acceptance of the "best alternative" suggested by a decision analysis, Fischhoff (1981) states:
"Once a decision analysis has been performed, its bottom-line recommendations may be rejected because they are viewed as the output of numerical mumbo-jumbo which has no intuitive appeal and cannot be readily justified to superiors, subordinates, constituents, etc."
Whether a computer or human analyst is viewed as more helpful will certainly vary with the situation. In a personal decision context, much would depend upon the decision maker's personal knowledge of the aid (either analyst or computer) gained through past experience.
Christen and Samet (1980) report some provocative data on the issue of decision maker acceptance of recommendations arrived at by OPINT, Decisions and Designs, Inc.'s computer aid for the rapid screening of decision options. In their laboratory evaluation, experienced naval intelligence analysts were presented with a background scenario and intelligence summaries and required to diagnose enemy military plans, either with the assistance of OPINT or without it. A set of "correct" diagnoses was determined independently for each stimulus intelligence report. OPINT's recommendations outperformed the unaided officers, yielding about 33% more correct decisions. But the aided officers frequently disagreed with OPINT, leading to essentially equal performance between aided and unaided officers. Apparently the lack of confidence in the aid produced a substantial decrement in the officers'
performance. Since OPINT requires the services of a decision analyst, many questions remain unanswered. In particular, would a similar decision aid administered by an analyst without a computer have produced more or less rebellion from the naval officers? Would a user-oriented, stand-alone version of OPINT have instilled more or less confidence?
An Exploratory Experiment

We sought to compare multi-attribute utility analyses performed by an analyst and by a computer program. We selected MAUD3, an interactive MAUA program "designed to work in direct interaction with the decision maker, without a decision analyst, counselor, or other 'expert' as intermediary" (Humphreys and McFadden, 1980). Each college subject interacted with both an analyst and MAUD3 at different times. Although our paradigm poses many problems of experimental control, the repeated-observations (within-subjects) design offers the most sensitive test of differences between the MAUD3 and analyst interactions, especially with regard to problem structuring.
The experiment afforded four critical comparisons of the analyst and computer sessions, related to differences in: (1) final recommendations, (2) correspondence of final recommendations with intuition (holistic assessments), (3) the number and quality of attributes, and (4) stated satisfaction with the process and confidence in the results. The first two comparisons correspond to convergence among model outputs and holistic preference ratings. Convergence is an admittedly difficult criterion to interpret in this context (see Fischer, 1975, 1976, 1977, 1979); however, we use it only as an indicator of (non)sensitivity, without implying any value judgment regarding the relationship between convergence and the efficacy of the analysis.
All decision problems were multi-attribute evaluation problems generated individually by our subjects. Although Pitz and his colleagues have performed several value experiments with college students demonstrating the viability of hypothetical scenarios (e.g., roommate difficulties, Pitz, Sachs, and Heerboth, 1980; apartment choice, Pitz, Heerboth, and Sachs, 1980; and vacation plans, Pitz, Sachs, and Brown, 1980), we elected to elicit personal problems that were currently important for each individual subject. (Each of the three examples above was proposed by at least one of our subjects.) We felt that questions of user satisfaction and confidence
could only be addressed in a real context tapping personally relevant values.¹ Since MAUD3 does not provide any mechanisms for option generation, all options were generated prior to either decision analytic session. So as to increase our ability to detect differences in the recommendations of the two analyses and their correspondence with intuition, only feasible, highly viable (non-dominated) alternatives were allowed. Finally, in order to maintain maximum sensitivity to differences in model recommendations, we sought to reduce random judgmental errors by requiring that the subject possess a maximal level of knowledge of the proposed choice alternatives.
Method

Design Overview

Thirty-five college students underwent two versions of multi-attribute utility analysis in two experimental sessions, each lasting from 1 to 3 hours. The complete protocol of the experiment is presented chronologically in Table 1. All subjects identified an evaluation problem at the beginning of the session that (1) was personally important and relevant, (2) involved four or more viable alternatives, and (3) required information that was readily accessible. Twenty-four subjects interacted with the computer program (MAUD3) during the first session and with one of five human analysts during the second session; the remaining eleven subjects first interacted with a human analyst, and then with MAUD3. Before and after each MAUA interaction, subjects provided judgments, including (1) "holistic ratings" of the choice alternatives, (2) rankings of different vectors of alternative ratings, and (3) self-report measures of the usefulness of the MAUA technique used. Following all experimental sessions, each subject's pair of attribute sets (MAUD3 and analyst) was presented (blindly) to three of the five
¹ All too often, value experiments utilize hypothetical scenarios that more resemble problem-solving tasks, inviting the subjects to play a numbers game in which consistency is the winning move. Consequently, many experiments that employ decision analytic value models in assessing a role-played preference structure never come close to any evaluative or affective construct, so necessary to the usual notion of value.
Table 1. Experiment Protocol

Session 1
  Induction interview: problem specification, listing of alternatives, and subject screening
  Pre-MAUA holistic ratings of alternatives* (H1)
  Interaction with MAUD3 or analyst: multiattribute values (A1 and E1) are derived from assessed weights and equal weights, respectively
  Post-MAUA holistic ratings of alternatives* (H2)
  Self-report ratings of the interaction**
  Ranking of Session 1 alternative rating sets (H1, H2, and A1; also E1 for MAUD3 session)***

Session 2 (approximately one week later)
  Pre-MAUA holistic ratings of alternatives* (H3)
  Interaction with MAUD3 or analyst: multiattribute values (A2 and E2) are derived from assessed weights and equal weights, respectively
  Post-MAUA holistic ratings of alternatives* (H4)
  Self-report ratings of the interaction**
  Ranking of Session 2 alternative rating sets (H3, H4, A2; also E2 for MAUD3 session)***
  Forced choice between most preferred set of alternative ratings from Session 1 and from Session 2
  Forced choice between assessed-weight model composites from Session 1 (A1) and from Session 2 (A2)
  Ordinal judgment of superiority between MAUD3 and the analyst on the self-report items
  Debriefing: discussion of MAUD3 and analyst procedures and of discrepancies among holistic ratings and MAUA recommendations
* Each subject listed his/her choice alternatives from most preferred (assigned an anchor of 100) to least preferred (anchored at 0). Subjects were told that an alternative (X) should be rated 50 if the increment in desirability from the worst alternative to X was equivalent to the increment in desirability from X to the best alternative.
** Each subject rated (from 1 to 10) the degree to which he/she: (1) had discovered new aspects of the problem via the MAUA interaction; (2) felt comfortable during the interaction; (3) thought the MAUA had helped to solve the problem; (4) trusted the MAUA to recommend the "best" alternative; and (5) would desire to use the particular MAUA technique for future decision problems.
*** Model composite evaluations derived from assessed weights (and equal weights for MAUD3 sessions) were normalized to the same 0-100 scale as the holistic ratings by subtracting the lowest rating from all ratings, dividing by the difference between the lowest and the highest rated alternative, and multiplying by 100. Each subject rank ordered the three sets of alternative ratings (four sets for MAUD3 sessions) in terms of his/her agreement with the ratings (and implied orderings). The source of the rating sets was not explicitly identified.
analysts, along with a generic description of the corresponding choice alternatives (e.g., "college majors"). Analysts made quantitative judgments concerning the completeness, logical independence, and value independence of the attribute sets, as well as their "overall" or "global" quality.
Subjects

Sixty-seven college students (31 females, 36 males) enrolled in an introductory psychology course at the University of Southern California were interviewed. Of these, 35 (52%) were able to identify a multiattribute evaluation problem that met the requirements of personal relevance and accessible information and included at least four viable alternatives. These 35 students (22 females, 13 males) served as subjects; the rest were dismissed. All 67 students received "credits" toward a course requirement proportional to the number of hours of participation.
Problem Specification

One male experimenter conducted all 67 induction interviews in a private office, each lasting from approximately 15 minutes to 1½ hours. Each interview began with the subject reading a brief description of the experiment, outlining the purpose of the two experimental sessions. Subjects were told that the first step would be to identify a decision problem that was personally important and currently relevant. Hypothetical choice situations, decisions that had already been made and acted upon, and problems whose outcomes had no clear, direct effect on the subject were rejected. The experimenter stressed that the problem should involve options with distinctly positive and negative aspects. In particular, a proposed alternative to a decision problem was rejected if the subject admitted not really knowing very much about the alternative, or if the subject felt that the alternative, although a possible course of action, was not something he/she could envision ever really doing. The final product of the induction interview, for the 35 subjects who developed choice dilemmas meeting the experimental criteria, was a list of at least four and not more than eight well-defined alternative courses of action. Problems included choosing among majors (at USC) (11), colleges to which to transfer (9), places to live (in the Los Angeles area) (6), careers (4), travel plans (2), automobiles (1), sports activities (1), and strategies for handling a roommate difficulty (1).
Most of the 32 rejected subjects identified a choice dilemma decomposable as an MAU evaluation problem, but one that failed to meet one or more experimental requirements. In particular, many male students were reluctant to consider as many as four viable alternatives, insisting that they had "narrowed" problems down to only two (usually) or three alternatives. As a result, the rejection rate for males (64%) was significantly higher than that for females (29%), who often indicated little or no pre-screening of alternatives.
MAUD3 Sessions

All subjects were given an introduction familiarizing them with the IBM 5110 minicomputer (and keyboard), the cathode ray tube (CRT) monitor, and the IBM 5103 printer. Subjects were told to report back to the experimenter when the question "Do you want to investigate your preferences?" appeared.
Elicitation of attributes and single-dimension values. Details of the MAUD3 assessment can be found in Humphreys and McFadden (1980) and Humphreys and Wisudha (1980). Briefly, MAUD3 begins by recursively eliciting attributes, assessing single-dimension value functions, and checking for correlations between pairs of value dimensions. Endpoint descriptions of attributes are determined by asking how triads of alternatives differ, or by asking for the endpoints directly. Single-attribute value functions are assessed by placing each alternative on a nine-point rating scale (anchored at the elicited endpoints), determining an "ideal point" along the nine-point range, and normalizing under an assumption of piece-wise linearity. When significant correlations between these normalized value functions are detected, the subject is given an opportunity to combine the two dimensions under a single heading; otherwise they remain in the analysis as separate attributes. After the addition of each attribute (beyond the first three), MAUD3 allows the subject to review the attribute descriptions, single-attribute value ratings and ideal points, and normalized single-attribute values.
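The normalization step is not spelled out above; one single-peaked, piecewise-linear form consistent with that description is sketched below, as a reconstruction rather than MAUD3's actual code.

def single_attribute_value(rating, ideal, low=1, high=9):
    # Piecewise-linear value on [0, 1], peaking at the elicited ideal point;
    # if the ideal point sits at an endpoint the function is simply monotone.
    if rating == ideal:
        return 1.0
    if rating < ideal:
        return (rating - low) / (ideal - low)
    return (high - rating) / (high - ideal)

# An alternative rated 7 on a nine-point scale whose ideal point is 5:
print(single_attribute_value(7, ideal=5))   # -> 0.5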
Assessment of weights. MAUD3 assesses scaling parameters (attribute weights) under an assumption of additive utility independence, using a version of the basic reference lottery tickets (brlts) procedure² (for details see Humphreys and Wisudha, 1980; for more on brlts, see Keeney and Raiffa, 1976). For n attributes, MAUD3 presents n-1 brlts questions, each consisting of a choice between a "moderate" sure thing (an alternative best on one dimension and worst on the other) and a gamble in which an "excellent" outcome (an alternative best on both dimensions) results with probability p, and a "poor" outcome (an alternative worst on both dimensions) results with probability 1-p. The two dimensions chosen for each brlts question are determined by the correlational structure of the normalized single-attribute values and by earlier brlts judgments. The algorithm is designed to include every attribute in at least one brlts question, and to include more important attributes in more brlts questions than less important attributes. An attempt is made to select early attribute pairs that are positively correlated, thus creating easily imagined alternatives in the gamble, but not in the sure thing. Later attribute pairs, more critical to the weight assessment, are selected so as to bear as little statistical association as possible.
Observation of pilot subjects indicated that most subjects had trouble understanding the brlts question; in particular, subjects often became confused and frustrated at trying to keep so many pieces of seemingly unrelated information in mind at once. Further observation of pilot subjects revealed a deeper problem inherent in the brlts question: most subjects always switched their preference to the sure thing for values of p greater than 0.5 (usually 0.7 or 0.8), regardless of the attribute pair. This response pattern is exactly consistent with multiattribute risk aversion, suggesting a non-additive model form (see von Winterfeldt, 1980). Since MAUD3 assumes an additive aggregation rule, we proceeded as though the response pattern was artifactual. An intervention was employed, therefore, to explain the structure of the brlts questions and to link them to notions of attribute importance. The experimenter explained that each brlts involved only two attributes, that the sure thing was always best on one attribute and worst on the other, and that the gamble always resulted in an alternative either best on both dimensions or worst on both dimensions. In addition, he suggested that the probability at which the subject was just indifferent between the gamble and the sure thing indicated the attractiveness of the sure thing in relation to the two outcomes of the gamble. If the sure thing alternative is almost as good as the "excellent" outcome of the gamble, the subject was advised to accept little risk and switch at a value of p greater than 0.5. Contrarily, if the sure thing alternative is judged to be little better than the "poor" outcome, the subject was advised to take a bigger chance and switch at a value of p less than 0.5. The experimenter explained that the standing of the sure thing alternative with respect to the gamble outcomes
should be related to the relative importance of the two varying attributes, and that a brlts with equally important attributes should result in indifference between the gamble and the sure thing at p = 0.5. He also emphasized that careful notice of which attribute was best and which was worst for the sure thing was imperative to understanding the question.

² MAUD4 provides a "riskless" procedure for weight assessment based on difference measurement. The weight on a given attribute is determined by a holistic assessment of the difference between an alternative best on that attribute and an alternative worst on that attribute, both of which are equal on all other attributes.
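Under the additive model MAUD3 assumes, each indifference probability pins down a ratio of weights: with single-attribute values scaled from worst (0) to best (1), the sure thing is worth w_i and the gamble is worth p(w_i + w_j), so indifference at probability p implies w_i/w_j = p/(1 - p). The sketch below turns a chain of such judgments into normalized weights; the attribute names and probabilities are invented, and the real program chooses attribute pairs by the more elaborate rules described above.

def weights_from_brlts(judgments):
    # judgments: list of (i, j, p): the respondent is indifferent, at probability p,
    # between 'best on i / worst on j' for sure and a gamble giving 'best on both'
    # with probability p and 'worst on both' otherwise.  Under additive utility
    # independence this implies w_i / w_j = p / (1 - p).  A connected chain of
    # n - 1 judgments fixes the n weights up to a common scale factor.
    weights = {}
    for i, j, p in judgments:
        ratio = p / (1 - p)                      # w_i / w_j
        if not weights:
            weights[j] = 1.0
        if j in weights and i not in weights:
            weights[i] = ratio * weights[j]
        elif i in weights and j not in weights:
            weights[j] = weights[i] / ratio
        # a full implementation would also reconcile redundant judgments
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

# Hypothetical example with three attributes:
print(weights_from_brlts([("salary", "location", 0.6),
                          ("location", "variety", 0.5)]))
# -> salary : location : variety = 1.5 : 1 : 1, normalized to sum to one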
Analyst Sessions

Five different analysts were utilized, including two research fellows, one seventh-year graduate student, and two first-year graduate students. None of the analysts had more than cursory experience with applying MAUA to personal decision problems, and the two first-year students had learned of MAU ideas only a few weeks before their involvement in the study.
After obtaining pre-analyst ratings, the experimenter introduced the subject to his/her analyst. All analyst MAUA sessions were carried out in the private office of the analyst, with no intervention from the experimenter of any kind. All sessions were similar to Edwards' (1972, 1977) Simple Multi-Attribute Rating Technique (SMART). Like MAUD3, the analysts determined a list of relevant attributes, elicited single-attribute values for each alternative, and assessed scaling parameters (weights). Although SMART does not prescribe any specific procedure for determining relevant dimensions, retrospective discussions with the analysts indicated that all had used one or more of the following methods: (1) suggesting a particular attribute; (2) asking the "MAUD3-like" question "How do these alternatives differ?"; (3) asking "How is alternative X attractive?"; (4) asking the subject to find one aspect on which each and every alternative is attractive; (5) asking "What attributes do you want to consider?" directly; and (6) asking "What factors are relevant to the decision?". A distinction is made between the last two procedures since some analysts allowed the subject to include any attribute that he/she wanted, whereas other analysts stressed the requirement of relevance, thereby screening out "unimportant" attributes or attributes with little variability.
Single-attribute values were elicited via a 100-point rating scale. Four of the analysts anchored each single-attribute value scale with the worst and best alternatives on that attribute assigned 0 and 100, respectively. One analyst told subjects to think of 0 and 100 as corresponding to hypothetical alternatives (not necessarily under consideration) that are "very bad" and "very good", respectively, on the particular dimension.
Additive value independence among attributes is always assumed within the SMART procedure; thus, an additive aggregation rule is used
and the scaling parameters can be thought of as "importance weights". Subjects first rank ordered the attributes in terms of "importance". The lowest ranked attribute was assigned a weight of 10, and the subject made subjective estimates of the "importance" of the other attributes relative to this lower anchor. Some analysts specifically called the subject's attention to the problem of attribute ranges, explaining that an attribute with a restricted range among the alternatives at hand should receive less weight than might be the case if the range were larger. One analyst explained the concept of attribute importance in terms of "how much one would like to step from the worst available level of the attribute to the best available level."
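In numbers, the weighting procedure just described might run as in the sketch below; the attribute names and raw importance judgments are invented, and the final normalization to weights summing to one is the usual SMART step rather than anything reported for these particular sessions.

def smart_weights(raw_importance):
    # raw_importance: ratio judgments, with the least important attribute
    # anchored at 10 as in the elicitation described above.
    total = sum(raw_importance.values())
    return {a: v / total for a, v in raw_importance.items()}

# 'commute' is ranked least important and anchored at 10; the others are
# judged relative to it (e.g. 'salary' is felt to matter four times as much).
print(smart_weights({"commute": 10, "campus life": 25, "salary": 40}))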
Analyst Evaluations of Attributes

Three of the five analysts (two research fellows and one first-year graduate student) evaluated all 70 attribute sets (35 subjects × 2 analyses) in terms of (1) completeness, (2) logical independence, (3) value independence, and (4) "overall global quality". Each subject's pair of attribute sets was presented along with a generic name for the four or more alternatives evaluated. The experimenter abstracted attribute names from the endpoint labels for MAUD3 attributes. All analyst judgments were collected blind: analysts did not know which attribute set resulted from the MAUD3 session and which from the analyst session, nor did they know the subject/analyst source of individual attribute sets.
All analysts were given written instructions defining completeness, logical independence, and value independence. No explanation was given for "overall global quality". In addition, the experimenter met with the analysts in a group to discuss the definitions via several examples and to answer questions. The three analysts made their judgments independently over a period of several days following the meeting.
For each subject's pair of attribute sets (MAUD3 and analyst elicited), each analyst made an ordinal judgment as to which more nearly captured the relevant aspects of the generic evaluation problem (completeness), and assigned a number reflecting the ratio of the number of aspects covered by the more complete attribute set to the number covered by the less complete set. Logical independence and value independence were judged on 100-point rating scales, each anchored by the attribute set of the 70 judged least independent (assigned a 0) and the attribute set of the 70 judged most independent (assigned a 99). An attribute set is considered to be logically independent if the attribute labels do not mean the same things semantically. An attribute set is value independent if the value of an alternative on one attribute is not influenced by the alternative's value on
another attribute, for all pairs of attributes. Overall judgments of global quality were also made on a 100-point rating scale, anchored by the "worst" attribute set of the 70 (assigned a 0) and the "best" attribute set of the 70 (assigned a 99).
We present three kinds of data analyses. In the first we examined the convergence of the multiattribute models from the MAUD3 and analyst sessions and the agreement between the models and subjects' holistic judgments. Measures of convergence between models relate to the sensitivity of results to variations in assessment procedures for dimensions, single-attribute values, and weights. Convergence (or non-convergence) between models and holistic ratings gauges the extent to which model results correspond to intuition. Note that although high convergence between models and ratings might correspond to a greater tendency to accept (confirming) final recommendations, it might also lead to a total rejection of the (seemingly uninformative and repetitive) decomposed analysis. The second analysis compared the user satisfaction and acceptance ratings of the MAUD3 and analyst sessions. As was pointed out earlier, a decision aid (computerized or otherwise) that does not appear helpful and that does not promote acceptance of final recommendations will be of questionable use. In the third analysis, we compared the size and quality of the attribute sets generated by MAUD3 and the analysts. It was expected that this comparison would provide a critical examination of the formalized attribute elicitation method utilized by MAUD3.

Convergence
To study convergence we calculated each subject's multiattribute utilities using single-attribute value ratings from the MAUD3 and analyst sessions, coupled with either assessed weights or equal weights. The median Pearson product-moment correlation between the multiattribute utilities of MAUD3 and the analysts (using assessed weights) was 0.63. For 54% of the subjects the analyst and MAUD3 assigned the highest utility to the same option. Using equal weights for both the analyst and MAUD3 increases this convergence somewhat (median product-moment correlation of 0.71, with 65% matching the highest utility option). Another convergence measure was the correlation between subjects' holistic ratings of the options and the multiattribute utilities calculated
from the models. Table 2 shows the median Pearson correlations (ranging from 0.5 to 0.88), conditionalized on whether subjects first interacted with MAUD3 (top half) or the analyst (bottom half). Although the differences are obviously small, three minor trends are suggested. First, assessed weights had higher correlations with holistic ratings than did equal weights in 13 out of the 16 possible comparisons. Second, all 8 sets of holistic ratings appear somewhat more consistent with the model that directly preceded or followed the rating. Third, holistic ratings tended to drift towards closer agreement with the multiattribute utilities as the sessions progressed. In six of the eight cases, multiattribute models correlated more highly with the final holistic ratings than with any of the remaining three holistic rating sets.

Table 2. Median Pearson Correlations between Holistic Ratings and MAUA Values
MAUD3 first (N = 24)

MAUA Values   Weights     Pre-MAUD3   Post-MAUD3   Pre-analyst   Post-analyst
MAUD3         Assessed      0.55         0.71         0.63          0.61
MAUD3         Equal         0.67         0.63         0.59          0.77
Analyst       Assessed      0.55         0.63         0.67          0.77
Analyst       Equal         0.58         0.58         0.56          0.69

Analyst first (N = 11)

MAUA Values   Weights     Pre-analyst  Post-analyst  Pre-MAUD3     Post-MAUD3
(only a few of the entries in this half of the table are legible in the source: 0.80, 0.69, 0.88, 0.74, 0.79)
A final measure of convergence was the subjects’ blind rankings of the sets of ratings produced by holistic judgments and the models. Eight of the subjects indicated more agreement with the analyst’s utilities than with the holistic ratings generated either before or after the analyst session. The same was true of only four of the subjects in the MAUD3 sessions. However, when forced to choose between MAUD3 utilities and analyst utilities (with assessed weights), only a slim majority (58%) indicated more agreement with the analyst’s results.
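For readers who want the computations spelled out, the convergence measures used above reduce to additive composites and Pearson correlations, as in the following sketch; the alternatives, single-attribute values, weights and holistic ratings are made up for illustration.

from statistics import correlation   # Pearson r; Python 3.10+

def mau(values, weights):
    # Additive multiattribute utility of one alternative.
    return sum(weights[a] * v for a, v in values.items())

# Hypothetical single-attribute values (0-1) and 0-100 holistic ratings.
alternatives = {"A": {"cost": 1.0, "quality": 0.2},
                "B": {"cost": 0.6, "quality": 0.7},
                "C": {"cost": 0.3, "quality": 1.0},
                "D": {"cost": 0.0, "quality": 0.5}}
holistic = {"A": 80, "B": 70, "C": 55, "D": 20}
assessed = {"cost": 0.7, "quality": 0.3}
equal = {"cost": 0.5, "quality": 0.5}

names = list(alternatives)
for label, w in (("assessed", assessed), ("equal", equal)):
    utilities = [mau(alternatives[n], w) for n in names]
    print(label, round(correlation(utilities, [holistic[n] for n in names]), 2))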
User Satisfaction and Acceptance of MAUA

Next we analyzed subjects' expressed satisfaction with, and acceptance of, the MAUD3 vs. analyst sessions. The numbers of subjects rating MAUD3 higher than the analyst, and vice versa, are displayed in Table 3, separately for males and females.

Table 3. Subjects' Impressions of MAUA Sessions

                                          Male (N = 13)      Female (N = 22)
Question                                  M > A    A > M     M > A    A > M
Brought out new aspects?                    5        5         4       14
Felt comfortable?                           1        6         5       10
Helped to solve problem?                    4        5         4       15
Would trust to find best alternative?       2        9         4       11
Would use again?                            6        2         3       14
Note: The number of subjects rating MAUD3 higher than the analyst (M > A) and the number rating the analyst higher than MAUD3 (A > M) sum to less than the sample size, since some subjects assigned equal ratings.
Females indicated a desire to use the analyst rather than MAUD3 in future decisions, and confidence that the analyst rather than MAUD3 recommended the best option. Furthermore, more females found the analyst interaction to be more helpful, more comfortable, and more effective in discovering new aspects of the problem than MAUD3. Contrarily, males were split roughly evenly on the issue of whether MAUD3 or the analyst brought out more new aspects of the problem or "helped" to solve the problem. In addition, more males preferred to use MAUD3 rather than an analyst for future decisions, despite agreeing with the females in rating the analyst interactions as more comfortable and expressing confidence that the analyst recommendation is more likely to be the "best" option. Overall, males and females rated both MAUD3 and analyst sessions quite high with respect to all five self-report questions.
Quality and Size of Attribute Sets

Data on the relative size and quality of the MAUD3 and analyst elicited attribute sets are presented in Table 4. The counts for completeness, independence, and quality were generated as follows. For each subject and each criterion, the ratings were scored as favoring MAUD3 if all three raters gave the MAUD3 attributes a higher rating, or if two gave a higher rating and one rated MAUD3 and the analyst the same. The ratings were counted as favoring the analyst if the consensus was in the analyst's favor. The middle column shows the number of cases in which no such decision could be made and therefore indicates the amount of rater disagreement.

Table 4. Size and Quality of Attribute Sets*

                           MAUD3 first (N = 23)**     Analyst first (N = 11)
Attribute Set Features     M > A     ?    A > M       M > A     ?    A > M
No. of Attributes            5       4      14          1       1       9
Completeness                 5       4      14          1       0      10
Logical Independence         6       9       8
Value Independence           9       6       8
Quality                      3      10      10          2               5

(the remaining entries in the right-hand half of the table are not legible in the source)

As the first row of Table 4 indicates, analysts elicited more attributes than MAUD3 for the majority of subjects, particularly when the analyst interaction occurred first. There was almost perfect rater agreement on which attribute set was more complete, somewhat less consensus on the issue of independence, and substantial disagreement in judgments of overall quality. Analyst attribute sets were more often judged more complete than MAUD3 sets, especially if the analyst session preceded the MAUD3 session. Both logical and value independence depended upon the order of the MAUA sessions. The MAUD3 attributes were more often judged more independent for subjects exposed to the analyst interaction first, but subjects interacting with MAUD3 prior
to an analyst were split about evenly as to attribute independence. Finally, judgments of overall attribute quality heavily favored the analyst elicitations, regardless of the session order. The attribute set features considered in Table 4 were highly related. In particular, the attribute set of each pair with the greater number of dimensions was overwhelmingly chosen as being more complete, less independent, and of higher quality than the other attribute set. Ordinal judgments of overall quality were virtually identical to those of completeness.
Discussion and Conclusions

MAUD3 and the analyst sessions produced highly convergent multiattribute utilities. Subjects' agreement with the MAUD3 and analyst utilities differed little across session orders, problem types, analysts, or subject sex and race. Assessed weights produced multiattribute utilities in closer agreement with holistic judgments than did equal weights. Holistic ratings tended to agree with the most contiguous model values; repeated holistic ratings tended to converge toward agreement with the utilities calculated from the models. Unfortunately, it is somewhat difficult to interpret convergence, or the lack of it, directly as an indicator of the quality of the analysis. Low convergence could mean that the analysis has totally gone awry, or it could be indicative of a deeper, more valid evaluation than the subject is capable of in his/her own holistic ratings.
Subjects became quite involved in both MAUD3 and analyst sessions. Ratings of both sessions were greatly skewed toward the high end. Subjects were highly motivated, and their responses seemed more thoughtful and considered than is our experience with thought experiments employing hypothetical scenarios, typical of laboratory experiments with college subjects.
The observed sex differences with respect to user satisfaction are curious. One interpretation is that our male subjects possibly had more experience with or aptitude for computer-like tasks. Another explanation lies in a possible analyst sex/subject sex interaction effect; only one of the analysts was female. Future experiments should certainly better counterbalance the sex of both subject and analyst.
The median number of attributes elicited was greater for analyst sessions (7.5) than for MAUD3 sessions (5.9); however, one analyst averaged 10 attributes per session, while another averaged only a little over 5. The 10-attribute analyst was rated higher than the other four analysts in terms of subjects' impressions of the session, but received the lowest
amount of acceptance of the resulting alternative orderings. The five-attribute analyst, however, received the lowest subjective ratings of all, but achieved the greatest degree of acceptance of final alternative orderings. Our findings seem to indicate that subjects feel better taken care of when more attributes are included in the analysis, but that subjects' holistic ratings are better accounted for by analyses with smaller rather than larger numbers of attributes. Greater convergence with fewer attributes might be expected if holistic judgments are made on the basis of a selection of a few highly salient attributes.
Our findings regarding the size and quality of attribute sets suggest that our analysts' notions about attribute elicitation are much like those of our subjects: the more the better. Although MAUD3 elicited smaller, less "complete" attribute sets, they were judged to be more independent, both logically and valuewise. This result is presumably due, at least in part, to the effectiveness of the MAUD3 mechanism for identifying statistically related attributes and presenting them for combination under a single heading.
Interpretation of our findings should give proper consideration to the subject population, problem types, analyst experience and method (SMART), and the particular MAUA software we employed (MAUD3). In particular, we should comment on the peculiarities of the MAUD3 program. We found that MAUD3 is not truly "stand alone". Many of our subjects asked for assistance in the attribute elicitation phase of the program. Typical mistakes included: repetition of attributes (up to 15 times); including more than one attribute in a given attribute definition; and thinking about other attributes when specifying the "ideal point" and/or scale values on an attribute. MAUD3 should give the subject more information concerning attribute elicitation, as the "difference questions" are simply too abstract and non-directive.
We also found that very few subjects were able to answer the brlts questions properly. Most subjects had initial difficulties understanding the question, and even after careful instruction they experienced some problems keeping track of the different pieces of information that constitute brlts. There also appeared to be a response bias built into the sequencing of the brlts questions. That sequence reduces the attractiveness of the gamble until the sure thing is either preferred or indifferent to the gamble. Subjects inclined to stop an obviously difficult information-processing task appeared to choose the sure thing even before they reached their indifference point.
In spite of these problems, the computer sessions compared quite favorably with the analyst sessions. This general result is encouraging for those who see the future of decision analysis in computerized and possibly
stand-alone decision aids. We conceptualize the development of computerized decision aids in a three-dimensional framework: (1) the extent to which the program requires the services of someone knowledgeable about either DA or the operation of the program; (2) the degree to which available problem structures are organized into empty DA categories vs. oriented toward problem-specific structures that make use of prototypical features generic to all problems of a given class; and (3) the level of detail of the environmental model. (Buede, 1979, calls the last dimension the engineering science-clinical art factor of decision aiding.)
The results of our experiment suggest that stand-alone decision aids are feasible. We believe that many of the issues concerning problem structuring and option invention can be eliminated by creating generic problem structures, complete with a general structure and a set of options that can be both pruned and added to. We feel that user satisfaction was largely mediated by the fact that our analysts could recommend options and objectives to the decision maker directly, whereas MAUD3 could not. Perhaps user confidence would be enhanced further by including a problem-related data base, thus allowing the subject to employ as complex and complete a model of the choice problem as seems desirable.
References

Buede, D. M., 1979. Decision analysis: Engineering science or clinical art? Technical Report TR 79-2-97. McLean, Virginia: Decisions and Designs, Inc.
Christen, F. G. and M. G. Samet, 1980. Empirical evaluation of a decision-analytic aid. Final Technical Report PFTR-1066-80-1. Woodland Hills, California: Perceptronics, Inc.
Edwards, W., 1972. Social utilities. In: Decision and risk analysis: Powerful new tools for management. Proceedings of the Sixth Triennial Symposium, June, 1971. Hoboken: The Engineering Economist, 119-129.
Edwards, W., 1977. Use of multiattribute utility measurement for social decision making. In: D. E. Bell, R. L. Keeney, and H. Raiffa (eds.), Conflicting Objectives in Decisions. New York: Wiley.
Fischer, G. W., 1975. Experimental applications of multiattribute utility models. In: D. Wendt (ed.), Utility, Probability, and Human Decision Making. Dordrecht, Holland: D. Reidel Publishing Co., 47-85.
Fischer, G. W., 1976. Multidimensional utility models for risky and riskless decisions. Organizational Behavior and Human Performance, 17, 127-146.
Fischer, G. W., 1977. Convergent validation of decomposed multiattribute utility assessment procedures for risky and riskless choice. Organizational Behavior and Human Performance, 18, 295-315.
Fischer, G. W., 1979. Utility models for multiple objective decisions: Do they accurately represent human preferences? Decision Sciences, 10, 451-479.
Fischhoff, B., 1981. Decision analysis: Clinical art or clinical science? In: L. Sjoberg, T. Tyszka, and J. Wise (eds.), Decision Analysis and Decision Processes. Lund: Doxa.
Humphreys, P. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-69.
Humphreys, P. and A. Wisudha, 1980. Multi-Attribute Utility Decomposition. Technical Report 79-2/2. Uxbridge, Middlesex: Brunel University, Decision Analysis Unit.
Keeney, R. L. and H. Raiffa, 1976. Decisions with Multiple Objectives. New York: Wiley.
Pearl, J., A. Leal, and J. Saleh, 1980. GODDESS: A goal-directed decision structuring system. UCLA-ENG-8034. Los Angeles: University of California, School of Engineering and Applied Science, Cognitive Systems Laboratory.
Pitz, G. F., J. Heerboth, and N. J. Sachs, 1980. Assessing the utility of multiattribute utility assessments. Organizational Behavior and Human Performance, 26, 65-80.
Pitz, G. F., N. J. Sachs, and M. T. Brown, 1980. Eliciting a formal problem structure for individual decision analysis. Technical Report TRAEP, Vol. 2, No. 2. Carbondale, Illinois: Southern Illinois University, Department of Psychology.
Pitz, G. F., N. J. Sachs, and J. Heerboth, 1980. Procedures for eliciting choices in the analysis of individual decisions. Organizational Behavior and Human Performance, 26, 396-401.
von Winterfeldt, D., 1980. Additivity and expected utility in risky multiattribute preferences. Journal of Mathematical Psychology, 21, 66-82.
INTERDEPENDENCE BETWEEN PROBLEM STRUCTURING AND ATTRIBUTE WEIGHTING IN TRANSITIONAL DECISION PROBLEMS*

Stuart WOOLER and Alma ERLICH

London School of Economics and Political Science, England
Abstract

The inherently complex judgements required for assessing relative weights of attributes in multiattributed decision problems are explored in this paper. The role of weighting in the evaluation of choice options is distinguished from the function which weighting may perform in structuring an option set. For this purpose, a distinction is drawn between two different sorts of scales across which decision makers may assess utility differences in attribute weighting: option-anchored scales, where the poles are defined by the most and least preferred options, and grand scales, enclosing attribute levels which may be more extreme than those achievable by the option set but which may nevertheless, it is argued, influence the attribute weights assigned. An exploratory study in the area of career choice is described, investigating the weighting assessments of two small groups of participants: members of the first group were concerned with the structuring of an option set, i.e., the identification of a small set of alternatives worthy of further consideration, while those of the second group were concerned with the evaluation of the options within a predetermined set. Results derived from the two groups suggest that decision makers concerned with structuring operations may adopt very different weighting strategies from those concerned with the evaluation of options. These differential weighting strategies are discussed in terms of the grand scale/option-anchored scale distinction. Contrary to the prescriptions of decision theory, our evaluation group neglected to incorporate a measure of the range of the option set on each attribute within assessments of relative attribute weights. The discussion of this result is based on an interpretation of the decision problems faced by our participants as psychosocial transitions, within which the decision maker is typically severely hindered by attendant restrictions in the knowledge base of the decision problem and instability of the preference structure.
* The research reported in this paper was carried out as part of a U.K. Social Science Research Council funded project on Modelling and Aiding Careers Transitions: A multi-attribute utility analysis (HR6904).
Introduction

From the point of view of the 'client' seeking help from the decision analyst, most available forms of MAU (Multi-Attribute Utility) technology break down into two steps: (1) the assessment of the utility levels achievable on each attribute dimension of a given option, and (2) the assessment of relative importance weights across attributes. A difficulty, however, is that while the assessment of utility functions over the separate attribute dimensions requires relatively simple atomistic judgements on the part of the decision maker, weights assessments require inherently composite judgements as a result of their relativity to the following factors: (a) the decision maker's current option set, where a revised option set may change the range of feasible levels of an attribute and consequently result in a change in its relative weight, (b) the utilities which the decision maker associates with feasible levels of all other attributes currently employed in the evaluation, and (c) the superordinate goal of which, in Raiffa's (1969) hierarchical utility model, the sub-goals (or objectives) associated with attributes are decomposed parts.
The problem for the weights assessor is not simply one of information processing difficulties in the face of such multiple factors since, as Humphreys and McFadden (1980) point out, the trade-off judgements across attributes required for determining relative weights may necessitate that the decision maker cope with feelings of regret and loss as a result of relinquishing desirable outcomes. Thus, it is perhaps not surprising that decision analysis has as yet found the decomposition of weighting assessments a largely intractable problem. None of the existing weights assessment methods seems to provide an ideal solution in the face of these complexities. Indifference techniques (Keeney, 1975; Keeney and Nair, 1977) are time consuming and require from the decision maker highly artificial judgements about abstract entities (cf. Kneppreth et al., 1978). Direct weighting techniques achieve simplicity by ignoring the problems associated with the relativity of weights to the range of the choice options (Edwards, 1977; Einhorn and McCoach, 1977). Other methods, such as those requiring direct compensation or swing-weight judgements (Sayeki, 1972; von Winterfeldt and Edwards, 1973), achieve a degree of simplicity in the judgements required, but on the basis of approximation rather than decomposition. Weiss (1980), to be sure, offers some interesting suggestions for decomposition, but these appear as yet to be untested.
Within classical decision theory, weighting is viewed as a procedure composed of two steps: (1) employing available factual information to identify the range, or degree of discrimination, of the option set on each attribute, and (2) evaluating relative utility differences across these ranges.
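Read literally, these two steps prescribe range-sensitive ('swing') weights: an attribute's weight is proportional to the utility difference between the worst and best levels it actually takes within the option set. The sketch below, with invented attributes, numbers and linear utilities, shows the property at stake, namely that narrowing an attribute's attainable range shrinks its weight.

def swing_weights(utility, options, attributes):
    # Weight each attribute by the utility swing from the worst to the best level
    # it takes within the current option set, then normalize the swings.
    swings = {}
    for a in attributes:
        levels = [option[a] for option in options]
        swings[a] = utility[a](max(levels)) - utility[a](min(levels))
    total = sum(swings.values())
    return {a: s / total for a, s in swings.items()}

# Invented example: linear (increasing) utility on both attributes.
utility = {"salary": lambda x: x / 20000, "interest": lambda x: x / 10}
options = [{"salary": 9000, "interest": 3},
           {"salary": 11000, "interest": 9}]
print(swing_weights(utility, options, ["salary", "interest"]))
# The narrow salary range gives salary a small weight here, however much
# salary may matter on the attribute's 'grand scale'.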
In the first step the decision maker draws on information within his or her 'cognitive structure' of the problem in making the required factual, descriptive judgements; the second step engages the decision maker's 'preference structure' (Raiffa, 1969), requiring evaluative or utility judgements. It follows that the notion of the 'relative importance of an attribute' has no meaning or function in the evaluation of choice options except in relation to the range of levels of the attribute achievable within the option set (Keeney and Raiffa, 1976). Discussions of weighting procedures are commonly restricted to these terms of reference (see, for example, Johnson and Huber, 1977), weighting being studied only as a requisite step in the evaluation of a fixed option set.

This neglects the role that weighting of attributes may play in structuring the decision problem (von Winterfeldt, 1980; Pitz, this volume), a role which there are good a priori reasons for believing may be an extensive one. How else would an initial set of provisional attributes be refined down to a smaller operational set except by weighting them? How is one to know the crucial features to be possessed by action alternatives contending for inclusion in the option set except by weighting attributes? This lack of discussion of the role that weights may play in structuring means that little can be said about them with confidence. However, supposing such weights are employed by decision makers, what we do know is that they are not restricted to the 'small world' of the option set, the option set being conditioned on these weights rather than vice versa. But how then are these judgements bounded? In particular, how do decision makers delimit the range of levels of attributes deemed relevant to determining attribute weights for option structuring?

The study reported below gave the opportunity to study both structuring and evaluation operations and their effects on weighting behaviour. But first, in order to clarify the theoretical terrain of the study, let us define some terms. Let a grand scale (paraphrasing Savage, 1954) be any scale measuring levels of an attribute within which option sets are located, where the poles of the scale are defined, independently of the option set, by some semantically meaningful reference points forming upper and lower bounds to salient values of the attribute. An example from decision analysis (see, e.g., MacCrimmon and Wehrung, 1977; and Edwards, 1977) is that of the scale with poles defined as 'maximum feasible' and 'minimum feasible' levels of the attribute. Such a scale constitutes a grand scale in our sense because there is no constraint that any options fall at either of the poles. The poles can nevertheless index very specific meanings for the decision maker and serve as effective anchors for judgements (Johnson, 1972). The grand scale contains all values of the attribute to which operational utilities can be assigned, and the operational sub-goal is consequently constrained to fall within it.
Thus, while an 'absolute ideal' on an attribute may be some infinite value of that attribute, in practice any real operational utilities will be constrained by some upper bound (cf. Zeleny, 1975). In aided decision making, analysts normally attempt to impose severe constraints of realism on both upper and lower bounds. These grand scales are to be contrasted with what we shall call option-anchored scales, which are scales having poles formed by the highest and lowest rated choice options. By definition such scales lie within grand scales, though the two may on occasion be equivalent.

On this view, then, the decision maker, in evaluating a fixed option set, constructs an option-anchored scale within some subdomain of the grand scale, using his or her factual knowledge of the world to ascertain what levels of the grand scale are realistically achievable within the current option set. Differences between decomposed preferences are assessed across this option-anchored range, not across the grand scale. Given that fixed trade-off ratios may be assumed between the attributes over the range of options, swing weights may then be computed indexing the relative scaling factors (Humphreys, 1977) to be applied to the dimensions within the decision maker's preference structure. On the other hand, the possibility exists that some decision makers may intuitively assess weights for structuring the option set as swing weights on grand scales, the grand scale poles thus forming the anchors between which decision makers assess differences between preferences within problem structuring operations. On successful completion of the structuring stage, swing weights are (or should be) employed for purposes of option evaluation in the manner prescribed by decision theory. If swing weights are intuitively computed on grand scales for purposes of structuring the option set, then under the prescriptions of decision theory decision makers, when moving on to evaluation of this option set, are faced with having to shift the anchors from the grand scale poles to the option set anchors and to adjust the weights accordingly.

Assessing weights over option-anchored scales does not imply, however, that the associated grand scale anchors have no impact on the perceived importance of the attributes. An example serves to illustrate this. Consider a politically conservative and a politically radical person using the attribute 'democracy' in evaluating alternative governmental systems. Assume they agree over the meaning of democracy, and about the ratings of the options on the attribute. Where the conservative assesses large utility differences between the options on the democracy attribute, the radical may assess only small differences. The grand scale/option-anchored scale distinction provides a means of conceptualising what is going on.
The conservative's grand scale associated with the democracy attribute contains a considerably more restricted range of values than the radical's. Consequently, the conservative discerns a strong contrast between the options where the radical perceives only similarity and lack of discrimination (cf. Einhorn and McCoach, 1977). By stretching the range of the grand scale, the radical has bunched up the options and left little room for utility differentiation within the option-anchored scale. There are consequently good prima facie reasons for the conservative to give greater weight than the radical to the 'democracy' attribute in the evaluation of these options. The differential weighting here derives from differences in the siting of what we are calling grand scale poles.

None of this is strictly contrary to decision theory, for the simple reason that classical decision theory has not addressed these issues. As traditionally conceived, it is not decision theory's business to worry about the ontogeny of utility differences over values of attributes (Humphreys et al., 1980). Decision theory has concerned itself with the integration of utility differences with other relevant preferential and probabilistic data for the evaluation of options, not with the derivation of these data. Grand scales with poles constrained to maximum and minimum feasible levels have proven useful in decision analysis because they place tight constraints of realism on the ranges of operational utilities falling within the grand scale. Unrealistic, non-achievable levels of attributes are rendered non-operational. This requires, however, that the decision environment (von Winterfeldt, 1980) be cognitively highly structured by the decision maker, who needs to possess a rich fund of environmental data and thus to be an 'expert' of sorts.

When the decision environment is novel to the decision maker, who knows only very approximately what to expect, as in 'transitional decision problems' (Wooler and Humphreys, 1979), matters may be different. Parkes (1971) delineates as a field worthy of special investigation those circumstances which induce major and dramatic life changes, which he labels 'psychosocial transitions'. The nature of these transitions renders decision problems embedded in them especially intractable for decision makers and for MAU technology alike. In particular, while undergoing such a transition the decision maker possesses a severely limited ability both to structure the decision problem realistically and to anticipate his or her own emergent utility structure, owing to the inherent unpredictability of the resultant changes in self and in external environment. An investigation into decision makers' attribute weighting practices within a decision problem which is typically of this sort is briefly reported below.
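To make the relationship between the two kinds of scale concrete, the following sketch (in Python, with all attribute names and numbers hypothetical and chosen purely for illustration) shows how a swing weight assessed across a 0-100 grand scale might be rescaled when the anchors are shifted to the poles of the option-anchored scale, assuming roughly linear utility over the grand scale; this assumption is ours, not part of the procedure described in this paper.

```python
# Hypothetical illustration: shifting anchors from grand scale poles to
# option set anchors, assuming approximately linear utility over each
# 0-100 grand scale.

def option_anchored_weight(grand_scale_swing_weight, option_ratings):
    """Rescale a swing weight assessed over a 0-100 grand scale to the
    option-anchored scale whose poles are the best and worst rated options.

    grand_scale_swing_weight: swing weight judged for the full grand scale.
    option_ratings: ratings of the current options on that grand scale.
    """
    option_range = max(option_ratings) - min(option_ratings)
    # The option-anchored scale spans only a fraction of the grand scale,
    # so the weight shrinks in proportion to that fraction.
    return grand_scale_swing_weight * option_range / 100.0


# Example: an attribute judged very important over its grand scale, but on
# which the current options are bunched together, ends up with a small
# option-anchored weight.
if __name__ == "__main__":
    w = option_anchored_weight(grand_scale_swing_weight=60,
                               option_ratings=[45, 50, 55])
    print(w)  # 6.0
```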
The Study

Background of the Study

The investigation is part of a project employing various (mainly computer-based) techniques derived from decision theory, designed to help students approaching graduation structure and evaluate career choices. Aschenbrenner et al. (1980) and Wooler (in press), amongst others, have developed decision analytic instruments specifically designed for this purpose. The present project aims to develop a kit of tools for use within the careers advisory services of universities and polytechnics, consisting of a set of interactive computer programs for use by students to structure and evaluate career options.

Subjects and Procedures

Twenty-three students took part in the study, attending a total of three 2-3 hour sessions, after each of which they had an interview with a careers counsellor based on material generated in the session. In each session the analyst and participant explored together the implications of the latter's current thinking about the imminent career problem. Within the sessions reported here, a list of careers currently under consideration and a set of attributes for the evaluation of careers were elicited. Part-worths of choice options were obtained by direct scaling, using utility scales with top and bottom poles formed by the most and least preferred options respectively on each of the attributes, that is, with use of option-anchored scales. Importance weights across attributes were then elicited employing the SMART procedure as described by Edwards (1977), participants being asked first to rank the attributes displayed in front of them in order of importance, and then to give ratio judgements reflecting differences in importance, setting the least important at a base figure of 10.

Grand scales associated with each attribute were then obtained. The aim, contrary to normal decision analytic practice, was not to impose tight bounds on grand scales but to encourage participants to externalise the bounds of the operational utilities used in intuitive judgement. For each attribute in turn a 0-100 scale was drawn on a whiteboard, with the participant's descriptions of 'best feasible' and 'worst feasible' outcomes appended to the poles, so as to form grand scales. The participant was then asked to rate the most and least preferred options, as already assessed, relative to these poles. The difference in values between the most and least preferred options thus defined the range of the current options relative to the grand scale.
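As a rough sketch of the arithmetic involved, the Python fragment below converts SMART-style ratio judgements of the kind just described into normalized importance weights. The attribute names and ratio values are hypothetical, and the final normalization is a common convention rather than something specified in the text.

```python
# Hypothetical SMART-style ratio judgements: the least important attribute
# is set at the base figure of 10 and the others are multiples of it.
ratio_judgements = {
    "salary": 40,
    "job security": 25,
    "geographical location": 10,   # least important, fixed at the base of 10
}

# Normalizing so the weights sum to 1 is a common convention when the
# weights are later combined with part-worth utilities.
total = sum(ratio_judgements.values())
smart_weights = {attr: value / total for attr, value in ratio_judgements.items()}

print(smart_weights)
# {'salary': 0.533..., 'job security': 0.333..., 'geographical location': 0.133...}
```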
The next step was to assess relative utility differences across the grand scales, the participant being required to judge swing weights across the range of each grand scale. This involved asking, if all attributes were set at zero on their grand scales, which the participant would most prefer to shift to 100, and so on down the list. This yielded a rank ordering of utility differences across the grand scales. These differences were then converted into a full numerical representation by asking for SMART-type judgements setting the attribute with the smallest utility difference at 10, as above.

Having been alerted in pilot runs to the existence of non-compensatable regions around the zero point of the grand scales of some attributes (yielding outcomes which cannot be traded off against feasible levels of other attributes), we asked participants, for each attribute in turn: "Would you be prepared to accept a job which was rated zero on this attribute if it was very attractive on other attributes?" The responses enabled us to make a binary classification of all attributes into those where trade-off was restricted at the lower end of the grand scale, that is, Restricted Trade-off attributes, and those permitting universality of trade-off across the whole range, Unrestricted Trade-off attributes.

While the method described here represents the baseline of information collected from each participant, it was adapted and supplemented to meet the requirements of individual cases. Sessions were directed toward helping participants with their career problems, not toward carrying out a fixed experimental procedure. The conceptualisation of our participants' career decision problems as transitional problems (discussed above) was borne out by the participants' frequently expressed anxieties over their perceived ignorance of the options they were choosing between and their uncertainties over the part-worth utility ratings of options. Specifying the poles of grand scales and locating the range of the option set on option-anchored scales within them appeared to be a meaningful exercise for participants and to clarify issues for them.

Participants fell into two groups: (i) the evaluation group (16 participants), whose concern was to evaluate options from within a predetermined set; (ii) the structuring group (7 participants) who, unlike those in the evaluation group, expressed dissatisfaction with the current option set, either because options were insufficiently attractive or because attractive options included in the set were considered to be unrealistic, and whose primary concern was to assess the worth of the current option set and perhaps to identify other options for inclusion in it.
The Hypotheses

Because the SMART technique imposes less formal structure on the decision maker's weights assessments than the other weighting techniques currently available, while still generating a full numerical representation (Coombs, Dawes, and Tversky, 1970), it comes closest of any currently available technique to providing a measure of 'intuitive' weights. It was hypothesized that for the evaluation group high degrees of convergence would be found between the derived weights which compensate for the range of the option set on attributes and the SMART-based 'intuitive' weights assigned to attributes, in line with the prescriptions of decision theory. The judgemental problem facing the structuring group, that of assessing the adequacy of the current option set and of identifying sources of inadequacy, is, however, of a very different sort, and it was conjectured that this difference of focus would impose a different weighting strategy: the structuring group, rather than employing weights compensating for the range of the option set on each attribute, would incorporate a measure of the degree of satisfaction with the option set defined on the grand scale of the attribute (cf. Bauer and Wegener, 1975). This degree of satisfaction was equated with the degree of achievement by the current option set of the sub-goals associated with each attribute (see below).
Results and Discussion

A simple multiplicative combination of the range of the options, as defined on the associated grand scale, with the swing weight assigned to that grand scale was used to generate for each attribute a weight equivalent to a relative scaling factor for that scale:

    x_i = (r_i / Σ_j r_j) (w_i / Σ_j w_j)        (equation 1)

where x_i denotes the relative scaling factor for attribute i, the sums run over the n attributes, r_i is the range of the option set within the grand scale of attribute i, and w_i is the swing weight assigned to the grand scale of attribute i.

The method described above also provided a rough measure of the degree of achievement of the sub-goals offered by the current option set. The mean of the values of all options on the grand scale for each attribute was computed. This point, measured from zero (i.e., the least preferred pole of the grand scale), provided an achievement measure (m_i) which could be substituted for the range of the option set (r_i) within equation 1.
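A minimal sketch of these computations, assuming equation 1 has the normalized form reconstructed above, is given below in Python; the attribute data are invented purely for illustration.

```python
# Hypothetical data for three attributes of one participant.
# ranges:       r_i, the spread of the option set on each 0-100 grand scale
# achievements: m_i, the mean rating of the options on each grand scale
# swing_weights: w_i, swing weights judged across the full grand scales
ranges = [30, 10, 50]
achievements = [70, 40, 55]
swing_weights = [50, 20, 30]


def scaling_factors(spread, swings):
    """Equation 1 (as reconstructed): multiply the normalized spread measure
    by the normalized swing weight for each attribute."""
    spread_total = sum(spread)
    swing_total = sum(swings)
    return [(s / spread_total) * (w / swing_total)
            for s, w in zip(spread, swings)]


# Relative scaling factors use the ranges r_i ...
relative_scaling = scaling_factors(ranges, swing_weights)
# ... while option structuring weights substitute the achievement measure m_i.
option_structuring = scaling_factors(achievements, swing_weights)

print(relative_scaling)
print(option_structuring)
```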
Weights derived from this equation, which we will call option structuring weights, were correlated for each participant with intuitive weights, under the rationale that high correlation indicates high convergence between intuitive weights and option structuring weights, where convergence is taken as a measure of the degree to which the two sets of weights are based on similar structural representations of the weighting task. Our hypothesis for the structuring group thus consisted in the prediction that for the seven persons in this group, and for no other participants, these option structuring weights would converge more highly with intuitive weights than would the relative scaling factors computed by equation 1.

This hypothesis gained support. For this seven-person group the derived structuring weights gave a significant increase in convergence with SMART-based intuitive weights (ρ1) compared both with relative scaling factors (ρ2), where the mean convergence increase was 0.334 (the 99% credible interval 0.056 < ρ1 − ρ2 < 0.613 does not include zero), and with simple swing weights assessed over grand scales (ρ3), where the mean convergence increase was 0.156 (the 99% credible interval 0.025 < ρ1 − ρ3 < 0.260 does not include zero). There seems to be a suggestion of bolstering here (Montgomery, this volume), since these results suggest that these decision makers tended to consider an attribute more important if the current option set performed well on it. Since these sessions took place very close to the end of their final academic year, bolstering the current option set could result from a recognition of the considerable cost of searching once more for new options at a time when participants were becoming increasingly anxious to pass on to the evaluation stage of working on their decision problem.

Our hypothesis concerning the relative scaling practices of the evaluation group, which essentially followed the prescriptions of decision theory, was, however, less successful, and it is to a brief discussion of these results, and of suggested reasons for the failure of the hypothesis, that we now turn. Degrees of convergence between the values of relative scaling factors and SMART-generated 'intuitive' weights were found to be low (mean convergence = 0.40). In every case relative scaling factors were outperformed as predictors of SMART weights by swing weights assessed over the whole grand scale. The mean increase in convergence between swing weights alone and intuitive weights (ρ1) compared with relative scaling factors (ρ2) was 0.309 (the 99% credible interval 0.162 < ρ1 − ρ2 < 0.447 does not include zero). Weights which did not perform relative scaling compensating for disparities in ranges of options on attributes thus proved to be significantly better predictors of SMART weights than those which did. For only one participant was a significant positive correlation found between sizes of ranges and assigned SMART weights, whereas for nine participants
negative correlations were found (mean association between intuitive weight and range of the option set = 0.22), suggesting not just a neglect of relative scaling but an association between lack of discrimination of options on an attribute and the attribute's SMART weight.

Within a small-scale study of this kind, it is not possible to infer with confidence from these results that these decision makers anchored intuitive weights assessments on grand scales rather than on option-anchored scales. This study can only tentatively suggest reasons for the reported phenomena and indicate some avenues which it may be fruitful to explore within a more thorough-going investigation. As noted above, there is ample evidence of our participants' lack of confidence in the judgements they needed to make to assess their preferences between jobs. Their awareness of their ignorance of the options they were considering was very marked, taking the form both of lack of information about choice options and of lack of knowledge of the utility structure they would emerge with at the end of the transitional period in their careers that they were now entering. This inadequate structuring of the choice problem had two implications.

Firstly, the range of the option set on attributes could not be judged with confidence. Protocols of sessions within this study provide numerous indications of participants' awareness of the lack of required data, and of its effects upon their judgements. Fragments such as the following were typical: "The problem is that I might just have put some jobs up high on that one (grand scale of attribute X) because that one (attribute X) seems important to me." In the absence of an adequate fact-based cognitive structure for the problem to fix the ranges of the options on attributes, our participants suggested that these range assessments were being influenced by their preferential attitudes towards the attributes. It is worthy of note that this was readily recognised as a form of self-delusion by our participants. Thus they exhibited an intuitive awareness of the decision theoretic requirement (discussed above) that the formation of option scales should be solely a function of the decision maker's factual knowledge, independent of preference structure. The career decision problems our participants faced did not permit this independence to be maintained.

Secondly, grand scales enclosing operational utilities relevant to the choice problem could not be sufficiently constrained to realistic levels, since our participants did not know what it was realistic to expect from the various job options and what was not. Lacking the knowledge permitting attribute values, and consequently operational utilities, to be constrained by realism and feasibility, grand scale poles may have been defined more by imaginability than by feasibility (cf. Tversky and Kahneman, 1973). We mean by this that decision makers' judgements may have been anchored on personally salient reference
images representing prototypically desirable and undesirable outcomes with respect to each attribute, outcomes which he or she could be fairly sure would remain stably desirable or undesirable regardless of all the anticipated fluctuations in preference structure throughout the transitional period. This view is reminiscent of the findings of Sjoberg (e.g., 1980) from his investigations into the role of desirable and undesirable images in decision making. Sjoberg was concerned to press the general case for images supplanting values as determinants of preference structures. The present study suggests a narrower hypothesis concerning the significance of these images in circumstances where neither knowledge structures about the world nor preference structures about the self are stable or well developed, that is, in transitions. We suggest that in these circumstances emotion-laden images may determine decision behaviour in the absence of information constraining operational utilities to realistic levels. In these circumstances one would expect, as we have found in this study, a wide divergence between weights assessed by this imaginal strategy and those derived from techniques enforcing compensation for the range of the option set. Moreover, the present study has also found, for many participants, negative correlations between ranges of options on attributes and the intuitive importance of attributes, suggesting that within these transitional problems the more important an attribute, the more extreme will be the images associated with it and consequently the wider the divergence between the utilities operationalised within the associated grand scale.

To test this further we employed the information we had derived from participants about which attributes afforded universality of trade-off across the whole grand scale range and which did not. This enabled us to bifurcate all attributes employed by participants into two classes: those offering unrestricted trade-off across the whole range of the grand scale (UTO attributes) and those offering acceptable trade-off across only a portion of the grand scale, restricted trade-off (or RTO) attributes. It is a reasonable assumption that within RTO attributes there is a wider utility difference between the scale poles than in the case of UTO attributes, in view of the evidence suggesting that trade-offs are only acceptable within narrowly bounded utility ranges (see Shapira, 1979). Under the present interpretation one would expect the wider utility difference of RTO attributes to be associated with higher intuitive importance. A further bifurcation of the data for each participant into those attributes with above-average and those with below-average importance provided us with a 2 × 2 data matrix, each attribute employed by each participant being an entry in one of the four cells according to whether it was above or below the average importance of the participant's attributes and whether it was RTO or UTO. A Goodman-Kruskal test was used to investigate any predictive association.
It was found that from knowledge of the trade-off status (RTO or UTO) of an attribute one reduces one's uncertainty about whether that attribute is of above or below average importance by 0.50. The result holds up well if one substitutes participants' swing weight judgements over grand scales for their intuitive weights within the Goodman-Kruskal test (0.48), providing further evidence that the intuitive weighting strategy employed by our participants was based on swing weight judgements across the grand scales of attributes, the poles being formed, we have suggested, by reference images of prototypically desirable and undesirable outcomes.
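The predictive association statistic reported here is consistent with the Goodman-Kruskal lambda for a 2 × 2 contingency table. The following Python sketch, using an invented cell count matrix purely for illustration, shows one way such a value can be computed.

```python
# Hypothetical 2 x 2 table of counts: rows = trade-off status (RTO, UTO),
# columns = importance (above average, below average). Counts are invented.
table = [
    [12, 3],   # RTO attributes
    [4, 11],   # UTO attributes
]


def goodman_kruskal_lambda(table):
    """Proportional reduction in error when predicting the column category
    from the row category (one common form of the Goodman-Kruskal lambda)."""
    total = sum(sum(row) for row in table)
    column_totals = [sum(col) for col in zip(*table)]
    # Errors when predicting the modal column without knowing the row.
    errors_without = total - max(column_totals)
    # Errors when predicting the modal column within each row.
    errors_with = sum(sum(row) - max(row) for row in table)
    return (errors_without - errors_with) / errors_without


print(goodman_kruskal_lambda(table))  # 0.5 for these invented counts
```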
Conclusion
In this paper we have explored decision theoretic precepts concerning attribute weighting, and suggested that much may be gained in explicating intuitive weighting practices by extending the analysis of weighting beyond the 'small world' of the option set to include explicit consideration of attribute levels which may be beyond the range of the option set but which may nevertheless have an impact upon weighting practices. We have reported an intensive, small-scale study employing this idea. In this study grand scales were distinguished from option-anchored scales within a decision aiding context, and decision makers explicitly defined the poles of the grand scales representing the extremes of the operationalized utilities associated with attributes. The results reported here suggest that this technique may throw light on decision makers' intuitive weighting strategies. The weighting behaviour of our participants differed significantly depending on whether the participant's concern was to structure the option set or to evaluate it. Our results suggest that weights assessed by decision makers engaged in structuring a decision problem may typically incorporate some measure of the degree of satisfaction with the option set on each attribute. Those engaged in evaluating an option set, on the other hand, appear, contrary to decision theoretic requirements, not to compensate for the relative ranges of the option set on attributes in assessing weights across attributes. It is suggested that this may result from the specifically transitional nature of the decision problem under consideration.
References

Aschenbrenner, K. M., D. Jaus, and C. Villani, 1980. Hierarchical goal structuring and pupils' job choices: Testing a decision aid in the field. Acta Psychologica, 45, 35-51.
Coombs, C. H., R. M. Dawes, and A. Tversky, 1970. Mathematical Psychology: An Elementary Introduction. Englewood Cliffs, N.J.: Prentice-Hall.
Edwards, W., 1977. Use of multi-attribute utility measurement for social decision making. In: D. E. Bell, R. L. Keeney, and H. Raiffa (eds.), Conflicting Objectives in Decisions. New York: Wiley.
Einhorn, H. J. and W. McCoach, 1977. A simple multiattribute utility procedure for evaluation. Behavioral Science, 22, 210-282.
Humphreys, P. C., 1977. Applications of multiattribute utility theory. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: Reidel.
Humphreys, P. C. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-71.
Humphreys, P. C., S. Wooler, and L. D. Phillips, 1980. Structuring decisions: The role of structuring heuristics. Technical Report 80-1. Uxbridge, Middlesex: Decision Analysis Unit, Brunel University.
Johnson, D. M., 1972. A Systematic Introduction to the Psychology of Thinking. New York: Harper & Row.
Johnson, E. M. and G. P. Huber, 1977. The technology of utility assessments. IEEE Transactions on Systems, Man and Cybernetics, SMC-7, 5.
Keeney, R. L., 1975. Energy policy and value trade-offs. Research Memorandum RM-75-76. Laxenburg, Austria: IIASA.
Keeney, R. L. and K. Nair, 1977. Selecting nuclear power sites in the Pacific Northwest using decision analysis. In: D. E. Bell, R. L. Keeney, and H. Raiffa (eds.), Conflicting Objectives in Decisions. New York: Wiley.
Keeney, R. L. and H. Raiffa, 1976. Decisions with Multiple Objectives: Preferences and Value Trade-offs. New York: Wiley.
Kneppreth, N. P., W. Hoessel, D. J. Gustafson, and E. M. Johnson, 1978. A strategy for selecting a worth assessment technique. Technical Paper 280 (ADA055345). Arlington, VA: U.S. Army Research Institute for the Behavioral and Social Sciences.
MacCrimmon, K. R. and D. A. Wehrung, 1977. Trade-off analysis: The indifference and the preferred proportions approaches. In: D. E. Bell, R. L. Keeney, and H. Raiffa (eds.), Conflicting Objectives in Decisions. New York: Wiley.
Montgomery, H., 1983. Decision rules and the search for a dominance structure: Towards a process model of decision making. In this volume, 343-369.
Parkes, C. M., 1971. Psychosocial transitions: A field for study. Social Science and Medicine, 5, 101-115.
Pitz, G. F., 1983. Human engineering of decision aids. In this volume, 205-221.
Raiffa, H., 1969. Preferences for multiattributed alternatives. Memorandum RM-5868-DOT/RC. Santa Monica: The Rand Corporation.
Savage, L. J., 1954. The Foundations of Statistics. New York: Wiley.
Sayeki, Y., 1972. Allocation of importance: An axiom system. Journal of Mathematical Psychology, 9, 55-65.
Shapira, Z., 1979. Making trade-offs between job attributes. Pittsburgh: Graduate School of Business Administration, Carnegie Mellon University.
Sjoberg, L., 1980. Volitional problems in carrying through a difficult decision. Acta Psychologica, 45, 123-133.
Tversky, A. and D. Kahneman, 1973. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 207-232.
von Winterfeldt, D., 1980. Structuring decision problems for decision analysis. Acta Psychologica, 45, 71-95.
von Winterfeldt, D. and W. Edwards, 1973. Evaluation of complex stimuli using multiattribute utility procedures. Technical Report 011313-2-T. Ann Arbor, Mich.: Engineering Psychology Laboratory, University of Michigan.
Weiss, J. J., 1980. OVAL and GENTREE: Two approaches to problem structuring in decision aids. Technical Report 80-3-97. McLean, VA: Decisions & Designs, Inc.
Wooler, S. A decision aid for structuring and evaluating career choice options. Journal of the Operational Research Society (in press).
Wooler, S. and P. C. Humphreys, 1979. Modelling and aiding career transitions. Technical Report 79-6. Uxbridge, Middlesex: Decision Analysis Unit, Brunel University.
Zeleny, M., 1975. The theory of the displaced ideal. In: M. Zeleny (ed.), Multiple Criteria Decision Making: Kyoto 1975. Berlin: Springer-Verlag.
Section IV
TRACING DECISION PROCESSES
INTRODUCTION

Ola SVENSON and Patrick HUMPHREYS
This section opens with a stimulating theoretical paper by Montgomery, who argues that a decision process may be characterized as a search for good arguments supporting the alternative that is eventually chosen. Thus, the justification of a choice is seen as the ultimate goal of a decision process, and Montgomery suggests one possible way to reach this goal: finding or creating a dominance situation favouring only one choice alternative. A dominance situation is one where the cognitive representation of the decision task is such that one alternative dominates all the others, i.e., it is equally good or better than all the other alternatives on all attributes. Such a situation can be achieved by processes such as counterweighing, ignoring or eliminating differences not supporting the final choice candidate.

The information processing leading to a final decision can be studied in process tracing studies generating think-aloud protocols, eye movement records and so forth. However, such data do not inform us directly about the cognitive information processes but about concurrent verbal or behavioral reactions. The value of think-aloud data may be questioned with regard to their precision, reliability and replicability, and all process tracing data have to be evaluated in relation to the degree to which the information they provide reveals the cognitive processes under study.

The studies by Fidler and Svenson investigate the value of think-aloud data from somewhat different starting points. Fidler investigates concurrent, retrospective and what he calls interpretative verbal protocols. His subjects made choices among alternatives with incomplete information (e.g., the choice alternatives were characterized not only on a common set of attributes but also on a set of attributes unique to each alternative), and the results indicated that concurrent verbalization did not change the internal consistency of a set of decisions but delayed the decision process so that it required more than double the
time in comparison with the same decisions made without verbalization. The consistency of the concurrent verbal reports was found to be higher than the consistency of the retrospective and interpretative verbal reports.

Svenson explores the possibility of measuring the information in verbal protocols, with the ultimate goal of making it possible to perform regular parametric statistical tests on think-aloud data. His subjects judged evaluative statements, sampled from verbal protocols, with respect to degree of attractiveness and attribute specificity vs. attribute generality (e.g., "expensive" is a statement with high specificity because it cannot be used with just any attribute, while "good" has a high generality across attributes). The results provided no support for the hypothesis that the use of simpler decision rules early in a protocol (e.g., a conjunctive rule) generates more negative evaluative statements during the first half of a protocol. However, attribute specificity was significantly greater in the first half than in the second half of a protocol.

Sjoberg's study draws on Bieri's (1961) observation that not only the overall differences between decision alternatives but also the within-alternative attractiveness patterns of the aspects characterizing the decision alternatives are important for pre-decisional information processing. Furthermore, Sjoberg shows that individual differences in differentiating within and between alternatives are important for information processing in an emotionally loaded field, that of quitting smoking or not.

Klayman analyzes information search patterns and shows the difficulties in relating these to decision rules, because the search characteristics of a given decision rule or sequence of decision rules may be variable and task-dependent. He discusses two ways in which analyses of information search patterns may be improved. The first is task-specific simulations to establish the search characteristics associated with different decision rules in different decision situations, and the second is the use of more sophisticated analyses of search characteristics, such as the extent to which future information search is controlled by prior information. Furthermore, Klayman shows that sequential combinations of decision rules and continuous variations of various parameters associated with information processing and decision strategies could be revealed in the more advanced analyses of information search patterns which he advocates. This is a promising direction of research because it attacks the problem of extracting more knowledge from human information search data in relation to a general process model of decision making containing specified decision rules and strategies.

One of the leading themes in Montgomery's paper was the decision maker's need to justify his or her decision. Empirically, this theme is taken
up in Ranyard and Crozier's study of the reasons given for judgements and choices. In particular, they studied bids on individual gambles, choices between two gambles, and decisions among 3 or 5 alternatives. The verbal reasons collected in these situations were classified into different categories. To exemplify the results, absolute value judgements were, as one might expect, most frequently used in the bidding task and least frequent in the two-alternative decision situations. Heuristics used to simplify the task when justifying the choice were almost always of a computational type where, for example, the number of aspects was reduced or the information was converted into a simpler form. Rounding the numbers describing the alternatives was most frequent in the bidding task and least frequent in the binary choice task. More than half of all the heuristics reported concerned difference computations for choice situations with two or more alternatives.

Most often, decisions involve choices among alternatives where we do not have information about the same set of attributes for all alternatives. To exemplify: how is one to choose between two applicants for a research assistant job when one candidate has a good mathematical background and good reports from earlier employers, while the other has a good philosophical background and has never been employed at all? Huber's well designed study is devoted to the question of whether and how decision makers substitute for missing aspects of choice alternatives. The results show that substitution does occur, and that the value of the substituted aspect depends on the decision maker's cognitive representation of the inter-dependencies between attributes. If no such strong presumptions exist, the substitution depends on considerations of whether the missing aspect is better or worse than that available for another alternative on that attribute. The question of information processing for incomplete alternatives is certainly worth more attention in future research.

The papers by Goldsmith and Sahlin and by Smith and Ferrell support this contention in examining aspects of subjective probability structures: how people interpret the inferences they make (as "best guesses") in cases where information is not supplied to them in task instructions but has to be retrieved from memory in forming inferences. To these inferences, subjects may attach, or report, varying degrees of subjective confidence concerning their precision, which have been extensively investigated in the past by, e.g., Lichtenstein, Fischhoff, and Phillips (cf. references in Smith and Ferrell, this volume). Goldsmith and Sahlin talk about such confidence reports as "second order probabilities", and report studies where subjects chose between alternatives, expressed as bets, in which the second order probabilities of the outcomes were varied while the first order probabilities were held constant.
In pre-testing, "first order" probabilities were established by asking for probability estimates in the usual way for events like "there will be thick fog in the centre of Birmingham, England sometime tomorrow morning". Here, (the Swedish) subjects could be expected to give low degrees of confidence (second order probabilities) in the estimates they gave, because they were likely to have very incomplete structural knowledge of the data generating processes for such events available in memory. The probability estimate obtained from a subject for each event of this type was matched against a similar "first order" probability estimate that the same subject assigned to an event generated by a precisely specified data generating process (e.g., sampling from an urn containing defined proportions of balls of different colours). Consistent preferences for bets according to their second order probabilities were found among subjects, although the types of preferences varied across subjects.

In the following paper, Smith and Ferrell investigate a similar problem, viz., how well "calibrated" are subjects' estimates of subjective probabilities? Using a two-parameter (discriminability and response criterion) model derived from signal detection theory, they show how different predictions can be derived in the case where a subject responds in a binary sense ("true" or "false") and then gives a "second order" probability that the response is correct, as compared with the case where the subject assesses directly (at the "first order" level) the probability that a questionnaire item is true. The difference in predictions stems from a consideration of the interaction between task structure and the subject's behaviour (in particular, the trade-off between being "well-calibrated" and wishing to perform well on "true"-"false" judgemental tasks). Smith and Ferrell demonstrate that their model fits experimentally obtained calibration curves very well over a wide variety of judgemental tasks. They consider that a process akin to that implemented in the model seems to underlie the way people translate their perceptions about the truth of propositions into numerical judgements. However, they point out that, in the absence of process tracing methodology, their model does not say anything about the cognitive processes by which "apparent truth" is arrived at.

The final paper in this section, on the surprisingness of coincidences, by Falk and McGregor, suggests that in estimating what Smith and Ferrell call "metacognition" (knowledge about one's knowledge) other parameters must be considered in addition to discriminability and response evaluation. Whereas Smith and Ferrell, and Goldsmith and Sahlin, examined probabilities assigned to particular events viewed in isolation, Falk and McGregor examined judgements concerning degree of surprise (discrepancy between prior opinion and data) for intersections, that is, pairs of events
ded in "coincidence-like stories". Coincidences are surprising by definition in the common usage of the term: intersections of events that are expected on the basis of priors are treated as commonplace rather than coincidences in conventional storytelling. However, the importance of the study comes from the use of a research design permitting the study of changes in the relative degree of judged surprisingness of a story with changes in parameters which affect its psychological impact, without altering the likelihood structure. In particular, subjects were found to judge stories describing coincidences that they theinselves had written to be more "surprising than similar stories written by other subjects or by the experimenters. TWO reasons are offered for this difference in forming the subjective probability structure within which the possible occurrence of an event is assessed: (i) subjects reading stories written by others could view the writer of a coincidence as one of many individuals, and every event a realization of many possibilities; (ii) Subjects, viewing themselves as unique to themselves, may not see how easily they could have been replaced in a set of events or circumstances by another individual. However, in this experiment no verbal protocols were available to serve as a basis for ascertaining how subjects actually construed the subjective probability structures present in their own and the others' stories, and so these possibilities remain a stimulus for further research. In conclusion, we believe that in general the reader will find that this section shows the importance of studying decision making as a process and the value of data from process tracing studies for improving our knowledge about human decision making.
DECISION RULES AND THE SEARCH FOR A DOMINANCE STRUCTURE: TOWARDS A PROCESS MODEL OF DECISION MAKING*

Henry MONTGOMERY
Department of Psychology, University of Goteborg, Sweden
Abstract

A number of problems associated with non-compensatory and compensatory decision rules are discussed. It is suggested that these problems could be avoided if the rules are seen as operators in a search for a dominance structure, that is, a cognitive structure in which one choice alternative can be seen as dominant over the others. The search for a dominance structure is assumed to go through four stages, viz., pre-editing, finding a promising alternative, dominance testing, and dominance structuring. Each of these stages is related to particular decision rules. Finally, directions are suggested for future research based on the framework presented in the paper.
Introduction

This paper was inspired by the idea that decision making is a search for good arguments. As pointed out by several researchers, people wish to be able to justify their decisions, that is, to have easily understandable reasons why they act as they do (e.g., Slovic, 1975; Tversky, 1972). They also need a criterion for knowing when they are ready for a decision. I believe that a decision is often preceded by a "click" experience, a feeling of confidence: now I know what to do. This "click" experience emerges when the decision maker finds arguments that are strong enough for making a decision. Numerous decision rules have been suggested for how people choose among multiattribute alternatives (for a review, see Svenson, 1979).
* This study was financed by a grant from the Swedish Council for Research in the Humanities and Social Sciences. I am indebted to Hannes Eisler, Per-Håkan Ekberg, Eric Johnson, and Ola Svenson for their comments on earlier versions of the paper.
Below it is argued that one of these rules is more fundamental than the others in the decision maker's search for good arguments. This rule is the dominance rule, which states that one should always choose an alternative which is not worse than the other alternatives on any attribute and better on at least one attribute. The dominance rule is a cornerstone in theories of rational decision making (e.g., Edwards et al., 1966). The problem with this rule is that in most decision situations we do not find an alternative which, strictly speaking, dominates all other alternatives. What will the decision maker do in such a situation? It is assumed here that he or she will attempt to create dominance by changing his or her representation of the decision situation such that one alternative becomes dominant. Put differently, the decision process is seen as a search for a dominance structure. To create such a structure, the decision maker could change his evaluation of alternatives on particular attributes. Alternatively, he could eliminate, neutralize or counterbalance old attributes, or attend to or introduce new ones. A dominance structure is then equivalent to a representation in which one alternative has at least one advantage compared to the other alternatives, and where all disadvantages associated with that alternative are neutralized or counterbalanced in one way or another.

There are two points which should be clear right from the beginning regarding the dominance structure concept as it is used in this paper. First, a dominance structure can be more or less close to pure dominance. The more disadvantages of the chosen alternative that have to be neutralized or counterbalanced, the greater is the distance to pure dominance. Second, a dominance structure may reflect varying degrees of rationality/irrationality in the decision process. As is shown below, there are many ways of construing a dominance structure. Some of these ways are sensitive to wishful thinking and other cognitive distortions, whereas others are particularly suitable for sound, reality-oriented thinking.

Below, I present a tentative model for decision processes seen as a search for a dominance structure. This search is assumed to be compatible with using various decision rules besides the dominance rule. More exactly, these other decision rules are assumed to serve local functions in the decision process by acting as operators in the search for a dominance structure. For example, one rule may be used for excluding a particular alternative which has a low probability of becoming dominant, while another rule may be used for neutralizing a disadvantage of a promising alternative. The decision rules referred to in the following discussion are listed in the Table. All rules presuppose that a decision situation consists of a number of choice alternatives which can be described in terms of subjectively defined dimensions or attributes (cf. Montgomery and Svenson, 1976). The attributes may be on any metric level, ranging from presence/absence
Examples of Decision Rules*

Dominance rule (DOM): Choose alternative A1 over A2 if A1 is better than A2 on at least one attribute and not worse than A2 on all other attributes.

Conjunctive rule (CON): Choose only alternatives which exceed or are equal to all of a set of criterion values Ci on the attributes.

Disjunctive rule (DIS): Choose only alternatives which exceed or are equal to at least one of a set of criterion values Di on the attributes.

Lexicographic rule (LEX): Choose alternative A1 over A2 if it is better (or significantly better) than A2 on the most important attribute. If this requirement is not fulfilled, base the choice on the most attractive aspects of the attributes next in order of importance, etc.

Elimination by aspects rule (EBA): Exclude all alternatives which do not exceed a criterion Ci on the most important attribute. Repeat this procedure with new attributes in order of importance.

Maximizing number of attributes with a greater attractiveness rule (MNA): Choose A1 over A2 if A1 differs favorably from A2 on a greater number of attributes than the number of attributes on which A2 differs favorably from A1.

Addition of utilities rule (AU): Choose the alternative with the greatest sum of (weighted) attractiveness values (utilities) across all attributes.

Addition of utility differences rule (AUD): Add "differences" Dk = f(a1k − a2k), where ajk is the attractiveness of aspect jk of alternative j on attribute k and f is a continuous function of a1k − a2k. If the sum of these "differences" is positive, choose A1; if it is negative, choose A2.

* The choice requirements are sometimes described in terms of two alternatives, A1 and A2, but could in all these cases easily be generalized to more alternatives.
of a particular feature (e.g., bath-tub/no bath-tub in flats) to ratio scale measures (e.g., the space of flats in square metres). The values on each attribute are referred to as aspects (e.g., the particular size of a home on the attribute size of homes). Each aspect is assumed to be experienced as more or less attractive by the decision maker. Hence, it is proposed that the aspects can be mapped onto an attractiveness scale. This scale is specific to each attribute, and the values on the scale need not be commensurable across attributes.

The rules in the Table include five non-compensatory rules (the DOM, CON, DIS, LEX, and EBA rules) and three compensatory rules (the MNA, AU, and AUD rules). The non-compensatory rules do not allow an unattractive aspect on one attribute to be compensated by an attractive aspect on another attribute (or vice versa). In contrast, the compensatory rules require that drawbacks and advantages on different attributes are integrated into a total attractiveness measure. One important class of rules is not listed in the Table, namely, rules which primarily correspond to decision making under risk or uncertainty. Still, the present framework is assumed to be valid also for decision making under risk or uncertainty. Actually, as is shown later in this paper, probabilistic features of a decision situation yield particular possibilities for creating a dominance structure. However, the applicability of various rules for decision making under risk, such as the EV, EU and SEU models, to the present framework is, in my view, unclear or questionable. This is partly because of current doubts about some or all of these rules as descriptive models of decision making (Fischhoff et al., 1981; Montgomery and Adelbratt, 1980; Kahneman and Tversky, 1979).

The remainder of this paper consists of five sections. In the first and second sections I discuss a number of problems associated with non-compensatory and compensatory decision rules, respectively. In the third section different phases of a decision process are described and related to various decision rules. In the fourth section it is shown how problems associated with non-compensatory and compensatory rules could be avoided if the rules are seen as operators in the search for a dominance structure. In the fifth section, I discuss research ideas that might be generated from the present framework.

Problems with Non-Compensatory Rules

Most non-compensatory rules are probably easier to apply for reaching a decision than the compensatory ones. Usually, they only require ordinal relationships between attractiveness values, and most of them do not
require commensurability of values across different attributes (cf. Montgomery and Svenson, 1976; Svenson, 1979). There is, however, a price to pay for this simplicity. First, non-compensatory rules do not always yield a unique solution, implying that their applicability is limited. Second, using these rules implies a risk of neglecting important information.
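As an illustration of how the rules in the Table can be read as operators over an attractiveness matrix, the Python sketch below implements two of them, the dominance rule and the conjunctive rule. The alternatives, attractiveness values and criteria are hypothetical, and the example is only a sketch; it also shows that neither rule need single out a unique alternative, which anticipates the applicability problem discussed next.

```python
# Hypothetical attractiveness values (higher is better) for three
# alternatives on three attributes, plus conjunctive criteria.
alternatives = {
    "A1": [7, 5, 6],
    "A2": [6, 5, 4],
    "A3": [8, 3, 9],
}
conjunctive_criteria = [5, 4, 5]


def dominates(a, b):
    """DOM rule: a dominates b if it is at least as attractive on every
    attribute and strictly better on at least one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))


def dominance_choice(alts):
    """Return the alternatives that dominate every other alternative
    (often there are none, which is the applicability problem)."""
    return [name for name, a in alts.items()
            if all(dominates(a, b) for other, b in alts.items() if other != name)]


def conjunctive_choice(alts, criteria):
    """CON rule: keep alternatives meeting or exceeding every criterion."""
    return [name for name, a in alts.items()
            if all(x >= c for x, c in zip(a, criteria))]


print(dominance_choice(alternatives))                          # [] -- no alternative dominates all others
print(conjunctive_choice(alternatives, conjunctive_criteria))  # ['A1']
```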
Limited Applicability

Decision rules may be more or less applicable. A decision rule is completely applicable when it can be used for selecting one, and only one, alternative. However, all non-compensatory rules listed in the Table require a certain structure of the decision situation in order to be applicable in this sense. As mentioned previously, the DOM rule requires that one alternative dominates all the other alternatives. The CON and DIS rules require one, and only one, alternative to be above the conjunctive or disjunctive criteria on all attributes (CON rule) or on at least one attribute (DIS rule). The EBA and LEX rules both imply that the alternatives are eliminated in a stepwise fashion. To be completely applicable, these rules require that there be a stage in the elimination procedure where one, and only one, alternative is acceptable (EBA rule) or (significantly) more attractive on a particular attribute (LEX rule) as compared to the other remaining alternatives.

Problems with applying a particular decision rule could be counteracted by changing the representation of the decision situation. Attributes could be redefined, criteria used in particular rules could be changed, etc. Actually, the decision procedure to be suggested in the present paper implies that the representation of the decision situation is changed, if necessary, to make a particular rule, the DOM rule, applicable. However, the procedure also implies that the process of changing the representation requires its own decision rules, of which some may be compensatory rules.
Neglect of Important Information

The fact that non-compensatory rules do not allow drawbacks to be compensated by advantages (or vice versa) implies a risk that important information is neglected. This is particularly the case if rigid criteria are assumed with these rules. Consider, for instance, a situation where one alternative is just below a conjunctive criterion on one attribute but extremely attractive on all other attributes, and where another alternative is mediocre but acceptable on all attributes. The CON rule would then prescribe a choice of the second alternative, but it seems reasonable that in
most situations of this type people would prefer the first alternative. That is, one would argue that the fact that the first alternative is below the conjunctive criterion on one attribute is compensated by all the advantages this alternative has on other attributes compared to the second alternative. Recently, Einhorn et al. (1979) presented a number of arguments for the necessity of making compensatory judgements in some situations.
Problems with Compensatory Rules

Compensatory rules can, at least theoretically, be used in all situations. Usually, they allow the decision maker to consider all information which is relevant for his decision. There are, however, at least four problems associated with compensatory rules. They are: (a) compensatory rules may require too complex value judgements; (b) it may be difficult to have a good overview of arguments based on compensatory rules; (c) the overall attractiveness measures associated with compensatory rules may be experienced as too abstract; (d) compensatory rules emphasize that one has to give up certain good things to get other good things, and people hate the thought of giving up anything.
Complexity of Value Judgements

A traditional view of a decision situation is that each choice alternative may be described in terms of two components, namely, (a) the attractiveness (or utility) of possible consequences of the alternative (i.e., aspects) and (b) the probability of each consequence. Hence, the consequences are regarded as uncertain but not their attractiveness. In reality, however, people are often uncertain about how to evaluate the aspects in a decision situation (cf. Fischhoff et al., 1981). There are several reasons for this. The aspects may be new to the decision maker and, therefore, difficult to evaluate. They may correspond to future events, which implies that the decision maker may have to predict his emotional reactions to these events in order to be able to evaluate them. Needless to say, these predictions may be uncertain. There is also the well-known problem of comparing the attractiveness of events in the near and distant future, respectively. Finally, it may be difficult to compare the attractiveness of an aspect corresponding to many events (e.g., the total discomfort of wearing a seatbelt on each car ride) to the attractiveness of a single event (e.g., not being hurt in a car accident).
The fact that we have difficulties in evaluating the aspects of a decision situation cannot free us from the burden of making these evaluations. It is clear, however, that compensatory rules require more complex value judgements than non-compensatory rules do. That is, they require comparisons of attractiveness values across different attributes, whereas the non-compensatory rules only require comparisons within an attribute. The former type of value judgements may be difficult to make, as exemplified above. By contrast, people usually have no problems in comparing aspects within an attribute, at least on an ordinal level. This is because the general direction of the attractiveness function for a particular attribute usually is clear to people. Paying much money is usually worse than paying little money. A big flat is usually better than a small flat, and so on.
The Overview Problem

To be able to justify a decision it is important to have a good overview of the arguments for and against different choice alternatives. That is, it should be possible to keep as much as possible of the arguments within the limits of the decision maker's short-term memory. Compensatory rules, as they are described in the literature, usually do not contain any restrictions on the amount of information that should be taken into account by the decision maker. Hence, choices based on these rules may be difficult to justify when the decision situation contains a large number of aspects. Non-compensatory rules, on the other hand, tend to focus on one or a few attributes and may, hence, lead to decisions which are easy to justify. For example, in order to choose an alternative according to the disjunctive rule, it may be sufficient to find one attribute on which only one alternative exceeds the disjunctive criterion.
Lack of Concreteness

The overall attractiveness measures associated with compensatory rules tell us very little about the underlying pattern of attractiveness values. Therefore, these measures may be regarded as abstract. Non-compensatory rules, on the other hand, are based on more concrete information since they require that the chosen alternative should exhibit a certain pattern of attractiveness values in relation to the other alternatives. They require, for example, that the chosen alternative should have at least one advantage and no disadvantages in relation to other alternatives (DOM rule) or that it should be the only alternative with acceptable aspects on all attributes
(CON rule). If people experience the compensatory rules as too abstract, they may prefer non-compensatory rules even in situations with a small number of attributes and with well defined aspects. That people indeed may prefer non-compensatory rules in such situations has been found in a number of experiments (e.g., Lichtenstein, Slovic, and Zink, 1969; Montgomery, 1977; Montgomery and Adelbratt, 1980; Tversky, 1969, 1972).

The Giving up Problem

Since compensatory rules require trade-off judgements across different attributes, they emphasize the painful fact that we sometimes have to give up something in order to get something else. It seems clear that people do not like such conflicts and that they tend to experience the world in such a way that they avoid seeing these conflicts (Festinger, 1957, 1964; Humphreys and McFadden, 1980; Sjoberg, 1981; Janis and Mann, 1977). Often people simply hate to admit that they have to give up something to get something else. This may lead to distorted world views or what Sjoberg (1981) called "frozen attitudes". However, as is shown below, there are ways of reducing the pain of giving up something without losing contact with reality.
Phases in a Decision Process
We have now discussed problems with both non-compensatory and compensatory rules. However, it is obvious that both types of rules have some psychological reality. Different types of data, such as think-aloud data (Montgomery, 1977; Payne, 1976; Svenson, 1974), choice data (Tversky, 1969, 1972), judgement data (Einhorn, 1970, 1971), and ratings of the applicability of various decision rules (Adelbratt and Montgomery, 1980), illustrate convincingly the descriptive and predictive validity of various decision rules. Still, I believe that no decision rule, except (in most cases) the DOM rule, is a pure decision rule in the sense that it by itself is sufficient for determining whether a particular choice alternative is chosen or not. Instead, the rules correspond to operations that serve other, more auxiliary or local functions in a decision process. To elucidate what may lie behind various decision rules, I will now distinguish between four phases in a decision process. These are pre-editing, finding a promising alternative, dominance testing and dominance structuring. All four phases involve search for and evaluation of information in terms of aspects, attributes and alternatives. Each phase is associated with a particular goal that the decision maker has to attain in order to, eventually, find or
construct a dominance structure. The goals are associated with various operations corresponding to certain patterns of search for information and evaluation of information. These operations are often closely related to certain decision rules. The Figure gives an overview of how a decision process may be organized in terms of the present framework. As can be seen, the decision process is viewed as following a flowchart. This format should not be taken too literally, however. In reality there may be fuzzy border-lines between the activities described in the model.
[Figure. A Dominance Search Model of Decision Making. The flowchart links four phases: pre-editing (selecting and evaluating attributes; screening: CON, EBA), finding a promising alternative (DIS, LEX, EBA), dominance testing, and dominance structuring (de-emphasizing: LEX; bolstering: DIS, CON; cancellation: MNA, AUD; collapsing: AU), with loops back to find a new alternative within the current representation or to postpone the decision. The expressions within parentheses (CON, EBA, etc.) stand for decision rules assumed to be closely related to a particular decision making phase.]
Moreover, the temporal order between the activities in the model is not strictly deterministic, but corresponds rather to stochastic tendencies which need to be explored in future empirical research. The decision process, as it is described in the Figure, is assumed to start with the problem of choosing one out of several alternatives. Exactly how the decision problem has arisen is outside the scope of this paper. On the other hand, the definition of the problem once it has arisen is an important part of the model in the Figure. The model may be described in terms of the goals that are associated with each decision making phase. The goal of the pre-editing phase, which is assumed to occur primarily in the beginning of the decision process, is mainly to delimit the decision problem by selecting those alternatives and attributes that should be included in the dominance structure. The goal of the activities associated with finding a promising alternative, which is the next phase, is to find an alternative that has a reasonable chance of being seen as dominant over the other alternatives selected in the pre-editing phase. When a promising alternative has been found, the decision maker continues to the dominance testing phase. The goal of this phase is to find out whether the promising alternative can actually be seen as dominant over the other alternatives. The dominance structuring phase, finally, can be regarded as a sub-routine to the dominance testing phase. The operations in this phase are performed when it has been found that a promising alternative violates a dominance structure. The goal of the dominance structuring phase is then to neutralize such a violation. If this succeeds, the decision maker continues testing whether the promising alternative can be regarded as dominant until all information selected in the pre-editing phase has been evaluated. If, on the other hand, dominance structuring fails, the decision maker may return to the pre-editing or finding a promising alternative phase and attempt to redefine the decision problem and/or to find a new promising alternative. Alternatively, the decision maker could postpone the decision, if this is possible. I will now discuss each of the four decision making phases in some detail.
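Read as an algorithm, the model in the Figure amounts to the control structure sketched below. This is a schematic rendering only: the functions pre_edit, find_promising, test_dominance and restructure are placeholders standing in for the operations discussed in the following sections, not procedures specified in the text.

```python
def dominance_search(alternatives, attributes,
                     pre_edit, find_promising, test_dominance, restructure,
                     max_attempts=10):
    """Schematic control flow of the dominance search model.  The four
    callables are caller-supplied stand-ins for the phases described in
    the text; only the ordering of the phases is represented here."""
    # Pre-editing: delimit the problem to selected alternatives/attributes.
    candidates, salient = pre_edit(alternatives, attributes)

    for _ in range(max_attempts):
        # Finding a promising alternative among the screened candidates.
        promising = find_promising(candidates, salient)
        if promising is None:
            return None                      # postpone the decision

        # Dominance testing: collect violations of dominance, if any.
        violations = test_dominance(promising, candidates, salient)

        # Dominance structuring: try to neutralize every violation.
        if all(restructure(promising, v) for v in violations):
            return promising                 # a dominance structure was found

        # Otherwise try a new promising alternative (or, in a fuller
        # model, redefine the problem representation).
        candidates = [c for c in candidates if c != promising]
    return None
```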
Pre-Editing

The main goal of this phase can be described as separating relevant information from less relevant information which can be discarded in subsequent information processing. The activities in this phase can also give rise to
finer distinctions which could be used for setting up priorities for how the information will be handled in subsequent decision making phases. This phase is called pre-editing to stress that the activities in this phase provide the basis for subsequent operations, but also to distinguish them from the editing or structuring operations which are performed in the dominance structuring phase. Recently, Kahneman and Tversky (1979) argued for the existence of an editing phase in decision making. As is the case with pre-editing in the present model, the goal of Kahneman and Tversky's editing phase is to concentrate the representation of a decision situation into its most essential features. However, the editing operations described by Kahneman and Tversky are more similar to some of the dominance structuring operations described below than to pre-editing in the present model. (Note, though, that Kahneman and Tversky do not assume that their editing operations are instrumental to the finding of a dominance structure.) The operations in the pre-editing phase are of two types, viz., selecting and evaluating attributes, and screening of alternatives.
Selecting and evaluating attributes. Several process tracing studies have found that decision makers evaluate the importance of attributes in their representation of the choice situation (Huber, 1980; Payne, 1976; Svenson, 1974). It might be assumed that the function of these evaluations is to determine the actual importance of different attributes in subsequent information processing. Some evaluations may even result in discarding certain attributes from further consideration for all or some alternatives (cf. Johnson and Russo, 1981; Payne, 1976). That there is indeed a relationship between the judged importance of an attribute and its actual importance in the decision process has been shown by Slovic (1975) and Huber (1980). However, there are also factors other than judged importance which might affect the extent to which an attribute plays a role in the decision process. Slovic and MacPhillamy (1974) found that subjects weighted attributes more heavily when they were common to all alternatives than when they were unique. Tversky (1969) reported findings implying that the reliability or discriminability of values on an attribute is positively related to the attribute's salience in the decision process. To the author's knowledge, no general decision rule has been offered in decision making research for how people find or select important attributes. On the other hand, some decision rules, such as the EBA and LEX rules, and the AU and AUD rules (with weighting), are based on the idea that attributes vary in importance and that the importance determines the salience of particular attributes in subsequent information processing.
Screening. This operation is the counterpart of the selecting and evaluating attributes operation with regard to the alternatives in a choice
situation. However, the screening operation is assumed to function more in an either-or fashion inasmuch as it is focused either on finding acceptable or interesting alternatives or on discarding or rejecting non-acceptable or uninteresting alternatives. The main function of screening in the present model is to select alternatives with some chance of becoming dominant or to discard alternatives which have a very small chance of being seen as dominant over other alternatives. Discarding an alternative does not necessarily imply that the alternative is totally neglected in the following decision process. On the contrary, the decision maker may keep some discarded alternative in mind in later stages of the decision process to make it possible to check whether one alternative indeed dominates, in some sense, over other alternatives. This idea is indirectly supported by Tyszka's (1983) recent finding that alternatives which dominate over a particular alternative tend to be preferred to other alternatives which do not dominate over the alternative in question. These results imply that a decision may be facilitated by having access to an alternative which is dominated by another alternative. Screening also reflects the fact that people do not accept just any alternative. Usually, certain minimum requirements should be fulfilled by those alternatives that are experienced as possible candidates for the final choice (Simon, 1955). Sometimes, no alternative fulfills such minimum requirements. The decision maker may then attempt to find new alternatives which may be selected as acceptable (cf. Corbin, 1980). The alternative which finally is chosen is always assumed to have been previously selected as acceptable by the decision maker. If the screening operation has resulted in finding at least one acceptable alternative, it ends when all available alternatives have been checked or when the set of accepted alternatives has reached some optimal size. The latter case implies that the psychological (or economic) cost of searching for and considering new choice alternatives may be of importance in deciding when to stop the screening process. (For a discussion of related problems, see Corbin, 1980.) The psychological reality of the screening operation is backed up by results from several studies which suggest that at an early stage of the decision process people tend to discard alternatives which are unattractive or not acceptable on one or several attributes (e.g., Payne, 1976; Svenson, 1974; Wright and Barbour, 1977). Hence, it seems that this initial discarding of alternatives follows the CON or EBA rules. However, the evidence is not conclusive enough to exclude that other decision rules may also be used for discarding or selecting alternatives. For example, an alternative that does not pass a CON or an EBA test may not necessarily be discarded. As noted above, people may accept alternatives which do not seem to be
acceptable on some attribute if this drawback is compensated by very attractive aspects on other attributes. Thus, the selection of an acceptable alternative could be associated with compensatory decision rules, and particularly the AU rule, which is focused on comparisons within an alternative. It could also be the case that people discard alternatives which are acceptable on all attributes but not particularly attractive on any attribute. Such a line of reasoning would imply that people discard alternatives which do not fulfill the requirements of the DIS rule. In conclusion, several decision rules may be associated with the screening operation, although the CON and EBA rules seem to be most closely related to this operation.
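One possible concrete reading of the screening operation is sketched below. The CON-like acceptance test, the optional target size for the accepted set, and the retention of discarded alternatives are illustrative assumptions based on the description above, not a procedure specified in the text.

```python
def screen(alternatives, minimum, target_size=None):
    """Screening: accept alternatives meeting minimum requirements on all
    attributes (a CON-like test) and keep the rejected ones around, since
    a discarded alternative may still serve as a reference in later
    dominance comparisons."""
    accepted, discarded = {}, {}
    for name, aspects in alternatives.items():
        if all(aspects[attr] >= c for attr, c in minimum.items()):
            accepted[name] = aspects
        else:
            discarded[name] = aspects
        if target_size is not None and len(accepted) >= target_size:
            break    # the cost of further search outweighs the benefit
    return accepted, discarded

homes = {                                    # invented attractiveness values
    "A": {"rent": 0.7, "size": 0.8},
    "B": {"rent": 0.6, "size": 0.3},
    "C": {"rent": 0.2, "size": 0.9},
}
accepted, discarded = screen(homes, {"rent": 0.5, "size": 0.5})
print(list(accepted), list(discarded))       # ['A'] ['B', 'C']
```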
Finding a Promising Alternative

The model in the Figure implies that a decision process is regarded as a search for and testing of hypotheses about which alternative is better than the others or, more specifically, which is the alternative that most naturally can be seen as dominant over the others. It seems reasonable that when the decision maker finds an alternative to test with regard to its superiority over other alternatives, he or she has some belief or hope that this alternative actually is better than the others. For this reason such an alternative is henceforth denoted as a promising alternative. It is assumed that as a rule only one alternative at a time is tested with regard to its superiority. Because of this it is natural to distinguish between the pre-editing phase, which results in several acceptable or interesting alternatives, and the present phase, in which one of these alternatives is picked out as a promising alternative to test with regard to its superiority. However, dependent on the outcome of subsequent decision making phases, the individual may switch rapidly between different alternatives which he finds promising, one at a time. What makes an alternative promising? It seems reasonable that the more attractive an alternative is on any attribute, the greater is the chance that the alternative is experienced as promising. This might particularly be the case for important attributes (Slovic, 1975). Two of the decision rules referred to in the Table, viz., the DIS and LEX rules, are focused on the most attractive aspects in a decision situation and could, hence, be used for selecting a promising alternative. The EBA rule may also be relevant insofar as consistent usage of this rule implies finding an alternative that is more attractive on a particular attribute than other alternatives remaining after the screening phase. In fact, the EBA rule provides a comprehensive procedure for both the
screening and the finding of a promising alternative phases. Hence, the first attribute (or attributes) on which the rule is applied may be related to the screening phase, whereas the final attribute on which the rule is applied, that is, an attribute on which only one remaining alternative exceeds the criterion Cj (see Table), may be related to the finding of a promising alternative phase. There are other issues besides decision rules which we have to discuss in order to explain what makes an alternative promising. However, these issues are also related to pre-editing and are discussed in the following section.
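This two-stage use of the EBA rule might be rendered as follows (the attribute ordering, criteria and data are invented): attributes are processed in order of importance, alternatives below the criterion are eliminated, and the attribute on which only one alternative survives marks the transition from screening to the finding of a promising alternative.

```python
def eba(alternatives, ordered_criteria):
    """Elimination by aspects: process attributes in order of importance,
    eliminating every alternative below the criterion C_j, and stop as
    soon as only one alternative remains."""
    remaining = dict(alternatives)
    for attr, c in ordered_criteria:
        passed = {n: a for n, a in remaining.items() if a[attr] >= c}
        if passed:                  # never eliminate all alternatives at once
            remaining = passed
        if len(remaining) == 1:     # this attribute singles out the promising one
            break
    return list(remaining)

homes = {                           # invented attractiveness values
    "A": {"rent": 0.7, "size": 0.8, "location": 0.4},
    "B": {"rent": 0.6, "size": 0.3, "location": 0.9},
    "C": {"rent": 0.2, "size": 0.9, "location": 0.7},
}
# "rent" acts as a screening attribute; "size" then yields the promising alternative.
print(eba(homes, [("rent", 0.5), ("size", 0.5)]))   # ['A']
```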
General Discussion of Pre-Editing and Finding a Promising Alternative: Directionality of Decision Processes

The two pre-editing operations and the finding of a promising alternative may be more or less dependent on each other. For example, screening and the finding of a promising alternative may depend on the selection and evaluation of attributes. This is because the two former processes may be based on aspects on important or salient attributes. There may also be dependencies between screening and the finding of a promising alternative. For example, the finding of a very promising alternative may make it easier to discard other, less promising alternatives. Conversely, the finding of an appalling alternative (screening) may make it easier to experience some other alternative as promising. It is interesting to note that exactly these dependencies could be predicted from Helson's (1964) adaptation-level theory or Parducci's (1965) related range-frequency model. The preliminary phases of a decision process discussed so far may involve more or less constructive activities. Sometimes alternatives and attributes are presented to the decision maker in a clear-cut manner, e.g., in terms of an alternatives-by-attributes matrix. This is often the case in laboratory experiments on decision making. In such situations the selection of important or salient attributes and of acceptable alternatives, as well as of a promising alternative, may be done in a straightforward manner, i.e., with a minimum of constructive activity. In other situations, the decision maker is confronted with ill-defined alternatives and attributes, which implies that he must actively search for or construct interesting or acceptable attributes and alternatives. He may even attempt to
construct very unattractive alternatives in order to make it easier to accept other, less unattractive alternatives. Pre-editing and the finding of a promising alternative imply that the decision process acquires a certain directionality in the sense that certain alternatives and attributes will receive more attention than others in subsequent decision making phases. The directionality of the process may be determined more or less consciously. Shifts in the directionality may occur several times in the process, particularly when the decision maker fails to find a dominance structure for a promising alternative. The force of the directionality may vary across individuals and situations. By strong directionality I mean cases where the decision maker is heavily committed to only one alternative and only attends to information which supports that alternative. By contrast, the directionality is weak when the decision maker attends to both positive and negative evidence associated with a promising alternative and when he is prepared to search for a new promising alternative if the negative evidence against the current promising alternative is too strong. The directionality may be particularly strong in stress situations of various kinds (Janis and Mann, 1977; Sjoberg, 1980, 1981) or when high values are at stake. An internal factor that may strengthen the directionality is what Sjoberg (1980) called "images", such as certain experiences with a strong emotional tinge. In general, when the directionality is strong it may be more influenced by affective reactions or "hot cognitions". Zajonc (1980) recently argued that decision making, as well as preferences in general, is determined by a sort of affective reaction, which he assumed to be fairly independent of the cognitive processes involved in recognizing a particular object. Support for this idea was taken from a series of experimental results which indicated that people can reliably discriminate between new and old objects in terms of affective judgements (like-dislike ratings) in the total absence of recognition memory (old-new judgements). However, perhaps in contrast to Zajonc, I believe that there are decision situations where affective reactions play a minor role. In these cases directionality is rather weak and takes the form of an impartial hypothesis testing procedure without any strong bias toward or against particular choice alternatives. For example, the decision maker could attempt to find a dominance structure for each of a number of alternatives in turn and finally choose that alternative for which a dominance structure is most easily found. Whether the directionality is strong or weak, it might be appropriate to describe the underlying process in terms of particular decision rules. For instance, regardless of whether the bias toward a certain alternative is
strong or weak, this bias might be due to the fact that the decision maker is attracted by a particular aspect of that alternative, which would be equivalent to being guided by the DIS rule or, perhaps, the LEX rule. The decision rule concept would be less fruitful if Zajonc (1980) is correct in his hypothesis that preferences in general are guided by gross, vague, and global stimulus features. However, Zajonc's support for this particular hypothesis is fairly indirect and primarily restricted to certain visual stimuli, such as colors and faces. There are some situations, though, besides preferences among colors and faces, where a description in terms of decision rules gives an epiphenomenal view of what is really going on. Often the real reason why people feel inclined to choose, or not to choose, a particular alternative is that other people support or are against the choice of this alternative, or that there is some norm or cultural habit which prescribes or prohibits the choice of that alternative. Such factors may determine the selection of a promising alternative as well as the strength of the directionality of subsequent information processing. Norms, habits or other people's opinions could, of course, be defined as aspects with a certain degree of attractiveness and in this way be regarded as equivalent to other aspects in the decision maker's representation of the choice situation. That is, if people in general like alternative A, this may be regarded as an attractive aspect of this alternative, and the fact that alternative B does not follow a particular norm may be seen as an unattractive aspect of that alternative. Such an approach, however, obscures the fact that these factors may not (only) be experienced as part of the choice alternatives but rather as means for facilitating the evaluation of the choice alternatives. Hence, factors like norms and habits may serve a function similar to that of decision rules or even replace them as guides for people's decisions. This line of reasoning could be related to Abelson's (1976) idea that decision making is guided by scripts of varying degrees of generality. A decision rule could be regarded as a very general script whereas a norm is more specific.
Dominance Testing

After a promising alternative has been found, the next logical step on the way to finding a dominance structure is to test whether the promising alternative can be seen as dominant over the other alternatives. The fact that an alternative is promising implies that this alternative probably has some advantage in relation to other alternatives when it enters into the dominance testing phase. Hence, the most critical function of the dominance testing phase may be to find out whether a promising alternative has any
disadvantages in relation to other alternatives. There are two types of evaluations which could be used for the dominance testing. Both types of evaluations also occur in the previous decision making phases. One type concerns "absolute" evaluations of single aspects on a particular attribute related to some criterion on that attribute, e.g., "the rent for apartment A is high". The other type corresponds to comparative evaluations between aspects on the same attribute across different alternatives, e.g., "the rent of apartment A is higher than for apartment B". If the former type of evaluations results in highly positive attractiveness values, this may be seen as supporting a dominance structure for the promising alternative. Comparative evaluations, on the other hand, support a dominance structure for a promising alternative as long as they result in judgements of higher or equal attractiveness for the promising alternative on particular attributes as compared to other alternatives. A "pure" dominance structure, i.e., a structure in accordance with the dominance rule as it usually is formulated, should be based on the latter type of judgements and not on absolute judgements. However, when absolute attractiveness values are high for a particular alternative on particular attributes, the likelihood of a violation of dominance in its pure sense for that alternative on the attributes in question may be low. Moreover, when the number of alternatives is large, the decision maker may save a considerable amount of processing time by making absolute evaluations of aspects associated with a promising alternative rather than comparing this alternative with many others. It might thus be expected that the proportion of absolute evaluations within an alternative, or more generally the proportion of intra-alternative judgements, will increase with the number of alternatives. This is indeed what has been found empirically in a number of studies (cf. Svenson's 1979 review). As shown in the Figure, the dominance testing phase ends definitively when the decision maker has evaluated, as he or she sees it, all relevant information and has succeeded in eliminating or counterbalancing all violations, if any, of dominance for the current promising alternative. "All relevant information" could cover more or less completely the information selected in the pre-editing phase. That is, depending on, for instance, the extent to which the decision maker is committed to the promising alternative, he or she may "forget" to evaluate facts which may violate dominance for the alternative. The dominance testing phase is obviously related to the DOM rule. However, the decision maker's actual handling of the information in this phase, together with its "sub-routine" dominance structuring, may come close to several other decision rules. In an extreme case, the decision maker may immediately choose a promising alternative only because it is
superior on an important attribute. Hence, in this situation, he or she neglects to evaluate the promising alternative on other attributes, which implies that the finding of a promising alternative and the dominance testing phase will coincide and that the decision is made according to the LEX rule. A number of decision rules may be associated with the dominance structuring phase, which is discussed next.
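A sketch of the dominance testing phase along these lines is given below (the numeric threshold for what counts as a highly positive absolute evaluation is an invented illustration): comparative evaluations are made attribute by attribute, and a violation is recorded whenever the promising alternative is less attractive than some other alternative, unless its absolute attractiveness on that attribute is already high.

```python
def test_dominance(promising, others, high=0.8):
    """Collect possible violations of dominance for a promising alternative.
    Attributes on which the promising alternative is already highly
    attractive are accepted on an 'absolute' evaluation (threshold
    invented for illustration); on the remaining attributes a comparative
    evaluation against every other alternative is made."""
    violations = []
    for attr, value in promising.items():
        if value >= high:
            continue                          # absolute evaluation suffices
        for name, other in others.items():
            if other[attr] > value:           # comparative evaluation fails
                violations.append((attr, name))
    return violations

promising = {"rent": 0.9, "size": 0.4, "location": 0.7}   # invented values
others = {"B": {"rent": 0.5, "size": 0.6, "location": 0.6}}
print(test_dominance(promising, others))      # [('size', 'B')]
```

An empty list of violations corresponds to a dominance structure being found at once; each recorded violation is what the dominance structuring operations below attempt to neutralize.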
Dominance Structuring
Dominance structuring aims more or less directly at eliminating or neutralizing all violations of dominance for a promising alternative that have been found in the dominance testing phase. Four types of operations may be applied in the dominance structuring phase. These are de-emphasizing, bolstering, cancellation and collapsing.
De-emphasizing. This operation implies that the decision maker de-emphasizes the importance of an attribute or of differences across alternatives on a particular attribute. The result may be that an entire attribute or a subset of aspects on an attribute is deleted from the decision maker's representation of the choice situation. In this respect de-emphasizing is similar to the evaluation of attributes operations in the pre-editing phase. In fact, de-emphasizing may function as a second try to eliminate relatively unimportant attributes or relatively small differences between aspects on an attribute. However, in contrast to pre-editing, de-emphasizing is instrumental to the finding of a dominance structure for a particular alternative, i.e., the promising alternative. That is, de-emphasizing implies that the decision maker attempts to delete violations of dominance for the promising alternative and nothing more. Because of this, de-emphasizing may be more strongly focused on eliminating differences on particular attributes across alternatives (the promising alternative compared to others) than on eliminating attributes per se, as compared to the evaluation of attributes in the pre-editing phase. De-emphasizing may be brought about in at least three ways. First, the decision maker may simply change his criterion somewhat (as compared to the pre-editing phase) for what should be considered an unimportant attribute or a negligible difference between alternatives on an attribute. In the latter case it may not be necessary to change the criterion but rather to establish a criterion, since the decision maker may not have evaluated all relevant intra-attribute differences in the pre-editing phase. Second, the decision maker may bring in probabilistic arguments. He may say: "OK, this is a great drawback of this (promising) alternative, but
the probability that this drawback will actually occur is so low that I could nevertheless disregard it". He may also persuade himself that if a particular outcome threatens to occur there will be means to prevent it from actually happening, or at least that there will be means to protect himself from the consequences of the outcome in question. A particular type of argument with regard to preventing negative outcomes is the reversibility argument, that is, to refer to the possibility of reversing a decision. For example, a person who has bought a house with high mortgages may argue: "If I cannot afford the mortgages I can always sell the house and buy a cheaper one". It goes without saying that this kind of argument leaves ample space for wishful thinking. Our potential house buyer may, for example, uncritically take for granted that he will get back all the money he originally gave for the house if he later sells it. Third, the decision maker may bring in time perspectives and argue that the occurrence of a particular negative outcome is so remote in time that, for this reason, it could be disregarded (cf. Ortendahl and Sjoberg, 1979). This argument could be supported by probabilistic arguments like "In the distant future (e.g., the time when a now young smoker may get lung cancer) things may have happened (e.g., a vaccine against cancer) that make the probability very low for evils like this". The decision rule that is most closely related to de-emphasizing is perhaps the LEX rule, which assumes that the decision maker could disregard intra-attribute differences on less important attributes. In some versions of the rule the decision maker may also disregard small differences on more important attributes (cf. Svenson, 1979; Tversky, 1969).
Bolstering. This operation stands for attempts to increase the support for the promising alternative. More specifically, bolstering may either imply enhancing positive aspects associated with a promising alternative or enhancing negative aspects associated with non-promising alternatives. Bolstering may be used as an indirect route to de-emphasizing. This is because the more successful the decision maker is in bolstering supporting evidence, the easier it may be to de-emphasize conflicting evidence. The de-emphasizing that may result from bolstering supporting evidence may be related to compensatory decision rules, such as the AU or AUD rules. This is because de-emphasizing via bolstering may be seen as equivalent to compensating disadvantages (i.e., de-emphasized aspects) of a particular alternative by advantages (i.e., bolstered aspects) of the same alternative. The simplest bolstering method is to repeat the same supporting arguments over and over again. Alternatively, the decision maker may attempt to generate as many different supporting arguments or aspects as possible. This could be done either by using a "breadth-first" or a "depth-first" strategy. In the former case the decision maker searches for new
attributes whose aspects may yield supporting evidence. In the latter case, the decision maker starts out from supporting aspects on given attributes and attempts to get more supporting evidence by generating consequences (of the aspects in question) and consequences of these consequences, and so on. For example, the attractiveness of Sw.Kr. 1,000 may be enhanced by thinking of good things that could be bought for this sum of money and by thinking of pleasant qualities associated with these good things, and so on. Bolstering may be particularly successful when the decision maker has vivid images of aspects used for bolstering. Toda (1980) argued that decisions could be governed by anticipatory feelings evoked by images of situations associated with particular choice alternatives. If this is true, decision makers may deliberately manipulate their evaluations by creating vivid images of aspects known to favor (or speak against) a certain alternative. Sometimes bolstering may be based on tactics which are the reverse of the de-emphasizing operations described above. Thus, the decision maker may attempt to enhance the chances for outcomes supporting the promising alternative, or he may persuade himself that such supporting outcomes that are remote in time are not that remote after all. Bolstering is perhaps most closely related to the DIS and CON rules. This is because successful bolstering implies that the decision later will be based on highly positive attractiveness values associated with the promising alternative (DIS rule) or on highly negative attractiveness values associated with non-promising alternatives (CON rule). Empirical evidence for bolstering as well as for de-emphasizing may be found in studies of wishful thinking or belief-value distortion (Janis and Mann, 1977; Sjoberg, 1980). Moreover, the above discussion of various de-emphasizing and bolstering operations overlaps to some extent with Janis and Mann's (1977) discussion of tactics used for defensive avoidance in decision making.
Cancellation. Consider a choice between two homes where there is a positive attractiveness difference between the alternatives on one attribute, say rent, and a negative attractiveness difference on another attribute, say location. The decision maker may then discard these two attributes because he or she experiences that the attractiveness difference on the one attribute is cancelled by the difference on the other attribute. Cancellation is useful only if the promising alternative has some other advantage besides the advantage used for cancelling. Otherwise a dominance structure could not be reached for that alternative. Cancellation may correspond more or less closely to the kind of de-emphasizing of a particular aspect that results from bolstering other
aspects. However, cancellation always requires that a specific disadvantage is counterbalanced by a specific advantage. In contrast, de-emphasizing via bolstering implies that the decision maker may be insensitive to any disadvantage below a certain size as a result of bolstering one or more other aspects. It might be expected that there should be some natural connection between the advantage and disadvantage matched to each other in a particular cancellation operation. This connection may, for example, correspond to natural trade-off relationships, such as between price and quality. It could also be based on similarity between underlying attributes. For example, in the choice between two types of power plants the pollution associated with the one type may be seen as cancelled by some kind of pollution associated with the other type of plant. Cancellation obviously requires compensatory thinking. However, it does not necessarily require more exact comparisons of the attractiveness of relevant advantages or disadvantages, since the difference between, or the sum of, these attractiveness values is not reflected in the output of the cancellation operation, which simply is the elimination of a particular disadvantage. Tversky (1969) refers to cancellation as an information processing strategy that could be used in decisions following the AUD rule. However, cancellation is perhaps most clearly in line with the MNA rule, which prescribes a choice of the alternative with the greatest number of favorable attributes (for empirical evidence on this choice strategy, see Huber, 1981; Russo and Dosher, 1981). This is because if cancellation is applied throughout on the positive and negative attractiveness differences between two alternatives, then the alternative with the greater number of favorable attributes will finally dominate the other alternative on the remaining non-cancelled attributes. Cancellation may also occur within an alternative and, hence, follow the AU rule. For example, the fact that a job applicant is somewhat lazy may be cancelled by the fact that he or she is relatively intelligent. Cancellation is probably particularly useful when there are moderate attractiveness differences within or between alternatives. When the differences are very small, the de-emphasizing operation may be sufficient to get rid of attributes. When the differences are very large, cancellation may require too much compensatory thinking to satisfy the decision maker's requirements of a good dominance structure. In such cases, the decision maker may instead attempt to apply the operation to be discussed next, namely, collapsing.
Collapsing. This operation implies that two or more attributes are collapsed into a new, more comprehensive attribute. The perhaps most
common example of collapsing is when a number of attributes are redefined in monetary terms (cf. so-called cost-benefit analysis). Another such comprehensive attribute is time. For example, the attributes distance to downtown and modern facilities for a set of homes may be collapsed into the attribute time saved. More precisely, the shorter the distance to downtown, the less time is spent on commuting and the more time is saved for other activities. Analogously, the more modern facilities there are in a home, the less time is spent on house-keeping and the more time is saved for other things. A third kind of collapsing, popular nowadays, is to redefine various attributes in terms of energy consumption. That is, all links in the production and usage of various products consume energy, and the total amount of energy required may be used as a comprehensive measure of the (un)attractiveness of a product. All compensatory decision rules, as they are traditionally defined, require a certain kind of collapsing, namely that different attributes are compared in terms of general, that is, attribute-independent, measures such as attractiveness or utility. In fact, collapsing may function on various levels of generality or abstraction, with measures such as attractiveness or utility on the highest level. On an intermediate level there are attributes like monetary costs or gains, time and energy consumption. On a still lower level, there are attributes like rent per square meter or price per kilo, where a few specific attributes are collapsed into a higher-order measure. The effectiveness of the collapsing operation is dependent on the extent to which essential features of the collapsed attributes are reflected in the comprehensive attribute supposed to replace the collapsed attributes. As an example, consider a choice between two heating installations, X and Y, where X has high energy consumption but long durability, whereas the reverse is true for Y. The two attributes, durability of installation and energy consumption, may then be redefined in economic terms and the resulting sums of money added to the initial investment costs. However, a long durability is not only advantageous from an economic point of view. It saves trouble as well. Similarly, a low energy consumption has other advantages besides economic ones. For example, it may be more satisfying from a moral point of view than is the case for a high consumption.
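The cancellation and collapsing operations just described might be sketched as transformations of the attribute set. This is a minimal illustration only: the attribute names, matched pairs and conversion rates are invented, and treating durability as a reduction of yearly cost is an assumption made solely for the example.

```python
def cancel(advantages, disadvantages, pairs):
    """Cancellation: remove a matched advantage/disadvantage pair from the
    representation (e.g. a rent advantage against a location disadvantage)
    without comparing their exact attractiveness values."""
    adv, dis = set(advantages), set(disadvantages)
    for a, d in pairs:
        if a in adv and d in dis:
            adv.discard(a)
            dis.discard(d)
    return adv, dis

def collapse(aspects, conversion):
    """Collapsing: replace several attributes by one comprehensive
    attribute, here a yearly monetary cost (conversion rates invented)."""
    collapsed = {attr: v for attr, v in aspects.items() if attr not in conversion}
    collapsed["yearly_cost"] = sum(rate * aspects[attr]
                                   for attr, rate in conversion.items())
    return collapsed

# A heating installation described on separate attributes ...
installation = {"energy_kwh_per_year": 20000, "years_durability": 15, "noise": 0.3}
# ... collapsed into one cost attribute; durability is treated as reducing
# the yearly cost, an assumption made only for this example.
rates = {"energy_kwh_per_year": 0.5, "years_durability": -100}
print(collapse(installation, rates))          # {'noise': 0.3, 'yearly_cost': 8500.0}
print(cancel({"rent", "size"}, {"location"}, [("rent", "location")]))
```

As in the text, cancellation simply deletes a matched pair without weighing its exact sizes, while collapsing trades attribute-specific information for commensurability on one comprehensive attribute.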
All types of collapsing may be associated with compensatory decision rules, since this operation implies that conflicts are collapsed into a general advantage or disadvantage for one of the alternatives on some comprehensive attribute. It should be noted, however, that this comprehensive attribute need not be expressed in terms of attractiveness or utility, as is traditionally required by compensatory decision rules. Of the three compensatory decision rules listed in the Table, the AU rule seems to be most closely related to collapsing, since most types of collapsing seem to require that values of aspects are added to each other, as is also the case for the AU rule.

Avoiding Problems with Non-Compensatory and Compensatory Decision Rules

The process model outlined above implies that different non-compensatory and compensatory decision rules serve various local functions in a decision process. By doing so, the problems associated with each of these two types of rules may be avoided. Turning first briefly to the two problems discussed in relation to non-compensatory rules, namely the problems of limited applicability and of neglect of important information, it could be noted that both these problems are avoided by the combined usage of different non-compensatory and compensatory rules in different phases of the decision process. As for problems with compensatory rules, these may be counteracted by simply avoiding the use of compensatory rules as long as possible, which the present model allows for. However, the model also yields other ways of meeting these problems. One of the problems discussed above implied that compensatory rules may require too complex value judgements, mainly because these judgements involve comparisons across different attributes. This problem is counteracted in the cancellation and collapsing operations, which stress the importance of defining and grouping attributes in such a way that commensurability of aspects across attributes is maximized. The second problem with compensatory rules discussed above is that it may be difficult to get a good overview of a decision based on these rules. This problem may be related to the aims of all four decision making phases proposed in this paper. Pre-editing facilitates a good overview by limiting the number of attributes and alternatives on which a decision is based. The basis for finding a promising alternative may be used as the main argument for a final choice of this alternative. Dominance testing and dominance structuring may further reduce the amount of information to consider. Moreover, these activities may result in a cognitive structure that provides a natural basis for surveying the arguments for a decision as well as for picking up particular supporting arguments or for meeting particular counterarguments. This is partly because a dominance structure is more or less hierarchical, depending on the extent to which attributes have been matched to each other or grouped as a result of cancellation or collapsing. A hierarchical structure provides an economic format for
storing information and makes short acquisition times possible (e.g., Collins and Quillian, 1969; Wickelgren, 1979). The third problem with compensatory rules, the lack of concreteness of an overall attractiveness measure, is counteracted by the fact that a dominance structure allows arguments and counterarguments to be related to each other. This is in contrast to the "blind" summation of negative and positive attractiveness values according to compensatory decision rules as they usually are described. The fourth problem with compensatory rules discussed above is that these rules emphasize the painful fact that one sometimes has to give up certain good things to get other good things. The present model allows various, more or less rational, ways of meeting this problem. The least rational way is certainly to "forget" conflicting evidence in the dominance testing phase. Somewhat more rational, but often not very much so, may be to use the de-emphasizing and bolstering operations to get rid of conflicting evidence. A more rational way of using these operations would be to apply them to each reasonably promising alternative in turn and finally favor that alternative for which the greatest number of aspects could be successfully de-emphasized. The cancellation and collapsing operations are less suitable for concealing conflicting evidence but they may provide some comfort. As discussed earlier, cancellation may be applied to attributes that stand in a natural trade-off relationship to each other, that is, such giving-up/gaining relationships to which people have often long since been accustomed and which they therefore have accepted. Collapsing, finally, aims at maximizing the commensurability of aspects across different attributes. The more the decision maker succeeds in achieving this aim, the easier it will be for him to accept sacrifices in order to get something good. For example, if various attributes could be redefined in terms of monetary costs or gains, it will be easier to accept a loss of, say, Sw.Kr. 1,000 to get Sw.Kr. 2,000, than to accept a loss on one attribute to get gains on another attribute.
Conclusions

The model described above is intended to cover a wide range of decision processes. However, in this paper only a few hints have been given about the type of decision process that actually might occur in a given situation. Because of this, the model should primarily be regarded as a framework for generating research ideas rather than as a set of falsifiable assertions. However, there are some fairly straightforward predictions that could be made from the model.
One type of prediction could be made with respect to the dynamics of a decision process as reflected in think-aloud data or other kinds of process tracing data. It might be predicted, for example, that the alternative which is finally chosen will (a) receive more attention than other alternatives, and (b) tend to be more positively evaluated before it is chosen. Both predictions follow from the assertion that a promising alternative, which is finally chosen, is treated differently from other alternatives in the dominance structuring phase. Another type of prediction concerns final choices rather than the underlying process. In particular, there could be situations where a strictly compensatory rule would predict other choices than the present model does. For example, according to the present model an alternative with a big advantage but with many small disadvantages might be experienced as promising, and the small disadvantages might be eliminated by means of the de-emphasizing operation. Hence, it could be predicted that people will choose such an alternative even if other alternatives are better according to a compensatory rule. Both types of predictions described above might perhaps be made from other models than the present one. There is, however, a third type of prediction which follows uniquely from the present model. These predictions presuppose that decision aids are constructed in line with the present framework. In general, it could be predicted that decision aids which help people to find a good dominance structure are experienced as more useful than other aids. Of course, it will then be very important to clarify in detail what a good dominance structure is. This is an interesting problem for future research.
References

Abelson, R., 1976. Script processing in attitude formation and decision making. In: J.S. Carroll and J.W. Payne (eds.), Cognition and Social Behavior. Hillsdale, N.J.: Erlbaum.
Adelbratt, T. and H. Montgomery, 1980. Attractiveness of decision rules. Acta Psychologica, 45, 177-185.
Bettman, J.R. and M.A. Zins, 1979. Information format and choice task effects in decision making. Journal of Consumer Research, 6, 141-153.
Collins, A.M. and M.R. Quillian, 1969. Retrieval time from semantic memory. Journal of Verbal Learning and Verbal Behavior, 8, 230-241.
Corbin, R.M., 1980. Decisions that might not get made. In: T. Wallsten (ed.), Cognitive Processes in Choice and Decision Behavior. Hillsdale, N.J.: Erlbaum.
Edwards, W., H. Lindman, and L.D. Phillips, 1966. Emerging technologies for making decisions. In: T. Newcombe (ed.), New Directions in Psychology, II. New York: Holt.
Einhorn, H.J., 1970. The use of nonlinear, noncompensatory models in decision making. Psychological Bulletin, 73, 221-230.
Einhorn, H.J., 1971. The use of nonlinear, noncompensatory models as a function of task and amount of information. Organizational Behavior and Human Performance, 6, 1-27.
Einhorn, H.J., B. Kleinmuntz, and D.N. Kleinmuntz, 1979. Linear regression and process tracing models of clinical judgment. Psychological Review, 86, 465-485.
Festinger, L., 1957. A Theory of Cognitive Dissonance. Evanston, Ill.: Row, Peterson.
Festinger, L. (ed.), 1964. Conflict, Decision and Dissonance. Stanford, Calif.: Stanford University Press.
Fischhoff, B., B. Goitein, and Z. Shapira. The experienced utility of expected utility approaches. In: N.T. Feather (ed.), Expectancy, Incentive and Action. Hillsdale, N.J.: Erlbaum (in press).
Helson, H., 1964. Adaptation-Level Theory. New York: Harper & Row.
Huber, O., 1980. The influence of some task variables on cognitive operations in an information-processing decision model. Acta Psychologica, 45, 187-196.
Huber, O. Dominance among some cognitive strategies for multidimensional decisions. In: L. Sjoberg, T. Tyszka, and J.A. Wise (eds.), Human Decision Making. Lund: Doxa (in press).
Humphreys, P. and W. McFadden, 1980. Experiences with MAUD: Aiding decision structuring versus bootstrapping the decision maker. Acta Psychologica, 45, 51-69.
Janis, I.L. and L. Mann, 1977. Decision Making. New York: The Free Press.
Johnson, E.J. and J.E. Russo, 1981. Product familiarity and learning new information. In: J.C. Olson (ed.), Advances in Consumer Research, Vol. 7. Ann Arbor: Association for Consumer Research.
Kahneman, D. and A. Tversky, 1979. Prospect theory: An analysis of decisions under risk. Econometrica, 47, 263-291.
Lichtenstein, S., P. Slovic, and D. Zink, 1969. Effect of instruction in expected value on optimality of gambling decisions. Journal of Experimental Psychology, 79, 236-240.
Montgomery, H., 1977. A study of intransitive preferences using a think-aloud procedure. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: Reidel.
Montgomery, H. and T. Adelbratt, 1980. Gambling decisions and information about expected value. Goteborg Psychological Reports, 10(3).
Montgomery, H. and O. Svenson, 1976. On decision rules and information processing strategies for choices among multiattribute alternatives. Scandinavian Journal of Psychology, 17, 283-291.
Parducci, A., 1965. Category judgment: A range-frequency model. Psychological Review, 72, 407-418.
Ortendahl, M. and L. Sjoberg, 1979. Delay of outcome and preference for different courses of action. Perceptual and Motor Skills, Monograph Supplement 1, Vol. 48.
Payne, J.W., 1976. Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.
Russo, J.E. and B.A. Dosher. Cognitive effort and strategy selection in binary choice. Unpublished manuscript.
Simon, H.A., 1955. A behavioral model of rational choice. Quarterly Journal of Economics, 69, 99-118.
Sjoberg, L., 1980. Volitional problems in carrying through a difficult decision. Acta Psychologica, 45, 123-132.
Sjoberg, L. Beliefs and values as components of attitudes. In: B. Wegener (ed.), Social Psychophysics. Hillsdale, N.J.: Erlbaum (in press).
Slovic, P., 1975. Choice between equally valued alternatives. Journal of Experimental Psychology: Human Perception and Performance, 1, 280-287.
Slovic, P. and D. MacPhillamy, 1974. Dimensional commensurability and cue utilization in comparative judgment. Organizational Behavior and Human Performance, 11, 172-194.
Svenson, O., 1974. A note on think-aloud protocols obtained during the choice of a home. Reports from the Psychological Laboratories, University of Stockholm, No. 421.
Svenson, O., 1979. Process descriptions of decision making. Organizational Behavior and Human Performance, 23, 86-112.
Toda, M., 1980. Emotion and decision making. Acta Psychologica, 45, 133-155.
Tversky, A., 1969. Intransitivity of preferences. Psychological Review, 76, 31-48.
Tversky, A., 1972. Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Tyszka, T., 1983. Contextual multiattribute decision rules. In: L. Sjoberg, T. Tyszka, and J.A. Wise (eds.), Human Decision Making. Lund: Doxa.
Wickelgren, W.A., 1979. Cognitive Psychology. Englewood Cliffs, N.J.: Prentice Hall.
Wright, P. and F. Barbour, 1977. Phased decision strategies: Sequels to an initial screening. (Research Paper No. 353.) Stanford University, Graduate School of Business.
Zajonc, R.B., 1980. Feeling and thinking: Preferences need no inferences. American Psychologist, 35, 151-175.
SCALING EVALUATIVE STATEMENTS IN VERBAL PROTOCOLS FROM DECISION PROCESSES*
Ola SVENSON
Department of Psychology, University of Stockholm, Sweden
Abstract
Subjects judged evaluative statements, sampled from verbal protocols, with respect to degree of attractiveness and attribute specificity vs. attribute generality (e.g., "expensive" is a statement with high specificity because it cannot be used for any attribute, while "good" has a high attribute generality across attributes). The results gave no significant support for the hypothesis that evaluations sampled from the first halves of the protocols tended to be more negative than later evaluations. The attribute specificity of the evaluations was generally high and significantly greater in the first half compared with the second half of a protocol. The method used and the results are discussed in relation to general process models of decision making.
Introduction
During recent years process tracing techniques have attracted growing interest among researchers in decision making (cf. Svenson, 1974; Payne, 1976; Slovic, Fischhoff, and Lichtenstein, 1977; Payne, Braunstein, and Carroll, 1978; Bettman, 1979; Svenson, 1979a; Olshavsky, 1979; Montgomery, this volume). It now seems widely accepted that studies of decision making can be improved by collecting data about information search processes and by recording think aloud protocols. In the following, the focus will be on verbal protocols. Although verbal protocols can cast new light on cognitive processes, their interpretation is coupled with methodological difficulties (cf. Nisbett and Wilson, 1977), many of which were treated by Ericsson and Simon (1980) in their extensive article on verbal reports as data.
* The author wishes to thank Gunnar Karlsson and Lise-Lotte Muller for their help in carrying out the experiments and coding the protocols. I also want to thank Henry Montgomery, Göran Hagert, Patrick Humphreys and the referees for valuable comments on an earlier version. This project was supported by the Swedish Council for Research in the Humanities and Social Sciences.
Verbal protocols from decision processes provide data such as the order in which information is searched and the evaluation of that information. They can also contain information about the length of time devoted to different pieces of information and to subprocesses in the decision process (e.g., Svenson, 1974; Russo and Rosen, 1975; Payne, 1976; Bettman and Jacoby, 1976; Montgomery, 1977; Olshavsky, 1979). However, the present paper is primarily concerned with the evaluative aspects of the information treated in a decision process.
The decision situations studied are defined here as consisting of two or more choice alternatives, which may be characterized by a set of aspects (e.g., the particular size or rent of a home). Furthermore, it is assumed that the decision maker's cognitive representation of these aspects corresponds to values on a set of dimensions or attributes which thus characterize the decision (cf. Montgomery and Svenson, 1976). To exemplify, the particular size of a home perceived as an aspect by the decision maker represents a magnitude on the (subjective) attribute of size. Each aspect is also assumed to be experienced as more or less attractive by the decision maker. Thus, it is assumed that an aspect can be mapped onto an attractiveness scale. There is one attractiveness scale for each attribute, and the values on these scales need not be commensurable across attributes. To illustrate, a decision situation may consist of a number of homes among which one has to be selected for purchase. Each home is characterized by a number of objective aspects (e.g., size in m², etc.). Each of these aspects has its subjectively perceived counterpart (e.g., perceived size, which may depend on the plan of the home, colors, etc.). Each of the subjectively defined aspects also has a value on an attractiveness scale (e.g., the bigger the home, up to a certain optimal size, the more attractive it is; when size increases beyond the optimal size, attractiveness decreases again).
It has been argued (e.g., Montgomery, this volume; Tyszka, 1981) that a decision process involves active structuring and restructuring of the decision situation to come to grips with the problem and to obtain a satisfactory guide to, or rationale for, action. Therefore, the representation of the alternatives in terms of attractiveness may change during the decision process. Montgomery, in effect, suggests that the decision making process may be characterized as a hypothesis testing activity in which one "promising" alternative is first selected and tested to find out if it dominates the other alternatives. Various decision rules may be applied in the search for a dominant alternative, together with restructuring of the decision alternatives to try to achieve a situation in which one alternative is seen to dominate the others. Such a process (cf. "dominance structuring") would favor the use of simple non-compensatory decision rules (such as
"elimination by aspects") early and more sophisticated compensatory rules later in the process when trade-offs among the remaining alternatives have to be forced. It would also favor the use of less elaborate attractiveness representations early in the process (e .g., rankader scales of attractiveness; attradivenes values within attributes which are non-commensurate across attributes) in order to facilitate subsequent reconstructions of the situation. Later in the process, restructuring of the situation through activities involving bolstering, cancelling, collapsing and the application of compensatory decision rules necessitates more elaborate attractiveness representations (cf. Montgomery, this volume). Earlier research (cf. Svenson, 1974; Bettman, 1979) indicates that simpler decision rules such as the conjunctive rule are used more frequently early in a decision process. Later in the process more complex rules are used more frequently with their greater demands on simultaneous information processing and on more sophisticated metric attractiveness representation. Typically, multiattribute utility (MAUT) decision rules are complex rules of this type requiring interval scales of attractiveness which are commensurate across attributes. If uncertainty is to be handled as well, a ratio representation of subjective probability is also required as an input to the SEU decision rule, and in each case the decision maker has t o integrate the information in a cognitively rather complex way (see, however, e.g., Lopes and Ekberg, 1980, for simplifying strategies which may be used in integrating information under uncertainty). Hence, the primary aim here will be to study the attractiveness representation of decision alternatives at both early and late stages in a decision process. In particular, changes in the degree of commensurability across attributes and in attractiveness scale values of aspects will be elucidated. However, a second aim is to explore the possibilities of using a group of coders of think aloud protocols to produce data on which statistical methods may be used in testing hypotheses concerning "think aloud" data as interpreted by these coders. Three experiments are reported, the first of which served the purpose of generating verbal protocols. In the second and third experiments the commensurability of attractiveness across attributes and values of attractiveness of aspects were estimated.
Experiment 1: Collecting the Verbal Protocols
Twelve subjects made decisions about their future homes among five available one-family house alternatives. None of them lived in a house of their own at the time when the experiment was conducted. The homes were presented in great detail in authentic booklets and represented a
sample of one-family houses for sale in the Stockholm region at the time of the experiment (cf. Svenson, 1979b). The subjects were instructed that they personally should (hypothetically) choose to buy one of the homes. The choice should be made according to each subject's own opinion and present personal situation, with the following exceptions: the subject was asked to assume that he or she had the money required for a downpayment in a bank deposit and a salary of sufficient size to pay the annual costs for any of the choice alternatives under consideration. The subjects were also instructed that (irrespective of their present situation) they should assume their family to consist of two adults and two children less than 10 years old.
All decisions were made in individual sessions where the subject was instructed to think aloud about "any thought that crosses your mind" during the decision process. In the first and training phase of the experiment the subjects were given just two choice alternatives. These were not included in the set of five used in the experiment proper but were similar. This gave the subjects an opportunity to learn how to think aloud and to get acquainted with the task. During the training phase the experimenter was seated in an adjoining room with the door open so that he could hear what the subject said. Whenever the subject lapsed into silence the experimenter asked, in a neutral tone of voice, "What are you thinking of now?" This training required 20 to 30 minutes. In the first part of the training session the experimenter usually had to ask the subject to think aloud much more often than in the second half, which indicates the usefulness of a training phase.
The second phase, the main experiment, consisted of a choice among the five alternative one-family houses in the Stockholm area. Here the subject was left alone with a tape recorder, and each made his or her choice in 30 to 45 minutes while producing the verbal protocols whose analysis is described below.
Sampling Evaluative Statements Typically, in interpreting verbal protocols, only one or two coders are used. However, in the present study the precision of the codings was particularly important. Therefore, a group of judges were subsequently used to judge evaluative statements from the protocols coded in the first experiment. To the present author's knowledge this is a new way of analyzing verbal protocols which may prove valuable also in the future. Unfortunately, it was not possible to analyze every evaluative statement in all the twelve protocols as that would require too long a time. Therefore, the statements analyzed were restricted to those on certain key attributes.
The sampling of the statements analyzed was performed in the following way. First, all evaluative statements concerning the attractiveness of aspects were identified in the typewritten transcripts of the twelve protocols from Experiment 1. Each evaluative expression was marked and classified according to which attribute it belonged to. The identification and classification of evaluative statements were made by two judges who had both coded verbal protocols earlier and who worked independently on the present protocols before discussing their few discrepant interpretations to reach agreement on how each protocol should be coded. Inspection of the set of evaluative statements indicated that the attributes which could be described as economy, communications, plan of house, and plan of housing area all had many evaluations through all parts of the protocols. Therefore, evaluations concerning aspects on these attributes were selected for further study.
Second, the evaluations in each of the 12 protocols were split into two groups: those given in the first half of the protocol and those given in the second half. The length of a protocol was defined by the number of statements (cf. Svenson, 1974). Using this definition, the first 50% of the statements were regarded as the first part of a protocol and the rest as the last part. A finer grained decomposition of the process (e.g., into four parts) would lead to some attributes not having evaluative statements in all parts of a protocol. However, for the present purpose the split of a protocol into only two parts was judged sufficient.
An evaluative statement concerning the attractiveness of an aspect was judged to represent either an absolute evaluation (e.g., very good) or a comparative evaluation (e.g., alternative B's communications are better than A's). The coding of the 12 protocols revealed that, on the four attributes studied, 505 statements were absolute and 87 statements were comparative. In the following, only absolute evaluations will be investigated, because there were too few comparative evaluations to permit conclusions concerning differences between the first and second half of a decision process. Unfortunately, they were also not significantly different in frequency between the first and second halves of the protocols, although such a difference would have been interesting to analyze. The absolute evaluative statements were distributed over the four attributes in the following way: economy (80), communications (99), plan of house (216), and plan of housing area (110). From this set of statements one statement was sampled randomly from each subject and each half of each protocol (with an exception for the attribute of economy, where only 23 statements were obtained). Thus, a total of 95 statements were used as stimuli in Experiments 2 and 3.
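A minimal sketch of this splitting and sampling procedure is given below; the data layout (statements carrying an attribute label and an absolute/comparative type) is an assumed representation of the coded protocols, not the coding scheme actually used, and the example statements are invented.

    import random

    # Assumed representation of a coded protocol.
    protocol = [
        {"text": "the kitchen plan is excellent", "attribute": "plan of house", "type": "absolute"},
        {"text": "B's communications are better than A's", "attribute": "communications", "type": "comparative"},
        {"text": "damned expensive", "attribute": "economy", "type": "absolute"},
        {"text": "close to the subway", "attribute": "communications", "type": "absolute"},
    ]

    def split_halves(statements):
        # Protocol length is defined by the number of statements; the first 50%
        # count as the first part, the rest as the last part.
        mid = len(statements) // 2
        return statements[:mid], statements[mid:]

    def sample_absolute(half, attribute, rng=random):
        # Draw one absolute (non-comparative) evaluation of a given attribute.
        pool = [s for s in half
                if s["attribute"] == attribute and s["type"] == "absolute"]
        return rng.choice(pool) if pool else None

    first, second = split_halves(protocol)
    print(sample_absolute(first, "plan of house"))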
One way to elucidate the ease with which a comparison of evaluations across attributes can be made is to investigate how attribute specific each evaluation is. To illustrate, evaluative statements of the type "close", "cheap", "damned expensive" are more closely related to their respective attributes, while others such as "bad", "rather good", "excellent" seem to have a more general evaluative meaning, facilitating (or being the result of) cross-attribute evaluations. The mainstream of psychological research has not focused on attribute specific evaluative statements. Instead, this problem has often been avoided by choosing evaluative expressions of the greatest possible generality (e.g., Osgood, Suci, and Tannenbaum, 1957). In the next experiment the degree of attribute specificity of evaluative statements was explored and a comparison made between the first and second halves of a decision process.
Experiment 2: Attribute Specificity of Evaluative Statements
Subjects
Sixteen psychology students at the University of Stockholm participated in the experiment. None of them had participated in Experiment 1.
Material
The 95 evaluative statements sampled from the protocols of Experiment 1 were judged in this experiment. In addition, 10 fillers were chosen among the evaluative statements concerning the attribute of "service in housing area". This addition was made to decrease the probability of subjects guessing which attribute a statement originally referred to. Data relating to statements on this additional attribute were not analyzed, however, as they were too few in number. The statements were presented in a column to the left on the pages of a booklet. Next to each statement in the leftmost column were five empty spaces, one for each attribute. The subject's task as a judge was to put numbers in each of these spaces according to "how well the statement could describe an aspect" on the "characteristics" (attributes) of "economy", "communication", "plan of house", "plan of housing area" and the filler attribute of "service of housing area", respectively. For instance, "very expensive" would get the maximum score of 10 in the first space (describing an aspect of economy) and lower scores in the other four spaces. Placing zeros in each of the other four spaces would imply that an evaluative statement was judged as being entirely attribute specific.
Procedure The subjects were instructed that in an earlier experiment other students had generated verbal protocols containing a number of evaluative statements. It was explained that the protocols were obtained while the subjects were making the choice of a home. Now, the task of the present subject was to judge how well the authentic evaluative statements could characterize expressions concerning each of five different attributes in turn. No information was given about the immediate context in which an evaluative statement had been given originally. It was stressed that it was of no interest how favorable or negative the statements were and that it was not a matter of guessing in what context a statement had been given originally. Instead, the subjects were told that they had to give their own opinion of how applicable a statement seemed to be to aspects on each of the five attributes. As an example it was pointed out that "too expensive" cannot be used in conjunction with all attributes while "very good" probably is applicable to many different attributes. The ratings were given on a scale 0-10, where 0 meant that this statement cannot at all be used for the attribute in question. The number 10 indicated that the statement was perfectly fitted for characterizing the attribute.
Results
The judgments were averaged over individuals and statements to obtain means for each of the four attributes respectively. Table 1 shows the mean judgments for the statements in the first and second halves of the protocols. Each column in the table represents the applicability ratings for evaluations of aspects originally referring to the attribute for that specific column.
It is quite clear from Table 1 that evaluative statements from the verbal protocols tend to be highly attribute specific. The difference between the mean of the entries in the diagonals of the matrices in Table 1 and the mean of the remaining numbers was tested in two t-tests. The difference for the first part of the protocol gave t = 4.55 (α = 0.01, df = 14) and the difference for the second part resulted in t = 7.06 (α = 0.01, df = 14). Thus, in general, the evaluative statements spontaneously used in protocols tended to be attribute specific and not of a generally valid evaluative character.
Each column in Table 1 describes to what extent the statements are found applicable to one attribute rather than to the other three.
Table 1. Mean Applicability Ratings of Evaluative Statements Classified on Different Attributes. Eleven or Twelve Statements from 4 Attributes were Judged on the Four Attributes.

                                    Judged adequacy of statement for attribute
Attribute classification           First part of protocol        Second part of protocol
when coding protocol               (1)    (2)    (3)    (4)      (1)    (2)    (3)    (4)
(1) Costs                          7.50   0.84   0.77   3.37     7.86   2.85   1.74   2.92
(2) Communication                  2.63   8.40   1.88   3.05     3.72   8.17   1.57   3.51
(3) Surroundings                   2.70   4.24   8.51   5.05     3.05   4.67   6.91   5.29
(4) Plan                           2.67   1.59   5.25   6.59     3.44   2.42   3.57   6.55
(Max estimate) - (mean other
estimates)                         4.83   6.18   5.88   2.77     4.46   4.86   4.62   2.64
In order to get a numerical index of the attribute specificity, it is possible to compute, e.g., the difference between the maximum estimate and the mean of the other entries in each column. This was done for each of the 16 subjects. When the first and second halves of the protocols were compared in an analysis of variance of the difference measure at the bottom of the table (attributes × parts of protocol × individuals), a significant F ratio was obtained (F = 48.02, α = 0.01, df = 1/45) when the triple interaction term was used as the error estimate.
In conclusion, the attribute specificity of evaluative expressions was found to be generally quite high but tended to decrease towards the end of a protocol. This finding is in accord with the account of general process models of decision making as given by Montgomery in this volume. Such models also suggest that negative evaluations are used to eliminate decision alternatives early in a decision process. Therefore, the next experiment was designed to provide data on the degree of attractiveness represented by evaluative statements both early and late in the decision process.
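The specificity index can be illustrated by recomputing the bottom row of Table 1 from the tabled means for the first part of the protocols; in the study itself the index was computed for each of the 16 subjects, so the sketch below only reproduces the summary values.

    # Rows of the first-part matrix in Table 1: Costs, Communication, Surroundings, Plan.
    first_part = [
        [7.50, 0.84, 0.77, 3.37],
        [2.63, 8.40, 1.88, 3.05],
        [2.70, 4.24, 8.51, 5.05],
        [2.67, 1.59, 5.25, 6.59],
    ]

    def specificity(matrix):
        # For each column: maximum estimate minus the mean of the other estimates.
        indices = []
        for col in zip(*matrix):
            top = max(col)
            rest = [x for x in col if x != top]  # assumes a unique maximum per column
            indices.append(top - sum(rest) / len(rest))
        return indices

    print([round(x, 2) for x in specificity(first_part)])  # [4.83, 6.18, 5.88, 2.77]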
Experiment 3: Scaling Evaluative Statements Subjects The subjects were drawn from the same population as the decision makers in Experiment 1 (psychology students at the University of Stockholm). Thirty subjects participated in fulfillment of a course requirement. None of them had participated in Experiment 1 or 2.
Procedure
The subjects were first informed about the decision processes generating the evaluative expressions and told that the decisions had been made by students like themselves. The information about the earlier experiment was the same as in Experiment 2. After this, each subject was given a booklet with an instruction to judge (on a linear, 154 mm long scale) the attractiveness of the evaluative statements presented to them, which had originally been given as characterizations of the attribute shown at the top of each page. Each evaluative statement was written above the response line, which had a small bar at the midpoint and open ends. Above the midpoint of the line a label said "neither positive nor negative evaluation"; on the left the text above the line read "negative evaluation", and on the right "positive evaluation". The subjects were instructed to judge each of the evaluative statements with reference to the attribute indicated on each page by putting a mark on the response line. Each evaluative statement was judged twice by each subject, but for reasons of time each subject was only presented with statements belonging to two attributes. All pairs of two attributes were evenly distributed among the 30 subjects.
Results
The attractiveness estimates were distributed over the whole range of the scale with the exception of the very extremes. Table 2 shows the mean values for the different attributes respectively. Generally speaking, negative statements predominate. Also, the statements in the first half of a protocol were on the average more negative than those in the second half. However, this difference was not significant in an analysis of variance (half of protocol × attribute, with mean judgments for statements as replicates; F = 1.01, df = 1/88).
Table 2. Mean Attractiveness Values of the Evaluative Statements on the Different Attributes in the First (I) and Second (II) Parts of the Protocols.

                        First part of      Second part of
                        protocol (I)       protocol (II)       II - I
(1) Costs                   -2.6                -0.3              2.3
(2) Communication          -17.1                -3.4             13.7
(3) Plan of house           -7.9                -0.4              7.5
(4) Surroundings           -26.6               -20.7              5.9
The mean standard deviation of the judgments for an aspect did not vary systematically between the first and second parts of the protocols.
Concluding Remarks
As mentioned earlier, the illustrations provided by the experiments should be seen against the background of a general process model for decision making (cf. Montgomery and Svenson, 1976; Montgomery, this volume). According to such models, different decision rules and restructuring processes are used to eliminate choice candidates early in the process and to find the best alternative among those remaining. Experiment 2 showed that in the first half of the protocols the evaluative statements tended to be more attribute specific than towards the end. This can be interpreted as a reflection of a less elaborate representation of the decision situation and more frequent use of non-compensatory decision rules in the early stages, where comparisons of aspects across attributes and alternatives are not necessary. In the second part of the protocols less attribute specific evaluations were found, which may reflect more elaborate representations of the decision situation and more use of compensatory decision rules.
Experiment 3 did not indicate that more negative statements were used in the first half of a protocol. If this finding is replicated in further research it will be important, because negative evaluations are assumed to be important for eliminating alternatives in non-compensatory, conjunctive-type decision rules. Overall, negative evaluations tended to dominate in the protocols. This may reflect the aspiration level of the subjects or a general characteristic of a decision process.
The collection of verbal protocols during experiments on cognitive processes introduces many promising prospects and possibilities of obtaining richer information about what is taking place in the subjects' minds. At the same time, verbal protocols seem to invite unrealistic expectations which cannot be met. On the surface, protocols may seem complete, comprehensive and often logical to those who approach the decision task by constructing the same representation as the subjects do. Quite often, the seemingly complete and comprehensive character of a verbal protocol affects the evaluations of both proponents and opponents of verbal protocol data. The proponents believe in the story told in the protocol and sometimes try to fit a computer model to the protocol. The opponents criticize verbal protocols because they can be shown, e.g., not to tell the whole story (cf. Nisbett and Wilson, 1977).
To conclude, the present experiments provided evidence supporting
process models for decision making of the type discussed by Montgomery (this volume). They also illustrated that conventional interpretations of think aloud protocols can be verified in additional studies using several judges to provide data which can be treated statistically. This has the advantage that the conclusions are independent of the one or two critical coders of the protocols, who are usually well informed about the aim of a study and who may consequently introduce biases into the interpretations of the protocols. The general procedure exemplified in this paper can be applied in order to discriminate between specific hypotheses concerning the processes tapped by verbal protocols, provided enough data are collected. Here, as in any research paradigm, the problem is to create meaningful and precise alternative hypotheses. Verbal protocols are very much like any other data. If properly obtained and analyzed they can help in advancing our knowledge about psychological processes. Verbal protocols may be harder and more expensive in time and effort to analyze than most other data, but they are invaluable if we want to understand the process of human decision making.
References
Bettman, J.R., 1979. An Information Processing Theory of Consumer Choice. Reading, Mass.: Addison-Wesley.
Bettman, J.R. and J. Jacoby, 1976. Patterns of processing in consumer information acquisition. In: B.B. Anderson (ed.), Advances in Consumer Research, Vol. 3. Chicago: Association for Consumer Research, 315-320.
Ericsson, K.A. and H.A. Simon, 1980. Verbal reports as data. Psychological Review, 87, 215-251.
Lopes, L.L. and P.H.S. Ekberg, 1980. Test of an ordering hypothesis in risky decision making. Acta Psychologica, 45, 161-167.
Montgomery, H., 1977. A study of intransitive preferences using a think aloud procedure. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: Reidel.
Montgomery, H., 1983. Decision rules and the search for a dominance structure: Towards a process model of decision making. In this volume, 343-369.
Montgomery, H. and O. Svenson, 1976. On decision rules and information processing strategies for choices among multiattribute alternatives. Scandinavian Journal of Psychology, 17, 283-291.
Newell, A. and H.A. Simon, 1972. Human Problem Solving. Englewood Cliffs: Prentice Hall.
Nisbett, R.E. and T.D. Wilson, 1977. Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Olshavsky, R.W., 1979. Task complexity and contingent processing in decision making: A replication and extension. Organizational Behavior and Human Performance, 24, 300-316.
Osgood, C.E., G.J. Suci, and P.H. Tannenbaum, 1957. The Measurement of Meaning. Urbana: University of Illinois Press.
Payne, J.W., 1976. Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.
Payne, J.W., M.L. Braunstein, and J.S. Carroll, 1978. Exploring predecisional behavior: An alternative approach to decision research. Organizational Behavior and Human Performance, 22, 17-44.
Russo, J.E. and L.D. Rosen, 1975. An eye fixation analysis of multialternative choice. Memory and Cognition, 3, 267-276.
Slovic, P., B. Fischhoff, and S. Lichtenstein, 1977. Behavioral decision theory. Annual Review of Psychology, 28, 1-39.
Svenson, O., 1974. A note on think aloud protocols obtained during the choice of a home. Reports from the Psychological Laboratories, University of Stockholm, No. 421.
Svenson, O., 1979a. Process descriptions of decision making. Organizational Behavior and Human Performance, 23, 86-112.
Svenson, O., 1979b. Conflict and conflict resolution in couples making a choice of a new home: A process tracing study. Department of Psychology, University of Stockholm. Project: Cognitive aspects of human information processing. Report No. 17.
Tyszka, T., 1981. Simple decision strategies vs. a multi-attribute utility theory approach to complex decision problems. Praxiology, 2, 159-172.
TO SMOKE OR NOT TO SMOKE: CONFLICT OR LACK OF DIFFERENTIATION?
Lennart SJÖBERG
University of Göteborg, Sweden
Abstract
Decision making has been treated by conflict and dissonance theorists, who agree that defensive avoidance is a likely response to threat but disagree as to its cause and when it is likely to occur. However, the measurement of conflict has seldom been extensively discussed, nor has the possible importance of differentiation between and within options. In a study of smokers it was found that conflict measures were not consistently related to differentiation. Differentiation rather than conflict was consistently related to avoidance, and it correlated with previous response to failure in quitting smoking, a finding which was predicted neither by conflict nor by dissonance theory.
Introduction
A choice situation involves conflict when there is no one alternative of action which is certain to be best in all possible respects and under all conceivable future states. This can occur when there is a possibility of negative events directly consequent on choice, or when there is anticipated regret over the positive aspects of rejected options.
Conflict is here defined as a property of the choice situation.
* This study was initiated during my tenure as a Fellow of The Netherlands Institute for Advanced Study; the generous support by the Institute is gratefully acknowledged. It was concluded while I was associated with the Faculté Universitaire Catholique de Mons (FUCAM), Belgium, and also supported by the European Institute for Advanced Studies in Management, Brussels. I also wish to thank Oscar Beetveld, who collected the data. The study was supported by a grant from the Swedish Council for Humanistic and Social Science Research; computer time was made available through the Department of Psychology, University of Leiden. The careful editorial review by Patrick Humphreys and Charles Vlek is deeply appreciated.
It should be distinguished from its effects of a cognitive or emotional nature, which are discussed below. It is a concept which has been rather neglected in modern work on decision making, perhaps partly because it is associated with motivational and emotional notions that are awkward for a field of research dominated by cognitive psychology and normative economic approaches. Many of the potentially important aspects of conflict also concern phenomena within a wider frame than that of choice processes proper, such as action or information selection and distortion. Such events occur prior to the final choice processes as conceived in traditional work on decision making, and are thus external to that field of study.
Among the earlier treatments of conflict, one of the best known is due to Lewin, who suggested a classification of conflict (Lewin, 1946): approach-approach, avoidance-avoidance, approach-avoidance, and multiple approach-avoidance. The last mentioned case is of special interest here. It involves choices among options that are negative in some respects and positive in others. This is typically true of decisions of whether to quit or to retain addictive habits such as smoking. Lewin's thinking, however, has not had much influence on current work in the field of decision making, although an attempt was made to argue for the revival of some of his ideas, particularly on the emergence of decisions, at the 1979 SPUDM conference (Beach and Wise, 1980). One reason may be that, after all, Lewin was rather non-specific and his empirical illustrations were largely confined to the spatial behavior of children. Some of the notions he developed from such work seem of rather minor applicability in much of decision making. One idea which I will, however, use here was that the total emotional stress brought about by a choice situation was assumed to increase as a function of the sum of the valences of the options, whether they are negative or positive (Lewin, 1946). If options are either negative or positive this is an index closely related to outcome variance.
Lewin's theoretical analysis of conflict is fairly representative of the most common notions of psychodynamic theories. Conflict creates frustration. It should lead to defensive moves, avoidance of making a decision, and various forms of information distortion. Janis and Mann (1977) have refined and extended the psychodynamic approach to the role of conflict in decision making. They postulated defensive avoidance as one type of response to stress; another, in the case of extreme threat, would be hypervigilance. That case is not of direct relevance here, since the threats under study are not that severe. Montgomery (this volume) assumed that the decision maker is motivated to avoid conflict and that he or she therefore transforms information so as to obtain a dominating alternative.
Some motivation theorists, such as Berlyne (1960), have assumed that some degree of uncertainty is attractive. Although uncertainty in this
context refers to outcome rather than choice uncertainty, it may be assumed that the two concepts are closely related, since outcome uncertainty is likely to lead to choice uncertainty. A low degree of conflict could, then, be attractive and stimulating; a higher degree would produce defensiveness, and a still higher degree of conflict would tend to bring about hypervigilance and "panic".
Some of the phenomena related to conflict are treated by dissonance theory (Festinger, 1957), but that theory was never clearly formulated, and the many attempts at replicating some of its basic findings have often been unsuccessful. It is, indeed, questionable whether a motive¹ to avoid cognitive dissonance exists at all, one major competitor being that of self-esteem needs (Greenwald and Ronis, 1978). Furthermore, conflict need not at all give rise to dissonance. The anticipation of some unpleasant outcome is quite sufficient to account for the fact that conflict is an aversive state. Take Festinger's often quoted example of the smoker who is allegedly in a "dissonant" state if he knows that smoking is dangerous. The unpleasant nature of such a state can simply be due to the fact that the smoker anticipates that his addiction may bring about serious illness. Or take the fact that relapse among addicts is an aversive experience. Marlatt (1978) took dissonance theory as a basis for accounting for the consequences of relapse, but it was pointed out by Sjoberg and Olsson (1981) that aversive emotional experiences, e.g., of guilt and shame, are just as likely as explanations and certainly much better established as potent psychological forces than the elusive concept of dissonance.
In his book on pre- and post-decisional processes, Festinger (1964) argued that pre-decisional processes display little bias, while a decision and its consequent commitment create dissonance and regret, which are then reduced by various defensive manoeuvers. Thus, he tended to play down the effects of pre-decisional conflict and to emphasize the postulated state of dissonance, but the empirical evidence he presented for unbiased pre-decisional thinking was quite limited.
Earlier approaches to conflict have, thus, been rather inconclusive as to its effects, and two opposing views can be discerned. In the Lewinian and psychodynamic tradition conflict is seen as emotionally aversive; it should lead to defensive moves and to a decreased overt concern with the issues involved in the choice problem. Montgomery's model of decision making is an example of this tradition.
’
¹ The phenomenon of "dissonance" exists, of course, but what is doubtful is the postulation of a motive (drive, need) to avoid dissonance per se. The point is quite important, because there is a widespread opinion that dissonance theory accounts for the phenomena. It does not.
On the other hand, in dissonance theory it is held that pre-decision conflict is much less potent than post-decision and post-commitment dissonance, which is given the main role in accounting for irrational, defensive moves. Thus, dynamic theory predicts conflict to be correlated with defensive moves, while dissonance theory predicts no such (strong) relations before a decision is made. The present study is concerned with pre-decisional processes; according to dissonance theory there should therefore be little or no defensive distortion of information and no correlation between conflict and distortion.
An alternative to conflict and dissonance is that of differentiation or cognitive style (Bieri, 1961). If the expected values of the options are similar, there is little between-option differentiation and much conflict. However, within-option differentiation also means much conflict, because differentiated options with several highly negative and several highly positive attributes will be ambivalent and conflict producing according to dynamic conflict theory. If conflict is the main agent in accounting for the data, we should expect between- and within-option differentiation to be negatively related. If differentiation is more potent, they should correlate positively. Thus, either it should be true that people with a differentiated view are better informed and show more concern with the issues, or, according to conflict theory, conflict should be accompanied by defensive moves and decreased thought concern with the issue. Dissonance theory predicts that there should be no systematic distortion of information prior to making a decision.
Examples of concern in the case of quitting smoking are the following. Concern can be exhibited by deciding to quit some time in the future, thinking about quitting, regretting having started to smoke, wishing to be a non-smoker, advising others not to start smoking, deciding in principle to quit, and finding it less likely that a decision to quit will be changed. In conflict, thoughts about quitting should be unpleasant, tiring and anxiety-laden. One should also find it difficult to decide, and the future prospect of ever quitting should be meagre. If a conflict is repressed there should possibly be more dreams, according to psychodynamic theory; at any rate, the frequency of dreams concerned with smoking should not correlate with overt measures of conflict (only with covert or repressed conflicts, and here we will obtain no measures of covert conflicts).
Another way of measuring defensive restructuring of information is to obtain correlations between beliefs and values. Positive correlations indicate wishful thinking and negative ones hypervigilance (Janis and Mann, 1977). Strong correlations have been observed when, for instance, the decision maker is highly involved in the issue (Sjoberg, 1981a; Sjoberg and Biel, 1981).
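Concretely, for one subject and one option such a belief-value correlation is simply the Pearson correlation between the rated probabilities (beliefs) and the rated values of the same outcomes; the short sketch below, with invented ratings, also shows the Fisher z transform, z = arctanh(r), used later when such correlations enter further analyses (Table 5). It is an illustration of the measure, not the original analysis code.

    import math
    from statistics import mean

    # Invented ratings for one subject and one option.
    values = [0.4, -0.3, 0.1, -0.5, 0.3]        # rescaled value ratings of outcomes
    probabilities = [0.8, 0.2, 0.6, 0.1, 0.7]   # rated probabilities of the same outcomes

    def pearson_r(xs, ys):
        mx, my = mean(xs), mean(ys)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sxx = sum((x - mx) ** 2 for x in xs)
        syy = sum((y - my) ** 2 for y in ys)
        return sxy / math.sqrt(sxx * syy)

    def fisher_z(r):
        # Fisher's z transform, 0.5 * ln((1 + r) / (1 - r)).
        return math.atanh(r)

    r = pearson_r(probabilities, values)
    print(round(r, 2), round(fisher_z(r), 2))  # a strongly positive r reads as wishful thinking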
In my previous work, outcomes of alternatives of action were rated as to their value and probability, and the values and probabilities were therefore defined by the action to be taken. This is similar to the Fishbein approach to attitudes (Fishbein and Ajzen, 1975). In other work on belief-value interaction this condition has not been met (e.g., McGuire, 1960). Studies have been made of heterogeneous samples of events. In that way one could possibly find a general tendency, for example, to wishful thinking, but such a tendency need not, to any great extent, account for belief-value interaction in any specific topic. Sjoberg and Biel (1981) found, for example, that a scale of optimism did not correlate with belief-value correlation in the field of nuclear power attitudes. The dynamics of the specific topic was apparently more important than the trait of optimism-pessimism. It was, furthermore, found that commitment to a course of action was followed by a sharp increase in belief-value correlations (Sjoberg, 1981b). The latter finding is in line with the results that Festinger (1964) reported for post-decision processes, though he interpreted them in terms of the reduction of post-decision dissonance.
Reformulation of alternatives to obtain dominance, in line with Montgomery's (this volume) model, leads to greater overall differences among them. This can be achieved both by probability and by value changes. Basically, bad properties of a desired alternative may be seen either as not bad or as very unlikely, and vice versa for good properties. This will eventually result in a positive belief-value correlation for the accepted alternative and negative correlations for the rejected alternatives. The dominant or quasi-dominant option which is, according to Montgomery, constructed by the decision maker is therefore likely to have a high positive belief-value correlation. In the Sjoberg and Biel paper it was also noted that people who show extensive belief-value interaction are more likely to take action.
The purpose of the present paper was to investigate conflict and differentiation in the area of smoking, where conflict should be rather common. To get some information about possible antecedents of conflict, data were also obtained on reactions to failure in quitting smoking and on how smoking was experienced. Various measures of conflict and differentiation can be constructed; they are described below. They should all, according to dynamic theory, correlate negatively with overt concern with the issue and positively with belief-value correlation, which reflects a tendency to construct a dominant option. If differentiation is the major factor, there should be convergent relations with that aspect rather than with conflict. If dissonance theory is correct there should be little or no systematic distortion of information.
Method
Subjects
Fifty persons agreed to participate in the study. Of these, 45 actually returned filled-out questionnaires, 20 women and 25 men. Their mean age was 30.4 years (SD = 11.2 years). Thirty-one had university or some other form of higher vocational education. They had been smoking on average for 13.1 years and currently smoked on average 19.6 cigarettes per day (or the equivalent amount of tobacco in cigars or pipes). They had made, on average, two to five attempts to quit, their most recent attempt having been made more than eight months earlier. The questionnaires were rather carefully responded to; all analyses to be reported are based on all data available, usually implying that statistics are based on about 40 observations, though in some cases the effective N approached 20.
Questionnaire
The questionnaire was rather extensive. After questions about the background data referred to above, it asked about causes of failure in quitting and effects experienced following failure. There were 14 causes and 19 effects. Examples of causes are: being in a pleasant mood, cigarettes being easily available, and the will to feel free. Examples of effects are: feelings of shame, guilt and relief. A number of questions about behavior assumed to be related to conflict were then asked, as well as questions about the effects of smoking and about when the subject usually smoked (see Table 1 for examples). The questions were given with rating scales so as to produce the quantitative variables used in the data analysis presented below.
Then followed five sets of ratings of each of 35 smoking related events or states, such as decreased appetite, feeling of pain in the chest, and being emotionally stable. These events had been chosen so as to be equally divided among positive and negative effects of smoking or not smoking, respectively. Their selection was done in a pilot study with 20 smokers, and they had previously been used in a study of value change subsequent to a decision to quit or reduce smoking (Sjoberg, 1981b). Based on this rather extensive experience, the list was revised and the verbal formulations improved for the present study. The events were to be rated as to (a) overall value, (b) probability under continued smoking, (c) probability under giving up smoking, (d) whether they spoke for or against smoking, and (e) how much weight they had in determining the attitude of the subject towards smoking. Note that there was one importance rating for each event and that it is conceptually different from value because, among
other things, it was tied to how much the event determines the attitude. All ratings except (d) were made on 5-point category scales, the (d) ratings using only two categories.
The subjects filled out the questionnaires at their leisure, mostly at home, and returned them, most often by mail, to the experimenter. It took about 45 minutes to respond to the questionnaire.
Indices of Conflict and Differentiation. Subjects rated the values of 35 smoking related events and states, and their probabilities given continued smoking and given quitting smoking. The importance weights of the 35 events and states were also rated. After transforming the values linearly to the range -0.5 to +0.5 and the probabilities and weights to the range 0 to 1, products of the type value × probability × weight were computed both for the option of smoking and for not smoking, yielding 35 such products for each option. (In this way, events rated as neutral in value, or quite unimportant for determining attitude, did not contribute to the scores used for the subsequently calculated conflict indices.) The conflict and differentiation measures were based on these products. The following indices were obtained; a computational sketch is given after the list.
(1) Discrimination. The absolute difference between smoking and not smoking, based on their average products. Lack of discrimination should give rise to conflict, mostly because of anticipated regret.
(2) Valence. The sum of the absolute average products for smoking and not smoking. Lewin's suggestion (Lewin, 1946) was that total valence should account for emotional stress.
(3) Max. The average product for the best alternative. The better it is, the less should the conflict be. The lower the value of the best alternative, the more regret and negative effects of a direct nature should one expect.
(4) SD-max. The standard deviation of the products for the best alternative. If the dispersion is large there should be a high degree of conflict and of differentiation.
(5) SD-pooled. The pooled dispersion of both alternatives. The two dispersion measures are presumably loaded on both aspects of conflict, since they should reflect the ambivalence of the subject either towards the best option or towards both of the options.
(6) SD-diff. The standard deviation of the differences. It is related, somewhat indirectly, to lack of dominance, and can therefore be taken as an index of the degree of dominance.
(7) Domnor. A more direct measure of dominance: the number of times the overall superior alternative gave the highest product, divided by the number of times the two alternatives were not tied.
(8) Regret. The average product for those aspects in which the alternative that was overall inferior was superior. Note that (1 - Domnor) gives the proportion of such aspects. The measure of regret is a rather straightforward operational definition, measuring the advantages of the least preferred option.
(9) Behavior Conflict. The difference between the average product for not smoking and that for smoking. The measure of behavior conflict is suggested because all subjects were actually smokers, and the more they preferred not smoking, the more should they experience conflict between their values and their behavior.
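The sketch below computes the nine indices from the two sets of products. It follows one reading of the verbal definitions above; in particular, the pooling and tie-handling conventions are assumptions, not taken from the original analysis, and the example products are invented.

    from statistics import mean, pstdev

    def indices(smoke, not_smoke):
        # smoke, not_smoke: the value x probability x weight products per option.
        m_s, m_ns = mean(smoke), mean(not_smoke)
        best, worst = (smoke, not_smoke) if m_s >= m_ns else (not_smoke, smoke)
        diffs = [s - n for s, n in zip(smoke, not_smoke)]
        untied = [(b, w) for b, w in zip(best, worst) if b != w]
        regret_cases = [w for b, w in untied if w > b]
        return {
            "Discrimination": abs(m_s - m_ns),
            "Valence": abs(m_s) + abs(m_ns),
            "Max": max(m_s, m_ns),
            "SD-max": pstdev(best),
            "SD-pooled": pstdev(smoke + not_smoke),   # one reasonable pooling
            "SD-diff": pstdev(diffs),
            "Domnor": (sum(b > w for b, w in untied) / len(untied)) if untied else 1.0,
            "Regret": mean(regret_cases) if regret_cases else 0.0,
            "Behavior Conflict": m_ns - m_s,
        }

    # Tiny invented example with 5 (rather than 35) products per option:
    smoke_products = [0.10, -0.05, 0.00, 0.20, -0.10]
    not_smoke_products = [0.15, 0.05, 0.00, -0.10, 0.20]
    for name, value in indices(smoke_products, not_smoke_products).items():
        print(f"{name}: {value:.3f}")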
Relations Among Constructed Indices. These relations largely reflect which functions were chosen for defining the indices, but some of the results are still of interest. Discrimination correlated strongly with Behavior Conflict (r = 0.91), reflecting the fact that most subjects gave a higher composite score to the alternative of not smoking. All dispersion measures correlated strongly among themselves and also, positively, with Discrimination (r's from 0.61 to 0.66). Valence, Max, Domnor and Regret had low correlations with the other indices, mostly positive. It is especially noteworthy that Discrimination and Max correlated positively with the dispersion indices, contrary to the expectation of negative correlations under a conflict interpretation, but consonant with a differentiation interpretation.
Correlations Between Constructed Indices and Thought Concerns. The nine indices were correlated with thoughts reflecting concern about the issue (see Table 1). Several aspects of the data in Table 1 are noteworthy:
(1) Discrimination correlated as expected, under a conflict interpretation, with concern. The measure Behavior Conflict reflected a similar trend, and it correlated 0.91 with Discrimination. (Sixty-four per cent of the subjects had a mean value rating of not smoking larger than that of smoking.) It was also possible, of course, that very small values of Discrimination might generate concern with the issues, implying a curvilinear relationship, but investigation of selected scattergrams did not show any such trends.
(2) Valence was unrelated to practically all other data on the decision problem, thus not supporting Lewin's suggestion that it should reflect total emotional stress.
Table 1. Correlations Between Conflict Measures and Concern About Smoking. Discrimination and Max are inverse measures of conflict. Zero and decimal point omitted.
[Columns: Discrimination, Valence, Max, SD-max, SD-pooled, SD-diff, Domnor, Regret, Behavior Conflict. Rows: Quit in the future; Think about quitting; Thoughts about quitting, pleasant; Thoughts about quitting, active; Thoughts about quitting, calm; Regret having started to smoke; Wish to be a non-smoker; Advice against smoking; Decide in principle to quit; Difficult to decide; Likely to change decision; Will you ever quit; Dreams about smoking. The individual correlation entries could not be recovered from the source.]
*** p < 0.001   ** p < 0.01   * p < 0.05
(3) Max was quite unrelated to thought concerns.
(4) All three measures involving dispersions were related to thought concerns in the direction opposite to that expected by conflict theory, but in accordance with differentiation theory.
(5) Dominance was unrelated to thought concerns.
(6) Behavior Conflict was related to concern in the direction opposite to the conflict interpretation, but more weakly than the dispersions.
(7) Regret was weakly related to concerns. It seems that, with a high level of Regret, subjects tended to stay away from the decision problem.
(8) There were virtually no significant correlations between the nine indices and the emotional effects of thinking about the issue.
Some more understanding of these trends may be obtained from Table 2, which gives correlations between thought concerns and the means and standard deviations of the smoking and not smoking value ratings. It is very clear that the means were virtually unrelated to thought concerns, while the standard deviations correlated substantially with them.
When it comes to information search (Table 3), the same measures as in Table 1 had some predictive power, but only with reference to finding information about how to quit. Concern with tobacco advertisements or anti-smoking information seemed unrelated to the conflict measures, and trust in information about the health risks of smoking was also unrelated.
Since the value ratings were of the type value × probability × weight, it was of interest to see whether thought concerns were related to all three of the dispersions or only to one or two of them. Correlations among the dispersions were considerable (see Table 4), with the qualification that the standard deviations of the importance weights correlated less with the other three sets of dispersions. Thought concerns were related about equally strongly to the dispersions of the values and of the probabilities of smoking and not smoking, but not at all to the dispersions of the importance weights. This finding suggests that the possible artefact of differences in the handling of the rating scale can be ruled out as an explanation of the relationships.
The correlations between the proposed conflict measures and the belief-value correlations are given in Table 5. The latter correlations were entered either with sign retained and transformed to Fisher's z, or with their absolute value. It is noted that Domnor, as expected, correlated with the absolute belief-value correlations and that the dispersion measures were little related to the belief-value correlations. On the other hand, Max and Behavior Conflict did correlate with the belief-value correlations, sign retained. The commitment suggested by large values of these measures was thus reflected in large absolute belief-value correlations for not smoking, the option considered preferable by most subjects. It has been reported elsewhere that action brings about a higher belief-value correlation for the chosen act (Sjoberg, 1981b).
Table 2. Correlations Between Means (M) and Standard Deviations (SD) of Rating Products and Selected Thought Concerns, Frequency of Attempts To Quit, and Attitude to Smoking. Zero and decimal point omitted.

                                      Smoke              Not smoke
                                     M      SD           M      SD
Quit in the future                  -22     41**         22     44**
Think about quitting                -31*    34*         -05     34*
Dreams about smoking                -13     26           06     23
Regret having started to smoke      -13     53***        11     58***
Wish to be a non-smoker             -05     31*          22     47**
Advice against smoking              -10     40**         22     36*
Decide in principle to quit         -21     48**        -01     45**
Difficult to decide                  03     12           11     28
Likely to change decision            09    -01           07     12
Will you ever quit                   20     11           17     15
Frequency of attempts to quit        01     16           19     24
Attitude to smoking                  08    -31*          13    -25

*** p < 0.001   ** p < 0.01   * p < 0.05
Table 3. Correlations Between Conflict Measures and Information Search. Zero and decimal point omitted.
[Columns: the conflict measures, as in Table 1. Rows: Info. about dangers of smoking; Info. about quitting; Trust in info.; Cig. adv. The individual correlation entries could not be recovered from the source.]
*** p < 0.001   ** p < 0.01   * p < 0.05
Table 4. Correlations Among Standard Deviations in Value, Weight and Probability Ratings. Zero and decimal point omitted.

                            Value    Weight    Probability,    Probability,
                                                  smoke         not smoke
Value                         -        20          66              73
Weight                       20         -          31              46
Probability, smoke           66        31           -              78
Probability, not smoke       73        46          78               -
A further aspect of concern is that of dreaming. Frequency of dreaming about smoking correlated with the recency of attempts to quit, with thinking more often about smoking, and with having made a decision to quit in the future. Dreamers reported more conflicts with others about smoking and trusted information about smoking risks more. They saw failures to quit as caused by their will to be free and responded to such failure by calmness and fatigue, but also by guilt and fear. The conflict and differentiation measures did not correlate with frequency of dreaming.
Table 5. Correlations Between Conflict Measures and Transformed (Fisher's z) and Absolute Correlations Between Ratings of Value and Probability. Zero and decimal point omitted.
[Columns: the conflict measures, as in Table 1. Rows: Corr. value-prob., smoking; Corr. value-prob., not smoking; Abs. corr. value-prob., smoking; Abs. corr. value-prob., not smoking. The individual correlation entries could not be recovered from the source.]
*** p < 0.001   ** p < 0.01   * p < 0.05
The actual frequency of attempts to quit was correlated with the conflict and differentiation measures (see Table 6). It is interesting to note that Domnor, indicating a more action-promoting decision structure, also correlated with frequency of attempts to quit, while a high level of Regret once more came out as negative for action.

Antecedents of Conflict. The questionnaire also asked for causes of failure and responses to failure. The most common causes of failure, as seen by the subjects themselves, were a strong desire to smoke, a strong habit, feeling tense, and a lack of any positive consequence of not smoking. Some items with lower means and large standard deviations suggested that some people attributed their failure to symptoms of abstinence, unpleasant feelings, or easy availability of smoking material. The value of not smoking seemed to some extent related to the attribution of causes. Those who attributed failure to a pleasant and calm mood, to habit, or to a will to feel free had a higher level of value for not smoking. Otherwise, attribution of causes had little relation to conflict and differentiation measures or to dispersions of values.
Table 6. Correlations Between Conflict Measures and Reported Frequency and Recency of Attempts To Quit. Zero and decimal point omitted.

                        Frequency   Recency
Discrimination              20        -25
Valence                    -18        -05
Max                         18        -06
SD-max                      24        -06
SD-pooled                   19        -09
SD-diff                     02        -19
Domnor                      38**       20
Regret                     -42**       01
Behavior Conflict           29         07

** p < 0.01
The opposite was true for responses to failure, however, as can be seen in Table 7. The most frequent responses were tension, guilt and an unpleasant mood, but relief and pleasure were also rather frequent. The responses correlated negatively with the value of smoking and positively with the dispersions both of smoking and of not smoking. The lowered smoking value and the increased dispersions are reflected in the relations between conflict measures and responses in Table 7. Note that it is for those conflict measures that correlated with thought concerns that sizeable correlations with response to failure were obtained. In this connection it may be noted that the reported mood effects of smoking (results not given here) did not correlate significantly with means or dispersions for smoking and not smoking, respectively. Hence, the response to failure in quitting smoking seemed a more potent force in influencing values than the actual experience of smoking. Background data such as age, sex, education, amount smoked, etc., were largely unrelated to conflict and value measures, lending some support to the generality of the present findings.
Table 7. Correlations Between Conflict Measures and Response to Failure in Quitting Smoking. Zero and decimal point omitted.

Response                     Valence  Discrimi-  Max   SD-max  SD-pooled  SD-diff  Domnor  Regret  Beh. Conf.
                                      nation
Pleasant mood                  -16      -16       17     02       00        -01      13      12      -18
Unpleasant mood                 35*      17       23     17       20         10      18     -05       42**
Tense                           08       05       14     11       10         13      04      18      -03
Calm                            38*      02       23     26       28         30      10     -14       40*
Tired                           37*      07       06     19       21         19      26     -31       33
Active                          18       19       02     35*      29         13      06     -30       17
Guilt                           46**     36*      31     02       46**       50**    33     -09       06
Shame                           53**     29       02     40*      43*        32     -05     -21       52**
Disgust                         55**     25       06     31       32         29      00     -37       59***
Surprise                        32       21       33     16       14         12      21     -30       61***
Sadness                         21       18       35*    36*      27         17     -31      59***    14
Anger                           32       23       23     18       09        -12      43*     24       40*
Self-contempt                   58**     27       13     50**     50**       33      11     -33       53***
Fear                            47**     16       09     36*      44*        17      00     -08       49**
Loss of confidence in
  self-control                  28       31*     -06     31       35*        04      01      00       23
Relief                          12       05       31     40*      35*        43*     03      22       13
Resignation                     04       06       12     18       13         17     -04      13       08
Not wanting to think about
  quitting any more             05       02       24     08       13         33      02      37      -02
Despair                         12       08       37*    07       00         40*     38*     23       25

*** p < 0.001, ** p < 0.01, * p < 0.05
Discussion
Dissonance theory is not supported by the present data, because concern about the issue and the amount of information distortion were systematically related to the indices of conflict and differentiation. The question then becomes whether a conflict or a differentiation interpretation is to be preferred.
Conflict, as measured here, partly split up into two different aspects, discrimination-regret and dispersion. Discrimination was rather strongly correlated with the dispersion measures (but in a direction opposite to that predicted under a conflict interpretation). Regret did not give the same support to distinguishing between the two types of measures. Difficulty in discrimination generally, and the experience of many good aspects in the least preferred alternative in particular, led to avoiding the decision problem. A differentiated opinion of each alternative was correlated with concern with the decision problem.

Conflict was defined in the introduction as a property of the perceived decision situation. The proposed nine indices of conflict did not give convergent results under the conflict interpretation. Under the interpretation of differentiation, however, a convergent pattern emerged. Internal differentiation in terms of values and probabilities seemed particularly important for generating concerns. There was no evidence for emotional stress due to conflict. Difficulty to decide and to act appears, thus, to be dependent on lack of differentiation both within and between alternatives; and the consequence of such difficulty is lack of concern with, or escape from, the decision problem. Further sources of difficulty are, of course, likely to exist, such as uncertainty about values and probabilities and current concerns with other burdensome problems.

Differentiation, in the present data, was thus reflected both within and between options. Tolerance of some ambivalence to options seemed to be coupled with the advantage of discriminating between them and with an active, concerned attitude. Perhaps this could be interpreted in terms of cognitive style, or, preferably, availability of mental energy to spend on these issues, a concept which need not be interpreted as a general trait but could be specific to the issue. Affective differentiation is likely to be aroused by motives. Hence, strong motives should lead to concern and action via this mechanism. Both the bipolar values and the importance weights should become more extreme with more intensive motives.

One could argue that the concern shown by such activity as deciding in principle to quit in the future is really a form of evasion. More direct and detailed action measures could answer that question. However, the fact remains that discrimination between options was positively related to discrimination within options, so it is unlikely that a general concept of conflict could be saved by more data on action. An alternative interpretation is that concern causes differentiation; causal path models could help to distinguish the two interpretations, but more data are needed to study such models. The phenomena postulated in such causal models are hard to study also because of the unknown time
relations. It is not known how fast the influence of concern is, nor can we easily observe its workings, in field settings, at the time when it occurs. We are, therefore, uncertain whether we study the factors at play in affecting each other in a field study such as the present one, or whether we study people in a steady state. In the latter case we obtain, under a conflict interpretation, measures of tolerated conflict residuals rather than the effective amount of conflict which once existed and is responsible for the present steady state.

Some antecedents of conflict and differentiation were found in responses to failure in quitting. These failure responses correlated with the value of smoking and its dispersion. They were also related to concerns. Perceived causes of failure correlated mostly with the value of not smoking. Some subjects seem to have come to favour not smoking because they attributed failure to quit to their being under the control of hedonistic impulses or habit. Follow-up studies may make it possible to analyze whether all of the relationships between dispersion and concern are due to responses to failure. The present data set is too small to answer the question. Furthermore, longitudinal change measures would be of value in such an investigation.

Action, actually trying to quit, was little related to differentiation but did correlate with Regret and Domnor. Much regret led to lack of action. Dominance was positive for action, and was also related as expected to belief-value correlations. This is in good agreement with the findings reported elsewhere (Sjoberg, 1981b) that commitment brings with it more belief-value distortion. Action and concern measures were related, so it is reasonable to expect concern to bring about action, even if action was little related here to differentiation. Distortion to achieve dominance (Montgomery, this volume) should have the same effect; even better may be to decrease regret directly by defining the rejected alternative so as to be of little value even in aspects where it exceeds the preferred alternative. If beliefs and values correlate positively for the accepted and negatively for the rejected option, there will be an increased between-option but a more modest within-option differentiation. A U-shaped relation between beliefs (y-axis) and values (x-axis) will give a maximum within-option and rather little between-option differentiation. In such a case each alternative is simultaneously attractive and repulsive, and a very hesitant attitude (the Hamlet syndrome) can be expected. An inverse U-shape for the relationship between beliefs and values would imply simultaneous denial of threat and doubt about the value of action. It would be of interest to study such complex belief-value relations in a context of action; for the present case only linear belief-value relations were investigated (see,
however, Sjoberg, 1981a, for examples of other types). Action is facilitated by differentiation, at least as long as defensive distortions are not brought in. Once a decision is made, the risk of reopening the issue must be counteracted at all costs, since various stressors and impulses work against any long-term commitment (Sjoberg, 1980). Such re-opening of the issue can be, and probably frequently is, fought with some success by the various strategies for arriving at a high belief-value correlation for the chosen alternative, hence approaching a dominance structure.
References
Beach, L.R. and J.A. Wise, 1980. Decision emergence: A Lewinian perspective. Acta Psychologica, 45, 343-356.
Berlyne, D., 1960. Conflict, Arousal, and Curiosity. New York: McGraw-Hill.
Bieri, J., 1961. Complexity-simplicity as a personality variable in cognitive and preferential behavior. In: D.W. Fiske and S.R. Maddi (eds.), Functions of Varied Experience. Homewood, Ill.: Dorsey Press.
Festinger, L., 1957. A Theory of Cognitive Dissonance. Evanston, Ill.: Row and Peterson.
Festinger, L., 1964. Conflict, Decision and Dissonance. Stanford, Calif.: Stanford University Press.
Fishbein, M. and I. Ajzen, 1975. Belief, Attitude, Intention and Behavior: An Introduction to Theory and Research. Reading, Mass.: Addison-Wesley.
Greenwald, A.G. and D.L. Ronis, 1978. Twenty years of cognitive dissonance: Case study of the evolution of a theory. Psychological Review, 85, 53-57.
Janis, I.L. and L. Mann, 1977. Decision Making. New York: The Free Press.
Lewin, K., 1943. Defining the field at a given time. Psychological Review, 50, 292-310.
Marlatt, G.A., 1978. Craving for alcohol, loss of control and relapse: A cognitive-behavioral analysis. In: P.E. Nathan, G.A. Marlatt, and T. Loberg (eds.), Alcoholism: New Directions in Behavioral Research and Treatment. New York: Plenum.
McGuire, W., 1960. Cognitive consistency and attitude change. Journal of Abnormal and Social Psychology, 60, 345-353.
Montgomery, H., 1983. Decision rules and the search for a dominance structure: Towards a process model of decision making. In this volume, 343-369.
Sjoberg, L., 1980. Volitional problems in carrying through a difficult decision. Acta Psychologica, 45, 123-132.
Sjoberg, L., 1981a. Beliefs and values as components of attitudes. In: B. Wegener (ed.), Social Psychophysics. Hillsdale, N.J.: Erlbaum.
Sjoberg, L., 1981b. Value change and relapse following a decision to quit or reduce smoking. Unpublished manuscript.
Sjoberg, L. and A. Biel, 1981. Mood, belief-value distortion and action. Göteborg Psychological Reports, 11(2).
Sjoberg, L. and G. Olsson, 1981. Volitional problems in carrying through a difficult decision: The case of drug addiction. Drug and Alcohol Dependence, 7, 177-191.
ANALYSIS OF PREDECISIONAL INFORMATION SEARCH PATTERNS¹
Joshua KLAYMAN
Center for Decision Research, Graduate School of Business, University of Chicago, U.S.A.
Abstract
Recent interest in complex and varied decision strategies has highlighted the need for more sophisticated process tracing analyses, e.g., in analyzing information gathering patterns. Earlier studies have classified strategies by high/low proportion of available information used, constant/variable amount of search across alternatives, and intra-/interdimensional direction of search. However, more powerful analyses are needed, since the search characteristics of a given strategy may be variable and highly task-dependent. Two major means of improving search analysis are discussed: (a) the use of task-specific simulations to establish the search characteristics expected from different strategies; and (b) the analysis of additional search characteristics, such as the extent to which future information search is controlled by prior information (contingency), and different types of search variability. An experimental example of the use of these techniques is presented. Applications are proposed in three areas: (a) the study of sequential combinations of decision rules, and multi-phase decision making; (b) exploration of the possibility that there exists continuous variation among strategies along various parameters, rather than a set of discrete rules; and (c) the investigation of how decision makers adapt strategy to task.
¹ This paper is based in part on work supported by a Graduate Dissertation Fellowship and Grant from the University of Minnesota. Additional support for computer analyses was provided by Hampshire College (Amherst, Massachusetts), and by the University of Chicago, Graduate School of Business. Simulation programs were developed with the collaboration of Eric Smith at Hampshire College. Thanks to Hillel Einhorn, J.E. Russo, Paul Schoemaker, and especially Robin Hogarth for comments on an earlier draft, and to Ola Svenson and two anonymous reviewers for helpful editorial suggestions.
Introduction
Human decisions have, in recent years, been seen as based on complex predecision processes. The decision maker is seen as having a broad repertoire of available strategies, the choice and application of which are dependent on characteristics of the task (cf. Montgomery, this volume; Svenson, 1979). Thus, there is a need for process tracing analyses which are sensitive enough to identify complex decision rules, and to represent the effects of task characteristics on decision strategies. The present paper discusses the analysis of information gathering patterns as a tool for process tracing. Two approaches for expanding the analysis of search patterns are proposed: (a) computer simulation of decision strategies; and (b) measurement of a greater variety of search characteristics.

Probably the best example of the use of information gathering data is provided by Payne (1976). In this study, people chose apartments on the basis of matrices of information provided on "information boards". Three search characteristics were used in an essentially dichotomous way to classify strategies, as follows (abbreviations mine):
(1) Proportion of information searched (prop search): Searching a large proportion of the available information (high prop search) is indicative of compensatory strategies (e.g., additive, additive-difference); low prop search, on the other hand, indicates noncompensatory strategies (e.g., conjunctive, elimination by aspects).
(2) Variability in proportion searched across alternatives (variab by alts): Search of the same proportion of information on each alternative (absence of variab by alts) indicates compensatory strategies; presence of variab by alts indicates noncompensatory strategies.
(3) Direction of search (direction): An interdimensional direction indicates alternative-wise strategies (e.g., additive, conjunctive); an intradimensional direction indicates dimension-wise search (e.g., additive difference, elimination by aspects).

Such a categorical treatment of search characteristics makes only limited use of a potentially rich source of information. The basic problem is that decision strategies provide more detailed specifications of search patterns which are task-specific and cannot be adequately captured by the broad classifications mentioned above. In particular, the nature of information search may be affected by factors such as the stringency of pass/fail criteria, the distribution of values on different dimensions across alternatives, and the number of dimensions and alternatives.
Figures 1-4 illustrate the effect of task characteristics on the measures used by Payne (1976) for various strategies.* Each figure displays a matrix of information, with alternatives as rows (Z through S), and dimensions as columns (A through F). The cells are numbered to indicate the order of search which would occur with the given strategy. Information is represented by a "+" or "-", indicating a value above the subject's criterion of acceptability ("pass") or below it ("fail"), respectively. Shaded cells represent information which is not examined by the decision maker. Variab by alts is measured by the standard deviation of the proportion of information searched per alternative across the set of alternatives. For direction, a score of -1.0 represents strictly dimension-wise (intradimensional) search and +1.0 strictly alternative-wise (interdimensional) search. This measure is computed as (Inter - Intra)/(Inter + Intra), where Intra is the number of instances in which the (n+1)th item searched is of the same dimension as the nth, and Inter is the number in which the (n+1)th item is of the same alternative as the nth.

Figure 1 shows the effects of a change in pass/fail criteria on search, using an elimination by aspects (EBA) rule. The criteria in Example 1b are set lower than in 1a, so that a few "fail" values are now considered "passing". With the relaxed criteria in Example 1b, prop info and variab by alts are both much greater, and the direction of search is more mixed. Note that these differences result only from a change in evaluation criteria, with no change of search strategy. Figure 2 illustrates the effects of differences in the distribution of values in the matrix, using alternative-focused conjunctive search. Note that Examples 2a and 2b differ only with respect to the distribution of pass and fail values; the overall proportions of pass to fail values are identical. The seemingly modest difference between these two examples produces important differences in search characteristics. In particular, the direction of search shifts from a largely alternative-wise +0.71 to a random-looking 0.00, and the proportion of search decreases considerably. In Figure 3 the two matrices are identical, except that in 3b one more of the available attributes is considered essential. As a result, 3b requires twice the amount of information search to find a satisfactory alternative, and variability is also greater.
* The rules modeled here are specified in the appendix. They represent the author's interpretation of the rules considered by Payne (1976) and earlier by Tversky (1969, 1972).
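To make these measures concrete, the following sketch computes prop search, variab by alts, and the direction index from a recorded search sequence. It only restates the definitions above in code; the function and variable names are illustrative and are not taken from the original studies.

```python
# Illustrative computation of Payne-style search measures from a search
# sequence of (alternative, dimension) pairs. Names are illustrative only.
from statistics import pstdev

def search_measures(sequence, n_alternatives, n_dimensions):
    """sequence: list of (alternative, dimension) cells in the order searched."""
    # Proportion of the available information that was examined.
    prop_search = len(set(sequence)) / (n_alternatives * n_dimensions)

    # Variability across alternatives: SD of the proportion of each
    # alternative's dimensions that were examined.
    per_alt = [
        len({d for a, d in set(sequence) if a == alt}) / n_dimensions
        for alt in range(n_alternatives)
    ]
    variab_by_alts = pstdev(per_alt)

    # Direction: (Inter - Intra) / (Inter + Intra), where consecutive cells
    # on the same alternative count as Inter and on the same dimension as Intra.
    inter = sum(1 for (a1, _), (a2, _) in zip(sequence, sequence[1:]) if a1 == a2)
    intra = sum(1 for (_, d1), (_, d2) in zip(sequence, sequence[1:]) if d1 == d2)
    direction = (inter - intra) / (inter + intra) if (inter + intra) else 0.0

    return prop_search, variab_by_alts, direction

# Example: strictly alternative-wise search of one alternative, then another.
trace = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
print(search_measures(trace, n_alternatives=3, n_dimensions=3))
```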
Figure 1. Effects of Change in Stringency of Pass/Fail Criteria, with an EBA Search Strategy. (Example 1a: prop info .35, variab by alts .177, direction -.83; Example 1b: prop info .50, variab by alts .328, direction -.64.)
Figure 2. Effects of Change in Distribution of Pass and Fail Values, with a Conjunctive Search Strategy. (Example 2a: prop info .61, variab by alts .390, direction +.71; Example 2b: prop info .44, variab by alts .344, direction 0.00.)
Figure 3. Effects of Change in Number of Critical Dimensions, with a Conjunctive Search Strategy. (Dimension D is considered critical in Example 3a, but not in 3b.)
Figure 4. Effects of Change in Number of Available Alternatives, with a Lexicographic Search Strategy.
Finally, Figure 4 illustrates the effects of the number of available alternatives with a minimum-difference lexicographic (lexico) strategy. The most striking difference here is in the direction of search, being mostly intradimensional in 4a, but mostly interdimensional in 4b. After the first pair of alternatives, search of new information is largely alternative-wise, although comparison is being made to a previously searched alternative. The same effect is obtained with the additive-difference strategy.

The above examples illustrate only a few of the ways in which search characteristics are task-dependent. The upshot is that the question of what to expect from different decision strategies is complicated, and cannot be answered by broad generalizations. Rather, it is necessary to consider the search patterns implied by different strategies in a particular decision situation. Here, task-specific simulations of search strategies can be useful. The tasks and task-manipulations of interest can be analyzed, then, in terms of how different simulated strategies function on the tasks in question. By so doing, one can establish theoretical points of comparison for the interpretation of human search data. The first requirement for this process is the specification of the step-by-step search implications of the various decision strategies. These specifications, and simulations based on them, also allow the measurement and analysis of a number of search characteristics beyond the three discussed above.

One class of such measures is referred to here as "contingency" measures. These indicate the extent to which future search is guided by prior information. One can measure different types of contingency as the proportion of search moves which conform to a given contingency rule, out of all instances where the rule applies. Simulations can then provide theoretical comparison values for specific decision tasks. For each noncompensatory strategy modeled here there is a different process by which "what to search next" depends on "what has been discovered so far" (see appendix). For example, in the conjunctive strategy, if a fail value is revealed, you change alternatives, but if a pass value is revealed, you may change dimensions, staying with the same alternative. Thus, the type of search move is contingent on the value of the immediately prior information. The lexico strategy has a similar contingency rule, but based on differences between pairs of values. The rule for the EBA strategy is also like that for the conjunctive, except that the information is stored so that one knows which alternatives to consider on the next dimension (i.e., only those which passed on the previous dimension). In contrast to these rules, strictly compensatory strategies do not operate contingently, but instead rely on exhaustive search of those dimensions deemed relevant to the formation of an overall evaluation of an alternative. Thus, contingency
measures are useful because they distinguish between compensatory and noncompensatory strategies more directly than other measures do, and they can also help distinguish among different noncompensatory strategies.

A second class of search measures supplements Payne's (1976) measure of variability of search. Recall that Payne measured variability in information searched per alternative, across alternatives in a given decision (variab by alts). However, one can also consider the extent to which the different dimensions receive different amounts of search. Here, the measure (variab by dims) is the standard deviation of the proportion of information searched per dimension, across the set of available dimensions. Figure 5 illustrates that the two types of variability are not redundant. In fact, the distinction between them can differentiate among sources of variability (e.g., not looking at some alternatives or some dimensions at all).
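A sketch of the two variability measures, written against the same kind of search record as above (names again illustrative, not from the original paper):

```python
# Variability of search across alternatives and across dimensions,
# computed from the set of (alternative, dimension) cells examined.
from statistics import pstdev

def variab_measures(cells, n_alternatives, n_dimensions):
    """cells: set of (alternative, dimension) pairs that were examined."""
    per_alt = [sum(1 for a, d in cells if a == alt) / n_dimensions
               for alt in range(n_alternatives)]
    per_dim = [sum(1 for a, d in cells if d == dim) / n_alternatives
               for dim in range(n_dimensions)]
    return pstdev(per_alt), pstdev(per_dim)

# Two patterns over a 4 x 4 board, in the spirit of Figure 5: ignoring one
# alternative entirely versus ignoring one dimension entirely.
skip_alt = {(a, d) for a in range(3) for d in range(4)}   # alternative 3 unexamined
skip_dim = {(a, d) for a in range(4) for d in range(3)}   # dimension 3 unexamined
print(variab_measures(skip_alt, 4, 4))   # high variab by alts, zero variab by dims
print(variab_measures(skip_dim, 4, 4))   # zero variab by alts, high variab by dims
```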
Figure 5. Illustration of the Distinction Between variab by alts and variab by dims. (Example 5a: variab by alts .204, variab by dims .456; Example 5b: variab by alts .456, variab by dims .204.)
In measuring variability, it is also important to take into account that the size of the information matrix and the proportion of information searched both have an important impact on how variable the decision maker can be, and how variable one would be by chance. Thus, standardized scores should be used, based on the distributions of all possible patterns of variability which would be generated by random search with a given size of matrix and a given proportion of search. These standardized scores (referred to here as *variab) allow search patterns to be judged against a task-specific base rate for variability.
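One simple way to obtain such a baseline is to simulate random search many times and standardize the observed score against the resulting distribution. The sketch below does this by Monte Carlo sampling; the exact standardization used in the original analyses is not specified in the text, so this should be read only as one plausible implementation.

```python
# Monte Carlo baseline for variability under random search: draw many random
# search patterns of the same size, then express the observed variability as
# a z-score against that distribution. One plausible implementation only.
import random
from statistics import mean, pstdev

def standardized_variab(observed_cells, n_alternatives, n_dimensions,
                        n_sims=2000, seed=0):
    rng = random.Random(seed)
    all_cells = [(a, d) for a in range(n_alternatives) for d in range(n_dimensions)]
    k = len(observed_cells)

    def variab_by_alts(cells):
        per_alt = [sum(1 for a, _ in cells if a == alt) / n_dimensions
                   for alt in range(n_alternatives)]
        return pstdev(per_alt)

    # Distribution of variab by alts for random searches of the same size k.
    baseline = [variab_by_alts(rng.sample(all_cells, k)) for _ in range(n_sims)]
    mu, sigma = mean(baseline), pstdev(baseline)
    observed = variab_by_alts(observed_cells)
    return (observed - mu) / sigma if sigma > 0 else 0.0
```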
Empirical Illustration
A recent study of decision strategies in 12-year-olds (Klayman, 1981) illustrates the use of computer simulations and measures of contingency and variability, as described above. As in Payne's (1976) study, subjects were presented with "information boards" consisting of matrices of cards, in the manner of the illustrations in Figures 1-4. Information was initially hidden from view, and subjects obtained information from the matrix one item at a time, with the understanding that they would continue until they had sufficient information to choose one of the available alternatives. The criterion for sufficient information was subjectively determined by each subject, and the costs associated with information search were simply the time and effort required in turning over the cards to reveal information. The number of alternatives and the number of dimensions were varied to examine the effects of task complexity. Patterns of information search were the principal data.

In this study, 48 children made four decisions, each from a different size of board (three or six alternatives, three or six dimensions). Decisions concerned one of three age-appropriate topics (e.g., buying a used bicycle). Information generally consisted of high and low value terms (e.g., "new tires" or "worn-out tires"). It was assumed that, for any item subjected to pass/fail analysis, the high value would be above criterion, and the low value below. The values were distributed such that neither dominance nor equivalence of alternatives was likely. The orientation of the boards (alternatives as rows or as columns) was counterbalanced.

The same matrices presented to subjects were used in the simulations. High and low values were encoded as (0, 1), with pass/fail criteria intermediate, when used. Two compensatory strategies (additive and additive-difference) and three noncompensatory strategies (conjunctive, lexico, and EBA) were simulated, based on conceptualizations of these strategies used in previous studies (see appendix).

The principal focus in this study was on the complexity variables, i.e., number of alternatives and number of dimensions. The dependent measures were those described earlier (prop search, variab and *variab measures), and two measures of contingency. One contingency measure (contin-conj) was based on a conjunctive rule, the other (contin-EBA) on an EBA rule. Specifically, contin-conj was calculated as Mc/M, where M is the total number of moves from the nth item searched to the (n+1)th (n = 1, ..., N-1, where N is the total number of items), and Mc is the number of such moves consistent with the following rule: if item n was a "pass", item n+1 should be on the same alternative as n; if n was a "fail", n+1 should be on a different alternative. Contin-EBA was RE/R, where R
is the total number of times that search returned to an alternative after other alternatives were searched, and RE is the number of such returns consistent with this rule: return to an alternative only if the last item searched on that alternative was a "pass". For both contingency measures, then, the chance value was 0.50, and perfect adherence to the rule scored 1.00.

The major hypothesis, as in Payne's (1976) study, was that there would be a shift from compensatory to noncompensatory strategies as complexity increased. The results with the "traditional" measures supported Payne's finding of such a shift. Prop search decreased as complexity increased, from 0.79 on the 9-item boards to 0.54 on 36-item boards. At the same time, variab by alts increased from 0.219 to 0.294. However, a different picture emerged with the new search measures, described above, and with comparisons to simulation data:
(1) Variab by alts was higher overall than the simulation values for any of the five strategies tested (a mean of 0.264, versus 0.00 for compensatory strategies, and means of 0.116, 0.137, and 0.234 for EBA, lexico, and conjunctive, respectively). The difference was even more pronounced with the standardized values, *variab by alts.
(2) Variab by dims, on the other hand, was well above the value for compensatory strategies (0.199 versus 0.00), but was much lower than in the simulations of noncompensatory strategies (0.469, 0.456, and 0.369). Again, the standardized measures confirmed this relationship.
(3) Contin-conj was moderately high (0.72), but did not vary significantly with complexity, contrary to the expectation of greater use of conjunctive strategies.
(4) The EBA type of contingency was well below chance (0.37), indicating that subjects were more likely to return to an alternative which failed on last examination than to one which passed. This tendency increased with more dimensions.

These seemingly anomalous results actually revealed an interesting pattern. First, high variab by alts with low variab by dims is characteristic of what occurs with satisficing, i.e., when some alternatives are left completely unexamined. This is illustrated in Figure 5 above. Subjects in the present study did this more often as the number of alternatives increased (on 7% of the 3-alternative boards, and 30% of the 6-alternative boards). Second, an EBA contingency below chance is exactly what occurs when a noncompensatory rule (e.g., conjunctive) is used for a first pass, and then more information is obtained on a second pass. In this study, multiple-pass search was operationally defined as search in which information is gathered on some alternatives after all alternatives have been searched at least once. Subjects did this more often as the number of dimensions increased (on 50% of the 3-dimension boards, and 73% of the 6-dimension boards).
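The two contingency indices lend themselves directly to computation from a search trace annotated with pass/fail outcomes. The following sketch follows the definitions just given; the trace format and the function names are illustrative.

```python
# Contingency measures computed from a trace of (alternative, passed) pairs,
# in search order. Follows the contin-conj and contin-EBA definitions above;
# the trace format and names are illustrative.

def contin_conj(trace):
    """Proportion of moves consistent with the conjunctive contingency rule:
    stay on the same alternative after a pass, switch after a fail."""
    consistent = total = 0
    for (alt, passed), (next_alt, _) in zip(trace, trace[1:]):
        total += 1
        stayed = (next_alt == alt)
        if (passed and stayed) or (not passed and not stayed):
            consistent += 1
    return consistent / total if total else None

def contin_eba(trace):
    """Proportion of returns to a previously searched alternative that occur
    only when the last item seen on that alternative was a pass."""
    last_outcome = {}          # alternative -> outcome of its last searched item
    returns = consistent = 0
    prev_alt = None
    for alt, passed in trace:
        if alt in last_outcome and alt != prev_alt:   # a return to this alternative
            returns += 1
            if last_outcome[alt]:
                consistent += 1
        last_outcome[alt] = passed
        prev_alt = alt
    return consistent / returns if returns else None

trace = [(0, True), (0, False), (1, True), (1, True), (0, True)]
print(contin_conj(trace), contin_eba(trace))
```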
Third, the moderately high value obtained on contin-conj indicates that the rules used for satisficing, or for selection for second-pass search, were partially compensatory. The data did not indicate a mix of some individual compensatory patterns and some noncompensatory. Rather, the modal individual pattern was itself mixed.

In sum, the search analyses used here highlighted a pattern of both sequential and simultaneous mixing of decision rules. Changes with complexity were manifested in increased use of multiple search passes, and in increased use of satisficing. However, further investigation will be necessary to validate this interpretation, and to test its generality, since characteristics of materials and procedures can be assumed to affect the behavior of both human and simulated decision makers. In the study above, for example, the number of available dimensions and alternatives was finite and known, there was no missing information, and only acquisition of new information was traced; there was no information about rechecking of previously searched data. The simulations, too, involved a number of procedural assumptions (see Appendix), and the five search rules simulated by no means exhaust the set of possible strategies (cf. Svenson, 1979). Further investigation is required to understand the effects on decision strategies of variations in materials, procedures, and assumptions.
Concluding Discussion
In conclusion, there are several important areas of study in which more detailed analysis of search patterns could provide important process information. One such area is the use of sequential combinations of decision rules, or multi-stage processing. The empirical investigation described above illustrates how multi-stage patterns may be identified. The investigation also suggests the intriguing possibility that decision strategies do not exist as discrete members of a set, but instead represent points in a multidimensional space of decision strategies. Strategies would then differ along several continuous parameters. Figure 6 illustrates several possible decision rules which fall between some of the usual distinctions made among strategies. The "two-alternative EBA" (A) is a blend of the conjunctive and EBA in its elimination rule. The "double whammy" and the "additive with truncation" rules (B) represent two different points between the purely compensatory and purely noncompensatory rules. The "three-level lexico" rule is one step along the line from dichotomous evaluation to interval scaling. Other possibilities for continuous variation among strategies are suggested by Einhorn (1970) and by Montgomery and Svenson (1976).
Figure 6. Examples of "Compromise" Strategies. (Panel A, mixing contingency rules: the two-alternative EBA, "consider further if pass (+) on A and B"; conjunctive contingency = EBA contingency = 1.00. Panel B, partial compensation: the "double whammy" rule (slightly compensatory), "eliminate if two failures", conjunctive contingency = 0.67; and the "additive with truncation" rule (most compensatory), "if < 20 on first three dimensions, quit". Panel C, in-between scaling: the three-level lexico, "eliminate if large difference, or net two small"; variability low but not 0.)
The question, then, becomes not whether a strategy is compensatory, but rather, how compensatory it is, how variable, how conjunctive, and so on.

Finally, the most important use of task-specific simulations and expanded search measures is to study how strategies are adapted to tasks. Although the question of task complexity has received some investigation, as in Payne's 1976 study, there are other important task factors which have received less attention; e.g., the demand for accuracy of judgment, costs of information search, and effort-related factors such as time constraints, competing tasks, and the availability of aids to memory, computation, etc. (cf. Hogarth, 1975). The use of more detailed analyses can make information-gathering measures more powerful as process tracing tools. By combining these measures with other process tracing methods, and by establishing task-specific points of comparison for different strategies, one can formulate testable predictions concerning the effects of many important types of task characteristics.
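As an illustration, a compromise rule such as the "double whammy" of Figure 6 can be specified precisely enough to simulate. The sketch below is one such specification; the search order and the handling of surviving alternatives are chosen here for illustration and are not part of the original figure.

```python
# A sketch of the "double whammy" rule: search alternative-wise and eliminate
# an alternative as soon as it has accumulated two "fail" values. Search order
# and the handling of the final survivor are assumptions for illustration.

def double_whammy(matrix):
    """matrix[alt][dim] is True for a pass, False for a fail.
    Returns (chosen_alternative, search_trace)."""
    trace = []
    survivors = []
    for alt, row in enumerate(matrix):
        fails = 0
        for dim, passed in enumerate(row):
            trace.append((alt, dim))
            if not passed:
                fails += 1
                if fails == 2:          # second failure eliminates the alternative
                    break
        else:
            survivors.append(alt)       # fewer than two failures: keep it
    chosen = survivors[0] if survivors else None
    return chosen, trace

board = [
    [True,  False, True,  False],   # eliminated after the second fail
    [True,  True,  False, True],    # survives with a single fail
    [False, False, True,  True],    # eliminated early
]
print(double_whammy(board))
```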
Appendix
The following are summaries of the search rules used in Figures 1-4, and in the simulations described (Klayman, 1981). It must be noted that these are not the only possible representations of the five named strategies (cf. Svenson, 1979), since assumptions must be made concerning the translation of algebraic models or axiomatic systems into behavioral terms. These models represent cases in which a unique choice can be obtained through application of a single "pure" strategy.

Additive. "Search one alternative, across dimensions, until all dimensions are examined which might contribute significantly to the overall value of an alternative. When this is done, begin examination of a new alternative. Continue until all alternatives are examined." The choice is the alternative with the highest overall evaluation, based on a weighted sum of values for that alternative, across dimensions. It is assumed here that (a) the dimensions need not be searched in any particular order; (b) exact ties among alternatives do not occur; and (c) there is no "truncation", in which further search of an alternative is halted because absolute maximum values on further dimensions could not compensate sufficiently for the low values thus far obtained.

Additive difference. "Examine and compare two alternatives on one dimension. Then consider the same pair on another dimension. Continue until all dimensions have been examined which might contribute significantly
to the overall difference in value between a pair of alternatives. When this is done, use the same procedure to compare the better of the pair to a new, third alternative. Continue until all alternatives are examined." The "better" alternative is the one favored overall in a weighted sum of value differences, across dimensions. The final choice is the better of the final pair of alternatives. Assumptions here correspond to those for the additive rule.

Conjunctive (by alternatives). "Search one alternative, across dimensions, as long as values are above the pass/fail criterion for each dimension. Stop at the first observation of a 'fail' value, and begin search on a new alternative. If an alternative has been searched on all dimensions for which a criterion of minimum acceptability exists, and no 'fail' values are obtained, choose that alternative, and do not search others." Some assumptions here are: (a) search is oriented toward satisficing, in that alternatives are evaluated one at a time, and search ceases with the first completely acceptable alternative; and (b) dimensions are searched in a fixed order across alternatives. In addition, there must be some process for handling "default" (cases in which all alternatives fail). In the simulations, the choice in case of default was that alternative which was most searched before failing. If a unique choice did not result, the case was discarded. A fixed order of dimension search is modeled because searching dimensions in order of importance, or in order of likelihood of failure, enhances optimality or efficiency, respectively.

Lexicographic (minimum-difference). Search is pairwise, as with additive difference, except that differences are classified as either significant or insignificant: "If a difference is significant, stop search on this pair. Next, use the same procedure to compare the favored alternative of the pair to a new, third alternative. Continue until all alternatives have been considered." The chosen alternative is the favored one of the last pair. It is assumed that (a) for each dimension considered, there exists a minimum difference, Δd, such that differences less than Δd are ignored, and differences greater than Δd are sufficient to eliminate the disfavored alternative; and (b) dimensions are searched in a fixed order. Default is possible in that exact ties can occur (i.e., all differences between two alternatives may be below Δd). In the simulations, this was handled by allowing a temporary three-way comparison process. If ties persisted, the case was discarded.

Elimination by aspects. "Look at all alternatives on one dimension. Then, go to the next dimension and examine those alternatives which passed the acceptability criterion of the previous dimension. Continue until only one alternative passes on the dimension considered." That last alternative is chosen. Default is possible in that the final dimension may eliminate all remaining alternatives. In the simulations, those cases were excluded.
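As an example of turning these verbal specifications into a simulation that emits a search trace, here is a sketch of the elimination-by-aspects rule above applied to a pass/fail matrix; the handling of the default case is simplified relative to the exclusion procedure described in the text.

```python
# Sketch of an EBA simulation over a pass/fail matrix, emitting the search
# trace so that measures such as prop search or direction can be computed
# on simulated data. Default handling is simplified for illustration.

def simulate_eba(matrix):
    """matrix[alt][dim] is True (pass) or False (fail).
    Returns (chosen_alternative_or_None, search_trace)."""
    n_dims = len(matrix[0])
    remaining = list(range(len(matrix)))
    trace = []
    for dim in range(n_dims):
        survivors = []
        for alt in remaining:            # examine every remaining alternative on this dimension
            trace.append((alt, dim))
            if matrix[alt][dim]:
                survivors.append(alt)
        if not survivors:                # default: all eliminated on this dimension
            return None, trace
        remaining = survivors
        if len(remaining) == 1:          # a unique survivor is chosen
            return remaining[0], trace
    return None, trace                   # ties remain after the last dimension

board = [
    [True,  True,  False],
    [True,  False, True],
    [False, True,  True],
]
print(simulate_eba(board))
```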
References
Einhorn, H.J., 1970. The use of nonlinear, noncompensatory models in decision making. Psychological Bulletin, 73, 221-230.
Hogarth, R.M., 1975. Decision time as a function of task complexity. In: D. Wendt and C. Vlek (eds.), Utility, Probability, and Human Decision Making. Dordrecht, Holland: Reidel.
Klayman, J., 1981. The analysis of decision making strategies in children. Working paper. Center for Decision Research, University of Chicago, Graduate School of Business.
Montgomery, H., 1983. Decision rules and the search for a dominance structure: Toward a process model of decision making. In this volume, 343-369.
Montgomery, H. and O. Svenson, 1976. On decision rules and information processing strategies for choice among multiattribute alternatives. Scandinavian Journal of Psychology, 17, 283-291.
Payne, J.W., 1976. Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.
Svenson, O., 1979. Process descriptions of decision making. Organizational Behavior and Human Performance, 23, 86-112.
Tversky, A., 1969. Intransitivity of preferences. Psychological Review, 76, 31-48.
Tversky, A., 1972. Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
REASONS GIVEN FOR RISKY JUDGMENT AND CHOICE: A COMPARISON OF THREE TASKS
Rob RANYARD
Psychology Section, Bolton Institute of Technology, Bolton, U.K.
and
Ray CROZIER
Faculty of Art and Design, South Glamorgan I.H.E., Cardiff, U.K.
Abstract
Subjects spoke aloud their reasons for choice in three risky decision experiments: a bidding task, a binary choice task, and a multi-option choice task. The options were simple bets varying on two risk dimensions, an amount and a chance to win. The major propositions contained in reasons were classified as value judgments, heuristics, decision rules, and miscellaneous, and the pattern of such propositions varied across
tasks. Task differences were found in the ratio of absolute to relative judgments and the type of computational heuristic employed. A large proportion of miscellaneous propositions in the bidding task referred to the bidding transaction. Implications of these findings were discussed for the nature of framing and anticipatory processes involved in the production of reasons.
Introduction
Decision making can be seen as involving two interdependent activities, making the choice and justifying it, and the way in which these activities relate is an important issue for cognitive decision theory. Some recent research (e.g., Slovic, 1975; Slovic et al., 1976; Fischhoff et al., 1980) has suggested that the justifiability of decision rules may be an important factor in their adoption and might explain aspects of judgment and choice behaviour, for example, the lack of appeal to some decision makers of certain quantitative decision aids (Slovic et al., 1976). This notion implies that decision processes may be biased or altered by the justification process, but alternative views of the relationship between these have been presented (Ericsson and Simon, 1980). The 'anti-introspectionist' view (Nisbett and Wilson, 1977) proposes that decision processes are not accessible to introspection and are separate and distinct from justification, while the process tracing view (Payne et al., 1978;
Svenson, 1979) suggests that justifications are generated by decision processes and that verbal reasons for choice allow insights into these. We would argue that a major goal for the psychology of decision making should be to understand the role which justification of choice plays, and that a necessary first step towards this involves empirical investigations of the kinds of reasons which subjects actually produce in different decision tasks.

We are currently approaching the task of examining these reasons for choice in two ways. In the first kind of study, subjects are presented with carefully selected choice options and are explicitly asked to give reasons for each choice which they make. We can then examine relationships among options, choices, and reasons for that task. In the second approach the decision task is varied and we can study relationships between changing task parameters and the pattern of reasons for choice which subjects produce. The research described here takes this second approach, and this paper will present the findings from three studies intended to show how reasons for choice varied across three different risky decision tasks: all three studies involved simple gambling options which included explicit probabilities of winning small amounts of money, but they differed in other task requirements. In Study 1 subjects nominated the smallest selling price (their bid) which they would be prepared to accept for a number of options; in Study 2 subjects made a series of binary choices between pairs of options; in Study 3 subjects made a series of choices among sets of either three or five options. The relationships among options, reasons and choices in Study 2 were described also by Ranyard (1981).

For a number of reasons our preference at the moment is for the study of simple risky decision tasks of kinds which have become traditional in research into decision making. We know both a lot and very little about such tasks. On the one hand, very little is known about the reasons people give in these tasks. On the other hand, a considerable body of research has investigated subjects' performance in these situations and has produced interesting findings, including frequent deviations from the predictions of normative models, the influence of information processing considerations, response-induced reversals of preference, e.g., between bids and choices, and the existence of decision rules, such as elimination-by-aspects rules (Tversky, 1972), which account for significant aspects of choice behaviour. These tasks offer 'fertile' ground for studying relationships between choice and reasons. They can also be readily described and compared in terms of their risk dimensions and their expected value and higher moments of their underlying distributions. As we compare reasons and choices, we can relate our subjects' choices to the literature on decision
making in similar tasks, and consider the insights which the study of reasons may offer for those findings. The data which we report on below come from three experiments carried out using options which involved very similar amounts of money and probabilities and which were presented to three groups of subjects from a common background: volunteer undergraduate psychology students. Our objectives were (a) to encourage our subjects to produce their reasons for the choices which they made, and to record these; (b) to utilize a classification scheme which would accommodate the reasons produced; (c) to compare the patterns of reasons produced in the three separate tasks, to identify major differences, and to relate these to task requirements.
Method
Task 1 (bidding)
Subjects
Five subjects participated in the experiment.
Options
Forty-four options of the form, win amount s with probability p, incorporating combinations of payoff values in the range £1.40 to £4.00 and probability values in the range 3/20 to 16/20. These were presented one at a time in the same randomly determined order for all subjects.
Procedure
Each of the 44 options was printed on a card, and the subject was faced with the pile of cards face down upon a table. The subject was asked to work through these at his/her own pace, and on each trial to state the minimum selling price acceptable for each option and to speak aloud the reasons for arriving at that price, by saying a sentence of the form, "I would ......... because .........". Instructions to the subject about this task and the method for arriving at a purchase price for the option were standard (e.g., Becker et al., 1964). After some practice trials to ensure that the subject understood the task, the experimenter left the room, and
subjects' responses were tape recorded throughout the session. At the end, the experimenter returned, one of the options was selected and, depending on the selling price proposed, the experimenter bought the option or permitted the subject to play it and to keep any winnings. Finally, subjects were asked to summarise the approach which they had taken to the task, and a short debriefing session followed.
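The standard selling-price procedure cited here (Becker et al., 1964) resolves each bid against a buying price that does not depend on the bid itself, which makes truthful pricing the best policy. The sketch below shows one such resolution rule; the uniform buying-price distribution and the payoff handling are assumptions made for illustration, not details reported by the authors.

```python
# One illustrative resolution rule in the spirit of the Becker et al. (1964)
# selling-price procedure: a random buying price is drawn independently of the
# stated minimum; the bet is sold if the buying price is at least that minimum,
# otherwise the subject plays the bet. Distributional details are assumptions.
import random

def resolve_bid(stated_minimum, amount, p_win, max_price, rng=random):
    buying_price = rng.uniform(0, max_price)       # assumed uniform buying price
    if buying_price >= stated_minimum:
        return ("sold", buying_price)              # paid the buying price, not the bid
    winnings = amount if rng.random() < p_win else 0.0
    return ("played", winnings)

print(resolve_bid(stated_minimum=1.50, amount=4.00, p_win=13 / 20, max_price=4.00))
```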
Task 2 (binary choice)
Subjects
A total of 10 subjects participated in the experiment.
Options Thirty-nine pairs of options of the form, win amount s with probability p, incorporating payoff values in the range E 1 . 4 0 to M.00 and probability values in the range 3/20 to 16/20, were presented one pair at a time in a random order.
Procedure
The option pairs were presented on the visual display of a Commodore Pet computer. On each trial the subject was asked t o press one of two keys on the computer keyboard to indicate the option chosen, and to speak aloud the reasons for arriving at that choice by saying a sentence of the form, ”I choose option (1 or 2) because ......”. After giving full instructions to the subject and running some practice trials t o ensure that the task was understood the experimenter left the room, and the subject’s reasons were tape recorded throughout the session. At the completion of the trials the experimenter returned and, as described to the subject beforehand, one of the option pairs was selected and the subject’s preferred option played with the subject keeping any winnings. As in Task 1 , the experiment ended with subjects being asked to summarise their approach to the task and a short debriefing session.
REASONS FOR RISKY JUDGMENT AND CHOICE
419
Task 3 (multichoice) Subjects Ten subjects participated in the experiment.
Options Twenty-one sets of options of the form, win amount s with probability p, incorporating payoff values in the range fo.60to g4.40 and probability values in the range 2/20 t o 18/20, were presented one set at a time in a random order. Twelve of the sets contained 3 options, and 9 contained 5 options.
Procedure This was the same as in Task 2. In all three tasks, considerable care was taken t o ensure that subjects understood the task and that their decision had a real consequence in determining their payoff.
Results
Basic Classification of Propositions Contained in Reasons. The major propositions contained in the reasons were identified and classified using four categories whose titles are presented in Table 1 and w h c h had been developed from the classification scheme devised by Ranyard (1981). The first category refers to value judgments and these were classed in terms of the object of evaluation (the risk dimensions included in the option, or the option itself) and the basic type of evaluation (absolute or relative). For example, "not much difference between the options" would be an instance of a relative judgment about options, while "the chance of winning is high" would be classed as an absolute judgment of the probability risk dimension. Heuristic propositions imply some strategy to simplify the decision task. The classification heuristic is where subject classify the current task as similar to one they have previously encountered, while the computation heuristic is where subjects reduce the number of items of information they need to consider by computing relationships between aspect values and subsequently working with these, or where they convert the information t o a simpler form, e.g., odds of 27*
R. Ranyard and R. Crozier
420
6 in 10 are seen as 50-50 or 90p is evaluated as ’close to gl.00’. In
Decision Rule propositions subjects explicitly formulate some rule which they claim to be following, i.e., the rule appears directly in the proposition and is not inferred by the experimenter. The Miscellaneous category was used for those propositions which could not be classified elsewhere. All propositions in all three tasks were identified and classified independently by the two authors who then compared their results. The percent agreement between the two judges was very high, 94%, and the remaining differences were resolved without difficulty, as they tended to be errors in applying the scheme rather than disagreements as to how the scheme should be applied. Table 1 presents the percentage occurrencesofthe four types of proposition for the bidding, binary choice, and multi-option choice tasks. Differences across the three tasks will be considered for each of the four proposition categories. Table 1. Percentages o f the Four Types of Proposition Across the Three Tasks
I
Task
Category Bidding
I 1 Binary choice
Multichoice
Judgm en t
Relative
8.2
6R.6
59.7
Absolute
335
9.5
25.2
Computation
24.0
13.7
8.2
Classification
0.9
3.2
0.2
Heuristic
Decision rule
Miscellaneac s Total number o f propositions Total number o f trials
Judgments
Inspection of Table 1 shows clear differences in the pattern of judgments between the bidding task and the two choice tasks, with bidding inducing more absolute judgments and the choice tasks more relative judgments. However, there was also a difference between the two choice tasks, in that there were many more absolute judgments in the multi-option task, due to an increase in this task of absolute judgments of the payoff dimension (44 such propositions as opposed to 7 in the binary task). Further inspection of propositions revealed that judgments were largely about the risk dimensions rather than about complete options, and this was true of all three tasks. Overall, only 3.5% of relative judgments and 8.9% of absolute judgments concerned options. In the bidding task this rose to 20% of all judgments, but even here dimensional propositions predominated. There seemed to be no marked differences in references to the two types of risk dimensions included in an option; 51.2% of the judgments concerned probabilities and 42% payoffs. However, when a distinction was made between relative and absolute judgments it was noted that while relative judgments were divided equally between the risk dimensions, absolute judgments of probabilities were twice as frequent as those of payoffs (60% of absolute judgments as opposed to 30%). This was largely due to different processing considerations across tasks, as absolute judgments of payoffs were virtually absent in the binary choice task. In the bidding task, there were more propositions about probabilities than about payoffs (47.4% as against 35.5%), which does not support the common assertion (e.g., Slovic and Lichtenstein, 1968) that subjects give more weight in bidding tasks to payoff information since they have to produce a selling-price in monetary terms.
Heuristics
Classification heuristics played virtually no part in the propositions produced by subjects. Computation heuristics were more significant, especially in the bidding task where they accounted for 24% of all propositions. The smallest number occurred in the multi-option task. In order to look more closely at the role of computations in propositions, a further post hoc analysis was carried out. Subjects' original protocols were re-examined for references to computations and these were listed, whether or not they had originally been categorised as computational heuristics. For example, if a proposition like "the chances of winning are only about one in two" had been categorised as an absolute judgment about proba-
bilities, it was now considered as an instance of a computation, so that this new class was broader than and included the original computation heuristics category. Three kinds of computation could be distinguished: rounding, i.e., a conversion or simplification of values on the risk dimensions, as in the example above; a difference computation, where differences between probability or payoff values are calculated; and a ratio computation, where a ratio of values was calculated. Table 2 presents these three classes in terms of overall frequency and as a percentage of the total number of computations for the three different tasks.

Table 2. Rounding, Difference and Ratio Computations in Each Task as a Percentage of Total Computations in the Task (frequencies in brackets)
                                Bidding        Binary choice   Multichoice
Rounding                        98.4 (121)     19.2 (39)       43.1 (25)
Difference
Ratio
Total number of computations    123            203             58
Total number of trials          220            390             210
Again, the three tasks induced different patterns of propositions. Bidding was dominated by rounding while difference and ratio computations were virtually absent. Binary choice had instances of all three classes with an emphasis on difference computations, and the multi-option task attracted a mixture of rounding and difference computations. Ratio computations tended to occur only in the binary choice task.
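To illustrate the three computation classes on a concrete pair of bets, here is a toy sketch; the bets and the rounding target are invented for the example and are not items from the experiments.

```python
# Toy illustration of the three computation classes identified in the protocols:
# rounding, difference, and ratio computations on a pair of bets.
# The bets and the rounding target are invented for this example.
from fractions import Fraction

bet_1 = {"amount": 2.20, "p_win": Fraction(6, 20)}
bet_2 = {"amount": 1.60, "p_win": Fraction(13, 20)}

# Rounding: simplify a value on one risk dimension ("6 in 20 is about a third").
rounded_p1 = round(float(bet_1["p_win"]) * 3) / 3           # -> about 0.33

# Difference: compare the two options on one dimension at a time.
amount_difference = bet_1["amount"] - bet_2["amount"]       # -> about 0.60
p_difference = float(bet_2["p_win"] - bet_1["p_win"])       # -> 0.35

# Ratio: "the chance for option two is about double."
p_ratio = float(bet_2["p_win"] / bet_1["p_win"])            # -> about 2.17

print(rounded_p1, amount_difference, p_difference, p_ratio)
```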
Decision Rule
Explicit formulation of a decision rule was rarely included in the propositions elicited in the three tasks. Where it occurred in the bidding task, these instances were nearly all produced by one subject, who expressed a variant of the expected value rule on every trial: where the chance of winning the payoff was p, his selling price was explicitly set at slightly more than p × S.
In the post-experimental session he stated that he set his price slightly high in order to introduce more risk into the task, by making it more likely that he would be able to play the gamble rather than sell it. Thus, while subjects in general may use a rule to make their decisions, they did not explain themselves in terms of rules.
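As an illustration only, this subject's reported rule can be written as a one-line computation. The 5% markup and the example gamble below are hypothetical: the protocol says only that the price was set "slightly more" than p × S.

```python
def subjects_selling_price(p_win: float, amount_to_win: float, markup: float = 0.05) -> float:
    """One subject's reported bidding rule: ask slightly more than the expected value p * S.
    The size of the markup is an assumption; the protocol says only "slightly more"."""
    return p_win * amount_to_win * (1.0 + markup)

# Hypothetical example: a 13/20 chance of winning 80 pence
print(subjects_selling_price(13 / 20, 80))  # 54.6 pence, just above the expected value of 52
```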
Miscellaneous
While only a small number of propositions in the two choice tasks could not be accommodated by the above three categories, and these seemed to be a haphazard collection of propositions without any discernible trends, a much larger number fell into the miscellaneous category in the bidding task. Closer examination showed that these seemed to be orderly and could be subsumed under three headings: (1) reference to some strategy (more general and less explicit than a decision rule) which the subject was adopting, e.g., "I'm going to ask a lot", "I'd better ask a high price", or "I'll try not to sell it". These accounted for 8.4% of the total number of propositions. (2) Anticipation of the buyer's response to the subject's bid, e.g., "If he was prepared to pay 60 for a 16/20 chance he's obviously going to be prepared to pay much less for odds of 13/20. Buyer won't pay more than 55 p". Such propositions accounted for 5.6% of the total. (3) Anticipation of the seller's own response to the outcome of the transaction, e.g., "Bad odds. Would I go below £1? Three to 17, sell for 50, no, 30, I want to get rid, I don't like that one". These comprised 6.4% of the total propositions in the bidding task. Altogether these three types of proposition accounted for a sizeable minority (20.4%) of the propositions in the bidding task. While it is self-evident that these could only occur in this task, it is of considerable interest that there were so many references to this aspect of the task; it suggests an aspect of pre-decision behaviour which would only be apparent when reasons for choice are elicited.
Discussion
The above results show major task differences in the percentages of proposition types. Task differences can be further elucidated by considering the way propositions were combined to form reasons.
The bidding task. The relative judgments that occurred in this absolute judgment task suggest that subjects were trying to develop a frame of reference for absolute evaluation as their experience increased, e.g., "... not too bad a ratio compared to the others". Relative judgments
also seemed to reflect the subject's need to be consistent over time, e.g., "Doesn't compare anywhere near as good as the previous ones, so obviously do not ask the same price". The role of rounding computations was as an aid to making absolute judgments of probabilities. Subjects seemed more comfortable evaluating, for example, a 7/20 p value when they rounded it to "about a third". The high percentage of judgments of p and S suggests that subjects based their bids on an evaluation of both dimensions, though they did not necessarily mention both in every reason. These judgments were utilized in different ways. As previously mentioned, one subject consistently and explicitly fed the absolute judgments into a decision rule. The more frequent approach, however, involved anticipating the transaction between buyer and seller. Subjects seemed to form a preliminary evaluation of the bet, and then decide whether they wanted to play it or sell it. They then manipulated the bid in an attempt to secure the preferred outcome, e.g., "A good chance here, not a bad payoff, so I'm going to ask quite a bit here because he should be keen to buy that". The processes that dominated in the bidding task, then, were absolute judgments and anticipatory, strategic ones.
The binary choice task. As reported previously (Ranyard, 1981) and as shown in Table 1, relative dimensional evaluation dominated reasons in this task. This was also found by Montgomery (1977). The most common combination of propositions was a pair of relative judgments, one on each risk dimension. Although these judgments were not usually compared explicitly, a comparison was often implied, e.g., "I'll go for the slightly better chance, even though the winnings are very slightly less". The role of difference and ratio computations was as an aid to relative dimensional evaluation, e.g., "The chance for option one is double and there's only 60 pence difference in the amounts. The double chance seems worth going for". It should be noted that reasons were not solely based on relative dimensional processing. Absolute judgments were quite common and, as in the bidding task, rounding computations seemed to function as an aid to the absolute evaluation of probabilities. In the binary choice task, then, the main processes were relative dimensional judgments sometimes combined with absolute ones.
The multi-choice task. Propositions were combined in a similar way as in the binary choice task. However, the proportion of relative judgments was somewhat reduced, and absolute judgments increased. Relative judgments revealed a concern with the rank order of options on a dimension, usually shown by references to the highest or lowest value, but references to the middle, second highest value, etc. were also made. It was much
more common in this task to find relative judgments combined with absolute ones, e.g., "You have a high chance of winning a high amount, and it was the second highest chance". Also, choice was sometimes justified completely in terms of absolute judgments, e.g., "There's a 50-50 chance of winning a large amount". Combinations of absolute judgments were less common, though, than combinations of relative and absolute, or pairs of dimensional judgments. As in the binary task, these latter were often implicitly compared.
Comparisons Between Bidding and Choice
Lichtenstein and Slovic (1971, 1973) have shown that preferences for duplex gambles inferred from bids can be the reverse of those inferred from binary choices. Their explanation of this, partly derived from post-experiment interviews, is that the underlying cognitive processes utilized in the two tasks are quite different. In particular, they propose that binary choice is based on dimensional processing, while bidding is based on a process which uses the amount to win as a starting point and adjusts downwards by an amount based on the other risk dimensions. The production of the bid is said to be subject to a commensurability bias, and this is said to be the main determinant of preference reversals. Our results are quite consistent with this hypothesis. However, our reasons for bids suggest that a second determining factor may be operating: the anticipatory, strategic processes that were reported. If, as previously suggested, subjects decide whether they want to play or sell, and then manipulate the bid to try to secure the preferred outcome, reversals of preference between bidding and choice would occur. This is not to deny the existence of a commensurability bias with options that have monetary outcomes. But, if anticipatory, strategic thinking is also a major determinant, it would produce reversals with options having non-monetary outcomes. Commensurability bias, of course, would not predict this.
Binary and Multi-Option Choice
Recent research suggests that decision processes change when the number of options available increases (Payne et al., 1978). We found that the ratio of absolute to relative judgments increased with the number of options, although the generality of this result needs to be tested further. The multi-choice data reinforce a point previously made (Ranyard, 1981), i.e., even in the simple choice tasks considered here, absolute and relative
judgments were often combined, and such combinations are not described by current decision rule taxonomies (e.g., Svenson, 1979; cf., however, Klayman, this volume). This suggests that, as a unit of analysis, the decision rule may be too complex to give valid descriptions of the decision process. Elementary information processing units such as those considered by Huber (1980) may be more useful.
Computational Framing
Differences in the computational heuristics used in the three tasks can be understood as differences in the framing of the tasks (Tversky and Kahneman, 1981). It was suggested by Kahneman and Tversky (1979) that rounding may be one of the operations people employ when formulating a judgment or choice task. We observed the rounding of probabilities in all three tasks, presumably because probability values presented as fractions of 20 were difficult to process. The greater frequency of rounding in the bidding task suggests that it is framed quite differently from the choice tasks. The occurrence of other computations, such as difference and ratio ones, suggests that framing in terms of computations is a basic element of the process when dimension values are numerical. Our interpretation of the differences in dimensional computations between the binary and multi-choice tasks is as follows. A basic framing operation which takes precedence over computations in risky choice is the rank ordering of options on the risk dimensions. In the binary choice task this requires little effort, and additional framing, in terms of computations, takes place. In the multi-choice task, however, this rank ordering involves rather more cognitive effort and, therefore, rather less computational framing occurs. This explains the smaller proportion of dimensional computations in the multi-choice task. Another aspect of the way the task is framed can account for the virtual absence of ratio computations: the option set may have been framed in terms of comparisons between options adjacent on the two risk dimensions, and their dimension values did not have the simple ratios that tend to produce ratio computations.
Reasons Formation
Fischhoff et al. (1980) proposed that "... people make choices by searching for rules or concepts that provide a good justification, that minimize the lingering doubts, and that can be defended no matter what outcome occurs". They go on to suggest that reason formation may be
task-dependent, and that this may explain reversals of preference between bids and choices. We have already shown that our results are compatible with this latter view. The more general proposal quoted above, however, requires considerable elaboration. Both Montgomery (this volume) and Kahneman and Tversky (1979) have recently proposed models which imply that the search for reasons is a multi-stage process. What our results suggest about the nature of the processes leading to the production of reasons in the tasks studied can be summarized as follows. In framing the tasks, absolute and relative judgments of values on both risk dimensions were involved. The balance of absolute to relative judgments depended on basic task variables, such as the number and similarity of the options available and whether it was an absolute or relative judgment task. Framing also involved various computations on the numerical values of both risk dimensions, the type again depending on basic task variables. The frame constructed often seemed to lead directly to a reason. Otherwise, the frame was used to anticipate the consequences of opting for one decision rather than another. In the bidding task this anticipatory process involved the use of a psychological model of the other player in the game. In all tasks, reasons were constructed in a variety of forms. The discussions by Tversky (1972) and Slovic (1975) seem to suggest that justifications in terms of decision rules may be important. However, we found that explicit statements of decision rules were exceptional. More usually, evaluations and anticipatory statements were combined creatively in ways only predictable in general terms.
References
Becker, G.M., M.H. de Groot, and J. Marschak, 1964. Measuring utility by a single response sequential method. Behavioral Science, 9, 226-233.
Ericsson, K.A. and H.A. Simon, 1980. Verbal reports as data. Psychological Review, 87, 215-251.
Fischhoff, B., P. Slovic, and S. Lichtenstein, 1980. Knowing what you want: Measuring labile values. In: T. Wallsten (ed.), Cognitive Processes in Choice and Decision Behaviour. Lawrence Erlbaum.
Huber, O., 1980. The influence of some task variables on cognitive operations in an information-processing decision model. Acta Psychologica, 45, 187-196.
Kahneman, D. and A. Tversky, 1979. Prospect theory: An analysis of decisions under risk. Econometrica, 47, 263-291.
Lichtenstein, S. and P. Slovic, 1971. Reversals of preference between bids and choices in gambling decisions. Journal of Experimental Psychology, 89, 46-55.
Lichtenstein, S. and P. Slovic, 1973. Response-induced reversals of preference in gambling: An extended replication in Las Vegas. Journal of Experimental Psychology, 101, 16-20.
Montgomery, H., 1977. A study of intransitive preferences using a think aloud procedure. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: Reidel.
Montgomery, H., 1983. Decision rules and the search for a dominance structure: Towards a process model of decision making. In this volume, 343-369.
Nisbett, R.E. and T.D. Wilson, 1977. Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Payne, J.W., M.L. Braunstein, and J.S. Carroll, 1978. Exploring predecisional behaviour: An alternative approach to decision research. Organizational Behavior and Human Performance, 22, 17-44.
Ranyard, R.H., 1981. Binary choice patterns and reasons given for simple risky choice. Unpublished manuscript.
Slovic, P., 1975. Choice between equally valued alternatives. Journal of Experimental Psychology: Human Perception and Performance, 1, 280-287.
Slovic, P. and S. Lichtenstein, 1968. The relative importance of probability and payoffs in risk taking. Journal of Experimental Psychology Monograph Supplement, 78 (3), part 2.
Slovic, P., B. Fischhoff, and S. Lichtenstein, 1976. Cognitive processes and social risk taking. In: J.S. Carroll and J.W. Payne (eds.), Cognition and Social Behaviour. Lawrence Erlbaum.
Svenson, O., 1979. Process descriptions of decision making. Organizational Behavior and Human Performance, 23, 86-112.
Tversky, A., 1972. Elimination by aspects: A theory of choice. Psychological Review, 79, 281-299.
Tversky, A. and D. Kahneman, 1981. The framing of decisions and the psychology of choice. Science, 211, 453-458.
THE RELIABILITY AND VALIDITY OF CONCURRENT, RETROSPECTIVE, AND INTERPRETIVE VERBAL REPORTS: AN EXPERIMENTAL STUDY
Eduard J. FIDLER
Faculty of Commerce and Business Administration, University of British Columbia, Canada
Abstract
The purpose of this study is to examine empirically the reliability and validity of three kinds of verbal reports: concurrent, retrospective, and interpretive verbalizations. An analysis of verbal reports obtained for 13 subjects gave the following results: (1) Concurrent verbalization does not seem to affect the reliability of decision outcomes. (2) Concurrent verbal reports appear to be more reliable than retrospective reports. (3) The correlations between different operationalizations of attribute salience are all positive and significantly different from zero.
Introduction
In recent years verbal reports have been used increasingly in studies of human judgment and choice (e.g., Payne et al., 1978; Svenson, 1979). Ericsson and Simon (1980), in a recent review of the literature, distinguish three forms of verbal reports: concurrent, retrospective, and interpretive reports. Concurrent verbalization is obtained by having subjects think aloud while performing a task. Thus, subjects perform two tasks concurrently: the verbalization of the thought process and the task being studied. Retrospective verbal reports provide information about a task performance that happened at an earlier point in time. In particular, subjects are asked to recall the cognitive processes that occurred during a particular task performance. Finally, interpretive reports are obtained by probing subjects after the completion of a series of task trials. Probes for such reports ask for a description or explanation of the general characteristics of the thought process that occurred during task performance. Within a decision making context very little research has addressed the following issues: (a) What impact does concurrent verbalization have on the reliability of decision outcomes? (b) What is the test-retest reliability of different verbal reports? and (c) What is the convergent validity between
different kinds of verbal reports? Therefore, the major goal of this study is to provide some answers to the above research questions. Based on an extensive review of the literature, Ericsson and Simon (1980) propose that verbalization instructions have a major impact on the validity of the information being obtained from subjects. For this purpose, they distinguish three levels of verbalization: Level 1 verbalization, the direct articulation of information stored in a language (verbal) code; Level 2 verbalization, the articulation or verbal recoding of non-propositional information without additional processing; and Level 3 verbalization, the articulation of information after scanning, filtering, inference, or generative processes have modified the information available (Ericsson and Simon, 1980, p. 227). For Level 1 and Level 2 verbalizations, Ericsson and Simon suggest that the course and structure of the task performance process will remain largely unaffected by the verbalizations. However, if subjects are instructed to explain specific thought processes such as the reasons for their behavior (i.e., Level 3 verbalizations), Ericsson and Simon predict substantial effects of verbalization on task performance, except in cases where the information requested would normally be attended to. In the present study, subjects will be given instructions to think aloud and to provide information about processes that could be expected to occur normally (e.g., thoughts about the data, evaluations of the information, and reasons for the decision). Hence, according to Ericsson and Simon's (1980) model, no substantial effect of verbalization on decision outcomes is expected. However, because subjects were asked to verbalize every detail of the thinking process, the decision process is expected to be slowed down. Thus, it is hypothesized that decisions during concurrent verbalization take substantially more time than those for which no verbal reports are requested. One characteristic of verbal reports that has received virtually no attention to date is the consistency of reports, analogous to the test-retest reliability of psychological tests. Clearly, for very simple decisions it can be expected that decision processes will be highly consistent over time. However, for complex decisions (like choosing one out of 20 cars where each car is described by several attributes) it would not be surprising if the choice process changed substantially from one decision situation to another. Therefore, this study will examine decisions of intermediate difficulty which require binary choices between alternatives described by six attributes. Concurrent verbalizations reflect some of the information being attended to during the task performance. Retrospective verbalizations contain information about the task performance recalled from long term memory after the task has been completed. Since recall from memory is
fallible, concurrent reports would be expected to be more reliable than retrospective ones. The final issue to be addressed in this paper is the convergent validity of different forms of verbalizations.
Method
Subjects
Thirteen subjects (6 females and 7 males) participated in two experimental sessions. Participation was voluntary. Subjects were told at the beginning of Session 1 that at the end of the second session they would receive between $3 and $6, depending on how well they predicted which student would do better. At the end of Session 2 each subject received $6. All 13 subjects that participated in the first session also participated in the second session of the experiment.
Task
Subjects were presented with information about pairs of students, and were told that both students were admitted into a quantitatively oriented graduate business program. Their task was to decide which of the two students would be expected to achieve a higher grade point average (GPA) in that graduate program.
Stimuli
The information about each student included some or all of the following pieces of information:
GMAT-Q: The percentile score on a quantitative aptitude test.
GMAT-V: The percentile score on a verbal aptitude test.
Undergraduate Institution: The name of the undergraduate school and a rating of the overall quality of undergraduate education provided in that school.
GPA-Total: The GPA the student has achieved during his undergraduate education, on a scale from 1 to 4.
GPA-Quantitative: The GPA achieved, on a scale from 1 to 4, for all mathematics and statistics courses from the student's undergraduate education.
Undergraduate Major: The major field of study during undergraduate education, and its rating in terms of suitability for entering a quantitatively oriented business school.
The ordinal scales for the schools and majors were obtained from a group of 20 management students who were asked to rate the various schools and majors on a scale from 1 to 5, with 1 representing the lowest rating and 5 the highest. Based on these responses, each school and major was assigned the median rating rounded to the closest integer value. Knowledge of the attributes of students who were actually admitted into a specific business school guided the generation of 26 different hypothetical decisions. As an example, one decision is displayed in Table 1.
Table 1. Example decision

                 GMAT-V   GMAT-Q   Undergraduate Institution   GPA-Total   GPA-Quant.   Undergraduate Major
Alternative 1    77       89       -                           -           3.3          Humanities (1)
Alternative 2    -        75       Kansas State (1)            3.6         3.1          -
Since another purpose of the experiment was to examine how people make decisions under conditions of missing information, some pieces of information were frequently unknown (e.g., for the decision in Table 1 the school and GPA-T for alternative one, and the major and GMAT-V for alternative two, were not known). The results of this aspect of the experiment are described in a companion paper (Fidler, 1981).
Procedure
Each subject was told that he/she would be participating in a study that examines decisions between two alternatives. Subjects made the decisions in two sessions, which were held one week apart. As can be seen from Table 2, in Session 1 six practice decisions plus 38 decisions were made, while in Session 2, 32 decisions (a subset of the Session 1 decisions) had to be made. In Session 1 twelve of the 38 decisions, and in Session 2 eleven of the 32 decisions, were made twice. This resulted in eleven decisions being made four times, one decision three times, nine decisions twice, and five decisions once during the course of the two sessions.
Table 2. Experimental Design

Sequence of decisions in Session 2: 22*, 26, 21, 12*, 11, 16*, 20, 19, 16*, 6, 10, 9, 2, 7*, 18, 23, 8*, 1, 12*, 4, 7*, 11, 9, 14, 3*, 8, 6, 4, 2, 1, 5, 3*.

Note: P1-P6 are practice decisions; the other numbers refer to the decision number; asterisks mark the decisions after which retrospective reports were prompted. In each session, one half of the decisions was verbalized by Subjects 1-7 and made silently by Subjects 8-13, while the other half was made silently by Subjects 1-7 and verbalized by Subjects 8-13.
Each subject made the decisions in the same order. In each session half the decisions were made while thinking aloud and half were made silently. The decisions that were verbalized in Session 1 were made silently in Session 2. Similarly, the decisions made silently in Session 1 were verbalized in Session 2. Additionally, the order of the decisions was changed in Session 2. Instructions to think aloud were given in both sessions of the experiment before the first decision to be vocalized. In these instructions subjects were told to verbalize every thought and every detail of their thinking process, including what information they were looking at, what thoughts they were having about any piece of information, how they were evaluating the different pieces of information, and the reasons which led to their decisions. In each session general and specific retrospective verbal reports were requested. Subjects were not told in advance that retrospective verbal reports would be prompted. As can be seen from Table 2, the retrospective verbal reports were prompted for each subject after the same ten decisions in both sessions. Specific retrospective reports were solicited immediately after completion of the general retrospective reports. General retrospective reports were prompted with the following question: "You just responded that alternative 1 (2) would do better in graduate school; how did you reach this decision?" After responding to this question, subjects were prompted for specific retrospective reports by asking: "Which decision attributes were most important for this last decision?"
Results
Influence of Verbalization on the Reliability of Decision Outcomes
The question to be addressed in this section is whether verbalization leads to more deliberate and consequently "less noisy" (or more consistent) decisions. In order to explore the existence of such a verbalization effect, all decisions that were either always verbalized or always made silently over the course of the experiment were analyzed. In total, four decisions made four times and eight decisions made twice were included in the analysis. For each of these decisions the number of inconsistent decisions was established as follows. For decisions made twice, the number of inconsistencies was 0 if the same alternative was preferred both times and 1 otherwise; for decisions made four times, the number of inconsistencies was 0 if the same alternative was preferred each time, 1 if one choice outcome was inconsistent with the majority of choices, and 2 if both alternatives
were preferred equally often. The consistency measure was then defined as: actually consistent decisions / potentially consistent decisions. The results of this analysis indicate that the choice outcomes were on average consistent for 77% (N = 104) of the decisions when subjects were thinking aloud and for 75% (N = 104) of the decisions when subjects were making the decisions silently. For no subject was the difference in consistency between verbalized and silent decisions statistically significant. However, this is not surprising when considering the small number of observations (N = 16) of this within-subject analysis.
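The scoring just described can be made concrete with a short sketch. Interpreting "potentially consistent decisions" as the maximum number of avoidable inconsistencies (one for a decision made twice, two for a decision made four times) is an assumption, since the paper does not spell this out; the data below are hypothetical.

```python
from typing import Sequence

def inconsistencies(choices: Sequence[int]) -> int:
    """Number of inconsistent outcomes for one repeated decision (choices coded 1 or 2).
    Made twice: 0 if identical, else 1. Made four times: 0 if unanimous,
    1 if one outcome deviates from the majority, 2 if the alternatives tie."""
    count_first = sum(1 for c in choices if c == choices[0])
    return min(count_first, len(choices) - count_first)

def consistency(decisions: Sequence[Sequence[int]]) -> float:
    """Actually consistent decisions divided by potentially consistent decisions."""
    potential = sum(len(d) // 2 for d in decisions)            # assumed definition of "potential"
    actual = potential - sum(inconsistencies(d) for d in decisions)
    return actual / potential

# Hypothetical data: two decisions made four times, one made twice
print(consistency([[1, 1, 2, 1], [2, 2, 2, 2], [1, 2]]))       # (5 - 2) / 5 = 0.6
```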
Influence of Verbalization on Decision Latencies
Decision latency was defined as the time elapsed between the presentation of the decision alternatives and the choice response. The average decision time of verbalized decisions, 58.7 sec (N = 356), was more than double that of nonverbalized decisions (24.7 sec, N = 328). For twelve of the thirteen subjects the latencies of concurrently verbalized decisions were significantly higher (two-tailed t-test, p < 0.01) than those of decisions made silently.
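A sketch of the latency comparison for a single subject is given below. The latencies are simulated around the reported group means, and the use of an unpaired two-sample t-test is an assumption, since the paper states only that a two-tailed t-test was applied within each subject.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated latencies (seconds) for one subject; real per-trial data are not reported here.
verbalized = rng.normal(loc=58.7, scale=15.0, size=28)   # decisions made while thinking aloud
silent = rng.normal(loc=24.7, scale=8.0, size=26)        # decisions made silently

t_stat, p_value = stats.ttest_ind(verbalized, silent)    # two-tailed by default
print(f"t = {t_stat:.2f}, p = {p_value:.2g}")
```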
Consistency of Verbal Reports
To determine the consistency of verbal reports, some form of encoding of the verbalizations is necessary. The codings of all verbal reports were performed by two coders, the author and a coder who was unaware of the goals of the study. The intercoder consistency was high: 90.6% of all encodings made by the two coders were identical. Because of this high consistency between the two coders, the results presented below are based only on the encodings of one coder, the author. Since the goal of this section is to compare the consistency of concurrent and retrospective verbalizations, an encoding procedure was selected that is applicable to both types of reports. The verbal reports with the least amount of information were the specific retrospective reports, which solicited the decision attributes that were most important for a decision. Therefore, for the general retrospective reports and the concurrent verbalizations, all attributes which were indicated in the conclusion of the verbalizations as having influenced (or being important for) the decision outcome were encoded. As an example, the GPA-Q, GMAT-Q, and school were encoded for the following concurrent report (concerning the example given in Table 1):
"Alternative 1 is a humanities major, that does not impress me, but at the same time the GMAT-Q is definitely higher than the verbal, which does not seem like a humanities major too much, so it may be that person may be more quantitatively inclined, especially if the second one has 75 in GMAT-Q and No. 1 has 89; yes, I go for one, because also two has a lower GPA-Q and a lower GMAT-Q and did not go to a terrific school."
The (within-subject) consistency (in terms of attributes mentioned) of the verbal reports was determined for pairs of identical decisions which resulted in the same choice outcome. The consistency between the codes of two decisions was defined as:

consistency = 100 × |intersection of decision 1 and decision 2 codes| / |union of decision 1 and decision 2 codes|

As an illustration of this consistency measure, let us assume the following hypothetical example: decision No. 1 codes: GMAT-V, school, GPA-T; decision No. 2 codes: school, GPA-T, major. For this pair of decisions the consistency is 100 × 2 (school and GPA-T) / 4 (GMAT-V, school, GPA-T, and major) = 50. As defined, the consistency measure ranges between 0 (no consistency) and 100 (perfect consistency). A comparison of the consistency measures gives for concurrent, general retrospective, and specific retrospective verbal reports the following mean consistencies: 69.2, 53.2, and 56.7, respectively. To test whether the concurrent reports are more consistent than the retrospective ones, a within-subject analysis was performed. For 10 subjects this analysis indicates that concurrent reports are more consistent than general retrospective reports (p < 0.05, two-tailed t-test). This same result also holds for 9 subjects when comparing concurrent with specific retrospective reports.
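The measure is simply the Jaccard index of the two code sets scaled to the range 0-100; a minimal sketch, using the hypothetical example given above:

```python
def report_consistency(codes_1: set, codes_2: set) -> float:
    """100 * |intersection| / |union| of the attribute codes of two reports."""
    if not codes_1 and not codes_2:
        return 100.0
    return 100.0 * len(codes_1 & codes_2) / len(codes_1 | codes_2)

# The hypothetical example from the text:
print(report_consistency({"GMAT-V", "school", "GPA-T"}, {"school", "GPA-T", "major"}))  # 50.0
```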
Convergent Validity of Attribute Salience Measures
The relationships between the following operationalizations of attribute salience were determined: (1) The total number of times each attribute was indicated in the concurrent reports as having had an influence on the decision outcome. (2) The total number of times each attribute was indicated in the general retrospective reports as having had an influence on the decision outcome. (3) The total number of times each attribute was indicated in the specific retrospective reports as having been most important for the decision. (4) The linear model weights of the attributes obtained
from an analysis of the choice outcomes under two assumptions about inferences for missing information. (5) The ratings of attribute salience at the beginning and after the end of the experiment (i.e., interpretive reports). For the first three measures, it was determined how often each attribute was encoded across all decisions. As a result, for each subject a numerical score was derived showing how often each attribute was considered to have had an influence on (or to be important for) the decision outcome. To determine the standardized linear model weights, b_k, the following additive choice model was assumed:

$$Y_{ij} = \sum_{k=1}^{n} (X_{ik} - X_{jk})\, b_k + \epsilon_{ij} \qquad (1)$$
This model states that the preference response Y_ij (which takes on the value 1 when alternative i is preferred to alternative j, and the value 0 when alternative j is preferred to alternative i) is a linear function of the attribute differences (X_ik - X_jk). Furthermore, Equation (1) is a logical extension of the traditional linear judgment model (see Slovic and Lichtenstein (1971) for an extensive review of this literature) to binary choices. Choice responses provide an ordinal judgement, thereby indicating only which alternative is better (i.e., whether Y_ij is 0 or 1). When the observed dependent variable is not on an interval scale, the parameter estimates b_k, if estimated by ordinary least squares, might be biased and inconsistent (McKelvey and Zavoina, 1975). Therefore, the attribute weights reported in this study were estimated with a maximum likelihood procedure (cf. McKelvey and Zavoina (1975) for a detailed description of this procedure) which does not suffer from the potential inconsistency problems of the simple regression model. Because of frequent missing information for the attributes X_ik, any estimation technique for determining the b_k's in Equation (1) requires implicit or explicit assumptions about the unknown attribute values. Therefore, the author used for purposes of analysis two different methods for determining the missing attributes: (1) the substitution of the population mean of the observed attribute values for missing data; and (2) the use of the average score of the available standardized attribute values of a given alternative as an estimate for all missing attributes of that alternative. The explanatory power of the linear models based on these two methods for determining the missing attributes was very similar. For 3 subjects the first method described the choice outcomes better; for 2 subjects the second method gave the best description; and for 8 subjects both methods of substitution described the decision outcomes equally well.
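For a binary response, McKelvey and Zavoina's estimator can be approximated by an ordinary probit model on the attribute differences. The sketch below is not the author's original program: it fits such a probit by direct maximization of the likelihood and uses the first substitution method (attribute means for missing values); the data shapes and variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def mean_substitute(x: np.ndarray) -> np.ndarray:
    """Replace NaN entries by the mean of the observed values in the same attribute column."""
    col_means = np.nanmean(x, axis=0)
    return np.where(np.isnan(x), col_means, x)

def fit_probit_weights(attr_i: np.ndarray, attr_j: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Maximum-likelihood estimates b_k for P(Y_ij = 1) = Phi(sum_k (X_ik - X_jk) * b_k)."""
    diffs = mean_substitute(attr_i) - mean_substitute(attr_j)

    def neg_log_likelihood(b):
        p = np.clip(norm.cdf(diffs @ b), 1e-9, 1 - 1e-9)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    start = np.zeros(diffs.shape[1])
    return minimize(neg_log_likelihood, start, method="BFGS").x

# Hypothetical use: 26 binary choices over alternatives described by 6 attributes,
# with np.nan marking missing cues.
# b_hat = fit_probit_weights(x_alt1, x_alt2, observed_choices)
```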
Finally, two additional salience measures were derived from interpretive verbal reports. Each subject was asked to rate the overall importance of each decision attribute on a scale from 1 to 10 (with a higher rating indicating a more important attribute) (i) before the start of the decision making experiment, and (ii) immediately after the last decision had been completed. To illustrate the convergent validity of these salience measures, their intercorrelations for GMAT-Q are displayed in Table 3. The intercorrelations for the other decision attributes are similar in size and are, because of space limitations, not reproduced. The product moment correlations of the salience measures shown in Table 3 are all positive and significantly different from zero (p < 0.05, N = 13). Furthermore, the "within-methods" relationships appear, as expected, to be much stronger than the "between-methods" ones.

Table 3. Intercorrelations between Attribute Salience Measures for the Salience of GMAT-Q

         CON    RET 1   RET 2   INT 1   INT 2   LM 1   LM 2
CON      1.00
RET 1    0.68   1.00
RET 2    0.74   0.91    1.00
INT 1    0.70   0.64    0.62    1.00
INT 2    0.75   0.67    0.69    0.83    1.00
LM 1     0.65   0.55    0.60    0.60    0.69    1.00
LM 2     0.62   0.68    0.68    0.54    0.67    0.90   1.00

Note: The following codes were used: CON = concurrent reports, RET 1 = general retrospective reports, RET 2 = specific retrospective reports, INT 1 = interpretive reports (beginning of the experiment), INT 2 = interpretive reports (end of the experiment), LM 1 = linear model weights with the attribute mean substituted, and LM 2 = linear model weights with the average score of the given attributes substituted.
Discussion
The results of this study are based on a relatively small sample of 13 subjects. Thus the generalizability of the results may be limited. However, the findings are generally consistent with previous research (cf. Ericsson and Simon, 1980) suggesting that verbal reports (1) are reliable and valid sources of data and (2) do not change the course of the thought process.
The result that verbalized decisions took a much longer time than non-verbalized ones appears to be caused by the differential speed of thinking and speaking processes (cf. Payne et al., 1978). The lower consistency of both types of retrospective reports compared with the concurrent reports seems to be caused largely by failures to recall the correct thought process for the previous decision. An example of a statement in the retrospective reports which refers to the inability to recall information is: "... It is hard to answer these questions because I turn the page and I don't remember, I don't know how I did that, everything is gone ...". One particularly interesting statement was: "... It is hard to remember because when you talk about it you look more at it, and it sticks the more; it is like they tell you to read aloud stuff so that it sticks better in your mind, and that is true; when I am just thinking I am faster than if I am talking ...". The first part of the last statement suggests that retrospective reports after verbalized decisions are more consistent than those after non-verbalized decisions. In the present study the mean consistencies of these retrospective reports were 60.2 (N = 60) and 48.8 (N = 48), respectively. Thus the data of this study appear to support somewhat the notion that verbalization facilitates recall. The analysis of the convergent validity of the attribute salience measures suggests that all of the measures included in the analysis are indeed related to the same construct, i.e., attribute salience. Furthermore, the correlations of the within-method measures are substantially higher than the between-method ones. The analysis also indicates that the between-method correlations are very similar for the different salience measures. Finally, the ratings of salience at the end of the experiment (INT 2) have generally higher correlations with the other salience measures than the ratings at the beginning of the experiment (INT 1). This suggests that ratings, in general, may be more valid immediately after the performance of a task that is related to the ratings to be made. A potential factor contributing to some of the results of this study may have been the specific characteristics of the task (i.e., subjects were given information cues for each alternative and, based only on that information, had to decide which alternative to choose). In many everyday decision situations, like shopping in a supermarket or department store, information is usually not displayed in the form of a stimulus-attribute matrix. Consequently, for such decisions retrospective verbalizations of choice reasons may be less valid, because subjects simply might not be aware of all the factors that influenced their choice. This is congruent with Nisbett and Wilson's (1977) findings that people's responses to the question "why did you choose alternative x" did not coincide with other
behavioral measures when information was not presented in the form of a stimulus-attribute matrix. In conclusion, concurrent verbalizations do not seem to affect the reliability of decision outcomes, and they appear to be more reliable than retrospective reports. Therefore, concurrent verbalizations may be a more desirable form of data than retrospective and interpretive verbalizations. For these reasons concurrent verbal reports should probably be the first choice when considering a method for collecting verbal reports concerning decision tasks of the present type. The increased time necessary for subjects to complete the tasks, as well as the effort required to analyze the reports, appears to be a necessary price to pay for the potential insights to be gained from their analysis and for a better understanding of human thought processes.
References
Ericsson, K.A. and H.A. Simon, 1980. Verbal reports as data. Psychological Review, 87, 215-251.
Fidler, E.J., 1981. Choices among incompletely described alternatives: A process tracing study. Unpublished manuscript. Faculty of Commerce, University of British Columbia.
McKelvey, R.D. and W. Zavoina, 1975. A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 103-120.
Newell, A. and H.A. Simon, 1972. Human Problem Solving. Englewood Cliffs, N.J.: Prentice-Hall.
Nisbett, R.E. and T.D. Wilson, 1977. Telling more than we can know: Verbal reports on mental processes. Psychological Review, 84, 231-259.
Payne, J.W., 1976. Task complexity and contingent processing in decision making: An information search and protocol analysis. Organizational Behavior and Human Performance, 16, 366-387.
Payne, J.W., M.L. Braunstein, and J.S. Carroll, 1978. Exploring predecisional behavior: An alternative approach to decision research. Organizational Behavior and Human Performance, 22, 17-44.
Slovic, P. and S. Lichtenstein, 1971. Comparison of Bayesian and regression approaches to the study of information processing in judgment. Organizational Behavior and Human Performance, 6, 649-744.
Svenson, O., 1979. Process descriptions of decision making. Organizational Behavior and Human Performance, 23, 86-112.
Tversky, A., 1969. Intransitivity of preferences. Psychological Review, 76, 31-48.
THE INFORMATION PRESENTED AND ACTUALLY PROCESSED IN A DECISION TASK*
Oswald HUBER
Universität Salzburg, Austria
Abstract
The results of this study show that decision makers (DMs) do not simply react to the dimensions presented by the experimenter, but also introduce new information from memory when constructing a cognitive representation of a decision task. In Experiments 1 and 2 subjects chose among candidates for a manager post, who were described on two or three dimensions. On the common dimension information was available for both candidates, on the single dimensions for only one of them. The results showed that if a relevant aspect was missing and the DM has a knowledge base to predict the missing value, the missing value is substituted. If no prediction is possible, most DMs base their choice on considerations of whether the missing aspect on the single dimension Ds is better or worse than the available aspect on this dimension. In Experiment 3 subjects were confronted with a familiar decision task (student jobs) and a new one (research projects). As may be predicted, DMs produced more new information from memory in the familiar decision task than in the new one. In the new task, the less information that was available, the more information the DMs introduced from memory. Consequences of these results for methodology and theory are finally discussed in the paper.
* I want to thank the Akademische Senat der Universität Salzburg for granting a financial subvention to enable the performance of the experiments. Furthermore I want to thank H. Lindner and G. Louis for performing the preexperiment to Experiment 2, and U. Adami and A. Matt for coding the thinking-aloud protocols of Experiment 3. I am grateful to O. Svenson and two referees for their helpful suggestions on the manuscript. Mailing address: Dr. Oswald Huber, Universität Salzburg, Institut für Psychologie, Akademiestr. 22, A-5020 Salzburg, Austria.
Introduction
Experiments in decision theory usually assume that the decision maker (DM) utilizes only the information presented by the experimenter, or a subset of this information. Generally, the possibility that the DM introduces new information into the decision task from his or her memory is implicitly or explicitly ruled out. However, research in human information processing has shown that man does not react to information passively, but actively interprets and relates it to already stored knowledge (cf., e.g., Norman, 1979). There is no reason to assume that man in a decision situation behaves otherwise. In fact, Shanteau and Nagy (1976), in their research on dating preferences, found DMs to incorporate a new dimension (the subjective probability of getting a date), even when no information concerning this dimension was presented. Abelson (1976) argues very convincingly that the DM produces cognitive scripts which are an essential part of the cognitive representation of a decision task. The DM's knowledge is explicitly incorporated in consumer research (e.g., in terms of product knowledge and brand name); cf. Bettman (1979) for a review. The present paper reports some experimental results indicating that the DM introduces new information from his or her memory, and discusses the consequences of these results for methodology and theory.
Presentation of Incomplete Information
Consider the following situation (Table 1). The DM is confronted with a pair of alternatives x, y, which are described on two dimensions (Ds and Dc) or on three (Ds', Dc, Ds''). On the common dimension Dc information is presented for both alternatives; on the single dimensions Ds, Ds' and Ds'', information is available for only one of the alternatives.
Table 1. General Schemes for Construction of Alternatives

Two-dimensional pairs
  Alternatives      Ds       Dc
  x                 -        xc
  y                 ys       yc

Three-dimensional pairs
  Alternatives      Ds'      Dc       Ds''
  x                 -        xc       xs''
  y                 ys'      yc       -
In a series of experiments, Slovic and MacPhillamy (1974) confirmed the hypothesis that the DM in such a situation attaches more weight to a dimension if it is a common one than if it is a single one. However, it is doubtful whether an increase in the subjective weight of the common dimension Dc is sufficient to determine the decision in all situations with incomplete information. One should expect the DM also to think about the missing aspect.
Experiment 1
Experiment 1 was designed to test the hypothesis that the DM in a situation with incomplete information inserts a derived aspect or a distribution of aspects instead of the missing one. This hypothesis was tested in an extreme decision situation, where the relevant information is available for one alternative only.
Decision Task
Ss had to choose one out of two candidates for the post of the acting manager of a firm. Candidates for the post were described on two or three dimensions, such as professional qualification, manliness, intelligence, elegance and objectiveness. Scores ranging from 50 to 150 were used to describe the candidates on each dimension (150 was the most, 50 the least desirable score; the mean value was said to be 100). Thirty-six two- and 30 three-dimensional pairs were constructed with scores varying according to the general schemes in Table 1. However, only the 7 two- and the 6 three-dimensional pairs constructed in correspondence with the critical schemes in Table 2 are the most critical ones for a test of the hypothesis. In the critical pairs Ds and Ds' respectively are important dimensions, while Dc and Ds'' are unimportant ones. Let x, as in Table 2, be the alternative where the information on the important dimension Ds and Ds' respectively is missing, and y the one where the information is available on dimension Ds and Ds' respectively. If no substitution takes place in choices among pairs of alternatives described on 2 dimensions (left side of Table 2), virtually all decision strategies predict a choice of y or do not enable a decision (Majority-, Greatest-attractiveness-difference-, Weighted-sets-of-dimensions-, Lexicographic-, Maximax-, Maximin-, Additive-Utility- and Averaging-strategy; most of these strategies are described in Svenson, 1979).
Table 2. Critical Schemes for Construction of Alternatives, with Examples

Critical schemes

Two-dimensional pairs
  Alternatives      Ds           Dc
  x                 -            xc < 100
  y                 ys < 80      yc > 100

Three-dimensional pairs
  Alternatives      Ds'          Dc           Ds''
  x                 -            xc < 100     xs''
  y                 ys' < 80     yc > 100     -

Examples for pairs of alternatives

Two-dimensional pair
                    organizing ability    manliness
  x                 -                     62
  y                 65                    123

Three-dimensional pair
                    professional qualification    elegance    religiousness
  x                 -                             69          117
  y                 73                            126         -
If a score for the missing information for alternative x is substituted, the choice depends on the value of that score. Since the available value for the complete alternative y is very low (< 80), I suppose that most DMs believe the other alternative to have a better score on this dimension (e.g., the mean value). If a score greater than 80 is inserted instead of the missing one for alternative x, the much more important dimensions Ds and Ds' respectively should determine the choice for most DMs, and therefore alternative x should be chosen more often. The predictions for the three-dimensional pairs of alternatives (right side of Table 2) are similar, if we assume that the introduction of an additional unimportant dimension (Ds'') does not change the essence of the situation.
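To make the prediction concrete, the following sketch applies one of the listed strategies, an additive-utility rule, to the two-dimensional example of Table 2. The importance weights are hypothetical and only reflect that Ds is rated as far more important than Dc; the point is that without substitution the rule favours y in the example, whereas substituting the scale mean of 100 for the missing score reverses the prediction.

```python
def additive_utility(scores: dict, weights: dict) -> float:
    """Weighted sum over the dimensions for which a score is available."""
    return sum(weights[d] * s for d, s in scores.items() if s is not None)

# Two-dimensional critical pair from Table 2 (Ds = organizing ability, Dc = manliness).
weights = {"organizing ability": 0.9, "manliness": 0.1}     # hypothetical importance weights
x = {"organizing ability": None, "manliness": 62}            # score on the important dimension missing
y = {"organizing ability": 65, "manliness": 123}

print(additive_utility(x, weights) < additive_utility(y, weights))               # True: y preferred

# Substituting the scale mean (100) for the missing aspect reverses the prediction.
x_substituted = {**x, "organizing ability": 100}
print(additive_utility(x_substituted, weights) > additive_utility(y, weights))   # True: x preferred
```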
Subjects and Procedure
Eighteen Ss (first-year students of psychology and zoology) were run individually in sessions of about 40 min. After the S had been informed about the task, he or she rated the subjective importance of 22 dimensions on a linear scale ranging from 0 (no importance) to 100 (highest importance). The important dimensions in the main experiment (for each S respectively) were chosen randomly from the set of dimensions with a rating ≥ 75. Dimensions with ratings < 25 were selected as unimportant ones. Then the Ss made their choices among the pairs of candidates in a randomized order.
Predictions and Results
Let the dependent variable p(y) be the probability of a choice of alternative y, i.e., a choice of the alternative in the critical schemes of Table 2 for which information is available on the important single dimension. From the discussion above it follows that p(y) = 1, or at least p(y) > 0.5, if no substitution takes place. If a score greater than 80 or a distribution of scores is substituted, p(y) ≈ 0, or at least p(y) < 0.5. The choice probability p(y) was computed for each S respectively. The results support the substitution hypothesis. The mean observed choice probability was p(y) = 0.189 (0.191 in the two-dimensional pairs, and 0.26 in the three-dimensional ones). The probability that the true value of p(y) is smaller than 0.5 is 0.93. Since the total number of choices per S is small (n = 13), p(y) and the cumulative probability were computed on the basis of a Beta-distribution with noninformative prior knowledge.
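A minimal sketch of this computation, assuming the noninformative prior is the uniform Beta(1, 1) (the paper does not say which noninformative prior was used) and using hypothetical choice counts:

```python
from scipy.stats import beta

def prob_py_below_half(choices_of_y: int, n_trials: int) -> float:
    """Posterior probability that p(y) < 0.5 under a Beta(1, 1) prior."""
    posterior = beta(1 + choices_of_y, 1 + n_trials - choices_of_y)
    return posterior.cdf(0.5)

# e.g. a subject who chose y on 3 of the 13 critical pairs
print(prob_py_below_half(3, 13))   # about 0.97
```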
Experiment 2
In Experiment 1 the subjectively perceived correlations between pairs of dimensions were not controlled. However, different magnitudes of subjective correlation should lead to different DM behaviors, as stated in the following hypothesis. If Ds and Dc are positively correlated, the DM should substitute a score or a distribution of scores on Ds similar to the available one on Dc; if the two dimensions are not correlated, the DM is expected more often to base the substitution on other considerations of whether the missing score for alternative x on dimension Ds is higher or lower than the available score for alternative y on this dimension. Furthermore, in Experiment 1 the common dimension Dc was always an unimportant one. This condition too was altered in Experiment 2.
Decision Task
The same general task was used as in Experiment 1, with the alternatives described on two dimensions (Ds and Dc). Three conditions were varied: (1) high or low values (see Table 3); (2) subjective correlation between Ds and Dc: high positive or low (the estimation procedure is described below); (3) importance of dimensions: condition II (both dimensions important) and IU (Ds important, Dc unimportant). The specific pairs of alternatives in the schemes of Table 3 were selected from the previous experiment in such a way that the choice probability for each alternative in a pair was 0.5.
Table 3. High- and Low-Value Schemes for Construction of Alternatives in Experiment 2

High-value scheme
  Alternatives      Ds           Dc
  x                 -            xc = 120
  y                 ys ≈ 90      yc < 95

Low-value scheme
  Alternatives      Ds           Dc
  x                 -            xc ≈ 60
  y                 ys < 80      yc ≈ 90
Predictions
Let p(y) again be the probability of a choice of alternative y, i.e., a choice of the alternative in the schemes of Table 3 for which information is available on the single dimension Ds. If the hypothesis is true, p(y) should differ between the high- and low-correlation conditions, but the direction of the difference should depend on the value scheme of the alternatives. In the high-value scheme a high correlation between Ds and Dc leads to the substitution of a high score on Ds; hence p(y) should be smaller here than in the low-correlation condition. In the low-value scheme the opposite is predicted: p(y) should be greater in the high-correlation condition, because the substitution of a similar small score favors a choice of y. If the substitution behavior is influenced by the weight of the common dimension Dc, p(y) should differ between the importance conditions II and IU, or there should be an interaction between the importance conditions and the correlation conditions.
Preselection of Dimensions and Construction of Pairs of Alternatives
The main problem in the design was to find the proper time to measure subjective correlation. For choosing the dimensions it would have been optimal to estimate the correlation at the beginning of the experiment. However, such a procedure would have made the Ss attentive to correlations beforehand and possibly biased the results. Hence, it was decided to perform the estimation of correlations at the end of the experiment. Therefore, a preselection of dimensions was necessary in a preliminary experiment with 10 Ss. The Ss in this experiment rated the importance of a series of dimensions as in Experiment 1. The subjective correlations among pairs of these dimensions were estimated by the same method as described below. Based on the results from these Ss, 20 pairs of dimensions were selected for the main experiment. Ten had a mean positive correlation as high as possible; 5 of these 10 pairs were in the II-condition, 5 in the IU-condition. Ten pairs had a mean correlation of about 0; again 5 were II- and 5 IU-pairs. For each selected pair of dimensions two pairs of candidates were constructed, one according to the high-value scheme of Table 3 and one following the low-value scheme. Thus there were 40 pairs of alternatives altogether.
Estimation of Subjective Correlations
In the main experiment the subjective correlation for each of the preselected pairs of dimensions was estimated for each S separately. Each S was presented with two linear scales ranging from 50 to 150, representing two dimensions of a candidate's description. The score on one of the scales was given by a mark on the scale. The S was asked to mark the lower and upper bounds of the interval on the second scale that he or she felt reasonably sure contained the candidate's score on the second dimension, but which was as narrow as possible (bounded interval measure, cf. Fishburn, 1964). Each pair of dimensions was presented twice, with a high and a low available score. The available scores and the S's upper and lower bounds were correlated for each pair of dimensions.
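The description leaves open exactly how the presented scores and the judged interval bounds were turned into a single correlation per pair of dimensions. The sketch below is only one plausible reading: it pools the lower and upper bounds of the two presentations and correlates them with the corresponding presented scores; the numerical example is hypothetical.

```python
import numpy as np

def subjective_correlation(presented: np.ndarray, lower: np.ndarray, upper: np.ndarray) -> float:
    """Correlate presented scores with the judged interval bounds for one pair of dimensions.
    Pooling the lower and upper bounds is an assumption about the exact computation."""
    scores = np.concatenate([presented, presented])
    bounds = np.concatenate([lower, upper])
    return float(np.corrcoef(scores, bounds)[0, 1])

# Hypothetical judgments: each pair of dimensions was shown once with a high and once with a low score.
presented = np.array([70.0, 130.0])
lower = np.array([60.0, 110.0])
upper = np.array([90.0, 140.0])
print(subjective_correlation(presented, lower, upper))   # about 0.86, i.e. the high-correlation condition
```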
Subjects and Procedure
Twelve students were run individually in sessions of about one hour. Ss were paid 45 Austrian Schillings (about $3). First, the S rated the importance of the dimensions as in Experiment 1. Then, he or she chose among the 40 pairs of alternatives presented in a random order. A concurrent thinking aloud procedure was employed in this phase of the experiment. In the third part, the subjective correlations among dimensions were estimated. For each S, each pair of dimensions was assigned to the appropriate combination of importance and correlation conditions. Pairs of dimensions with an estimated r ≥ 0.67 were assigned to the high-correlation condition, pairs of dimensions with an r < 0.33 to the low-correlation condition. The classification of dimensions as important and unimportant was performed as in Experiment 1. Since it was possible that an S's estimations departed from those of the Ss in the preexperiment, and therefore did not meet the definitions of the conditions, the pairs of candidates based on descriptions on these dimensions were excluded. In the worst case, data from 24 choices among candidates described on 12 dimensions could be used.
Results
The choice probability p(y) (as defined in the Predictions section) was the dependent variable in a four-way analysis of variance (importance × correlation × value scheme × subjects), which confirmed the hypothesis: with the high-value scheme, p(y) = 0.35 in the high- and p(y) = 0.51 in the low-correlation condition; with the low-value scheme, p(y) = 0.63 in the high- and p(y) = 0.49 in the low-correlation condition. The interaction is in the predicted direction, F(1,77) = 24.18, p < 0.001. The probability p(y) was not influenced by the importance conditions II and IU, F(1,77) = 0.02, and the interaction importance conditions × correlation conditions was not significant (F(1,77) = 0.04). Thus the substitution behavior seems not to be influenced by the weight of the common dimension Dc.
The thinking-aloud protocols for each choice were first classified as (A) CORR (if the S expressed a subjective correlation or connection between the two dimensions involved in the decision) or as (B) NO CORR (if no correlation was expressed), and then they were categorized into one of the following subcategories. (1) S: Substitution of a score or an interval of scores on Ds with explicit reference to the score on Dc (e.g., "if he has such a high intelligence, he will also have a high value on professional qualification"); (2) SD: Considering a distribution of scores on dimension Ds in relation to the available score on Ds (e.g., "since the other one
has such a low intelligence I have a good chance that this one is more intelligent"); (3) Dc: The common dimension Dc is more important than the single dimension Ds and determines the decision (e.g., "this one is higher on fairness, and fairness counts more than professional qualification"); (4) Acc: The choice is based on the acceptability or unacceptability of available scores (e.g., "such a low qualification is simply not acceptable, therefore I choose the other one"); (5) Others: Other strategies and protocols which could not be interpreted.
In the CORR choices the mean percentages of the subcategories were S = 88%, SD = 12%, Dc = Acc = Others = 0%. In the NO CORR choices the mean percentages were S = 10%, SD = 33%, Dc = 28%, Acc = 18%, Others = 11%. The difference between the proportions S/(S + SD), as well as between S/(S + SD + Dc + Acc + Others), in CORR and NO CORR choices is in the predicted direction (p < 0.001). Please note that an Acc classification does not exclude the possibility that considerations about a distribution of scores play a role. Thus a DM in condition IU could apply the following rule: if the available score of y on Ds is acceptable, choose y, because it is less probable that the (unknown) score of x would also be acceptable. Also note that in the Dc-choices a context phenomenon in the sense of Slovic and MacPhillamy's hypothesis (the DM attaches more subjective weight to a dimension if it is a common one than if it is a single one) may influence the choice.
The choice probabilities p(y) were also explored for CORR- and NO CORR-pairs, by performing a four-way analysis of variance (CORR/NO CORR × importance × value scheme × subjects) with p(y) as the dependent variable. The results parallel those of the analysis of variance with the factors (importance × correlation × value scheme × subjects) reported above. In the high-value scheme p(y) = 0.12 with CORR-pairs and 0.46 with NO CORR-pairs; with the low-value scheme p(y) = 0.68 for CORR- and p(y) = 0.45 for NO CORR-pairs. The interaction is in the predicted direction, F(1,77) = 75.27, p < 0.001. Again p(y) was not affected by the importance conditions (F(1,77) = 0.58), and the interaction importance conditions × CORR/NO CORR was not significant (F(1,77) = 0.00).
The results of Experiments 1 and 2 can be summarized as follows. If an aspect on a relevant dimension is not available, there are two possibilities. First, if the DM has a memory data base enabling him or her to predict the missing aspect (such as a high subjective correlation), the predicted aspect is substituted. If the prediction of a specific missing aspect or
an interval of aspects is not possible, most DMs base their choice on considerations of whether the missing aspect on dimension Ds is better or worse than the available aspect on Ds.

Presentation of Complete Information

Experiment 3
The DM's representation of the decision task is assumed to be influenced by the information stored in memory also in situations where no information is missing, i.e., where all dimensions are common to all alternatives. Experiment 3 was designed to test the following hypotheses. (1) The more background knowledge about the alternatives a DM has stored in memory, the more 'new' information he or she introduces into his or her representation of the decision task. (2) The less information available in the decision task, the more information the DM produces from his or her memory.

Subjects
Twenty-eight paid students were run individually in sessions of about one hour. At the time of the experiment, all Ss were in the process of choosing a student job for the summer vacations. All the students had also made such a decision at least once before.

Decision Task and Procedure
The following two experimental conditions were relevant for the hypotheses. (1) Amount of background knowledge, as reflected in two types of decision tasks: decisions among student jobs during summer vacations (familiar task with much background knowledge) and decisions among research projects (new task with less background knowledge). (2) Amount of available information, varied by the number of dimensions used to describe the alternatives: 4 or 14. The order of familiar and new tasks was randomized among Ss. Within each task the order of the various conditions was randomized for each S. Each S made at least 12 choices (in each combination of conditions two choices between two alternatives, and one choice among six alternatives) plus two warming-up choices. A concurrent thinking-aloud procedure was employed.
Results
Let NI be the number of statements indicating the processing of information not presented by the experimenter. Three independent coders classified the statements; intercoder reliability was computed according to Fleiss (1971), kappa = 0.83. NI was computed only from choices between two alternatives, because of the structure of the decision process in choices among more than two alternatives (cf., e.g., Svenson, 1979). No attempt was made to distinguish between single pieces of background knowledge and scripts in the sense of Abelson (1976). A three-way analysis of variance (type of task × number of dimensions × Ss) was performed with NI as the dependent variable. Mean NI was 0.92 in the familiar-task choices and 0.49 with the new task; thus hypothesis (1) was confirmed, F(1,81) = 15.55, p < 0.001. The effect of the number of dimensions was not significant (F(1,81) = 0.73); mean NI was 0.67 and 0.75 in the 14- and the 4-dimensions conditions, respectively. Therefore, hypothesis (2) was falsified. The interaction type of task × number of dimensions was significant, F(1,81) = 6.92, p < 0.05. In the new-task condition mean NI was 0.30 with 14 and 0.68 with 4 dimensions, but in the familiar task mean NI was 1.01 with 14 and 0.82 with 4 dimensions. Thus the hypothesis that the less information is available, the more information retrieved from memory is processed, was supported in the new-task condition but contradicted in the familiar-task condition.

Discussion
First, it is interesting to compare the results of Experiments 1 and 2 of the present study with those of Englander, Tyszka, and Farkas (1980), who also investigated the role of missing information. In contrast to the present results, their Ss did not seem to substitute missing information. Two reasons may account for this difference. (1) In my study the missing information was always a substantial part of the description; in Experiment 1 and in importance condition IU of Experiment 2 there was no relevant information available for one alternative. If two alternatives are described on several different dimensions which are important to the DM, he or she may have enough relevant information to reach a decision without reflecting about the missing aspects. (2) In my Experiments 1 and 2, the information was presented in an alternatives × dimensions matrix, and the aspects of an alternative were introduced as numerical values on a continuous dimension. Such a presentation format may make the lack of a score more conspicuous for the DM.
As a consequence of the preceding discussion, I think the results of Experiments 1 and 2 may be generalized to decision situations where a relevant part of the information is missing.
The present results show that the dimensions given by an experimenter were not identical with the dimensions in fact used by the DM. The DM interprets the presented information in his or her own way and combines it with the information stored in his or her memory. The dimensions actually used in the DM's representation of the decision task may in part be the same as the experimenter's, but the DM also may produce completely new dimensions from memory, and utilize combinations of presented and new dimensions. Let D(x) be the set of dimensions actually used by the DM in connection with alternative x. Even if the same set of dimensions is presented by the experimenter for all alternatives, the DM may introduce information on different dimensions for the alternatives. Hence in the general situation we have to expect D(x) ≠ D(y) for all pairs of alternatives, although we can assume D(x) ∩ D(y) ≠ ∅. Since DMs are obviously able to deal with such decision situations (cf. also the study of Englander et al., 1980), we should direct our attention to the decision strategies enabling them to do so. There are (at least) two possibilities. (1) The DM may use strategies which do not presuppose D(x) = D(y). The Conjunctive, Disjunctive and Majority strategies (cf. Svenson, 1979) and the Weighted-sets-of-dimensions strategy (Huber, 1979) are examples. (2) The DM may proceed sequentially: for example, he or she may first concentrate on the common dimensions in the intersection D(x) ∩ D(y) ∩ ..., selecting the best alternative by applying any strategy or sequence of strategies. Let z be the best alternative resulting from such a decision procedure. In the next step the DM may check whether alternative z is acceptable on all other dimensions in D(z).
Several methods frequently applied in decision research require explicit knowledge of D(x), D(y), ..., e.g., registration of information acquisition (e.g., eye-movement recordings), making predictions for choices among alternatives constructed in a specific way (e.g., constructing triplets of alternatives where the application of the Majority strategy leads to intransitivities), regression analysis and ANOVA techniques. In some studies (e.g., in those utilizing the ANOVA technique) the additional assumption D(x) = D(y) for all alternatives x and y is necessary. Usually, the knowledge about D(x), D(y), ..., is gained by the implicit assumption that the Ss use the same dimensions the experimenter presents. However, if the S introduces additional information from memory in an unsystematic manner (e.g., the experimenter describes the alternatives x, y and z on dimensions D1 and D2, but for the S D(x) = {D1, D2, D3},
D(y) = {D1, D2} and D(z) = {D1, D2, D4}), results obtained with the methods mentioned above may be inconclusive. I do not want to suggest that all such results are artifacts. What I mean is that the information stored in the DM's memory is a relevant variable in the representation of the decision task. If it is not included as an independent variable, this variable should be properly controlled. One possibility to control the introduction of information from memory would be a restriction to decision tasks where the DM has no useful background knowledge. Another way is the combination of the thinking-aloud technique with other methods. The thinking-aloud data could be used to determine whether or not information from the DM's memory is introduced in a way harmful to the experiment.
The issue of background knowledge is closely related to other topics of growing interest in decision theory. One is the problem of capacity limitations. Norman and Bobrow (1975) distinguish two types of limitations of human information processing capacity: one type concerns the availability and applicability of cognitive operations (resource limitations), the other the information available in an information-processing context (data limitations). The information about the alternatives the DM has stored in memory helps him or her to decrease the data limitations. However, how useful this information is depends on the quality of the representation in memory. The amount and the specific properties of the information stored in the DM's memory affect the subjective representation of the decision task. The subjective task representation is one of the factors determining resource limitations, because a specific representation may prevent the application of certain decision strategies and enable the use of others. Thus, a future general theory of decision making will have to incorporate the role of the DM's semantic memory.
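As an illustration of possibility (2), here is a minimal sketch of such a sequential procedure. The alternatives, dimension sets, scores, acceptability threshold, and the rule applied on the common dimensions (a simple sum of scores) are all hypothetical simplifications, not Huber's specification.

```python
# Sketch of the sequential strategy in possibility (2): choose on the common
# dimensions first, then check the winner's remaining dimensions for acceptability.
# Aspect values, dimension names and the acceptability threshold are hypothetical.

def sequential_choice(alternatives, acceptable=50):
    """alternatives maps each alternative to its dict D(x): dimension -> score."""
    # Step 1: restrict attention to the dimensions common to all alternatives.
    common = set.intersection(*(set(d) for d in alternatives.values()))
    # Any strategy could be applied on the common dimensions; summing scores
    # is used here purely as a placeholder.
    best = max(alternatives, key=lambda a: sum(alternatives[a][dim] for dim in common))
    # Step 2: check whether the winner is acceptable on its remaining dimensions.
    rest = set(alternatives[best]) - common
    if all(alternatives[best][dim] >= acceptable for dim in rest):
        return best
    return None  # the DM would reconsider, e.g. check the next-best alternative

alts = {
    "x": {"D1": 80, "D2": 60, "D3": 40},
    "y": {"D1": 70, "D2": 75},
    "z": {"D1": 65, "D2": 70, "D4": 90},
}
print(sequential_choice(alts))  # prints "y": best on the common dimensions D1, D2
```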
References
Abelson, R.P., 1976. Script processing in attitude formation. In: J.S. Carroll and J.W. Payne (eds.), Cognition and Social Behavior. Hillsdale, N.J.: Erlbaum, 33-45.
Bettman, J.R., 1979. An Information Processing Theory of Consumer Choice. Reading, Massachusetts: Addison-Wesley.
Carroll, J.S. and J.W. Payne (eds.), 1976. Cognition and Social Behavior. Hillsdale, N.J.: Erlbaum.
Englander, T., T. Tyszka, and E. Farkas, 1980. Comparisons of non-overlapping options. Unpublished manuscript.
Fleiss, J.L., 1971. Measuring nominal scale agreement among many raters. Psychological Bulletin, 76, 378-382.
Huber, O., 1979. Nontransitive multidimensional preferences: Theoretical analysis of a model. Theory and Decision, 10, 147-165.
Norman, D.A., 1979. Perception, memory and mental process. In: L.G. Nilsson (ed.), Perspectives on Memory Research: Essays in Honor of Uppsala University's 500th Anniversary. Hillsdale, N.J.: Erlbaum, 121-144.
Norman, D.A. and D.G. Bobrow, 1975. On data-limited and resource-limited processes. Cognitive Psychology, 7, 44-64.
Shanteau, J. and G. Nagy, 1976. Decisions made about other people: A human judgment analysis of dating choice. In: J.S. Carroll and J.W. Payne (eds.), Cognition and Social Behavior. Hillsdale, N.J.: Erlbaum, 221-242.
Slovic, P. and D. MacPhillamy, 1974. Dimensional commensurability and cue utilization in comparative judgment. Organizational Behavior and Human Performance, 11, 172-194.
Svenson, O., 1979. Process descriptions of decision making. Organizational Behavior and Human Performance, 23, 86-112.
THE ROLE OF SECOND-ORDER PROBABILITIES IN DECISION MAKING¹

Robert W. GOLDSMITH
Department of Psychology, University of Lund, Sweden

and

Nils-Eric SAHLIN
Department of Philosophy, University of Lund
¹ This work was supported by The Bank of Sweden Tercentenary Foundation. We are grateful to Eva Malmberg and Kristo Ivanov for their help in collecting the data and to Peter Gärdenfors for his comments. Requests for reprints should be sent to Robert W. Goldsmith, Department of Psychology, University of Lund, Paradisgatan 5, S-22350 Lund, Sweden.

Abstract
The importance, legitimacy and role of second-order probabilities are discussed. Two descriptive models of the use of second-order probabilities in decisions are presented. The results of two empirical studies of the effects of second-order probabilities upon the rank orderings of bets are summarized briefly. The bets were of three basic types and involved a wide variety of first- and second-order probabilities as subjectively assessed by the subjects. Support was obtained for the assumption that the majority of subjects make use of the one model or the other. It is suggested that greater attention should be paid to second-order probabilities, both from a normative and a descriptive standpoint.
Introduction
Decision theorists have commonly assumed that the degree of certainty a decision maker attaches to the probability estimate of an uncertain event should not affect the decision taken. Expressed differently, this assumption states that from a normative standpoint second-order probabilities (also referred to as probabilities of probabilities, or secondary probabilities) are irrelevant to a decision. A usual argument for this position (cf. Savage, 1954) is that if the probability of an event (first-order probability) does appear uncertain, the decision maker should use a weighted average with second-order probabilities as weights to obtain a new point estimate, this latter estimate then expressing all uncertainty of relevance to the
decision. A version of this argument is that of Brown and Lindley (1981) who, although they suggest that second-order probabilities be used in reconciling incoherent probability judgments, see this only as a means of obtaining a coherent point estimate. Another frequent argument is that a second-order probability could have associated with it a third-order probability, and so on ad infinitum. Even Good (1950, 1965), whose position on second-order probabilities is at least partly a positive one, suggests that their consideration by the decision maker could engender confusion. De Finetti (1972) goes further, declaring "unknown probabilities", and thus implicitly second-order probabilities, to be "meaningless". Another argument, of similar import, is that second-order probabilities can assume values of only 1 or 0, making a closer consideration of them pointless.
Although there has been an increasing interest in second-order probabilities in recent years on the part of philosophers (cf. Gärdenfors and Sahlin, 1982; Skyrms, 1980), the view among decision theorists and those applying decision theory to practical problems seems, as just exemplified, to have been more critical or qualifying. This may explain the dearth of empirical work concerning the possible effects of second-order probabilities on decision making. Ellsberg (1961; cf. also Fellner, 1961) discovered what has come to be called the "Ellsberg paradox". This stands for the tendency of subjects, given the choice between two bets, one involving an urn of known and the other an urn of unknown composition (where with use of a principle of insufficient reason both bets would be ascribed the same expected value), to select the bet with the urn of known composition. Slovic and Tversky (1974) showed this paradox to persist even when supposedly cogent arguments were directed against the behavior in question. Becker and Brownson (1964) presented subjects with sets of ambiguous bets, i.e., bets of uncertain outcome probability. The bets were of equal expected value but differed in the range of probabilities possible, and subjects were found to be willing to pay for obtaining bets with narrower ranges of probability. Beach and Wise (1969) employed a set of probability learning tasks in which the amount of learning experience subjects had with different events was varied. Giving subjects the chance to bid for bets based on these events and controlling for differences in assessed probabilities, they found subjects willing to bid more for the chance to play bets involving events with which they had greater learning experience. Yates and Zukowski (1976), using ambiguous and unambiguous bets, reported that subjects preferred ambiguous bets in which the distribution of possible win probabilities was flat and of maximal range to those in which subjects were ignorant of the distribution. This result ran counter to their hypothesis. In addition, they received only partial support for the hypothesis that unambiguous bets would be preferred to ambiguous ones.
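The Ellsberg comparison just described is easy to state numerically. The following sketch (urn sizes and payoff are illustrative assumptions) shows that under a principle of insufficient reason the two bets have the same expected value, so the typical preference for the known urn cannot be explained by expected value alone.

```python
# Two urns of 100 balls: one of known composition (50 red, 50 black), one of
# unknown composition.  A bet pays 100 if a red ball is drawn, 0 otherwise.
payoff = 100

p_red_known = 50 / 100
# Principle of insufficient reason: with no information about the unknown urn,
# every composition is treated as equally likely, which makes red and black
# symmetric and hence equally probable.
p_red_unknown = 0.5

ev_known = p_red_known * payoff
ev_unknown = p_red_unknown * payoff
print(ev_known, ev_unknown)   # 50.0 50.0, yet most subjects prefer the known urn
```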
Larson (1980) employed a factorial design involving comparisons of ambiguous bets, with the factors of expected probability of winning (three levels) and outcome probability uncertainty (two levels) being varied. He found that bets of more certain probability were preferred at all three levels, a result which again ran counter to the hypothesis.
These studies provide support for the assumption that differences in second-order probabilities can affect decisions and that under many conditions alternatives with more certain outcome probabilities (i.e., with higher second-order probabilities) tend to be preferred. However, the following limitations of work in this area can be noted: subjective estimates of second-order probabilities are generally lacking; the types of uncertain events and the levels of first- and second-order probabilities involved have been limited; the bets appear to have been only ones of positive expected value, with the possibility of winning but not of losing; subjects have been given little chance to reveal more complex strategies of utilizing second-order probability information; individual differences have scarcely received attention; and little has been done to develop a theoretical rationale for the behavior observed.
In the present paper criticisms directed against the use of second-order probabilities will be discussed, two models of the use of second-order probability information in decisions will be presented, and the results of three studies in which the models were tested will be briefly reviewed.
Theoretical Considerations
The Meaning, Legitimacy and Importance of Second-Order Probabilities²

² The discussion of philosophical issues in this section is partly based on a number of ideas and arguments presented in an unpublished manuscript by Sahlin (1981b). The concepts of epistemic reliability and epistemically possible degrees of belief, as well as the measure p, are discussed more fully, within the framework of a generalized Bayesian decision theory, in Gärdenfors and Sahlin (1981, 1982).

As will be shown, first- and second-order probabilities seem clearly different in the judgmental behaviors they involve. A first-order probability, a measure of which will be termed P, represents (from a subjectivist viewpoint) a person's degree of belief in some event or state. Better still, it is a member of a person's set of epistemically possible degrees of belief concerning the event or state in question, where "epistemically possible"
means not in contradiction with the knowledge the person possesses. Thus, at the first-order level the person's beliefs about the world are involved. In contrast, a second-order probability, a measure of which will be termed p, represents the person's degree of belief concerning some first-order probability. Since the distribution of p over the set of epistemically possible first-order probabilities reflects how complete or adequate the person assesses the knowledge to be upon which his or her first-order probability is based, p is referred to as a measure of epistemic reliability, low reliability being equated with limited information.
The basic form of judgmental behavior which second-order probabilities involve seems thus to be that of assessing the adequacy of one's knowledge. This appears a much neglected aspect of decision making research. Such judgments presumably lead the decision maker to either seek more information or dispense with seeking it, analyze the available information further or cease analyzing it, and either reach a decision or keep options open. As the work already reviewed suggests, such judgments appear sometimes to affect the valuation of decision alternatives as well. Gärdenfors and Sahlin (1982) and Levi (1980) present examples where one can intuitively feel that this ought to be the case. In a different context, though still reflective of the evaluation of whether knowledge is adequate, Goldsmith (1980) found support for the assumption that judges and prosecutors make use of a criterion for information search in dealing with evidence. The possible application of second-order probabilities to judicial decision making is discussed by Goldsmith (1973). Gärdenfors and Sahlin (1982) propound the view that second-order probabilities are essential to any adequate theory of risk.
Regarding the claim that an infinite regress of higher-order probabilities would thwart the attempt to draw practical conclusions from second-order probability assessments, one can suggest that from a pragmatic standpoint the utility of using successively higher orders of probability would diminish rapidly (cf. Good, 1971), and that this, together with human cognitive limitations and the implicit cost of thinking, should keep the decision maker's efforts here within reasonable bounds. In addition, probabilities of second and higher orders all seem to involve the same basic type of judgmental behavior, namely the evaluation of how adequate one's knowledge is. Thus, they can be considered as belonging to a single dimension, one which the decision maker would presumably not try to analyze in successive stages very far. We also know of no example in the literature where third-order probabilities (or higher) are shown to be important to the decision taken.³
Concerning the argument that a point estimate consistent with a weighted average suffices to express all uncertainty of relevance to a
decision, we note that such a point estimate does not express the decision maker's uncertainty regarding the adequacy of the information upon which his or her first-order estimate is based, an aspect which the dispersion of the second-order probability distribution represents. Thus, as already indicated, some estimate of epistemic reliability seems called for as well. In addition, we feel that the exact shape of the second-order probability distribution may be of importance for a decision. We assume that a second-order probability distribution can take on many forms other than simply a bell-shaped one, being readily skewed, for example, particularly when the information upon which the estimate is based is very limited, or even being bimodal (cf. Gärdenfors and Sahlin, 1982). We assume that under such conditions the decision maker might consider the location of the second-order probability distribution to be best described not by the mean but by some other point, such as the mode. There is some empirical work suggestive of this. Hederstierna (1980), who studied a group of subjects with both academic training in probability theory and professional experience in assessing probabilities, reported evidence that the subjectively "best" estimates of these supposedly "ideal" probability assessors deviated systematically from the means of their judgmental uncertainty distributions. We feel that the use of the information contained in the second-order probability distribution can and ought to depend on both the structure of the decision problem and the idiosyncrasies of the decision maker. We do not mean, however, that consistency and coherence are unimportant, but coherence can be conceived of in various and differing ways (see, e.g., Lee, 1971; Vickers, 1976, for discussions of this matter).
The argument that second-order probabilities can assume only values of 1 or 0 seems at odds with psychological observation regarding judgmental uncertainty generally. One should also note the importance which philosophers have ascribed to other second-order or secondary phenomena. Mellor (1980) considers second-order beliefs to be necessary to provide for a theory of conscious belief. Frankfurt (1971) suggests that the mark of personhood is the ability to form second-order desires. Both Halldén (1980) and Jeffrey (1974) have developed the idea of secondary preferences, and Sahlin (1981a) has demonstrated empirically that subjects can deal with secondary preferences in a consistent way.
³ Our views on third- (and higher-) order probabilities differ somewhat. Sahlin considers such measures incomprehensible from a pragmatic standpoint and argues that if no behaviorally meaningful interpretation can be given, introducing them as a construct has no point. Goldsmith feels their closer consideration could be meaningful.
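The reduction-to-a-point-estimate argument discussed in this section is easy to make concrete. The sketch below uses invented numbers only: two second-order distributions over the same set of epistemically possible first-order probabilities collapse to the same weighted-average point estimate although their dispersions, the aspect the authors identify with epistemic reliability, differ.

```python
import numpy as np

# Epistemically possible first-order probabilities of some event.
p_values = np.array([0.1, 0.3, 0.5, 0.7, 0.9])

# Two hypothetical second-order distributions over those values:
# one concentrated (high epistemic reliability), one spread out (low reliability).
concentrated = np.array([0.0, 0.1, 0.8, 0.1, 0.0])
spread       = np.array([0.2, 0.2, 0.2, 0.2, 0.2])

for weights in (concentrated, spread):
    point_estimate = np.dot(p_values, weights)      # the weighted-average reduction
    dispersion = np.sqrt(np.dot(weights, (p_values - point_estimate) ** 2))
    print(round(point_estimate, 3), round(dispersion, 3))
# Both collapse to the point estimate 0.5, but the dispersions
# (about 0.089 vs. 0.283) are lost in the reduction.
```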
Two Models of the Use of Second-Order Probability Assessments in Decisions
As a starting point for our empirical work, we hypothesized two models for the utilization of second-order probability information in decisions. We assumed that under any given conditions the decision maker's choice of the one model or the other would be a matter of individual differences. Both models conceive of decisions as the choice between two or more bets, each bet having two possible outcomes, success (winning a sum or not losing a sum) and failure (losing a sum or not winning a sum), with the assessed probabilities of success and failure summing to 1.0. Bets are considered to be of three types: win or not win, lose or not lose, and win or lose. For simplicity, only decisions are examined where the bets are all of the same type and involve the same first-order probabilities and sums but differ as regards second-order probability estimates. Although this restricts us to rather artificial situations, generalizations to more complex situations seem straightforward enough.
According to the one model, which we term the Epistemic Risk Model or Model-1 (and which in a more purely descriptive vein might be called the Consistent Uncertainty Avoidance or Uncertainty Seeking Model), the decision maker shows a consistent preference over the entire range of first-order probabilities either for bets with high second-order probabilities or for bets with low second-order probabilities. We assume that for those employing the model, uncertainty regarding outcome probabilities is experienced as risk (what we refer to as epistemic as opposed to outcome risk) and that the risk is either consistently avoided (choice of high second-order probability bets) or consistently sought (choice of low second-order probability bets).
According to the other model, which we call the Differential Gain or Loss in Utility Model or Model-2, the decision maker prefers high second-order probabilities in the case of bets adjudged to have a "high" probability of success, and low second-order probabilities in the case of bets adjudged to have a "low" probability of success. The basis for constructing this model was the notion that subjects would employ an intuitive version of the following argument: that with bets they considered to have a high probability of success, the increase in subjectively expected utility (SEU) which could potentially occur if the probability of success were in fact higher than they had judged it to be would be only negligible compared with the decrease in SEU which could potentially occur if the probability of success were lower than they had judged it to be; this form of "ceiling effect" would make precision (high second-order probability) appear
advantageous. Subjects were assumed to employ the reverse of this intuitive argument when confronted with bets which they perceived to have a low probability of success, an analogous "floor effect" there making imprecision (low second-order probability) seem advantageous.
The assumption already made that second-order probability distributions are frequently skewed is important to Model-2. That assumption, more specifically, is that at any first-order probability level the decision maker's second-order probability distributions tend to be increasingly skewed as the perceived adequacy of the information upon which the first-order estimates are based decreases. This, together with an assumed tendency for the mode rather than the mean to be selected as the measure of location for the second-order probability distribution (see above), could conceivably result in judgmental combinations such as the following: a decision maker with little information available estimates the event probability to be 0.90 but considers a probability of 0.10 to be "perfectly possible". In this example, one can imagine a second-order probability distribution with a mode of 0.90 but with a portion of the probability density extending to 0.10 and below. Such judgmental conditions, which we suspect may in principle be rather common, would seem particularly conducive to the use of Model-2.
One should note that Model-2's predictions bear a superficial similarity to those of a version of Karmarkar's (1978) Subjectively Weighted Utility Model which Larson (1980) tested, in fact with negative results. The predictions of the Karmarkar model were that for bets with objective win probabilities of greater than 0.50 subjects would prefer more certain outcome probabilities, whereas for bets with objective win probabilities of less than 0.50 subjects would prefer more uncertain outcome probabilities. In addition to the fact that the reasoning underlying the Karmarkar model is entirely different from that underlying ours, our model conceives of the line of demarcation between "high" and "low" first-order probabilities as a subjective one which can vary from subject to subject.
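A minimal numerical sketch of the ceiling and floor intuition behind Model-2 follows. The utility scale (1 for success, 0 for failure), the size of the possible deviation of the true probability, and the symmetric spread are illustrative assumptions, not part of the authors' formal treatment.

```python
# Ceiling/floor intuition behind Model-2, in SEU terms for a win-or-not-win bet
# with utility 1 for success and 0 for failure.  With a judged success
# probability p, a deviation of the "true" probability changes SEU by the same
# amount as the deviation itself; clipping to the unit interval creates the
# asymmetry the model appeals to.

def potential_gain_and_loss(p, spread=0.3):
    """Possible SEU gain and loss if the true probability deviated from the
    judged value p by up to `spread` in either direction."""
    upper = min(1.0, p + spread)
    lower = max(0.0, p - spread)
    return upper - p, p - lower   # potential gain, potential loss

for p in (0.9, 0.1):
    gain, loss = potential_gain_and_loss(p)
    print(f"judged p = {p}: potential SEU gain {gain:.2f}, potential SEU loss {loss:.2f}")
# At p = 0.9 the possible loss (0.30) dwarfs the possible gain (0.10), so
# precision looks attractive; at p = 0.1 the asymmetry reverses.
```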
Empirical Studies

Aim and Method
The three empirical studies which we carried out, which will be termed Studies A, B and C (these will be reported on in greater detail elsewhere), aimed at testing both the general hypothesis that subjects' choice between bets of the sorts described would be influenced by second-order probabilities, and two subordinate hypotheses, namely H1, that some subjects
will behave in accordance with Model-1, and H2, that some subjects will behave in accordance with Model-2. The basic plan of all three studies was as follows. Initially, subjects were to assess the probabilities of various events about many of which they had very limited information (e.g., "that a transit strike will occur in Verona, Italy next week" or "that there will be a thick fog in the center of Birmingham, England sometime tomorrow morning"). In connection with each such first-order probability estimate, the subject was to make a second-order probability estimate in some form. At the next stage of testing, the subject was to rank order in terms of desirability bets which involved these same events. The success or failure of a bet was to be conceived as contingent upon the occurrence or non-occurrence of the event, success and/or failure being represented by imagined monetary sums. The rank orderings were carried out within various sets of bets, where all the bets in a set concerned events for which the subject had given the same first-order probability estimate and where the monetary sums and types of bets were all the same.
In Study A, 20 subjects performed rank orderings of this sort twice, once under win-or-not-win and once under lose-or-not-lose conditions; six additional subjects rank ordered bets under only one of the two conditions and later gave retrospective reports on their reasons for ranking the bets as they did. In Studies B and C (n = 20 in each), subjects rank ordered bets of the win-or-lose type, giving retrospective reports of the sort just described and (in Study B) providing thinking-aloud protocols while rank ordering the bets. In Study C, two different win-or-lose sums were employed, the one being 20 times larger than the other. For half of the subjects there the bets to be rank ordered involved the smaller sum, these subjects later being asked what changes, if any, in their rank orderings use of the larger sum would produce. For the other half of the subjects the order in which the two sums were presented was reversed, the larger sum being employed first and the smaller one afterwards.
In all three studies subjects recorded their first-order probability estimates on a graphic scale, being instructed to respond verbally if an estimate turned out to be too high or too low to be adequately expressed on the scale. For obtaining a second-order probability measure, the most valid approach appeared to be an indirect one, since asking for a second-order probability directly could involve the following difficulty: if the subject thought of the first-order estimate as occupying a point rather than a small interval on the graphic scale, the second-order estimate could be viewed as infinitesimal, whereas if the first-order estimate were thought of as occupying an interval, the size of the second-order estimate should depend on the size of the interval. To avoid this problem, we asked subjects the more accustomed question of how certain they felt that the
(first-order) estimate was correct. We assume that the answer to such a question could be shown to accord reasonably well with a second-order probability measure assessed in conjunction with a first-order estimate occupying some clearly defined interval. The subject answered this second-order probability question in Study A by adjusting a slider on a meter-long scale, and in Studies B and C by selecting a number on a 25-point scale which contained seven verbal designations at equidistant points. Both scales had as one endpoint absolute certainty that the first-order estimate was correct and as the other endpoint complete uncertainty as to what the probability was.
Study A employed two types of events: ambiguous events (40 in number, of the sort about which subjects had very limited information) and unambiguous events (involving urns, fair coins, and the like, such that a logical or frequentistic basis for a particular probability was provided). This study was so designed that for each probability level represented in the subject's first-order estimates for the ambiguous events, an unambiguous event was included to which the subject could be expected to give the same first-order estimate. This aimed at allowing all ambiguous events to be used in forming bets which could be rank ordered with one or more other bets. For subjects who did not behave as expected with the ambiguous events, a different methodology was employed: asking the subject what composition an urn would need to have for the subject to give a first-order estimate of a certain specified size. In Studies B and C only events of the ambiguous variety were employed. This was to make it possible to determine whether the presence of unambiguous items (which we saw as having quite a different probabilistic status than the ambiguous ones) was crucial to obtaining support for the hypotheses.
Results and Discussion
For each of the sets of bets a subject had rank ordered in Studies A, B and C, the correlation (rs) between the second-order probability assessments the subject had given and the ranks the subject assigned the bets was determined. Analysis of individual subjects' patterns of rs values provided a basis for assessing the degree of support for H1 and H2. A tendency for rs values of a given sign (positive/negative) to predominate across the whole range of first-order probabilities represented was seen as supportive of H1. This would result from a subject's consistent preference for bets of high second-order probability, or, alternatively, the subject's consistent preference for bets of low second-order probability. Regarding the assessment of support for H2, account was taken of any pattern pointing toward
behavior of the opposite sort. This would be the case where a subject exhibits a preference for bets of high second-order probability when the probability of success is high and for bets of low second-order probability when the probability of success is low, or vice versa. Support for H2 requires that a tendency be found for positive rs values to be located more towards the one end and negative rs values more towards the other end of the scale of first-order probabilities. A statistic Pu, representing a descriptive use of the p-value which a Mann-Whitney U test gave, served as a measure of this tendency.
For the 20-subject group in Study A, analysis of the patterns of rs values revealed the following. Under win-or-not-win conditions there were eight subjects, and under lose-or-not-lose conditions there were seven, who displayed patterns consistent with H1, binomial tests (two-sided) of the relative frequencies of positive and negative rs values yielding p < 0.05 for each of the eight subjects in the former case and for five of the seven in the latter case. However, whereas each of the eight subjects under the win-or-not-win condition showed a consistent preference for bets of high second-order probability, each of the seven under the lose-or-not-lose conditions showed a consistent preference for bets of low second-order probability. Regarding H2, it was found under win-or-not-win conditions that of the seven subjects with the lowest Pu values, all showed patterns compatible with H2. Similarly, of the nine subjects with the lowest Pu values under lose-or-not-lose conditions, eight behaved in accordance with H2, only one subject displaying the reverse pattern. Thus, in Study A there was clear support for both H1 and H2, as well as for the assumption that the version of H1 behavior elicited tends to be a preference for bets of more certain outcome probability (high second-order probability) under win-or-not-win conditions and for bets of more uncertain outcome probability (low second-order probability) under lose-or-not-lose conditions.
One should note that whereas conforming with H2 under win-or-not-win conditions meant tending to prefer high second-order probabilities when the probability of winning was high and low second-order probabilities when the probability of winning was low, under lose-or-not-lose conditions it meant preferring high second-order probabilities when the probability of losing was low and low second-order probabilities when the probability of losing was high. The two amount to the same thing in terms of probability of success, however, since a high probability of winning and a low probability of losing both represent a high probability of success. It should also be noted that the bets were so constructed that under win-or-not-win conditions the occurrence of the event in question was associated with success (winning), whereas under lose-or-not-lose conditions it was associated with failure (losing). Since to show an H2 pattern under both conditions a subject would thus need to order the bets in
roughly the opposite way under the one condition than under the other, the findings concerning H2 cannot be interpreted in terms of subjects simply giving the same rank orderings twice.
For subjects who displayed H2-type behavior in Study A, the point of transition on the scale of first-order probabilities between preferring high second-order probabilities and preferring low ones appeared to vary considerably from subject to subject, between about 0.05 and 0.45 of winning in the case of win-or-not-win bets and between 0.05 and 0.65 of losing in the case of lose-or-not-lose bets. This argues against the validity of Karmarkar's (1978) model as it was interpreted and tested in Larson's (1980) study.
In Studies B and C, which involved win-or-lose conditions, largely similar results were obtained. In Study B there were six subjects and in Study C there were five whose patterns were compatible with H1, all but one of these subjects showing a consistent preference for high second-order probabilities. In addition, six of the seven subjects whose Pu values were lowest in Study B and all of the eleven subjects with the lowest Pu values in Study C displayed patterns which conformed with H2. There were also four subjects in the two studies who showed patterns not evident in Study A. Three of the four appeared to prefer low second-order probabilities toward both ends of the first-order probability scale and high second-order probabilities toward the middle, whereas the fourth subject showed just the opposite pattern. Despite these four atypical patterns, the results as a whole provide clear support for both H1 and H2 under win-or-lose conditions and for the assumption that the form of H1 behavior occurring there tends to be a preference for high second-order probabilities. There were no signs of interindividually consistent effects of varying the size of the monetary sums in Study C.
Examination of the verbal protocols in all three studies indicated that many subjects gave verbal expression to strategies which were in line with the hypotheses their ordinal and numerical data supported. When bets with "low" probabilities of success were involved, subjects whose patterns were of the H2 type commented in such ways as the following (all quoted remarks are translated from Swedish) on their assigning a higher preference rank to a bet of more uncertain outcome probability: "I don't know about this and thus I can hope it will occur." "I think [the alternatives] ... are both bad and since I'm not familiar with [this] ... I have the most confidence in it, because with the other one I know it's exactly 20%." "Perhaps it's very bad but perhaps it's extremely good, and since the chances are so low ..." Similarly, in a self-critical vein, one subject with an overall pattern of the H2 type remarked, while deviating from this pattern at one point, "There is of course a certain
contradiction ... in my saying it's the best alternative when I'm certain that the probability is low. I suppose I ought to say the opposite." In most cases subjects who made comments such as those just cited gave signs of a different view toward bets of uncertain outcome probability when higher first-order probabilities were involved, their stated reason for assigning such a bet a low preference rank being, for example, "since I don't know anything about it", "since it's just a matter of guessing", "since I'm so unsure about my probability estimate", etc. Remarks of this sort were also common among subjects with H1 patterns (of the variety preferring high second-order probabilities), both under conditions where first-order probabilities were high and where they were low. Some subjects, in commenting on their rank orderings of the bets, made no reference to the certainty of their probability estimates or the like. The protocol results as a whole, however, support the assumption that under conditions such as those studied here many subjects tend to think in a manner consistent with H1 or H2.
More basic than the findings relating specifically to H1 and H2 appears to us to be the evidence obtained that the majority of subjects, in choosing between bets involving uncertain probabilities, deal with second-order probability information in a consistent way. Greater attention, we feel, should be paid to second-order probabilities and to the aspect of behavior these involve, namely the assessment of the adequacy of the knowledge upon which one's first-order probability estimates are based, in empirical research on decision making, in decision theory and in the application of decision theory to practical problems.
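A rough sketch of the pattern analysis described above follows, under stated simplifying assumptions: the bet sets and responses are invented, rs is computed per set as a Spearman correlation, and Pu is taken, as in the text, as the descriptive p-value of a Mann-Whitney U test comparing the first-order probability levels of the positively and negatively correlated sets.

```python
import numpy as np
from scipy.stats import spearmanr, mannwhitneyu

# Hypothetical data for one subject: each bet set has a first-order probability
# level, the second-order assessments of its bets, and the ranks the subject
# gave (rank 1 = most preferred).
bet_sets = [
    {"p1": 0.10, "second_order": [3, 10, 18, 22], "ranks": [1, 2, 3, 4]},
    {"p1": 0.25, "second_order": [5, 12, 20, 24], "ranks": [2, 1, 4, 3]},
    {"p1": 0.70, "second_order": [4, 11, 19, 23], "ranks": [4, 3, 2, 1]},
    {"p1": 0.90, "second_order": [6, 13, 17, 25], "ranks": [4, 3, 1, 2]},
]

rs_values, p1_levels = [], []
for s in bet_sets:
    rs, _ = spearmanr(s["second_order"], s["ranks"])
    rs_values.append(rs)
    p1_levels.append(s["p1"])

# H2-style analysis: do positive and negative rs values sit at different ends
# of the first-order probability scale?
pos = [p for p, r in zip(p1_levels, rs_values) if r > 0]
neg = [p for p, r in zip(p1_levels, rs_values) if r < 0]
Pu = mannwhitneyu(pos, neg, alternative="two-sided").pvalue
print([round(r, 2) for r in rs_values], round(Pu, 3))
```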
References
Beach, L.R. and J.A. Wise, 1969. Subjective probability and decision strategy. Journal of Experimental Psychology, 79, 133-138.
Becker, S.W. and F.O. Brownson, 1964. What price ambiguity? Or the role of ambiguity in decision making. Journal of Political Economy, 72, 62-73.
Brown, R.V. and D.V. Lindley. Improving judgments by reconciling incoherence. Theory and Decision (in press).
de Finetti, B., 1972. Probability, Induction and Statistics. New York: Wiley.
Ellsberg, D., 1961. Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics, 75, 643-669.
Fellner, W., 1961. Distortion of subjective probabilities as a reaction to uncertainty. Quarterly Journal of Economics, 75, 670-689.
Frankfurt, H., 1971. Freedom of the will and the concept of a person. Journal of Philosophy, 68, 5-20.
Gärdenfors, P. and N.-E. Sahlin, 1981. Decision making with unreliable probabilities. Paper presented at the Eighth Research Conference on Subjective Probability, Utility and Decision Making, Budapest.
Gärdenfors, P. and N.-E. Sahlin, 1982. Unreliable probabilities, risk taking, and decision making. Synthese.
Goldsmith, R.W., 1973. Applications of probability and signal detection theory to court decision making. Paper presented at the Fourth Research Conference on Subjective Probability, Utility, and Decision Making, Rome.
Goldsmith, R.W., 1980. Studies of a model for evaluating judicial evidence. Acta Psychologica, 45, 211-221.
Good, I.J., 1950. Probability and the Weighing of Evidence. New York: Hafner.
Good, I.J., 1965. The Estimation of Probabilities. Cambridge, Mass.: M.I.T. Press.
Good, I.J., 1971. Twenty-seven principles of rationality. In: V.P. Godambe and D.A. Sprott (eds.), Foundations of Statistical Inference. Toronto: Holt, Rinehart and Winston, 123-127.
Halldén, S., 1980. The Foundations of Decision Logic. Lund: Library of Theoria.
Hederstierna, A., 1980. Decision making under uncertainty: An empirical study of confidence in personal probability assessments. Mimeograph. Economic Research Institute, Stockholm School of Economics.
Jeffrey, R.C., 1974. Preferences among preferences. Journal of Philosophy, 71, 377-391.
Karmarkar, U.S., 1978. Subjectively weighted utility: A descriptive extension of the expected utility model. Organizational Behavior and Human Performance, 21, 61-72.
Larson, J.R., 1980. Exploring the external validity of a subjectively weighted utility model of decision making. Organizational Behavior and Human Performance, 26, 293-304.
Lee, W., 1971. Decision Theory and Human Behavior. New York: Wiley.
Levi, I., 1980. The Enterprise of Knowledge. Cambridge, Mass.: M.I.T. Press.
Mellor, D.H., 1980. Consciousness and degrees of belief. In: D.H. Mellor (ed.), Prospects for Pragmatism: Essays in Memory of F.P. Ramsey. Cambridge: Cambridge University Press, 139-172.
Sahlin, N.-E., 1981a. Preferences among preferences as a method for obtaining a higher ordered metric scale. The British Journal of Mathematical and Statistical Psychology, 34, 62-75.
Sahlin, N.-E., 1981b. Generalized Bayesian decision models. Unpublished manuscript.
Savage, L.J., 1954. The Foundations of Statistics. New York: Wiley.
Skyrms, B., 1980. Higher order degrees of belief. In: D.H. Mellor (ed.), Prospects for Pragmatism: Essays in Memory of F.P. Ramsey. Cambridge: Cambridge University Press, 109-137.
Slovic, P. and A. Tversky, 1974. Who accepts Savage's axiom? Behavioral Science, 19, 368-373.
Vickers, J.M., 1976. Belief and Probability. Dordrecht, Holland: Reidel.
Yates, J.F. and L.G. Zukowski, 1976. Characterization of ambiguity in decision making. Behavioral Science, 21, 19-25.
THE EFFECT OF BASE RATE ON CALIBRATION OF SUBJECTIVE PROBABILITY FOR TRUE-FALSE QUESTIONS: MODEL AND EXPERIMENT

Mariam SMITH and William R. FERRELL
Systems and Industrial Engineering Department, The University of Arizona, U.S.A.
Abstract
The degree of calibration of subjective probabilities of events is the extent to which the observed proportion of events that occur agrees with the assigned probability values. The decision variable partition model of calibration is reviewed tutorially. It shows how numerical subjective probabilities for discrete events can be related to the perceived truth of propositions. It has been able to explain a number of experimental findings about calibration of subjective probabilities of correct response to two-alternative multiple-choice questions and to questions to which the respondent supplies the answer. In this paper, the model's predictions for true-false items are derived, and an experiment testing them is reported. The model predicts that when the subjective probability that items are true is assessed, there will be a specific effect of the base rate, the proportion of true items, but that there will be no effect when the respondent decides true or false and then reports a subjective probability of being correct. A systematic effect of task difficulty is predicted in both cases. The experimental results are in close agreement with the model's predictions.
Introduction
Subjective probabilities are statements of opinion, but it is generally agreed that for consistency and practical applicability they should be calibrated. Of all the events assigned a given probability, the proportion that actually occur should equal the probability. The degree to which this condition is met is best shown by a calibration curve, a graph of the proportion of occurrences as a function of assigned probability. The results of a large number of experimental assessments of calibration are reviewed in Lichtenstein, Fischhoff, and Phillips (1981). The most common findings are overconfidence and a systematic effect of task difficulty.
Ferrell and McGoey (1980) proposed a model of how perceived truth of propositions may be translated into numerical judgments of sub-
jective probability or of confidence. When coupled with the assumption that respondents do not change their response criteria as task difficulty changes, it predicted the observed systematic effect of difficulty on calibration. The model has not previously been tested with true-false questions, although Ferrell and McGoey (1980) pointed out some of its implications for changes in base rates with true-false questions, and Lichtenstein, Fischhoff, and Phillips (1981) presented data which, they speculate, show an effect of base rate predicted by the model. This paper presents the derivation of detailed predictions about the effects on calibration of the proportion of true items in true-false tests and an experiment in which the predictions were tested empirically. The calibration model is explained tutorially in the following section, because the predictions of the base rate effect depend upon its structure and assumptions and because it is hoped that the model will thus be made more accessible to others.
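As a concrete illustration of the calibration curve defined above, the following sketch (hypothetical judgments, my own grouping choice) computes the proportion of events that occurred for each assigned probability value.

```python
from collections import defaultdict

def calibration_curve(judgments):
    """judgments: list of (assigned_probability, occurred) pairs.
    Returns, for each distinct assigned probability, the proportion of
    those events that actually occurred."""
    groups = defaultdict(list)
    for p, occurred in judgments:
        groups[p].append(1 if occurred else 0)
    return {p: sum(v) / len(v) for p, v in sorted(groups.items())}

# Hypothetical true-false judgments: a perfectly calibrated respondent would
# give occurrence proportions equal to the assigned probabilities.
data = [(0.6, True), (0.6, False), (0.8, True), (0.8, True),
        (0.8, False), (1.0, True), (1.0, True), (1.0, False)]
print(calibration_curve(data))
# Proportions fall below the assigned values: the usual overconfidence pattern.
```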
The Calibration Model
The decision variable partition model of calibration describes how suitable numerical judgments may be generated in response to requests of the sort, "Give your subjective probability that ..." or "Give your confidence that ...". For example, "What is absinthe? Answer and give your subjective probability that you are correct". The same question can be posed in a variety of ways. It could be a multiple-choice question: "Absinthe is (a) a precious stone, or (b) a liqueur. Say which it is and give your subjective probability that you are correct." It could be in a true-false format: "Absinthe is a liqueur. Give your subjective probability that that proposition is true" or, alternatively, "Say whether the proposition is true or false and give your subjective probability that you are correct".
The various question formats can be characterized by two features, the number of alternative propositions presented to the respondent and the type of event to which the subjective probability response refers, either the truth of a proposition or the correctness of a choice from among n possibilities. The event determines the potential range of the probability judgment: the full range 0 to 1 if the event is truth, the limited range 1/n to 1 if it is correctness. The model attempts to accommodate all task formats in a single framework.
A probability is a number applied to the occurrence or non-occurrence of an event. The model assumes that a discrimination process for distinguishing between the occurrence and non-occurrence of the event underlies the generation of numerical responses.
The Decision Variable and Its Partition
The model assumes that the discrimination process is probabilistic and that it functions as if based on a decision variable. A decision variable is a scalar variable such that if one had to decide whether or not the event has occurred, one would do so by observing whether the value of the decision variable is greater than or less than some criterion value. The familiar statistics t, F, Z, and chi-square are decision variables used to discriminate between null and alternative hypotheses. One finds an appropriate criterion value in a table and accepts the alternative hypothesis only if the value of the statistic is greater than the criterion. The decision variable in the model can be thought of as strength of belief in the event's occurrence.
The model assumes that there is a finite set of numerical responses that the respondent may use and that the range of the decision variable is divided by criterion values into intervals, one interval for each response. The responses are assigned to intervals so that the next more extreme interval gets the next more extreme response. Each question generates a particular value of the decision variable, and the interval into which that value falls determines the numerical response. It should be noted that the only requirement on the responses is that they become more extreme with more extreme values of the decision variable. Hence the model applies to ratings of confidence (e.g., on a scale of 1 to 5) as well as to probabilities. The degree of calibration cannot be assessed unambiguously for ratings, but graphs of the proportion of occurrences as a function of the level of confidence, such as those in Decker and Pollack (1958), can be modeled. A discrete response set is not a very restrictive assumption, since one generally finds that respondents limit their subjective probability judgments to a small set of possible values even when free to give any number in the 0 to 1 interval.
The model is related to signal detection theory. The use of a decision variable to describe the discrimination process for the occurrence or non-occurrence of the event is the same as its use in signal detection models to describe the classification of a stimulus as noise or signal-plus-noise.
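A minimal sketch of the partition itself; the criterion values and response labels are hypothetical.

```python
import bisect

def partition_response(decision_variable, criteria, responses):
    """Map a decision-variable value to a numerical response: the value falls
    into one of the intervals bounded by the criteria, and each interval has a
    response assigned to it (more extreme intervals, more extreme responses)."""
    return responses[bisect.bisect_right(criteria, decision_variable)]

criteria  = [-1.5, -0.5, 0.0, 0.5, 1.5]          # interval boundaries
responses = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]       # one response per interval

print(partition_response(-0.2, criteria, responses))  # 0.4
print(partition_response(2.3, criteria, responses))   # 1.0
```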
When one knows how a statistic such as t or F is distributed under the null and under the alternative hypothesis, one can determine the posterior probability that the alternative hypothesis is true when one has accepted it on the basis of a particular criterion value. (Of course, one also needs a prior probability for the alternative hypothesis.) In the model there are
also two distributions of the decision variable, one when the event has occurred and one when it has not. In the same way, when one knows these distributions, the criterion values that separate the intervals associated with the response categories, and the prior probability of the event, one can determine the probability that the event has occurred given a particular response. The set of these probabilities for all the response categories gives the asymptotic calibration curve predicted by the model.
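To make this computation concrete, the following sketch (our illustration, not part of the original paper) computes such an asymptotic calibration curve once d', the response criteria, and the prior probability of the event are given. The criterion values in the example call are arbitrary.

```python
# A minimal numerical sketch (ours, not the authors' program): the asymptotic
# calibration curve implied by the partition model, assuming unit-variance
# normal distributions of the decision variable centred at +d'/2 (event
# occurred) and -d'/2 (event did not occur), with the zero point halfway
# between the means as in the text.
from math import erf, sqrt, inf

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def asymptotic_calibration(d_prime, criteria, p_event):
    """Return P(event | response) for each interval cut by the response criteria."""
    edges = [-inf] + sorted(criteria) + [inf]
    curve = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mass_event = phi(hi - d_prime / 2) - phi(lo - d_prime / 2)      # event occurred
        mass_no_event = phi(hi + d_prime / 2) - phi(lo + d_prime / 2)   # event did not occur
        joint_event = p_event * mass_event
        joint_no_event = (1 - p_event) * mass_no_event
        curve.append(joint_event / (joint_event + joint_no_event))
    return curve

# Five responses {0.1, 0.3, 0.5, 0.7, 0.9} separated by four arbitrary criteria:
print([round(p, 2) for p in asymptotic_calibration(1.0, [-1.1, -0.4, 0.4, 1.1], 0.5)])
```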
Relation of the Decision Variable to Propositions

The discrete events to which subjective probabilities are assigned are either that a given proposition is true or that a selection from among n possibilities is correct. In both cases the decision variable, the strength of belief in the event, is related to the apparent truth of a corresponding proposition or set of propositions. The relationship depends upon the question format. The general concept determining the relationship is the normative one that the decision variable should be a monotone increasing function of the posterior probability of the event given the apparent truth of the related proposition(s). In other words, the apparent truth is used as statistical evidence for the occurrence or non-occurrence of the event and thereby determines strength of belief. The functions of apparent truth that will do this depend on the specific relation of the proposition(s) to the event. The model's general assumption is that if there is such a function that is sufficiently simple, then it is used. If not, then optimality is sacrificed for simplicity. It turns out that this must be done with multiple-choice questions with more than two alternatives. In such cases ad hoc assumptions are needed, but for the most common question formats, the normative principle can be used consistently.

For any particular experimental assessment of calibration, it is assumed that there is a population of propositions, a reference population, that provides the basis for the probability judgment task and in relation to which calibration is assessed. The particular experiment is assumed to involve a sample of propositions from that population. For the population, there is a distribution of apparent truth for propositions in it that are true and for those that are false. Although other assumptions can be made, it may be adequate to assume that these distributions can be represented as normal, of equal variance, with the mean value higher for true propositions than for false ones. If so, the distributions can be characterized by the discriminability parameter d' of signal detection theory, the difference in
the means divided by the standard deviation. The zero point is taken for convenience at the point halfway between the means. The assumption of normal distributions is not quite as strong a psychological assumption as it might seem at first sight. The distributions need only be normal on some monotone increasing transformation of the scale of apparent truth to satisfy the assumption. The decision variable is a function of the apparent truth value(s) of the related proposition(s), and thus the distributions of the decision variable are obtained from the normal distributions of apparent truth by an appropriate transformation of variable. The relationship between the decision variable and apparent truth for several of the question formats is considered in detail in the following section.
The Decision Variables for Different Question Formats

One-Alternative, Probability-True Task
The simplest case is when the questions are of the form, "Absinthe is a liqueur. Give your subjective probability that that proposition is true" or, more simply, "What is the probability that absinthe is a liqueur?". In such a case, the decision variable is just the apparent truth itself, and intervals on apparent truth are associated with the set of allowable responses. The distributions of the decision variable are the normal distributions of apparent truth. An illustration is given in Figure 1.

Suppose an experiment using questions in this form has been performed. Each proposition is either true or false, and has associated with it a subjective probability (or rating) response. Using the methods for a rating experiment in signal detection (Egan, Schulman, and Greenberg, 1959; Green and Swets, 1966), one can determine the discriminability d' characterizing the distributions of the decision variable, and one can determine by trial and error or by an optimal estimation program (Grey and Morgan, 1972) the criterion values of the normalized decision variable that will produce proportions of the various responses that best approximate the obtained data. From these criterion values it is a simple matter to compute the probabilities for the calibration curve, the asymptotic proportions of occurrences for each response category. An example of the calculation is shown in the Figure. For such an experiment, the estimate of d' is a measure of the respondent's ability to separate true from false propositions in the supposed population represented by the sample. Thus it is a measure of task difficulty. The proportion of true propositions is an arbitrary parameter of the task.

Figure 1. An Example of Distributions of the Decision Variable for the One-Alternative, Probability-True Task
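The fitting step can also be illustrated. The authors mention trial and error and an optimal estimation program; the sketch below (ours, with invented counts) instead uses the standard closed-form rating-method estimates from signal detection, in which cumulative response proportions are z-transformed to recover d' and the criteria.

```python
# A rough sketch (our illustration, not the authors' procedure) of the
# standard rating-method estimates: cumulative response proportions for true
# and for false propositions are z-transformed; the z-differences estimate d',
# and the averaged z-values locate the criteria on the scale whose zero point
# lies halfway between the two means.
from statistics import NormalDist

def rating_estimates(counts_true, counts_false):
    """counts_* are response-category counts ordered from lowest to highest response."""
    z = NormalDist().inv_cdf
    def z_above(counts):
        total = sum(counts)
        cum, out = 0, []
        for c in counts[:-1]:               # one cumulative point per criterion
            cum += c
            out.append(z(1 - cum / total))  # z of proportion of responses above the criterion
        return out
    zt = z_above(counts_true)               # true propositions
    zf = z_above(counts_false)              # false propositions
    d_prime = sum(t - f for t, f in zip(zt, zf)) / len(zt)
    criteria = [-(t + f) / 2 for t, f in zip(zt, zf)]
    return d_prime, criteria

# Hypothetical counts for five probability responses {0.1, 0.3, 0.5, 0.7, 0.9}
d_prime, criteria = rating_estimates([8, 14, 20, 28, 30], [30, 26, 22, 14, 8])
print(round(d_prime, 2), [round(c, 2) for c in criteria])
```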
Zero-Alternative, Probability-True (or Correct) Task
A similar format is one in which questions are asked in the form, "What is absinthe? Answer, and give your subjective probability that you are correct (or that the answer is true)". In this case, the decision variable is also the apparent truth of the proposition, the answer, but the proportion of true propositions is no longer an arbitrary task parameter but a measure of task difficulty, of how frequently one can answer correctly. The measure of discriminability d' is now a measure of how well the respondent can distinguish his or her own correct from incorrect answers, a measure of metacognition, of "knowing that one knows" (Hosseini and Ferrell, 1981).
Two-Alternative, Probability-Correct Task

Two-alternative multiple-choice questions provide somewhat more complicated formats. There are then two propositions, one of which is true and the other false, with equal likelihood. Each of them produces a value of apparent truth. The model assumes that when subjective probability of correctness is required, the respondent chooses the alternative answer that gives the higher value, and that the decision variable is the absolute difference between the two values. The distributions of the absolute difference when the true answer produced the larger value and when the false answer produced it are truncated normal distributions of the sort shown in Figure 2.
p(C) = probability correct = Φ(d'/√2)

Figure 2. An Example of Distributions of the Decision Variable for a Two-Alternative, Probability-Correct Task
Again, the model can be fitted to data. The proportion of correct responses enables one to calculate a value of d' for the distributions of apparent truth, or it can be found from the tables of Elliot (1964). That value determines the parameters of the distributions of the decision variable. The response criteria can be found by trial and error, and asymptotic values for a calibration curve are calculated using them, as shown in the Figure. The proportion of true propositions is 0.5 by design. The d' value is a measure of task difficulty, the ability of the respondent to distinguish the correct answers from the given distractors.
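A small sketch (ours) of the relation preserved in the Figure 2 annotation, p(C) = Φ(d'/√2): if the two apparent-truth values are independent, equal-variance normals whose means differ by d', the probability that the correct alternative yields the larger value is Φ(d'/√2), so an observed proportion correct can be inverted to estimate d', much as the tables of Elliot (1964) are used.

```python
# Illustrative only: converting between d' and proportion correct for the
# two-alternative task, using p(C) = Phi(d'/sqrt(2)).
from math import sqrt
from statistics import NormalDist

def pc_from_d_prime_2afc(d_prime):
    return NormalDist().cdf(d_prime / sqrt(2.0))

def d_prime_from_pc_2afc(p_correct):
    return sqrt(2.0) * NormalDist().inv_cdf(p_correct)

print(round(d_prime_from_pc_2afc(0.75), 2))   # roughly 0.95
```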
One-Alternative, Probability-Correct Task
With propositions to which the respondent answers "true" or "false" and gives a subjective probability of being correct, the decision variable is also apparent truth, and it is used to decide the answer. If the apparent truth is greater than a decision criterion value, the answer is "true"; if less, the answer is "false". Thus there are two ranges, one above and one below the decision criterion, each of which is partitioned by response criteria and assigned the same set of probability responses from the range 0.5 to 1. If a correct response is equally rewarded and an incorrect response equally penalized for true and false propositions, there is a decision criterion that can be shown to maximize reward. It also is best under a variety of other assumptions (Green and Swets, 1966). It is the value of
apparent truth at which the ratio of the distribution density function for true propositions to that for false ones is equal to the ratio of the proportion of false propositions to the proportion of true ones. It is assumed that with feedback of performance this criterion would tend to be adopted by respondents. To fit the model to data in this case, the discriminability d' can be computed from the proportion of times the respondent is correct and incorrect when the propositions are true, or it can be found from the tables of Elliot (1964). Again, optimal or cut-and-try methods can be used to yield response criteria that best match observed response proportions and the calibration curve. If there is no feedback of performance and no prior expectation that the proportion of true propositions is not 0.5, the decision criterion is assumed to be midway between the distributions and the probability response criteria are assumed to be symmetrical about it. Figure 3 illustrates such a case and shows a sample calculation of the proportion-correct given a particular response.

Figure 3. An Example of Distributions of the Decision Variable for the One-Alternative, Probability-Correct Task when the Criteria are Symmetric
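A parallel relation holds here (our sketch, not necessarily the procedure the authors used): with the decision criterion midway between the two distributions, the respondent is correct whenever the decision variable falls on the appropriate side of zero, so p(C) = Φ(d'/2) for true and for false propositions alike, and d' can be read back from an observed proportion correct.

```python
# Illustrative only: recovering d' from the overall proportion correct in the
# true-false task when the decision criterion lies midway between the means.
from statistics import NormalDist

def d_prime_from_pc_truefalse(p_correct):
    return 2.0 * NormalDist().inv_cdf(p_correct)

print(round(d_prime_from_pc_truefalse(0.73), 2))   # roughly 1.2
```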
Results from Previous Applications of the Model

The model has been applied to data from two-alternative, probability-correct tasks and from zero-alternative tasks as described in Ferrell and McGoey (1980). The principal result in the former case was that the widely observed systematic effect of the proportion of correct answers could be accounted for by the model by assuming that in the absence of feedback, the criteria used to partition the decision variable stay the same
across tasks of different difficulty. Figure 4 shows the model with data from Lichtenstein and Fischhoff (1977). The criterion values were estimated from a separate set of data and the model was fitted to the calibration curves and response proportions using only the four proportions of correct answers.

Figure 4. The Model for Data from Lichtenstein and Fischhoff's (1980) Experiment 3 for Proportion-Correct p(C) 0.5. (The response criteria were estimated from another experiment, from Ferrell and McGoey, 1980.)
As noted earlier, with questions to which the respondent supplies the answer, the discriminability d' and the proportion-correct are independent measures of performance. The model predicts separate effects on the calibration curves when criteria are held constant if discriminability or the proportion-correct change. These effects were clearly evident in group data when separate calibration curves were plotted for the four subgroups with below or above median values of d' and proportion-correct. A single set of criteria obtained from fitting the model for one group gave calibration and response frequency curves that fit the other three groups' data when used with their values of d' and proportion-correct.
Model Predictions of Base Rate Effects of True-False Items

Probability-True Task
Probability-true tasks, ones with true-false items to which one gives a probability of truth, have the same model structure as zero-alternative tasks in which the respondent both supplies the proposition, the answer, and judges its truth. Although the interpretation of the discriminability d' and the proportion of true propositions is different in the two cases, they are both full range and the decision variable is apparent truth itself. Hence, the effects of d' and the proportion of true items on calibration shown in Ferrell and McGoey (1980) for zero-alternative questions are also predicted for the probability-true task with true-false items. The predicted effects are shown in Figure 5: an increase in all calibration proportions with increasing proportion of true items, and more extreme calibration proportions, i.e., closer to zero or one, with increasing d'. The assumption is that the response criteria do not change, unless there is feedback, with changes in either difficulty or the base rate of true propositions. The predictions are not just qualitative. Given data from which to estimate the criteria, and given the difficulty and base rate, the entire calibration curve and the response proportions are given by the model.
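These predictions can be checked numerically by reusing the asymptotic_calibration sketch given earlier (our illustration; run the two fragments together, and note that the criterion values are again arbitrary).

```python
# With the criteria held fixed, raising p(T) raises every calibration
# proportion, and raising d' makes the proportions away from the centre more
# extreme, as Figure 5 illustrates.
criteria = [-1.1, -0.4, 0.4, 1.1]
for p_true in (0.25, 0.50, 0.75):
    print("p(T) =", p_true, [round(v, 2) for v in asymptotic_calibration(1.0, criteria, p_true)])
for d in (0.5, 1.0, 2.0):
    print("d' =", d, [round(v, 2) for v in asymptotic_calibration(d, criteria, 0.5)])
```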
Figure 5. The Separate Effects of Proportion-True p(T) and Discriminability d' on Good Calibration for a One-Alternative, Probability-True Task. (If both vary, the effects combine.) (Adapted from Ferrell and McGoey, 1980.)
Probability-Correct Task
In a probability-correct task, the respondent answers "true" or "false" to each item and gives a subjective probability of being correct. If the decision criterion is midway between the means of the distributions of apparent truth, if the response criteria are symmetric about it as in Figure 3, and if difficulty is held constant, there will be no change in calibration or response proportions with changes in the proportion of true items. This is because the symmetry of the response criteria ensures that for each response category the areas under the distributions are the same for both the intervals associated with that response. Hence, if the proportion of items correctly identified as true increases due to an increase of true items, the proportion of false items correctly identified will decrease
by the same amount due to the decrease in false items. Thus, the total proportion of correctly identified items for that response category will stay the same and the calibration will not change. This symmetry of the criteria is only appropriate for equal proportions of true and false items. But it is assumed that without prior information or feedback, respondents will act as if the proportions were equal even when they are not.

This raises an interesting problem. If respondents are calibrated and have symmetric criteria when there are equal proportions of true and false items, they will want to keep those criteria when the proportion of true items changes in order to stay calibrated. But to maximize reward for the decision response respondents should adopt a decision criterion that is not at the midpoint between the distributions, one that will cause the response criteria to be asymmetric. Thus reward for subjective probability calibration would have to be traded off against reward for answering "true" or "false".

Task difficulty d' also affects calibration with probability-correct tasks. When the response criteria are symmetric, the probability response is a function only of the absolute value of apparent truth, not of its sign. When the model is represented in terms of the absolute value, its structure is that of a probability-true task with the apparent truth value above zero. Consequently, changes in d' will affect the calibration curve of half-range, probability-correct tasks the same way they affect the upper half of the calibration curve of the full-range, probability-true tasks (illustrated in Figure 5). Increases in d' will thus raise all the points on the calibration curve. Given the response criteria and the difficulty, the model prediction is quantitative.
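The symmetry argument can also be verified numerically. The sketch below (ours, with arbitrary criterion values) computes the predicted proportion correct for each response category of a probability-correct task and shows that it is unchanged when the proportion of true items varies.

```python
# A numerical check (our sketch) of the symmetry argument above: with the
# decision criterion at zero and the response criteria placed symmetrically
# about it, the predicted proportion correct within each response category
# does not depend on the proportion of true items.
from math import erf, sqrt, inf

def phi(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def pc_per_response(d_prime, half_criteria, p_true):
    """half_criteria partition |x|; returns P(correct | response) per category."""
    edges = [0.0] + sorted(half_criteria) + [inf]
    out = []
    for a, b in zip(edges[:-1], edges[1:]):
        hit = phi(b - d_prime / 2) - phi(a - d_prime / 2)     # true item, answered "true"
        miss = phi(-a - d_prime / 2) - phi(-b - d_prime / 2)  # true item, answered "false"
        cr = phi(-a + d_prime / 2) - phi(-b + d_prime / 2)    # false item, answered "false"
        fa = phi(b + d_prime / 2) - phi(a + d_prime / 2)      # false item, answered "true"
        correct = p_true * hit + (1 - p_true) * cr
        total = p_true * (hit + miss) + (1 - p_true) * (fa + cr)
        out.append(correct / total)
    return out

# The three printed curves are identical, illustrating the base rate invariance.
for p in (0.25, 0.50, 0.75):
    print("p(T) =", p, [round(v, 3) for v in pc_per_response(1.0, [0.4, 0.8, 1.2, 1.6], p)])
```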
Experiment

The experiment to test the predictions of the model for true-false questions had the following general plan. Three sets of questions were prepared, consisting of propositions with proportions true p(T) of 0.25, 0.50 and 0.75. They were intended to be of exactly equal difficulty. The respondents were randomly assigned to two groups, one for the probability-true task and the other for the probability-correct. Each group responded to each set of propositions. For the probability-true task the model predicts that each of the proportions of true items will produce a separate calibration curve with the highest proportion producing the highest curve. For the probability-correct task, it predicts no differences due to the proportion of true items.
For each task type, the model was fitted to the pooled data from the set of questions with equal proportions of true and false propositions. This gave the set of response criteria, which were assumed to be the same for the other proportions since feedback was not provided. Using those criteria, the given values of p(T) and the discriminability d' estimated from the data, predictions of the calibration curves and response proportions were made for the other two sets of questions.

There were 20 subjects, one undergraduate, 15 graduate students, and 4 working adults, randomly assigned to the two groups. The questions were 600 propositions about the ordering of population magnitudes, distances between cities and historical events. Examples are:
1. Europe is more populous than Asia.
2. Paris is farther from Vienna than is New York.
3. Republican Party formed before Monroe Doctrine declared.
Of the questions, 500 were from Lichtenstein and Fischhoff (1980) and the other 100 were constructed for the experiment. They had all been classified as either "hard" or "easy" depending on the size of the difference in population, distance or time. Two hundred questions of each type and difficulty classification were randomly assigned to each of three question sets, but individual questions were not matched for difficulty in each set. The order of the quantities was changed in enough questions in each set to obtain sets with proportions of true propositions of 0.25, 0.50 and 0.75.

The question sets were given in random orders in booklets to the group doing the probability-true task. Those respondents, working on their own, gave their subjective probabilities that the propositions were true using the responses {0.1, 0.3, 0.5, 0.7, 0.9}. The propositions in each set were reversed in sense (so that the 25% true became the 75% true, etc.) and were given to the probability-correct group. The respondents in it gave an answer of "true" or "false" and a subjective probability of being correct using the responses {0.5, 0.6, 0.7, 0.8, 0.9}. The change in question sense was intended to prevent confounding the proportion-true with difficulty, should there be differences in difficulty between the sets.
Results
The Table gives the proportions of true propositions and the values of d' for both tasks, and the proportion of correct responses p(C), from which d' was obtained, for the probability-correct task. The proportions p(T) differ from their nominal values because some subjects in both groups did not respond to some of the questions, which were then omitted from the analysis. There are clearly differences in difficulty among the sets of questions.
                    Probability-true task       Probability-correct task
                    p(T)       d'               p(T)       d'        p(C)
Question set 1      0.25       1.05             0.74       1.22      0.73
Question set 2      0.56       0.78             0.45       0.94      0.68
Question set 3      0.68       0.51             0.28       0.55      0.61
The differences were observed for each of the 20 respondents. Because the truth of the same propositions was switched for the two tasks, the differences in difficulty can be attributed to the propositions' content and not the proportion of true items.

The calibration curves for the two tasks are shown in Figures 6 and 7. According to the model, the separation of the curves for the probability-correct task in Figure 7 is entirely due to differences in d'. It is due to both d' and p(T) for the probability-true curves in Figure 6. Note that Figure 7 is drawn to twice the scale of Figure 6, magnifying its relative deviations. For both tasks, the groups are quite well calibrated when the proportion of true and false propositions is about equal. The model was fitted, by trial-and-error, to the data for that condition. The fit appears quite close to both the response proportions and the calibration curve. Using the resulting response criteria (shown in the Figures) and estimating only the values of d' from the data, the model calibration curves and response proportions were predicted for the low and high p(T) conditions for both tasks. Had task difficulty been constant, as intended, the prediction would have been unconditional. The predictions and data are compared in Figures 8 and 9. Again there appears to be very good agreement. By itself, d' adequately accounts for the rather small differences between the curves from the probability-correct task, and p(T) accounts for most of the larger difference between those from the probability-true task. Although the model and its predictions are close to the data, they are not statistically indistinguishable from them, as is indicated by chi-square goodness of fit tests on the observed and predicted frequencies in the 10 categories defined by the five responses and by truth and falsity (or correctness and incorrectness).
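For readers who want to see the form of the test, the statistic is the usual Pearson chi-square computed over the 10 cells; the frequencies below are invented for illustration only.

```python
# Illustrative only: Pearson chi-square goodness of fit over the 10 cells
# defined by the five responses crossed with truth and falsity of the item.
observed = [12, 18, 25, 30, 35, 40, 33, 27, 16, 9]    # invented observed counts
predicted = [10, 20, 24, 32, 33, 42, 30, 28, 17, 9]   # invented model counts
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, predicted))
print(round(chi_square, 2))
```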
Figure 6. Results from the Probability-True Task with the Model Fitted to the Data from the Intermediate Proportion of True Propositions
Figure 7. Results from the Probability-Correct Task with the Model Fitted to the Data for the Intermediate Proportion of True Propositions
Figure 8. Model Predictions and Data from the Probability-True Task for the Low and High Proportions of True Propositions
Figure 9. Model Predictions and Data from the Probability-Correct Task for the Low and High Proportions of True Propositions
Discussion and Conclusions

The authors consider the model's fit to experimental calibration curves and response proportions, and its capacity to predict numerically the consequences of changes in conditions for a variety of task types, to be striking. The model seems to have captured the underlying structure of the interaction of task and behavior. It seems to us that a process akin to that of the model does, in fact, underlie the way people translate their perceptions about the truth of propositions into numerical judgments. It may underlie other types of numerical judgment as well. But we wish to stress that the model says nothing about the cognitive processes by which what has here been called apparent truth is arrived at. It merely assumes for simplicity that the distributions of apparent truth, or of a monotone transform of it, are normal and of equal variance for true and for false propositions.

The fact that the small differences between the model and the experimental data are too large to be attributed to random sampling fluctuations is not surprising. Indeed, it is to be expected in view of the many factors that have been found systematically to affect probability judgments (Hogarth, 1975) and numerical judgments generally (Poulton, 1977). What is surprising is that the normal equal variance assumption approximates so well the results for actual sets of questions. However, when the model has been applied to data for judgments of confidence in performance of tasks which are more perceptual than cognitive, e.g., detecting patterns on X-ray plates, it fits with even greater precision, perhaps because the normal distribution is a better approximation in such cases.

We have confidence in the model principally because it provides both reasonable numerical values and a meaningful and coherent structure in which to interpret data and experimental manipulations. Further, it reveals the relationships among question formats and clarifies problems in the measurement of task difficulty and performance for zero- and one-alternative probability-true tasks. Perhaps it can be used to help further disentangle the cognitive and numerical aspects of discrete probability judgment.
References

Decker, L. and I. Pollack, 1958. Confidence ratings and message reception for filtered speech. Journal of the Acoustical Society of America, 30, 432-434.
Egan, J.P., A.F. Schulman, and G.Z. Greenberg, 1959. Operating characteristics determined by binary decisions and by ratings. Journal of the Acoustical Society of America, 31, 768-773.
Elliot, P.B., 1964. In: J.A. Swets (ed.), Signal Detection and Recognition by Human Observers. New York: Wiley.
Ferrell, W.R. and P.J. McGoey, 1980. A model of calibration for subjective probabilities. Organizational Behavior and Human Performance, 26, 32-53.
Green, D.M. and J.A. Swets, 1966. Signal Detection Theory and Psychophysics. New York: Wiley.
Grey, D.R. and B.J.T. Morgan, 1972. Some aspects of ROC curve-fitting: Normal and logistic models. Journal of Mathematical Psychology, 9, 128-139.
Hogarth, R.M., 1975. Cognitive processes and the assessment of subjective probability distributions. Journal of the American Statistical Association, 70, 271-294.
Hosseini, J. and W.R. Ferrell, 1981. Detectability of correctness: A measure of knowing that one knows. Unpublished manuscript.
Lichtenstein, S. and B. Fischhoff, 1977. Do those who know more also know more about how much they know? Organizational Behavior and Human Performance, 20, 159-183.
Lichtenstein, S. and B. Fischhoff, 1980. How well do probability experts assess probability? Decision Research Report 80-5.
Lichtenstein, S., B. Fischhoff, and L. Phillips, 1981. Calibration of probabilities: The state of the art to 1980. In: D. Kahneman, P. Slovic, and A. Tversky (eds.), Judgment Under Uncertainty: Heuristics and Biases. New York: Cambridge University Press.
Poulton, E.C., 1977. Quantitative subjective assessments are almost always biased, sometimes completely misleading. British Journal of Psychology, 68, 409-425.
THE SURPRISINGNESS OF COINCIDENCES*

Ruma FALK
The Hebrew University, Jerusalem, Israel

and

Don MacGREGOR
Decision Research, A Branch of Perceptronics, Oregon, U.S.A.
Abstract

Coincidence-type stories, a popular social pastime, were studied in terms of factors affecting their surprisingness. Subjects were presented with stories varying in tense (past vs. future) and scope of detail (union vs. intersection). The normal form of coincidence stories, past-intersection, was found to be judged less surprising than stories of identical scope told in the future tense and similar in surprisingness to past-tense stories of greater scope. Subjects judged their own coincidences as more surprising than coincidences constructed by the experimenters and more surprising than when evaluated by a separate group of subjects. Implications of these results are discussed in terms of what we may remember or learn from experiences related by others.
Introduction

The "one chance in a million" will undoubtedly occur, with no less and no more than its appropriate frequency, however surprised we may be that it should occur to us.
R.A. Fisher
* Partial support for this study was provided by the Human Development Center, the Hebrew University of Jerusalem, and by Perceptronics, Inc. The authors are grateful to Kika Faragó, Baruch Fischhoff, Ray Hyman, and Sarah Lichtenstein for their help in the initial stages of the research. We wish to thank Raphael Falk for helpful editorial comments. Reprints may be obtained from the second author at Decision Research, A Branch of Perceptronics, 1201 Oak Street, Eugene, Oregon 97401.
Coincidences are an intriguing class of events. One dictionary definition states: "A notable concurrence of events or circumstances without apparent causal connexion" (Oxford Dictionary, 1931). A second repeats the same idea in somewhat different words: "An accidental and remarkable occurrence of events, ideas, etc., at the same time, suggesting but lacking a causal relationship" (Webster's, 1970). Both definitions contain the notion that coincidences are characterized by a very low probability of occurrence, and by extraordinary or remarkable content.

Despite popular interest in the subject, coincidences have rarely been systematically studied. The few cases where they have, however, provide some important insights. For one thing, coincidental events often do not have the (normatively) low probability attributed to them. Extraordinary events, like those reported in the parapsychological literature as evidence for "communication with the dead", are actually not as improbable as they are frequently taken to be (Alvarez, 1965). Moreover, a simple random simulation experiment is capable, in some circumstances, of generating coincidences comparable in rate and quality to the most remarkable matches obtained in telepathy experiments (Hardy, Harvie, and Koestler, 1973).

Falk (1981/2) suggested that surprise evoked by coincidences may be the result of a largely unconscious cognitive bias which she called "the selection fallacy". When events attract our attention because of their unusualness, we tend to invent explanations for their seeming rarity. By (logical) inference, we conclude that such events are extremely improbable, but fail to recognize that only by random sampling is such a conclusion justified. When we aren't erring as intuitive scientists, we may be erring as intuitive samplers. In any properly executed sampling scheme, at least two things are required: a definition of the sample space and a specification of the objects (or events) being sampled. Taking the case of coincidences, the appropriate sample space is all event combinations and the objects are those events having a surprising character. However, a human shortcoming is the tendency to underestimate the number of ways in which (independent) events can combine. Tversky and Kahneman (1974) asked subjects to estimate the value of 8! and found the median estimate to be less than 6% of the true value. Though the purpose of their experiment was to examine the biasing effect of a particular judgmental heuristic (i.e., anchoring) on estimates, the result illustrates as well the lack of appreciation people have intuitively for how large multiplicative phenomena can become. Similar underestimation has been found in judgments of exponential growth (Wagenaar and Sagaria, 1975).
By underestimating the extent of possibilities, a particular combination of events may tend to be judged unique instead of ubiquitous. For example, the relatively commonplace coincidence of a friend who knows someone who knows someone you know does not seem quite so surprising when one realizes that, on the average, the acquaintances of the acquaintances of people we are acquainted with comprise (as a group) a set approximately half the population of the United States (Bernard and Killworth, 1979).

Memory processes may also influence our judgment about the surprisingness of events. Hintzman, Asher, and Stern (1978), for example, demonstrated that incidents can trigger the retrieval from memory of specific events, but leave other (often related) events undisturbed or forgotten. Their results suggest that when real-world events prompt the recall of stored dreams, fantasies or imaginations, we may get the impression of a "precognitive" experience that is actually no more than a cognitive mirage.

Hindsight wisdom usually makes events that occurred in the past seem more likely than comparable hypothetical events. This suggests that the same story will be judged less surprising if we are told that it has occurred than if asked about a future possibility. However, Fischhoff (1976) found that temporal context alone does not influence judgments about the likelihood of events. If true, then coincidence-type stories should seem equally surprising whether told in the past or the future tense. Future coincidences, while not the normal way in which coincidence stories are told, are not unlike scenarios often used as an aid to forecasting and prediction (Martino, 1972), and may be thought of as similar to the mental simulation activities people exercise in judging the likelihood of specific event combinations (Kahneman and Tversky, 1981).

Whether in the past or future, coincidences are marked by the specificity of the events involved. If their surprisingness is attributable to this aspect of their character, then enumeration (or union) of all of the ways in which a similar set of circumstances could have occurred should lead to their being judged less surprising. A union of events should seem less surprising than any specified intersection of events. Since an intersection story always relates an elementary event which is included in the scope of a union story, one could expect, on psychological grounds, that a union would be seen as more likely than an intersection.
Falk (1975) asked subjects to judge the likelihoods of coincidence stories, on the surprisingness of which the present study focuses. Subjects asked to give judgments of surprisingness may not interpret it as the precise opposite of likelihood, but rather produce a judgment of likelihood combined with an affective response. This departure may in part account for differences between the present results and Falk (1975).
In a preliminary study on the judgment of coincidences each of 79 subjects (university and high school students from Israel) read four coincidence stories and rated them for perceived likelihood (Falk, 1975). Each of four different coincidences appeared in four different versions, manipulating both the event scope (intersection vs. union) and tense (past vs. future). Thus, the four versions included intersection and union stories, each of which was told either in the past or in the future. The results suggested that a coincidence story as usually told (i.e., a past-intersection story) was judged about as likely as both the past-union and future-union stories. Apparently, the specific event read by subjects was perceived to be as likely as a broader event "of that kind". The tense of the story had no significant main effect on likelihood judgments, though the future-intersection story was judged significantly less probable than the other three stories. This was the only story that was interpreted as a strict intersection. However, Falk (1975) did not control for the possibility that stories differed in objective surprisingness.

The present study replicates and extends the findings of Falk (1975). Briefly, coincidence stories are presented in each of four versions, future-intersection (FI), past-intersection (PI), future-union (FU), and past-union (PU). The intersection versions (past and future) are phrased as detailed events, describing a conjunction of several specific components. In the union versions (past and future), events are nonspecific, allowing for the occurrence of at least one of a set of possibilities. Future stories describe a hypothetical event and past stories describe an event which has occurred. Hence, one could expect intersection stories to be more surprising than union stories. If temporal setting has no influence on the surprisingness of a story, then the order of judgments for the four versions will be: S(PI) = S(FI) > S(PU) = S(FU), where S is surprisingness. However, if temporal context does influence judgments, then the surprisingness of future stories will be greater than past stories; thus S(FI) > S(PI) and S(FU) > S(PU).

Experiment 1
Method

Design

The design of the experiment was similar to that of Falk (1975). Four different coincidence stories were created, each having different content. Four versions of each of the four stories were created: past-intersection (PI), past-union (PU), future-intersection (FI), and future-union (FU).
Altogether 16 stimulus stories comprised the design. Each subject was given four coincidences to read and rate, one (different) version from each of the four stories, presented in random order. All 24 permutations were used, most of them repeating four times (except for 5 permutations that appeared 3 times).
Stimulus material

The stories were essentially the same as in Falk (1975), adapted to an American setting. The coincidences described an unexpected meeting with an old friend, a fortunate hitchhike, a convergence of birthdays, and a peculiar numerical combination; versions of each were constructed as follows:

The past-intersection (PI) version enumerated all the specific details that converged in the event that had occurred. For example:

Recently, I met a student who told me how fortunate he was: Precisely on his 19th birthday, he tried to hitchhike home to John Day for his birthday party and started waiting at the outskirts of Eugene. Quite soon, a driver stopped to offer him a ride, and it turned out that he was going exactly to John Day, a small town in Eastern Oregon.

The past-union (PU) version related the same story that had occurred; however, it also gave the reader several hints about the universe of possibilities of which this story represented but one:

Recently, I met a student who told me how fortunate he was. His home town is John Day, a small town in Eastern Oregon situated on a main highway. One day earlier, precisely on his birthday, he tried to hitchhike and started waiting at the outskirts of Eugene. Quite soon, a driver stopped to offer him a ride, and it turned out that he was going directly to John Day. He wasn't that lucky up to that day through a year of frequent hitchhiking.

In the future-union (FU) version, the same scope of possibilities was described but the question of surprisingness referred to a hypothetical event:

Recently, I met a student who told me that his hometown was John Day, a small town in Eastern Oregon situated on a main
highway. He is a student in Eugene. He prepares himself for frequent hitchhiking and a lot of waiting on the roads. He often asks himself whether throughout the year he will be fortunate enough to get a ride directly from Eugene to John Day.

The future-intersection (FI) version described a hypothetical event in which all the above mentioned components converged:

Recently I met a student on his way to the outskirts of Eugene in an attempt to hitchhike. He told me that his home town is John Day, a small town in Eastern Oregon. He added that it was exactly his 19th birthday and he was going home for his birthday party. He wondered whether he would be fortunate that day and that the first driver to stop and offer him a ride would go directly to John Day.

Parallel versions were constructed for the three remaining stories.
Procedure

The task was introduced to subjects as follows:

In this task we would like you to read four stories, each of which describes coincidences which occur in people's lives. Read each one carefully. At the end of each story, you will be given an event from the story and asked to rate it for how surprising it is.

Each story was presented on a separate page. Subjects were asked to judge the surprisingness of each story on a 20-point scale where 1 = "not at all surprising" and 20 = "very surprising". In addition, subjects were asked to write and describe a coincidence that had happened to them, and rate it for surprisingness.
An alternative task would have been to ask subjects to judge surprisingness via magnitude estimation, for example, how many more times surprising did you find this event than some specified event. Such judgments would form a log-ratio scale, in which case their subsequent analysis would have involved division rather than subtraction. Though the affective character of "surprisingness" may seem to call for a logarithmic response, the judgmental scale used was chosen for simplicity and to parallel Falk (1975).
Subjects

A total of 91 subjects recruited through an advertisement in the University of Oregon student newspaper took part in the experiment. Each was paid $5 for participating in a 1-1/2 hour session in which they completed this and a number of other unrelated tasks dealing with judgment and decision making.
Results
Since each subject rated four stories, six paired comparisons can be made for each subject. An ordinal analysis was performed noting only the direction of the difference (whether more, less or equally surprising) between the ratings of each pair of stories. Hence, six signs (each being >, < or =) were obtained for each subject. The results of this analysis are presented in Table 1. The paired comparisons are arranged in the Table such that S(Y) > S(X), where S(Y) is the rated surprisingness of the version noted on the Y axis and S(X) the corresponding rating of the version on the X axis. The ordering of the versions along each axis is such that the preponderance of instances correspond to S(Y) > S(X). The z statistics presented in the cells of Table 1 are significance tests on the sign of the comparison between the row and column entries. The lower line (in parentheses) presents analogous results for the 79 subjects in Falk (1975).
Table 1. Analysis of Version Effect Within Subjects: Unadjusted Scores (N = 91)

Analysis of the hypothesis S(Y) > S(X)

  Y      X = FU               X = PU               X = PI
  PU     (z = 0.97)
  PI
  FI     z = 5.16 (z = 3.35)  z = 4.10 (z = 2.02)  z = 2.49 (z = 4.36)

Values in parentheses are for the 79 Israeli subjects from Falk (1975).
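The z statistics in the table are significance tests on the signs of the within-subject differences; the paper does not spell out the exact variant used, so the sketch below (ours, with invented ratings) shows only a common normal approximation to the sign test.

```python
# Illustrative only: a normal approximation to the sign test on paired
# surprisingness ratings, with ties dropped.
from math import sqrt

def sign_test_z(ratings_y, ratings_x):
    greater = sum(y > x for y, x in zip(ratings_y, ratings_x))
    less = sum(y < x for y, x in zip(ratings_y, ratings_x))
    return (greater - less) / sqrt(greater + less)

print(round(sign_test_z([14, 9, 17, 12, 15, 8], [10, 9, 12, 11, 13, 6]), 2))
```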
Since the same subjects contributed their responses to all six cells, the z ratios should be interpreted only as descriptive measures. The most apparent result in Table 1 is that the future-intersection versions were judged considerably more surprising than any of the others. Furthermore, a PI coincidence was less surprising than a comparable FI event, although both mentioned the same list of details. The same result can be seen from consideration of the mean ratings given to the four versions by all subjects. These means, written in decreasing order of surprisingness, are:

S(FI) = 14.1; S(PI) = 11.5; S(PU) = 10.9; S(FU) = 9.20

However, the four stories differed in mean surprisingness (across versions) about as much as the four versions. Therefore, each raw rating of a specific story-version combination was adjusted by subtracting from it the mean rating of that story (across subjects and versions). If S(X) is the raw surprisingness rating of version-story X, then S'(X) denotes the adjusted rating of X, that is, the difference between S(X) and the mean rating of the same story. Hence, S'(X) measures the net version effect. The differences between pairs of adjusted ratings were organized in the same format as previously. Table 2 presents the results of this analysis. The ordering of the versions along the axes matches that of Table 1, confirming the general pattern that emerged from the ordinal analysis.
Table 2. Analysis of Version Effect Within Subjects: Adjusted Scores (N = 91)

d = S'(Y) - S'(X)

  Y      X = FU                            X = PU                            X = PI
  PU     d = 1.71, σ = 7.976, t = 2.31*
  PI     d = 2.43, σ = 6.898, t = 3.36**   d = 0.68, σ = 6.386, t = 1.02
  FI     d = 4.86, σ = 7.813, t = 5.90***  d = 3.24, σ = 6.921, t = 4.45***  d = 2.40, σ = 6.613, t = 3.41***

  * p < 0.05   ** p < 0.01   *** p < 0.001
Symbolically,

S(FI) > S(PI) ≈ S(PU) > S(FU)

FI stories were judged most surprising, FU least surprising, and PI was judged halfway between them.
Discussion

The main finding of Falk (1975) was that the usual form of a coincidence (PI) was not judged to be extremely unlikely, not as unlikely as a comparable intersection in the future; that result was in evidence here also. In addition, no overall effect of temporal context was found; future stories were not generally evaluated as more surprising than past stories. The other independent variable of the design, namely, the extent of enumeration of story elements, had a distinct effect for future stories (S(FI) > S(FU)), but a negligible one for past stories. Though past-union stories were judged somewhat more surprising than future-union stories, past-intersection stories were nearly equal in surprisingness to past-union stories.

Why do coincidences, as usually told, seem as surprising as they do if they are seen as comparable to events of much greater scope? One possible explanation is that coincidence stories are less surprising to the listener than they are to the teller. Nisbett and Ross (1980), for example, claim that inferences about matters that touch upon oneself have unique status. If so, coincidences may seem especially surprising when we are involved in them ourselves. Possibly the teller and the listener conceptualize the same event differently and, therefore, judge surprisingness unequally. Experiment 2 explores the relative surprisingness of coincidences that involve self versus those that involve others.
When one tells a past story, it is difficult to claim that "at least one" of a set of events happened. The PU stories used here related a specific occurrence and casually alluded to other possibilities. Those hints may not have suggested a union set strongly enough. This may account in part for why past-union stories were rated more surprising than future-unions, but equally surprising as past-intersections.
Experiment 2

Method

Design

Recall that in Experiment 1, subjects wrote a personal coincidence and rated it for surprisingness (on a 20-point scale). Of the 91 coincidence stories obtained, 75 were judged to be sufficiently complete that they could be rated for surprisingness by a separate group of individuals, with minor corrections to grammar and spelling. In addition, the four PI version stories used in Experiment 1 were presented again in this experiment. From the readers' point of view, all stories in this experiment had happened to another person.
Stimulus materials

The 75 stories selected from Experiment 1 were corrected for grammar and spelling and each clearly typed on a separate sheet. They were divided into 3 groups according to length.
Procedure

The task was introduced similarly to Experiment 1. Each subject received 10 coincidence stories for evaluation: the four original PI stories and six of the subject-written stories (two short, two medium, and two long). The set of 10 stories was randomly ordered such that no two of the four PI stories ever appeared sequentially. As a result, each of the PI stories was rated by each of the subjects in the experiment, while each of the subject-written stories was rated by 10 to 12 individuals.
Subjects

A total of 146 subjects recruited and paid in the same manner as Experiment 1 participated in this experiment.
Results

Regarding notation, S(story) denotes the rated surprisingness of a story. "Self" refers to stories written by subjects. "Other" refers to the PI stories written by the experimenters and rated by subjects in Experiments 1 and 2. The subscript c designates the mean value of that variable for all control subjects (those in Experiment 2) who rated a given story (12 in most cases). For example, S(self) denotes the surprisingness rating of a story by its author, and Sc(self) denotes the mean surprisingness rating of the 12 subjects who read that story.

Table 3 presents a summary of the results of Experiment 2. The source of the coincidence (oneself or someone else) was found to affect the surprisingness evaluation of an event. The difference, d, between a subject's rating of his or her own story and the rating of the PI story read by that subject (i.e., the only PI story read in Experiment 1) was computed for each of the 75 subjects from Experiment 1 whose stories were selected for use in Experiment 2. Column 1 of Table 3 gives summary statistics for the 75 d values. On the average, authors judged their own coincidences somewhat more surprising than they judged the PI coincidences written by the experimenters, though not significantly so.

Each subject-written story from Experiment 1 was rated by 10 to 12 subjects from the present experiment. Each rater from the present experiment also rated the same PI stories as the Experiment 1 subjects. A comparison of the ratings of "self" stories with "other" stories when both are judged by subjects in Experiment 2 indicates that the stories authored by the experimenters (other stories) were "objectively" more surprising than those authored by subjects. This suggests that even a substantial tendency of subjects to judge their stories more surprising than "other" stories might have been obscured because "other" stories were more surprising. Column 3 of Table 3 presents the results of the same comparison as column 1 when ratings are adjusted for differences in objective surprisingness by subtracting the mean rating of each story (from Experiment 2) from each subject's rating in Experiment 1. Now subjects' ratings of their own coincidence stories are greater than their ratings of "other" stories; self-coincidences appear more surprising than coincidences that happen to others.

The "other" stories (i.e., the four PI stories composed for Experiment 1) were carefully constructed and worded. In contrast, the stories authored by subjects were written on the spur of the moment, during a short experimental session. This may explain why the "other" stories turned out, on the average, to be objectively more surprising.
Table 3. Summary of Differences Between Ratings of Coincidences Authored by Subjects (Self) and Coincidences Authored by the Experimenters (Other)

Column 1: "Experimental" subjects (Experiment 1), d = S(self) - S(other)
Column 2: "Control" subjects (Experiment 2), dc = Sc(self) - Sc(other)
Column 3: Adjusted scores, d' = S'(self) - S'(other) = d - dc
Column 4: S'(self) = S(self) - Sc(self)
Column 5: S'(other) = S(other) - Sc(other)

                        Col. 1     Col. 2     Col. 3     Col. 4     Col. 5
Mean                    0.96       -1.50      2.46       3.28       0.83
Standard Deviation      7.045      4.925      6.320      4.885      5.099
t (df = 74)             1.18       -2.63*     3.37**     5.82**     1.40
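To make the notation concrete, the following sketch (ours, with invented ratings for three hypothetical authors) shows how the quantities in the five columns are formed.

```python
# Illustrative only: forming the Table 3 quantities from raw ratings.
S_self = [16, 12, 18]     # invented: each author's rating of his or her own story
S_other = [14, 13, 15]    # invented: the same subjects' ratings of the PI story they read
Sc_self = [11, 10, 14]    # invented: mean control ratings of each author's story
Sc_other = [13, 14, 13]   # invented: mean control ratings of the corresponding PI story

d = [a - b for a, b in zip(S_self, S_other)]              # column 1
d_c = [a - b for a, b in zip(Sc_self, Sc_other)]          # column 2
d_adj = [a - b for a, b in zip(d, d_c)]                   # column 3: d - dc
S_adj_self = [a - b for a, b in zip(S_self, Sc_self)]     # column 4
S_adj_other = [a - b for a, b in zip(S_other, Sc_other)]  # column 5
print(d, d_c, d_adj, S_adj_self, S_adj_other)
```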
Column 4 contrasts the surprisingness ratings subjects gave their own stories at the time they wrote them with the average surprisingness ratings given to them when read by another group (of 10-12 subjects per story). Subjects who wrote stories apparently felt them to be considerably more surprising (on average) than did other subjects who read them. Taken together, the results in Table 3 suggest two faces of coincidence judgments: subjects found their own coincidence stories more surprising than those of another (i.e., PI stories), and they thought their own coincidence more surprising than did other subjects.
Discussion

The results of Experiment 1 suggest that people judge the surprisingness of a coincidence story told in its usual scope and tense (i.e., PI) very similarly to the way they judge a more extended event (i.e., PU). However, the most striking finding was that coincidence stories involving an intersection of details were judged considerably more surprising when cast in the future tense than in the (usual) past tense. Though the conclusion of Fischhoff (1976) that temporal context has no appreciable effect on likelihood judgments was generally supported here, the present results hint that, at least for some types of coincidence, stories' surprise value may be influenced by tense. Yet, a direct parallel between Fischhoff's findings and these results must be drawn cautiously since judgments of surprisingness may contain an affective component not a part of likelihood judgments.

In Experiment 2, the source of the coincidence (oneself or someone else) was found to affect the surprisingness evaluation of an event; subjects found their own coincidences more surprising than another's coincidences, although they were objectively less so. The root of this difference in judgments could be tied to the union-intersection distinction; subjects reading stories written by others could view the writer of a coincidence as one of many individuals and every event as a realization of one of many possibilities. A related explanation is that we seem unique to ourselves and fail to see how the events which happen to us could have happened to anyone. Said another way, we may not see how easily we could have been replaced in a set of events or circumstances by another individual. The tendency to view ourselves and our motives as fundamentally different from those of others is a pervasive finding in social psychology (see, for example, Nisbett and Ross, 1980) that may account in part for a tendency to view the events occurring in one's life as somehow more novel or noteworthy than the events occurring in the life of another.
What lessons can we tentatively draw from these results? One is that by discounting the unusualness of events told to us by others, we may not learn all we can from experiences in which we don't directly partake. By assigning a unique status to ourselves, we may come to believe that some events, while they happen infrequently to others, would happen even less frequently to us, like motor vehicle accidents. Such a bias in judgment could work against our taking adequate protective action or adopting an appropriate defensive attitude. On the other hand, when a seemingly rare combination of events happens in our own life, we may give it attention beyond what it is due, perhaps becoming preoccupied with attributing causality to something more chimerical than real.
References

Alvarez, L.W., 1965. A pseudo experience in parapsychology. Science, 148, 1541.
Bernard, H.R. and P.D. Killworth, 1979. A review of small world literature. Sociological Symposium, 28, 87-100.
Falk, R., 1975. Perception of randomness. Doctoral dissertation (in Hebrew). Jerusalem: The Hebrew University.
Falk, R., 1981/2. On coincidences. The Skeptical Inquirer, 6, 18-31.
Fischhoff, B., 1976. Temporal setting and judgment under uncertainty. Organizational Behavior and Human Performance, 15, 180-194.
Hardy, A., R. Harvie, and A. Koestler, 1973. The Challenge of Chance. New York: Random House.
Hintzman, D.L., S.I. Asher, and L.D. Stern, 1978. Incidental retrieval and memory for coincidences. In: M.M. Gruneberg, P.E. Morris, and R.N. Sykes (eds.), Practical Aspects of Memory. London: Academic Press.
Kahneman, D. and A. Tversky, 1981. Availability and the simulation heuristic. In: D. Kahneman, P. Slovic, and A. Tversky (eds.), Judgment Under Uncertainty: Heuristics and Biases. New York: Cambridge University Press.
Martino, J.P., 1972. Technological Forecasting for Decision Making. New York: Elsevier.
Nisbett, R. and L. Ross, 1980. Human Inference: Strategies and Shortcomings of Social Judgment. Englewood Cliffs, N.J.: Prentice Hall.
Tversky, A. and D. Kahneman, 1974. Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
Wagenaar, W.A. and S.D. Sagaria, 1975. Misperception of exponential growth. Perception and Psychophysics, 18, 416-422.
Section V
A SYMPOSIUM ON THE VALIDITY OF STUDIES ON HEURISTICS AND BIASES
INTRODUCTION

Willem-Albert WAGENAAR
This section was called "symposium" since it contains personal views rather than research papers. Because of their personal character the invited contributions remained unedited. Thus much of the dialogue as it evolved is preserved, including some of the weaker arguments. The main question under discussion, raised eloquently in Ward Edwards' paper, is whether results gathered in experiments on heuristics and biases in probabilistic thinking and decision making are sufficiently valid for everyday practice. If validity indeed exists the research has important consequences for many problems society is now facing. However, the risks in applying these results without checking their ecological validity are overwhelming; this may be one of the reasons why emotions in discussions of this topic run so high.

Edwards claims that validity, when checked in the rare experiments involving experts in real life situations, appears to be nonexistent. The illusion of validity has originated from the consistency among experiments run almost exclusively in laboratory settings and with subjects drawn from student populations. To what extent this description of psychological research is itself valid is a matter the reader should judge carefully.

While Edwards argues that laboratory experiments on heuristics and biases create false perspectives and that they therefore are harmful and destructive, Fischhoff expresses the opposite opinion that these experiments are useful and constructive. His argument is that replications in a large variety of experimental settings produced consistent results and that many everyday life situations exist which share formal properties with the situations studied. Perhaps even more important is the consideration that heuristics have to be studied by psychologists, because in many complex situations people simply cannot apply normative computational rules, even when their final solutions conform to the norm. Normative models lack
psychological validity as much as too restrictive process models lack ecological validity. Phillips takes up this argument by explaining that psychological research should produce results which are valid in both senses. Modelling people as either normative statisticians or maladapted intuitives is outmoded; new paradigms should reveal the richness and adaptability of the human heuristic repertoire, as well as its limitations. One lesson learned from the discussion is that laboratory studies on heuristics and biases in probabilistic thinking and decision making should explicitly enumerate the real life conditions for which they are supposedly valid, and also demonstrate how this validity was checked.
HUMAN COGNITIVE CAPABILITIES, REPRESENTATIVENESS, AND GROUND RULES FOR RESEARCH* Ward EDWARDS
Social Science Research Institute, University of Southern California, U.S.A.

* Preparation of this paper was sponsored by the Defense Advanced Research Projects Agency (DoD), ARPA Order No. 4089, under Contract No. MDA903-81-C-0203 issued by Department of Army, Defense Supply Service-Washington, Washington, D.C. 20310.
If someone says "2 + 2 = 4", that isn't psychology; it is just arithmetic. But "2 + 2 = 5" is psychology. If enough experimental subjects say it often enough, it will be a finding, and the experimental and theoretical literature about it will burgeon. We inherited this focus on error, deficiency, bias, delay, and illusion as the subject matter of psychology from a very long tradition indeed. Wundt and Titchener, focusing on issues fairly closely related to human sensory capabilities, established both a definition of what psychology was about and the beginnings of a tradition about how psychologists should do their jobs. Psychology, in their formulation, was concerned with explicating the functions of the "generalized normal adult human mind". Since the goal was generalization, and any individual's mind was as much a representative of the "generalized normal adult human" as any other, it seemed both natural and convenient to use the subjects closest to hand. In the Introspectionist days, those subjects were the scientists themselves, and their graduate students. After this form of research came under heavy fire (though not, in my view, heavy enough), the subjects changed and became the readily available college sophomores who populated the professors' classes.
So long as the topic of psychological research was human sensory ability, this made relatively little difference, though I suggest that a broader selection of subjects might have led to earlier appreciation of the importance of monocular cues to visual depth. The study of rote memory was certainly handicapped both by the selection for relatively high intelligence and by the selection for youth
implicit in using college sophomores as subjects. But a more serious handicap in that area came from the use of syllables in a meaningless order or disconnected words as stimuli in such studies. As a result of this very nonrepresentative sampling of memorization tasks, professors can seriously spend a working day correcting proof of an article about the severe capacity limitations of human memory, and that evening go out to the theater and listen to an actor repeat lines from Shakespeare for three hours, without apparently noticing the incongruity. Even the commonplace observation that some actors are "quick studies", able to learn their lines in a few read-throughs, while others are the opposite, did not seem to stimulate much interest either in individual variability, inherent or acquired, in ability to perform rote memorization, or in the relationship between that variability and what was being memorized.
More recently, we have been subjected to a fifteen-year spate of studies that purport to show that men are incompetent intellectually. The arena for such studies, somewhat surprisingly, has been probabilistic reasoning. Somewhat by accident, I started this work off in the early 1960s, much to my present regret, with research on the degree to which college sophomores could conform to Bayes's Theorem in revising probabilities (Phillips, Hays, and Edwards, 1966; Phillips and Edwards, 1966; Edwards, 1968). Probabilities are convenient because they are of some intellectual importance, they are easy to estimate, and most important, the existence of a body of theory specifies the right answer and so makes it easy to find that human beings do it wrong. Such authors as Kahneman and Tversky (Kahneman and Tversky, 1972; Kahneman and Tversky, 1973; Tversky and Kahneman, 1973; Tversky and Kahneman, 1974), who have been key participants in this activity, have been quite careful to note the fact that work done on probabilistic reasoning does not necessarily generalize to other kinds of reasoning. They have been much less persistent about pointing out a second caveat with which I am sure they would agree: that both their methods and their selection of subjects encourage the occurrence of error. It should be noted in this context that studies of unmotivated subjects and use of nonexpert subjects are enshrined in psychological tradition.
I have been in disagreement with this line of research and thought for some time, and now I am ashamed about my own role in starting it off. I remained silent about it because I believed, wrongly, that it was a fad and would die out, though those of you who have followed my work will note that I have published not a word about conservatism in probabilistic inference since about 1970. However, I now find that the ideas, without the accompanying caveats, have spread far beyond psychology. I hear the message that man is a "cognitive cripple" from a wide variety of nonpsychologists
these days. I encounter it in refusals to accept manuscripts submitted for publication showing men performing such tasks well; in refusals of grant or contract applications because "Kahneman and Tversky have shown that people can't do such tasks". Much more importantly, I hear that message from researchers on medical diagnosis and intelligence analysis as one basis for resisting the introduction of explicit probability assessments into their ways of doing business. I even hear it generalized to intellectual tasks that have nothing to do with probabilities. The net effect has been a significant contribution to the widely held view that whenever possible human intellectual tasks should be done by computers instead.
It is time to call a halt. I have two messages. One is that psychologists have failed to heed the urging of Egon Brunswik (1955) that generalizations from laboratory tasks should consider the degree to which the task (and the person performing it) resemble or represent the context to which the generalization is made. Few experimenting psychologists working on cognition deny that their studies are grossly unrepresentative both of tasks and of subjects who might perform these tasks. The Table shows a simple taxonomy of intellectual tasks and of their performers; most of the taxonomic categories result from dichotomization or trichotomization of continua. If one thought of the products of all those numbers as representing a table into which psychological experiments, and also real-life intellectual tasks, could be categorized, the table would contain 192 kinds of tasks, 18 kinds of performers, and so 3,456 cells. The right-hand column of the table contains my judgments about the extent to which each taxonomic category separately has been explored with a significant body of literature concerned with human intellectual performance. Thus, for example, virtually no instances of realistic tasks appear in the literature on human intellectual performance. Most such studies have been done in the laboratory; a very small set of studies on the job also exist. The 'plus' symbol is intended to identify such sparsely populated categories.
The conclusion from the table is that, of the 3,456 cells we could study, we have in fact studied only 32 well. What is worse, for the purpose of making interesting generalizations it is the wrong 32. We can find bookshelves full of experiments on unrealistic, trivial tasks performed in the laboratory without tools by individual students. But the supply of studies of real-life tasks in which tools were used, performed by an organization of experts, is scanty indeed. Yet for many purposes, the latter kind of situation is the one to which we would like our conclusions to apply. The reason why we study the wrong tasks and people is not primarily because psychologists are perverse, though considerable perversity is built into our definition of purpose.
Representativeness of Intellectual Tasks and of their Performers

Task Dimensions                                       Number
a. Easy vs. Difficult                                    2
b. Realistic vs. Unrealistic                             2
c. Time Pressure Present vs. Not                         2
d. Routine & Repeated vs. Unique                         2
e. Laboratory vs. Job vs. Life                           3
f. Important vs. Trivial                                 2
g. Tools Available vs. Not                               2
Product (kinds of tasks)                               192

Performer Dimensions                                  Number
h. One Person vs. Several vs. an Organization            3
i. Students vs. Nonexperts vs. Experts                   3
j. Able vs. Less so                                      2
Product (kinds of performers)                           18

Total number of cells                                 3456

(The original table carries a further "Number Studied" column, recording Edwards' judgments of how well each category has been explored; its entries are not legible in this reproduction.)
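As a check on the cell-count arithmetic behind the Table, the cardinalities of the task and performer dimensions can simply be multiplied out. The short Python sketch below is an editorial illustration, not part of Edwards' text; the dimension labels come from the Table, and the figure of 32 well-studied cells is his stated judgment rather than anything the code derives.

```python
from math import prod

# Cardinalities of the task dimensions (a-g) in Edwards' Table.
task_dims = {
    "easy vs. difficult": 2,
    "realistic vs. unrealistic": 2,
    "time pressure present vs. not": 2,
    "routine & repeated vs. unique": 2,
    "laboratory vs. job vs. life": 3,
    "important vs. trivial": 2,
    "tools available vs. not": 2,
}

# Cardinalities of the performer dimensions (h-j).
performer_dims = {
    "one person vs. several vs. an organization": 3,
    "students vs. nonexperts vs. experts": 3,
    "able vs. less so": 2,
}

task_kinds = prod(task_dims.values())            # 192 kinds of tasks
performer_kinds = prod(performer_dims.values())  # 18 kinds of performers
total_cells = task_kinds * performer_kinds       # 3,456 cells in all

print(task_kinds, performer_kinds, total_cells)          # 192 18 3456
print(f"fraction studied well: {32 / total_cells:.1%}")  # roughly 0.9%
```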
The main reason for this mismatch between what we study and what we would like our studies to be relevant to is simply that it is extremely difficult to obtain access to the right tasks and people. Note that I say difficult, rather than impossible; I can point to a few studies currently under way that do in fact overcome such difficulties.
A particularly interesting taxonomic dimension is the use of tools. Researchers on verbal learning ban paper and pencil or tape recorder aids from their experimental rooms, though in real life the first thing one would do if one wanted to remember something correctly would be to write it down. Similarly, researchers on judgments involving probabilities ban the combination of knowledge of probability theory and the computational and reference tools needed to implement that knowledge from their own experimental laboratories. Yet, in any real context in which explicit use of probabilities is important, we go to great pains, if at all possible, to make sure that the individuals who must use them understand them, and we spend a great deal of time and money designing sophisticated tools for them to use, with the probabilistic logic built into the tools themselves.
This fits well with a quite different and more recent tradition of psychology. Human engineers, if confronted with a context in which learning is difficult, usually conclude that the thing to do is to build as much of the task as is conveniently possible into the hardware and software of the
equipment, rather than leaving it to the man. The development of autopilots, predictor displays, quickened displays, and many other sophisticated forms of equipment found in aviation and military contexts illustrates the point.
My second message is that, even without tools, experts can in fact do a remarkably good job of assessing and working with probabilities. Two major groups of studies seem to show this. One is concerned with weather forecasters. A series of studies by Winkler and Murphy (e.g., Murphy and Winkler, 1977) shows that in almost any situation in which a weather forecaster is required to work with probabilities, the results are excellent. They did not choose to use the kinds of tasks typically found in the laboratory. Instead, they used either the tasks that weather forecasters must do every day, or else close analogues to them. Variation from optimal behavior is impressively low. A second group of studies seems to say that physicians also work well with probabilistic reasoning tasks of kinds familiar to them (see, for example, Lusted et al., 1982). Here the evidence is less unequivocal, but the bulk of it is consistent with the weather forecaster work. This is important, because in the U.S. daily probabilistic forecasting, and the resulting training in what probabilities mean and how to manipulate them, are all part of a weather forecaster's job. Physicians, on the other hand, have no explicit training in probability; if they are lucky, they heard a lecture or two in medical school closely tied to statistics, and that is all. Nevertheless, such untrained physicians can be shown to do a good job, at least sometimes, in assessing probabilities, and can be shown to do such things as exploiting the information in base rates.
I reach various conclusions from this. One is that, as a practical matter, the rejection of human capability to perform probabilistic tasks is extremely premature. Indeed, the studies often cited as showing that people perform such tasks poorly can be interpreted to argue for the opposite conclusion. Obviously, the experimenters themselves, using tools and expertise, were able to perform such tasks rather well. If they had not been, they could not have determined the correct answers with which the errors that purport to show human inadequacy are compared. My conclusion from such studies would be that, if you need to perform a difficult intellectual task, both tools and expertise are very likely to be helpful, which seems hardly surprising, if a bit unglamorous.
A second conclusion is that the "generalized normal adult human mind" is simply the wrong target for research on human intellectual performance. We must recognize that minds vary, that tools can help, that expertise can help, presumably both expertise in the subject matter of the task and expertise in probability itself. In spite of all the difficulties
inherent in doing so, we must learn how to get access to the populations to which we wish to generalize. And when we do, we must give them tasks representative of the kinds of tasks that we wish our generalizations to cover. In short, in the intellectual domain, we need to study specific classes of minds performing specific kinds of tasks. If broader generalizations emerge out of that study, we will be very lucky, but we shouldn't expect that in advance.
A most difficult and demanding final conclusion is that we have no choice but to develop a taxonomy of intellectual tasks themselves. Only with the aid of such a taxonomy can we think with reasonable sophistication about how to identify, among the myriad kinds of experts and the myriad kinds of tasks such experts normally perform, just exactly what kinds of people and tasks especially deserve our attention. Fortunately, we are by no means alone in the need for a task taxonomy. Exactly such a task taxonomy has been under development for years by those practically-oriented men and women who define their roles as being to help others to perform intellectual tasks, notably decision making (see, e.g., Raiffa, 1968; Edwards, 1973; Howard, 1973). While the task taxonomy of decision analysis is quite incomplete, it is in my view considerably better than anything that the psychologists have developed as the basis for a taxonomy of intellectual tasks. It is an excellent place to start.
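Edwards' appeal to the weather-forecasting studies rests on calibration analyses of the kind Murphy and Winkler report: assessed probabilities are grouped, and each stated value is compared with the observed relative frequency of the forecast event. The Python sketch below illustrates only that logic, using invented data; it is an editorial addition, not material from the studies cited.

```python
from collections import defaultdict

def calibration_table(stated_probs, outcomes):
    """Group probability forecasts (rounded to one decimal) and compare each
    stated probability with the observed relative frequency of the event."""
    groups = defaultdict(list)
    for p, occurred in zip(stated_probs, outcomes):
        groups[round(p, 1)].append(occurred)
    return {p: sum(hits) / len(hits) for p, hits in sorted(groups.items())}

# Invented precipitation forecasts (probabilities) and outcomes (1 = rain).
stated_probs = [0.1, 0.1, 0.1, 0.1, 0.3, 0.3, 0.7, 0.7, 0.7, 0.9, 0.9]
outcomes     = [0,   0,   0,   1,   0,   1,   1,   0,   1,   1,   1]

for stated, observed in calibration_table(stated_probs, outcomes).items():
    print(f"stated {stated:.1f}   observed frequency {observed:.2f}")
# For a well-calibrated forecaster the observed frequencies track the stated
# probabilities; the cited studies found weather forecasters close to this ideal.
```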
References

Brunswik, E., 1955. Representative design and probabilistic theory in a functional psychology. Psychological Review, 62, 193-217.
Edwards, W., 1968. Conservatism in human information processing. In: B. Kleinmuntz (ed.), Formal Representation of Human Judgment. New York: Wiley & Sons.
Edwards, W., 1973. Divide and conquer: How to use likelihood and value judgments in decision making. In: R.F. Miles, Jr. (ed.), Systems Concepts: Lectures on Contemporary Approaches to Systems. New York: Wiley & Sons.
Howard, R.A., 1973. Decision analysis in systems engineering. In: R.F. Miles, Jr. (ed.), Systems Concepts: Lectures on Contemporary Approaches to Systems. New York: Wiley & Sons.
Kahneman, D. and A. Tversky, 1972. Subjective probability: A judgment of representativeness. Cognitive Psychology, 3, 430-454.
Kahneman, D. and A. Tversky, 1973. On the psychology of prediction. Psychological Review, 80, 237-251.
Lusted, L.B., H.V. Roberts, D.L. Wallace, M. Lahiff, W. Edwards, J.W. Loop, R.S. Bell, J.R. Thornbury, D.L. Seale, J.P. Steele, and D.B. Fryback, 1982. Efficacy of diagnostic radiologic procedures. In: K. Snapper (ed.), Practical Evaluation: Case Studies in Simplifying Complex Decision Problems. Washington, D.C.: Information Resources Press.
Murphy, A.H. and R.L. Winkler, 1977. Reliability of subjective probability forecasts of precipitation and temperature. Journal of the Royal Statistical Society, Series C (Applied Statistics), 26, 41-47.
Phillips, L.D. and W. Edwards, 1966. Conservatism in a simple probability inference task. Journal of Experimental Psychology, 72, 346-357.
Phillips, L.D., W.L. Hays, and W. Edwards, 1966. Conservatism in complex probabilistic inference. IEEE Transactions on Human Factors in Electronics, HFE-7, 7-18.
Raiffa, H., 1968. Decision Analysis: Introductory Lectures on Choice Under Uncertainty. Reading, Mass.: Addison-Wesley.
Tversky, A. and D. Kahneman, 1973. Availability: A heuristic for judging frequency and probability. Cognitive Psychology, 5, 201-232.
Tversky, A. and D. Kahneman, 1974. Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
RECONSTRUCTIVE CRITICISM* Baruch FISCHHOFF
Decision Research, Eugene, Oregon, U.S.A. and MRC Applied Psychology Unit, Cambridge, England

* I would like to thank Ruth Beyth-Marom, Ruth Phelps, and Paul Slovic for their comments on an earlier draft.
Abstract
Evaluating or criticizing any research program is a protracted exercise in interpretation. Repeatedly one must decide whether to take existing results at face value or to reconstruct their meaning, by claiming that the task meant something different to subjects than was intended by the experimenter or that their range of generalizability is more limited than the experimenter had hoped. To be useful and valid, such criticism must follow certain guidelines, which are set out in the bulk of this essay. The sort of detailed analysis of existing data and texts that these guidelines require is beyond the scope of the present effort. Nonetheless, the essay ends with some speculations about what a proper evaluation would reveal about the legacy of heuristics and biases.
Over the last decade, much judgment research has focused on the heuristics that people use to help them interpret uncertain situations, and the biases that those heuristics sometimes produce (Kahneman, Slovic, and Tversky, 1982). The task of our panel is to evaluate the progress made by this research and its prospects for the future. This task makes me feel very uneasy. In part, my discomfort is due to the inherent difficulty of making such a global assessment. At any point in time, the data always present a somewhat confused picture. Not only do they point in different directions, but they were produced by investigators using different methods and varying levels of care in their work. The need to produce clarity out of this confusion, in order to make some summary statements about what the data seem to show, leaves one vulnerable to a variety of traps that threaten good science. One of these traps is methodological imperialism, whose outward manifestations are pugnacious claims about there being only one way to study behaviour and a siege mentality when confronted with
alternatives. A second trap is methodological skepticism, which attempts to render all empirical evidence moot by enveloping it in a cloud of claims about unintended factors that might have influenced the observed behaviour. A third trap is opportunism (or special pleading), whereby one uses whatever ad hoc arguments help one to reinterpret existing studies so as to prove a particular point, without concern for the consistency of the claims made in different contexts or for their testability.
Although there are procedures for avoiding these traps, they require an investment of time, as well as a tolerance for confusion, that are hard to reconcile with a symposium's pressure for brevity and clarity. To be valid, an assessment requires a lot of detailed work. It must look hard at all existing data and the conditions under which they were collected. It must look carefully at texts, in order to extricate the claims made for those data by their producers from those made by secondary interpreters and camp followers (Berkeley and Humphreys, 1982). It must identify its own hidden and unsupported assumptions. For such an assessment to be useful, it needs to build as well as destroy. Just as it is hard to extinguish one response to a stimulus without offering an alternative, it is not enough to show faults with an existing theory. Theories typically deal best with data whose creation they themselves have prompted; however, even there their account is never complete. Yet they are allowed to reign so long (and only so long) as they do a little better than their nearest competitor (Lakatos, 1970).
Evaluating the research on judgmental heuristics is going to require the exegesis of existing studies and the elaboration of new theories. Those battles will be waged in technical papers appearing in places like JEP, OBHP, and Acta. Until they are conducted, the meaning of that research is a matter of interpretation. What a symposium can offer is some thoughts on how to conduct them most fruitfully and how to make the best use of existing data while waiting for something better to come along. The following thoughts are organized around the three most common questions of interpretation: (a) how did subjects interpret the situations in which they were placed; (b) how robust are the phenomena that have been observed; (c) what can we say, in general, about the quality of people's judgment.
Interpreting the Situation

The most difficult time for a science is an interregnum, during which the accepted wisdom seems inadequate but no alternative has arisen. The productivity of that period depends largely upon the attitudes of its citizens.
It is probably highest when all are engaged in a common search to discover what the effective stimuli have been in previous experiments; that is, how did subjects interpret the complex social and perceptual situation created for them by an experimental (or real-life) context (Boring, 1969). In order to avoid the interpretative traps noted earlier, such reanalysis of existing studies must acknowledge that all faithfully collected and replicated data have some range of validity. The "trick" is to clarify what that range is, and whether it includes the real-world situations in which we would like to predict behaviour.
Characterizing the effective features of a particular task begins with informed speculations. Such speculations come from looking inward at the task, in order to see how subjects' interpretation of it may have differed from that assumed by the experimenter. And they come from looking outward at the world within which people live, in order to see how the tasks it presents differ from the microcosms created in our experiments. These speculations remain just that until they have been disciplined by data (Mills, 1959). Thus, being able to envision a way in which subjects might have viewed a task differently is no reason to assume that they have done so. Indeed, the most reasonable assumption is that people tend to be prisoners of whatever problem representation initially confronts them. Similarly, the ability to list differences between the (experimental) context within which behaviour has been studied and the (real-world) context within which it must be predicted is no guarantee that these differences matter.
A useful case in point may be found in survey research (Turner and Martin, 1982). There, investigators were disturbed to find that respondents asked race-related questions felt that their task included managing a pleasant interaction with the interviewer, and not just reporting on their true beliefs. As a result, they reported somewhat different beliefs when asked by same-race and different-race interviewers. Further studies showed, however, that the effects of race similarity are quite limited. They appear in race-related questions (e.g., attitudes toward integration), but nowhere else (e.g., not even in choices of favourite entertainers). Thus, responses to most questions are equally valid whatever the racial similarity of interviewer and interviewee, whereas for race-related questions one requires the "right" match. What that match is depends upon why one is asking the question. For predicting the content and temper of conversations in same-race bars, one may want responses to same-race interviewers. To predict interactions in integrated work places, one may want to "mismatch" the interviewer and interviewee. Responses in both interview situations are real and potentially useful, but for different purposes. It makes little sense to say that life is like a same-race interview,
meaning that responses observed there represent the only indication of people's true beliefs and behaviour.
In short, the usefulness of a research result depends upon both how similar the context created by the investigator is to the real-world context that interests us and how well we can assess what difference the differences between the two contexts make. Where a particular contextual difference has not been studied, one must fall back on intuition, guided perhaps by an analysis of the consequences of setting too high or too low a threshold for doubting the relevance of previous studies. Too low a threshold means rejecting others' evidence in favour of one's own intuitions; too high a threshold means enslavement to whatever results happen to have made their way first through peer review.
Generalizing Phenomena

When one must predict behaviour in a particular situation, there is no substitute for the sort of detailed "clinical" interpretation described in the previous section. That is because no-one studies judgmental processes in their full richness. Indeed, one could not even try to do so without abandoning experimental psychology as a profession or predictive knowledge as a goal. What we have are studies of various stylized situations that differ in the nature and quantity of the experimental control they provide. Unless one believes that all of life's judgmental problems are akin to one of these archetypes, each kind of experiment can provide the results that are most useful for predicting some real-life situations. Taken together, they allow us to generalize about the overall robustness of judgmental phenomena.
The Table presents one attempt to ascertain the generality of two related biases: the tendencies (a) to exaggerate in hindsight what could have been known in foresight and (b) to be over-confident in the extent of one's own knowledge. This particular review organized all known studies according to a number of dimensions that might have made a difference in the results obtained. Observed differences (i.e., reductions in bias) are indicated by studies whose code numbers are underlined. The details are given in Fischhoff (1982). Briefly, what they seem to indicate is that the biases are "real" enough to remain after a variety of artifactual changes in the experimental design (e.g., attempts to clarify instructions, motivate subjects, or use "realistic" stimuli). Moreover, expertise makes a difference only when its acquisition is accompanied by explicit training in the relevant judgment. Such training seems most effective when it changes the way people look at problems, not just provides them with practice.
Debiasing strategies, by experience with them: the original table gives, for each strategy, the number of studies reporting manipulations that were at least partially successful and the number reporting unsuccessful manipulations, separately for hindsight and for over-confidence. The frequency counts themselves are not legible in this reproduction; the strategies tabulated are:

Faulty tasks
  Unfair tasks: raise stakes; clarify instructions/stimuli; discourage second guessing; use better response modes; ask fewer questions
  Misunderstood tasks: demonstrate alternative goal; demonstrate semantic disagreement; demonstrate impossibility of task
Faulty judges
  Perfectible individuals: warn of problem; describe problem; provide personalised feedback; train extensively
  Incorrigible individuals: recalibrate their responses
Mismatch between judges and task
  Restructuring: make knowledge explicit; search for discrepant information; decompose problem; offer alternative formulations
  Education: rely on substantive experts; educate from childhood

Source: Adapted from Fischhoff (1982). Manipulations that have yet to be subjected to empirical test or for which the evidence is unclear have been excluded. Fischhoff (1982) identified the individual studies represented in the frequency counts.
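As background for the "over-confidence" column, a minimal sketch of the usual index may help: over-confidence is commonly summarized as the difference between mean stated confidence and the proportion of answers actually correct. The Python sketch below uses made-up half-range confidence ratings for two-alternative general-knowledge items; it is an editorial illustration of that index only, not data from any study Fischhoff reviews.

```python
def overconfidence(confidences, correct):
    """Mean stated confidence minus proportion correct.
    Positive values indicate over-confidence, negative values under-confidence."""
    mean_confidence = sum(confidences) / len(confidences)
    proportion_correct = sum(correct) / len(correct)
    return mean_confidence - proportion_correct

# Made-up confidences (0.5-1.0) for two-alternative items, and whether the
# chosen answer turned out to be right (1) or wrong (0).
confidences = [0.6, 0.7, 0.8, 0.9, 1.0, 0.9, 0.8, 0.7]
correct     = [1,   0,   1,   0,   1,   1,   0,   1]

print(f"over/under-confidence: {overconfidence(confidences, correct):+.2f}")
```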
Perhaps as revealing as these conclusions are some features that the studies all have in common: (a) subjects are provided with no tools (e.g., decision aids, calculators), in order to make intuitive judgment the focus of the experiment; (b) the experimental task is treated as an end in itself, in order to remove any temptation to sacrifice immediate performance for long-term goals (e.g., gathering experience by deliberately making mistakes); (c) all necessary information is presented, in order to isolate the interpretation of information from its discovery; (d) no deliberate tricks are tried to mislead subjects, so that judgmental errors can be interpreted as "honest mistakes", reflecting fundamental psychological processes; (e) a normative theory is available, allowing investigators to claim to know what constitutes biased behaviour.
Results of this research are, therefore, most relevant for predicting performance in situations providing no tools, having no aftermath, and so on. Extrapolating to other situations requires judgment to discern how those situations differ (pending research to establish what difference that difference makes). Such judgmental extrapolations are embedded in summary statements such as "people are over-confident in their knowledge", unless qualified by "in situations like those studied". Of course, that qualification is unnecessary if one believes that life is like the situations studied.
Evaluating Human Judgment

People study judgment not just as an intellectual curiosity, or as a key to understanding basic cognitive processes, but also as a guide to action. Often that action requires a global appraisal of "how much do people know?" or "how good is people's judgment?". For example, the responses to such appraisals may determine the information provided about medical side effects, the awards given in product liability cases, or the resources allocated to decision aid(e)s (Jungermann, 1980). Providing such an appraisal requires a review of the evidence, perhaps along the lines described above. Utilizing that appraisal requires an assessment of the consequences associated with different sorts of errors. How bad is it to overestimate and to underestimate how much people know? Clearly, the answer to this question depends upon the action and consequences being considered.
Consider, for example, the decision whether to invest in decision analysis. In this case, the analysts stand to gain an undeserved increase in business to the extent that judgmental foibles are exaggerated (hence pointing to the need for their services). If
those foibles are exaggerated too greatly, however, then analysis may be rejected unfairly, due to an undeserved feeling that decision makers cannot provide the sort of inputs needed to make analyses credible. On the other hand, potential clients of decision analysis stand to lose something however their judgment is misunderstood. If their judgmental prowess is mildly underestimated, then they may be sold analyses that they do not need. If it is greatly underestimated or greatly overestimated, then they will be denied access to needed analytical help, because they are mistakenly considered either to be helpless or to need no help.
Although scientists have no particular reasons to accentuate the positive or the negative aspects of human judgment, substantial stakes may ride on what is accepted wisdom about its quality. For example, hazardous technologies may be managed quite differently depending upon the level of understanding one attributes to various sectors of the public. If the judgmental abilities of lay participants in political debates are underestimated, then they may be inappropriately disenfranchised from the decision-making process. If those abilities are overestimated, then people will be left to fend for themselves without needed aids to their judgment and institutional protections against those who would exploit their weaknesses (Fischhoff et al., 1981; Sjoberg, 1980).
As a result, people who study human judgment have a special responsibility to hedge their conclusions properly. Moreover, they should monitor the way that those results are used, for cases where the hedges are either trimmed or magnified, either by those who fail to appreciate the niceties of experimental design or by those who choose to ignore them, in order to achieve some rhetorical purpose. Nonetheless, there are limits to monitoring and correcting misinterpretations and misrepresentations. Ensuring a balanced summary of results may require tediously retelling one's tale to different audiences, varying the presentation so as to avoid the different misunderstandings to which it is prone.
Ten Years After

What might be found if the last decade of research on heuristics were reviewed within the interpretative constraints described above? Possibly, the following:
- The work on heuristics succeeded in rescuing the study of judgment from the mechanistic models of behaviour inherited from economics. The result has been a more phenomenologically based concept of behaviour.
- Because there are often relatively few ways of explaining a particular
pattern of "error", the identification of errors has provided the crucial tool for the identification of heuristics. The pioneering studies in the area cannot and do not, however, make any claim regarding how prevalent bias is in life, or how far-reaching are its effects.
- The retelling of these results has tended to accentuate the negative, in part because the errors are more salient than the heuristics, in part because some people wished to tell a tale of error, in part because the pioneering studies showed their caution more in claims that were not made than in claims that were denied.
- The further one gets from the original sources, the greater is the tendency to blur the distinction between judgment and decision making. Hence, evidence showing at best that people do not know or use the rules of statistical inference is interpreted as showing that they are irrational (by some interpretation of that loaded term).
- The impact of this research on applied decision analysis has been good in the short run (by showing that people need help with at least one aspect of their decision making) and problematic in the intermediate run (by suggesting that people cannot readily provide the inputs for definitive analyses). It is likely to be good for the long run by providing an empirical and theoretical base for knowing when analysis is needed and how it can best be applied. Developing that base requires the sort of ruthless questioning for which practitioners are the best source when they worry about the fundamentals and the future of the field, and the worst source at times when they are worried about getting someone to use their current potential.
- Constructive replications have shown the biases to be quite robust. However, the most informative studies have been failures to replicate, which help us to specify when and how heuristics are used.
- Although the study of heuristics is an aspect of cognition, it has made relatively little contact with cognitive psychology. Nor has there been a full confrontation between the analytic Bayesian decision aids and the more knowledge-based systems of cognitive science (Fox, 1982). Making that contact may reveal as much about the limits of current cognitive models as about the workings of heuristics.
- Both looking more broadly, at the context within which judgment is exercised, and looking more deeply, at the way in which cognitive processes operate, will produce provocative hypotheses that prove very hard to test empirically. As a result, people's appraisal of the meaning of heuristics and biases will depend upon the prior probabilities they attach to these hypotheses.
Until better data arrive, we will still have to provide ourselves and others with some appraisal of how good or bad people's judgment is. For
this interpretative exercise, there is no substitute for our own best judgment. When people appear to exhibit judgmental biases, we should look both to them and to ourselves for causes. Are they using inappropriate cognitive strategies? Or, are we failing to discern some method in their apparent madness? Would a critical reconstruction of the way they have interpreted our task show them to be doing something quite different from what we have imputed to them? Conversely, when people seem to exhibit judgmental prowess, we should look both to them and to ourselves. Do they have a grasp of relevant inferential principles? Or, have we pressured or led them to the right answer by some feature of our experimental design? Have they reached that answer by some trial-and-error process that circumvents the need for such analytical understanding? Are they right for wrong (or other) reasons? We do people no service by giving them too little credit for understanding, or too much.
References

Berkeley, D. and P.C. Humphreys, 1982. Structuring decision problems and the 'bias heuristic'. Acta Psychologica, 50, 201-252.
Boring, E.G., 1969. Perspective: Artifact and control. In: R. Rosenthal and R.L. Rosnow (eds.), Artifact in Behavioural Research. New York: Academic Press.
Fischhoff, B., 1982. Debiasing. In: D. Kahneman, P. Slovic, and A. Tversky (eds.), Judgment Under Uncertainty: Heuristics and Biases. New York: Cambridge University Press.
Fischhoff, B., S. Lichtenstein, P. Slovic, S. Derby, and R. Keeney, 1981. Acceptable Risk. New York: Cambridge University Press.
Fox, J., 1982. Statistical and non-statistical inference in medical diagnosis. International Journal of Bio-Medical Computing.
Jungermann, H., 1980. Speculations about decision theoretic aids for personal decision making. Acta Psychologica, 45, 7-37.
Kahneman, D., P. Slovic, and A. Tversky, 1982. Judgment Under Uncertainty: Heuristics and Biases. New York: Cambridge University Press.
Lakatos, I., 1970. Falsification and scientific research programmes. In: I. Lakatos and A. Musgrave (eds.), Criticism and the Growth of Knowledge. Cambridge: Cambridge University Press.
Mills, C.W., 1959. The Sociological Imagination. New York: Oxford University Press.
Sjoberg, L., 1980. The risks of risk analysis. Acta Psychologica, 45, 301-321.
Turner, C. and E. Martin, 1982. Surveys of Subjective Phenomena. Washington, D.C.: National Academy of Science.
A THEORETICAL PERSPECTIVE ON HEURISTICS AND BIASES IN PROBABILISTIC THINKING¹ Lawrence D. PHILLIPS
London School of Economics and Political Science, England

¹ Many of the ideas in this paper have been generated in discussion with Patrick Humphreys, Stuart Wooler, Ayleen Wisudha, and Tom Wisniewski, to whom I am most grateful. Probably none of them, however, agrees with everything in this paper.
Abstract
Three paradigms are identified as crucial to interpretations of results in most studies of heuristics and biases in probabilistic thinking. The paradigms are criticised as being so limited and inadequate that generalisations from current research on heuristics and biases cannot be justified. In particular, the view of people as 'intellectual cripples', who exhibit severe and systematic biases in making judgements, is shown to be a value judgement on the part of the investigator. The implicit acceptance of the paradigms is shown to have created four problems in current research. An alternative perspective, the generation paradigm, emphasizes the role of problem structuring, in particular the subject's internal representation of the task, within which information is processed. The generation paradigm sees decision making and the forming of judgements as dynamic, generative processes, conducted interactively between people within a social and cultural context. From this perspective, research would attempt to discover the conditions under which people can do well, aiding rather than de-biasing procedures would be investigated, and expected utility theory would be seen as neither a normative nor a descriptive model, but rather as a guide to thinking that polices consistency within a small-world representation of a problem.
When Tversky and Kahneman presented their work on heuristics and biases at the Rome Conference in 1973, it seemed to this observer that at long last some theory would soon develop about how people make judgements under uncertainty. As discussant on their paper, I argued that research on probability judgements had been through two stages, the first largely governed by a psychophysical paradigm, the second by a test theory model, and that we were about to enter a third stage which would be characterised by an information processing approach. Soon we would have some idea of what goes on in people's heads when they deal with
inconclusive information, make assessments of probability, even take decisions when consequences are uncertain.
However, instead of theory, we now have a collection of empirical findings that, collectively, seem strongly to suggest that people are severely limited in their ability to make judgements under uncertainty (Humphreys, Wooler, and Phillips, 1980). Edwards (1983) attacks this view of people as 'intellectual cripples' on the grounds that the studies are unrepresentative of tasks and subjects, that with tools people do well, and that even without tools people can do well. I am in broad agreement with his viewpoint, which is argued by considering recent empirical findings and taxonomies of intellectual tasks, but it is also important to recognise that the validity of generalisations from any body of research depends as much on the theory used to interpret the results as on the data themselves.
This paper shows that generalisations from research on heuristics and biases are based on inadequate, even naive, theories that have been applied, mostly implicitly, in interpreting the results of studies. The elements of an alternative theoretical perspective are identified in only enough detail to show that the view of people as 'intellectual cripples' can be seen for what it is: a value judgement on the part of investigators who choose a theoretical perspective from which the 'intellectual cripple' interpretation naturally flows. The major conclusion of this paper is that we do not yet know how good people are at judging uncertainty, and that people may well be capable of making precise, reliable and accurate assessments of probability.
The first section of the paper reviews and criticizes the paradigms most commonly encountered to date. The second section outlines the elements of an alternative paradigm, and the paper concludes with some remarks on implications for the normative and descriptive status of expected utility theory, the new directions needed in research, and the problems facing investigators who wish to follow these new directions in research.
Research Paradigms

Philosophers remind us that underlying every research investigation is a theory, model or paradigm (Kuhn, 1970). If this were not so, results could not be interpreted. A major problem in research on probability and value judgements is that these paradigms are usually not explicated, so their effect in helping the investigator to shape his or her conclusions is not obvious. Recently, I attempted to make these paradigms explicit (Phillips, 1982). In so doing, it became obvious that none could aspire to the status of a theory, and even their status as models could be considered contentious.
So, I have opted for the description of 'paradigm', using the word in its weak sense of a generalised example. The use of these paradigms is not weak, however, and we shall see how powerfully they influence the interpretation of research findings. Most research on probabilistic thinking over the past twenty years shows the influence of just three paradigms: the psychophysical paradigm, the test theory paradigm, and the information processing paradigm. Because they are rarely explicated in any given piece of research, it is not always easy to detect their presence, particularly as some researchers are fond of mixing paradigms in a single paper. This tactic is particularly useful as a means of deflecting criticism that arises from a particular paradigm; abundant examples can be seen in the discussion and replies to the papers on probabilistic thinking presented in recent years at meetings of the Royal Statistical Society (Tversky, 1974; Lindley, Tversky, and Brown, 1979).
The Psychophysical Paradigm

This paradigm assumes the existence of objective standards against which subjects' judgements can be judged. The paradigm underlies all the early research on subjective-objective probability functions (e.g., Cohen, 1960), but it continues to appear in current work. In their classic paper for Science, Tversky and Kahneman (1974) state, "The subjective assessment of probability resembles the subjective assessment of physical quantities such as distance or size". In their recent Science article, these same authors argue:
It is often possible to represent a given decision problem in more than one way. Alternative frames for a decision problem may be compared to alternative perspectives on the same scene. Veridical perception requires that the perceived relative height of two neighbouring mountains, say, should not reverse with changes of vantage point. Similarly, rational choice requires that the preference between options should not reverse with changes of frame. Because of imperfections of human perception and decision, however, changes of perspective often reverse the relative apparent size of objects and the relative desirability of options. (Tversky and Kahneman, 1981)
Note that the paper's conclusion, "... that the processes of framing and evaluation produce predictable but incoherent preferences", requires the psychophysical paradigm for its validity, for if there are no mountains,
no 'objective' problems, veridicality is irrelevant and it would not be possible to describe shifts of preference as incoherent. There is, of course, a stated problem. But that is unlikely to be identical to the subject's internal representation of it. As Nisbett and Ross (1980) point out, psychologists generally agree "that objects and events in the phenomenal world are almost never approached as if they were sui generis configurations but rather are assimilated into pre-existing structures in the mind of the perceiver" (p. 36). Given the briefness of the problems presented in Tversky and Kahneman's research, the unavailability of further information about the problem and the limited time in which to state a preference, it is not surprising that the internal representations of identical problems with different frames are different. It is only the experimenters who perceive the problems as identical, and they must convince the reader of the veridicality of their representations to justify the conclusion that their subjects are incoherent.
As an example, Figure 1a gives problems 5 and 7 as stated by Tversky and Kahneman (1981), followed by literal translations into decision trees. The structure is of the form act-event-payoff. Note that the problem statement is incomplete (Figure 1b) and the reader is left to guess at the existence of alternative events or of unspecified payoffs. The interpretation intended by Tversky and Kahneman is given in Figure 1c. Converting the payoffs to utilities by assuming u(0) = 0 and u($45) = 100, shown in Figure 1d, makes obvious the standard interpretation that since probabilities are reduced by a constant factor from problem 5 to 7, the ordering of acts must be preserved under an expected utility interpretation. If they are not, a preference reversal is said to have occurred.
However, Humphreys (1977) has pointed out that people's choice behaviour can be conceptualised at different levels of decomposition. An alternative to the decompositions shown in Figure 1 is to chop the tree after the acts, and incorporate the uncertainty as one attribute describing the consequences. Thus, in problem 5, the consequence of choosing option A is a no-risk payoff of $30, while for option B it is a low-risk payoff of $45. Letting the utility of no-risk be 100 and of low-risk 0, and assigning utilities to the monetary payoffs as before, the problem can be represented as in Figure 2, where numbers in parentheses represent the utilities of the consequences. The weights assigned to the criteria now determine the preferences between A and B. Using a relative weighting system, an individual would compare the utility difference in riskiness between probabilities of 1.0 and 0.8 with the utility difference between payoffs of $30 and $45. In problem 7, structured to the same level of decomposition as in Figure 2, the risks are higher since the probabilities are lower, 0.25 and 0.20, but the utility difference would probably be less for most people.
[Figure 1. Possible Representation of an Allais-Type Decision Task. The decision-tree panels (1a-1d) are not reproduced here. The recoverable content is the two problem statements - Problem 5: "Which of the following options do you prefer? A. a sure win of $30; B. 80% chance to win $45." Problem 7: "Which of the following options do you prefer? E. 25% chance to win $30; F. 20% chance to win $45." - together with the decision rules implied by panel 1d: with u(0) = 0 and u($45) = 100, A > B if u($30) > 80, and E > F if u($30) > 80.]
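Under the standard expected-utility structuring of Figure 1, with u(0) = 0 and u($45) = 100, both problems reduce to the same comparison: A is preferred to B, and E to F, exactly when u($30) exceeds 80. The Python sketch below simply carries out that arithmetic for two hypothetical values of u($30); it is an editorial illustration of the argument in the text, not Phillips' own material.

```python
def expected_utility(prob, utility_if_win):
    """EU of a gamble paying `utility_if_win` with probability `prob`, else 0."""
    return prob * utility_if_win

U_45 = 100.0  # u($45) = 100 and u(0) = 0, as assumed in the text

for u_30 in (70.0, 90.0):  # two hypothetical utilities for $30
    # Problem 5: A = sure $30 versus B = 0.80 chance of $45
    prefer_A = expected_utility(1.00, u_30) > expected_utility(0.80, U_45)
    # Problem 7: E = 0.25 chance of $30 versus F = 0.20 chance of $45
    prefer_E = expected_utility(0.25, u_30) > expected_utility(0.20, U_45)
    print(f"u($30) = {u_30}: prefer A in problem 5: {prefer_A}, "
          f"prefer E in problem 7: {prefer_E}")

# Whatever value u($30) takes, the two comparisons agree; that is why the
# common A-and-F response pattern counts as a "reversal" under this structuring.
```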
Thus, the risk dimension would receive a smaller weight than in problem 5. People for whom the value of w is above 0.5 in problem 5 and below it in problem 7, a very plausible set of assignments, will prefer A to B and F to E. Note that this pattern of preferences is not a reversal, is not incoherent, and is not a violation of expected utility theory, if the problem structures in Figure 2 are assumed. In other words, the Allais paradox illustrated by the supposed preference reversal from problem 5 to 7 is only a paradox if you accept Allais' formulation of the problem. It is perfectly possible that all those who have investigated the Allais paradox are correct in saying that many subjects make inconsistent shifts of preference. Certainly I am not questioning the data that many people do change their preference order.
[Figure 2. An Alternative Representation of the Allais Problem. The figure shows problems 5 and 7 each decomposed on two criteria, risk and payoff, with relative weights w1 (problem 5) and w2 (problem 7) on the risk criterion and 1 - w1, 1 - w2 on payoff. The decision rules read: A > B if w1 > 0.5 and E > F if w2 > 0.5; however, since w1 > w2, it is possible to have A > B and F > E.]
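Under this alternative structuring, each option is scored on a riskiness attribute and a payoff attribute, and the choice turns on the weight given to risk in each problem. The Python sketch below fills in one plausible set of details (within each problem the less risky option scores 100 on the risk attribute and the riskier one 0, and the $45 payoff scores 100 against 0 for $30), with purely hypothetical weights; it is an editorial illustration of Figure 2's logic, not a reconstruction of Phillips' own numbers.

```python
def two_attribute_value(weight_risk, risk_utility, payoff_utility):
    """Additive two-attribute value: w * risk + (1 - w) * payoff."""
    return weight_risk * risk_utility + (1 - weight_risk) * payoff_utility

w1 = 0.7  # hypothetical weight on risk when the safer option is a sure thing
w2 = 0.3  # hypothetical, smaller weight when both options are long shots

# Problem 5: A = sure $30 (no risk, lower payoff), B = 0.80 chance of $45.
prefer_A = two_attribute_value(w1, 100, 0) > two_attribute_value(w1, 0, 100)
# Problem 7: E = 0.25 chance of $30, F = 0.20 chance of $45.
prefer_E = two_attribute_value(w2, 100, 0) > two_attribute_value(w2, 0, 100)

print(f"A preferred to B in problem 5: {prefer_A}")  # True whenever w1 > 0.5
print(f"E preferred to F in problem 7: {prefer_E}")  # False whenever w2 < 0.5
# The common A-and-F pattern is therefore consistent with this representation.
```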
But to call this a reversal, to say it is incoherent or that it violates expected utility theory, is too strong when we do not know how subjects structured the problem. Thus the importance of the psychophysical paradigm: lacking information about how their subjects structure the problems presented, investigators have to assume that one particular problem structure is veridical or objective, or else the data cannot be interpreted. But to prefer the structures of Figure 1 to those of Figure 2 is a value judgement; decision theory is largely silent about how tasks should be structured. There is, then, a substantial value-laden component (or bias) in most of the work on judgemental heuristics and biases when investigators conclude, without knowledge of their subjects' subjective problem structures, that people exhibit systematic biases in making judgements of uncertainty.
Where decision research is concerned, the simple psychophysical model is untenable. There are no 'objective' problems, no 'veridical' structures. A group can reach consensus about the statement of a problem but disagree about specific structural representations of it. In decision conferences, as the decision analyst facilitates the group in working toward a requisite structure (a socially shared and agreed structural representation of the problem), the very nature of the problem often changes as the explicit modelling allows new intuitions about the problem to emerge and then to be captured in a revised structural representation. This same process of iterating towards a more satisfactory representation of a decision problem is evident also in individual decision making. Wooler and Erlich
(1983) observe that students using multi-attribute utility theory typically change and revise their problem representation over the course of a one- to two-hour session, adding and deleting attributes as they gain new intuitions about their job choices. As for laboratory research, or 'demonstrations', as Tversky and Kahneman (1981) now call their research, investigators usually assume that their representation of a problem is veridical and can be considered objective. Decision researchers have ignored the theoretical stance of other psychologists who have long since recognised that people do not construct identical representations of their phenomenal world (Evans, 1972).
To reject the psychophysical model deals a severe blow to most of the work over the past twenty years on judgement research conducted within the expected utility framework. That work has almost exclusively been concerned with comparisons of people's behaviour to that predicted by the expected utility rule, the probability laws (including Bayes' theorem), and multi-attribute models. Issues of structure have been ignored, yet it should now be clear that conclusions about how people process information cannot be made without knowing how subjects structure problems. Since this lack is characteristic of most research to date, the conclusion of 'bias' is possibly wrong, or at best, premature.
The Test Theory Model

Some decision research, particularly work on scoring rules, assumes a paradigm that sounds very much like the classical psychometric model:

    test score = true score + error
    assessed probability = true probability + error
    judged value = true value + error

The 'true probability' or 'true value' is presumed to reside inside the assessor as an indication of the individual's true state of uncertainty or true utility, but becomes distorted by motivational, cognitive, and other biases while being translated into a response (de Zeeuw and Wagenaar, 1974). Proper scoring rules (Stael von Holstein, 1970) were thought to keep the assessor 'honest', to deter the influence of biasing processes. However, more than a decade of research using scoring rules has shown that their main function is to serve as rather uninformative outcome feedback, of some use in the early stages of teaching people how to assess probabilities. It appears that scoring rules lend some meaning to the scale of probability, and so help people to formulate numerical assessments. There never was a 'true probability', only a rather diffuse feeling of
uncertainty. Not surprising, for probabilities are numbers, while uncertainty is registered as a feeling. The concept never was very tenable, and it is not made more so by claiming that people behave 'as if' they experience probabilities.
A more serious objection to the test theory paradigm is that it propagates what may well turn out to be a fundamental misconception about the nature of probabilistic thinking: that the ability to assess probabilities varies in degree from one person to the next. This is clearly the assumption behind all research on calibration, where the majority of studies have shown that most people are too sure of themselves (Lichtenstein, Fischhoff, and Phillips, 1982). Just as some people are more intelligent than others, so some are better calibrated than others; both are thought to be matters of degree. But an alternative formulation is beginning to find empirical support. Under this view, probabilistic thinking is one aspect of an individual's capacity to do work, but there are qualitative differences, discontinuities, in capacity, not quantitative degrees of capacity as with IQ. These differences are described by Jaques (1978) as constituting seven levels of abstraction, with each higher level implying increased ability by individuals to deal with problems in an abstract way. Capacity develops as a person matures, so it is possible for an individual to move up this hierarchy of levels, though few will ever achieve the highest levels. According to Jaques' theory, people differ in their capacity to think probabilistically. There is no hint that people at lower levels are in any way deficient; they are qualitatively different, not suffering from some cognitive deficit.
Capacity theory also reveals two different ways in which uncertainty may arise (Humphreys and Berkeley, in press). At levels 1 and 2, uncertainty is about how a defined task is to be done, whereas at higher levels uncertainty concerns the possible consequences of tasks and the characteristics of the 'small worlds' within which these tasks are located. This clearly suggests that task characteristics will influence the handling of uncertainty.
Since an individual's ability to deal with uncertainty depends on characteristics of the task as well as on the person's level of capacity for abstraction, three criticisms can be made of most work on heuristics and biases. First, research has typically used subject populations that are biased; they are deficient in people with higher levels of capacity, the very people who could be expected to be better at probabilistic thinking. Indeed, Schoemaker (1980) found that subjects whose mean age was about 40 made hypothetical insurance decisions more consistently and more in accordance with expected utility theory than did student subjects with mean age about 20.
The second criticism concerns the selection of tasks. Capacity theory
Capacity theory predicts that tasks involving uncertainty will be dealt with differently depending on whether the problem is stated in a concrete or an abstract context. Schoemaker found that 13 percent of his student subjects preferred a zero-deductible insurance policy when they were presented with an abstract representation of the two choices facing them, but that 45 percent of these same subjects then preferred this policy when it was presented in an insurance context. The heuristics literature abounds with abstract problems, hypothetical gambles, imaginary games with abstract probabilities of winning or losing amounts that materialise out of thin air, imagined balls and urns with samples generated by unseen forces, life-and-death decisions, choices between large amounts of money and stated probabilities that differ by only 0.01 and 0.05, and unfamiliar contexts; these are mostly tasks for which good performance could be expected only from people whose capacity is above level 3. Virtually no intercorrelations were found between 12 different measures of probabilistic thinking derived from the two tasks given to 143 adult volunteers in a study by Wright and Phillips (1979b). Nor did Evans and Pollard (1982) find any correlation between errors on two tasks involving the perception of non-randomness. Both these studies highlight the importance of considering task characteristics before generalisations can be made to other tasks.

Taking these two criticisms together, only when the task characteristics are appropriate for the level of abstraction of which the individual is capable should we expect good performance. If a group of subjects of varying capacity is given a collection of concrete and abstract tasks, a subjects-by-tasks interaction should appear. Usually, though, group performance is examined for individual tasks, and between-task comparisons are rarely made. This preference for analysing only group data is my third criticism. Elsewhere (Phillips, 1980) I have reported that at least one person in 20 in the West, and more in the East, shows no inclination to think probabilistically. As we have seen, capacity theory predicts qualitative differences among people in probabilistic thinking. By analysing group data we lose sight of the fact that some people do well on some of the tasks. It is a revealing observation that researchers in this area prefer to focus on the deficiencies, to develop explanations and models to account for these deficiencies, rather than to look for the characteristics of tasks that would enable people with different capacities to do well. Since level 3 is the typical capacity of middle managers, this theory explains the reluctance, observed by Harrison (1977), of many middle managers to use decision analytic structures that incorporate uncertainty, such as decision trees.
In short, because tasks have not been matched to people's capacities, it is quite wrong to assume that current research justifies the generalisation that people are inconsistent and biased in their handling of real-life tasks. One should not give bank clerks judgemental tasks appropriate only for the manager of the bank, and then believe that any useful psychological theory has resulted from the categorising and labelling of the observed errors of judgement. The psychophysical paradigm is inappropriate because, although perceptual processes do not differ substantially from one person to the next, problem-solving capacity does. Further, even if the 'true probability' concept were tenable, the test theory paradigm would still be inapplicable, because a given task would call up the 'true probability' in some people but not in others. By tacitly accepting either paradigm, researchers construct experiments or 'demonstrations' whose generalisability to other tasks and other people is negligible.

The Information Processing Paradigm

When researchers focus on how subjects use the information that is presented to them in arriving at an assessment of probability, or in making a choice or preference judgement based on probabilities and values, they are assuming an information processing paradigm. Our subjects are assumed to be processors of information, so any failure to come up with the normatively prescribed answer indicates a failure in processing. When we use the representativeness heuristic, we do not incorporate base rate information in making our assessments. When the availability heuristic influences our judgements, we process only information that comes readily to mind and do not consider the information contained in less available cases. The anchoring and adjustment heuristic causes us to over-value the information implied by the starting value, while we are not sufficiently influenced by the information that should pull us away from the anchor. Each of these is interpreted as a failure to process relevant information.

This may be so, but another explanation is possible, too. Consider this simple problem. A fair coin is to be tossed twice. What is the chance of obtaining at least one head? Most people would reason that there are four equally likely outcomes, HH, HT, TH and TT, three of which are favourable cases, so the probability is 3/4. We would assume that any other answer reflects a failure to combine the probabilities at each toss according to the laws of probability, and if enough people made the same mistake, we would infer a cognitive limitation on the ability to combine information. But D'Alembert, as Todhunter (1865) reports, claimed that the probability was 2/3, reasoning that if the head appeared on the first toss there was no need for a second toss, leaving only three cases: H, TH and TT.
Thus, the probability is 2/3. Today, we would argue that D'Alembert erred in assigning equal probabilities to those cases. Suppose, however, that a drawing-pin (thumbtack) is being tossed. It can fall with its point up, U, or down, D. What is the probability that in two tosses it will fall point up at least once? If you have no idea of the direction or extent of the drawing-pin's bias, θ, which might be interpreted as the long-run ratio of Us to total tosses, you might choose p(U) = 1/2, assume that the tosses are independent, and thus make the four possible outcomes equally likely, giving the answer 3/4, as with the coin. A Bayesian, on the other hand, might assign a uniform prior distribution over θ before the first toss. This has its mean at θ = 1/2, so for the first toss p(U) = p(D) = 1/2. If the drawing-pin comes up U on the first toss, that datum revises the prior so that it becomes a triangle with its mode at θ = 1.0 and its mean at θ = 2/3. Thus, if the first toss results in U, then for the second toss p(U|U) = 2/3 and p(D|U) = 1/3. By a similar line of reasoning, if the first toss is a D, then for the second toss p(U|D) = 1/3 and p(D|D) = 2/3. A little multiplication gives the probabilities for the joint events: p(U,U) = p(D,D) = 1/2 × 2/3 = 1/3 and p(U,D) = p(D,U) = 1/2 × 1/3 = 1/6. Thus, the probability of at least one U is 1/3 + 1/6 + 1/6 = 2/3, in agreement with D'Alembert's result if not his reasoning (the sketch below reproduces this calculation).

The point of this simple example is to highlight once again the important role of structure. Information has been processed correctly in the two approaches to the drawing-pin problem, but the structures are different. It is not possible to tell whether information has been processed correctly without knowing how the person has structured the problem internally. Thus, the ghost of the psychophysical model lurks behind the information processing paradigm. Experimenters must assume, usually implicitly, that some structure is 'objectively' correct, otherwise nothing could be inferred about people's information processing capabilities at all. This is true of very nearly all experiments conducted to date, for no attempt is made to discover how subjects structure the task presented to them.

There is an even more serious objection to the information processing paradigm that is assumed by most research. Most investigators seem to assume that information in the task is encoded in some way by the subject, who may selectively filter or preprocess the information; that additional information thought to be relevant may be retrieved from the subject's memory; that these inputs are then combined or 'processed', with the individual's needs, goals, and motives possibly influencing the processing, all being affected by group norms and cultural values; and that the resulting brew somehow governs the subject's response. It is an open system: information in, response out.
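The Bayesian drawing-pin numbers above can be reproduced mechanically. The following is a minimal sketch (mine, not the chapter's), using sympy to integrate over the uniform prior on the bias θ; the function and variable names are invented for the illustration.

```python
from sympy import symbols, integrate

theta = symbols('theta', nonnegative=True)   # long-run proportion of 'point up' tosses
prior = 1                                    # uniform prior density on [0, 1]

def prob(seq):
    """Exact predictive probability of an up/down sequence under the uniform prior."""
    lik = 1
    for outcome in seq:
        lik *= theta if outcome == 'U' else (1 - theta)
    return integrate(prior * lik, (theta, 0, 1))

p_uu, p_ud, p_du, p_dd = (prob(s) for s in ('UU', 'UD', 'DU', 'DD'))
print(p_uu, p_ud, p_du, p_dd)      # 1/3 1/6 1/6 1/3: the four outcomes are not equally likely
print(1 - p_dd)                    # 2/3 = probability of at least one U, D'Alembert's number
# Conditional probability of U on the second toss after a U on the first: (1/3)/(1/2) = 2/3
print(p_uu / integrate(prior * theta, (theta, 0, 1)))
```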
Problem solving is conducted in interaction with other people and with the environment (Hogarth, 1981). Because of this, the production-rule version of the information processing paradigm (Pitz, 1977; Tuggle and Barron, 1980) is inadequate, since production rules reside in memory unchanged by attempts to solve a particular problem. Experience in using decision analysis as a framework for helping decision makers to generate 'requisite' representations of problems suggests that this open-system information processing paradigm fails to capture the interactive, iterative nature of good problem solving. In the research literature, subjects are almost never given feedback about the logical implications of their judgements, never shown their inconsistencies and invited to resolve them, rarely asked for redundant judgements so that inconsistency can be utilised as part of the assessment process, and almost never asked to make judgements in a group setting. Yet many of these characteristics can be seen in recent studies that demonstrated good calibration of probability assessments (Balthasar, Boschi, and Menke, 1978; Kabus, 1976). It is perfectly possible that many people, given the right tasks in the right circumstances, could make precise, reliable, accurate assessments of probability, as I have argued elsewhere (Phillips, 1982b), but as long as research is governed by the open-system version of the information processing paradigm it is unlikely that those circumstances will ever be identified.
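For readers unfamiliar with how calibration is scored, here is a minimal sketch of the usual procedure: group an assessor's probability judgements into bins and compare each bin's mean assessed probability with the proportion of those events that actually occurred. The data below are invented; they simply show how the overconfidence reported in the calibration literature would appear.

```python
from collections import defaultdict

def calibration_table(assessments, outcomes, n_bins=10):
    """Bin assessed probabilities and compare them with observed relative frequencies.

    assessments: probabilities in [0, 1] given by one assessor
    outcomes:    1 if the corresponding event occurred, else 0
    Returns (mean assessed probability, observed frequency, count) per non-empty bin.
    """
    bins = defaultdict(list)
    for p, x in zip(assessments, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, x))
    table = []
    for b in sorted(bins):
        ps, xs = zip(*bins[b])
        table.append((sum(ps) / len(ps), sum(xs) / len(xs), len(xs)))
    return table

# Hypothetical overconfident assessor: says 0.9 but is right only 70 per cent of the time.
probs   = [0.9] * 10
correct = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(calibration_table(probs, correct))   # roughly [(0.9, 0.7, 10)]
```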
Summary
So far I have tried to show the restrictive effect that our implicit acceptance of three paradigms has had on decision research into heuristics and biases, with the consequence that we believe our subjects to be cognitively deficient, intellectual cripples. This generalisation was shown not to be justified by current research, because it depends on untenable assumptions embedded in the paradigms. From the discussion of these paradigms and their limitations, four problems with current research can be identified.

1. The problem of objectivity. Most investigators assume the existence of an objective problem, which then justifies interpretations of incoherence, violation of expected utility theory, and so forth. We ignore the fact that our subjects generate their own structural representations of a stated problem, that assessed probabilities are conditional on the assessor's structure, not the experimenter's, and that judged values are also conditional on a structure which may be multi-attributed even though a single attribute is given in the stated problem. Even where structure is not at issue, it may be misleading to use statistical or actuarial data as an objective standard
to which subjects' judgements can be compared, as in some risk studies (Slovic, Fischhoff, and Lichtenstein, 1981). The age at which I will die is an event that is conditional on other events and information: my current age, my family history, general health, smoking and drinking habits, and so on. Some of these conditioning events are explicitly identified in the actuarial standards, but many are not. Thus, without knowing the conditioning events I consider in making my judgement, it is not possible to ascribe bias to me when my assessment differs from the actuarial standard. In any event, actuarial standards are notoriously difficult to develop. Insurance companies rely substantially on the subjective judgements of their underwriters in developing differential rates for all classes of business other than life insurance. Private motor insurance is a good example: companies differ in their rating schemes, offering different premiums for the same risk. If insurance companies cannot develop objective rating schemes, the standards used in risk assessment studies must surely be suspect.

2. The problem of generalisability. Heuristics research has ignored the possibility that subjects are characterised by qualitative differences in their capacity to deal with the research tasks. Instead, biases are likened to error scores, differing from one person to the next in degree, not kind. Furthermore, no consideration has been given to the representativeness of the tasks used in the research. In particular, no distinction has been made between concrete and abstract tasks, nor has any research attempted to match the task to the level of abstraction of which the subject is capable. Also, by relying almost exclusively on group data, experimenters have ignored the people who do well on these tasks. No attempt has been made to discover the task characteristics and subject characteristics that lead to good performance. A good example is the almost exclusive reliance on the fractile method of assessing probability distributions, described in narrative form in Raiffa (1968). It is well documented that this procedure results in distributions that are too tight, which has led to the generalisation that people are too sure of themselves when assessing continuous distributions. However, the encoding method recommended by Stael von Holstein and Matheson (1979, pp. 45-53), which requires the subject to make cumulative probability judgements for various possible values of the uncertain quantity, so reliably gives more spread-out distributions than the fractile method that the difference can be demonstrated in a classroom setting with one assessor using both approaches (the sketch below contrasts the two question formats). For the most part, it is decision analysts, not decision researchers, who have sought and found the conditions under which good assessments can be obtained. Until decision researchers take a greater interest in task taxonomy and individual differences, they are likely to continue making unwarranted generalisations from unrepresentative experiments.
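The operational difference between the two elicitation protocols can be seen in a small sketch. The numbers are entirely hypothetical and are chosen only to show how the two question formats define points on the same cumulative distribution, and how the implied spread can be compared; they are not data from the studies cited above.

```python
# Fractile method: the assessor states values of the uncertain quantity for given
# cumulative probabilities (hypothetical sales figures, arbitrary units).
fractiles = {0.05: 90, 0.25: 100, 0.50: 105, 0.75: 110, 0.95: 120}

# Fixed-value method: the assessor states cumulative probabilities for given values.
cumulative = {80: 0.02, 90: 0.10, 100: 0.30, 110: 0.60, 120: 0.85, 140: 0.99}

def quantile_from_cdf(points, p):
    """Linearly interpolate the value at cumulative probability p from (value, P) points."""
    pts = sorted(points.items())
    for (v0, p0), (v1, p1) in zip(pts, pts[1:]):
        if p0 <= p <= p1:
            return v0 + (v1 - v0) * (p - p0) / (p1 - p0)
    raise ValueError("p lies outside the assessed range")

# Central 90 per cent intervals implied by each protocol (wider = more spread out).
print(fractiles[0.95] - fractiles[0.05])                                          # 30
print(quantile_from_cdf(cumulative, 0.95) - quantile_from_cdf(cumulative, 0.05))  # about 50
```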
3. The problem of cognitive deficiency. To their credit, Tversky and Kahneman have been careful not to suggest the cognitive-deficit interpretation of observed biases in decision research, but most other researchers have not been so discreet, as Edwards (1982) observes. In anthropological research, 'cognitive deficit' interpretations have been attacked since the turn of the century. Boas (1911) was an early critic. Summarising his argument, Cole, Gay, Glick, and Sharp (1971) write:
"...the thrust of Boas's argument seems to be that previous observers failed to understand the people they were describing and then mistook their own lack of understanding as evidence of their informants' stupidity."

Lacking any understanding of the internal problem representations generated by their subjects, decision researchers may actually be describing themselves when they ascribe 'bias' to their subjects.

4. The problem of measurement. Much decision research is predicated on the notion that subjects' assessments, preferences and choices are stable, that an enduring internal state is being measured. An alternative view is that assessments, preferences and choices are generated in the process of solving a problem: they develop as the individual explores alternative problem structures, attempts to resolve inconsistencies, and incorporates new intuitions and information arising in the course of this iterative process. Eventually stability is achieved and a decision can be made with confidence. Nisbett and Ross (1980) observe:

"It would be interesting to know how any of the inferential errors reported in this book would survive intact after an open discussion among groups of ten or twelve subjects. Our guess is that most of the errors would at least be substantially reduced." (p. 267)

We need far more research which allows subjects to explore problems that are meaningful to them. At the moment, research results characterise only our subjects' first reactions, which in real life are only the starting point for serious problem solving. Research on heuristics and biases has become a psychology of first impressions.
The Generation Paradigm

One line of the argument so far has been that unless we know the subject's internal structural representation of the task, we cannot adequately interpret his or her data. That applies to us as decision researchers, too.
Unless we are clear about the paradigm or model that we bring to our research design, our interpretations of the data are likely to be confused, muddled, or wrong. I have already sketched some of the elements of a new paradigm that seems more realistic and less restrictive than the three discussed so far (Phillips, 1983a). Here I will revise that view somewhat and extend the model, though it is still incomplete.

The key idea in the generation paradigm is that process is embedded in structure. We cannot talk about information processing without reference to the internal representation of the task. Structure works in different ways and at different levels. Here, we distinguish two types: general structure and problem structure. The research on cultural and individual differences in probabilistic thinking (Wright and Phillips, 1979a, 1979b) suggests that people impose a general structural framework on a problem. Causal and fatalistic structures have been mentioned, but others are possible: deductive logic structures, inductive logic structures, similarity structures, diagnostic or conditional probability structures, and more. For example, Hammond, McClelland, and Mumpower (1980) have identified six general structures as characterising research on decision and choice processes. You interpret the same data differently if you use a lens model rather than expected utility theory.

Problem structure represents the task at hand. Even if you accept the general structure of expected utility theory, different problem structures are possible (Yates and Carlson, 1981), as Figures 1 and 2 show. Problem structuring may include aspects of the editing phase of prospect theory, but other operations, such as comparing parts of the internal model against features of the presented problem, testing consequences of the internal representation against the environment, or scanning the internal model for coherence, are also possible. It is because prospect theory omits these additional operations, and fails to consider the impact of general structure, that I claimed earlier that it was incomplete.

Composition rules are applied within the problem structure. These may take the form of production rules, compensatory rules such as expected utility, non-compensatory rules such as elimination-by-aspects, or any other rule judged by the subject to be appropriate for the task in hand; the sketch below contrasts two of these. Application of these rules allows information to be processed. The capacity of the individual, indicated by the level of abstraction, will affect the kinds of general structures used by the individual, the types of problem structures developed, and the information processing rules used. In this way, the generation paradigm allows for individual differences in judgements and choice.
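To make the notion of a composition rule concrete, the sketch below applies two rules to the same invented problem structure: an additive compensatory rule and a deterministic caricature of elimination-by-aspects (the full elimination-by-aspects model selects aspects probabilistically). The options, attributes, weights and cut-offs are all assumptions made for the illustration; the point is only that different rules operating on the same structure can select different options.

```python
# Hypothetical problem structure: two options scored 0-1 on two attributes.
options = {
    "A": {"price": 0.9, "quality": 0.4},
    "B": {"price": 0.5, "quality": 0.8},
}
weights = {"price": 0.6, "quality": 0.4}

def additive(option):
    """Compensatory rule: weighted sum, so strength on one attribute can offset weakness on another."""
    return sum(weights[attr] * value for attr, value in option.items())

def elimination_by_aspects(opts, thresholds):
    """Non-compensatory rule: discard options that fail each aspect in turn (deterministic variant)."""
    remaining = dict(opts)
    for attr, cutoff in thresholds:
        surviving = {name: v for name, v in remaining.items() if v[attr] >= cutoff}
        if surviving:
            remaining = surviving
        if len(remaining) == 1:
            break
    return list(remaining)

print(max(options, key=lambda name: additive(options[name])))               # 'A' (0.70 versus 0.62)
print(elimination_by_aspects(options, [("quality", 0.5), ("price", 0.4)]))  # ['B']
```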
A second feature of the generation paradigm is that general structuring, problem structuring, and information processing can be carried out in any order, and they are done iteratively. (Note the contrast with prospect theory, in which editing comes before evaluation; see Kahneman and Tversky, 1979.) The failure to find an adequate problem structure may lead the individual to shift to a different general structure. Processing some information may lead to inconsistent results, motivating a change in problem structure. Problem-solving activities continue within the practical constraints of time and resources. In many cases, the goal of the subject may be to develop a requisite solution: requisite in the sense that the internal representation includes all the features required to solve the problem, and judgements are no more considered than is required to make a decision or to express a preference. Note that this is not a 'cost of thinking' or 'satisficing' criterion; it is a pragmatic rule that exploits the insensitivity of decisions to particular aspects of the problem (for an example, see Phillips, 1982c).

A third feature of this paradigm is that task characteristics can be influential at any level. In studies of calibration, questions about future events might be more likely to induce probabilistic general structures than general knowledge questions. Pitz showed that a simple rewording of one of the problems appearing in Tversky and Kahneman (1971) caused nearly every respondent to answer the question correctly, whereas most people answered the original problem incorrectly. Pitz argued that the rewording blocked the higher-order production usually given to the original problem and allowed a lower-order production, which gives the right answer, to emerge. Schoemaker (1980) speculated that the context effect in his insurance study is an evoking process, the insurance context calling up influences such as societal norms that were not evoked by the hypothetical wagers.

This latter observation highlights a fourth feature of the generation paradigm: it acknowledges the multi-dimensional nature of even the simplest tasks. In building their own internal representation of the problem, subjects may abstract features of the presented problem, but they may also incorporate dimensions drawn from experience, influenced by group norms and cultural values. There is no assumption that an 'objective' problem exists; the reality is the subject's current internal representation. The experimenter may have a different one, most likely corresponding closely to the stated problem. But there are no correct representations, only different ones. Thus, supposed preference reversals are not necessarily the result of different frames imposed on the 'real' problem; they are different preferences resulting from different internal representations.
A minor change in interpretation? Not so, for the 'reversal' interpretation implies inconsistency, whereas applying the generation paradigm to the results may well indicate no incoherence at all.

In summary, the generation paradigm sees decision making and the forming of judgements as dynamic, social processes, involving interaction between the individual and his or her environment and with other people. Problem statements are seen as the starting point for an iterative development of a requisite structural representation within which information can be processed. Task characteristics and individual differences are seen as interacting and influencing the problem-solving process at all levels. In short, judgements of value, assessments of probability, and representations of structure are all generated by the individual.

Investigators working from this new paradigm will not be content with asking subjects for first impressions of static problems. Instead, it will be necessary to allow subjects time to work toward a requisite representation, and investigators will have to discover how subjects internally represent stated problems if inferences are to be made about consistency. Rather than investigate 'de-biasing' procedures, experimenters will examine the effects of aids, from paper-and-pencil methods to interactive, conversational computer programs. The effects of groups on decision representation will be investigated (strangely, this was never done in any of the numerous studies on the risky-shift phenomenon). Effort will be devoted to finding the circumstances in which people are 'intellectual athletes', not 'intellectual cripples'.

As a final observation, it is worth noting that the generation paradigm has important implications for the status of expected utility theory. From this perspective there seems little point in asking whether people violate fundamental axioms of preference in either experimental or applied settings. Instead, the generation paradigm suggests that it would be more appropriate to ask whether people would accept the axioms and the resulting theory as a useful framework within which a requisite structure can be developed, and which will guide individuals in generating a coherent set of assessments that are themselves requisite. Instead of worrying about grand-world violations of consistency, we might be more modest in asking of expected utility theory that it serve as a guide for decision makers who wish to develop a small-world model of their problem: not an optimal model, but a requisite one that goes as far as is deemed useful toward achieving internal consistency. In this sense, expected utility theory is neither normative nor descriptive. Rather, it is a guide to thinking that polices consistency. Thus, attacks on its normative or descriptive adequacy are irrelevant, and contribute only to a destructive devaluation of the power of expected utility theory as an organising principle and a demonstrably useful guide to choosing and deciding.
References

Balthasar, H.U., R.A.A. Boschi, and M.M. Menke, 1978. Calling the shots in R & D. Harvard Business Review, May-June, 151-160.
Boas, F., 1911. The Mind of Primitive Man. New York: Macmillan, 1938, rev. ed.
Cohen, J., 1960. Chance, Skill and Luck. Harmondsworth, Middlesex: Penguin.
Cole, M., J. Gay, J.A. Glick, and D.W. Sharp, 1971. The Cultural Context of Learning and Thinking. U.S.A.: Basic Books; London: Methuen and Tavistock.
de Zeeuw, G. and W.A. Wagenaar, 1974. Are subjective probabilities probabilities? In: C-A.S. Stael von Holstein (ed.), The Concept of Probability in Psychological Experiments. Dordrecht: D. Reidel.
Edwards, W., 1983. Human cognitive capabilities, representativeness and ground rules for research. In this volume, 509-515.
Evans, J.St.B.T., 1972. On the problems of interpreting reasoning data: Logical and psychological approaches. Cognition, 1, 373-384.
Evans, J.St.B.T. and A.E. Dusoir, 1977. Proportionality and sample size as factors in intuitive statistical judgement. Acta Psychologica, 41, 129-138.
Evans, J.St.B.T. and P. Pollard. Statistical judgement: A further test of the representativeness construct. Acta Psychologica (in press).
Hammond, K.R., G.H. McClelland, and J. Mumpower, 1980. Human Judgment and Decision Making. New York: Praeger.
Harrison, F.L., 1977. Decision making in conditions of extreme uncertainty. The Journal of Management Studies, 169-178.
Hogarth, R.M., 1981. Beyond discrete biases: Functional and dysfunctional aspects of judgmental heuristics. Psychological Bulletin, 90, 197-217.
Humphreys, P.C., 1977. Application of multi-attribute theory. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: D. Reidel.
Humphreys, P.C. and D. Berkeley. Problem structuring calculi and levels of knowledge representation in decision making. In: R. Scholz (ed.), Decision Making Under Uncertainty. Amsterdam: North-Holland (in press).
Humphreys, P.C., S. Wooler, and L.D. Phillips, 1980. Structuring decisions: The role of structuring heuristics. Technical Report 80-1. Uxbridge, Middlesex: Decision Analysis Unit, Brunel University.
Jaques, E., 1978. Level of abstraction in mental activity. In: E. Jaques, R.O. Gibson, and D.J. Isaac (eds.), Levels of Abstraction in Logic and Human Action. London: Heinemann.
Kabus, I., 1976. You can bank on uncertainty. Harvard Business Review, May-June, 95-105.
Kahneman, D. and A. Tversky, 1979. Prospect theory: An analysis of decision under risk. Econometrica, 47, 263-291.
Kuhn, T.S., 1970. The Structure of Scientific Revolutions. Chicago: The University of Chicago Press, 2nd ed.
Lichtenstein, S., B. Fischhoff, and L.D. Phillips, 1982. Calibration of probabilities: The state of the art to 1980. In: D. Kahneman, P. Slovic, and A. Tversky (eds.), Judgement Under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.
Lindley, D.V., A. Tversky, and R.V. Brown, 1979. On the reconciliation of probability assessments. Journal of the Royal Statistical Society, 142, 146-162.
Nisbett, R. and L. Ross, 1980. Human Inference: Strategies and Shortcomings of Social Judgement. Englewood Cliffs, New Jersey: Prentice-Hall.
Phillips, L.D., 1980. Organisational structure and decision technology. Acta Psychologica, 45, 247-264.
Phillips, L.D., 1982a. Generation theory. In: L. McAlister (ed.), Research in Marketing, Supplement 1: Choice Models for Buyer Behaviour. Greenwich, Conn.: JAI Press.
Phillips, L.D., 1982b. The evaluation of risk estimates: Limitations to human judgement? In: G. Volta (ed.), Risk and Safety Assessments in Industrial Activities.
Phillips, L.D., 1982c. Requisite decision modelling: A case study. Journal of the Operational Research Society.
Pitz, G., 1977. Decision making and cognition. In: H. Jungermann and G. de Zeeuw (eds.), Decision Making and Change in Human Affairs. Dordrecht: D. Reidel.
Raiffa, H., 1968. Decision Analysis: Introductory Lectures on Choice Under Uncertainty. New York: Addison-Wesley.
Schoemaker, P.J., 1980. Experiments on Decision Under Risk: The Expected Utility Hypothesis. Boston: Martinus Nijhoff.
Slovic, P., B. Fischhoff, and S. Lichtenstein, 1981. Perceived risk: Psychological factors and social implications. In: The Assessment and Perception of Risk. London: The Royal Society.
Stael von Holstein, C-A.S., 1970. Assessment and evaluation of subjective probability distributions. Stockholm: Economic Research Institute.
Stael von Holstein, C-A.S. and J. Matheson, 1979. A manual for encoding probability distributions. Menlo Park: SRI International.
Todhunter, I., 1865. A History of the Mathematical Theory of Probability. Cambridge and London: Macmillan.
Tuggle, F.D. and F.H. Barron, 1980. On the validation of descriptive models of decision making. Acta Psychologica, 45, 197-210.
Tversky, A., 1974. Assessing uncertainty. Journal of the Royal Statistical Society, Series B, 36, 148-159.
Tversky, A. and D. Kahneman, 1971. Belief in the "law of small numbers". Psychological Bulletin, 76, 105-110.
Tversky, A. and D. Kahneman, 1974. Judgement under uncertainty. Science, 185, 1124-1131.
Tversky, A. and D. Kahneman, 1981. The framing of decisions and the psychology of choice. Science, 211, 453-458.
Wooler, S. and A. Erlich, 1983. Interdependence between problem structuring and attribute weighting in transitional decision problems. In this volume, 321-334.
Wright, G.N. and L.D. Phillips, 1979a. Cross-cultural differences in the assessment and communication of uncertainty. Current Anthropology, 20, 845-846.
Wright, G.N. and L.D. Phillips, 1979b. Personality and probabilistic thinking: An exploratory study. British Journal of Psychology, 70, 295-303.
Yates, J.F. and B.W. Carlson, 1981. A synopsis of representation theory. Paper presented at the Eighth Research Conference on Subjective Probability, Utility and Decision Making, Budapest.
SUBJECT INDEX
compromises in 195 Absolute judgment see Judgment Action development of 168,177 justification of 259 Analytic bias, false approach t o 61 Anomaly in nuclear safeguards analysis quality of 228 definition of 97 representation of 224 observation of 92 Actions, packages of 229 Aspiration value 238. 243 Activities, hierarchy of 255 Attributes Actuarial data, mistaken use of 538 attractiveness scale for 372 Actuarial standards, difficult t o develop collapsing of 363,373 539 de-emphasising importance of 360 Additive rule see Decision rules defining of 225 Affective reaction determining preferdesirahle vs. undesirable 268 ences 357 development of within decision Agenda setting 73 analysis 187 Allais paradox reformulated 531 elicitation of, procedurc 21 1 , 284, Alternatives see also Options 296,308 comparison o f , in risk assessments 66 generation of 21 3 discarding of 354 hierarchy of 185 finding a promising one 355,372 independence of 202 holistic ratings of 305 independence, test of 284 limitations of techniques for cornparnumber of, and reliability 187 ison of 1 3 1 reference images for 333 rejection of 399 relative importance of 85, 269, 310, s c r c c n i g of 354 322,353 Alternatives. multi-attributed see also salience of 265 Multi-Attribute Utility Analysis weighting of 185, 203, 308,439 acceptance of 202 Attribute sets, judgments of completedecision rules for choosing betwcen ness, independence and quality 306, 34 3 311, 315 Ambiguity in organizational decision Availability of information see Informaking 1 5 5 mation Analyst-clicnt relationship Availability heuristic see Heuristics
36
554
Background knowledge see Knowledge Base rate effects 480 Basic Reference Lottery Ticket (BRLT) see Lottery methods Bayesian model for suicide attempts 117 Bayesian inference structure 178 Belief-vdlue correlation 384 Biases in probabilistic thinking 507 focus on 509 likened to error smres 539 part and parcel of science 6 1 premature conclusions 533 Bidding and choice, comparison between 427 Bidding task procedure 339.419 Binary choice procedure 339,420 Bolstering support for an alternative 361,373 Bootstrapping proccsy 21 2 Calibration of subjectivc probabilities 471,534,538 curve for 477,484 hard and easy questions 483 model 472 Cancellation of differences on attributes 363, 373 Capacity theory 534 Care of patients, long term 108 Care, quality of see also Quality Assurance Process index of 108 monitoringof 19,106,113 Multi-Attribute Utility Model 108, 115 Catastrophes avoidance of 16, 32 calculation of probabilities of 4 8 deaths in 81 insurance as protection against 86 probabilities plotted against fa talities 51 risk measures sensitive to 65 Causal path models, use of 398 Causally directed knowledge 226 Change agents, effective 113 Change model, planned 160 CUent-analyst relationship see Analystclient relationship
Clinical judgment compared with Bayesian model 178 Clinical skills in decision analysis 157 Coalitions in collective decision making 176 in organizations 154 Cognitive deficiency, problem of 540 Cognitive dissonance. theory of 385. 397 Cognitive goal hierarchy 227 Cognitive operations, availability of 45 5 Cognitive reprcsentation of decision problems 6ee Decision problem representation Cognitive script 21 3, 358, 444 Coincidences experienced by self and others 341, 496,499 surprisingness of 340,491 Collapse circuit 250 (:ollective decision making see Social decision making Compensation payments 8 7 Compensatory decision rules see Decision rules Compensatory thinking 363 Computerized decision aids see Decision aids Computers, choice between 254 Conflict antecedents of 395 and differentiation between and within options 386 motivation t o avoid 384 in organizational decision making 162 Conflicting evidence, forgetting of 366 Consciousness raising 189, 202,291 Conservatism in probabilistic inference 510 Consistency, difficulties in demonstrating 218 Context effects see Decision problem I epresent at ion Contingent regulation of quality of care 107,119 Convergence among intuitive and prescribed preferences 291, 304, 312, 329
555
increase over time 31 3 measures of 3 12 Convergent validity 433 salience ineasures 441 Correlation between information dimcn. sions 449 Cost-benefit analysis 127, 199 focus on outcomes, rather than process 84 Cost-effectiveness technique I27 ill-suited to strategic choice problems 131 uselessness of constructing objective models 130 Court awards, use of, to evaluate mortality risks 27 Criteria, elicitation of 159 Cultural setting, influence on attributes considered 85 Culture, organizational 149
Database for decision structuring 207 structured 268 Decision aiding ability, improvement of 235 Decision aids, computerized see EVAL, CENTREE, GODDESS, MAUD, OPINT, and QVAL ease of interaction 288 evaluation of 202,284 helping to find a dominance structure 368 human engineering of 206 improvement of 283 interactive 199, 282, 302 perceived applicability 290 user satisfaction with 202, 312, 314 Decision analysis art of 302 comparison with psychoanalysis 146 compromise between decision maker and analyst 195 development of attributes within 186 developing clinical skills 157 formal models of 205 goals of 18,257 and ideology 147 impact of judgment research on 524
36*
implementation of 157, 160, 167 importance of awareness of organizational context 146 institutional constraints on 173 personalist 91 pitfalls of see Pitfalls of analysis potential as an evaluation methodology 105 reasons for rejection 303 task taxonomy for 514 use of prescriptions to justify decisions 276 valuation of 152, 219 Decision analytic model for inspection of activities 96 selection of 210 Decision behavior, improvement of 235 Decision conferences 532 Decision criterion, location of 477 Decision latency 437 Decision marking compared with systems analysis 133 defective avoidance in 362 distinction from judgment blurred 5 24 levels of abstraction in see Levels of abstract ion multi-criteria see Multi-criteria decision making personal 303 psychological and sociological factors in 141 Decision making process automation of 207 control of 150 directionality of 357 missing information 434,445 model for 341.380 rounds in see Sequential decision making process phases in 352 Decision making research paradigms 528 Decision making strategies development of 184,409 with missing information 445 as points in multidimensional space 41 1 repertoire of 402 selection of 425,454 Decision making systems, hierarchical 186
556 Decision objectives see Objectives Decision option see Options Decision problem representation 200, 224 cognitive 216, 372,444 context effects 542 differences between subjects 530 editing phase in 353 methods for eliciting 234 preferences for incompleteness 186 prototypical structures for 210 role of the goal in 2 2 3 , 237 syntax for 21 7 testing against the environment 541 veridicality of 530 Decision problem structure see Structuring decision problems Decision problem taxonomy see Taxonomy Decision process, sequential see Sequential decision making process Decision rules abstract vs. concrete 349 additive difference 402 additive MAUT model 189,309 compensatory, application of 373 compensatory vs. noncornpensatory 297,346,365,402 complex 402 conjunctive 345,373,402,410 disjunctive 345 dominance 344 elimination by aspects 345,403,410 expected utility 344,533 formulated by subjects 422 integrating qualitative and quantitative factors I39 lexicographic 345,407,410 limited applicability of 347 non-compensatory 346, 380 problem of finding appropriate ones 184 subjective nature of 130 validity of 350 Decision structure see Structuring decision problems and Requisite decision structure Decision support system 268,282 Decision strategy see Decision making strategies
Decision theoretic approach lo program evaluation 105 Decision thcory 15 application of 235 goal concept in 225 Decision variable partitioning of 473 relation to propositions 474 Defensive avoidance of stress see Stress Deregulation, e r ; ~ of 1 I8 Detection, probability of, in diversion path analysis 20, 94 Detection of problems 116 Directed graphs 246 Discriminability parameter (d’) 474 Discrimination proccss, probabilistic 473 Diversion paths in nuclear safcguards analysis 19, 93, 99 Dominance structure hierarchical 365 search for 360,372,387 unrelated to thought concerns 392 Dominance testing 352 Drawing pin probleni 536
Ecological validity of research o n probabilistic thinking 507 backing in process models 508 Editing operations 353 Effective change agents 1 1 3 ELECTRE technique 137 Elicitation o f attributes see Attributes of options see Options Elimination by aspects see Decision rules Ellsberg Paradox 458 Emergence of decisions 384 Empirical findings, unrepresentative of tasks and subjects 528 Environmental effects, analysis of 105 Environmental impact report 44 Epistemic risk model 462 Equitable distribution of risk 31 Error, focus on, in research 510,524 Errors reduced by group discussion 540 EVAL 302 Evaluation methodology, use o f 105
Evaluation of decision aids see Decision aids of human judgment 522 of mortality risks 28, 35 Event, exogenous, role in triggering social interest 73, 82 low probability 73 Evidence, rules of, in risk assessments 17, 64 Expected Utility see Subjective expected utility Expected utility analysis 210, 531 Expected utility theory influence on decision making 135 interpretation of violations of 538 neither normative nor prescriptive 543 Experts advocacy role of 17 involvement in decision analysis 175 Exploring decision problems 159 Facility siting, Liquefied Natural Gas see Liquefied Natural Gas power plant 173 sequential decision making in 43 Facts vs. judgment in risk assessments 61 Fatalities see also Risk analysis confused measures of 56 factors to be taken into consideration 23, 65 "good" and "bad" kinds 34 indices of 49 use in risk indices 15 Flexibility analysis 200, 276 Forgetting of conflicting evidence 366 Framing of decisions 428 of goals 226 Functional fixity 213, 267, 276 Fuzzy set 239 Fuzzy structural modeling 200, 237 Games played by decision makers 176 General systems approach see Systems analysis Generalizations from laboratory tasks 511, 520, 535 problems of 539
Generalizations from research on heuristics and biases 528 Generation of options see Option generation Generation of outcomes 225 Generation paradigm 540 Generic problem structure 318 see also Structuring decision problems GENTREE 201 GERT see Graphic Evaluation and Review Technique Goal achievement, potential 232 Goal concept, activation of 200, 228 Goal confusion 184 Goal directed behavior contrasted with policy making process 45 Goal directed decision structuring system see GODDESS Goal driven approach to decision making 225 Goal hierarchy 160, 187, 200, 229, 233 means and ends mixed 190 Goal indicators 238 Goal-oriented knowledge 215, 226 Goal specification, problems of 130 Goal structure, organizational 153 Goal structuring 159, 302 Goal system 238 Goals analysis of, in R & D decisions 187 associated with phases in the decision process 350 assumed to be clearly definable 147 cognitive hierarchy of see Goal hierarchy conflicting in social decision making 63 context-bound 225 differences between analyst and client 157 elicitation of 159, 211 explicitness (General activation hypothesis) 231 formal, context free 225 furthered through coalitions 153 importance of 233 meaning of 224 and objectives of parties with different interests 83 related to personal preferences 228, 233
of sponsor 158 value of 228 GODDESS 199, 225 procedures contrasted with MAUD 216, 302 Good configuration 264 Graphic Evaluation and Review Technique (GERT) 18, 85 Group discussion, role in reducing errors 540 Group process, integrative 107 Guttman scaling 291
Heuristics, availability and representation of knowledge 73, 234 classification vs. computation 423 employed by decision makers 185 evaluation of research on 518 generation of 224 identification of 524 in probabilistic thinking 507 Hierarchical decomposition of goals see Goal hierarchy Human engineering tradition in psychology 512 IAEA see International Atomic Energy Agency Ideal point, positioning of 284, 308 Identifiable fatalities vs. statistical fatalities 29 Ideology and decision analysis 147 Imagination, role of in creating action 224 Importance weights see also Attributes and Tradeoffs as scaling parameters 311 Improvement of decision aids see Decision aids Incoherence interpretation of 538 of preference structures 184 Inconsistencies, detection of 218, 436 Individual differences, importance of 539 Inference structure, Bayesian 178 Influence, degree of between goal and indicators 241
Information availability in memory 455 defensive restructuring of 386 epistemic reliability of 460 introduced from memory 444, 452 missing, role of 434, 445, 453 neglect of, in noncompensatory rules 347 about preferences (non-spatial) 215 presented, interpretation of 454 separated from judgment 61 storage in memory 208 substitution hypothesis 447 utilization of 444 Information boards 402 Information gathering patterns 402 Information integration, minimization of 215 Information processing approach to decision research 527, 536 Information processing capacity increase of 136 limitations of 135 types of limitations 455 Information processing paradigm production rule version 538 Information retrieval from memory see Memory Information search patterns 338, 392, 403 and decision rules 351, 403 task dependent characteristics 407, 441 Inspection activities 18 allocation of time in 100, 113, 116 effectiveness of 91 timeliness of 97 Inspection histories 97 Inspection resources, allocation of 95, 172 Institutional decision problems 22 Insurance as a protection against catastrophes see Catastrophes use of, for evaluating mortality risks 27 Integrative group process 107 Intellectual tasks, taxonomy of 511 Interactive decision aids see Decision aids
Interest groups characteristics of 16, 40, 71, 158 conflict between standpoints 130 differences in value structures 55 problem of different viewpoints 140 refusal to cooperate 173 International Atomic Energy Agency 18, 81, 172 Interpretation of situations by subjects 518 Involvement of decision makers, and commitment 159 Judgment absolute vs. relative in tasks 423, 425 need for in analysis 62 ordinal 439 Judgmental abilities, results of underestimation of 523 Judgmental heuristics see Heuristics Judgmental phenomena, robustness of 518
Judgment-free measurement 92 Justification of decisions 293, 337, 343, 417, 427 Knowledge background, effect of 452, 455 causally oriented vs. goal oriented 215 goal-directed 226 in long-term memory see Memory non-spatial 215 theories of organization of 214 Knowledge base in memory 208 interfacing with 297 KYST 260 LNG see Liquefied Natural Gas Laboratory experiments on decision making, controversy over 507 Laboratory tasks, generalizations from 511 Language used in measurement techniques 138 Lateral thinking 268 Legitimacy of group membership 108
Levels of abstraction of alternatives 170 in dealing with decision problems 534 Limitations of information processing capacity see Information processing Linear judgment model 438 explanatory power of 439 Linguistic approach in describing goals 237 Linguistic approximation procedure 243 Linguistic variable 239 Liquefied Natural Gas, facility siting case studies 40, 75 economic aspects of 80 environmental aspects of 80 risks of see Risk assessments worst case scenarios see Scenarios Long-term memory see Memory Lottery methods for attribute weighting 138, 284, 308 difficulties with 317 Lotteries over mortality risk vectors 25
MAMP see Multi-Attribute Multi-Party model Management information systems 214 Mathematical programming techniques 127 MAUA see Multi-Attribute Utility Analysis MAUD 84, 176, 194, 199, 201, 211, 269, 284, 302 procedures contrasted with GODDESS 216, 302 MAUD3 compared with decision analyst 304 involvement of subjects with 316 peculiarities of 317 Maximum feasible utility of R & D projects 192 Meaning episodic 228 of goals 225 Means-end confusion 184 Means-end relationship between options, outcomes and goals 216 in goal hierarchy 189 in goal structuring 211
Measurement, judgment-free see Judgment Measurement problems in systems analysis and decision making 137, 540 Measurement techniques, use of amenable language in 138 Medical services, evaluation of effectiveness of 105 Membership function 242 Memory, limitations of 510 Memory, long-term, retrieval of information from 200, 209, 452, 493 representation of knowledge in 208, 215, 226 use of models of 234 Memory, organization of 208 Memory, rote, study of 509 Memory, short-term limitations of 136, 207, 214 limits of, and compensatory rules 349 Modeling, prescriptive 92 Modeling, qualitative 107 Modeling, structural see Structuring decision problems Mortality risks 24 evaluation of 28, 35 utility function for 30, 35 Motivation of the client 170 see also Goals Multi-Attribute Multi-Party model 17, 21, 70, 75 focus on process 84 interpretation of 80 lessons from 83 Multi-attribute risk aversion 309 Multiattribute Utility Analysis 210 characterizing discrete features in 215 comparison of alternatives in 185, 322 compared with other techniques 136 decision rules 373 focus on outcomes rather than process 84 incorporating concerns of stakeholders 174 performed by computer vs. analyst 304 pitfalls of 183
role in social decision making 18, 84 self-reports of usefulness of 305 use in resource allocation problems 186 use by students 533 weighting of attributes see Attributes Multi-Attribute Utility model for quality of care 108 Multi-criteria decision making 128 see also Multiattribute Utility Analysis clustering of sub-criteria 186 problems of attribute weighting 137 Multi-criteria linear programming, man-machine techniques in 137
Negative evaluations, dominance of 380 Network analysis 127 Network model connecting goals and actions 200 of the representation of knowledge 233 Network theory, use of, in GERT 85 Non-compensatory rules see Decision rules Non-proliferation treaty 92 Normative models, lack of psychological validity 508 Nuclear material, diversion of 91 Nursing homes, quality of care in 106
Objective models built on inadequate information 131 replacement by subjective models in systems analysis 135 Objective surrogate for subjective assessments 19, 94 Objectives conflicting 254 focussing on 226 generation of 213 multiple 225 Objectivity, problem of 538, 542 Operations research techniques, application of 126 Opinions, elicitation from knowledgeable people 114
Opinion leaders, use of 108 OPINT 302 Optimal behavior, in weather forecasters 513 Optimality, understanding of 225 Optimization of procedures rather than outcomes 22 Option-anchored scales in multiattribute utility analysis 324 Option comparison, differences between systems and O.R. approaches 127 Option-driven approach to decision making 225 Option generation 200, 208, 224, 257, 302 role of the goal in 234 Option structuring technique 254, 272 Options differentiation within and between 385 elicitation of 211 kept open 277 screening of 303, 353 tradeoffs between see Tradeoffs Organizational culture 156 Organizational decision-making process 146, 152 dealing with conflicts 162 procedural rules in 148 reconciling contradictions in goal hierarchies 189 Organizational structure and opportunities for decision analysis 147 Paradigms used in decision-making research 528 Personal construct theory 211 Personal decision making see Decision making Personalist decision analysis see Decision analysis PIP see Probabilistic information processing system Pitfalls of analysis 33, 61, 146, 167, 183 Planned change model 161 Policy-making process 58, 60 characteristics of 45 introduction of competing expertise 65
Positivist view of science, challenged 61 Posterior probability of an event 474 Power in organizations 149 PPB system 129 Pre-editing operations 352, 354 see also Editing operations Preference independence 30 Preference reversals 531 Prescriptive modeling 92 Pressure groups see Interest groups Probabilistic discrimination see Discrimination process Probabilistic information processing system, failure to adopt 178 Probabilities second order 339, 457 "true" 533 Probability assessments elicitation of 176 problems in research on 528 resistance in their use 511 Probability distributions, methods of encoding 539 Probability revision experiments 510 Problem, definition of, in organizational decision making 156 Problem detection in monitoring quality of care 116 Problem representation see Decision problem representation Problem solving creative 200, 208 descriptive models of 216 reasons for failure 214 Problem statements, incomplete 530 Problem translation 216 Process tracing analysis 337, 368, 371, 380, 402, 417 Process tracing tools 413 Production rules in information processing paradigm 538 relation to continuous quantities not clear 215 Proper scoring rules 533 Prospect theory contrasted with generation paradigm 542 Prototypical decision problem structures see Structuring decision problems
Psychophysical paradigm in decision making research 527, 529 Psychosocial transitions 325 Public interest groups see Interest groups Qualitative factors measurement of 138 problem of how to integrate them 139 Qualitative model see Modeling Quality Assurance Process (QAP) 20, 109, 113 Quality control, statistical 106 Quality of care see Care QVAL 211
R & D see Research and development Rare events, attribution of causality to 504 Rational procedures for problem analysis 140 Rationality, understanding of 225 Rationalization (postdecision) 60 Reasons for choice, formation of 339, 428 see also Justification of decisions Recidivism rate, for nursing homes 115 Regret attempts to decrease 399 in giving up alternatives 350, 366 Regulated industries, implications of deregulation 116 Regulation, contingent 20 Regulatory agencies, role of in decision making 42, 72 Regulatory system 18, 105 Relative judgment see Judgment Relevancy, requirement of 310 Reliability of evaluations, improvement of 186, 189 of information 139 of screening process 112 Repertory grid procedure 211, 284 problems with 286, 296 Representativeness of intellectual tasks and performers 512 Requirements, structuring of 257 Requirements space 259 efficient partitioning of 264
Requisite decision structure 158, 160, 258, 532, 542 Research and development problems, decision analysis of 129 Research and development projects, selection of 184 Resistance to change circuit 251 Retrospective verbal reports see Verbal reports Risk acceptability of 57 equitable distribution of 16, 31 health 22 to individuals 25 institution perspective 25 lack of definition of, in risk assessments 55 mortality see Mortality risks multidimensional concept 64, 67 multiple causes of 33 public 23 of re-opening an issue, counteraction of 400 safety 23 Risk analysis distinguished from decision analysis 41 generation of false expectations regarding 40 hidden agendas in 169 increase in polarization of arguments 63 modeling fatalities in 47 prescriptive 25 results viewed as evidence 40 use in LNG facility siting 40 use of utility analysis in 24 Risk assessments advocacy of 62 comparison of 52 conflicting 84 effectiveness of 64 as evidence 17 improvement through rules of evidence 64 omission of "unquantifiable" risks 51
parameters used in 54 purpose of 17, 59, 64 role in sequential decision making
process 47 scope of 54 subjectivity of 16 suspect standards 539 timing of 58, 64 use of 16, 53, 60, 84 value judgments in 17, 24, 35, 83 viewed as facts rather than evidence 58, 61 Risk aversion, multiattribute 309 Risk equity preference for 16 vs. catastrophe avoidance, preference for 32 Rounds in decision making see Sequential decision making process Rules, decision see Decision rules of evidence in risk assessments 17, 64, 70 procedural in organizational decision making 148 production see Production rules
Safeguards, nuclear 92 aggregate measure of effectiveness 98 objective of 96 Safeguard system, vulnerabilities of 99 Satisficing, rules used for 411 Scenarios impact of changes in context on 86 stimulation of 213 use of 159 viability of 304 worst case 16, 42, 81 Schemata 212, 254 Screening of decision options see Options Screening instrument for assessing quality of care 111 Screening process 19, 117 reliability of 112 Script, cognitive see Cognitive script Search for information see Information search Self-reference in systems 218 Self-report, use of in inspections 114 Semantic memory see Memory
Sensitivity analysis 154, 175, 277, 302 vs. strict optimization 160 Sequential decision-making process 63, 73, 82 rounds in 16, 41, 75, 173 SEU see Subjective Expected Utility Severity measure of illness 105 Short-term memory see Memory Signal detection theory 473 Simple Multi-Attribute Rating Technique (SMART) 310, 317, 326 Small worlds 258, 277 SMART see Simple Multi-Attribute Rating Technique Snowball circuit 249 Social decision making 15, 18, 139 problem of applying systems analysis 130 use of multiattribute utility analysis 174 Stakeholders see Interest groups States of the world, generation of hypotheses about 224 Statistical quality control 106 Status quo, transformation of 234 Strategic choice problems 131 Stress defensive avoidance as response to 384 and directionality of decision making 351 Structural modeling, fuzzy 200, 237 Structure of goals and subgoals 225 important role of 537 search for 344 Structuring decision problems 24, 159, 224, 254, 282, 302, 372, 532 consequences of failure 542 database for 207 goal directed approach 211, 318 pitfalls of 174 prototypical structures 210 restructuring to reduce contradictions 194 small scale problems 199 Subjective Expected Utility see also Expected utility theory applicability of model 344 compared with maximum feasible
utility 193 neglect of, by decision makers 186 Subjective-objective probability functions 529 Subjective probabilities, calibration of see Calibration Substitution hypothesis see Information Surprisingness of coincidences see Coincidences Surrogate measures see Objective surrogate for subjective assessments Systems analysis application to ill-structured problems 129 art of application 130 cost and benefit models in 128 current trends in 131 and decision making 126 general systems approach 127, 131 pitfalls of 133 psychological and sociological factors 141 qualitative and quantitative aspects of 129 Systems engineering 127 Task characteristics choice of decision rule dependent on 402 complex nature of 542 effects of 441, 534 Task performance, effects of goal setting on 234 Task requirements and reasons for choice 419 Task treated as an end in itself 522 Taxonomy of decision problems 210 of intellectual tasks 511, 539 Test theory paradigm in decision research 527, 533 Think-aloud procedure 337, 350, 368, 371, 436, 455 Think-aloud protocol see Verbal protocols Tools for aiding decision making see also Decision aids absence of, in laboratory experiments 522 use of 512
Tradeoff ratios assumed fixed in Multi-Attribute Utility Analysis 269 Tradeoffs see also Attributes, weighting of between attributes, difficulties with discrete features 215 between efficiency and equity in social decision making 86 between problem formulations 172 between risks and costs 25, 27 between types of decision-making procedures 85 in modeling uncertainties in risk assessments 66 problems in making explicit 46 Tradeoff techniques 136 Transformation, topological 262 "True" probability 533
Uncertainties involved in R & D project assessment 192 modeled in risk assessments 65 Uncertainty ability to cope with 148 as an attribute describing consequences 530 structuring of 254 Utilities see also Expected utility theory and Subjective Expected Utility maximum feasible 192 problems in the assessment of 176, 323 Utility analysis in risk analysis see Risk analysis Utility functions for mortality risks 30 for equitable risk vs. catastrophe avoidance 32 Utility independence 30, 189, 308
Value functions, assessment of 308 Value of a life, lack of consensus on 24, 30 Value judgments classification of 421 complexity of 347 in risk assessments see Risk assessments
Value structures, differences between interest groups see Interest groups Value systems insights provided by decision aids 219 only part made explicit 186 Value tree 169 Verbal protocols, analysis of 337, 371, 374, 450, 467 Verbal reports consistency of 338, 432, 437 dominance of negative evaluations in 380
reliability of 431 retrospective 431, 436 validity of 431 Verbalization, effect of 432
Weighting of attributes see Attributes Wishful thinking 361 Worst case scenario see Scenarios ZAPROS technique 139