NON-BAYESIAN DECISION THEORY BELIEFS AND DESIRES AS REASONS FOR ACTION
THEORY AND DECISION LIBRARY General Editor: Julian Nida-Rümelin (Universität München) Series A: Philosophy and Methodology of the Social Sciences Series B: Mathematical and Statistical Methods Series C: Game Theory, Mathematical Programming and Operations Research SERIES A: PHILOSOPHY AND METHODOLOGY OF THE SOCIAL SCIENCES VOLUME 44
Assistant Editor: Martin Rechenauer (Universität München) Editorial Board: Raymond Boudon (Paris), Mario Bunge (Montréal), Isaac Levi (New York), Richard V. Mattessich (Vancouver), Bertrand Munier (Cachan), Amartya K. Sen (Cambridge), Brian Skyrms (Irvine), Wolfgang Spohn (Konstanz) Scope: This series deals with the foundations, the general methodology and the criteria, goals and purpose of the social sciences. The emphasis in Series A will be on well-argued, thoroughly analytical rather than advanced mathematical treatments. In this context, particular attention will be paid to game and decision theory and general philosophical topics from mathematics, psychology and economics, such as game theory, voting and welfare theory, with applications to political science, sociology, law and ethics.
For other titles published in this series, go to www.springer.com/series/6616
Martin Peterson
NON-BAYESIAN DECISION THEORY Beliefs and Desires as Reasons for Action
Martin Peterson Department of History and Philosophy of Science University of Cambridge Cambridge, CB2 3RH
ISBN 978-1-4020-8698-4
e-ISBN 978-1-4020-8699-1
Library of Congress Control Number: 2008928678 © 2008 Springer Science+Business Media B.V. No part of this work may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording or otherwise, without written permission from the Publisher, with the exception of any material supplied specifically for the purpose of being entered and executed on a computer system, for exclusive use by the purchaser of the work. Printed on acid-free paper 9 8 7 6 5 4 3 2 1 springer.com
Preface
For quite some time, philosophers, economists, and statisticians have endorsed a view on rational choice known as Bayesianism. The work on this book has grown out of a feeling that the Bayesian view has come to dominate the academic community to such an extent that alternative, non-Bayesian positions are seldom extensively researched. Needless to say, I think this is a pity. Non-Bayesian positions deserve to be examined with much greater care, and the present work is an attempt to defend what I believe to be a coherent and reasonably detailed non-Bayesian account of decision theory. The main thesis I defend can be summarised as follows. Rational agents maximise subjective expected utility, but contrary to what is claimed by Bayesians, utility and subjective probability should not be defined in terms of preferences over uncertain prospects. On the contrary, rational decision makers need only consider preferences over certain outcomes. It will be shown that utility and probability functions derived in a non-Bayesian manner can be used for generating preferences over uncertain prospects that support the principle of maximising subjective expected utility. To some extent, this non-Bayesian view gives an account of what modern decision theory could have been like, had decision theorists not entered the Bayesian path discovered by Ramsey, de Finetti, Savage, and others. I will not discuss all previous non-Bayesian positions presented in the literature. Some demarcation lines between alternative non-Bayesian positions will simply be taken for granted. Most notably, I assume that some version of the Humean belief-desire model of action is correct. Decision theories that seek to derive normative conclusions from other entities than beliefs and desires (such as objective frequencies or propensities) will hardly be discussed at all. By sticking to the traditional belief-desire model of action, I hope to retain as much as possible of what I think are the good features of the Bayesian approach, without being committed to accepting the less attractive parts. The present work is mainly concerned with philosophical issues in decision theory. Although a number of technical results are presented, the focus is set on conceptual and normative problems. All proofs appear in the appendix. Only the most elementary kinds of decision problems are considered, that is, single decisions taken
by a single agent at a given point in time. More complicated decision problems inevitably require a more complex technical apparatus, but the philosophical significance of those problems seldom stands in proportion to the technical apparatus required for handling them. *** The opportunity to write this book arose when I accepted a research position in the Department of History and Philosophy of Science at the University of Cambridge. I wish to thank all my colleagues for their support and for creating such a stimulating research atmosphere in the department. The book is, however, based on a number of articles I have written over the past five years while working at the Royal Institute of Technology in Stockholm and at Luleå University of Technology, so I am also deeply indebted to my colleagues there. In particular, I would like to thank Sven Ove Hansson for comments and helpful criticism of nearly all views and arguments put forward in this book. Without his ability to quickly and precisely identify the weak part of an argument, this book could never have been completed. I would also like to thank Nicholas Espinoza for stimulating discussions on indeterminate preferences. A large number of people have given invaluable comments on individual chapters or the papers on which they are based. In particular, I would like to thank Barbro Björkman, Anna Bjurman, Sven Danielsson, John Cantwell, Johan Gustafsson, Stephen John, Peter Kesting, Karsten Klint Jensen, Duncan Luce, Wlodek Rabinowicz, Per Sandin, Tor Sandqvist, Nils-Eric Sahlin, and Teddy Seidenfeld. My work on this project has been partially funded by a generous grant from the Swedish Rescue Services Agency. Chapters 1, 2, 5 and 6 are based on previously unpublished material. Chapter 3 is based on, but not identical to, Peterson (2003a), (2003b), and (2004a) and Peterson and Hansson (2004). Most of Chapter 4 is taken from Peterson (2006a) and Espinoza and Peterson (2006). I wish to thank Espinoza and Hansson for allowing me to include material from our joint papers. The formal results in Chapter 7 originally appeared in Peterson (2002a) and (2004b). Chapter 8 is based on Peterson (2002b), (2006b), and (2006c). I thank the editors of the journals in which the papers appeared for letting me reproduce substantial sections of them here. I dedicate this book to my children, Louise and Henrik. Cambridge, March 2008 Martin Peterson
Contents

1 Introduction . . . 1
   1.1 The subjective non-Bayesian approach . . . 4
   1.2 A criterion of rationality for ideal agents . . . 5
   1.3 Basic concepts . . . 7
   1.4 Preview . . . 11
2 Bayesian decision theory . . . 13
   2.1 The basic idea . . . 14
   2.2 From objective to subjective probability . . . 17
   2.3 The purely subjective approach . . . 20
   2.4 The propositional approach . . . 23
   2.5 Do Bayesians put the cart before the horse? . . . 26
3 Choosing what to decide . . . 31
   3.1 Transformative and effective rules defined . . . 33
   3.2 A comparison structure for formal representations . . . 36
   3.3 An axiomatic analysis of transformative decision rules . . . 37
   3.4 Strong versus weak monotonicity . . . 40
   3.5 Two notions of permutability . . . 43
   3.6 More on iterativity . . . 48
   3.7 Acyclicity . . . 52
   3.8 Rival representations . . . 53
4 Indeterminate preferences . . . 61
   4.1 Previous accounts of preferential indeterminacy . . . 62
   4.2 What is a preference? . . . 64
   4.3 Introduction to the probabilistic theory . . . 68
   4.4 The probabilistic analysis of preference . . . 69
   4.5 Reflexivity, symmetry, and transitivity . . . 70
   4.6 The choice axiom is not universally valid . . . 73
   4.7 Spohn and Levi on self-predicting probabilities . . . 74
   4.8 Further remarks on indeterminate preferences . . . 79
5 Utility . . . 81
   5.1 The classical theory . . . 82
   5.2 The probabilistic theory . . . 87
   5.3 A modified version of the probabilistic theory . . . 89
   5.4 Can desires be reduced to beliefs? . . . 91
   5.5 Second-order preferences . . . 92
6 Subjective probability . . . 95
   6.1 Why not objective probability? . . . 96
   6.2 Why not Bayesian subjective probability? . . . 99
   6.3 Non-Bayesian subjective probability . . . 102
   6.4 Subjective non-Bayesian probability and horse race lotteries . . . 105
   6.5 Concluding remarks . . . 107
7 Expected utility . . . 109
   7.1 From Pascal to Allais . . . 110
   7.2 Preamble to the new axiomatisations . . . 112
   7.3 The rule-based axiomatisation . . . 112
   7.4 The act-based axiomatisation . . . 118
   7.5 The Allais paradox . . . 120
   7.6 The independence axiom vs. the trade-off principle . . . 123
8 Risk aversion . . . 127
   8.1 Beyond the Pratt-Arrow concept . . . 128
   8.2 The first impossibility theorem . . . 130
   8.3 The second impossibility theorem . . . 133
   8.4 The precautionary principle . . . 134
   8.5 The fourth impossibility theorem . . . 138
   8.6 Risk aversion as an epistemic concept? . . . 140
Appendix A: Proofs . . . 143
References . . . 163
Index . . . 169
Chapter 1
Introduction
A while back a beautiful woman, whom I quite liked, asked me to marry her. I was stunned. Marriage? Now? It is too early! I have not even turned forty! However, for one reason or another I decided not to share my spontaneous reaction with her. I said I felt overwhelmed by this flattering, although rather unexpected proposal, and that I needed some time to think it over. At dawn the following day I sneaked out and raced to the university library. I borrowed all the books I could find on decision theory. Later the same afternoon, after having learned what modern decision theory is all about, I still had no clue how to answer the lady. Most decision theorists agree that there is nothing special about marriage proposals – at least not from a theoretical point of view. A successful decision theory should be equally applicable to choosing a partner as to decisions on financial investments or environmental management, or in issues related to health and safety. This is because decision theorists seek to make a perfectly general claim about rational decision making. According to the overwhelming majority of scholars, the aim of decision theory is to characterise what an agent ought to do, given his or her present beliefs and desires. What kind of issue these beliefs and desires are about, is irrelevant. The literature on decision theory is huge and ever-expanding; it is difficult to draw a comprehensive map of the field. The present book is concerned with a single, well-defined problem in decision theory, viz. the controversy over Bayesianism and non-Bayesianism. The essence of the dispute is the following. Bayesians think it is enough that rational agents behave as if they maximise subjective expected utility, whereas non-Bayesians believe one should choose an act over another because its subjective expected utility is optimal. (The subjective expected utility of an act is the sum of utilities of all its possible outcomes, weighted by the agent’s subjective probability of each outcome.) For several decades, Bayesian views have dominated the field. The aim of this book is to challenge the Bayesian approach and show that there is a viable non-Bayesian alternative. The difference between the two approaches can be illustrated in the marriage example. There are two alternatives, marriage or no marriage. Marriage may lead to roughly two possible outcomes, bliss or divorce. If the proposal is rejected, the
status quo will be preserved. In what may be called a common sense approach to rational choice, the agent should estimate the desirability and probability of each outcome before he makes his choice. Suppose the degree to which the agent desires the three outcomes can be represented by real numbers (on an interval scale)1 : Bliss +100, status quo 0, divorce -100. Also suppose the agent estimates the probability of a happy marriage to be 0.1, and that of a divorce to be 0.9. Then, since (0 · 0.1) + (0 · 0.9) > (+100 · 0.1) + (−100 · 0.9), it follows that the expected utility of not getting married exceeds that of accepting the proposal. Bayesians reject the common sense approach. The problem, as they perceive it, is that it gives no technically precise meaning to the concepts of utility and probability. Where do the numbers in the example come from, and how can one be sure that they truly reflect the agent’s beliefs and desires? Furthermore, why should one apply the principle of maximising expected utility, rather than some other decision rule? Because of these problems, Bayesians favour a radically different approach. Instead of forming preferences over alternative acts by multiplying utilities and subjective probabilities of outcomes, Bayesians generate utilities and subjective probabilities from preferences over uncertain prospects and act as if they were guided by the expected utility principle. Exactly what is meant by the phrase ‘as if’ will be clarified in Chapter 2. The non-Bayesian approach advocated here seeks to overcome the Bayesian objections to the common sense approach. By developing technically precise notions of utility and subjective probability and then proposing a non-Bayesian axiomatisation of the expected utility principle, support is given to the claim that rational agents choose an act over another because its subjective expected utility is optimal. Having given a rough characterisation of the difference between Bayesian and non-Bayesian approaches, it should be noted that the terms ‘Bayesianism’ and ‘non-Bayesianism’ are often given many different meanings in the literature. In the present context, it is particularly important to distinguish between Bayesian decision theory and Bayes’s theorem. The latter is a simple mathematical result concerning how to update partial beliefs in light of new evidence. Bayesian decision theory is, in contrast, a claim about rational decision making. In its simplest form, Bayes’s theorem tells us that the probability of a hypothesis H given a piece of evidence E is equal to the probability of H multiplied by the probability of E given H, divided by the probability of E. A remarkable fact about Bayes’s theorem is that two or more statisticians who disagree on the unconditional probability, the prior, of e.g. a coin being biased, will come to agree on the conditional probability that the coin is biased after having tossed it a large number of times. The priors will be washed out. Hence, one can figure out the probability that a coin is biased by tossing it a large number of times and then calculating the conditional probability that it is biased given that it landed heads up a certain number of times.2 1 An interval scale is a scale that is invariant up to a positive linear transformation, that is, it preserves distances between objects. 2 For an accessible overview of Bayesian statistics, see Howson and Urbach (2006).
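Both calculations described in this passage, the expected-utility comparison in the marriage example and the washing out of priors by Bayes's theorem, can be spelled out in a few lines of code. The following Python sketch is purely illustrative: the function names (expected_utility, posterior_biased) and the assumed 0.8 bias of the 'biased' coin hypothesis are not the author's, and nothing in the book depends on them.

```python
# A minimal sketch of the two calculations described above (illustrative only).

def expected_utility(outcomes):
    """Sum of subjective probability times utility over the possible outcomes."""
    return sum(p * u for p, u in outcomes)

# The marriage example: bliss +100, status quo 0, divorce -100,
# with P(bliss) = 0.1 and P(divorce) = 0.9 if the proposal is accepted.
eu_accept = expected_utility([(0.1, 100), (0.9, -100)])   # -80.0
eu_reject = expected_utility([(1.0, 0)])                  # 0.0
assert eu_reject > eu_accept   # declining the proposal maximises expected utility

# Bayes's theorem, P(H | E) = P(E | H) * P(H) / P(E), applied to the coin example:
# statisticians with very different priors on 'the coin is biased towards heads'
# end up with almost the same posterior after many tosses; the priors are washed out.
def posterior_biased(prior, heads, tosses, bias=0.8):
    """P(coin is biased | data), where a biased coin lands heads with probability `bias`."""
    likelihood_biased = bias ** heads * (1 - bias) ** (tosses - heads)
    likelihood_fair = 0.5 ** tosses
    evidence = likelihood_biased * prior + likelihood_fair * (1 - prior)
    return likelihood_biased * prior / evidence

for prior in (0.01, 0.5, 0.99):
    print(round(posterior_biased(prior, heads=78, tosses=100), 4))  # all close to 1.0
```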
Bayes’s theorem is not a part of Bayesian nor any other decision theory. Bayes’s theorem is a claim about what it is rational to believe after new information has been received. Bayesian decision theory is, on the other hand, a claim about what to do once one’s partial beliefs and desires have been fixed. In a decision on getting married, Bayes’s theorem can, at least in principle, be applied for calculating the probability that one will lead a happier life were one to get married. However, the desire for happiness must be weighed against others, such as the desire for freedom, and this is not an epistemic issue. The aggregation of beliefs and desires into a judgement about rational choice is, naturally, a decision-theoretical issue.3 In what follows it will be assumed that all decision-makers, both Bayesian and non-Bayesian, accept Bayes’s theorem and use it for updating their partial beliefs in light of new evidence. This theorem is not what the discussion is all about. As indicated above, Bayesian decision theory is a claim about a certain way of defending the principle of maximising subjective expected utility. It holds that probabilities are subjective degrees of belief that can be determined by observing the agent’s preferences over uncertain prospects. For example, if you prefer the uncertain prospect ‘win $10 if rain, $5 if no rain’ to the uncertain prospect ‘win $5 if rain, $10 if no rain’, and given that you think $10 is better than $5, then Bayesians are committed to saying that your subjective degree of belief that it will rain is higher than your subjective degree of belief that it will not. Scholars advocating the Bayesian view are also committed to a certain view about the concept of utility. Both Bayesians and non-Bayesians use the concept of utility for measuring the strength of the agent’s desires. However, what is unique for Bayesianism compared to non-Bayesianism is the claim that utilities can be determined in the same way as probabilities, i.e. by observing the agent’s preferences over uncertain prospects. Suppose, for example, that the agent prefers a Mercedes to a Volvo, and a Volvo to a Ford, and is indifferent between a Volvo for certain and a fifty-fifty chance to win a Mercedes or a Ford. Then a Mercedes is worth 1, a Volvo 0.5, and a Ford 0 (or any positive linear transformation of these numbers) on the agent’s personal utility scale.4 In summation, Bayesian decision theorists maintain that if an agent’s preferences over uncertain prospects satisfy a number of structural conditions, then she behaves as if she were choosing acts that maximise her subjective expected utility. In Bayesian decision theory, subjective probabilities and utilities are not reasons for preferring one act over another. As explained above, subjective probability and utility are hypothetical entities that can be constructed only after the agent’s preferences over uncertain prospects have been revealed. However, what most vacillating agents are looking for, potential bridegrooms included, is presumably a decision 3
3 Some choices are, of course, epistemic choices, i.e. choices about what to believe. This does not invalidate the distinction, however. The best way to draw the line is to emphasise that only action-guiding theories take the agent's desires into account. Purely belief-guiding claims like Bayes's theorem aim at approximating the truth, without taking any notice of the agent's desire for learning the truth. 4 As will be explained in Chapter 2, Bayesians define the concept of a fifty-fifty chance by applying Ramsey's concept of an 'ethically neutral' proposition. See Ramsey (1926).
theory in which the agent’s actual desires and beliefs figure as genuine reasons for deciding what to do. The non-Bayesian decision theory defended here is an attempt to construct a theory that meets this demand.
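The inference from the rain bets mentioned above can be written out explicitly. The derivation below is a sketch resting on two assumptions already present in the text: the agent's preference between the two bets is fixed entirely by his utilities and by his degree of belief p that it will rain, and it goes by expected utility, with u($10) > u($5).

```latex
% Preferring 'win $10 if rain, $5 if no rain' to 'win $5 if rain, $10 if no rain'
% commits the agent to regarding rain as more likely than not:
\begin{align*}
  p\,u(\$10) + (1-p)\,u(\$5) &> p\,u(\$5) + (1-p)\,u(\$10)\\
  \iff\quad (2p-1)\bigl(u(\$10)-u(\$5)\bigr) &> 0\\
  \iff\quad p &> \tfrac{1}{2} \qquad \text{since } u(\$10) > u(\$5).
\end{align*}
```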
1.1 The subjective non-Bayesian approach Non-Bayesian decision theory is a heterogenous set of theories. For example, theories advocating objective concepts of probability, such as Laplace (1814), Keynes (1921), Popper (1957), and Mellor (1971) are all non-Bayesian. However, the present work defends a subjective non-Bayesian decision theory, i.e. a theory based on a subjective notion of probability. The subjective non-Bayesian approach retains and develops the attractive Humean intuition that beliefs and desires—and nothing else—constitute reasons for choosing among uncertain prospects. It is thus irrelevant what the objective propensity or frequency of an event happens to be. It should be emphasised that I will simply take for granted that beliefs and desires constitute reasons for action. No attempt will be made to offer any justification for this claim. I will also refrain from discussing more general aspects of the concept of reason, and how reasons are related to action. There is a huge contemporary literature addressing these problems, to which no contribution will be made here.5 As explained above, the main thesis defended by advocates of the subjective nonBayesian approach is that rationality requires us to maximise subjective expected utility, even though utility and subjective probability should not be defined in terms of preferences over uncertain prospects, contrary to what Bayesians argue. Instead, utility and probability should be defined in terms of preferences over certain outcomes. Utility and probability functions derived in this manner can then be used to generate preferences over uncertain prospects, which conform to the principle of maximising expected utility. It is convenient to articulate the subjective non-Bayesian view in a number of, so far loosely interconnected claims. The first claim is a negative one. Briefly put, I argue that Bayesian decision theory is unavailing from an action-guiding perspective. For the deliberating agent, the output of Bayesian decision theory is not a set of preferences over alternative acts—these preferences are on the contrary used as input to the theory. Instead, the output of a decision theory based on the Bayesian approach is a (set of) utility function(s) that can be used for describing the agent as an expected utility maximiser. This is why ideal agents do not prefer an act because its expected utility is favourable, but can only be described as if they were acting from this principle. The second claim is that a successful decision theory must incorporate a theory about the initial, representational phase of decision making, i.e. how to represent decision problems in formal representations. Briefly put, I will show that all
5 See e.g. Dancy (2004) and Broome (2005).
permissible transformations of one representation of a decision problem into another are governed by a single, intuitive axiom. A further claim concerns the concepts of utility and subjective probability. Instead of defining utility and probability in terms of preferences over uncertain prospects, it is shown that these concepts can be defined by making use of a philosophical claim about the nature of preferences. The basic idea is that preferences, including preferences over certain prospects, are sometimes indeterminate. Traditionally, it is assumed that an agent who has a preference knows for sure that he has the preference in question and is disposed to act upon it. However, for reasons to be explained later, I argue that the picture of rational preference inherent in this assumption is too simplistic. As an alternative to the deterministic account of preferences, a probabilistic theory of indeterminate preferences is developed, which aligns well with the non-Bayesian approach. From this theory of preferences, non-Bayesian notions of utility and subjective probability can be derived, which are not based on preferences over uncertain prospects. Two new non-Bayesian axiomatisations of the expected utility principle will also be presented. They differ from Bayesian ones in two respects. First, they are more straightforward in that they do not prove representation or uniqueness theorems (for preferences over uncertain prospects). Instead, the principal idea is to use the concept of transformative decision rules for decomposing the principle of maximising expected utility into a sequence of normatively reasonable subrules. It is shown that this technique provides a resolution of the Allais paradox that cannot be obtained in other axiomatisations. An additional advantage of the non-Bayesian axiomatisations is that they do not rely on any version of the much criticised independence axiom or sure-thing principle, adopted by Ramsey, Savage, and some (but not all) of their successors. Of course, both the sure-thing principle and the independence axiom can be derived from the principle of maximising expected utility, and must hence be regarded as true by an adherent to this decision rule. However, they are not necessary for deriving the expected utility principle. This point is subtle, but important to bear in mind when discussing the justification of the expected utility principle.
1.2 A criterion of rationality for ideal agents Having given a brief characterisation of the view I seek to defend, it is essential to explain what this theory is supposed to concern. In decision theory, it is commonplace to make a distinction between normative and descriptive decision theory. Normative decision theory seeks to offer advice on how to behave in situations involving risk or uncertainty. The discipline emerged in the 17th century, as mathematicians and philosophers began to discuss the ‘fair price’ for taking part in various games and lotteries. Descriptive decision theory, on the other hand, aims at describing and explaining how decisions are actually made. This book is a contribution to the literature on normative decision theory.
Nearly all research in normative decision theory aims at finding out to what ideal agents are committed. An ideal agent is defined as an agent whose ability to process and store information is physically unlimited6, and whose preferences do not violate a set of axioms formulated by the decision theorist. The axioms typically regulate how beliefs and desires are linked to choices, or how sets of choices are linked to each other. Non-ideal agents violate at least one of the axiomatic constraints, or fail to process sufficiently large amounts of information. The difference between ideal and non-ideal agents can be contrasted with an additional distinction between criteria of rationality and decision making procedures. A criterion of rationality specifies the conditions that must be fulfilled if an act is rational, but it does not give any advice on how to find out which acts are rational. An agent who actually starts to calculate, say, the expected utility of a set of alternative acts every time he faces a decision, is likely to end up making mistakes or spending too much time doing his calculations. In either case, the consequence might be that the agent fails to choose the act prescribed by the expected utility principle. This problem can be partly avoided by implementing a decision making procedure saying that one should only make a rough estimate of the expected utility of an act when making everyday decisions.7 By combining the two distinctions, four different categories of decision theories can be distinguished.

Table 1.1
             Ideal agent                Non-ideal agent
Criterion    Ramsey, Savage, Jeffrey    Weirich, Pollock
Procedure    (empty)                    Hammond et al., Clemen
In the upper left corner are theories proposing criteria of rationality for ideal agents. The theory proposed herein, as well as the theories advocated by Ramsey (1926), Savage (1954/72), and Jeffrey (1983), are best conceived of as criteria of rationality for ideal agents. These theories can be contrasted with theories proposing criteria of rationality for non-ideal agents. Two recent contributions to that field are Weirich’s book Realistic Decision Theory (2004) and Pollock’s Thinking About Acting (2006). These authors seek to develop decision theories that are applicable to human agents who have limited capacities to store and process information. This is, of course, an important area of investigation, but it will not be discussed any further here. The lower right corner of the matrix is reserved for theories advising nonideal agents about decision making procedures. These kinds of decision theories 6 It does not follow that an ideal agent has a totally unlimited computing capacity. For example, if a problem is not Turing-computable, then no agent (not even an ideal one) could overcome this restriction. 7 Note that one can decide which decision making procedure is best only if one can somehow compare the recommendations yielded by the decision making procedure with those derived from the criterion of rightness. This might very well turn out to be impossible.
are frequently developed by business consultants. Clemen’s book Making Hard Decisions (1991) and Hammond, Keeney, and Raiffa’s Smart Choices: A Practical Guide To Making Better Life Decisions (1999) are illuminating examples. The lower left corner of the matrix is empty. As far as I am aware, no one has yet proposed a decision making procedure for ideal agents. This is not because ideal agents have no need of such a procedure. Even if there are no physical limitations to one’s ability to process large amounts of information, it might still be true that the expected utility of not actually doing the calculations oneself would be higher. The agent’s computing capacity could then be redirected to work on other, more urgent problems. The non-Bayesian view I propose is, of course, not a solution to all problems in decision theory. In fact, many interesting issues discussed by contemporary decision theorists will not be touched upon at all in this book. As will be explained in Section 1.3, my theory is for example neutral with respect to the controversy over causal and non-causal decision theory. The focus is on the controversy over Bayesianism and non-Bayesianism.
1.3 Basic concepts As no theory can be more precise than the definitions of its basic concepts, it is worth spelling out in some detail the basic concepts employed in the non-Bayesian theory. A decision problem will be conceived of as all entities in the world that prompt the agent to make a choice, as well as all entities that are relevant for that choice. The latter set includes the agent’s partial beliefs and desires. A formal representation of a decision problem, on the other hand, is constituted by a set of symbols representing the decision problem. The distinction between decision problems and formal representations of decision problems is commonplace in decision theory, even though it is sometimes expressed in other words. Jeffrey, for instance, uses the term ‘formal Bayesian decision problem[s]’, which he defines as ‘two rectangular arrays (matrices) of numbers which represent probability and desirability assignments to the act-condition pairs.’8 Savage speaks of ‘formal descriptions’ of decision problems, whereas Resnik uses the term ‘problem specification’. For simplicity, I shall use the terms ‘formal decision problem’ and ‘formal representation’ as being synonymous with the somewhat clumsy term ‘formal representation of a decision problem’. The meaning of the three terms is the same. To model one’s decision problem in a formal representation is essential in decision theory, since decision rules are only defined relative to such formal representations. For example, it makes no sense to say that the principle of maximising expected utility recommends one act rather than another, unless there is a formal representation listing the available acts, the possible states of the world, and the 8
8 Jeffrey (1983:5).
corresponding probability and utility functions. Formal representations of decision problems are sometimes visualised in decision matrices or decision trees. I shall conceive of a formal representation as an ordered quadruple π = ⟨A, S, P, U⟩. The intended interpretation of its elements is as follows. A = {a1, a2, . . .} is a non-empty set of acts. S = {s1, s2, . . .} is a non-empty set of states. P = {p1: A × S → [0, 1], p2: A × S → [0, 1], . . .} is a set of probability functions.9 U = {u1: A × S → Re, u2: A × S → Re, . . .} is a set of utility functions. An act is, intuitively speaking, an uncertain prospect. I shall use the term 'uncertain prospect' when I speak of alternatives in a loose sense. The term 'act' is used when more precision is required. By introducing the quadruple ⟨A, S, P, U⟩, a substantial assumption is made. The assumption implies that everything that matters in rational decision making can be represented by the four sets A, S, P, U. Hence, nothing except acts, states, and numerical representations of partial beliefs and desires is allowed to be of any relevance. This is a standard assumption in decision theory. A formal decision problem under risk is a quadruple π = ⟨A, S, P, U⟩ in which each of the sets P and U has exactly one element. A formal decision problem under uncertainty is a quadruple π = ⟨A, S, P, U⟩ in which P = ∅ and U has exactly one element. Since P and U are sets of functions rather than single functions, agents are allowed to consider several alternative probability and utility measures in a formal decision problem. This set-up can thus model what is sometimes referred to as 'epistemic risks', i.e. cases in which there are several alternative probability and utility functions that describe a given situation.10 Note that the concept of an 'outcome' or 'consequence' of an act is not explicitly employed in a formal decision problem. Instead, utilities are assigned to ordered pairs of acts and states. Also note that it is not taken for granted that the elements in A and S (which may be either finite or countably infinite sets) must be jointly exhaustive and mutually exclusive. To determine whether such requirements ought to be levied or not is a normative issue that will be analysed in greater detail in subsequent chapters. It is easy to jump to the conclusion that the framework employed here presupposes that the probability and utility functions have been derived ex ante (i.e. before any preferences over alternatives have been stated), rather than ex post (i.e. after the agent has stated his preferences over the available acts). However, the sets P and U are allowed to be empty at the beginning of a representation process, and can be successively expanded with functions obtained at a later stage. The formal set-up outlined above is one of many alternatives. In Savage's theory, the fundamental elements of a formal decision problem are taken to be a set S of
9 I take for granted that all elements in P satisfy the axioms of the probability calculus. (Otherwise p1, p2, . . . would not, of course, be probability functions.) 10 Cf. Levi (1980), Gärdenfors and Sahlin (1982).
states of the world and a set F of consequences. Acts are then defined as functions from S to F. No probability or utility functions are included in the formal decision problem. These are instead derived ‘within’ the theory by using the agent’s preferences over uncertain prospects (that is, risky acts). Jeffrey’s theory is, in contrast to Savage’s, homogenous in the sense that all elements of a formal decision problem—e.g. acts, outcomes, probabilities, and utilities—are defined on the same set of entities, namely a set of propositions. For instance, ‘[a]n act is . . . a proposition which it is within the agent’s power to make true if he pleases’, and to hold it probable that it will rain tomorrow is ‘to have a particular attitude toward the proposition that it will rain tomorrow’.11 In line with this, the conjunction B ∧C of the propositions B and C is interpreted as the set-theoretic intersection of the possible worlds in which B and C are true, and so on. Note that Jeffrey’s way of conceiving an act implies that all consequences of an act in a decision problem under certainty are acts themselves, since the agent can make those propositions true if he pleases. Therefore, the distinction between acts on the one hand and consequences on the other cannot be upheld in his terminology, which appears to be a drawback. However, irrespective of this, the homogenous character of Jeffrey’s set-up is no decisive reason for preferring it to Savage’s, since the latter can easily be reconstructed as a homogenous theory by widening the concept of a state. The consequence of having a six-egg omelette can, for example, be conceived as a state in which the agent enjoys a six-egg omelette; acts can then be defined as functions from states to states. A similar manoeuvre can, mutatis mutandis, be carried out for the quadruple A, S, P,U. I think it is more reasonable to take states of the world, rather than propositions, to be the basic building blocks of formal decision problems. States are what ultimately matter for agents, and states are less opaque from a metaphysical point of view. Propositions have no spatio-temporal location, and one cannot get into direct acquaintance with them. Arguably, the main reason for preferring Jeffrey’s set-up would be that things then become more convenient from a technical point of view, since it is easy to perform logical operations on propositions. In my humble opinion, however, technical convenience is not the right kind of reason for making metaphysical choices. An additional reason for conceiving of a formal decision problem as a quadruple A, S, P,U, rather than in the way proposed by Savage or Jeffrey, is that it is neutral with regard to the controversy over causal and evidential decision theory. To take a stand on that issue would be beyond the scope of the present work. In Savage’s theory, which has been claimed to be ‘the leading example of a causal decision theory’, it is explicitly assumed that states are probabilistically independent of acts, since acts are conceived of as functions from states to consequences.12 This requirement makes sense only if one thinks that agents should take beliefs about causal relations into account: Acts and states are, in this type of theory, two independent entities that 11 12
11 Jeffrey (1983:59 and 84). 12 Broome (1999:103).
together cause outcomes. In an evidential theory such as Jeffrey's, the probability of a consequence is allowed to be affected by what act is chosen; the agent's beliefs about causal relations play no role. Evidential and causal decision theories come to different conclusions in Newcomb-style problems.13 For a realistic example, consider the smoking-caused-by-genetic-defect problem:14 Suppose that there is some genetic defect that is known to cause both lung cancer and the drive to smoke. In this case, the fact that 80 percent of all smokers suffer from lung cancer should not prevent a causal decision theorist from starting to smoke, since (i) one either has that genetic defect or not, and (ii) there is a small enjoyment associated with smoking, and (iii) the probability of lung cancer is not affected by one's choice.15 An evidential decision theorist would, on the contrary, conclude (incorrectly) that if you start to smoke, there is an 80 percent risk that you will contract lung cancer. It seems obvious that causal decision theory, but not its evidential rival (in its most naïve version), comes to the right conclusion in the smoking-caused-by-genetic-defect problem. Some authors have proposed that for precisely this reason, evidential decision theory should be interpreted as a theory of valuation rather than as a theory of decision.16 A theory of valuation ranks a set of alternative acts with regard to how good or bad they are in some relevant sense, but it does not prescribe any acts. Valuation should, arguably, be ultimately linked to decision, but there is no conceptual mistake involved in separating the two questions. A significant advantage of e.g. Jeffrey's evidential theory, both when interpreted as a theory of decision and as a theory of valuation, is that it does not require that we understand what causality is. The concept of causality plays no role in this theory. Since there are significant pros and cons for either the causal or evidential approach, it seems reasonable to opt for a set-up that allows for both kinds of theories, until the dispute has been resolved. Thus, advocates of causal decision theory should claim that the functions P = {p1: A × S → [0, 1], p2: A × S → [0, 1], . . .} ought to be 'inert' with respect to A, i.e. that probabilities can be described equally well by a set of functions P = {p1: S → [0, 1], p2: S → [0, 1], . . .}. Evidential decision theorists like Jeffrey should, on the contrary, insist that probabilities must indeed be represented by functions that take pairs of acts and states as their arguments.17
13 Newcomb's problem was introduced in Nozick (1969). 14 According to Pollock (2002:143-144), this example was first mentioned by Robert Stalnaker in a letter to David Lewis in 1978. 15 For an excellent discussion of causal decision theory, see Joyce (1999). 16 See Broome (1999:104–6). 17 The claim that my set-up is neutral with respect to the controversy over causal and evidential decision theory is, of course, not a decisive reason for choosing it. It seems likely that it is also possible to construct some propositional approach that is neutral with respect to this controversy.
1.4 Preview In Chapter 2 the Bayesian approach is spelled out in more detail, in a way I believe is accessible for readers who have no background in decision theory. After having explained what I take to be the core of Bayesian decision theory, I will argue that this theory ‘puts the cart before the horse’, from the point of view of the deliberating agent. The theory defines subjective probability and utility in terms of preferences over uncertain prospects. Therefore, those probability and utility numbers cannot figure as reasons for forming new preferences over the same set of uncertain prospects. In Chapter 3, I develop an account of how decision problems ought to be represented in formal decision problems. For this purpose, it is fruitful to distinguish between two classes of decision rules, viz. effective and transformative decision rules. Effective decision rules yield prescriptions on how to act on the basis of the available information. The principle of maximising expected utility, as well as the maximin and minimax regret rules are well-known examples of effective decision rules. Transformative decision rules, on the other hand, do not directly prescribe any particular act or set of acts. Instead, they transform a given decision problem into another by altering the structure of the initial problem. A transformative decision rule can, more precisely, change the set of alternatives or the set of states of the world taken into consideration, modify the probabilities assigned to the states of the world, or modify the utilities assigned to the corresponding outcomes. The upshot of Chapter 3 is an axiomatic characterisation of transformative decision rules, based on a single axiom. Chapter 4 questions the common assumption that preferences over alternative acts are determinate. As an alternative to the deterministic account of preference, a probabilistic theory of indeterminate preferences is developed—one which fits well with the non-Bayesian approach. Suppose, for example, that you cannot make up your mind in a choice between two bottles of wine. That is, it is not the case that you prefer bottle A to bottle B, nor that you prefer B to A, nor that you are indifferent between A and B. Then, instead of just saying that you consider A and B to be incomparable, the probabilistic theory allows one to say that you consider the probability that you will choose A over B to be p. This theory allows for more minute statements about incomparability. In Chapter 5 the probabilistic theory of preferences is applied for justifying a non-Bayesian theory of utility. The technical part of the theory relies on a theorem proved by Duncan Luce (1959/2005). However, unlike Luce, I favour a subjective interpretation of the probabilities assigned to the subject’s choices. The chapter also considers two other non-Bayesian notions of utility, viz. the classical concept, holding that utility should be defined in terms of pleasurable mental states, as well as a theory proposed by Halld´en (1980) and Sahlin (1981). Chapter 6 outlines a non-Bayesian theory of subjective probability, based on the theory proposed by DeGroot (1970). DeGroot assumes that agents can make qualitative comparisons between pairs of events (states), and judge which one they think is most likely to occur. He then proves that, if the agent’s qualitative judgements
are sufficiently fine-grained and satisfy a number of structural axioms, there exists a function p that assigns real numbers between 0 and 1 to all events, such that one event is judged to be more likely than another if, and only if, it is assigned a higher number. Moreover, p satisfies the axioms of the probability calculus, so it is a genuine subjective probability function. The end result of the chapter is what I take to be an improved version of DeGroot's theory. The main advantage is that the new version explicates a procedure for linking subjective probability to observable choice. Chapter 7 is devoted to an axiomatic treatment of the principle of maximising expected utility. Two different axiomatisations are proposed. A key observation in the first axiomatisation is that the principle of maximising expected utility can be broken down into a sequence of transformative and effective subrules that can be justified individually. The axioms together imply that it is normatively reasonable to apply the principle of maximising expected utility to every decision problem under risk. The proof is obtained by constructing a sequence of transformative and effective decision rules. The second axiomatisation makes no use of the notion of transformative decision rules. None of the axioms proposed in Chapter 7 are implied by, nor can they be derived from, the independence axiom or the sure-thing principle. It is argued that this opens the door for a resolution of the notorious Allais paradox, which cannot be obtained by using arguments relying on those axioms. Finally, in Chapter 8, I analyse the concept of risk aversion from a non-Bayesian point of view. In the widely accepted theory of risk aversion developed by Pratt (1964) and Arrow (1970), a risk averter is defined as an agent who rejects actuarially fair bets, e.g. a bet in which the probability is fifty percent that you win $1 million and fifty percent that you lose the same amount. The Pratt-Arrow theory of risk aversion has gained influence mainly because it reconciles the expected utility principle with several paradigmatic examples of risk aversion, such as insurance and fixed interest rates. However, it has been claimed that when it comes to decisions with potentially fatal outcomes, such as death, the Pratt-Arrow concept of risk aversion is too weak and ought to be replaced with some stronger concept. Chapter 8 argues against the use of such a stronger concept by proposing a number of impossibility results for decision rules that are risk averse in a sense that goes beyond the Pratt-Arrow concept.
Chapter 2
Bayesian decision theory
Thomas Bayes is best known for his formula for calculating conditional probabilities. However, he also deserves recognition for his observation that the concepts of probability and utility can be defined in terms of preferences over a set of uncertain prospects: The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the chance of the thing expected upon it’s happening . . . If a person has an expectation depending on the happening of an event, the probability of the event is to the probability of its failure as his loss if it fails to his gain if it happens. (Bayes 1763:376-77)
Contemporary accounts of Bayesian decision theory are faithful to Bayes’s original idea. The main difference is that modern theorists seek to render the theory much more precise. The following three principles summarise modern Bayesian decision theory: 1. Subjective degrees of belief are represented by a probability function defined in terms of preferences over uncertain prospects, as suggested by Bayes himself. 2. Degrees of desire are represented by a utility function defined in terms of preferences over uncertain prospects. 3. Rational agents act as if they were maximising subjective expected utility, by multiplying utilities and subjective probabilities. Depending on how the principles of Bayesianism are rendered more precise, the demarcation between Bayesian and non-Bayesian positions can be drawn in slightly different ways. However, for present purposes, it is wise to remain somewhat fuzzy about where exactly to draw the line. Bayesianism is best understood as an active field of research, in which slightly different formulations of certain shared ideas are investigated and elaborated. The aim of this chapter is to give an overview of the ideas central to Bayesian decision theory. The first modern, axiomatic account of Bayesian decision theory was presented by Ramsey in ‘Truth and Probability’ (1926). However, Ramsey never worked out his axiomatisation in detail, since ‘this would, I think, be rather like working out to
seven places of decimals a result only valid to two.’1 Many contemporary scholars find Ramsey’s view to be question-begging. In what follows, three Bayesian theories will be scrutinised, all of which work out the Bayesian position ‘to seven places of decimals’. The selected theories represent different philosophical approaches to Ramsey’s original idea. The first theory is Anscombe and Aumann (1963), who derive subjective probabilities from objective ones. The second theory is Savage (1954/72), who advocates a purely subjective approach. Finally, the third theory is Jeffrey (1983). His theory articulates a propositional ideal according to which acts, states and outcomes should be defined on the same set of entities, viz. a set of propositions. The Anscome-Aumann approach is interesting in that it is transparent from a technical point of view, so it makes sense to present it first. Savage’s theory, which has been remarkably influential over the past fifty years, is considerably more complex. Jeffrey’s theory is—together with Ramsey’s—among the philosophically most sophisticated Bayesian decision theories developed so far. The plan of this chapter is as follows. Section 2.1 gives a detailed but nontechnical introduction to the basic ideas of Bayesian decision theory. Sections 2.2 to 2.4 are devoted to the theories presented by Anscombe and Aumann, Savage, and Jeffrey, respectively. Section 2.5 presents an argument for considering alternative, non-Bayesian views; briefly put, the main complaint raised against the Bayesian approach is that it does not offer any substantial action-guidance to agents, not even to ideal ones. This is because in Bayesian theories, beliefs and desires are mere hypothetical entities ascribed to agents. Beliefs and desires do not figure as genuine reasons for selecting one alternative act over another.
2.1 The basic idea Bayesian decision theory starts off from the conviction that subjective probabilities and utilities cannot be directly revealed through introspection. Surely, one could ask people to estimate their numerical probabilities and utilities, but the answers gathered by this method would almost certainly be arbitrary. Therefore, in order to comply with the old behaviouristic ideal, which holds that beliefs and desires may be ascribed to agents only if those beliefs and desires can somehow be linked to observable behaviour, Bayesians favour a more sophisticated strategy. In essence, Bayesians argue that subjective probabilities and utilities can be established by asking agents to state preferences over uncertain prospects. Preferences are, according to the mainstream view, revealed in choice behaviour. If the agent is offered a choice between two uncertain prospects and chooses one of them, it is reasonable to conclude that he preferred the chosen one. Some Bayesians go as far as saying that preferences may be identified with choices.2
1 Ramsey (1926:180). 2 See e.g. Varian (1999).
It is helpful to illustrate the Bayesian view in an example. Consider the following very generous options: A If it rains in Cambridge tomorrow, you win a trip for two to Hawaii; otherwise, you win nothing. B If it does not rain in Cambridge tomorrow, you win a trip for two to Hawaii; otherwise, you win nothing. A rational agent should let his preference between A and B be determined by two considerations, viz. the degree to which he believes that it will rain, and his desire to win the trip to Hawaii. We know that the agent desires the trip to Hawaii to some degree, i.e. desires it more than no trip to Hawaii, because a trip to Hawaii for certain is preferred over no trip. Then, if the agent prefers A over B, he thinks it is more probable that it will rain in Cambridge tomorrow than not; otherwise, he would be more likely to get what he desires by choosing the other option. Furthermore, if the agent prefers B to A, he thinks it is more probable that there will be no rain, for the same reason. Finally, if the agent is indifferent between A and B, the agent must consider both events to be equiprobable. This is because no other probabilities make both options come out as equally attractive. This example illustrates how Bayesians can elicit some qualitative information about partial beliefs from preferences over uncertain prospects. But what about quantitative information? Suppose that you wish to measure your subjective probability that your first edition of Wittgenstein's Tractatus, worth $10,000, will get stolen. If you consider $500 to be a fair price for insuring the book, that is, if that amount is the highest price you and your insurance company can agree on for a bet in which you win $10,000 if the event 'The book gets stolen' takes place, and nothing otherwise, then your subjective probability is approximately 500/10,000 = 0.05. In order to render the approximation precise, a linear measure of desire must be established. Since the publication of Bernoulli's (1738) paper on the St Petersburg paradox, decision theorists use the concept of utility to refer to such a linear measure of desire.3 Most agents have a decreasing marginal utility for money, so the monetary amounts in the Wittgenstein example give only a rough indication of the strength of the utilities at stake. However, if one knew the probability of at least one event, a utility function could be extracted from preferences over uncertain prospects in the same way as with subjective probabilities. In order to see this, suppose that we somehow know that a person named Bill considers the probability of 'Rain in Cambridge today' to be fifty percent. Also suppose that he prefers a first edition of Wittgenstein's Tractatus to a copy of Philosophical Investigations, and the latter to a copy of his Zettel. Then, if Bill is indifferent between the prospect of getting a copy of Philosophical Investigations for certain, and that of getting a first edition of Wittgenstein's Tractatus if the
3 The St. Petersburg paradox is derived from the St. Petersburg game, which is played as follows: Player X tosses a fair coin until a head appears. Then X gives player Y a prize worth 2ⁿ dollars, where n is the number of times the coin was tossed. How much money should Y be willing to pay for playing this game? According to the expected utility principle, Y should pay any finite amount of money for entering the game, since its expected utility is ∑ 2⁻ⁿ × 2ⁿ = 1 + 1 + 1 + . . . = ∞. But this is absurd. No one would pay even $100 for playing a game in which there is almost a 90 percent chance of winning $8 or less.
Then, if Bill is indifferent between the prospect of getting a copy of Philosophical Investigations for certain, and that of getting a first edition of Tractatus if the state of ‘rain in Cambridge today’ occurs, and a copy of Zettel if it does not, then Bill’s utility for the three possible outcomes could be represented on a linear scale in the following way: A first edition of Tractatus is worth 1, a copy of Philosophical Investigations is worth 0.5, and a copy of Zettel is worth 0 (because then the expected utility of the two prospects will be equal). These numbers are, as mathematicians say, invariant up to positive linear transformations. This means that one could equally well let the outcomes be represented by the numbers 200, 150, and 100, respectively. The numbers only carry information about value differences.
The Wittgenstein example shows how Bayesians can extract subjective probabilities and utilities from preferences over uncertain prospects, given that the probability of at least one event can be exogenously defined. For technical reasons, this exogenously defined event is usually taken to be an event whose probability is fifty percent. Here is an example. Suppose that Bill strictly prefers one object—say, one of Wittgenstein’s books—to another. Then, if Bill is indifferent between the prospect in which (i) he wins the first object if Event Q occurs and the second object if Event ¬Q occurs, and the prospect in which (ii) he wins the second object if Event Q occurs and the first object if Event ¬Q occurs, then the two events are by definition equiprobable. It follows that the probability of Q is fifty percent, since Q and ¬Q are mutually exclusive and jointly exhaustive.
A simple numerical example can help clarify this important point. If the agent considers Q and ¬Q to be equiprobable, then he will be indifferent between winning, say, (i) 200 utiles if Q occurs and 100 utiles if ¬Q occurs, and (ii) 100 utiles if Q occurs and 200 utiles if ¬Q occurs. This holds true, no matter what his attitude to risk is; all that is being assumed is that the agent’s preference is entirely fixed by his beliefs and desires.
A peculiar feature of Bayesianism is, thus, that probabilities and utilities are derived from ‘within’ the theory. The agent does not prefer an uncertain prospect to another because he judges the utilities and probabilities of the outcomes to be more favourable than those of another. Instead, the well-organised structure of the agent’s preferences over uncertain prospects logically implies that the agent can be described as if his choices were governed by a utility function and a subjective probability function, constructed such that a prospect preferred by the agent always has a higher expected utility than a non-preferred prospect. Put in slightly different words, the probability and utility functions are established by reasoning backwards: Since the agent preferred some uncertain prospects to others, and the preferences over uncertain prospects satisfy a number of structural axioms, the agent behaves as if he had acted from a subjective probability function and a utility function that are consistent with the principle of maximising expected utility. The axiomatic conditions on preferences employed in Bayesian theories only restrict which combinations of preferences are legitimate. As Simon Blackburn puts it, the axioms constitute a ‘grid for imposing interpretation: a mathematical structure, designed to render processes of deliberation mathematically tractable, whatever those processes are’.4
4 Blackburn (1998:135).
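The arithmetic behind the two elicitation procedures just described can be made explicit in a small sketch. The Python below is purely illustrative; the function names are mine, and it assumes, as the text does, that utility is roughly linear over the monetary amounts in the insurance case.

def implied_probability(fair_bet_price, prize):
    # Highest acceptable price for a bet paying `prize` if the event occurs,
    # divided by the prize; assumes utility is roughly linear over these amounts.
    return fair_bet_price / prize

def calibrated_utility(p_reference, u_best, u_worst):
    # Utility of the middle prize, given indifference between getting it for certain
    # and a gamble yielding the best prize with probability p_reference, else the worst.
    return p_reference * u_best + (1 - p_reference) * u_worst

print(implied_probability(500, 10_000))        # 0.05, the Tractatus insurance example
print(calibrated_utility(0.5, 1.0, 0.0))       # 0.5, Bill's copy of Philosophical Investigations
print(calibrated_utility(0.5, 200.0, 100.0))   # 150.0, the same scale after a positive linear transformation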
The most fundamental axiom in Bayesian decision theory is the ordering axiom. It holds that, ‘for any two objects or rather any two imagined events, [the agent] possesses a clear intuition of preference’, and these preferences are asymmetric and transitive.5 In technical terms, the ordering axiom holds that the binary preference relation ‘at least as preferred as’, ⪰, has the following properties.
Ordering Axiom For all alternatives a, a′, a″:
1. a ⪰ a′ or a′ ⪰ a.
2. If a ⪰ a′ and a′ ⪰ a″, then a ⪰ a″.
The ordering axiom does not tell the agent whether he should prefer red wine to white, but the axiom does tell the agent that, for instance, if it is not the case that red wine is at least as preferred as white, then white wine is at least as preferred as red.
Before stating the remaining axioms, it is helpful to say a little bit more about what a Bayesian axiomatisation is supposed to prove. Strictly speaking, when Bayesians claim that rational agents behave ‘as if’ they act from subjective probability and utility functions, and maximise subjective expected utility, they merely claim that the agent’s preferences over alternative acts can be described by some representation and uniqueness theorems, both of which have certain technical properties. As will be further explained in Section 2.5, this is why Bayesian decision theorists cannot legitimately say that the probability and utility functions constitute genuine reasons for choosing one alternative over another. A representation theorem is a mathematical result that shows that a certain non-numerical structure, e.g. a set of preferences over uncertain prospects {a, a′, . . .}, can be represented by some real-valued function r, such that a is preferred to a′ if and only if r(a) > r(a′). A uniqueness theorem states what transformations of r are allowed, given that the new function r′ is to be an accurate representation of the same structure as r.
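As a purely illustrative aside, the two clauses of the ordering axiom can be checked mechanically for a finite set of alternatives. The Python sketch below, with hypothetical helper names of my own, tests completeness and transitivity of a weak preference relation given as a set of ordered pairs.

from itertools import product

def satisfies_ordering_axiom(alternatives, weakly_preferred):
    # weakly_preferred: set of pairs (a, b) read as 'a is at least as preferred as b'
    complete = all((a, b) in weakly_preferred or (b, a) in weakly_preferred
                   for a, b in product(alternatives, repeat=2))
    transitive = all((a, c) in weakly_preferred
                     for a, b, c in product(alternatives, repeat=3)
                     if (a, b) in weakly_preferred and (b, c) in weakly_preferred)
    return complete and transitive

alternatives = {'red wine', 'white wine'}
preferences = {('red wine', 'white wine'), ('red wine', 'red wine'),
               ('white wine', 'white wine')}
print(satisfies_ordering_axiom(alternatives, preferences))   # True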
2.2 From objective to subjective probability

If one knows the objective probability of sufficiently many events, then one can derive subjective probabilities for other events. This is the take-home message of a popular route to Bayesian decision theory, first proposed by Anscombe and Aumann in 1963, in a short note called ‘A Definition of Subjective Probability’. As indicated by the title, their main objective was not to propose a theory of rational choice, but rather to develop a subjective theory of probability. However, in the process of working out their theory, they also happened to develop a normative decision theory en passant.
5 von Neumann and Morgenstern (1947:17).
As frequently pointed out in the literature, the Anscombe-Aumann approach is not a pure Bayesian theory. This is because they assume a preference ordering over lotteries with objective probabilities. More precisely put, Anscombe and Aumann distinguish between two types of lotteries: roulette lotteries and horse race lotteries. A roulette lottery is a lottery for which there are known objective probabilities, whereas a horse race lottery is conceived of as a lottery ‘based on a single, particular horse race’.6 In light of this distinction, Anscombe and Aumann make clear that the aim of their theory is to ‘define the [subjective] probabilities which you associate with each of the possible outcomes of this race.’7
In order to explain how the Anscombe-Aumann approach works, let M be a set of prizes. It is assumed that M contains some least-desired prize, as well as some most-desired one. A ‘lottery’ is a device for deciding which prize in the set M the agent will be awarded. Lotteries are either simple or compound. A compound lottery is a lottery whose prizes are other lotteries; the prizes are pure roulette lotteries, or a mix of roulette lotteries and horse race lotteries. Roulette lotteries are represented as vectors of ordered pairs, [(o1, m1), . . . , (on, mn)], where (o1, m1) denotes an objective probability o1 of getting prize m1. It is assumed that all oi sum up to 1, and that each mi is either a prize from M or a lottery ticket in a roulette lottery or a horse race lottery over M. Anscombe and Aumann ask us to consider a set of roulette lotteries R, which they define as the set of all simple and compound roulette lotteries with prizes in M. The elements of R are denoted by r = [(o1, m1), . . . , (on, mn)], and so on. Let ≻ be a binary preference relation on R satisfying the von Neumann and Morgenstern axioms:8
vNM 1 ≻ is a complete, asymmetric and transitive binary ordering on R.
vNM 2 If r ≻ r′ ≻ r″, then there are α, β ∈ (0, 1) such that α · r + (1 − α) · r″ ≻ r′ ≻ β · r + (1 − β) · r″.
vNM 3 For all r, r′, r″ and all α ∈ (0, 1), r ≻ r′ if and only if α · r + (1 − α) · r″ ≻ α · r′ + (1 − α) · r″.
In this exposition, vNM 3 is the much-criticised independence axiom. Numerous counterexamples have been constructed purporting to show that vNM 3 is an unacceptable restriction on preferences over lotteries.9 Some authors have also questioned the plausibility of vNM 1 and vNM 2.10
6 Anscombe and Aumann (1963:200).
7 Ibid.
8 von Neumann and Morgenstern express their axioms in a slightly different way; see (1947:247). The formulations given here, which are more attractive from a technical point of view, follow Kreps (1988:43-4) and Schmidt (1998:4-6).
9 Cf. Allais (1953), Rabinowicz (1995) and Schmidt (1998:15-19).
10 See e.g. Schmidt (1998).
It follows from vNM 1 to vNM 3 that the agent’s preferences can be represented by a utility function u on R, such that:
1. u(r) > u(r′) iff r ≻ r′
2. u(r) = o1u(m1) + . . . + onu(mn)
3. For every other function u′ satisfying (1.) and (2.), there are numbers c > 0 and d such that u′ = c · u + d
Property (1.) states that u assigns higher utility numbers to better lotteries. Property (2.) is the expected utility property, according to which the value of a compound lottery is equal to the expected value of its components. Property (3.) implies that all utility functions satisfying (1.) and (2.) are positive linear transformations of each other, i.e. that utility is measured on an interval scale.11
After having established that the agent’s preferences over roulette lotteries can be represented by numerical utilities, the next move in the Anscombe-Aumann approach is to consider a horse race lottery [r1, . . . , rn], whose prizes are roulette lotteries in R. The set of all horse race lotteries constructed from R is denoted by H. Consider the set R∗, which is defined as the set of all roulette lotteries whose prizes are horse race lotteries from H. The key assumptions made by Anscombe and Aumann are that (i) the agent has preferences over the elements of R∗ (in addition to the preferences over R), and that (ii) the preferences over the elements of R∗ satisfy vNM 1 to vNM 3. Assumption (ii) directly implies that utilities can be assigned to the elements of R∗.
The main innovation in the Anscombe-Aumann approach is to apply von Neumann-Morgenstern utility theory twice over, first on the elements of R, and then on the elements of R∗, as explained above. The two systems of preferences and utilities can then be connected by introducing two additional axioms. In order to separate the two preference orderings, the symbol ≻∗ will be used to refer to a binary preference relation on R∗. Furthermore, ri denotes the ith subset of R∗.
AA 1 If ri ≻ ri′, then [r1, . . . , ri, . . . , rn] ≻∗ [r1, . . . , ri′, . . . , rn]
AA 2 (o1 [r1¹, . . . , rw¹], . . . , om [r1ᵐ, . . . , rwᵐ]) ∼∗ [(o1 r1¹, . . . , om r1ᵐ), . . . , (o1 rw¹, . . . , om rwᵐ)]
Axiom AA 1 is a straightforward monotonicity assumption. Axiom AA 2 is more complex. It asserts that if the prize the agent will receive is determined in two steps, by first observing the outcome of a horse race and then spinning a roulette wheel, then the value of this compound lottery should not be affected if the procedure is reversed, such that the roulette wheel is spun before the horse race. AA 2 is sometimes called ‘reduction of compound lotteries’.
The axioms stated above, vNM 1 to vNM 3 and AA 1 and AA 2, together imply the following theorem.
Theorem 2.1 There is a unique and finite set of non-negative numbers p1, . . . , pn summing to 1, such that for all [r1, . . . , rn] in H, u∗[r1, . . . , rn] = p1u(r1) + . . . + pnu(rn)
11 Proofs of claims (1.) to (3.) are given in many textbooks on decision theory; see e.g. Luce and Raiffa (1957), Kreps (1988), Resnik (1993), and Peterson (in press).
Put in non-technical terms, Theorem 2.1 shows that there is a subjective probability function such that the von Neumann-Morgenstern utility of a horse race lottery equals the subjective expected utility of the horse race, i.e. the overall utility of the horse race is determined by the weighted sum of the agent’s subjective probability for each outcome of the race, multiplied by the von Neumann-Morgenstern utility of that outcome.
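A minimal numerical illustration of Theorem 2.1, with invented utilities and probabilities rather than anything derived from the axioms: once the von Neumann-Morgenstern utilities of the roulette lotteries are fixed, the utility of a horse race lottery is just a probability-weighted average of them.

def roulette_utility(lottery, prize_utility):
    # lottery: list of (objective probability, prize) pairs
    return sum(o * prize_utility[m] for o, m in lottery)

def horse_race_utility(roulette_prizes, subjective_probabilities, prize_utility):
    # Theorem 2.1: u*[r1, ..., rn] = p1*u(r1) + ... + pn*u(rn)
    return sum(p * roulette_utility(r, prize_utility)
               for p, r in zip(subjective_probabilities, roulette_prizes))

u = {'best prize': 1.0, 'worst prize': 0.0}
r1 = [(0.5, 'best prize'), (0.5, 'worst prize')]    # a fair coin over the two prizes
r2 = [(1.0, 'worst prize')]                         # the worst prize for certain
print(horse_race_utility([r1, r2], [0.7, 0.3], u))  # 0.7 * 0.5 + 0.3 * 0.0 = 0.35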
2.3 The purely subjective approach

The theory considered above derives subjective probabilities from objective ones. From a philosophical point of view, it might be objected that this is unhelpful. For reasons explained in Chapter 6, many decision theorists believe that objective notions of probability are obsolete. It is therefore desirable to develop some other, purely subjective method for establishing subjective probabilities and justifying the expected utility principle. Several options are available, but the most influential theory is Savage’s. In his theory, the alternatives from which the agent is asked to choose are defined as functions that attach an outcome to each possible state of the world. These uncertain prospects are called acts. For example, the act of bringing a present to your friend’s birthday party is a function that returns the outcome that your friend is happy if he likes your present (if that state obtains), but the outcome that your friend is unhappy if he does not like it.
Savage’s theory is based on six axioms for preferences over acts. In order to state the axioms and the corresponding theorem in full detail, let S = {s, s′, . . .} be a set of states of the world with subsets B, C, . . ., and let X = {x, x′, . . .} be a set of outcomes. Let A = {a, a′, . . .} be a set of acts, i.e. functions from S to X. Because acts are functions, a(s) = x denotes the outcome of performing a given that s is the true state of the world. Savage stipulates that ⪰ is a binary relation on the set of acts. By stipulation, a and a′ agree with each other in B just in case a(s) = a′(s) for all s ∈ B. It is also stipulated that a ⪰ a′ given B, if and only if, b ⪰ b′ for all b and b′ that agree with a and a′, respectively, on B and with each other on ¬B, and b ⪰ b′ either holds for all such pairs or for none. Furthermore, B is null if and only if a ⪰ a′ given B for every a, a′.
Savage’s first axiom is a traditional ordering axiom:
SAV 1 ⪰ is a complete and transitive binary ordering on A.
Savage’s second axiom is the infamous sure-thing principle. In the words of Aumann et al., ‘the sure-thing principle. . . says that if an agent would take a certain action if he knew that an event E obtained, and also if he knew that its negation ¬E obtained, then he should take that action even if he knows nothing about E’.12 Stated in Savage’s technical terminology, the axiom looks slightly more complex.
12 Aumann R J, S Hart, and M Perry (2005:2).
SAV 2 If a, a′, a″, a‴ are such that:
1. in ¬B, a agrees with a″, and a′ agrees with a‴,
2. in B, a agrees with a′, and a″ agrees with a‴,
3. a ⪰ a′;
then a″ ⪰ a‴.
In order to illustrate the link between the informal formulation quoted above and condition SAV 2, one might imagine a bus driver who must choose between taking either of two alternative routes between London and Cambridge. Depending on the weather in Cambridgeshire, the passengers’ mood will be slightly different under some states of the world B (unhappy if bad weather, happy otherwise). Suppose that the driver prefers the first route to the second, no matter what the weather is like, i.e. irrespectively of whether the passengers are unhappy or not. Then suppose the two alternatives are slightly modified—perhaps modern stereo equipment is installed in the bus, and if the weather is not bad the passengers will (i) become even happier when listening to the music, but (ii) they will be equally happy no matter which route is taken. However, if the weather is bad they will not take any notice of the music, and the outcomes will be exactly as before. Then, according to the sure-thing principle, the bus driver should still prefer the first route to the second. It is this intuition that is captured by SAV 2.
The sure-thing principle is commonly considered to be dubious. Several empirical studies have confirmed that it is frequently violated by ordinary people, even in situations in which they are given plenty of time to consider their preference.13 That said, the normative relevance of empirical findings seems to be marginal. Claims about what we actually do have little bearing on what we ought to do. However, the sure-thing principle (and von Neumann and Morgenstern’s independence axiom) is also the target of the famous Allais paradox, which is intended to show that this axiom is dubious also from a normative point of view.14 The Allais paradox is discussed in Section 7.5.
13 For an overview of the empirical evidence, see Kagel and Roth (1995: Chapter 8).
14 Allais’ paradox was originally proposed in Allais (1953). For an introduction, see Resnik (1993:103-5).
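One way to see the intuition behind SAV 2 numerically is to assume, purely for the sake of illustration, that the driver ranks routes by expected utility over made-up payoffs. The modified routes agree with the originals in bad weather and with each other in good weather, so the comparison between them is settled by the bad-weather outcomes alone and the stereo cannot reverse it.

def expected_utility(act, probability):
    # act: dict from states to utilities; probability: dict over the same states
    return sum(probability[s] * act[s] for s in act)

probability = {'bad weather': 0.4, 'good weather': 0.6}   # any assignment with p(bad) > 0 works

route_1 = {'bad weather': 2, 'good weather': 6}
route_2 = {'bad weather': 1, 'good weather': 5}

# Install the stereo: both routes now yield the same, better outcome in good weather,
# while the bad-weather outcomes stay exactly as before.
route_1_stereo = {'bad weather': 2, 'good weather': 8}
route_2_stereo = {'bad weather': 1, 'good weather': 8}

print(expected_utility(route_1, probability) > expected_utility(route_2, probability))               # True
print(expected_utility(route_1_stereo, probability) > expected_utility(route_2_stereo, probability)) # True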
Savage’s third axiom establishes a link between preferences over acts and preferences over consequences. Consider an act that gives rise to the same consequence, no matter which state occurs. Call such acts ‘static’. Then, the axiom states that one static act is preferred to another, just in case the consequence corresponding to the first static act is preferred to the consequence corresponding to the second static act. Here is the technical formulation:
SAV 3 If a(s) = x, a′(s) = x′ for every s ∈ B, and B is not null, then a ⪰ a′ given B, if and only if, x ⪰ x′.
The next two axioms are primarily required for establishing subjective probabilities. As explained above, ⪰ was originally defined as a relation over acts. However, in the fourth axiom, the relation is supposed to hold between sets of states. The trick that Savage employs is to introduce the following definition: B is not more probable than C (abbreviated C ≥ B) if and only if aB ⪯ aC or x ⪯ x′, for every aB, aC, x, x′ such that: aB(s) = x for s ∈ B, aB(s) = x′ for s ∈ ¬B, aC(s) = x for s ∈ C, aC(s) = x′ for s ∈ ¬C. Now consider the following axiom.
SAV 4 For every B and C: B ≥ C or C ≥ B.
The fifth axiom holds that not all consequences are equally desirable:
SAV 5 It is false that for every x, x′: x ⪰ x′.
Axioms SAV 1–5 imply that the relation ≥ as applied to events is a qualitative probability ordering. In other words, ≥ orders events from the least probable to the most probable, but it does not assign unique numbers to the events.15 The next axiom, SAV 6, is adopted by Savage for establishing a quantitative probability measure. Briefly put, the axiom declares that the set of states can be partitioned into an arbitrarily large number of equivalent subsets. This assumption entails that one can always find a subset that corresponds exactly to each probability number.
SAV 6 Suppose it is false that a ⪯ a′; then, for every x, there is a (finite) partition of S such that, if a″ agrees with a and a‴ agrees with a′ except on an arbitrary element of the partition, a″ and a‴ being equal to x there, then it will be false that a″ ⪯ a′ or a ⪯ a‴.
From SAV 1–6, Savage derives the following representation and uniqueness theorem.
Theorem 2.2 There is a subjective probability function p and a real-valued function of consequences u, such that:
1. a ≻ a′ if and only if ∫S u(a(s)) · p(s) ds > ∫S u(a′(s)) · p(s) ds.
Furthermore, for every other function u′ satisfying (1), there are numbers c > 0 and d such that u′ = c · u + d.
Savage’s representation and uniqueness theorem guarantees the existence of a probability function representing degrees of belief, and a real-valued function representing desires, such that the agent can be described as if he were acting from the principle of maximising expected utility. As in every Bayesian decision theory, no claim is made about what mental or other process actually triggered the agent’s choices. The theorem merely proves that a particular formal representation of the agent’s choices is possible, which can be used for making predictions about future choices as long as preferences remain unchanged. Savage’s own proof of his theorem is complex. An accessible proof can be found in Kreps (1988).
15 Savage (1954/72:32).
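For a finite state space, the content of Theorem 2.2 can be illustrated with a small sketch. The states, probabilities, and utilities below are invented for illustration; acts are modelled as functions (dictionaries) from states to outcomes, in the spirit of Savage’s definition, and one act is preferred to another just in case its probability-weighted utility is higher.

states = ['rain in Cambridge', 'no rain in Cambridge']
p = {'rain in Cambridge': 0.6, 'no rain in Cambridge': 0.4}   # made-up degrees of belief
u = {'trip to Hawaii': 1.0, 'nothing': 0.0}                   # made-up utilities

# Acts are functions from states to outcomes; act_A and act_B correspond to the
# options A and B from Section 2.1.
act_A = {'rain in Cambridge': 'trip to Hawaii', 'no rain in Cambridge': 'nothing'}
act_B = {'rain in Cambridge': 'nothing', 'no rain in Cambridge': 'trip to Hawaii'}

def subjective_expected_utility(act):
    return sum(p[s] * u[act[s]] for s in states)

print(subjective_expected_utility(act_A))   # 0.6
print(subjective_expected_utility(act_B))   # 0.4: A is preferred exactly when rain is judged more probable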
2.4 The propositional approach

The two Bayesian decision theories discussed above refer to real-world objects, e.g. states, acts, and outcomes. Some philosophers consider this to be an advantage, whereas others think it is a disadvantage. The key issue is whether one thinks it is possible to explain exactly what kind of entities states, acts, and outcomes are. Decision theorists who find this problematic to explain often feel tempted to opt for propositional theories, in which all elements of the decision problem are defined on the same set of entities: a set of propositions.
The most well-known propositional account of Bayesian decision theory is Jeffrey (1983). A mathematically rigorous axiomatisation of Jeffrey’s theory was proposed by Bolker in his (1966).16 In the Bolker-Jeffrey theory, ‘[a]n act is . . . a proposition which it is within the agent’s power to make true if he pleases’, and to hold it probable that it will rain tomorrow is ‘to have a particular attitude toward the proposition that it will rain tomorrow’.17
In order to spell out the Bolker-Jeffrey approach in detail, let S = {A, B, . . .} be a closed atomless set of propositions. That the set is ‘atomless’ means that each element A in S can be defined in terms of other elements in the set. For example, if A′ = A ∧ B and A″ = A ∧ ¬B, then A = A′ ∨ A″. This point is important, because it means that in a certain sense, the utility of every proposition depends on the subjective expected utility of some other proposition (the disjunction of the contraries). In the Bolker-Jeffrey theory there are, therefore, no ‘atomic’ propositions corresponding to e.g. Savage’s notion of outcomes. That S is closed means that if A, B ∈ S then A ∨ B ∈ S, ¬A ∈ S, etc. Let T denote the necessarily true proposition and F the necessarily false one. Then, for all A in S, it holds that A ∨ ¬A = T and ¬T = F. It is assumed that S is a Boolean algebra. This means that for all A, B, C in S, it holds that:
1. A ∧ T = A = A ∨ F
2. A ∧ ¬A = F; and A ∨ ¬A = T
3. A ∧ B = B ∧ A; and A ∨ B = B ∨ A
4. A ∧ (B ∨ C) = (A ∧ B) ∨ (A ∧ C); and A ∨ (B ∧ C) = (A ∨ B) ∧ (A ∨ C)
The agent is supposed to have pairwise preferences between all propositions in a set S∗, which is defined as the set S − {F}. This means that F has no place in the preference ordering, but all other propositions can be ordered from the worst to the best. Bolker now proposes the following ordering axiom.
BJ 1 For all A, B, C in S∗:
1. If A ⪰ B and B ⪰ C, then A ⪰ C.
2. If both A and B are related through ⪰ with respect to some element in S∗, then A ⪰ B or B ⪰ A.
16 Jeffrey’s theory was originally presented in 1964. His (1983) is a much revised and improved version of his earlier work.
17 Jeffrey (1983:59, 84).
Axiom BJ 1 is neither better nor worse than other Bayesian ordering axioms, such as those preferred by Savage or Anscombe and Aumann. If one has reason to reject one Bayesian ordering axiom, one probably has reason to reject them all. Now consider the second axiom of the Bolker-Jeffrey theory, which is called ‘averaging’.
BJ 2 If A ∧ B = F, then
1. if A ≻ B, then A ≻ A ∨ B and A ∨ B ≻ B, and
2. if A ∼ B, then A ∼ A ∨ B and A ∨ B ∼ B.
The averaging axiom holds that a disjunction of two propositions should be ranked somewhere between the two disjuncts, i.e. the disjunction should not be strictly preferred to both disjuncts or vice versa. Broome points out that averaging ‘slightly resembles the independence axiom found in other versions of expected utility theory’.18 Expressed in von Neumann and Morgenstern’s terminology, in which ApC is a lottery that gives you A with probability p and C with probability 1 − p, the independence axiom holds that if A ⪰ B, then ApC ⪰ BpC for all C. (Cf. Section 2.2.) In the Bolker-Jeffrey terminology, this could be rewritten as the slightly weaker claim that if A ⪰ B, then A ∨ C ⪰ B ∨ C for all C. Now, by replacing C for A, one gets the averaging axiom. However, this informal derivation of the averaging axiom from the independence axiom shows that the averaging axiom is in fact weaker than the independence axiom. The averaging axiom does not imply the independence axiom, because the former does not hold for an arbitrary C and an arbitrary p. This is because in the Bolker-Jeffrey theory, each proposition carries its own probability.
The third axiom in the Bolker-Jeffrey approach is called ‘impartiality’.
BJ 3 Given A ∧ B = F and A ∼ B, if A ∨ C ∼ B ∨ C for some C where A ∧ C = B ∧ C = F and not C ∼ A, then A ∨ C ∼ B ∨ C for every such C.
Impartiality can be used for testing whether two propositions A and B are equally probable. The idea is simple. If A and B are equally desirable, and C is not, then A and B will be equally probable just in case the disjunctions A ∨ C and B ∨ C are equally desirable. In this manner, degrees of belief can be derived from degrees of desirability, as pointed out already by Ramsey (1926). That said, it should be noted that impartiality is among the most controversial axioms in the Bolker-Jeffrey theory. According to Jeffrey, ‘The axiom is there because we need it, and it is justified by our antecedent belief in the plausibility of the result we mean to deduce from it.’19 This point is elaborated by Broome, who points out that ‘it presupposes expected utility theory to some extent’.20 The reason is that in stipulating that A ∨ C and B ∨ C are equally preferred just in case A and B are equally probable, one presupposes that the correct way of evaluating uncertain prospects is to calculate expected utility.
18 Broome (1999:98).
19 Jeffrey (1983:147).
20 Broome (1999:99).
Someone who is inclined to adopt a more risk-averse aggregation mechanism for beliefs and desires may simply reject this test for equally probable propositions. It is worth noticing that Savage’s fourth axiom faces the same problem.
The last axiom in the Bolker-Jeffrey theory holds that preferences over the elements of S∗ are continuous in the sense that logical implications between propositions and preferences ‘fit together’, as Jeffrey puts it.21 In order to state the axiom, a number of technical terms must be introduced: (i) The ‘supremum’ of a set of propositions is a proposition A∗ that is implied by every proposition in the set, so that A∗ is the upper bound of the set, and implies every other upper bound. (ii) The ‘infimum’ of a set of propositions is a proposition A∗ that implies every proposition in the set, so that A∗ is the lower bound of the set, and implies every other lower bound. (iii) A ‘complete’ Boolean algebra has both a supremum and an infimum. Now consider the following axiom.
BJ 4 Suppose that A ≻ A∗ ≻ B, where A∗ is the supremum (or the infimum) of a set of propositions. Then there exists a C such that: if D is implied by C (or implies C), then A ≻ D ≻ B.
The main result of the Bolker-Jeffrey theory is the following representation theorem.
Theorem 2.3 Let S be a complete atomless Boolean algebra, and let ⪰ be a binary relation on S∗ that satisfies axioms BJ 1–4. Then there is a probability function p and a value function v on S, such that for all A, B in S∗:
1. A ⪰ B if and only if v(A)/p(A) ≥ v(B)/p(B)
Note that no traditional utility function is introduced in the representation theorem. Instead, the role of utility is played by v(A)/p(A). The function v can be thought of as an abstract construction that measures both probability and utility. Hence, the relation between v and the utility function u is, by stipulation, as follows:
u(A) = v(A)/p(A) for all A in S∗   (2.1)
If A and B are contraries (i.e. if they are never concurrently true in the same possible world), it holds that p(A ∨ B) = p(A) + p(B) and v(A ∨ B) = v(A) + v(B). Hence:
u(A ∨ B) = v(A ∨ B)/p(A ∨ B) = (v(A) + v(B))/(p(A) + p(B)) = (p(A) · u(A) + p(B) · u(B))/(p(A) + p(B))   (2.2)
Equation (2.2) explains how the principle of maximising expected utility fits into the Bolker-Jeffrey theory: Since the theory is atomless, every proposition in S∗ can be reformulated as a disjunction of two contraries, as described above. Hence, the utility of an arbitrary proposition in S∗ will equal the expected utility of its contraries.
Bolker also proved a uniqueness theorem. For reasons explained below, it is slightly more complex than other Bayesian uniqueness theorems.
21 Jeffrey (1983:148).
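A quick numerical check of equation (2.2), with invented values for two contrary propositions: if p and v are additive over contraries, the utility of the disjunction automatically equals the probability-weighted average of the utilities of the disjuncts.

p = {'A': 0.2, 'B': 0.3}                          # made-up probabilities of two contraries
u = {'A': 10.0, 'B': 4.0}                         # made-up utilities
v = {X: u[X] * p[X] for X in ('A', 'B')}          # v(X) = u(X) * p(X), as in equation (2.1)

p_disjunction = p['A'] + p['B']
v_disjunction = v['A'] + v['B']
u_disjunction = v_disjunction / p_disjunction      # the utility of A-or-B

weighted_average = (p['A'] * u['A'] + p['B'] * u['B']) / (p['A'] + p['B'])
print(u_disjunction, weighted_average)             # both 6.4, as equation (2.2) requires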
Theorem 2.4 Let p and p′ be probability functions and let v and v′ be value functions on a complete atomless Boolean algebra S. Then p, v and p′, v′ represent one and the same preference ordering, if and only if:
1. v′ = a · v + b · p and p′ = c · v + d · p, where
2. ad − bc > 0, and
3. c · v(T) + d = 1, and
4. c · v(A) + d · p(A) > 0 for all A in S∗.
Some algebra reveals how transformations of p and v are related to transformations of the utility function u:
u′ = v′/p′ = (a · u + b)/(c · u + d)   (2.3)
Equation (2.3) shows that the Bolker-Jeffrey theory allows for a wider range of transformations of utility functions than other Bayesian theories, and unlike other theories it also allows for transformations of probabilities. The technical term for this kind of transformation is a ‘fractional linear transformation’. The reason why the Bolker-Jeffrey theory allows for this wider range of transformations is that it is based on a smaller set of data than other Bayesian theories. In, for example, Savage’s theory, the agent is required to state preferences over all logically possible uncertain prospects, whereas in the Bolker-Jeffrey theory certain propositions are ruled out from the field of preferences, e.g. propositions that are assigned a probability of 0. (It follows from the averaging axiom that if A ≻ B and p(B) = 0 then, because of equation (2.2), it holds that u(A ∨ B) = u(A), which contradicts the averaging axiom.)
The proofs of the Bolker-Jeffrey theorems are complicated and will not be presented here. The reader wishing to study them is advised to consult Bolker (1966).
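The effect of the uniqueness theorem on the utility function can also be checked numerically. The constants below are arbitrary, chosen only so that ad − bc > 0 and the transformed probability stays positive for the proposition considered; the normalisation c · v(T) + d = 1 is not checked in this fragment.

a, b, c, d = 2.0, 1.0, 0.1, 0.9       # ad - bc = 1.7 > 0

p_A, u_A = 0.25, 3.0                  # an arbitrary proposition A
v_A = u_A * p_A                       # v(A) = u(A) * p(A)

v_new = a * v_A + b * p_A             # transformed value function
p_new = c * v_A + d * p_A             # transformed probability function (0.3 > 0 here)

print(v_new / p_new)                  # 5.833..., the transformed utility u'(A)
print((a * u_A + b) / (c * u_A + d))  # the same number, as equation (2.3) states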
2.5 Do Bayesians put the cart before the horse?

Should we be content with the Bayesian approach to decision theory? I think the answer is no. Briefly put, the problem is that the Bayesian approach is not action guiding. Bayesians put the cart before the horse from the point of view of the deliberating decision maker. An agent who is able to state preferences over a set of uncertain prospects already knows what to do. Therefore, a Bayesian agent does not get any new, action-guiding information from the theory. In what follows, I shall articulate what I take to be the most forceful version of this argument. Similar, but less detailed, concerns have been raised by others. In particular, I think I have allies in Arrow (1963:10), Hansson (1988:143-4), Malmnäs (1994), Resnik (1993:99-100), Yilmaz (1997), and Zynda (2000).
The argument to be proposed here questions the normative relevance of every Bayesian decision theory. It is thus irrelevant which particular Bayesian axiomatisation one considers. My argument seeks to show that even if all the axiomatic constraints on preferences proposed by Bayesians were perfectly acceptable from a normative point of view, the Bayesian representation and uniqueness theorems would provide little or no action-guidance. The agent who is about to choose from a large number of very complex acts must know already from the beginning which act(s) to prefer. This follows directly from the ordering axiom. For the deliberating agent, the output of a Bayesian decision theory is thus not a set of preferences over alternative acts—these preferences are, on the contrary, used as input to the theory. Instead, the output of a Bayesian decision theory is a (set of) utility function(s) that can be used to describe the agent as an expected utility maximiser. This is why ideal agents do not prefer an act because its expected utility is favourable, but can only be described as if they were acting from this principle.
Ramsey seems to have come to a similar insight shortly after having written ‘Truth and Probability’ (1926). In a short note entitled ‘Probability and Partial Belief’ (1928), he observed that:
sometimes the [probability] number is used itself in making a practical decision. How? I want to say in accordance with the law of mathematical expectation; but I cannot do this, for we could only use that rule if we had measured goods and bads. (Ramsey 1931:256)
Ramsey never developed this point further. The argument developed here might capture the problem Ramsey had in mind. However, irrespective of whether this is so, I believe my argument to be worth considering in its own right.
In order to spell out the argument in more detail, it is helpful to imagine two agents, A and B, who are exactly parallel with respect to their psychological makeup. They like and dislike the same movies, the same food, the same wines, etc., and they hold the same beliefs about past, present, and future events. For instance, they both believe (to the same degree) that it will rain tomorrow, and they dislike this equally much. However, as always in these kinds of examples, there is one important difference—agent A is able to express his preferences over uncertain prospects in the way required by Anscombe and Aumann, Savage and Jeffrey. That is, for a very large set of acts, agent A knows whether he prefers a to a′; and of course, his preferences conform to the rest of the axioms employed in these theories as well. But agent B is more like the rest of us, so in most cases he does not know if he prefers a to a′. However, B’s inability to express preferences over alternative acts is not due to any odd structure of his beliefs and desires; rather, it is just a matter of his low capacity to process large amounts of information (so his preferences conform to the Bayesian axioms in an implicit sense). Since B’s tastes and beliefs are exactly parallel to those of A, it follows that, in every decision, B ought to behave as A would have behaved; that is, B can read the behaviour of A as a guidebook for himself.
By now it should be evident that agent A is designed to be the kind of highly idealised rational person described by Anscombe and Aumann, Savage and Jeffrey.
Suppose that you are A and then ask yourself what output you get from decision theory as it is presented by the Bayesians. Do you get any advice about what acts to prefer, i.e. does their decision theory provide you with any action-guidance? The answer is no. On the contrary: Even if A were to decide among a large number of very complex acts, it is assumed in the ordering axiom that A knows already from the beginning which act(s) to prefer. So, for A, the output of decision theory is not a set of preferences over alternative acts—these preferences are used as input to the theory. As shown above, the output of a decision theory based on the Bayesian approach is a (set of) utility function(s) that can be used to describe A as an expected utility maximiser. This is why ideal agents do not prefer an act because its expected utility is favourable; they can only be described as if they were acting from this principle.
So Anscombe and Aumann, Savage, and Jeffrey take too much for granted. What they use as input data to their theories is exactly what decision theorists want to obtain as output. In that sense, theories based on the Bayesian approach ‘put the cart before the horse’ from the point of view of the deliberating agent.
Bayesians may, of course, object that the representation and uniqueness theorems make no claim about the order of the cart and the horse. The theorems merely prove that there is an interesting technical link between preferences and subjective probability and utility. From a mathematical point of view, the theorems are neutral about the temporal order of the two structures.22 However, although true, this point is irrelevant. As explained above, a reasonable decision theory should help ideal decision makers to find out what to do, but Bayesian theories offer no such action guidance to ideal agents.
For the non-ideal agent B, the situation is somewhat different. Of course, since B ought to choose the same act as A would have chosen (because they have the same beliefs and desires), B could simply ask A which act he would have preferred in the situation B is now facing. However, in real life, there are no ideal agents like A and, if there were, they would not need decision theory to figure out what to do, as pointed out above. Let us, therefore, adjust our thought experiment to this fact, by assuming that A is dead. It is still true that B ought to behave as A would have behaved, but since the preferences of A are no longer accessible, B cannot use A as a tool to figure out what to do. Now, a Bayesian theory has, strictly speaking, nothing to tell B about how to behave. However, despite this, it is commonly assumed by decision theorists that a representation theorem and its corresponding uniqueness theorem are normatively relevant for a non-ideal agent in indirect ways.
Joyce (1999) admits that the completeness part of the ordering axiom is unreasonably strong. His diagnosis is that it must be weakened. Joyce proposes that instead of requiring that the agent’s preference ordering is complete, we should rather ensure that it can be coherently extended to a complete preference ordering, meaning that it must be possible to fill in the missing gaps without violating any of the axioms. Not all incomplete preference orderings can be extended into complete orderings without violating some of the structural axioms on preferences.
22 I would like to thank Johan Gustafsson and Sven Ove Hansson for stressing this point.
(Joyce gives an interesting example.) Every such non-extendable incomplete ordering should, according to Joyce, be regarded as irrational.
A slightly different idea is discussed, and eventually rejected, by Hansson (1988:143-4). He asks us to imagine a non-ideal agent like B who has access to some of his preferences over uncertain prospects, but not to all. Now, if the non-ideal agent also has some partial information about his utility and probability functions, the theorems of Savage and Jeffrey can be put to work to ‘fill the missing gaps’ of the preference ordering, utility function, and probability function, by using the initially incomplete information to reason back and forth, thereby making the preference ordering and the functions less incomplete. In this process, some preferences for uncertain prospects might be found to be inconsistent with the initial preference ordering, and for this reason be ruled out as illegitimate.
It should be admitted that for agent B, the proposals of Joyce and Hansson are to some extent action guiding. Since the Bayesian axioms prevent B from forming whatever new preferences he likes, he gets some partial and indirect action guidance from the theory. However, two important problems remain. First, even if the initial information considered by the agent happens to be sufficiently rich to fill the gaps, this manoeuvre offers no theoretical justification for the initial preference ordering over uncertain prospects. Why should the initial preferences be retained? (Unlike Joyce, it seems that Hansson is aware of this problem.23) The second problem, which I think has been overlooked in the literature, is that it is unreasonably optimistic to assume that the initial information happens to be sufficiently rich to allow the non-ideal agent to fill all the gaps in the preference ordering. For instance, nothing excludes that the initial preferences over uncertain prospects only allow the agent to derive the parts of the utility and probability function that were already known. Many incomplete preference orderings can, of course, be coherently extended in many (too many!) different ways. Imagine that you have just a single preference for one risky act over another. Then your preferences over a large set of risky acts can be coherently extended in millions of ways. Hence, you will get very little new and useful action-guiding information out of your decision theory. All the theory will tell you is that you must not prefer the second risky act over the first, which is trivial.
Another way in which non-ideal agents like B might try to make use of the Bayesian representation theorems, with which Savage himself seems to have been sympathetic (as is an elementary textbook in decision theory), is the following:24 Since it is known that an ideal agent can be described as an expected utility maximiser, it follows that, in case a non-ideal agent does the right thing, he can also be described as an expected utility maximiser. Therefore, the non-ideal agent, whom I shall call B2, should assign quasi-utilities and quasi-probabilities to the possible outcomes of his decision problem. It is appropriate to speak of quasi-utilities and quasi-probabilities, since the numbers B2 comes up with in his assessment have not been derived from a Bayesian theory, but from B2’s own ex ante estimate.
23 See Hansson (1988:143-44).
24 See Savage (1972:155-6) and Resnik (1993:99-100).
Thereafter, in the second step, the non-ideal agent chooses an act that maximises his quasi-expected utility, i.e. he applies the principle of maximising expected utility to his ex ante estimates of utility and probability. Let us assume, for the sake of the argument, that the ex ante assessments of utility and probability suggested by agent B2 have, in fact, the appropriate mathematical structure, i.e. that these functions satisfy the axioms used in Bayesian theories.
The point of this proposal is that the quasi-functions used by the non-ideal agent B2 are not guaranteed to be identical to the functions that an ideal agent like A would have applied in the same situation, even though A and B2 have exactly the same tastes and beliefs. Nothing excludes that B2’s ‘real’ utility and probability functions differ from his quasi-functions. Hence, there are cases in which A and B2 would choose different acts. In order to illustrate this point, suppose that A and B2 wish to decide if they should bet $1 on No 17, or not bet at all, at the casino in Monte Carlo. We assume that A prefers a gamble yielding either $-1 or $36 to $0 for sure. Because of his ability to express preferences over a large number of additional gambles, the theories of Anscombe and Aumann, Savage, and Jeffrey inform us that A’s utility for money is, say, linear for negative amounts and quadratic for positive amounts. The probability that the ball will fall into pocket No 17 is 1/38. Thus, since 0 < −1 × 37/38 + 36² × 1/38, the agent’s choice can be described as conforming to the principle of maximising expected utility, even though he does not, of course, form his preferences by applying this principle. Now, despite the fact that B2’s beliefs and desires are exactly parallel to A’s, nothing excludes that B2’s ex ante quasi-utility for money is, say, linear for all amounts; in this case, the quasi-expected utility is 0 > −1 × 37/38 + 36 × 1/38. Hence, A and B2 would choose different alternative acts in this example.
In summary, you are either an ideal agent, or a non-ideal one. In the first case, you do not get any new action guiding information out of the Bayesian approach, because the relevant information (your preferences over risky acts) actually served as input data to the theory. In the latter case you merely get partial and negative information out of the theory, which tells you that some (but not which) preferences must be revised. Of course, these problems do not show that every insight learned from a Bayesian approach is mistaken, but it seems fair to say that they legitimise the development of an alternative approach.
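The two calculations in the roulette example are easily reproduced; the sketch below simply spells them out.

p_win, p_lose = 1 / 38, 37 / 38      # the ball lands on No 17 with probability 1/38

def utility_A(x):
    # A's utility for money: linear for negative amounts, quadratic for positive ones
    return x if x < 0 else x ** 2

def quasi_utility_B2(x):
    # B2's ex ante quasi-utility: linear for all amounts
    return x

print(p_lose * utility_A(-1) + p_win * utility_A(36))                 # about 33.13 > 0, so A bets
print(p_lose * quasi_utility_B2(-1) + p_win * quasi_utility_B2(36))   # about -0.026 < 0, so B2 abstains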
Chapter 3
Choosing what to decide
This chapter addresses what is sometimes referred to as the ‘framing’, or ‘problem specification’, or ‘editing’ phase of rational decision making. The point of departure is simple. Before you make a decision, you have to choose what to decide, i.e. determine the relevant alternatives, states, and outcomes of your decision problem. Consider, for example, Savage’s famous omelette example.1 An agent intending to cook an omelette has just broken five good eggs into a bowl, and now has to decide whether to break a sixth egg, which might be rotten, into the omelette. Before deciding what to do, the agent has to determine exactly what sets of alternatives, states, and outcomes to take into account, and how to represent these entities in a formal representation.
Expressed in Savage’s terminology, the main objective of the pre-deliberative phase of rational decision making is to decide ‘which world to use in a given context’2, that is, to choose an appropriate representation of states, consequences, and acts, and thereby obtain a ‘formal description, or model, of what the person is uncertain about’.3 The purpose of the second phase of rational decision making is to establish ‘criteria for deciding among possible courses of action’, and choose an act prescribed by such a criterion.4 The pre-deliberative phase has only been briefly touched upon in the literature. Savage, like most other decision theorists, pays considerably more attention to the second phase.
This chapter is entirely devoted to developing a normative theory of the pre-deliberative phase of rational decision making. At first glance it might be thought that the pre-deliberative phase is of little relevance for the controversy over Bayesianism and non-Bayesianism, which is supposed to be the main topic of this book.
1 Savage (1954/72:13-15). I have simplified Savage’s example slightly.
2 Savage (1954/72:9).
3 Ibid.
4 Savage (1954/72:6).
However, a number of concepts developed in the present chapter will play a crucial role in the axiomatisation of the expected utility principle undertaken in Chapter 7, and in the formulation of some of the impossibility theorems in Chapter 8. The present discussion can thus be seen as a preparation for future chapters.
In order to render my theory of the pre-deliberative phase formally precise, I introduce and explore a novel class of decision rules, which I call ‘transformative’ decision rules. A transformative decision rule alters the representation of a decision problem, either by changing the set of alternative acts or the set of states of the world taken into consideration, or by modifying the probability or value assignments. A paradigmatic example is the principle of insufficient reason, which prescribes that in case there is no reason to believe that one state of the world is more probable than another, the agent should transform the initial representation of the decision problem into another in which every state is assigned equal probability. Transformative decision rules will be contrasted with effective decision rules, which yield prescriptions on how to act on the basis of the available information. The principle of maximising expected utility, the maximin and minimax regret rules are well-known examples of effective decision rules.
The theory I propose will be stated by articulating formal conditions for how one representation of a decision problem may be transformed into another. The upshot is an axiomatic analysis of transformative decision rules. A noteworthy consequence of this axiomatisation is that there exist situations in which no formal representation is uniquely better than all alternative representations. There might, for example, exist two or more representations of one and the same decision problem that are equally reasonable and strictly better than all alternative representations. This phenomenon will be referred to as the problem of ‘rival representations’.
Transformative decision rules are, of course, not the only means agents can use for choosing among alternative formal representations. However, transformative decision rules provide a theory for such choices that is surprisingly exact. For example, structural constraints on transformative decision rules can be rendered much more precise than general rules of thumb or procedures based on direct intuition.
The remaining sections of the chapter are organised as follows. In Section 3.1, I spell out the distinction between effective and transformative rules in more detail and show that many effective rules—e.g. the principle of maximising expected utility—can be decomposed and shown to have a transformative component. In Section 3.2, I argue that the application of transformative decision rules should be governed by what I refer to as ‘deliberative values’. In Section 3.3 a single axiom for transformative decision rules is proposed, the weak monotonicity axiom. In Sections 3.4 and 3.5 I study the property of order-independence, which is an important property of transformative decision rules. Sections 3.6 and 3.7 analyse two other formal properties of transformative decision rules, viz. convergence and iterativity. Finally, in Section 3.8 I analyse the problem of rival representations mentioned earlier in this section.
3.1 Transformative and effective rules defined

There is a well-established tradition in decision theory to conceive of decision rules as mathematical functions, i.e. as relations that uniquely associate members of one set, the argument set, with members of another set, the value set. The argument set is usually taken to be a set of formal decision problems, or a set of propositions describing formal decision problems, whereas the value set is conceived of as a set of alternative acts, or a set of propositions describing acts.5 Here I accept the general idea that decision rules are mathematical functions, but widen the concept of a decision rule by allowing for rules that are functions from one set of formal decision problems to another such set, i.e. that do not return a set of acts. Decision rules of this kind are called transformative decision rules. Below is a list of four well-known examples of transformative decision rules.
1. The Principle of Insufficient Reason: If there is no reason to believe that one state of the world is more probable than another, then the agent should assign equal probabilities to all states.
2. Merger of States: If two or more states yield identical outcomes under all acts in a decision problem under uncertainty, then these repetitious states should be collapsed into one.
3. Levi’s Condition of E-Admissibility: If an alternative act is not E-admissible6, then it should be deleted from the set of alternative acts.
4. De Minimis: If the probability for some state is sufficiently small, then it should not be included in the set of states considered by the agent.
The first example, the principle of insufficient reason, is frequently associated with Laplace. He argued that it is sound provided that the initial representation of the decision problem is reasonable.7 The second example, the merger of states rule, was first proposed in a slightly different form in a paper by Milnor in the 1950s, and adopted as Axiom 11 in Luce and Raiffa’s discussion of decisions under uncertainty.8 The third example, Isaac Levi’s condition of E-admissibility, is one of several rules of the same transformative structure proposed by Levi in The Enterprise of Knowledge.9
5 An advantage of conceiving of decision rules as mathematical functions is that one thereby blocks the possibility of a pathological decision rule returning random output. By definition, any mathematical function must yield the same output (i.e. recommended acts or formal decision problems) every time it is applied to a given argument. This leaves no room for what in other normative disciplines is sometimes referred to as ‘particularism’, i.e. the claim that optimal acts in different situations need not have anything in common besides their optimality. For a recent discussion of particularism in ethics, see e.g. Dancy (2004).
6 An act ai is, roughly put, E-admissible just in case there is a ‘seriously permissible’ probability function q in B (B is a set of probability functions) and a ‘seriously permissible’ utility function u in G (G is a set of utility functions) such that the expected utility of ai is optimal. For a precise definition, see Levi (1980:96).
7 See Keynes (1921).
8 Luce and Raiffa (1957), Milnor (1954).
9 Levi (1980:96).
Levi’s discussion of the rule is sophisticated, but the rule itself has never been formally analysed. The philosophical roots of the de minimis principle can be traced back to Buffon’s discussion of ‘morally impossible’ events in his analysis of the St Petersburg paradox.10 The phrase ‘de minimis’ was derived by risk analysts in the 1970s from the legal principle ‘de minimis non curat lex’, which roughly means that the law should not concern itself with trifles.11
The distinction between transformative and effective decision rules can be formally spelled out as follows.
Definition 3.1 Let Π be a set of formal decision problems. t is a transformative decision rule on Π if and only if t is a function such that for all π ∈ Π, it holds that t(π) ∈ Π.
Definition 3.2 Let Π be a set of formal decision problems. e is an effective decision rule on Π if and only if e is a function such that for all ⟨A, S, P, U⟩ ∈ Π it holds that e(⟨A, S, P, U⟩) ⊆ A.
The two classes of decision rules are mutually exclusive, i.e. no decision rule is both transformative and effective. However, theoretically there may be decision rules that are neither transformative nor effective. For instance, one can imagine a function (decision rule) that takes a formal decision problem as its argument and returns ordered pairs of acts and states. Such non-transformative and non-effective ‘decision rules’ will not be discussed here, primarily because there seems to be no intuitively plausible decision theoretic interpretation of them.
A composite decision rule is a decision rule that is made up of other (transformative or effective) decision rules. For an example, consider the rule prescribing that in a formal decision problem under uncertainty one should first apply the principle of insufficient reason (ir), and thereafter select an act by applying the principle of maximising expected utility (eu). This composite decision rule can be conceived of as a composite function (ir ◦ eu)(π) = eu(ir(π)).
Definition 3.3 If ti and tj are transformative decision rules, then (ti ◦ tj)(π) = tj(ti(π)) is a composite transformative rule.
According to Definition 3.3, every composite decision rule of this form is transformative. However, one could also combine transformative and effective rules into mixed rules. More precisely, if x1, x2, . . ., xn are a number of (composite or non-composite) effective or transformative rules, it holds that:
1. If x1, x2, . . . , xn are all transformative decision rules, then x1 ◦ x2 ◦ . . . ◦ xn is a transformative decision rule.
2. If x1, . . . , xn−1 are transformative decision rules and xn is an effective decision rule, then x1 ◦ . . . ◦ xn−1 ◦ xn is an effective decision rule.
3. If at least one of x1, . . . , xn−1 is an effective decision rule, then x1 ◦ . . . ◦ xn−1 ◦ xn is undefined.
10 Keynes (1921).
11 Whipple (1987).
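To make the notion of a transformative decision rule concrete, here is a minimal Python sketch in which a formal decision problem is represented, purely for illustration, as a tuple of acts, states, a probability assignment, and a utility assignment, and the principle of insufficient reason is written as a function from problems to problems.

from collections import namedtuple

# A formal decision problem: acts, states, probability assignment, utility assignment.
Problem = namedtuple('Problem', ['acts', 'states', 'p', 'u'])

def insufficient_reason(problem):
    # Transformative rule: it returns another formal decision problem, not an act.
    if problem.p is None:   # decision under uncertainty: no probabilities given
        equal = {s: 1 / len(problem.states) for s in problem.states}
        return Problem(problem.acts, problem.states, equal, problem.u)
    return problem

pi = Problem(acts={'bring umbrella', 'leave umbrella'},
             states={'rain', 'no rain'}, p=None,
             u={('bring umbrella', 'rain'): 5, ('bring umbrella', 'no rain'): 6,
                ('leave umbrella', 'rain'): 0, ('leave umbrella', 'no rain'): 10})
print(insufficient_reason(pi).p)   # {'rain': 0.5, 'no rain': 0.5}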
A set of transformative rules is closed under rule composition if the following condition is met:
Definition 3.4 Let Π be a set of formal decision problems and T a set of transformative decision rules in Π. tid denotes the identity rule such that tid(π) = π for all π ∈ Π. Then, the composite closure of T is the smallest set T∗ of decision rules such that
1. T ∪ {tid} ⊆ T∗
2. If t, u ∈ T∗, then t ◦ u ∈ T∗
Furthermore, a set T of transformative decision rules in Π is closed under composition if and only if T = T∗.
Almost all decision rules that are usually regarded as non-composite effective rules can be decomposed and shown to have a transformative element. The principle of maximising expected utility provides an instructive example: Let weigh be a transformative decision rule that transforms a formal decision problem under risk into a formal decision problem under certainty12 with the same alternative set, such that the utility of every outcome in the latter decision problem equals the weighted sum of the utilities and probabilities for the corresponding outcomes in the former, i.e. ∫s∈S [p(s) · u(a, s)] ds. Furthermore, let max be an effective rule that recommends the decision-maker to choose an alternative act associated with the maximum utility in a formal decision problem under certainty. Then, the principle of maximising expected utility can be trivially reconstructed as a composite decision rule eu(π) = (weigh ◦ max)(π).
Many other effective decision rules can be decomposed in analogous ways, e.g. the maximin rule, the minimax regret rule, and Kahneman and Tversky’s prospect rule. For example, the maximin rule can be reconstructed as a composite rule (min ◦ max), in which min transforms a decision problem under uncertainty into a new decision problem under certainty where one chooses between the ‘worst-case’ scenarios of the original decision problem. The max rule is, of course, identical to the effective subrule of the principle of maximising expected utility, i.e. it recommends the agent to choose an alternative act associated with the maximum utility. The rule max can be further decomposed into two subrules, maxset and pick. The first subrule, maxset, is a transformative decision rule that transforms a decision problem under certainty into another decision problem under certainty in which all non-optimal alternatives have been deleted. The effective subrule pick thereafter recommends the agent to pick any of the remaining alternatives, which—in case there is more than one—are of course identical with respect to the utility of the certain outcomes.
12 For definitions of these terms, see Section 1.3.
3.2 A comparison structure for formal representations

By applying transformative decision rules to a formal representation, agents can make choices among alternative representations of one and the same decision problem. The question 'What principle should a rational agent follow in the predeliberative phase of decision making?' can therefore be replaced by a more precise question, namely: 'What sequence of transformative decision rules (t ◦ u ◦ v ◦ . . .) should a rational agent apply to an initial formal decision problem π?' In order to answer the new question, it is important to separate those (sequences of) transformative decision rules that may be applied to a formal representation, from those that may not be applied.
Let ⟨Π, ⪰⟩ be a comparison structure for formal decision problems, in which Π is a set of formal decision problems, and ⪰ is a binary relation in Π corresponding to the English phrase 'at least as reasonable representation as'.13 All elements in Π are different formal representations of one and the same decision problem, and ⪰ orders the elements in that set with regard to some list of relevant decision theoretic values. I shall assume that ⪰ is a complete, reflexive and transitive relation. This assumption is needed for technical purposes; I cannot give any other justification for it beyond that. The relations ≻ and ∼ can be defined in terms of ⪰ in the usual way.14 Note that in case Π is an infinite set, it need not contain an optimal element—there might be no formal decision problem that is at least as reasonable as all alternative representations.
I propose that there are at least four deliberative values which determine whether the relation ⪰ holds between two alternative formal representations. These values are: realizability, completeness, relevance, and simplicity. Note that all values are non-moral values.
A formal decision problem is realisable just in case all elements in the representation correspond to features of the decision problem that have the potential to be realised. A realisable formal representation should, for example, not include acts that cannot be performed by the agent, e.g. the act 'run 100 metres in five seconds'. The situation is similar for states of the world: all states listed in a formal representation should of course be possible states of nature, and similar points can be made about probability and utility functions. Depending on how close the formal representation is to what can actually be realised, one can speak of degrees of realizability. Suppose, for instance, that you have decided to put a pencil mark ten centimetres below the top of this page. No matter how hard you try, you will never be able to put the mark exactly ten centimetres below the top. However, despite this, (a representation including) the act of putting a mark ten centimetres below the top has a high degree of realizability, since this act is very close to what can actually be realised.
The more complete a formal representation is, the fewer (realisable) acts, states, probability or utility functions have been left out. Of course, a formal representation can have a high degree of realizability without also having a high degree of
13 Comparison structures are investigated in detail in Hansson (2001: Chapter 2).
14 The usual way to do this is as follows: (i) π ≻ π′ iff (π ⪰ π′) ∧ ¬(π′ ⪰ π), and (ii) π ∼ π′ iff (π ⪰ π′) ∧ (π′ ⪰ π).
completeness and vice versa. Consider for example a formal representation containing the act 'bring the umbrella' and the state 'it rains'. This formal representation is fully realisable, but it has a low degree of completeness.
Relevance means that a reasonable formal representation ought to be faithful to the decision problem under consideration. If, for example, you are deciding whether or not to go for a walk, and list the acts 'move my right foot' and 'move my left foot', and add some appropriate states, and probability and utility functions, you may perhaps end up with a realisable and complete formal representation. But it does not fulfil the condition of relevance to any high degree. A more relevant formal representation should rather contain the acts 'go for a walk' and 'stay home'.
The fourth value that ought to be taken into consideration when a decision problem is to be modelled in a formal representation is simplicity. A formal representation containing a small number of acts and states is, ceteris paribus, simpler than one containing many acts and states. The reason for including simplicity in the list of values is that in case too many features of a decision problem are modelled in a formal representation, it might very well turn out that the calculations and investigations needed for coming to a decision will demand more effort than is actually motivated by the decision problem. Suppose, for example, that you are about to decide whether you should bring the umbrella or not when going for a walk in the park. It might perhaps be true that a realisable, complete, and relevant formal representation of this decision problem should include the state in which a bird decides to discharge some urine on the spot where you are, and that an umbrella would protect you from this. However, in case the number of birds in the park is not extremely high, it is reasonable to neglect this state, for reasons of simplicity.
In some cases the four deliberative values pull in different directions. For example, one formal representation may score high in realizability and simplicity but low in completeness and relevance, while the reverse holds for another representation. In such a situation the agent has to make a trade-off, and choose a formal representation in which he or she considers the aggregated value to be optimal. When making a trade-off, the agent may of course assign different weights to different values. It is beyond the scope of this book to analyse in more detail how such a trade-off should be made.
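One crude way to picture such a trade-off is to score each candidate representation on the four values and compare weighted sums. The sketch below is purely illustrative: the scores, the weights and the additive aggregation are hypothetical assumptions of mine, not part of the theory, which deliberately leaves open how the trade-off should be made.

```python
# Hypothetical scores (0-1) for two rival representations of one and the same decision problem.
scores = {
    "pi_1": {"realizability": 0.9, "completeness": 0.4, "relevance": 0.5, "simplicity": 0.9},
    "pi_2": {"realizability": 0.6, "completeness": 0.8, "relevance": 0.9, "simplicity": 0.5},
}
weights = {"realizability": 1.0, "completeness": 1.0, "relevance": 1.5, "simplicity": 0.5}

def aggregated_value(name):
    """A weighted sum standing in for 'at least as reasonable representation as'."""
    return sum(weights[v] * scores[name][v] for v in weights)

# pi_2 counts as at least as reasonable as pi_1 iff its aggregated value is at least as great.
print({name: round(aggregated_value(name), 2) for name in scores})
```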
3.3 An axiomatic analysis of transformative decision rules

This section proposes an axiomatisation of transformative decision rules based on a single, intuitive axiom. It is plausible to assume that the initial formal representation to which the sequence of transformative decision rules is to be applied is the formal representation containing all acts, states, probability and utility functions the agent can imagine. That is, the initial representation is a quadruple ⟨A, S, P, U⟩ in which the four sets A, S, P, U contain as many elements as the agent can imagine, irrespective of whether e.g. some pairs of acts are mutually exclusive or not. Arguably, there is only one such initial representation. The reason for taking this
to be the initial representation is simple. An improved formal representation can then be obtained by just filtering out insignificant or inconsistent information in the subsequent transformations.
The following axiom, first proposed in Peterson and Hansson (2005), ensures that transformative decision rules satisfy several attractive properties. I shall assume that the axiom holds for all t, u ∈ T∗ and all π ∈ Π.

WEAK MONOTONICITY
(u ◦ t)(π) ⪰ t(π) ⪰ (t ◦ t)(π)
The left inequality, (u ◦ t)(π) ⪰ t(π), states that a rule u should not, metaphorically expressed, throw a spanner in the work carried out by another rule t. Hence, the representation obtained by first applying u and then t has to be at least as good as the representation obtained by only applying t. For example, suppose that t is a rule that increases the simplicity of a representation by reducing redundant states (i.e. exactly parallel states that yield the same outcomes for all alternative acts). Then, u must be constructed such that the gain in simplicity to be cashed in by later applying t is not outweighed by a loss caused by u; for instance, u might be a rule that increases simplicity by reducing the number of redundant (exactly parallel) acts. The right-hand inequality, t(π) ⪰ (t ◦ t)(π), says that nothing can be gained by immediately repeating a rule. (This property is further discussed below in relation to Theorem 3.1, part 2.) Consider the following theorem.

Theorem 3.1 (Peterson and Hansson 2005) Let T be a set of transformative rules in Π that is closed under composition and satisfies weak monotonicity. Then, for all t, u in T:
1. t(π) ⪰ π
2. t(π) ∼ (t ◦ t)(π)
3. (u ◦ t)(π) ∼ (t ◦ u)(π)
4. (t ◦ u ◦ t)(π) ∼ (u ◦ t)(π)
Part 1 states that the application of a transformative decision rule to a formal representation will yield a formal representation that is at least as reasonable as the one it was applied to. It might perhaps be objected that this result is too strong. Suppose, for instance, that π ≻ t(π) and that (t ◦ u)(π) ≻ π. As stated here, the property under consideration does not permit the agent to carry out the transformation from π to (t ◦ u)(π) in two separate steps. However, in response to this argument, note that Part 1 does not prevent the agent from treating the composite rule (t ◦ u) as a single rule fulfilling this condition (in which case t ◦ u but not t is an element of T). Therefore, this is not a counter-example to the intuition underlying Part 1, and hence not to weak monotonicity.
Part 2 states that the aggregated value of a formal representation will remain constant no matter how many times (≥ 1) a transformative decision rule is iterated. A detailed defence of the intuition underlying this property (formulated in a slightly
different way) was presented in Peterson (2003a). The basic argument runs as follows. In order for a transformative rule to be applicable, agents cannot be required to apply a rule more than a finite number of times. Obviously, this means that the rule has to be convergent in the sense that for every π ∈ Π there is some number n such that for all m ≥ 1 it holds that (t◦)n+m(π) ∼ (t◦)n(π), where (t◦)n denotes the rule t iterated n times. Otherwise the rule could be repeated indefinitely and yet its full capacity for improving the decision problem not be used. But in case a rule is convergent in the sense just defined, then it can be replaced by a rule that satisfies (t ◦ t)(π) ∼ t(π) for all π, thus satisfying Part 2.
Part 3 establishes that transformative decision rules are order-independent: they can be applied in any order. This is an essential result. The notion of order-independence studied here, (u ◦ t)(π) ∼ (t ◦ u)(π), should not be mixed up with (u ◦ t)(π) = (t ◦ u)(π). It is very rare that sets of transformative rules satisfy the latter, strong notion of order-independence. See Section 3.5.
Finally, Part 4 makes it clear that nothing can be gained by applying t more than once, no matter what other transformative rules were applied between the two applications of t.
Theorem 3.1 does not give advice on how many rules in T the agent ought to apply. However, Theorem 3.3 below shows that all permutations obtained from the largest subset of rules satisfying weak monotonicity are optimal. Hence, the agent may safely apply all transformative decision rules that satisfy these conditions, and it does not matter in which order they are applied. A permutation of T is any composite rule that makes use of every element in T exactly once. Thus, the permutations of T = {t, u} are (t ◦ u) and (u ◦ t). Lemma 3.2 is instrumental in the proof of Theorem 3.3.

Lemma 3.2 Let T be a finite set of transformative rules for Π that is closed under composition and satisfies weak monotonicity. Then all permutations pa and pb obtainable from T are of equal value, i.e. pa(π) ∼ pb(π).

Theorem 3.3 Let T be a set of transformative rules for Π that is closed under composition and satisfies weak monotonicity and let A ⊆ B ⊆ T. Then, for every π ∈ Π and every permutation pa obtainable from A and every permutation pb obtainable from B, it holds that pb(π) ⪰ pa(π).

To sum up, the theorems in this section show that if a set of transformative decision rules satisfies weak monotonicity, then the agent may apply all transformative rules in this set, in any order he wishes, and there is no requirement to apply any rule more than once. Furthermore, the transformative rules in that set will improve the initial representation as much as can possibly be achieved; there is no other way in which these rules can be applied that would return a formal representation that is strictly better.
3.4 Strong versus weak monotonicity

It is illuminating to compare the axiom proposed in Section 3.3 with an earlier axiomatisation. In Peterson (2004b), Parts 1 and 2 of Theorem 3.1 were adopted as axioms together with the following, slightly stronger monotonicity condition.

STRONG MONOTONICITY If π ⪰ π′, then t(π) ⪰ t(π′).

Seidenfeld has pointed out that the axiomatisation based on the strong monotonicity condition has the following implication:15 Whenever t is a 'better' rule for π than u (i.e. t(π) ≻ u(π)), if t has been applied to a representation π, yielding the representation π′ = t(π), then it is never the case that u can improve π′. Hence, no substantial interaction is allowed among transformative decision rules: It never happens that one gets a better representation by first applying u and then t, compared to what one gets by applying t directly.
In order to spell out this difficulty in more detail, suppose that T = {t, u} satisfies strong monotonicity and Parts 1 and 2 of Theorem 3.1, and also suppose that t(π) ≻ u(π) ≻ π. Now, if t is applied to u(π) and t(π), respectively, it follows from strong monotonicity that (t ◦ t)(π) ⪰ (u ◦ t)(π). But according to Part 2 of Theorem 3.1 it holds that (t ◦ t)(π) ∼ t(π), so t(π) ⪰ (u ◦ t)(π), which means that t and u have not interacted in a way that opens up for a representation that is any better than what could be reached by t alone. Hence, no interaction between t and u can occur.
Of course, the left-hand side of weak monotonicity draws on the same intuition as strong monotonicity. Hence, it might be objected that weak monotonicity and strong monotonicity are problematic for the same reason. However, in order to derive strong monotonicity from weak monotonicity one has to add the following axiom (or some other axiom that is at least as strong):

ACHIEVABILITY If π′ ⪰ π, then there is a set of rules {t1, . . . , tn} such that (t1 ◦ . . . ◦ tn)(π) = π′.

Observation 3.4 Weak monotonicity and achievability imply strong monotonicity.

Unfortunately, achievability is a highly questionable property. There is no reason to believe that there always is a set of rules {t1, . . . , tn} that can take us from one (bad) representation to another specified (better) representation. Suppose, for instance, that the better representation contains some information (e.g. more alternative acts) that was not contained in the bad representation; then it is not certain that there exists a set of rules that can take us to the better representation. The following weaker version of achievability is not sufficient for deriving strong monotonicity from weak monotonicity.
15 When acting as external examiner of the author's doctoral dissertation, May 16, 2003.
WEAK ACHIEVABILITY If π′ ⪰ π, then there is a set of rules {t1, . . . , tn} such that (t1 ◦ . . . ◦ tn)(π) ⪰ π′.

Observation 3.5 Weak monotonicity and weak achievability do not imply strong monotonicity.

Even though weaker than achievability, weak achievability is not self-evident. For example, it presupposes that there are no cul-de-sacs, that is, non-optimal representations that cannot be improved. However, from a normative point of view it seems reasonable to require that transformative decision rules should have a structure that does not allow the agent to end up in a cul-de-sac.
Observations 3.4 and 3.5 together indicate that the axiomatisation based on the weak monotonicity axiom avoids the problem identified by Seidenfeld. Further evidence for this conclusion can be obtained by deriving a representation theorem for transformative decision rules. This theorem shows that an agent obeying weak monotonicity can be described as if he maps formal representations into a one-dimensional space, while taking certain restrictions into account. This gives a better understanding of what is, and is not, implied by the weak monotonicity axiom. As will be explained below, the representation theorem ensures that the kind of problem pointed out by Seidenfeld cannot occur if weak monotonicity is assumed to be the sole axiom governing the application of transformative decision rules.
Let a, b, . . . be elements in a set M and consider the following definitions.

Definition 3.5 A vector ⟨a, b⟩ is an upvector if and only if |b| ≥ |a|, where | · | is a function that assigns a real number to each element in M.

Definition 3.6 An upvector-label is a set L of upvectors such that if ⟨a, b⟩ and ⟨a, b′⟩ ∈ L, then b = b′.

The semantic unit used here is a set of upvector-labels, or SEUL for short. It can be easily verified that sets of transformative decision rules can be represented by SEULs. Let V be a function from Π to M such that |V(π)| ≥ |V(π′)| if and only if π ⪰ π′. V represents the projections of the elements of Π to a scale that conforms with the relation ⪰. Non-identical elements of Π can be equivalent in terms of ≥, in other words one can have |a| = |b| and a ≠ b. (Otherwise, it will not be possible to avoid implausible results; as one example (t ◦ u)(π) = (u ◦ t)(π) would follow from weak monotonicity, contrary to our strivings in Section 3.4 to avoid such postulates.) It follows from what has been said above that every representation π ∈ Π and set of rules T can be described as a model consisting of a set M, a SEUL and a function | · |, as defined above.

Definition 3.7 A transformative rule t in Π is representable by an upvector-label in a SEUL if and only if, for every transformation from π to t(π) in Π, ⟨V(π), V(t(π))⟩ is an upvector.
The following theorems show that sets of transformative decision rules can be represented in a SEUL, and that order-independent transformative rules can be represented in a SEUL satisfying certain restrictions.

Theorem 3.6 (Peterson and Hansson 2005) Let T be a set of transformative rules in Π that is closed under composition, such that t(π) ⪰ π for all t ∈ T and all π ∈ Π. Then there exists a SEUL such that each t ∈ T is represented by exactly one upvector-label, and vice versa.

Theorem 3.7 (Peterson and Hansson 2005) Let T be a set of transformative rules in Π that is closed under composition and that satisfies weak monotonicity. Then there exists a SEUL such that each t ∈ T is represented by exactly one upvector-label, and vice versa, with the following restrictions on the upvector-labels:
1. There are no |c| > |b| > |a| such that ⟨a, b⟩ and ⟨b, c⟩ are elements in the same upvector-label.
2. If |d| > |c| ≥ |b| ≥ |a|, and ⟨a, d⟩ and ⟨b, c⟩ are elements in the same upvector-label, then there is no upvector-label that contains ⟨a, b⟩.

Theorem 3.7 can be graphically illustrated by an example; see Figure 1. In the model used in this example the only ways to reach one of the two optimal representations πd and πd′ are to either start with t and then apply u, or to start with u and then apply t. (Of course, this model can be iterated and expanded in various ways.) The model is constructed by letting Π = {πa, πb, πc, πd, πd′}, where πd ∼ πd′ ≻ πc ≻ πb ≻ πa, and by assuming that T = {t, u}, where t(πa) = πb, t(πb) = πb, t(πc) = πd, t(πd) = πd, t(πd′) = πd′, u(πa) = πc, u(πb) = πd′, u(πc) = πc, u(πd) = πd, and u(πd′) = πd′. Since interaction between the rules is the only way to reach an optimal representation in this example, Theorem 3.7 shows that weak monotonicity is not too strong. Thus, Theorem 3.7 indirectly supports the claim that weak monotonicity satisfies reasonable normative requirements.
Figure 1. [Diagram: the five representations πa, πb, πc, πd and πd′, with t-arrows from πa to πb and from πc to πd, and u-arrows from πa to πc and from πb to πd′.]
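The model behind Figure 1 can be checked mechanically. The following Python sketch is illustrative only: the numerical values standing in for the relation 'at least as reasonable as' are my own choice, and the check runs over the two generating rules rather than the full composite closure. It confirms that t and u satisfy weak monotonicity on this Π, and that the two orders of application land in the two distinct but equally reasonable optima.

```python
# Deliberative value standing in for the comparison relation: higher = more reasonable.
value = {"pi_a": 0, "pi_b": 1, "pi_c": 2, "pi_d": 3, "pi_d2": 3}   # pi_d ~ pi_d'

t = {"pi_a": "pi_b", "pi_b": "pi_b", "pi_c": "pi_d", "pi_d": "pi_d", "pi_d2": "pi_d2"}
u = {"pi_a": "pi_c", "pi_b": "pi_d2", "pi_c": "pi_c", "pi_d": "pi_d", "pi_d2": "pi_d2"}

def compose(first, second):
    """(first ∘ second)(pi) = second(first(pi)), as in Definition 3.3."""
    return {p: second[first[p]] for p in first}

def weakly_monotonic(rules):
    """Check (u ∘ t)(pi) >= t(pi) >= (t ∘ t)(pi) in value for all listed rules and problems."""
    return all(value[compose(r2, r1)[p]] >= value[r1[p]] >= value[compose(r1, r1)[p]]
               for r1 in rules for r2 in rules for p in value)

print(weakly_monotonic([t, u]))                      # True
print(compose(t, u)["pi_a"], compose(u, t)["pi_a"])  # pi_d2 pi_d: equally good, yet not identical
print(t["pi_a"], u["pi_a"])                          # pi_b pi_c: neither rule alone reaches an optimum
```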
3.5 Two notions of permutability

Are transformative decision rules permutable? This is equivalent to asking whether it matters in which order the rules in a set of transformative rules are applied. From trivial combinatorial considerations it follows that if T contains n different transformative rules, then there exist n! different sequences in which each transformative rule is used exactly once. If rules may be applied more than once, then the number of possible sequences is of course infinite. It would be unrealistic to require that agents should be able to decide between all different sequences of transformative decision rules by systematically checking all possible combinations. Thus, a reasonable theory about transformative decision rules should either (1) contain a well-motivated instruction for the order in which the rules should be applied, or (2) be order-independent in the sense that it does not matter in which order the different rules are applied. The former approach is adopted in Levi (1980), even though he does not use the term 'transformative decision rule'. This section investigates the second approach.
In Theorem 3.1 in Section 3.3 it was pointed out that transformative decision rules satisfying the weak monotonicity axiom are permutable in the following sense: No matter which order the rules are applied in, the aggregated value of the representations will be the same. This notion of permutability, derived from the weak
monotonicity axiom, will be called weak permutability. Consider the following definition.

Definition 3.8 A set T of transformative decision rules is weakly permutable for Π just in case, for every t, u ∈ T and every π ∈ Π, it holds that (t ◦ u)(π) ∼ (u ◦ t)(π).

Weak permutability may be contrasted with strong permutability. Strong permutability holds just in case all formal decision problems returned by different sequences of some set of transformative decision rules are 'sufficiently similar' for yielding identical recommendations by a given effective rule. Unlike weak permutability, it is not a value issue whether strong permutability holds or not. Consider the following definition.

Definition 3.9 A set T of transformative decision rules is strongly permutable for Π with respect to an effective rule e just in case, for all t, u ∈ T and every π ∈ Π, it holds that (t ◦ u ◦ e)(π) = (u ◦ t ◦ e)(π).

A special case of strong permutability occurs in case it does not matter which effective rule e and which set Π is considered; this important case of strong permutability will be called total permutability. For an example of a set of transformative decision rules that are totally permutable, consider the following three principles:
1. Merger of States (ms): If two or more states in a formal decision problem under uncertainty yield identical outcomes for all acts, then they should be collapsed into one.
2. The Precautionary Principle (pp): If the worst possible outcome of an alternative act a is very undesirable, i.e. if there is a pair ⟨a, s⟩ such that u(a, s) < c (where c is some suitable constant), then a should be removed from the set of alternative acts.
3. Merger of Acts (ma): If two or more acts in a formal decision problem yield identical outcomes for all states, then they should be collapsed into one.
It can be easily verified that the set {ms, pp, ma} is strongly permutable for any effective rule e and any set of formal decision problems Π, i.e. that it is totally permutable. It is worth noticing that total permutability, but not strong permutability, implies weak permutability.
As far as I have been able to find out, neither strong nor total permutability can—unlike weak permutability—be taken to be a general normative requirement for sets of transformative decision rules. There are, on the contrary, convincing examples of sets of normatively reasonable rules that do not satisfy strong permutability. For an example, consider the ms rule defined above, and the Principle of Insufficient Reason ir, prescribing that in case there is no reason to believe that one state of the world is more probable than another, then the decision-maker should transform his initial decision problem into one in which every state is assigned equal probability. It can clearly make a difference to a decision under uncertainty if one first assigns equal probability to all states by applying ir and then uses ms for deleting a repetitious state, or starts by applying ms for deleting a state and then uses ir for assigning equal probability to the remaining states.
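The total permutability of {ms, pp, ma} can be spot-checked on a small example. In the sketch below the payoff-matrix encoding, the particular numbers and the precautionary threshold c = 0 are my own illustrative assumptions; the point is only that all six orders of application return one and the same reduced problem.

```python
from itertools import permutations

# A decision problem under uncertainty as a payoff matrix: rows = acts, columns = states.
matrix = {"a1": [5, 5, -10], "a2": [3, 3, 3], "a3": [3, 3, 3], "a4": [8, 8, 1]}

def ms(m):
    """Merger of states: collapse columns that are identical for every act."""
    seen, keep = set(), []
    for j in range(len(next(iter(m.values())))):
        col = tuple(m[a][j] for a in m)
        if col not in seen:
            seen.add(col)
            keep.append(j)
    return {a: [m[a][j] for j in keep] for a in m}

def pp(m, c=0):
    """Precautionary principle: delete acts whose worst outcome falls below c."""
    return {a: row for a, row in m.items() if min(row) >= c}

def ma(m):
    """Merger of acts: collapse acts with identical rows."""
    out, seen = {}, set()
    for a, row in m.items():
        if tuple(row) not in seen:
            seen.add(tuple(row))
            out[a] = row
    return out

results = {tuple(r.__name__ for r in order): order[2](order[1](order[0](matrix)))
           for order in permutations([ms, pp, ma])}
print(len({str(sorted(v.items())) for v in results.values()}))   # 1: the same problem in every order
```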
Even though not all (sets of) normatively reasonable transformative decision rules are strongly permutable, it is nevertheless desirable to state sufficient conditions for this form of permutability. This is because an agent who finds that such a set of conditions is fulfilled can significantly simplify the framing process by applying the rules in question in any order he wishes, without considering the rather complex value issues involved in the application of weak permutability. Suppose that the following conditions hold for all transformative decision rules t ∈ T and all formal decision problems π ∈ Π.

STRONG ITERATIVITY (t ◦ t)(π) = t(π)

EXTENDED ITERATIVITY (t ◦ u ◦ t)(π) = (u ◦ t)(π)

CORRESPONDENCE If π ≠ π′, then t(π) ≠ t(π′).

Strong iterativity, which trivially implies weak iterativity, holds in case a transformative decision rule returns the same formal decision problem no matter how many times (≥ 1) it is applied. Almost all transformative decision rules discussed in this study satisfy strong iterativity, or can be restated as such rules. For an example of a rule that does not satisfy this condition, consider the rule ea (Expansion of Alternatives), prescribing that in case the set of alternative acts is not jointly exhaustive, then it should be expanded by one more alternative act. This rule is, as stated here, not strongly iterative since there is no guarantee that the set of alternative acts becomes exhaustive by just adding one more alternative act, and neither is it convergent since there may be an infinite number of alternative acts open to the agent, e.g. a1: 'I put my pen 1/2 ft from the book', a2: 'I put my pen 1/2 + 1/4 ft from the book', a3: 'I put my pen 1/2 + 1/4 + 1/8 ft from the book', etc. However, ea can easily be restated as a strongly iterative rule, by simply saying that in case the set of alternative acts is not jointly exhaustive then it should be expanded by some (perhaps infinite) set of alternative acts such that this condition is fulfilled. For an example of a pathological rule that cannot be restated as a strongly iterative rule, consider the rule prescribing that in case the number of alternative acts is odd then one act should be deleted, and in case the number of alternative acts is even then one act should be added.
Extended iterativity asserts that no rule needs to be applied twice even if some other transformations take place between the two occurrences of the rule in question. This condition resembles the condition of strong iterativity, but neither of the two conditions is implied by the other.
The condition of correspondence states that there are no 'junctions', i.e. no points such that π ≠ π′ and t(π) = t(π′). At first glance this condition seems to be rather strong, since it is violated by e.g. the version of the ms rule stated in the beginning of this section (for example, suppose π and π′ are exactly parallel, except that π′ contains one more repetitious state). However, if the ms rule is stated in a more precise way, for example as a set of n different rules prescribing that 'If n states
. . . yield identical pay-offs, then . . . ', 'If n − 1 states . . . yield identical pay-offs, then . . . ', etc., then correspondence will in fact be satisfied. In analogous ways, the pp and ma rules can also be restated as sets of more precise rules that satisfy correspondence.
The conditions of strong iterativity, extended iterativity and correspondence are jointly sufficient for strong permutability. Consider the following theorem.

Theorem 3.8 Let T be a set of transformative decision rules for Π that satisfy strong iterativity, extended iterativity, and correspondence. Then T is strongly permutable with respect to every e and every Π.

Below are two further conditions that, together with strong iterativity, are also sufficient for strong permutability.

ULTRA-STRONG ITERATIVITY For every t, u ∈ T, if t and u satisfy strong iterativity for Π, then (t ◦ u) and (u ◦ t) also satisfy strong iterativity.
REVERSIBILITY For every t ∈ T and every π ∈ Π there is some rule t+ such that (t ◦ t+)(π) = π.

Theorem 3.9 Let T be a set of transformative decision rules for Π that satisfy strong iterativity, ultra-strong iterativity, and reversibility. Then T is strongly permutable with respect to every e and every Π.

Ultra-strong iterativity is a modified and strengthened version of the original condition of strong iterativity. Reversibility implies that for every transformative decision rule that transforms a formal decision problem into another, there is some other transformative decision rule that reverses the original transformation, such that the initial formal decision problem is retained. For an example, consider the principle of insufficient reason (ir). One can easily construct an 'inverse' of ir that takes a formal decision problem under risk as its argument and returns a formal decision problem under uncertainty in which all probabilities have been deleted.
Reversibility is a rather strong condition. For an example of a transformative decision rule that does not satisfy this condition, consider the much discussed de minimis rule (dm), prescribing that sufficiently improbable states should be neglected.16 Since this rule deletes some unique information (in this case states with corresponding probabilities) it has no inverse dm+. That is, there is no unique formal decision problem that corresponds to the 'inverse' of dm(π). Another prominent example of an irreversible transformative decision rule is the pp rule. This rule is also irreversible basically because it deletes some information: there is no function that can recover the deleted information, since two or more different formal decision problems π and π′ may correspond equally well to the 'inverse' of pp(π).
16 For an introduction, see Whipple (1987).
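A tiny sketch of reversibility, using the same hypothetical dictionary encoding as earlier: ir can be undone by a rule that simply deletes the probabilities again, whereas nothing comparable is available for dm or pp, since the states or acts they delete cannot be recovered from the output.

```python
# A problem under uncertainty: the probability function is empty (nothing is known).
pi = {"acts": ["a1", "a2"], "states": ["s1", "s2"],
      "P": {},
      "U": {("a1", "s1"): 0, ("a1", "s2"): 10,
            ("a2", "s1"): 5, ("a2", "s2"): 6}}

def ir(problem):
    """Insufficient reason: turn uncertainty into risk with equal probabilities."""
    p = {s: 1.0 / len(problem["states"]) for s in problem["states"]}
    return {**problem, "P": p}

def ir_inverse(problem):
    """An inverse of ir in the sense of REVERSIBILITY: delete the probabilities again."""
    return {**problem, "P": {}}

print(ir_inverse(ir(pi)) == pi)   # True: applying ir and then its inverse retains pi
```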
It seems that strong permutability is a desirable property of sets of transformative decision rules, since it significantly reduces the complexity of the framing process. There are, however, reasonable sets of transformative decision rules that are not strongly permutable. Therefore, only weak permutability can be considered as a strict normative requirement of sets of transformative decision rules. Furthermore, of the two notions of permutability discussed here, weak permutability also appears to be the most interesting one from a general decision theoretical perspective. This is because weak permutability gives support to a non-sequential theory of how to represent decision problems.
Nearly all normative theories dealing with the pre-deliberative phase of decision making are sequential.17 In this context 'sequential' means that the proposed process can be divided into a finite number of steps that must be performed in some given order. For example, a sequential theory may typically prescribe that the agent should:18
1) Identify the set of alternative acts, and make sure that the acts are mutually exclusive and exhaustive.
2) Identify the relevant set of (act-independent) states of the world, and make sure that the states are mutually exclusive and exhaustive.
3) For each pair of acts and states, identify the corresponding outcomes, and assess their utility on an interval scale.
4) Assess the (objective or subjective) probability for each state.
A non-sequential theory, on the other hand, prescribes no fixed order in which the different steps in the framing process should be performed.
The discussion of transformative decision rules has so far been primarily concerned with transformations of formal decision problems that are already fairly well worked-out in terms of completeness, relevance, and realizability, and so on. However, it should be clear that transformative decision rules can also be applied for transforming an initial representation of a decision problem (that might be very primitive) into one that can be used for decision making. When transformative decision rules are used for this second purpose, they can be said to be part of a non-sequential theory of framing. More precisely, since the transformative decision rules applied in this process ought to be weakly permutable, just like other transformative decision rules, it does not matter in which order the different steps are performed. Hence, if the agent wishes to start the framing process by assigning probabilities to different states and then arranging the set of alternative acts, that is as acceptable as first arranging the set of alternative acts and then assigning probabilities to different states. The aggregated values of the resulting formal decision problems will be the same.
Two important arguments in favour of a non-sequential account of how to represent decision problems are that it significantly simplifies the framing process, and that it does not contain any controversial or arbitrary assumption about which parts of the framing process should be carried out in which order. Furthermore, it is worth noticing that the non-sequential account advocated here does not contain any a priori assumption that sets of alternative acts or states have to be mutually exclusive and exhaustive. On the contrary, it might very well be the case that in an optimal
17 See e.g. Savage (1954: Chapter 1), Brim et al (1962: 9), and Resnik (1993: Section 1.2).
18 See e.g. Brim et al (1962).
formal representation some alternative acts or states ought to be omitted, e.g. because of simplicity.
3.6 More on iterativity

In Section 3.5, several different notions of iterativity were proposed. By distinguishing between iterative and convergent rules, this section seeks to attain a more general understanding of iterativity, and thereby gain additional support for the right-hand side of the weak monotonicity axiom. Let (t◦)n denote the transformative rule t iterated n times.

Definition 3.10 A transformative decision rule t for Π is convergent if and only if for every π ∈ Π there is some finite number n such that (t◦)n(π) = (t◦)n+1(π).

Definition 3.11 A transformative decision rule t for Π is strongly iterative if and only if for every π ∈ Π it holds that t(π) = (t ◦ t)(π).

Clearly, strong iterativity implies convergence, but not vice versa. Below I shall give an argument to the effect that no transformative rule is normatively reasonable unless it is, or can be restated as, a convergent as well as a strongly iterative rule. If correct, this shows that there are certain fixed points at which a well-organised transformation process will inevitably stop.
The central assumption in the argument to be spelled out here is that transformative decision rules ought to be conservative. This means that a rule should not transform a representation into another, unless the new representation is strictly better than the first one. Consider the following definition.

Definition 3.12 A transformative decision rule t for Π is conservative if and only if for every π ∈ Π it holds that if π ≠ t(π), then t(π) ≻ π.

The reason for requiring strict rather than weak preference in Definition 3.12 is that agents ought not to give up a formal representation of a decision problem unless the new representation has at least some advantage. There is always a risk of making mistakes when a formal representation is transformed into another. Below it will be shown that conservative rules are, or can be restated as, strongly iterative rules. However, before doing this, let me first point out that there are of course transformative rules that are neither iterative nor convergent. An example of a rule belonging to the latter category, and which seems to be widely used by actual decision-makers, but seldom discussed by decision-theorists, is the rule prescribing that in case the set of alternative acts A (of the formal decision problem ⟨A, S, P, U⟩) is not exhaustive, it should be expanded by one more alternative act. This rule, which we may call expansion of alternatives (ea), is (i) not strongly iterative since there is no guarantee that the set A becomes exhaustive by just adding one more alternative act, and it is (ii) not convergent since there may be an infinite number of alternative acts open to the agent, e.g. a1: 'I decide to believe there is 1 god', a2: 'I decide
to believe there are 2 gods', etc. Since the ea rule is neither convergent nor strongly iterative, it is not normatively reasonable, as will be shown below.
Before the announced argument can be delivered, a last preparatory step has to be taken. Let us define the following relations that may hold between pairs of transformative rules.

Definition 3.13 Rule t′ is a perfect substitute for t with respect to Π just in case it holds for every π ∈ Π that t′(π) = t(π).

Definition 3.14 Rule t′ is equivalent to t with respect to e and Π just in case it holds for every π ∈ Π that (t′ ◦ e)(π) = (t ◦ e)(π).

Clearly, if t′ is a perfect substitute for t with respect to Π, then t′ is equivalent to t with respect to Π and every e. But the converse implication does not hold.
I shall separate my argument for the claim that all transformative rules ought to be strongly iterative—and hence convergent—into two parts, based on a division of transformative rules into two subsets. I shall first show that my claim holds for the first subset of transformative rules, and then that it holds for the second subset as well. Taken together, this will show that my claim holds for all transformative rules. The first subset of transformative rules is the set of inert transformative rules.

Definition 3.15 A transformative rule t is inert with respect to an effective rule e and a class Π of decision problems if and only if it holds for all π ∈ Π that (t ◦ e)(π) = e(π).

Even though an inert rule t does not affect the act(s) chosen by e, decision-makers may nevertheless have preferences between π and t(π) based on considerations of simplicity. An instructive example of a rule that is inert with respect to many effective rules—for example the maximin and the minimax regret rule—is the transformative rule mc (merger of repetitious columns). The mc rule, which was extensively discussed by Luce and Raiffa (1957) in their treatment of decisions under uncertainty, prescribes that repetitious columns in a formal decision problem under uncertainty should be deleted.19 That is, if some states yield identical payoffs for all acts, mc prescribes that they should be collapsed into one. Most decision-makers would presumably agree that decision problems under uncertainty in which repetitious columns have been deleted have a lower degree of complexity than the corresponding original problems. Of course, mc is strongly iterative.
In case the set of formal decision problems Π under consideration is finite, the following observation supports the first part of my general claim about convergence and iterativity. (As will be shown below, this observation in fact applies to non-inert rules as well.)

Observation 3.10 Let ≻ be acyclic and let Π be a finite set with n elements. Then:
1. Every conservative rule t for Π is convergent.
19 Luce and Raiffa, Axiom 11 (1957: 295). (The mc rule seems to have been invented by Milnor (1954).)
2. Every conservative rule t for Π either is strongly iterative, or there is a strongly iterative rule t′ such that for all m ≥ 0, t′ is a perfect substitute for (t◦)n+m.

Arguably, the case with a finite Π covers most authentic applications of transformative rules. From a theoretical point of view it is, however, interesting to investigate what further assumptions are needed to obtain corresponding results in the infinite case. As far as I have been able to figure out, one needs two rather strong assumptions. First, let us say that π∗ is an optimal decision problem with respect to Π if and only if there is no π ∈ Π such that π ≻ π∗. Second, let S(π, π′) be a function that for every π, π′ ∈ Π returns a real number. The intended interpretation of S(π, π′) is a measure representing the difference in deliberative value between π and π′. Consider the following definition, which relies on the proposed function.

Definition 3.16 Suppose that t(π) ≠ t′(π). Then t′ is an approximation of t with respect to π if and only if S(t(π), t′(π)) ≤ ε.

By choosing a suitable value of ε ≥ 0, one can adjust the precision of the approximation. (A small ε yields a precise approximation and a large ε yields an imprecise one.) Note that if t′ is a perfect substitute for t with respect to Π, then t′ approximates t with maximum precision (i.e. ε = 0) with respect to every π ∈ Π, since the decision-maker will then be indifferent between t(π) and t′(π). Now consider the following observation about the infinite case.

Observation 3.11 If ≻ is acyclic, Π is an infinite set, t is inert with respect to e and Π, and there is at least one decision problem π∗ that is optimal with respect to Π, then
1. If t is conservative, then t either is convergent, or for every ε > 0 there is a convergent rule t′ such that, for every n, (i) (t′◦)n is an approximation of (t◦)n with respect to all π ∈ Π, and (ii) (t′◦)n is equivalent to (t◦)n with respect to e and Π.
2. If t is conservative, then t either is strongly iterative, or there is a strongly iterative rule t′′ that is equivalent to the convergent rule t′ with respect to e and Π.

Observation 3.11 relies heavily on the assumption that there is at least one decision problem π∗ that is optimal with respect to Π. Can this rather strong assumption be motivated? Yes, I think so. There seems to be a lower (but no upper) boundary for the complexity of a formal decision problem, exemplified by the formal decision problem consisting of one alternative act and one state of the world that occurs with probability one, and also—for every real-world decision problem—a (set of) most realisable abstraction(s) of this problem, namely the formal decision problem(s) incorporating exactly those alternatives that are open to the agent and exactly those states that may occur. The same goes for the other deliberative values. Of course, this argument is not incontestable, but it nevertheless gives considerable reason to believe in the assumption in question.
I come now to the second part of my claim about convergence and iterativity, which is concerned with the subset of active transformative rules.
Definition 3.17 A transformative rule t is active with respect to an effective rule e and a class Π of decision problems if and only if it holds for some π ∈ Π that (t ◦ e)(π) ≠ e(π).

For an example of an active rule, consider the de minimis rule (dm) mentioned in Section 3.1, which recommends the decision-maker to neglect sufficiently improbable events, e.g. being hit by a comet or getting cancer from very small doses of radiation. The dm rule is active with respect to the principle of maximising expected utility, since eu for some formal decision problems under risk (incorporating de minimis risks) yields different prescriptions depending on whether dm was applied before eu or not, i.e. (dm ◦ eu)(π) ≠ eu(π) for some π ∈ Π.
In case Π is finite one may again refer to Observation 3.10, which applies to active rules as well: No rule (neither active nor inert) is conservative unless it is convergent and is, or can be restated as, a strongly iterative rule. Unfortunately, the infinite case is more problematical. Here is an example explaining why. Assume that we have an infinite set of formal decision problems, each with two acts a1 and a2, and also assume that we have an active transformative rule t and an effective rule e, such that every second time t is applied it yields a decision problem in which e selects a1, but the rest of the times a decision problem in which e selects a2. Furthermore, assume that (t◦)n+1(π) ≻ (t◦)n(π) for all n. In this situation, it seems completely irrelevant that t can be approximated with arbitrarily high precision by a convergent rule t′ (cf. Observation 3.11). For if e yields different prescriptions every time t is applied, and it is always better to apply t one extra time, how should we ever find out what to do?
The structure of the above-mentioned example is clearly pathological. However, one way to handle the problem might be to adopt the following assumption: Let Π(π) contain exactly those elements in Π that are strictly preferred to π.20 Then, no matter whether Π is finite or infinite, I assume that for all π ∈ Π the set Π(π) is finite. Consider the following observation.

Observation 3.12 If ≻ is acyclic, Π is an infinite set, and for all π ∈ Π the set Π(π) is finite, then
1. Every conservative rule t for Π is convergent.
2. Every conservative rule t for Π either is strongly iterative, or there is a strongly iterative rule t′ that is equivalent to t with respect to e and Π.

The assumption that for all π ∈ Π the set Π(π) is finite is stronger than the earlier assumption, used in Observation 3.11, that there is at least one formal decision problem π∗ that is optimal with respect to Π. An intuitive motivation for this new assumption, which relies on a somewhat controversial value theoretic premise, can be obtained by an analogy to the natural numbers N = {0, 1, 2, 3, . . .}. Even though N is infinite, it holds for every n ∈ N that there is only a finite number of numbers that are smaller than n. In the same way, note that even in case Π is infinite, it seems reasonable to claim that for every π ∈ Π there is only a finite number of formal
20 That is, Π(π) is the subset of Π such that π′ ∈ Π is an element of Π(π) just in case π′ ≻ π.
decision problems π′, π′′, . . . that the decision-maker should prefer to π, since there is at least one decision problem π∗ that is optimal with respect to Π and one cannot 'feel' the difference between sufficiently small variations of a decision problem, and consequently cannot have preferences between them.21
This completes my argument for the claim, stated at the outset of this section, that all transformative rules ought to be convergent as well as strongly iterative. If correct, it clearly has implications for many effective rules as well, since many such rules—e.g. the principle of maximising expected utility, the maximin rule, and the minimax regret rule—can be reconstructed as containing transformative subrules, as shown in Section 3.1.
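To make Definitions 3.10 and 3.11 and Observation 3.10 a little more tangible, here is a small, purely illustrative Python sketch. The payoff-matrix encoding and the two hypothetical rules merge_one and merge_all are my own; they merely imitate mc-style state merging. The step-by-step rule is conservative and convergent, and the rule that does all the merging in one application is strongly iterative and acts as a perfect substitute for the iterated step-by-step rule.

```python
# A problem under uncertainty with three repetitious (identical) columns.
pi = {"a1": [5, 5, 5, -10], "a2": [3, 3, 3, 3]}

def merge_one(m):
    """Conservative, convergent rule: merge ONE pair of identical columns per application."""
    cols = list(zip(*m.values()))
    for j in range(len(cols)):
        for k in range(j + 1, len(cols)):
            if cols[j] == cols[k]:
                return {a: row[:k] + row[k + 1:] for a, row in m.items()}
    return m   # fixed point: nothing left to merge

def merge_all(m):
    """Strongly iterative rule: merge every repetition in a single application."""
    seen, keep = set(), []
    for j, col in enumerate(zip(*m.values())):
        if col not in seen:
            seen.add(col)
            keep.append(j)
    return {a: [row[j] for j in keep] for a, row in m.items()}

def iterate_to_fixpoint(t, problem):
    """Apply t until (t∘)^n(pi) = (t∘)^(n+1)(pi), cf. Definition 3.10."""
    while t(problem) != problem:
        problem = t(problem)
    return problem

print(iterate_to_fixpoint(merge_one, pi))          # {'a1': [5, -10], 'a2': [3, 3]}
print(merge_all(pi))                               # the same problem, reached in one step
print(merge_all(merge_all(pi)) == merge_all(pi))   # True: strong iterativity
```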
3.7 Acyclicity

Intuitively put, a set of transformative rules is acyclic just in case it holds that once the agent has applied one of the rules and transformed a formal decision problem π into another problem π′, there is no sequence of rules that can take him or her back to the initial problem π. Arguably, it is desirable that transformative rules are acyclic, because why make any transformations at all if there is a risk that one ends up where one started? Consider the following formal definition.

Definition 3.18 The set T of transformative rules is acyclic in Π if and only if, for every sequence t1, ..., tn of elements of T and every π ∈ Π, if tm(π) ≠ (t1 ◦ . . . ◦ tm−1)(π) for some tm ∈ {t2, ..., tn}, then (t1 ◦ ... ◦ tn)(π) ≠ π.

For a trivial example of a cyclic (i.e. non-acyclic) set of transformative rules, consider the two rules merger of columns (mc) and split of columns (sc), originating from Luce and Raiffa's Axiom 11 (1957: 295) in their discussion of individual decision making under uncertainty. As explained in Section 3.6, the mc rule takes a formal decision problem under uncertainty and transforms it into another formal decision problem under uncertainty in which repetitious columns have been deleted (i.e. in case several states yield identical payoffs for all acts, they are collapsed into one). The sc rule reverses this procedure, that is, transforms a formal decision problem under uncertainty into another by splitting one of its columns into two identical columns.
As can be easily verified by the reader, weak monotonicity does not guarantee acyclicity. However, acyclicity can be guaranteed if at least one rule in a sequence of transformative rules leads to a strict improvement and the relation ≻ is transitive. Consider the following observation, based on the notion of conservativity introduced in Section 3.6.

Observation 3.13 If ≻ is transitive, then every set of conservative transformative rules T is acyclic.
21 In this analogy, the number 1 of course corresponds to the optimal decision problem π∗.
Observation 3.13 is the last formal observation stated in this chapter. However, the most important conclusion in this chapter is perhaps not the formal results as such, but rather the very idea of formalising the discussion of how to represent decision problems in formal representations. It has been shown that through the use of a transformative decision rule, an initial representation ⟨A, S, P, U⟩ can be transformed into another representation ⟨A′, S′, P′, U′⟩ by modifying the sets of acts, states, probability functions, or utility functions.
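A tiny sketch of the cyclic pair mc and sc, under the same illustrative payoff-matrix encoding as before (the encodings of the two rules are my own assumptions): splitting a column and then merging the duplicates takes the agent straight back to the original problem, so the set {mc, sc} is not acyclic.

```python
pi = {"a1": [5, -10], "a2": [3, 3]}   # a problem under uncertainty, rows = acts

def mc(m):
    """Merger of columns: delete repetitious (identical) columns."""
    seen, keep = set(), []
    for j, col in enumerate(zip(*m.values())):
        if col not in seen:
            seen.add(col)
            keep.append(j)
    return {a: [row[j] for j in keep] for a, row in m.items()}

def sc(m):
    """Split of columns: duplicate the first column."""
    return {a: [row[0]] + row for a, row in m.items()}

print(mc(sc(pi)) == pi)   # True: sc followed by mc returns the initial problem,
                          # so the set {mc, sc} is cyclic (not acyclic)
```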
3.8 Rival representations

This section is devoted to a general problem with formal representations, which arises no matter how formal representations are constructed. Roughly put, the problem is that in some cases there is no unique formal representation that is better than all alternative representations. There might, for example, exist two or more representations of some decision problem that are equally reasonable and strictly better than all alternative representations. Or, alternatively, it might turn out that there exist two or more representations that are optimal in the sense that no alternative representation is strictly better, even though the optimal representations are not of equal value, e.g. because they are incomparable. In the present theory, rival representations arise because reasonable transformative decision rules satisfy the weak monotonicity axiom. The weak monotonicity axiom implies that transformative rules are order-independent, in the sense that for every π, it holds that (u ◦ t)(π) ∼ (t ◦ u)(π).
Obviously, rival representations are troublesome if an act is judged as rational in one optimal representation of a decision problem, but as non-rational in another optimal representation of that decision problem, by the same decision criterion (e.g. the principle of maximising expected utility). In such cases one may legitimately ask whether the act in question should be performed or not. What should a rational agent do? The scope of this problem is illustrated by the fact that, theoretically, there might be cases in which all acts that are rational in one optimal representation of a decision problem are non-rational in another rival representation of the same decision problem, whereas all acts that are rational according to the latter representation are non-rational according to the former. In such a case the normative problem is even more acute: Can normative decision theory provide the agent with any action guiding advice at all? Before addressing this question, I shall first show how rival representations may arise in the framework described here.
To some degree, the problem of rival representations resembles the problem of underdetermination in science, famously discussed by e.g. Quine.22 According to Quine's notion of underdetermination, there might be several different scientific theories that explain all accumulated evidence equally well. In such cases there are at least two possible positions one could take. Ecumenists think that both (all) of the
22 See for example Quine (1992).
incompatible theories should be regarded as 'locally' true, whereas sectarians argue that every rival theory ought to be considered false. The main problem with the latter standpoint is that it forces us to make judgments about the truth or falsity of scientific theories that are not based on empirical evidence (or other rational considerations, e.g. simplicity, scope, etc.), since the accumulated evidence and all other relevant features of both theories are equal. The problem faced by advocates of the ecumenical position is to explain what it means for incompatible theories to be 'locally' true; in that case truth as 'correspondence to external facts' seems impossible.
In decision theory the ecumenical position is less problematic. In fact, it seems perfectly reasonable to maintain that in case one and the same act is judged as rational in one formal representation but non-rational in another formal representation, then there are good reasons to regard that act as rational and to regard it as irrational. This is not contradictory, given that acts are here treated as rational only relative to a certain formal representation of a decision problem. Before I render this claim plausible, I shall first show how rival representations may arise in the decision theoretical framework outlined in the preceding sections.
I shall give three examples of rival representations. In the first and the second example, the transformative rules I mention are not only order-independent in the sense that (u ◦ t)(π) ∼ (t ◦ u)(π), they are also order-independent in the stronger sense that (u ◦ t)(π) = (t ◦ u)(π). The latter notion of order-independence is not implied by the weak monotonicity axiom.23 In the third example, only the weak form of order-independence holds. All three examples are trivial from a technical point of view. However, precisely because the examples are so simple, they indicate that order-independence is of greater significance than one might think at first glance.
For the first example, consider the formal representation depicted below. This is a decision problem under ignorance — no probabilities for the states s1, s2, and s3 are known, so the set of probability functions P is empty. The letters a–d denote utilities.
Table 3.1

        s1   s2   s3
a1      a    b    b
a2      c    d    d
a3      c    d    d
According to Luce and Raiffa’s merger of states rule (ms) mentioned in Sections 3.1 and 3.5, it holds that: Merger of States: If π is a formal decision problem in which two or more states yield identical outcomes under all acts, then these states should be collapsed into a single state. (And if there are any known probabilities of the states they should be added.) 23
23 For a discussion of the strong notion of order-independence, see Section 3.5.
By applying ms to the formal representation above one obtains a representation in which s2 and s3 are merged into a single state. This is an improvement of the initial representation, since it leads to a gain in simplicity. For an example, suppose that s2 is the state 'prices increase by ten percent and the coin lands heads up', and s3 is the state 'prices increase by ten percent and the coin lands tails'. Then, by merging s2 and s3 into a single state, one obtains the less complex state 'prices increase by ten percent'.
In order to complete the example, recall the merger of acts rule (ma) introduced in Section 3.5. This rule is parallel to the merger of states rule, except that it operates on acts. More precisely, the ma rule prescribes that alternative acts that yield identical outcomes, no matter which state occurs, should be collapsed into a single act. Arguably, the ms rule and the ma rule return new formal representations that are at least as reasonable as the original representations; hence, both rules satisfy the left-hand side of the weak monotonicity axiom. Furthermore, since all parallel states and acts are detected by applying the ms and the ma rules, both rules are also iterative in the sense required by the right-hand side of weak monotonicity. The set constituted by the ms and the ma rules is, therefore, an example of order-independent transformative decision rules.
For the second example, remember that uncertainty about utilities and second-order uncertainty about probabilities is modelled by a set of utility functions (U) and a set of probability functions (P). Suppose that an agent wishes to transform a formal representation containing sets with many probability functions and utility functions into a representation containing only one utility function and one probability function. (The motivation might be that this makes it possible to calculate the expected utilities of the alternative acts.) Now consider two hypothetical transformative rules, the u rule and p rule, respectively. The u rule is a rule that aggregates the elements of U into a single utility function u, e.g. by calculating the mean utilities, or in some other way. (For present purposes it does not matter how the elements of U are aggregated.) Furthermore, the p rule is a rule that aggregates the elements of P into a single probability function p. (As before, it does not matter exactly how this is done.) It follows trivially that u and p satisfy the right-hand side of the order-independence axiom, and given that the aggregation functions are reasonable, both rules also satisfy the left-hand side of the axiom. Hence, the p rule and the u rule constitute an example of order-independent transformative decision rules—or rather a set of examples, since both rules can be specified in several different ways.
The third example is more controversial than the previous two. Consider the following formulation of the principle of insufficient reason (ir), mentioned in Section 2:
The Principle of Insufficient Reason: If π is a decision problem under ignorance, then it should be transformed into a decision problem under risk π′ in which equal probabilities are assigned to all states.
The ir rule as well as the ms rule satisfy the weak monotonicity axiom. Nothing is gained by applying one of the rules more than once, and none of the rules throw
a spanner in the work carried out by the other. It follows that (ms ◦ ir)(π) ∼ (ir ◦ ms)(π). However, it does not hold that (ms ◦ ir)(π) = (ir ◦ ms)(π) for all π, i.e. the two representations need not be identical. In order to construct an example of this, I stipulate that if the antecedents in the formulations of ir and ms are false, then no transformation is carried out, i.e. the rules return the same representation that was used as input. The following story illustrates how the decision problem corresponding to the formal representation π, depicted below, can arise: You are a paparazzo photographer, and rumour has it that actress Julia Roberts will show up in either New York (NY), Geneva (G), or Zürich (Z). Nothing is known about the probabilities of these three states of the world. You have to decide whether you should stay in Switzerland or catch a plane to America. If you stay (a1) and Ms Roberts shows up in New York (NY), you receive 0 utiles; otherwise, you get your photos and receive 10. If you catch a plane to America (a2) and Ms Roberts shows up in New York (NY) you receive 5 utiles, and if she shows up in Switzerland you receive 6 (because you are able to call a friend who takes even better photos).
Table 3.2

[π]    NY    G     Z
a1     0     10    10
a2     5     6     6

[π′]   1/2   1/2
a1     0     10
a2     5     6

[π′′]  1/3   2/3
a1     0     10
a2     5     6
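To make the two transformation orders concrete, here is a minimal sketch in Python (my own encoding, not the author's): a formal representation is coded as a pair of a probability tuple (None under ignorance) and an outcome table, and the ms and ir rules are applied to π in both orders. The expected utilities it prints are the ones discussed in the next paragraph.

```python
# A minimal sketch (my own encoding, not the author's): formal representations
# as (probabilities, outcomes) pairs, with probabilities == None under ignorance.

def ms(probs, outcomes):
    """Merger of states: collapse states that yield identical outcomes under
    all acts, adding their probabilities if any are known."""
    acts = list(outcomes)
    n_states = len(next(iter(outcomes.values())))
    groups = {}  # outcome column -> indices of the original states it merges
    for j in range(n_states):
        column = tuple(outcomes[a][j] for a in acts)
        groups.setdefault(column, []).append(j)
    merged_outcomes = {a: tuple(col[i] for col in groups) for i, a in enumerate(acts)}
    merged_probs = None
    if probs is not None:
        merged_probs = tuple(sum(probs[j] for j in idx) for idx in groups.values())
    return merged_probs, merged_outcomes

def ir(probs, outcomes):
    """Insufficient reason: under ignorance, assign equal probabilities."""
    if probs is not None:
        return probs, outcomes
    n_states = len(next(iter(outcomes.values())))
    return tuple(1 / n_states for _ in range(n_states)), outcomes

def eu(probs, outcomes):
    return {a: sum(p * u for p, u in zip(probs, outcomes[a])) for a in outcomes}

pi = (None, {'a1': (0, 10, 10), 'a2': (5, 6, 6)})   # states NY, G, Z

pi_prime = ir(*ms(*pi))        # ms first, then ir: probabilities 1/2, 1/2
pi_biprime = ms(*ir(*pi))      # ir first, then ms: probabilities 1/3, 2/3
print(eu(*pi_prime))           # {'a1': 5.0, 'a2': 5.5}   -> a2 maximises EU
print(eu(*pi_biprime))         # {'a1': 6.67, 'a2': 5.67} -> a1 maximises EU
```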
Representation π′ is obtained from π by first applying the ms rule and then the ir rule. Representation π′′ is obtained from π by applying the two rules in the reversed order. Which formal representation is best: π, π′ or π′′? (For the sake of the argument, I assume that no alternative representation is to be considered.) Because of Theorem 3.1, it is known that (ms ◦ ir)(π) ∼ (ir ◦ ms)(π) ≽ π. Hence, π′ ∼ π′′ ≽ π. However, observe that EU(a1) > EU(a2) in π′′, but EU(a2) > EU(a1) in π′. Thus, the principle of maximising expected utility recommends one act in π′′ (stay in Switzerland) but another in π′ (catch a plane to America). How should a rational agent act? Anyone who accepts the two transformative rules ms and ir and considers them to satisfy weak monotonicity will consider the two formal representations to be equally reasonable, because of Theorem 3.1.

As pointed out above, an agent facing rival representations can react in at least two ways. Inspired by the terminology introduced by W.V. Quine,24 I stipulate that you are an ecumenist just in case you think all rival representations are equally
Quine (1992: Chapter 5).
reasonable and it does not matter which one the agent decides to apply. Furthermore, you are a sectarian just in case you think each decision problem directly determines a set of rational acts, irrespective of how that decision problem is, or can be, formally represented.

At first glance, ecumenism seems to imply a contradiction, because if one and the same act is judged rational in one rival representation (by, e.g., the principle of maximising expected utility) but non-rational in another rival representation (by the same principle), ecumenists have to maintain that it is both rational and not rational to perform that act. Thus, if R(b) is the predicate 'it is rational to perform b', one has both R(b) and ¬R(b). However, ecumenists avoid this contradiction by arguing that acts are rational only relative to a certain representation. From a logical point of view, this means that rationality is a two-place predicate, i.e. R(b, π) means 'it is rational to perform b relative to representation π'. Obviously, R(b, π′) and ¬R(b, π′′) are not contradictory. So typically, ecumenists will say things like 'this act is rational when evaluated in this optimal representation but not when evaluated in that optimal representation'.

It should be emphasised that the ecumenic position can provide the agent with action guidance. In the general case in which an act is judged as both rational and non-rational (in different rival representations), ecumenists will maintain that it is permitted both to perform and not to perform the act in question. Furthermore, acts that are non-rational in every rival representation are, according to ecumenists, forbidden. Analogously, an act is obligatory just in case it is permitted in every rival representation and no alternative act is permitted in any rival representation.

Sectarians do not accept the ecumenic assumption that acts are rational only relative to a certain representation. They think that what is rational to do is determined by the decision problem itself, not the representation of it. To talk about rationality as something that is relative to different representations is, according to sectarians, misleading, since the representation is just a tool the agent uses for figuring out what to do in a given decision problem. More precisely, sectarians think that there is a distinction to be drawn between the conditions under which an act is rational, and the conditions under which an agent can know that an act is rational. The phenomenon of rival representations shows that there are limits to what one can know when it comes to rational actions, but it shows nothing about when acts are in fact rational. Thus, sectarians claim that in case there is only one optimal representation of a decision problem one can know which acts are rational, provided that one is able to find that optimal representation and apply the correct effective decision rule to it. But in the typical case, with more than one optimal representation, it does not help sectarians to know which effective rule is correct, since one and the same act may come out as both rational and non-rational in different optimal representations.

A problem for adherents of sectarianism is that all decision rules, e.g. the maximin rule, the minimax regret rule and the principle of maximising expected utility, are defined only in relation to formal representations of decision problems.
For example, as noted above, it does not make any sense to say that the expected utility of one alternative act is higher than that of another alternative act unless there is a formal representation of the decision problem listing the relevant acts, states,
probability and utility functions. Therefore, sectarians are forced to deny that acts are rational because they, for example, maximise expected utility, since the concept of expected utility is only defined relative to a given formal representation. But in decision theory it is quite common to argue that acts are rational because they maximise expected utility, or because they guarantee an optimal security level, or because they have a minimal regret value, etc. Is this way of arguing fundamentally mistaken? Sectarians could perhaps reply that acts need not be rational because they, for example, maximise expected utility, even though acts that are rational also maximise expected utility (in some optimal representation). The trick is to say that the covariation between rationality and optimal expected utility arises in virtue of some independent set of basic features of the decision problem. Exactly what these features are is irrelevant here. However, since an act that has the highest expected utility in one optimal representation may have a very low expected utility in another optimal representation, and vice versa, this position is not compatible with the claim that there is a unique set of rational acts in every decision problem. Therefore, sectarianism seems to be problematic even when assessed from an 'inside' perspective.

Another problem for sectarianism is that it leaves little room for the action-guiding dimension of decision theory.25 If one cannot, in the typical case, know which acts are rational and which are not, because of the phenomenon of rival representations, one has to invent some other theory that can guide those decisions that nevertheless have to be taken. We cannot refrain from deciding whether or not to invest some money in a company, because making no decision will (in the typical case) be equivalent to deciding not to invest any money, and this decision may yield an enormous regret value. More generally speaking, the claim that one sometimes cannot know whether an act a is rational is uninformative from a practical point of view, and it is not analogous to the claim that one sometimes cannot know whether some proposition p is true or false. It is uninformative because if one cannot know whether a is more rational than ¬a, one has to formulate some other criterion besides rationality according to which practical decisions can be evaluated. And the analogy with the doxastic case fails because in the case of beliefs a reasonable option is to accept neither p nor ¬p, whereas in the practical case one cannot refrain from doing either a or ¬a, as illustrated by the investment problem.

Perhaps the formulation of the sectarian's position needs to be improved. In Quine's terminology, a 'sectarian' is someone who thinks that his or her own optimal theory is true and all rival theories are false. In an analogous way, the sectarian position in decision theory could be taken to mean that there is one correct representation of each decision problem (the one the agent actually acts from). This version of sectarianism is, however, implausible for at least two reasons. First, it forces the agent to determine which representation is correct by appeal to some new, unspecified criterion that was not applied when determining the aggregated value of different representations. What could that criterion be?
Second, it sounds odd to speak of the 'correctness' (or 'truth') of a representation, since two incompatible statements cannot both be true, even though such incompatible statements
25 For a careful discussion of the concept of action guidance, see Carlsson (2002).
can be elements of different rival representations. Therefore, this modified version of sectarianism does not seem to be any better than the original version.

A fundamentally different reaction to the phenomenon of rival representations is to claim that in case an agent ends up with more than one optimal representation, he (or she) should go back and start the representation process over. Of course, it might be reasonable to assign different weights to the four values if the representation process is iterated. For example, some people may feel that some amount of simplicity can be sacrificed in order to obtain a unique representation that is optimal. However, this strategy cannot guarantee that a unique and reasonable optimal representation will always be reached in a finite number of iterations; therefore, it is no general resolution of the problem of rival representations.

A fourth option is to let all rival representations form a new 'disjunctive' representation. For example, in case it is equally plausible to assume that the probability of rain is .40 as that it is .75, one can construct a new representation that contains two or more different probability functions that reflect this epistemic uncertainty. (For detailed attempts, see Isaac Levi (1980) and Gärdenfors and Sahlin (1982).) However, even though this strategy might be reasonable in the case of probabilities, it is hard to defend in the case of acts. The reason is that two sets of acts, each of which contains elements that are mutually exclusive and jointly exhaustive, will typically not form a union in which all elements are mutually exclusive and jointly exhaustive; the same holds for states of the world. Hence, this position is no genuine option in a general discussion of rival representations.
Chapter 4
Indeterminate preferences
On the non-Bayesian view, preferences over uncertain prospects should be derived from preferences over outcomes. In short, an uncertain prospect ought to be preferred over another because the agent holds certain desires and beliefs about its possible outcomes. This is a natural extension of the Humean belief-desire model of action, from deterministic to non-deterministic choices. In this and the following three chapters, this so far vaguely stated view will be rendered more precise, and supported by an axiomatic argument. The present chapter seeks to develop a theory of rational preference over outcomes. It is argued that preferences are sometimes indeterminate, meaning that a preference may come in degrees and that the agent may therefore prefer A over B and B over A without violating any principle of logic or rationality. In Chapter 5 a non-Bayesian notion of utility is developed, based on the theory of indeterminate preferences presented here. Chapter 6 articulates a non-Bayesian concept of subjective probability, and Chapter 7 defends an axiomatic analysis of the principle of maximising expected utility. The theory of indeterminate preference articulated in this chapter has its origins in Ramsey’s ‘Truth and Probability’, in which he presented the theory of subjective probability as the ‘logic of partial belief’.1 According to Ramsey, having a belief is not an all-or-nothing affair. It is something that comes in degrees. Therefore, a rational agent may believe in a proposition and its negation simultaneously, given that his beliefs are partial and satisfy a number of structural conditions. The theory of indeterminate preferences proposed here extends Ramsey’s logic of partial belief into a ‘logic of partial preference’. The point of departure is the common assumption that a strict binary preference is asymmetric: Someone who prefers A to B should not prefer B to A. According to the new theory, however, the picture of rational preference inherent in this assumption is too simplistic. Imagine, for example, that you are attempting to choose between listening to a Wagner or Verdi disc. Introspection reveals that your preference is indeterminate, meaning that you have no settled opinion about which composer you prefer most at the moment. To some 1
Ramsey (1926).
degree, you prefer Verdi to Wagner and Wagner to Verdi at the same time, and that is not necessarily irrational. When you reach for the Wagner disc, you think that Verdi would better suit your present mood, but had you reached for the Verdi, you would have thought that Wagner was really what you wanted to listen to. (As will be made clear below, preferences are not to be defined in terms of actual choice behaviour.) Your indeterminate preference for Wagner and for Verdi is not due to any lack of information about the external world. It can be characterised as an instance of 'intrinsic indeterminacy', because the preference in question would remain indeterminate even had you known all relevant facts of the external world. The central claim of the new theory is thus that a preference may come in degrees, and that an agent may therefore prefer A over B and B over A without violating any principle of logic or rationality. The Wagner-or-Verdi case is just one example; there are of course many more.

My proposed analysis of the kind of vacillating preferential attitude exemplified here consists of two steps: Indeterminacy is analysed in terms of partial preferences, and partial preferences are analysed in terms of subjective probabilities for acts. To say that a preference for Wagner over Verdi is partial to degree 0 < p < 1 means that the probability is p that the agent will choose Wagner, if given the opportunity. Obviously, this is not equivalent to saying that the strength of a preference may vary. That claim is trivial. The thesis defended here is that even if one were to keep the strength of a preference fixed, one may still prefer Wagner over Verdi to degree p, and Verdi over Wagner to degree 1 − p, all things considered.
4.1 Previous accounts of preferential indeterminacy

In previous accounts, indeterminate preferences have generally been treated as a special case of indifference. However, Savage (1954) hints at a possible distinction between indifference and indeterminacy: 'If the person really does regard f and g equivalent, that is, if he is indifferent between them, then, if f and g were modified by attaching an arbitrary small bonus to its consequences in every state, the person's decision would presumably be for whichever act was thus modified'.2 Arguably, an agent with an indeterminate preference between f and g would most certainly not make a decision based on an arbitrarily small bonus added to one of the objects. Suppose, for instance, that you have decided to spend all of your savings on a new car. You have a determinate preference for a BMW over a Ford, and a determinate preference for a Mercedes over a Ford. However, your preference for a BMW over a Mercedes, and vice versa, is indeterminate. If you were indifferent about the choice between a BMW and a Mercedes, a $100 discount on either of the two cars would turn your state of indifference into a strict determinate preference for the discounted car. However, a $100 discount on a product costing $50,000 or more will
2 Savage (1954/72:17).
presumably not have such a dramatic effect, indicating that the concept of indifference is distinct from that of indeterminacy. Another option is to explicate indeterminate preferences as instances of an incomplete preference ordering. In the car example, an analysis of indeterminacy in terms of incompleteness would mean that all of the following are false: (i) a BMW is strictly preferred to a Mercedes; (ii) a Mercedes is strictly preferred to a BMW; and (iii) the agent is indifferent between a BMW and a Mercedes. However, this analysis cannot account for the fact that indeterminacy comes in degrees. Suppose, for example, that you have an almost determinate preference for a Mercedes over a BMW and feel that you would buy that car nine times out of ten without changing any of your tastes or wishes. In this case, it seems that the determinacy of the reversed preference (i.e. for a BMW) is not the same as the determinacy of the preference for the Mercedes. The more determinate the preference for the Mercedes, the less determinate the preference for the BMW will be. This indicates that the concept of incompleteness cannot fully account for reasonable intuitions about indeterminacy. A more complex account of indeterminate preferences has been proposed by Isaac Levi, who argues that the phenomenon of what I call indeterminacy arises in what he calls ‘decision making under unresolved conflict’.3 Such decision making is characterised by a clash among conflicting values (e.g. those that are cognitive, practical, and ethical), in which at least one value is set aside no matter which alternative is chosen. Levi may be correct in his analysis of clashes among incompatible values, but such clashes do not seem to be the sole source of indeterminate preferences. Sometimes agents do not know what they prefer, even when all alternatives are evaluated according to only one criterion. This is illustrated by the Verdi-or-Wagner example, in which the only value used for evaluating alternatives is the-music-you-want-most-right-now. Furthermore, Levi’s method for handling indeterminate preferences is dubious. In several of his books, he argues that a rational agent should start the deliberative process by dividing the alternatives into sets of feasible and non-feasible options, and then delimit the first set by applying various criteria such as E-, S-, and V-admissibility.4 In this approach, indeterminacy is primarily reflected in the division into feasible and non-feasible alternatives. If the agent is almost sure that a particular alternative will not be chosen, then that alternative is not feasible. However, a fundamental problem with Levi’s approach is that it leaves little or no room for the idea that preference indeterminacy varies, more or less continuously, by degrees. By categorising alternatives into sets of feasible and non-feasible acts, only extremely rough qualitative statements about preferential indeterminacy can be made. This is a significant limitation. Most of us have experienced that some preferences are much more indeterminate than others. For instance, my preference for red wine over water is much less indeterminate than my preference for a Mercedes over a BMW.
3 Levi (1986).
4 See Levi (1986: Chapters 4-7).
Yet another theory of indeterminate preferences has been proposed by Sugden.5 He suggests that we should talk about ‘taste states’ (e.g. states in which coffee is preferred over tea), and then ascribe subjective probabilities to those states. A drawback of this view is that it requires that there be true taste states, i.e. taste states that actually occur. However, if your preference between coffee and tea is indeterminate, it seems absurd to think that either coffee or tea is your ‘true’ taste state. How can you have a determinate preference for something without knowing it?
4.2 What is a preference?

Before spelling out my theory of indeterminate preferences, it is necessary to raise the more general question of what a preference is, irrespective of whether it is indeterminate or not. There are two predominant, general theories that address this issue. Scholars advocating internalism hold that preferences are mental states that trigger choices. These mental states constitute the agent's dispositions to act. Advocates of revealed preference theory hold a different view: They claim that preferences should be accounted for in terms of actual choices, without referring to mental states. Advocates of revealed preference theory do not claim that preferences should be identified with choices. The claim defended by revealed preference theorists is a weaker one: An agent whose choices fulfil certain structural axioms can be described as if he acted from a preference ordering that satisfies a set of reasonable logical properties. It follows that revealed preference theory is not inconsistent with the claim that mental states are 'in charge' of the agent's choices. The disagreement between internalists and revealed preference theorists is rather about whether mental states should figure in explanations and rationalisations of human behaviour. Briefly put, the basic message of revealed preference theory is that observable behaviour can be represented by a preference ordering in a technically precise way without making substantial assumptions about mental states or other unobservable entities; therefore, such assumptions should be avoided. Internalists do not agree.

Revealed preference theory has a long history. The locus classicus is Paul Samuelson's paper 'A note on the pure theory of consumer's behaviour' published in 1938. The basic assumption is that if x is chosen when y is available, then x is (weakly) preferred to y. According to advocates of revealed preference theory, this holds true, no matter what triggered the agent to choose x. As noted above, revealed preference theorists do not claim that preferences can be identified with choices. The theory merely proves that an agent whose choices fulfil certain structural axioms can be described as if he acted from a preference ordering satisfying a set of technical conditions. This technical result can be summarised as follows:6 Let S = {x, y, z, . . .} be a finite set of items, and let C(S) denote the subset of S that was actually chosen, that is, the 'choice set'. In revealed
5 My comments on this proposal are based on Sugden's presentation at FUR XI in Paris 2004.
6 This presentation of revealed preference theory draws on Till Grüne's excellent discussion in Grüne (2004).
preference theory, x is directly revealed preferred to y, abbreviated as xDy, if and only if x is in fact chosen when both x and y are available.7 Put into technical terms, xDy ⇔ (x, y ∈ S) ∧ (C(S) = x). Furthermore, x is indirectly revealed preferred to z, abbreviated as xIz, if and only if x and z are connected by the relation D through a finite series of intermediary elements yi, . . ., yk in S, such that xDyi ∧ . . . ∧ ykDz. According to the weak axiom of revealed preference theory (WARP), D is an asymmetric relation, i.e. if xDy, then ¬(yDx). The strong axiom of revealed preference theory (SARP) requires that I be asymmetric, so if xIz, then ¬(zIx). The following representation theorem can then be proved: Let WARP hold for S; then there exists a complete and asymmetric preference ordering on S. Furthermore, let SARP hold for S; then there exists a complete, asymmetric and transitive preference ordering on S. (For proofs, see e.g. Varian (1999).) The word 'exist', as used in the present context, does not imply that a preference ordering exists in any physical or mental sense. It just means that it is possible to construct a hypothetical preference ordering that rationalises the agent's behaviour: his choices can be described as if they were guided by a preference ordering.

The formal results of revealed preference theory are, of course, impeccable from a technical point of view, but one can question their explanatory power, nevertheless. First, consider the case in which a preference ordering is constructed from SARP and WARP by using ≻ and ∼ as primitive relations (instead of ≽). Then, what observations should one expect to make if the hypothesis x ≻ y is true? Of course, C({x, y}) = x would falsify the hypothesis that y ≻ x, but this observation cannot distinguish between x ≻ y and x ∼ y.8 This shows that no single observation is sufficient for determining the agent's preference in this case. Next, consider the case in which ≽ is used as a primitive concept, instead of ≻ and ∼. Obviously, C({x, y}) = y and C({x, y}) = x are both compatible with the hypothesis that x ≽ y. Again, no single observation of a pairwise choice between x and y could prove false (or true) that x ≽ y.

The obvious remedy, explicitly advocated by revealed preference theorists, is to base conclusions about preferences on sequences of choices. The basic idea is to systematically exclude alternative hypotheses by asking the agent to make pairwise choices. This procedure presupposes that the agent's preference ordering remains constant while choice data is gathered. Suppose, for example, that one first observes that C({x, y}) = x, but a few moments later observes that C({x, y}) = y. Unless there is reason to believe that the preference between x and y is constant, it cannot be concluded that the preference underlying these choices violates asymmetry. Even if the agent is isolated in a laboratory, it cannot be excluded that his preference changed during the course of the experiment, due to some internal revision of beliefs and values. Furthermore, even if one could somehow guarantee that preferences remain constant, it would not always help to acquire longer series of data. If one makes the observation that C({x, y}) = x a repeated number of times, but makes no
7 Here is Samuelson's formulation: 'if an individual selects batch one over batch two, he does not at the same time select batch two over one'. Samuelson (1938:65).
8 For the sake of the argument, I here exclude the hypothesis that x and y are incomparable.
observations of C({x, y}) = y, this sequence of observations nevertheless cannot distinguish between x ≻ y and x ∼ y.

Another way of explaining this weakness of revealed preference theory is to reason as follows. Since one can never know a priori that preferences remain constant as choice data is gathered, not even if people are kept isolated, it is always possible to formulate at least two different hypotheses that explain a set of choice data equally well. The first hypothesis is that preferences are exactly as the choice data suggest. The second is that preferences changed while the data was gathered. If the second hypothesis is true, one cannot tell whether the agent's choices satisfy structural conditions such as asymmetry and transitivity. This shows that every preference ordering generated by revealed preference theory is underdetermined by data. It is also worth noticing that the representation theorem of revealed preference theory only implies that there exists at least one preference ordering that is consistent with the observed sequence of choices. There might very well be more than one. For example, as pointed out above, a large number of observations of C({x, y}) = x is equally consistent with x ≻ y as with x ∼ y.

The problems described so far can perhaps be considered minor cosmetic defects. However, a more fundamental problem is that revealed preference theory cannot account for incomparability. Presumably, if one neither prefers x to y, nor y to x, nor is indifferent between the two items, one would nevertheless, if faced with a choice between x and y, choose one of the items. For example, if a vacation in Alaska is judged to be incomparable with (an equally long and expensive) vacation in Paris, then the agent neither prefers one to the other, nor is he indifferent. The two trips are so different that they cannot be compared. Nevertheless, if faced with a choice between Alaska and Paris, the agent would presumably choose one of the trips. In this case, an external observer would falsely conclude that he therefore preferred the chosen trip. If the agent simply refused to choose either Alaska or Paris when offered the choice, an external observer would falsely conclude that the agent preferred ∅ (that is, the empty set, a third alternative) over Alaska and Paris. Unsurprisingly, incomparability is not acknowledged as a possible preference relation in revealed preference theory. A leading economist even thinks that denying the possibility of incomparability 'is hardly objectionable' and that 'To say that any two bundles can be compared is simply to say that the consumer is able to make a choice between any two given bundles.'9 In light of the more minute view of incomparability developed in Sections 4.3 and 4.4, it should be clear that this view is problematic. An agent facing a choice between Fanta and Pepsi may very well choose one of them, not because he prefers the chosen soft drink, but simply because he chooses randomly. Arguably, this random choice is a sign of incomparability.

Now consider internalism. Internalists believe in a tight causal connection between mental states and choices. (Advocates of revealed preference theory are not committed to denying that such a tight causal connection might exist, but they do not think this is relevant when rationalising and explaining human behaviour.) For present purposes, a distinction will be drawn between strict internalism and
9 Varian (1999:35).
quasi-internalism. Advocates of strict internalism believe that mental states are real entities located 'inside' agents, and that mental states cause choices in much the same way as ordinary physical objects figure in causal relations. Quasi-internalists, on the other hand, argue that mental states are theoretical constructs ascribed to agents for rationalising their behaviour. According to this view, mental states are not real entities located 'inside' agents. See Dennett (1978) for a defence of this view. In what follows, I focus on strict internalism, which is the most influential version of internalism. According to strict internalism, if you choose x over y, you do so because you are in a mental state that triggers this choice. Of course, advocates of strict internalism acknowledge that there might be extreme cases in which the agent's mental states do not match actual choices. The agent might stumble and press the red button instead of the blue, even though his mental state disposed him to choose the other way around. However, for the advocate of strict internalism, this is just involuntary behaviour. No genuine counter-example can be constructed out of such cases. By stipulation, strict internalists believe that mental states uniquely determine choices. Mental states and choices either stand in a one-to-one relation or, more plausibly, in a many-to-one relation. As will be explained in subsequent sections, the probabilistic theory denies this tight connection between mental states and choices.

A potential problem for strict internalism is that, according to this view, governments, corporations, and other collective agents have no preferences. This is because collective agents have no mental states. A possible remedy could be to reduce collective preferences to individual ones, but this manoeuvre has its own problems. Suppose, for instance, that corporate preferences are reduced to individual shareholder preferences. Also suppose that an agent is a shareholder in two corporations, A and B, which have opposite preferences. The first corporation prefers x to y and the other y to x. For simplicity, it will be assumed that all shareholders within each corporation agree on their attitudes towards x and y, so by reduction, every shareholder must prefer x over y and y over x. This conclusion is, however, false. Arguably, what the shareholders really want is that corporation A, as a corporation, should prefer x over y and that corporation B, as a corporation, should prefer y over x. The reason might be that this would help both corporations improve their competitiveness. Personally, the shareholders have no opinion at all about their preference between x and y.

Strict internalism also faces difficulties of another kind. Suppose, for instance, that an agent plays a game on the Internet without knowing whether his opponent is a human or a computer. After having played for a while, he believes that he has learned something about his opponent's preferences. Later on, he is informed that the opponent was, in fact, a computer and therefore had no preferences at all. Then, why should one accept a theory forcing the agent to make such a drastic revision of beliefs, just to accommodate this rather trivial piece of new information?

However, perhaps the most fundamental challenge faced by strict internalists is to make sense of incomparability. Arguably, the most natural explication would be to let this preference relation denote a specific mental state not resulting in any choice at all.
(Otherwise, if the internalist claims that incomparability corresponds
to a disposition to choose some alternative actively, then there would be no genuine difference between incomparability and an ordinary preference.) Unfortunately, if incomparability is explicated in this way, it is impossible to distinguish incomparability from a deliberate choice to just sit still and do nothing. For example, even if the agent chooses ∅ (rather than a vacation in Alaska or Paris), the internalist cannot exclude that the agent had a disposition to choose ∅. Hence, there seems to be no genuine difference between incomparability and an ordinary preference.
4.3 Introduction to the probabilistic theory

The theory of indeterminate preference advocated here is incompatible with revealed preference theory, as well as with strict internalism. In order to spell out the theory in more detail, we shall introduce a quantitative measure p ranging from 0 to 1, such that 1 corresponds to a determinate preference for x over y, and 0 to a determinate preference for y over x. A preference is indeterminate whenever p lies between the two extremes.

A natural way to make sense of quantitative statements of indeterminacy is to conceive of them as probabilities. This is attractive because it provides a method for empirically establishing degrees of indeterminacy. An external observer analysing the indeterminate preferences of others may take such probabilities to be objective truths about relative frequencies (or, if one is a strict subjectivist about probability, subjective estimates of relative frequencies). Suppose, for example, that the owner of a supermarket observes that one of his customers tends to buy apples but no bananas four times out of ten, and bananas but no apples six times out of ten. In that case, the owner may conclude, if he is somehow able to verify that none of the customer's tastes or wishes have changed, that the customer has an indeterminate preference for bananas over apples that corresponds to a probability of about 0.6. This probabilistic statement is, arguably, an improvement on revealed preference theory. The revealed preference theorist would have to say that the customer changed his preference between apples and bananas a large number of times. However, advocates of both theories agree that a preference is not a property of the agent. It is, on the contrary, something the observer attributes to the agent. The probabilistic theory allows for the formulation of more minute behavioural hypotheses than revealed preference theory does, in that probabilities can be assigned to the agent's choices.

From a philosophical point of view, the first-person perspective is arguably more interesting. However, leaving the general problems of a frequentistic analysis of probability aside, it seems clear that it makes little sense, from a first-person perspective, to analyse indeterminate preferences in terms of relative frequencies. The indeterminacy you face when choosing between an apple and a banana is a mental phenomenon, and should presumably be viewed from your perspective, i.e. from 'within' your mind, not from a frequentistic, 'outside' perspective. Therefore,
indeterminate preferences appear to be better analysed in terms of subjective probabilities.10 If the numbers in the quantitative measure of indeterminacy are to be interpreted as subjective probabilities, there must be some underlying set of events (or propositions) to which they refer. Given that preferences are somehow connected with the tendency to make certain judgements and act upon them accordingly, it is natural to let those events or propositions be related to the agent's current set of options, i.e. to interpret quantitative statements about preferential indeterminacy as subjective self-predictions about which act the agent will eventually choose in the current decision situation. This is equivalent to saying that an indeterminate preference for A over B has the numerical value x if and only if the agent is confident to degree x that she will choose A over B in her current decision situation. For the time being, the concept of subjective probability will be left undefined. I shall return to this issue in Chapter 6. Up to that point, it will simply be taken for granted that subjective probabilities can somehow be assigned to choices.
4.4 The probabilistic analysis of preference

The probabilistic analysis of preference I propose is the following. Let ⟨≻p, O⟩ be a comparison structure in which O is a set of options {x, y, ...} and ≻p is a relation on O such that x ≻p y if and only if the subjective probability is p that x will be chosen over y. Also, let x+ be a slightly improved version of x such that x+ ≻1 x. Presumably, x+ could be conceived of as the smallest noticeable improvement of x that affects the agent's probability of choosing this object. Then,

1. The agent strictly prefers x to y iff x ≻1 y.
2. The agent strictly disprefers x to y iff x ≻0 y.
3. The agent is indifferent between x and y iff (x ≻0.5 y) ∧ (x+ ≻1 y) ∧ (y+ ≻1 x).
4. The agent considers x and y to be incomparable iff (x ≻p y) ∧ (x+ ≻p′ y), where 0 < p < p′ < 1.
The intuition articulated in (1) and (2) is that x is strictly preferred to y just in case it is certain that the agent will choose x rather than y (or vice versa) if given the opportunity. According to the probabilistic notion of indifference articulated in (3), the agent is indifferent whenever there is a fifty-fifty chance for each option to be chosen and a small improvement of one of the items would raise the probability from 0.5 to 1 that the improved item is chosen. Hence, if you are indifferent between Fanta
10 It has been suggested to me that indeterminate preferences could perhaps be analysed as propensities, rather than in terms of subjective probabilities. However, it seems to me that the propensity interpretation cannot adequately explain why anyone who has an indeterminate preference feels and believes that his choice is uncertain. Arguably, this indicates that an indeterminate preference is not merely a propensity; it also requires some kind of belief about one's own future choice.
and Pepsi at current prices, but get the opportunity to buy a Pepsi for one penny less, the probability that you choose Pepsi will rise to one; otherwise you were not indifferent. This notion of indifference can be contrasted with the following asymmetrical indifference relation:

(3*) The agent is asymmetrically indifferent between x and y iff (x ≻p y) ∧ (x+ ≻1 y), where 0 < p < 0.5 or 0.5 < p < 1.

If the probability is, say, 0.7 that you will choose x over y, but it is certain that you will choose x+ over y, then you are asymmetrically indifferent between the two objects in the sense specified in (3*). I by no means claim that this is a common type of probabilistic preference, but it is worth mentioning as a genuine possibility – it cannot be excluded a priori that people sometimes behave like this.

Incomparability is modelled by (4). The basic idea is that the agent considers two objects to be incomparable in case the probability that x will be chosen over y is higher than zero but less than one, given that a small improvement will increase the likelihood that the improved object is chosen just a little bit. From a probabilistic point of view, the difference between indifference and incomparability is thus straightforward. The addition of a small bonus to x in case the agent is indifferent will increase the probability that the agent chooses x from 0.5 to 1; see (3) and (3*). However, if an equally small bonus is added to x when this object is incomparable to y, the probability that x is chosen would presumably increase just a little bit, say from 0.5 to 0.51. Hence, indifference is a probabilistic preference that is extremely sensitive to slight modifications of the alternatives, whereas incomparability is not. It is important to distinguish between degrees of incomparability and the probability that an incomparable option is chosen. The degree D to which a pair of options x and y is incomparable can be defined as D = p′ − p, where p′ is the probability that x+ is chosen over y and p is the probability that x is chosen over y.
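As an illustration of definitions (1)–(4) and (3*), the following sketch classifies a pair of options from two probabilities. The encoding is mine, not the author's, and for brevity it only tests the x+ half of definition (3).

```python
# A minimal sketch of definitions (1)-(4) and (3*); the encoding is mine.
# p       = Pr(x is chosen over y)
# p_bonus = Pr(x+ is chosen over y), where x+ is x with a small bonus added.

def classify(p, p_bonus):
    if p == 1.0:
        return 'x strictly preferred to y'            # (1)
    if p == 0.0:
        return 'y strictly preferred to x'            # (2)
    if p == 0.5 and p_bonus == 1.0:
        return 'indifferent'                          # (3), x+ half only
    if p_bonus == 1.0:
        return 'asymmetrically indifferent'           # (3*)
    if 0.0 < p < p_bonus < 1.0:
        degree = p_bonus - p                          # D, the degree of incomparability
        return f'incomparable, D = {degree:.2f}'      # (4)
    return 'unclassified'

print(classify(0.5, 1.0))     # indifferent (Fanta vs. Pepsi, one penny off)
print(classify(0.40, 0.41))   # incomparable, D = 0.01 (the car example in Section 4.6)
```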
4.5 Reflexivity, symmetry, and transitivity

The non-probabilistic preference relation is reflexive, antisymmetric, and transitive. There are strong reasons for thinking that the corresponding properties of the probabilistic preference relation must be as follows:

P-reflexivity: x ≻0.5 x

P-symmetry: If x ≻p y, then y ≻1−p x

P-transitivity: If x ≻p y and y ≻q z, then x ≻r z, where r = (p · q) / ((p · q) + (1 − p) · (1 − q))
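Read as functions on probabilities, P-symmetry and P-transitivity can be written out as in the following sketch (mine, not the book's); the two composed values anticipate Equations (4.1) and (4.2) below.

```python
# A minimal sketch (not from the book) of P-symmetry and P-transitivity
# as plain functions on probabilities.

def p_symmetric(p):
    """If x is chosen over y with probability p, y is chosen over x with 1 - p."""
    return 1 - p

def compose(p, q):
    """The r in P-transitivity: Pr(x chosen over z), given x >p y and y >q z."""
    return (p * q) / (p * q + (1 - p) * (1 - q))

print(p_symmetric(0.7))     # 0.3
print(compose(0.5, 0.5))    # 0.5 -- indifference remains transitive, Eq. (4.1)
print(compose(1.0, 1.0))    # 1.0 -- strict preference remains transitive, Eq. (4.2)
```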
The formula for P-reflexivity may look somewhat arbitrary. However, note that any other value than 0.5 will imply a contradiction in conjunction with P-symmetry, which may be taken to be a self-evident property. Arguably, it is best to think of the choices described by P-reflexivity as between uniform types: P-reflexivity is not the claim that the agent will choose some individual object with probability 0.5 when compared with itself; the idea is rather that in a choice between one x and another x, the probability is 0.5 that the agent will choose the first x.

The formula for P-transitivity implies that the traditional (non-probabilistic) notion of indifference will remain a transitive relation, because

(0.5 · 0.5) / (0.5 · 0.5 + (1 − 0.5) · (1 − 0.5)) = 0.5    (4.1)

Hence, if x ≻0.5 y and y ≻0.5 z, then x ≻0.5 z. For the same reason, the traditional notion of strict preference is also a transitive relation. That is, if x ≻1 y and y ≻1 z, then x ≻1 z, because

(1 · 1) / (1 · 1 + (1 − 1) · (1 − 1)) = 1    (4.2)
Arguably, the best argument for accepting P-transitivity is that it is entailed by Luce's well-known choice axiom. In order to show this, we need to adopt a slightly more complex notation: Let p(x ≻ B) be the probability that x is chosen from the set of alternatives B, and let p(A ≻ B) mean that, when A is a subset of B, the probability is p that the chosen alternative is an element in A. Assume that all probabilities are nonzero. Now, according to the choice axiom, the probability that x is chosen from B equals the probability of choosing x from A, where A is a subset of B, multiplied by the probability that the chosen alternative is an element in A. In symbols:

Choice Axiom: Let A ⊂ B. Then p(x ≻ B) = p(x ≻ A) · p(A ≻ B).

Theorem 4.1 The choice axiom and the axioms of the probability calculus imply P-transitivity.

To grasp what kind of assumption is at stake in the choice axiom, suppose that an agent is about to choose a wine from a list containing two red and two white ones. The choice axiom tells us that it should not matter if he divides his choice into two stages, that is, first chooses between red and white wine, and then between the wines in the chosen subset, or chooses directly which of the four wines to order. Hence, if he is indifferent between red and white wine in general, as well as between the two red wines and the two white ones, the probability that a particular bottle will be chosen is 1/2 · 1/2 = 1/4.

As first pointed out by Debreu (1960), there are seemingly rational combinations of preferences that violate the choice axiom. The following example was suggested by Coombs et al. (1970): At the restaurant, an agent is indifferent between seafood and meat, as well as between steak and roast beef. The menu comprises only three
dishes: x lobster, y steak, and z roast beef. Let B be the entire menu, let A be the set comprising x and y, and let A′ be the set comprising y and z. Now, given the notation introduced above, since you are indifferent between seafood and meat, p(x ≻ B) = 1/2. However, p(A ≻ B) = 1 − p(z ≻ B) = 1 − p(z ≻ A′) · p(A′ ≻ B) = 1 − 1/2 · 1/2 = 3/4. Hence, the choice axiom implies that p(x ≻ B) = p(x ≻ A) · p(A ≻ B) = 1/2 · 3/4 = 3/8, which contradicts the value of 1/2 obtained directly.

Luckily, advocates of the choice axiom need not feel too worried about this example. The root of the problem lies in the individuation of alternatives.11 Lobster, steak, and roast beef are not alternatives at the same level. Lobster belongs to the category 'seafood', and could equally well be replaced with tuna, or any other seafood dish. However, neither steak nor roast beef could be replaced with some other meat dish, say kebab, because then it would no longer be certain that the agent would remain indifferent between the two meat dishes.12 Arguably, the moral of Debreu's example is that alternatives must be individuated with care. The choice axiom should already be taken into account when alternatives are individuated. It can thus be conceived of as a normative requirement for how alternatives ought to be individuated: They should be individuated such that the choice axiom holds. Presumably, this does not transform the choice axiom into a tautology. It rather imposes a normative constraint at a very basic level.

Before closing this section, it should be pointed out that Luce's reason for assigning probabilities to choices had nothing to do with our present concerns about incomparability. He assigned probabilities to choices in an attempt to address an empirical problem in psychology, viz. to explain why people sometimes behave in ways that contradict the prescriptions of traditional decision theory. Luce's explanation was that people sometimes fail to discriminate which alternative is best for them. More precisely, Luce explicitly claimed that (i) all alternatives faced by the agent have 'true' utilities, but (ii) sometimes the agent cannot fully perceive these utilities and (iii) therefore occasionally chooses sub-optimal alternatives (see Luce 1959; 2005). The higher the probability is that an alternative is chosen, the better is the discrimination. This is parallel to a discrimination problem in which an agent cannot tell which of two stones is heavier. There is a true fact of the matter, but due to limited perceptual capabilities, the truth cannot be revealed. Judged from a contemporary perspective, it seems difficult to justify the assumption that people make probabilistic choices because there are utilities out there that they fail to perceive. The concept of utility is closely related to mental states, and with a few notable exceptions it is widely agreed that mental states are internal and readily accessible through introspection. Therefore, in order to avoid the kind of externalism proposed by Luce, the present study advocates an alternative, subjective interpretation of probabilistic choices. According to this interpretation, there are no external utilities that people fail to discriminate among. Probabilities are rather assigned to choices to indicate that there is a certain amount of genuine indeterminacy involved. Consequently, as explained above, the expression x ≻0.7 y does not
11 Cf. Debreu (1960:188).
12 On this view, since p(x ≻ B) = 1/2 it holds that 1/2 = p(x ≻ B) = p(x ≻ A) · p(A ≻ B) = p(x ≻ A) · 3/4. Hence, p(x ≻ A) = 2/3.
mean that the agent tends to choose x over y seven times out of ten. It rather means that the agent’s degree of belief that x will be chosen corresponds to the number 0.7.
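For readers who want to check the arithmetic in Debreu's restaurant example above, here is a minimal sketch. The variable names are mine, and pairwise indifference is simply coded as probability 1/2, as assumed in the text.

```python
# A minimal arithmetic check of the restaurant example (my own labels).
# B = {x: lobster, y: steak, z: roast beef}, A = {x, y}, A2 = {y, z}.

p_x_B  = 0.5                    # indifferent between seafood and meat
p_z_A2 = 0.5                    # indifferent between steak and roast beef
p_A2_B = 0.5                    # Pr(the chosen dish is a meat dish)
p_A_B  = 1 - p_z_A2 * p_A2_B    # Pr(choice falls in {lobster, steak}) = 3/4
p_x_A  = 0.5                    # assumed indifference between lobster and steak

print(p_x_A * p_A_B)            # 0.375 -- what the choice axiom then forces for p(x > B)
print(p_x_B)                    # 0.5   -- what indifference between seafood and meat demands
print(p_x_B / p_A_B)            # 0.667 -- footnote 12: the p(x > A) that restores consistency
```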
4.6 The choice axiom is not universally valid

As explained above, P-transitivity is entailed by the choice axiom. It is therefore worthwhile to scrutinise this axiom in some detail.13 Imagine that you are about to buy a new car, a Jaguar or a Porsche. The Jaguar is elegant and comfortable, but not as fast and reliable as the Porsche. The probability that you choose the Jaguar is 40/100. You are then offered a third alternative, viz. a slightly discounted (say, $10 cheaper) but otherwise exactly similar Jaguar. When faced with a pairwise choice between the discounted Jaguar and the Porsche, the probability is 41/100 that you choose the Jaguar. The fact that you get a small discount makes it a little bit more probable that you will choose the Jaguar, but since the two options are very different the difference in probability is relatively small. According to the probabilistic analysis of preference, you consider the two cars to be incomparable. However, when faced with a pairwise choice between the discounted Jaguar and the non-discounted Jaguar, it is almost certain that you will choose the discounted Jaguar. Let us say that the probability is at least 99/100, although it is, of course, not 1. (Strictly speaking, you should never assign a probability of 1 to any contingent event.) This means that a rational agent should be allowed to make the following pairwise probabilistic choices:

p: Choose the Jaguar over the Porsche with probability 40/100.
r: Choose the discounted Jaguar over the Porsche with probability 41/100.
q: Choose the discounted Jaguar over the (non-discounted) Jaguar with a probability of at least 99/100.

The probabilistic choice dispositions described above contradict the choice axiom. In order to see this, consider the formula for P-transitivity in Section 4.5. In our example, p denotes the probability of choosing the non-discounted Jaguar over the Porsche, and q denotes the probability of choosing the discounted Jaguar over the non-discounted Jaguar. Hence, p = 40/100 and q ≥ 99/100. It follows that r ≈ 99/100, which contradicts the probability of r = 41/100 reported in our example. More generally speaking, it holds that whenever q (or p) gets close to 1 in the formula for P-transitivity, then r will also approach 1. This can be formally established by calculating the following limit:

lim(q→1) (p · q) / ((p · q) + (1 − p) · (1 − q)) = 1    (4.3)
13 I have not been able to trace the original source of the counter example below. We know that Debreu was familiar with it, but the example has also been attributed to Armstrong, although it is not mentioned in his (1939).
Equation (4.3) shows that the counterexample does not depend on the exact values of the non-extreme probabilities. The numerical example thus serves as an illustration of a more general point: Intuitively, if you are offered a choice between two objects, x and y, and it turns out that your choice is probabilistic, then a slight discount on one of them, say y, may not always have any dramatic effect on your choice between x and the slightly discounted object y+, although the discounted object y+ must of course be chosen over the non-discounted object y with near certainty. However, as can be seen above, the choice axiom does not allow for this. If it is almost certain that y+ is chosen over y, then it must also be almost certain that y+ is chosen over x. Hence, if you are thinking of buying a Jaguar or a Porsche, and there is a 40/100 probability that you will buy the Jaguar, then any small discount on the Jaguar, say one cent, must increase the probability that you buy the Jaguar from 40/100 to a number arbitrarily close to 1. This is of course fine if you are (asymmetrically) indifferent between the two cars. However, in the example described above, the cars were assumed to be incomparable.

It seems that two radically different conclusions could be drawn from this example. First, one could dismiss the intuition that the cars are incomparable and save the choice axiom. As explained above, I think this is the wrong conclusion. The second alternative is to give up the choice axiom as a general constraint on indeterminate preferences. This is, I think, the right conclusion. There are decision problems in which the choice axiom holds true, and others for which it is false. Nowadays, even Luce agrees with this view. He no longer thinks of the choice axiom as an axiom; he rather prefers to call it the 'choice property'.14 Thus, the choice axiom is invalid in decisions among incomparable objects. An agent facing incomparable objects is not irrational. However, as will be explained in Chapter 5, one can only ascribe utilities to (sets of) objects for which the choice axiom holds true. This means that the expected utility principle cannot be applied to every possible decision problem. It can only be applied to cases with well-defined utilities. Arguably, this is very much what one should have expected. Anyone facing a decision between two genuinely incomparable objects will fail to maximise expected utility, simply because there is no expected utility to be maximised in such cases.
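The numbers behind the car example, and the limit behaviour stated in Equation (4.3), can be checked with a few lines of code. This is a sketch of mine, not the author's.

```python
# A minimal sketch (not from the book): the car example run through the
# P-transitivity formula, plus the limit behaviour of Equation (4.3).

def compose(p, q):
    return (p * q) / (p * q + (1 - p) * (1 - q))

p = 0.40        # Pr(non-discounted Jaguar chosen over Porsche)
q = 0.99        # Pr(discounted Jaguar chosen over non-discounted Jaguar)

print(compose(p, q))           # about 0.985: what the choice axiom forces,
                               # against the intuitively reasonable 0.41
print(compose(p, 0.999999))    # approaches 1 as q -> 1, as Equation (4.3) says
```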
4.7 Spohn and Levi on self-predicting probabilities

The probabilistic theory of preference requires us to assign probabilities to our own choices. Wolfgang Spohn and Isaac Levi have independently argued that it is incoherent to assign subjective probabilities, defined in the Bayesian way, to one's own choices. Spohn even goes as far as to say that it is 'absurd to assume that someone
14 Personal communication, February 2008.
has subjective probabilities for things which are under his control and which he can actualise as he pleases'.15 Even though I do not accept the premise of Spohn's and Levi's objection—that subjective probability should be defined in the Bayesian way, i.e. in terms of preferences over bets—I still think it is important to discuss their argument. It is desirable that a theory of indeterminate preference does not rely on any particular theory of subjective probability. Therefore, in this section, it will be argued that Spohn's and Levi's attacks on self-predicting probabilities rest on an incorrect view about the limits of measurement theory: Even if a particular measurement instrument tends to affect the outcome of the measurement, it does not follow that the disturbance cannot be controlled.

Let us first consider Spohn's argument, as it is presented by Rabinowicz (2002).16 Suppose, for reductio ad absurdum, that the agent has a self-predicting probability x that she will perform act A rather than ¬A in her present decision situation. Let eu(A) be the expected utility of act A, disregarding any bets placed on act A itself, and let eu(¬A) be the corresponding expected utility of act ¬A. According to the Bayesian account of subjective probability, if the subjective probability that A will be performed is x, then the agent considers the fair price P for buying a bet that pays W if A is performed and nothing otherwise to be such that P/W = x. If the agent wins her bet, her net gain will be N = W − P. Furthermore, since the agent is rational, we may conclude that she should refuse any offer to buy an unfair bet, in which P/W > x. However, also observe that the agent will choose A rather than ¬A if and only if eu(A) + N > eu(¬A). Hence, it is the difference between P and W that directs the agent's choice between A and ¬A, not the ratio between P and W. If the difference is sufficiently large, then the agent will accept the bet on A even in case P/W > x. This contradicts the above-mentioned conclusion, and hence there is no room for self-predicting probabilities in the betting account of subjective probability.

As stated here, Spohn's argument seems to presuppose that the agent is certain that she will take the act upon which she has placed a bet. However, as pointed out by Rabinowicz, the argument can be reconstructed without this assumption.17 Let eu(A) < eu(¬A) and let the net gain N of winning a bet placed on A be sufficiently high for making it true that eu(A) + N > eu(¬A). Then, if the agent is rational in most cases but not in all, the probability that A will be performed is fairly low before the bet on A is offered (since the expected utility of ¬A is higher). However, merely offering this bet has made the act in question much more probable, since the agent's utility for doing so has increased from eu(A) to eu(A) + N, which exceeds eu(¬A). This runs counter to the very idea of defining self-predicting probabilities in terms of betting dispositions: The mere offering of a bet can hardly be allowed to affect the probability that an act will be chosen.

The basic rationale behind Spohn's argument is, of course, reasonable. If one can win a large amount of money by performing an act that one would otherwise
Spohn (1977:115). Spohn’s original argument in his (1977) is very brief. Rabinowicz (2002:101).
have considered unattractive, merely offering a bet might turn the unattractive alternative into an attractive one. However, the importance of this problem should not be exaggerated. First, Rabinowicz (2002) points out that the betting situation could be modified such that the agent asks a well-informed friend who knows him very well to do the betting, without telling the agent whether he will win or lose if the alternative the friend bets on is chosen. A drawback of Rabinowicz's solution is that it solves the problem by inventing a new theory of probability. Subjective probability is no longer defined as before. So in a strict sense, Rabinowicz's reply does not address Spohn's objection. However, there is also another reply to Spohn's argument. As the agent is offered the option to bet on her own acts, the measurement process is disturbed by the measurement instrument. This occurs in a similar way when one tries to measure the temperature in a cup of hot water using a very cold thermometer. Suppose, for instance, that the true (mean) temperature of a cup of water is 95°C, and that one tries to measure this with a thermometer taken directly from the freezer. The thermometer might read 94°C when it is put into the water, because the thermometer has cooled the water. On a theoretical level, this shows that it is impossible to measure the temperature of a fluid by putting a sensor into it, unless one already knows the temperature of the fluid and adjusts the temperature of the sensor accordingly before it is put into the fluid. From a practical point of view, this is, of course, a trivial problem. One must simply ensure that a sufficiently small sensor (thermometer) is used, such that the effect on the object being measured becomes negligible. The insight illustrated in the example above also applies to subjective probabilities. A fundamental problem with Spohn's argument is that it presupposes no limit to how much money (or other valuable entities) one might use when constructing bets. Of course, if the stakes are high, the measurement system itself (the bets) may affect the object of measurement (the agent's preferences among acts). Even though the standard account of subjective probability contains no upper limit to the amounts that may be offered in bets, we can still impose such a limit. More precisely, it is reasonable to demand that, if a set of bets is used to measure the probability for one's own acts, the maximum net gain from such bets should be chosen such that it is negligible compared to the expected utilities of the acts. This will guarantee that no unattractive act is turned into an attractive one merely by offering a bet on it. If you were about to choose among acts involving potential gains or losses of thousands of dollars, then your preference would not be affected by a bet in which you could win only a few cents by performing the act you bet on. The upshot of this criticism is that we do not have to accept the claim that the potential net gain N of winning one's bet will affect our preferences, because if N is negligibly small, the measurement instrument will not affect the object being measured. Rabinowicz has suggested that this point is not applicable if the agent is indifferent between two acts.18 If you are truly indifferent, then an arbitrarily small bonus added to either option will dictate your preference. However, it seems to me that this is not a genuine problem. If an arbitrarily small bonus has such a dramatic effect on

18 In conversation.
one's preference, this effect can, of course, be isolated. If you are truly indifferent, we know that the two options are equally probable to be chosen, which means that we have successfully measured the entity we wished to measure. The upshot is that what counts as a 'negligible' amount must be determined relative to each decision situation/instance of measurement, and if no amount is negligible that is also a helpful piece of information. The philosophically important moral is that the amounts being offered must be chosen with care. The analogy with the thermometer is of some relevance: on a theoretical level, the measurement instrument will almost always disturb the object of measurement, but on a practical level this need not be a problem, since one can use an instrument with a negligible effect on the object being measured. In conclusion, it seems that if one is aware of Spohn's point, one can adjust the measurement process so that the disturbance is rendered minimal or non-existent. The underlying phenomenon—the object we try to measure—surely exists, no matter what the measurement theorist argues. Let us now consider Levi's argument. His point of departure is similar to Spohn's: (i) If an agent ascribes subjective probabilities to acts, then the agent will be prepared to take on bets on which act she will eventually choose. (ii) By taking on such bets, it becomes more attractive for the agent to perform the act she is betting on, so (iii) the 'measurement instrument' (the bets used for eliciting subjective probabilities) will interfere with the entity being measured.19 The gist of Levi's argument does, however, differ from that of Spohn's. As before, let there be only two alternative acts, A and ¬A, and assume that eu(A) > eu(¬A). Furthermore, let x be the agent's subjective probability that she will perform A in her present decision situation. This means that the agent would be willing to accept a bet on her performance of A in which x = P/W, where P and W are positive amounts. If the agent wins her bet, the net gain is N = W − P. However, when offered the bet on A, the agent suddenly has four alternatives to choose from, rather than just two, viz.: bet on A and perform A (to be abbreviated as b ∧ A), bet on A and perform ¬A (b ∧ ¬A), do not bet on A and perform A (¬b ∧ A), and finally, do not bet on A and perform ¬A (¬b ∧ ¬A). The expected utilities of the four alternatives are as follows:

1. eu(b ∧ A) = eu(A) + N
2. eu(¬b ∧ A) = eu(A)
3. eu(b ∧ ¬A) = eu(¬A) − P
4. eu(¬b ∧ ¬A) = eu(¬A)
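To fix ideas, the following small sketch computes the four options for purely illustrative figures of my own choosing (they are not Levi's). It shows that, for these figures, the agent accepts the bet on A at every price P up to W itself, so that the betting quotient elicited in this way is driven towards 1.

# Illustrative assumptions: eu(A) > eu(not-A); the bet pays W if A is performed
# and its asking price is P.
eu_A, eu_notA, W = 10.0, 8.0, 5.0

def best_option(P):
    options = {
        "(1) bet and perform A":      eu_A + (W - P),
        "(2) no bet, perform A":      eu_A,
        "(3) bet and perform not-A":  eu_notA - P,
        "(4) no bet, perform not-A":  eu_notA,
    }
    return max(options, key=options.get)

for P in [1.0, 2.5, 4.0, 5.0]:
    print(P, best_option(P))
# (1) is never beaten for any price P up to W, so the 'fair price' elicited by
# betting is P = W, i.e. a betting quotient P/W of 1.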
Since the agent is rational, she will not perform (3); that alternative is dominated by (4), since P > 0. Furthermore, (1) will be at least as good as (2), since N ≥ 0. It is assumed that eu(A) > eu(¬A), so it follows that the agent will prefer (1) to (4) at least up to the point at which N = W − P = 0. Hence, the fair price for a bet on A is P = W, and it directly follows that the agent's subjective probability of performing

19 An excellent discussion of Levi's argument can be found in Joyce (2002). Here I analyse only a small part of Levi's argument, namely the part that is relevant to the present discussion.
A is P/W = 1. This is why Levi believes that 'To be an agent crowds out being a predictor'.20 Rabinowicz objects to Levi's argument because it does not show that the probability for A is 1; the argument only shows that the probability for the conjunction b ∧ A is 1.21 More precisely, Levi's argument fails because the agent's betting dispositions depend on the value she assigns to the alternatives that are open to her, and the value of a whole (taking on the bet b and doing A) need not be the sum of the value of its parts. In Levi's example, the value of one of the parts, the bet b, clearly increases in the presence of the other part, act A, as shown above. That is why b ∧ A will be assigned a probability of 1. However, according to Rabinowicz, the fair betting rate for A can only be determined by considering

what bet on A the agent is willing to take, period. What bet on A he is willing to take in combination with A is irrelevant for the specification of his betting rate for A.22
Rabinowicz proposes the following analogy to clarify his point: A motorist who is driving behind a truck might be prepared to swerve to another lane and accelerate, i.e. to perform a combination of two acts. We may assume that the agent is more willing to both swerve and accelerate than just to swerve. But if the motorist has some doubts about whether she actually will swerve if she accelerates, then she is not prepared to accelerate. Even if the probability for the swerve-and-accelerate act is 1 whenever this act is available, it does not follow that the probability that the motorist will accelerate is also 1. Rabinowicz's argument seems to be correct. However, what he does not explain is how one is to determine 'what bet on A the agent is willing to take, period'.23 Isn't Levi's point that the only available method is to offer bets on an act, just as the act in question is available, and that this leads to the problem spelled out above? At this point, it is appropriate, per Rabinowicz's suggestion, to invoke the idea that our minds are modular.24 As explained by philosophers working on the philosophy of biology, to say that our mind is modular is equivalent to saying that 'the mind is a set of special purpose thinking devices or modules whose domain-specific structure is an adaption to ancestral environments'.25 It is plausible to assume that choices are taken care of by one module, and cognitive assessments of likelihoods by another. The agent can freely switch between the two modules, which cooperate with each other. The cognitive module, for instance, provides input to the choice module that is

20 Levi (1997:32).
21 Rabinowicz (2002:109).
22 Ibid.
23 Ibid.
24 Rabinowicz (2002:92).
25 Gerrans (2002:305). There are at least three different theories of what a mental module is. According to the hardware conception, the human brain is 'a set of cognitive devices with distinct neural realizations'. According to the algorithmic conception, modules are instead 'individuated, not by their physical but their computational architecture'. Finally, the epistemic conception holds that 'modularity is a domain specific body of innate knowledge'. (All quotes are from Gerrans (2002:307).)
used in deliberation. This modular account of the mind explains why an agent cannot simultaneously see an act as an object of choice and as an object of prediction, even though she can work out the probability of any given act open to her at present by jumping back and forth between the two modules. The modular account has many advocates among biologists and philosophers,26 but whether it is correct is ultimately an empirical question. If it turns out to be false, act probabilities cannot be empirically established in the way suggested above, and some other method or device will have to be applied. However, it is important to bear in mind that even if the modular account does turn out to be false, it does not follow that Levi’s objection is valid.
4.8 Further remarks on indeterminate preferences

It is commonly assumed that agents have privileged access to their own mental states, i.e. that sincere statements about one's own mental state cannot be mistaken. If, for instance, you believe that you feel pain, it follows that you do feel pain. By the same token, if an agent believes that she prefers Wagner to Verdi, then she really does prefer Wagner to Verdi, because we never make mistakes about our own preferences. In order to avoid any misunderstanding, it is worth noting that nothing said in this chapter is incompatible with the assumptions that preferences are never mistaken, and that we have privileged access to our own mental states. The basic point is simply that agents sometimes have no settled opinion about what they prefer most. In such cases there is no 'true' preference that can be discovered through introspection. These indeterminate preferences should be taken to be partial, in the same sense that Ramsey argues that almost all of our beliefs are partial. It might perhaps be objected that, with the probabilistic theory, statements about preferences become purely cognitive claims. If an agent claims that he prefers Pepsi over Fanta, this is a factual statement about the world. The agent is simply talking about what he believes he will do in the future. He is not talking about how the world ought to be. Therefore, it might be argued that the evaluative aspect of preferences is lost somewhere in the analysis. Arguably, the best counterargument to this observation is to point out that an agent's prediction about his own future behaviour is indirectly influenced by his emotions and other non-cognitive processes. Since I desire Pepsi more than Fanta, I predict that I will probably choose Pepsi the next time I have dinner at McDonald's. This suggests that the evaluative aspect has, after all, not been lost. My non-cognitive processes indirectly influence my predictions about my behaviour. Before closing this chapter, a point should also be made about the phenomenon known as weakness of the will (akrasia). Weakness of the will arises when an agent 'acts intentionally . . . counter to his own best judgment'.27 Suppose that an agent

26 See Gerrans (2002).
27 Davidson (1980:21).
is almost sure that she prefers to quit smoking over continuing to smoke. If her preferences are analysed as (correlated) tendencies to utter certain judgements and act in accordance with them, it should follow that, if no abnormal conditions are present, there is a very high probability that the agent will quit smoking. Since the agent has been smoking for many years, however, she is strongly disposed not to quit smoking. A reasonable response to this example is that the agent does not really have a determinate preference to quit smoking. Even though she might be disposed to utter the right judgements (that smoking is bad for her and that she should stop), it is obvious that she does not have the corresponding act disposition. Rather, the truth is that the agent merely wishes that she could muster a preference to quit smoking. The conclusion, then, is that agents do not always have the preferences they wish they had. Of course, the modular account of the mind proposed by Rabinowicz can also be used to account for weakness of the will: There is a cognitive module in our minds that is responsible for judging which act is best, and another module that prompts our action. Since the two modules are separate, it is sometimes the case that acts judged as sub-optimal by the cognitive module are still performed by the act-executive module.
Chapter 5
Utility
A utility function measures how strongly the agent desires one outcome in relation to another. As explained in Chapter 2, Bayesians define utility in terms of preferences over uncertain prospects. If an agent is indifferent between the prospect of getting a copy of Wittgenstein's Philosophical Investigations for certain, and the prospect of winning a copy of Tractatus if a coin he believes to be fair lands heads and a copy of Zettel if it does not, then his utility of the three books is as follows: u(Tractatus) = 1, u(Philosophical Investigations) = 0.5, and u(Zettel) = 0. Non-Bayesians think that this theory puts the cart before the horse, since it presupposes that agents have access to preferences over uncertain prospects before they make their choice. The aim of this chapter is to present an alternative, non-Bayesian theory of utility. The theory I propose draws on the notion of indeterminate preferences developed in Chapter 4. Another important element is a technical result originally proved by Luce (1959/2005). Luce showed that there is a close link between the utility of an object and the probability that it is chosen. More precisely put, he proved that the higher the utility of an object is in relation to another, the more probable it is that the better object will be chosen. However, in opposition to Luce, I argue that a probabilistic theory of utility should be based on a subjective notion of probability. Hence, the utility of an object is defined in terms of the agent's self-predicting subjective probability that he will choose the object in question. As a consequence of this, I argue that utility, and hence desire, is a certain kind of belief—the belief that one is likely to choose an object if given the opportunity to do so. Although this view differs from the traditional Humean picture, according to which desires cannot be reduced to beliefs, I believe the probabilistic theory of utility is nevertheless compatible with the empiricist spirit of the Humean project. It is worth pointing out that the new theory allows the agent to derive a utility function without considering any uncertain prospects. Even the agent who believes that the world is entirely deterministic could use this theory for assigning utilities to outcomes. The non-Bayesian theory advocated here can be contrasted with the classical theory, which identifies utility with happiness. The classical theory is closely
interconnected with utilitarianism and writers like Bentham and Mill.1 It is unfortunate that the classical theory has never been spelled out in a technically precise way. This makes it difficult to compare the merits of the two theories. Before spelling out the probabilistic theory, I shall therefore make some effort to render the classical theory more precise. I shall also discuss a third alternative, a non-Bayesian theory of utility developed by Halldén and Sahlin, according to which utility is defined in terms of second-order preferences. Throughout the chapter, the discussion will be limited to intra-personal comparisons of utility. The special problem arising when making inter-personal comparisons is of little relevance to the present discussion.
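For concreteness, the arithmetic implicit in the Wittgenstein example at the beginning of this chapter is simply this: since the agent is indifferent between Philosophical Investigations for certain and the coin gamble, the Bayesian sets u(Philosophical Investigations) = 0.5 · u(Tractatus) + 0.5 · u(Zettel) = 0.5 · 1 + 0.5 · 0 = 0.5, where the endpoints u(Tractatus) = 1 and u(Zettel) = 0 are merely a conventional normalisation of the scale.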
5.1 The classical theory

To the best of my knowledge, the most sophisticated defense of the classical theory is McNaughton (1953), who argues that utility should be conceived of as a certain mental state, which he calls happiness. Briefly put, McNaughton's main idea is to divide the utility of an outcome into temporal intervals, such that the utility may vary from one interval to the next, but not within an interval. Call such intervals, which may be arbitrarily small, moments of utility. It is, of course, an empirical question whether moments of utility exist. It cannot be excluded that, in some time periods, there exist no constant moments of utility. In order to overcome this problem (not mentioned by McNaughton) I shall assume that if m is an interval which cannot be divided into a sequence of constant intervals, then it is always possible to construct a constant interval m′ covering the same time interval, such that m ∼ m′, by choosing some m′ having the right 'intensity'.2 A moment of utility is to be thought of as a property of an individual's experience in a certain time interval. The more an agent wants to experience a moment for its own sake, the higher is the utility of the moment. Thus, the agent's well-informed preferences among different moments are likely to be the best way of determining the utility of moments. In this respect, the classical utility concept resembles the Bayesian approach, since the latter also uses preferences for measuring utility. However, note that the classical theory does not make any use of risk, uncertainty, or preferences over lotteries. McNaughton's theory has never become accepted in wider circles. I think there are at least two reasons for this. First, McNaughton's theorem was never satisfactorily proved.3 Second, some of the axioms McNaughton suggested are not very plausible. For example, his fifth axiom holds that 'A moment of no conscious experience counts as nothing'.4 However, to identify unconsciousness as a zero-point seems arbitrary. Perhaps it is good to be unconscious.

1 See e.g. Bentham (1789/1970) and Mill (1863/1998).
2 This presupposes that 'intensity' is a relevant factor for determining the utility of an outcome; if it isn't, just choose another factor that is.
3 McNaughton listed several axioms, but he did not state or prove any formal theorems.
4 McNaughton (1953:175).
In what follows I shall depart from McNaughton's original theory and opt for a formally more rigorous exposition, which relies on less controversial axioms. Let M = {a, b, c, . . .} be a set of utility moments, and let the pair ⟨M × M, ≽⟩ be a comparison structure for utility moments, such that ab is the difference in utility between the moments a and b, and the cartesian product M × M is the set of all possible such differences between the moments a, b, . . . in M. Of course, ab ∼ cd iff ab ≽ cd and cd ≽ ab. We say that ⟨M × M, ≽⟩ is an algebraic-difference structure if and only if the following five axioms are satisfied for all a, b, c, d, a′, b′, c′ ∈ M and all sequences a1, a2, . . . , ai, . . . ∈ M:5

Classic 1 ≽ is a weak order on M × M.
Classic 2 If ab ≽ cd, then dc ≽ ba.
Classic 3 If ab ≽ a′b′ and bc ≽ b′c′, then ac ≽ a′c′.
Classic 4 If ab ≽ cd ≽ aa, then there exist d′, d″ ∈ M such that ad′ ∼ cd ∼ d″b.
Classic 5 If a1, a2, . . . , ai, . . . is a strictly bounded standard sequence (ai+1ai ∼ a2a1 for every ai, ai+1 in the sequence; not a2a1 ∼ a1a1; and there exist d′, d″ ∈ M such that d′d″ ≽ aia1 ≽ d″d′ for all ai in the sequence), then it is finite.
Theorem 5.1 (Krantz et al 1971) If ⟨M × M, ≽⟩ is an algebraic-difference structure, then there exists a real-valued function u on M such that for all a, b, c, d ∈ M:

u(a) − u(b) ≥ u(c) − u(d) if and only if ab ≽ cd.     (5.1)
Moreover, u is unique up to a positive linear transformation, i.e., if u′ has the same property as u, there are real constants α, β, with α > 0, such that u′ = αu + β. Theorem 5.1 is a standard result in measurement theory. Briefly put, it shows that if a set of utility moments satisfies the axioms of an algebraic-difference structure, then utility can be measured on an interval scale. The question is, then, whether the proposed axioms hold or fail in the intended interpretation. Axiom 1 in the classical theory is an ordering axiom. I shall return to it at the end of this section. Axiom 2 is straightforward, and will not be further discussed here. Axiom 3 is a monotonicity condition. Krantz et al (1971) point out that this axiom does not hold if, for instance, we take an ellipse and order intervals according to the lengths of chords.6 In the example with utility moments, it seems hard to find out whether Axiom 3 holds or not, partly because this interpretation is rather abstract.

5 Cf. Krantz et al (1971: Chapter 4).
6 Cf. Krantz et al (1971:145-6).
However, an argument speaking in favour of the axiom is that it trivially holds for many causes of utility, e.g. money. Axiom 4 is sometimes referred to as a solvability condition. It asserts that a given positive interval cd can be 'copied' within any larger positive interval ab, no matter if one decides to use a or b as the endpoint of the copy. This implies, among other things, that the elements of M have to be indefinitely fine grained. In order to see how this might be problematic, suppose that you dislike sugar in your coffee, and—due to certain biological circumstances—can discern between 0 and 0.5 gram of sugar and between 0.5 and 1.5, but not between 0.5 and 1.0 gram. Given that your utility scale for sugar is proportional to the differences in the amounts of sugar you are offered, it follows that this scale is not indefinitely fine grained. However, even if there is no pair of sugar amounts that equals 1.0 utilities on your interval scale, that level can, arguably, be reached by offering you a certain amount of sugar and some amount of money and some other objects that you either like or dislike. So given that we have enough objects to offer you, it seems plausible to assume that Axiom 4 will be satisfied. Axiom 5 is essentially an Archimedean condition, asserting that a standard sequence (1,2,3 metre sticks; 1,2,3 degrees Celsius; etc.) has to be finite given that it is strictly bounded. In the case with metre sticks, Axiom 5 implies that it only takes a finite number of metre sticks to fill the distance between Tokyo and Sydney, since that distance is strictly bounded. Arguably, this axiom is no less plausible in the case with utility moments. Suppose, for instance, that you experience a bad utility moment in the morning but a good one in the evening. Then this change has arguably taken place in a finite number of steps given that all increments were non-zero and of equal magnitude. (If they were not, Axiom 5 is not violated.) It is worth noting that none of the five axioms leading to Theorem 5.1 presupposes comparisons between time intervals of different lengths. An initial set of moments can always, as noted above, be repartitioned into a new set of moments in which all moments have the same duration. This is a noteworthy advantage over McNaughton's original theory, in which some of the axioms presuppose comparisons between time intervals of very different lengths.7 If time intervals of different lengths have to be compared, a preference for e.g. one hour at home to three hours at work might depend on how the remaining two hours are spent in the first alternative, which implies that a preference stated by an agent is not only a preference between the compared moments.8 As explained above, Classics 1–5 imply that utility can be represented on an interval scale. I shall now formulate a slightly different set of axioms, saying that utility can be represented on a ratio scale. As before, I will use some standard results from measurement theory, and focus on whether the proposed axioms hold in the intended interpretation. Let the pair ⟨M, ≻⟩ be a comparison structure for moments of utility, in which M is a set of moments, and ≻ is a relation on M representing strict preference.

7 See McNaughton (1953:175-6).
8 Cf. Danielsson (1983).
Indifference is defined in the ordinary way: a ∼ b iff ¬(a ≻ b) ∧ ¬(b ≻ a). Also suppose that there is a binary operation ◦ on M. Intuitively, a ◦ b denotes the utility of first experiencing the utility moment a, followed immediately by the utility moment b. The pair ⟨M, ≻⟩ is an extensive structure if and only if, for all a, b, c, d ∈ M, the following four axioms hold:9

Classic∗ 1 ≻ is a strict weak order on M.10
Classic∗ 2 For all a, b, c in M: [a ◦ (b ◦ c)] ∼ [(a ◦ b) ◦ c].
Classic∗ 3 For all a, b, c in M: a ≻ b ⇔ (a ◦ c) ≻ (b ◦ c) ⇔ (c ◦ a) ≻ (c ◦ b).
Classic∗ 4 For all a, b, c, d in M, if a ≻ b, then there is a positive integer n such that (na ◦ c) ≻ (nb ◦ d), where na is defined inductively as 1a = a, (n + 1)a = (a ◦ na).

Axiom 1∗ and 2∗ do not need any further clarification. Axiom 3∗ states that in case a utility moment a is preferred to b, then a ≻ b even in case a moment c comes before or after those moments. Of course, this axiom does not imply that the entities that cause utility can be attached in this way. For example, if salmon is preferred to beef, it would be a mistake to conclude that salmon followed by ice cream is preferred to beef followed by ice cream. Arguably, what Axiom 3∗ tells us is that if the utility of eating salmon is preferred to the utility of eating beef, then the utility of eating salmon followed by the utility of eating-ice-cream-after-eating-salmon is preferred to the utility of eating beef followed by the utility of eating-ice-cream-after-eating-salmon. Axiom 4∗ is an Archimedean condition. It implies that even if d is very strongly preferred to c, then this difference can always be outweighed by a sufficiently large number of moments equal to a and b respectively, where a ≻ b, such that (na ◦ c) ≻ (nb ◦ d). This roughly corresponds to the Archimedean property of real numbers: if b > a > 0 there exists a finite integer n such that na > b, no matter how small a is. Axiom 4∗ is problematic in case one believes that there is some critical level of utility, such that a sequence of moments containing a sub-critical-level moment should never be preferred to a sequence of moments not containing a sub-critical-level moment. Personally I do not think there are any such critical levels, but to really argue for that point is beyond the scope of the present book. Taken together, Axioms 1∗–4∗ imply the following representation and uniqueness theorems.

9 Extensive structures were first investigated by Hölder (1901). The axioms presented here have been derived from Roberts (1979:126-8).
10 That is, the relation ≻ on M is asymmetric and negatively transitive.
Theorem 5.2 (Roberts and Luce 1968) Let M be a non-empty set, ≻ a binary relation on M and ◦ a binary operation on M. Then there is a real-valued function u on M satisfying

a ≻ b ⇔ u(a) > u(b)     (5.2)

and

u(a ◦ b) = u(a) + u(b)     (5.3)

if and only if ⟨M, ≻, ◦⟩ is an extensive structure.

Theorem 5.3 (Roberts 1979) Let M be a non-empty set, ≻ a binary relation on M, ◦ a binary operation on M, and u a real-valued function on M satisfying

a ≻ b ⇔ u(a) > u(b)     (5.4)

and

u(a ◦ b) = u(a) + u(b).     (5.5)

Then another function u′ satisfies these two properties if and only if there exists α > 0 such that u′ = αu, i.e. u is a ratio scale.

From a technical point of view the classical theory of utility is, of course, impeccable. If the axioms are satisfied, then one can indeed assign real numbers to moments of utility. However, there are at least two reasons why non-Bayesians should not accept the classical theory. First, the theory is inconsistent with the theory of indeterminate preferences developed in Chapter 4. In both versions of the classical theory outlined here, the ordering axioms (Axiom 1 and Axiom 1∗) require that the agent has a determinate preference between all moments of utility. However, it is easy to imagine possible moments of utility for which this is not the case. For example, if asked to state a preference between a hypothetical moment of utility experienced while skiing in the Alps and another hypothetical moment experienced while attending a really good philosophy seminar, I might reply that the two experiences are very different and that my preference is indeterminate. Of course, I might be able to make a choice between the two moments, but it simply does not make any sense to say that I, all things considered, determinately prefer one moment to the other, as required by the ordering axiom. I simply feel indeterminate between the two moments. The second worry about the classical theory is that it is too abstract. Are agents really able to state preferences not between, say, salmon and beef, but between the mental states caused by having salmon or beef, respectively? And are they really able to do so even if the comparison is made between hypothetical mental states which are never experienced by anyone, as required by the theory? Of course, one could respond that ideal agents are by definition able to state all kinds of preferences, even very abstract ones. However, this reply just seems to beg the question. Why is
such a definition better than one that leaves it open whether ideal agents are able to state preferences between e.g. hypothetical mental states never experienced by anyone?
5.2 The probabilistic theory

The key idea in the probabilistic theory can be stated in a single sentence: The higher the agent's utility of an object is, the higher the probability is that he will choose that object if given an opportunity to do so. Suppose, for instance, you are offered a choice between salmon and tuna. Then, if the probability that you choose salmon is higher than the probability that you choose tuna, it is reasonable to say that your utility of salmon exceeds that of tuna. Note that this choice between salmon and tuna is a choice between certain outcomes. It cannot be excluded that you would simply be unable to state a precise probability of choosing, say, one uncertain prospect over another.11 Before spelling out the probabilistic theory in more detail, it should be pointed out that this theory can easily lead to unreasonable conclusions if interpreted incorrectly. Suppose, for example, that I wish to measure my utility of money. If offered a choice between $30 and $20, I would go for the higher amount with probability 1, i.e. p($30 ≻ $20) = 1. However, if offered a choice between $40 and $20, I would, of course, also go for the higher amount with probability 1. So p($40 ≻ $20) = 1. Presumably, this does not show that my utility for $40 equals that of $30. Call this the problem of perfect discrimination. Advocates of the probabilistic theory are aware of this problem, and they need to find a solution to it; I shall propose one shortly. The locus classicus of the probabilistic theory is Duncan Luce's monograph Individual Choice Behavior (1959). The theory he presented is based on a single axiom, the choice axiom. This axiom was stated and discussed in Chapter 4:

If A is a subset of B, then the probability that x will be chosen from B equals the probability that x will be chosen from A multiplied by the probability that the chosen alternative in B is also an element of A.
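In symbols, writing p(x, S) for the probability that x is chosen from the set S, the axiom says that p(x, B) = p(x, A) · p(A, B) whenever x is an element of the subset A of B, where p(A, B) is the probability that the alternative chosen from B belongs to A. A small numerical instance: if p(x, A) = 2/3 and p(A, B) = 1/2, the axiom requires that p(x, B) = 1/3. (The notation and the numbers here are only meant as an illustration.)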
As noted in Chapter 4, the choice axiom does not hold water in all contexts. If the set of options comprises incomparable objects, it need not be a valid constraint on indeterminate preferences. However, in such cases there is no utility to be measured, so this should not be a problem.12 Furthermore, as will be explained shortly, the problem with incomparability can often be overcome by simply dividing the initial set of options into 'finitely connected' subsets.

11 From a technical point of view, the restriction to certain outcomes is not essential, as pointed out by Luce (1959/2005). However, since the whole point of the non-Bayesian approach to utility is to define utility without considering any preferences over uncertain prospects, this is nevertheless a reasonable restriction to make.
12 Here I leave aside Carlsson's (so far unpublished) suggestion that the utility of incomparable objects can be represented by vectors of real numbers.
From the choice axiom it follows that utility can be measured on a ratio scale. This means that if the probability that you choose salmon is 2/3, and the probability that you choose tuna is 1/3, then your utility of salmon is twice as high as that of tuna. Naturally, this presupposes that the perceived desirability of the object is the only factor that affects the agent's indeterminate preference. This is one of the reasons why probabilities should be interpreted subjectively. What you believe about your own choice directly or indirectly reflects how desirable you consider the option to be, all things considered. From a subjective point of view, there is no other reason for thinking that it is more probable that one will choose x rather than y, apart from the fact that one (given what one believes about the world and one's present decision problem) considers x to be more desirable than y. Someone might be blackmailing you, but then that has to be taken into account when deciding what to do. Consider the following representation and uniqueness theorem.

Theorem 5.4 (Luce 1959/2005) Let B be a finite set of objects such that p(x ≻ y) ≠ 0, 1 for all x, y in B. Then, if the choice axiom holds for B and all its subsets, and the axioms of probability theory hold, then there exists a positive real-valued function u on B, which is unique up to multiplication by a positive constant, such that for every A ⊂ B,

p(x, A) = u(x) / ∑y∈A u(y)
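A minimal numerical sketch may help to see what the theorem delivers. The toy utilities below are my own, not Luce's, and the check at the end simply confirms that probabilities of this ratio form satisfy the choice axiom.

# Toy utilities (illustrative assumptions only).
u = {"salmon": 2.0, "tuna": 1.0, "beef": 3.0}

def p(x, A):
    # Luce's ratio formula: the probability of choosing x from A.
    return u[x] / sum(u[y] for y in A)

B = ["salmon", "tuna", "beef"]
A = ["salmon", "tuna"]                      # a subset of B

p_x_B = p("salmon", B)                      # 2/6 = 1/3
p_x_A = p("salmon", A)                      # 2/3
p_A_B = sum(p(y, B) for y in A)             # probability that the choice from B lies in A: 1/2
print(p_x_B, p_x_A * p_A_B)                 # both print 0.333..., as the choice axiom demands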
The proof of Theorem 5.4 is very simple. Its basic idea is to first make a clever substitution, and then do some elementary algebra. (See the Appendix for details.) Let us now return to the problem of perfect discrimination mentioned above. As explained by Luce, the problem is that 'the [utility] scale is defined only over a set having no pairwise perfect discriminations, which is probably only a small portion of any dimension we might wish to scale'.13 That is, the problem lies in the assumption that p(x ≻ y) ≠ 0, 1 for all x, y in B. After all, this condition is rather unlikely to be satisfied, because most agents know for sure that they prefer $40 to $20, and $30 to $20, etc. Luce indicates a possible solution in his (1959/2005). He points out that, 'If R and S are two sets over which v-scales are defined, and if they overlap, then' the two scales can be welded together to form a single scale over the whole of B.14 Consider the following definition.

Definition 5.1 The set B with pairwise probabilities p(x ≻ y) is finitely connected if for every a, b ∈ B for which p(b ≻ a) > 1/2, there exists a finite sequence x1, x2, ..., xn ∈ B such that 1/2 ≤ p(x1 ≻ a) < 1, 1/2 ≤ p(xi+1 ≻ xi) < 1, and 1/2 ≤ p(b ≻ xn) < 1.
Now, it seems clear that if a finitely connected structure exists, then the problem of perfect discrimination can be resolved. But why should one believe that such a

13 Luce (1959:24).
14 Luce (1959:25).
structure exists? This problem was never addressed by Luce. As pointed out above, the probability that I will go for a higher amount of money rather than a lower one is 1, so in that particular case there seems to be no correlation between probabilities and utilities.
5.3 A modified version of the probabilistic theory

In this section I present a modified version of Luce's probabilistic theory. The modified theory provides a solution to the problem of perfect discrimination. However, before the problem of perfect discrimination can be addressed a few words have to be said about the interpretation of the theory of utility considered here. As pointed out by Luce and Suppes (1965), the probabilistic theory can be interpreted in at least two different ways: as constant utility or as random utility models. The difference between the two versions can be demonstrated by separating the agent's choice process into two steps. 1) In the first step, the agent assesses the utility of each alternative. 2) In the second step, he makes a choice among the alternatives by simply trying to choose an alternative that maximises utility. According to the constant utility model, defended by Luce (1959), the first step of this process is deterministic while the second step is not. The agent sometimes fails to choose an alternative that maximises utility, because he fails to discriminate which alternative is best for him. The random utility model, favoured by e.g. Manski (1977), takes the opposite view. It holds that the agent, in the second step, always chooses an alternative with the highest utility, whereas the first step is probabilistic—the process of assigning or assessing utility is a random process. In both models, probability is conceived of as an objective concept.15 The interpretation of the probabilistic theory I defend differs fundamentally from both the constant and random utility models. The two most important differences are that in my interpretation of the theory, (i) probabilities are ascribed to choices, not preferences, and that (ii) all probabilities are subjective, rather than objective. Pertaining to (i), there is no need in my interpretation of the theory to assume that there are some 'correct' utilities 'out there' that people fail to perceive. Judged from a contemporary philosophical perspective, it seems difficult to justify the assumption that something can be subjectively good for the agent, even though he fails to realise that that is the case. Pertaining to (ii), when I say that p(x ≻ y) = 0.75, this does not mean that the agent chooses x over y three times out of four. As explained in Chapter 4, it rather means that the agent has a subjective degree of belief that he will choose x over y, which corresponds to this number. So to some degree, the agent prefers x to y and y to x at the same time, in a sense similar to that arising in discussions of vagueness: In a certain sense, some people are both bald, and not bald, at

15 Luce (1959) did not mention anything about the interpretation of the probability concept. However, in personal communication with the author, he has confirmed that he was thinking of objective probabilities.
the same time. However, unlike vagueness, claims about indeterminate preferences can, at least in principle, be tested empirically. It is worth pointing out that since I favour a subjective interpretation of probability, the interpretation of the probabilistic theory I defend also works for single choice situations, whereas previous probabilistic theories only work for repeated choices. I shall now return to the problem of perfect discrimination. The upshot of my solution is that there exists, relative to every set of objects B for which the choice axiom holds, some non-perfect object x∗ ∈ B. A non-perfect object is an object that will not be chosen with probability 1 or 0 in a pairwise choice. Consider the following existence axiom.

Non-Perfect Object For every set of objects B for which the choice axiom holds there exists some non-perfect object x∗ such that p(x∗ ≻ x) ≠ 0, 1 for every x ∈ B.

To some degree, this axiom corresponds to Ramsey's famous existence axiom, saying that there exists at least one 'ethically neutral' proposition. (An ethically neutral proposition is one whose truth value does not matter to the agent.) The new axiom immediately resolves the problem of perfect discrimination. Suppose, for example, that I wish to determine my utility of $20, $30, and $40, respectively. In this case, the non-perfect object can be a photo of my beloved cat Carla, who died when I was fourteen. If offered a choice between $20 and the photo, the probability is 1/4 that I would choose the money; if offered a choice between $30 and the photo, the probability is 2/4 that I would choose the money; and if offered a choice between $40 and the photo, the probability is 3/4 that I would choose the money. This information is sufficient for constructing a single ratio scale for all four objects. Here is how to do it: The point of departure is the three local scales, which have one common element, the photo of Carla. The utility of the photo is the same in all three pairwise choices. Let u(photo) = 1. Then the utility of money is calculated by calibrating the three local scales such that u(photo) = 1 in all of them. This is achieved by dividing the probability numbers listed above by 3/4, 2/4, and 1/4, respectively. The following table summarises the example. u1–u3 denote the three local scales. The letter u denotes the single scale obtained by welding together u1–u3.

Table 5.1

          u1     u2     u3     u
$20       1/4    -      -      (1/4)/(3/4) = 1/3
$30       -      2/4    -      (2/4)/(2/4) = 1
$40       -      -      3/4    (3/4)/(1/4) = 3
photo     3/4    2/4    1/4    1
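The calibration just described is mechanical enough to be written down as a short sketch. It uses the probabilities from the example above and recovers the same numbers as the table by solving the pairwise ratio formula with u(photo) fixed at 1; this is one way of presenting the division procedure in the text, not a different method.

# p_money[x]: my degree of belief that I would choose amount x over the photo.
p_money = {"$20": 1/4, "$30": 2/4, "$40": 3/4}

u = {"photo": 1.0}
for x, prob in p_money.items():
    # In the pairwise choice {x, photo}, prob = u(x)/(u(x) + u(photo)),
    # so with u(photo) = 1 we can solve for u(x) = prob/(1 - prob).
    u[x] = prob / (1 - prob)

print(u)   # {'photo': 1.0, '$20': 0.33..., '$30': 1.0, '$40': 3.0}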
Of course, there might exist some large amount of money that would make me choose the money over the photo with probability one. This indicates that the photo
is not non-perfect with respect to any amount of money. However, this difficulty can be overcome by choosing some other beloved object to compare it with, e.g. the only remaining photo of my daughter, or peace in the Middle East. The main advantage of the subjective version of the probabilistic theory proposed herein is that it explains why people are not always able to choose between two alternatives with probability 1. If one feels indeterminate between a small amount of money and a beloved photo of a dead cat, this is explained by the hypothesis that one has no determinate preference between the two objects. This explanation is not available for Luce, who accepts the frequentistic interpretation of probability.16 For him, the only explanation of a probabilistic preference is that agents sometimes make mistakes when assessing utilities. This is parallel to a discrimination problem in which an agent cannot tell which one of two stones is the heaviest: There is a true fact of the matter, but due to limited perceptual capabilities, the truth cannot be revealed.
5.4 Can desires be reduced to beliefs?

On the probabilistic theory, statements about utility are purely cognitive claims about the world. They are predictions of what the agent will do in the near future. This suggests that utility, and hence desires, can be reduced to beliefs about future behaviour. The upshot is that there might be no fundamental difference between desires and beliefs. The agent's set of desires is a strict subset of his set of beliefs. The term 'reduction' should be interpreted with care, however. I am not suggesting that a theory can be reduced to another. So my claim is not analogous to the claim that certain parts of the classical theory of thermodynamics can be reduced to the kinetic theory of gases. For a better comparison, one may rather compare my claim about desires and beliefs with certain widely held views in other philosophical disciplines. For example, it is true that all bachelors are unmarried men, and it is also true that water is H2O. That said, there is an important difference between these two examples. The first example is an analytic truth, but the second is not. According to Kripke, the claim that water is H2O is an a posteriori necessity.17 This suggests that the thesis that all desires are beliefs (about future behaviour) might perhaps also be an a posteriori necessity. No one has performed any empirical experiment in order to find out whether this claim is true. But one could argue that this discovery, if it is one, is something we have discovered through introspection. It is a metaphysical necessity that desires are beliefs and this has recently been discovered a posteriori. Alternatively, it might be argued that the identification of desires with beliefs is based entirely on linguistic considerations of the meaning of the terms 'desire' and 'belief'. Hence, this has to be an analytical truth.

16 Personal communication, March 2006.
17 Kripke (1980).
Which account is correct? My opinion is that the probabilistic account is, or ought to be regarded as, an a posteriori discovery. A mere linguistic analysis is not sufficient for reaching the conclusion that a desire is a belief about future behaviour. A competent speaker can deny this claim without making any linguistic error. At this point at least two objections can be raised. First, as shown by Lewis (1988), desires cannot be identified with certain kinds of beliefs. In order to render this claim more precise, suppose you maintain that for every proposition D expressing a desire, there is another proposition D′ expressing a belief, such that the probability of D equals the expected utility of D′, i.e. p(D) = eu(D′).18 Then, it can be proved that Jeffrey's axioms (stated in Chapter 2) lead to the seemingly absurd conclusion that for every desire D, the number p(D) and the number eu(D′) cannot change simultaneously in light of new evidence. This is not a formal contradiction, but according to Lewis it is a strong reason for denying that every desire corresponds to a belief. (For an alternative argument to the same effect, see Costa et al (1995).) However, note that Lewis's argument relies on Bayesian expected utility theory. According to the view defended in this chapter, every desire is a belief—but no assumption has been made about Bayesian expected utility theory. So my response to Lewis's argument is very straightforward: I deny the premise that the probability of the relevant belief has to equal the expected utility of the fulfilment of the corresponding desire. The second objection holds that desires cannot be beliefs, because beliefs and desires have different directions of fit. If there is a mismatch between the world and the agent's belief about the world, the belief is in some sense wrong and ought to be revised. However, if there is a mismatch between the world and the agent's desires, it is the world that is wrong and that ought to be changed. Hence, desires are not beliefs. However, against this argument it might be objected that the distinction between two directions of fit is, in fact, compatible with the probabilistic belief-based account of desire. This is because the probabilistic account is not a claim about what ought to change: it is a claim about how certain evaluative phenomena can be described in a precise way. The distinction between two directions of fit merely shows that some beliefs are different from others. In order to make false desire-beliefs true, the world has to change (thereby inducing a change in behaviour), but the proper response to an ordinary false belief is to revise the belief, not to change the world.
5.5 Second-order preferences

An interesting alternative to the probabilistic theory has been proposed by Halldén (1980) and Sahlin (1981). They argued that a non-Bayesian notion of utility may be defined in terms of second-order preferences. The point of departure in their proposal is that it is better to be able to state a preference between two outcomes

18 In the present discussion it is supposed that the utility scale is normalised, such that 1 ≥ eu(D′) ≥ 0.
when the distance in utility between the outcomes is large, compared to the case in which the distance in utility is small. This is an interesting idea because, in line with the probabilistic approach, it does not presuppose any preferences among uncertain prospects, or any other elements of risk or indeterministic entities. Let p, q, r, s, . . . be outcomes, and consider the following informal definition: If it is not the case that q is better than p and it is not the case that s is better than r, then the distance [in utility] between p and q is greater than the distance [in utility] between r and s if and only if it is better to prefer p to q than to prefer r to s. (Sahlin 1981:63.)
Sahlin distinguishes between two kinds of preference relations, a first order preference ≻1, which is a binary relation on outcomes, and a second order preference ≻2, which holds between first order preferences. In the informal definition above, the phrase 'better to prefer' corresponds to ≻2, whereas 'better than' corresponds to ≻1. In order to express Halldén's and Sahlin's definition of utility formally, let u(p, q) denote the distance in utility between p and q. Then,

Definition 5.2 If ¬(q ≻2 p) and ¬(s ≻2 r), then u(p, q) > u(r, s) if and only if (p ≻1 q) ≻2 (r ≻1 s).

Definition 5.2 can be used as the basis of an axiomatic analysis, together with the five axioms proposed in the discussion of the classical theory, i.e. Classics 1–5. From a logical point of view, it makes no difference if one measures the difference between moments of utility or something else, such as second order preferences. The formal structure is the same. An advantage of the Halldén-Sahlin proposal is that it seems to work in practice. Sahlin (1981) reports an experiment in which he successfully asked ten respondents to state preferences among a set of 23 outcomes. It was discovered that the respondents' first order preferences could be described as a weak order, and that the structure of their second order preferences was sufficient for deriving a quantitative utility function, which was unique up to a positive linear transformation. However, there are at least two problems with the Halldén-Sahlin approach. First, it is not consistent with the theory of indeterminate preferences, developed and defended in Chapter 4. The Halldén-Sahlin approach requires that the agent's preferences among preferences form a weak order (Classic 1), and this is a dubious assumption. Even if the agent's first order preferences happen to be complete, it is far from clear that the second order preferences are, or ought to be, complete. It is easy to imagine situations in which the agent feels indeterminate about which first order preferences it would be best to have. Suppose, for example, that I am asked to state a preference between (i) being able to state a preference between a BMW and a Mercedes, and (ii) being able to state a preference between reading the early and the later Wittgenstein. In this example, my first order preferences might very well be complete. I prefer the Mercedes over the BMW, and I prefer the early Wittgenstein over the later. (And I definitely prefer any car over anything written by Wittgenstein.) However, as far as I can see, there is little reason to believe that I would be able to state a second order preference between options (i) and (ii) described above.
The values at stake, a mere materialistic value and a more intellectual one, are so different that I cannot tell which preference it would be better to be able to state. The second objection attacks another cornerstone of the Halldén-Sahlin proposal. The problem is that some preferences might be better to have than others not only because the distance in utility between the involved objects is larger, but also because certain types of preferences have to be used more frequently, or for other reasons. Sahlin's claim that it is 'obviously better to be able to order two entities which are widely separated on the scale than two which have more or less the same value' is not obviously true.19 This claim only seems to be true if the distance in value is the sole consideration that ought to guide second order preferences. For instance, I consider it better to be able to state a preference between wine and orange juice because I face that choice much more often than the choice between dirty water and arsenic solution, despite the fact that the arsenic solution may end my life in case I fail to not prefer it. Hence, the basic premise of the Halldén-Sahlin proposal seems to be false. If I have reason to believe that it is extremely unlikely that I will ever face a choice between a pair of outcomes, it does not matter that I fail to order them.
19 Sahlin (1981:63).
Chapter 6
Subjective probability
For Bayesians, there is a close link between subjective probability and preferences over uncertain prospects. According to their view, the more you are willing to pay for entering a bet in which you win some fixed amount if your belief turns out to be true, the higher is your subjective probability. So your willingness to bet serves as an indicator of how likely you think the belief is to be true. However, as argued in Chapter 2, Bayesians put the cart before the horse from the point of view of the deliberating agent. Bayesians define subjective probability (and utility) in terms of preferences over uncertain prospects; therefore, the obtained numbers cannot figure as reasons for forming new preferences over the same set of uncertain prospects. The aim of this chapter is to offer a non-Bayesian account of probability that avoids this problem, and that is coherent with the Humean belief-desire model of action. Most critics of Bayesian decision theory advocate objective concepts of probability. They claim that probabilities represent objective features of the world, such as long run frequencies, propensities, or logical inference relations. However, in opposition to this traditional non-Bayesian view, I defend a subjective non-Bayesian theory of probability. The non-Bayesian approach to subjective probability is seldom or never discussed in the philosophical literature, although it has been given some attention in the statistical and mathematical literature. The pioneer was Koopman (1940), who inspired Good (1950) and DeGroot (1970) to develop similar non-Bayesian theories.1 Table 6.1 illustrates some major alternatives in the debate over the interpretation of the probability calculus.

Table 6.1

              Bayesian                   non-Bayesian
Subjective    Ramsey, Savage, Jeffrey    Koopman, Good, DeGroot
Objective                                Keynes, Carnap, Popper
1 These non-Bayesian theories are very similar to Savage’s (1954/72) theory of qualitative probability.
The position I shall defend belongs to the upper right corner, that is, it combines subjectivism with non-Bayesianism. A subjective non-Bayesian position defines probability without referring to preferences over uncertain prospects, or other evaluative concepts. This is what makes the position non-Bayesian. However, it also claims that probabilities are mental phenomena, and this is what makes the position subjective. In the non-Bayesian subjective theories proposed by Koopman, Good, and DeGroot, the qualitative relation 'at least as likely as' is used as a primitive concept. However, this is not an essential feature of a subjective non-Bayesian position, as will be shown below. The upper left corner (subjective Bayesianism) and lower right corner (objective non-Bayesianism) in Table 6.1 constitute genuine alternatives to the position I wish to defend. However, the lower left corner is empty. This is because the combination of Bayesianism and objectivism is conceptually impossible, given that Bayesian decision theory is defined in terms of preferences over uncertain prospects, and preferences are taken to be mental phenomena. The strategy in this chapter is to first cast some doubt on the two genuine alternatives to my position, subjective Bayesianism and objective non-Bayesianism. Thereafter I will defend the subjective non-Bayesian approach. The two alternative positions are briefly discussed in Sections 6.1 and 6.2. The subjective non-Bayesian position is presented and defended in Sections 6.3 and 6.4.
6.1 Why not objective probability?

Even though influential in philosophical circles, subjective theories of probability still represent a fairly odd minority view. Most people, both laymen and scientists, believe that probabilities carry information about the external world. At first glance, the number of specialists who believe that the concept of probability refers to degrees of belief, therefore, seems to be surprisingly large. However, I think the disagreement between objectivists and subjectivists is often exaggerated. Furthermore, in many cases the disagreement itself is also described in a misleading way. I believe that once we realise what the disagreement is all about, we will realise that both sides are right, in a sense to be explained. However, at that point we will also realise that only a subjective theory is relevant when making decisions—or so I shall claim. Perhaps the most obvious way of drawing the line between objective and subjective theories is the following. Objectivists think that probabilities refer to facts in the external world, whereas subjectivists think they refer to mental phenomena, i.e. degrees of belief. This distinction remains reasonably clear until one considers ‘mental’ probabilities, such as the probability that I will believe something to a certain degree. In that case, an objective notion of probability also refers to a fact about the agent’s degree of belief, that is, a mental phenomenon, rather than about the external world. In order to avoid this problem, a more detailed account of the objective approach is required.
There are four main alternatives for the objectivist: The classical theory (Laplace 1814), the logical theory (Keynes 1921, Carnap 1950), the frequency theory (Venn 1876), and the propensity theory (Popper 1957). The following summary is not an attempt to do full justice to these theories, but should rather be conceived as a reminder of the main points in each account. Laplace argued that the probability of an event equals one divided by the number of possible outcomes. His view presupposes that all possible outcomes are equally likely. For example, if I roll a die the probability that it lands with a six is one in six. So it seems that the classical theory works fine for events such as fair coins, roulette wheels, etcetera. However, more complicated events are difficult to make sense of with this theory. For example, a marriage can end in either of two ways, divorce or eternal death, but it seems very pessimistic to conclude from this piece of information alone that the probability of divorce is always fifty percent, no matter who is marrying whom. The logical theory of probability, famously developed by Keynes (1921) and Carnap (1950), is more sophisticated. Its basic idea is that probability is a logical relation between a hypothesis and the evidence supporting it. The probability relation is thus, in a certain sense, a generalisation of deductive logic from the deterministic case to the indeterministic one. For example, to say that it is highly probable that my marriage will end in a divorce means that the evidence I have at hand (separate bedrooms, no romantic dinners, etc.) ‘entails’ the conclusion that my marriage will end in a divorce to a certain degree. It is beyond the scope of the present book to analyse this theory in detail. However, an often acknowledged weakness of the logical theory is that it seems to be sensitive to the choice of language used for describing the hypothesis and the evidence. The frequency theory comes in different versions. The most straightforward suggestion is to define probability as the actual relative frequency of some event, as observed in a finite series of observations. However, if this theory is accepted, the probability that a symmetrical coin will land heads up will vary over time, depending on which particular sequence of events one observes. This is implausible. Another suggestion is to consider the limiting relative frequency of the event. However, it should be noted that one can never be sure that a limiting relative frequency exists. When tossing a balanced coin, the relative frequency of heads will perhaps never converge. A related problem is that the limiting relative frequency seems to be inaccessible from an epistemic point of view, even in principle. The fact that the relative frequency of a coin landing heads up seems to be close to fifty percent does not exclude the possibility that the true limiting frequency is much lower or higher. In fact, no finite sequence of observations can prove that the limiting frequency is close to the observed frequency. A further problem is that the frequency theory cannot be applied to unique events, that is, events that only occur once, e.g. a happy marriage between two specific individuals. Needless to say, it is also unclear how one should separate unique events from non-unique events. Perhaps it makes most sense to say that all events are unique, even coin tosses. Literally speaking, you never toss the same coin twice.
The propensity theory holds that probabilities can be identified with certain features of the external world, namely, the propensity or disposition an object has to give rise to a certain effect. For instance, propensity theorists may think that the coin in your pocket has a propensity or disposition to land heads up about every second time it is tossed. Probabilities are thus real features of the world, at least as long as one thinks that dispositions are real features of the world. The most well-known counter-argument to the propensity theory is Humphreys’ paradox.2 Briefly put, the point is that conditional probabilities can be ‘inverted’ by applying Bayes’ theorem, i.e. if we know the probability of A given B we can calculate the probability of B given A by using Bayes’ theorem. However, propensities cannot be ‘inverted’ in this sense. Suppose, for example, that we know the probability that the train will arrive on time at its destination given that it departs on time. Then it makes sense to say that if the train departs on time, it has a propensity to arrive on time at its destination. However, even though it makes sense to speak of the inverted probability, i.e. the probability that the train departed on time given that it arrived on time, it makes no sense to speak of the corresponding inverted propensity. No one would say that the on-time arrival of the train has a propensity to make it depart on time a few hours earlier. Despite the drawbacks of the objective approach, these theories can nevertheless explain in what sense a mental probability is objective: A mental probability is either a propensity, a relative frequency, a logical inference relation, or a Laplacean possible-state ratio. Hence, there is no need to deny that the distinction between objective and subjective theories can indeed be drawn in a precise way. However, at this point it also seems clear that there is no need for the subjectivist to claim that these theories are all false. Different theories of probability simply address different questions. Even if it is true that some objects in the external world have, say, a propensity to behave in certain ways, this does not preclude people from having partial beliefs, and these partial beliefs can presumably be measured in one way or another. Furthermore, both of these phenomena—the features of the external world and partial beliefs—may have the same formal structure: Both phenomena may satisfy Kolmogorov’s axioms. So what is the disagreement between objectivists and subjectivists all about then? Arguably, it is mainly a debate about what people mean when they use the term ‘probability’. That is, the debate is to a large extent a debate over which theory makes most sense of the way we use our language. Hence, the disagreement is to some extent concerned with empirical issues, which could in principle be settled by observing people’s linguistic behaviour. Both the objective and subjective approaches are coherent from a mathematical point of view, but the formulas describe different phenomena. For the decision theorist it is presumably rather irrelevant to consider how people actually use the term ‘probability’. The relevant question is which theory of probability one ought to adopt when making decisions. Arguably, this shift of focus will make subjective theories come out more attractive. It is less controversial to claim that decision makers ought to think of probability as degree of belief than to claim
2 Humphreys (1985).
that they do actually think of probability in this way. Since subjective probability is equated with partial belief, this normative claim is particularly attractive from the Humean belief-desire perspective of action. The belief-part of the theory can simply be replaced with a subjective theory of probability. To sum up, non-Bayesian decision theorists do not necessarily think that objective theories of probability are false. The objective theories are, however, irrelevant from a Humean belief-desire perspective of action. Subjective theories make better sense of what it means to have a partial belief. As explained in Chapter 1, the present work simply assumes the truth of the Humean belief-desire account; to actually defend this premise is beyond the scope of this book.
6.2 Why not Bayesian subjective probability?

A major reason for not accepting Bayesian theories of subjective probability is that Bayesians put the cart before the horse. The details of this argument were discussed in Chapter 2. However, in addition to that argument (which also casts doubt on the Bayesian theory of utility), I wish to explore two other, more specific arguments against Bayesian theories of subjective probability. Here is the first argument. A preference invariably requires a belief. However, the converse does not hold true. An agent can hold beliefs without having any preference. So why on Earth should a theory of subjective probability involve assumptions about preferences, given that preferences and beliefs are separate entities? Contrary to what is claimed by Bayesians, emotionally inert agents failing to muster any preference at all (because they have no desires) can hold partial beliefs. Here is a slightly different way of putting the argument. If the Bayesian view is correct, an agent cannot hold a partial belief unless he also has a number of desires, all of which are manifested as preferences over uncertain prospects. However, contrary to what is claimed by Bayesians, emotionally inert agents can also hold partial beliefs. The Bayesian assumption that there is an intimate link from probabilities to preferences (desires) therefore seems to be false. This claim is supported by the Humean belief-desire account of action, according to which there is virtually no relation at all between beliefs and desires.3 On the contrary, beliefs and desires are separate entities. Therefore, a theory of probability that presupposes that every believing agent is a desiring agent is doomed. I take it that the main structure of the argument is reasonably clear. The point is that Bayesian theories rely on a notion that is not essential to the concept of probability, viz. the notion of desire. This shows that the meaning of the term ‘subjective probability’ is not captured by the Bayesian analysis. In the present context, the term ‘meaning’ does not refer to the meaning people actually attach to the term in ordinary usage, but rather to the recommended meaning in normative decision theory. Of course, it could be objected that in real life there are no emotionally inert
3 Note that this might be true even if desires can be reduced to beliefs in the way proposed in Chapter 5.
agents. This might very well be true, at least in the actual world; as far as I can remember, I have never met an emotionally inert agent. However, this is hardly a relevant objection. For my argument to go through, it is enough that the existence of an emotionally inert agent is conceptually possible. At this point Bayesians may object that they have never attempted to analyse the meaning of the term ‘subjective probability’. All they seek to do is to offer a procedure for explaining and predicting human behaviour. Agents behave as if they acted from subjective probabilities and utilities. As long as agents can muster preferences (or choices that may be interpreted as preferences) over a sufficiently rich set of uncertain prospects, that will suffice for deriving a probability function. This holds true even if the agent happens to be emotionally inert, i.e. it does not matter what actually triggered the agent’s choice. Even an unconscious reflex movement would do. My response to this argument is that it is both right and wrong. It is correct that it does not matter from a Bayesian point of view if the set of preferences stems from an ordinary human or an emotionally inert robot. The formal results hold true in either case. However, from the point of view of normative decision theory there is an important difference. A basic working assumption, stated in Chapter 1, is that agents ought to choose some uncertain prospects rather than others because they have certain subjective probabilities (beliefs) and utilities (desires). Hence, if subjective probabilities, conceived of as partial beliefs, are to figure as genuine reasons for preferring one action to another, they cannot be entities that we may equally well ascribe to non-reasoning robots as to human beings. There has to be a difference, and the interpretation of the Bayesian approach outlined above cannot account for this difference. The second argument against Bayesian theories of subjective probability has been well-known among statisticians since the 1980s. Even though solutions have been proposed, they come at a price. The argument starts from the observation that utilities are sometimes state-dependent. Suppose, for example, that you are standing next to James Bond, who is about to disarm a bomb. Now ask yourself what your subjective probability is that Mr Bond will manage to disarm the bomb before it goes off. Since you are a true Bayesian, you are prepared to state a preference between the gamble in which you win $100 if Bond manages to disarm the bomb and nothing otherwise, and the gamble in which you win nothing in case Bond manages to disarm the bomb and $100 if the bomb goes off. Let us suppose that you prefer the gamble in which you win $100 if Bond manages to disarm the bomb. According to the traditional Bayesian axiomatisations, it then follows that you think it is more probable that Bond will manage to disarm the bomb than not. However, the problem is that a rational agent will always prefer the gamble in which he wins $100 if the bomb is disarmed, no matter what he believes about the bomb, because if the bomb goes off money does not matter any more. (Once in heaven, you will, of course, be served a dry martini free of charge!) Hence, Bayesian theories come to the wrong conclusion. Even an agent who strongly believes that the bomb will go off will prefer the gamble in which he wins some money if the state he thinks is less likely occurs.
The problem illustrated by the James Bond example is that utilities are sometimes state-dependent, whereas traditional Bayesian theories of subjective probability rely on the more or less tacit assumption that utilities are state-independent. (As far as I know, this crucial assumption was never explicitly stated by Ramsey or de Finetti. However, Savage mentions it.) That utilities are state-dependent means that the agent’s desire for an outcome depends on the state of the world under which the outcome occurs. A natural reaction to the James Bond problem is to argue that Bayesians should simply add the assumption that utilities are state-independent. Then the James Bond example could be ruled out as an illegitimate formal representation of the decision problem, since the utility of money seems to be state-dependent. However, the following example, originally proposed by Schervish et al (1990), shows that this is not a viable solution.

Table 6.2
            State 1    State 2    State 3
Lottery 1   $100       0          0
Lottery 2   0          $100       0
Lottery 3   0          0          $100

Table 6.3
            State 1    State 2    State 3
Lottery 1   100        0          0
Lottery 2   0          125        0
Lottery 3   0          0          150
Suppose that the agent is indifferent between the three lotteries in Table 6.2. Bayesians then have to conclude that the agent considers the probability of each state to be 1/3. Also, suppose that the agent is indifferent between the three lotteries in Table 6.3. Given that the agent’s marginal utility for money is positive, it follows that his subjective probability of s1 is higher than his subjective probability of s2, which is higher than his subjective probability of s3. (Otherwise, the expected utility of the three lotteries could not be equal.) It is, therefore, tempting for the Bayesian to conclude that the agent has contradicted himself. It cannot both be the case that the probability of each state is 1/3 and that the probability of s1 is higher than that of s2. However, suppose that the three states denote three possible exchange rates between dollars and yen. State s1 is the state in which $100 = ¥100, s2 is the state in which $100 = ¥125, and s3 is the state in which $100 = ¥150. Obviously, this would restore the coherence of the agent’s preferences. By considering the hypothesis that the utility of money may be state-dependent for the agent, the Bayesian theory can be temporarily saved. However, important problems now arise. First of all, how could one tell from an external point of view whether utilities are
state-dependent or not? In case one has only observed the preferences stated in Table 6.2, then what is the agent’s true subjective probability of the three states? More importantly, is s1 more probable than s3 or not? As long as the probability function is not unique, this question will remain open. It is beyond the scope of the present work to review the vast literature on state-dependent utilities. A large number of articles have been published on this issue. However, perhaps the most influential solution is that proposed by Karni et al (1983). They suggested an axiomatisation that permits the derivation of a unique subjective probability function even in case the agent’s utility function is state-dependent. This solution comes at a price, though. The axiomatisation requires that the agent be able to state preferences over a much wider class of uncertain prospects than other Bayesian theories do. More precisely, Karni et al assume that the agent is able to state preferences over gambles under the assumption that he holds a particular hypothetical probability distribution over states, and the agent also has to be able to compare acts under different hypothetical probability distributions. This is, obviously, a very strong assumption, which makes the Bayesian theory of probability look less attractive than many of its supporters are aware of.
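The arithmetic behind the exchange-rate resolution can be sketched as follows. The sketch assumes, purely for illustration, that the Table 6.3 prizes are yen amounts, that utility is linear in money, and that the candidate probabilities are 1/3 each, as the Table 6.2 indifferences suggest; none of these assumptions is forced by the example itself.

```python
# A numerical sketch of the state-dependence problem (Tables 6.2 and 6.3),
# under the illustrative assumptions stated in the text above.

p = [1/3, 1/3, 1/3]        # candidate subjective probabilities for s1, s2, s3
rate = [100, 125, 150]     # assumed yen per $100 in states s1, s2, s3

# Table 6.2: lottery i pays $100 in state i, nothing otherwise.
eu_dollars = [p[i] * 100 for i in range(3)]
print(eu_dollars)          # [33.3, 33.3, 33.3] -> indifference is coherent

# Table 6.3 read naively, with utility state-independent in yen:
eu_yen = [p[i] * rate[i] for i in range(3)]
print(eu_yen)              # [33.3, 41.7, 50.0] -> indifference looks incoherent

# Table 6.3 re-expressed in dollars at the state-dependent exchange rates:
eu_converted = [p[i] * rate[i] * (100 / rate[i]) for i in range(3)]
print(eu_converted)        # [33.3, 33.3, 33.3] -> coherence restored
```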
6.3 Non-Bayesian subjective probability

The most prominent example of a non-Bayesian and subjective theory of probability is DeGroot (1970). DeGroot mainly worked on technical problems in mathematical statistics, so it is perhaps no surprise that this theory is little known by scholars from other disciplines. In what follows I shall first explain his theory in some detail and thereafter, in the next section, defend a slightly modified version of it. DeGroot’s basic assumption is that agents can make qualitative comparisons between pairs of events (rather than states), and judge which one they think is most likely to occur. For example, agents can judge whether it is more, or less, or equally likely, according to their own beliefs, that it will rain today in Cambridge than in Cairo. DeGroot then shows that if the agent’s qualitative judgements are sufficiently fine-grained and satisfy a number of structural axioms, there exists a function p that assigns real numbers between 0 and 1 to all events, such that one event is judged to be more likely than another if and only if it is assigned a higher number. In addition, p satisfies the axioms of the probability calculus. So in DeGroot’s theory, the probability function is obtained by fine-tuning qualitative data, thereby making them quantitative. One may think of this as a kind of bootstrap approach to subjective probability. The probabilistic information was there already from the beginning, but after putting the qualitative information to work the theory becomes quantitative. In order to spell out DeGroot’s theory in more detail, let S be the sample space of some act or experiment, for example all possible outcomes of an election. Let E be
a set of events to which probabilities are to be assigned,4 and let X, Y, . . . be elements of E. As usual, X ∪ Y denotes the union of event X and event Y, and X ∩ Y their intersection. The relation ‘more likely to occur than’ is a binary relation between pairs of events in E. This relation is a primitive concept in DeGroot’s theory. X > Y means that X is judged to be more likely to occur than Y, and X ∼ Y means that neither X > Y nor Y > X. For simplicity, we use X ≥ Y as an abbreviation for ‘either X > Y or X ∼ Y’. DeGroot proposes five axioms, which are supposed to hold for all X, Y, . . . in E.

DG 1 X ≥ ∅ and S > ∅.

DG 1 articulates the trivial assumption that no event is less likely to occur than the empty set, and that the entire sample space is strictly more likely than the empty set.

DG 2 For any two events X and Y, exactly one of the following relations holds: X > Y or Y > X or X ∼ Y.

DG 2 requires that all events are comparable. This is not an entirely uncontroversial assumption. If the set of events contains extremely disparate events, such as ‘rain here within an hour’ and ‘humans will become extinct within a century’, some agents will perhaps find it impossible to tell which event is most likely to occur. However, in response to this, note that the axioms spelled out here are supposed to be normative requirements for ideal agents. Perhaps it is not unreasonable to require that an ideal agent should be able to compare the likelihood of any two events. DG 2 resembles the ordering axiom in the Bayesian approach, according to which rational agents must be able to rank any set of alternative risky acts without (explicitly) knowing the probabilities and utilities associated with their potential outcomes. However, DG 2 is less demanding than the ordering axiom in the Bayesian approach, since it does not involve any evaluative judgements.

DG 3 If X1, X2, Y1 and Y2 are four events such that X1 ∩ X2 = Y1 ∩ Y2 = ∅ and Yi ≥ Xi for i = 1, 2, then Y1 ∪ Y2 ≥ X1 ∪ X2. If, in addition, either Y1 > X1 or Y2 > X2, then Y1 ∪ Y2 > X1 ∪ X2.

DG 3 can be explained by supposing that some events can occur in either of two mutually exclusive ways, for example (1) ‘the coin lands heads and you win a car’ and (2) ‘the coin lands tails and you win a car’; and (3) ‘the coin lands heads and you win a cycle’ and (4) ‘the coin lands tails and you win a cycle’. Then, if (3) is more likely than (1) and (4) is more likely than (2), it is more likely that you win a cycle than a car.

DG 4 If X1 ⊃ X2 ⊃ . . . is a decreasing sequence of events and Y is some event such that Xi ≥ Y for i = 1, 2, . . ., then X1 ∩ X2 ∩ . . . ≥ Y.
4 Formally put, the set E, whose elements are sets of outcomes that are all elements in S, is a set of events iff the following requirements are met: 1) S ∈ E; 2) if X ∈ E, then Xᶜ ∈ E; 3) if X1, X2, . . . is an infinite sequence of sets from E, then X1 ∪ X2 ∪ . . . ∈ E.
For an intuitive interpretation of DG 4, suppose that the X-events denote decreasing subsets of the real line between n and infinity. It follows from DG 4 that Y ∼ ∅: no matter how unlikely Y is, as long as Y > ∅ it cannot be the case that the intersection X1 ∩ X2 ∩ . . . is at least as likely as Y for every n between one and infinity, given that each Xn is more likely than Xn+1. This axiom guarantees that the probability distribution is countably additive.

DG 5 There exists a (subjective) random variable which has a uniform distribution on the interval [0, 1].

DG 5 needs to be qualified. This axiom does not require that the random variable in question really exists. It is sufficient that the agent believes that it does. The non-Bayesian approach to subjective probability theory makes no assumption about the nature of the external world—all that matters is the structure of internal subjective beliefs. DG 5 is thus consistent with the world being deterministic. In order to understand what work is carried out by DG 5, suppose that an agent wishes to determine her subjective probability for the two events ‘rain here within an hour’ and ‘no rain here within an hour’. Then, since the set of events E only contains two elements, it is not possible to obtain a quantitative probability function by only comparing those two events. The set of events has to be extended in some way. DG 5 is the key to this extension. In a uniform probability distribution all elements (values) are equally likely. As an example, think of a roulette wheel in which the original numbers have been replaced with an infinite number of points in the interval [0, 1]. Then, by applying DG 5 the set of events can be extended to the union of the two original events and the infinite set of events ‘the wheel stops at x (0 ≤ x ≤ 1)’, etc.

Theorem 6.1 DG 1–5 are jointly sufficient and necessary for the existence of a unique function p that assigns a real number in the interval [0, 1] to all elements in E, such that X ≥ Y if and only if p(X) ≥ p(Y). In addition, p satisfies the axioms of the probability calculus.5

A proof of Theorem 6.1 can be found in DeGroot (1970: 79-81). Although this theorem covers only unconditional probabilities, it can easily be extended to conditional probabilities. We just have to add an axiom saying that X given Z, (X | Z), is at least as likely as Y given Z, (Y | Z), if and only if X ∩ Z ≥ Y ∩ Z. See DeGroot (1970, Section 6.6) for a proof. It is worth emphasising that the numbers obtained in DeGroot’s theory are genuine subjective probabilities. They reflect the agent’s personal degree of belief that some particular events will occur. Another agent might make other qualitative judgements, and thereby obtain a different subjective probability function for the same set of events. However, even though impeccable from a technical point of view, it might be objected that DeGroot’s theory offers limited advice about how to, pragmatically
5 Using the notation introduced above, the axioms of the probability calculus can be stated as follows: 1) p(X) ≥ 0 for all X; 2) p(S) = 1; 3) p(X1 ∪ X2 ∪ . . . ∪ Xn) = p(X1) + p(X2) + . . . + p(Xn) whenever X1, X2, . . . , Xn are mutually exclusive.
speaking, generate subjective probabilities. The theory seems to be too abstract. It does not suggest any procedure for determining the agent’s subjective probability function. On the contrary, DG 2 in conjunction with DG 5 immediately implies that the agent is able to assign quantitative probabilities to every element in S. These two axioms imply that for every event under consideration, the agent is able to judge whether the event in question is more, less, or equally likely than every constructed event such as ‘the roulette-wheel stops somewhere in the interval [0, x]’. Even ideal agents may need help with linking such judgements to some sort of observable behaviour.
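One way to see what the comparisons presupposed by DG 2 and DG 5 amount to is the following elicitation sketch. It is not part of DeGroot's own theory; it is a hypothetical procedure suggested by the roulette-wheel construction, and the `agent_judges_a_more_likely` oracle simply stands in for the agent's qualitative judgements.

```python
# A hypothetical elicitation sketch based on DG 2 and DG 5 (not DeGroot's own
# procedure). The agent compares a target event A with constructed events of
# the form "the wheel stops in [0, x]"; bisection then narrows down p(A).

def elicit_probability(agent_judges_a_more_likely, rounds: int = 20) -> float:
    """agent_judges_a_more_likely(x) is an assumed oracle returning True iff
    the agent judges A more likely than 'the wheel stops in [0, x]'."""
    lo, hi = 0.0, 1.0
    for _ in range(rounds):
        x = (lo + hi) / 2
        if agent_judges_a_more_likely(x):
            lo = x   # A is more likely than an event of probability x
        else:
            hi = x   # A is at most as likely as an event of probability x
    return (lo + hi) / 2

# Toy agent whose degree of belief in A happens to be 0.37.
print(round(elicit_probability(lambda x: 0.37 > x), 3))   # approximately 0.37
```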
6.4 Subjective non-Bayesian probability and horse race lotteries

In what follows I shall outline what I take to be an improved version of DeGroot’s theory. The main difference is that I offer a procedure for linking subjective probability to observable choice, without making any assumption about desires or preferences. Most of the technical framework will remain the same, but the set of objects over which the agent is supposed to state qualitative judgements is drastically reduced. In this version of the theory, the agent only has to consider beliefs about his own choices. The basic set-up of the new theory is as follows. Consider a hypothetical horse race lottery, in which you win $100 if and only if you bet on the winning horse. A horse race lottery is a metaphor for a lottery in which you either win a fixed amount of money or nothing, depending on the true state of the world.

Table 6.4
            State 1    State 2    State 3
Lottery 1   $100       0          0
Lottery 2   0          $100       0
Lottery 3   0          0          $100
Suppose that the agent’s utility for money is state-independent, i.e. that $100 is worth just as much no matter which state occurs.6 In this set-up, the agent’s choice will presumably be governed exclusively by which state he thinks is most likely to occur: Lottery 1 will be chosen over lottery 2 if and only if the agent considers state 1 to be more likely to occur than state 2, and so on. Now, in order to totally eliminate any reference to a desire for money the following step will be taken: Suppose that the agent is not making the choice for himself, but is rather advising a friend about which horse race lottery he should enter. For example, the agent may say, ‘If you prefer $100 over $0, then I recommend that you choose lottery 1 over lottery 2’. The truth of this statement does not depend on the strength of the speaker’s desires,
6 As pointed out by Schervish et al (1990), this assumption does not always hold true. Cf. Section 6.2.
although it obviously tells us something about the speaker’s subjective probability: The speaker thinks that state 1 is more likely to occur than state 2. Moreover, in order to avoid the conditional form of the recommendation, one may prefer to consider conjunctions such as, ‘I believe that you prefer $100 over $0, and I therefore recommend that you choose lottery 1 over lottery 2’. This is a purely factual report about the speaker’s own beliefs, i.e. the statement expresses his subjective probability, but it says nothing about the speaker’s desires. From a behaviouristic point of view it can be argued that subjective probability ought to be linked to observable choices, rather than merely to linguistic behaviour. Let us assume, for the sake of the argument, that this objection is valid. Then, in order to meet the behaviouristic requirement, one may assume that the agent is not only advising his friend, but is, in fact, also making the decision on his behalf. This is to say that the agent decides which lottery his friend will enter based on (i) his own beliefs about the world, and (ii) his beliefs about his friend’s desires. Of course, the agent’s choice will be influenced by his desire to help his friend, but this desire is constant and will not affect his choice behaviour. Therefore, the agent’s choice behaviour can be interpreted as a direct report of his subjective probability. It is important to keep in mind that we only consider horse race lotteries, i.e. lotteries in which the agent’s friend will either win some fixed amount of money or nothing. This means that the agent does not have to make any assumption about the relative strength of his friend’s desire for money—it will suffice to assume that $100 is worth more than $0. Compare this set-up with the analogous theory based on Savage’s axioms. In Savage’s set-up it would make little sense to ask the agent to make choices on behalf of a friend. This is because there is no way in which one could know the strength of the friend’s desire for various amounts of money and other goods. The utility functions of others are in general inaccessible, so it would make no sense to try to separate beliefs from desires in the way proposed above. The new theory can easily be formalised. Let S = {s1, s2, . . .} denote a set of states to which the agent seeks to assign subjective probabilities, and let L be a set of horse race lotteries corresponding to the elements of S.7 L is constructed from the elements of S as described in Table 6.4. Statisticians sometimes prefer to assign probabilities to events rather than states. However, note that states can be thought of as events that occur, and vice versa. As usual, s1 ∪ s2 denotes the union of s1 and s2, i.e. the complex state in which at least one of the two states s1 or s2 obtains. s1 ∩ s2 is the intersection of the two states, i.e. the state consisting of the elements that the two states have in common.8 A pair of states is mutually exclusive if and only if s1 ∩ s2 = ∅. Note that each state corresponds to a horse race lottery in L. For instance, s1 ∪ s2 corresponds to the lottery l1 ∪ l2, i.e. the lottery in which you win if at least one of the two states s1 or s2 materialises. Hence, to perform a set-theoretic operation on

7 We assume that S meets the following requirements: (i) s ∈ S; (ii) if s ∈ S, then sᶜ ∈ S; (iii) if s1, s2, . . . is an infinite sequence of sets from S, then s1 ∪ s2 ∪ . . . ∈ S.
8 An example might help to explain this.
If s1 is the state in which the die lands showing either 1, 2 or 3, and s2 is the state in which the die lands showing either 3, 4 or 5, the intersection is the state in which the die lands showing 3.
horse race lotteries is equivalent to performing the corresponding operation on the underlying states. The relation ‘is chosen by the agent rather than’ is a non-evaluative binary relation between lotteries in L. l1 > l2 means that l1 is chosen by the agent rather than l2, and l1 ∼ l2 means that both lotteries are judged to be equally choice-worthy by the agent. Furthermore, l1 ≥ l2 is an abbreviation of ‘either l1 > l2 or l1 ∼ l2, but not both’. Let us now consider the following five axioms, which are analogous to DG 1–5. The axioms are supposed to hold for all l1, l2, . . . in L:

Axiom 6.1 L > ∅ and l1 ≥ ∅.

Axiom 6.2 For any two lotteries l1 and l2, exactly one of the following relations holds: l1 > l2 or l2 > l1 or l1 ∼ l2.

Axiom 6.3 If l1, l2, and m1, m2 are four lotteries such that l1 ∩ l2 = m1 ∩ m2 = ∅ and mi ≥ li for i = 1, 2, then m1 ∪ m2 ≥ l1 ∪ l2. If, in addition, either m1 > l1 or m2 > l2, then m1 ∪ m2 > l1 ∪ l2.

Axiom 6.4 If l1 ⊃ l2 ⊃ . . . and m is some lottery such that li ≥ m for i = 1, 2, . . . , then l1 ∩ l2 ∩ . . . ≥ m.

Axiom 6.5 The agent believes that there exists a random variable which has a uniform distribution on the interval [0, 1].

Theorem 6.2 Axioms 6.1–6.5 are jointly sufficient and necessary for the existence of a unique function p that assigns a real number in the interval [0, 1] to all elements in S, such that l1 ≥ l2 if and only if p(s1) ≥ p(s2). In addition, p satisfies the axioms of the probability calculus.
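A small sanity check, not a proof of Theorem 6.2, may help to see how a choice relation over horse race lotteries and an underlying probability function hang together. The sketch below generates the relation from an arbitrary, purely illustrative degree-of-belief function over die outcomes and confirms that Axiom 6.3 holds for it; the probabilities and the helper names are my own, not part of the formal theory.

```python
# Toy check: a choice relation generated from a degree-of-belief function p
# over die outcomes satisfies Axiom 6.3. The numbers are arbitrary.
from itertools import combinations

p = {1: 10, 2: 15, 3: 15, 4: 20, 5: 20, 6: 20}   # degrees of belief in percent

def prob(lottery):
    """Probability of the winning set of a horse race lottery."""
    return sum(p[o] for o in lottery)

def at_least_as_choiceworthy(l1, l2):
    """l1 >= l2: l1 is chosen over, or tied with, l2."""
    return prob(l1) >= prob(l2)

# Lotteries built from one- and two-outcome winning sets.
events = [frozenset(c) for n in (1, 2) for c in combinations(p, n)]

violations = 0
for l1, l2 in combinations(events, 2):
    if l1 & l2:
        continue                       # Axiom 6.3 requires disjoint pairs
    for m1, m2 in combinations(events, 2):
        if m1 & m2:
            continue
        if at_least_as_choiceworthy(m1, l1) and at_least_as_choiceworthy(m2, l2):
            if not at_least_as_choiceworthy(m1 | m2, l1 | l2):
                violations += 1
print("Axiom 6.3 violations found:", violations)   # 0
```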
6.5 Concluding remarks

A subjective probability function must, at least in principle, be accessible for empirical observation. Both the Bayesian and the non-Bayesian approaches fulfil this criterion. No matter which approach one takes, a subjective probability function can be established by studying the agent’s choice behaviour in an experimental situation. This tallies well with the behaviouristic ideal many of us respect and defend in one way or another. However, unlike the Bayesian approach, the non-Bayesian theory also meets a stronger criterion of action guidance. This theory gives the agent a reason for performing certain acts, which was not available before the application of the theory. Theories of subjective probability are primarily applicable to decisions made by ideal agents. For example, a non-ideal agent would probably not be able to meet the demands of Axiom 6.2, which requires the agent to make pairwise comparisons between every pair of horse race lotteries. However, the fact that non-ideal agents may find it hard to use the theory in real-life decision making is a practical problem
that need not concern us here. There is no reason for thinking that there is anything wrong with the non-Bayesian theory of subjective probability from a conceptual point of view. It is a normative ideal that non-ideal agents should live up to as best they can. If a non-ideal agent falls short of this ideal, there is clearly a flaw in his decision making, in that his decisions would be more rational if they were more in conformity with ideal ones.
Chapter 7
Expected utility
Up to this point, little has been said about how to actually choose among uncertain prospects. The primary focus has been on how to represent decision problems and define key concepts such as preference, utility, and subjective probability. In this chapter it will be argued that utilities and probabilities, construed in the non-Bayesian way, constitute reasons for choosing in accordance with the expected utility principle. As informally explained in Chapter 1, this principle holds that an act a is rational just in case the sum ∑s∈S p(s) · u(a, s) is maximal, where p(s) is the subjective probability of state s and u(a, s) the utility of outcome ⟨a, s⟩. Two slightly different non-Bayesian axiomatisations of the expected utility principle will be presented, neither of which relies on any version of the independence axiom or sure-thing principle. As will be explained at the end of the chapter, this opens the way for a resolution of the Allais paradox that cannot be obtained with arguments relying on those axioms. The main difference between the two axiomatisations is that the first is formulated in terms of decision rules, whereas the second is formulated in terms of formal decision problems. This difference is subtle, but important. In the rule-based axiomatisation the main theorem states that a certain decision rule—the principle of maximising expected utility—has a certain normative status, namely that it is ‘rational’. In the act-based axiomatisation the main theorem states that an act is rational to perform just in case its expected utility is at least as high as that of every alternative. According to the rule-based axiomatisation the ultimate aim of normative decision theory is to justify a particular decision rule. According to the act-based axiomatisation the ultimate aim is rather to justify a particular set of acts. This distinction raises a fundamental decision-theoretical problem: Should decision theorists seek to justify a certain way of making decisions or just certain decisions? Or slightly differently put, should we do the right thing for the right reasons, or is it sufficient that we (always) do the right thing, no matter what the reasons are? Since I am not sure what the correct answer is, I have developed axiomatisations to cover both alternatives. The plan of this chapter is as follows. Section 7.1 gives an overview of previous attempts to justify the expected utility principle, and Section 7.2 outlines the
intuitions on which the new axiomatisations are based. Section 7.3 presents the rule-based axiomatisation, which is followed in Section 7.4 by a presentation of the act-based axiomatisation. Section 7.5 discusses how the new axiomatisations resolve the Allais paradox and Section 7.6 analyses the relationship between the infamous independence axiom and the fifth intuition stated below, the trade-off principle.
7.1 From Pascal to Allais

The principle of maximising subjective expected utility has its origins in the principle of maximising expected monetary value. Pascal proposed that the latter principle should be applied for calculating the fair price of gambles, and in Arnauld and Nicole’s Port-Royal Logic from 1662 Pascal’s principle was generalised to cover all kinds of decisions, not only decisions about monetary transactions:1

    in order to decide what we ought to do to obtain some good or avoid some harm, it is necessary to consider not only the good or harm itself, but also the probability that it will or will not occur; and to view geometrically the proportion that all these things have when taken together. . . . Those who do not draw [this conclusion], however exact they are in everything else, are treated in Scripture as foolish and senseless persons, and they misuse logic, reason, and life.2
The principle of maximising (objective) expected utility was formulated more clearly by Bernoulli in 1738.3 Bernoulli explicitly introduced the concept of utility, which he defined as a linear measure of the strength of an agent’s desires. Ramsey was the first to propose a subjective interpretation of this principle in his seminal paper ‘Truth and Probability’ (1926). Decision theorists have proposed roughly two kinds of arguments for (their favourite version of) the expected utility principle, viz. the classic argument and the axiomatic approach. The point of departure in the classic argument is the law of large numbers. According to this mathematical theorem, if a random experiment is repeated n times and each experiment has a probability p of success, then the probability that the percentage of successes differs from the probability p by more than a fixed positive amount, ε > 0, converges to zero as the number of trials n goes to infinity, for every positive ε. Therefore, the agent will be better off in the long run if he chooses to maximise expected utility, rather than opting for any other alternative. A widely recognised weakness of the classic argument is that agents seldom face the same (or similar) decisions a large number of times. Most decision problems are unique in at least some respect and some decision makers face only a very small number of decisions. Consequently, arguments based on what would happen
1 For a useful historical survey of the principle of maximising expected utility, see Keynes (1921).
2 Arnauld and Nicole (1662/1996:273-5).
3 Bernoulli (1738/1954).
in the long run seem to be of little or no normative relevance. John Maynard Keynes stressed this point, as he reminded us that ‘in the long run we are all dead’.4 Axiomatic arguments for the expected utility principle are independent of the law of large numbers. They seek to show that this decision rule can be derived from axioms that hold independently of what would happen in the long run. Generally speaking, in an axiomatised theory a small number of intuitively reasonable propositions are adopted for supporting a set of more complex propositions. Two instructive examples are Euclid’s axiomatisation of geometry and Peano’s axiomatisation of arithmetic. Until recently, it was commonly believed that a successful axiomatisation ought to adopt as its axioms some set of fundamental or indisputable truths. Such very strong criteria are seldom stated nowadays, mainly because there seem to be no fundamental or indisputable truths. It seems more plausible to maintain that even axioms should be regarded as fallible—like every other proposition—and hence compatible with the Quinean slogan that ‘no statement is immune to revision’.5 According to this line of thought, an axiomatised theory is valuable mainly because it shows how certain central ideas are connected to each other. If a set of axioms accepted by an agent is less controversial than some theorem, then a proof that the theorem follows from the axioms indicates that the agent should either give up at least one of the axioms, or accept the theorem. In case the axioms are sufficiently reasonable, it will be more attractive to accept the theorem than to give up an axiom.6 A large number of decision theorists have proposed axiomatic arguments for the expected utility principle over the years, including those mentioned in Chapter 2. A unifying feature of nearly all existing axiomatisations is that they take a Bayesian approach. The only exception I am aware of is Oddie and Milne (1991), who start from the assumption that the agent has access to exogenously defined probability and utility functions (which they never define). Based on that assumption, they propose a set of axioms that entail that everyone obeying them will maximise expected utility. However, one of their axioms is a version of the infamous independence axiom, which directly implies the Allais paradox.7
4 Keynes (1923:89). In this quote Keynes is mainly concerned with long run effects in economic theory. Similar remarks about the irrelevance of such effects in general decision theory can be found in Keynes (1921: Chapter 26).
5 Quine (1951: Section VI).
6 It is worth noticing that this way of conceiving of axiomatisations is fully compatible with epistemic coherentism, according to which agents are justified to believe in (a set of) propositions in proportion to how well the proposition(s) cohere with the rest of the agent’s beliefs.
7 See Rabinowicz (1990).
7.2 Preamble to the new axiomatisations

Before presenting the non-Bayesian axiomatisations in detail it is helpful to first explain the normative intuitions on which they are based. Consider the following principles:

1. Action guidance: In every decision problem, at least one act is rational to choose.
2. Dominance: If an act yields strictly better outcomes than another under all states, then the latter act is not rational.
3. Split of states: If a state is split into two other states, which both have exactly the same outcome-distribution as the original state, then an act is rational in the original decision problem if and only if it is rational in the modified decision problem.
4. Split of acts: If an act is split into two other acts, which both have exactly the same outcome-distribution as the original act, then the two new acts are rational if and only if the original act is rational.
5. Trade-off: Suppose that the outcome of act a given state s is better than the outcome of a given state s′. Then, there is some non-zero amount by which the outcome of a given s can be deteriorated, such that there is some (possibly large) improvement of the outcome of a given s′ that compensates for this.

The principles articulated above will be more carefully formulated in Sections 7.3 and 7.4. A technical limitation of the two axiomatisations is that they are only applicable to formal decision problems with finite sets of alternative acts and states. The axiomatisations also presuppose that all probabilities can be expressed as rational numbers. I believe that the technical convenience gained by introducing these assumptions outweighs the philosophical benefits of considering the unrestricted case.
7.3 The rule-based axiomatisation

I shall now present the rule-based axiomatisation. This axiomatisation seeks to reconstruct the principle of maximising expected utility as a composite decision rule consisting of transformative and effective subrules, along the lines mentioned in Chapter 3. Unlike the act-based axiomatisation presented in the next section, the present one explicitly assumes that the agent is able to assign utilities to outcomes in the manner proposed in Chapter 6. Furthermore, as explained in Section 1.4, outcomes are conceived of as ordered pairs of acts and states; that is, ⟨a, s⟩ is the outcome of performing act a given that s is the true state of the world. The utility of ⟨a, s⟩ is denoted u(a, s). Before proceeding, a warning is in order. The rule-based axiomatisation is more attractive from a philosophical point of view than the act-based one. However, it is also less transparent and employs more axioms than the act-based one. Before trying
to come to grips with the present section it might therefore be a good idea to have a quick look at the act-based axiomatisation in Section 7.4. As explained in Section 3.1, there is a distinction to be drawn between two classes of decision rules, viz. effective and transformative rules. Recall that t is a transformative decision rule on Π if and only if t is a function such that for all π ∈ Π, it holds that t(π) ∈ Π, and that e is an effective decision rule on Π if and only if e is a function such that for all ⟨A, S, P, U⟩ ∈ Π it holds that e(⟨A, S, P, U⟩) ⊆ A. Also recall that a composite decision rule is one that is made up of other (transformative or effective) decision rules. More precisely, a new decision rule (t ◦ x) may be constructed from the transformative rule t and the rule x, where x is either a transformative or effective rule. Remember that by applying the concept of composite decision rules, the expected utility principle (eu) can be reconstructed as a composite rule consisting of two subrules, one transformative and one effective, such that eu = weigh ◦ max:

weigh: Transform a decision problem under risk π into another under certainty π′ with the same alternative set, such that the utility of every outcome in π′ equals the weighted sum of the utilities and probabilities for the corresponding outcomes in π, i.e. ∑s∈S p(s) · u(a, s).

max: In a decision problem under certainty, choose an alternative act yielding the highest utility.

As noted in Chapter 1, a formal decision problem under risk is a formal representation with exactly one probability function, such that 0 < p(s) < 1 for some s ∈ S. In a formal decision problem under certainty it holds that p(s) = 1 for some s ∈ S. That weigh transforms a decision problem under risk into one under certainty does not, of course, mean that the actual decision situation faced by the decision-maker is changed; the outside world is unaffected. What is changed by weigh is the representation of that situation. However, to some extent this decomposition indicates that Savage’s distinction between the two phases of rational decision making mentioned in Chapter 3 is not water-tight: Both phases comprise genuine transformative elements. Arguably, the most controversial subrule of the principle of maximising expected utility is its transformative element weigh, not its effective subrule. Note, however, that nothing prevents the agent from decomposing a controversial transformative subrule into new subrules, thereby analysing it in more detail. The basic idea is simple: If one can find some rational subrules into which eu can be further decomposed, then eu can be shown to be rational by justifying these subrules individually. Below I propose a set of conditions which I believe is useful for this task. The conditions will, somewhat artificially, be divided into technical and normative conditions. For simplicity, the technical conditions will be called ‘postulates’, and the normative conditions ‘axioms’. When formulating the conditions, the phrase ‘rational’ will be used throughout. Instead of speaking of rational applications of (effective and transformative) decision rules, one could also speak of ‘normatively reasonable’ or ‘sound’ applications of decision rules.
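The decomposition eu = weigh ◦ max can be made concrete with a minimal sketch. The dictionary-based representation of ⟨A, S, p, u⟩ and the toy numbers below are assumptions made purely for illustration; they are not the formalism used in this book.

```python
# A minimal sketch of eu = weigh ∘ max on a toy decision problem.

states = ["s1", "s2"]
p = {"s1": 0.4, "s2": 0.6}
u = {("a1", "s1"): 10, ("a1", "s2"): 0,
     ("a2", "s1"): 4,  ("a2", "s2"): 5}
acts = ["a1", "a2"]

def weigh(acts, states, p, u):
    """Transformative subrule: turn a problem under risk into one under
    certainty by replacing each act's outcomes with its expected utility."""
    return {a: sum(p[s] * u[(a, s)] for s in states) for a in acts}

def max_rule(certain_u):
    """Effective subrule: in a problem under certainty, pick the acts with
    the highest utility."""
    best = max(certain_u.values())
    return {a for a, v in certain_u.items() if v == best}

# The composite rule applied to the toy problem:
print(weigh(acts, states, p, u))              # {'a1': 4.0, 'a2': 4.6}
print(max_rule(weigh(acts, states, p, u)))    # {'a2'}
```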
First of all, in order to articulate the idea that a decision rule x can be justified by splitting it into a number of subrules x1, x2, . . . , xn, consider the following condition:

Postulate 7.1 (Decomposition): If (i) it is rational to apply the rule x1 to every element of Π1, x2 to every element of Π2, . . . , xn to every element of Πn, and (ii) for every element π in Π1 it holds that x1(π) ∈ Π2, x1 ◦ x2(π) ∈ Π3, . . . , x1 ◦ x2 ◦ . . . ◦ xn−1(π) ∈ Πn, then it is rational to apply the rule x1 ◦ x2 ◦ . . . ◦ xn to every element of Π1.

The intuition underlying the second postulate is that decision rules should be evaluated solely according to what recommendations they make; their internal mathematical structure does not matter. Consider the following condition.

Postulate 7.2 If x1 and x2 yield the same outcomes when applied to all elements of a set of formal decision problems Π and the application of x1 to every element of Π is rational, then the application of x2 to every element of Π is also rational.

In order to state the third and last technical condition, let two formal decision problems π and π′ be solution-equivalent if and only if every alternative act that is rational to perform in π is rational to perform in π′, and vice versa.

Postulate 7.3 Let π and π′ be two solution-equivalent formal decision problems. Then it is rational to apply a transformative rule t to π that transforms π to π′.

Postulate 7.3 is not entirely innocent. It could be argued that a transformation of the kind described by this postulate should be permitted only in case the formal decision problem obtained by carrying out the transformation scores at least as high as the original one with respect to the deliberative values described in Section 3.2. However, as can be seen in the proof of Theorem 7.1, the transformations carried out by Postulate 7.3 seem to fulfil this requirement; hence, a more complex formulation of the postulate could most certainly address this criticism.

I shall now state the normative axioms. Let π = ⟨A, S, p, u⟩ be a formal decision problem. Let the set of dominating acts D be a subset of the set of acts A such that a is a member of D if and only if, for all states s in S, u(a, s) ≥ u(a′, s) for every a′ ∈ A.

EU 1 (Dominance) Whenever the set of dominating acts D of π is non-empty, an effective rule e is rational if and only if e(π) = D.

EU 2 (Split of states) Let π = ⟨A, S, p, u⟩. Let π′ = ⟨A′, S′, p′, u′⟩ be exactly as π, except that S′ = (S − {s}) ∪ {s′, s″} and p(s) = p′(s′) + p′(s″). Then, if t(π) = π′ and u(a, s) = u(a, s′) = u(a, s″) for all a ∈ A, it is rational to apply t to π.

EU 3 (Split of acts) Let π = ⟨A, S, p, u⟩. Let π′ = ⟨A′, S′, p′, u′⟩ be exactly as π, except that A′ = (A − {a}) ∪ {a′, a″}. Then, if u(a, s) = u(a′, s) = u(a″, s) for all s ∈ S, it is rational to apply a rule t that transforms π into π′, or vice versa.
EU 4 (Trade-Off) Let π = ⟨A, S, p, u⟩. If u(a, s) > u(a, s′), then there is some number δ > 0, such that for every ε1, 0 ≤ ε1 ≤ δ, and every p(s) and p(s′), there is some ε2 such that it is rational to transform π into π′ by t, where π′ is obtained from π by subtracting ε1 from u(a, s) and adding ε2 to u(a, s′).

EU 1 is a traditional dominance condition. EU 2 is a generalisation of Milnor’s Axiom 8 (‘column duplication’) and Luce and Raiffa’s Axiom 11.8 It allows the agent to split a state into two, given that the sum of probabilities for the two new states is the same as for the original state. For example, if s is the state ‘it’s raining’, then s′ might be the state ‘it’s raining & the coin lands heads up’ and s″ the state ‘it’s raining & the coin does not land heads up’. Then, according to EU 2, it follows that the solution of the formal decision problem in which these states are elements is unaffected. The implications of this axiom are best realised through an example. According to EU 2, π and π′ in Table 7.1 are solution-equivalent:

Table 7.1
[π]      0.1    0.9
a1        0      0
a2       −1      1

[π′]     0.1    0.1    0.8
a1        0      0      0
a2       −1      1      1
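A quick arithmetic check, which is not part of the axiomatisation itself, shows why an expected utility maximiser treats the two problems alike: splitting the 0.9-state leaves every act's expected utility unchanged.

```python
# Check that the split of states in Table 7.1 leaves expected utilities fixed.

def expected_utility(probs, utils):
    return sum(p * u for p, u in zip(probs, utils))

# [pi]: states with probabilities 0.1 and 0.9
print(expected_utility([0.1, 0.9], [0, 0]))          # a1: 0.0
print(expected_utility([0.1, 0.9], [-1, 1]))         # a2: 0.8

# [pi']: the 0.9-state split into states with probabilities 0.1 and 0.8
print(expected_utility([0.1, 0.1, 0.8], [0, 0, 0]))  # a1: 0.0
print(expected_utility([0.1, 0.1, 0.8], [-1, 1, 1])) # a2: 0.8
```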
EU 3 asserts that if an act is split into two other acts, which both have exactly the same outcome-distribution as the original act, then the two new acts are rational just in case the original act is rational. In order to illustrate what it means to split an act into two, suppose that you have decided to go for a walk in the park. This activity can be performed in two ways, namely by carrying the umbrella in either your left or your right hand. If all possible outcomes of these two new acts are the same as the possible outcomes of the original act (and the probability function for the states is the same), then the normative status of all three acts should be the same. EU 4 is the most controversial axiom. Intuitively, it says that a good outcome can always be slightly deteriorated, such that this is compensated for by improving a not-so-good outcome of the same act. However, note that (i) the compensation might be enormous, compared to the restricted slight deterioration, and (ii) the slight deterioration is always ‘withdrawn’ from an outcome that is better than (is preferred to) the outcome to which the compensation is added. The implications of this axiom can be illustrated with an example: Suppose that Adam offers you a gamble on the toss of a fair coin. If it lands heads up you will be given 10 utiles, otherwise you receive 1 utile. If you refuse to take part in the game you will be given 5 utiles. Before you decide whether to play the game or not, Adam informs you that he is willing to change the rules of the game such that instead of giving you 10 utiles if the coin lands heads up he will give you a little bit less, 10 − ε1, but compensate you for this

8 See Milnor (1954:52) and Luce and Raiffa (1957:295). In the formulation given above, the phrase ‘π′ is exactly as π, except that . . .’ should be taken to mean that all necessary adjustments of u are also made.
potential loss by increasing the other prize to 1 + ε2 utiles. He adds that you are free to choose the value of ε2 yourself! Now, note that the trade-off principle does not say anything about whether you should choose 5 utiles for sure instead of the gamble yielding either 1 or 10 utiles, or vice versa. Such choices must be determined by other considerations. The trade-off principle only tells you that there is some (perhaps very small) number δ > 0, such that for all ε1, 0 ≤ ε1 ≤ δ, there is a number ε2 such that you are indifferent to the trade-off suggested by Adam, i.e. such that the two decision problems are solution-equivalent. Consider Table 7.2.

Table 7.2

[π]      s     s′
a1       5     5
a2      10     1

[π′]     s         s′
a1       5         5
a2      10 − ε1    1 + ε2
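As a rough numerical check of this idea (a sketch of mine, not part of the formal argument; the figures are those of Table 7.2 and the chosen ε-values are merely illustrative), note that an agent who happens to evaluate the gamble by its expected utility can always accept ε2 = ε1 when the two states are equiprobable. The trade-off principle itself claims only that some such ε2 exists.

```python
# Expected utilities in Adam's coin game (Table 7.2), with a fair coin.
def expected_utility(outcomes, probs):
    return sum(p * u for p, u in zip(probs, outcomes))

probs = [0.5, 0.5]
a1 = [5, 5]                      # refuse the gamble
a2 = [10, 1]                     # accept the gamble
eps1 = 0.5                       # illustrative deterioration of the better outcome
eps2 = 0.5                       # compensation; here eps2 = eps1 keeps the expected utility fixed
a2_modified = [10 - eps1, 1 + eps2]

print(expected_utility(a2, probs))           # 5.5
print(expected_utility(a2_modified, probs))  # 5.5
```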
If a sufficiently large value of ε2 is chosen, even many risk averse decision-makers would accept the suggested trade-off.9 Therefore, the trade-off principle can be accepted not only by decision-makers who are neutral to utility risks. However, this axiom is nevertheless more controversial than EU 1–3. Most notably, the trade-off principle implies that once ε1 and ε2 have been established for a given pair of states, these constants can be added over and over again to the utility numbers representing this pair of outcomes. This is because the axiom assumes that ε2 is a function of (at most) ε1, p(s), and p(s′). In the Appendix a weaker, but more complex, version of the trade-off principle is formulated, which allows ε2 to also be a function of the difference between u(a, s) and u(a, s′). In principle, a range of other variables could also be taken into account, as long as the dependence can be described by a positive monotone function.

The trade-off principle slightly resembles Savage's sure-thing principle.10 Expressed in the terminology employed here, the sure-thing principle implies that a decision problem under risk π is solution-equivalent to a decision problem under risk π′ if π is transformed into π′ by adding an arbitrary constant to each entry of a column in which all utilities are equal. Both axioms regulate the addition of utility constants to decision matrices, but the sure-thing principle operates 'vertically', i.e. on columns, whereas the trade-off principle operates on a 'horizontal' level, i.e. on alternatives. From a technical point of view, both the sure-thing principle and the trade-off principle are rather strong. Both axioms are, for instance, incompatible with the maximin and maximax rules. (To realise why, apply the sure-thing principle and add −$2M to the right column of Gamble 1 and 2 in Section 7.4, and apply the trade-off axiom to a decision problem with equi-probable states in which the utilities of all outcomes are equal.11)

9 In this context, 'risk aversion' refers to utility risks, not to the notion of risk aversion developed by Pratt (1964) and Arrow (1970). See Chapter 8.
10 Also Milnor's well-known axiom of column linearity (1954:51) has some similarities with the trade-off principle. According to the column linearity axiom a decision problem π is solution-equivalent to a decision problem π′ if the former is transformed into the latter by adding an arbitrary constant to each entry of a column, no matter whether all utilities are equal or not. Hence, the sure-thing principle is trivially implied by column linearity.
The trade-off axiom is, however, not implied by the sure-thing principle, and neither is the latter implied by the former: according to the sure-thing principle π is solution-equivalent to π′ in Table 7.3, but this conclusion cannot be drawn from the trade-off principle. Therefore, the sure-thing principle is not implied by the trade-off principle. An analogous argument applies to π and π″, which shows that the trade-off principle is not implied by the sure-thing principle.

Table 7.3

[π]      0.4   0.2   0.4
a1        0    10     1
a2        0     5     5

[π′]     0.4      0.2   0.4
a1       0 − ε1   10     1
a2       0 − ε1    5     5

[π″]     0.4      0.2   0.4
a1       0        10     1
a2       0 + ε2    5     5 − ε3
My tentative conclusion is that even though the trade-off principle is normatively substantial, its plausibility does not fall short of that of the sure-thing principle.

The main theorem in the rule-based axiomatisation is Theorem 7.1, stated below. Let a set Π of formal decision problems be closed if and only if, for every formal decision problem ⟨A, S, p, u⟩ ∈ Π, if S′ is a non-empty and finite set of (act-independent) states of the world, p′ is a function from S′ to [0, 1] such that ∑_{j=1}^{n} p′(s_j) = 1, and u′ is a function from A × S′ to R that is invariant up to positive linear transformations, then ⟨A, S′, p′, u′⟩ ∈ Π.

Theorem 7.1: Let Postulates 7.1–7.3 and EU 1–4 hold for a closed set Π of formal decision problems. Then it is rational to apply eu to every π ∈ Π.

A formal proof of Theorem 7.1 is given in the Appendix. In what follows, I will give an informal presentation of its main structure. As suggested by Postulate 7.1, the fundamental idea is to show that the principle of maximising expected utility can be seen as a composite rule consisting of a number of transformative and effective rules, each of which is rational on its own. In the first transformation step, the original decision problem π is transformed into another decision problem with equiprobable states. This is done by applying EU 2 a finite number of times, each time splitting a more probable state into two less probable ones. The new states are irrelevantly different versions of the original, in so far as they allot the same utility to the alternative acts as the original state. In the second transformation step the aim is to equalise the utilities of each act over the equiprobable states. In order to do this, the following lemma is applied.
11 Here is an example:

         0.5       0.5                  0.5   0.5
a1       1 − ε1    1 + ε2         a1     1     1
a2       1         1              a2     1     1
Lemma 7.2 (Equal trade-off): Let EU 1, 3 and 4 hold. Then, there is some number δ > 0 such that for all ε, 0 ≤ ε ≤ δ, and all decision problems π, if two states s and s′ are equiprobable in π, and a is one of the alternatives in π, then π is solution-equivalent to the decision problem π′ obtained from π by subtracting ε from the utility of a under s and adding ε to the utility of a under s′, given that u(a, s) > u(a, s′).

A formal proof of Lemma 7.2 is given in the last section of this chapter. By adding a small amount of utility to the lowest utility of a given act and at the same time subtracting the same amount from its highest utility, and repeating this operation a finite number of times, it can be ensured that all utilities of each act over the different equiprobable states will be equalised. The constant utility of each act in this decision problem will, as is easily seen, equal the expected utility of each act in the original decision problem. In the third transformation step EU 2 is applied again, and all equiprobable states are merged into a single state, which is assigned probability one. The utility of each act given this state equals its constant utility from transformation step two. In the fourth step EU 1 is applied to the decision problem under certainty yielded by the third step. As EU 1 is applied to a decision problem under certainty, its recommendations will be identical to those of the rule max on page 35. Consequently, the set of acts chosen in these four steps will be identical to the set of acts chosen by the principle of maximising expected utility.
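These four steps can be mimicked in a small computational sketch (my illustration, not the formal proof; probabilities are assumed to be rational numbers, and the averaging in the middle step stands in for the finite sequence of equal trade-offs licensed by Lemma 7.2):

```python
from fractions import Fraction
from math import lcm   # Python 3.9+

def eu_by_transformation(utilities, probs):
    """Sketch of the four transformation steps behind Theorem 7.1.
    utilities: {act: [u_1, ..., u_k]}; probs: list of Fractions summing to 1."""
    denom = lcm(*(p.denominator for p in probs))
    # Step 1 (EU 2): split every state into equiprobable copies.
    split = {a: [u for u, p in zip(us, probs) for _ in range(int(p * denom))]
             for a, us in utilities.items()}
    # Steps 2-3 (Lemma 7.2, then EU 2): equalise each act's utility over the
    # equiprobable states and merge them into one certain state; the constant
    # value is the arithmetic mean of the split utilities.
    constant = {a: sum(us) / len(us) for a, us in split.items()}
    # Step 4 (EU 1): choose the acts whose remaining (certain) utility is maximal.
    best = max(constant.values())
    return {a for a, v in constant.items() if v == best}

# The problem of Table 7.1: the constant utilities are 0 and 0.8, so a2 is chosen,
# exactly as the principle of maximising expected utility recommends.
print(eu_by_transformation({'a1': [0, 0], 'a2': [-1, 1]},
                           [Fraction(1, 10), Fraction(9, 10)]))   # {'a2'}
```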
7.4 The act-based axiomatisation

This section presents the act-based axiomatisation. The main result is Theorem 7.3, which establishes that an act is rational if and only if its expected utility is at least as high as that of every alternative. The difference between this and the previous axiomatisation is that the new approach does not require that a particular decision rule, e.g. the principle of maximising expected utility, is applied for deciding what to do. It is the act itself that matters. The point of departure in the act-based approach is the following definition of a solution of a formal decision problem.

Definition 7.1: Let π be a formal decision problem under risk. Then, Φ(π) is a solution of π if and only if (i) Φ(π) ⊆ A and (ii) every member in Φ(π) is rational to perform.
Φ(π) is of course not supposed to be known in advance by the agent; rather, it is the aim of decision theory to determine the members of Φ(π). The notion of solution-equivalent decision problems, employed in the previous axiomatisation, can be easily reconstructed in the new terminology: π and π′ are solution-equivalent iff Φ(π) = Φ(π′).
In Chapter 5 it was shown that numerical utilities could be assigned to outcomes without stating preferences among uncertain prospects. The main result was Theorem 5.1, which was implicitly presupposed in the rule-based axiomatisation. However, in the act-based axiomatisation proposed in this section, the axioms will be formulated in a more general way, that does not presuppose any particular notion of utility. By doing so the analysis of the expected utility principle can be clearly separated from the discussion of the utility concept as such.

It will be helpful to introduce a few auxiliary concepts. First, I wish to model the idea of an outcome being slightly deteriorated. Intuitively put, the outcome 'I win ten million dollars & I lose one euro' is a slight deterioration of the outcome 'I win ten million dollars'. However, strictly speaking, to deteriorate an outcome ⟨a, s⟩ by another outcome ⟨a, s′⟩ presupposes that the formal decision problem is modified. Since outcomes are ordered pairs of acts and states, all acts and states cannot be the same. This will be taken care of by modifying the set of states, while leaving the set of acts unmodified. For example, it might not have been true that I will win ten million dollars if I buy the lottery ticket; perhaps some small amount would have been deducted by the Inland Revenue. This implies that the set of states was not exactly as I thought it was. Let ◦ be a binary operation on outcomes. Then, the composite outcome ⟨a, s⟩ ◦ ⟨a, s′⟩ = ⟨a, s″⟩ is an outcome of a modified formal representation, as explained above. However, instead of writing ⟨a, s⟩ ◦ ⟨a, s′⟩, it is more convenient to write ⟨a, s⟩ ◦ x, where ⟨a, s⟩ is an outcome slightly deteriorated by an undesirable outcome x, e.g. 'I lose one euro' or 'I arrive at work two minutes earlier than necessary'. Consider the following technical definition, which explicates the idea of a continuous interval of slightly undesirable outcomes.

Definition 7.2: X is a set of bounded undesirable outcomes if and only if (i) there are some x, x′ ∈ X such that for every y ∈ X, x ⪰ y ⪰ x′, (ii) for every x, y ∈ X, x ≻ x ◦ y, and (iii) for every y, y′ ∈ X there is some y″ such that y ⪰ y″ ⪰ y′.

Consider the following axioms, which I propose hold for every formal decision problem under risk π.

EU 5 (Action guidance): Φ(π) ≠ ∅.

EU 6 (Dominance): If there is an a′ such that for some a it holds that ⟨a′, s⟩ ≻ ⟨a, s⟩ for all s, then a ∉ Φ(π).

EU 7 (Split of states): Let π = ⟨A, S, p, u⟩ and π′ = ⟨A, S′, p′, u′⟩, where S′ = (S − {s}) ∪ {s′, s″} and p(s) = p′(s′) + p′(s″) and u′ is parallel to u. Then, if ⟨a, s⟩ ∼ ⟨a, s′⟩ ∼ ⟨a, s″⟩ for all a ∈ A, it holds that Φ(π) = Φ(π′).

EU 8 (Split of acts): Let π = ⟨A, S, p, u⟩ and π′ = ⟨A′, S, p, u′⟩, where A′ = (A − {a}) ∪ {a′, a″} and u′ is parallel to u. Then, if ⟨a, s⟩ ∼ ⟨a′, s⟩ ∼ ⟨a″, s⟩ for all s ∈ S, it holds that {a′, a″} ⊆ Φ(π′) if and only if {a} ⊆ Φ(π).

EU 9 (Trade-off): If ⟨a, s⟩ ≻ ⟨a, s′⟩ and a ∈ Φ(π), then there is a non-empty set of bounded undesirable outcomes Q such that for every y ∈ Q, p(s) and p(s′), there is
an outcome x such that Φ(π) = Φ(π′), where π′ is obtained from π by substituting ⟨a, s⟩ ◦ y for ⟨a, s⟩ and ⟨a, s′⟩ ◦ x for ⟨a, s′⟩.

It can be easily verified that each of EU 5–9 is a necessary consequence of the expected utility principle. Furthermore, none of these axioms implies, or is implied by, von Neumann and Morgenstern's infamous independence axiom or Savage's sure-thing principle, as explained in Section 7.3. The intuitions underlying the axioms employed in this act-based axiomatisation are the same as those referred to in the rule-based axiomatisation, except for EU 5. This axiom excludes the possibility of a formal decision problem in which no act is rational to perform. This is of course a substantial normative assumption, not just a 'technical' principle. If there are genuine moral dilemmas in decision theory, EU 5 is false. Suppose, for instance, that you have to choose between a career as a banker or as a philosopher. Clearly, the banking career is better since it will earn you more money, whereas the philosophy career is better since it will give you the opportunity to work on problems that really interest you. However, it is not clear that one of the two alternatives is better than the other—all things considered. The two values might be incomparable. Hence, it might be argued that you cannot avoid acting irrationally whatever you do. (I am aware that some people remain unconvinced by this type of example. Perhaps that is because they take for granted that all values are comparable.)

It is worth noticing that, in principle, EU 5 does all the work carried out by Postulates 7.1–7.3 in the rule-based approach. This suggests that there is a choice to be made here: one could either accept EU 5 and the normative implications associated with it, or stick to the rule-based axiomatisation and the three postulates suggested in Section 7.3. I will leave this choice to the reader.

The upshot of the act-based axiomatisation is Theorem 7.3, stated below. This theorem requires that utilities can be assigned to outcomes, as suggested by Theorem 5.4. In order to keep the technical notation simple, I will abbreviate u(⟨a, s⟩) as u(a, s). Here is the theorem.

Theorem 7.3: Let axioms EU 5–EU 9 hold for a set Π of formal decision problems. Then, for every utility function u satisfying the conditions of Theorem 5.4, and for every π ∈ Π, it holds that:
Φ(π) = {a : ∑_{s∈S} p(s) · u(a, s) ≥ ∑_{s∈S} p(s) · u(a′, s) for all a′ ∈ A}
Theorem 7.3 shows that all acts that are elements of the solution of the decision problem fulfil the expected utility criterion—but nothing more is implied. In particular, nothing follows about the normative status of the principle of maximising expected utility.
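Read computationally, Theorem 7.3 simply characterises the solution set as the set of expected-utility-maximising acts. A minimal sketch of that reading (the example figures are hypothetical):

```python
def solution(acts, states, p, u):
    """Phi(pi) in the sense of Theorem 7.3: the acts whose expected utility is maximal.
    p: {state: probability}; u: {(act, state): utility}."""
    eu = {a: sum(p[s] * u[(a, s)] for s in states) for a in acts}
    best = max(eu.values())
    return {a for a in acts if eu[a] >= best}

acts, states = ['a1', 'a2'], ['s1', 's2']
p = {'s1': 0.4, 's2': 0.6}
u = {('a1', 's1'): 0, ('a1', 's2'): 10, ('a2', 's1'): 5, ('a2', 's2'): 5}
print(solution(acts, states, p, u))   # {'a1'}, since 6 > 5
```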
7.5 The Allais paradox

The non-Bayesian axiomatisations presented in Sections 7.3 and 7.4 can resolve the Allais paradox in a novel way. This is because the new axiomatisations do not
make any use of the infamous independence axiom or sure-thing principle. The independence axiom and the sure-thing principle can, of course, be derived from the principle of maximising expected utility (and hence from the conjunction of the axioms stated in Sections 7.3 and 7.4). But the independence and sure-thing axioms are not necessary for deriving the expected utility principle. This difference is subtle, but important. The Allais paradox arises when agents are asked to make pair-wise choices between the following gambles, in which exactly one winning ticket will be drawn.12

Table 7.4

            no. 1           no. 2-11    no. 12-100
Gamble 1    $1M             $1M         $1M
Gamble 2    $0              $5M         $1M

Gamble 3    $1M             $1M         $0
Gamble 4    $0              $5M         $0
In a choice between Gambles 1 and 2 most people prefer Gamble 1, since it gives the agent $1M for sure. However, in a choice between Gambles 3 and 4 a large majority is prepared to trade a ten-in-hundred chance of getting $5M instead of $1M against a one-in-hundred risk of getting nothing instead of $1M, and consequently choose Gamble 4. Several empirical studies have confirmed these preferences.13 The problem is that no matter how utility numbers are assigned to money, the principle of maximising expected utility recommends the agent to prefer Gamble 1 to 2 if and only if Gamble 3 is preferred to 4. There is simply no utility function that would make this decision rule consistent with a preference for Gamble 1 over 2 and for 4 over 3. This is easily verified by calculating the difference in expected utility between Gambles 1 and 2, and between Gambles 3 and 4, respectively:

eu(Gamble 1) − eu(Gamble 2) = 0.11u($1M) − [0.01u($0) + 0.1u($5M)]
eu(Gamble 3) − eu(Gamble 4) = 0.11u($1M) − [0.01u($0) + 0.1u($5M)]

Arguably, the challenge raised by the Allais paradox is not to show that it actually is reasonable to prefer Gamble 1 to 2 but not 3 to 4. An argument to this effect would solve no problem for the adherent of this decision rule. Rather, the challenge posed by the Allais paradox is to give some plausible argument for the claim that one ought to prefer Gamble 3 to 4 if and only if one prefers Gamble 1 to 2. Savage himself provided an argument to this effect, which runs as follows:

if one of the tickets numbered from 12 through 100 is drawn, it does not matter, in either situation which gamble I choose. I therefore focus on the possibility that one of the tickets numbered from 1 through 11 will be drawn, in which case [the gamble between 1 and 2 and between 3 and 4] are exactly parallel. . . . It seems to me that in reversing my preference between Gambles 3 and 4 I have corrected an error. (Savage 1972:103)

12 Savage (1954/72:103).
13 Kagel and Roth (1995: Chapter 8) give a useful list of references.
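The calculation above can be checked mechanically. Treating u($0), u($1M) and u($5M) as unknowns and reading the probabilities off Table 7.4 (1, 10 and 89 tickets out of 100), the two expected-utility differences have identical coefficients, so no assignment of utilities to money can pull them apart (a sketch of mine, not part of the argument):

```python
def coefficients(gamble):
    """gamble: list of (probability, prize); returns prize -> total probability weight."""
    c = {}
    for prob, prize in gamble:
        c[prize] = c.get(prize, 0) + prob
    return c

def difference(g, h):
    cg, ch = coefficients(g), coefficients(h)
    return {k: round(cg.get(k, 0) - ch.get(k, 0), 10) for k in set(cg) | set(ch)}

g1 = [(0.01, '1M'), (0.10, '1M'), (0.89, '1M')]
g2 = [(0.01, '0'),  (0.10, '5M'), (0.89, '1M')]
g3 = [(0.01, '1M'), (0.10, '1M'), (0.89, '0')]
g4 = [(0.01, '0'),  (0.10, '5M'), (0.89, '0')]

print(difference(g1, g2))   # coefficient 0.11 on u($1M), -0.01 on u($0), -0.1 on u($5M)
print(difference(g3, g4))   # the same coefficients, whatever utility function is chosen
```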
Savage’s argument is a direct application of the sure-thing principle: Gamble 3 should be preferred to 4 if and only if Gamble 1 is preferred to 2, because the addition of a constant to a column in which the utility of all outcomes are equal does not affect the solution of the decision problem. However, since the Allais paradox is intended to challenge nothing but the truth of the sure-thing principle, and can be derived directly from this principle, Savage’s argument seems to beg the question. Why is it true that columns with parallel outcomes can be neglected? I believe that in order to resolve the Allais paradox in a way that is not question-begging, one has to construct an argument that (unlike Savage’s) is independent of the sure-thing principle. Another type of response to the Allais paradox is to question the accuracy of the formalisation of the decision problem. The outcome of getting $0 in Gamble 2 is very different from the outcome of getting $0 in Gamble 4. The disappointment one would feel if one won nothing instead of a fortune in Gamble 2 is likely to be substantial. Table 7.5 is therefore a more accurate representation. Note that it no longer holds true that the expected utility principle is inconsistent with the preference pattern people actually entertain. Table 7.5 Gamble 1 Gamble 2
no. 1 $1M $0 & disapp.
no. 2-11 $1M $5M
no. 12-100 $1M $1M
Gamble 3 Gamble 4
$1M $0
$1M $5M
$0 $0
A drawback of this response is that it seems difficult to tell exactly how fine-grained the description of outcomes ought to be. In principle, it seems that every potential violation of the expected utility principle could be explained away by simply making the individuation of outcomes more fine-grained. However, this would make the principle immune to criticism, unless one has some independent reason for adjusting the individuation of outcomes.14

I shall now state my own view about the Allais paradox. My point is that the axiomatisations constructed in Sections 7.3 and 7.4 do not make any use of the sure-thing principle, nor do any of the axioms (taken alone) imply that Gamble 3 should be preferred to 4 if and only if Gamble 1 is preferred to 2. Hence, people who accept the new axiomatisations are in a much better position with respect to the Allais paradox. These people could simply claim that one ought to reverse some of one's preferences between the gambles in the Allais paradox, since one would then maximise expected utility. The adherent of the new axiomatisations can appeal to the principle of maximising expected utility because neither the independence axiom nor the sure-thing principle is used as an axiom in the argument for this principle.

14 Broome (1991) discusses a counterargument to this objection. I believe his point might be right, but this is not the right occasion for an in-depth discussion of this topic.
It is worth pointing out that, as noted by several authors, the independence axiom also implies a version of the Allais paradox directly.15 According to the independence axiom, an uncertain prospect A is preferred to B if and only if a lottery with probability p for A and (1 − p) for C is preferred to a lottery with probability p for B and (1 − p) for C, i.e. A ≻ B if and only if ApC ≻ BpC. Let A denote $1M for sure and let B be a gamble with a 10/11 probability for $5M and a 1/11 probability for $0. In this decision problem, many agents would prefer A to B. Now, let C be a prize of $0 and p = 0.11. In a choice between ApC (i.e. an 11% chance of getting $1M) and BpC (i.e. an 11% chance of a 10/11 chance of getting $5M) many agents would prefer the latter gamble, since this prize is considerably larger but the chance of getting it insignificantly smaller. Hence, according to their preferences A ≻ B and BpC ≻ ApC, which is inconsistent with the independence axiom. Again, the advocate of the expected utility principle has to explain why these choices would be irrational, whereas the advocate of the new axiomatisations could simply appeal to the principle of maximising expected utility.
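For concreteness, the two compound lotteries can be reduced by multiplying probabilities (a sketch of mine); the reduced lotteries are in fact just Gambles 3 and 4 of Table 7.4:

```python
p = 0.11
ApC = {'$1M': p, '$0': 1 - p}                                 # 11% chance of $1M, 89% of $0
BpC = {'$5M': p * (10 / 11), '$0': p * (1 / 11) + (1 - p)}    # 10% chance of $5M, 90% of $0
print(ApC, BpC)
```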
7.6 The independence axiom vs. the trade-off principle

The trade-off principle is the strongest axiom employed in the axiomatisations presented in Sections 7.3 and 7.4. The aim of this section is to analyse the relationship between the trade-off principle and the independence axiom in more detail. As demonstrated in Section 7.3, the two axioms are logically independent. However, it could still be objected that they to some extent rely on similar normative intuitions. Although logically independent of each other, both axioms could in theory appeal to the same intuitions about rationality.

The traditional formulation of the independence axiom, first suggested by von Neumann and Morgenstern, was stated in Section 7.3. Let ApB be a lottery yielding prize A with probability p and prize B with probability (1 − p); then, for every A, B, C, it holds that A ≻ B if and only if ApC ≻ BpC. However, the non-Bayesian approach is not concerned with conditions regulating preferences over uncertain prospects. Therefore, it will be more appropriate to consider a slightly different principle, which appeals to the same intuition as the independence axiom and the sure-thing principle. Consider the following axiom, first suggested in a paper by Milnor.16

EU 10 (Milnor's Independence): Let u0(s) be a one-place utility function such that u0(s) = u0(s, a) for all s ∈ S and a ∈ A, and let π0 = ⟨A, S, p, u0⟩. Then, if π1 = ⟨A, S, p, u1⟩ and π2 = ⟨A, S, p, u2⟩, where u2(a, s) = α · u1(a, s) + (1 − α) · u0(s), 0 < α < 1, it holds that Φ(π1) = Φ(π2).

15 Rabinowicz (1995:281).
16 Milnor (1954).

In order to see how Milnor's independence axiom works, suppose that Leonard is planning a trip to Teheran. He can either fly Iran Air or Lufthansa, and depending
on the turbulence around the Alborz Mountains the trip will be more or less convenient. The price and service level for both flights are the same, as are the departure times. Unfortunately, Leonard does not know how turbulence will affect the aircraft. However, he does know that either, with some (known or unknown) probability α, the aircraft operated by Lufthansa will behave considerably better in turbulent air because it is technically more advanced, but arrive later because its environmentally friendly engines give a slightly lower cruising speed; or, with some (known or unknown) probability 1 − α, the Iran Air and the Lufthansa aircraft will be affected in exactly the same way because both companies operate identical 747s. Let us suppose that Leonard's decision problem can be represented by the following decision matrices. (See Table 7.6.) Alternative a1 is the Lufthansa alternative and a2 is the Iran Air alternative; s1 is the state in which turbulence occurs and s2 is the state in which no turbulence occurs.

Table 7.6

[Prob 1]    s1    s2
a1           1    50
a2           0   100

[Prob 2]    s1    s2
a1          10    20
a2          10    20
In case Leonard faces Problem 2 and not Problem 1 it doesn't matter which alternative is chosen, since the outcome of Problem 2 does not depend on the alternative acts, only on which state of nature happens to be the true state. Therefore, Leonard should, according to Axiom EU 10, focus entirely on the problem in which Lufthansa and Iran Air do not operate identical aircraft; that is, the solution to the mixed Problem 1 & 2 does not depend on a trivial problem like Problem 2. From EU 10 it directly follows that a constant can be added to all entries of a column without affecting the solution of the decision problem:17

Lemma 7.4: Let EU 10 hold. Then, if π1 = ⟨A, S, p, u1⟩, π2 = ⟨A, S, p, u2⟩ and u2(a, s) = u1(a, s) + u0(s), and if there is an α, 0 < α < 1, such that π3 = ⟨A, S, p, u3⟩, with u3 = α · u2 + (1 − α) · 0, is a formal decision problem, then it holds that Φ(π1) = Φ(π2).

Axiom EU 10 does more or less the same work as EU 9. In order to see this, I shall show that EU 9 can be replaced by EU 10 in Theorem 7.3, given that an additional, less controversial axiom is adopted. The following axiom articulates the intuition that the cause of an outcome is normatively irrelevant. That is, agents should only be concerned with the amount of utility they receive, not with what state made them receive this amount of utility. For example, if s and s′ are equi-probable it doesn't matter whether the agent receives 100 utiles under s and 0 utiles under s′, or 100 utiles under s′ and 0 utiles under s.

17 This lemma was first proved by Chernoff (1954:433). The proof given in the Appendix is in all relevant aspects identical to his.
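Before turning to the additional axiom, here is a rough numerical check of EU 10 in Leonard's case (my sketch; the probability of turbulence and the mixing weight α are illustrative, and expected utility is used merely as one admissible way of evaluating the two problems):

```python
p_turb, alpha = 0.3, 0.5                                        # illustrative values
u1 = {'a1': {'s1': 1, 's2': 50}, 'a2': {'s1': 0, 's2': 100}}    # Problem 1 (Table 7.6)
u0 = {'s1': 10, 's2': 20}                                       # Problem 2, act-independent

def eu(utility, p):
    return p * utility['s1'] + (1 - p) * utility['s2']

for a in ('a1', 'a2'):
    mixed = {s: alpha * u1[a][s] + (1 - alpha) * u0[s] for s in ('s1', 's2')}
    print(a, eu(u1[a], p_turb), eu(mixed, p_turb))
# a1: roughly 35.3 and 26.15; a2: roughly 70 and 43.5. Mixing in the trivial problem
# shifts and rescales every act's value in the same way, so the ranking is unchanged.
```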
EU 11 (Irrelevance of cause): Suppose that two states si and sj are equi-probable in π, and let ak be one of the alternatives in π. Then, if π is transformed into a formal decision problem π′ which is identical to π except that u′(ak, si) = u(ak, sj) and u′(ak, sj) = u(ak, si), then Φ(π) = Φ(π′).

The following theorem can now be proved.

Theorem 7.5: Let EU 5–EU 8 and EU 10–EU 11 hold for a set Π of formal decision problems. Then, for every π ∈ Π, it holds that:
Φ(π) = {a : ∑_{s∈S} p(s) · u(a, s) ≥ ∑_{s∈S} p(s) · u(a′, s) for all a′ ∈ A}
Theorem 7.5 indicates that there is a choice to be made here. One can either justify the principle of maximising expected utility by adopting the independence axiom, or by adopting the trade-off principle. Arguably, the trade-off principle is more attractive from an intuitive point of view. Of course, as pointed out above, it is important to demonstrate that the trade-off principle and the independence axiom are genuinely different axioms. Someone might wish to claim that the trade-off principle implicitly invokes the same intuition as the independence axiom, and must therefore be considered as fallacious for the same reason: according to the trade-off principle, the trade-off rates for a pair of outcomes under the states s and s′ are independent of the outcomes under other states, and this is, it could be argued, just another way of formulating the same intuition. In reply to this objection, I wish to emphasise that the trade-off principle does not logically imply the independence axiom, and it cannot be logically derived from that axiom, as demonstrated in Section 7.3. Furthermore, even if the term 'implicitly invoke' is taken in a non-formal and intuitive sense, it can be reasonably claimed that the trade-off principle is based on an entirely different normative intuition compared to the independence axiom. The intuition underlying the trade-off principle is that trade-offs are always acceptable if the trade-off rate is good enough and improves the worst outcome. Having said that, it should also be noted that the trade-off principle can actually be formulated in terms that take the outcomes of all states into consideration. For a formal proof that this weak trade-off principle is sufficient for deriving the expected utility rule, see Observation A.3 in the Appendix.
Chapter 8
Risk aversion
In Bayesian decision theory, '[a] risk averter is defined as one who, starting from a position of certainty, is unwilling to take a bet which is actuarially fair'. This definition was famously proposed by Pratt (1964) and Arrow (1970) in a series of ground-breaking articles. Expressed in technical terms, risk aversion R in the Pratt-Arrow theory is defined as R(Y) = −u″(Y)/u′(Y), where u is the utility of the agent's wealth Y.

A merit of the Pratt-Arrow concept of risk aversion, both when used in normative and in descriptive contexts, is that it reconciles the principle of maximising subjective expected utility with paradigmatic examples of risk aversion, such as the prevalence of insurance and fixed interest rates. For example, if you prefer to pay $500 for insuring your car, rather than having to pay $10,000 for a new car if it gets stolen, you are not violating the principle of maximising subjective expected utility even in case you consider the probability of theft to be well below one in twenty. This is because when stating a preference for insurance, you indirectly reveal an increasing marginal disutility of monetary losses. Your disutility of losing $10,000 is more than twenty times as high as the disutility of losing $500.

However, some decision theorists think that the Pratt-Arrow theory cannot account for a number of central intuitions about risk aversion. Consequently, alternative, stronger notions of risk aversion have been proposed, which cannot be reconciled with the expected utility principle.1 The underlying idea is that a risk averter ought to substitute the expected utility principle by some decision rule that is not only averse to actuarial risks but also to (large) utility risks. Prominent examples of decision rules that are risk averse in this stronger sense are the maximin rule, the maximum probable loss rule, and policy principles such as the precautionary principle.

1 See e.g. Ayres and Sandilya (1986), Ekenberg et al (1997, 2001), Kavka (1980).

The aim of this chapter is to provide an argument to the effect that those 'strong' notions of risk aversion are unacceptable from a normative point of view. Arguably, this will help the expected utility principle come out in an even stronger position. Having said that, the axiomatisations of the expected utility principle from Chapter 7 are, of course, in themselves reasons for not accepting any strong notion of risk aversion. Therefore, I will focus mainly on forms of risk aversion that are
advocated by people wishing to reject even the most fundamental assumption of the expected utility principle: that the possibility of a very bad outcome can always be outweighed by the possibility of a very good outcome. Decision rules that are inconsistent with this Archimedean condition will be referred to as 'strongly risk averse' decision rules. Essentially, what strongly risk averse decision rules are telling us is that sufficiently bad outcomes should always be avoided, if possible. The term 'fatal outcome' will be used to refer to such very bad outcomes. The notion of a fatal outcome will be rigorously defined later on. For the time being, a fatal outcome can be thought of as an outcome whose utility is very low. For example, when building a house in Tokyo the outcome of a decision to build a non-earthquake-safe house rather than an earthquake-safe one might very well turn out to be fatal, since its utility will—if an earthquake takes place—be very low. By the same line of thought, the outcome of the decision not to prohibit genetically modified organisms in the USA, or the outcome of the decision not to phase out nuclear power in Sweden, might eventually turn out to be fatal.

The upshot of this chapter is a number of impossibility theorems, demonstrating that given a few reasonable desiderata for strongly risk averse decision rules, there is no rule that can fulfil all the proposed desiderata. Together these theorems give indirect support to the expected utility principle. However, a further point I seek to make is that many of our normative intuitions about risk aversion can be explained by considering a quite different, epistemic notion of risk aversion. According to the epistemic (belief-guiding) notion of risk aversion, intuitions about risk aversion should be accounted for in terms of what rational agents are urged to believe.

In Section 8.1 the concept of a fatal outcome is defined and discussed in relation to some prominent examples of strongly risk averse decision rules. In Sections 8.2 and 8.3 a number of desiderata for such decision rules are stated, and proved to be inconsistent. In Sections 8.4 and 8.5 a set of even weaker desiderata are proposed, which appeal to the intuition that in many cases when risk aversion is called for it does not make sense to presuppose that the agent has access to quantitative information about the utility and subjective probability of each outcome.
8.1 Beyond the Pratt-Arrow concept

The informal characterisation of a strongly risk averse decision rule given so far refers to the notion of a fatal outcome. Clearly, the concept of a fatal outcome needs to be defined. I shall consider three alternative definitions below. Let c1, c2, c3 be real numbers denoting some suitable cut-off levels.2
2 What is the exact value of c1, c2, c3? The best answer is perhaps to admit that the term 'fatal' is vague, that is, that there are no sharp boundaries between fatal and non-fatal outcomes. Note, however, that this vagueness does not exclude that some outcomes are genuine instances of fatal outcomes.
Definition 8.1: The outcome ⟨a, s⟩ of the formal decision problem π is fatal1 if and only if u(a, s) ≤ c1.

Definition 8.2: Let u∗(π) denote the utility of the optimal outcome of the formal decision problem π. Then, the outcome ⟨a, s⟩ of π is fatal2 if and only if u∗(π) − u(a, s) ≥ c2.

Definition 8.3: Let eu∗(π) denote the highest expected utility obtainable with any act in π. Then, the outcome ⟨a, s⟩ of π is fatal3 if and only if eu∗(π) − u(a, s) ≥ c3.

Definition 8.1 articulates a form of absolutism about fatal outcomes, in maintaining that an outcome is fatal whenever its utility falls below a certain absolute level, e.g. death or a life not worth living, etc. Definitions 8.2 and 8.3 represent two different forms of relativism about fatalness. They assert that an outcome is fatal whenever the difference between the utility of the outcome that the agent actually got and the utility of the optimal outcome, or the expected outcome, is too large.

An argument for preferring relativism to absolutism, i.e. for choosing Definition 8.2 or 8.3 rather than Definition 8.1, is that the former approach allows one to speak about fatal outcomes in cases where all possible outcomes are fairly good. In order to see this point, suppose that you are 99 percent certain to have a long and healthy life plus a wife who loves you, but for some reason you fail to get married and instead get a long and healthy life as a bachelor. Then, even though both outcomes are indeed very attractive, one might argue that the result was fatal for you compared to the anticipated outcome. The absolutist cannot accommodate this intuition; according to this view, the alternative outcomes do not matter. On the other hand, it might be argued that the form of relativism articulated by Definition 8.2, for example, cannot handle very improbable outcomes in a reasonable way. Here is an example explaining why: Suppose that you take part in the National Lottery and draw a blank; then, provided the first prize was sufficiently valuable, the blank ticket was a fatal outcome for you (since the difference between u∗(π) and u(a, s) exceeds c2). However, this problem is perhaps not as devastating as it might first appear to be. If we accept some version of the de minimis principle, i.e. the thesis that sufficiently unlikely outcomes should be neglected, relativism would no longer imply that it was a fatal outcome to draw a blank in the National Lottery.3

3 For discussions of the de minimis principle, see Whipple (1987).

I shall not take a definite stand on how to define a fatal outcome, i.e. decide which of Definitions 8.1, 8.2 or 8.3 is best. Instead, I will in the subsequent sections work with all three definitions in parallel. Thus, by taking any of the three definitions as a point of departure, a strongly risk averse decision rule can be characterised as a decision rule that avoids fatal outcomes (as defined in Definitions 8.1–8.3) in a larger number of decision problems than the expected utility principle, which can be used as a reference point for risk neutrality. This rather imprecise definition will do, because in the formal exposition below no reference is made to the concept of strong
risk aversion. All technical work is carried out by the concept of a fatal outcome, as defined in Definitions 8.1–8.3. Examples of strongly risk averse decision rules include the maximin rule, the maximum probable loss rule, and a number of modified versions of the expected utility principle.4 The maximin rule recommends the agent to choose an alternative act for which the worst possible outcome is at least as good as the worst possible outcome of every alternative act, that is, an act ai for which min_x(u_{i,x}) ≥ min_y(u_{k,y}) for all k. It can be easily verified that in case some of the alternative acts are certain not to end up in a fatal outcome, the maximin rule will select a subset of these acts. The maximum probable loss rule is used by insurance companies for avoiding 'too dangerous' contracts, i.e. those that might cost so much that bankruptcy is a possible outcome. This rule first eliminates from the list of alternatives all acts associated with a non-negligible probability of a fatal outcome and thereafter recommends the agent to choose, among the remaining alternatives, one that has the highest expected utility.5 Hence, if A′ is the non-empty subset of A containing only acts with no or a negligible probability of a fatal outcome, the maximum probable loss rule selects an act ai in A′ for which ∑_{s_j ∈ S} p(s_j) · u(a_i, s_j) is maximal.6
The modified version of the principle of maximising expected utility proposed in Ekenberg et al (1997, 2001) is a somewhat weaker version of the maximum probable loss rule. It first eliminates, from the list of alternative acts, all acts having a probability above p for an outcome below a certain level l, and thereafter recommends the agent to choose one of the remaining alternatives that has the highest expected utility. (The values of p and l, which need not be negligibly small and catastrophically fatal, respectively, are determined by the agent according to his attitudes toward risk.)
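The two rules just described can be sketched in a few lines (my illustration; the fatal-outcome test uses Definition 8.1 with an absolute cut-off c1, and the thresholds and example figures are hypothetical):

```python
def maximin(utilities):
    """utilities: {act: [u_1, ..., u_n]}; choose the acts with the best worst case."""
    worst = {a: min(us) for a, us in utilities.items()}
    best = max(worst.values())
    return {a for a, w in worst.items() if w == best}

def maximum_probable_loss(utilities, probs, c1=-100, negligible=1e-6):
    """Discard acts whose probability of a fatal outcome (u <= c1) is non-negligible,
    then maximise expected utility among the remaining acts (assumed non-empty)."""
    def p_fatal(us):
        return sum(p for p, u in zip(probs, us) if u <= c1)
    admissible = {a: us for a, us in utilities.items() if p_fatal(us) <= negligible}
    eu = {a: sum(p * u for p, u in zip(probs, us)) for a, us in admissible.items()}
    best = max(eu.values())
    return {a for a, v in eu.items() if v == best}

probs = [0.01, 0.99]
utilities = {'safe': [0, 10], 'risky': [-1000, 50]}
print(maximin(utilities))                        # {'safe'}
print(maximum_probable_loss(utilities, probs))   # {'safe'}: 'risky' risks a fatal outcome
```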
8.2 The first impossibility theorem

In this section I propose a set of desiderata for strongly risk averse decision rules, which I believe many risk averters would be willing to accept. Thereafter, I show that no decision rule can satisfy all of the desiderata. The first desideratum appeals to the intuition that if it is sufficiently probable that a fatal outcome will arise if a hazardous alternative is chosen, then a safe alternative that is certain not to lead to a fatal outcome should, if possible, be preferred. Let ph be a probability, 0 ≤ ph < 1, and let f be an effective decision rule (as defined in Section 3.1) on a set of formal decision problems Π.

4 See e.g. Ekenberg et al (1997, 2001).
5 When is a risk [probability] negligible? A common suggestion is 10−6; see for example Whipple (1987).
6 This interpretation of the maximum probable loss rule is based on Brun et al (1997), Kozlowski and Mathewson (1995), and Kunreuther (1997).
Desideratum 1 (Safety): Let an alternative a ∈ A in π be hazardous if and only if the probability that a results in a fatal outcome is equal to or exceeds ph. Then, if Ah is the subset of hazardous alternatives in A, it holds for every formal decision problem π ∈ Π that f(π) ⊆ A − Ah whenever A − Ah is non-empty.

In order to see what is required by this desideratum, consider the maximin rule. This rule implies that every act associated with a non-zero probability of a fatal outcome should be avoided, provided that there is at least one alternative act that is certain not to result in a fatal outcome. Hence, this rule satisfies Desideratum 1 just in case ph = 0. Other rules, e.g. the maximum probable loss rule and the rule of Ekenberg et al (1997, 2001), specify security constraints for hazardous acts by selecting higher probability thresholds. For example, Ekenberg et al suggest that 'a strategy could be considered undesirable if a consequence with probability above 0.15 also has a value below 0.10.'7

An obvious objection to Desideratum 1 (and some of the rules satisfying it) is that it makes a 'too sharp' distinction between hazardous alternatives, which ought to be avoided, and non-hazardous alternatives, which are permitted. Suppose, for instance, that the probability that a1 and a2 result in a fatal outcome is 0.010 and 0.012, respectively. Furthermore, suppose that ph = 0.011. Then a2, but not a1, ought to be avoided according to Desideratum 1. But in case the fatal outcome yielded by a1 is more severe than the fatal outcome yielded by a2 (or if some other potential outcomes of a2 are much better), why not then disregard the rather small difference in probability and choose a2? In order to make our desiderata as uncontroversial as possible, I shall consider the following weaker version of Desideratum 1. Let ps and ph be some numbers such that 0 ≤ ps < ph < 1.

Desideratum 1′ (Weak Safety): Let an alternative a ∈ A be hazardous if and only if the probability that a results in a fatal outcome exceeds ph, and let it be safe if and only if the probability that a results in a fatal outcome falls below ps. Then, if Ah is the subset of hazardous alternatives in A and As is the subset of safe alternatives in A, it holds for every formal decision problem π ∈ Π that f(π) ⊆ A − Ah whenever As is non-empty.

The strength of Desideratum 1′ depends on the values assigned to ps and ph. If ps = 0, and ph is < 1 but arbitrarily close to 1, we obtain its weakest version, which is satisfied by e.g. the maximin rule, the maximum probable loss rule, and the rule suggested in Ekenberg et al (1997, 2001). Of course, in most realistic examples the agent would certainly choose some less extreme values; perhaps ph = 0.01 and ps = 10−6 are more realistic values. However, in what follows I shall adopt the weakest version of Desideratum 1′ unless otherwise stated, since that will be sufficient for my purposes.8
7 Ekenberg et al (2001:39).
8 Note that even in case ps = 0 and ph is < 1 and arbitrarily close to 1, Desideratum 1′ will surely not be accepted by everyone. An advocate of the principle of maximising expected utility would, for instance, deny that even this very weak form of strong risk aversion can be accepted. But this is because that decision rule is not strongly risk averse.
The second desideratum was first suggested as a general postulate for rational decision making by Rubin and Chernoff.9 This principle, mentioned in Chapter 7 and referred to as Axiom EU 10, stresses that decision problems that are trivial in the sense that their outcomes do not depend on which alternative is chosen, should not affect the solution of a mixed problem consisting of both trivial and non-trivial problems.

If the decision maker knows only that he is playing problem 1 with probability [α] and problem 2 with probability [1 − α] when he has to adopt an act, then he should adopt an act which is optimal for problem 1, since problem 2, which enters with probability [1 − α], is irrelevant as far as his choice is concerned.10
The following formal formulation is identical to EU 10, except that it is formulated in terms of decision rules rather than in terms of solutions to decision problems.

Desideratum 2 (Irrelevance of Trivial Problems): Let u0(s) be a one-place utility function such that u0(s) = u0(s, a) for all s ∈ S and a ∈ A. Furthermore, let π0 = ⟨A, S, p, u0⟩. Then, if π1 = ⟨A, S, p, u1⟩ and π2 = ⟨A, S, p, u2⟩ are elements in Π and u2(a, s) = α · u1(a, s) + (1 − α) · u0(s), 0 < α < 1, it holds that f(π1) = f(π2).

According to Chernoff, Desideratum 2 'seems extremely reasonable', and Luce and Raiffa think that it has a 'very compelling a priori quality'.11 From a formal point of view, Desideratum 2 is closely related to Savage's sure-thing principle and von Neumann and Morgenstern's independence axiom.12 For this reason, I have also worked out an impossibility theorem in which Desideratum 2 is replaced by two other desiderata; see Section 8.3. The third desideratum can be formally stated as follows.

Desideratum 3 (Completeness): f(π) ≠ ∅ for every π ∈ Π.

Desideratum 3 stresses that a reasonable decision rule must prescribe at least one alternative act in every decision problem. Most decision theorists consider this intuition to be rather uncontroversial.13 However, even though each of Desiderata 1′, 2 and 3 appears to be normatively reasonable on its own, they are jointly inconsistent. Consider the following theorem. (The restriction in Theorem 8.1.4-8.1.6 that 0 ≤ ps < ph ≤ 0.5 is uncontroversial, since in any real-life application ph will certainly fall well below 0.5.)

9 See Chernoff (1954:431).
10 Luce and Raiffa (1957:290).
11 Chernoff (1954:431), Luce and Raiffa (1957:292).
12 Von Neumann & Morgenstern (1947: Chapter 3.6), Savage (1954/72: Chapter 2.7).
13 One of the earliest advocates of completeness is Milnor (1954), who accepts it as one of his postulates for rational decision procedures.
Theorem 8.1 (The First Impossibility Theorem):
1. Let Definition 8.1 hold and let 0 ≤ ph < 1 in Desideratum 1. Then, there is no decision rule f that satisfies Desiderata 1 & 2 & 3.
2. Let Definition 8.2 hold and let 0 ≤ ph < 1 in Desideratum 1. Then, there is no decision rule f that satisfies Desiderata 1 & 2 & 3.
3. Let Definition 8.3 hold and let 0 ≤ ph < 1 in Desideratum 1. Then, there is no decision rule f that satisfies Desiderata 1 & 2 & 3.
4. Let Definition 8.1 hold and let 0 ≤ ps < ph ≤ 0.5 in Desideratum 1′. Then, there is no decision rule f that satisfies Desiderata 1′ & 2 & 3.
5. Let Definition 8.2 hold and let 0 ≤ ps < ph ≤ 0.5 in Desideratum 1′. Then, there is no decision rule f that satisfies Desiderata 1′ & 2 & 3.
6. Let Definition 8.3 hold and let 0 ≤ ps < ph ≤ 0.5 in Desideratum 1′. Then, there is no decision rule f that satisfies Desiderata 1′ & 2 & 3.

Luce and Raiffa pointed out that the maximin rule is inconsistent with Desideratum 2 alone, and they famously used this observation as an argument against the maximin rule.14 So the fresh part of Theorem 8.1 is not that a number of strongly risk averse decision rules are inconsistent with a number of desiderata, but rather that no decision rule can be consistent with these desiderata. Theorem 8.1 is, thus, a generalisation of the point made by Luce and Raiffa.
8.3 The second impossibility theorem

I shall now derive a second impossibility theorem, in which Desideratum 2 of the first impossibility theorem is substituted by two other desiderata. First consider the following desideratum.

Desideratum 4 (Dominance): Let Ad be the subset of A such that for any a ∈ A, a is a member of Ad if and only if, for all states s ∈ S, the utility of a given s is at least as high as the utility of every alternative to a given s. Then, f(π) = Ad whenever Ad is non-empty.

Desideratum 4 recommends one to choose, if possible, an act that dominates the other acts. In the present context it is hardly fruitful to question this deep intuition. The next desideratum is an adapted version of the trade-off principle, stated and discussed in Chapter 7.

Desideratum 5 (Trade-off): There is some number δ > 0, such that for all ε1, 0 ≤ ε1 ≤ δ, and all decision problems π ∈ Π, if two states s and s′ are equiprobable in π, and a is one of the alternatives in π, then there is some number ε2 such that f(π) = f(π′), where π′ is the decision problem obtained from π by subtracting ε1 from the utility of a under s and adding ε2 to the utility of a under s′, or vice versa.
14 Luce and Raiffa (1957:291-2).
Desideratum 5 can be used as a direct argument against the maximin rule: a2 in π′ is not worse than a2 in π, because the potential loss of the very small amount of utility ε1 can always be compensated by some (perhaps very large) amount of utility ε2. Arguably, this rather uncontroversial intuition (remember that ε1 might be really small!) can be taken as a strong reason for giving up any strongly risk averse decision rule appealing to similar intuitions as the maximin rule. Now, the second impossibility theorem can be stated as follows.

Theorem 8.2 (The Second Impossibility Theorem):
1. Let Definition 8.1 hold and let 0 ≤ ph < 1 in Desideratum 1. Then, there is no decision rule f that satisfies Desiderata 1 & 3 & 4 & 5.
2. Let Definition 8.2 hold and let 0 ≤ ph < 1 in Desideratum 1. Then, there is no decision rule f that satisfies Desiderata 1 & 3 & 4 & 5.
3. Let Definition 8.3 hold and let 0 ≤ ph < 1 in Desideratum 1. Then, there is no decision rule f that satisfies Desiderata 1 & 3 & 4 & 5.
4. Let Definition 8.1 hold and let 0 ≤ ps < ph < 1 in Desideratum 1′. Then, there is no decision rule f that satisfies Desiderata 1′ & 3 & 4 & 5.
5. Let Definition 8.2 hold and let 0 ≤ ps < ph < 1 in Desideratum 1′. Then, there is no decision rule f that satisfies Desiderata 1′ & 3 & 4 & 5.
6. Let Definition 8.3 hold and let 0 ≤ ps < ph < 1 in Desideratum 1′. Then, there is no decision rule f that satisfies Desiderata 1′ & 3 & 4 & 5.
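To see in miniature why no rule of the maximin type can satisfy the trade-off desideratum, consider a hypothetical problem with two equiprobable states in which every outcome has utility 1 (my example, echoing the footnote example in Section 7.3): whatever compensation ε2 is offered, maximin changes its verdict as soon as ε1 is withdrawn from one of a2's outcomes.

```python
eps1, eps2 = 0.01, 1000                                # eps2 may be made as large as we like
pi     = {'a1': [1, 1], 'a2': [1, 1]}
pi_mod = {'a1': [1, 1], 'a2': [1 + eps2, 1 - eps1]}    # trade-off applied to a2

for problem in (pi, pi_mod):
    worst = {a: min(us) for a, us in problem.items()}
    best = max(worst.values())
    print({a for a, w in worst.items() if w == best})
# {'a1', 'a2'} for pi but {'a1'} for pi_mod, so no eps2 makes f(pi) = f(pi_mod) under maximin.
```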
8.4 The precautionary principle

It might be objected that the criticism raised above against the maximin rule and other strongly risk averse decision rules is unfair. After all, very few people would deny that the expected utility principle should be adopted if quantitative information about utilities and subjective probabilities is available. (Remember that the Pratt-Arrow notion of risk aversion, which is consistent with the expected utility principle, can account for intuitions about actuarial risk aversion.) So perhaps one ought to be risk averse in a sense that goes beyond the Pratt-Arrow concept only in case no quantitative information is available? Call this the non-quantitative hypothesis.

In what follows I shall analyse the non-quantitative hypothesis by considering it in relation to the so-called precautionary principle. The maximin rule and other strongly risk averse decision rules can plausibly be conceived of as principles favouring a precautionary approach to risk. However, no attempt will be made to formulate the precautionary principle as a version of those decision rules. As pointed out by Sandin, there seems to be no single formulation of the precautionary principle that we can all agree upon. Different authors advocate very different formulations of the principle.15

15 See Sandin (1999) for an overview. For discussions of various other uses of the precautionary principle, see e.g. Bodansky (1994), Graham (2000), and Sandin (2004).
The present discussion of the precautionary principle starts off with a strong negative claim, which is gradually qualified and justified: no version of the precautionary principle can be reasonably applied to decisions that may lead to fatal outcomes. In support of this thesis a number of desiderata are proposed, which reasonable rules for rational decision making based on non-quantitative information ought to satisfy. Thereafter two impossibility theorems are proved, in this and the following section, showing that no version of the precautionary principle can satisfy the proposed desiderata.

It would be pointless to analyse the precautionary principle by using the formal framework used in other parts of this work. As explained above, the reason is that it makes little sense to apply the precautionary principle to decisions in which numerical probabilities and utilities are known. The precautionary principle should be applied only when reliable quantitative information is lacking. The following tristinction between different types of decision making helps us to determine the type of decisions to which the precautionary principle may legitimately be applied:

1. Decision making under ignorance.
2. Decision making based on qualitative information.
3. Decision making based on quantitative information.

In decision making under ignorance nothing is known about the likelihood of the outcomes, but the desirability of the outcomes can be ranked on an ordinal (that is, qualitative) scale. In decision making based on qualitative information both the likelihood and the desirability of each outcome can be ranked on ordinal scales. Finally, in decision making based on quantitative information both the probability (a quantitative measure of likelihood) and the utility (a quantitative measure of desirability) can be ranked on cardinal scales.

According to Resnik, 'To apply the precautionary principle to any particular problem, one must make judgments regarding the plausibility and seriousness of a threat'.16 This indicates that the precautionary principle is a decision rule of type (2), that is, a rule for making decisions based on qualitative information. This interpretation, which will be adopted here, also tallies well with the well-known Rio declaration, in which it is explicitly pointed out that the precautionary principle should be applied if there is a lack of 'full scientific certainty':

Where there are threats of serious or irreversible damage, lack of full scientific certainty shall not be used as a reason for postponing cost-effective measures to prevent environmental degradation (UNCED 1993).
In order to analyse the precautionary principle formally, let S be a finite set of states, and let O be a finite set of outcomes.17 Let ⪰s be a binary relation on S such that si ⪰s sj if and only if si is at least as likely to occur as sj. Let ⪰d be a binary relation on O such that o′ ⪰d o if and only if o′ is at least as desirable as o. In order to avoid confusion with the formal setup used elsewhere in this book, alternative acts in a qualitative decision problem will be denoted by X, Y, ... and conceived of as functions from S to O, i.e. each alternative is a function that takes a state as its input and returns an outcome as its output. A is the set of all alternatives. Let ⪰ be a relation among the elements of A denoting weak preference. Indifference, ∼, and strict preference, ≻, are defined in the standard way. For each X in A, there is a vector [x1, ..., xn] in which the outcomes of X have been ordered from the most likely one to the least likely one by applying ⪰s to the states corresponding to each outcome. The desirability of the outcomes x1, ..., xn is ordered by ⪰d. The letters a, b, ... will be taken to denote the desirability of outcomes, such that a ⪰d b ⪰d .... For simplicity, the outcomes themselves will also be denoted either by x1, x2, ... or by their desirability a, b, .... Somewhere along the ordinal scale of desirability there is a limit between non-fatal and fatal outcomes. The limit need not be sharp. There might be a zone of vagueness, in which outcomes are neither non-fatal nor fatal. However, I assume that every outcome equal to or worse than p is well beyond the limit. Such an undesirable outcome is an example of a fatal outcome that makes it mandatory to apply the precautionary principle. Furthermore, the formula a^j denotes the iteration of an outcome, whose desirability is a, exactly j times. To iterate an outcome j times means that the sets S and O are expanded such that j − 1 new outcomes are added, all of which are equally desirable.

16 Resnik (2004: 293-4).
17 As explained in Section 1.3, outcomes can be defined in terms of states.

It might be helpful to illustrate the new formalisation in an example. Let X and Y be two alternative acts. Each alternative may lead to three possible outcomes, depending on which state of the world happens to be the true state. See Table 8.1.

Table 8.1

X = [a, p, q]
Y = [a, q, p]
The lower case letters denote possible outcomes, corresponding to the different states. The states themselves are not represented in the table. The most likely outcome comes first, the second most likely one next, and so on. In the matrix above, the most likely outcome of X is a, followed by p. The least likely outcome is q. As explained above, the lowercase letters are chosen such that an outcome denoted by a is more desirable than an outcome denoted by b, which is more desirable than an outcome denoted by c, and so forth. Hence, in the example illustrated in Table 8.1, the most likely outcome also happens to be the most desirable one, and no matter which alternative one chooses this outcome is equally likely to occur.

Now consider the following preliminary, qualitative formulation of what the precautionary principle might require from a rational decision maker. The intuition is that if one act is more likely to give rise to a fatal outcome than another, then the latter should be preferred to the former; and if the two acts are equally likely to give rise to a fatal outcome, then they should be equipreferred.

Desideratum 6 (PP 1): Let X = [x1, ..., xn] such that for at least one xi, p ⪰d xi, and let x∧ be the state obtained by merging all states fulfilling this condition into a single
state. Let y∧ be defined in the analogous way for an act Y. Then, if x∧ ≽s y∧, it holds that Y ≽ X.

Also consider the following desiderata. They can be conceived of as more general conditions, which every qualitative decision rule ought to satisfy.

Desideratum 7 (Dominance) If xi ≽d yi for all i, then it is not the case that Y ≻ X.

Desideratum 8 (Covariance) Let X = [. . . , xi, xj, . . .] such that xj ≻d xi. Let X' be exactly as X, except that X' = [. . . , xj, xi, . . .]. Then X' ≻ X.

Desideratum 9 (Total Order) The relation ≽ is complete, asymmetric, and transitive.

Desideratum 7 expresses the intuition that if one act yields at least as good outcomes as another under all possible states of the world, then the latter is not preferred to the former. This is a familiar dominance condition. Desideratum 8 asserts that if the relative likelihood of a fatal outcome decreases in relation to a strictly better outcome, then the new act is strictly preferred to the original one; there is a covariance between preferences and the likelihood of fatal outcomes. Desideratum 9 is a well-known ordering condition.

The four desiderata stated above are logically inconsistent. In order to see this, one could reason as follows. Consider the example in Table 8.1. X and Y are equally likely to give rise to fatal outcomes. According to PP 1 we can therefore conclude that (i) X and Y should be equipreferred. However, according to Covariance, Y can be modified by decreasing the relative likelihood of q, which is by definition more undesirable than p. Suppose that this is done by adjusting the relative likelihood of p and q (of Y) such that the likelihood of these outcomes becomes exactly similar to the likelihood of outcomes p and q of X. Call the modified act Y'. It follows from the principle of Covariance that (ii) Y' is strictly preferred to Y. However, by stipulation, Y' and X are exactly parallel. Therefore, no matter which state happens to be the true state of the world, the outcome of Y' will be at least as good as that of X, and vice versa. It follows from Dominance in conjunction with the ordering condition that (iii) X and Y' are equipreferred. Since the preference ordering is transitive, it follows from (i) and (iii) that Y and Y' are equipreferred. This contradicts (ii).

Theorem 8.3 PP 1, Dominance, Covariance, and Total Order are logically inconsistent.

Of course, the logical contradiction derived in Theorem 8.3 could be circumvented by giving up the rather strong version of the precautionary principle implied by PP 1. Even if advocates of the precautionary principle think that the possibility of a fatal outcome is something that should be avoided at nearly any cost, they need not claim that it is rational to be equally concerned about very undesirable fatal outcomes and slightly less undesirable fatal outcomes. In the example above, it seems that a reasonable formulation of the precautionary principle should prescribe that X is preferred to Y. Hence, even advocates of the precautionary principle must be willing to admit that both the likelihood and the desirability of outcomes matter.
Hence, some more reasonable desiderata for the precautionary principle ought to be considered.
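To make the qualitative machinery of this section concrete, here is a minimal Python sketch of the representation used above. It is an illustration only: the numerical desirability ranks, the use of list positions as a crude stand-in for the likelihood ordering of the merged fatal state, and all function names are my own assumptions, not part of the formal framework.

```python
# A minimal sketch of the qualitative setup of Section 8.4. An act is a list
# of outcome labels ordered from most likely to least likely; desirability is
# a rank (lower = better); outcomes at least as bad as 'p' count as fatal.
# The rank numbers and helper names are illustrative assumptions.

DESIRABILITY = {'a': 0, 'b': 1, 'p': 10, 'q': 11}   # a better than b ... p better than q
FATAL_THRESHOLD = DESIRABILITY['p']

def is_fatal(outcome):
    return DESIRABILITY[outcome] >= FATAL_THRESHOLD

def fatal_positions(act):
    # Positions (0 = most likely) occupied by fatal outcomes; used here as a
    # rough proxy for the likelihood of the merged fatal state x^.
    return sorted(i for i, o in enumerate(act) if is_fatal(o))

def pp1_prefers(act_x, act_y):
    """Return 'Y', 'X' or 'tie' according to the preliminary rule PP 1."""
    fx, fy = fatal_positions(act_x), fatal_positions(act_y)
    if fx == fy:
        return 'tie'                    # equally likely to lead to a fatal outcome
    return 'Y' if fx < fy else 'X'      # prefer the act less likely to be fatal

X = ['a', 'p', 'q']   # Table 8.1
Y = ['a', 'q', 'p']

print(pp1_prefers(X, Y))      # 'tie': PP 1 equiprefers X and Y
# Covariance then lets us improve Y by swapping q and p; the result is exactly
# X, which is the seed of the contradiction recorded in Theorem 8.3.
Y_prime = ['a', 'p', 'q']
print(Y_prime == X)           # True
```

On the Table 8.1 example the sketch returns a tie, while the Covariance swap turns Y into a duplicate of X, which is precisely how the inconsistency arises.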
8.5 The fourth impossibility theorem

This section considers a set of desiderata which are logically weaker than PP 1, and which can plausibly be taken to be implied by every reasonable version of the precautionary principle. Or, put slightly differently, the new desiderata are intended as minimal criteria that advocates of different versions of the precautionary principle ought to agree upon, no matter which version of the principle happens to be their personal favourite. To start with, consider the desideratum below. It articulates the intuition that if one act is more likely to give rise to a fatal outcome than another, then the latter should be preferred to the former, given that both fatal outcomes are equally undesirable. Obviously, this desideratum circumvents the criticism raised against PP 1.

Desideratum 10 (PP 2) Let X = [x1, . . . , xn] such that for exactly one xi, p ∼d xi. Let Y = [y1, . . . , yn] such that for exactly one yi, p ∼d yi. Then, if yi ≽s xi, it holds that X ≽ Y.

Note that PP 2 does not entail anything about what one should do in case one faces alternatives that may lead to fatal outcomes that are not equally undesirable. Hence, PP 2 is at most a partial explication of the precautionary principle. However, it might be objected that even this condition is too strong. As argued by Resnik, it seems unreasonable to apply the precautionary principle to decisions in which the fatal outcome one wishes to avoid is way too far-fetched.18 For example, if a doctor advises an overweight patient not to go for a walk in the park because there is a tiny chance that she may be killed by a murderer walking by, the doctor is hardly using the precautionary principle in a reasonable way. Arguably, the precautionary principle should not be applied to situations in which the likelihood of a fatal outcome is negligible. Let us therefore consider another desideratum, which is logically weaker than PP 2. The idea is the following: If one act is more likely to give rise to a fatal outcome than another, then the latter should be preferred to the former, given that: (i) both fatal outcomes are equally undesirable and, (ii) not negligibly unlikely. For present purposes, there is no need to spell out what a 'negligibly unlikely' outcome is. I just assume that there is a limit, which need not be sharp, between outcomes that are negligibly unlikely and not negligibly unlikely. Let x* be an outcome that is not negligibly unlikely.

Desideratum 11 (PP 3) Let X = [x1, . . . , xn] such that for exactly one xi, p ∼d xi and xi ≽s x*. Let Y = [y1, . . . , yn] such that for exactly one yj, p ∼d yj and yj ≽s x*. Then, if yj ≽s xi, it holds that X ≽ Y.

18 Resnik (2003: 337-9).
A problem with condition PP 3 is that the likelihood that an act leads to a fatal outcome might be just negligibly higher than the likelihood that some other act leads to such an outcome. If this happens, it is not clear that the likelihood of a fatal outcome should be allowed to determine the decision maker's preference between acts. Perhaps the desirability and likelihood of non-fatal outcomes should also be allowed to play a role. Hence, given that one can somehow make judgements about differences in likelihoods (this is not always assumed to be the case in qualitative decision theory), one might be tempted to consider an even weaker desideratum, which articulates the following intuition: If one act is more likely to give rise to a fatal outcome than another, then the latter should be preferred to the former, given that: (i) both fatal outcomes are equally undesirable and, (ii) not negligibly unlikely and, (iii) the non-preferred act is sufficiently more likely to lead to a fatal outcome than the preferred one. In order to state this desideratum formally, let y* be an outcome that is 'sufficiently' more likely than x*.

Desideratum 12 (PP 4) Let X = [x1, . . . , xn] such that for exactly one xi, p ∼d xi and xi ≽s x*. Let Y = [y1, . . . , yn] such that for exactly one yj, p ∼d yj and yj ≽s y*. Then X ≽ Y.

PP 4 is the weakest explication of the precautionary principle to be considered here. Presumably, it is so weak that it cannot reasonably be refuted by any advocate of the precautionary principle. However, since the precautionary principle has now been substantially weakened, at least one of the other conditions has to be strengthened. The following Archimedean condition is a natural way to strengthen the condition of Covariance in a way that seems acceptable to advocates of the precautionary principle:

Desideratum 13 (Archimedes) Let X = [a, b, . . . , p, c], such that c ≽s x* and p ≽s y*. Then there are some j, k, l, m such that [a^j, b^k, . . . , p^l, c^m] ∼ [b^j, a^k, . . . , c^l, p^m].

The Archimedean condition states that if the relative likelihood of a non-fatal outcome is increased in relation to a strictly better non-fatal outcome, then there is some (non-negligible) decrease of the relative likelihood of a fatal outcome that counterbalances this precisely. This Archimedean condition tallies well with the conclusion of the third impossibility theorem, according to which advocates of the precautionary principle must be willing to admit that, to some extent, both the likelihood and the desirability of an outcome matter. According to the fourth impossibility theorem, each of the desiderata for the precautionary principle stated above (Desiderata 10, 11, and 12) is inconsistent with the Archimedean condition, the ordering condition, and the following slightly strengthened version of the dominance condition.

Desideratum 14 (Strong Dominance) If xi ≽d yi for all i, and there is some j such that xj ≻d yj, then X ≻ Y.

Now consider the following theorem.
Theorem 8.4 Total Order, Archimedes, and Strong Dominance are logically inconsistent with:
1. PP 2
2. PP 3
3. PP 4

Briefly put, the contradictions arise because the precautionary principle is inconsistent with the plausible intuition that the value of all possible outcomes of an act matters to some degree, even if the trade-off rate between good and fatal outcomes is allowed to be heavily biased towards avoiding fatal outcomes.
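This intuition can be illustrated numerically. The short sketch below (all numbers are invented for illustration) contrasts a heavily biased but Archimedean rule, which weights fatal outcomes enormously yet still lets non-fatal values trade off, with a PP-style rule that looks only at the probability of a fatal outcome; the weight, the payoffs and the function names are my own assumptions.

```python
# A toy illustration of the tension behind Theorem 8.4. Each outcome is a
# (probability, utility, fatal?) triple. 'biased_value' weights fatal outcomes
# very heavily but still trades off; 'fatal_probability' is all a PP-style
# rule looks at. All numbers are invented.

FATAL_WEIGHT = 10**6   # assumed bias towards avoiding fatal outcomes

def biased_value(act):
    return sum(p * (u - FATAL_WEIGHT if fatal else u) for p, u, fatal in act)

def fatal_probability(act):
    return sum(p for p, _, fatal in act if fatal)

X = [(0.1001, -100, True), (0.8999, 500, False)]  # negligibly riskier, far better otherwise
Y = [(0.1000, -100, True), (0.9000,   1, False)]  # negligibly safer, far worse otherwise

print(biased_value(X) > biased_value(Y))            # True: the trade-off rule picks X
print(fatal_probability(X) < fatal_probability(Y))  # False: a PP-style rule picks Y
```

Even with the trade-off rate biased by a factor of a million, the non-fatal outcomes eventually matter, which is exactly what the impossibility result says any PP-style rule must deny.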
8.6 Risk aversion as an epistemic concept?

The analysis in the previous sections focused entirely on risk aversion conceived of as a property of decision rules. Arguably, the four impossibility theorems should cast some doubt on whether any strongly risk averse decision rule is, after all, suitable for rational decision making. However, the concept of risk aversion could of course also be interpreted in other ways. In particular, it appears fruitful to distinguish between risk aversion conceived of as a property of decision rules, and as an epistemic concept. According to the epistemic (belief-guiding) interpretation, risk aversion should be characterised in terms of what agents ought to believe, not in terms of what they ought to do. There might, of course, be several different epistemic principles inherent in a belief-guiding approach to risk aversion.19 However, the most prominent epistemic principle is arguably the common claim that in decision-theoretical contexts it is more desirable to avoid false negative errors than false positive ones. Call this principle the preference for false positives. If it is valid, it would be more undesirable from an epistemic point of view not to discover a relationship between a hazard and an activity that is in fact there than to incorrectly discover a relationship that is actually non-existent. This preference for false positives is usually not accepted in scientific research, since scientists prefer to remain unknowing about a truth rather than believing something that is actually false. So why should we reason differently when it comes to practical decisions about life and death? Arguably, the answer is that the aim of science differs from that of practical decision making. Scientists strive to acquire as many true beliefs as possible, while minimising the false ones. However, the aim of a decision process is not to provide a correct representation of the relevant facts. The aim is rather to protect people from hazards. So if offered a choice between failing to reject a hypothesis that is in fact false, or failing to adopt a hypothesis that is in fact true, scientists would generally prefer not to discover an additional truth about the world rather than coming to believe something that is in fact false.

19 Cf. Peterson (2006).
Of course, there is a simple and sound explanation of this epistemic preference. New scientific beliefs are often instrumental when making further discoveries, so any mistake incorporated into the corpus of scientific knowledge is likely to give rise to more mistakes further down the road. This is illustrated by the well-known example of phlogiston. In the 17th and 18th centuries it was widely accepted that all flammable materials contained phlogiston, a substance claimed to have no mass, colour, taste, or odour. It was believed that phlogiston was given off in combustion. This false belief guided chemists in the wrong direction for a long period. In fact, chemists did not come any closer to the truth about combustion for more than a century. The mistake of believing in phlogiston was not corrected until 1777, when Lavoisier presented his theory of combustion. So, briefly put, the scientists' preference for false negatives can be traced to the negative consequences for future research of incorrectly accepting a false hypothesis.

What about decision making, then? The most plausible argument for preferring false positive errors over false negatives is, arguably, that the consequences of coming to believe that something is hazardous when it in fact isn't are seldom disastrous. The consequences of falsely believing something to be safe when it isn't might, however, be disastrous. If I believe that it is safe to drink the tap water when it isn't, I might get sick. Hence, it is better to pay a small amount for a bottle of mineral water. Call this the pragmatic argument.

The pragmatic argument relies on a number of empirical premises. These can be articulated by addressing the following problem suggested by Lewens: You live in a jungle populated by an unknown number of tigers. The tigers are yellow and black. Unfortunately, everything eatable in the jungle is also yellow. For example, bananas are yellow. You decide to protect yourself against tigers by building a device that detects and warns of everything that is yellow. The good news is that because of the detector you will not be killed by a tiger. The bad news is that you will starve to death, because you will never find anything to eat. Hence, it is far from clear that it is in general better to prefer false positives over false negatives.

The tiger example makes it clear that the epistemic preference for false positives would only be acceptable if one had reasons to believe that the combined undesirability and likelihood of making a false negative error outweighs the combined undesirability and likelihood of making a false positive error. Proponents of the pragmatic argument believe that we have such reasons. Of course, in the tiger example, the number of tigers in the jungle might be very small, whereas the consequence of not finding any bananas to eat may be disastrous. Under these circumstances, a preference for false positives would be unreasonable. However, in many real-life situations there are empirical reasons indicating that the risk of missing a real hazard outweighs the consequence of making a false positive error. Metaphorically speaking, this means that the number of tigers is so high that it outweighs the fact that no bananas are found. At least in a one-shot decision, i.e. a decision that is never repeated, this could motivate the principle of preferring false positives.

The principle of preferring false positive errors is frequently combined with the claim that the burden of proof should be reversed when risks are high.
According to this view, it is not the person who claims that X is hazardous who has the burden of
proof; it is rather the person who claims that X is safe who ought to support his claim with arguments. This idea about a reversed burden of proof is, however, problematic. Arguably, anyone who is making a claim about something has the burden of proof, no matter what is being claimed. In order to see this, suppose that there exists a set of beliefs B such that one is free to accept these beliefs without having any reason for doing so, i.e. without having any burden of proof. Let b be an element of B. Then consider a person who happens to believe ¬b, and does so for some reason. For example, let ¬b be the belief that a new drug does not give rise to any adverse drug reactions; the reason might be that preliminary, inconclusive tests give partial support to this belief. Now, faced with the belief b, the agent has to decide whether to revise her previous belief, ¬b, or reject the new belief b. Since b ∧ ¬b is a contradiction, both beliefs cannot be accepted. However, if the claim about a fixed burden of proof is taken seriously, it would imply that a person who believes ¬b for some reason, which might be inconclusive, would be forced to give up that belief in favour of the opposite belief b, without being able to give any reason for this revision of beliefs. This is implausible. In fact, it is almost bizarre to accept a principle forcing us to change beliefs without being able to give any reason for doing so.

At this point it might be objected that the idea of a reversed burden of proof is only applicable to cases in which one has not yet acquired a belief in either b or ¬b. Claims about a reversed burden of proof can, therefore, only be invoked if it is completely open whether one should believe b or ¬b. Given this qualification the problem outlined above could be avoided. Unfortunately, the qualification also makes the claim more or less empty. In nearly every case of practical relevance, people already hold some belief about the issue under consideration. Consider, for example, the case of genetically modified food. If the claim about a reversed burden of proof is taken seriously, one should believe that GM food is hazardous until it has been proven safe. The problem is, however, that most people already hold a belief about GM food, and some people do indeed believe that GM food is safe. Should they really change their view, without being able to give any reason for doing so?

Note that the preference for false positives can be accepted without simultaneously adopting the idea of a reversed burden of proof. The two principles are distinct. The former is a methodological rule derived from statistics, according to which it is less serious, in a risk appraisal, to make a false positive error than a false negative one. The latter is a more general meta-epistemological principle about how one should decide what to believe.
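The point that the preference for false positives stands or falls with the relative expected costs of the two error types can be made vivid with a small calculation based on the tiger example discussed above. The costs, base rates and function names below are invented; the sketch merely shows that which policy minimises expected cost depends entirely on them.

```python
# A minimal expected-cost sketch of the tiger/banana example (all numbers are
# invented). The 'alarm on yellow' policy never misses a tiger (no false
# negatives) but rejects every banana; the 'accept all yellow' policy eats all
# bananas but meets all tigers. Which is better depends on the numbers.

COST_EATEN_BY_TIGER = 1_000_000   # cost of a false negative (missed hazard)
COST_MISSED_BANANA  = 1           # cost of a false positive (missed meal)

def expected_cost(policy, p_tiger, encounters=1_000):
    # Each encounter is with a yellow object that is a tiger with probability p_tiger.
    if policy == 'alarm_on_yellow':          # avoid all tigers, lose all bananas
        return encounters * (1 - p_tiger) * COST_MISSED_BANANA
    if policy == 'accept_all_yellow':        # eat all bananas, meet all tigers
        return encounters * p_tiger * COST_EATEN_BY_TIGER

for p_tiger in (1e-7, 1e-3):
    costs = {pol: expected_cost(pol, p_tiger)
             for pol in ('alarm_on_yellow', 'accept_all_yellow')}
    print(p_tiger, min(costs, key=costs.get))
# With very few tigers the cautious policy is worse; with many tigers it is better.
```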
Appendix A
Proofs
The theorems stated in this book fall into three classes: (i) Bayesian theorems originally proved by others, (ii) non-Bayesian theorems originally proved by others, and (iii) new, non-Bayesian theorems. In this chapter all theorems belonging to class (iii) are proved. Brief proof sketches are provided for nearly all theorems belonging to class (ii), whereas no proofs are given for theorems belonging to class (i).
Chapter 2 References to proofs of the classic Bayesian theorems recapitulated in this chapter are given in the main body of the text.
Chapter 3 Theorem 3.1, Part 1. Proof. According to Definition 3.4, tid ∈ T∗ . Substitute tid for t in the leftmost part of the weak monotonicity axiom. Theorem 3.1, Part 2. Proof. Substitute t for u in the leftmost part of the weak monotonicity axiom.
Theorem 3.1, Part 3.
Proof. Let π be an arbitrary element in Π.
(1) (t ◦ u ◦ t)(π) ≽ (u ◦ t)(π)   [Left-hand side of axiom]
(2) (t ◦ u ◦ t ◦ u)(π) ≽ (u ◦ t)(π)   [(1), Part 1 of present theorem]
(3) (t ◦ u)(π) ≽ (u ◦ t)(π)   [(2), Part 2 of present theorem]
(4) (u ◦ t ◦ u)(π) ≽ (t ◦ u)(π)   [Left-hand side of axiom]
(5) (u ◦ t ◦ u ◦ t)(π) ≽ (t ◦ u)(π)   [(4), Part 1 of present theorem]
(6) (u ◦ t)(π) ≽ (t ◦ u)(π)   [(5), Part 2 of present theorem]
(7) (u ◦ t)(π) ∼ (t ◦ u)(π)   [(3), (6)]

Theorem 3.1, Part 4.
Proof. Let π be an arbitrary element in Π.
(1) (u ◦ t ◦ u ◦ t)(π) ≽ (t ◦ u ◦ t)(π)   [Left-hand side of axiom]
(2) (u ◦ t)(π) ≽ (t ◦ u ◦ t)(π)   [Part 2 of present theorem]
(3) (t ◦ u ◦ t)(π) ≽ (u ◦ t)(π)   [Left-hand side of axiom]
(4) (t ◦ u ◦ t)(π) ∼ (u ◦ t)(π)   [(2), (3)]

Lemma 3.2
Proof. We prove this by induction: It follows from Theorem 1, Part 3 that the claim holds in case T has 2 elements. (In case T has only one element the claim is trivially true, because of reflexivity.) In order to prove the inductive step, suppose that the claim holds in case T has n (n ≥ 2) elements. Let t be element n + 1, let F be a sequence of v elements and let G be a sequence of w elements; v + w = n. We need to show that (t ◦ F ◦ G)(π) ∼ (F ◦ t ◦ G)(π) ∼ (F ◦ G ◦ t)(π) ∼ (t ◦ G ◦ F)(π) ∼ (G ◦ t ◦ F)(π) ∼ (G ◦ F ◦ t)(π). First consider the case in which both F and G have a non-zero number of elements, i.e. v, w ≠ 0. Note that the number of elements in (t ◦ G) is ≤ n. Hence, since the theorem was assumed to hold for up to n elements and F(π) ∈ Π, it follows that (F ◦ t ◦ G)(π) ∼ (F ◦ G ◦ t)(π). We also need to show that (t ◦ F ◦ G)(π) ∼ (F ◦ G ◦ t)(π). In order to do this, we substitute u for F ◦ G in the proof of Part 3, Theorem 1. So far we have shown (since ∼ is transitive) that (t ◦ F ◦ G)(π) ∼ (F ◦ t ◦ G)(π) ∼ (F ◦ G ◦ t)(π); by applying an analogous argument we find that (t ◦ G ◦ F)(π) ∼ (G ◦ t ◦ F)(π) ∼ (G ◦ F ◦ t)(π). Finally, since the number of elements in (F ◦ G) is = n, it follows from our assumption that (t ◦ F ◦ G)(π) ∼ (t ◦ G ◦ F)(π). The second case, in which the number of elements in either F or G is zero, is trivial, since we have shown above that (t ◦ F ◦ G)(π) ∼ (F ◦ G ◦ t)(π).
Theorem 3.3 Proof. Let C = B−A. From the right-hand side of the axiom it follows that for every pb (π ) there is a permutation pc (π ) such that pa ◦ pc (π ) ∼ pb (π ). Hence, because of Part 1 of Theorem 1, pb (π ) pa (π ). Observation 3.4 Proof. Let us assume that π π . We use achievability and define u as the composite rule (u1 ◦ . . . ◦ un )(π ) = π . Due to the left part of order-independence, i.e. t(u(π ) t(π ), it holds that t(π ) = t(u(π )) t(π ). Hence, t(π ) t(π ). Observation 3.5 Proof. This is proved by constructing a counter-example: In the model described in Section 3.4 (Figure 1) both the weak monotonicity axiom and weak achievability are satisfied, but strong monotonicity is not, because u(π ) t(π ) but (t ◦ u)(π ) (u ◦ u)(π ). Theorem 3.6 Proof. In order to construct a SEUL such that each t ∈ T is represented by exactly one upvector-label, let the upvector-label be constituted by the set of upvectors given by, for each transformation from π to t(π ) in Π , V (π ), V (t(π )), as stated in Definition 3.7. That there is such an upvector-label follows from the assumption that t(π ) π for all π ∈ Π and Definition 3.1. The former assumption guarantees that there are upvectors. Definition 3.1, which states that transformative decision rules are functions, guarantees that b = b if a, b and a, b ∈ L. Hence, the upvectorlabels exist. Theorem 3.7, Part 1. Proof. Assume for reductio that a, b and b, c are elements in the upvector-label L. According to Theorem 3.6 there is exactly one t ∈ T for each upvector-label; suppose that u is the transformative rule corresponding to L. Since |c| > |b| ≥ |a| in a, b and b, c, it follows that for some π ∈ Π , u ◦ u(π ) u(π ) π , which contradicts the right-hand side of the weak monotonicity axiom.
Theorem 3.7, Part 2. Proof. Assume for reductio that there is an upvector-label L that contains a, b. Let L be an upvector-label containing a, d and b, c. From Theorem 6 it follows that there are some transformative decision rules corresponding to L and L ; suppose that t corresponds to L and suppose that u corresponds to L . According to Theorem 1, Part 3, it holds that t ◦ u(π ) ∼ u ◦ t(π ). According to Theorem 6 there are some upvectors corresponding to these (composite) transformations, and the real numbers corresponding to these upvectors have to be equal (because of Theorem 1, Part 3). However, this implies a contradiction, since |d| is strictly greater than |c|. Theorem 3.8 Proof. Let π be an arbitrary element in Π . Because of extended iterativity (u ◦ t)(π ) = (t ◦ u ◦ t)(π ). Because of strong iterativity (u ◦ t ◦ t)(π ) = (t ◦ u ◦ t)(π ), and because of correspondence (u ◦ t)(π ) = (t ◦ u)(π ). (Since π = π → t(π ) = t(π ) ⇔ t(π ) = t(π ) → π = π .) Theorem 3.9 Proof. It is a tautology that (t◦u)(π ) = (t◦u)(π ). Because of ultra-strong iterativity (u ◦ t) satisfy strong iterativity. Therefore, by applying strong iterativity to (t ◦ u) we have (t ◦ u)(π ) = (t ◦ u ◦ t ◦ u)(π ), and (t ◦ u ◦ u)(π ) = (t ◦ u ◦ t ◦ u)(π ). Since transformative decision rules are defined as mathematical functions, it holds that (t = u) ⇒ (t ◦ v = u ◦ v), where v is an arbitrary transformative decision rule. Hence, (t ◦ u ◦ u ◦ u+ )(π ) = (t ◦ u ◦ t ◦ u ◦ u+ )(π ), which can be rewritten as (t ◦ u)(π ) = (t ◦ u ◦ t)(π ). By applying Reversibility once more, (t ◦ u)(π ) = (t ◦ t ◦ t+ ◦ u ◦ t)(π ), and because of strong iterativity (t ◦ u)(π ) = (t ◦ t+ ◦ u ◦ t)(π ). From this it finally follows that (t ◦ u)(π ) = (u ◦ t)(π ). Observation 3.10 Proof. Part (1). Assume for reductio that there is a conservative rule t for Π that is not convergent. Then, for every π ∈ Π , it holds for the distinct elements π , t (π ) ,t (t (π )) , . . . in Π that π ≺ t (π ) ≺ t (t (π )), and so on. Furthermore, since Π is finite there are some m, m + k such that (t◦)m (π ) = (t◦)m+k (π ), which is impossible since ≺ is acyclic. Part (2). Let t = (t◦)n . Rule (t◦)n is strongly iterative because ≺ is acyclic. Furthermore, since t is conservative and ≺ is acyclic, (t◦)n is a perfect substitute for (t◦)n+m for any m ≥ 0.
Observation 3.11 Proof. Part (1). In case t is not convergent, let k be a number such that, for every m > k, the agent’s preference between (t◦)k and (t◦)m does not exceed ε . (We know that there is such a number k since there is at least one decision problem π ∗ that is optimal with respect to Π and ≺ is acyclic.) Then, let (t ◦)a = (t◦)a if a ≤ k, and (t ◦)a = (t◦)k if a > k. Now, rule t is convergent since whenever a > k, (t ◦)a = (t ◦)a+1 = (t◦)k . By Definition 3.7, (t ◦)n approximates (t◦)n for every n. Furthermore, since t is inert, t is equivalent to (t◦)n for all n (with respect to e and Π ). Part (2). In case t is not strongly iterative, let m be the smallest number for which it holds for every π ∈ Π that (t ◦)m (π ) = (t ◦)m+1 (π ). (That there is such a number m follows from the fact that t is convergent, as shown in Part (1). ) Then, t =(t ◦)m . Since t is inert, t is equivalent to rule t (which we defined above) with respect to e and Π . Observation 3.12 Proof. Part (1) Assume for reductio that there is a conservative rule t for Π that is not convergent. Then, for every π ∈ Π , it holds for the distinct elements π , t (π ) , t (t (π )) , . . . in Π ≺ (π ) that π ≺ t (π ) ≺ t (t (π )), and so on. Furthermore, since Π ≺ (π ) is finite there are some m, m + n such that (t◦)m (π ) = (t◦)m+n (π ), which is impossible since ≺ is acyclic. Part (2) In case t is not strongly iterative, let m be the smallest number for which it holds for every π ∈ Π that (t◦)m (π ) = (t◦)m+1 (π ). (That there is such a number m follows from the fact that t is convergent, as shown in Part (1).) Then, t =(t◦)m . Since t is inert, t is equivalent to t with respect to e and Π . Observation 3.13 Proof. Assume for reductio that there is a set T of conservative rules that is not acyclic. Then there is some sequence (t1 . . . tn ) of elements in T such that (t1 ◦ . . . ◦ tn )π = π , as well as some element tm such that (t1 ◦ . . . ◦ tm−1 )π = tm (π ). From Definition 3.3 it follows that π ≺ tm (π ). Hence π ≺ tm (π ) (t1 ◦ . . . ◦ tn )π , which is impossible since ≺ is transitive. ( is defined in the usual way.)
Chapter 4

Theorem 4.1.
Proof. This proof is based on Luce (1959/2005: 16-17). Since all probabilities were assumed to be non-zero, the choice axiom implies that:

p(x|B) / p(y|B) = [p(x|A) · p(A|B)] / [p(y|A) · p(A|B)]    (A.1)

In equation (A.1) the rightmost terms are identical and can be ignored. This implies equation (A.2) below, whose rightmost part is obtained by letting B equal {x, y}.

p(x|B) / p(y|B) = p(x|A) / p(y|A) = p(x|y) / p(y|x)    (A.2)

Equation (A.2) is sometimes called the 'constant ratio rule'. The basic insight is that the ratio between p(x|B) and p(y|B) does not depend on B, i.e. the set of alternatives. In order to derive P-Transitivity from the choice axiom, observe that:

[p(x|B) · p(y|B) · p(z|B)] / [p(y|B) · p(z|B) · p(x|B)] = 1    (A.3)

By applying (A.2) to (A.3) we obtain:

[p(x|y) · p(y|z) · p(z|x)] / [p(y|x) · p(z|y) · p(x|z)] = 1    (A.4)

We can now substitute p(z|x) = 1 − p(x|z). By rearranging the terms we obtain the following equation.

p(x|z) = [p(x|y) · p(y|z)] / [p(x|y) · p(y|z) + p(y|x) · p(z|y)]    (A.5)

Finally, we apply P-Symmetry to p(y|x) and p(z|y) and substitute p = p(x|y), q = p(y|z), and r = p(x|z). This yields the expression for the lower end point of the interval specified in P-Transitivity; the other endpoint is obtained in the analogous way.

r = p · q / [p · q + (1 − p) · (1 − q)]    (A.6)
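As a sanity check on the algebra (not part of the original proof), the constant ratio rule (A.2) and equation (A.6) can be verified numerically for choice probabilities of the Luce form; the scale values used below are arbitrary and the helper names are my own.

```python
# Numerical check of the constant ratio rule (A.2) and equation (A.6), using
# choice probabilities of the form p(x|A) = v(x) / sum of v over A. The scale
# values v are arbitrary positive numbers chosen for illustration.

v = {'x': 2.0, 'y': 5.0, 'z': 0.5}

def p(a, A):
    """Probability of choosing a from the offered set A (Luce form)."""
    return v[a] / sum(v[b] for b in A)

B = {'x', 'y', 'z'}

# Constant ratio rule: the ratio p(x|B)/p(y|B) equals the binary ratio.
assert abs(p('x', B) / p('y', B) - p('x', {'x', 'y'}) / p('y', {'x', 'y'})) < 1e-12

# Equation (A.6): r = pq / (pq + (1-p)(1-q)) for the binary probabilities.
pxy = p('x', {'x', 'y'}); pyz = p('y', {'y', 'z'}); pxz = p('x', {'x', 'z'})
assert abs(pxz - pxy * pyz / (pxy * pyz + (1 - pxy) * (1 - pyz))) < 1e-12
print("constant ratio rule and (A.6) hold for these values")
```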
Chapter 5

Theorems 5.1 – 5.3: References to proofs are given in the main body of the text.

Theorem 5.4
Proof. This proof is based on Luce (1959/2005: 23-24). First consider the existence part of the theorem, saying that a function u exists. Since it was assumed that p(A|B) ≠ 0, the choice axiom implies that:

p(x|A) = p(x|B) / p(A|B)    (A.7)

Let u(x) = k · p(x|B), where k > 0. Then, since the elements of A are mutually exclusive, the third probability axiom guarantees the truth of the following equation.

p(x|A) = [k · p(x|B)] / Σ_{y∈A} k · p(y|B) = u(x) / Σ_{y∈A} u(y)    (A.8)

In order to prove the uniqueness part, suppose that u' is another function defined as above. Then, for every x ∈ B, the following holds.

u(x) = k · p(x|B) = [k · u'(x)] / Σ_{y∈B} u'(y)    (A.9)

By letting k' = k / Σ_{y∈B} u'(y), it immediately follows that u(x) = k' · u'(x).
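The construction in the proof can likewise be checked numerically: starting from assumed choice probabilities over B, the function u(x) = k · p(x|B) represents the choice probabilities on any subset, whatever positive k is chosen. The numbers below are illustrative only, and the subset-probabilities are derived via the choice axiom.

```python
# A small numerical illustration of the construction in Theorem 5.4. We start
# from invented choice probabilities over the full set B, derive p(x|A) for a
# subset A by renormalisation (the choice axiom), and check the representation
# p(x|A) = u(x) / sum of u over A with u(x) = k * p(x|B).

p_B = {'x': 0.2, 'y': 0.5, 'z': 0.3}        # assumed choice probabilities over B
k = 7.0                                      # any positive constant
u = {a: k * pb for a, pb in p_B.items()}     # the utility function of the proof

def p(a, A):
    # Choice axiom: p(a|A) = p(a|B) / p(A|B) for A a subset of B.
    return p_B[a] / sum(p_B[b] for b in A)

A = {'x', 'z'}
for a in A:
    assert abs(p(a, A) - u[a] / sum(u[b] for b in A)) < 1e-12
print("u represents the choice probabilities on A =", A)
```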
Chapter 6

Theorem 6.1. A reference is given in the main body of the text.

Theorem 6.2.
Proof. A convenient way to prove Theorem 6.2 is to start from DeGroot's (1970: 79-81) analogous theorem. This means that we shall first show that Axioms 6.1 – 6.5 imply the existence of a unique probability function, and thereafter verify that this function satisfies the axioms of the probability calculus. Part (1): The subjective probability function p is constructed by first applying Axiom 6.5, according to which there exists a random variable which has a uniform distribution on the interval [0, 1]. Let G[a, b] denote the event that the random variable x lies in the interval [a, b]. Then consider the following lemma: If
lx is any element in L, then there exists a unique number a* (1 ≥ a* ≥ 0) such that lx ∼ G[0, a*]. (For a proof, see DeGroot 1970: 77-78.) Now, if lx is any element in L we can apply the lemma and let p(lx) be defined as the number a*. Hence, lx ∼ G[0, p(lx)]. It follows that lx ≽ ly if and only if p(lx) ≥ p(ly), since lx ≽ ly if and only if G[0, p(lx)] ≽ G[0, p(ly)]. Part (2): A probability function p is characterized by the following axioms: (i) p(lx) ≥ 0 for all lx, (ii) p(L) = 1, (iii) p(∪_{i=1}^{n} li) = Σ_{i=1}^{n} p(li) for pairwise disjoint l1, . . . , ln. In order to show that these conditions are fulfilled, note that it follows from the definition of p above that p(lx) ≥ 0 for every lx. This verifies (i). Moreover, L = G[0, 1], which entails that p(L) = 1; this verifies (ii). In order to verify (iii), we have to show that p(∪_{i=1}^{n} lxi) = Σ_{i=1}^{n} p(lxi). To start with, consider the binary case with only two elements lx1 and lx2; that is, we want to show that if lx1 ∩ lx2 = ∅, then p(lx1 ∪ lx2) = p(lx1) + p(lx2). In the first part of the proof we showed that lx1 ∼ G[0, p(lx1)]. Hence, lx1 ∪ lx2 ∼ G[0, p(lx1 ∪ lx2)]. According to a lemma proved by DeGroot (1970: 79) it also holds that lx2 ∼ G[p(lx1), p(lx1 ∪ lx2)]. Now, note that G[p(lx1), p(lx1 ∪ lx2)] ∼ G[0, p(lx1 ∪ lx2) − p(lx1)]. Also note that by definition lx2 ∼ G[0, p(lx2)]. Hence, p(lx1 ∪ lx2) − p(lx1) = p(lx2). By induction this result can be generalized to hold for any finite number of disjoint elements.
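The role played by the uniformly distributed random variable can be illustrated with a small simulation (my own illustration, not DeGroot's proof): given only a comparative oracle saying whether an event is at least as likely as G[0, a], bisection recovers the number a* of the lemma. The hidden probabilities driving the oracle are invented.

```python
# A simulation sketch of how the construction in Theorem 6.2 pins down p(l_x):
# find the a* with l_x ~ G[0, a*] by bisection against a comparative
# likelihood oracle. The oracle is secretly driven by hidden numerical
# probabilities, which the construction then recovers.

HIDDEN_P = {'rain': 0.35, 'snow': 0.05}          # assumed underlying probabilities

def at_least_as_likely(event, a):
    """Oracle: is the event at least as likely as G[0, a]?"""
    return HIDDEN_P[event] >= a

def recovered_probability(event, tol=1e-9):
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if at_least_as_likely(event, mid):
            lo = mid                              # event at least as likely as G[0, mid]
        else:
            hi = mid
    return (lo + hi) / 2

for e in HIDDEN_P:
    print(e, round(recovered_probability(e), 6))  # matches the hidden values
```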
Chapter 7

Theorem 7.1
Proof. The proof proceeds by the application of a series of transformative decision rules to an arbitrary element π in Π.

Step 1: First, apply to π the rule ss*, which is defined as the rule ss below iterated n times for some n such that ss^n(π) = ss^{n+1}(π), where ss^n denotes the iteration of ss n times. Let d equal the inverse of the denominator of the rational number p(s1) · p(s2) · ... · p(sn).

ss(π) = π if p(s1) = . . . = p(sn); otherwise ss(π) = ⟨A', S', p', u'⟩, where for all p, q, r:    (A.10)
1. A' = A
2. S' = (S − s) ∪ {sα, sβ}, where s is some state in π for which there is some other state sa ∈ S in π with p(s) > p(sa)
3. p'(sp) = p(sp) whenever sp ∈ S − s; p'(sα) = p(s) − d; p'(sβ) = d
4. u'(aq, sr) = u(aq, sr) whenever sr ∈ S − s; u'(aq, sr) = u(aq, s) otherwise
Rule ss* will converge in such a way that all states receive probability d in ss*(π), because every time ss is applied exactly one state is split into two, of which at least one has a probability equal to d and the other one has a probability ≥ d. Since rule ss* is obtained from an iterated application of axiom EU 2, π is solution-equivalent to ss*(π). Hence, because of Postulate 7.3, the transformation from π to ss*(π) is rational. Furthermore, eu(π) = ss* ◦ eu(π) because p(s) · u = (p(s) − d) · u + d · u for all p(s), d, u.

Step 2: Now apply to ss*(π) the rule to* (trade-off), which is defined as the rule to below iterated n times for some n such that to^n(π) = to^{n+1}(π). Let high(π, a) be a function that returns an ordered pair ⟨a, sj⟩ such that u(a, sj) ≥ u(a, sk) for every k, and low(π, a) a function that returns an ordered pair ⟨a, sj⟩ such that u(a, sj) ≤ u(a, sk) for every k. Let ε be an arbitrary small positive non-zero number such that (i) 0 < ε ≤ δ, as defined in axiom EU 4, and (ii) for every u(a, s) there is an even number t such that ε · t = u(a, s).

to(π) = π if u(ao, sp) = u(ao, sq) for all o, p, q; otherwise to(π) = ⟨A', S', p', u'⟩, where:    (A.11)
1. A' = A
2. S' = S
3. p' = p
4. For some a such that h = high(π, a), l = low(π, a) and u(h) ≠ u(l), for all r, t, v, w:
   u'(a, sr) = u(a, sr) − ε if and only if ⟨a, sr⟩ = h
   u'(a, st) = u(a, st) + ε if and only if ⟨a, st⟩ = l
   u'(av, sw) = u(av, sw) for all ⟨av, sw⟩ ≠ h, l

Rule to* is obtained from an iterated application of Lemma 7.2, to be proved below. Therefore, since p(s1) = . . . = p(sn) in ss*(π), it holds that ss*(π) is solution-equivalent to ss* ◦ to*(π). Hence, because of Postulate 7.3, the transformation from ss*(π) to ss* ◦ to*(π) is rational. It follows from the definition of ε above that to* will converge such that u(aj, sk) = u(aj, sl) for all j, k, l in the formal decision problem ss* ◦ to*(π). Furthermore, eu(π) = ss* ◦ to* ◦ eu(π) because p · u1 + p · u2 = p · (u1 − ε) + p · (u2 + ε) for all p, u1, u2 and ε.
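The arithmetical identity appealed to at the end of Step 2 is easy to check; the following one-line verification uses invented numbers.

```python
# Illustration only: when two states are equiprobable, moving epsilon from the
# better outcome to the worse one leaves the act's expected utility unchanged.
p, u1, u2, eps = 0.5, 9.0, 3.0, 1.25           # invented values; p(s) = p(s') = 0.5
assert abs((p * u1 + p * u2) - (p * (u1 - eps) + p * (u2 + eps))) < 1e-12
print("expected utility unchanged:", p * u1 + p * u2)
```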
Step 3: Now apply to ss* ◦ to*(π) the rule ms* (merger of states), which is defined as the rule ms below iterated n times for some n such that ms^n(π) = ms^{n+1}(π). Let sa and sb be two arbitrary states in S.

ms(π) = π if u(ao, sp) ≠ u(ao, sq) for some o, p, q; otherwise ms(π) = ⟨A', S', p', u'⟩, where:    (A.12)
1. A' = A
2. S' = (S − sa − sb) ∪ {sz}, where sz is a new state
3. p'(sp) = p(sp) whenever sp ∈ (S − sa − sb); p'(sz) = p(sa) + p(sb)
4. u'(aq, sr) = u(aq, sr) whenever sr ∈ (S − sa − sb); u'(aq, sz) = u(aq, sa)

Since rule ms* is obtained from an iterated application of EU 2, used 'backwards', and u(ap, sq) = u(ap, sr) for all p, q, r in ss* ◦ to*(π), it holds that ss* ◦ to*(π) is solution-equivalent to ss* ◦ to* ◦ ms*(π). Hence, because of Postulate 7.3, the transformation from ss* ◦ to*(π) to ss* ◦ to* ◦ ms*(π) is rational. Observe that rule ms* will converge such that S = {s} in the formal decision problem ss* ◦ to* ◦ ms*(π), i.e. that ss* ◦ to* ◦ ms*(π) is a decision problem under certainty. Furthermore, ss* ◦ to* ◦ ms* ◦ eu(π) = ss* ◦ to* ◦ eu(π) for obvious arithmetical reasons.

Step 4: Now apply to ss* ◦ to* ◦ ms*(π) the rule max, which is defined as follows: max(π) = {ap : u(ap, s) ≥ u(aq, s) for all q}. Rule max follows from axiom EU 1 (dominance), and is thus rational when applied to every decision problem under certainty, e.g. ss* ◦ to* ◦ ms*(π).

Step 5: Since π was assumed to be an arbitrary decision problem in Π, it follows from steps 1-4 and Postulate 7.3 that it is rational to apply the rule ss* ◦ to* ◦ ms* ◦ max(π) to every π ∈ Π. As noted in step 3, ss* ◦ to* ◦ ms*(π) is a decision problem under certainty, and in every decision problem under certainty πc it holds that max(πc) = eu(πc). Hence, ss* ◦ to* ◦ ms* ◦ eu(π) = ss* ◦ to* ◦ ms* ◦ max(π). Furthermore, from steps 1-4 it follows that eu(π) = ss* ◦ to* ◦ ms* ◦ eu(π). Hence, eu(π) = ss* ◦ to* ◦ ms* ◦ max(π). Because of Postulate 7.2 it can now be concluded that it is rational to apply eu to every π ∈ Π.
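The overall strategy of the proof can be mimicked numerically. The toy implementation below (invented numbers, my own function names, and a hard-coded two-state problem) splits the states into equiprobable copies, replaces each act's utilities by the common value the trade-off rule converges on, merges the states, and picks the dominant act; the value assigned to each act coincides with its expected utility, as the proof requires.

```python
# A numerical sketch of the proof strategy behind Theorem 7.1: ss* (split
# states into equiprobable copies), to* + ms* (equalise and merge), max.
from fractions import Fraction as F

acts = {'a1': {'s1': 10, 's2': 0}, 'a2': {'s1': 4, 's2': 6}}
probs = {'s1': F(1, 3), 's2': F(2, 3)}

# ss*: d is the inverse of the denominator of the product of the probabilities
# (two states here, for simplicity); each state is split into p(s)/d copies.
d = F(1, (probs['s1'] * probs['s2']).denominator)
copies = {s: int(p / d) for s, p in probs.items()}

# to* + ms*: with equiprobable states, repeated epsilon-transfers end with every
# state carrying the act's average utility, which the single merged state keeps.
merged = {a: sum(u[s] * copies[s] for s in u) * d for a, u in acts.items()}

# max: dominance on the one-state problem picks the largest value.
best = max(merged, key=merged.get)
expected_utility = {a: sum(probs[s] * u[s] for s in u) for a, u in acts.items()}
print(best, merged[best], expected_utility)   # merged value equals expected utility
```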
Lemma 7.2
Proof. Lemma 7.2 can be proved by showing that ε1 = ε2 whenever axiom EU 4 is applied to a formal decision problem π in which s and s' are equi-probable. Assume for reductio that ε2 ≠ ε1 as axiom EU 4 is applied to the act a and the equi-probable states s and s'. Then, first consider the case in which s and s' are the only states in π, and a is an element in the solution of π. Let u1 = u(a, s) and u2 = u(a, s'). By applying axiom EU 3, π can be transformed into π' = ⟨A ∪ {a'}, S, p, u⟩. It holds that a and a' are elements in the solution of π'.

Table A.1
[π]        s              s'
a          u1             u2
a'         u1             u2

[π']       s              s'
a          u1             u2
a'         u1 − ε1        u2 + ε2

[π'']      s              s'
a          u1             u2
a'         u1 − ε1 + ε2   u2 + ε2 − ε1

According to axiom EU 4, π, π', and π'' are identical with respect to the factors determining ε2 as a function of ε1, p(s), p(s'). Hence, by applying axiom EU 4 twice, it follows that π is solution-equivalent to π', which is solution-equivalent to π''. However, if ε1 ≠ ε2, either a dominates a', or a' dominates a. Therefore, by axiom EU 1, a and a' cannot both be elements in the solution of π'', which contradicts the result that π and π'' are solution-equivalent. Consequently, the initial assumption must be false. In order to handle the general case with more than two states, it is sufficient to note that for every s'' ∈ S − {s, s'}, it holds that u(a', s'') = u(a, s''). Hence, the manoeuvre from π to π'' can be carried out in exactly the same way as above.
Lemma A.2 If u(a, s) > u(a, s ) and a ∈ Φ (π ), then there is a closed interval of positive numbers [a, b] such that for every α ∈ [a, b], p(s), p(s ) there is a number β such that Φ (π ) = Φ (π ), where π is obtained from π by subtracting α from u(a, s) and adding β to u(a, s ). Lemma 9.2 follows immediately from axiom EU 9 and Theorem 5.4. Hence, in order to prove Lemma 9.1, first consider the case in which s and s are the only states in π , and a ∈ Φ (π ). Let u(a, s) = u1 and let u(a, s ) = u2 ; u1 > u2 . Assume for reductio that α = β as Lemma 9.2 is applied to the act a and the equiprobable states s and s . By applying EU 8 twice, it follows that π can be transformed into a formal decision problem π with three new (identical) acts a , a , a such that Φ (π ) = {Φ (π ) − a} ∪ {a ∪ a ∪ a }. Now apply Lemma 9.2 to a and a in π exactly k times with some α and β , such that 0 < u1 − kα − [u2 + kβ ] < α . (It might hold that there are no k, α , β that satisfy this inequality, e.g. if β α . In that case, let k = 1, α = α , β = β , and ignore the next sentence.) Then apply Lemma 9.2 to a one more time, this time subtracting α respectively adding β . It now holds that u(a , s) < u(a , s ) in π . Finally, apply Lemma 9.2 again, but this time subtract α from u(a , s ) and add β to u(a , s). Table A.2 [π ] s ... a a a ...
u1 u1 u1
s u2 u2 u2
[π ] s ... a a a ...
s
u1 u2 u1 − k α u2 + k β u1 − k α − α + β u2 + k β + β − α
Initially it was assumed that α = β . Hence, −α + β = 0. Thus, by EU 6, a ∈ Φ (π ) or a ∈ Φ (π ), which contradicts the result that {a , a , a } = Φ (π ) = Φ (π ). Consequently, the initial assumption must be false. Now consider the case in which a ∈ Φ (π ). Then, by EU 5, there is some other act b such that b ∈ Φ (π ). Furthermore, since β is a function of α , it follows that α and β can be applied to u(b, s) and u(b, s ) in the same way as above, yielding the analogous contradiction. Finally, in order to handle the general case with more than two equiprobable states, it is sufficient to note that for every state v ∈ {S − {s ∪ s }}, it holds that u(a, v) = u(a , v) = u(a , v) = u(a , v). Hence, the manoeuvre from π to π can be carried out in a similar way as above.
Lemma 7.4 Proof. Note that u3 (a, s) = α · u2 (a, s) + (1 − α ) · 0 = α · u1 (a, s) + α · u0 (s) = α · u1 (a, s) + (1 − α ) · α · u0 (s) / (1 − α ) . By applying Axiom 7.10 twice it follows that Φ (π3 ) = Φ (π2 ) and Φ (π3 ) = Φ (π1 ). Theorem 7.5 Proof. The proof of Theorem 7.5 has the same structure as the proof of Theorems 7.1 and 7.3. First consider the case with a formal decision problem π which contains only two alternative acts and two equiprobable states. Let x = ujm − uil . Table A.3 [π ]
[π ] ai aj
sl uil ujl
sm uim ujm
ai aj
[π ] sl sm ujm uim ujl + x ujm
ai aj
[π ] sl sm uim ujm ujl + x ujm
ai aj
sl uim ujl + x
Note that Φ (π ) = Φ (π ) = Φ (π ) = Φ (π ), since: (i) π can be obtained from π by applying Lemma 7.4 and (ii) π can be obtained from π by applying axiom EU 11 and (iii) π can be obtained from π by applying Lemma 7.4. Furthermore, note that eu(π ) = eu(π ) = eu(π ) = eu(π ), since all transformations from π to π affects the expected utility of both alternative acts equally much. Let ind (π ) be the rule that transforms π into π . Then, ss∗ ◦ ind (π ) is a decision problem under certainty, and furthermore eu (π ) = ss∗ ◦ ind ◦ eu (π ). In order to handle the general case, the rule ind∗ (π ) can be introduced, which is define as the rule ind (π ) iterated a finite number of times to different ‘parts’ (i.e. quadruples of acts and states) of the general problem, such that each time ind (π ) is applied the original problem π comes one step closer to being a decision problem under certainty. That there is such a general rule ind∗ (π ) follows from the fact that the number of alternative acts and states in the original decision problem is assumed to be finite. As before, ss∗ ◦ ind∗ (π ) is a decision problem under certainty, and eu (π ) = ss∗ ◦ ind∗ ◦ eu (π ). The remaining steps of the proof are parallel to the proof of Theorem 7.3. Weak version of the trade-off principle Consider the following weak version of the trade-off principle, mentioned in Section 7.6.
EU 12 (Weak Trade-off): There is some number δ > 0 such that for every decision problem π and every ε1 , δ ≥ ε1 ≥ 0, if s and s are two of the states and a is one of the alternatives in π , there is a number ε2 , determined by a function C(ε1 , p(s), p(s ), u(ai∗ , s) − u(ai∗ , s ) > 0 that is weakly increasing with respect to u(ai∗ , s) − u(ai∗ , s ), such that Φ (π ) = Φ (π ), where π is the decision problem obtained from π by subtracting ε1 from u(a, s) and adding ε2 to u(a, s ). Observation A.3 The trade-off principle (EU 9) can be substituted for EU 12 in Observation A.3. Proof. It is sufficient to prove Lemma 7.2, since the trade-off principle is not used in the proofs of Theorem 7.1 or 7.3. As before, Lemma 7.2 will be proved by showing that ε1 = ε2 whenever EU 12 is applied to a formal decision problem π in which s and s are equiprobable. Assume for reductio that ε2 = ε1 as EU 12 is applied to the act ai and the equiprobable states s and s . Then, let us first consider the case in which s and s are the only states in π , and ai ∈ Φ (π ). Let u1 denote u(ai , s) and let u2 denote u(ai , s ). By applying EU 7, it follows that π can be transformed into a formal decision problem π such that Φ (π ) = {ai∗ } ∪ Φ (π ). It follows that ai , ai∗ ∈ Φ (π ). Table A.4 [π ] ... ai ai∗ ...
[π ]
[π ]
s
s
s
s
s
s
u1 u1
u2 u2
u1 u1 − ε1
u2 u2 + ε2
u1 u1 − ε1 + Σε4,4 ,...
u2 u2 + ε2 − Σε3,3 ,... = u2
According to axiom EU 12, π and π are identical with respect to the factors that determine ε2 as a function C of ε1 , p(s), p(s ), u(ai , s) − u(ai , s ). Hence, by applying EU 12 to π , it follows that Φ (π ) = Φ (π ). Now, there are four cases to consider. Let us first consider the case in which ε2 > ε1 and u2 ≥ u1 . Initially, we withdraw small amounts of utility ε3 , ε3 , . . . from u2 + ε2 in π a finite number of times, and add ε4 , ε4 , . . . to u1 − ε1 in π , until Σ ε3,3 ,... = ε2 . It now holds that u(ai , s ) = u(ai∗ , s ) in π . Then, since the function C was assumed to be weakly increasing with respect to u(ai , s) − u(ai , s ) it holds that Σ ε4,4 ,... ≥ ε2 and Σ ε3,3 ,... = ε1 . Furthermore, since it was assumed that ε2 > ε1 , it can now be concluded that Σ ε4,4 ,... > Σ ε3,3 ,... . Hence, by EU 5, ai ∈ Φ (π ), which contradicts the result that Φ (π ) = Φ (π ). The three other cases, i.e. ε1 > ε2 and u1 > u2 , ε2 > ε1 and u1 > u2 respectively ε1 > ε2 and u2 ≥ u1 , are treated in analogous ways. Let us now consider the case in which ai ∈ Φ (π ). Then, by EU 5, there is some other act ak such that ak ∈ Φ (π ). As a preparatory step, apply EU 12 to u(ak , s) and u(ak , s ) a finite number of times, until u(ak , s) − u(ak , s ) = u(ai , s) − u(ai , s ). Then, since ε2 is a function of ε1 , p(s), p(s ), u(ai , s) − u(ai , s ), it follows that ε1 and ε2 can be applied twice to u(ak , s) and u(ak , s ), in the same way as above.
Finally, in order to handle the general case with more than two states, it is sufficient to note that for every s ∈ {S − {s ∪ s }}, it holds that u(ai∗ , s ) = u(ai , s ). Hence, the manoeuvre from π to π can be carried out in exactly the same way as above.
Chapter 8 Theorem 8.1 Proof of Theorem 8.1: We prove each of 8.1.1-8.1.6 separately. We assume that c1 , c2 , c3 > 0. (This is straightforward, since utility is measured on an interval scale.) Theorem 8.1.1 Proof. Assume for reductio that f satisfies Desiderata 1 & 2 & 3. Let α be a small number, and let π1 and π2 be defined as follows. Table A.5 [π1 ]
a1 a2
p(s1 ) = ph − α
p(sm ) = α
p(sn ) = α
p(sn ) = 1 − ph − α
c1 c1
c1 2c1
3c1 2c1
2c1 2c1
p(s1 ) = ph − α
p(sm ) = α
p(sn ) = α
p(sn ) = 1 − ph − α
c1 c1
2c1 3c1
2c1 c1
2c1 2c1
[π2 ]
a1 a2
By applying Desideratum 1 to π1 we find that f (π1 ) ⊂ {a1 }, since the probability that a1 results in a fatal outcome is equal to ph . Because of Desideratum 3 it follows that f (π1 ) = a2 . Moreover, by applying Lemma 7.4 twice we find that f (π1 ) = f (π2 ), since π2 can be obtained from π1 by first adding +c1 to sm and then adding –c1 to sn . Finally, by applying Desideratum 4.1 to π2 we find that f (π2 ) ⊂ {a2 }, since the probability that a2 results in a fatal outcome is now equal to ph . But this contradicts
the result that f (π1 ) = f (π2 ) = a2 . Therefore, our initial assumption must be false. Theorem 8.1.2 Proof. Assume for reductio that f satisfies Desiderata 8.1 & 8.2 & 8.3. Let α be a small number, and let π1 and π2 be defined as below. Then, the proof is completed by applying the same reasoning as in the proof of Theorem 8.1 (Lemma 7.4 implies that f (π1 ) = f (π2 ), since π2 can be obtained from π1 by adding –c2 to sn and sn respectively.). Table A.6 [π1 ]
a1 a2
p(s1 ) = ph − α
p(sm ) = α
p(sn ) = α
p(sn ) = 1 − ph − α
0 −c2 /2
0 +c2 /2
+c2 +c2 /2
+c2 +c2
p(s1 ) = ph − α
p(sm ) = α
p(sn ) = α
p(sn ) = 1 − ph − α
0 −c2 /2
0 +c2 /2
0 −c2 /2
0 0
[π2 ]
a1 a2
Theorem 8.1.3 Proof. Assume for reductio that f satisfies Desiderata 8.1 & 8.2 & 8.3. Let α be a small number, and let π1 and π2 be defined as below. (See Table A.7.) Then, the proof is completed by applying the same reasoning as in the proof of Theorem 8.1 (Lemma 7.4 implies that f (π1 ) = f (π2 ), since π2 can be obtained from π1 by adding –c3 to s1 and s2 )
Table A.7 [π1 ]
a1 a2
p(s1 ) = ph
p(s2 ) = 1 − ph
0 +c3
+c3 +c3
[π2 ]
a1 a2
p(s1 ) = ph
p(s2 ) = 1 − ph
−c3 0
+c3 0
Theorem 8.1.4 Proof. Assume for reductio that f satisfies Desiderata 8.1 & 8.2 & 8.3. Let p (s1 ) = p (s2 ) = 0.5, and let π1 and π2 be defined as follows. Table A.8 [π1 ] a1 a2
s1 3c1 2c1
s2 c1 2c1
[π2 ] s1 2c1 c1
s2 2c1 3c1
By applying Desideratum 8.1 to π1 we find that f (π1 ) ⊂ {a1 }, since the probability that a1 results in a fatal outcome is equal to or exceeds ph (and a2 is certain to not result in a fatal outcome). Because of Desideratum 8.3 it follows that f (π1 ) = a2 . By applying Lemma 7.4 twice we find that f (π1 ) = f (π2 ), since π2 can be obtained from π1 by first adding -c1 to s1 and then adding +c1 to s2 . Finally, by applying Desideratum 8.1 to π2 we find that f (π2 ) ⊂ {a2 }, since the probability that a2 results in a fatal outcome is now equal to or exceeds ph (and a1 is certain to not result in a fatal outcome). But this contradicts the result that f (π1 ) = f (π2 ) = a2 . Therefore, our initial assumption must be false. Theorem 8.1.5 Proof. Assume for reductio that f satisfies Desiderata 4.1 & 4.2 & 4.3. Let p (s1 ) = p (s2 ) = 0.5, and let π1 and π2 be defined as below. Then, the proof is completed by applying the same reasoning as in the proof of Theorem 4.1.4 (Lemma 7.4 implies that f (π1 ) = f (π2 ), since π2 can be obtained from π1 by adding – c2 to s2 .)
Table A.9 [π1 ] a1 a2
s1 0 +c2 /2
s2 +c2 +c2 /2
[π2 ] s1 0 +c2 /2
s2 0 −c2 /2
Theorem 8.1.6 Proof. Assume for reductio that f satisfies Desiderata 4.1 & 4.2 & 4.3. Let p (s1 ) = p (s2 ) = 0.5, and let π1 and π2 be defined as below. Then, the proof is completed by applying the same reasoning as in the proof of Theorem 4.1.4 (Lemma 7.4 implies that f (π1 ) = f (π2 ), since π2 can be obtained from π1 by adding -2c3 to s2 .) Q.E.D. Table A.10 [π1 ] a1 a2
s1 0 +c3
s2 +2c3 +c3
[π2 ] s1 0 +c3
s2 0 −c3
Proof of Theorem 8.2 Lemma 2 (Equal trade-off ): Let Desiderata 5.1 and 5.2 hold. Then, there is some number δ > 0 such that for all ε , 0 ≤ ε ≤ δ , and all decision problems π , if two states s and s are equi-probable in π , a is one of the alternatives in π , and π is the decision problem obtained from π by subtracting ε from the utility of a under s and adding ε to the utility of a under s , or vice versa, then f (π ) = f (π ). Proof. For a proof of Lemma 2, see Theorem 7.3
Note that in case Theorem 8.2.4 holds, then Theorem 8.2.1 also holds, and if Theorem 8.2.5 holds then Theorem 8.2.2 holds, and if Theorem 8.2.6 holds then Theorem 8.2.3 holds. Therefore, it is sufficient to prove Theorems 8.2.4, 8.2.5 and 8.2.6. We assume that c1 , c2 , c3 > 0. (This is unproblematic, since utility is measured on an interval scale.)
Theorem 8.2.4 Proof. Assume for reductio that f satisfies Desiderata 8.1 & 8.3 & 8.10 & 8.13 & 8.14. Let α be a small number, and let π1 and π2 be defined as follows. [π1 ] ∑k1 p(s j ) = 1 − ph − 2α ∑m l p(s j ) = ph − α
p(s1 ), ..., p(sk ) = α p(sl ), ..., p(sm ) = α p(sn ) = α p(sn ) = α p(sn ) = α a1 2c1 c1 c1 2c1 (m − l + 3) · c1 a2 2c1 2c1 3c1 /2 3c1 /2 2c1
[π2 ] ∑k1 p(s j ) = 1 − ph − 2α ∑m l p(s j ) = ph − α
p(s1 ), ..., p(sk ) = α p(sl ), ..., p(sm ) = α p(sn ) = α p(sn ) = α p(sn ) = α a1 2c1 2c1 3c1 /2 3c1 /2 2c1 a2 2c1 c1 c1 2c1 (m − l + 3) · c1
By applying Desideratum 8.1 to π1 we find that f (π1 ) ⊂ {a1 }, since the probability that a1 results in a fatal outcome is equal to ph (and a2 is certain not to result in a fatal outcome). Because of Desideratum 8.3 it follows that f (π1 ) = a2 . As a consequence of Lemma 2, f (π1 ) = f (π2 ) since π2 can be obtained from π1 by: (i) first adding and subtracting ε to a1 , sn & a1 , sn respectively a2 , sn & a2 , sn a finite number of times, and then (ii) add c1 to each of the outcomes a1 , sl ...sm and add all this amount of utility from a1 , sn (each time adding and subtracting just ε ), and finally (iii) subtract c1 to each of the outcomes a2 , sl ...sm and add all this amount of utility to a2 , sn (each time adding and subtracting just ε ). Finally, by applying Desideratum 8.1 to π2 we find that f (π2 ) ⊂ {a2 }, for the same reason as above. But this contradicts the result that f (π1 ) = f (π2 ) = a2 . Therefore, our initial assumption must be false. Theorem 8.2.5 Proof. The proof of Theorem 8.2.4 applies to Theorem 8.2.5 as well, provided that all occurrences of c1 are substituted by c2 . Theorem 8.2.6 Proof. The proof of Theorem 8.2.4 applies to Theorem 8.2.6 as well, provided that all occurrences of c1 are substituted by c3 and c3 is added to the first column of π1 and π2 .
Theorem 8.3
Proof. Let X = [a, p, q], and let Y = [a, q, p]. By definition, X and Y are equally likely to give rise to fatal outcomes, so PP 1 implies that (i): X ∼ Y. The condition of Covariance implies that there is some Y' = [a, p, q] such that (ii): Y' ≻ Y. However, Y' = X, so the dominance condition implies that neither Y' ≻ X nor X ≻ Y'. It follows from the ordering condition that (iii): X ∼ Y'. Since the preference ordering is transitive, it follows from (i) and (iii) that Y ∼ Y', which contradicts (ii), given the ordering condition.

Theorem 8.4
Condition PP 2 trivially entails condition PP 3, which trivially entails condition PP 4. Therefore, it is sufficient to prove part (3) of Theorem 8.4: if this part holds, the other parts hold as well.
Proof. Let X = [a, b, a, p, c]. The Archimedean condition then implies that there are some j, k, l, m such that X' ∼ X'', where X' = [a^j, b^k, a, p^l, c^m] and X'' = [b^j, a^k, a, c^l, p^m]. Let Y = [b^j, a^k, b, c^l, p^m]. In this construction, p^l and p^m are chosen such that they are not negligibly unlikely and such that p^l is 'sufficiently' more likely than p^m. Then, PP 4 implies that (i): Y ≽ X'. Furthermore, the dominance condition implies that X'' ≻ Y. Since X' ∼ X'' it follows that X' ∼ X'' ≻ Y. The ordering condition guarantees that the preference ordering is transitive, so (ii): X' ≻ Y. This contradicts (i), given the ordering condition.
References
Allais M. (1953) ‘Le Comportement de l’homme rationnel devant le risque: Critique des postulates et axiomes de l’ecole Amricaine’ Econometrica, 21:503–546. Anscombe F.J. and R.J. Aumann (1963) ‘A Definition of Subjective Probability’, Annals of Mathematical Statistics, 34:199-205. Arnauld A. and P. Nicole (1662/1996) Logic or the Art of Thinking, 5th ed., Translated and edited by Jill Vance Buroker, Cambridge University Press. Arrow K. J. (1970) ‘The theory of Risk Aversion’, in his Essays in the Theory of Risk- Bearing, North-Holland Publ. Comp. Armstrong W. E. (1939) ‘The Determinateness of the Utility Function’, The Economic Journal, 49:453–467. Aumann R. J., S. Hart, M. Perry (2005), ‘Conditioning and the Sure-Thing Principle’, mimeo, The Hebrew University of Jerusalem. Ayres, Robert U. & Sandilya, Manalur S. (1986) ‘Catasrophe Avoidance and Risk Aversion: Implications of Formal Utility Maximization’, Theory and Decision, 20:63–78. Bayes T. (1763) ‘An Essay towards solving a Problem in the Doctrine of Chances, Philosophical Transactions of the Royal Society of London, vol 53. Bateson M. (2002) ‘Context-dependent foraging choices in risk-sensitive starlings’ Animal Behaviour, 64:251-260. Bentham J. (1789/1970) An Introduction to the Principles of Morals and Legislation, Ed. J.H. Burns and H.L.A. Hart, London: The Athlone Press. Bergstr¨om L. (1966) The Alternatives and Consequences of Actions, Almqvist & Wiksell International. Bernoulli D. (1738/1954) ‘Specimen Theoriae Novae de Mensura Sortis’, Commentari Academiae Scientiarium Imperialis Petrolitanae, 5, 175-192. Translated as: ‘Expositions of a New Theory on the Measurement of Risk’, Econometrica, 22:23–36. Blackburn S. (1998) Ruling Passions, Oxford University Press. Bodansky, D. (1994) ‘The precautionary principle in US environmental law’. In T. ORiordan & J. Cameron (Eds.), Interpreting the Precautionary Principle, Cameron May, 203-28. Bolker, E. (1966) ‘Functions Resembling Quotients of Measures’, Transactions of the American Mathematical Society, 124:292–312. Brim O. G. (1962) Personality and Decision Processes, Studies in the Social Psychology of Thinking, Stanford. Broome, J. (1991) Weighing Goods, Basil Blackwell. Broome, J. (1998) ‘Is incommensurability vagueness?’, in Incommensurability, Incomparability, and Practical Reason, edited by Ruth Chang, Harvard University Press, 67-89. Reprinted in Broome’s book Ethics Out of Economics, 123-44. Broome J. (1999) Ethics out of Economics, Cambridge University Press.
Broome J. (2005) ‘Does rationality give us reasons?’, Philosophical Issues, 15:321–337.
Carlsson E. (1995) Consequentialism Reconsidered, Kluwer Academic Publishers.
Carlsson E. (2002) ‘Deliberation, Foreknowledge, and Morality as a Guide to Action’, Erkenntnis, 57:71–89.
Carnap R. (1950) Logical Foundations of Probability, University of Chicago Press.
Chang R. (2002) ‘The Possibility of Parity’, Ethics, 112:659–688.
Clemen R. T. (1991) Making Hard Decisions: An Introduction to Decision Analysis, PWS-Kent Publishing Company.
Coombs C. H., R. M. Dawes, and A. Tversky (1970) Mathematical Psychology: An Elementary Introduction, Prentice-Hall.
Costa H. A., J. Collins, and I. Levi (1995) ‘Desire-as-Belief Implies Opinionation or Indifference’, Analysis, 55:2–5.
Dancy J. (2004) Ethics Without Principles, Oxford University Press.
Davidson D. (1980) Essays on Actions and Events, Oxford University Press.
Davidson D. and P. Suppes (1957) Decision Making: An Experimental Approach, Stanford University Press.
Danielsson S. (1983) ‘Hur man inte kan mäta välmåga’ (in Swedish), Filosofisk Tidskrift, 3:33–55.
Debreu G. (1960) ‘Review of R. D. Luce, Individual Choice Behavior: A Theoretical Analysis’, American Economic Review, 50:186–188.
De Finetti B. (1974–75) Theory of Probability, 2 vols., Wiley.
DeGroot M. (1970) Optimal Statistical Decisions, McGraw-Hill.
Dennett D. C. (1978) Brainstorms: Philosophical Essays on Mind and Psychology, Cambridge: MIT Press.
Ekenberg L., M. Boman, and J. Linnerooth-Bayer (2001) ‘General Risk Constraints’, Journal of Risk Research, 4:31–47.
Espinoza N. and M. Peterson (2006) ‘A probabilistic analysis of incomparability’, mimeo, Luleå University of Technology.
Fishburn P. (1970) Utility Theory for Decision Making, John Wiley and Sons. Reprinted by Krieger Press 1979.
Gärdenfors P. and N.-E. Sahlin (1982) ‘Unreliable probabilities, risk taking, and decision making’, Synthese, 53:361–386.
Gerrans P. (2002) ‘The Theory of Mind Module in Evolutionary Psychology’, Biology and Philosophy, 17:305–321.
Good I. J. (1950) Probability and the Weighing of Evidence, Griffin.
Graham J. D. (2000) ‘Perspectives on the precautionary principle’, Human and Ecological Risk Assessment, 6:383–385.
Grüne T. (2004) ‘The problems of testing preference axioms with revealed preference theory’, Analyse & Kritik, 26:382–397.
Halldén S. (1980) The Foundations of Decision Logic, Library of Theoria no. 14.
Hammond J. S., H. Raiffa, and R. L. Keeney (1999) Smart Choices: A Practical Guide to Making Better Decisions, Harvard Business School Press.
Hansson B. (1988) ‘Risk Aversion as a Problem of Conjoint Measurement’, in Gärdenfors and Sahlin (1988), Decision, Probability, and Utility, Cambridge University Press.
Hansson S. O. (2001) The Structure of Values and Norms, Cambridge University Press.
Harsanyi J. C. (1978) ‘Bayesian Decision Theory and Utilitarian Ethics’, American Economic Review, Papers & Proceedings, 68:223–228.
Harsanyi J. C. (1979) ‘Bayesian Decision Theory, Rule Utilitarianism, and Arrow’s Impossibility Theorem’, Theory and Decision, 11:289–317.
Herstein I. N. and J. Milnor (1953) ‘An axiomatic approach to measurable utility’, Econometrica, 21:291–297.
Howson C. and P. Urbach (2006) Scientific Reasoning: The Bayesian Approach, 3rd ed., Open Court.
Humphreys P. (1985) ‘Why Propensities Cannot Be Probabilities’, Philosophical Review, 94:557–570.
Jeffrey R. (1983) The Logic of Decision, 2nd ed. (significant improvements from 1st ed.), University of Chicago Press.
Joyce J. M. (1999) The Foundations of Causal Decision Theory, Cambridge University Press.
Joyce J. M. (2002) ‘Levi on Causal Decision Theory and the Possibility of Predicting One’s Own Actions’, Philosophical Studies, 110:69–102.
Kagel J. H. and A. Roth (1995) The Handbook of Experimental Economics, Princeton University Press.
Kahneman D. and A. Tversky (1979) ‘Prospect Theory: An Analysis of Decisions Under Risk’, Econometrica, 47:263–291.
Kavka G. S. (1980) ‘Deterrence, Utility, and Rational Choice’, Theory and Decision, 12:41–60.
Keynes J. M. (1921) A Treatise on Probability, Macmillan & Co.
Keynes J. M. (1923) A Tract on Monetary Reform, Macmillan & Co.
Kihlbom U. (2002) Ethical Particularism: An Essay on Moral Reasons, Almqvist & Wiksell International.
Kozlowski R. T. and S. B. Mathewson (1995) ‘Measuring and Managing Catastrophe Risk’, Journal of Actuarial Practice, 3:211–232.
Krantz D. H., R. D. Luce, P. Suppes, and A. Tversky (1971) Foundations of Measurement: Volume 1, Additive and Polynomial Representations, Academic Press.
Kreps D. M. (1988) Notes on the Theory of Choice, Westview Press.
Kripke S. (1980) Naming and Necessity, Harvard University Press.
Koopman B. O. (1940) ‘The Bases of Probability’, Bulletin of the American Mathematical Society, 46:763–774.
Kunreuther H. (1997) ‘Managing Catastrophic Risks Through Insurance and Mitigation’, mimeo, Wharton Risk Management and Decision Process Center.
Laming D. (1973) Mathematical Psychology, Academic Press.
Laplace P. S. (1814) A Philosophical Essay on Probabilities, English edition 1951, New York: Dover Publications Inc.
Levi I. (1980) The Enterprise of Knowledge, MIT Press.
Levi I. (1989) ‘Rationality, Prediction, and Autonomous Choice’, Canadian Journal of Philosophy, 19 (suppl.):339–362. Reprinted in Levi I. (1997) The Covenant of Reason, Cambridge University Press.
Lewis D. (1988) ‘Desire as Belief’, Mind, 97:323–332.
Luce R. D. (1959/2005) Individual Choice Behavior: A Theoretical Analysis, John Wiley and Sons. Reprinted 2005 by Dover.
Luce R. D. (1977) ‘The Choice Axiom after Twenty Years’, Journal of Mathematical Psychology, 15:215–233.
Luce R. D. and H. Raiffa (1957) Games and Decisions: Introduction and Critical Survey, Wiley and Sons.
Malmnäs P.-E. (1994) ‘Axiomatic Justifications of the Utility Principle: A Formal Investigation’, Synthese, 99:233–249.
Manski C. F. (1977) ‘The structure of random utility models’, Theory and Decision, 8:229–254.
Marschak J. (1950) ‘Rational Behaviour, Uncertain Prospects, and Measurable Utility’, Econometrica, 18:111–141.
McNaughton R. (1953) ‘A Metrical Concept of Happiness’, Philosophy and Phenomenological Research, 14:172–183.
Mellor D. H. (1971) The Matter of Chance, Cambridge: Cambridge University Press.
Mill J. S. (1863/1998) Utilitarianism, ed. R. Crisp, Oxford: Oxford University Press.
Milnor J. W. (1954) ‘Games against nature’, in Thrall et al. (eds.), Decision Processes, Wiley & Sons, 49–60.
von Neumann J. and O. Morgenstern (1947) Theory of Games and Economic Behavior, 2nd edition, Princeton University Press. (1st ed. without utility theory.)
Nozick R. (1969) ‘Newcomb’s Problem and Two Principles of Choice’, in N. Rescher et al. (eds.), Essays in Honor of Carl G. Hempel, Reidel, Dordrecht, 114–146.
Oddie G. and P. Milne (1991) ‘Act and Value: Expectation and the Representability of Moral Theories’, Theoria, 57:42–76.
Peterson M. (2002) ‘An argument for the principle of maximizing expected utility’, Theoria, 68:112–128.
Peterson M. (2003a) ‘Transformative decision rules’, Erkenntnis, 58:71–85.
Peterson M. (2003b) Transformative Decision Rules: Foundations and Applications (diss.), Royal Institute of Technology, Stockholm.
Peterson M. (2004a) ‘From outcomes to acts: a non-standard axiomatization of the expected utility principle’, Journal of Philosophical Logic, 33:361–378.
Peterson M. (2004b) ‘Transformative decision rules, permutability, and non-sequential framing of decision problems’, Synthese, 139:387–403.
Peterson M. (2006a) ‘Indeterminate preferences’, Philosophical Studies, 130:297–320.
Peterson M. (2006b) ‘The precautionary principle is incoherent’, Risk Analysis, 26:595–601.
Peterson M. (2006c) ‘Should the precautionary principle guide our actions or our beliefs?’, Journal of Medical Ethics, in press.
Peterson M. and S. O. Hansson (2004) ‘Order-independent transformative decision rules’, Synthese, 147:323–342.
Pollock J. L. (2002) ‘Causal Probability’, Synthese, 132:143–185.
Popper K. R. (1957) ‘The Propensity Interpretation of the Calculus of Probability and the Quantum Theory’, in S. Körner (ed.), The Colston Papers, 9:65–70.
Pratt J. W. (1964) ‘Risk Aversion in the Small and in the Large’, Econometrica, 32:122–136.
Quine W. V. (1992) Pursuit of Truth, Harvard University Press.
Quine W. V. (1951) ‘Two Dogmas of Empiricism’, The Philosophical Review, 60:20–43.
Rabinowicz W. (1995) ‘On Seidenfeld’s Criticism of Sophisticated Violations of the Independence Axiom’, Theory and Decision, 43:279–292.
Rabinowicz W. (2002) ‘Does Practical Deliberation Crowd Out Self-Prediction?’, Erkenntnis, 57:91–122.
Ramsey F. P. (1926) ‘Truth and Probability’, in Ramsey (1931) The Foundations of Mathematics and Other Logical Essays, edited by R. B. Braithwaite, London: Kegan Paul, Trench, Trubner & Co.; New York: Harcourt, Brace and Company.
Ramsey F. P. (1928) ‘Probability and Partial Belief’, in Ramsey (1931) The Foundations of Mathematics and Other Logical Essays, edited by R. B. Braithwaite, London: Kegan Paul, Trench, Trubner & Co.; New York: Harcourt, Brace and Company.
Resnik M. (1993) Choices: An Introduction to Decision Theory, University of Minnesota Press.
Resnik D. (2004) ‘The precautionary principle and medical decision making’, Journal of Medicine and Philosophy, 29:281–299.
Roberts F. and R. D. Luce (1968) ‘Axiomatic Thermodynamics and Extensive Measurement’, Synthese, 18:311–326.
Roberts F. (1979) Measurement Theory, Volume 7 of Gian-Carlo Rota (ed.), Encyclopedia of Mathematics and Its Applications, Addison-Wesley, Reading, Mass.
Rozemond M. (1999) ‘Descartes on Mind-Body Interaction: What’s the Problem?’, Journal of the History of Philosophy, 37:435–467.
Sahlin N.-E. (1981) ‘Preference among preferences as a method for obtaining a higher-ordered metric scale’, The British Journal of Mathematical and Statistical Psychology, 34:62–75.
Sahlin N.-E. (1990) The Philosophy of F. P. Ramsey, Cambridge University Press.
Samuelson P. (1938) ‘A Note on the Pure Theory of Consumer’s Behaviour’, Economica, 5:61–71.
Sandin P. (1999) ‘Dimensions of the Precautionary Principle’, Human and Ecological Risk Assessment, 5:889–907.
Savage L. J. (1954/1972) The Foundations of Statistics, Wiley and Sons. Reprinted 1972 by Dover.
Schmidt U. (1998) Axiomatic Utility Theory under Risk: Non-Archimedean Representations and Application to Insurance Economics, Lecture Notes in Economics and Mathematical Systems, Springer Verlag.
Segal U. (1990) ‘Two-Stage Lotteries without the Reduction Axiom’, Econometrica, 58:349–379.
Sen A. (1982) Choice, Welfare and Measurement, Blackwell.
Spohn W. (1977) ‘Where Luce and Krantz Do Really Generalize Savage’s Decision Model’, Erkenntnis, 11:113–134.
Sugden R. (2004) ‘Loss Aversion and Preference Imprecision’, paper presented at the 11th International Conference on the Foundations and Applications of Utility, Risk and Decision Theory.
Tännsjö T. (1998) Hedonistic Utilitarianism, Edinburgh University Press.
Varian H. R. (1999) Intermediate Microeconomics: A Modern Approach, 5th ed., W. W. Norton.
Venn J. (1867) The Logic of Chance, Macmillan.
Whipple C. (ed.) (1987) De Minimis Risk, Plenum Press.
Yilmaz M. R. (1997) ‘In Defense of a Constructive, Information-Based Approach to Decision Theory’, Theory and Decision, 43:21–44.
Zynda L. (2000) ‘Representation Theorems and Realism About Degrees of Belief’, Philosophy of Science, 67:45–69.
Index
act, 8
Allais paradox, 5, 120–123
Anscombe and Aumann, 17–20
Arrow, 26
Bayes, 13
Bayes’s theorem, 2–3, 98
Bentham, 82
Bernoulli, 110
Bolker, 23–26
Broome, 4
Buffon, 34
Carnap, 97
causal decision theory, 9
Chernoff, 132
choice axiom, 71–74, 87
Clemen, 7
coherently extended, 28
consequence, 8
Coombs, 71
Dancy, 4, 33
Davidson, 79
de minimis, 34, 51
Debreu, 71
decision problem, 7
degrees of incomparability, 70
DeGroot, 11, 95, 102
E-Admissibility, 33
ecumenism, 57
effective decision rule, 34
emotionally inert agents, 99
epistemic risk, 8
evidential decision theory, 9
expected monetary value, 110
expected utility, 1, 110–125
  act-based axiomatisation, 118
  independence axiom, 21, 120, 123–125
  rule-based axiomatisation, 112
  trade-off principle, 123
fatal outcome, 129
Gärdenfors and Sahlin, 8
Good, 95
Halldén, 11, 92
Hammond, 7
Hansson, 26
horse race lottery, 19
Humphreys’ paradox, 98
indeterminate preference, 61–69
indifference, 62, 69
Jeffrey, 6–9, 23–26
Joyce, 10, 28–29
Kahneman and Tversky, 35
Keynes, 4, 97
Koopman, 95
Kreps, 18
Kripke, 91
Laplace, 4, 33, 97
law of large numbers, 110
Levi, 8, 63, 78
Lewens, 141
Lewis, 92
Luce, 11, 72, 88, 91
Luce and Raiffa, 33, 52, 133
Malmnäs, 26
maximum probable loss, 130
McNaughton, 82
Mellor, 4
merger of acts, 55
merger of states, 54
Mill, 82
Milnor, 123
Newcomb-style problem, 10
non-ideal agent, 28
non-perfect object, 90
objective probability, 96
Oddie and Milne, 111
ordering axiom, 17
outcome, 8
P-reflexivity, 71
P-symmetry, 71
P-transitivity, 71–73
particularism in ethics, 33
Pollock, 6
Popper, 4
Port-Royal Logic, 110
pre-deliberative phase, 31
precautionary principle, 44, 135
preference for false positives, 142
principle of insufficient reason, 33
probabilistic analysis of preference, 69
probabilistic theory of utility, 87
prospect rule, 35
qualitative probability, 22
quantitative probability, 22
quasi-expected utility, 30
Quine, 53, 111
Rabinowicz, 18, 78
Ramsey, 5–6, 27, 61, 90
reasons for action, 4
representation and uniqueness theorem, 22, 88
representation theorem, 17
Resnik, 7, 26
revealed preference, 64
roulette lotteries, 19
Sahlin, 11, 92
Samuelson, 64
Sandin, 134
Savage, 5, 6, 8, 20–22, 29, 31, 62, 122
Schervish et al., 101
Schmidt, 18
second order preference, 93
sectarianism, 57
Seidenfeld, 40
Spohn, 77
Sugden, 64
sure-thing principle, 21
transformative decision rule, 32–59
  achievability, 40
  acyclicity, 52
  composite decision rule, 34
  conservativity, 52
  deliberative values, 36
  iterativity, 45, 48
  mixed rules, 34
  permutability, 43
  rival representations, 53
  strong monotonicity, 40
  upvector, 41
  weak monotonicity, 38–43, 52
uncertain prospect, 8
uniqueness theorem, 17
von Neumann and Morgenstern, 18, 123
weak axiom of revealed preference, 65
Weirich, 6
Yilmaz, 26
Zynda, 26