RESEARCH IN EXPERIMENTAL ECONOMICS
Series Editor: R. Mark Isaac

Recent Volumes:
Volume 7: Emissions Permit Experiments, 1999
Volume 8: Research in Experimental Economics, 2001
Volume 9: Experiments Investigating Market Power, 2002
Volume 10: Field Experiments in Economics, 2005
Volume 11: Experiments Investigating Fundraising and Charitable Contributors, 2006
RESEARCH IN EXPERIMENTAL ECONOMICS
VOLUME 12
RISK AVERSION IN EXPERIMENTS

EDITED BY
JAMES C. COX Andrew Young School of Policy Studies, Georgia State University, Atlanta, USA
GLENN W. HARRISON Department of Economics, College of Business Administration, University of Central Florida, Orlando, USA
United Kingdom – North America – Japan – India – Malaysia – China
JAI Press is an imprint of Emerald Group Publishing Limited
Howard House, Wagon Lane, Bingley BD16 1WA, UK

First edition 2008
Copyright © 2008 Emerald Group Publishing Limited

Reprints and permissions service. Contact: [email protected]

No part of this book may be reproduced, stored in a retrieval system, transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying issued in the UK by The Copyright Licensing Agency and in the USA by The Copyright Clearance Center. No responsibility is accepted for the accuracy of information contained in the text, illustrations or advertisements. The opinions expressed in these chapters are not necessarily those of the Editor or the publisher.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-7623-1384-6
ISSN: 0193-2306 (Series)
Awarded in recognition of Emerald’s production department’s adherence to quality systems and processes when preparing scholarly journals for print
LIST OF CONTRIBUTORS

Steffen Andersen
Copenhagen Business School, Denmark
Peter Bossaerts
Swiss Federal Institute of Technology, Lausanne, Switzerland
Keith H. Coble
Mississippi State University, USA
James C. Cox
Georgia State University, USA
Glenn W. Harrison
University of Central Florida, USA
Frank Heinemann
Berlin University of Technology, Germany
Charles A. Holt
University of Virginia, USA
Morten I. Lau
Durham University, UK
Susan K. Laury
Georgia State University, USA
Jayson L. Lusk
Oklahoma State University, USA
E. Elisabet Rutström
University of Central Florida, USA
Vjollca Sadiraj
Georgia State University, USA
Nathaniel T. Wilcox
University of Houston, USA
William R. Zame
University of California at Los Angeles, USA
RISK AVERSION IN EXPERIMENTS: AN INTRODUCTION

James C. Cox and Glenn W. Harrison

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 1–7
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00001-X

Attitudes to risk play a central role in economics. Policy makers should know them in order to judge the certainty equivalent of the effects of policy on individuals. What might look like a policy improvement when judged by the average impact could easily entail a welfare loss for risk-averse individuals if the variance of expected impacts is wide compared to the alternatives.

Economists interested in behavior also need to be interested in risk attitudes. In some settings, risk plays an essential role in accounting for behavior: job search and bidding in auctions are two of the best studied. But some assumptions about risk attitudes play a role in many more settings. The predictions of game theory rest on payoffs defined over utility, so we (almost always) need to know something about utility functions in order to make these predictions operationally meaningful. Estimates of subjective discount rates are needed to understand intertemporal choice behavior, and are defined in terms of the present value of utility streams, so we need to know utility functions in order to estimate discount rates reliably.

However, one of the perennial challenges of testing economic theory is that predictions from theory often depend on unobservables. In this setting, an unobservable is some variable that is part of a theory of behavior, but that cannot be directly observed without making some assumption; it is therefore a latent variable to the observer. Experimental methods offer a
significant methodological advance in such settings. In some cases, one can completely sidestep the identification issue by directly inducing values or preferences. In other cases, one can often design an experiment to identify the previously unobservable variable, at least under some assumptions about the rationality of agents. This general point is, in fact, the major methodological innovation of experimental economics. Binswanger (1982, p. 393) was among the first to see the broader implications of experimental methods for the estimation or control of latent variables such as risk attitudes and subjective beliefs. Despite the progress of past decades, risk attitudes are confounding unobservables that have remained latent in a wide range of experiments. The focus of this volume is on the treatment of risk aversion in the experimental literature, including the interpretation of risk aversion as potentially involving more than just the concavity of the utility function. Experimental methods can be viewed now as one of the major tools by which theories are rendered operational. In many cases, it is simply impossible to efficiently test theory without experiments, since too many variables have to be proxied to provide tests that are free of major confounds. Experiments also provide a useful lightning rod for controversies over the interpretation of theory, as we will see. This meta-pedagogic role of experiments is often misunderstood as intellectual navel-gazing. Why spend so much effort trying to understand the behavior of students in a cloistered lab? The answer is simple: if we cannot understand their behavior, with some effort, then we have no business claiming that we can understand behavior in less controlled, naturally occurring settings. This does not mean that any experimental task provides insight into every naturally occurring setting. 
In fact, in their short history experimental economists have been remarkably adept at finding ways in which their procedures or instructions might create unusual or unfamiliar tasks for subjects. The remedy in that case is just to design and run a better experiment for the inferential purpose. So experiments provide a focal point and meeting ground for theorists and applied economists. This volume admirably reflects that role for experiments. Chapters 2–4 provide analyses of topics that arise at the point of contact between experimental economists and theorists over the concept of risk aversion (Cox and Sadiraj [Chapter 2]), ways in which different experimental procedures and estimation methods affect inference about risk (Harrison and Rutström [Chapter 3]), and the sense in which stochastic assumptions should be viewed as substantive theoretical hypotheses about the random parts of behavior (Wilcox [Chapter 4]).
Three surprising themes emerge from these initial chapters, even for those who know the experimental literature reasonably well. First, most of the theoretical, behavioral, and econometric issues that face analysts using expected utility theory (EUT) also apply to those using rank-dependent and sign-dependent alternatives. It is simply not the case that EUT is dead as a descriptive model of broad applicability, or that the inferential tools for applying EUT and alternative models are all that different. It is hard to understand how anyone can read Hey and Orme (1994) and Harless and Camerer (1994) and come to any other conclusion, but many have. We believe that this misreading of the literature comes from an undue focus on special cases, which we liken to "trip-wire" tests of EUT. We say "undue" carefully here, since there is some value in looking at these cases because they allow different qualitative predictions. But they often imply quantitatively and stochastically insignificant predictions, such as "preference reversal" tests if the subjects are risk neutral. We view the challenge of the behaviorists as an implied call to state theoretical implications more explicitly, to design procedures more carefully, and above all to undertake econometric inference more rigorously. Chapters 2–4 review efforts to do that, and systematically reject simplistic conclusions about one or other model of risk attitudes being correct. The second theme is that one cannot maintain the presumed division of labor between the theorist, experimenter, and econometrician. If you write down a theory with no stochastic errors, it can be rejected by the slightest deviation from predicted behavior. Any theory, not just EUT. Absent the archetypal "clean room" in which to undertake our experiments, we should expect some deviations, no matter how small. So we have to say something formal about how we identify those deviations, and what inferential weight to put on them.
The tendency to let these metrics of evaluation be implicit has led to unqualified claims about stylized facts on risk attitudes that do not withstand careful scrutiny. But the moment one starts to be explicit about the metric of evaluation, it becomes clear that the metric chosen has theoretical import for the testable hypotheses of the theory, as well as implications for the design of experiments. These metrics cannot just be an afterthought, as all three chapters illustrate. The third theme is the tendency by theorists and experimental economists to gloss over the difference between in-sample predictions and out-of-sample predictions. Theorists want to make evaluations of the plausibility of empirical estimates of risk attitudes using out-of-sample predictions, and yet ignore the well-known statistical uncertainty that comes from applying
estimates beyond the domain of estimation. On the other hand, experimental economists producing these estimates have been strikingly loath to qualify their claims about risk attitudes as applying only "locally" to the prizes given to subjects. Theorists have to start using econometric language if they want to draw disturbing implications from estimates that come with standard errors, and applied researchers need to be wary of the substantive implications of making alternative stochastic assumptions.

To illustrate this point, which connects Chapters 2–4, consider the estimation of the humble constant relative risk aversion (CRRA) utility function $u(y) = y^{1-r}/(1-r)$ from the responses to the famous binary choice experiments of Hey and Orme (1994). This experiment gave 100 choices to 80 subjects over lotteries defined on prizes of £0, £10, £20, and £30. Maximum likelihood methods from Table 8 of Harrison and Rutström [Chapter 3] generate an estimate of r = 0.613, implying modest risk aversion under EUT. The standard error on this estimate is 0.025, and the 95% confidence interval (CI) is between 0.56 and 0.66, so the evidence of risk aversion is statistically significant and we can reject the hypothesis of risk neutrality (r = 0 here). Fig. 1 shows predicted in-sample utility values and their 95% CI using these estimates. Obviously the cardinal values on the vertical axis are
[Figure appears here in the original; axis tick values omitted.]

Fig. 1. Estimated In-Sample Utility. Estimated from responses of 80 subjects over 100 binary choices. Data from Hey and Orme (1994): choices over prizes of £0, £10, £20, and £30. Point prediction of utility and 95% CIs. Horizontal axis: income in British pounds (£), 0–30.
arbitrary, but the main point is to see how relatively tight the CIs are in relation to the changes in the utility numbers over the lottery prizes. By contrast, Fig. 2 extrapolates to provide predictions of out-of-sample utility values, up to £1000, and their 95% CIs. The widening CIs are exactly what one expects from elementary econometrics. And they would be even wider if we accounted for our uncertainty that this is the correct functional form, and our uncertainty that we had used the correct stochastic identifying assumptions. Moreover, the (Fechner) error specification used here allows for an extra element of imprecision when predicting what a subject would actually choose after evaluating the expected utility of the out-of-sample lotteries, and this does not show up in Fig. 2.

The lesson here is that we have to be cautious when we make theoretical and empirical claims about risk attitudes. If the estimates displayed in Fig. 1 are to be used in the out-of-sample domain of Fig. 2, the extra uncertainty of prediction in that domain should be acknowledged. Chapter 2 shows why we want to make such predictions, for both EUT and non-EUT specifications; Chapter 3 shows how one can marshal experimental designs and econometric methods to do that; and Chapter 4 shows how alternative stochastic assumptions can have strikingly different substantive implications for the estimation of out-of-sample risk attitudes.

[Figure appears here in the original; axis tick values omitted.]

Fig. 2. Estimated Out-of-Sample Utility. Estimated from responses of 80 subjects over 100 binary choices. Data from Hey and Orme (1994): choices over prizes of £0, £10, £20, and £30. Point prediction of utility and 95% CIs. Horizontal axis: income in British pounds (£), 0–1000.
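The contrast between the two figures can be reproduced in a minimal sketch, using only the point estimate r = 0.613 and standard error 0.025 quoted above. This is a crude stand-in for the chapter's own calculations, which use the full maximum likelihood covariance matrix and the delta method; here the band simply comes from evaluating utility over a grid of r values spanning the 95% CI for r.

```python
def crra_utility(y, r):
    """CRRA utility u(y) = y**(1 - r) / (1 - r), for r != 1."""
    return y ** (1.0 - r) / (1.0 - r)

r_hat, se = 0.613, 0.025
# Grid of r values spanning the 95% confidence interval for r.
grid = [r_hat - 1.96 * se + k * (2 * 1.96 * se) / 100 for k in range(101)]

for y in (10, 30, 100, 1000):  # in-sample prizes first, then extrapolation
    u_hat = crra_utility(y, r_hat)
    band = [crra_utility(y, r) for r in grid]
    # Width of the utility band relative to the point prediction.
    rel_width = (max(band) - min(band)) / u_hat
    print(f"y = {y:5d}: u = {u_hat:8.2f}, relative band width = {rel_width:.3f}")
```

Running this shows a band that is tight over the £0–£30 prize range but several times wider, relative to the point prediction, by £1000 — the pattern in Fig. 2, before even accounting for functional-form and stochastic-specification uncertainty.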
The remaining chapters consider a variety of specific issues. Heinemann [Chapter 5] considers the manner in which experimental wealth might be integrated with experimental income, in the context of a reexamination of inferences from the design of Holt and Laury (2002). He proposes estimating the wealth parameter in the subject's utility function along with the risk attitude. One can equivalently view this parameter as referring to baseline consumption, with which the experimental prize is integrated by the subject when evaluating lotteries. The contribution here is to point out how alternative assumptions about the behaviorally relevant argument of the utility function can influence inferences about risk attitudes. Although this is well known theoretically (e.g., Cox & Sadiraj, 2006), it has only recently been explored by experimental economists making inferences about risk attitudes (e.g., Harrison, List, & Towe, 2007; Andersen, Harrison, Lau, & Rutström, 2008).

Lusk and Coble [Chapter 6] examine the effect of the presence of background risk on the foreground elicitation of risk attitudes. The naturally occurring environment in which risky choices are made is not free of risks, and the theoretical literature on portfolio allocation has extensively examined the effects of correlated risks. But a newer strand of theoretical literature examines the role of uncorrelated risk, and the circumstances under which it leads the decision maker to behave as if more or less risk averse in the foreground choice task. The laboratory experiment considered here cleanly identifies a possible role for background risk, in the direction predicted by EUT, and complements the field experiments by Harrison et al. (2007) studying the same hypothesis.
The study carefully points out how these conclusions are conditional on certain plausible modeling assumptions in the ex post analysis of the experimental data, illustrating again one of the general themes noted earlier.

Bossaerts and Zame [Chapter 7] study the presence of risk aversion in experimental asset markets. They find evidence of risk aversion, and also of an "equity premium" once one allows for risk attitudes. Their results extend evidence of risk aversion in "low stakes" settings beyond the type of individual choice task typically used in risk aversion elicitation experiments.

Andersen, Harrison, Lau, and Rutström [Chapter 8] review the use of natural experiments from large-stakes game shows to measure risk aversion. In many cases, these shows provide evidence on contestants making decisions over very large stakes, and in a replicated, structured way. They consider the game shows Card Sharks, Jeopardy!, Lingo, and finally Deal Or No Deal, which have all been examined in the literature in terms of the
implied risk attitudes. They also provide a detailed case study of Deal Or No Deal, since it is one of the cleanest games for inference and has attracted considerable attention. They propose, and test, a general method to overcome the curse of dimensionality that one encounters when estimating risk attitudes in the context of a dynamic, stochastic programming environment.

Finally, Laury and Holt [Chapter 9] consider the "reflection effect" of prospect theory, one of the stylized facts one hears repeatedly about how risk attitudes vary over the gain and loss domains. Extending the popular design first presented in Holt and Laury (2002), they show that the evidence for the reflection effect is not at all clear when one pays subjects for their choices, and that it is arguably just another artifact of using hypothetical responses.

The data, instructions, and statistical code to replicate the empirical analyses in each chapter are available at the ExLab Digital Library at http://exlab.bus.ucf.edu.
ACKNOWLEDGMENTS The authors thank Nathaniel Wilcox for comments and the US National Science Foundation for research support under grants NSF/DUE 0622534 and NSF/IIS 0630805 (Cox) and NSF/HSD 0527675 and NSF/SES 0616746 (Harrison).
REFERENCES

Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008). Eliciting risk and time preferences. Econometrica, 76, forthcoming.
Binswanger, H. (1982). Empirical estimation and use of risk preferences: Discussion. American Journal of Agricultural Economics, 64, 391–393.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1289.
Harrison, G. W., List, J. A., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75, 433–458.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1326.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
RISKY DECISIONS IN THE LARGE AND IN THE SMALL: THEORY AND EXPERIMENT

James C. Cox and Vjollca Sadiraj

1. INTRODUCTION

Much of the literature on theories of decision making under risk has emphasized differences between theories. One enduring theme has been the attempt to develop a distinction between "normative" and "descriptive" theories of choice. Bernoulli (1738) introduced log utility because expected value theory was alleged to have descriptively incorrect predictions for behavior in St. Petersburg games. Much later, Kahneman and Tversky (1979) introduced prospect theory because of the alleged descriptive failure of expected utility (EU) theory (von Neumann & Morgenstern, 1947).

In this essay, we adopt a different approach. Rather than emphasizing differences between theories of decision making under risk, we focus on their similarities – and on their common problems when viewed as testable theories. We examine five prominent theories of decision making under risk – expected value theory, EU theory, cumulative prospect theory, rank-dependent utility theory, and the dual theory of EU – and explain the fundamental problems inherent in all of them. We focus on two generic types of problems that are common to theories of risky decisions: (a) generalized St. Petersburg paradoxes; and (b) implications
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 9–40
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00002-1
of implausible risk aversion. We also discuss the recent generalization of the risk aversion calibration literature, away from its previously exclusive focus on implications of decreasing marginal utility of money, to include implications of probability transformations (Cox, Sadiraj, Vogt, & Dasgupta, 2008b). We also note that much recent discussion of alleged "behavioral" implications of Rabin's (2000) concavity calibration proposition has not involved any credible observations of behavior, and discuss possible remedies, including the experiments reported in Cox et al. (2008b) and other designs for experiments outlined below.

Section 2 of the chapter discusses "utility functionals" that represent risk preferences for the five representative theories of decision making under risk listed above, and defines a general class of theories that contains all of them. In Section 3, we discuss issues that arise if the domain on which theories of decision making under risk are defined is unbounded, as in the seminal papers on the EU theory of risk aversion by Arrow (1971) and Pratt (1964) and the textbook by Laffont (1989). These prominent developments of the theory assume bounded utility (see, for example, Arrow, 1971, p. 92 and Laffont, 1989, p. 8) in order to avoid generalized St. Petersburg paradoxes on an unbounded domain. We demonstrate that this traditional assumption of bounded utility substitutes one type of problem for another because, on unbounded domains, bounded utility implies implausible risk aversion (as defined in Section 3.1 below). Our discussion is not confined to EU theory. We demonstrate that, on an unbounded domain, all five of the prominent theories of risky decisions have arguably implausible implications: with unbounded utility (or "value" or "money transformation") functions there are generalized St. Petersburg paradoxes, and with bounded utility functions there is implausible aversion to risk taking.
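The classic divergence, and Bernoulli's log-utility response to it, can be checked numerically with truncated sums. This is a sketch of the standard game only (pay 2^k with probability 1/2^k), not of the generalized paradoxes the chapter constructs:

```python
import math

def st_petersburg_ev(n_flips):
    """Truncated expected value: each flip contributes 2**k * (1/2**k) = 1."""
    return sum((2 ** k) * (0.5 ** k) for k in range(1, n_flips + 1))

def st_petersburg_log_eu(n_flips):
    """Truncated expected log utility, Bernoulli's proposal: sum of ln(2**k) / 2**k."""
    return sum(math.log(2 ** k) * (0.5 ** k) for k in range(1, n_flips + 1))

print(st_petersburg_ev(50))      # grows linearly with the number of flips: 50.0
print(st_petersburg_log_eu(50))  # converges to 2*ln(2), about 1.386
```

The truncated expected value grows without bound as more flips are allowed, while expected log utility converges; the chapter's point is that similar divergent games can be constructed for any unbounded money transformation function, so the log-utility fix is not general.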
One possible reaction to the analysis in Section 3 might be: "So what? All empirical applications of risky decision theory are on bounded domains, so why should an applied economist care about any of this?" The answer is provided in subsequent sections of the chapter, in which we elucidate how the analysis on an unbounded domain leads one to ask new questions about applications of risky decision theories on bounded domains. We explain how finite St. Petersburg games provide robustness tests for empirical work on risk aversion on bounded domains. We discuss parametric forms of money transformation (or utility) functions commonly used in econometric analysis of lottery choice data and calibrate the implications of parameter estimates in the literature for binary lottery preferences. These implied preferences over binary lotteries provide the basis
for robustness tests of whether the reported parameter estimates can, indeed, rationalize the risk preferences of the subjects. Finally, we consider risk aversion patterns that are not based on parametric forms of money transformation functions or probability transformation functions. We summarize recent within-subjects experiments on the empirical validity of the postulated patterns of risk aversion underlying the concavity calibration literature and extensions of this literature to include convexity calibration of probability transformations. We also explain why some across-subjects experiments on concavity calibration reported in the literature do not, in fact, have any implications for empirical validity of calibrated patterns of small stakes risk aversion.
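The flavor of the concavity-calibration argument can be conveyed in a few lines. The numbers below are a hypothetical illustration, not the experiments of Cox et al. (2008b): suppose an expected-utility agent with concave utility rejects a 50–50 bet to lose 100 or gain 110 at every wealth level. Rejection plus concavity imply u'(w + 110) ≤ (100/110)·u'(w − 100), so marginal utility shrinks by at least that factor over every wealth step of width 210, and the utility of arbitrarily large gains is bounded by a geometric series:

```python
def max_utility_gain(loss=100.0, gain=110.0):
    """Upper bound on u(w + G) - u(w), in units of u'(w), valid for EVERY
    gain G, for a concave expected-utility agent who rejects a 50-50
    lose-`loss`/gain-`gain` bet at every wealth level."""
    step = loss + gain   # marginal utility shrinks over each wealth step...
    shrink = loss / gain  # ...by at least this factor
    # Sum over steps: step * sum_k shrink**k = step / (1 - shrink).
    return step / (1.0 - shrink)

bound = max_utility_gain()
print(f"u(w + G) - u(w) <= {bound:.0f} * u'(w) for any gain G")
# Since losing L costs at least L * u'(w) in utility (concavity again),
# this agent turns down a 50-50 bet losing 2310 against ANY gain.
```

This is the sense in which plausible small-stakes risk aversion calibrates to implausible large-stakes risk aversion; the chapter extends the same logic from concave money transformations to convex probability transformations.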
2. REPRESENTATIVE THEORIES OF DECISION UNDER RISK

Let $\{Y_n; P_n\}$ denote a lottery that pays amounts of money $Y_n = [y_n, y_{n-1}, \ldots, y_1]$ with respective probabilities $P_n = [p_n, p_{n-1}, \ldots, p_1]$, where $n \in N$ (the set of integers), $y_j \geq y_{j-1}$ and $p_j \geq 0$ for $j = 1, 2, \ldots, n$, and $\sum_{j=1}^{n} p_j = 1$. This essay is concerned with theories of preferences over such lotteries. In representing the theories with utility functionals, it will be useful to also define notation for the probabilities of all outcomes except $y_j$: $P_n^{-j} = [p_n, p_{n-1}, \ldots, p_{j+1}, p_{j-1}, \ldots, p_1]$.

We discuss expected value theory (Bernoulli, 1738), EU theory (von Neumann & Morgenstern, 1947), the dual theory of EU (Yaari, 1987), rank-dependent utility theory (Quiggin, 1982, 1993), and cumulative prospect theory (Tversky & Kahneman, 1992). All five of these theories represent risk preferences with utility functionals that have a common form that is additive across states of the world (represented by the index $j = 1, 2, \ldots, n$). This additive form defines a class D of decision theories that contains the five prominent theories above. We will review utility functionals for these five theories before stating the general functional form that can represent each theory's typical functional as a special case.

Expected value theory represents preferences over the lotteries with a functional of the form

$$U_{EV}(\{Y_n; P_n\}) = a + b \sum_{j=1}^{n} p_j y_j, \quad b > 0 \qquad (1)$$
The same EV preferences are represented when functional (1) is simplified by setting $a = 0$ and $b = 1$.^1 We will avoid some otherwise tedious repetitions by using similar affine transformations of utility (or "money transformation") functions, without explicit discussion, for the other theories considered in subsequent paragraphs.

EU theory represents preferences over the lotteries with a functional that can be written as

$$U_{EU}(\{Y_n; P_n\}) = \sum_{j=1}^{n} p_j \, u(y_j; w) \qquad (2)$$
where $w$ is the agent's initial wealth. Utility functionals (1) and (2) are both linear in probabilities, which in the case of EU theory is an implication of the independence axiom. Functional (2) is linear in money payoffs $y$ only if the agent is risk neutral.

EU theory contains (at least) three models. The EU of terminal wealth model (Pratt, 1964; Arrow, 1971) assumes that risk preferences are defined over terminal wealth, i.e., that the "money transformation function" (or utility function) $u$ takes the form $u(y, w) = \varphi_{EUW}(y + w)$. The EU of income model commonly used in bidding theory assumes that risk preferences are independent of wealth, i.e., that the money transformation function takes the form $u(y, w) = \varphi_{EUI}(y)$.^2 The EU of initial wealth and income model (Cox & Sadiraj, 2006) represents risk preferences with a money transformation function of the ordered pair of arguments $(y, w)$. This model includes as special cases the terminal wealth model, in which there is full asset integration; the income model, in which there is no asset integration; and other models in which there is partial asset integration.^3

The dual theory of EU represents preferences over the lotteries with a functional of the form

$$U_{DU}(\{Y_n; P_n\}) = \sum_{j=1}^{n} \left[ f\left(\sum_{k=j}^{n} p_k\right) - f\left(\sum_{k=j+1}^{n} p_k\right) \right] y_j \qquad (3)$$
Functional (3) is linear in payoffs as a consequence of the dual independence axiom. The transformation function f for decumulative probabilities is strictly convex if the agent is risk averse. If the agent is risk neutral then the decumulative probability transformation function f is linear and hence the utility functional (3) is linear in probabilities (in that special case).
Rank-dependent utility theory represents preferences over the lotteries with a functional of the form^4

$$U_{RD}(\{Y_n; P_n\}) = \sum_{j=1}^{n} \left[ q\left(\sum_{k=1}^{j} p_k\right) - q\left(\sum_{k=1}^{j-1} p_k\right) \right] m(y_j) \qquad (4)$$
Prospect theory transforms both probabilities and payoffs, differently for losses than for gains. In the original version of cumulative prospect theory, Tversky and Kahneman (1992) defined gains and losses in a straightforward way relative to zero income. Some more recent versions of the theory have reintroduced the context-dependent gain/loss reference points used in the original version of "non-cumulative" prospect theory (Kahneman & Tversky, 1979). Let $r$ be the possibly non-zero reference point value of money payoffs that determines which payoffs are "losses" ($y < r$) and which payoffs are "gains" ($y > r$), and let the lottery money payoffs $y_j$ be less than $r$ for $j \leq N_r$. Then risk preferences for cumulative prospect theory can be represented with a functional of the form

$$U_{CP}(\{Y_n; P_n\}) = \sum_{j=1}^{N_r} \left[ w^{-}\left(\sum_{k=1}^{j} p_k\right) - w^{-}\left(\sum_{k=1}^{j-1} p_k\right) \right] v^{-}(y_j - r) + \sum_{j=N_r+1}^{n} \left[ w^{+}\left(\sum_{k=j}^{n} p_k\right) - w^{+}\left(\sum_{k=j+1}^{n} p_k\right) \right] v^{+}(y_j - r) \qquad (5)$$
In utility functional (5), $v^{-}$ is the value function for losses, $v^{+}$ is the value function for gains, and $w^{-}$ and $w^{+}$ are the corresponding weighting functions for probabilities (or "capacities"). There is a discontinuity in the slope of the value function at payoff equal to the reference payoff $r$, which is "loss aversion."^5 A strictly concave value function for gains $v^{+}$ and an associated S-shaped probability weighting function $w^{+}$ are commonly used in applications of prospect theory.

The analysis in subsequent sections will use a general form of utility functional that, with suitable interpretations, represents all of the above theories of decision making under risk. Let $h_D$ be a probability transformation function for theory $D$. Let a positively monotonic function $\varphi_D$ denote a money transformation function for theory $D$. Let $w$ be the amount of initial wealth. Let D be the set of decision theories $D$ that represent preferences over lotteries by utility functionals of the form

$$U_D(\{Y_n; P_n\}) = \sum_{j=1}^{n} h_D(p_j, P_n^{-j}) \, \varphi_D(y_j; w) \qquad (6)$$
The additive-across-states form of (6) defines the class D of theories we discuss. This class contains all of the popular examples of theories discussed above. Many results in the following sections apply to all theories in class D. Discussion in subsequent sections will describe some instances in which specific differences between the utility functionals for distinct theories are relevant to the analysis of properties of the theories we examine.

Before proceeding to analyze the implications of functionals of form (6), it might be helpful to further discuss interpretations of (6) using the examples of theories $D$ in D mentioned above. In the case of expected value theory, the probability transformation function $h_D$ in (6), written as $h_{EV}$, is a constant function of $P_n^{-j}$ and is the identity map of $p_j$: $h_{EV}(p_j, P_n^{-j}) = p_j$ for all $(p_j, P_n^{-j})$. The money transformation function $\varphi_{EV}$ is linear in $y$ (or in $y + w$).

Functional (6) is interpreted for EU theory as follows. The probability transformation function $h_{EU}$ is a constant function of $P_n^{-j}$ and is the identity map of $p_j$, as a consequence of the independence axiom. Interpretations of the money transformation function $\varphi_{EU}$ vary across the three EU models, as explained above.

The interpretation of functional (6) for the dual theory of EU is as follows. The money transformation function $\varphi_{DU}$ is always linear in $y$ (or in $y + w$) as a consequence of the dual independence axiom. The probability transformation function $h_{DU}$ is a composition of functions of $\sum_{k \geq j} p_k$ and $\sum_{k \geq j+1} p_k$, as shown in statement (3). The probability transformation function is linear only if the agent is risk neutral.

Functional (6) is interpreted for rank-dependent utility theory as follows. The money transformation function $\varphi_{RD}$ is a constant function of $w$ and is increasing in $y$. The probability transformation function $h_{RD}$ is a composition of functions of $\sum_{k \leq j} p_k$ and $\sum_{k \leq j-1} p_k$, as shown in statement (4).

The interpretation of functional (6) for cumulative prospect theory is the most complicated one, because of the various interdependent special features of that theory. The money transformation function $\varphi_{CP}$ is a constant function of $w$ and increasing in $y$, with a discontinuous change in slope at $y = r$; furthermore, in some versions of the theory the reference point income $r$ can be variable and context dependent. As shown in (5), the probability transformation function $h_{CP}$ is a composition of functions of $\sum_{k=1}^{j} p_k$ and $\sum_{k=1}^{j-1} p_k$ when $y < r$, and a composition of functions of $\sum_{k=j}^{n} p_k$ and $\sum_{k=j+1}^{n} p_k$ when $y \geq r$.

We now proceed to derive some implications of theories in class D whose preferences over lotteries can be represented by utility functionals of the form given by statement (6).
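The additive structure of the general functional can be made concrete in a short sketch. The transformation functions below are hypothetical illustrative choices (identity probability map and linear money map for expected value theory; q(t) = t² and m(y) = √y for the rank-dependent case), not estimates from any experiment:

```python
import math

def utility_D(outcomes, probs, h, phi):
    """General class-D functional: U = sum_j h(j, probs) * phi(y_j).
    `outcomes` is ordered from worst (index 0) to best, matching y_1 <= ... <= y_n."""
    return sum(h(j, probs) * phi(outcomes[j]) for j in range(len(outcomes)))

# Expected value theory: identity probability map, linear money map.
h_ev = lambda j, p: p[j]
phi_ev = lambda y: y

# Rank-dependent utility: transform the cumulative distribution with q and
# take first differences; q and m are hypothetical illustrative choices.
q = lambda t: t ** 2
def h_rd(j, p):
    return q(sum(p[: j + 1])) - q(sum(p[:j]))
phi_rd = lambda y: math.sqrt(y)

lottery_y = [0.0, 10.0, 30.0]  # worst to best
lottery_p = [0.2, 0.5, 0.3]

print(utility_D(lottery_y, lottery_p, h_ev, phi_ev))  # expected value: 14.0
print(utility_D(lottery_y, lottery_p, h_rd, phi_rd))  # rank-dependent utility
```

Note that the rank-dependent decision weights still sum to one, since q(0) = 0 and q(1) = 1 for the choice above; only the money and probability transformations passed in distinguish one class-D theory from another, which is the point of form (6).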
Risky Decisions in the Large and in the Small
3. THEORY FOR UNBOUNDED DOMAIN: ST. PETERSBURG PARADOX OR IMPLAUSIBLE RISK AVERSION

We here discuss theories of decision making under risk in the domain of discourse adopted in classic expositions of EU theory such as Arrow (1971) and Pratt (1964), as well as in advanced textbook treatments such as Laffont (1989). In contrast to those studies, our discussion is not confined to EU theory but, instead, applies to all decision theories in class D. For any money transformation function $\varphi_D$ defined on an unbounded domain, there are only two mutually exclusive cases: the function is either unbounded from above or bounded. In this section we consider both of these cases and show that all decision theories in class D have similar implausible implications. Models for theories in class D that assume unbounded money transformation functions are characterized by generalized St. Petersburg paradoxes. Models for theories in class D that assume bounded money transformation functions are characterized by implausible risk aversion, as defined below.
3.1. Unbounded Money Transformation Functions

Some examples of unbounded money transformation functions are linear functions, power functions, and logarithmic functions. Daniel Bernoulli (1738) introduced the St. Petersburg paradox (as described in the next paragraph), which questioned the plausibility of expected value theory. Bernoulli offered log utility of money as a solution to the St. Petersburg paradox that preserves linearity in probabilities (and in that way anticipated subsequent development of EU theory). However, unbounded monotonic money transformation functions (including log functions) do not eliminate generalized St. Petersburg paradox problems for EU theory (Arrow, 1971, p. 92; Samuelson, 1977). We here explain that unbounded money transformation functions produce similar plausibility problems for other decision theories in class D (see also Rieger & Wang, 2006).

The original St. Petersburg game pays $2^k$ when a fair coin comes up heads for the first time on flip $k$, an event with probability $1/2^k$. The game can be represented by $\{Y^{\infty}, P^{\infty}\} = \{2^{\infty}, 1/2^{\infty}\}$, where $2^{\infty} = [\ldots, 2^n, 2^{n-1}, \ldots, 2]$ and $1/2^{\infty} = [\ldots, 1/2^n, 1/2^{n-1}, \ldots, 1/2]$. Expected value theory evaluates this lottery according to $U_{EV}(\{2^{\infty}, 1/2^{\infty}\}) = \sum_{k=1}^{\infty} 2^k (1/2^k) = \infty$. Bernoulli
JAMES C. COX AND VJOLLCA SADIRAJ
(1738) famously reported that most people stated they would be unwilling to pay more than a small finite amount to play this game. A log utility of money function, offered by Bernoulli as an alternative to the linear utility of money function, does solve the paradox of the original St. Petersburg lottery because $\sum_{k=1}^{\infty} [\ln(2^k)](1/2^k) = 2\ln(2)$ is finite.

It is now well known that the log utility of money function cannot solve the paradox of a slightly modified version of the original St. Petersburg game: pay $\exp(2^k)$ when a fair coin comes up heads for the first time on flip $k$. The problem is not with the log function per se. No unbounded money transformation function can eliminate problems of the St. Petersburg type of paradox for EU theory. For any $\varphi_{EU}$ not bounded from above, define a sequence of payments $X^{EU} = \{x_k : k \in N\}$ such that, for all $k$, $\varphi_{EU}(z_k) \ge 2^k$, where $z_k$ equals either $x_k$ or $w + x_k$ depending on whether one is applying the EU of income model or the EU of terminal wealth model.6 The EU of a St. Petersburg game that pays $x_k$ (instead of $2^k$) when a fair coin comes up heads for the first time on flip $k$ is infinite. This is shown for the EU of income model by

$$U_{EUI}\left(\left\{x^{\infty}, \frac{1}{2^{\infty}}\right\}\right) = \sum_{k=1}^{\infty} \varphi_{EUI}(x_k)\left(\frac{1}{2}\right)^{k} \ge \sum_{k=1}^{\infty} 2^{k}\left(\frac{1}{2}\right)^{k} = \infty \qquad (7)$$

Hence an EU maximizer whose preferences are represented with money transformation function $\varphi_{EUI}$ for amounts of income would prefer game $X^{EU}$ to any certain amount of money, no matter how large. Similarly, an EU maximizer whose preferences are represented with money transformation function $\varphi_{EUW}$ of amounts of terminal wealth would be willing to pay any amount $p$ up to his entire (finite) amount of initial wealth $w$ to play game $X^{EU}$ since, for all $p \le w$,

$$U_{EUW}\left(\left\{x^{\infty}, \frac{1}{2^{\infty}}\right\}\right) = \sum_{k=1}^{\infty} \varphi_{EUW}(w - p + x_k)\left(\frac{1}{2}\right)^{k} \ge \sum_{k=1}^{\infty} 2^{k}\left(\frac{1}{2}\right)^{k} = \infty > \varphi_{EUW}(w) \qquad (8)$$
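The divergence and convergence claims above are easy to check numerically with partial sums. The sketch below is our own illustration (not code from the chapter); it passes the utility of the k-th prize directly, so the modified game's astronomically large payoffs never have to be evaluated:

```python
import math

def partial_value(utility_of_payoff, n_terms):
    """Partial sum of sum_{k>=1} u(x_k) * (1/2)**k for a St. Petersburg game.
    utility_of_payoff(k) returns u(x_k) directly, which avoids overflow."""
    return sum(utility_of_payoff(k) * 0.5**k for k in range(1, n_terms + 1))

# Original game with linear utility: u(2^k) = 2^k, so every term equals 1
# and the partial sums grow without bound (here: 60 after 60 flips).
ev_partial = partial_value(lambda k: 2.0**k, 60)

# Original game with Bernoulli's log utility: u(2^k) = k*ln(2); the series
# converges to 2*ln(2) ≈ 1.386.
log_partial = partial_value(lambda k: k * math.log(2), 60)

# Modified game paying exp(2^k), log utility: ln(exp(2^k)) = 2^k, so the
# terms are again all equal to 1 and the paradox returns.
log_modified = partial_value(lambda k: 2.0**k, 60)

print(ev_partial)    # 60.0 — diverges linearly in the number of terms
print(log_partial)   # ≈ 1.3863 = 2*ln(2)
print(log_modified)  # 60.0 — diverges again
```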
The following proposition generalizes this result and demonstrates that unbounded money transformation functions produce similar plausibility problems for all decision theories in class D.

Proposition 1. Let an agent's preferences defined on an unbounded domain be represented by functional (6) with an unbounded money transformation function $\varphi$ and a strictly positive probability transformation function $h$. The agent will reject any finite amount of money in favor of a St. Petersburg
lottery that pays $x_k \in X_{\varphi,h} = \{x_j \mid j \in N,\ \varphi(x_j) \ge 1/h(1/2^j, [1/2^{\infty}]^{-j})\}$ when a fair coin comes up heads for the first time on flip $k$.

Proof. Apply the Lemma in Appendix A.1.

To illustrate Proposition 1, we report examples of generalized St. Petersburg games for some of the alternatives to EU theory in class D, including the dual theory of EU, rank dependent utility theory, and cumulative prospect theory. First consider the dual theory of EU with positively monotonic transformation $f$ for decumulative probabilities. According to this theory, the St. Petersburg game that pays $x_n$ if the first head appears on flip $n$ is evaluated by

$$U_{DU}(X^{DU}) = \sum_{n:\, x_n \in X^{DU}} x_n \left(f\left(\sum_{k=n}^{\infty} \frac{1}{2^k}\right) - f\left(\sum_{k=n+1}^{\infty} \frac{1}{2^k}\right)\right) = \sum_{n:\, x_n \in X^{DU}} x_n \left(f(2^{1-n}) - f(2^{-n})\right) \qquad (9)$$
which is unbounded from above for $x_n$ from $X^{DU} = \{x_n : n \in N,\ x_n \ge 1/[f(2^{1-n}) - f(2^{-n})]\}$.

Next, consider rank dependent utility theory with transformation function $q$ (for cumulative probabilities). Since $\varphi_{RD}$ is not bounded from above, one can find a sequence of payments $X^{RD} = \{x_n : n \in N,\ \varphi_{RD}(x_n) \ge 1/[q(1-2^{-n}) - q(1-2^{-(n-1)})]\}$. The rank dependent utility of the St. Petersburg game that pays $x_n$, $x_n \in X^{RD}$, if a fair coin comes up heads for the first time on flip $n$ is

$$U_{RD}(X^{RD}) = \sum_{n=1}^{\infty} \varphi_{RD}(x_n)\left(q\left(\sum_{k=1}^{n} \frac{1}{2^k}\right) - q\left(\sum_{k=1}^{n-1} \frac{1}{2^k}\right)\right) = \sum_{n=1}^{\infty} \varphi_{RD}(x_n)\left(q(1-2^{-n}) - q(1-2^{-(n-1)})\right) \qquad (10)$$
which is unbounded by construction of $X^{RD}$.

Finally, consider cumulative prospect theory with reference point equal to a given amount of money $r$. Let $\varphi^{-}_{CP}$ be the money transformation (or "value") function for losses and $\varphi^{+}_{CP}$ be the money transformation function for gains. Let $w^{-}$ be the probability transformation function in the loss domain and $w^{+}$ be the probability transformation function in the gain domain. Assume loss aversion: a discontinuity of the slope of the value function at $x = r$. Define
$X^{CP} = \left\{x_n : n \in N,\ \varphi^{+}_{CP}(x_n - r) \ge 1\Big/\left[w^{+}\left(\sum_{k \ge n} 2^{-k}\right) - w^{+}\left(\sum_{k \ge n+1} 2^{-k}\right)\right]\right\}$. Without loss of generality, let $r$ be between $x_j$ and $x_{j+1}$, for some $j \in N$. The St. Petersburg game that pays $x_n \in X^{CP}$ if a fair coin comes up heads for the first time on flip $n$ is evaluated by cumulative prospect theory as follows:

$$U_{CP}(X^{CP}) = \sum_{i=1}^{j} \varphi^{-}_{CP}(x_i - r)\left(w^{-}\left(\sum_{k=1}^{i} \frac{1}{2^k}\right) - w^{-}\left(\sum_{k=1}^{i-1} \frac{1}{2^k}\right)\right) + \sum_{n=j+1}^{\infty} \varphi^{+}_{CP}(x_n - r)\left(w^{+}\left(\sum_{k=n}^{\infty} \frac{1}{2^k}\right) - w^{+}\left(\sum_{k=n+1}^{\infty} \frac{1}{2^k}\right)\right) \qquad (11)$$

Note that $U_{CP}(X^{CP})$ is unbounded from above since the first term on the right hand side is always finite whereas the second term on the right is unbounded from above by construction of $X^{CP}$. All of the above, of course, is also true if the reference point $r$ is set equal to zero; therefore a prospect theory agent would prefer the lottery $X^{CP}$ to any finite amount of money.

In this way, for any unbounded money transformation function one can construct a generalized St. Petersburg paradox for any of the five decision theories when they are defined on an unbounded domain. Bounded money transformation functions are immune to critique with generalized St. Petersburg lotteries. We will explain, however, that on unbounded domains bounded money transformation functions imply implausible risk aversion, as next defined. Let $\{y_2, p; y_1\}$ denote a binary lottery that pays the larger amount $y_2$ with probability $p$ and the smaller amount $y_1$ with probability $1 - p$. We define "implausible risk aversion" for binary lotteries as follows.

(I) Implausible risk aversion: for any $z$ there exists a finite $L$ such that the certain amount of money $z + L$ is preferred to the lottery $\{\infty, 0.5; z\}$.
3.2. Bounded Money Transformation Functions

In order to escape the behaviorally implausible implications of the generalized St. Petersburg paradox for any theory in class D defined on an unbounded domain, one needs to use a money transformation function that is bounded from above. But bounded money transformation functions imply implausible risk aversion, as we shall explain. We start with two illustrative examples using bounded, parametric money transformation functions commonly used in the literature. Subsequently, we present a
general proposition for bounded money transformation functions that applies to all theories in class D.

One of the commonly used money transformation (or utility) functions in the literature is the (concave transformation of the) exponential function, commonly known as CARA, defined as:7

$$\varphi_D(y) = 1 - e^{-\lambda y}, \quad \lambda > 0 \qquad (12)$$
Define $g_D(0.5) \equiv h_D(0.5, [0.5])$ as the transformed probability of the higher outcome in a binary lottery with 0.5 probabilities of the two payoffs. For the exponential money transformation function in statement (12), it can easily be verified that decision theory D implies that a certain payoff in amount $x - \ln(1 - g_D(0.5))/\lambda$ is preferred to $\{\infty, 0.5; x\}$, for all $x$. For example, an EU maximizing agent (for whom $g(0.5) = 0.5$) with $\lambda = 0.29$ would prefer a certain payoff of $25 (or, in the terminology of Proposition 2, $x + L = \$22 + \$3$) to the lottery $\{\infty, 0.5; \$22\}$. The parameter value $\lambda = 0.07$ implies that an EU maximizing agent would prefer $32 for sure to the lottery $\{\infty, 0.5; \$22\}$.

Another common parametric specification in the recent literature is the expo-power (EP) function introduced by Saha (1993). Using the same notation as Holt and Laury (2002), the EP function is defined as

$$\varphi_D(y) = \frac{1}{a}\left(1 - e^{-a y^{1-r}}\right), \quad r < 1 \qquad (13)$$
The EP functional form converges to a CARA (bounded) function in the limit as $r \to 0$ and it converges to a power (unbounded) function in the limit as $a \to 0$. The power function is commonly known as CRRA.8 For some $(a, r)$ parameter values the EP function is bounded while for other parameter values it is unbounded. With an EP function and $a \ne 0$, a decision theory D implies that $\left(x^{1-r} + (1/a)\ln\left(1/(1 - g_D(0.5))\right)\right)^{1/(1-r)}$ is preferred to $\{\infty, 0.5; x\}$, for any given $x$. For example, an EU maximizing agent with $a = 0.029$ and $r = 0.269$ would prefer a certain payoff in amount $77 to the lottery $\{\infty, 0.5; \$0\}$.

The implied risk aversion for the above examples of money transformation functions would be at least as implausible with use of these parametric forms in cumulative prospect theory and rank dependent utility theory as in EU theory because in these former two theories the probability of the high outcome is pessimistically transformed; i.e., $g_D(0.5) < 0.5$.9 So, if models of cumulative prospect theory and rank dependent utility theory utilize the same bounded money transformation function as an EU model, then if the
EU model predicts preference of a sure amount $x + L$ to the risky lottery $\{G, 0.5; x\}$ for all $G$, so do cumulative prospect theory and rank dependent utility theory.

These examples with commonly used parametric utility functions illustrate a general property of all theories in class D that admit bounded money transformation functions.10 The following proposition generalizes the discussion.

Proposition 2. Consider any theory D in class D defined on an unbounded domain that assumes a bounded money transformation function. For any given $x$ there exists a finite $L$ such that $x + L \succ_D \{\infty, 0.5; x\}$.

Proof. See Appendix A.3.

The import of Proposition 2 can be explicated by considering the special case in which the money transformation function $\varphi_D$ has an inverse function $\varphi_D^{-1}$. In that case the proof of Proposition 2 in Appendix A.3 tells us that if $\varphi_D(y) \le A$ for all $y$, then for any $x > 0$ the certain amount of money $z_D = \varphi_D^{-1}\left(g_D(0.5)A + (1 - g_D(0.5))\varphi_D(x)\right)$ is preferred to a 50/50 lottery that pays $x$ or any positive amount $G$, no matter how large (represented as $\infty$). Clearly, $L = z_D - x$. Proposition 2 tells us that a bounded money transformation function is a sufficient condition for the implication of implausible risk aversion of type (I) with decision theories in class D.
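The closed-form sure amounts in the CARA and EP examples of this section can be checked numerically. The sketch below is our own illustration (function names are ours, not the chapter's):

```python
import math

def cara_sure_amount(x, lam, g=0.5):
    """Sure amount that theory D prefers to {infinity, 0.5; x} under the
    CARA form phi(y) = 1 - exp(-lam*y): equals x - ln(1 - g)/lam."""
    return x - math.log(1.0 - g) / lam

def ep_sure_amount(x, a, r, g=0.5):
    """Sure amount preferred to {infinity, 0.5; x} under the expo-power
    form phi(y) = (1 - exp(-a * y**(1-r))) / a with a > 0."""
    return (x ** (1.0 - r) + math.log(1.0 / (1.0 - g)) / a) ** (1.0 / (1.0 - r))

print(cara_sure_amount(22, 0.29))       # ≈ 24.4, so $25 beats {∞, 0.5; $22}
print(cara_sure_amount(22, 0.07))       # ≈ 31.9, so $32 beats the same lottery
print(ep_sure_amount(0, 0.029, 0.269))  # ≈ 77, so $77 beats {∞, 0.5; $0}
```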
4. THEORY AND EXPERIMENTS FOR BOUNDED DOMAINS

4.1. Does the Original St. Petersburg Paradox have Empirical Relevance?

There is a longstanding debate about the relevance of the original version of the St. Petersburg paradox for empirical economics. The claimed bite of the paradox has been based on thought experiments or hypothetical choice experiments in which it was reported that most people say they would be unwilling to pay more than a small amount of money to play a St. Petersburg game with infinite expected value. A traditional dismissal of the relevance of the paradox is based on the observation that no agent could actually offer a real St. Petersburg game for another to play because such an offer would necessarily involve a credible promise to pay unboundedly large
amounts of money. Recognition that there is a maximum affordable payment can resolve the paradox for expected value theory. For example, if the maximum affordable payment is (or is believed by the decision maker to be) $\$3.3554 \times 10^7$ ($= \$2^{25}$), then the original St. Petersburg lottery is a game that actually pays $\$2^n$ if $n < 25$, and $\$2^{25}$ for $n \ge 25$. The expected value of this game is only $26, so it would not be paradoxical if individuals stated they would be unwilling to pay large amounts to play the game. If the maximum affordable payment is $\$2^{10} = \$1{,}024$ (respectively, $\$2^9 = \$512$), then the expected value is $11 (respectively, $10). It would be affordable to test predictions from expected value theory for the last two lotteries with experiments.
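The expected values of these capped games can be verified by direct summation. A minimal sketch of our own:

```python
def capped_ev(cap_exp):
    """EV of the St. Petersburg game truncated at $2**cap_exp: it pays 2^n
    if the first head is on flip n < cap_exp, and 2^cap_exp otherwise."""
    # Flips 1..cap_exp-1 each contribute 2^n * 2^-n = 1; the cap is paid
    # with the remaining probability 2^-(cap_exp - 1).
    ev = sum(2**n * 0.5**n for n in range(1, cap_exp))
    ev += 2**cap_exp * 0.5 ** (cap_exp - 1)
    return ev

print(capped_ev(25))  # 26.0 with a $2^25 ≈ $33.5 million cap
print(capped_ev(10))  # 11.0 with a $1,024 cap
print(capped_ev(9))   # 10.0 with a $512 cap
```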
4.2. Does the Generalized St. Petersburg Paradox have Empirical Relevance?

It is straightforward to construct affordable St. Petersburg lotteries for any decision theory in class D that assumes unbounded money transformation functions. A corollary to Proposition 1 provides a result for an affordable version of the generalized St. Petersburg game for risk preferences that can be represented by functional (6).

Corollary 1. (An affordable version of the generalized St. Petersburg game) For any given N, consider a St. Petersburg lottery that pays $x_n \in X_{\varphi,h}$ when a fair coin comes up heads for the first time on flip $n$, for $n < N$, and pays $x_N$ otherwise. Let $U$ denote the value of functional (6) for this lottery. Then the agent is indifferent between the lottery and receiving a certain amount $\varphi_D^{-1}(U)$.

Proof. See Appendix A.2.

Let us see what Corollary 1 tells us about one of the commonly used unbounded money transformation functions in the literature, the power function. Suppose that an agent's preferences are assumed to be represented by the EU of income model with CRRA or power function utility (or money transformation) function $\varphi_{EU}(x) = x^{1-r}/(1-r)$ for some $r \in (0, 1)$. Then the lottery prizes can be set equal to $x_n = ((1-r)2^n)^{1/(1-r)}$ for $n < N + 1$, and $x_N$ for $n > N$. The corollary implies that the agent with power function coefficient $r$ would be indifferent between getting $((1-r)(N+1))^{1/(1-r)}$ for sure and playing this game. Figures in the second column of Table 1 are constructed for generalized St. Petersburg games for different values of $r$.
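The power-function case of Corollary 1 can be checked by direct computation. The sketch below (our own illustration) reproduces the r = 0.5 row of Table 1 exactly:

```python
def power_eu_game(r, N):
    """Affordable St. Petersburg game with prizes x_n = ((1-r)*2^n)^(1/(1-r))
    paid for first head on flip n < N, and x_N paid for every n >= N
    (which has total probability 2^-(N-1))."""
    x = [((1 - r) * 2**n) ** (1 / (1 - r)) for n in range(1, N + 1)]
    # EU of income with phi(y) = y^(1-r)/(1-r); each of the first N-1 terms
    # equals 1 and the capped tail contributes 2, so U = N + 1.
    u = sum(x[n - 1] ** (1 - r) / (1 - r) * 0.5**n for n in range(1, N))
    u += x[N - 1] ** (1 - r) / (1 - r) * 0.5 ** (N - 1)
    ce = ((1 - r) * u) ** (1 / (1 - r))  # invert phi to get the CE
    return x, u, ce

x, u, ce = power_eu_game(0.5, 5)
print(x)   # [1.0, 4.0, 16.0, 64.0, 256.0]
print(u)   # 6.0 (= N + 1)
print(ce)  # 9.0, matching Table 1
```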
Table 1. Payments in Finite St. Petersburg Lotteries.^a

Column 1 — EV: $x_n = 2^n$
  L = [2, 4, 8, 16, 32, 64, 128, 256, 512] (EV = 10)

Column 2 — Power function EU: $x_n = ((1-r)2^n)^{1/(1-r)}$
  r = 0.1:    L = [2, 5, 9, 20, 42, 91, 196, 422] (CE(L) = 10.56; EV(L) = 12.19)
  r = 0.5:    L = [1, 4, 16, 64, 256] (CE(L) = 9; EV(L) = 23.5)
  r = 0.56^c: L = [1, 4, 18, 85, 408] (CE(L) = 9.78; EV(L) = 34.56)
  r = 0.67:   L = [1, 3, 19, 155] (CE(L) = 6.45; EV(L) = 23)

Column 3 — DU: $x_n = 1/h_D(2^{-n})$
  f(p) = p/(2-p): L = [2, 6, 14, 30, 62, 126, 254, 510] (CE(L) = 9.6; EV(L) = 16)
  f(p) = p^2:     L = [2, 6, 22, 86, 342] (CE(L) = 6; EV(L) = 32)

Column 4 — CP and RD ($\varphi(x) = x^{\alpha}$): $x_n$ given by Eq. (14)
  $\alpha$ = 0.88^d, $\gamma$ = 0.62^d: L = [2, 10, 17, 24, 35, 50, 75, 115, 180, 284, 454] (CE(L) = 18.50; EV(L) = 11.11)
  $\alpha$ = 0.5^e, $\gamma$ = 0.71^e:  L = [4, 36, 96, 220, 503] (CE(L) = 46.88; EV(L) = 68.19)
  $\alpha$ = 0.37^b, $\gamma$ = 0.56^f: L = [4, 391] (CE(L) = 61.62; EV(L) = 197.5)

^a A prize vector of length k means the lottery pays the nth coordinate when a head appears for the first time on flip n, for n < k, and pays the kth coordinate otherwise.
^b The estimate of alpha is the estimate of Wu and Gonzalez (p. 1686) using Camerer and Ho (1994) data.
^c (field data) Campo, Perrigne, and Vuong (2000).
^d Tversky and Kahneman (1992).
^e Wu and Gonzalez (1996).
^f Camerer and Ho (1994).
Papers on several laboratory and field experiments reported power function (CRRA) estimates in the range 0.44 to 0.67.11 The $r = 0.5$ value in the table is close to the midpoint of these estimates. As shown in Table 1, an EU of income maximizer with power function parameter 0.5 has a certainty equivalent (CE) equal to 9 for the affordable St. Petersburg lottery $\{Y^N, 1/2^N\}$ with prizes $Y^{\infty} = [\ldots, 256, 256, 64, 16, 4, 1]$ and respective probabilities $1/2^{\infty} = [\ldots, 2^{-n}, \ldots, 2^{-2}, 2^{-1}]$.

For cumulative prospect theory with a value function $x^{\alpha}$ and weighting function $w^{+}(p) = p^{\gamma}/(p^{\gamma} + (1-p)^{\gamma})^{1/\gamma}$ and with reference point 0 (as in Tversky & Kahneman, 1992), consider the St. Petersburg game that pays

$$x_n = \left[\frac{(2^{1-n})^{\gamma}}{\left((2^{1-n})^{\gamma} + (1 - 2^{1-n})^{\gamma}\right)^{1/\gamma}} - \frac{(2^{-n})^{\gamma}}{\left((2^{-n})^{\gamma} + (1 - 2^{-n})^{\gamma}\right)^{1/\gamma}}\right]^{-1/\alpha} \qquad (14)$$

if a head appears for the first time on the $n$-th flip, for $n \le N$, and pays $x_N$ if the first head appears on any toss $n \ge N + 1$. According to cumulative prospect theory, the utility of this game is $N + 1$. Hence, the agent will be indifferent between $\$(N+1)^{1/\alpha}$ for sure and playing this game. Similar results hold for rank dependent utility theory.

The last column of Table 1 shows sequences of payments in affordable St. Petersburg lotteries for cumulative prospect theory models with $\alpha$ and $\gamma$ parameter values reported by Camerer and Ho (1994), Tversky and Kahneman (1992), and Wu and Gonzalez (1996). The Wu and Gonzalez parameter values of $(\alpha, \gamma) = (0.5, 0.71)$ imply that a cumulative prospect theory decision maker with zero reference point has a CE of 46.88 for an affordable St. Petersburg lottery $\{Y^N, P^N\}$ with prizes $Y^{\infty} = [\ldots, 503, 503, 220, 96, 36, 4]$. As shown in Table 1, the parameter values $(\alpha, \gamma) = (0.37, 0.56)$ used for rank dependent utility theory and cumulative prospect theory imply that an agent's CE for the lottery $\{Y^N, 1/2^N\}$ with prizes $Y^{\infty} = [\ldots, 391, 391, 4]$ is 61.62. Finally, for the dual theory of EU we report payments involved in a generalized St.
Petersburg game for two specifications of the function $f$: (a) $f(p) = p/(2-p)$ and (b) $f(p) = p^2$. The first specification is offered by Yaari as an example that solves the common ratio effect paradox (Yaari, 1987, p. 105). The second specification is used to demonstrate a rationale for using the Gini coefficient to rank income distributions (Yaari, 1987, p. 106). Generalized versions of the St. Petersburg game involve payments $2^{n+1} - 1$ and $4^n$, respectively. The affordable versions of the generalized
St. Petersburg game are reported in the DU column of Table 1. In case (b) with $f(p) = p^2$, an example is provided by the sequence of payments $v^{DU} = [\ldots, 342, 342, 86, 22, 6, 2]$ with expected value of 32 and dual EU $U_{DU}(v^{DU}, 1/2^{\infty}) = 6$.
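The CP/RD and DU prize vectors in Table 1 can be reproduced numerically. The sketch below is our own reconstruction (not the chapter's code); rounding the exact prizes up to whole currency units is our reading of how the published vectors were obtained:

```python
import math

def tk_weight(p, gamma):
    """Tversky-Kahneman weighting function w+(p) = p^g/(p^g + (1-p)^g)^(1/g)."""
    return p**gamma / (p**gamma + (1 - p) ** gamma) ** (1 / gamma)

def cp_prizes(alpha, gamma, N):
    """Prizes from Eq. (14): x_n = [w+(2^(1-n)) - w+(2^-n)]^(-1/alpha), so
    x_n^alpha times the decision weight of flip n equals one."""
    return [
        (tk_weight(2.0 ** (1 - n), gamma) - tk_weight(2.0**-n, gamma)) ** (-1 / alpha)
        for n in range(1, N + 1)
    ]

def du_prizes(f, N):
    """Dual-theory prizes x_n = 1/[f(2^(1-n)) - f(2^-n)]."""
    return [1.0 / (f(2.0 ** (1 - n)) - f(2.0**-n)) for n in range(1, N + 1)]

# Wu-Gonzalez parameters (alpha, gamma) = (0.5, 0.71), CP/RD column:
print([math.ceil(x) for x in cp_prizes(0.5, 0.71, 5)])
# -> [4, 36, 96, 220, 503]

# Yaari's Gini specification f(p) = p^2 (exact prizes are 4^n/3):
print([math.ceil(x) for x in du_prizes(lambda p: p * p, 5)])
# -> [2, 6, 22, 86, 342]

# Yaari's common-ratio specification f(p) = p/(2 - p):
print([math.ceil(x) for x in du_prizes(lambda p: p / (2 - p), 8)])
# -> [2, 6, 14, 30, 62, 126, 254, 510]
```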
4.3. A Real Experiment with a Finite St. Petersburg Game

An experimental design with clear relevance to evaluating the empirical applicability of expected value theory is to offer subjects a finite St. Petersburg bet whose highest possible payoff is an amount that is known to be affordable for payment by the experimenter. One such experiment, reported by Cox, Sadiraj, and Vogt (2008a), involved offering subjects the opportunity to decide whether to pay their own money to play nine truncated St. Petersburg bets. One of each subject's decisions was randomly selected for real money payoff. Bets were offered for N = 1, 2, …, 9. Bet N had a maximum of N coin tosses and paid €$2^n$ if the first head occurred on toss number $n$, for $n = 1, 2, \ldots, N$, and paid nothing if no head occurred. The price offered to a subject for playing bet N was 25 euro cents lower than €N where, of course, €N was the expected value of bet N. An expected value maximizer would accept all of these bets. The experimenter could credibly offer the game to the subjects because the highest possible payoff was €512 ($= 2^9$) for each subject.

Cox et al. (2008a) report that 47% of their subjects' choices were to reject the opportunity to play the St. Petersburg bets. They use a linear mixture model (Harless & Camerer, 1994) to estimate whether a risk neutral preference model can characterize the data. Let the letter $a$ denote a subject's response that she accepts the offer to play a specific St. Petersburg game in the experiment. Let $r$ denote rejection of the offer to play the game. The linear mixture model is used to address the specific question whether, for the nine St. Petersburg games offered to their subjects, the risk neutral response pattern $(a, a, a, a, a, a, a, a, a)$ or the risk averse response pattern $(r, r, r, r, r, r, r, r, r)$ is more consistent with the data.
Let the stochastic preferences with error rate $\epsilon$ be specified in the following way: (a) if option Z is preferred then Prob(choose Z) $= 1 - \epsilon$; and (b) if option Z is not preferred then Prob(choose Z) $= \epsilon$. The maximum likelihood point estimate of the proportion of subjects for which risk neutral preferences are rejected in favor of risk averse preferences is 0.49, with a Wald 90% confidence interval of (0.30, 0.67). They conclude that 30–67% of the subjects are not risk neutral in this experiment.
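The two-type linear mixture estimator described above can be sketched as follows. This is our own stylized reconstruction with hypothetical data, not the authors' estimation code; it summarizes each subject by an acceptance count and uses a crude grid search rather than a proper optimizer:

```python
import math

def mixture_loglik(params, data, n_tasks=9):
    """Log-likelihood of a two-type mixture: a proportion p of 'reject all'
    (risk averse) types and 1-p of 'accept all' (risk neutral) types, each
    making errors at rate eps. data = per-subject counts of accepted bets."""
    p, eps = params
    ll = 0.0
    for k in data:  # subject accepted k of n_tasks bets
        lik_rn = (1 - eps) ** k * eps ** (n_tasks - k)   # risk neutral type
        lik_ra = eps**k * (1 - eps) ** (n_tasks - k)     # risk averse type
        ll += math.log((1 - p) * lik_rn + p * lik_ra)
    return ll

# Hypothetical accept counts for ten subjects (five near-full acceptors,
# five near-full rejectors), then a grid-search MLE over (p, eps):
data = [9, 9, 0, 1, 8, 0, 9, 2, 0, 9]
best = max(((p / 50, e / 50) for p in range(1, 50) for e in range(1, 25)),
           key=lambda th: mixture_loglik(th, data))
print(best)  # (estimated proportion of risk averse types, error rate)
```

With well-separated data like this, the estimated proportion lands near 0.5 and the error rate near the observed fraction of "off-pattern" choices.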
4.4. Plausibility Checks on Empirical Findings with St. Petersburg Games

Experiments with St. Petersburg games can be designed by following the logic of the discussion in Section 4.2. Of course, as that discussion makes clear, one needs a postulated money transformation function and/or postulated probability transformation function to construct the payoffs for the experiment. But that, in itself, does not rule out the possible empirical relevance of the generalized St. Petersburg game, as can be understood from the following. If a researcher concludes, say, that EU theory with power function utility (or money transformation) function $\varphi_{EU}(x) = \sqrt{x}$ can rationalize risk preferences on a finite domain of payoffs $[z, Z]$, this opens the question of whether the conclusion is plausible, because it implies that the EU maximizing agent would accept all finite St. Petersburg bets with prizes $x_n = 4^n$, $n = 1, 2, \ldots, N$, so long as $4^N \le Z$. The theory implies that the agent with power coefficient 1/2 would reject any sure amount of money up to $\$(N+1)^2$ in favor of playing the finite St. Petersburg lottery with a maximum of N coin tosses that pays $\$4^n$ if the first head occurs on toss number $n$, for $n < N + 1$, and pays $\$4^N$ otherwise. This experiment would be feasible to run for values of N such that $\$4^N$ is affordable. It would provide an empirical check on the plausibility of the conclusion that EU theory with square root power function preferences can rationalize the subjects' risky decisions on domain $[z, Z]$.

For example, a finite version of this game with N = 5 that can be credibly tested in the laboratory is reported in Table 1. Let $Y^5 = [256, 64, 16, 4, 1]$ and $1/2^5 = [2^{-4}, 2^{-4}, 2^{-3}, 2^{-2}, 2^{-1}]$ denote the finite St. Petersburg game that pays $1 if a coin lands "heads" on the first flip, $4 if the coin lands "heads" for the first time on the second flip, $16 if the coin lands "heads" for the first time on the third flip, $64 if the coin lands "heads" for the first time on the fourth flip, and $256 otherwise (with probability $1 - \sum_{n=1}^{4} (1/2)^n$). The expected value of this game is $23.5 whereas $U_{EUI}(Y^5, 1/2^5) = 3$. Hence, the EU of income model predicts that the agent will prefer getting $10 for sure to playing this game. The expected value model, however, predicts that the agent prefers this game to getting $23 for sure.

For cumulative prospect theory, the last column of Table 1 shows sequences of payments in generalized St. Petersburg games. Only payments that are smaller than $500 are reported since that is reasonably affordable in an experiment. Suppose, for instance, that someone has preferences that can be represented by cumulative prospect theory with reference point 0, $\gamma = 0.71$, and $\alpha = 0.5$, as reported by Wu and Gonzalez (1996). A finite version of the generalized St. Petersburg game for this case that can be credibly tested in the laboratory is $v^{CP} = [503, 220, 96, 36, 4]$.
That is, the game pays $4 if a coin lands "heads" on the first flip, $36 if the coin lands "heads" for the first time on the second flip, $96 if the coin lands "heads" for the first time on the third flip, $220 if the coin lands "heads" for the first time on the fourth flip, and $503 otherwise. The expected value of this game is $68.19 whereas $U_{CP}(v^{CP}, 1/2^5) = 5.1$. Hence, cumulative prospect theory with the above parameter specifications predicts that the agent will prefer getting $26 for sure to playing this game. The expected value model, however, predicts that the agent prefers the game to getting $68 for sure. Table 1 also reports examples of lotteries and predictions by rank dependent utility theory and the dual theory of EU, as discussed in Section 4.2.
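The square-root-utility predictions for the N = 5 game are simple enough to verify by hand or in a few lines of code. A minimal sketch of our own:

```python
import math

# Finite St. Petersburg game from Table 1 (power EU, r = 0.5):
prizes = [1, 4, 16, 64, 256]
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]  # last entry covers all n >= 5

ev = sum(x * p for x, p in zip(prizes, probs))
u_eui = sum(math.sqrt(x) * p for x, p in zip(prizes, probs))  # phi(x) = sqrt(x)
ce = u_eui**2  # invert phi to get the certainty equivalent

print(ev)     # 23.5 -> an EV maximizer prefers the game to $23 for sure
print(u_eui)  # 3.0
print(ce)     # 9.0  -> the sqrt-utility agent prefers $10 for sure
```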
4.5. Plausibility Checks on Empirical Findings with Binary Lotteries

Proposition 2 can provide a researcher with checks on the empirical plausibility of estimates of risk aversion parameters on a finite domain $[z, Z]$. Using the notation of the proposition, questions that are clearly relevant to a finite domain involve payoff amounts $x$ and $x + L$ and $G$, all in the domain of interest, that imply $x + L$ for sure is preferred to $\{G, 0.5; x\}$. Implications such as these provide plausibility checks on reported parameter estimates.

Table 2 presents some implications of two money transformation (or utility) functions using parameter estimates for three experiments with small stakes lotteries reported in the literature. The parameter estimates are taken from Harrison and Rutström (2008, Table 8, p. 120). Unlike the discussion in Section 3.2 above, we here examine the implications of estimated parametric money transformation functions only on the local domains of the data samples used in estimation of the parameters. As shown at the top of Table 2, data are from experiments reported by Holt and Laury (2005), Hey and Orme (1994), and Harrison and Rutström (2008). As shown just below the top of the table, parameter estimates from two functional forms are used: CRRA and EP. As shown at the next level in the table, estimates based on two theories are used: EU of income models (EU) and rank dependent utility models (RD).

The entries in the first and third columns of Table 2 convey the following information. The third column reports parameter estimates for a rank dependent utility model with power functions for both the money transformation and probability transformation functions. Data from the experiment reported in Holt and Laury (2005) yield the parameter estimate $\hat{r} = 0.85$ for the money transformation function and the parameter estimate $\hat{\gamma} = 1.46$ for the probability transformation function. With these parameters,
Table 2. Predictions for Binary Lotteries Using Parameter Point Estimates from Small Stakes Data.

Holt and Laury (2005)
  CRRA, EU: $\hat{r} = 0.76$; CRRA, RD: $\hat{r} = 0.85$, $\hat{\gamma} = 1.46$^b; EP, EU: $\hat{r} = 0.4$, $\hat{a} = 0.07$; EP, RD: $\hat{r} = 0.26$, $\hat{a} = 0.02$^b, $\hat{\gamma} = 0.37$
  Binary Lotteries^a | CRRA, EU | CRRA, RD | EP, EU | EP, RD
  {77, 0.5; 0}       | 4.3      | 0.4      | 15.8   | 8.6
  {30, 0.5; 0}       | 1.7      | 0.2      | 7.5    | 3.4
  {14, 0.5; 0}       | 0.78     | 0.08     | 3.81   | 1.9

Hey and Orme (1994)
  CRRA, EU: $\hat{r} = 0.61$; CRRA, RD: $\hat{r} = 0.61$, $\hat{\gamma} = 0.99$; EP, EU: $\hat{r} = 0.82$, $\hat{a} = -1.06$; EP, RD: $\hat{r} = 0.82$, $\hat{a} = -1.06$, $\hat{\gamma} = 0.99$
  Binary Lotteries^a | CRRA, EU | CRRA, RD | EP, EU | EP, RD
  {30, 0.5; 0}       | 5.1      | 5.1      | 4.6    | 4.6
  {14, 0.5; 0}       | 2.4      | 2.4      | 1.81   | 1.81

Harrison and Rutström (2008) replication of Hey and Orme (1994)
  CRRA, EU: $\hat{r} = 0.53$; CRRA, RD: $\hat{r} = 0.53$, $\hat{\gamma} = 0.97$; EP, EU: $\hat{r} = 0.78$, $\hat{a} = -1.10$; EP, RD: $\hat{r} = 0.78$, $\hat{a} = -1.10$, $\hat{\gamma} = 0.97$
  Binary Lotteries^a | CRRA, EU | CRRA, RD | EP, EU | EP, RD
  {14, 0.5; 0}       | 3.3      | 3.2      | 3.1    | 3.1

^a The higher payoff in a binary lottery is within the range of payoffs used in the experiment. Numbers are in US dollars for the Holt–Laury and Harrison–Rutström studies and in British pounds for the Hey–Orme study.
^b p-values > 0.1.
the rank dependent utility model implies that $0.40 for sure (in column 3) is preferred to the lottery {$77, 0.5; $0} (in column 1). It seems to us likely that almost all people have risk preferences that are inconsistent with this prediction and, in that sense, that the estimated parametric utility function is implausible. Importantly, the prediction that $0.40 for sure is preferred to {$77, 0.5; $0} is clearly testable and, therefore, a conclusion about the plausibility or implausibility of the estimated model can be based on data, not mere opinion. Estimation of the CRRA parameter using the EU of income model and data from Holt and Laury (2005) yields $\hat{r} = 0.76$. With this parameter, as reported in the second column of Table 2, $4.30 for sure is preferred to the lottery {$77, 0.5; $0}. The fourth and fifth columns of Table 2 report parameter estimates for the EP money transformation function. The parameter estimates imply that $8.60 for sure is preferred to the lottery {$77, 0.5; $0} for the rank dependent utility model. The preferred sure amount of money is $15.80 in the case of the EU of income model.

Table 2 uses point estimates of parameters from three data sets and four combinations of money transformation and probability transformation functions to derive implied preferences for sure amounts of money (in all columns except the first) over binary lotteries (in the first column). All of these implied preferences are stated on domains that are the same as or smaller than those of the data samples. Furthermore, all of these implied preferences are testable with real, affordable experiments. Conducting such tests would provide data to inform researchers' decisions about whether the estimated parametric forms provide plausible or implausible characterizations of the risk attitudes of the subjects in experiments.
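Several of the CRRA entries in Table 2 can be reproduced as follows. The form of the probability transformation is our assumption (we use the one-parameter Tversky–Kahneman form, which reproduces the published numbers); the text specifies only that power functions were estimated:

```python
def crra_ce_eu(high, r):
    """EU of income with phi(y) = y^(1-r): CE of the lottery {high, 0.5; 0}."""
    return high * 0.5 ** (1 / (1 - r))

def crra_ce_rd(high, r, gamma):
    """Rank dependent utility: TK-form decision weight on the high outcome
    (this weighting form is our assumption, not stated in the text)."""
    w = 0.5**gamma / (0.5**gamma + 0.5**gamma) ** (1 / gamma)
    return high * w ** (1 / (1 - r))

print(round(crra_ce_eu(77, 0.76), 2))        # ≈ 4.3, as in Table 2
print(round(crra_ce_rd(77, 0.85, 1.46), 2))  # ≈ 0.4, as in Table 2
print(round(crra_ce_eu(14, 0.76), 2))        # ≈ 0.78, as in Table 2
```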
Finally, similar experiments can be designed with binary lotteries based on any parameter estimates within the 90% confidence limits of the estimation if a researcher wants to thoroughly explore the plausibility question. In the preceding sections, we have explored testable implications for empirical plausibility of parametric forms of decision theories in class D. Some recent studies have identified patterns of risk aversion, known as calibration patterns, that can be used to test plausibility of theories under risk without any parametric specifications. Concavity calibrations involve certain types of patterns of choices that target decision theories under risk that assume concave money transformation (or utility) functions (Rabin, 2000; Neilson, 2001; Cox & Sadiraj, 2006; Rubinstein, 2006). Convexity calibrations, on the other hand, involve patterns of risk aversion that apply to theories that represent risk aversion with probability transformation functions (Cox et al., 2008b). The following three sections summarize what
is currently known about the empirical validity of patterns of risk aversion underlying calibration propositions.
4.6. Do Concavity Calibrations of Payoff Transformation (or Utility) Functions have Empirical Relevance?

Cox et al. (2008b) report an experiment run in Calcutta, India to test the empirical validity of a postulated pattern of small stakes risk aversion that has implications for cumulative prospect theory, rank dependent utility theory, and all three EU models discussed in Cox and Sadiraj (2006): the EU of terminal wealth model, the EU of income model, and the EU of initial wealth and income model. Subjects in the Calcutta experiment were asked to choose between a certain amount of money, x rupees (option B), and a binary lottery that paid x − 20 rupees or x + 30 rupees with equal probability (option A), for values of x from a finite set Ω. Subjects were informed that one of their decisions would be randomly selected for payoff. The amount at risk in the lotteries (50 rupees) was about a full day's pay for the subjects in the experiment. By Proposition 2 and Corollary 2 in Cox et al. (2008b), if a subject chooses option B for at least four sequential values of x, then calibration of the revealed pattern of small stakes risk aversion implies behaviorally implausible large stakes risk aversion. They call any choice pattern that meets this criterion a "concavity calibration pattern" and test a null hypothesis that the data are not characterized by concavity calibration patterns against an alternative that includes them. To conduct the test, Cox et al. (2008b) applied a linear mixture model similar to that described in Section 4.3. The reported point estimate for the proportion of subjects in the Calcutta experiment who made choices for which EU theory, rank dependent utility theory, and cumulative prospect theory (with 0 reference point payoff) imply implausible large stakes risk aversion was 0.495, with a Wald 90% confidence interval of (0.289, 0.702).
They conclude that 29–70% of the subjects made choices that, according to three theories of risky decision making, can be calibrated to imply implausible large stakes risk aversion. According to Proposition 2 and Corollary 2 in Cox et al. (2008b), this conclusion applies to all theories in class D that represent risk preferences with concave transformations of payoffs. Thus the conclusion applies to all EU models regardless of whether they specify full asset integration (the terminal wealth model), no asset integration (the income model), or partial asset integration (variants of the initial wealth and income model).
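The classification rule behind the test can be sketched in a few lines. The code below is our illustration, not the authors' code: option B is the sure amount, and a "concavity calibration pattern" is a run of at least four consecutive safe choices as x varies; the function name and the example choice sequences are hypothetical.

```python
def has_concavity_calibration_pattern(choices, min_run=4):
    """choices: list of 'A' (lottery paying x-20 or x+30) or 'B' (x for
    sure), ordered by the certain payoff x. Returns True if the subject
    chose the safe option B for at least min_run sequential values of x."""
    run = 0
    for c in choices:
        run = run + 1 if c == 'B' else 0
        if run >= min_run:
            return True
    return False

# Hypothetical choice sequences over six x values:
print(has_concavity_calibration_pattern(['A', 'B', 'B', 'B', 'B', 'A']))  # True
print(has_concavity_calibration_pattern(['B', 'B', 'A', 'B', 'B', 'A']))  # False
```

Under Proposition 2 and Corollary 2 in Cox et al. (2008b), a subject whose sequence triggers this rule has revealed small stakes risk aversion that calibrates to implausible large stakes risk aversion.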
JAMES C. COX AND VJOLLCA SADIRAJ
Prospect theory can be immunized against the concavity calibration critique by introducing variable reference points set equal to the x values in the Calcutta experiment (Wakker, 2005). The variable reference points do not, however, immunize prospect theory against other tests with data from the experiment, because they imply that a subject will make the same choice (of the lottery or the certain payoff) for all values of the sure payoff x. Cox et al. (2008b) report that a likelihood ratio test rejects this "non-switching hypothesis" in favor of an alternative that allows for one switch at the 5% significance level. Adding possible choice patterns with more than one switch to the alternative hypothesis would also lead to rejection of the non-switching hypothesis. Hence, variable reference points do not rescue cumulative prospect theory from inconsistency with the data from the experiment.
4.7. Do Convexity Calibrations of Probability Transformation Functions have Empirical Relevance?

Cox et al. (2008b) demonstrate that the problem of possibly implausible implications from theories of decision making under risk is more generic than the implausible implications of decreasing marginal utility of money, by extending the calibration literature in their Proposition 1 to include the implications of convex transformations of decumulative probabilities used to model risk aversion in the dual theory. They report another experiment, run in Magdeburg, Germany, in which subjects were asked to make nine choices between pairs of lotteries. Subjects were informed that one of their decisions would be randomly selected for payoff. Decision task i, for i = 1, 2, …, 9, presented a choice between lottery A, which paid €40 with probability i/10 and €0 with probability 1 − i/10, and lottery B, which paid €40 with probability (i − 1)/10, €10 with probability 2/10, and €0 with probability 1 − (i + 1)/10. By Proposition 1 in Cox et al. (2008b), if a subject chooses lottery B for at least seven sequential values of the probability index i then calibration of the revealed pattern of small stakes risk aversion implies implausible large stakes risk aversion for the dual theory. They call any choice pattern that meets this criterion a "convexity calibration pattern" and test the null hypothesis that the data are not characterized by convexity calibration patterns against an alternative that includes them. Again applying a linear mixture model, Cox et al. (2008b) report a point estimate of 0.81 and a Wald 90% confidence interval of (0.66, 0.95) for the proportion of subjects for which the dual theory implies implausible risk aversion. Thus the data
are consistent with the conclusion that 66–95% of the subjects made choices that, according to the dual theory, can be calibrated to imply implausible large stakes risk aversion.
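The dual-theory logic behind this pattern can be illustrated numerically. The sketch below is our construction, not the authors' code: it values each Magdeburg lottery pair under Yaari's dual theory, summing payoff increments weighted by a transformation f of decumulative probabilities. The two transformations are hypothetical illustrations; a mildly convex f never strictly favors the safer lottery B, while a sharply convex f produces the long run of B choices that constitutes a convexity calibration pattern.

```python
# Dual-theory (Yaari) valuation of the nine Magdeburg lottery pairs,
# with illustrative convex probability transformations f (our choices,
# not estimates from the experiment).

def dual_value(lottery, f):
    """lottery: list of (payoff, prob) pairs. Dual-theory value: sum over
    ascending payoff levels of (level increment) * f(P(payoff >= level))."""
    pts = sorted(lottery)
    value, lower = 0.0, 0.0
    for payoff, _ in pts:
        tail = sum(q for y, q in pts if y >= payoff)  # decumulative prob
        value += (payoff - lower) * f(tail)
        lower = payoff
    return value

def choices(f):
    """Predicted choice ('A' or 'B') in each task i = 1..9, picking B
    only when it has strictly higher dual-theory value."""
    picks = []
    for i in range(1, 10):
        A = [(40, i / 10), (0, 1 - i / 10)]
        B = [(40, (i - 1) / 10), (10, 0.2), (0, 1 - (i + 1) / 10)]
        picks.append('B' if dual_value(B, f) > dual_value(A, f) else 'A')
    return picks

print(choices(lambda p: p ** 2))   # mildly convex: A in every task
print(choices(lambda p: p ** 10))  # sharply convex: B through task 8
```

With f(p) = p², the safer lottery B is never strictly preferred; with f(p) = p¹⁰, B is chosen for the first eight tasks, well past the seven sequential B choices that trigger the convexity calibration.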
4.8. Is the Expected Utility of Terminal Wealth Model More (or Less) Vulnerable to Calibration Critique than Other Theories?

Rabin (2000) initiated the recent literature on the large stakes risk aversion implications of calibrating postulated patterns of small stakes risk aversion. His analysis is based on the supposition that an agent will reject a small stakes gamble with equal probabilities of winning or losing relatively small amounts, and that the agent will do this at all initial wealth levels in some large interval. For example, Rabin demonstrated that if an agent would reject a 50/50 bet in which she would lose $100 or gain $110 at all initial wealth levels up to $300,000, then the EU of terminal wealth model implies that, at an initial wealth level of $290,000, that agent would also reject a 50/50 bet in which she would lose $6,000 or gain $180 million. Rabin (2000) and Rabin and Thaler (2001) stated strong conclusions about implausible risk aversion implications for EU theory implied by their supposed patterns of small stakes risk aversion, but reported no experiments supporting the empirical validity of the suppositions. Their conclusions about EU theory were taken quite seriously by some scholars (Kahneman, 2003; Camerer & Thaler, 2003) and by a Nobel Prize committee (Royal Swedish Academy of Sciences, 2002, p. 16), despite the complete absence of data consistent with the supposed patterns of small stakes risk aversion underlying the concavity calibrations. It is ironic that, in this heyday of behavioral economics, strong conclusions about the behavioral plausibility of theory could be drawn without any actual observations of behavior.
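The flavor of Rabin's calibration can be conveyed with a concrete special case. The sketch below is our illustration, not Rabin's construction (his theorem holds for any concave utility): it uses CARA utility, u(w) = 1 − exp(−αw), because under CARA the accept/reject decision is independent of wealth, so the premise "rejects the small bet at all wealth levels" holds automatically; the coefficient α = 0.001 is a hypothetical value chosen to just reject the small bet.

```python
# CARA illustration of Rabin-style calibration in the EU of terminal
# wealth model (our example; alpha is hypothetical).
import math

ALPHA = 0.001  # absolute risk aversion coefficient

def accepts_5050(lose, gain, alpha=ALPHA):
    """Accept iff EU of the bet exceeds utility of current wealth; under
    CARA, u(w) = 1 - exp(-alpha*w), the wealth level cancels, leaving the
    condition 0.5*exp(alpha*lose) + 0.5*exp(-alpha*gain) < 1."""
    return 0.5 * math.exp(alpha * lose) + 0.5 * math.exp(-alpha * gain) < 1.0

print(accepts_5050(100, 110))            # False: rejects the small Rabin bet
print(accepts_5050(6_000, 180e6))        # False: also rejects lose $6,000 /
                                         # gain $180 million
print(accepts_5050(6_000, float('inf'))) # False: rejects lose $6,000 against
                                         # ANY gain, however large
```

An agent calibrated to reject the lose $100/gain $110 bet at every wealth level also turns down a 50/50 bet of losing $6,000 against an arbitrarily large gain, which is the implausible large stakes risk aversion at issue.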
As explained by Cox and Sadiraj (2006), observations of behavior consistent with the pattern of risk aversion supposed in Rabin's concavity calibration would have limited implications for risky decision theory: they would have no implications for EU models other than the terminal wealth model, nor for other theories in class D in which income rather than terminal wealth is postulated as the argument of functional (6). Furthermore, an experiment that could provide empirical support for Rabin's supposition would have to be conducted with a within-subjects design, as we shall explain after first explaining problems with across-subjects experiments in the literature.
Barberis, Huang, and Thaler (2003) report an across-subjects, hypothetical experiment with a 50/50 lose $500 or gain $550 bet using as subjects MBA students, financial advisers, investment officers, and investor clients. They report that about half of the subjects stated they would be unwilling to accept the bet. They do not report wealth data for these subjects, nor the relationship, if any, between subjects' decisions and their wealth levels; therefore the relation between the subjects' decisions and the supposed pattern of risk aversion used in concavity calibration propositions is unknown. Barberis et al. (2003) also report an across-subjects, real payoff experiment with a 50/50 lose $100 or gain $110 bet using MBA students as subjects. They report that only 10% of the subjects were willing to play the bet. No wealth data are reported for these subjects either. It is straightforward to show that any across-subjects experiment involving one choice per subject cannot provide data that would support the conclusion of implausible risk aversion. Suppose one has a sample from an experiment (like the two Barberis et al., 2003 experiments) in which each of N subjects is asked to make one decision about accepting or rejecting a 50/50 lose $100 or gain $110 bet. Suppose that the initial wealth level of every subject is observed and that these wealth levels vary across a large range, say from a low of $100 to a high of $300,000. Would such a data sample provide support for any conclusion about the EU of terminal wealth model? Without making other assumptions about preferences, the answer is clearly "no," as we next explain. Suppose that we observe individual wealth levels w̃_j ∈ [100, 300K], j = 1, 2, …, N, for each of N individuals and that every one of them rejects the 50/50 lose $100, gain $110 bet. Can they all be EU of terminal wealth maximizers with globally plausible risk aversion?
Yes, and the following equation can be used to generate N utility functions with parameters a_j and r_j, each of which implies indifference between accepting and rejecting the bet at the observed individual wealth levels:

2 = (1 − 100/(w̃_j − a_j))^{r_j} + (1 + 110/(w̃_j − a_j))^{r_j},  a_j < w̃_j − 100.  (15)

Any ordered pair of parameters (a(w̃_j), r(w̃_j)) below the graph of the level set of this equation can be used to construct a utility function

u_j(w̃_j + y) = (−a(w̃_j) + w̃_j + y)^{r(w̃_j)}  (16)

that implies rejection of the bet for an EU of terminal wealth maximizer with initial wealth w̃_j and money transformation function given by (16). But
data-consistent utility functions for all subjects exhibit plausible risk aversion globally. Therefore, the empirical relevance of Rabin's concavity calibration for the EU of terminal wealth model cannot be tested with such an across-subjects experiment. The empirical validity of Rabin's concavity calibration for the EU of terminal wealth model could, however, be tested with a within-subjects experiment. Let subject j have initial wealth w_j at the beginning of the experiment. In round t of the experiment, give subject j an amount of money x_t and an opportunity to play a 50/50 bet with loss of 100 or gain of 110. Choose the set X of values for x_t so that there are enough observations covering a sufficiently large range that concavity calibration can bite. An example of a suitable specification of the set X is provided by the sets of certain income payoffs used in the Calcutta experiment reported in Cox et al. (2008b) and summarized above. Consider the set of certain payoff x values used in the Calcutta experiment; define X = {100, 1K, 2K, 4K, 5K, 6K} and let x_t denote the value in position t in this set. Using subject j's (observed) initial wealth w_j at the beginning of the experiment, and the controlled values x_t, t = 1, 2, …, 6, define subject j's variable initial wealth level during the experiment as ω_jt = w_j + x_t. At round t in the experiment, give the subject x_t and then ask her whether she wants to accept the 50/50 gamble with loss amount 100 and gain amount 110. If the answer is "no" for at least four sequential values of x then Proposition 2 in Cox et al. (2008b) and Rabin's (2000) concavity calibration proposition imply implausible risk aversion for the EU of terminal wealth model. Therefore this type of "pay-x-in-advance," within-subjects experiment could support, or fail to support, the empirical relevance of Rabin's concavity calibration supposition for the terminal wealth model.12
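The construction in Eqs. (15) and (16) can be checked numerically. The solver below is our sketch, not the authors' code: given an observed wealth w and a chosen shift parameter a (both hypothetical values here), it bisects Eq. (15) for the exponent r that makes an agent with utility u(z) = (z − a)^r exactly indifferent to the 50/50 lose-$100/gain-$110 bet at wealth w.

```python
# Bisection solver for Eq. (15) and a check of the implied indifference
# (our numerical sketch; w, a, and the bisection bracket are assumptions).

def solve_r(w, a, lo=1e-6, hi=1.0, tol=1e-12):
    """Solve 2 = (1 - 100/(w-a))**r + (1 + 110/(w-a))**r for r in (0, 1)."""
    g = lambda r: (1 - 100 / (w - a)) ** r + (1 + 110 / (w - a)) ** r
    assert g(lo) < 2 < g(hi), "bracket needs w - 1100 < a < w - 100"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 2 else (lo, mid)
    return (lo + hi) / 2

w, a = 10_000, 9_500          # hypothetical observed wealth and shift
r = solve_r(w, a)
u = lambda z: (z - a) ** r    # utility function of the form in Eq. (16)

# Indifference: utility of wealth equals expected utility of the bet.
print(abs(u(w) - 0.5 * (u(w - 100) + u(w + 110))) < 1e-6)  # True
```

Pairs (a, r) below this level set yield rejection of the bet, which is the sense in which any cross-section of rejections can be rationalized subject by subject.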
5. SUMMARY IMPLICATIONS FOR THEORIES OF RISKY DECISIONS

Some implications for theories of decision making under risk are straightforward while others are nuanced.

5.1. Decision Theories on Unbounded Domains have Implausible Implications

One implication is that all theories in class D have the same problems with respect to the plausibility of modeling decisions under risk on an unbounded
domain. This conclusion follows from the demonstration that, on an unbounded domain, theories in class D are characterized by either the generalized St. Petersburg paradox or implausible aversion to risk of type (I). This raises doubts about the plausibility of classic developments of EU theory for risky decisions (Pratt, 1964; Arrow, 1971). But the plausibility critique is not confined to EU theory; it applies to all theories in a class that contains cumulative prospect theory, rank dependent utility theory, and the dual theory of EU (as well as EU theory). In this sense, the fundamental problems shared by these theories may be more significant than their much-touted differences.
5.2. Implications for Theory and its Applications on Bounded Domains

Theories of risky decisions defined on bounded domains can be characterized by the generalized St. Petersburg paradox, or by implausible large stakes risk aversion, or by neither problem. Conclusions for theory on bounded domains are more nuanced, and more complicated, but they are empirically testable. Concavity calibration of postulated patterns of risk aversion has implausible large stakes risk aversion implications for all theories in class D that incorporate decreasing marginal utility of money, except for specific versions of prospect theory that postulate variable reference points (which are rejected by tests with the data). Implausible implications from calibrating postulated patterns of risk aversion are not confined to theories with decreasing marginal utility of money. The dual theory of EU, characterized by constant marginal utility of money, can be critiqued with convexity calibration of the probability transformation that exclusively incorporates risk aversion into this theory. Whether critiques with the generalized St. Petersburg paradox or (calibrated) implausible large stakes risk aversion have bite for theories defined on bounded domains is an empirical question. The reason for this is apparent: people may accept feasible St. Petersburg bets and/or they may not reject the small stakes bets postulated in calibrations. If both of those outcomes were observed, the St. Petersburg paradox and calibration critiques would not imply implausibility of theory on bounded domains. To date, the empirical evidence is limited. As discussed above, even on very large bounded domains, expected values of St. Petersburg bets are quite small, of the order of $25, which (for what it's worth) is consistent with commonly reported subjects' statements about
willingness to pay to play the bets in hypothetical experiments. In one real payoff experiment with finite St. Petersburg bets reported by Cox et al. (2008a), 30–67% of the subjects revealed risk preferences that were inconsistent with the expected value model. No existing study supports the conclusion that terminal wealth models are more vulnerable to calibration critique than income models. There are various misstatements in the literature about the existence of data supporting Rabin's (2000) supposition that an agent will reject a given small stakes bet at all initial wealth levels in a wide interval. In fact, there is no test of Rabin's supposition in the literature. Furthermore, a test of this supposition would, in any case, have no implications for models in which income rather than terminal wealth is the argument of utility functionals (Cox & Sadiraj, 2006). The within-subjects, real payoff Calcutta experiment with concavity calibration reported by Cox et al. (2008b) has implications for all three EU models, rank dependent utility theory, and the original version of cumulative prospect theory with constant reference point equal to zero income (Tversky & Kahneman, 1992). In the Calcutta experiment, 25–62% of the subjects made patterns of small stakes risky choices for which EU theory, rank dependent utility theory, and prospect theory (with zero reference point payoff) imply implausible large stakes risk aversion. Variable reference points can be incorporated into prospect theory in ways that immunize the theory against concavity calibration critique with this experimental design. But the testable implication of this version of prospect theory has a high rate of inconsistency with data from the Calcutta experiment and is rejected in favor of the "calibration pattern" by a likelihood ratio test.
The Magdeburg experiment with convexity calibration for probability transformations (Cox et al., 2008b) has implications for the dual theory of EU, which has constant marginal utility of money and incorporates risk aversion solely through non-linear transformation of probabilities. In this experiment, 66–95% of the subjects made patterns of risky choices for which the dual theory of EU implies implausible large stakes risk aversion. We conclude that, together, the Calcutta concavity calibration experiment and the Magdeburg convexity calibration experiment provide data that invite skepticism about the plausibility of popular theories of decision making in risky environments. However, more experiments and larger samples are needed to arrive at definitive conclusions about the empirical relevance of the calibration propositions. One thing that is clear is that the traditional focus on decreasing marginal utility of money as the source of implausible
implications from calibration of postulated patterns of risk aversion is wrong; modeling risk aversion with probability transformations can also produce implausible implications from calibration. Empirical conclusions that estimated parametric forms of utility functionals can represent subjects' behavior in risky decision making can be checked for plausibility by applying the research methods explained here. Two types of questions can be posed. First, does the estimated parametric form survive testing with St. Petersburg lotteries that can be derived from the parametric form using methods explained above? Second, does the estimated parametric form of a utility functional survive experimental testing with binary lottery designs that can be derived from the parametric form using methods explained above? If the answer to either question is "no" then the conclusion that the estimated utility functional can rationalize risk taking behavior is called into question.
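The first of these checks can be made concrete. The sketch below is our construction, following the idea that St. Petersburg payoffs can be derived from an estimated utility function (as in Note 6): with CRRA utility over income, u(y) = y^ρ, setting the payoff for a first head on flip n to z_n = u⁻¹(2ⁿ) makes each branch contribute one unit of expected utility. The estimate ρ = 0.5 and the ten-flip truncation are hypothetical.

```python
# Deriving a finite St. Petersburg test lottery from a fitted CRRA
# utility (our sketch; rho and the truncation length are assumptions).

def st_petersburg_payoffs(rho, n_flips):
    """z_n = u^{-1}(2**n) = (2**n)**(1/rho): payoff when the first head
    occurs on flip n, so u(z_n) * (1/2**n) = 1 for every branch."""
    return [(2 ** n) ** (1 / rho) for n in range(1, n_flips + 1)]

def expected_utility(payoffs, rho):
    return sum(z ** rho / 2 ** n for n, z in enumerate(payoffs, start=1))

rho = 0.5                                # hypothetical CRRA estimate
zs = st_petersburg_payoffs(rho, n_flips=10)
eu = expected_utility(zs, rho)           # about 10: one util per branch
ce = eu ** (1 / rho)                     # certainty equivalent, about $100
ev = sum(z / 2 ** n for n, z in enumerate(zs, start=1))  # about $2,046
print(ce, ev, max(zs))
```

The fitted model thus predicts indifference between roughly $100 for sure and a game with an expected value of about $2,046 and a top prize over $1 million; whether subjects actually behave that way is exactly the testable question posed in the text.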
NOTES

1. The EV theory of risk preferences has the same implications if terminal wealth rather than income is assumed to be the random lottery payoff in the functional in statement (1).

2. The expected utility of income model was used to develop much of Bayesian–Nash equilibrium bidding theory. See, for examples: Holt (1980), Harris and Raviv (1981), Riley and Samuelson (1981), Cox, Smith, and Walker (1982), Milgrom and Weber (1982), Matthews (1983), Maskin and Riley (1984), and Moore (1984).

3. See Harrison, List, and Towe (2007) and Heinemann (2008) for empirical applications of partial asset integration models.

4. We write the functional for rank dependent utility theory with transformation of cumulative probabilities in the same way as Quiggin (1982, 1993). Some later expositions of this theory use a logically equivalent representation with transformation of decumulative probabilities.

5. Loss aversion, when defined as a discontinuity in the slope of the utility function at zero income, is consistent with the expected utility of income model (Cox & Sadiraj, 2006).

6. If there exists an inverse function φ_EU⁻¹ then the sequence of payoffs z_n = φ_EU⁻¹(2ⁿ) provides a generalized St. Petersburg game with infinite expected utility.

7. In the context of the expected utility of terminal wealth model, utility function (12) represents constant absolute risk averse preferences, which is the source of the name CARA.

8. For the case of the expected utility of terminal wealth model, power function utility represents constant relative risk averse preferences, which is the source of the name CRRA.

9. Tversky and Kahneman (1992, p. 300) provide the value g_D(0.5) = 0.42 (where, in our notation, g_D is the same as their probability weighting function for gains, w⁺).
10. Clearly, Proposition 2 does not apply to expected value theory and the dual theory of expected utility because their money transformation functions are (linear and hence) unbounded.

11. As cited in Holt and Laury (2002, fn. 9, p. 1649), CRRA estimates in the range 0.44–0.67 were reported by Campo et al. (2000), Chen and Plott (1998), Cox and Oaxaca (1996), Goeree and Holt (2004), and Goeree, Holt, and Palfrey (2002, 2003). Harrison, Lau, and Rutström (2007) report CRRA estimates within the same range using field experiment data.

12. In contrast, this type of experiment could not produce data that would have a calibration-pattern implication for any of the other models discussed above for which income, not terminal wealth, is the argument of the utility functional (Cox & Sadiraj, 2006). However, this type of experiment would have a testable implication for all other models in class D: (a) always choose the risky option with EV theory; or (b) always choose the same option with other theories.
ACKNOWLEDGMENT

We thank Glenn W. Harrison and Nathaniel T. Wilcox for helpful comments and suggestions. Financial support was provided by the National Science Foundation (grant numbers DUE-0622534 and IIS-0630805).
REFERENCES

Arrow, K. J. (1971). Essays in the theory of risk-bearing. Chicago, IL: Markham.
Barberis, N., Huang, M., & Thaler, R. (2003). Individual preferences, monetary gambles and the equity premium. NBER Working Paper 9997.
Bernoulli, D. (1738). Specimen Theoriae Novae de Mensura Sortis. Commentarii Academiae Scientiarum Imperialis Petropolitanae, 5, 175–192. English translation (1954): Exposition of a new theory on the measurement of risk. Econometrica, 22, 23–36.
Camerer, C., & Thaler, R. H. (2003). In honor of Matthew Rabin: Winner of the John Bates Clark medal. Journal of Economic Perspectives, 17, 159–176.
Camerer, C. F., & Ho, T. (1994). Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty, 8, 167–196.
Campo, S., Perrigne, I., & Vuong, Q. (2000). Semi-parametric estimation of first-price auctions with risk aversion. Working Paper. University of Southern California.
Chen, K., & Plott, C. R. (1998). Nonlinear behavior in sealed bid first-price auctions. Games and Economic Behavior, 25, 34–78.
Cox, J. C., & Oaxaca, R. L. (1996). Is bidding behavior consistent with bidding theory for private value auctions? In: R. M. Isaac (Ed.), Research in experimental economics (Vol. 6). Greenwich, CT: JAI Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60.
Cox, J. C., Sadiraj, V., & Vogt, B. (2008a). On the empirical relevance of St. Petersburg lotteries. Experimental Economics Center Working Paper 2008-05. Georgia State University.
Cox, J. C., Sadiraj, V., Vogt, B., & Dasgupta, U. (2008b). Is there a plausible theory for decision under risk? Experimental Economics Center Working Paper 2008-04. Georgia State University.
Cox, J. C., Smith, V. L., & Walker, J. M. (1982). Auction market theory of heterogeneous bidders. Economics Letters, 9, 319–325.
Goeree, J. K., & Holt, C. A. (2004). A model of noisy introspection. Games and Economic Behavior, 47, 365–382.
Goeree, J. K., Holt, C. A., & Palfrey, T. (2002). Quantal response equilibrium and overbidding in private-value auctions. Journal of Economic Theory, 104, 247–272.
Goeree, J. K., Holt, C. A., & Palfrey, T. (2003). Risk averse behavior in generalized matching pennies games. Games and Economic Behavior, 45, 97–113.
Harless, D., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1290.
Harris, M., & Raviv, A. (1981). Allocation mechanisms and the design of auctions. Econometrica, 49, 1477–1499.
Harrison, G. W., Lau, M., & Rutström, E. E. (2007). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109, 341–368.
Harrison, G. W., List, J., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75, 433–458.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Greenwich, CT: JAI Press, Research in Experimental Economics.
Heinemann, F. (2008). Measuring risk aversion and the wealth effect. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Greenwich, CT: JAI Press, Research in Experimental Economics.
Hey, J., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1326.
Holt, C. A., Jr. (1980). Competitive bidding for contracts under alternative auction procedures. Journal of Political Economy, 88, 433–445.
Holt, C. A., & Laury, S. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Holt, C. A., & Laury, S. (2005). Risk aversion and incentive effects: New data without order effects. American Economic Review, 95, 902–912.
Kahneman, D. (2003). A psychological perspective on economics. American Economic Review Papers and Proceedings, 93, 162–168.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Laffont, J. J. (1989). The economics of uncertainty and information. Cambridge, MA: MIT Press.
Maskin, E., & Riley, J. (1984). Optimal auctions with risk averse buyers. Econometrica, 52, 1473–1518.
Matthews, S. A. (1983). Selling to risk averse buyers with unobservable tastes. Journal of Economic Theory, 30, 370–400.
Milgrom, P. R., & Weber, R. J. (1982). A theory of auctions and competitive bidding. Econometrica, 50, 1089–1122.
Moore, J. (1984). Global incentive constraints in auction design. Econometrica, 52, 1523–1536.
Neilson, W. S. (2001). Calibration results for rank-dependent expected utility. Economics Bulletin, 4, 1–5.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Boston, MA: Kluwer Academic Publishers.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Rabin, M., & Thaler, R. H. (2001). Anomalies: Risk aversion. Journal of Economic Perspectives, 15, 219–232.
Rieger, M. O., & Wang, M. (2006). Cumulative prospect theory and the St. Petersburg paradox. Economic Theory, 28, 665–679.
Riley, J. G., & Samuelson, W. F. (1981). Optimal auctions. American Economic Review, 71, 381–392.
Royal Swedish Academy of Sciences (2002). Foundations of behavioral and experimental economics: Daniel Kahneman and Vernon Smith. Advanced Information on the Prize in Economic Sciences, 17, 1–25.
Rubinstein, A. (2006). Dilemmas of an economic theorist. Econometrica, 74, 865–883.
Saha, A. (1993). Expo-power utility: A 'flexible' form for absolute and relative risk aversion. American Journal of Agricultural Economics, 75, 905–913.
Samuelson, P. A. (1977). St. Petersburg paradoxes: Defanged, dissected, and historically described. Journal of Economic Literature, 15, 24–55.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.
Wakker, P. P. (2005). Formalizing reference dependence and initial wealth in Rabin's calibration theorem. Working Paper. Econometric Institute, Erasmus University, Rotterdam.
Wu, G., & Gonzalez, R. (1996). Curvature of the probability weighting function. Management Science, 42, 1676–1690.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55, 95–115.
APPENDIX

A.1 Lemma. Let functions h: [0, 1] → [0, 1], such that h(p_n; P_{−n}) ≠ 0 for all p_n ≠ 0, and φ: ℝ → ℝ be given. If φ is unbounded from above then (a) for all n ∈ ℕ, there exists x_n ∈ ℕ such that φ(x_n) ≥ 1/h(1/2ⁿ; (1/2¹)_{−n}), and (b) Σ_{n∈ℕ} h(1/2ⁿ; (1/2¹)_{−n}) φ(x_n) = ∞.
Proof. (a) It follows from h(1/2ⁿ; (1/2¹)_{−n}) ≠ 0, for all n ∈ ℕ, and the definition of a function being unbounded from above. (b) Part (a) and h(1/2ⁿ; (1/2¹)_{−n}) > 0 imply φ(x_n) h(1/2ⁿ; (1/2¹)_{−n}) ≥ 1; hence (b) is true.

A.2 Corollary: An Affordable Version of the Generalized St. Petersburg Game. Let an agent's preferences ≽_{φ,h} on a lottery space be represented by functional (6) with an unbounded money transformation function φ and probability transformation function h. Define X_{φ,h} = {x_n | n ∈ ℕ, x_n = sup{φ⁻¹(1/h(1/2ⁿ; (1/2¹)_{−n}))}}. For any given N,

φ⁻¹(N + 1) ≺_{φ,h} {x₁; 1/2¹}_N

where {x₁; 1/2¹}_N is a St. Petersburg game that pays x_n ∈ X_{φ,h} when a fair coin comes up heads for the first time on flip n, for n < N, and x_N such that φ(x_N) = 2/Σ_{n≥N} h(1/2ⁿ; (1/2¹)_{−n}), otherwise.
Proof. Note that

U({x₁; 1/2¹}_N) = Σ_{n<N} h(1/2ⁿ; (1/2¹)_{−n}) φ(x_n) + φ(x_N) Σ_{n≥N} h(1/2ⁿ; (1/2¹)_{−n}) ≥ (N − 1) + 2 = N + 1.

A.3 Implausible Risk Aversion of Type (I). Let an agent's preferences on a lottery space be represented by a theory D with functional (6) with a bounded money transformation function. For any given x there exists an L such that x + L ≻_D {y, 0.5; x} for every payoff y, however large.

Proof. Let φ be the money transformation function and g(0.5) be the transformed probability of 0.5 under decision theory D. Function φ is bounded from above and positively monotonic, so there exists an A such that (i) A = sup_x{φ(x)}, and (ii) lim_{x→∞} φ(x) = A. For any given x, take ε = (A − φ(x))(1 − g(0.5)) > 0 and apply (ii) to find a z_x such that φ(z_x) > A − ε. To complete the proof, take L = z_x − x, substitute the expression for ε into the last inequality, and verify that φ(x + L) > A g(0.5) + φ(x)(1 − g(0.5)).
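The argument in A.3 can be checked numerically. The sketch below is our illustration, not the authors' code: it uses the bounded transformation φ(x) = x/(1 + x), so A = 1, and the transformed probability g(0.5) = 0.42 (the Tversky–Kahneman value cited in Note 9); both functional-form choices are assumptions.

```python
# Numeric check of A.3: with bounded phi, some sure payoff x + L beats a
# 50/50 lottery between x and ANY larger prize (illustrative phi and g).

phi = lambda x: x / (1 + x)   # bounded above by A = 1, strictly increasing
A, g = 1.0, 0.42
x = 10.0

eps = (A - phi(x)) * (1 - g)               # epsilon from the proof of A.3
z = (A - eps) / (1 - (A - eps)) + 1e-6     # z with phi(z) > A - eps
L = z - x

# Transformed value of the lottery {y, 0.5; x} is at most g*A + (1-g)*phi(x)
# no matter how large y is:
lottery_bound = g * A + (1 - g) * phi(x)
print(L)                                   # about 8: a sure payoff near $18
print(phi(x + L) > lottery_bound)          # True: x + L beats the lottery
```

So an agent with this bounded transformation would take roughly $18 for sure over a coin flip between $10 and any prize whatsoever, which is the "implausible risk aversion of type (I)" invoked in the text.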
RISK AVERSION IN THE LABORATORY

Glenn W. Harrison and E. Elisabet Rutström

ABSTRACT

We review the experimental evidence on risk aversion in controlled laboratory settings. We review the strengths and weaknesses of alternative elicitation procedures, the strengths and weaknesses of alternative estimation procedures, and finally the effect of controlling for risk attitudes on inferences in experiments.
Attitudes to risk are one of the primitives of economics. Individual preferences over risky prospects are taken as given and subjective in all standard economic theory. Turning to the characterization of risk in applied work, however, one observes many restrictive assumptions being used. In many cases individuals are simply assumed to be risk neutral;1 or perhaps to have the same constant absolute or relative aversion to risk.2 Assumptions buy tractability, of course, but at a cost. How plausible are the restrictive assumptions about risk attitudes that are popularly used? If they are not plausible, perhaps there is some way in which one can characterize the distribution of risk attitudes so that it can be used to analyze the implications of relaxing these assumptions. If so, such characterizations will condition inferences about choice behavior under uncertainty, bidding in auctions, and behavior in games.

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 41–196
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00003-3
We examine the design of experimental procedures that can be used to estimate the risk attitudes of individuals. We also investigate how the data generated by these procedures should be analyzed. We focus on procedures that allow "direct" estimation of risk preferences by eliciting choices in non-interactive settings, since we want to minimize the role of auxiliary or joint hypotheses about Nash Equilibrium (NE) behavior in games. It is important to try to get estimates that are independent of such joint assumptions, in order that the characterizations that emerge can be used to provide tighter tests of those joint assumptions.3 Nevertheless, we also include designs that rely on subjects recognizing a dominant strategy response in a game against the experimenter, although we will note settings in which the presumption that subjects actually use these strategies might be suspect.4 In Section 1 we consider the major procedures used to elicit risk attitudes. In Section 2 we review the alternative ways in which risk attitudes have been estimated from observed behavior using these procedures. In Section 3 we examine the manner in which measures of risk attitudes are used to draw inferences about lab behavior. Section 4 offers some thoughts on several open and closed issues, and Section 5 draws some grand conclusions. Our review is intended to complement the review by Cox and Sadiraj (2008) of theoretical issues in the use of concepts of risk aversion in experiments, as well as the review by Wilcox (2008a) of econometric issues involved in identifying risk attitudes when there is allowance for unobserved heterogeneity and "mistakes" by subjects. We take some positions on these theoretical and econometric issues, but leave detailed discussion to their surveys. We default to thinking of risk attitudes as synonymous with the properties of the utility function, consistent with traditional expected utility theory (EUT) representations.
When we consider rank-dependent and sign-dependent specifications, particularly in Sections 2 and 3, the term "risk attitudes" will be viewed more broadly to take into account the effects of more than just the curvature of the utility function. Appendix A descriptively reviews the manner in which the humble "lottery" has been represented in laboratory experiments. Although we do not focus on the behavioral effects that may arise from the framing of the lotteries, we need to be aware that the stimulus provided to subjects often varies significantly from experiment to experiment. In effect, we experimenters are assuming that the subject views the lottery the way we view the lottery; the validity of this assumption of common knowledge between subject and observer rests, in large part, on the representation chosen by the experimenter. Some day a systematic comparison of the effects of these
Risk Aversion in the Laboratory
alternatives on risk attitudes should be undertaken, but here we simply want to provide a reminder that alternative representations exist and are used.5 We return to this issue much later, since it relates to the manner in which laboratory experiments might provide artifactual representations of the uncertainty subjects face in the field. In Appendices B, C, D, and E we examine in some depth the data and inferences drawn from four heavily cited studies of risk attitudes. The objective is to be very clear as to what these studies find, and what they do not find, since references to the literature are often casual and sometimes even inaccurate. Appendices B and C focus on two bona fide classics in the area. Hey and Orme (1994) (HO) introduced a robust experimental design to test EUT, a maximum likelihood (ML) estimation procedure that does not impose parametric functional forms, and a careful discussion of the role of "errors" when making inferences about risk attitudes. Holt and Laury (2002) (HL) introduced a justifiably popular method for eliciting risk attitudes for an individual, as well as important innovations in the ML estimation of risk aversion that go beyond simplistic functional forms. Appendices D and E focus on two studies that illustrate the problems that arise when experiments suffer from design issues or draw general inferences from restrictive models. Kachelmeier and Shehata (1992) (KS) apply an elicitation procedure that is popular, but which generates so much noise that reliable inferences cannot be drawn. Gneezy and Potters (1997) (GP) consider the important issue of the effect of "evaluation periods" on risk attitudes, but confound that valuable objective with extremely restrictive specifications of risk attitudes, leading them to incorrectly conclude that risk attitudes change with evaluation periods.
In each of these studies there is an important objective; in the one case, examining risk attitudes among very poor subjects for whom the stakes are huge, and in the other case, considering the framing of the choice in a fundamental manner. But the problems with each study show why one has to pay proper attention to design and inferential issues before drawing reliable conclusions. We conclude that there is systematic evidence that subjects in laboratory experiments behave as if they are risk averse. Some subjects tend towards a mode of risk neutrality (RN), but very few exhibit risk-loving behavior. The degree of risk aversion is modest, but does exhibit heterogeneity that is correlated with observable individual characteristics. Some risk elicitation methods are expected to provide more reliable estimates than others, due to the simplicity of the task and the transparency of the incentives to respond truthfully. Limited evidence exists on the
stability of risk attitudes across elicitation instruments, but there is some evidence to indicate that roughly equal measures of risk aversion can be obtained in the laboratory using a variety of procedures that are a priori attractive. There are also several methods for eliciting risk that we do not recommend. Inferences about risk attitudes can be undertaken using several empirical approaches. One approach is to infer bounds on parameters for a limited class of (one-parameter) utility functions, but a preferable approach is to estimate a latent structural model of choice. Developments in statistical software now allow experimenters to undertake such structural estimation using ML methods. In addition, inferences about risk attitudes depend on whether the data generating process is viewed through the lens of a single model of choice behavior: there is striking evidence that two or more models may have support from different subjects or different task domains. Appropriate statistical tools exist that allow one to model the extent to which one model or another is favored by the data, and for which subjects and task domains. We review evidence that subjects exhibit some modest amounts of probability weighting, and some controversial evidence concerning the extent of loss aversion. Much of the behavioral folklore on probability weighting and loss aversion has employed elicitation procedures and/or statistical methods that are piecemeal or ad hoc. Our final topic for discussion is how the characterization of behavior in a wide range of experimental tasks is affected by the treatment of risk attitudes, or confounded by the lack of such a treatment. Examples reviewed here include tests of EUT, estimates of discount rates, and evaluations of alternative models of bidding behavior in auctions. One open issue, with the potential to undermine many inferences in experimental economics, is the extent to which sample selection is driven by risk attitudes.
A related concern is the reliability of measurements of treatment effects when subjects have some choice as to which treatment to participate in. In brief, risk attitudes play a central role in experimental economics, and the nuances of measuring and controlling them demand the attention of every experimenter.
1. ELICITATION PROCEDURES

Five general elicitation procedures have been used to ascertain risk attitudes from individuals in the experimental laboratory using non-interactive settings. The first is the Multiple Price List (MPL), which entails giving
the subject an ordered array of binary lottery choices to make all at once. The MPL requires the subject to pick one of the lotteries on offer, and then the experimenter plays that lottery out for the subject to be rewarded. The second is a series of Random Lottery Pairs (RLP), in which the subject picks one of the lotteries in each pair, and faces multiple pairs in sequence. Typically one of the pairs is randomly selected for payoff, and the subject’s preferred lottery is then played out as the reward. The third is an Ordered Lottery Selection (OLS) procedure in which the subject picks one lottery from an ordered set. The fourth method is a Becker–DeGroot–Marschak (BDM) auction in which the subject is asked to state a minimum certainty-equivalent (CE) selling price to give up the lottery he has been endowed with. The fifth method is a hybrid of the others: the Trade-Off (TO) design, in which the subject is given lotteries whose prizes (or probabilities) are endogenously defined in real-time by prior responses of the same subject, and some CE elicited. We also review several miscellaneous elicitation procedures that have been proposed.
1.1. The Multiple Price List Design

The earliest use of the MPL design in the context of elicitation of risk attitudes is, we believe, Miller, Meyer, and Lanzetta (1969). Their design confronted each subject with five alternatives that constitute an MPL, although the alternatives were presented individually over 100 trials. The method was later used by Schubert, Brown, Gysler, and Brachinger (1999), Barr and Packard (2002), and Holt and Laury (2002). Appendix C reviews the HL experiments in detail. The HL instrument provides a simple test for risk aversion using an MPL design. Each subject is presented with a choice between two lotteries, which we can call A or B. Panel A of Table 1 illustrates the basic payoff matrix presented to subjects. The first row shows that lottery A offered a 10% chance of receiving $2 and a 90% chance of receiving $1.60. The expected value of this lottery, EVA, is shown in the third-last column as $1.64, although the EV columns were not presented to subjects.6 Similarly, lottery B in the first row has chances of payoffs of $3.85 and $0.10, for an expected value of $0.48. Thus, the two lotteries have a relatively large difference in expected values, in this case $1.17. As one proceeds down the matrix, the expected value of both lotteries increases, but the expected value of lottery B becomes greater than the expected value of lottery A. The subject chooses A or B in each row, and one row is later selected at random for payout for that subject. The logic behind this test for risk
Table 1. Lottery Choices in the Holt/Laury and Binswanger Risk Aversion Instruments.

A. Holt and Laury (2002) instrument with payoffs at the 1x level(a)

  Lottery A                    Lottery B                     EVA      EVB      Difference
  p($2)        p($1.60)        p($3.85)      p($0.10)
  0.1   $2     0.9   $1.60     0.1   $3.85   0.9   $0.10    $1.64    $0.48     $1.17
  0.2   $2     0.8   $1.60     0.2   $3.85   0.8   $0.10    $1.68    $0.85     $0.83
  0.3   $2     0.7   $1.60     0.3   $3.85   0.7   $0.10    $1.72    $1.23     $0.49
  0.4   $2     0.6   $1.60     0.4   $3.85   0.6   $0.10    $1.76    $1.60     $0.16
  0.5   $2     0.5   $1.60     0.5   $3.85   0.5   $0.10    $1.80    $1.98    -$0.17
  0.6   $2     0.4   $1.60     0.6   $3.85   0.4   $0.10    $1.84    $2.35    -$0.51
  0.7   $2     0.3   $1.60     0.7   $3.85   0.3   $0.10    $1.88    $2.73    -$0.84
  0.8   $2     0.2   $1.60     0.8   $3.85   0.2   $0.10    $1.92    $3.10    -$1.18
  0.9   $2     0.1   $1.60     0.9   $3.85   0.1   $0.10    $1.96    $3.48    -$1.52
  1     $2     0     $1.60     1     $3.85   0     $0.10    $2.00    $3.85    -$1.85

B. Binswanger (1980, 1981) instrument with payoffs at the rupees 50 level(b)

  Alternative   Probability of    Bad Outcome       Probability of    Good Outcome      Expected
                Bad Outcome       (Indian Rupees)   Good Outcome      (Indian Rupees)   Value
  O             1/2               50                1/2               50                50
  A             1/2               45                1/2               95                70
  B             1/2               40                1/2               120               80
  B*            1/2               35                1/2               125               80
  C             1/2               30                1/2               150               90
  C*            1/2               20                1/2               160               90
  E             1/2               10                1/2               190               100
  F             1/2               0                 1/2               200               100

(a) Experiments were also conducted at the 20x, 50x, and 90x levels.
(b) Experiments were also conducted at the rupees 0.5 level (compared to alternative O) and at the rupees 5 level, with roughly 2 weeks between sessions.
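The expected-value columns in both panels can be reproduced with a few lines of arithmetic; the following sketch simply transcribes the payoffs from Table 1:

```python
# Panel A: Holt and Laury (2002). Lottery A pays $2.00 or $1.60, lottery B
# pays $3.85 or $0.10; the probability p of the high prize rises from 0.1 to 1.
probs = [i / 10 for i in range(1, 11)]
ev_a = [p * 2.00 + (1 - p) * 1.60 for p in probs]
ev_b = [p * 3.85 + (1 - p) * 0.10 for p in probs]
for p, a, b in zip(probs, ev_a, ev_b):
    print(f"p = {p:.1f}   EVA = {a:.3f}   EVB = {b:.3f}")
# The table reports these to the nearest cent (e.g., EVB = $0.475 in the first
# row appears as $0.48); the EVA-EVB difference turns negative from row 5 on.

# Panel B: Binswanger (1980, 1981). Each alternative pays its bad or good
# outcome with probability 1/2, so the expected value is the simple average.
bad = [50, 45, 40, 35, 30, 20, 10, 0]
good = [50, 95, 120, 125, 150, 160, 190, 200]
ev = [(b + g) / 2 for b, g in zip(bad, good)]
print(ev)  # [50.0, 70.0, 80.0, 80.0, 90.0, 90.0, 100.0, 100.0]
```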
aversion is that only risk-loving subjects would take lottery B in the first row, and only risk-averse subjects would take lottery A in the second-last row. Arguably, the last row is simply a test that the subject understood the instructions, and has no relevance for risk aversion at all.7 A risk-neutral subject should switch from choosing A to B when the EV of each is about the same, so a risk-neutral subject would choose A for the first four rows and B thereafter. The HL instrument is typically applied using a random lottery incentive procedure in which one row is selected to be played out according to the choices of the subjects, rather than all rows being played out. But that is not an essential component of the instrument, even if it is popular and widely used in many experiments to save scarce experimental funds. We discuss the random lottery incentive procedure in detail in Section 3.8. The MPL instrument has one apparent weakness as an elicitation procedure: it might suggest a frame that encourages subjects to select the middle row, contrary to their unframed risk preferences. The antidote for this potential problem is to devise various "skewed" frames in which the middle row implies different risk attitudes, and see if there are differences across frames. Simple procedures to detect such framing effects, and to correct for them statistically if present, have been developed (e.g., Harrison, Lau, Rutström, & Sullivan, 2005; Andersen, Harrison, Lau, & Rutström, 2006; Harrison, List, & Towe, 2007). The evidence suggests that there may be some slight framing effect, but it is not systematic and can easily be allowed for in the estimation of risk attitudes. A variant of the MPL instrument was developed in the laboratory by Schubert et al. (1999).8 Figs. 1 and 2 illustrate the interface provided to subjects by Barr and Packard (2002), in a sequential field implementation of this variant used in Chile.
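The switching logic described above can be made concrete with a short simulation. In this sketch a hypothetical agent with constant relative risk aversion, u(x) = x^(1-r)/(1-r) (and u(x) = ln x at r = 1), works down the ten rows of panel A of Table 1 and chooses A whenever its expected utility is strictly higher:

```python
from math import log

def u(x, r):
    """CRRA utility; r is the coefficient of relative risk aversion."""
    return log(x) if r == 1 else x ** (1 - r) / (1 - r)

def safe_choices(r):
    """Number of rows (out of 10) in which the agent strictly prefers lottery A."""
    n = 0
    for i in range(1, 11):
        p = i / 10
        eu_a = p * u(2.00, r) + (1 - p) * u(1.60, r)
        eu_b = p * u(3.85, r) + (1 - p) * u(0.10, r)
        if eu_a > eu_b:
            n += 1
    return n

for r in (-0.5, 0.0, 0.5, 1.0):
    print(f"r = {r:+.1f}: {safe_choices(r)} safe choices")
# A risk-neutral agent (r = 0) makes exactly four safe choices; more
# risk-averse agents switch to B later, more risk-loving agents earlier.
```

The mapping is monotone, so the row at which a subject switches to B reveals an interval for r; that is how the instrument is typically read.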
Respondents were confronted with a series of gambles, framed as investment decisions, that elicited their CE for an uncertain lottery. Trained experimenters asked the respondents to imagine themselves as investors choosing whether to invest in Firm A, whose profits were determined by its chances of success or failure, or Firm B, whose profits were fixed regardless of how well it fared. The experimenter explained the probabilities of Firm A's success, the payoffs from Firm A in each state, and the fixed payoff from Firm B. The respondents were then asked to decide in which firm to invest. After registering their answer, the experimenter would raise the amount of the secure payoff, and ask the respondents to choose between the two firms again. As the amount of the secure payoff grew, investing in Firm A looked less attractive to a risk-averse respondent. In this way a CE, the point at which respondents would
Fig. 1. Primary MPL Instrument of Barr and Packard (2002). [The screen for Investment Decision 1 shows Firm A, which is very successful (profit = 3,000 P) with a 1 in 6 chance and not very successful (profit = 1,000 P) with a 5 in 6 chance, alongside Firm B, and asks: Do you choose Firm A or Firm B?]
no longer risk investing in Firm A, was elicited for each gamble. The probability of Firm A's failure was altered three times while keeping the state-specific payoffs constant, and in the fourth investment gamble the payoffs were altered. A risk-averse subject would state a value for Firm B below the expected value of Firm A, and a risk-loving subject would state a value for Firm B above the expected value of Firm A. The subject knew that the CE "price list" would span the range shown in Fig. 2 before the sequence began. Two variants of the MPL instrument were developed by Harrison et al. (2005d; Section 3.1), and studied at length by Andersen et al. (2006a). One is called the Switching MPL method, or sMPL for short, and simply changes the MPL to ask the subject to pick the switch point from one lottery to the other. Thus, it enforces monotonicity, but still allows subjects to express indifference at the "switch" point, akin to a "fat switch point." The subject was then paid in the same manner as with the MPL, but with the non-switch choices filled in automatically. The other variant is the Iterative MPL method, or iMPL for short. The iMPL extends the sMPL to allow the individual to make choices from refined options within the option last chosen. That is, if someone decides at some stage to switch from option A to option B between values of $10 and $20, the next stage of an iMPL would
Fig. 2. Slider in MPL Instrument of Barr and Packard (2002). [The tab for Investment Decision 1 displays a slider of secure profits for Firm B, running from 1,000 P to 3,000 P in increments of 200 P.]
then prompt the subject to make more choices within this interval, to refine the values elicited.9 The computer implementation of the iMPL restricts the number of stages to ensure that the intervals exceed some a priori cognitive threshold (e.g., probability increments of 0.001). The iMPL uses the same incentive logic as the MPL and sMPL.10 Another feature of the MPL should be noted, although it is not obvious that it is a weakness or a strength: the fact that subjects see all choices in one (ordered) table. One alternative is to have the subjects make each binary lottery choice in a sequence, embedding them into the RLP design of Section 1.2. It is possible that allowing the subject to see all choices in one frame might lead some subjects to make more consistent choices than they would otherwise. Which approach, then, is the correct one to use? The answer depends on the inferential objective of the design, and the external context that the implied measure of risk aversion is to be applied to. We view the MPL and RLP as two different elicitation procedures: their effect on behavior should be studied systematically, in the manner we illustrate later in Section 2.5. We do not believe that consistency should always be the primary criterion for selection across elicitation procedures, particularly when one allows formally for the stochastic choice process (Section 2.3 and Wilcox (2008a)) and the possibility that it could interact with the elicitation procedure in some manner. Evidence for different risk attitudes across procedures is, by definition, a sign of a procedural artifact. But that evidence needs to be documented with formal statistical models and, if present, recognized as a behavioral corollary of using that procedure. In summary, the set of MPL instruments provides a relatively transparent procedure to elicit risk attitudes. 
Subjects rarely get confused about the incentives to respond truthfully, particularly when the randomizing devices are physical dice that they know they will toss themselves.11 As we demonstrate later, it is also possible to infer a risk attitude interval for the specific subject, at least under some reasonable assumptions.
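One way to formalize that interval inference, under EUT with a one-parameter CRRA utility function u(x) = x^(1-r)/(1-r), is to solve each row of the HL table for the r at which a subject would be exactly indifferent between A and B; a subject switching from A to B at row n then reveals an r between the indifference values for rows n-1 and n. A sketch using simple bisection (the bracketing interval [-3, 3] is an assumption of ours):

```python
from math import log

def u(x, r):
    """CRRA utility, with the usual log-utility limit at r = 1."""
    return log(x) if abs(r - 1) < 1e-12 else x ** (1 - r) / (1 - r)

def eu_gap(r, p):
    """EU(A) - EU(B) in the HL row where the high prize has probability p."""
    return (p * u(2.00, r) + (1 - p) * u(1.60, r)
            - p * u(3.85, r) - (1 - p) * u(0.10, r))

def indifference_r(p, lo=-3.0, hi=3.0, tol=1e-6):
    """Bisect for the CRRA coefficient making the agent indifferent in this row.
    The gap rises with r here: more risk aversion favours the safe lottery A."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eu_gap(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A subject who chooses A in rows 1-4 and switches to B at row 5 reveals:
lo, hi = indifference_r(0.4), indifference_r(0.5)
print(f"{lo:.2f} < r < {hi:.2f}")  # roughly -0.15 < r < 0.15
```

These bounds line up with the interval Holt and Laury (2002) report for four safe choices.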
1.2. The Random Lottery Pair Design

The RLP design has not been used directly to infer risk attitudes, but has been generally used to test the predictions of EUT. Hey and Orme (1994) used an extensive RLP design to estimate utility functionals over lotteries for individuals non-parametrically. The use of the random lottery design, coupled with treating each pairwise choice as independent, implicitly means that the estimates they provide rely on the EUT specification.
Related experimental data, from the earlier "preference reversal" debate, provide comparable evidence of risk aversion for smaller samples (see Grether and Plott, 1979 and Reilly, 1982). Additionally, many prominent experiments testing EUT provide observations based on a rich array of lotteries that vary in terms of probabilities and monetary prizes; for example, see Camerer (1989), Battalio, Kagel, and Jiranyakul (1990), Kagel, MacDonald, and Battalio (1990), Loomes, Starmer, and Sugden (1991), Harless (1992), and Harless and Camerer (1994). In most cases the published study only reports patterns of choices, with no information on individual characteristics, but they can be used to obtain general characterizations of risk attitudes for that subject pool. Hey and Orme (1994) asked subjects to make direct preference choices over 100 pairs of lotteries, in which the probabilities varied for four fixed monetary prizes of £0, £10, £20, and £30. Subjects could express direct preference for one lottery over the other, or indifference. One of the pairs was actually chosen at random at the end of the session for payout for each subject, and the subject's preferences over that pair applied. Some days later the same subjects were asked back to essentially repeat the task, facing the same lottery combinations in different presentation order. HO used pie charts to display the probabilities of the lotteries they presented to subjects. A sample display from their computer display to subjects is shown in Fig. 3. There is no numerical referent for the probabilities, which must be judged from the pie chart. As a check, what fraction would you guess that each slice is on the left-hand lottery? In fact, this lottery offers £10 with probability 0.625, and £30 with probability 0.375. The right-hand lottery offers the same probabilities, as it happens, but with prizes of £10 and £20, respectively. Fig.
4 illustrates a modest extension of this display to include information on the probabilities of each pie slice, and was used in a replication and extension of the HO experiments by Harrison and Rutström (2005). HO used their data to estimate a series of utility functionals over lotteries, one for each subject, since there were 100 observations for each subject in each task. This is a unique data set since most other studies rely on pooled data over individuals and the presumption that unobserved heterogeneity (after conditioning on any collected individual characteristics, such as sex and race and income) is random. The EUT functional that HO estimated was non-parametric, in the sense that they directly estimated the utility of the two intermediate outcomes, normalizing the lowest and highest to 0 and 1, respectively. This attractive approach works well when there are a small number of final outcomes
Fig. 3. Lottery Display Used by Hey and Orme (1994).
across many choices, as here, but would not be statistically efficient if there had been many outcomes. In that case it would be appropriate to use some parametric functional form for utility, and estimate the parameters of that function. We illustrate these points later. The RLP instrument is typically used in conjunction with the random lottery payment procedure in which one choice is picked to be played out, but this is again not essential to the logical validity of the instrument. The great advantage of the RLP instrument is that it is extremely easy to explain to subjects, and the incentive compatibility of truthful responses is apparent. In contrast to the MPL, it is generally not possible to directly infer a risk attitude from the pattern of responses, and some form of estimation is needed. We illustrate such estimations later.
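A toy version of such an estimation can convey the idea. In this sketch (our construction, not HO's code) a synthetic subject with known utilities over the four prizes £0, £10, £20, £30 makes noisy choices through a logistic ("Fechner") choice rule, and we recover u(10) and u(20) by maximum likelihood over a grid, normalizing u(0) = 0 and u(30) = 1 and treating the noise scale as known purely to keep the example short:

```python
import random
from math import exp, log

random.seed(7)
MU = 0.1                         # assumed (known) noise scale
TRUE_U = [0.0, 0.55, 0.83, 1.0]  # hypothetical utilities of 0, 10, 20, 30

def rand_lottery():
    w = [random.random() for _ in range(4)]
    s = sum(w)
    return [x / s for x in w]    # probabilities over the four prizes

def p_left(left, right, util):
    """Logistic probability of choosing the left lottery."""
    d = (sum(p * v for p, v in zip(left, util))
         - sum(p * v for p, v in zip(right, util)))
    return 1 / (1 + exp(-d / MU))

pairs = [(rand_lottery(), rand_lottery()) for _ in range(800)]
chose_left = [random.random() < p_left(l, r, TRUE_U) for l, r in pairs]

def log_lik(u10, u20):
    util = [0.0, u10, u20, 1.0]
    ll = 0.0
    for (l, r), c in zip(pairs, chose_left):
        p = p_left(l, r, util)
        ll += log(p) if c else log(1 - p)
    return ll

# Grid-search maximum likelihood over the two free utilities.
_, u10_hat, u20_hat = max(
    (log_lik(a / 20, b / 20), a / 20, b / 20)
    for a in range(21) for b in range(21)
)
print(u10_hat, u20_hat)  # should land near the true values 0.55 and 0.83
```

With many outcomes this direct approach becomes inefficient, which is exactly the point made above about switching to a parametric utility function.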
1.3. The Ordered Lottery Selection Design

The OLS design was developed by Binswanger (1980, 1981) in an early attempt to identify risk attitudes using experimental procedures with real
Fig. 4. Lottery Display for Hey and Orme (1994) Replication.
payoffs. Each subject is presented with a choice of eight lotteries, shown in each row of panel B of Table 1, and asked to pick one. Alternative O is the safe option, offering a certain amount. All other alternatives increase the average actuarial payoff but with increasing variance around that payoff. The lotteries were actually presented to subjects in the form of photographs of piles of money, to assist illiterate subjects. Each lottery had a generic label, such as the ones shown in the left column of panel B of Table 1. Fig. 5 shows the display used by Barr (2003) in a field replication of the basic Binswanger OLS instrument in Zimbabwe, and essentially matches the graphical display used in the original experiments (Hans Binswanger; personal communication). Because the probabilities for each lottery outcome are 1/2, this instrument can be presented relatively simply to subjects.12 The OLS instrument was first used in laboratory experiments by Murnighan, Roth, and Schoumaker (1987, 1988), although they only used the results to sort subjects into one group that was less risk averse than the other. Beck (1994) utilized it to identify risk aversion in laboratory subjects,
Fig. 5. Lottery Display of Binswanger Replication by Barr (2003).
prior to them making group decisions about the dispersion of everyone else's potential income. This allowed an assessment of the extent to which subjects in the second stage chose more egalitarian outcomes because they were individually averse to risk or because they cared about the distribution of income. Eckel and Grossman (2002, 2008) used the OLS instrument to directly measure risk attitudes, as well as in an innovative application in which subjects guessed the risk attitudes of other subjects. They found that subjects did appear to use sex stereotypes in guessing the risk attitudes of others. The OLS instrument is easy to present to subjects, but has two problems when used to make inferences about non-EUT models of choice behavior. The versions that restrict probabilities to 1/2 make it virtually impossible to use these responses to make inferences about probability weighting, which plays a major role in rank-dependent alternatives to EUT. Of course, there is nothing in the instrument itself that restricts the probabilities to 1/2, although that has been common. The second problem is that the use of the certain amount may frame the choices that subjects make in a manner that makes them "sign-dependent," such that the certain amount provides a reference point to identify gains and losses. This concern applies more broadly, of course, but in the OLS instrument there is a natural and striking reference point for (some) subjects to use. We consider both of these issues later when we consider inferences from observed choices. Engle-Warnick, Escobal, and Laszlo (2006) undertake laboratory experiments with the OLS instrument to test the effect of presenting the choices in different ways. The baseline mimics the procedures of Binswanger (1980, 1981) and Barr (2003), shown in Fig. 5, except that five lotteries were arrayed in a circle in an ordered counter-clockwise fashion, with the certain amount at 12 o'clock.
The first treatment then presents the ordered pairs of lotteries in a binary choice fashion, so that the subject makes four binary choices. The second treatment extends these binary choices by including a lottery that is dominated by one of the original binary pairs. The dominated lottery is always presented in between the non-dominated lotteries, so it appears to be physically intermediate. Each subject made 13 decisions, which were randomized in order and left–right presentation (for the undominated lotteries). The statistical analysis of the results is unfortunately couched in terms of ordinal measures of the degree of risk aversion, such as the number of safe choices, and it would be valuable to see the effect of these treatments on estimated measures of relative risk aversion (RRA) using more explicit statistical methods (e.g., per Section 2.2, and particularly Sections 2.5 and 2.6). But there is evidence that the instruments are positively correlated, although the correlation is significantly less than one.
In particular, the correlation between the baseline OLS instrument and the transformed binary choice version for Canadian university students is 0.63, but it is only 0.31 for Peruvian farmers. Moreover, the introduction of a dominated lottery appeared to have no significant effect on the correlation of risk attitudes for the Canadian university students, but considerable effects on the correlation for Peruvian farmers.
1.4. The Becker–DeGroot–Marschak Design

The original BDM design developed by Becker, DeGroot, and Marschak (1964) was modified by Harrison (1986, 1990) and Loomes (1988) for use as a test for risk aversion.13 This design was later used by McKee (1989), Kachelmeier and Shehata (1992) and James (2007) in similar exercises. The basic idea is to endow the subject with a series of lotteries, and to ask for the "selling price" of the lottery. The subject is told that a "buying price" will be picked at random, and if the buying price that is picked exceeds the stated selling price, the lottery will be sold and the subject will receive that buying price. If the buying price equals or is lower than the selling price, the subject keeps the lottery and plays it out. It is relatively transparent to economists that this auction procedure provides a formal incentive for the subject to truthfully reveal the CE of the lottery. However, it is not clear that subjects always understand this logic, and responses may be sensitive to the exact nature of the instructions given. For the instrument to elicit truthful responses, the experimenter must ensure that the subject realizes that the choice of a buying price does not depend on the stated selling price.14 If there is reason to suspect that subjects do not understand this independence, the use of physical randomizing devices (e.g., dice or bingo cages) may mitigate such strategic thinking. Of course, the BDM procedure is formally identical to a two-person Vickrey sealed-bid auction, with the same concerns about subjects not understanding dominant strategies without considerable training (Harstad, 2000; Rutström, 1998). A major concern when choosing elicitation formats is the strength of the incentives provided at the margin, that is, the magnitude of the losses generated by misrepresenting true preferences.
While the BDM is known to have weak incentives around the optimum (Harrison, 1992), the same is also true for other elicitation formats.15 Comparing the incentive properties of the BDM to the MPL in a pairwise evaluation of a safer and a riskier lottery, we find that the expected loss from errors in the latter is a weighted average of the losses implied for the safe and the risky evaluations
respectively in the BDM. The incentives in the BDM can be strengthened through a careful choice of the range of the buying prices and are generally stronger the higher is the variance of the lottery being valued.16 Plott and Zeiler (2005) express a concern with the way that the BDM mechanism is popularly implemented. Appendix D reviews in detail an application of the BDM mechanism for eliciting risk attitudes by Kachelmeier and Shehata (1992) and illustrates some possible problems. It may be possible to re-design the BDM mechanism to avoid some of these problems,17 but more attractive elicitation procedures are available.
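Both the truth-telling property and the weak marginal incentives of the BDM can be seen in a small example. The sketch below assumes a hypothetical risk-averse subject with u(x) = sqrt(x) who holds a lottery paying $10 or $0 with equal chance and faces a buying price drawn uniformly from [0, 10]; none of these numbers come from the studies above:

```python
from math import sqrt

EU_LOTTERY = 0.5 * sqrt(10) + 0.5 * sqrt(0)  # expected utility of keeping it
CE = EU_LOTTERY ** 2                         # certainty equivalent: $2.50

def expected_utility(s, m=10.0):
    """E[u] from stating selling price s: if the random buying price b > s the
    subject receives u(b); otherwise (probability s/m) the lottery is played.
    Uses the closed-form integral of sqrt(b)/m over [s, m]."""
    return (2 / 3) * (m ** 1.5 - s ** 1.5) / m + (s / m) * EU_LOTTERY

grid = [i / 100 for i in range(1001)]        # candidate selling prices
best = max(grid, key=expected_utility)
print(CE, best)                              # the optimum is the CE, $2.50

# ...but the objective is nearly flat near the optimum, so the incentive to
# be exactly truthful is weak at the margin:
loss = expected_utility(best) - expected_utility(best + 1.0)
print(round(loss, 4))  # -> 0.0149, under 1% of the utility level
```

Overstating the selling price by a full dollar costs this subject very little in expected utility, which is the "weak incentives around the optimum" point in the text.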
1.5. The Trade-Off Design

Wakker and Deneffe (1996) propose a TO method to elicit utility values which does not make any assumption about whether the subject weights probabilities. This is an advantage compared to the methods widely used in the "judgement and decision-making" literature, such as the CE or probability-equivalent methods,18 since those methods assume that there is no probability weighting. The TO method proceeds by asking the subject to consider two lotteries defined over prizes x0, x1, r, and R and probabilities p and 1-p: (x1, p; r, 1-p) and (x0, p; R, 1-p). It is assumed that R > r, that p is some fixed probability of receiving the first outcome, and that x0 is some fixed and small amount such as $0. The subject is asked to tell the experimenter what x1 would make him indifferent between these two lotteries. Call this stage 1 of the TO method. Then the subject is asked the same question about the lotteries (x2, p; r, 1-p) and (x1, p; R, 1-p) and asked to state the x2 that makes him indifferent between these two. Call this stage 2 of the TO method. If the subject responds truthfully to these questions, it is possible to infer that u(x2) - u(x1) = u(x1) - u(x0), using the logic explained by Wakker and Deneffe (1996, p. 1134). Setting u(x0) = 0, we can then infer that u(x2) = 2u(x1). A similar argument leads to an elicited x3 such that u(x3) = 3u(x1), and so on. If we wanted to stop at x3, we could then renormalize u(x1) to 1/3, so that we have elicited utility over the unit interval. The obvious problem with the TO method as implemented by Wakker and Deneffe (1996) is that it is not incentive compatible: subjects have a transparent incentive to overstate the value of x1, and indeed all other elicited amounts. Assume that subjects are to be incentivized in the obvious manner by one of the lotteries in each task being picked by a coin toss to be played out (or by just one such lottery being picked at random
over all three stages). First, by overstating x1 in stage 1 the subject increases the final outcome received if a lottery in stage 1 is used to pay him, because x1 is one of the outcomes in one of the lotteries in stage 1. Second, by overstating x1 in stage 1 the subject increases the final outcome received if a lottery in stage 2 is used to pay him, since x1 is used to define one of the lotteries in stage 2. Thus, we would expect some subject to ask us sheepishly in stage 1, "how large an x1 am I allowed to state?" It is surprising that the issue of incentive compatibility was not even discussed in Wakker and Deneffe (1996), but since the actual experiments they report were hypothetical, even an otherwise incentive-compatible mechanism could have had problems generating truthful answers. There is a recognition that the "chaining" of old responses into new lotteries might lead to error propagation (p. 1148), but that is an entirely separate matter from strategic misrepresentation. The TO method was extended by Fennema and van Assen (1999) to consider losses as well as gains. The experiments were all hypothetical, primarily to avoid the ethical problems of exposing subjects to real losses. The TO method was extended by Abdellaoui (2000) to elicit probability weights after utilities have been elicited. Real rewards were provided for one randomly selected binary choice in the gain domain for one randomly selected subject out of the 46 present, but the issue of incentive compatibility is not discussed. There is an attempt to elicit utility values in a non-sequential manner, which might make the chaining effect less transparent to inexperienced subjects, but again this only mitigates the second of the sources of incentive incompatibility.19 Bleichrodt and Pinto (2000) proposed a different way of extending the TO method to elicit probability weights, but only applied their method to hypothetical utility elicitation in the health domain. They provide a discussion (p.
1495) of ‘‘error propagation’’ that points to some of the literature on stochastic error specifications considered in Section 2.3, but in each case assume that the error has mean zero, which misses the point of the incentive incompatibility of the basic TO method. Abdellaoui, Bleichrodt, and Paraschiv (2007b) extend the TO method to elicit measures of loss aversion. Their experiments were for hypothetical rewards, and they do not discuss incentive compatibility.20
1.6. Miscellaneous Designs

There are several experimental designs that attempt to elicit risk attitudes that do not easily fit into one of the five major designs considered above.
We again ignore any designs that do not claim to elicit risk attitudes in any conceptual sense that an economist would recognize, even if those designs might elicit some measure that is empirically correlated in some settings with the measures of interest to economists. Fehr and Goette (2007) estimate a loss aversion parameter using a Blind Loss Aversion model of behavior, "extending" the Myopic Loss Aversion model of Benartzi and Thaler (1995); we review the latter model in detail in Section 3.5. They ask subjects to consider two lotteries, expressed here in equivalent dollars instead of Swiss Francs:

Lottery A: Win $4.50 with probability 1/2, lose $2.80 with probability 1/2. Otherwise get $0.

Lottery B: Play six independent repetitions of lottery A. Otherwise get $0.

Subjects could participate in both lotteries, neither, or either. Fehr and Goette (2007) assume that subjects have a linear utility function for stakes that are this small, relying on the theoretical arguments of Rabin (2000) rather than the data of Holt and Laury (2002) and others. They also assume that there is no probability weighting: even though Quiggin (1982; Section 4) viewed 1/2 as a plausible fixed point in probability weighting, most others have assumed or found otherwise. If one is blind to the effects of curvature of the utility function and of probability weighting, then the only thing left to explain choices over these lotteries is loss aversion. On the other hand, it becomes "heroic" to then extrapolate those estimates to explain behavior that one has elsewhere (p. 304) assumed to be characterized by stakes large enough that strictly concave utility is plausible a priori. Of course, the preferred model (p. 306) assumes away concavity and only uses the loss aversion parameter, but without explanation for why behavior over such stakes should be driven solely by loss aversion instead of risk attitudes more generally.21 Tanaka, Camerer, and Nguyen (2007) (TCN) propose a method to elicit risk and time preferences from individuals. They assume a certain parametric structure in their risk elicitation procedure, assuming Cumulative Prospect Theory (CPT): specifically, power Constant Relative Risk Aversion (CRRA) utility functions for gains and losses, and the one-parameter version of the Prelec (1998) probability weighting function. They further assume that the CRRA coefficient for gains and losses is the same. We consider these functional forms in detail in Sections 3.1 and 3.2. The upshot is they seek to elicit one parameter s that controls the concavity or convexity of the utility function, one parameter a that controls the curvature of the probability weighting function, and one parameter l that determines
the degree of loss aversion. Their elicitation procedure for time preferences is completely separate conceptually from their elicitation procedure for risk attitudes, and is not used to infer anything about risk preferences.22 To elicit the first two parameters, s and a, TCN ask subjects to consider three MPL sheets. The first sheet contains 14 options akin to those used in the Holt and Laury (2002) MPL procedure, shown in panel A of Table 1. The difference is that the probabilities of the high or low outcomes in each lottery stay constant from row to row, but the high prize in the "risky" lottery gets larger and larger: the risky lottery starts off in row 1 as "relatively risky" but with a relatively low expected value, and changes so that in the last row it becomes "extremely risky" but with a substantially higher expected value. The specific, fixed probabilities used are 0.3 for the high prize in the safe lottery and 0.1 for the high prize in the risky lottery. Subjects are asked to pick a switch point in this sheet, akin to the sMPL procedure of Andersen et al. (2006a); of course, this is just a monotonicity-enforcing variant of the basic MPL procedure of Holt and Laury (2002). So we can see that behavior in the first sheet would elicit an interval for s if we ignored probability weighting, just as it elicited an interval for the CRRA coefficient in Holt and Laury (2002; Table 3, p. 1649). But with probability weighting allowed, all we can infer from this choice are combinations of intervals for s and a. TCN indicate (p. 8, fn. 11) that the values of s and a they report are actually "rounded mid-points" of the intervals. For example, one interval they infer is 0.65 < s < 0.74 and 0.66 < a < 0.74, and they round this to the values s = 0.7 and a = 0.7. They note in a footnote to Table A1 (p. 33) that the boundaries of the intervals are approximated to the nearest 0.05 increments.
If subjects do not switch they use the approximate values at the last possible interval; in fact, the implied interval should have a finite value for a lower bound and ∞ for the upper bound, as noted by Coller and Williams (1999).23 For their particular parameters there are seven such combinations of interval pairs. The second sheet in the procedure of TCN is qualitatively the same as the first sheet, except that the probabilities of the high prize in each lottery are now 0.1 and 0.7. The specific prizes are different, but have the same structure as the first sheet. From the switching point in the second sheet one can derive another set of interval pairs for the parameters s and a. The values for these intervals will be different from the intervals derived from the first sheet, because of differences in the value of the prizes and probabilities. By crossing the two sets of intervals one can reduce the implied intervals to the intersections from the two sheets. Since the prizes in these two sheets involve gains, the loss aversion parameter l plays no role.
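The crossing of interval pairs across the two gain-frame sheets can be sketched as follows; the interval values below are hypothetical stand-ins for illustration, not the pairs TCN actually infer from their parameters:

```python
# Sketch of crossing the interval pairs implied by two MPL sheets.
# The numeric intervals are hypothetical illustrations, not TCN's tables.

def intersect(a, b):
    """Intersection of two closed intervals, or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def cross(pairs1, pairs2):
    """Keep each combination of (s-interval, a-interval) pairs from the
    two sheets whose s- and a-intervals both overlap."""
    out = []
    for s1, a1 in pairs1:
        for s2, a2 in pairs2:
            s, a = intersect(s1, s2), intersect(a1, a2)
            if s and a:
                out.append((s, a))
    return out

# Hypothetical inferences from the switch point observed on each sheet:
sheet1 = [((0.65, 0.74), (0.66, 0.74)), ((0.55, 0.65), (0.74, 0.85))]
sheet2 = [((0.60, 0.70), (0.70, 0.80))]
print(cross(sheet1, sheet2))
```

Only combinations whose s- and a-intervals both overlap across the two sheets survive, which is what shrinks the implied intervals.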
The third sheet in the procedure of TCN involves losses. There are seven options in which each lottery contains one positive prize and one negative prize, so these are "mixed lotteries." Probabilities of the high prize are fixed at 1/2 for all rows, and variations in three of the prizes occur from row to row. Conditional on a value of s from responses to the first two sheets, the response in the third sheet implies an interval for l. For example, if s = 0.2 then somebody switching at, say, row 4 in the third sheet would have revealed a loss aversion parameter such that 1.88 < l < 2.31, but if s = 1 then somebody switching at row 4 in the third sheet would have revealed a loss aversion parameter such that 1.71 < l < 2.42. The parameters for the third sheet were chosen, for a given observed response, so that the implied intervals for l did not differ widely as s varied over the expected range. Of course, the responses in the third sheet provide information on s as well as l. In other words, if one only observed responses from the third sheet there would be a number of interval pairs for s and l that could account for the data, just as there are a number of interval pairs of s and a that could rationalize the observed response in the first or second sheet. So, the TCN procedure implicitly imposes a recursive estimation structure, so that s is pinned down only from the responses in the first two sheets, and then the responses in the third sheet are used, conditional on some s, to infer bounds for l. This is a wily and parsimonious assumption, but might lead to different inferences than if one simply took all responses in these three sheets and simultaneously estimated s, a, and l, using ML methods discussed in Section 2.2. The TCN procedure generates no information on standard errors of estimates, but such information would be provided automatically with the use of ML methods.
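To see how the third sheet pins down an interval for l conditional on s, consider the following sketch. The mixed-lottery prizes here are hypothetical (not TCN's), and we assume away probability weighting at p = 1/2; with value function v(x) = x^s for gains and -l(-x)^s for losses, the l that equates two mixed lotteries at a given row follows in closed form:

```python
# Sketch of conditional inference of a loss aversion interval, in the
# spirit of the third sheet. Prizes below are hypothetical, and we assume
# no probability weighting at p = 1/2, so each outcome gets weight 1/2.

def lam_at_indifference(s, gain_a, loss_a, gain_b, loss_b):
    """The lam equating A = (gain_a, 1/2; loss_a, 1/2) and
    B = (gain_b, 1/2; loss_b, 1/2) under v(x) = x**s for gains and
    v(x) = -lam * (-x)**s for losses:
    gain_a**s - lam*(-loss_a)**s = gain_b**s - lam*(-loss_b)**s."""
    return (gain_a ** s - gain_b ** s) / ((-loss_a) ** s - (-loss_b) ** s)

# Hypothetical adjacent rows of a mixed-lottery menu:
row3 = dict(gain_a=12.0, loss_a=-4.0, gain_b=30.0, loss_b=-12.0)
row4 = dict(gain_a=12.0, loss_a=-4.0, gain_b=30.0, loss_b=-15.0)

# A switch between these rows brackets lam, conditional on s:
for s in (0.2, 1.0):
    bounds = sorted([lam_at_indifference(s, **row4),
                     lam_at_indifference(s, **row3)])
    print(f"s = {s}: {bounds[0]:.2f} < lam < {bounds[1]:.2f}")
```

With these arbitrary prizes the implied interval moves noticeably as s varies; TCN chose their third-sheet parameters precisely so that it would not.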
Although the parameters they derive are conditional on the specific functional forms assumed, and in some cases (e.g., the third sheet) chosen to generate relatively robust inferences assuming those parameterizations, it should be possible to recover estimates for some minor variations in functional form (e.g., Constant Absolute Risk Aversion (CARA) instead of CRRA).
2. ESTIMATION PROCEDURES

Two broad methods of estimating risk attitudes have been used. One involves the calculation of bounds implied by the observed choices, typically using utility functions that have only a single parameter to be inferred. The other involves the direct estimation by ML of some structural model of a latent choice process in which the core parameters defining risk attitudes
can be estimated, in the manner pioneered by Camerer and Ho (1994; Section 6.1) and Hey and Orme (1994). The latter approach is particularly attractive for non-EUT specifications, where several core parameters combine to characterize risk attitudes. For example, one cannot characterize risk attitudes under Prospect Theory (PT) without making some statement about loss aversion and probability weighting, along with the curvature of the utility function. Thus, joint estimation of all parameters is a necessity for reliable statements about risk attitudes in such cases. We first review examples of each approach (Sections 2.1 and 2.2), and then consider the role of stochastic errors (Section 2.3), the possibility of non-parametric estimation (Section 2.4), and a comparison of risk attitudes elicited from different procedures (Section 2.5) and treatments (Section 2.6). The exposition in this section focuses almost exclusively on EUT characterizations of risk attitudes. Alternative models are considered in Section 3.
2.1. Inferring Bounds

The HL data may be analyzed using a variety of statistical models. Each subject made 10 responses in each task, and typically made 30 responses over three different tasks. The responses in each task can be reduced to a scalar if one looks at the lowest row in panel A of Table 1 where the subject "switched" over to option B.24 This reduces the response to a scalar for each subject and task, but a scalar that takes on integer values between 0 and 10. In fact, over 83% of their data takes on values of 4 through 7, and 94% takes on values between 3 and 8. HL evaluate these data using ordinary least squares regression with the number of safe choices as the dependent variable, estimated on the sample generated by each task separately, and report univariate tests of demographic effects.25 They also report semi-parametric tests of the number of safe choices with experimental condition as the sole control. To study the effects of experimental conditions, while controlling for characteristics of the sample and the conduct of the experiment, one could employ an interval regression model, first proposed by Coller and Williams (1999) for an MPL experimental task (eliciting discount rates). The dependent variable in this analysis is the CRRA interval that each subject implicitly chose when they switched from option A to option B. For each row of panel A in Table 1, one can calculate the bounds on the CRRA coefficient that is implied, and these are in fact reported by Holt and Laury
(2002; Table 3). Thus, for example, a subject who made five safe choices and then switched to the risky alternatives would have revealed a CRRA interval between 0.15 and 0.41, and a subject who made seven safe choices would have revealed a CRRA interval between 0.68 and 0.97, and so on.26 When we consider samples that pool responses over different tasks for the same individual, we would use a random effects panel interval regression model to allow for the correlation of responses from the same subject. Using this panel interval regression model, we can control for all of the individual characteristics collected by HL, which include sex, age, race (Black, Asian, or Hispanic), marital status, personal income, household income, household size, whether the individual is the primary household budget decision-maker, an indicator of full-time employment, student status, faculty status, whether the person is a junior, senior, or graduate student, and whether the person has ever voted. In addition, dummy variables indicate specific sessions, and a separate indicator identifies those sessions conducted at Georgia State University. The treatment variables, of course, include the scale of payoffs (1×, 20×, 50×, or 90×), the order of the task (1, 2, 3, or 4), and the experimental income earned by the subject in task 3. Table 2 presents ML estimates of this interval regression model. Since each subject contributed several tasks, a random effects specification has been used to control for unobserved individual heterogeneity. One of the advantages of the use of inferred bounds for risk attitudes is that one can estimate detailed models such as in Table 2, since interval regression is a relatively stable statistical model and a straightforward extension of ordinary least squares. It is also easy to correct for multiplicative heteroskedasticity using this estimation method, although that can introduce convergence problems as a practical matter.
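These bounds can be reproduced numerically. The sketch below bisects for the CRRA coefficient that makes a subject indifferent in each row of the low-payoff HL menu, so that a subject switching after n safe choices reveals an interval between the crossings for rows n and n + 1; the prizes are those of Holt and Laury (2002), while the bracketing range is our own assumption for illustration:

```python
# Reproducing the CRRA interval bounds implied by MPL switch points in
# the low-payoff Holt and Laury (2002) menu: option A pays $2.00 or $1.60
# and option B pays $3.85 or $0.10, with high-prize probability n/10 in
# row n. The bisection bracket (-0.9, 0.99) is our own illustrative choice.
from math import log

def u(x, r):
    """CRRA utility, with the log limit at r = 1."""
    return log(x) if abs(1.0 - r) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def eu_gap(r, n):
    """EU(safe option A) - EU(risky option B) in row n."""
    p = n / 10.0
    eu_a = p * u(2.00, r) + (1 - p) * u(1.60, r)
    eu_b = p * u(3.85, r) + (1 - p) * u(0.10, r)
    return eu_a - eu_b

def crossing_r(n, lo=-0.9, hi=0.99, tol=1e-6):
    """Bisect for the r that makes a subject indifferent in row n."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eu_gap(mid, n) * eu_gap(lo, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A subject switching to B after n safe choices reveals r in the interval
# (crossing_r(n), crossing_r(n + 1)); rows 5-6 and 7-8 recover the
# 0.15-0.41 and 0.68-0.97 intervals quoted in the text.
for n in (4, 5, 6, 7, 8):
    print(n, round(crossing_r(n), 2))
```

The same crossings, tabulated for every row, are what the interval regression treats as the censoring points of the dependent variable.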
The main benefit of such an estimation is the ability to quickly ascertain treatment and demographic effects for the sample. Consider first the question of order effects. Tasks 1 and 4 were identical in terms of the payoff scale, but differed because of their order and the fact that subjects had some experimental income from the immediately prior task 3. Controlling for that prior income, as well as other individual covariates, we find that there is an order effect: the CRRA coefficient increases by 0.16 in task 4 compared to task 1, and this is significant at the 2% level. Thus, order effects do seem to matter in these experiments, and in a direction that confounds the inferences drawn about scale from the high-payoff treatments. There is also a significant scale effect, as seen for task 3 in Table 2, so the only way that one can ascertain the pure effect of order when there is a confounding change in scale, without such assumptions, would be
Table 2. Interval Regression Model of Responses in Holt and Laury Experiments(a).

Variable    Description                                Estimate  Std. Err.  p-Value  Lower 95% CI  Upper 95% CI
scale5090   Payoffs scaled by 50 or 90                    0.13     0.15       0.38      -0.16          0.42
Task3       Third task                                    0.26     0.04       0.00       0.18          0.34
Task4       Fourth task                                   0.16     0.07       0.02       0.02          0.30
wealth      Wealth coming into the lottery choice         0.00     0.00       0.10       0.00          0.00
Sess2       Session B                                    -0.18     0.20       0.37      -0.58          0.21
Sess3       Session C                                     0.01     0.16       0.92      -0.29          0.32
Sess4       Session D                                    -0.16     0.20       0.43      -0.54          0.23
Sess5       Session E                                    -0.27     0.20       0.17      -0.66          0.12
Sess6       Session F                                    -0.14     0.15       0.34      -0.44          0.15
Sess7       Session G                                    -0.24     0.18       0.18      -0.60          0.11
Sess8       Session H                                    -0.45     0.20       0.02      -0.84         -0.06
Sess9       Session I                                    -0.21     0.18       0.23      -0.55          0.13
Sess10      Session J                                    -0.31     0.18       0.08      -0.67          0.04
Sess11      Session K                                     0.07     0.22       0.75      -0.36          0.50
Sess13      Session M                                     0.10     0.21       0.62      -0.31          0.52
female      Female                                        0.04     0.06       0.46      -0.07          0.16
black       Black                                         0.05     0.16       0.75      -0.26          0.36
asian       Asian                                         0.05     0.10       0.63      -0.14          0.23
hispanic    Hispanic                                     -0.39     0.12       0.00      -0.62         -0.16
age         Age                                          -0.01     0.01       0.34      -0.02          0.01
married     Ever married                                  0.12     0.09       0.18      -0.06          0.30
Pinc2       Personal income between $5k and $15k          0.06     0.11       0.56      -0.15          0.27
Pinc3       Personal income between $15k and $30k        -0.14     0.11       0.24      -0.36          0.09
Pinc4       Personal income above $30k                   -0.10     0.13       0.41      -0.35          0.14
Hinc2       Household income between $5k and $15k         0.24     0.16       0.13      -0.07          0.54
Hinc3       Household income between $15k and $30k        0.17     0.15       0.27      -0.13          0.47
Hinc4       Household income between $30k and $45k        0.08     0.16       0.63      -0.23          0.39
Hinc5       Household income between $45k and $100k       0.31     0.14       0.03       0.03          0.58
Hinc6       Household income over $100k                   0.14     0.17       0.39      -0.18          0.47
nhhd        Number in household                          -0.03     0.03       0.38      -0.09          0.03
decide      Primary household budget decision-maker      -0.09     0.08       0.26      -0.25          0.07
fulltime    Full time employment                          0.15     0.10       0.16      -0.06          0.35
student     Student                                       0.17     0.08       0.02       0.02          0.32
business    Business major                               -0.20     0.10       0.05      -0.39          0.00
junior      Junior                                       -0.16     0.13       0.23      -0.41          0.10
senior      Senior                                       -0.03     0.14       0.84      -0.31          0.25
grad        Graduate student                              0.18     0.15       0.22      -0.11          0.46
faculty     Faculty                                      -0.07     0.24       0.77      -0.55          0.40
voter       Ever voted                                   -0.01     0.07       0.86      -0.15          0.12
gsu         Experiment at Georgia State University       -0.40     0.22       0.07      -0.83          0.03
Constant                                                  0.63     0.27       0.02       0.10          1.15
su          Standard deviation of random individual
            effect                                        0.29     0.03       0.00       0.24          0.34
se          Standard deviation of residual                0.33     0.01       0.00       0.30          0.36

Notes: Log-likelihood value is -838.24; Wald test for null hypothesis that all coefficients are zero has a χ² value of 118.44 with 40 degrees of freedom, implying a p-value less than 0.001; fraction of the total error variance due to random individual effects is estimated to be 0.433, with a standard error of 0.043.
(a) Random-effects interval regression. N = 495, based on 181 subjects from Holt and Laury (2002).
to modify the HL design and directly test for it. Harrison, Johnson, McInnes, and Rutström (2005b) provided such a test, and found that there were statistically significant order effects on risk attitudes; we consider their data below. We observe no significant effect in Table 2 from sex: women are estimated to have a CRRA that is 0.04 higher than men, but the standard error of this estimate is 0.06. Hispanic subjects do have a statistically significant difference in risk attitudes: their CRRA is 0.39 lower on average, with a p-value of less than 0.001. Subjects with an annual household income that places them in the "upper middle class" (between $45,000 and $100,000) have a significantly higher CRRA that is 0.31 above the norm, with a p-value of 0.03. Students have a CRRA that is 0.17 higher on average (p-value = 0.02); the HL sample included faculty and staff in their
experiments. Business majors were less risk averse on average, by about 0.20 (p-value = 0.05). There are some quantitatively large session effects, although only two sessions (H and J) have effects that are statistically significant in terms of the p-value. To preserve anonymity, the locations of these sessions apart from those at Georgia State University are confidential, so one can only detect individual session effects. Fig. 6 shows the distribution of predicted CRRA coefficients from the interval regression model estimates of Table 2 for task 1 (top left panel) and task 3 (bottom left panel). The estimates for the high-payoff task 3 are only from those subjects that faced the payoffs that were scaled by a factor of 20. The average low-payoff CRRA is estimated to be 0.28, with a standard deviation of 0.20; the average high-payoff CRRA is estimated to be 0.54, with a standard deviation of 0.26. As Fig. 6 demonstrates, the distribution is normally shaped, with relatively few of the estimates exhibiting significant risk aversion above 0.9. Harrison et al. (2005b) recruited 178 subjects from the University of South Carolina to participate in a series of non-computerized experiments using the MPL procedure of HL. Their design called for subjects to
[Fig. 6. Interval Regression Estimates of Risk Aversion From Holt and Laury (2002) Experiments (Fraction of the Sample, N = 181). Histograms of predicted CRRA coefficients, each on an axis from -1 to 1, for the first task (1× responses), the third task (20× responses), and the fourth task (1× responses).]
participate in either a 1× session, a 10× session, or a 1×10× session, where the "×" denotes the scalar applied to the basic payoffs used by HL in their 1× design (shown in panel A of Table 1). In the 1× session that is all that the subjects were asked to do; in the 10× session they did one risk elicitation task but with payoffs scaled up by 10. In the 1×10× session subjects were asked to state their choices over 1× lotteries, and then given the opportunity to give up any earnings from that task and participate in a comparable 10× task. We examine the responses of the subjects in the 10× session and in the 10× part of the 1×10× session, with controls for whether their 10× responses were preceded by the 1× task or not. Table 3 reports the statistical analysis of these data, also using an interval regression model. Since each subject made only one 10× choice, no panel corrections are needed. The results show no significant effect from sex, and some effect from age, citizenship, and task order. One limitation of this approach is that it assumes that all of the heterogeneity of the sample is captured by the individual characteristics measured by the experimenter. Although the socio-demographic questions typically used are relatively extensive, there is always some concern that there might be unobserved individual heterogeneity that could affect preferences towards risk. It is possible to undertake a statistical analysis of the responses of each individual, which implicitly controls for unobserved heterogeneity in the pooled analysis. However, the MPL design is not well suited to such an estimation task, even if it can be undertaken numerically, due to the small sample size for each individual. It is a simple matter to extend the HL design to have the subject consider several MPL tables for different lottery prizes, providing a richer data set with which to characterize individual risk attitudes (e.g., Harrison, Lau, & Rutström, 2007b).
Apart from providing several interval responses per subject, such designs allow one to vary the prizes in the MPL design and pin down the latent CRRA more precisely by having overlapping intervals across tasks, as explained by Harrison et al. (2005d). Thus, if one task tells us that a given subject has a CRRA interval between 0.1 and 0.3, and another task tells us that the same subject has an interval between 0.2 and 0.4, we can infer a CRRA interval between 0.2 and 0.3 from the two tasks (with obvious assumptions about the absence of order effects, or some controls for them). Another limitation of this approach, somewhat more fundamental, is that it restricts the analyst to utility functions that can characterize risk attitudes using one parameter. This is because one must infer the bounds that make the subject indifferent between the switch points, and such inferences become virtually incoherent statistically when there are two or more
Table 3. Interval Regression Model of Responses in Harrison, Johnson, McInnes, and Rutström Experiments(a).

Variable   Description                                   Estimate  Std. Err.  p-Value  Lower 95% CI  Upper 95% CI
Female     Female                                          0.088     0.08       0.26      -0.06          0.24
Black      Black                                           0.084     0.10       0.40      -0.11          0.28
Age        Age in years                                    0.022     0.01       0.07       0.00          0.05
Business   Major is in business                           -0.043     0.07       0.56      -0.19          0.10
Sophomore  Sophomore in college                           -0.068     0.11       0.54      -0.29          0.15
Junior     Junior in college                              -0.035     0.12       0.77      -0.27          0.20
Senior     Senior in college                              -0.023     0.13       0.85      -0.27          0.22
GPAhi      High GPA (greater than 3.75)                    0.004     0.09       0.97      -0.18          0.19
GPAlow     Low GPA (below 3.24)                           -0.137     0.09       0.12      -0.31          0.04
Graduate   Graduate student                                0.034     0.16       0.83      -0.27          0.34
EdExpect   Expect to complete a PhD or
           Professional Degree                            -0.119     0.09       0.18      -0.29          0.05
EdFather   Father completed college                        0.106     0.09       0.24      -0.07          0.28
EdMother   Mother completed college                       -0.027     0.08       0.75      -0.19          0.14
Citizen    U.S. citizen                                    0.234     0.12       0.05       0.00          0.47
Order      RA session: 10× task comes after 1× task        0.166     0.08       0.03       0.01          0.32
Constant                                                  -0.092     0.34       0.78      -0.75          0.56

Notes: Log-likelihood value is -290.2; Wald test for null hypothesis that all coefficients are zero has a χ² value of 18.36 with 15 degrees of freedom, implying a p-value of 0.244.
(a) All subjects facing 10× payoffs. N = 178 subjects from Harrison et al. (2005b).
parameters. Of course, for popular functions such as CRRA or CARA this is not an issue, but if one wants to move beyond those functions then there are problems. It is possible to devise one-parameter functional forms with more flexibility than CRRA or CARA in some dimension, as illustrated nicely by the one-parameter Expo-Power (EP) function developed by Abdellaoui, Barrios, & Wakker (2007a; Section 4). But in general we will need to move to structural modeling with ML to accommodate richer models, illustrated in Section 2.2. We conclude that relatively consistent estimates of the CRRA coefficient of experimental subjects emerge from the HL experiments and the MPL
design used in subsequent studies. There are, however, some apparent effects from task order, explored further in Harrison et al. (2005b) and Holt and Laury (2005). And there are significant limitations on the flexibility of the modeling of risk attitudes, pointing to the need for a complementary approach that allows structural estimation of latent models of choice under uncertainty.
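Pooling interval responses across overlapping MPL tasks, as described above, is mechanical; a minimal sketch of the intersection logic, using the 0.1-0.3 and 0.2-0.4 example from the text:

```python
# Pooling CRRA intervals elicited from several MPL tasks by intersection.

def pool_intervals(intervals):
    """Intersect a list of (lo, hi) CRRA intervals; None if the responses
    are mutually inconsistent (e.g., order effects or response error)."""
    lo = max(i[0] for i in intervals)
    hi = min(i[1] for i in intervals)
    return (lo, hi) if lo <= hi else None

print(pool_intervals([(0.1, 0.3), (0.2, 0.4)]))  # -> (0.2, 0.3)
print(pool_intervals([(0.1, 0.3), (0.5, 0.8)]))  # -> None
```

A None result flags a subject whose responses no single CRRA value can rationalize, which is one practical way to detect order effects or noise before pooling.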
2.2. Structural Estimation

Assume for the moment that utility of income is defined by

U(x) = x^(1-r) / (1-r)   (1)
where x is the lottery prize and r ≠ 1 is a parameter to be estimated. For r = 1, assume U(x) = ln(x) if needed. Thus, r is the coefficient of CRRA: r = 0 corresponds to RN, r < 0 to risk loving, and r > 0 to risk aversion. Let there be K possible outcomes in a lottery. Under EUT the probabilities for each outcome k, p_k, are those that are induced by the experimenter, so expected utility is simply the probability-weighted utility of each outcome in each lottery i:

EU_i = Σ_{k=1,...,K} (p_k × U_k)   (2)
The EU for each lottery pair is calculated for a candidate estimate of r, and the index

∇EU = EU_R - EU_L   (3)

calculated, where EU_L is the "left" lottery and EU_R is the "right" lottery. This latent index, based on latent preferences, is then linked to the observed choices using a standard cumulative normal distribution function Φ(∇EU). This "probit" function takes any argument between ±∞ and transforms it into a number between 0 and 1 using the function shown in Fig. 7. Thus, we have the probit link function,

prob(choose lottery R) = Φ(∇EU)   (4)
The logistic function is very similar, as illustrated in Fig. 7, and leads instead to the "logit" specification.
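The two link functions can be compared directly; a minimal sketch using only the standard library:

```python
# The probit and logit links of Eq. (4): both map the latent index into
# a choice probability in (0, 1).
from math import erf, exp, sqrt

def probit(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logit(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + exp(-z))

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, round(probit(z), 3), round(logit(z), 3))
```

The two CDFs agree at zero but differ in the tails and in their implicit scale (a standard logistic has standard deviation π/√3 ≈ 1.81), which is one reason estimates of r can shift between probit and logit specifications.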
[Fig. 7. Normal and Logistic Cumulative Density Functions (Dashed Line is Normal and Solid Line is Logistic). Prob(y*) plotted against y* over the range -5 to 5.]
Even though Fig. 7 is common in econometrics texts, it is worth noting explicitly and understanding. It forms the critical statistical link between observed binary choices, the latent structure generating the index y*, and the probability of that index y* being observed. In our applications y* refers to some function, such as Eq. (3), of the EU of two lotteries; or, later, the Prospective Utility (PU) of two lotteries. The index defined by Eq. (3) is linked to the observed choices by specifying that the R lottery is chosen when Φ(∇EU) > 1/2, which is implied by Eq. (4). Thus, the likelihood of the observed responses, conditional on the EUT and CRRA specifications being true, depends on the estimates of r given the above statistical specification and the observed choices. The "statistical specification" here includes assuming some functional form for the cumulative density function (CDF), such as one of the two shown in Fig. 7. If we ignore responses that reflect indifference for the moment, the conditional log-likelihood would be

ln L(r; y, X) = Σ_i [ (ln Φ(∇EU) | y_i = 1) + (ln(1 - Φ(∇EU)) | y_i = -1) ]   (5)
where y_i = 1 (-1) denotes the choice of the Option R (L) lottery in risk aversion task i, and X is a vector of individual characteristics reflecting age, sex, race, and so on. In most experiments the subjects are told at the outset that any expression of indifference would mean that if that choice was selected to be played out, the experimenter would toss a fair coin to make the decision for them. Hence, one can modify the likelihood to take these responses into account by recognizing that such choices implied a 50:50 mixture of the likelihood of choosing either lottery:

ln L(r; y, X) = Σ_i [ (ln Φ(∇EU) | y_i = 1) + (ln(1 - Φ(∇EU)) | y_i = -1) + (ln(1/2 × Φ(∇EU) + 1/2 × (1 - Φ(∇EU))) | y_i = 0) ]   (5′)

where y_i = 0 denotes the choice of indifference. In our experience very few subjects choose the indifference option, but this formal statistical extension accommodates those responses.27 The latent index, Eq. (3), could have been written in a ratio form:
∇EU = EU_R / (EU_R + EU_L)   (3′)
and then the latent index would already be in the form of a probability between 0 and 1, so we would not need to take the probit or logit transformation. We will see that this specification has also been used, with some modifications we discuss later, in HL. Appendix F reviews procedures and syntax from the popular statistical package Stata that can be used to estimate structural models of this kind, as well as more complex models discussed later. The goal is to illustrate how experimental economists can write explicit ML routines that are specific to different structural choice models. It is a simple matter to correct for stratified survey responses, multiple responses from the same subject ("clustering"),28 or heteroskedasticity, as needed, and those procedures are discussed in Appendix F. Applying these methods to the data from the Hey and Orme (1994) experiments, one can obtain ML estimates of the core parameter r. Pooling all 200 of the responses from each subject over two sessions, and pooling over all subjects, we estimate r = 0.66 with a standard error of 0.04, assuming a normal CDF as in the dashed line in Fig. 7. These estimates correct for the clustering of responses by the same subject. If we instead assume a logistic CDF, as in the solid line in Fig. 7, we obtain an
estimate r = 0.80 with a standard error of 0.04. This is not a significant economic difference, but it does point to the fact that parametric assumptions matter for estimation of risk attitudes using these methods. In particular, the choice of normal or logistic CDF is almost entirely arbitrary in this setting. One might apply some nested or non-nested hypothesis test to choose between specifications, but we will see that it is dangerous to rush into rejecting alternative specifications too quickly. Extensions of the basic model are easy to implement, and this is the major attraction of the structural estimation approach. For example, one can easily extend the functional forms of utility to allow for varying degrees of RRA. Consider, as one important example, the EP utility function proposed by Saha (1993). Following Holt and Laury (2002), the EP function is defined as

U(x) = (1 - exp(-a x^(1-r))) / a   (1′)

where a and r are parameters to be estimated. RRA is then r + a(1 - r)y^(1-r), so RRA varies with income y if a ≠ 0. This function nests CRRA (as a → 0) and CARA (as r → 0). We illustrate the use of this EP specification later. It is also a simple matter to generalize this ML analysis to allow the core parameter r to be a linear function of observable characteristics of the individual or task. In the HO experiments no demographic data were collected, but we can examine the effect of the subjects coming back for a second session by introducing a binary dummy variable (Task) for the second session. In this case, we extend the model to be r = r0 + r1 × Task, where r0 and r1 are now the parameters to be estimated. In effect the prior model was to assume r = r0 and just estimate r0. This extension significantly enhances the attraction of structural ML estimation, particularly for responses pooled over different subjects, since one can condition estimates on observable characteristics of the task or subject. We illustrate the richness of this extension later. For now, we estimate r0 = 0.60 and r1 = 0.10, with standard errors of 0.04 and 0.02, respectively, using the probit specification. So there is some evidence of a session effect, with slightly greater risk aversion in the second session. The effect of demographics and task can be examined using data generated by Harbaugh, Krause, and Vesterlund (2002). They examined lottery choices by a large number of individuals, varying in age between 5 and 64. Focusing on their lottery choices for dollars with individuals aged 19 and over, seven choices involved gambles in a gain frame, and
Risk Aversion in the Laboratory
seven involved gambles in a loss frame. The loss frame experiments all involved subjects having some endowment up front, such that the loss was solely a framed loss, not a loss relative to the income they had coming into the session. In all cases the gamble was compared to a certain gain or loss, so these are relatively simple gambles to evaluate. The only demographic information included is age and sex, so we include those and interact them.29 We also allow for quadratic effects of age. Table 4 collects the estimates for models estimated separately on the choices made in the gain frame and choices made in the loss frame; later we
Table 4. Structural Maximum Likelihood Estimates of Risk Attitudes in Harbaugh, Krause, and Vesterlund Experiments(a).

Variable   Description           Estimate   Standard   p-Value   Lower 95%   Upper 95%
                                            Error                Confidence  Confidence
                                                                 Interval    Interval

A. Gain Domain
Order2     Task order control      0.009     0.007      0.168     −0.004      0.023
Order3     Task order control      0.010     0.008      0.197     −0.005      0.026
Order4     Task order control      0.005     0.007      0.481     −0.009      0.019
Male       Male                    0.016     0.029      0.594     −0.042      0.073
Age        Age in years            0.014     0.001      0.000      0.011      0.017
Age2       Age squared            −0.000     0.000      0.000     −0.000     −0.000
Mage       Male × age             −0.001     0.002      0.776     −0.005      0.003
Mage2      Male × age²             0.000     0.000      0.852     −0.000      0.000
Constant                           0.476     0.021      0.000      0.434      0.517

B. Loss Domain
Order2     Task order control      0.004     0.006      0.575     −0.009      0.016
Order3     Task order control      0.000     0.007      0.974     −0.013      0.014
Order4     Task order control     −0.005     0.007      0.494     −0.018      0.009
Male       Male                   −0.030     0.024      0.205     −0.077      0.016
Age        Age in years            0.013     0.001      0.000      0.011      0.016
Age2       Age squared            −0.000     0.000      0.000     −0.000     −0.000
Mage       Male × age              0.003     0.002      0.053     −0.000      0.007
Mage2      Male × age²            −0.000     0.000      0.026     −0.000     −0.000
Constant                           0.483     0.016      0.000      0.452      0.514

Notes: Log-likelihood values are −8,070.56 in the gain domain, and −9,931.9 in the loss domain; the Wald test for the null hypothesis that all coefficients are zero has a χ² value of 339.2 with 8 degrees of freedom, implying a p-value less than 0.001, in the gain domain, and a value of 577.9 with 8 degrees of freedom in the loss domain.
(a) Maximum likelihood estimation of CRRA utility function using all pooled binary choices of adults. N = 1092, based on 156 adult subjects from Harbaugh et al. (2002).
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
consider the effect of assuming a model of loss aversion, rather than just viewing these as different frames.30 There is virtually no effect from the loss frame, and in fact some evidence of a slight increase in risk aversion in that frame. The average of individual CRRA estimates is 0.476 in the gain frame, and is virtually identical in the loss frame. We find no evidence of a sex effect in the gain frame. The direct effect of sex is to change CRRA by 0.016, but this small effect has a p-value of 0.594 and a 95% confidence interval that easily spans zero. The joint effect of sex and age is also statistically insignificant: a test of the joint effect of sex and the sex–age interactions has a χ² value of 1.17, and with three degrees of freedom has a p-value of 0.761. Age has a significant effect on CRRA in the gain domain, at first increasing RRA and then eventually decreasing RRA as the individual gets older. The order dummies indicate no significant effect of task presentation order. There does appear to be an effect of sex on CRRA elicited in the loss frame. This effect is not direct, but operates through the interaction with age. Apart from the statistical significance of the individual interaction terms, a test that they are jointly zero has a χ² of 7.08 and a p-value of 0.069 with three degrees of freedom.
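To fix ideas, the EP utility function and its implied RRA can be sketched numerically. This is an illustrative sketch in Python, not the estimation code used for these results; the parameter values in the usage below are only indicative.

```python
import math

def ep_utility(y, a, r):
    """Expo-Power (EP) utility of Saha (1993): U(y) = (1 - exp(-a * y**(1-r))) / a."""
    return (1.0 - math.exp(-a * y ** (1.0 - r))) / a

def ep_rra(y, a, r):
    """Relative risk aversion implied by EP utility: r + a*(1-r)*y**(1-r)."""
    return r + a * (1.0 - r) * y ** (1.0 - r)
```

As a → 0 the implied RRA collapses to the constant r (CRRA), while a > 0 with r < 1 makes RRA rise with income (IRRA), which is the pattern discussed below for the Holt and Laury data.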
2.3. Stochastic Errors

An important extension of the core model is to allow for subjects to make some errors. The notion of error is one that has already been encountered in the form of the statistical assumption that the probability of choosing a lottery is not one when the EU of that lottery exceeds the EU of the other lottery. This assumption is clear in the use of a link function between the latent index ∇EU and the probability of picking one or other lottery; in the case of the normal CDF, this link function is Φ(∇EU) and is displayed in Fig. 7. If there were no errors from the perspective of EUT, this function would be a step function in Fig. 7: zero for all values of y < 0, anywhere between 0 and 1 for y = 0, and 1 for all values of y > 0. By varying the shape of the link function in Fig. 7, one can informally imagine subjects that are more sensitive to a given difference in the index ∇EU and subjects that are not so sensitive. Of course, such informal intuition is not strictly valid, since we can choose any scaling of utility for a given subject, but it is suggestive of the motivation for allowing for structural errors, and why we might want them to vary across subjects or task domains. Consider the structural error specification used by HL, originally due to Luce. The EU for each lottery pair is calculated for candidate estimates of r,
as explained above, and the ratio

∇EU = EU_R^(1/μ) / (EU_L^(1/μ) + EU_R^(1/μ))    (3″)

calculated, where μ is a structural "noise parameter" used to allow some errors from the perspective of the deterministic EUT model. The index ∇EU is in the form of a cumulative probability distribution function defined over differences in the EU of the two lotteries and the noise parameter μ. Thus, as μ → 0 this specification collapses to the deterministic choice EUT model, where the choice is strictly determined by the EU of the two lotteries; but as μ gets larger and larger the choice essentially becomes random. When μ = 1, this specification collapses to Eq. (3′), where the probability of picking one lottery is given by the ratio of the EU of one lottery to the sum of the EU of both lotteries. Thus, μ can be viewed as a parameter that flattens out the link functions in Fig. 7 as it gets larger. This is just one of several different types of error story that could be used, and Wilcox (2008a, 2008b) provides masterful reviews of the implications of the alternatives.31 The use of this structural error parameter can be illustrated by a replication of the estimates provided by Holt and Laury (2002). Using the EP utility function in Eq. (1′), the Luce specification in Eq. (3″), and ignoring the fact that each subject made multiple binary choices, we estimate r = 0.268 and a = 0.028 using the non-hypothetical data from HL. Panel A of Table 5 lists these estimates, which replicate the results reported by HL (p. 1653) almost exactly. Their estimates were obtained using optimization procedures in GAUSS, and did not calculate the likelihood at the level of the individual observation. Instead their data were aggregated according to the lottery choices in each row, and scaled up to reflect the correct sample size of observations. This approach works fine for a completely homogeneous model in which one does not seek to estimate effects of individual characteristics or correct for unobserved heterogeneity at the level of the individual.
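The Luce specification can be sketched directly as a choice-probability function. This is a minimal illustration in Python, not the GAUSS or ML code behind the estimates, and it assumes positive expected utilities:

```python
def luce_prob_right(eu_left, eu_right, mu):
    """Luce structural error specification:
    Pr(choose right) = EU_R**(1/mu) / (EU_L**(1/mu) + EU_R**(1/mu)).
    Expected utilities must be positive; mu is the structural noise parameter."""
    el = eu_left ** (1.0 / mu)
    er = eu_right ** (1.0 / mu)
    return er / (el + er)
```

As μ → 0 the choice becomes essentially deterministic in the EU difference, at μ = 1 the probability is just the simple EU ratio, and for very large μ the probability flattens toward one half.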
But the approach adopted in our replication does operate at the level of the individual observation, so it is possible to make these extensions. In fact, allowing for unobserved individual heterogeneity does not affect these estimates greatly. The role of the stochastic error assumption in Eq. (3″) can be evaluated by using Eq. (3′) instead, which is to assume that μ = 1 in Eq. (3″). The effect, shown in panel B of Table 5, is to estimate more risk-loving behavior, with r < 0. Hence, at low levels of income subjects are now
Table 5. Structural Maximum Likelihood Estimates of Risk Attitudes in Holt and Laury Experiments(a).

Variable   Description                  Estimate   Standard   p-Value   Lower 95%   Upper 95%
                                                   Error                Confidence  Confidence
                                                                        Interval    Interval

A. Luce Error Specification and No Corrections for Clustering
r          Utility function parameter     0.268     0.017     <0.001     0.234       0.302
a          Utility function parameter     0.028     0.002     <0.001     0.024       0.033
μ          Structural noise parameter     0.134     0.004     <0.001     0.125       0.143

B. No Luce Error Specification, No Corrections for Clustering
r          Utility function parameter    −0.161     0.044     <0.001    −0.247      −0.074
a          Utility function parameter     0.015     0.003     <0.001     0.010       0.020

C. Probit Link Function, No Fechner Error Specification, Corrections for Clustering
r          Utility function parameter     0.293     0.021     <0.001     0.251       0.334
a          Utility function parameter     0.038     0.003     <0.001     0.032       0.043

D. Probit Link Function, Fechner Error Specification, and Corrections for Clustering
r          Utility function parameter     0.684     0.049     <0.001     0.589       0.780
a          Utility function parameter     0.045     0.059      0.452    −0.072       0.161
μ          Structural noise parameter     0.172     0.016     <0.001     0.140       0.203

(a) Maximum likelihood estimation of EP utility function using all pooled binary choices. N = 3990, based on 212 subjects from Holt and Laury (2002).
estimated to be risk loving. There is still evidence of Increasing Relative Risk Aversion (IRRA), with a > 0. However, the log-likelihood of this specification is much worse than the original HL specification, and we can comfortably reject the null that μ = 1. The point of this result is to demonstrate that the stochastic identifying restriction, to use the concept developed by Wilcox (2008a, 2008b), is not innocuous for inference about risk attitudes. There is one other important error specification, due originally to Fechner and popularized by Hey and Orme (1994).32 This error specification posits
the latent index

∇EU = (EU_R − EU_L) / μ    (3‴)

instead of Eq. (3), (3′), or (3″). Wilcox (2008a) notes that as an analytical matter the evidence of IRRA in HL would be weaker, or perhaps even absent, if one had used a Fechner error specification instead of a Luce error specification. This important claim, that the evidence for IRRA may be an artifact of the (more or less arbitrary) stochastic identifying restriction assumed, can be tested with the HL data. The estimates in panels C and D of Table 5 confirm the claim of Wilcox (2008a). In panel C, we employ the probit link function Eq. (4) and the latent index function Eq. (3), and assume no Fechner error specification.33 We confirm the original estimates of HL, with minor deviations: the path of estimated RRA in the left side of Fig. 9 mimics the original results from HL in Fig. 8. But when we add a Fechner error specification, in panel D of Table 5, we find striking evidence of CRRA over this prize domain. The path of RRA in this case is shown on the right side of Fig. 9, and provides a dramatic contrast to Fig. 8.
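A minimal sketch of the Fechner specification, pairing the latent index with a probit link; this is illustrative Python, not the estimation code behind Table 5:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def fechner_prob_right(eu_left, eu_right, mu):
    """Fechner error specification: the latent index (EU_R - EU_L)/mu
    is passed through the probit link to give Pr(choose right)."""
    return phi((eu_right - eu_left) / mu)
```

Here larger values of μ scale down the EU difference and flatten choice probabilities toward one half, playing the same "noise" role as in the Luce specification but on the difference rather than the ratio of expected utilities.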
Fig. 8. Estimated Relative Risk Aversion Using the Holt–Laury Statistical Model. Estimated from Experimental Data of Holt and Laury (2002) Assuming Logit Likelihood Function and Luce Noise. (The figure plots estimated RRA, from 0 to 2.5, against income in dollars from $0 to $350.)
Fig. 9. Estimated Relative Risk Aversion with Expo-Power Utility and Fechner Noise. Estimated from Experimental Data of Holt and Laury (2002). (Two panels, "Probit and No Noise" and "Probit and Fechner Noise," each plotting estimated RRA from 0 to 2.5 against income in dollars from $0 to $350.)
The log-likelihood of the Fechner specification is worse than the log-likelihood of the Luce specification. Since neither specification is nested in the other, a non-nested hypothesis test would seem to be called for. We reject the Fechner specification using either the Vuong (1989) test or the variant proposed by Clarke (2003). On the other hand, we prefer to avoid rejecting one specification out of hand just yet, since an alternative is to posit a latent data generating process in which two or more specifications have some validity. We return to consider this approach later.
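The flavour of such a non-nested comparison can be sketched from per-observation log-likelihood contributions. The function below is a simplified, uncorrected version of the Vuong statistic, for illustration only (the published tests include refinements not shown here):

```python
from math import sqrt

def vuong_statistic(ll_a, ll_b):
    """Uncorrected Vuong (1989) statistic from per-observation log-likelihood
    contributions of two non-nested models. Asymptotically N(0,1) under the
    null of equivalence; large positive values favour model A."""
    n = len(ll_a)
    d = [a - b for a, b in zip(ll_a, ll_b)]   # pointwise log-likelihood differences
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return sqrt(n) * mean_d / sqrt(var_d)
```

With synthetic contributions that systematically favour one model, the statistic is large; when the models fit equally well observation by observation, it is near zero.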
2.4. Non-Parametric Estimation

It is possible to estimate the EUT model without assuming a functional form for utility, following Hey and Orme (1994). This approach works well for problem domains in which there are relatively few outcomes, since it involves estimation of one parameter for all but two of the outcomes. So if the task domain is constrained to just four outcomes, as in HO or HL, there are only two parameters to be estimated. But if the task domain spans many outcomes, these methods become inefficient and one must resort to a function defined by a few parameters, such as CRRA or EP utility functions.
To illustrate, we use the experimental data of HO, and then the replication of their experiments by Harrison and Rutström (2005). We also use the Fechner noise specification introduced above, to replicate the specification of HO. In HO there were only four monetary prizes of £0, £10, £20, and £30. We normalize to u(£0) = 0 and u(£30) = 1, and estimate u(£10), u(£20), and the noise parameter. As explained by HO, one could normalize the noise parameter to some fixed value and then estimate u(£30) instead, but this choice of normalization seems the most natural. It is then possible to predict the values of the two estimated utilities: pooling over the two sessions and across subjects, we estimate u(£10) = 0.66 with a standard error of 0.02, and u(£20) = 0.84 with a standard error of 0.01, so u(£0) < u(£10) < u(£20) < u(£30) as expected. The application of this estimation procedure in HO was at the level of the individual, which obviously allows variation in estimated utilities over individuals. This illustrative calculation does not. The experiments of Harrison and Rutström (2005) were intended, in part, to replicate those of HO in the gain frame and additionally collect individual characteristics. In their case the prizes spanned $0, $5, $10, and $15. Employing the same non-parametric structure for this data as for the HO data above, the estimates are u($5) = 0.60 and u($10) = 0.80. In these data a set of demographic characteristics for each subject are known and we can therefore allow the estimated utilities to vary linearly with these characteristics. It is then possible to simply predict the estimated utilities, using the characteristics of each subject and the estimated coefficients on those characteristics, and plot them. Fig. 10 shows the distribution of estimated values. No subject had estimates that implied u($10) < u($5).
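The logic of the non-parametric specification can be sketched as follows; the interior utility values are the pooled point estimates reported above, and the helper names are our own:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Normalized utilities over the four HO prizes (u(0) = 0, u(30) = 1);
# the two interior values are the pooled point estimates from the text.
U = {0: 0.0, 10: 0.66, 20: 0.84, 30: 1.0}

def expected_utility(lottery):
    """lottery: a list of (probability, prize) pairs over the four prizes."""
    return sum(p * U[x] for p, x in lottery)

def prob_choose_right(left, right, mu):
    """Fechner/probit choice probability using the estimated utilities."""
    return phi((expected_utility(right) - expected_utility(left)) / mu)
```

Because only the four prize utilities enter the likelihood, no functional form for utility is imposed; monotonicity of the estimates is a result, not an assumption.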
2.5. Comparing Procedures

Do the various risk elicitation procedures imply essentially the same risk attitudes? In part this question requires that one agree on a standard way of representing lotteries, and that we understand the effect of those representations on elicited risk attitudes. It also requires that we agree on how to characterize risk attitudes statistically, and there are again many alternatives available in that direction that should be expected to affect inferred risk attitudes (Wilcox, 2008a). The older literature on utility elicitation was careful to undertake controlled comparisons of different procedures, as reviewed in Hershey, Kunreuther, and Schoemaker (1982) and illustrated by Hershey and Schoemaker (1985). But none of that
Fig. 10. Non-Parametric Estimates of Utility. (Assuming EUT and Normalized so that u($0) = 0 and u($15) = 1; Kernel Density of Predicted Utility Estimates for N = 120 Subjects; Data from Hey–Orme Replication of Harrison and Rutström (2005).) The figure plots kernel densities of the predicted u($5) and u($10) over estimated utility values from 0 to 1.
literature seemed to be concerned with incentive compatibility and the effect of real rewards. The striking counter-example, of course, is the preference reversal literature started for economists by Grether and Plott (1979), since they used methods for eliciting responses which were incentive compatible and they used real consequences to choices. And the phenomenon of preference reversals itself may be viewed as the claim that risk attitudes elicited from two procedures are not consistent, since the reversal is an ‘‘as if’’ change in risk attitudes when the elicitation mode changes. Unfortunately, the preference reversals in question involved a comparison of risk attitudes elicited with the RLP and BDM procedures, which both rely on strong assumptions to reliably elicit preferences. It may therefore be useful to compare the three procedures that we do find attractive on a priori grounds: the MPL of Holt and Laury (2002), the RLP of Hey and Orme (1994), and the OLS of Binswanger (1980, 1981). Each procedure is applied to the same sample drawn from the same population: students at the University of Central Florida. In one session the MPL method was first and the OLS method last, in another session these orders were reversed, and the RLP method was always presented to subjects in
between. The subjects learned what their payoffs were from each procedure at the end of the sequence of tasks for that procedure, so there is some potential in this design for income effects. There were 26 subjects in one session and 27 subjects in the second session, for a pooled sample of 53. The parameters for the MPL procedure were scaled up by a factor of 10 relative to those used in the baseline experiments of Holt and Laury (2002), shown in panel A of Table 1. Thus, the prizes were $1.00, $16, $20, and $38.50. The parameters for the OLS procedure follow the broad pattern proposed by Binswanger (1980, 1981). The certain option offers $10 whether a coin toss is heads or tails, and the next options offer $19 or $9, $24 or $8, $25 or $7, $30 or $6, $32 or $4, $38 or $2, and finally $40 or $0.34 The RLP procedure used lotteries with probabilities and prizes that were each randomly drawn.35 Each prize was randomly drawn from the uniform interval ($0.01, $15.00) in dollars and cents, and the number of prizes in each lottery pair was either 2, 3, or 4, also selected at random. For any lottery pair the cardinality of the outcomes was the same, so if one lottery had three prizes the other lottery would also have three prizes. The probabilities were also drawn at random, and represented to subjects to two decimal places. Each subject was given 60 pairs of lotteries to choose from, and three were picked at random to be played out and paid. The expected value of each lottery was roughly $7.50, with the expected value from the RLP procedure as a whole around $22.70. Thus, the scale of prizes in the MPL and OLS procedures was virtually identical: up to $38.50 and $40, respectively. The scale of prizes in the RLP procedure was comparable: up to $45 if all three selected lotteries generated an outcome of $15 each. In each case we estimate a CRRA model using Eq. (1). For the MPL and RLP procedures we use the probit link function, that is Eq.
(4), defined over the difference in EU of the two lotteries for a candidate estimate of r and μ, and the Fechner error specification Eq. (3‴). For the OLS procedure we use the standard logit specification originally due to Luce (1959); McFadden (2001) reviews the storied history of this specification beautifully, and Train (2003) reviews modern developments. The EU for each lottery in this latter specification is calculated for a candidate estimate of r and μ, the exponential of the EU is taken as

eu_i = exp(EU_i)^(1/μ)    (6)

and the index

∇EU_i = eu_i / (eu_1 + eu_2 + eu_3 + eu_4 + eu_5 + eu_6)    (7)
calculated for each lottery i. This latent index, based on latent preferences, is in the form of a probability, and can therefore be directly linked to the observed choices; it is a multiple-lottery analogue of the Luce error specification Eq. (3″) for binary lottery choice.36 The results indicate consistency in the elicitation of risk attitudes, at least at the level of the inferred sample distribution. The point estimates (and 95% confidence intervals) for the MPL, RLP, and OLS procedures, respectively, are 0.75 (0.62, 0.88), 0.51 (0.42, 0.60), and 0.66 (0.44, 0.89). There is no significant order effect on the estimates from the OLS procedure: the estimates when it was first are 0.68 (0.43, 0.94), and when it was last they are 0.65 (0.25, 1.05). The 95% confidence intervals are wider in these estimates of the sub-samples, due to smaller samples. There is, however, a small but statistically significant order effect on the estimates from the MPL procedure: when it was first the CRRA estimate is 0.61 (0.46, 0.76) and when it was last the estimate is 0.86 (0.67, 1.05). These results suggest that the procedures elicit roughly the same risk attitudes, apart from the sensitivity of the MPL procedure to order. Thus, one would tentatively conclude, based on the above analysis, that the procedures should be expected to generate roughly the same estimates of risk attitudes for a target population, at least when each is used as the sole measuring instrument at the beginning of a session.37 A closely related issue is the temporal stability of risk preferences, even when one uses the same elicitation procedure. It is possible to define temporal stability of preferences in several different ways, reflecting alternative conceptual definitions and operational measures. Each definition has some validity for different inferential purposes.
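The multinomial analogue of the Luce specification in Eqs. (6) and (7) can be sketched as follows; this is illustrative Python, and the menu of expected utilities passed in is hypothetical:

```python
from math import exp

def luce_multinomial_probs(eus, mu):
    """Multinomial Luce/logit choice probabilities: each lottery's EU is
    transformed as exp(EU)**(1/mu), and the probability of choosing lottery i
    is its share of the sum across the whole menu."""
    weights = [exp(eu) ** (1.0 / mu) for eu in eus]
    total = sum(weights)
    return [w / total for w in weights]
```

The probabilities sum to one by construction, higher-EU options receive higher probability, and a large μ flattens the distribution toward a uniform random choice over the menu.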
Temporal stability of risk preferences can mean that subjects exhibit the same risk attitudes over time, or that their risk attitudes are a stable function of states of nature and opportunities that change over time. It is quite possible for risk preferences to be stable in both, either, or neither of these senses, depending on the view one adopts regarding the role preference stability takes in the theory. The temporal stability of risk preferences is one component of a broader set of issues that relate to the state-dependent approach to utility analysis.38 This is a perfectly general approach, where the state of nature could be something as mundane as the weather or as fundamental as the individual’s mortality risk. The states could also include the opportunities facing the individual, such as market prices and employment opportunities. Crucial to the approach, however, is the fact that all state realizations must be exogenous, or the model will not be identified and inferences about stability will be vacuous.
Problems arise, however, when one has to apply this approach empirically. Where does one draw the line in terms of the abstract "states of nature"? Many alleged violations of EUT amount to claims that a person behaved as if they had one risk preference for one lottery pair and another risk preference for a different lottery pair. Implicit in the claim that these are violations of EUT is the presumption that the difference in the two lottery pairs was not some state of nature over which preferences could differ.39 Similarly, should we deem the preferences elicited with an open-ended auction procedure to be different from those elicited with a binary choice procedure, such as in the famous preference reversals of Grether and Plott (1979), because of some violation of EUT or just some change in the state of nature? Of course, it is a slippery inferential slope that allows "free parameters" to explain any empirical puzzle by shifting preferences. Such efforts have to be guided by direct evidence from external sources, lest they become open-ended specification searches.40 Several studies have begun to examine the temporal stability question. Limited exercises in laboratory settings are reported by Horowitz (1992) and Harrison, Johnson, McInnes, and Rutström (2005a), who demonstrate the temporal stability of risk attitudes in lab experiments over a period of up to 4 months. Horowitz (1992; p. 177) collects information on financial characteristics of the individual to control for changes in state of nature, but does not report if it changed the statistical inference about temporal stability. Harrison et al. (2005a) consider the temporal stability of risk attitudes in college students over a 4-week period, and do not control for changes in state of nature. Andersen, Harrison, Lau, and Rutström (2008b) extend these simple designs in several ways.
They use a much longer time span, control for changes in state of nature, use a stratified sample of a broader population, and report the results of a large-scale panel experiment undertaken in the field designed to examine this issue. Over a 17-month period they elicited risk preferences from subjects chosen to be representative of the adult Danish population. During this period many of the subjects were re-visited, and the same MPL risk aversion elicitation task repeated. In each visit information was also elicited on the individual characteristics of the subject, as well as their expectations about the state of their own economic situation and macroeconomic variables. The statistical analysis includes controls for changes in the subject’s perceived states of nature, as well as the possible effects of endogenous sample selection into the re-test. There is evidence of some variation in risk attitudes over time, but there is no general tendency for risk attitudes to increase or decrease over a 17-month span.
Additionally, the small variation of risk attitudes over time is less prominent than variations across tasks and across individuals. The results also suggest that risk preferences are state contingent with respect to personal finances. Of course, we could easily imagine target populations, such as the poor, that might be far less stable over time than the average adult Dane. There is some evidence from Dave, Eckel, Johnson, and Rojas (2007; Table 7) that the MPL instrument might exhibit some drift over time in such a population: estimated RRA increases by 0.12 compared to a baseline of 0.71, but the p-value of this change is 0.14, so it is not statistically significant. The real contribution of these studies is a systematic methodology for examining the issue of temporal stability with longitudinal experiments.
2.6. Comparing Treatments

The use of structural estimation of latent choice models also allows one to compare experimental treatments in terms of their effect on core parameters. Thus, we can answer questions such as "does treatment X affect risk attitudes?" by directly estimating the effect on parameters determining risk attitudes, rather than relying on less direct measures of that effect. The value of inferences of this kind becomes more important when we allow for various parameters and processes to affect choice under uncertainty, such as when we consider rank-dependent preferences and/or sign-dependent preferences in Section 3. To illustrate, consider the effect of providing information to subjects about the EV of lotteries they are to choose from. For simple, binary-outcome lotteries one often observes some subjects actually trying to do this arithmetic themselves on scrap paper, whether or not they then use that to decide which lottery to accept without adding or subtracting a risk premium. But when the cardinality of outcomes exceeds two, virtually all subjects tend to give up on those efforts to calculate EV. This raises the hypothesis that elicited risk attitudes might reflect underlying preferences or the interaction of those preferences and cognitive constraints on applying them to a particular lottery (if one assumes, for now, that subjects apply them the way economists theorize about them). A direct measure of the effect of providing EV can be obtained by running these treatments and then estimating a model in which the treatment acts as a binary dummy on a core parameter of the latent structural model. For data we use the replication of the RLP procedures of Hey and Orme (1994) reported in Appendix B. These tasks were only over the gain frame;
63 subjects received no information over 60 binary choices, and 25 different subjects received information. For the structural model, we assume a CRRA power utility function, a Fechner error specification, and a probit link function. If we introduce the binary dummy variable Info to capture those choices made under the treatment condition, we can estimate r = r0 + r1·Info and directly assess the effect on risk attitudes by the sign and statistical significance of the coefficient r1. It is also possible to allow for heteroskedasticity in the Fechner noise term, by estimating μ = μ0 + μ1·Info and examining the estimate of μ1. Thus, we allow for the possibility that providing information on EV might not change risk attitudes, but might change the precision with which the subject makes choices given a latent preference for one lottery over the other. The estimation results show that there is indeed a statistically significant effect on elicited risk attitudes from providing the EV of each lottery. The power function coefficient r increases by 0.15 from 0.47, which indicates a reduction in risk aversion toward risk neutrality. The p-value on the hypothesis test that this effect is zero is only 0.016, and the 95% confidence interval on the effect is between 0.03 and 0.28. So we conclude that there does appear to be a significant influence on elicited risk attitudes from providing information on EV. Whether this reflects better estimates of true preferences due to removing the confound of the cognitive burden of calculating EV, or reflects a simple anchoring response, cannot be determined. The point is that we can report the effect of the treatment in terms of its effect on the metric of interest, the core risk aversion parameter. In this specification there is no statistically significant effect on the Fechner noise parameter. Nor is there an effect on these conclusions from also controlling for the heterogeneity in preferences attributable to observed individual demographic effects.
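The structure of this treatment test can be sketched as a likelihood with a dummy on the core parameter. This is a minimal illustration in Python, with hypothetical lotteries, a CRRA utility with u(y) = y^(1−r)/(1−r), and without the optimization step that would maximize the likelihood:

```python
from math import erf, log, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def crra_u(y, r):
    """CRRA utility y**(1-r)/(1-r), for r != 1."""
    return y ** (1.0 - r) / (1.0 - r)

def log_lik(data, r0, r1, mu):
    """Log-likelihood with r = r0 + r1*Info, a probit link, and Fechner noise.
    Each observation is (left, right, info, chose_right), where lotteries are
    lists of (probability, prize) pairs."""
    ll = 0.0
    for left, right, info, chose_right in data:
        r = r0 + r1 * info                     # treatment dummy shifts the core parameter
        eu_l = sum(p * crra_u(x, r) for p, x in left)
        eu_r = sum(p * crra_u(x, r) for p, x in right)
        pr = phi((eu_r - eu_l) / mu)
        ll += log(pr if chose_right else 1.0 - pr)
    return ll
```

Maximizing this function over (r0, r1, μ), or its extension with μ = μ0 + μ1·Info, delivers the direct test of the treatment effect described above; the sign and significance of r1 answer the question of interest.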
3. EXTENSIONS AND FURTHER APPLICATIONS

We elicit risk attitudes to make inferences about different things. Obviously there is interest in the characterization of risk attitudes in general, and the previous section reviewed the estimation issues that arise under EUT. It is also important to consider the characterization of risk attitudes under alternatives to EUT. We consider the class of rank-dependent models due to Quiggin (1982) (Section 3.1), and then the class of sign-dependent models due to Kahneman and Tversky (1979) (Section 3.2). The implications of allowing several latent data generating processes to characterize risk attitudes
are then considered (Section 3.3), concluding with a plea to avoid the assumption that there is one true model. Risk attitudes also constitute a fundamental confound to inferences about behavior in stochastic settings, and it is here that we believe that the major payoff to better experimental controls for risk attitudes will be seen. We consider three major areas of investigation in which controls for risk should play a more significant role: identification of discount rates (Section 3.4), tests of EUT against competing models (Section 3.5), and tests of bidding behavior in auctions (Section 3.6). We also consider tests of a model of choice behavior that has radical implications for how one might think about risk aversion, Myopic Loss Aversion (Section 3.7). Finally, we consider the implications of the random lottery incentive procedure for risk elicitation (Section 3.8), and present some summary estimates using comparable modeling assumptions and designs that we believe to be the most reliable (Section 3.9).
3.1. Characterizing Risk Attitudes with Probability Weighting and Rank-Dependent Utility

One route of departure from EUT has been to allow preferences to depend on the rank of the final outcome through probability weighting. The idea that one could use non-linear transformations of the probabilities of a lottery when weighting outcomes, instead of non-linear transformations of the outcome into utility, was most sharply presented by Yaari (1987). To illustrate the point clearly, he assumed a linear utility function, in effect ruling out any risk aversion or risk seeking from the shape of the utility function per se. Instead, concave (convex) probability weighting functions would imply risk seeking (risk aversion).41 It was possible for a given decision-maker to have a probability weighting function with both concave and convex components, and the conventional wisdom held that it was concave for smaller probabilities and convex for larger probabilities. The idea of rank-dependent preferences had two important precursors.42 In economics, Quiggin (1982, 1993) had formally presented the general case in which one allowed for subjective probability weighting in a rank-dependent manner and allowed non-linear utility functions. This branch of the family tree of choice models has become known as Rank-Dependent Utility (RDU). The Yaari (1987) model can be seen as a pedagogically important special case, and can be called Rank-Dependent Expected Value (RDEV). The other precursor, in psychology, is Lopes (1984). Her concern
Risk Aversion in the Laboratory
was motivated by clear preferences that experimental subjects exhibited for lotteries with the same expected value but alternative shapes of probabilities, as well as the verbal protocols those subjects provided as a possible indicator of their latent decision processes.

Formally, to calculate decision weights under RDU one replaces expected utility

EU_i = Σ_{k=1,...,K} p_k U_k    (2)

with RDU

RDU_i = Σ_{k=1,...,K} w_k U_k    (2′)

where

w_i = ω(p_i + ... + p_n) − ω(p_{i+1} + ... + p_n)    (8a)

for i = 1, ..., n−1, and

w_i = ω(p_i)    (8b)
for i = n, where the subscript indicates outcomes ranked from worst to best, and where ω(p) is some probability weighting function.

In the RDU model we have to define risk aversion in terms of the properties of the utility function and the probability weighting function, since both can affect risk attitudes. However, one can define conditional orderings, following Chew, Karni, and Safra (1987) and others, by considering the effects of more or less concave utility functions given a probability weighting function, and vice versa. Similarly, when we consider sign-dependent preferences in Section 3.2 the notion of risk aversion must include the effects of the sign of outcomes (e.g., possible loss aversion).

Picking the right probability weighting function is obviously important for RDU specifications. A weighting function proposed by Tversky and Kahneman (1992) has been widely used. It is assumed to have well-behaved endpoints, such that ω(0) = 0 and ω(1) = 1, and to imply weights

ω(p) = p^γ / (p^γ + (1 − p)^γ)^{1/γ}    (9)

for 0 < p < 1. The normal assumption, backed by a substantial amount of evidence reviewed by Gonzalez and Wu (1999), is that 0 < γ < 1. This gives the weighting function an "inverse S-shape," characterized by a concave
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
section signifying the overweighting of small probabilities, up to a crossover point where ω(p) = p, beyond which there is then a convex section signifying underweighting. Under the RDU assumption about how these probability weights get converted into decision weights, γ < 1 implies overweighting of extreme outcomes. Thus, the probability associated with an outcome does not directly inform one about the decision weight of that outcome. If γ > 1 the function takes the less conventional "S-shape," with convexity for smaller probabilities and concavity for larger probabilities.43 Under RDU, γ > 1 implies underweighting of extreme outcomes.

We illustrate the effects of allowing for probability weighting using the experimental data from Holt and Laury (2005). We assume the EP functional form

U(x) = (1 − exp(−α x^{1−r})) / α    (1″)

for utility. The remainder of the econometric specification is the same as for the EUT model with Luce error μ, generating

∇RDU = RDU_R^{1/μ} / (RDU_L^{1/μ} + RDU_R^{1/μ})    (3⁗)

instead of Eq. (3‴). The conditional log-likelihood, ignoring indifference, becomes

ln L^RDU(r, γ, μ; y, X) = Σ_i ln ℓ_i^RDU = Σ_i [ (ln F(∇RDU) | y_i = 1) + (ln(1 − F(∇RDU)) | y_i = 0) ]    (5″)

and requires the estimation of r, γ, and μ. For RDEV one replaces Eq. (2′) with a specification that weights the prizes themselves, rather than the utility of the prizes:

RDEV_i = Σ_{k=1,...,K} ω_k m_k    (2″)
where m_k is the kth monetary prize. In effect, the RDEV specification is a special case of RDU. The experimental data from Holt and Laury (2005) consist of 96 subjects facing their 1× condition or their 20× condition on a between-subjects basis.44 The final monetary prizes ranged from a low of $0.10 up to $77. We only consider data in which subjects faced real rewards. Replicating their EUT statistical model, and allowing for clustering of responses, we estimate
r = 0.40 with a standard error of 0.07, and α = 0.076 with a standard error of 0.02, closely tracking the estimates from Holt and Laury (2002). In particular, there is evidence of increasing RRA over this income domain. When we estimate the RDU model using these data and specification, we find clear evidence of probability weighting. The estimate of γ is 0.37 with a standard error of 0.16, so we can easily reject the hypothesis that γ = 1 and that there is no probability weighting. Thus, we observe the conventional qualitative shape of the probability weighting function, an inverse S-shape. The effect of allowing for probability weighting is to lower the estimates of the curvature of the utility function – but we should be careful here not to associate curvature of the utility function with risk aversion. The risk aversion parameter r is estimated to be 0.26 and the α parameter to be 0.02, with standard errors of 0.05 and 0.012, respectively. Thus, there is some evidence for increasing curvature of the utility function as income increases (α > 0), but it is not statistically significant (p-value of 0.16 that α = 0). Fig. 11 displays the "relative risk aversion" associated with the curvature of the utility function, and the shape of the probability weighting function. Of course, RRA should actually be defined here in terms of both the curvature of the utility function and the effect of probability weighting, so the coefficients are not directly comparable to the EUT model. Nevertheless, we can clearly say that inferences about increasing RRA depend on the assumptions one makes about probability weighting.

Fig. 11. Probability Weighting in Holt and Laury Risk Elicitation Task. RDU Parameters Estimated with N = 96 Subjects from Experimental Data of Holt and Laury (2005). (Left Panel: RRA over Income in Dollars, with RDU r = 0.26 and α = 0.02; Right Panel: ω(p), with RDU γ = 0.37.)
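The decision weights of Eqs. (8a) and (8b), combined with the weighting function of Eq. (9), are easy to compute directly. The following is a minimal sketch, not estimation code: the function names and the example lottery are ours, and γ = 0.37 is simply the point estimate reported above.

```python
import numpy as np

def tk_weight(p, gamma):
    # Tversky-Kahneman (1992) probability weighting function, Eq. (9)
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def rdu_weights(probs, gamma):
    # Decision weights of Eqs. (8a) and (8b), outcomes ranked worst to best:
    # w_i = omega(p_i + ... + p_n) - omega(p_{i+1} + ... + p_n), and w_n = omega(p_n)
    p = np.asarray(probs, dtype=float)
    n = len(p)
    tails = np.array([p[i:].sum() for i in range(n)])  # decumulative probabilities
    w = np.empty(n)
    w[:-1] = tk_weight(tails[:-1], gamma) - tk_weight(tails[1:], gamma)
    w[-1] = tk_weight(p[-1], gamma)
    return w

# Example: outcomes with probabilities (0.25, 0.5, 0.25), using the gamma above.
# The weights sum to one because omega(1) = 1, but with gamma < 1 the extreme
# outcomes get disproportionate weight relative to the middle outcome.
w = rdu_weights([0.25, 0.5, 0.25], gamma=0.37)
```

With γ = 1 the decision weights collapse back to the raw probabilities, which is the sense in which EUT is nested within this part of the RDU specification.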
3.2. Characterizing Risk Attitudes with Loss Aversion and Sign-Dependent Utility

3.2.1. Original Prospect Theory
Kahneman and Tversky (1979) introduced the notion of sign-dependent preferences, stressing the role of the reference point when evaluating lotteries. In various forms, as we will see, PT has become the most popular alternative to EUT. Original Prospect Theory (OPT) departs from EUT in three major ways: (a) allowance for subjective probability weighting; (b) allowance for a reference point defined over outcomes, and the use of different utility functions for gains or losses; and (c) allowance for loss aversion, the notion that the disutility of losses weighs more heavily than the utility of comparable gains.

The first step is probability weighting, of the form ω(p) defined in Eq. (9), for example. One of the central assumptions of OPT, differentiating it from later variants of PT, is that w(p) = ω(p), so that the transformed probabilities given by ω(p) are directly used to evaluate PU:

PU_i = Σ_{k=1,...,K} ω_k u_k    (2‴)
The second step in OPT is to define a reference point so that one can identify outcomes as gains or losses. Let the reference point be given by w for a given subject in a given choice. Consistent with the functional forms widely used in PT, we again use the CRRA functional form

u(m) = m^{1−α} / (1 − α)    (1‴)

when m ≥ w, and

u(m) = −λ (−m)^{1−α} / (1 − α)    (1⁗)

when m < w, and where λ is the loss aversion parameter. We use the same exponent α for the utility functions defined over gains and losses, even though the original statements of PT keep them theoretically distinct. Köbberling and Wakker (2005; Section 7) point out that this constraint is
needed to identify the degree of loss aversion if one uses CRRA functional forms and does not want to make other strong assumptions (e.g., that utility is measurable only on a ratio scale).45 Although λ is free in principle to be less than 1 or greater than 1, most PT analysts presume that λ ≥ 1.

The specification of the reference point is critical to PT, and is discussed in Section 3.2.3. One issue is that it influences the nature of subjective probability weighting assumed, since different weights are allowed for gains and losses. Thus, we can again specify

ω(p) = p^γ / (p^γ + (1 − p)^γ)^{1/γ}    (9)

for gains, but

ω(p) = p^φ / (p^φ + (1 − p)^φ)^{1/φ}    (9′)
for losses. It is common in empirical applications to assume γ = φ. The remainder of the econometric specification would be the same as for the EUT and RDU models. The latent index can be defined in the same manner, and the conditional log-likelihood defined comparably. Estimation of the core parameters α, λ, γ, φ, and μ is required.

The primary logical problem with OPT was that it implied violations of stochastic dominance. Whenever γ ≠ 1 or φ ≠ 1, it is possible to find non-degenerate lotteries such that one lottery would stochastically dominate the other, but would be assigned a lower PU. Examples arise quickly when one recognizes that ω(p_1 + p_2) ≠ ω(p_1) + ω(p_2) for some p_1 and p_2. Kahneman and Tversky (1979) dealt with this problem by assuming that evaluation using OPT only occurred after dominated lotteries were eliminated. For specifications such as the one discussed here there is no modeling of an editing phase, but the stochastic error term μ could be interpreted as a reduced-form proxy for that editing process.46 We do not provide any illustrative estimations of this model but move straight to the extensions provided by CPT.

3.2.2. Cumulative Prospect Theory
The notion of rank-dependent decision weights was incorporated into OPT by Starmer and Sugden (1989), Luce and Fishburn (1991), and Tversky and Kahneman (1992). Instead of implicitly assuming that w(p) = ω(p), it allowed w(p) to be defined as in the RDU specification given by Eqs. (8a) and (8b). The sign-dependence of subjective probability weighting in OPT,
leading to the estimation of different probability weighting functions, Eqs. (9) and (9′), for gains and losses, is maintained in CPT. Thus, there is a separate decumulative function used for gains and losses, but otherwise the logic is the same as for RDU.47

The estimation of a structural CPT model can be illustrated with data from the Harrison and Rutström (2005) replication and extension of the Hey and Orme (1994) RLP procedure. As explained in Appendix B, they had some subjects face lotteries defined over a gain frame, some face lotteries defined over a loss frame, and some face lotteries defined over a mixed gain–loss frame. In the mixed frame some prizes in a lottery were gains, and some were losses. In each case the subjects were endowed with cash to ensure that final outcomes were either exactly or approximately the same across frames. Table 6 displays the ML estimates of the core parameters, and Fig. 12 displays the distributions over individuals of predicted values for each parameter. In each case the utility function is the CRRA power specification, a Fechner error story is included with a probit link function, and μ is a linear function of the same observable characteristics as every other parameter (Table 6 does not show the estimates for μ).

The distribution of estimates of α is consistent with concave utility functions over gains and convex utility functions over losses, as expected. The estimates of γ are also consistent with expectations of an inverse S-shaped probability weighting function, implying greater decision weights on extreme prizes within each lottery. However, the estimates of λ are not at all consistent with loss aversion, and in fact suggest a clear tendency towards loss seeking. We reconsider the sensitivity of estimates of λ to the assumed reference point in more detail below.

Table 6 shows that there are some systematic effects of observable demographics on the EUT and CPT parameter estimates.
Under EUT there is a slight effect from sex, with women being more risk averse, but it is not statistically significant. Similarly, ethnic characteristics show a large effect on risk attitudes, but they are not statistically significant. The only characteristic that has a statistically significant effect on risk attitudes under EUT is age, which is here shown in deviations from age 20: every extra year leads to a reduction in risk aversion. For completeness, we also estimate RDU on these data, not shown in Table 6, and find the curvature of the utility function similar to that of EUT, contrary to the estimates discussed above for the data of Holt and Laury (2005). For the RDU model the data here indicate a significant sex effect, with women being more risk averse
Table 6. Maximum Likelihood Estimates for EUT and CPT Models.

Parameter  Variable                 Point Estimate  Standard Error  p-Value  Lower 95% CI  Upper 95% CI

A. EUT Model (log-likelihood = -7,665.0)
r          Constant                      0.952           0.149        0.00        0.66          1.24
           Female                       -0.133           0.094        0.16       -0.32          0.05
           Black                        -0.138           0.133        0.30       -0.40          0.12
           Hispanic                     -0.195           0.127        0.13       -0.44          0.05
           Age (compared to 20)          0.039           0.009        0.00        0.02          0.06
           Major is in business         -0.107           0.135        0.43       -0.37          0.16
           Low GPA (below 3.24)          0.061           0.121        0.61       -0.18          0.30

B. CPT Model (log-likelihood = -7,425.5)
α          Constant                      0.761           0.079        0.00        0.61          0.91
           Female                       -0.160           0.109        0.14       -0.37          0.05
           Black                        -0.132           0.277        0.63       -0.67          0.41
           Hispanic                     -0.358           0.192        0.06       -0.73          0.02
           Age (compared to 20)          0.017           0.009        0.07        0.00          0.04
           Major is in business         -0.037           0.097        0.70       -0.23          0.15
           Low GPA (below 3.24)          0.036           0.093        0.69       -0.14          0.22

γ          Constant                      1.017           0.061        0.00        0.89          1.14
           Female                       -0.050           0.074        0.49       -0.20          0.09
           Black                        -0.300           0.133        0.02       -0.56         -0.04
           Hispanic                     -0.092           0.142        0.51       -0.37          0.18
           Age (compared to 20)         -0.001           0.004        0.75       -0.01          0.01
           Major is in business         -0.021           0.075        0.78       -0.17          0.13
           Low GPA (below 3.24)         -0.066           0.070        0.35       -0.20          0.07

λ          Constant                      0.447           0.207        0.03        0.04          0.85
           Female                        0.432           0.416        0.30       -0.38          1.25
           Black                         0.233           1.062        0.83       -1.85          2.31
           Hispanic                     -0.386           0.386        0.32       -1.14          0.37
           Age (compared to 20)          0.033           0.018        0.08        0.00          0.07
           Major is in business          0.028           0.240        0.91       -0.44          0.49
           Low GPA (below 3.24)          0.057           0.238        0.81       -0.41          0.52
(−0.09, p-value = 0.02), as well as for Hispanics (−0.17, p-value = 0.009). In addition, age has the same effect as under EUT. Although the extent of probability weighting is slight, and overall curvature of the utility function matches EUT, there are therefore some significant changes in the composition of the curvature of utility across the sample.
Fig. 12. Estimates of the Structural CPT Model: Densities of the Predicted Values of α, γ, λ, and μ. (Data from Hey–Orme Replication of Harrison and Rutström (2005); N = 207 Subjects: 63 Gain Frame, 57 Loss Frame, and 87 Mixed Frame.)
The CPT estimates in Table 6 also show some demographic effects on the composition of the curvature of utility across the sample. There is now a large and statistically significant effect from being Hispanic, in addition to a comparable age effect. The only characteristic that significantly affects the extent of probability weighting is whether the subject is Black, and it is a large effect. The effects on loss aversion appear to be poorly estimated, which of course may just be a reflection that this is not a stable parameter in terms of its effect, at least as currently modeled. Although these were static tasks, in the sense that there was no accumulation of earnings, subjects may have been adjusting their reference point during the 60 binary choices in some unspecified manner.

Finally, Fig. 13 collates estimates of the curvature of the utility function for these data using the three major alternative models of choice. In the top panel we include an EUT specification assuming the CRRA power utility function with parameter r. In the bottom-left panel we estimate an RDU model with utility function parameter r, and that allows for rank-dependent probability weighting. The EUT and RDU models are estimated on the choices made in the loss frame, but with the actual net gain amount included
in the utility function.48 In the bottom-right panel, we reproduce the estimate of α from Fig. 12, scaled to the EUT estimate above it for comparability. We see evidence that the RDU specification does not change the inferences we make about the curvature of the utility function significantly in comparison to EUT, so risk aversion here is not reflected in a transformation of probabilities. The CPT specification, which adds sign-dependence to utility, does result in a shift towards greater concavity of the utility function for gains, and more distinct modes reflecting a greater heterogeneity in preferences. Of course, curvature of the utility function under RDU and CPT is not the same as aversion to risk, but it is nonetheless useful to compare the implied shapes of the utility function.

Fig. 13. Estimates of Curvature of Utility Function. (Data from Hey–Orme Replication of Harrison and Rutström (2005); N = 207 Subjects: 63 Gain Frame, 57 Loss Frame, and 87 Mixed Frame; Prizes for EUT and RDU Include Endowment.)

3.2.3. The Reference Point and Loss Aversion
It is essential to take a structural perspective when estimating CPT models. Estimates of the loss aversion parameter depend intimately on the assumed reference point, as one would expect, since the latter determines what are to be viewed as losses. So if we have assumed the wrong reference point, we will not reliably estimate the degree of loss aversion. However, if we do not get loss aversion leaping out at us when we make a natural assumption about
the reference point, should we infer that there is no loss aversion or that there is loss aversion and we just used the wrong reference point? This question points to a key operational weakness of CPT: the need to specify what the reference point is. Loss aversion may be present for some reference point, but if it is not present for the one we used, and no others are "obviously" better, then should one keep searching for some reference point that generates loss aversion? Without a convincing argument about the correct reference point, and evidence for loss aversion conditional on that reference point, one simply cannot claim that loss aversion is always present. This specification ambiguity is arguably less severe in the lab, where one can frame tasks to try to induce a loss frame, but is a particularly serious issue in the field. Similarly, estimates of the nature of probability weighting vary with changes in reference points, loss aversion parameters, and the concavity of the utility function, and vice versa. All of this is to be expected from the CPT model, but necessitates joint econometric estimation of these parameters if one is to be able to make consistent statements about behavior.

In many laboratory experiments it is simply assumed that the manner in which the task is framed to the subject defines the reference point that the subject uses. Thus, if one tells the subject that they have an endowment of $15 and that one lottery outcome is to have $8 taken from them, then the frame might be appropriately assumed to be $15 and this outcome coded as a loss of $8. But if the subject had been told, or expected, to earn only $5 from the experimental task, would this be coded instead as a gain of $2? The subjectivity and contextual nature of the reference point has been emphasized throughout by Kahneman and Tversky (1979), even though one often collapses it to the experimenter-induced frame in evaluating laboratory experiments.
This imprecision in the reference point is not a criticism of PT, just a challenge to be careful in assuming that it is always fixed and deterministic (see Schmidt, Starmer, & Sugden, 2005; Kőszegi & Rabin, 2006, 2007; Andersen, Harrison, & Rutström, 2006b).49 A corollary is that it might be a mistake to view loss aversion as a fixed parameter λ that does not vary with the context of the decision, ceteris paribus the reference point. See Novemsky and Kahneman (2005a) and Camerer (2005; pp. 132, 133) for discussion of this concern, which arises most clearly in dynamic decision-making settings with path-dependent earnings. This issue is particularly serious when one evaluates risk attitudes in some of the high-stakes game shows: see Andersen, Harrison, Lau, and Rutström (2008c) for a review of these studies and the modeling issues that arise.
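To make the dependence on the reference point concrete, here is a minimal sketch of the sign-dependent CRRA utility defined in Section 3.2.1 (Eqs. (1‴) and (1⁗)), with outcomes measured net of an assumed reference point w before the gain or loss branch is applied. The parameter values are purely illustrative, not estimates from our data: α = 0.12 makes the exponent 1 − α equal to the 0.88 of Tversky and Kahneman (1992), and λ = 2.25 is their oft-cited loss aversion estimate.

```python
def cpt_utility(m, w=0.0, alpha=0.12, lam=2.25):
    # Sign-dependent CRRA utility, Eqs. (1''') and (1''''), with outcomes
    # evaluated relative to the assumed reference point w
    x = m - w
    if x >= 0:
        return x ** (1 - alpha) / (1 - alpha)
    return -lam * (-x) ** (1 - alpha) / (1 - alpha)

# Shifting the reference point reclassifies outcomes: a $4 prize is a gain
# when w = $0 but a loss when w = $5, so the implied utility changes sharply
u_gain_frame = cpt_utility(4.0, w=0.0)
u_loss_frame = cpt_utility(4.0, w=5.0)
```

With λ > 1 the disutility of a loss exceeds the utility of a comparable gain, which is exactly the asymmetry that the choice of reference point turns on or off.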
To gauge the extent of the problem, we revisit the estimation of a structural CPT model using our laboratory data (the replication of Hey and Orme (1994) reported in Harrison and Rutström (2005)), but this time consider the effect of assuming different reference points than the one induced by the task frame. Assume that the reference point is w, as in Eqs. (1‴) and (1⁗) above, but instead of setting w = $0, allow it to vary between $0 and $10 in increments of $0.10. The results are displayed in Fig. 14. The top left panel shows a trace of the log-likelihood value as the reference point is increased, and reaches a maximum at $4.60. To properly interpret this value, note that these estimates are made at the level of the individual choice in this task, and the subject was to be paid for three of those choices. So the reference point for the overall task of 60 choices would be $13.80 (= 3 × $4.60). This is roughly consistent with the range of estimates of expected session earnings elicited by Andersen et al. (2006b) for a sample drawn from the same population.50

The other interesting part of Fig. 14 is that the estimate of loss aversion increases steadily as one increases the assumed reference point. At the ML reference point of $4.60, λ is estimated to be 2.51, with a standard error of 0.37 and a 95% confidence interval between 1.79 and 3.24.

Fig. 14. Estimates of the Structural CPT Model with a Range of Assumed Reference Points: Traces of the Log-Likelihood, α, λ, and γ as the Reference Point Varies from $0 to $10. (Estimated with Subjects from the Harrison and Rutström (2005) Design; N = 207 Subjects: 63 Gain Frame, 57 Loss Frame, and 87 Mixed Frame.)

These estimates
raise an important methodological question: was it the data that led to the conclusion that loss aversion was significant, or the priors favoring significant loss aversion that led to the empirical specification of reference points? Our results may appear to confirm the argument made by some PT analysts that λ ≈ 2, but it is important to recognize that the estimates presented here may not extend to other data sets or to other error specifications in the likelihood function. Further, in experimental subject pools with different reference points we would find something else entirely. At the very least, it is premature to proclaim "three cheers" for loss aversion (Camerer, 2005).
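The grid search just described can be mimicked on simulated data: generate binary choices from an agent with a known reference point, then trace the choice log-likelihood over a grid of assumed reference points and keep the value where it peaks. Everything below (the 50/50 lottery pairs, the logit link, and the fixed values of α, λ, and the noise parameter μ) is our own illustrative assumption, not the chapter's estimation code.

```python
import numpy as np

rng = np.random.default_rng(7)

def cpt_value(outcomes, w, alpha=0.5, lam=2.5):
    # Sign-dependent CRRA value of a 50/50 lottery, relative to reference point w
    v = 0.0
    for m in outcomes:
        x = m - w
        u = x ** (1 - alpha) / (1 - alpha) if x >= 0 else -lam * (-x) ** (1 - alpha) / (1 - alpha)
        v += 0.5 * u
    return v

def p_right(a, b, w, mu=1.0):
    # logit link on the difference in lottery values (a Fechner-style error)
    return 1.0 / (1.0 + np.exp(-(cpt_value(b, w) - cpt_value(a, w)) / mu))

# Simulate choices over random 50/50 lottery pairs from an agent whose true
# reference point is $5; outcomes straddle it, so w is identified by the data
n, true_w = 1000, 5.0
left = rng.uniform(0, 10, size=(n, 2))
right = rng.uniform(0, 10, size=(n, 2))
probs = np.array([p_right(a, b, true_w) for a, b in zip(left, right)])
chose_right = rng.uniform(size=n) < probs

def loglik(w):
    pr = np.clip([p_right(a, b, w) for a, b in zip(left, right)], 1e-12, 1 - 1e-12)
    return np.sum(np.where(chose_right, np.log(pr), np.log(1 - pr)))

grid = np.arange(0.0, 10.01, 0.10)      # the $0-$10 grid in $0.10 steps used above
w_hat = grid[int(np.argmax([loglik(w) for w in grid]))]
```

Profiling only w, with the other structural parameters held at their true values, the log-likelihood trace peaks at or very near the true reference point, in the same way that the trace in Fig. 14 peaks at $4.60 for the laboratory data.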
3.3. Characterizing Risk Attitudes with Several Latent Data Generating Processes

Since different models of choice behavior under uncertainty imply somewhat different characterizations of risk attitudes, it is important that we make some determination about which of these models is to be adopted. One of the enduring contributions of behavioral economics is that we now have a rich set of competing models of behavior in many settings, with EUT and PT as the two front-runners for choices under uncertainty. Debates over the validity of these models have often been framed as a horse race, with the winning theory being declared on the basis of some statistical test in which the theory is represented as a latent process explaining the data. In other words, we seem to pick the best theory by "majority rule": if one theory explains more of the data than another theory, we declare it the better theory and discard the other one. In effect, after the race is over we view the horse that "wins by a nose" as if it were the only horse in the race.

The problem with this approach is that it does not recognize the possibility that several behavioral latent processes may co-exist in a population. Recognizing that possibility has direct implications for the characterization of risk attitudes in the population. Ignoring it can lead to erroneous conclusions about the domain of applicability of each theory, and is likely an important reason why the horse races pick different winners in different domains. For purely statistical reasons, if we believe that there are two or more latent population processes generating the observed sample, one can make more appropriate inferences if the data are not forced to fit a specification that assumes one latent population process. Heterogeneity in responses is well recognized as causing statistical problems in experimental and non-experimental data.
Nevertheless, allowing for heterogeneity in responses through standard methods, such as fixed or
random effects, is not helpful when we want to identify which people behave according to which theory, and when. Heterogeneity can be partially recognized by collecting information on observable characteristics and controlling for them in the statistical analysis. For example, a given theory might allow some individuals to be more risk averse than others as a reflection of personal preference. But this approach only recognizes heterogeneity within a given theory. This may be important for valid inferences about the ability of the theory to explain the data, but it does not allow heterogeneous theories to co-exist in the same sample.

One approach to heterogeneity and the possibility of co-existing theories, adopted by Harrison and Rutström (2005), is to propose a "wedding" of the theories. They specify and estimate a grand likelihood function that allows each theory to co-exist and have different weights, a so-called mixture model. The data can then identify what support each theory has. The wedding is consummated by the ML estimates converging on probabilities that apportion non-trivial weights to each theory. Their results are striking: EUT and PT share the stage, in the sense that each accounts for roughly 50% of the observed choices. Thus, to the extent that EUT and PT imply different things about how one measures risk aversion, and the role of the utility function as against other constructs, assuming that the data are generated by one or the other model can lead to erroneous conclusions. The fact that the mixture probability is estimated with some precision, and that one can reject the null hypothesis that it is either 0 or 1, also indicates that one cannot claim that the equal weight on these models is due to chance.
The main methodological lesson from this exercise is that one should not rush to declare one or the other model the winner in all settings.51 One would expect that the weight attached to EUT would vary across task domains, just as it can be shown to vary across observable socio-economic characteristics of individual decision makers.

Another approach to heterogeneity involves the use of "random parameters" in models, illustrated well by Wilcox (2008a, 2008b). Consider the simple EUT specification with no stochastic noise assumption, given by Eqs. (1)–(5). There is one parameter doing all the empirical work: the coefficient of RRA, r. In the traditional statistical specification r is treated as the same across all individuals in the sample, or as a linear function of observable characteristics. An alternative approach is to view r as varying over the sample according to some distribution, commonly assumed to be Normal. In that case there are really two parameters to be estimated: the mean of r and the standard deviation of r.
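A simulated (Monte Carlo) version of that likelihood is straightforward to sketch: average the choice likelihood over draws of r from the assumed Normal distribution, and treat the mean and standard deviation of r as the parameters to be estimated. The lottery pair, logit link, and noise parameter below are our own illustrative assumptions.

```python
import numpy as np

def crra_eu(lottery, r):
    # expected utility of [(prob, prize), ...] with CRRA u(m) = m^(1-r)/(1-r)
    return sum(p * m ** (1 - r) / (1 - r) for p, m in lottery)

def choice_loglik(pairs, choices, r, mu=0.5):
    # log-likelihood of binary choices (1 = right lottery) under a logit link
    ll = 0.0
    for (left, right), y in zip(pairs, choices):
        pr = 1.0 / (1.0 + np.exp(-(crra_eu(right, r) - crra_eu(left, r)) / mu))
        pr = min(max(pr, 1e-12), 1 - 1e-12)
        ll += np.log(pr) if y == 1 else np.log(1 - pr)
    return ll

def random_coeff_loglik(pairs, choices, mean_r, sd_r, mu=0.5, draws=500, seed=3):
    # simulated likelihood: average the choice likelihood over r ~ N(mean_r, sd_r),
    # then take logs; mean_r and sd_r are the two hyper-parameters to estimate
    rng = np.random.default_rng(seed)
    rs = rng.normal(mean_r, sd_r, size=draws)
    liks = [np.exp(choice_loglik(pairs, choices, r, mu)) for r in rs]
    return np.log(np.mean(liks))

# a single illustrative choice: a 50/50 lottery over $10 or $2 versus $5 for sure
pairs = [([(0.5, 10.0), (0.5, 2.0)], [(1.0, 5.0)])]
choices = [1]
```

Setting the standard deviation to zero collapses the specification back to the fixed-r likelihood, which is a useful check on the simulator.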
If the heterogeneity of process takes a nested form, in the sense that one process is a restricted form of the other, then one can think of the correct statistical specification as either a finite mixture model or a random coefficients specification. In the latter case one would want to allow more flexible functional forms than Normal, to allow for multiple modes, but this is easy to generate as the sum of several uni-modal distributions. If the heterogeneity of process takes a non-nested form, such that the parameter sets are distinct for each process, then the mixture specification is more appropriate, or one should use a combination of mixture and random parameter specifications (Conte, Hey, & Moffatt, 2007).
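The grand likelihood of a two-process mixture can be written down directly. Given per-choice log-likelihood contributions under each process (which would come from the EUT and PT specifications estimated earlier; here they are just placeholders), the mixture combines them with weight π, and Bayes' rule gives the posterior probability that any particular choice was generated by the EUT process. A minimal sketch:

```python
import numpy as np

def mixture_loglik(ll_eut, ll_pt, pi):
    # grand log-likelihood: sum_i log(pi * L_i^EUT + (1 - pi) * L_i^PT),
    # computed stably in log space with logaddexp
    ll_eut, ll_pt = np.asarray(ll_eut), np.asarray(ll_pt)
    return np.sum(np.logaddexp(np.log(pi) + ll_eut, np.log(1 - pi) + ll_pt))

def posterior_eut(ll_eut, ll_pt, pi):
    # posterior probability that each choice was generated by the EUT process
    a = np.log(pi) + np.asarray(ll_eut)
    b = np.log(1 - pi) + np.asarray(ll_pt)
    return np.exp(a - np.logaddexp(a, b))
```

The mixing probability π is estimated jointly with the structural parameters of both processes; it is the parameter that Harrison and Rutström (2005) estimate to be roughly one half on their data.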
3.4. Joint Elicitation of Risk Attitudes and Other Preferences

In many settings in experimental economics we want to elicit some preference from a set of choices that also depend on risk attitudes. Often these involve strategic games, where the uncertain ways in which the behavior of others deviates from standard predictions engender a lottery for each player. Such uncertain deviations could be due, for example, to unobservable social preferences such as fairness or reciprocity. One example is offers made in Ultimatum bargaining when the other player cannot be assumed to always accept a minuscule amount of money, and acceptable thresholds may be uncertain. Other examples include Public Goods contribution games, where one does not know the extent of free riding by other players, and Trust games, in which one does not know the likelihood that the other player will return some of the pie transferred to him. Another source of uncertainty is the possibility that subjects make decisions with error, as predicted by Quantal Response Equilibria. Later we consider one example of this use of controls for risk attitudes in bidding in first-price auctions.

In some cases, however, we simply want to elicit a preference from choices that do not depend on the choices made by others in a strategic sense, but which still depend on risk attitudes. An example due to Andersen, Harrison, Lau, and Rutström (2008a) is the elicitation of individual discount rates. In this case it is the concavity of the utility function that is important, and under EUT that is synonymous with risk attitudes. The implication is that we should combine a risk elicitation task with a time preference elicitation task, and use them jointly to infer discount rates over utility.

Assume EUT holds for choices over risky alternatives and that discounting is exponential. A subject is indifferent between two income
options M_t and M_{t+τ} if and only if

(1/(1+δ)^t) U(ω + M_t) + (1/(1+δ)^{t+τ}) U(ω) = (1/(1+δ)^t) U(ω) + (1/(1+δ)^{t+τ}) U(ω + M_{t+τ})    (10)
where U(ω + M_t) is the utility of monetary outcome M_t for delivery at time t plus some measure of background consumption ω, δ the discount rate, τ the horizon for delivery of the later monetary outcome at time t+τ, and the utility function U is separable and stationary over time. The left-hand side of Eq. (10) is the sum of the discounted utilities of receiving the monetary outcome M_t at time t (in addition to background consumption) and receiving nothing extra at time t+τ, and the right-hand side is the sum of the discounted utilities of receiving nothing over background consumption at time t and the outcome M_{t+τ} (plus background consumption) at time t+τ. Thus, Eq. (10) is an indifference condition and δ is the discount rate that equalizes the present value of the utility of the two monetary outcomes M_t and M_{t+τ}, after integration with an appropriate level of background consumption ω.

Most analyses of discounting models implicitly assume that the individual is risk neutral,52 so that Eq. (10) is instead written in the more familiar form

M_t = (1/(1+δ)^τ) M_{t+τ}    (11)

where δ is the discount rate that makes the present value of the two monetary outcomes M_t and M_{t+τ} equal. To state the obvious, Eqs. (10) and (11) are not the same. As one relaxes the assumption that the decision-maker is risk neutral, it is apparent from Jensen's Inequality that the implied discount rate decreases if U(M) is concave in M. Thus, one cannot infer the level of the individual discount rate without knowing or assuming something about risk attitudes. This identification problem implies that risk attitudes and discount rates cannot be estimated from discount rate experiments alone; separate tasks to identify the influence of risk preferences must also be implemented. Andersen et al.
(2008a) do this, and infer discount rates for the adult Danish population that are well below those estimated in the previous literature that assumed RN, such as Harrison, Lau, and Williams (2002), who estimated annualized rates of 28.1% for the same target population. Allowing for concave utility, they obtain a point estimate of the discount rate of 10.1%, which is significantly lower than the estimate of 25.2% for the same sample assuming linear utility. This does more than simply verify that discount rates and risk aversion coefficients are mathematical substitutes in
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
the sense that either of them has the effect of lowering the influence of future payoffs on present utility. It tells us that, for risk aversion coefficients that are reasonable from the standpoint of explaining choices in the lottery choice task, the estimated discount rate takes on a value that is much more in line with what one would expect from market interest rates. To evaluate the statistical significance of adjusting for a concave utility function, one can test the hypothesis that the discount rate estimated assuming risk aversion is the same as the discount rate estimated assuming RN. This null hypothesis is easily rejected. Thus, allowing for risk aversion makes a significant difference to the elicited discount rates.
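The direction of the identification point can be illustrated with a short calculation. The sketch below assumes CRRA utility U(x) = x^(1−r)/(1−r) and purely illustrative numbers (a smaller-sooner amount of 100 versus 110 six months later, background consumption ω = 100, r = 0.5); none of these values are estimates from the Danish data:

```python
# Implied annual discount rate from one indifference point, with and without
# concave utility (Eq. 10 vs. Eq. 11). All parameter values are illustrative.

def crra(x, r):
    """CRRA utility; r is the coefficient of relative risk aversion (r != 1)."""
    return x ** (1.0 - r) / (1.0 - r)

def implied_delta(m_soon, m_later, tau, r, omega):
    """Delta solving U(w+m_soon) + U(w)/(1+d)^tau = U(w) + U(w+m_later)/(1+d)^tau."""
    u = lambda x: crra(x, r)
    ratio = (u(omega + m_later) - u(omega)) / (u(omega + m_soon) - u(omega))
    return ratio ** (1.0 / tau) - 1.0

def implied_delta_rn(m_soon, m_later, tau):
    """Risk-neutral benchmark (Eq. 11): m_soon = m_later / (1+d)^tau."""
    return (m_later / m_soon) ** (1.0 / tau) - 1.0

m_soon, m_later, tau = 100.0, 110.0, 0.5   # tau in years (6-month horizon)
d_rn = implied_delta_rn(m_soon, m_later, tau)
d_concave = implied_delta(m_soon, m_later, tau, r=0.5, omega=100.0)
print(f"risk neutral: {d_rn:.3f}, concave (r=0.5): {d_concave:.3f}")
```

Under risk neutrality the implied annual rate here is 21%; with r = 0.5 the same indifference point implies roughly 17.6%, illustrating why concave utility pushes inferred discount rates down.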
3.5. Testing Expected Utility Theory

Much of the data collected with the direct intent of testing EUT involved choice pairs selected deliberately to provide a way of testing EUT without having to know the risk attitudes of subjects. Unfortunately they provide extremely weak tests, since one can only count a choice as a success or failure of the theory, and no transparent metric suggests itself to weight some violations as more serious than others.53 This is why we generally use ML to estimate parameters in such binary choice settings, and not the "hit ratio," since some hits are closer than others and we want to take that into account by calculating the probability of the observed choice conditional on the parameters being evaluated.54 The problem is even more serious than devising a metric for the seriousness of violations. In two respects, EUT is a hard theory to reject in these settings. First, how does one know if the subjects are actually indifferent between the choice pairs on offer? Allowing subjects to express indifference does not suffice, since there is no way to know if they have randomized internally before picking out one lottery. Moreover, waiting for the data to exhibit 50–50 splits for indifference presumes that no artifactual presentation biases exist.55 Second, how does one know if the subjects are not extremely risk averse? High levels of risk aversion mean that the CEs of the lotteries are all close to "very small numbers." Hence, for sufficiently high levels of risk aversion, the CEs of the two lotteries are virtually identical and the subject should be rationally indifferent. Unfortunately, this free parameter gives EUT the formal leeway to escape from virtually any test one can think of. These problems lead one to question how operationally meaningful these tests are without some independent characterization of risk attitudes.
To provide one striking example of this issue, consider the Preference Reversal tests of EUT presented to economists by Grether and Plott (1979). In these experiments, the subject was asked to make a direct binary choice between lotteries A and B, and then to state a valuation of each of A and B. From the latter two valuations the experimenter can infer a binary preference. A reversal is said to occur when the inferred binary preference differs from the direct binary choice. One design feature of these tasks is that A and B had virtually identical expected value. Given this information, anthropomorphize and sympathize with a poor ML estimation routine trying to explain any sample of choices in which there are significant numbers of reversals. It could try assuming subjects were risk neutral, and then it could "explain" any observed choice since the subject would be indifferent between either option. The best way to address these concerns is to characterize the risk attitudes of the subjects independently of the choice tasks, allowing the experimenter to identify those subjects that make for better tests of EUT. This identification can proceed independently of the choice data one is seeking to confront with EUT. To illustrate, consider the Common Ratio tests of EUT from Cubitt, Starmer, and Sugden (1998a) (CSS). The CSS tests used 451 subjects, who were randomly given one of five problems.56 The first and last problems in CSS were a choice between simple prospects. Problem 1 was a choice between option A, which was an 80% chance of £16, and option B, which was £10 for certain. Problem 5 was a simple "common ratio" transformation which multiplied each probability by 1/4, so that option A was a 20% chance of £16 and option B was a 25% chance of £10. Problems 2 through 4 were procedural variants that are identical to Problem 5 from the perspective of EUT. We refer to these as problems AB and A*B* for present purposes, in new experiments discussed below.
Thus, CSS Problems 2–5 correspond to problem A*B* in our design, and their Problem 1 corresponds to our problem AB. Cubitt, Starmer, and Sugden (1998a; Table 2, p. 1375) report that 50% of their sample chose option A in their Problems 2 through 5, which are qualitatively identical to problem A*B* in our design. Only 38% of their subjects chose option A in their Problem 1, which is qualitatively the same as problem AB in our design. Using the same χ² contingency table test employed by CSS, we can only reject the EUT hypothesis at a significance level of 11.2%; Fisher's exact test for the same two-sided comparison has a significance level of 15.3%. So there is weak evidence that EUT is violated, even if it does not strictly fail at conventional levels of significance.57
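A frequency comparison of this kind is an ordinary 2×2 contingency-table test. A sketch using only the Python standard library; the cell counts are hypothetical (the chapter reports the 50% and 38% frequencies but not the split of the 451 subjects across problems), so the p-value below does not reproduce the 11.2% figure:

```python
# 2x2 contingency-table test of the common-ratio prediction: under EUT the
# frequency choosing option A should be the same across problem types.
# Cell counts below are hypothetical, for illustration only.
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 dof) for table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1) at chi2
    return chi2, p

# Hypothetical: 38% of 100 subjects chose A in problem AB,
# and 50% of 100 subjects chose A in problem A*B*.
chi2, p = chi2_2x2(38, 62, 50, 50)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The identity P(χ²₁ > x) = erfc(√(x/2)) lets the test run without any statistics library; `scipy.stats.chi2_contingency` and `fisher_exact` give the same tests directly.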
For a specific example of the Common Ratio test, in which we have independent information on risk attitudes, suppose Lottery A consists of prizes $0 and $30 with probabilities 0.2 and 0.8, and that Lottery B consists of prizes $0 and $20 with probabilities 0 and 1. Then one may construct two additional compound lotteries, A* and B*, by adding a front-end probability q = 0.25 of winning zero to lotteries A and B. That is, A* offers a (1 − q) chance to play lottery A and a q chance of winning zero. Subjects choosing A over B and B* over A*, or choosing B over A and A* over B*, are said to violate EUT. To show precisely how risk aversion does matter, assume that risk attitudes can be characterized by the popular CRRA function, Eq. (1). The CEs of the lottery pairs AB and A*B* as a function of r are shown in the left and right upper panels, respectively, of Fig. 15. The CRRA coefficient ranges from −0.5 (moderately risk loving) up to 1.25 (very risk averse), with a risk-neutral subject at r = 0. The CE of lottery B, which offers $20 for sure, is the horizontal line in the left panel of Fig. 15. The CEs of A, A*, and B* all decline as risk aversion increases. The lower panels of Fig. 15 show the CE differences between the A and B (A* and B*) lotteries. Note that for
Fig. 15. Risk Attitudes and Common Ratio Tests of EUT. (Upper panels: the CEs of Lotteries A and B, and of Lotteries A* and B*, as functions of the CRRA coefficient from −0.5 to 1.5; lower panels: the CE of Lottery A minus the CE of Lottery B, and the CE of Lottery A* minus the CE of Lottery B*.)
the AB (A*B*) lotteries, the preferred outcome switches to lottery B (B*) for a CRRA coefficient of about 0.45. Most evaluations of EUT acknowledge that one cannot expect any theory to predict perfectly, since otherwise any violation would lead one to reject the theory no matter how many correct predictions it makes. One way to evaluate mistakes is to calculate their costs under the theory being tested and to "forgive" those mistakes that are not very costly, while holding to account those that are. For each subject in our data and each lottery choice pair, we can calculate the CE difference given the individual's estimated CRRA coefficient, allowing us to identify those choice pairs that are most salient. A natural metric for defining "trivial EUT violations" can then be defined in terms of choices that involve a difference in CE below some given threshold. Suppose for the moment that an expected utility maximizing individual will flip a coin to make a choice whenever the difference in CE falls below some cognitive threshold. If r = 0.8, the CE difference in favor of B is large in the first lottery pair and B will be chosen. In the second lottery pair, the difference between the payoffs for choosing A* and B* is trivial (less than a cent, in fact) and a coin is flipped to make a choice. Thus, with probability 0.5 the experimenter will observe the individual choosing B and A*, a choice pattern inconsistent with EUT. In a sample with these risk attitudes, half the choices observed would then be expected to be inconsistent with EUT. With such a large difference between the choice frequencies, standard statistical tests would easily reject the hypothesis that they are the same. Thus, we would reject EUT in this case even though EUT is essentially58 true. Fig. 16 collates estimates of risk attitudes elicited by Harrison, Johnson, McInnes, and Rutström (2005b) from 152 subjects, described in Section 1.2 and Table 3.
The idea is simply to align the CE differences for each of the CR lotteries (AB in the left panel, and A*B* in the right panel) with the distribution of risk attitudes expected from this sample (the bottom boxes). Clearly the subjects tend to have risk attitudes at precisely the point at which these tests have least power to reject EUT. This is particularly striking for the A*B* lottery choice, but even for the AB lottery choice it is only the few subjects "in the tails" of the risk distribution for which EUT has a strong prediction. Further, these risk attitude distributions refer to point estimates, and do not reflect the uncertainty of those estimates: it is quite possible that some subject who has a point estimate of his CRRA coefficient that makes the AB test powerful also has a large enough standard error on that point estimate that the AB test is not powerful. This issue of precision is addressed directly by Harrison, Johnson, McInnes, and Rutström (2007a).
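The coin-flip argument can be checked numerically. A minimal sketch under CRRA utility, using the CSS prizes (£16 and £10) for the certainty-equivalent calculations at r = 0.8 and the $30/$20 pair for the switch point; reading the "less than a cent" claim as referring to the CSS prizes is our interpretation:

```python
# CE differences in common-ratio pairs under CRRA utility U(x) = x^(1-r)/(1-r).
# CSS pair AB:   A = 80% chance of 16, B = 10 for sure.
# CSS pair A*B*: A* = 20% chance of 16, B* = 25% chance of 10.
import math

def ce(prob, prize, r):
    """CE of a lottery paying `prize` with probability `prob`, else zero, under CRRA (r < 1)."""
    eu = prob * prize ** (1.0 - r) / (1.0 - r)
    return ((1.0 - r) * eu) ** (1.0 / (1.0 - r))

r = 0.8
diff_ab = ce(0.8, 16.0, r) - ce(1.0, 10.0, r)     # large difference, in favor of B
diff_star = ce(0.2, 16.0, r) - ce(0.25, 10.0, r)  # trivially small difference
print(f"CE(A)-CE(B) = {diff_ab:.3f}, CE(A*)-CE(B*) = {diff_star:.5f}")

# CRRA coefficient at which EU(A) = EU(B) for the $30/$20 pair of Fig. 15:
# 0.8 * 30^(1-r) = 20^(1-r)  =>  r = 1 - ln(1.25)/ln(1.5), about 0.45
r_switch = 1 - math.log(1.25) / math.log(1.5)
```

At r = 0.8 the AB difference is several pounds while the A*B* difference is under a penny, which is exactly why the test has little power where the estimated risk attitudes cluster.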
Fig. 16. Observed Risk Attitudes and Common-Ratio Tests of EUT. (Upper panels: the CE of Lottery A minus the CE of Lottery B, and the CE of Lottery A* minus the CE of Lottery B*, plotted against the CRRA coefficient from 0 to 1; lower panels: the distributions of estimated CRRA coefficients.)
For some violations it may be easy to write out specific parametric models of the latent EUT decision-making process that can account for the data. The problem is that the model that can easily account for one set of violations need not account for others. As already noted, the preference reversals of Grether and Plott (1979) can be explained by assuming risk-neutral subjects with an arbitrarily small error process, since the paired lotteries are designed to have the same expected value. Hence, each subject is indifferent, and the error process can account for the data.59 But then such subjects should not violate EUT in other settings, such as common ratio tests. However, rarely does one encounter tests that confront subjects with a wide range of tasks and evaluate behavior simultaneously over that wider domain. There are three striking counter-examples to this trend. First, Hey and Orme (1994) deliberately use lotteries that span a wide range of prizes and probabilities, avoiding "trip wire" pairs, and they conclude that EUT does an excellent job of explaining behavior compared to a wide range of alternatives. Second, Harless and Camerer (1994) consider a wide range of aggregate data across many studies, and find that EUT does a good job of explaining behavior if one places sufficient value on parsimony. On the
other hand, all of the data used by Harless and Camerer (1994) come from experimental designs that were intended to be tough on EUT compared to some alternative model, so their data are not as generic as those of Hey and Orme (1994). Third, Loomes and Sugden (1998) deliberately choose lotteries "… to provide good coverage of the space within each (implied Marschak–Machina probability) triangle, and also to span a range of gradients sufficiently wide to accommodate most subjects' risk attitudes" (p. 589). Their coverage is not as wide as that of Hey and Orme (1994) in terms of the range of CRRA values for which subjects would be indifferent under EUT, but the intent is clearly to provide some variability, and for the right reasons. Maximal statistical power calls for what might be termed a "complementary slack experimental design": choose one set of tasks such that if subjects are risk averse (risk neutral) then the choice model is tested, recognizing that if they are risk neutral (risk averse) then the other set of tasks tests the choice model. Thus, the subjects that clearly provide little information about EUT in common ratio tests in Fig. 16 should provide significant information about EUT in preference reversal tests (Harrison et al., 2007a).60 On the other hand, we know relatively little about which lottery pairs are the most "ecologically relevant" to use if we are trying to model task domains in a representative manner. Our only point is that this consideration deserves more attention by economists interested in making claims about the general validity of EUT or any other model, echoing similar calls from others (Smith, 2003).
3.6. Testing Auction Theory

To illustrate the potential importance of controlling for the risk attitude confound in a strategic setting, consider an important case in which there has been considerable debate over the ability of received theory to account for behavior: bidding in a first-price sealed-bid auction characterized by private and independent values.61 Auction theory is very rich, and has been developed specifically for the parametric cases considered in experiments (e.g., Cox, Roberson, & Smith, 1982; Cox, Smith, & Walker, 1988). In a new series of laboratory experiments, data are collected on observed valuations and bids, using standard procedures. However, information is also elicited that identifies the risk attitudes of the same subject, since that is a critical characteristic of the predicted bid under the standard model (e.g., Harrison, 1990). It is then straightforward to specify a joint likelihood function for the observed risk aversion responses and bids, estimate the risk aversion
characteristic, and test if the implied NE bid systematically differs from the observed bid. The results are striking. In the simplest possible case, when there are only two bidders (N = 2), received theory does a wonderful job of characterizing behavior when one controls for the risk attitudes of the individual bidder.62

3.6.1. Theoretical Predictions
Cox et al. (1982) develop a model of bidding behavior in first-price sealed-bid auctions that assumes that each agent has a CRRA power utility function U(y) = y^r, where U is the utility of experimental income y and (1 − r_i) is the Arrow–Pratt measure of risk aversion (RA). Each agent has their own r_i, so each agent is allowed to have distinct risk attitudes. However, r_i is restricted to lie in the interval (0, 1], where r_i = 1 corresponds to RN. Hence, this model allows (weak) risk aversion, but does not admit risk-loving behavior.63 Each agent in the model knows their own risk attitude, their own valuation v_i, that everyone's risk attitudes are drawn from the interval (0, 1], and that everyone's valuation is drawn from a uniform distribution over the interval (v_0, v_1). It can then be shown that the symmetric Bayesian NE implies the following bid function:

b_i = v_0 + [(N − 1)/(N − 1 + r_i)] (v_i − v_0)    (12)

where there are N active bidders. In the RN case in which v_0 = 0, v_1 = 1, and r_i = 1, this model is the one derived by Vickrey (1961), and calls for bidders to choose their optimal bid using a simple rule: take the valuation received and shade it down by (N − 1)/N. When N = 2, the RN NE bidding rule is therefore particularly simple: bid one-half of the valuation. Thus, one might expect the N = 2 case to provide a particularly compelling test of the general RA NE bidding rule, since the optimal RN NE bid is also an arithmetically simple heuristic.64

3.6.2. Experimental Design and Procedures
Each subject in our experiment participated in a single session consisting of two tasks.
The first task involved a sequence of choices designed to reveal each subject’s risk preferences. In the second task, subjects participated in a series of 10 first-price auctions against random opponents, followed by a small survey designed to collect individual characteristics. A total of 58 subjects from the student population of the University of Central Florida participated over three sessions. The smallest number of subjects in one
session was 16, so there was little chance that the subjects would rationally believe that they could establish reputations over the 10 rounds of bidding against a random opponent.65 Each subject was told that they would be privately assigned induced values between $0 and $8, using a uniform distribution. Cox et al. (1982) show that for RN subjects the expected earning of each subject in a first-price auction is (v_1 − v_0)/[N(N + 1)], where v_1 and v_0 are the upper and lower bounds for the support of the induced values. Thus, expected RN earnings were $1.33 per subject in each period. Subjects in each session were also informed of the number of other bidders in the auction; that the other bidders' induced values were, like their own, drawn from a uniform support with bounds given above; and that their earnings in the auction would equal their induced value minus their bid if they had the highest bid, or zero otherwise. We used the Holt and Laury (2002) design to elicit risk attitudes from the same subjects. In these experiments, we scaled the baseline prizes of their design, shown in panel A of Table 1, up by a factor of 2, so that the largest prize was $7.70 and the smallest prize was $0.20. The prizes in these lotteries effectively span the range of possible incomes in the auction, which range from $8.00 to zero.

3.6.3. Results
Panel B of Fig. 17 displays observed bidding behavior. The induced value is displayed on the bottom axis, a 45° line is shown and corresponds to the subject just bidding their value, and the RN bid prediction is shown under that 45° line. The standard behavior from a long series of such experiments is observed: subjects tend to bid higher than the RN prediction, to varying degrees. The statistical model consists of a likelihood of observing the risk aversion responses and the observed bidding responses.
The likelihood of the risk aversion responses is modeled with a probit choice rule defined over the 10 binary choices that each subject made, exactly as illustrated in Section 1.2 but for the power utility function. To allow for subject heterogeneity with respect to risk attitudes, the parameter r is modeled as a linear function of observed individual characteristics of the subject. For example, assume that we only had information on the age and sex of the subject, denoted Age (in years) and Female (0 for males, and 1 for females). Then we would estimate the coefficients α, β, and γ in r = α + β·Age + γ·Female. Therefore, each subject would have a different estimated r, r̂, for a given set of estimates of α, β, and γ to the extent that the
Fig. 17. Observed Risk Attitudes and Observed Bidding. (A) Risk Elicitation Task; power utility function assumed: r < 1 is RA, r = 1 is RN, and r > 1 is RL. (B) First-Price Sealed-Bid Auction; 2 bidders per auction over 10 rounds; N = 58 subjects with random opponents; valuations between $0 and $8, with bids plotted against induced values.
subject had distinct individual characteristics. So if there were two subjects with the same sex and age, to use the above example, they would literally have the same r̂, but if they differed in sex and/or age they would generally have distinct r̂. In fact, we use 12 individual characteristics in our model. Apart from age and sex, these include binary indicators for race (Non-white), a Business major, rich (parental or own income over $80,000 in 2003), high GPA (above 3.75), low GPA (below 3.25), college education for the father of the subject, college education for the mother of the subject, whether the subject works, whether the subject is a Catholic, and whether the subject is some other Christian denomination. Panel A of Fig. 17 displays the predicted risk attitudes from this estimation exercise, using only the risk aversion task. The likelihood of the bidding responses is then modeled as a multiplicative function of the predicted bid conditional on the estimated risk attitude for the subject. Thus, we estimate a coefficient b which scales the predicted NE bid up or down: if b = 1 then the observed bid exactly tracks the predicted bid for that subject. The predicted NE bid for each subject i depends, of course, on the r̂_i for that subject, as well as the parameters N,
v_0, v_1, and v_i. Thus, if we observe two subjects with the same v_i but different bids, it is perfectly possible for this to be consistent with the predicted NE bid if they have distinct individual characteristics and hence distinct r̂_i. The coefficient b is also modeled as a linear function of the same set of individual characteristics as the coefficient r.66 The full specification of the likelihood function for bidding allows for heteroskedasticity with respect to individual characteristics. Thus, the specification is (b × b^NE) + e, where the variance of e is again a linear function of the individual characteristics. Thus we obtain information from the coefficients of b on which types of subjects deviate systematically from the NE prediction, and we obtain information from the coefficients on e on which types of subjects exhibit more noise in their bidding. The overall likelihood consists of the likelihood of the risk aversion responses plus the likelihood of the bidding responses, conditional on estimates of r, b, and the variance of e. In turn, these three parameters are linear functions of a constant and the individual characteristics of the subject. Since each subject provides 10 binary choices in the risk aversion task, and 10 bids in the auction task, we use clustering to allow for the responses of the same subject to be correlated due to unobserved individual effects. Table 7 displays the ML estimates. The intercept for r is estimated to be 0.612, consistent with evidence from comparable experiments of risk aversion discussed earlier. The intercept for b is 1.02, consistent with bids being centered on the RA NE bid conditional on the estimated risk aversion for each subject. The top panel of Fig. 18 shows the distribution of predicted values of b for each of the 58 subjects. Some subjects have estimates of b as low as 0.8, or as high as 1.35, but the clear majority seem to be tracked well by the RA NE bidding prediction. The bottom panel of Fig.
18 displays a distribution of comparable estimates when we use the RN NE bidding prediction instead of the RA NE bidding prediction, and re-estimate the model. Observed bids are about 25% higher than predicted if one assumes, counter-factually, that subjects are all RN.
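The bid function in Eq. (12) and its RN special case are easy to compute directly. A minimal sketch, with an illustrative valuation and CRRA parameter (not estimates from the experiment); `ne_bid` is a helper defined here:

```python
# Risk-averse Bayesian NE bid function of Cox, Roberson and Smith (1982), Eq. (12):
# b_i = v0 + (N - 1)/(N - 1 + r_i) * (v_i - v0), with U(y) = y^r and r_i in (0, 1].

def ne_bid(v, r, n, v0=0.0):
    """Equilibrium bid for valuation v, CRRA parameter r, and n bidders."""
    return v0 + (n - 1) / (n - 1 + r) * (v - v0)

v = 6.0                       # illustrative induced value in [0, 8]
print(ne_bid(v, r=1.0, n=2))  # risk neutral: bid half the valuation
print(ne_bid(v, r=0.6, n=2))  # risk averse: bid more aggressively

# Expected RN earnings per period, (v1 - v0)/[N(N + 1)], with v1 = 8 and N = 2:
rn_earnings = (8.0 - 0.0) / (2 * (2 + 1))  # about $1.33, as in the text
```

Lower r (more risk aversion) shrinks the denominator in Eq. (12) and pushes the bid above the RN prediction, which is the pattern in panel B of Fig. 17.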
3.7. Testing Myopic Loss Aversion

Prospect Theory has forced economists to worry about the task domain over which decisions are evaluated, where a sequence of many tasks over time may be treated very differently from a single choice task. PT obviously focuses on the implications for loss aversion from this differential treatment.
Table 7. Maximum Likelihood Estimates for Model of Bidding Behavior.

Parameter r:
  Variable                            Point Estimate  Standard Error  p-Value  Lower 95% CI  Upper 95% CI
  Constant                             0.612          0.320           0.06     -0.02          1.24
  Age                                  0.003          0.015           0.82     -0.03          0.03
  Female                              -0.052          0.079           0.51     -0.21          0.10
  Non-white                            0.011          0.081           0.89     -0.15          0.17
  Major is in business                 0.058          0.086           0.50     -0.11          0.23
  Father completed college            -0.004          0.083           0.96     -0.17          0.16
  Mother completed college             0.003          0.093           0.97     -0.18          0.19
  Income over $80k in 2003             0.036          0.073           0.62     -0.11          0.18
  Low GPA (below 3.24)                -0.024          0.098           0.81     -0.22          0.17
  High GPA (greater than 3.75)         0.190          0.113           0.09     -0.03          0.41
  Work full-time or part-time         -0.022          0.074           0.77     -0.17          0.12
  Catholic religious beliefs          -0.046          0.130           0.72     -0.30          0.21
  Other Christian religious beliefs    0.040          0.080           0.62     -0.12          0.20

Parameter b:
  Constant                             1.021          0.721           0.16     -0.39          2.43
  Age                                 -0.007          0.030           0.81     -0.07          0.05
  Female                               0.019          0.084           0.82     -0.14          0.18
  Non-white                           -0.059          0.079           0.45     -0.21          0.09
  Major is in business                 0.023          0.083           0.78     -0.14          0.19
  Father completed college             0.054          0.068           0.43     -0.08          0.19
  Mother completed college            -0.023          0.085           0.79     -0.19          0.14
  Income over $80k in 2003             0.078          0.074           0.29     -0.07          0.22
  Low GPA (below 3.24)                 0.001          0.079           0.99     -0.15          0.16
  High GPA (greater than 3.75)         0.210          0.124           0.09     -0.03          0.45
  Work full-time or part-time         -0.019          0.068           0.78     -0.15          0.11
  Catholic religious beliefs           0.035          0.095           0.72     -0.15          0.22
  Other Christian religious beliefs    0.157          0.080           0.05      0.00          0.31

Parameter e:
  Constant                             0.096          1.093           0.93     -2.05          2.24
  Age                                 -0.011          0.046           0.81     -0.10          0.08
  Female                               0.008          0.156           0.96     -0.30          0.32
  Non-white                           -0.077          0.162           0.63     -0.39          0.24
  Major is in business                 0.123          0.133           0.36     -0.14          0.38
  Father completed college            -0.330          0.139           0.02     -0.60         -0.06
  Mother completed college             0.078          0.199           0.69     -0.31          0.47
  Income over $80k in 2003             0.044          0.119           0.71     -0.19          0.28
  Low GPA (below 3.24)                 0.116          0.111           0.29     -0.10          0.33
  High GPA (greater than 3.75)         0.044          0.191           0.82     -0.33          0.42
  Work full-time or part-time         -0.144          0.148           0.33     -0.43          0.15
  Catholic religious beliefs          -0.341          0.157           0.03     -0.65         -0.03
  Other Christian religious beliefs   -0.077          0.149           0.61     -0.37          0.21
Fig. 18. Relative Support for Alternative Nash Equilibrium Bidding Models. (Top panel: distribution of the estimated ratio of actual bids to the risk-averse NE predicted bid; bottom panel: the same ratio using the risk-neutral NE predicted bid; horizontal axis shows the estimated ratio of actual bids to predicted bids, from 0.8 to 1.4.)
Unfortunately, the insight from PT that the evaluation period might differ from setting to setting, or from subject to subject, has not been integrated into EUT. In fact, this insight is often presented as one of the essential points of departure from EUT, and as one of the differentiating characteristics of PT. We argue that the behavioral issue of the evaluation period is a
more general and fundamental concern than concerns about loss aversion in PT. By considering recent experimental tests of this insight, known as Myopic Loss Aversion (MLA), it is possible to see that the insight is just as relevant for EUT, and that a full characterization of risk attitudes must account for the evaluation period. Camerer (2005; p. 130) explains why one naturally thinks of loss aversion and the evaluation period together:

A crucial ingredient in empirical applications of loss aversion is decision isolation, or focusing illusion, in which single decisions loom large even though they are included in a stream of similar decisions. If many small decisions are integrated into a portfolio of choices, or a broad temporal view – the way a gambler might view next year's likely total wins and losses – the loss on any one gamble is likely to be offset by others, so aversion to losses is muted. Therefore, for loss aversion to be a powerful empirical force requires not only aversion to loss but also a narrow focus such that local losses are not blended with global gains. This theme emerges in the ten field studies that Camerer (2000) discusses, which show the power of loss aversion (and other prospect theory features) to explain substantial behaviors outside the lab.
However, there is very little direct experimental evidence, with real stakes, to support MLA. Furthermore, we argue that what evidence there is also happens to be consistent with EUT. By carefully considering those experimental tests from the perspective of EUT and the implications for the characterization of risk attitudes, it is easy to see that the behavioral issue of the evaluation period is a more general and fundamental concern. Several recent studies propose experimental tests that purport to directly test EUT against the alternative hypothesis of MLA. Gneezy and Potters (1997) and Haigh and List (2005) use simple experiments in which many potential confounds are removed.67 Unfortunately, those experiments only test a very special case of EUT against the alternative hypothesis. This special case is CRRA, and it fails rather dramatically. But it is easy to come up with other utility functions that are consistent with EUT and that can explain the observed data without relying on MLA. For example, any utility function with decreasing RRA and that exhibits risk aversion for low levels of income will suffice at a qualitative level. The empirical outcomes observed at the individual level can then be explained by simply fitting specific parameters to this utility function. Appendix E demonstrates this intuitively, as well as more formally. Our new analysis of the GP data presented in Appendix E also identifies some unsettling implications of these experiments for MLA: that the key ‘‘loss aversion’’ parameters of the standard MLA model vary dramatically
according to the exogenously imposed evaluation period, and that the risk attitudes are the opposite of those generally assumed in PT, viz., risk loving in gains and risk averse in losses. Thus, the behaviorist explanation is hoisted on the same petard it alleged applied to the EUT explanation: the presence of anomalous behavior. However, although it is useful and trivial to come up with a standard EUT story that accounts for the data, and even fun to find an anomaly for the behaviorists to ponder, these experiments force one to examine a much deeper question than "can EUT explain the data?" That question is whether utility is best defined over each individual decision that the subject faces or over the full sequence of decisions that the subject is asked to make in an experimental session, or perhaps even including extra-lab decisions. Depending on how the subjects interpret the experimental task, these frames could differ. This perspective suggests the hypothesis that behavior might be better characterized as a mixture of two latent data generating processes, as suggested by Harrison and Rutström (2005) and Section 3.3, with some subjects using one frame and other subjects using another frame. A related issue underlying the assessment of behavior from these experiments is asset integration within the laboratory session. What incomes are arguments of the utility functions of the subjects? The common assumption in experimental economics is that it is simply the prizes over which they were making choices whenever they got to make a choice.68 But what about asset integration of income earned during the sequence of rounds? Gneezy and Potters (1997; p. 636) note that this could affect risk attitudes in a more general specification, but assert that the effect is likely to be small given the small stakes. This may be true, but it is just an assertion and deserves more complete study using the general framework proposed by Cox and Sadiraj (2006).
The Gneezy and Potters (1997) data provide an opportunity to study this question, since subjects received information on their intra-session income flows at different rates. Hence one could, in principle, test what function of accumulated wealth was relevant for their choices. We believe that the fundamental insight of Benartzi and Thaler (1995) of the importance of the evaluation horizon of decision makers is worthy of more attention, even though we find that the present tests of MLA have been somewhat misleading. The real contribution of the MLA literature and the experimental design of Gneezy and Potters (1997) is to force mainstream economists to pay attention to an issue they have neglected within their own framework.
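The mixture idea mentioned above can be written down compactly. The sketch below is our own illustration, with made-up choice probabilities rather than the Harrison and Rutström (2005) estimation code; it shows the grand log-likelihood when each observation may come from one of two latent frames with mixing probability pi:

```python
import math

def mixture_loglik(pi, probs_frame1, probs_frame2):
    """Grand log-likelihood under a two-component mixture: each
    observation's likelihood is a pi-weighted average of the choice
    probabilities the two latent frames assign to the choice the
    subject actually made."""
    return sum(math.log(pi * p1 + (1.0 - pi) * p2)
               for p1, p2 in zip(probs_frame1, probs_frame2))

# Made-up per-observation choice probabilities under a per-decision
# frame and a session-level frame (purely illustrative):
p_frame1 = [0.9, 0.7, 0.6, 0.8]
p_frame2 = [0.4, 0.5, 0.9, 0.3]

# With pi = 1 the mixture collapses to the first frame alone:
ll_pure = sum(math.log(p) for p in p_frame1)
ll_mix = mixture_loglik(0.6, p_frame1, p_frame2)
```

In an actual estimation the frame-specific choice probabilities would themselves depend on structural parameters, and pi would be estimated jointly with them.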
116
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
3.8. The Random Lottery Incentive Procedure

The random lottery incentive procedure originated from the desire to avoid ''wealth or portfolio effects'' of subjects making multiple choices at once to determine their final experimental income.69 It also has the advantage of saving scarce experimental subject payments, but that arose originally as a happy by-product. The procedure bothers theorists and non-experimenters, particularly when one is using the experimental responses to estimate risk attitudes. The reason is that there is some ambiguity as to whether the subject is evaluating the utility of the lottery in each choice, or the compound lottery that includes the random selection of one lottery for payment. The procedure also imposes a motivational constraint on the level of incentives one can have in certain elicitation tasks. To generate better econometric estimates we would like to gather more choices from each subject: witness the glee that Wilcox (2008a) expresses over the sample size of the design in Hey (2001). Each subject in that design generated 500 binary choices, over five sessions at separate times, and was paid for one selected at random. But 1-in-500 is a small number, even if the prizes were as high as £125 and EV maximization would yield a payoff of just over £79. So there is a tension here, in which we want to gather more choices per subject, but run the risk that the probability of any one choice being realized drops as we do so. The experiments of Hey (2001) are remarkable because they appear to have motivated subjects well – aggregate error rates from repeated tasks are very low compared to those found in comparable designs with fewer tasks (Nathaniel Wilcox, personal communication). What we would like to do is run an experiment with as many choices as we believe that subjects can perform without getting bored, but ensure that they do not see each choice as having a vanishing chance of being salient.
In our experience, 60 binary choices are about the maximum we can expect our subjects to undertake without visible signs of boredom setting in. But even 1-in-60 sounds small, and may be viewed that way by subjects, effectively generating hypothetical responses and the biases that typically come with them (see Section 4.1). Of course, this is a behavioral issue: do subjects focus on the task as if it were definitely the one to be paid, or do they mostly focus on the likelihood of the task determining their earnings? Several direct tests of this procedure lead some critics of EUT to the conclusion that the procedure appears, as an empirical matter, to induce no cross-task contamination effects when choices are over simple lottery prospects; see Cubitt, Starmer, and Sugden (1998b, p. 129), for example. Related tests include Starmer and Sugden (1991) and Beattie and Loomes
(1997). So the empirical evidence suggests that it does not matter behaviorally. On the other hand, doubts remain. Certain theories of decision-making under risk differ in terms of the predicted effect these procedures have on behavior. To take an important example, consider the use of the random lottery incentive procedure in the context of an MPL task. The theoretical validity of this procedure presumes EUT, and if EUT is invalid then it is possible that this procedure might be generating invalid inferences. Under EUT it does not matter if the subjects evaluate their choices in each task separately, make one big decision over the whole set of tasks, or anything in between, since the random incentive is just a ''common ratio probability'' applied to each task. However, under RDU or PT this common ratio probability could lead to very different choices, depending on the extent of probability weighting. Hey and Lee (2005a, 2005b) provide evidence that subjects do not appear to consider all possible tasks, but their evidence is provided in the context of RLP designs discussed in Section 1.2. In that case the subject does not know the exact lotteries to be presented in the future, after the choice before him is made, so one can readily imagine the cognitive burden involved in anticipating what the future lotteries will be.70 But for the MPL instrument the subject does know the exact lotteries to be presented in the whole task, and the set of responses can be plausibly reduced in number to just picking one switch point, rather than picking from the 2^10 = 1024 possible binary choices in 10 rows. Thus, the MPL instrument may be more susceptible to concerns with the validity of the random lottery incentive procedure than other instruments.71 On the other hand, it is not obvious theoretically that one wants to avoid ''portfolio effects'' when eliciting risk attitudes. These effects arise as soon as subjects are paid for more than one out of K choices.
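The ''common ratio probability'' point can be made concrete. Under EUT, multiplying every outcome probability by the 1-in-K selection probability rescales all expected utilities by the same factor and cannot change any ranking; under probability weighting it can. The sketch below uses the Tversky and Kahneman (1992) weighting function with their estimate of 0.61 for its curvature parameter, the classic $3000/$4000 common-ratio prizes, and an illustrative utility exponent; the 1-in-4 selection rate is hypothetical:

```python
def w(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def rdu_value(prize, p):
    """RDU value of a lottery paying `prize` with probability p (zero
    otherwise); utility x**0.88 is an illustrative power function."""
    return w(p) * prize ** 0.88

# Classic common-ratio pair: A pays $3000 for sure, B pays $4000
# with probability 0.8.
v_a, v_b = rdu_value(3000, 1.0), rdu_value(4000, 0.8)

# Fold in a hypothetical 1-in-4 random-lottery selection: both
# outcome probabilities are multiplied by 0.25.
v_a_sel, v_b_sel = rdu_value(3000, 0.25), rdu_value(4000, 0.2)

# Under EUT the common factor 0.25 cannot change the ranking; under
# this probability weighting the preferred option flips from A to B.
```

With these parameters A is preferred in isolation, but once the selection probability is compounded in, B is preferred, so the random lottery procedure is not incentive-neutral under RDU or PT.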
Again, consider the same type of binary choice experiments considered above. The standard implementation of the random lottery incentive mechanism in experiments such as these would have one choice selected at random. For the case of investigating ''preference reversals'' the reason for only using one choice is well explained by Cox and Epstein (1989; p. 409): Economic theories of decision making under risk explain how variations in wealth can affect choices. Thus an agent with wealth w may prefer lottery A to lottery B but that same agent with wealth w' ≠ w may prefer lottery B to A. Therefore, the results of preference reversal experiments that allow a subject's wealth to change between choices cannot provide a convincing challenge to economic theory unless it can be shown that wealth effects cannot account for the results.
Economic theories of decision making under risk provide explanations of optimal portfolio choice. Such theories explain why an agent might prefer lottery A to lottery B but prefer the portfolio (A, B) to the portfolio (A, A). If the portfolio is accumulated by sequential choice of A over B and then B over A, an apparent preference reversal could consist of choices that construct an agent’s optimal portfolio.
When the interest is in the inferred risk coefficient, however, the possibility of subjects choosing portfolios to match their preferences has different implications. To avoid risk-pooling incentives, the outcomes of the lotteries must be uncorrelated, which is normally the case in such experiments. Nevertheless, even then it is possible for a subject to prefer the portfolio (A, B) to (A, A) even if he would prefer A to B when being paid only for one of his choices. To see this, recall that the lottery options presented to subjects are always discrete. In the MPL, for example, a switch from lottery A to lottery B on row 6 would lead us to infer a risk aversion coefficient that is in a numeric interval, (0.14, 0.41) in the Holt and Laury (2002) experiments. An individual with a risk aversion coefficient close to the boundaries of this interval would always pick (B, B) or (A, A), but an individual with a risk aversion coefficient in the middle of the interval would have a preference for a mixed portfolio of (A, B). Paying for more than one lottery therefore elicits more information and allows a more precise expression of the risk preference of each subject. The point is that we then have to evaluate risk attitudes assuming that subjects compare portfolios, rather than comparing one individual lottery with another individual lottery. If we do that, then there is no theoretical reason for avoiding portfolio effects for this inferential purpose. There may be a practical and behavioral reason for avoiding that assumption in the design considered by Hey and Lee (2005a, 2005b), given the cognitive burden (to subject and analyst) of constructing all possible expected portfolios. The behavioral significance of the portfolio effect can be directly tested by varying the number of lottery choices to be paid. In our replication of Hey and Orme (1994) we defaulted to having 60 binary lottery choices.
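The interval quoted above can be recovered numerically from the Holt and Laury (2002) prizes ($2.00 or $1.60 for the safe option A, $3.85 or $0.10 for the risky option B): a switch at row 6 brackets the CRRA coefficient between the indifference points at high-prize probabilities 0.5 and 0.6. A minimal bisection sketch (published tables round the bounds to roughly 0.15 and 0.41):

```python
def crra(x, r):
    """CRRA utility of Eq. (1), u(x) = x**(1-r)/(1-r); r != 1 assumed."""
    return x ** (1 - r) / (1 - r)

def eu_diff(r, p):
    """EU(safe A) - EU(risky B) in the row where the high prize has
    probability p, using the Holt-Laury prizes."""
    eu_a = p * crra(2.00, r) + (1 - p) * crra(1.60, r)
    eu_b = p * crra(3.85, r) + (1 - p) * crra(0.10, r)
    return eu_a - eu_b

def indifference_r(p, lo=0.01, hi=0.99):
    """Bisect for the CRRA coefficient at which a subject is
    indifferent between A and B in the row with probability p;
    eu_diff crosses zero from below as r rises."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if eu_diff(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A switch from A to B on row 6 brackets r between the row-5 and
# row-6 indifference points:
r_low, r_high = indifference_r(0.5), indifference_r(0.6)
```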
With 60 binary choices we selected three for payment, to ensure comparability of rewards with other experiments in which subjects made choices over 40 or 20 lotteries, and where 2 lotteries or 1 lottery, respectively, were selected at random to be played out. Thus, the 1-in-20 treatment corresponds exactly to the random lottery incentive procedure that avoids portfolio effects, and the other two treatments raise the possibility of these effects. All of these tasks were in the gain frame, and all involved subjects being provided information on the EV of each lottery. The samples consisted of 11, 21, and 25 subjects in the 20, 40, and 60 lottery treatments, respectively, for a pooled sample of
57 subjects. All the lottery outcomes were uncorrelated by executing independent draws. We find no evidence of portfolio effects, measured by the effect on the mean elicited risk attitudes. Assume an EUT model initially, and use the CRRA function given by Eq. (1), with a Fechner error specification. Pooling data over tasks in which the subject faced 20, 40, or 60 lotteries, on a between-subjects basis, and including a binary dummy for those sessions with 20 or 40 lotteries, there is no statistically significant effect on elicited risk attitudes. Quite apart from statistical insignificance, the estimated effect is small: around ±0.04 or less in terms of the risk aversion coefficient. The same conclusion holds with a comparable RDU model, whether one looks at the concavity of the utility function, Eq. (1), the curvature of the probability weighting function, Eq. (9), or both. This valuable result is worth replicating with larger samples and in different elicitation procedures. We want to have more binary choices from the same subject to get more precise estimates of latent structural models, but on the other hand we worry that paying 1-in-K choices for K ''large'' might seriously dilute incentives for thoughtful behavior over consequential outcomes. If one can modestly increase the salience of each choice, as implemented here, and not worry about portfolio effects, then it is possible to use values of K that allow much more precise estimates of risk attitudes. Of course, the absence of the portfolio effect must be checked behaviorally, as illustrated here.
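The estimation strategy used here can be sketched end-to-end on synthetic data: simulate binary choices from an EUT agent with CRRA utility, Eq. (1), and a Fechner error, then recover the parameters by maximum likelihood. The lottery prizes, true parameters, and the crude grid search are all illustrative; a real analysis would use a proper optimizer and clustered standard errors:

```python
import math
import random

def crra(x, r):
    """CRRA utility of Eq. (1): u(x) = x**(1-r)/(1-r)."""
    return x ** (1 - r) / (1 - r)

def choice_prob(lot_a, lot_b, r, mu):
    """Probability of choosing lottery B under EUT with CRRA utility
    and a Fechner error: Phi((EU_B - EU_A) / mu)."""
    eu_a = sum(p * crra(x, r) for p, x in lot_a)
    eu_b = sum(p * crra(x, r) for p, x in lot_b)
    z = (eu_b - eu_a) / mu
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Simulate 2,000 binary choices from an agent with known parameters:
random.seed(7)
TRUE_R, TRUE_MU = 0.6, 0.3
tasks = []
for _ in range(2000):
    lot_a = [(0.5, random.uniform(2, 6)), (0.5, random.uniform(2, 6))]
    lot_b = [(0.5, random.uniform(1, 10)), (0.5, random.uniform(1, 10))]
    chose_b = random.random() < choice_prob(lot_a, lot_b, TRUE_R, TRUE_MU)
    tasks.append((lot_a, lot_b, chose_b))

def loglik(r, mu):
    ll = 0.0
    for lot_a, lot_b, chose_b in tasks:
        p = choice_prob(lot_a, lot_b, r, mu)
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard the log
        ll += math.log(p if chose_b else 1.0 - p)
    return ll

# Crude grid-search MLE over (r, mu):
grid = [(r / 100.0, m / 100.0)
        for r in range(10, 100, 5) for m in range(10, 100, 10)]
r_hat, mu_hat = max(grid, key=lambda rm: loglik(*rm))
```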
3.9. Summary Estimates

We finally collate some ''preferred'' estimates of simple specifications of risk attitudes from the various designs and statistical specifications in the literature. We do not mechanically list every estimate from every design and specification, in the spirit of some meta-analyses, ignoring the weaknesses we have discussed in each. Instead, we use a priori judgements to focus on two of the designs that we believe to be most attractive, the statistical specifications we believe to be the best available, and the studies for which we have the most reliable data. One design is the classic data set of Hey and Orme (1994), and the other is the classic design of Holt and Laury (2002, 2005). We favor the Holt and Laury (2005) study over Holt and Laury (2002), because of the contaminant of order effects in the earlier design, identified by Harrison, Johnson, McInnes, and Rutström (2005b). Similarly, we favor the Fechner error specification of Hey and Orme (1994) over the Luce specification of Holt and Laury (2002), for reasons detailed by Wilcox
(2008a).72 We also augment the British data from Hey and Orme (1994) with results from our replications with U.S. college students. We consider CRRA and EP variants for EUT, and also consider some simple RDU specifications. The CRRA utility function is specification Eq. (1) from Section 2.2, the EP utility function is specification Eq. (1′) from Section 2.2, and the probability weighting function is the popular specification Eq. (9) from Section 3.1. We do not consider CPT, due to ambiguity over the interpretation of reference points in the laboratory. So this is a selective summary, guided by our views on these issues. We assume a homogeneous preferences specification, with no allowances for heterogeneity across subjects. In part, this is to anticipate their use by theorists interested in using point estimates for ''calibration finger arithmetic'' (Cox & Sadiraj, 2008). We stress that these point estimates have standard errors, and structural noise parameters, and that out-of-sample predictions of utility will have ever-expanding confidence intervals for well-known statistical reasons. We encourage theorists not to forget this simple statistical point when taking estimates such as these to make predictions over domains that they were not estimated over. Of course, the corollary is that we should always qualify estimates such as these by referencing the domain over which the responses were made. Indeed, Cox and Sadiraj (2008) show that these point estimates can produce implausible thought experiments far enough out of sample. Cox and Harrison (2008) provide further discussion of this point, using estimates from Table 8, and we return to it below. Table 8 collects the estimates.73 In each case the preferred model is the RDU specification with the EP utility function. Some interesting patterns emerge.
First, there appears to be very little substantive probability weighting in the Hey and Orme (1994) data, even if the coefficient γ is statistically significantly different from 1: indeed, the log-likelihoods of the EUT and RDU specifications with EP are close. Second, the two implementations of the Hey and Orme (1994) design generate estimates that are remarkably similar. Third, the extent and nature of probability weighting varies significantly in the Holt and Laury (2005) data depending on the assumed utility function. Fourth, there is evidence of decreasing RRA in the Hey and Orme (1994) data and our replication, with a < 0, but evidence of very slightly increasing RRA in the Holt and Laury (2005) data. Finally, the estimates of the concavity of the utility function do not seem to depend so much on the EUT or RDU specification, as on the choice of utility function. To return to the point about how estimates such as these should be ''read'' by theorists, and qualified by those presenting them, consider the
Table 8.  Summary Estimates.

Specification         Parameter  Estimate  Std. Err.  p-value(a)  Lower 95% CI  Upper 95% CI  Log-Likelihood

Hey and Orme (1994): N = 80 subjects, pooled over both tasks; 15,567 responses, excluding indifference
EUT with CRRA         r           0.61      0.03                   0.56          0.66          -8865.01
                      μ           0.78      0.06                   0.67          0.90
EUT with Expo-Power   r           0.82      0.02                   0.80          0.84          -8848.03
                      a          -1.06      0.04                  -1.13         -0.99
                      μ           0.47      0.04                   0.39          0.55
RDU with CRRA         r           0.61      0.03                   0.56          0.66          -8861.18
                      γ           0.99     <0.01                   0.98          1.00
                      μ           0.78      0.05                   0.67          0.89
RDU with Expo-Power   r           0.82      0.01                   0.80          0.84          -8844.11
                      a          -1.06      0.04                  -1.13         -0.99
                      γ           0.99     <0.01                   0.98          1.00
                      μ           0.46      0.04                   0.38          0.54

Our replication of Hey and Orme (1994): N = 63 subjects in gain domain; 3,736 responses, excluding indifference
EUT with CRRA         r           0.53      0.05                   0.44          0.62          -2418.62
                      μ           0.79      0.06                   0.67          0.91
EUT with Expo-Power   r           0.78      0.02                   0.74          0.82          -2412.26
                      a          -1.10      0.05                  -1.19         -1.00
                      μ           0.58      0.05                   0.48          0.69
RDU with CRRA         r           0.53      0.04                   0.45          0.62          -2414.46
                      γ           0.97      0.01                   0.95          0.99
                      μ           0.78      0.05                   0.66          0.90
RDU with Expo-Power   r           0.78      0.02                   0.74          0.82          -2408.25
                      a          -1.10      0.05                  -1.19         -1.01
                      γ           0.97      0.01                   0.95          0.99
                      μ           0.57      0.05                   0.47          0.67

Holt and Laury (2005): N = 96 subjects, pooled over the 1x and 20x tasks, with no order effects; 960 non-hypothetical responses
EUT with CRRA         r           0.76      0.04                   0.68          0.84           -330.93
                      μ           0.94      0.15                   0.64          1.24
EUT with Expo-Power   r           0.40      0.07                   0.25          0.54           -303.94
                      a           0.07      0.02                   0.04          0.11
                      μ           0.12      0.02                   0.07          0.16
RDU with CRRA         r           0.85      0.08                   0.69          1.00           -325.50
                      γ           1.46      0.35      0.19(b)      0.77          2.15
                      μ           0.89      0.14                   0.61          1.17
RDU with Expo-Power   r           0.26      0.05                   0.16          0.36           -288.09
                      a           0.02      0.01      0.16        -0.01          0.04
                      γ           0.37      0.15                   0.07          0.67
                      μ           0.06      0.02                   0.02          0.11

(a) Empty cells are p-values that are less than 0.005.
(b) The null hypothesis here is that γ = 1.
Fig. 19. Estimated In-Sample and Out-of-Sample Utility. (Estimated from Responses of 63 Subjects over 60 Binary Choices. Assuming EUT CRRA Specification with Fechner Error. Data from Our Replication of Hey and Orme (1994): Choices over Prizes of $0, $5, $10, and $15. Point Prediction of Utility and 95% Confidence Intervals.) (A) In-Sample (Prizes in U.S. Dollars, $0-$20). (B) Out-of-Sample (Prizes in U.S. Dollars, $0-$250).
predicted utility values in Fig. 19. These predictions are from our replications of the Hey and Orme (1994) design, and the estimates for the EUT CRRA specification in Table 8. Fig. 19 displays predicted in-sample utility values and their 95% confidence interval using these estimates. Obviously the cardinal values on the vertical axis are arbitrary, but the main point is to see how relatively tight the confidence intervals are in relation to the changes in the utility numbers over the lottery prizes. Note the slight ''flare'' in the confidence interval in panel A of Fig. 19, as we start to modestly predict utility values beyond the top $15 prize used in estimation. Panel B extrapolates to provide predictions of out-of-sample utility values, up to $250, and their 95% confidence intervals. The widening confidence intervals are exactly what one expects from elementary econometrics. And these intervals would be even wider if we accounted for our uncertainty about whether this is the correct functional form, and whether we used the correct stochastic identifying assumptions. Moreover, the (Fechner) error specification used here allows for an extra element of imprecision when predicting what a subject would actually choose after evaluating the expected utility of the out-of-sample lotteries, and
this does not show up in Fig. 19 since we only use the point estimate of μ. The lesson here is that we have to be cautious when we make theoretical and empirical claims about risk attitudes. If the estimates displayed in panel A of Fig. 19 are to be used in the out-of-sample domain of panel B of Fig. 19, the extra uncertainty of prediction in that domain should be acknowledged. Cox and Sadiraj (2008) show why we want to make such predictions, for both EUT and non-EUT specifications; we review the methods that can be used to generate these data, and econometric methods to estimate utility functions; and Wilcox (2008a) shows how alternative stochastic assumptions can have strikingly different substantive implications for the estimation of out-of-sample risk attitudes.
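The fanning-out of the confidence interval in panel B of Fig. 19 can be illustrated with a back-of-the-envelope delta-method calculation using the CRRA point estimate and standard error from our replication in Table 8 (r = 0.53, standard error 0.05); this is a sketch, not the procedure used to draw the figure:

```python
R_HAT, SE_R = 0.53, 0.05   # CRRA estimate and standard error (Table 8)

def u(x, r):
    """CRRA utility of Eq. (1): u(x) = x**(1-r)/(1-r)."""
    return x ** (1 - r) / (1 - r)

def ci_width(x, h=1e-5):
    """Width of the 95% confidence interval for predicted utility at
    prize x by the delta method: var(u) is approximately
    (du/dr)**2 * var(r), with the derivative taken by central
    difference."""
    du_dr = (u(x, R_HAT + h) - u(x, R_HAT - h)) / (2 * h)
    return 2 * 1.96 * abs(du_dr) * SE_R

# Tight inside the estimation domain, fanning out on extrapolation:
width_in, width_out = ci_width(15.0), ci_width(250.0)
```

The interval at $250 is more than an order of magnitude wider than at $15, which is the elementary econometric point made in the text.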
4. OPEN AND CLOSED QUESTIONS

We briefly review some issues which are, in our view, wide open for research or long closed.
4.1. Hypothetical Bias

Top of the ''closed'' list for us is the issue of hypothetical bias. This was a prime focus of Holt and Laury (2002, 2005), and again in Laury and Holt (2008), and has been reviewed in detail by Harrison (2007). For some reason, however, many proponents of behavioral economics insist on using task responses that involve hypothetical choices. One simple explanation is that many of the earliest examples in behavioral economics came from psychologists, who did not use salient rewards to motivate subjects, and this tradition just persisted. Another explanation is that an influential survey by Camerer and Hogarth (1999) is widely mis-quoted as concluding that there is no evidence of hypothetical bias in such lottery choices. What Camerer and Hogarth (1999) actually conclude, quite clearly, is that the use of hypothetical rewards makes a difference to the choices observed, but that it does not generally change the inference that they draw about the validity of EUT.74 Since the latter typically involve paired comparisons of response rates in two lottery pairs (e.g., in common ratio tests), it is logically possible for there to be (i) differences in choice probabilities in a given lottery depending on whether one uses hypothetical or real responses,
and (ii) no difference between the effect of the EUT treatment on lottery pair response rates depending on whether one uses hypothetical or real responses. Furthermore, Camerer and Hogarth (1999) explicitly exclude from their analysis the mountain of data from experiments on valuation that show hypothetical bias.75 Their rationale for this exclusion was that economic theory did not provide any guidance as to which set of responses was valid. This is an odd rationale, since there is a well-articulated methodology in experimental economics that is quite precise about the motivational role of salient financial incentives (Smith, 1982). And the experimental literature has generally been careful to consider elicitation mechanisms that provide dominant strategy incentives for honest revelation of valuations, and indeed in most instances explain this to subjects since it is not being tested. Thus, economic theory clearly points to the real responses as having a stronger claim to represent true valuations. In any event, the mere fact that hypothetical and real valuations differ so much tells us that at least one of them is wrong! Thus, one does not actually need to identify one as reflecting true preferences, even if that is an easy task a priori, in order to recognize that there are systematic and large differences in behavior between hypothetical and real responses.
4.2. Sample Selection

This is a wide-open issue that experimental economists will have to confront systematically before other researchers from labor economics do so for them. It is likely to be a significant factor in many experiments, since randomization to treatment is fundamental to statistical control in the design of experiments. But randomization implies some uncertainty about treatment condition, and individuals differ in their preferences towards taking on risk. Since human subjects volunteer for experiments, it is possible that the sample observed in an experiment might be biased because of the risk inherent in randomization. In the extreme case, subjects in experiments might be those that are least averse to being exposed to risk. For many experiments of biological response this might not be expected to have any influence on measurement of treatment efficacy, but other laboratory, field and social experiments measure treatment efficacy in ways that could be directly affected by randomization bias.76 On the other hand, the practice in experimental economics is to offer subjects a fixed participation fee to encourage attendance. These
non-stochastic participation fees could offset the effects of randomization, by encouraging more risk-averse subjects to participate than might otherwise be the case. Thus, the term ''randomization bias,'' in the context of economics experiments, should be taken to mean the net effects from these two latent sample selection effects.77 There is indirect evidence for these sample selection effects within the laboratory. One can recruit subjects to an experiment, conduct a test of risk attitudes, and then allow subjects to sort themselves into a given task rewarded by fixed or performance-variable payments. Cadsby, Song, and Tapon (2007) and Dohmen and Falk (2006) did just this, and show that more risk-averse subjects select into tasks with fixed rewards rather than rewards that vary with uncertain performance, and suffer in terms of expected pay. Of course, they were happy to forego some expected income in return for reduced variance. But these results strongly suggest that there would be an effect from risk attitudes if one moved the sample selection process one step earlier to include the choice to participate in the experimental session itself.78 Harrison, Lau, and Rutström (2005c) undertake a field experiment and a laboratory experiment to directly test the hypothesis that risk attitudes play a role in sample selection.79 In both cases they followed standard procedures in the social sciences to recruit subjects. In their experiments the primary source of randomness had to do with the stochastic determination of final earnings, as explained below. They also employed random assignment to treatment in some experiments, but the general point applies whether the randomness is due to assignment to treatment or random determination of earnings, since the effect is the same on potential subjects.
Nevertheless, it is reasonable to suspect that members of most populations from which experimenters recruit participants hold beliefs that the benefits from participating are uncertain. All that is required for sample selection to introduce a bias in the risk attitude of the participants is an expectation of uncertainty, not an actual presence of uncertainty in the experimental task. In the field experiment it was possible to exploit the fact that the experimenter already knew certain characteristics of the population sampled, adults in Denmark in 2003, allowing a correction for sample selection bias using well-known methods from econometrics. The classic problem of sample selection refers to possible recruitment biases, such that the observed sample is generated by a process that depends on the nature of the experiment.80 In principle, there are two offsetting forces at work in this sample selection process, as mentioned above. The use of randomization
could attract subjects who are less risk averse than the population into experiments, if the subjects rationally anticipate the use of randomization.81 Conversely, the use of guaranteed financial remuneration for participation, common in experiments in economics, could encourage those that are more risk averse to participate. These field experiments therefore allowed an evaluation of the net effect of these opposing forces, which are intrinsic to any experiment in which subjects are voluntarily recruited with financial rewards. The results indicate that measured risk aversion is smaller after corrections for sample selection bias, consistent with the hypothesis that the use of a substantial, guaranteed show-up fee more than offsets any bias against attending an experiment that involved randomization. This effect is statistically significant. The results also suggest that there is no evidence that any sample selection that occurred influenced inferences about the effects of observed individual demographic characteristics on risk aversion. Harrison et al. (2005c) then conducted a laboratory experiment to complement the insights from their field experiment, and explore the conclusion that a larger gross sample selection effect might have been experienced due to randomization, but that the muted net sample selection effect actually observed was due to ''lucky'' choices of participation fees. The field design used the same fixed recruitment fee for all subjects, to ensure comparability of subjects in terms of the behavioral task. In the laboratory experiments this fixed recruitment fee was exogenously varied. If the level of the fixed fee affects the risk attitudes of the sample that choose to participate in the experiment, at least over the amounts they consider, one should then be able to directly see different risk attitudes in the sample. As expected a priori, they observed samples that were more risk averse when a higher fixed participation fee was used.
In another treatment in the laboratory experiments they varied only the range of the prizes possible in the task, keeping the fixed participation fee constant, but announcing these ranges at the time of recruitment. In this case, they observed samples that were more risk averse when the range of prizes was widened, compared to the control. Hence, the level of the fixed recruitment fee and information on the range of prizes in the experiment had a direct influence on the composition of the sample in terms of individual risk attitudes. The implication is that experimental economists should pay much more attention to the process that leads subjects to participate in the experiment if they are to draw reliable inferences in any setting in which risk attitudes play a role. This is true whether one conducts experiments in the laboratory or the field.82
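The selection mechanism at work here can be mimicked with a deterministic toy model: subjects attend if the show-up fee plus the certainty equivalent of the stochastic session earnings beats an outside option, so a higher fee pulls in more risk-averse subjects. All numbers below are hypothetical:

```python
import math

def certainty_equivalent(lottery, r):
    """Certainty equivalent of a lottery under CRRA; the r = 1 case
    uses the logarithmic form."""
    if abs(r - 1.0) < 1e-9:
        return math.exp(sum(p * math.log(x) for p, x in lottery))
    m = sum(p * x ** (1 - r) for p, x in lottery)
    return m ** (1.0 / (1 - r))

# Hypothetical stochastic session earnings and outside option:
session_lottery = [(0.5, 10.0), (0.5, 30.0)]
OUTSIDE_OPTION = 20.0

# Population of CRRA coefficients from 0 (risk neutral) to 3:
r_grid = [i * 0.05 for i in range(61)]

def sample_mean_r(fee):
    """Mean risk aversion among those who choose to attend: a subject
    attends iff fee + CE(session earnings) >= outside option."""
    attendees = [r for r in r_grid
                 if fee + certainty_equivalent(session_lottery, r)
                 >= OUTSIDE_OPTION]
    return sum(attendees) / len(attendees)

# Raising the guaranteed fee pulls in subjects with higher r, so the
# observed sample is more risk averse than under the lower fee.
```

Because the certainty equivalent falls as r rises, the marginal attendee at a higher fee is more risk averse, which is the qualitative pattern the laboratory treatments detected.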
A closely related issue is what role risk attitudes may play in affecting subjects' participation choices over different institutions or cohorts when such choices are allowed.83 It is common in the experimental literature to study behavior in two or more institutions imposed exogenously on subjects, or to put subjects together exogenously. But in the naturally occurring world that our experiments are modeling, people choose institutions to some degree, and choose who to interact with to some degree. The effect of treatments may be completely different when people have some ability to select into them, or some ability to choose the cohorts to participate with, compared to the standard experimental paradigm. In effect, the experiment just has to be widened to include these processes of selection, if appropriate for the behavior under study. The broader experimental literature now identifies many possible mechanisms for this process, such as migration from one region to another in which local public policies exhibit differences (Ehrhart & Keser, 1999; Page, Putterman, & Unel, 2005; Gürerk, Irlenbusch, & Rockenbach, 2006), voting in an explicit social choice setting (Botelho, Harrison, Pinto, & Rutström, 2005a; Ertan, Page, & Putterman, 2005; Sutter, Haigner, & Kocher, 2006), lobbying for policies (Bullock & Rutström, 2007), and even the evolution of social norms of conduct (Falk, Fehr, & Fischbacher, 2005). Each of these processes will interact with the risk attitudes of subjects.
4.3. Extending Lab Procedures to the Field

One of the main attractions of experimental methods is the control they provide over factors that could influence behavior. The ability to control the environment allows the researcher to study the effects of treatments in isolation, and hence makes it easier to draw inferences as to what is influencing behavior. In most cases we are interested in making inferences about field behavior. We hypothesize that there is a danger that the imposition of an exogenous laboratory control might make it harder, in some settings, to make reliable inferences about field behavior. The reason is that the experimenter might not understand something about the factor being controlled, and might impose it in a way that is inconsistent with the way it arises naturally in the field, and that affects behavior. Harrison et al. (2007c) take as a case study the elicitation of measures of risk aversion in the field. In the traditional paradigm, risk aversion is viewed in terms of diminishing marginal utility of the final prize in some abstract lottery. The concept of a lottery here is just a metaphor for a real lottery,
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
although in practice the metaphor has been used as the primary vehicle for laboratory elicitation of risk attitudes. In general there is some commodity x and various levels i of x, denoted x_i, that depend on some state of nature which occurs with a probability p_i that is known to the individual whose preferences are being elicited. Thus, the lottery is defined by {x_i; p_i}. Traditional measures of risk aversion under EUT are then defined in terms of the curvature of the utility function with respect to x. Now consider the evaluation of risk attitudes in the field. This generally entails more than just ‘‘leaving the classroom’’ and recruiting outside of a university setting, as emphasized by Harrison and List (2004). In terms of sample composition, it means finding subjects who deal with that type of uncertainty to varying degrees, and trying to measure the extent of their field experience with uncertainty. Moreover, it means developing stimuli that more closely match those that the subjects have previously experienced, so that they can use whatever heuristics they have developed for that commodity when making their choices. Finally, it means developing ways of communicating probabilities that correspond with language that the subjects are familiar with. Thus, field experimentation in this case, and in general, involves several simultaneous changes from the lab setting with respect to subject recruitment and the development of stimuli that match the field setting. Apart from sample and task selection issues, a second theme that is important to the relevance of lab findings to field inferences is the influence of ‘‘background risk’’ on the attitudes towards a specific ‘‘foreground risk’’ that is the focus of the elicitation task. In many field settings it is not possible to artificially identify attitudes towards one risk source without worrying about how the subjects view that risk as being correlated with other risks.
For example, mortality risks from alternative occupations tend to be highly correlated with morbidity risks. It is implausible to ask subjects their attitude toward one risk without some coherent explanation as to why a higher or lower level of that risk would not be associated with a higher or lower risk of the other. Apart from situations where risks may be correlated, ‘‘background risk’’ can have an influence on elicited risk attitudes also when it is independent of the ‘‘foreground risk.’’ The theoretical literature has also yielded a set of preferences that guarantee that the addition of an unfair background risk to wealth reduces the certainty equivalent (CE) of any other independent risk. That is, the addition of background risk of this type makes risk-averse individuals behave in a more risk-averse way with respect to any other independent risk. Gollier and Pratt (1996) refer to this type of behavior as ‘‘risk vulnerability,’’ and show
Risk Aversion in the Laboratory
that all weakly Decreasing Absolute Risk Averse utility functions are risk vulnerable. This class includes many popular characterizations of risk attitudes, such as CARA and CRRA. Eeckhoudt, Gollier, and Schlesinger (1996) extend these results by providing the necessary and sufficient conditions on the characterization of risk aversion to ensure that any increase in background risk induces more risk aversion. The field experiment in Harrison et al. (2007c) is designed to analyze such situations of independent multiple risks. They compare using monetary prizes to using prizes whose values involve some risk, and conclude that the risk attitudes elicited are not the same in the two circumstances. The prizes are collector coins and the subjects are numismatists. They find that the subjects are generally more risk averse over the prizes when the latter involve additional, and independent, risk.84 These results are consistent with the available theory from conventional EUT for the effects of background risk on attitudes to risk. Thus, applying risk preferences that have been elicited in the lab to field settings with background risks may underestimate the extent to which decisions will reflect risk aversion. In addition, eliciting risk attitudes in a natural field setting with natural tasks and non-monetary prizes requires one to consider the nature and degree of background risk, since it is inappropriate to ignore it.85 A further virtue of extending lab procedures to the field, therefore, is to encourage richer lab designs by forcing the analyst to account for realistic features of the natural environment that have been placed aside. In virtually any market with asymmetric information, whether it is a coins market, an open-air market, or a stock exchange, a central issue is the quality of the object being traded. This issue, and attendant uncertainty, arises naturally.
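The risk vulnerability result can be illustrated with a small numerical sketch. The parameters below are invented for illustration, not taken from Harrison et al. (2007c): a CRRA decision maker with relative risk aversion r = 2 (weakly DARA, hence risk vulnerable), a 50/50 gain or loss of $5 as the foreground lottery at wealth 50, and an independent 50/50 gain or loss of $20 as a zero-mean background risk. The certainty equivalent of the foreground risk falls when the background risk is added:

```python
import math

def u(x, r=2.0):
    # CRRA utility; r = 2 is weakly DARA and hence risk vulnerable
    return math.log(x) if r == 1.0 else x ** (1.0 - r) / (1.0 - r)

def ce_of_foreground(w, fg, bg):
    """Certainty equivalent c of foreground lottery fg, solving
    E[u(w + Y + c)] = E[u(w + Y + X)] for independent background Y
    (all states equally likely). Found by bisection, since E[u] is
    increasing in c."""
    target = sum(u(w + y + x) for y in bg for x in fg) / (len(bg) * len(fg))
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(u(w + y + mid) for y in bg) / len(bg) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

w = 50.0
foreground = [-5.0, 5.0]      # 50/50 zero-mean foreground risk
no_background = [0.0]
background = [-20.0, 20.0]    # independent 50/50 zero-mean background risk

ce_alone = ce_of_foreground(w, foreground, no_background)
ce_with_bg = ce_of_foreground(w, foreground, background)
print(ce_alone, ce_with_bg)   # the CE falls once background risk is added
```

The CE of the foreground risk is about −0.5 on its own and lower still with the background risk in place, which is exactly the ‘‘behave in a more risk-averse way’’ prediction.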
In many markets, the grade of the object, or professional certification of the seller, is one of the critical variables determining price. Thus, one could scarcely design a test of foreground risk in these markets without attending to the background risk. Harrison et al. (2007c) exploit the fact that such risks can be exogenously controlled in these settings, and in a manner consistent with the predictions of theory.86 In a complementary manner, Fiore, Harrison, Hughes, and Rutström (2007) consider how one can use simulation tools to represent ‘‘naturally occurring probabilities.’’ As one moves away from the artifactual controls of the laboratory, distributions of outcomes are not always discrete, and probabilities are not given from outside. They are instead estimated as the result of some process that the subject perceives. One approach to modeling such naturally occurring probabilities in experiments is to write out a numerical simulation model that represents the physical process
that stochastically generates the outcome as a function of certain inputs, render that process to subjects in a natural manner using tools of Virtual Reality, and study how behavior changes as one changes the inputs. For example, the probability that a wildfire will burn down a property ‘‘owned’’ by the subject might depend on the location of the property, the vegetation surrounding it, the location of the start of the wildfire, weather conditions, and interventions that the subject can choose to pay for to reduce the spread of a wildfire (e.g., prescribed burning). This probability can be simulated using a model such as FARSITE, developed by the U.S. Forest Service (Finney, 1998) to predict precisely these events. Thus, the subject sees a realistic rendering of the process generating a distribution over the binary outcome, ‘‘my property burns down or not.’’ By studying how subjects react to this process, one can better approximate the manner in which risk attitudes affect decisions in naturally occurring environments.
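The idea of a probability that is estimated from a perceived process, rather than given from outside, can be mimicked with a toy simulation. The logistic ‘‘burn probability’’ below and all of its parameter values are invented stand-ins for a genuine process model such as FARSITE; the sketch simply draws repeated simulated outcomes of ‘‘property burns or not’’ and estimates the implied probability by relative frequency:

```python
import math
import random

def burn_probability(distance_km, mitigation):
    # hypothetical logistic model: a more distant ignition point and paying
    # for prescribed burning both reduce the chance the property burns
    z = 1.5 - 0.4 * distance_km - (2.0 if mitigation else 0.0)
    return 1.0 / (1.0 + math.exp(-z))

def simulate(distance_km, mitigation, draws=20000, seed=0):
    # relative-frequency estimate of the probability, as a subject watching
    # repeated simulated outcomes might form it
    rng = random.Random(seed)
    p = burn_probability(distance_km, mitigation)
    burns = sum(rng.random() < p for _ in range(draws))
    return burns / draws

p_hat_base = simulate(distance_km=2.0, mitigation=False)
p_hat_mitigated = simulate(distance_km=2.0, mitigation=True)
print(p_hat_base, p_hat_mitigated)
```

Varying the inputs (distance, mitigation) and re-estimating traces out how the subject's perceived probability should respond to the treatment, which is the comparative static of interest in such designs.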
5. CONCLUSION

At a substantive level, the most important conclusion is that the average subject is moderately risk averse, but there is evidence of considerable individual heterogeneity in risk attitudes in the laboratory. This heterogeneity is in evidence within given elicitation formats, so it cannot be ascribed to differences in elicitation formats. The range of risk attitudes is modest, however, and there is relatively little evidence of risk-loving behavior. The temptation to talk about a ‘‘central tendency’’ of ‘‘slight risk aversion’’ does not fit well with the bi-modal nature of the responses observed in several studies: a large fraction of subjects is well characterized as being close to risk neutral, or very slightly risk averse, and another large fraction as being quite risk averse.

At a methodological level, the evidence suggests some caution in expecting different elicitation formats to generate comparable data on risk attitudes. Both the framing of the questions and the implied incentives differ across instruments and may affect responses. One would expect the MPL and RLP procedures to generate comparable results, since they are so similar from a behavioral perspective, and they do. The OLS instrument is very portable in the field, has transparent incentives for truthful responses, and is easy to administer in all environments, so more work comparing its performance to the MPL and RLP instruments would be valuable. It suffers from not being able to provide a rich characterization of behavior when
allowances are made for probability weighting, but that may be mitigated with extensions to consider probabilities other than 1/2. In the Epilogue to a book-length review of the economics of risk and time, Gollier (2001; p. 424ff.) writes that

It is quite surprising and disappointing to me that almost 40 years after the establishment of the concept of risk aversion by Pratt and Arrow, our profession has not yet been able to attain a consensus about the measurement of risk aversion. Without such a consensus, there is no hope to quantify optimal portfolios, efficient public risk prevention policies, optimal insurance deductibles, and so on. It is vital that we put more effort on research aimed at refining our knowledge about risk aversion. For unclear reasons, this line of research is not in fashion these days, and it is a shame.
The most important conclusion we draw from our survey is that reliable laboratory methods exist to determine an individual subject’s aversion to risk, or to characterize the distribution of risk attitudes of a specific sample. These methods can now be systematically employed to ensure greater control over tests and applications of theory that depend on risk attitudes.
NOTES

1. For example, in virtually all experimental studies of non-cooperative bargaining behavior. A particularly striking example is provided by Ochs and Roth (1989), since Roth and Malouf (1979) pioneered the use of experimental procedures to induce risk neutral behavior in cooperative bargaining settings.
2. For example, in virtually all experimental studies of bidding behavior in first-price auctions, whether in private values settings (Cox et al., 1982) or common values settings (Kagel & Levin, 2002).
3. For example, the experimental literature on bidding behavior in first-price sealed bid auctions relies on predictions that are conditioned on the subjects following some Nash Equilibrium strategy as well as being characterized by risk in some way. Overbidding in comparison to the risk-neutral prediction could be due to failure of either the assumption of Nash bidding or the failure of the assumption of risk neutrality (Section 3.6). Harrison (1990) and Cox, Smith, and Walker (1985) attempt to tease these two possibilities apart using different designs.
4. We do not consider experimental designs that attempt to control for risk, or induce specific risk attitudes. Our general focus is on direct estimation of risk attitudes where rewards are real and there is some presumption that the procedure is incentive compatible. There is a huge, older literature on the elicitation of utility, but virtually none of it is concerned with incentive compatibility of elicitation, which we take as central. Great reviews include Fishburn (1967) and Farquhar (1984). Many components of the procedures we consider can be viewed as building on methods developed in this older literature. Biases in utility elicitation procedures are reviewed by Hershey et al. (1982), although again there is no discussion at all of incentive compatibility or hypothetical rewards bias.
5. Birnbaum (2004) illustrates the type of systematic comparison of representations that ought to be built in for broader research programs. He considers various representations of probability in terms of text, pie charts, natural frequencies, and alignments of equally likely consequences, as well as minor variants within each type of representation. One reason for this focus is his concern with violations of stochastic dominance, which is an elemental behavioral property of decisions, and presumed to be directly affected by task representation. In brief, he finds little effect on the extent of stochastic dominance of these alternative representations. That conclusion is limited to the hypotheses he considers, of course; there could still be an effect on structural estimates of underlying models, and other hypotheses derived from those estimates. Unfortunately, the procedures he uses, still common in the psychology and decision-making literature, employ hypothetical or near-hypothetical rewards, so the decisions subjects make are not salient. Wakker, Erev, and Weber (1994) consider four types of representations, shown in Appendix A, in salient choices, but provide no evaluation of the effects of the alternatives.
6. There is an interesting question as to whether they should be provided. Arguably some subjects are trying to calculate them anyway, so providing them avoids a test of the joint hypothesis that ‘‘the subjects can calculate EV in their heads and will not accept a fair actuarial bet.’’ On the other hand, providing them may cue the subjects to adopt risk-neutral choices. The effect of providing EV information deserves empirical study.
7. The last row does have the advantage of helping subjects see that they should obviously switch to option B by the last row, and hence seeing the ordered nature of the overall instrument.
Arguably it would be useful to add a row 0 in which the lower prize for options A and B were obtained with certainty, to help the subject see that they should always choose A at the top and B at the bottom, and the only issue is where they should switch.
8. Schubert et al. (1999) present their method as the elicitation of a certainty-equivalent, but do not say clearly how they elicited the certainty-equivalent. In fact (Renate Schubert, personal communication) their procedures represent an early application of the MPL idea. Each subject was asked to choose between two lotteries, where one lottery was the risky one and the other was a degenerate, non-stochastic one. They asked subjects 98 binary choice questions, spanning 8 risky lotteries. These were arrayed in an ordered fashion on 98 separate sheets. The responses could then be ordered in increasing values for the non-stochastic lottery, and a ‘‘switch point’’ determined to identify the certainty-equivalent.
9. If the subject always chooses A, or indicates indifference for any of the decision rows, there are no additional decisions required and the task is completed.
10. Let the first stage of the iMPL be called Level 1, the second stage Level 2, and so on. After making all responses, the subject has one row from the first table of responses in Level 1 selected at random by the experimenter. In the MPL and sMPL procedures, that is all there is since there is only a Level 1 table. In the iMPL, that is all there is if the row selected at random by the experimenter is not the one at which the subject switched in Level 1. If it is the row at which the subject switched, another random draw is made to pick a row in the Level 2 table. For some tasks this procedure is repeated to Level 3.
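The expected-value logic behind the ordered MPL design in notes 6 and 7 can be sketched numerically. The prizes below are the illustrative ones from note 16 ($20/$16 for the safe option, $38.50/$1 for the risky one), with the usual ten rows in which the probability of the high prize rises from 0.1 to 1.0; the sketch finds the row at which a hypothetical risk-neutral subject should switch from safe to risky:

```python
# MPL rows: probability of the high prize rises from 0.1 to 1.0 by row
safe_hi, safe_lo = 20.00, 16.00
risky_hi, risky_lo = 38.50, 1.00

rows = []
for row in range(1, 11):
    p = row / 10.0
    ev_safe = p * safe_hi + (1 - p) * safe_lo
    ev_risky = p * risky_hi + (1 - p) * risky_lo
    rows.append((row, ev_safe, ev_risky))

# first row where the risky option has the higher expected value
switch_row = next(row for row, s, r in rows if r > s)
print(switch_row)  # a risk-neutral subject chooses safe in rows 1-4, risky from row 5
```

At row 4 the expected values are $17.60 versus $16.00, and at row 5 they are $18.00 versus $19.75, which is where the $1.60 and $1.75 forgone amounts in note 16 come from.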
11. In our experience subjects are suspicious of randomization generated by computers. Given the propensity of many experimenters in other disciplines to engage in deception, we avoid computer randomization whenever feasible.
12. Dave et al. (2007) draw similar conclusions, and include an explicit comparison in the field with the Holt and Laury (2002) MPL instrument. They also collect information on the cognitive abilities of subjects, to better identify the sources of any differences in behavior.
13. Millner, Pratt, and Reilly (1988) offered some important, critical observations on the design and analysis proposed by Harrison (1986). There is possible contamination from intra-session experimental earnings if the subject is paid for each selling price elicited, but this issue is common to all of the methods. One could either assume these wealth effects away (Harrison, 1986; Kachelmeier & Shehata, 1992), test for them (McKee, 1989), or pay subjects for just one of the stages. The last of these options is now the standard method when applying BDM, but raises the same issues with the validity of the random lottery incentive mechanism that have been discussed for other procedures (see Section 3.8).
14. One must also ensure that the buyout range exceeds the highest price that the subject would reasonably state, but this is not a major problem.
15. The same ‘‘payoff dominance problem’’ applies to first-price auctions, as noted by Harrison (1989). Hence, both of the institutions used by Isaac and James (2000) to infer risk attitudes are blighted with this problem. The same problem applies to two of the three institutions studied by Berg, Dickhaut, and McCabe (2005). Their third institution, the English auction, is known to have more reliable behavioral incentives for truthful responses (Harstad, 2000; Rutström, 1998).
16. Assume a risk-neutral subject facing an MPL with prizes $20 and $16 for the safe lottery and $38.50 and $1 for the risky one.
Such a subject should choose the safe lottery for rows 1 through 4 and then switch to the risky one. Not doing so would result in an expected earnings loss. For example, if he erroneously responds as if he is slightly risk loving by choosing the risky lottery already on row 4 he is forgoing $1.60, and if he erroneously responds as if he is slightly risk averse by still choosing the safe lottery on row 5, he is forgoing $1.75. Since the chances are 1 in 10 that the row with his erroneous choice is picked, his expected foregone earnings are about 16 to 17.5 cents. If he instead were asked to state his minimum WTA for each of the lotteries in a BDM, his true WTA when the probabilities correspond to those given in row 5 of the MPL (i.e., 50/50) would be $18 for the safe and $19.75 for the risky lottery. We can then calculate the expected loss from different misrepresentations of his preferences in ways that are comparable to those calculated for the MPL. To find the expected loss from representing his preferences as if they were defined over the safe MPL lottery given on row 4 we simply calculate the maximum WTA for the safe lottery on row 4 as $17.60. If this is his stated WTA he will experience a loss if the BDM selling price is between this report and his true WTA ($18). The likelihood of this is obviously quite small. On the other hand, the expected loss of a similarly erroneous report for the risky lottery would involve a report of $16 for a true maximum WTA of $19.75, a much stronger incentive. Again, the likelihood of this loss is the likelihood of the BDM selling price falling in between the stated and the true WTA. This likelihood is a function of the range of the buying prices used in
the particular implementation of the BDM. The narrower the range the higher is this likelihood. It is clear from this numeric example that the incentive properties of the BDM are much worse than those for the MPL for the safe lottery, but quite a bit better for the risky lottery. One problem with the BDM is that for a risk-loving subject who would state a high WTA for the lottery the probability of the BDM drawing a number higher than or equal to his WTA is low. Thus, the incentives for precision are low for such a subject.
17. Millner et al. (1988; p. 318) suggest that one should develop methods for identifying inconsistent responses, although they would agree that the original checks built in by BDM have some flaws, since the lotteries offered to subjects at later stages depend on earlier elicited selling prices. This sounds attractive in the abstract, but we caution against the use of mechanical rules for classifying subjects as inconsistent. For example, erratic responses could just be a reflection that the subject rationally perceives the absence of a strong incentive to respond truthfully. Classifying such subjects as inconsistent is inappropriate.
18. The former asks the subject to state a certain amount that makes them indifferent to the lottery, similar to what is done in the BDM, and the latter asks the subject to state some probability in the lottery that makes them indifferent to some fixed and certain amount, similar to what is done in the OLS. The latter method presumes that there are only two outcomes, and hence one probability.
19. Abdellaoui (2000) did introduce the use of a bisection method for establishing indifference in each stage that might mitigate some strategic concerns. The idea is to only allow subjects to pick one of two given lotteries, and not to state the indifference lottery directly.
By starting this process at some a priori extreme pair, one can iterate down to the point of indifference using a conventional bisection search algorithm. In this instance the chaining strategy is limited to always picking the lottery with the highest possible prize. This method was also used by Kuilen, Wakker, and Zou (2007), and has the advantage of limiting the financial exposure of the experimenter to known bounds. Of course, subjects might not adopt the chaining strategy in the logically extreme form, perhaps to avoid being dismissed from the experiment or not being invited back again, but still be generating strategically biased responses.
20. The TO method has also been extended by Attema, Bleichrodt, Rohde, and Wakker (2006) to elicit discount rates. The same incentive compatibility problems apply, only hypothetical experiments are conducted, and there is no discussion of the problems of incentivizing responses.
21. We use the term ‘‘risk attitudes’’ here in the broader sense of including possible effects from non-linear utility functions, probability weighting and loss aversion.
22. Andersen et al. (2008a) and Section 3.4 discuss the elicitation of risk preferences and time preferences, and the need for joint estimation of all parameters. The basic idea is that the discount rate involves the present value of utility streams, and not money streams, so one needs to know the concavity of the utility function to infer discount rates. In effect, the TCN procedure assumes risk neutrality when inferring discount rates, which will lead to overestimates of discount rates between utility flows.
23. We consider the use of such interval bounds for estimation in Section 2.1. Having some bounds that span a finite number and infinity does not pose problems for
the ‘‘interval regression’’ methods widely available, although it does correctly lead to larger standard errors than collapsing this interval to just the lower bound.
24. Some subjects switched several times, but the minimum switch point is always well defined. It turns out not to make much difference how one handles these ‘‘multiple switch’’ subjects, but our analysis and the analysis of HL considers the effect of allowing for them in different ways explained below.
25. HL find that there is a significant sex effect in the low-payoff conditions, with women being more risk averse, and no effect in the high payoff conditions. We replicate this conclusion using their procedures and data. Unfortunately, the low-payoff sex effect does not hold if one controls for the other characteristics of the subject and uses a negative binomial regression model to handle the discrete nature of the dependent variable. HL also report that there is a significant Hispanic effect, with Hispanic subjects making fewer risk-averse choices in high payoff conditions. We confirm this conclusion, using their procedures as well as when one uses all covariates in a negative binomial regression.
26. A subject that switched from option A to option B after five safe choices, then switched back for one more option A before choosing all B’s in the remaining rows, would therefore have revealed a CRRA interval between 0.15 and 0.97. Such subjects simply provide less precise information than subjects that switch once.
27. Our treatment of indifferent responses uses the specification developed by Papke and Wooldridge (1996; Eq. 5, p. 621) for fractional dependent variables. Alternatively, one could follow Hey and Orme (1994; p. 1302) and introduce a new parameter τ to capture the idea that certain subjects state indifference when the latent index showing how much they prefer one lottery over another falls below some threshold τ in absolute value.
This is a natural assumption to make, particularly for the experiments they ran in which the subjects were told that expressions of indifference would be resolved by the experimenter, but not told how the experimenter would do that (p. 1295, footnote 4). It adds one more parameter to estimate, but for good cause.
28. Clustering commonly arises in national field surveys from the fact that physically proximate households are often sampled to save time and money, but it can also arise from more homely sampling procedures. For example, Williams (2000; p. 645) notes that it could arise from dental studies that ‘‘collect data on each tooth surface for each of several teeth from a set of patients’’ or ‘‘repeated measurements or recurrent events observed on the same person.’’ The procedures for allowing for clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the ‘‘generalized estimating equations’’ approach to panel estimation in epidemiology (see Liang & Zeger, 1986), and generalize the ‘‘robust standard errors’’ approach popular in econometrics (see Rogers, 1993). Wooldridge (2003) reviews some issues in the use of clustering for panel effects, noting that significant inferential problems may arise with small numbers of panels.
29. Age was imputed as 20 for all subjects in the undergraduate class experiments conducted at the University of New Mexico, based on personal knowledge of the experimenters of the age distribution in those classes (Kate Krause, personal communication). Given the variation in age for non-student adults, this imputation is less likely to be a major factor compared to studies that only use student subjects.
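The CRRA intervals referred to in note 26 can be recovered numerically. A minimal sketch, assuming the standard Holt and Laury (2002) prizes of $2.00/$1.60 (safe) and $3.85/$0.10 (risky) and CRRA utility u(x) = x^(1−r)/(1−r): the r that makes a subject indifferent at a given row bounds the interval implied by a switch at that row.

```python
import math

def crra(x, r):
    # CRRA utility u(x) = x^(1 - r) / (1 - r), with the usual log form at r = 1
    return math.log(x) if abs(r - 1.0) < 1e-12 else x ** (1.0 - r) / (1.0 - r)

def eu_diff(r, p):
    # EU(safe) - EU(risky) when the high prize occurs with probability p
    safe = p * crra(2.00, r) + (1 - p) * crra(1.60, r)
    risky = p * crra(3.85, r) + (1 - p) * crra(0.10, r)
    return safe - risky

def indifference_r(p, lo=-3.0, hi=3.0):
    # eu_diff is negative for strongly risk-loving r and positive for strongly
    # risk-averse r, so bisection finds the indifference point
    for _ in range(200):
        mid = (lo + hi) / 2
        if eu_diff(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# indifference at row 5 (p = 0.5) gives the upper bound of the interval
# revealed by four safe choices
print(indifference_r(0.5))  # close to the 0.15 boundary in the HL table
```

Running the same calculation at p = 0.4 gives a negative boundary, so the interval for four safe choices straddles risk neutrality, consistent with the HL classification.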
30. That is, we treat the prizes here as gains measured as the net gain after deducting losses from the endowment. This analysis still allows for a framing effect, of course.
31. See Harless and Camerer (1994), Hey and Orme (1994) and Loomes and Sugden (1995) for the first wave of empirical studies including some formal stochastic specification in the version of EUT tested. There are several species of ‘‘errors’’ in use, reviewed by Hey (1995, 2002), Loomes and Sugden (1995), Ballinger and Wilcox (1997), Loomes, Moffatt, and Sugden (2002) and Wilcox (2008a). Some place the error at the final choice between one lottery or the other after the subject has decided deterministically which one has the higher expected utility; some place the error earlier, on the comparison of preferences leading to the choice; and some place the error even earlier, on the determination of the expected utility of each lottery.
32. This ends up being simple to formalize, but involves some extra steps in the economics. Let EU_R and EU_L denote the expected utility of lotteries R and L, respectively. If we ignore indifference, and the subject does not make mistakes, then R is chosen if EU_R − EU_L > 0, and otherwise L is chosen. If the subject makes measurement errors, denoted by ε, then the decision is made on the basis of the value of EU_R − EU_L + ε. That is, R is chosen if EU_R − EU_L + ε > 0, and otherwise L is chosen. If ε is random then the probability that R is chosen = P(EU_R − EU_L + ε > 0) = P(ε > −(EU_R − EU_L)). Now suppose that ε is normally distributed with mean 0 and standard deviation σ; then it follows that Z = ε/σ is normally distributed with mean 0 and standard deviation 1: in other words, Z has a unit normal distribution. Hence, the probability that R is chosen is P(ε > −(EU_R − EU_L)) = P(ε/σ > −(EU_R − EU_L)/σ). If Φ(·) denotes the cumulative standard normal distribution, it follows that the probability that R is chosen is 1 − Φ(−(EU_R − EU_L)/σ) = Φ((EU_R − EU_L)/σ), since the distribution is symmetrical about 0.
Hence, the probability that L is chosen is given by Φ(−(EU_R − EU_L)/σ) = 1 − Φ((EU_R − EU_L)/σ). If we denote by y the decision of the subject, with y = 1 indicating that R was chosen and y = −1 indicating that L was chosen, then the likelihood is Φ((EU_R − EU_L)/σ) if y = 1 and 1 − Φ((EU_R − EU_L)/σ) if y = −1.
33. We also correct for clustering, since it is the right thing to do statistically, but this again makes no essential difference to the estimates.
34. The instructions were brief: ‘‘Your decision sheet shows 8 options listed on the left. You should choose one of these options, which will then be played out for you. If the coin toss is a Heads you will receive the amount listed in the second column. If the coin toss is a Tail you will receive the amount listed in the third column.’’ The transparency of the OLS procedure is apparent, and derives from only using probabilities of 1/2.
35. The secondary purpose of this design is to allow statistical examination of the hypothesis that subjects use ‘‘similarity relations’’ and ‘‘editing processes’’ to evaluate lotteries when prizes and probabilities are not pre-rounded, as in Hey and Orme (1994).
36. The use of the noise parameter μ in Eq. (8) is also familiar from the numerical literature on the smoothing of accept–reject simulators in discrete choice statistical modeling: see Train (2003; p. 125ff.), for example. This connection also reminds us that the use of specific linking functions such as logit or probit have a certain
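The probit specification in note 32 is easy to compute directly. A minimal sketch, with made-up expected utilities and noise scale σ, evaluating the probability that R is chosen as Φ((EU_R − EU_L)/σ) via the error-function form of the normal CDF:

```python
import math

def normal_cdf(z):
    # standard normal CDF, written in terms of the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_choose_R(eu_R, eu_L, sigma):
    # probability that R is chosen under the probit specification of note 32;
    # the probability that L is chosen is the complement
    return normal_cdf((eu_R - eu_L) / sigma)

# when the lotteries have equal expected utility the choice is a coin flip
print(prob_choose_R(1.0, 1.0, sigma=0.5))  # 0.5
# a larger EU advantage for R, or a smaller sigma, pushes the probability to 1
print(prob_choose_R(1.2, 1.0, sigma=0.1))
```

The symmetry of the normal distribution is what makes the likelihood contributions for y = 1 and y = −1 add to one, as in the note.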
arbitrariness to them, but embody implicit behavioral assumptions about responsiveness to latent indices.
37. A more complete statistical analysis would consider two factors: the effect of information about earnings in the prior procedure, and a more elaborate likelihood function that recognized that these are in-sample responses. Our estimates ignore both factors. It would also be useful to examine the experimental data from Engle-Warnick et al. (2006) using inferential methods such as ours, since their design used exactly the same lotteries in the RLP and OLS instruments. Dave et al. (2007) provide careful tests of the MPL and OLS instruments, concluding that the OLS instrument provides a more reliable measuring rod for risk attitudes in samples drawn from populations expected to have limited cognitive abilities.
38. Hirshleifer and Riley (1992) and Chambers and Quiggin (2000) demonstrate the elegant and powerful representations of decision-making under uncertainty that derive from adopting a state-contingent approach instead of popular alternatives.
39. Many of these claims involve evidence from between-sample designs, and rely on the assumption that sample sizes are large enough for randomization to ensure that between-sample differences in preferences (even if they are not state-contingent) are irrelevant. For two careful examples, see Conlisk (1989) and Cubitt et al. (1998a). There is also a rich literature on the contextual role of extreme lotteries, such that one often observes different behavior for ‘‘interior lotteries’’ that assign positive probability to all prizes as compared to ‘‘corner-solution lotteries’’ that assign zero weight to some prizes.
40. Stigler and Becker (1977; p. 76) note the nature of the impasse: ‘‘an explanation of economic phenomena that reaches a difference in tastes between people or times is the terminus of the argument: the problem is abandoned at this point to whoever studies and explains tastes (psychologists? anthropologists?
phrenologists? socio-biologists?).’’
41. Camerer (2005; p. 130) provides a useful reminder that ‘‘Any economics teacher who uses the St. Petersburg paradox as a ‘‘proof’’ that utility is concave (and gives students a low grade for not agreeing) is confusing the sufficiency of an explanation for its necessity.’’
42. Of course, many others recognized the basic point that the distribution of outcomes mattered for choice in some holistic sense. Allais (1979; p. 54) was quite clear about this, in a translation of his original 1952 article in French. Similarly, in psychology it is easy to find citations to kindred work in the 1960s and 1970s by Lichtenstein, Coombs and Payne, inter alios.
43. There are some well-known limitations of the probability weighting function in Eq. (9). It does not allow independent specification of location and curvature; it has a crossover point at p = 1/e ≈ 0.37 for γ < 1 and at p = 1 − 0.37 = 0.63 for γ > 1; and it is not increasing in p for small values of γ. Prelec (1998) and Rieger and Wang (2006) offer two-parameter probability weighting functions that exhibit more flexibility than Eq. (9), but for our expository purposes the standard probability weighting function is adequate.
44. In this case, because each lottery only consists of two outcomes, the ‘‘rank dependence’’ of the RDU model does not play a distinctive role, but it will in later applications.
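The limitations listed in note 43 can be checked numerically, assuming Eq. (9) is the one-parameter weighting function of Tversky and Kahneman (1992), w(p) = p^γ/(p^γ + (1 − p)^γ)^(1/γ); the γ values below are illustrative, with 0.61 being a commonly cited estimate:

```python
def tk_weight(p, gamma):
    # one-parameter probability weighting function attributed to
    # Tversky and Kahneman (1992); an assumption for Eq. (9)
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1.0 / gamma)

# inverse-S shape for a moderate gamma: small probabilities are overweighted,
# large probabilities underweighted
print(tk_weight(0.1, 0.61), tk_weight(0.9, 0.61))

# for small gamma the function is no longer increasing in p
print(tk_weight(0.01, 0.2), tk_weight(0.10, 0.2))
```

The last line illustrates the non-monotonicity for small γ noted in the text: the weight attached to p = 0.01 exceeds the weight attached to p = 0.10.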
138
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
45. The estimates of the coefficient obtained by Tversky and Kahneman (1992) fortuitously happened to be the same for losses and gains, and many applications of PT assume that for convenience. The empirical methods of Tversky and Kahneman (1992) are difficult to defend, however: they report median values of the estimates obtained after fitting their model for each subject. The estimation for each subject is attractive if the data permit, as magnificently demonstrated by Hey and Orme (1994), but the median estimate has nothing to commend it statistically. 46. In other words, evaluating the PU of two lotteries, without having edited out dominated lotteries, might lead to a dominated lottery having a higher PU. But if subjects always reject dominated lotteries, the choice would appear to be an error to the likelihood function. Apart from searching for better parameters to explain this error, as the ML algorithm does as it tries to find parameter estimates that reduce any other prediction error, our specification allows μ to increase. We stress that this argument is not intended to rationalize the use of separable probability weights in OPT, just to explain how a structural model with stochastic errors might account for the effects of stochastic dominance. Wakker (1989) contains a careful account of the notion of transforming probabilities in a ‘‘natural way’’ but without violating stochastic dominance. 47. One of the little secrets of CPT is that one must always have a probability weight for the residual outcome associated with the reference point, and that the reference outcome receives a utility of 0 for both gains and losses. This ensures that decision weights always add up to 1. 48. An alternative specification would be to take the negative of the utility function defined over the gross losses, in effect assuming λ = 1 from the CPT specification. 49.
A corollary is that it might be a mistake to view loss aversion as a fixed parameter λ that does not vary with the context of the decision, ceteris paribus the reference point. 50. The mean estimate from their sample was $31, but there were clear nodes at $15 and $30. Our experimental sessions typically consist of several tasks, so expected earnings from the lottery task would have been some fraction of these expectations over session earnings. No subject stated an expected earning below $7. 51. A concrete implication, considered at length in Harrison and Rutström (2005; Section 5), is that the rush to use non-nested hypothesis tests is misplaced. If one reads the earlier literature on those tests it is immediately clear that they were viewed as poor, second-best alternatives to writing out a finite mixture model and estimating the weights that the data place on each latent process. The computational constraints that made them second-best decades ago no longer apply. 52. See Keller and Strazzera (2002; p. 148) and Frederick, Loewenstein, and O’Donoghue (2002; p. 381ff.) for an explicit statement of this assumption, which is often implicit in applied work. We refer to risk aversion and concavity of the utility function interchangeably, but it is concavity that is central (the two can differ for non-EUT specifications). 53. Harless and Camerer (1994) do consider different ways that one can compare different theories that have different numbers of ‘‘free parameters.’’ They also
Risk Aversion in the Laboratory
consider simple metrics for violations, but even these are still defined in terms of the number of failures of the theory in a given triple (e.g., one failure out of three predictions is considered better from the perspective of the theory than two failures out of three). 54. Some semi-parametric estimators, such as the Maximum Score estimator of Manski, do rely on ‘‘hit rates’’ as a metric. 55. Some experiments attempt to design checks for some of the more obvious biases, such as which lottery is presented on the left or right, or whether the lotteries are ordered best to worst or vice versa (e.g., see Harless, 1992; Hey & Orme, 1994). 56. Problem 2 in CSS involves losing three subjects at random for every one subject that was actually asked to make a choice, whereas the other problems involved all recruited subjects making a choice. Hence 200 subjects were recruited to Problem 2, and the eventual sample of choices was roughly 50 subjects for each problem, by design. 57. Comparing only Problems 1 and 5 in CSS, which involve choices only over simple lotteries, the evidence against EUT is even weaker. 58. The word ‘‘essentially’’ reminds us that this is EUT with some explicit stochastic error story. There are many alternative error stories, of course. Wilcox (2008a, 2008b) explores the deeper modeling issues of writing out a theory without specifying any stochastic process connecting it to data. 59. Some might object that even if the behavior can be formally explained by some small error, there are systematic behavioral tendencies that are not consistent with a white-noise error process. Of course, one can allow asymmetric errors or heteroskedastic errors. 60. Wakker et al. (1994) in effect adopted such a design. Their primary tasks deliberately had comparable expected values in the paired lotteries subjects were to choose over, but their ‘‘filler’’ tasks were then deliberately set up to have different expected values. 61. 
See Kagel (1995) and Harrison (1989, 1990) for a flavor of the debates. 62. Harrison, List, and Tra (2005e) show, however, that when auctions consist of more and more bidders, received theory does increasingly poorly in terms of characterizing ‘‘one shot’’ behavior. Their evidence suggests that received theory is relevant for ‘‘small auctions’’ but not for ‘‘large auctions.’’ Thus, if one were testing received theory it would matter on what domain the data were generated. Cox et al. (1982) reported different results, with the smallest of their auctions (N = 3) generating the data that seemed to most obviously contradict the risk-averse Nash Equilibrium bidding model. However, this could have been due to collusion. In all of their experiments the same N bidders participated in multiple rounds, facilitating coordination of collusive under-bidding strategies that wreak havoc with the one-shot predictions of the theory. 63. Cox et al. (1988) offer a generalization that admits of some degrees of risk-loving behavior. Since we do not observe much risk loving in the population used in these experiments, college students in the United States, this extension is not needed for present purposes. 64. That is, 1/2 is arguably more focal than 2/3 or 3/4, and so on for N > 2. It is certainly easier to implement arithmetically, absent calculating aids.
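The risk-averse Nash equilibrium bidding model referenced in notes 62 and 63 has a familiar closed form in the textbook benchmark case: N bidders, private values i.i.d. uniform on [0, 1], and CRRA utility x^r, giving b(v) = (N − 1)v/(N − 1 + r). A minimal sketch under those assumptions (the function name is ours, and this is the benchmark model, not the authors' estimation code):

```python
def crram_bid(v, n, r):
    """Equilibrium first-price auction bid for value v with n bidders and
    CRRA utility x**r, values i.i.d. uniform on [0, 1].
    r = 1 is risk neutrality; r < 1 is risk aversion."""
    return (n - 1) * v / (n - 1 + r)
```

With r < 1 bids rise toward the value, which is why unobserved risk attitudes matter so much for tests of the theory; and the bid also rises with N, consistent with the note's concern that the relevant domain ("small" versus "large" auctions) affects what the data can say.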
65. Unfortunately, there is evidence that subjects may not see it this way. In a generic public goods voluntary contribution game Botelho, Harrison, Pinto, and Rutström (2005b) show that Random Strangers designs do not generate the same behavior as Perfect Strangers designs in which the subject is guaranteed not to meet the same opponent twice. 66. One might be concerned that the full model fits the RA NE bidding model simply because it has a ‘‘free parameter r_i’’ to fit the bidding data to. In some sense this is true, since the joint likelihood of the data includes the effect of different r̂_i’s on bids, and the estimates seek r̂_i values that explain the bidding data best. But it is not true entirely, since the joint likelihood must also explain the risk attitude choice data as well. One can formally compare the distribution of predicted risk attitudes if one only uses the risk aversion tasks and the distribution that is generated if one uses all data simultaneously. The two distributions are virtually identical. Kendall’s τ statistic can be used to test for rank correlation; it has a value of 0.82, and leads one to reject the null hypothesis that the two sets of estimates of risk attitudes are independent at p-values below 0.0001. 67. Additional experimental tests include Thaler, Tversky, Kahneman, and Schwartz (1997) and Gneezy, Kapteyn, and Potters (2003). These provide results that are qualitatively identical, but harder to evaluate. Thaler et al. (1997) did not provide subjects with precise knowledge of the probabilities involved in the lotteries, but allowed them to infer that over time; hence behavior could have been driven by mistakes in the subjective inference of probabilities rather than MLA. Gneezy et al. (2003) embed the task in an asset market, which may have influenced individual behavior in other ways than predicted by EUT or MLA.
These influences are of interest, since markets are the institution in which most stocks and bonds are traded, but from the perspective of wanting the cleanest possible test of competing theories those extra influences are just a confound. Camerer (2005) and Novemsky and Kahneman (2005a, 2005b) provide an overview of the history and current status of the loss aversion hypothesis. 68. It is also possible to augment the estimation procedure to include a parameter that can be interpreted as ‘‘baseline consumption,’’ to which prizes are added before being evaluated using the utility function. This approach has been employed by Harrison et al. (2007c) and Heinemann (2008). Andersen et al. (2008a) consider the theoretical and empirical implications of this approach in detail. 69. The term ‘‘portfolio effects’’ is unfortunate, since it suggests a concern with correlated risks and risk pooling, which is not the issue here. Unfortunately, we cannot come up with a better expression, and this one has some currency in the literature. 70. For K binary choices it is 2^K, assuming that indifference is not an option. For K = 10 this is only 1,024, but for K = 15 it is 32,768, and one can guess the rest for larger K. The use of random lottery incentives in the context of the Random Lottery Pair elicitation procedure raises some deep modeling issues of sequential choice, since it introduces the interaction of risk aversion and ambiguity aversion with respect to future lotteries (Klibanoff, Marinacci, & Mukerji, 2005; Nau, 2006), as well as concerns with possible preferences over the temporal resolution of uncertainty (Kreps & Porteus, 1978). In effect, this is a setting in which the ‘‘small world’’ assumption of Savage (1972; Section 5.5), under which one focuses on
isolated decisions and ignores the broader context, may be particularly appropriate to apply. It may not be appropriate to apply for other tasks, as we discuss below. 71. Harrison et al. (2007; fn.16) report a direct test of the random lottery procedure with the MPL instrument, and note that it did not change risk attitudes elicited under the assumption of EUT. 72. In fact, Wilcox (2008a) recommends a third alternative specification, the Contextual Utility model developed in Wilcox (2008b), over both the Luce and Fechner specifications. If the choice is between Luce and Fechner, however, his discussion clearly favors Fechner. The estimates from Holt and Laury (2005) presented in Section 3.1 used the Luce specification, and hence differ from those presented here. 73. These do not exactly replicate all estimates presented earlier since there are slight differences in specifications. 74. With one exception, we do not believe that this inference is supported by the existing data and experimental designs. That exception is Beattie and Loomes (1997), an excellent example of the type of controlled study of incentives that is needed to address these issues. 75. The term ‘‘valuation’’ subsumes open-ended elicitation procedures as well as dichotomous choice, binary referenda, and stated choice tasks. See Harrison (2006a, 2006b) and Harrison and Rutström (2008) for reviews. 76. Heckman and Smith (1995; pp. 99–101) provide many examples, and coin the expression ‘‘randomization bias’’ for this possible effect. Harrison and List (2004) review the differences between laboratory, field, social, and natural experiments in economics, and all could be potentially affected by randomization bias. Palfrey and Pevnitskaya (2008) use thought experiments and laboratory experiments to illustrate how risk attitudes can theoretically affect the mix of bidders in sealed-bid auctions with endogenous entry, and thereby change behavior in the sample of bidders observed in the auction.
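The rank-correlation check described in note 66 uses Kendall's τ, which requires nothing more than counting concordant and discordant pairs of observations. A self-contained sketch of the tau-a statistic (our illustrative code, not the authors'; it omits the tie correction and the p-value computation mentioned in the note):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over total pairs.
    Assumes no ties; x and y are equal-length sequences of estimates."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1    # the pair is ranked the same way in x and y
        elif s < 0:
            discordant += 1    # the pair is ranked oppositely
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Applied to the two sets of risk attitude estimates in note 66, a value near 1 (they report 0.82) indicates that the rankings from the risk-aversion-tasks-only estimates and the joint estimates are nearly identical.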
77. We hesitate to endorse practices in other fields, in which recruitment fees are not paid to subjects, since they open themselves up to abuse. We have considerable experience of faculty recruiting subjects for ‘‘extra credit,’’ but where the task and behavior bear no relationship at all to the learning objectives of the class, and no pedagogic feedback is provided to students even if it does bear some tangential relationship to the topic of the class. We have serious ethical problems with such practices, quite apart from the problems of motivation that they raise. 78. There is also evidence of differences in the demographics and behavior of volunteers and ‘‘pseudo-volunteers,’’ who are subjects formally recruited in a classroom to participate in an experiment during class time (Rutström, 1998; Eckel & Grossman, 2000). The disadvantage with pseudo-volunteers is that the subjects might simply not be interested in participating in the experiment, even with the use of salient rewards. The advantage, of course, is that the selection process that leads them to be in the classroom is unrelated to the characteristics of the experimental task, although even here one might just be replacing one ill-studied sample selection process with another. After all, even if we model the factors that cause subjects from a university population to select into an experiment, we have not modeled the factors that cause individuals to choose to become university students (Casari, Ham, & Kagel, 2007).
79. Endogenous subject attrition from the experiment can also be informative about subject preferences, since the subject’s exit from the experiment indicates that the subject had made a negative evaluation of it. See Diggle and Kenward (1994) and Philipson and Hedges (1998) for discussion of this statistical issue. 80. More precisely, the statistical problem is that there may be some unobserved individual effects that cause subjects to be in the observed sample or not, and these effects could be correlated with responses once in the observed sample. For example, Camerer and Lovallo (1999) find that excess entry into competitive games occurs more often when subjects volunteered to participate knowing that payoffs would depend on skill in sports or current-events trivia. This treatment could encourage less risk-averse subjects to participate in the experiment and may explain the observed reference bias effect, or part of it. 81. It is well known in the field of clinical drug trials that persuading patients to participate in randomized studies is much harder than persuading them to participate in non-randomized studies (e.g., Kramer and Shapiro (1984; p. 2742ff.)). The same problem applies to social experiments, as evidenced by the difficulties that can be encountered when recruiting decentralized bureaucracies to administer the random treatment (e.g., Hotz, 1992). For example, Heckman and Robb (1985) note that the refusal rate in one randomized job training program was over 90%. 82. Here we consider the role of preferences over risk, but the same concerns apply to the elicitation of other types of preferences, such as social preferences or time preferences (Eckel & Grossman, 2000; Lazear, Malmendier, & Weber, 2006; Dohmen & Falk, 2006).
These concerns arise when subjects have some reason to believe that the task will lead them to evaluate those preferences, such as in longitudinal designs allowing attrition, or social experiments requiring disclosure of the nature of the task prior to participation. They might also arise if the sample is selected by some endogenous process in which selection might be correlated with those preferences, such as group membership or location choices. 83. In addition, we often just assign subjects to some role in an experiment, whether or not they would have selected for this role in any naturally occurring environment. This issue lies at the heart of the interest in field experiments initiated by Bohm (2002). 84. Lusk and Coble (2005) also report evidence consistent with this conclusion, comparing risk preferences elicited for an artificial monetary instrument and comparable preferences for an instrument defined over genetically-modified food. Lusk and Coble (2008) find that adding abstract background risk to an elicitation procedure using artificial monetary outcomes also generates more risk aversion, although they do not find the effect to be large quantitatively. 85. To make this point more succinctly, consider the elicitation of the value that a person places on safety, a critical input in the cost-benefit assessment of environmental policy such as the Clean Air Act (United States Environmental Protection Agency, 1997). Conventional procedures to measure such preferences focus on monetary values to avoid mortality risk, by asking subjects to value scenarios in which they face different risks of death. The traditional interpretation of responses to such questions ignores the fact that it is hard to imagine a physical risk that could kill you with some probability but that would leave you alive and have no effect whatsoever on your health. Of course, such risks exist, but most of the environmental
risks of concern for policy do not fall into such a category. In general, then, responses to the foreground risk question should allow for the fact that the subject likely perceived some background risk. This example represents an important policy issue and highlights the import of the theoretical literature on background risk. 86. However, since we do not know the subjective probability distribution of background risk in the field, we cannot know if background risk is statistically independent of the foreground risk. We can think of no reason why the two might be correlated, but this illustrates again the type of trade-off one experiences with field experiments. It also points to the complementary nature of field and lab experiments: Lusk and Coble (2008) show that independent background risk in a lab setting is associated with an increase in foreground risk aversion. 87. The typical application of the random lottery incentive mechanism in experiments such as these would have one choice selected at random. We used three to ensure comparability of rewards with other experiments in which subjects made choices over 40 or 20 lotteries, and where 2 lotteries or 1 lottery was, respectively, selected at random to be played out. 88. The computer laboratory used for these experiments has 28 subject stations. Each screen is ‘‘sunken’’ into the desk, and subjects were typically separated by several empty stations due to staggered recruitment procedures. No subject could see what the other subjects were doing, let alone mimic what they were doing since each subject was started individually at different times. 89. These final outcomes differ by $1 from the two highest outcomes for the gain frame and mixed frame, because we did not want to offer prizes in fractions of dollars. 90. To ensure that probabilities summed to one, we also used probabilities of 0.26 instead of 0.25, 0.38 instead of 0.37, 0.49 instead of 0.50, or 0.74 instead of 0.75. 91.
The control data in these three panels, for the 1× problem, are pooled across all task #1 responses. That is, the task #1 responses in the bottom left panel of Fig. 27 are not just the task #1 responses of the individuals facing the 90× problem. Nothing essential hinges on this at this stage of exposition. The statistical analysis in Section 2.1 does take this into account, using appropriate panel estimation procedures. 92. The experience was not with the same prize level, as noted earlier. 93. See Ortona (1994) and Kachelmeier and Shehata (1994). 94. These conclusions come from a panel regression model that controls for all of the factors discussed, and that allows for individual-level heteroskedasticity and individual-level first-order autocorrelation. All conclusions refer to effects that are statistically significant at the 1% level. 95. References to increases in risk aversion should also be understood, in this context, to refer to decreases in risk loving. 96. Although purely anecdotal, our own experience is that many subjects faced with the BDM task believe that the buying price depends in some way on their selling price. To mitigate such possible perceptions we have tended to use physical randomizing devices that are less prone to being questioned. 97. The stakes in the experiments of Gneezy and Potters (1997) were actually 2 Dutch guilders, which converted at the time of the experiment to roughly $1.20. Haigh and List (2005) used a stake of $1.00 for their students, to be comparable to the earlier stake. They quadrupled the stakes to $4.00 for the traders, on the grounds
that it would be more salient for them. Of course, this change in monetary stake size adds a potential confound to the comparability of results across students and traders, but one that has no obvious resolution without an elaborate investigation into the purchasing power of a dollar to students and traders. 98. Gneezy and Potters generously provided their individual data, and we used the same statistical model as Haigh and List (2005; Table II, specification 2, p. 530) on their data. Haigh and List also generously provided their individual data, and we replicated their statistical conclusions. 99. In fact, subjects tended to pick in round percentages. In the Low frequency treatment 76% of the choices were for 0, 25, 50, or 100% bets, and in the High frequency treatment 81% of the choices were for the 25, 50, or 100% bets. 100. For example, Kahneman and Lovallo (1993, p. 20), Benartzi and Thaler (1995, p. 79), Gneezy and Potters (1997, p. 632), Thaler et al. (1997, p. 650), Gneezy et al. (2003, p. 822), and Haigh and List (2005, p. 525). 101. In other words, there are settings in which a CRRA or even RN utility function might be appropriate for some theoretical, econometric, or policy exercise. But this experiment is not obviously one of those settings. 102. Yet another approach would be to modify the experimental design and allow subjects to leverage their bets beyond 100% of their stake. There are some logistical problems running such experiments in a laboratory setting, although of course stock exchanges and futures markets allow such trades. 103. The α parameter may be viewed as a counterpart in this specification of the noise parameter used by Holt and Laury (2002). 104. Benartzi and Thaler (1995, p.
80) are clear that this evaluation horizon is not the same thing as a planning horizon: ‘‘A young investor, for example, might be saving for retirement 30 years off in the future, but nevertheless experience the utility associated with the gains and losses of his investment every quarter when he opens a letter from his mutual fund. In this case, his (planning) horizon is 30 years but his evaluation period (evaluation horizon) is 3 months.’’ 105. They prefer the expression ‘‘prospective utility,’’ but there is no confusion as long as we are clear about which utility functions and probabilities are being used to calculate expected utility. 106. Mankiw and Zeldes (1991) make the important observation that only 12% of Americans hold stocks worth more than $10,000, using a 1984 survey, so one really has to explain their indifference between holding bonds and stocks. Presumably, the remaining ‘‘corner-solution’’ individuals face some transactions costs to undertaking such investments. It would be an easy and important extension of the approach of BT to allow for such heterogeneity in the composition of stockholders and others. 107. The constant term in this linear function is suppressed, since it would be perfectly correlated with the sum of these two binary variables. To be explicit, denote these dummy variables for the treatments as L and H, respectively. Then we actually estimate αL, αH, βL, βH, λL, and λH, where α = αL L + αH H, β = βL L + βH H, and λ = λL L + λH H. Thus, the logic of the likelihood function is as follows: candidate values of these six parameters are proposed, the linear function is evaluated so that we know the candidate values of α, β, and λ for each of the Low and High frequency treatments, the expected utility of the actual choice is evaluated using the Tversky and
Kahneman (1992) specification, and then the log-likelihood function specified above is evaluated. 108. The Arrow–Pratt coefficient of RRA is 1 − α, so α = 1 implies risk neutrality, α < 1 implies risk aversion, and α > 1 implies risk-loving behavior. These benchmarks are worth noting, to avoid confusion, given the popularity of specifications from Holt and Laury (2002) that estimate 1 − α directly (the risk-neutral value is 0 in that case, positive estimates indicate risk aversion, and negative estimates indicate risk loving). 109. The exposition is deliberately transparent to economists. Most of the exposition in Section F1 would be redundant for those familiar with Gould, Pitblado, and Sribney (2006) or even Rabe-Hesketh and Everitt (2004; ch.13). It is easy to find expositions of ML in Stata that are more general and elegant for their purpose, but for those trying to learn the basic tools for the first time that elegance can just appear to be needlessly cryptic coding, and actually act as an impediment to comprehension. There are good reasons that one wants to build more flexible and computationally efficient models, but ease of comprehension is rarely one of them. StataCorp (2007) documents the latest version 10 of Stata, but the exposition of the ML syntax is minimal in that otherwise extensive documentation. 110. Paarsch and Hong (2006; Appendix A.8) provide a comparable introduction to the use of MATLAB for estimation of structural models of auctions. Unfortunately their documentation contains no ‘‘real data’’ to evaluate the programs on. 111. Note that this is ‘euL’ and not ‘eul’: beginning Stata users make this mistake a lot. 112. Since the ML_eut0 program is called many, many times to evaluate Jacobians and the like, these warning messages can clutter the screen display needlessly. During debugging, however, one normally likes to have things displayed, so the command ‘‘quietly’’ would be changed to ‘‘noisily’’ for debugging.
Actually, we use the ‘‘ml check’’ option for debugging, as explained later, and never change this to ‘‘noisily.’’ Or we can display one line by using the ‘‘noisily’’ option, to debug specific calculations.
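The sign conventions in note 108 can be confirmed with a finite-difference computation of the Arrow–Pratt measure RRA(x) = −x u″(x)/u′(x) for the power utility u(x) = x^α. This is a sketch under that CRRA assumption; the function names are ours:

```python
def rra(alpha, x, h=1e-5):
    """Relative risk aversion -x * u''(x)/u'(x) for u(x) = x**alpha,
    approximated by central finite differences."""
    u = lambda z: z ** alpha
    du = (u(x + h) - u(x - h)) / (2 * h)          # first derivative
    d2u = (u(x + h) - 2 * u(x) + u(x - h)) / h**2  # second derivative
    return -x * d2u / du

# For any x > 0 this is (approximately) 1 - alpha: zero at alpha = 1
# (risk neutrality), positive for alpha < 1, negative for alpha > 1.
```

This makes concrete why the Holt and Laury (2002) convention of estimating 1 − α directly flips the interpretation: there, 0 is the risk-neutral benchmark and positive estimates indicate risk aversion.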
ACKNOWLEDGMENT We thank the U.S. National Science Foundation for research support under grants NSF/IIS 9817518, NSF/HSD 0527675, and NSF/SES 0616746, and the Danish Social Science Research Council for research support under project #24-02-0124. We are grateful to William Harbaugh, John Hey, Steven Kachelmeier, and Robert Sugden for making detailed experimental results available. Valuable comments were received from Steffen Andersen, James Cox, Morten Lau, Vjollca Sadiraj, Peter Wakker, and Nathaniel Wilcox. Harrison is also affiliated with the Durham Business School, Durham University, UK.
REFERENCES Abdellaoui, M. (2000). Parameter-free elicitation of utilities and probability weighting functions. Management Science, 46, 1497–1512. Abdellaoui, M., Barrios, C., & Wakker, P. P. (2007a). Reconciling introspective utility with revealed preference: Experimental arguments based on prospect theory. Journal of Econometrics, 138, 356–378. Abdellaoui, M., Bleichrodt, H., & Paraschiv, C. (2007b). Measuring loss aversion under prospect theory: A parameter-free approach. Management Science, 53(10), 1659–1674. Allais, M. (1979). The foundations of a positive theory of choice involving risk and a criticism of the postulates and axioms of the American school. In: M. Allais & O. Hagen (Eds), Expected utility hypotheses and the Allais paradox. Dordrecht, The Netherlands: Reidel. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Elicitation using multiple price lists. Experimental Economics, 9(4), 383–405. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008a). Eliciting risk and time preferences. Econometrica, 76, forthcoming. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008b). Lost in state space: Are preferences stable? International Economic Review, 49, forthcoming. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008c). Risk aversion in game shows. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics. Andersen, S., Harrison, G. W., & Rutström, E. E. (2006b). Choice behavior, asset integration and natural reference points. Working Paper 06-04. Department of Economics, College of Business Administration, University of Central Florida. Attema, A. E., Bleichrodt, H., Rohde, K. I. M., & Wakker, P. P. (2006). Time-tradeoff sequences for quantifying and visualizing the degree of time inconsistency, using only pencil and paper. Working Paper. Erasmus University, Rotterdam. Ballinger, T. P., & Wilcox, N. T. (1997).
Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105. Barr, A. (2003). Risk pooling, commitment, and information: An experimental test of two fundamental assumptions. Working Paper 2003-05. Centre for the Study of African Economies, Department of Economics, University of Oxford. Barr, A., & Packard, T. (2002). Revealed preference and self insurance: Can we learn from the self employed in Chile? Policy Research Working Paper #2754. World Bank, Washington DC. Battalio, R. C., Kagel, J. H., & Jiranyakul, K. (1990). Testing between alternative models of choice under uncertainty: Some initial results. Journal of Risk and Uncertainty, 3, 25–50. Beattie, J., & Loomes, G. (1997). The impact of incentives upon risky choice experiments. Journal of Risk and Uncertainty, 14, 149–162. Beck, J. H. (1994). An experimental test of preferences for the distribution of income and individual risk aversion. Eastern Economic Journal, 20(2), 131–145. Becker, G. M., DeGroot, M. H., & Marschak, J. (1964). Measuring utility by a single-response sequential method. Behavioral Science, 9, 226–232. Benartzi, S., & Thaler, R. H. (1995). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics, 110(1), 73–92. Berg, J., Dickhaut, J., & McCabe, K. (2005). Risk preference instability across institutions: A dilemma. Proceedings of the National Academy of Sciences, 102, 4209–4214.
Binswanger, H. P. (1980). Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62, 395–407. Binswanger, H. P. (1981). Attitudes toward risk: Theoretical implications of an experiment in rural India. Economic Journal, 91, 867–890. Birnbaum, M. H. (2004). Tests of rank-dependent utility and cumulative prospect theory in gambles represented by natural frequencies: Effects of format, event framing, and branch splitting. Organizational Behavior and Human Decision Processes, 95, 40–65. Bleichrodt, H., & Pinto, J. L. (2000). A parameter-free elicitation of the probability weighting function in medical decision analysis. Management Science, 46, 1485–1496. Bohm, P. (2002). Pitfalls in experimental economics. In: F. Andersson & H. Holm (Eds), Experimental economics: Financial markets, auctions, and decision making. Dordrecht: Kluwer. Botelho, A., Harrison, G. W., Pinto, L. M. C., & Rutström, E. E. (2005a). Social norms and social choice. Working Paper 05-23. Department of Economics, College of Business Administration, University of Central Florida. Botelho, A., Harrison, G. W., Pinto, L. M. C., & Rutström, E. E. (2005b). Testing static game theory with dynamic experiments: A case study of public goods. Working Paper 05-25. Department of Economics, College of Business Administration, University of Central Florida. Bullock, D. S., & Rutström, E. E. (2007). Policy making and rent-dissipation: An experimental test. Experimental Economics, 10(1), 21–36. Cadsby, C. B., Song, F., & Tapon, F. (2007). Sorting and incentive effects of pay for performance: An experimental investigation. Academy of Management Journal, 50(2), 387–405. Calman, K. C., & Royston, G. (1997). Risk language and dialects. British Medical Journal, 315, 939–942. Camerer, C. F. (1989). An experimental test of several generalized utility theories. Journal of Risk and Uncertainty, 2, 61–104. Camerer, C. F. (2000).
Prospect theory in the wild: Evidence from the field. In: D. Kahneman & A. Tversky (Eds), Choices, values and frames. New York: Cambridge University Press. Camerer, C. F. (2005). Three cheers – psychological, theoretical, empirical – for loss aversion. Journal of Marketing Research, XLII, 129–133. Camerer, C., & Ho, T. (1994). Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty, 8, 167–196. Camerer, C., & Hogarth, R. (1999). The effects of financial incentives in experiments: A review and capital-labor framework. Journal of Risk and Uncertainty, 19, 7–42. Camerer, C., & Lovallo, D. (1999). Overconfidence and excess entry: An experimental approach. American Economic Review, 89(1), 306–318. Casari, M., Ham, J. C., & Kagel, J. H. (2007). Selection bias, demographic effects and ability effects in common value experiments. American Economic Review, 97(4), 1278–1304. Chambers, R. G., & Quiggin, J. (2000). Uncertainty, production, choice, and agency: The statecontingent approach. New York, NY: Cambridge University Press. Chew, S. H., Karni, E., & Safra, Z. (1987). Risk aversion in the theory of expected utility with rank dependent probabilities. Journal of Economic Theory, 42, 370–381. Clarke, K. A. (2003). Nonparametric model discrimination in international relations. Journal of Conflict Resolution, 47(1), 72–93. Cleveland, W. S., Harris, C. S., & McGill, R. (1982). Judgements of circle sizes on statistical maps. Journal of the American Statistical Association, 77(379), 541–547. Cleveland, W. S., Harris, C. S., & McGill, R. (1983). Experiments on quantitative judgements of graphs and maps. Bell System Technical Journal, 62(6), 1659–1674.
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
Cleveland, W. S., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387), 531–554.
Coller, M., & Williams, M. B. (1999). Eliciting individual discount rates. Experimental Economics, 2, 107–127.
Conlisk, J. (1989). Three variants on the Allais example. American Economic Review, 79(3), 392–407.
Conte, A., Hey, J. D., & Moffatt, P. G. (2007). Mixture models of choice under risk. Discussion Paper No. 2007/06. Department of Economics and Related Studies, University of York.
Cox, J. C., & Epstein, S. (1989). Preference reversals without the independence axiom. American Economic Review, 79(3), 408–426.
Cox, J. C., & Harrison, G. W. (2008). Risk aversion in experiments: An introduction. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Cox, J. C., Roberson, B., & Smith, V. L. (1982). Theory and behavior of single object auctions. In: V. L. Smith (Ed.), Research in experimental economics (Vol. 2). Greenwich: JAI Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games & Economic Behavior, 56, 45–60.
Cox, J. C., & Sadiraj, V. (2008). Risky decisions in the large and in the small: Theory and experiment. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Cox, J. C., Smith, V. L., & Walker, J. M. (1985). Experimental development of sealed-bid auction theory: Calibrating controls for risk aversion. American Economic Review (Papers & Proceedings), 75, 160–165.
Cox, J. C., Smith, V. L., & Walker, J. M. (1988). Theory and individual behavior of first-price auctions. Journal of Risk and Uncertainty, 1, 61–99.
Cubitt, R. P., Starmer, C., & Sugden, R. (1998a). Dynamic choice and the common ratio effect: An experimental investigation. Economic Journal, 108, 1362–1380.
Cubitt, R. P., Starmer, C., & Sugden, R. (1998b). On the validity of the random lottery incentive system. Experimental Economics, 1(2), 115–131.
Dave, C., Eckel, C., Johnson, C., & Rojas, C. (2007). On the heterogeneity, stability and validity of risk preference measures. Unpublished manuscript. Department of Economics, University of Texas at Dallas.
Diggle, P., & Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis. Applied Statistics, 43(1), 49–93.
Dohmen, T., & Falk, A. (2006). Performance pay and multi-dimensional sorting: Productivity, preferences and gender. Discussion Paper #2001. Institute for the Study of Labor (IZA), Bonn, Germany.
Eckel, C. C., & Grossman, P. J. (2000). Volunteers and pseudo-volunteers: The effect of recruitment method in dictator experiments. Experimental Economics, 3, 107–120.
Eckel, C. C., & Grossman, P. J. (2002). Sex differences and statistical stereotyping in attitudes toward financial risk. Evolution and Human Behavior, 23(4), 281–295.
Eckel, C. C., & Grossman, P. J. (2008). Forecasting risk attitudes: An experimental study of actual and forecast risk attitudes of women and men. Journal of Economic Behavior & Organization, forthcoming.
Eeckhoudt, L., Gollier, C., & Schlesinger, H. (1996). Changes in background risk and risk-taking behavior. Econometrica, 64(3), 683–689.
Ehrhart, K.-M., & Keser, C. (1999). Mobility and cooperation: On the run. Working Paper 99s-24. CIRANO, University of Montreal.
Engle-Warnick, J., Escobal, J., & Laszlo, S. (2006). The effect of an additional alternative on measured risk preferences in a laboratory experiment in Peru. Working Paper 2006s-06. CIRANO, Montreal.
Ertan, A., Page, T., & Putterman, L. (2005). Can endogenously chosen institutions mitigate the free-rider problem and reduce perverse punishment? Working Paper 2005-13. Department of Economics, Brown University.
Falk, A., Fehr, E., & Fischbacher, U. (2005). Driving forces behind informal sanctions. Econometrica, 73(6), 2017–2030.
Farquhar, P. H. (1984). Utility assessment methods. Management Science, 30(11), 1283–1300.
Fehr, E., & Goette, L. (2007). Do workers work more if wages are high? Evidence from a randomized field experiment. American Economic Review, 97(1), 298–317.
Fennema, H., & van Assen, M. (1999). Measuring the utility of losses by means of the trade off method. Journal of Risk and Uncertainty, 17(3), 277–295.
Finney, M. A. (1998). FARSITE: Fire area simulator – model development and evaluation. Research Paper RMRS-RP-4. Rocky Mountain Research Station, Forest Service, United States Department of Agriculture.
Fiore, S. M., Harrison, G. W., Hughes, C. E., & Rutström, E. E. (2007). Virtual experiments and environmental policy. Working Paper 07-01. Department of Economics, College of Business Administration, University of Central Florida.
Fishburn, P. C. (1967). Methods of estimating additive utilities. Management Science, 13(7), 435–453.
Frederick, S., Loewenstein, G., & O'Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, XL, 351–401.
Gegax, D., Gerking, S., & Schulze, W. (1991). Perceived risk and the marginal value of safety. Review of Economics and Statistics, 73, 589–596.
Gerking, S., de Haan, M., & Schulze, W. (1988). The marginal value of job safety: A contingent value study. Journal of Risk and Uncertainty, 1, 185–199.
Gneezy, U., Kapteyn, A., & Potters, J. (2003). Evaluation periods and asset prices in a market experiment. Journal of Finance, 58, 821–838.
Gneezy, U., & Potters, J. (1997). An experiment on risk taking and evaluation periods. Quarterly Journal of Economics, 112, 631–645.
Gollier, C. (2001). The economics of risk and time. Cambridge, MA: MIT Press.
Gollier, C., & Pratt, J. W. (1996). Risk vulnerability and the tempering effect of background risk. Econometrica, 64(5), 1109–1123.
Gonzalez, R., & Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38, 129–166.
Gould, W., Pitblado, J., & Sribney, W. (2006). Maximum likelihood estimation with Stata (3rd ed.). College Station, TX: Stata Press.
Grether, D. M., & Plott, C. R. (1979). Economic theory of choice and the preference reversal phenomenon. American Economic Review, 69, 623–638.
Gürerk, Ö., Irlenbusch, B., & Rockenbach, B. (2006). The competitive advantage of sanctioning institutions. Science, 312, 108–111.
Haigh, M. S., & List, J. A. (2005). Do professional traders exhibit myopic loss aversion? An experimental analysis. Journal of Finance, 60(1), 523–534.
Harbaugh, W. T., Krause, K., & Vesterlund, L. (2002). Risk attitudes of children and adults: Choices over small and large probability gains and losses. Experimental Economics, 5, 53–84.
Harless, D. W. (1992). Predictions about indifference curves inside the unit triangle: A test of variants of expected utility theory. Journal of Economic Behavior and Organization, 18, 391–414.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62(6), 1251–1289.
Harrison, G. W. (1986). An experimental test for risk aversion. Economics Letters, 21(1), 7–11.
Harrison, G. W. (1989). Theory and misbehavior of first-price auctions. American Economic Review, 79, 749–762.
Harrison, G. W. (1990). Risk attitudes in first-price auction experiments: A Bayesian analysis. Review of Economics and Statistics, 72, 541–546.
Harrison, G. W. (1992). Theory and misbehavior of first-price auctions: Reply. American Economic Review, 82, 1426–1443.
Harrison, G. W. (2006a). Experimental evidence on alternative environmental valuation methods. Environmental and Resource Economics, 34, 125–162.
Harrison, G. W. (2006b). Making choice studies incentive compatible. In: B. Kanninen (Ed.), Valuing environmental amenities using stated choice studies: A common sense guide to theory and practice (pp. 65–108). Boston: Kluwer.
Harrison, G. W. (2006c). Maximum likelihood estimation of utility functions using Stata. Working Paper 06-12. Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W. (2007). Hypothetical bias over uncertain outcomes. In: J. A. List (Ed.), Using experimental methods in environmental and resource economics. Northampton, MA: Elgar.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005a). Temporal stability of estimates of risk aversion. Applied Financial Economics Letters, 1, 31–35.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005b). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2007a). Measurement with experimental controls. In: M. Boumans (Ed.), Measurement in economics: A handbook. San Diego, CA: Elsevier.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2005c). Risk attitudes, randomization to treatment, and self-selection into experiments. Working Paper 05-01. Department of Economics, College of Business Administration, University of Central Florida; Journal of Economic Behavior & Organization, forthcoming.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007b). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109(2), 341–368.
Harrison, G. W., Lau, M. I., Rutström, E. E., & Sullivan, M. B. (2005d). Eliciting risk and time preferences using field experiments: Some methodological issues. In: J. Carpenter, G. W. Harrison & J. A. List (Eds), Field experiments in economics (Vol. 10). Greenwich, CT: JAI Press, Research in Experimental Economics.
Harrison, G. W., Lau, M. I., & Williams, M. B. (2002). Estimating individual discount rates for Denmark: A field experiment. American Economic Review, 92(5), 1606–1617.
Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42(4), 1013–1059.
Harrison, G. W., List, J. A., & Towe, C. (2007c). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75(2), 433–458.
Harrison, G. W., List, J. A., & Tra, C. (2005e). Statistical characterization of heterogeneity in experiments. Working Paper 05-10. Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper 05-18. Department of Economics, College of Business Administration, University of Central Florida; Experimental Economics, forthcoming.
Harrison, G. W., & Rutström, E. E. (2008). Experimental evidence on the existence of hypothetical bias in value elicitation methods. In: C. R. Plott & V. L. Smith (Eds), Handbook of experimental economics results. Amsterdam: North-Holland, forthcoming.
Harstad, R. M. (2000). Dominant strategy adoption and bidders' experience with pricing rules. Experimental Economics, 3(3), 261–280.
Heckman, J. J., & Robb, R. (1985). Alternative methods for evaluating the impact of interventions. In: J. Heckman & B. Singer (Eds), Longitudinal analysis of labor market data. New York: Cambridge University Press.
Heckman, J. J., & Smith, J. A. (1995). Assessing the case for social experiments. Journal of Economic Perspectives, 9(2), 85–110.
Heinemann, F. (2008). Measuring risk aversion and the wealth effect. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Hershey, J. C., Kunreuther, H. C., & Schoemaker, P. J. H. (1982). Sources of bias in assessment procedures for utility functions. Management Science, 28(8), 936–954.
Hershey, J. C., & Schoemaker, P. J. H. (1985). Probability versus certainty equivalence methods in utility measurement: Are they equivalent? Management Science, 31(10), 1213–1231.
Hey, J. D. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2001). Does repetition improve consistency? Experimental Economics, 4, 5–54.
Hey, J. D. (2002). Experimental economics and the theory of decision making under uncertainty. Geneva Papers on Risk and Insurance Theory, 27(1), 5–21.
Hey, J. D., & Lee, J. (2005a). Do subjects remember the past? Applied Economics, 37, 9–18.
Hey, J. D., & Lee, J. (2005b). Do subjects separate (or are they sophisticated)? Experimental Economics, 8, 233–265.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.
Hirshleifer, J., & Riley, J. G. (1992). The analytics of uncertainty and information. New York, NY: Cambridge University Press.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Holt, C. A., & Laury, S. K. (2005). Risk aversion and incentive effects: New data without order effects. American Economic Review, 95(3), 902–912.
Horowitz, J. K. (1992). A test of intertemporal consistency. Journal of Economic Behavior and Organization, 17, 171–182.
Hotz, V. J. (1992). Designing an evaluation of JTPA. In: C. Manski & I. Garfinkel (Eds), Evaluating welfare and training programs. Cambridge: Harvard University Press.
Isaac, R. M., & James, D. (2000). Just who are you calling risk averse? Journal of Risk and Uncertainty, 20(2), 177–187.
James, D. (2007). Stability of risk preference parameter estimates within the Becker–DeGroot–Marschak procedure. Experimental Economics, 10, 123–141.
Kachelmeier, S. J., & Shehata, M. (1992). Examining risk preferences under high monetary incentives: Experimental evidence from the People's Republic of China. American Economic Review, 82(5), 1120–1141.
Kachelmeier, S. J., & Shehata, M. (1994). Examining risk preferences under high monetary incentives: Reply. American Economic Review, 84(4), 1104.
Kagel, J. H. (1995). Auctions: A survey of experimental research. In: J. H. Kagel & A. E. Roth (Eds), The handbook of experimental economics. Princeton: Princeton University Press.
Kagel, J. H., & Levin, D. (2002). Common value auctions and the winner's curse. Princeton: Princeton University Press.
Kagel, J. H., MacDonald, D. N., & Battalio, R. C. (1990). Tests of 'fanning out' of indifference curves: Results from animal and human experiments. American Economic Review, 80(4), 912–921.
Kahneman, D., & Lovallo, D. (1993). Timid choices and bold forecasts: A cognitive perspective on risk taking. Management Science, 39(1), 17–31.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Keller, L. R., & Strazzera, E. (2002). Examining predictive accuracy among discounting models. Journal of Risk and Uncertainty, 24(2), 143–160.
Kent, S. (1964). Words of estimated probability. Studies in Intelligence, 8, 49–65.
Klibanoff, P., Marinacci, M., & Mukerji, S. (2005). A smooth model of decision making under ambiguity. Econometrica, 73(6), 1849–1892.
Köbberling, V., & Wakker, P. P. (2005). An index of loss aversion. Journal of Economic Theory, 122, 119–131.
Kőszegi, B., & Rabin, M. (2006). A model of reference-dependent preferences. Quarterly Journal of Economics, 121(4), 1133–1165.
Kőszegi, B., & Rabin, M. (2007). Reference-dependent risk attitudes. American Economic Review, 97(4), 1047–1073.
Kramer, M., & Shapiro, S. (1984). Scientific challenges in the application of randomized trials. Journal of the American Medical Association, 252, 2739–2745.
Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46(1), 185–200.
Krupnick, A., Alberini, A., Cropper, M., Simon, N., O'Brien, B., Goeree, R., & Heintzelman, M. (2002). Age, health and the willingness to pay for mortality risk reductions: A contingent valuation survey of Ontario residents. Journal of Risk and Uncertainty, 24(2), 161–186.
van de Kuilen, G., Wakker, P. P., & Zou, L. (2007). A midpoint technique for easily measuring prospect theory's probability weighting. Working Paper. Econometric Institute, Erasmus University, Rotterdam, The Netherlands.
Laury, S. K., & Holt, C. A. (2008). Further reflections on prospect theory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Lazear, E. P., Malmendier, U., & Weber, R. A. (2006). Sorting in experiments with application to social preferences. Working Paper #12041. National Bureau of Economic Research.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
List, J. A. (2003). Does market experience eliminate market anomalies? Quarterly Journal of Economics, 118, 41–71.
Loomes, G. (1988). Different experimental procedures for obtaining valuations of risky actions: Implications for utility theory. Theory and Decision, 25, 1–23.
Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24(2), 103–130.
Loomes, G., Starmer, C., & Sugden, R. (1991). Observing violations of transitivity by experimental methods. Econometrica, 59(2), 425–439.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Loomes, G., & Sugden, R. (1998). Testing different stochastic specifications of risky choice. Economica, 65, 581–598.
Lopes, L. L. (1984). Risk and distributional inequality. Journal of Experimental Psychology: Human Perception and Performance, 10(4), 465–484.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Luce, R. D., & Fishburn, P. C. (1991). Rank and sign-dependent linear utility models for finite first-order gambles. Journal of Risk and Uncertainty, 4, 29–59.
Lusk, J. L., & Coble, K. H. (2005). Risk perceptions, risk preference, and acceptance of risky food. American Journal of Agricultural Economics, 87(2), 393–404.
Lusk, J. L., & Coble, K. H. (2008). Risk aversion in the presence of background risk: Evidence from the lab. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Mankiw, N. G., & Zeldes, S. P. (1991). The consumption of stockholders and non-stockholders. Journal of Financial Economics, 29(1), 97–112.
McFadden, D. (2001). Economic choices. American Economic Review, 91(3), 351–378.
McKee, M. (1989). Intra-experimental income effects and risk aversion. Economics Letters, 30, 109–115.
Miller, L., Meyer, D. E., & Lanzetta, J. T. (1969). Choice among equal expected value alternatives: Sequential effects of winning probability level on risk preferences. Journal of Experimental Psychology, 79(3), 419–423.
Millner, E. L., Pratt, M. D., & Reilly, R. J. (1988). A reexamination of Harrison's experimental test for risk aversion. Economics Letters, 27, 317–319.
Murnighan, J. K., Roth, A. E., & Schoumaker, F. (1987). Risk aversion and bargaining: Some preliminary results. European Economic Review, 31, 265–271.
Murnighan, J. K., Roth, A. E., & Schoumaker, F. (1988). Risk aversion in bargaining: An experimental study. Journal of Risk and Uncertainty, 1(1), 101–124.
Nau, R. F. (2006). Uncertainty aversion with second-order utilities and probabilities. Management Science, 52(1), 136–145.
Novemsky, N., & Kahneman, D. (2005a). The boundaries of loss aversion. Journal of Marketing Research, XLII, 119–128.
Novemsky, N., & Kahneman, D. (2005b). How do intentions affect loss aversion? Journal of Marketing Research, XLII, 139–140.
Ochs, J., & Roth, A. E. (1989). An experimental study of sequential bargaining. American Economic Review, 79(3), 355–384.
Ortona, G. (1994). Examining risk preferences under high monetary incentives: Comment. American Economic Review, 84(4), 1104.
Paarsch, H. J., & Hong, H. (2006). An introduction to the structural econometrics of auction data. Cambridge, MA: MIT Press.
Page, T., Putterman, L., & Unel, B. (2005). Voluntary association in public goods experiments: Reciprocity, mimicry, and efficiency. Economic Journal, 115, 1037–1058.
Palfrey, T. R., & Pevnitskaya, S. (2008). Endogenous entry and self-selection in private value auctions: An experimental study. Journal of Economic Behavior & Organization, 66, forthcoming.
Papke, L. E., & Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11, 619–632.
Philipson, T., & Hedges, L. V. (1998). Subject evaluation in social experiments. Econometrica, 66(2), 381–408.
Plott, C. R., & Zeiler, K. (2005). The willingness to pay-willingness to accept gap, the 'Endowment Effect,' subject misconceptions, and experimental procedures for eliciting valuations. American Economic Review, 95(3), 530–545.
Prelec, D. (1998). The probability weighting function. Econometrica, 66, 497–527.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior & Organization, 3(4), 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Norwell, MA: Kluwer Academic.
Rabe-Hesketh, S., & Everitt, B. (2004). A handbook of statistical analyses using Stata (3rd ed.). New York: Chapman & Hall/CRC.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Reilly, R. J. (1982). Preference reversal: Further evidence and some suggested modifications in experimental design. American Economic Review, 72, 576–584.
Rieger, M. O., & Wang, M. (2006). Cumulative prospect theory and the St. Petersburg paradox. Economic Theory, 28, 665–679.
Rogers, W. H. (1993). Regression standard errors in clustered samples. Stata Technical Bulletin, 13, 19–23.
Roth, A. E., & Malouf, M. W. K. (1979). Game-theoretic models and the role of information in bargaining. Psychological Review, 86, 574–594.
Rutström, E. E. (1998). Home-grown values and the design of incentive compatible auctions. International Journal of Game Theory, 27(3), 427–441.
Saha, A. (1993). Expo-power utility: A flexible form for absolute and relative risk aversion. American Journal of Agricultural Economics, 75(4), 905–913.
Savage, L. J. (1972). The foundations of statistics (2nd ed.). New York: Dover.
Schmidt, U., Starmer, C., & Sugden, R. (2005). Explaining preference reversal with third-generation prospect theory. Working Paper. School of Economic and Social Science, University of East Anglia.
Schubert, R., Brown, M., Gysler, M., & Brachinger, H. W. (1999). Financial decision-making: Are women really more risk-averse? American Economic Review (Papers & Proceedings), 89(2), 381–385.
Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic Review, 72(5), 923–955.
Smith, V. L. (2003). Constructivist and ecological rationality in economics. American Economic Review, 93(3), 465–508.
Starmer, C., & Sugden, R. (1989). Violations of the independence axiom in common ratio problems: An experimental test of some competing hypotheses. Annals of Operations Research, 19, 79–102.
Starmer, C., & Sugden, R. (1991). Does the random-lottery incentive system elicit true preferences? An experimental investigation. American Economic Review, 81, 971–978.
StataCorp. (2007). Stata statistical software: Release 10. College Station, TX: Stata Corporation.
Stigler, G. J., & Becker, G. S. (1977). De gustibus non est disputandum. American Economic Review, 67(2), 76–90.
Sutter, M., Haigner, S., & Kocher, M. (2006). Choosing the stick or the carrot? Endogenous institutional choice in social dilemma situations. Discussion Paper No. 5497. Centre for Economic Policy Research, London.
Tanaka, T., Camerer, C. F., & Nguyen, Q. (2007). Risk and time preferences: Experimental and household survey data from Vietnam. Working Paper. California Institute of Technology.
Thaler, R. H., Tversky, A., Kahneman, D., & Schwartz, A. (1997). The effect of myopia and loss aversion on risk taking: An experimental test. Quarterly Journal of Economics, 112, 647–661.
Train, K. E. (2003). Discrete choice methods with simulation. New York: Cambridge University Press.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representations of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
United States Environmental Protection Agency. (1997). The benefits and costs of the clean air act: 1970 to 1990. Washington, DC: Office of Air and Radiation, US EPA.
Vickrey, W. S. (1961). Counterspeculation, auctions and competitive sealed tenders. Journal of Finance, 16, 8–37.
von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. New York: Cambridge University Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333.
Wakker, P. P. (1989). Transforming probabilities without violating stochastic dominance. In: E. Roskam (Ed.), Mathematical psychology in progress. Berlin: Springer.
Wakker, P. P., & Deneffe, D. (1996). Eliciting von Neumann-Morgenstern utilities when probabilities are distorted or unknown. Management Science, 42, 1131–1150.
Wakker, P. P., Erev, I., & Weber, E. U. (1994). Comonotonic independence: The critical test between classical and rank-dependent utility theories. Journal of Risk and Uncertainty, 9, 195–230.
Wilcox, N. T. (2008a). Stochastic models for binary discrete choice under risk: A critical primer and econometric comparison. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Wilcox, N. T. (2008b). 'Stochastically more risk averse:' A contextual theory of stochastic discrete choice under risk. Journal of Econometrics, 142, forthcoming.
Williams, R. (2000). A note on robust variance estimation for cluster-correlated data. Biometrics, 56, 645–646.
Wooldridge, J. (2003). Cluster-sample methods in applied econometrics. American Economic Review (Papers & Proceedings), 93, 133–138.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55(1), 95–115.
APPENDIX A. REPRESENTATION AND PERCEPTION OF PROBABILITIES

There are two representational issues with probabilities. The first is that subjects may base their decisions on concepts of subjective probabilities, so that we should expect them to deviate in some ways from objective probabilities. The second is that perceptions of probabilities may not correspond to the actual probabilities. Only with a theory that explains both the perception of probabilities and the relationship between subjective and objective probabilities would we be able to identify both of these deviations. Nevertheless, careful experimental design can help generate some robustness in subjective and perceived probabilities, and a convergence of both on the underlying objective ones when that is normatively desirable. The review in this appendix complements the discussion in Section 1 of the paper by showing some alternative ways to represent the lotteries to subjects.

Camerer (1989) used a stacked box display to represent his lotteries to subjects. The length of the box provided information on the probabilities of each prize, and the width of the box provided information on the relative size of the prizes. The example in Fig. 20 was used in his written instructions to subjects, to explain how to read the lottery. Those instructions were as follows:

The outcomes of the lotteries will be determined by a random number between 01 and 100. Each number between (and including) 01 and 100 is equally likely to occur. In the example above, the left lottery, labeled "A", pays nothing (0) if the random number is between 01 and 40. Lottery A pays five dollars ($5) if the random number is between 41 and 100. Notice that the picture is drawn so that the height of the line between 01 and 40 is 40% of the distance from 01 to 100. The rectangle around "$5" is 60% of the distance from 01 to 100.
In the example above the lottery on the right, labeled "B", pays nothing (0) if the random number is between 01 and 50, five dollars ($5) if the random number is between 51 and 90, and ten dollars ($10) if the random number is between 91 and 100. As with lottery A, the heights of the lines in lottery B represent the fraction of the possible numbers which yield each payoff. For example, the height of the $10 rectangle is 10% of the way from 01 to 100. The widths of the rectangles are proportional to the size of their payoffs. In lottery B, for example, the $10 rectangle is twice as wide as the $5 rectangle.
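The encoding in these instructions is mechanical: each probability is mapped to a segment of the integers 01–100, and the segment boundaries follow by cumulation. A short sketch can reproduce the ranges quoted above (the function name and representation are our own illustration, not Camerer's code):

```python
def number_ranges(lottery):
    """Map a list of (prize, probability) pairs onto segments of the
    integers 01-100, as in the Camerer (1989) instructions: each prize
    occupies a block of numbers proportional to its probability."""
    ranges, low = [], 1
    for prize, prob in lottery:
        high = low + round(prob * 100) - 1
        ranges.append((prize, low, high))
        low = high + 1
    assert low == 101, "probabilities must sum to one"
    return ranges

# Lotteries A and B from the quoted instructions.
A = [(0, 0.40), (5, 0.60)]
B = [(0, 0.50), (5, 0.40), (10, 0.10)]

for label, lot in [("A", A), ("B", B)]:
    for prize, lo, hi in number_ranges(lot):
        print(f"Lottery {label}: ${prize} if the number is {lo:02d}-{hi:02d}")
```

Running this recovers exactly the ranges in the quote: lottery A pays $0 on 01–40 and $5 on 41–100, and lottery B pays $0 on 01–50, $5 on 51–90, and $10 on 91–100.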
This display is ingenious in the sense that it compactly displays the "numbers" as well as visual referents for the probabilities and relative prizes. The subject has to judge the probabilities for each prize from the visual referent, and is not directly provided that information numerically. There is a valuable literature on the ability of subjects to accurately assess quantitative magnitudes from visual referents of this kind, and it points to the need for individual-specific calibration in experts and non-experts (Cleveland, Harris, & McGill, 1982, 1983; Cleveland & McGill, 1984).

Fig. 20. Lottery Display Used by Camerer (1989).

Battalio et al. (1990) and Kagel et al. (1990) employed purely numerical displays of their lotteries. For example, one such lottery was presented to subjects as follows:

A: Winning $11 if 1–20 (20%)
   Winning $5 if 21–100 (80%)
B: Winning $25 if 1–6 (6%)
   Winning $5 if 7–100 (94%)

Answer: (1) I prefer A. (2) I prefer B. (3) Indifferent.
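A useful property of this pair of lotteries can be checked arithmetically: reading the ticket ranges as spanning 100 equally likely numbers (so lottery A's second range runs to 100, consistent with the stated 80%), both lotteries have the same expected value, so a choice between them isolates the subject's attitude to risk. A minimal sketch of that check (our own code, not from the original studies):

```python
from fractions import Fraction

def expected_value(lottery, tickets=100):
    """Expected value of a lottery given as (prize, low_ticket, high_ticket)
    triples over `tickets` equally likely ticket numbers."""
    ev = Fraction(0)
    for prize, lo, hi in lottery:
        ev += Fraction(hi - lo + 1, tickets) * prize
    return ev

# Lottery A: $11 on tickets 1-20 (20%), $5 on tickets 21-100 (80%).
A = [(11, 1, 20), (5, 21, 100)]
# Lottery B: $25 on tickets 1-6 (6%), $5 on tickets 7-100 (94%).
B = [(25, 1, 6), (5, 7, 100)]

print(float(expected_value(A)))  # 6.2
print(float(expected_value(B)))  # 6.2
```

Both lotteries are worth $6.20 in expectation, so any systematic preference between them reflects risk preferences rather than expected payoffs.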
GLENN W. HARRISON AND E. ELISABET RUTSTRO¨M
158
This display presents all values numerically, with no visual referents. The numerical display shows the probability for each prize, rather than requiring the subject to infer that from the cumulative probabilities. Beattie and Loomes (1997) used displays that were similar to those employed by Camerer (1989), although the probabilities were individually comparable since they were vertically aligned with a common base. Fig. 21 illustrates how they presented the lotteries to subjects. In addition, they provided text explaining how to read the display. Wakker, Erev, and Weber (1994) considered four types of representations, shown in Fig. 22. One, on the far right, was a copy of the display employed by Camerer (1989), and the three on the left varied the extent to which information on outcomes was collapsed (top two panels on the left) and whether numerical information was provided in addition to the verbal information about probabilities (bottom panel on the left). The alternative representations were applied on a between-subjects basis, but no information is provided about the effect on behavior. An example of the representation of probability using a verbal analogical scale is provided by Calman and Royston (1997; Table 4), using a distance analogue. For risks of 1 in 1, 1 in 10, 1 in 100, and 1 in 1000, for example, the distance containing one "risk stick" 1 foot in length is 1 foot, 10 feet, 100 feet, and 1,000 feet, respectively. An older tradition seeks to "calibrate" words that are found in the natural English language with precise probability ranges. This idea stems from a concern that Kent (1964) had with the ambiguity in the use of colloquial expressions of uncertainty by intelligence operatives. He proposed that certain words be assigned specific numerical probability ranges. A study reported by von Winterfeldt and Edwards (1986, p. 98ff.)
used these expressions and asked a number of NATO officials to state the probabilities that they would attach to the use of those words in sentences. The dots in Fig. 23 show the elicited probability judgements, and the shaded bars show the ranges suggested by Kent (1964). The fact that there
Fig. 21. Lottery Display Used by Beattie and Loomes (1997).
Risk Aversion in the Laboratory
Fig. 22. Lottery Displays Used by Wakker et al. (1994).
is a poor correspondence with untrained elicitors does not mean, however, that one could not undertake such a "semantic coordination game" using salient rewards, and try to encourage common usage of critical words. The visual dots method is employed by Krupnick et al. (2002, p. 167), and provides a graphic image to complement the direct fractional, numerical representation of probability. An example of their visualization method is shown in Fig. 24. Visual ladders have been used in previous research on mortality risk by Gerking, de Haan, and Schulze (1988) and Gegax, Gerking, and Schulze (1991). One such ladder, from their survey instrument, is shown in Fig. 25. An alternative ladder visualization is offered by Calman and Royston (1997; Fig. 1), and is shown in Fig. 26. One hypothesis to emerge from this review of the representation of lotteries in laboratory and survey settings is that there is no single task representation for lotteries that is perfect for all subjects. It follows that
Fig. 23. Representing Risk on a Verbal Analogical Scale.

Fig. 24. Representing Risk with Dots.

Fig. 25. Representing Risk with a 2D Ladder.
Fig. 26. Representing Risk with a 3D Ladder.
some of the evidence for framing effects in the representation of risk may be due to the implicit assumption that one form of representation works best for everyone: the "magic bullet" assumption. Rather, we should perhaps expect different people to perform better with different representations. To date no systematic comparison of these different methods has been performed, and there is no consensus as to what constitutes a state-of-the-art representation.
APPENDIX B. THE EXPERIMENTS OF HEY AND ORME (1994)

B1. The Original Experiments

The experiments of Hey and Orme (1994) are important in many respects. First, they use lottery tasks that are not designed as "trip wire" tests of one theory or another, but instead as representative lottery tasks. This design objective has strengths and weaknesses. The strength is that one can evaluate many different theories without the task domain being biased in favor of any one theory. Thus, tests of a theory will be based on tasks that are not just built to trick it into error. The weakness is that it might be inefficient as a domain for choosing between different theories. The second reason that these experiments are important, of course, is that they were evaluated using formal ML methods at the level of the individual, including explicit discussion of structural error models due to Fechner. The basic experiments of HO are reviewed in Section 1.2, and the display subjects saw is presented in Fig. 3. Subjects were recruited from the University of York and participated in two sessions, each consisting of 100 binary lottery choices. The sample consisted of 80 students, who were allowed to proceed at their own pace. The lottery tasks took roughly 35 min to complete, and subjects earned an average of £17.50 per hour for this task and one other task.

B2. Replication

There are two limitations of the original HO experimental data, which make it useful to undertake a replication and extension. One is that there is no data on individual characteristics, so that it is impossible to pool data across subjects and condition estimation on those characteristics. Of course,
this was not the objective of HO, who estimated choice functionals for each individual separately. But it does limit the use of these data for other purposes. Second, all of the lotteries were in the gain domain, and many theories require lotteries that are framed as losses or as mixtures of gains and losses. Hence, we review here the replications and extensions of Harrison and Rutström (2005), which address these two limitations. Subjects were presented with 60 lottery pairs, each represented as a "pie" showing the probability of each prize. Fig. 4 illustrates one such representation. The subject could choose the lottery on the left or the right, or explicitly express indifference (in which case the experimenter would flip a coin on the subject's behalf). After all 60 lottery pairs were evaluated, three were selected at random for payment.87 The lotteries were presented to the subjects in color on a private computer screen,88 and all choices were recorded by the computer program. This program also recorded the time taken to make each choice. In addition to the choice tasks, the subjects provided information on demographic and other personal characteristics. In the gain frame experiments the prizes in each lottery were $0, $5, $10, and $15, and the probabilities of each prize varied from choice to choice, and from lottery to lottery. In the loss frame experiments subjects were given an initial endowment of $15, and the corresponding prizes from the gain frame lotteries were transformed to be -$15, -$10, -$5, and $0. Hence, the final outcomes, inclusive of the endowment, were the same in the gain frame and loss frame. In the mixed frame experiments subjects were given an initial endowment of $8, and the prizes were transformed to be -$8, -$3, $3, and $8, generating final outcomes inclusive of the endowment of $0, $5, $11, and $16.89 In addition to the fixed endowment, each subject received a random endowment between $1 and $10.
This endowment was generated using a uniform distribution defined over whole dollar amounts, operationalized by a 10-sided die. The purpose of this random endowment is to test for endowment effects on the choices. The probabilities used in each lottery ranged roughly evenly over the unit interval. Values of 0, 0.13, 0.25, 0.37, 0.5, 0.62, 0.75, and 0.87 were used.90 The presentation of a given lottery on the left or the right was determined at random, so that the ‘‘left’’ or ‘‘right’’ lotteries did not systematically reflect greater risk or greater prize range than the other. Subjects were recruited at the University of Central Florida, primarily from the College of Business Administration, using the online recruiting application at ExLab (http://exlab.bus.ucf.edu). Each subject received a $5
fee for showing up to the experiments, and completed an informed consent form. Subjects were deliberately recruited for "staggered" starting times, so that subjects would not pace their responses by any other subject. Each subject was presented with the instructions individually, and taken through the practice sessions at an individual pace. Since the rolls of the dice were important to the implementation of the objects of choice, the experimenters took some time to give each subject "hands-on" experience with the (10-sided, 20-sided, and 100-sided) dice being used. Subjects were free to make their choices as quickly or as slowly as they wanted. Our data consist of responses from 158 subjects making 9,311 choices that do not involve indifference. Only 1.7% of the choices involved an explicit choice of indifference, and to simplify we drop those in estimation unless otherwise noted. Of these 158 subjects, 63 participated in gain frame tasks, 37 participated in mixed frame tasks, and 58 participated in loss frame tasks.
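The frame arithmetic just described can be checked directly. The following is a minimal sketch (our illustration, not code from the study), with the negative loss frame and mixed frame prizes implied by the stated final outcomes:

```python
# Sketch verifying the frame arithmetic described above: the endowment
# plus the transformed prizes reproduces the stated final outcomes.

gain_prizes = [0, 5, 10, 15]      # gain frame prizes, in dollars
loss_prizes = [-15, -10, -5, 0]   # loss frame prizes, $15 endowment
mixed_prizes = [-8, -3, 3, 8]     # mixed frame prizes, $8 endowment

loss_outcomes = [p + 15 for p in loss_prizes]
mixed_outcomes = [p + 8 for p in mixed_prizes]

# Loss frame final outcomes match the gain frame prizes exactly.
assert loss_outcomes == gain_prizes
# Mixed frame final outcomes are $0, $5, $11, and $16.
assert mixed_outcomes == [0, 5, 11, 16]
```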
APPENDIX C. THE EXPERIMENTS OF HOLT AND LAURY (2002)

C1. Explaining the Data

Holt and Laury (2002) examine two main treatments with 212 subjects. The first is the effect of incentives. They vary the scale of the payoffs in the matrix shown in panel A of Table 1, which we take to be a scale of 1. Every subject was presented with the first matrix of choices shown in panel A of Table 1, and with the exact same matrix at the end of the experiment. These two choices were always given to all subjects, and we will refer to them as task #1 and task #4. All subjects additionally had one or two intermediate choices, referred to here as task #2 and task #3. The question in task #2, if asked, was a higher-scale, hypothetical version of the initial matrix of payoffs. The question in task #3, if asked, was the same higher-scale version of payoffs but with real payoffs. Some subjects were asked only one of these intermediate questions (hence for those subjects task #4 was actually their third and last task); most subjects were asked both of them. Thus, we obtain the tabulation of individual responses shown in Table 9. We see from Table 9 how each subject experienced different scales of payoffs in task #2 and/or task #3. This provides in-sample tests of the hypothesis that risk aversion does not vary with wealth, an important issue for those who assume specific functional forms such as CRRA or CARA.
Table 9. Sample Size and Design of the Holt and Laury (2002) Experiments.

Scale of Payoffs   Task 1   Task 2   Task 3   Task 4   Total
1                     212        -        -      212     424
20                      -      118      150        -     268
50                      -       19       19        -      38
90                      -       18       18        -      36
All                   212      155      187      212     766
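The tallies in Table 9 follow mechanically from the design description; a small sketch (our illustration) that reproduces them:

```python
# All 212 subjects face tasks #1 and #4 at scale 1; the intermediate
# tasks #2 (hypothetical) and #3 (real) are at scales 20, 50, or 90.

task2 = {20: 118, 50: 19, 90: 18}   # subjects asked task #2, by scale
task3 = {20: 150, 50: 19, 90: 18}   # subjects asked task #3, by scale

totals = {1: 212 + 212}             # responses at scale 1 (tasks #1 and #4)
for scale in (20, 50, 90):
    totals[scale] = task2[scale] + task3[scale]

assert totals == {1: 424, 20: 268, 50: 38, 90: 36}
assert sum(totals.values()) == 766  # grand total of responses in Table 9
```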
A rejection of the "constancy" assumption in CRRA or CARA is not a rejection of EUT in general, of course, but just of these particular (popular) parameterizations. In Section 3.7 and Appendix E, we see that some studies unfortunately equate "rejection of EUT" with "rejection of CRRA." The second treatment in the HL design is the effect of hypothetical payoffs, which is why the questions in task #2 are included. Economic theory has no prediction when the task is not salient, and the experimenter has no control over subject behavior. The effect of using hypothetical responses is examined in depth in Harrison (2007) using these and other data, since the use of such data has been so prevalent in the empirical literature on the validity of EUT, but we do not consider them any further here. There is considerable evidence, bolstered by Holt and Laury (2005), that risk attitudes elicited with hypothetical responses are significantly different from risk attitudes elicited with real economic consequences, so this is a debate simply not worth pursuing. Although having in-sample responses is valuable, it comes at a price in terms of control, since there may be wealth effects from the subjects having earned some profit in the previous choice. To handle this HL use a nice trick: when the subjects proceed from task #1 to task #3, they are first asked if they are willing to give up their earnings in task #1 in order to play task #3. Since the stakes are so much higher in task #3, all subjects chose to do so. This means that the subjects face tasks #1 and #3 with no prior earnings from these experiments, although they do have experience with the type of task when facing task #3. No such trick can be applied for task #4, since the subjects would be unlikely to give up their earnings in task #3 in this instance. Thus, the responses to task #4 have no controls for wealth built into the design. However, we do know the actual earnings of the subjects from the experimental data.
Fig. 27. Observed Choices in Holt and Laury (2002) Experiments. [Four panels (payoffs of 1× and 20×, 1× and 50×, 1× and 90×, and the same payoffs of 1× twice) plot the proportion of safe choices against the problem sequence from 1 to 10, each with a risk-neutral benchmark line.]
HL also ask each subject to fill out a detailed questionnaire of individual demographic information, so their data include a rich set of controls for differences in risk preferences due to these characteristics. Fig. 27 shows the main responses in the HL experiments. Consider the top left panel, which shows the average number of choices of the "safe" option A in each problem. In Problem 1, which is row 1 in panel A of Table 1, virtually everyone chooses option A (the safe choice). By the time the subjects get to Problem 10, which is the last row in panel A of Table 1, virtually everyone has switched over to option B, the "risky" option. The dashed line marked RN shows the prediction if each and every subject were risk neutral: in this case everyone would choose option A up to Problem 4, then everyone would choose option B thereafter. The solid line marked with a circle shows the observed behavior in task #1, the low-payoff case. The solid line marked with a diamond shows the observed behavior in task #3, the high-payoff case. In the top left panel, the high payoff refers to payoff matrices that scale up the values in panel A of Table 1 by 20. The top right panel in Fig. 27 shows comparable data for the 50 problems, and the bottom left panel shows comparable data for the 90 problems.91
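The risk-neutral prediction described here can be computed from the published Holt and Laury (2002) payoffs (option A pays $2.00 or $1.60, option B pays $3.85 or $0.10, with probability k/10 on the higher payoff in problem k; Table 1 itself is not reproduced in this appendix). A minimal sketch, not the authors' code:

```python
# Find the first problem where the risky option B has the higher
# expected value; a risk-neutral subject switches from A to B there.

def expected_value(p_high, high, low):
    return p_high * high + (1 - p_high) * low

switch = None
for k in range(1, 11):
    p = k / 10
    ev_a = expected_value(p, 2.00, 1.60)   # safe option A
    ev_b = expected_value(p, 3.85, 0.10)   # risky option B
    if switch is None and ev_b > ev_a:
        switch = k

assert switch == 5  # choose A in Problems 1-4, B from Problem 5 onward
```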
We examine the bottom-right panel later. HL proceed with their analysis by looking at the first three pictures in Fig. 27 and drawing two conclusions. First, that one has to introduce some "noise" into any model of the data-generation process, since the observed choices are "smoother" than the risk-neutral prediction. A more general way of saying this is to allow subjects to have a specific degree of risk aversion, but, critically, to assume that they all have exactly the same degree of risk aversion. Thus, if subjects were a little risk averse the line marked RN would shift to the right, dropping down at Problem 6 or 7 instead of Problem 5. Of course, it would no longer represent risk-neutral responses, but it would still drop sharply, and that is the point being made by HL when arguing for a noise parameter. Second, and related to the previous explanation, the best-fitting line that assumes homogeneous risk preferences would have to be a bit to the right of the risk-neutral line marked RN. So some degree of risk aversion, they argue, is needed to account for the location of the observed averages, quite apart from the need for a noise parameter to account for the smoothness of the observed averages. Both conclusions depend critically on the assumption that every subject in the experiment has the same preferences over risk. The smoothness of the observed averages is easily explained if one allows heterogeneous risk attitudes and no noise at all at the individual level: some people drop down at Problem 4, some more at Problem 5, some more at Problem 6, and so on. The smoothness that the eyeball sees in the aggregate data is just a counterpart of averaging this heterogeneous process. The fact that some degree of risk aversion is needed for some subjects is undeniable, from the positive area above the RN line and below the circle or diamond lines from Problems 5 through 10.
But it simply does not follow, without further statistical analysis, that all subjects, or even the typical subject, exhibit significant amounts of risk aversion. Nor does it follow that a noise parameter is needed to model these data. These conclusions follow from inspection of each of the first three panels of Fig. 27, and just the RN and circle lines in each, for that matter. Now turn to the comparison of the circle and diamond lines within each of the first three panels. The eyeball suggests that the diamond lines are to the right of the circle lines, which implies that risk aversion increases as the scale of payoffs increases. But this conclusion requires some measure of the uncertainty of these averages. Not surprisingly, the standard deviation in responses is largest around Problems 5 through 7, suggesting that the confidence intervals around these diamond and circle lines could easily
overlap. Again, this is a matter for an appropriate statistical analysis, not eyeball inspection of the averages. Finally, compare the differences between the diamond and circle lines as one scans across the first three panels in Fig. 27. As the payoff scale gets larger, from 20 to 50 and then to 90, it appears that the gap widens. That is, if one ignores the issue of standard errors around these averages, it appears that the degree of risk aversion increases. This leads HL to reject CRRA and CARA, and to consider generalized functional forms for utility functions that admit of increasing risk aversion. However, as Table 9 shows, the sample sizes for the 50 and 90 treatments were significantly smaller than those for the 20 treatment: 38 and 36 subjects, respectively, compared to 268 subjects for the 20 treatments. So one would expect that the standard errors around the 50 and 90 high-payoff lines would be much larger than those around the 20 high-payoff lines. This could make it difficult to statistically draw the eyeball conclusion that scale increases risk aversion. Finally, one needs to account for the fact that all of the high-payoff data in the HL experiments was obtained in a task that followed the low-payoff task. Income effects were controlled for, in an elegant manner described above. But there could still be simple order effects due to experience with the qualitative task. HL recognize the possibility of order effects when discussing why they had the high hypothetical task before the high real task: ‘‘Doing the high hypothetical choice task before high real allows us to hold wealth constant and to evaluate the effect of using real incentives. For our purposes, it would not have made sense to do the high real treatment first, since the careful thinking would bias the high hypothetical decisions.’’ The same (correct) logic applies to comparisons of the second real task with the first real task. 
The bottom-right panel examines the data collected by HL in task #1 and task #4, which have the same scale but differ only in terms of the order effect and the accumulated wealth from task #3. These lines appear to be identical, suggesting no order effect, but a closer statistical analysis that conditions on the two differences shows that there is in fact an order effect at work.
C2. Modeling Behavior One of the major contributions of HL is to present ML estimates of a relatively flexible utility function using their data. Recognizing the apparent changes in RRA with the scale treatments, they note that CRRA would not
be appropriate, and use a parameterization of the expo-power (EP) function introduced by Saha (1993).
APPENDIX D. THE EXPERIMENTS OF KACHELMEIER AND SHEHATA (1992)

To illustrate the use of the BDM procedure, and to point to some potential problems, consider the "high payoff" experiments from China reported by Kachelmeier and Shehata (1992). These involved subjects facing lotteries with prizes equal to 0.5 yuan, 1 yuan, 5 yuan, or 10 yuan. Although 10 yuan only converted to about $2.50 at the time of the experiments, this represented a considerable amount of purchasing power in that region of China, as discussed by KS (p. 1123). There were four treatments. One treatment used 25 lotteries with 5 yuan, one used 25 lotteries with 10 yuan, one used 25 lotteries with 0.5 yuan followed by 25 lotteries with 5 yuan, and one used 25 lotteries with 1 yuan followed by 25 lotteries with 10 yuan. In all cases, the first of the battery of 25 lotteries was a hypothetical trainer, and is ignored in the analysis shown below. Figs. 28 and 29 show the data from the experiments of KS in China. The vertical axis shows the ratio of the elicited CE to the expected value of the lottery, and the horizontal axis shows the probability of winning each lottery. Each panel in Fig. 28 shows a scatter of data from each prize treatment. In Fig. 28 we only show data from the first series of lottery choices, for comparability in terms of experience. In Fig. 29 we show the results for the high-prize treatments, with the first series on top and the second series on the bottom, to show the effect of experience with the general task.92 To orient the analysis, a simple cubic spline is drawn through the median bands; these lines are consistent with the formal statistical analysis reported below, but help explain certain features of the data. Four properties of these responses are evident from the pictures. First, the general tendency towards risk-loving behavior at the lower three prize levels, as evidenced by the CEs being greater than the expected value in Fig. 28.
Second, the dramatic reduction in the dispersion of selling prices as the probability of winning increases to 1, as evidenced by the pattern of the scatter within each panel. Indeed, these pictures discard data for probabilities less than 0.15, and for ratios greater than 2.5, to allow reasonable scaling. The discarded data exhibit even more dramatic dispersion than is already evident at probability levels of 0.25. Third, the
Fig. 28. Risk Premia and Probability of Winning in First Series of Kachelmeier–Shehata Experiments. [Panels by first-task prize scatter the ratio of the elicited certainty-equivalent to the expected value against the probability of winning.]

Fig. 29. Risk Premia and Probability of Winning in High Stakes Kachelmeier–Shehata Experiments. [High-prize treatments, first series on top and second series below; axes as in Fig. 28.]
responses for the highest prize treatment in Fig. 28 are much closer to being risk neutral or risk averse, at least for winning probabilities greater than around 0.2. Finally, the data in Fig. 29 suggest that subjects are less risk loving when they have experience with the task, and/or that there is an increase in risk aversion due to the accumulation of experimental income from the first series of tasks. Since the BDM method generates a CE for each lottery, it is possible to estimate the CRRA coefficient directly for each response that a subject makes using the BDM method.93 If p is the winning probability for prize y, and s is the CE elicited as a selling price from the subject, then the coefficient is equal to 1 - [ln(p)/(ln(s) - ln(y))]. In this form, a value of zero indicates risk neutrality, and negative (positive) values risk-loving (risk-averse) behavior. The behavior of the CRRA coefficient elicited using the BDM method is extremely sensitive to experimental conditions, even if one restricts attention to the high-stakes lotteries and win probabilities within 15% of the boundaries.94 First, the coefficients for low win probabilities imply extreme risk loving. This is perfectly plausible given the paltry stakes involved in such lotteries. Second, the coefficient depends on accumulated earnings, as hypothesized by McKee (1989). Increases in the average accumulated income earned in the task increase risk aversion, and increases in the three-round moving average of income decrease risk aversion.95 Third, "bad joss," as measured by the fraction of random buying prices below the expected buying price of 50% of the prize, is associated with a large increase in risk-loving behavior.96 Fourth, as Fig. 29 would suggest, experience with the general task increases risk aversion. Fifth, increasing the prize from 5 yuan to 10 yuan increases risk aversion significantly.
Of course, this last result is consistent with non-constant RRA, and should not necessarily be viewed as a problem unless one insisted on applying the same CRRA coefficient over these two reward domains. Fig. 30 summarizes the distribution of CRRA coefficients for the high-stakes task decisions in KS. The dispersion of estimates is high, even though there is a marked tendency towards risk neutrality with the 10 yuan task and with experienced subjects. One of the key results here, as stressed by Kachelmeier and Shehata (1994), is that there is considerable variation in CRRA coefficients within each subject's sample of responses, as well as between subjects. The within-subjects standard deviation in CRRA coefficients is 1.10, and the between-subjects standard deviation is 1.13, around a mean of -1.36. To deal with some of these problems we recommend paying subjects for just one stage to avoid intra-session income effects, the use of a physical randomizing device to encourage subjects to see the random buyout price as independent of their selling price, the use of winning probabilities between 1/4 and 3/4 to avoid
Fig. 30. Estimates of Risk Aversion from Kachelmeier–Shehata Experiments. [Histograms of estimated CRRA coefficients for interior winning probabilities, for first- and second-task prizes of 5 and 10 yuan.]
the more extreme effects of the end-point probabilities, and the provision of experience in the task in a completely prior session. We would also utilize extended instructions along the lines developed by Plott and Zeiler (2005).
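The CRRA back-out used in this appendix is easily sketched: with CRRA utility u(x) = x^(1-r)/(1-r), the selling price s for a lottery paying y with probability p satisfies p*y^(1-r) = s^(1-r), which rearranges to r = 1 - ln(p)/(ln(s) - ln(y)). A minimal illustration (ours, not the authors' code):

```python
import math

def crra_from_bdm(p, s, y):
    """Back out the CRRA coefficient from a BDM selling price s for a
    lottery paying prize y with probability p (and zero otherwise)."""
    return 1 - math.log(p) / (math.log(s) - math.log(y))

# A risk-neutral seller prices the lottery at its expected value p*y,
# which gives a coefficient of zero.
assert abs(crra_from_bdm(0.5, 0.5 * 10, 10)) < 1e-9

# Selling prices above (below) the expected value imply risk-loving
# (risk-averse) behavior, i.e. negative (positive) coefficients.
assert crra_from_bdm(0.5, 6, 10) < 0
assert crra_from_bdm(0.5, 4, 10) > 0
```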
APPENDIX E. THE EXPERIMENTS OF GNEEZY AND POTTERS (1997)

The experimental task of Gneezy and Potters (1997) was very simple, and was followed exactly by Haigh and List (2005). Each subject in the baseline treatment made nine decisions over a fixed stake. In GP this stake was 2 Dutch Guilders, which we will call $2.00 for pedagogic ease. In each round they could choose a fraction of the stake to bet. If they chose to bet nothing then they received $2.00 in that round for certain. If they bet $x then they faced a 2/3 chance of losing $x and a 1/3 chance of winning $2.5x. These earnings were on top of the initial stake of $2.00. Thus, the subject literally ended up with ($2.00 - $x) with probability 2/3 and ($2.00 + $2.5x) with probability 1/3. Since $x could not exceed $2.00, by design, the subject actually faced no losses for the round as a whole. Of course, if one ignores the $2.00 stake the subject did face a loss. In the baseline condition the subject
chose a bet in each round, the random outcome was realized, their earnings in that round were tabulated, and then the next round decision was made. In the alternative treatment the subject made three decisions instead of nine. The first decision was a single amount to bet in each of rounds 1 through 3, the second decision was a single amount to bet in each of rounds 4 through 6, and the third decision was a single amount to bet in each of rounds 7 through 9. Thus, the subject made one decision that applied to each of the outcomes in rounds 1, 2, and 3. To state it equivalently, since this is critical to follow, one decision was simply applied three times: it is not the case that the subject made three separate decisions at round 1 that were applied in rounds 1, 2, and 3, respectively. The subject could not say "bet x, y, and z% in rounds 1, 2, and 3," but could only say "bet x%," meaning that x% would be bet for the subject in each of rounds 1, 2, and 3. In all other respects the experimental task was the same: the only thing that varied was the horizon over which the choices were made. This is referred to as the Low frequency treatment (L), and the baseline is referred to as the High frequency treatment (H). The raw data in the two sets of experiments are presented in Figs. 31 and 32, which show the distribution of percentage bets. The general qualitative outcome is for subjects to bet more in the L treatment than in the H treatment. Gneezy and Potters (1997; Table I, p. 639) report that 50.5% was bet in their treatment H and 67.4% in their treatment L over all 9 rounds. They conducted their experiments with 83 Dutch students, split roughly evenly across the two treatments in a between-subjects design.
Haigh and List (2005) (HLI) report virtually the same outcomes: for their sample of 64 American college students, the fractions were 50.9% and 62.5%, respectively, and for their sample of 54 current and former traders from the Chicago Board of Trade the fractions were 45% and 75%, respectively.97 Using unconditional non-parametric tests or panel Tobit models, these differences are statistically significant at standard levels.98 Thus, it appears that samples of subjects drawn from the same population behave as if more risk averse in treatment H compared to treatment L, and that the average subject is risk averse. The latter inference follows from the fact that a risk-neutral subject, according to EUT, would bet 100% of the stake. Figs. 31 and 32 also alert us to one stochastic feature of these data that will play a role later: there is a substantial spike at the 100% bet level. From an EUT perspective, this corresponds to subjects who are risk neutral or risk loving. If we just consider "interior bets" then the same qualitative results obtain. In GP, the Low frequency treatment generates an average 42.1% bet
Fig. 31. Distribution of Percentage Bets in Gneezy and Potters (1997) Experiments. [Histograms of the percentage of stake bet, High frequency on top and Low frequency below.]

Fig. 32. Distribution of Percentage Bets in Haigh and List (2005) Experiments. [The same histograms, split by students and traders.]
compared to an average 33.9% bet in the High frequency treatment. In HLI, the students (traders) bet an average of 37.7% (25.3%) in the High frequency treatment and 51.4% (59.3%) in the Low frequency treatment.
E1. Explaining the Data

When interpreting the experiments of GP and HLI it is important to view subjects as having a utility function that is defined over prize income that reflects the stakes that choices are being made over. The high frequency subjects can be viewed as making a series of nine choices over stakes defined, for each choice, by a vector y which takes on a range of integer values between $0 and $7. The subject could get $0 if they bet 100% of the stake and lost it; or they could get as much as $7 if they bet 100% of the stake and won 2.5 × $2.00. The low frequency subjects, on the other hand, made three choices over stakes defined by the possible combinations of gains and losses over three random draws. Thus, they could end up with three losses, two losses and one gain, one loss and two gains, or three gains. The probabilities for each outcome, irrespective of order, are 0.30, 0.44, 0.22, and 0.04, respectively. The monetary outcome in each case depends on the fraction of the stake that the subject chose to bet. Table 10 spells out the arithmetic for different bets. For simplicity we evaluate the possible choices in increments of 10 cents, but of course the choices could be in pennies.99 The second column shows the bet as a percent of the stake of $2.00. Columns 3 through 7 show the components of the lottery facing the subject in the High frequency treatment for each possible bet, and columns 8 through 16 show the same components for the subject in the Low frequency treatment. Consider, for example, a bet of 10 cents, which is 5% of the stake. If the subject is in the High treatment and loses, they earn 190 (= 200 - 10) cents in that period; this occurs with probability 2/3. If the subject is in the High treatment and wins, they earn 225 (= 200 + 10 × 2.5 = 200 + 25) cents; this occurs with probability 1/3. In the corresponding entry for the subject in the Low treatment, the value of prizes is calculated similarly, but for three random draws.
Thus, in the LLL outcome, the subject earns 570 (= 3 x (200 - 10)) cents. From Table 10 we see instantly that a risk-neutral subject that obeyed EUT would bet 100% of the pie in both treatments and thereby maximize expected value. It can also be inferred that a moderately risk-averse subject would bet some fraction of the pie in each treatment, less than 100%, and that a risk-loving subject would always bet 100% of the pie.
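The arithmetic just described can be sketched compactly. The following is an illustrative Python restatement (the chapter's own code is Stata, and the function names here are ours):

```python
from math import comb

# Sketch of the Table 10 arithmetic: a stake of 200 cents per round, the bet is
# lost with probability 2/3, and a win pays 2.5 times the bet.
STAKE, WIN_MULT, P_WIN = 200, 2.5, 1 / 3

def high_frequency_lottery(bet):
    """One round: list of (payoff in cents, probability) for a bet in cents."""
    return [(STAKE - bet, 1 - P_WIN), (STAKE + WIN_MULT * bet, P_WIN)]

def low_frequency_lottery(bet):
    """Three rounds evaluated together: payoffs and probabilities by number of wins."""
    outcomes = []
    for wins in range(4):  # 0, 1, 2, or 3 gains out of 3 draws
        p = comb(3, wins) * P_WIN ** wins * (1 - P_WIN) ** (3 - wins)
        payoff = 3 * STAKE - (3 - wins) * bet + wins * WIN_MULT * bet
        outcomes.append((payoff, p))
    return outcomes

def expected_value(lottery):
    return sum(x * p for x, p in lottery)
```

A 10-cent bet (5% of the stake) reproduces the entries discussed in the text: 190 or 225 cents in the High treatment, and 570 cents for the LLL outcome in the Low treatment, with outcome probabilities 0.30, 0.44, 0.22, and 0.04.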
Table 10.  Possible Choices - Illustrative Calculations Assuming Risk Neutrality.

                      High Frequency Treatment                       Low Frequency Treatment
Bet as  Bet in  -----------------------------------  -------------------------------------------------------------
   %     cents    L    p(L)    G    p(G)     EV       LLL  p(LLL)   LLG  p(LLG)   LGG  p(LGG)   GGG  p(GGG)    EV
   0       0     200   0.67   200   0.33    200.0     600   0.30    600   0.44    600   0.22    600   0.04    600
   5      10     190   0.67   225   0.33    201.7     570   0.30    605   0.44    640   0.22    675   0.04    605
  10      20     180   0.67   250   0.33    203.3     540   0.30    610   0.44    680   0.22    750   0.04    610
  15      30     170   0.67   275   0.33    205.0     510   0.30    615   0.44    720   0.22    825   0.04    615
  20      40     160   0.67   300   0.33    206.7     480   0.30    620   0.44    760   0.22    900   0.04    620
  25      50     150   0.67   325   0.33    208.3     450   0.30    625   0.44    800   0.22    975   0.04    625
  30      60     140   0.67   350   0.33    210.0     420   0.30    630   0.44    840   0.22   1050   0.04    630
  35      70     130   0.67   375   0.33    211.7     390   0.30    635   0.44    880   0.22   1125   0.04    635
  40      80     120   0.67   400   0.33    213.3     360   0.30    640   0.44    920   0.22   1200   0.04    640
  45      90     110   0.67   425   0.33    215.0     330   0.30    645   0.44    960   0.22   1275   0.04    645
  50     100     100   0.67   450   0.33    216.7     300   0.30    650   0.44   1000   0.22   1350   0.04    650
  55     110      90   0.67   475   0.33    218.3     270   0.30    655   0.44   1040   0.22   1425   0.04    655
  60     120      80   0.67   500   0.33    220.0     240   0.30    660   0.44   1080   0.22   1500   0.04    660
  65     130      70   0.67   525   0.33    221.7     210   0.30    665   0.44   1120   0.22   1575   0.04    665
  70     140      60   0.67   550   0.33    223.3     180   0.30    670   0.44   1160   0.22   1650   0.04    670
  75     150      50   0.67   575   0.33    225.0     150   0.30    675   0.44   1200   0.22   1725   0.04    675
  80     160      40   0.67   600   0.33    226.7     120   0.30    680   0.44   1240   0.22   1800   0.04    680
  85     170      30   0.67   625   0.33    228.3      90   0.30    685   0.44   1280   0.22   1875   0.04    685
  90     180      20   0.67   650   0.33    230.0      60   0.30    690   0.44   1320   0.22   1950   0.04    690
  95     190      10   0.67   675   0.33    231.7      30   0.30    695   0.44   1360   0.22   2025   0.04    695
 100     200       0   0.67   700   0.33    233.3       0   0.30    700   0.44   1400   0.22   2100   0.04    700

Note: Payoffs are in cents. The bold row in the original (the 100% bet) shows the EUT-consistent
choices for a risk-neutral subject. p(LLL) = 2/3 x 2/3 x 2/3; p(LLG) = 2/3 x 2/3 x 1/3, and can
occur in three equivalent ways (LLG, LGL, and GLL), so the probability shown is 2/3 x 2/3 x 1/3 x 3;
p(LGG) = 2/3 x 1/3 x 1/3, and can also occur in three equivalent ways; and p(GGG) = 1/3 x 1/3 x 1/3.
Risk Aversion in the Laboratory 177
178
GLENN W. HARRISON AND E. ELISABET RUTSTRO¨M
The outcomes of the lotteries being evaluated by subjects in the High and Low treatments differ significantly. Consider the 50% bet, in the middle of Table 10. For subjects in the High treatment the two final outcomes from each choice are 100 and 450, occurring with the probabilities shown there. For subjects in the Low treatment there are four final outcomes from each choice: 300, 650, 1,000, and 1,350. Thus, the monetary rewards from the same percentage choice differ significantly. So, to explain why subjects in the High treatment are more risk averse than subjects in the Low treatment, it suffices at a qualitative level to find some utility function that has moderate amounts of risk aversion for "low" income levels and smaller amounts of risk aversion for "higher" income levels. Although less obvious than the RN prediction, any subject exhibiting CRRA would choose the same bet fraction in each row. The more risk averse they were, the smaller would be the bet, but it would be the same bet in each of the High and Low treatments. This result is important since every statement of "the EUT null hypothesis" in the MLA literature that we can find uses RN or CRRA specifications for the utility function.100 Thus, it is easy to see why evidence of a difference between the bet fractions in the High and Low treatments is viewed as a rejection of EUT. Of course, this does not test EUT at all. It only tests a very special case of EUT, where the specific functional form seems to have been chosen to perform poorly.101 It is easy to propose more flexible utility functions than CRRA. There are many such functions, but one of the most popular in recent work that is fully consistent with EUT has been the expo-power (EP) utility function proposed by Saha (1993). Following Holt and Laury (2002), the EP function is defined as

    U(x) = (1 - exp(-a x^(1-r))) / a

where a and r are parameters to be assumed or estimated. RRA is then r + a(1-r)y^(1-r), so RRA varies with income y if a ≠ 0.
This function nests CRRA (as a → 0) and CARA (as r → 0). At a qualitative level, if r > 0 and a < 0 one can immediately rationalize the qualitative data in these experiments: RRA = r + a(1-r)y^(1-r) → r as y → 0, and then one has declining RRA with higher prize incomes since a < 0.
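A small numerical sketch of these claims, in Python for illustration (function names are ours; the parameter values are merely in the spirit of the estimates in Table 11):

```python
from math import exp

def ep_utility(y, r, a):
    """Expo-power utility U(y) = (1 - exp(-a * y**(1 - r))) / a (Saha, 1993)."""
    return (1.0 - exp(-a * y ** (1.0 - r))) / a

def rra(y, r, a):
    """Relative risk aversion implied by expo-power: r + a*(1 - r)*y**(1 - r)."""
    return r + a * (1.0 - r) * y ** (1.0 - r)

# With r > 0 and a < 0, RRA starts near r for tiny incomes and declines as
# income rises; as a -> 0 the function converges to the CRRA benchmark y**(1-r).
R, A = 0.67, -0.44  # illustrative values only
```

Evaluating rra at increasing incomes traces out the declining-RRA pattern that rationalizes the difference between the High and Low treatments.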
E2. Modeling Behavior

The qualitative insight that one can explain these data with a simple EUT specification can be formalized by estimating the parameters of a model that
accounts for the data. Such an exercise also helps explain some differences between the traders and students in Haigh and List (2005). As noted earlier, Figs. 31 and 32 alert us to the fact that the behavioral process generating data at the 100% bet level may be different than the process generating data at the "interior" solutions. From a statistical perspective, this is just a recognition that a model that tries to explain the interior modes of these data, and why they vary between the High and Low treatments, might have a difficult time also accounting for the spike at 100%. One approach is just to ignore that spike, and see what estimates obtain. Another approach is to construct a model and likelihood function that accounts for these two processes.102 We apply both approaches, although favoring the latter a priori. The dependent variable is naturally characterized as the fraction of the stake bet, denoted p. Therefore, the likelihood function is constructed using the specification developed by Papke and Wooldridge (1996) for fractional dependent variables. Specifically, the log-likelihood of observation i is defined as l_i(ξ) = p_i log G(x_i, ξ) + (1 - p_i) log(1 - G(x_i, ξ)) for parameter vector ξ, a vector of explanatory variables x_i, and some convenient cumulative distribution function G(·). We use the cumulative Gamma distribution function G(z) = Γ(α, z), where α is a parameter that can be estimated.103 The index z_i is the expected utility of the bet chosen, conditional on some parameter estimates of ξ and some characteristics x_i for observation i. The index z is constructed using information on the lottery for the actual bet, reflecting a more detailed version of the arithmetic underlying Table 10. Thus, for a particular fractional bet, the parameters of the task imply that the subject was facing a particular lottery. So, one element of the x vector is whether or not the subject was in the High or Low treatment. Another element is the stake.
Yet another element is the set of parameters of the experimental task defining the lottery outcomes (e.g., the probabilities of a loss or a gain, and the numbers defining how the bet is scaled to define the loss or the gain). Using this information, and candidate estimates of r and a for the EP utility function, the likelihood constructs the expected utility of the observed choice, and the ML estimates find the parameters of the EP utility function that best explain the observed choices. This approach can be applied directly to the data in Figs. 31 and 32, recognizing that one model must explain the multiple modes of these distributions. Alternatively, one can posit a natural two-step decision process, where the subject first decides if they are going to bet everything or not, and then if they decide not to, decides how much to bet (including 0%). This might correspond to one way that a risk-averse or risk-loving subject
might process such tasks: first figure out what an RN decision-maker would do, since that is computationally easier, and then shade one's choice in the direction dictated by risk preferences. Since the matrix in Table 10 was not presented to subjects in such an explicit form, this would be one sensible heuristic to use. Irrespective of the interpretation, this proposed decision process implies a statistical "hurdle" model. First the subject makes a binary choice to bet 100% or less. Then the subject decides what fraction to bet, conditional on betting less than 100%. The first stage can be modeled using a standard probit specification, although it is the second stage that is really of greatest interest. A key feature of these estimates is that they pool the data from High and Low treatments. The objective is to ascertain if one EUT-consistent model can explain the shift in the distributions between these treatments in Figs. 31 and 32. Since each subject provided multiple observations there are clustering corrections for the possible correlation of errors associated with a given subject. Table 11 reports the results of ML estimation of these models. Panel A provides estimates for the individual responses from Gneezy and Potters (1997). These estimates show some initial risk aversion at zero income levels (r = 0.21) and then some slight evidence of declining RRA as income rises (a = -0.019). However, the evidence of declining RRA is not statistically significant, although the 95% confidence interval is skewed towards negative values. Much more telling evidence comes from comparable estimates for the interior bets, in panel B. Here we find striking evidence of the qualitative explanation presented earlier: initial risk aversion at zero income levels (r = 1.12) and sharply declining RRA as income rises (a = -0.57). The point estimate of r exceeds 1 in this case, which violates the assumption of non-satiation.
But the standard error on this estimate is 0.25, with a 95% confidence interval between 0.61 and 1.63. So we cannot reject the hypothesis that r ≤ 1; in fact, the p-value for the hypothesis that the coefficient equals 1 is 0.68, so we cannot reject that specific hypothesis. Panels C through E report estimates for the treatments of Haigh and List (2005), estimated separately for traders and students since that was their main treatment. With the exception of the estimates in panel E, for all bets by University of Maryland (UMD) students, these results again confirm the qualitative explanation proposed above. Therefore, one must simply reject the conclusion of Haigh and List (2005, p. 531) that their "findings suggest that expected utility theory may not model professional traders' behavior well, and this finding lends credence to behavioral economics and finance
Table 11.  Maximum Likelihood Estimates of Expo-Power Utility Function.

Coefficient    Estimate   Standard Error   p-Value   Lower 95% CI   Upper 95% CI

A. Gneezy and Potters (1997) - Estimates for All Bets by Dutch Students
    r             0.21         0.08         0.009        0.06           0.37
    a            -0.02         0.03         0.463       -0.07           0.03
    α             2.32         0.22         0.000        1.87           2.76

B. Gneezy and Potters (1997) - Estimates for Interior Bets by Dutch Students
    r (a)         1.12         0.25         0.000        0.61           1.63
    a            -0.57         0.09         0.000       -0.74          -0.40
    α             1.88         0.29         0.000        1.30           2.46

C. Haigh and List (2005) - Estimates for All Bets by CBOT Traders
    r             0.36         0.05         0.000        0.26           0.46
    a            -0.13         0.02         0.000       -0.16          -0.10
    α             3.67         0.42         0.000        2.82           4.53

D. Haigh and List (2005) - Estimates for Interior Bets by CBOT Traders
    r             0.67         0.04         0.000        0.60           0.74
    a            -0.44         0.01         0.000       -0.46          -0.42
    α             3.69         0.34         0.000        3.01           4.37

E. Haigh and List (2005) - Estimates for All Bets by UMD Students (b)
    r            -0.99         0.27         0.001       -1.54          -0.44
    a             0.22         0.05         0.000        0.13           0.32
    α             1.71         0.21         0.000        1.28           2.13

(a) See text for discussion of the point estimate for r exceeding 1, since that violates the
non-satiation assumption for this specification.
(b) There are no estimates for the sub-sample of interior bets, since the estimate of r exceeds 1,
and is statistically significantly greater than 1.
models, which are beginning to relax inherent assumptions used in standard financial economics." Whether MLA models the behavior of traders better than EUT is a separate matter, but EUT easily explains the data. In fact, these data are more consistent with the priors that motivated the Haigh and List (2005) study, illustrated by List (2003), that students would be more likely to exhibit anomalies than field traders.

E3. Coals to Newcastle: An Anomaly for the Behaviorists

The reason that MLA is interesting is that Benartzi and Thaler (1995) use it to provide an intuitive explanation for the equity premium puzzle. Their
empirical approach is to assume a particular numerical specification of MLA, and then solve for the "evaluation horizon"104 of returns to stocks and bonds that makes their expected utility105 equivalent. They find that this horizon is roughly 12 months, which strikes one as a priori plausible if one had to pick a single representative evaluation horizon for all investors.106 Thus, they assume a particular empirical version of MLA and further assume that these coefficients do not change as they counterfactually calculate the effects of alternative evaluation horizons:

    According to our theory, the equity premium is produced by a combination of loss aversion and frequent evaluation. Loss aversion plays the role of risk aversion in standard models, and can be considered a fact of life (or, perhaps, a fact of preferences). In contrast, the frequency of evaluations is a policy choice that presumably could be altered, at least in principle. Furthermore, as the charts (…) show, stocks become more attractive as the evaluation period increases.
So the parameters of the MLA specification are assumed invariant to the evaluation horizon, as an essential premiss of the empirical methodology. Thus the motivation for the experiments of GP and HL. As GP note, Benartzi and Thaler (1995) "… do not present direct (experimental) evidence for the presence of MLA. The evidence presented in (BT) is only circumstantial. (…) We have experimental subjects making a sequence of risky choices. To analyze the presence of MLA, we do not try to estimate the period over which subjects evaluate financial outcomes, but rather we try to manipulate this evaluation period." Hence the data from GP can be used to recover the MLA preferences that are consistent with the observed behavior, and the empirical premiss of Benartzi and Thaler (1995) evaluated. Since behavioral economists are so enamored of anomalies, it may be useful to point out one or two in the MLA literature being considered here. The first anomaly is that the data from the experiments of GP demonstrate that the MLA parameters themselves depend on the evaluation horizon, which of course was varied by experimental design in their data. Hence one cannot assume that those parameters stay fixed as one calibrates the equity premium by varying the evaluation horizon. The second anomaly is that these data also imply risk attitudes defined over the utility function that are qualitatively the opposite of those customarily assumed. The MLA parameterization adopted by Benartzi and Thaler (1995, p. 79) is taken directly from Tversky and Kahneman (1992), both in terms of the functional forms and parameter values. They assume a power utility function defined separately over gains and losses: U(x) = x^a if x ≥ 0, and U(x) = -l(-x)^b if x < 0. So a and b are the risk aversion parameters, and
l is the coefficient of loss aversion. Tversky and Kahneman (1992, p. 59) provide estimates that have been universally employed in applied work by behaviorists: a = b = 0.88 and l = 2.25. Using the data from GP we estimate the parameters of this MLA model. For simplicity we assume no probability weighting, although that could be included. Benartzi and Thaler (1995, p. 83) and GP stress that it is the loss aversion parameter l that drives the main prediction of MLA, rather than probability weighting or even risk aversion in the utility function. The likelihood function is again constructed using the specification developed by Papke and Wooldridge (1996) for fractional dependent variables. Since there are no data on personal characteristics in the GP data, the x vector refers solely to whether or not the decision was made in the Low frequency setting or the High frequency setting. Thus, ξ = (a, b, l), and each of those fundamental parameters is estimated as a linear function of binary dummies for the Low and High frequencies.107

Table 12 reports the ML estimates obtained. The "good news" for MLA is that they provide strong evidence that the loss aversion parameter is greater than 1. The "bad news" for MLA is that they provide equally striking evidence that all of the parameters of the MLA specification vary with the evaluation horizon. The "awkward news" for MLA is that they provide inconsistent evidence about risk attitudes in relation to the received empirical wisdom. The estimates for a indicate risk-loving behavior over gains.108 There does not appear to be much difference in risk attitudes over gains between the two settings, and indeed one cannot reject the null hypothesis that they are equal with a Wald test (p-value = 0.391). The estimates for b indicate a severe case of risk aversion over losses. Moreover, subjects appear to be more risk averse in the Low frequency setting than in the High frequency setting: a Wald test of the null hypothesis of equality has a p-value of 0.074. Finally, the estimates for l are consistent with loss aversion, since both are significantly greater than 1 (p-values < 0.0001). However, these subjects appear to be significantly more loss averse in the High frequency setting than in the Low frequency setting (p-value = 0.0005). This new analysis of the GP data therefore implies that the MLA parameters depend on the evaluation horizon and that subjects are risk loving in gains and risk averse in losses, thus pointing to anomalies compared to the standard view of PT.

Table 12.  Maximum Likelihood Estimates of Myopic Loss Aversion Utility Function (a).

Coefficient   Variable         Estimate   Standard Error   p-Value   Lower 95% CI   Upper 95% CI
    a         Low frequency       1.48         0.04         0.000        1.40           1.55
    a         High frequency      1.38         0.10         0.000        1.18           1.59
    b         Low frequency       0.03         0.07         0.689       -0.11           0.17
    b         High frequency      0.55         0.28         0.052        0.00           1.10
    l         Low frequency       1.90         0.08         0.000        1.74           2.07
    l         High frequency      4.28         0.64         0.000        2.99           5.56

(a) Estimates from responses in Gneezy and Potters (1997) experiments.
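The two building blocks of this estimation are easy to state concretely. Below is a minimal Python sketch (function names are ours), with the logistic CDF standing in for the convenient CDF G; the text itself uses a cumulative Gamma, so this is an illustrative simplification rather than the chapter's estimator:

```python
from math import exp, log

def tk_value(x, a=0.88, b=0.88, lam=2.25):
    """Tversky-Kahneman (1992) power value function: x**a over gains,
    -lam * (-x)**b over losses, with lam the loss aversion coefficient."""
    return x ** a if x >= 0 else -lam * (-x) ** b

def fractional_loglik(p, z):
    """Papke-Wooldridge (1996) style contribution for a fractional outcome p:
    p*log(G(z)) + (1 - p)*log(1 - G(z)), using a logistic G for illustration."""
    g = 1.0 / (1.0 + exp(-z))
    return p * log(g) + (1.0 - p) * log(1.0 - g)
```

With the Tversky-Kahneman parameter values a = b = 0.88 and l = 2.25, a loss of a given size looms 2.25 times larger than the corresponding gain, which is the property the MLA prediction leans on.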
APPENDIX F. ESTIMATION USING MAXIMUM LIKELIHOOD

Economists in a wide range of fields are now developing customized likelihood functions to correspond to specific models of decision-making processes. These demands derive partly from the need to consider a variety of parametric functional forms, but also because these models often specify non-standard decision rules that have to be "written out by hand." Thus, it is becoming common to see user-written ML estimates, and less use of pre-packaged model specifications. These pedagogic notes document the manner in which one can estimate ML models of utility functions within Stata.109 However, we can quickly go beyond "utility functions" and consider a wide range of decision-making processes, to parallel the discussion in the text. We start with a standard CRRA utility function and binary choice data over two lotteries, assuming EUT. This step illustrates the basic economic and statistical logic, and introduces the core Stata syntax. We then quickly consider extensions to loss aversion and probability weighting from PT, the inclusion of "stochastic errors," and the estimation of utility numbers themselves to avoid any parametric assumption about the utility function. We then illustrate a replication of the ML estimates of HL. Once the basic syntax is defined from the first example, it is possible to quickly jump to other likelihood functions using different data and specifications. Of course, this is just a reflection of the "extensible power" of a package such as Stata, once one understands the basic syntax.110

F1. Estimating a CRRA Utility Function

Consider the simple CRRA specification in Section 2.2. This is an EUT model, with a CRRA utility function, and no stochastic error specification.
The following Stata program defines the model, in this case using the lottery choices of Harrison and Rutström (2005), which are a replication of the experimental tasks of Hey and Orme (1994):

* define Original Recipe EUT with CRRA and no errors
program define ML_eut0
        args lnf r
        tempvar prob0l prob1l prob2l prob3l prob0r prob1r prob2r prob3r y0 y1 y2 y3
        tempvar euL euR euDiff euRatio tmp lnf_eut lnf_pt p1 p2 f1 f2
        quietly {

                * construct likelihood for EUT
                generate double `prob0l' = $ML_y2
                generate double `prob1l' = $ML_y3
                generate double `prob2l' = $ML_y4
                generate double `prob3l' = $ML_y5

                generate double `prob0r' = $ML_y6
                generate double `prob1r' = $ML_y7
                generate double `prob2r' = $ML_y8
                generate double `prob3r' = $ML_y9

                generate double `y0' = ($ML_y14+$ML_y10)^`r'
                generate double `y1' = ($ML_y14+$ML_y11)^`r'
                generate double `y2' = ($ML_y14+$ML_y12)^`r'
                generate double `y3' = ($ML_y14+$ML_y13)^`r'

                gen double `euL' = (`prob0l'*`y0')+(`prob1l'*`y1')+(`prob2l'*`y2')+(`prob3l'*`y3')
                gen double `euR' = (`prob0r'*`y0')+(`prob1r'*`y1')+(`prob2r'*`y2')+(`prob3r'*`y3')
                generate double `euDiff' = `euR' - `euL'
                replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
                replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
        }
end
This program makes more sense when one sees the command line invoking it, and supplying it with values for all variables. The simplest case is where there are no explanatory variables for the CRRA coefficient (we cover those below):

ml model lf ML_eut0 (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = ) if Choices~=., cluster(id) technique(nr) maximize
The ‘‘ml model’’ part invokes the Stata ML model specification routine, which essentially reads in the ML_eut0 program defined above and makes sure that it does not violate any syntax rules. The ‘‘lf’’ part of ‘‘lf ML_eut0’’ tells this routine that this is a particular type of likelihood specification (specifically, that the routine ML_eut0 does not calculate analytical derivatives, so those must be calculated numerically). The part in brackets defines the equation for the CRRA coefficient r. The ‘‘r:’’ part just labels this equation, for output display purposes and to help reference initial values if they are specified for recalcitrant models. There is no need for the ‘‘r:’’ here to match the ‘‘r’’ inside the ML_eut0 program; we could have referred to
"rEUT:" in the "ml model" command. We use the same "r" to help see the connection, but it is not essential. The "Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake" part tells the program what observed values and data to use. This allows one to pass parameter values as well as data to the likelihood evaluator defined in ML_eut0. Each item in this list translates into a $ML_y* variable referenced in the ML_eut0 program, where * denotes the order in which it appears in this list. Thus, the data in variable Choices, which consists of 0's and 1's for choices (and a dot, to signify "missing"), is passed to the ML_eut0 program as variable $ML_y1. Variable P0left, which holds the probabilities of the first prize of the lottery presented to subjects on the left of their screen, is passed as $ML_y2, and so on. Finally, variable stake, holding the values of the initial endowments provided to subjects, gets passed as variable $ML_y14. It is good programming practice to then define these in some less cryptic manner, as we do just after the "quietly" line in ML_eut0. This does not significantly slow down execution, and helps avoid cryptic code. There is no error if some variable that is passed to ML_eut0 is not referenced in ML_eut0. Once the data is passed to ML_eut0 the likelihood function can be evaluated. By default, it assumes a constant term, so when we have "= )" in the above command line, this is saying that there are no other explanatory variables. We add some below, but for now this model is just assuming that one CRRA coefficient characterizes all choices by all subjects. That is, it assumes that everyone has the same risk preference. We restrict the data that is passed to only include strict preferences, hence the "if Choices~=." part at the end of the command line. The response of indifference was allowed in this experiment, and we code it as a "missing" value.
Thus, the estimation only applies to the sub-sample of strict preferences. One could modify the likelihood function to handle indifference. Returning to the ML_eut0 program, the "args" line defines some arguments for this program. When it is called, by the default Newton-Raphson optimization routine within Stata, it accepts arguments in the "r" array and returns a value for the log-likelihood in the "lnf" scalar. In this case, "r" is the vector of coefficient values being evaluated. The "tempvar" lines create temporary variables for use in the program. These are temporary in the sense that they are only local to this program, and hence can be the same as variables in the main calling program. Once defined they are referred to within the ML_eut0 program by adding the funny
left single-quote mark ` and the regular right single-quote mark '. Thus temporary variable euL, to hold the expected utility of the left lottery, is referred to as `euL' in the program.111 The "quietly" line defines a block of code that is to be processed without the display of messages. This avoids needless display of warning messages, such as when some evaluation returns a missing value. Errors are not skipped, just display messages.112 The remaining lines should make sense to any economist from the comment statements. The program simply builds up the expected utility of each lottery, using the CRRA specification for the utility of the prizes. Then it uses the probit index function to define the likelihood values. The actual responses, stored in variable Choices (which is internal variable $ML_y1), are used at the very end to define which side of the probit index function this choice happens to be. The logit index specification is just as easy to code up: you replace "normal" with "invlogit" and you are done! The most important feature of this specification is that one can "build up" the latent index with as many programming lines as needed. Thus, as illustrated below, it is an easy matter to write out more detailed models, such as required for estimation of PT specifications or mixture models. The "cluster(id)" command at the end tells Stata to treat the residuals from the same person as potentially correlated. It then corrects for this fact when calculating standard errors of estimates. Invoking the above command line, with the "maximize" option at the end to tell Stata to actually proceed with the optimization, generates this output:

initial:       log pseudolikelihood = -8155.5697
alternative:   log pseudolikelihood = -7980.4161
rescale:       log pseudolikelihood = -7980.4161
Iteration 0:   log pseudolikelihood = -7980.4161  (not concave)
Iteration 1:   log pseudolikelihood = -7692.4056
Iteration 2:   log pseudolikelihood = -7689.4848
Iteration 3:   log pseudolikelihood = -7689.4544
Iteration 4:   log pseudolikelihood = -7689.4544

. ml display

Log pseudolikelihood = -7689.4544               Number of obs   =      11766
                                                Wald chi2(0)    =          .
                                                Prob > chi2     =          .

                           (Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     (95% Conf. Interval)
-------------+----------------------------------------------------------------
       _cons |   .7531553   .0204812    36.77   0.000     .7130128    .7932977
------------------------------------------------------------------------------
So we see that the optimization routine converged nicely, with no error messages or warnings about numerical irregularities at the end. The interim warning message is nothing to worry about: only worry if there is an error message of any kind at the end of the iterations. (Of course, lots of error messages, particularly about derivatives being hard to calculate, usually flag convergence problems.) The "ml display" command allows us to view the standard output, and is given after the "ml model" command. For our purposes the critical thing is the "_cons" line, which displays the ML estimate and its standard error. Thus, we have estimated that r̂ = 0.753. This is the ML estimate of the CRRA coefficient in this case, and it indicates that these subjects are risk averse. Before your program runs nicely it may have some syntax errors. The easiest way to check these is to issue the command

ml model lf ML_eut0 (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = )
which is the same as before except that it drops off the material after the comma, which tells Stata to maximize the likelihood and how to handle the errors. This command simply tells Stata to read in the model and be ready to process it, but not to begin processing it. You would then issue the command

ml check
and Stata will provide some diagnostics. These are extremely informative if you use them, particularly for syntax errors. The power of this approach becomes evident when we allow the CRRA coefficient to be determined by individual or treatment characteristics. To illustrate, consider the effect of allowing the CRRA coefficient to differ depending on the individual demographic characteristics of the subject, as explained in the text. Here is a list and sample statistics:

    Variable |  Obs        Mean    Std. Dev.    Min   Max
-------------+--------------------------------------------
      Female |  215    .4790698    .5007276       0     1
       Black |  215    .1069767     .309805       0     1
    Hispanic |  215    .1348837    .3423965       0     1
         Age |  215    19.95814    3.495406      17    47
    Business |  215    .4511628    .4987705       0     1
      GPAlow |  215    .4604651    .4995978       0     1
The earlier command line is changed slightly at the "= )" part to read "= Female Black Hispanic Age Business GPAlow)", and no changes are made to ML_eut0. The results are as follows:

ml model lf ML_eut0 (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = Female Black Hispanic Age Business GPAlow), cluster(id) maximize

. ml display

Log pseudolikelihood = -7557.2809               Number of obs   =      11766
                                                Wald chi2(6)    =      27.48
                                                Prob > chi2     =     0.0001

                           (Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     (95% Conf. Interval)
-------------+----------------------------------------------------------------
      Female |  -.0904283   .0425979    -2.12   0.034    -.1739187   -.0069379
       Black |  -.1283174   .0765071    -1.68   0.094    -.2782686    .0216339
    Hispanic |  -.2549614   .1149935    -2.22   0.027    -.4803446   -.0295783
         Age |   .0218001   .0052261     4.17   0.000     .0115571    .0320432
    Business |  -.0071756   .0401536    -0.18   0.858    -.0858753     .071524
      GPAlow |   .0131213   .0394622     0.33   0.740    -.0642233    .0904659
       _cons |    .393472   .1114147     3.53   0.000     .1751032    .6118408
------------------------------------------------------------------------------
So we see that the CRRA coefficient changes from r = 0.753 to r = 0.393 - 0.090·Female - 0.128·Black - … and so on. We can quickly find out what the average value of r is when we evaluate this model using the actual characteristics of each subject and the estimated coefficients:

. predictnl r=xb(r)
. summ r if task==1

    Variable |  Obs        Mean    Std. Dev.        Min        Max
-------------+-----------------------------------------------------
           r |  215    .7399284    .1275521    .4333093   1.320475
So the average value is 0.740, extremely close to the earlier estimate of 0.753. Thus, all we have done is provide a richer characterization of risk attitudes around roughly the same mean.
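The arithmetic behind Stata's predictnl r = xb(r) is just a linear index evaluated at each subject's characteristics. The Python fragment below is only an illustrative sketch of that calculation, using the coefficients reported in the output above; the two subjects are hypothetical, not from the data.

```python
# Sketch of what "predictnl r = xb(r)" computes: the CRRA coefficient
# implied for a subject by the estimated linear index. The coefficient
# values come from the regression output above; the subjects are
# hypothetical illustrations.

coef = {
    "_cons": 0.393472, "Female": -0.0904283, "Black": -0.1283174,
    "Hispanic": -0.2549614, "Age": 0.0218001, "Business": -0.0071756,
    "GPAlow": 0.0131213,
}

def predicted_r(subject):
    """r = constant + sum over characteristics of coefficient * value."""
    return coef["_cons"] + sum(coef[k] * v for k, v in subject.items())

# Two hypothetical subjects:
s1 = {"Female": 1, "Black": 0, "Hispanic": 0, "Age": 20, "Business": 1, "GPAlow": 0}
s2 = {"Female": 0, "Black": 0, "Hispanic": 0, "Age": 20, "Business": 0, "GPAlow": 1}

r1, r2 = predicted_r(s1), predicted_r(s2)
```

Both predicted values fall near the sample mean of 0.74, consistent with the summary above.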
F2. Loss Aversion and Probability Weighting

It is a simple matter to specify different economic models. Two of the major structural features of PT are probability weighting and loss aversion. The code below implements each of these specifications, using the parametric forms of Tversky and Kahneman (1992). For simplicity we assume that the decision weights are the probability weights, and do not implement the rank-dependent transformation of probability weights into
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
decision weights. Thus, the model is strictly an implementation of OPT from Kahneman and Tversky (1979). The extension to rank-dependent decision weights is messy from a programming perspective, and nothing is gained pedagogically here by showing it; Harrison (2006c) shows the mess in full. Note how much of this code is similar to ML_eut0, and the differences:

* define OPT specification with no errors
program define MLkt0
        args lnf alpha beta lambda gamma
        tempvar prob0l prob1l prob2l prob3l prob0r prob1r prob2r prob3r y0 y1 y2 y3
        tempvar euL euR euDiff euRatio tmp
        quietly {
                gen double `tmp' = (($ML_y2^`gamma')+($ML_y3^`gamma')+($ML_y4^`gamma')+($ML_y5^`gamma'))
                replace `tmp' = `tmp'^(1/`gamma')
                generate double `prob0l' = ($ML_y2^`gamma')/`tmp'
                generate double `prob1l' = ($ML_y3^`gamma')/`tmp'
                generate double `prob2l' = ($ML_y4^`gamma')/`tmp'
                generate double `prob3l' = ($ML_y5^`gamma')/`tmp'
                replace `tmp' = (($ML_y6^`gamma')+($ML_y7^`gamma')+($ML_y8^`gamma')+($ML_y9^`gamma'))
                replace `tmp' = `tmp'^(1/`gamma')
                generate double `prob0r' = ($ML_y6^`gamma')/`tmp'
                generate double `prob1r' = ($ML_y7^`gamma')/`tmp'
                generate double `prob2r' = ($ML_y8^`gamma')/`tmp'
                generate double `prob3r' = ($ML_y9^`gamma')/`tmp'
                generate double `y0' = .
                replace `y0' =  ( $ML_y10)^(`alpha') if $ML_y10>=0
                replace `y0' = -`lambda'*(-$ML_y10)^(`beta') if $ML_y10<0
                generate double `y1' = .
                replace `y1' =  ( $ML_y11)^(`alpha') if $ML_y11>=0
                replace `y1' = -`lambda'*(-$ML_y11)^(`beta') if $ML_y11<0
                generate double `y2' = .
                replace `y2' =  ( $ML_y12)^(`alpha') if $ML_y12>=0
                replace `y2' = -`lambda'*(-$ML_y12)^(`beta') if $ML_y12<0
                generate double `y3' = .
                replace `y3' =  ( $ML_y13)^(`alpha') if $ML_y13>=0
                replace `y3' = -`lambda'*(-$ML_y13)^(`beta') if $ML_y13<0
                gen double `euL'=(`prob0l'*`y0')+(`prob1l'*`y1')+(`prob2l'*`y2')+(`prob3l'*`y3')
                gen double `euR'=(`prob0r'*`y0')+(`prob1r'*`y1')+(`prob2r'*`y2')+(`prob3r'*`y3')
                generate double `euDiff' = `euR' - `euL'
                replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
                replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
        }
end
The first thing to notice is that the initial line ‘‘args lnf alpha beta lambda gamma’’ has more parameters than with ML_eut0. The ‘‘lnf ’’ parameter is the same, since it is the one used to return the value of the likelihood function for trial values of the other parameters. But we now have four parameters instead of just one.
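The two Tversky–Kahneman (1992) functional forms that MLkt0 evaluates can be sketched outside Stata. The Python fragment below is an illustrative translation, not part of the original appendix; the parameter names mirror the Stata program.

```python
# Sketch of the Tversky-Kahneman (1992) forms used in MLkt0.
# alpha, beta: utility curvature over gains and losses; lambda_: loss
# aversion; gamma: probability weighting. Names mirror the Stata code.

def tk_value(m, alpha, beta, lambda_):
    """Power value function: m^alpha over gains, and over losses the
    (negated) power value scaled by the loss-aversion parameter."""
    return m ** alpha if m >= 0 else -lambda_ * (-m) ** beta

def tk_weights(probs, gamma):
    """Probability weights as coded in MLkt0: each p_i^gamma divided by
    (sum over j of p_j^gamma)^(1/gamma)."""
    denom = sum(p ** gamma for p in probs) ** (1.0 / gamma)
    return [p ** gamma / denom for p in probs]

# With gamma = 1 the weights collapse to the raw probabilities (the EUT case):
w = tk_weights([0.25, 0.75], 1.0)
```

With a loss-aversion parameter above 1, a loss of $10 hurts more than a gain of $10 helps, which is the sign pattern the estimates below are testing.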
When we estimate this model we get this output:

. ml model lf MLkt0 (alpha: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 = ) (beta: ) (lambda: ) (gamma: ), cluster(id) maximize
. ml display

Log pseudolikelihood = -7455.1001         Number of obs = 11766
                                          Wald chi2(0)  =     .
                                          Prob > chi2   =     .

(Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alpha        |
       _cons |   .6551177   .0275903    23.74   0.000     .6010417    .7091938
-------------+----------------------------------------------------------------
beta         |
       _cons |   .8276235   .0541717    15.28   0.000      .721449     .933798
-------------+----------------------------------------------------------------
lambda       |
       _cons |   .7322427   .1163792     6.29   0.000     .5041436    .9603417
-------------+----------------------------------------------------------------
gamma        |
       _cons |    .938848   .0339912    27.62   0.000     .8722265     1.00547
------------------------------------------------------------------------------
So we get estimates for all four parameters. Stata uses the variable "_cons" for the constant, and since there are no characteristics here, that is the only variable to be estimated. We could also add demographic or other characteristics to any or all of these four parameters. We see that the utility curvature coefficients α and β are similar, and indicate concavity in the gain domain and convexity in the loss domain. The loss aversion parameter λ is less than 1, which is a blow for PT since "loss aversion" calls for λ > 1. And γ is very close to 1, which is the value that implies that w(p) = p for all p, the EUT case. We can readily test some of these hypotheses:

. test [alpha]_cons=[beta]_cons
 ( 1)  [alpha]_cons - [beta]_cons = 0
           chi2(  1) =    8.59
         Prob > chi2 =    0.0034

. test [lambda]_cons=1
 ( 1)  [lambda]_cons = 1
           chi2(  1) =    5.29
         Prob > chi2 =    0.0214

. test [gamma]_cons=1
 ( 1)  [gamma]_cons = 1
           chi2(  1) =    3.24
         Prob > chi2 =    0.0720
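Each of these tests is a Wald test: for a single linear hypothesis, the statistic is the squared ratio of the deviation from the hypothesized value to the robust standard error, distributed chi-squared with one degree of freedom. A Python sketch, using the λ estimate and standard error from the output above, reproduces Stata's numbers:

```python
# Sketch of the Wald test behind "test [lambda]_cons=1": the statistic is
# ((estimate - 1)/std.err.)^2, chi-squared with 1 degree of freedom.
# The estimate and robust standard error are taken from the output above.
import math

est, se = 0.7322427, 0.1163792
wald = ((est - 1.0) / se) ** 2
# For a chi-squared(1) statistic, the p-value equals erfc(sqrt(stat/2))
p = math.erfc(math.sqrt(wald / 2.0))
```

The statistic comes out near 5.29 with a p-value near 0.0214, matching the Stata output.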
So we see that PT is not doing so well here in relation to the a priori beliefs it comes packaged with, and that the deviation in λ is indeed statistically significant. But γ is less than 1, so things are not so bad in that respect.

F3. Adding Stochastic Errors

In the text the Luce and Fechner "stochastic error stories" were explained. To add the Luce specification, popularized by HL, we return to base camp, the ML_eut0 program, and simply make two changes. We augment the arguments by one parameter, μ, to be estimated:

args lnf r mu
and then we revise the line defining the EU difference from

generate double `euDiff' = `euR' - `euL'

to

generate double `euDiff' = (`euR'^(1/`mu'))/((`euR'^(1/`mu')) + (`euL'^(1/`mu')))
So this changes the latent preference index from being the difference to the ratio. But it also adds the 1/μ exponent to each expected utility. Apart from this change in the program, there is nothing extra that is needed. You just add one more parameter in the "ml model" stage, as we did for the PT extensions. In fact, HL cleverly exploit the fact that the latent preference index defined above is already in the form of a cumulative distribution function, since it ranges from 0 to 1, and is equal to 1/2 when the subject is indifferent between the two lotteries. Thus, instead of defining the likelihood contribution by

replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0

we can use

replace `lnf' = ln(`euDiff') if $ML_y1==1
replace `lnf' = ln(1-`euDiff') if $ML_y1==0
instead.
The Fechner specification popularized by Hey and Orme (1994) implies a simple change to ML_eut0. Again we add an error term "noise" to the arguments of the program, as above, and now we have the latent index

generate double `euDiff' = (`euR' - `euL')/`noise'
instead of the original

generate double `euDiff' = `euR' - `euL'
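The two error stories just described can be sketched numerically. The Python fragment below is only an illustration of the formulas, computed outside Stata with made-up expected utilities:

```python
# Sketch of the two stochastic error specifications described above.
import math

def luce_prob_right(euL, euR, mu):
    """Luce: the ratio euR^(1/mu)/(euR^(1/mu) + euL^(1/mu)) already lies
    in [0, 1] and so serves directly as the choice probability."""
    l, r = euL ** (1.0 / mu), euR ** (1.0 / mu)
    return r / (r + l)

def fechner_prob_right(euL, euR, noise):
    """Fechner: the EU difference is scaled by a noise parameter and then
    passed through the standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(((euR - euL) / noise) / math.sqrt(2.0)))

# Indifference (equal expected utilities) gives probability 1/2 in both:
p_luce = luce_prob_right(2.0, 2.0, mu=0.5)
p_fech = fechner_prob_right(2.0, 2.0, noise=0.763)
# For a fixed EU advantage, more noise pushes the choice toward a coin flip:
p_sharp = fechner_prob_right(1.0, 1.2, noise=0.1)
p_dull = fechner_prob_right(1.0, 1.2, noise=1.0)
```

In both specifications, shrinking the error parameter toward zero recovers deterministic EU maximization.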
Here are the results:

. ml model lf ML_eut (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = ) (noise: ), cluster(id) maximize
. ml display

Log pseudolikelihood = -7679.9527         Number of obs = 11766
                                          Wald chi2(0)  =     .
                                          Prob > chi2   =     .

(Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
r            |
       _cons |   .7119379   .0303941    23.42   0.000     .6523666    .7715092
-------------+----------------------------------------------------------------
noise        |
       _cons |   .7628203    .080064     9.53   0.000     .6058977    .9197429
------------------------------------------------------------------------------
So the CRRA coefficient declines very slightly, and the noise term (the standard deviation of the normal error) is estimated to be 0.763.

F4. Non-Parametric Estimation of the EUT Model

It is possible to estimate the EUT model without assuming a functional form for utility, following Hey and Orme (1994). The likelihood function is evaluated as follows:

* define Original Recipe EUT with Fechner errors: non-parametric
program define ML_eut0_np
        args lnf u5 u10 noise
        tempvar prob0l prob1l prob2l prob3l prob0r prob1r prob2r prob3r y0 y1 y2 y3
        tempvar euL euR euDiff euRatio tmp lnf_eut lnf_pt p1 p2 f1 f2 u0 u15
        quietly {
                * construct likelihood for EUT
                generate double `prob0l' = $ML_y2
                generate double `prob1l' = $ML_y3
                generate double `prob2l' = $ML_y4
                generate double `prob3l' = $ML_y5
                generate double `prob0r' = $ML_y6
                generate double `prob1r' = $ML_y7
                generate double `prob2r' = $ML_y8
                generate double `prob3r' = $ML_y9
                generate double `u0'  = 0
                generate double `u15' = 1
                generate double `y0' = `u0'
                generate double `y1' = `u5'
                generate double `y2' = `u10'
                generate double `y3' = `u15'
                gen double `euL'=(`prob0l'*`y0')+(`prob1l'*`y1')+(`prob2l'*`y2')+(`prob3l'*`y3')
                gen double `euR'=(`prob0r'*`y0')+(`prob1r'*`y1')+(`prob2r'*`y2')+(`prob3r'*`y3')
                generate double `euDiff' = (`euR' - `euL')/`noise'
                replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
                replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
        }
end
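The logic of one likelihood evaluation in this non-parametric program can be sketched in Python. This is only an illustration of the formulas, not the Stata code: the lowest and highest prizes get utilities normalized to 0 and 1, the interior utilities u5 and u10 are free parameters, and the choice indicator is taken (as in the Stata code) to equal 1 when the left lottery is chosen.

```python
# Sketch of one row's likelihood contribution in the non-parametric EUT
# model with Fechner errors: utilities of the four prizes are
# (0, u5, u10, 1), with u5 and u10 the parameters to be estimated.
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_likelihood_row(choice, probs_l, probs_r, u5, u10, noise):
    """Log-likelihood of one observed choice (1 = left lottery chosen)."""
    u = [0.0, u5, u10, 1.0]                 # normalized prize utilities
    euL = sum(p * v for p, v in zip(probs_l, u))
    euR = sum(p * v for p, v in zip(probs_r, u))
    index = (euR - euL) / noise             # Fechner latent index
    return math.log(normal_cdf(-index) if choice == 1 else normal_cdf(index))

# With identical lotteries on both sides the index is zero, so either
# choice has probability 1/2:
ll_indiff = log_likelihood_row(1, [0.25]*4, [0.25]*4, 0.5, 0.8, 0.1)
```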
and estimates can be obtained in the usual manner. We include demographics for each parameter, and introduce the notion of a "global" macro in Stata. Instead of typing out the list of demographic variables, one gives the command

global demog "Female Black Hispanic Age Business GPAlow"

and then simply refers to $demog. Every time Stata sees "$demog" it simply substitutes the string "Female Black Hispanic Age Business GPAlow" without the quotes. Hence, we have the following results:

. ml model lf ML_eut0_np (u5: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = $demog ) (u10: $demog ) (noise: ) if expid=="ucf0", cluster(id) technique(dfp) maximize difficult
. ml display

Log pseudolikelihood = -2321.8966         Number of obs = 3736
                                          Wald chi2(6)  = 18.19
                                          Prob > chi2   = 0.0058
(Std. Err. adjusted for 63 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u5           |
      Female |    .096698   .0453102     2.13   0.033     .0078916    .1855044
       Black |   .0209427   .0808325     0.26   0.796    -.1374861    .1793715
    Hispanic |   .0655292   .0784451     0.84   0.404    -.0882203    .2192787
         Age |  -.0270362   .0093295    -2.90   0.004    -.0453217   -.0087508
    Business |   .0234831   .0493705     0.48   0.634    -.0732813    .1202475
      GPAlow |  -.0101648   .0480595    -0.21   0.832    -.1043597    .0840301
       _cons |   1.065798   .1853812     5.75   0.000     .7024573    1.429138
-------------+----------------------------------------------------------------
u10          |
      Female |   .0336875   .0287811     1.17   0.242    -.0227224    .0900973
       Black |   .0204992   .0557963     0.37   0.713    -.0888596    .1298579
    Hispanic |   .0627681   .0413216     1.52   0.129    -.0182209     .143757
         Age |  -.0185383   .0072704    -2.55   0.011     -.032788   -.0042886
    Business |   .0172999   .0308531     0.56   0.575    -.0431711    .0777708
      GPAlow |  -.0110738   .0304819    -0.36   0.716    -.0708171    .0486696
       _cons |   1.131618   .1400619     8.08   0.000     .8571015    1.406134
-------------+----------------------------------------------------------------
noise        |
       _cons |   .0952326   .0079348    12.00   0.000     .0796807    .1107844
------------------------------------------------------------------------------
It is then possible to predict the values of the two estimated utilities, which will vary with the characteristics of each subject, and plot them. Fig. 10 in the text shows the distributions of estimated utility values.
F5. Replication of Holt and Laury (2002)

Finally, it may be useful to show an implementation in Stata of the ML problem solved by HL:

program define HLep1
        args lnf r alpha mu
        tempvar theta lnfj prob1 prob2 scale euSAFE euRISKY euRatio mA1 mA2 mB1 mB2 yA1 yA2 yB1 yB2 wp1 wp2
        quietly {
                /* initializations */
                generate double `prob1' = $ML_y2/10
                generate double `prob2' = 1 - `prob1'
                generate double `scale' = $ML_y7
                /* add the endowments to the prizes */
                generate double `mA1' = $ML_y8 + $ML_y3
                generate double `mA2' = $ML_y8 + $ML_y4
                generate double `mB1' = $ML_y8 + $ML_y5
                generate double `mB2' = $ML_y8 + $ML_y6
                /* utility of prize m */
                generate double `yA1' = (1-exp(-`alpha'*((`scale'*`mA1')^(1-`r'))))/`alpha'
                generate double `yA2' = (1-exp(-`alpha'*((`scale'*`mA2')^(1-`r'))))/`alpha'
                generate double `yB1' = (1-exp(-`alpha'*((`scale'*`mB1')^(1-`r'))))/`alpha'
                generate double `yB2' = (1-exp(-`alpha'*((`scale'*`mB2')^(1-`r'))))/`alpha'
                /* classic EUT probability weighting function */
                generate double `wp1' = `prob1'
                generate double `wp2' = `prob2'
                /* expected utility */
                generate double `euSAFE'  = (`wp1'*`yA1')+(`wp2'*`yB1')
                generate double `euRISKY' = (`wp1'*`yA2')+(`wp2'*`yB2')
                /* EU ratio */
                generate double `euRatio' = (`euSAFE'^(1/`mu'))/((`euSAFE'^(1/`mu'))+(`euRISKY'^(1/`mu')))
                /* contribution to likelihood */
                replace `lnf' = ln(`euRatio') if $ML_y1==0
                replace `lnf' = ln(1-`euRatio') if $ML_y1==1
        }
end
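The expo-power (EP) utility evaluated inside HLep1 can be sketched numerically. This Python fragment only illustrates the functional form as it appears in the code above; the parameter values are illustrative, not estimates from the text.

```python
# Sketch of the expo-power (EP) utility used in HLep1:
#   u(m) = (1 - exp(-alpha * m^(1-r))) / alpha
# r and alpha jointly govern relative and absolute risk aversion.
import math

def ep_utility(m, r, alpha):
    """Expo-power utility of a money prize m > 0."""
    return (1.0 - math.exp(-alpha * m ** (1.0 - r))) / alpha

# Illustrative (not estimated) parameter values:
u_low = ep_utility(2.0, 0.27, 0.03)
u_high = ep_utility(4.0, 0.27, 0.03)
```

As required of a utility function, the value is increasing in the prize for these parameter values.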
The general structure of this routine should be easy to see. The routine is called with this command:

ml model lf HLep1 (r: Choices problem m1a m2a m1b m2b scale wealth = ) (alpha: ) (mu: )

where variable "Choices" is a binary variable defining the subject's choices of the safe or risky lottery; variable "problem" is a counter from 1 to 10 in the usual implementation of the design; the next four variables define the fixed prizes; variable "scale" indicates the multiples of the basic payoffs used (e.g., 1, 10, 20, 50, or 90); and variable "wealth" measures initial endowments prior to the risk aversion task (typically $0). Three parameters are estimated, as defined in
the EP specification discussed in the text. The only new steps are the definition of the utility of the prize, using the EP specification instead of the CRRA specification, and the definition of the index of the likelihood. Use of this procedure with the original HL data replicates the estimates in Holt and Laury (2002, p. 1653) exactly. The advantage of this formulation is that one can readily extend it to include covariates for any of the parameters. One can also correct for clustering of observations by the same subject. And extensions to consider probability weighting are trivial to add.

F6. Extensions

There are many possible extensions of the basic programming elements considered here. Harrison (2006c) illustrates the following: modeling rank-dependent decision weights for the RDU and RDEV structural models; modeling rank-dependent decision weights and sign-dependent utility for the CPT structural model; the imposition of constraints on parameters to ensure non-negativity (e.g., λ > 1 or μ > 0) or finite bounds (e.g., 0 < r < 1); the specification of finite mixture models; the coding of non-nested hypothesis tests; and maximum simulated likelihood, in which one or more parameters are treated as random coefficients to reflect unobserved individual heterogeneity (e.g., Train (2003)). In each case template code is provided along with data and illustrative estimates.
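One standard way to impose the parameter constraints just mentioned is to estimate an unconstrained parameter and transform it inside the likelihood evaluator. The Python sketch below illustrates two such transformations; the names nu and rho are hypothetical, not from the text.

```python
# Sketch of constraint-by-transformation: estimate an unconstrained real
# parameter, then map it into the constrained region inside the likelihood.
# nu and rho are illustrative names for the unconstrained parameters.
import math

def positive(nu):
    """exp() maps any real nu to a strictly positive value (e.g., mu > 0)."""
    return math.exp(nu)

def unit_interval(rho):
    """The logistic map sends any real rho into (0, 1) (e.g., 0 < r < 1)."""
    return 1.0 / (1.0 + math.exp(-rho))

def above_one(nu):
    """1 + exp() enforces a bound such as lambda > 1."""
    return 1.0 + math.exp(nu)
```

The optimizer then searches over the unconstrained parameters, and the delta method (or predictnl in Stata) recovers standard errors for the transformed parameters.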
STOCHASTIC MODELS FOR BINARY DISCRETE CHOICE UNDER RISK: A CRITICAL PRIMER AND ECONOMETRIC COMPARISON

Nathaniel T. Wilcox

ABSTRACT

Choice under risk has a large stochastic (unpredictable) component. This chapter examines five stochastic models for binary discrete choice under risk and how they combine with "structural" theories of choice under risk. Stochastic models are substantive theoretical hypotheses: they are frequently testable in and of themselves, and they also serve as identifying restrictions for hypothesis tests, estimation and prediction. Econometric comparisons suggest that for the purpose of prediction (as opposed to explanation), choices of stochastic models may be far more consequential than choices of structures such as expected utility or rank-dependent utility.
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 197–292
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00004-5

Lotteries are the alternatives of many theories of choice under risk. For instance, getting $10 if a fair coin toss lands heads, and zero if tails, is a lottery – call it "Toss." Getting $24 if a fair deck of cards cuts to a spade, and zero if other suits, is also a lottery – call it "Cut." Call the set {Toss, Cut} a pair, specifically pair 1 here. In binary lottery choice experiments subjects view many such pairs, and select just one lottery from each pair. In pair 1, three possible money outcomes are available across the lotteries in the pair. Call this the pair's outcome context, or more simply its context, and write it as the vector c1 = (0, 10, 24). Knowing pair 1's context, we may express Toss and Cut as outcome probability vectors on that context. For instance, Toss is the vector (1/2, 1/2, 0), and Cut is the vector (3/4, 0, 1/4), both on context c1.

Even under unchanging conditions, a subject might not choose Toss or Cut every time she encountered pair 1. On some occasions, she might be swayed by the somewhat higher expected value of Cut ($6 versus $5 for Toss), while on others she might opt for the relative safety of Toss (a 1/2 chance of some positive payment versus just a 1/4 chance with Cut). That is, choice under risk could be less than perfectly predictable, or stochastic, and in fact an overwhelming canon of experimental evidence suggests this is so. Beginning with Mosteller and Nogee (1951), binary lottery choice experiments with repeated trials of pairs reveal substantial choice switching by the same subject between trials. In some cases, the repeated trials span days (e.g., Tversky, 1969; Hey & Orme, 1994; Hey, 2001); in such cases, one could argue that conditions may have changed between trials. Yet substantial switching occurs even between trials separated by a couple of minutes or less, and with no intervening change in wealth, portfolios of background risks, or any other obviously decision-relevant variable (Camerer, 1989; Starmer & Sugden, 1989; Ballinger & Wilcox, 1997; Loomes & Sugden, 1998).

Many theories of risky choice come to us as deterministic theories. These theories take as given a single fixed relational system – a collection of relational statements about pairs of lotteries, such as "Toss is weakly preferred to Cut by subject n", written Toss ≿n Cut.
One empirical interpretation of weak preference statements is that they are records of outcomes of particular choice trials, all observed under unchanging conditions. Under this interpretation, Toss ≿n Cut formally means we observed subject n choosing Toss from pair 1 on some trial. Indifference is then defined as observing both Cut ≿n Toss and Toss ≿n Cut on different trials (under unchanging conditions). Under this interpretation, all switching across trials (under unchanging conditions) is called one thing: indifference. This interpretation of relational systems implies an implausible amount of indifference, given the ubiquity of choice switching in the experimental canon with repeated trials. It also renders big differences in behavior moot and small differences in behavior crucial. Suppose Anne chose Toss
in 19 of 20 trials, and Bob chose Toss in 1 of 20 trials: Do we want an interpretive straitjacket that simply describes Anne and Bob identically, as both indifferent between Toss and Cut? Why would that rather large difference in behavior be uninteresting, while the rather trivial empirical difference between Anne and Charlie, who chose Toss in all 20 trials, be considered crucial?1 Stochastic differences between subjects are frequently much more empirically interesting than this severe classification (implied by this particular view of a relational system) allows.

Fortunately, alternative views of relational systems offer escapes from this trouble. One may instead give relational statements a probabilistic interpretation. In this view, choice probabilities are the underlying empirical reality, and relational statements are derived from them. Let P^n_1 denote the probability that subject n chooses Toss from pair 1 under given conditions. In this view, dating at least to Edwards (1954), Toss ≿n Cut means that P^n_1 ≥ 0.5, and indifference means that P^n_1 = 0.5. The purpose of a structural theory or structure, such as expected utility, is then primarily to represent the preference directions so derived from underlying choice probabilities. In this view, the structure will play a strong supporting role in determining choice probabilities, but does not do this alone: Extra assumptions concerning random parts of behavior (for instance, random fluctuations in comparison processes from trial to trial) are the crucial missing element. An entirely different view is that subject n has a set of deterministic relational systems ≿n1, ≿n2, …, ≿nK and randomly draws one on each trial to determine her choice. Both of these approaches are stochastic models: They add randomness to received deterministic theory in some formal way to produce choice probabilities from the deterministic preference directions of a single relational system or a set of relational systems.
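The probabilistic interpretation can be illustrated with a small sketch: preference statements are derived from an estimated choice probability, here proxied by the observed choice frequency over repeated trials. This Python fragment is purely illustrative (it uses the strict-preference cutoffs for exposition; the weak relation corresponds to P ≥ 0.5).

```python
# Sketch of the probabilistic interpretation of relational statements:
# derive a preference statement from the frequency of choosing Toss
# over repeated trials of the same pair.

def preference(toss_choices, trials):
    """Return a relational statement derived from the estimated
    probability of choosing Toss."""
    p = toss_choices / trials
    if p > 0.5:
        return "Toss strictly preferred"
    if p < 0.5:
        return "Cut strictly preferred"
    return "indifferent"

anne = preference(19, 20)   # chose Toss in 19 of 20 trials
bob = preference(1, 20)     # chose Toss in 1 of 20 trials
```

Unlike the deterministic interpretation, this view distinguishes Anne from Bob, and reserves "indifference" for choice frequencies near one half.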
The main reason I believe we must combine our decision theories with a stochastic model is that the deterministic view of these theories is so deeply troubled in light of all known evidence with repeated trials. However, even experiments without repeated trials invariably produce some violations of every theory, and few scholars would view a theory as falsified on the basis of one or a few observed violations of its predictions. Stochastic models give empirical discipline to this view by placing restrictions on violations, as Harless and Camerer (1994) note. Stochastic models also provide us with useful statistical information and instructions: The most powerful and informative statistical tests are invariably those where we have the most a priori information about the true data-generating process, and the true stochastic model is a crucial part of a data-generating process. Of course, this presumes that we know the true stochastic model.2 For choice under
risk, work on this matter dates back at least to Becker, DeGroot, and Marschak (1963a, 1963b), but there has been resurgent interest in it amongst experimental economists (Hey, 1995; Ballinger & Wilcox, 1997; Carbone, 1997; Loomes & Sugden, 1998; Loomes, Moffatt, & Sugden, 2002; Blavatskyy, 2007). Some of this evidence will be reviewed below.

Many experimentalists view assumptions about random parts of behavior as someone else's business. In this view: (1) proper randomization of subjects to treatments (or in the case of within-subjects designs, random assignment of treatment order to subjects) guarantees that random parts are uncorrelated with treatments, (2) large enough sample sizes and/or tasks per subject guarantee that random parts average out, and (3) appropriate nonparametric tests guarantee that specific features of random parts do not influence inference. However, all these truisms allow is unbiased inference about average or median treatment effects over a sample. Average treatment effects are quite interesting in many policy contexts. But I argue here that when it comes to evaluating theories of discrete choice under risk, where many interesting inferences depend crucially on stochastic assumptions, average treatment effects alone are relatively uninformative. This point goes back to Becker et al. (1963b), but has received much attention of late (Loomes & Sugden, 1995; Ballinger & Wilcox, 1997; Hey, 2005; Loomes, 2005).

Several principles guide my discussion. The first is that in both theory tests and estimation, stochastic models are identifying restrictions. In the case of theory tests, for instance, we will see that the well-known common ratio effect (e.g., Kahneman & Tversky, 1979) contradicts expected utility theory if some stochastic models are true, but not necessarily if others are true (Loomes & Sugden, 1995; Ballinger & Wilcox, 1997).
Thus, the inference that a common ratio effect contradicts expected utility depends implicitly on a stochastic modeling assumption; that assumption identifies the common ratio effect as a theory rejection. This example will be developed at length in Section 3, since it also illustrates how average treatment effects alone (i.e., without any consideration of stochastic models) may mislead us in inference. In estimation, inferences regarding patterns and kinds of risk aversion depend strongly on stochastic models. When some stochastic models are combined with a structure that contains constant (absolute or relative) risk aversion, they will imply a matching invariance of choice probabilities across contexts that differ by an additive or proportional shift of outcomes (called CARA- and CRRA-neutrality, respectively). But not all stochastic models have this property. Therefore, structural parameter estimates can yield constant (absolute or relative) risk aversion across contexts when estimated with one stochastic model, but increasing/decreasing risk aversion
across contexts when estimated with a different stochastic model. Again, the stochastic model is an important identifying restriction, here because it can change our conclusions about a subject's pattern of risk aversion across contexts as embodied in a structural parameter estimate.

The preceding example also illustrates a second principle: Stochastic models have different implications across contexts. This is particularly important when we estimate some parameters of a structure, intending to use the estimates to predict or explain behavior in a new context. At a very general level and in disparate ways, there has been a recent awakening to the importance of thinking about decisions across multiple contexts. The "calibration theorem" of Rabin (2000) and its broad generalization by Cox and Sadiraj (2006, 2008 – this volume) concerns the relationship between choices on many small contexts and a large context spanning all the smaller contexts. Holt and Laury's (2002) examination of the effect of a large proportional change in a context is directly related to the example in the previous paragraph. That stochastic models have quite different implications across contexts is, however, a relatively underappreciated point, though it has been understood by psychologists for quite some time (see e.g., citations in Busemeyer & Townsend, 1993).

The last principle is that stochastic models have very different implications about the empirical meaning of the deterministic relation "more risk averse" or MRA. Pratt (1964) originally developed a definition of what it means to say "Bob is more risk averse than Anne," or Bob ≻mra Anne, in deterministic expected utility theory. Wilcox (2007a) suggests a definition of "Bob is stochastically more risk averse than Anne," or Bob ≻smra Anne, and shows that under many common stochastic modeling assumptions Bob ≻mra Anne does not imply Bob ≻smra Anne. A new stochastic model, called contextual utility, allows one to say that Bob ≻mra Anne ⇒ Bob ≻smra Anne, and this model will be examined here along with the better-known alternatives.

Section 1 begins with some formal notation and definitions that facilitate later discussions, and introduces two particular structures that I use as examples throughout the chapter: standard expected utility and rank-dependent expected utility (RDEU) (Quiggin, 1982; Chew, 1983). It also defines several general properties of structures and stochastic models. Section 2 then introduces five stochastic models. Section 3 shows how average treatment effects can be uninformative or worse for matters of interest in decision under risk, using the well-known example of the common ratio effect, as discussed by Ballinger and Wilcox (1997) and Loomes (2005). Section 4 compares the stochastic models using the three principles
described above, from the viewpoint of the structural and stochastic properties discussed in Section 1. Section 5 provides an empirical comparison between combinations of the two structures and the five stochastic models using the well-known Hey and Orme (1994) data set. This empirical comparison will illustrate the use of random parameters estimation for representing heterogeneity in a sample, thus adding to the arsenal of econometric methods described by Harrison and Rutström (2008 – this volume). Section 5 also introduces a seldom-seen wrinkle by comparing the ability of models to predict "out-of-context." That is, I will compare how well particular combinations of stochastic model and structure predict choices on one context, after having been estimated using choices made on different contexts. This comparison will strongly suggest that when it comes to prediction, it is much more important to get the stochastic model right than it is to get the structure right. At the same time, in-sample fit comparison suggests the opposite. Thus, it seems that explanation and prediction may call for different emphasis: Explanation hinges more on correct structure, while prediction in new contexts hinges more on correct stochastic models.
1. PRELIMINARIES

1.1. Notation and Definitions

Let Z = (0, 1, 2, …, z, …, I−1) denote I equally spaced nonnegative money outcomes z including zero.3 The "unit outcome" in Z varies across experiments and surveys: That is, "1" could represent $0.05 or $5 or £50 in Z. A lottery S_m = (s_m0, s_m1, …, s_mz, …, s_m(I−1)) is a discrete probability distribution on Z. A pair m is a set {S_m, R_m} of two lotteries. Let c_m be the context of pair m, defined as the vector of outcomes remaining after deletion of all outcomes in Z such that s_mi = r_mi = 0. That is, pair m's context is the vector of outcomes that can occur in at least one of its two lotteries. In many experiments, all contexts are triples (i.e., all pairs involve just three possible outcomes). For instance in Hey and Orme (1994), Hey (2001), and Harrison and Rutström (2005), there are I = 4 outcomes, so that Z = (0, 1, 2, 3); this quadruple yields four distinct contexts, namely (0,1,2), (0,1,3), (0,2,3), and (1,2,3), and all pairs in those experiments are on one of those four contexts.

For pairs m = {S_m, R_m} = {(s_mj, s_mk, s_ml), (r_mj, r_mk, r_ml)} on a three-outcome context c_m = (j, k, l) where l > k > j, choose the lottery names S_m
and R_m so that s_mk + s_ml > r_mk + r_ml and s_ml < r_ml whenever this is possible. S_m then has less probability of the lowest or highest outcomes j and l, but a larger probability of the middle outcome k, than lottery R_m. In this case, we say that S_m is safer than R_m and call S_m the safe lottery in pair m. This labeling is henceforth adopted: It applies to all pairs in Hey and Orme's (1994) experiment. Let O_b denote the set of all such basic pairs. Some experiments also include pairs where one lottery first-order stochastically dominates the other, that is, where lottery names can be chosen so that s_mk + s_ml ≥ r_mk + r_ml and s_ml ≥ r_ml, with at least one inequality strict. Such lotteries are not ordered by the "safer than" relation as I have just described it, but I will nevertheless let S_m denote the dominating lottery in such pairs. Let O_fosd denote the set of all such FOSD pairs. Together, basic pairs and FOSD pairs are a mutually exclusive and exhaustive classification of lottery pairs on three-outcome contexts.

An experiment consists of presenting the pairs m = 1, 2, …, M to subjects n = 1, 2, …, N. In some experiments, each pair is presented repeatedly; in such cases, let t = 1, 2, …, T denote these repeated trials. Let P^n_{m,t} = Pr(y^n_{m,t} = 1) be the probability that subject n chooses S_m in trial t of pair m, where y^n_{m,t} = 1 is this event in the experiment (y^n_{m,t} = 0 if n instead chooses R_m). The trial subscript t may be dropped for experiments with single trials (T = 1) or if one assumes that choices are independent across trials and choice probabilities do not change across trials,4 writing P^n_m instead of P^n_{m,t}. Except where necessary, I suppress t to prevent notational clutter. It will occasionally be convenient to index different pairs by real numbers with special meanings, or by a pair of indices, rather than a natural number index m. This should be clear as it comes up.
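The classification rule just described is a small algorithm: try both labelings of the two lotteries and test the inequalities on the upper-tail probabilities. A Python sketch, using the Toss/Cut pair from the introduction as the example:

```python
# Sketch of the pair classification described above, for probability
# vectors (p_j, p_k, p_l) on a three-outcome context with l > k > j.

def classify(a, b):
    """Return ("basic", safe lottery) or ("fosd", dominating lottery)."""
    for s, r in ((a, b), (b, a)):
        _, sk, sl = s
        _, rk, rl = r
        # basic pair: s has more middle-outcome probability, less of l
        if sk + sl > rk + rl and sl < rl:
            return "basic", s
        # FOSD pair: weak inequalities, at least one strict
        if sk + sl >= rk + rl and sl >= rl and (sk + sl, sl) != (rk + rl, rl):
            return "fosd", s
    return None, None

# Toss = (1/2, 1/2, 0) and Cut = (3/4, 0, 1/4) on context (0, 10, 24):
kind, safe = classify((0.5, 0.5, 0.0), (0.75, 0.0, 0.25))
```

Applied to pair 1, the rule classifies it as a basic pair with Toss as the safe lottery, matching the informal description of Toss's "relative safety" earlier.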
Finally, when considering expectations of various sample moments, we need notation appropriate to the population from which subjects $n$ are sampled, instead of actual samples. Put differently, we occasionally need to think about subject types $c$, and their cumulative distribution function or c.d.f. $J(c)$, in the sampled population instead of actual subjects $n$ in a particular sample. On such occasions, the subject superscript $n$ will be replaced by a subject type superscript $c$, distributed according to some c.d.f. $J(c)$ in the sampled population.
1.2. The Two Structures

For transitive theories, the structure of choice under risk is a function $V$ of lotteries and a vector of structural parameters $\beta^n$ such that $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0 \Leftrightarrow P^n_m \geq 0.5$.5 This equates the relational statement $S_m \succeq^n R_m$
NATHANIEL T. WILCOX
with the probability statement $P^n_m \geq 0.5$.6 Structure maps pairs into a set of possible probabilities rather than a single unique probability, and hence underdetermines choice probabilities. Stochastic models, discussed subsequently, remedy this. The expected utility (or EU) structure is

$$V(S_m|\beta^n) \equiv \sum_{z=0}^{I-1} s_{mz} u^n_z, \text{ such that } \sum_{z=0}^{I-1} s_{mz} u^n_z - \sum_{z=0}^{I-1} r_{mz} u^n_z \geq 0 \Leftrightarrow P^n_m \geq 0.5 \qquad (1)$$
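A minimal sketch of the EU structure in Eq. (1), with an arbitrary illustrative utility vector (nothing here is from the chapter's data):

```python
def eu_value(lottery, u):
    """Expected utility: probability-weighted sum of outcome utilities, Eq. (1)."""
    return sum(p * uz for p, uz in zip(lottery, u))

# Utilities normalized as in the text: u_0 = 0 and u_1 = 1; the remaining
# utilities are the free structural parameters beta = (u_2, u_3).
u = [0.0, 1.0, 1.8, 2.0]           # illustrative concave utilities on Z = (0,1,2,3)
S = [0.0, 0.0, 1.0, 0.0]           # "safe": outcome 2 for certain
R = [0.2, 0.0, 0.0, 0.8]           # "risky" lottery on the same context (0,2,3)

prefers_safe = eu_value(S, u) - eu_value(R, u) >= 0.0
```

For these utilities the V-distance is $1.8 - 1.6 = 0.2 > 0$, so the safe lottery is (deterministically) preferred; a stochastic model's job is to say how often it is actually chosen.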
The $u^n_z$ are called utilities of outcomes $z$. Representation theorems for both EU and RDEU show that the utilities representing preferences are unique only up to an affine transformation, so we may choose a common "zero and unit" utility for all subjects. I do this here with the two lowest outcomes in $Z$, choosing $u^n_0 = 0$ and $u^n_1 = 1$ for all subjects $n$. We may then think of the structural parameters of EU as subject $n$'s utilities of the remaining $I-2$ outcomes in $Z$ (i.e., $\beta^n = (u^n_2, u^n_3, \ldots, u^n_{I-1})$). Note that I refer to both EU and RDEU as affine structures because of this affine transformation nonuniqueness property of their utilities. The expression $V(S_m|\beta^n) - V(R_m|\beta^n)$, while only unique up to scale for EU and RDEU (because they are affine structures), will frequently be called V-distance as this plays a central role in several stochastic models. The RDEU structure (Quiggin, 1982; Chew, 1983) replaces the probabilities $s_{mz}$ in the expected utility structure with weights $w^s_{mz}$. These weights are $w^s_{mz} = w(\sum_{i \geq z} s_{mi}) - w(\sum_{i > z} s_{mi})$, where a continuous and increasing weighting function $w(q)$ takes the unit interval onto itself. The RDEU structure is then

$$V(S_m|\beta^n) \equiv \sum_{z=0}^{I-1} w^s_{mz} u^n_z, \text{ such that } \sum_{z=0}^{I-1} w^s_{mz} u^n_z - \sum_{z=0}^{I-1} w^r_{mz} u^n_z \geq 0 \Leftrightarrow P^n_m \geq 0.5 \qquad (2)$$
Several parametric forms have been suggested for the weighting function; here, I use Prelec's (1998) one-parameter form, which is $w(q|\gamma^n) = \exp(-[-\ln(q)]^{\gamma^n})\ \forall q \in (0,1)$, $w(0) = 0$, and $w(1) = 1$. In RDEU, $\beta^n = (u^n_2, u^n_3, \ldots, u^n_{I-1}, \gamma^n)$ is then the structural parameter vector, and EU is a special case of this where $\gamma^n = 1$, in which case $w(q) \equiv q$ and $w^s_{mz} \equiv s_{mz}$. I use EU and RDEU as widely known and important exemplars, but many of the issues discussed here are not specific to them or any specific parametric instantiation of them. While my estimations ultimately use nonparametric utilities as expressed above, I will discuss EU and RDEU structures that use well-known
parametric utility functions of theoretical importance. These are the CARA form $u^n_z = -\operatorname{sign}(\alpha^n)e^{-\alpha^n z}$, in which the local absolute risk aversion $-u''(z)/u'(z)$ at $z$ is the constant $\alpha^n\ \forall z$, and the CRRA form $u^n_z = (1-\varphi^n)^{-1} z^{1-\varphi^n}$, in which the local relative risk aversion $-zu''(z)/u'(z)$ at $z$ is the constant $\varphi^n\ \forall z$.

1.3. Preference Equivalence Sets

Suppose that structure $V$ implies that, for any fixed structural parameter vector $\beta$, there is a set $\Omega^{eV}$ of pairs over which preference directions must all be equivalent, formally defined by

$$V(S_m|\beta) - V(R_m|\beta) \geq 0 \Leftrightarrow V(S_{m'}|\beta) - V(R_{m'}|\beta) \geq 0\ \ \forall \text{ fixed } \beta,\ \forall m \text{ and } m' \in \Omega^{eV} \qquad (3)$$
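The Prelec weights and the RDEU value of Eq. (2) can be sketched as follows; this is a toy illustration with made-up parameter values, not the chapter's estimates.

```python
import math

def prelec_w(q, gamma):
    """Prelec's (1998) one-parameter weighting function w(q|gamma)."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def rdeu_value(lottery, u, gamma):
    """RDEU value of Eq. (2): decumulative-weight differences times utilities."""
    total = 0.0
    for z in range(len(lottery)):
        w_ge = prelec_w(sum(lottery[z:]), gamma)      # w(Pr(outcome >= z))
        w_gt = prelec_w(sum(lottery[z + 1:]), gamma)  # w(Pr(outcome > z))
        total += (w_ge - w_gt) * u[z]
    return total

u = [0.0, 1.0, 1.8, 2.0]      # illustrative utilities on Z = (0,1,2,3)
R = [0.2, 0.0, 0.0, 0.8]

# gamma = 1 recovers EU: w(q) = q, so the weights collapse to probabilities.
assert abs(rdeu_value(R, u, 1.0) - 1.6) < 1e-9
```

With $\gamma \neq 1$ the weights $w^s_{mz}$ differ from $s_{mz}$ and the rank-dependent value departs from expected utility.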
Call such sets preference equivalence sets: These are typically derived from the axiom system underlying a structure, or from the algebraic form of the structure. Most are well-known to empirical decision research because they are a large basis (but not the only basis) for theory-testing experiments. Several types of preference equivalence sets play central roles in my discussion of stochastic models. Preference equivalence sets are special because one of the stochastic models, random preferences (RPs), makes extremely strong predictions about them.

1.3.1. Common Ratio Sets

These are perhaps the most widely discussed example of preference equivalence sets for EU. A common ratio set has the form $\{\{S_t, R_t\}\,|\,S_t = (1-ts, ts, 0),\ R_t = (1-tr, 0, tr)\}$. Calling $t \in (0, 1]$ the common ratio, pairs in a common ratio set vary only by this common ratio: The pairs all have the same value of $s$ and $r$ (with $s > r$), and they are all on the same context $(j, k, l)$. The EU structure implies that for any given utility vector $(u^n_j, u^n_k, u^n_l)$, either $S_t \succeq^n R_t\ \forall t$ or $R_t \succeq^n S_t\ \forall t$ (e.g., Kahneman & Tversky, 1979) since the EU V-distance between all the pairs in a common ratio set may be written $(1-ts)u^n_j + tsu^n_k - [(1-tr)u^n_j + tru^n_l] \equiv t[(r-s)u^n_j + su^n_k - ru^n_l]$, whose sign is obviously independent of the common ratio $t$. So any common ratio set is an EU preference equivalence set. Experimenters choose a root pair $\{S_1, R_1\}$, that is, a pair with $t = 1$, for a common ratio set in such a way that most subjects are expected to choose $S_1$ from the root pair, and also choose $t$ small, say equal to 1/4 or less, and include these kinds of pairs in the design as well. The usual finding is that most subjects in a sample choose $S_1$ from the root pair, but most subjects
instead choose $R_t$ from pairs with sufficiently small $t$. This is generally called the common ratio effect and is widely taken as evidence against the EU structure. Loomes and Sugden (1995), Ballinger and Wilcox (1997), and others have pointed out, however, that while some stochastic models make this a correct inference, other stochastic models do not. This is explained later.
1.3.2. MPS Pairs on a Specific Three-Outcome Context

There is a theoretically important subset of the basic pairs $\Omega^b$ defined previously. If $m \in \Omega^b$ and $E(R_m) = E(S_m)$, $R_m$ is a mean-preserving spread (MPS) of $S_m$ according to the definition of Rothschild and Stiglitz (1970): Call $\Omega^{mps} \subset \Omega^b$ the set of MPS pairs. There is a well-known implication of Jensen's Inequality for the EU structure, for all pairs $m \in \Omega^{mps}$: If $u^n_z$ is weakly concave (convex) in $z$, then the EU structure implies that $S_m \succeq^n R_m$ ($R_m \succeq^n S_m$) $\forall m \in \Omega^{mps}$ (Rothschild & Stiglitz, 1970). Neither EU nor RDEU require $u^n_z$ to be weakly concave or weakly convex across all $z$. Indeed, where utilities have been estimated nonparametrically across four outcomes (0,1,2,3), as in Hey and Orme (1994) and Hey (2001), a substantial fraction of subjects (30–40%, depending on the structure estimated) display a concave-then-convex pattern of utilities – concave on the contexts (0,1,2), (0,1,3), and (0,2,3), but convex on the context (1,2,3). This is reflected in the fact that risk-averse choices in those data sets are much less frequent for pairs on the context (1,2,3). When I come to estimation using Hey and Orme's data, this is why I avoid a parametric utility function (like CARA or CRRA), which would force a uniform curvature over all four outcomes. However, any vector of utilities $(u^n_j, u^n_k, u^n_l)$ on any specific three-outcome context is either weakly concave or weakly convex in $z$ (or both). Therefore, if we let $\Omega^{mps}_c \subset \Omega^{mps}$ be all MPS pairs on any specific three-outcome context $c$, it is a preference equivalence set for EU. This is not true of RDEU for all weighting functions. I place heavy emphasis on this particular property of the EU structure in my comparison of stochastic models, even though it is rarely considered in decision experiments.
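A small sketch of the MPS test on a three-outcome context, under the definitions above (the numbers are illustrative only):

```python
def is_mps_pair(S, R, context):
    """True if {S, R} is a basic pair whose lotteries share a mean, so that
    R is a mean-preserving spread of S on the ordered context (j, k, l)."""
    mean = lambda lot: sum(p * z for p, z in zip(lot, context))
    same_mean = abs(mean(S) - mean(R)) < 1e-9
    s_is_safe = S[1] + S[2] > R[1] + R[2] and S[2] < R[2]
    return same_mean and s_is_safe

# On context (0, 2, 3): R moves S's middle-outcome mass to the tails,
# holding the mean at 2.
S = (0.0, 1.0, 0.0)
R = (1/3, 0.0, 2/3)
assert is_mps_pair(S, R, (0, 2, 3))
```

For a weakly concave utility vector on this context, Jensen's Inequality then makes $S$ the preferred lottery in every such pair.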
Outside of decision theory and experimental economics, the chief uses of both EU and RDEU in applied economic theory stress comparative statics results that flow from pairs $m \in \Omega^{mps}$ (precisely because of the property described above) or from pairs in special subsets of $\Omega^{mps}$, such as "linear spreads" and "monotone spreads" (see e.g., Quiggin, 1991). It seems sensible to emphasize and concentrate on those properties of structures that get used the most in applied economic theory,
and how those properties may be changed (or not) by various stochastic modeling options.

1.3.3. FOSD Pairs

It should be obvious that $\Omega^{fosd}$, the set of FOSD pairs, is a preference equivalence set for both EU and RDEU since both of these structures obey first-order stochastic dominance. In fact it is something stronger than a preference equivalence set, since these structures imply that all subjects $n$ prefer $S_m\ \forall m \in \Omega^{fosd}$ quite regardless of their structural parameter vector $\beta^n$. Loomes and Sugden (1995) have attached particular importance to this preference equivalence set because it is common to both EU and RDEU (and a great number of theories). In experiments, subjects rarely violate FOSD when it is "transparent" (Loomes & Sugden, 1998; Hey, 2001). Without further adornment, only one of the five stochastic models considered here gets this observation approximately correct; but we will see that with the aid of trembles and an appropriate computational interpretation, other stochastic models can be modified sensibly, and at almost no cost of parsimony, to explain this observation. However, none of the five models can explain "nontransparent" violations of FOSD such as in Birnbaum and Navarrete (1998).7

1.3.4. Context Shifts and Parametric Utility Functions

Say that the lottery pairs $m = \{S_m, R_m\}$ and $m' = \{S_{m'}, R_{m'}\}$ differ by an additive context shift if $S_m$ and $S_{m'}$ are identical probability vectors, and $R_m$ and $R_{m'}$ are identical probability vectors, but the contexts of pairs $m$ and $m'$ differ by the addition of $x$ to all outcomes; that is, $c_m = (j, k, l)$ and $c_{m'} = (j + x, k + x, l + x)$. If an EU or RDEU maximizer has a CARA utility function over outcomes $z$, then $u^n_{z+x} \equiv -\operatorname{sign}(\alpha^n)e^{-\alpha^n(z+x)} \equiv e^{-\alpha^n x}[-\operatorname{sign}(\alpha^n)e^{-\alpha^n z}] \equiv e^{-\alpha^n x}u^n_z$. Therefore, whenever $m$ and $m'$ differ by an additive context shift,

$$V(S_{m'}|\beta^n) \equiv e^{-\alpha^n x}V(S_m|\beta^n) \text{ and } V(R_{m'}|\beta^n) \equiv e^{-\alpha^n x}V(R_m|\beta^n) \qquad (4)$$
vectors, and $R_m$ and $R_{m'}$ are identical probability vectors, but the contexts of pairs $m$ and $m'$ differ by the multiplication of all outcomes by $y > 0$; that is, $c_m = (j, k, l)$ and $c_{m'} = (yj, yk, yl)$. If an EU or RDEU maximizer has a CRRA utility function over outcomes $z$, then $u^n_{yz} \equiv (1-\varphi^n)^{-1}(yz)^{1-\varphi^n} \equiv y^{1-\varphi^n}(1-\varphi^n)^{-1}z^{1-\varphi^n} \equiv y^{1-\varphi^n}u^n_z$. Therefore, whenever $m$ and $m'$ differ by a proportional context shift,

$$V(S_{m'}|\beta^n) \equiv y^{1-\varphi^n}V(S_m|\beta^n) \text{ and } V(R_{m'}|\beta^n) \equiv y^{1-\varphi^n}V(R_m|\beta^n) \qquad (5)$$
for both the EU and RDEU structures with a CRRA utility function. So by similar reasoning, a set of pairs that differ only by proportional context shifts are a preference equivalence set for both EU and RDEU structures with CRRA utility functions. These humble properties of CARA and CRRA utility functions, wholly transparent to anyone familiar with Pratt (1964), are not always important for theory comparisons, where experimental design can sometimes make parametric assumptions about utility functions unnecessary. Yet they are quite important for estimation because different stochastic models interact with these parametric functional forms in very different ways. We will see below that three of our stochastic models imply an intuitive choice probability invariance with an additive (proportional) context shift when a CARA (CRRA) utility function is used for estimation, but other stochastic models will not. Because of this, stochastic models are crucial to inferences about the constancy (or not) of absolute or relative risk aversion across different contexts (as in, e.g., Holt & Laury, 2002).
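Analogously to the additive case, the scaling in Eq. (5) under CRRA expected utility can be sketched as (illustrative parameters only):

```python
def crra_u(z, phi):
    """CRRA utility u_z = z**(1 - phi) / (1 - phi), for phi != 1."""
    return z ** (1.0 - phi) / (1.0 - phi)

def eu(lottery, context, phi):
    return sum(p * crra_u(z, phi) for p, z in zip(lottery, context))

phi, y = 0.5, 10.0                           # illustrative parameters
S, R = (0.0, 1.0, 0.0), (0.5, 0.0, 0.5)
base = (1.0, 2.0, 3.0)
scaled = tuple(y * z for z in base)          # proportional context shift by y

# Both lottery values scale by y**(1 - phi), so again the sign of the
# V-distance, and hence the preferred lottery, is unchanged.
factor = y ** (1.0 - phi)
for lot in (S, R):
    assert abs(eu(lot, scaled, phi) - factor * eu(lot, base, phi)) < 1e-9
```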
1.4. Other Structural Properties

Not all predictions of structures take the shape of preference equivalence sets. Formally, a preference equivalence set is a two-way implication about preference directions across a minimum of two pairs. But structures have some properties that are not two-way implications and, moreover, are defined with a minimum of three pairs. The two major properties here are betweenness and transitivity. Stochastic models can behave very differently with respect to these kinds of properties. For instance, we will see that while RPs make extremely strong predictions about properties expressible as preference equivalence sets, RPs produce nothing recognizable as a stochastic transitivity property.
1.4.1. Betweenness

Betweenness is the mother of all predicted differences between stochastic models (Becker et al., 1963a, 1963b). Let $D = tC + (1-t)E$, $t \in [0,1]$, be a linear mixture of the probability vectors making up any lotteries $C$ and $E$; obviously $D$ is itself a third lottery. A structure is said to satisfy betweenness if $V(D|\beta)$ is between $V(C|\beta)$ and $V(E|\beta)$ for any $\beta$ and any $t$. It is well-known that EU satisfies betweenness and that RDEU does not. Generally speaking, betweenness tests are conducted by having subjects choose from sets of three lotteries $\{C, D, E\}$ constructed as in the definition of betweenness above, where $D$ is a mixture of $C$ and $E$, rather than from pairs of lotteries. Becker et al. (1963a) showed that, in such situations, some stochastic models predict mild violations of betweenness when combined with the EU structure, while others do not. Becker et al. (1963b) found violations of betweenness in about 30% of all choices from suitably constructed lottery triples. This is far too high a rate to be a slip of the fingers (we will revisit the notion of "trembles" shortly), but just about right for strong utility models, given contemporary knowledge about retest reliability of lottery choice (e.g., Ballinger & Wilcox, 1997). Blavatskyy (2006) discusses how betweenness violations of this order of magnitude, observed within and across many studies, can be explained by EU with the strong utility model (very similar explanations flow from strict utility, contextual utility, and the wandering vector (WV) model). Random preference EU models, however, seem to be rejected by these observations (Becker et al., 1963a, 1963b). In the language of Gul and Pesendorfer (2006), EU with RPs has the property of extremeness, which means that any observed choice from a set must be the unique maximizer in the choice set for some utility function. Betweenness implies that this can only be $C$ or $E$ in a set $\{C, D, E\}$ when $D$ is a mixture of $C$ and $E$.
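Because EU is linear in the probabilities, the mixture's EU value is the same mixture of the endpoint values, which is easy to verify numerically (a sketch with arbitrary illustrative utilities):

```python
def eu_value(lottery, u):
    return sum(p * uz for p, uz in zip(lottery, u))

u = [0.0, 1.0, 1.8, 2.0]                     # illustrative utilities
C = [0.1, 0.2, 0.3, 0.4]
E = [0.4, 0.3, 0.2, 0.1]

# EU linearity gives V(D) = t*V(C) + (1-t)*V(E), so the mixture's value
# always lies between the endpoint values: betweenness holds.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    D = [t * c + (1 - t) * e for c, e in zip(C, E)]
    vC, vD, vE = eu_value(C, u), eu_value(D, u), eu_value(E, u)
    assert min(vC, vE) - 1e-12 <= vD <= max(vC, vE) + 1e-12
```

The same check run with a rank-dependent value function in place of `eu_value` can fail, which is exactly the sense in which RDEU violates betweenness.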
Strictly speaking, this is only true for regular RP models, in which "tied preferences" (and so indifference between $C$ and $E$) are zero probability events (see Gul and Pesendorfer). But to explain a 30% rate of violations of betweenness as indifference between $C$ and $E$, one would be invoking a rather scary amount of indifference. Since I focus on binary choice – choice from pairs, not triples – these observations about betweenness tests in suitably constructed lottery triples are mostly tangential. Yet betweenness does have testable implications across certain sets of MPS pairs for most of the stochastic models introduced below. So after that introduction, it will reappear when MPS pairs are taken up again.
1.4.2. Transitivity, Stochastic Transitivities, and Simple Scalability

Consider the three unique lottery pairs that can be constructed from any triple of lotteries $\{C, D, E\}$, that is, the pairs $\{C, D\}$, $\{D, E\}$, and $\{C, E\}$, and call these pairs 1, 2, and 3. A deterministic relational system is transitive if $C \succeq^n E$ whenever both $C \succeq^n D$ and $D \succeq^n E$. Both EU and RDEU are transitive structures, and so are many other structures. There are three stochastic versions of transitivity that may or may not be satisfied when a transitive structure is combined with a stochastic model. These are:

Strong stochastic transitivity (SST): $\min(P^n_1, P^n_2) \geq 0.5 \Rightarrow P^n_3 \geq \max(P^n_1, P^n_2)$;

Moderate stochastic transitivity (MST): $\min(P^n_1, P^n_2) \geq 0.5 \Rightarrow P^n_3 \geq \min(P^n_1, P^n_2)$; and

Weak stochastic transitivity (WST): $\min(P^n_1, P^n_2) \geq 0.5 \Rightarrow P^n_3 \geq 0.5$.

Stochastic transitivities are among the central distinctions between stochastic choice models (Morrison, 1963). As we will see, the five stochastic models considered here predict either SST or MST, or they predict no stochastic transitivity at all – not even WST. The relative power of stochastic transitivity predictions is that they are independent of whether EU or RDEU is the true structural model. Indeed, any stochastic transitivity property of a stochastic model will hold for all transitive structures $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0 \Leftrightarrow P^n_m \geq 0.5$. The rhetoric of identifying restrictions cuts both ways: We need to be circumspect about using a "favorite" structure as an identifying restriction for choosing a stochastic model. As will be clear below, most stochastic model predictions depend on the true structural model: Stochastic transitivities are particularly useful testable properties precisely because they do not (for the class of transitive structures). Simple scalability is a stochastic choice property that is closely related to stochastic transitivities: It is a necessary condition for the stochastic models that produce SST.
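The three definitions transcribe directly into code (hypothetical helper names, not from the chapter):

```python
def sst(p1, p2, p3):
    """Strong stochastic transitivity for pairs {C,D}, {D,E}, {C,E}."""
    return min(p1, p2) < 0.5 or p3 >= max(p1, p2)

def mst(p1, p2, p3):
    """Moderate stochastic transitivity."""
    return min(p1, p2) < 0.5 or p3 >= min(p1, p2)

def wst(p1, p2, p3):
    """Weak stochastic transitivity."""
    return min(p1, p2) < 0.5 or p3 >= 0.5

# SST implies MST, and MST implies WST; the converses fail.  Here p3 = 0.6
# satisfies WST and MST but violates SST (it falls below max(p1, p2) = 0.7).
p1, p2, p3 = 0.7, 0.6, 0.6
assert wst(p1, p2, p3) and mst(p1, p2, p3) and not sst(p1, p2, p3)
```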
Simple scalability holds if there is a function $f(x, y)$, increasing in $x$ and decreasing in $y$, and an assignment of structure values $V$ to all lotteries, such that $P^n_m \equiv f[V(S_m|\beta^n), V(R_m|\beta^n)]$. Simple scalability implies a choice probability ordering independence property. Let the pairs $\{C, E\}$ and $\{D, E\}$ be indexed by $ce$ and $de$, respectively. Let $E'$ be any fourth lottery, to be substituted for $E$ in these two pairs, forming the pairs $\{C, E'\}$ and $\{D, E'\}$ indexed by $ce'$ and $de'$, respectively. Then simple scalability requires $P^n_{ce} \geq P^n_{de}$ iff $P^n_{ce'} \geq P^n_{de'}$ (Tversky & Russo, 1969). Intuitively,
suppose we thought of lottery $E$ as a standard of comparison, and we measure the relative strength of subject $n$'s preference for lotteries $C$ and $D$ by the relative likelihood that these are chosen over the standard lottery $E$. Simple scalability requires that this ordering is independent of the choice we make for the standard lottery: We must be able to replace $E$ by any $E'$ and get the same ordering of choice probabilities. There is a large canon of experimental results, generated almost exclusively by experimental psychologists, on stochastic transitivities and simple scalability. Experimental economists will notice that much of this canon was generated long before battle was joined on methodological matters such as performance-contingent incentives and incentive compatibility, as initiated by Grether and Plott (1979). Notwithstanding Camerer and Hogarth's (1999) claim to the contrary, there are findings based on purely hypothetical tasks, or tasks with very low incentive levels, that simply do not hold up with real performance-contingent incentives of sufficient size (see e.g., Wilcox (1993) on violations of "reduction of compound lotteries," or Cummings, Harrison, and Rutström (1995) on binary choice valuation methods). Nevertheless, collective experience since Grether and Plott leads me to respect the experimental canon from "psychology before incentives" since many (though not all) of its qualitative findings replicate when using our own methods. Therefore, I regard the psychological canon as useful information, while also believing we need to replicate its important findings using our own methods. With these remarks in mind, the psychological experimental canon suggests that SST holds much of the time, but that there are systematic violations of it in many judgment and choice contexts (Block & Marschak, 1960; Luce & Suppes, 1965; Tversky & Russo, 1969; Tversky, 1972; Luce, 1977).
That evidence coincides with theoretical reasoning based on similarity and/or dominance relations (Debreu, 1960), and generally supports MST instead; but much of that evidence does not use lotteries as choice alternatives. In the specific case of choice under risk, some evidence against SST comes from experiments where lottery outcome probabilities are deliberately made uncertain or imperfectly discriminable by experimental manipulation (e.g., Chipman, 1963; Tversky, 1969). Indeed under these circumstances Tversky showed that WST can be violated in systematic ways by at least some subjects. However, there are occasional violations of SST with standard lotteries too, and with our own methods, again matching theoretical reasoning based on similarity relations (Ballinger & Wilcox, 1997). The implications of simple scalability have also failed repeatedly with
lotteries (e.g., Myers & Sadler, 1960; see Busemeyer & Townsend, 1993), though many such demonstrations involve minuscule or purely hypothetical incentives. My own assessment of this evidence is that we should expect stochastic models that imply SST to have some systematic problems, but that stochastic models implying MST may be fairly robust for typical binary lottery choice under risk.8
2. FIVE STOCHASTIC MODELS

The sheer number of stochastic modeling options for binary discrete choice under risk is, frankly, amazingly large. Good sources for the five models I consider here, as well as other models, are Becker et al. (1963a), Luce and Suppes (1965), Loomes and Sugden (1995), Busemeyer and Townsend (1993), Fishburn (1999), and Wilcox (2007a). Three of the models are well-known to experimental economics: These are the RP model, the strong utility (or Fechnerian) model, and the strict utility model. Moderate utility models are less well-known to the field, though related stochastic modeling assumptions are found in Hey (1995) and Buschena and Zilberman (2000). I discuss two of these: The WV model of Carroll (1980) and my own contextual utility model (Wilcox, 2007a). I also briefly discuss two other very interesting models, Blavatskyy's (2007) truncated error model and Busemeyer and Townsend's (1993) decision field theory, but do not treat these in detail here. Because stochastic transitivities are such a fundamental part of stochastic models, these properties are discussed with the introduction of each model, rather than in later sections. I begin with a useful stochastic modeling device that can be used in conjunction with any of these models in various ways and for various purposes.
2.1. Trembles

Some randomness of observed choice has been thought to arise from attention lapses or simple responding mistakes (e.g., pushing the wrong button) that are wholly or mostly independent of pairs $m$. Following Moffatt and Peters (2001), call such events trembles and assume they occur with probability $\omega^n$ independent of $m$ and, in the event of a tremble, assume that choices of $S_m$ or $R_m$ are equiprobable.9 In concert with this, draw a distinction between overall choice probabilities $P^n_m$ (that in part reflect trembles) and considered choice probabilities $P^{*n}_m$ that depend on
characteristics of $m$ and govern choice behavior when no tremble occurs. Under these assumptions and definitions, we have

$$P^n_m = (1 - \omega^n)P^{*n}_m + \frac{\omega^n}{2} \qquad (6)$$

Note that under Eq. (6), $P^n_m \geq 0.5$ iff $P^{*n}_m \geq 0.5$, $\forall\ \omega^n \in [0,1]$. In words, we may give a stochastic definition of preference in terms of either $P^n_m$ or $P^{*n}_m$ since trembles do not reverse preference directions relative to stochastic indifference (defined as $P^n_m = P^{*n}_m = 0.5$). The stochastic models that follow are all models of considered choice probabilities $P^{*n}_m$, to which we may add trembles in the manner of Eq. (6) if this is empirically or theoretically desirable. Later, we will see that trembles usefully allow for violations of "transparent" instances of first-order stochastic dominance.
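Eq. (6) is a one-liner; the sketch below also illustrates why trembles cannot reverse a preference direction (all values illustrative):

```python
def overall_prob(p_star, omega):
    """Overall choice probability from a considered probability, Eq. (6)."""
    return (1.0 - omega) * p_star + omega / 2.0

# Trembles shrink probabilities toward 0.5 but never move them across it,
# so preference directions are preserved at any tremble rate.
assert abs(overall_prob(0.9, 0.2) - 0.82) < 1e-12
for omega in (0.0, 0.3, 0.9):
    assert (overall_prob(0.8, omega) >= 0.5) == (0.8 >= 0.5)
```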
2.2. The Random Preference Model

The RP model views each subject as having many deterministic relational systems, and assumes that at each trial of pair $m$, the subject randomly draws one of these and chooses (without error) according to the drawn relational system's relational statement for that pair. From an econometric viewpoint, the RP model views random choice as arising from randomness of structural parameters. We think of each individual subject $n$ as having an urn filled with possible structural parameter vectors $\beta^n$. (For instance, a CRRA EU structure with an RP model could be thought of as a situation where subject $n$ has an urn filled with various values of her coefficient of relative risk aversion $\varphi^n$.) At each new trial $t$ of any pair $m$, the subject draws a new parameter vector from this urn (with replacement) and uses it to calculate both $V(S_m|\beta^n)$ and $V(R_m|\beta^n)$ without error. Then $S_m$ is chosen iff $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0$: Formally, it is a relational system that is randomly selected, and that relational system's relational statement for the pair $\{S_m, R_m\}$ determines choice without error. Each $\beta^n$ represents a relational system, and so a single draw of $\beta^n$ is applied to both lotteries in a pair and determines the choice on that draw. The considered choice probability is simply the probability that a value of $\beta^n$ is drawn such that $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0$. Let $B_m = \{\beta\,|\,V(S_m|\beta) - V(R_m|\beta) \geq 0\}$; then under the RP model, $P^{*n}_m = \Pr(\beta^n \in B_m)$.10 To carry out parametric estimation for any subject $n$, one then needs to specify a joint distribution function $G_\beta(x|\alpha^n)$ for $\beta^n$, conditioned on some vector $\alpha^n$ of parameters governing the location and shape of $G_\beta$. Along with
any tremble probability, call the vector $\zeta^n = (\alpha^n, \omega^n)$ stochastic parameters as they determine choice probabilities but are not themselves structural parameters. In RP models, the stochastic parameters $\alpha^n$ determine a subject's distribution of structural parameters. We then have the considered choice probability

$$P^{*n}_m = \int_{\beta \in B_m} dG_\beta(x|\alpha^n) \qquad (7)$$

Substituting $(\alpha, \omega)$ for the true parameter vector, one may then use Eq. (7) to construct a likelihood function for observations $y^n_m$ conditional on $(\alpha, \omega)$, for some subject $n$. This likelihood function would then be maximized in $(\alpha, \omega)$ to estimate $(\alpha^n, \omega^n)$. How does this work with specific structures on a given context? Denoting combinations of structures and stochastic models by ordered pairs, I begin with an (RDEU,RP) model, which will imply an (EU,RP) model since EU is a special case of RDEU. The technique described here is due to Loomes et al. (2002): Like them, I simplify the problem by assuming that any weighting function parameters are nonstochastic, that is, that the only structural parameters that vary in a subject's "RP urn" are her utilities. Let $G_u$ be the joint c.d.f. of those utilities. Substituting Eq. (2) into Eq. (7), considered choice probabilities under an (RDEU,RP) model with Prelec's (1998) one-parameter weighting function are
$$P^{*n}_m = \Pr\Bigl(\sum_{z \in c_m} W_{mz}(\gamma^n) u^n_z \geq 0 \,\Big|\, G_u(x|\alpha^n)\Bigr); \text{ where}$$

$$W_{mz}(\gamma^n) = w\Bigl(\sum_{i \geq z} s_{mi}\Big|\gamma^n\Bigr) - w\Bigl(\sum_{i > z} s_{mi}\Big|\gamma^n\Bigr) - w\Bigl(\sum_{i \geq z} r_{mi}\Big|\gamma^n\Bigr) + w\Bigl(\sum_{i > z} r_{mi}\Big|\gamma^n\Bigr) \qquad (8)$$

Noting that $W_{mj}(\gamma^n) \equiv -W_{mk}(\gamma^n) - W_{ml}(\gamma^n)$ for pairs $m$ on three-outcome contexts $c_m = (j, k, l)$, and assuming strict monotonicity of utility in outcomes so that we may divide the inequality in Eq. (8) through by $u^n_k - u^n_j$, Eq. (8) becomes

$$P^{*n}_m = \Pr\bigl(W_{mk}(\gamma^n) + W_{ml}(\gamma^n)(v^n_m + 1) \geq 0 \,\big|\, G_u(x|\alpha^n)\bigr); \text{ where}$$
$$v^n_m \equiv \frac{u^n_l - u^n_k}{u^n_k - u^n_j},$$
$$W_{mk}(\gamma^n) = w(s_{mk} + s_{ml}|\gamma^n) - w(s_{ml}|\gamma^n) - w(r_{mk} + r_{ml}|\gamma^n) + w(r_{ml}|\gamma^n), \text{ and}$$
$$W_{ml}(\gamma^n) = w(s_{ml}|\gamma^n) - w(r_{ml}|\gamma^n) \qquad (9)$$
This elegant trick reduces the random utility vector $(u^n_j, u^n_k, u^n_l)$ on context $c_m$ to the scalar random variable $v^n_m \in \mathbb{R}^+$, containing all choice-relevant stochastic information about the agent's random utilities on context $c_m$. Let $G_{v_m}(x|\alpha^n_m)$ be the c.d.f. of $v^n_m$ (generated by $G_u$), and consider just basic pairs where $r_{ml} - s_{ml} > 0$ so that $-W_{ml}(\gamma^n) = w(r_{ml}|\gamma^n) - w(s_{ml}|\gamma^n) > 0$.11 We can then rewrite Eq. (9) to make the change of random variables described above explicit:

$$P^{*n}_m = \Pr\Bigl(v^n_m \leq \frac{W_{mk}(\gamma^n) + W_{ml}(\gamma^n)}{-W_{ml}(\gamma^n)} \,\Big|\, G_{v_m}(x|\alpha^n_m)\Bigr); \text{ or}$$
$$P^{*n}_m = G_{v_m}\Bigl(\frac{w(s_{mk} + s_{ml}|\gamma^n) - w(r_{mk} + r_{ml}|\gamma^n)}{w(r_{ml}|\gamma^n) - w(s_{ml}|\gamma^n)} \,\Big|\, \alpha^n_m\Bigr) \qquad (10)$$
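Under an assumed lognormal c.d.f. for $v^n_m$ (one of the distributional choices the text mentions), Eq. (10) can be computed directly; every numeric value here is illustrative.

```python
import math

def prelec_w(q, gamma):
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def lognormal_cdf(x, mu, sigma):
    """c.d.f. of a lognormal variable on R+ (an assumed choice for G_vm)."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def rp_choice_prob(S, R, gamma, mu, sigma):
    """Eq. (10): G_vm((w(s_k+s_l) - w(r_k+r_l)) / (w(r_l) - w(s_l)))."""
    _, sk, sl = S
    _, rk, rl = R
    num = prelec_w(sk + sl, gamma) - prelec_w(rk + rl, gamma)
    den = prelec_w(rl, gamma) - prelec_w(sl, gamma)   # positive for basic pairs
    return lognormal_cdf(num / den, mu, sigma)

p = rp_choice_prob((0.0, 1.0, 0.0), (0.2, 0.0, 0.8), gamma=1.0, mu=0.0, sigma=0.5)
assert 0.0 < p < 1.0
```

Shifting the location of $G_{v_m}$ (here $\mu$) moves probability between the safe and risky lotteries while the structural weights stay fixed.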
With Eq. (10), we have arrived where Loomes et al. (2002) left things. Choosing some c.d.f. on $\mathbb{R}^+$ for $G_{v_m}$, such as the lognormal or gamma distribution, construction of a likelihood function from Eq. (10) and choice data is straightforward for one context, and this is the kind of experimental data Loomes, Moffatt, and Sugden considered. But careful attention to the notation above makes two things clear that Loomes, Moffatt, and Sugden did not discuss. First, because the random variable $v^n_m$ is generated from the joint distribution of utilities on the context $c_m$, its c.d.f. $G_{v_m}$ is context-dependent. Second, if $m$ and $m'$ are pairs on distinct contexts that share some outcomes, one cannot simply choose any old distribution functions for $v^n_m$ and $v^n_{m'}$ since these two distinct random variables are generated by the same underlying joint distribution $G_u$ of outcome utilities. This implies that tractable generalizations of (RDEU,RP) models across multiple contexts are inherently subtle, as will be made plain later. Loomes and Sugden (1995) point out a property of RP models that I make repeated use of later. The definition of a preference equivalence set $\Omega^{eV}$ given in Eq. (3), and the definition $B_m = \{\beta\,|\,V(S_m|\beta) - V(R_m|\beta) \geq 0\}$, together imply that $B_m \equiv B_{m'}\ \forall m$ and $m' \in \Omega^{eV}$. Eq. (7) then implies that $P^{*n}_m \equiv P^{*n}_{m'}\ \forall m$ and $m' \in \Omega^{eV}$. That is, the RP model requires that each subject $n$ with structure $V$ must have constant choice probabilities across every pair that is in a preference equivalence set for structure $V$. This does not mean that all subjects must have the same choice probabilities; these may vary across subjects with the same $V$, since different subjects may have differently composed "urns" of parameter vectors $\beta$. However, it implies the following: If the RP model and structure $V$ are both true for all subjects in a population, expected sample choice proportions for all pairs $m$ and $m'$ in a preference equivalence set of structure $V$ must be equal. Formally, replace
the subject index $n$ by the subject type index $c$ with distribution $J(c)$ in the sampled population: If $P^c_m \equiv P^c_{m'}\ \forall m$ and $m' \in \Omega^{eV}$, $\forall c$, then $\int P^c_m\, dJ(c) \equiv \int P^c_{m'}\, dJ(c)\ \forall m$ and $m' \in \Omega^{eV}$ as well. That is, the same individual invariance of choice probabilities across pairs in any preference equivalence set will characterize the population (and, obviously, any sample from it up to sampling variability). This preference equivalence set property of RP models is extremely powerful, especially for more restrictive structures like expected utility which create several distinct kinds of preference equivalence sets: I use it frequently below. It is also a seductive property because it is a powerful identifying restriction and rationalizes many relatively casual inferences. For instance, we will see later that the usual conclusion drawn about the common ratio effect (that it is an EU violation) is sensible if RP is the true stochastic model. However, there is a downside to this property of RP models. Because few preference equivalence sets are shared by several different theories, tests of a preference equivalence set property are almost always joint tests of the RP model and some specific structure $V$. The only exception to this is the preference equivalence set of FOSD pairs, which is shared by many structures $V$. In stark contrast to the powerful preference equivalence set property, RP models generally have no stochastic transitivity property – not even WST – even if the structure considered is a transitive one like EU or RDEU (Loomes & Sugden, 1995; Fishburn, 1999). The reason for this is identical to the well-known "voting paradox" of public choice theory (Black, 1948): Unless all preference orderings in the urn have the property of single-peakedness, there need be no Condorcet winner.12 Those who regard tests of transitivity as central to empirical decision research may find this aspect of RPs deeply troubling.
Of course, a proponent of RPs might well accept a restriction to urns of preference orderings that do have the single-peakedness property.13 Proponents of rank-dependent models will similarly (sometimes) accept a shape restriction on weighting functions, utility functions, and/or value functions, in order to mollify critics who deem these models too flexible relative to the more restrictive EU alternative.
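The voting-paradox point is easy to make concrete. The sketch below is my own illustration, not an example from the chapter: the three-ordering ''urn'' is the classic Condorcet profile, in which every ordering is individually transitive and yet the induced binary choice probabilities cycle, violating WST.

```python
from fractions import Fraction

# An RP "urn" holding three equally likely strict preference orderings over
# options A, B, C -- each ordering is individually transitive.
urn = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def p_over(x, y):
    """Probability that a draw from the urn ranks x above y."""
    favorable = sum(1 for order in urn if order.index(x) < order.index(y))
    return Fraction(favorable, len(urn))

p_ab, p_bc, p_ac = p_over("A", "B"), p_over("B", "C"), p_over("A", "C")

# WST requires: p_ab >= 1/2 and p_bc >= 1/2 imply p_ac >= 1/2.
# Here p_ab = p_bc = 2/3 but p_ac = 1/3: no Condorcet winner, WST fails.
print(p_ab, p_bc, p_ac)
```

Restricting the urn to single-peaked orderings, as the text's footnote suggests, rules such profiles out.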
Stochastic Models for Binary Discrete Choice Under Risk

2.3. The Strong Utility and Strict Utility Models: The Fechnerian and ''Luce'' Models

Strong utility was first axiomatized by Debreu (1958), but it has a very old pedigree going back to Thurstone (1927) and Fechner (1966/1860), and many writers call it the Fechnerian model. Strong utility models attach behavioral meaning to V-distance: They assume that there exists an increasing function $F:\mathbb{R}\to[0,1]$, with $F(0) = 0.5$ and $F(x) = 1 - F(-x)$ (i.e., skew-symmetry about $x = 0$), such that

$$P^n_m = F(\lambda^n[V(S_m|\beta^n) - V(R_m|\beta^n)]) \qquad (11)$$
Because EU and RDEU are affine structures, the V-distance $V(S_m|\beta^n) - V(R_m|\beta^n)$ is unique only up to scale. From one theoretical viewpoint, then, strong utility models for EU or RDEU might seem ill-conceived, since the V-distance that is their argument is nonunique. We need to keep in mind, though, that the EU and RDEU structures are representations of underlying preference directions, not underlying choice probabilities. A more positive view of the matter is that both EU and RDEU imply that $\lambda^n$ is a free parameter that may be chosen to do stochastic descriptive work that is not these structures' primary descriptive purpose. No preference direction represented by the structure V is changed, for any fixed $\beta^n$, as $\lambda^n$ varies. However, a somewhat subtle problem still lurks within this line of thinking, having to do with the stochastic meaning of risk aversion: This problem is the inspiration for the contextual utility model introduced later. It is well-known that strong utility models imply SST (Morrison, 1963; Luce & Suppes, 1965).

There is a virtually one-to-one relationship between strong utility models and the homoscedastic ''latent variable models'' widely employed in empirical microeconomics for modeling discrete dependent variables. In general, such models assume that there is an underlying but unobserved continuous random latent variable $y^{n\star}_m$ such that $y^n_m = 1 \Leftrightarrow y^{n\star}_m \geq 0$; then $P^n_m = \Pr(y^{n\star}_m \geq 0)$. In our case, the latent variable takes the form $y^{n\star}_m = V(S_m|\beta^n) - V(R_m|\beta^n) - \sigma^n\varepsilon$, where $\varepsilon$ is a mean zero random variable with some standard variance and c.d.f. $F(x)$ such that $F(0) = 0.5$ and $F(x) = 1 - F(-x)$, usually assumed to be the standard normal or logistic c.d.f.14 The resulting latent variable model of a considered choice probability is then

$$P^n_m = F\!\left(\frac{V(S_m|\beta^n) - V(R_m|\beta^n)}{\sigma^n}\right) \qquad (12)$$

In latent variable models, the random variable $\sigma^n\varepsilon$ may be thought of as random computational, perceptual or evaluative noise in the decision maker's apprehension of $V(S_m|\beta^n) - V(R_m|\beta^n)$, with $\sigma^n$ being proportional to the standard deviation of this noise. As $\sigma^n$ approaches zero, considered
choice probabilities converge on zero or one, depending on the sign of $V(S_m|\beta^n) - V(R_m|\beta^n)$; in other words, the observed choice becomes increasingly likely to express the underlying preference direction. To complete the analogy with a strong utility model, one may interpret $\lambda^n$ as equivalent to $1/\sigma^n$. In keeping with common (but not universal) parlance, I will call $\lambda^n$ subject n's precision parameter. In strong utility models with a tremble, the stochastic parameter vector is $\zeta^n = (\lambda^n, \omega^n)$.

One of Luce's (1959) stochastic choice models, known as the strict utility model, may be thought of as a strong utility model in which natural logarithms of structural lottery values replace the lottery values themselves. It appears in contemporary applied work, where for example Holt and Laury (2002) write considered choice probabilities as

$$P^{cn}_m = \frac{V(S_m|\beta^n)^{\lambda^n}}{V(S_m|\beta^n)^{\lambda^n} + V(R_m|\beta^n)^{\lambda^n}} \qquad (13)$$
A little bit of algebra shows that this is equivalent to

$$P^{cn}_m = \Lambda(\lambda^n[\ln V(S_m|\beta^n) - \ln V(R_m|\beta^n)]) \qquad (14)$$
where $\Lambda(x) = [1 + e^{-x}]^{-1}$ is the logistic c.d.f. This resembles strong utility, but natural logarithms of V are differenced to create the latent variable, rather than differencing V itself: I will call this logarithmic V-distance. Note that strict utility algebraically requires strictly positive values of V. The nonparametric representation of utilities adopted earlier, that is, $U^n = (0, 1, u^n_2, u^n_3, \ldots)$ on the outcomes (0, 1, 2, 3, …), makes this so (since the minimum outcome 0 has a utility of zero), so this is satisfied for all lotteries except a sure zero outcome. Formally, there is a theoretical mismatch inherent in combining affine structures (such as EU or RDEU) with strict utility. An affine structure V is unique up to an affine transformation. Yet formally speaking, the axiom systems that produce strict utility models imply that the V within the stochastic specification is the stronger kind of scale known as a ratio scale, in which V must be strictly positive and is unique only up to a ratio transformation (Luce & Suppes, 1965). It is not entirely clear whether this theoretical mismatch implies any highly consequential and deep axiomatic incoherence. From a purely algebraic perspective, all we must do is choose a nonnegative utility of the minimum outcome in some experiment, and the combination will be well-defined. Yet as we will see below, an (EU,Strict) model will then have a rather peculiar property: It can explain common ratio effects on any context where the minimum outcome's utility is positive,
but not on any context where the minimum outcome's utility is zero. Strict utility has other relatively attractive properties across contexts, but contextual utility will share those properties without this odd ''scaling dependence'' vis-à-vis the common ratio effect. Before leaving strong and strict utility, note that several distinct models get called ''Luce'' models. This is not surprising, since Luce developed many different probabilistic choice models, but it does cause some confusion. The strict utility model in Eq. (13) is one of these, but this well-known model, which closely resembles it, also gets called ''the Luce model:''

$$P^{cn}_m = \frac{\exp[\lambda^n V(S_m|\beta^n)]}{\exp[\lambda^n V(S_m|\beta^n)] + \exp[\lambda^n V(R_m|\beta^n)]} \qquad (15)$$
Many readers will recognize this model as the one used by McKelvey and Palfrey (1995) for quantal response equilibrium and by Camerer and Ho (1999) for their EWA learning model. Under the terminology I am using here, this is a strong utility model because simple algebra shows it to be equivalent to Eq. (11) with a logistic c.d.f. for F. From the viewpoint of the affine structures EU and RDEU considered here, the crucial distinction between what I am calling strong and strict utility is whether the argument of the c.d.f. F is a V-distance or a logarithmic V-distance. All strict utility models are also strong utility models of a sort, in which natural logarithms of the structure V replace the structure V and the c.d.f. is the logistic, but not all strong utility models are strict utility models (Luce & Suppes, 1965). Since the nonlinear transformation of V by natural logarithms is a peculiar move for affine structures, I distinguish these particular models with the name ‘‘strict utility’’ and reserve the term ‘‘strong utility’’ for models in which V-distance, and not logarithmic V-distance, appears in the c.d.f. F.
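The two equivalences just described are purely algebraic and easy to check numerically. In the sketch below, the lottery values $V(S_m|\beta^n) = 2.0$ and $V(R_m|\beta^n) = 1.5$ and the precision $\lambda^n = 1.3$ are arbitrary assumptions of mine:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

v_s, v_r, lam = 2.0, 1.5, 1.3   # assumed lottery values (> 0) and precision

# Strict utility in its Holt-Laury power form, Eq. (13) ...
p13 = v_s**lam / (v_s**lam + v_r**lam)
# ... equals a logistic of the logarithmic V-distance, Eq. (14).
p14 = logistic(lam * (math.log(v_s) - math.log(v_r)))

# The "Luce model" of Eq. (15) ...
p15 = math.exp(lam * v_s) / (math.exp(lam * v_s) + math.exp(lam * v_r))
# ... is strong utility -- Eq. (11) with a logistic c.d.f. -- because its
# argument is the plain V-distance, not the logarithmic V-distance.
p11 = logistic(lam * (v_s - v_r))

print(p13, p14, p15, p11)
```

The first two probabilities coincide exactly, as do the last two, while the two model families give genuinely different numbers from the same V values.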
2.4. Moderate Utility I: The Wandering Vector Model

Econometrically, moderate utility models are heteroscedastic latent variable models, that is, models where the standard deviation of judgmental noise is conditioned on pairs m, so that we write $\sigma^n_m$, and considered choice probabilities become

$$P^{cn}_m = F\!\left(\frac{V(S_m|\beta^n) - V(R_m|\beta^n)}{\sigma^n_m}\right) \qquad (16)$$
Hey (1995) and Buschena and Zilberman (2000) have explored some heteroscedastic models for discrete choice under risk. Moderate utility models, however, place specific restrictions on the form of the heteroscedasticity, so as to guarantee MST. Again consider the three pairs associated with any lottery triple {C,D,E}, that is, the pairs {C,D}, {D,E}, and {C,E}, and call these pairs 1, 2, and 3, respectively. Then moderate utility models require that the standard deviations $\sigma^n_m$ behave like a distance measure or norm on these lottery pairs, satisfying the triangle inequality: That is, they require $\sigma^n_1 + \sigma^n_2 \geq \sigma^n_3$ across all such triples of pairs in order to satisfy MST (Halff, 1976; see Appendix A). Therefore, letting $d_m \equiv d(S_m, R_m)$ be a norm on pairs m, the moderate utility model is

$$P^{cn}_m = F\!\left(\frac{\lambda^n[V(S_m|\beta^n) - V(R_m|\beta^n)]}{d_m}\right) \qquad (17)$$

We can generate one class of moderate utility models by using a measure of distance between the probability vectors $S_m$ and $R_m$. The Minkowski norm $(\sum_{i=1}^{I} |s_{im} - r_{im}|^\alpha)^{1/\alpha}$ is an obvious choice for $d_m \equiv d(S_m, R_m)$ here; this would add the extra parameter $\alpha \geq 1$ to a model. Intuitively, such a norm is one measure of similarity between the lotteries in a pair, and these moderate utility models assert that for given V-distance, more similar lotteries are compared with less noise.15 Carroll (1980) pioneered a simple computational underpinning for this intuition, called the WV model. It implies the Euclidean norm $(\sum_{z=0}^{I-1}(s_{mz} - r_{mz})^2)^{1/2}$ as the proper choice for $d_m$, so that the WV model has no extra parameters. Therefore, we can compare the WV model to RPs, strong utility, and strict utility without taking a position on the value of parsimony. I illustrate it here for the expected utility structure. Suppose subject n has a noisy perception of her utilities of each outcome z; in particular, suppose this utility is a random variable $\tilde{u}^n_z = u^n_z + \xi^n_z$, where $\xi^n_z \sim N[0, (\sigma^n_u)^2]\ \forall z$ and $u^n_z$ is her mean utility of outcome z.

At each new trial of any pair m, assume that a new vector of noisy utility perceptions occurs, so that there is a new realization of the vector $(\xi^n_0, \xi^n_1, \ldots, \xi^n_{I-1})$ – the ''wandering'' part of the utility vector. Then by definition, $\sum_{z=0}^{I-1}(s_{mz} - r_{mz})\tilde{u}^n_z = V(S_m|\beta^n) - V(R_m|\beta^n) + \sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$, so that the error term of the latent variable is $\sigma^n_m\varepsilon \equiv \sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$. Since $\sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$ is a linear combination of normally distributed random variables, it is normally distributed too. If we further assume that $\mathrm{cov}(\xi_z, \xi_{z'}) = 0\ \forall\, z \neq z'$ – what Carroll and De Soete (1991) call the ''standard'' WV model – the variance of $\sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$ is then $(\sigma^n_u)^2 \sum_{z=0}^{I-1}(s_{mz} - r_{mz})^2$. Therefore, the standard deviation of the latent variable's error term, $\sigma^n_m$, becomes $\sigma^n_u(\sum_{z=0}^{I-1}(s_{mz} - r_{mz})^2)^{0.5}$. Thus the standard WV model is a moderate utility model of the Eq. (17) form, where $\lambda^n = 1/\sigma^n_u$, $d_m$ is the Euclidean norm, and F(x) is the standard normal c.d.f.
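A sketch of the standard WV model may help. The outcome utilities, lottery probability vectors, and σ_u below are my own illustrative assumptions; the code evaluates the closed-form moderate utility probability and then checks it by redrawing the ''wandering'' utility vector across simulated trials.

```python
import math, random

# Illustrative EU setup on outcomes z = 0..3 (all values are assumptions).
u = [0.0, 0.45, 0.8, 1.0]      # mean utilities u_z
s = [0.1, 0.5, 0.4, 0.0]       # safe lottery S_m (probabilities over z)
r = [0.4, 0.0, 0.3, 0.3]       # risky lottery R_m
sigma_u = 0.25                  # s.d. of each utility perturbation xi_z

dv = sum((si - ri) * uz for si, ri, uz in zip(s, r, u))        # V-distance
d_m = math.sqrt(sum((si - ri) ** 2 for si, ri in zip(s, r)))   # Euclidean norm

def phi(x):  # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Closed-form standard WV probability: Eq. (17) with lambda = 1/sigma_u,
# d_m the Euclidean norm, and F the standard normal c.d.f.
p_closed = phi(dv / (sigma_u * d_m))

# Monte Carlo check: redraw the wandering utility vector each trial and
# record how often the noisy V-distance favors the safe lottery.
random.seed(7)
trials = 200_000
wins = 0
for _ in range(trials):
    u_tilde = [uz + random.gauss(0.0, sigma_u) for uz in u]
    if sum((si - ri) * uz for si, ri, uz in zip(s, r, u_tilde)) >= 0.0:
        wins += 1
p_mc = wins / trials
print(p_closed, p_mc)
```

The simulated proportion agrees with the closed form up to sampling error, reflecting the variance algebra above.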
2.5. Moderate Utility II: Contextual Utility

Let $V(z|\beta^n)$ be subject n's structural value of a ''degenerate'' lottery that pays z with certainty. Contextual utility is a moderate utility model in which the distance norm is $d^n_m = V(z^{max}_m|\beta^n) - V(z^{min}_m|\beta^n)$,16 where $z^{max}_m$ and $z^{min}_m$ denote the maximum and minimum possible outcomes in the context $c_m$ of pair m. Thus, considered choice probabilities are

$$P^n_m = F\!\left(\lambda^n \, \frac{V(S_m|\beta^n) - V(R_m|\beta^n)}{V(z^{max}_m|\beta^n) - V(z^{min}_m|\beta^n)}\right) \qquad (18)$$

Contextual utility essentially asserts that the stochastic perceptual impact of V-distance in a pair is mediated by the range of possible outcome utilities in a pair, a notion which has a ring of psychophysical plausibility about it, and which has grounding in psychologists' experiments on categorization and models of error in categorization (Wilcox, 2007a). Econometrically, it is the assumption that the standard deviation of computational error $\sigma^n_m$ is proportional to the range of outcome utilities perceived by subject n in pair m. Although this is both pair- and subject-specific heteroscedasticity, it introduces no extra parameters into a model, since the form of the heteroscedasticity is entirely determined by pre-existing parameters (outcome utilities).

Contextual utility makes the stochastic implications of structural definitions of the MRA relation sensible within and across contexts. In affine structures such as EU and RDEU, the only truly unique characteristic of a utility function is a ratio of differences: Intuitively, contextual utility exploits this uniqueness to create a correspondence between structural and stochastic definitions of MRA. To see this, consider any three-outcome MPS pair on any context $c_m = (j,k,l)$. Under the RDEU structure and contextual utility, the choice probability in Eq. (18) can be rewritten as

$$P^n_m = F(\lambda^n[(w^s_{mk} - w^r_{mk})u^n_m + (w^s_{ml} - w^r_{ml})]), \ \text{where}\ u^n_m = \frac{u^n_k - u^n_j}{u^n_l - u^n_j} \qquad (19)$$
Since $w^s_{mk} - w^r_{mk} > 0$ in MPS pairs, Eq. (19) shows that $P^n_m$ is increasing in the ratio of differences $u^n_m$. Note the similarity between $u^n_m$ in Eq. (19) and $v^n_m$ in
Eq. (9) from the section on RPs. In both models, the three utilities on a context $c_m$ are reduced to a single ratio of differences by affine transformations (but they are not the same ratio, and the stochastic treatment of these ratios differs across the models). Consider two subjects, Anne and Bob: Assume that they have identical weighting functions (which includes the case where both have an EU structure) and that Bob's local absolute risk aversion $-u''(z)/u'(z)$ exceeds that of Anne for all z. The latter assumption, and simple algebra based on Pratt's (1964) theorem, then implies that $u^{Bob}_{mk} > u^{Anne}_{mk}$ on all contexts $c_m$. Formally, these conditions imply that Bob is more risk averse than Anne (or Bob $\succ_{mra}$ Anne) in the structural sense Chew, Karni, and Safra (1987) define for RDEU preferences: Although differences in weighting functions contribute to differences in risk aversion in RDEU models, we focus here on the ''traditional'' source of risk aversion associated with the curvature of utility functions by holding the weighting function constant across agents. Finally, assume that Bob and Anne are ''equally noisy'' decision makers (that is, $\lambda^{Bob} = \lambda^{Anne}$). It then follows from Eq. (19) that $P^{Bob}_m > P^{Anne}_m$ for all $m \in \Omega^{mps}$. This is a sensible (albeit strong) meaning of ''Bob is stochastically more risk averse than Anne,'' or Bob $\succ_{smra}$ Anne, and it closely resembles Hilton's (1989) definition of ''more risk averse in selection.'' Wilcox (2007a) also shows that under strong utility and strict utility, it is not possible for Bob $\succ_{mra}$ Anne to imply Bob $\succ_{smra}$ Anne across all contexts. It is important to notice that strong utility and contextual utility are observationally indistinguishable on a single context. This is easy to see. In a contextual utility model, we can redefine the precision parameter on context $c_m$ as $\lambda^n_m \equiv \lambda^n/[V(z^{max}_m|\beta^n) - V(z^{min}_m|\beta^n)]$; seen this way, we understand that contextual utility is a model with subject- and context-specific heteroscedasticity.
Obviously, when we confine attention to pairs on a single fixed context, we can ignore the context-dependence and suppress the subscript m on $\lambda^n_m$, writing $\lambda^n$ instead. So for any set of pairs on a fixed context, contextual utility behaves exactly as strong utility does. This is a useful fact: It means that any prediction of strong utility on a single context will also be true of contextual utility on a single context, and I use this repeatedly below. Notice too that this property implies that it is entirely pointless to compare the fit of strong and contextual utility using choice data in which no subject makes choices from pairs on several different contexts (e.g., the data set of Loomes & Sugden, 1998), since strong and contextual utility are observationally indistinguishable for such data. One can still estimate the contextual utility model with such data, however; and
for reasons discussed above, comparisons of risk aversion estimates across subjects will be potentially more meaningful for the purpose of prediction in other contexts.
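Both points (the single-context equivalence, and the cross-context contrast with strong utility) can be seen in a small sketch. Everything here is my own illustration, not the chapter's: power utility u(z) = z^ρ with Bob (ρ = 0.5) more risk averse than Anne (ρ = 0.9) in Pratt's sense, equal precisions, and on each context the MPS pair pitting the middle outcome for sure against a 50/50 mix of the extremes.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def u(z, rho):       # assumed power utility; lower rho => more risk averse
    return z ** rho

RHO_ANNE, RHO_BOB, LAM = 0.9, 0.5, 1.0
contexts = [(0, 1, 2), (0, 100, 200)]    # two outcome triples (j, k, l)

def v_distance(ctx, rho):
    # MPS pair: S pays k for sure; R pays j or l with probability 1/2 each.
    j, k, l = ctx
    return u(k, rho) - 0.5 * (u(j, rho) + u(l, rho))

def p_strong(ctx, rho):
    return logistic(LAM * v_distance(ctx, rho))

def p_contextual(ctx, rho):
    j, k, l = ctx
    u_range = u(l, rho) - u(j, rho)      # utility range of the context
    return logistic(LAM * v_distance(ctx, rho) / u_range)

for ctx in contexts:
    print(ctx, p_strong(ctx, RHO_ANNE), p_strong(ctx, RHO_BOB),
          p_contextual(ctx, RHO_ANNE), p_contextual(ctx, RHO_BOB))
```

With these assumptions, strong utility ranks Bob's safe-choice probability above Anne's on the low-stakes context but reverses the ranking on the high-stakes context; contextual utility ranks Bob above Anne on both, because dividing by the context's utility range removes the arbitrary scale of V. On any single context, of course, the two models coincide after rescaling λ.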
2.6. Other Models

There are many possible heteroscedastic models of choice under risk and uncertainty. It is possible to imagine a model resembling the WV model, in which it is probabilities (or weights, in the RDEU case) that are random variables rather than utilities. This possibility seems most compelling in choice under uncertainty, that is, when alternatives are acts with consequences in different states and no objective probabilities are available. In this situation, random variation in subjective probabilities of states across trials is a quite plausible conjecture. This is the initial point of departure for decision field theory, developed by Busemeyer and Townsend (1993). Decision field theory is an explicitly computational model based on random shifts of attention between states, formally reflected by random variation of subjective probability weights. Decision field theory explains many stylized facts of stochastic choice under uncertainty; for instance, it predicts the kinds of violations of SST and simple scalability observed in the psychological canon. Therefore, it obviously deserves serious attention. Although I do not consider decision field theory in detail here, I will refer to it often in discussing the other models. Blavatskyy (2007) has offered an interesting heteroscedastic model based on the notion that subjects will trim or truncate ''illogical'' errors. The idea here is that noise in the computation of V-distance should not exceed the logical bounds imposed by the utilities of the maximum and minimum outcomes of the lotteries in a pair. Blavatskyy calls this the ''internality axiom.'' The error truncation implies that the distribution of truncated errors depends on the lottery pair and does not have a zero median: In other words, evaluative errors have predictable biases in Blavatskyy's model.
Because of this, the truncated error model with an EU structure can explain phenomena, such as the four-fold pattern of risk aversion, that are normally thought of as demanding a rank-dependent structure like RDEU. It should be noted that Blavatskyy also adds heteroscedasticity to his model, without much comment, in a manner that closely resembles contextual utility; this may help to account for its good performance in Blavatskyy's tests. We ought also to expect stochastic consequences of pair complexity, and there is evidence of this (e.g., Sonsino, Benzion, & Mador, 2002), but
I refrain from any development of this here. Perhaps this kind of stochastic modeling deserves wider examination by experimentalists. Again, strong utility models are simple and powerful workhorses, and probably equal to many estimation tasks; but they have had descriptive problems that moderate utility models largely solved, at least in psychologists’ experimental canon. While many of those experiments did not follow the methodological dicta of experimental economics, they still give us a good reason to examine the WV model alongside strong utility and RP.
3. THE AMBIGUITY OF AVERAGE TREATMENT EFFECTS: A COMMON RATIO ILLUSTRATION

Stochastic models matter in part because they mediate the predictions of structures. This fact alone means that inferences about structures depend crucially on stochastic modeling assumptions. If, in addition, subjects are heterogeneous, structural inferences are still more difficult. The purpose of this section is to illustrate this in detail for the common ratio effect introduced in Section 1.3.1. Throughout this section it is assumed that the true structure of all subjects is EU: Therefore, in this section ''subject heterogeneity'' means only heterogeneity of subjects' utilities and/or stochastic parameter vectors. This kind of heterogeneity is, by itself, enough to make inferences from observed sample proportions very difficult without a strong stochastic identifying assumption such as the RP model. This is true even for within-subjects designs. Formally, the inference problem grows from the fact that structures are about preference directions, while stochastic models determine the observed magnitude of these preference directions as reflected by choice probabilities, and how these change across the pairs in a preference equivalence set of some theory, such as a common ratio set for EU. Structures play an important role in the reality of observed choice proportions, but stochastic models and subject heterogeneity play large and confounding roles too.

Throughout this section, I replace the subject index n by the subject type index c, and will assume that this is distributed J(c) in the sampled population. This represents heterogeneity in the sampled population. To think about this heterogeneity in the simplest possible terms in this section, consider a population of subjects with EU structures, composed of just two types $c \in \{S, R\}$. Type S (R) strictly prefers the safe (risky) lottery in all pairs in the common ratio set. Then $P_\tau \equiv \int P^c_\tau\,dJ(c) = \theta P^S_\tau + (1-\theta)P^R_\tau$ is the
expected population proportion of choices of $S_\tau$ from pair $\tau$, where $\int_{c=S} dJ(c) = \theta \in [0,1]$ denotes the proportion of the population that is type S. This two-type population mixture is used repeatedly below. Note that throughout this discussion, I assume that truly indifferent subjects are of zero measure in the population. This is to keep things simple: A nonzero fraction of truly indifferent subjects only complicates the discussion below without creating any special insights.

3.1. Predictions of the Stochastic Models in Common Ratio Sets

As discussed in Section 1.3.1, a common ratio set is composed of at least two pairs of the form $\{S_\tau, R_\tau\} \equiv \{(1-\tau s, \tau s, 0), (1-\tau r, 0, \tau r)\}$, both on one context $(j,k,l)$, where $s > r$. For the EU structure, we have $V(S_\tau|\beta^c) = (1-\tau s)u^c_j + \tau s u^c_k$ and $V(R_\tau|\beta^c) = (1-\tau r)u^c_j + \tau r u^c_l$. In general, consider two pairs $\{S_\tau, R_\tau\}$ and $\{S_{\tau'}, R_{\tau'}\}$ where $\tau > \tau'$.

3.1.1. Random Preferences and the Wandering Vector Model

RPs are the simplest of the predictions. Recall from Section 1.3.1 that common ratio sets are preference equivalence sets for EU. It immediately follows from the preference equivalence set property of RP models that an (EU,RP) model predicts that $P^c_\tau = P^c_{\tau'}$ for all c. Therefore, regardless of the distribution of c, population choice proportions are constant across the pairs of a common ratio set, so that $P_\tau = P_{\tau'}$. The WV model behaves in exactly the same way, but for a different reason. With the probability vectors in pair τ given by $\{(1-\tau s, \tau s, 0), (1-\tau r, 0, \tau r)\}$, and the distance $d_\tau$ between these vectors given by Euclidean distance, we have

$$d_\tau = ((\tau r - \tau s)^2 + (\tau s)^2 + (\tau r)^2)^{0.5} = \tau((r-s)^2 + s^2 + r^2)^{0.5} = \tau d_1 \qquad (20)$$
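A quick numerical check of Eq. (20) and of what it implies for the WV model (the probabilities s and r, the utilities, and σ_u below are illustrative assumptions of mine):

```python
import math

s_, r_ = 0.8, 0.6                  # s > r, as in the common ratio set
u_j, u_k, u_l = 0.0, 0.6, 1.0      # assumed EU utilities on context (j, k, l)
sigma_u = 0.3

def phi(x):                         # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wv_prob(tau):
    # V-distance and Euclidean norm for the pair with common ratio tau.
    dv = tau * ((r_ - s_) * u_j + s_ * u_k - r_ * u_l)
    d_tau = math.sqrt((tau * r_ - tau * s_) ** 2
                      + (tau * s_) ** 2 + (tau * r_) ** 2)
    return phi(dv / (sigma_u * d_tau))

# d_tau = tau * d_1, so tau cancels: the WV probability is flat in tau.
probs = [wv_prob(t) for t in (1.0, 0.5, 0.25, 0.1)]
print(probs)
```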
Recall that the V-distance in common ratio pairs is $\tau[(r-s)u^c_j + s u^c_k - r u^c_l]$, and recall that in the WV model, this V-distance is divided by the distance measure: Clearly, this division eliminates the common ratio τ. Therefore, the argument of F – the latent variable – will not depend on the common ratio in the WV model. As with (EU,RP) models, any (EU,WV) model requires $P^c_\tau = P^c_{\tau'}$ for all c and hence $P_\tau = P_{\tau'}$ in the population.

3.1.2. Strong Utility and Contextual Utility

Recall that on a given context, contextual utility and strong utility make the same predictions. Therefore, since pairs in a common ratio set are all on a
given context, we may treat strong and contextual utility together here. Recall that the V-distance in common ratio pairs is $\tau[(r-s)u^c_j + s u^c_k - r u^c_l]$. Note that $(r-s)u^c_j + s u^c_k - r u^c_l$ is the V-distance in the root pair (i.e., the pair with τ = 1) of the common ratio set: This is positive for S-type subjects (since they prefer the safe lotteries in the common ratio set's pairs) and negative for R-type subjects (since they prefer the risky lotteries in the common ratio set's pairs). Therefore, the V-distance is increasing in τ for S-types and decreasing in τ for R-types. In strong and contextual utility, choice probabilities are increasing in V-distance. Putting all this together, we have these predictions for strong and contextual utility:

$$P^S_\tau > P^S_{\tau'} > 0.5 \ \text{ and } \ P^R_\tau < P^R_{\tau'} < 0.5 \quad \forall\, \tau > \tau' \qquad (21)$$
To get a sense of possibilities with a very typical F and common ratio set, choose the logistic distribution for F, and consider the two pairs generated by τ = 1 (the root pair) and τ′ = 1/4. Let $\Delta^c \equiv \lambda^c[(r-s)u^c_j + s u^c_k - r u^c_l]$ be the latent variable (the argument of F) for a c-type subject in the τ = 1 root pair of the common ratio set. Then $P^c_1 = [1 + \exp(-\Delta^c)]^{-1}$ and $P^c_{1/4} = [1 + \exp(-\Delta^c/4)]^{-1}$. Fig. 1 illustrates the relationship between $P^c_1$ and $P^c_{1/4}$ for three possible S- and R-types. For the S-types, the three values of $\Delta^S$ considered are 15, 1.5, and 0.15, corresponding to a precise, moderate, and noisy S-type, respectively. Similarly, the three values −15, −1.5, and −0.15 for $\Delta^R$ correspond to a precise, moderate, and noisy R-type, respectively. Fig. 1 shows how the absolute value of $\Delta^c$ and the behavior of a typical c.d.f. such as the logistic conspire to create three distinctive possibilities for the pattern of strong or contextual utility choice probabilities over a pair of common ratio pairs. We could have very precise types, characterized by choice probabilities near one or zero in both the root pair and the τ = 1/4 pair. We could also have very noisy types, characterized by choice probabilities not much different from one-half in both pairs. For both of these types, the ordering relationship in Eq. (21) is reflected very weakly: In a population composed solely of such types, the hypotheses $P^c_1 = P^c_{1/4}\ \forall c$, and $P_1 = P_{1/4}$, would be hard to reject even at fairly large sample sizes. Put differently, it would be very difficult to tell this population from an (EU,RP) population, and neither would predict the common ratio effect understood as sample proportions supporting $P_1 > 1/2 > P_{1/4}$. Therefore, from the perspective of common ratio effects, interesting and distinctive (EU,Strong) or (EU,Contextual) populations must contain at least one of the ''moderate'' types shown in Fig. 1. These types show the
[Fig. 1 here. Fig. 1. Possible Choice Probability Patterns in a Common Ratio Set: Strong Utility and Contextual Utility. The figure plots the probability of choosing the safe lottery in the root pair and in the pair with τ = 1/4 for six types: precise, moderate, and noisy S-types (Δ = 15, 1.5, 0.15) and noisy, moderate, and precise R-types (Δ = −0.15, −1.5, −15).]
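Fig. 1's curves are just the two logistic formulas in the text evaluated at the six Δ values, which a few lines reproduce:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

deltas = {"precise S": 15.0, "moderate S": 1.5, "noisy S": 0.15,
          "noisy R": -0.15, "moderate R": -1.5, "precise R": -15.0}

# P_1 = logistic(Delta) in the root pair; P_1/4 = logistic(Delta / 4).
probs = {name: (logistic(d), logistic(d / 4.0)) for name, d in deltas.items()}
for name, (p1, p14) in probs.items():
    print(f"{name:>10}: P_1 = {p1:.3f}  P_1/4 = {p14:.3f}")
```

For the precise types both probabilities hug 0 or 1, and for the noisy types both hug one-half; only the moderate types display the Eq. (21) ordering at all strongly.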
distinctive ordering relationship in Eq. (21) strongly. Fig. 2 takes the choice probabilities for moderate S-types and precise R-types from Fig. 1, and mixes them according to θ = 0.7. That is, Fig. 2 shows a population made up of 70% moderate S-types and 30% precise R-types. The heavy line shows that in this population, we have $P_1$ = 0.57 and $P_{1/4}$ = 0.42. Thus, this is a population where the EU structure is the true structure for all subjects, and yet we expect to observe $P_1 > 1/2 > P_{1/4}$, a common ratio effect. We should, therefore, call this the false common ratio effect, since it is a possibility in a heterogeneous EU population with standard stochastic models. This possibility is a distinctive feature of strong utility and contextual utility models (and sometimes of strict utility, as will be clear below). It is worth dwelling on this example a bit, since it illustrates extremely well the ambiguities associated with typical casual and not-so-casual inferences. Consider, for instance, a within-subjects design where each subject n makes a choice both from the root pair and from the pair with τ = 1/4.
[Fig. 2 here. Fig. 2. The False Common Ratio Effect in a Two-Type (EU,Strong) or Two-Type (EU,Contextual) Population. The moderate S-type (70% of the population) has $P_1$ = 0.82 and $P_{1/4}$ = 0.59; the precise R-type (30%) has $P_1$ = 0.00 and $P_{1/4}$ = 0.02; the population line falls from $P_1$ = 0.57 to $P_{1/4}$ = 0.42.]
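The population line in Fig. 2 follows directly from the two-type mixture formula; a sketch reproducing it:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.7                                           # share of moderate S-types
p1_s, p14_s = logistic(1.5), logistic(1.5 / 4.0)      # about 0.82 and 0.59
p1_r, p14_r = logistic(-15.0), logistic(-15.0 / 4.0)  # about 0.00 and 0.02

# Population proportions: P_tau = theta * P_tau^S + (1 - theta) * P_tau^R.
p1 = theta * p1_s + (1.0 - theta) * p1_r
p14 = theta * p14_s + (1.0 - theta) * p14_r
print(round(p1, 2), round(p14, 2))   # 0.57 0.42 -- a "common ratio effect"
```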
If we are sampling from the population of Fig. 2, we will expect an asymmetry between ''predicted and unpredicted violations'' of deterministic EU structural expectations. There is a long history of regarding such observations as decisive evidence against the EU structure (Conlisk, 1989; Harless & Camerer, 1994). The formal basis for these inferences is a stochastic model called the constant error rate model, which was critically examined by Ballinger and Wilcox (1997). What would a population like that in Fig. 2 imply about this asymmetry? The event $(y^n_1 = 1 \cap y^n_{1/4} = 0)$, that is, ''subject n made the safe choice in the root pair and the risky choice in the τ = 1/4 pair,'' corresponds to the switch in preference predicted by (for instance) a deterministic RDEU structure with the Prelec (1998) weighting function. Similarly, the event $(y^n_1 = 0 \cap y^n_{1/4} = 1)$ corresponds to the switch in preference not predicted by that structure. Both events are violations of deterministic EU, but only the
former violation is predicted by an alternative (RDEU or prospect theory, with usual assumptions about weighting functions). Assuming random sampling from the population and statistically independent choices by each subject from each pair, the probabilities of the predicted and unpredicted violations for a randomly selected subject (and hence a sample) are

$$\Pr(y^n_1 = 1 \cap y^n_{1/4} = 0) = \theta P^S_1(1 - P^S_{1/4}) + (1-\theta)P^R_1(1 - P^R_{1/4}), \ \text{and}$$
$$\Pr(y^n_1 = 0 \cap y^n_{1/4} = 1) = \theta(1 - P^S_1)P^S_{1/4} + (1-\theta)(1 - P^R_1)P^R_{1/4} \qquad (22)$$

From the information in Fig. 2 and these equations, it is simple to calculate the expected proportion of both kinds of violations in that population: These are $\Pr(y^n_1 = 1 \cap y^n_{1/4} = 0) = 0.235$ and $\Pr(y^n_1 = 0 \cap y^n_{1/4} = 1) = 0.080$. That is, a heterogeneous (EU,Strong) or (EU,Contextual) population like that in Fig. 2 implies that ''predicted violations'' of deterministic EU will be three times more common than ''unpredicted violations.'' Consider samples of N = 80 subjects, drawn randomly from the Fig. 2 population. A simple Monte Carlo simulation shows that in about five out of six such samples, we would reject (at 5%, two-tailed) the hypothesis that predicted and unpredicted violations are equally likely, using the suggested test of Conlisk (1989), which is based (incorrectly for this population) on the constant error rate assumption described by Conlisk in his appendix and generalized by Harless and Camerer (1994).

The population in Fig. 2 also implies ''within-pair switching rates'' that are typical of lottery choice experiments. Experiments with repeated trials allow one to look at the degree of choice consistency. Suppose that in our experiment, subjects had two trials t = 1 and 2 of both the root pair and the pair with τ = 1/4. Adding back the trial subscript for a moment, the within-pair switching probability for any pair, for a randomly selected subject, is

$$W_\tau \equiv \Pr(y^n_{\tau,1} \neq y^n_{\tau,2}) = 2[\theta P^S_\tau(1 - P^S_\tau) + (1-\theta)P^R_\tau(1 - P^R_\tau)] \qquad (23)$$
Using the information in Fig. 2 and this equation, we have expected within-pair switching rates $W_1 = 0.209$ and $W_{1/4} = 0.351$; these are of the magnitude observed in the experimental canon on common ratio effects using repeated trials (e.g., Camerer, 1989; Starmer & Sugden, 1989; Ballinger & Wilcox, 1997).
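Both calculations are easy to replicate. One wrinkle: the sketch below uses exact logistic values throughout, so the violation probabilities come out near 0.233 and 0.083 rather than the chapter's 0.235 and 0.080 (which follow from the rounded Fig. 2 values); the switching rates match at 0.209 and 0.351.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.7
pS = {1.0: logistic(1.5), 0.25: logistic(1.5 / 4.0)}       # moderate S-type
pR = {1.0: logistic(-15.0), 0.25: logistic(-15.0 / 4.0)}   # precise R-type

# Eq. (22): predicted and unpredicted violations of deterministic EU.
predicted = (theta * pS[1.0] * (1 - pS[0.25])
             + (1 - theta) * pR[1.0] * (1 - pR[0.25]))
unpredicted = (theta * (1 - pS[1.0]) * pS[0.25]
               + (1 - theta) * (1 - pR[1.0]) * pR[0.25])

# Eq. (23): within-pair switching rate over two trials of a pair.
def switch_rate(tau):
    return 2 * (theta * pS[tau] * (1 - pS[tau])
                + (1 - theta) * pR[tau] * (1 - pR[tau]))

print(round(predicted, 3), round(unpredicted, 3))               # 0.233 0.083
print(round(switch_rate(1.0), 3), round(switch_rate(0.25), 3))  # 0.209 0.351
```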
3.1.3. Strict Utility

On any context where $u^c_j = 0$, that is, where the context's minimum outcome has zero utility, strict utility behaves just as the RP model and the WV model do in common ratio sets. However, if the common ratio set is defined on a context where $u^c_j > 0$, strict utility instead behaves just as strong utility and contextual utility do in common ratio sets. To analyze both cases, recall that strict utility uses a logarithmic V-distance as the latent variable in F. In a common ratio pair, this is

$$\ln[V(S_\tau|\beta^c)] - \ln[V(R_\tau|\beta^c)] = \ln[(1-\tau s)u^c_j + \tau s u^c_k] - \ln[(1-\tau r)u^c_j + \tau r u^c_l] \qquad (24)$$

If the common ratio set is defined on a context where $u^c_j = 0$, the right side of this equation is $\ln(\tau s u^c_k) - \ln(\tau r u^c_l) = \ln(s u^c_k) - \ln(r u^c_l)$. So it is clear that, in this instance, strict utility behaves just as the WV model does with the EU structure: The common ratio disappears from the latent variable, so $P^c_\tau = P^c_{\tau'}$ for all c and hence $P_\tau = P_{\tau'}$ in the population. In the case of a context where $u^c_j > 0$, the derivative of Eq. (24) with respect to τ is

$$\frac{\partial\{\ln[V(S_\tau|\beta^c)] - \ln[V(R_\tau|\beta^c)]\}}{\partial\tau} = \frac{s(u^c_k - u^c_j)}{u^c_j + \tau s(u^c_k - u^c_j)} - \frac{r(u^c_l - u^c_j)}{u^c_j + \tau r(u^c_l - u^c_j)} \qquad (25)$$

The two terms on the right share the form $b/(a + \tau b)$, differing only by b, and a simple differentiation shows this to be increasing in b. It follows that Eq. (25) has the same sign as $s(u^c_k - u^c_j) - r(u^c_l - u^c_j)$, and this is positive for S-types and negative for R-types. Therefore, when $u^c_j > 0$, strict utility allows the same patterns of choice probabilities shown in Eq. (21), and illustrated by Fig. 1, that strong utility and contextual utility do. The dependence of strict utility's predictions on the utility of the minimum outcome in the context of the common ratio set occurs because of the theoretical mismatch between the affine structure EU and the fact that strict utility requires a ratio scale.
Because of this, what is an arbitrary choice in deterministic EU – the choice of a zero for the utility function – is consequential when an EU structure is put into a strict utility model.
Stochastic Models for Binary Discrete Choice Under Risk
3.2. Summary
To summarize, all of the qualitative features of simple sample moments that are emphasized in the literature on the common ratio effect are reproducible by a heterogeneous EU structure population in which strong or contextual utility is the true stochastic model (and sometimes by the strict utility model too). Therefore, these qualitative findings cannot by themselves be the reason we dismiss the EU structure. This is what I mean by "the ambiguity of average treatment effects": Generally, their qualitative patterns are not by themselves capable of telling us which structures are true. To do that, we need to make explicit assumptions about stochastic models and the nature of heterogeneity in the sampled population. This realization is why authors such as Loomes et al. (2002) have revisited old data sets (Loomes & Sugden, 1998) and re-analyzed them with explicit attention to both stochastic models and heterogeneity.

The point of this discussion is not – what a miracle that would be – to explain away common ratio effects as mere aggregation phenomena with strong, strict, or contextual utility. Rather, it is that parts, perhaps substantial parts, of what we normally think of as violations of EU may be due to aggregation and stochastic models, rather than to nonlinear probability weighting arising from rank-dependent structure. Put differently, if we wish to measure the strength of nonlinear probability weighting properly, the examples show that we will necessarily need to take account of heterogeneity whenever we believe that the true stochastic model is strong, strict, or contextual utility. This is the important take-away message of this section.
4. PROPERTIES OF THE STOCHASTIC MODELS COMBINED WITH THE STRUCTURES

I now turn to a general listing of how the stochastic models of Section 2 combine with the EU and RDEU structures, in terms of the properties reviewed in Section 1. The previous section just did this in detail for the common ratio set property of the EU structure. Recall that stochastic transitivity properties (or the lack of them) were discussed in Section 2, as each stochastic model was introduced. Nevertheless, it will be interesting to consider the implications of models that obey SST as we look at sets of MPS pairs on a given context for EU. Throughout much of this section, I suppress both the subject and subject type superscripts (n or c) to keep
down notational clutter. But it is important to remember that the results described here are for individual subjects or subject types: As the previous section on the common ratio effect showed, many of these properties will be hidden, modified, or confounded by aggregation across different types of subjects. This is noted where it is important.
4.1. Mean Preserving Spreads, Stochastic Transitivities, and Betweenness

Recall that $\Omega_{mps}^c \subset \Omega_{mps}$ is the set of MPS pairs on any specific three-outcome context c. Section 1.3.2 showed that this is a preference equivalence set for EU. Obviously, any specific subset of any $\Omega_{mps}^c$ will also be a preference equivalence set for EU. A particularly interesting subset is any three MPS pairs $\{C^h, D^h\}$, $\{D^h, E^h\}$ and $\{C^h, E^h\}$, indexed by $hi = h1$, $h2$, and $h3$, respectively, generated by a triple h of lotteries $\{C^h, D^h, E^h\}$ with common expected value, all on one context c. In this instance, $E^h$ is a MPS of both $C^h$ and $D^h$, and $D^h$ is also a MPS of $C^h$: $C^h$ is safest, and $E^h$ riskiest, in such triples, with $D^h$ of moderate risk. Call such a set of three lotteries, and the three MPS pairs it generates, a spread triple. Table 1 shows three spread triples, each on a different context, that happen to occur in Hey's (2001) experimental design. The spread triples are indexed by $h \in \{1, 2, 3\}$ in the left column of the table. Under this indexing, for instance, given the spread triples in Table 1, $P_{23}$ is the probability that a subject chooses $C^2$ from the pair $hi = 23$, which is $\{C^2, E^2\}$, where $C^2$ and $E^2$ are as given in the second ($h = 2$) row of Table 1. After discussing the properties of the models, we will look at the data from Hey's experiment for these three spread triples.

4.1.1. Random Preferences and the Wandering Vector Model
$\Omega_{mps}^c$ is a preference equivalence set for EU. Therefore, the preference equivalence set property of the RP model implies that any (EU,RP) model requires that $P_m = P_{m'}$ for each subject, and hence the population, $\forall\, m, m' \in \Omega_{mps}^c$. In words, an (EU,RP) model requires that expected sample choice proportions for all MPS pairs on a given context are equal. This is of course true for spread triples too: Using the special indexing of spread triples, $P_{hi} = P_{hi'}\ \forall\, i$ and $i'$, given h, for each subject and hence the population.
None of this holds for (RDEU,RP) models, since $\Omega_{mps}^c$ is not in general a preference equivalence set of RDEU. As in the case of common ratio effects, it turns out that the WV model has precisely the same properties as the RP model for MPS pairs on a
Table 1. Spread Triples from Hey (2001).

Triple (h) | Context of Triple | $C^h$ | $D^h$ | $E^h$ | Common EV in Triple | Trials of $\{C^h, D^h\}$ | Trials of $\{D^h, E^h\}$ | Trials of $\{C^h, E^h\}$
1 | (0, £50, £100) | (0, 1, 0) | (3/8, 2/8, 3/8) | (4/8, 0, 4/8) | £50 | 5 | 10 | 10
2 | (0, £100, £150) | (2/8, 6/8, 0) | (3/8, 3/8, 2/8) | (4/8, 0, 4/8) | £75 | 5 | 5 | 5
3 | (£50, £100, £150) | (2/8, 6/8, 0) | (3/8, 4/8, 1/8) | (5/8, 0, 3/8) | £87.5 | 10 | 5 | 5
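The mean-preserving-spread relationships within each of Table 1's triples can be verified mechanically: each pair has equal means, and the riskier lottery's CDF, integrated upward through the context, never falls below the safer lottery's. A small sketch of such a check (the helper name is mine, not the chapter's):

```python
def is_mps(p_safe, p_risky, outcomes):
    """True if p_risky is a mean-preserving spread of p_safe on the
    ordered outcome vector `outcomes` (equal means, and the integrated
    CDF difference is nonnegative everywhere)."""
    mean = lambda p: sum(pi * z for pi, z in zip(p, outcomes))
    if abs(mean(p_safe) - mean(p_risky)) > 1e-9:
        return False
    gap, integral = 0.0, 0.0
    for i in range(len(outcomes) - 1):
        gap += p_risky[i] - p_safe[i]                 # F_risky - F_safe at z_i
        integral += gap * (outcomes[i + 1] - outcomes[i])
        if integral < -1e-9:
            return False
    return True

# The three spread triples of Table 1 (contexts in pounds):
triples = [
    ((0, 50, 100),   (0, 1, 0),     (3/8, 2/8, 3/8), (4/8, 0, 4/8)),
    ((0, 100, 150),  (2/8, 6/8, 0), (3/8, 3/8, 2/8), (4/8, 0, 4/8)),
    ((50, 100, 150), (2/8, 6/8, 0), (3/8, 4/8, 1/8), (5/8, 0, 3/8)),
]
for ctx, C, D, E in triples:
    assert is_mps(C, D, ctx) and is_mps(D, E, ctx) and is_mps(C, E, ctx)
    assert not is_mps(D, C, ctx)   # the reverse direction fails
```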
single context. This is proved in Appendix C, but the reason resembles what occurs in the case of the common ratio effect. It turns out that both the EU V-distance between lotteries, and the Euclidean distance between lotteries, are linear in the "size" of an MPS, defined as the difference between the probabilities of receiving the maximum outcome on a context. Hence, the ratio of the EU V-distance to the Euclidean distance is independent of the spread size, making (EU,WV) choice probabilities independent of the spread size. Moreover, choice probabilities in the (EU,WV) model turn out to be independent of the expected values of the lotteries in a MPS pair as well: They depend only on the context of the MPS pair and the subject's utilities of outcomes on that context.

It is well worth reflecting on this highly nonintuitive prediction of (EU,RP) and (EU,WV) models. Consider these two choice problems:

Problem I. Choose $100 for sure, or lottery (0.01, 0.98, 0.01) on the context ($75, $100, $125).

Problem II. Choose $100 for sure, or lottery (0.5, 0, 0.5) on the context ($75, $100, $125).

The increased risk of the lottery relative to the sure thing is much greater in Problem II than in Problem I. It would be trivially easy to show that any risk averter (in Pratt's sense) would associate a much larger risk premium with the lottery in Problem II than with the lottery in Problem I. Nevertheless, (EU,RP) and (EU,WV) models demand that the choice probabilities in these two problems be identical for each decision maker.17 Later we will see that RP models uniquely make the intuitively satisfying prediction that dominated lotteries are never chosen in an FOSD pair. Yet it is obvious here that RPs are equally capable of making astonishingly nonintuitive predictions. This illustrates one of my themes: If you are waiting for a stochastic model that is intuitively satisfying in every way, you are waiting for Godot.
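To put rough numbers on the risk-premium claim, here is a small sketch under an assumed concave utility (square-root utility is my choice for illustration; it is not from the chapter):

```python
import math

def certainty_equivalent(probs, outcomes, u, u_inv):
    """Certainty equivalent of a lottery under utility u with inverse u_inv."""
    eu = sum(p * u(z) for p, z in zip(probs, outcomes))
    return u_inv(eu)

context = (75, 100, 125)
u, u_inv = math.sqrt, lambda v: v * v   # an assumed risk-averse utility

# Both lotteries have expected value 100, so risk premium = 100 - CE.
premium_I  = 100 - certainty_equivalent((0.01, 0.98, 0.01), context, u, u_inv)
premium_II = 100 - certainty_equivalent((0.5, 0.0, 0.5), context, u, u_inv)
assert premium_II > 10 * premium_I   # Problem II's premium is far larger
```

Any strictly concave utility gives the same qualitative result, yet (EU,RP) and (EU,WV) predict identical choice probabilities in the two problems.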
Every stochastic model mutilates your structural intuition in some distinctive way: There is no escape from this.

4.1.2. Strict, Strong, and Contextual Utility
Strict and strong utility imply SST, and contextual utility implies SST on any given context, with any transitive structure such as EU or RDEU. So in any spread triple h, we must have SST for any subject. However, EU makes stronger predictions: It permits just two linear orderings of the three lotteries in any spread triple, since any three utilities will be either weakly concave or weakly convex. If a subject has weakly concave utilities on the
context of h, then $C^h \succsim D^h$ and $D^h \succsim E^h$; and if a subject instead has weakly convex utilities on the context of h, then $E^h \succsim D^h$ and $D^h \succsim C^h$. From the perspective of the algebraic form of EU, these are implications of Jensen's inequality. Alternatively, from an axiomatic perspective, this follows from the betweenness property of EU (see Appendix D). Therefore, we have either $V(C^h|\beta) \geq V(D^h|\beta) \geq V(E^h|\beta)$, or $V(E^h|\beta) \geq V(D^h|\beta) \geq V(C^h|\beta)$, which has two separate implications for an EU structure with strict, strong, or contextual utility.

The first implication reflects the fact that $D^h$ must be between $C^h$ and $E^h$ in preference, in any spread triple h:
$$\text{Either } \min(P_{h1}, P_{h2}) \geq 0.5 \text{ (for weakly risk-averse subjects)}$$
$$\text{or } \max(P_{h1}, P_{h2}) \leq 0.5 \text{ (for weakly risk-seeking subjects)} \quad (26)$$

The second implication includes Eq. (26) but adds SST to it:
$$\text{Either } \min(P_{h1}, P_{h2}) \geq 0.5 \text{ and } P_{h3} \geq \max(P_{h1}, P_{h2}) \text{ (for weakly risk-averse subjects)}$$
$$\text{or } \max(P_{h1}, P_{h2}) \leq 0.5 \text{ and } P_{h3} \leq \min(P_{h1}, P_{h2}) \text{ (for weakly risk-seeking subjects)} \quad (27)$$

Eq. (27) is essentially SST, but with Eq. (26) specifying exactly which pairs ($\{C^h, D^h\}$ and $\{D^h, E^h\}$) provide the antecedent of the SST implication, and which pair will be in the consequent of the SST implication (namely the pair $\{C^h, E^h\}$ containing the safest and riskiest lotteries of the spread triple). It should be noted that Eqs. (26) and (27) imply nothing across subjects: One might sample from a heterogeneous population that mixes risk-averse and risk-seeking subjects and, as with the common ratio effect, this mixing can hide these individual-level implications. Therefore, these implications should be tested at the individual level.

4.1.3. Stochastic Models are Consequential: An Illustration Using Hey's (2001) Spread Triples
Hey's (2001) experiment is a repeated trials design with at least five repetitions of all pairs (and ten repetitions of some pairs) for every subject, as shown in Table 1. This allows for tests of the predictions described above at the individual level – that is, one subject at a time. The data are on the three spread triples shown in Table 1. For each test, an unrestricted log likelihood is simply the sum of the sample log likelihoods for a subject at the observed choice proportions for each of the nine pairs in Table 1. A restricted log likelihood is then computed for a subject by finding the nine
choice probabilities that maximize the sample log likelihood for a subject with the restrictions described in the previous sections imposed on the nine choice probabilities. These are of course restrictions imposed within each spread triple, not across them. Table 2 reports likelihood ratio tests of the restrictions.

The (EU,RP) and (EU,WV) models require that choice probabilities within a spread triple are all equal. This is two restrictions per triple, or six restrictions in all for each subject. Therefore, under the null that the restrictions are true, twice the difference between the unrestricted and restricted log likelihoods will follow a $\chi^2$ distribution with six degrees of freedom. The results soundly reject the restriction: The first row of Table 2 shows that it is rejected at the 10% level of significance for nearly half of Hey's (2001) 53 subjects. A sum of independent $\chi^2$ variates also has a $\chi^2$ distribution, with degrees of freedom equal to the sum of the degrees of freedom of the variates. Treating each subject as an independent sample, we may then perform this test overall: The sum of the test statistics across subjects should follow a $\chi^2$ distribution with $53 \times 6 = 318$ degrees of freedom. The left column of Table 2 reports this statistic and its p-value, which is essentially zero for the (EU,RP) and (EU,WV) models.

The second row of Table 2 tests the betweenness implication (26) made by EU with strict, strong, or contextual utility. This is one restriction per triple. This is most easily seen by noticing that the single nonlinear constraint $(P_{h1} - 0.5)(P_{h2} - 0.5) \geq 0$ captures both allowable patterns of the implication (26). Across the three spread triples, then, the likelihood ratio test statistic against the implication will have three degrees of freedom for each subject. Table 2 shows that the implication is rejected at the 10% level for 13% of subjects – not an unexpected rate.
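The overall tests simply exploit the additivity of independent $\chi^2$ variates. A minimal sketch of the aggregation (the Wilson–Hilferty approximation to the $\chi^2$ upper tail is my illustration choice; in practice a statistics library would supply the exact CDF):

```python
import math

def chi2_sf(x, k):
    """Approximate upper-tail p-value of a chi-square statistic x with
    k degrees of freedom (Wilson-Hilferty normal approximation)."""
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))

def overall_test(subject_stats, df_per_subject):
    """Sum independent per-subject LR statistics; degrees of freedom add."""
    total_stat = sum(subject_stats)
    total_df = df_per_subject * len(subject_stats)
    return total_stat, total_df, chi2_sf(total_stat, total_df)

# With the aggregate values reported in Table 2:
# 664.98 on 318 df rejects decisively; 84.82 on 159 df does not.
assert chi2_sf(664.98, 318) < 0.001
assert chi2_sf(84.82, 159) > 0.95
```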
Summing the test statistics across subjects, the left column shows that the p-value against the implication for all subjects is unity. So the implication (26) appears to be broadly acceptable.

The third to fifth rows of Table 2 test the betweenness implication (26) separately for each of Hey's (2001) spread triples. Notice from Table 1 that the intermediate lottery $D^3$ of spread triple 3 has a relatively low but nonzero probability 1/8 of the highest outcome (£150) on the context of that triple. In RDEU and cumulative prospect theory, such lotteries are expected to be particularly attractive due to overweighting of small probabilities of the largest outcome, given contemporary wisdom about the shape of weighting functions (see e.g., Tversky & Kahneman, 1992). Therefore, if we are expecting any violation of betweenness in any of these spread triples, we ought to expect it here in triple 3. And in fact, Table 2 indicates that we have
Table 2. Tests of Predictions in Spread Triples, Using Spread Triples from Hey (2001).

Structure | Stochastic Model | Prediction | Overall $\chi^2$ (p-value) | Subjects Violating Prediction at 10% Significance (of 53)
EU | Random preferences or wandering vector | $P_{h1} = P_{h2} = P_{h3}$, for each h | $\chi^2(318) = 664.98$ ($p \approx 0$) | 49%
EU | Strong, strict, or contextual utility | Betweenness ($D^h$ of intermediate preference): $\min(P_{h1}, P_{h2}) \geq 0.5$, or $\max(P_{h1}, P_{h2}) \leq 0.5$ | $\chi^2(159) = 84.82$ ($p \approx 1$) | 13%
EU | Strong, strict, or contextual utility | Betweenness in triple h = 1 alone | $\chi^2(53) = 12.43$ ($p \approx 1$) | 4%
EU | Strong, strict, or contextual utility | Betweenness in triple h = 2 alone | $\chi^2(53) = 20.05$ ($p \approx 1$) | 2%
EU | Strong, strict, or contextual utility | Betweenness in triple h = 3 alone | $\chi^2(53) = 52.34$ ($p = 0.50$) | 11%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | Strong stochastic transitivity (SST) | $\chi^2(159) = 77.11$ ($p \approx 1$) | 4%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | SST in triple h = 1 alone | $\chi^2(53) = 10.52$ ($p \approx 1$) | 4%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | SST in triple h = 2 alone | $\chi^2(53) = 46.19$ ($p = 0.73$) | 11%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | SST in triple h = 3 alone | $\chi^2(53) = 20.39$ ($p \approx 1$) | 4%
EU | Strong, strict, or contextual utility | Betweenness and SST together | $\chi^2(318) = 230.70$ ($p \approx 1$) | 13%
EU | Strong or contextual utility | Equal spread size of pairs 1 and 2 in triple 2 yields the special prediction $P_{21} = P_{22}$ | $\chi^2(53) = 89.66$ ($p = 0.0012$) | 25%
more subjects violating betweenness at the 10% level of significance in triple 3 (13% of subjects) than in triples 1 and 2 (4% and 2% of subjects, respectively). Yet these rates of violation are low in all three triples: Summing the test statistics across subjects, the left column shows that the overall p-values against the implication never approach significance – not even in triple 3.

The sixth to ninth rows of Table 2 report the results of tests of SST alone in the three spread triples; SST is implied by either EU or RDEU with strong, strict, or contextual utility. The sixth row does this for all three triples together, while the seventh, eighth, and ninth rows do it for each of the triples separately. SST is actually just one restriction within each triple. While SST rules out two of the eight possible patterns of choice probabilities in the three choice pairs arising from any triple of lotteries, the two violating patterns are mutually exclusive. Only one can ever occur: That is, it is mathematically impossible for three choice probabilities to violate both restrictions at once. Therefore, twice the log likelihood difference (between the unrestricted model and the SST-restricted model in the three triples) follows a $\chi^2$ distribution with three degrees of freedom for each subject. The sixth row of Table 2 shows that this test rejects SST at the 10% level for just 6% of Hey's (2001) subjects. Summing the test statistics over subjects, the left column shows that the p-value against SST for all subjects is unity. So SST in spread triples appears to be broadly acceptable. The seventh, eighth, and ninth rows show similar results for the SST restriction in each of the three triples separately.

The tenth row of Table 2 displays test results against the restriction Eq. (27), which is the combination of the betweenness and SST restrictions implied by EU with strict, strong, or contextual utility.
This imposes two restrictions per triple, and so the likelihood ratio test statistics here have six degrees of freedom per subject across three spread triples. The test results reject Eq. (27) at the 10% level for 13% of Hey's (2001) subjects. Summing the test statistics over subjects, the left column shows that the p-value against Eq. (27) for all subjects is unity. So the combination of betweenness and SST in spread triples appears to be broadly acceptable.

The good performance so far of EU with strong, strict, or contextual utility is perhaps somewhat surprising, given the long history of problems with EU. The question naturally arises: Is there any evidence in these spread triples that seems to reject EU with these stochastic models? In fact there is a special testable equality restriction of EU with strong or contextual utility (but not strict utility) in the second spread triple shown in Table 1. It happens that the spread sizes in pairs 1 and 2 of triple $h = 2$, that is $\{C^2, D^2\}$ and $\{D^2, E^2\}$, are equal, which implies that the EU V-distance between the lotteries in
these two pairs is equal (see Table 1 and the definition of spread size in Appendix C). Therefore, EU with strong or contextual utility implies that $P_{21} = P_{22}$ for each subject. The last row of Table 2 shows that this restriction is rejected at the 10% level for 25% of Hey's (2001) subjects. Summing the test statistics over subjects, the p-value in the left column of Table 2 soundly rejects this restriction overall.

Table 2 illustrates one of my major themes very clearly: Stochastic models are consequential identifying restrictions for theory tests. From the perspective of RPs or the WV model, EU is rejected at the individual level for nearly half (49%) of Hey's (2001) 53 subjects. Yet from the perspective of strong and contextual utility, none of the testable implications in spread triples ever rejects EU for more than one-fourth of subjects (the special restriction in the last line of Table 2). For most of EU's predictions in spread triples with strong, strict, or contextual utility, the predictions are rejected for a percentage of subjects roughly equal to the size of the test – what one would expect if the predictions were essentially true for all subjects. This convincingly illustrates that stochastic model assumptions are crucial and consequential identifying restrictions for theory tests.

It is sobering to compare the inferences we might have made from Hey's (2001) data if we had depended wholly on simple sample moments, completely ignoring heterogeneity and stochastic models. Recall that Hey's (2001) spread triple 3 is the one where we should expect violations of betweenness according to today's conventional wisdom. Let $P_{3i}$ be the sample proportion of choices of the safer lottery from pair i of triple $h = 3$ in Hey's data set. In Hey's data set, we have $P_{31} = 0.413$ and $P_{32} = 0.740$.
With 10 trials per subject of pair 1 (which is $\{C^3, D^3\}$) and five trials per subject of pair 2 (which is $\{D^3, E^3\}$) in a sample of 53 subjects, the hypothesis that these sample proportions equal 0.5 will be soundly rejected by any statistical test. A certain style of inference would then be written: "Most subjects prefer $D^3$ to $C^3$, and most subjects prefer $D^3$ to $E^3$, and this violates betweenness in spread triple 3." Clearly, that is not the conclusion we draw from the test results in Table 2 – tests that respect heterogeneity simply by following the sound methodological example of Tversky (1969). Most data sets do not have enough repeated trials (most have none at all) to permit the disaggregated tests shown in Table 2. Yet the previous example illustrates how misleading aggregate tests can be. Therefore, we need statistical methods that can plausibly account for heterogeneity without treating every subject as a separate experiment (as done in Table 2). Linear mixture models (Harrison & Rutström, 2008 – this volume) are one approach to this. Later, I will describe the complementary random parameters approach.
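The individual-level restrictions (26) and (27) reduce to simple pattern checks on a subject's three choice probabilities in a spread triple. A minimal sketch (the function names are mine, not the chapter's):

```python
def satisfies_betweenness(p1, p2):
    """Implication (26): both 'safe' choice probabilities on the
    same side of 0.5."""
    return min(p1, p2) >= 0.5 or max(p1, p2) <= 0.5

def satisfies_betweenness_and_sst(p1, p2, p3):
    """Implication (27): betweenness plus strong stochastic transitivity."""
    return ((min(p1, p2) >= 0.5 and p3 >= max(p1, p2)) or
            (max(p1, p2) <= 0.5 and p3 <= min(p1, p2)))

# The aggregate proportions for triple 3 (0.413 and 0.740) form a pattern
# that would violate (26) if misread as one subject's probabilities:
assert not satisfies_betweenness(0.413, 0.740)
# A weakly risk-averse subject's pattern satisfies both restrictions:
assert satisfies_betweenness_and_sst(0.6, 0.7, 0.8)
```

This is exactly the sense in which the aggregate pattern can "violate" a restriction that nearly every individual subject satisfies.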
4.2. First-Order Stochastic Dominance

Recall from Section 1.3.3 that FOSD pairs are perhaps the only preference equivalence set that is common to a broad collection of structures including EU and RDEU. Sadly, none of the five stochastic models get the facts about FOSD remotely right. There is a computationally plausible fix-up for part of this problem based on trembles, but not for all of it. For the rest, one needs something like Busemeyer and Townsend's (1993) decision field theory – one reason that this theory deserves our close attention in the future.

4.2.1. Random Preferences
Because FOSD pairs are a preference equivalence set for both EU and RDEU, the preference equivalence set property of RP models implies what it always does in this case: All choice probabilities are equal for all FOSD pairs, for all subjects and hence the population. However, as mentioned in Section 1.3.3, all EU and RDEU preference orderings obey FOSD. In terms of RP intuition, there are no parameter vectors in the RP "urn" for which a dominated lottery is preferred. Therefore, the probability of choosing the stochastically dominating lottery (which by notational convention is $S_m$ in FOSD pairs) must always be 1. We therefore have $P_m = 1\ \forall\, m \in \Omega_{fosd}$, for all subjects and hence the population. Again, this is equally true of EU and RDEU structures with the RP model.

4.2.2. Strict Utility, Strong Utility, Contextual Utility, and the Wandering Vector Model
None of these models yields any special predictions about FOSD pairs. Neither V-distance (in strong and contextual utility), nor logarithmic V-distance (in strict utility), nor the Euclidean distance between lottery vectors (in the WV model) takes any "special notice" of FOSD. For this reason, they all (counter to intuition) predict at best a small change in choice probabilities, if any, as one passes from a basic pair to an FOSD pair by way of any small change that causes such a change in the classification of the lottery pair.

4.2.3. Transparent Dominance Violations as Tremble Events
It is now well known that in cases of transparent dominance, the probability that FOSD is violated is very close to zero, but still different from zero. It is difficult to define the distinction between transparent and nontransparent dominance (see Birnbaum & Navarrete, 1998, or Blavatskyy, 2007, for useful attempts), but "you know it when you see it." Here is an example of a
lottery pair that writers describe as a "transparent FOSD pair," taken from Hey (2001):

S: 3/8 chance of £50; 1/8 chance of £100; 4/8 chance of £150
R: 3/8 chance of £50; 2/8 chance of £100; 3/8 chance of £150

In the experiment of Loomes and Sugden (1998), subjects collectively violate FOSD in about 1.5% of transparent FOSD pair trials; a similar rate is observed by Hey (2001). Yet within-set switching probabilities for basic pairs are noticeably higher than this in all known experiments. Therefore, the continuity between basic and FOSD pairs that is suggested by all of the models except the RP model seems to be wrong. By contrast, the RP model's prediction seems to be approximately right in such transparent FOSD pairs. Yet the RP model's prediction that FOSD is never violated will cause the log likelihood of any RP model to be infinitely negative for any arbitrarily small but positive rate of FOSD violation in any sample, including the 1.5% rate reported above. So even in the case of the RP model, some kind of fix-up seems necessary.

For the RP model, the obvious solution is to add the possibility of tremble events, so as to give a choice probability $P_m^*$ slightly different from the considered choice probability $P_m$. Recall from Section 2.1 that this gives $P_m^* = (1 - \omega)P_m + \omega/2$, where $\omega$ is the tremble probability. Since $P_m = 1\ \forall\, m \in \Omega_{fosd}$ in an RP model, this implies that $P_m^* = 1 - \omega/2\ \forall\, m \in \Omega_{fosd}$ in an RP model "with trembles."

For the other models, we need a "processing story" in which subjects begin by screening pairs for transparent dominance. If such a relationship is not found, then the noisy evaluative processes that generate a considered choice probability are undertaken. But if transparent dominance is found, then these processes are not undertaken, since they are not necessary: A minimally sensible information processor simply would not put cognitive effort into such irrelevant computations after detecting dominance.
However, we do add the possibility of a tremble event, just as with the RP model. Letting $d_m = 1$ if $m \in \Omega_{fosd}$, and $d_m = 0$ otherwise, we can then write choice probabilities as follows:
$$P_m^* = (1 - \omega)[(1 - d_m)P_m + d_m] + \frac{\omega}{2} \quad (28)$$
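Eq. (28) is mechanical enough to state in a few lines of code; a sketch under my own naming, not the chapter's:

```python
def choice_prob(p_considered, omega, transparent_fosd):
    """Eq. (28): tremble-adjusted probability of choosing the safer lottery.

    p_considered     -- the model's considered choice probability P_m
    omega            -- tremble probability
    transparent_fosd -- True if the pair is a transparent FOSD pair (d_m = 1)
    """
    d = 1.0 if transparent_fosd else 0.0
    return (1.0 - omega) * ((1.0 - d) * p_considered + d) + omega / 2.0

# In a basic pair the considered probability is merely shrunk toward 0.5;
# in a transparent FOSD pair the result is 1 - omega/2 regardless of P_m.
assert choice_prob(0.7, 0.0, False) == 0.7
assert abs(choice_prob(0.2, 0.03, True) - (1 - 0.03 / 2)) < 1e-12
```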
It is worth noting that Eq. (28) is equally applicable to all five models, including the RP model. The explicit "transparent dominance detection" step introduced by $d_m = 1$ is not formally necessary for the RP model, but
Eq. (28) is identical to $P_m^* = (1 - \omega)P_m + \omega/2$ in the case of the RP model (since $P_m = 1$ whenever $d_m = 1$ with the RP model).

Loomes et al. (2002, p. 126) argue that "the low rate of [transparent] dominance violations…must count as evidence [favoring the RP model]" because they feel that other stochastic models, such as strong utility, do not predict this. I find this unpersuasive. As a question of relevance, the fact that the RP model gets transparent dominance roughly right says nothing about its descriptive or predictive adequacy in pairs where interesting tradeoffs are at stake – which, of course, is what really matters to the bulk of applied microeconomic theory and empirical microeconomics. As a question of econometric modeling, Eq. (28) shows that it is trivial to add the restriction $P_m^* = 1 - \omega/2\ \forall\, m \in \Omega_{fosd}$ to any stochastic model that already contains a tremble, with no additional parameters, so there is no loss of parsimony associated with such a modification. As a theoretical question, minimally sensible information processors will detect easy (i.e., transparent) dominance relationships and exploit them so as to conserve cognitive effort, as mentioned above. And finally, violations of nontransparent FOSD occur at rates far too high to be properly described as tremble events, and in such cases the other models would predict choice probabilities closer to the truth (though still qualitatively incorrect) than the RP model does. Let us turn to this.

4.2.4. Nontransparent Dominance Violations
The trouble with the RP model in particular, and with the other models generally, is that there are lottery pairs in which FOSD is not so transparently detectable. In such cases, the method of trembles is inadequate. Again, you know these when you see them.
Here is an example from Birnbaum and Navarrete (1998):

S*: 1/20 chance of $12; 1/20 chance of $14; 18/20 chance of $96
R*: 2/20 chance of $12; 1/20 chance of $90; 17/20 chance of $96

The majority of subjects in Birnbaum and Navarrete's experiment choose R* in this pair, even though S* dominates R*. Obviously, this cannot be explained as a tremble event, at least not one that occurs at the same very low probability that explains violations of transparent FOSD. Notice that this pair has a four-outcome context. This seems to be necessary for generating similar empirical examples, and the theoretical explanation offered by Birnbaum and Navarrete requires a four-outcome context.
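The dominance relation in this pair, however nontransparent to subjects, can be confirmed mechanically by comparing CDFs on the union of the two supports; a small sketch (probabilities are kept as integer twentieths to avoid floating-point comparisons):

```python
def cdf(lottery, points):
    """CDF of a {outcome: weight} lottery at each point in `points`."""
    return [sum(w for z, w in lottery.items() if z <= x) for x in points]

def first_order_dominates(a, b):
    """True if lottery a first-order stochastically dominates lottery b:
    a's CDF is everywhere <= b's, with strict inequality somewhere."""
    support = sorted(set(a) | set(b))
    Fa, Fb = cdf(a, support), cdf(b, support)
    return all(fa <= fb for fa, fb in zip(Fa, Fb)) and Fa != Fb

S = {12: 1, 14: 1, 96: 18}   # S* above, weights in twentieths
R = {12: 2, 90: 1, 96: 17}   # R* above
assert first_order_dominates(S, R)
assert not first_order_dominates(R, S)
```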
Busemeyer and Townsend's (1993) decision field theory also provides an intriguing explanation for nontransparent dominance violations.

4.2.5. Are FOSD Pairs Hydrogen or Hassium?
Violations of nontransparent FOSD appear to put all five stochastic models in a serious bind: None of them can accommodate such examples without some deus ex machina. But how worried should we be about this? If our subjects were oxygen and FOSD pairs were hydrogen, and we were physical or biological scientists, we would be terribly interested in FOSD pairs. Hydrogen, the most common element, plays a starring role in everything from stars to starfish. But if FOSD pairs are instead hassium, the situation is quite different. Hassium, with atomic number 108 and a half-life of 14 seconds, is one of the so-called transuranic elements – those things beyond uranium in the periodic table. Wikipedia says this about them:

All of [these] elements…have been first discovered artificially, and other than plutonium and neptunium, none occur naturally on earth…[Any] atoms of these elements, if they ever were present at the earth's formation, have long since decayed. Those that can be found on earth now are artificially generated…via nuclear reactors or particle accelerators.
Many economists expect dominated alternatives to be akin to transuranic elements – that is, they expect dominated alternatives to have short half-lives in the real economic world, outside of labs. That expectation relies, at least in part, on competition amongst sellers; therefore it is not obvious at all that laboratory violations of dominance imply anything about the survivability of dominated alternatives in any long-run equilibrium in the real world. A potential entrant may well be able to profit at the expense of an incumbent seller who (say) sells R* for a higher price than S* to consumers. The potential entrant can, after all, reframe the choice to expose the dominance relation just as easily as Birnbaum and Navarrete (1998) do:

S**: 1/20 chance of $12; 1/20 chance of $14; 1/20 chance of $96; 17/20 chance of $96
R**: 2/20 chance of $12; 0/20 chance of $14; 1/20 chance of $90; 17/20 chance of $96

An entrant can advertise the choice this way instead and call explicit attention to the chicanery of the incumbent in her advertisement. Expressed this way, few subjects choose R**. An experiment of this kind, allowing sellers to choose different frames for lotteries, as well as advertise
informatively to buyers with comment on other sellers' ads and offerings, could be quite interesting: Do "good frames" drive out "bad frames," or is it the other way around? Physicists create transuranic elements in labs to learn things about nuclear physics in general, and so it is with FOSD pairs in our own labs. We may learn a good deal about decision making by doing this. So it is not a waste of time to look at FOSD pairs. And even if dominated alternatives mostly cease to exist in equilibrium, they could be important out of equilibrium and hence on paths to equilibrium. The issue is really one of relative importance: We should be much more interested in pairs that we think will be common both in equilibrium and out of it. Those are pairs that contain interesting tradeoffs, such as risk-return tradeoffs.

The principle suggested by these thoughts is this: If there is a conflict in the explanatory power of stochastic models A and B that can be boiled down to "model A explains data better in pairs with interesting tradeoffs, or MPSs, etc., while model B explains data better only in FOSD pairs," then it seems to me that model A is the strongly favored choice. A corollary is this: Any argument in favor of any stochastic model, based solely on FOSD pairs, may be a relatively weak argument.
4.3. Context Shifts and Parametric Utility Functions

With any large number of outcomes, or for predicting choices with new outcomes, a parametric utility function for outcomes will frequently be required. Therefore, we need to know how these parametric forms behave when combined with each stochastic model. Recall the definitions of additive and proportional context shifts from Section 1.3.4. Define two stochastic model properties in terms of such shifts. Say that a stochastic model is CARA-neutral if P_m = P_m′ for all subjects, whenever m = {S_m, R_m} and m′ = {S_m′, R_m′} differ by an additive context shift and the utilities of outcomes are given by CARA utility functions. Similarly, say that a stochastic model is CRRA-neutral if P_m = P_m′ for all subjects, whenever m = {S_m, R_m} and m′ = {S_m′, R_m′} differ by a proportional context shift and the utilities of outcomes are given by CRRA utility functions. Only some of the stochastic models are CARA-neutral and CRRA-neutral. Throughout this section, the results will hold for both EU and RDEU structures. This is because the probability vectors in lottery pairs are definitionally constant across pairs that differ by an additive or proportional
Stochastic Models for Binary Discrete Choice Under Risk
context shift. Thus, probabilities (in EU) and weights (in RDEU) play no role in determining CARA- or CRRA-neutrality.

4.3.1. Random Preferences, Strict Utility, and Contextual Utility
Recall from Section 1.3.4 that when utilities follow the CARA utility function, sets of pairs that differ by an additive context shift are preference equivalence sets for both EU and RDEU. If all utility functions in a subject's RP "urn" are CARA utility functions, then, the preference equivalence set property of the RP model implies that P_m = P_m′ for all subjects, whenever m and m′ differ by an additive context shift. Section 1.3.4 also showed that when utilities follow the CRRA utility function, sets of pairs that differ by a proportional context shift are preference equivalence sets for both EU and RDEU; so similarly, the RP model with only CRRA utility functions "in the urn" implies that P_m = P_m′ for all subjects, whenever m and m′ differ by a proportional context shift. Therefore, RPs are both CARA- and CRRA-neutral.

The logarithmic V-distance form in strict utility gives it CARA- and CRRA-neutrality too. Taking natural logarithms through both of the identities (4), we have

ln[V(S_m′|β)] ≡ −ax + ln[V(S_m|β)] and ln[V(R_m′|β)] ≡ −ax + ln[V(R_m|β)]     (29)

for all subjects, for both EU and RDEU with CARA utility functions, whenever pairs m and m′ differ by an additive outcome shift x. It follows from Eq. (29) that

ln[V(S_m′|β)] − ln[V(R_m′|β)] ≡ ln[V(S_m|β)] − ln[V(R_m|β)]     (30)

so that strict utility's latent variable in F is constant across pairs that differ by an additive context shift. The choice probability is then constant across such pairs too, so strict utility is CARA-neutral. Similarly, taking natural logarithms through both of the identities (5), we have

ln[V(S_m′|β)] ≡ (1 − φ)ln(y) + ln[V(S_m|β)] and ln[V(R_m′|β)] ≡ (1 − φ)ln(y) + ln[V(R_m|β)]     (31)

for all subjects, for both EU and RDEU with CRRA utility functions, where pairs m and m′ differ by the proportional outcome shift y. Eq. (30) also follows from these two identities. So by the same kind of argument, strict utility is CRRA-neutral as well.
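Strict utility's CRRA-neutrality is easy to check numerically under EU. The sketch below uses illustrative lotteries and a CRRA exponent of my own choosing (none of these values come from the chapter): the log V-distance, and hence the choice probability, is unchanged by a proportional context shift.

```python
import math

phi = 0.4                               # illustrative CRRA coefficient
u = lambda z: z ** (1 - phi)            # CRRA utility (positive outcomes)

def eu(probs, outcomes):
    return sum(p * u(z) for p, z in zip(probs, outcomes))

context = (10.0, 50.0, 90.0)            # outcomes (j, k, l)
S = (0.1, 0.8, 0.1)                     # safe lottery probabilities
R = (0.45, 0.0, 0.55)                   # risky lottery probabilities

def log_v_distance(ctx):
    return math.log(eu(S, ctx)) - math.log(eu(R, ctx))

y = 20.0                                # proportional context shift
shifted = tuple(y * z for z in context)

# The latent variable ln V(S) - ln V(R) is identical on both contexts,
# so any c.d.f. F applied to it gives the same choice probability.
print(abs(log_v_distance(context) - log_v_distance(shifted)) < 1e-12)  # -> True
```

The common factor y^(1−φ) in both EU values cancels inside the log difference, exactly as in Eqs. (30) and (31).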
Because of contextual utility's ratio-of-differences form, EU and RDEU contextual utility models are also CARA- and CRRA-neutral. Recall from Eq. (19) that contextual utility's latent variable depends only on the ratio of differences u_m = (u_k − u_j)/(u_l − u_j) on context c_m = (j, k, l). Recall from Section 2.5 that with CARA utility, u_{z+x} = e^{−ax}u_z. Therefore, with CARA utility and an additive context shift c_m′ = (j + x, k + x, l + x), we have

u_m′ = (e^{−ax}u_k − e^{−ax}u_j)/(e^{−ax}u_l − e^{−ax}u_j) = (u_k − u_j)/(u_l − u_j) = u_m     (32)

With CARA utility, contextual utility's latent variable is therefore unchanged by an additive context shift, so it is CARA-neutral. Similarly, since u_{yz} = y^{1−φ}u_z with CRRA utility, a proportional context shift c_m′ = (yj, yk, yl) with CRRA utility gives

u_m′ = (y^{1−φ}u_k − y^{1−φ}u_j)/(y^{1−φ}u_l − y^{1−φ}u_j) = (u_k − u_j)/(u_l − u_j) = u_m     (33)

So with CRRA utility, contextual utility's latent variable is likewise unchanged by a proportional context shift, implying that it is CRRA-neutral too.

4.3.2. Strong Utility and the Wandering Vector Model
Strong utility and the WV model are neither CARA- nor CRRA-neutral. Consider first CARA utility on the context c_m′ = (j + x, k + x, l + x): From the identities (4), strong utility's latent variable is in this case

V(S_m′|β) − V(R_m′|β) ≡ e^{−ax}[V(S_m|β) − V(R_m|β)]     (34)

for any subject, for EU and RDEU. Taking the derivative with respect to x, we have

∂[V(S_m′|β) − V(R_m′|β)]/∂x = −a e^{−ax}[V(S_m|β) − V(R_m|β)]     (35)

Obviously, this implies that the latent variable in a strong utility model with CARA utility changes with an additive context shift. Therefore, strong utility is not CARA-neutral. For risk-averse subjects (those with a > 0), an additive context shift moves choice probabilities in the direction of indifference, while for risk-seeking subjects (those with a < 0), it moves choice probabilities away from indifference, making them more extreme. Similarly, if we have CRRA utility on the context c_m′ = (yj, yk, yl), the identities (5) imply that for any subject, both EU and RDEU, strong utility's
latent variable in pair m′ is

V(S_m′|β) − V(R_m′|β) ≡ y^{1−φ}[V(S_m|β) − V(R_m|β)]     (36)

Taking the derivative with respect to the proportional shift y, we have

∂[V(S_m′|β) − V(R_m′|β)]/∂y = (1 − φ)y^{−φ}[V(S_m|β) − V(R_m|β)]     (37)

Again, this implies that the latent variable in a strong utility model with CRRA utility generally changes with a proportional context shift. Therefore, strong utility is not CRRA-neutral. Because the CRRA utility function approaches ln(z) as φ → 1, call CRRA utility functions with φ > 1 "sublogarithmic." For subjects with sublogarithmic CRRA utility, a proportional context shift moves choice probabilities in the direction of indifference, while for other subjects with φ < 1, it moves choice probabilities away from indifference, making them more extreme.

All of these results apply equally to the WV model, since probability vectors are definitionally held constant across pairs m and m′ that differ by either an additive or proportional context shift. The Euclidean distance between probability vectors in such pairs is therefore constant across pairs: That is, d_m = d_m′ = d. Therefore, the derivatives in Eqs. (35) and (37) for strong utility simply differ by the factor d⁻¹ in the WV model, which is positive since d is a distance. So the WV model is neither CARA- nor CRRA-neutral.

4.3.3. Patterns of Risk Aversion Across Contexts: Stochastic Models Versus Structure
CARA- and CRRA-neutrality are important properties of stochastic models because they identify changes in risk-taking behavior across contexts as structural differences. Consider for instance the well-known experiment of Holt and Laury (2002). Holt and Laury examine binary choices from pairs on two four-outcome contexts that differ by a 20-fold (y = 20) proportional context shift.18 There is a general shift toward safer choices in pairs after the proportional context shift, which Holt and Laury interpret as increasing relative risk aversion. The results of this section demonstrate that this interpretation depends on an implicit stochastic identifying restriction. In particular, Holt and Laury (2002) implicitly assume that the true stochastic model is CRRA-neutral.
As we have seen, that could be RPs, strict utility, or contextual utility – all of these are CRRA-neutral. In fact, Holt and Laury go on to specify a strict
utility EU model, with a flexible "expo-power" utility function for maximum-likelihood estimation, and the estimates confirm their interpretation of increasing relative risk aversion. The results of this section basically imply that once Holt and Laury select strict utility, the estimation need not be done: If the probability of safer choices increases with a proportional context shift, strict utility must put this down to increasing relative risk aversion in the structural sense of that term, because strict utility is CRRA-neutral. It cannot do otherwise. This is not simply an academic point, since other stochastic models are not CRRA-neutral. For instance, suppose that strong utility is the true stochastic model, that CRRA EU is the true structure, and that most subjects have a (constant) coefficient of relative risk aversion between zero and one, which is typical of estimates (Harrison & Rutström, 2008 – this volume). Then Eq. (37) implies that if most subjects prefer the safe lottery in some pair m, a proportional context shift of that pair will increase their probability of choosing the safe lottery in the shifted pair. This is precisely what Holt and Laury (2002) report. The lesson here resembles that learned from the discussion of the common ratio effect, though in this case heterogeneity (mixing across different subject types) is not part of the problem. Qualitative patterns of risky choice do not by themselves tell us what structure we are looking at, because stochastic models interact with structure in nontrivial ways. To tell whether Holt and Laury (2002) are looking at increasing relative risk aversion and a CRRA-neutral stochastic model, or in contrast constant relative risk aversion and a stochastic model that is not CRRA-neutral, we need to do more. In the event, actual comparisons of log likelihoods (Harrison & Rutström, 2008 – this volume) suggest that Holt and Laury's conclusion was correct.
But separate estimations with different stochastic models, and a comparison of log likelihoods, were necessary to validate that conclusion: The qualitative pattern of results simply cannot decide the issue on its own. Econometrically, CARA- and CRRA-neutrality can be viewed as "desirable" features of a stochastic model precisely because of the strong structural identification implied by these properties. There is also a theoretical sense in which these are "nice" properties. In deterministic EU and RDEU, we single out CARA and CRRA utility functions for special notice because they create preference equivalence sets with additive and proportional context shifts, respectively. CARA- and CRRA-neutrality are intuitively satisfying stochastic choice reflections of these special deterministic preference properties. I understand and sympathize with this kind of
theoretical appeal: It resembles the appeal that contextual utility has by virtue of creating congruence between stochastic and structural definitions of MRA. It is not clear, though, that we are required to choose stochastic models that create properties that mirror the deterministic structure in some theoretically satisfying manner.
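The Holt–Laury point can be reproduced with a few lines of arithmetic. This is a sketch under assumed values – a logistic link with precision λ = 1 and a constant CRRA coefficient φ = 0.5, chosen by me and not taken from their estimates – showing a strong utility subject whose relative risk aversion is constant, yet whose safe-choice probability rises after a proportional context shift, just as Eq. (37) implies.

```python
import math

phi, lam = 0.5, 1.0                    # constant CRRA and precision (assumed)
u = lambda z: z ** (1 - phi)           # CRRA utility, u(0) = 0

def eu(lottery):                       # lottery: list of (prob, outcome)
    return sum(p * u(z) for p, z in lottery)

def p_safe(S, R):
    """Strong utility: P(choose S) = logistic(lam * [V(S) - V(R)])."""
    return 1.0 / (1.0 + math.exp(-lam * (eu(S) - eu(R))))

S = [(1.0, 10.0)]                      # safe: 10 for sure
R = [(0.5, 20.0), (0.5, 0.0)]          # risky: 20 or 0

scale = lambda lot, y: [(p, y * z) for p, z in lot]
p_small = p_safe(S, R)
p_big = p_safe(scale(S, 20), scale(R, 20))   # 20-fold proportional shift

# Relative risk aversion is constant here, yet safe choices become more
# likely after the shift -- mimicking "increasing relative risk aversion."
print(round(p_small, 3), round(p_big, 3))    # -> 0.716 0.984
```

The shift multiplies the latent V-difference by y^(1−φ) = √20 ≈ 4.47, so the logistic probability of the safe choice moves sharply away from indifference even though φ never changes.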
4.4. Simple Scalability

Because they satisfy SST with any transitive structure, strong and strict utility must satisfy simple scalability with the EU or RDEU structure. Recall that contextual utility is observationally identical to strong utility (and satisfies SST) for pairs that share the same context. Therefore, contextual utility must also satisfy simple scalability for pairs on the same context. However, the moderate utility models will not, in general, satisfy simple scalability. For instance, contextual utility can violate both SST and simple scalability across pairs that have different contexts. The WV model can violate both SST and simple scalability even for pairs that share the same context, since its heteroscedasticity varies across pairs with the same context (unlike contextual utility). Violations of simple scalability for pairs that share the same context would therefore reject strong, strict, and contextual utility in favor of the WV model or some other alternative, such as decision field theory (Busemeyer & Townsend, 1993), that permits heteroscedasticity across pairs with a common context.

Recall that simple scalability implies an ordering independence property of choice probabilities across special sets of four pairs, which we can call a quadruple. Let the pairs {C, E} and {D, E} be indexed by ce and de, respectively, and let the pairs {C, E′} and {D, E′} be indexed by ce′ and de′, respectively. Then simple scalability requires P_ce ≥ P_de iff P_ce′ ≥ P_de′. RP models do not in general require this, as shown by the following counterexample. Consider an urn with three linear orderings (from best to worst) in it: Two "copies" of the ordering DE′CE, and one of the ordering CE′ED. Supposing that each of these three orderings is equally likely to be drawn on any choice trial, we have P_ce = 1 and P_de = 2/3, but also P_ce′ = 1/3 and P_de′ = 2/3. So like the WV model and decision field theory, RPs do not in general need to satisfy simple scalability.
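The urn counterexample is mechanical enough to verify by enumeration (a sketch; E′ is written E2 as a plain identifier):

```python
# RP urn: three equally likely linear orderings, listed best to worst.
urn = [
    ["D", "E2", "C", "E"],   # first copy of the ordering D > E' > C > E
    ["D", "E2", "C", "E"],   # second copy
    ["C", "E2", "E", "D"],   # the ordering C > E' > E > D
]

def prob(a, b):
    """P(a chosen over b) = share of orderings ranking a above b."""
    hits = sum(1 for o in urn if o.index(a) < o.index(b))
    return hits / len(urn)

p_ce, p_de = prob("C", "E"), prob("D", "E")
p_ce2, p_de2 = prob("C", "E2"), prob("D", "E2")
print(p_ce, p_de, p_ce2, p_de2)

# Ordering independence (simple scalability) requires these differences
# to have the same sign -- but they do not:
print((p_ce - p_de) > 0 and (p_ce2 - p_de2) < 0)   # -> True
```

Against the standard lottery E, C beats D (1 versus 2/3); against E′ the ranking reverses (1/3 versus 2/3), which is exactly the violation claimed in the text.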
Though Hey's (2001) data set provides many opportunities to test this property, most of the suitable quadruples are ones where C first-order stochastically dominates D. Of course, the pair {C, D} is not itself involved in a test of the ordering independence property, and this property must still
hold when {C, D} is an FOSD pair. Unfortunately, such quadruples make less sharp distinctions between the stochastic models. For instance, if {C, D} is an FOSD pair, every linear ordering in which D precedes E must also be an ordering in which C precedes E (since linear orderings are transitive), for any structure that satisfies FOSD. Consider, then, an urn filled with linear orderings of the lotteries in the quadruple, and suppose {C, D} is an FOSD pair: The number of linear orderings in this urn for which D is preferred to E cannot exceed the number for which C is preferred to E. Therefore, even an RP model should obey the ordering independence property implied by simple scalability whenever {C, D} is an FOSD pair. Additionally, quadruples where the sample proportions P_ce and P_de (or P_ce′ and P_de′) happen to be very close to either zero or one cannot significantly violate the ordering independence property for any appreciable number of subjects: The constraint P^n_ce = P^n_de (or P^n_ce′ = P^n_de′) will necessarily be satisfied with little loss of fit for virtually every subject n in such cases. Therefore, ordering independence imposed as a constraint will necessarily result in little loss of fit, for virtually all subjects, in any such quadruple. Unfortunately, many potentially interesting quadruples in Hey's design have this sample characteristic in his data, making them relatively uninformative about simple scalability.

Table 3 shows the only pairs from Hey's (2001) data set that I regard as "suitable" for a test of simple scalability by the ordering independence property. Suitability is defined in two ways, based on the immediately preceding discussion. First, the pair {C, D} is not an FOSD pair; and second,
Table 3. Pairs from Hey (2001) Used for a Limited Test of Simple Scalability.

The lotteries, all on the context (£0, £50, £150):
C = (2/8, 3/8, 3/8), D = (3/8, 1/8, 4/8), E = (0, 7/8, 1/8), E′ = (1/8, 6/8, 1/8), E″ = (1/8, 7/8, 0)

The pairs:

  Pair      Sample proportion (choices of C or D)
  {C, E}    0.132
  {D, E}    0.117
  {C, E′}   0.498
  {D, E′}   0.309
  {C, E″}   0.626
  {D, E″}   0.449
  {C, D}    0.785
the sample proportions P_ce and P_de are in the interval [0.10, 0.90], and hence bounded away from zero and one, for each "standard lottery" E involved in these pairs. As Table 3 shows, there are six pairs all involving the same two lotteries C and D, which are each paired against three different "standard lotteries" denoted by E, E′, and E″. Thus, we have six pairs in all, in principle forming three quadruples. These are not three independent quadruples: If we impose ordering independence for any two of them, then ordering independence will hold in the third quadruple as well. Put differently, ordering independence for these six pairs is the imposition of the two nonlinear constraints (P^n_ce − P^n_de)(P^n_ce′ − P^n_de′) ≥ 0 and (P^n_ce′ − P^n_de′)(P^n_ce″ − P^n_de″) ≥ 0, for each subject n. Hey's design also happens to present the pair {C, D} directly, and the logic of the ordering independence property implies that we must also have P^n_ce ≥ P^n_de iff P^n_cd ≥ 0.5. Therefore, we can add each subject's choice data for the direct choice between C and D to the test, and add a third nonlinear constraint (P^n_ce − P^n_de)(P^n_cd − 0.5) ≥ 0 to the previous two for each subject n. As usual, with three constraints, twice the difference between the unrestricted and restricted log likelihood for each subject is distributed χ² with three degrees of freedom. This restriction is not rejected for any of Hey's 53 subjects at the 10% level (and obviously holds overall).

This is a "happenstance test" of simple scalability: I am simply working with what happens to be available in Hey's (2001) data set. Yet it is of some interest, since the three "standard lotteries" E, E′, and E″ happen to be distinctive (see Table 3). Lottery E has a zero probability of the lowest outcome on the context: This may call extra attention to the nonzero probabilities in C and D of receiving that lowest outcome, and that might make D look especially poor in comparison to E.
Likewise, lottery E″ has a zero probability of the highest outcome on the context: This may call extra attention to the nonzero probabilities in C and D of receiving the highest outcome, and that might make D look especially good in comparison to E″. So the test does in principle put some stress on simple scalability, which is intuitively the assumption that such differential effects of the standard of comparison are weak or nonexistent. On the other hand, I take little comfort from this test. Hey (2001) did not deliberately design his experiment as a test of simple scalability. The test performed here uses pairs that are entirely on a single context. The most robust violations of simple scalability found in the psychological canon involve pairs with different contexts: To explain these violations, we need theories like decision field theory and contextual utility that relax SST across contexts.19 Experimental economists need to deliberately set about testing
simple scalability with suitable designs that replicate and extend what psychologists have already done. Simple scalability is at the heart of latent variable approaches to modeling discrete choice under risk.
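As a purely descriptive companion to the happenstance test above, the pooled Table 3 proportions can be checked against the three ordering-independence constraints directly. This is only a sign check on pooled proportions, not the per-subject likelihood-ratio test actually reported:

```python
# Pooled sample proportions of C/D choices from Table 3
# (E' coded as ce2/de2, E'' coded as ce3/de3).
p = {"ce": 0.132, "de": 0.117, "ce2": 0.498, "de2": 0.309,
     "ce3": 0.626, "de3": 0.449, "cd": 0.785}

constraints = [
    (p["ce"] - p["de"]) * (p["ce2"] - p["de2"]),    # quadruple with E, E'
    (p["ce2"] - p["de2"]) * (p["ce3"] - p["de3"]),  # quadruple with E', E''
    (p["ce"] - p["de"]) * (p["cd"] - 0.5),          # direct {C, D} choice
]
# Ordering independence requires every product to be nonnegative.
print(all(c >= 0 for c in constraints))  # -> True
```

All three products are positive here, which is consistent with the report that the restriction is not rejected for any subject.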
4.5. Generalizability and Tractability: The Special Problem of Random Preferences

When contexts vary in a data set, or when we wish to predict choices from one context to another, the generalizability of stochastic models across contexts becomes an important consideration for choosing amongst them. For most of the stochastic models discussed here this is not a pressing issue. RP models, however, are inherently difficult to generalize across contexts with structures that are more complex than EU. In fact, RDEU models with RPs quickly become econometrically intractable except for special cases – and even in these special cases, generalizing the model across all contexts is not transparent. I now work out one such special case that illustrates these problems.

Consider an experiment such as Hey and Orme (1994) that uses four equally spaced money outcomes including zero. Let (0,1,2,3) denote such outcomes, and let a subject's random utility vector for the four outcomes be (0, 1, u_2, u_3), where u_2 ≥ 1 and u_3 ≥ u_2. In Hey and Orme (1994), as well as Hey (2001) and Harrison and Rutström (2005), pairs are on the four possible three-outcome contexts one may create from the four outcomes (0,1,2,3). Index the four contexts by their omitted outcome: For instance, c = ¬3 (read as "not outcome 3") indexes the context (0,1,2). The three left columns of Table 4 summarize these four contexts and their utility vectors.

Table 4. Contexts and Random Preference Representation for Experiments with Four Overlapping Three-Outcome Contexts.

  Context     Context c, with      Utility vector   v_m ≡ (u_l − u_k)/(u_k − u_j)   v_m in terms of the underlying
  index (c)   outcomes (j, k, l)   on context c     on context c                    random variables g_1 and g_2
  ¬3          (0,1,2)              (0, 1, u_2)      u_2 − 1                         g_1
  ¬2          (0,1,3)              (0, 1, u_3)      u_3 − 1                         g_1 + g_2
  ¬1          (0,2,3)              (0, u_2, u_3)    (u_3 − u_2)/u_2                 g_2/(g_1 + 1)
  ¬0          (1,2,3)              (1, u_2, u_3)    (u_3 − u_2)/(u_2 − 1)           g_2/g_1
It should be clear that random preference RDEU requires a choice for the joint distribution of the two utility parameters u_2 and u_3 in a subject's "RP urn." But in order to use the elegant specification of Loomes et al. (2002) introduced earlier, we will need to choose that joint distribution cleverly, so that v_m ≡ (u_l − u_k)/(u_k − u_j) has a tractable distribution on each context (j, k, l): This is because v_m is the key random variable of that specification, as shown in the discussion of Eqs. (9) and (10) earlier. To explore this, let g_1 ≡ u_2 − 1 ∈ R₊ and g_2 ≡ u_3 − u_2 ∈ R₊ be two underlying random variables generating the two random utilities as u_2 = 1 + g_1 and u_3 = 1 + g_1 + g_2. Then, algebra shows the following:

v_m = g_1             for pairs m on c = ¬3, that is the context (0,1,2);
v_m = g_1 + g_2       for pairs m on c = ¬2, that is the context (0,1,3);
v_m = g_2/(g_1 + 1)   for pairs m on c = ¬1, that is the context (0,2,3); and
v_m = g_2/g_1         for pairs m on c = ¬0, that is the context (1,2,3)     (38)
These results are also summarized in the two right columns of Table 4. With v_m expressed in terms of the two underlying random variables g_1 and g_2, we need a joint distribution of g_1 and g_2 that will generate tractable parametric distributions of as many of the context-specific forms taken by v_m as possible. The best choice I am aware of still only works for three of the four forms in Eq. (38). That choice is two independent gamma variates, each with the gamma distribution c.d.f. G(x|φ, k), with identical "scale" parameter k but possibly different "shape" parameters φ_1 and φ_2. Under this choice, v_m is distributed:

Gamma, with c.d.f. G(x|φ_1, k), for pairs m on context c = ¬3;
Gamma, with c.d.f. G(x|φ_1 + φ_2, k), for pairs m on context c = ¬2; and
Beta-prime, with c.d.f. B′(x|φ_2, φ_1), for pairs m on context c = ¬0     (39)
Sums of independent gamma variates with common scale have gamma distributions, and ratios of independent gamma variates with common scale have beta-prime distributions on R₊, also called "beta distributions of the second kind" (Aitchison, 1963).20 These assumptions also imply a joint
distribution of u_2 − 1 and u_3 − 1 known as "McKay's bivariate gamma distribution," with a correlation coefficient √(φ_1/(φ_1 + φ_2)) between u_2 and u_3 in the subject's "RP urn" (McKay, 1934; Hutchinson & Lai, 1990). Notice that Eq. (39) only involves three parameters – two shape parameters φ_1 and φ_2, and a scale parameter k. These parameters correspond to the three parameters one would find in the other four stochastic models when combined with EU, in the form of the (nonrandom) utilities u_2 and u_3 and the precision parameter λ. Of course, when combined with RDEU, all five models of choice probabilities would also include a weighting function parameter such as γ.

An acquaintance with the literature on estimation of random utility models may make these assumptions seem very special and unnecessary. They are very special, but this is because theories of risk preferences over money outcomes are very special relative to the kinds of preferences that typically get treated in that literature. Consider the classic example of transportation choice well known from Domencich and McFadden (1975). Certainly we expect the value of time and money to be correlated across the population of commuters. But for a single commuter making a specific choice between car and bus on a specific morning, we do not require a specific relationship between the disutility of commuting time and the marginal utility of income she happens to "draw" from her random utility urn on that particular morning. This gives us fairly wide latitude when we choose a distribution for the unobserved parts of her utilities of various commuting alternatives. This is definitely not true of any specific trial of her choice from a lottery pair m. The spirit of the RP model is that every preference ordering drawn from the urn obeys all properties of the preference structure (Loomes & Sugden, 1995).
We demand, for instance, that she "draw" a vector of outcome utilities that respects monotonicity in z; this implies that the joint distribution of u_2 and u_3 must have the property that u_3 ≥ u_2 ≥ 1. Moreover, the assumptions we make about the v_m must be probabilistically consistent across pair contexts. Choosing a joint distribution of u_2 and u_3 immediately implies exact commitments regarding the distribution of any and all functions of u_2 and u_3. The issue does not arise in a data set where subjects make choices from pairs on just one context, as in Loomes et al. (2002): In this simplest of cases, any distribution of v_m on R₊, including the lognormal choice they make, is a wholly legitimate hypothesis. But as soon as each subject makes choices from pairs on several different overlapping contexts, staying true to the demands of RP models is much more exacting. Unless we can specify a joint distribution of g_1 and g_2 that implies it, we are not entitled (for instance) to assume that v_m follows lognormal distributions in all of three overlapping contexts for a single subject.21 Put differently, a
choice of a joint distribution for v_m in two contexts has exact and inescapable implications for the distribution of v_m on a third context that shares outcomes with the first two contexts. Carbone (1997) correctly saw this in her treatment of EU with RPs. Under these circumstances, a cagey choice of the joint distribution of g_1 and g_2 is necessary.

RP models can be quite limiting in practical applications. For instance, notice that Eq. (39) gives no distribution of v_m on the context c = ¬1 (i.e., (0,2,3)). Fully a quarter of the data from experiments such as Hey and Orme (1994) are on that context. As far as I am aware, the following is a true statement, though I may yet see it disproved.

Conjecture. There is no nondegenerate joint distribution of g_1 and g_2 on (R₊)² such that g_1, g_1 + g_2, g_2/(g_1 + 1) and g_2/g_1 all have tractable parametric distributions.

Shortly I compare the stochastic models using Hey and Orme's data, and limit myself to just the choices subjects made from pairs on the contexts (0,1,2), (0,1,3) and (1,2,3). These are the contexts that the "independent gamma model" of RPs developed above can be applied to, and I am not aware of any alternative that would permit parametric estimation of random preference RDEU across all four contexts. There are no similar practical econometric modeling constraints on strict, strong, or contextual utility models, or WV models, with RDEU (a considerable practical point in their favor); these models are all applied with relative ease to choices on any number of different outcome contexts.

Specifications that adopt some parametric form for the utility of money, and then regard the randomness of preference as arising from the randomness of a utility function parameter, offer no obvious escape from these difficulties, at least for RDEU. For instance, if we adopt the CRRA form, it is fairly simple to show that this implies v_m = (l^{1−φ} − k^{1−φ})/(k^{1−φ} − j^{1−φ}), where (j, k, l) is the context of pair m. Substituting into Eq.
(10), we then have

P_m = Pr[(l^{1−φ} − k^{1−φ})/(k^{1−φ} − j^{1−φ}) ≤ (w(s_mk + s_ml|γ) − w(r_mk + r_ml|γ))/(w(r_ml|γ) − w(s_ml|γ))]     (40)

There are two possible routes for implementing this when 1 − φ is a random variable. The first is to solve the inequality in Eq. (40) for 1 − φ as a function of j, k, and l, the pair characteristics, and whatever parameters of w(q) we have. We could then choose a distribution for 1 − φ and be done. I invite
readers to try it with any popular weighting function: Contexts (0,1,2) and (0,1,3) are simple, but the context (1,2,3) is intractable. A second route is suggested by an approach that works well for the EU structure, where w(q) ≡ q. In the EU case, although we still cannot analytically solve Eq. (40) for all contexts, we can easily use numerical methods to find 1 − φ_m (to any desired degree of accuracy) prior to estimation, for each pair m on whatever context, such that

(l^{1−φ_m} − k^{1−φ_m})/(k^{1−φ_m} − j^{1−φ_m}) = (s_mk + s_ml − (r_mk + r_ml))/(r_ml − s_ml)     (41)
Here, φ_m is the coefficient of relative risk aversion that makes a subject indifferent between the lotteries in pair m. With this in hand, we can choose a distribution H_{1−φ}(x|α) for 1 − φ and use P_m = H_{1−φ}(1 − φ_m|α) as our model of considered choice probabilities under EU with RPs: The probability of choosing the safe lottery is simply the probability that the subject draws a coefficient of relative risk aversion larger than φ_m from her RP urn. For RDEU, however, 1 − φ_m is a function of any parameters of the weighting function w(q). In terms of well-known theory, risk aversion arises from both the utilities of money and the weighting function, so there is no unique coefficient of relative risk aversion, independent of the weighting function, that makes the subject indifferent between the lotteries in pair m. Therefore we cannot simply provide a constant 1 − φ_m to our model as we can with EU: We need the function 1 − φ_m(γ) (in the case of the Prelec weighting function with parameter γ) so that we can write P_m = H_{1−φ}[1 − φ_m(γ)|α]. But we have been here before: We cannot analytically solve Eq. (40) for this function, so it would have to be approximated numerically on the fly, for each pair m, within our estimation. Numerical methods probably exist for such tasks, but they are beyond my current knowledge. On the basis of this discussion, I think it fair to say that in the case of RDEU, RP models are much less generalizable (in the sense of econometric tractability) across contexts than are other stochastic models.
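The numerical route just described for the EU case is short enough to sketch. Everything here is illustrative: the pair, the bisection bracket, and the choice of H as a Normal(0.5, 0.3) distribution are all assumptions of mine, not features of any estimated model.

```python
import math

j, k, l = 1.0, 2.0, 3.0                  # pair context (j, k, l)
S = (0.0, 1.0, 0.0)                      # safe lottery: outcome k for sure
R = (0.4, 0.0, 0.6)                      # risky lottery (probabilities on j, k, l)

def lhs(t):
    """Left side of Eq. (41) with t = 1 - phi; increasing in t on this context."""
    return (l ** t - k ** t) / (k ** t - j ** t)

# Right side of Eq. (41): (s_mk + s_ml - (r_mk + r_ml)) / (r_ml - s_ml).
rhs = (S[1] + S[2] - (R[1] + R[2])) / (R[2] - S[2])

lo, hi = 1e-9, 5.0                       # bracket containing t_m = 1 - phi_m
for _ in range(200):                     # plain bisection
    mid = 0.5 * (lo + hi)
    if lhs(mid) < rhs:
        lo = mid
    else:
        hi = mid
t_m = 0.5 * (lo + hi)                    # 1 - phi_m for this pair

# P_m = H(1 - phi_m): probability of drawing phi >= phi_m from the urn.
# H = Normal(0.5, 0.3) purely for illustration -- not an estimate.
H = lambda x: 0.5 * (1.0 + math.erf((x - 0.5) / (0.3 * math.sqrt(2.0))))
print(round(t_m, 3), round(H(t_m), 3))
```

With these illustrative values the indifference point t_m falls near 0.24, and H(t_m) is the implied probability of the safe choice for this pair.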
4.6. Summary of Stochastic Model Properties Table 5 summarizes the properties of the stochastic models at a glance. The following conclusion is inescapable: All of the stochastic models, when combined with an EU structure, have a prediction or property which either (a) can be, and has been, taken as a violation of EU, or (b) is
No (false CRE possible with heterogeneity) No
Not without trembles Yes Yes No
No (false CRE possible with heterogeneity) No
Not without trembles
No
Yes
No
Invariance of choice probabilities to common ratio change with EU Invariance of choice probabilities in spread triple with EU Near-zero probability of choosing stochastically dominated lottery CARA and CRRA neutrality Tractable generalization across contexts Sensible stochastic meaning of ‘‘more risk averse’’ in Pratt’s sense
SST
Strict utility
SST
Strong utility
Yes
Yes
Yes
Not without trembles
SST within a context, MST across contexts No (false CRE possible with heterogeneity) No
Contextual utility
Stochastic Model Wandering vector
No
Yes
No
Possible but not always meaningful
Not for RDEU
Yes
Yes
Yes
Yes
Not without trembles
Yes
No
Random preferences
Yes
MST
A Summary of Stochastic Model Properties.
Stochastic transitivity
Property
Table 5.
Stochastic Models for Binary Discrete Choice Under Risk 257
258
NATHANIEL T. WILCOX
econometrically problematic. For instance, strong, strict, and contextual utility are all capable of producing the ‘‘false common ratio effect’’ described in Section 3.1.2 when subjects are heterogeneous; the WV model is neither CARA- nor CRRA-neutral; and RPs have no stochastic transitivity properties at all. My own view, in the end, is that it is rather pointless to single out some specific ‘‘weird’’ or ‘‘difficult’’ feature of a stochastic model as a criterion for rejecting it: Each model has its own weird and/or difficult features which are not shared by all models. This is just another way of saying that stochastic models are unavoidably consequential when it comes to discrete choice and choosing between structures such as EU and RDEU. It is simply wrong to claim otherwise.
5. AN OVERALL ECONOMETRIC COMPARISON OF THE STOCHASTIC MODELS

Henceforth, I refer to any combination of a structure and a stochastic model as a specification and denote these as ordered pairs. The two structures used here are denoted EU and RDEU as always, while the stochastic models will be denoted strong, strict, contextual, WV, and RP. For instance, the specification (EU,Contextual) is an EU structure paired with the contextual utility stochastic model, while the specification (RDEU,RP) is an RDEU structure paired with the RP stochastic model. On occasion an index for specifications will be helpful; let this be s, not to be confused with the subscripted s that are probabilities in a safe lottery, as in S_m = (s_mj, s_mk, s_ml). In Sections 3 and 4, specific predictions and properties of various specifications were discussed and in some cases tested with Hey’s (2001) data. These piecemeal tests have some usefulness because they identify specific ways in which specifications fail; in doing so, they can suggest specific avenues for theoretical improvement. Additionally, most of these tests are free of assumptions about cumulative distribution functions, functional forms, and/or parameter values. But these piecemeal tests confine attention to very narrow sets of lottery pairs. In the tests of Section 4, I used (in all) 16 lottery pairs from Hey’s data, but that data set has choices from 92 distinct pairs. There is an obvious danger associated with focusing attention only on sets of pairs where specifications deliver crisp predictions: We could miss the fact (if it is a fact) that some specifications have relatively good explanatory and/or predictive performance across broad sets of pairs,
even if they fail specific tests across narrow sets of pairs. So this question naturally arises: Which stochastic model, when combined with EU or RDEU, actually explains and predicts binary choice under risk best, in an overall sense – that is across a broad collection of pairs? In this section, I bring together what I regard as some of the best insights and methods, both large and small, for answering such questions. Although my particular combination of these insights and methods is unique, I do not view it as particularly innovative. All I really do here is combine and extend contributions made by many others. Readers can usefully view what I do here as an elaboration of Loomes et al.’s (2002) approach that allows for more pair contexts, more kinds of heterogeneity and more stochastic models, though it also calls on certain independent insights, such as those of Carbone (1997) about RP models. In the large, I add an emphasis on prediction as opposed to explanation – an emphasis that certainly precedes me (Busemeyer & Wang, 2000). There are things I will not do here. This chapter is a drama about stochastic models; the structures are ‘‘bit parts’’ in this play. My strategy is to write down and estimate specifications so that they all depend on equal numbers of parameters, conditional on their structure. The question I then focus on is this: Holding structure constant (i.e., given an EU or RDEU structure), which stochastic model performs best? With numbers of parameters deliberately equalized across stochastic models, holding structure constant, this question can then be answered without taking a position on the value of parsimony. Others will decide whether they are willing, for instance, to pay the extra parameters required to get the extra fit (if indeed there is any) of RDEU over EU: My rhetorical pose is that this does not interest me. 
Yet as will be clear later, this is more than a pose: The data may tell us that stochastic models are more consequential than structures, and this appears to be the case in prediction. Recently, finite mixture models have appeared, in which a population is viewed as a mixture of two or more specifications. For instance, Harrison and Rutström (2005) consider a ‘‘wedding’’ of EU and cumulative prospect theory, and Conte, Hey, and Moffatt (2007) later considered an alternative marriage of EU and RDEU. In both cases, the population is viewed as composed of some fraction f of one specification, and a fraction 1 − f of another. The fraction f then becomes an extra parameter to estimate. Without prejudice, I do not pursue this kind of heterogeneity here; consult Harrison and Rutström (2008 – this volume) for an example and discussion.
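The mixture likelihood just described can be sketched as follows; the component choice probabilities and data below are hypothetical stand-ins, not estimates from either cited study.

```python
import math

def mixture_log_likelihood(f, p_A, p_B, choices):
    """Log likelihood when a fraction f of the population follows specification A
    and a fraction 1 - f follows specification B.
    p_A[m], p_B[m]: each specification's probability of the safe choice on pair m.
    choices: one list of 0/1 safe-choice indicators per subject."""
    ll = 0.0
    for subject in choices:
        # A subject is type A with probability f and type B with probability
        # 1 - f, so the subject-level likelihoods are mixed, not the
        # choice-level probabilities.
        like_A = math.prod(p if y else 1.0 - p for y, p in zip(subject, p_A))
        like_B = math.prod(p if y else 1.0 - p for y, p in zip(subject, p_B))
        ll += math.log(f * like_A + (1.0 - f) * like_B)
    return ll

# Hypothetical probabilities and data: two pairs, three subjects.
p_A, p_B = [0.8, 0.7], [0.3, 0.4]
data = [[1, 1], [0, 0], [1, 0]]
print(mixture_log_likelihood(0.5, p_A, p_B, data))
```

Estimating f by maximum likelihood then means maximizing this function jointly over f and the parameters generating p_A and p_B.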
5.1. The Data and its Special Features

To compare models in an overall way, we need a suitable data set. Recall that strong utility and contextual utility are observationally identical on any one context. Therefore, data from any experiment where no subject makes choices from pairs on two or more contexts, such as Loomes and Sugden’s (1998) data, are not suitable: Such data cannot distinguish between strong and contextual utility. The experiment of Hey and Orme (1994), hereafter HO, is suitable since all subjects make choices from pairs on four distinct contexts. However, Section 4.5 showed that the tractable parametric version of random preference RDEU can only be extended across three of those contexts. Therefore, I confine attention to those three contexts: ‘‘The HO data’’ henceforth means the 150 choices Hey and Orme’s 80 subjects made from pairs on the contexts (0,£10,£20), (0,£10,£30), and (£10,£20,£30). As in Section 4.5, these contexts are denoted (0,1,2), (0,1,3), and (1,2,3), and indexed by their omitted outcome as 3, 2, and 0, respectively.

The HO design has another relatively nice feature. In Section 4.5, φ_m was defined as that coefficient of relative risk aversion that would produce indifference between the lotteries in basic pair m under the EU structure. Let φ_max and φ_min be the maximum and minimum values, respectively, of φ_m across the pairs used in some experiment. We can call [φ_min, φ_max] the identifying range of the experiment, since the experiment’s pairs cannot identify coefficients of relative risk aversion falling outside this range. A big identifying range is desirable if we suspect that the distribution of φ may have substantial tails in a sampled population.22 In Loomes and Sugden (1998), the identifying range is [0.32,0.68] for subjects choosing from pairs on the context (0,£10,£20), and [0.17,0.74] for subjects choosing from pairs on the context (0,£10,£30). For Harrison and Rutström’s (2005) subjects who make choices from ‘‘gain only’’ pairs on contexts formed from the outcomes (0,$5,$10,$15), the identifying range [0.15,2.05] is substantially broader. In HO’s design, we have a still broader identifying range of [0.71,2.87]. So the HO data is relatively attractive in this sense.

The HO experiment allowed subjects to express indifference between lotteries. HO model this with an added ‘‘threshold of discrimination’’ parameter within a strong utility model. An alternative parameter-free approach, and the one I take here, treats indifference in the manner suggested by decision theory, where the indifference relation S_m ∼^n R_m is defined as the intersection of two weak preference relations, that is, S_m ≿^n R_m and R_m ≿^n S_m. This suggests treating indifference responses as two responses in likelihood functions – one of S_m being chosen from pair m, and another of R_m
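The quantity φ_m can be computed numerically for any pair. The sketch below assumes a conventional CRRA form u(x) = x^(1−φ)/(1−φ) and a hypothetical pair (not one drawn from any of the cited designs), finding the indifference point by bisection; an experiment's identifying range is then just the minimum and maximum of these values across its pairs.

```python
import math

def crra_u(x, phi):
    """CRRA utility of a positive outcome x; log utility at phi = 1."""
    if abs(phi - 1.0) < 1e-12:
        return math.log(x)
    return x ** (1.0 - phi) / (1.0 - phi)

def eu_diff(safe, risky, phi):
    """EU(S) - EU(R), lotteries given as lists of (probability, outcome)."""
    eu = lambda lot: sum(p * crra_u(x, phi) for p, x in lot)
    return eu(safe) - eu(risky)

def indifference_phi(safe, risky, lo=-5.0, hi=5.0):
    """Bisection for phi_m, the CRRA coefficient at which EU(S) = EU(R).
    Assumes eu_diff changes sign on [lo, hi]."""
    f_lo = eu_diff(safe, risky, lo)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_lo * eu_diff(safe, risky, mid) <= 0.0:
            hi = mid
        else:
            lo = mid
            f_lo = eu_diff(safe, risky, lo)
    return 0.5 * (lo + hi)

# Hypothetical pair on a (10, 20, 30)-style context: 18 for sure versus a
# 50/50 gamble over 10 and 30.
phi_m = indifference_phi([(1.0, 18.0)], [(0.5, 10.0), (0.5, 30.0)])
print(phi_m)
```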
being chosen from pair m – but dividing that total log likelihood by two, since it is really based on just one independent observation. Formally, the definite choice of S_m by subject n adds ln(P_m^n) to the total log likelihood; the definite choice of R_m adds ln(1 − P_m^n) to that total; and indifference adds [ln(P_m^n) + ln(1 − P_m^n)]/2 to that total. See also Papke and Wooldridge (1996) and Andersen, Harrison, Lau, and Rutström (2008) for related justifications of this approach. The HO experiment contains no FOSD pairs; therefore, we need no special specification to account for low rates of violation of transparent FOSD. However, Moffatt and Peters (2001) found significant evidence of nonzero tremble probabilities using the HO data, so I nevertheless add a tremble probability to all specifications after the manner of Eq. (6). Using Hey’s (2001) still larger data set, which contains 125 observations on each of four contexts for each subject, I have estimated tremble probabilities ω^n separately on all four contexts, for each subject. This estimation reveals no significant correlation of these subject-specific estimates of ω^n across contexts, suggesting that there is no reliable between-subjects variance in tremble probabilities – that is, that ω^n = ω for all n in the sampled population – and I will henceforth assume that this is true in all cases. Under this assumption, likelihood functions are in all instances built from probabilities that contain this invariant tremble probability, P*_m^n = (1 − ω)P_m^n + ω/2 for all n. The discussion here concentrates wholly on specifications of P_m^n and its distribution in the sampled population.

5.2. Two Kinds of Comparisons: In-Sample Versus Out-of-Sample Fit

I compare the performance of specifications in two ways.
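Concretely, the indifference and tremble conventions just described amount to the following likelihood contributions (a minimal sketch; the function names are illustrative):

```python
import math

def tremble_prob(p, omega):
    """Tremble-adjusted choice probability, after the manner of Eq. (6):
    with probability omega the subject chooses at random between the pair."""
    return (1.0 - omega) * p + omega / 2.0

def ll_contribution(response, p_star):
    """Log-likelihood contribution of one response on pair m, where p_star is
    the (tremble-adjusted) probability of choosing the safe lottery S_m.
    'S' = safe chosen, 'R' = risky chosen, 'I' = indifference expressed."""
    if response == 'S':
        return math.log(p_star)
    if response == 'R':
        return math.log(1.0 - p_star)
    # Indifference counts as both responses, but divided by two because it
    # rests on just one independent observation.
    return (math.log(p_star) + math.log(1.0 - p_star)) / 2.0

p = tremble_prob(0.75, 0.04)   # underlying P = 0.75, tremble probability 0.04
print(ll_contribution('I', p))
```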
The first way (very common in this literature) is the ‘‘in-sample fit comparison.’’ Parameters are estimated for each specification by maximum likelihood, using choice data from all three of the HO contexts (0, 2, and 3), and the resulting log likelihoods of the specifications for all three contexts are compared. The second way, which is rare in this literature but well-known generally, compares the ‘‘out-of-sample’’ fit of specifications – that is, their ability to predict choices on pairs that are not used in estimation. For these comparisons, parameters are again estimated for each specification by maximum likelihood, but using only choice data from the two HO contexts 2 and 3, that is, contexts (0,1,3) and (0,1,2). These estimated parameters are then used to predict
choice probabilities and calculate the log likelihood of observed choices on HO context 0, that is, context (1,2,3), for each specification. This is something more than a simple out-of-sample prediction, which could simply be a prediction to new choices made from pairs on the same contexts used for estimation – what Busemeyer and Wang (2000) call ‘‘cross-validation’’: It is additionally an ‘‘out-of-context’’ prediction, which Busemeyer and Wang call ‘‘generalization.’’ This particular kind of out-of-sample fit comparison may be quite difficult in the HO data. Relatively safe choices are the norm for pairs on the contexts 2 and 3 of the HO data: The mean proportion of safe choices made by HO subjects in these contexts is 0.764, and at the individual level this proportion exceeds ½ for 70 of the 80 subjects. But relatively risky choices are the norm for pairs on the context 0 of the HO data: The mean proportion of safe choices there is just 0.379, and falls short of ½ for 58 of the 80 subjects. Out-of-sample prediction will be difficult: From largely safe choices in the ‘‘estimation contexts’’ 2 and 3, specifications need to predict largely risky choices in the ‘‘prediction context’’ 0.
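In code, the distinction between the two comparisons is just a matter of which choices enter the estimation and which enter the scored likelihood. A minimal sketch with a hypothetical one-parameter model and made-up data, standing in for the actual specifications and the HO pairs:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_lik(theta, pairs):
    """pairs: list of (z_m, y_m), with z_m a pair-level covariate and y_m the
    observed safe choice (1) or risky choice (0); this one-parameter logistic
    model is a hypothetical stand-in for the specifications in the text."""
    total = 0.0
    for z, y in pairs:
        p = logistic(theta * z)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

# Made-up pairs grouped by context: estimate on the 'estimation contexts',
# then score the fitted parameter on the held-out 'prediction context'.
estimation_pairs = [(1.0, 1), (0.5, 1), (1.5, 1), (0.8, 0)]   # contexts 2 and 3
prediction_pairs = [(-0.5, 0), (-1.0, 0), (-0.2, 1)]          # context 0

grid = [i / 100.0 for i in range(-300, 301)]
theta_hat = max(grid, key=lambda t: log_lik(t, estimation_pairs))
in_sample_ll = log_lik(theta_hat, estimation_pairs)      # in-sample fit
out_of_sample_ll = log_lik(theta_hat, prediction_pairs)  # out-of-context fit
print(theta_hat, in_sample_ll, out_of_sample_ll)
```

Cross-validation would instead hold out new choices from the estimation contexts themselves; generalization, as here, scores a context never seen in estimation.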
5.3. Choosing an Approach to the Utility of Money

The apparent switch in the balance of safe choices across contexts has its counterpart in Hey and Orme’s (1994) estimation results. HO estimate a variety of structures combined with strong utility, and estimate these specifications individually – that is, each structure is estimated separately for each subject n, using strong utility as the stochastic model. Additionally, for all structures that specify a utility function on outcomes, HO take a nonparametric approach to the utility function. Given the latent variable form of strong utility models and the affine transformation property of the utility of money, just three of the five potential parameters λ^n, u_0^n, u_1^n, u_2^n, and u_3^n are identified. HO set u_0^n = 0 and λ^n = 1, and estimate u_1^n, u_2^n, and u_3^n directly. This allows the utility function to take arbitrary shapes across the outcome vector (0,1,2,3). HO found that estimated utility functions overwhelmingly fall into two classes: Concave utility functions, and inflected utility functions that are concave on the context (0,1,2) but convex on the context (1,2,3). The latter class is quite common, accounting for 30–40% of subjects (depending on the structure estimated). Because of this, I follow HO and avoid simple parametric functional forms (such as CARA or CRRA) that force concavity or convexity across the entire outcome vector (0,1,2,3), instead adopting their nonparametric
treatment of utility functions in strong, strict, contextual, and WV specifications – that is, non-RP specifications. This seems especially advisable here, where the focus is on the performance of the stochastic models. However, I set u_0^n = 0 and u_1^n = 1 for all subjects n, and view λ^n, u_2^n, and u_3^n as the individual parameters of interest in non-RP specifications, in keeping with the parameterization conventions of this chapter. The similar move in the case of RP specifications is allowing independent draws of the gamma variates g_1^n and g_2^n that determine the distribution of subject n’s random utilities u_2^n and u_3^n; for this purpose we view the shape parameters f_1^n and f_2^n, and the scale parameter k^n, as the individual parameters of interest. This also allows for both the concave and inflected shapes of mean utility functions across subjects, as reported by Hey and Orme (1994).
5.4. Allowing for Heterogeneity

One of my themes has been that aggressive aggregation can destroy or distort specification predictions and properties at the level of individuals when subjects in fact differ, as illustrated earlier in Sections 3.1.2 and 4.1.3. Therefore, it seems prudent to allow for heterogeneity in econometric comparisons of specifications. There are several different ways to approach heterogeneity. Perhaps the most obvious way is to treat every subject separately, estimating parameters of specifications separately for every subject: Call this individual estimation. This approach has much to recommend it in principle, and admirable exemplars both in economics (Hey & Orme, 1994; Hey, 2001) and psychology (Tversky, 1969). If individual subject samples were ‘‘large’’ in the sense that they were big enough for asymptotic properties to approximately hold true with individual estimation, there would perhaps be nothing left to say. Many would say that in this case, individual estimation dominates any alternative for the purpose of evaluating stochastic models of individual behavior. In the HO data, we have 150 observations per subject. Is this sample size ‘‘large’’ in the aforementioned sense? Each discrete choice carries very little information about any hypothesized continuous latent construct we wish to estimate, such as parameters of a V-distance or the precision parameter λ. Additionally, estimating k parameters of a nonlinear function is very different from estimating effects of k orthogonal regressors, such as k independently varied treatment variables. This is because the first derivatives of a nonlinear function with
respect to parameters, which play the mathematical role of regressors in nonlinear estimation, are usually correlated with one another (as orthogonal treatment indicators are not). For both these reasons (because our data is discrete and our specifications are nonlinear) estimation of specifications of discrete choice under risk is potentially a very data-hungry enterprise – much more so than intuition might suggest. In Wilcox (2007b), Monte Carlo simulations suggest that for the purpose of in-sample comparisons of the fit of different stochastic models, the HO data is indeed ‘‘large’’ in the aforementioned sense. For instance, consider the 100 HO data set observations of choices on the contexts (0,1,2) and (0,1,3). Monte Carlo methods allow us to create simulated data sets that resemble this real data set, except that a particular specification can be made the ‘‘true’’ specification or ‘‘data-generating process’’ in the simulated data sets. We can estimate both the true specification and other specifications on such simulated data, and see whether log likelihood comparisons correctly choose the true specification. In fact, this seems to be the case in most such simulated data sets with individual estimation when the fit comparison is confined to the same choice data used for estimation – that is, for in-sample fit comparisons. This nice result does not hold for out-of-sample comparisons. We can also create simulated data sets of 150 choices on the three contexts (0,1,2), (0,1,3), and (1,2,3), where again we know the true specification or data-generating process. We can again perform individual estimation using the data on the contexts (0,1,2) and (0,1,3), but now use those estimates to predict choices and compute out-of-sample log likelihoods for the choices from pairs on the context (1,2,3). We can then see whether comparisons of these out-of-sample log likelihoods correctly choose the true specification. 
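The model-recovery logic of such a Monte Carlo study can be sketched generically: simulate data from a known ‘‘true’’ family, fit rival families by maximum likelihood, and count how often in-sample comparisons pick the truth. The two one-parameter families below are hypothetical stand-ins, not the chapter's specifications.

```python
import math, random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_choices(z_values, theta, rng):
    """Simulate 0/1 choices from the 'true' family A: P(safe) = logistic(theta*z)."""
    return [(z, 1 if rng.random() < logistic(theta * z) else 0) for z in z_values]

def max_log_lik(prob_fn, data):
    """Crude grid-search maximum likelihood for a one-parameter family."""
    def ll(theta):
        return sum(math.log(prob_fn(theta, z)) if y else math.log(1.0 - prob_fn(theta, z))
                   for z, y in data)
    return max(ll(i / 50.0) for i in range(-150, 151))

family_A = lambda theta, z: logistic(theta * z)                         # true family
family_B = lambda theta, z: logistic(theta * (1.0 if z > 0 else -1.0))  # rival family

rng = random.Random(0)
z_values = [((-1) ** i) * (0.2 + 0.1 * (i % 7)) for i in range(150)]  # 150 choices per set
wins = 0
for _ in range(20):
    data = simulate_choices(z_values, 1.0, rng)
    if max_log_lik(family_A, data) > max_log_lik(family_B, data):
        wins += 1
print(f"family A (the truth) wins {wins} of 20 in-sample comparisons")
```

The text's point is that, while such in-sample comparisons work reasonably well at the HO sample size, the analogous out-of-sample comparison can fail badly under individual estimation.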
It turns out that this procedure produces an extreme bias favoring strong utility models. For instance, Wilcox (2007b) reports a Monte Carlo simulation in which (EU,RP) is the true specification. Out-of-sample log likelihood comparisons using individual estimation never correctly identify RP as the true stochastic model, and in half of these samples strong utility is incorrectly identified as fitting significantly better than RP out-of-sample. Similar results hold when contextual utility is the true stochastic model. This seems inescapable: For the purpose of out-of-sample prediction based on individual estimation, the HO data set is not ‘‘large’’ in the aforementioned sense. Or, put differently, individual estimation suffers from a powerful finite sample bias when it comes to out-of-sample prediction as a method of evaluating alternative stochastic models.
Individual estimation is not, therefore, a suitable treatment of heterogeneity in samples of the HO size, when the purpose of the estimation is out-of-sample prediction and comparison of specifications. The alternative for the HO data set is random parameters estimation, which is illustrated well by Loomes et al. (2002) and Moffatt (2005), and this is the method I will use here. There are 80 subjects in the HO data set, which is half again as many as the 53 subjects in Hey (2001). This larger cross-sectional size in the HO data is better for the purpose of random parameters estimation. Unfortunately, the HO data does not contain any covariates: It can also be helpful to condition structural or stochastic parameters (or their distributions in random parameters estimations) on covariates such as demographic variables, cognitive ability measures, and/or personality scales. Readers should consult Harrison and Rutström (2008 – this volume) to see examples of this with demographic variables. Although it is tempting to view conditioning on covariates and random parameters estimation as substitutes, I think that ideally one would want to do both at once. Surely only a part of the valid (i.e., stable, repeatable, and reliable) cross-sectional variance in risk parameters is explainable by easily observed covariates. Therefore, attempting to account for heterogeneity solely by conditioning on them will surely leave a potential for residual aggregation biases associated with what they miss. Correspondingly, surely every random parameters estimation is based on distributional assumptions that are at best only approximately true: If the approximation is poor, then that estimation will also suffer from bias. When we do both at once, the covariates will ease some of the inferential burden borne by distributional assumptions of the random parameters, and the random parameters will catch much of the variance missed by the covariates.
In the end, the only reason I do not condition my estimations on covariates here is because the HO data set does not have any. But we should also remember that as we add either covariates or distributional parameters or both, we are burning precious degrees of freedom. The truth is that we need data sets that are bigger in almost every conceivable dimension: More subjects, more pairs, more repetitions, and more covariates.

5.4.1. General Framework for Random Parameters Estimation

Let s denote a particular specification. Suppose that s is the ‘‘true data-generating process’’ or DGP for all subject types in the sampled population. Let ψ_s = (β_s, α_s) denote a vector of parameters governing choice from pairs for specification s. Here, β_s is the structural parameter vector of specification s: It contains utilities of outcomes u_2 and u_3 whenever s is a non-RP specification, and also the weighting function parameter γ whenever s is an RDEU specification. The vector α_s is the stochastic parameter vector of
specification s, which governs the shape and/or variance of distributions determining choice probabilities: This is λ in non-RP specifications, and is the vector (f_1, f_2, k) in RP specifications. Let J_s(ψ_s | θ_s) denote the joint c.d.f. governing the distribution of ψ_s in the population from which subjects are sampled, where θ_s are parameters governing J_s. Notice that we are now thinking of a subject type ψ as a parameter vector, and we are thinking of this vector as following some joint distribution J_s in the sampled population; that distribution’s shape is governed by another vector of parameters θ_s. Let θ*_s be the true value of θ_s in that population. We want an estimate of θ*_s: This is what is meant by random parameters estimation. Sensible random parameters estimation requires a reasonable and tractable form for J_s that arguably characterizes the main features of the joint distribution of parameter vectors ψ_s in the sample. The approach I take to choosing J_s is empirical: In essence, exploratory individual estimations produce some rough facts about the ‘‘look’’ of the distribution of vectors ψ_s in the sample, under the null of specification s, in the form of correlations and a first principal component. The idea is to build a distribution from independent standard normal variates that captures the most salient features of that ‘‘look.’’ The best way to explain the approach, I believe, is by a detailed example using the most prosaic specification – the (EU,Strong) specification. Appendix E outlines the approach more generally, and provides the exact random parameters form used for all specifications. Obviously, judgment enters into this approach, and the judgments could be very different from different perspectives.
For instance, self-selection plays some role in the composition of our laboratory samples, and this may quite literally shape ‘‘the sampled population’’ in real and important ways.23 Moreover, each of the ten specifications could, to some extent, produce very different looking rough facts. Fortunately, this does not seem to be an issue: There is a surprising degree of similarity between the rough distributional facts that emerge from individual estimations of the different specifications, and this is noted in Appendix E. This is good, because it allows the form of the distribution J_s to be very similar across specifications (and it is, as elaborated in Appendix E).

5.4.2. The Random Parameters Approach for EU with Strong Utility: An Illustration

At the level of an individual subject, without trembles and suppressing the subject superscript n, the (EU,Strong) model is

P_m = Λ(λ[(s_mj − t_mj)u_j + (s_mk − t_mk)u_k + (s_ml − t_ml)u_l])   (42)
where Λ(x) = [1 + exp(−x)]^(−1) is the logistic c.d.f. (which will be consistently employed as the function H(x) for strong, strict, contextual, and WV models). In terms of the two underlying utility parameters u_2 and u_3 to be estimated, the utilities (u_j, u_k, u_l) in Eq. (42) are

(u_j, u_k, u_l) = (1, u_2, u_3) for pairs m on context c = 0, that is, (1,2,3);
(u_j, u_k, u_l) = (0, 1, u_3) for pairs m on context c = 2, that is, (0,1,3); and
(u_j, u_k, u_l) = (0, 1, u_2) for pairs m on context c = 3, that is, (0,1,2).   (43)
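Eqs. (42) and (43) translate directly into code. The sketch below is a minimal implementation of the individual-level (EU,Strong) choice probability; the pair, parameter values, and function names are illustrative only.

```python
import math

def logistic_cdf(x):
    """Lambda(x) = 1/(1 + exp(-x)), the logistic c.d.f. in Eq. (42)."""
    return 1.0 / (1.0 + math.exp(-x))

def utilities_on_context(context, u2, u3):
    """Eq. (43): the utilities (u_j, u_k, u_l) on each context, with the
    normalizations u(0) = 0 and u(£10) = 1 built in."""
    if context == 0:   # outcomes (£10, £20, £30)
        return (1.0, u2, u3)
    if context == 2:   # outcomes (0, £10, £30)
        return (0.0, 1.0, u3)
    if context == 3:   # outcomes (0, £10, £20)
        return (0.0, 1.0, u2)
    raise ValueError("context must be 0, 2, or 3")

def p_safe(s, t, context, u2, u3, lam):
    """Eq. (42): probability that the safe lottery S_m is chosen over R_m.
    s and t are the probability triples of S_m and R_m on the context's outcomes."""
    uj, uk, ul = utilities_on_context(context, u2, u3)
    eu_diff = (s[0] - t[0]) * uj + (s[1] - t[1]) * uk + (s[2] - t[2]) * ul
    return logistic_cdf(lam * eu_diff)

# Hypothetical pair on context 3 = (0, £10, £20): £10 for sure versus a 50/50
# chance of 0 or £20, with illustrative parameter values.
print(p_safe((0.0, 1.0, 0.0), (0.5, 0.0, 0.5), context=3, u2=1.8, u3=2.5, lam=3.0))
# about 0.574
```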
I begin by estimating the parameters of a simplified version of Eq. (42) individually, for 68 of HO’s 80 subjects,24 using all 150 observations of choices on the contexts 0, 2, and 3 combined. This initial subject-by-subject estimation gives a rough impression of the look of the joint distribution of parameter vectors ψ = (u_2, u_3, λ) across subjects, and of how we might choose J(ψ|θ) to represent that distribution. At this initial step, ω is not estimated, but rather assumed constant across subjects and equal to 0.04.25 Estimation of ω is undertaken later in the random parameters estimation. Therefore, I begin by estimating u_2, u_3, and λ, using the choice probabilities

P*_m = (1 − ω)P_m + ω/2
     = 0.96Λ(λ[(s_mj − t_mj)u_j + (s_mk − t_mk)u_k + (s_ml − t_ml)u_l]) + 0.02   (44)

for each subject (temporarily fixing ω at 0.04 for each subject). The log likelihood function for subject n is

LL^n(u_2, u_3, λ) = Σ_m [y_m^n ln(P*_m) + (1 − y_m^n) ln(1 − P*_m)]   (45)

with the specification of P*_m in Eq. (44). This is maximized for each subject n, yielding initial estimates ψ̃^n = (ũ_2^n, ũ_3^n, λ̃^n) for each subject n. Fig. 3 graphs ln(ũ_2^n − 1), ln(ũ_3^n − 1), and ln(λ̃^n) against their first principal component, which accounts for about 69% of their collective variance.26 The figure also shows regression lines on the first principal component. The Pearson correlation between ln(ũ_2^n − 1) and ln(ũ_3^n − 1) is fairly high (0.848). Given that these are both estimates and hence contain some pure sampling error, it appears that an assumption of perfect correlation between them in the underlying population may not do too much violence to truth. Therefore, I make this assumption about the joint distribution of ψ in the population. While ln(λ̃^n) does appear to share limited variance with ln(ũ_2^n − 1) and ln(ũ_3^n − 1) (Pearson correlations of 0.22 and 0.45, respectively), it obviously either has independent variance of its own or is estimated with relatively low precision.

[Fig. 3. Shared Variance of Initial Individual Parameter Estimates in the (EU,Strong) Model. The figure plots ln(λ̃^n), ln(ũ_2^n − 1), and ln(ũ_3^n − 1) against their first principal component of variance, with regression lines on that component.]

These observations suggest modeling the joint distribution J(ψ|θ) of ψ = (u_2, u_3, λ, ω) as being generated by two independent standard normal deviates x_u and x_λ, as follows:

u_2(x_u; θ) = 1 + exp(a_2 + b_2 x_u);
u_3(x_u; θ) = 1 + exp(a_3 + b_3 x_u);
λ(x_u, x_λ; θ) = exp(a_λ + b_λ x_u + c_λ x_λ); and ω a constant,
where (θ, ω) = (a_2, b_2, a_3, b_3, a_λ, b_λ, c_λ, ω) are parameters to be estimated.   (46)

In essence, Eq. (46) characterizes the sampled population as having two heterogeneous dimensions ‘‘indexed’’ by two independent normal variates. The first variate x_u can be thought of as the first principal component of the vector ψ = (u_2, u_3, λ), mainly associated with heterogeneity of utility functions, while the second variate x_λ captures a second dimension of heterogeneity mainly associated with precision. The term b_λ x_u in the
equation for precision, however, allows for a relationship between utility functions and precision in the sampled population, by allowing precision to partake of some of the first principal component of variance. The (EU,Strong) specification, conditional on x_u, x_λ, and (θ, ω), then becomes

P*_m(x_u, x_λ; θ, ω) = (1 − ω)Λ(λ(x_u, x_λ; θ)[(s_mj − t_mj)u_j + (s_mk − t_mk)u_k + (s_ml − t_ml)u_l]) + ω/2,
where u_j = 1 for pairs m on context c = 0; u_j = 0 otherwise;
u_k = u_2(x_u; θ) for pairs m on context c = 0; u_k = 1 otherwise; and
u_l = u_2(x_u; θ) for pairs m on context c = 3; u_l = u_3(x_u; θ) otherwise.   (47)

Now, we estimate θ and ω by maximizing this random parameters log likelihood function in these parameters:

LL(θ, ω) = Σ_n ln ∫∫ Π_m [P*_m(x_u, x_λ; θ, ω)]^(y_m^n) [1 − P*_m(x_u, x_λ; θ, ω)]^(1 − y_m^n) dF(x_u) dF(x_λ)   (48)

where F is the standard normal c.d.f. and P*_m(x_u, x_λ; θ, ω) is as given in Eq. (47).27 The integrations in Eq. (48) take account of how heterogeneity in the sampled population modifies population choice proportions in ways that do not necessarily match the individual level properties of the specification. This is how the random parameters approach accounts for the potentially confounding effects of heterogeneity discussed earlier. The estimation problem is recast as choosing parameters θ that govern the distribution of the specification’s parameter vector (u_2, u_3, λ) in the population, rather than choosing a specific pooled ‘‘representative decision maker’’ parameter vector or individual vectors for each decision maker. Maximizing expressions like Eq. (48) can be difficult, but fortunately the linear regression lines in Fig. 3 may provide reasonable starting values for the parameter vector θ. That is, initial estimates of the a and b coefficients in θ are the intercepts and slopes from the linear regressions of ln(ũ_2^n − 1), ln(ũ_3^n − 1), and ln(λ̃^n) on their first principal component; and the root mean squared error of the regression of ln(λ̃^n) on the first principal component provides an initial estimate of c_λ. Table 6 shows the results of maximizing Eq. (48) in (θ, ω). As can be seen, the initial parameter estimates are good starting values, though some final estimates are significantly different from the initial estimates (judging from
Table 6. Random Parameters Estimates of the (EU,Strong) Model, Using Choice Data from the Contexts (0,1,2), (0,1,3), and (1,2,3) of the Hey and Orme (1994) Sample.

Structural and Stochastic Parameter Models | Distributional Parameter | Initial Estimate | Final Estimate | Asymptotic Standard Error | Asymptotic t-statistic
u_2 = 1 + exp(a_2 + b_2 x_u) | a_2 | −1.2 | −1.28 | 0.0411 | −31.0
u_2 = 1 + exp(a_2 + b_2 x_u) | b_2 | 0.57 | 0.514 | 0.0311 | 16.5
u_3 = 1 + exp(a_3 + b_3 x_u) | a_3 | 0.51 | 0.653 | 0.0329 | 16.9
u_3 = 1 + exp(a_3 + b_3 x_u) | b_3 | 0.63 | 0.657 | 0.0316 | 20.8
λ = exp(a_λ + b_λ x_u + c_λ x_λ) | a_λ | 3.2 | 3.39 | 0.101 | 33.8
λ = exp(a_λ + b_λ x_u + c_λ x_λ) | b_λ | 0.49 | 0.658 | 0.124 | 5.32
λ = exp(a_λ + b_λ x_u + c_λ x_λ) | c_λ | 0.66 | 0.584 | 0.0571 | 10.2
ω constant | ω | 0.04 | 0.0446 | 0.0105 | 4.26

Log likelihood = −5311.44

Notes: x_u and x_λ are independent standard normal variates. Standard errors are calculated using the ‘‘sandwich estimator’’ (Wooldridge, 2002) and treating all of each subject’s choices as a single ‘‘super-observation,’’ that is, using degrees of freedom equal to the number of subjects rather than the number of subjects times the number of choices made.
the asymptotic standard errors of the final estimates). These estimates produce the log likelihood in the first column of the top row of Table 7, to be discussed shortly. Note that wherever b̃_2 ≠ b̃_3, sufficiently large or small values of the underlying standard normal deviate x_u imply a violation of monotonicity (i.e., u_2 > u_3). Rather than imposing b_2 = b_3 as a constraint on the estimations, I impose the weaker constraint |(a_2 − a_3)/(b_3 − b_2)| > 4.2649, making the estimated population fraction of such violations no larger than 10^−5. This constraint does not bind for the estimates shown in Table 6. For other non-RP estimations, it rarely binds (and when it does, it is never close to significantly binding). Recall that the nonparametric treatment of the utility of outcomes avoids a fixed risk attitude across the outcome vector (0,1,2,3), as would be implied by a one-parameter parametric form such as CARA or CRRA utility. The estimates shown in Table 6 imply a population in which about 68% of subjects have a weakly concave utility function, while the remaining 32% have an inflected ‘‘concave then convex’’ utility function. This is very similar to the results of Hey and Orme’s (1994) individual estimations: That is, the random parameters estimation used here produces estimated sample heterogeneity of utility function shapes much like that suggested by Hey and
271
Stochastic Models for Binary Discrete Choice Under Risk
Table 7. Log Likelihoods of Random Parameters Characterizations of the Models in the Hey and Orme Sample.

Stochastic Model       Estimated on all Three Contexts:    Estimated on Contexts (0,1,2) and (0,1,3):
                       Log likelihood on all three         Log likelihood on context (1,2,3)
                       contexts (in-sample fit)            (out-of-sample fit)

EU structure
Strong utility         −5311.44                            −2409.38
Strict utility         −5448.50                            −2373.12
Contextual utility     −5297.08                            −2302.55
Wandering vector       −5362.61                            −2417.76
Random preferences     −5348.36                            −2356.60

RDEU structure
Strong utility         −5207.81                            −2394.75
Strict utility         −5306.48                            −2450.41
Contextual utility     −5190.43                            −2281.36
Wandering vector       −5251.82                            −2397.91
Random preferences     −5218.00                            −2335.55
Orme’s individual strong utility estimations. The random parameters treatment of the other specifications is very similar to what has been discussed here in detail for the (EU,Strong) specification, with the necessary changes made; Appendix E shows this in detail.
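The arithmetic behind the |(a2 − a3)/(b3 − b2)| > 4.2649 constraint is easy to check. A hedged sketch (the helper names are mine, and it assumes only that, under the logit-normal utility transforms of Table 6, the u2/u3 ordering reverses for draws of the standard normal xu beyond the crossing point x* = (a2 − a3)/(b3 − b2)):

```python
import math

def normal_tail(z):
    """P(X > z) for a standard normal X."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def violation_fraction(a2, b2, a3, b3):
    """Fraction of standard normal draws x_u lying beyond the crossing
    point x* = (a2 - a3)/(b3 - b2), where the u2/u3 ordering reverses."""
    x_star = (a2 - a3) / (b3 - b2)
    return normal_tail(abs(x_star))

# 4.2649 is (approximately) the standard normal quantile for a 1e-5 tail,
# so the constraint caps the violating population fraction at about 10^-5:
print(normal_tail(4.2649))
```

The bound binds the tail probability, not the parameter estimates themselves, which is why it almost never constrains the estimations in practice.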
5.5. A Comparison of the Specifications

Table 7 displays both the in-sample and out-of-sample log likelihoods for the ten specifications. The top five rows are the EU specifications, and the bottom five rows are the RDEU specifications; for each structure, the five rows show results for strong utility, strict utility, contextual utility, the WV model and RPs. The first column shows total in-sample log likelihoods, and the second column shows total out-of-sample log likelihoods. Contextual utility always produces the highest log likelihood, whether it is combined with EU or RDEU, and whether we look at in-sample or out-of-sample log likelihoods (though the log likelihood advantage of contextual utility is most pronounced in the out-of-sample comparisons). Buschena and Zilberman (2000) and Loomes et al. (2002) point out that the best-fitting stochastic model may depend on the structure estimated, a very sensible econometric
NATHANIEL T. WILCOX
point, and offer empirical illustrations. Yet in Table 7 contextual utility is the best stochastic model regardless of whether we view the matter from the perspective of the EU or RDEU structures, or from the perspective of in-context or out-of-context fit. Table 7 suggests that the relative consequence of structure and stochastic model depends on whether we examine in-sample or out-of-sample fit. Consider first the in-sample fit column. Holding stochastic models constant, the maximum improvement in log likelihood associated with moving from EU to RDEU is 142.02 (with strict utility), and the improvement is 106.64 for the best-fitting stochastic model (contextual utility). Holding structures constant instead, the maximum improvement in log likelihood associated with changing the stochastic model is 151.48 (with the EU structure, switching from strict to contextual utility), but this is atypical: Omitting strict utility specifications, which have unusually poor in-sample fit, the maximum improvement is 65.53 (with the EU structure, switching from the WV model to contextual utility). Therefore, except for the especially poor strict utility fits, in-sample comparisons make stochastic models appear to be a sideshow relative to choices of structure. This appearance is reversed when we look at out-of-sample comparisons – that is, predictive power. Looking now at the out-of-sample fit column, notice first that under strict utility, RDEU actually fits worse than EU does. But strict utility is an unusually poor performer overall, so perhaps we should set it aside. Among the remaining four stochastic models, the maximum out-of-sample fit improvement associated with switching from EU to RDEU is 21.19 (for contextual utility). Holding structures constant instead, the maximum out-of-sample fit difference between the stochastic models (again omitting strict utility) is 116.55 (for the RDEU structure, switching from the WV model to contextual utility). 
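The comparisons in the last two paragraphs can be reproduced mechanically from the Table 7 entries; a throwaway sketch (log likelihoods written with their usual negative signs):

```python
# Table 7 log likelihoods, keyed by stochastic model within each structure.
models = ["strong", "strict", "contextual", "wandering vector", "random preferences"]
in_sample = {
    "EU":   dict(zip(models, [-5311.44, -5448.50, -5297.08, -5362.61, -5348.36])),
    "RDEU": dict(zip(models, [-5207.81, -5306.48, -5190.43, -5251.82, -5218.00])),
}
out_of_sample = {
    "EU":   dict(zip(models, [-2409.38, -2373.12, -2302.55, -2417.76, -2356.60])),
    "RDEU": dict(zip(models, [-2394.75, -2450.41, -2281.36, -2397.91, -2335.55])),
}

# Contextual utility fits best for every structure, in and out of sample.
for panel in (in_sample, out_of_sample):
    for structure in ("EU", "RDEU"):
        assert max(panel[structure], key=panel[structure].get) == "contextual"

# Two of the EU-to-RDEU improvements quoted in the text:
strict_gain = in_sample["RDEU"]["strict"] - in_sample["EU"]["strict"]
ctx_gain = out_of_sample["RDEU"]["contextual"] - out_of_sample["EU"]["contextual"]
print(round(strict_gain, 2), round(ctx_gain, 2))  # 142.02 21.19
```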
In the realm of out-of-sample prediction, then, structures seem inconsequential relative to the stochastic models. Moreover, it is worth emphasizing that the improvements associated with changing stochastic models ‘‘cost no parameters’’ here since the number of parameters estimated is fixed for a given structure. There has been a tendency toward structural innovation rather than stochastic model innovation over the last quarter century. Perhaps, at least in the realm of prediction, we ought to be paying more attention to stochastic models, as repeatedly urged by Hey (Hey & Orme, 1994; Hey, 2001; Hey, 2005) and suggested by Ballinger and Wilcox (1997). Table 8 reports the results of a more formal comparison between the stochastic models, conditional on each structure. Let D̃ⁿ denote the difference between the estimated log likelihoods (in-sample or out-of-sample)
Table 8. Vuong (1989) Nonnested Comparisons between Fit of Stochastic Model Pairs, In-sample and Out-of-sample Fit.

Estimated on all Three Contexts, and Comparing Fit on all Three Contexts (In-context Fit Comparison)

EU structure         Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 0.981, p = 0.163   z = 0.877, p = 0.190    z = 2.239, p = 0.013    z = 4.352, p < 0.0001
Random preferences   –                      z = −0.44, p = 0.330    z = 1.765, p = 0.0388   z = 3.808, p < 0.0001
Strong utility       –                      –                       z = 2.697, p = 0.0035   z = 5.973, p < 0.0001
Wandering vector     –                      –                       –                       z = 2.700, p = 0.0035

RDEU structure       Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 1.723, p = 0.042   z = 0.703, p = 0.241    z = 2.354, p = 0.0093   z = 6.067, p < 0.0001
Random preferences   –                      z = −1.574, p = 0.058   z = 0.739, p = 0.230    z = 5.419, p < 0.0001
Strong utility       –                      –                       z = 3.236, p = 0.0006   z = 5.961, p < 0.0001
Wandering vector     –                      –                       –                       z = 5.079, p < 0.0001

Estimated on Contexts (0,1,2) and (0,1,3), and Comparing Fit on Context (1,2,3) (Out-of-context Fit Comparison)

EU structure         Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 3.879, p < 0.0001  z = 3.304, p = 0.0005   z = 4.040, p < 0.0001   z = 5.978, p < 0.0001
Random preferences   –                      z = 1.652, p = 0.049    z = 2.073, p = 0.0191   z = 3.831, p < 0.0001
Strong utility       –                      –                       z = 0.261, p = 0.397    z = −3.918, p < 0.0001
Wandering vector     –                      –                       –                       z = −5.695, p < 0.0001

RDEU structure       Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 4.387, p < 0.0001  z = 3.044, p = 0.0012   z = 3.509, p = 0.0002   z = 2.739, p = 0.0031
Random preferences   –                      z = 1.639, p = 0.051    z = 2.148, p = 0.016    z = 1.422, p = 0.078
Strong utility       –                      –                       z = 0.965, p = 0.167    z = 0.028, p = 0.49
Wandering vector     –                      –                       –                       z = 0.322, p = 0.37

Notes: Positive z means the row stochastic model fits better than the column stochastic model.
from a pair of specifications, for subject n. Vuong (1989) provides an asymptotic justification for treating a z-score based on the D̃ⁿ as following a normal distribution under the hypothesis that the two non-nested specifications are equally good, in the sense that they are equally close to the true specification (neither specification needs to be the true specification). The statistic is computed as z = (Σₙ D̃ⁿ)/(s̃D √N), where s̃D is the sample standard deviation of the D̃ⁿ across subjects n (calculated without the usual adjustment for a degree of freedom) and N is the number of subjects. Table 8 reports these z-statistics, and associated p-values against the null of equally good fit, with a one-tailed alternative that the directionally better fit is significantly better. While contextual utility is always directionally better than its competitors, no convincingly significant ordering of the stochastic models emerges from the in-sample comparisons in Table 8, though strict utility is clearly significantly worse than the other four stochastic models. Contextual utility shines, though, in the out-of-sample fit comparisons in Table 8, regardless of whether the structure is EU or RDEU, where it beats the other four stochastic models with strong significance. In spite of the problems with individual estimation and prediction discussed in Wilcox (2007b), it is worth remarking on the relative performance of an individual estimation approach. Unsurprisingly, total in-sample fits of specifications with individual estimation are much better than the in-sample fits shown in Table 7. Yet total out-of-sample fits of specifications with individual estimation are uniformly worse than the out-of-sample random parameter fits in Table 7 for all ten specifications. There is, of course, one prosaic reason to expect this.
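A sketch of the statistic (function names are mine; the inputs are assumed to be each subject's total log likelihood under each specification):

```python
import math

def vuong_z(ll_a, ll_b):
    """Vuong (1989) z for two non-nested specifications A and B, given each
    subject's total log likelihood under each specification.  Positive z
    means A fits better; under the null of equally good fit, z is
    asymptotically standard normal."""
    d = [a - b for a, b in zip(ll_a, ll_b)]
    n = len(d)
    mean = sum(d) / n
    # Sample standard deviation without the degrees-of-freedom adjustment.
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / n)
    return sum(d) / (sd * math.sqrt(n))

def one_tailed_p(z):
    """p-value for the one-tailed alternative that the better fit is better."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
```

Note that the numerator is just the total log-likelihood difference, so the sign of z always matches the direction of the fit difference.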
The random parameter model fits are based on at most 11 parameters (RDEU models) for characterizing the entire sample (the parameters in y), whereas individual model fits are based on many more parameters (RDEU models have five parameters per subject; this gives 400 parameters for 80 subjects). We should be unsurprised that an out-of-sample prediction based on 400 parameter estimates fares worse than one based on 11 parameters: Shrinkage associated with liberal burning of degrees of freedom is to be expected, after all. However, there is some surprise here too. Consider that as an asymptotic matter, the individual estimation fits must be better than the random parameters fits, even if the random parameters characterization of heterogeneity – that is, the specification of the joint distribution function J(c|y) – is exactly right. This is because a random parameters likelihood function takes the expectation of probabilities with respect to J before taking logs, while the individual estimations do not. Since the log likelihood
function is concave in P, Jensen’s inequality implies that asymptotically (i.e., as estimated probabilities converge to true probabilities for both the random parameters and individual estimations) the ‘‘expected log likelihoods’’ of individual estimation must exceed the ‘‘log expected likelihoods’’ of random parameters estimation. That this asymptotic expectation is so clearly reversed for out-of-sample predictions (even though our choice of J, the distribution of parameters in the sampled population, is surely approximate at best) just hammers home how far individual estimations are from large sample consistency, as noted in Wilcox (2007b), even in a sample as ‘‘large’’ as the HO data.
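The direction of this Jensen-type comparison can be illustrated with a deliberately stylized Bernoulli population. This sketch collapses the chapter's mixture over J to a single mixed choice probability, so it is a caricature of the point rather than the chapter's estimator:

```python
import math
import random

random.seed(11)

def expected_ll(p_true, p_model):
    """Expected per-choice log likelihood when choices are generated with
    probability p_true but scored at probability p_model."""
    return p_true * math.log(p_model) + (1.0 - p_true) * math.log(1.0 - p_model)

# A heterogeneous population of true choice probabilities (a stand-in for J).
ps = [random.uniform(0.05, 0.95) for _ in range(10_000)]
p_bar = sum(ps) / len(ps)

# Scoring each subject at her own p (the individual-estimation limit) versus
# scoring everyone at the single mixed probability p_bar.
individual = sum(expected_ll(p, p) for p in ps) / len(ps)
mixed = sum(expected_ll(p, p_bar) for p in ps) / len(ps)

assert individual >= mixed  # Jensen/Gibbs: the truth scores best in expectation
print(individual, mixed)
```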
6. CONCLUSIONS: A BLUNT PERSONAL VIEW

I take two facts as given. First, discrete choice is highly stochastic; and second, people differ a lot. To me, any account of the structure of discrete choice under risk that attempts to skirt these facts is unacceptable. The reasons should now be clear. First, stochastic models spindle, fold, and in general mutilate the properties and predictions of structures, and each stochastic model produces its own distinctive mutilations (see Table 5). Second, aggregation across different types of people further hides, distorts, and in general destroys the individual level properties of specifications – that is, particular combinations of structure and stochastic model. This should be clear from the discussions of the common ratio effect (Section 3.1.2) and how differently individual and aggregate tests of betweenness in spread triples appear (Section 4.1.3). I conclude that the practice of testing decision theories by looking only at sample choice proportions, sample switching rates, sample proportions of predicted versus unpredicted violations, and so on is indefensible. It follows from exactly the same considerations that the common practice of estimating a single pooled ‘‘representative decision maker’’ preference functional is equally indefensible. Stochastic model implications and heterogeneity must be dealt with: They are at least as important in determining sample properties as structures are. When we turn from the realm of theory-testing to the twin realms of estimation and prediction, the case is similar and if anything stronger. It turns out that different stochastic models imply different things about the empirical meaning of estimated risk aversion parameters across people and contexts (Wilcox, 2007a). Some stochastic models (those that are CARA- and CRRA-neutral) identify changes in risk-taking across context shifts as changes in structural risk aversion, while others do not. And as
shown in Table 7, stochastic models appear to have much more to do with successful prediction than structures do, at least in one well-known data set. It is hard to escape the conclusion that decision research could benefit strongly from more work on stochastic models. Structure has been worried to death for a quarter of a century. How much better has this enabled us to predict? If the findings of Table 7 are general, the answer is: ‘‘Not that much, and more effort should have been put into stochastic models.’’ At any rate, it is not clear that improving the prediction fit by 21.19 of log likelihood (switching from EU to RDEU, with contextual utility), at the cost of three extra parameters, should earn any trips to Stockholm. Stochastic models have been unaccountably neglected, and gains in predictive power are likely to come from working on improving them. It will be no surprise that I like my own model, contextual utility, better than the other alternatives I have closely considered here. It predicts best; it makes sense of the ‘‘more risk averse’’ relation across people and contexts; it is CARA- and CRRA-neutral; and (I view this as good, though others dislike this property of the strong/strict/contextual family) it can explain parts of common ratio effects, betweenness violations and other phenomena normally associated with nonlinearity in probability through its form and through heterogeneity, without recourse to probability weighting functions. Yet I would not be hugely surprised if bona fide theorists can do better than contextual utility, and I hope they will try. To aid these theorists, we experimentalists might do more tests of stochastic model properties that hold for broad classes of structures. I have in mind here the varieties of stochastic transitivity and simple scalability: Stochastic model predictions about these properties are the same for all transitive structures, and not simply EU and RDEU.
Replicating and extending the psychologists’ experimental canon on these kinds of properties will help us build a strong base of stochastic facts that are at least general for transitive structures. For instance, contextual utility should obey simple scalability for pairs that share the same context, but should violate it in distinctive ways for pairs on different contexts, much in the manner of the Myers effect discussed by Busemeyer and Townsend (1993). This is true of contextual utility whether the structure is EU, RDEU, or any transitive structure. RPs are attractive to many economists, but they suffer from several problems, not least of which is their intractability across contexts for structures more complex than EU. But I think the really deep problem with RPs is the near impossibility of building an interesting cumulative and general empirical research program about them. This is no problem at all for other models: As discussed above, models like strong, strict, and contextual
utility make distinctive predictions (about stochastic transitivities and simple scalability) that should hold for all transitive structures. But the only RP prediction that is shared by a broad collection of structures is that the probability of an FOSD violation is zero. This prediction is wrong for the ‘‘nontransparent’’ violations discussed by Birnbaum and Navarrete (1998), and I argued here that FOSD properties are relatively weak ones when choosing among stochastic models. Therefore, RP models produce almost no interesting predictions that hold across a large class of structures. So it seems that there can be little accumulation of interesting knowledge about the performance of the RP hypothesis that is applicable across structures: Any particular study tests its predictions with a specific structure, and the predictions are wholly idiosyncratic to that structure. This looks to me like a recipe for little or no cumulative knowledge about the general truth or applicability of the RP hypothesis itself, since there is almost no general prediction that it makes. That problem is not shared by the other stochastic models. Finally, this chapter has been selective. Neither Blavatskyy’s (2007) truncated error model nor Busemeyer and Townsend’s (1993) decision field theory was a part of the contest in Section 5. These are also heteroscedastic models, resembling contextual utility and the WV model in various ways. I do believe that we are witnessing a fertile period for stochastic model innovation now. The likelihood-based ‘‘fit comparison’’ approach taken here and elsewhere is good, but it needs to be complemented by some testing of general predictions that transcend particular functional forms, structures and parameter values. So I will close by urging exactly that. Proponents of models like contextual utility, decision field theory, and truncated error models need to figure out what these models rule out, and not just show what they allow and how well they fit.
The stochastic transitivities and simple scalability properties, or testable modifications of these suited to heteroscedastic models, are the likely places to begin such work.
NOTES

1. Psychologists ask similar questions (see Busemeyer & Townsend, 1993). 2. There could be more than one ‘‘true’’ stochastic model in populations. Without prejudice, I ignore this here. 3. My restriction to lotteries without losses is for expositional clarity and focus; ignoring loss aversion here has no consequences for my main econometric points.
4. Some experiments show a substantial ‘‘drift’’ with repetition toward increasingly safe choices (Hey & Orme, 1994; Loomes & Sugden, 1998) or a small one (Ballinger & Wilcox, 1997). If most subjects are risk averse, decreased random parts of decision making with repetition can explain this (see Loomes et al., 2002 for details). Harrison et al. (2005) find order effects with just two trials. I abstract from these phenomena here. 5. The two structures considered here are transitive ones. A broader definition allowing for both transitive and nontransitive structures is a function D such that D(Sm, Rm|βn) = 0.5 ⇔ Pnm = 0.5. 6. There are alternative stochastic choice models under which this is not innocuous (e.g., Machina, 1985). The evidence on these alternatives is not encouraging, though as yet meager (see Hey & Carbone, 1995). 7. Such evidence (also found in Tversky & Kahneman, 1986) comes from hypothetical design or ‘‘near-hypothetical designs’’ (designs with vanishing likelihoods of decisions actually counting), but my hunch is that we would also see this in an experiment with incentive-compatible mechanisms and more typical likelihoods that the decisions count, though perhaps at a somewhat reduced frequency. 8. The proviso ‘‘binary’’ in this statement is quite important. There are phenomena that violate almost all stochastic models for choice amongst three or more alternatives in choice sets. Perhaps the best-known of these is the ‘‘asymmetrically dominated alternative effect’’ that violates regularity and independence from irrelevant alternatives, as well as Debreu’s (1960) similarity-based exception to the latter (see Huber, Payne, & Puto, 1982). 9. One may also condition on a task order subscript t if, for instance, one believes that trembles become less likely with experience, as in Loomes et al. (2002) and Moffatt (2005). 10.
For simplicity’s sake I assume throughout this chapter that parameter vectors producing indifference in any pair m have zero measure for all n, so that the sets B̄nm and Bnm = {β | V(Sm|β) − V(Rm|β) > 0} have equal measure. However, one may make indifference a positive probability event in various ways; for a strong utility approach based on a threshold of discrimination, see Hey and Orme (1994). 11. If tmk − smk ≥ 0 for all k, either Tm and Sm are identical, or m is an FOSD pair. In this latter case, the RP model implies a zero probability of choosing the dominated lottery, or with a small tremble probability, an ωn/2 probability of choosing the dominated lottery, as shown later. 12. Briefly, let there be just three equally likely linear (and hence transitive) orderings in subject n’s urn of orderings of lotteries C, D, and E, denoted CDE, DEC, and ECD, where each ordering is from best to worst. As usual, consider the pairs {C,D}, {D,E}, and {C,E}, calling them pairs 1, 2, and 3, respectively. Then Pn1 = 2/3 and Pn2 = 2/3, but Pn3 = 1/3, violating weak stochastic transitivity. 13. I have not seen this discussed in the literature, and it is not clear to me what restrictions on the preference orderings in the urn would be required to guarantee single-peakedness of all orders for all lottery triples. This could be a very mild, or a very strong, restriction in practice. 14. If we used the standard normal c.d.f., the ‘‘standard variance’’ would be 1; if we used the logistic c.d.f., the ‘‘standard variance’’ would be π²/3.
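Note 12's urn can be verified by brute force (a throwaway sketch):

```python
# Three equally likely transitive orderings (best to worst) in the urn.
orderings = ["CDE", "DEC", "ECD"]

def choice_prob(x, y):
    """Probability that the drawn ordering ranks x above y."""
    return sum(o.index(x) < o.index(y) for o in orderings) / len(orderings)

p_cd = choice_prob("C", "D")  # 2/3
p_de = choice_prob("D", "E")  # 2/3
p_ce = choice_prob("C", "E")  # 1/3

# Weak stochastic transitivity would require p_ce >= 1/2 here; it fails.
assert p_cd > 0.5 and p_de > 0.5 and p_ce < 0.5
print(p_cd, p_de, p_ce)
```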
15. Note that since the distances dm are distances between entire lotteries, this is a measure of the similarity of two lotteries. One may also ask questions about the similarity of individual dimensions of lotteries, for example, are these two probabilities of receiving outcome zi very close, and hence so similar that the outcome zi can safely be ignored as approximately irrelevant to making a decision? This ‘‘dimension-level similarity’’ is a different kind of similarity not dealt with by dm, but it also has decision-theoretic force: It implies a different structure, usually an intransitive one with a D representation rather than a V one (see Tversky, 1969; Rubinstein, 1988; or Leland, 1994). 16. Appendix B shows that this satisfies the triangle inequality, and hence shows that contextual utility is a moderate utility model (obeys MST) for all lottery triples that generate three basic pairs. This rules out triples with FOSD pairs in them, but such pairs are taken care of in the special manner discussed later in the section on FOSD. 17. Hilton (1989, p. 214) originally pointed out that there is no necessary correspondence between expected risk premium orderings and choice probability orderings under random preference EU models. 18. They look at a smaller number of fifty- and ninety-fold proportional shifts. The twenty-fold shifts are far more numerous. 19. See Busemeyer and Townsend (1993), page 438. The most robust violation of simple scalability is known as the Myers effect, which explicitly involves pairs with different contexts. In decision field theory, the variance of evaluative noise increases with the range of lottery utilities in pairs (holding other pair features constant), and this largely accounts for the Myers effect. Contextual utility obviously has the same property, and so gives a similar explanation of the Myers effect. 20. 
The ratio relationship here is a generalization of the well-known fact that the ratio of independent χ² variates follows an F distribution. χ² variates are gamma variates with common scale parameter k = 2. In fact, a beta-prime variate can be transformed into an F variate: If x is a beta-prime variate with parameters a and b, then bx/a is an F variate with degrees of freedom 2a and 2b. This is convenient because almost all statistics software packages contain thorough call routines for F variates, but not necessarily any call routines for beta-prime variates. 21. Although ratios of lognormal variates are lognormal, there is no similar simple parametric family for sums of lognormal variates. The independent gammas with common scale are the only workable choice I am aware of. 22. There is another reason why a relatively wide identifying range is useful even if we suspect that the actual range of φ is narrow in the sampled population. This has to do with estimation of stochastic parameters in non-RP models. Suppose we happened to use a set of lotteries that all have an identical φm that also happens to equal the actual φ of some subject. It is clear that Pm = 0.5 for this subject for all the lottery pairs at φm regardless of our choice of λm in any non-RP model: In other words, the stochastic parameter λm is unidentified in this instance. More generally, the stochastic parameter λm in non-RP models is best identified for pairs that are well away from indifference, and this implies that an identifying range that is wider than the actual range of φ still serves good identification purposes. 23. For instance, consider an experiment with no fixed participation payment (no ‘‘show-up fee’’ in experimental lingo) that requires a substantial investment of
subject time and has uncertain payments. It wouldn’t be surprising if this design attracts a relatively risk-seeking subset of a campus student population. Supposing some measure of risk aversion was distributed normally in that population, then, we wouldn’t necessarily expect a normal distribution of that same measure of risk aversion in our laboratory sample. We probably don’t know enough about general distributions of any measure of risk aversion in any campus student body to know what the unconditional distribution actually looks like, or to know what portion of that distribution would be drawn to an experiment by self-selection. Nevertheless, it should be clear that self-selection is literally expected to influence the shape of the distributions of subject types we get in our laboratory samples. See Harrison, Lau, and Rutström (2007) for evidence on this matter. 24. Twelve of the eighty HO subjects make three or fewer choices of the riskier lottery in any pair on contexts 2 and 3. They can ultimately be included in random parameters estimations, but at this initial stage of individual estimation it is either not useful (due to poor identification) or simply not possible to estimate models for these subjects. 25. Estimation of ω is a nuisance at the individual level. Trembles are rare enough that individual estimates of ω are typically zero for individuals. Even when estimates are nonzero, the addition of an extra parameter to estimate increases the noisiness of the remaining estimates and hides the pattern of variance and covariance of these parameters that we wish to see at this step. 26. Two hugely obvious outliers have been removed for both the principal components extraction and the graph. 27. Such integrations must be performed numerically in some manner for estimation. I use Gauss-Hermite quadratures, which are practical up to two or three integrals; for integrals of higher dimension, simulated maximum likelihood is usually more practical.
Judd (1998) and Train (2003) are good sources for these methods.
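Note 20's beta-prime-to-F transformation can be exercised with standard-library draws (a sketch; the beta-prime variate is constructed as a ratio of independent unit-scale gammas):

```python
import random

random.seed(3)
a, b, n = 3.0, 5.0, 200_000

# x ~ beta-prime(a, b): a ratio of independent gammas with common scale.
xs = [random.gammavariate(a, 1.0) / random.gammavariate(b, 1.0) for _ in range(n)]

# Per note 20: b*x/a is then an F variate with 2a and 2b degrees of freedom.
fs = [b * x / a for x in xs]

# Sanity check against the F(2a, 2b) mean, 2b/(2b - 2) = b/(b - 1):
mean_f = sum(fs) / n
print(mean_f)  # should be close to b/(b - 1) = 1.25
```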
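Note 27's Gauss-Hermite quadratures can be sketched with NumPy's hermgauss routine (the integrand here is my example, not the chapter's likelihood):

```python
import math
import numpy as np

# Gauss-Hermite nodes and weights approximate integrals of exp(-t^2)*g(t).
# The change of variables x = sqrt(2)*t turns E[f(x)] for x ~ N(0,1) into
# (1/sqrt(pi)) * sum_i w_i * f(sqrt(2)*t_i).
nodes, weights = np.polynomial.hermite.hermgauss(20)

def normal_expectation(f):
    """Quadrature approximation of E[f(x)] for a standard normal x."""
    return float(np.dot(weights, f(math.sqrt(2.0) * nodes)) / math.sqrt(math.pi))

# Check against a known expectation: E[exp(x)] = exp(1/2).
print(normal_expectation(np.exp))  # ≈ 1.6487
```

Twenty nodes already reproduce smooth one-dimensional expectations like this to near machine precision, which is why quadrature is practical for low-dimensional random parameters integrals.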
ACKNOWLEDGMENTS

John Hey generously made his remarkable data sets available to me. I also thank John Hey and Pavlo Blavatskyy, Jim Cox, Soo Hong Chew, Edi Karni, Ondrej Rydval, and especially Glenn Harrison for conversation, commentary, and/or help, though any errors here are solely my own. I thank the National Science Foundation for support under grant SES 0350565.
REFERENCES

Aitchison, J. (1963). Inverse distributions and independent gamma-distributed products of random variables. Biometrika, 50, 505–508. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008). Eliciting risk and time preferences. Econometrica, 76 (forthcoming).
Ballinger, T. P., & Wilcox, N. T. (1997). Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105. Becker, G. M., DeGroot, M. H., & Marschak, J. (1963a). Stochastic models of choice behavior. Behavioral Science, 8, 41–55. Becker, G. M., DeGroot, M. H., & Marschak, J. (1963b). An experimental study of some stochastic models for wagers. Behavioral Science, 8, 199–202. Birnbaum, M. H., & Navarrete, J. B. (1998). Testing descriptive utility theories: Violations of stochastic dominance and cumulative independence. Journal of Risk and Uncertainty, 17, 49–78. Black, D. (1948). On the rationale of group decision making. Journal of Political Economy, 56, 23–34. Blavatskyy, P. R. (2006). Violations of betweenness or random errors? Economics Letters, 91, 34–38. Blavatskyy, P. R. (2007). Stochastic choice under risk. Journal of Risk and Uncertainty, 34, 259–286. Block, H. D., & Marschak, J. (1960). Random orderings and stochastic theories of responses. In: I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow & H. B. Mann (Eds), Contributions to probability and statistics: Essays in honor of Harold Hotelling (pp. 97–132). Stanford, CA: Stanford University Press. Buschena, D. E., & Zilberman, D. (2000). Generalized expected utility, heteroscedastic error, and path dependence in risky choice. Journal of Risk and Uncertainty, 20, 67–88. Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459. Busemeyer, J. R., & Wang, Y.-M. (2000). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44, 171–189. Camerer, C. (1989). An experimental test of several generalized expected utility theories. Journal of Risk and Uncertainty, 2, 61–104. Camerer, C., & Ho, T.-H. (1999). Experience weighted attraction learning in normal-form games. Econometrica, 67, 827–874. Camerer, C., & Hogarth, R. (1999). 
The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19, 7–42. Carbone, E. (1997). Investigation of stochastic preference theory using experimental data. Economics Letters, 57, 305–311. Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In: E. D. Lantermann & H. Feger (Eds), Similarity and choice (pp. 234–289). Bern, Switzerland: Huber. Carroll, J. D., & De Soete, G. (1991). Toward a new paradigm for the study of multiattribute choice behavior. American Psychologist, 46, 342–351. Chew, S. H. (1983). A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox. Econometrica, 51, 1065–1092. Chew, S. H., Karni, E., & Safra, Z. (1987). Risk aversion in the theory of expected utility with rank-dependent preferences. Journal of Economic Theory, 42, 370–381. Chipman, J. (1963). Stochastic choice and subjective probability. In: D. Willner (Ed.), Decisions, values and groups (pp. 70–95). New York: Pergamon. Conlisk, J. (1989). Three variants on the Allais example. The American Economic Review, 79, 392–407.
Conte, A., Hey, J., & Moffatt, P. (2007). Mixture models of choice under risk. University of York, Discussion Paper in Economics 2007/6. Cox, J., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60. Cummings, R. G., Harrison, G. W., & Rutström, E. E. (1995). Homegrown values and hypothetical surveys: Is the dichotomous choice approach incentive-compatible? American Economic Review, 85, 260–266. Debreu, G. (1958). Stochastic choice and cardinal utility. Econometrica, 26, 440–444. Debreu, G. (1960). Review of R. D. Luce – Individual choice behavior: A theoretical analysis. American Economic Review, 50, 186–188. Domencich, T., & McFadden, D. (1975). Urban travel demand: A behavioral analysis. Amsterdam: North-Holland. Edwards, W. (1954). A theory of decision making. Psychological Bulletin, 51, 380–417. Fechner, G. (1966/1860). Elements of psychophysics (Vol. 1). New York: Holt, Rinehart and Winston. Fishburn, P. (1999). Stochastic utility. In: S. Barberà, P. Hammond & C. Seidl (Eds), Handbook of utility theory (Vol. 1, pp. 273–320). Berlin: Springer. Grether, D., & Plott, C. (1979). Economic theory of choice and the preference reversal phenomenon. American Economic Review, 69, 623–638. Gul, F., & Pesendorfer, W. (2006). Random expected utility. Econometrica, 74, 121–146. Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244–246. Harless, D., & Camerer, C. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1289. Harrison, G. W., Johnson, E., McInnes, M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95, 897–901. Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007). Risk attitudes, randomization to treatment, and self-selection into experiments. Working Paper no. 05-01.
Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper no. 05-18. Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Research in experimental economics: Risk aversion in experiments (Vol. 12, pp. 41–196). Bingley, UK: Emerald (forthcoming).
Hey, J. D. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2001). Does repetition improve consistency? Experimental Economics, 4, 5–54.
Hey, J. D. (2005). Why we should not be silent about noise. Experimental Economics, 8, 325–345.
Hey, J. D., & Carbone, E. (1995). Stochastic choice with deterministic preferences: An experimental investigation. Economics Letters, 47, 161–167.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1329.
Hilton, R. W. (1989). Risk attitude under random utility. Journal of Mathematical Psychology, 33, 206–222.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Stochastic Models for Binary Discrete Choice Under Risk
Huber, J., Payne, J. W., & Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9, 90–98.
Hutchinson, T. P., & Lai, C. D. (1990). Continuous bivariate distributions, emphasizing applications. Adelaide, Australia: Rumsby Scientific Publishers.
Judd, K. L. (1998). Numerical methods in economics. Cambridge, MA: MIT Press.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Leland, J. (1994). Generalized similarity judgments: An alternative explanation for choice anomalies. Journal of Risk and Uncertainty, 9, 151–172.
Loomes, G. (2005). Modeling the stochastic component of behaviour in experiments: Some issues for the interpretation of data. Experimental Economics, 8, 301–323.
Loomes, G., Moffatt, P., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24, 103–130.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Loomes, G., & Sugden, R. (1998). Testing different stochastic specifications of risky choice. Economica, 65, 581–598.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.
Luce, R. D., & Suppes, P. (1965). Preference, utility and subjective probability. In: R. D. Luce, R. R. Bush & E. Galanter (Eds), Handbook of mathematical psychology (Vol. III, pp. 249–410). New York: Wiley.
Machina, M. (1985). Stochastic choice functions generated from deterministic preferences over lotteries. Economic Journal, 95, 575–594.
McKay, A. T. (1934). Sampling from batches. Supplement to the Journal of the Royal Statistical Society, 1, 207–216.
McKelvey, R., & Palfrey, T. (1995).
Quantal response equilibria for normal form games. Games and Economic Behavior, 10, 6–38.
Moffatt, P. (2005). Stochastic choice and the allocation of cognitive effort. Experimental Economics, 8, 369–388.
Moffatt, P., & Peters, S. (2001). Testing for the presence of a tremble in economics experiments. Experimental Economics, 4, 221–228.
Morrison, H. W. (1963). Testable conditions for triads of paired comparison choices. Psychometrika, 28, 369–390.
Mosteller, F., & Nogee, P. (1951). An experimental measurement of utility. Journal of Political Economy, 59, 371–404.
Myers, J. L., & Sadler, E. (1960). Effects of range of payoffs as a variable in risk taking. Journal of Experimental Psychology, 60, 306–309.
Papke, L. E., & Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11, 619–632.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.
Prelec, D. (1998). The probability weighting function. Econometrica, 66, 497–527.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323–343.
Quiggin, J. (1991). Comparative statics for rank-dependent expected utility theory. Journal of Risk and Uncertainty, 4, 339–350.
Rabin, M. (2000). Risk aversion and expected-utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Rothschild, M., & Stiglitz, J. E. (1970). Increasing risk I: A definition. Journal of Economic Theory, 2, 225–243.
Rubinstein, A. (1988). Similarity and decision making under risk (Is there a utility theory resolution to the Allais paradox?). Journal of Economic Theory, 46, 145–153.
Sonsino, D., Benzion, U., & Mador, G. (2002). The complexity effects on choice with uncertainty: Experimental evidence. Economic Journal, 112, 936–965.
Starmer, C., & Sugden, R. (1989). Probability and juxtaposition effects: An experimental investigation of the common ratio effect. Journal of Risk and Uncertainty, 2, 159–178.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31–48.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281–299.
Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business, 59, S251–S278.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
Tversky, A., & Russo, J. E. (1969). Substitutability and similarity in binary choices. Journal of Mathematical Psychology, 6, 1–12.
Vuong, Q. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
Wilcox, N. T. (1993). Lottery choice: Incentives, complexity and decision time. Economic Journal, 103, 1397–1417.
Wilcox, N. T. (2007a). Stochastically more risk averse: A contextual theory of stochastic discrete choice under risk.
Journal of Econometrics (forthcoming).
Wilcox, N. T. (2007b). Predicting risky choices out-of-context: A Monte Carlo study. Working Paper. Department of Economics, University of Houston.
Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.
APPENDIX A

Definitions. Let {C, D, E} be any triple of lotteries generating the pairs {C, D}, {D, E}, and {C, E}, denoted as pairs m = 1, 2, and 3, respectively. Consider a heteroscedastic latent variable specification of the form $P_m = F([V(S_m|\beta) - V(R_m|\beta)]/\sigma_m)$, where $S_m$ and $R_m$ are the lotteries making up any pair m. Let $V_S \equiv V(S|\beta)$ be shorthand notation for the structural value V of any lottery S, where the structural parameter $\beta$ is suppressed but assumed fixed throughout the discussion (i.e., the discussion is about an individual decision maker).
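The latent-variable form can be made concrete with a minimal numeric sketch (not from the chapter; the values, noise scales, and the choice of F as the standard normal CDF are illustrative): a fixed structural-value difference translates into a choice probability closer to 1/2 as the pair's noise scale grows.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, standing in for the generic CDF F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def choice_prob(v_s, v_r, sigma):
    """P_m = F((V(S_m) - V(R_m)) / sigma_m) for one pair with noise scale sigma."""
    return norm_cdf((v_s - v_r) / sigma)

# The same value difference looks noisier (closer to 1/2) when sigma is larger.
p_small_noise = choice_prob(1.0, 0.5, sigma=0.25)
p_large_noise = choice_prob(1.0, 0.5, sigma=2.0)
```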
Halff's Theorem (Halff, 1976). Consider a heteroscedastic latent variable specification in which the standard deviations $\sigma_m$ of the latent error obey the triangle inequality across the three pairs generated by any triple of lotteries. Such specifications satisfy MST.

Proof. I will prove the contrapositive. MST fails if we have (i) both $P_3 < P_1$ and $P_3 < P_2$, and (ii) both $P_1 \geq 0.5$ and $P_2 \geq 0.5$. Conditions (i) imply that both $(V_C - V_E)/\sigma_3 < (V_C - V_D)/\sigma_1$ and $(V_C - V_E)/\sigma_3 < (V_D - V_E)/\sigma_2$. Conditions (ii) imply that both $V_C - V_D \geq 0$ and $V_D - V_E \geq 0$, so that $V_C - V_E \geq 0$ as well. Note that for conditions (i) to hold, it cannot be the case that conditions (ii) both hold as equalities, for then we would have $P_3 < 0.5$ from condition (i), implying $V_C - V_E < 0$, which contradicts conditions (ii). Therefore, either $V_C - V_D \geq 0$ or $V_D - V_E \geq 0$ (or both) hold as a strict inequality, and it follows that $V_C - V_E > 0$. Therefore, one may divide the inequalities $(V_C - V_E)/\sigma_3 < (V_C - V_D)/\sigma_1$ and $(V_C - V_E)/\sigma_3 < (V_D - V_E)/\sigma_2$ through by $V_C - V_E$ to get both $1/\sigma_3 < t/\sigma_1$ and $1/\sigma_3 < (1 - t)/\sigma_2$, where $t = (V_C - V_D)/(V_C - V_E)$. These imply $\sigma_1 < t\sigma_3$ and $\sigma_2 < (1 - t)\sigma_3$, which sum to $\sigma_1 + \sigma_2 < \sigma_3$ and contradict the triangle inequality. □
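Halff's result can also be checked by simulation. The sketch below (not from the chapter; the value and noise-scale distributions are arbitrary assumptions) draws random value triples with $V_C \geq V_D \geq V_E$, so the MST antecedent $P_1, P_2 \geq 0.5$ always holds, and counts MST failures with and without the triangle inequality on the noise scales.

```python
import math
import random

def F(x):
    """Standard normal CDF, standing in for the generic CDF F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(7)
mst_violations = 0
mst_violations_without_triangle = 0
for _ in range(20000):
    # Ordered values guarantee P1 >= 0.5 and P2 >= 0.5 (the MST antecedent).
    vC, vD, vE = sorted((random.random() for _ in range(3)), reverse=True)
    s1 = random.uniform(0.1, 1.0)
    s2 = random.uniform(0.1, 1.0)
    s3 = random.uniform(abs(s1 - s2), s1 + s2)   # triangle inequality holds
    P1, P2 = F((vC - vD) / s1), F((vD - vE) / s2)
    # With the triangle inequality, MST never fails (Halff's Theorem):
    if F((vC - vE) / s3) < min(P1, P2) - 1e-9:
        mst_violations += 1
    # With sigma_3 inflated well beyond sigma_1 + sigma_2, failures appear:
    if F((vC - vE) / (3.0 * (s1 + s2))) < min(P1, P2) - 1e-9:
        mst_violations_without_triangle += 1
```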
APPENDIX B

Definitions. A basic triple of lotteries {C, D, E} is one where the pairs {C, D}, {D, E}, and {C, E}, denoted pairs 1, 2, and 3 respectively, are all basic pairs (i.e., none are FOSD pairs). Let $\underline{V}_C$ and $\overline{V}_C$ denote the value (to some agent) of degenerate lotteries that pay the minimum and maximum outcomes in lottery C with certainty, respectively. Notice that in a basic triple, the intersection of the three intervals $[\underline{V}_C, \overline{V}_C]$, $[\underline{V}_D, \overline{V}_D]$, and $[\underline{V}_E, \overline{V}_E]$ must be nonempty; otherwise, the outcome ranges of two lotteries would be disjoint, and the pair composed of them would be an FOSD pair.

Proposition. Contextual utility obeys MST, but not SST, on all basic triples.

Remark. This rules out only triples that contain "glaringly transparent" FOSD pairs in which all the outcomes in one lottery exceed all the outcomes in another lottery. See Section 4.2.3 for a suitable treatment of transparent FOSD pairs by the use of trembles.

Proof. Let $d_{CD} \equiv \max(\overline{V}_C, \overline{V}_D) - \min(\underline{V}_C, \underline{V}_D)$; this is equivalent to the divisor in the latent variable of contextual utility, as given by Eq. (18).
Notice that $d_{CD} \geq \overline{V}_C - \underline{V}_C$ and $d_{DE} \geq \overline{V}_E - \underline{V}_E$, since the utility range of a pair cannot be less than the utility range of either of the lotteries in the pair. Summing, we have $d_{CD} + d_{DE} \geq \overline{V}_C - \underline{V}_C + \overline{V}_E - \underline{V}_E$. Since {C, D, E} is a basic triple, the intersection of the intervals $[\underline{V}_C, \overline{V}_C]$ and $[\underline{V}_E, \overline{V}_E]$ is nonempty (otherwise pair {C, E} would be an FOSD pair). Therefore, the utility range in pair {C, E} cannot exceed the sum of the utility ranges of its component lotteries C and E; that is, $d_{CE} \leq \overline{V}_C - \underline{V}_C + \overline{V}_E - \underline{V}_E$. Combining this with the previous inequality, we have $d_{CD} + d_{DE} \geq d_{CE}$: the divisor d in Eq. (18) obeys the triangle inequality. So by Halff's Theorem, contextual utility obeys MST for all basic triples.

To show that contextual utility can violate SST on basic triples, it is sufficient to show an example. Consider an expected value maximizer. Assume that C, D, and E have outcome ranges [0, 200], [100, 300], and [100, 400], respectively, and expected values 162, 160, and 150, respectively. The latent variable in contextual utility is the ratio of a pair's V-distance to the pair's range of possible utilities. In this example, these ratios are 2/300 in pair {C, D}, 10/300 in pair {D, E}, and 12/400 = 9/300 in pair {C, E}. All are positive, implying that all choice probabilities (of the first lottery in each pair) exceed 1/2, satisfying WST. However, the probability that C is chosen over E will be less than the probability that D is chosen over E, since the latent variable in the former pair (9/300) falls short of the latent variable in the latter pair (10/300). This violates SST. □
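The arithmetic of the counterexample is easy to verify directly. A minimal sketch using exactly the expected values and outcome ranges of the example above:

```python
# Expected values and outcome ranges from the Appendix B example.
ev = {"C": 162.0, "D": 160.0, "E": 150.0}
rng = {"C": (0.0, 200.0), "D": (100.0, 300.0), "E": (100.0, 400.0)}

def latent(a, b):
    """Contextual-utility latent variable: EV difference over the pair's outcome range."""
    lo = min(rng[a][0], rng[b][0])
    hi = max(rng[a][1], rng[b][1])
    return (ev[a] - ev[b]) / (hi - lo)

cd = latent("C", "D")   # 2/300
de = latent("D", "E")   # 10/300
ce = latent("C", "E")   # 12/400 = 9/300
wst_holds = cd > 0 and de > 0 and ce > 0   # all choice probabilities exceed 1/2
sst_fails = ce < de                         # P(C over E) < P(D over E)
```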
APPENDIX C

Definitions. Let $(z_j, z_k, z_l)$ be the context of any MPS pair $\{S_m, R_m\} = \{(s_{mj}, s_{mk}, s_{ml}), (r_{mj}, r_{mk}, r_{ml})\}$. Rewrite the lotteries as $\{(s_{mj}, 1 - s_{mj} - s_{ml}, s_{ml}), (r_{mj}, 1 - r_{mj} - r_{ml}, r_{ml})\}$, and measure the outcome vector in terms of $z_{mk}$, writing it instead as $(x_{mj}, 1, x_{ml})$, where $x_{mj} = z_{mj}/z_{mk} < 1$ and $x_{ml} = z_{ml}/z_{mk} > 1$. Since this is an MPS pair, we have

$$s_{mj} x_{mj} + (1 - s_{mj} - s_{ml}) + s_{ml} x_{ml} = r_{mj} x_{mj} + (1 - r_{mj} - r_{ml}) + r_{ml} x_{ml},$$

which implies that $(r_{mj} - s_{mj}) = ab$, where $a = (x_{ml} - 1)/(1 - x_{mj}) = (z_{ml} - z_{mk})/(z_{mk} - z_{mj}) > 0$ and $b = (r_{ml} - s_{ml})$. Call b the spread size. Obviously a is positive for any nondegenerate lottery, and b is positive too, since by convention $R_m$ is the riskier lottery in all basic pairs (and so has a higher probability of the highest outcome on the context). Also, notice that $(1 - s_{mj} - s_{ml}) - (1 - r_{mj} - r_{ml}) = (1 + a)b$.
Proposition. Under an (EU,WV) specification, choice probabilities are invariant across pairs in any set of MPS pairs on a given context.

Proof. It follows from the definitions that the difference between lottery probability vectors in any MPS pair, that is $(s_{mj} - r_{mj}, s_{mk} - r_{mk}, s_{ml} - r_{ml})$, can be expressed in the form $b(-a, 1 + a, -1)$, where b is the spread size and a depends only on the context. The EU V-distance between the lotteries in an MPS pair is therefore $V(S_m|\beta) - V(R_m|\beta) = b[-a u_j + (1 + a) u_k - u_l]$, and the Euclidean distance between the probability vectors in an MPS pair is $b\sqrt{a^2 + (1 + a)^2 + 1}$. Under an (EU,WV) specification, the considered choice probability in an MPS pair would therefore be

$$P_m = F\left(\frac{-a u_j + (1 + a) u_k - u_l}{\sqrt{a^2 + (1 + a)^2 + 1}}\right) \quad \forall\, b > 0$$

Therefore, choice probabilities in any MPS pair depend on the context only through the utilities of outcomes and the parameter a; in particular, they are independent of the size b of the spread. The proposition follows from the fact that pairs in any set of MPSs on a single context differ only by the size of the spread b and (perhaps) their expected values. But clearly the expression above is independent of expected values in lottery pairs as well: it depends only on the context and the utilities of the outcomes on the context. □
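The cancellation of the spread size b is visible in a short numeric sketch (the utilities, the context parameter a, and the use of the normal CDF for F are illustrative assumptions, not values from the chapter):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

u = (0.0, 0.6, 1.0)   # utilities (u_j, u_k, u_l) of the three context outcomes
a = 1.5               # a = (z_l - z_k)/(z_k - z_j), e.g. for a context like (0, 2, 5)

def p_wv(b):
    """(EU,WV) choice probability for an MPS pair with spread size b."""
    v_dist = b * (-a * u[0] + (1.0 + a) * u[1] - u[2])   # EU V-distance
    e_dist = b * math.sqrt(a**2 + (1.0 + a)**2 + 1.0)    # probability-vector distance
    return norm_cdf(v_dist / e_dist)

# b cancels in the ratio, so all three probabilities coincide.
probs = [p_wv(b) for b in (0.02, 0.05, 0.1)]
```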
APPENDIX D

Definitions. Let $\{C_h, D_h, E_h\}$ be a spread triple. From Appendix C, we can write the difference $S_m - R_m$ between probability vectors in any MPS pair $\{S_m, R_m\}$ as $b(-a, 1 + a, -1)$, where b is the spread size $r_{ml} - s_{ml}$ in the pair and $a = (z_{ml} - z_{mk})/(z_{mk} - z_{mj})$ depends only on the context of the pair. Thus, for a spread triple h, we can write: (i) $C_h - D_h = b_{CD}(-a_h, 1 + a_h, -1)$, where $b_{CD}$ is the spread size in MPS pair $\{C_h, D_h\}$; and (ii) $D_h - E_h = b_{DE}(-a_h, 1 + a_h, -1)$, where $b_{DE}$ is the spread size in MPS pair $\{D_h, E_h\}$.
Proposition. Betweenness implies that either $C_h \succsim D_h$ and $D_h \succsim E_h$, or $E_h \succsim D_h$ and $D_h \succsim C_h$, in every spread triple h.

Proof. This follows immediately from betweenness if we can show that in every spread triple there exists $t \in [0, 1]$ such that $D_h = t C_h + (1 - t) E_h$. From (ii), $D_h = E_h + b_{DE}(-a_h, 1 + a_h, -1) = E_h + t(b_{CD} + b_{DE})(-a_h, 1 + a_h, -1)$, where $t = b_{DE}/(b_{CD} + b_{DE})$ is obviously in the unit interval. Adding (i) and (ii), we get $C_h - E_h = (b_{CD} + b_{DE})(-a_h, 1 + a_h, -1)$, which can be substituted into the previous result to give $D_h = E_h + t(C_h - E_h) = t C_h + (1 - t) E_h$, which is as required. □
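The vector identity in the proof can be checked with concrete numbers (a sketch; the lottery E, the spread sizes, and the context parameter are illustrative assumptions):

```python
a_h = 1.0                      # context parameter (illustrative)
d = (-a_h, 1.0 + a_h, -1.0)    # direction of a mean-preserving spread on this context
bCD, bDE = 0.05, 0.10          # spread sizes (illustrative)

E = (0.3, 0.3, 0.4)                                   # a valid lottery on the context
D = tuple(E[i] + bDE * d[i] for i in range(3))        # -> approximately (0.2, 0.5, 0.3)
C = tuple(D[i] + bCD * d[i] for i in range(3))        # -> approximately (0.15, 0.6, 0.25)

# D is the convex combination t*C + (1-t)*E with t = b_DE / (b_CD + b_DE).
t = bDE / (bCD + bDE)
mix = tuple(t * C[i] + (1.0 - t) * E[i] for i in range(3))
```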
APPENDIX E

A process very similar to that detailed in Section 5.4.2 was used to select and estimate random parameters characterizations of heterogeneity for all specifications. In each case, a specification's parameter vector $\psi$ is first estimated separately for each subject n with a fixed tremble probability $\omega = 0.04$. Let these estimates be $\tilde{\psi}^n$. The correlation matrix of these parameters is then computed, and the vectors $\tilde{\psi}^n$ are also subjected to a principal components analysis, with particular attention to the first principal component. As with the detailed example of the (EU,Strong) specification, all non-RP specifications with utility parameters $u_2$ and $u_3$ (i.e., any strict, strong, contextual, or WV specification) yield quite high Pearson correlations between $\ln(\tilde{u}_2^n - 1)$ and $\ln(\tilde{u}_3^n - 1)$ across subjects, and heavy loadings of these on first principal components of estimated parameter vectors $\tilde{\psi}^n$. Therefore, the population distributions $J(\psi|\theta)$, where $\psi = (u_2, u_3, \lambda)$ (non-RP EU specifications) or $\psi = (u_2, u_3, \gamma, \lambda)$ (non-RP RDEU specifications), are in all cases modeled as having a perfect correlation between $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$, generated by an underlying standard normal deviate $\xi_u$.

Quite similarly, individual estimations of the two RP specifications, with Gamma distribution shape parameters $\phi_1$ and $\phi_2$, yield quite high Pearson correlations between $\ln(\tilde{\phi}_1^n)$ and $\ln(\tilde{\phi}_2^n)$ across subjects, and heavy loadings of these on first principal components of estimated parameter vectors $\tilde{\psi}^n$. Therefore, the joint distributions $J(\psi|\theta)$, where $\psi = (\phi_1, \phi_2, \kappa)$ in the (EU,RP) specification, or $\psi = (\gamma, \phi_1, \phi_2, \kappa)$ in the (RDEU,RP) specification, are both assumed to have a perfect correlation between $\ln(\phi_1^n)$ and $\ln(\phi_2^n)$ in the population, generated by an underlying standard normal deviate $\xi_\phi$.

In all cases, all other specification parameters are characterized as possibly partaking of some of the variance represented by $\xi_u$ (in non-RP specifications) or $\xi_\phi$ (in RP specifications), but also having independent variance represented by an independent standard normal variate. In essence, all correlations between specification parameters are represented as arising from a single underlying first principal component ($\xi_u$ or $\xi_\phi$), which in all cases accounts for two-thirds (frequently more) of the shared variance of parameters in $\tilde{\psi}^n$ according to the principal components analyses. The correlation is assumed to be a perfect one for $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$ (in non-RP specifications) or $\ln(\phi_1^n)$ and $\ln(\phi_2^n)$ (in RP specifications), since this seems very nearly characteristic of all of the preliminary individual estimations; but aside from $\omega$, other specification parameters are given their own independent variance, since their correlations with $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$ are always weaker than that observed between $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$ (similarly for $\ln(\phi_1^n)$ and $\ln(\phi_2^n)$ in RP specifications).

The following equation systems show the characterization for all specifications, where any subset of $\xi_u$, $\xi_\phi$, $\xi_\lambda$, $\xi_\kappa$, and $\xi_\gamma$ found in each characterization are jointly independent standard normal variates. Tremble probabilities $\omega$ are modeled as constant in the population, for reasons discussed in Section 5.1, and so there are no equations below for $\omega$. The systems represent the choice probabilities $P_m$, but of course $P_m^* = (1 - \omega)P_m + \omega/2$ is used to build likelihood functions allowing for trembles. As in the text, $\Lambda$, $G$, and $B'$ are the logistic, gamma, and beta-prime cumulative distribution functions, respectively. In all non-RP specifications, the utility vector $(u_j, u_k, u_l)$ for pair m is related to the functions $u_2(\xi_u, \theta)$ and $u_3(\xi_u, \theta)$ in the precise way shown below Eq. (47) in the text, that is:

$u_j = 1$ for pairs m on c = 0, that is, context (1,2,3); and $u_j = 0$ otherwise;
$u_k = u_2(\xi_u, \theta)$ for pairs m on c = 0, that is, context (1,2,3); and $u_k = 1$ otherwise; and
$u_l = u_2(\xi_u, \theta)$ for pairs m on c = 3, that is, context (0,1,2); and $u_l = u_3(\xi_u, \theta)$ otherwise.

The non-RP specifications below also include a divisor $d_m$. This divisor is as follows:

for contextual specifications, $d_m = u_2(\xi_u, \theta)$ for pairs m on context c = 3, $d_m = u_3(\xi_u, \theta)$ for pairs m on context c = 2, and $d_m = u_3(\xi_u, \theta) - 1$ for pairs m on context c = 0;

$d_m = ((s_{mj} - r_{mj})^2 + (s_{mk} - r_{mk})^2 + (s_{ml} - r_{ml})^2)^{0.5}$ for WV specifications; and

$d_m \equiv 1$ for strong and strict specifications.

Finally, the non-RP specifications below also include a function $f(x)$: it is $f(x) = \ln(x)$ for strict specifications and $f(x) = x$ for strong, contextual, and WV specifications.

(EU,Strong), (EU,Strict), (EU,Contextual), and (EU,WV) specifications:

$$P_m(\xi_u, \xi_\lambda, \theta) = \Lambda\left(\lambda(\xi_u, \xi_\lambda, \theta)\,\frac{f(s_{mj}u_j + s_{mk}u_k + s_{ml}u_l) - f(r_{mj}u_j + r_{mk}u_k + r_{ml}u_l)}{d_m}\right),$$

where $u_2(\xi_u, \theta) = 1 + \exp(a_2 + b_2\xi_u)$, $u_3(\xi_u, \theta) = 1 + \exp(a_3 + b_3\xi_u)$, and $\lambda(\xi_u, \xi_\lambda, \theta) = \exp(a_\lambda + b_\lambda\xi_u + c_\lambda\xi_\lambda)$, with $\theta = (a_2, b_2, a_3, b_3, a_\lambda, b_\lambda, c_\lambda)$.

(EU,RP) specification:

$$P_m(\xi_\phi, \xi_\kappa, \theta) = \begin{cases} G(R_m \mid \phi_1(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 3; \\ G(R_m \mid \phi_1(\xi_\phi, \theta) + \phi_2(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 2; \text{ and} \\ B'(R_m \mid \phi_2(\xi_\phi, \theta), \phi_1(\xi_\phi, \theta)) & \text{for pairs } m \text{ on } c = 0, \end{cases}$$

where

$$R_m = \frac{s_{mk} + s_{ml} - r_{mk} - r_{ml}}{r_{ml} - s_{ml}},$$

and $\phi_1(\xi_\phi, \theta) = \exp(a_1 + b_1\xi_\phi)$, $\phi_2(\xi_\phi, \theta) = \exp(a_2 + b_2\xi_\phi)$, and $\kappa(\xi_\phi, \xi_\kappa, \theta) = \exp(a_\kappa + b_\kappa\xi_\phi + c_\kappa\xi_\kappa)$, with $\theta = (a_1, b_1, a_2, b_2, a_\kappa, b_\kappa, c_\kappa)$.

(RDEU,Strong), (RDEU,Strict), (RDEU,Contextual), and (RDEU,WV) specifications:

$$P_m(\xi_u, \xi_\lambda, \xi_\gamma, \theta) = \Lambda\Big(\lambda(\xi_u, \xi_\lambda, \theta)\,\big[f(w^s_{mj}u_j + w^s_{mk}u_k + w^s_{ml}u_l) - f(w^r_{mj}u_j + w^r_{mk}u_k + w^r_{ml}u_l)\big]/d_m\Big),$$

where, suppressing the arguments $(\xi_u, \xi_\gamma, \theta)$ of $w^s_{mi}$ and $w^r_{mi}$,

$$w^s_{mi} = w\Big(\textstyle\sum_{z \geq z_{mi}} s_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big) - w\Big(\textstyle\sum_{z > z_{mi}} s_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big) \quad \text{and}$$

$$w^r_{mi} = w\Big(\textstyle\sum_{z \geq z_{mi}} r_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big) - w\Big(\textstyle\sum_{z > z_{mi}} r_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big),$$

with $w(q \mid \gamma(\xi_u, \xi_\gamma, \theta)) = \exp(-[-\ln(q)]^{\gamma(\xi_u, \xi_\gamma, \theta)})$; and $u_2(\xi_u, \theta) = 1 + \exp(a_2 + b_2\xi_u)$, $u_3(\xi_u, \theta) = 1 + \exp(a_3 + b_3\xi_u)$, $\gamma(\xi_u, \xi_\gamma, \theta) = \exp(a_\gamma + b_\gamma\xi_u + c_\gamma\xi_\gamma)$, and $\lambda(\xi_u, \xi_\lambda, \theta) = \exp(a_\lambda + b_\lambda\xi_u + c_\lambda\xi_\lambda)$, with $\theta = (a_2, b_2, a_3, b_3, a_\gamma, b_\gamma, c_\gamma, a_\lambda, b_\lambda, c_\lambda)$.

(RDEU,RP) specifications:

$$P_m(\xi_\phi, \xi_\kappa, \xi_\gamma, \theta) = \begin{cases} G(WR_m(\xi_\phi, \xi_\gamma, \theta) \mid \phi_1(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 3; \\ G(WR_m(\xi_\phi, \xi_\gamma, \theta) \mid \phi_1(\xi_\phi, \theta) + \phi_2(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 2; \text{ and} \\ B'(WR_m(\xi_\phi, \xi_\gamma, \theta) \mid \phi_2(\xi_\phi, \theta), \phi_1(\xi_\phi, \theta)) & \text{for pairs } m \text{ on } c = 0, \end{cases}$$

where

$$WR_m(\xi_\phi, \xi_\gamma, \theta) = \frac{w[s_{mk} + s_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)] - w[r_{mk} + r_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)]}{w[r_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)] - w[s_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)]},$$

with $w(q \mid \gamma(\xi_\phi, \xi_\gamma, \theta)) = \exp(-[-\ln(q)]^{\gamma(\xi_\phi, \xi_\gamma, \theta)})$; and $\phi_1(\xi_\phi, \theta) = \exp(a_1 + b_1\xi_\phi)$, $\phi_2(\xi_\phi, \theta) = \exp(a_2 + b_2\xi_\phi)$, $\gamma(\xi_\phi, \xi_\gamma, \theta) = \exp(a_\gamma + b_\gamma\xi_\phi + c_\gamma\xi_\gamma)$, and $\kappa(\xi_\phi, \xi_\kappa, \theta) = \exp(a_\kappa + b_\kappa\xi_\phi + c_\kappa\xi_\kappa)$, with $\theta = (a_1, b_1, a_2, b_2, a_\gamma, b_\gamma, c_\gamma, a_\kappa, b_\kappa, c_\kappa)$.
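The Prelec weighting function $w(q \mid \gamma) = \exp(-[-\ln q]^{\gamma})$ used in the RDEU specifications can be sanity-checked numerically (a sketch; the value $\gamma = 0.7$ is an illustrative choice): for $\gamma < 1$ it has the inverse-S shape, overweighting small probabilities and underweighting large ones, with a fixed point at $q = 1/e$.

```python
import math

def prelec_w(q, gamma):
    """Prelec probability weighting: w(q) = exp(-(-ln q)^gamma), with w(0) = 0, w(1) = 1."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

w_small = prelec_w(0.05, 0.7)          # overweighted: w(0.05) > 0.05
w_large = prelec_w(0.95, 0.7)          # underweighted: w(0.95) < 0.95
fixed = prelec_w(1.0 / math.e, 0.7)    # fixed point: w(1/e) = 1/e for any gamma
```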
For specifications with the EU structure, a likelihood function nearly identical to Eq. (48) is maximized in $\theta$; for instance, for the (EU,RP) specification, simply replace $P_m^*(\xi_u, \xi_\lambda, \theta, \omega)$ with $P_m^*(\xi_\phi, \xi_\kappa, \theta, \omega) \equiv (1 - \omega)P_m(\xi_\phi, \xi_\kappa, \theta) + \omega/2$, and replace $dF(\xi_u)\,dF(\xi_\lambda)$ with $dF(\xi_\phi)\,dF(\xi_\kappa)$. For specifications with the RDEU structure, a third integration appears, since these specifications allow for independent variance in $\gamma$ (the Prelec weighting function parameter) through the addition of a third standard normal variate $\xi_\gamma$. In all cases, the integrations are carried out by Gauss–Hermite quadrature. For specifications with the EU structure, where there are two nested integrations, 14 nodes are used for each nested quadrature of an integral. For specifications with the RDEU structure, 10 nodes are used for each nested quadrature. In all cases, starting values for these numerical maximizations are computed in the manner described for the (EU,Strong) specification: parameters in $\tilde{\psi}^n$ are regressed on their first principal component, and the intercepts and slopes of these regressions are the starting values for the a and b coefficients in the specifications, while the root mean squared errors of these regressions are the starting values for the c coefficients found in the equations for $\lambda$, $\kappa$, and/or $\gamma$.
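The nested Gauss–Hermite step can be illustrated with a hand-coded three-node rule (the chapter uses 14 or 10 nodes; three suffice here because the test integrands are low-degree polynomials, and the nodes and weights for n = 3 are known in closed form):

```python
import math

# Three-node Gauss-Hermite rule (physicists' convention): nodes 0, +/- sqrt(3/2),
# weights 2*sqrt(pi)/3 and sqrt(pi)/6.
nodes = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
weights = (math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0)

def normal_expectation(f):
    """E[f(xi)] for xi ~ N(0,1): (1/sqrt(pi)) * sum_i w_i * f(sqrt(2) * x_i)."""
    return sum(w * f(math.sqrt(2.0) * x) for x, w in zip(nodes, weights)) / math.sqrt(math.pi)

def nested_expectation(f2):
    """Nested quadrature over two independent N(0,1) variates, as in the double integrals above."""
    return normal_expectation(lambda xu: normal_expectation(lambda xl: f2(xu, xl)))

m2 = normal_expectation(lambda x: x * x)                        # E[xi^2] = 1
var_sum = nested_expectation(lambda xu, xl: xu * xu + xl * xl)  # E[xi_u^2 + xi_l^2] = 2
```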
MEASURING RISK AVERSION AND THE WEALTH EFFECT

Frank Heinemann

ABSTRACT

Measuring risk aversion is sensitive to assumptions about the wealth in subjects' utility functions. Data from the same subjects in low- and high-stake lottery decisions allow estimating the wealth in a pre-specified one-parameter utility function simultaneously with risk aversion. This paper first shows how wealth estimates can be identified assuming constant relative risk aversion (CRRA). Using the data from a recent experiment by Holt and Laury (2002a), it is shown that most subjects' behavior is consistent with CRRA at some wealth level. However, for realistic wealth levels most subjects' behavior implies decreasing relative risk aversion. An alternative explanation is that subjects do not fully integrate their wealth with income from the experiment. Within-subject data do not allow discriminating between the two hypotheses. Using between-subject data, maximum-likelihood estimates of a hybrid utility function indicate that aggregate behavior can be described by expected utility from income rather than expected utility from final wealth, and that partial relative risk aversion is increasing in the scale of payoffs.
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 293–313
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00005-7
1. INTRODUCTION

It is an open question whether subjects integrate their wealth with the potential income from laboratory experiments when deciding between lotteries. Most theoretical economists assume that agents evaluate decisions under uncertainty by the expected utility that they achieve from consuming the prospective final wealth associated with potential gains and losses from their decisions. This requires that agents fully integrate income from all sources in every decision. The well-known examples provided by Samuelson (1963) and Rabin (2000) raise serious doubts about such behavior. As Rabin points out, a person who refuses to participate in a lottery with a 50–50 chance of winning $105 or losing $100 at all initial wealth levels up to $300,000 should also turn down a 50–50 bet of losing $10,000 and gaining $5.5 million at a wealth level of $290,000. More generally, Arrow (1971) already observed that maximizing expected utility from consumption implies almost risk-neutral behavior towards decisions that have a small impact on wealth, while many subjects behave in a risk-averse manner in experiments.

There is a long tradition of distinguishing two versions of expected utility theory: expected utility from wealth (EUW) versus expected utility from income (EUI). EUW assumes full integration of income from all sources in each decision and is basically another name for expected utility from consumption over time (Johansson-Stenman, 2006). EUI assumes that agents decide by evaluating the prospective gains and losses associated with the current decision, independent of initial wealth. Agents who behave according to EUI isolate risky decisions of an experiment from other decisions in their lives. EUI is inconsistent with maximizing expected utility from consumption. It amounts to preferences (on consumption) depending on the reference point given by the initial wealth position (Wakker, 2005).
Markowitz (1952) has already demonstrated that EUI may explain some puzzles like people buying insurance and lottery tickets at the same time.1 Prospect theory, introduced by Kahneman and Tversky (1979), is in the tradition of EUI by stating that people evaluate prospective gains and losses in relation to a status quo. Cox and Sadiraj (2004, 2006) provide an example showing that maximizing EUI is consistent with observable behavior in small- and large-stake lottery decisions. Barberis, Huang, and Thaler (2006) argue in the same direction, calling EUI ‘‘narrow framing.’’ Using data from medium-scale lottery decisions in low-income countries, Binswanger (1981) and Schechter (2007) argue that the evidence is consistent with EUI, but inconsistent with EUW, because asset integration implies absurdly high levels of risk aversion.
However, it is hard to imagine that people strictly behave according to EUI: consider, for example, a person deciding whether to invest in a certain industry or buy a risky bond. If the investor has a given utility function defined over prospective gains and losses independent of the status quo, then the decision would be independent of her or his initial portfolio, which seems implausible and contradicts all theories of optimal portfolio composition.2 Sugden (2003) suggests an axiomatic approach of reference-dependent preferences that generalizes expected utility and includes EUW, EUI, and prospect theory as special cases. Cox and Sadiraj (2004, 2006) also suggest a broader model of expected utility depending on two variables, initial (non-random) wealth and prospective gains from risky decisions, which may enter the utility function without being fully integrated. One way to test this is a two-parameter approach to measuring utility functions: one parameter determines the local curvature of the utility function (like traditional risk aversion) and a second parameter determines the degree to which a subject integrates wealth with potential gains and losses from a lottery.

Holt and Laury (2002a) report an experiment designed to measure how risk aversion is affected by an increase in the scale of payoffs. In this experiment each subject participates in two treatments that are distinguished by the scale of payoffs. Holt and Laury observe that most subjects increase the number of safe choices as the payoff scale increases, and conclude that relative risk aversion (RRA) must be rising in the scale of payoffs. In this paper, we show that within-subject data from subjects who participate in small- and large-stake lottery decisions can be used to estimate simultaneously the coefficient of constant relative risk aversion (CRRA) and the degree to which subjects integrate their wealth. The wealth effect is identified only if there is a substantial difference in the scales.
The experiment by Holt and Laury (2002a) and a follow-up study by Harrison, Johnson, McInnes, and Rutström (2005) satisfy this requirement. We use their data and show:

1. For 90% of all subjects whose behavior is consistent with expected utility maximization, the hypothesis of CRRA cannot be rejected.
2. If subjects integrate their true wealth, then most subjects have a decreasing RRA.
3. If subjects have a non-decreasing RRA, the degree to which most subjects integrate initial wealth with lottery income is extremely small.
4. Combining the ideas of Holt and Laury (2002a) with Cox and Sadiraj (2004, 2006), we construct an error-response model with a three-parameter hybrid utility function generalizing CRRA and constant
absolute risk aversion (CARA) and containing a parameter that measures the integration of initial wealth. A maximum-likelihood estimate based on between-subject data yields the result that subjects fail to integrate initial wealth in their decisions. Thus, it confirms EUI. For the estimated utility function, partial RRA is increasing in the scale of payoffs. In the next section, we explain how the degree to which subjects integrate their wealth in laboratory decisions can be measured by within-subject data from small- and large-stake lottery decisions. Section 3 applies this idea to the data obtained by Holt and Laury (2002a) and Harrison et al. (2005). Section 4 uses the data from their experiments to estimate a three-parameter hybrid utility function. Section 5 concludes and raises questions for future research.
2. THEORETICAL CONSIDERATIONS

Let us first consider the traditional approach of EUW, which assumes that decisions are based on comparisons of utility from consumption that can be financed with the financial resources available to a decision maker. Let U(y) be the indirect utility, i.e., the utility that an agent obtains from spending an amount y, and assume $U'(y) > 0$.

Consider a subject asked to decide between two lotteries R and S. Lottery R (risky) yields a high payoff of $x_R^H$ with probability p and a low payoff $x_R^L$ with probability $1 - p$. Lottery S (safe) yields $x_S^H$ with probability p and $x_S^L$ with probability $1 - p$, where $x_R^H > x_S^H > x_S^L > x_R^L$. Let p vary from 0 to 1 continuously and ask the subject for the preferred lottery for different values of p. An expected utility maximizer should choose S for low probabilities p of gaining the high payoff and switch to R at some level $p_1$ that depends on the person's utility function. At $p_1$ the subject may be thought of as being indifferent between both lotteries, i.e.,

$$p_1 U(W + x_R^H) + (1 - p_1) U(W + x_R^L) = p_1 U(W + x_S^H) + (1 - p_1) U(W + x_S^L) \quad (1)$$
where W is the wealth of this subject from other sources. Now, assume that the utility function has just one free parameter r determining the degree of risk aversion. If W is known, this free parameter is identified by p_1. For example, if we assume CRRA, the utility function is given by U(x) = sgn(1 − r) x^(1−r) for r ≠ 1 and U(x) = ln x for r = 1, where r is
Measuring Risk Aversion and the Wealth Effect
the Arrow–Pratt measure of relative risk aversion (RRA).3 The unknown parameter r is identified by the probability p_1 at which the subject is indifferent and can be obtained by solving Eq. (1) for r. However, if W is not known, Eq. (1) has two unknowns and the degree of risk aversion is not identified. Here, we can solve Eq. (1) for a function r_1(W).

Let us now ask the subject to choose between lotteries R′ and S′ that yield k times the payoffs of lotteries R and S, where the scaling factor k differs from 1. Again, the subject should choose S′ for low values of p and R′ otherwise. Denote the switching point by p_k. Now, we have a second equation

p_k U(W + k x_R^H) + (1 − p_k) U(W + k x_R^L) = p_k U(W + k x_S^H) + (1 − p_k) U(W + k x_S^L)    (2)

and the two Eqs. (1) and (2) may yield a unique solution for both unknowns W and r. Assuming CRRA, the solution to this second equation is characterized by a function r_k(W). If the subject is an expected utility maximizer with a CRRA r ≠ 0, then the two functions r_1 and r_k have a unique intersection. Thereby, the simultaneous solution to Eqs. (1) and (2) identifies the wealth level and the degree of risk aversion. Denote this solution by (Ŵ, r̂). If the subject is risk neutral, then r_1(W) = r_k(W) = 0 for all W. Risk aversion is still identified (r̂ = 0), but not the wealth level. If the two functions do not intersect at any W, then the model is misspecified: either the subject does not have a CRRA or she is not an expected utility maximizer.

Simulations show that for k close to 1, the difference r_1(W) − r_k(W) is very flat at Ŵ. This implies that small errors in the observations have a large impact on the estimated values Ŵ and r̂. Reliable estimates require that the scaling factor k is sufficiently different from 1 (at least 10 or at most 0.1). Obviously, one could also identify W and r by different pairs of lotteries in the same payoff scale. Unfortunately, this leads to the same problem as having a low scaling factor: measurement errors have an extremely large impact on Ŵ and r̂.

Fig. 1 below shows functions r_1 (dashed curves) and r_k (solid curves) for a particular example. The difference in slopes between dashed and solid curves identifies W. This difference is due to the scaling factor and diminishes for scaling factors k close to 1.
[Fig. 1 here: r (0 to 1.4) plotted against W (0 to 8); dashed curves for p = 0.5 and p = 0.6 at k = 1 and solid curves for p = 0.6 and p = 0.7 at k = 20 enclose the consistent (W,r) combinations of the median subject in the experiment by Holt and Laury (2002a).]

Fig. 1. (W,r)-Combinations in the Rhombic Area are Consistent with CRRA of a Subject with Median Number of Safe Choices in Both Treatments.
The bottom line of these considerations is that we may estimate individual degrees of risk aversion and the wealth effect simultaneously from small- and large-stake lottery decisions of the same subjects.
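To make the identification argument concrete, the following sketch solves Eq. (1) for r_1(W) and Eq. (2) for r_k(W) by bisection and locates their intersection (Ŵ, r̂). It uses the Holt–Laury baseline payoffs introduced in Section 3; the exact indifference probabilities p_1 = 0.55 and p_k = 0.65 are hypothetical, since observed choices only bracket them.

```python
import math

# Holt-Laury baseline payoffs (dollars): risky option R and safe option S.
X_R_HI, X_R_LO = 3.85, 0.10
X_S_HI, X_S_LO = 2.00, 1.60

def crra(x, r):
    """CRRA utility sgn(1 - r) * x**(1 - r), with the log form at r = 1."""
    if abs(r - 1.0) < 1e-12:
        return math.log(x)
    return math.copysign(1.0, 1.0 - r) * x ** (1.0 - r)

def eu_gap(r, W, p, k):
    """EU(risky) minus EU(safe) at wealth W, switching probability p, scale k.
    Setting this gap to zero is Eq. (1) for k = 1 and Eq. (2) for k != 1."""
    eu_r = p * crra(W + k * X_R_HI, r) + (1 - p) * crra(W + k * X_R_LO, r)
    eu_s = p * crra(W + k * X_S_HI, r) + (1 - p) * crra(W + k * X_S_LO, r)
    return eu_r - eu_s

def bisect(f, lo, hi, n=60):
    """Sign-based bisection; assumes f changes sign exactly once on [lo, hi]."""
    pos = f(lo) > 0
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == pos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def indifference_r(W, p, k):
    """r_1(W) for k = 1, r_k(W) otherwise: the CRRA coefficient at which the
    subject is indifferent between the two lotteries at wealth W."""
    return bisect(lambda r: eu_gap(r, W, p, k), -1.0, 20.0)

# Hypothetical exact switching probabilities inside the median subject's bands.
p1, pk, k = 0.55, 0.65, 20

# The intersection of r_1(W) and r_k(W) identifies wealth and risk aversion.
W_hat = bisect(lambda W: indifference_r(W, p1, 1) - indifference_r(W, pk, k),
               1e-9, 50.0)
r_hat = indifference_r(W_hat, p1, 1)
```

At W = 0 the indifference coefficient is independent of k (scale invariance of CRRA), which is the property Holt and Laury exploit; away from W = 0 the two curves separate, and their crossing pins down both unknowns.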
3. ANALYZING INDIVIDUAL DATA FROM THE EXPERIMENT BY HOLT AND LAURY

Holt and Laury (2002a) present a carefully designed experiment in which subjects first participate in low-scale lottery decisions and then in large-scale lottery decisions. Both treatments are designed as multiple-price lists, where probabilities vary in steps of 0.1. This leaves a range for all estimates that may be in the order of measurement errors. Before participating in the large-scale lottery, subjects must give up their earnings from the previous low-scale treatment. Thereby, subjects' initial wealth is the same in both treatments.4

In the experiment, subjects make 10 choices between paired lotteries as laid out in Table 1. In the low-stake treatment, payoffs for option S are x_S^H = $2.00 with probability p and x_S^L = $1.60 with probability 1 − p. Payoffs
Table 1. The 10-Paired Lottery-Choice Decisions with Low Payoffs.

Option S                         Option R                         Expected Payoff Difference
1/10 of $2.00, 9/10 of $1.60     1/10 of $3.85, 9/10 of $0.10      $1.17
2/10 of $2.00, 8/10 of $1.60     2/10 of $3.85, 8/10 of $0.10      $0.83
3/10 of $2.00, 7/10 of $1.60     3/10 of $3.85, 7/10 of $0.10      $0.50
4/10 of $2.00, 6/10 of $1.60     4/10 of $3.85, 6/10 of $0.10      $0.16
5/10 of $2.00, 5/10 of $1.60     5/10 of $3.85, 5/10 of $0.10     −$0.18
6/10 of $2.00, 4/10 of $1.60     6/10 of $3.85, 4/10 of $0.10     −$0.51
7/10 of $2.00, 3/10 of $1.60     7/10 of $3.85, 3/10 of $0.10     −$0.85
8/10 of $2.00, 2/10 of $1.60     8/10 of $3.85, 2/10 of $0.10     −$1.18
9/10 of $2.00, 1/10 of $1.60     9/10 of $3.85, 1/10 of $0.10     −$1.52
10/10 of $2.00, 0/10 of $1.60    10/10 of $3.85, 0/10 of $0.10    −$1.85
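The expected-payoff column of Table 1 follows directly from the stated payoffs; a quick check:

```python
def ev_diff(p):
    """Expected payoff of option S minus option R; algebraically 1.50 - 3.35 * p."""
    ev_safe = p * 2.00 + (1 - p) * 1.60
    ev_risky = p * 3.85 + (1 - p) * 0.10
    return ev_safe - ev_risky

# One value per table row; Table 1 reports these rounded to whole cents.
diffs = [ev_diff(i / 10) for i in range(1, 11)]
```

The sign flips between row 4 (p = 0.4) and row 5 (p = 0.5), which is the crossover referred to in the text.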
for option R are x_R^H = $3.85 with probability p and x_R^L = $0.10 with probability 1 − p. Probabilities vary for the 10 pairs from p = 0.1 to 1.0 in increments of 0.1. The difference in expected payoffs for option S versus option R decreases with rising p. For p ≤ 0.4, option S has a higher expected payoff than option R. For p ≥ 0.5 the order is reversed. In the high-stake treatments, payoffs are scaled up by a factor k which is either 20, 50, or 90 in different sessions. Harrison et al. (2005) repeated the experiment with a scaling factor k = 10. In addition, they control for order effects arising from the subsequent presentation of lotteries with different scales. Comparing results from these two samples will show how robust our conclusions are.

Subjects typically choose option S for low values of p and option R for high values of p. Most subjects switch at probabilities ranging from 0.4 to 0.9, with the proportion of choices for option S increasing in the scaling factor. Observing the probabilities at which subjects switch from S to R, Holt and Laury estimate individual degrees of RRA on the basis of a CRRA utility function U(x), where x is replaced by the gains from the lotteries. Initial wealth W is assumed to be zero. The median subject chooses S for p ≤ 0.5 and R for p ≥ 0.6 in the treatment with k = 1. For k = 20 the median subject chooses S for p ≤ 0.6 and R for p ≥ 0.7. Median and average numbers of safe choices are increasing in the scaling factor. For W = 0, RRA is independent of k. Holt and Laury use this property to argue that higher switching probabilities at high stakes are evidence for increasing RRA. For a positive initial wealth, however, the evidence is consistent with constant or even decreasing RRA, as will be shown now.
Consider, for example, the behavior of the median subject. In the low-stake treatment, she switches from S to R at some probability p_1 with 0.5 ≤ p_1 ≤ 0.6. Solving Eq. (1) for r at p_1 = 0.5 gives us a function r_1^min(W). Solving Eq. (1) for r at p_1 = 0.6 yields r_1^max(W). Combinations of wealth and risk aversion in the area between these two functions are consistent with CRRA. These (W,r)-combinations are illustrated in Fig. 1 above by the area between the two dashed curves. In the high-stake treatment with k = 20, the median subject switches at some probability p_k between 0.6 and 0.7. This behavior is consistent with CRRA for all (W,r)-combinations indicated by the range between the two solid curves in Fig. 1. The two areas intersect, and for any (W,r)-combination in this intersection her behavior in both treatments is consistent with CRRA. As Fig. 1 indicates, the behavior of the median subject is consistent with CRRA if 0 ≤ W ≤ 7.5. Without knowing her initial wealth, we cannot reject the hypothesis that the median subject has a CRRA. Participants of the experiment were US-American students, and their initial wealth is most certainly higher than $7.50. If we impose the restriction W > 7.50, then we can reject the hypothesis that the median subject has a constant or even increasing RRA. For any realistic wealth level her RRA must be decreasing. There are nine subjects who behaved like the median subject.

By the same logic we can test whether the behavior of other subjects is consistent with constant, increasing, or decreasing RRA, and at which wealth levels. Table 2 gives an account of the distribution of choices for subjects who switched at most once from S to R (for increasing p) and never switched in the other direction. In Holt and Laury (2002a), there were 187 subjects participating in sessions with a low-scale and a real high-scale treatment.
Twenty-five of these subjects were switching back at some point, making their behavior inconsistent with maximizing expected utility. We exclude these subjects from our analysis. This leaves us with 162 subjects for whom we can analyze whether their behavior is consistent with CRRA.5 In Table 2, rows count the number of safe choices in the low-stake treatment, while columns count the number of safe choices in the high-stake treatment. For the purpose of testing whether RRA is constant, increasing or decreasing, we can join the data from sessions with k=20, 50, and 90.6 One hundred five subjects are counted in cells above the diagonal. They made more safe choices in the high-stake treatment than in the low-stake treatment. Forty subjects are counted on the diagonal: they made the same choices in both treatments. Seventeen subjects below the diagonal have chosen more of the risky options in the high-stake treatment than for low payoffs.
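The claim that the median subject's choices are consistent with CRRA only for 0 ≤ W ≤ 7.5 can be checked numerically: compute the dashed band r_1(W; p = 0.5)…r_1(W; p = 0.6) and the solid band r_k(W; p = 0.6)…r_k(W; p = 0.7) from Fig. 1 on a grid of wealth levels and test where they overlap. A sketch assuming the Holt–Laury payoffs:

```python
import math

def crra(x, r):
    """CRRA utility sgn(1 - r) * x**(1 - r) (log form at r = 1)."""
    if abs(r - 1.0) < 1e-12:
        return math.log(x)
    return math.copysign(1.0, 1.0 - r) * x ** (1.0 - r)

def indiff_r(W, p, k, lo=-1.0, hi=20.0, n=60):
    """CRRA coefficient at which Eq. (1) (k = 1) or Eq. (2) holds with equality,
    found by sign-based bisection."""
    def gap(r):
        eu_risky = p * crra(W + k * 3.85, r) + (1 - p) * crra(W + k * 0.10, r)
        eu_safe = p * crra(W + k * 2.00, r) + (1 - p) * crra(W + k * 1.60, r)
        return eu_risky - eu_safe
    pos = gap(lo) > 0
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if (gap(mid) > 0) == pos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def median_bands_overlap(W):
    """Median subject: switches between p = 0.5/0.6 at k = 1 and p = 0.6/0.7 at k = 20."""
    lo1, hi1 = indiff_r(W, 0.5, 1), indiff_r(W, 0.6, 1)
    lo20, hi20 = indiff_r(W, 0.6, 20), indiff_r(W, 0.7, 20)
    return lo1 <= hi20 + 1e-9 and lo20 <= hi1 + 1e-9

# Largest wealth level (on a 0.1 grid up to 12) at which both bands overlap.
W_max = max(w / 10 for w in range(0, 121) if median_bands_overlap(w / 10))
```

The scan reproduces the rhombic area of Fig. 1: the two bands touch at W = 0 (at r ≈ 0.41) and separate at a wealth level of roughly $7.5.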
Harrison et al. (2005) had 123 subjects participating in their 1×10 treatment. One hundred two of these subjects behaved consistently with expected utility maximization, i.e., never switched from R to S for increasing p. Sixty-five subjects are counted in cells above the diagonal, 28 subjects are counted on the diagonal, and 9 subjects below the diagonal. In the remainder of this section we refer to the data by Holt and Laury (2002a, 2002b) without brackets and to data from Harrison et al. (2005) in brackets [ ]. To count how many subjects behave in accordance with CRRA, we sort subjects into 10 groups indicated by letters A to J in Tables 2 and 3.

A. Group A contains 10 [8] subjects. Their behavior implies increasing RRA at all wealth levels.
B. Group B contains 2 [0] subjects. Their behavior is consistent with CRRA at W = 0. For W > 5 their behavior is consistent only with increasing RRA.
C. Group C contains 25 [21] subjects. Their behavior is consistent with constant, increasing, or decreasing RRA at all wealth levels.
D. Group D contains 32 [18] subjects. Their behavior is consistent with increasing RRA at all wealth levels and with constant or decreasing

Table 2. Distribution of Choices in the Experiment by Holt and Laury (2002a).
[Matrix of subject counts by number of safe choices (rows 0–9: low-stake treatment; columns 0–9: real high-stake treatment), with group letters in parentheses.]
Note: Rows indicate the number of safe choices in the low-stake treatment, columns indicate the number of safe choices in the real high-stake treatment. Letters in parentheses refer to the groups. Source: Holt and Laury (2002b).
Table 3. Distribution of Choices in the Experiment by Harrison et al. (2005).
[Matrix of subject counts by number of safe choices (rows 0–9: low-stake treatment; columns 0–9: high-stake treatment), with group letters in parentheses.]
Note: Rows indicate the number of safe choices in the low-stake treatment, columns indicate the number of safe choices in the high-stake treatment. Letters in parentheses refer to the groups. Source: Harrison et al. (2005).
RRA at all wealth levels W > 50. It is inconsistent with non-increasing RRA at W = 0.
E. Group E contains 20 [8] subjects. Their behavior is consistent with increasing RRA at W = 0, with decreasing RRA for W > 50, and with CRRA for some wealth level in the range 0 < W < 50. It is inconsistent with non-increasing RRA at W = 0 and with non-decreasing RRA at W > 50.
F. Group F contains 59 [37] subjects. Their behavior is consistent with decreasing RRA at all wealth levels and with constant or increasing RRA at W = 0. It is inconsistent with non-decreasing RRA at W > 15.
G. Group G contains 8 [7] subjects. Their behavior is inconsistent with increasing RRA at all wealth levels. It is consistent with CRRA at W = 0. For W > 0 their behavior is consistent only with decreasing RRA.
H. Group H contains 6 [1] subjects. Their behavior implies decreasing RRA at all wealth levels.
I. Group I contains 0 [1] subject. Her or his behavior is consistent with constant, increasing, and decreasing RRA if W > 5. W < 5 implies increasing RRA.
J. Group J contains 0 [1] subject. Her or his behavior is consistent with constant, increasing, and decreasing RRA for 2 < W < 1,000. Both W < 2 and W > 1,000 imply decreasing RRA.
Summing up, the behavior of 146 [93] out of 162 [102] subjects (90% [91%]) is consistent with CRRA at some wealth level (groups B–G, I, J). For 62 [35] subjects (38% [34%]) a wealth level of W = 0 implies an increasing RRA (groups A, D, E, and I); for 6 [2] subjects (4% [2%]) W = 0 implies decreasing RRA (groups H and J). Realistic wealth levels are certainly all above $50. For 93 [53] subjects (57% [52%]) W > 50 implies decreasing RRA (groups E–H), and only for 12 [8] subjects (7% [8%]) does a realistic wealth level imply increasing RRA (groups A–B).

This analysis shows that the data do not provide firm grounds for the hypothesis that RRA is increasing in the scale of payoffs. There seems to be more evidence for the opposite conclusion: most subjects' behavior rejects constant or increasing RRA in favor of decreasing RRA at any realistic wealth level.

Decreasing RRA is a possible explanation for behavior in low-stake and high-stake lottery decisions in the experiment. An alternative explanation is that subjects do not fully integrate their wealth with the prospective income from lotteries. Cox and Sadiraj (2004) suggest a two-parameter utility function

U(W, x) = sgn(1 − r)(dW + x)^(1−r)    (3)

where W is initial wealth, x the gain from a lottery, and d a parameter thought to be smaller than 1 that rules the degree to which a subject integrates initial wealth with prospective gains from the lottery. Parameter r is the curvature of this function with respect to dW + x. Although the functional form is similar to CRRA, r cannot be interpreted as the Arrow–Pratt measure of RRA, as will be explained below. Cox and Sadiraj (2004) provide an example to show that the puzzle raised by Rabin (2000) can be resolved if dW is close to 1.

In the experiment, we do not know subjects' wealth W. Hence, d is not identified. But we can estimate integrated wealth dW by the same method that we applied for analyzing wealth levels at which observed behavior is consistent with CRRA. Each cell in Tables 2 and 3 is associated with a range for integrated wealth dW that is consistent with utility function (3). From the previous analysis we know already that the behavior of 90% of all subjects who never switch from R to S for increasing p is consistent with Eq. (3) at some level of integrated wealth. Going through all cells and counting at which wealth levels their behavior is consistent with Eq. (3) yields the result that the proportion of subjects whose behavior is consistent with Eq. (3) has a maximum at dW ≈ 1.
For dW = 0, there are 54 [37] subjects whose behavior is consistent with Eq. (3) only for one particular value of r. The median subject shown in Fig. 1 is such a case: her behavior is consistent with Eq. (3) at dW = 0 if and only if r is precisely 0.4115. It is unlikely that more than 30% of all subjects have a degree of risk aversion that comes from a set with measure zero. It is much more likely that these subjects have a positive level of integrated wealth, which opens a range for r at which behavior is consistent with Eq. (3). The proportion of subjects whose behavior is robustly consistent with Eq. (3) at dW = 0 drops to 25% [27%], and we get a unique maximum for this proportion at dW = 1. We illustrate this proportion in Figs. 2 and 3 for the two data sets. Some subjects' behavior is consistent with Eq. (3) only for sufficiently high levels of dW, while others require dW to be small. For 55% [51%] of all subjects, behavior is consistent with Eq. (3) only if dW < 50. Thus, it seems that most subjects integrate initial wealth in their evaluation of lotteries only to a very small degree.

Let us now analyze what this means for the question of whether RRA is increasing or decreasing. The answer may depend on how we define RRA for this utility function. The Arrow–Pratt measure is defined by RRA = −y U″(y)/U′(y), where y is the single argument of the indirect utility function comprising initial wealth with potential gains from lotteries. Utility function (3) has two arguments, though. Suppose that d is a positive constant. Then one might define RRA by the derivatives of Eq. (3) with
[Fig. 2 here: proportion (20–60%) plotted against integrated wealth dW from 0 to 50.]

Fig. 2. Proportion of Subjects Whose Behavior is Consistent with Utility Function (3). Non-robust Cases Counted as Inconsistent. Source: Holt and Laury (2002b).
[Fig. 3 here: proportion (20–70%) plotted against integrated wealth dW from 0 to 50.]

Fig. 3. Proportion of Subjects Whose Behavior is Consistent with Utility Function (3). Non-robust Cases Counted as Inconsistent. Source: Harrison et al. (2005).
respect to W or x:

RRA_W = −W (∂²U/∂W²)/(∂U/∂W) = r dW/(dW + x)    (4)

is increasing in W if r > 0 and decreasing if r < 0. Thus, increasing wealth increases the curvature of the utility function with respect to wealth.

RRA_x = −x (∂²U/∂x²)/(∂U/∂x) = r x/(dW + x)    (5)

is increasing in x if r > 0 and decreasing if r < 0. Thereby, increasing the scale of lottery payments x increases the absolute value of this measure of RRA for all subjects who are not risk neutral. Following Binswanger (1981), we may call RRA_x "partial RRA," because it defines the curvature of the utility function with respect to the potential income from the next decision only. RRA_W is the curvature of the utility function with respect to wealth, which is relevant for portfolio choice and all kinds of normative questions. While the absolute value of RRA_W is increasing in W, it is decreasing in x. This means that increasing the scale of lottery payments reduces RRA_W. This reconciles the results for utility function (3) with the previous result that for fully integrated wealth, risk aversion must be decreasing to explain
the predominant behavior. On the other hand, the absolute value of partial risk aversion is decreasing in W. Thus, subjects with a higher wealth should (on average) accept more risky bets. These properties are inherent in utility function (3) and can, therefore, not be rejected without rejecting utility function (3). As we laid out before, 90% of the subjects who never switch from R to S behave in a way that is consistent with Eq. (3). It follows that the experiment is not well suited to discriminate between the two hypotheses: (i) agents fully integrate wealth and RRA is decreasing, and (ii) agents do not fully integrate wealth. Furthermore, if subjects do not fully integrate wealth, an experiment with lotteries of different scales cannot answer the question of whether RRA is increasing or decreasing in the wealth level.

Harrison et al. (2005) have shown that there is an order effect: subjects who participate in a low-scale treatment first choose safer actions in a subsequent high-scale treatment than subjects in an experiment that consists of the high-scale treatment only. This order effect may account for a substantial part of the observed increase in the number of safe choices in the high-scale treatments. Although the order effect does not reverse the responses to an increasing payoff scale, the numerical estimates are affected. If the increase in safe choices with rising payoff scale had been smaller, then we would find fewer subjects in upper-right cells of Table 2 and more in cells for which consistency with fully integrated wealth requires decreasing RRA. The proportion of subjects whose behavior is consistent with utility function (3) would be shifted to the left, indicating an even lower level of integrated wealth. We infer that accounting for the order effect strengthens our results.
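The closed forms in Eqs. (4) and (5) can be checked numerically; the sketch below differentiates utility function (3) by central finite differences at illustrative parameter values (r = 0.5 and d = 0.3 are examples, not estimates):

```python
import math

R, D = 0.5, 0.3  # illustrative curvature and wealth-integration parameters

def U(W, x):
    """Utility function (3): sgn(1 - r) * (dW + x)**(1 - r)."""
    return math.copysign(1.0, 1.0 - R) * (D * W + x) ** (1.0 - R)

def rra_W(W, x, h=1e-4):
    """Eq. (4): -W * U_WW / U_W via central finite differences in W."""
    u_w = (U(W + h, x) - U(W - h, x)) / (2 * h)
    u_ww = (U(W + h, x) - 2 * U(W, x) + U(W - h, x)) / h ** 2
    return -W * u_ww / u_w

def rra_x(W, x, h=1e-4):
    """Eq. (5): -x * U_xx / U_x via central finite differences in x."""
    u_x = (U(W, x + h) - U(W, x - h)) / (2 * h)
    u_xx = (U(W, x + h) - 2 * U(W, x) + U(W, x - h)) / h ** 2
    return -x * u_xx / u_x

W0, x0 = 10.0, 2.0
closed_form_W = R * D * W0 / (D * W0 + x0)  # r dW / (dW + x), Eq. (4)
closed_form_x = R * x0 / (D * W0 + x0)      # r x / (dW + x),  Eq. (5)
```

Raising the lottery scale x lowers RRA_W while raising RRA_x, which is exactly the divergence between the wealth measure and partial RRA discussed above.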
4. ESTIMATING A HYBRID UTILITY FUNCTION

Hybrid utility functions with more than two parameters cannot be estimated individually if within-subject data are only elicited for lotteries of two different scales. In principle, one could do the same exercise with a third parameter if subjects participate in lottery decisions of three very distinct scales. However, between-subject data can be used to estimate models with more parameters than lottery scales. The obvious disadvantage of this procedure is that one assumes a representative utility function governing the choices of all subjects. Idiosyncratic differences are then attributed to "errors" and assumed to be random.7

Holt and Laury (2002a) apply such an error-response model. They assume a representative agent with a probabilistic choice rule, where the
individual probability of choosing lottery S is given by

[EU(S)]^(1/m) / ([EU(S)]^(1/m) + [EU(R)]^(1/m))

where EU(·) is the expected utility from the respective lottery and m is an error term. For m → 0 the agent chooses the option with the higher expected utility almost certainly (rational choice). For m → ∞, the behavior approaches a 50:50 random choice. Utility is defined by a "power-expo" utility function

U(x) = [1 − exp(−a x^(1−r))]/a

This function converges to CRRA for a → 0 and to CARA for r → 0. For x, Holt and Laury insert the respective gains from lotteries. Again, they assume that initial wealth does not enter the utility function. We extend this approach by including a parameter for integrated wealth, i.e., we use the utility function

U(W, x) = [1 − exp(−a (dW + x)^(1−r))]/a    (6)
where dW is integrated wealth and x is replaced by the respective gains from lotteries. As in the previous analysis, lack of data on personal income prevents an estimation of d. Instead, we may treat dW as a parameter of the utility function that is identified. Following Holt and Laury (2002a), we estimate this model using a maximum-likelihood procedure. Table 4 reports the results of these estimates: the first row for data from decisions in real-payoff treatments by all subjects in Holt and Laury's sample, the second row for the data from Harrison et al. (2005).

Table 4. Estimated Parameters of the Error-Response Model.

                                      m                  r                  a                   dW
Data from Holt and Laury (2002b)      0.1156 (0.0063)    0.324 (0.0251)     0.0326 (0.00323)    0.189 (0.069)
Data from Harrison et al. (2005)      0.1324 (0.0100)    0.0327 (0.0441)    0.0500 (0.0056)     0.737 (0.210)
Data from Holt and Laury (2002b)      0.1315 (0.0046)    0.273 (0.0172)     0.0286 (0.00244)    –
Data from Harrison et al. (2005)      0.1726 (0.0074)    0.0050 (0.0258)    0.0459 (0.0034)     –
Rows 3 and 4 contain the estimates of Holt and Laury's model for both data sets, which has the additional restriction dW = 0. Numbers in parentheses denote standard errors. We can formally reject the hypothesis that dW = 0: p-values are 0.6% for the data from Holt and Laury and below 0.1% for the data from Harrison et al.8 However, for both data sets, the estimated amount of asset integration dW is below $1. This shows that subjects behave as if they almost neglect their wealth from other sources.

Note that for dW = 0, utility function (6) implies that partial RRA is increasing in x. On the other hand, partial RRA is decreasing in W if 0 < r < 1 and d > 0. We may conclude that partial RRA is increasing in the scale of lottery payments but not in wealth. RRA_W is zero for d = 0. This seems to imply a CRRA with respect to wealth. This conclusion is rash, though: since subjects do not integrate wealth at all, the experiment is inappropriate for measuring how risk aversion depends on wealth.

It is worth noting that the data from Harrison et al. (2005) do not support the hybrid utility function. The estimated value of r is not significantly different from 0 (all p-values are above 45%). Thus, their data are consistent with constant partial absolute risk aversion.
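A minimal implementation of this estimation setup may clarify what is being maximized. The parameter values below are the first-row Table 4 estimates; the ten-row choice vector and the log-likelihood numbers fed to the test are illustrative, not data:

```python
import math

def u_hybrid(x, a, r, dW):
    """Hybrid power-expo utility (6) at integrated wealth dW plus lottery gain x."""
    return (1.0 - math.exp(-a * (dW + x) ** (1.0 - r))) / a

def prob_safe(p, a, r, dW, m, k=1):
    """Probabilistic choice rule: Pr(S) = EU(S)^(1/m) / (EU(S)^(1/m) + EU(R)^(1/m)),
    with the Holt-Laury payoffs scaled by k."""
    eu_s = p * u_hybrid(k * 2.00, a, r, dW) + (1 - p) * u_hybrid(k * 1.60, a, r, dW)
    eu_r = p * u_hybrid(k * 3.85, a, r, dW) + (1 - p) * u_hybrid(k * 0.10, a, r, dW)
    ws, wr = eu_s ** (1.0 / m), eu_r ** (1.0 / m)
    return ws / (ws + wr)

def log_likelihood(choices, a, r, dW, m):
    """choices: (p, chose_safe) pairs pooled over subjects and price-list rows."""
    ll = 0.0
    for p, safe in choices:
        q = prob_safe(p, a, r, dW, m)
        ll += math.log(q if safe else 1.0 - q)
    return ll

def lr_pvalue(ll_unrestricted, ll_restricted):
    """Likelihood-ratio test of a single restriction such as dW = 0
    (chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)))."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return math.erfc(math.sqrt(stat / 2.0))

# First-row estimates from Table 4 (Holt-Laury data, dW unrestricted).
A_HAT, R_HAT, DW_HAT, M_HAT = 0.0326, 0.324, 0.189, 0.1156

# An illustrative subject who chooses S up to p = 0.6 and R afterwards.
choices = [(i / 10, i <= 6) for i in range(1, 11)]
ll_example = log_likelihood(choices, A_HAT, R_HAT, DW_HAT, M_HAT)
```

In the actual estimation, the four parameters are chosen to maximize this log-likelihood over all pooled choices; the restricted model of rows 3 and 4 fixes dW = 0, and a one-degree-of-freedom likelihood-ratio comparison as in lr_pvalue is one way to obtain a p-value for that restriction.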
5. CONCLUSION AND OUTLOOK ON FUTURE RESEARCH

The extent to which subjects integrate wealth with potential income from lottery decisions in laboratory experiments can be identified if subjects participate in lottery decisions with small and large payoffs and enter both decisions with the same wealth. To avoid order effects, these decisions should be made simultaneously, for example, by using two multiple-price lists with different scaling factors and then randomly selecting one situation for payoffs.

Although the experiment by Holt and Laury (2002a) suffers from an order effect, their within-subject data indicate that most subjects either have a decreasing RRA or integrate their wealth only to a very small extent. Within-subject data do not allow us to discriminate between these two hypotheses. Neither can the hybrid utility function given by Eq. (6), because it implies increasing RRA for d = 1. The calibrations provided by Rabin (2000) are based on full integration of wealth and do not rely on any assumptions
about increasing or decreasing risk aversion. These examples indicate that behavior in low- and medium-stake lotteries (as employed in experiments) can be reconciled with observed behavior on financial markets only if initial wealth is not fully integrated in laboratory decisions. Our estimates of a common hybrid utility function also indicate that subjects do not integrate initial wealth; partial RRA is increasing in the scale of lotteries but not in wealth. Harrison, List, and Towe (2007) apply this method to another experiment on risk aversion and report a similar result. Sasaki, Xie, Ohtake, Qin, and Tsutsui (2006) report a small experiment on sequential lottery decisions with 30 Chinese students, where neither external income, nor initial wealth, nor previous gains within the experiment have a significant impact on choices.

Andersen, Harrison, and Rutström (2006c) estimate integration of gains in sequential lottery decisions. They allow for observations being partially explained by maximizing expected utility from a CRRA utility function and partially by prospect theory. They find that about 67% of observations are better explained by maximizing expected utility. The estimated degree of CRRA is negative (indicating risk-loving behavior), and they cannot reject the hypothesis that those who maximize expected utility integrate their earnings from previous lotteries. When assuming that all subjects have a CRRA utility function, however, they find that earnings from previous lotteries are not integrated in decisions. Although they do not test the integration of initial wealth, their exercise demonstrates that non-integration results may be an artifact of assuming expected utility theory when in fact a substantial proportion of subjects follows other decision rules. The common evidence from these studies is that initial wealth from outside the laboratory is not fully integrated, but income from previous decisions within the lab may be.
Non-integration of initial wealth is also called "narrow framing" by Barberis et al. (2006). Non-integration of laboratory income from subsequent decisions has been called "myopic risk aversion" by Gneezy and Potters (1997). While the evidence for narrow framing is strong, it is still debatable whether subjects suffer from myopic risk aversion.

One possible explanation for the lack of integration of initial wealth with laboratory income can be found in mental accounting:9 subjects treat an experiment, and possibly each decision situation, as one entity for which they have an aspiration level that they try to achieve with low risk. The dark side of explaining non-integration by mental accounting is that it opens Pandora's box to context-specific explanations for all kinds of behavior and severely limits the external validity of experiments.
Subjects who do not integrate wealth treat decision situations as being to some degree independent from other (previous) decisions, even though the budget constraint connects all economic decisions. They behave as if wealth from other sources were small compared to the amounts under consideration in a particular decision situation. We have formalized this by assuming that a subject considers only a fraction d of wealth from other sources in each decision. In our analysis, we have assumed that d is a constant parameter. However, it seems perfectly reasonable that d might be higher if a subject has more reasons to consider her wealth in a particular decision. For example, Holt and Laury (2002a) observe that the number of safe choices in high-stake treatments with hypothetical earnings was significantly lower than in high-stake treatments with real earnings. This may be explained by d being higher in situations with real earnings. Andersen, Harrison, Lau, and Rutström (2006b) estimate integration of initial wealth in a power-expo utility function using data from a British television lottery show, where prizes range up to £250,000 and average earnings are above £10,000. They find that participants integrate on average an amount of £26,011, which is likely to be a substantial part of their wealth.

Narrow framing and myopic risk aversion have interesting consequences for behavior in financial markets: if a person does not integrate her wealth with the potential income from financial assets that she decides to buy or sell, she does not consider the correlation between the payoffs of different assets and evaluates each asset only by the moments of the distribution of this asset's returns. Her decision is independent from the distribution of returns from other assets, which results in a portfolio that is not optimally diversified.

It is an interesting question for future research under which circumstances subjects consider a substantial part of their wealth in decisions.
Gneezy and Potters (1997) have gone so far as to draw practical conclusions for fund managers from the observed disintegration, or "myopic behavior" as they call it. However, it is an open question whether and to what extent the framing of a decision situation raises the awareness that decisions affect final wealth. This awareness might be systematically higher for decisions in financial markets than for lottery choices in laboratory experiments. A worthwhile study could compare lottery decisions with decisions over an identical payoff structure in which the lotteries are framed as financial assets.

Another possible explanation for the disintegration of wealth in laboratory decisions is the high value that humans place on immediate rewards. Decisions in a laboratory have immediate consequences: subjects get money at the end of a session, or at least they get to know how much money they
will receive. The positive or negative feedback (depending on outcome and aspiration level) affects personal happiness, although the absolute amount of money is rather small (for k = 90 they win at most $346.50). By introspection, I would suggest that this feeling is much weaker for an unexpected increase in the value of a portfolio by the same amount. People know that immediate rewards evoke emotions, and the high degree of risk aversion exhibited in the lab may be a consequence of this foresight.10 Subjects may try to maximize a myopic utility arising from immediate feedback. To test this hypothesis, one might compare behavior in sessions where the outcome of a lottery is announced immediately after the decision with sessions in which the outcome is announced with delay. In both treatments, payments would need to be delayed by the same time interval to prevent time preference for cash from exerting a confounding effect.
NOTES

1. For a detailed description of the history of reference-dependent utility see Wakker (2005).
2. These theories are, of course, based on EUW.
3. An alternative notion of CRRA utility rescales the function by 1/|1 − r| for r ≠ 1, which does not affect the results.
4. Before participating in lottery choices, subjects participated in another unrelated experiment. It cannot be ruled out that previous earnings affected behavior in lottery choices.
5. Provided that non-satiation holds, switching back is inconsistent with expected utility theory. Andersen, Harrison, Lau, and Rutström (2006a) argue that some of these subjects may be indifferent to monetary payoffs. Switching back occurs more often in the small-stake treatment (14%) than in the large-stake treatments (9% for k = 10 and 5–6% for k = 20, 50, 90). This may be seen as evidence for indifference toward small payoffs. Another explanation would be stochastic mistakes: if each subject makes a mistake with some probability, this probability would need to be in the range of 1–2% to get the observed 90% threshold players. Then, more than 85% of non-threshold players should not make more than one mistake. However, we observe that most non-threshold players make at least two mistakes, which is at odds with the overall small number of non-threshold players, provided that the error probability is the same for all decisions.
6. A detailed analysis of the wealth levels at which observed choices are consistent with constant, increasing or decreasing RRA is provided by Heinemann (2006).
7. Personal characteristics can explain some of the data variation between subjects (Harrison et al., 2005) and reduce the estimated error rate. By using personal characteristics, the assumption of a common utility function can be replaced by a common function explaining differences in preferences or behavior. Still, one
FRANK HEINEMANN
assumes a common function and attributes all unexplained differences to errors, while within-subject data allow estimating one utility function for each subject.
8. Note, however, that p-values are underestimated because different decisions by the same subject are treated as independent observations.
9. Thaler (1999) and Rabin and Thaler (2001) provide nice surveys of these arguments. Schechter (2005, Footnote 2) provides anecdotal evidence for mental accounting.
10. For a theoretical treatment of this issue see Kreps and Porteus (1978).
ACKNOWLEDGMENTS

The author is grateful to Werner Güth, Peter Wakker, Martin Weber, two anonymous referees, and the editors of this issue, Jim Cox and Glenn Harrison, for their valuable comments.
REFERENCES

Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Elicitation using multiple price list formats. Experimental Economics, 9, 383–405.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006b). Dynamic choice behavior in a natural experiment. Working Paper no. 06-10. Department of Economics, College of Business Administration, University of Central Florida, http://www.bus.ucf.edu/wp/Working%20Papers/papers_2006.htm
Andersen, S., Harrison, G. W., & Rutström, E. E. (2006c). Dynamic choice behavior: Asset integration and natural reference points. Working Paper no. 06-07. Department of Economics, College of Business Administration, University of Central Florida, http://www.bus.ucf.edu/wp/Working%20Papers/papers_2006.htm
Arrow, K. (1971). Essays in the theory of risk-bearing. Chicago, IL: Markham Publishing Company.
Barberis, N., Huang, M., & Thaler, R. H. (2006). Individual preferences, monetary gambles, and stock market participation: A case for narrow framing. American Economic Review, 96(4), 1069–1090.
Binswanger, H. P. (1981). Attitudes toward risk: Theoretical implications of an experiment in rural India. The Economic Journal, 91, 867–890.
Cox, J. C., & Sadiraj, V. (2004). Implications of small- and large-stakes risk aversion for decision theory. Working paper prepared for workshop on Measuring Risk and Time Preferences by the Centre for Economic and Business Research in Copenhagen, June 2004.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60.
Gneezy, U., & Potters, J. (1997). An experiment on risk taking and evaluation periods. Quarterly Journal of Economics, 112, 631–645.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95, 897–901.
Harrison, G. W., List, J. A., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study for risk aversion. Econometrica, 75, 433–458.
Heinemann, F. (2006). Measuring risk aversion and the wealth effect: Calculations, available at http://anna.ww.tu-berlin.de/~makro/Heinemann/publics/measuring-ra.html
Holt, C. A., & Laury, S. K. (2002a). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Holt, C. A., & Laury, S. K. (2002b). Risk aversion and incentive effects: Appendix, available at http://www2.gsu.edu/~ecoskl/Highdata.pdf
Johansson-Stenman, O. (2006). A note on the risk behavior and death of Homo Economicus. Working Papers in Economics no. 211. Göteborg University.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46, 185–200.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Rabin, M. (2000). Risk aversion and expected-utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Rabin, M., & Thaler, R. H. (2001). Anomalies: Risk aversion. Journal of Economic Perspectives, 15, 219–232.
Samuelson, P. (1963). Risk and uncertainty: A fallacy of large numbers. Scientia, 98, 108–113.
Sasaki, S., Xie, S., Ohtake, F., Qin, J., & Tsutsui, Y. (2006). Experiments on risk attitude: The case of Chinese students. Discussion Paper No. 664. Institute of Social and Economic Research, Osaka University.
Schechter, L. (2007). Risk aversion and expected-utility theory: A calibration exercise. Journal of Risk and Uncertainty, 35, 67–76.
Sugden, R. (2003). Reference-dependent subjective expected utility. Journal of Economic Theory, 111, 172–191.
Thaler, R. H. (1999). Mental accounting matters. Journal of Behavioral Decision Making, 12, 183–206.
Wakker, P. P. (2005). Formalizing reference dependence and initial wealth in Rabin's calibration theorem. Working Paper. Econometric Institute, Erasmus University Rotterdam, http://people.few.eur.nl/wakker/pdf/calibcsocty05.pdf
RISK AVERSION IN THE PRESENCE OF BACKGROUND RISK: EVIDENCE FROM AN ECONOMIC EXPERIMENT

Jayson L. Lusk and Keith H. Coble

ABSTRACT

This paper investigates whether individuals' risk-taking behavior is affected by background risk by analyzing individuals' choices over a series of lotteries in a laboratory setting in the presence and absence of independent, uncorrelated background risks. Overall, our results were mixed. We found some support for the notion that individuals were more risk averse when faced with the introduction of an unfair or mean-preserving background risk than when no background risk was present, but this finding depends on how individuals incorporate endowments and background gains and losses into their utility functions and how error variance is modeled.
Characterizing individual behavior in the presence of risk is a fundamental concept in a variety of disciplines. Most risk analysis focuses on individuals' behavior when faced with single risky decisions such as whether to buy (or sell) an asset with an uncertain return, whether to purchase insurance, or

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 315–340
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00006-9
how much to pay for other forms of risk protection. However, individuals are rarely faced with a single risk. Individuals are constantly confronted with a variety of risks, some of which can be influenced and others of which are exogenous to the individual. Inevitably, an individual must make a particular risky decision before the outcomes of other exogenous or "background" risks are fully realized. Theoretical models that ignore background risk have the potential to generate biased estimates of optimal risk-taking behavior. For example, Weil (1992) argued that prices of risky assets are likely to be overestimated and equity premiums underestimated if background risks, such as risk on human capital, are not taken into consideration when calculating optimal portfolio allocations.

It seems natural to expect that changes in exogenous background risk might affect individuals' choices between risky prospects. However, there is not universal agreement on the anticipated effect of background risk on risk aversion. Based on their own intuition, Gollier and Pratt (1996) and Eeckhoudt, Gollier, and Schlesinger (1996) derived necessary and sufficient restrictions on utility such that an addition of, or increase in, background risk will cause a utility-maximizing individual to make more conservative choices in other risky situations. In contrast, Diamond (1984) investigated conditions under which individuals would find a gamble more attractive when another independent risky gamble was added to the portfolio. Quiggin (2003) showed that aversion to one risk will be reduced by the presence of an independent background risk for certain classes of non-expected utility preferences that are consistent with constant risk aversion as in Safra and Segal (1998). It is clear that theoretical expositions cannot provide an unambiguous indication of the effect of background risk on risk aversion.
It is ultimately an empirical question as to whether and how individuals' risk preferences are actually affected by background risk. Unfortunately, existing empirical evidence has provided conflicting results. For example, Heaton and Lucas (2000) found that higher levels of background risk were associated with reduced stock market participation; Guiso, Jappelli, and Terlizzese (1996) found that demand for risky assets fell as uninsurable background risks increased; Alessie, Hochguertel, and van Soest (2001) found no relationship between income uncertainty and demand for risky assets; and Arrondel and Masson (1996) found that increases in earnings risk were associated with higher levels of stock ownership. To date, such evidence has been based primarily on household survey data, which pose a variety of statistical and inferential challenges. One exception is Harrison, List, and Towe (2007), who studied background risk in the market for rare coins. They compared choices
between gambles to obtain coins of known, certified quality (i.e., low background risk) to choices between gambles to obtain coins with quality certifications removed (i.e., high background risk) and found that increasing background risk increased risk aversion.

Harrison et al. (2007) deliberately used a "naturally occurring" form of background risk by removing the quality certification of the coins. This approach provides a qualitative test of the effect of an increase in background risk on foreground risk aversion. One premise of their design, completely plausible in their context, is that virtually all subjects would view the lack of certification as adding risk to the final outcome. However, there is some subjectivity in the amount of background risk that was added in their study. For some purposes it is useful to be able to control this level of background risk explicitly and artefactually, as we do in the laboratory. One reason is that it is possible to imagine naturally occurring contexts where the lack of certification does not generate background risk with the clarity that it does in the coin market setting (e.g., payola scandals in the entertainment industry, or reviews written by film producers). Another reason is that one might want to compare utility functions incorporating final monetary outcomes including the background risk, and one cannot do that without artefactual, objective background risk treatments. Thus, this study complements the field study of Harrison et al. (2007) by studying the effect of adding an explicit background risk on risk-taking behavior in a laboratory setting.

In this study we conduct what we believe is the first laboratory experiment to investigate the effect of background risk on risk aversion; we do so by investigating how choices between gambles change when individuals are forced to play another, exogenous gamble.
Our experiments were primarily constructed to test for "risk vulnerability," as defined by Eeckhoudt et al. (1996), by analyzing the effect of adding an exogenous unfair background risk (e.g., a lottery with a negative expected value) and a mean-zero background risk on subjects' behavior in a risk preference elicitation experiment proposed by Holt and Laury (2002). Our results provide some, but far from unequivocal, support for the notion that individuals exposed to background risks behaved in a more risk-averse manner than subjects with no background risk.
RISK VULNERABILITY

Gollier and Pratt (1996) sought to determine the weakest conditions under which (p. 1110) "... adding an unfair background risk to wealth makes
risk-averse individuals behave in a more risk-averse way with respect to another independent risk." They define this condition as "risk vulnerability" because the condition ensures that an individual's (p. 1110) "willingness to bear risk is vulnerable to the introduction of another unfair risk." This condition ensures that: (a) the introduction of an unfair background risk reduces the certainty equivalent of any other independent risk (i.e., introduction of an unfair background risk reduces the demand for risky assets), and (b) a lottery is never complementary to an unfair gamble (i.e., introduction of an independent, unfair risk cannot make a previously undesirable risk become desirable). Standard risk aversion as defined by Kimball (1990) and proper risk aversion as defined by Pratt and Zeckhauser (1987) both imply risk vulnerability. In general, risk vulnerability implies that the first two derivatives of the utility function are concave transformations of the original utility function.

Eeckhoudt et al. (1996) sought to determine the conditions under which any increase in background risk would generate more risk-averse behavior. They focused on first- and second-degree stochastic dominance changes in background risk. Concerning a first-degree stochastic dominance change in background risk, Eeckhoudt et al. (1996) show that decreasing absolute risk aversion (DARA) is sufficient to guarantee that adding a negative noise to background wealth (i.e., an unfair background risk) makes people behave in a more risk-averse way; however, this condition is not sufficient for any first-degree stochastic dominance change in background risk. They also show that if the third and fourth derivatives of the utility function are negative, then a mean-preserving increase in background risk will generate more risk-averse behavior.
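The DARA sufficiency result can be illustrated numerically. The wealth level, lotteries, and square-root utility below are illustrative assumptions, not part of the authors' design: under CRRA utility (which exhibits DARA), adding an unfair background risk lowers the certainty equivalent of an independent foreground gamble, i.e., the individual behaves in a more risk-averse way.

```python
import math

def sqrt_u(x):
    # Square-root utility: CRRA with r = 0.5, hence DARA (an assumed example).
    return math.sqrt(x)

def expected_utility(lottery, wealth, u):
    return sum(p * u(wealth + x) for p, x in lottery)

def certainty_equivalent(foreground, background, wealth, u):
    """CE of the foreground gamble: the sure payment c such that receiving c
    (alongside whatever background risk is present) is as good as playing
    the foreground gamble alongside that background risk."""
    combined = [(pf * pb, xf + xb)
                for pf, xf in foreground
                for pb, xb in background]
    target = expected_utility(combined, wealth, u)
    lo, hi = 0.0, 100.0          # bisect for c
    for _ in range(100):
        c = 0.5 * (lo + hi)
        if expected_utility(background, wealth + c, u) < target:
            lo = c
        else:
            hi = c
    return 0.5 * (lo + hi)

foreground = [(0.5, 0.0), (0.5, 20.0)]   # independent foreground gamble
no_bg      = [(1.0, 0.0)]
unfair_bg  = [(0.5, -10.0), (0.5, 0.0)]  # unfair background risk

ce_without = certainty_equivalent(foreground, no_bg, 30.0, sqrt_u)
ce_with    = certainty_equivalent(foreground, unfair_bg, 30.0, sqrt_u)
```

In this example the certainty equivalent falls (from roughly 9.4 to roughly 9.2) once the unfair background risk is introduced, consistent with risk vulnerability.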
In this paper, we consider the effect of three types of background risks: none, an unfair background risk, and a mean-preserving increase in background risk. By comparing risk attitudes when subjects are exposed to an unfair background risk versus no background risk, we test the concept of risk vulnerability. Adding a mean-zero background risk (versus no background risk) constitutes a second-degree stochastic dominance change in background risk. By comparing risk attitudes when subjects are exposed to a mean-zero background risk versus no background risk, we test whether individuals have preferences consistent with those outlined in Eeckhoudt et al. (1996) regarding mean-preserving increases in risk.
EXPERIMENTAL PROCEDURES

We elicited individuals' risk attitudes following Holt and Laury (2002). Their approach, which resembles that of Binswanger (1980), entails
individuals making a series of 10 choices between two lotteries, A and B, where lottery A is the "safe" lottery and lottery B is the "risky" lottery. Table 1 reports the series of decisions subjects were asked to make in all treatments.1 For each decision, a subject chose either option A or option B. Although 10 decisions were made, only one was randomly selected as binding by rolling a 10-sided die.2 Once the binding decision was determined, the die was thrown again to determine whether the subject received the high or low payoff for the chosen gamble.

Subjects participated in one of three treatments: no background risk, mean-zero background risk, or unfair background risk. That is, our experiment involved between-subject comparisons. The treatment with no background risk was a replication of Holt and Laury's experiment with slightly different payoffs. In the two treatments involving background risk, subjects completed the decision task outlined in Table 1 prior to playing, but with full knowledge that they would play, a background risk lottery.3 That is, individuals' risk preferences were elicited via the decision task when individuals knew they would subsequently face an independent, exogenous background risk over which they had no control.

Table 1. Decision Task.

Decision | Option A                                    | Option B
1        | 10% chance of $10.00, 90% chance of $8.00   | 10% chance of $19.00, 90% chance of $1.00
2        | 20% chance of $10.00, 80% chance of $8.00   | 20% chance of $19.00, 80% chance of $1.00
3        | 30% chance of $10.00, 70% chance of $8.00   | 30% chance of $19.00, 70% chance of $1.00
4        | 40% chance of $10.00, 60% chance of $8.00   | 40% chance of $19.00, 60% chance of $1.00
5        | 50% chance of $10.00, 50% chance of $8.00   | 50% chance of $19.00, 50% chance of $1.00
6        | 60% chance of $10.00, 40% chance of $8.00   | 60% chance of $19.00, 40% chance of $1.00
7        | 70% chance of $10.00, 30% chance of $8.00   | 70% chance of $19.00, 30% chance of $1.00
8        | 80% chance of $10.00, 20% chance of $8.00   | 80% chance of $19.00, 20% chance of $1.00
9        | 90% chance of $10.00, 10% chance of $8.00   | 90% chance of $19.00, 10% chance of $1.00
10       | 100% chance of $10.00, 0% chance of $8.00   | 100% chance of $19.00, 0% chance of $1.00

In the mean-zero
treatment, after completing the decision task, each subject participated in a mean-zero lottery with a 50% chance of losing $10.00 and a 50% chance of winning $10.00. In the unfair treatment, after completing the decision task, each subject played a lottery with a 50% chance of losing $10.00 and a 50% chance of winning $0.00.

One hundred thirty undergraduate students were recruited from introductory economics and business courses by passing around sign-up sheets containing session times and dates. Upon arrival at a session (a typical session contained about 20 subjects), students were given a $10 show-up fee and were asked to complete a lengthy survey on food consumption habits. The purpose of the lengthy survey was to make subjects feel as though they had earned their show-up fee prior to participating in the risk preference elicitation experiment. After the risk preference elicitation experiment, subjects were individually paid their earnings in cash (except for the few cases in the background risk treatments where individuals owed us money, in which case subjects paid us for their losses). Sessions lasted approximately one hour.
ANALYSIS AND RESULTS

There are a variety of methods that can be used to determine risk preferences based on the choices in the experiment. Before proceeding, distinctions need to be drawn concerning the different types of analysis that can be undertaken and the different measures of risk preferences that can be calculated. First, certain analyses can be carried out where risk preferences are permitted to vary from individual to individual. That is, based on choices in the decision task, we can create measures of each individual's risk preferences. However, the decision task only permits rather crude measures of each individual's risk preferences (e.g., a range on an individual's coefficient of relative risk aversion rather than a point estimate). The second type of analysis is to estimate aggregate risk preferences in each treatment. Although this approach has the disadvantage of combining individuals with different preferences, it permits more precise estimates of risk aversion and permits us to relax the assumption of strict expected utility preferences.

In addition to this issue, we carry out our analysis using risk preferences estimated in one of two manners: (a) the initial $10 endowment and the income/loss from the background risk are explicitly assumed to enter individuals' utility functions in addition to the potential winnings from the
decision task, or (b) individual risk preferences are calculated based only on the winnings from the decision task. To illustrate, consider the expected utility of option A in decision 1 from Table 1. Under approach (a), expected utility would be calculated as 0.05U($10+$10−$10 = $10) + 0.45U($10+$8−$10 = $8) + 0.05U($10+$10+$10 = $30) + 0.45U($10+$8+$10 = $28) in the mean-zero background risk treatment, as 0.05U($10+$10−$10 = $10) + 0.45U($10+$8−$10 = $8) + 0.05U($10+$10+$0 = $20) + 0.45U($10+$8+$0 = $18) in the unfair background risk treatment, and as 0.1U($10+$10 = $20) + 0.9U($10+$8 = $18) in the no background risk treatment. In contrast, under approach (b), the expected utility for all three treatments would simply be calculated as 0.1U($10) + 0.9U($8). Although most work in economics incorporates final wealth (as opposed to income) as the argument of the utility function, expected utility theory, in and of itself, is silent regarding whether approach (a) or (b) is appropriate, and as such, we carry out our analysis both ways.

With the stage set, we now turn our attention to characterizing individuals' risk preferences. Individuals' choices in the decision task shown in Table 1 can be used to determine risk preferences. Regardless of whether income/loss from background risk is included in the utility calculation, a risk-neutral individual would choose option A for the first four decisions listed in Table 1 because the expected value of lottery A exceeds that of lottery B for the first four choices. As one moves down Table 1, the chances of winning the higher payoff increase for both options. In fact, decision 10 is a simple check of participant understanding, as subjects are simply asked to choose between $10.00 and $19.00. When completing the decision task, most individuals start with option A and at some point switch to option B, which they choose for the remainder of the decision task.
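Under approach (a), the four-outcome gambles arise from taking the product of the foreground decision-task lottery and the independent background lottery, shifted by the endowment. A minimal sketch (the square-root utility function is an illustrative assumption, not the authors' specification):

```python
import math

def combine(foreground, background, endowment):
    """Product lottery: each (prob, payoff) pair of the foreground task
    is paired with each (prob, payoff) pair of the background risk."""
    return [(pf * pb, endowment + xf + xb)
            for pf, xf in foreground
            for pb, xb in background]

def expected_utility(lottery, u):
    return sum(p * u(x) for p, x in lottery)

# Decision 1, option A: 10% chance of $10, 90% chance of $8.
option_a  = [(0.10, 10.0), (0.90, 8.0)]
mean_zero = [(0.5, -10.0), (0.5, 10.0)]  # mean-zero background risk
unfair    = [(0.5, -10.0), (0.5, 0.0)]   # unfair background risk

u = math.sqrt  # illustrative utility (CRRA with r = 0.5, an assumption)

# Approach (a): endowment and background gains/losses enter utility.
eu_mean_zero = expected_utility(combine(option_a, mean_zero, 10.0), u)
eu_unfair    = expected_utility(combine(option_a, unfair, 10.0), u)
# Approach (b): only decision-task winnings enter utility.
eu_narrow    = expected_utility(option_a, u)
```

For option A in decision 1, the mean-zero combination yields final outcomes of $10, $8, $30, and $28 with probabilities 0.05, 0.45, 0.05, and 0.45, matching the approach (a) expression above.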
Although this behavior is the norm, there was no requirement that subjects behave in such a manner; that is, individuals could choose A, then B, and then A again. As a first step in characterizing individuals' risk preferences, we follow Holt and Laury (2002) and report the number of safe choices an individual made in the decision task. In addition, we report an alternative but similar measure of risk aversion: the decision at which an individual first chose option B, the risky prospect. Although both measures provide some indication of an individual's risk preference, a more appropriate and useful approach is to analyze the range of a local measure of an individual's coefficient of absolute or relative risk aversion. Assuming subjects exhibit constant relative risk aversion (CRRA), i.e., U(x) = x^(1−rr)/(1−rr), where rr is a local measure of the coefficient of relative risk aversion, choices in the
decision task can be used to determine a range on a subject's coefficient of relative risk aversion. Coefficients corresponding to rr < 0, rr = 0, and rr > 0 are associated with risk-loving, risk-neutral, and risk-averse behavior, respectively. It is important to note that the assumption of CRRA preferences generates DARA, which, as previously mentioned, is a sufficient condition to guarantee that adding an unfair background risk increases risk aversion. Alternatively, one could assume subjects exhibit constant absolute risk aversion (CARA), i.e., U(x) = −exp(−ar·x), where ar is a local measure of the coefficient of absolute risk aversion. Risk-loving, risk-neutral, and risk-averse behavior is associated with ar < 0, ar = 0, and ar > 0, respectively. With the CARA specification, it is inconsequential whether the endowment and income/losses from the background risk are incorporated into the utility calculations or whether utility is only calculated based on earnings from the decision task; one arrives at the same estimate of ar in either case.

Turning to the individual-level results, Table 2 reports the distributions of the number of safe choices in each experimental treatment.4 In addition, Table 2 reports the range of ar and rr (for the situation where the $10 endowment and background risk gains/losses are not incorporated into the expected utility formula) corresponding to the situation where an individual starts the task by choosing option A and makes one switch to option B, which he/she chooses thereafter. Fig. 1 plots the percentage of safe choices

Table 2. Risk Aversion Classification Based on Lottery Choices.

Number of    | Range of Relative   | Range of Absolute   | No background | Mean-zero       | Unfair
Safe Choices | Risk Aversion (a)   | Risk Aversion (b)   | risk          | background risk | background risk
0–1          | rr < −0.97          | ar < −0.11          | 2.0%          | 0.0%            | 0.0%
2            | −0.97 < rr < −0.49  | −0.11 < ar < −0.06  | 0.0%          | 0.0%            | 0.0%
3            | −0.49 < rr < −0.12  | −0.06 < ar < −0.02  | 10.0%         | 0.0%            | 3.8%
4            | −0.12 < rr < 0.19   | −0.02 < ar < 0.03   | 24.0%         | 11.1%           | 20.7%
5            | 0.19 < rr < 0.49    | 0.03 < ar < 0.07    | 12.0%         | 22.2%           | 18.9%
6            | 0.49 < rr < 0.79    | 0.07 < ar < 0.11    | 24.0%         | 40.7%           | 35.9%
7            | 0.79 < rr < 1.13    | 0.11 < ar < 0.17    | 16.0%         | 18.6%           | 7.5%
8            | 1.13 < rr < 1.61    | 0.17 < ar < 0.25    | 10.0%         | 7.4%            | 7.5%
9–10         | 1.61 < rr           | 0.25 < ar           | 2.0%          | 0.0%            | 5.7%
Number of observations                                   | 50            | 27              | 53

a. Assuming U(x) = x^(1−rr)/(1−rr) and x only includes the gains from the decision task.
b. Assuming U(x) = −exp(−ar·x) and x only includes the gains from the decision task.
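The CRRA cutoffs reported in Table 2 follow from the indifference points between adjacent decisions in Table 1. A sketch that reproduces them by bisection, evaluating utility over decision-task winnings only (as in note a of Table 2):

```python
import math

def crra_u(x, r):
    """CRRA utility; the r = 1 case is log utility."""
    return math.log(x) if abs(r - 1.0) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def eu_diff(r, k):
    """EU(option A) - EU(option B) at decision k (k = 1..10), payoffs from Table 1."""
    p = k / 10.0
    eu_a = p * crra_u(10.0, r) + (1 - p) * crra_u(8.0, r)
    eu_b = p * crra_u(19.0, r) + (1 - p) * crra_u(1.0, r)
    return eu_a - eu_b

def cutoff(k, lo=-2.0, hi=3.0):
    """The rr at which a CRRA subject is indifferent at decision k;
    below the cutoff the subject prefers B, above it A (bisection)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if eu_diff(mid, k) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `cutoff(5)` is about 0.19 and `cutoff(6)` about 0.49, matching the boundaries between the 4, 5, and 6 safe-choice rows of Table 2.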
Fig. 1. Percentage of Safe Choices in Each Decision and Treatment. (Series plotted: No Background Risk; Mean-Zero Background Risk; Unfair Background Risk; Risk Neutral Behavior.)
in each of the 10 decision tasks shown in Table 1. First, it is apparent that the majority of subjects in the sample are risk averse. A risk-neutral individual would choose option A for the first four decision tasks; however, the majority of respondents chose option A five or more times. Second, there appears to be a slight treatment effect. In particular, Fig. 1 shows that the data in the mean-zero treatment lie to the right of those in the no background risk treatment for most of the decision tasks. Although the unfair background risk treatment closely paralleled the no background risk treatment for most decision tasks, a larger percentage of safe choices is observed for the final three decision tasks in the unfair background risk treatment.

Table 3 reports the mean, median, and standard deviation of the number of safe choices and the first risky choice for each treatment. On average, subjects in both background risk treatments behaved in a more risk-averse manner than subjects that did not face a background risk. The median number of safe choices and first risky choice were similar across all three treatments. Regardless of whether one focuses on the number of safe choices or the first risky choice, ANOVA tests are unable to reject the hypothesis that mean risk aversion levels were identical across treatments at any standard significance level, and Wilcoxon rank sum tests are unable to reject the hypothesis of equality of distributions across treatments at any standard significance level. Interestingly, the standard deviation of the number
Table 3. Summary Statistics of Number of Safe Choices and First Risky Choice Across Treatment.

                                             | No background | Mean-zero       | Unfair
                                             | risk          | background risk | background risk
Mean number of safe choices (a)              | 5.40          | 5.89            | 5.68
Median number of safe choices                | 6.00          | 6.00            | 6.00
Standard deviation of number of safe choices | 1.78          | 1.09            | 1.48
Average first risky choice (b)               | 6.32          | 6.70            | 6.34
Median first risky choice                    | 6.50          | 7.00            | 6.00
Standard deviation of first risky choice     | 1.81          | 1.35            | 1.75
Number of participants                       | 50            | 27              | 53

a. The p-value from an ANOVA test associated with the null hypothesis that the mean number of safe choices is equal across treatments is p = 0.38. The p-value from a Wilcoxon rank sum test of the equality of distributions of safe choices across treatments is p = 0.43.
b. The p-value from an ANOVA test associated with the null hypothesis that the mean first risky choice is equal across treatments is p = 0.60. The p-value from a Wilcoxon rank sum test of the equality of distributions of first risky choices across treatments is p = 0.46.
of safe choices and the first risky choice were greater when no background risk was present, a result that is statistically significant.

Although comparing data across treatments in Table 3 is useful for summarizing individuals' behavior, such an approach does not control for potential differences in subject-specific characteristics across treatments, nor does it explicitly incorporate the precision with which we were able to measure an individual's level of risk aversion. That is, an analysis focused solely on the number of safe choices or the first risky choice would not account for the fact that individuals who switched back and forth between options A and B contribute less information (i.e., have greater variance) regarding their risk preferences. To address both issues, we estimated interval-censored models with and without multiplicative heteroscedasticity.

Table 4 reports three models, the first two of which use interval-censored rr as the dependent variable and the last of which uses interval-censored ar as the dependent variable. The two dummy variables at the bottom of the table show the effect of background
risk on risk aversion, holding constant other subject-specific effects. The dummy variables are statistically significant only in the CRRA model in which the endowment and background risk income/loss are incorporated into the expected utility calculation, in which case both background risk treatments are associated with lower rr. This result implies that background risk increases risk-taking behavior, which is contrary to the assumptions in Gollier and Pratt (1996). However, caution should be taken in interpreting this result. First, the result may be an artifact of the fact that the final monetary outcomes of the treatment without background risk did not span the range of final outcomes in the treatments with background risk. As a result, an individual who chose option A for the first seven decision tasks, for example, would have a lower bound on rr of about 0.94 in both background risk treatments, but an identical individual who chose option A for the first seven tasks in the no background risk treatment would have a lower bound on rr of 2.04; that is, exactly the same choices generate different estimates of rr. Second, the models in Table 4 do not control for heteroscedasticity that might arise due to differences in variance across treatments or other explanatory variables.

Table 5 reports results from interval-censored models with multiplicative heteroscedasticity. Once heteroscedasticity is taken into account, we find that subjects in the mean-zero background risk treatment behaved in a more risk-averse manner (i.e., exhibited higher levels of rr and ar) than individuals who were not exposed to a background risk, according to the CRRA model that did not incorporate income from the background risk and the CARA model.
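The mechanics of an interval-censored likelihood with multiplicative heteroscedasticity can be sketched as follows. This is an illustration of the general technique, not the authors' exact specification, and the parameter values are made up: each subject's switch point implies an interval for rr, the likelihood contribution is the normal probability mass assigned to that interval, and the standard deviation is scaled by an exponential function of covariates such as a treatment dummy.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_loglik(lo, hi, mean, sigma):
    """Log-likelihood contribution of one subject whose implied rr
    lies in [lo, hi] under a normal model with the given mean and sigma."""
    return math.log(norm_cdf((hi - mean) / sigma) - norm_cdf((lo - mean) / sigma))

def sigma(treated, gamma0=-0.5, gamma1=-0.8):
    # Multiplicative heteroscedasticity: sigma = exp(z'gamma).
    # The gamma values here are purely illustrative.
    return math.exp(gamma0 + gamma1 * treated)

# A subject with six safe choices implies 0.49 < rr < 0.79 (Table 2 range).
ll_close = interval_loglik(0.49, 0.79, mean=0.6, sigma=sigma(0))
ll_far   = interval_loglik(0.49, 0.79, mean=2.0, sigma=sigma(0))
```

Maximizing the sum of such contributions over subjects yields the coefficient and scale estimates; a mean parameter near the observed interval fits far better than one outside it.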
In both of these specifications, a similar result was obtained for the unfair background risk treatment, although it was less significant (p = 0.15).5 In the CRRA model that incorporates the endowment and background risk gains/losses into the utility calculation, neither of the background risk treatments was statistically significant. One interesting result from Table 5, which is not addressed by theory, is that both background risk treatments generated less variability around measured levels of rr and ra than when no background risk was present. Although the interval-censored models have appealing features in that they utilize individual estimates of risk preference and permit a straightforward way to control for subject-specific effects across treatments, there are some drawbacks. In particular, the estimates rest on the assumption of a particular functional form for the utility function, CRRA or CARA. Further, the models do not permit one to determine whether individuals have non-expected utility preferences.6 To address both issues, we used the choices in the decision task to estimate a variety of preference functionals by
Table 4. Effect of Background Risk on Risk Aversion: Interval-Censored Regressions.

                                            Relative Risk Aversion Models(a)              Absolute Risk
                                            Income not integrated(c)  Income integrated(d)  Aversion Model(b)
Constant                                    0.028 (0.374)             0.787 (0.746)         0.004 (0.054)
Gender (1 = female; 0 = male)               0.010 (0.107)             0.034 (0.212)         0.005 (0.016)
Age (age in years)                          0.014 (0.014)             0.016 (0.028)         0.002 (0.002)
Freshman (1 = freshman; 0 = otherwise)      0.0008 (0.173)            0.125 (0.341)         0.001 (0.025)
Sophomore (1 = sophomore; 0 = otherwise)    0.172 (0.129)             0.309 (0.258)         0.025 (0.019)
Junior (1 = junior; 0 = otherwise)          0.132 (0.124)             0.189 (0.249)         0.019 (0.018)
Employment (1 = not employed;
  0 = employed at least part time)          0.035 (0.100)             0.059 (0.199)         0.007 (0.015)
Income (annual income from all sources)     0.037 (0.020)             0.051 (0.039)         0.006 (0.003)
Race (1 = white; 0 = otherwise)             0.211 (0.146)             0.229 (0.264)         0.033 (0.021)
Mean-zero background risk (1 = mean-zero
  background risk treatment; 0 = otherwise) 0.150 (0.133)             0.519 (0.229)         0.018 (0.019)
Unfair background risk (1 = unfair
  background risk treatment; 0 = otherwise) 0.142 (0.114)             0.501 (0.120)         0.019 (0.017)
Scale                                       0.509 (0.034)             1.020 (0.069)         0.074 (0.005)

Note: Numbers in parentheses are standard errors; *, **, and *** represent statistical significance at the 0.10, 0.05, and 0.01 levels, respectively; log-likelihood function values are −224.81, −220.83, and −224.08, respectively, for the three models above.
(a) Dependent variable is the range of individuals' coefficient of relative risk aversion; number of observations = 130.
(b) Dependent variable is the range of individuals' coefficient of absolute risk aversion; number of observations = 130.
(c) Only the earnings from the decision task are incorporated into the expected utility formula used to calculate the CRRA intervals.
(d) The $10 endowment and background risk gains/losses are incorporated into the expected utility formula used to calculate the CRRA intervals.
Table 5. Effect of Background Risk on Risk Aversion: Interval-Censored Regressions with Multiplicative Heteroscedasticity.

                           RRA Model: Income from          RRA Model: Income from          Absolute Risk
                           Background Risk Not             Background Risk                 Aversion Model
                           Incorporated(a)                 Incorporated(b)
                           Mean           Variance         Mean           Variance         Mean           Variance
Constant                   0.782 (0.451)  2.011 (0.648)    1.303 (0.789)  0.863 (0.751)    0.116 (0.064)  3.921 (0.646)
Gender                     0.086 (0.106)  0.618 (0.189)    0.048 (0.134)  0.578 (0.257)    0.013 (0.015)  0.623 (0.197)
Age                        0.016 (0.019)  0.057 (0.025)    0.019 (0.034)  0.053 (0.031)    0.002 (0.003)  0.055 (0.024)
Freshman                   0.044 (0.117)  0.466 (0.308)    0.017 (0.179)  0.731 (0.355)    0.008 (0.017)  0.503 (0.320)
Sophomore                  0.092 (0.121)  0.392 (0.239)    0.113 (0.187)  0.396 (0.292)    0.011 (0.017)  0.440 (0.241)
Junior                     0.128 (0.107)  0.054 (0.208)    0.106 (0.163)  0.002 (0.232)    0.019 (0.016)  0.093 (0.209)
Employment                 0.005 (0.080)  0.424 (0.185)    0.081 (0.106)  0.328 (0.222)    0.001 (0.012)  0.451 (0.188)
Income                     0.043 (0.011)  0.094 (0.041)    0.047 (0.017)  0.072 (0.042)    0.006 (0.002)  0.094 (0.040)
Race                       0.226 (0.106)  0.578 (0.256)    0.300 (0.121)  0.445 (0.260)    0.032 (0.015)  0.599 (0.262)
Mean-zero background risk  0.263 (0.100)  0.779 (0.236)    0.199 (0.232)  1.734 (0.324)    0.036 (0.014)  0.800 (0.243)
Unfair background risk     0.432 (0.212)  0.311 (0.235)    1.367 (0.242)  0.147 (0.097)    0.021 (0.014)  0.419 (0.220)

Note: Numbers in parentheses are standard errors; *, **, and *** represent statistical significance at the 0.10, 0.05, and 0.01 levels, respectively; log-likelihood function values are −205.69, −203.96, and −205.63, respectively, for the three models above.
(a) Dependent variable is the range of individuals' coefficient of relative risk aversion; number of observations = 130; only the earnings from the decision task are incorporated into the expected utility formula used to calculate the CRRA intervals.
(b) Dependent variable is the range of individuals' coefficient of relative risk aversion; number of observations = 130; the $10 endowment and background risk gains/losses are incorporated into the expected utility formula used to calculate the CRRA intervals.
treatment. To carry out this task, an individual is assumed to choose option A if the difference in (rank-dependent) expected utility between options A and B exceeds zero. Adding a mean-zero, normally distributed error term to this difference produces a familiar probit specification. Because the
utility functions we estimate have the properties that U(0) = 0 and that A is chosen if EU(A) − EU(B) > 0, these normalizations allow us to directly estimate the standard deviation of the error in the probit such that the utility coefficients are directly interpretable and comparable across treatments. All the model specifications we consider can be derived from the following (rank-dependent) expected utility preference function:

    RDEU = Σ_{i=1}^{N} π_i [1 − exp(−α x_i^(1−r))]/α    (1)
where N is the number of outcomes (x_i) from a lottery and the x_i are ordered such that x_1 > x_2 > … > x_N. The utility function is the ''power expo'' function used in Holt and Laury (2002), for which the Pratt–Arrow coefficient of relative risk aversion is r + α(1−r)x^(1−r). If r = 0, the utility function exhibits CARA of degree α. If α = 0, the utility function exhibits CRRA, where r is the coefficient of relative risk aversion. Thus, the utility function nests constant relative and constant absolute risk aversion as special cases. In Eq. (1), π_i is a ''decision weight'' that takes the form of rank dependence such as that proposed by Quiggin (1982): π_i = w(p_1 + … + p_i) − w(p_1 + … + p_{i−1}), where p_i is the probability of obtaining x_i and w(p) is a probability weighting function, which we assume to take the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ). If γ = 1, the weighting function is linear and π_i = p_i, which implies that Eq. (1) reduces to expected utility. For values of γ < 1, individuals overweight low-probability events and underweight medium-to-high-probability events. Table 6 reports utility function and probability weighting function estimates for each experimental treatment assuming α = 0 (CRRA) and further assuming that individuals do not incorporate their endowment and background risk gains/losses into utility calculations. The first three columns of results assume expected utility theory is the appropriate model of behavior by fixing γ = 1, whereas the last three columns directly estimate γ. In addition to the probit estimates, the last two rows of Table 6 report results from unconditional interval-censored models for the sake of comparability. Assuming linear probability weighting, we find results very similar to those presented in Tables 4 and 5. The coefficient of CRRA is higher in the two background risk treatments than in the treatment without background risk, although the 95% confidence intervals overlap. We also find lower variance in the background risk treatments.
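The decision-weight construction can be sketched in a few lines. This is an illustrative implementation of the weighting function and rank-dependent weights just defined (names are ours, not the authors' code), with outcome probabilities listed from best outcome to worst:

```python
def w(p, gamma):
    """Probability weighting function w(p) = p^g / [p^g + (1-p)^g]^(1/g)."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def decision_weights(probs, gamma):
    """Rank-dependent weights pi_i = w(p1+...+pi) - w(p1+...+p_{i-1}),
    where probs lists outcome probabilities from best outcome to worst."""
    weights, cum_prev = [], 0.0
    for p in probs:
        cum = cum_prev + p
        weights.append(w(cum, gamma) - w(cum_prev, gamma))
        cum_prev = cum
    return weights

# With gamma = 1 the weights reduce to the raw probabilities (expected
# utility); with gamma < 1 the 10% chance of the best outcome is overweighted.
eu_weights  = decision_weights([0.1, 0.9], 1.0)
rdu_weights = decision_weights([0.1, 0.9], 0.7)
```

Because the weights telescope, they always sum to w(1) = 1, so Eq. (1) remains a proper weighted average of the outcome utilities.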
We also note that the probit and interval-censored specifications generate nearly identical results. The last three columns in Table 6 allow for non-linear probability
Table 6. Preference Function Estimates by Background Risk Treatment, Ignoring Endowment and Income/Loss from Background Risk (Assuming α = 0)(a).

                 Models Assuming Linear Probability Weighting            Models Assuming Nonlinear Probability Weighting
                                                                         and Rank Dependence(b)
                 No background      Mean-zero          Unfair            No background       Mean-zero           Unfair
                 risk(c)            background risk(d) background risk(e) risk(c)            background risk(d)  background risk(e)

Probit model estimates
σ/2(f)           0.60 [0.36, 0.83](g) 0.33 [0.21, 0.45]  0.51 [0.38, 0.64]  0.62 [0.19, 1.05]   0.47 [0.08, 1.02]   0.61 [0.41, 0.81]
rr               0.46 [0.31, 0.61]    0.60 [0.48, 0.72]  0.55 [0.43, 0.68]  0.27 [−0.60, 1.14]  0.08 [−1.02, 1.19]  0.55 [0.38, 0.71]
γ                1                    1                  1                  0.70 [0.00, 1.41]   0.56 [0.11, 0.99]   1.32 [0.95, 1.69]

Interval-censored model estimates
σ(f)             0.62 [0.49, 0.75]    0.33 [0.23, 0.43]  0.51 [0.40, 0.62]
rr               0.46 [0.29, 0.63]    0.62 [0.49, 0.75]  0.57 [0.42, 0.72]

(a) Utility function takes the form U(x) = x^(1−rr)/(1−rr), where rr is the coefficient of relative risk aversion and x are the prizes in the decision task in Table 1.
(b) Probability weighting function is of the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ).
(c) Sample size = 50 in the interval-censored model and 50 individuals × 10 choices = 500 in the probit model.
(d) Sample size = 27 in the interval-censored model and 27 individuals × 10 choices = 270 in the probit model.
(e) Sample size = 53 in the interval-censored model and 53 individuals × 10 choices = 530 in the probit model.
(f) σ is the standard deviation of the error term in the model.
(g) Numbers in brackets are 95% confidence intervals.
weighting. For the no background risk and mean-zero background risk treatments, estimates of γ are less than one and are consistent with previously published estimates, which range from 0.56 to 0.71 (e.g., Camerer & Ho, 1994; Tversky & Kahneman, 1992; Wu & Gonzalez, 1996), although the mean-zero background risk treatment is the only treatment for which the 95% confidence interval for γ does not include one. Once probability weighting is taken into account, we are no longer able to reject the hypothesis that individuals' utility functions are linear in the no background risk and mean-zero background risk treatments; however, individuals in the unfair background risk treatment still exhibit risk aversion after probability weighting is taken into account.
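The probit estimation idea behind these tables can be sketched as follows. This is a simplified illustration for the CRRA, γ = 1 case ignoring the endowment, using the $10/$8 and $19/$1 decision-task payoffs; the authors' actual estimation code is available on ExLab, and the names below are ours:

```python
import math

def crra_u(x, rr):
    """CRRA utility with U(0) = 0 for rr < 1; log form at rr = 1."""
    return math.log(x) if abs(rr - 1.0) < 1e-9 else x ** (1.0 - rr) / (1.0 - rr)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(choices, rr, sigma):
    """Probit log-likelihood: P(choose A) = Phi((EU_A - EU_B) / sigma).

    choices: (p_high, chose_a) pairs over the ten decision rows, where
    option A pays $10/$8 and option B pays $19/$1.
    """
    ll = 0.0
    for p, chose_a in choices:
        eu_a = p * crra_u(10.0, rr) + (1.0 - p) * crra_u(8.0, rr)
        eu_b = p * crra_u(19.0, rr) + (1.0 - p) * crra_u(1.0, rr)
        p_a = norm_cdf((eu_a - eu_b) / sigma)
        p_a = min(max(p_a, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += math.log(p_a if chose_a else 1.0 - p_a)
    return ll

# A subject with rr near 0.5 switches to option B at decision 7:
choices = [(k / 10.0, k <= 6) for k in range(1, 11)]
```

Maximizing this likelihood over (rr, σ), choice pattern by choice pattern, yields point estimates and confidence intervals of the kind reported in the probit rows of Table 6.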
Table 7. Preference Function Estimates by Background Risk Treatment, Ignoring Endowment and Income/Loss from Background Risk(a).

                 Models Assuming Linear Probability Weighting              Models Assuming Nonlinear Probability Weighting
                                                                           and Rank Dependence(b)
                 No background      Mean-zero          Unfair              No background       Mean-zero           Unfair
                 risk(c)            background risk(d) background risk(e)  risk(c)             background risk(d)  background risk(e)

Probit model estimates
σ/2(f)           0.51 [0.47, 0.54](g) 0.47 [0.45, 0.48]  0.51 [0.49, 0.53]   0.55 [0.50, 0.60]   0.59 [−0.67, 1.85]  0.58 [0.29, 0.87]
r                0.23 [0.09, 0.37]    0.03 [−0.12, 0.18] 0.18 [0.10, 0.26]   0.16 [0.04, 0.28]   0.11 [−0.99, 0.77]  0.20 [0.10, 0.30]
α                0.07 [0.02, 0.12]    0.09 [0.01, 0.11]  0.09 [0.06, 0.12]   0.02 [−0.19, 0.23]  0.02 [−0.05, 0.09]  0.09 [0.05, 0.13]
γ                1                    1                  1                   0.70 [0.01, 1.41]   0.56 [0.09, 1.03]   1.32 [0.95, 1.69]

(a) Utility function takes the form U(x) = [1 − exp(−α x^(1−r))]/α, where r is the coefficient of relative risk aversion, α is the coefficient of absolute risk aversion, and x are the prizes in the decision task in Table 1.
(b) Probability weighting function is of the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ).
(c) Sample size = 50 individuals × 10 choices = 500.
(d) Sample size = 27 individuals × 10 choices = 270.
(e) Sample size = 53 individuals × 10 choices = 530.
(f) σ is the standard deviation of the error term in the model.
(g) Numbers in brackets are 95% confidence intervals.
Table 7 reports estimates similar to those in Table 6 except that the restriction α = 0 is relaxed. Overall, our estimates are similar to those of Holt and Laury (2002), who estimated α = 0.27 and r = 0.03, which implies increasing relative risk aversion and decreasing absolute risk aversion (DARA). Although the point estimates reveal differences in behavior across the three experimental treatments, the 95% confidence intervals overlap for every parameter of interest, regardless of whether we assume linear probability weighting. Table 8 reports utility function estimates assuming α = 0 (CRRA) and that individuals incorporate their endowment and background risk gains/losses into utility calculations. In addition to the probit estimates, we also present results from the simple interval-censored models for comparison. As in Table 4, we find higher levels of risk aversion in the no background risk treatment than in the two treatments that incorporated background risk; however, the 95% confidence intervals overlap.
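The IRRA/DARA reading of such estimates follows directly from the power-expo curvature measure quoted earlier, relative risk aversion r + α(1−r)x^(1−r), since absolute risk aversion is relative risk aversion divided by the stake x. A small numerical check, using the printed point estimates purely for illustration:

```python
def rra(x, r, a):
    """Relative risk aversion of the power-expo function: r + a(1-r)x^(1-r)."""
    return r + a * (1.0 - r) * x ** (1.0 - r)

def ara(x, r, a):
    """Absolute risk aversion is relative risk aversion divided by x."""
    return rra(x, r, a) / x

# With a > 0 and 0 < r < 1, relative risk aversion rises with the stakes
# while absolute risk aversion falls (illustrative values, not a re-estimation):
r, a = 0.03, 0.27
```

Evaluating at, say, x = 1 and x = 5 shows rra increasing and ara decreasing, which is the increasing-relative, decreasing-absolute pattern described above.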
Table 8. Preference Function Estimates by Background Risk Treatment, Incorporating $10 Endowment and Income/Losses from Background Risk (Assuming α = 0)(a).

                 Models Assuming Linear Probability Weighting             Models Assuming Nonlinear Probability Weighting
                                                                          and Rank Dependence(b)
                 No background      Mean-zero          Unfair             No background       Mean-zero           Unfair
                 risk(c)            background risk(d) background risk(e) risk(c)             background risk(d)  background risk(e)

Probit model estimates
σ/2(f)           0.10 [0.02, 0.22]    0.35 [0.23, 0.47](g) 0.62 [0.58, 0.64]  0.30 [−1.31, 1.92]  1.12 [0.81, 3.05]   0.65 [0.39, 0.91]
rr               1.15 [0.76, 1.54]    0.73 [0.61, 0.86]    0.69 [0.55, 0.83]  0.67 [−1.54, 2.88]  0.01 [−1.34, 1.36]  0.74 [0.56, 0.92]
γ                1                    1                    1                  0.70 [0.00, 1.40]   0.48 [0.11, 1.07]   1.45 [1.10, 1.80]

Interval-censored model estimates
σ(f)             1.58 [1.25, 1.91]    0.35 [0.24, 0.46]    0.57 [0.45, 0.69]
rr               1.24 [0.79, 1.69]    0.74 [0.60, 0.88]    0.66 [0.50, 0.82]

(a) Utility function takes the form U(x) = x^(1−rr)/(1−rr), where rr is the coefficient of relative risk aversion and x are final wealth states including the $10 endowment and the income from the background risk lotteries.
(b) Probability weighting function is of the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ).
(c) Sample size = 50 in the interval-censored model and 50 individuals × 10 choices = 500 in the probit model.
(d) Sample size = 27 in the interval-censored model and 27 individuals × 10 choices = 270 in the probit model.
(e) Sample size = 53 in the interval-censored model and 53 individuals × 10 choices = 530 in the probit model.
(f) σ is the standard deviation of the error term in the model.
(g) Numbers in brackets are 95% confidence intervals.
CONCLUSION Whether and to what extent preferences for risk are affected by background risk carries important implications for economic analysis. If subjects are significantly influenced by background risk, economic analyses must move beyond studies of behavior in single, isolated risky situations, which can almost never be expected to arise in practice. For example, changes in public policy that limit risk-taking behavior in one domain might generate seemingly counterintuitive results by increasing risk-taking behavior in other domains.
We found little support for the notion that individuals made choices consistent with risk vulnerability as defined by Gollier and Pratt (1996). We found that a mean-preserving increase in background risk had a stronger influence on risk aversion than the addition of an unfair background risk. Individuals that were forced to play a lottery with a 50% chance of winning $10 and a 50% chance of losing $10 behaved in a more risk-averse manner than individuals that were not exposed to such a lottery. However, this finding depends on: (1) how individuals incorporate endowments and background gains and losses into their utility functions and (2) how error variance is modeled. It is also important to note that much of the risk-averse behavior in this treatment may arise from non-linear probability weighting. We found weak evidence that individuals may weight probabilities differently in the unfair background risk treatment than in the other treatments; only for this treatment were we able to reject the hypothesis of linear probability weighting. Finally, we found that background risk, whether mean-preserving or unfair, generated less variable estimates of the coefficients of relative and absolute risk aversion than when no background risk was present, although some of this effect dissipates when we allow for non-linear probability weighting. Although previous theoretical work has generated plausible signs on the effect of background risk on risk-taking behavior, it is silent regarding the distribution of risk preferences with and without background risk. In general, however, we found that the effect of background risk on risk preferences was not particularly large in this experiment. There may be a variety of factors contributing to this result, some relating to experimental design issues and others that are farther reaching. Regarding the experimental design, future work on this issue might consider using a more precise risk-elicitation approach.
Although the decision task shown in Table 1 is easy for subjects to complete, it identifies only a range of plausible risk preferences. To the extent that background risks have only a small effect on risk-taking behavior, a more refined elicitation tool is required to measure the effect. Future experiments might also vary the range of earnings in the no background risk treatment. In our experiment, the final monetary outcomes of the treatment without background risk did not span the range of final outcomes in the treatments with background risk. Aside from experimental design issues, other factors might be related to our finding that background risk has a small to no effect on risk-taking behavior. First, experimental subjects may bring a number of background risks with them into the experiment. If so, non-experimental background risks might swamp the effect of experimentally induced background risk.
Future laboratory research investigating the effect of background risk on risk preferences might focus on methods for measuring and controlling for other ''field'' background risks. Second, the sample of respondents might have been heterogeneous with regard to preferences; some individuals might have been expected utility maximizers, while others might have had generalized expected utility preferences. If some portion of the sample had generalized expected utility preferences, the results in Quiggin (2003) suggest these individuals would behave in a less risk-averse way when confronted with a background risk, which would tend to dampen the aggregate results presented here. A risk preference elicitation approach that permitted a test of expected utility for each individual would be able to sort out these issues. Finally, behavioral research suggests that when confronted with several risky choices, individuals tend to assess each risky choice in isolation rather than assessing all risks jointly (Benartzi & Thaler, 1995; Kahneman & Lovallo, 1993; Read, Loewenstein, & Rabin, 1999). This behavior might cause individuals to at least partially disregard background risks when making endogenous risky decisions. Such behavior would cause background risk to have a smaller effect on risk preferences than predicted by the models of Gollier and Pratt (1996) or Quiggin (2003). Given the implications of background risk, our results clearly suggest that this is a research area meriting further experimental work to test these alternative theories.
NOTES
1. The values in Table 1 are roughly five times the baseline treatment used by Holt and Laury (2002).
2. Because only 1 of the 10 choices was picked at random, there is some background risk present in all treatments; however, this particular background risk is constant across all treatments.
3. Instructions for each treatment are in the appendix.
4. All data and computer code used to generate the results in this paper are available on ExLab (http://exlab.bus.ucf.edu).
5. We are able to reject the joint hypothesis that rr is unaffected by mean-zero and unfair background risks at the p = 0.05 level of statistical significance for the CRRA model without income from background risk. A similar result (p = 0.06) is obtained for ra.
6. As shown by Harrison (2006), allowing for non-EU preferences can have a substantive impact on the interpretation of results. Harrison (2006) showed that while there are significant differences in behavior between real and hypothetical treatments, some non-EU models suggest that the difference arises due to changes in the probability weighting function and not due to changes in the utility or value function.
7. Note to the reader: Strictly speaking, a 10-sided die cannot be constructed to provide an exact uniform distribution; the 10-sided die gives an approximately equal chance of each decision being binding.
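The random-lottery payment mechanism referenced in Notes 2 and 7 (two die throws: one to pick the binding decision, one to resolve the chosen option) can be simulated in a few lines. This is an illustrative sketch of the procedure described in the appendix instructions, not software used in the experiment:

```python
import random

# (high, low) payoffs for each option; decision k pays `high` when the
# second throw is 1..k and `low` otherwise.
PAYOFFS = {"A": (10.0, 8.0), "B": (19.0, 1.0)}

def resolve_earnings(choices, rng):
    """choices: list of 'A'/'B' for decisions 1..10."""
    decision = rng.randint(1, 10)     # first throw: which decision counts
    high, low = PAYOFFS[choices[decision - 1]]
    throw = rng.randint(1, 10)        # second throw: resolve the option
    return high if throw <= decision else low

rng = random.Random(0)
choices = ["A"] * 6 + ["B"] * 4       # safe for rows 1-6, risky thereafter
earnings = resolve_earnings(choices, rng)
```

Note that if decision 10 is drawn, `throw <= decision` always holds, so each option pays its high prize for sure, matching the instructions' statement that the die is not needed for that row.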
ACKNOWLEDGMENTS The authors would like to thank Glenn Harrison, Jason Shogren, and Eric Rasmusen for their helpful comments on the previous version of this paper.
REFERENCES
Alessie, R., Hochguertel, S., & van Soest, A. (2001). Household portfolios in the Netherlands. In: L. Guiso, M. Haliassos & T. Jappelli (Eds), Household portfolios. Cambridge, MA: MIT Press.
Arrondel, L., & Masson, A. (1996). Gestion du risque et comportements patrimoniaux. Economie et Statistique, 296–297, 63–89.
Benartzi, S., & Thaler, R. H. (1995). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics, 110, 73–93.
Binswanger, H. P. (1980). Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62, 396–407.
Camerer, C., & Ho, T. (1994). Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty, 8, 167–196.
Diamond, D. W. (1984). Financial intermediation and delegated monitoring. Review of Economic Studies, 51, 393–414.
Eeckhoudt, L., Gollier, C., & Schlesinger, H. (1996). Changes in background risk and risk taking behavior. Econometrica, 64, 683–689.
Gollier, C., & Pratt, J. W. (1996). Risk vulnerability and the tempering effect of background risk. Econometrica, 64, 1109–1123.
Guiso, L., Jappelli, T., & Terlizzese, D. (1996). Income risk, borrowing constraints and portfolio choice. American Economic Review, 86, 158–172.
Harrison, G. W. (2006). Hypothetical bias over uncertain outcomes. In: J. A. List (Ed.), Using experimental methods in environmental and resource economics. Northampton, MA: Elgar.
Harrison, G. W., List, J. A., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75, 433–458.
Heaton, J., & Lucas, D. (2000). Portfolio choice in the presence of background risk. Economic Journal, 110, 1–26.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Kahneman, D., & Lovallo, D. (1993). Timid choices and bold forecasts: A cognitive perspective on risk taking. Management Science, 39, 17–32.
Kimball, M. S. (1990). Precautionary savings in the small and in the large. Econometrica, 58, 53–73.
Pratt, J. W., & Zeckhauser, R. (1987). Proper risk aversion. Econometrica, 55, 143–154.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323–343.
Quiggin, J. (2003). Background risk in generalized expected utility theory. Economic Theory, 22, 607–611.
Read, D., Loewenstein, G., & Rabin, M. (1999). Choice bracketing. Journal of Risk and Uncertainty, 19, 171–197.
Safra, Z., & Segal, U. (1998). Constant risk aversion. Journal of Economic Theory, 83, 19–42.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
Weil, P. (1992). Equilibrium asset prices with undiversifiable labor income risk. Journal of Economic Dynamics and Control, 16, 769–790.
Wu, G., & Gonzalez, R. (1996). Curvature of the probability weighting function. Management Science, 42, 1676–1690.
APPENDIX. EXPERIMENT INSTRUCTIONS Beginning Instructions – Common to All Three Treatments Thank you for agreeing to participate in today's session. Before beginning today's exercise, I have two requests. First, you should sit some distance from any of the other participants. Second, other than questions directed toward me, there is to be NO talking. Failure to comply with the no talking policy will result in immediate disqualification from this exercise. Before we begin, I want to emphasize that your participation in this session is completely voluntary. If you do not wish to participate in the experiment, please say so at any time. Non-participants will not be penalized in any way. I want to assure you that the information you provide will be kept strictly confidential and used only for the purposes of this research. At this time, you should have been given a consent form. Please sign this form and return it to me. Now, you will be given $10.00 and a packet with two separate documents. The $10.00 is yours to keep and it has been provided to compensate you for your time. In the upper right hand corner of the documents is an ID number. This ID number is used to ensure confidentiality. In today's session, you will participate in two exercises. First, I would like you all to look at the document titled ''Survey on Consumer Opinions.'' At this time, take the next 20–30 min to complete the survey. When you complete the survey, then we will proceed to the second exercise, which will be explained after everyone has completed the survey. Are there any questions before we begin? {to be read after completion of the survey}
Has everyone completed the survey? Please return the completed survey to me. Now, you will participate in an exercise where you will have the opportunity to earn money. You will be asked to make several choices, which will determine how much money you will earn. Please turn your attention to the second document you have been given, which is titled, ‘‘Decision Record Sheet.’’ Instructions for the No-Background Risk Treatment Your decision sheet shows ten decisions numbered one to ten on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices (either A or B) and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your ten choices, please let me explain how these choices will affect your earnings for the experiment. Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, either A or B. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end.7 Now, please look at Decision 1 at the top. Option A pays $10.00 if the throw of the ten-sided die is 1, and it pays $8.00 if the throw is 2–10. Option B yields $19.00 if the throw of the die is 1, and it pays $1.00 if the throw is 2–10. Similarly, for Decision 2, Option A will pay $10.00 if the throw of the die is 1 or 2 and will pay $8.00 if the throw of the die is 3–10. The other decisions are similar, except that as you move down the table, the chances of the higher payoff for each option increase. 
In fact, for Decision 10 in the bottom row, the die will not be needed since each option pays the highest payoff for sure, so your choice here is between $10.00 or $19.00. To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your money earnings for the option you chose for that decision. Earnings for this choice will be paid in cash when we finish.
So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B in each of the ten boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings. Then you will write your earnings in the blank at the bottom of the page. Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question. Instructions for Mean-Zero Background Risk Treatment Your decision sheet shows ten decisions numbered one to ten on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices (either A or B) and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your ten choices, please let me explain how these choices will affect your earnings for the experiment. Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, either A or B. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end. Now, please look at Decision 1 at the top. Option A pays $10.00 if the throw of the ten-sided die is 1, and it pays $8.00 if the throw is 2–10. Option B yields $19.00 if the throw of the die is 1, and it pays $1.00 if the throw is 2–10. 
Similarly, for Decision 2, Option A will pay $10.00 if the throw of the die is 1 or 2 and will pay $8.00 if the throw of the die is 3–10. The other decisions are similar, except that as you move down the table, the chances of the higher payoff for each option increase. In fact, for Decision 10 in the bottom row, the die will not be needed since each option pays the highest payoff for sure, so your choice here is between $10.00 or $19.00. To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your money
earnings for the option you chose for that decision. Earnings for this choice will be paid in cash when we finish. So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B in each of the ten boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings. Then you will write your earnings from the Decision Task in the first blank at the bottom of the page marked ''Earnings from Decision Task.'' Are there any questions about the Decision Task before the next part of this exercise is explained? After your earnings from the Decision Task are determined, you will participate in a lottery. In this lottery, there is a 50% chance of losing $10.00 and a 50% chance of winning $10.00. So, after your earnings from the Decision Task are determined, while we are still at your desk, we will roll the die again. If the throw of the die is 1–5, you will lose $10.00, but if the throw of the die comes up 6–10, you will earn $10.00. After your earnings from the lottery are determined, you will write this amount on the second blank at the bottom of the page marked ''Earnings from Lottery.'' Total earnings for the experiment are determined by adding ''Earnings from Decision Task'' and ''Earnings from Lottery.'' Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question. Instructions for Unfair Background Risk Treatment Your decision sheet shows ten decisions numbered one to ten on the left. Each decision is a paired choice between ''Option A'' and ''Option B.'' You will make ten choices (either A or B) and record these in the final column, but only one of them will be used in the end to determine your earnings.
Before you start making your ten choices, please let me explain how these choices will affect your earnings for the experiment. Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, either A or B. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end. Now, please look at Decision 1 at the top. Option A pays $10.00 if the throw of the ten-sided die is 1, and it pays $8.00 if the throw is 2–10.
Risk Aversion in the Presence of Background Risk
339
Option B yields $19.00 if the throw of the die is 1, and it pays $1.00 if the throw is 2–10. Similarly, for Decision 2, Option A will pay $10.00 if the throw of the die is 1 or 2 and will pay $8.00 if the throw of the die is 3–10. The other decisions are similar, except that as you move down the table, the chances of the higher payoff for each option increase. In fact, for Decision 10 in the bottom row, the die will not be needed since each option pays the highest payoff for sure, so your choice here is between $10.00 and $19.00. To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your money earnings for the option you chose for that decision. Earnings for this choice will be paid in cash when we finish. So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B, in each of the ten boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings. Then you will write your earnings from the Decision Task in the first blank at the bottom of the page marked ‘‘Earnings from Decision Task.’’ Are there any questions about the Decision Task before the next part of this exercise is explained? After your earnings from the Decision Task are determined, you will participate in a lottery. In this lottery, there is a 50% chance of losing $10.00 and a 50% chance of winning $0.00. So, after your earnings from the Decision Task are determined, while we are still at your desk, we will roll the die again.
If the throw of the die is 1–5, you will lose $10.00, but if the throw of the die comes up 6–10, you will earn $0.00. After your earnings from the lottery are determined, you will write this amount on the second blank at the bottom of the page marked ‘‘Earnings from Lottery.’’ Total earnings for the experiment are determined by adding ‘‘Earnings from Decision Task’’ and ‘‘Earnings from Lottery.’’ Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question.
Participant Number _______________

Decision Record Sheet

Decision Task

Decision   Option A                                     Option B                                     Which Option is Preferred?
1          10% chance of $10.00, 90% chance of $8.00    10% chance of $19.00, 90% chance of $1.00
2          20% chance of $10.00, 80% chance of $8.00    20% chance of $19.00, 80% chance of $1.00
3          30% chance of $10.00, 70% chance of $8.00    30% chance of $19.00, 70% chance of $1.00
4          40% chance of $10.00, 60% chance of $8.00    40% chance of $19.00, 60% chance of $1.00
5          50% chance of $10.00, 50% chance of $8.00    50% chance of $19.00, 50% chance of $1.00
6          60% chance of $10.00, 40% chance of $8.00    60% chance of $19.00, 40% chance of $1.00
7          70% chance of $10.00, 30% chance of $8.00    70% chance of $19.00, 30% chance of $1.00
8          80% chance of $10.00, 20% chance of $8.00    80% chance of $19.00, 20% chance of $1.00
9          90% chance of $10.00, 10% chance of $8.00    90% chance of $19.00, 10% chance of $1.00
10         100% chance of $10.00, 0% chance of $8.00    100% chance of $19.00, 0% chance of $1.00

Earnings from Decision Task   $_______________
Earnings from Lottery         $_______________
Total Earnings                $_______________
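The record sheet above has the standard multiple-price-list structure, so the row at which a risk-neutral subject should switch from Option A to Option B can be checked with simple expected-value arithmetic. The following sketch is ours, not part of the original instructions; it only uses the payoffs printed on the record sheet:

```python
# Expected payoff of each option in decision row n: the chance of the
# higher payoff is n/10, with Option A paying $10.00 or $8.00 and
# Option B paying $19.00 or $1.00.
def expected_values(n):
    p = n / 10.0
    ev_a = p * 10.00 + (1 - p) * 8.00
    ev_b = p * 19.00 + (1 - p) * 1.00
    return ev_a, ev_b

for n in range(1, 11):
    ev_a, ev_b = expected_values(n)
    print(f"Decision {n:2d}: EV(A) = {ev_a:5.2f}, EV(B) = {ev_b:5.2f}")
```

Option B first has the higher expected value at Decision 5, so a risk-neutral subject would choose A in rows 1–4 and B thereafter; switching to B later than row 5 indicates risk aversion.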
RISK AVERSION IN LABORATORY ASSET MARKETS

Peter Bossaerts and William R. Zame

ABSTRACT

This paper reports findings from a series of laboratory asset markets. Although stakes in these markets are modest, asset prices display a substantial equity premium (risky assets are priced substantially below their expected payoffs) – indicating substantial risk aversion. Moreover, the differences between expected asset payoffs and asset prices are in the direction predicted by standard asset-pricing theory: assets with higher beta have higher returns. This work suggests ways to separate the effects of risk aversion from competing explanations in other experimental environments.
1. INTRODUCTION

Forty years of econometric tests have provided only weak support for the predictions of asset-pricing theories (see Davis, Fama, & French, 2000, for instance). However, it is difficult to know where the problems in such models lie, or how to improve them, because basic parameters of the theories – including the market portfolio, the true distribution of asset returns, and the information available to investors – cannot be observed in the
Risk Aversion in Experiments Research in Experimental Economics, Volume 12, 341–358 Copyright r 2008 by Emerald Group Publishing Limited All rights of reproduction in any form reserved ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00007-0
historical record. Laboratory tests of these theories are appealing because these basic parameters (and others) can be observed accurately – or even controlled. However, most asset-pricing theories rest on the assumption that individuals are risk averse.1 Because risks and rewards in laboratory experiments are (almost of necessity) small (in comparison to subjects’ lifetime wealth, or even current wealth), the degree of risk aversion observable in the laboratory might be so small as to be undetectable in the unavoidable noise, which would present an insurmountable problem. This paper reports findings from a series of laboratory asset markets that belie this concern: despite relatively small risks and rewards, the effects of risk aversion are detectable and significant. Most obviously, observed asset prices imply a significant equity premium: risky assets are priced significantly below their expected payoffs. Moreover, the differences between expected asset payoffs and returns (payoffs per unit of investment) are in the direction predicted by standard asset-pricing theory: assets with higher beta have higher returns. In our laboratory markets, 30–60 subjects trade one riskless and two risky securities (whose dividends depend on the state of nature) and cash. Each experiment is divided into 6–9 periods. At the beginning of each period, subjects are endowed with a portfolio of securities and cash. During the period, subjects trade through a continuous, web-based open-book system (a form of double auction that keeps track of infra-marginal bids and offers). After a pre-specified time, trading halts, the state of nature is drawn, and subjects are paid according to their terminal holdings. The entire situation is repeated in each period but the state of nature is drawn anew at the end of each period. 
Subjects know the dividend structure (the payoff of each security in each state of nature) and the probability that each state will occur, and of course they know their own holdings and their own attitudes toward wealth and risk. They also have access to the history of orders and trades. Subjects do not know the number of participants in any given experiment, nor the holdings of other participants, nor the market portfolio. Typical earnings in a single experiment (lasting more than 2 h) are $50–100 per subject. Although this is a substantial wage for some subjects, it is small in comparison to lifetime wealth, or indeed to current wealth (the pool of subjects consists of undergraduates and MBA students). Small rewards suggest approximately risk-neutral behavior, asset prices nearly coincident with expected payoffs, little incentive to trade, and hence little trade at all. However, our experimental data are inconsistent with these implications of risk neutrality; rather the data suggest significant risk aversion. Most obviously, substantial trade takes place and market prices are below expected
returns; moreover, assets with higher beta have higher returns/lower prices (as predicted by standard asset-pricing theories). Quantitative measures of risk aversion are provided by the Sharpe ratios of the market portfolio, which are in the range 0.2–1.7 – on the same order as the Sharpe ratio of the New York Stock Exchange (NYSE; computed on the basis of yearly data), which is 0.43 – and the imputed market risk aversion derived from CAPM, which is approximately 10⁻³. Following this introduction, Section 2 describes our experimental asset markets, and Section 3 presents the data generated by these experiments and the relationship of these data to standard asset-pricing theories. Section 4 suggests implications of our experiments for the design and interpretation of other experiments where risk aversion may play a role, and concludes.
2. EXPERIMENTAL DESIGN In our laboratory markets the objects of trade are assets (state-dependent claims to wealth at the terminal time) A, B, N (Notes), and Cash. Notes are riskless and can be held in positive or negative amounts (can be sold short); assets A and B are risky and can only be held in non-negative amounts (cannot be sold short). Each experimental session of approximately 2 h is divided into 6–9 periods, lasting 15–20 min. (The length of the period is determined and announced to subjects in advance. Within each period, subject computers show time remaining.) At the beginning of a period, each subject (investor) is endowed with a portfolio of assets and Cash; the endowment of risky assets and Cash are non-negative, the endowment of Notes is negative (representing a loan that must be repaid). During the period, the market is open and assets may be traded for Cash. Trades are executed through an electronic open book system (a continuous double auction). During the period, while the market is open, no information about the state of nature is revealed, and no credits are made to subject accounts; in effect, consumption takes place only at the close of the market. At the end of each period, the market closes, the state of nature is drawn, payments on assets are made, and dividends are credited to subject accounts. (In some experiments, subjects were also given a bonus upon completion of the experiment.) Accounting in these experiments is in a fictitious currency called francs, to be exchanged for dollars at the end of the experiment at a pre-announced exchange rate. Subjects whose cumulative earnings at the end of a period are not sufficient to repay their loan are bankrupt; subjects who are bankrupt
for two consecutive trading periods are barred from trading in future periods.2 In effect, therefore, consumption in a given period can be negative. Subjects know their own endowments, and are informed about asset payoffs in each of the three states of nature X, Y, Z, and of the objective probability distribution over states of nature. We use two treatments of uncertainty. In the first treatment, states of nature for each period are drawn independently with probabilities 1/3, 1/3, 1/3; randomization is achieved by using a random number generator or by drawing with replacement from an urn containing equal numbers of balls representing each state. In the second treatment, balls, marked with the state, are drawn without replacement from an urn initially containing 18 balls, 6 for each state.3 (In each treatment, subjects are informed of the procedure.) Asset payoffs are shown in Table 1 (1 unit of Cash is 1 franc in each state of nature), and the remaining parameters for each experiment are shown in Table 2. (Experiments are identified by year-month-day.) In all experiments, subjects were given complete instructions, including descriptions of some portfolio strategies (but no suggestions as to which strategies to choose). Complete instructions and other details are available at http://eeps3.caltech.edu/market-011126; use anonymous login, ID 1, and password a. Subjects are not informed of the endowments of others, or of the market portfolio (the social endowment of all assets), or the number of subjects, or whether these are the same from one period to the next. The information provided to subjects parallels the information available to participants in stock markets such as the NYSE and the Paris Bourse.
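Under the second treatment, each draw changes the composition of the urn and hence the state probabilities subjects face in subsequent periods. A minimal sketch of that bookkeeping (our illustration, not the experimental software):

```python
from fractions import Fraction

# 18 balls, 6 per state; drawing without replacement shifts the state
# probabilities from period to period.
def state_probabilities(remaining):
    total = sum(remaining.values())
    return {state: Fraction(count, total) for state, count in remaining.items()}

urn = {"X": 6, "Y": 6, "Z": 6}
print(state_probabilities(urn))   # each state has probability 1/3

urn["X"] -= 1                     # suppose state X is drawn in period 1
print(state_probabilities(urn))   # X: 5/17, Y: 6/17, Z: 6/17
```

Under the first (with-replacement) treatment the probabilities would instead stay at 1/3 every period.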
We are especially careful not to provide information about the market portfolio, so that subjects cannot easily deduce the nature of aggregate risk – lest they attempt to use a standard model (such as CAPM) to predict prices, rather than to take observed prices as given. Keep in mind that neither general equilibrium theory nor asset-pricing theory requires that participants have any more information than is provided in these experiments. Indeed, much of the power of these theories comes precisely from the fact that agents know only market prices and their own preferences and endowments.

Table 1. Asset Payoffs.

State     X     Y     Z
A       170   370   150
B       160   190   250
N       100   100   100
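Under the independent-draw treatment (equal state probabilities), the expected payoff of each security follows directly from Table 1; a quick check of the arithmetic (ours, not the authors' code):

```python
# Expected payoff of each security when states X, Y, Z are equally
# likely (payoffs from Table 1).
payoffs = {
    "A": [170, 370, 150],
    "B": [160, 190, 250],
    "N": [100, 100, 100],
}

expected_payoff = {sec: sum(divs) / len(divs) for sec, divs in payoffs.items()}
print(expected_payoff)   # {'A': 230.0, 'B': 200.0, 'N': 100.0}
```

These expectations (230 francs for A, 200 for B, 100 for Notes) are the risk-neutral price predictions against which observed prices are compared in Section 3.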
Table 2. Experimental Parameters.

Date      Subject Pool(a)  Draw Type(b)  Subject Category (Number)  Bonus Reward (franc)  Endowment A  Endowment B  Endowment Notes(c)  Cash (franc)  Exchange Rate ($/franc)
98-10-07  Yale       I     30     0      4    4    19      400    0.03
98-11-16  UCLA       I     23     0      5    4    20      400    0.03
                           21     0      2    7    20      400    0.03
99-02-11  Yale       I      8     0      5    4    20      400    0.03
                           11     0      2    7    20      400    0.03
99-04-07  Stanford   I     22     175    9    1    25      400    0.03
                           22     175    1    9    24      400    0.04
99-11-10  Tulane     I     33     175    5    4    22      400    0.04
                           30     175    2    8    23.1    400    0.04
99-11-11  Berkeley   I     22     175    5    4    22      400    0.04
                           23     175    2    8    23.1    400    0.04
01-11-14  Caltech    D     21     125    5    4    22      400    0.04
                           12     125    2    8    23.1    400    0.04
01-11-26  Sofia      D     18     125    5    4    22      400    0.04
                           18     125    2    8    23.1    400    0.04
01-12-05  Caltech    D     17     125    5    4    22      400    0.04
                           17     125    2    8    23.1    400    0.04

(a) Place where subjects attended college.
(b) I, states are drawn independently across periods; D, states are drawn without replacement, starting from a population of 18 balls, 6 of each type (state).
(c) As discussed in the text, endowment of Notes includes loans to be repaid at the end of the period.
Keep in mind that the social endowment (the market portfolio), the distribution of endowments, and the set of subjects and hence preferences differ across experiments. Indeed, because preferences may be affected by earnings during the experiment, the possibility of bankruptcy, and the time to the end of the experiment, preferences may even be different across periods in the same experiment. Because equilibrium prices and choices depend on all of these, and because of the inevitable noise present in every experiment, there is every reason to expect equilibrium prices and choices to be different across experiments or even across different periods in a given experiment. Most of the subjects in these experiments had some knowledge of economics in general and of financial economics in particular. In one experiment (01-11-26), subjects were mathematics undergraduates at the University of Sofia (Bulgaria), and were perhaps less knowledgeable about economics and finance. The experiments reported here were conducted between 1998 and 2001. More recently, we have used a different trading platform, which, among other things, avoids bankruptcy issues. Bossaerts, Meloso, and Zame (2006) report data that replicate the features that we document here, in particular risk aversion.
3. FINDINGS

Because all trading is done through a computerized continuous double auction, we can observe and record every transaction – indeed, every offer – but we focus on end-of-period prices: that is, the prices of the last transaction in each period.4 Because no uncertainty is resolved while the market is open, it is natural to organize the data using a static model of asset trading: investors trade assets before the state of nature is known, assets yield dividends and consumption takes place after the state of nature is revealed (see Arrow & Hahn, 1971 or Radner, 1972).5 Because Notes and Cash are both riskless, we simplify slightly and treat them as redundant assets.6 We therefore model our environment as involving trade in risky assets A, B, and one riskless asset N (Notes). Assets are claims to consumption in each of the three possible states of nature X, Y, Z. Write $\mathrm{div}\,A$ for the state-dependent dividends of asset A, $\mathrm{div}\,A(s)$ for dividends in state $s$, and so forth. If $\theta = (\theta_A, \theta_B, \theta_N) \in \mathbb{R}^3$ is a portfolio of assets, we write $\mathrm{div}\,\theta = \theta_A(\mathrm{div}\,A) + \theta_B(\mathrm{div}\,B) + \theta_N(\mathrm{div}\,N)$ for the state-dependent dividends on the portfolio $\theta$.
There are $I$ investors, each characterized by an endowment portfolio $\omega^i = (\omega^i_A, \omega^i_B, \omega^i_N) \in \mathbb{R}^2_+ \times \mathbb{R}$ of risky and riskless assets, and a strictly concave, strictly monotone utility function $U^i : \mathbb{R}^3 \to \mathbb{R}$ defined over state-dependent terminal consumptions. (To be consistent with our experimental design, we allow consumption to be negative but we require holdings of A, B to be non-negative.) Investors care only about consumption, so given asset prices $q$, investor $i$ chooses a portfolio $\theta^i$ to maximize $U^i(\mathrm{div}\,\theta^i)$ subject to the budget constraint $q \cdot \theta^i \le q \cdot \omega^i$.

An equilibrium consists of asset prices $q \in \mathbb{R}^3_{++}$ and portfolio choices $\theta^i \in \mathbb{R}^2_+ \times \mathbb{R}$ for each investor such that:

choices are budget feasible: for each $i$, $q \cdot \theta^i \le q \cdot \omega^i$;

choices are budget optimal: for each $i$ and each $\varphi \in \mathbb{R}^2_+ \times \mathbb{R}$, $U^i(\mathrm{div}\,\varphi) > U^i(\mathrm{div}\,\theta^i) \Rightarrow q \cdot \varphi > q \cdot \omega^i$;

asset markets clear: $\sum_{i=1}^{I} \theta^i = \sum_{i=1}^{I} \omega^i$.
In the following sections, we show, first, that observed prices are generally below risk-neutral prices, which implies risk aversion; second, that risk aversion is systematic; third, that the effects of risk aversion can be quantified; and fourth, that risk aversion can be estimated.
3.1. Risk-neutral Pricing and Observed Pricing

Risk neutrality for investor $i$ means that $U^i(x) = E(x)$ (where the expectation is taken with respect to the true probabilities). If all investors are risk neutral then (normalizing so that the price of Cash is 1 and the price of Notes is 100), the unique equilibrium price is the risk-neutral price $q^* = (E(A), E(B), E(N)) = (E(A), E(B), 100)$. Table 3 displays end-of-period prices in 72 periods across 9 experiments: the end-of-period price of asset A is below its expectation in 64 periods,
Table 3. End-of-Period Transaction Prices.

Date      Sec(a)   Period 1    2         3         4         5         6         7         8         9
98-10-07  A    220/230(b) 216/230   215/230   218/230   208/230   205/230
          B    194/200    197/200   192/200   192/200   193/200   195/200
          N(c) 95(d)      98        99        97        99        99
98-11-16  A    215(e)     203       210       211       185       201
          B    187        194       195       193       190       185
          N    99         100       98        100       100       99
99-02-11  A    219        230       220       201       219       230       240
          B    190        183       187       175       190       180       200
          N    96         95        95        98        96        99        97
99-04-07  A    224        210       205       200       201       213       201       208
          B    195        198       203       209       215       200       204       220
          N    99         99        100       99        99        99        99        99
99-11-10  A    203        212       214       214       210       204
          B    166        172       180       190       192       189
          N    96         97        97        99        98        101
99-11-11  A    225        217       225       224       230       233       215       209
          B    196        200       181       184       187       188       188       190
          N    99         99        99        99        99        99        99        99
01-11-14  A    230/230    207/225   200/215   210/219   223/223   226/228   233/234   246/242   209/228
          B    189/200    197/203   197/204   200/207   189/204   203/208   211/212   198/208   203/210
          N    99         99        99        99        99        99        99        98        99
01-11-26  A    180/230    175/222   195/226   183/217   200/220   189/225   177/213   190/219
          B    144/200    190/201   178/198   178/198   190/201   184/197   188/198   175/193
          N    93         110       99        100       98        99        102       99
01-12-05  A    213/230    212/235   228/240   205/231   207/237   232/242   242/248   255/257   229/246
          B    195/200    180/197   177/194   180/194   172/190   180/192   190/195   185/190   185/190
          N    99         100       99        99        99        99        99        99        100

(a) Security.
(b) End-of-period transaction price/expected payoff.
(c) Notes.
(d) For Notes, end-of-period transaction prices only are displayed. Payoff equals 100.
(e) End-of-period transaction prices only are displayed. Expected payoffs are as in 98-10-07. Same for 99-02-11, 99-04-07, 99-11-10, and 99-11-11.
equal to its expectation in 5 periods, above its expectation in 3 periods; the end-of-period price of asset B is below its expectation in 64 periods, equal to its expectation in 3 periods, above its expectation in 5 periods. Indeed, in many experiments, all or nearly all transactions take place at a price below the asset expectation. For example, Fig. 1 records all the purchases/sales of assets throughout the eight periods of an experiment conducted on November 26, 2001: all of the more than 500 trades of the risky assets take place at a price below the assets’ expected payoffs. Two aspects of the data deserve further discussion. As may be seen from Fig. 1 and Table 3, Notes – which are riskless – may sell at a substantial discount throughout a trading period. As Bossaerts and Plott (2004) discuss, this discount is the effect of the cash-in-advance constraint imposed by the trading mechanism. Because trades require cash, subjects who wish to purchase a risky asset must either sell the other risky asset or sell Notes. This puts downward pressure on the pricing of all assets. However, because Notes
Fig. 1. Transaction Prices in Experiment 01-11-26. (Vertical axis: prices in francs; horizontal axis: time in seconds; series shown for A, B, and Notes.)
can be sold short, while risky assets cannot, there is greater downward pressure on the pricing of Notes than on the pricing of other assets. However, because there is downward pressure on the pricing of risky assets, it is useful to have an additional test that the discounts at which they sell reflect risk aversion and not solely this downward pressure. Such a test is readily available, because we have two risky securities, with correlated final payoffs. In particular, CAPM predicts that the security with the lower beta (lower covariance of final payoff with the market portfolio) will have lower expected returns, and hence will be priced at a lower discount relative to expected payoff. Inspection of Fig. 1 provides suggestive evidence for this: the discount for security B is generally less than that for security A; in experiment 01-11-26, it is precisely security B that had the lower beta. In the next two sections, we provide a systematic study of the relationship between discounts and betas. As mentioned before, prices within a period generally start out low and increase toward the end. This is most pronounced for the Notes, but the phenomenon occurs for the risky securities as well (in Fig. 1, one can detect it in all periods except the first one). Again, the cash-in-advance constraint may explain this drift – subjects first obtain cash by selling securities early on, and the subsequent execution of buy orders puts upward pressure on prices. An alternative explanation for the drift in prices of risky securities comes from out-of-equilibrium trading. Bossaerts (2006) shows that such a drift obtained in a world where subjects only attempt to trade in locally optimal directions. Local optimization makes sense when subjects cannot execute large orders without affecting prices, and when it is hard to put any prior on possible future price movements for lack of knowledge of the structure of the economy (number of traders, preferences of traders, endowments, etc.). 
Importantly, this explanation builds on risk aversion; under risk neutrality, the drift would disappear. As such, the upward pressure on prices of risky securities during a period could be attributed to risk aversion as well as the cash-in-advance constraint. This dual possibility requires that we provide an independent test of the importance of risk aversion, to which we now turn.
3.2. Prices and Betas Section 3.1 shows that asset prices are below risk-neutral prices, which implies risk aversion on the part of subjects. To see that the effect of risk aversion is systematic, we examine expected returns and asset betas.
Recall that the market portfolio is the social endowment of all assets:
$$M = \sum_{i=1}^{I} \omega^i$$
The beta of a portfolio $\theta$ is the ratio of the covariance of $\theta$ with the market portfolio to the variance of the market portfolio:
$$\beta(\theta) = \frac{\operatorname{cov}(\mathrm{div}\,\theta,\ \mathrm{div}\,M)}{\operatorname{var}(\mathrm{div}\,M)}$$
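With the Table 1 payoffs and an illustrative per capita market portfolio of one unit each of A and B (the actual market portfolios varied across experiments; see Table 2), the betas can be computed directly. A sketch of the arithmetic, ours rather than the authors' code:

```python
# Betas of the risky assets against a hypothetical market portfolio
# M = A + B, with states X, Y, Z equally likely (payoffs from Table 1).
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

div_A = [170, 370, 150]
div_B = [160, 190, 250]
div_M = [a + b for a, b in zip(div_A, div_B)]

beta_A = cov(div_A, div_M) / cov(div_M, div_M)
beta_B = cov(div_B, div_M) / cov(div_M, div_M)
print(round(beta_A, 3), round(beta_B, 3))   # 0.957 0.043
```

Asset A's beta far exceeds asset B's, consistent with the text; the two betas sum to 1 here only because this illustrative M is exactly A + B.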
Given prices $q$, the expected rate of return of a portfolio $\theta$ is $E(\mathrm{div}\,\theta / q \cdot \theta)$. Most asset-pricing theories predict that assets with higher betas should have higher expected rates of return. (For example, the Capital Asset Pricing Model (CAPM) predicts $E(\mathrm{div}\,\theta / q \cdot \theta) - 1 = \beta(\theta)\,[E(\mathrm{div}\,M / q \cdot M) - 1]$.) In our laboratory markets, asset A always has higher beta than asset B so should have higher expected rate of return. Fig. 2 plots the difference in
Fig. 2. Differences of Betas versus Differences of Expected Returns. (Horizontal axis: difference in beta; vertical axis: difference in expected return.)
expected rates of return (expected rate of return of A minus expected rate of return of B) against the difference in betas (beta of A minus beta of B) for all 67 observations (all periods of all experiments).7 As the reader can see, the difference in expected rate of return is positive roughly 75% of the time. Applying a binomial test to the data yields a z-score of 8, so the correlation is very unlikely to be accidental.
3.3. Sharpe Ratios

The data discussed above show that asset prices in our laboratory asset markets reflect significant risk aversion; Sharpe ratios provide a useful way to quantify the effect of this risk aversion. Given asset prices $q$, the excess rate of return is the difference between the rate of return on $\theta$ and the rate of return on the riskless asset. In our context, the rate of return on the riskless asset is 1, so the excess rate of return on the portfolio $\theta$ is $E[\mathrm{div}\,\theta / q \cdot \theta] - 1$. By definition, the Sharpe ratio of $\theta$ is the ratio of its excess return to its volatility:
$$\mathrm{Sh}(\theta) = \frac{E[\mathrm{div}\,\theta / q \cdot \theta] - 1}{\sqrt{\operatorname{var}(\mathrm{div}\,\theta / q \cdot \theta)}}$$
In particular, the Sharpe ratio of the market portfolio $M$ is
$$\mathrm{Sh}(M) = \frac{E[\mathrm{div}\,M / q \cdot M] - 1}{\sqrt{\operatorname{var}(\mathrm{div}\,M / q \cdot M)}}$$
If investors were risk neutral, asset prices would equal expected dividends, so the numerator would be 0, and the Sharpe ratio of the market portfolio (indeed of every portfolio) would be 0. Roughly speaking, increasing risk aversion leads to lower equilibrium prices and hence to a higher Sharpe ratio (as we see below, CAPM leads to a precise statement), so the Sharpe ratio is a quantitative – although indirect – measure of market risk aversion. As Fig. 3 shows, except for one outlier, Sharpe ratios in our laboratory markets are in the range 0.2–1.7, clustering in the range 0.4–0.6. For comparison, recall that the Sharpe ratio of the market portfolio of stocks traded on the NYSE (computed on yearly data) is about 0.43. (Keep in mind that risks and rewards on the NYSE are enormously greater than in our experiments, so similar Sharpe ratios do not translate precisely into similar risk attitudes.)
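For concreteness, the Sharpe ratio of a market portfolio holding one unit of each risky asset can be computed at illustrative end-of-period prices of the kind shown in Table 3; the prices q_A = 215 and q_B = 187 below are our example values, not data from a particular period:

```python
import math

# Sharpe ratio of the portfolio (1, 1) in A and B at example prices,
# with states equally likely (payoffs from Table 1).
div_A = [170, 370, 150]
div_B = [160, 190, 250]
q_A, q_B = 215.0, 187.0

cost = q_A + q_B
gross_returns = [(a + b) / cost for a, b in zip(div_A, div_B)]

mean_r = sum(gross_returns) / len(gross_returns)
var_r = sum((r - mean_r) ** 2 for r in gross_returns) / len(gross_returns)

sharpe = (mean_r - 1.0) / math.sqrt(var_r)
print(round(sharpe, 2))   # 0.29, inside the reported 0.2-1.7 range
```

Raising the prices toward the expected payoffs (230 and 200) drives this ratio to zero, which is the risk-neutral benchmark described above.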
Fig. 3. Sharpe Ratios: All Periods, All Experiments. (Market Sharpe ratio by period for each of the nine experiments.)
3.4. CAPM

An alternative approach to quantifying the risk aversion in our laboratory markets is to use a particular asset-pricing model to impute the market risk aversion. The CAPM of Sharpe (1964) is particularly well-suited to this exercise. CAPM can be derived from various sets of assumptions on primitives. For our purposes, assume that each investor's utility for risky consumption depends only on the mean and variance; specifically, investor $i$'s utility function for state-dependent wealth $x$ is
$$U^i(x) = E(x) - \frac{b^i}{2}\operatorname{var}(x)$$
where expectations and variances are computed with respect to the true probabilities, and $b^i$ is absolute risk aversion. We assume throughout that risk aversion is sufficiently small that the utility functions $U^i$ are strictly monotone
in the range of feasible consumptions, or at least observed consumptions. Because we allow consumption to be negative, and individual endowments are portfolios of assets, this is enough to imply that CAPM holds.8 To formulate the pricing conclusion of CAPM, write $m = \sum_i (\omega^i_A, \omega^i_B)$ for the market portfolio of risky assets, and $\bar{m} = m/I$ for the per capita portfolio of risky assets. Write $\mu = (E(A), E(B))$ for the vector of expected dividends of risky assets,
$$\Delta = \begin{pmatrix} \operatorname{cov}[A,A] & \operatorname{cov}[A,B] \\ \operatorname{cov}[B,A] & \operatorname{cov}[B,B] \end{pmatrix}$$
for the covariance matrix of risky assets, and
$$\Gamma = \left( \frac{1}{I} \sum_{i=1}^{I} \frac{1}{b^i} \right)^{-1}$$
for the market risk aversion. Write $p = (p_A, p_B)$ for the vector of prices of risky assets. The pricing conclusion of CAPM is that the equilibrium price of risky assets is given by the formula
$$\tilde{p} = \mu - \Gamma \Delta \bar{m}$$
In our setting, we know equilibrium prices, expected dividends, asset dividends and true probabilities, hence the covariance matrix, and the per capita market portfolio but not individual risk aversions. If CAPM pricing held exactly, we could impute the market risk aversion by solving the pricing formula for $\Gamma$. In our experiments, CAPM pricing does not hold exactly (see Bossaerts, Plott, and Zame (2007) for discussion of the distance of actual pricing to CAPM pricing), but we can impute market risk aversion as the best-fitting $\Gamma$. Several possible notions of ‘‘best-fitting’’ might be natural; we use Generalized Least Squares, where weights are based on the dispersion of individual holdings from the market portfolio; this is an economic measure of distance used and discussed in more detail in Bossaerts et al. (2007). This approach generates a direct estimate of the harmonic average risk aversion of the subjects, as opposed to individual estimates of the risk aversion coefficients, from which the harmonic mean could be computed. Fig. 4 shows the imputed market risk aversion for all periods in all experiments.
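As an illustration of the imputation, Γ can be recovered from the pricing formula by regressing the price discounts on the covariance-weighted portfolio. The sketch below uses ordinary least squares and made-up inputs (prices of 215 and 187 francs, a per capita risky portfolio of one unit of each asset), whereas the paper uses Generalized Least Squares with economically motivated weights:

```python
# Impute market risk aversion Gamma from p = mu - Gamma * Delta * m_bar
# (states equally likely; payoffs from Table 1; prices and m_bar assumed).
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

div_A = [170, 370, 150]
div_B = [160, 190, 250]

mu = [mean(div_A), mean(div_B)]                   # expected dividends
Delta = [[cov(div_A, div_A), cov(div_A, div_B)],
         [cov(div_B, div_A), cov(div_B, div_B)]]  # covariance matrix
m_bar = [1.0, 1.0]                                # per capita risky portfolio (assumed)
p = [215.0, 187.0]                                # observed prices (assumed)

# x = Delta * m_bar; fit (mu - p) = Gamma * x by ordinary least squares
x = [sum(Delta[i][j] * m_bar[j] for j in range(2)) for i in range(2)]
y = [mu[i] - p[i] for i in range(2)]
gamma = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(f"{gamma:.2e}")   # on the order of 10**-3
```

With these assumed inputs the imputed Γ falls in the same 10⁻³ range as the estimates plotted in Fig. 4.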
Note that there is considerable variation across experiments, and even within a given experiment; as we have noted earlier, subject preferences certainly vary across experiments and may even vary within a given experiment.
Fig. 4. Imputed Market Risk Aversion: All Periods, All Experiments. (Vertical axis: estimated risk aversion, in units of 10⁻³; one series per experiment.)
4. CONCLUSION We have argued here that the effects of risk aversion in laboratory asset markets are observable and significant, the observed effects are in the direction predicted by theory, and these effects are quantifiable. A crucial feature of our experimental design is that two risky assets are traded, so that the realization of uncertainty has two separate – but correlated – effects. It is this correlation that makes it possible to make quantitative inferences about the effects of risk aversion. In particular, willingness to pay for either risky asset depends on the price of the other risky asset and on the correlation between asset payoffs. (This is perhaps the central insight of CAPM.) In particular, if asset payoffs are negatively correlated, holding a portfolio of both assets (diversifying) is less risky than holding either asset separately, and more risk averse bidders should be willing to pay more to purchase a portfolio of both assets. Manipulation of the correlation between asset payoffs can therefore provide a rich variety of
PETER BOSSAERTS AND WILLIAM R. ZAME
choices, enabling the experimenter to better determine to what extent risk aversion influences behavior. These insights also suggest an approach to other laboratory settings in which risk aversion may play a role. For example, Harrison (1990) argues that deviations of observed behavior from theoretical predictions in laboratory tests of auction theory may be interpreted in a number of different ways: as failures of the theory, or as effects of risk aversion of bidders, or as effects of bidders' (possibly incorrect) beliefs about the risk aversion of other bidders. It seems possible that these competing explanations might be disentangled by auctioning two prizes whose payoffs are risky but correlated, and by manipulating the correlation between values. In particular, it seems that bidders' own risk aversion should drive up bids for prizes whose payoffs are negatively correlated (in comparison to bids for prizes whose payoffs are positively correlated). Because correlated risk is central to our work, it is less closely connected to laboratory and naturally occurring experiments concerning gambles in the presence of background risk (Lusk & Coble, 2006; Harrison, List, & Towe, 2007).
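The diversification argument here can be checked with a one-line variance calculation: for a two-asset portfolio, portfolio variance falls one-for-one with the payoff correlation. A small illustration (unit-variance payoffs and equal weights; the numbers are ours, not from the experiments):

```python
import numpy as np

w = np.array([0.5, 0.5])                  # equal-weighted two-asset portfolio
for rho in (0.5, 0.0, -0.5):              # payoff correlation
    cov = np.array([[1.0, rho], [rho, 1.0]])
    var = float(w @ cov @ w)              # equals 0.5 + 0.5 * rho here
    print(rho, var)                       # variance falls as rho falls
```

At rho = −0.5 the portfolio variance is a quarter of either asset's own variance, which is exactly why risk averse traders should pay a premium to hold the diversified combination.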
NOTES 1. Here we refer to theories such as the Capital Asset Pricing Model of Sharpe (1964) that predict the prices of fundamental assets, rather than to theories such as the pricing formula of Black and Scholes (1973) that predicts the prices of options or other derivative assets. The latter theories do not rest on assumptions about investor risk attitudes, but rather on the absence of arbitrage. 2. However, the bankruptcy rule was never triggered more than twice in any experiment, and in half of the experiments was never triggered at all. 3. The second treatment was introduced because we noticed that some subjects fell prey to the gambler’s fallacy, behaving as if balls were drawn without replacement even when they were drawn with replacement. This suggested the second treatment, in which we actually used the procedure that some subjects believed to be used in the first treatment. Note that, in the second treatment, true probabilities – hence payoff distributions – changed every period, and hence, that markets definitely had to find a new equilibrium. However, Bossaerts and Plott (2004) report that prices generally remain much closer to CAPM under the second treatment than under the first one. 4. See Asparouhova, Bossaerts, and Plott (2003) and Bossaerts and Plott (2004) for discussion of the evolution of prices during the experiment. 5. Because there is only one good, there is no trade in commodities, hence no trade after the state of nature is revealed. 6. In fact, Cash and Notes are not quite perfect substitutes because all transactions must take place through Cash, so that there is a transaction value to
Cash. As Table 3 shows, however, Cash and Notes are nearly perfect substitutes at the end of most periods in most experiments.

7. Expected return is computed as the ratio of the expected payoff under the theoretical distribution to the last transaction price for the period, minus 1; beta is computed analogously, as the ratio of: (i) the theoretical covariance of the payoff of the security with the payoff of the market portfolio, divided by the product of the last transaction price of the security and the last-traded price of the market portfolio; and (ii) the theoretical market payoff variance divided by the square of the last-traded price of the market portfolio. The last-traded price of the market portfolio is obtained from the last transactions of the two risky securities.

8. In the usual CAPM, all assets can be sold short, while in our framework the risky assets A, B cannot be sold short. However, in Appendix A of Bossaerts et al. (2007) we show that, given the particular asset structure here, the restriction on short sales does not change the conclusions.
ACKNOWLEDGMENTS Comments from the editors and an anonymous referee were very helpful; the authors remain responsible for any mistakes or omissions. Bossaerts is grateful for financial support from the R. G. Jenkins Family Fund, the National Science Foundation, and the Swiss Finance Institute. Zame is grateful for financial support from the John Simon Guggenheim Memorial Foundation, the National Science Foundation, the Social and Information Sciences Laboratory at Caltech, and the UCLA Academic Senate Committee on Research. Opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any funding agency.
REFERENCES

Arrow, K., & Hahn, F. (1971). General competitive analysis. San Francisco: Holden-Day.
Asparouhova, E., Bossaerts, P., & Plott, C. (2003). Excess demand and equilibration in multi-security financial markets: The empirical evidence. Journal of Financial Markets, 6, 1–21.
Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–654.
Bossaerts, P. (2006). Equilibration under competition in smalls: Theory and experimental evidence. Caltech Working Paper.
Bossaerts, P., Meloso, D., & Zame, W. (2006). Pricing in experimental dynamically complete asset markets. Caltech Working Paper.
Bossaerts, P., & Plott, C. (2004). Basic principles of asset pricing theory: Evidence from large-scale experimental financial markets. Review of Finance, 8, 135–169.
Bossaerts, P., Plott, C., & Zame, W. (2007). Prices and portfolio choices in financial markets: Theory, econometrics, experiments. Econometrica, 75(4), 993–1038.
Davis, J., Fama, E., & French, K. (2000). Characteristics, covariances, and average returns: 1929 to 1997. Journal of Finance, 55, 389–406.
Harrison, G. W. (1990). Risk attitudes in first-price auction experiments: A Bayesian analysis. Review of Economics and Statistics, 72, 541–546.
Harrison, G. W., List, J., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75(2), 433–458.
Lusk, J. L., & Coble, K. H. (2006). Risk aversion in the presence of background risks: Evidence from an economic experiment. Oklahoma State University Working Paper.
Radner, R. (1972). Existence of equilibrium of plans, prices, and price expectations in a sequence of markets. Econometrica, 40, 289–303.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19, 425–442.
RISK AVERSION IN GAME SHOWS

Steffen Andersen, Glenn W. Harrison, Morten I. Lau and E. Elisabet Rutström

ABSTRACT

We review the use of behavior from television game shows to infer risk attitudes. These shows provide evidence on decisions over very large stakes, made in a replicated and structured way. Inferences are generally confounded by the subjective assessment of skill in some games, and the dynamic nature of the task in most games. We consider the game shows Card Sharks, Jeopardy!, Lingo, and finally Deal Or No Deal. We provide a detailed case study of the analyses of Deal Or No Deal, since it is suitable for inference about risk attitudes and has attracted considerable attention.
Observed behavior on television game shows constitutes a controlled natural experiment that has been used to estimate risk attitudes. Contestants are presented with well-defined choices where the stakes are real and sizeable, and the tasks are repeated in the same manner from contestant to contestant. We review behavior in these games, with an eye to inferring risk attitudes. We describe the types of assumptions needed to evaluate behavior, and propose a general method for estimating the parameters of structural models of choice behavior for these games. We illustrate with a detailed case study of behavior in the U.S. version of Deal Or No Deal (DOND).

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 359–404
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00008-2
In Section 1 we review the existing literature in this area that is focused on risk attitudes, starting with Gertner (1993) and the Card Sharks program. We then review the analysis of behavior on Jeopardy! by Metrick (1995) and on Lingo by Beetsma and Schotman (2001).1 In Section 2 we turn to a detailed case study of the DOND program that has generated an explosion of analyses trying to estimate large-stakes risk aversion. We explain the basic rules of the game, which is shown with some variations in many countries. We then review complementary laboratory experiments that correspond to the rules of the naturally occurring game show. Finally, we discuss alternative modeling strategies employed in the related DOND literature. Section 3 proposes a general method for estimating choice models in the stochastic dynamic programming environment that most of these game shows employ. We resolve the "curse of dimensionality" in this setting by using randomization methods and certain simplifications to the forward-looking strategies adopted. We discuss the ability of our approach to closely approximate the fully dynamic path that agents might adopt. We illustrate the application of the method using data from the U.S. version of DOND, and estimate a simple structural model of expected utility theory choice behavior. The manner in which our method can be extended to other models is also discussed. Finally, in Section 4 we identify several weaknesses of game show data, and how they might be addressed. We stress the complementary use of natural experiments, such as game shows, and laboratory experiments.
1. PREVIOUS LITERATURE

1.1. Card Sharks

The game show Card Sharks provided an opportunity for Gertner (1993) to examine dynamic choice under uncertainty involving substantial gains and losses. Two key features of the show allowed him to examine the hypothesis of asset integration: each contestant's stake accumulates from round to round within a game, and some contestants come back for repeat plays after winning substantial amounts.

The game involves each contestant deciding in a given round whether to bet that the next card drawn from a deck will be higher or lower than
Fig. 1. Money Cards Board in Card Sharks.
some "face card" on display. Fig. 1 provides a rough idea of the layout of the "Money Cards" board before any face cards are shown. Fig. 2 provides a representation of the board from a computerized laboratory implementation2 of Card Sharks. In Fig. 2 the subject has a face card with a 3, and is about to enter the first bet. Cards are drawn without replacement from a standard 52-card deck, with no Jokers and with Aces high. Contestants decide on the relative value of the next card, and then on an amount to bet that their choice is correct. If they are correct, their stake increases by the amount bet; if they are incorrect, their stake is reduced by the amount bet; and if the new card is the same as the face card, there is no change in the stake. Every contestant starts off with an initial stake of $200, and bets can be made in $50 increments up to the available stake. After three rounds in the first, bottom "row" of cards, they move to the second, middle "row" and receive an additional $200 (or $400 in some versions). If the stake goes to zero in the first row, contestants go straight to the second row and receive the new stake; otherwise, the additional stake is added to what remains from row one. The second row includes three choices, just as in the first row. After these three choices, and if the stakes have not dropped to zero, they can play the final bet. In this case they have to bet at least one-half of their stake, but otherwise the betting works the same way. One feature of the game is that contestants
Fig. 2.
Money Cards Board from Lab Version of Card Sharks.
sometimes have the option to switch face cards in the hope of getting one that is easier to win against.3 The show aired in the United States in two major versions. The first, between April 1978 and October 1981, was on NBC and had Jim Perry as the host. The second, between January 1986 and March 1989, was on CBS and had Bob Eubanks as the host.4 The maximum prize was $28,800 on the NBC version and $32,000 on the CBS version, and would be won if the contestant correctly bet the maximum amount in every round. This only occurred once. Using official inflation calculators,5 the maximum prize converts to between $89,138 (for 1978) and $63,936 (for 1981) in 2006 dollars for the NBC version, and between $58,920 (for 1986) and $52,077 (for 1989) for the CBS version.
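The betting odds implied by the rules above are simple to compute, since draws are without replacement from a known deck. A sketch (our illustration, not code from any of the papers reviewed):

```python
from fractions import Fraction

def bet_odds(face):
    """Win/lose/tie odds of betting the more likely direction against a
    face card (ranks 2..14, Aces high), the face card itself being out
    of a standard 52-card deck."""
    counts = {r: 4 for r in range(2, 15)}
    counts[face] -= 1                     # the face card is on display
    total = sum(counts.values())          # 51 cards remain
    higher = sum(n for r, n in counts.items() if r > face)
    lower = sum(n for r, n in counts.items() if r < face)
    p_win = Fraction(max(higher, lower), total)   # bet the likelier direction
    p_lose = Fraction(min(higher, lower), total)
    p_tie = Fraction(counts[face], total)         # same card: bet returned
    return p_win, p_lose, p_tie

print(bet_odds(3))    # a 3 is a strong card: bet 'higher'
print(bet_odds(8))    # an 8 splits the deck almost evenly
```

A face card of 3 gives win odds of 44/51 against lose odds of 4/51, while at a face card of 8 the two directions are equally likely, which is consistent with contestants' minimum bets on 8s noted below.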
These stakes are actually quite modest in relation to contemporary game shows in the United States, such as DOND described below, which typically has a maximal stake of $1,000,000. Of course, maximal stakes can be misleading, since Card Sharks and DOND are both "long shot" lotteries. Average earnings in the CBS version used by Gertner (1993) were $4,677, which converts to between $8,611 and $7,611 in 2006, whereas average earnings in DOND have been $131,943 for the sample we report later (excluding a handful of special shows with significantly higher prizes).

1.1.1. Estimates of Risk Attitudes

The analysis of Gertner (1993) assumes a Constant Absolute Risk Aversion (CARA) utility function, since he did not have information on household wealth and viewed that as necessary to estimate a Constant Relative Risk Aversion (CRRA) utility function. We return to the issue of household wealth later.

Gertner (1993) presents several empirical analyses. He initially (p. 511) focuses on the last round, and uses the optimal "investment" formula

$$b = \frac{\ln(p_{\mathrm{win}}) - \ln(p_{\mathrm{lose}})}{2a}$$

where the probabilities of winning and losing the bet b are defined by $p_{\mathrm{win}}$ and $p_{\mathrm{lose}}$, and the utility function is $U(W) = -\exp(-aW)$ for wealth W.6 From observed bets he infers a.

There are several potential problems with this approach. First, there is an obvious sample selection problem from only looking at the last round, although this is not a major issue since relatively few contestants go bankrupt (less than 3%). Second, there is the serious problem of censoring at bets of 50% or 100% of the stake. Gertner (1993, p. 510) is well aware of the issue, and indeed motivates several analytical approaches to these data by a desire to avoid it:

Regression estimates of absolute risk aversion are sensitive to the distribution assumptions one makes to handle the censoring created by the constraints that a contestant must bet no more than her stake and at least half of her stake in the final round. Therefore, I develop two methods to estimate a lower bound on the level of risk aversion that do not rely on assumptions about the error distribution.
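The inversion step itself is mechanical: an observed interior bet pins down a directly. A sketch with illustrative numbers (ours, not Gertner's data):

```python
import math

def optimal_bet(a, p_win, p_lose):
    """Optimal CARA bet: b = (ln p_win - ln p_lose) / (2a)."""
    return (math.log(p_win) - math.log(p_lose)) / (2 * a)

def implied_ara(bet, p_win, p_lose):
    """Invert the formula to read a off an observed interior bet."""
    return (math.log(p_win) - math.log(p_lose)) / (2 * bet)

# Illustrative numbers only: a face card of 3, so betting 'higher' wins
# with probability 44/51 and loses with 4/51, and an observed $4,000 bet.
a = implied_ara(4000, 44 / 51, 4 / 51)
print(round(a, 6))
```

Note that the inversion only works for bets strictly between the 50% and 100% limits, which is precisely the censoring problem Gertner discusses.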
The first method he uses is just to assume that the censored responses are in fact the optimal response. The 50% bets are assumed to be optimal bets, when in fact the contestant might wish to bet less (but cannot due to the
final-round betting rules); thus inferences from these responses will be biased towards showing less risk aversion than there might actually be. Conversely, the contestants making 100% bets are assumed to be risk neutral, when in fact they might be risk loving; thus inferences from these responses will be biased towards showing more risk aversion than there might actually be. Two wrongs do not make a right, although one does encounter such claims in empirical work. Of course, this approach still relies on exactly the same sort of assumptions about the interpretation of behavior, although not formalized in terms of an error distribution. And it is not apparent that the estimates will be lower bounds, since this censoring issue biases inferences in either direction. The average estimate of ARA to emerge is 0.000310, with a standard error of 0.000017, but it is not clear how one should interpret this estimate since it could be an overestimate or an underestimate.

The second approach is a novel and early application of simulation methods, which we will develop in greater detail below. A computer simulates optimal play by a risk-neutral agent playing the entire game 10 million times, recognizing that the cards are drawn without replacement. The computer does not appear to recognize the possibility of switching cards, but that is not central to the methodological point. The average return from this virtual lottery (VL) is $6,987 with a standard deviation of $10,843. It is not apparent that the lottery would have a Gaussian distribution of returns, but that can be allowed for in a more complete numerical analysis as we show later, and is again not central to the main methodological point.
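The virtual-lottery construction can be approximated in a few lines. The sketch below is a deliberately simplified version of the game (it ignores card switching, the $50 bet grid, and the exact bankruptcy rule), so its moments will not reproduce the $6,987/$10,843 figures; it only illustrates how simulated risk-neutral play generates a distribution of outcomes:

```python
import random
import statistics

def play_once(rng):
    """One simplified risk-neutral play of the Money Cards board: three
    bets per row, $200 added before the second row, and a final bet of at
    least half the stake.  Risk-neutral play bets all-in on any edge."""
    deck = [r for r in range(2, 15) for _ in range(4)]  # 52 cards, Aces high
    rng.shuffle(deck)
    face = deck.pop()
    stake = 200
    for bet_no in range(7):
        if bet_no == 3:                    # moving up to the second row
            stake = stake + 200 if stake > 0 else 200
        n_hi = sum(1 for c in deck if c > face)
        n_lo = sum(1 for c in deck if c < face)
        go_higher = n_hi >= n_lo
        if bet_no < 6:
            bet = stake if n_hi != n_lo else 0       # all-in on any edge
        else:
            bet = stake if n_hi != n_lo else stake // 2  # final: half minimum
        nxt = deck.pop()
        if nxt != face:                    # ties leave the stake unchanged
            stake += bet if (nxt > face) == go_higher else -bet
        face = nxt
    return stake

rng = random.Random(7)
sims = [play_once(rng) for _ in range(10000)]
print(round(statistics.mean(sims)), round(statistics.stdev(sims)))
```

Even this crude version shows the key qualitative features: a long right tail (the maximum possible outcome is $28,800, as on the NBC show) and a mean well above the endowment.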
The next step is to compare this distribution with the observed distribution of earnings, which was an average of $4,677 with a standard deviation of $4,258, and use a revealed preference argument to infer what risk attitudes must have been in play for this to have been the outcome instead of the VL: A second approach is to compare the sample distribution of outcomes with the distribution of outcomes if a contestant plays the optimal strategy for a risk-neutral contestant. One can solve for the coefficient of absolute risk aversion that would make an individual indifferent between the two distributions. By revealed preference, an ‘‘average’’ contestant prefers the actual distribution to the expected-value maximizing strategy, so this is an estimate of the lower bound of constant absolute risk aversion (pp. 511/512).
This approach is worth considering in more depth, because it suggests estimation strategies for a wide class of stochastic dynamic programming problems which we develop in Section 3. This exact method will not work once one moves beyond special cases such as risk neutrality, where outcomes
and behavior in later rounds have no effect on optimal behavior in earlier rounds. But we will see that an extension of the method does generalize. The comparison proposed here generates a lower bound on the ARA, rather than a precise estimate, since we know that an agent with an even higher ARA would also implicitly choose the observed distribution over the virtual RN distribution. Obviously, if one could generate VL distributions for a wide range of ARA values, it would be possible to refine this estimation step and select the ARA that maximizes the likelihood of the data. This is, in fact, exactly what we propose later as a general method for estimating risk attitudes in such settings. The ARA bound derived from this approach is 0.0000711, less than one-fourth of the estimate from the first method. Gertner (1993, p. 512) concludes that The ‘‘Card Sharks’’ data indicate a level of risk aversion higher than most existing estimates. Contestants do not seem to behave in a risk-loving and enthusiastic way because they are on television, because anything they win is gravy, or because the producers of the show encourage excessive risk-taking. I think this helps lend credence to the potential importance and wider applicability of the anomalous results I document below.
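To see the flavor of the indifference calculation, suppose (as Gertner does not need to) that both distributions are approximately normal. CARA expected utility then depends only on the certainty equivalent mu − a·sd²/2, and the indifference point has a closed form:

```python
# Moments reported by Gertner (1993): simulated risk-neutral play versus
# the observed earnings distribution.
mu_rn, sd_rn = 6987.0, 10843.0
mu_obs, sd_obs = 4677.0, 4258.0

# Under normality, indifference solves
#   mu_rn - a * sd_rn**2 / 2 = mu_obs - a * sd_obs**2 / 2,
# so a* = 2 * (mu_rn - mu_obs) / (sd_rn**2 - sd_obs**2).
a_star = 2 * (mu_rn - mu_obs) / (sd_rn**2 - sd_obs**2)
print(a_star)
```

This normal approximation gives roughly 0.000046, the same order of magnitude as, but not equal to, the 0.0000711 bound in the text; Gertner's calculation does not require normality, and the simulated distribution is in fact skewed.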
His first method does not provide any basis for these claims, since risk loving is explicitly assumed away. His second method does indicate that the average player behaves as if risk averse, but there are no standard errors on that bound. Thus, one simply cannot say that it is statistically significant evidence of risk aversion. 1.1.2. EUT Anomalies The second broad set of empirical analyses by Gertner (1993) considers a regression model of bets in the final round, and shows some alleged violations of EUT. The model is a two-limit tobit specification, recognizing that bets at 50% and 100% may be censored. However, most of the settings in which contestants might rationally bet 50% or 100% are dropped. Bets with a face card of 2 or an Ace are dropped since they are sure things in the sense that the optimal bet cannot result in a loss (the bet is simply returned if the same card is then turned up). Similarly, bets with a face card of 8 are dropped, since contestants almost always bet the minimum. These deletions amount to 258 of the 844 observations, which is not a trivial sub-sample. The regression model includes several explanatory variables. The central ones are cash and stake. Variable cash is the accumulated earnings by the contestant to that point over all repetitions of the game. So this includes previous plays of the game for ‘‘champions,’’ as well as earnings
accumulated in rounds 1–6 of the current game. Variable stake is the accumulated earnings in the current game, so it excludes earnings from previous games. One might expect the correlation of stake and cash to be positive and high, since the average number of times the game is played in these data is 1.85 ( = 844/457). Additional explanatory variables include a dummy for new players that are in their first game; the ratio of cash to the number of times the contestant has played the whole game (the ratio is 0 for new players); the value of any cars that have been won, given by the stated sticker price of the car; and dummy variables for each of the possible face card pairs (in this game a 3 is essentially the same as a King, a 4 the same as a Queen, etc.). The stake variable is included as an interaction with these face dummies, which are also included by themselves.7 The model is estimated with or without a multiplicative heteroskedasticity correction, and the latter estimates are preferred. Card-counters are ignored when inferring probabilities of a win, and this seems reasonable as a first approximation.

Gertner (1993, Section VI) draws two striking conclusions from this model. The first is that stake is statistically significant in its interactions with the face cards. The second is that the cash variable is not significant. The first result is said to be inconsistent with EUT since earnings in this show are small in relation to wealth, and

The desired dollar bet should depend upon the stakes only to the extent that the stakes impact final wealth. Thus, risky decisions on "Card Sharks" are inconsistent with individuals maximizing a utility function over just final wealth. If one assumes that utility depends only on wealth, estimates of zero on card intercepts and significant coefficients on the stake variable imply that outside wealth is close to zero. Since this does not hold, one must reject utility depending only on final wealth (p. 517).

This conclusion bears close examination. First, there is a substantial debate as to whether EUT has to be defined over final wealth, whatever that is, or can be defined just over outcomes in the choice task before the contestant (e.g., see Cox and Sadiraj (2006) and Harrison, Lau, and Rutström (2007) for references to the historical literature). So even if one concludes that the stake matters, this is not fatal for specifications of EUT defined over prizes, as clearly recognized by Gertner (1993, p. 519) in his reference to Markowitz (1952). Second, the deletion of all extreme bets likely leads to a significant understatement of uncertainty about coefficient estimates. Third, the regression does not correct for panel effects, and these could be significant since the variables cash and stake are correlated with the individual.8 Hence their coefficient estimates might be picking up other, unobservable effects that are individual-specific.
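The two-limit tobit at the heart of this exercise has a simple likelihood. A sketch in our own notation (the limits are 50% and 100% of the final-round stake, so they vary by observation; the synthetic data are ours):

```python
import numpy as np
from scipy import stats

def two_limit_tobit_loglik(params, y, X, lower, upper):
    """Log-likelihood of a two-limit tobit: the latent bet X @ beta + e,
    e ~ N(0, sigma^2), is only observed inside [lower, upper]."""
    beta, sigma = params[:-1], params[-1]
    xb = X @ beta
    at_lo = y <= lower                  # censored at the 50% floor
    at_hi = y >= upper                  # censored at the 100% ceiling
    mid = ~(at_lo | at_hi)
    ll = np.empty(len(y))
    ll[at_lo] = stats.norm.logcdf((lower[at_lo] - xb[at_lo]) / sigma)
    ll[at_hi] = stats.norm.logsf((upper[at_hi] - xb[at_hi]) / sigma)
    ll[mid] = stats.norm.logpdf((y[mid] - xb[mid]) / sigma) - np.log(sigma)
    return ll.sum()

# Synthetic check: bets censored between half the stake and the full stake.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
stake = rng.uniform(200, 2000, size=n)
lower, upper = 0.5 * stake, stake
latent = X @ np.array([600.0, 100.0]) + rng.normal(0, 150, size=n)
y = np.clip(latent, lower, upper)
ll = two_limit_tobit_loglik(np.array([600.0, 100.0, 150.0]), y, X, lower, upper)
print(np.isfinite(ll))
```

Maximizing this likelihood (e.g., with `scipy.optimize.minimize` on its negative) recovers the slope coefficients; the point of the sketch is only that the censored observations contribute through the normal tail probabilities rather than the density.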
This conclusion bears close examination. First, there is a substantial debate as to whether EUT has to be defined over final wealth, whatever that is, or can be defined just over outcomes in the choice task before the contestant (e.g., see Cox and Sadiraj (2006) and Harrison, Lau, and Rutstro¨m (2007) for references to the historical literature). So even if one concludes that the stake matters, this is not fatal for specifications of EUT defined over prizes, as clearly recognized by Gertner (1993, p. 519) in his reference to Markowitz (1952). Second, the deletion of all extreme bets likely leads to a significant understatement of uncertainty about coefficient estimates. Third, the regression does not correct for panel effects, and these could be significant since the variables cash and stake are correlated with the individual.8 Hence their coefficient estimates might be picking up other, unobservable effects that are individual-specific.
The second result is also said to be inconsistent with EUT, in conjunction with the first result. The logic is that stake and cash should have an equal effect on terminal wealth, if one assumes perfect asset integration and that utility is defined over terminal wealth. But one has a significant effect on bets, and the other does not. Since the assumptions that utility is defined over terminal wealth and that asset integration is perfect are implicitly maintained by Gertner (1993, p. 517ff.), he concludes that EUT is falsified. However, one can include terminal wealth as an argument of utility without also assuming perfect asset integration (e.g., Cox & Sadiraj, 2006). This is also recognized explicitly by Gertner (1993, p. 519), who considers the possibility that "contestants have multi-attribute utility functions, so that they care about something in addition to wealth."9 Thus, if one accepts the statistical caveats about samples and specifications for now, these results point to the rejection of a particular, prominent version of EUT, but they do not imply that all popular versions of EUT are invalid.
1.2. Jeopardy! In the game show Jeopardy! there is a subgame referred to as Final Jeopardy. At this point, three contestants have cash earnings from the initial rounds. The skill component of the game consists of hearing some text read out by the host, at which point the contestants jump in to state the question that the text provides the answer to.10 In Final Jeopardy the contestants are told the general subject matter for the task, and then have to privately and simultaneously state a wager amount from their accumulated points. They can wager any amount up to their earned endowment at that point, and are rewarded with even odds: if they are correct they get that wager amount added, but if they are incorrect they have that amount deducted. The winner of the show is the contestant with the most cash after this final stage. The winner gets to keep the earnings and come back the following day to try and continue as champion. In general, these wagers are affected by the risk attitudes of contestants. But they are also affected by their subjective beliefs about their own skill level relative to the other two contestants, and by what they think the other contestants will do. So this game cannot be fully analyzed without making some game-theoretic assumptions. Jeopardy! was first aired in the United States in 1964, and continued until 1975. A brief season returned between 1978 and 1979, and then the modern era began in 1984 and continues to this day. The format changes have been
relatively small, particularly during the modern era. The data used by Metrick (1995) come from shows broadcast between October 1989 and January 1992, and reflect more than 1,150 decisions. Metrick (1995) examines behavior in Final Jeopardy in two stages.11 The first stage considers the subset of shows in which one contestant is so far ahead in cash that the bet only reveals risk attitudes and beliefs about own skill. In such "runaway games" there exist wagers that will ensure victory, although there might be some rationale prior to September 2003 for someone to bet an amount that could lead to a loss. Until then, the champion had to retire after five wins, so if one had enough confidence in one's skill at answering such questions, one might rationally bet more than was needed to ensure victory. After September 2003 the rules changed, so the champion stays on until defeated. In the runaway games Metrick (1995, p. 244) uses the same formula that Gertner (1993) used for CARA utility functions. The only major difference is that the probability of winning in Jeopardy! is not known objectively to the observer.12 His solution is to substitute the observed fraction of correct answers, akin to a rational expectations assumption, and then solve for the CARA parameter a that accounts for the observed bets. The result is an estimate of a equal to 0.000066 with a standard error of 0.000056. Thus, there is slight evidence of risk aversion, but it is not statistically significant, leading Metrick (1995, p. 245) to conclude that these contestants behaved in a risk-neutral manner. The second stage of the analysis considers subsamples in which two players have accumulated scores that are sufficiently close that they have to take beliefs about the other into account, but where there is a distant third contestant who can be effectively ignored.
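The rational-expectations substitution makes the first-stage inference a one-liner: Final Jeopardy wagers pay even odds, so the Card Sharks formula applies with the empirical success rate in place of the objective probability. An illustrative sketch (the numbers are ours, not Metrick's):

```python
import math

def implied_cara(bet, p_correct):
    """CARA coefficient implied by an even-odds wager when the probability
    of answering correctly is p_correct (the same formula as for Card
    Sharks, with p_win = p_correct and p_lose = 1 - p_correct)."""
    return (math.log(p_correct) - math.log(1 - p_correct)) / (2 * bet)

# Illustrative only: a runaway leader who answers correctly two-thirds
# of the time and wagers $3,000.
print(round(implied_cara(3000, 2 / 3), 6))
```

Averaging such implied coefficients over runaway games, with the success rate estimated from the data, is the essence of the first-stage calculation.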
Metrick (1995) cuts this Gordian knot of strategic considerations by assuming that contestants view themselves as betting against contestants whose behavior can be characterized by their observed empirical frequencies. He does not use these data to make inferences about risk attitudes.
1.3. Lingo

The underlying game in Lingo involves a team of two people guessing a hidden five-letter word. Fig. 3 illustrates one such game from the U.S. version. The team is told the first letter of the word, and can then just state words. Incorrect guesses are used to reveal which of their letters, if any, appear in the correct word. To take the example in Fig. 3, the true word
Fig. 3.
The Word Puzzle in Lingo.
was STALL. So the initial S was shown. The team suggested SAINT and was informed (by light grey coloring) that A and T are present in the correct word. The team was not told the order of the letters A and T in the correct word. The team then suggested STAKE, and was informed (by grey coloring) that the T and A were in the right place and that no other letters were in the correct word. The team then tried STAIR, SEATS, and finally STALL. Most teams are able to guess the correct word in five rounds.

The game occurs in two stages. In the first stage, one team of two plays against another team for several of these Lingo word-guessing games. The couple with the most money then goes on to the second stage, which is the one of interest for measuring risk attitudes because it is non-interactive. So the winning couple comes into the main task with a certain earned endowment (which could be augmented by an unrelated game called "jackpot"). The team also comes in with some knowledge of its own ability to solve these word-guessing puzzles. In the Dutch data used by Beetsma and Schotman (2001), spanning 979 games, the frequency distribution of the number of solutions across rounds
1–5 in the final stage was 0.14, 0.32, 0.23, 0.13, and 0.081, respectively, with the remaining 0.089 of words unsolved. Every round that the couple requires to guess the word means that they have to pick one ball from an urn affecting their payoffs, as described below. If they do not solve the word puzzle, they have to pick six balls. These balls determine if the team goes "bust" or "survives" something called the Lingo Board in that round. An example of the Lingo Board is shown in Fig. 4, from Beetsma and Schotman (2001, Fig. 3).13 There are 35 balls in the urn numbered from 1 to 35, plus one "golden ball." If the golden ball is picked then the team wins the cash prize for that round and gets a free pass to the next round. If one of the numbered balls is picked, then the fate of the team depends on the current state of the Lingo Board. The team goes "bust" if they get a row, column, or diagonal of X's, akin to the parlor game noughts and crosses. So solving the word puzzle in fewer moves is good, since it means that fewer balls have to be drawn from the urn, and hence that the survival probability is higher. In the example from Fig. 4, drawing a 5 would be fatal, drawing an 11 would not be, and drawing a 1 would not be if a 2 or 8 had not been previously drawn. If the team survives a round it gets a cash prize, and is asked if they want to keep going or stop. This lasts for five rounds. So apart from the skill part of the game, guessing the words, this is the only choice the team makes. This is therefore a "stop-go" problem, in which the team balances current earnings with the lottery of continuing and either earning more cash or going bust. If the team chooses to continue, the stake doubles; if the golden ball had been drawn, it is replaced in the urn. If the team goes bust it takes home nothing. Teams can play the game up to three times, then retire from the show.
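Given a board state, the survival probability the team faces can be simulated directly. The sketch below is our hypothetical reconstruction of the mechanics described above (the board layout, helper names, and parameters are assumptions for illustration, not the paper's code):

```python
import random
import numpy as np

def survival_prob(board, crossed, drawn, n_draws, n_sims=20000, seed=1):
    """Monte Carlo survival probability for one round of the Lingo Board.
    board: 5x5 array of ball numbers; crossed: boolean mask of X's;
    drawn: numbered balls already out of the urn; n_draws: balls to pick.
    The urn holds balls 1..35 plus a golden ball (coded 0) that wins the
    round outright; a completed row, column, or diagonal of X's is bust."""
    rng = random.Random(seed)
    urn = [0] + [b for b in range(1, 36) if b not in drawn]
    lines = ([board[i, :].tolist() for i in range(5)]
             + [board[:, j].tolist() for j in range(5)]
             + [board.diagonal().tolist(), np.fliplr(board).diagonal().tolist()])
    base = set(board[crossed].tolist())
    survive = 0
    for _ in range(n_sims):
        marks = set(base)
        alive = True
        for b in rng.sample(urn, n_draws):
            if b == 0:                     # golden ball: win the round
                break
            marks.add(b)
            if any(all(x in marks for x in line) for line in lines):
                alive = False              # a full line of X's: bust
                break
        survive += alive
    return survive / n_sims

board = np.arange(1, 26).reshape(5, 5)    # hypothetical layout of numbers
crossed = np.zeros((5, 5), dtype=bool)
crossed[0, :4] = True                     # four X's already in the top row
p = survival_prob(board, crossed, drawn={1, 2, 3, 4}, n_draws=6)
print(round(p, 2))
```

With four X's in a row, survival requires that the fifth number not be drawn before the golden ball, which is why a poorly solved word (six draws instead of one) is so costly.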
Fig. 4. Example of a Lingo Board.
Risk Aversion in Game Shows
Risk attitudes are involved when the team has to balance the current earnings with the lottery of continuing. That lottery depends on subjective beliefs about the skill level of the team, the state of the Lingo Board at that point, and the perception of the probabilities of drawing a ‘‘fatal’’ number or the golden ball. In many respects, apart from the skill factor and the relative symmetry of prizes, this game is remarkably like DOND, as we see later. Beetsma and Schotman (2001) evaluate data from 979 finals. Each final lasts several rounds, so the sample of binary stop/continue decisions is larger, and constitutes a panel. Average earnings in this final round in their sample are 4,106 Dutch guilders (f), with potential earnings, given the initial stakes brought into the final, of around f 15,136. The average exchange rate in 1997, which is around when these data were from, was $0.514 per guilder, so these stakes are around $2,110 on average, and up to roughly $7,780. These are not life-changing prizes, like the top prizes in DOND, but are clearly substantial in relation to most lab experiments. Beetsma and Schotman (2001, Section 4) show that the stop/continue decisions have a simple monotonic structure if one assumes CRRA or CARA utility. Since the odds of surviving never get better with more rounds, if it is optimal to stop in one round then it will always be optimal to stop in any later round. This property does not necessarily hold for other utility functions. But for these utility functions, which are still an important class, one can calculate a threshold survival probability p_i* for any round i such that the team should stop if the actual survival probability falls below it.
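The flavor of that closed-form threshold can be seen in a deliberately stripped-down version of the stop/go decision, which ignores the golden ball, later rounds, and the option value of returning to the show: suppose continuing doubles the current stake m with survival probability p and pays zero otherwise. This is a sketch of the logic only, not Beetsma and Schotman's actual specification.

```python
# Stylized stop/go threshold under CRRA utility u(m) = m^(1-r)/(1-r), r < 1.
# Stop with stake m, or continue and receive 2m with probability p, else 0
# (golden ball and future option values are ignored in this sketch).
# Indifference u(m) = p * u(2m) gives p* = 2**(r - 1), independent of m.

def crra_utility(m, r):
    return m ** (1.0 - r) / (1.0 - r)

def threshold_survival_prob(m, r):
    """Survival probability below which the team should stop."""
    return crra_utility(m, r) / crra_utility(2.0 * m, r)

for r in (0.0, 0.42, 0.8):
    print(r, threshold_survival_prob(1000.0, r))
```

A risk-neutral team (r = 0) continues whenever the survival probability exceeds 0.5; more risk-averse teams demand better odds. In the actual game the threshold also reflects the golden ball, the prize path, and option values, so it varies by round, but it remains a closed-form object that can be evaluated inside a likelihood routine.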
This threshold probability does depend on the utility function and parameter values for it, but in a closed-form fashion that can be easily evaluated within a maximum-likelihood routine.14 Each team can play the game three times before it has to retire as a champion. The specification of the problem clearly recognizes the option value in the first game of coming back to play the game a second or third time, and then the option value in the second game of coming back to play a third time. The certainty-equivalent of these option values depends, of course, on the risk attitudes of the team. But the estimation procedure ‘‘black boxes’’ these option values to collapse the estimation problem down to a static one: they are free parameters to be estimated along with the parameter of the utility function. Thus, they are not constrained by the expected returns and risk of future games, the functional form of utility, and the specific parameter values being evaluated in the maximum-likelihood routine. Beetsma and Schotman (2001, p. 839) do clearly check that the option value in the first game exceeds the option value in the second game, but (a) they only examine point estimates, and make no claim that this
difference is statistically significant,15 and (b) there is no check that the absolute values of these option values are consistent with the utility function and parameter values. In addition, there is no mention of any corrections for the fact that each team makes several decisions, and that errors for that team are likely correlated. With these qualifications, the estimate of the CRRA parameter is 0.42, with a standard error of 0.05, if one assumes that utility is only defined over the monetary prizes. It rises to 6.99, with a standard error of 0.72, if one assumes a baseline wealth level of f 50,000, which is the preferred estimate. Each of these estimates is significantly different from 0, implying rejection of risk neutrality in favor of risk aversion. The CARA specification generates comparable estimates. One extension is to allow for probability weighting on the actual survival probability p_i in round i. The weighting occurs in the manner of original Prospect Theory, due to Kahneman and Tversky (1979), and not in the rank-dependent manner of Quiggin (1982, 1993) and Cumulative Prospect Theory. One apparent inconsistency is that the actual survival probabilities are assumed to be weighted subjectively, but the threshold survival probabilities p_i* are not (see their Eq. (18), p. 843). The results show that estimates of the degree of concavity of the utility function increase substantially, and that contestants systematically overweight the actual survival probability. We return to some of the issues of structural estimation of models assuming decision weights, in a rank-dependent manner, in the discussion of DOND and Andersen, Harrison, Lau, and Rutström (2006a, 2006b).
2. DEAL OR NO DEAL

2.1. The Game Show as a Natural Experiment

The basic version of DOND is the same across all countries. We explain the general rules by focusing on the version shown in the United States, and then consider variants found in other countries. The show confronts the contestant with a sequential series of choices over lotteries, and asks a simple binary decision: whether to play the (implicit) lottery or take some deterministic cash offer. A contestant is picked from the studio audience. They are told that a known list of monetary prizes, ranging from $0.01 up to $1,000,000, has been placed in 26 suitcases.16 Each suitcase is carried onstage by an attractive female model, and has a number
from 1 to 26 associated with it. The contestant is informed that the money has been put in the suitcase by an independent third party, and in fact it is common that any unopened cases at the end of play are opened so that the audience can see that all prizes were in play. Fig. 5 shows how the prizes are displayed to the subject at the beginning of the game. The contestant starts by picking one suitcase that will be ‘‘his’’ case. In round 1, the contestant must pick 6 of the remaining 25 cases to be opened, so that their prizes can be displayed. Fig. 6 shows how the display changes after the contestant picks the first case: in this case the contestant unfortunately picked the case containing the $300,000 prize. A good round for a contestant occurs if the opened prizes are low, and hence the odds increase that his case holds the higher prizes. At the end of each round the host is phoned by a ‘‘banker’’ who makes a deterministic cash offer to the contestant. In one of the first American shows (12/21/2005) the host made a point of saying clearly that ‘‘I don’t know what’s in the suitcases, the banker doesn’t, and the models don’t.’’ The initial offer in early rounds is typically low in comparison to expected offers in later rounds. We use an empirical offer function later, but the qualitative trend is quite clear: the bank offer starts out at roughly 10% of
Fig. 5. Opening Display of Prizes in TV Game Show Deal or No Deal.
Fig. 6. Prizes Available After One Case Has Been Opened.
the expected value of the unopened cases, and increments by about 10% of that expected value for each round. This trend is significant, and serves to keep all but extremely risk-averse contestants in the game for several rounds. For this reason, it is clear that the case that the contestant ‘‘owns’’ has an option value in future rounds. In round 2, the contestant must pick five cases to open, and then there is another bank offer to consider. In succeeding rounds, 3–10, the contestant must open 4, 3, 2, 1, 1, 1, 1, and 1 cases, respectively. At the end of round 9, there are only two unopened cases, one of which is the contestant’s case. In round 9 the decision is a relatively simple one from an analyst’s perspective: either take the non-stochastic cash offer or take the lottery with a 50% chance of either of the two remaining unopened prizes. We could assume some latent utility function, and estimate parameters for that function that best explain observed binary choices. Unfortunately, relatively few contestants get to this stage, having accepted offers in earlier rounds. In our data, only 9% of contestants reach that point. More serious than the smaller sample size is the fact that one naturally expects risk attitudes to affect who survives to this round. Thus, there would be a serious sample attrition bias if one just studied choices in later rounds.
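The bookkeeping behind these rounds is easy to verify directly. The sketch below tabulates unopened cases in the U.S. version and applies the rough 10%-per-round offer rule described above; the actual offer function is estimated empirically later, so the linear schedule here is only an illustrative assumption.

```python
# Case bookkeeping for the U.S. version: 26 cases, the contestant holds one,
# and opens 6, 5, 4, 3, 2, 1, 1, 1, 1 of the others in rounds 1-9; the final
# opening in round 10 is just the reveal. The offer rule below is the
# stylized "roughly 10% of expected value per round" trend, not the
# empirical offer function estimated later.

CASES_TO_OPEN = [6, 5, 4, 3, 2, 1, 1, 1, 1]  # rounds 1-9

def unopened_after_round(rnd):
    """Unopened cases (including the contestant's own) after round rnd."""
    return 26 - sum(CASES_TO_OPEN[:rnd])

def stylized_offer(rnd, expected_value_unopened):
    """Illustrative bank offer: 10% of expected value per round elapsed."""
    return 0.10 * rnd * expected_value_unopened

for rnd in range(1, 10):
    print(rnd, unopened_after_round(rnd), stylized_offer(rnd, 100000.0))
```

At the end of round 9 two cases remain, the contestant's and one other, which is the 50/50 decision described above.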
The bank offer gets richer and richer over time, ceteris paribus the random realizations of opened cases. In other words, if each unopened case truly has the same subjective probability of having any remaining prize, there is a positive expected return to staying in the game for more and more rounds. A risk-averse subject who might be just willing to accept the bank offer, if the offer were not expected to get better and better, would choose to continue to another round since the expected improvement in the bank offer provides some compensation for the additional risk of going into another round. Thus, to evaluate the parameters of some latent utility function given observed choices in earlier rounds, we have to mentally play out all possible future paths that the contestant faces.17 Specifically, we have to play out those paths assuming the values for the parameters of the likelihood function, since they affect when the contestant will decide to ‘‘deal’’ with the banker, and hence the expected utility of the compound lottery. This corresponds to procedures developed in the finance literature to price path-dependent derivative securities using Monte Carlo simulation (e.g., Campbell, Lo, & MacKinlay, 1997, Section 9.4). We discuss general numerical methods for this type of analysis later. Saying ‘‘no deal’’ in early rounds provides one with the option of being offered a better deal in the future, ceteris paribus the expected value of the unopened prizes in future rounds. Since the process of opening cases is a martingale process, even if the contestant gets to pick the cases to be opened, the expected value of the unopened prizes in any future round equals the current expected value. This implies, given the exogenous bank offers (as a function of expected value), that the dollar value of the offer will get richer and richer as time progresses. Thus, bank offers themselves will be a submartingale process. In the U.S.
version the contestants are joined after the first round by several family members or friends, who offer suggestions and generally add to the entertainment value. But the contestant makes the decisions. For example, in the very first show a lady was offered $138,000, and her hyperactive husband repeatedly screamed out ‘‘no deal!’’ She calmly responded, ‘‘At home, you do make the decisions. But … we’re not at home!’’ She turned the deal down, as it happens, and went on to take an offer of only $25,000 two rounds later. Our sample consists of 141 contestants recorded between December 19, 2005 and May 6, 2007. This sample includes 6 contestants that participated in special versions, for ratings purposes, in which the top prize was increased from $1 million to $2 million, $3 million, $4 million, $5 million or $6 million.18 The biggest winner on the show so far has been Michelle Falco, who was lucky enough to be on the September 22, 2006 show with a top prize
of $6 million. Her penultimate offer was $502,000 when the 3 unopened prizes were $10, $750,000 and $1 million, which had an expected value of $583,337. She declined the offer, and opened the $10 case, resulting in an offer of $808,000 when the expected value of the two remaining prizes was $875,000. She declined the offer, and ended up with $750,000 in her case. In other countries there are several variations. In some cases there are fewer prizes, and fewer rounds. In the United Kingdom there are only 22 monetary prizes, ranging from 1p up to £250,000, and only 7 rounds. In round 1 the contestant must pick 5 boxes, and then in each round until round 6 the contestant has to open 3 boxes per round. So there can be a considerable swing from round to round in the expected value of unopened boxes, compared to the last few rounds of the U.S. version. At the end of round 6 there are only 2 unopened boxes, one of which is the contestant’s box. Some versions substitute the option of switching the contestant’s box for an unopened box, instead of a bank offer. This is particularly common in the French and Italian versions, and relatively rare in other versions. Things become much more complex in those versions in which the bank offer in any round is statistically informative about the prize in the contestant’s case. In that case the contestant has to make some correction for this possibility, and also consider the strategic behavior of the banker’s offer. Bombardini and Trebbi (2005) offer clear evidence that this occurs in the Italian version of the show, but there is no evidence that it occurs in the U.K. version. The Australian version offers several additional options at the end of the normal game, called Chance, SuperCase, and Double Or Nothing. In many cases they are used as ‘‘entertainment filler,’’ for games that otherwise would finish before the allotted 30 min.
It has been argued, most notably by Mulino, Scheelings, Brooks, and Faff (2006), that these options should rationally change behavior in earlier rounds, since they provide some uncertain ‘‘insurance’’ against saying ‘‘deal’’ earlier rather than later.

2.2. Comparable Laboratory Experiments

We also implemented laboratory versions of the DOND game, to complement the natural experimental data from the game shows.19 The instructions were provided by hand and read out to subjects to ensure that every subject took some time to digest them. As far as possible, they rely on screen shots of the software interface that the subjects were to use to enter their choices. The opening page for the common practice session in the lab, shown in Fig. 7, provides the subject with basic information about the task
Fig. 7. Opening Screen Shot for Laboratory Experiment.
before them, such as how many boxes there were and how many boxes needed to be opened in any round.20 In the default setup the subject was given the same frame as in the Australian and U.S. game shows: this version has more prizes (26 instead of 22) and more rounds (9 instead of 6) than the U.K. version. After clicking on the ‘‘Begin’’ box, the lab subject was given the main interface, shown in Fig. 8. This provided the basic information for the DOND task. The presentation of prizes was patterned after the displays used on the actual game shows. The prizes are shown in the same nominal denomination as the Australian daytime game show, and the subject was told that an exchange rate of 1,000:1 would be used to convert earnings in the DOND task into cash payments at the end of the session. Thus, the top cash prize the subject could earn was $200 in this version. The subject was asked to click on a box to select ‘‘his box,’’ and then round 1 began. In the instructions we illustrated a subject picking box #26, and then six boxes, so that at the end of round 1 he was presented with a deal from the banker, shown in Fig. 9. The prizes that had been opened in round 1 were ‘‘shaded’’ on the display, just as they are in the game show display. The subject is then asked to accept $4,000 or continue. When the
Fig. 8. Prize Distribution and Display for Laboratory Experiment.
game ends the DOND task earnings are converted to cash using the exchange rate, and the experimenter was prompted to come over and record those earnings. Each subject played at their own pace after the instructions were read aloud. One important feature of the experimental instructions was to explain how bank offers would be made. The instructions explained the concept of the expected value of unopened prizes, using several worked numerical examples in simple cases. Then subjects were told that the bank offer would be a fraction of that expected value, with the fractions increasing over the rounds as displayed in Fig. 10. This display was generated from Australian game show data available at the time. We literally used the parameters defining the function shown in Fig. 10 when calculating offers in the experiment, rounding to the nearest dollar.
Fig. 9. Typical Bank Offer in Laboratory Experiment.
The subjects for our laboratory experiments were recruited from the general student population of the University of Central Florida in 2006.21 We have information on 676 choices made by 89 subjects. We estimate the same models for the lab data as for the U.S. game show data. We are not particularly interested in getting the same quantitative estimates per se, since the samples, stake, and context differ in obvious ways. Instead our interest is whether we obtain the same qualitative results: is the lab reliable in terms of the qualitative inferences one draws from it? Our null hypothesis is that the lab results are the same as the naturally occurring results. If we reject this hypothesis one could infer that we have just not run the right lab experiments in some respect, and we have some sympathy for that view. On the other hand, we have implemented our lab experiments in exactly the manner that we would normally do as lab experimenters. So we
Fig. 10. Information on Bank Offers in Laboratory Experiment. (The figure plots the bank offer as a fraction of the expected value of unopened cases, rising over rounds 1–9.)
are definitely able to draw conclusions in this domain about the reliability of conventional lab tests compared to comparable tests using naturally occurring data. These conclusions would then speak to the questions raised by Harrison and List (2004) and Levitt and List (2007) about the reliability of lab experiments.

2.3. Other Analyses of Deal or No Deal

A large literature on DOND has evolved quickly.22 Appendix B in the working paper version documents in detail the modeling strategies adopted in the DOND literature, and similarities and differences to the approach we propose.23 In general, three types of empirical strategies have been employed to model observed DOND behavior. The first empirical strategy is the calculation of CRRA bounds at which a given subject is indifferent between one choice and another. These bounds can be calculated for each subject and each choice, so they have the advantage of not assuming that each subject has the same risk preferences, just that they use the same functional form. The studies differ in terms of
how they use these bounds, as discussed briefly below. The use of bounds such as these is familiar from the laboratory experimental literature on risk aversion: see Holt and Laury (2002), Harrison, Johnson, McInnes, and Rutström (2005), and Harrison, Lau, Rutström, and Sullivan (2005) for discussion of how one can then use interval regression methods to analyze them. The limitation of this approach, discussed in Harrison and Rutström (2008, Section 2.1), is that it is difficult to go beyond the CRRA or other one-parameter families, and in particular to examine other components of choice under uncertainty (such as more flexible utility functions, probability weighting or loss aversion).24 Post, van den Assem, Baltussen, and Thaler (2006) use CRRA bounds in their analysis, and the approach has been employed in various forms by others as noted below. The second empirical strategy is the examination of specific choices that provide ‘‘trip wire’’ tests of certain propositions of EUT, or provide qualitative indicators of preferences. For example, decisions made in the very last rounds often confront the contestant with the expected value of the unopened prizes, and allow one to identify those who are risk loving or risk averse directly. The limitation of this approach is that these choices are subject to sample selection bias, since risk attitudes and other preferences presumably played some role in whether the contestant reached these critical junctures. Moreover, they provide limited information at best, and do not allow one to define a metric for errors. If we posit some stochastic error specification for choices, as is now common, then one has no way of knowing if these specific choices are the result of such errors or a manifestation of latent preferences. Blavatskyy and Pogrebna (2006) illustrate the sustained use of this type of empirical strategy, which is also used by other studies in some respects.
The third empirical strategy is to propose a latent decision process and estimate the structural parameters of that process using maximum likelihood. This is the approach we favor, since it allows one to examine structural issues rather than rely on ad hoc proxies for underlying preferences. Harrison and Rutström (2008, Section 2.2) discuss the general methodological advantages of this approach.
3. A GENERAL ESTIMATION STRATEGY

The DOND game is a dynamic stochastic task in which the contestant has to make choices in one round that generally entail consideration of future consequences. The same is true of the other game shows used for estimation
of risk attitudes. In Card Sharks the level of bets in one round generally affects the scale of bets available in future rounds, including bankruptcy, so for plausible preference structures one should take this effect into account when deciding on current bets. Indeed, as explained earlier, one of the empirical strategies employed by Gertner (1993) can be viewed as a precursor to our general method. In Lingo the stop/continue structure, where a certain amount of money is being compared to a virtual money lottery, is evident. We propose a general estimation strategy for such environments, and apply it to DOND. The strategy uses randomization to break the general ‘‘curse of dimensionality’’ that is evident if one considers this general class of dynamic programming problems (Rust, 1997).
3.1. Basic Intuition

The basic logic of our approach can be explained from the data and simulations shown in Table 1. We restrict attention here to the first 75 contestants that participated in the standard version of the television game with a top prize of $1 million, to facilitate comparison of dollar amounts. There are nine rounds in which the banker makes an offer, and in round 10 the contestant simply opens his case. Only 7 contestants, or 9% of the sample of 75, continued to round 10, with most accepting the banker’s offer in rounds 6, 7, 8, and 9. The average offer is shown in column 4. We stress that this offer is stochastic from the perspective of the sample as a whole, even if it is non-stochastic to the specific contestant in that round. Thus, to see the logic of our approach from the perspective of the individual decision-maker, think of the offer as a non-stochastic number, using the average values shown as a proximate indicator of the value of that number in a particular instance. In round 1 the contestant might consider up to nine virtual lotteries (VLs). He might look ahead one round and contemplate the outcomes he would get if he turned down the offer in round 1 and accepted the offer in round 2. This VL, realized in virtual round 2 in the contestant’s thought experiment, would generate an average payoff of $31,141 with a standard deviation of $23,655. The top panel of Fig. 11 shows the simulated distribution of this particular lottery. The distribution of payoffs to these VLs is highly skewed, so the standard deviation may be slightly misleading if one thinks of these as Gaussian distributions. However, we just use the standard deviation as one pedagogic indicator of the uncertainty of the payoff in the VL: in our formal analysis we consider the complete distribution of the VL in a nonparametric manner.
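The construction of one such VL can be sketched with a small simulation. The prize list is the standard U.S. list; the round-2 offer fraction of 0.2 is an illustrative assumption rather than the empirical offer function used in the paper, so the simulated mean and standard deviation will not reproduce the Table 1 entries, but the exercise has the same shape: open cases at random, compute the implied bank offer, and collect the induced distribution.

```python
import random
import statistics

# Standard U.S. prize list: 26 prizes from $0.01 to $1,000,000.
PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750, 1000,
          5000, 10000, 25000, 50000, 75000, 100000, 200000, 300000,
          400000, 500000, 750000, 1000000]

def vl_no_deal_then_deal_round2(rng, offer_fraction=0.2):
    """One draw of the VL: No Deal in round 1, Deal in round 2.

    Open 6 + 5 cases at random; the virtual round-2 offer is an assumed
    fraction of the expected value of the 15 cases still unopened.
    """
    prizes = PRIZES[:]
    rng.shuffle(prizes)
    unopened = prizes[11:]  # 11 cases opened in rounds 1 and 2
    return offer_fraction * statistics.mean(unopened)

rng = random.Random(123)
draws = [vl_no_deal_then_deal_round2(rng) for _ in range(10000)]
print(statistics.mean(draws), statistics.stdev(draws))
```

The resulting distribution is highly skewed, as in Fig. 11, which is why the formal analysis carries the whole simulated distribution rather than a mean and standard deviation.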
Table 1. Virtual Lotteries for US Deal or No Deal Game Show.

Round   Active Contestants   Deal!   Average Offer
  1          75 (100%)          0         $16,180
  2          75 (100%)          0         $33,453
  3          75 (100%)          0         $54,376
  4          75 (100%)          1         $75,841
  5          74 (99%)           5        $103,188
  6          69 (92%)          16        $112,818
  7          53 (71%)          20        $119,746
  8          33 (44%)          16        $107,779
  9          17 (23%)          10         $79,363
 10           7 (9%)            –               –

Looking at virtual lottery realized in round … (mean payoff, with standard deviation in parentheses):

From round 1: round 2: $31,141 ($23,655); round 3: $53,757 ($45,996); round 4: $73,043 ($66,387); round 5: $97,275 ($107,877); round 6: $104,793 ($102,246); round 7: $120,176 ($121,655); round 8: $131,165 ($154,443); round 9: $136,325 ($176,425); round 10: $136,281 ($258,856).
From round 2: round 3: $53,535 ($46,177); round 4: $72,588 ($66,399); round 5: $96,887 ($108,086); round 6: $104,369 ($102,222); round 7: $119,890 ($121,492); round 8: $130,408 ($133,239); round 9: $135,877 ($175,278); round 10: $135,721 ($257,049).
From round 3: round 4: $73,274 ($65,697); round 5: $97,683 ($107,302); round 6: $105,117 ($101,271); round 7: $120,767 ($120,430); round 8: $131,563 ($153,058); round 9: $136,867 ($173,810); round 10: $136,636 ($255,660).
From round 4: round 5: $99,895 ($108,629); round 6: $107,290 ($101,954); round 7: $123,050 ($120,900); round 8: $134,307 ($154,091); round 9: $139,511 ($174,702); round 10: $139,504 ($257,219).
From round 5: round 6: $111,964 ($106,137); round 7: $128,613 ($126,097); round 8: $140,275 ($160,553); round 9: $145,710 ($180,783); round 10: $145,757 ($266,303).
From round 6: round 7: $128,266 ($124,945); round 8: $139,774 ($159,324); round 9: $145,348 ($180,593); round 10: $145,301 ($266,781).
From round 7: round 8: $136,720 ($154,973); round 9: $142,020 ($170,118); round 10: $142,323 ($246,044).
From round 8: round 9: $116,249 ($157,005); round 10: $116,020 ($223,979).
From round 9: round 10: $53,929 ($113,721).

Note: Data drawn from observations of contestants on the U.S. game show, plus the authors’ simulations of virtual lotteries as explained in the text.
Fig. 11. Two Virtual Lottery Distributions in Round 1. (Top panel: density of the VL if No Deal in round 1 and then Deal in round 2; bottom panel: density of the VL if No Deal in rounds 1 and 2 and then Deal in round 3. Prize values range from $0 to $200,000.)
In round 1 the contestant can also consider what would happen if he turned down offers in rounds 1 and 2, and accepted the offer in round 3. This VL would generate, from the perspective of round 1, an average payoff of $53,757 with a standard deviation of $45,996. The bottom panel of Fig. 11 shows the simulated distribution of this particular VL. Compared to the VL in which the contestant said ‘‘No Deal’’ in round 1 and ‘‘Deal’’ in round 2, shown above it in Fig. 11, it gives less weight to the smallest prizes and greater weight to higher prizes. Similarly for each of the other VLs shown. The VL for the final Round 10 is simply the implied lottery over the final two unopened cases, since in this round the contestant would have said ‘‘No Deal’’ to all bank offers. The forward-looking contestant in round 1 is assumed to behave as if he maximizes the expected utility of accepting the current offer or continuing. The expected utility of continuing, in turn, is given by simply evaluating each of the nine VLs shown in the first row of Table 1. The average payoff increases steadily, but so does the standard deviation of payoffs, so this evaluation requires knowledge of the utility function of the contestant. Given that utility function, the contestant is assumed to behave as if they evaluate the expected utility of each of the nine VLs. Thus, we calculate nine expected utility numbers, conditional on the specification of the parameters
of the assumed utility function and the VLs that each subject faces in their round 1 choices. In round 1, the subject then simply compares the maximum of these nine expected utility numbers to the utility of the non-stochastic offer in round 1. If that maximum exceeds the utility of the offer, he turns down the offer; otherwise he accepts it. In round 2, a similar process occurs. One critical feature of our VL simulations is that they are conditioned on the actual outcomes that each contestant has faced in prior rounds. Thus, if a (real) contestant has tragically opened up the six top prizes in round 1, that contestant would not see VLs such as the ones in Table 1 for round 2. They would be conditioned on that player’s history in round 1. We report here averages over all players and all simulations. We undertake 100,000 simulations for each player in each round, so as to condition on their history.25 This example can also be used to illustrate how our maximum-likelihood estimation procedure works. Assume some specific utility function and some parameter values for that utility function, with all prizes scaled by the maximum possible at the outset of the game. The utility of the non-stochastic bank offer in round R is then directly evaluated. Similarly, the VLs in each round R can then be evaluated.26 They are represented numerically as 100-point discrete approximations, with 100 prizes and 100 probabilities associated with those prizes. Thus, by implicitly picking a VL over an offer, it is as if the subject is taking a draw from this 100-point distribution of prizes. In fact, they are playing out the DOND game, but this representation as a VL draw is formally identical. The evaluation of these VLs generates v(R) expected utilities, where v(1) = 9, v(2) = 8, …, v(9) = 1 as shown in Table 1. The maximum expected utility of these v(R) in a given round R is then compared to the utility of the offer, and the likelihood evaluated in the usual manner.
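These evaluations are simple to express in code. The sketch below computes the probability of ‘‘Deal’’ from the CRRA utility of the offer, the maximum expected utility over a set of VLs, and a Fechner noise term with a probit link, matching the formal specification that follows; prizes are scaled by the maximum possible, as in the text, and the lottery, offer, and parameter values are made up for illustration.

```python
import math

def crra_u(m, r):
    # CRRA utility u(m) = m^(1-r)/(1-r), with u(m) = ln(m) at r = 1.
    return math.log(m) if r == 1 else m ** (1.0 - r) / (1.0 - r)

def expected_utility(prizes, probs, r):
    return sum(p * crra_u(z, r) for z, p in zip(prizes, probs))

def prob_deal(offer, virtual_lotteries, r, mu):
    """Probability of 'Deal': probit of (u(offer) - max EU over VLs) / mu.

    virtual_lotteries is a list of (prizes, probs) pairs, e.g. 100-point
    discrete approximations of the simulated VLs; mu is Fechner noise.
    """
    max_eu = max(expected_utility(z, p, r) for z, p in virtual_lotteries)
    index = (crra_u(offer, r) - max_eu) / mu
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))  # standard normal CDF

# Illustration: a $40,000 offer against a single 50/50 'VL' over $10 and
# $100,000, all scaled by the $1,000,000 maximum prize.
vls = [([0.00001, 0.1], [0.5, 0.5])]
print(prob_deal(0.04, vls, r=0.5, mu=0.1))
```

The log-likelihood then simply sums the log of this probability over observed ‘‘Deal’’ choices and the log of its complement over ‘‘No Deal’’ choices.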
We present a formal statement of the latent EUT process leading to a likelihood defined over parameters and the observed choices, and then discuss how this intuition changes when we assume alternative, non-EUT processes.
3.2. Formal Specification

We assume that utility is defined over money m using the popular CRRA function

u(m) = m^(1−r)/(1−r)   (1)
where r is the utility function parameter to be estimated. In this case, for r ≠ 1, r is the RRA coefficient, and u(m) = ln(m) for r = 1. With this parameterization r = 0 denotes risk-neutral behavior, r > 0 denotes risk aversion, and r < 0 denotes risk loving. We review one extension to this simple CRRA model later, but for immediate purposes it is desirable to have a simple specification of the utility function in order to focus on the estimation methodology.27 Probabilities for each outcome k, p_k, are those that are induced by the task, so expected utility is simply the probability-weighted utility of each outcome in each lottery. There were 100 outcomes in each VL i, so

EU_i = Σ_{k=1,…,100} [p_k u_k]   (2)

Of course, we can view the bank offer as being a degenerate lottery. A simple stochastic specification was used to specify likelihoods conditional on the model. The EU for each lottery pair was calculated for a candidate estimate of the utility function parameters, and the index

∇EU = (EU_BO − EU_L)/μ   (3)

is calculated, where EU_L is the EU of the lottery in the task, EU_BO the EU of the degenerate lottery given by the bank offer, and μ a Fechner noise parameter following Hey and Orme (1994).28 The index ∇EU is then used to define the cumulative probability of the observed choice to ‘‘Deal’’ using the cumulative standard normal distribution function:

G(∇EU) = Φ(∇EU)   (4)

This provides a simple stochastic link between the latent economic model and observed choices.29 The likelihood, conditional on the EUT model being true and the use of the CRRA utility function, depends on the estimate of r and μ given the above specification and the observed choices. The conditional log-likelihood is

ln L^EUT(r, μ; y) = Σ_i [(ln G(∇EU) | y_i = 1) + (ln(1 − G(∇EU)) | y_i = 0)]   (5)

where y_i = 1 (0) denotes the choice of ‘‘Deal’’ (‘‘No Deal’’) in task i. We extend this standard formulation to include forward-looking behavior by redefining the lottery that the contestant faces. One such VL reflects the
possible outcomes if the subject always says ‘‘No Deal’’ until the end of the game and receives his prize. We call this a VL since it need not happen; it does happen in some fraction of cases, and it could happen for any subject. Similarly, we can substitute other VLs reflecting other possible choices by the contestant. Just before deciding whether to accept the bank offer in round 1, what if the contestant behaves as if the following simulation were repeated G times: Play out the remaining eight rounds and pick cases at random until all but two cases are opened. Since this is the last round in which one would receive a bank offer, calculate the expected value of the remaining two cases. Then multiply that expected value by the fraction that the bank is expected to use in round 9 to calculate the offer. Pick that fraction from a prior as to the average offer fraction, recognizing that the offer fraction is stochastic.
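The simulation just described can be sketched directly. Opening cases at random until two remain leaves a uniformly random pair of survivors, which the code exploits; the Normal(0.9, 0.05) draw for the round-9 offer fraction is a placeholder for the prior described above, and the round-1 history is made up for illustration.

```python
import random
import statistics

PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750, 1000,
          5000, 10000, 25000, 50000, 75000, 100000, 200000, 300000,
          400000, 500000, 750000, 1000000]

def virtual_round9_offer(unopened, rng):
    """One virtual round-9 bank offer, viewed from the end of round 1.

    Playing out rounds 2-9 by opening cases at random leaves a uniformly
    random pair of survivors, so we sample the two directly. The offer is
    a stochastic fraction of their expected value; Normal(0.9, 0.05) is an
    illustrative stand-in for the prior over the round-9 offer fraction.
    """
    final_two = rng.sample(unopened, 2)
    fraction = rng.gauss(0.9, 0.05)
    return fraction * statistics.mean(final_two)

rng = random.Random(7)
# Hypothetical history: the six cases opened in round 1 held these prizes.
opened_round1 = {0.01, 50, 750, 25000, 300000, 1000000}
unopened = [z for z in PRIZES if z not in opened_round1]  # 20 cases remain

G = 100000
offers = [virtual_round9_offer(unopened, rng) for _ in range(G)]

# 100-point discrete approximation: percentiles of the G simulated offers,
# each carrying probability 1/100, ready for expected-utility evaluation.
offers.sort()
approx = [offers[i * G // 100] for i in range(100)]
print(statistics.mean(offers), approx[0], approx[-1])
```

Because each of the G draws occurs with probability 1/G, the sorted percentile grid is a faithful, numerically manageable summary of the virtual lottery.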
The end result of this simulation is a sequence of G virtual bank offers in round 9, viewed from the perspective of round 1. This sequence then defines the VL to be used for a contestant in round 1 whose horizon is the last round in which the bank will make an offer. Each of the G bank offers in this virtual simulation occurs with probability 1/G, by construction. To keep things numerically manageable, we can then take a 100-point discrete approximation of this lottery, which would otherwise typically consist of G distinct real values, where one would like G to be relatively large (we use G = 100,000). This simulation is conditional on the six cases that the subject has already selected at the end of round 1. Thus, the lottery reflects the historical fact of the six specific cases that this contestant has already opened. The same process can be repeated for a VL that only involves looking forward to the expected offer in round 8. And for a VL that only involves looking forward to rounds 7, 6, 5, 4, 3, and 2, respectively. Table 1 illustrates the outcome of such calculations. The contestant can be viewed as having a set of nine VLs to compare, each of which entails saying "No Deal" in round 1. The different VLs imply different choices in future rounds, but the same response in round 1. To decide whether to accept the deal in round 1, we assume that the subject simply compares the maximum EU over these nine VLs with the utility of the deterministic offer in round 1. To calculate EU and utility of the offer one needs to know the parameters of the utility function, but these are just nine EU evaluations and one utility evaluation. These evaluations can be undertaken within a likelihood function evaluator, given candidate values of the parameters of the utility function. The same process can be repeated in round 2, generating another set of eight VLs to be compared to the actual bank offer in round 2. This
388
STEFFEN ANDERSEN ET AL.
simulation would not involve opening as many cases, but the logic is the same. Similarly for rounds 3–9. Thus, for each of rounds 1–9, we can compare the utility of the actual bank offer with the maximum EU of the VLs for that round, which in turn reflects the EU of receiving a bank offer in future rounds in the underlying game. In addition, there exists a VL in which the subject says "No Deal" in every round. This is the VL that we view as being realized in round 10 in Table 1.

There are several significant advantages of this VL approach. First, since the round associated with the highest expected utility is not the same for all contestants, due to heterogeneity in risk attitudes, it is of interest to estimate the length of this horizon. Since a contestant with a short horizon behaves in essentially the same manner as one with a longer horizon, merely substituting different VLs into the latent EUT calculus, it is easy to test hypotheses about restrictions on the horizon generated by more myopic behavior. Second, one can specify mixture models of different horizons, and let the data determine what fraction of the sample employs which horizon. Third, the approach generalizes to any known offer function, not just the ones assumed here and in Table 1. Thus, it is not as specific to the DOND task as it might initially appear. This is important if one views DOND as a canonical task for examining fundamental methodological aspects of dynamic choice behavior. Those methods should not exploit the specific structure of DOND unless there is no loss in generality. In fact, other versions of DOND can be used to illustrate the flexibility of this approach, since they sometimes employ "follow-on" games that can simply be folded into the VL simulation.
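A minimal sketch of how Eqs. (3)–(5) combine with the VL construction, assuming equiprobable VL outcomes and CRRA utility u(x) = x^(1-r)/(1-r). All names are hypothetical, and note that with unnormalized utilities the Fechner parameter is scale-dependent, so the mu used in the example is not comparable to the estimates reported below.

```python
import math
from statistics import NormalDist

def crra_u(x, r):
    """CRRA utility u(x) = x**(1-r)/(1-r); r = 1 handled as log."""
    return math.log(x) if r == 1 else x ** (1 - r) / (1 - r)

def eu(lottery, r):
    """Expected utility of a virtual lottery given as equiprobable outcomes."""
    return sum(crra_u(x, r) for x in lottery) / len(lottery)

def deal_loglik(offer, vls, chose_deal, r, mu):
    """Contribution of one Deal/No Deal choice to ln L, per Eqs. (3)-(5).

    The No Deal side is the best of the contestant's virtual lotteries;
    the index (EU_BO - EU_L)/mu is mapped through the standard normal CDF.
    """
    eu_bo = crra_u(offer, r)                # degenerate bank-offer lottery
    eu_nd = max(eu(vl, r) for vl in vls)    # best forward-looking VL
    index = (eu_bo - eu_nd) / mu            # Fechner latent index
    p_deal = NormalDist().cdf(index)
    return math.log(p_deal if chose_deal else 1 - p_deal)
```

Summing `deal_loglik` over all round-by-round choices gives the conditional log-likelihood that a maximum-likelihood routine would search over candidate (r, mu) values.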
Finally, and not least, this approach imposes virtually no numerical burden on the maximum-likelihood optimization part of the numerical estimation stage: all that the likelihood function evaluator sees in a given round is a non-stochastic bank offer, a handful of (virtual) lotteries to compare it to given certain proposed parameter values for the latent choice model, and the actual decision of the contestant to accept the offer or not. This parsimony makes it easy to examine non-CRRA and non-EUT specifications of the latent dynamic choice process, illustrated in Andersen et al. (2006a, 2006b). All estimates allow for the possibility of correlation between responses by the same subject, so the standard errors on estimates are corrected for the possibility that the responses are clustered for the same subject. The use of clustering to allow for ‘‘panel effects’’ from unobserved individual effects is common in the statistical survey literature.30 In addition, we consider allowances for random effects from unobserved individual heterogeneity31
after estimating the initial model that assumes that all subjects have the same preferences for risk.
3.3. Estimates from Behavior on the Game Show

We estimate the CRRA coefficient to be 0.18 with a standard error of 0.030, implying a 95% confidence interval between 0.12 and 0.24. So this provides evidence of moderate risk aversion over this large domain. The noise parameter \mu is estimated to be 0.077, with a standard error of 0.015. Based on the estimated risk coefficients we can calculate the future round for which each contestant had the highest expected utility, seen from the perspective of the round in which each decision is made. Fig. 12 displays histograms of these implied maximum-EU rounds for each round-specific decision. For example, when contestants are in round 1 making a decision over "Deal" or "No Deal," we see that there is a strong mode for future round 9 as the round with the maximum EU, given the estimated risk coefficients. The prominence of round 9 remains across all rounds in which contestants face a "Deal" or "No Deal" choice, although we can
[Fig. 12. Evaluation Horizon by Round. Nine histogram panels, one per decision round (1–9); x-axis: Future Round Used (1–10); y-axis: Frequency (0–150).]
see that in rounds 5–7 there is a slight increase in the frequency with which earlier rounds provide the maximum EU for some contestants. The expected utilities of other VLs may well have generated the same binary decision, but the round-9 VL was the one that appeared to be used, since its expected utility was greater than that of the others. We assume in the above analysis that all contestants can and do evaluate the expected utility of all VLs, defined as the EU of bank offers in future rounds. Nevertheless, it is possible that some, perhaps all, contestants used a more myopic approach and evaluated EU over much shorter horizons. It is a simple matter to examine the effects of constraining the horizon over which the contestant is assumed to evaluate options. If one assumed that choices in each round were based on a comparison of the bank offer and the expected outcome from the terminal round, ignoring the possibility that the maximum EU may be found for an intervening round, then the CRRA estimate becomes 0.12, with a 95% confidence interval between 0.10 and 0.15. We cannot reject the hypothesis that subjects behave as if they are less risk averse if they are only assumed to look to the terminal round and ignore the intervening bank offers. If one instead assumes that choices in each round were based on a myopic horizon, in which the contestant just considers the distribution of likely offers in the very next round, the CRRA estimate becomes 0.22, with a 95% confidence interval between 0.18 and 0.42. Thus, we obtain results that are similar to those obtained when we allow subjects to consider all horizons, although the estimates are biased and imply greater risk aversion than the unconstrained estimates. The estimated noise parameter increases to 0.12, with a standard error of 0.043.
Overall, the estimates assuming myopia are statistically significantly different from the unconstrained estimates, even if the estimates of risk attitudes are substantively similar. Our specification of alternative evaluation horizons does not lead to a nested hypothesis test of parameter restrictions, so a formal test of the differences in these estimates requires a non-nested hypothesis test. We use the popular Vuong (1989) procedure, even though it rests on some strong assumptions, discussed in Harrison and Rutström (2005). We find that we can reject the hypothesis that the evaluation horizon is only the terminal horizon with a p-value of 0.026, and also reject the hypothesis that the evaluation horizon is myopic with a p-value of less than 0.0001. Finally, we can consider the validity of the CRRA assumption in this setting by allowing RRA to vary with prizes. One natural candidate utility function to replace (1) is the Hyperbolic Absolute Risk Aversion (HARA) function of Merton (1971). We use a specification of HARA32
given in Gollier (2001):

U(y) = \zeta \left( \eta + \frac{y}{\gamma} \right)^{1-\gamma}, \quad \gamma \neq 0    (1')

where the parameter \zeta can be set to 1 for estimation purposes without loss of generality. This function is defined over the domain of y such that \eta + y/\gamma > 0. The first-order derivative with respect to income is

U'(y) = \frac{\zeta(1-\gamma)}{\gamma} \left( \eta + \frac{y}{\gamma} \right)^{-\gamma}

which is positive if and only if \zeta(1-\gamma)/\gamma > 0 for the given domain of y. The second-order derivative is

U''(y) = -\frac{\zeta(1-\gamma)}{\gamma} \left( \eta + \frac{y}{\gamma} \right)^{-\gamma-1} < 0

which is negative for the given domain of y. Hence it is not possible to specify risk-loving behavior with this specification when non-satiation is assumed. This is not a particularly serious restriction for a model of aggregate behavior in DOND. With this specification ARA is 1/(\eta + y/\gamma), so the inverse of ARA is linear in income; RRA is y/(\eta + y/\gamma), which can both increase and decrease with income. Relative risk aversion is independent of income and equal to \gamma when \eta = 0. Using the HARA utility function, we estimate \eta to be 0.30, with a standard error of 0.070 and a 95% confidence interval between 0.15 and 0.43. Thus, we can easily reject the assumption of CRRA over this domain. We estimate \gamma to be 0.992, with a standard error of 0.001. Evaluating RRA over various prize levels reveals an interesting pattern: RRA is virtually 0 for all prize levels up to around $10,000, when it becomes 0.03, indicating very slight risk aversion. It then increases sharply as prize levels increase. At $100,000 RRA is 0.24, at $250,000 it is 0.44, at $500,000 it is 0.61, at $750,000 it is 0.70, and finally at $1 million it is 0.75. Thus, we observe striking evidence of risk neutrality for small stakes, at least within the context of this task, and risk aversion for large stakes. If contestants are constrained to only consider the options available to them in the next round, roughly the same estimates of risk attitudes obtain, even if one can again statistically reject this implicit restriction.
RRA is again overestimated, reaching 0.39 for prizes of $100,000, 0.61 for prizes of $250,000, and 0.86 for prizes of $1 million. On the other hand, assuming that contestants only evaluate the terminal option leads to much lower
estimates of risk aversion, consistent with the findings assuming CRRA. In this case there is virtually no evidence of risk aversion at any prize level up to $1 million, which is clearly implausible a priori.
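As a check on the reported pattern, the implied RRA schedule can be computed directly from the HARA point estimates. This assumes income is measured in millions of dollars, a units assumption not stated in the text, but one that reproduces the reported magnitudes to within a point or two of rounding.

```python
def hara_rra(y, eta=0.30, gamma=0.992):
    """Relative risk aversion implied by the HARA utility of Eq. (1'):
    RRA(y) = y / (eta + y/gamma), using the reported point estimates
    and income y in millions of dollars (an assumed unit of account)."""
    return y / (eta + y / gamma)

# RRA at the prize levels discussed in the text ($10k ... $1m):
profile = {y: round(hara_rra(y), 2) for y in (0.01, 0.1, 0.25, 0.5, 0.75, 1.0)}
```

With these parameters RRA is near zero for small prizes and rises steeply toward the top of the prize distribution, matching the qualitative pattern described above.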
3.4. Approximation to the Fully Dynamic Path

Our VL approach makes one simplifying assumption which dramatically enhances its ability to handle complicated sequences of choices, but which can lead to a bias in the resulting estimates of risk attitudes. To illustrate, consider the contestant in round 8, facing three unopened prizes and having to open one prize if he declines the bank offer in round 8. Call these prizes X, Y, and Z. There are three combinations of prizes that could remain after opening one prize. Our approach to the VL, from the perspective of the round-8 decision, evaluates the payoffs that confront the contestant for each of these three combinations if he "mentally locks himself into saying Deal (D) in round 9 and then gets the stochastic offer given the unopened prizes" or if he "mentally locks himself into saying No Deal (ND) in round 9 and then opens one more prize." The former is the VL associated with the strategy of saying ND in round 8 and D in round 9, and the latter is the VL associated with the strategy of saying ND in round 8 and ND again in round 9. We compare the EU of these two VLs as seen from round 8, and pick the larger as representing the EU from saying ND in round 8. Finally, we compare this EU to the utility from saying D in round 8, since the offer in round 8 is known and deterministic. The simplification comes from the fact that we do not evaluate the utility function in each of the possible virtual round-9 decisions. A complete enumeration of each possible path would undertake three paired comparisons. Consider the three possible outcomes: If prize X had been opened we would have Y and Z unopened coming into virtual round 9. This would generate a distribution of offers in virtual round 9 (it is a distribution since the expected offer as a percent of the EV of unopened prizes is stochastic as viewed from round 8). It would also generate two outcomes if the contestant said ND: either he opens Y or he opens Z.
A complete enumeration in this case should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of Y and Z. If prize Y had been opened we would have X and Z unopened coming into virtual round 9. A complete enumeration should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of X and Z.
If prize Z had been opened we would have X and Y unopened coming into virtual round 9. A complete enumeration should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of X and Y. Instead of these three paired comparisons in virtual round 9, our approach collapses all of the offers from saying D in virtual round 9 into one VL, and all of the final prize earnings from saying ND in virtual round 9 into another single VL. Our approach can be viewed as a valid solution to the dynamic problem the contestant faces if one accepts the restriction in the set of control strategies considered by the contestant. This restriction could be justified on behavioral grounds, since it does reduce the computational burden if in fact the contestant was using a process such as we use to evaluate the path. On the other hand, economists typically view the adoption of the optimal path as an "as if" prediction, in which case this behavioral justification would not apply. Or our approach may just be viewed as one way to descriptively model the forward-looking behavior of contestants, which is one of the key features of the analysis of the DOND game show. Just as we have alternative ways of modeling static choice under uncertainty, we can have alternative ways of modeling dynamic choice under uncertainty. At some point it would be valuable to test these alternative models against each other, but that does not have to be the first priority in trying to understand DOND behavior. It is possible to extend our general VL approach to take these possibilities into account, since one could keep track of all three pairs of VLs in the above complete enumeration, rather than collapsing them down to just one pair. Refer to this complete enumeration as VL*. From the perspective of the contestant, we know that EU(VL*) >= EU(VL), since VL* contains VL as a special case.
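The distinction between the collapsed lottery and the complete enumeration (call it VL*) can be made concrete for the round-8 example; the concave utility and deterministic offer function used below are illustrative assumptions, not the estimated ones.

```python
def eu(outcomes, u):
    """Expected utility of an equiprobable lottery."""
    return sum(u(x) for x in outcomes) / len(outcomes)

def nd_eu_collapsed(prizes, offer_fn, u):
    """Collapsed VL approach: pool all virtual round-9 offers into one
    lottery and all final prizes into another, then take the max of
    the two expected utilities."""
    offers, finals = [], []
    for opened in prizes:                      # case opened in round 8
        rest = [p for p in prizes if p != opened]
        offers.append(offer_fn(rest))          # virtual round-9 offer
        finals.extend(rest)                    # outcomes of ND in round 9
    return max(eu(offers, u), eu(finals, u))

def nd_eu_full(prizes, offer_fn, u):
    """Complete enumeration (VL*): choose D vs. ND separately within
    each virtual round-9 branch, then average across branches."""
    branch_eus = []
    for opened in prizes:
        rest = [p for p in prizes if p != opened]
        branch_eus.append(max(u(offer_fn(rest)), eu(rest, u)))
    return sum(branch_eus) / len(branch_eus)
```

Because the average of per-branch maxima can never fall below the maximum of the pooled averages, the full enumeration always weakly dominates the collapsed lottery, which is exactly the EU(VL*) >= EU(VL) ranking invoked in the text.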
We can therefore identify the implication of using VL instead of VL* for our inferences about risk attitudes, again considering the contestant in round 8 for ease of exposition, and assuming that the contestant actually undertakes full enumeration as reflected in VL*. Specifically, we will understate the EU of saying ND in round 8. This means that our ML estimation procedure would be biased toward finding less risk aversion than there actually is. To see this, assume some trial value of a CRRA risk aversion parameter. There are three possible cases, taking strict inequalities to be able to state matters crisply:

1. If this trial parameter r generates EU(VL*) > EU(VL) > U(D) then the VL approach would make the same qualitative inference as the
VL* approach, but would understate the likelihood of that observation. This understatement comes from the implication that EU(VL) - U(D) < EU(VL*) - U(D), and it is this difference that determines the probability of the observed choice (after some adjustment for a stochastic error).

2. If this trial parameter r generates EU(VL) < EU(VL*) < U(D) then the VL approach would again make the same qualitative inference as the VL* approach, but would overstate the likelihood of that observation. This overstatement comes from the implication in this case that EU(VL) - U(D) < EU(VL*) - U(D).

3. If this trial parameter r generates EU(VL*) > U(D) > EU(VL), then the VL approach would lead us to predict that the subject would make the D decision, whereas the VL* approach would lead us to predict that the subject would make the ND decision. If we assume that the subject is actually motivated by VL*, and we incorrectly use VL, we would observe a choice of ND and would be led to lower our trial parameter r to better explain the observed choice; lowering r would make the subject less risk averse, and more likely to reject the D decision under VL. But we should not have lowered the parameter r; we should just have calculated the EU of the ND choice using VL* instead of VL.

Note that one cannot just tabulate the incidence of these three cases at the final ML estimate of r, and check to see whether the vast bulk of choices fall into case #1 or case #2, since that estimate would have been adjusted to avoid case #3 if possible. And there is no presumption that the bias of the likelihood estimation in case #1 is just offset by the bias in case #2. So the bias from case #3 would lead us to expect that risk aversion would be underestimated, but the secondary effects from cases #1 and #2 should also be taken into account. Of course, if the contestant does not undertake full enumeration, and instead behaves consistently with the logic of our VL model, there is no bias at all in our estimates.
The only way to evaluate the extent of the bias is to undertake the complete enumeration required by VL* and compare it to the approximation obtained with VL. We have done this for the game show data in the United States, starting with behavior in round 6. By skipping behavior in rounds 1–5 we drop only 15 of 141 subjects, since undertaking the complete enumeration from earlier rounds is computationally intensive. We employ a 19-point approximation of the empirical distribution of bank offers in each round; in the VL approach we sampled 100,000 times from those distributions as part of the VL simulations. We then estimate the CRRA
model using VL, estimate the same model for the same behavior using VL*, and compare results. We find that the inferred CRRA coefficient increases as we move to VL*, as expected a priori, but by a very small amount. Specifically, we estimate CRRA to be 0.366 if we use VL* and 0.345 if we use VL, and the 95% confidence intervals comfortably overlap (they are 0.25 and 0.48 for the VL* approach, and 0.25 and 0.44 for the VL approach). The log-likelihood under the VL approach is -212.54824, and it is -211.27711 under the VL* approach, consistent with the VL* approach providing a better fit, but only a marginally better one. Thus, we can claim that our VL approach provides an excellent approximation to the fully dynamic solution. It is worth stressing that which estimate is the correct one depends on the assumptions made about contestant behavior. If one assumes that contestants in fact use strategies such as those embodied in VL, then using VL* would actually overstate true risk aversion, albeit by a trivial amount.
3.5. Estimates from Behavior in the Laboratory

The lab results indicate a CRRA coefficient of 0.45 and a 95% confidence interval between 0.38 and 0.52, comparable to results obtained using more familiar risk elicitation procedures due to Holt and Laury (2002) on the same subject pool. When we restrict the estimation model to only use the terminal period we again infer a much lower degree of risk aversion, consistent with risk neutrality; the CRRA coefficient is estimated to be -0.02 with a 95% confidence interval between -0.07 and 0.03. Constraining the estimation model to only consider prospects one period ahead leads to higher inferred risk aversion; the CRRA coefficient is estimated to be 0.48 with a 95% confidence interval between 0.41 and 0.55.
4. CONCLUSIONS

Game shows offer obvious advantages for the estimation of risk attitudes, not the least being the use of large stakes. Our review of analyses of these data reveals a steady progression of sophistication in terms of the structural estimation of models of choice under uncertainty. Most of these shows, however, put the contestant into a dynamic decision-making environment, so one cannot simply (and reliably) use static models of choice. Using DOND as a detailed case study, we considered a general estimation
methodology for such shows in which randomization of the potential outcomes allows us to break the curse of dimensionality that comes from recognizing these dynamic elements of the task environment. The DOND paradigm is important for several reasons, and more general than it might at first seem. It incorporates many of the dynamic, forward-looking decision processes that strike one as a natural counterpart to a wide range of fundamental economic decisions in the field. The "option value" of saying "No Deal" has clear parallels to the financial literature on stock market pricing, as well as to many investment decisions that have future consequences (so-called real options). There is no frictionless market ready to price these options, so familiar arbitrage conditions for equilibrium valuation play no immediate role, and one must worry about how the individual makes these decisions. The game show offers a natural experiment, with virtually all of the major components replicated carefully from show to show, and even from country to country. The only sense in which DOND is restrictive is that it requires the contestant to make a binary "stop/go" decision. This is already a rich domain, as illustrated by several prominent examples: the evaluation of replacement strategies for capital equipment (Rust, 1987) and the closure of nuclear power plants (Rothwell & Rust, 1997). But it would be valuable to extend the choice variable to be non-binary, as in Card Sharks, where the contestant must decide the bet level in each round as well as some binary decision (whether to switch the face card). Although some progress has been made on this problem, reviewed in Rust (1994), the range of applications has not been wide (e.g., Rust & Rothwell, 1995). Moreover, none of these have considered risk attitudes, let alone associated concepts such as loss aversion or probability weighting.
Thus, the detailed analysis of choice behavior in environments such as Card Sharks should provide a rich test case for many broader applications. These game shows provide a particularly fertile environment to test extensions to standard EUT models, as well as alternatives to EUT models of risk attitudes. Elsewhere, we have discussed applications that consider rank-dependent models such as RDU, and sign-dependent models such as CPT (Andersen et al., 2006a, 2006b). These applications, using the VL approach and U.K. data, have demonstrated the sensitivity of inferences to the manner in which key concepts are operationalized. Andersen et al. (2006a) find striking evidence of probability weighting, which is interesting since the DOND game has symmetric probabilities on each case. Using natural reference points to define contestant-specific gains or losses, they find no evidence of loss aversion. Of course, that inference depends on having
identified the right reference point, but CPT is generally silent on that specification issue when it is not obvious from the frame. Andersen et al. (2006b) illustrate the application of alternative ‘‘dual-criteria’’ models of choice from psychology, built to account for lab behavior with long shot, asymmetric lotteries such as one finds in DOND. No doubt many other specifications will be considered. Within the EUT framework, Andersen et al. (2006a) demonstrate the importance of allowing for asset integration. When utility is assumed to be defined over prizes plus some outside wealth measure,33 behavior is well characterized by a CRRA specification; but when it is assumed to be defined over prizes only, behavior is better characterized by a non-CRRA specification with increasing RRA over prizes. There are three major weaknesses of game shows. The first is that one cannot change the rules of the game or the information that contestants receive, much as one can in a laboratory experiment. Thus, the experimenter only gets to watch and learn, since natural experiments are, as described by Harrison and List (2004), serendipity observed. However, it is a simple matter to design laboratory experiments that match the qualitative task domains in the game show, even if one cannot hope to have stakes to match the game show (e.g., Tenorio & Cason, 2002; Healy & Noussair, 2004; Andersen et al., 2006b; and Post, van den Assem, Baltussen, & Thaler, 2006). Once this has been done, exogenous treatments can be imposed and studied. If behavior in the default version of the game can be calibrated to behavior in a lab environment, then one has some basis for being interested in the behavioral effects of treatments in the lab. The second major weakness of game shows is the concern that the sample might have been selected by some latent process correlated with the behavior of interest to the analyst: the classic sample selection problem. 
Most analyses of game shows are aware of this, and discuss the procedures by which contestants get to participate. At the very least, it is clear that the demographic diversity is wider than found in the convenience samples of the lab. We believe that controlled lab experiments can provide guidance on the extent of sample selection into these tasks, and that the issue is a much more general one. The third major weakness of game shows is the lack of information on observable characteristics, and hence the inability to use that information to examine heterogeneity of behavior. It is possible to observe some information from the contestant, since there is normally some pre-game banter that can be used to identify sex, approximate age, marital status, and ethnicity. But the general solution here is to employ econometric methods that allow one to correct for possible heterogeneity at the level of the individual, even if one
cannot condition on observable characteristics of the individual. Until then, one either pools over subjects under the assumption that they have the same preferences, as we have done; makes restrictive assumptions that allow one to identify bounds for a given contestant and then provides contestant-specific estimates (e.g., Post et al., 2006); or pays more attention to statistical methods that allow for unobserved heterogeneity. One such method is to allow for random coefficients in each structural model to represent an underlying variation in preferences across the sample (e.g., Train, 2003, Chapter 6; De Roos & Sarafidis, 2006; and Botti et al., 2006). This is quite different from allowing for standard errors in the pooled coefficient, as we have done. Another method is to allow for finite mixtures of alternative structural models, recognizing that some choices or subjects may be better characterized in this domain by one latent decision-making process and that others may be better characterized by some other process (e.g., Harrison & Rutström, 2005). These methods are not necessarily alternatives, but they each demand relatively large data sets and considerable attention to statistical detail.
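The finite-mixture idea mentioned above can be sketched in a few lines; the two latent processes and the mixing probability pi are placeholders for whichever structural models one wishes to mix.

```python
import math

def mixture_loglik(choice_liks_a, choice_liks_b, pi):
    """Log-likelihood of a two-component finite mixture.

    `choice_liks_a[i]` and `choice_liks_b[i]` are the likelihoods of
    observation i under latent processes A and B (for example, a
    forward-looking EUT model versus a myopic-horizon model), and
    `pi` is the mixing probability of process A. Grouping terms by
    subject before summing would mirror the clustering correction
    discussed in the text.
    """
    return sum(math.log(pi * la + (1 - pi) * lb)
               for la, lb in zip(choice_liks_a, choice_liks_b))
```

In estimation, pi would be a free parameter searched over alongside the structural parameters of each component, letting the data determine what fraction of the sample each process characterizes.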
NOTES

1. Behavior on Who Wants To Be A Millionaire has been carefully evaluated by Hartley, Lanot, and Walker (2005), but this game involves a large number of options and alternatives that necessitate some strong assumptions before one can pin down risk attitudes rigorously. We focus on games in which risk attitudes are relatively easier to identify.
2. These experiments are from unpublished research by the authors.
3. In the earliest versions of the show this option only applied to the first card in the first row. In later versions it applied to the first card in each row. Finally, in the last major version it applied to any card in any row, but only one card per row could be switched.
4. Two further American versions were broadcast. One was a syndicated version in the 1986/1987 season, with Bill Rafferty as host. Another was a brief syndicated version in 2001. A British version, called Play Your Cards Right, aired in the 1980s and again in the 1990s. A German version called Bube Dame Hörig, and a Swedish version called Lagt Kort Ligger, have also been broadcast. Card Sharks re-runs remain relatively popular on the American Game Show Network, a cable station.
5. Available at http://data.bls.gov/cgi-bin/cpicalc.pl
6. Let the expected utility of the bet b be p_win U(b) + p_lose U(-b). The first-order condition for a maximum over b is then p_win U'(b) - p_lose U'(-b) = 0. Since U'(b) = \alpha exp(-\alpha b) and U'(-b) = \alpha exp(\alpha b), substitution and simple manipulation yield the formula.
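The "simple manipulation" alluded to in note 6 can be spelled out; this assumes the CARA form U(y) = -exp(-\alpha y), which is consistent with the derivatives quoted in the note.

```latex
% Optimal bet under CARA utility U(y) = -\exp(-\alpha y).
% The first-order condition p_{win} U'(b) - p_{lose} U'(-b) = 0 becomes
p_{win}\,\alpha e^{-\alpha b} = p_{lose}\,\alpha e^{\alpha b}
\;\Longrightarrow\;
e^{2\alpha b} = \frac{p_{win}}{p_{lose}}
\;\Longrightarrow\;
b^{*} = \frac{1}{2\alpha}\,\ln\!\left(\frac{p_{win}}{p_{lose}}\right).
```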
7. In addition, a variable given by stake²/2,000 is included by itself to account for possible nonlinearities.
8. Gertner (1993, p. 512): "I treat each bet as a single observation, ignoring any contestant-specific effects."
9. He rejects this hypothesis, for reasons not important here.
10. For example, in a game aired on 9/16/2004, the category was "Speaking in Tongues." The $800 text was "A 1996 Oakland School Board decision made many aware of this term for African-American English." Uber-champion Ken Jennings correctly responded, "What be Ebonics?"
11. Nalebuff (1990, p. 182) proposed the idea of the analysis, and the use of empirical responses to avoid formal analysis of the strategic aspects of the game.
12. One formal difference is that the first-order condition underlying that formula assumes an interior solution, and the decision-maker in runaway games has to ensure that he does not bet so much that he could fall below the highest possible score of his rival. Since this constraint did not bind in the 110 data points available, it can be glossed over.
13. The Lingo Board in the U.S. version is larger, and there are more balls in the urn, with implications for the probabilities needed to infer risk attitudes.
14. Their Eq. (12) shows the formula for the general case, and Eqs. (5) and (8) for the special final-round cases assuming CRRA or CARA. There is no statement that this is actually evaluated within the maximum-likelihood evaluator, but pni is not listed as a parameter to be estimated separately from the utility function parameter, so this is presumably what was done.
15. The point estimates for the CRRA function (their Table 6, p. 837) are generally around ƒ1,800 and ƒ1,500, with standard errors of roughly ƒ200 on each. Similar results obtain for the CARA function (their Table 7, p. 839). So these differences are not obviously significant at standard critical levels.
16. A handful of special shows, such as season finales and season openers, have higher stakes, up to $6 million. Our later statistical analysis includes these data, and adjusts the stakes accordingly.
17. Or make some a priori judgments about the bounded rationality of contestants. For example, one could assume that contestants only look forward one or two rounds, or that they completely ignore bank offers.
18. Other top prizes were increased as well. For example, in the final show of the first season, the top five prizes were changed from $200k, $300k, $400k, $500k, and $1m to $300k, $400k, $500k, $2.5m, and $5m, respectively.
19. The instructions are available in Appendix A of the working paper version, available online at http://www.bus.ucf.edu/wp/
20. The screen shots provided in the instructions and computer interface were much larger, and easier to read. Baltussen, Post, and van den Assem (2006) also conducted laboratory experiments patterned on DOND. They used instructions which were literally taken from the instructions given to participants in the Dutch DOND game show, with some introductory text from the experimenters explaining the exchange rate between the experimental game show earnings and take-home payoffs. Their approach has the advantage of using the wording of instructions used in the field. Our objective was to implement a laboratory experiment based on the DOND task, and clearly referencing it as a natural counterpart to the lab
STEFFEN ANDERSEN ET AL.
experiment. But we wanted to use instructions which we had complete control over. We wanted subjects to know exactly what bank offer function was going to be used. In our view the two types of DOND laboratory experiments complement each other, in the same sense in which lab experiments, field experiments, and natural experiments are complementary (see Harrison & List, 2004). 21. Virtually all subjects indicated that they had seen the U.S. version of the game show, which was a major ratings hit on network television in five episodes screened daily at prime time just prior to Christmas in 2005. Our experiments were conducted about a month after the return of the show in the U.S., following the 2006 Olympic Games. 22. The literature has already generated a lengthy lead article in the Wall Street Journal (January 12, 2006, p. A1) and National Public Radio interviews in the U.S. with researchers Thaler and Post on the programs Day to Day (http://www.npr.org/ templates/story/story.php?storyId=5243893) and All Things Considered (http:// www.npr.org/templates/story/story.php?storyId=5244516) on March 3, 2006. 23. Appendix B is available in the working paper version, available online at http://www.bus.ucf.edu/wp/ 24. Abdellaoui, Barrios, and Wakker (2007, p. 363) offer a one-parameter version of the Expo-Power function which exhibits non-constant RRA for empirically plausible parameter values. It does impose some restrictions on the variations in RRA compared to the two-parameter EP function, but is valuable as a parsimonious way to estimate non-CRRA specifications, and could be used for ‘‘bounds analyses’’ such as these. 25. If bank offers were a deterministic and known function of the expected value of unopened prizes, we would not need anything like 100,000 simulations for later rounds. For the last few rounds of a full game, in which the bank offer is relatively predictable, the use of this many simulations is a numerically costless redundancy. 26. 
There is no need to know risk attitudes, or other preferences, when the distributions of the virtual lotteries are generated by simulation. But there is definitely a need to know these preferences when the virtual lotteries are evaluated. Keeping these computational steps separate is essential for computational efficiency, and is the same procedurally as pre-generating “smart” Halton sequences of uniform deviates for later, repeated use within a maximum-simulated likelihood evaluator (e.g., Train, 2003, p. 224ff.). 27. It is possible to extend the analysis by allowing the core parameter r to be a function of observable characteristics. Or one could view the CRRA coefficient as a random coefficient reflecting a subject-specific random effect u, so that one would estimate r̂ = r̂_0 + u instead. This is what De Roos and Sarafidis (2006) do for their core parameters, implicitly assuming that the mean of u is zero and estimating the standard deviation of u. Our approach is just to estimate r̂_0. 28. Harless and Camerer (1994), Hey and Orme (1994), and Loomes and Sugden (1995) provided the first wave of empirical studies including some formal stochastic specification in the version of EUT tested. There are several species of “errors” in use, reviewed by Hey (1995, 2002), Loomes and Sugden (1995), Ballinger and Wilcox (1997), and Loomes, Moffatt, and Sugden (2002). Some place the error at the final choice between one lottery or the other after the subject has decided deterministically which one has the higher expected utility; some place the error earlier, on the comparison of preferences leading to the choice; and some place the error even earlier, on the determination of the expected utility of each lottery.
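Notes 25 and 26 describe generating the distribution of each virtual lottery by simulation once, without reference to preferences, and only later evaluating those distributions under candidate risk attitudes. A minimal Python sketch of that two-step separation, using a hypothetical prize list and a stylized bank-offer rule (the actual DOND prizes and offer function differ):

```python
import random

# Hypothetical prize list and offer rule, for illustration only.
PRIZES = [0.01, 1, 10, 100, 1000, 10000, 75000, 500000]

def bank_offer(remaining, fraction=0.6):
    # Stylized rule: a fixed fraction of the expected value of unopened prizes.
    return fraction * sum(remaining) / len(remaining)

def virtual_lottery(prizes, n_open, n_sims=10000, seed=123):
    """Step 1: simulate the distribution of next-round bank offers.

    No preference information is needed here; we only open n_open random
    cases and record the resulting offer."""
    rng = random.Random(seed)
    return [bank_offer(rng.sample(prizes, len(prizes) - n_open))
            for _ in range(n_sims)]

def certainty_equivalent(offers, r):
    """Step 2: evaluate the pre-simulated virtual lottery under CRRA
    u(x) = x**(1 - r) / (1 - r); preferences enter only at this stage."""
    if r == 1.0:
        raise ValueError("r = 1 requires the logarithmic form")
    eu = sum(o ** (1 - r) for o in offers) / len(offers) / (1 - r)
    return (eu * (1 - r)) ** (1 / (1 - r))

offers = virtual_lottery(PRIZES, n_open=3)
ce_neutral = certainty_equivalent(offers, r=0.0)  # equals the mean offer
ce_averse = certainty_equivalent(offers, r=0.5)   # below the mean for a risk-averse agent
```

Because Step 1 is preference-free, the same simulated offer distributions can be reused for every trial value of r inside a maximum-likelihood loop, which is the efficiency point made in note 26.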
Risk Aversion in Game Shows
29. De Roos and Sarafidis (2006) assume a random effects term v for each individual and add it to the latent index defining the probability of choosing deal. This is the same thing as changing our specification (4) to G(∇EU) = F(∇EU) + v, and adding the standard deviation of v as a parameter to be estimated (the mean of v is assumed to be 0). 30. Clustering commonly arises in national field surveys from the fact that physically proximate households are often sampled to save time and money, but it can also arise from more homely sampling procedures. For example, Williams (2000, p. 645) notes that it could arise from dental studies that “collect data on each tooth surface for each of several teeth from a set of patients” or “repeated measurements or recurrent events observed on the same person.” The procedures for allowing for clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the “generalized estimating equations” approach to panel estimation in epidemiology (see Liang & Zeger, 1986), and generalize the “robust standard errors” approach popular in econometrics (see Rogers, 1993). Wooldridge (2003) reviews some issues in the use of clustering for panel effects, noting that significant inferential problems may arise with small numbers of panels. 31. In the DOND literature, De Roos and Sarafidis (2006) demonstrate that alternative ways of correcting for unobserved individual heterogeneity (random effects or random coefficients) generally provide similar estimates, but that they are quite different from estimates that ignore that heterogeneity. Botti, Conte, DiCagno, and D’Ippoliti (2006) also consider unobserved individual heterogeneity, and show that it is statistically significant in their models (which ignore dynamic features of the game). 32. Gollier (2001, p. 25) refers to this as a Harmonic Absolute Risk Aversion, rather than the Hyperbolic Absolute Risk Aversion of Merton (1971, p. 389). 33. This estimated measure might be interpreted as wealth, or as some function of wealth in the spirit of Cox and Sadiraj (2006).
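The clustering correction described in note 30 can be illustrated in the simplest estimator, the sample mean: the cluster-robust variance sums residuals within each cluster before squaring, so any correlation pattern inside a cluster is permitted. A sketch with simulated, hypothetical panel data (all parameters are illustrative):

```python
import random

def clustered_se_of_mean(values, clusters):
    """Cluster-robust standard error for a sample mean: residuals are summed
    within each cluster before squaring, so arbitrary correlation and
    heteroskedasticity within clusters are permitted."""
    n = len(values)
    mean = sum(values) / n
    cluster_sums = {}
    for v, g in zip(values, clusters):
        cluster_sums[g] = cluster_sums.get(g, 0.0) + (v - mean)
    return (sum(s ** 2 for s in cluster_sums.values()) / n ** 2) ** 0.5

# Hypothetical panel: 30 subjects x 10 choices each, with a subject-level
# shock, so residuals are positively correlated within subject.
rng = random.Random(7)
values, clusters = [], []
for subject in range(30):
    shock = rng.gauss(0, 1)
    for _ in range(10):
        values.append(shock + rng.gauss(0, 0.3))
        clusters.append(subject)

grand_mean = sum(values) / len(values)
se_iid = (sum((v - grand_mean) ** 2 for v in values)
          / (len(values) - 1) / len(values)) ** 0.5
se_clustered = clustered_se_of_mean(values, clusters)
# With positive within-cluster correlation, se_clustered exceeds se_iid,
# which is why ignoring clustering overstates precision.
```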
ACKNOWLEDGMENTS

Harrison and Rutström thank the U.S. National Science Foundation for research support under grants NSF/IIS 9817518, NSF/HSD 0527675, and NSF/SES 0616746. We are grateful to Andrew Theophilopoulos for artwork.
REFERENCES

Abdellaoui, M., Barrios, C., & Wakker, P. P. (2007). Reconciling introspective utility with revealed preference: Experimental arguments based on prospect theory. Journal of Econometrics, 138, 356–378.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Dynamic choice behavior in a natural experiment. Working Paper 06-10, Department of Economics, College of Business Administration, University of Central Florida.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006b). Dual criteria decisions. Working Paper 06-11, Department of Economics, College of Business Administration, University of Central Florida.
Ballinger, T. P., & Wilcox, N. T. (1997). Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105.
Baltussen, G., Post, T., & van den Assem, M. (2006). Stakes, prior outcomes and distress in risky choice: An experimental study based on Deal or No Deal. Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University.
Beetsma, R. M. W. J., & Schotman, P. C. (2001). Measuring risk attitudes in a natural experiment: Data from the television game show Lingo. Economic Journal, 111, 821–848.
Blavatskyy, P., & Pogrebna, G. (2006). Testing the predictions of decision theories in a natural experiment when half a million is at stake. Working Paper 291, Institute for Empirical Research in Economics, University of Zurich.
Bombardini, M., & Trebbi, F. (2005). Risk aversion and expected utility theory: A field experiment with large and small stakes. Working Paper 05-20, Department of Economics, University of British Columbia.
Botti, F., Conte, A., DiCagno, D., & D’Ippoliti, C. (2006). Risk attitude in real decision problems. Unpublished manuscript, LUISS Guido Carli, Rome.
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton: Princeton University Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56(1), 45–60.
De Roos, N., & Sarafidis, Y. (2006). Decision making under risk in Deal or No Deal. Working Paper, School of Economics and Political Science, University of Sydney.
Gertner, R. (1993). Game shows and economic behavior: Risk-taking on Card Sharks. Quarterly Journal of Economics, 108(2), 507–521.
Gollier, C. (2001). The economics of risk and time. Cambridge, MA: MIT Press.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62(6), 1251–1289.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109(2), 341–368.
Harrison, G. W., Lau, M. I., Rutström, E. E., & Sullivan, M. B. (2005). Eliciting risk and time preferences using field experiments: Some methodological issues. In: J. Carpenter, G. W. Harrison & J. A. List (Eds), Field experiments in economics (Vol. 10). Greenwich, CT: JAI Press, Research in Experimental Economics.
Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42(4), 1013–1059.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper 05-18, Department of Economics, College of Business Administration, University of Central Florida; Experimental Economics, forthcoming.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Hartley, R., Lanot, G., & Walker, I. (2005). Who really wants to be a millionaire? Estimates of risk aversion from gameshow data. Working Paper, Department of Economics, University of Warwick.
Healy, P., & Noussair, C. (2004). Bidding behavior in the Price Is Right game: An experimental study. Journal of Economic Behavior and Organization, 54, 231–247.
Hey, J. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2002). Experimental economics and the theory of decision making under uncertainty. Geneva Papers on Risk and Insurance Theory, 27(1), 5–21.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? Journal of Economic Perspectives, 21(2), 153–174.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24(2), 103–130.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory, 3, 373–413.
Metrick, A. (1995). A natural experiment in ‘Jeopardy!’. American Economic Review, 85(1), 240–253.
Mulino, D., Scheelings, R., Brooks, R., & Faff, R. (2006). An empirical investigation of risk aversion and framing effects in the Australian version of Deal or No Deal. Working Paper, Department of Economics, Monash University.
Nalebuff, B. (1990). Puzzles: Slot machines, zomepirac, squash, and more. Journal of Economic Perspectives, 4(1), 179–187.
Post, T., van den Assem, M., Baltussen, G., & Thaler, R. (2006). Deal or no deal? Decision making under risk in a large-payoff game show. Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University; American Economic Review, forthcoming.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3(4), 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Norwell, MA: Kluwer Academic.
Rogers, W. H. (1993). Regression standard errors in clustered samples. Stata Technical Bulletin, 13, 19–23.
Rothwell, G., & Rust, J. (1997). On the optimal lifetime of nuclear power plants. Journal of Business and Economic Statistics, 15(2), 195–208.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55, 999–1033.
Rust, J. (1994). Structural estimation of Markov decision processes. In: D. McFadden & R. Engle (Eds), Handbook of econometrics (Vol. 4). Amsterdam, NL: North-Holland.
Rust, J. (1997). Using randomization to break the curse of dimensionality. Econometrica, 65(3), 487–516.
Rust, J., & Rothwell, G. (1995). Optimal response to a shift in regulatory regime: The case of the US nuclear power industry. Journal of Applied Econometrics, 10, S75–S118.
Tenorio, R., & Cason, T. (2002). To spin or not to spin? Natural and laboratory experiments from The Price Is Right. Economic Journal, 112, 170–195.
Train, K. E. (2003). Discrete choice methods with simulation. New York, NY: Cambridge University Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333.
Williams, R. L. (2000). A note on robust variance estimation for cluster-correlated data. Biometrics, 56, 645–646.
Wooldridge, J. (2003). Cluster-sample methods in applied econometrics. American Economic Review (Papers and Proceedings), 93, 133–138.
FURTHER REFLECTIONS ON THE REFLECTION EFFECT

Susan K. Laury and Charles A. Holt

ABSTRACT

This paper reports a new experimental test of the notion that behavior switches from risk averse to risk seeking when gains are “reflected” into the loss domain. We conduct a sequence of experiments that allows us to directly compare choices under reflected gains and losses where real and hypothetical payoffs range from several dollars to over $100. Lotteries with positive payoffs are transformed into lotteries over losses by multiplying all payoffs by –1, that is, by reflecting payoffs around zero. When we use hypothetical payments, more than half of the subjects who are risk averse for gains turn out to be risk seeking for losses. This reflection effect is diminished considerably with cash payoffs, where the modal choice pattern is to exhibit risk aversion for both gains and losses. However, we do observe a significant difference in risk attitudes between losses (where most subjects are approximately risk neutral) and gains (where most subjects are risk averse). Reflection rates are further reduced when payoffs are scaled up by a factor of 15 (for both real and hypothetical payoffs).
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 405–440
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00009-4
SUSAN K. LAURY AND CHARLES A. HOLT
1. INTRODUCTION

One of the most widely cited articles in economics is Kahneman and Tversky’s (1979) paper on prospect theory, which is designed to explain a range of lottery choice anomalies. This theory is motivated by the authors’ laboratory surveys and by subsequent field observations (e.g., Camerer, 2001). A key observation is that decision making begins by identifying a reference point, often the current wealth position, from which people tend to be risk averse for gains and risk loving for losses. A striking prediction of the theory is the “reflection effect”: replacing all positive payoffs by their negatives (reflection around zero) reverses the choice pattern. For example, a choice between a sure payoff of 3,000 and an 80 percent chance of getting 4,000 would be replaced by a choice between a certain loss of 3,000 and an 80 percent chance of losing 4,000. The typical reflection effect would imply a risk-averse preference for the sure gain of 3,000, but a reversed preference for the risky lottery in the loss domain. Rates of reflection reported by Kahneman and Tversky (1979) were quite high; for example, 80 percent of subjects chose the sure gain of 3,000, but only 8 percent chose the sure outcome when all payoffs were transformed into losses. The intuition is that “… certainty increases the aversiveness of losses as well as the desirability of gains” (Kahneman & Tversky, 1979, p. 269). The mathematical value functions used in prospect theory (concave for gains, convex for losses) can explain such a reflection effect, even when the safer prospect is not certain. This paper reports new experiments involving choice patterns with reflected gains and losses, using lotteries with real payoffs that range from several dollars to over $100. In this paper, we will use the terms “risk aversion” and “risk seeking” to refer to concavity and convexity of the utility function.
It is worth noting that the nonlinear probability weighting present in many non-expected utility theories can also generate behavior that exhibits risk aversion or risk seeking (Tversky & Wakker, 1995).1 For example, consider an S-shaped weighting function that overweights small probabilities and underweights large probabilities, with probabilities of 0 and 1 getting weights of 0 and 1, respectively. In this setting, a person who prefers a 0.05 chance of 100 (otherwise 0) to a sure gain of 10 would exhibit risk seeking, which could be explained by overweighting of the low-probability gain. Similarly, a person who prefers a sure payoff of 85 to a 0.95 chance of 100 (otherwise 0) would exhibit risk aversion, which could be explained by underweighting of the high-probability gain. A similar analysis can explain risk seeking for
high-probability losses and risk aversion for low-probability losses. Evidence supporting this “fourfold pattern” is provided by Tversky and Kahneman (1992). Notice that each of the above choices involved a comparison of a certain payoff with an uncertain one; because probabilities of 0 and 1 are not over- or underweighted (as is typically assumed), probability weighting can have a major effect in such comparisons, generating a “certainty effect.” The experiment design used in this paper involves paired lottery choices in which the probabilities of the high and low payoffs are held constant for a given pair of lotteries, but the payoffs for one of the lotteries are closer together, that is, it involves less risk. Another key element of risk preferences in prospect theory is loss aversion, which is typically modeled as a kink in the value function at the reference point, for example, 0. The intuition is that loss aversion causes the value function to decline more rapidly as the payoff is reduced below zero, and the kink at zero produces a concavity with respect to a pair of positive and negative payoffs. The experiments reported in this paper only involved pairs of payoffs that were either both positive or both negative, in order to avoid the confounding effect of loss aversion. Despite the widespread references to prospect theory, the decision patterns reported in Kahneman and Tversky (1979) and Tversky and Kahneman (1992) are based on hypothetical payoffs, set to be about equal to median monthly income in Israeli pounds at the time. They acknowledged that using real payoffs might change some behavioral patterns. However, their interest was in economic phenomena with larger stakes than those typically used in the lab; therefore they believed that using high hypothetical payoffs was the preferred method of eliciting choices.
In doing so, they relied on the “assumption that people often know how they would behave in actual situations of choice, and on the further assumption that subjects have no special reason to disguise their true preferences” (Kahneman & Tversky, 1979). In Tversky and Kahneman (1992) they state that they found little difference in behavior between subjects who faced real and hypothetical payoffs.2 In contrast, effects of switching from hypothetical to monetary payoffs in choices between gambles were documented early on (Slovic, 1969). While the use of hypothetical payoffs may not affect behavior much when low amounts of money are involved, this may not be the case with the very high payoffs of the type used by Kahneman and Tversky to document the reflection effect. For example, Holt and Laury (2002) report that switching from hypothetical to real money payoffs has no significant effect in a series of lottery choices when the scale of payoffs is in the range of several dollars per decision problem, as is typical in economics experiments. In addition,
there is no significant effect on choices when hypothetical payoffs are scaled up by factors of 20, 50, and 90, yielding (hypothetical) payoffs of several hundred dollars in the highest payoff conditions. This might lead researchers to conclude that increasing the scale of payoffs, or using hypothetical incentives, does not affect behavioral patterns. However, risk aversion increases sharply when real payoffs in these lotteries are increased in an identical manner.3 A similar increase in risk aversion as real payments are scaled up was reported by Binswanger (1980a, 1980b). Not all studies have shown evidence of ‘‘hypothetical bias,’’ but Harrison (2006) makes a strong case for the presence of such a bias in lottery choice experiments. In particular, he reexamines the widely cited Battalio, Kagel, and Jiranyakul (1990) study that found no qualitative effects of using hypothetical rather than real payoffs. Harrison reevaluates the data using a within-subject analysis, instead of a between-subjects analysis, and finds a significant difference between risk attitudes in real payoff and hypothetical payoff settings.4 These results are not surprising to the extent that risk aversion may be influenced by emotional considerations that psychologists call ‘‘affect’’ (Slovic, 2001), since emotional responses are likely to be stronger when gains and losses must be faced in reality.5 In view of economists’ skepticism about hypothetical incentives and of psychologists’ notions of affect, we decided to reevaluate the reflection effect using hypothetical gains and losses and real monetary gains and losses, and also to test the effect of payoff scale on choices under gains and losses of differing magnitudes. Risk seeking over losses has been observed in experiments with financial incentives that implement insurance markets. 
For example, Myagkov and Plott (1997) use market price and quantity data to infer that a majority of subjects are risk seeking in the loss domain in early periods of trading, but this tendency diminishes with experience. In contrast, Bosch-Domènech and Silvestre (1999) report a very strong tendency for subjects to purchase actuarially fair insurance against relatively large losses. This observation may indicate risk aversion in the loss domain; alternatively, it may be attributed to overweighting the low (0.2) probability of a loss (as suggested by the probability weighting function typically assumed in prospect theory).6 Laury and McInnes (2003) also find that almost all subjects choose to purchase fair insurance against low-probability losses. The percentage insuring decreases as the probability of incurring a loss increases, but about two-thirds purchase insurance when the probability of a loss is close to one-half and systematic probability misperceptions cannot be a factor. Laury, McInnes, and Swarthout (2007) report that over
90 percent of subjects purchase insurance against a 1-percent chance of losing their full $60 earned endowment when the insurance is fair, and 85 percent purchase it when the insurance price is four times the actuarially fair price. None of these studies was primarily focused on the reflection effect, and therefore none had parallel gain/loss treatments. Taken together, these market experiments provide no strong evidence either for or against such an effect, although there is some evidence in each direction. Some lottery choice experiments have directly tested the reflection effect. Hershey and Schoemaker (1980) find evidence of reflection using hypothetical choices; in their study the highest rates of reflection were observed when probabilities were extreme. Cohen, Jaffray, and Said (1987) report only mixed support for a reflection effect, with about 40 percent of the subjects exhibiting risk aversion for gains and risk preference for losses. Real payoffs were used, but the probability that any decision would be relevant was less than one in 5,000.7 Both Battalio et al. (1990) and Camerer (1989) report lottery choice experiments in which reflection patterns are present with real payoffs. These two studies involve choices where one gamble is a mean-preserving spread of the other, which is typically a certain amount of money. However, the amount of reflection (about 50 percent) is less than that reported by Kahneman and Tversky for most of their gambles. Harbaugh, Krause, and Vesterlund (2002) find that support for reflection depends on how the choice problem is presented. Specifically, they report that risk attitudes are consistent with prospect theory when subjects are asked to price gambles, but not when they choose between the gamble and its expected value. Market and insurance purchase experiments are useful in that they provide a rich, economically relevant context.
Our approach is complementary; we use a simple tool to measure risk preferences directly, based on a series of lottery choices with significant money payoffs in parallel gain and loss treatments. This menu of choices allows us to obtain a well-calibrated measure of risk attitudes, which is not possible given the single pair-wise choices used in many of the earlier studies. The goal of the paper is to document the effect of reflecting payoffs (multiplying by –1) on lottery choice data for different payoff scales: hypothetical low payoffs, hypothetical high payoffs, low money payoffs, and high money payoffs. Our design, procedures, results (for low then high payoff conditions), maximum-likelihood estimation, and conclusions are presented in Sections 2–7, respectively.
2. LOTTERY CHOICE DESIGN AND THEORETICAL PREDICTIONS

The lottery choice task for the loss domain is shown in Table 1, as a menu of 10 decisions between lotteries that we will denote by S and R. These will be referred to as Decisions 1–10 (from top to bottom). In Decision 1 at the top of the table, the choice is between a certain loss of $3.20 for S and a certain loss of 20 cents for R, so subjects should start out choosing R at the top of the table, and then switch to S as the probability of the worse outcome (a loss of $4.00 for S or of $7.70 for R) gets high enough. The optimal choice for a risk-neutral expected-utility maximizer is to choose R for the first five decisions, and then switch to S, as indicated by the sign change in the expected payoff differences shown in the right column of the table. In fact, the payoff numbers were selected so that the risk-neutral choice pattern (five risky followed by five safe choices) was optimal for constant absolute risk aversion in the range (–0.05, 0.05), which is symmetric around zero.
Table 1. Lottery Choices in the Loss Domain.

Decision   Lottery S                          Lottery R                          Expected Payoff of S –
                                                                                Expected Payoff of R
 1         0/10 of –$4.00, 10/10 of –$3.20   0/10 of –$7.70, 10/10 of –$0.20    –$3.00
 2         1/10 of –$4.00,  9/10 of –$3.20   1/10 of –$7.70,  9/10 of –$0.20    –$2.33
 3         2/10 of –$4.00,  8/10 of –$3.20   2/10 of –$7.70,  8/10 of –$0.20    –$1.66
 4         3/10 of –$4.00,  7/10 of –$3.20   3/10 of –$7.70,  7/10 of –$0.20    –$0.99
 5         4/10 of –$4.00,  6/10 of –$3.20   4/10 of –$7.70,  6/10 of –$0.20    –$0.32
 6         5/10 of –$4.00,  5/10 of –$3.20   5/10 of –$7.70,  5/10 of –$0.20     $0.35
 7         6/10 of –$4.00,  4/10 of –$3.20   6/10 of –$7.70,  4/10 of –$0.20     $1.02
 8         7/10 of –$4.00,  3/10 of –$3.20   7/10 of –$7.70,  3/10 of –$0.20     $1.69
 9         8/10 of –$4.00,  2/10 of –$3.20   8/10 of –$7.70,  2/10 of –$0.20     $2.36
10         9/10 of –$4.00,  1/10 of –$3.20   9/10 of –$7.70,  1/10 of –$0.20     $3.03
Since the two payoffs for the S lottery are of roughly the same magnitude, this lottery is relatively “safe” (i.e., the variance of outcomes is low relative to the R lottery). Therefore, increases in risk aversion will tend to cause one to switch to the S side before Decision 6. For example, with absolute risk aversion of r = 0.1 in the utility function u(x) = 1 – e^(–rx), it is straightforward to show that the expected-utility maximizing choice is R in the first four decisions, and S in subsequent decisions. Conversely, risk-loving preferences will cause a person to wait longer before switching to S, for example, to choose R in the six decisions at the top of the table for an absolute risk aversion coefficient of –0.1.8 The gain treatment was obtained from Table 1 by replacing each loss with the corresponding gain, so that Decision 1 involves a choice between certain earnings of $3.20 for S and a certain gain of $0.20 for R. This reverses the signs of the expected payoff differences shown in the final column of Table 1, so a risk-neutral person will choose S for the first five decisions before switching to Lottery R. A risk-averse person will wait longer to switch, therefore making more than five safe choices. With constant relative risk aversion (CRRA), u(x) = x^(1–r)/(1 – r) for x > 0, the expected-utility maximizing decision is to choose S in the top four rows of the transformed table with gains when r = –0.3, and to choose S in the top six rows when r = 0.3. To summarize, a risk-neutral expected-utility maximizer would make five safe choices in each treatment, risk aversion (in the sense of concave utility) implies more than five safe choices in either treatment, and risk seeking (in the sense of convex utility) implies less than five safe choices in the loss treatment.
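The switch points described above can be checked numerically. A minimal sketch, using (1 – e^(–rx))/r as the CARA normalization — an order-preserving rescaling of the text’s 1 – e^(–rx) for r > 0 that remains increasing when r < 0 — and the text’s CRRA form for the gain menu:

```python
import math

# Payoff pairs from the design; the first payoff in each pair receives
# probability (d - 1)/10 in decision d, as in Table 1 and its gain-domain mirror.
S_LOSS, R_LOSS = (-4.00, -3.20), (-7.70, -0.20)
S_GAIN, R_GAIN = (4.00, 3.20), (7.70, 0.20)

def u_cara(r):
    """CARA utility normalized as (1 - exp(-r*x)) / r, increasing for r of
    either sign; for r > 0 it is a positive rescaling of u(x) = 1 - exp(-r*x),
    so it implies the same choices."""
    return (lambda x: x) if r == 0 else (lambda x: (1 - math.exp(-r * x)) / r)

def u_crra(r):
    """CRRA utility u(x) = x**(1 - r) / (1 - r), defined here for x > 0."""
    return (lambda x: math.log(x)) if r == 1 else (lambda x: x ** (1 - r) / (1 - r))

def n_safe(s_pair, r_pair, u):
    """Number of the 10 decisions at which Lottery S is weakly preferred."""
    count = 0
    for d in range(1, 11):
        p = (d - 1) / 10
        eu_s = p * u(s_pair[0]) + (1 - p) * u(s_pair[1])
        eu_r = p * u(r_pair[0]) + (1 - p) * u(r_pair[1])
        count += eu_s >= eu_r
    return count

# Loss menu: 5 safe choices when risk neutral, 6 when r = 0.1, 4 when r = -0.1.
# Gain menu: 5 safe choices when risk neutral, 6 when r = 0.3, 4 when r = -0.3.
```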
We will interpret seeing more than five safe choices in the gain treatment and less than five safe choices in the loss treatment as behavioral evidence of a reflection effect.9 Note that this type of reflection is an empirical pattern that fits nicely with the notion of a reference point in prospect theory from which gains and losses are measured. The predictions of a formal version of prospect theory will be considered next.
2.1. Prospect Theory

We begin by reviewing the essential components of prospect theory. A prospect consists of a set of money prizes and associated probabilities. Consider a simple prospect that offers an amount x with probability p and y with probability 1 – p, where x > y > 0 are gains. The valuation
functional is PT(p: x, y) = w+(p)u(x) + (1 – w+(p))u(y), where u designates the utility of money, and w+ the probability weighting function for gains. Next consider the case where x < y < 0 are losses, which yields: PT(p: x, y) = w–(p)u(x) + (1 – w–(p))u(y). Here, u again designates the utility of money, now for losses, and w– is the probability weighting function for losses. The standard approach for losses is to first transform the probability of the lowest outcome, and not the probability of the highest outcome as is done for gains. Tversky and Kahneman (1992) estimated a value function parameterized by a utility function x^α where x > 0, and –λ(–x)^β when x < 0, where λ is a loss aversion parameter. The estimate of λ was 2.25, which creates a sharp “concave kink” in the value function at x = 0. The estimates of α and β were both 0.88, which correspond to concavity in the gain domain and convexity in the loss domain. They also concluded that w+ was not very different from w–. In what follows, we will assume that w+ and w– are the same and we will therefore denote them both by w. We will also assume that the value functions for gains and losses are symmetric in the sense that the utility of a loss is found by multiplying the utility of a gain of equal absolute value by –λ, for example, α = β in the power function parameterization. Although Tversky and Kahneman (1992) distinguish between these two parameters in their theoretical exposition, many others have adopted the simplifying assumption that α and β are identical. Further, Köbberling and Wakker (2005, p. 127) note that the assumption of CRRA and α = β allows for the identification of loss aversion without making other strong assumptions about utility. It easily follows that for x > y > 0, the prospect theory valuation functionals are “reflected” in the sense that PT(p: –x, –y) = –λPT(p: x, y), or equivalently

w(p)u(–x) + (1 – w(p))u(–y) = –λ[w(p)u(x) + (1 – w(p))u(y)]     (1)
The parameter λ is important for evaluations of mixed lotteries with both gains and losses, but such lotteries are not present in our experiment. The parameter λ plays no role in the ordering of lotteries with only losses (or only gains). Some studies conducted after Tversky and Kahneman (1992), including the data to be reported in this paper, suggest that reflection does not hold exactly, but as a benchmark it is useful to know what an exact-reflection "straw man" would imply for the choice menus that we use. Recall that our treatment transforms gains into losses of equal absolute value. The safe option is preferred in the gain domain if
w(p)u(4.00) + [1 − w(p)]u(3.20) > w(p)u(7.70) + [1 − w(p)]u(0.20), or equivalently, Option S is preferred if

w(p)/(1 − w(p)) < [u(3.20) − u(0.20)]/[u(7.70) − u(4.00)]    (2)
Similarly, in the loss domain it is straightforward to show that Option S is preferred if

w(p)/(1 − w(p)) > [u(−0.20) − u(−3.20)]/[u(−4.00) − u(−7.70)]    (3)
But it follows from (1) that the right side of (3) is the same as the right side of (2), since the λ expressions in the numerator and denominator cancel under the maintained assumptions. The reversal of the inequalities in (2) and (3) means that if Lottery S is preferred in the gain domain for any particular value of p, then Lottery R will be preferred in the loss domain for that probability. Thus, an exact reflection in the value function (1) results in an exact reflection in lottery choices. Such a reflection occurs when, for example, α = β = 0.88, as noted above. Although exact reflection (e.g., seven safe choices in the gain domain and seven risky choices in the loss domain) can be predicted under these strong parametric conditions, such behavior is not pervasive in our data. Following Tversky and Kahneman (1992), we will focus on the qualitative predictions: whether there is risk aversion in the gain domain, and if so, whether this aversion becomes risk preference in the loss domain. As noted above, the observation of more than five safe choices in either treatment is implied by risk aversion (concave utility), and the observation of fewer than five safe choices is implied by risk preference (convex utility).10
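To make the reflection identity concrete, the following sketch (our illustration, not code from the paper) implements the power value function with α = β = 0.88 and λ = 2.25, together with an inverse-S weighting function; the weighting parameter γ = 0.61 is an illustrative assumption, and it cancels in the identity in any case:

```python
# Illustrative prospect theory valuation; checks PT(p: -x, -y) = -lambda*PT(p: x, y).
ALPHA = 0.88   # curvature (alpha = beta assumed, as in the text)
LAM = 2.25     # loss aversion parameter lambda
GAMMA = 0.61   # assumed inverse-S weighting parameter (illustrative only)

def u(x):
    """Power value function: x^alpha for gains, -lambda*(-x)^alpha for losses."""
    return x ** ALPHA if x >= 0 else -LAM * (-x) ** ALPHA

def w(p):
    """Inverse-S probability weighting function."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

def pt(p, x, y):
    """PT value of a two-outcome prospect; w(p) weights the extreme outcome x
    (the highest outcome for gains, the lowest for losses)."""
    return w(p) * u(x) + (1 - w(p)) * u(y)

# Exact reflection: transforming gains into losses of equal absolute value
# multiplies the prospect's value by -lambda, for every probability p.
for p in (0.1, 0.4, 0.7):
    assert abs(pt(p, -7.70, -0.20) + LAM * pt(p, 7.70, 0.20)) < 1e-9
```

Because the same w appears on both sides, the identity holds for any weighting function, which is why the λ terms cancel in the ratio comparison of (2) and (3).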
3. PROCEDURES

All experiments were conducted at Georgia State University; participants responded to classroom and campus announcements about an opportunity to earn money in an economics research experiment. We recruited a total of 253 subjects in 25 groups, ranging in size from 4 to 16. No subject participated in more than one session. Subjects were separated by privacy dividers and were instructed not to communicate with each other after we began reading the instructions. Losses typically cannot be deducted from participants' out-of-pocket cash reserves, so it was necessary to provide an initial cash balance. For example, Myagkov and Plott (1997) began by
giving each participant a cash balance of $60. We chose to have subjects earn their initial balance; therefore all first participated in another decision-making task. We hoped that by doing so they would not view these earnings as windfall gains.11 Therefore, we appended the lottery choices for losses and gains to the end of research experiments being used for other projects.12 Instructions (contained in the Appendix) and the choice tasks were identical between the real and hypothetical sessions. At the beginning of the hypothetical payment sessions, subjects were given a handout (contained in the Appendix) that informed them that all earnings were hypothetical. The instructions read, in part, "The instructions … describe how your earnings depend on your decisions (and sometimes on the decisions of others). It is important that you understand that you will not actually be receiving any of this additional money (other than your $45 participation fee)." All subjects signed a statement indicating that they understood this. All sessions (real and hypothetical) began with a simple lottery choice to acquaint subjects with the procedures and the 10-sided die that was used to determine the random outcomes. The payoffs in this initial lottery choice task differed from those used later. After finishing these initial tasks, subjects knew their earnings up to that point. In the real payment sessions, these initial earnings averaged $43, and ranged from $21.68 to $92.08. As noted above, subjects in hypothetical sessions received a $45 participation fee. Even though the average cash amounts were about the same in the two treatments, the initial cash amounts differed from person to person in the real-payoff treatments, which could have an effect on variations in observed risk attitudes. The experiments reported here consisted of four choice tasks.
The first and third of these were the lottery choice menus shown in Table 1, with alternation in the order of the gain and loss treatments in each pair of sessions to ensure that approximately the same number of subjects encountered each order. Thus, potential order effects were controlled in the low-payoff treatments (top two rows of Table 2) by alternating the order of the gain and loss treatments. As explained below, the average numbers of safe choices observed for the two orders were essentially the same, so for the high-payoff treatments (real and hypothetical) shown in the bottom three rows of Table 2, all sessions were conducted with the loss treatment first. In order to minimize "carry-over effects," these lottery choice tasks were separated by an intentionally neutral decision, a symmetric matching pennies game with (real or hypothetical) payoffs of $3.00 for the "winner" or $2.00 for the "loser" in each cell. In the lottery choice parts, all 10 choices were presented as in Table 1, but with the lotteries labeled as Option A and Option B, and without the expected payoff calculations that might bias subjects toward risk-neutral decisions. Option A was always listed on the left side of the decision sheet. For about half of these subjects, Option A was the safe lottery; it was the risky lottery for the remaining subjects. Table 2 shows the number of subjects in each treatment and presentation order. Probabilities were presented in terms of the outcome of a throw of a 10-sided die, for example, "$3.20 if the throw is 1 or 2, …" The instructions also specified that payoffs would be determined by one decision selected ex post (again with the throw of a 10-sided die).13 We collected decisions for all four parts (the gain and loss menus and the two matching pennies games) before determining earnings for any of them. While this does not exactly hold (anticipated) wealth effects constant, it does control for emotional responses to good or bad outcomes in each part. Moreover, wealth effects do not matter in prospect theory, since the utility valuations are based on gains and losses from the current wealth position.

Table 2. Number of Subjects by Treatment and Order.

Payoff Treatment       Initial     Option A "Safe"               Option A "Risky"
                       Earnings    Gains First   Losses First    Gains First   Losses First
Low hypothetical       $45         19            19              23            20
Low real               $43         19            19              22            16
High hypothetical      $45         0             16              0             16
High hypothetical(a)   $132        0             16              0             16
High real              $140        0             16              0             16

(a) Decisions in this hypothetical payoff experiment followed another experiment that used very high real earnings.
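The ex post payment procedure can be sketched as follows (our reconstruction of the mechanics, not the authors' software; the task labels are hypothetical):

```python
# Sketch of the random payment procedure: one of the four decision tasks is
# selected at random ex post, and a throw of a 10-sided die resolves the
# chosen lottery, as in the instructions quoted in the text.
import random

def throw_d10(rng):
    """One throw of a 10-sided die, numbered 1-10."""
    return rng.randint(1, 10)

def resolve_lottery(rng, p_first, first, second):
    """E.g. "$3.20 if the throw is 1 or 2, otherwise $4.00" has p_first = 0.2."""
    return first if throw_d10(rng) <= round(10 * p_first) else second

rng = random.Random(7)  # seeded only so the sketch is reproducible
task = rng.choice(["gain menu", "loss menu", "pennies 1", "pennies 2"])
payoff = resolve_lottery(rng, p_first=0.2, first=3.20, second=4.00)
assert payoff in (3.20, 4.00)
```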
4. RESULTS FROM LOW-PAYOFF SESSIONS

In this section, we present an overview of our data patterns and a nonparametric analysis of treatment effects in our experiment. More formal statistical analysis is presented in Section 6, below. We first compare the overall pattern of choices between the lotteries over gains and the lotteries over losses. This allows us to look for a reflection effect (risk aversion over gains and risk seeking over losses) in the aggregate data from our experiment.
Recall that a risk-averse person would choose the safe lottery more than five times in each set of 10 paired choices, and that an approximately risk-neutral person would choose the safe lottery five times before switching to the lottery with a wider range of payoffs. Some people are risk neutral in this sense, particularly when payoffs involve losses or are hypothetical. Fig. 1 shows cumulative choice frequencies for the number of safe choices for hypothetical payoffs (top) and real payoffs (bottom). In each panel, the thin line shows the risk-neutral prediction, for which the cumulative probability of four or fewer safe choices is zero, and the cumulative probability goes to one at five safe choices. The actual cumulative distributions for the gain treatment are below those of the loss treatment, indicating the tendency to make more safe choices in the gain domain, regardless of whether payoffs are real or hypothetical. These distributions indicate that, in aggregate, people are risk averse in the gain domain and approximately risk neutral in the loss domain. The difference between choices in the gain and loss domains is significant, both for real and hypothetical payoffs. We use a matched-pairs Wilcoxon test (one-tailed) because each subject made choices under both gains and losses. We do not observe any significant effect from the order in which the loss and gain treatments were conducted. Table 3 shows the mean number of safe choices by treatment (gain or loss, real or hypothetical). The top row shows all data for low (1x) payoffs with both treatment orders combined (gains first and losses first). There was no clear effect of treatment order (gains first or losses first), as can be seen by comparing the top row (all data for both orders) with the second row (low 1x payoffs with losses presented to subjects first).14 Next, we turn our attention to the evidence for reflection at the individual level. The top panel of Fig.
2 summarizes the choice data for the low-payoff hypothetical choice sessions. We begin by looking at count data (an econometric analysis will follow in Section 6). We use the number of safe choices to categorize individuals as being risk averse, risk neutral, or risk seeking, both in the loss domain (left to right) and the gain domain (back to front). The ‘‘spike’’ at the back, right corner of the graph represents those who exhibit the predicted reflection effect: risk seeking for losses and risk aversion for gains. Fifty percent of the subjects are risk averse over gains (back row of the figure); of these, just over half are risk-loving for losses. Of those subjects who do reflect, 40 percent involve exact reflection, that is, the number of safe choices in the gain domain exactly matches the number of risky choices in the loss domain. The modal choice pattern under hypothetical payoffs is reflection, and in this sense we are able to replicate the predicted choice
[Fig. 1. Cumulative Choice Frequencies. Two panels (hypothetical payments, top; real payments, bottom) plot the cumulative frequency of the number of safe choices (0-10) for the gain treatment, the loss treatment, and the risk-neutral prediction.]
Table 3. Mean Number of Safe Choices by Treatment.

Treatment           Real Payoffs           Hypothetical Payoffs
                    Gains     Losses       Gains     Losses
1x, all data        5.91      5.21         5.53      4.98
1x, loss-gain       5.71      5.26         5.62      5.13
15x, loss-gain      6.31      5.22         5.69      5.13
15x, loss-gain(a)   -         -            4.91      5.31

Note: 1x, low-payoff treatment; 15x, high-payoff treatment; all data, both presentation orders combined (losses then gains and gains then losses); loss-gain, losses presented first, then gains.
(a) Decisions in this hypothetical payoff experiment followed an experiment that used very high real earnings.
pattern using our hypothetical lotteries, neither of which involves a certain prospect. However, when real cash payments are used, the results are quite different, as shown in the lower panel of Fig. 2. The modal outcome (shown in the back left corner) involves risk aversion for both gains and losses, even though these gains and losses are "low" (less than $8 in absolute value). Over gains, there is a little more risk aversion with (low) real payoffs: 60 percent of subjects exhibit risk aversion in the gain condition (back row of Fig. 2). Of these, only about one-fifth are risk seeking for losses (see the bar in the back, right corner). The rate of reflection in the bottom panel with real payoffs (13 percent) is half the rate of reflection observed under hypothetical payoffs (26 percent). Recall that the predicted choice pattern involves switching between the safe lottery and the risky lottery once, with the switch point determining the inferred risk attitude. In total, 44 of 157 subjects switched more than once in either the gain or loss treatment (or both).15 Since such multiple switching introduces some noise due to confusion or other considerations, it is instructive to look at choice patterns for those who switched only once in either treatment. These data produce a little more risk aversion in the gain domain, but the basic patterns shown in Fig. 2 remain unchanged. With real payoffs, for example, 67 percent are risk averse in the gain domain, but less than one-fifth of these subjects exhibit reflection. Using hypothetical payoffs, the modal decision is still reflection; half of those who are risk averse in the gain domain are risk seeking in the loss domain. Just as when the full dataset is used, we find about twice as much reflection with hypothetical payoffs as with real payoffs (26 percent compared with 12 percent, respectively).
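The counting exercise above can be made explicit with a small helper (a hypothetical illustration, not the authors' code), using the five-safe-choice benchmark introduced at the start of this section:

```python
# Classify a subject's risk attitude from the number of safe choices in a
# 10-row menu, and flag the reflection pattern discussed in the text.
def attitude(safe_choices):
    """More than five safe choices: averse; exactly five: neutral; fewer: seeking."""
    if safe_choices > 5:
        return "averse"
    if safe_choices == 5:
        return "neutral"
    return "seeking"

def reflects(safe_gains, safe_losses):
    """Predicted reflection: risk averse over gains, risk seeking over losses."""
    return attitude(safe_gains) == "averse" and attitude(safe_losses) == "seeking"

def exact_reflection(safe_gains, safe_losses):
    """Safe choices over gains equal risky choices over losses."""
    return safe_gains == 10 - safe_losses

# Example: seven safe choices over gains and three over losses reflect exactly.
assert reflects(7, 3) and exact_reflection(7, 3)
```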
[Fig. 2. Risk Aversion Categories for Low Losses and Low Gains. Two panels (hypothetical payoffs, top; real payoffs, bottom) plot the number of observations in each risk-attitude category (averse, neutral, loving) over the loss domain and the gain domain.]
5. RESULTS FROM HIGH-PAYOFF SESSIONS

Kahneman and Tversky's (1979, p. 265) initial tests of prospect theory used high hypothetical payoffs, and they questioned the generality of data derived from small-stakes lotteries. One might also suppose that large gains and losses have a higher emotional impact than low-payoff lotteries, so the
predicted effects of a psychologically motivated theory like prospect theory might be more apparent with very high payoffs. Given this, we decided to scale up the stakes to levels that had a clear effect on risk attitudes in Holt and Laury (2002). To do this, we ran high-payoff treatments (with gains and losses, real and hypothetical) in which the payoff numbers shown in Table 1 were multiplied by a factor of 15. This multiplicative scaling of all payoff amounts does not alter the risk-neutral crossover point at five safe choices. The result of this scaling was that the safe lottery had payoffs of $60 and $48 (positive or negative) and the risky lottery had payoffs of $115.50 and $3.00. The real-incentive sessions were quite expensive, since pre-lottery-choice earnings had to be built up to high levels in order to make real losses credible. The real-payoff sessions were therefore preceded by a high real-payoff public goods experiment to raise subjects' earnings, and the high hypothetical payoff sessions were preceded by an analogous experiment with high hypothetical payoffs. The initial earnings in the high real-payoff sessions averaged about $140 (and ranged from $112 to $190). We did not provide a higher initial payoff for the high hypothetical sessions, since losses were hypothetical.16 Because of the additional expense associated with these high-payoff sessions, we have about half the number of observations as for the low-payoff sessions. Given that we did not observe any systematic effect of the order in which gains and losses were presented to subjects, we chose to use only one treatment order in the high-payoff sessions. Therefore, in all sessions the lottery over losses was given first.
As before, the lotteries over losses and gains were separated by a matching pennies game (with payoffs scaled up by a factor of 15), and the results for choices under both treatments were not announced until all decisions had been made. There were 32 subjects who faced high real payoffs and 32 who faced high hypothetical payoffs, and in both cases exactly half of the observations were for the treatment with the risky lottery listed on the left, and half with the risky lottery listed on the right (see Table 2). In Table 3, rows 2 and 3 (the 1x and 15x loss-gain treatments) allow a comparison of the average number of safe choices, holding the treatment order (losses first) constant. There are no obvious effects of scaling up payoffs, except for an increase in risk aversion in the real gain domain. Fig. 3 shows the cumulative choice frequencies for high hypothetical (top) and high real (bottom) payoffs. Notice that the gain and loss lines are closer together for hypothetical payoffs, shown in the top panel. However, a matched-pairs Wilcoxon test (using the difference between an individual's
[Fig. 3. Cumulative Choice Frequencies for High Losses and High Gains. Two panels (15x hypothetical payments, top; 15x real payments, bottom; losses then gains) plot the cumulative frequency of the number of safe choices (0-10) for gains, losses, and the risk-neutral prediction.]
choice in the gain and loss treatment as the unit of observation) rejects the null hypothesis of no difference in favor of the one-tailed alternative that fewer safe choices are made in the loss treatment. The top panel of Fig. 4 summarizes individual data for the 32 subjects in the high hypothetical payoff sessions. As before, the number of safe choices is used to categorize risk attitudes. Just as we observed for low payoffs, about half of these subjects are risk averse over gains (53 percent); however, reflection is no longer the modal outcome. Only about one-third of those who are risk averse for gains turn out to be risk preferring for losses, while the modal pattern (28 percent of all subjects) is risk aversion over both gains and losses. The outcomes for high real cash payoffs are shown in the bottom panel of Fig. 4. About two-thirds of subjects are risk averse over gains (back row); of these, only about 15 percent are also risk preferring for losses. Overall, we observe less reflection when we scale up payoffs, both real and hypothetical. And as before, we observe about twice the rate of reflection for high hypothetical payoffs (19 percent) as for high real payoffs (9 percent). In these high-payoff sessions, 49 of 64 subjects exhibited a clean switch point between the safe and risky lotteries. With real payoffs, 73 percent of these subjects exhibit risk aversion over gains. Of these, only about 10 percent also show risk preference over losses. Little difference is observed in the hypothetical data. As before, reflection occurs about twice as often with hypothetical payoffs (17 percent of subjects) as with real payoffs (8 percent). There is one potentially important procedural difference between these high real and high hypothetical payoff sessions. The high real-payoff sessions were preceded by a real-payoff experiment in which earnings averaged about $140. In contrast, the high hypothetical payoff sessions were preceded by a hypothetical choice task, with earnings set equal to $45 for the entire session (which is identical to earnings in the low hypothetical payoff sessions). If previously earned high payoffs affect risk attitudes, this could bias the comparison between these real and hypothetical payoff sessions.

[Fig. 4. Risk Aversion Categories for High Losses and High Gains. Two panels (15x hypothetical payoffs, top; 15x real payoffs, bottom; losses then gains) plot the share of observations in each risk-attitude category (averse, neutral, loving) over the loss domain and the gain domain.]
In order to address this, we ran two additional high hypothetical payoff sessions.17 All procedures were identical to those described above (32 subjects participated, all faced the loss condition first, and half of the subjects saw the risky lottery on the left of their decision sheet); however, both sessions were preceded by a high real-payoff experiment. Earnings in these sessions were quite close to those that preceded the high real-payoff sessions: average earnings were $132 (compared with $140 for the real payment sessions reported above), and ranged from $111 to $182 ($112 to $190 for the real payment sessions). This high initial stake had a large effect on choices in the hypothetical gain treatment, but only a small effect in the loss domain. On average, individuals are very slightly risk seeking in the gain domain (4.9 safe choices), as shown in the bottom row of Table 3, while they are still
somewhat risk averse over losses. This pattern (higher risk aversion over losses than gains) is the opposite of that predicted by prospect theory, although the difference in choices between the gain and loss treatments is not significant. Overall, only 25 percent of subjects are risk averse over gains; of these, about one-third are risk seeking over losses. The rate of reflection (9 percent) is comparable to that observed with high real payoffs. Using the subset of data from those subjects who switch only one time strengthens these conclusions: 29 percent of subjects are risk averse over gains; however, only 8 percent of all subjects in this treatment reflect. At the end of each session, we asked subjects to complete a demographic questionnaire. Our subject pool was almost equally divided between men and women (46 percent male and 54 percent female). Looking at our data by gender does not change our primary conclusion: the modal outcome is reflection only for low hypothetical payoffs. All sessions were held at Georgia State University, which is an urban campus located in downtown Atlanta and has a very diverse student body. Almost half of these subjects (43 percent) were raised outside of North America (in Europe, South America, Asia, and Africa). The rate of reflection is generally higher among subjects from North America (the notable exception is the low hypothetical treatment, where reflection occurs 50 percent more often among those raised outside of North America). However, none of our main results change when looking only at those raised in North America or only at those raised abroad. The interpretation of our data is complicated by those individuals classified as being risk neutral over gains or losses. Recall that (for low payoffs) five safe choices are consistent with constant absolute risk aversion in the interval (−0.05, 0.05).
This is symmetric around zero (risk neutrality), but is also consistent with a very small degree of risk aversion or risk preference. An alternative interpretation is to assume that those we classified as risk neutral are evenly divided between being risk averse and risk seeking. If we eliminate the risk-neutral category and classify subjects in this manner, our primary conclusions stand. When payments are real, the modal outcome under both high and low incentives is risk aversion under gains and losses. For low hypothetical payments, the modal outcome is reflection; however, for high hypothetical payoffs (preceded by an experiment that uses hypothetical payments), the modal outcome is risk aversion under gains and losses. Using high hypothetical payoffs (preceded by a high real-payoff experiment), the modal outcome is the reverse pattern of reflection: risk preference over gains and risk aversion over losses.18
6. MAXIMUM-LIKELIHOOD ESTIMATION

The nonparametric statistical tests presented thus far fail to support the notion that a full reflection of payoffs (multiplication by −1) causes subjects to exhibit risk aversion for the lotteries involving gains and risk preference for the lotteries involving losses. However, interpretation of the data is complicated by the fact that subjects entered the lottery choice part of the experiment with different earnings. Moreover, there were differences in presentation, and different subjects (with differing demographic characteristics) faced the real and hypothetical treatments, and the low- and high-payoff treatments. In this section we present results from maximum-likelihood estimation that controls for (and measures the impact of) these factors. Recall that prior to the start of this part of the session, subjects participated in another experiment in which they earned their initial endowment. Before facing their first lottery choice task, subjects were told:

The remaining part of today's experiment will consist of a series of choices given to you one at a time. Although each part will count toward your final earnings, you will not find out how much you have earned for any of these decisions until you have completed all of them. For one of these decision tasks, all payoffs are negative; for this decision, payoffs will be subtracted from your earnings in the other parts of today's experiment. For all of the other decision tasks, payoffs are positive and will be added to your earnings in the other parts of today's experiment.
In the high-payoff treatment, subjects faced a maximum loss of $115.50. In the real-payoff sessions, four subjects entered this part of the experiment with earnings below this level, and so when they faced a possible loss of $115.50 they could lose more money than their accumulated earnings. These subjects knew only that they would have future opportunities to earn money, but did not know the size of those earnings opportunities.19 Because of this uncertainty, it is unclear how these subjects perceived the potential losses. For example, a subject who started with $110 might perceive the possible $115.50 loss as a loss of $110 (the initial endowment) instead. Therefore, these subjects are omitted from the following analysis. The estimation presented here follows the structural estimation procedures employed in Holt and Laury (2002) to estimate the parameters of an Expo-Power utility function. The extension to estimate a structural model using individual observations, rather than grouped observations, is described in Appendix F of Harrison and Rutström (2008), who also discuss the extension that allows each core parameter to be estimated as a linear function of observable demographic or treatment characteristics. For
a given lottery choice, the probabilities and values of the prizes are used to determine the expected utility of each lottery, using a CRRA specification u(x) = x^(1−r)/(1−r). The model estimated here assumes this functional form for utility, an expected utility theory representation of the latent decision process, a cumulative normal distribution to link predicted choices and actual choices, and a Fechner structural error specification. The estimation procedures account for the fact that we have choices in 10 lotteries for each subject under both gains and losses, and we therefore use clustered-robust standard errors to allow for correlated errors within each subject. The estimates are obtained using standard maximum-likelihood methods in Stata. The top panel of Table 4 presents maximum-likelihood estimates for the baseline (low-payoff) data. One can compute the size of the CRRA coefficient by taking the coefficient on the regression constant and then adding in the marginal effects from the demographic and treatment variables. In this case, the CRRA coefficient is calculated as 0.242 − 0.02 × loss − 0.004 × male + 0.002 × age − … The loss variable is an indicator variable that is set equal to one when the lotteries involve losses and zero otherwise. The negative coefficient suggests that there is less risk aversion under losses; however, the effect is not significant at any standard level of confidence. In fact, the only variable that is significant on its own is the "white" indicator variable: subjects who classify themselves as white or Caucasian are less risk averse than non-white subjects. These results also show that, for the baseline payoff data, the subject's sex (male = 1 for male subjects), age (in years), where they were raised (North America or abroad), the use of hypothetical payments (hyp), the order in which they faced gains and losses (gl_order), and whether the safe lottery was listed as Option A or Option B (safe_left) have no significant effect on the CRRA coefficient.
The coefficient "mu" gives the estimate of the Fechner noise term. The top panel of Table 5 presents the predicted CRRA coefficient, using the characteristics of each subject, for the baseline payoff data. The coefficient is slightly smaller under losses than under gains (r = 0.189 under losses compared to 0.217 under gains), but these values indicate risk aversion under both gains and losses. Turning to the high-payoff sessions (second panel of Tables 4 and 5), subjects are approximately risk neutral, but again there is no significant effect of losses on the risk aversion coefficient. The coefficient is both very small (−0.002) and insignificant (p = 0.93). It is important to note that there is also a large increase in the noise coefficient (mu) for the high-payoff data. Therefore, the effect of payoff scale on the Fechner noise term must be recognized and dealt with when both payoff scales are combined.
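A minimal sketch of the likelihood just described (our illustration, not the authors' Stata code; the parameter values are made up for the example) combines CRRA utility, the cumulative-normal link, and the Fechner noise term mu:

```python
# Log-likelihood of a single binary lottery choice under CRRA expected utility
# with a Fechner error and a cumulative-normal link, as described in the text.
import math

def crra(x, r):
    """CRRA utility u(x) = x^(1-r)/(1-r), with the usual log form at r = 1."""
    return math.log(x) if abs(1.0 - r) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def eu(lottery, r):
    """Expected utility of a lottery given as [(probability, prize), ...]."""
    return sum(p * crra(z, r) for p, z in lottery)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def choice_loglik(safe, risky, chose_safe, r, mu):
    """Fechner specification: Pr(choose safe) = Phi((EU_safe - EU_risky)/mu)."""
    p_safe = norm_cdf((eu(safe, r) - eu(risky, r)) / mu)
    return math.log(p_safe if chose_safe else 1.0 - p_safe)

# One gain-menu row (p = 0.4) with illustrative parameter values.
safe = [(0.4, 4.00), (0.6, 3.20)]
risky = [(0.4, 7.70), (0.6, 0.20)]
ll = choice_loglik(safe, risky, True, r=0.22, mu=0.58)
```

Summing choice_loglik over all of a subject's choices gives that subject's contribution to the sample log-likelihood; the clustering correction for within-subject correlation affects only the standard errors, not the likelihood itself.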
Table 4. Maximum-likelihood Estimation of CRRA Utility Function. Variable
Description
Baseline payoff dataa Cons Loss Indicator for loss treatment Male Indicator for male subjects Age Subject’s age (in years) White Indicator for White/ Caucasian NAmerican Indicator raised in North America hyp Indicator for hypothetical treatment gl_order Indicator for gains presented first safe_left Indicator for safe presented on left mu Fechner noise parameter High-payoff datab Cons Loss Indicator for loss treatment Male Indicator for male subjects Age Subject’s age (in years) White Indicator for White/ Caucasian NAmerican Indicator raised in North America hyp Indicator for hypothetical treatment safe_left Indicator for safe presented on left mu Fechner noise parameter All data, contextual utility modelc Cons Loss Indicator for loss treatment
Estimate
Standard p-Value Error
Lower 95% Confidence Interval
Upper 95% Confidence Interval
0.242 0.028
0.223 0.022
0.277 0.193
0.195 0.071
0.679 0.014
0.005
0.026
0.856
0.055
0.046
0.002
0.003
0.381
0.003
0.007
0.066
0.027
0.014
0.119
0.014
0.041
0.028
0.141
0.100
0.014
0.010
0.026
0.701
0.061
0.041
0.011
0.027
0.679
0.065
0.042
0.036
0.027
0.185
0.089
0.017
0.580
0.455
0.202
0.311
1.471
0.066 0.002
0.111 0.027
0.550 0.927
0.151 0.055
0.283 0.050
0.042
0.029
0.152
0.016
0.100
0.000
0.003
0.953
0.006
0.006
0.026
0.035
0.460
0.095
0.043
0.048
0.034
0.162
0.115
0.019
0.026
0.029
0.364
0.083
0.030
0.041
0.028
0.152
0.015
0.096
16.593
6.896
0.016
3.076
30.109
2.212 0.784
0.556 0.221
0.000 0.000
1.123 1.217
3.301 0.351
428
SUSAN K. LAURY AND CHARLES A. HOLT
Table 4. (Continued)

Variable    Description                             Estimate   Std. Error   p-Value   Lower 95% CI   Upper 95% CI
hyp         Indicator for hypothetical treatment     -0.113      0.225       0.617      -0.554          0.329
gl_order    Indicator for gains presented first      -0.219      0.694       0.752      -1.579          1.141
safe_left   Indicator for safe presented on left      0.067      0.224       0.766      -0.373          0.506
Scale       Indicator for high scale                 -0.095      0.036       0.008      -0.165         -0.024
Noise       Noise parameter                           5.081      0.317       0.000       4.460          5.703
a. Log-likelihood = -1006.437; Wald test for the null hypothesis that all coefficients are zero has a χ² value of 15.87 with eight degrees of freedom, implying a p-value of 0.0443.
b. Log-likelihood = -535.229; Wald test for the null hypothesis that all coefficients are zero has a χ² value of 12.78 with seven degrees of freedom, implying a p-value of 0.0777.
c. Log-likelihood = -1655.181; Wald test for the null hypothesis that all coefficients are zero has a χ² value of 29.48 with five degrees of freedom, implying a p-value of 0.000.
Table 5. Predicted CRRA Coefficients.

                                       Mean     Standard Deviation   Minimum Value   Maximum Value
Baseline payoff data
  Gains                                0.217    0.044                0.128           0.321
  Losses                               0.189    0.044                0.100           0.292
High-payoff data
  Gains                                0.063    0.048                0.031           0.153
  Losses                               0.063    0.046                0.033           0.126
All data, contextual utility model
  Gains                                1.561    0.592                0.676           2.184
  Losses                               0.868    0.549                0.108           1.400
The bottom panels of Tables 4 and 5 present results from the pooled (baseline and high-payoff) data, using a contextual utility model that incorporates the heteroscedasticity in the noise term attributable to the change in context from low payoffs to high payoffs (see Wilcox, 2007, 2008, for a derivation of and further details on the contextual utility model). These estimates show that framing the lottery choice problem in terms of losses causes a significant decrease in risk aversion (Table 4); however, the predicted values show that subjects are still risk averse under both gains and losses.
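The contextual utility correction can be illustrated in the same spirit: roughly, Wilcox’s model normalizes the expected-utility difference by the utility range of the outcomes in the choice context, so the effective noise is comparable across the low- and high-payoff frames. The parameterization below (logistic link, a precision parameter multiplying the normalized difference, and r = 0.3) is an assumption for illustration, not the authors’ specification:

```python
import math

def crra_u(x, r):
    # CRRA over gains; assumes r < 1 so x ** (1 - r) is defined for x > 0
    return x ** (1.0 - r) / (1.0 - r)

def prob_safe_contextual(p, safe, risky, r, precision):
    """Logistic choice with the expected-utility difference divided by the
    utility range of the context (best minus worst outcome), in the spirit
    of Wilcox's contextual utility model (exact parameterization assumed)."""
    eu_s = p * crra_u(safe[0], r) + (1 - p) * crra_u(safe[1], r)
    eu_r = p * crra_u(risky[0], r) + (1 - p) * crra_u(risky[1], r)
    outcomes = list(safe) + list(risky)
    u_range = crra_u(max(outcomes), r) - crra_u(min(outcomes), r)
    return 1.0 / (1.0 + math.exp(-precision * (eu_s - eu_r) / u_range))

# Baseline menu and the same menu scaled up by a factor of 15:
low = prob_safe_contextual(0.5, (4.00, 3.20), (7.70, 0.20), 0.3, 5.081)
high = prob_safe_contextual(0.5, (60.00, 48.00), (115.50, 3.00), 0.3, 5.081)
```

Under CRRA, multiplying every payoff by a constant scales the expected-utility difference and the utility range by the same factor, so `low` and `high` coincide: the normalization absorbs the payoff-scale effect that otherwise shows up in the Fechner noise term.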
7. CONCLUSION

This paper adds to the literature on experimental tests of elements of prospect theory, which in its various versions is the leading alternative to expected utility theory. The design uses a menu of lottery choices structured to allow an inference about risk aversion as gains are transformed into losses, holding payoff probabilities constant. When hypothetical payoffs are used, we do see that the modal choice pattern is for subjects to ‘‘reflect’’ from risk-averse behavior over gains to risk-seeking behavior over losses. This reflection rate is reduced by more than half when we use lotteries with real money payoffs, and the modal tendency is to be risk averse for both gains and losses. There is a significant difference in risk attitudes, however, with less risk aversion observed in the loss domain. When payoffs are scaled up by a factor of 15 (yielding potential gains and losses of over $100), there is even less support for reflection. Sharper results are obtained when we remove the ‘‘noisy’’ subjects who switch between the safe and risky lotteries more than once. There is a little more risk aversion in the no-switch data, and the scaling up of payoffs cuts reflection rates by almost half (for both real and hypothetical payoffs). In fact, the incidence of reflection with high real payoffs is only about 7 percent, lower than the rate of ‘‘reverse reflections’’ (risk seeking for gains and risk aversion for losses), which is the opposite of the pattern predicted by prospect theory. The lack of a clear reflection effect in our data is a little surprising, given the results of other studies that report reflection effects with real money incentives (Camerer, 1989; Battalio et al., 1990). One procedural difference is the nature of what was held constant between treatments.
Instead of holding initial wealth roughly constant in both treatments as we did, these studies provided a high initial stake in the loss treatment, so the final wealth position is constant across treatments. For example, a lottery over gains of $20 and $0 could be replaced with an initial payoff of $20 and a choice involving -$20 and $0. Each ‘‘frame’’ yields the same possible final wealth positions ($0 or $20), but the framing is in terms of gains in one treatment and in terms of losses in the other.20 A setup like this is precisely what is needed to isolate a ‘‘framing effect.’’ Such an effect is present since both studies report a tendency for subjects to be risk averse in the gain frame and
risk seeking in the loss frame. Whether these results indicate a reflection effect is less clear, since the higher stake provided in the loss treatment may have induced more risk-seeking behavior.
NOTES

1. Hilton (1988) provides a neat decomposition of an ‘‘overall risk premium’’ into a standard Arrow–Pratt risk premium from expected utility theory and a ‘‘decision weight risk premium’’ resulting from nonlinear probability weighting. Levy and Levy (2002) characterize the risk premium ‘‘in the small’’ for the case of cumulative prospect theory.
2. ‘‘In the present study we did not pay subjects on the basis of their choices because in our experience with choices between prospects of the type used in the present study, we did not find much difference between subjects who were paid a flat fee and subjects whose payoffs were contingent on their decisions’’ (Tversky & Kahneman, 1992, p. 315). The choices being referred to were choices between gambles and sure money amounts.
3. Harrison, Johnson, McInnes, and Rutström (2005) report a follow-up experiment and conclude that the payoff-scale effects reported by Holt and Laury (2002) were, in part, due to a treatment-order effect, but that the qualitative results (higher risk aversion for higher stakes) were replicated. Holt and Laury (2005) ran a subsequent experiment with no order effects that also resulted in a clear effect of payoff scale on risk aversion, although the magnitude of the payoff-scale effect was diminished (consistent with the findings of Harrison et al.).
4. For other surveys of the effects of using money payoffs in economics experiments, see Smith and Walker (1993), Hertwig and Ortmann (2001), Laury and Holt (2007), Harrison and Rutström (2008), and Camerer and Hogarth (1999).
5. In addition, the idea that one might respond to losses and gains differently is supported by Gehring and Willoughby (2002), who measure brain activity (event-related brain potentials measured with an EEG) milliseconds after a subject makes a choice that results in a gain or loss. They find that this brain activity is greater in amplitude after a (real) loss is experienced than when a gain is experienced.
Moreover, choices made after losses were riskier than choices made after gains. Dickhaut et al. (2003) also observed brain activity in choice tasks with monetary gains and losses; they report that subjects are risk averse for gains but not for losses, and that reaction time and brain activation patterns differ for these two contexts.
6. For example, suppose a subject must choose between a small certain loss (payment for insurance) and a gamble with a small probability of a large loss. Overweighting this small probability would cause the subject to appear to be quite risk averse. If the payoffs are multiplied by -1, then the small probability of a large loss becomes a small probability of a large gain, and if that probability is overweighted, the subject would be more willing to take an actuarially fair risk. Bosch-Domènech and Silvestre (2006) deal with the problem of probability
weighting by cleverly decomposing reflection into a payoff translation and a probability switch. The payoff translation involves subtracting a constant from all payoffs, holding probabilities fixed. A probability switch involves assigning the probability of the high payoff to the low payoff instead, and vice versa. In their setup (one option is a certainty and the other involves only one non-zero payoff), the reflection obtained by multiplying all payoffs by -1 can be decomposed into a payoff translation and a probability switch. They consider four cases: the base lottery choice with gains, a probability switch (still with gains), a payoff translation into losses (with no switch in probabilities), and full reflection (both a payoff translation and a probability switch). They find equally strong payoff translation and probability switch effects.
7. In this study, only one of 134 subjects was selected at random ex post to be paid, for one of the 20 questionnaires that they completed over a 10-week period, with 1 of the 21 paired-choice questions for that questionnaire actually used to determine the payoff.
8. These calculations are meant to be illustrative; we do not mean to imply that absolute risk aversion will be constant over a wide range of payoffs. The lottery choice experiments in Holt and Laury (2002) involve scaling up payoffs by factors of 20, 50, and 90, and we find evidence of decreasing absolute risk aversion when utility is expressed as a function of income, not wealth. This result is not surprising, since it is well known that the absolute risk aversion needed to explain choices between low-stakes gambles implies absurd amounts of risk aversion over high stakes (Rabin, 2000). Rabin's theorem pertains to a standard utility of final wealth function, but similar considerations apply when utility is a function of only gains and losses around a reference point (utility of income).
To see this, consider the utility function u(x) = -exp(-rx), which exhibits a constant absolute risk aversion of r. Notice that scaling up all money prizes by a factor of, say, 100 yields utilities of -exp(-100rx), so this is equivalent to leaving the stakes the same and increasing risk aversion by a factor of 100, which yields an absurd amount of risk aversion.
9. For subjects with multiple ‘‘switch points’’ (i.e., subjects who switch from making a safe choice to a risky choice, back to a safe choice, before finally settling on the risky choice), using the total number of safe choices results in an approximation of their risk attitude. A more precise characterization of risk attitude, based on an individual's choice in each of the 10 gambles, is used when maximum likelihood estimates are presented in Section 6 below.
10. To clarify these qualitative predictions, consider the Arrow–Pratt coefficient of risk aversion, r(x) = -u″(x)/u′(x), and suppose that r(x) is higher for one utility function than for another on some interval of payoffs, with strict inequality holding for at least one point. Then it is a direct implication of parts (a) and (e) of Pratt's (1964) Theorem 1 that the right side of (2) is higher for the more risk-averse utility function. Since the left side is increasing in p, this increases the range of probabilities for which the safe option is preferred. Conversely, Pratt's Theorem 1 implies that the right side of (3) is lower for the more risk-averse utility function, which again widens the interval of probabilities over which the safe option is preferred.
11. If time permits, we prefer this approach because, as Camerer (1989) notes, losses from such a windfall stake obtained without any effort may be coded as foregone gains. For example, if a subject is given $20 and then experiences a loss of
$5, the subject may consider this $15 in earnings and not a $5 loss. There is also clear evidence that earned endowments tend to increase the incidence of self-interested decisions in dictator and division games; see Rutström and Williams (2000) and Cherry, Frykbom, and Shogren (2002).
12. This initial phase involved a sequential search task in about half of the sessions, and a public goods experiment in the other half.
13. Similarly, Myagkov and Plott (1997) told subjects that cash earnings would be based on the outcome of one market period, selected at random ex post. As Holt (1986) notes, the random selection method produces a compound lottery composed of the simple lotteries. There is no clear experimental evidence for such a compound lottery effect, however. For example, Laury (2002) finds no significant difference in behavior between lottery choice treatments where subjects are paid for one of 10 decisions or paid for all 10 decisions. See also Harrison and Rutström (2007) for a survey of evidence on the random selection method.
14. In the hypothetical treatment, the presentation of the S/R lotteries has some effect on behavior (i.e., whether the safe or the risky lottery is shown to subjects on the left side of the decision sheet as ‘‘Option A’’). However, as shown in Table 2, observations are about equally divided between these orders. Moreover, Kahneman and Tversky (1979) and Tversky and Kahneman (1992) alternated their presentation of lotteries in a similar manner and did not separate their data by presentation order. For consistency, we do not do so either, but we do control for this presentation order in the maximum likelihood estimation contained in Section 6.
15. For example, in the lotteries over gains, subjects should initially choose the safe lottery and then switch to the risky lottery when the probability of the high-payoff outcome is high enough.
Some subjects initially chose the safe lottery, switched to the risky lottery, then switched back to the safe lottery before returning to the risky lottery.
16. We chose this treatment order (real preceded by real, and hypothetical preceded by hypothetical) for consistency with the low-payoff experiments reported above. Of course, if differences are observed between our high-payoff real and hypothetical reflection experiments, it could be because one was preceded by a real-payoff experiment and the other by a hypothetical experiment (where total earnings were $45, regardless of one's choices). We consider this below.
17. We thank Colin Camerer for suggesting this treatment.
18. Of course, those who are most supportive of prospect theory's reflection effect might suggest that those individuals in the category centered around risk neutrality are not evenly distributed between risk aversion and risk preference. Instead, they might classify risk-neutral individuals in the manner most supportive of prospect theory. We can do so by classifying anyone risk neutral over gains as risk averse, and anyone risk neutral over losses as risk seeking. Under this interpretation, the four upper-right bars in Figures 2 and 4 are combined to create the category for reflection. This includes those classified as risk neutral for both gains and losses. When risk-neutral individuals are reclassified in this manner, the modal choice pattern is reflection in all treatments. However, as reported by Camerer (1989) and Battalio et al. (1990), reflection is far from universal. In our low real payoff treatment, only 45 percent of all subjects exhibit reflection (compared with 38 percent who are risk averse for both gains and losses). There is a little more reflection
in the high real payoff treatment when risk-neutral subjects are reclassified in this manner: 56 percent reflect, while 31 percent are risk averse over gains and losses. As before, the strongest support for the reflection effect comes from subjects who faced low hypothetical payoffs; 63 percent of these (reclassified) subjects exhibited the predicted risk aversion for gains and risk preference over losses. When high hypothetical payoffs follow a hypothetical payoff experiment, 44 percent of subjects reflect (and 34 percent are risk averse over both gains and losses). Following a high real payoff experiment, only 41 percent of subjects exhibit reflection under high hypothetical payoffs. Because the risk-neutral data are categorized in the way most favorable to prospect theory, it is not surprising that there is much more support for reflection when the data are presented in this manner. Moreover, this would indicate that the strongest support for the reflection effect comes from those who are at best very slightly risk averse over gains and very slightly risk loving over losses.
19. In fact, earnings in the matching pennies game were set to ensure that all subjects would receive a positive payment in the session.
20. Similarly, Cohen et al. (1987) informed subjects in advance that a constant amount of money sufficient to cover losses would be added to the payoff before the determination of losses in the loss treatment.
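The constant-absolute-risk-aversion argument in note 8 can be checked numerically (a minimal verification; the coefficient r = 0.05 and the payoff values are arbitrary choices made here):

```python
import math

def cara_u(x, r):
    """CARA utility u(x) = -exp(-r * x); the Arrow-Pratt measure is constant at r."""
    return -math.exp(-r * x)

# Note 8's point: scaling every prize by 100 under CARA is the same as
# keeping prizes fixed and multiplying the risk-aversion coefficient by 100.
r, scale = 0.05, 100.0
for x in (0.2, 3.2, 4.0, 7.7):
    assert math.isclose(cara_u(scale * x, r), cara_u(x, scale * r))
```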
ACKNOWLEDGMENTS

We wish to thank Ron Cummings for his helpful suggestions and for funding the human subjects' payments, and Glenn Harrison for his helpful suggestions and assistance. We also thank Eunice Heredia for research assistance, and Colin Camerer, Glenn Harrison, Peter Moffatt, Lindsay Osco, Alda Turfo, and Peter Wakker for their comments and suggestions. Any remaining errors are our own. This work was funded in part by the National Science Foundation (SBR-9753125 and SBR-0094800).
REFERENCES

Battalio, R. C., Kagel, J. H., & Jiranyakul, K. (1990). Testing between alternative models of choice under uncertainty: Some initial results. Journal of Risk and Uncertainty, 3(1), 25–50.
Binswanger, H. P. (1980). Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62(3), 395–407.
Bosch-Domènech, A., & Silvestre, J. (1999). Does risk aversion or attraction depend on income? An experiment. Economics Letters, 65(3), 265–273.
Bosch-Domènech, A., & Silvestre, J. (2006). Reflections on gains and losses: A 2 × 2 × 7 experiment. Journal of Risk and Uncertainty, 33, 217–235.
Camerer, C. F. (1989). An experimental test of several generalized utility theories. Journal of Risk and Uncertainty, 2(1), 61–104.
Camerer, C. F. (2001). Prospect theory in the wild: Evidence from the field. In: D. Kahneman & A. Tversky (Eds), Choices, values, and frames (pp. 288–300). Cambridge: Cambridge University Press.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1–3), 7–42.
Cherry, T., Frykbom, P., & Shogren, J. (2002). Hardnose the dictator. American Economic Review, 92(4), 1218–1221.
Cohen, M., Jaffray, J., & Said, T. (1987). Experimental comparisons of individual behavior under risk and under uncertainty for gains and losses. Organizational Behavior and Human Decision Processes, 39, 1–22.
Dickhaut, J., McCabe, K., Nagode, J. C., Rustichini, A., Smith, K., & Pardo, J. V. (2003). The impact of certainty context on the process of choice. Proceedings of the National Academy of Sciences, 100(18 March), 3536–3541.
Gehring, W. J., & Willoughby, A. R. (2002). The medial frontal cortex and the rapid processing of monetary gains and losses. Science, 295(22 March), 2279–2282.
Harbaugh, W. T., Krause, K., & Vesterlund, L. (2002). Prospect theory in choice and pricing tasks. Working Paper. University of Oregon.
Harrison, G., & Rutström, E. (2007). Experimental evidence on the existence of hypothetical bias in value elicitation experiments. In: C. R. Plott & V. L. Smith (Eds), Handbook of experimental economics results. New York: Elsevier Press.
Harrison, G., & Rutström, E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Research in Experimental Economics, Vol. 12). Greenwich, CT: JAI Press.
Harrison, G. W. (2006). Hypothetical bias over uncertain outcomes. In: J. A. List (Ed.), Using experimental methods in environmental and resource economics (pp. 41–69).
Northampton, MA: Edward Elgar.
Harrison, G. W., Johnson, E., McInnes, M., & Rutström, E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Hershey, J. C., & Schoemaker, P. J. H. (1980). Risk taking and problem context in the domain of losses: An expected utility analysis. Journal of Risk and Insurance, 47(1), 111–132.
Hilton, R. W. (1988). Risk attitude under two alternative theories of choice under risk. Journal of Economic Behavior and Organization, 9, 119–136.
Holt, C. A. (1986). Preference reversals and the independence axiom. American Economic Review, 76(3), 508–515.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Holt, C. A., & Laury, S. K. (2005). Risk aversion and incentive effects: New data without order effects. American Economic Review, 95(3), 902–912.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Köbberling, V., & Wakker, P. (2005). An index of loss aversion. Journal of Economic Theory, 122, 119–132.
Laury, S. K. (2002). Pay one or pay all: Random selection of one choice for payment. Working Paper. Georgia State University.
Laury, S. K., & Holt, C. A. (2007). Payoff effects and risk preference under real and hypothetical conditions. In: C. Plott & V. Smith (Eds), Handbook of experimental economics results. Amsterdam: Elsevier.
Laury, S. K., & McInnes, M. M. (2003). The impact of insurance prices on decision-making biases: An experimental analysis. Journal of Risk and Insurance, 70(2), 219–233.
Laury, S. K., McInnes, M. M., & Swarthout, J. T. (2007). Catastrophic insurance: New experimental evidence. Working Paper. Georgia State University.
Levy, H., & Levy, M. (2002). Arrow–Pratt risk aversion, risk premium, and decision weights. Journal of Risk and Uncertainty, 25(3), 265–290.
Myagkov, M., & Plott, C. (1997). Exchange economies and loss exposure: Experiments exploring prospect theory and competitive equilibria in market environments. American Economic Review, 87(5), 801–828.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32(1–2), 122–136.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica, 68(5), 1281–1292.
Rutström, E., & Williams, M. (2000). Entitlements and fairness: An experimental study of distributive preferences. Journal of Economic Behavior and Organization, 43(1).
Slovic, P. (1969). Differential effects of real versus hypothetical payoffs on choices among gambles. Journal of Experimental Psychology, 79, 434–437.
Slovic, P. (2001). Rational actors or rational fools: Implications of the affect heuristic for behavioral economics. Working Paper. University of Oregon.
Smith, V. L., & Walker, J. M. (1993). Monetary rewards and decision cost in experimental economics. Economic Inquiry, 31(2), 245–261.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.
Tversky, A., & Wakker, P. (1995). Risk attitudes and decision weights. Econometrica, 63(6), 1255–1280.
Wilcox, N. (2007).
‘Stochastically more risk averse:’ A contextual theory of stochastic discrete choice under risk. Journal of Econometrics, forthcoming.
Wilcox, N. (2008). Stochastic models for binary discrete choice under risk: A critical primer and econometric comparison. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Research in Experimental Economics, Vol. 12). Greenwich, CT: JAI Press.
APPENDIX. EXPERIMENT INSTRUCTIONS

Initial Instructions for Hypothetical Payment Sessions

Today you will be participating in several experiments about decision making. Typically, in an experiment like this one, you would earn money. The amount of money that you would earn would depend on the choices that you and the other participants would make. In the experiment today, however, you will be paid $45 for participating in the experiment. You can write this amount now on your receipt form.
You will not earn any additional money today based on the choices that you and the other participants make. The instructions for each part of today's experiment will describe how your earnings depend on your decisions (and sometimes on the decisions of others). It is important that you understand that you will not actually be receiving any of this additional money (other than your $45 participation fee). We would like for you to sign the statement below indicating that you understand this.

I understand that I will be paid $45 for participation in today's experiment. All other earnings described in the instructions that I receive are hypothetical and will not actually be paid to me.

__________________________ Signature
Although you will not actually earn any additional money today, we ask that you make choices in the following experiments as if you could earn more money, and the amount that you could earn would depend on choices that you and the others make. You will not actually be paid any additional money, but we want you to make decisions as if you would be paid additional money.
Instructions for Lottery Choice Tasks (Real and Hypothetical)

The remaining part of today's experiment will consist of a series of choices given to you one at a time. Although each part will count toward your final earnings, you will not find out how much you have earned for any of these decisions until you have completed all of them. For one of these decision tasks, all payoffs are negative; for this decision, payoffs will be subtracted from your earnings in the other parts of today's experiment. For all of the other decision tasks, payoffs are positive and will be added to your earnings in the other parts of today's experiment.
Instructions

Your decision sheet shows ten decisions listed on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your
ten choices, please let me explain how these choices will affect your earnings for this part of the experiment.

Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, A or B, for the particular decision selected. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end.

Now, please look at Decision 1 at the top. Option A yields a sure gain of $0.20 (20 cents), and option B yields a sure gain of $3.20 (320 cents). Next look at Decision 2 in the second row. Option A yields $7.70 if the throw of the ten-sided die is 1, and it yields $0.20 if the throw is 2–10. Option B yields $4.00 if the throw of the die is 1, and it yields $3.20 if the throw is 2–10. The other decisions are similar, except that as you move down the table, the chances of the better payoff for each option increase.

To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your payoff for the option you chose for that decision. Payoffs for this choice are positive and will be added to your previous earnings, and you will be paid the sum of all earnings in cash when we finish.

So now please look at the empty boxes on the right side of the record sheet.
You will have to write a decision, A or B, in each of these boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings for this part. Then you will write your earnings in the blank at the bottom of the page. Please note that these gains will be added to your previous earnings up to now.

Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question.
1/11/01,1                                                              ID: _______

Decision   Option A                             Option B                             Your Choice (A or B)

1          $3.20 if throw of die is 1–10        $0.20 if throw of die is 1–10
2          $4.00 if throw of die is 1           $7.70 if throw of die is 1
           $3.20 if throw of die is 2–10        $0.20 if throw of die is 2–10
3          $4.00 if throw of die is 1 or 2      $7.70 if throw of die is 1 or 2
           $3.20 if throw of die is 3–10        $0.20 if throw of die is 3–10
4          $4.00 if throw of die is 1–3         $7.70 if throw of die is 1–3
           $3.20 if throw of die is 4–10        $0.20 if throw of die is 4–10
5          $4.00 if throw of die is 1–4         $7.70 if throw of die is 1–4
           $3.20 if throw of die is 5–10        $0.20 if throw of die is 5–10
6          $4.00 if throw of die is 1–5         $7.70 if throw of die is 1–5
           $3.20 if throw of die is 6–10        $0.20 if throw of die is 6–10
7          $4.00 if throw of die is 1–6         $7.70 if throw of die is 1–6
           $3.20 if throw of die is 7–10        $0.20 if throw of die is 7–10
8          $4.00 if throw of die is 1–7         $7.70 if throw of die is 1–7
           $3.20 if throw of die is 8–10        $0.20 if throw of die is 8–10
9          $4.00 if throw of die is 1–8         $7.70 if throw of die is 1–8
           $3.20 if throw of die is 9 or 10     $0.20 if throw of die is 9 or 10
10         $4.00 if throw of die is 1–9         $7.70 if throw of die is 1–9
           $3.20 if the throw of die is 10      $0.20 if the throw of die is 10

Decision used: ________, Die throw: _____, Your earnings: _______.
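For reference, a risk-neutral subject facing the menu above compares expected values row by row. The quick check below is illustrative, not part of the original instructions (note that the sample sheet lists the $4.00/$3.20 pair under Option A, while the worked examples in the instruction text attach the risky pair to Option A):

```python
# Payoffs from the decision sheet: one option pays $4.00 or $3.20, the
# other $7.70 or $0.20; in decision k the better payoff of each pair
# occurs on k - 1 of the ten die faces (decision 1 is a sure thing).
def expected_values(k):
    p = (k - 1) / 10.0                    # probability of the better payoff
    ev_safe = p * 4.00 + (1 - p) * 3.20   # low-variance ("safe") pair
    ev_risky = p * 7.70 + (1 - p) * 0.20  # high-variance ("risky") pair
    return ev_safe, ev_risky

# First decision at which the risky pair has the higher expected value.
switch = next(k for k in range(1, 11)
              if expected_values(k)[1] > expected_values(k)[0])
```

A risk-neutral subject therefore takes the safe pair in the early rows and switches once the risky pair's expected value overtakes, from decision 6 onward; more risk-averse subjects switch later in the menu.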
Instructions

Your decision sheet shows ten decisions listed on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your ten choices, please let me explain how these choices will affect your earnings for this part of the experiment.
Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, A or B, for the particular decision selected. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end.

Now, please look at Decision 1 at the top. Option A yields a sure loss of $0.20 (minus 20 cents), and option B yields a sure loss of $3.20 (minus 320 cents). Next look at Decision 2 in the second row. Option A yields $7.70 if the throw of the ten-sided die is 1, and it yields $0.20 if the throw is 2–10. Option B yields $4.00 if the throw of the die is 1, and it yields $3.20 if the throw is 2–10. The other decisions are similar, except that as you move down the table, the chances of the worse payoff for each option increase.

To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your payoff for the option you chose for that decision. Payoffs for this choice are negative and will be subtracted from your previous earnings, and you will be paid the sum of all earnings in cash when we finish.

So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B, in each of these boxes, and then the die throw will determine which one is going to count.
We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings for this part. Then you will write your earnings in the blank at the bottom of the page. Please note that losses will be subtracted from your previous earnings up to now. Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question.
1/11/01,2                                                              ID: _______

Decision   Option A                             Option B                             Your Choice (A or B)

1          $3.20 if throw of die is 1–10        $0.20 if throw of die is 1–10
2          $4.00 if throw of die is 1           $7.70 if throw of die is 1
           $3.20 if throw of die is 2–10        $0.20 if throw of die is 2–10
3          $4.00 if throw of die is 1 or 2      $7.70 if throw of die is 1 or 2
           $3.20 if throw of die is 3–10        $0.20 if throw of die is 3–10
4          $4.00 if throw of die is 1–3         $7.70 if throw of die is 1–3
           $3.20 if throw of die is 4–10        $0.20 if throw of die is 4–10
5          $4.00 if throw of die is 1–4         $7.70 if throw of die is 1–4
           $3.20 if throw of die is 5–10        $0.20 if throw of die is 5–10
6          $4.00 if throw of die is 1–5         $7.70 if throw of die is 1–5
           $3.20 if throw of die is 6–10        $0.20 if throw of die is 6–10
7          $4.00 if throw of die is 1–6         $7.70 if throw of die is 1–6
           $3.20 if throw of die is 7–10        $0.20 if throw of die is 7–10
8          $4.00 if throw of die is 1–7         $7.70 if throw of die is 1–7
           $3.20 if throw of die is 8–10        $0.20 if throw of die is 8–10
9          $4.00 if throw of die is 1–8         $7.70 if throw of die is 1–8
           $3.20 if throw of die is 9 or 10     $0.20 if throw of die is 9 or 10
10         $4.00 if throw of die is 1–9         $7.70 if throw of die is 1–9
           $3.20 if the throw of die is 10      $0.20 if the throw of die is 10

Decision used: ________, Die throw: _____, Your earnings: _______.