RESEARCH IN EXPERIMENTAL ECONOMICS
Series Editor: R. Mark Isaac

Recent Volumes:
Volume 7: Emissions Permit Experiments, 1999
Volume 8: Research in Experimental Economics, 2001
Volume 9: Experiments Investigating Market Power, 2002
Volume 10: Field Experiments in Economics, 2005
Volume 11: Experiments Investigating Fundraising and Charitable Contributors, 2006
RESEARCH IN EXPERIMENTAL ECONOMICS
VOLUME 12
RISK AVERSION IN EXPERIMENTS

EDITED BY
JAMES C. COX Andrew Young School of Policy Studies, Georgia State University, Atlanta, USA
GLENN W. HARRISON Department of Economics, College of Business Administration, University of Central Florida, Orlando, USA
United Kingdom – North America – Japan – India – Malaysia – China
JAI Press is an imprint of Emerald Group Publishing Limited
Howard House, Wagon Lane, Bingley BD16 1WA, UK

First edition 2008
Copyright © 2008 Emerald Group Publishing Limited

Reprints and permissions service. Contact: [email protected]

No part of this book may be reproduced, stored in a retrieval system, transmitted in any form or by any means electronic, mechanical, photocopying, recording or otherwise, without either the prior written permission of the publisher or a licence permitting restricted copying issued in the UK by The Copyright Licensing Agency and in the USA by The Copyright Clearance Center. No responsibility is accepted for the accuracy of information contained in the text, illustrations or advertisements. The opinions expressed in these chapters are not necessarily those of the Editor or the publisher.

British Library Cataloguing in Publication Data
A catalogue record for this book is available from the British Library

ISBN: 978-0-7623-1384-6
ISSN: 0193-2306 (Series)
Awarded in recognition of Emerald’s production department’s adherence to quality systems and processes when preparing scholarly journals for print
LIST OF CONTRIBUTORS

Steffen Andersen
Copenhagen Business School, Denmark
Peter Bossaerts
Swiss Federal Institute of Technology, Lausanne, Switzerland
Keith H. Coble
Mississippi State University, USA
James C. Cox
Georgia State University, USA
Glenn W. Harrison
University of Central Florida, USA
Frank Heinemann
Berlin University of Technology, Germany
Charles A. Holt
University of Virginia, USA
Morten I. Lau
Durham University, UK
Susan K. Laury
Georgia State University, USA
Jayson L. Lusk
Oklahoma State University, USA
E. Elisabet Rutström
University of Central Florida, USA
Vjollca Sadiraj
Georgia State University, USA
Nathaniel T. Wilcox
University of Houston, USA
William R. Zame
University of California at Los Angeles, USA
RISK AVERSION IN EXPERIMENTS: AN INTRODUCTION

James C. Cox and Glenn W. Harrison

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 1–7
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00001-X

Attitudes to risk play a central role in economics. Policy makers should know them in order to judge the certainty equivalent of the effects of policy on individuals. What might look like a policy improvement when judged by the average impact could easily entail a welfare loss for risk-averse individuals if the variance of expected impacts is wide compared to the alternatives.

Economists interested in behavior also need to be interested in risk attitudes. In some settings, risk plays an essential role in accounting for behavior: job search and bidding in auctions are two of the best studied. But some assumptions about risk attitudes play a role in many more settings. The predictions of game theory rest on payoffs defined over utility, so we (almost always) need to know something about utility functions in order to make these predictions operationally meaningful. Estimates of subjective discount rates are needed to understand intertemporal choice behavior, and are defined in terms of the present value of utility streams, so we need to know utility functions in order to estimate discount rates reliably.

However, one of the perennial challenges of testing economic theory is that predictions from theory often depend on unobservables. In this setting, an unobservable is some variable that is part of a theory of behavior, but that cannot be directly observed without making some assumption; it is therefore a latent variable to the observer. Experimental methods offer a
significant methodological advance in such settings. In some cases, one can completely sidestep the identification issue by directly inducing values or preferences. In other cases, one can often design an experiment to identify the previously unobservable variable, at least under some assumptions about the rationality of agents. This general point is, in fact, the major methodological innovation of experimental economics. Binswanger (1982, p. 393) was among the first to see the broader implications of experimental methods for the estimation or control of latent variables such as risk attitudes and subjective beliefs. Despite the progress of past decades, risk attitudes are confounding unobservables that have remained latent in a wide range of experiments. The focus of this volume is on the treatment of risk aversion in the experimental literature, including the interpretation of risk aversion as potentially involving more than just the concavity of the utility function. Experimental methods can be viewed now as one of the major tools by which theories are rendered operational. In many cases, it is simply impossible to efficiently test theory without experiments, since too many variables have to be proxied to provide tests that are free of major confounds. Experiments also provide a useful lightning rod for controversies over the interpretation of theory, as we will see. This meta-pedagogic role of experiments is often misunderstood as intellectual navel-gazing. Why spend so much effort trying to understand the behavior of students in a cloistered lab? The answer is simple: if we cannot understand their behavior, with some effort, then we have no business claiming that we can understand behavior in less controlled, naturally occurring settings. This does not mean that any experimental task provides insight into every naturally occurring setting. 
In fact, in their short history experimental economists have been remarkably adept at finding ways in which their procedures or instructions might create unusual or unfamiliar tasks for subjects. The remedy in that case is just to design and run a better experiment for the inferential purpose. So experiments provide a focal point and meeting ground for theorists and applied economists. This volume admirably reflects that role for experiments. Chapters 2–4 provide analyses of topics that arise at the point of contact between experimental economists and theorists over the concept of risk aversion (Cox and Sadiraj [Chapter 2]), ways in which different experimental procedures and estimation methods affect inference about risk (Harrison and Rutström [Chapter 3]), and the sense in which stochastic assumptions should be viewed as substantive theoretical hypotheses about the random parts of behavior (Wilcox [Chapter 4]).
Three surprising themes emerge from these initial chapters, even for those who know the experimental literature reasonably well. First, most of the theoretical, behavioral, and econometric issues that face analysts using expected utility theory (EUT) also apply to those using rank-dependent and sign-dependent alternatives. It is simply not the case that EUT is dead as a descriptive model of broad applicability, or that the inferential tools for applying EUT and alternative models are all that different. It is hard to understand how anyone can read Hey and Orme (1994) and Harless and Camerer (1994) and come to any other conclusion, but many have. We believe that this misreading of the literature comes from an undue focus on special cases, which we liken to "trip-wire" tests of EUT. We say "undue" carefully here, since there is some value in looking at these cases because they allow different qualitative predictions. But they often imply quantitatively and stochastically insignificant predictions, such as "preference reversal" tests if the subjects are risk neutral. We view the challenge of the behaviorists as an implied call to state theoretical implications more explicitly, to design procedures more carefully, and above all to undertake econometric inference more rigorously. Chapters 2–4 review efforts to do that, and systematically reject simplistic conclusions about one or other model of risk attitudes being correct. The second theme is that one cannot maintain the presumed division of labor between the theorist, experimenter, and econometrician. If you write down a theory with no stochastic errors, it can be rejected by the slightest deviation from predicted behavior. Any theory, not just EUT. Absent the archetypal "clean room" in which to undertake our experiments, we should expect some deviations, no matter how small. So we have to say something formal about how we identify those deviations, and what inferential weight to put on them.
The tendency to let these metrics of evaluation be implicit has led to unqualified claims about stylized facts on risk attitudes that do not withstand careful scrutiny. But the moment one starts to be explicit about the metric of evaluation, it becomes clear that the metric chosen has theoretical import for the testable hypotheses of the theory, as well as implications for the design of experiments. These metrics cannot just be an afterthought, as all three chapters illustrate. The third theme is the tendency by theorists and experimental economists to gloss over the difference between in-sample predictions and out-of-sample predictions. Theorists want to make evaluations of the plausibility of empirical estimates of risk attitudes using out-of-sample predictions, and yet ignore the well-known statistical uncertainty that comes from applying
estimates beyond the domain of estimation. On the other hand, experimental economists producing these estimates have been strikingly loath to qualify their claims about risk attitudes as applying only "locally" to the prizes given to subjects. Theorists have to start using econometric language if they want to draw disturbing implications from estimates that come with standard errors, and applied researchers need to be wary of the substantive implications of making alternative stochastic assumptions.

To illustrate this point, which connects Chapters 2–4, consider the estimation of the humble constant relative risk aversion (CRRA) utility function $u(y) = y^{1-r}/(1-r)$ from the responses to the famous binary choice experiments of Hey and Orme (1994). This experiment gave 100 choices to 80 subjects over lotteries defined on prizes of £0, £10, £20, and £30. Maximum likelihood methods from Table 8 of Harrison and Rutström [Chapter 3] generate an estimate of r = 0.613, implying modest risk aversion under EUT. The standard error on this estimate is 0.025, and the 95% confidence interval (CI) is between 0.56 and 0.66, so the evidence of risk aversion is statistically significant and we can reject the hypothesis of risk neutrality (r = 0 here). Fig. 1 shows predicted in-sample utility values and their 95% CI using these estimates. Obviously the cardinal values on the vertical axis are
[Figure appears here in the original; axis tick values omitted.]

Fig. 1. Estimated In-Sample Utility. Estimated from responses of 80 subjects over 100 binary choices. Data from Hey and Orme (1994): choices over prizes of £0, £10, £20, and £30. Point prediction of utility and 95% CIs. Horizontal axis: income in British pounds (£), 0–30.
arbitrary, but the main point is to see how relatively tight the CIs are in relation to the changes in the utility numbers over the lottery prizes. By contrast, Fig. 2 extrapolates to provide predictions of out-of-sample utility values, up to £1000, and their 95% CIs. The widening CIs are exactly what one expects from elementary econometrics. And they would be even wider if we accounted for our uncertainty that this is the correct functional form, and our uncertainty that we had used the correct stochastic identifying assumptions. Moreover, the (Fechner) error specification used here allows for an extra element of imprecision when predicting what a subject would actually choose after evaluating the expected utility of the out-of-sample lotteries, and this does not show up in Fig. 2.

The lesson here is that we have to be cautious when we make theoretical and empirical claims about risk attitudes. If the estimates displayed in Fig. 1 are to be used in the out-of-sample domain of Fig. 2, the extra uncertainty of prediction in that domain should be acknowledged. Chapter 2 shows why we want to make such predictions, for both EUT and non-EUT specifications; Chapter 3 shows how one can marshal experimental designs and econometric methods to do that; and Chapter 4 shows how alternative stochastic assumptions can have strikingly different substantive implications for the estimation of out-of-sample risk attitudes.

[Figure appears here in the original; axis tick values omitted.]

Fig. 2. Estimated Out-of-Sample Utility. Estimated from responses of 80 subjects over 100 binary choices. Data from Hey and Orme (1994): choices over prizes of £0, £10, £20, and £30. Point prediction of utility and 95% CIs. Horizontal axis: income in British pounds (£), 0–1000.
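The contrast between the two figures can be reproduced in a minimal sketch, using only the point estimate r = 0.613 and standard error 0.025 quoted above. This is a crude stand-in for the chapter's own calculations, which use the full maximum likelihood covariance matrix and the delta method; here the band simply comes from evaluating utility over a grid of r values spanning the 95% CI for r.

```python
def crra_utility(y, r):
    """CRRA utility u(y) = y**(1 - r) / (1 - r), for r != 1."""
    return y ** (1.0 - r) / (1.0 - r)

r_hat, se = 0.613, 0.025
# Grid of r values spanning the 95% confidence interval for r.
grid = [r_hat - 1.96 * se + k * (2 * 1.96 * se) / 100 for k in range(101)]

for y in (10, 30, 100, 1000):  # in-sample prizes first, then extrapolation
    u_hat = crra_utility(y, r_hat)
    band = [crra_utility(y, r) for r in grid]
    # Width of the utility band relative to the point prediction.
    rel_width = (max(band) - min(band)) / u_hat
    print(f"y = {y:5d}: u = {u_hat:8.2f}, relative band width = {rel_width:.3f}")
```

Running this shows a band that is tight over the £0–£30 prize range but several times wider, relative to the point prediction, by £1000 — the pattern in Fig. 2, before even accounting for functional-form and stochastic-specification uncertainty.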
The remaining chapters consider a variety of specific issues. Heinemann [Chapter 5] considers the manner in which experimental wealth might be integrated with experimental income, in the context of a reexamination of inferences from the design of Holt and Laury (2002). He proposes estimating the wealth parameter in the subject's utility function along with the risk attitude. One can equivalently view this parameter as referring to baseline consumption, with which the experimental prize is integrated by the subject when evaluating lotteries. The contribution here is to point out how alternative assumptions about the behaviorally relevant argument of the utility function can influence inferences about risk attitudes. Although this is well known theoretically (e.g., Cox & Sadiraj, 2006), it has only recently been explored by experimental economists making inferences about risk attitudes (e.g., Harrison, List, & Towe, 2007; Andersen, Harrison, Lau, & Rutström, 2008).

Lusk and Coble [Chapter 6] examine the effect of the presence of background risk on the foreground elicitation of risk attitudes. The naturally occurring environment in which risky choices are made is not free of risks, and the theoretical literature on portfolio allocation has extensively examined the effects of correlated risks. But a newer strand of theoretical literature examines the role of uncorrelated risk, and the circumstances under which it leads the decision maker to behave as if more or less risk averse in the foreground choice task. The laboratory experiment considered here cleanly identifies a possible role for background risk, in the direction predicted by EUT, and complements the field experiments by Harrison et al. (2007) studying the same hypothesis.
The study carefully points out how these conclusions are conditional on certain plausible modeling assumptions in the ex post analysis of the experimental data, illustrating again one of the general themes noted earlier.

Bossaerts and Zame [Chapter 7] study the presence of risk aversion in experimental asset markets. They find evidence of risk aversion, and also of an "equity premium" once one allows for risk attitudes. Their results extend evidence of risk aversion in "low stakes" settings beyond the type of individual choice task typically used in risk aversion elicitation experiments.

Andersen, Harrison, Lau, and Rutström [Chapter 8] review the use of natural experiments from large-stakes game shows to measure risk aversion. In many cases, these shows provide evidence on contestants making decisions over very large stakes, and in a replicated, structured way. They consider the game shows Card Sharks, Jeopardy!, Lingo, and finally Deal Or No Deal, which have all been examined in the literature in terms of the
implied risk attitudes. They also provide a detailed case study of Deal Or No Deal, since it is one of the cleanest games for inference and has attracted considerable attention. They propose, and test, a general method to overcome the curse of dimensionality that one encounters when estimating risk attitudes in the context of a dynamic, stochastic programming environment.

Finally, Laury and Holt [Chapter 9] consider the "reflection effect" of prospect theory, one of the stylized facts one hears repeatedly about how risk attitudes vary over the gain and loss domains. Extending the popular design first presented in Holt and Laury (2002), they show that the evidence for the reflection effect is not at all clear when one pays subjects for their choices, and that it is arguably just another artifact of using hypothetical responses.

The data, instructions, and statistical code to replicate the empirical analyses in each chapter are available at the ExLab Digital Library at http://exlab.bus.ucf.edu.
ACKNOWLEDGMENTS The authors thank Nathaniel Wilcox for comments and the US National Science Foundation for research support under grants NSF/DUE 0622534 and NSF/IIS 0630805 (Cox) and NSF/HSD 0527675 and NSF/SES 0616746 (Harrison).
REFERENCES

Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008). Eliciting risk and time preferences. Econometrica, 76, forthcoming.
Binswanger, H. (1982). Empirical estimation and use of risk preferences: Discussion. American Journal of Agricultural Economics, 64, 391–393.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1289.
Harrison, G. W., List, J. A., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75, 433–458.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1326.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
RISKY DECISIONS IN THE LARGE AND IN THE SMALL: THEORY AND EXPERIMENT

James C. Cox and Vjollca Sadiraj

1. INTRODUCTION

Much of the literature on theories of decision making under risk has emphasized differences between theories. One enduring theme has been the attempt to develop a distinction between "normative" and "descriptive" theories of choice. Bernoulli (1738) introduced log utility because expected value theory was alleged to have descriptively incorrect predictions for behavior in St. Petersburg games. Much later, Kahneman and Tversky (1979) introduced prospect theory because of the alleged descriptive failure of expected utility (EU) theory (von Neumann & Morgenstern, 1947).

In this essay, we adopt a different approach. Rather than emphasizing differences between theories of decision making under risk, we focus on their similarities – and on their common problems when viewed as testable theories. We examine five prominent theories of decision making under risk – expected value theory, EU theory, cumulative prospect theory, rank-dependent utility theory, and the dual theory of EU – and explain the fundamental problems inherent in all of them. We focus on two generic types of problems that are common to theories of risky decisions: (a) generalized St. Petersburg paradoxes; and (b) implications
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 9–40
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00002-1
of implausible risk aversion. We also discuss the recent generalization of the risk aversion calibration literature, away from its previously exclusive focus on implications of decreasing marginal utility of money, to include implications of probability transformations (Cox, Sadiraj, Vogt, & Dasgupta, 2008b). We also note that much recent discussion of alleged "behavioral" implications of Rabin's (2000) concavity calibration proposition has not involved any credible observations of behavior, and discuss possible remedies, including the experiments reported in Cox et al. (2008b) and other designs for experiments outlined below.

Section 2 of the chapter discusses "utility functionals" that represent risk preferences for the five representative theories of decision making under risk listed above, and defines a general class of theories that contains all of them. In Section 3, we discuss issues that arise if the domain on which theories of decision making under risk are defined is unbounded, as in the seminal papers on the EU theory of risk aversion by Arrow (1971) and Pratt (1964) and the textbook by Laffont (1989). These prominent developments of the theory assume bounded utility (see, for example, Arrow, 1971, p. 92 and Laffont, 1989, p. 8) in order to avoid generalized St. Petersburg paradoxes on an unbounded domain. We demonstrate that this traditional assumption of bounded utility substitutes one type of problem for another because, on unbounded domains, bounded utility implies implausible risk aversion (as defined in Section 3.1 below). Our discussion is not confined to EU theory. We demonstrate that, on an unbounded domain, all five of the prominent theories of risky decisions have arguably implausible implications: with unbounded utility (or "value" or "money transformation") functions there are generalized St. Petersburg paradoxes, and with bounded utility functions there is implausible aversion to risk taking.
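The classic divergence, and Bernoulli's log-utility response to it, can be checked numerically with truncated sums. This is a sketch of the standard game only (pay 2^k with probability 1/2^k), not of the generalized paradoxes the chapter constructs:

```python
import math

def st_petersburg_ev(n_flips):
    """Truncated expected value: each flip contributes 2**k * (1/2**k) = 1."""
    return sum((2 ** k) * (0.5 ** k) for k in range(1, n_flips + 1))

def st_petersburg_log_eu(n_flips):
    """Truncated expected log utility, Bernoulli's proposal: sum of ln(2**k) / 2**k."""
    return sum(math.log(2 ** k) * (0.5 ** k) for k in range(1, n_flips + 1))

print(st_petersburg_ev(50))      # grows linearly with the number of flips: 50.0
print(st_petersburg_log_eu(50))  # converges to 2*ln(2), about 1.386
```

The truncated expected value grows without bound as more flips are allowed, while expected log utility converges; the chapter's point is that similar divergent games can be constructed for any unbounded money transformation function, so the log-utility fix is not general.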
One possible reaction to the analysis in Section 3 might be: "So what? All empirical applications of risky decision theory are on bounded domains, so why should an applied economist care about any of this?" The answer is provided in subsequent sections of the chapter, in which we elucidate how the analysis on an unbounded domain leads one to ask new questions about applications of risky decision theories on bounded domains. We explain how finite St. Petersburg games provide robustness tests for empirical work on risk aversion on bounded domains. We discuss parametric forms of money transformation (or utility) functions commonly used in econometric analysis of lottery choice data and calibrate the implications of parameter estimates in the literature for binary lottery preferences. These implied preferences over binary lotteries provide the basis
for robustness tests of whether the reported parameter estimates can, indeed, rationalize the risk preferences of the subjects. Finally, we consider risk aversion patterns that are not based on parametric forms of money transformation functions or probability transformation functions. We summarize recent within-subjects experiments on the empirical validity of the postulated patterns of risk aversion underlying the concavity calibration literature and extensions of this literature to include convexity calibration of probability transformations. We also explain why some across-subjects experiments on concavity calibration reported in the literature do not, in fact, have any implications for empirical validity of calibrated patterns of small stakes risk aversion.
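The flavor of the concavity-calibration argument can be conveyed in a few lines. The numbers below are a hypothetical illustration, not the experiments of Cox et al. (2008b): suppose an expected-utility agent with concave utility rejects a 50–50 bet to lose 100 or gain 110 at every wealth level. Rejection plus concavity imply u'(w + 110) ≤ (100/110)·u'(w − 100), so marginal utility shrinks by at least that factor over every wealth step of width 210, and the utility of arbitrarily large gains is bounded by a geometric series:

```python
def max_utility_gain(loss=100.0, gain=110.0):
    """Upper bound on u(w + G) - u(w), in units of u'(w), valid for EVERY
    gain G, for a concave expected-utility agent who rejects a 50-50
    lose-`loss`/gain-`gain` bet at every wealth level."""
    step = loss + gain   # marginal utility shrinks over each wealth step...
    shrink = loss / gain  # ...by at least this factor
    # Sum over steps: step * sum_k shrink**k = step / (1 - shrink).
    return step / (1.0 - shrink)

bound = max_utility_gain()
print(f"u(w + G) - u(w) <= {bound:.0f} * u'(w) for any gain G")
# Since losing L costs at least L * u'(w) in utility (concavity again),
# this agent turns down a 50-50 bet losing 2310 against ANY gain.
```

This is the sense in which plausible small-stakes risk aversion calibrates to implausible large-stakes risk aversion; the chapter extends the same logic from concave money transformations to convex probability transformations.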
2. REPRESENTATIVE THEORIES OF DECISION UNDER RISK

Let $\{Y_n; P_n\}$ denote a lottery that pays amounts of money $Y_n = [y_n, y_{n-1}, \ldots, y_1]$ with respective probabilities $P_n = [p_n, p_{n-1}, \ldots, p_1]$, where $n \in N$ (the set of integers), $y_j \geq y_{j-1}$ and $p_j \geq 0$ for $j = 1, 2, \ldots, n$, and $\sum_{j=1}^{n} p_j = 1$. This essay is concerned with theories of preferences over such lotteries. In representing the theories with utility functionals, it will be useful to also define notation for the probabilities of all outcomes except $y_j$: $P_n^{-j} = [p_n, p_{n-1}, \ldots, p_{j+1}, p_{j-1}, \ldots, p_1]$.

We discuss expected value theory (Bernoulli, 1738), EU theory (von Neumann & Morgenstern, 1947), the dual theory of EU (Yaari, 1987), rank-dependent utility theory (Quiggin, 1982, 1993), and cumulative prospect theory (Tversky & Kahneman, 1992). All five of these theories represent risk preferences with utility functionals that have a common form that is additive across states of the world (represented by the index $j = 1, 2, \ldots, n$). This additive form defines a class D of decision theories that contains the five prominent theories above. We will review utility functionals for these five theories before stating the general functional form that can represent each theory's typical functional as a special case.

Expected value theory represents preferences over the lotteries with a functional of the form

$$U_{EV}(\{Y_n; P_n\}) = a + b \sum_{j=1}^{n} p_j y_j, \quad b > 0 \qquad (1)$$
The same EV preferences are represented when functional (1) is simplified by setting $a = 0$ and $b = 1$.^1 We will avoid some otherwise tedious repetitions by using similar affine transformations of utility (or "money transformation") functions, without explicit discussion, for the other theories considered in subsequent paragraphs.

EU theory represents preferences over the lotteries with a functional that can be written as

$$U_{EU}(\{Y_n; P_n\}) = \sum_{j=1}^{n} p_j \, u(y_j; w) \qquad (2)$$
where $w$ is the agent's initial wealth. Utility functionals (1) and (2) are both linear in probabilities, which in the case of EU theory is an implication of the independence axiom. Functional (2) is linear in money payoffs $y$ only if the agent is risk neutral.

EU theory contains (at least) three models. The EU of terminal wealth model (Pratt, 1964; Arrow, 1971) assumes that risk preferences are defined over terminal wealth, i.e., that the "money transformation function" (or utility function) $u$ takes the form $u(y, w) = \varphi_{EUW}(y + w)$. The EU of income model commonly used in bidding theory assumes that risk preferences are independent of wealth, i.e., that the money transformation function takes the form $u(y, w) = \varphi_{EUI}(y)$.^2 The EU of initial wealth and income model (Cox & Sadiraj, 2006) represents risk preferences with a money transformation function of the ordered pair of arguments $(y, w)$. This model includes as special cases the terminal wealth model, in which there is full asset integration; the income model, in which there is no asset integration; and other models in which there is partial asset integration.^3

The dual theory of EU represents preferences over the lotteries with a functional of the form

$$U_{DU}(\{Y_n; P_n\}) = \sum_{j=1}^{n} \left[ f\left(\sum_{k=j}^{n} p_k\right) - f\left(\sum_{k=j+1}^{n} p_k\right) \right] y_j \qquad (3)$$
Functional (3) is linear in payoffs as a consequence of the dual independence axiom. The transformation function f for decumulative probabilities is strictly convex if the agent is risk averse. If the agent is risk neutral then the decumulative probability transformation function f is linear and hence the utility functional (3) is linear in probabilities (in that special case).
Rank-dependent utility theory represents preferences over the lotteries with a functional of the form^4

$$U_{RD}(\{Y_n; P_n\}) = \sum_{j=1}^{n} \left[ q\left(\sum_{k=1}^{j} p_k\right) - q\left(\sum_{k=1}^{j-1} p_k\right) \right] m(y_j) \qquad (4)$$
Prospect theory transforms both probabilities and payoffs, differently for losses than for gains. In the original version of cumulative prospect theory, Tversky and Kahneman (1992) defined gains and losses in a straightforward way relative to zero income. Some more recent versions of the theory have reintroduced the context-dependent gain/loss reference points used in the original version of "non-cumulative" prospect theory (Kahneman & Tversky, 1979). Let $r$ be the possibly non-zero reference point value of money payoffs that determines which payoffs are "losses" ($y < r$) and which payoffs are "gains" ($y > r$), and let the lottery money payoffs $y_j$ be less than $r$ for $j \leq N_r$. Then risk preferences for cumulative prospect theory can be represented with a functional of the form

$$U_{CP}(\{Y_n; P_n\}) = \sum_{j=1}^{N_r} \left[ w^{-}\left(\sum_{k=1}^{j} p_k\right) - w^{-}\left(\sum_{k=1}^{j-1} p_k\right) \right] v^{-}(y_j - r) + \sum_{j=N_r+1}^{n} \left[ w^{+}\left(\sum_{k=j}^{n} p_k\right) - w^{+}\left(\sum_{k=j+1}^{n} p_k\right) \right] v^{+}(y_j - r) \qquad (5)$$
In utility functional (5), $v^{-}$ is the value function for losses, $v^{+}$ is the value function for gains, and $w^{-}$ and $w^{+}$ are the corresponding weighting functions for probabilities (or "capacities"). There is a discontinuity in the slope of the value function at payoff equal to the reference payoff $r$, which is "loss aversion."^5 A strictly concave value function for gains $v^{+}$ and an associated S-shaped probability weighting function $w^{+}$ are commonly used in applications of prospect theory.

The analysis in subsequent sections will use a general form of utility functional that, with suitable interpretations, represents all of the above theories of decision making under risk. Let $h_D$ be a probability transformation function for theory $D$. Let a positively monotonic function $\varphi_D$ denote a money transformation function for theory $D$. Let $w$ be the amount of initial wealth. Let D be the set of decision theories $D$ that represent preferences over lotteries by utility functionals of the form

$$U_D(\{Y_n; P_n\}) = \sum_{j=1}^{n} h_D(p_j, P_n^{-j}) \, \varphi_D(y_j; w) \qquad (6)$$
The additive-across-states form of (6) defines the class D of theories we discuss. This class contains all of the popular examples of theories discussed above. Many results in the following sections apply to all theories in class D. Discussion in subsequent sections will describe some instances in which specific differences between the utility functionals for distinct theories are relevant to the analysis of properties of the theories we examine.

Before proceeding to analyze the implications of functionals of form (6), it might be helpful to further discuss interpretations of (6) using the examples of theories $D$ in D mentioned above. In the case of expected value theory, the probability transformation function $h_D$ in (6), written as $h_{EV}$, is a constant function of $P_n^{-j}$ and is the identity map of $p_j$: $h_{EV}(p_j, P_n^{-j}) = p_j$ for all $(p_j, P_n^{-j})$. The money transformation function $\varphi_{EV}$ is linear in $y$ (or in $y + w$).

Functional (6) is interpreted for EU theory as follows. The probability transformation function $h_{EU}$ is a constant function of $P_n^{-j}$ and is the identity map of $p_j$, as a consequence of the independence axiom. Interpretations of the money transformation function $\varphi_{EU}$ vary across the three EU models, as explained above.

The interpretation of functional (6) for the dual theory of EU is as follows. The money transformation function $\varphi_{DU}$ is always linear in $y$ (or in $y + w$) as a consequence of the dual independence axiom. The probability transformation function $h_{DU}$ is a composition of functions of $\sum_{k \geq j} p_k$ and $\sum_{k \geq j+1} p_k$, as shown in statement (3). The probability transformation function is linear only if the agent is risk neutral.

Functional (6) is interpreted for rank-dependent utility theory as follows. The money transformation function $\varphi_{RD}$ is a constant function of $w$ and is increasing in $y$. The probability transformation function $h_{RD}$ is a composition of functions of $\sum_{k \leq j} p_k$ and $\sum_{k \leq j-1} p_k$, as shown in statement (4).

The interpretation of functional (6) for cumulative prospect theory is the most complicated one, because of the various interdependent special features of that theory. The money transformation function $\varphi_{CP}$ is a constant function of $w$ and increasing in $y$, with a discontinuous change in slope at $y = r$; furthermore, in some versions of the theory the reference point income $r$ can be variable and context dependent. As shown in (5), the probability transformation function $h_{CP}$ is a composition of functions of $\sum_{k=1}^{j} p_k$ and $\sum_{k=1}^{j-1} p_k$ when $y < r$, and a composition of functions of $\sum_{k=j}^{n} p_k$ and $\sum_{k=j+1}^{n} p_k$ when $y \geq r$.

We now proceed to derive some implications of theories in class D whose preferences over lotteries can be represented by utility functionals of the form given by statement (6).
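The additive structure of the general functional can be made concrete in a short sketch. The transformation functions below are hypothetical illustrative choices (identity probability map and linear money map for expected value theory; q(t) = t² and m(y) = √y for the rank-dependent case), not estimates from any experiment:

```python
import math

def utility_D(outcomes, probs, h, phi):
    """General class-D functional: U = sum_j h(j, probs) * phi(y_j).
    `outcomes` is ordered from worst (index 0) to best, matching y_1 <= ... <= y_n."""
    return sum(h(j, probs) * phi(outcomes[j]) for j in range(len(outcomes)))

# Expected value theory: identity probability map, linear money map.
h_ev = lambda j, p: p[j]
phi_ev = lambda y: y

# Rank-dependent utility: transform the cumulative distribution with q and
# take first differences; q and m are hypothetical illustrative choices.
q = lambda t: t ** 2
def h_rd(j, p):
    return q(sum(p[: j + 1])) - q(sum(p[:j]))
phi_rd = lambda y: math.sqrt(y)

lottery_y = [0.0, 10.0, 30.0]  # worst to best
lottery_p = [0.2, 0.5, 0.3]

print(utility_D(lottery_y, lottery_p, h_ev, phi_ev))  # expected value: 14.0
print(utility_D(lottery_y, lottery_p, h_rd, phi_rd))  # rank-dependent utility
```

Note that the rank-dependent decision weights still sum to one, since q(0) = 0 and q(1) = 1 for the choice above; only the money and probability transformations passed in distinguish one class-D theory from another, which is the point of form (6).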
Risky Decisions in the Large and in the Small
3. THEORY FOR UNBOUNDED DOMAIN: ST. PETERSBURG PARADOX OR IMPLAUSIBLE RISK AVERSION

We here discuss theories of decision making under risk in the domain of discourse adopted in classic expositions of EU theory such as Arrow (1971) and Pratt (1964), as well as in advanced textbook treatments such as Laffont (1989). In contrast to those studies, our discussion is not confined to EU theory but, instead, applies to all decision theories in class D. For any money transformation function $\varphi_D$ defined on an unbounded domain, there are only two mutually exclusive cases: the function is either unbounded from above or bounded. In this section we consider both of these cases and show that all decision theories in class D have similar implausible implications. Models for theories in class D that assume unbounded money transformation functions are characterized by generalized St. Petersburg paradoxes. Models for theories in class D that assume bounded money transformation functions are characterized by implausible risk aversion, as defined below.
3.1. Unbounded Money Transformation Functions

Some examples of unbounded money transformation functions are linear functions, power functions, and logarithmic functions. Daniel Bernoulli (1738) introduced the St. Petersburg paradox (as described in the next paragraph), which questioned the plausibility of expected value theory. Bernoulli offered log utility of money as a solution to the St. Petersburg paradox that preserves linearity in probabilities (and in that way anticipated subsequent development of EU theory). However, unbounded monotonic money transformation functions (including log functions) do not eliminate generalized St. Petersburg paradox problems for EU theory (Arrow, 1971, p. 92; Samuelson, 1977). We here explain that unbounded money transformation functions produce similar plausibility problems for other decision theories in class D (see also Rieger & Wang, 2006).

The original St. Petersburg game pays $2^k$ when a fair coin comes up heads for the first time on flip $k$, an event with probability $1/2^k$. The game can be represented by $\{Y^{\infty}, P^{\infty}\} = \{2^{\infty}, 1/2^{\infty}\}$, where $2^{\infty} = [\ldots, 2^n, 2^{n-1}, \ldots, 2]$ and $1/2^{\infty} = [\ldots, 1/2^n, 1/2^{n-1}, \ldots, 1/2]$. Expected value theory evaluates this lottery according to $U_{EV}(\{2^{\infty}, 1/2^{\infty}\}) = \sum_{k=1}^{\infty} 2^k (1/2^k) = \infty$. Bernoulli
JAMES C. COX AND VJOLLCA SADIRAJ
(1738) famously reported that most people stated they would be unwilling to pay more than a small finite amount to play this game. A log utility of money function, offered by Bernoulli as an alternative to the linear utility of money function, does solve the paradox of the original St. Petersburg lottery because $\sum_{k=1}^{\infty} [\ln(2^k)](1/2^k) = 2\ln(2)$ is finite.

It is now well known that the log utility of money function cannot solve the paradox of a slightly modified version of the original St. Petersburg game: pay $\exp(2^k)$ when a fair coin comes up heads for the first time on flip $k$. The problem is not with the log function per se. No unbounded money transformation function can eliminate problems of the St. Petersburg type of paradox for EU theory. For any $\varphi_{EU}$ not bounded from above, define a sequence of payments $X^{EU} = \{x_k : k \in N\}$ such that, for all $k$, $\varphi_{EU}(z_k) \ge 2^k$, where $z_k$ equals either $x_k$ or $w + x_k$ depending on whether one is applying the EU of income model or the EU of terminal wealth model.6 The EU of a St. Petersburg game that pays $x_k$ (instead of $2^k$) when a fair coin comes up heads for the first time on flip $k$ is infinite. This is shown for the EU of income model by

$$U_{EUI}\left(\left\{x^{\infty}, \frac{1}{2^{\infty}}\right\}\right) = \sum_{k=1}^{\infty} \varphi_{EUI}(x_k)\left(\frac{1}{2}\right)^{k} \ge \sum_{k=1}^{\infty} 2^{k}\left(\frac{1}{2}\right)^{k} = \infty \qquad (7)$$

Hence an EU maximizer whose preferences are represented with money transformation function $\varphi_{EUI}$ for amounts of income would prefer game $X^{EU}$ to any certain amount of money, no matter how large. Similarly, an EU maximizer whose preferences are represented with money transformation function $\varphi_{EUW}$ of amounts of terminal wealth would be willing to pay any amount $p$ up to his entire (finite) amount of initial wealth $w$ to play game $X^{EU}$ since, for all $p \le w$,

$$U_{EUW}\left(\left\{x^{\infty}, \frac{1}{2^{\infty}}\right\}\right) = \sum_{k=1}^{\infty} \varphi_{EUW}(w - p + x_k)\left(\frac{1}{2}\right)^{k} \ge \sum_{k=1}^{\infty} 2^{k}\left(\frac{1}{2}\right)^{k} = \infty > \varphi_{EUW}(w) \qquad (8)$$
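The divergence and convergence claims above are easy to check numerically with partial sums. The sketch below is our own illustration (not code from the chapter); it passes the utility of the k-th prize directly, so the modified game's astronomically large payoffs never have to be evaluated:

```python
import math

def partial_value(utility_of_payoff, n_terms):
    """Partial sum of sum_{k>=1} u(x_k) * (1/2)**k for a St. Petersburg game.
    utility_of_payoff(k) returns u(x_k) directly, which avoids overflow."""
    return sum(utility_of_payoff(k) * 0.5**k for k in range(1, n_terms + 1))

# Original game with linear utility: u(2^k) = 2^k, so every term equals 1
# and the partial sums grow without bound (here: 60 after 60 flips).
ev_partial = partial_value(lambda k: 2.0**k, 60)

# Original game with Bernoulli's log utility: u(2^k) = k*ln(2); the series
# converges to 2*ln(2) ≈ 1.386.
log_partial = partial_value(lambda k: k * math.log(2), 60)

# Modified game paying exp(2^k), log utility: ln(exp(2^k)) = 2^k, so the
# terms are again all equal to 1 and the paradox returns.
log_modified = partial_value(lambda k: 2.0**k, 60)

print(ev_partial)    # 60.0 — diverges linearly in the number of terms
print(log_partial)   # ≈ 1.3863 = 2*ln(2)
print(log_modified)  # 60.0 — diverges again
```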
The following proposition generalizes this result and demonstrates that unbounded money transformation functions produce similar plausibility problems for all decision theories in class D.

Proposition 1. Let an agent's preferences defined on an unbounded domain be represented by functional (6) with an unbounded money transformation function $\varphi$ and a strictly positive probability transformation function $h$. The agent will reject any finite amount of money in favor of a St. Petersburg
lottery that pays $x_k \in X_{\varphi,h} = \{x_j \mid j \in N,\ \varphi(x_j) \ge 1/h(1/2^j, [1/2^{\infty}]^{-j})\}$ when a fair coin comes up heads for the first time on flip $k$.

Proof. Apply the Lemma in Appendix A.1.

To illustrate Proposition 1, we report examples of generalized St. Petersburg games for some of the alternatives to EU theory in class D, including the dual theory of EU, rank dependent utility theory, and cumulative prospect theory. First consider the dual theory of EU with positively monotonic transformation $f$ for decumulative probabilities. According to this theory, the St. Petersburg game that pays $x_n$ if the first head appears on flip $n$ is evaluated by

$$U_{DU}(X^{DU}) = \sum_{n:\, x_n \in X^{DU}} x_n \left(f\left(\sum_{k=n}^{\infty} \frac{1}{2^k}\right) - f\left(\sum_{k=n+1}^{\infty} \frac{1}{2^k}\right)\right) = \sum_{n:\, x_n \in X^{DU}} x_n \left(f(2^{1-n}) - f(2^{-n})\right) \qquad (9)$$
which is unbounded from above for $x_n$ from $X^{DU} = \{x_n : n \in N,\ x_n \ge 1/[f(2^{1-n}) - f(2^{-n})]\}$.

Next, consider rank dependent utility theory with transformation function $q$ (for cumulative probabilities). Since $\varphi_{RD}$ is not bounded from above, one can find a sequence of payments $X^{RD} = \{x_n : n \in N,\ \varphi_{RD}(x_n) \ge 1/[q(1-2^{-n}) - q(1-2^{-(n-1)})]\}$. The rank dependent utility of the St. Petersburg game that pays $x_n$, $x_n \in X^{RD}$, if a fair coin comes up heads for the first time on flip $n$ is

$$U_{RD}(X^{RD}) = \sum_{n=1}^{\infty} \varphi_{RD}(x_n)\left(q\left(\sum_{k=1}^{n} \frac{1}{2^k}\right) - q\left(\sum_{k=1}^{n-1} \frac{1}{2^k}\right)\right) = \sum_{n=1}^{\infty} \varphi_{RD}(x_n)\left(q(1-2^{-n}) - q(1-2^{-(n-1)})\right) \qquad (10)$$
which is unbounded by construction of $X^{RD}$.

Finally, consider cumulative prospect theory with reference point equal to a given amount of money $r$. Let $\varphi^{-}_{CP}$ be the money transformation (or "value") function for losses and $\varphi^{+}_{CP}$ be the money transformation function for gains. Let $w^{-}$ be the probability transformation function in the loss domain and $w^{+}$ be the probability transformation function in the gain domain. Assume loss aversion: a discontinuity of the slope of the value function at $x = r$. Define
$X^{CP} = \left\{x_n : n \in N,\ \varphi^{+}_{CP}(x_n - r) \ge 1\Big/\left[w^{+}\left(\sum_{k \ge n} 2^{-k}\right) - w^{+}\left(\sum_{k \ge n+1} 2^{-k}\right)\right]\right\}$. Without loss of generality, let $r$ be between $x_j$ and $x_{j+1}$, for some $j \in N$. The St. Petersburg game that pays $x_n \in X^{CP}$ if a fair coin comes up heads for the first time on flip $n$ is evaluated by cumulative prospect theory as follows:

$$U_{CP}(X^{CP}) = \sum_{i=1}^{j} \varphi^{-}_{CP}(x_i - r)\left(w^{-}\left(\sum_{k=1}^{i} \frac{1}{2^k}\right) - w^{-}\left(\sum_{k=1}^{i-1} \frac{1}{2^k}\right)\right) + \sum_{n=j+1}^{\infty} \varphi^{+}_{CP}(x_n - r)\left(w^{+}\left(\sum_{k=n}^{\infty} \frac{1}{2^k}\right) - w^{+}\left(\sum_{k=n+1}^{\infty} \frac{1}{2^k}\right)\right) \qquad (11)$$

Note that $U_{CP}(X^{CP})$ is unbounded from above since the first term on the right hand side is always finite whereas the second term on the right is unbounded from above by construction of $X^{CP}$. All of the above, of course, is also true if the reference point $r$ is set equal to zero; therefore a prospect theory agent would prefer the lottery $X^{CP}$ to any finite amount of money.

In this way, for any unbounded money transformation function one can construct a generalized St. Petersburg paradox for any of the five decision theories when they are defined on an unbounded domain. Bounded money transformation functions are immune to critique with generalized St. Petersburg lotteries. We will explain, however, that on unbounded domains bounded money transformation functions imply implausible risk aversion, as next defined. Let $\{y_2, p; y_1\}$ denote a binary lottery that pays the larger amount $y_2$ with probability $p$ and the smaller amount $y_1$ with probability $1 - p$. We define "implausible risk aversion" for binary lotteries as follows.

(I) Implausible risk aversion: for any $z$ there exists a finite $L$ such that the certain amount of money $z + L$ is preferred to the lottery $\{\infty, 0.5; z\}$.
3.2. Bounded Money Transformation Functions

In order to escape the behaviorally implausible implications of the generalized St. Petersburg paradox for any theory in class D defined on an unbounded domain, one needs to use a money transformation function that is bounded from above. But bounded money transformation functions imply implausible risk aversion, as we shall explain. We start with two illustrative examples using bounded, parametric money transformation functions commonly used in the literature. Subsequently, we present a
general proposition for bounded money transformation functions that applies to all theories in class D.

One of the commonly used money transformation (or utility) functions in the literature is the (concave transformation of the) exponential function, commonly known as CARA, defined as:7

$$\varphi_D(y) = 1 - e^{-\lambda y}, \quad \lambda > 0 \qquad (12)$$
Define $g_D(0.5) \equiv h_D(0.5, [0.5])$ as the transformed probability of the higher outcome in a binary lottery with 0.5 probabilities of the two payoffs. For the exponential money transformation function in statement (12), it can easily be verified that decision theory D implies that a certain payoff in amount $x - \ln(1 - g_D(0.5))/\lambda$ is preferred to $\{\infty, 0.5; x\}$, for all $x$. For example, an EU maximizing agent (for whom $g(0.5) = 0.5$) with $\lambda = 0.29$ would prefer a certain payoff of $25 (or, in the terminology of Proposition 2, $x + L = \$22 + \$3$) to the lottery $\{\infty, 0.5; \$22\}$. The parameter value $\lambda = 0.07$ implies that an EU maximizing agent would prefer $32 for sure to the lottery $\{\infty, 0.5; \$22\}$.

Another common parametric specification in the recent literature is the expo-power (EP) function introduced by Saha (1993). Using the same notation as Holt and Laury (2002), the EP function is defined as

$$\varphi_D(y) = \frac{1}{a}\left(1 - e^{-a y^{1-r}}\right), \quad r < 1 \qquad (13)$$
The EP functional form converges to a CARA (bounded) function in the limit as $r \to 0$ and it converges to a power (unbounded) function in the limit as $a \to 0$. The power function is commonly known as CRRA.8 For some $(a, r)$ parameter values the EP function is bounded while for other parameter values it is unbounded. With an EP function and $a \ne 0$, a decision theory D implies that $\left(x^{1-r} + (1/a)\ln\left(1/(1 - g_D(0.5))\right)\right)^{1/(1-r)}$ is preferred to $\{\infty, 0.5; x\}$, for any given $x$. For example, an EU maximizing agent with $a = 0.029$ and $r = 0.269$ would prefer a certain payoff in amount $77 to the lottery $\{\infty, 0.5; \$0\}$.

The implied risk aversion for the above examples of money transformation functions would be at least as implausible with use of these parametric forms in cumulative prospect theory and rank dependent utility theory as in EU theory because in these former two theories the probability of the high outcome is pessimistically transformed; i.e., $g_D(0.5) < 0.5$.9 So, if models of cumulative prospect theory and rank dependent utility theory utilize the same bounded money transformation function as an EU model, then if the
EU model predicts preference of a sure amount $x + L$ to the risky lottery $\{G, 0.5; x\}$ for all $G$, so do cumulative prospect theory and rank dependent utility theory.

These examples with commonly used parametric utility functions illustrate a general property of all theories in class D that admit bounded money transformation functions.10 The following proposition generalizes the discussion.

Proposition 2. Consider any theory D in class D defined on an unbounded domain that assumes a bounded money transformation function. For any given $x$ there exists a finite $L$ such that $x + L \succ_D \{\infty, 0.5; x\}$.

Proof. See Appendix A.3.

The import of Proposition 2 can be explicated by considering the special case in which the money transformation function $\varphi_D$ has an inverse function $\varphi_D^{-1}$. In that case the proof of Proposition 2 in Appendix A.3 tells us that if $\varphi_D(y) \le A$ for all $y$, then for any $x > 0$ the certain amount of money $z_D = \varphi_D^{-1}\left(g_D(0.5)A + (1 - g_D(0.5))\varphi_D(x)\right)$ is preferred to a 50/50 lottery that pays $x$ or any positive amount $G$, no matter how large (represented as $\infty$). Clearly, $L = z_D - x$. Proposition 2 tells us that a bounded money transformation function is a sufficient condition for the implication of implausible risk aversion of type (I) with decision theories in class D.
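The closed-form sure amounts in the CARA and EP examples of this section can be checked numerically. The sketch below is our own illustration (function names are ours, not the chapter's):

```python
import math

def cara_sure_amount(x, lam, g=0.5):
    """Sure amount that theory D prefers to {infinity, 0.5; x} under the
    CARA form phi(y) = 1 - exp(-lam*y): equals x - ln(1 - g)/lam."""
    return x - math.log(1.0 - g) / lam

def ep_sure_amount(x, a, r, g=0.5):
    """Sure amount preferred to {infinity, 0.5; x} under the expo-power
    form phi(y) = (1 - exp(-a * y**(1-r))) / a with a > 0."""
    return (x ** (1.0 - r) + math.log(1.0 / (1.0 - g)) / a) ** (1.0 / (1.0 - r))

print(cara_sure_amount(22, 0.29))       # ≈ 24.4, so $25 beats {∞, 0.5; $22}
print(cara_sure_amount(22, 0.07))       # ≈ 31.9, so $32 beats the same lottery
print(ep_sure_amount(0, 0.029, 0.269))  # ≈ 77, so $77 beats {∞, 0.5; $0}
```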
4. THEORY AND EXPERIMENTS FOR BOUNDED DOMAINS

4.1. Does the Original St. Petersburg Paradox have Empirical Relevance?

There is a longstanding debate about the relevance of the original version of the St. Petersburg paradox for empirical economics. The claimed bite of the paradox has been based on thought experiments or hypothetical choice experiments in which it was reported that most people say they would be unwilling to pay more than a small amount of money to play a St. Petersburg game with infinite expected value. A traditional dismissal of the relevance of the paradox is based on the observation that no agent could actually offer a real St. Petersburg game for another to play because such an offer would necessarily involve a credible promise to pay unboundedly large
amounts of money. Recognition that there is a maximum affordable payment can resolve the paradox for expected value theory. For example, if the maximum affordable payment is (or is believed by the decision maker to be) $\$3.3554 \times 10^7$ ($= \$2^{25}$), then the original St. Petersburg lottery is a game that actually pays $\$2^n$ if $n < 25$, and $\$2^{25}$ for $n \ge 25$. The expected value of this game is only $26, so it would not be paradoxical if individuals stated they would be unwilling to pay large amounts to play the game. If the maximum affordable payment is $\$2^{10} = \$1{,}024$ (respectively, $\$2^9 = \$512$), then the expected value is $11 (respectively, $10). It would be affordable to test predictions from expected value theory for the last two lotteries with experiments.
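The expected values of these capped games can be verified by direct summation. A minimal sketch of our own:

```python
def capped_ev(cap_exp):
    """EV of the St. Petersburg game truncated at $2**cap_exp: it pays 2^n
    if the first head is on flip n < cap_exp, and 2^cap_exp otherwise."""
    # Flips 1..cap_exp-1 each contribute 2^n * 2^-n = 1; the cap is paid
    # with the remaining probability 2^-(cap_exp - 1).
    ev = sum(2**n * 0.5**n for n in range(1, cap_exp))
    ev += 2**cap_exp * 0.5 ** (cap_exp - 1)
    return ev

print(capped_ev(25))  # 26.0 with a $2^25 ≈ $33.5 million cap
print(capped_ev(10))  # 11.0 with a $1,024 cap
print(capped_ev(9))   # 10.0 with a $512 cap
```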
4.2. Does the Generalized St. Petersburg Paradox have Empirical Relevance?

It is straightforward to construct affordable St. Petersburg lotteries for any decision theory in class D that assumes unbounded money transformation functions. A corollary to Proposition 1 provides a result for an affordable version of the generalized St. Petersburg game for risk preferences that can be represented by functional (6).

Corollary 1. (An affordable version of the generalized St. Petersburg game) For any given N, consider a St. Petersburg lottery that pays $x_n \in X_{\varphi,h}$ when a fair coin comes up heads for the first time on flip $n$, for $n < N$, and pays $x_N$ otherwise. Let $U$ denote the value of functional (6) for this lottery. Then the agent is indifferent between the lottery and receiving a certain amount $\varphi_D^{-1}(U)$.

Proof. See Appendix A.2.

Let us see what Corollary 1 tells us about one of the commonly used unbounded money transformation functions in the literature, the power function. Suppose that an agent's preferences are assumed to be represented by the EU of income model with CRRA or power function utility (or money transformation) function $\varphi_{EU}(x) = x^{1-r}/(1-r)$ for some $r \in (0, 1)$. Then the lottery prizes can be set equal to $x_n = ((1-r)2^n)^{1/(1-r)}$ for $n < N + 1$, and $x_N$ for $n > N$. The corollary implies that the agent with power function coefficient $r$ would be indifferent between getting $((1-r)(N+1))^{1/(1-r)}$ for sure and playing this game. Figures in the second column of Table 1 are constructed for generalized St. Petersburg games for different values of $r$.
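The power-function case of Corollary 1 can be checked by direct computation. The sketch below (our own illustration) reproduces the r = 0.5 row of Table 1 exactly:

```python
def power_eu_game(r, N):
    """Affordable St. Petersburg game with prizes x_n = ((1-r)*2^n)^(1/(1-r))
    paid for first head on flip n < N, and x_N paid for every n >= N
    (which has total probability 2^-(N-1))."""
    x = [((1 - r) * 2**n) ** (1 / (1 - r)) for n in range(1, N + 1)]
    # EU of income with phi(y) = y^(1-r)/(1-r); each of the first N-1 terms
    # equals 1 and the capped tail contributes 2, so U = N + 1.
    u = sum(x[n - 1] ** (1 - r) / (1 - r) * 0.5**n for n in range(1, N))
    u += x[N - 1] ** (1 - r) / (1 - r) * 0.5 ** (N - 1)
    ce = ((1 - r) * u) ** (1 / (1 - r))  # invert phi to get the CE
    return x, u, ce

x, u, ce = power_eu_game(0.5, 5)
print(x)   # [1.0, 4.0, 16.0, 64.0, 256.0]
print(u)   # 6.0 (= N + 1)
print(ce)  # 9.0, matching Table 1
```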
Table 1. Payments in Finite St. Petersburg Lotteries.^a

Column 1 — EV: $x_n = 2^n$
  L = [2, 4, 8, 16, 32, 64, 128, 256, 512] (EV = 10)

Column 2 — Power function EU: $x_n = ((1-r)2^n)^{1/(1-r)}$
  r = 0.1:    L = [2, 5, 9, 20, 42, 91, 196, 422] (CE(L) = 10.56; EV(L) = 12.19)
  r = 0.5:    L = [1, 4, 16, 64, 256] (CE(L) = 9; EV(L) = 23.5)
  r = 0.56^c: L = [1, 4, 18, 85, 408] (CE(L) = 9.78; EV(L) = 34.56)
  r = 0.67:   L = [1, 3, 19, 155] (CE(L) = 6.45; EV(L) = 23)

Column 3 — DU: $x_n = 1/h_D(2^{-n})$
  f(p) = p/(2-p): L = [2, 6, 14, 30, 62, 126, 254, 510] (CE(L) = 9.6; EV(L) = 16)
  f(p) = p^2:     L = [2, 6, 22, 86, 342] (CE(L) = 6; EV(L) = 32)

Column 4 — CP and RD ($\varphi(x) = x^{\alpha}$): $x_n$ given by Eq. (14)
  $\alpha$ = 0.88^d, $\gamma$ = 0.62^d: L = [2, 10, 17, 24, 35, 50, 75, 115, 180, 284, 454] (CE(L) = 18.50; EV(L) = 11.11)
  $\alpha$ = 0.5^e, $\gamma$ = 0.71^e:  L = [4, 36, 96, 220, 503] (CE(L) = 46.88; EV(L) = 68.19)
  $\alpha$ = 0.37^b, $\gamma$ = 0.56^f: L = [4, 391] (CE(L) = 61.62; EV(L) = 197.5)

^a A prize vector of length k means the lottery pays the nth coordinate when a head appears for the first time on flip n, for n < k, and pays the kth coordinate otherwise.
^b The estimate of alpha is the estimate of Wu and Gonzalez (p. 1686) using Camerer and Ho (1994) data.
^c (field data) Campo, Perrigne, and Vuong (2000).
^d Tversky and Kahneman (1992).
^e Wu and Gonzalez (1996).
^f Camerer and Ho (1994).
Papers on several laboratory and field experiments reported power function (CRRA) estimates in the range 0.44 to 0.67.11 The $r = 0.5$ value in the table is close to the midpoint of these estimates. As shown in Table 1, an EU of income maximizer with power function parameter 0.5 has a certainty equivalent (CE) equal to 9 for the affordable St. Petersburg lottery $\{Y^N, 1/2^N\}$ with prizes $Y^{\infty} = [\ldots, 256, 256, 64, 16, 4, 1]$ and respective probabilities $1/2^{\infty} = [\ldots, 2^{-n}, \ldots, 2^{-2}, 2^{-1}]$.

For cumulative prospect theory with a value function $x^{\alpha}$ and weighting function $w^{+}(p) = p^{\gamma}/(p^{\gamma} + (1-p)^{\gamma})^{1/\gamma}$ and with reference point 0 (as in Tversky & Kahneman, 1992), consider the St. Petersburg game that pays

$$x_n = \left[\frac{(2^{1-n})^{\gamma}}{\left((2^{1-n})^{\gamma} + (1 - 2^{1-n})^{\gamma}\right)^{1/\gamma}} - \frac{(2^{-n})^{\gamma}}{\left((2^{-n})^{\gamma} + (1 - 2^{-n})^{\gamma}\right)^{1/\gamma}}\right]^{-1/\alpha} \qquad (14)$$

if a head appears for the first time on the $n$-th flip, for $n \le N$, and pays $x_N$ if the first head appears on any toss $n \ge N + 1$. According to cumulative prospect theory, the utility of this game is $N + 1$. Hence, the agent will be indifferent between $\$(N+1)^{1/\alpha}$ for sure and playing this game. Similar results hold for rank dependent utility theory.

The last column of Table 1 shows sequences of payments in affordable St. Petersburg lotteries for cumulative prospect theory models with $\alpha$ and $\gamma$ parameter values reported by Camerer and Ho (1994), Tversky and Kahneman (1992), and Wu and Gonzalez (1996). The Wu and Gonzalez parameter values of $(\alpha, \gamma) = (0.5, 0.71)$ imply that a cumulative prospect theory decision maker with zero reference point has a CE of 46.88 for an affordable St. Petersburg lottery $\{Y^N, P^N\}$ with prizes $Y^{\infty} = [\ldots, 503, 503, 220, 96, 36, 4]$. As shown in Table 1, the parameter values $(\alpha, \gamma) = (0.37, 0.56)$ used for rank dependent utility theory and cumulative prospect theory imply that an agent's CE for the lottery $\{Y^N, 1/2^N\}$ with prizes $Y^{\infty} = [\ldots, 391, 391, 4]$ is 61.62. Finally, for the dual theory of EU we report payments involved in a generalized St.
Petersburg game for two specifications of the function $f$: (a) $f(p) = p/(2-p)$ and (b) $f(p) = p^2$. The first specification is offered by Yaari as an example that solves the common ratio effect paradox (Yaari, 1987, p. 105). The second specification is used to demonstrate a rationale for using the Gini coefficient to rank income distributions (Yaari, 1987, p. 106). Generalized versions of the St. Petersburg game involve payments $2^{n+1} - 1$ and $4^n$, respectively. The affordable versions of the generalized
St. Petersburg game are reported in the DU column of Table 1. In case (b) with $f(p) = p^2$, an example is provided by the sequence of payments $v^{DU} = [\ldots, 342, 342, 86, 22, 6, 2]$ with expected value of 32 and dual EU $U_{DU}(v^{DU}, 1/2^{\infty}) = 6$.
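The CP/RD and DU prize vectors in Table 1 can be reproduced numerically. The sketch below is our own reconstruction (not the chapter's code); rounding the exact prizes up to whole currency units is our reading of how the published vectors were obtained:

```python
import math

def tk_weight(p, gamma):
    """Tversky-Kahneman weighting function w+(p) = p^g/(p^g + (1-p)^g)^(1/g)."""
    return p**gamma / (p**gamma + (1 - p) ** gamma) ** (1 / gamma)

def cp_prizes(alpha, gamma, N):
    """Prizes from Eq. (14): x_n = [w+(2^(1-n)) - w+(2^-n)]^(-1/alpha), so
    x_n^alpha times the decision weight of flip n equals one."""
    return [
        (tk_weight(2.0 ** (1 - n), gamma) - tk_weight(2.0**-n, gamma)) ** (-1 / alpha)
        for n in range(1, N + 1)
    ]

def du_prizes(f, N):
    """Dual-theory prizes x_n = 1/[f(2^(1-n)) - f(2^-n)]."""
    return [1.0 / (f(2.0 ** (1 - n)) - f(2.0**-n)) for n in range(1, N + 1)]

# Wu-Gonzalez parameters (alpha, gamma) = (0.5, 0.71), CP/RD column:
print([math.ceil(x) for x in cp_prizes(0.5, 0.71, 5)])
# -> [4, 36, 96, 220, 503]

# Yaari's Gini specification f(p) = p^2 (exact prizes are 4^n/3):
print([math.ceil(x) for x in du_prizes(lambda p: p * p, 5)])
# -> [2, 6, 22, 86, 342]

# Yaari's common-ratio specification f(p) = p/(2 - p):
print([math.ceil(x) for x in du_prizes(lambda p: p / (2 - p), 8)])
# -> [2, 6, 14, 30, 62, 126, 254, 510]
```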
4.3. A Real Experiment with a Finite St. Petersburg Game

An experimental design with clear relevance to evaluating the empirical applicability of expected value theory is to offer subjects a finite St. Petersburg bet whose highest possible payoff is an amount that is known to be affordable for payment by the experimenter. One such experiment, reported by Cox, Sadiraj, and Vogt (2008a), involved offering subjects the opportunity to decide whether to pay their own money to play nine truncated St. Petersburg bets. One of each subject's decisions was randomly selected for real money payoff. Bets were offered for N = 1, 2, …, 9. Bet N had a maximum of N coin tosses and paid €$2^n$ if the first head occurred on toss number $n$, for $n = 1, 2, \ldots, N$, and paid nothing if no head occurred. The price offered to a subject for playing bet N was 25 euro cents lower than €N where, of course, €N was the expected value of bet N. An expected value maximizer would accept all of these bets. The experimenter could credibly offer the game to the subjects because the highest possible payoff was €512 ($= 2^9$) for each subject.

Cox et al. (2008a) report that 47% of their subjects' choices were to reject the opportunity to play the St. Petersburg bets. They use a linear mixture model (Harless & Camerer, 1994) to estimate whether a risk neutral preference model can characterize the data. Let the letter $a$ denote a subject's response that she accepts the offer to play a specific St. Petersburg game in the experiment. Let $r$ denote rejection of the offer to play the game. The linear mixture model is used to address the specific question whether, for the nine St. Petersburg games offered to their subjects, the risk neutral response pattern $(a, a, a, a, a, a, a, a, a)$ or the risk averse response pattern $(r, r, r, r, r, r, r, r, r)$ is more consistent with the data.
Let the stochastic preferences with error rate $\epsilon$ be specified in the following way: (a) if option Z is preferred then Prob(choose Z) $= 1 - \epsilon$; and (b) if option Z is not preferred then Prob(choose Z) $= \epsilon$. The maximum likelihood point estimate of the proportion of subjects for which risk neutral preferences are rejected in favor of risk averse preferences is 0.49, with a Wald 90% confidence interval of (0.30, 0.67). They conclude that 30–67% of the subjects are not risk neutral in this experiment.
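The two-type linear mixture estimator described above can be sketched as follows. This is our own stylized reconstruction with hypothetical data, not the authors' estimation code; it summarizes each subject by an acceptance count and uses a crude grid search rather than a proper optimizer:

```python
import math

def mixture_loglik(params, data, n_tasks=9):
    """Log-likelihood of a two-type mixture: a proportion p of 'reject all'
    (risk averse) types and 1-p of 'accept all' (risk neutral) types, each
    making errors at rate eps. data = per-subject counts of accepted bets."""
    p, eps = params
    ll = 0.0
    for k in data:  # subject accepted k of n_tasks bets
        lik_rn = (1 - eps) ** k * eps ** (n_tasks - k)   # risk neutral type
        lik_ra = eps**k * (1 - eps) ** (n_tasks - k)     # risk averse type
        ll += math.log((1 - p) * lik_rn + p * lik_ra)
    return ll

# Hypothetical accept counts for ten subjects (five near-full acceptors,
# five near-full rejectors), then a grid-search MLE over (p, eps):
data = [9, 9, 0, 1, 8, 0, 9, 2, 0, 9]
best = max(((p / 50, e / 50) for p in range(1, 50) for e in range(1, 25)),
           key=lambda th: mixture_loglik(th, data))
print(best)  # (estimated proportion of risk averse types, error rate)
```

With well-separated data like this, the estimated proportion lands near 0.5 and the error rate near the observed fraction of "off-pattern" choices.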
4.4. Plausibility Checks on Empirical Findings with St. Petersburg Games

Experiments with St. Petersburg games can be designed by following the logic of the discussion in Section 4.2. Of course, as that discussion makes clear, one needs a postulated money transformation function and/or postulated probability transformation function to construct the payoffs for the experiment. But that, in itself, does not rule out the possible empirical relevance of the generalized St. Petersburg game, as can be understood from the following. If a researcher concludes, say, that EU theory with power function utility (or money transformation) function $\varphi_{EU}(x) = \sqrt{x}$ can rationalize risk preferences on a finite domain of payoffs $[z, Z]$, this opens the question of whether the conclusion is plausible, because it implies that the EU maximizing agent would accept all finite St. Petersburg bets with prizes $x_n = 4^n$, $n = 1, 2, \ldots, N$, so long as $4^N \le Z$. The theory implies that the agent with power coefficient 1/2 would reject any sure amount of money up to $\$(N+1)^2$ in favor of playing the finite St. Petersburg lottery with a maximum of N coin tosses that pays $\$4^n$ if the first head occurs on toss number $n$, for $n < N + 1$, and pays $\$4^N$ otherwise. This experiment would be feasible to run for values of N such that $\$4^N$ is affordable. It would provide an empirical check on the plausibility of the conclusion that EU theory with square root power function preferences can rationalize the subjects' risky decisions on domain $[z, Z]$.

For example, a finite version of this game with N = 5 that can be credibly tested in the laboratory is reported in Table 1. Let $Y^5 = [256, 64, 16, 4, 1]$ and $1/2^5 = [2^{-4}, 2^{-4}, 2^{-3}, 2^{-2}, 2^{-1}]$ denote the finite St. Petersburg game that pays $1 if a coin lands "heads" on the first flip, $4 if the coin lands "heads" for the first time on the second flip, $16 if the coin lands "heads" for the first time on the third flip, $64 if the coin lands "heads" for the first time on the fourth flip, and $256 otherwise (with probability $1 - \sum_{n=1}^{4} (1/2)^n$). The expected value of this game is $23.5 whereas $U_{EUI}(Y^5, 1/2^5) = 3$. Hence, the EU of income model predicts that the agent will prefer getting $10 for sure to playing this game. The expected value model, however, predicts that the agent prefers this game to getting $23 for sure.

For cumulative prospect theory, the last column of Table 1 shows sequences of payments in generalized St. Petersburg games. Only payments that are smaller than $500 are reported since that is reasonably affordable in an experiment. Suppose, for instance, that someone has preferences that can be represented by cumulative prospect theory with reference point 0, $\gamma = 0.71$, and $\alpha = 0.5$, as reported by Wu and Gonzalez (1996). A finite version of the generalized St. Petersburg game for this case that can be credibly tested in the laboratory is $v^{CP} = [503, 220, 96, 36, 4]$.
That is, the game pays $4 if a coin lands "heads" on the first flip, $36 if the coin lands "heads" for the first time on the second flip, $96 if the coin lands "heads" for the first time on the third flip, $220 if the coin lands "heads" for the first time on the fourth flip, and $503 otherwise. The expected value of this game is $68.19 whereas $U_{CP}(v^{CP}, 1/2^5) = 5.1$. Hence, cumulative prospect theory with the above parameter specifications predicts that the agent will prefer getting $26 for sure to playing this game. The expected value model, however, predicts that the agent prefers the game to getting $68 for sure. Table 1 also reports examples of lotteries and predictions by rank dependent utility theory and the dual theory of EU, as discussed in Section 4.2.
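The square-root-utility predictions for the N = 5 game are simple enough to verify by hand or in a few lines of code. A minimal sketch of our own:

```python
import math

# Finite St. Petersburg game from Table 1 (power EU, r = 0.5):
prizes = [1, 4, 16, 64, 256]
probs = [0.5, 0.25, 0.125, 0.0625, 0.0625]  # last entry covers all n >= 5

ev = sum(x * p for x, p in zip(prizes, probs))
u_eui = sum(math.sqrt(x) * p for x, p in zip(prizes, probs))  # phi(x) = sqrt(x)
ce = u_eui**2  # invert phi to get the certainty equivalent

print(ev)     # 23.5 -> an EV maximizer prefers the game to $23 for sure
print(u_eui)  # 3.0
print(ce)     # 9.0  -> the sqrt-utility agent prefers $10 for sure
```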
4.5. Plausibility Checks on Empirical Findings with Binary Lotteries

Proposition 2 can provide a researcher with checks on the empirical plausibility of estimates of risk aversion parameters on a finite domain $[z, Z]$. Using the notation of the proposition, questions that are clearly relevant to a finite domain involve payoff amounts $x$ and $x + L$ and $G$, all in the domain of interest, that imply $x + L$ for sure is preferred to $\{G, 0.5; x\}$. Implications such as these provide plausibility checks on reported parameter estimates.

Table 2 presents some implications of two money transformation (or utility) functions using parameter estimates for three experiments with small stakes lotteries reported in the literature. The parameter estimates are taken from Harrison and Rutström (2008, Table 8, p. 120). Unlike the discussion in Section 3.2 above, we here examine the implications of estimated parametric money transformation functions only on the local domains of the data samples used in estimation of the parameters. As shown at the top of Table 2, data are from experiments reported by Holt and Laury (2005), Hey and Orme (1994), and Harrison and Rutström (2008). As shown just below the top of the table, parameter estimates from two functional forms are used: CRRA and EP. As shown at the next level in the table, estimates based on two theories are used: EU of income models (EU) and rank dependent utility models (RD).

The entries in the first and third columns of Table 2 convey the following information. The third column reports parameter estimates for a rank dependent utility model with power functions for both the money transformation and probability transformation functions. Data from the experiment reported in Holt and Laury (2005) yield the parameter estimate $\hat{r} = 0.85$ for the money transformation function and the parameter estimate $\hat{\gamma} = 1.46$ for the probability transformation function. With these parameters,
Table 2. Predictions for Binary Lotteries Using Parameter Point Estimates from Small Stakes Data.

Holt and Laury (2005)
  CRRA, EU: $\hat{r} = 0.76$; CRRA, RD: $\hat{r} = 0.85$, $\hat{\gamma} = 1.46$^b; EP, EU: $\hat{r} = 0.4$, $\hat{a} = 0.07$; EP, RD: $\hat{r} = 0.26$, $\hat{a} = 0.02$^b, $\hat{\gamma} = 0.37$
  Binary Lotteries^a | CRRA, EU | CRRA, RD | EP, EU | EP, RD
  {77, 0.5; 0}       | 4.3      | 0.4      | 15.8   | 8.6
  {30, 0.5; 0}       | 1.7      | 0.2      | 7.5    | 3.4
  {14, 0.5; 0}       | 0.78     | 0.08     | 3.81   | 1.9

Hey and Orme (1994)
  CRRA, EU: $\hat{r} = 0.61$; CRRA, RD: $\hat{r} = 0.61$, $\hat{\gamma} = 0.99$; EP, EU: $\hat{r} = 0.82$, $\hat{a} = -1.06$; EP, RD: $\hat{r} = 0.82$, $\hat{a} = -1.06$, $\hat{\gamma} = 0.99$
  Binary Lotteries^a | CRRA, EU | CRRA, RD | EP, EU | EP, RD
  {30, 0.5; 0}       | 5.1      | 5.1      | 4.6    | 4.6
  {14, 0.5; 0}       | 2.4      | 2.4      | 1.81   | 1.81

Harrison and Rutström (2008) replication of Hey and Orme (1994)
  CRRA, EU: $\hat{r} = 0.53$; CRRA, RD: $\hat{r} = 0.53$, $\hat{\gamma} = 0.97$; EP, EU: $\hat{r} = 0.78$, $\hat{a} = -1.10$; EP, RD: $\hat{r} = 0.78$, $\hat{a} = -1.10$, $\hat{\gamma} = 0.97$
  Binary Lotteries^a | CRRA, EU | CRRA, RD | EP, EU | EP, RD
  {14, 0.5; 0}       | 3.3      | 3.2      | 3.1    | 3.1

^a The higher payoff in a binary lottery is within the range of payoffs used in the experiment. Numbers are in US dollars for the Holt–Laury and Harrison–Rutström studies and in British pounds for the Hey–Orme study.
^b p-values > 0.1.
the rank dependent utility model implies that $0.40 for sure (in column 3) is preferred to the lottery {$77, 0.5; $0} (in column 1). It seems to us likely that almost all people have risk preferences that are inconsistent with this prediction and, in that sense, that the estimated parametric utility function is implausible. Importantly, the prediction that $0.40 for sure is preferred to {$77, 0.5; $0} is clearly testable and, therefore, a conclusion about the plausibility or implausibility of the estimated model can be based on data, not mere opinion. Estimation of the CRRA parameter using the EU of income model and data from Holt and Laury (2005) yields $\hat{r} = 0.76$. With this parameter, as reported in the second column of Table 2, $4.30 for sure is preferred to the lottery {$77, 0.5; $0}. The fourth and fifth columns of Table 2 report parameter estimates for the EP money transformation function. The parameter estimates imply that $8.60 for sure is preferred to the lottery {$77, 0.5; $0} for the rank dependent utility model. The preferred sure amount of money is $15.80 in the case of the EU of income model.

Table 2 uses point estimates of parameters from three data sets and four combinations of money transformation and probability transformation functions to derive implied preferences for sure amounts of money (in all columns except the first) over binary lotteries (in the first column). All of these implied preferences are stated on domains that are the same as or smaller than those of the data samples. Furthermore, all of these implied preferences are testable with real, affordable experiments. Conducting such tests would provide data to inform researchers' decisions about whether the estimated parametric forms provide plausible or implausible characterizations of the risk attitudes of the subjects in experiments.
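Several of the CRRA entries in Table 2 can be reproduced as follows. The form of the probability transformation is our assumption (we use the one-parameter Tversky–Kahneman form, which reproduces the published numbers); the text specifies only that power functions were estimated:

```python
def crra_ce_eu(high, r):
    """EU of income with phi(y) = y^(1-r): CE of the lottery {high, 0.5; 0}."""
    return high * 0.5 ** (1 / (1 - r))

def crra_ce_rd(high, r, gamma):
    """Rank dependent utility: TK-form decision weight on the high outcome
    (this weighting form is our assumption, not stated in the text)."""
    w = 0.5**gamma / (0.5**gamma + 0.5**gamma) ** (1 / gamma)
    return high * w ** (1 / (1 - r))

print(round(crra_ce_eu(77, 0.76), 2))        # ≈ 4.3, as in Table 2
print(round(crra_ce_rd(77, 0.85, 1.46), 2))  # ≈ 0.4, as in Table 2
print(round(crra_ce_eu(14, 0.76), 2))        # ≈ 0.78, as in Table 2
```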
Finally, similar experiments can be designed with binary lotteries based on any parameter estimates within the 90% confidence limits of the estimation if a researcher wants to thoroughly explore the plausibility question. In the preceding sections, we have explored testable implications for empirical plausibility of parametric forms of decision theories in class D. Some recent studies have identified patterns of risk aversion, known as calibration patterns, that can be used to test plausibility of theories under risk without any parametric specifications. Concavity calibrations involve certain types of patterns of choices that target decision theories under risk that assume concave money transformation (or utility) functions (Rabin, 2000; Neilson, 2001; Cox & Sadiraj, 2006; Rubinstein, 2006). Convexity calibrations, on the other hand, involve patterns of risk aversion that apply to theories that represent risk aversion with probability transformation functions (Cox et al., 2008b). The following three sections summarize what
is currently known about the empirical validity of patterns of risk aversion underlying calibration propositions.
4.6. Do Concavity Calibrations of Payoff Transformation (or Utility) Functions have Empirical Relevance?

Cox et al. (2008b) report an experiment run in Calcutta, India to test the empirical validity of a postulated pattern of small stakes risk aversion that has implications for cumulative prospect theory, rank dependent utility theory, and all three EU models discussed in Cox and Sadiraj (2006): the EU of terminal wealth model, the EU of income model, and the EU of initial wealth and income model. Subjects in the Calcutta experiment were asked to choose between a certain amount of money, x rupees (option B), and a binary lottery that paid x − 20 rupees or x + 30 rupees with equal probability (option A), for values of x from a finite set Ω. Subjects were informed that one of their decisions would be randomly selected for payoff. The amount at risk in the lotteries (50 rupees) was about a full day's pay for the subjects in the experiment. By Proposition 2 and Corollary 2 in Cox et al. (2008b), if a subject chooses option B for at least four sequential values of x, then calibration of the revealed pattern of small stakes risk aversion implies behaviorally implausible large stakes risk aversion. They call any choice pattern that meets this criterion a "concavity calibration pattern" and test a null hypothesis that the data are not characterized by concavity calibration patterns against an alternative that includes them. To conduct the test, Cox et al. (2008b) applied a linear mixture model similar to that described in Section 4.3. The reported point estimate for the proportion of subjects in the Calcutta experiment who made choices for which EU theory, rank dependent utility theory, and cumulative prospect theory (with 0 reference point payoff) imply implausible large stakes risk aversion was 0.495, with a Wald 90% confidence interval of (0.289, 0.702).
They conclude that 29–70% of the subjects made choices that, according to three theories of risky decision making, can be calibrated to imply implausible large stakes risk aversion. According to Proposition 2 and Corollary 2 in Cox et al. (2008b), this conclusion applies to all theories in class D that represent risk preferences with concave transformations of payoffs. Thus the conclusion applies to all EU models regardless of whether they specify full asset integration (the terminal wealth model), no asset integration (the income model), or partial asset integration (variants of the initial wealth and income model).
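The classification rule behind the test can be sketched in a few lines. The code below is our illustration, not the authors' code: option B is the sure amount, and a "concavity calibration pattern" is a run of at least four consecutive safe choices as x varies; the function name and the example choice sequences are hypothetical.

```python
def has_concavity_calibration_pattern(choices, min_run=4):
    """choices: list of 'A' (lottery paying x-20 or x+30) or 'B' (x for
    sure), ordered by the certain payoff x. Returns True if the subject
    chose the safe option B for at least min_run sequential values of x."""
    run = 0
    for c in choices:
        run = run + 1 if c == 'B' else 0
        if run >= min_run:
            return True
    return False

# Hypothetical choice sequences over six x values:
print(has_concavity_calibration_pattern(['A', 'B', 'B', 'B', 'B', 'A']))  # True
print(has_concavity_calibration_pattern(['B', 'B', 'A', 'B', 'B', 'A']))  # False
```

Under Proposition 2 and Corollary 2 in Cox et al. (2008b), a subject whose sequence triggers this rule has revealed small stakes risk aversion that calibrates to implausible large stakes risk aversion.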
JAMES C. COX AND VJOLLCA SADIRAJ
Prospect theory can be immunized against the concavity calibration critique by introducing variable reference points set equal to the x values in the Calcutta experiment (Wakker, 2005). The variable reference points do not, however, immunize prospect theory against other tests with data from the experiment, because they imply that a subject will make the same choice (of the lottery or the certain payoff) for all values of the sure payoff x. Cox et al. (2008b) report that a likelihood ratio test rejects this "non-switching hypothesis" in favor of an alternative that allows for one switch at the 5% significance level. Adding possible choice patterns with more than one switch to the alternative hypothesis would also lead to rejection of the non-switching hypothesis. Hence, variable reference points do not rescue cumulative prospect theory from inconsistency with the data from the experiment.
4.7. Do Convexity Calibrations of Probability Transformation Functions have Empirical Relevance?

Cox et al. (2008b) demonstrate that the problem of possibly implausible implications from theories of decision making under risk is more generic than the implausible implications of decreasing marginal utility of money, by extending the calibration literature in their Proposition 1 to include the implications of convex transformations of decumulative probabilities used to model risk aversion in the dual theory. They report another experiment, run in Magdeburg, Germany, in which subjects were asked to make nine choices between pairs of lotteries. Subjects were informed that one of their decisions would be randomly selected for payoff. Decision task i, for i = 1, 2, …, 9, presented a choice between lottery A, which paid €40 with probability i/10 and €0 with probability 1 − i/10, and lottery B, which paid €40 with probability (i − 1)/10, €10 with probability 2/10, and €0 with probability 1 − (i + 1)/10. By Proposition 1 in Cox et al. (2008b), if a subject chooses lottery B for at least seven sequential values of the probability index i then calibration of the revealed pattern of small stakes risk aversion implies implausible large stakes risk aversion for the dual theory. They call any choice pattern that meets this criterion a "convexity calibration pattern" and test the null hypothesis that the data are not characterized by convexity calibration patterns against an alternative that includes them. Again applying a linear mixture model, Cox et al. (2008b) report a point estimate of 0.81 and a Wald 90% confidence interval of (0.66, 0.95) for the proportion of subjects for which the dual theory implies implausible risk aversion. Thus the data
are consistent with the conclusion that 66–95% of the subjects made choices that, according to the dual theory, can be calibrated to imply implausible large stakes risk aversion.
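The dual-theory logic behind this pattern can be illustrated numerically. The sketch below is our construction, not the authors' code: it values each Magdeburg lottery pair under Yaari's dual theory, summing payoff increments weighted by a transformation f of decumulative probabilities. The two transformations are hypothetical illustrations; a mildly convex f never strictly favors the safer lottery B, while a sharply convex f produces the long run of B choices that constitutes a convexity calibration pattern.

```python
# Dual-theory (Yaari) valuation of the nine Magdeburg lottery pairs,
# with illustrative convex probability transformations f (our choices,
# not estimates from the experiment).

def dual_value(lottery, f):
    """lottery: list of (payoff, prob) pairs. Dual-theory value: sum over
    ascending payoff levels of (level increment) * f(P(payoff >= level))."""
    pts = sorted(lottery)
    value, lower = 0.0, 0.0
    for payoff, _ in pts:
        tail = sum(q for y, q in pts if y >= payoff)  # decumulative prob
        value += (payoff - lower) * f(tail)
        lower = payoff
    return value

def choices(f):
    """Predicted choice ('A' or 'B') in each task i = 1..9, picking B
    only when it has strictly higher dual-theory value."""
    picks = []
    for i in range(1, 10):
        A = [(40, i / 10), (0, 1 - i / 10)]
        B = [(40, (i - 1) / 10), (10, 0.2), (0, 1 - (i + 1) / 10)]
        picks.append('B' if dual_value(B, f) > dual_value(A, f) else 'A')
    return picks

print(choices(lambda p: p ** 2))   # mildly convex: A in every task
print(choices(lambda p: p ** 10))  # sharply convex: B through task 8
```

With f(p) = p², the safer lottery B is never strictly preferred; with f(p) = p¹⁰, B is chosen for the first eight tasks, well past the seven sequential B choices that trigger the convexity calibration.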
4.8. Is the Expected Utility of Terminal Wealth Model More (or Less) Vulnerable to Calibration Critique than Other Theories?

Rabin (2000) initiated the recent literature on the large stakes risk aversion implications of calibrating postulated patterns of small stakes risk aversion. His analysis is based on the supposition that an agent will reject a small stakes gamble with equal probabilities of winning or losing relatively small amounts, and that the agent will do this at all initial wealth levels in some large interval. For example, Rabin demonstrated that if an agent would reject a 50/50 bet in which she would lose $100 or gain $110 at all initial wealth levels up to $300,000, then the EU of terminal wealth model implies that, at an initial wealth level of $290,000, that agent would also reject a 50/50 bet in which she would lose $6,000 or gain $180 million. Rabin (2000) and Rabin and Thaler (2001) stated strong conclusions about implausible risk aversion implications for EU theory implied by their supposed patterns of small stakes risk aversion, but reported no experiments supporting the empirical validity of the suppositions. Their conclusions about EU theory were taken quite seriously by some scholars (Kahneman, 2003; Camerer & Thaler, 2003) and by a Nobel Prize committee (Royal Swedish Academy of Sciences, 2002, p. 16), despite the complete absence of data consistent with the supposed patterns of small stakes risk aversion underlying the concavity calibrations. It is ironic that, in this heyday of behavioral economics, strong conclusions about the behavioral plausibility of theory could be drawn without any actual observations of behavior.
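The flavor of Rabin's calibration can be conveyed with a concrete special case. The sketch below is our illustration, not Rabin's construction (his theorem holds for any concave utility): it uses CARA utility, u(w) = 1 − exp(−αw), because under CARA the accept/reject decision is independent of wealth, so the premise "rejects the small bet at all wealth levels" holds automatically; the coefficient α = 0.001 is a hypothetical value chosen to just reject the small bet.

```python
# CARA illustration of Rabin-style calibration in the EU of terminal
# wealth model (our example; alpha is hypothetical).
import math

ALPHA = 0.001  # absolute risk aversion coefficient

def accepts_5050(lose, gain, alpha=ALPHA):
    """Accept iff EU of the bet exceeds utility of current wealth; under
    CARA, u(w) = 1 - exp(-alpha*w), the wealth level cancels, leaving the
    condition 0.5*exp(alpha*lose) + 0.5*exp(-alpha*gain) < 1."""
    return 0.5 * math.exp(alpha * lose) + 0.5 * math.exp(-alpha * gain) < 1.0

print(accepts_5050(100, 110))            # False: rejects the small Rabin bet
print(accepts_5050(6_000, 180e6))        # False: also rejects lose $6,000 /
                                         # gain $180 million
print(accepts_5050(6_000, float('inf'))) # False: rejects lose $6,000 against
                                         # ANY gain, however large
```

An agent calibrated to reject the lose $100/gain $110 bet at every wealth level also turns down a 50/50 bet of losing $6,000 against an arbitrarily large gain, which is the implausible large stakes risk aversion at issue.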
As explained by Cox and Sadiraj (2006), observations of behavior consistent with the pattern of risk aversion supposed in Rabin's concavity calibration would have limited implications for risky decision theory: they would have no implications for EU models other than the terminal wealth model, nor for other theories in class D in which income rather than terminal wealth is postulated as the argument of functional (6). Furthermore, an experiment that could provide empirical support for Rabin's supposition would have to be conducted with a within-subjects design, as we shall explain after first explaining problems with across-subjects experiments in the literature.
Barberis, Huang, and Thaler (2003) report an across-subjects, hypothetical experiment with a 50/50 lose $500 or gain $550 bet using as subjects MBA students, financial advisers, investment officers, and investor clients. They report that about half of the subjects stated they would be unwilling to accept the bet. They do not report wealth data for these subjects, nor the relationship, if any, between subjects' decisions and their wealth levels; therefore the relation between the subjects' decisions and the supposed pattern of risk aversion used in concavity calibration propositions is unknown. Barberis et al. (2003) also report an across-subjects, real payoff experiment with a 50/50 lose $100 or gain $110 bet using MBA students as subjects. They report that only 10% of the subjects were willing to play the bet. No wealth data are reported for these subjects either. It is straightforward to show that any across-subjects experiment involving one choice per subject cannot provide data that would support the conclusion of implausible risk aversion. Suppose one has a sample from an experiment (like the two Barberis et al., 2003 experiments) in which each of N subjects is asked to make one decision about accepting or rejecting a 50/50 lose $100 or gain $110 bet. Suppose that the initial wealth level of every subject is observed and that these wealth levels vary across a large range, say from a low of $100 to a high of $300,000. Would such a data sample provide support for any conclusion about the EU of terminal wealth model? Without making other assumptions about preferences, the answer is clearly "no," as we next explain. Suppose that we observe individual wealth levels w̃_j ∈ [100, 300K], j = 1, 2, …, N, for each of N individuals and that every one of them rejects the 50/50 lose $100, gain $110 bet. Can they all be EU of terminal wealth maximizers with globally plausible risk aversion?
Yes, and the following equation can be used to generate N utility functions with parameters a_j and r_j, each of which implies indifference between accepting and rejecting the bet at the observed individual wealth levels:

2 = (1 − 100/(w̃_j − a_j))^{r_j} + (1 + 110/(w̃_j − a_j))^{r_j},  a_j < w̃_j − 100.  (15)

Any ordered pair of parameters (a(w̃_j), r(w̃_j)) below the graph of the level set of this equation can be used to construct a utility function

u_j(w̃_j + y) = (−a(w̃_j) + w̃_j + y)^{r(w̃_j)}  (16)

that implies rejection of the bet for an EU of terminal wealth maximizer with initial wealth w̃_j and money transformation function given by (16). But
data-consistent utility functions for all subjects exhibit plausible risk aversion globally. Therefore, the empirical relevance of Rabin's concavity calibration for the EU of terminal wealth model cannot be tested with such an across-subjects experiment. The empirical validity of Rabin's concavity calibration for the EU of terminal wealth model could, however, be tested with a within-subjects experiment. Let subject j have initial wealth w_j at the beginning of the experiment. In round t of the experiment, give subject j an amount of money x_t and an opportunity to play a 50/50 bet with loss of 100 or gain of 110. Choose the set X of values for x_t so that there are enough observations covering a sufficiently large range that concavity calibration can bite. An example of a suitable specification of the set X is provided by the sets of certain income payoffs used in the Calcutta experiment reported in Cox et al. (2008b) and summarized above. Consider the set of certain payoff x values used in the Calcutta experiment; define X = {100, 1K, 2K, 4K, 5K, 6K} and let x_t denote the value in position t in this set. Using subject j's (observed) initial wealth w_j at the beginning of the experiment, and the controlled values x_t, t = 1, 2, …, 6, define subject j's variable initial wealth level during the experiment as ω_jt = w_j + x_t. At round t in the experiment, give the subject x_t and then ask her whether she wants to accept the 50/50 gamble with loss amount 100 and gain amount 110. If the answer is "no" for at least four sequential values of x then Proposition 2 in Cox et al. (2008b) and Rabin's (2000) concavity calibration proposition imply implausible risk aversion for the EU of terminal wealth model. Therefore this type of "pay-x-in-advance," within-subjects experiment could support, or fail to support, the empirical relevance of Rabin's concavity calibration supposition for the terminal wealth model.12
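The construction in Eqs. (15) and (16) can be checked numerically. The solver below is our sketch, not the authors' code: given an observed wealth w and a chosen shift parameter a (both hypothetical values here), it bisects Eq. (15) for the exponent r that makes an agent with utility u(z) = (z − a)^r exactly indifferent to the 50/50 lose-$100/gain-$110 bet at wealth w.

```python
# Bisection solver for Eq. (15) and a check of the implied indifference
# (our numerical sketch; w, a, and the bisection bracket are assumptions).

def solve_r(w, a, lo=1e-6, hi=1.0, tol=1e-12):
    """Solve 2 = (1 - 100/(w-a))**r + (1 + 110/(w-a))**r for r in (0, 1)."""
    g = lambda r: (1 - 100 / (w - a)) ** r + (1 + 110 / (w - a)) ** r
    assert g(lo) < 2 < g(hi), "bracket needs w - 1100 < a < w - 100"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < 2 else (lo, mid)
    return (lo + hi) / 2

w, a = 10_000, 9_500          # hypothetical observed wealth and shift
r = solve_r(w, a)
u = lambda z: (z - a) ** r    # utility function of the form in Eq. (16)

# Indifference: utility of wealth equals expected utility of the bet.
print(abs(u(w) - 0.5 * (u(w - 100) + u(w + 110))) < 1e-6)  # True
```

Pairs (a, r) below this level set yield rejection of the bet, which is the sense in which any cross-section of rejections can be rationalized subject by subject.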
5. SUMMARY IMPLICATIONS FOR THEORIES OF RISKY DECISIONS

Some implications for theories of decision making under risk are straightforward while others are nuanced.

5.1. Decision Theories on Unbounded Domains have Implausible Implications

One implication is that all theories in class D have the same problems with respect to the plausibility of modeling decisions under risk on an unbounded
domain. This conclusion follows from the demonstration that, on an unbounded domain, theories in class D are characterized by either the generalized St. Petersburg paradox or implausible aversion to risk of type (I). This raises doubts about the plausibility of classic developments of EU theory for risky decisions (Pratt, 1964; Arrow, 1971). But the plausibility critique is not confined to EU theory; it applies to all theories in a class that contains cumulative prospect theory, rank dependent utility theory, and the dual theory of EU (as well as EU theory). In this sense, the fundamental problems shared by these theories may be more significant than their much-touted differences.
5.2. Implications for Theory and its Applications on Bounded Domains

Theories of risky decisions defined on bounded domains can be characterized by the generalized St. Petersburg paradox, or by implausible large stakes risk aversion, or by neither problem. Conclusions for theory on bounded domains are more nuanced, and more complicated, but they are empirically testable. Concavity calibration of postulated patterns of risk aversion has implausible large stakes risk aversion implications for all theories in class D that incorporate decreasing marginal utility of money, except for specific versions of prospect theory that postulate variable reference points (which are rejected by tests with the data). Implausible implications from calibrating postulated patterns of risk aversion are not confined to theories with decreasing marginal utility of money. The dual theory of EU, characterized by constant marginal utility of money, can be critiqued with convexity calibration of the probability transformation that exclusively incorporates risk aversion into this theory. Whether critiques with the generalized St. Petersburg paradox or (calibrated) implausible large stakes risk aversion have bite for theories defined on bounded domains is an empirical question. The reason for this is apparent: people may accept feasible St. Petersburg bets and/or they may not reject the small stakes bets postulated in calibrations. If both of those outcomes were observed, the St. Petersburg paradox and calibration critiques would not imply implausibility of theory on bounded domains. To date, the empirical evidence is limited. As discussed above, even on very large bounded domains, expected values of St. Petersburg bets are quite small, of the order of $25, which (for what it's worth) is consistent with commonly reported subjects' statements about
willingness to pay to play the bets in hypothetical experiments. In one real payoff experiment with finite St. Petersburg bets reported by Cox et al. (2008a), 30–67% of the subjects revealed risk preferences that were inconsistent with the expected value model. No existing study supports the conclusion that terminal wealth models are more vulnerable to calibration critique than income models. There are various misstatements in the literature about the existence of data supporting Rabin's (2000) supposition that an agent will reject a given small stakes bet at all initial wealth levels in a wide interval. In fact, there is no test of Rabin's supposition in the literature. Furthermore, a test of this supposition would, in any case, have no implications for models in which income rather than terminal wealth is the argument of utility functionals (Cox & Sadiraj, 2006). The within-subjects, real payoff Calcutta experiment with concavity calibration reported by Cox et al. (2008b) has implications for all three EU models, rank dependent utility theory, and the original version of cumulative prospect theory with constant reference point equal to zero income (Tversky & Kahneman, 1992). In the Calcutta experiment, 25–62% of the subjects made patterns of small stakes risky choices for which EU theory, rank dependent utility theory, and prospect theory (with zero reference point payoff) imply implausible large stakes risk aversion. Variable reference points can be incorporated into prospect theory in ways that immunize the theory against concavity calibration critique with this experimental design. But the testable implication of this version of prospect theory has a high rate of inconsistency with data from the Calcutta experiment and is rejected in favor of the "calibration pattern" by a likelihood ratio test.
The Magdeburg experiment with convexity calibration for probability transformations (Cox et al., 2008b) has implications for the dual theory of EU, which has constant marginal utility of money and incorporates risk aversion solely through non-linear transformation of probabilities. In this experiment, 66–95% of the subjects made patterns of risky choices for which the dual theory of EU implies implausible large stakes risk aversion. We conclude that, together, the Calcutta concavity calibration experiment and the Magdeburg convexity calibration experiment provide data that invite skepticism about the plausibility of popular theories of decision making in risky environments. However, more experiments and larger samples are needed to arrive at definitive conclusions about the empirical relevance of the calibration propositions. One thing that is clear is that the traditional focus on decreasing marginal utility of money as the source of implausible
implications from calibration of postulated patterns of risk aversion is wrong; modeling risk aversion with probability transformations can also produce implausible implications from calibration. Empirical conclusions that estimated parametric forms of utility functionals can represent subjects' behavior in risky decision making can be checked for plausibility by applying the research methods explained here. Two types of questions can be posed. First, does the estimated parametric form survive testing with St. Petersburg lotteries that can be derived from the parametric form using methods explained above? Second, does the estimated parametric form of a utility functional survive experimental testing with binary lottery designs that can be derived from the parametric form using methods explained above? If the answer to either question is "no" then the conclusion that the estimated utility functional can rationalize risk taking behavior is called into question.
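The first of these checks can be made concrete. The sketch below is our construction, following the idea that St. Petersburg payoffs can be derived from an estimated utility function (as in Note 6): with CRRA utility over income, u(y) = y^ρ, setting the payoff for a first head on flip n to z_n = u⁻¹(2ⁿ) makes each branch contribute one unit of expected utility. The estimate ρ = 0.5 and the ten-flip truncation are hypothetical.

```python
# Deriving a finite St. Petersburg test lottery from a fitted CRRA
# utility (our sketch; rho and the truncation length are assumptions).

def st_petersburg_payoffs(rho, n_flips):
    """z_n = u^{-1}(2**n) = (2**n)**(1/rho): payoff when the first head
    occurs on flip n, so u(z_n) * (1/2**n) = 1 for every branch."""
    return [(2 ** n) ** (1 / rho) for n in range(1, n_flips + 1)]

def expected_utility(payoffs, rho):
    return sum(z ** rho / 2 ** n for n, z in enumerate(payoffs, start=1))

rho = 0.5                                # hypothetical CRRA estimate
zs = st_petersburg_payoffs(rho, n_flips=10)
eu = expected_utility(zs, rho)           # about 10: one util per branch
ce = eu ** (1 / rho)                     # certainty equivalent, about $100
ev = sum(z / 2 ** n for n, z in enumerate(zs, start=1))  # about $2,046
print(ce, ev, max(zs))
```

The fitted model thus predicts indifference between roughly $100 for sure and a game with an expected value of about $2,046 and a top prize over $1 million; whether subjects actually behave that way is exactly the testable question posed in the text.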
NOTES

1. The EV theory of risk preferences has the same implications if terminal wealth rather than income is assumed to be the random lottery payoff in the functional in statement (1).

2. The expected utility of income model was used to develop much of Bayesian–Nash equilibrium bidding theory. See, for examples: Holt (1980), Harris and Raviv (1981), Riley and Samuelson (1981), Cox, Smith, and Walker (1982), Milgrom and Weber (1982), Matthews (1983), Maskin and Riley (1984), and Moore (1984).

3. See Harrison, List, and Towe (2007) and Heinemann (2008) for empirical applications of partial asset integration models.

4. We write the functional for rank dependent utility theory with transformation of cumulative probabilities in the same way as Quiggin (1982, 1993). Some later expositions of this theory use a logically equivalent representation with transformation of decumulative probabilities.

5. Loss aversion, when defined as a discontinuity in the slope of the utility function at zero income, is consistent with the expected utility of income model (Cox & Sadiraj, 2006).

6. If there exists an inverse function φ_EU⁻¹ then the sequence of payoffs z_n = φ_EU⁻¹(2ⁿ) provides a generalized St. Petersburg game with infinite expected utility.

7. In the context of the expected utility of terminal wealth model, utility function (12) represents constant absolute risk averse preferences, which is the source of the name CARA.

8. For the case of the expected utility of terminal wealth model, power function utility represents constant relative risk averse preferences, which is the source of the name CRRA.

9. Tversky and Kahneman (1992, p. 300) provide the value g_D(0.5) = 0.42 (where, in our notation, g_D is the same as their probability weighting function for gains, w⁺).
10. Clearly, Proposition 2 does not apply to expected value theory and the dual theory of expected utility because their money transformation functions are (linear and hence) unbounded.

11. As cited in Holt and Laury (2002, fn. 9, p. 1649), CRRA estimates in the range 0.44–0.67 were reported by Campo et al. (2000), Chen and Plott (1998), Cox and Oaxaca (1996), Goeree and Holt (2004), and Goeree, Holt, and Palfrey (2002, 2003). Harrison, Lau, and Rutström (2007) report CRRA estimates within the same range using field experiment data.

12. In contrast, this type of experiment could not produce data that would have a calibration-pattern implication for any of the other models discussed above for which income, not terminal wealth, is the argument of the utility functional (Cox & Sadiraj, 2006). However, this type of experiment would have a testable implication for all other models in class D: (a) always choose the risky option with EV theory; or (b) always choose the same option with other theories.
ACKNOWLEDGMENT

We thank Glenn W. Harrison and Nathaniel T. Wilcox for helpful comments and suggestions. Financial support was provided by the National Science Foundation (grant numbers DUE-0622534 and IIS-0630805).
REFERENCES

Arrow, K. J. (1971). Essays in the theory of risk-bearing. Chicago, IL: Markham.
Barberis, N., Huang, M., & Thaler, R. (2003). Individual preferences, monetary gambles and the equity premium. NBER Working Paper 9997.
Bernoulli, D. (1738). Specimen Theoriae Novae de Mensura Sortis. Commentarii Academiae Scientiarum Imperialis Petropolitanae, 5, 175–192. English translation (1954): Exposition of a new theory on the measurement of risk. Econometrica, 22, 23–36.
Camerer, C., & Thaler, R. H. (2003). In honor of Matthew Rabin: Winner of the John Bates Clark medal. Journal of Economic Perspectives, 17, 159–176.
Camerer, C. F., & Ho, T. (1994). Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty, 8, 167–196.
Campo, S., Perrigne, I., & Vuong, Q. (2000). Semi-parametric estimation of first-price auctions with risk aversion. Working Paper. University of Southern California.
Chen, K., & Plott, C. R. (1998). Nonlinear behavior in sealed bid first-price auctions. Games and Economic Behavior, 25, 34–78.
Cox, J. C., & Oaxaca, R. L. (1996). Is bidding behavior consistent with bidding theory for private value auctions? In: R. M. Isaac (Ed.), Research in experimental economics (Vol. 6). Greenwich, CT: JAI Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60.
Cox, J. C., Sadiraj, V., & Vogt, B. (2008a). On the empirical relevance of St. Petersburg lotteries. Experimental Economics Center Working Paper 2008-05. Georgia State University.
Cox, J. C., Sadiraj, V., Vogt, B., & Dasgupta, U. (2008b). Is there a plausible theory for decision under risk? Experimental Economics Center Working Paper 2008-04. Georgia State University.
Cox, J. C., Smith, V. L., & Walker, J. M. (1982). Auction market theory of heterogeneous bidders. Economics Letters, 9, 319–325.
Goeree, J. K., & Holt, C. A. (2004). A model of noisy introspection. Games and Economic Behavior, 47, 365–382.
Goeree, J. K., Holt, C. A., & Palfrey, T. (2002). Quantal response equilibrium and overbidding in private-value auctions. Journal of Economic Theory, 104, 247–272.
Goeree, J. K., Holt, C. A., & Palfrey, T. (2003). Risk averse behavior in generalized matching pennies games. Games and Economic Behavior, 45, 97–113.
Harless, D., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1290.
Harris, M., & Raviv, A. (1981). Allocation mechanisms and the design of auctions. Econometrica, 49, 1477–1499.
Harrison, G. W., Lau, M., & Rutström, E. E. (2007). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109, 341–368.
Harrison, G. W., List, J., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75, 433–458.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Greenwich, CT: JAI Press, Research in Experimental Economics.
Heinemann, F. (2008). Measuring risk aversion and the wealth effect. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Greenwich, CT: JAI Press, Research in Experimental Economics.
Hey, J., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1326.
Holt, C. A., Jr. (1980). Competitive bidding for contracts under alternative auction procedures. Journal of Political Economy, 88, 433–445.
Holt, C. A., & Laury, S. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Holt, C. A., & Laury, S. (2005). Risk aversion and incentive effects: New data without order effects. American Economic Review, 95, 902–912.
Kahneman, D. (2003). A psychological perspective on economics. American Economic Review Papers and Proceedings, 93, 162–168.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Laffont, J. J. (1989). The economics of uncertainty and information. Cambridge, MA: MIT Press.
Maskin, E., & Riley, J. (1984). Optimal auctions with risk averse buyers. Econometrica, 52, 1473–1518.
Matthews, S. A. (1983). Selling to risk averse buyers with unobservable tastes. Journal of Economic Theory, 30, 370–400.
Milgrom, P. R., & Weber, R. J. (1982). A theory of auctions and competitive bidding. Econometrica, 50, 1089–1122.
Moore, J. (1984). Global incentive constraints in auction design. Econometrica, 52, 1523–1536.
Neilson, W. S. (2001). Calibration results for rank-dependent expected utility. Economics Bulletin, 4, 1–5.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Boston, MA: Kluwer Academic Publishers.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Rabin, M., & Thaler, R. H. (2001). Anomalies: Risk aversion. Journal of Economic Perspectives, 15, 219–232.
Rieger, M. O., & Wang, M. (2006). Cumulative prospect theory and the St. Petersburg paradox. Economic Theory, 28, 665–679.
Riley, J. G., & Samuelson, W. F. (1981). Optimal auctions. American Economic Review, 71, 381–392.
Royal Swedish Academy of Sciences (2002). Foundations of behavioral and experimental economics: Daniel Kahneman and Vernon Smith. Advanced Information on the Prize in Economic Sciences, 17, 1–25.
Rubinstein, A. (2006). Dilemmas of an economic theorist. Econometrica, 74, 865–883.
Saha, A. (1993). Expo-power utility: A 'flexible' form for absolute and relative risk aversion. American Journal of Agricultural Economics, 75, 905–913.
Samuelson, P. A. (1977). St. Petersburg paradoxes: Defanged, dissected, and historically described. Journal of Economic Literature, 15, 24–55.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
von Neumann, J., & Morgenstern, O. (1947). Theory of games and economic behavior. Princeton, NJ: Princeton University Press.
Wakker, P. P. (2005). Formalizing reference dependence and initial wealth in Rabin's calibration theorem. Working Paper. Econometric Institute, Erasmus University, Rotterdam.
Wu, G., & Gonzalez, R. (1996). Curvature of the probability weighting function. Management Science, 42, 1676–1690.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55, 95–115.
APPENDIX

A.1 Lemma. Let functions h: [0, 1] → [0, 1], such that h(p_n; P_{−n}) ≠ 0 for all p_n ≠ 0, and φ: ℝ → ℝ be given. If φ is unbounded from above then (a) for all n ∈ ℕ, there exists x_n ∈ ℕ such that φ(x_n) ≥ 1/h(1/2ⁿ; (1/2¹)_{−n}), and (b) Σ_{n∈ℕ} h(1/2ⁿ; (1/2¹)_{−n}) φ(x_n) = ∞.
Proof. (a) It follows from h(1/2ⁿ; (1/2¹)_{−n}) ≠ 0, for all n ∈ ℕ, and the definition of a function being unbounded from above. (b) Part (a) and h(1/2ⁿ; (1/2¹)_{−n}) > 0 imply φ(x_n) h(1/2ⁿ; (1/2¹)_{−n}) ≥ 1; hence (b) is true.

A.2 Corollary: An Affordable Version of the Generalized St. Petersburg Game. Let an agent's preferences ≽_{φ,h} on a lottery space be represented by functional (6) with an unbounded money transformation function φ and probability transformation function h. Define X_{φ,h} = {x_n | n ∈ ℕ, x_n = sup{φ⁻¹(1/h(1/2ⁿ; (1/2¹)_{−n}))}}. For any given N,

φ⁻¹(N + 1) ≺_{φ,h} {x₁; 1/2¹}_N

where {x₁; 1/2¹}_N is a St. Petersburg game that pays x_n ∈ X_{φ,h} when a fair coin comes up heads for the first time on flip n, for n < N, and x_N such that φ(x_N) = 2/Σ_{n≥N} h(1/2ⁿ; (1/2¹)_{−n}), otherwise.
Proof. Note that

U({x₁; 1/2¹}_N) = Σ_{n<N} h(1/2ⁿ; (1/2¹)_{−n}) φ(x_n) + φ(x_N) Σ_{n≥N} h(1/2ⁿ; (1/2¹)_{−n}) ≥ (N − 1) + 2 = N + 1.

A.3 Implausible Risk Aversion of Type (I). Let an agent's preferences on a lottery space be represented by a theory D with functional (6) with a bounded money transformation function. For any given x there exists an L such that x + L ≻_D {y, 0.5; x} for every payoff y, however large.

Proof. Let φ be the money transformation function and g(0.5) be the transformed probability of 0.5 under decision theory D. Function φ is bounded from above and positively monotonic, so there exists an A such that (i) A = sup_x{φ(x)}, and (ii) lim_{x→∞} φ(x) = A. For any given x, take ε = (A − φ(x))(1 − g(0.5)) > 0 and apply (ii) to find a z_x such that φ(z_x) > A − ε. To complete the proof, take L = z_x − x, substitute the expression for ε into the last inequality, and verify that φ(x + L) > A g(0.5) + φ(x)(1 − g(0.5)).
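The argument in A.3 can be checked numerically. The sketch below is our illustration, not the authors' code: it uses the bounded transformation φ(x) = x/(1 + x), so A = 1, and the transformed probability g(0.5) = 0.42 (the Tversky–Kahneman value cited in Note 9); both functional-form choices are assumptions.

```python
# Numeric check of A.3: with bounded phi, some sure payoff x + L beats a
# 50/50 lottery between x and ANY larger prize (illustrative phi and g).

phi = lambda x: x / (1 + x)   # bounded above by A = 1, strictly increasing
A, g = 1.0, 0.42
x = 10.0

eps = (A - phi(x)) * (1 - g)               # epsilon from the proof of A.3
z = (A - eps) / (1 - (A - eps)) + 1e-6     # z with phi(z) > A - eps
L = z - x

# Transformed value of the lottery {y, 0.5; x} is at most g*A + (1-g)*phi(x)
# no matter how large y is:
lottery_bound = g * A + (1 - g) * phi(x)
print(L)                                   # about 8: a sure payoff near $18
print(phi(x + L) > lottery_bound)          # True: x + L beats the lottery
```

So an agent with this bounded transformation would take roughly $18 for sure over a coin flip between $10 and any prize whatsoever, which is the "implausible risk aversion of type (I)" invoked in the text.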
RISK AVERSION IN THE LABORATORY

Glenn W. Harrison and E. Elisabet Rutström

ABSTRACT

We review the experimental evidence on risk aversion in controlled laboratory settings. We review the strengths and weaknesses of alternative elicitation procedures, the strengths and weaknesses of alternative estimation procedures, and finally the effect of controlling for risk attitudes on inferences in experiments.
Attitudes to risk are one of the primitives of economics. Individual preferences over risky prospects are taken as given and subjective in all standard economic theory. Turning to the characterization of risk in applied work, however, one observes many restrictive assumptions being used. In many cases individuals are simply assumed to be risk neutral;1 or perhaps to have the same constant absolute or relative aversion to risk.2 Assumptions buy tractability, of course, but at a cost. How plausible are the restrictive assumptions about risk attitudes that are popularly used? If they are not plausible, perhaps there is some way in which one can characterize the distribution of risk attitudes so that it can be used to analyze the implications of relaxing these assumptions. If so, such characterizations will condition inferences about choice behavior under uncertainty, bidding in auctions, and behavior in games.

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 41–196
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00003-3
We examine the design of experimental procedures that can be used to estimate the risk attitudes of individuals. We also investigate how the data generated by these procedures should be analyzed. We focus on procedures that allow "direct" estimation of risk preferences by eliciting choices in non-interactive settings, since we want to minimize the role of auxiliary or joint hypotheses about Nash Equilibrium (NE) behavior in games. It is important to try to get estimates that are independent of such joint assumptions, in order that the characterizations that emerge can be used to provide tighter tests of those joint assumptions.3 Nevertheless, we also include designs that rely on subjects recognizing a dominant strategy response in a game against the experimenter, although we will note settings in which the presumption that subjects actually use these strategies might be suspect.4 In Section 1 we consider the major procedures used to elicit risk attitudes. In Section 2 we review the alternative ways in which risk attitudes have been estimated from observed behavior using these procedures. In Section 3 we examine the manner in which measures of risk attitudes are used to draw inferences about lab behavior. Section 4 offers some thoughts on several open and closed issues, and Section 5 draws some grand conclusions. Our review is intended to complement the review by Cox and Sadiraj (2008) of theoretical issues in the use of concepts of risk aversion in experiments, as well as the review by Wilcox (2008a) of econometric issues involved in identifying risk attitudes when there is allowance for unobserved heterogeneity and "mistakes" by subjects. We take some positions on these theoretical and econometric issues, but leave detailed discussion to their surveys. We default to thinking of risk attitudes as synonymous with the properties of the utility function, consistent with traditional expected utility theory (EUT) representations.
When we consider rank-dependent and sign-dependent specifications, particularly in Sections 2 and 3, the term "risk attitudes" will be viewed more broadly to take into account the effects of more than just the curvature of the utility function. Appendix A descriptively reviews the manner in which the humble "lottery" has been represented in laboratory experiments. Although we do not focus on the behavioral effects that may arise from the framing of the lotteries, we need to be aware that the stimulus provided to subjects often varies significantly from experiment to experiment. In effect, we experimenters are assuming that the subject views the lottery the way we view the lottery; the validity of this assumption of common knowledge between subject and observer rests, in large part, on the representation chosen by the experimenter. Some day a systematic comparison of the effects of these
Risk Aversion in the Laboratory
alternatives on risk attitudes should be undertaken, but here we simply want to provide a reminder that alternative representations exist and are used.5 We return to this issue much later, since it relates to the manner in which laboratory experiments might provide artifactual representations of the uncertainty subjects face in the field. In Appendices B, C, D, and E we examine in some depth the data and inferences drawn from four heavily cited studies of risk attitudes. The objective is to be very clear as to what these studies find, and what they do not find, since references to the literature are often casual and sometimes even inaccurate. Appendices B and C focus on two bona fide classics in the area. Hey and Orme (1994) (HO) introduced a robust experimental design to test EUT, a maximum likelihood (ML) estimation procedure that does not impose parametric functional forms, and a careful discussion of the role of "errors" when making inferences about risk attitudes. Holt and Laury (2002) (HL) introduced a justifiably popular method for eliciting risk attitudes for an individual, as well as important innovations in the ML estimation of risk aversion that go beyond simplistic functional forms. Appendices D and E focus on two studies that illustrate the problems that arise when experiments suffer from design issues or draw general inferences from restrictive models. Kachelmeier and Shehata (1992) (KS) apply an elicitation procedure that is popular, but which generates so much noise that reliable inferences cannot be drawn. Gneezy and Potters (1997) (GP) consider the important issue of the effect of "evaluation periods" on risk attitudes, but confound that valuable objective with extremely restrictive specifications of risk attitudes, leading them to incorrectly conclude that risk attitudes change with evaluation periods.
In each of these studies there is an important objective; in the one case, examining risk attitudes among very poor subjects for whom the stakes are huge, and in the other case, considering the framing of the choice in a fundamental manner. But the problems with each study show why one has to pay proper attention to design and inferential issues before drawing reliable conclusions. We conclude that there is systematic evidence that subjects in laboratory experiments behave as if they are risk averse. Some subjects tend towards a mode of risk neutrality (RN), but very few exhibit risk-loving behavior. The degree of risk aversion is modest, but does exhibit heterogeneity that is correlated with observable individual characteristics. Some risk elicitation methods are expected to provide more reliable estimates than others, due to the simplicity of the task and the transparency of the incentives to respond truthfully. Limited evidence exists on the
stability of risk attitudes across elicitation instruments, but there is some evidence to indicate that roughly equal measures of risk aversion can be obtained in the laboratory using a variety of procedures that are a priori attractive. There are also several methods for eliciting risk that we do not recommend. Inferences about risk attitudes can be undertaken using several empirical approaches. One approach is to infer bounds on parameters for a limited class of (one-parameter) utility functions, but a preferable approach is to estimate a latent structural model of choice. Developments in statistical software now allow experimenters to undertake such structural estimation using ML methods. In addition, inferences about risk attitudes depend on whether the data generating process is viewed through the lens of a single model of choice behavior: there is striking evidence that two or more models may have support from different subjects or different task domains. Appropriate statistical tools exist that allow one to model the extent to which one model or another is favored by the data, and for which subjects and task domains. We review evidence that subjects exhibit some modest amounts of probability weighting, and some controversial evidence concerning the extent of loss aversion. Much of the behavioral folklore on probability weighting and loss aversion has employed elicitation procedures and/or statistical methods that are piecemeal or ad hoc. Our final topic for discussion is how the characterization of behavior in a wide range of experimental tasks is affected by the treatment of risk attitudes, or confounded by the lack of such a treatment. Examples reviewed here include tests of EUT, estimates of discount rates, and evaluations of alternative models of bidding behavior in auctions. One open issue, with the potential to undermine many inferences in experimental economics, is the extent to which sample selection is driven by risk attitudes.
A related concern is the reliability of measurements of treatment effects when subjects have some choice as to which treatment to participate in. In brief, risk attitudes play a central role in experimental economics, and the nuances of measuring and controlling them demand the attention of every experimenter.
1. ELICITATION PROCEDURES

Five general elicitation procedures have been used to ascertain risk attitudes from individuals in the experimental laboratory using non-interactive settings. The first is the Multiple Price List (MPL), which entails giving
the subject an ordered array of binary lottery choices to make all at once. The MPL requires the subject to pick one of the lotteries on offer, and then the experimenter plays that lottery out for the subject to be rewarded. The second is a series of Random Lottery Pairs (RLP), in which the subject picks one of the lotteries in each pair, and faces multiple pairs in sequence. Typically one of the pairs is randomly selected for payoff, and the subject’s preferred lottery is then played out as the reward. The third is an Ordered Lottery Selection (OLS) procedure in which the subject picks one lottery from an ordered set. The fourth method is a Becker–DeGroot–Marschak (BDM) auction in which the subject is asked to state a minimum certainty-equivalent (CE) selling price to give up the lottery he has been endowed with. The fifth method is a hybrid of the others: the Trade-Off (TO) design, in which the subject is given lotteries whose prizes (or probabilities) are endogenously defined in real-time by prior responses of the same subject, and some CE elicited. We also review several miscellaneous elicitation procedures that have been proposed.
1.1. The Multiple Price List Design

The earliest use of the MPL design in the context of elicitation of risk attitudes is, we believe, Miller, Meyer, and Lanzetta (1969). Their design confronted each subject with five alternatives that constitute an MPL, although the alternatives were presented individually over 100 trials. The method was later used by Schubert, Brown, Gysler, and Brachinger (1999), Barr and Packard (2002), and Holt and Laury (2002). Appendix C reviews the HL experiments in detail. The HL instrument provides a simple test for risk aversion using an MPL design. Each subject is presented with a choice between two lotteries, which we can call A or B. Panel A of Table 1 illustrates the basic payoff matrix presented to subjects. The first row shows that lottery A offered a 10% chance of receiving $2 and a 90% chance of receiving $1.60. The expected value of this lottery, EVA, is shown in the third-last column as $1.64, although the EV columns were not presented to subjects.6 Similarly, lottery B in the first row has chances of payoffs of $3.85 and $0.10, for an expected value of $0.48. Thus, the two lotteries have a relatively large difference in expected values, in this case $1.17. As one proceeds down the matrix, the expected value of both lotteries increases, but the expected value of lottery B becomes greater than the expected value of lottery A. The subject chooses A or B in each row, and one row is later selected at random for payout for that subject. The logic behind this test for risk
Table 1. Lottery Choices in the Holt/Laury and Binswanger Risk Aversion Instruments.

A. Holt and Laury (2002) instrument with payoffs at the 1x level(a)

  Lottery A                    Lottery B                     EVA      EVB      Difference
  p($2)        p($1.60)        p($3.85)      p($0.10)
  0.1   $2     0.9   $1.60     0.1   $3.85   0.9   $0.10    $1.64    $0.48     $1.17
  0.2   $2     0.8   $1.60     0.2   $3.85   0.8   $0.10    $1.68    $0.85     $0.83
  0.3   $2     0.7   $1.60     0.3   $3.85   0.7   $0.10    $1.72    $1.23     $0.49
  0.4   $2     0.6   $1.60     0.4   $3.85   0.6   $0.10    $1.76    $1.60     $0.16
  0.5   $2     0.5   $1.60     0.5   $3.85   0.5   $0.10    $1.80    $1.98    -$0.17
  0.6   $2     0.4   $1.60     0.6   $3.85   0.4   $0.10    $1.84    $2.35    -$0.51
  0.7   $2     0.3   $1.60     0.7   $3.85   0.3   $0.10    $1.88    $2.73    -$0.84
  0.8   $2     0.2   $1.60     0.8   $3.85   0.2   $0.10    $1.92    $3.10    -$1.18
  0.9   $2     0.1   $1.60     0.9   $3.85   0.1   $0.10    $1.96    $3.48    -$1.52
  1     $2     0     $1.60     1     $3.85   0     $0.10    $2.00    $3.85    -$1.85

B. Binswanger (1980, 1981) instrument with payoffs at the rupees 50 level(b)

  Alternative   Probability of    Bad Outcome       Probability of    Good Outcome      Expected
                Bad Outcome       (Indian Rupees)   Good Outcome      (Indian Rupees)   Value
  O             1/2               50                1/2               50                50
  A             1/2               45                1/2               95                70
  B             1/2               40                1/2               120               80
  B*            1/2               35                1/2               125               80
  C             1/2               30                1/2               150               90
  C*            1/2               20                1/2               160               90
  E             1/2               10                1/2               190               100
  F             1/2               0                 1/2               200               100

(a) Experiments were also conducted at the 20x, 50x, and 90x levels.
(b) Experiments were also conducted at the rupees 0.5 level (compared to alternative O) and at the rupees 5 level, with roughly 2 weeks between sessions.
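The expected-value columns in both panels can be reproduced with a few lines of arithmetic; the following sketch simply transcribes the payoffs from Table 1:

```python
# Panel A: Holt and Laury (2002). Lottery A pays $2.00 or $1.60, lottery B
# pays $3.85 or $0.10; the probability p of the high prize rises from 0.1 to 1.
probs = [i / 10 for i in range(1, 11)]
ev_a = [p * 2.00 + (1 - p) * 1.60 for p in probs]
ev_b = [p * 3.85 + (1 - p) * 0.10 for p in probs]
for p, a, b in zip(probs, ev_a, ev_b):
    print(f"p = {p:.1f}   EVA = {a:.3f}   EVB = {b:.3f}")
# The table reports these to the nearest cent (e.g., EVB = $0.475 in the first
# row appears as $0.48); the EVA-EVB difference turns negative from row 5 on.

# Panel B: Binswanger (1980, 1981). Each alternative pays its bad or good
# outcome with probability 1/2, so the expected value is the simple average.
bad = [50, 45, 40, 35, 30, 20, 10, 0]
good = [50, 95, 120, 125, 150, 160, 190, 200]
ev = [(b + g) / 2 for b, g in zip(bad, good)]
print(ev)  # [50.0, 70.0, 80.0, 80.0, 90.0, 90.0, 100.0, 100.0]
```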
aversion is that only risk-loving subjects would take lottery B in the first row, and only risk-averse subjects would take lottery A in the second-last row. Arguably, the last row is simply a test that the subject understood the instructions, and has no relevance for risk aversion at all.7 A risk-neutral subject should switch from choosing A to B when the EV of each is about the same, so a risk-neutral subject would choose A for the first four rows and B thereafter. The HL instrument is typically applied using a random lottery incentive procedure in which one row is selected to be played out according to the choices of the subjects, rather than all rows being played out. But that is not an essential component of the instrument, even if it is popular and widely used in many experiments to save scarce experimental funds. We discuss the random lottery incentive procedure in detail in Section 3.8. The MPL instrument has one apparent weakness as an elicitation procedure: it might suggest a frame that encourages subjects to select the middle row, contrary to their unframed risk preferences. The antidote for this potential problem is to devise various "skewed" frames in which the middle row implies different risk attitudes, and see if there are differences across frames. Simple procedures to detect such framing effects, and to correct for them statistically if present, have been developed (e.g., Harrison, Lau, Rutström, & Sullivan, 2005; Andersen, Harrison, Lau, & Rutström, 2006; Harrison, List, & Towe, 2007). The evidence suggests that there may be some slight framing effect, but it is not systematic and can easily be allowed for in the estimation of risk attitudes. A variant of the MPL instrument was developed in the laboratory by Schubert et al. (1999).8 Figs. 1 and 2 illustrate the interface provided to subjects by Barr and Packard (2002), in a sequential field implementation of this variant used in Chile.
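The switching logic described above can be made concrete with a short simulation. In this sketch a hypothetical agent with constant relative risk aversion, u(x) = x^(1-r)/(1-r) (and u(x) = ln x at r = 1), works down the ten rows of panel A of Table 1 and chooses A whenever its expected utility is strictly higher:

```python
from math import log

def u(x, r):
    """CRRA utility; r is the coefficient of relative risk aversion."""
    return log(x) if r == 1 else x ** (1 - r) / (1 - r)

def safe_choices(r):
    """Number of rows (out of 10) in which the agent strictly prefers lottery A."""
    n = 0
    for i in range(1, 11):
        p = i / 10
        eu_a = p * u(2.00, r) + (1 - p) * u(1.60, r)
        eu_b = p * u(3.85, r) + (1 - p) * u(0.10, r)
        if eu_a > eu_b:
            n += 1
    return n

for r in (-0.5, 0.0, 0.5, 1.0):
    print(f"r = {r:+.1f}: {safe_choices(r)} safe choices")
# A risk-neutral agent (r = 0) makes exactly four safe choices; more
# risk-averse agents switch to B later, more risk-loving agents earlier.
```

The mapping is monotone, so the row at which a subject switches to B reveals an interval for r; that is how the instrument is typically read.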
Respondents were confronted with a series of gambles, framed as investment decisions, that elicited their CE for an uncertain lottery. Trained experimenters asked the respondents to imagine themselves as investors choosing whether to invest in Firm A, whose profits were determined by its chances of success or failure, or Firm B, whose profits were fixed regardless of how well it fared. The experimenter explained the probabilities of Firm A's success, the payoffs from Firm A in each state, and the fixed payoff from Firm B. The respondents were then asked to decide in which firm to invest. After registering their answer, the experimenter would raise the amount of the secure payoff, and ask the respondents to choose between the two firms again. As the amount of the secure payoff grew, investing in Firm A looked less attractive to a risk-averse respondent. In this way a CE, the point at which respondents would
Fig. 1. Primary MPL Instrument of Barr and Packard (2002). [The screen for Investment Decision 1 shows Firm A, which is very successful (profit = 3,000 P) with a 1 in 6 chance and not very successful (profit = 1,000 P) with a 5 in 6 chance, alongside Firm B, and asks: Do you choose Firm A or Firm B?]
no longer risk investing in Firm A, was elicited for each gamble. The probability of Firm A's failure was altered three times while keeping the state-specific payoffs constant, and in the fourth investment gamble the payoffs were altered. A risk-averse subject would state a value for Firm B below the expected value of Firm A, and a risk-loving subject would state a value for Firm B above the expected value of Firm A. The subject knew that the CE "price list" would span the range shown in Fig. 2 before the sequence began. Two variants of the MPL instrument were developed by Harrison et al. (2005d; Section 3.1), and studied at length by Andersen et al. (2006a). One is called the Switching MPL method, or sMPL for short, and simply changes the MPL to ask the subject to pick the switch point from one lottery to the other. Thus, it enforces monotonicity, but still allows subjects to express indifference at the "switch" point, akin to a "fat switch point." The subject was then paid in the same manner as with the MPL, but with the non-switch choices filled in automatically. The other variant is the Iterative MPL method, or iMPL for short. The iMPL extends the sMPL to allow the individual to make choices from refined options within the option last chosen. That is, if someone decides at some stage to switch from option A to option B between values of $10 and $20, the next stage of an iMPL would
Fig. 2. Slider in MPL Instrument of Barr and Packard (2002). [The tab for Investment Decision 1 displays a slider of secure profits for Firm B, running from 1,000 P to 3,000 P in increments of 200 P.]
then prompt the subject to make more choices within this interval, to refine the values elicited.9 The computer implementation of the iMPL restricts the number of stages to ensure that the intervals exceed some a priori cognitive threshold (e.g., probability increments of 0.001). The iMPL uses the same incentive logic as the MPL and sMPL.10 Another feature of the MPL should be noted, although it is not obvious that it is a weakness or a strength: the fact that subjects see all choices in one (ordered) table. One alternative is to have the subjects make each binary lottery choice in a sequence, embedding them into the RLP design of Section 1.2. It is possible that allowing the subject to see all choices in one frame might lead some subjects to make more consistent choices than they would otherwise. Which approach, then, is the correct one to use? The answer depends on the inferential objective of the design, and the external context that the implied measure of risk aversion is to be applied to. We view the MPL and RLP as two different elicitation procedures: their effect on behavior should be studied systematically, in the manner we illustrate later in Section 2.5. We do not believe that consistency should always be the primary criterion for selection across elicitation procedures, particularly when one allows formally for the stochastic choice process (Section 2.3 and Wilcox (2008a)) and the possibility that it could interact with the elicitation procedure in some manner. Evidence for different risk attitudes across procedures is, by definition, a sign of a procedural artifact. But that evidence needs to be documented with formal statistical models and, if present, recognized as a behavioral corollary of using that procedure. In summary, the set of MPL instruments provides a relatively transparent procedure to elicit risk attitudes. 
Subjects rarely get confused about the incentives to respond truthfully, particularly when the randomizing devices are physical dice that they know they will toss themselves.11 As we demonstrate later, it is also possible to infer a risk attitude interval for the specific subject, at least under some reasonable assumptions.
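One way to formalize that interval inference, under EUT with a one-parameter CRRA utility function u(x) = x^(1-r)/(1-r), is to solve each row of the HL table for the r at which a subject would be exactly indifferent between A and B; a subject switching from A to B at row n then reveals an r between the indifference values for rows n-1 and n. A sketch using simple bisection (the bracketing interval [-3, 3] is an assumption of ours):

```python
from math import log

def u(x, r):
    """CRRA utility, with the usual log-utility limit at r = 1."""
    return log(x) if abs(r - 1) < 1e-12 else x ** (1 - r) / (1 - r)

def eu_gap(r, p):
    """EU(A) - EU(B) in the HL row where the high prize has probability p."""
    return (p * u(2.00, r) + (1 - p) * u(1.60, r)
            - p * u(3.85, r) - (1 - p) * u(0.10, r))

def indifference_r(p, lo=-3.0, hi=3.0, tol=1e-6):
    """Bisect for the CRRA coefficient making the agent indifferent in this row.
    The gap rises with r here: more risk aversion favours the safe lottery A."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eu_gap(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# A subject who chooses A in rows 1-4 and switches to B at row 5 reveals:
lo, hi = indifference_r(0.4), indifference_r(0.5)
print(f"{lo:.2f} < r < {hi:.2f}")  # roughly -0.15 < r < 0.15
```

These bounds line up with the interval Holt and Laury (2002) report for four safe choices.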
1.2. The Random Lottery Pair Design

The RLP design has not been used directly to infer risk attitudes, but has been generally used to test the predictions of EUT. Hey and Orme (1994) used an extensive RLP design to estimate utility functionals over lotteries for individuals non-parametrically. The use of the random lottery design, coupled with treating each pairwise choice as independent, implicitly means that the estimates they provide rely on the EUT specification.
Related experimental data, from the earlier "preference reversal" debate, provide comparable evidence of risk aversion for smaller samples (see Grether and Plott, 1979 and Reilly, 1982). Additionally, many prominent experiments testing EUT provide observations based on a rich array of lotteries that vary in terms of probabilities and monetary prizes; for example, see Camerer (1989), Battalio, Kagel, and Jiranyakul (1990), Kagel, MacDonald, and Battalio (1990), Loomes, Starmer, and Sugden (1991), Harless (1992), and Harless and Camerer (1994). In most cases the published study only reports patterns of choices, with no information on individual characteristics, but they can be used to obtain general characterizations of risk attitudes for that subject pool. Hey and Orme (1994) asked subjects to make direct preference choices over 100 pairs of lotteries, in which the probabilities varied for four fixed monetary prizes of £0, £10, £20, and £30. Subjects could express direct preference for one lottery over the other, or indifference. One of the pairs was actually chosen at random at the end of the session for payout for each subject, and the subject's preferences over that pair applied. Some days later the same subjects were asked back to essentially repeat the task, facing the same lottery combinations in different presentation order. HO used pie charts to display the probabilities of the lotteries they presented to subjects. A sample display from their computer display to subjects is shown in Fig. 3. There is no numerical referent for the probabilities, which must be judged from the pie chart. As a check, what fraction would you guess that each slice is on the left-hand lottery? In fact, this lottery offers £10 with probability 0.625, and £30 with probability 0.375. The right-hand lottery offers the same probabilities, as it happens, but with prizes of £10 and £20, respectively. Fig.
4 illustrates a modest extension of this display to include information on the probabilities of each pie slice, and was used in a replication and extension of the HO experiments by Harrison and Rutström (2005). HO used their data to estimate a series of utility functionals over lotteries, one for each subject, since there were 100 observations for each subject in each task. This is a unique data set since most other studies rely on pooled data over individuals and the presumption that unobserved heterogeneity (after conditioning on any collected individual characteristics, such as sex and race and income) is random. The EUT functional that HO estimated was non-parametric, in the sense that they directly estimated the utility of the two intermediate outcomes, normalizing the lowest and highest to 0 and 1, respectively. This attractive approach works well when there are a small number of final outcomes
Fig. 3. Lottery Display Used by Hey and Orme (1994).
across many choices, as here, but would not be statistically efficient if there had been many outcomes. In that case it would be appropriate to use some parametric functional form for utility, and estimate the parameters of that function. We illustrate these points later. The RLP instrument is typically used in conjunction with the random lottery payment procedure in which one choice is picked to be played out, but this is again not essential to the logical validity of the instrument. The great advantage of the RLP instrument is that it is extremely easy to explain to subjects, and the incentive compatibility of truthful responses is apparent. In contrast to the MPL, it is generally not possible to directly infer a risk attitude from the pattern of responses, and some form of estimation is needed. We illustrate such estimations later.
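A toy version of such an estimation can convey the idea. In this sketch (our construction, not HO's code) a synthetic subject with known utilities over the four prizes £0, £10, £20, £30 makes noisy choices through a logistic ("Fechner") choice rule, and we recover u(10) and u(20) by maximum likelihood over a grid, normalizing u(0) = 0 and u(30) = 1 and treating the noise scale as known purely to keep the example short:

```python
import random
from math import exp, log

random.seed(7)
MU = 0.1                         # assumed (known) noise scale
TRUE_U = [0.0, 0.55, 0.83, 1.0]  # hypothetical utilities of 0, 10, 20, 30

def rand_lottery():
    w = [random.random() for _ in range(4)]
    s = sum(w)
    return [x / s for x in w]    # probabilities over the four prizes

def p_left(left, right, util):
    """Logistic probability of choosing the left lottery."""
    d = (sum(p * v for p, v in zip(left, util))
         - sum(p * v for p, v in zip(right, util)))
    return 1 / (1 + exp(-d / MU))

pairs = [(rand_lottery(), rand_lottery()) for _ in range(800)]
chose_left = [random.random() < p_left(l, r, TRUE_U) for l, r in pairs]

def log_lik(u10, u20):
    util = [0.0, u10, u20, 1.0]
    ll = 0.0
    for (l, r), c in zip(pairs, chose_left):
        p = p_left(l, r, util)
        ll += log(p) if c else log(1 - p)
    return ll

# Grid-search maximum likelihood over the two free utilities.
_, u10_hat, u20_hat = max(
    (log_lik(a / 20, b / 20), a / 20, b / 20)
    for a in range(21) for b in range(21)
)
print(u10_hat, u20_hat)  # should land near the true values 0.55 and 0.83
```

With many outcomes this direct approach becomes inefficient, which is exactly the point made above about switching to a parametric utility function.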
1.3. The Ordered Lottery Selection Design

The OLS design was developed by Binswanger (1980, 1981) in an early attempt to identify risk attitudes using experimental procedures with real
Fig. 4. Lottery Display for Hey and Orme (1994) Replication.
payoffs. Each subject is presented with a choice of eight lotteries, shown in each row of panel B of Table 1, and asked to pick one. Alternative O is the safe option, offering a certain amount. All other alternatives increase the average actuarial payoff but with increasing variance around that payoff. The lotteries were actually presented to subjects in the form of photographs of piles of money, to assist illiterate subjects. Each lottery had a generic label, such as the ones shown in the left column of panel B of Table 1. Fig. 5 shows the display used by Barr (2003) in a field replication of the basic Binswanger OLS instrument in Zimbabwe, and essentially matches the graphical display used in the original experiments (Hans Binswanger; personal communication). Because the probabilities for each lottery outcome are 1/2, this instrument can be presented relatively simply to subjects.12 The OLS instrument was first used in laboratory experiments by Murnighan, Roth, and Schoumaker (1987, 1988), although they only used the results to sort subjects into one group that was less risk averse than the other. Beck (1994) utilized it to identify risk aversion in laboratory subjects,
Fig. 5. Lottery Display of Binswanger Replication by Barr (2003).
prior to them making group decisions about the dispersion of everyone else's potential income. This allowed an assessment of the extent to which subjects in the second stage chose more egalitarian outcomes because they were individually averse to risk or because they cared about the distribution of income. Eckel and Grossman (2002, 2008) used the OLS instrument to directly measure risk attitudes, as well as in an innovative application in which subjects guessed the risk attitudes of other subjects. They found that subjects did appear to use sex stereotypes in guessing the risk attitudes of others. The OLS instrument is easy to present to subjects, but has two problems when used to make inferences about non-EUT models of choice behavior. The versions that restrict probabilities to 1/2 make it virtually impossible to use these responses to make inferences about probability weighting, which plays a major role in rank-dependent alternatives to EUT. Of course, there is nothing in the instrument itself that restricts the probabilities to 1/2, although that has been common. The second problem is that the use of the certain amount may frame the choices that subjects make in a manner that makes them "sign-dependent," such that the certain amount provides a reference point to identify gains and losses. This concern applies more broadly, of course, but in the OLS instrument there is a natural and striking reference point for (some) subjects to use. We consider both of these issues later when we consider inferences from observed choices. Engle-Warnick, Escobal, and Laszlo (2006) undertake laboratory experiments with the OLS instrument to test the effect of presenting the choices in different ways. The baseline mimics the procedures of Binswanger (1980, 1981) and Barr (2003), shown in Fig. 5, except that five lotteries were arrayed in a circle in an ordered counter-clockwise fashion, with the certain amount at 12 o'clock.
The first treatment then presents the ordered pairs of lotteries in a binary choice fashion, so that the subject makes four binary choices. The second treatment extends these binary choices by including a lottery that is dominated by one of the original binary pairs. The dominated lottery is always presented in between the non-dominated lotteries, so it appears to be physically intermediate. Each subject made 13 decisions, which were randomized in order and left–right presentation (for the undominated lotteries). The statistical analysis of the results is unfortunately couched in terms of ordinal measures of the degree of risk aversion, such as the number of safe choices, and it would be valuable to see the effect of these treatments on estimated measures of relative risk aversion (RRA) using more explicit statistical methods (e.g., per Section 2.2, and particularly Sections 2.5 and 2.6). But there is evidence that the instruments are positively correlated, although the correlation is significantly less than one.
In particular, the correlation between the baseline OLS instrument and the transformed binary choice version for Canadian university students is 0.63, but it is only 0.31 for Peruvian farmers. Moreover, the introduction of a dominated lottery appeared to have no significant effect on the correlation of risk attitudes for the Canadian university students, but considerable effects on the correlation for Peruvian farmers.
1.4. The Becker–DeGroot–Marschak Design

The original BDM design developed by Becker, DeGroot, and Marschak (1964) was modified by Harrison (1986, 1990) and Loomes (1988) for use as a test for risk aversion.13 This design was later used by McKee (1989), Kachelmeier and Shehata (1992) and James (2007) in similar exercises. The basic idea is to endow the subject with a series of lotteries, and to ask for the "selling price" of the lottery. The subject is told that a "buying price" will be picked at random, and if the buying price that is picked exceeds the stated selling price, the lottery will be sold and the subject will receive that buying price. If the buying price equals or is lower than the selling price, the subject keeps the lottery and plays it out. It is relatively transparent to economists that this auction procedure provides a formal incentive for the subject to truthfully reveal the CE of the lottery. However, it is not clear that subjects always understand this logic, and responses may be sensitive to the exact nature of the instructions given. For the instrument to elicit truthful responses, the experimenter must ensure that the subject realizes that the choice of a buying price does not depend on the stated selling price.14 If there is reason to suspect that subjects do not understand this independence, the use of physical randomizing devices (e.g., dice or bingo cages) may mitigate such strategic thinking. Of course, the BDM procedure is formally identical to a two-person Vickrey sealed-bid auction, with the same concerns about subjects not understanding dominant strategies without considerable training (Harstad, 2000; Rutström, 1998). A major concern when choosing elicitation formats is the strength of the incentives provided at the margin, that is, the magnitude of the losses generated by misrepresenting true preferences.
While the BDM is known to have weak incentives around the optimum (Harrison, 1992), the same is also true for other elicitation formats.15 Comparing the incentive properties of the BDM to the MPL in a pairwise evaluation of a safer and a riskier lottery, we find that the expected loss from errors in the latter is a weighted average of the losses implied for the safe and the risky evaluations
respectively in the BDM. The incentives in the BDM can be strengthened through a careful choice of the range of the buying prices and are generally stronger the higher is the variance of the lottery being valued.16 Plott and Zeiler (2005) express a concern with the way that the BDM mechanism is popularly implemented. Appendix D reviews in detail an application of the BDM mechanism for eliciting risk attitudes by Kachelmeier and Shehata (1992) and illustrates some possible problems. It may be possible to re-design the BDM mechanism to avoid some of these problems,17 but more attractive elicitation procedures are available.
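Both the truth-telling property and the weak marginal incentives of the BDM can be seen in a small example. The sketch below assumes a hypothetical risk-averse subject with u(x) = sqrt(x) who holds a lottery paying $10 or $0 with equal chance and faces a buying price drawn uniformly from [0, 10]; none of these numbers come from the studies above:

```python
from math import sqrt

EU_LOTTERY = 0.5 * sqrt(10) + 0.5 * sqrt(0)  # expected utility of keeping it
CE = EU_LOTTERY ** 2                         # certainty equivalent: $2.50

def expected_utility(s, m=10.0):
    """E[u] from stating selling price s: if the random buying price b > s the
    subject receives u(b); otherwise (probability s/m) the lottery is played.
    Uses the closed-form integral of sqrt(b)/m over [s, m]."""
    return (2 / 3) * (m ** 1.5 - s ** 1.5) / m + (s / m) * EU_LOTTERY

grid = [i / 100 for i in range(1001)]        # candidate selling prices
best = max(grid, key=expected_utility)
print(CE, best)                              # the optimum is the CE, $2.50

# ...but the objective is nearly flat near the optimum, so the incentive to
# be exactly truthful is weak at the margin:
loss = expected_utility(best) - expected_utility(best + 1.0)
print(round(loss, 4))  # -> 0.0149, under 1% of the utility level
```

Overstating the selling price by a full dollar costs this subject very little in expected utility, which is the "weak incentives around the optimum" point in the text.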
1.5. The Trade-Off Design

Wakker and Deneffe (1996) propose a TO method to elicit utility values which does not make any assumption about whether the subject weights probabilities. This is an advantage compared to the methods widely used in the "judgement and decision-making" literature, such as the CE or probability-equivalent methods,18 since those methods assume that there is no probability weighting. The TO method proceeds by asking the subject to consider two lotteries defined over prizes x0, x1, r, and R and probabilities p and 1-p: (x1, p; r, 1-p) and (x0, p; R, 1-p). It is assumed that R > r, that p is some fixed probability of receiving the first outcome, and that x0 is some fixed and small amount such as $0. The subject is asked to tell the experimenter what x1 would make him indifferent between these two lotteries. Call this stage 1 of the TO method. Then the subject is asked the same question about the lotteries (x2, p; r, 1-p) and (x1, p; R, 1-p) and asked to state the x2 that makes him indifferent between these two. Call this stage 2 of the TO method. If the subject responds truthfully to these questions, it is possible to infer that u(x2) - u(x1) = u(x1) - u(x0), using the logic explained by Wakker and Deneffe (1996, p. 1134). Setting u(x0) = 0, we can then infer that u(x2) = 2u(x1). A similar argument leads to an elicited x3 such that u(x3) = 3u(x1), and so on. If we wanted to stop at x3, we could then renormalize u(x1) to 1/3, so that we have elicited utility over the unit interval. The obvious problem with the TO method as implemented by Wakker and Deneffe (1996) is that it is not incentive compatible: subjects have a transparent incentive to overstate the value of x1, and indeed all other elicited amounts. Assume that subjects are to be incentivized in the obvious manner by one of the lotteries in each task being picked by a coin toss to be played out (or by just one such lottery being picked at random
over all three stages). First, by overstating x1 in stage 1 the subject increases the final outcome received if a lottery in stage 1 is used to pay him, because x1 is one of the outcomes in one of the lotteries in stage 1. Second, by overstating x1 in stage 1 the subject increases the final outcome received if a lottery in stage 2 is used to pay him, since x1 is used to define one of the lotteries in stage 2. Thus, we would expect some subject to ask us sheepishly in stage 1, "how large an x1 am I allowed to state?" It is surprising that the issue of incentive compatibility was not even discussed in Wakker and Deneffe (1996), but since the actual experiments they report were hypothetical, even an otherwise incentive-compatible mechanism could have had problems generating truthful answers. There is a recognition that the "chaining" of old responses into new lotteries might lead to error propagation (p. 1148), but that is an entirely separate matter from strategic misrepresentation. The TO method was extended by Fennema and van Assen (1999) to consider losses as well as gains. The experiments were all hypothetical, primarily to avoid the ethical problems of exposing subjects to real losses. The TO method was extended by Abdellaoui (2000) to elicit probability weights after utilities have been elicited. Real rewards were provided for one randomly selected binary choice in the gain domain for one randomly selected subject out of the 46 present, but the issue of incentive compatibility is not discussed. There is an attempt to elicit utility values in a non-sequential manner, which might make the chaining effect less transparent to inexperienced subjects, but again this only mitigates the second of the sources of incentive incompatibility.19 Bleichrodt and Pinto (2000) proposed a different way of extending the TO method to elicit probability weights, but only applied their method to hypothetical utility elicitation in the health domain. They provide a discussion (p.
1495) of ‘‘error propagation’’ that points to some of the literature on stochastic error specifications considered in Section 2.3, but in each case assume that the error has mean zero, which misses the point of the incentive incompatibility of the basic TO method. Abdellaoui, Bleichrodt, and Paraschiv (2007b) extend the TO method to elicit measures of loss aversion. Their experiments were for hypothetical rewards, and they do not discuss incentive compatibility.20
1.6. Miscellaneous Designs

There are several experimental designs that attempt to elicit risk attitudes that do not easily fit into one of the five major designs considered above.
We again ignore any designs that do not claim to elicit risk attitudes in any conceptual sense that an economist would recognize, even if those designs might elicit some measure that is empirically correlated in some settings with the measures of interest to economists. Fehr and Goette (2007) estimate a loss aversion parameter using a Blind Loss Aversion model of behavior, "extending" the Myopic Loss Aversion model of Benartzi and Thaler (1995); we review the latter model in detail in Section 3.5. They ask subjects to consider two lotteries, expressed here in equivalent dollars instead of Swiss Francs:

Lottery A: Win $4.50 with probability 1/2, lose $2.80 with probability 1/2. Otherwise get $0.

Lottery B: Play six independent repetitions of lottery A. Otherwise get $0.

Subjects could participate in both lotteries, neither, or either. Fehr and Goette (2007) assume that subjects have a linear utility function for stakes that are this small, relying on the theoretical arguments of Rabin (2000) rather than the data of Holt and Laury (2002) and others. They also assume that there is no probability weighting: even though Quiggin (1982; Section 4) viewed 1/2 as a plausible fixed point in probability weighting, most others have assumed or found otherwise. If one is blind to the effects of curvature of the utility function and of probability weighting, then the only thing left to explain choices over these lotteries is loss aversion. On the other hand, it becomes "heroic" to then extrapolate those estimates to explain behavior that one has elsewhere (p. 304) assumed to be characterized by stakes large enough that strictly concave utility is plausible a priori. Of course, the preferred model (p. 306) assumes away concavity and only uses the loss aversion parameter, but without explanation for why behavior over such stakes should be driven solely by loss aversion instead of risk attitudes more generally.21 Tanaka, Camerer, and Nguyen (2007) (TCN) propose a method to elicit risk and time preferences from individuals. They assume a certain parametric structure in their risk elicitation procedure, assuming Cumulative Prospect Theory (CPT): specifically, power Constant Relative Risk Aversion (CRRA) utility functions for gains and losses, and the one-parameter version of the Prelec (1998) probability weighting function. They further assume that the CRRA coefficient for gains and losses is the same. We consider these functional forms in detail in Sections 3.1 and 3.2. The upshot is they seek to elicit one parameter s that controls the concavity or convexity of the utility function, one parameter a that controls the curvature of the probability weighting function, and one parameter l that determines
the degree of loss aversion. Their elicitation procedure for time preferences is completely separate conceptually from their elicitation procedure for risk attitudes, and is not used to infer anything about risk preferences.22 To elicit the first two parameters, s and a, TCN ask subjects to consider three MPL sheets. The first sheet contains 14 options akin to those used in the Holt and Laury (2002) MPL procedure, shown in panel A of Table 1. The difference is that the probabilities of the high or low outcomes in each lottery stay constant from row to row, but the high prize in the "risky" lottery gets larger and larger: the risky lottery starts off in row 1 as "relatively risky" but with a relatively low expected value, and changes so that in the last row it becomes "extremely risky" but with a substantially higher expected value. The specific, fixed probabilities used are 0.3 for the high prize in the safe lottery and 0.1 for the high prize in the risky lottery. Subjects are asked to pick a switch point in this sheet, akin to the sMPL procedure of Andersen et al. (2006a); of course, this is just a monotonicity-enforcing variant of the basic MPL procedure of Holt and Laury (2002). So we can see that behavior in the first sheet would elicit an interval for s if we ignored probability weighting, just as it elicited an interval for the CRRA coefficient in Holt and Laury (2002; Table 3, p. 1649). But with probability weighting allowed, all we can infer from this choice are combinations of intervals for s and a. TCN indicate (p. 8, fn. 11) that the values of s and a they report are actually "rounded mid-points" of the intervals. For example, one interval they infer is 0.65 < s < 0.74 and 0.66 < a < 0.74, and they round this to the values s = 0.7 and a = 0.7. They note in a footnote to Table A1 (p. 33) that the boundaries of the intervals are approximated to the nearest 0.05 increments.
If subjects do not switch they use the approximate values at the last possible interval; in fact, the implied interval should have a finite value for a lower bound and ∞ for the upper bound, as noted by Coller and Williams (1999).23 For their particular parameters there are seven such combinations of interval pairs. The second sheet in the procedure of TCN is qualitatively the same as the first sheet, except that the probabilities of the high prize in each lottery are now 0.1 and 0.7. The specific prizes are different, but have the same structure as the first sheet. From the switching point in the second sheet one can derive another set of interval pairs for the parameters s and a. The values for these intervals will be different from the intervals derived from the first sheet, because of differences in the value of the prizes and probabilities. By crossing the two sets of intervals one can reduce the implied intervals to the intersections from the two sheets. Since the prizes in these two sheets involve gains, the loss aversion parameter l plays no role.
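The crossing of interval pairs across the two gain-frame sheets can be sketched as follows; the interval values below are hypothetical stand-ins for illustration, not the pairs TCN actually infer from their parameters:

```python
# Sketch of crossing the interval pairs implied by two MPL sheets.
# The numeric intervals are hypothetical illustrations, not TCN's tables.

def intersect(a, b):
    """Intersection of two closed intervals, or None if disjoint."""
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else None

def cross(pairs1, pairs2):
    """Keep each combination of (s-interval, a-interval) pairs from the
    two sheets whose s- and a-intervals both overlap."""
    out = []
    for s1, a1 in pairs1:
        for s2, a2 in pairs2:
            s, a = intersect(s1, s2), intersect(a1, a2)
            if s and a:
                out.append((s, a))
    return out

# Hypothetical inferences from the switch point observed on each sheet:
sheet1 = [((0.65, 0.74), (0.66, 0.74)), ((0.55, 0.65), (0.74, 0.85))]
sheet2 = [((0.60, 0.70), (0.70, 0.80))]
print(cross(sheet1, sheet2))
```

Only combinations whose s- and a-intervals both overlap across the two sheets survive, which is what shrinks the implied intervals.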
The third sheet in the procedure of TCN involves losses. There are seven options in which each lottery contains one positive prize and one negative prize, so these are "mixed lotteries." Probabilities of the high prize are fixed at 1/2 for all rows, and variations in three of the prizes occur from row to row. Conditional on a value of s from responses to the first two sheets, the response in the third sheet implies an interval for l. For example, if s = 0.2 then somebody switching at, say, row 4 in the third sheet would have revealed a loss aversion parameter such that 1.88 < l < 2.31, but if s = 1 then somebody switching at row 4 in the third sheet would have revealed a loss aversion parameter such that 1.71 < l < 2.42. The parameters for the third sheet were chosen, for a given observed response, so that the implied intervals for l did not differ widely as s varied over the expected range. Of course, the responses in the third sheet provide information on s as well as l. In other words, if one only observed responses from the third sheet there would be a number of interval pairs for s and l that could account for the data, just as there are a number of interval pairs of s and a that could rationalize the observed response in the first or second sheet. So, the TCN procedure implicitly imposes a recursive estimation structure, so that s is pinned down only from the responses in the first two sheets, and then the responses in the third sheet are used, conditional on some s, to infer bounds for l. This is a wily and parsimonious assumption, but might lead to different inferences than if one simply took all responses in these three sheets and simultaneously estimated s, a, and l, using ML methods discussed in Section 2.2. The TCN procedure generates no information on standard errors of estimates, but such information would be provided automatically with the use of ML methods.
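To see how the third sheet pins down an interval for l conditional on s, consider the following sketch. The mixed-lottery prizes here are hypothetical (not TCN's), and we assume away probability weighting at p = 1/2; with value function v(x) = x^s for gains and -l(-x)^s for losses, the l that equates two mixed lotteries at a given row follows in closed form:

```python
# Sketch of conditional inference of a loss aversion interval, in the
# spirit of the third sheet. Prizes below are hypothetical, and we assume
# no probability weighting at p = 1/2, so each outcome gets weight 1/2.

def lam_at_indifference(s, gain_a, loss_a, gain_b, loss_b):
    """The lam equating A = (gain_a, 1/2; loss_a, 1/2) and
    B = (gain_b, 1/2; loss_b, 1/2) under v(x) = x**s for gains and
    v(x) = -lam * (-x)**s for losses:
    gain_a**s - lam*(-loss_a)**s = gain_b**s - lam*(-loss_b)**s."""
    return (gain_a ** s - gain_b ** s) / ((-loss_a) ** s - (-loss_b) ** s)

# Hypothetical adjacent rows of a mixed-lottery menu:
row3 = dict(gain_a=12.0, loss_a=-4.0, gain_b=30.0, loss_b=-12.0)
row4 = dict(gain_a=12.0, loss_a=-4.0, gain_b=30.0, loss_b=-15.0)

# A switch between these rows brackets lam, conditional on s:
for s in (0.2, 1.0):
    bounds = sorted([lam_at_indifference(s, **row4),
                     lam_at_indifference(s, **row3)])
    print(f"s = {s}: {bounds[0]:.2f} < lam < {bounds[1]:.2f}")
```

With these arbitrary prizes the implied interval moves noticeably as s varies; TCN chose their third-sheet parameters precisely so that it would not.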
Although the parameters they derive are conditional on the specific functional forms assumed, and in some cases (e.g., the third sheet) chosen to generate relatively robust inferences assuming those parameterizations, it should be possible to recover estimates for some minor variations in functional form (e.g., Constant Absolute Risk Aversion (CARA) instead of CRRA).
2. ESTIMATION PROCEDURES

Two broad methods of estimating risk attitudes have been used. One involves the calculation of bounds implied by the observed choices, typically using utility functions that have only a single parameter to be inferred. The other involves the direct estimation by ML of some structural model of a latent choice process in which the core parameters defining risk attitudes
can be estimated, in the manner pioneered by Camerer and Ho (1994; Section 6.1) and Hey and Orme (1994). The latter approach is particularly attractive for non-EUT specifications, where several core parameters combine to characterize risk attitudes. For example, one cannot characterize risk attitudes under Prospect Theory (PT) without making some statement about loss aversion and probability weighting, along with the curvature of the utility function. Thus, joint estimation of all parameters is a necessity for reliable statements about risk attitudes in such cases. We first review examples of each approach (Sections 2.1 and 2.2), and then consider the role of stochastic errors (Section 2.3), the possibility of non-parametric estimation (Section 2.4), and a comparison of risk attitudes elicited from different procedures (Section 2.5) and treatments (Section 2.6). The exposition in this section focuses almost exclusively on EUT characterizations of risk attitudes. Alternative models are considered in Section 3.
2.1. Inferring Bounds

The HL data may be analyzed using a variety of statistical models. Each subject made 10 responses in each task, and typically made 30 responses over three different tasks. The responses in each task can be reduced to a scalar if one looks at the lowest row in panel A of Table 1 where the subject "switched" over to option B.24 This reduces the response to a scalar for each subject and task, but a scalar that takes on integer values between 0 and 10. In fact, over 83% of their data takes on values of 4 through 7, and 94% takes on values between 3 and 8. HL evaluate these data using ordinary least squares regression with the number of safe choices as the dependent variable, estimated on the sample generated by each task separately, and report univariate tests of demographic effects.25 They also report semi-parametric tests of the number of safe choices with experimental condition as the sole control. To study the effects of experimental conditions, while controlling for characteristics of the sample and the conduct of the experiment, one could employ an interval regression model, first proposed by Coller and Williams (1999) for an MPL experimental task (eliciting discount rates). The dependent variable in this analysis is the CRRA interval that each subject implicitly chose when they switched from option A to option B. For each row of panel A in Table 1, one can calculate the bounds on the CRRA coefficient that is implied, and these are in fact reported by Holt and Laury
(2002; Table 3). Thus, for example, a subject who made five safe choices and then switched to the risky alternatives would have revealed a CRRA interval between 0.15 and 0.41, and a subject who made seven safe choices would have revealed a CRRA interval between 0.68 and 0.97, and so on.26 When we consider samples that pool responses over different tasks for the same individual, we would use a random effects panel interval regression model to allow for the correlation of responses from the same subject. Using this panel interval regression model, we can control for all of the individual characteristics collected by HL, which include sex, age, race (Black, Asian, or Hispanic), marital status, personal income, household income, household size, whether the individual is the primary household budget decision-maker, an indicator of full-time employment, student status, faculty status, whether the person is a junior, senior, or graduate student, and whether the person has ever voted. In addition, dummy variables indicate specific sessions, and a separate indicator identifies those sessions conducted at Georgia State University. The treatment variables, of course, include the scale of payoffs (1×, 20×, 50×, or 90×), the order of the task (1, 2, 3, or 4), and the experimental income earned by the subject in task 3. Table 2 presents ML estimates of this interval regression model. Since each subject contributed several tasks, a random effects specification has been used to control for unobserved individual heterogeneity. One of the advantages of the use of inferred bounds for risk attitudes is that one can estimate detailed models such as in Table 2, since interval regression is a relatively stable statistical model and a straightforward extension of ordinary least squares. It is also easy to correct for multiplicative heteroskedasticity using this estimation method, although that can introduce convergence problems as a practical matter.
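These bounds can be reproduced numerically. The sketch below bisects for the CRRA coefficient that makes a subject indifferent in each row of the low-payoff HL menu, so that a subject switching after n safe choices reveals an interval between the crossings for rows n and n + 1; the prizes are those of Holt and Laury (2002), while the bracketing range is our own assumption for illustration:

```python
# Reproducing the CRRA interval bounds implied by MPL switch points in
# the low-payoff Holt and Laury (2002) menu: option A pays $2.00 or $1.60
# and option B pays $3.85 or $0.10, with high-prize probability n/10 in
# row n. The bisection bracket (-0.9, 0.99) is our own illustrative choice.
from math import log

def u(x, r):
    """CRRA utility, with the log limit at r = 1."""
    return log(x) if abs(1.0 - r) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def eu_gap(r, n):
    """EU(safe option A) - EU(risky option B) in row n."""
    p = n / 10.0
    eu_a = p * u(2.00, r) + (1 - p) * u(1.60, r)
    eu_b = p * u(3.85, r) + (1 - p) * u(0.10, r)
    return eu_a - eu_b

def crossing_r(n, lo=-0.9, hi=0.99, tol=1e-6):
    """Bisect for the r that makes a subject indifferent in row n."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if eu_gap(mid, n) * eu_gap(lo, n) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A subject switching to B after n safe choices reveals r in the interval
# (crossing_r(n), crossing_r(n + 1)); rows 5-6 and 7-8 recover the
# 0.15-0.41 and 0.68-0.97 intervals quoted in the text.
for n in (4, 5, 6, 7, 8):
    print(n, round(crossing_r(n), 2))
```

The same crossings, tabulated for every row, are what the interval regression treats as the censoring points of the dependent variable.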
The main benefit of such an estimation is the ability to quickly ascertain treatment and demographic effects for the sample. Consider first the question of order effects. Tasks 1 and 4 were identical in terms of the payoff scale, but differed because of their order and the fact that subjects had some experimental income from the immediately prior task 3. Controlling for that prior income, as well as other individual covariates, we find that there is an order effect: the CRRA coefficient increases by 0.16 in task 4 compared to task 1, and this is significant at the 2% level. Thus, order effects do seem to matter in these experiments, and in a direction that confounds the inferences drawn about scale from the high-payoff treatments. There is also a significant scale effect, as seen for task 3 in Table 2, so the only way that one can ascertain the pure effect of order when there is a confounding change in scale, without such assumptions, would be
Table 2. Interval Regression Model of Responses in Holt and Laury Experiments(a).

Variable    Description                                Estimate  Std. Err.  p-Value  Lower 95% CI  Upper 95% CI
scale5090   Payoffs scaled by 50 or 90                    0.13     0.15       0.38      -0.16          0.42
Task3       Third task                                    0.26     0.04       0.00       0.18          0.34
Task4       Fourth task                                   0.16     0.07       0.02       0.02          0.30
wealth      Wealth coming into the lottery choice         0.00     0.00       0.10       0.00          0.00
Sess2       Session B                                    -0.18     0.20       0.37      -0.58          0.21
Sess3       Session C                                     0.01     0.16       0.92      -0.29          0.32
Sess4       Session D                                    -0.16     0.20       0.43      -0.54          0.23
Sess5       Session E                                    -0.27     0.20       0.17      -0.66          0.12
Sess6       Session F                                    -0.14     0.15       0.34      -0.44          0.15
Sess7       Session G                                    -0.24     0.18       0.18      -0.60          0.11
Sess8       Session H                                    -0.45     0.20       0.02      -0.84         -0.06
Sess9       Session I                                    -0.21     0.18       0.23      -0.55          0.13
Sess10      Session J                                    -0.31     0.18       0.08      -0.67          0.04
Sess11      Session K                                     0.07     0.22       0.75      -0.36          0.50
Sess13      Session M                                     0.10     0.21       0.62      -0.31          0.52
female      Female                                        0.04     0.06       0.46      -0.07          0.16
black       Black                                         0.05     0.16       0.75      -0.26          0.36
asian       Asian                                         0.05     0.10       0.63      -0.14          0.23
hispanic    Hispanic                                     -0.39     0.12       0.00      -0.62         -0.16
age         Age                                          -0.01     0.01       0.34      -0.02          0.01
married     Ever married                                  0.12     0.09       0.18      -0.06          0.30
Pinc2       Personal income between $5k and $15k          0.06     0.11       0.56      -0.15          0.27
Pinc3       Personal income between $15k and $30k        -0.14     0.11       0.24      -0.36          0.09
Pinc4       Personal income above $30k                   -0.10     0.13       0.41      -0.35          0.14
Hinc2       Household income between $5k and $15k         0.24     0.16       0.13      -0.07          0.54
Hinc3       Household income between $15k and $30k        0.17     0.15       0.27      -0.13          0.47
Hinc4       Household income between $30k and $45k        0.08     0.16       0.63      -0.23          0.39
Hinc5       Household income between $45k and $100k       0.31     0.14       0.03       0.03          0.58
Hinc6       Household income over $100k                   0.14     0.17       0.39      -0.18          0.47
nhhd        Number in household                          -0.03     0.03       0.38      -0.09          0.03
decide      Primary household budget decision-maker      -0.09     0.08       0.26      -0.25          0.07
fulltime    Full time employment                          0.15     0.10       0.16      -0.06          0.35
student     Student                                       0.17     0.08       0.02       0.02          0.32
business    Business major                               -0.20     0.10       0.05      -0.39          0.00
junior      Junior                                       -0.16     0.13       0.23      -0.41          0.10
senior      Senior                                       -0.03     0.14       0.84      -0.31          0.25
grad        Graduate student                              0.18     0.15       0.22      -0.11          0.46
faculty     Faculty                                      -0.07     0.24       0.77      -0.55          0.40
voter       Ever voted                                   -0.01     0.07       0.86      -0.15          0.12
gsu         Experiment at Georgia State University       -0.40     0.22       0.07      -0.83          0.03
Constant                                                  0.63     0.27       0.02       0.10          1.15
su          Standard deviation of random individual
            effect                                        0.29     0.03       0.00       0.24          0.34
se          Standard deviation of residual                0.33     0.01       0.00       0.30          0.36

Notes: Log-likelihood value is -838.24; Wald test for null hypothesis that all coefficients are zero has a χ² value of 118.44 with 40 degrees of freedom, implying a p-value less than 0.001; fraction of the total error variance due to random individual effects is estimated to be 0.433, with a standard error of 0.043.
(a) Random-effects interval regression. N = 495, based on 181 subjects from Holt and Laury (2002).
to modify the HL design and directly test for it. Harrison, Johnson, McInnes, and Rutström (2005b) provided such a test, and found that there were statistically significant order effects on risk attitudes; we consider their data below. We observe no significant effect in Table 2 from sex: women are estimated to have a CRRA that is 0.04 higher than men, but the standard error of this estimate is 0.06. Hispanic subjects do have a statistically significant difference in risk attitudes: their CRRA is 0.39 lower on average, with a p-value of less than 0.001. Subjects with an annual household income that places them in the "upper middle class" (between $45,000 and $100,000) have a significantly higher CRRA that is 0.31 above the norm, with a p-value of 0.03. Students have a CRRA that is 0.17 higher on average (p-value = 0.02); the HL sample included faculty and staff in their
experiments. Business majors were less risk averse on average, by about 0.20 (p-value = 0.05). There are some quantitatively large session effects, although only two sessions (H and J) have effects that are statistically significant in terms of the p-value. To preserve anonymity, the locations of these sessions apart from those at Georgia State University are confidential, so one can only detect individual session effects. Fig. 6 shows the distribution of predicted CRRA coefficients from the interval regression model estimates of Table 2 for task 1 (top left panel) and task 3 (bottom left panel). The estimates for the high-payoff task 3 are only from those subjects that faced the payoffs that were scaled by a factor of 20. The average low-payoff CRRA is estimated to be 0.28, with a standard deviation of 0.20; the average high-payoff CRRA is estimated to be 0.54, with a standard deviation of 0.26. As Fig. 6 demonstrates, the distribution is normally shaped, with relatively few of the estimates exhibiting significant risk aversion above 0.9. Harrison et al. (2005b) recruited 178 subjects from the University of South Carolina to participate in a series of non-computerized experiments using the MPL procedure of HL. Their design called for subjects to
[Fig. 6. Interval Regression Estimates of Risk Aversion From Holt and Laury (2002) Experiments (Fraction of the Sample, N = 181). Histograms of predicted CRRA coefficients, each on an axis from -1 to 1, for the first task (1× responses), the third task (20× responses), and the fourth task (1× responses).]
participate in either a 1× session, a 10× session, or a 1×10× session, where the "×" denotes the scalar applied to the basic payoffs used by HL in their 1× design (shown in panel A of Table 1). In the 1× session that is all that the subjects were asked to do; in the 10× session they did one risk elicitation task but with payoffs scaled up by 10. In the 1×10× session subjects were asked to state their choices over 1× lotteries, and then given the opportunity to give up any earnings from that task and participate in a comparable 10× task. We examine the responses of the subjects in the 10× session and in the 10× part of the 1×10× session, with controls for whether their 10× responses were preceded by the 1× task or not. Table 3 reports the statistical analysis of these data, also using an interval regression model. Since each subject made only one 10× choice, no panel corrections are needed. The results show no significant effect from sex, and some effect from age, citizenship, and task order. One limitation of this approach is that it assumes that all of the heterogeneity of the sample is captured by the individual characteristics measured by the experimenter. Although the socio-demographic questions typically used are relatively extensive, there is always some concern that there might be unobserved individual heterogeneity that could affect preferences towards risk. It is possible to undertake a statistical analysis of the responses of each individual, which implicitly controls for unobserved heterogeneity in the pooled analysis. However, the MPL design is not well suited to such an estimation task, even if it can be undertaken numerically, due to the small sample size for each individual. It is a simple matter to extend the HL design to have the subject consider several MPL tables for different lottery prizes, providing a richer data set with which to characterize individual risk attitudes (e.g., Harrison, Lau, & Rutström, 2007b).
Apart from providing several interval responses per subject, such designs allow one to vary the prizes in the MPL design and pin down the latent CRRA more precisely by having overlapping intervals across tasks, as explained by Harrison et al. (2005d). Thus, if one task tells us that a given subject has a CRRA interval between 0.1 and 0.3, and another task tells us that the same subject has an interval between 0.2 and 0.4, we can infer a CRRA interval between 0.2 and 0.3 from the two tasks (with obvious assumptions about the absence of order effects, or some controls for them). Another limitation of this approach, somewhat more fundamental, is that it restricts the analyst to utility functions that can characterize risk attitudes using one parameter. This is because one must infer the bounds that make the subject indifferent between the switch points, and such inferences become virtually incoherent statistically when there are two or more
Table 3. Interval Regression Model of Responses in Harrison, Johnson, McInnes, and Rutström Experiments(a).

Variable   Description                                   Estimate  Std. Err.  p-Value  Lower 95% CI  Upper 95% CI
Female     Female                                          0.088     0.08       0.26      -0.06          0.24
Black      Black                                           0.084     0.10       0.40      -0.11          0.28
Age        Age in years                                    0.022     0.01       0.07       0.00          0.05
Business   Major is in business                           -0.043     0.07       0.56      -0.19          0.10
Sophomore  Sophomore in college                           -0.068     0.11       0.54      -0.29          0.15
Junior     Junior in college                              -0.035     0.12       0.77      -0.27          0.20
Senior     Senior in college                              -0.023     0.13       0.85      -0.27          0.22
GPAhi      High GPA (greater than 3.75)                    0.004     0.09       0.97      -0.18          0.19
GPAlow     Low GPA (below 3.24)                           -0.137     0.09       0.12      -0.31          0.04
Graduate   Graduate student                                0.034     0.16       0.83      -0.27          0.34
EdExpect   Expect to complete a PhD or
           Professional Degree                            -0.119     0.09       0.18      -0.29          0.05
EdFather   Father completed college                        0.106     0.09       0.24      -0.07          0.28
EdMother   Mother completed college                       -0.027     0.08       0.75      -0.19          0.14
Citizen    U.S. citizen                                    0.234     0.12       0.05       0.00          0.47
Order      RA session: 10× task comes after 1× task        0.166     0.08       0.03       0.01          0.32
Constant                                                  -0.092     0.34       0.78      -0.75          0.56

Notes: Log-likelihood value is -290.2; Wald test for null hypothesis that all coefficients are zero has a χ² value of 18.36 with 15 degrees of freedom, implying a p-value of 0.244.
(a) All subjects facing 10× payoffs. N = 178 subjects from Harrison et al. (2005b).
parameters. Of course, for popular functions such as CRRA or CARA this is not an issue, but if one wants to move beyond those functions then there are problems. It is possible to devise one-parameter functional forms with more flexibility than CRRA or CARA in some dimension, as illustrated nicely by the one-parameter Expo-Power (EP) function developed by Abdellaoui, Barrios, & Wakker (2007a; Section 4). But in general we will need to move to structural modeling with ML to accommodate richer models, illustrated in Section 2.2. We conclude that relatively consistent estimates of the CRRA coefficient of experimental subjects emerge from the HL experiments and the MPL
design used in subsequent studies. There are, however, some apparent effects from task order, explored further in Harrison et al. (2005b) and Holt and Laury (2005). And there are significant limitations on the flexibility of the modeling of risk attitudes, pointing to the need for a complementary approach that allows structural estimation of latent models of choice under uncertainty.
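Pooling interval responses across overlapping MPL tasks, as described above, is mechanical; a minimal sketch of the intersection logic, using the 0.1-0.3 and 0.2-0.4 example from the text:

```python
# Pooling CRRA intervals elicited from several MPL tasks by intersection.

def pool_intervals(intervals):
    """Intersect a list of (lo, hi) CRRA intervals; None if the responses
    are mutually inconsistent (e.g., order effects or response error)."""
    lo = max(i[0] for i in intervals)
    hi = min(i[1] for i in intervals)
    return (lo, hi) if lo <= hi else None

print(pool_intervals([(0.1, 0.3), (0.2, 0.4)]))  # -> (0.2, 0.3)
print(pool_intervals([(0.1, 0.3), (0.5, 0.8)]))  # -> None
```

A None result flags a subject whose responses no single CRRA value can rationalize, which is one practical way to detect order effects or noise before pooling.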
2.2. Structural Estimation

Assume for the moment that utility of income is defined by

U(x) = x^(1-r) / (1-r)   (1)
where x is the lottery prize and r ≠ 1 is a parameter to be estimated. For r = 1, assume U(x) = ln(x) if needed. Thus, r is the coefficient of CRRA: r = 0 corresponds to RN, r < 0 to risk loving, and r > 0 to risk aversion. Let there be K possible outcomes in a lottery. Under EUT the probabilities for each outcome k, p_k, are those that are induced by the experimenter, so expected utility is simply the probability-weighted utility of each outcome in each lottery i:

EU_i = Σ_{k=1,...,K} (p_k × U_k)   (2)
The EU for each lottery pair is calculated for a candidate estimate of r, and the index

∇EU = EU_R - EU_L   (3)

calculated, where EU_L is the "left" lottery and EU_R is the "right" lottery. This latent index, based on latent preferences, is then linked to the observed choices using a standard cumulative normal distribution function Φ(∇EU). This "probit" function takes any argument between ±∞ and transforms it into a number between 0 and 1 using the function shown in Fig. 7. Thus, we have the probit link function,

prob(choose lottery R) = Φ(∇EU)   (4)
The logistic function is very similar, as illustrated in Fig. 7, and leads instead to the "logit" specification.
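The two link functions can be compared directly; a minimal sketch using only the standard library:

```python
# The probit and logit links of Eq. (4): both map the latent index into
# a choice probability in (0, 1).
from math import erf, exp, sqrt

def probit(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def logit(z):
    """Standard logistic CDF."""
    return 1.0 / (1.0 + exp(-z))

for z in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(z, round(probit(z), 3), round(logit(z), 3))
```

The two CDFs agree at zero but differ in the tails and in their implicit scale (a standard logistic has standard deviation π/√3 ≈ 1.81), which is one reason estimates of r can shift between probit and logit specifications.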
[Fig. 7. Normal and Logistic Cumulative Density Functions (Dashed Line is Normal and Solid Line is Logistic). Prob(y*) plotted against y* over the range -5 to 5.]
Even though Fig. 7 is common in econometrics texts, it is worth noting explicitly and understanding. It forms the critical statistical link between observed binary choices, the latent structure generating the index y*, and the probability of that index y* being observed. In our applications y* refers to some function, such as Eq. (3), of the EU of two lotteries; or, later, the Prospective Utility (PU) of two lotteries. The index defined by Eq. (3) is linked to the observed choices by specifying that the R lottery is chosen when Φ(∇EU) > 1/2, which is implied by Eq. (4). Thus, the likelihood of the observed responses, conditional on the EUT and CRRA specifications being true, depends on the estimates of r given the above statistical specification and the observed choices. The "statistical specification" here includes assuming some functional form for the cumulative density function (CDF), such as one of the two shown in Fig. 7. If we ignore responses that reflect indifference for the moment, the conditional log-likelihood would be

ln L(r; y, X) = Σ_i [ (ln Φ(∇EU) | y_i = 1) + (ln(1 - Φ(∇EU)) | y_i = -1) ]   (5)
where y_i = 1 (-1) denotes the choice of the Option R (L) lottery in risk aversion task i, and X is a vector of individual characteristics reflecting age, sex, race, and so on. In most experiments the subjects are told at the outset that any expression of indifference would mean that if that choice was selected to be played out, the experimenter would toss a fair coin to make the decision for them. Hence, one can modify the likelihood to take these responses into account by recognizing that such choices implied a 50:50 mixture of the likelihood of choosing either lottery:

ln L(r; y, X) = Σ_i [ (ln Φ(∇EU) | y_i = 1) + (ln(1 - Φ(∇EU)) | y_i = -1) + (ln(1/2 × Φ(∇EU) + 1/2 × (1 - Φ(∇EU))) | y_i = 0) ]   (5′)

where y_i = 0 denotes the choice of indifference. In our experience very few subjects choose the indifference option, but this formal statistical extension accommodates those responses.27 The latent index, Eq. (3), could have been written in a ratio form:
∇EU = EU_R / (EU_R + EU_L)   (3′)
and then the latent index would already be in the form of a probability between 0 and 1, so we would not need to take the probit or logit transformation. We will see that this specification has also been used, with some modifications we discuss later, in HL. Appendix F reviews procedures and syntax from the popular statistical package Stata that can be used to estimate structural models of this kind, as well as more complex models discussed later. The goal is to illustrate how experimental economists can write explicit ML routines that are specific to different structural choice models. It is a simple matter to correct for stratified survey responses, multiple responses from the same subject ("clustering"),28 or heteroskedasticity, as needed, and those procedures are discussed in Appendix F. Applying these methods to the data from the Hey and Orme (1994) experiments, one can obtain ML estimates of the core parameter r. Pooling all 200 of the responses from each subject over two sessions, and pooling over all subjects, we estimate r = 0.66 with a standard error of 0.04, assuming a normal CDF as in the dashed line in Fig. 7. These estimates correct for the clustering of responses by the same subject. If we instead assume a logistic CDF, as in the solid line in Fig. 7, we obtain an
estimate r = 0.80 with a standard error of 0.04. This is not a significant economic difference, but it does point to the fact that parametric assumptions matter for estimation of risk attitudes using these methods. In particular, the choice of normal or logistic CDF is almost entirely arbitrary in this setting. One might apply some nested or non-nested hypothesis test to choose between specifications, but we will see that it is dangerous to rush into rejecting alternative specifications too quickly. Extensions of the basic model are easy to implement, and this is the major attraction of the structural estimation approach. For example, one can easily extend the functional forms of utility to allow for varying degrees of RRA. Consider, as one important example, the EP utility function proposed by Saha (1993). Following Holt and Laury (2002), the EP function is defined as

U(x) = (1 - exp(-a x^(1-r))) / a   (1′)

where a and r are parameters to be estimated. RRA is then r + a(1 - r)y^(1-r), so RRA varies with income y if a ≠ 0. This function nests CRRA (as a → 0) and CARA (as r → 0). We illustrate the use of this EP specification later. It is also a simple matter to generalize this ML analysis to allow the core parameter r to be a linear function of observable characteristics of the individual or task. In the HO experiments no demographic data were collected, but we can examine the effect of the subjects coming back for a second session by introducing a binary dummy variable (Task) for the second session. In this case, we extend the model to be r = r0 + r1 × Task, where r0 and r1 are now the parameters to be estimated. In effect the prior model was to assume r = r0 and just estimate r0. This extension significantly enhances the attraction of structural ML estimation, particularly for responses pooled over different subjects, since one can condition estimates on observable characteristics of the task or subject. We illustrate the richness of this extension later. For now, we estimate r0 = 0.60 and r1 = 0.10, with standard errors of 0.04 and 0.02, respectively, using the probit specification. So there is some evidence of a session effect, with slightly greater risk aversion in the second session. The effect of demographics and task can be examined using data generated by Harbaugh, Krause, and Vesterlund (2002). They examined lottery choices by a large number of individuals, varying in age between 5 and 64. Focusing on their lottery choices for dollars with individuals aged 19 and over, seven choices involved gambles in a gain frame, and
Risk Aversion in the Laboratory
seven involved gambles in a loss frame. The loss frame experiments all involved subjects having some endowment up front, such that the loss was solely a framed loss, not a loss relative to the income they had coming into the session. In all cases the gamble was compared to a certain gain or loss, so these are relatively simple gambles to evaluate. The only demographic information included is age and sex, so we include those and interact them.29 We also allow for quadratic effects of age. Table 4 collects the estimates for models estimated separately on the choices made in the gain frame and choices made in the loss frame; later we
Table 4. Structural Maximum Likelihood Estimates of Risk Attitudes in Harbaugh, Krause, and Vesterlund Experiments(a).

Variable   Description           Estimate   Standard   p-Value   Lower 95%   Upper 95%
                                            Error                Confidence  Confidence
                                                                 Interval    Interval

A. Gain Domain
Order2     Task order control      0.009     0.007      0.168     −0.004      0.023
Order3     Task order control      0.010     0.008      0.197     −0.005      0.026
Order4     Task order control      0.005     0.007      0.481     −0.009      0.019
Male       Male                    0.016     0.029      0.594     −0.042      0.073
Age        Age in years            0.014     0.001      0.000      0.011      0.017
Age2       Age squared            −0.000     0.000      0.000     −0.000     −0.000
Mage       Male × age             −0.001     0.002      0.776     −0.005      0.003
Mage2      Male × age²             0.000     0.000      0.852     −0.000      0.000
Constant                           0.476     0.021      0.000      0.434      0.517

B. Loss Domain
Order2     Task order control      0.004     0.006      0.575     −0.009      0.016
Order3     Task order control      0.000     0.007      0.974     −0.013      0.014
Order4     Task order control     −0.005     0.007      0.494     −0.018      0.009
Male       Male                   −0.030     0.024      0.205     −0.077      0.016
Age        Age in years            0.013     0.001      0.000      0.011      0.016
Age2       Age squared            −0.000     0.000      0.000     −0.000     −0.000
Mage       Male × age              0.003     0.002      0.053     −0.000      0.007
Mage2      Male × age²            −0.000     0.000      0.026     −0.000     −0.000
Constant                           0.483     0.016      0.000      0.452      0.514

Notes: Log-likelihood values are −8,070.56 in the gain domain, and −9,931.9 in the loss domain; the Wald test for the null hypothesis that all coefficients are zero has a χ² value of 339.2 with 8 degrees of freedom, implying a p-value less than 0.001, in the gain domain, and a value of 577.9 with 8 degrees of freedom in the loss domain.
(a) Maximum likelihood estimation of CRRA utility function using all pooled binary choices of adults. N = 1092, based on 156 adult subjects from Harbaugh et al. (2002).
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
consider the effect of assuming a model of loss aversion, rather than just viewing these as different frames.30 There is virtually no effect from the loss frame, and in fact some evidence of a slight increase in risk aversion in that frame. The average of individual CRRA estimates is 0.476 in the gain frame, and is virtually identical in the loss frame. We find no evidence of a sex effect in the gain frame. The direct effect of sex is to change CRRA by 0.016, but this small effect has a p-value of 0.594 and a 95% confidence interval that easily spans zero. The joint effect of sex and age is also statistically insignificant: a test of the joint effect of sex and the sex–age interactions has a χ² value of 1.17, and with three degrees of freedom has a p-value of 0.761. Age has a significant effect on CRRA in the gain domain, at first increasing RRA and then eventually decreasing RRA as the individual gets older. The order dummies indicate no significant effect of task presentation order. There does appear to be an effect of sex on CRRA elicited in the loss frame. This effect is not direct, but operates through the interaction with age. Apart from the statistical significance of the individual interaction terms, a test that they are jointly zero has a χ² of 7.08 and a p-value of 0.069 with three degrees of freedom.
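To fix ideas, the EP utility function and its implied RRA can be sketched numerically. This is an illustrative sketch in Python, not the estimation code used for these results; the parameter values in the usage below are only indicative.

```python
import math

def ep_utility(y, a, r):
    """Expo-Power (EP) utility of Saha (1993): U(y) = (1 - exp(-a * y**(1-r))) / a."""
    return (1.0 - math.exp(-a * y ** (1.0 - r))) / a

def ep_rra(y, a, r):
    """Relative risk aversion implied by EP utility: r + a*(1-r)*y**(1-r)."""
    return r + a * (1.0 - r) * y ** (1.0 - r)
```

As a → 0 the implied RRA collapses to the constant r (CRRA), while a > 0 with r < 1 makes RRA rise with income (IRRA), which is the pattern discussed below for the Holt and Laury data.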
2.3. Stochastic Errors

An important extension of the core model is to allow for subjects to make some errors. The notion of error is one that has already been encountered in the form of the statistical assumption that the probability of choosing a lottery is not one when the EU of that lottery exceeds the EU of the other lottery. This assumption is clear in the use of a link function between the latent index ∇EU and the probability of picking one or other lottery; in the case of the normal CDF, this link function is Φ(∇EU) and is displayed in Fig. 7. If there were no errors from the perspective of EUT, this function would be a step function in Fig. 7: zero for all values of y < 0, anywhere between 0 and 1 for y = 0, and 1 for all values of y > 0. By varying the shape of the link function in Fig. 7, one can informally imagine subjects that are more sensitive to a given difference in the index ∇EU and subjects that are not so sensitive. Of course, such informal intuition is not strictly valid, since we can choose any scaling of utility for a given subject, but it is suggestive of the motivation for allowing for structural errors, and why we might want them to vary across subjects or task domains. Consider the structural error specification used by HL, originally due to Luce. The EU for each lottery pair is calculated for candidate estimates of r,
as explained above, and the ratio

∇EU = EU_R^(1/μ) / (EU_L^(1/μ) + EU_R^(1/μ))    (3″)

calculated, where μ is a structural "noise parameter" used to allow some errors from the perspective of the deterministic EUT model. The index ∇EU is in the form of a cumulative probability distribution function defined over differences in the EU of the two lotteries and the noise parameter μ. Thus, as μ → 0 this specification collapses to the deterministic choice EUT model, where the choice is strictly determined by the EU of the two lotteries; but as μ gets larger and larger the choice essentially becomes random. When μ = 1, this specification collapses to Eq. (3′), where the probability of picking one lottery is given by the ratio of the EU of one lottery to the sum of the EU of both lotteries. Thus, μ can be viewed as a parameter that flattens out the link functions in Fig. 7 as it gets larger. This is just one of several different types of error story that could be used, and Wilcox (2008a, 2008b) provides masterful reviews of the implications of the alternatives.31 The use of this structural error parameter can be illustrated by a replication of the estimates provided by Holt and Laury (2002). Using the EP utility function in Eq. (1′), the Luce specification in Eq. (3″), and ignoring the fact that each subject made multiple binary choices, we estimate r = 0.268 and a = 0.028 using the non-hypothetical data from HL. Panel A of Table 5 lists these estimates, which replicate the results reported by HL (p. 1653) almost exactly. Their estimates were obtained using optimization procedures in GAUSS, and did not calculate the likelihood at the level of the individual observation. Instead their data were aggregated according to the lottery choices in each row, and scaled up to reflect the correct sample size of observations. This approach works fine for a completely homogeneous model in which one does not seek to estimate effects of individual characteristics or correct for unobserved heterogeneity at the level of the individual.
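The Luce specification can be sketched directly as a choice-probability function. This is a minimal illustration in Python, not the GAUSS or ML code behind the estimates, and it assumes positive expected utilities:

```python
def luce_prob_right(eu_left, eu_right, mu):
    """Luce structural error specification:
    Pr(choose right) = EU_R**(1/mu) / (EU_L**(1/mu) + EU_R**(1/mu)).
    Expected utilities must be positive; mu is the structural noise parameter."""
    el = eu_left ** (1.0 / mu)
    er = eu_right ** (1.0 / mu)
    return er / (el + er)
```

As μ → 0 the choice becomes essentially deterministic in the EU difference, at μ = 1 the probability is just the simple EU ratio, and for very large μ the probability flattens toward one half.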
But the approach adopted in our replication does operate at the level of the individual observation, so it is possible to make these extensions. In fact, allowing for unobserved individual heterogeneity does not affect these estimates greatly. The role of the stochastic error assumption in Eq. (3″) can be evaluated by using Eq. (3′) instead, which is to assume that μ = 1 in Eq. (3″). The effect, shown in panel B of Table 5, is to estimate more risk-loving behavior, with r < 0. Hence, at low levels of income subjects are now
Table 5. Structural Maximum Likelihood Estimates of Risk Attitudes in Holt and Laury Experiments(a).

Variable   Description                  Estimate   Standard   p-Value   Lower 95%   Upper 95%
                                                   Error                Confidence  Confidence
                                                                        Interval    Interval

A. Luce Error Specification and No Corrections for Clustering
r          Utility function parameter     0.268     0.017     <0.001     0.234       0.302
a          Utility function parameter     0.028     0.002     <0.001     0.024       0.033
μ          Structural noise parameter     0.134     0.004     <0.001     0.125       0.143

B. No Luce Error Specification, No Corrections for Clustering
r          Utility function parameter    −0.161     0.044     <0.001    −0.247      −0.074
a          Utility function parameter     0.015     0.003     <0.001     0.010       0.020

C. Probit Link Function, No Fechner Error Specification, Corrections for Clustering
r          Utility function parameter     0.293     0.021     <0.001     0.251       0.334
a          Utility function parameter     0.038     0.003     <0.001     0.032       0.043

D. Probit Link Function, Fechner Error Specification, and Corrections for Clustering
r          Utility function parameter     0.684     0.049     <0.001     0.589       0.780
a          Utility function parameter     0.045     0.059      0.452    −0.072       0.161
μ          Structural noise parameter     0.172     0.016     <0.001     0.140       0.203

(a) Maximum likelihood estimation of EP utility function using all pooled binary choices. N = 3990, based on 212 subjects from Holt and Laury (2002).
estimated to be risk loving. There is still evidence of Increasing Relative Risk Aversion (IRRA), with a > 0. However, the log-likelihood of this specification is much worse than the original HL specification, and we can comfortably reject the null that μ = 1. The point of this result is to demonstrate that the stochastic identifying restriction, to use the concept developed by Wilcox (2008a, 2008b), is not innocuous for inference about risk attitudes. There is one other important error specification, due originally to Fechner and popularized by Hey and Orme (1994).32 This error specification posits
the latent index

∇EU = (EU_R − EU_L) / μ    (3‴)

instead of Eq. (3), (3′), or (3″). Wilcox (2008a) notes that as an analytical matter the evidence of IRRA in HL would be weaker, or perhaps even absent, if one had used a Fechner error specification instead of a Luce error specification. This important claim, that the evidence for IRRA may be an artifact of the (more or less arbitrary) stochastic identifying restriction assumed, can be tested with the HL data. The estimates in panels C and D of Table 5 confirm the claim of Wilcox (2008a). In panel C, we employ the probit link function Eq. (4) and the latent index function Eq. (3), and assume no Fechner error specification.33 We confirm the original estimates of HL, with minor deviations: the path of estimated RRA in the left side of Fig. 9 mimics the original results from HL in Fig. 8. But when we add a Fechner error specification, in panel D of Table 5, we find striking evidence of CRRA over this prize domain. The path of RRA in this case is shown on the right side of Fig. 9, and provides a dramatic contrast to Fig. 8.
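A minimal sketch of the Fechner specification, pairing the latent index with a probit link; this is illustrative Python, not the estimation code behind Table 5:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def fechner_prob_right(eu_left, eu_right, mu):
    """Fechner error specification: the latent index (EU_R - EU_L)/mu
    is passed through the probit link to give Pr(choose right)."""
    return phi((eu_right - eu_left) / mu)
```

Here larger values of μ scale down the EU difference and flatten choice probabilities toward one half, playing the same "noise" role as in the Luce specification but on the difference rather than the ratio of expected utilities.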
Fig. 8. Estimated Relative Risk Aversion Using the Holt–Laury Statistical Model. Estimated from Experimental Data of Holt and Laury (2002) Assuming Logit Likelihood Function and Luce Noise. (The figure plots estimated RRA, from 0 to 2.5, against income in dollars from $0 to $350.)
Fig. 9. Estimated Relative Risk Aversion with Expo-Power Utility and Fechner Noise. Estimated from Experimental Data of Holt and Laury (2002). (Two panels, "Probit and No Noise" and "Probit and Fechner Noise," each plotting estimated RRA from 0 to 2.5 against income in dollars from $0 to $350.)
The log-likelihood of the Fechner specification is worse than the log-likelihood of the Luce specification. Since neither specification is nested in the other, a non-nested hypothesis test would seem to be called for. We reject the Fechner specification using either the Vuong (1989) test or the variant proposed by Clarke (2003). On the other hand, we prefer to avoid rejecting one specification out of hand just yet, since an alternative is to posit a latent data generating process in which two or more specifications have some validity. We return to consider this approach later.
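The flavour of such a non-nested comparison can be sketched from per-observation log-likelihood contributions. The function below is a simplified, uncorrected version of the Vuong statistic, for illustration only (the published tests include refinements not shown here):

```python
from math import sqrt

def vuong_statistic(ll_a, ll_b):
    """Uncorrected Vuong (1989) statistic from per-observation log-likelihood
    contributions of two non-nested models. Asymptotically N(0,1) under the
    null of equivalence; large positive values favour model A."""
    n = len(ll_a)
    d = [a - b for a, b in zip(ll_a, ll_b)]   # pointwise log-likelihood differences
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return sqrt(n) * mean_d / sqrt(var_d)
```

With synthetic contributions that systematically favour one model, the statistic is large; when the models fit equally well observation by observation, it is near zero.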
2.4. Non-Parametric Estimation

It is possible to estimate the EUT model without assuming a functional form for utility, following Hey and Orme (1994). This approach works well for problem domains in which there are relatively few outcomes, since it involves estimation of one parameter for all but two of the outcomes. So if the task domain is constrained to just four outcomes, as in HO or HL, there are only two parameters to be estimated. But if the task domain spans many outcomes, these methods become inefficient and one must resort to a function defined by a few parameters, such as CRRA or EP utility functions.
To illustrate, we use the experimental data of HO, and then the replication of their experiments by Harrison and Rutström (2005). We also use the Fechner noise specification introduced above, to replicate the specification of HO. In HO there were only four monetary prizes of £0, £10, £20, and £30. We normalize to u(£0) = 0 and u(£30) = 1, and estimate u(£10), u(£20), and the noise parameter. As explained by HO, one could normalize the noise parameter to some fixed value and then estimate u(£30) instead, but this choice of normalization seems the most natural. It is then possible to predict the values of the two estimated utilities: pooling over the two sessions and across subjects, we estimate u(£10) = 0.66 with a standard error of 0.02, and u(£20) = 0.84 with a standard error of 0.01, so u(£0) < u(£10) < u(£20) < u(£30) as expected. The application of this estimation procedure in HO was at the level of the individual, which obviously allows variation in estimated utilities over individuals. This illustrative calculation does not. The experiments of Harrison and Rutström (2005) were intended, in part, to replicate those of HO in the gain frame and additionally collect individual characteristics. In their case the prizes spanned $0, $5, $10, and $15. Employing the same non-parametric structure for this data as for the HO data above, the estimates are u($5) = 0.60 and u($10) = 0.80. In these data a set of demographic characteristics for each subject are known and we can therefore allow the estimated utilities to vary linearly with these characteristics. It is then possible to simply predict the estimated utilities, using the characteristics of each subject and the estimated coefficients on those characteristics, and plot them. Fig. 10 shows the distribution of estimated values. No subject had estimates that implied u($10) < u($5).
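The logic of the non-parametric specification can be sketched as follows; the interior utility values are the pooled point estimates reported above, and the helper names are our own:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# Normalized utilities over the four HO prizes (u(0) = 0, u(30) = 1);
# the two interior values are the pooled point estimates from the text.
U = {0: 0.0, 10: 0.66, 20: 0.84, 30: 1.0}

def expected_utility(lottery):
    """lottery: a list of (probability, prize) pairs over the four prizes."""
    return sum(p * U[x] for p, x in lottery)

def prob_choose_right(left, right, mu):
    """Fechner/probit choice probability using the estimated utilities."""
    return phi((expected_utility(right) - expected_utility(left)) / mu)
```

Because only the four prize utilities enter the likelihood, no functional form for utility is imposed; monotonicity of the estimates is a result, not an assumption.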
2.5. Comparing Procedures

Do the various risk elicitation procedures imply essentially the same risk attitudes? In part this question requires that one agree on a standard way of representing lotteries, and that we understand the effect of those representations on elicited risk attitudes. It also requires that we agree on how to characterize risk attitudes statistically, and there are again many alternatives available in that direction that should be expected to affect inferred risk attitudes (Wilcox, 2008a). The older literature on utility elicitation was careful to undertake controlled comparisons of different procedures, as reviewed in Hershey, Kunreuther, and Schoemaker (1982) and illustrated by Hershey and Schoemaker (1985). But none of that
Fig. 10. Non-Parametric Estimates of Utility. (Assuming EUT and Normalized so that u($0) = 0 and u($15) = 1; Kernel Density of Predicted Utility Estimates for N = 120 Subjects; Data from Hey–Orme Replication of Harrison and Rutström (2005).) The figure plots kernel densities of the predicted u($5) and u($10) over estimated utility values from 0 to 1.
literature seemed to be concerned with incentive compatibility and the effect of real rewards. The striking counter-example, of course, is the preference reversal literature started for economists by Grether and Plott (1979), since they used methods for eliciting responses which were incentive compatible and they used real consequences to choices. And the phenomenon of preference reversals itself may be viewed as the claim that risk attitudes elicited from two procedures are not consistent, since the reversal is an ‘‘as if’’ change in risk attitudes when the elicitation mode changes. Unfortunately, the preference reversals in question involved a comparison of risk attitudes elicited with the RLP and BDM procedures, which both rely on strong assumptions to reliably elicit preferences. It may therefore be useful to compare the three procedures that we do find attractive on a priori grounds: the MPL of Holt and Laury (2002), the RLP of Hey and Orme (1994), and the OLS of Binswanger (1980, 1981). Each procedure is applied to the same sample drawn from the same population: students at the University of Central Florida. In one session the MPL method was first and the OLS method last, in another session these orders were reversed, and the RLP method was always presented to subjects in
between. The subjects learned what their payoffs were from each procedure at the end of the sequence of tasks for that procedure, so there is some potential in this design for income effects. There were 26 subjects in one session and 27 subjects in the second session, for a pooled sample of 53. The parameters for the MPL procedure were scaled up by a factor of 10 relative to those used in the baseline experiments of Holt and Laury (2002), shown in panel A of Table 1. Thus, the prizes were $1.00, $16, $20, and $38.50. The parameters for the OLS procedure follow the broad pattern proposed by Binswanger (1980, 1981). The certain option offers $10 whether a coin toss is heads or tails, and the next options offer $19 or $9, $24 or $8, $25 or $7, $30 or $6, $32 or $4, $38 or $2, and finally $40 or $0.34 The RLP procedure used lotteries with probabilities and prizes that were each randomly drawn.35 Each prize was randomly drawn from the uniform interval ($0.01, $15.00) in dollars and cents, and the number of prizes in each lottery pair was either 2, 3, or 4, also selected at random. For any lottery pair the cardinality of the outcomes was the same, so if one lottery had three prizes the other lottery would also have three prizes. The probabilities were also drawn at random, and represented to subjects to two decimal places. Each subject was given 60 pairs of lotteries to choose from, and three were picked at random to be played out and paid. The expected value of each lottery was roughly $7.50, with the expected value from the RLP procedure as a whole around $22.70. Thus, the scale of prizes in the MPL and OLS procedures was virtually identical: up to $38.50 and $40, respectively. The scale of prizes in the RLP procedure was comparable: up to $45 if all three selected lotteries generated an outcome of $15 each. In each case we estimate a CRRA model using Eq. (1). For the MPL and RLP procedures we use the probit link function, that is Eq.
(4), defined over the difference in EU of the two lotteries for a candidate estimate of r and μ, and the Fechner error specification Eq. (3‴). For the OLS procedure we use the standard logit specification originally due to Luce (1959); McFadden (2001) reviews the storied history of this specification beautifully, and Train (2003) reviews modern developments. The EU for each lottery in this latter specification is calculated for a candidate estimate of r and μ, the exponential of the EU is taken as

eu_i = exp(EU_i)^(1/μ)    (6)

and the index

∇EU_i = eu_i / (eu_1 + eu_2 + eu_3 + eu_4 + eu_5 + eu_6)    (7)
calculated for each lottery i. This latent index, based on latent preferences, is in the form of a probability, and can therefore be directly linked to the observed choices; it is a multiple-lottery analogue of the Luce error specification Eq. (3″) for binary lottery choice.36 The results indicate consistency in the elicitation of risk attitudes, at least at the level of the inferred sample distribution. The point estimates (and 95% confidence intervals) for the MPL, RLP, and OLS procedures, respectively, are 0.75 (0.62, 0.88), 0.51 (0.42, 0.60), and 0.66 (0.44, 0.89). There is no significant order effect on the estimates from the OLS procedure: the estimates when it was first are 0.68 (0.43, 0.94), and when it was last they are 0.65 (0.25, 1.05). The 95% confidence intervals are wider in these estimates of the sub-samples, due to smaller samples. There is, however, a small but statistically significant order effect on the estimates from the MPL procedure: when it was first the CRRA estimate is 0.61 (0.46, 0.76) and when it was last the estimate is 0.86 (0.67, 1.05). These results suggest that the procedures elicit roughly the same risk attitudes, apart from the sensitivity of the MPL procedure to order. Thus, one would tentatively conclude, based on the above analysis, that the procedures should be expected to generate roughly the same estimates of risk attitudes for a target population, at least when each is used as the sole measuring instrument at the beginning of a session.37 A closely related issue is the temporal stability of risk preferences, even when one uses the same elicitation procedure. It is possible to define temporal stability of preferences in several different ways, reflecting alternative conceptual definitions and operational measures. Each definition has some validity for different inferential purposes.
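The multinomial analogue of the Luce specification in Eqs. (6) and (7) can be sketched as follows; this is illustrative Python, and the menu of expected utilities passed in is hypothetical:

```python
from math import exp

def luce_multinomial_probs(eus, mu):
    """Multinomial Luce/logit choice probabilities: each lottery's EU is
    transformed as exp(EU)**(1/mu), and the probability of choosing lottery i
    is its share of the sum across the whole menu."""
    weights = [exp(eu) ** (1.0 / mu) for eu in eus]
    total = sum(weights)
    return [w / total for w in weights]
```

The probabilities sum to one by construction, higher-EU options receive higher probability, and a large μ flattens the distribution toward a uniform random choice over the menu.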
Temporal stability of risk preferences can mean that subjects exhibit the same risk attitudes over time, or that their risk attitudes are a stable function of states of nature and opportunities that change over time. It is quite possible for risk preferences to be stable in both, either, or neither of these senses, depending on the view one adopts regarding the role preference stability takes in the theory. The temporal stability of risk preferences is one component of a broader set of issues that relate to the state-dependent approach to utility analysis.38 This is a perfectly general approach, where the state of nature could be something as mundane as the weather or as fundamental as the individual’s mortality risk. The states could also include the opportunities facing the individual, such as market prices and employment opportunities. Crucial to the approach, however, is the fact that all state realizations must be exogenous, or the model will not be identified and inferences about stability will be vacuous.
Problems arise, however, when one has to apply this approach empirically. Where does one draw the line in terms of the abstract "states of nature"? Many alleged violations of EUT amount to claims that a person behaved as if they had one risk preference for one lottery pair and another risk preference for a different lottery pair. Implicit in the claim that these are violations of EUT is the presumption that the difference in the two lottery pairs was not some state of nature over which preferences could differ.39 Similarly, should we deem the preferences elicited with an open-ended auction procedure to be different from those elicited with a binary choice procedure, such as in the famous preference reversals of Grether and Plott (1979), because of some violation of EUT or just some change in the state of nature? Of course, it is a slippery inferential slope that allows "free parameters" to explain any empirical puzzle by shifting preferences. Such efforts have to be guided by direct evidence from external sources, lest they become open-ended specification searches.40 Several studies have begun to examine the temporal stability question. Limited exercises in laboratory settings are reported by Horowitz (1992) and Harrison, Johnson, McInnes, and Rutström (2005a), who demonstrate the temporal stability of risk attitudes in lab experiments over a period of up to 4 months. Horowitz (1992; p. 177) collects information on financial characteristics of the individual to control for changes in state of nature, but does not report if it changed the statistical inference about temporal stability. Harrison et al. (2005a) consider the temporal stability of risk attitudes in college students over a 4-week period, and do not control for changes in state of nature. Andersen, Harrison, Lau, and Rutström (2008b) extend these simple designs in several ways.
They use a much longer time span, control for changes in state of nature, use a stratified sample of a broader population, and report the results of a large-scale panel experiment undertaken in the field designed to examine this issue. Over a 17-month period they elicited risk preferences from subjects chosen to be representative of the adult Danish population. During this period many of the subjects were re-visited, and the same MPL risk aversion elicitation task repeated. In each visit information was also elicited on the individual characteristics of the subject, as well as their expectations about the state of their own economic situation and macroeconomic variables. The statistical analysis includes controls for changes in the subject’s perceived states of nature, as well as the possible effects of endogenous sample selection into the re-test. There is evidence of some variation in risk attitudes over time, but there is no general tendency for risk attitudes to increase or decrease over a 17-month span.
Additionally, the small variation of risk attitudes over time is less prominent than variations across tasks and across individuals. The results also suggest that risk preferences are state contingent with respect to personal finances. Of course, we could easily imagine target populations, such as the poor, that might be far less stable over time than the average adult Dane. There is some evidence from Dave, Eckel, Johnson, and Rojas (2007; Table 7) that the MPL instrument might exhibit some drift over time in such a population: estimated RRA increases by 0.12 compared to a baseline of 0.71, but the p-value of this change is 0.14, so it is not statistically significant. The real contribution of these studies is a systematic methodology for examining the issue of temporal stability with longitudinal experiments.
2.6. Comparing Treatments

The use of structural estimation of latent choice models also allows one to compare experimental treatments in terms of their effect on core parameters. Thus, we can answer questions such as "does treatment X affect risk attitudes?" by directly estimating the effect on parameters determining risk attitudes, rather than relying on less direct measures of that effect. The value of inferences of this kind becomes more important when we allow for various parameters and processes to affect choice under uncertainty, such as when we consider rank-dependent preferences and/or sign-dependent preferences in Section 3. To illustrate, consider the effect of providing information to subjects about the EV of lotteries they are to choose from. For simple, binary-outcome lotteries one often observes some subjects actually trying to do this arithmetic themselves on scrap paper, whether or not they then use that to decide which lottery to accept without adding or subtracting a risk premium. But when the cardinality of outcomes exceeds two, virtually all subjects tend to give up on those efforts to calculate EV. This raises the hypothesis that elicited risk attitudes might reflect underlying preferences or the interaction of those preferences and cognitive constraints on applying them to a particular lottery (if one assumes, for now, that subjects apply them the way economists theorize about them). A direct measure of the effect of providing EV can be obtained by running these treatments and then estimating a model in which the treatment acts as a binary dummy on a core parameter of the latent structural model. For data we use the replication of the RLP procedures of Hey and Orme (1994) reported in Appendix B. These tasks were only over the gain frame;
63 subjects received no information over 60 binary choices, and 25 different subjects received information. For the structural model, we assume a CRRA power utility function, a Fechner error specification, and a probit link function. If we introduce the binary dummy variable Info to capture those choices made under the treatment condition, we can estimate r = r0 + r1·Info and directly assess the effect on risk attitudes by the sign and statistical significance of the coefficient r1. It is also possible to allow for heteroskedasticity in the Fechner noise term, by estimating μ = μ0 + μ1·Info and examining the estimate of μ1. Thus, we allow for the possibility that providing information on EV might not change risk attitudes, but might change the precision with which the subject makes choices given a latent preference for one lottery over the other. The estimation results show that there is indeed a statistically significant effect on elicited risk attitudes from providing the EV of each lottery. The power function coefficient r increases by 0.15 from 0.47, which indicates a reduction in risk aversion toward risk neutrality. The p-value on the hypothesis test that this effect is zero is only 0.016, and the 95% confidence interval on the effect is between 0.03 and 0.28. So we conclude that there does appear to be a significant influence on elicited risk attitudes from providing information on EV. Whether this reflects better estimates of true preferences due to removing the confound of the cognitive burden of calculating EV, or reflects a simple anchoring response, cannot be determined. The point is that we can report the effect of the treatment in terms of its effect on the metric of interest, the core risk aversion parameter. In this specification there is no statistically significant effect on the Fechner noise parameter. Nor is there an effect on these conclusions from also controlling for the heterogeneity in preferences attributable to observed individual demographic effects.
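The structure of this treatment test can be sketched as a likelihood with a dummy on the core parameter. This is a minimal illustration in Python, with hypothetical lotteries, a CRRA utility with u(y) = y^(1−r)/(1−r), and without the optimization step that would maximize the likelihood:

```python
from math import erf, log, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def crra_u(y, r):
    """CRRA utility y**(1-r)/(1-r), for r != 1."""
    return y ** (1.0 - r) / (1.0 - r)

def log_lik(data, r0, r1, mu):
    """Log-likelihood with r = r0 + r1*Info, a probit link, and Fechner noise.
    Each observation is (left, right, info, chose_right), where lotteries are
    lists of (probability, prize) pairs."""
    ll = 0.0
    for left, right, info, chose_right in data:
        r = r0 + r1 * info                     # treatment dummy shifts the core parameter
        eu_l = sum(p * crra_u(x, r) for p, x in left)
        eu_r = sum(p * crra_u(x, r) for p, x in right)
        pr = phi((eu_r - eu_l) / mu)
        ll += log(pr if chose_right else 1.0 - pr)
    return ll
```

Maximizing this function over (r0, r1, μ), or its extension with μ = μ0 + μ1·Info, delivers the direct test of the treatment effect described above; the sign and significance of r1 answer the question of interest.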
3. EXTENSIONS AND FURTHER APPLICATIONS

We elicit risk attitudes to make inferences about different things. Obviously there is interest in the characterization of risk attitudes in general, and the previous section reviewed the estimation issues that arise under EUT. It is also important to consider the characterization of risk attitudes under alternatives to EUT. We consider the class of rank-dependent models due to Quiggin (1982) (Section 3.1), and then the class of sign-dependent models due to Kahneman and Tversky (1979) (Section 3.2). The implications of allowing several latent data generating processes to characterize risk attitudes
are then considered (Section 3.3), concluding with a plea to avoid the assumption that there is one true model. Risk attitudes also constitute a fundamental confound to inferences about behavior in stochastic settings, and it is here that we believe that the major payoff to better experimental controls for risk attitudes will be seen. We consider three major areas of investigation in which controls for risk should play a more significant role: identification of discount rates (Section 3.4), tests of EUT against competing models (Section 3.5), and tests of bidding behavior in auctions (Section 3.6). We also consider tests of a model of choice behavior that has radical implications for how one might think about risk aversion, Myopic Loss Aversion (Section 3.7). Finally, we consider the implications of the random lottery incentive procedure for risk elicitation (Section 3.8), and present some summary estimates using comparable modeling assumptions and designs that we believe to be the most reliable (Section 3.9).
3.1. Characterizing Risk Attitudes with Probability Weighting and Rank-Dependent Utility

One route of departure from EUT has been to allow preferences to depend on the rank of the final outcome through probability weighting. The idea that one could use non-linear transformations of the probabilities of a lottery when weighting outcomes, instead of non-linear transformations of the outcome into utility, was most sharply presented by Yaari (1987). To illustrate the point clearly, he assumed a linear utility function, in effect ruling out any risk aversion or risk seeking from the shape of the utility function per se. Instead, concave (convex) probability weighting functions would imply risk seeking (risk aversion).41 It was possible for a given decision-maker to have a probability weighting function with both concave and convex components, and the conventional wisdom held that it was concave for smaller probabilities and convex for larger probabilities. The idea of rank-dependent preferences had two important precursors.42 In economics, Quiggin (1982, 1993) had formally presented the general case in which one allowed for subjective probability weighting in a rank-dependent manner and allowed non-linear utility functions. This branch of the family tree of choice models has become known as Rank-Dependent Utility (RDU). The Yaari (1987) model can be seen as a pedagogically important special case, and can be called Rank-Dependent Expected Value (RDEV). The other precursor, in psychology, is Lopes (1984). Her concern
Risk Aversion in the Laboratory
was motivated by clear preferences that experimental subjects exhibited for lotteries with the same expected value but alternative shapes of probabilities, as well as the verbal protocols those subjects provided as a possible indicator of their latent decision processes.

Formally, to calculate decision weights under RDU one replaces expected utility

EU_i = Σ_{k=1,...,K} p_k U_k    (2)

with RDU

RDU_i = Σ_{k=1,...,K} w_k U_k    (2′)

where

w_i = ω(p_i + ... + p_n) − ω(p_{i+1} + ... + p_n)    (8a)

for i = 1, ..., n−1, and

w_i = ω(p_i)    (8b)
for i = n, where the subscript indicates outcomes ranked from worst to best, and where ω(p) is some probability weighting function.

In the RDU model we have to define risk aversion in terms of the properties of the utility function and the probability weighting function, since both can affect risk attitudes. However, one can define conditional orderings, following Chew, Karni, and Safra (1987) and others, by considering the effects of more or less concave utility functions given a probability weighting function, and vice versa. Similarly, when we consider sign-dependent preferences in Section 3.2 the notion of risk aversion must include the effects of the sign of outcomes (e.g., possible loss aversion).

Picking the right probability weighting function is obviously important for RDU specifications. A weighting function proposed by Tversky and Kahneman (1992) has been widely used. It is assumed to have well-behaved endpoints, such that ω(0) = 0 and ω(1) = 1, and to imply weights

ω(p) = p^γ / (p^γ + (1 − p)^γ)^{1/γ}    (9)

for 0 < p < 1. The normal assumption, backed by a substantial amount of evidence reviewed by Gonzalez and Wu (1999), is that 0 < γ < 1. This gives the weighting function an "inverse S-shape," characterized by a concave
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
section signifying the overweighting of small probabilities, up to a crossover point where ω(p) = p, beyond which there is then a convex section signifying underweighting. Under the RDU assumption about how these probability weights get converted into decision weights, γ < 1 implies overweighting of extreme outcomes. Thus, the probability associated with an outcome does not directly inform one about the decision weight of that outcome. If γ > 1 the function takes the less conventional "S-shape," with convexity for smaller probabilities and concavity for larger probabilities.43 Under RDU, γ > 1 implies underweighting of extreme outcomes.

We illustrate the effects of allowing for probability weighting using the experimental data from Holt and Laury (2005). We assume the EP functional form

U(x) = (1 − exp(−α x^{1−r})) / α    (1″)

for utility. The remainder of the econometric specification is the same as for the EUT model with Luce error μ, generating

∇RDU = RDU_R^{1/μ} / (RDU_L^{1/μ} + RDU_R^{1/μ})    (3⁗)

instead of Eq. (3‴). The conditional log-likelihood, ignoring indifference, becomes

ln L^RDU(r, γ, μ; y, X) = Σ_i ln ℓ_i^RDU = Σ_i [ (ln F(∇RDU) | y_i = 1) + (ln(1 − F(∇RDU)) | y_i = 0) ]    (5″)

and requires the estimation of r, γ, and μ. For RDEV one replaces Eq. (2′) with a specification that weights the prizes themselves, rather than the utility of the prizes:

RDEV_i = Σ_{k=1,...,K} ω_k m_k    (2″)
where m_k is the kth monetary prize. In effect, the RDEV specification is a special case of RDU. The experimental data from Holt and Laury (2005) consist of 96 subjects facing their 1× condition or their 20× condition on a between-subjects basis.44 The final monetary prizes ranged from a low of $0.10 up to $77. We only consider data in which subjects faced real rewards. Replicating their EUT statistical model, and allowing for clustering of responses, we estimate
r = 0.40 with a standard error of 0.07, and α = 0.076 with a standard error of 0.02, closely tracking the estimates from Holt and Laury (2002). In particular, there is evidence of increasing RRA over this income domain. When we estimate the RDU model using these data and specification, we find clear evidence of probability weighting. The estimate of γ is 0.37 with a standard error of 0.16, so we can easily reject the hypothesis that γ = 1 and that there is no probability weighting. Thus, we observe the conventional qualitative shape of the probability weighting function, an inverse S-shape. The effect of allowing for probability weighting is to lower the estimates of the curvature of the utility function – but we should be careful here not to associate curvature of the utility function with risk aversion. The risk aversion parameter r is estimated to be 0.26 and the α parameter to be 0.02, with standard errors of 0.05 and 0.012, respectively. Thus, there is some evidence for increasing curvature of the utility function as income increases (α > 0), but it is not statistically significant (p-value of 0.16 that α = 0). Fig. 11 displays the "relative risk aversion" associated with the curvature of the utility function, and the shape of the probability weighting function. Of course, RRA should actually be defined here in terms of both the curvature of the utility function and the effect of probability weighting, so the coefficients are not directly comparable to the EUT model. Nevertheless, we can clearly say that inferences about increasing RRA depend on the assumptions one makes about probability weighting.

Fig. 11. Probability Weighting in Holt and Laury Risk Elicitation Task. RDU Parameters Estimated with N = 96 Subjects from Experimental Data of Holt and Laury (2005). (Left Panel: RRA over Income in Dollars, with RDU r = 0.26 and α = 0.02; Right Panel: ω(p), with RDU γ = 0.37.)
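The decision weights of Eqs. (8a) and (8b), combined with the weighting function of Eq. (9), are easy to compute directly. The following is a minimal sketch, not estimation code: the function names and the example lottery are ours, and γ = 0.37 is simply the point estimate reported above.

```python
import numpy as np

def tk_weight(p, gamma):
    # Tversky-Kahneman (1992) probability weighting function, Eq. (9)
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def rdu_weights(probs, gamma):
    # Decision weights of Eqs. (8a) and (8b), outcomes ranked worst to best:
    # w_i = omega(p_i + ... + p_n) - omega(p_{i+1} + ... + p_n), and w_n = omega(p_n)
    p = np.asarray(probs, dtype=float)
    n = len(p)
    tails = np.array([p[i:].sum() for i in range(n)])  # decumulative probabilities
    w = np.empty(n)
    w[:-1] = tk_weight(tails[:-1], gamma) - tk_weight(tails[1:], gamma)
    w[-1] = tk_weight(p[-1], gamma)
    return w

# Example: outcomes with probabilities (0.25, 0.5, 0.25), using the gamma above.
# The weights sum to one because omega(1) = 1, but with gamma < 1 the extreme
# outcomes get disproportionate weight relative to the middle outcome.
w = rdu_weights([0.25, 0.5, 0.25], gamma=0.37)
```

With γ = 1 the decision weights collapse back to the raw probabilities, which is the sense in which EUT is nested within this part of the RDU specification.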
3.2. Characterizing Risk Attitudes with Loss Aversion and Sign-Dependent Utility

3.2.1. Original Prospect Theory
Kahneman and Tversky (1979) introduced the notion of sign-dependent preferences, stressing the role of the reference point when evaluating lotteries. In various forms, as we will see, PT has become the most popular alternative to EUT. Original Prospect Theory (OPT) departs from EUT in three major ways: (a) allowance for subjective probability weighting; (b) allowance for a reference point defined over outcomes, and the use of different utility functions for gains or losses; and (c) allowance for loss aversion, the notion that the disutility of losses weighs more heavily than the utility of comparable gains.

The first step is probability weighting, of the form ω(p) defined in Eq. (9), for example. One of the central assumptions of OPT, differentiating it from later variants of PT, is that w(p) = ω(p), so that the transformed probabilities given by ω(p) are directly used to evaluate PU:

PU_i = Σ_{k=1,...,K} ω_k u_k    (2‴)
The second step in OPT is to define a reference point so that one can identify outcomes as gains or losses. Let the reference point be given by w for a given subject in a given choice. Consistent with the functional forms widely used in PT, we again use the CRRA functional form

u(m) = m^{1−α} / (1 − α)    (1‴)

when m ≥ w, and

u(m) = −λ (−m)^{1−α} / (1 − α)    (1⁗)

when m < w, and where λ is the loss aversion parameter. We use the same exponent α for the utility functions defined over gains and losses, even though the original statements of PT keep them theoretically distinct. Köbberling and Wakker (2005; Section 7) point out that this constraint is
needed to identify the degree of loss aversion if one uses CRRA functional forms and does not want to make other strong assumptions (e.g., that utility is measurable only on a ratio scale).45 Although λ is free in principle to be less than 1 or greater than 1, most PT analysts presume that λ ≥ 1.

The specification of the reference point is critical to PT, and is discussed in Section 3.2.3. One issue is that it influences the nature of subjective probability weighting assumed, since different weights are allowed for gains and losses. Thus, we can again specify

ω(p) = p^γ / (p^γ + (1 − p)^γ)^{1/γ}    (9)

for gains, but

ω(p) = p^φ / (p^φ + (1 − p)^φ)^{1/φ}    (9′)
for losses. It is common in empirical applications to assume γ = φ. The remainder of the econometric specification would be the same as for the EUT and RDU models. The latent index can be defined in the same manner, and the conditional log-likelihood defined comparably. Estimation of the core parameters α, λ, γ, φ, and μ is required.

The primary logical problem with OPT was that it implied violations of stochastic dominance. Whenever γ ≠ 1 or φ ≠ 1, it is possible to find non-degenerate lotteries such that one lottery would stochastically dominate the other, but would be assigned a lower PU. Examples arise quickly when one recognizes that ω(p_1 + p_2) ≠ ω(p_1) + ω(p_2) for some p_1 and p_2. Kahneman and Tversky (1979) dealt with this problem by assuming that evaluation using OPT only occurred after dominated lotteries were eliminated. For specifications such as the one discussed here there is no modeling of an editing phase, but the stochastic error term μ could be interpreted as a reduced-form proxy for that editing process.46 We do not provide any illustrative estimations of this model but move straight to the extensions provided by CPT.

3.2.2. Cumulative Prospect Theory
The notion of rank-dependent decision weights was incorporated into OPT by Starmer and Sugden (1989), Luce and Fishburn (1991), and Tversky and Kahneman (1992). Instead of implicitly assuming that w(p) = ω(p), it allowed w(p) to be defined as in the RDU specification given by Eqs. (8a) and (8b). The sign-dependence of subjective probability weighting in OPT,
leading to the estimation of different probability weighting functions, Eqs. (9) and (9′), for gains and losses, is maintained in CPT. Thus, there is a separate decumulative function used for gains and losses, but otherwise the logic is the same as for RDU.47

The estimation of a structural CPT model can be illustrated with data from the Harrison and Rutström (2005) replication and extension of the Hey and Orme (1994) RLP procedure. As explained in Appendix B, they had some subjects face lotteries defined over a gain frame, some face lotteries defined over a loss frame, and some face lotteries defined over a mixed gain–loss frame. In the mixed frame some prizes in a lottery were gains, and some were losses. In each case the subjects were endowed with cash to ensure that final outcomes were either exactly or approximately the same across frames. Table 6 displays the ML estimates of the core parameters, and Fig. 12 displays the distributions over individuals of predicted values for each parameter. In each case the utility function is the CRRA power specification, a Fechner error story is included with a probit link function, and μ is a linear function of the same observable characteristics as every other parameter (Table 6 does not show the estimates for μ).

The distribution of estimates of α is consistent with concave utility functions over gains and convex utility functions over losses, as expected. The estimates of γ are also consistent with expectations of an inverse S-shaped probability weighting function, implying greater decision weights on extreme prizes within each lottery. However, the estimates of λ are not at all consistent with loss aversion, and in fact suggest a clear tendency towards loss seeking. We reconsider the sensitivity of estimates of λ to the assumed reference point in more detail below.

Table 6 shows that there are some systematic effects of observable demographics on the EUT and CPT parameter estimates.
Under EUT there is a slight effect from sex, with women being more risk averse, but it is not statistically significant. Similarly, ethnic characteristics show a large effect on risk attitudes, but they are not statistically significant. The only characteristic that has a statistically significant effect on risk attitudes under EUT is age, which is here shown in deviations from age 20: every extra year leads to a reduction in risk aversion. For completeness, we also estimate RDU on these data, not shown in Table 6, and find the curvature of the utility function similar to that of EUT, contrary to the estimates discussed above for the data of Holt and Laury (2005). For the RDU model the data here indicate a significant sex effect, with women being more risk averse
Table 6. Maximum Likelihood Estimates for EUT and CPT Models.

Parameter  Variable                 Point Estimate  Standard Error  p-Value  Lower 95% CI  Upper 95% CI

A. EUT Model (log-likelihood = -7,665.0)
r          Constant                      0.952           0.149        0.00        0.66          1.24
           Female                       -0.133           0.094        0.16       -0.32          0.05
           Black                        -0.138           0.133        0.30       -0.40          0.12
           Hispanic                     -0.195           0.127        0.13       -0.44          0.05
           Age (compared to 20)          0.039           0.009        0.00        0.02          0.06
           Major is in business         -0.107           0.135        0.43       -0.37          0.16
           Low GPA (below 3.24)          0.061           0.121        0.61       -0.18          0.30

B. CPT Model (log-likelihood = -7,425.5)
α          Constant                      0.761           0.079        0.00        0.61          0.91
           Female                       -0.160           0.109        0.14       -0.37          0.05
           Black                        -0.132           0.277        0.63       -0.67          0.41
           Hispanic                     -0.358           0.192        0.06       -0.73          0.02
           Age (compared to 20)          0.017           0.009        0.07        0.00          0.04
           Major is in business         -0.037           0.097        0.70       -0.23          0.15
           Low GPA (below 3.24)          0.036           0.093        0.69       -0.14          0.22

γ          Constant                      1.017           0.061        0.00        0.89          1.14
           Female                       -0.050           0.074        0.49       -0.20          0.09
           Black                        -0.300           0.133        0.02       -0.56         -0.04
           Hispanic                     -0.092           0.142        0.51       -0.37          0.18
           Age (compared to 20)         -0.001           0.004        0.75       -0.01          0.01
           Major is in business         -0.021           0.075        0.78       -0.17          0.13
           Low GPA (below 3.24)         -0.066           0.070        0.35       -0.20          0.07

λ          Constant                      0.447           0.207        0.03        0.04          0.85
           Female                        0.432           0.416        0.30       -0.38          1.25
           Black                         0.233           1.062        0.83       -1.85          2.31
           Hispanic                     -0.386           0.386        0.32       -1.14          0.37
           Age (compared to 20)          0.033           0.018        0.08        0.00          0.07
           Major is in business          0.028           0.240        0.91       -0.44          0.49
           Low GPA (below 3.24)          0.057           0.238        0.81       -0.41          0.52
(−0.09, p-value = 0.02), as well as for Hispanics (−0.17, p-value = 0.009). In addition, age has the same effect as under EUT. Although the extent of probability weighting is slight, and overall curvature of the utility function matches EUT, there are therefore some significant changes in the composition of the curvature of utility across the sample.
Fig. 12. Estimates of the Structural CPT Model: Densities of the Predicted Values of α, γ, λ, and μ. (Data from Hey–Orme Replication of Harrison and Rutström (2005); N = 207 Subjects: 63 Gain Frame, 57 Loss Frame, and 87 Mixed Frame.)
The CPT estimates in Table 6 also show some demographic effects on the composition of the curvature of utility across the sample. There is now a large and statistically significant effect from being Hispanic, in addition to a comparable age effect. The only characteristic that significantly affects the extent of probability weighting is whether the subject is Black, and it is a large effect. The effects on loss aversion appear to be poorly estimated, which of course may just be a reflection that this is not a stable parameter in terms of its effect, at least as currently modeled. Although these were static tasks, in the sense that there was no accumulation of earnings, subjects may have been adjusting their reference point during the 60 binary choices in some unspecified manner.

Finally, Fig. 13 collates estimates of the curvature of the utility function for these data using the three major alternative models of choice. In the top panel we include an EUT specification assuming the CRRA power utility function with parameter r. In the bottom-left panel we estimate an RDU model with utility function parameter r, and that allows for rank-dependent probability weighting. The EUT and RDU models are estimated on the choices made in the loss frame, but with the actual net gain amount included
in the utility function.48 In the bottom-right panel, we reproduce the estimate of α from Fig. 12, scaled to the EUT estimate above it for comparability. We see evidence that the RDU specification does not change the inferences we make about the curvature of the utility function significantly in comparison to EUT, so risk aversion here is not reflected in a transformation of probabilities. The CPT specification, which adds sign-dependence to utility, does result in a shift towards greater concavity of the utility function for gains, and more distinct modes reflecting a greater heterogeneity in preferences. Of course, curvature of the utility function under RDU and CPT is not the same as aversion to risk, but it is nonetheless useful to compare the implied shapes of the utility function.

Fig. 13. Estimates of Curvature of Utility Function. (Data from Hey–Orme Replication of Harrison and Rutström (2005); N = 207 Subjects: 63 Gain Frame, 57 Loss Frame, and 87 Mixed Frame; Prizes for EUT and RDU Include Endowment.)

3.2.3. The Reference Point and Loss Aversion
It is essential to take a structural perspective when estimating CPT models. Estimates of the loss aversion parameter depend intimately on the assumed reference point, as one would expect, since the latter determines what are to be viewed as losses. So if we have assumed the wrong reference point, we will not reliably estimate the degree of loss aversion. However, if we do not get loss aversion leaping out at us when we make a natural assumption about
the reference point, should we infer that there is no loss aversion or that there is loss aversion and we just used the wrong reference point? This question points to a key operational weakness of CPT: the need to specify what the reference point is. Loss aversion may be present for some reference point, but if it is not present for the one we used, and no others are "obviously" better, then should one keep searching for some reference point that generates loss aversion? Without a convincing argument about the correct reference point, and evidence for loss aversion conditional on that reference point, one simply cannot claim that loss aversion is always present. This specification ambiguity is arguably less severe in the lab, where one can frame tasks to try to induce a loss frame, but is a particularly serious issue in the field. Similarly, estimates of the nature of probability weighting vary with changes in reference points, loss aversion parameters, and the concavity of the utility function, and vice versa. All of this is to be expected from the CPT model, but necessitates joint econometric estimation of these parameters if one is to be able to make consistent statements about behavior.

In many laboratory experiments it is simply assumed that the manner in which the task is framed to the subject defines the reference point that the subject uses. Thus, if one tells the subject that they have an endowment of $15 and that one lottery outcome is to have $8 taken from them, then the frame might be appropriately assumed to be $15 and this outcome coded as a loss of $8. But if the subject had been told, or expected, to earn only $5 from the experimental task, would this be coded instead as a gain of $2? The subjectivity and contextual nature of the reference point has been emphasized throughout by Kahneman and Tversky (1979), even though one often collapses it to the experimenter-induced frame in evaluating laboratory experiments.
This imprecision in the reference point is not a criticism of PT, just a challenge to be careful in assuming that it is always fixed and deterministic (see Schmidt, Starmer, & Sugden, 2005; Kőszegi & Rabin, 2006, 2007; Andersen, Harrison, & Rutström, 2006b).49 A corollary is that it might be a mistake to view loss aversion as a fixed parameter λ that does not vary with the context of the decision, ceteris paribus the reference point. See Novemsky and Kahneman (2005a) and Camerer (2005; pp. 132, 133) for discussion of this concern, which arises most clearly in dynamic decision-making settings with path-dependent earnings. This issue is particularly serious when one evaluates risk attitudes in some of the high-stakes game shows: see Andersen, Harrison, Lau, and Rutström (2008c) for a review of these studies and the modeling issues that arise.
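To make the dependence on the reference point concrete, here is a minimal sketch of the sign-dependent CRRA utility defined in Section 3.2.1 (Eqs. (1‴) and (1⁗)), with outcomes measured net of an assumed reference point w before the gain or loss branch is applied. The parameter values are purely illustrative, not estimates from our data: α = 0.12 makes the exponent 1 − α equal to the 0.88 of Tversky and Kahneman (1992), and λ = 2.25 is their oft-cited loss aversion estimate.

```python
def cpt_utility(m, w=0.0, alpha=0.12, lam=2.25):
    # Sign-dependent CRRA utility, Eqs. (1''') and (1''''), with outcomes
    # evaluated relative to the assumed reference point w
    x = m - w
    if x >= 0:
        return x ** (1 - alpha) / (1 - alpha)
    return -lam * (-x) ** (1 - alpha) / (1 - alpha)

# Shifting the reference point reclassifies outcomes: a $4 prize is a gain
# when w = $0 but a loss when w = $5, so the implied utility changes sharply
u_gain_frame = cpt_utility(4.0, w=0.0)
u_loss_frame = cpt_utility(4.0, w=5.0)
```

With λ > 1 the disutility of a loss exceeds the utility of a comparable gain, which is exactly the asymmetry that the choice of reference point turns on or off.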
To gauge the extent of the problem, we revisit the estimation of a structural CPT model using our laboratory data (the replication of Hey and Orme (1994) reported in Harrison and Rutström (2005)), but this time consider the effect of assuming different reference points than the one induced by the task frame. Assume that the reference point is w, as in Eqs. (1‴) and (1⁗) above, but instead of setting w = $0, allow it to vary between $0 and $10 in increments of $0.10. The results are displayed in Fig. 14. The top left panel shows a trace of the log-likelihood value as the reference point is increased, and reaches a maximum at $4.60. To properly interpret this value, note that these estimates are made at the level of the individual choice in this task, and the subject was to be paid for three of those choices. So the reference point for the overall task of 60 choices would be $13.80 (= 3 × $4.60). This is roughly consistent with the range of estimates of expected session earnings elicited by Andersen et al. (2006b) for a sample drawn from the same population.50

The other interesting part of Fig. 14 is that the estimate of loss aversion increases steadily as one increases the assumed reference point. At the ML reference point of $4.60, λ is estimated to be 2.51, with a standard error of 0.37 and a 95% confidence interval between 1.79 and 3.24.

Fig. 14. Estimates of the Structural CPT Model with a Range of Assumed Reference Points: Traces of the Log-Likelihood, α, λ, and γ as the Reference Point Varies from $0 to $10. (Estimated with Subjects from the Harrison and Rutström (2005) Design; N = 207 Subjects: 63 Gain Frame, 57 Loss Frame, and 87 Mixed Frame.)

These estimates
raise an important methodological question: was it the data that led to the conclusion that loss aversion was significant, or the priors favoring significant loss aversion that led to the empirical specification of reference points? Our results may appear to confirm the argument made by some PT analysts that λ ≈ 2, but it is important to recognize that the estimates presented here may not extend to other data sets or to other error specifications in the likelihood function. Further, in experimental subject pools with different reference points we would find something else entirely. At the very least, it is premature to proclaim "three cheers" for loss aversion (Camerer, 2005).
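The grid search just described can be mimicked on simulated data: generate binary choices from an agent with a known reference point, then trace the choice log-likelihood over a grid of assumed reference points and keep the value where it peaks. Everything below (the 50/50 lottery pairs, the logit link, and the fixed values of α, λ, and the noise parameter μ) is our own illustrative assumption, not the chapter's estimation code.

```python
import numpy as np

rng = np.random.default_rng(7)

def cpt_value(outcomes, w, alpha=0.5, lam=2.5):
    # Sign-dependent CRRA value of a 50/50 lottery, relative to reference point w
    v = 0.0
    for m in outcomes:
        x = m - w
        u = x ** (1 - alpha) / (1 - alpha) if x >= 0 else -lam * (-x) ** (1 - alpha) / (1 - alpha)
        v += 0.5 * u
    return v

def p_right(a, b, w, mu=1.0):
    # logit link on the difference in lottery values (a Fechner-style error)
    return 1.0 / (1.0 + np.exp(-(cpt_value(b, w) - cpt_value(a, w)) / mu))

# Simulate choices over random 50/50 lottery pairs from an agent whose true
# reference point is $5; outcomes straddle it, so w is identified by the data
n, true_w = 1000, 5.0
left = rng.uniform(0, 10, size=(n, 2))
right = rng.uniform(0, 10, size=(n, 2))
probs = np.array([p_right(a, b, true_w) for a, b in zip(left, right)])
chose_right = rng.uniform(size=n) < probs

def loglik(w):
    pr = np.clip([p_right(a, b, w) for a, b in zip(left, right)], 1e-12, 1 - 1e-12)
    return np.sum(np.where(chose_right, np.log(pr), np.log(1 - pr)))

grid = np.arange(0.0, 10.01, 0.10)      # the $0-$10 grid in $0.10 steps used above
w_hat = grid[int(np.argmax([loglik(w) for w in grid]))]
```

Profiling only w, with the other structural parameters held at their true values, the log-likelihood trace peaks at or very near the true reference point, in the same way that the trace in Fig. 14 peaks at $4.60 for the laboratory data.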
3.3. Characterizing Risk Attitudes with Several Latent Data Generating Processes

Since different models of choice behavior under uncertainty imply somewhat different characterizations of risk attitudes, it is important that we make some determination about which of these models is to be adopted. One of the enduring contributions of behavioral economics is that we now have a rich set of competing models of behavior in many settings, with EUT and PT as the two front-runners for choices under uncertainty. Debates over the validity of these models have often been framed as a horse race, with the winning theory being declared on the basis of some statistical test in which the theory is represented as a latent process explaining the data. In other words, we seem to pick the best theory by "majority rule": if one theory explains more of the data than another theory, we declare it the better theory and discard the other one. In effect, after the race is over we view the horse that "wins by a nose" as if it were the only horse in the race.

The problem with this approach is that it does not recognize the possibility that several behavioral latent processes may co-exist in a population. Recognizing that possibility has direct implications for the characterization of risk attitudes in the population. Ignoring it can lead to erroneous conclusions about the domain of applicability of each theory, and is likely an important reason why the horse races pick different winners in different domains. For purely statistical reasons, if we believe that there are two or more latent population processes generating the observed sample, one can make more appropriate inferences if the data are not forced to fit a specification that assumes one latent population process. Heterogeneity in responses is well recognized as causing statistical problems in experimental and non-experimental data.
Nevertheless, allowing for heterogeneity in responses through standard methods, such as fixed or
random effects, is not helpful when we want to identify which people behave according to which theory, and when. Heterogeneity can be partially recognized by collecting information on observable characteristics and controlling for them in the statistical analysis. For example, a given theory might allow some individuals to be more risk averse than others as a reflection of personal preference. But this approach only recognizes heterogeneity within a given theory. This may be important for valid inferences about the ability of the theory to explain the data, but it does not allow heterogeneous theories to co-exist in the same sample.

One approach to heterogeneity and the possibility of co-existing theories, adopted by Harrison and Rutström (2005), is to propose a "wedding" of the theories. They specify and estimate a grand likelihood function that allows each theory to co-exist and have different weights, a so-called mixture model. The data can then identify what support each theory has. The wedding is consummated by the ML estimates converging on probabilities that apportion non-trivial weights to each theory. Their results are striking: EUT and PT share the stage, in the sense that each accounts for roughly 50% of the observed choices. Thus, to the extent that EUT and PT imply different things about how one measures risk aversion, and the role of the utility function as against other constructs, assuming that the data are generated by one or the other model can lead to erroneous conclusions. The fact that the mixture probability is estimated with some precision, and that one can reject the null hypothesis that it is either 0 or 1, also indicates that one cannot claim that the equal weight on these models is due to chance.
The main methodological lesson from this exercise is that one should not rush to declare one or the other model the winner in all settings.51 One would expect that the weight attached to EUT would vary across task domains, just as it can be shown to vary across observable socio-economic characteristics of individual decision makers.

Another approach to heterogeneity involves the use of "random parameters" in models, illustrated well by Wilcox (2008a, 2008b). Consider the simple EUT specification with no stochastic noise assumption, given by Eqs. (1)–(5). There is one parameter doing all the empirical work: the coefficient of RRA, r. In the traditional statistical specification r is treated as the same across all individuals in the sample, or as a linear function of observable characteristics. An alternative approach is to view r as varying over the sample according to some distribution, commonly assumed to be Normal. In that case there are really two parameters to be estimated: the mean of r and the standard deviation of r.
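A simulated (Monte Carlo) version of that likelihood is straightforward to sketch: average the choice likelihood over draws of r from the assumed Normal distribution, and treat the mean and standard deviation of r as the parameters to be estimated. The lottery pair, logit link, and noise parameter below are our own illustrative assumptions.

```python
import numpy as np

def crra_eu(lottery, r):
    # expected utility of [(prob, prize), ...] with CRRA u(m) = m^(1-r)/(1-r)
    return sum(p * m ** (1 - r) / (1 - r) for p, m in lottery)

def choice_loglik(pairs, choices, r, mu=0.5):
    # log-likelihood of binary choices (1 = right lottery) under a logit link
    ll = 0.0
    for (left, right), y in zip(pairs, choices):
        pr = 1.0 / (1.0 + np.exp(-(crra_eu(right, r) - crra_eu(left, r)) / mu))
        pr = min(max(pr, 1e-12), 1 - 1e-12)
        ll += np.log(pr) if y == 1 else np.log(1 - pr)
    return ll

def random_coeff_loglik(pairs, choices, mean_r, sd_r, mu=0.5, draws=500, seed=3):
    # simulated likelihood: average the choice likelihood over r ~ N(mean_r, sd_r),
    # then take logs; mean_r and sd_r are the two hyper-parameters to estimate
    rng = np.random.default_rng(seed)
    rs = rng.normal(mean_r, sd_r, size=draws)
    liks = [np.exp(choice_loglik(pairs, choices, r, mu)) for r in rs]
    return np.log(np.mean(liks))

# a single illustrative choice: a 50/50 lottery over $10 or $2 versus $5 for sure
pairs = [([(0.5, 10.0), (0.5, 2.0)], [(1.0, 5.0)])]
choices = [1]
```

Setting the standard deviation to zero collapses the specification back to the fixed-r likelihood, which is a useful check on the simulator.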
If the heterogeneity of process takes a nested form, in the sense that one process is a restricted form of the other, then one can think of the correct statistical specification as either a finite mixture model or a random coefficients specification. In the latter case one would want to allow more flexible functional forms than Normal, to allow for multiple modes, but this is easy to generate as the sum of several uni-modal distributions. If the heterogeneity of process takes a non-nested form, such that the parameter sets are distinct for each process, then the mixture specification is more appropriate, or one should use a combination of mixture and random parameter specifications (Conte, Hey, & Moffatt, 2007).
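The grand likelihood of a two-process mixture can be written down directly. Given per-choice log-likelihood contributions under each process (which would come from the EUT and PT specifications estimated earlier; here they are just placeholders), the mixture combines them with weight π, and Bayes' rule gives the posterior probability that any particular choice was generated by the EUT process. A minimal sketch:

```python
import numpy as np

def mixture_loglik(ll_eut, ll_pt, pi):
    # grand log-likelihood: sum_i log(pi * L_i^EUT + (1 - pi) * L_i^PT),
    # computed stably in log space with logaddexp
    ll_eut, ll_pt = np.asarray(ll_eut), np.asarray(ll_pt)
    return np.sum(np.logaddexp(np.log(pi) + ll_eut, np.log(1 - pi) + ll_pt))

def posterior_eut(ll_eut, ll_pt, pi):
    # posterior probability that each choice was generated by the EUT process
    a = np.log(pi) + np.asarray(ll_eut)
    b = np.log(1 - pi) + np.asarray(ll_pt)
    return np.exp(a - np.logaddexp(a, b))
```

The mixing probability π is estimated jointly with the structural parameters of both processes; it is the parameter that Harrison and Rutström (2005) estimate to be roughly one half on their data.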
3.4. Joint Elicitation of Risk Attitudes and Other Preferences

In many settings in experimental economics we want to elicit some preference from a set of choices that also depend on risk attitudes. Often these involve strategic games, where the uncertain ways in which the behavior of others deviates from standard predictions engender a lottery for each player. Such uncertain deviations could be due, for example, to unobservable social preferences such as fairness or reciprocity. One example is offers made in Ultimatum bargaining when the other player cannot be assumed to always accept a minuscule amount of money, and acceptable thresholds may be uncertain. Other examples include Public Goods contribution games, where one does not know the extent of free riding by other players, and Trust games, in which one does not know the likelihood that the other player will return some of the pie transferred to him. Another source of uncertainty is the possibility that subjects make decisions with error, as predicted by Quantal Response Equilibria. Later we consider one example of this use of controls for risk attitudes in bidding in first-price auctions.

In some cases, however, we simply want to elicit a preference from choices that do not depend on the choices made by others in a strategic sense, but which still depend on risk attitudes. An example due to Andersen, Harrison, Lau, and Rutström (2008a) is the elicitation of individual discount rates. In this case it is the concavity of the utility function that is important, and under EUT that is synonymous with risk attitudes. The implication is that we should combine a risk elicitation task with a time preference elicitation task, and use them jointly to infer discount rates over utility.

Assume EUT holds for choices over risky alternatives and that discounting is exponential. A subject is indifferent between two income
options M_t and M_{t+τ} if and only if

(1/(1+δ)^t) U(ω + M_t) + (1/(1+δ)^{t+τ}) U(ω) = (1/(1+δ)^t) U(ω) + (1/(1+δ)^{t+τ}) U(ω + M_{t+τ})    (10)
where U(ω + M_t) is the utility of monetary outcome M_t for delivery at time t plus some measure of background consumption ω, δ the discount rate, τ the horizon for delivery of the later monetary outcome at time t+τ, and the utility function U is separable and stationary over time. The left-hand side of Eq. (10) is the sum of the discounted utilities of receiving the monetary outcome M_t at time t (in addition to background consumption) and receiving nothing extra at time t+τ, and the right-hand side is the sum of the discounted utilities of receiving nothing over background consumption at time t and the outcome M_{t+τ} (plus background consumption) at time t+τ. Thus, Eq. (10) is an indifference condition and δ is the discount rate that equalizes the present value of the utility of the two monetary outcomes M_t and M_{t+τ}, after integration with an appropriate level of background consumption ω.

Most analyses of discounting models implicitly assume that the individual is risk neutral,52 so that Eq. (10) is instead written in the more familiar form

M_t = (1/(1+δ)^τ) M_{t+τ}    (11)

where δ is the discount rate that makes the present value of the two monetary outcomes M_t and M_{t+τ} equal. To state the obvious, Eqs. (10) and (11) are not the same. As one relaxes the assumption that the decision-maker is risk neutral, it is apparent from Jensen's Inequality that the implied discount rate decreases if U(M) is concave in M. Thus, one cannot infer the level of the individual discount rate without knowing or assuming something about risk attitudes. This identification problem implies that risk attitudes and discount rates cannot be estimated from discount rate experiments alone; separate tasks to identify the influence of risk preferences must also be implemented. Andersen et al.
(2008a) do this, and infer discount rates for the adult Danish population that are well below those estimated in the previous literature that assumed RN, such as Harrison, Lau, and Williams (2002), who estimated annualized rates of 28.1% for the same target population. Allowing for concave utility, they obtain a point estimate of the discount rate of 10.1%, which is significantly lower than the estimate of 25.2% for the same sample assuming linear utility. This does more than simply verify that discount rates and risk aversion coefficients are mathematical substitutes in
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
the sense that either of them has the effect of lowering the influence of future payoffs on present utility. It tells us that, for risk aversion coefficients that are reasonable from the standpoint of explaining choices in the lottery choice task, the estimated discount rate takes on a value that is much more in line with what one would expect from market interest rates. To evaluate the statistical significance of adjusting for a concave utility function, one can test the hypothesis that the discount rate estimated assuming risk aversion is the same as the discount rate estimated assuming RN. This null hypothesis is easily rejected. Thus, allowing for risk aversion makes a significant difference to the elicited discount rates.
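The direction of the identification point can be illustrated with a short calculation. The sketch below assumes CRRA utility U(x) = x^(1−r)/(1−r) and purely illustrative numbers (a smaller-sooner amount of 100 versus 110 six months later, background consumption ω = 100, r = 0.5); none of these values are estimates from the Danish data:

```python
# Implied annual discount rate from one indifference point, with and without
# concave utility (Eq. 10 vs. Eq. 11). All parameter values are illustrative.

def crra(x, r):
    """CRRA utility; r is the coefficient of relative risk aversion (r != 1)."""
    return x ** (1.0 - r) / (1.0 - r)

def implied_delta(m_soon, m_later, tau, r, omega):
    """Delta solving U(w+m_soon) + U(w)/(1+d)^tau = U(w) + U(w+m_later)/(1+d)^tau."""
    u = lambda x: crra(x, r)
    ratio = (u(omega + m_later) - u(omega)) / (u(omega + m_soon) - u(omega))
    return ratio ** (1.0 / tau) - 1.0

def implied_delta_rn(m_soon, m_later, tau):
    """Risk-neutral benchmark (Eq. 11): m_soon = m_later / (1+d)^tau."""
    return (m_later / m_soon) ** (1.0 / tau) - 1.0

m_soon, m_later, tau = 100.0, 110.0, 0.5   # tau in years (6-month horizon)
d_rn = implied_delta_rn(m_soon, m_later, tau)
d_concave = implied_delta(m_soon, m_later, tau, r=0.5, omega=100.0)
print(f"risk neutral: {d_rn:.3f}, concave (r=0.5): {d_concave:.3f}")
```

Under risk neutrality the implied annual rate here is 21%; with r = 0.5 the same indifference point implies roughly 17.6%, illustrating why concave utility pushes inferred discount rates down.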
3.5. Testing Expected Utility Theory

Much of the data collected with the direct intent of testing EUT involved choice pairs selected deliberately to provide a way of testing EUT without having to know the risk attitudes of subjects. Unfortunately they provide extremely weak tests, since one can only count a choice as a success or failure of the theory, and no transparent metric suggests itself to weight some violations as more serious than others.53 This is why we generally use ML to estimate parameters in such binary choice settings, and not the "hit ratio," since some hits are closer than others and we want to take that into account by calculating the probability of the observed choice conditional on the parameters being evaluated.54 The problem is even more serious than devising a metric for the seriousness of violations. In two respects, EUT is a hard theory to reject in these settings. First, how does one know if the subjects are actually indifferent between the choice pairs on offer? Allowing subjects to express indifference does not suffice, since there is no way to know if they have randomized internally before picking out one lottery. Moreover, waiting for the data to exhibit 50–50 splits for indifference presumes that no artifactual presentation biases exist.55 Second, how does one know if the subjects are not extremely risk averse? High levels of risk aversion mean that the CEs of the lotteries are all close to "very small numbers." Hence, for sufficiently high levels of risk aversion, the CEs of the two lotteries are virtually identical and the subject should be rationally indifferent. Unfortunately, this free parameter gives EUT the formal leeway to escape from virtually any test one can think of. These problems lead one to question how operationally meaningful these tests are without some independent characterization of risk attitudes.
To provide one striking example of this issue, consider the Preference Reversal tests of EUT presented to economists by Grether and Plott (1979). In these experiments, the subject was asked to make a direct binary choice between lotteries A and B, and then to state a valuation of each of A and B. From the latter two valuations the experimenter can infer a binary preference. A reversal is said to occur when the inferred binary preference differs from the direct binary choice. One design feature of these tasks is that A and B had virtually identical expected value. Given this information, anthropomorphize and sympathize with a poor ML estimation routine trying to explain any sample of choices in which there are significant numbers of reversals. It could try assuming subjects were risk neutral, and then it could "explain" any observed choice since the subject would be indifferent between either option. The best way to address these concerns is to characterize the risk attitudes of the subjects independently of the choice tasks, allowing the experimenter to identify those subjects that make for better tests of EUT. This identification can proceed independently of the choice data one is seeking to confront with EUT. To illustrate, consider the Common Ratio tests of EUT from Cubitt, Starmer, and Sugden (1998a) (CSS). The CSS tests used 451 subjects, who were randomly given one of five problems.56 The first and last problems in CSS were a choice between simple prospects. Problem 1 was a choice between option A, which was an 80% chance of £16, and option B, which was £10 for certain. Problem 5 was a simple "common ratio" transformation which multiplied each probability by 1/4, so that option A was a 20% chance of £16 and option B was a 25% chance of £10. Problems 2 through 4 were procedural variants that are identical to Problem 5 from the perspective of EUT. We refer to these as problems AB and A*B* for present purposes, in new experiments discussed below.
Thus, CSS Problems 2–5 correspond to problem A*B* in our design, and their Problem 1 corresponds to our problem AB. Cubitt, Starmer, and Sugden (1998a; Table 2, p. 1375) report that 50% of their sample chose option A in their Problems 2 through 5, which are qualitatively identical to problem A*B* in our design. Only 38% of their subjects chose option A in their Problem 1, which is qualitatively the same as problem AB in our design. Using the same χ² contingency table test employed by CSS, we can only reject the EUT hypothesis at a significance level of 11.2%; Fisher's exact test for the same two-sided comparison has a significance level of 15.3%. So there is weak evidence that EUT is violated, even if it does not strictly fail at conventional levels of significance.57
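A frequency comparison of this kind is an ordinary 2×2 contingency-table test. A sketch using only the Python standard library; the cell counts are hypothetical (the chapter reports the 50% and 38% frequencies but not the split of the 451 subjects across problems), so the p-value below does not reproduce the 11.2% figure:

```python
# 2x2 contingency-table test of the common-ratio prediction: under EUT the
# frequency choosing option A should be the same across problem types.
# Cell counts below are hypothetical, for illustration only.
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 dof) for table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1) at chi2
    return chi2, p

# Hypothetical: 38% of 100 subjects chose A in problem AB,
# and 50% of 100 subjects chose A in problem A*B*.
chi2, p = chi2_2x2(38, 62, 50, 50)
print(f"chi-square = {chi2:.2f}, p = {p:.3f}")
```

The identity P(χ²₁ > x) = erfc(√(x/2)) lets the test run without any statistics library; `scipy.stats.chi2_contingency` and `fisher_exact` give the same tests directly.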
For a specific example of the Common Ratio test, in which we have independent information on risk attitudes, suppose Lottery A consists of prizes $0 and $30 with probabilities 0.2 and 0.8, and that Lottery B consists of prizes $0 and $20 with probabilities 0 and 1. Then one may construct two additional compound lotteries, A* and B*, by adding a front-end probability q = 0.25 of winning zero to lotteries A and B. That is, A* offers a (1 − q) chance to play lottery A and a q chance of winning zero. Subjects choosing A over B and B* over A*, or choosing B over A and A* over B*, are said to violate EUT. To show precisely how risk aversion does matter, assume that risk attitudes can be characterized by the popular CRRA function, Eq. (1). The CEs of the lottery pairs AB and A*B* as a function of r are shown in the left and right upper panels, respectively, of Fig. 15. The CRRA coefficient ranges from −0.5 (moderately risk loving) up to 1.25 (very risk averse), with a risk-neutral subject at r = 0. The CE of lottery B, which offers $20 for sure, is the horizontal line in the left panel of Fig. 15. The CEs of A, A*, and B* all decline as risk aversion increases. The lower panels of Fig. 15 show the CE differences between the A and B (A* and B*) lotteries. Note that for
Fig. 15. Risk Attitudes and Common Ratio Tests of EUT. (Upper panels: the CEs of Lotteries A and B, and of Lotteries A* and B*, as functions of the CRRA coefficient from −0.5 to 1.5; lower panels: the CE of Lottery A minus the CE of Lottery B, and the CE of Lottery A* minus the CE of Lottery B*.)
the AB (A*B*) lotteries, the preferred outcome switches to lottery B (B*) for a CRRA coefficient of about 0.45. Most evaluations of EUT acknowledge that one cannot expect any theory to predict perfectly, since otherwise any violation would lead one to reject the theory no matter how many correct predictions it makes. One way to evaluate mistakes is to calculate their costs under the theory being tested and to "forgive" those mistakes that are not very costly, while holding to account those that are. For each subject in our data and each lottery choice pair, we can calculate the CE difference given the individual's estimated CRRA coefficient, allowing us to identify those choice pairs that are most salient. A natural metric for defining "trivial EUT violations" can then be defined in terms of choices that involve a difference in CE below some given threshold. Suppose for the moment that an expected utility maximizing individual will flip a coin to make a choice whenever the difference in CE falls below some cognitive threshold. If r = 0.8, the CE difference in favor of B is large in the first lottery pair and B will be chosen. In the second lottery pair, the difference between the payoffs for choosing A* and B* is trivial (less than a cent, in fact) and a coin is flipped to make a choice. Thus, with probability 0.5 the experimenter will observe the individual choosing B and A*, a choice pattern inconsistent with EUT. In a sample with these risk attitudes, half the choices observed would then be expected to be inconsistent with EUT. With such a large difference between the choice frequencies, standard statistical tests would easily reject the hypothesis that they are the same. Thus, we would reject EUT in this case even though EUT is essentially58 true. Fig. 16 collates estimates of risk attitudes elicited by Harrison, Johnson, McInnes, and Rutström (2005b) from 152 subjects, described in Section 1.2 and Table 3.
The idea is simply to align the CE differences for each of the CR lotteries (AB in the left panel, and A*B* in the right panel) with the distribution of risk attitudes expected from this sample (the bottom boxes). Clearly the subjects tend to have risk attitudes at precisely the point at which these tests have least power to reject EUT. This is particularly striking for the A*B* lottery choice, but even for the AB lottery choice it is only the few subjects "in the tails" of the risk distribution for which EUT has a strong prediction. Further, these risk attitude distributions refer to point estimates, and do not reflect the uncertainty of those estimates: it is quite possible that some subject who has a point estimate of his CRRA coefficient that makes the AB test powerful also has a large enough standard error on that point estimate that the AB test is not powerful. This issue of precision is addressed directly by Harrison, Johnson, McInnes, and Rutström (2007a).
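The coin-flip argument can be checked numerically. A minimal sketch under CRRA utility, using the CSS prizes (£16 and £10) for the certainty-equivalent calculations at r = 0.8 and the $30/$20 pair for the switch point; reading the "less than a cent" claim as referring to the CSS prizes is our interpretation:

```python
# CE differences in common-ratio pairs under CRRA utility U(x) = x^(1-r)/(1-r).
# CSS pair AB:   A = 80% chance of 16, B = 10 for sure.
# CSS pair A*B*: A* = 20% chance of 16, B* = 25% chance of 10.
import math

def ce(prob, prize, r):
    """CE of a lottery paying `prize` with probability `prob`, else zero, under CRRA (r < 1)."""
    eu = prob * prize ** (1.0 - r) / (1.0 - r)
    return ((1.0 - r) * eu) ** (1.0 / (1.0 - r))

r = 0.8
diff_ab = ce(0.8, 16.0, r) - ce(1.0, 10.0, r)     # large difference, in favor of B
diff_star = ce(0.2, 16.0, r) - ce(0.25, 10.0, r)  # trivially small difference
print(f"CE(A)-CE(B) = {diff_ab:.3f}, CE(A*)-CE(B*) = {diff_star:.5f}")

# CRRA coefficient at which EU(A) = EU(B) for the $30/$20 pair of Fig. 15:
# 0.8 * 30^(1-r) = 20^(1-r)  =>  r = 1 - ln(1.25)/ln(1.5), about 0.45
r_switch = 1 - math.log(1.25) / math.log(1.5)
```

At r = 0.8 the AB difference is several pounds while the A*B* difference is under a penny, which is exactly why the test has little power where the estimated risk attitudes cluster.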
Fig. 16. Observed Risk Attitudes and Common-Ratio Tests of EUT. (Upper panels: the CE of Lottery A minus the CE of Lottery B, and the CE of Lottery A* minus the CE of Lottery B*, plotted against the CRRA coefficient from 0 to 1; lower panels: the distributions of estimated CRRA coefficients.)
For some violations it may be easy to write out specific parametric models of the latent EUT decision-making process that can account for the data. The problem is that the model that can easily account for one set of violations need not account for others. As already noted, the preference reversals of Grether and Plott (1979) can be explained by assuming risk-neutral subjects with an arbitrarily small error process, since the paired lotteries are designed to have the same expected value. Hence, each subject is indifferent, and the error process can account for the data.59 But then such subjects should not violate EUT in other settings, such as common ratio tests. However, rarely does one encounter tests that confront subjects with a wide range of tasks and evaluate behavior simultaneously over that wider domain. There are three striking counter-examples to this trend. First, Hey and Orme (1994) deliberately use lotteries that span a wide range of prizes and probabilities, avoiding "trip wire" pairs, and they conclude that EUT does an excellent job of explaining behavior compared to a wide range of alternatives. Second, Harless and Camerer (1994) consider a wide range of aggregate data across many studies, and find that EUT does a good job of explaining behavior if one places sufficient value on parsimony. On the
other hand, all of the data used by Harless and Camerer (1994) come from experimental designs that were intended to be tough on EUT compared to some alternative model, so their data are not as generic as those of Hey and Orme (1994). Third, Loomes and Sugden (1998) deliberately choose lotteries "… to provide good coverage of the space within each (implied Marschak–Machina probability) triangle, and also to span a range of gradients sufficiently wide to accommodate most subjects' risk attitudes" (p. 589). Their coverage is not as wide as that of Hey and Orme (1994) in terms of the range of CRRA values for which subjects would be indifferent under EUT, but the intent is clearly to provide some variability, and for the right reasons. Maximal statistical power calls for what might be termed a "complementary slack experimental design": choose one set of tasks such that if subjects are risk averse (risk neutral) then the choice model is tested, recognizing that if they are risk neutral (risk averse) then the other set of tasks tests the choice model. Thus, the subjects that clearly provide little information about EUT in common ratio tests in Fig. 16 should provide significant information about EUT in preference reversal tests (Harrison et al., 2007a).60 On the other hand, we know relatively little about which lottery pairs are the most "ecologically relevant" to use if we are trying to model task domains in a representative manner. Our only point is that this consideration deserves more attention by economists interested in making claims about the general validity of EUT or any other model, echoing similar calls from others (Smith, 2003).
3.6. Testing Auction Theory

To illustrate the potential importance of controlling for the risk attitude confound in a strategic setting, consider an important case in which there has been considerable debate over the ability of received theory to account for behavior: bidding in a first-price sealed-bid auction characterized by private and independent values.61 Auction theory is very rich, and has been developed specifically for the parametric cases considered in experiments (e.g., Cox, Roberson, & Smith, 1982; Cox, Smith, & Walker, 1988). In a new series of laboratory experiments, data are collected on observed valuations and bids, using standard procedures. However, information is also elicited that identifies the risk attitudes of the same subject, since that is a critical characteristic of the predicted bid under the standard model (e.g., Harrison, 1990). It is then straightforward to specify a joint likelihood function for the observed risk aversion responses and bids, estimate the risk aversion
characteristic, and test if the implied NE bid systematically differs from the observed bid. The results are striking. In the simplest possible case, when there are only two bidders (N = 2), received theory does a wonderful job of characterizing behavior when one controls for the risk attitudes of the individual bidder.62

3.6.1. Theoretical Predictions
Cox et al. (1982) develop a model of bidding behavior in first-price sealed-bid auctions that assumes that each agent has a CRRA power utility function U(y) = y^r, where U is the utility of experimental income y and (1 − r_i) is the Arrow–Pratt measure of risk aversion (RA). Each agent has their own r_i, so each agent is allowed to have distinct risk attitudes. However, r_i is restricted to lie in the interval (0, 1], where r_i = 1 corresponds to RN. Hence, this model allows (weak) risk aversion, but does not admit risk-loving behavior.63 Each agent in the model knows their own risk attitude, their own valuation v_i, that everyone's risk attitudes are drawn from the interval (0, 1], and that everyone's valuation is drawn from a uniform distribution over the interval (v_0, v_1). It can then be shown that the symmetric Bayesian NE implies the following bid function:

b_i = v_0 + [(N − 1)/(N − 1 + r_i)] (v_i − v_0)    (12)

where there are N active bidders. In the RN case in which v_0 = 0, v_1 = 1, and r_i = 1, this model is the one derived by Vickrey (1961), and calls for bidders to choose their optimal bid using a simple rule: take the valuation received and shade it down by (N − 1)/N. When N = 2, the RN NE bidding rule is therefore particularly simple: bid one-half of the valuation. Thus, one might expect the N = 2 case to provide a particularly compelling test of the general RA NE bidding rule, since the optimal RN NE bid is also an arithmetically simple heuristic.64

3.6.2. Experimental Design and Procedures
Each subject in our experiment participated in a single session consisting of two tasks.
The first task involved a sequence of choices designed to reveal each subject’s risk preferences. In the second task, subjects participated in a series of 10 first-price auctions against random opponents, followed by a small survey designed to collect individual characteristics. A total of 58 subjects from the student population of the University of Central Florida participated over three sessions. The smallest number of subjects in one
session was 16, so there was little chance that the subjects would rationally believe that they could establish reputations over the 10 rounds of bidding against a random opponent.65 Each subject was told that they would be privately assigned induced values between $0 and $8, using a uniform distribution. Cox et al. (1982) show that for RN subjects the expected earning of each subject in a first-price auction is (v_1 − v_0)/[N(N + 1)], where v_1 and v_0 are the upper and lower bounds for the support of the induced values. Thus, expected RN earnings were $1.33 per subject in each period. Subjects in each session were also informed of the number of other bidders in the auction; that the other bidders' induced values were, like their own, drawn from a uniform support with bounds given above; and that their earnings in the auction would equal their induced value minus their bid if they had the highest bid, or zero otherwise. We used the Holt and Laury (2002) design to elicit risk attitudes from the same subjects. In these experiments, we scaled the baseline prizes of their design, shown in panel A of Table 1, up by a factor of 2, so that the largest prize was $7.70 and the smallest prize was $0.20. The prizes in these lotteries effectively span the range of possible incomes in the auction, which range from $8.00 to zero.

3.6.3. Results
Panel B of Fig. 17 displays observed bidding behavior. The induced value is displayed on the bottom axis, a 45° line is shown and corresponds to the subject just bidding their value, and the RN bid prediction is shown under that 45° line. The standard behavior from a long series of such experiments is observed: subjects tend to bid higher than the RN prediction, to varying degrees. The statistical model consists of a likelihood of observing the risk aversion responses and the observed bidding responses.
The likelihood of the risk aversion responses is modeled with a probit choice rule defined over the 10 binary choices that each subject made, exactly as illustrated in Section 1.2 but for the power utility function. To allow for subject heterogeneity with respect to risk attitudes, the parameter r is modeled as a linear function of observed individual characteristics of the subject. For example, assume that we only had information on the age and sex of the subject, denoted Age (in years) and Female (0 for males, and 1 for females). Then we would estimate the coefficients α, β, and γ in r = α + β·Age + γ·Female. Therefore, each subject would have a different estimated r, r̂, for a given set of estimates of α, β, and γ to the extent that the
Fig. 17. Observed Risk Attitudes and Observed Bidding. (A) Risk Elicitation Task; power utility function assumed: r < 1 is RA, r = 1 is RN, and r > 1 is RL. (B) First-Price Sealed-Bid Auction; 2 bidders per auction over 10 rounds; N = 58 subjects with random opponents; valuations between $0 and $8, with bids plotted against induced values.
subject had distinct individual characteristics. So if there were two subjects with the same sex and age, to use the above example, they would literally have the same r̂, but if they differed in sex and/or age they would generally have distinct r̂. In fact, we use 12 individual characteristics in our model. Apart from age and sex, these include binary indicators for race (Non-white), a Business major, rich (parental or own income over $80,000 in 2003), high GPA (above 3.75), low GPA (below 3.25), college education for the father of the subject, college education for the mother of the subject, whether the subject works, whether the subject is a Catholic, and whether the subject is some other Christian denomination. Panel A of Fig. 17 displays the predicted risk attitudes from this estimation exercise, using only the risk aversion task. The likelihood of the bidding responses is then modeled as a multiplicative function of the predicted bid conditional on the estimated risk attitude for the subject. Thus, we estimate a coefficient b which scales the predicted NE bid up or down: if b = 1 then the observed bid exactly tracks the predicted bid for that subject. The predicted NE bid for each subject i depends, of course, on the r̂_i for that subject, as well as the parameters N,
v_0, v_1, and v_i. Thus, if we observe two subjects with the same v_i but different bids, it is perfectly possible for this to be consistent with the predicted NE bid if they have distinct individual characteristics and hence distinct r̂_i. The coefficient b is also modeled as a linear function of the same set of individual characteristics as the coefficient r.66 The full specification of the likelihood function for bidding allows for heteroskedasticity with respect to individual characteristics. Thus, the specification is (b × b^NE) + e, where the variance of e is again a linear function of the individual characteristics. Thus we obtain information from the coefficients of b on which types of subjects deviate systematically from the NE prediction, and we obtain information from the coefficients on e on which types of subjects exhibit more noise in their bidding. The overall likelihood consists of the likelihood of the risk aversion responses plus the likelihood of the bidding responses, conditional on estimates of r, b, and the variance of e. In turn, these three parameters are linear functions of a constant and the individual characteristics of the subject. Since each subject provides 10 binary choices in the risk aversion task, and 10 bids in the auction task, we use clustering to allow for the responses of the same subject to be correlated due to unobserved individual effects. Table 7 displays the ML estimates. The intercept for r is estimated to be 0.612, consistent with evidence from comparable experiments of risk aversion discussed earlier. The intercept for b is 1.02, consistent with bids being centered on the RA NE bid conditional on the estimated risk aversion for each subject. The top panel of Fig. 18 shows the distribution of predicted values of b for each of the 58 subjects. Some subjects have estimates of b as low as 0.8, or as high as 1.35, but the clear majority seem to be tracked well by the RA NE bidding prediction. The bottom panel of Fig.
18 displays a distribution of comparable estimates when we use the RN NE bidding prediction instead of the RA NE bidding prediction, and re-estimate the model. Observed bids are about 25% higher than predicted if one assumes, counter-factually, that subjects are all RN.
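The bid function in Eq. (12) and its RN special case are easy to compute directly. A minimal sketch, with an illustrative valuation and CRRA parameter (not estimates from the experiment); `ne_bid` is a helper defined here:

```python
# Risk-averse Bayesian NE bid function of Cox, Roberson and Smith (1982), Eq. (12):
# b_i = v0 + (N - 1)/(N - 1 + r_i) * (v_i - v0), with U(y) = y^r and r_i in (0, 1].

def ne_bid(v, r, n, v0=0.0):
    """Equilibrium bid for valuation v, CRRA parameter r, and n bidders."""
    return v0 + (n - 1) / (n - 1 + r) * (v - v0)

v = 6.0                       # illustrative induced value in [0, 8]
print(ne_bid(v, r=1.0, n=2))  # risk neutral: bid half the valuation
print(ne_bid(v, r=0.6, n=2))  # risk averse: bid more aggressively

# Expected RN earnings per period, (v1 - v0)/[N(N + 1)], with v1 = 8 and N = 2:
rn_earnings = (8.0 - 0.0) / (2 * (2 + 1))  # about $1.33, as in the text
```

Lower r (more risk aversion) shrinks the denominator in Eq. (12) and pushes the bid above the RN prediction, which is the pattern in panel B of Fig. 17.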
3.7. Testing Myopic Loss Aversion

Prospect Theory has forced economists to worry about the task domain over which decisions are evaluated, where a sequence of many tasks over time may be treated very differently from a single choice task. PT obviously focuses on the implications for loss aversion from this differential treatment.
Table 7. Maximum Likelihood Estimates for Model of Bidding Behavior.

Parameter r:
  Variable                            Point Estimate  Standard Error  p-Value  Lower 95% CI  Upper 95% CI
  Constant                             0.612          0.320           0.06     -0.02          1.24
  Age                                  0.003          0.015           0.82     -0.03          0.03
  Female                              -0.052          0.079           0.51     -0.21          0.10
  Non-white                            0.011          0.081           0.89     -0.15          0.17
  Major is in business                 0.058          0.086           0.50     -0.11          0.23
  Father completed college            -0.004          0.083           0.96     -0.17          0.16
  Mother completed college             0.003          0.093           0.97     -0.18          0.19
  Income over $80k in 2003             0.036          0.073           0.62     -0.11          0.18
  Low GPA (below 3.24)                -0.024          0.098           0.81     -0.22          0.17
  High GPA (greater than 3.75)         0.190          0.113           0.09     -0.03          0.41
  Work full-time or part-time         -0.022          0.074           0.77     -0.17          0.12
  Catholic religious beliefs          -0.046          0.130           0.72     -0.30          0.21
  Other Christian religious beliefs    0.040          0.080           0.62     -0.12          0.20

Parameter b:
  Constant                             1.021          0.721           0.16     -0.39          2.43
  Age                                 -0.007          0.030           0.81     -0.07          0.05
  Female                               0.019          0.084           0.82     -0.14          0.18
  Non-white                           -0.059          0.079           0.45     -0.21          0.09
  Major is in business                 0.023          0.083           0.78     -0.14          0.19
  Father completed college             0.054          0.068           0.43     -0.08          0.19
  Mother completed college            -0.023          0.085           0.79     -0.19          0.14
  Income over $80k in 2003             0.078          0.074           0.29     -0.07          0.22
  Low GPA (below 3.24)                 0.001          0.079           0.99     -0.15          0.16
  High GPA (greater than 3.75)         0.210          0.124           0.09     -0.03          0.45
  Work full-time or part-time         -0.019          0.068           0.78     -0.15          0.11
  Catholic religious beliefs           0.035          0.095           0.72     -0.15          0.22
  Other Christian religious beliefs    0.157          0.080           0.05      0.00          0.31

Parameter e:
  Constant                             0.096          1.093           0.93     -2.05          2.24
  Age                                 -0.011          0.046           0.81     -0.10          0.08
  Female                               0.008          0.156           0.96     -0.30          0.32
  Non-white                           -0.077          0.162           0.63     -0.39          0.24
  Major is in business                 0.123          0.133           0.36     -0.14          0.38
  Father completed college            -0.330          0.139           0.02     -0.60         -0.06
  Mother completed college             0.078          0.199           0.69     -0.31          0.47
  Income over $80k in 2003             0.044          0.119           0.71     -0.19          0.28
  Low GPA (below 3.24)                 0.116          0.111           0.29     -0.10          0.33
  High GPA (greater than 3.75)         0.044          0.191           0.82     -0.33          0.42
  Work full-time or part-time         -0.144          0.148           0.33     -0.43          0.15
  Catholic religious beliefs          -0.341          0.157           0.03     -0.65         -0.03
  Other Christian religious beliefs   -0.077          0.149           0.61     -0.37          0.21
Fig. 18. Relative Support for Alternative Nash Equilibrium Bidding Models. (Top panel: distribution of the estimated ratio of actual bids to the risk-averse NE predicted bid; bottom panel: the same ratio using the risk-neutral NE predicted bid; horizontal axis shows the estimated ratio of actual bids to predicted bids, from 0.8 to 1.4.)
Unfortunately, the insight from PT that the evaluation period might differ from setting to setting, or from subject to subject, has not been integrated into EUT. In fact, this insight is often presented as one of the essential points of departure from EUT, and as one of the differentiating characteristics of PT. We argue that the behavioral issue of the evaluation period is a
more general and fundamental concern than concerns about loss aversion in PT. By considering recent experimental tests of this insight, known as Myopic Loss Aversion (MLA), it is possible to see that the insight is just as relevant for EUT, and that a full characterization of risk attitudes must account for the evaluation period. Camerer (2005; p. 130) explains why one naturally thinks of loss aversion and the evaluation period together:

A crucial ingredient in empirical applications of loss aversion is decision isolation, or focusing illusion, in which single decisions loom large even though they are included in a stream of similar decisions. If many small decisions are integrated into a portfolio of choices, or a broad temporal view – the way a gambler might view next year's likely total wins and losses – the loss on any one gamble is likely to be offset by others, so aversion to losses is muted. Therefore, for loss aversion to be a powerful empirical force requires not only aversion to loss but also a narrow focus such that local losses are not blended with global gains. This theme emerges in the ten field studies that Camerer (2000) discusses, which show the power of loss aversion (and other prospect theory features) to explain substantial behaviors outside the lab.
However, there is very little direct experimental evidence, with real stakes, to support MLA. Furthermore, we argue that what evidence there is also happens to be consistent with EUT. By carefully considering those experimental tests from the perspective of EUT and the implications for the characterization of risk attitudes, it is easy to see that the behavioral issue of the evaluation period is a more general and fundamental concern. Several recent studies propose experimental tests that purport to directly test EUT against the alternative hypothesis of MLA. Gneezy and Potters (1997) and Haigh and List (2005) use simple experiments in which many potential confounds are removed.67 Unfortunately, those experiments only test a very special case of EUT against the alternative hypothesis. This special case is CRRA, and it fails rather dramatically. But it is easy to come up with other utility functions that are consistent with EUT and that can explain the observed data without relying on MLA. For example, any utility function with decreasing RRA and that exhibits risk aversion for low levels of income will suffice at a qualitative level. The empirical outcomes observed at the individual level can then be explained by simply fitting specific parameters to this utility function. Appendix E demonstrates this intuitively, as well as more formally. Our new analysis of the GP data presented in Appendix E also identifies some unsettling implications of these experiments for MLA: that the key ‘‘loss aversion’’ parameters of the standard MLA model vary dramatically
according to the exogenously imposed evaluation period, and that the risk attitudes are the opposite of those generally assumed in PT, viz., risk loving in gains and risk averse in losses. Thus, the behaviorist explanation is hoisted on the same petard it alleged applied to the EUT explanation: the presence of anomalous behavior. However, although it is useful and trivial to come up with a standard EUT story that accounts for the data, and even fun to find an anomaly for the behaviorists to ponder, these experiments force one to examine a much deeper question than "can EUT explain the data?" That question is whether utility is best defined over each individual decision that the subject faces or over the full sequence of decisions that the subject is asked to make in an experimental session, or perhaps even including extra-lab decisions. Depending on how the subjects interpret the experimental task, these frames could differ. This perspective suggests the hypothesis that behavior might be better characterized as a mixture of two latent data generating processes, as suggested by Harrison and Rutström (2005) and Section 3.3, with some subjects using one frame and other subjects using another frame. A related issue underlying the assessment of behavior from these experiments is asset integration within the laboratory session. What incomes are arguments of the utility functions of the subjects? The common assumption in experimental economics is that it is simply the prizes over which they were making choices whenever they got to make a choice.68 But what about asset integration of income earned during the sequence of rounds? Gneezy and Potters (1997; p. 636) note that this could affect risk attitudes in a more general specification, but assert that the effect is likely to be small given the small stakes. This may be true, but it is just an assertion and deserves more complete study using the general framework proposed by Cox and Sadiraj (2006).
The Gneezy and Potters (1997) data provide an opportunity to study this question, since subjects received information on their intra-session income flows at different rates. Hence one could, in principle, test what function of accumulated wealth was relevant for their choices. We believe that the fundamental insight of Benartzi and Thaler (1995) of the importance of the evaluation horizon of decision makers is worthy of more attention, even though we find that the present tests of MLA have been somewhat misleading. The real contribution of the MLA literature and the experimental design of Gneezy and Potters (1997) is to force mainstream economists to pay attention to an issue they have neglected within their own framework.
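The mixture idea mentioned above can be written down compactly. The sketch below is our own illustration, with made-up choice probabilities rather than the Harrison and Rutström (2005) estimation code; it shows the grand log-likelihood when each observation may come from one of two latent frames with mixing probability pi:

```python
import math

def mixture_loglik(pi, probs_frame1, probs_frame2):
    """Grand log-likelihood under a two-component mixture: each
    observation's likelihood is a pi-weighted average of the choice
    probabilities the two latent frames assign to the choice the
    subject actually made."""
    return sum(math.log(pi * p1 + (1.0 - pi) * p2)
               for p1, p2 in zip(probs_frame1, probs_frame2))

# Made-up per-observation choice probabilities under a per-decision
# frame and a session-level frame (purely illustrative):
p_frame1 = [0.9, 0.7, 0.6, 0.8]
p_frame2 = [0.4, 0.5, 0.9, 0.3]

# With pi = 1 the mixture collapses to the first frame alone:
ll_pure = sum(math.log(p) for p in p_frame1)
ll_mix = mixture_loglik(0.6, p_frame1, p_frame2)
```

In an actual estimation the frame-specific choice probabilities would themselves depend on structural parameters, and pi would be estimated jointly with them.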
116
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
3.8. The Random Lottery Incentive Procedure

The random lottery incentive procedure originated from the desire to avoid ''wealth or portfolio effects'' of subjects making multiple choices at once to determine their final experimental income.69 It also has the advantage of saving scarce experimental subject payments, but that arose originally as a happy by-product. The procedure bothers theorists and non-experimenters, particularly when one is using the experimental responses to estimate risk attitudes. The reason is that there is some ambiguity as to whether the subject is evaluating the utility of the lottery in each choice, or the compound lottery that includes the random selection of one lottery for payment. The procedure also imposes a motivational constraint on the level of incentives one can have in certain elicitation tasks. To generate better econometric estimates we would like to gather more choices from each subject: witness the glee that Wilcox (2008a) expresses over the sample size of the design in Hey (2001). Each subject in that design generated 500 binary choices, over five sessions at separate times, and was paid for one selected at random. But 1-in-500 is a small number, even if the prizes were as high as £125 and EV maximization would yield a payoff of just over £79. So there is a tension here, in which we want to gather more choices per subject, but run the risk that the probability of any one choice being realized drops as we do so. The experiments of Hey (2001) are remarkable because they appear to have motivated subjects well – aggregate error rates from repeated tasks are very low compared to those found in comparable designs with fewer tasks (Nathaniel Wilcox, personal communication). What we would like to do is run an experiment with as many choices as we believe that subjects can perform without getting bored, but ensure that they do not see each choice as having a vanishing chance of being salient.
In our experience, 60 binary choices are about the maximum we can expect our subjects to undertake without visible signs of boredom setting in. But even 1-in-60 sounds small, and may be viewed that way by subjects, effectively generating hypothetical responses and the biases that typically come with them (see Section 4.1). Of course, this is a behavioral issue: do subjects focus on the task as if it were definitely the one to be paid, or do they mostly focus on the likelihood of the task determining their earnings? Several direct tests of this procedure lead some critics of EUT to the conclusion that the procedure appears, as an empirical matter, to induce no cross-task contamination effects when choices are over simple lottery prospects; see Cubitt, Starmer, and Sugden (1998b, p. 129), for example. Related tests include Starmer and Sugden (1991) and Beattie and Loomes
(1997). So the empirical evidence suggests that it does not matter behaviorally. On the other hand, doubts remain. Certain theories of decision-making under risk differ in terms of the predicted effect these procedures have on behavior. To take an important example, consider the use of the random lottery incentive procedure in the context of an MPL task. The theoretical validity of this procedure presumes EUT, and if EUT is invalid then it is possible that this procedure might be generating invalid inferences. Under EUT it does not matter if the subjects evaluate their choices in each task separately, make one big decision over the whole set of tasks, or anything in between, since the random incentive is just a ''common ratio probability'' applied to each task. However, under RDU or PT this common ratio probability could lead to very different choices, depending on the extent of probability weighting. Hey and Lee (2005a, 2005b) provide evidence that subjects do not appear to consider all possible tasks, but their evidence is provided in the context of RLP designs discussed in Section 1.2. In that case the subject does not know the exact lotteries to be presented in the future, after the choice before him is made, so one can readily imagine the cognitive burden involved in anticipating what the future lotteries will be.70 But for the MPL instrument the subject does know the exact lotteries to be presented in the whole task, and the set of responses can be plausibly reduced in number to just picking one switch point, rather than picking from the 2^10 = 1024 possible binary choices in 10 rows. Thus, the MPL instrument may be more susceptible to concerns with the validity of the random lottery incentive procedure than other instruments.71 On the other hand, it is not obvious theoretically that one wants to avoid ''portfolio effects'' when eliciting risk attitudes. These effects arise as soon as subjects are paid for more than one out of K choices.
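The ''common ratio probability'' point can be made concrete. Under EUT, multiplying every outcome probability by the 1-in-K selection probability rescales all expected utilities by the same factor and cannot change any ranking; under probability weighting it can. The sketch below uses the Tversky and Kahneman (1992) weighting function with their estimate of 0.61 for its curvature parameter, the classic $3000/$4000 common-ratio prizes, and an illustrative utility exponent; the 1-in-4 selection rate is hypothetical:

```python
def w(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def rdu_value(prize, p):
    """RDU value of a lottery paying `prize` with probability p (zero
    otherwise); utility x**0.88 is an illustrative power function."""
    return w(p) * prize ** 0.88

# Classic common-ratio pair: A pays $3000 for sure, B pays $4000
# with probability 0.8.
v_a, v_b = rdu_value(3000, 1.0), rdu_value(4000, 0.8)

# Fold in a hypothetical 1-in-4 random-lottery selection: both
# outcome probabilities are multiplied by 0.25.
v_a_sel, v_b_sel = rdu_value(3000, 0.25), rdu_value(4000, 0.2)

# Under EUT the common factor 0.25 cannot change the ranking; under
# this probability weighting the preferred option flips from A to B.
```

With these parameters A is preferred in isolation, but once the selection probability is compounded in, B is preferred, so the random lottery procedure is not incentive-neutral under RDU or PT.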
Again, consider the same type of binary choice experiments considered above. The standard implementation of the random lottery incentive mechanism in experiments such as these would have one choice selected at random. For the case of investigating ''preference reversals'' the reason for only using one choice is well explained by Cox and Epstein (1989; p. 409): Economic theories of decision making under risk explain how variations in wealth can affect choices. Thus an agent with wealth w may prefer lottery A to lottery B but that same agent with wealth w' ≠ w may prefer lottery B to A. Therefore, the results of preference reversal experiments that allow a subject's wealth to change between choices cannot provide a convincing challenge to economic theory unless it can be shown that wealth effects cannot account for the results.
Economic theories of decision making under risk provide explanations of optimal portfolio choice. Such theories explain why an agent might prefer lottery A to lottery B but prefer the portfolio (A, B) to the portfolio (A, A). If the portfolio is accumulated by sequential choice of A over B and then B over A, an apparent preference reversal could consist of choices that construct an agent’s optimal portfolio.
When the interest is in the inferred risk coefficient, however, the possibility of subjects choosing portfolios to match their preferences has different implications. To avoid risk-pooling incentives, the outcomes of the lotteries must be uncorrelated, which is normally the case in such experiments. Nevertheless, even then it is possible for a subject to prefer the portfolio (A, B) to (A, A) even if he would prefer A to B when being paid only for one of his choices. To see this, recall that the lottery options presented to subjects are always discrete. In the MPL, for example, a switch from lottery A to lottery B on row 6 would lead us to infer a risk aversion coefficient that is in a numeric interval, (0.14, 0.41) in the Holt and Laury (2002) experiments. An individual with a risk aversion coefficient close to the boundaries of this interval would always pick (B, B) or (A, A), but an individual with a risk aversion coefficient in the middle of the interval would have a preference for a mixed portfolio of (A, B). Paying for more than one lottery therefore elicits more information and allows a more precise expression of the risk preference of each subject. The point is that we then have to evaluate risk attitudes assuming that subjects compare portfolios, rather than comparing one individual lottery with another individual lottery. If we do that, then there is no theoretical reason for avoiding portfolio effects for this inferential purpose. There may be a practical and behavioral reason for avoiding that assumption in the design considered by Hey and Lee (2005a, 2005b), given the cognitive burden (to subject and analyst) of constructing all possible expected portfolios. The behavioral significance of the portfolio effect can be directly tested by varying the number of lottery choices to be paid. In our replication of Hey and Orme (1994) we defaulted to having 60 binary lottery choices.
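The interval quoted above can be recovered numerically from the Holt and Laury (2002) prizes ($2.00 or $1.60 for the safe option A, $3.85 or $0.10 for the risky option B): a switch at row 6 brackets the CRRA coefficient between the indifference points at high-prize probabilities 0.5 and 0.6. A minimal bisection sketch (published tables round the bounds to roughly 0.15 and 0.41):

```python
def crra(x, r):
    """CRRA utility of Eq. (1), u(x) = x**(1-r)/(1-r); r != 1 assumed."""
    return x ** (1 - r) / (1 - r)

def eu_diff(r, p):
    """EU(safe A) - EU(risky B) in the row where the high prize has
    probability p, using the Holt-Laury prizes."""
    eu_a = p * crra(2.00, r) + (1 - p) * crra(1.60, r)
    eu_b = p * crra(3.85, r) + (1 - p) * crra(0.10, r)
    return eu_a - eu_b

def indifference_r(p, lo=0.01, hi=0.99):
    """Bisect for the CRRA coefficient at which a subject is
    indifferent between A and B in the row with probability p;
    eu_diff crosses zero from below as r rises."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if eu_diff(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# A switch from A to B on row 6 brackets r between the row-5 and
# row-6 indifference points:
r_low, r_high = indifference_r(0.5), indifference_r(0.6)
```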
With 60 binary choices we selected three for payment, to ensure comparability of rewards with other experiments in which subjects made choices over 40 or 20 lotteries, and where 2 lotteries or 1 lottery, respectively, were selected at random to be played out. Thus, the 1-in-20 treatment corresponds exactly to the random lottery incentive procedure that avoids portfolio effects, and the other two treatments raise the possibility of these effects. All of these tasks were in the gain frame, and all involved subjects being provided information on the EV of each lottery. The samples consisted of 11, 21, and 25 subjects in the 20, 40, and 60 lottery treatments, respectively, for a pooled sample of
57 subjects. All the lottery outcomes were uncorrelated by executing independent draws. We find no evidence of portfolio effects, measured by the effect on the mean elicited risk attitudes. Assume an EUT model initially, and use the CRRA function given by Eq. (1), with a Fechner error specification. Pooling data over tasks in which the subject faced 20, 40, or 60 lotteries, on a between-subjects basis, and including a binary dummy for those sessions with 20 or 40 lotteries, there is no statistically significant effect on elicited risk attitudes. Quite apart from statistical insignificance, the estimated effect is small: around ±0.04 or less in terms of the risk aversion coefficient. The same conclusion holds with a comparable RDU model, whether one looks at the concavity of the utility function, Eq. (1), the curvature of the probability weighting function, Eq. (9), or both. This valuable result is worth replicating with larger samples and in different elicitation procedures. We want to have more binary choices from the same subject to get more precise estimates of latent structural models, but on the other hand we worry that paying 1-in-K choices for K ''large'' might seriously dilute incentives for thoughtful behavior over consequential outcomes. If one can modestly increase the salience of each choice, as implemented here, and not worry about portfolio effects, then it is possible to use values of K that allow much more precise estimates of risk attitudes. Of course, the absence of the portfolio effect must be checked behaviorally, as illustrated here.
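The estimation strategy used here can be sketched end-to-end on synthetic data: simulate binary choices from an EUT agent with CRRA utility, Eq. (1), and a Fechner error, then recover the parameters by maximum likelihood. The lottery prizes, true parameters, and the crude grid search are all illustrative; a real analysis would use a proper optimizer and clustered standard errors:

```python
import math
import random

def crra(x, r):
    """CRRA utility of Eq. (1): u(x) = x**(1-r)/(1-r)."""
    return x ** (1 - r) / (1 - r)

def choice_prob(lot_a, lot_b, r, mu):
    """Probability of choosing lottery B under EUT with CRRA utility
    and a Fechner error: Phi((EU_B - EU_A) / mu)."""
    eu_a = sum(p * crra(x, r) for p, x in lot_a)
    eu_b = sum(p * crra(x, r) for p, x in lot_b)
    z = (eu_b - eu_a) / mu
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Simulate 2,000 binary choices from an agent with known parameters:
random.seed(7)
TRUE_R, TRUE_MU = 0.6, 0.3
tasks = []
for _ in range(2000):
    lot_a = [(0.5, random.uniform(2, 6)), (0.5, random.uniform(2, 6))]
    lot_b = [(0.5, random.uniform(1, 10)), (0.5, random.uniform(1, 10))]
    chose_b = random.random() < choice_prob(lot_a, lot_b, TRUE_R, TRUE_MU)
    tasks.append((lot_a, lot_b, chose_b))

def loglik(r, mu):
    ll = 0.0
    for lot_a, lot_b, chose_b in tasks:
        p = choice_prob(lot_a, lot_b, r, mu)
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # guard the log
        ll += math.log(p if chose_b else 1.0 - p)
    return ll

# Crude grid-search MLE over (r, mu):
grid = [(r / 100.0, m / 100.0)
        for r in range(10, 100, 5) for m in range(10, 100, 10)]
r_hat, mu_hat = max(grid, key=lambda rm: loglik(*rm))
```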
3.9. Summary Estimates

We finally collate some ''preferred'' estimates of simple specifications of risk attitudes from the various designs and statistical specifications in the literature. We do not mechanically list every estimate from every design and specification, in the spirit of some meta-analyses, ignoring the weaknesses we have discussed in each. Instead, we use a priori judgements to focus on two of the designs that we believe to be most attractive, the statistical specifications we believe to be the best available, and the studies for which we have the most reliable data. One design is the classic data set of Hey and Orme (1994), and the other is the classic design of Holt and Laury (2002, 2005). We favor the Holt and Laury (2005) study over Holt and Laury (2002), because of the contaminant of order effects in the earlier design, identified by Harrison, Johnson, McInnes, and Rutström (2005b). Similarly, we favor the Fechner error specification of Hey and Orme (1994) over the Luce specification of Holt and Laury (2002), for reasons detailed by Wilcox
(2008a).72 We also augment the British data from Hey and Orme (1994) with results from our replications with U.S. college students. We consider CRRA and EP variants for EUT, and also consider some simple RDU specifications. The CRRA utility function is specification Eq. (1) from Section 2.2, the EP utility function is specification Eq. (1′) from Section 2.2, and the probability weighting function is the popular specification Eq. (9) from Section 3.1. We do not consider CPT, due to ambiguity over the interpretation of reference points in the laboratory. So this is a selective summary, guided by our views on these issues. We assume a homogeneous preferences specification, with no allowances for heterogeneity across subjects. In part, this is to anticipate their use by theorists interested in using point estimates for ''calibration finger arithmetic'' (Cox & Sadiraj, 2008). We stress that these point estimates have standard errors, and structural noise parameters, and that out-of-sample predictions of utility will have ever-expanding confidence intervals for well-known statistical reasons. We encourage theorists not to forget this simple statistical point when taking estimates such as these to make predictions over domains that they were not estimated over. Of course, the corollary is that we should always qualify estimates such as these by referencing the domain over which the responses were made. Indeed, Cox and Sadiraj (2008) show that these point estimates can produce implausible thought experiments far enough out of sample. Cox and Harrison (2008) provide further discussion of this point, using estimates from Table 8, and we return to it below. Table 8 collects the estimates.73 In each case the preferred model is the RDU specification with the EP utility function. Some interesting patterns emerge.
First, there appears to be very little substantive probability weighting in the Hey and Orme (1994) data, even if the coefficient γ is statistically significantly different from 1: indeed, the log-likelihoods of the EUT and RDU specifications with EP are close. Second, the two implementations of the Hey and Orme (1994) design generate estimates that are remarkably similar. Third, the extent and nature of probability weighting varies significantly in the Holt and Laury (2005) data depending on the assumed utility function. Fourth, there is evidence of decreasing RRA in the Hey and Orme (1994) data and our replication, with a < 0, but evidence of very slightly increasing RRA in the Holt and Laury (2005) data. Finally, the estimates of the concavity of the utility function do not seem to depend so much on the EUT or RDU specification, as on the choice of utility function. To return to the point about how estimates such as these should be ''read'' by theorists, and qualified by those presenting them, consider the
Table 8.  Summary Estimates.

Specification         Parameter  Estimate  Std. Err.  p-value(a)  Lower 95% CI  Upper 95% CI  Log-Likelihood

Hey and Orme (1994): N = 80 subjects, pooled over both tasks; 15,567 responses, excluding indifference
EUT with CRRA         r           0.61      0.03                   0.56          0.66          -8865.01
                      μ           0.78      0.06                   0.67          0.90
EUT with Expo-Power   r           0.82      0.02                   0.80          0.84          -8848.03
                      a          -1.06      0.04                  -1.13         -0.99
                      μ           0.47      0.04                   0.39          0.55
RDU with CRRA         r           0.61      0.03                   0.56          0.66          -8861.18
                      γ           0.99     <0.01                   0.98          1.00
                      μ           0.78      0.05                   0.67          0.89
RDU with Expo-Power   r           0.82      0.01                   0.80          0.84          -8844.11
                      a          -1.06      0.04                  -1.13         -0.99
                      γ           0.99     <0.01                   0.98          1.00
                      μ           0.46      0.04                   0.38          0.54

Our replication of Hey and Orme (1994): N = 63 subjects in gain domain; 3,736 responses, excluding indifference
EUT with CRRA         r           0.53      0.05                   0.44          0.62          -2418.62
                      μ           0.79      0.06                   0.67          0.91
EUT with Expo-Power   r           0.78      0.02                   0.74          0.82          -2412.26
                      a          -1.10      0.05                  -1.19         -1.00
                      μ           0.58      0.05                   0.48          0.69
RDU with CRRA         r           0.53      0.04                   0.45          0.62          -2414.46
                      γ           0.97      0.01                   0.95          0.99
                      μ           0.78      0.05                   0.66          0.90
RDU with Expo-Power   r           0.78      0.02                   0.74          0.82          -2408.25
                      a          -1.10      0.05                  -1.19         -1.01
                      γ           0.97      0.01                   0.95          0.99
                      μ           0.57      0.05                   0.47          0.67

Holt and Laury (2005): N = 96 subjects, pooled over the 1x and 20x tasks, with no order effects; 960 non-hypothetical responses
EUT with CRRA         r           0.76      0.04                   0.68          0.84           -330.93
                      μ           0.94      0.15                   0.64          1.24
EUT with Expo-Power   r           0.40      0.07                   0.25          0.54           -303.94
                      a           0.07      0.02                   0.04          0.11
                      μ           0.12      0.02                   0.07          0.16
RDU with CRRA         r           0.85      0.08                   0.69          1.00           -325.50
                      γ           1.46      0.35      0.19(b)      0.77          2.15
                      μ           0.89      0.14                   0.61          1.17
RDU with Expo-Power   r           0.26      0.05                   0.16          0.36           -288.09
                      a           0.02      0.01      0.16        -0.01          0.04
                      γ           0.37      0.15                   0.07          0.67
                      μ           0.06      0.02                   0.02          0.11

(a) Empty cells are p-values that are less than 0.005.
(b) The null hypothesis here is that γ = 1.
Fig. 19. Estimated In-Sample and Out-of-Sample Utility. (Estimated from Responses of 63 Subjects over 60 Binary Choices. Assuming EUT CRRA Specification with Fechner Error. Data from Our Replication of Hey and Orme (1994): Choices over Prizes of $0, $5, $10, and $15. Point Prediction of Utility and 95% Confidence Intervals.) (A) In-Sample (Prizes in U.S. Dollars, $0-$20). (B) Out-of-Sample (Prizes in U.S. Dollars, $0-$250).
predicted utility values in Fig. 19. These predictions are from our replications of the Hey and Orme (1994) design, and the estimates for the EUT CRRA specification in Table 8. Fig. 19 displays predicted in-sample utility values and their 95% confidence interval using these estimates. Obviously the cardinal values on the vertical axis are arbitrary, but the main point is to see how relatively tight the confidence intervals are in relation to the changes in the utility numbers over the lottery prizes. Note the slight ''flare'' in the confidence interval in panel A of Fig. 19, as we start to modestly predict utility values beyond the top $15 prize used in estimation. Panel B extrapolates to provide predictions of out-of-sample utility values, up to $250, and their 95% confidence intervals. The widening confidence intervals are exactly what one expects from elementary econometrics. And these intervals would be even wider if we accounted for our uncertainty about whether this is the correct functional form, and whether we used the correct stochastic identifying assumptions. Moreover, the (Fechner) error specification used here allows for an extra element of imprecision when predicting what a subject would actually choose after evaluating the expected utility of the out-of-sample lotteries, and
this does not show up in Fig. 19 since we only use the point estimate of μ. The lesson here is that we have to be cautious when we make theoretical and empirical claims about risk attitudes. If the estimates displayed in panel A of Fig. 19 are to be used in the out-of-sample domain of panel B of Fig. 19, the extra uncertainty of prediction in that domain should be acknowledged. Cox and Sadiraj (2008) show why we want to make such predictions, for both EUT and non-EUT specifications; we review the methods that can be used to generate these data, and econometric methods to estimate utility functions; and Wilcox (2008a) shows how alternative stochastic assumptions can have strikingly different substantive implications for the estimation of out-of-sample risk attitudes.
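The fanning-out of the confidence interval in panel B of Fig. 19 can be illustrated with a back-of-the-envelope delta-method calculation using the CRRA point estimate and standard error from our replication in Table 8 (r = 0.53, standard error 0.05); this is a sketch, not the procedure used to draw the figure:

```python
R_HAT, SE_R = 0.53, 0.05   # CRRA estimate and standard error (Table 8)

def u(x, r):
    """CRRA utility of Eq. (1): u(x) = x**(1-r)/(1-r)."""
    return x ** (1 - r) / (1 - r)

def ci_width(x, h=1e-5):
    """Width of the 95% confidence interval for predicted utility at
    prize x by the delta method: var(u) is approximately
    (du/dr)**2 * var(r), with the derivative taken by central
    difference."""
    du_dr = (u(x, R_HAT + h) - u(x, R_HAT - h)) / (2 * h)
    return 2 * 1.96 * abs(du_dr) * SE_R

# Tight inside the estimation domain, fanning out on extrapolation:
width_in, width_out = ci_width(15.0), ci_width(250.0)
```

The interval at $250 is more than an order of magnitude wider than at $15, which is the elementary econometric point made in the text.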
4. OPEN AND CLOSED QUESTIONS

We briefly review some issues which are, in our view, wide open for research or long closed.
4.1. Hypothetical Bias

Top of the ''closed'' list for us is the issue of hypothetical bias. This was a prime focus of Holt and Laury (2002, 2005), and again in Laury and Holt (2008), and has been reviewed in detail by Harrison (2007). For some reason, however, many proponents of behavioral economics insist on using task responses that involve hypothetical choices. One simple explanation is that many of the earliest examples in behavioral economics came from psychologists, who did not use salient rewards to motivate subjects, and this tradition just persisted. Another explanation is that an influential survey by Camerer and Hogarth (1999) is widely mis-quoted as concluding that there is no evidence of hypothetical bias in such lottery choices. What Camerer and Hogarth (1999) actually conclude, quite clearly, is that the use of hypothetical rewards makes a difference to the choices observed, but that it does not generally change the inference that they draw about the validity of EUT.74 Since the latter typically involve paired comparisons of response rates in two lottery pairs (e.g., in common ratio tests), it is logically possible for there to be (i) differences in choice probabilities in a given lottery depending on whether one uses hypothetical or real responses,
and (ii) no difference between the effect of the EUT treatment on lottery pair response rates depending on whether one uses hypothetical or real responses. Furthermore, Camerer and Hogarth (1999) explicitly exclude from their analysis the mountain of data from experiments on valuation that show hypothetical bias.75 Their rationale for this exclusion was that economic theory did not provide any guidance as to which set of responses was valid. This is an odd rationale, since there is a well-articulated methodology in experimental economics that is quite precise about the motivational role of salient financial incentives (Smith, 1982). And the experimental literature has generally been careful to consider elicitation mechanisms that provide dominant strategy incentives for honest revelation of valuations, and indeed in most instances explain this to subjects since it is not being tested. Thus, economic theory clearly points to the real responses as having a stronger claim to represent true valuations. In any event, the mere fact that hypothetical and real valuations differ so much tells us that at least one of them is wrong! Thus, one does not actually need to identify one as reflecting true preferences, even if that is an easy task a priori, in order to recognize that there are systematic and large differences in behavior between hypothetical and real responses.
4.2. Sample Selection

This is a wide-open issue that experimental economists will have to confront systematically before other researchers from labor economics do so for them. It is likely to be a significant factor in many experiments, since randomization to treatment is fundamental to statistical control in the design of experiments. But randomization implies some uncertainty about treatment condition, and individuals differ in their preferences towards taking on risk. Since human subjects volunteer for experiments, it is possible that the sample observed in an experiment might be biased because of the risk inherent in randomization. In the extreme case, subjects in experiments might be those that are least averse to being exposed to risk. For many experiments of biological response this might not be expected to have any influence on measurement of treatment efficacy, but other laboratory, field and social experiments measure treatment efficacy in ways that could be directly affected by randomization bias.76 On the other hand, the practice in experimental economics is to offer subjects a fixed participation fee to encourage attendance. These
non-stochastic participation fees could offset the effects of randomization, by encouraging more risk-averse subjects to participate than might otherwise be the case. Thus, the term ''randomization bias,'' in the context of economics experiments, should be taken to mean the net effects from these two latent sample selection effects.77 There is indirect evidence for these sample selection effects within the laboratory. One can recruit subjects to an experiment, conduct a test of risk attitudes, and then allow subjects to sort themselves into a given task rewarded by fixed or performance-variable payments. Cadsby, Song, and Tapon (2007) and Dohmen and Falk (2006) did just this, and show that more risk-averse subjects select into tasks with fixed rewards rather than rewards that vary with uncertain performance, and suffer in terms of expected pay. Of course, they were happy to forego some expected income in return for reduced variance. But these results strongly suggest that there would be an effect from risk attitudes if one moved the sample selection process one step earlier to include the choice to participate in the experimental session itself.78 Harrison, Lau, and Rutström (2005c) undertake a field experiment and a laboratory experiment to directly test the hypothesis that risk attitudes play a role in sample selection.79 In both cases they followed standard procedures in the social sciences to recruit subjects. In their experiments the primary source of randomness had to do with the stochastic determination of final earnings, as explained below. They also employed random assignment to treatment in some experiments, but the general point applies whether the randomness is due to assignment to treatment or random determination of earnings, since the effect is the same on potential subjects.
Nevertheless, it is reasonable to suspect that members of most populations from which experimenters recruit participants hold beliefs that the benefits from participating are uncertain. All that is required for sample selection to introduce a bias in the risk attitude of the participants is an expectation of uncertainty, not an actual presence of uncertainty in the experimental task. In the field experiment it was possible to exploit the fact that the experimenter already knew certain characteristics of the population sampled, adults in Denmark in 2003, allowing a correction for sample selection bias using well-known methods from econometrics. The classic problem of sample selection refers to possible recruitment biases, such that the observed sample is generated by a process that depends on the nature of the experiment.80 In principle, there are two offsetting forces at work in this sample selection process, as mentioned above. The use of randomization
could attract subjects who are less risk averse than the population into experiments, if the subjects rationally anticipate the use of randomization.81 Conversely, the use of guaranteed financial remuneration for participation, common in experiments in economics, could encourage those that are more risk averse to participate. These field experiments therefore allowed an evaluation of the net effect of these opposing forces, which are intrinsic to any experiment in which subjects are voluntarily recruited with financial rewards. The results indicate that measured risk aversion is smaller after corrections for sample selection bias, consistent with the hypothesis that the use of a substantial, guaranteed show-up fee more than offsets any bias against attending an experiment that involved randomization. This effect is statistically significant. The results also suggest that there is no evidence that any sample selection that occurred influenced inferences about the effects of observed individual demographic characteristics on risk aversion. Harrison et al. (2005c) then conducted a laboratory experiment to complement the insights from their field experiment, and explore the conclusion that a larger gross sample selection effect might have been experienced due to randomization, but that the muted net sample selection effect actually observed was due to ''lucky'' choices of participation fees. The field design used the same fixed recruitment fee for all subjects, to ensure comparability of subjects in terms of the behavioral task. In the laboratory experiments this fixed recruitment fee was exogenously varied. If the level of the fixed fee affects the risk attitudes of the sample that choose to participate in the experiment, at least over the amounts they consider, one should then be able to directly see different risk attitudes in the sample. As expected a priori, they observed samples that were more risk averse when a higher fixed participation fee was used.
In another treatment in the laboratory experiments they varied only the range of the prizes possible in the task, keeping the fixed participation fee constant, but announcing these ranges at the time of recruitment. In this case, they observed samples that were more risk averse when the range of prizes was widened, compared to the control. Hence, the level of the fixed recruitment fee and information on the range of prizes in the experiment had a direct influence on the composition of the sample in terms of individual risk attitudes. The implication is that experimental economists should pay much more attention to the process that leads subjects to participate in the experiment if they are to draw reliable inferences in any setting in which risk attitudes play a role. This is true whether one conducts experiments in the laboratory or the field.82
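The selection mechanism at work here can be mimicked with a deterministic toy model: subjects attend if the show-up fee plus the certainty equivalent of the stochastic session earnings beats an outside option, so a higher fee pulls in more risk-averse subjects. All numbers below are hypothetical:

```python
import math

def certainty_equivalent(lottery, r):
    """Certainty equivalent of a lottery under CRRA; the r = 1 case
    uses the logarithmic form."""
    if abs(r - 1.0) < 1e-9:
        return math.exp(sum(p * math.log(x) for p, x in lottery))
    m = sum(p * x ** (1 - r) for p, x in lottery)
    return m ** (1.0 / (1 - r))

# Hypothetical stochastic session earnings and outside option:
session_lottery = [(0.5, 10.0), (0.5, 30.0)]
OUTSIDE_OPTION = 20.0

# Population of CRRA coefficients from 0 (risk neutral) to 3:
r_grid = [i * 0.05 for i in range(61)]

def sample_mean_r(fee):
    """Mean risk aversion among those who choose to attend: a subject
    attends iff fee + CE(session earnings) >= outside option."""
    attendees = [r for r in r_grid
                 if fee + certainty_equivalent(session_lottery, r)
                 >= OUTSIDE_OPTION]
    return sum(attendees) / len(attendees)

# Raising the guaranteed fee pulls in subjects with higher r, so the
# observed sample is more risk averse than under the lower fee.
```

Because the certainty equivalent falls as r rises, the marginal attendee at a higher fee is more risk averse, which is the qualitative pattern the laboratory treatments detected.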
A closely related issue is what role risk attitudes may play in affecting subjects' participation choices over different institutions or cohorts when such choices are allowed.83 It is common in the experimental literature to study behavior in two or more institutions imposed exogenously on subjects, or to put subjects together exogenously. But in the naturally occurring world that our experiments are modeling, people choose institutions to some degree, and choose who to interact with to some degree. The effect of treatments may be completely different when people have some ability to select into them, or some ability to choose the cohorts to participate with, compared to the standard experimental paradigm. In effect, the experiment just has to be widened to include these processes of selection, if appropriate for the behavior under study. The broader experimental literature now identifies many possible mechanisms for this process, such as migration from one region to another in which local public policies exhibit differences (Ehrhart & Keser, 1999; Page, Putterman, & Unel, 2005; Gürerk, Irlenbusch, & Rockenbach, 2006), voting in an explicit social choice setting (Botelho, Harrison, Pinto, & Rutström, 2005a; Ertan, Page, & Putterman, 2005; Sutter, Haigner, & Kocher, 2006), lobbying for policies (Bullock & Rutström, 2007), and even the evolution of social norms of conduct (Falk, Fehr, & Fischbacher, 2005). Each of these processes will interact with the risk attitudes of subjects.
4.3. Extending Lab Procedures to the Field

One of the main attractions of experimental methods is the control they provide over factors that could influence behavior. The ability to control the environment allows the researcher to study the effects of treatments in isolation, and hence makes it easier to draw inferences as to what is influencing behavior. In most cases we are interested in making inferences about field behavior. We hypothesize that there is a danger that the imposition of an exogenous laboratory control might make it harder, in some settings, to make reliable inferences about field behavior. The reason is that the experimenter might not understand something about the factor being controlled, and might impose it in a way that is inconsistent with the way it arises naturally in the field, and that affects behavior. Harrison et al. (2007c) take as a case study the elicitation of measures of risk aversion in the field. In the traditional paradigm, risk aversion is viewed in terms of diminishing marginal utility of the final prize in some abstract lottery. The concept of a lottery here is just a metaphor for a real lottery,
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
although in practice the metaphor has been used as the primary vehicle for laboratory elicitation of risk attitudes. In general there is some commodity x and various levels i of x, denoted x_i, that depend on some state of nature which occurs with a probability p_i that is known to the individual whose preferences are being elicited. Thus, the lottery is defined by {x_i; p_i}. Traditional measures of risk aversion under EUT are then defined in terms of the curvature of the utility function with respect to x. Now consider the evaluation of risk attitudes in the field. This generally entails more than just ‘‘leaving the classroom’’ and recruiting outside of a university setting, as emphasized by Harrison and List (2004). In terms of sample composition, it means finding subjects who deal with that type of uncertainty to varying degrees, and trying to measure the extent of their field experience with uncertainty. Moreover, it means developing stimuli that more closely match those that the subjects have previously experienced, so that they can use whatever heuristics they have developed for that commodity when making their choices. Finally, it means developing ways of communicating probabilities that correspond with language that the subjects are familiar with. Thus, field experimentation in this case, and in general, involves several simultaneous changes from the lab setting with respect to subject recruitment and the development of stimuli that match the field setting. Apart from sample and task selection issues, a second theme that is important to the relevance of lab findings to field inferences is the influence of ‘‘background risk’’ on the attitudes towards a specific ‘‘foreground risk’’ that is the focus of the elicitation task. In many field settings it is not possible to artificially identify attitudes towards one risk source without worrying about how the subjects view that risk as being correlated with other risks.
For example, mortality risks from alternative occupations tend to be highly correlated with morbidity risks. It is implausible to ask subjects their attitude toward one risk without some coherent explanation as to why a higher or lower level of that risk would not be associated with a higher or lower risk of the other. Apart from situations where risks may be correlated, ‘‘background risk’’ can have an influence on elicited risk attitudes also when it is independent of the ‘‘foreground risk.’’ The theoretical literature has also yielded a set of preferences that guarantee that the addition of an unfair background risk to wealth reduces the certainty equivalent (CE) of any other independent risk. That is, the addition of background risk of this type makes risk-averse individuals behave in a more risk-averse way with respect to any other independent risk. Gollier and Pratt (1996) refer to this type of behavior as ‘‘risk vulnerability,’’ and show
Risk Aversion in the Laboratory
that all weakly Decreasing Absolute Risk Averse utility functions are risk vulnerable. This class includes many popular characterizations of risk attitudes, such as CARA and CRRA. Eeckhoudt, Gollier, and Schlesinger (1996) extend these results by providing the necessary and sufficient conditions on the characterization of risk aversion to ensure that any increase in background risk induces more risk aversion. The field experiment in Harrison et al. (2007c) is designed to analyze such situations of independent multiple risks. They compare using monetary prizes to using prizes whose values involve some risk, and conclude that the risk attitudes elicited are not the same in the two circumstances. The prizes are collector coins and the subjects are numismatists. They find that the subjects are generally more risk averse over the prizes when the latter involve additional, and independent, risk.84 These results are consistent with the available theory from conventional EUT for the effects of background risk on attitudes to risk. Thus, applying risk preferences that have been elicited in the lab to field settings with background risks may underestimate the extent to which decisions will reflect risk aversion. In addition, eliciting risk attitudes in a natural field setting with natural tasks and non-monetary prizes requires one to consider the nature and degree of background risk, since it is inappropriate to ignore it.85 A further virtue of extending lab procedures to the field, therefore, is to encourage richer lab designs by forcing the analyst to account for realistic features of the natural environment that have been placed aside. In virtually any market with asymmetric information, whether it is a coins market, an open-air market, or a stock exchange, a central issue is the quality of the object being traded. This issue, and attendant uncertainty, arises naturally.
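The risk vulnerability result can be illustrated with a small numerical sketch. The parameters below are invented for illustration, not taken from Harrison et al. (2007c): a CRRA decision maker with relative risk aversion r = 2 (weakly DARA, hence risk vulnerable), a 50/50 gain or loss of $5 as the foreground lottery at wealth 50, and an independent 50/50 gain or loss of $20 as a zero-mean background risk. The certainty equivalent of the foreground risk falls when the background risk is added:

```python
import math

def u(x, r=2.0):
    # CRRA utility; r = 2 is weakly DARA and hence risk vulnerable
    return math.log(x) if r == 1.0 else x ** (1.0 - r) / (1.0 - r)

def ce_of_foreground(w, fg, bg):
    """Certainty equivalent c of foreground lottery fg, solving
    E[u(w + Y + c)] = E[u(w + Y + X)] for independent background Y
    (all states equally likely). Found by bisection, since E[u] is
    increasing in c."""
    target = sum(u(w + y + x) for y in bg for x in fg) / (len(bg) * len(fg))
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(u(w + y + mid) for y in bg) / len(bg) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

w = 50.0
foreground = [-5.0, 5.0]      # 50/50 zero-mean foreground risk
no_background = [0.0]
background = [-20.0, 20.0]    # independent 50/50 zero-mean background risk

ce_alone = ce_of_foreground(w, foreground, no_background)
ce_with_bg = ce_of_foreground(w, foreground, background)
print(ce_alone, ce_with_bg)   # the CE falls once background risk is added
```

The CE of the foreground risk is about −0.5 on its own and lower still with the background risk in place, which is exactly the ‘‘behave in a more risk-averse way’’ prediction.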
In many markets, the grade of the object, or professional certification of the seller, is one of the critical variables determining price. Thus, one could scarcely design a test of foreground risk in these markets without attending to the background risk. Harrison et al. (2007c) exploit the fact that such risks can be exogenously controlled in these settings, and in a manner consistent with the predictions of theory.86 In a complementary manner, Fiore, Harrison, Hughes, and Rutström (2007) consider how one can use simulation tools to represent ‘‘naturally occurring probabilities.’’ As one moves away from the artifactual controls of the laboratory, distributions of outcomes are not always discrete, and probabilities are not given from outside. They are instead estimated as the result of some process that the subject perceives. One approach to modeling such naturally occurring probabilities in experiments is to write out a numerical simulation model that represents the physical process
that stochastically generates the outcome as a function of certain inputs, render that process to subjects in a natural manner using tools of Virtual Reality, and study how behavior changes as one changes the inputs. For example, the probability that a wildfire will burn down a property ‘‘owned’’ by the subject might depend on the location of the property, the vegetation surrounding it, the location of the start of the wildfire, weather conditions, and interventions that the subject can choose to pay for to reduce the spread of a wildfire (e.g., prescribed burning). This probability can be simulated using a model such as FARSITE, developed by the U.S. Forest Service (Finney, 1998) to predict precisely these events. Thus, the subject sees a realistic rendering of the process generating a distribution over the binary outcome, ‘‘my property burns down or not.’’ By studying how subjects react to this process, one can better approximate the manner in which risk attitudes affect decisions in naturally occurring environments.
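The idea of a probability that is estimated from a perceived process, rather than given from outside, can be mimicked with a toy simulation. The logistic ‘‘burn probability’’ below and all of its parameter values are invented stand-ins for a genuine process model such as FARSITE; the sketch simply draws repeated simulated outcomes of ‘‘property burns or not’’ and estimates the implied probability by relative frequency:

```python
import math
import random

def burn_probability(distance_km, mitigation):
    # hypothetical logistic model: a more distant ignition point and paying
    # for prescribed burning both reduce the chance the property burns
    z = 1.5 - 0.4 * distance_km - (2.0 if mitigation else 0.0)
    return 1.0 / (1.0 + math.exp(-z))

def simulate(distance_km, mitigation, draws=20000, seed=0):
    # relative-frequency estimate of the probability, as a subject watching
    # repeated simulated outcomes might form it
    rng = random.Random(seed)
    p = burn_probability(distance_km, mitigation)
    burns = sum(rng.random() < p for _ in range(draws))
    return burns / draws

p_hat_base = simulate(distance_km=2.0, mitigation=False)
p_hat_mitigated = simulate(distance_km=2.0, mitigation=True)
print(p_hat_base, p_hat_mitigated)
```

Varying the inputs (distance, mitigation) and re-estimating traces out how the subject's perceived probability should respond to the treatment, which is the comparative static of interest in such designs.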
5. CONCLUSION

At a substantive level, the most important conclusion is that the average subject is moderately risk averse, but there is evidence of considerable individual heterogeneity in risk attitudes in the laboratory. This heterogeneity is in evidence within given elicitation formats, so it cannot be ascribed to differences in elicitation formats. The range of risk attitudes is modest, however, and there is relatively little evidence of risk-loving behavior. The temptation to talk about a ‘‘central tendency’’ of ‘‘slight risk aversion’’ does not fit well with the bi-modal nature of the responses observed in several studies: a large fraction of subjects is well characterized as being close to risk neutral, or very slightly risk averse, and another large fraction as being quite risk averse.

At a methodological level, the evidence suggests some caution in expecting different elicitation formats to generate comparable data on risk attitudes. Both the framing of the questions and the implied incentives differ across instruments and may affect responses. One would expect the MPL and RLP procedures to generate comparable results, since they are so similar from a behavioral perspective, and they do. The OLS instrument is very portable in the field, has transparent incentives for truthful responses, and is easy to administer in all environments, so more work comparing its performance to the MPL and RLP instruments would be valuable. It suffers from not being able to provide a rich characterization of behavior when
allowances are made for probability weighting, but that may be mitigated with extensions to consider probabilities other than 1/2. In the Epilogue to a book-length review of the economics of risk and time, Gollier (2001; p. 424ff.) writes that

It is quite surprising and disappointing to me that almost 40 years after the establishment of the concept of risk aversion by Pratt and Arrow, our profession has not yet been able to attain a consensus about the measurement of risk aversion. Without such a consensus, there is no hope to quantify optimal portfolios, efficient public risk prevention policies, optimal insurance deductibles, and so on. It is vital that we put more effort on research aimed at refining our knowledge about risk aversion. For unclear reasons, this line of research is not in fashion these days, and it is a shame.
The most important conclusion we draw from our survey is that reliable laboratory methods exist to determine an individual subject’s aversion to risk, or to characterize the distribution of risk attitudes of a specific sample. These methods can now be systematically employed to ensure greater control over tests and applications of theory that depend on risk attitudes.
NOTES

1. For example, in virtually all experimental studies of non-cooperative bargaining behavior. A particularly striking example is provided by Ochs and Roth (1989), since Roth and Malouf (1979) pioneered the use of experimental procedures to induce risk neutral behavior in cooperative bargaining settings.
2. For example, in virtually all experimental studies of bidding behavior in first-price auctions, whether in private values settings (Cox et al., 1982) or common values settings (Kagel & Levin, 2002).
3. For example, the experimental literature on bidding behavior in first-price sealed bid auctions relies on predictions that are conditioned on the subjects following some Nash Equilibrium strategy as well as being characterized by risk in some way. Overbidding in comparison to the risk-neutral prediction could be due to failure of either the assumption of Nash bidding or the failure of the assumption of risk neutrality (Section 3.6). Harrison (1990) and Cox, Smith, and Walker (1985) attempt to tease these two possibilities apart using different designs.
4. We do not consider experimental designs that attempt to control for risk, or induce specific risk attitudes. Our general focus is on direct estimation of risk attitudes where rewards are real and there is some presumption that the procedure is incentive compatible. There is a huge, older literature on the elicitation of utility, but virtually none of it is concerned with incentive compatibility of elicitation, which we take as central. Great reviews include Fishburn (1967) and Farquhar (1984). Many components of the procedures we consider can be viewed as building on methods developed in this older literature. Biases in utility elicitation procedures are reviewed by Hershey et al. (1982), although again there is no discussion at all of incentive compatibility or hypothetical rewards bias.
5. Birnbaum (2004) illustrates the type of systematic comparison of representations that ought to be built in for broader research programs. He considers various representations of probability in terms of text, pie charts, natural frequencies, and alignments of equally likely consequences, as well as minor variants within each type of representation. One reason for this focus is his concern with violations of stochastic dominance, which is an elemental behavioral property of decisions, and presumed to be directly affected by task representation. In brief, he finds little effect on the extent of stochastic dominance of these alternative representations. That conclusion is limited to the hypotheses he considers, of course; there could still be an effect on structural estimates of underlying models, and other hypotheses derived from those estimates. Unfortunately, the procedures he uses, still common in the psychology and decision-making literature, employ hypothetical or near-hypothetical rewards, so the decisions subjects make are not salient. Wakker, Erev, and Weber (1994) consider four types of representations, shown in Appendix A, in salient choices, but provide no evaluation of the effects of the alternatives.
6. There is an interesting question as to whether they should be provided. Arguably some subjects are trying to calculate them anyway, so providing them avoids a test of the joint hypothesis that ‘‘the subjects can calculate EV in their heads and will not accept a fair actuarial bet.’’ On the other hand, providing them may cue the subjects to adopt risk-neutral choices. The effect of providing EV information deserves empirical study.
7. The last row does have the advantage of helping subjects see that they should obviously switch to option B by the last row, and hence seeing the ordered nature of the overall instrument.
Arguably it would be useful to add a row 0 in which the lower prize for options A and B were obtained with certainty, to help the subject see that they should always choose A at the top and B at the bottom, and the only issue is where they should switch.
8. Schubert et al. (1999) present their method as the elicitation of a certainty-equivalent, but do not say clearly how they elicited the certainty-equivalent. In fact (Renate Schubert, personal communication) their procedures represent an early application of the MPL idea. Each subject was asked to choose between two lotteries, where one lottery was the risky one and the other was a degenerate, non-stochastic one. They asked subjects 98 binary choice questions, spanning 8 risky lotteries. These were arrayed in an ordered fashion on 98 separate sheets. The responses could then be ordered in increasing values for the non-stochastic lottery, and a ‘‘switch point’’ determined to identify the certainty-equivalent.
9. If the subject always chooses A, or indicates indifference for any of the decision rows, there are no additional decisions required and the task is completed.
10. Let the first stage of the iMPL be called Level 1, the second stage Level 2, and so on. After making all responses, the subject has one row from the first table of responses in Level 1 selected at random by the experimenter. In the MPL and sMPL procedures, that is all there is since there is only a Level 1 table. In the iMPL, that is all there is if the row selected at random by the experimenter is not the one at which the subject switched in Level 1. If it is the row at which the subject switched, another random draw is made to pick a row in the Level 2 table. For some tasks this procedure is repeated to Level 3.
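The expected-value logic behind the ordered MPL design in notes 6 and 7 can be sketched numerically. The prizes below are the illustrative ones from note 16 ($20/$16 for the safe option, $38.50/$1 for the risky one), with the usual ten rows in which the probability of the high prize rises from 0.1 to 1.0; the sketch finds the row at which a hypothetical risk-neutral subject should switch from safe to risky:

```python
# MPL rows: probability of the high prize rises from 0.1 to 1.0 by row
safe_hi, safe_lo = 20.00, 16.00
risky_hi, risky_lo = 38.50, 1.00

rows = []
for row in range(1, 11):
    p = row / 10.0
    ev_safe = p * safe_hi + (1 - p) * safe_lo
    ev_risky = p * risky_hi + (1 - p) * risky_lo
    rows.append((row, ev_safe, ev_risky))

# first row where the risky option has the higher expected value
switch_row = next(row for row, s, r in rows if r > s)
print(switch_row)  # a risk-neutral subject chooses safe in rows 1-4, risky from row 5
```

At row 4 the expected values are $17.60 versus $16.00, and at row 5 they are $18.00 versus $19.75, which is where the $1.60 and $1.75 forgone amounts in note 16 come from.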
11. In our experience subjects are suspicious of randomization generated by computers. Given the propensity of many experimenters in other disciplines to engage in deception, we avoid computer randomization whenever feasible.
12. Dave et al. (2007) draw similar conclusions, and include an explicit comparison in the field with the Holt and Laury (2002) MPL instrument. They also collect information on the cognitive abilities of subjects, to better identify the sources of any differences in behavior.
13. Millner, Pratt, and Reilly (1988) offered some important, critical observations on the design and analysis proposed by Harrison (1986). There is possible contamination from intra-session experimental earnings if the subject is paid for each selling price elicited, but this issue is common to all of the methods. One could either assume these wealth effects away (Harrison, 1986; Kachelmeier & Shehata, 1992), test for them (McKee, 1989), or pay subjects for just one of the stages. The last of these options is now the standard method when applying BDM, but raises the same issues with the validity of the random lottery incentive mechanism that have been discussed for other procedures (see Section 3.8).
14. One must also ensure that the buyout range exceeds the highest price that the subject would reasonably state, but this is not a major problem.
15. The same ‘‘payoff dominance problem’’ applies to first-price auctions, as noted by Harrison (1989). Hence, both of the institutions used by Isaac and James (2000) to infer risk attitudes are blighted with this problem. The same problem applies to two of the three institutions studied by Berg, Dickhaut, and McCabe (2005). Their third institution, the English auction, is known to have more reliable behavioral incentives for truthful responses (Harstad, 2000; Rutström, 1998).
16. Assume a risk-neutral subject facing an MPL with prizes $20 and $16 for the safe lottery and $38.50 and $1 for the risky one.
Such a subject should choose the safe lottery for rows 1 through 4 and then switch to the risky one. Not doing so would result in an expected earnings loss. For example, if he erroneously responds as if he is slightly risk loving by choosing the risky lottery already on row 4 he is forgoing $1.60, and if he erroneously responds as if he is slightly risk averse by still choosing the safe lottery on row 5, he is forgoing $1.75. Since the chances are 1 in 10 that the row with his erroneous choice is picked, his expected foregone earnings are about 16 to 17.5 cents. If he instead were asked to state his minimum WTA for each of the lotteries in a BDM, his true WTA when the probabilities correspond to those given in row 5 of the MPL (i.e., 50/50) would be $18 for the safe and $19.75 for the risky lottery. We can then calculate the expected loss from different misrepresentations of his preferences in ways that are comparable to those calculated for the MPL. To find the expected loss from representing his preferences as if they were defined over the safe MPL lottery given on row 4 we simply calculate the maximum WTA for the safe lottery on row 4 as $17.60. If this is his stated WTA he will experience a loss if the BDM selling price is between this report and his true WTA ($18). The likelihood of this is obviously quite small. On the other hand, the expected loss of a similarly erroneous report for the risky lottery would involve a report of $16 for a true maximum WTA of $19.75, a much stronger incentive. Again, the likelihood of this loss is the likelihood of the BDM selling price falling in between the stated and the true WTA. This likelihood is a function of the range of the buying prices used in
the particular implementation of the BDM. The narrower the range the higher is this likelihood. It is clear from this numeric example that the incentive properties of the BDM are much worse than those for the MPL for the safe lottery, but quite a bit better for the risky lottery. One problem with the BDM is that for a risk-loving subject who would state a high WTA for the lottery the probability of the BDM drawing a number higher than or equal to his WTA is low. Thus, the incentives for precision are low for such a subject.
17. Millner et al. (1988; p. 318) suggest that one should develop methods for identifying inconsistent responses, although they would agree that the original checks built in by BDM have some flaws, since the lotteries offered to subjects at later stages depend on earlier elicited selling prices. This sounds attractive in the abstract, but we caution against the use of mechanical rules for classifying subjects as inconsistent. For example, erratic responses could just be a reflection that the subject rationally perceives the absence of a strong incentive to respond truthfully. Classifying such subjects as inconsistent is inappropriate.
18. The former asks the subject to state a certain amount that makes them indifferent to the lottery, similar to what is done in the BDM, and the latter asks the subject to state some probability in the lottery that makes them indifferent to some fixed and certain amount, similar to what is done in the OLS. The latter method presumes that there are only two outcomes, and hence one probability.
19. Abdellaoui (2000) did introduce the use of a bisection method for establishing indifference in each stage that might mitigate some strategic concerns. The idea is to only allow subjects to pick one of two given lotteries, and not to state the indifference lottery directly.
By starting this process at some a priori extreme pair, one can iterate down to the point of indifference using a conventional bisection search algorithm. In this instance the chaining strategy is limited to always picking the lottery with the highest possible prize. This method was also used by Kuilen, Wakker, and Zou (2007), and has the advantage of limiting the financial exposure of the experimenter to known bounds. Of course, subjects might not adopt the chaining strategy in the logically extreme form, perhaps to avoid being dismissed from the experiment or not being invited back again, but still be generating strategically biased responses.
20. The TO method has also been extended by Attema, Bleichrodt, Rohde, and Wakker (2006) to elicit discount rates. The same incentive compatibility problems apply, only hypothetical experiments are conducted, and there is no discussion of the problems of incentivizing responses.
21. We use the term ‘‘risk attitudes’’ here in the broader sense of including possible effects from non-linear utility functions, probability weighting and loss aversion.
22. Andersen et al. (2008a) and Section 3.4 discuss the elicitation of risk preferences and time preferences, and the need for joint estimation of all parameters. The basic idea is that the discount rate involves the present value of utility streams, and not money streams, so one needs to know the concavity of the utility function to infer discount rates. In effect, the TCN procedure assumes risk neutrality when inferring discount rates, which will lead to overestimates of discount rates between utility flows.
23. We consider the use of such interval bounds for estimation in Section 2.1. Having some bounds that span a finite number and infinity does not pose problems for
the ‘‘interval regression’’ methods widely available, although it does correctly lead to larger standard errors than collapsing this interval to just the lower bound.
24. Some subjects switched several times, but the minimum switch point is always well defined. It turns out not to make much difference how one handles these ‘‘multiple switch’’ subjects, but our analysis and the analysis of HL considers the effect of allowing for them in different ways explained below.
25. HL find that there is a significant sex effect in the low-payoff conditions, with women being more risk averse, and no effect in the high payoff conditions. We replicate this conclusion using their procedures and data. Unfortunately, the low-payoff sex effect does not hold if one controls for the other characteristics of the subject and uses a negative binomial regression model to handle the discrete nature of the dependent variable. HL also report that there is a significant Hispanic effect, with Hispanic subjects making fewer risk-averse choices in high payoff conditions. We confirm this conclusion, using their procedures as well as when one uses all covariates in a negative binomial regression.
26. A subject that switched from option A to option B after five safe choices, then switched back for one more option A before choosing all B’s in the remaining rows, would therefore have revealed a CRRA interval between 0.15 and 0.97. Such subjects simply provide less precise information than subjects that switch once.
27. Our treatment of indifferent responses uses the specification developed by Papke and Wooldridge (1996; Eq. 5, p. 621) for fractional dependent variables. Alternatively, one could follow Hey and Orme (1994; p. 1302) and introduce a new parameter τ to capture the idea that certain subjects state indifference when the latent index showing how much they prefer one lottery over another falls below some threshold τ in absolute value.
This is a natural assumption to make, particularly for the experiments they ran in which the subjects were told that expressions of indifference would be resolved by the experimenter, but not told how the experimenter would do that (p. 1295, footnote 4). It adds one more parameter to estimate, but for good cause.
28. Clustering commonly arises in national field surveys from the fact that physically proximate households are often sampled to save time and money, but it can also arise from more homely sampling procedures. For example, Williams (2000; p. 645) notes that it could arise from dental studies that ‘‘collect data on each tooth surface for each of several teeth from a set of patients’’ or ‘‘repeated measurements or recurrent events observed on the same person.’’ The procedures for allowing for clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the ‘‘generalized estimating equations’’ approach to panel estimation in epidemiology (see Liang & Zeger, 1986), and generalize the ‘‘robust standard errors’’ approach popular in econometrics (see Rogers, 1993). Wooldridge (2003) reviews some issues in the use of clustering for panel effects, noting that significant inferential problems may arise with small numbers of panels.
29. Age was imputed as 20 for all subjects in the undergraduate class experiments conducted at the University of New Mexico, based on personal knowledge of the experimenters of the age distribution in those classes (Kate Krause, personal communication). Given the variation in age for non-student adults, this imputation is less likely to be a major factor compared to studies that only use student subjects.
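The CRRA intervals referred to in note 26 can be recovered numerically. A minimal sketch, assuming the standard Holt and Laury (2002) prizes of $2.00/$1.60 (safe) and $3.85/$0.10 (risky) and CRRA utility u(x) = x^(1−r)/(1−r): the r that makes a subject indifferent at a given row bounds the interval implied by a switch at that row.

```python
import math

def crra(x, r):
    # CRRA utility u(x) = x^(1 - r) / (1 - r), with the usual log form at r = 1
    return math.log(x) if abs(r - 1.0) < 1e-12 else x ** (1.0 - r) / (1.0 - r)

def eu_diff(r, p):
    # EU(safe) - EU(risky) when the high prize occurs with probability p
    safe = p * crra(2.00, r) + (1 - p) * crra(1.60, r)
    risky = p * crra(3.85, r) + (1 - p) * crra(0.10, r)
    return safe - risky

def indifference_r(p, lo=-3.0, hi=3.0):
    # eu_diff is negative for strongly risk-loving r and positive for strongly
    # risk-averse r, so bisection finds the indifference point
    for _ in range(200):
        mid = (lo + hi) / 2
        if eu_diff(mid, p) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# indifference at row 5 (p = 0.5) gives the upper bound of the interval
# revealed by four safe choices
print(indifference_r(0.5))  # close to the 0.15 boundary in the HL table
```

Running the same calculation at p = 0.4 gives a negative boundary, so the interval for four safe choices straddles risk neutrality, consistent with the HL classification.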
30. That is, we treat the prizes here as gains measured as the net gain after deducting losses from the endowment. This analysis still allows for a framing effect, of course.
31. See Harless and Camerer (1994), Hey and Orme (1994) and Loomes and Sugden (1995) for the first wave of empirical studies including some formal stochastic specification in the version of EUT tested. There are several species of ‘‘errors’’ in use, reviewed by Hey (1995, 2002), Loomes and Sugden (1995), Ballinger and Wilcox (1997), Loomes, Moffatt, and Sugden (2002) and Wilcox (2008a). Some place the error at the final choice between one lottery or the other after the subject has decided deterministically which one has the higher expected utility; some place the error earlier, on the comparison of preferences leading to the choice; and some place the error even earlier, on the determination of the expected utility of each lottery.
32. This ends up being simple to formalize, but involves some extra steps in the economics. Let EU_R and EU_L denote the expected utility of lotteries R and L, respectively. If we ignore indifference, and the subject does not make mistakes, then R is chosen if EU_R − EU_L > 0, and otherwise L is chosen. If the subject makes measurement errors, denoted by ε, then the decision is made on the basis of the value of EU_R − EU_L + ε. That is, R is chosen if EU_R − EU_L + ε > 0, and otherwise L is chosen. If ε is random then the probability that R is chosen = P(EU_R − EU_L + ε > 0) = P(ε > −(EU_R − EU_L)). Now suppose that ε is normally distributed with mean 0 and standard deviation σ; then it follows that Z = ε/σ is normally distributed with mean 0 and standard deviation 1: in other words, Z has a unit normal distribution. Hence, the probability that R is chosen is P(ε > −(EU_R − EU_L)) = P(ε/σ > −(EU_R − EU_L)/σ). If Φ(·) denotes the cumulative standard normal distribution, it follows that the probability that R is chosen is 1 − Φ(−(EU_R − EU_L)/σ) = Φ((EU_R − EU_L)/σ), since the distribution is symmetrical about 0.
Hence, the probability that L is chosen is given by Φ(−(EU_R − EU_L)/σ) = 1 − Φ((EU_R − EU_L)/σ). If we denote by y the decision of the subject, with y = 1 indicating that R was chosen and y = −1 indicating that L was chosen, then the likelihood is Φ((EU_R − EU_L)/σ) if y = 1 and 1 − Φ((EU_R − EU_L)/σ) if y = −1.
33. We also correct for clustering, since it is the right thing to do statistically, but this again makes no essential difference to the estimates.
34. The instructions were brief: ‘‘Your decision sheet shows 8 options listed on the left. You should choose one of these options, which will then be played out for you. If the coin toss is a Heads you will receive the amount listed in the second column. If the coin toss is a Tail you will receive the amount listed in the third column.’’ The transparency of the OLS procedure is apparent, and derives from only using probabilities of 1/2.
35. The secondary purpose of this design is to allow statistical examination of the hypothesis that subjects use ‘‘similarity relations’’ and ‘‘editing processes’’ to evaluate lotteries when prizes and probabilities are not pre-rounded, as in Hey and Orme (1994).
36. The use of the noise parameter μ in Eq. (8) is also familiar from the numerical literature on the smoothing of accept–reject simulators in discrete choice statistical modeling: see Train (2003; p. 125ff.), for example. This connection also reminds us that the use of specific linking functions such as logit or probit have a certain
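The probit specification in note 32 is easy to compute directly. A minimal sketch, with made-up expected utilities and noise scale σ, evaluating the probability that R is chosen as Φ((EU_R − EU_L)/σ) via the error-function form of the normal CDF:

```python
import math

def normal_cdf(z):
    # standard normal CDF, written in terms of the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_choose_R(eu_R, eu_L, sigma):
    # probability that R is chosen under the probit specification of note 32;
    # the probability that L is chosen is the complement
    return normal_cdf((eu_R - eu_L) / sigma)

# when the lotteries have equal expected utility the choice is a coin flip
print(prob_choose_R(1.0, 1.0, sigma=0.5))  # 0.5
# a larger EU advantage for R, or a smaller sigma, pushes the probability to 1
print(prob_choose_R(1.2, 1.0, sigma=0.1))
```

The symmetry of the normal distribution is what makes the likelihood contributions for y = 1 and y = −1 add to one, as in the note.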
arbitrariness to them, but embody implicit behavioral assumptions about responsiveness to latent indices.
37. A more complete statistical analysis would consider two factors: the effect of information about earnings in the prior procedure, and a more elaborate likelihood function that recognized that these are in-sample responses. Our estimates ignore both factors. It would also be useful to examine the experimental data from Engle-Warnick et al. (2006) using inferential methods such as ours, since their design used exactly the same lotteries in the RLP and OLS instruments. Dave et al. (2007) provide careful tests of the MPL and OLS instruments, concluding that the OLS instrument provides a more reliable measuring rod for risk attitudes in samples drawn from populations expected to have limited cognitive abilities.
38. Hirshleifer and Riley (1992) and Chambers and Quiggin (2000) demonstrate the elegant and powerful representations of decision-making under uncertainty that derive from adopting a state-contingent approach instead of popular alternatives.
39. Many of these claims involve evidence from between-sample designs, and rely on the assumption that sample sizes are large enough for randomization to ensure that between-sample differences in preferences (even if they are not state-contingent) are irrelevant. For two careful examples, see Conlisk (1989) and Cubitt et al. (1998a). There is also a rich literature on the contextual role of extreme lotteries, such that one often observes different behavior for ‘‘interior lotteries’’ that assign positive probability to all prizes as compared to ‘‘corner-solution lotteries’’ that assign zero weight to some prizes.
40. Stigler and Becker (1977; p. 76) note the nature of the impasse: ‘‘an explanation of economic phenomena that reaches a difference in tastes between people or times is the terminus of the argument: the problem is abandoned at this point to whoever studies and explains tastes (psychologists? anthropologists?
phrenologists? socio-biologists?).’’
41. Camerer (2005; p. 130) provides a useful reminder that ‘‘Any economics teacher who uses the St. Petersburg paradox as a ‘‘proof’’ that utility is concave (and gives students a low grade for not agreeing) is confusing the sufficiency of an explanation for its necessity.’’
42. Of course, many others recognized the basic point that the distribution of outcomes mattered for choice in some holistic sense. Allais (1979; p. 54) was quite clear about this, in a translation of his original 1952 article in French. Similarly, in psychology it is easy to find citations to kindred work in the 1960s and 1970s by Lichtenstein, Coombs and Payne, inter alios.
43. There are some well-known limitations of the probability weighting function in Eq. (9). It does not allow independent specification of location and curvature; it has a crossover point at p = 1/e ≈ 0.37 for γ < 1 and at p = 1 − 0.37 = 0.63 for γ > 1; and it is not increasing in p for small values of γ. Prelec (1998) and Rieger and Wang (2006) offer two-parameter probability weighting functions that exhibit more flexibility than Eq. (9), but for our expository purposes the standard probability weighting function is adequate.
44. In this case, because each lottery only consists of two outcomes, the ‘‘rank dependence’’ of the RDU model does not play a distinctive role, but it will in later applications.
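The limitations listed in note 43 can be checked numerically, assuming Eq. (9) is the one-parameter weighting function of Tversky and Kahneman (1992), w(p) = p^γ/(p^γ + (1 − p)^γ)^(1/γ); the γ values below are illustrative, with 0.61 being a commonly cited estimate:

```python
def tk_weight(p, gamma):
    # one-parameter probability weighting function attributed to
    # Tversky and Kahneman (1992); an assumption for Eq. (9)
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1.0 / gamma)

# inverse-S shape for a moderate gamma: small probabilities are overweighted,
# large probabilities underweighted
print(tk_weight(0.1, 0.61), tk_weight(0.9, 0.61))

# for small gamma the function is no longer increasing in p
print(tk_weight(0.01, 0.2), tk_weight(0.10, 0.2))
```

The last line illustrates the non-monotonicity for small γ noted in the text: the weight attached to p = 0.01 exceeds the weight attached to p = 0.10.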
138
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
45. The estimates of the coefficient obtained by Tversky and Kahneman (1992) fortuitously happened to be the same for losses and gains, and many applications of PT assume that for convenience. The empirical methods of Tversky and Kahneman (1992) are difficult to defend, however: they report median values of the estimates obtained after fitting their model for each subject. The estimation for each subject is attractive if the data permit, as magnificently demonstrated by Hey and Orme (1994), but the median estimate has nothing to commend it statistically. 46. In other words, evaluating the PU of two lotteries, without having edited out dominated lotteries, might lead to a dominated lottery having a higher PU. But if subjects always reject dominated lotteries, the choice would appear to be an error to the likelihood function. Apart from searching for better parameters to explain this error, as the ML algorithm does as it tries to find parameter estimates that reduce any other prediction error, our specification allows μ to increase. We stress that this argument is not intended to rationalize the use of separable probability weights in OPT, just to explain how a structural model with stochastic errors might account for the effects of stochastic dominance. Wakker (1989) contains a careful account of the notion of transforming probabilities in a ‘‘natural way’’ but without violating stochastic dominance. 47. One of the little secrets of CPT is that one must always have a probability weight for the residual outcome associated with the reference point, and that the reference outcome receives a utility of 0 for both gains and losses. This ensures that decision weights always add up to 1. 48. An alternative specification would be to take the negative of the utility function defined over the gross losses, in effect assuming λ = 1 from the CPT specification. 49.
A corollary is that it might be a mistake to view loss aversion as a fixed parameter λ that does not vary with the context of the decision, ceteris paribus the reference point. 50. The mean estimate from their sample was $31, but there were clear nodes at $15 and $30. Our experimental sessions typically consist of several tasks, so expected earnings from the lottery task would have been some fraction of these expectations over session earnings. No subject stated an expected earning below $7. 51. A concrete implication, considered at length in Harrison and Rutström (2005; Section 5), is that the rush to use non-nested hypothesis tests is misplaced. If one reads the earlier literature on those tests it is immediately clear that they were viewed as poor, second-best alternatives to writing out a finite mixture model and estimating the weights that the data place on each latent process. The computational constraints that made them second-best decades ago no longer apply. 52. See Keller and Strazzera (2002; p. 148) and Frederick, Loewenstein, and O’Donoghue (2002; p. 381ff.) for an explicit statement of this assumption, which is often implicit in applied work. We refer to risk aversion and concavity of the utility function interchangeably, but it is concavity that is central (the two can differ for non-EUT specifications). 53. Harless and Camerer (1994) do consider different ways that one can compare different theories that have different numbers of ‘‘free parameters.’’ They also
Risk Aversion in the Laboratory
consider simple metrics for violations, but even these are still defined in terms of the number of failures of the theory in a given triple (e.g., one failure out of three predictions is considered better from the perspective of the theory than two failures out of three). 54. Some semi-parametric estimators, such as the Maximum Score estimator of Manski, do rely on ‘‘hit rates’’ as a metric. 55. Some experiments attempt to design checks for some of the more obvious biases, such as which lottery is presented on the left or right, or whether the lotteries are ordered best to worst or vice versa (e.g., see Harless, 1992; Hey & Orme, 1994). 56. Problem 2 in CSS involves losing three subjects at random for every one subject that was actually asked to make a choice, whereas the other problems involved all recruited subjects making a choice. Hence 200 subjects were recruited to Problem 2, and the eventual sample of choices was roughly 50 subjects for each problem, by design. 57. Comparing only Problems 1 and 5 in CSS, which involve choices only over simple lotteries, the evidence against EUT is even weaker. 58. The word ‘‘essentially’’ reminds us that this is EUT with some explicit stochastic error story. There are many alternative error stories, of course. Wilcox (2008a, 2008b) explores the deeper modeling issues of writing out a theory without specifying any stochastic process connecting it to data. 59. Some might object that even if the behavior can be formally explained by some small error, there are systematic behavioral tendencies that are not consistent with a white-noise error process. Of course, one can allow asymmetric errors or heteroskedastic errors. 60. Wakker et al. (1994) in effect adopted such a design. Their primary tasks deliberately had comparable expected values in the paired lotteries subjects were to choose over, but their ‘‘filler’’ tasks were then deliberately set up to have different expected values. 61. 
See Kagel (1995) and Harrison (1989, 1990) for a flavor of the debates. 62. Harrison, List, and Tra (2005e) show, however, that when auctions consist of more and more bidders, received theory does increasingly poorly in terms of characterizing ‘‘one shot’’ behavior. Their evidence suggests that received theory is relevant for ‘‘small auctions’’ but not for ‘‘large auctions.’’ Thus, if one were testing received theory it would matter on what domain the data were generated. Cox et al. (1982) reported different results, with the smallest of their auctions (N = 3) generating the data that seemed to most obviously contradict the risk-averse Nash Equilibrium bidding model. However, this could have been due to collusion. In all of their experiments the same N bidders participated in multiple rounds, facilitating coordination of collusive under-bidding strategies that wreak havoc with the one-shot predictions of the theory. 63. Cox et al. (1988) offer a generalization that admits of some degrees of risk-loving behavior. Since we do not observe much risk loving in the population used in these experiments, college students in the United States, this extension is not needed for present purposes. 64. That is, 1/2 is arguably more focal than 2/3 or 3/4, and so on for N > 2. It is certainly easier to implement arithmetically, absent calculating aids.
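The risk-averse Nash equilibrium bidding model referenced in notes 62 and 63 has a familiar closed form in the textbook benchmark case: N bidders, private values i.i.d. uniform on [0, 1], and CRRA utility x^r, giving b(v) = (N − 1)v/(N − 1 + r). A minimal sketch under those assumptions (the function name is ours, and this is the benchmark model, not the authors' estimation code):

```python
def crram_bid(v, n, r):
    """Equilibrium first-price auction bid for value v with n bidders and
    CRRA utility x**r, values i.i.d. uniform on [0, 1].
    r = 1 is risk neutrality; r < 1 is risk aversion."""
    return (n - 1) * v / (n - 1 + r)
```

With r < 1 bids rise toward the value, which is why unobserved risk attitudes matter so much for tests of the theory; and the bid also rises with N, consistent with the note's concern that the relevant domain ("small" versus "large" auctions) affects what the data can say.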
65. Unfortunately, there is evidence that subjects may not see it this way. In a generic public goods voluntary contribution game Botelho, Harrison, Pinto, and Rutström (2005b) show that Random Strangers designs do not generate the same behavior as Perfect Strangers designs in which the subject is guaranteed not to meet the same opponent twice. 66. One might be concerned that the full model fits the RA NE bidding model simply because it has a ‘‘free parameter r_i’’ to fit the bidding data to. In some sense this is true, since the joint likelihood of the data includes the effect of different r̂_i’s on bids, and the estimates seek r̂_i values that explain the bidding data best. But it is not true entirely, since the joint likelihood must also explain the risk attitude choice data as well. One can formally compare the distribution of predicted risk attitudes if one only uses the risk aversion tasks and the distribution that is generated if one uses all data simultaneously. The two distributions are virtually identical. Kendall’s τ statistic can be used to test for rank correlation; it has a value of 0.82, and leads one to reject the null hypothesis that the two sets of estimates of risk attitudes are independent at p-values below 0.0001. 67. Additional experimental tests include Thaler, Tversky, Kahneman, and Schwartz (1997) and Gneezy, Kapteyn, and Potters (2003). These provide results that are qualitatively identical, but harder to evaluate. Thaler et al. (1997) did not provide subjects with precise knowledge of the probabilities involved in the lotteries, but allowed them to infer that over time; hence behavior could have been driven by mistakes in the subjective inference of probabilities rather than MLA. Gneezy et al. (2003) embed the task in an asset market, which may have influenced individual behavior in other ways than predicted by EUT or MLA.
These influences are of interest, since markets are the institution in which most stocks and bonds are traded, but from the perspective of wanting the cleanest possible test of competing theories those extra influences are just a confound. Camerer (2005) and Novemsky and Kahneman (2005a, 2005b) provide an overview of the history and current status of the loss aversion hypothesis. 68. It is also possible to augment the estimation procedure to include a parameter that can be interpreted as ‘‘baseline consumption,’’ to which prizes are added before being evaluated using the utility function. This approach has been employed by Harrison et al. (2007c) and Heinemann (2008). Andersen et al. (2008a) consider the theoretical and empirical implications of this approach in detail. 69. The term ‘‘portfolio effects’’ is unfortunate, since it suggests a concern with correlated risks and risk pooling, which is not the issue here. Unfortunately, we cannot come up with a better expression, and this one has some currency in the literature. 70. For K binary choices it is 2^K, assuming that indifference is not an option. For K = 10 this is only 1,024, but for K = 15 it is 32,768, and one can guess the rest for larger K. The use of random lottery incentives in the context of the Random Lottery Pair elicitation procedure raises some deep modeling issues of sequential choice, since it introduces the interaction of risk aversion and ambiguity aversion with respect to future lotteries (Klibanoff, Marinacci, & Mukerji, 2005; Nau, 2006), as well as concerns with possible preferences over the temporal resolution of uncertainty (Kreps & Porteus, 1978). In effect, this is a setting in which the ‘‘small world’’ assumption of Savage (1972; Section 5.5), under which one focuses on
isolated decisions and ignores the broader context, may be particularly appropriate to apply. It may not be appropriate to apply for other tasks, as we discuss below. 71. Harrison et al. (2007; fn.16) report a direct test of the random lottery procedure with the MPL instrument, and note that it did not change risk attitudes elicited under the assumption of EUT. 72. In fact, Wilcox (2008a) recommends a third alternative specification, the Contextual Utility model developed in Wilcox (2008b), over both the Luce and Fechner specifications. If the choice is between Luce and Fechner, however, his discussion clearly favors Fechner. The estimates from Holt and Laury (2005) presented in Section 3.1 used the Luce specification, and hence differ from those presented here. 73. These do not exactly replicate all estimates presented earlier since there are slight differences in specifications. 74. With one exception, we do not believe that this inference is supported by the existing data and experimental designs. That exception is Beattie and Loomes (1997), an excellent example of the type of controlled study of incentives that is needed to address these issues. 75. The term ‘‘valuation’’ subsumes open-ended elicitation procedures as well as dichotomous choice, binary referenda, and stated choice tasks. See Harrison (2006a, 2006b) and Harrison and Rutström (2008) for reviews. 76. Heckman and Smith (1995; pp. 99–101) provide many examples, and coin the expression ‘‘randomization bias’’ for this possible effect. Harrison and List (2004) review the differences between laboratory, field, social, and natural experiments in economics, and all could be potentially affected by randomization bias. Palfrey and Pevnitskaya (2008) use thought experiments and laboratory experiments to illustrate how risk attitudes can theoretically affect the mix of bidders in sealed-bid auctions with endogenous entry, and thereby change behavior in the sample of bidders observed in the auction.
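The rank-correlation check described in note 66 uses Kendall's τ, which requires nothing more than counting concordant and discordant pairs of observations. A self-contained sketch of the tau-a statistic (our illustrative code, not the authors'; it omits the tie correction and the p-value computation mentioned in the note):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over total pairs.
    Assumes no ties; x and y are equal-length sequences of estimates."""
    concordant = discordant = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        s = (xi - xj) * (yi - yj)
        if s > 0:
            concordant += 1    # the pair is ranked the same way in x and y
        elif s < 0:
            discordant += 1    # the pair is ranked oppositely
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Applied to the two sets of risk attitude estimates in note 66, a value near 1 (they report 0.82) indicates that the rankings from the risk-aversion-tasks-only estimates and the joint estimates are nearly identical.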
77. We hesitate to endorse practices in other fields, in which recruitment fees are not paid to subjects, since they open themselves up to abuse. We have considerable experience of faculty recruiting subjects for ‘‘extra credit,’’ but where the task and behavior bear no relationship at all to the learning objectives of the class, and no pedagogic feedback is provided to students even if it does bear some tangential relationship to the topic of the class. We have serious ethical problems with such practices, quite apart from the problems of motivation that they raise. 78. There is also evidence of differences in the demographics and behavior of volunteers and ‘‘pseudo-volunteers,’’ who are subjects formally recruited in a classroom to participate in an experiment during class time (Rutström, 1998; Eckel & Grossman, 2000). The disadvantage with pseudo-volunteers is that the subjects might simply not be interested in participating in the experiment, even with the use of salient rewards. The advantage, of course, is that the selection process that leads them to be in the classroom is unrelated to the characteristics of the experimental task, although even here one might just be replacing one ill-studied sample selection process with another. After all, even if we model the factors that cause subjects from a university population to select into an experiment, we have not modeled the factors that cause individuals to choose to become university students (Casari, Ham, & Kagel, 2007).
79. Endogenous subject attrition from the experiment can also be informative about subject preferences, since the subject’s exit from the experiment indicates that the subject had made a negative evaluation of it. See Diggle and Kenward (1994) and Philipson and Hedges (1998) for discussion of this statistical issue. 80. More precisely, the statistical problem is that there may be some unobserved individual effects that cause subjects to be in the observed sample or not, and these effects could be correlated with responses once in the observed sample. For example, Camerer and Lovallo (1999) find that excess entry into competitive games occurs more often when subjects volunteered to participate knowing that payoffs would depend on skill in sports or current-events trivia. This treatment could encourage less risk-averse subjects to participate in the experiment and may explain the observed reference bias effect, or part of it. 81. It is well known in the field of clinical drug trials that persuading patients to participate in randomized studies is much harder than persuading them to participate in non-randomized studies (e.g., Kramer and Shapiro (1984; p. 2742ff.)). The same problem applies to social experiments, as evidenced by the difficulties that can be encountered when recruiting decentralized bureaucracies to administer the random treatment (e.g., Hotz, 1992). For example, Heckman and Robb (1985) note that the refusal rate in one randomized job training program was over 90%. 82. Here we consider the role of preferences over risk, but the same concerns apply to the elicitation of other types of preferences, such as social preferences or time preferences (Eckel & Grossman, 2000; Lazear, Malmendier, & Weber, 2006; Dohmen & Falk, 2006).
These concerns arise when subjects have some reason to believe that the task will lead them to evaluate those preferences, such as in longitudinal designs allowing attrition, or social experiments requiring disclosure of the nature of the task prior to participation. They might also arise if the sample is selected by some endogenous process in which selection might be correlated with those preferences, such as group membership or location choices. 83. In addition, we often just assign subjects to some role in an experiment, whether or not they would have selected for this role in any naturally occurring environment. This issue lies at the heart of the interest in field experiments initiated by Bohm (2002). 84. Lusk and Coble (2005) also report evidence consistent with this conclusion, comparing risk preferences elicited for an artificial monetary instrument and comparable preferences for an instrument defined over genetically-modified food. Lusk and Coble (2008) find that adding abstract background risk to an elicitation procedure using artificial monetary outcomes also generates more risk aversion, although they do not find the effect to be large quantitatively. 85. To make this point more succinctly, consider the elicitation of the value that a person places on safety, a critical input in the cost-benefit assessment of environmental policy such as the Clean Air Act (United States Environmental Protection Agency, 1997). Conventional procedures to measure such preferences focus on monetary values to avoid mortality risk, by asking subjects to value scenarios in which they face different risks of death. The traditional interpretation of responses to such questions ignores the fact that it is hard to imagine a physical risk that could kill you with some probability but that would leave you alive and have no effect whatsoever on your health. Of course, such risks exist, but most of the environmental
risks of concern for policy do not fall into such a category. In general, then, responses to the foreground risk question should allow for the fact that the subject likely perceived some background risk. This example represents an important policy issue and highlights the import of the theoretical literature on background risk. 86. However, since we do not know the subjective probability distribution of background risk in the field, we cannot know if background risk is statistically independent of the foreground risk. We can think of no reason why the two might be correlated, but this illustrates again the type of trade-off one experiences with field experiments. It also points to the complementary nature of field and lab experiments: Lusk and Coble (2008) show that independent background risk in a lab setting is associated with an increase in foreground risk aversion. 87. The typical application of the random lottery incentive mechanism in experiments such as these would have one choice selected at random. We used three to ensure comparability of rewards with other experiments in which subjects made choices over 40 or 20 lotteries, and where 2 lotteries or 1 lottery was, respectively, selected at random to be played out. 88. The computer laboratory used for these experiments has 28 subject stations. Each screen is ‘‘sunken’’ into the desk, and subjects were typically separated by several empty stations due to staggered recruitment procedures. No subject could see what the other subjects were doing, let alone mimic what they were doing since each subject was started individually at different times. 89. These final outcomes differ by $1 from the two highest outcomes for the gain frame and mixed frame, because we did not want to offer prizes in fractions of dollars. 90. To ensure that probabilities summed to one, we also used probabilities of 0.26 instead of 0.25, 0.38 instead of 0.37, 0.49 instead of 0.50, or 0.74 instead of 0.75. 91.
The control data in these three panels, for the 1× problem, are pooled across all task #1 responses. That is, the task #1 responses in the bottom left panel of Fig. 27 are not just the task #1 responses of the individuals facing the 90× problem. Nothing essential hinges on this at this stage of exposition. The statistical analysis in Section 2.1 does take this into account, using appropriate panel estimation procedures. 92. The experience was not with the same prize level, as noted earlier. 93. See Ortona (1994) and Kachelmeier and Shehata (1994). 94. These conclusions come from a panel regression model that controls for all of the factors discussed, and that allows for individual-level heteroskedasticity and individual-level first-order autocorrelation. All conclusions refer to effects that are statistically significant at the 1% level. 95. References to increases in risk aversion should also be understood, in this context, to refer to decreases in risk loving. 96. Although purely anecdotal, our own experience is that many subjects faced with the BDM task believe that the buying price depends in some way on their selling price. To mitigate such possible perceptions we have tended to use physical randomizing devices that are less prone to being questioned. 97. The stakes in the experiments of Gneezy and Potters (1997) were actually 2 Dutch guilders, which converted at the time of the experiment to roughly $1.20. Haigh and List (2005) used a stake of $1.00 for their students, to be comparable to the earlier stake. They quadrupled the stakes to $4.00 for the traders, on the grounds
that it would be more salient for them. Of course, this change in monetary stake size adds a potential confound to the comparability of results across students and traders, but one that has no obvious resolution without an elaborate investigation into the purchasing power of a dollar to students and traders. 98. Gneezy and Potters generously provided their individual data, and we used the same statistical model as Haigh and List (2005; Table II, specification 2, p. 530) on their data. Haigh and List also generously provided their individual data, and we replicated their statistical conclusions. 99. In fact, subjects tended to pick in round percentages. In the Low frequency treatment 76% of the choices were for 0, 25, 50, or 100% bets, and in the High frequency treatment 81% of the choices were for the 25, 50, or 100% bets. 100. For example, Kahneman and Lovallo (1993, p. 20), Benartzi and Thaler (1995, p. 79), Gneezy and Potters (1997, p. 632), Thaler et al. (1997, p. 650), Gneezy et al. (2003, p. 822), and Haigh and List (2005, p. 525). 101. In other words, there are settings in which a CRRA or even RN utility function might be appropriate for some theoretical, econometric, or policy exercise. But this experiment is not obviously one of those settings. 102. Yet another approach would be to modify the experimental design and allow subjects to leverage their bets beyond 100% of their stake. There are some logistical problems running such experiments in a laboratory setting, although of course stock exchanges and futures markets allow such trades. 103. The α parameter may be viewed as a counterpart in this specification of the noise parameter used by Holt and Laury (2002). 104. Benartzi and Thaler (1995, p.
80) are clear that this evaluation horizon is not the same thing as a planning horizon: ‘‘A young investor, for example, might be saving for retirement 30 years off in the future, but nevertheless experience the utility associated with the gains and losses of his investment every quarter when he opens a letter from his mutual fund. In this case, his (planning) horizon is 30 years but his evaluation period (evaluation horizon) is 3 months.’’ 105. They prefer the expression ‘‘prospective utility,’’ but there is no confusion as long as we are clear about which utility functions and probabilities are being used to calculate expected utility. 106. Mankiw and Zeldes (1991) make the important observation that only 12% of Americans hold stocks worth more than $10,000, using a 1984 survey, so one really has to explain their indifference between holding bonds and stocks. Presumably, the remaining ‘‘corner-solution’’ individuals face some transactions costs to undertaking such investments. It would be an easy and important extension of the approach of BT to allow for such heterogeneity in the composition of stockholders and others. 107. The constant term in this linear function is suppressed, since it would be perfectly correlated with the sum of these two binary variables. To be explicit, denote these dummy variables for the treatments as L and H, respectively. Then we actually estimate αL, αH, βL, βH, λL, and λH, where α = αL L + αH H, β = βL L + βH H, and λ = λL L + λH H. Thus, the logic of the likelihood function is as follows: candidate values of these six parameters are proposed, the linear function is evaluated so that we know the candidate values of α, β, and λ for each of the Low and High frequency treatments, the expected utility of the actual choice is evaluated using the Tversky and
Kahneman (1992) specification, and then the log-likelihood function specified above is evaluated. 108. The Arrow–Pratt coefficient of RRA is 1 − α, so α = 1 implies risk neutrality, α < 1 implies risk aversion, and α > 1 implies risk-loving behavior. These benchmarks are worth noting, to avoid confusion, given the popularity of specifications from Holt and Laury (2002) that estimate 1 − α directly (the risk-neutral value is 0 in that case, positive estimates indicate risk aversion, and negative estimates indicate risk loving). 109. The exposition is deliberately transparent to economists. Most of the exposition in Section F1 would be redundant for those familiar with Gould, Pitblado, and Sribney (2006) or even Rabe-Hesketh and Everitt (2004; ch.13). It is easy to find expositions of ML in Stata that are more general and elegant for their purpose, but for those trying to learn the basic tools for the first time that elegance can just appear to be needlessly cryptic coding, and actually act as an impediment to comprehension. There are good reasons that one wants to build more flexible and computationally efficient models, but ease of comprehension is rarely one of them. StataCorp (2007) documents the latest version 10 of Stata, but the exposition of the ML syntax is minimal in that otherwise extensive documentation. 110. Paarsch and Hong (2006; Appendix A.8) provide a comparable introduction to the use of MATLAB for estimation of structural models of auctions. Unfortunately their documentation contains no ‘‘real data’’ to evaluate the programs on. 111. Note that this is ‘euL’ and not ‘eul’: beginning Stata users make this mistake a lot. 112. Since the ML_eut0 program is called many, many times to evaluate Jacobians and the like, these warning messages can clutter the screen display needlessly. During debugging, however, one normally likes to have things displayed, so the command ‘‘quietly’’ would be changed to ‘‘noisily’’ for debugging.
Actually, we use the ‘‘ml check’’ option for debugging, as explained later, and never change this to ‘‘noisily.’’ Or we can display one line by using the ‘‘noisily’’ option, to debug specific calculations.
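The sign conventions in note 108 can be confirmed with a finite-difference computation of the Arrow–Pratt measure RRA(x) = −x u″(x)/u′(x) for the power utility u(x) = x^α. This is a sketch under that CRRA assumption; the function names are ours:

```python
def rra(alpha, x, h=1e-5):
    """Relative risk aversion -x * u''(x)/u'(x) for u(x) = x**alpha,
    approximated by central finite differences."""
    u = lambda z: z ** alpha
    du = (u(x + h) - u(x - h)) / (2 * h)          # first derivative
    d2u = (u(x + h) - 2 * u(x) + u(x - h)) / h**2  # second derivative
    return -x * d2u / du

# For any x > 0 this is (approximately) 1 - alpha: zero at alpha = 1
# (risk neutrality), positive for alpha < 1, negative for alpha > 1.
```

This makes concrete why the Holt and Laury (2002) convention of estimating 1 − α directly flips the interpretation: there, 0 is the risk-neutral benchmark and positive estimates indicate risk aversion.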
ACKNOWLEDGMENT We thank the U.S. National Science Foundation for research support under grants NSF/IIS 9817518, NSF/HSD 0527675, and NSF/SES 0616746, and the Danish Social Science Research Council for research support under project #24-02-0124. We are grateful to William Harbaugh, John Hey, Steven Kachelmeier, and Robert Sugden for making detailed experimental results available. Valuable comments were received from Steffen Andersen, James Cox, Morten Lau, Vjollca Sadiraj, Peter Wakker, and Nathaniel Wilcox. Harrison is also affiliated with the Durham Business School, Durham University, UK.
REFERENCES Abdellaoui, M. (2000). Parameter-free elicitation of utilities and probability weighting functions. Management Science, 46, 1497–1512. Abdellaoui, M., Barrios, C., & Wakker, P. P. (2007a). Reconciling introspective utility with revealed preference: Experimental arguments based on prospect theory. Journal of Econometrics, 138, 356–378. Abdellaoui, M., Bleichrodt, H., & Paraschiv, C. (2007b). Measuring loss aversion under prospect theory: A parameter-free approach. Management Science, 53(10), 1659–1674. Allais, M. (1979). The foundations of a positive theory of choice involving risk and a criticism of the postulates and axioms of the American school. In: M. Allais & O. Hagen (Eds), Expected utility hypotheses and the Allais paradox. Dordrecht, The Netherlands: Reidel. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Elicitation using multiple price lists. Experimental Economics, 9(4), 383–405. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008a). Eliciting risk and time preferences. Econometrica, 76, forthcoming. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008b). Lost in state space: Are preferences stable? International Economic Review, 49, forthcoming. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008c). Risk aversion in game shows. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics. Andersen, S., Harrison, G. W., & Rutström, E. E. (2006b). Choice behavior, asset integration and natural reference points. Working Paper 06-04. Department of Economics, College of Business Administration, University of Central Florida. Attema, A. E., Bleichrodt, H., Rohde, K. I. M., & Wakker, P. P. (2006). Time-tradeoff sequences for quantifying and visualizing the degree of time inconsistency, using only pencil and paper. Working Paper. Erasmus University, Rotterdam. Ballinger, T. P., & Wilcox, N. T. (1997).
Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105. Barr, A. (2003). Risk pooling, commitment, and information: An experimental test of two fundamental assumptions. Working Paper 2003-05. Centre for the Study of African Economies, Department of Economics, University of Oxford. Barr, A., & Packard, T. (2002). Revealed preference and self insurance: Can we learn from the self employed in Chile? Policy Research Working Paper #2754. World Bank, Washington DC. Battalio, R. C., Kagel, J. H., & Jiranyakul, K. (1990). Testing between alternative models of choice under uncertainty: Some initial results. Journal of Risk and Uncertainty, 3, 25–50. Beattie, J., & Loomes, G. (1997). The impact of incentives upon risky choice experiments. Journal of Risk and Uncertainty, 14, 149–162. Beck, J. H. (1994). An experimental test of preferences for the distribution of income and individual risk aversion. Eastern Economic Journal, 20(2), 131–145. Becker, G. M., DeGroot, M. H., & Marschak, J. (1964). Measuring utility by a single-response sequential method. Behavioral Science, 9, 226–232. Benartzi, S., & Thaler, R. H. (1995). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics, 110(1), 73–92. Berg, J., Dickhaut, J., & McCabe, K. (2005). Risk preference instability across institutions: A dilemma. Proceedings of the National Academy of Sciences, 102, 4209–4214.
Binswanger, H. P. (1980). Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62, 395–407. Binswanger, H. P. (1981). Attitudes toward risk: Theoretical implications of an experiment in rural India. Economic Journal, 91, 867–890. Birnbaum, M. H. (2004). Tests of rank-dependent utility and cumulative prospect theory in gambles represented by natural frequencies: Effects of format, event framing, and branch splitting. Organizational Behavior and Human Decision Processes, 95, 40–65. Bleichrodt, H., & Pinto, J. L. (2000). A parameter-free elicitation of the probability weighting function in medical decision analysis. Management Science, 46, 1485–1496. Bohm, P. (2002). Pitfalls in experimental economics. In: F. Andersson & H. Holm (Eds), Experimental economics: Financial markets, auctions, and decision making. Dordrecht: Kluwer. Botelho, A., Harrison, G. W., Pinto, L. M. C., & Rutström, E. E. (2005a). Social norms and social choice. Working Paper 05-23. Department of Economics, College of Business Administration, University of Central Florida. Botelho, A., Harrison, G. W., Pinto, L. M. C., & Rutström, E. E. (2005b). Testing static game theory with dynamic experiments: A case study of public goods. Working Paper 05-25. Department of Economics, College of Business Administration, University of Central Florida. Bullock, D. S., & Rutström, E. E. (2007). Policy making and rent-dissipation: An experimental test. Experimental Economics, 10(1), 21–36. Cadsby, C. B., Song, F., & Tapon, F. (2007). Sorting and incentive effects of pay for performance: An experimental investigation. Academy of Management Journal, 50(2), 387–405. Calman, K. C., & Royston, G. (1997). Risk language and dialects. British Medical Journal, 315, 939–942. Camerer, C. F. (1989). An experimental test of several generalized utility theories. Journal of Risk and Uncertainty, 2, 61–104. Camerer, C. F. (2000).
Prospect theory in the wild: Evidence from the field. In: D. Kahneman & A. Tversky (Eds), Choices, values and frames. New York: Cambridge University Press. Camerer, C. F. (2005). Three cheers – psychological, theoretical, empirical – for loss aversion. Journal of Marketing Research, XLII, 129–133. Camerer, C., & Ho, T. (1994). Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty, 8, 167–196. Camerer, C., & Hogarth, R. (1999). The effects of financial incentives in experiments: A review and capital-labor framework. Journal of Risk and Uncertainty, 19, 7–42. Camerer, C., & Lovallo, D. (1999). Overconfidence and excess entry: An experimental approach. American Economic Review, 89(1), 306–318. Casari, M., Ham, J. C., & Kagel, J. H. (2007). Selection bias, demographic effects and ability effects in common value experiments. American Economic Review, 97(4), 1278–1304. Chambers, R. G., & Quiggin, J. (2000). Uncertainty, production, choice, and agency: The statecontingent approach. New York, NY: Cambridge University Press. Chew, S. H., Karni, E., & Safra, Z. (1987). Risk aversion in the theory of expected utility with rank dependent probabilities. Journal of Economic Theory, 42, 370–381. Clarke, K. A. (2003). Nonparametric model discrimination in international relations. Journal of Conflict Resolution, 47(1), 72–93. Cleveland, W. S., Harris, C. S., & McGill, R. (1982). Judgements of circle sizes on statistical maps. Journal of the American Statistical Association, 77(379), 541–547. Cleveland, W. S., Harris, C. S., & McGill, R. (1983). Experiments on quantitative judgements of graphs and maps. Bell System Technical Journal, 62(6), 1659–1674.
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
Cleveland, W. S., & McGill, R. (1984). Graphical perception: Theory, experimentation, and application to the development of graphical methods. Journal of the American Statistical Association, 79(387), 531–554.
Coller, M., & Williams, M. B. (1999). Eliciting individual discount rates. Experimental Economics, 2, 107–127.
Conlisk, J. (1989). Three variants on the Allais example. American Economic Review, 79(3), 392–407.
Conte, A., Hey, J. D., & Moffatt, P. G. (2007). Mixture models of choice under risk. Discussion Paper No. 2007/06. Department of Economics and Related Studies, University of York.
Cox, J. C., & Epstein, S. (1989). Preference reversals without the independence axiom. American Economic Review, 79(3), 408–426.
Cox, J. C., & Harrison, G. W. (2008). Risk aversion in experiments: An introduction. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Cox, J. C., Roberson, B., & Smith, V. L. (1982). Theory and behavior of single object auctions. In: V. L. Smith (Ed.), Research in experimental economics (Vol. 2). Greenwich: JAI Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games & Economic Behavior, 56, 45–60.
Cox, J. C., & Sadiraj, V. (2008). Risky decisions in the large and in the small: Theory and experiment. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Cox, J. C., Smith, V. L., & Walker, J. M. (1985). Experimental development of sealed-bid auction theory: Calibrating controls for risk aversion. American Economic Review (Papers & Proceedings), 75, 160–165.
Cox, J. C., Smith, V. L., & Walker, J. M. (1988). Theory and individual behavior of first-price auctions. Journal of Risk and Uncertainty, 1, 61–99.
Cubitt, R. P., Starmer, C., & Sugden, R. (1998a). Dynamic choice and the common ratio effect: An experimental investigation. Economic Journal, 108, 1362–1380.
Cubitt, R. P., Starmer, C., & Sugden, R. (1998b). On the validity of the random lottery incentive system. Experimental Economics, 1(2), 115–131.
Dave, C., Eckel, C., Johnson, C., & Rojas, C. (2007). On the heterogeneity, stability and validity of risk preference measures. Unpublished manuscript. Department of Economics, University of Texas at Dallas.
Diggle, P., & Kenward, M. G. (1994). Informative drop-out in longitudinal data analysis. Applied Statistics, 43(1), 49–93.
Dohmen, T., & Falk, A. (2006). Performance pay and multi-dimensional sorting: Productivity, preferences and gender. Discussion Paper #2001. Institute for the Study of Labor (IZA), Bonn, Germany.
Eckel, C. C., & Grossman, P. J. (2000). Volunteers and pseudo-volunteers: The effect of recruitment method in dictator experiments. Experimental Economics, 3, 107–120.
Eckel, C. C., & Grossman, P. J. (2002). Sex differences and statistical stereotyping in attitudes toward financial risk. Evolution and Human Behavior, 23(4), 281–295.
Eckel, C. C., & Grossman, P. J. (2008). Forecasting risk attitudes: An experimental study of actual and forecast risk attitudes of women and men. Journal of Economic Behavior & Organization, forthcoming.
Eeckhoudt, L., Gollier, C., & Schlesinger, H. (1996). Changes in background risk and risk-taking behavior. Econometrica, 64(3), 683–689.
Ehrhart, K.-M., & Keser, C. (1999). Mobility and cooperation: On the run. Working Paper 99s-24. CIRANO, University of Montreal.
Engle-Warnick, J., Escobal, J., & Laszlo, S. (2006). The effect of an additional alternative on measured risk preferences in a laboratory experiment in Peru. Working Paper 2006s-06. CIRANO, Montreal.
Ertan, A., Page, T., & Putterman, L. (2005). Can endogenously chosen institutions mitigate the free-rider problem and reduce perverse punishment? Working Paper 2005-13. Department of Economics, Brown University.
Falk, A., Fehr, E., & Fischbacher, U. (2005). Driving forces behind informal sanctions. Econometrica, 73(6), 2017–2030.
Farquhar, P. H. (1984). Utility assessment methods. Management Science, 30(11), 1283–1300.
Fehr, E., & Goette, L. (2007). Do workers work more if wages are high? Evidence from a randomized field experiment. American Economic Review, 97(1), 298–317.
Fennema, H., & van Assen, M. (1999). Measuring the utility of losses by means of the trade off method. Journal of Risk and Uncertainty, 17(3), 277–295.
Finney, M. A. (1998). FARSITE: Fire area simulator – model development and evaluation. Research Paper RMRS-RP-4. Rocky Mountain Research Station, Forest Service, United States Department of Agriculture.
Fiore, S. M., Harrison, G. W., Hughes, C. E., & Rutström, E. E. (2007). Virtual experiments and environmental policy. Working Paper 07-01. Department of Economics, College of Business Administration, University of Central Florida.
Fishburn, P. C. (1967). Methods of estimating additive utilities. Management Science, 13(7), 435–453.
Frederick, S., Loewenstein, G., & O'Donoghue, T. (2002). Time discounting and time preference: A critical review. Journal of Economic Literature, XL, 351–401.
Gegax, D., Gerking, S., & Schulze, W. (1991). Perceived risk and the marginal value of safety. Review of Economics and Statistics, 73, 589–596.
Gerking, S., de Haan, M., & Schulze, W. (1988). The marginal value of job safety: A contingent value study. Journal of Risk and Uncertainty, 1, 185–199.
Gneezy, U., Kapteyn, A., & Potters, J. (2003). Evaluation periods and asset prices in a market experiment. Journal of Finance, 58, 821–838.
Gneezy, U., & Potters, J. (1997). An experiment on risk taking and evaluation periods. Quarterly Journal of Economics, 112, 631–645.
Gollier, C. (2001). The economics of risk and time. Cambridge, MA: MIT Press.
Gollier, C., & Pratt, J. W. (1996). Risk vulnerability and the tempering effect of background risk. Econometrica, 64(5), 1109–1123.
Gonzalez, R., & Wu, G. (1999). On the shape of the probability weighting function. Cognitive Psychology, 38, 129–166.
Gould, W., Pitblado, J., & Sribney, W. (2006). Maximum likelihood estimation with Stata (3rd ed.). College Station, TX: Stata Press.
Grether, D. M., & Plott, C. R. (1979). Economic theory of choice and the preference reversal phenomenon. American Economic Review, 69, 623–638.
Gürerk, Ö., Irlenbusch, B., & Rockenbach, B. (2006). The competitive advantage of sanctioning institutions. Science, 312, 108–111.
Haigh, M. S., & List, J. A. (2005). Do professional traders exhibit myopic loss aversion? An experimental analysis. Journal of Finance, 60(1), 523–534.
Harbaugh, W. T., Krause, K., & Vesterlund, L. (2002). Risk attitudes of children and adults: Choices over small and large probability gains and losses. Experimental Economics, 5, 53–84.
Harless, D. W. (1992). Predictions about indifference curves inside the unit triangle: A test of variants of expected utility theory. Journal of Economic Behavior and Organization, 18, 391–414.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62(6), 1251–1289.
Harrison, G. W. (1986). An experimental test for risk aversion. Economics Letters, 21(1), 7–11.
Harrison, G. W. (1989). Theory and misbehavior of first-price auctions. American Economic Review, 79, 749–762.
Harrison, G. W. (1990). Risk attitudes in first-price auction experiments: A Bayesian analysis. Review of Economics and Statistics, 72, 541–546.
Harrison, G. W. (1992). Theory and misbehavior of first-price auctions: Reply. American Economic Review, 82, 1426–1443.
Harrison, G. W. (2006a). Experimental evidence on alternative environmental valuation methods. Environmental and Resource Economics, 34, 125–162.
Harrison, G. W. (2006b). Making choice studies incentive compatible. In: B. Kanninen (Ed.), Valuing environmental amenities using stated choice studies: A common sense guide to theory and practice (pp. 65–108). Boston: Kluwer.
Harrison, G. W. (2006c). Maximum likelihood estimation of utility functions using Stata. Working Paper 06-12. Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W. (2007). Hypothetical bias over uncertain outcomes. In: J. A. List (Ed.), Using experimental methods in environmental and resource economics. Northampton, MA: Elgar.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005a). Temporal stability of estimates of risk aversion. Applied Financial Economics Letters, 1, 31–35.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005b). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2007a). Measurement with experimental controls. In: M. Boumans (Ed.), Measurement in economics: A handbook. San Diego, CA: Elsevier.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2005c). Risk attitudes, randomization to treatment, and self-selection into experiments. Working Paper 05-01. Department of Economics, College of Business Administration, University of Central Florida; Journal of Economic Behavior & Organization, forthcoming.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007b). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109(2), 341–368.
Harrison, G. W., Lau, M. I., Rutström, E. E., & Sullivan, M. B. (2005d). Eliciting risk and time preferences using field experiments: Some methodological issues. In: J. Carpenter, G. W. Harrison & J. A. List (Eds), Field experiments in economics (Vol. 10). Greenwich, CT: JAI Press, Research in Experimental Economics.
Harrison, G. W., Lau, M. I., & Williams, M. B. (2002). Estimating individual discount rates for Denmark: A field experiment. American Economic Review, 92(5), 1606–1617.
Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42(4), 1013–1059.
Harrison, G. W., List, J. A., & Towe, C. (2007c). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75(2), 433–458.
Harrison, G. W., List, J. A., & Tra, C. (2005e). Statistical characterization of heterogeneity in experiments. Working Paper 05-10. Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper 05-18. Department of Economics, College of Business Administration, University of Central Florida; Experimental Economics, forthcoming.
Harrison, G. W., & Rutström, E. E. (2008). Experimental evidence on the existence of hypothetical bias in value elicitation methods. In: C. R. Plott & V. L. Smith (Eds), Handbook of experimental economics results. Amsterdam: North-Holland, forthcoming.
Harstad, R. M. (2000). Dominant strategy adoption and bidders' experience with pricing rules. Experimental Economics, 3(3), 261–280.
Heckman, J. J., & Robb, R. (1985). Alternative methods for evaluating the impact of interventions. In: J. Heckman & B. Singer (Eds), Longitudinal analysis of labor market data. New York: Cambridge University Press.
Heckman, J. J., & Smith, J. A. (1995). Assessing the case for social experiments. Journal of Economic Perspectives, 9(2), 85–110.
Heinemann, F. (2008). Measuring risk aversion and the wealth effect. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Hershey, J. C., Kunreuther, H. C., & Schoemaker, P. J. H. (1982). Sources of bias in assessment procedures for utility functions. Management Science, 28(8), 936–954.
Hershey, J. C., & Schoemaker, P. J. H. (1985). Probability versus certainty equivalence methods in utility measurement: Are they equivalent? Management Science, 31(10), 1213–1231.
Hey, J. D. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2001). Does repetition improve consistency? Experimental Economics, 4, 5–54.
Hey, J. D. (2002). Experimental economics and the theory of decision making under uncertainty. Geneva Papers on Risk and Insurance Theory, 27(1), 5–21.
Hey, J. D., & Lee, J. (2005a). Do subjects remember the past? Applied Economics, 37, 9–18.
Hey, J. D., & Lee, J. (2005b). Do subjects separate (or are they sophisticated)? Experimental Economics, 8, 233–265.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.
Hirshleifer, J., & Riley, J. G. (1992). The analytics of uncertainty and information. New York, NY: Cambridge University Press.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Holt, C. A., & Laury, S. K. (2005). Risk aversion and incentive effects: New data without order effects. American Economic Review, 95(3), 902–912.
Horowitz, J. K. (1992). A test of intertemporal consistency. Journal of Economic Behavior and Organization, 17, 171–182.
Hotz, V. J. (1992). Designing an evaluation of JTPA. In: C. Manski & I. Garfinkel (Eds), Evaluating welfare and training programs. Cambridge: Harvard University Press.
Isaac, R. M., & James, D. (2000). Just who are you calling risk averse? Journal of Risk and Uncertainty, 20(2), 177–187.
James, D. (2007). Stability of risk preference parameter estimates within the Becker–DeGroot–Marschak procedure. Experimental Economics, 10, 123–141.
Kachelmeier, S. J., & Shehata, M. (1992). Examining risk preferences under high monetary incentives: Experimental evidence from the People's Republic of China. American Economic Review, 82(5), 1120–1141.
Kachelmeier, S. J., & Shehata, M. (1994). Examining risk preferences under high monetary incentives: Reply. American Economic Review, 84(4), 1104.
Kagel, J. H. (1995). Auctions: A survey of experimental research. In: J. H. Kagel & A. E. Roth (Eds), The handbook of experimental economics. Princeton: Princeton University Press.
Kagel, J. H., & Levin, D. (2002). Common value auctions and the winner's curse. Princeton: Princeton University Press.
Kagel, J. H., MacDonald, D. N., & Battalio, R. C. (1990). Tests of 'fanning out' of indifference curves: Results from animal and human experiments. American Economic Review, 80(4), 912–921.
Kahneman, D., & Lovallo, D. (1993). Timid choices and bold forecasts: A cognitive perspective on risk taking. Management Science, 39(1), 17–31.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Keller, L. R., & Strazzera, E. (2002). Examining predictive accuracy among discounting models. Journal of Risk and Uncertainty, 24(2), 143–160.
Kent, S. (1964). Words of estimated probability. Studies in Intelligence, 8, 49–65.
Klibanoff, P., Marinacci, M., & Mukerji, S. (2005). A smooth model of decision making under ambiguity. Econometrica, 73(6), 1849–1892.
Köbberling, V., & Wakker, P. P. (2005). An index of loss aversion. Journal of Economic Theory, 122, 119–131.
Kőszegi, B., & Rabin, M. (2006). A model of reference-dependent preferences. Quarterly Journal of Economics, 121(4), 1133–1165.
Kőszegi, B., & Rabin, M. (2007). Reference-dependent risk attitudes. American Economic Review, 97(4), 1047–1073.
Kramer, M., & Shapiro, S. (1984). Scientific challenges in the application of randomized trials. Journal of the American Medical Association, 252, 2739–2745.
Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46(1), 185–200.
Krupnick, A., Alberini, A., Cropper, M., Simon, N., O'Brien, B., Goeree, R., & Heintzelman, M. (2002). Age, health and the willingness to pay for mortality risk reductions: A contingent valuation survey of Ontario residents. Journal of Risk and Uncertainty, 24(2), 161–186.
van de Kuilen, G., Wakker, P. P., & Zou, L. (2007). A midpoint technique for easily measuring prospect theory's probability weighting. Working Paper. Econometric Institute, Erasmus University, Rotterdam, The Netherlands.
Laury, S. K., & Holt, C. A. (2008). Further reflections on prospect theory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Lazear, E. P., Malmendier, U., & Weber, R. A. (2006). Sorting in experiments with application to social preferences. Working Paper #12041. National Bureau of Economic Research.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
List, J. A. (2003). Does market experience eliminate market anomalies? Quarterly Journal of Economics, 118, 41–71.
Loomes, G. (1988). Different experimental procedures for obtaining valuations of risky actions: Implications for utility theory. Theory and Decision, 25, 1–23.
Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24(2), 103–130.
Loomes, G., Starmer, C., & Sugden, R. (1991). Observing violations of transitivity by experimental methods. Econometrica, 59(2), 425–439.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Loomes, G., & Sugden, R. (1998). Testing different stochastic specifications of risky choice. Economica, 65, 581–598.
Lopes, L. L. (1984). Risk and distributional inequality. Journal of Experimental Psychology: Human Perception and Performance, 10(4), 465–484.
Luce, R. D. (1959). Individual choice behavior. New York: Wiley.
Luce, R. D., & Fishburn, P. C. (1991). Rank and sign-dependent linear utility models for finite first-order gambles. Journal of Risk and Uncertainty, 4, 29–59.
Lusk, J. L., & Coble, K. H. (2005). Risk perceptions, risk preference, and acceptance of risky food. American Journal of Agricultural Economics, 87(2), 393–404.
Lusk, J. L., & Coble, K. H. (2008). Risk aversion in the presence of background risk: Evidence from the lab. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Mankiw, N. G., & Zeldes, S. P. (1991). The consumption of stockholders and non-stockholders. Journal of Financial Economics, 29(1), 97–112.
McFadden, D. (2001). Economic choices. American Economic Review, 91(3), 351–378.
McKee, M. (1989). Intra-experimental income effects and risk aversion. Economics Letters, 30, 109–115.
Miller, L., Meyer, D. E., & Lanzetta, J. T. (1969). Choice among equal expected value alternatives: Sequential effects of winning probability level on risk preferences. Journal of Experimental Psychology, 79(3), 419–423.
Millner, E. L., Pratt, M. D., & Reilly, R. J. (1988). A reexamination of Harrison's experimental test for risk aversion. Economics Letters, 27, 317–319.
Murnighan, J. K., Roth, A. E., & Schoumaker, F. (1987). Risk aversion and bargaining: Some preliminary results. European Economic Review, 31, 265–271.
Murnighan, J. K., Roth, A. E., & Schoumaker, F. (1988). Risk aversion in bargaining: An experimental study. Journal of Risk and Uncertainty, 1(1), 101–124.
Nau, R. F. (2006). Uncertainty aversion with second-order utilities and probabilities. Management Science, 52(1), 136–145.
Novemsky, N., & Kahneman, D. (2005a). The boundaries of loss aversion. Journal of Marketing Research, XLII, 119–128.
Novemsky, N., & Kahneman, D. (2005b). How do intentions affect loss aversion? Journal of Marketing Research, XLII, 139–140.
Ochs, J., & Roth, A. E. (1989). An experimental study of sequential bargaining. American Economic Review, 79(3), 355–384.
Ortona, G. (1994). Examining risk preferences under high monetary incentives: Comment. American Economic Review, 84(4), 1104.
Paarsch, H. J., & Hong, H. (2006). An introduction to the structural econometrics of auction data. Cambridge, MA: MIT Press.
Page, T., Putterman, L., & Unel, B. (2005). Voluntary association in public goods experiments: Reciprocity, mimicry, and efficiency. Economic Journal, 115, 1037–1058.
Palfrey, T. R., & Pevnitskaya, S. (2008). Endogenous entry and self-selection in private value auctions: An experimental study. Journal of Economic Behavior & Organization, 66, forthcoming.
Papke, L. E., & Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics, 11, 619–632.
Philipson, T., & Hedges, L. V. (1998). Subject evaluation in social experiments. Econometrica, 66(2), 381–408.
Plott, C. R., & Zeiler, K. (2005). The willingness to pay-willingness to accept gap, the 'Endowment Effect,' subject misconceptions, and experimental procedures for eliciting valuations. American Economic Review, 95(3), 530–545.
Prelec, D. (1998). The probability weighting function. Econometrica, 66, 497–527.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior & Organization, 3(4), 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Norwell, MA: Kluwer Academic.
Rabe-Hesketh, S., & Everitt, B. (2004). A handbook of statistical analyses using Stata (3rd ed.). New York: Chapman & Hall/CRC.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Reilly, R. J. (1982). Preference reversal: Further evidence and some suggested modifications in experimental design. American Economic Review, 72, 576–584.
Rieger, M. O., & Wang, M. (2006). Cumulative prospect theory and the St. Petersburg paradox. Economic Theory, 28, 665–679.
Rogers, W. H. (1993). Regression standard errors in clustered samples. Stata Technical Bulletin, 13, 19–23.
Roth, A. E., & Malouf, M. W. K. (1979). Game-theoretic models and the role of information in bargaining. Psychological Review, 86, 574–594.
Rutström, E. E. (1998). Home-grown values and the design of incentive compatible auctions. International Journal of Game Theory, 27(3), 427–441.
Saha, A. (1993). Expo-power utility: A flexible form for absolute and relative risk aversion. American Journal of Agricultural Economics, 75(4), 905–913.
Savage, L. J. (1972). The foundations of statistics (2nd ed.). New York: Dover.
Schmidt, U., Starmer, C., & Sugden, R. (2005). Explaining preference reversal with third-generation prospect theory. Working Paper. School of Economic and Social Science, University of East Anglia.
Schubert, R., Brown, M., Gysler, M., & Brachinger, H. W. (1999). Financial decision-making: Are women really more risk-averse? American Economic Review (Papers & Proceedings), 89(2), 381–385.
Smith, V. L. (1982). Microeconomic systems as an experimental science. American Economic Review, 72(5), 923–955.
Smith, V. L. (2003). Constructivist and ecological rationality in economics. American Economic Review, 93(3), 465–508.
Starmer, C., & Sugden, R. (1989). Violations of the independence axiom in common ratio problems: An experimental test of some competing hypotheses. Annals of Operations Research, 19, 79–102.
Starmer, C., & Sugden, R. (1991). Does the random-lottery incentive system elicit true preferences? An experimental investigation. American Economic Review, 81, 971–978.
StataCorp. (2007). Stata statistical software: Release 10. College Station, TX: Stata Corporation.
Stigler, G. J., & Becker, G. S. (1977). De gustibus non est disputandum. American Economic Review, 67(2), 76–90.
Sutter, M., Haigner, S., & Kocher, M. (2006). Choosing the stick or the carrot? Endogenous institutional choice in social dilemma situations. Discussion Paper No. 5497. Centre for Economic Policy Research, London.
Tanaka, T., Camerer, C. F., & Nguyen, Q. (2007). Risk and time preferences: Experimental and household survey data from Vietnam. Working Paper. California Institute of Technology.
Thaler, R. H., Tversky, A., Kahneman, D., & Schwartz, A. (1997). The effect of myopia and loss aversion on risk taking: An experimental test. Quarterly Journal of Economics, 112, 647–661.
Train, K. E. (2003). Discrete choice methods with simulation. New York: Cambridge University Press.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representations of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
United States Environmental Protection Agency. (1997). The benefits and costs of the clean air act: 1970 to 1990. Washington, DC: Office of Air and Radiation, US EPA.
Vickrey, W. S. (1961). Counterspeculation, auctions and competitive sealed tenders. Journal of Finance, 16, 8–37.
von Winterfeldt, D., & Edwards, W. (1986). Decision analysis and behavioral research. New York: Cambridge University Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333.
Wakker, P. P. (1989). Transforming probabilities without violating stochastic dominance. In: E. Roskam (Ed.), Mathematical psychology in progress. Berlin: Springer.
Wakker, P. P., & Deneffe, D. (1996). Eliciting von Neumann-Morgenstern utilities when probabilities are distorted or unknown. Management Science, 42, 1131–1150.
Wakker, P. P., Erev, I., & Weber, E. U. (1994). Comonotonic independence: The critical test between classical and rank-dependent utility theories. Journal of Risk and Uncertainty, 9, 195–230.
Wilcox, N. T. (2008a). Stochastic models for binary discrete choice under risk: A critical primer and econometric comparison. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Wilcox, N. T. (2008b). 'Stochastically more risk averse:' A contextual theory of stochastic discrete choice under risk. Journal of Econometrics, 142, forthcoming.
Williams, R. (2000). A note on robust variance estimation for cluster-correlated data. Biometrics, 56, 645–646.
Wooldridge, J. (2003). Cluster-sample methods in applied econometrics. American Economic Review (Papers & Proceedings), 93, 133–138.
Yaari, M. E. (1987). The dual theory of choice under risk. Econometrica, 55(1), 95–115.
APPENDIX A. REPRESENTATION AND PERCEPTION OF PROBABILITIES

There are two representational issues with probabilities. The first is that subjects may base their decisions on concepts of subjective probabilities, so that we should expect them to deviate in some ways from objective probabilities. The second is that perceptions of probabilities may not correspond to the actual probabilities. Only with a theory that explains both the perception of probabilities and the relationship between subjective and objective probabilities would we be able to identify both of these deviations. Nevertheless, careful experimental design can help generate some robustness in subjective and perceived probabilities, and a convergence of both on the underlying objective ones when that is normatively desirable. The review in this appendix complements the discussion in Section 1 of the paper by showing some alternative ways to represent the lotteries to subjects.

Camerer (1989) used a stacked box display to represent his lotteries to subjects. The length of the box provided information on the probabilities of each prize, and the width of the box provided information on the relative size of the prizes. The example in Fig. 20 was used in his written instructions to subjects, to explain how to read the lottery. Those instructions were as follows:

The outcomes of the lotteries will be determined by a random number between 01 and 100. Each number between (and including) 01 and 100 is equally likely to occur. In the example above, the left lottery, labeled "A", pays nothing (0) if the random number is between 01 and 40. Lottery A pays five dollars ($5) if the random number is between 41 and 100. Notice that the picture is drawn so that the height of the line between 01 and 40 is 40% of the distance from 01 to 100. The rectangle around "$5" is 60% of the distance from 01 to 100.
In the example above the lottery on the right, labeled "B", pays nothing (0) if the random number is between 01 and 50, five dollars ($5) if the random number is between 51 and 90, and ten dollars ($10) if the random number is between 91 and 100. As with lottery A, the heights of the lines in lottery B represent the fraction of the possible numbers which yield each payoff. For example, the height of the $10 rectangle is 10% of the way from 01 to 100. The widths of the rectangles are proportional to the size of their payoffs. In lottery B, for example, the $10 rectangle is twice as wide as the $5 rectangle.
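The encoding in these instructions is mechanical: each probability is mapped to a segment of the integers 01–100, and the segment boundaries follow by cumulation. A short sketch can reproduce the ranges quoted above (the function name and representation are our own illustration, not Camerer's code):

```python
def number_ranges(lottery):
    """Map a list of (prize, probability) pairs onto segments of the
    integers 01-100, as in the Camerer (1989) instructions: each prize
    occupies a block of numbers proportional to its probability."""
    ranges, low = [], 1
    for prize, prob in lottery:
        high = low + round(prob * 100) - 1
        ranges.append((prize, low, high))
        low = high + 1
    assert low == 101, "probabilities must sum to one"
    return ranges

# Lotteries A and B from the quoted instructions.
A = [(0, 0.40), (5, 0.60)]
B = [(0, 0.50), (5, 0.40), (10, 0.10)]

for label, lot in [("A", A), ("B", B)]:
    for prize, lo, hi in number_ranges(lot):
        print(f"Lottery {label}: ${prize} if the number is {lo:02d}-{hi:02d}")
```

Running this recovers exactly the ranges in the quote: lottery A pays $0 on 01–40 and $5 on 41–100, and lottery B pays $0 on 01–50, $5 on 51–90, and $10 on 91–100.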
This display is ingenious in the sense that it compactly displays the "numbers" as well as visual referents for the probabilities and relative prizes. The subject has to judge the probabilities for each prize from the visual referent, and is not directly provided that information numerically. There is a valuable literature on the ability of subjects to accurately assess quantitative magnitudes from visual referents of this kind, and it points to the need for individual-specific calibration in experts and non-experts (Cleveland, Harris, & McGill, 1982, 1983; Cleveland & McGill, 1984).

Fig. 20. Lottery Display Used by Camerer (1989).

Battalio et al. (1990) and Kagel et al. (1990) employed purely numerical displays of their lotteries. For example, one such lottery was presented to subjects as follows:

A: Winning $11 if 1–20 (20%)
   Winning $5 if 21–100 (80%)
B: Winning $25 if 1–6 (6%)
   Winning $5 if 7–100 (94%)

Answer: (1) I prefer A. (2) I prefer B. (3) Indifferent.
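A useful property of this pair of lotteries can be checked arithmetically: reading the ticket ranges as spanning 100 equally likely numbers (so lottery A's second range runs to 100, consistent with the stated 80%), both lotteries have the same expected value, so a choice between them isolates the subject's attitude to risk. A minimal sketch of that check (our own code, not from the original studies):

```python
from fractions import Fraction

def expected_value(lottery, tickets=100):
    """Expected value of a lottery given as (prize, low_ticket, high_ticket)
    triples over `tickets` equally likely ticket numbers."""
    ev = Fraction(0)
    for prize, lo, hi in lottery:
        ev += Fraction(hi - lo + 1, tickets) * prize
    return ev

# Lottery A: $11 on tickets 1-20 (20%), $5 on tickets 21-100 (80%).
A = [(11, 1, 20), (5, 21, 100)]
# Lottery B: $25 on tickets 1-6 (6%), $5 on tickets 7-100 (94%).
B = [(25, 1, 6), (5, 7, 100)]

print(float(expected_value(A)))  # 6.2
print(float(expected_value(B)))  # 6.2
```

Both lotteries are worth $6.20 in expectation, so any systematic preference between them reflects risk preferences rather than expected payoffs.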
GLENN W. HARRISON AND E. ELISABET RUTSTRO¨M
158
This display presents all values numerically, with no visual referents. The numerical display shows the probability for each prize, rather than requiring the subject to infer that from the cumulative probabilities. Beattie and Loomes (1997) used displays that were similar to those employed by Camerer (1989), although the probabilities were individually comparable since they were vertically aligned with a common base. Fig. 21 illustrates how they presented the lotteries to subjects. In addition, they provided text explaining how to read the display. Wakker, Erev, and Weber (1994) considered four types of representations, shown in Fig. 22. One, on the far right, was a copy of the display employed by Camerer (1989), and the three on the left varied the extent to which information on outcomes was collapsed (top two panels on the left) and whether numerical information was provided in addition to the verbal information about probabilities (bottom panel on the left). The alternative representations were applied on a between-subjects basis, but no information is provided about the effect on behavior. An example of the representation of probability using a verbal analogical scale is provided by Calman and Royston (1997; Table 4), using a distance analogue. For risks of 1 in 1, 1 in 10, 1 in 100, and 1 in 1000, for example, the distance containing one "risk stick" 1 foot in length is 1 foot, 10 feet, 100 feet, and 1,000 feet, respectively. An older tradition seeks to "calibrate" words that are found in the natural English language with precise probability ranges. This idea stems from a concern that Kent (1964) had with the ambiguity in the use of colloquial expressions of uncertainty by intelligence operatives. He proposed that certain words be assigned specific numerical probability ranges. A study reported by von Winterfeldt and Edwards (1986, p. 98ff.)
used these expressions and asked a number of NATO officials to state the probabilities that they would attach to the use of those words in sentences. The dots in Fig. 23 show the elicited probability judgements, and the shaded bars show the ranges suggested by Kent (1964). The fact that there
Fig. 21. Lottery Display Used by Beattie and Loomes (1997).
Risk Aversion in the Laboratory
Fig. 22. Lottery Displays Used by Wakker et al. (1994).
is a poor correspondence with untrained elicitors does not mean, however, that one could not undertake such a "semantic coordination game" using salient rewards, and try to encourage common usage of critical words. The visual dots method is employed by Krupnick et al. (2002, p. 167), and provides a graphic image to complement the direct fractional, numerical representation of probability. An example of their visualization method is shown in Fig. 24. Visual ladders have been used in previous research on mortality risk by Gerking, de Haan, and Schulze (1988) and Gegax, Gerking, and Schulze (1991). One such ladder, from their survey instrument, is shown in Fig. 25. An alternative ladder visualization is offered by Calman and Royston (1997; Fig. 1), and is shown in Fig. 26. One hypothesis to emerge from this review of the representation of lotteries in laboratory and survey settings is that there is no single task representation for lotteries that is perfect for all subjects. It follows that
Fig. 23. Representing Risk on a Verbal Analogical Scale.

Fig. 24. Representing Risk with Dots.

Fig. 25. Representing Risk with a 2D Ladder.
Fig. 26. Representing Risk with a 3D Ladder.
some of the evidence for framing effects in the representation of risk may be due to the implicit assumption that one form of representation works best for everyone: the "magic bullet" assumption. Rather, we should perhaps expect different people to perform better with different representations. To date no systematic comparison of these different methods has been performed, and there is no consensus as to what constitutes a state-of-the-art representation.
APPENDIX B. THE EXPERIMENTS OF HEY AND ORME (1994)

B1. The Original Experiments

The experiments of Hey and Orme (1994) are important in many respects. First, they use lottery tasks that are not designed as "trip wire" tests of one theory or another, but instead as representative lottery tasks. This design objective has strengths and weaknesses. The strength is that one can evaluate many different theories without the task domain being biased in favor of any one theory. Thus, tests of a theory will be based on tasks that are not just built to trick it into error. The weakness is that it might be inefficient as a domain for choosing between different theories. The second reason that these experiments are important, of course, is that they were evaluated using formal ML methods at the level of the individual, including explicit discussion of structural error models due to Fechner. The basic experiments of HO are reviewed in Section 1.2, and the display subjects saw is presented in Fig. 3. Subjects were recruited from the University of York and participated in two sessions, each consisting of 100 binary lottery choices. The sample consisted of 80 students, who were allowed to proceed at their own pace. The lottery tasks took roughly 35 min to complete, and subjects earned an average of £17.50 per hour for this task and one other task.

B2. Replication

There are two limitations of the original HO experimental data, which make it useful to undertake a replication and extension. One is that there is no data on individual characteristics, so that it is impossible to pool data across subjects and condition estimation on those characteristics. Of course,
this was not the objective of HO, who estimated choice functionals for each individual separately. But it does limit the use of these data for other purposes. Second, all of the lotteries were in the gain domain, and many theories require lotteries that are framed as losses or as mixtures of gains and losses. Hence, we review here the replications and extensions of Harrison and Rutström (2005), which address these two limitations. Subjects were presented with 60 lottery pairs, each represented as a "pie" showing the probability of each prize. Fig. 4 illustrates one such representation. The subject could choose the lottery on the left or the right, or explicitly express indifference (in which case the experimenter would flip a coin on the subject's behalf). After all 60 lottery pairs were evaluated, three were selected at random for payment.87 The lotteries were presented to the subjects in color on a private computer screen,88 and all choices were recorded by the computer program. This program also recorded the time taken to make each choice. In addition to the choice tasks, the subjects provided information on demographic and other personal characteristics. In the gain frame experiments the prizes in each lottery were $0, $5, $10, and $15, and the probabilities of each prize varied from choice to choice, and from lottery to lottery. In the loss frame experiments subjects were given an initial endowment of $15, and the corresponding prizes from the gain frame lotteries were transformed to be -$15, -$10, -$5, and $0. Hence, the final outcomes, inclusive of the endowment, were the same in the gain frame and loss frame. In the mixed frame experiments subjects were given an initial endowment of $8, and the prizes were transformed to be -$8, -$3, $3, and $8, generating final outcomes inclusive of the endowment of $0, $5, $11, and $16.89 In addition to the fixed endowment, each subject received a random endowment between $1 and $10.
This endowment was generated using a uniform distribution defined over whole dollar amounts, operationalized by a 10-sided die. The purpose of this random endowment is to test for endowment effects on the choices. The probabilities used in each lottery ranged roughly evenly over the unit interval. Values of 0, 0.13, 0.25, 0.37, 0.5, 0.62, 0.75, and 0.87 were used.90 The presentation of a given lottery on the left or the right was determined at random, so that the ‘‘left’’ or ‘‘right’’ lotteries did not systematically reflect greater risk or greater prize range than the other. Subjects were recruited at the University of Central Florida, primarily from the College of Business Administration, using the online recruiting application at ExLab (http://exlab.bus.ucf.edu). Each subject received a $5
fee for showing up to the experiments, and completed an informed consent form. Subjects were deliberately recruited for "staggered" starting times, so that subjects would not pace their responses by any other subject. Each subject was presented with the instructions individually, and taken through the practice sessions at an individual pace. Since the rolls of the dice were important to the implementation of the objects of choice, the experimenters took some time to give each subject "hands-on" experience with the (10-sided, 20-sided, and 100-sided) dice being used. Subjects were free to make their choices as quickly or as slowly as they wanted. Our data consist of responses from 158 subjects making 9,311 choices that do not involve indifference. Only 1.7% of the choices involved an explicit choice of indifference, and to simplify we drop those in estimation unless otherwise noted. Of these 158 subjects, 63 participated in gain frame tasks, 37 participated in mixed frame tasks, and 58 participated in loss frame tasks.
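The frame arithmetic just described can be checked directly. The following is a minimal sketch (our illustration, not code from the study), with the negative loss frame and mixed frame prizes implied by the stated final outcomes:

```python
# Sketch verifying the frame arithmetic described above: the endowment
# plus the transformed prizes reproduces the stated final outcomes.

gain_prizes = [0, 5, 10, 15]      # gain frame prizes, in dollars
loss_prizes = [-15, -10, -5, 0]   # loss frame prizes, $15 endowment
mixed_prizes = [-8, -3, 3, 8]     # mixed frame prizes, $8 endowment

loss_outcomes = [p + 15 for p in loss_prizes]
mixed_outcomes = [p + 8 for p in mixed_prizes]

# Loss frame final outcomes match the gain frame prizes exactly.
assert loss_outcomes == gain_prizes
# Mixed frame final outcomes are $0, $5, $11, and $16.
assert mixed_outcomes == [0, 5, 11, 16]
```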
APPENDIX C. THE EXPERIMENTS OF HOLT AND LAURY (2002)

C1. Explaining the Data

Holt and Laury (2002) examine two main treatments with 212 subjects. The first is the effect of incentives. They vary the scale of the payoffs in the matrix shown in panel A of Table 1, which we take to be a scale of 1. Every subject was presented with the first matrix of choices shown in panel A of Table 1, and with the exact same matrix at the end of the experiment. These two choices were always given to all subjects, and we will refer to them as task #1 and task #4. All subjects additionally had one or two intermediate choices, referred to here as task #2 and task #3. The question in task #2, if asked, was a higher-scale, hypothetical version of the initial matrix of payoffs. The question in task #3, if asked, was the same higher-scale version of payoffs but with real payoffs. Some subjects were asked only one of these intermediate questions (hence for those subjects task #4 was actually their third and last task); most subjects were asked both of them. Thus, we obtain the tabulation of individual responses shown in Table 9. We see from Table 9 how each subject experienced different scales of payoffs in task #2 and/or task #3. This provides in-sample tests of the hypothesis that risk aversion does not vary with wealth, an important issue for those who assume specific functional forms such as CRRA or CARA.
Table 9. Sample Size and Design of the Holt and Laury (2002) Experiments.

Scale of Payoffs   Task 1   Task 2   Task 3   Task 4   Total
1                     212        -        -      212     424
20                      -      118      150        -     268
50                      -       19       19        -      38
90                      -       18       18        -      36
All                   212      155      187      212     766
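The tallies in Table 9 follow mechanically from the design description; a small sketch (our illustration) that reproduces them:

```python
# All 212 subjects face tasks #1 and #4 at scale 1; the intermediate
# tasks #2 (hypothetical) and #3 (real) are at scales 20, 50, or 90.

task2 = {20: 118, 50: 19, 90: 18}   # subjects asked task #2, by scale
task3 = {20: 150, 50: 19, 90: 18}   # subjects asked task #3, by scale

totals = {1: 212 + 212}             # responses at scale 1 (tasks #1 and #4)
for scale in (20, 50, 90):
    totals[scale] = task2[scale] + task3[scale]

assert totals == {1: 424, 20: 268, 50: 38, 90: 36}
assert sum(totals.values()) == 766  # grand total of responses in Table 9
```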
A rejection of the "constancy" assumption in CRRA or CARA is not a rejection of EUT in general, of course, but just of these particular (popular) parameterizations. In Section 3.7 and Appendix E, we see that some studies unfortunately equate "rejection of EUT" with "rejection of CRRA." The second treatment in the HL design is the effect of hypothetical payoffs, which is why the questions in task #2 are included. Economic theory has no prediction when the task is not salient, and the experimenter has no control over subject behavior. The effect of using hypothetical responses is examined in depth in Harrison (2007) using these and other data, since the use of such data has been so prevalent in the empirical literature on the validity of EUT, but we do not consider them any further here. There is considerable evidence, bolstered by Holt and Laury (2005), that risk attitudes elicited with hypothetical responses are significantly different from risk attitudes elicited with real economic consequences, so this is a debate simply not worth pursuing. Although having in-sample responses is valuable, it comes at a price in terms of control, since there may be wealth effects from the subjects having earned some profit in the previous choice. To handle this HL use a nice trick: when the subjects proceed from task #1 to task #3, they are first asked if they are willing to give up their earnings in task #1 in order to play task #3. Since the stakes are so much higher in task #3, all subjects chose to do so. This means that the subjects face tasks #1 and #3 with no prior earnings from these experiments, although they do have experience with the type of task when facing task #3. No such trick can be applied for task #4, since the subjects would be unlikely to give up their earnings in task #3 in this instance. Thus, the responses to task #4 have no controls for wealth built into the design. However, we do know the actual earnings of the subjects from the experimental data.
Fig. 27. Observed Choices in Holt and Laury (2002) Experiments. [Four panels (payoffs of 1× and 20×, 1× and 50×, 1× and 90×, and the same payoffs of 1× twice) plot the proportion of safe choices against the problem sequence from 1 to 10, each with a risk-neutral benchmark line.]
HL also ask each subject to fill out a detailed questionnaire of individual demographic information, so their data include a rich set of controls for differences in risk preferences due to these characteristics. Fig. 27 shows the main responses in the HL experiments. Consider the top left panel, which shows the average number of choices of the "safe" option A in each problem. In Problem 1, which is row 1 in panel A of Table 1, virtually everyone chooses option A (the safe choice). By the time the subjects get to Problem 10, which is the last row in panel A of Table 1, virtually everyone has switched over to option B, the "risky" option. The dashed line marked RN shows the prediction if each and every subject were risk neutral: in this case everyone would choose option A up to Problem 4, then everyone would choose option B thereafter. The solid line marked with a circle shows the observed behavior in task #1, the low-payoff case. The solid line marked with a diamond shows the observed behavior in task #3, the high-payoff case. In the top left panel, the high payoff refers to payoff matrices that scale up the values in panel A of Table 1 by 20. The top right panel in Fig. 27 shows comparable data for the 50 problems, and the bottom left panel shows comparable data for the 90 problems.91
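The risk-neutral prediction described here can be computed from the published Holt and Laury (2002) payoffs (option A pays $2.00 or $1.60, option B pays $3.85 or $0.10, with probability k/10 on the higher payoff in problem k; Table 1 itself is not reproduced in this appendix). A minimal sketch, not the authors' code:

```python
# Find the first problem where the risky option B has the higher
# expected value; a risk-neutral subject switches from A to B there.

def expected_value(p_high, high, low):
    return p_high * high + (1 - p_high) * low

switch = None
for k in range(1, 11):
    p = k / 10
    ev_a = expected_value(p, 2.00, 1.60)   # safe option A
    ev_b = expected_value(p, 3.85, 0.10)   # risky option B
    if switch is None and ev_b > ev_a:
        switch = k

assert switch == 5  # choose A in Problems 1-4, B from Problem 5 onward
```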
We examine the bottom-right panel later. HL proceed with their analysis by looking at the first three pictures in Fig. 27 and drawing two conclusions. First, that one has to introduce some "noise" into any model of the data-generation process, since the observed choices are "smoother" than the risk-neutral prediction. A more general way of saying this is to allow subjects to have a specific degree of risk aversion, but, critically, to assume that they all have exactly the same degree of risk aversion. Thus, if subjects were a little risk averse the line marked RN would shift to the right, dropping down at Problem 6 or 7 instead of Problem 5. Of course, it would no longer represent risk-neutral responses, but it would still drop sharply, and that is the point being made by HL when arguing for a noise parameter. Second, and related to the previous explanation, the best-fitting line that assumes homogeneous risk preferences would have to be a bit to the right of the risk-neutral line marked RN. So some degree of risk aversion, they argue, is needed to account for the location of the observed averages, quite apart from the need for a noise parameter to account for the smoothness of the observed averages. Both conclusions depend critically on the assumption that every subject in the experiment has the same preferences over risk. The smoothness of the observed averages is easily explained if one allows heterogeneous risk attitudes and no noise at all at the individual level: some people drop down at Problem 4, some more at Problem 5, some more at Problem 6, and so on. The smoothness that the eyeball sees in the aggregate data is just a counterpart of averaging this heterogeneous process. The fact that some degree of risk aversion is needed for some subjects is undeniable, from the positive area above the RN line and below the circle or diamond lines from Problems 5 through 10.
But it simply does not follow, without further statistical analysis, that all subjects, or even the typical subject, exhibit significant amounts of risk aversion. Nor does it follow that a noise parameter is needed to model these data. These conclusions follow from inspection of each of the first three panels of Fig. 27, and just the RN and circle lines in each, for that matter. Now turn to the comparison of the circle and diamond lines within each of the first three panels. The eyeball suggests that the diamond lines are to the right of the circle lines, which implies that risk aversion increases as the scale of payoffs increases. But this conclusion requires some measure of the uncertainty of these averages. Not surprisingly, the standard deviation in responses is largest around Problems 5 through 7, suggesting that the confidence intervals around these diamond and circle lines could easily
overlap. Again, this is a matter for an appropriate statistical analysis, not eyeball inspection of the averages. Finally, compare the differences between the diamond and circle lines as one scans across the first three panels in Fig. 27. As the payoff scale gets larger, from 20 to 50 and then to 90, it appears that the gap widens. That is, if one ignores the issue of standard errors around these averages, it appears that the degree of risk aversion increases. This leads HL to reject CRRA and CARA, and to consider generalized functional forms for utility functions that admit of increasing risk aversion. However, as Table 9 shows, the sample sizes for the 50 and 90 treatments were significantly smaller than those for the 20 treatment: 38 and 36 subjects, respectively, compared to 268 subjects for the 20 treatments. So one would expect that the standard errors around the 50 and 90 high-payoff lines would be much larger than those around the 20 high-payoff lines. This could make it difficult to statistically draw the eyeball conclusion that scale increases risk aversion. Finally, one needs to account for the fact that all of the high-payoff data in the HL experiments was obtained in a task that followed the low-payoff task. Income effects were controlled for, in an elegant manner described above. But there could still be simple order effects due to experience with the qualitative task. HL recognize the possibility of order effects when discussing why they had the high hypothetical task before the high real task: ‘‘Doing the high hypothetical choice task before high real allows us to hold wealth constant and to evaluate the effect of using real incentives. For our purposes, it would not have made sense to do the high real treatment first, since the careful thinking would bias the high hypothetical decisions.’’ The same (correct) logic applies to comparisons of the second real task with the first real task. 
The bottom-right panel examines the data collected by HL in task #1 and task #4, which have the same scale but differ only in terms of the order effect and the accumulated wealth from task #3. These lines appear to be identical, suggesting no order effect, but a closer statistical analysis that conditions on the two differences shows that there is in fact an order effect at work.
C2. Modeling Behavior One of the major contributions of HL is to present ML estimates of a relatively flexible utility function using their data. Recognizing the apparent changes in RRA with the scale treatments, they note that CRRA would not
be appropriate, and use a parameterization of the expo-power (EP) function introduced by Saha (1993).
APPENDIX D. THE EXPERIMENTS OF KACHELMEIER AND SHEHATA (1992)

To illustrate the use of the BDM procedure, and to point to some potential problems, consider the "high payoff" experiments from China reported by Kachelmeier and Shehata (1992). These involved subjects facing lotteries with prizes equal to 0.5 yuan, 1 yuan, 5 yuan, or 10 yuan. Although 10 yuan only converted to about $2.50 at the time of the experiments, this represented a considerable amount of purchasing power in that region of China, as discussed by KS (p. 1123). There were four treatments. One treatment used 25 lotteries with 5 yuan, one used 25 lotteries with 10 yuan, one used 25 lotteries with 0.5 yuan followed by 25 lotteries with 5 yuan, and one used 25 lotteries with 1 yuan followed by 25 lotteries with 10 yuan. In all cases, the first of the battery of 25 lotteries was a hypothetical trainer, and is ignored in the analysis shown below. Figs. 28 and 29 show the data from the experiments of KS in China. The vertical axis shows the ratio of the elicited CE to the expected value of the lottery, and the horizontal axis shows the probability of winning each lottery. Each panel in Fig. 28 shows a scatter of data from each prize treatment. In Fig. 28 we only show data from the first series of lottery choices, for comparability in terms of experience. In Fig. 29 we show the results for the high-prize treatments, with the first series on top and the second series on the bottom, to show the effect of experience with the general task.92 To orient the analysis, a simple cubic spline is drawn through the median bands; these lines are consistent with the formal statistical analysis reported below, but help explain certain features of the data. Four properties of these responses are evident from the pictures. First, the general tendency towards risk-loving behavior at the lower three prize levels, as evidenced by the CEs being greater than the expected value in Fig. 28.
Second, the dramatic reduction in the dispersion of selling prices as the probability of winning increases to 1, as evidenced by the pattern of the scatter within each panel. Indeed, these pictures discard data for probabilities less than 0.15, and for ratios greater than 2.5, to allow reasonable scaling. The discarded data exhibit even more dramatic dispersion than is already evident at probability levels of 0.25. Third, the
Fig. 28. Risk Premia and Probability of Winning in First Series of Kachelmeier–Shehata Experiments. [Panels by first-task prize scatter the ratio of the elicited certainty-equivalent to the expected value against the probability of winning.]

Fig. 29. Risk Premia and Probability of Winning in High Stakes Kachelmeier–Shehata Experiments. [High-prize treatments, first series on top and second series below; axes as in Fig. 28.]
responses for the highest prize treatment in Fig. 28 are much closer to being risk neutral or risk averse, at least for winning probabilities greater than around 0.2. Finally, the data in Fig. 29 suggest that subjects are less risk loving when they have experience with the task, and/or that there is an increase in risk aversion due to the accumulation of experimental income from the first series of tasks. Since the BDM method generates a CE for each lottery, it is possible to estimate the CRRA coefficient directly for each response that a subject makes using the BDM method.93 If p is the winning probability for prize y, and s is the CE elicited as a selling price from the subject, then the coefficient is equal to 1 - [ln(p)/(ln(s) - ln(y))]. In this form, a value of zero indicates risk neutrality, and negative (positive) values risk-loving (risk-averse) behavior. The behavior of the CRRA coefficient elicited using the BDM method is extremely sensitive to experimental conditions, even if one restricts attention to the high-stakes lotteries and win probabilities within 15% of the boundaries.94 First, the coefficients for low win probabilities imply extreme risk loving. This is perfectly plausible given the paltry stakes involved in such lotteries. Second, the coefficient depends on accumulated earnings, as hypothesized by McKee (1989). Increases in the average accumulated income earned in the task increase risk aversion, and increases in the three-round moving average of income decrease risk aversion.95 Third, "bad joss," as measured by the fraction of random buying prices below the expected buying price of 50% of the prize, is associated with a large increase in risk-loving behavior.96 Fourth, as Fig. 29 would suggest, experience with the general task increases risk aversion. Fifth, increasing the prize from 5 yuan to 10 yuan increases risk aversion significantly.
Of course, this last result is consistent with non-constant RRA, and should not necessarily be viewed as a problem unless one insisted on applying the same CRRA coefficient over these two reward domains. Fig. 30 summarizes the distribution of CRRA coefficients for the high-stakes task decisions in KS. The dispersion of estimates is high, even though there is a marked tendency towards risk neutrality with the 10 yuan task and with experienced subjects. One of the key results here, as stressed by Kachelmeier and Shehata (1994), is that there is considerable variation in CRRA coefficients within each subject's sample of responses, as well as between subjects. The within-subjects standard deviation in CRRA coefficients is 1.10, and the between-subjects standard deviation is 1.13, around a mean of -1.36. To deal with some of these problems we recommend paying subjects for just one stage to avoid intra-session income effects, the use of a physical randomizing device to encourage subjects to see the random buyout price as independent of their selling price, the use of winning probabilities between 1/4 and 3/4 to avoid
Fig. 30. Estimates of Risk Aversion from Kachelmeier–Shehata Experiments. [Histograms of estimated CRRA coefficients for interior winning probabilities, for first- and second-task prizes of 5 and 10 yuan.]
the more extreme effects of the end-point probabilities, and the provision of experience in the task in a completely prior session. We would also utilize extended instructions along the lines developed by Plott and Zeiler (2005).
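The CRRA back-out used in this appendix is easily sketched: with CRRA utility u(x) = x^(1-r)/(1-r), the selling price s for a lottery paying y with probability p satisfies p*y^(1-r) = s^(1-r), which rearranges to r = 1 - ln(p)/(ln(s) - ln(y)). A minimal illustration (ours, not the authors' code):

```python
import math

def crra_from_bdm(p, s, y):
    """Back out the CRRA coefficient from a BDM selling price s for a
    lottery paying prize y with probability p (and zero otherwise)."""
    return 1 - math.log(p) / (math.log(s) - math.log(y))

# A risk-neutral seller prices the lottery at its expected value p*y,
# which gives a coefficient of zero.
assert abs(crra_from_bdm(0.5, 0.5 * 10, 10)) < 1e-9

# Selling prices above (below) the expected value imply risk-loving
# (risk-averse) behavior, i.e. negative (positive) coefficients.
assert crra_from_bdm(0.5, 6, 10) < 0
assert crra_from_bdm(0.5, 4, 10) > 0
```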
APPENDIX E. THE EXPERIMENTS OF GNEEZY AND POTTERS (1997)

The experimental task of Gneezy and Potters (1997) was very simple, and was followed exactly by Haigh and List (2005). Each subject in the baseline treatment made nine decisions over a fixed stake. In GP this stake was 2 Dutch Guilders, which we will call $2.00 for pedagogic ease. In each round they could choose a fraction of the stake to bet. If they chose to bet nothing then they received $2.00 in that round for certain. If they bet $x then they faced a 2/3 chance of losing $x and a 1/3 chance of winning $2.5x. These earnings were on top of the initial stake of $2.00. Thus, the subject literally ended up with ($2.00 - $x) with probability 2/3 and ($2.00 + $2.5x) with probability 1/3. Since $x could not exceed $2.00, by design, the subject actually faced no losses for the round as a whole. Of course, if one ignores the $2.00 stake the subject did face a loss. In the baseline condition the subject
chose a bet in each round, the random outcome was realized, their earnings in that round were tabulated, and then the next round decision was made. In the alternative treatment the subject made three decisions instead of nine. The first decision was a single amount to bet in each of rounds 1 through 3, the second decision was a single amount to bet in each of rounds 4 through 6, and the third decision was a single amount to bet in each of rounds 7 through 9. Thus, the subject made one decision that applied to each of the outcomes in rounds 1, 2, and 3. To state it equivalently, since this is critical to follow, one decision was simply applied three times: it is not the case that the subject made three separate decisions at round 1 that were applied in rounds 1, 2, and 3, respectively. The subject could not say "bet x, y, and z% in rounds 1, 2, and 3," but could only say "bet x%," meaning that x% would be bet for the subject in each of rounds 1, 2, and 3. In all other respects the experimental task was the same: the only thing that varied was the horizon over which the choices were made. This is referred to as the Low frequency treatment (L), and the baseline is referred to as the High frequency treatment (H). The raw data in the two sets of experiments are presented in Figs. 31 and 32, which show the distribution of percentage bets. The general qualitative outcome is for subjects to bet more in the L treatment than in the H treatment. Gneezy and Potters (1997; Table I, p. 639) report that 50.5% was bet in their treatment H and 67.4% in their treatment L over all 9 rounds. They conducted their experiments with 83 Dutch students, split roughly evenly across the two treatments in a between-subjects design.
Haigh and List (2005) (HLI) report virtually the same outcomes: for their sample of 64 American college students, the fractions were 50.9% and 62.5%, respectively, and for their sample of 54 current and former traders from the Chicago Board of Trade the fractions were 45% and 75%, respectively.97 Using unconditional non-parametric tests or panel Tobit models, these differences are statistically significant at standard levels.98 Thus, it appears that samples of subjects drawn from the same population behave as if more risk averse in treatment H compared to treatment L, and that the average subject is risk averse. The latter inference follows from the fact that a risk-neutral subject, according to EUT, would bet 100% of the stake. Figs. 31 and 32 also alert us to one stochastic feature of these data that will play a role later: there is a substantial spike at the 100% bet level. From an EUT perspective, this corresponds to subjects who are risk neutral or risk loving. If we just consider "interior bets" then the same qualitative results obtain. In GP, the Low frequency treatment generates an average 42.1% bet
Fig. 31. Distribution of Percentage Bets in Gneezy and Potters (1997) Experiments. [Histograms of the percentage of stake bet, High frequency on top and Low frequency below.]

Fig. 32. Distribution of Percentage Bets in Haigh and List (2005) Experiments. [The same histograms, split by students and traders.]
compared to an average 33.9% bet in the High frequency treatment. In HLI, the students (traders) bet an average of 37.7% (25.3%) in the High frequency treatment and 51.4% (59.3%) in the Low frequency treatment.
E1. Explaining the Data

When interpreting the experiments of GP and HLI it is important to view subjects as having a utility function that is defined over prize income that reflects the stakes that choices are being made over. The high frequency subjects can be viewed as making a series of nine choices over stakes defined, for each choice, by a vector y which takes on a range of integer values between $0 and $7. The subject could get $0 if they bet 100% of the stake and lost it; or they could get as much as $7 if they bet 100% of the stake and won 2.5 × $2.00. The low frequency subjects, on the other hand, made three choices over stakes defined by the possible combinations of gains and losses over three random draws. Thus, they could end up with three losses, two losses and one gain, one loss and two gains, or three gains. The probabilities for each outcome, irrespective of order, are 0.30, 0.44, 0.22, and 0.04, respectively. The monetary outcome in each case depends on the fraction of the stake that the subject chose to bet. Table 10 spells out the arithmetic for different bets. For simplicity we evaluate the possible choices in increments of 10 cents, but of course the choices could be in pennies.99 The second column shows the bet as a percent of the stake of $2.00. Columns 3 through 7 show the components of the lottery facing the subject in the High frequency treatment for each possible bet, and columns 8 through 16 show the same components for the subject in the Low frequency treatment. Consider, for example, a bet of 10 cents, which is 5% of the stake. If the subject is in the High treatment and loses, they earn 190 (= 200 - 10) cents in that period; this occurs with probability 2/3. If the subject is in the High treatment and wins, they earn 225 (= 200 + 10 × 2.5 = 200 + 25) cents; this occurs with probability 1/3. In the corresponding entry for the subject in the Low treatment, the value of prizes is calculated similarly, but for three random draws.
Thus, in the LLL outcome, the subject earns 570 (= 3 x (200 - 10)) cents. From Table 10 we see instantly that a risk-neutral subject that obeyed EUT would bet 100% of the pie in both treatments and thereby maximize expected value. It can also be inferred that a moderately risk-averse subject would bet some fraction of the pie in each treatment, less than 100%, and that a risk-loving subject would always bet 100% of the pie.
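The arithmetic just described can be sketched compactly. The following is an illustrative Python restatement (the chapter's own code is Stata, and the function names here are ours):

```python
from math import comb

# Sketch of the Table 10 arithmetic: a stake of 200 cents per round, the bet is
# lost with probability 2/3, and a win pays 2.5 times the bet.
STAKE, WIN_MULT, P_WIN = 200, 2.5, 1 / 3

def high_frequency_lottery(bet):
    """One round: list of (payoff in cents, probability) for a bet in cents."""
    return [(STAKE - bet, 1 - P_WIN), (STAKE + WIN_MULT * bet, P_WIN)]

def low_frequency_lottery(bet):
    """Three rounds evaluated together: payoffs and probabilities by number of wins."""
    outcomes = []
    for wins in range(4):  # 0, 1, 2, or 3 gains out of 3 draws
        p = comb(3, wins) * P_WIN ** wins * (1 - P_WIN) ** (3 - wins)
        payoff = 3 * STAKE - (3 - wins) * bet + wins * WIN_MULT * bet
        outcomes.append((payoff, p))
    return outcomes

def expected_value(lottery):
    return sum(x * p for x, p in lottery)
```

A 10-cent bet (5% of the stake) reproduces the entries discussed in the text: 190 or 225 cents in the High treatment, and 570 cents for the LLL outcome in the Low treatment, with outcome probabilities 0.30, 0.44, 0.22, and 0.04.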
Table 10.  Possible Choices - Illustrative Calculations Assuming Risk Neutrality.

                      High Frequency Treatment                       Low Frequency Treatment
Bet as  Bet in  -----------------------------------  -------------------------------------------------------------
   %     cents    L    p(L)    G    p(G)     EV       LLL  p(LLL)   LLG  p(LLG)   LGG  p(LGG)   GGG  p(GGG)    EV
   0       0     200   0.67   200   0.33    200.0     600   0.30    600   0.44    600   0.22    600   0.04    600
   5      10     190   0.67   225   0.33    201.7     570   0.30    605   0.44    640   0.22    675   0.04    605
  10      20     180   0.67   250   0.33    203.3     540   0.30    610   0.44    680   0.22    750   0.04    610
  15      30     170   0.67   275   0.33    205.0     510   0.30    615   0.44    720   0.22    825   0.04    615
  20      40     160   0.67   300   0.33    206.7     480   0.30    620   0.44    760   0.22    900   0.04    620
  25      50     150   0.67   325   0.33    208.3     450   0.30    625   0.44    800   0.22    975   0.04    625
  30      60     140   0.67   350   0.33    210.0     420   0.30    630   0.44    840   0.22   1050   0.04    630
  35      70     130   0.67   375   0.33    211.7     390   0.30    635   0.44    880   0.22   1125   0.04    635
  40      80     120   0.67   400   0.33    213.3     360   0.30    640   0.44    920   0.22   1200   0.04    640
  45      90     110   0.67   425   0.33    215.0     330   0.30    645   0.44    960   0.22   1275   0.04    645
  50     100     100   0.67   450   0.33    216.7     300   0.30    650   0.44   1000   0.22   1350   0.04    650
  55     110      90   0.67   475   0.33    218.3     270   0.30    655   0.44   1040   0.22   1425   0.04    655
  60     120      80   0.67   500   0.33    220.0     240   0.30    660   0.44   1080   0.22   1500   0.04    660
  65     130      70   0.67   525   0.33    221.7     210   0.30    665   0.44   1120   0.22   1575   0.04    665
  70     140      60   0.67   550   0.33    223.3     180   0.30    670   0.44   1160   0.22   1650   0.04    670
  75     150      50   0.67   575   0.33    225.0     150   0.30    675   0.44   1200   0.22   1725   0.04    675
  80     160      40   0.67   600   0.33    226.7     120   0.30    680   0.44   1240   0.22   1800   0.04    680
  85     170      30   0.67   625   0.33    228.3      90   0.30    685   0.44   1280   0.22   1875   0.04    685
  90     180      20   0.67   650   0.33    230.0      60   0.30    690   0.44   1320   0.22   1950   0.04    690
  95     190      10   0.67   675   0.33    231.7      30   0.30    695   0.44   1360   0.22   2025   0.04    695
 100     200       0   0.67   700   0.33    233.3       0   0.30    700   0.44   1400   0.22   2100   0.04    700

Note: Payoffs are in cents. The bold row in the original (the 100% bet) shows the EUT-consistent
choices for a risk-neutral subject. p(LLL) = 2/3 x 2/3 x 2/3; p(LLG) = 2/3 x 2/3 x 1/3, and can
occur in three equivalent ways (LLG, LGL, and GLL), so the probability shown is 2/3 x 2/3 x 1/3 x 3;
p(LGG) = 2/3 x 1/3 x 1/3, and can also occur in three equivalent ways; and p(GGG) = 1/3 x 1/3 x 1/3.
Risk Aversion in the Laboratory 177
178
GLENN W. HARRISON AND E. ELISABET RUTSTRO¨M
The outcomes of the lotteries being evaluated by subjects in the High and Low treatments differ significantly. Consider the 50% bet, in the middle of Table 10. For subjects in the High treatment the two final outcomes from each choice are 100 and 450, occurring with the probabilities shown there. For subjects in the Low treatment there are four final outcomes from each choice: 300, 650, 1,000, and 1,350. Thus, the monetary rewards from the same percentage choice differ significantly. So, to explain why subjects in the High treatment are more risk averse than subjects in the Low treatment, it suffices at a qualitative level to find some utility function that has moderate amounts of risk aversion for "low" income levels and smaller amounts of risk aversion for "higher" income levels. Although less obvious than the RN prediction, any subject exhibiting CRRA would choose the same bet fraction in each row. The more risk averse they were, the smaller would be the bet, but it would be the same bet in each of the High and Low treatments. This result is important since every statement of "the EUT null hypothesis" in the MLA literature that we can find uses RN or CRRA specifications for the utility function.100 Thus, it is easy to see why evidence of a difference between the bet fractions in the High and Low treatments is viewed as a rejection of EUT. Of course, this does not test EUT at all. It only tests a very special case of EUT, where the specific functional form seems to have been chosen to perform poorly.101 It is easy to propose more flexible utility functions than CRRA. There are many such functions, but one of the most popular in recent work that is fully consistent with EUT has been the expo-power (EP) utility function proposed by Saha (1993). Following Holt and Laury (2002), the EP function is defined as

    U(x) = (1 - exp(-a x^(1-r))) / a

where a and r are parameters to be assumed or estimated. RRA is then r + a(1-r)y^(1-r), so RRA varies with income y if a ≠ 0.
This function nests CRRA (as a → 0) and CARA (as r → 0). At a qualitative level, if r > 0 and a < 0 one can immediately rationalize the qualitative data in these experiments: RRA = r + a(1-r)y^(1-r) → r as y → 0, and then one has declining RRA with higher prize incomes since a < 0.
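A small numerical sketch of these claims, in Python for illustration (function names are ours; the parameter values are merely in the spirit of the estimates in Table 11):

```python
from math import exp

def ep_utility(y, r, a):
    """Expo-power utility U(y) = (1 - exp(-a * y**(1 - r))) / a (Saha, 1993)."""
    return (1.0 - exp(-a * y ** (1.0 - r))) / a

def rra(y, r, a):
    """Relative risk aversion implied by expo-power: r + a*(1 - r)*y**(1 - r)."""
    return r + a * (1.0 - r) * y ** (1.0 - r)

# With r > 0 and a < 0, RRA starts near r for tiny incomes and declines as
# income rises; as a -> 0 the function converges to the CRRA benchmark y**(1-r).
R, A = 0.67, -0.44  # illustrative values only
```

Evaluating rra at increasing incomes traces out the declining-RRA pattern that rationalizes the difference between the High and Low treatments.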
E2. Modeling Behavior

The qualitative insight that one can explain these data with a simple EUT specification can be formalized by estimating the parameters of a model that
accounts for the data. Such an exercise also helps explain some differences between the traders and students in Haigh and List (2005). As noted earlier, Figs. 31 and 32 alert us to the fact that the behavioral process generating data at the 100% bet level may be different than the process generating data at the "interior" solutions. From a statistical perspective, this is just a recognition that a model that tries to explain the interior modes of these data, and why they vary between the High and Low treatments, might have a difficult time also accounting for the spike at 100%. One approach is just to ignore that spike, and see what estimates obtain. Another approach is to construct a model and likelihood function that accounts for these two processes.102 We apply both approaches, although favoring the latter a priori. The dependent variable is naturally characterized as the fraction of the stake bet, denoted p. Therefore, the likelihood function is constructed using the specification developed by Papke and Wooldridge (1996) for fractional dependent variables. Specifically, the log-likelihood of observation i is defined as l_i(ξ) = p_i log G(x_i, ξ) + (1 - p_i) log(1 - G(x_i, ξ)) for parameter vector ξ, a vector of explanatory variables x_i, and some convenient cumulative distribution function G(·). We use the cumulative Gamma distribution function G(z) = Γ(α, z), where α is a parameter that can be estimated.103 The index z_i is the expected utility of the bet chosen, conditional on some parameter estimates of ξ and some characteristics x_i for observation i. The index z is constructed using information on the lottery for the actual bet, reflecting a more detailed version of the arithmetic underlying Table 10. Thus, for a particular fractional bet, the parameters of the task imply that the subject was facing a particular lottery. So, one element of the x vector is whether or not the subject was in the High or Low treatment. Another element is the stake.
Yet another element is the set of parameters of the experimental task defining the lottery outcomes (e.g., the probabilities of a loss or a gain, and the numbers defining how the bet is scaled to define the loss or the gain). Using this information, and candidate estimates of r and a for the EP utility function, the likelihood constructs the expected utility of the observed choice, and the ML estimates find the parameters of the EP utility function that best explain the observed choices. This approach can be applied directly to the data in Figs. 31 and 32, recognizing that one model must explain the multiple modes of these distributions. Alternatively, one can posit a natural two-step decision process, where the subject first decides if they are going to bet everything or not, and then if they decide not to, decides how much to bet (including 0%). This might correspond to one way that a risk-averse or risk-loving subject
might process such tasks: first figure out what an RN decision-maker would do, since that is computationally easier, and then shade one's choice in the direction dictated by risk preferences. Since the matrix in Table 10 was not presented to subjects in such an explicit form, this would be one sensible heuristic to use. Irrespective of the interpretation, this proposed decision process implies a statistical "hurdle" model. First the subject makes a binary choice to bet 100% or less. Then the subject decides what fraction to bet, conditional on betting less than 100%. The first stage can be modeled using a standard probit specification, although it is the second stage that is really of greatest interest. A key feature of these estimates is that they pool the data from High and Low treatments. The objective is to ascertain if one EUT-consistent model can explain the shift in the distributions between these treatments in Figs. 31 and 32. Since each subject provided multiple observations there are clustering corrections for the possible correlation of errors associated with a given subject. Table 11 reports the results of ML estimation of these models. Panel A provides estimates for the individual responses from Gneezy and Potters (1997). These estimates show some initial risk aversion at zero income levels (r = 0.21) and then some slight evidence of declining RRA as income rises (a = -0.019). However, the evidence of declining RRA is not statistically significant, although the 95% confidence interval is skewed towards negative values. Much more telling evidence comes from comparable estimates for the interior bets, in panel B. Here we find striking evidence of the qualitative explanation presented earlier: initial risk aversion at zero income levels (r = 1.12) and sharply declining RRA as income rises (a = -0.57). The point estimate of r exceeds 1 in this case, which violates the assumption of non-satiation.
But the standard error on this estimate is 0.25, with a 95% confidence interval between 0.61 and 1.63. So we cannot reject the hypothesis that r ≤ 1; in fact, the p-value for the hypothesis that the coefficient equals 1 is 0.68, so we cannot reject that specific hypothesis. Panels C through E report estimates for the treatments of Haigh and List (2005), estimated separately for traders and students since that was their main treatment. With the exception of the estimates in panel E, for all bets by University of Maryland (UMD) students, these results again confirm the qualitative explanation proposed above. Therefore, one must simply reject the conclusion of Haigh and List (2005, p. 531) that their "findings suggest that expected utility theory may not model professional traders' behavior well, and this finding lends credence to behavioral economics and finance
Table 11.  Maximum Likelihood Estimates of Expo-Power Utility Function.

Coefficient    Estimate   Standard Error   p-Value   Lower 95% CI   Upper 95% CI

A. Gneezy and Potters (1997) - Estimates for All Bets by Dutch Students
    r             0.21         0.08         0.009        0.06           0.37
    a            -0.02         0.03         0.463       -0.07           0.03
    α             2.32         0.22         0.000        1.87           2.76

B. Gneezy and Potters (1997) - Estimates for Interior Bets by Dutch Students
    r (a)         1.12         0.25         0.000        0.61           1.63
    a            -0.57         0.09         0.000       -0.74          -0.40
    α             1.88         0.29         0.000        1.30           2.46

C. Haigh and List (2005) - Estimates for All Bets by CBOT Traders
    r             0.36         0.05         0.000        0.26           0.46
    a            -0.13         0.02         0.000       -0.16          -0.10
    α             3.67         0.42         0.000        2.82           4.53

D. Haigh and List (2005) - Estimates for Interior Bets by CBOT Traders
    r             0.67         0.04         0.000        0.60           0.74
    a            -0.44         0.01         0.000       -0.46          -0.42
    α             3.69         0.34         0.000        3.01           4.37

E. Haigh and List (2005) - Estimates for All Bets by UMD Students (b)
    r            -0.99         0.27         0.001       -1.54          -0.44
    a             0.22         0.05         0.000        0.13           0.32
    α             1.71         0.21         0.000        1.28           2.13

(a) See text for discussion of the point estimate for r exceeding 1, since that violates the
non-satiation assumption for this specification.
(b) There are no estimates for the sub-sample of interior bets, since the estimate of r exceeds 1,
and is statistically significantly greater than 1.
models, which are beginning to relax inherent assumptions used in standard financial economics." Whether MLA models the behavior of traders better than EUT is a separate matter, but EUT easily explains the data. In fact, these data are more consistent with the priors that motivated the Haigh and List (2005) study, illustrated by List (2003), that students would be more likely to exhibit anomalies than field traders.

E3. Coals to Newcastle: An Anomaly for the Behaviorists

The reason that MLA is interesting is that Benartzi and Thaler (1995) use it to provide an intuitive explanation for the equity premium puzzle. Their
empirical approach is to assume a particular numerical specification of MLA, and then solve for the "evaluation horizon"104 of returns to stocks and bonds that makes their expected utility105 equivalent. They find that this horizon is roughly 12 months, which strikes one as a priori plausible if one had to pick a single representative evaluation horizon for all investors.106 Thus, they assume a particular empirical version of MLA and further assume that these coefficients do not change as they counterfactually calculate the effects of alternative evaluation horizons:

    According to our theory, the equity premium is produced by a combination of loss aversion and frequent evaluation. Loss aversion plays the role of risk aversion in standard models, and can be considered a fact of life (or, perhaps, a fact of preferences). In contrast, the frequency of evaluations is a policy choice that presumably could be altered, at least in principle. Furthermore, as the charts (…) show, stocks become more attractive as the evaluation period increases.
So the parameters of the MLA specification are assumed invariant to the evaluation horizon, as an essential premiss of the empirical methodology. Thus the motivation for the experiments of GP and HL. As GP note, Benartzi and Thaler (1995) "… do not present direct (experimental) evidence for the presence of MLA. The evidence presented in (BT) is only circumstantial. (…) We have experimental subjects making a sequence of risky choices. To analyze the presence of MLA, we do not try to estimate the period over which subjects evaluate financial outcomes, but rather we try to manipulate this evaluation period." Hence the data from GP can be used to recover the MLA preferences that are consistent with the observed behavior, and the empirical premiss of Benartzi and Thaler (1995) evaluated. Since behavioral economists are so enamored of anomalies, it may be useful to point out one or two in the MLA literature being considered here. The first anomaly is that the data from the experiments of GP demonstrate that the MLA parameters themselves depend on the evaluation horizon, which of course was varied by experimental design in their data. Hence one cannot assume that those parameters stay fixed as one calibrates the equity premium by varying the evaluation horizon. The second anomaly is that these data also imply risk attitudes defined over the utility function that are qualitatively the opposite of those customarily assumed. The MLA parameterization adopted by Benartzi and Thaler (1995, p. 79) is taken directly from Tversky and Kahneman (1992), both in terms of the functional forms and parameter values. They assume a power utility function defined separately over gains and losses: U(x) = x^a if x ≥ 0, and U(x) = -l(-x)^b if x < 0. So a and b are the risk aversion parameters, and
l is the coefficient of loss aversion. Tversky and Kahneman (1992, p. 59) provide estimates that have been universally employed in applied work by behaviorists: a = b = 0.88 and l = 2.25. Using the data from GP we estimate the parameters of this MLA model. For simplicity we assume no probability weighting, although that could be included. Benartzi and Thaler (1995, p. 83) and GP stress that it is the loss aversion parameter l that drives the main prediction of MLA, rather than probability weighting or even risk aversion in the utility function. The likelihood function is again constructed using the specification developed by Papke and Wooldridge (1996) for fractional dependent variables. Since there are no data on personal characteristics in the GP data, the x vector refers solely to whether or not the decision was made in the Low frequency setting or the High frequency setting. Thus, ξ = (a, b, l), and each of those fundamental parameters is estimated as a linear function of binary dummies for the Low and High frequencies.107

Table 12 reports the ML estimates obtained. The "good news" for MLA is that they provide strong evidence that the loss aversion parameter is greater than 1. The "bad news" for MLA is that they provide equally striking evidence that all of the parameters of the MLA specification vary with the evaluation horizon. The "awkward news" for MLA is that they provide inconsistent evidence about risk attitudes in relation to the received empirical wisdom. The estimates for a indicate risk-loving behavior over gains.108 There does not appear to be much difference in risk attitudes over gains between the two settings, and indeed one cannot reject the null hypothesis that they are equal with a Wald test (p-value = 0.391). The estimates for b indicate a severe case of risk aversion over losses. Moreover, subjects appear to be more risk averse in the Low frequency setting than in the High frequency setting: a Wald test of the null hypothesis of equality has a p-value of 0.074. Finally, the estimates for l are consistent with loss aversion, since both are significantly greater than 1 (p-values < 0.0001). However, these subjects appear to be significantly more loss averse in the High frequency setting than in the Low frequency setting (p-value = 0.0005). This new analysis of the GP data therefore implies that the MLA parameters depend on the evaluation horizon and that subjects are risk loving in gains and risk averse in losses, thus pointing to anomalies compared to the standard view of PT.

Table 12.  Maximum Likelihood Estimates of Myopic Loss Aversion Utility Function (a).

Coefficient   Variable         Estimate   Standard Error   p-Value   Lower 95% CI   Upper 95% CI
    a         Low frequency       1.48         0.04         0.000        1.40           1.55
    a         High frequency      1.38         0.10         0.000        1.18           1.59
    b         Low frequency       0.03         0.07         0.689       -0.11           0.17
    b         High frequency      0.55         0.28         0.052        0.00           1.10
    l         Low frequency       1.90         0.08         0.000        1.74           2.07
    l         High frequency      4.28         0.64         0.000        2.99           5.56

(a) Estimates from responses in Gneezy and Potters (1997) experiments.
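The two building blocks of this estimation are easy to state concretely. Below is a minimal Python sketch (function names are ours), with the logistic CDF standing in for the convenient CDF G; the text itself uses a cumulative Gamma, so this is an illustrative simplification rather than the chapter's estimator:

```python
from math import exp, log

def tk_value(x, a=0.88, b=0.88, lam=2.25):
    """Tversky-Kahneman (1992) power value function: x**a over gains,
    -lam * (-x)**b over losses, with lam the loss aversion coefficient."""
    return x ** a if x >= 0 else -lam * (-x) ** b

def fractional_loglik(p, z):
    """Papke-Wooldridge (1996) style contribution for a fractional outcome p:
    p*log(G(z)) + (1 - p)*log(1 - G(z)), using a logistic G for illustration."""
    g = 1.0 / (1.0 + exp(-z))
    return p * log(g) + (1.0 - p) * log(1.0 - g)
```

With the Tversky-Kahneman parameter values a = b = 0.88 and l = 2.25, a loss of a given size looms 2.25 times larger than the corresponding gain, which is the property the MLA prediction leans on.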
APPENDIX F. ESTIMATION USING MAXIMUM LIKELIHOOD

Economists in a wide range of fields are now developing customized likelihood functions to correspond to specific models of decision-making processes. These demands derive partly from the need to consider a variety of parametric functional forms, but also because these models often specify non-standard decision rules that have to be "written out by hand." Thus, it is becoming common to see user-written ML estimates, and less use of pre-packaged model specifications. These pedagogic notes document the manner in which one can estimate ML models of utility functions within Stata.109 However, we can quickly go beyond "utility functions" and consider a wide range of decision-making processes, to parallel the discussion in the text. We start with a standard CRRA utility function and binary choice data over two lotteries, assuming EUT. This step illustrates the basic economic and statistical logic, and introduces the core Stata syntax. We then quickly consider extensions to loss aversion and probability weighting from PT, the inclusion of "stochastic errors," and the estimation of utility numbers themselves to avoid any parametric assumption about the utility function. We then illustrate a replication of the ML estimates of HL. Once the basic syntax is defined from the first example, it is possible to quickly jump to other likelihood functions using different data and specifications. Of course, this is just a reflection of the "extensible power" of a package such as Stata, once one understands the basic syntax.110

F1. Estimating a CRRA Utility Function

Consider the simple CRRA specification in Section 2.2. This is an EUT model, with a CRRA utility function, and no stochastic error specification.
The following Stata program defines the model, in this case using the lottery choices of Harrison and Rutström (2005), which are a replication of the experimental tasks of Hey and Orme (1994):

* define Original Recipe EUT with CRRA and no errors
program define ML_eut0
        args lnf r
        tempvar prob0l prob1l prob2l prob3l prob0r prob1r prob2r prob3r y0 y1 y2 y3
        tempvar euL euR euDiff euRatio tmp lnf_eut lnf_pt p1 p2 f1 f2
        quietly {

                * construct likelihood for EUT
                generate double `prob0l' = $ML_y2
                generate double `prob1l' = $ML_y3
                generate double `prob2l' = $ML_y4
                generate double `prob3l' = $ML_y5

                generate double `prob0r' = $ML_y6
                generate double `prob1r' = $ML_y7
                generate double `prob2r' = $ML_y8
                generate double `prob3r' = $ML_y9

                generate double `y0' = ($ML_y14+$ML_y10)^`r'
                generate double `y1' = ($ML_y14+$ML_y11)^`r'
                generate double `y2' = ($ML_y14+$ML_y12)^`r'
                generate double `y3' = ($ML_y14+$ML_y13)^`r'

                gen double `euL' = (`prob0l'*`y0')+(`prob1l'*`y1')+(`prob2l'*`y2')+(`prob3l'*`y3')
                gen double `euR' = (`prob0r'*`y0')+(`prob1r'*`y1')+(`prob2r'*`y2')+(`prob3r'*`y3')
                generate double `euDiff' = `euR' - `euL'
                replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
                replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
        }
end
This program makes more sense when one sees the command line invoking it, and supplying it with values for all variables. The simplest case is where there are no explanatory variables for the CRRA coefficient (we cover those below):

ml model lf ML_eut0 (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = ) if Choices~=., cluster(id) technique(nr) maximize
The ‘‘ml model’’ part invokes the Stata ML model specification routine, which essentially reads in the ML_eut0 program defined above and makes sure that it does not violate any syntax rules. The ‘‘lf’’ part of ‘‘lf ML_eut0’’ tells this routine that this is a particular type of likelihood specification (specifically, that the routine ML_eut0 does not calculate analytical derivatives, so those must be calculated numerically). The part in brackets defines the equation for the CRRA coefficient r. The ‘‘r:’’ part just labels this equation, for output display purposes and to help reference initial values if they are specified for recalcitrant models. There is no need for the ‘‘r:’’ here to match the ‘‘r’’ inside the ML_eut0 program; we could have referred to
"rEUT:" in the "ml model" command. We use the same "r" to help see the connection, but it is not essential. The "Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake" part tells the program what observed values and data to use. This allows one to pass parameter values as well as data to the likelihood evaluator defined in ML_eut0. Each item in this list translates into a $ML_y* variable referenced in the ML_eut0 program, where * denotes the order in which it appears in this list. Thus, the data in variable Choices, which consists of 0's and 1's for choices (and a dot, to signify "missing"), is passed to the ML_eut0 program as variable $ML_y1. Variable P0left, which holds the probabilities of the first prize of the lottery presented to subjects on the left of their screen, is passed as $ML_y2, and so on. Finally, variable stake, holding the values of the initial endowments provided to subjects, gets passed as variable $ML_y14. It is good programming practice to then define these in some less cryptic manner, as we do just after the "quietly" line in ML_eut0. This does not significantly slow down execution, and helps avoid cryptic code. There is no error if some variable that is passed to ML_eut0 is not referenced in ML_eut0. Once the data is passed to ML_eut0 the likelihood function can be evaluated. By default, it assumes a constant term, so when we have "= )" in the above command line, this is saying that there are no other explanatory variables. We add some below, but for now this model is just assuming that one CRRA coefficient characterizes all choices by all subjects. That is, it assumes that everyone has the same risk preference. We restrict the data that is passed to only include strict preferences, hence the "if Choices~=." part at the end of the command line. The response of indifference was allowed in this experiment, and we code it as a "missing" value.
Thus, the estimation only applies to the sub-sample of strict preferences. One could modify the likelihood function to handle indifference. Returning to the ML_eut0 program, the "args" line defines some arguments for this program. When it is called, by the default Newton-Raphson optimization routine within Stata, it accepts arguments in the "r" array and returns a value for the log-likelihood in the "lnf" scalar. In this case, "r" is the vector of coefficient values being evaluated. The "tempvar" lines create temporary variables for use in the program. These are temporary in the sense that they are only local to this program, and hence can be the same as variables in the main calling program. Once defined they are referred to within the ML_eut0 program by adding the funny
left single-quote mark ` and the regular right single-quote mark '. Thus temporary variable euL, to hold the expected utility of the left lottery, is referred to as `euL' in the program.111 The "quietly" line defines a block of code that is to be processed without the display of messages. This avoids needless display of warning messages, such as when some evaluation returns a missing value. Errors are not skipped, just display messages.112 The remaining lines should make sense to any economist from the comment statements. The program simply builds up the expected utility of each lottery, using the CRRA specification for the utility of the prizes. Then it uses the probit index function to define the likelihood values. The actual responses, stored in variable Choices (which is internal variable $ML_y1), are used at the very end to define which side of the probit index function this choice happens to be. The logit index specification is just as easy to code up: you replace "normal" with "invlogit" and you are done! The most important feature of this specification is that one can "build up" the latent index with as many programming lines as needed. Thus, as illustrated below, it is an easy matter to write out more detailed models, such as required for estimation of PT specifications or mixture models. The "cluster(id)" command at the end tells Stata to treat the residuals from the same person as potentially correlated. It then corrects for this fact when calculating standard errors of estimates. Invoking the above command line, with the "maximize" option at the end to tell Stata to actually proceed with the optimization, generates this output:

initial:       log pseudolikelihood = -8155.5697
alternative:   log pseudolikelihood = -7980.4161
rescale:       log pseudolikelihood = -7980.4161
Iteration 0:   log pseudolikelihood = -7980.4161  (not concave)
Iteration 1:   log pseudolikelihood = -7692.4056
Iteration 2:   log pseudolikelihood = -7689.4848
Iteration 3:   log pseudolikelihood = -7689.4544
Iteration 4:   log pseudolikelihood = -7689.4544

. ml display

Log pseudolikelihood = -7689.4544               Number of obs   =      11766
                                                Wald chi2(0)    =          .
                                                Prob > chi2     =          .

                           (Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     (95% Conf. Interval)
-------------+----------------------------------------------------------------
       _cons |   .7531553   .0204812    36.77   0.000     .7130128    .7932977
------------------------------------------------------------------------------
So we see that the optimization routine converged nicely, with no error messages or warnings about numerical irregularities at the end. The interim warning message is nothing to worry about: only worry if there is an error message of any kind at the end of the iterations. (Of course, lots of error messages, particularly about derivatives being hard to calculate, usually flag convergence problems.) The "ml display" command allows us to view the standard output, and is given after the "ml model" command. For our purposes the critical thing is the "_cons" line, which displays the ML estimate and its standard error. Thus, we have estimated that r̂ = 0.753. This is the ML estimate of the CRRA coefficient in this case, and it indicates that these subjects are risk averse. Before your program runs nicely it may have some syntax errors. The easiest way to check these is to issue the command

ml model lf ML_eut0 (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = )
which is the same as before except that it drops off the material after the comma, which tells Stata to maximize the likelihood and how to handle the errors. This command simply tells Stata to read in the model and be ready to process it, but not to begin processing it. You would then issue the command

ml check
and Stata will provide some diagnostics. These are extremely informative if you use them, particularly for syntax errors. The power of this approach becomes evident when we allow the CRRA coefficient to be determined by individual or treatment characteristics. To illustrate, consider the effect of allowing the CRRA coefficient to differ depending on the individual demographic characteristics of the subject, as explained in the text. Here is a list and sample statistics:

    Variable |  Obs        Mean    Std. Dev.    Min   Max
-------------+--------------------------------------------
      Female |  215    .4790698    .5007276       0     1
       Black |  215    .1069767     .309805       0     1
    Hispanic |  215    .1348837    .3423965       0     1
         Age |  215    19.95814    3.495406      17    47
    Business |  215    .4511628    .4987705       0     1
      GPAlow |  215    .4604651    .4995978       0     1
The earlier command line is changed slightly at the "= )" part to read "= Female Black Hispanic Age Business GPAlow)", and no changes are made to ML_eut0. The results are as follows:

ml model lf ML_eut0 (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = Female Black Hispanic Age Business GPAlow), cluster(id) maximize

. ml display

Log pseudolikelihood = -7557.2809               Number of obs   =      11766
                                                Wald chi2(6)    =      27.48
                                                Prob > chi2     =     0.0001

                           (Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     (95% Conf. Interval)
-------------+----------------------------------------------------------------
      Female |  -.0904283   .0425979    -2.12   0.034    -.1739187   -.0069379
       Black |  -.1283174   .0765071    -1.68   0.094    -.2782686    .0216339
    Hispanic |  -.2549614   .1149935    -2.22   0.027    -.4803446   -.0295783
         Age |   .0218001   .0052261     4.17   0.000     .0115571    .0320432
    Business |  -.0071756   .0401536    -0.18   0.858    -.0858753     .071524
      GPAlow |   .0131213   .0394622     0.33   0.740    -.0642233    .0904659
       _cons |    .393472   .1114147     3.53   0.000     .1751032    .6118408
------------------------------------------------------------------------------
So we see that the CRRA coefficient changes from r = 0.753 to r = 0.393 - 0.090·Female - 0.128·Black - … and so on. We can quickly find out what the average value of r is when we evaluate this model using the actual characteristics of each subject and the estimated coefficients:

. predictnl r=xb(r)
. summ r if task==1

    Variable |  Obs        Mean    Std. Dev.        Min        Max
-------------+-----------------------------------------------------
           r |  215    .7399284    .1275521    .4333093   1.320475
So the average value is 0.740, extremely close to the earlier estimate of 0.753. Thus, all we have done is provide a richer characterization of risk attitudes around roughly the same mean.
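The arithmetic behind Stata's predictnl r = xb(r) is just a linear index evaluated at each subject's characteristics. The Python fragment below is only an illustrative sketch of that calculation, using the coefficients reported in the output above; the two subjects are hypothetical, not from the data.

```python
# Sketch of what "predictnl r = xb(r)" computes: the CRRA coefficient
# implied for a subject by the estimated linear index. The coefficient
# values come from the regression output above; the subjects are
# hypothetical illustrations.

coef = {
    "_cons": 0.393472, "Female": -0.0904283, "Black": -0.1283174,
    "Hispanic": -0.2549614, "Age": 0.0218001, "Business": -0.0071756,
    "GPAlow": 0.0131213,
}

def predicted_r(subject):
    """r = constant + sum over characteristics of coefficient * value."""
    return coef["_cons"] + sum(coef[k] * v for k, v in subject.items())

# Two hypothetical subjects:
s1 = {"Female": 1, "Black": 0, "Hispanic": 0, "Age": 20, "Business": 1, "GPAlow": 0}
s2 = {"Female": 0, "Black": 0, "Hispanic": 0, "Age": 20, "Business": 0, "GPAlow": 1}

r1, r2 = predicted_r(s1), predicted_r(s2)
```

Both predicted values fall near the sample mean of 0.74, consistent with the summary above.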
F2. Loss Aversion and Probability Weighting

It is a simple matter to specify different economic models. Two of the major structural features of PT are probability weighting and loss aversion. The code below implements each of these specifications, using the parametric forms of Tversky and Kahneman (1992). For simplicity we assume that the decision weights are the probability weights, and do not implement the rank-dependent transformation of probability weights into
GLENN W. HARRISON AND E. ELISABET RUTSTRÖM
decision weights. Thus, the model is strictly an implementation of OPT from Kahneman and Tversky (1979). The extension to rank-dependent decision weights is messy from a programming perspective, and nothing is gained pedagogically here by showing it; Harrison (2006c) shows the mess in full. Note how much of this code is similar to ML_eut0, and the differences:

* define OPT specification with no errors
program define MLkt0
        args lnf alpha beta lambda gamma
        tempvar prob0l prob1l prob2l prob3l prob0r prob1r prob2r prob3r y0 y1 y2 y3
        tempvar euL euR euDiff euRatio tmp
        quietly {
                gen double `tmp' = (($ML_y2^`gamma')+($ML_y3^`gamma')+($ML_y4^`gamma')+($ML_y5^`gamma'))
                replace `tmp' = `tmp'^(1/`gamma')
                generate double `prob0l' = ($ML_y2^`gamma')/`tmp'
                generate double `prob1l' = ($ML_y3^`gamma')/`tmp'
                generate double `prob2l' = ($ML_y4^`gamma')/`tmp'
                generate double `prob3l' = ($ML_y5^`gamma')/`tmp'
                replace `tmp' = (($ML_y6^`gamma')+($ML_y7^`gamma')+($ML_y8^`gamma')+($ML_y9^`gamma'))
                replace `tmp' = `tmp'^(1/`gamma')
                generate double `prob0r' = ($ML_y6^`gamma')/`tmp'
                generate double `prob1r' = ($ML_y7^`gamma')/`tmp'
                generate double `prob2r' = ($ML_y8^`gamma')/`tmp'
                generate double `prob3r' = ($ML_y9^`gamma')/`tmp'
                generate double `y0' = .
                replace `y0' =  ( $ML_y10)^(`alpha') if $ML_y10>=0
                replace `y0' = -`lambda'*(-$ML_y10)^(`beta') if $ML_y10<0
                generate double `y1' = .
                replace `y1' =  ( $ML_y11)^(`alpha') if $ML_y11>=0
                replace `y1' = -`lambda'*(-$ML_y11)^(`beta') if $ML_y11<0
                generate double `y2' = .
                replace `y2' =  ( $ML_y12)^(`alpha') if $ML_y12>=0
                replace `y2' = -`lambda'*(-$ML_y12)^(`beta') if $ML_y12<0
                generate double `y3' = .
                replace `y3' =  ( $ML_y13)^(`alpha') if $ML_y13>=0
                replace `y3' = -`lambda'*(-$ML_y13)^(`beta') if $ML_y13<0
                gen double `euL'=(`prob0l'*`y0')+(`prob1l'*`y1')+(`prob2l'*`y2')+(`prob3l'*`y3')
                gen double `euR'=(`prob0r'*`y0')+(`prob1r'*`y1')+(`prob2r'*`y2')+(`prob3r'*`y3')
                generate double `euDiff' = `euR' - `euL'
                replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
                replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
        }
end
The first thing to notice is that the initial line ‘‘args lnf alpha beta lambda gamma’’ has more parameters than with ML_eut0. The ‘‘lnf ’’ parameter is the same, since it is the one used to return the value of the likelihood function for trial values of the other parameters. But we now have four parameters instead of just one.
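The two Tversky–Kahneman (1992) functional forms that MLkt0 evaluates can be sketched outside Stata. The Python fragment below is an illustrative translation, not part of the original appendix; the parameter names mirror the Stata program.

```python
# Sketch of the Tversky-Kahneman (1992) forms used in MLkt0.
# alpha, beta: utility curvature over gains and losses; lambda_: loss
# aversion; gamma: probability weighting. Names mirror the Stata code.

def tk_value(m, alpha, beta, lambda_):
    """Power value function: m^alpha over gains, and over losses the
    (negated) power value scaled by the loss-aversion parameter."""
    return m ** alpha if m >= 0 else -lambda_ * (-m) ** beta

def tk_weights(probs, gamma):
    """Probability weights as coded in MLkt0: each p_i^gamma divided by
    (sum over j of p_j^gamma)^(1/gamma)."""
    denom = sum(p ** gamma for p in probs) ** (1.0 / gamma)
    return [p ** gamma / denom for p in probs]

# With gamma = 1 the weights collapse to the raw probabilities (the EUT case):
w = tk_weights([0.25, 0.75], 1.0)
```

With a loss-aversion parameter above 1, a loss of $10 hurts more than a gain of $10 helps, which is the sign pattern the estimates below are testing.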
When we estimate this model we get this output:

. ml model lf MLkt0 (alpha: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 = ) (beta: ) (lambda: ) (gamma: ), cluster(id) maximize
. ml display

Log pseudolikelihood = -7455.1001         Number of obs = 11766
                                          Wald chi2(0)  =     .
                                          Prob > chi2   =     .

(Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
alpha        |
       _cons |   .6551177   .0275903    23.74   0.000     .6010417    .7091938
-------------+----------------------------------------------------------------
beta         |
       _cons |   .8276235   .0541717    15.28   0.000      .721449     .933798
-------------+----------------------------------------------------------------
lambda       |
       _cons |   .7322427   .1163792     6.29   0.000     .5041436    .9603417
-------------+----------------------------------------------------------------
gamma        |
       _cons |    .938848   .0339912    27.62   0.000     .8722265     1.00547
------------------------------------------------------------------------------
So we get estimates for all four parameters. Stata uses the variable "_cons" for the constant, and since there are no characteristics here, that is the only variable to be estimated. We could also add demographic or other characteristics to any or all of these four parameters. We see that the utility curvature coefficients α and β are similar, and indicate concavity in the gain domain and convexity in the loss domain. The loss aversion parameter λ is less than 1, which is a blow for PT since "loss aversion" calls for λ > 1. And γ is very close to 1, which is the value that implies that w(p) = p for all p, the EUT case. We can readily test some of these hypotheses:

. test [alpha]_cons=[beta]_cons
 ( 1)  [alpha]_cons - [beta]_cons = 0
           chi2(  1) =    8.59
         Prob > chi2 =    0.0034

. test [lambda]_cons=1
 ( 1)  [lambda]_cons = 1
           chi2(  1) =    5.29
         Prob > chi2 =    0.0214

. test [gamma]_cons=1
 ( 1)  [gamma]_cons = 1
           chi2(  1) =    3.24
         Prob > chi2 =    0.0720
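Each of these tests is a Wald test: for a single linear hypothesis, the statistic is the squared ratio of the deviation from the hypothesized value to the robust standard error, distributed chi-squared with one degree of freedom. A Python sketch, using the λ estimate and standard error from the output above, reproduces Stata's numbers:

```python
# Sketch of the Wald test behind "test [lambda]_cons=1": the statistic is
# ((estimate - 1)/std.err.)^2, chi-squared with 1 degree of freedom.
# The estimate and robust standard error are taken from the output above.
import math

est, se = 0.7322427, 0.1163792
wald = ((est - 1.0) / se) ** 2
# For a chi-squared(1) statistic, the p-value equals erfc(sqrt(stat/2))
p = math.erfc(math.sqrt(wald / 2.0))
```

The statistic comes out near 5.29 with a p-value near 0.0214, matching the Stata output.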
So we see that PT is not doing so well here in relation to the a priori beliefs it comes packaged with, and that the deviation in λ is indeed statistically significant. But γ is less than 1, so things are not so bad in that respect.

F3. Adding Stochastic Errors

In the text the Luce and Fechner "stochastic error stories" were explained. To add the Luce specification, popularized by HL, we return to base camp, the ML_eut0 program, and simply make two changes. We augment the arguments by one parameter, μ, to be estimated:

args lnf r mu
and then we revise the line defining the EU difference from

generate double `euDiff' = `euR' - `euL'

to

generate double `euDiff' = (`euR'^(1/`mu'))/((`euR'^(1/`mu')) + (`euL'^(1/`mu')))
So this changes the latent preference index from being the difference to the ratio. But it also adds the 1/μ exponent to each expected utility. Apart from this change in the program, there is nothing extra that is needed. You just add one more parameter in the "ml model" stage, as we did for the PT extensions. In fact, HL cleverly exploit the fact that the latent preference index defined above is already in the form of a cumulative distribution function, since it ranges from 0 to 1, and is equal to 1/2 when the subject is indifferent between the two lotteries. Thus, instead of defining the likelihood contribution by

replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0

we can use

replace `lnf' = ln(`euDiff') if $ML_y1==1
replace `lnf' = ln(1-`euDiff') if $ML_y1==0
instead.
The Fechner specification popularized by Hey and Orme (1994) implies a simple change to ML_eut0. Again we add an error term "noise" to the arguments of the program, as above, and now we have the latent index

generate double `euDiff' = (`euR' - `euL')/`noise'
instead of the original

generate double `euDiff' = `euR' - `euL'
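The two error stories just described can be sketched numerically. The Python fragment below is only an illustration of the formulas, computed outside Stata with made-up expected utilities:

```python
# Sketch of the two stochastic error specifications described above.
import math

def luce_prob_right(euL, euR, mu):
    """Luce: the ratio euR^(1/mu)/(euR^(1/mu) + euL^(1/mu)) already lies
    in [0, 1] and so serves directly as the choice probability."""
    l, r = euL ** (1.0 / mu), euR ** (1.0 / mu)
    return r / (r + l)

def fechner_prob_right(euL, euR, noise):
    """Fechner: the EU difference is scaled by a noise parameter and then
    passed through the standard normal c.d.f."""
    return 0.5 * (1.0 + math.erf(((euR - euL) / noise) / math.sqrt(2.0)))

# Indifference (equal expected utilities) gives probability 1/2 in both:
p_luce = luce_prob_right(2.0, 2.0, mu=0.5)
p_fech = fechner_prob_right(2.0, 2.0, noise=0.763)
# For a fixed EU advantage, more noise pushes the choice toward a coin flip:
p_sharp = fechner_prob_right(1.0, 1.2, noise=0.1)
p_dull = fechner_prob_right(1.0, 1.2, noise=1.0)
```

In both specifications, shrinking the error parameter toward zero recovers deterministic EU maximization.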
Here are the results:

. ml model lf ML_eut (r: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = ) (noise: ), cluster(id) maximize
. ml display

Log pseudolikelihood = -7679.9527         Number of obs = 11766
                                          Wald chi2(0)  =     .
                                          Prob > chi2   =     .

(Std. Err. adjusted for 215 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
r            |
       _cons |   .7119379   .0303941    23.42   0.000     .6523666    .7715092
-------------+----------------------------------------------------------------
noise        |
       _cons |   .7628203    .080064     9.53   0.000     .6058977    .9197429
------------------------------------------------------------------------------
So the CRRA coefficient declines very slightly, and the noise term (the standard deviation of the normal error) is estimated to be 0.763.

F4. Non-Parametric Estimation of the EUT Model

It is possible to estimate the EUT model without assuming a functional form for utility, following Hey and Orme (1994). The likelihood function is evaluated as follows:

* define Original Recipe EUT with Fechner errors: non-parametric
program define ML_eut0_np
        args lnf u5 u10 noise
        tempvar prob0l prob1l prob2l prob3l prob0r prob1r prob2r prob3r y0 y1 y2 y3
        tempvar euL euR euDiff euRatio tmp lnf_eut lnf_pt p1 p2 f1 f2 u0 u15
        quietly {
                * construct likelihood for EUT
                generate double `prob0l' = $ML_y2
                generate double `prob1l' = $ML_y3
                generate double `prob2l' = $ML_y4
                generate double `prob3l' = $ML_y5
                generate double `prob0r' = $ML_y6
                generate double `prob1r' = $ML_y7
                generate double `prob2r' = $ML_y8
                generate double `prob3r' = $ML_y9
                generate double `u0'  = 0
                generate double `u15' = 1
                generate double `y0' = `u0'
                generate double `y1' = `u5'
                generate double `y2' = `u10'
                generate double `y3' = `u15'
                gen double `euL'=(`prob0l'*`y0')+(`prob1l'*`y1')+(`prob2l'*`y2')+(`prob3l'*`y3')
                gen double `euR'=(`prob0r'*`y0')+(`prob1r'*`y1')+(`prob2r'*`y2')+(`prob3r'*`y3')
                generate double `euDiff' = (`euR' - `euL')/`noise'
                replace `lnf' = ln(normal( `euDiff')) if $ML_y1==1
                replace `lnf' = ln(normal(-`euDiff')) if $ML_y1==0
        }
end
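The logic of one likelihood evaluation in this non-parametric program can be sketched in Python. This is only an illustration of the formulas, not the Stata code: the lowest and highest prizes get utilities normalized to 0 and 1, the interior utilities u5 and u10 are free parameters, and the choice indicator is taken (as in the Stata code) to equal 1 when the left lottery is chosen.

```python
# Sketch of one row's likelihood contribution in the non-parametric EUT
# model with Fechner errors: utilities of the four prizes are
# (0, u5, u10, 1), with u5 and u10 the parameters to be estimated.
import math

def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def log_likelihood_row(choice, probs_l, probs_r, u5, u10, noise):
    """Log-likelihood of one observed choice (1 = left lottery chosen)."""
    u = [0.0, u5, u10, 1.0]                 # normalized prize utilities
    euL = sum(p * v for p, v in zip(probs_l, u))
    euR = sum(p * v for p, v in zip(probs_r, u))
    index = (euR - euL) / noise             # Fechner latent index
    return math.log(normal_cdf(-index) if choice == 1 else normal_cdf(index))

# With identical lotteries on both sides the index is zero, so either
# choice has probability 1/2:
ll_indiff = log_likelihood_row(1, [0.25]*4, [0.25]*4, 0.5, 0.8, 0.1)
```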
and estimates can be obtained in the usual manner. We include demographics for each parameter, and introduce the notion of a "global" macro in Stata. Instead of typing out the list of demographic variables, one gives the command

global demog "Female Black Hispanic Age Business GPAlow"

and then simply refers to $demog. Every time Stata sees "$demog" it simply substitutes the string "Female Black Hispanic Age Business GPAlow" without the quotes. Hence, we have the following results:

. ml model lf ML_eut0_np (u5: Choices P0left P1left P2left P3left P0right P1right P2right P3right prize0 prize1 prize2 prize3 stake = $demog ) (u10: $demog ) (noise: ) if expid=="ucf0", cluster(id) technique(dfp) maximize difficult
. ml display

Log pseudolikelihood = -2321.8966         Number of obs = 3736
                                          Wald chi2(6)  = 18.19
                                          Prob > chi2   = 0.0058
(Std. Err. adjusted for 63 clusters in id)
------------------------------------------------------------------------------
             |               Robust
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
u5           |
      Female |    .096698   .0453102     2.13   0.033     .0078916    .1855044
       Black |   .0209427   .0808325     0.26   0.796    -.1374861    .1793715
    Hispanic |   .0655292   .0784451     0.84   0.404    -.0882203    .2192787
         Age |  -.0270362   .0093295    -2.90   0.004    -.0453217   -.0087508
    Business |   .0234831   .0493705     0.48   0.634    -.0732813    .1202475
      GPAlow |  -.0101648   .0480595    -0.21   0.832    -.1043597    .0840301
       _cons |   1.065798   .1853812     5.75   0.000     .7024573    1.429138
-------------+----------------------------------------------------------------
u10          |
      Female |   .0336875   .0287811     1.17   0.242    -.0227224    .0900973
       Black |   .0204992   .0557963     0.37   0.713    -.0888596    .1298579
    Hispanic |   .0627681   .0413216     1.52   0.129    -.0182209     .143757
         Age |  -.0185383   .0072704    -2.55   0.011     -.032788   -.0042886
    Business |   .0172999   .0308531     0.56   0.575    -.0431711    .0777708
      GPAlow |  -.0110738   .0304819    -0.36   0.716    -.0708171    .0486696
       _cons |   1.131618   .1400619     8.08   0.000     .8571015    1.406134
-------------+----------------------------------------------------------------
noise        |
       _cons |   .0952326   .0079348    12.00   0.000     .0796807    .1107844
------------------------------------------------------------------------------
It is then possible to predict the values of the two estimated utilities, which will vary with the characteristics of each subject, and plot them. Fig. 10 in the text shows the distributions of estimated utility values.
F5. Replication of Holt and Laury (2002)

Finally, it may be useful to show an implementation in Stata of the ML problem solved by HL:

program define HLep1
        args lnf r alpha mu
        tempvar theta lnfj prob1 prob2 scale euSAFE euRISKY euRatio mA1 mA2 mB1 mB2 yA1 yA2 yB1 yB2 wp1 wp2
        quietly {
                /* initializations */
                generate double `prob1' = $ML_y2/10
                generate double `prob2' = 1 - `prob1'
                generate double `scale' = $ML_y7
                /* add the endowments to the prizes */
                generate double `mA1' = $ML_y8 + $ML_y3
                generate double `mA2' = $ML_y8 + $ML_y4
                generate double `mB1' = $ML_y8 + $ML_y5
                generate double `mB2' = $ML_y8 + $ML_y6
                /* utility of prize m */
                generate double `yA1' = (1-exp(-`alpha'*((`scale'*`mA1')^(1-`r'))))/`alpha'
                generate double `yA2' = (1-exp(-`alpha'*((`scale'*`mA2')^(1-`r'))))/`alpha'
                generate double `yB1' = (1-exp(-`alpha'*((`scale'*`mB1')^(1-`r'))))/`alpha'
                generate double `yB2' = (1-exp(-`alpha'*((`scale'*`mB2')^(1-`r'))))/`alpha'
                /* classic EUT probability weighting function */
                generate double `wp1' = `prob1'
                generate double `wp2' = `prob2'
                /* expected utility */
                generate double `euSAFE'  = (`wp1'*`yA1')+(`wp2'*`yB1')
                generate double `euRISKY' = (`wp1'*`yA2')+(`wp2'*`yB2')
                /* EU ratio */
                generate double `euRatio' = (`euSAFE'^(1/`mu'))/((`euSAFE'^(1/`mu'))+(`euRISKY'^(1/`mu')))
                /* contribution to likelihood */
                replace `lnf' = ln(`euRatio') if $ML_y1==0
                replace `lnf' = ln(1-`euRatio') if $ML_y1==1
        }
end
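The expo-power (EP) utility evaluated inside HLep1 can be sketched numerically. This Python fragment only illustrates the functional form as it appears in the code above; the parameter values are illustrative, not estimates from the text.

```python
# Sketch of the expo-power (EP) utility used in HLep1:
#   u(m) = (1 - exp(-alpha * m^(1-r))) / alpha
# r and alpha jointly govern relative and absolute risk aversion.
import math

def ep_utility(m, r, alpha):
    """Expo-power utility of a money prize m > 0."""
    return (1.0 - math.exp(-alpha * m ** (1.0 - r))) / alpha

# Illustrative (not estimated) parameter values:
u_low = ep_utility(2.0, 0.27, 0.03)
u_high = ep_utility(4.0, 0.27, 0.03)
```

As required of a utility function, the value is increasing in the prize for these parameter values.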
The general structure of this routine should be easy to see. The routine is called with this command:

ml model lf HLep1 (r: Choices problem m1a m2a m1b m2b scale wealth = ) (alpha: ) (mu: )

where variable "Choices" is a binary variable defining the subject's choices of the safe or risky lottery; variable "problem" is a counter from 1 to 10 in the usual implementation of the design; the next four variables define the fixed prizes; variable "scale" indicates the multiples of the basic payoffs used (e.g., 1, 10, 20, 50, or 90); and variable "wealth" measures initial endowments prior to the risk aversion task (typically $0). Three parameters are estimated, as defined in
the EP specification discussed in the text. The only new steps are the definition of the utility of the prize, using the EP specification instead of the CRRA specification, and the definition of the index of the likelihood. Use of this procedure with the original HL data replicates the estimates in Holt and Laury (2002, p. 1653) exactly. The advantage of this formulation is that one can readily extend it to include covariates for any of the parameters. One can also correct for clustering of observations by the same subject. And extensions to consider probability weighting are trivial to add.

F6. Extensions

There are many possible extensions of the basic programming elements considered here. Harrison (2006c) illustrates the following: modeling rank-dependent decision weights for the RDU and RDEV structural models; modeling rank-dependent decision weights and sign-dependent utility for the CPT structural model; the imposition of constraints on parameters to ensure non-negativity (e.g., λ > 1 or μ > 0) or finite bounds (e.g., 0 < r < 1); the specification of finite mixture models; the coding of non-nested hypothesis tests; and maximum simulated likelihood, in which one or more parameters are treated as random coefficients to reflect unobserved individual heterogeneity (e.g., Train (2003)). In each case template code is provided along with data and illustrative estimates.
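One standard way to impose the parameter constraints just mentioned is to estimate an unconstrained parameter and transform it inside the likelihood evaluator. The Python sketch below illustrates two such transformations; the names nu and rho are hypothetical, not from the text.

```python
# Sketch of constraint-by-transformation: estimate an unconstrained real
# parameter, then map it into the constrained region inside the likelihood.
# nu and rho are illustrative names for the unconstrained parameters.
import math

def positive(nu):
    """exp() maps any real nu to a strictly positive value (e.g., mu > 0)."""
    return math.exp(nu)

def unit_interval(rho):
    """The logistic map sends any real rho into (0, 1) (e.g., 0 < r < 1)."""
    return 1.0 / (1.0 + math.exp(-rho))

def above_one(nu):
    """1 + exp() enforces a bound such as lambda > 1."""
    return 1.0 + math.exp(nu)
```

The optimizer then searches over the unconstrained parameters, and the delta method (or predictnl in Stata) recovers standard errors for the transformed parameters.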
STOCHASTIC MODELS FOR BINARY DISCRETE CHOICE UNDER RISK: A CRITICAL PRIMER AND ECONOMETRIC COMPARISON

Nathaniel T. Wilcox

ABSTRACT

Choice under risk has a large stochastic (unpredictable) component. This chapter examines five stochastic models for binary discrete choice under risk and how they combine with "structural" theories of choice under risk. Stochastic models are substantive theoretical hypotheses: they are frequently testable in and of themselves, and they also serve as identifying restrictions for hypothesis tests, estimation and prediction. Econometric comparisons suggest that for the purpose of prediction (as opposed to explanation), choices of stochastic models may be far more consequential than choices of structures such as expected utility or rank-dependent utility.
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 197–292
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00004-5

Lotteries are the alternatives of many theories of choice under risk. For instance, getting $10 if a fair coin toss lands heads, and zero if tails, is a lottery – call it "Toss." Getting $24 if a fair deck of cards cuts to a spade, and zero if other suits, is also a lottery – call it "Cut." Call the set {Toss, Cut} a pair, specifically pair 1 here. In binary lottery choice experiments subjects view many such pairs, and select just one lottery from each pair. In pair 1, three possible money outcomes are available across the lotteries in the pair. Call this the pair's outcome context, or more simply its context, and write it as the vector c1 = (0, 10, 24). Knowing pair 1's context, we may express Toss and Cut as outcome probability vectors on that context. For instance, Toss is the vector (1/2, 1/2, 0), and Cut is the vector (3/4, 0, 1/4), both on context c1.

Even under unchanging conditions, a subject might not choose Toss or Cut every time she encountered pair 1. On some occasions, she might be swayed by the somewhat higher expected value of Cut ($6 versus $5 for Toss), while on others she might opt for the relative safety of Toss (a 1/2 chance of some positive payment versus just a 1/4 chance with Cut). That is, choice under risk could be less than perfectly predictable, or stochastic, and in fact an overwhelming canon of experimental evidence suggests this is so. Beginning with Mosteller and Nogee (1951), binary lottery choice experiments with repeated trials of pairs reveal substantial choice switching by the same subject between trials. In some cases, the repeated trials span days (e.g., Tversky, 1969; Hey & Orme, 1994; Hey, 2001); in such cases, one could argue that conditions may have changed between trials. Yet substantial switching occurs even between trials separated by a couple of minutes or less, and with no intervening change in wealth, portfolios of background risks, or any other obviously decision-relevant variable (Camerer, 1989; Starmer & Sugden, 1989; Ballinger & Wilcox, 1997; Loomes & Sugden, 1998).

Many theories of risky choice come to us as deterministic theories. These theories take as given a single fixed relational system – a collection of relational statements about pairs of lotteries, such as "Toss is weakly preferred to Cut by subject n", written Toss ≿n Cut.
One empirical interpretation of weak preference statements is that they are records of outcomes of particular choice trials, all observed under unchanging conditions. Under this interpretation, Toss ≿n Cut formally means we observed subject n choosing Toss from pair 1 on some trial. Indifference is then defined as observing both Cut ≿n Toss and Toss ≿n Cut on different trials (under unchanging conditions). Under this interpretation, all switching across trials (under unchanging conditions) is called one thing: indifference. This interpretation of relational systems implies an implausible amount of indifference, given the ubiquity of choice switching in the experimental canon with repeated trials. It also renders big differences in behavior moot and small differences in behavior crucial. Suppose Anne chose Toss
in 19 of 20 trials, and Bob chose Toss in 1 of 20 trials: Do we want an interpretive straitjacket that simply describes Anne and Bob identically, as both indifferent between Toss and Cut? Why would that rather large difference in behavior be uninteresting, while the rather trivial empirical difference between Anne and Charlie, who chose Toss in all 20 trials, be considered crucial?1 Stochastic differences between subjects are frequently much more empirically interesting than this severe classification (implied by this particular view of a relational system) allows.

Fortunately, alternative views of relational systems offer escapes from this trouble. One may instead give relational statements a probabilistic interpretation. In this view, choice probabilities are the underlying empirical reality, and relational statements are derived from them. Let P^n_1 denote the probability that subject n chooses Toss from pair 1 under given conditions. In this view, dating at least to Edwards (1954), Toss ≿n Cut means that P^n_1 ≥ 0.5, and indifference means that P^n_1 = 0.5. The purpose of a structural theory or structure, such as expected utility, is then primarily to represent the preference directions so derived from underlying choice probabilities. In this view, the structure will play a strong supporting role in determining choice probabilities, but does not do this alone: Extra assumptions concerning random parts of behavior (for instance, random fluctuations in comparison processes from trial to trial) are the crucial missing element. An entirely different view is that subject n has a set of deterministic relational systems ≿n1, ≿n2, …, ≿nK and randomly draws one on each trial to determine her choice. Both of these approaches are stochastic models: They add randomness to received deterministic theory in some formal way to produce choice probabilities from the deterministic preference directions of a single relational system or a set of relational systems.
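The probabilistic interpretation can be illustrated with a small sketch: preference statements are derived from an estimated choice probability, here proxied by the observed choice frequency over repeated trials. This Python fragment is purely illustrative (it uses the strict-preference cutoffs for exposition; the weak relation corresponds to P ≥ 0.5).

```python
# Sketch of the probabilistic interpretation of relational statements:
# derive a preference statement from the frequency of choosing Toss
# over repeated trials of the same pair.

def preference(toss_choices, trials):
    """Return a relational statement derived from the estimated
    probability of choosing Toss."""
    p = toss_choices / trials
    if p > 0.5:
        return "Toss strictly preferred"
    if p < 0.5:
        return "Cut strictly preferred"
    return "indifferent"

anne = preference(19, 20)   # chose Toss in 19 of 20 trials
bob = preference(1, 20)     # chose Toss in 1 of 20 trials
```

Unlike the deterministic interpretation, this view distinguishes Anne from Bob, and reserves "indifference" for choice frequencies near one half.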
The main reason I believe we must combine our decision theories with a stochastic model is that the deterministic view of these theories is so deeply troubled in light of all known evidence with repeated trials. However, even experiments without repeated trials invariably produce some violations of every theory, and few scholars would view a theory as falsified on the basis of one or a few observed violations of its predictions. Stochastic models give empirical discipline to this view by placing restrictions on violations, as Harless and Camerer (1994) note. Stochastic models also provide us with useful statistical information and instructions: The most powerful and informative statistical tests are invariably those where we have the most a priori information about the true data-generating process, and the true stochastic model is a crucial part of a data-generating process. Of course, this presumes that we know the true stochastic model.2 For choice under
risk, work on this matter dates back at least to Becker, DeGroot, and Marschak (1963a, 1963b), but there has been resurgent interest in it amongst experimental economists (Hey, 1995; Ballinger & Wilcox, 1997; Carbone, 1997; Loomes & Sugden, 1998; Loomes, Moffatt, & Sugden, 2002; Blavatskyy, 2007). Some of this evidence will be reviewed below.

Many experimentalists view assumptions about random parts of behavior as someone else's business. In this view: (1) proper randomization of subjects to treatments (or in the case of within-subjects designs, random assignment of treatment order to subjects) guarantees that random parts are uncorrelated with treatments, (2) large enough sample sizes and/or tasks per subject guarantee that random parts average out, and (3) appropriate nonparametric tests guarantee that specific features of random parts do not influence inference. However, all these truisms allow is unbiased inference about average or median treatment effects over a sample. Average treatment effects are quite interesting in many policy contexts. But I argue here that when it comes to evaluating theories of discrete choice under risk, where many interesting inferences depend crucially on stochastic assumptions, average treatment effects alone are relatively uninformative. This point goes back to Becker et al. (1963b), but has received much attention of late (Loomes & Sugden, 1995; Ballinger & Wilcox, 1997; Hey, 2005; Loomes, 2005).

Several principles guide my discussion. The first is that in both theory tests and estimation, stochastic models are identifying restrictions. In the case of theory tests, for instance, we will see that the well-known common ratio effect (e.g., Kahneman & Tversky, 1979) contradicts expected utility theory if some stochastic models are true, but not necessarily if others are true (Loomes & Sugden, 1995; Ballinger & Wilcox, 1997).
Thus, the inference that a common ratio effect contradicts expected utility depends implicitly on a stochastic modeling assumption; that assumption identifies the common ratio effect as a theory rejection. This example will be developed at length in Section 3, since it also illustrates how average treatment effects alone (i.e., without any consideration of stochastic models) may mislead us in inference. In estimation, inferences regarding patterns and kinds of risk aversion depend strongly on stochastic models. When some stochastic models are combined with a structure that contains constant (absolute or relative) risk aversion, they will imply a matching invariance of choice probabilities across contexts that differ by an additive or proportional shift of outcomes (called CARA- and CRRA-neutrality, respectively). But not all stochastic models have this property. Therefore, structural parameter estimates can yield constant (absolute or relative) risk aversion across contexts when estimated with one stochastic model, but increasing/decreasing risk aversion
across contexts when estimated with a different stochastic model. Again, the stochastic model is an important identifying restriction, here because it can change our conclusions about a subject's pattern of risk aversion across contexts as embodied in a structural parameter estimate.

The preceding example also illustrates a second principle: Stochastic models have different implications across contexts. This is particularly important when we estimate some parameters of a structure, intending to use the estimates to predict or explain behavior in a new context. At a very general level and in disparate ways, there has been a recent awakening to the importance of thinking about decisions across multiple contexts. The "calibration theorem" of Rabin (2000) and its broad generalization by Cox and Sadiraj (2006, 2008 – this volume) concerns the relationship between choices on many small contexts and a large context spanning all the smaller contexts. Holt and Laury's (2002) examination of the effect of a large proportional change in a context is directly related to the example in the previous paragraph. That stochastic models have quite different implications across contexts is, however, a relatively underappreciated point, though it has been understood by psychologists for quite some time (see e.g., citations in Busemeyer & Townsend, 1993).

The last principle is that stochastic models have very different implications about the empirical meaning of the deterministic relation "more risk averse" or MRA. Pratt (1964) originally developed a definition of what it means to say "Bob is more risk averse than Anne," or Bob ≻mra Anne, in deterministic expected utility theory. Wilcox (2007a) suggests a definition of "Bob is stochastically more risk averse than Anne," or Bob ≻smra Anne, and shows that under many common stochastic modeling assumptions Bob ≻mra Anne does not imply Bob ≻smra Anne. A new stochastic model, called contextual utility, allows one to say that Bob ≻mra Anne ⇒ Bob ≻smra Anne, and this model will be examined here along with the better-known alternatives.

Section 1 begins with some formal notation and definitions that facilitate later discussions, and introduces two particular structures that I use as examples throughout the chapter: standard expected utility and rank-dependent expected utility (RDEU) (Quiggin, 1982; Chew, 1983). It also defines several general properties of structures and stochastic models. Section 2 then introduces five stochastic models. Section 3 shows how average treatment effects can be uninformative or worse for matters of interest in decision under risk, using the well-known example of the common ratio effect, as discussed by Ballinger and Wilcox (1997) and Loomes (2005). Section 4 compares the stochastic models using the three principles
described above, from the viewpoint of the structural and stochastic properties discussed in Section 1. Section 5 provides an empirical comparison between combinations of the two structures and the five stochastic models using the well-known Hey and Orme (1994) data set. This empirical comparison will illustrate the use of random parameters estimation for representing heterogeneity in a sample, thus adding to the arsenal of econometric methods described by Harrison and Rutström (2008 – this volume). Section 5 also introduces a seldom-seen wrinkle by comparing the ability of models to predict "out-of-context." That is, I will compare how well particular combinations of stochastic model and structure predict choices on one context, after having been estimated using choices made on different contexts. This comparison will strongly suggest that when it comes to prediction, it is much more important to get the stochastic model right than it is to get the structure right. At the same time, in-sample fit comparison suggests the opposite. Thus, it seems that explanation and prediction may call for different emphasis: Explanation hinges more on correct structure, while prediction in new contexts hinges more on correct stochastic models.
1. PRELIMINARIES

1.1. Notation and Definitions

Let Z = (0, 1, 2, …, z, …, I−1) denote I equally spaced nonnegative money outcomes z including zero.3 The "unit outcome" in Z varies across experiments and surveys: That is, "1" could represent $0.05 or $5 or £50 in Z. A lottery S_m = (s_m0, s_m1, …, s_mz, …, s_m(I−1)) is a discrete probability distribution on Z. A pair m is a set {S_m, R_m} of two lotteries. Let c_m be the context of pair m, defined as the vector of outcomes remaining after deletion of all outcomes in Z such that s_mi = r_mi = 0. That is, pair m's context is the vector of outcomes that can occur in at least one of its two lotteries. In many experiments, all contexts are triples (i.e., all pairs involve just three possible outcomes). For instance in Hey and Orme (1994), Hey (2001), and Harrison and Rutström (2005), there are I = 4 outcomes, so that Z = (0, 1, 2, 3); this quadruple yields four distinct contexts, namely (0,1,2), (0,1,3), (0,2,3), and (1,2,3), and all pairs in those experiments are on one of those four contexts.

For pairs m = {S_m, R_m} = {(s_mj, s_mk, s_ml), (r_mj, r_mk, r_ml)} on a three-outcome context c_m = (j, k, l) where l > k > j, choose the lottery names S_m
and R_m so that s_mk + s_ml > r_mk + r_ml and s_ml < r_ml whenever this is possible. S_m then has less probability of the lowest or highest outcomes j and l, but a larger probability of the middle outcome k, than lottery R_m. In this case, we say that S_m is safer than R_m and call S_m the safe lottery in pair m. This labeling is henceforth adopted: It applies to all pairs in Hey and Orme's (1994) experiment. Let O_b denote the set of all such basic pairs. Some experiments also include pairs where one lottery first-order stochastically dominates the other, that is, where lottery names can be chosen so that s_mk + s_ml ≥ r_mk + r_ml and s_ml ≥ r_ml, with at least one inequality strict. Such lotteries are not ordered by the "safer than" relation as I have just described it, but I will nevertheless let S_m denote the dominating lottery in such pairs. Let O_fosd denote the set of all such FOSD pairs. Together, basic pairs and FOSD pairs are a mutually exclusive and exhaustive classification of lottery pairs on three-outcome contexts.

An experiment consists of presenting the pairs m = 1, 2, …, M to subjects n = 1, 2, …, N. In some experiments, each pair is presented repeatedly; in such cases, let t = 1, 2, …, T denote these repeated trials. Let P^n_{m,t} = Pr(y^n_{m,t} = 1) be the probability that subject n chooses S_m in trial t of pair m, where y^n_{m,t} = 1 is this event in the experiment (y^n_{m,t} = 0 if n instead chooses R_m). The trial subscript t may be dropped for experiments with single trials (T = 1) or if one assumes that choices are independent across trials and choice probabilities do not change across trials,4 writing P^n_m instead of P^n_{m,t}. Except where necessary, I suppress t to prevent notational clutter. It will occasionally be convenient to index different pairs by real numbers with special meanings, or by a pair of indices, rather than a natural number index m. This should be clear as it comes up.
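The classification rule just described is a small algorithm: try both labelings of the two lotteries and test the inequalities on the upper-tail probabilities. A Python sketch, using the Toss/Cut pair from the introduction as the example:

```python
# Sketch of the pair classification described above, for probability
# vectors (p_j, p_k, p_l) on a three-outcome context with l > k > j.

def classify(a, b):
    """Return ("basic", safe lottery) or ("fosd", dominating lottery)."""
    for s, r in ((a, b), (b, a)):
        _, sk, sl = s
        _, rk, rl = r
        # basic pair: s has more middle-outcome probability, less of l
        if sk + sl > rk + rl and sl < rl:
            return "basic", s
        # FOSD pair: weak inequalities, at least one strict
        if sk + sl >= rk + rl and sl >= rl and (sk + sl, sl) != (rk + rl, rl):
            return "fosd", s
    return None, None

# Toss = (1/2, 1/2, 0) and Cut = (3/4, 0, 1/4) on context (0, 10, 24):
kind, safe = classify((0.5, 0.5, 0.0), (0.75, 0.0, 0.25))
```

Applied to pair 1, the rule classifies it as a basic pair with Toss as the safe lottery, matching the informal description of Toss's "relative safety" earlier.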
Finally, when considering expectations of various sample moments, we need notation appropriate to the population from which subjects $n$ are sampled, instead of actual samples. Put differently, we occasionally need to think about subject types $c$, and their cumulative distribution function or c.d.f. $J(c)$, in the sampled population instead of actual subjects $n$ in a particular sample. On such occasions, the subject superscript $n$ will be replaced by a subject type superscript $c$, distributed according to some c.d.f. $J(c)$ in the sampled population.
1.2. The Two Structures

For transitive theories, the structure of choice under risk is a function $V$ of lotteries and a vector of structural parameters $\beta^n$ such that $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0 \Leftrightarrow P^n_m \geq 0.5$.5 This equates the relational statement $S_m \succeq^n R_m$
NATHANIEL T. WILCOX
with the probability statement $P^n_m \geq 0.5$.6 Structure maps pairs into a set of possible probabilities rather than a single unique probability, and hence underdetermines choice probabilities. Stochastic models, discussed subsequently, remedy this. The expected utility (or EU) structure is

$$V(S_m|\beta^n) \equiv \sum_{z=0}^{I-1} s_{mz} u^n_z, \text{ such that } \sum_{z=0}^{I-1} s_{mz} u^n_z - \sum_{z=0}^{I-1} r_{mz} u^n_z \geq 0 \Leftrightarrow P^n_m \geq 0.5 \qquad (1)$$
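A minimal sketch of the EU structure in Eq. (1), with an arbitrary illustrative utility vector (nothing here is from the chapter's data):

```python
def eu_value(lottery, u):
    """Expected utility: probability-weighted sum of outcome utilities, Eq. (1)."""
    return sum(p * uz for p, uz in zip(lottery, u))

# Utilities normalized as in the text: u_0 = 0 and u_1 = 1; the remaining
# utilities are the free structural parameters beta = (u_2, u_3).
u = [0.0, 1.0, 1.8, 2.0]           # illustrative concave utilities on Z = (0,1,2,3)
S = [0.0, 0.0, 1.0, 0.0]           # "safe": outcome 2 for certain
R = [0.2, 0.0, 0.0, 0.8]           # "risky" lottery on the same context (0,2,3)

prefers_safe = eu_value(S, u) - eu_value(R, u) >= 0.0
```

For these utilities the V-distance is $1.8 - 1.6 = 0.2 > 0$, so the safe lottery is (deterministically) preferred; a stochastic model's job is to say how often it is actually chosen.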
The $u^n_z$ are called utilities of outcomes $z$. Representation theorems for both EU and RDEU show that the utilities representing preferences are unique only up to an affine transformation, so we may choose a common "zero and unit" utility for all subjects. I do this here with the two lowest outcomes in $Z$, choosing $u^n_0 = 0$ and $u^n_1 = 1$ for all subjects $n$. We may then think of the structural parameters of EU as subject $n$'s utilities of the remaining $I-2$ outcomes in $Z$ (i.e., $\beta^n = (u^n_2, u^n_3, \ldots, u^n_{I-1})$). Note that I refer to both EU and RDEU as affine structures because of this affine transformation nonuniqueness property of their utilities. The expression $V(S_m|\beta^n) - V(R_m|\beta^n)$, while only unique up to scale for EU and RDEU (because they are affine structures), will frequently be called V-distance as this plays a central role in several stochastic models. The RDEU structure (Quiggin, 1982; Chew, 1983) replaces the probabilities $s_{mz}$ in the expected utility structure with weights $w^s_{mz}$. These weights are $w^s_{mz} = w(\sum_{i \geq z} s_{mi}) - w(\sum_{i > z} s_{mi})$, where a continuous and increasing weighting function $w(q)$ takes the unit interval onto itself. The RDEU structure is then

$$V(S_m|\beta^n) \equiv \sum_{z=0}^{I-1} w^s_{mz} u^n_z, \text{ such that } \sum_{z=0}^{I-1} w^s_{mz} u^n_z - \sum_{z=0}^{I-1} w^r_{mz} u^n_z \geq 0 \Leftrightarrow P^n_m \geq 0.5 \qquad (2)$$
Several parametric forms have been suggested for the weighting function; here, I use Prelec's (1998) one-parameter form, which is $w(q|\gamma^n) = \exp(-[-\ln(q)]^{\gamma^n})\ \forall q \in (0,1)$, $w(0) = 0$, and $w(1) = 1$. In RDEU, $\beta^n = (u^n_2, u^n_3, \ldots, u^n_{I-1}, \gamma^n)$ is then the structural parameter vector, and EU is a special case of this where $\gamma^n = 1$, in which case $w(q) \equiv q$ and $w^s_{mz} \equiv s_{mz}$. I use EU and RDEU as widely known and important exemplars, but many of the issues discussed here are not specific to them or any specific parametric instantiation of them. While my estimations ultimately use nonparametric utilities as expressed above, I will discuss EU and RDEU structures that use well-known
parametric utility functions of theoretical importance. These are the CARA form $u^n_z = -\operatorname{sign}(\alpha^n)e^{-\alpha^n z}$, in which the local absolute risk aversion $-u''(z)/u'(z)$ at $z$ is the constant $\alpha^n\ \forall z$, and the CRRA form $u^n_z = (1-\varphi^n)^{-1} z^{1-\varphi^n}$, in which the local relative risk aversion $-zu''(z)/u'(z)$ at $z$ is the constant $\varphi^n\ \forall z$.

1.3. Preference Equivalence Sets

Suppose that structure $V$ implies that, for any fixed structural parameter vector $\beta$, there is a set $\Omega^{eV}$ of pairs over which preference directions must all be equivalent, formally defined by

$$V(S_m|\beta) - V(R_m|\beta) \geq 0 \Leftrightarrow V(S_{m'}|\beta) - V(R_{m'}|\beta) \geq 0\ \ \forall \text{ fixed } \beta,\ \forall m \text{ and } m' \in \Omega^{eV} \qquad (3)$$
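The Prelec weights and the RDEU value of Eq. (2) can be sketched as follows; this is a toy illustration with made-up parameter values, not the chapter's estimates.

```python
import math

def prelec_w(q, gamma):
    """Prelec's (1998) one-parameter weighting function w(q|gamma)."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def rdeu_value(lottery, u, gamma):
    """RDEU value of Eq. (2): decumulative-weight differences times utilities."""
    total = 0.0
    for z in range(len(lottery)):
        w_ge = prelec_w(sum(lottery[z:]), gamma)      # w(Pr(outcome >= z))
        w_gt = prelec_w(sum(lottery[z + 1:]), gamma)  # w(Pr(outcome > z))
        total += (w_ge - w_gt) * u[z]
    return total

u = [0.0, 1.0, 1.8, 2.0]      # illustrative utilities on Z = (0,1,2,3)
R = [0.2, 0.0, 0.0, 0.8]

# gamma = 1 recovers EU: w(q) = q, so the weights collapse to probabilities.
assert abs(rdeu_value(R, u, 1.0) - 1.6) < 1e-9
```

With $\gamma \neq 1$ the weights $w^s_{mz}$ differ from $s_{mz}$ and the rank-dependent value departs from expected utility.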
Call such sets preference equivalence sets: These are typically derived from the axiom system underlying a structure, or from the algebraic form of the structure. Most are well-known to empirical decision research because they are a large basis (but not the only basis) for theory-testing experiments. Several types of preference equivalence sets play central roles in my discussion of stochastic models. Preference equivalence sets are special because one of the stochastic models, random preferences (RPs), makes extremely strong predictions about them.

1.3.1. Common Ratio Sets

These are perhaps the most widely discussed example of preference equivalence sets for EU. A common ratio set has the form $\{\{S_t, R_t\}\,|\,S_t = (1-ts, ts, 0),\ R_t = (1-tr, 0, tr)\}$. Calling $t \in (0, 1]$ the common ratio, pairs in a common ratio set vary only by this common ratio: The pairs all have the same value of $s$ and $r$ (with $s > r$), and they are all on the same context $(j, k, l)$. The EU structure implies that for any given utility vector $(u^n_j, u^n_k, u^n_l)$, either $S_t \succeq^n R_t\ \forall t$ or $R_t \succeq^n S_t\ \forall t$ (e.g., Kahneman & Tversky, 1979) since the EU V-distance between all the pairs in a common ratio set may be written $(1-ts)u^n_j + tsu^n_k - [(1-tr)u^n_j + tru^n_l] \equiv t[(r-s)u^n_j + su^n_k - ru^n_l]$, whose sign is obviously independent of the common ratio $t$. So any common ratio set is an EU preference equivalence set. Experimenters choose a root pair $\{S_1, R_1\}$, that is, a pair with $t = 1$, for a common ratio set in such a way that most subjects are expected to choose $S_1$ from the root pair, and also choose $t$ small, say equal to 1/4 or less, and include these kinds of pairs in the design as well. The usual finding is that most subjects in a sample choose $S_1$ from the root pair, but most subjects
instead choose $R_t$ from pairs with sufficiently small $t$. This is generally called the common ratio effect and is widely taken as evidence against the EU structure. Loomes and Sugden (1995), Ballinger and Wilcox (1997), and others have pointed out, however, that while some stochastic models make this a correct inference, other stochastic models do not. This is explained later.
1.3.2. MPS Pairs on a Specific Three-Outcome Context

There is a theoretically important subset of the basic pairs $\Omega^b$ defined previously. If $m \in \Omega^b$ and $E(R_m) = E(S_m)$, $R_m$ is a mean-preserving spread (MPS) of $S_m$ according to the definition of Rothschild and Stiglitz (1970): Call $\Omega^{mps} \subset \Omega^b$ the set of MPS pairs. There is a well-known implication of Jensen's Inequality for the EU structure, for all pairs $m \in \Omega^{mps}$: If $u^n_z$ is weakly concave (convex) in $z$, then the EU structure implies that $S_m \succeq^n R_m$ ($R_m \succeq^n S_m$) $\forall m \in \Omega^{mps}$ (Rothschild & Stiglitz, 1970). Neither EU nor RDEU require $u^n_z$ to be weakly concave or weakly convex across all $z$. Indeed, where utilities have been estimated nonparametrically across four outcomes (0,1,2,3), as in Hey and Orme (1994) and Hey (2001), a substantial fraction of subjects (30–40%, depending on the structure estimated) display a concave-then-convex pattern of utilities – concave on the contexts (0,1,2), (0,1,3), and (0,2,3), but convex on the context (1,2,3). This is reflected in the fact that risk-averse choices in those data sets are much less frequent for pairs on the context (1,2,3). When I come to estimation using Hey and Orme's data, this is why I avoid a parametric utility function (like CARA or CRRA), which would force a uniform curvature over all four outcomes. However, any vector of utilities $(u^n_j, u^n_k, u^n_l)$ on any specific three-outcome context is either weakly concave or weakly convex in $z$ (or both). Therefore, if we let $\Omega^{mps}_c \subset \Omega^{mps}$ be all MPS pairs on any specific three-outcome context $c$, it is a preference equivalence set for EU. This is not true of RDEU for all weighting functions. I place heavy emphasis on this particular property of the EU structure in my comparison of stochastic models, even though it is rarely considered in decision experiments.
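A small sketch of the MPS test on a three-outcome context, under the definitions above (the numbers are illustrative only):

```python
def is_mps_pair(S, R, context):
    """True if {S, R} is a basic pair whose lotteries share a mean, so that
    R is a mean-preserving spread of S on the ordered context (j, k, l)."""
    mean = lambda lot: sum(p * z for p, z in zip(lot, context))
    same_mean = abs(mean(S) - mean(R)) < 1e-9
    s_is_safe = S[1] + S[2] > R[1] + R[2] and S[2] < R[2]
    return same_mean and s_is_safe

# On context (0, 2, 3): R moves S's middle-outcome mass to the tails,
# holding the mean at 2.
S = (0.0, 1.0, 0.0)
R = (1/3, 0.0, 2/3)
assert is_mps_pair(S, R, (0, 2, 3))
```

For a weakly concave utility vector on this context, Jensen's Inequality then makes $S$ the preferred lottery in every such pair.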
Outside of decision theory and experimental economics, the chief uses of both EU and RDEU in applied economic theory stress comparative statics results that flow from pairs $m \in \Omega^{mps}$ (precisely because of the property described above) or from pairs in special subsets of $\Omega^{mps}$, such as "linear spreads" and "monotone spreads" (see e.g., Quiggin, 1991). It seems sensible to emphasize and concentrate on those properties of structures that get used the most in applied economic theory,
and how those properties may be changed (or not) by various stochastic modeling options.

1.3.3. FOSD Pairs

It should be obvious that $\Omega^{fosd}$, the set of FOSD pairs, is a preference equivalence set for both EU and RDEU since both of these structures obey first-order stochastic dominance. In fact it is something stronger than a preference equivalence set, since these structures imply that all subjects $n$ prefer $S_m\ \forall m \in \Omega^{fosd}$ quite regardless of their structural parameter vector $\beta^n$. Loomes and Sugden (1995) have attached particular importance to this preference equivalence set because it is common to both EU and RDEU (and a great number of theories). In experiments, subjects rarely violate FOSD when it is "transparent" (Loomes & Sugden, 1998; Hey, 2001). Without further adornment, only one of the five stochastic models considered here gets this observation approximately correct; but we will see that with the aid of trembles and an appropriate computational interpretation, other stochastic models can be modified sensibly, and at almost no cost of parsimony, to explain this observation. However, none of the five models can explain "nontransparent" violations of FOSD such as in Birnbaum and Navarrete (1998).7

1.3.4. Context Shifts and Parametric Utility Functions

Say that the lottery pairs $m = \{S_m, R_m\}$ and $m' = \{S_{m'}, R_{m'}\}$ differ by an additive context shift if $S_m$ and $S_{m'}$ are identical probability vectors, and $R_m$ and $R_{m'}$ are identical probability vectors, but the contexts of pairs $m$ and $m'$ differ by the addition of $x$ to all outcomes; that is, $c_m = (j, k, l)$ and $c_{m'} = (j + x, k + x, l + x)$. If an EU or RDEU maximizer has a CARA utility function over outcomes $z$, then $u^n_{z+x} \equiv -\operatorname{sign}(\alpha^n)e^{-\alpha^n(z+x)} \equiv e^{-\alpha^n x}[-\operatorname{sign}(\alpha^n)e^{-\alpha^n z}] \equiv e^{-\alpha^n x}u^n_z$. Therefore, whenever $m$ and $m'$ differ by an additive context shift,

$$V(S_{m'}|\beta^n) \equiv e^{-\alpha^n x}V(S_m|\beta^n) \text{ and } V(R_{m'}|\beta^n) \equiv e^{-\alpha^n x}V(R_m|\beta^n) \qquad (4)$$
vectors, and $R_m$ and $R_{m'}$ are identical probability vectors, but the contexts of pairs $m$ and $m'$ differ by the multiplication of all outcomes by $y > 0$; that is, $c_m = (j, k, l)$ and $c_{m'} = (yj, yk, yl)$. If an EU or RDEU maximizer has a CRRA utility function over outcomes $z$, then $u^n_{yz} \equiv (1-\varphi^n)^{-1}(yz)^{1-\varphi^n} \equiv y^{1-\varphi^n}(1-\varphi^n)^{-1}z^{1-\varphi^n} \equiv y^{1-\varphi^n}u^n_z$. Therefore, whenever $m$ and $m'$ differ by a proportional context shift,

$$V(S_{m'}|\beta^n) \equiv y^{1-\varphi^n}V(S_m|\beta^n) \text{ and } V(R_{m'}|\beta^n) \equiv y^{1-\varphi^n}V(R_m|\beta^n) \qquad (5)$$
for both the EU and RDEU structures with a CRRA utility function. So by similar reasoning, a set of pairs that differ only by proportional context shifts are a preference equivalence set for both EU and RDEU structures with CRRA utility functions. These humble properties of CARA and CRRA utility functions, wholly transparent to anyone familiar with Pratt (1964), are not always important for theory comparisons, where experimental design can sometimes make parametric assumptions about utility functions unnecessary. Yet they are quite important for estimation because different stochastic models interact with these parametric functional forms in very different ways. We will see below that three of our stochastic models imply an intuitive choice probability invariance with an additive (proportional) context shift when a CARA (CRRA) utility function is used for estimation, but other stochastic models will not. Because of this, stochastic models are crucial to inferences about the constancy (or not) of absolute or relative risk aversion across different contexts (as in, e.g., Holt & Laury, 2002).
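Analogously to the additive case, the scaling in Eq. (5) under CRRA expected utility can be sketched as (illustrative parameters only):

```python
def crra_u(z, phi):
    """CRRA utility u_z = z**(1 - phi) / (1 - phi), for phi != 1."""
    return z ** (1.0 - phi) / (1.0 - phi)

def eu(lottery, context, phi):
    return sum(p * crra_u(z, phi) for p, z in zip(lottery, context))

phi, y = 0.5, 10.0                           # illustrative parameters
S, R = (0.0, 1.0, 0.0), (0.5, 0.0, 0.5)
base = (1.0, 2.0, 3.0)
scaled = tuple(y * z for z in base)          # proportional context shift by y

# Both lottery values scale by y**(1 - phi), so again the sign of the
# V-distance, and hence the preferred lottery, is unchanged.
factor = y ** (1.0 - phi)
for lot in (S, R):
    assert abs(eu(lot, scaled, phi) - factor * eu(lot, base, phi)) < 1e-9
```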
1.4. Other Structural Properties

Not all predictions of structures take the shape of preference equivalence sets. Formally, a preference equivalence set is a two-way implication about preference directions across a minimum of two pairs. But structures have some properties that are not two-way implications and, moreover, are defined with a minimum of three pairs. The two major properties here are betweenness and transitivity. Stochastic models can behave very differently with respect to these kinds of properties. For instance, we will see that while RPs make extremely strong predictions about properties expressible as preference equivalence sets, RPs produce nothing recognizable as a stochastic transitivity property.
1.4.1. Betweenness

Betweenness is the mother of all predicted differences between stochastic models (Becker et al., 1963a, 1963b). Let $D = tC + (1-t)E$, $t \in [0,1]$, be a linear mixture of the probability vectors making up any lotteries $C$ and $E$; obviously $D$ is itself a third lottery. A structure is said to satisfy betweenness if $V(D|\beta)$ is between $V(C|\beta)$ and $V(E|\beta)$ for any $\beta$ and any $t$. It is well-known that EU satisfies betweenness and that RDEU does not. Generally speaking, betweenness tests are conducted by having subjects choose from sets of three lotteries $\{C, D, E\}$ constructed as in the definition of betweenness above, where $D$ is a mixture of $C$ and $E$, rather than from pairs of lotteries. Becker et al. (1963a) showed that, in such situations, some stochastic models predict mild violations of betweenness when combined with the EU structure, while others do not. Becker et al. (1963b) found violations of betweenness in about 30% of all choices from suitably constructed lottery triples. This is far too high a rate to be a slip of the fingers (we will revisit the notion of "trembles" shortly), but just about right for strong utility models, given contemporary knowledge about retest reliability of lottery choice (e.g., Ballinger & Wilcox, 1997). Blavatskyy (2006) discusses how betweenness violations of this order of magnitude, observed within and across many studies, can be explained by EU with the strong utility model (very similar explanations flow from strict utility, contextual utility, and the wandering vector (WV) model). Random preference EU models, however, seem to be rejected by these observations (Becker et al., 1963a, 1963b). In the language of Gul and Pesendorfer (2006), EU with RPs has the property of extremeness, which means that any observed choice from a set must be the unique maximizer in the choice set for some utility function. Betweenness implies that this can only be $C$ or $E$ in a set $\{C, D, E\}$ when $D$ is a mixture of $C$ and $E$.
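Because EU is linear in the probabilities, the mixture's EU value is the same mixture of the endpoint values, which is easy to verify numerically (a sketch with arbitrary illustrative utilities):

```python
def eu_value(lottery, u):
    return sum(p * uz for p, uz in zip(lottery, u))

u = [0.0, 1.0, 1.8, 2.0]                     # illustrative utilities
C = [0.1, 0.2, 0.3, 0.4]
E = [0.4, 0.3, 0.2, 0.1]

# EU linearity gives V(D) = t*V(C) + (1-t)*V(E), so the mixture's value
# always lies between the endpoint values: betweenness holds.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    D = [t * c + (1 - t) * e for c, e in zip(C, E)]
    vC, vD, vE = eu_value(C, u), eu_value(D, u), eu_value(E, u)
    assert min(vC, vE) - 1e-12 <= vD <= max(vC, vE) + 1e-12
```

The same check run with a rank-dependent value function in place of `eu_value` can fail, which is exactly the sense in which RDEU violates betweenness.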
Strictly speaking, this is only true for regular RP models, in which "tied preferences" (and so indifference between $C$ and $E$) are zero probability events (see Gul and Pesendorfer). But to explain a 30% rate of violations of betweenness as indifference between $C$ and $E$, one would be invoking a rather scary amount of indifference. Since I focus on binary choice – choice from pairs, not triples – these observations about betweenness tests in suitably constructed lottery triples are mostly tangential. Yet betweenness does have testable implications across certain sets of MPS pairs for most of the stochastic models introduced below. So after that introduction, it will reappear when MPS pairs are taken up again.
1.4.2. Transitivity, Stochastic Transitivities, and Simple Scalability

Consider the three unique lottery pairs that can be constructed from any triple of lotteries $\{C, D, E\}$, that is, the pairs $\{C, D\}$, $\{D, E\}$, and $\{C, E\}$, and call these pairs 1, 2, and 3. A deterministic relational system is transitive if $C \succeq^n E$ whenever both $C \succeq^n D$ and $D \succeq^n E$. Both EU and RDEU are transitive structures, and so are many other structures. There are three stochastic versions of transitivity that may or may not be satisfied when a transitive structure is combined with a stochastic model. These are:

Strong stochastic transitivity (SST): $\min(P^n_1, P^n_2) \geq 0.5 \Rightarrow P^n_3 \geq \max(P^n_1, P^n_2)$;

Moderate stochastic transitivity (MST): $\min(P^n_1, P^n_2) \geq 0.5 \Rightarrow P^n_3 \geq \min(P^n_1, P^n_2)$; and

Weak stochastic transitivity (WST): $\min(P^n_1, P^n_2) \geq 0.5 \Rightarrow P^n_3 \geq 0.5$.

Stochastic transitivities are among the central distinctions between stochastic choice models (Morrison, 1963). As we will see, the five stochastic models considered here predict either SST or MST, or they predict no stochastic transitivity at all – not even WST. The relative power of stochastic transitivity predictions is that they are independent of whether EU or RDEU is the true structural model. Indeed, any stochastic transitivity property of a stochastic model will hold for all transitive structures $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0 \Leftrightarrow P^n_m \geq 0.5$. The rhetoric of identifying restrictions cuts both ways: We need to be circumspect about using a "favorite" structure as an identifying restriction for choosing a stochastic model. As will be clear below, most stochastic model predictions depend on the true structural model: Stochastic transitivities are particularly useful testable properties precisely because they do not (for the class of transitive structures). Simple scalability is a stochastic choice property that is closely related to stochastic transitivities: It is a necessary condition for the stochastic models that produce SST.
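The three definitions transcribe directly into code (hypothetical helper names, not from the chapter):

```python
def sst(p1, p2, p3):
    """Strong stochastic transitivity for pairs {C,D}, {D,E}, {C,E}."""
    return min(p1, p2) < 0.5 or p3 >= max(p1, p2)

def mst(p1, p2, p3):
    """Moderate stochastic transitivity."""
    return min(p1, p2) < 0.5 or p3 >= min(p1, p2)

def wst(p1, p2, p3):
    """Weak stochastic transitivity."""
    return min(p1, p2) < 0.5 or p3 >= 0.5

# SST implies MST, and MST implies WST; the converses fail.  Here p3 = 0.6
# satisfies WST and MST but violates SST (it falls below max(p1, p2) = 0.7).
p1, p2, p3 = 0.7, 0.6, 0.6
assert wst(p1, p2, p3) and mst(p1, p2, p3) and not sst(p1, p2, p3)
```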
Simple scalability holds if there is a function $f(x, y)$, increasing in $x$ and decreasing in $y$, and an assignment of structure values $V$ to all lotteries, such that $P^n_m \equiv f[V(S_m|\beta^n), V(R_m|\beta^n)]$. Simple scalability implies a choice probability ordering independence property. Let the pairs $\{C, E\}$ and $\{D, E\}$ be indexed by $ce$ and $de$, respectively. Let $E'$ be any fourth lottery, to be substituted for $E$ in these two pairs, forming the pairs $\{C, E'\}$ and $\{D, E'\}$ indexed by $ce'$ and $de'$, respectively. Then simple scalability requires $P^n_{ce} \geq P^n_{de}$ iff $P^n_{ce'} \geq P^n_{de'}$ (Tversky & Russo, 1969). Intuitively,
suppose we thought of lottery $E$ as a standard of comparison, and we measure the relative strength of subject $n$'s preference for lotteries $C$ and $D$ by the relative likelihood that these are chosen over the standard lottery $E$. Simple scalability requires that this ordering is independent of the choice we make for the standard lottery: We must be able to replace $E$ by any $E'$ and get the same ordering of choice probabilities. There is a large canon of experimental results, generated almost exclusively by experimental psychologists, on stochastic transitivities and simple scalability. Experimental economists will notice that much of this canon was generated long before battle was joined on methodological matters such as performance-contingent incentives and incentive compatibility, as initiated by Grether and Plott (1979). Notwithstanding Camerer and Hogarth's (1999) claim to the contrary, there are findings based on purely hypothetical tasks, or tasks with very low incentive levels, that simply do not hold up with real performance-contingent incentives of sufficient size (see e.g., Wilcox (1993) on violations of "reduction of compound lotteries," or Cummings, Harrison, and Rutström (1995) on binary choice valuation methods). Nevertheless, collective experience since Grether and Plott leads me to respect the experimental canon from "psychology before incentives" since many (though not all) of its qualitative findings replicate when using our own methods. Therefore, I regard the psychological canon as useful information, while also believing we need to replicate its important findings using our own methods. With these remarks in mind, the psychological experimental canon suggests that SST holds much of the time, but that there are systematic violations of it in many judgment and choice contexts (Block & Marschak, 1960; Luce & Suppes, 1965; Tversky & Russo, 1969; Tversky, 1972; Luce, 1977).
That evidence coincides with theoretical reasoning based on similarity and/or dominance relations (Debreu, 1960), and generally supports MST instead; but much of that evidence does not use lotteries as choice alternatives. In the specific case of choice under risk, some evidence against SST comes from experiments where lottery outcome probabilities are deliberately made uncertain or imperfectly discriminable by experimental manipulation (e.g., Chipman, 1963; Tversky, 1969). Indeed under these circumstances Tversky showed that WST can be violated in systematic ways by at least some subjects. However, there are occasional violations of SST with standard lotteries too, and with our own methods, again matching theoretical reasoning based on similarity relations (Ballinger & Wilcox, 1997). The implications of simple scalability have also failed repeatedly with
lotteries (e.g., Myers & Sadler, 1960; see Busemeyer & Townsend, 1993), though many such demonstrations involve minuscule or purely hypothetical incentives. My own assessment of this evidence is that we should expect stochastic models that imply SST to have some systematic problems, but that stochastic models implying MST may be fairly robust for typical binary lottery choice under risk.8
2. FIVE STOCHASTIC MODELS

The sheer number of stochastic modeling options for binary discrete choice under risk is, frankly, amazingly large. Good sources for the five models I consider here, as well as other models, are Becker et al. (1963a), Luce and Suppes (1965), Loomes and Sugden (1995), Busemeyer and Townsend (1993), Fishburn (1999), and Wilcox (2007a). Three of the models are well-known to experimental economics: These are the RP model, the strong utility (or Fechnerian) model, and the strict utility model. Moderate utility models are less well-known to the field, though related stochastic modeling assumptions are found in Hey (1995) and Buschena and Zilberman (2000). I discuss two of these: The WV model of Carroll (1980) and my own contextual utility model (Wilcox, 2007a). I also briefly discuss two other very interesting models, Blavatskyy's (2007) truncated error model and Busemeyer and Townsend's (1993) decision field theory, but do not treat these in detail here. Because stochastic transitivities are such a fundamental part of stochastic models, these properties are discussed with the introduction of each model, rather than in later sections. I begin with a useful stochastic modeling device that can be used in conjunction with any of these models in various ways and for various purposes.
2.1. Trembles

Some randomness of observed choice has been thought to arise from attention lapses or simple responding mistakes (e.g., pushing the wrong button) that are wholly or mostly independent of pairs $m$. Following Moffatt and Peters (2001), call such events trembles and assume they occur with probability $\omega^n$ independent of $m$ and, in the event of a tremble, assume that choices of $S_m$ or $R_m$ are equiprobable.9 In concert with this, draw a distinction between overall choice probabilities $P^n_m$ (that in part reflect trembles) and considered choice probabilities $P^{*n}_m$ that depend on
characteristics of $m$ and govern choice behavior when no tremble occurs. Under these assumptions and definitions, we have

$$P^n_m = (1 - \omega^n)P^{*n}_m + \frac{\omega^n}{2} \qquad (6)$$

Note that under Eq. (6), $P^n_m \geq 0.5$ iff $P^{*n}_m \geq 0.5$, $\forall\ \omega^n \in [0,1]$. In words, we may give a stochastic definition of preference in terms of either $P^n_m$ or $P^{*n}_m$ since trembles do not reverse preference directions relative to stochastic indifference (defined as $P^n_m = P^{*n}_m = 0.5$). The stochastic models that follow are all models of considered choice probabilities $P^{*n}_m$, to which we may add trembles in the manner of Eq. (6) if this is empirically or theoretically desirable. Later, we will see that trembles usefully allow for violations of "transparent" instances of first-order stochastic dominance.
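Eq. (6) is a one-liner; the sketch below also illustrates why trembles cannot reverse a preference direction (all values illustrative):

```python
def overall_prob(p_star, omega):
    """Overall choice probability from a considered probability, Eq. (6)."""
    return (1.0 - omega) * p_star + omega / 2.0

# Trembles shrink probabilities toward 0.5 but never move them across it,
# so preference directions are preserved at any tremble rate.
assert abs(overall_prob(0.9, 0.2) - 0.82) < 1e-12
for omega in (0.0, 0.3, 0.9):
    assert (overall_prob(0.8, omega) >= 0.5) == (0.8 >= 0.5)
```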
2.2. The Random Preference Model

The RP model views each subject as having many deterministic relational systems, and assumes that at each trial of pair $m$, the subject randomly draws one of these and chooses (without error) according to the drawn relational system's relational statement for that pair. From an econometric viewpoint, the RP model views random choice as arising from randomness of structural parameters. We think of each individual subject $n$ as having an urn filled with possible structural parameter vectors $\beta^n$. (For instance, a CRRA EU structure with an RP model could be thought of as a situation where subject $n$ has an urn filled with various values of her coefficient of relative risk aversion $\varphi^n$.) At each new trial $t$ of any pair $m$, the subject draws a new parameter vector from this urn (with replacement) and uses it to calculate both $V(S_m|\beta^n)$ and $V(R_m|\beta^n)$ without error. Then $S_m$ is chosen iff $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0$: Formally, it is a relational system that is randomly selected, and that relational system's relational statement for the pair $\{S_m, R_m\}$ determines choice without error. Each $\beta^n$ represents a relational system, and so a single draw of $\beta^n$ is applied to both lotteries in a pair and determines the choice on that draw. The considered choice probability is simply the probability that a value of $\beta^n$ is drawn such that $V(S_m|\beta^n) - V(R_m|\beta^n) \geq 0$. Let $B_m = \{\beta\,|\,V(S_m|\beta) - V(R_m|\beta) \geq 0\}$; then under the RP model, $P^{*n}_m = \Pr(\beta^n \in B_m)$.10 To carry out parametric estimation for any subject $n$, one then needs to specify a joint distribution function $G_\beta(x|\alpha^n)$ for $\beta^n$, conditioned on some vector $\alpha^n$ of parameters governing the location and shape of $G_\beta$. Along with
any tremble probability, call the vector $\zeta^n = (\alpha^n, \omega^n)$ stochastic parameters as they determine choice probabilities but are not themselves structural parameters. In RP models, the stochastic parameters $\alpha^n$ determine a subject's distribution of structural parameters. We then have the considered choice probability

$$P^{*n}_m = \int_{\beta \in B_m} dG_\beta(x|\alpha^n) \qquad (7)$$

Substituting $(\alpha, \omega)$ for the true parameter vector, one may then use Eq. (7) to construct a likelihood function for observations $y^n_m$ conditional on $(\alpha, \omega)$, for some subject $n$. This likelihood function would then be maximized in $(\alpha, \omega)$ to estimate $(\alpha^n, \omega^n)$. How does this work with specific structures on a given context? Denoting combinations of structures and stochastic models by ordered pairs, I begin with an (RDEU,RP) model, which will imply an (EU,RP) model since EU is a special case of RDEU. The technique described here is due to Loomes et al. (2002): Like them, I simplify the problem by assuming that any weighting function parameters are nonstochastic, that is, that the only structural parameters that vary in a subject's "RP urn" are her utilities. Let $G_u$ be the joint c.d.f. of those utilities. Substituting Eq. (2) into Eq. (7), considered choice probabilities under an (RDEU,RP) model with Prelec's (1998) one-parameter weighting function are
$$P^{*n}_m = \Pr\Bigl(\sum_{z \in c_m} W_{mz}(\gamma^n) u^n_z \geq 0 \,\Big|\, G_u(x|\alpha^n)\Bigr); \text{ where}$$

$$W_{mz}(\gamma^n) = w\Bigl(\sum_{i \geq z} s_{mi}\Big|\gamma^n\Bigr) - w\Bigl(\sum_{i > z} s_{mi}\Big|\gamma^n\Bigr) - w\Bigl(\sum_{i \geq z} r_{mi}\Big|\gamma^n\Bigr) + w\Bigl(\sum_{i > z} r_{mi}\Big|\gamma^n\Bigr) \qquad (8)$$

Noting that $W_{mj}(\gamma^n) \equiv -W_{mk}(\gamma^n) - W_{ml}(\gamma^n)$ for pairs $m$ on three-outcome contexts $c_m = (j, k, l)$, and assuming strict monotonicity of utility in outcomes so that we may divide the inequality in Eq. (8) through by $u^n_k - u^n_j$, Eq. (8) becomes

$$P^{*n}_m = \Pr\bigl(W_{mk}(\gamma^n) + W_{ml}(\gamma^n)(v^n_m + 1) \geq 0 \,\big|\, G_u(x|\alpha^n)\bigr); \text{ where}$$
$$v^n_m \equiv \frac{u^n_l - u^n_k}{u^n_k - u^n_j},$$
$$W_{mk}(\gamma^n) = w(s_{mk} + s_{ml}|\gamma^n) - w(s_{ml}|\gamma^n) - w(r_{mk} + r_{ml}|\gamma^n) + w(r_{ml}|\gamma^n), \text{ and}$$
$$W_{ml}(\gamma^n) = w(s_{ml}|\gamma^n) - w(r_{ml}|\gamma^n) \qquad (9)$$
This elegant trick reduces the random utility vector $(u^n_j, u^n_k, u^n_l)$ on context $c_m$ to the scalar random variable $v^n_m \in \mathbb{R}^+$, containing all choice-relevant stochastic information about the agent's random utilities on context $c_m$. Let $G_{v_m}(x|\alpha^n_m)$ be the c.d.f. of $v^n_m$ (generated by $G_u$), and consider just basic pairs where $r_{ml} - s_{ml} > 0$ so that $-W_{ml}(\gamma^n) = w(r_{ml}|\gamma^n) - w(s_{ml}|\gamma^n) > 0$.11 We can then rewrite Eq. (9) to make the change of random variables described above explicit:

$$P^{*n}_m = \Pr\Bigl(v^n_m \leq \frac{W_{mk}(\gamma^n) + W_{ml}(\gamma^n)}{-W_{ml}(\gamma^n)} \,\Big|\, G_{v_m}(x|\alpha^n_m)\Bigr); \text{ or}$$
$$P^{*n}_m = G_{v_m}\Bigl(\frac{w(s_{mk} + s_{ml}|\gamma^n) - w(r_{mk} + r_{ml}|\gamma^n)}{w(r_{ml}|\gamma^n) - w(s_{ml}|\gamma^n)} \,\Big|\, \alpha^n_m\Bigr) \qquad (10)$$
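Under an assumed lognormal c.d.f. for $v^n_m$ (one of the distributional choices the text mentions), Eq. (10) can be computed directly; every numeric value here is illustrative.

```python
import math

def prelec_w(q, gamma):
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

def lognormal_cdf(x, mu, sigma):
    """c.d.f. of a lognormal variable on R+ (an assumed choice for G_vm)."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))

def rp_choice_prob(S, R, gamma, mu, sigma):
    """Eq. (10): G_vm((w(s_k+s_l) - w(r_k+r_l)) / (w(r_l) - w(s_l)))."""
    _, sk, sl = S
    _, rk, rl = R
    num = prelec_w(sk + sl, gamma) - prelec_w(rk + rl, gamma)
    den = prelec_w(rl, gamma) - prelec_w(sl, gamma)   # positive for basic pairs
    return lognormal_cdf(num / den, mu, sigma)

p = rp_choice_prob((0.0, 1.0, 0.0), (0.2, 0.0, 0.8), gamma=1.0, mu=0.0, sigma=0.5)
assert 0.0 < p < 1.0
```

Shifting the location of $G_{v_m}$ (here $\mu$) moves probability between the safe and risky lotteries while the structural weights stay fixed.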
With Eq. (10), we have arrived where Loomes et al. (2002) left things. Choosing some c.d.f. on $\mathbb{R}^+$ for $G_{v_m}$, such as the lognormal or gamma distribution, construction of a likelihood function from Eq. (10) and choice data is straightforward for one context, and this is the kind of experimental data Loomes, Moffatt, and Sugden considered. But careful attention to the notation above makes two things clear that Loomes, Moffatt, and Sugden did not discuss. First, because the random variable $v^n_m$ is generated from the joint distribution of utilities on the context $c_m$, its c.d.f. $G_{v_m}$ is context-dependent. Second, if $m$ and $m'$ are pairs on distinct contexts that share some outcomes, one cannot simply choose any old distribution functions for $v^n_m$ and $v^n_{m'}$ since these two distinct random variables are generated by the same underlying joint distribution $G_u$ of outcome utilities. This implies that tractable generalizations of (RDEU,RP) models across multiple contexts are inherently subtle, as will be made plain later. Loomes and Sugden (1995) point out a property of RP models that I make repeated use of later. The definition of a preference equivalence set $\Omega^{eV}$ given in Eq. (3), and the definition $B_m = \{\beta\,|\,V(S_m|\beta) - V(R_m|\beta) \geq 0\}$, together imply that $B_m \equiv B_{m'}\ \forall m$ and $m' \in \Omega^{eV}$. Eq. (7) then implies that $P^{*n}_m \equiv P^{*n}_{m'}\ \forall m$ and $m' \in \Omega^{eV}$. That is, the RP model requires that each subject $n$ with structure $V$ must have constant choice probabilities across every pair that is in a preference equivalence set for structure $V$. This does not mean that all subjects must have the same choice probabilities; these may vary across subjects with the same $V$, since different subjects may have differently composed "urns" of parameter vectors $\beta$. However, it implies the following: If the RP model and structure $V$ are both true for all subjects in a population, expected sample choice proportions for all pairs $m$ and $m'$ in a preference equivalence set of structure $V$ must be equal. Formally, replace
the subject index $n$ by the subject type index $c$ with distribution $J(c)$ in the sampled population: If $P^c_m \equiv P^c_{m'}\ \forall m$ and $m' \in \Omega^{eV}$, $\forall c$, then $\int P^c_m\, dJ(c) \equiv \int P^c_{m'}\, dJ(c)\ \forall m$ and $m' \in \Omega^{eV}$ as well. That is, the same individual invariance of choice probabilities across pairs in any preference equivalence set will characterize the population (and, obviously, any sample from it up to sampling variability). This preference equivalence set property of RP models is extremely powerful, especially for more restrictive structures like expected utility which create several distinct kinds of preference equivalence sets: I use it frequently below. It is also a seductive property because it is a powerful identifying restriction and rationalizes many relatively casual inferences. For instance, we will see later that the usual conclusion drawn about the common ratio effect (that it is an EU violation) is sensible if RP is the true stochastic model. However, there is a downside to this property of RP models. Because few preference equivalence sets are shared by several different theories, tests of a preference equivalence set property are almost always joint tests of the RP model and some specific structure $V$. The only exception to this is the preference equivalence set of FOSD pairs, which is shared by many structures $V$. In stark contrast to the powerful preference equivalence set property, RP models generally have no stochastic transitivity property – not even WST – even if the structure considered is a transitive one like EU or RDEU (Loomes & Sugden, 1995; Fishburn, 1999). The reason for this is identical to the well-known "voting paradox" of public choice theory (Black, 1948): Unless all preference orderings in the urn have the property of single-peakedness, there need be no Condorcet winner.12 Those who regard tests of transitivity as central to empirical decision research may find this aspect of RPs deeply troubling.
Of course, a proponent of RPs might well accept a restriction to urns of preference orderings that do have the single-peakedness property.13 Proponents of rank-dependent models will similarly (sometimes) accept a shape restriction on weighting functions, utility functions, and/or value functions, in order to mollify critics who deem these models too flexible relative to the more restrictive EU alternative.
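The voting-paradox point is easy to make concrete. The sketch below is my own illustration, not an example from the chapter: the three-ordering ''urn'' is the classic Condorcet profile, in which every ordering is individually transitive and yet the induced binary choice probabilities cycle, violating WST.

```python
from fractions import Fraction

# An RP "urn" holding three equally likely strict preference orderings over
# options A, B, C -- each ordering is individually transitive.
urn = [("A", "B", "C"), ("B", "C", "A"), ("C", "A", "B")]

def p_over(x, y):
    """Probability that a draw from the urn ranks x above y."""
    favorable = sum(1 for order in urn if order.index(x) < order.index(y))
    return Fraction(favorable, len(urn))

p_ab, p_bc, p_ac = p_over("A", "B"), p_over("B", "C"), p_over("A", "C")

# WST requires: p_ab >= 1/2 and p_bc >= 1/2 imply p_ac >= 1/2.
# Here p_ab = p_bc = 2/3 but p_ac = 1/3: no Condorcet winner, WST fails.
print(p_ab, p_bc, p_ac)
```

Restricting the urn to single-peaked orderings, as the text's footnote suggests, rules such profiles out.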
Stochastic Models for Binary Discrete Choice Under Risk

2.3. The Strong Utility and Strict Utility Models: The Fechnerian and ''Luce'' Models

Strong utility was first axiomatized by Debreu (1958), but it has a very old pedigree going back to Thurstone (1927) and Fechner (1966/1860), and many writers call it the Fechnerian model. Strong utility models attach behavioral meaning to V-distance: They assume that there exists an increasing function $F:\mathbb{R}\to[0,1]$, with $F(0) = 0.5$ and $F(x) = 1 - F(-x)$ (i.e., skew-symmetry about $x = 0$), such that

$$P^n_m = F(\lambda^n[V(S_m|\beta^n) - V(R_m|\beta^n)]) \qquad (11)$$
Because EU and RDEU are affine structures, the V-distance $V(S_m|\beta^n) - V(R_m|\beta^n)$ is unique only up to scale. From one theoretical viewpoint, then, strong utility models for EU or RDEU might seem ill-conceived, since the V-distance that is their argument is nonunique. We need to keep in mind, though, that the EU and RDEU structures are representations of underlying preference directions, not underlying choice probabilities. A more positive view of the matter is that both EU and RDEU imply that $\lambda^n$ is a free parameter that may be chosen to do stochastic descriptive work that is not these structures' primary descriptive purpose. No preference direction represented by the structure V is changed, for any fixed $\beta^n$, as $\lambda^n$ varies. However, a somewhat subtle problem still lurks within this line of thinking, having to do with the stochastic meaning of risk aversion: This problem is the inspiration for the contextual utility model introduced later. It is well-known that strong utility models imply SST (Morrison, 1963; Luce & Suppes, 1965).

There is a virtually one-to-one relationship between strong utility models and the homoscedastic ''latent variable models'' widely employed in empirical microeconomics for modeling discrete dependent variables. In general, such models assume that there is an underlying but unobserved continuous random latent variable $y^{n\star}_m$ such that $y^n_m = 1 \Leftrightarrow y^{n\star}_m \geq 0$; then $P^n_m = \Pr(y^{n\star}_m \geq 0)$. In our case, the latent variable takes the form $y^{n\star}_m = V(S_m|\beta^n) - V(R_m|\beta^n) - \sigma^n\varepsilon$, where $\varepsilon$ is a mean zero random variable with some standard variance and c.d.f. $F(x)$ such that $F(0) = 0.5$ and $F(x) = 1 - F(-x)$, usually assumed to be the standard normal or logistic c.d.f.14 The resulting latent variable model of a considered choice probability is then

$$P^n_m = F\!\left(\frac{V(S_m|\beta^n) - V(R_m|\beta^n)}{\sigma^n}\right) \qquad (12)$$

In latent variable models, the random variable $\sigma^n\varepsilon$ may be thought of as random computational, perceptual or evaluative noise in the decision maker's apprehension of $V(S_m|\beta^n) - V(R_m|\beta^n)$, with $\sigma^n$ being proportional to the standard deviation of this noise. As $\sigma^n$ approaches zero, considered
choice probabilities converge on zero or one, depending on the sign of $V(S_m|\beta^n) - V(R_m|\beta^n)$; in other words, the observed choice becomes increasingly likely to express the underlying preference direction. To complete the analogy with a strong utility model, one may interpret $\lambda^n$ as equivalent to $1/\sigma^n$. In keeping with common (but not universal) parlance, I will call $\lambda^n$ subject n's precision parameter. In strong utility models with a tremble, the stochastic parameter vector is $\zeta^n = (\lambda^n, \omega^n)$.

One of Luce's (1959) stochastic choice models, known as the strict utility model, may be thought of as a strong utility model in which natural logarithms of structural lottery values replace the lottery values themselves. It appears in contemporary applied work, where for example Holt and Laury (2002) write considered choice probabilities as

$$P^{cn}_m = \frac{V(S_m|\beta^n)^{\lambda^n}}{V(S_m|\beta^n)^{\lambda^n} + V(R_m|\beta^n)^{\lambda^n}} \qquad (13)$$
A little bit of algebra shows that this is equivalent to

$$P^{cn}_m = \Lambda(\lambda^n[\ln V(S_m|\beta^n) - \ln V(R_m|\beta^n)]) \qquad (14)$$
where $\Lambda(x) = [1 + e^{-x}]^{-1}$ is the logistic c.d.f. This resembles strong utility, but natural logarithms of V are differenced to create the latent variable, rather than differencing V itself: I will call this logarithmic V-distance. Note that strict utility algebraically requires strictly positive values of V. The nonparametric representation of utilities adopted earlier, that is, $U^n = (0, 1, u^n_2, u^n_3, \ldots)$ on the outcomes (0, 1, 2, 3, …), makes this so (since the minimum outcome 0 has a utility of zero), so this is satisfied for all lotteries except a sure zero outcome. Formally, there is a theoretical mismatch inherent in combining affine structures (such as EU or RDEU) with strict utility. An affine structure V is unique up to an affine transformation. Yet formally speaking, the axiom systems that produce strict utility models imply that the V within the stochastic specification is the stronger kind of scale known as a ratio scale, in which V must be strictly positive and is unique only up to a ratio transformation (Luce & Suppes, 1965). It is not entirely clear whether this theoretical mismatch implies any highly consequential and deep axiomatic incoherence. From a purely algebraic perspective, all we must do is choose a nonnegative utility of the minimum outcome in some experiment, and the combination will be well-defined. Yet as we will see below, an (EU,Strict) model will then have a rather peculiar property: It can explain common ratio effects on any context where the minimum outcome's utility is positive,
but not on any context where the minimum outcome's utility is zero. Strict utility has other relatively attractive properties across contexts, but contextual utility will share those properties without this odd ''scaling dependence'' vis-à-vis the common ratio effect. Before leaving strong and strict utility, note that several distinct models get called ''Luce'' models. This is not surprising, since Luce developed many different probabilistic choice models, but it does cause some confusion. The strict utility model in Eq. (13) is one of these, but this well-known model, which closely resembles it, also gets called ''the Luce model:''

$$P^{cn}_m = \frac{\exp[\lambda^n V(S_m|\beta^n)]}{\exp[\lambda^n V(S_m|\beta^n)] + \exp[\lambda^n V(R_m|\beta^n)]} \qquad (15)$$
Many readers will recognize this model as the one used by McKelvey and Palfrey (1995) for quantal response equilibrium and by Camerer and Ho (1999) for their EWA learning model. Under the terminology I am using here, this is a strong utility model because simple algebra shows it to be equivalent to Eq. (11) with a logistic c.d.f. for F. From the viewpoint of the affine structures EU and RDEU considered here, the crucial distinction between what I am calling strong and strict utility is whether the argument of the c.d.f. F is a V-distance or a logarithmic V-distance. All strict utility models are also strong utility models of a sort, in which natural logarithms of the structure V replace the structure V and the c.d.f. is the logistic, but not all strong utility models are strict utility models (Luce & Suppes, 1965). Since the nonlinear transformation of V by natural logarithms is a peculiar move for affine structures, I distinguish these particular models with the name ‘‘strict utility’’ and reserve the term ‘‘strong utility’’ for models in which V-distance, and not logarithmic V-distance, appears in the c.d.f. F.
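The two equivalences just described are purely algebraic and easy to check numerically. In the sketch below, the lottery values $V(S_m|\beta^n) = 2.0$ and $V(R_m|\beta^n) = 1.5$ and the precision $\lambda^n = 1.3$ are arbitrary assumptions of mine:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

v_s, v_r, lam = 2.0, 1.5, 1.3   # assumed lottery values (> 0) and precision

# Strict utility in its Holt-Laury power form, Eq. (13) ...
p13 = v_s**lam / (v_s**lam + v_r**lam)
# ... equals a logistic of the logarithmic V-distance, Eq. (14).
p14 = logistic(lam * (math.log(v_s) - math.log(v_r)))

# The "Luce model" of Eq. (15) ...
p15 = math.exp(lam * v_s) / (math.exp(lam * v_s) + math.exp(lam * v_r))
# ... is strong utility -- Eq. (11) with a logistic c.d.f. -- because its
# argument is the plain V-distance, not the logarithmic V-distance.
p11 = logistic(lam * (v_s - v_r))

print(p13, p14, p15, p11)
```

The first two probabilities coincide exactly, as do the last two, while the two model families give genuinely different numbers from the same V values.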
2.4. Moderate Utility I: The Wandering Vector Model

Econometrically, moderate utility models are heteroscedastic latent variable models, that is, models where the standard deviation of judgmental noise is conditioned on pairs m, so that we write $\sigma^n_m$, and considered choice probabilities become

$$P^{cn}_m = F\!\left(\frac{V(S_m|\beta^n) - V(R_m|\beta^n)}{\sigma^n_m}\right) \qquad (16)$$
Hey (1995) and Buschena and Zilberman (2000) have explored some heteroscedastic models for discrete choice under risk. Moderate utility models, however, place specific restrictions on the form of the heteroscedasticity, so as to guarantee MST. Again consider the three pairs associated with any lottery triple {C,D,E}, that is, the pairs {C,D}, {D,E}, and {C,E}, and call these pairs 1, 2, and 3, respectively. Then moderate utility models require that the standard deviations $\sigma^n_m$ behave like a distance measure or norm on these lottery pairs, satisfying the triangle inequality: That is, they require $\sigma^n_1 + \sigma^n_2 \geq \sigma^n_3$ across all such triples of pairs in order to satisfy MST (Halff, 1976; see Appendix A). Therefore, letting $d_m \equiv d(S_m, R_m)$ be a norm on pairs m, the moderate utility model is

$$P^{cn}_m = F\!\left(\frac{\lambda^n[V(S_m|\beta^n) - V(R_m|\beta^n)]}{d_m}\right) \qquad (17)$$

We can generate one class of moderate utility models by using a measure of distance between the probability vectors $S_m$ and $R_m$. The Minkowski norm $(\sum_{i=1}^{I} |s_{im} - r_{im}|^\alpha)^{1/\alpha}$ is an obvious choice for $d_m \equiv d(S_m, R_m)$ here; this would add the extra parameter $\alpha \geq 1$ to a model. Intuitively, such a norm is one measure of similarity between the lotteries in a pair, and these moderate utility models assert that for given V-distance, more similar lotteries are compared with less noise.15 Carroll (1980) pioneered a simple computational underpinning for this intuition, called the WV model. It implies the Euclidean norm $(\sum_{z=0}^{I-1}(s_{mz} - r_{mz})^2)^{1/2}$ as the proper choice for $d_m$, so that the WV model has no extra parameters. Therefore, we can compare the WV model to RPs, strong utility, and strict utility without taking a position on the value of parsimony. I illustrate it here for the expected utility structure. Suppose subject n has a noisy perception of her utilities of each outcome z; in particular, suppose this utility is a random variable $\tilde{u}^n_z = u^n_z + \xi^n_z$, where $\xi^n_z \sim N[0, (\sigma^n_u)^2]\ \forall z$ and $u^n_z$ is her mean utility of outcome z.

At each new trial of any pair m, assume that a new vector of noisy utility perceptions occurs, so that there is a new realization of the vector $(\xi^n_0, \xi^n_1, \ldots, \xi^n_{I-1})$ – the ''wandering'' part of the utility vector. Then by definition, $\sum_{z=0}^{I-1}(s_{mz} - r_{mz})\tilde{u}^n_z = V(S_m|\beta^n) - V(R_m|\beta^n) + \sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$, so that the error term of the latent variable is $\sigma^n_m\varepsilon \equiv \sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$. Since $\sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$ is a linear combination of normally distributed random variables, it is normally distributed too. If we further assume that $\mathrm{cov}(\xi_z, \xi_{z'}) = 0\ \forall\, z \neq z'$ – what Carroll and De Soete (1991) call the ''standard'' WV model – the variance of $\sum_{z=0}^{I-1}(s_{mz} - r_{mz})\xi^n_z$ is then $(\sigma^n_u)^2 \sum_{z=0}^{I-1}(s_{mz} - r_{mz})^2$. Therefore, the standard deviation of the latent variable's error term, $\sigma^n_m$, becomes $\sigma^n_u(\sum_{z=0}^{I-1}(s_{mz} - r_{mz})^2)^{0.5}$. Thus the standard WV model is a moderate utility model of the Eq. (17) form, where $\lambda^n = 1/\sigma^n_u$, $d_m$ is the Euclidean norm, and F(x) is the standard normal c.d.f.
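A sketch of the standard WV model may help. The outcome utilities, lottery probability vectors, and σ_u below are my own illustrative assumptions; the code evaluates the closed-form moderate utility probability and then checks it by redrawing the ''wandering'' utility vector across simulated trials.

```python
import math, random

# Illustrative EU setup on outcomes z = 0..3 (all values are assumptions).
u = [0.0, 0.45, 0.8, 1.0]      # mean utilities u_z
s = [0.1, 0.5, 0.4, 0.0]       # safe lottery S_m (probabilities over z)
r = [0.4, 0.0, 0.3, 0.3]       # risky lottery R_m
sigma_u = 0.25                  # s.d. of each utility perturbation xi_z

dv = sum((si - ri) * uz for si, ri, uz in zip(s, r, u))        # V-distance
d_m = math.sqrt(sum((si - ri) ** 2 for si, ri in zip(s, r)))   # Euclidean norm

def phi(x):  # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Closed-form standard WV probability: Eq. (17) with lambda = 1/sigma_u,
# d_m the Euclidean norm, and F the standard normal c.d.f.
p_closed = phi(dv / (sigma_u * d_m))

# Monte Carlo check: redraw the wandering utility vector each trial and
# record how often the noisy V-distance favors the safe lottery.
random.seed(7)
trials = 200_000
wins = 0
for _ in range(trials):
    u_tilde = [uz + random.gauss(0.0, sigma_u) for uz in u]
    if sum((si - ri) * uz for si, ri, uz in zip(s, r, u_tilde)) >= 0.0:
        wins += 1
p_mc = wins / trials
print(p_closed, p_mc)
```

The simulated proportion agrees with the closed form up to sampling error, reflecting the variance algebra above.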
2.5. Moderate Utility II: Contextual Utility

Let $V(z|\beta^n)$ be subject n's structural value of a ''degenerate'' lottery that pays z with certainty. Contextual utility is a moderate utility model in which the distance norm is $d^n_m = V(z^{max}_m|\beta^n) - V(z^{min}_m|\beta^n)$,16 where $z^{max}_m$ and $z^{min}_m$ denote the maximum and minimum possible outcomes in the context $c_m$ of pair m. Thus, considered choice probabilities are

$$P^n_m = F\!\left(\lambda^n \, \frac{V(S_m|\beta^n) - V(R_m|\beta^n)}{V(z^{max}_m|\beta^n) - V(z^{min}_m|\beta^n)}\right) \qquad (18)$$

Contextual utility essentially asserts that the stochastic perceptual impact of V-distance in a pair is mediated by the range of possible outcome utilities in a pair, a notion which has a ring of psychophysical plausibility about it, and which has grounding in psychologists' experiments on categorization and models of error in categorization (Wilcox, 2007a). Econometrically, it is the assumption that the standard deviation of computational error $\sigma^n_m$ is proportional to the range of outcome utilities perceived by subject n in pair m. Although this is both pair- and subject-specific heteroscedasticity, it introduces no extra parameters into a model, since the form of the heteroscedasticity is entirely determined by pre-existing parameters (outcome utilities).

Contextual utility makes the stochastic implications of structural definitions of the MRA relation sensible within and across contexts. In affine structures such as EU and RDEU, the only truly unique characteristic of a utility function is a ratio of differences: Intuitively, contextual utility exploits this uniqueness to create a correspondence between structural and stochastic definitions of MRA. To see this, consider any three-outcome MPS pair on any context $c_m = (j,k,l)$. Under the RDEU structure and contextual utility, the choice probability in Eq. (18) can be rewritten as

$$P^n_m = F(\lambda^n[(w^s_{mk} - w^r_{mk})u^n_m + (w^s_{ml} - w^r_{ml})]), \ \text{where}\ u^n_m = \frac{u^n_k - u^n_j}{u^n_l - u^n_j} \qquad (19)$$
Since $w^s_{mk} - w^r_{mk} > 0$ in MPS pairs, Eq. (19) shows that $P^n_m$ is increasing in the ratio of differences $u^n_m$. Note the similarity between $u^n_m$ in Eq. (19) and $v^n_m$ in
Eq. (9) from the section on RPs. In both models, the three utilities on a context $c_m$ are reduced to a single ratio of differences by affine transformations (but they are not the same ratio, and the stochastic treatment of these ratios differs across the models). Consider two subjects, Anne and Bob: Assume that they have identical weighting functions (which includes the case where both have an EU structure) and that Bob's local absolute risk aversion $-u''(z)/u'(z)$ exceeds that of Anne for all z. The latter assumption, and simple algebra based on Pratt's (1964) theorem, then implies that $u^{Bob}_{mk} > u^{Anne}_{mk}$ on all contexts $c_m$. Formally, these conditions imply that Bob is more risk averse than Anne (or Bob $\succ_{mra}$ Anne) in the structural sense Chew, Karni, and Safra (1987) define for RDEU preferences: Although differences in weighting functions contribute to differences in risk aversion in RDEU models, we focus here on the ''traditional'' source of risk aversion associated with the curvature of utility functions by holding the weighting function constant across agents. Finally, assume that Bob and Anne are ''equally noisy'' decision makers (that is, $\lambda^{Bob} = \lambda^{Anne}$). It then follows from Eq. (19) that $P^{Bob}_m > P^{Anne}_m$ for all $m \in \Omega^{mps}$. This is a sensible (albeit strong) meaning of ''Bob is stochastically more risk averse than Anne,'' or Bob $\succ_{smra}$ Anne, and it closely resembles Hilton's (1989) definition of ''more risk averse in selection.'' Wilcox (2007a) also shows that under strong utility and strict utility, it is not possible for Bob $\succ_{mra}$ Anne to imply Bob $\succ_{smra}$ Anne across all contexts. It is important to notice that strong utility and contextual utility are observationally indistinguishable on a single context. This is easy to see. In a contextual utility model, we can redefine the precision parameter on context $c_m$ as $\lambda^n_m \equiv \lambda^n/[V(z^{max}_m|\beta^n) - V(z^{min}_m|\beta^n)]$; seen this way, we understand that contextual utility is a model with subject- and context-specific heteroscedasticity.
Obviously, when we confine attention to pairs on a single fixed context, we can ignore the context-dependence and suppress the subscript m on $\lambda^n_m$, writing $\lambda^n$ instead. So for any set of pairs on a fixed context, contextual utility behaves exactly as strong utility does. This is a useful fact: It means that any prediction of strong utility on a single context will also be true of contextual utility on a single context, and I use this repeatedly below. Notice too that this property implies that it is entirely pointless to compare the fit of strong and contextual utility using choice data in which no subject makes choices from pairs on several different contexts (e.g., the data set of Loomes & Sugden, 1998), since strong and contextual utility are observationally indistinguishable for such data. One can still estimate the contextual utility model with such data, however; and
for reasons discussed above, comparisons of risk aversion estimates across subjects will be potentially more meaningful for the purpose of prediction in other contexts.
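Both points (the single-context equivalence, and the cross-context contrast with strong utility) can be seen in a small sketch. Everything here is my own illustration, not the chapter's: power utility u(z) = z^ρ with Bob (ρ = 0.5) more risk averse than Anne (ρ = 0.9) in Pratt's sense, equal precisions, and on each context the MPS pair pitting the middle outcome for sure against a 50/50 mix of the extremes.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def u(z, rho):       # assumed power utility; lower rho => more risk averse
    return z ** rho

RHO_ANNE, RHO_BOB, LAM = 0.9, 0.5, 1.0
contexts = [(0, 1, 2), (0, 100, 200)]    # two outcome triples (j, k, l)

def v_distance(ctx, rho):
    # MPS pair: S pays k for sure; R pays j or l with probability 1/2 each.
    j, k, l = ctx
    return u(k, rho) - 0.5 * (u(j, rho) + u(l, rho))

def p_strong(ctx, rho):
    return logistic(LAM * v_distance(ctx, rho))

def p_contextual(ctx, rho):
    j, k, l = ctx
    u_range = u(l, rho) - u(j, rho)      # utility range of the context
    return logistic(LAM * v_distance(ctx, rho) / u_range)

for ctx in contexts:
    print(ctx, p_strong(ctx, RHO_ANNE), p_strong(ctx, RHO_BOB),
          p_contextual(ctx, RHO_ANNE), p_contextual(ctx, RHO_BOB))
```

With these assumptions, strong utility ranks Bob's safe-choice probability above Anne's on the low-stakes context but reverses the ranking on the high-stakes context; contextual utility ranks Bob above Anne on both, because dividing by the context's utility range removes the arbitrary scale of V. On any single context, of course, the two models coincide after rescaling λ.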
2.6. Other Models

There are many possible heteroscedastic models of choice under risk and uncertainty. It is possible to imagine a model resembling the WV model, in which it is probabilities (or weights, in the RDEU case) that are random variables rather than utilities. This possibility seems most compelling in choice under uncertainty, that is, when alternatives are acts with consequences in different states and no objective probabilities are available. In this situation, random variation in subjective probabilities of states across trials is a quite plausible conjecture. This is the initial point of departure for decision field theory, developed by Busemeyer and Townsend (1993). Decision field theory is an explicitly computational model based on random shifts of attention between states, formally reflected by random variation of subjective probability weights. Decision field theory explains many stylized facts of stochastic choice under uncertainty; for instance, it predicts the kinds of violations of SST and simple scalability observed in the psychological canon. Therefore, it obviously deserves serious attention. Although I do not consider decision field theory in detail here, I will refer to it often in discussing the other models. Blavatskyy (2007) has offered an interesting heteroscedastic model based on the notion that subjects will trim or truncate ''illogical'' errors. The idea here is that noise in the computation of V-distance should not exceed the logical bounds imposed by the utilities of the maximum and minimum outcomes of the lotteries in a pair. Blavatskyy calls this the ''internality axiom.'' The error truncation implies that the distribution of truncated errors depends on the lottery pair and does not have a zero median: In other words, evaluative errors have predictable biases in Blavatskyy's model.
Because of this, the truncated error model with an EU structure can explain phenomena, such as the four-fold pattern of risk aversion, that are normally thought of as demanding a rank-dependent structure like RDEU. It should be noted that Blavatskyy also adds heteroscedasticity to his model, without much comment, in a manner that closely resembles contextual utility; this may help to account for its good performance in Blavatskyy's tests. We ought also to expect stochastic consequences of pair complexity, and there is evidence of this (e.g., Sonsino, Benzion, & Mador, 2002), but
I refrain from any development of this here. Perhaps this kind of stochastic modeling deserves wider examination by experimentalists. Again, strong utility models are simple and powerful workhorses, and probably equal to many estimation tasks; but they have had descriptive problems that moderate utility models largely solved, at least in psychologists’ experimental canon. While many of those experiments did not follow the methodological dicta of experimental economics, they still give us a good reason to examine the WV model alongside strong utility and RP.
3. THE AMBIGUITY OF AVERAGE TREATMENT EFFECTS: A COMMON RATIO ILLUSTRATION

Stochastic models matter in part because they mediate the predictions of structures. This fact alone means that inferences about structures depend crucially on stochastic modeling assumptions. If, in addition, subjects are heterogeneous, structural inferences are still more difficult. The purpose of this section is to illustrate this in detail for the common ratio effect introduced in Section 1.3.1. Throughout this section it is assumed that the true structure of all subjects is EU: Therefore, in this section ''subject heterogeneity'' means only heterogeneity of subjects' utilities and/or stochastic parameter vectors. This kind of heterogeneity is, by itself, enough to make inferences from observed sample proportions very difficult without a strong stochastic identifying assumption such as the RP model. This is true even for within-subjects designs. Formally, the inference problem grows from the fact that structures are about preference directions, while stochastic models determine the observed magnitude of these preference directions as reflected by choice probabilities, and how these change across the pairs in a preference equivalence set of some theory, such as a common ratio set for EU. Structures play an important role in the reality of observed choice proportions, but stochastic models and subject heterogeneity play large and confounding roles too.

Throughout this section, I replace the subject index n by the subject type index c, and will assume that this is distributed J(c) in the sampled population. This represents heterogeneity in the sampled population. To think about this heterogeneity in the simplest possible terms in this section, consider a population of subjects with EU structures, composed of just two types $c \in \{S, R\}$. Type S (R) strictly prefers the safe (risky) lottery in all pairs in the common ratio set. Then $P_\tau \equiv \int P^c_\tau\,dJ(c) = \theta P^S_\tau + (1-\theta)P^R_\tau$ is the
expected population proportion of choices of $S_\tau$ from pair $\tau$, where $\int_{c=S} dJ(c) = \theta \in [0,1]$ denotes the proportion of the population that is type S. This two-type population mixture is used repeatedly below. Note that throughout this discussion, I assume that truly indifferent subjects are of zero measure in the population. This is to keep things simple: A nonzero fraction of truly indifferent subjects only complicates the discussion below without creating any special insights.

3.1. Predictions of the Stochastic Models in Common Ratio Sets

As discussed in Section 1.3.1, a common ratio set is composed of at least two pairs of the form $\{S_\tau, R_\tau\} \equiv \{(1-\tau s, \tau s, 0), (1-\tau r, 0, \tau r)\}$, both on one context $(j,k,l)$, where $s > r$. For the EU structure, we have $V(S_\tau|\beta^c) = (1-\tau s)u^c_j + \tau s u^c_k$ and $V(R_\tau|\beta^c) = (1-\tau r)u^c_j + \tau r u^c_l$. In general, consider two pairs $\{S_\tau, R_\tau\}$ and $\{S_{\tau'}, R_{\tau'}\}$ where $\tau > \tau'$.

3.1.1. Random Preferences and the Wandering Vector Model

RPs are the simplest of the predictions. Recall from Section 1.3.1 that common ratio sets are preference equivalence sets for EU. It immediately follows from the preference equivalence set property of RP models that an (EU,RP) model predicts that $P^c_\tau = P^c_{\tau'}$ for all c. Therefore, regardless of the distribution of c, population choice proportions are constant across the pairs of a common ratio set, so that $P_\tau = P_{\tau'}$. The WV model behaves in exactly the same way, but for a different reason. With the probability vectors in pair τ given by $\{(1-\tau s, \tau s, 0), (1-\tau r, 0, \tau r)\}$, and the distance $d_\tau$ between these vectors given by Euclidean distance, we have

$$d_\tau = ((\tau r - \tau s)^2 + (\tau s)^2 + (\tau r)^2)^{0.5} = \tau((r-s)^2 + s^2 + r^2)^{0.5} = \tau d_1 \qquad (20)$$
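A quick numerical check of Eq. (20) and of what it implies for the WV model (the probabilities s and r, the utilities, and σ_u below are illustrative assumptions of mine):

```python
import math

s_, r_ = 0.8, 0.6                  # s > r, as in the common ratio set
u_j, u_k, u_l = 0.0, 0.6, 1.0      # assumed EU utilities on context (j, k, l)
sigma_u = 0.3

def phi(x):                         # standard normal c.d.f.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def wv_prob(tau):
    # V-distance and Euclidean norm for the pair with common ratio tau.
    dv = tau * ((r_ - s_) * u_j + s_ * u_k - r_ * u_l)
    d_tau = math.sqrt((tau * r_ - tau * s_) ** 2
                      + (tau * s_) ** 2 + (tau * r_) ** 2)
    return phi(dv / (sigma_u * d_tau))

# d_tau = tau * d_1, so tau cancels: the WV probability is flat in tau.
probs = [wv_prob(t) for t in (1.0, 0.5, 0.25, 0.1)]
print(probs)
```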
Recall that the V-distance in common ratio pairs is $\tau[(r-s)u^c_j + s u^c_k - r u^c_l]$, and recall that in the WV model, this V-distance is divided by the distance measure: Clearly, this division eliminates the common ratio τ. Therefore, the argument of F – the latent variable – will not depend on the common ratio in the WV model. As with (EU,RP) models, any (EU,WV) model requires $P^c_\tau = P^c_{\tau'}$ for all c and hence $P_\tau = P_{\tau'}$ in the population.

3.1.2. Strong Utility and Contextual Utility

Recall that on a given context, contextual utility and strong utility make the same predictions. Therefore, since pairs in a common ratio set are all on a
given context, we may treat strong and contextual utility together here. Recall that the V-distance in common ratio pairs is $\tau[(r-s)u^c_j + s u^c_k - r u^c_l]$. Note that $(r-s)u^c_j + s u^c_k - r u^c_l$ is the V-distance in the root pair (i.e., the pair with τ = 1) of the common ratio set: This is positive for S-type subjects (since they prefer the safe lotteries in the common ratio set's pairs) and negative for R-type subjects (since they prefer the risky lotteries in the common ratio set's pairs). Therefore, the V-distance is increasing in τ for S-types and decreasing in τ for R-types. In strong and contextual utility, choice probabilities are increasing in V-distance. Putting all this together, we have these predictions for strong and contextual utility:

$$P^S_\tau > P^S_{\tau'} > 0.5 \ \text{ and } \ P^R_\tau < P^R_{\tau'} < 0.5 \quad \forall\, \tau > \tau' \qquad (21)$$
To get a sense of possibilities with a very typical F and common ratio set, choose the logistic distribution for F, and consider the two pairs generated by τ = 1 (the root pair) and τ′ = 1/4. Let $\Delta^c \equiv \lambda^c[(r-s)u^c_j + s u^c_k - r u^c_l]$ be the latent variable (the argument of F) for a c-type subject in the τ = 1 root pair of the common ratio set. Then $P^c_1 = [1 + \exp(-\Delta^c)]^{-1}$ and $P^c_{1/4} = [1 + \exp(-\Delta^c/4)]^{-1}$. Fig. 1 illustrates the relationship between $P^c_1$ and $P^c_{1/4}$ for three possible S- and R-types. For the S-types, the three values of $\Delta^S$ considered are 15, 1.5, and 0.15, corresponding to a precise, moderate, and noisy S-type, respectively. Similarly, the three values −15, −1.5, and −0.15 for $\Delta^R$ correspond to a precise, moderate, and noisy R-type, respectively. Fig. 1 shows how the absolute value of $\Delta^c$ and the behavior of a typical c.d.f. such as the logistic conspire to create three distinctive possibilities for the pattern of strong or contextual utility choice probabilities over a pair of common ratio pairs. We could have very precise types, characterized by choice probabilities near one or zero in both the root pair and the τ = 1/4 pair. We could also have very noisy types, characterized by choice probabilities not much different from one-half in both pairs. For both of these types, the ordering relationship in Eq. (21) is reflected very weakly: In a population composed solely of such types, the hypotheses $P^c_1 = P^c_{1/4}\ \forall c$, and $P_1 = P_{1/4}$, would be hard to reject even at fairly large sample sizes. Put differently, it would be very difficult to tell this population from an (EU,RP) population, and neither would predict the common ratio effect understood as sample proportions supporting $P_1 > 1/2 > P_{1/4}$. Therefore, from the perspective of common ratio effects, interesting and distinctive (EU,Strong) or (EU,Contextual) populations must contain at least one of the ''moderate'' types shown in Fig. 1. These types show the
[Fig. 1 here. Fig. 1. Possible Choice Probability Patterns in a Common Ratio Set: Strong Utility and Contextual Utility. The figure plots the probability of choosing the safe lottery in the root pair and in the pair with τ = 1/4 for six types: precise, moderate, and noisy S-types (Δ = 15, 1.5, 0.15) and noisy, moderate, and precise R-types (Δ = −0.15, −1.5, −15).]
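Fig. 1's curves are just the two logistic formulas in the text evaluated at the six Δ values, which a few lines reproduce:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

deltas = {"precise S": 15.0, "moderate S": 1.5, "noisy S": 0.15,
          "noisy R": -0.15, "moderate R": -1.5, "precise R": -15.0}

# P_1 = logistic(Delta) in the root pair; P_1/4 = logistic(Delta / 4).
probs = {name: (logistic(d), logistic(d / 4.0)) for name, d in deltas.items()}
for name, (p1, p14) in probs.items():
    print(f"{name:>10}: P_1 = {p1:.3f}  P_1/4 = {p14:.3f}")
```

For the precise types both probabilities hug 0 or 1, and for the noisy types both hug one-half; only the moderate types display the Eq. (21) ordering at all strongly.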
distinctive ordering relationship in Eq. (21) strongly. Fig. 2 takes the choice probabilities for moderate S-types and precise R-types from Fig. 1, and mixes them according to θ = 0.7. That is, Fig. 2 shows a population made up of 70% moderate S-types and 30% precise R-types. The heavy line shows that in this population, we have $P_1$ = 0.57 and $P_{1/4}$ = 0.42. Thus, this is a population where the EU structure is the true structure for all subjects, and yet we expect to observe $P_1 > 1/2 > P_{1/4}$, a common ratio effect. We should, therefore, call this the false common ratio effect, since it is a possibility in a heterogeneous EU population with standard stochastic models. This possibility is a distinctive feature of strong utility and contextual utility models (and sometimes of strict utility, as will be clear below). It is worth dwelling on this example a bit, since it illustrates extremely well the ambiguities associated with typical casual and not-so-casual inferences. Consider, for instance, a within-subjects design where each subject n makes a choice both from the root pair and from the pair with τ = 1/4.
[Fig. 2 here. Fig. 2. The False Common Ratio Effect in a Two-Type (EU,Strong) or Two-Type (EU,Contextual) Population. The moderate S-type (70% of the population) has $P_1$ = 0.82 and $P_{1/4}$ = 0.59; the precise R-type (30%) has $P_1$ = 0.00 and $P_{1/4}$ = 0.02; the population line falls from $P_1$ = 0.57 to $P_{1/4}$ = 0.42.]
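The population line in Fig. 2 follows directly from the two-type mixture formula; a sketch reproducing it:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.7                                           # share of moderate S-types
p1_s, p14_s = logistic(1.5), logistic(1.5 / 4.0)      # about 0.82 and 0.59
p1_r, p14_r = logistic(-15.0), logistic(-15.0 / 4.0)  # about 0.00 and 0.02

# Population proportions: P_tau = theta * P_tau^S + (1 - theta) * P_tau^R.
p1 = theta * p1_s + (1.0 - theta) * p1_r
p14 = theta * p14_s + (1.0 - theta) * p14_r
print(round(p1, 2), round(p14, 2))   # 0.57 0.42 -- a "common ratio effect"
```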
If we are sampling from the population of Fig. 2, we will expect an asymmetry between ''predicted and unpredicted violations'' of deterministic EU structural expectations. There is a long history of regarding such observations as decisive evidence against the EU structure (Conlisk, 1989; Harless & Camerer, 1994). The formal basis for these inferences is a stochastic model called the constant error rate model, which was critically examined by Ballinger and Wilcox (1997). What would a population like that in Fig. 2 imply about this asymmetry? The event $(y^n_1 = 1 \cap y^n_{1/4} = 0)$, that is, ''subject n made the safe choice in the root pair and the risky choice in the τ = 1/4 pair,'' corresponds to the switch in preference predicted by (for instance) a deterministic RDEU structure with the Prelec (1998) weighting function. Similarly, the event $(y^n_1 = 0 \cap y^n_{1/4} = 1)$ corresponds to the switch in preference not predicted by that structure. Both events are violations of deterministic EU, but only the
former violation is predicted by an alternative (RDEU or prospect theory, with usual assumptions about weighting functions). Assuming random sampling from the population and statistically independent choices by each subject from each pair, the probabilities of the predicted and unpredicted violations for a randomly selected subject (and hence a sample) are

$$\Pr(y^n_1 = 1 \cap y^n_{1/4} = 0) = \theta P^S_1(1 - P^S_{1/4}) + (1-\theta)P^R_1(1 - P^R_{1/4}), \ \text{and}$$
$$\Pr(y^n_1 = 0 \cap y^n_{1/4} = 1) = \theta(1 - P^S_1)P^S_{1/4} + (1-\theta)(1 - P^R_1)P^R_{1/4} \qquad (22)$$

From the information in Fig. 2 and these equations, it is simple to calculate the expected proportion of both kinds of violations in that population: These are $\Pr(y^n_1 = 1 \cap y^n_{1/4} = 0) = 0.235$ and $\Pr(y^n_1 = 0 \cap y^n_{1/4} = 1) = 0.080$. That is, a heterogeneous (EU,Strong) or (EU,Contextual) population like that in Fig. 2 implies that ''predicted violations'' of deterministic EU will be three times more common than ''unpredicted violations.'' Consider samples of N = 80 subjects, drawn randomly from the Fig. 2 population. A simple Monte Carlo simulation shows that in about five out of six such samples, we would reject (at 5%, two-tailed) the hypothesis that predicted and unpredicted violations are equally likely, using the suggested test of Conlisk (1989), which is based (incorrectly for this population) on the constant error rate assumption described by Conlisk in his appendix and generalized by Harless and Camerer (1994).

The population in Fig. 2 also implies ''within-pair switching rates'' that are typical of lottery choice experiments. Experiments with repeated trials allow one to look at the degree of choice consistency. Suppose that in our experiment, subjects had two trials t = 1 and 2 of both the root pair and the pair with τ = 1/4. Adding back the trial subscript for a moment, the within-pair switching probability for any pair, for a randomly selected subject, is

$$W_\tau \equiv \Pr(y^n_{\tau,1} \neq y^n_{\tau,2}) = 2[\theta P^S_\tau(1 - P^S_\tau) + (1-\theta)P^R_\tau(1 - P^R_\tau)] \qquad (23)$$
Using the information in Fig. 2 and this equation, we have expected within-pair switching rates $W_1 = 0.209$ and $W_{1/4} = 0.351$; these are of the magnitude observed in the experimental canon on common ratio effects using repeated trials (e.g., Camerer, 1989; Starmer & Sugden, 1989; Ballinger & Wilcox, 1997).
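Both calculations are easy to replicate. One wrinkle: the sketch below uses exact logistic values throughout, so the violation probabilities come out near 0.233 and 0.083 rather than the chapter's 0.235 and 0.080 (which follow from the rounded Fig. 2 values); the switching rates match at 0.209 and 0.351.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

theta = 0.7
pS = {1.0: logistic(1.5), 0.25: logistic(1.5 / 4.0)}       # moderate S-type
pR = {1.0: logistic(-15.0), 0.25: logistic(-15.0 / 4.0)}   # precise R-type

# Eq. (22): predicted and unpredicted violations of deterministic EU.
predicted = (theta * pS[1.0] * (1 - pS[0.25])
             + (1 - theta) * pR[1.0] * (1 - pR[0.25]))
unpredicted = (theta * (1 - pS[1.0]) * pS[0.25]
               + (1 - theta) * (1 - pR[1.0]) * pR[0.25])

# Eq. (23): within-pair switching rate over two trials of a pair.
def switch_rate(tau):
    return 2 * (theta * pS[tau] * (1 - pS[tau])
                + (1 - theta) * pR[tau] * (1 - pR[tau]))

print(round(predicted, 3), round(unpredicted, 3))               # 0.233 0.083
print(round(switch_rate(1.0), 3), round(switch_rate(0.25), 3))  # 0.209 0.351
```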
3.1.3. Strict Utility

On any context where $u^c_j = 0$, that is, where the context's minimum outcome has zero utility, strict utility behaves just as the RP model and the WV model do in common ratio sets. However, if the common ratio set is defined on a context where $u^c_j > 0$, strict utility instead behaves just as strong utility and contextual utility do in common ratio sets. To analyze both cases, recall that strict utility uses a logarithmic V-distance as the latent variable in F. In a common ratio pair, this is

$$\ln[V(S_\tau|\beta^c)] - \ln[V(R_\tau|\beta^c)] = \ln[(1-\tau s)u^c_j + \tau s u^c_k] - \ln[(1-\tau r)u^c_j + \tau r u^c_l] \qquad (24)$$

If the common ratio set is defined on a context where $u^c_j = 0$, the right side of this equation is $\ln(\tau s u^c_k) - \ln(\tau r u^c_l) = \ln(s u^c_k) - \ln(r u^c_l)$. So it is clear that, in this instance, strict utility behaves just as the WV model does with the EU structure: The common ratio disappears from the latent variable, so $P^c_\tau = P^c_{\tau'}$ for all c and hence $P_\tau = P_{\tau'}$ in the population. In the case of a context where $u^c_j > 0$, the derivative of Eq. (24) with respect to τ is

$$\frac{\partial\{\ln[V(S_\tau|\beta^c)] - \ln[V(R_\tau|\beta^c)]\}}{\partial\tau} = \frac{s(u^c_k - u^c_j)}{u^c_j + \tau s(u^c_k - u^c_j)} - \frac{r(u^c_l - u^c_j)}{u^c_j + \tau r(u^c_l - u^c_j)} \qquad (25)$$

The two terms on the right share the form $b/(a + \tau b)$, differing only by b, and a simple differentiation shows this to be increasing in b. It follows that Eq. (25) has the same sign as $s(u^c_k - u^c_j) - r(u^c_l - u^c_j)$, and this is positive for S-types and negative for R-types. Therefore, when $u^c_j > 0$, strict utility allows the same patterns of choice probabilities shown in Eq. (21), and illustrated by Fig. 1, that strong utility and contextual utility do. The dependence of strict utility's predictions on the utility of the minimum outcome in the context of the common ratio set occurs because of the theoretical mismatch between the affine structure EU and the fact that strict utility requires a ratio scale.
Because of this, what is an arbitrary choice in deterministic EU – the choice of a zero for the utility function – is consequential when an EU structure is put into a strict utility model.
Stochastic Models for Binary Discrete Choice Under Risk
3.2. Summary
To summarize, all of the qualitative features of simple sample moments that are emphasized in the literature on the common ratio effect are reproducible by a heterogeneous EU structure population in which strong or contextual utility is the true stochastic model (and sometimes by the strict utility model too). Therefore, these qualitative findings cannot by themselves be the reason we dismiss the EU structure. This is what I mean by "the ambiguity of average treatment effects": Generally, their qualitative patterns are not by themselves capable of telling us which structures are true. To do that, we need to make explicit assumptions about stochastic models and the nature of heterogeneity in the sampled population. This realization is why authors such as Loomes et al. (2002) have revisited old data sets (Loomes & Sugden, 1998) and re-analyzed them with explicit attention to both stochastic models and heterogeneity.

The point of this discussion is not – what a miracle that would be – to explain away common ratio effects as mere aggregation phenomena with strong, strict, or contextual utility. Rather, it is that parts, perhaps substantial parts, of what we normally think of as violations of EU may be due to aggregation and stochastic models, rather than to nonlinear probability weighting arising from rank-dependent structure. Put differently, if we wish to measure the strength of nonlinear probability weighting properly, the examples show that we will necessarily need to take account of heterogeneity whenever we believe that the true stochastic model is strong, strict, or contextual utility. This is the important take-away message of this section.
4. PROPERTIES OF THE STOCHASTIC MODELS COMBINED WITH THE STRUCTURES

I now turn to a general listing of how the stochastic models of Section 2 combine with the EU and RDEU structures, in terms of the properties reviewed in Section 1. The previous section just did this in detail for the common ratio set property of the EU structure. Recall that stochastic transitivity properties (or the lack of them) were discussed in Section 2, as each stochastic model was introduced. Nevertheless, it will be interesting to consider the implications of models that obey SST as we look at sets of MPS pairs on a given context for EU. Throughout much of this section, I suppress both the subject and subject type superscripts (n or c) to keep
down notational clutter. But it is important to remember that the results described here are for individual subjects or subject types: As the previous section on the common ratio effect showed, many of these properties will be hidden, modified, or confounded by aggregation across different types of subjects. This is noted where it is important.
4.1. Mean Preserving Spreads, Stochastic Transitivities, and Betweenness

Recall that $\Omega_{mps}^c \subset \Omega_{mps}$ is the set of MPS pairs on any specific three-outcome context c. Section 1.3.2 showed that this is a preference equivalence set for EU. Obviously, any specific subset of any $\Omega_{mps}^c$ will also be a preference equivalence set for EU. A particularly interesting subset is any three MPS pairs $\{C^h, D^h\}$, $\{D^h, E^h\}$ and $\{C^h, E^h\}$, indexed by $hi = h1$, $h2$, and $h3$, respectively, generated by a triple h of lotteries $\{C^h, D^h, E^h\}$ with common expected value, all on one context c. In this instance, $E^h$ is a MPS of both $C^h$ and $D^h$, and $D^h$ is also a MPS of $C^h$: $C^h$ is safest, and $E^h$ riskiest, in such triples, with $D^h$ of moderate risk. Call such a set of three lotteries, and the three MPS pairs it generates, a spread triple. Table 1 shows three spread triples, each on a different context, that happen to occur in Hey's (2001) experimental design. The spread triples are indexed by $h \in \{1, 2, 3\}$ in the left column of the table. Under this indexing, for instance, given the spread triples in Table 1, $P_{23}$ is the probability that a subject chooses $C^2$ from the pair $hi = 23$, which is $\{C^2, E^2\}$, where $C^2$ and $E^2$ are as given in the second ($h = 2$) row of Table 1. After discussing the properties of the models, we will look at the data from Hey's experiment for these three spread triples.

4.1.1. Random Preferences and the Wandering Vector Model
$\Omega_{mps}^c$ is a preference equivalence set for EU. Therefore, the preference equivalence set property of the RP model implies that any (EU,RP) model requires that $P_m = P_{m'}$ for each subject, and hence the population, $\forall\, m, m' \in \Omega_{mps}^c$. In words, an (EU,RP) model requires that expected sample choice proportions for all MPS pairs on a given context are equal. This is of course true for spread triples too: Using the special indexing of spread triples, $P_{hi} = P_{hi'}\ \forall\, i$ and $i'$, given h, for each subject and hence the population.
None of this holds for (RDEU,RP) models, since $\Omega_{mps}^c$ is not in general a preference equivalence set of RDEU. As in the case of common ratio effects, it turns out that the WV model has precisely the same properties as the RP model for MPS pairs on a
Table 1. Spread Triples from Hey (2001).

Triple (h) | Context of Triple | $C^h$ | $D^h$ | $E^h$ | Common EV in Triple | Trials of $\{C^h, D^h\}$ | Trials of $\{D^h, E^h\}$ | Trials of $\{C^h, E^h\}$
1 | (0, £50, £100) | (0, 1, 0) | (3/8, 2/8, 3/8) | (4/8, 0, 4/8) | £50 | 5 | 10 | 10
2 | (0, £100, £150) | (2/8, 6/8, 0) | (3/8, 3/8, 2/8) | (4/8, 0, 4/8) | £75 | 5 | 5 | 5
3 | (£50, £100, £150) | (2/8, 6/8, 0) | (3/8, 4/8, 1/8) | (5/8, 0, 3/8) | £87.5 | 10 | 5 | 5
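The mean-preserving-spread relationships within each of Table 1's triples can be verified mechanically: each pair has equal means, and the riskier lottery's CDF, integrated upward through the context, never falls below the safer lottery's. A small sketch of such a check (the helper name is mine, not the chapter's):

```python
def is_mps(p_safe, p_risky, outcomes):
    """True if p_risky is a mean-preserving spread of p_safe on the
    ordered outcome vector `outcomes` (equal means, and the integrated
    CDF difference is nonnegative everywhere)."""
    mean = lambda p: sum(pi * z for pi, z in zip(p, outcomes))
    if abs(mean(p_safe) - mean(p_risky)) > 1e-9:
        return False
    gap, integral = 0.0, 0.0
    for i in range(len(outcomes) - 1):
        gap += p_risky[i] - p_safe[i]                 # F_risky - F_safe at z_i
        integral += gap * (outcomes[i + 1] - outcomes[i])
        if integral < -1e-9:
            return False
    return True

# The three spread triples of Table 1 (contexts in pounds):
triples = [
    ((0, 50, 100),   (0, 1, 0),     (3/8, 2/8, 3/8), (4/8, 0, 4/8)),
    ((0, 100, 150),  (2/8, 6/8, 0), (3/8, 3/8, 2/8), (4/8, 0, 4/8)),
    ((50, 100, 150), (2/8, 6/8, 0), (3/8, 4/8, 1/8), (5/8, 0, 3/8)),
]
for ctx, C, D, E in triples:
    assert is_mps(C, D, ctx) and is_mps(D, E, ctx) and is_mps(C, E, ctx)
    assert not is_mps(D, C, ctx)   # the reverse direction fails
```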
single context. This is proved in Appendix C, but the reason resembles what occurs in the case of the common ratio effect. It turns out that both the EU V-distance between lotteries, and the Euclidean distance between lotteries, are linear in the "size" of an MPS, defined as the difference between the probabilities of receiving the maximum outcome on a context. Hence, the ratio of the EU V-distance to the Euclidean distance is independent of the spread size, making (EU,WV) choice probabilities independent of the spread size. Moreover, choice probabilities in the (EU,WV) model turn out to be independent of the expected values of the lotteries in a MPS pair as well: They depend only on the context of the MPS pair and the subject's utilities of outcomes on that context.

It is well worth reflecting on this highly nonintuitive prediction of (EU,RP) and (EU,WV) models. Consider these two choice problems:

Problem I. Choose $100 for sure, or lottery (0.01, 0.98, 0.01) on the context ($75, $100, $125).

Problem II. Choose $100 for sure, or lottery (0.5, 0, 0.5) on the context ($75, $100, $125).

The increased risk of the lottery relative to the sure thing is much greater in Problem II than in Problem I. It would be trivially easy to show that any risk averter (in Pratt's sense) would associate a much larger risk premium with the lottery in Problem II than with the lottery in Problem I. Nevertheless, (EU,RP) and (EU,WV) models demand that the choice probabilities in these two problems be identical for each decision maker.17 Later we will see that RP models uniquely make the intuitively satisfying prediction that dominated lotteries are never chosen in an FOSD pair. Yet it is obvious here that RPs are equally capable of making astonishingly nonintuitive predictions. This illustrates one of my themes: If you are waiting for a stochastic model that is intuitively satisfying in every way, you are waiting for Godot.
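To put rough numbers on the risk-premium claim, here is a small sketch under an assumed concave utility (square-root utility is my choice for illustration; it is not from the chapter):

```python
import math

def certainty_equivalent(probs, outcomes, u, u_inv):
    """Certainty equivalent of a lottery under utility u with inverse u_inv."""
    eu = sum(p * u(z) for p, z in zip(probs, outcomes))
    return u_inv(eu)

context = (75, 100, 125)
u, u_inv = math.sqrt, lambda v: v * v   # an assumed risk-averse utility

# Both lotteries have expected value 100, so risk premium = 100 - CE.
premium_I  = 100 - certainty_equivalent((0.01, 0.98, 0.01), context, u, u_inv)
premium_II = 100 - certainty_equivalent((0.5, 0.0, 0.5), context, u, u_inv)
assert premium_II > 10 * premium_I   # Problem II's premium is far larger
```

Any strictly concave utility gives the same qualitative result, yet (EU,RP) and (EU,WV) predict identical choice probabilities in the two problems.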
Every stochastic model mutilates your structural intuition in some distinctive way: There is no escape from this.

4.1.2. Strict, Strong, and Contextual Utility
Strict and strong utility imply SST, and contextual utility implies SST on any given context, with any transitive structure such as EU or RDEU. So in any spread triple h, we must have SST for any subject. However, EU makes stronger predictions: It permits just two linear orderings of the three lotteries in any spread triple, since any three utilities will be either weakly concave or weakly convex. If a subject has weakly concave utilities on the
context of h, then $C^h \succsim D^h$ and $D^h \succsim E^h$; and if a subject instead has weakly convex utilities on the context of h, then $E^h \succsim D^h$ and $D^h \succsim C^h$. From the perspective of the algebraic form of EU, these are implications of Jensen's inequality. Alternatively, from an axiomatic perspective, this follows from the betweenness property of EU (see Appendix D). Therefore, we have either $V(C^h|\beta) \geq V(D^h|\beta) \geq V(E^h|\beta)$, or $V(E^h|\beta) \geq V(D^h|\beta) \geq V(C^h|\beta)$, which has two separate implications for an EU structure with strict, strong, or contextual utility.

The first implication reflects the fact that $D^h$ must be between $C^h$ and $E^h$ in preference, in any spread triple h:
$$\text{Either } \min(P_{h1}, P_{h2}) \geq 0.5 \text{ (for weakly risk-averse subjects)}$$
$$\text{or } \max(P_{h1}, P_{h2}) \leq 0.5 \text{ (for weakly risk-seeking subjects)} \quad (26)$$

The second implication includes Eq. (26) but adds SST to it:
$$\text{Either } \min(P_{h1}, P_{h2}) \geq 0.5 \text{ and } P_{h3} \geq \max(P_{h1}, P_{h2}) \text{ (for weakly risk-averse subjects)}$$
$$\text{or } \max(P_{h1}, P_{h2}) \leq 0.5 \text{ and } P_{h3} \leq \min(P_{h1}, P_{h2}) \text{ (for weakly risk-seeking subjects)} \quad (27)$$

Eq. (27) is essentially SST, but with Eq. (26) specifying exactly which pairs ($\{C^h, D^h\}$ and $\{D^h, E^h\}$) provide the antecedent of the SST implication, and which pair will be in the consequent of the SST implication (namely the pair $\{C^h, E^h\}$ containing the safest and riskiest lotteries of the spread triple). It should be noted that Eqs. (26) and (27) imply nothing across subjects: One might sample from a heterogeneous population that mixes risk-averse and risk-seeking subjects and, as with the common ratio effect, this mixing can hide these individual-level implications. Therefore, these implications should be tested at the individual level.

4.1.3. Stochastic Models are Consequential: An Illustration Using Hey's (2001) Spread Triples
Hey's (2001) experiment is a repeated trials design with at least five repetitions of all pairs (and ten repetitions of some pairs) for every subject, as shown in Table 1. This allows for tests of the predictions described above at the individual level – that is, one subject at a time. The data are on the three spread triples shown in Table 1. For each test, an unrestricted log likelihood is simply the sum of the sample log likelihoods for a subject at the observed choice proportions for each of the nine pairs in Table 1. A restricted log likelihood is then computed for a subject by finding the nine
choice probabilities that maximize the sample log likelihood for a subject with the restrictions described in the previous sections imposed on the nine choice probabilities. These are of course restrictions imposed within each spread triple, not across them. Table 2 reports likelihood ratio tests of the restrictions.

The (EU,RP) and (EU,WV) models require that choice probabilities within a spread triple are all equal. This is two restrictions per triple, or six restrictions in all for each subject. Therefore, under the null that the restrictions are true, twice the difference between the unrestricted and restricted log likelihoods will follow a $\chi^2$ distribution with six degrees of freedom. The results soundly reject the restriction: The first row of Table 2 shows that it is rejected at the 10% level of significance for nearly half of Hey's (2001) 53 subjects. A sum of independent $\chi^2$ variates also has a $\chi^2$ distribution, with degrees of freedom equal to the sum of the degrees of freedom of the variates. Treating each subject as an independent sample, we may then perform this test overall: The sum of the test statistics across subjects should follow a $\chi^2$ distribution with $53 \times 6 = 318$ degrees of freedom. The left column of Table 2 reports this statistic and its p-value, which is essentially zero for the (EU,RP) and (EU,WV) models.

The second row of Table 2 tests the betweenness implication (26) made by EU with strict, strong, or contextual utility. This is one restriction per triple. This is most easily seen by noticing that the single nonlinear constraint $(P_{h1} - 0.5)(P_{h2} - 0.5) \geq 0$ captures both allowable patterns of the implication (26). Across the three spread triples, then, the likelihood ratio test statistic against the implication will have three degrees of freedom for each subject. Table 2 shows that the implication is rejected at the 10% level for 13% of subjects – not an unexpected rate.
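The overall tests simply exploit the additivity of independent $\chi^2$ variates. A minimal sketch of the aggregation (the Wilson–Hilferty approximation to the $\chi^2$ upper tail is my illustration choice; in practice a statistics library would supply the exact CDF):

```python
import math

def chi2_sf(x, k):
    """Approximate upper-tail p-value of a chi-square statistic x with
    k degrees of freedom (Wilson-Hilferty normal approximation)."""
    z = ((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / math.sqrt(2 / (9 * k))
    return 0.5 * math.erfc(z / math.sqrt(2))

def overall_test(subject_stats, df_per_subject):
    """Sum independent per-subject LR statistics; degrees of freedom add."""
    total_stat = sum(subject_stats)
    total_df = df_per_subject * len(subject_stats)
    return total_stat, total_df, chi2_sf(total_stat, total_df)

# With the aggregate values reported in Table 2:
# 664.98 on 318 df rejects decisively; 84.82 on 159 df does not.
assert chi2_sf(664.98, 318) < 0.001
assert chi2_sf(84.82, 159) > 0.95
```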
Summing the test statistics across subjects, the left column shows that the p-value against the implication for all subjects is unity. So the implication (26) appears to be broadly acceptable.

The third to fifth rows of Table 2 test the betweenness implication (26) separately for each of Hey's (2001) spread triples. Notice from Table 1 that the intermediate lottery $D^3$ of spread triple 3 has a relatively low but nonzero probability 1/8 of the highest outcome (£150) on the context of that triple. In RDEU and cumulative prospect theory, such lotteries are expected to be particularly attractive due to overweighting of small probabilities of the largest outcome, given contemporary wisdom about the shape of weighting functions (see e.g., Tversky & Kahneman, 1992). Therefore, if we are expecting any violation of betweenness in any of these spread triples, we ought to expect it here in triple 3. And in fact, Table 2 indicates that we have
Table 2. Tests of Predictions in Spread Triples, Using Spread Triples from Hey (2001).

Structure | Stochastic Model | Prediction | Overall $\chi^2$ (p-value) | Subjects Violating Prediction at 10% Significance (of 53)
EU | Random preferences or wandering vector | $P_{h1} = P_{h2} = P_{h3}$, for each h | $\chi^2(318) = 664.98$ ($p \approx 0$) | 49%
EU | Strong, strict, or contextual utility | Betweenness ($D^h$ of intermediate preference): $\min(P_{h1}, P_{h2}) \geq 0.5$, or $\max(P_{h1}, P_{h2}) \leq 0.5$ | $\chi^2(159) = 84.82$ ($p \approx 1$) | 13%
EU | Strong, strict, or contextual utility | Betweenness in triple h = 1 alone | $\chi^2(53) = 12.43$ ($p \approx 1$) | 4%
EU | Strong, strict, or contextual utility | Betweenness in triple h = 2 alone | $\chi^2(53) = 20.05$ ($p \approx 1$) | 2%
EU | Strong, strict, or contextual utility | Betweenness in triple h = 3 alone | $\chi^2(53) = 52.34$ ($p = 0.50$) | 11%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | Strong stochastic transitivity (SST) | $\chi^2(159) = 77.11$ ($p \approx 1$) | 4%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | SST in triple h = 1 alone | $\chi^2(53) = 10.52$ ($p \approx 1$) | 4%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | SST in triple h = 2 alone | $\chi^2(53) = 46.19$ ($p = 0.73$) | 11%
Any transitive (EU or RDEU) | Strong, strict, or contextual utility | SST in triple h = 3 alone | $\chi^2(53) = 20.39$ ($p \approx 1$) | 4%
EU | Strong, strict, or contextual utility | Betweenness and SST together | $\chi^2(318) = 230.70$ ($p \approx 1$) | 13%
EU | Strong or contextual utility | Equal spread size of pairs 1 and 2 in triple 2 yields the special prediction $P_{21} = P_{22}$ | $\chi^2(53) = 89.66$ ($p = 0.0012$) | 25%
more subjects violating betweenness at the 10% level of significance in triple 3 (13% of subjects) than in triples 1 and 2 (4% and 2% of subjects, respectively). Yet these rates of violation are low in all three triples: Summing the test statistics across subjects, the left column shows that the overall p-values against the implication never approach significance – not even in triple 3.

The sixth to ninth rows of Table 2 report the results of tests of SST alone in the three spread triples; SST is implied by either EU or RDEU with strong, strict, or contextual utility. The sixth row does this for all three triples together, while the seventh, eighth, and ninth rows do it for each of the triples separately. SST is actually just one restriction within each triple. While SST rules out two of the eight possible patterns of choice probabilities in the three choice pairs arising from any triple of lotteries, the two violating patterns are mutually exclusive. Only one can ever occur: That is, it is mathematically impossible for three choice probabilities to violate both restrictions at once. Therefore, twice the log likelihood difference (between the unrestricted model and the SST-restricted model in the three triples) follows a $\chi^2$ distribution with three degrees of freedom for each subject. The sixth row of Table 2 shows that this test rejects SST at the 10% level for just 6% of Hey's (2001) subjects. Summing the test statistics over subjects, the left column shows that the p-value against SST for all subjects is unity. So SST in spread triples appears to be broadly acceptable. The seventh, eighth, and ninth rows show similar results for the SST restriction in each of the three triples separately.

The tenth row of Table 2 displays test results against the restriction Eq. (27), which is the combination of the betweenness and SST restrictions implied by EU with strict, strong, or contextual utility.
This imposes two restrictions per triple, and so the likelihood ratio test statistics here have six degrees of freedom per subject across three spread triples. The test results reject Eq. (27) at the 10% level for 13% of Hey's (2001) subjects. Summing the test statistics over subjects, the left column shows that the p-value against Eq. (27) for all subjects is unity. So the combination of betweenness and SST in spread triples appears to be broadly acceptable.

The good performance so far of EU with strong, strict, or contextual utility is perhaps somewhat surprising, given the long history of problems with EU. The question naturally arises: Is there any evidence in these spread triples that seems to reject EU with these stochastic models? In fact there is a special testable equality restriction of EU with strong or contextual utility (but not strict utility) in the second spread triple shown in Table 1. It happens that the spread sizes in pairs 1 and 2 of triple $h = 2$, that is $\{C^2, D^2\}$ and $\{D^2, E^2\}$, are equal, which implies that the EU V-distance between the lotteries in
these two pairs is equal (see Table 1 and the definition of spread size in Appendix C). Therefore, EU with strong or contextual utility implies that $P_{21} = P_{22}$ for each subject. The last row of Table 2 shows that this restriction is rejected at the 10% level for 25% of Hey's (2001) subjects. Summing the test statistics over subjects, the p-value in the left column of Table 2 soundly rejects this restriction overall.

Table 2 illustrates one of my major themes very clearly: Stochastic models are consequential identifying restrictions for theory tests. From the perspective of RPs or the WV model, EU is rejected at the individual level for nearly half (49%) of Hey's (2001) 53 subjects. Yet from the perspective of strong and contextual utility, none of the testable implications in spread triples ever rejects EU for more than one-fourth of subjects (the special restriction in the last line of Table 2). For most of EU's predictions in spread triples with strong, strict, or contextual utility, the predictions are rejected for a percentage of subjects roughly equal to the size of the test – what one would expect if the predictions were essentially true for all subjects. This convincingly illustrates that stochastic model assumptions are crucial and consequential identifying restrictions for theory tests.

It is sobering to compare the inferences we might have made from Hey's (2001) data if we had depended wholly on simple sample moments, completely ignoring heterogeneity and stochastic models. Recall that Hey's (2001) spread triple 3 is the one where we should expect violations of betweenness according to today's conventional wisdom. Let $P_{3i}$ be the sample proportion of choices of the safer lottery from pair i of triple $h = 3$ in Hey's data set. In Hey's data set, we have $P_{31} = 0.413$ and $P_{32} = 0.740$.
With 10 trials per subject of pair 1 (which is $\{C^3, D^3\}$) and five trials per subject of pair 2 (which is $\{D^3, E^3\}$) in a sample of 53 subjects, the hypothesis that these sample proportions equal 0.5 will be soundly rejected by any statistical test. A certain style of inference would then be written: "Most subjects prefer $D^3$ to $C^3$, and most subjects prefer $D^3$ to $E^3$, and this violates betweenness in spread triple 3." Clearly, that is not the conclusion we draw from the test results in Table 2 – tests that respect heterogeneity simply by following the sound methodological example of Tversky (1969). Most data sets do not have enough repeated trials (most have none at all) to permit the disaggregated tests shown in Table 2. Yet the previous example illustrates how misleading aggregate tests can be. Therefore, we need statistical methods that can plausibly account for heterogeneity without treating every subject as a separate experiment (as done in Table 2). Linear mixture models (Harrison & Rutström, 2008 – this volume) are one approach to this. Later, I will describe the complementary random parameters approach.
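The individual-level restrictions (26) and (27) reduce to simple pattern checks on a subject's three choice probabilities in a spread triple. A minimal sketch (the function names are mine, not the chapter's):

```python
def satisfies_betweenness(p1, p2):
    """Implication (26): both 'safe' choice probabilities on the
    same side of 0.5."""
    return min(p1, p2) >= 0.5 or max(p1, p2) <= 0.5

def satisfies_betweenness_and_sst(p1, p2, p3):
    """Implication (27): betweenness plus strong stochastic transitivity."""
    return ((min(p1, p2) >= 0.5 and p3 >= max(p1, p2)) or
            (max(p1, p2) <= 0.5 and p3 <= min(p1, p2)))

# The aggregate proportions for triple 3 (0.413 and 0.740) form a pattern
# that would violate (26) if misread as one subject's probabilities:
assert not satisfies_betweenness(0.413, 0.740)
# A weakly risk-averse subject's pattern satisfies both restrictions:
assert satisfies_betweenness_and_sst(0.6, 0.7, 0.8)
```

This is exactly the sense in which the aggregate pattern can "violate" a restriction that nearly every individual subject satisfies.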
4.2. First-Order Stochastic Dominance

Recall from Section 1.3.3 that FOSD pairs are perhaps the only preference equivalence set that is common to a broad collection of structures including EU and RDEU. Sadly, none of the five stochastic models get the facts about FOSD remotely right. There is a computationally plausible fix-up for part of this problem based on trembles, but not for all of it. For the rest, one needs something like Busemeyer and Townsend's (1993) decision field theory – one reason that this theory deserves our close attention in the future.

4.2.1. Random Preferences
Because FOSD pairs are a preference equivalence set for both EU and RDEU, the preference equivalence set property of RP models implies what it always does in this case: All choice probabilities are equal for all FOSD pairs, for all subjects and hence the population. However, as mentioned in Section 1.3.3, all EU and RDEU preference orderings obey FOSD. In terms of RP intuition, there are no parameter vectors in the RP "urn" for which a dominated lottery is preferred. Therefore, the probability of choosing the stochastically dominating lottery (which by notational convention is $S_m$ in FOSD pairs) must always be 1. We therefore have $P_m = 1\ \forall\, m \in \Omega_{fosd}$, for all subjects and hence the population. Again, this is equally true of EU and RDEU structures with the RP model.

4.2.2. Strict Utility, Strong Utility, Contextual Utility, and the Wandering Vector Model
None of these models yields any special predictions about FOSD pairs. Neither V-distance (in strong and contextual utility), nor logarithmic V-distance (in strict utility), nor the Euclidean distance between lottery vectors (in the WV model) takes any "special notice" of FOSD. For this reason, they all (counter to intuition) predict at best a small change in choice probabilities, if any, as one passes from a basic pair to an FOSD pair by way of any small change that causes such a change in the classification of the lottery pair.

4.2.3. Transparent Dominance Violations as Tremble Events
It is now well known that in cases of transparent dominance, the probability that FOSD is violated is very close to zero, but still different from zero. It is difficult to define the distinction between transparent and nontransparent dominance (see Birnbaum & Navarrete, 1998, or Blavatskyy, 2007, for useful attempts), but "you know it when you see it." Here is an example of a
lottery pair that writers describe as a "transparent FOSD pair," taken from Hey (2001):

S: 3/8 chance of £50; 1/8 chance of £100; 4/8 chance of £150
R: 3/8 chance of £50; 2/8 chance of £100; 3/8 chance of £150

In the experiment of Loomes and Sugden (1998), subjects collectively violate FOSD in about 1.5% of transparent FOSD pair trials; a similar rate is observed by Hey (2001). Yet within-set switching probabilities for basic pairs are noticeably higher than this in all known experiments. Therefore, the continuity between basic and FOSD pairs that is suggested by all of the models except the RP model seems to be wrong. By contrast, the RP model's prediction seems to be approximately right in such transparent FOSD pairs. Yet the RP model's prediction that FOSD is never violated will cause the log likelihood of any RP model to be infinitely negative for any arbitrarily small but positive rate of FOSD violation in any sample, including the 1.5% rate reported above. So even in the case of the RP model, some kind of fix-up seems necessary.

For the RP model, the obvious solution is to add the possibility of tremble events, so as to give a choice probability $P_m^*$ slightly different from the considered choice probability $P_m$. Recall from Section 2.1 that this gives $P_m^* = (1 - \omega)P_m + \omega/2$, where $\omega$ is the tremble probability. Since $P_m = 1\ \forall\, m \in \Omega_{fosd}$ in an RP model, this implies that $P_m^* = 1 - \omega/2\ \forall\, m \in \Omega_{fosd}$ in an RP model "with trembles."

For the other models, we need a "processing story" in which subjects begin by screening pairs for transparent dominance. If such a relationship is not found, then the noisy evaluative processes that generate a considered choice probability are undertaken. But if transparent dominance is found, then these processes are not undertaken, since they are not necessary: A minimally sensible information processor simply would not put cognitive effort into such irrelevant computations after detecting dominance.
However, we do add the possibility of a tremble event, just as with the RP model. Letting $d_m = 1$ if $m \in \Omega_{fosd}$, and $d_m = 0$ otherwise, we can then write choice probabilities as follows:
$$P_m^* = (1 - \omega)[(1 - d_m)P_m + d_m] + \frac{\omega}{2} \quad (28)$$
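Eq. (28) is mechanical enough to state in a few lines of code; a sketch under my own naming, not the chapter's:

```python
def choice_prob(p_considered, omega, transparent_fosd):
    """Eq. (28): tremble-adjusted probability of choosing the safer lottery.

    p_considered     -- the model's considered choice probability P_m
    omega            -- tremble probability
    transparent_fosd -- True if the pair is a transparent FOSD pair (d_m = 1)
    """
    d = 1.0 if transparent_fosd else 0.0
    return (1.0 - omega) * ((1.0 - d) * p_considered + d) + omega / 2.0

# In a basic pair the considered probability is merely shrunk toward 0.5;
# in a transparent FOSD pair the result is 1 - omega/2 regardless of P_m.
assert choice_prob(0.7, 0.0, False) == 0.7
assert abs(choice_prob(0.2, 0.03, True) - (1 - 0.03 / 2)) < 1e-12
```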
It is worth noting that Eq. (28) is equally applicable to all five models, including the RP model. The explicit "transparent dominance detection" step introduced by $d_m = 1$ is not formally necessary for the RP model, but
Eq. (28) is identical to $P_m^* = (1 - \omega)P_m + \omega/2$ in the case of the RP model (since $P_m = 1$ whenever $d_m = 1$ with the RP model).

Loomes et al. (2002, p. 126) argue that "the low rate of [transparent] dominance violations…must count as evidence [favoring the RP model]" because they feel that other stochastic models, such as strong utility, do not predict this. I find this unpersuasive. As a question of relevance, the fact that the RP model gets transparent dominance roughly right says nothing about its descriptive or predictive adequacy in pairs where interesting tradeoffs are at stake – which, of course, is what really matters to the bulk of applied microeconomic theory and empirical microeconomics. As a question of econometric modeling, Eq. (28) shows that it is trivial to add the restriction $P_m^* = 1 - \omega/2\ \forall\, m \in \Omega_{fosd}$ to any stochastic model that already contains a tremble, with no additional parameters, so there is no loss of parsimony associated with such a modification. As a theoretical question, minimally sensible information processors will detect easy (i.e., transparent) dominance relationships and exploit them so as to conserve cognitive effort, as mentioned above. And finally, violations of nontransparent FOSD occur at rates far too high to be properly described as tremble events, and in such cases the other models would predict choice probabilities closer to the truth (though still qualitatively incorrect) than the RP model does. Let us turn to this.

4.2.4. Nontransparent Dominance Violations
The trouble with the RP model in particular, and with the other models generally, is that there are lottery pairs in which FOSD is not so transparently detectable. In such cases, the method of trembles is inadequate. Again, you know these when you see them.
Here is an example from Birnbaum and Navarrete (1998):

S*: 1/20 chance of $12; 1/20 chance of $14; 18/20 chance of $96
R*: 2/20 chance of $12; 1/20 chance of $90; 17/20 chance of $96

The majority of subjects in Birnbaum and Navarrete's experiment choose R* in this pair, even though S* dominates R*. Obviously, this cannot be explained as a tremble event, at least not one that occurs at the same very low probability that explains violations of transparent FOSD. Notice that this pair has a four-outcome context. This seems to be necessary for generating similar empirical examples, and the theoretical explanation offered by Birnbaum and Navarrete requires a four-outcome context.
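The dominance relation in this pair, however nontransparent to subjects, can be confirmed mechanically by comparing CDFs on the union of the two supports; a small sketch (probabilities are kept as integer twentieths to avoid floating-point comparisons):

```python
def cdf(lottery, points):
    """CDF of a {outcome: weight} lottery at each point in `points`."""
    return [sum(w for z, w in lottery.items() if z <= x) for x in points]

def first_order_dominates(a, b):
    """True if lottery a first-order stochastically dominates lottery b:
    a's CDF is everywhere <= b's, with strict inequality somewhere."""
    support = sorted(set(a) | set(b))
    Fa, Fb = cdf(a, support), cdf(b, support)
    return all(fa <= fb for fa, fb in zip(Fa, Fb)) and Fa != Fb

S = {12: 1, 14: 1, 96: 18}   # S* above, weights in twentieths
R = {12: 2, 90: 1, 96: 17}   # R* above
assert first_order_dominates(S, R)
assert not first_order_dominates(R, S)
```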
Busemeyer and Townsend's (1993) decision field theory also provides an intriguing explanation for nontransparent dominance violations.

4.2.5. Are FOSD Pairs Hydrogen or Hassium?
Violations of nontransparent FOSD appear to put all five stochastic models in a serious bind: None of them can accommodate such examples without some deus ex machina. But how worried should we be about this? If our subjects were oxygen and FOSD pairs were hydrogen, and we were physical or biological scientists, we would be terribly interested in FOSD pairs. Hydrogen, the most common element, plays a starring role in everything from stars to starfish. But if FOSD pairs are instead hassium, the situation is quite different. Hassium, with atomic number 108 and a half-life of 14 seconds, is one of the so-called transuranic elements – those things beyond uranium in the periodic table. Wikipedia says this about them:

All of [these] elements…have been first discovered artificially, and other than plutonium and neptunium, none occur naturally on earth…[Any] atoms of these elements, if they ever were present at the earth's formation, have long since decayed. Those that can be found on earth now are artificially generated…via nuclear reactors or particle accelerators.
Many economists expect dominated alternatives to be akin to transuranic elements – that is, they expect dominated alternatives to have short half-lives in the real economic world, outside of labs. That expectation relies, at least in part, on competition amongst sellers; therefore it is not obvious at all that laboratory violations of dominance imply anything about the survivability of dominated alternatives in any long-run equilibrium in the real world. A potential entrant may well be able to profit at the expense of an incumbent seller who (say) sells R* for a higher price than S* to consumers. The potential entrant can, after all, reframe the choice to expose the dominance relation just as easily as Birnbaum and Navarrete (1998) do:

S**: 1/20 chance of $12; 1/20 chance of $14; 1/20 chance of $96; 17/20 chance of $96
R**: 2/20 chance of $12; 0/20 chance of $14; 1/20 chance of $90; 17/20 chance of $96

An entrant can advertise the choice this way instead and call explicit attention to the chicanery of the incumbent in her advertisement. Expressed this way, few subjects choose R**. An experiment of this kind, allowing sellers to choose different frames for lotteries, as well as advertise
informatively to buyers with comment on other sellers' ads and offerings, could be quite interesting: Do "good frames" drive out "bad frames," or is it the other way around? Physicists create transuranic elements in labs to learn things about nuclear physics in general, and so it is with FOSD pairs in our own labs. We may learn a good deal about decision making by doing this. So it is not a waste of time to look at FOSD pairs. And even if dominated alternatives mostly cease to exist in equilibrium, they could be important out of equilibrium and hence on paths to equilibrium. The issue is really one of relative importance: We should be much more interested in pairs that we think will be common both in equilibrium and out of it. Those are pairs that contain interesting tradeoffs, such as risk-return tradeoffs.

The principle suggested by these thoughts is this: If there is a conflict in the explanatory power of stochastic models A and B that can be boiled down to "model A explains data better in pairs with interesting tradeoffs, or MPSs, etc., while model B explains data better only in FOSD pairs," then it seems to me that model A is the strongly favored choice. A corollary is this: Any argument in favor of any stochastic model, based solely on FOSD pairs, may be a relatively weak argument.
4.3. Context Shifts and Parametric Utility Functions

With any large number of outcomes, or for predicting choices with new outcomes, a parametric utility function for outcomes will frequently be required. Therefore, we need to know how these parametric forms behave when combined with each stochastic model. Recall the definitions of additive and proportional context shifts from Section 1.3.4. Define two stochastic model properties in terms of such shifts. Say that a stochastic model is CARA-neutral if P_m = P_m′ for all subjects, whenever m = {S_m, R_m} and m′ = {S_m′, R_m′} differ by an additive context shift and the utilities of outcomes are given by CARA utility functions. Similarly, say that a stochastic model is CRRA-neutral if P_m = P_m′ for all subjects, whenever m = {S_m, R_m} and m′ = {S_m′, R_m′} differ by a proportional context shift and the utilities of outcomes are given by CRRA utility functions. Only some of the stochastic models are CARA-neutral and CRRA-neutral. Throughout this section, the results will hold for both EU and RDEU structures. This is because the probability vectors in lottery pairs are definitionally constant across pairs that differ by an additive or proportional
Stochastic Models for Binary Discrete Choice Under Risk
context shift. Thus, probabilities (in EU) and weights (in RDEU) play no role in determining CARA- or CRRA-neutrality.

4.3.1. Random Preferences, Strict Utility, and Contextual Utility
Recall from Section 1.3.4 that when utilities follow the CARA utility function, sets of pairs that differ by an additive context shift are preference equivalence sets for both EU and RDEU. If all utility functions in a subject's RP "urn" are CARA utility functions, then, the preference equivalence set property of the RP model implies that P_m = P_m′ for all subjects, whenever m and m′ differ by an additive context shift. Section 1.3.4 also showed that when utilities follow the CRRA utility function, sets of pairs that differ by a proportional context shift are preference equivalence sets for both EU and RDEU; so similarly, the RP model with only CRRA utility functions "in the urn" implies that P_m = P_m′ for all subjects, whenever m and m′ differ by a proportional context shift. Therefore, RPs are both CARA- and CRRA-neutral.

The logarithmic V-distance form in strict utility gives it CARA- and CRRA-neutrality too. Taking natural logarithms through both of the identities (4), we have

ln[V(S_m′|β)] ≡ −ax + ln[V(S_m|β)] and ln[V(R_m′|β)] ≡ −ax + ln[V(R_m|β)]     (29)

for all subjects, for both EU and RDEU with CARA utility functions, whenever pairs m and m′ differ by an additive outcome shift x. It follows from Eq. (29) that

ln[V(S_m′|β)] − ln[V(R_m′|β)] ≡ ln[V(S_m|β)] − ln[V(R_m|β)]     (30)

so that strict utility's latent variable in F is constant across pairs that differ by an additive context shift. The choice probability is then constant across such pairs too, so strict utility is CARA-neutral. Similarly, taking natural logarithms through both of the identities (5), we have

ln[V(S_m′|β)] ≡ (1 − φ)ln(y) + ln[V(S_m|β)] and ln[V(R_m′|β)] ≡ (1 − φ)ln(y) + ln[V(R_m|β)]     (31)

for all subjects, for both EU and RDEU with CRRA utility functions, where pairs m and m′ differ by the proportional outcome shift y. Eq. (30) also follows from these two identities. So by the same kind of argument, strict utility is CRRA-neutral as well.
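Strict utility's CRRA-neutrality is easy to check numerically under EU. The sketch below uses illustrative lotteries and a CRRA exponent of my own choosing (none of these values come from the chapter): the log V-distance, and hence the choice probability, is unchanged by a proportional context shift.

```python
import math

phi = 0.4                               # illustrative CRRA coefficient
u = lambda z: z ** (1 - phi)            # CRRA utility (positive outcomes)

def eu(probs, outcomes):
    return sum(p * u(z) for p, z in zip(probs, outcomes))

context = (10.0, 50.0, 90.0)            # outcomes (j, k, l)
S = (0.1, 0.8, 0.1)                     # safe lottery probabilities
R = (0.45, 0.0, 0.55)                   # risky lottery probabilities

def log_v_distance(ctx):
    return math.log(eu(S, ctx)) - math.log(eu(R, ctx))

y = 20.0                                # proportional context shift
shifted = tuple(y * z for z in context)

# The latent variable ln V(S) - ln V(R) is identical on both contexts,
# so any c.d.f. F applied to it gives the same choice probability.
print(abs(log_v_distance(context) - log_v_distance(shifted)) < 1e-12)  # -> True
```

The common factor y^(1−φ) in both EU values cancels inside the log difference, exactly as in Eqs. (30) and (31).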
Because of contextual utility's ratio-of-differences form, EU and RDEU contextual utility models are also CARA- and CRRA-neutral. Recall from Eq. (19) that contextual utility's latent variable depends only on the ratio of differences u_m = (u_k − u_j)/(u_l − u_j) on context c_m = (j, k, l). Recall from Section 2.5 that with CARA utility, u_{z+x} = e^{−ax}u_z. Therefore, with CARA utility and an additive context shift c_m′ = (j + x, k + x, l + x), we have

u_m′ = (e^{−ax}u_k − e^{−ax}u_j)/(e^{−ax}u_l − e^{−ax}u_j) = (u_k − u_j)/(u_l − u_j) = u_m     (32)

With CARA utility, contextual utility's latent variable is therefore unchanged by an additive context shift, so it is CARA-neutral. Similarly, since u_{yz} = y^{1−φ}u_z with CRRA utility, a proportional context shift c_m′ = (yj, yk, yl) with CRRA utility gives

u_m′ = (y^{1−φ}u_k − y^{1−φ}u_j)/(y^{1−φ}u_l − y^{1−φ}u_j) = (u_k − u_j)/(u_l − u_j) = u_m     (33)

So with CRRA utility, contextual utility's latent variable is likewise unchanged by a proportional context shift, implying that it is CRRA-neutral too.

4.3.2. Strong Utility and the Wandering Vector Model
Strong utility and the WV model are neither CARA- nor CRRA-neutral. Consider first CARA utility on the context c_m′ = (j + x, k + x, l + x): From the identities (4), strong utility's latent variable is in this case

V(S_m′|β) − V(R_m′|β) ≡ e^{−ax}[V(S_m|β) − V(R_m|β)]     (34)

for any subject, for EU and RDEU. Taking the derivative with respect to x, we have

∂[V(S_m′|β) − V(R_m′|β)]/∂x = −a e^{−ax}[V(S_m|β) − V(R_m|β)]     (35)

Obviously, this implies that the latent variable in a strong utility model with CARA utility changes with an additive context shift. Therefore, strong utility is not CARA-neutral. For risk-averse subjects (those with a > 0), an additive context shift moves choice probabilities in the direction of indifference, while for risk-seeking subjects (those with a < 0), it moves choice probabilities away from indifference, making them more extreme. Similarly, if we have CRRA utility on the context c_m′ = (yj, yk, yl), the identities (5) imply that for any subject, both EU and RDEU, strong utility's
latent variable in pair m′ is

V(S_m′|β) − V(R_m′|β) ≡ y^{1−φ}[V(S_m|β) − V(R_m|β)]     (36)

Taking the derivative with respect to the proportional shift y, we have

∂[V(S_m′|β) − V(R_m′|β)]/∂y = (1 − φ)y^{−φ}[V(S_m|β) − V(R_m|β)]     (37)

Again, this implies that the latent variable in a strong utility model with CRRA utility generally changes with a proportional context shift. Therefore, strong utility is not CRRA-neutral. Because the CRRA utility function approaches ln(z) as φ → 1, call CRRA utility functions with φ > 1 "sublogarithmic." For subjects with sublogarithmic CRRA utility, a proportional context shift moves choice probabilities in the direction of indifference, while for other subjects with φ < 1, it moves choice probabilities away from indifference, making them more extreme.

All of these results apply equally to the WV model, since probability vectors are definitionally held constant across pairs m and m′ that differ by either an additive or proportional context shift. The Euclidean distance between probability vectors in such pairs is therefore constant across pairs: That is, d_m = d_m′ = d. Therefore, the derivatives in Eqs. (35) and (37) for strong utility simply differ by the factor d⁻¹ in the WV model, which is positive since d is a distance. So the WV model is neither CARA- nor CRRA-neutral.

4.3.3. Patterns of Risk Aversion Across Contexts: Stochastic Models Versus Structure
CARA- and CRRA-neutrality are important properties of stochastic models because they identify changes in risk-taking behavior across contexts as structural differences. Consider for instance the well-known experiment of Holt and Laury (2002). Holt and Laury examine binary choices from pairs on two four-outcome contexts that differ by a 20-fold (y = 20) proportional context shift.18 There is a general shift toward safer choices in pairs after the proportional context shift, which Holt and Laury interpret as increasing relative risk aversion. The results of this section demonstrate that this interpretation depends on an implicit stochastic identifying restriction. In particular, Holt and Laury (2002) implicitly assume that the true stochastic model is CRRA-neutral.
As we have seen, that could be RPs, strict utility, or contextual utility – all of these are CRRA-neutral. In fact, Holt and Laury go on to specify a strict
utility EU model, with a flexible "expo-power" utility function for maximum-likelihood estimation, and the estimates confirm their interpretation of increasing relative risk aversion. The results of this section basically imply that once Holt and Laury select strict utility, the estimation need not be done: If the probability of safer choices increases with a proportional context shift, strict utility must put this down to increasing relative risk aversion in the structural sense of that term, because strict utility is CRRA-neutral. It cannot do otherwise. This is not simply an academic point, since other stochastic models are not CRRA-neutral. For instance, suppose that strong utility is the true stochastic model, that CRRA EU is the true structure, and that most subjects have a (constant) coefficient of relative risk aversion between zero and one, which is typical of estimates (Harrison & Rutström, 2008 – this volume). Then Eq. (37) implies that if most subjects prefer the safe lottery in some pair m, a proportional context shift of that pair will increase their probability of choosing the safe lottery in the shifted pair. This is precisely what Holt and Laury (2002) report. The lesson here resembles that learned from the discussion of the common ratio effect, though in this case heterogeneity (mixing across different subject types) is not part of the problem. Qualitative patterns of risky choice do not by themselves tell us what structure we are looking at, because stochastic models interact with structure in nontrivial ways. To tell whether Holt and Laury (2002) are looking at increasing relative risk aversion and a CRRA-neutral stochastic model, or in contrast constant relative risk aversion and a stochastic model that is not CRRA-neutral, we need to do more. In the event, actual comparisons of log likelihoods (Harrison & Rutström, 2008 – this volume) suggest that Holt and Laury's conclusion was correct.
But separate estimations with different stochastic models, and a comparison of log likelihoods, were necessary to validate that conclusion: The qualitative pattern of results simply cannot decide the issue on its own. Econometrically, CARA- and CRRA-neutrality can be viewed as "desirable" features of a stochastic model precisely because of the strong structural identification implied by these properties. There is also a theoretical sense in which these are "nice" properties. In deterministic EU and RDEU, we single out CARA and CRRA utility functions for special notice because they create preference equivalence sets with additive and proportional context shifts, respectively. CARA- and CRRA-neutrality are intuitively satisfying stochastic choice reflections of these special deterministic preference properties. I understand and sympathize with this kind of
theoretical appeal: It resembles the appeal that contextual utility has by virtue of creating congruence between stochastic and structural definitions of MRA. It is not clear, though, that we are required to choose stochastic models that create properties that mirror the deterministic structure in some theoretically satisfying manner.
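The Holt–Laury point can be reproduced with a few lines of arithmetic. This is a sketch under assumed values – a logistic link with precision λ = 1 and a constant CRRA coefficient φ = 0.5, chosen by me and not taken from their estimates – showing a strong utility subject whose relative risk aversion is constant, yet whose safe-choice probability rises after a proportional context shift, just as Eq. (37) implies.

```python
import math

phi, lam = 0.5, 1.0                    # constant CRRA and precision (assumed)
u = lambda z: z ** (1 - phi)           # CRRA utility, u(0) = 0

def eu(lottery):                       # lottery: list of (prob, outcome)
    return sum(p * u(z) for p, z in lottery)

def p_safe(S, R):
    """Strong utility: P(choose S) = logistic(lam * [V(S) - V(R)])."""
    return 1.0 / (1.0 + math.exp(-lam * (eu(S) - eu(R))))

S = [(1.0, 10.0)]                      # safe: 10 for sure
R = [(0.5, 20.0), (0.5, 0.0)]          # risky: 20 or 0

scale = lambda lot, y: [(p, y * z) for p, z in lot]
p_small = p_safe(S, R)
p_big = p_safe(scale(S, 20), scale(R, 20))   # 20-fold proportional shift

# Relative risk aversion is constant here, yet safe choices become more
# likely after the shift -- mimicking "increasing relative risk aversion."
print(round(p_small, 3), round(p_big, 3))    # -> 0.716 0.984
```

The shift multiplies the latent V-difference by y^(1−φ) = √20 ≈ 4.47, so the logistic probability of the safe choice moves sharply away from indifference even though φ never changes.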
4.4. Simple Scalability

Because they satisfy SST with any transitive structure, strong and strict utility must satisfy simple scalability with the EU or RDEU structure. Recall that contextual utility is observationally identical to strong utility (and satisfies SST) for pairs that share the same context. Therefore, contextual utility must also satisfy simple scalability for pairs on the same context. However, the moderate utility models will not, in general, satisfy simple scalability. For instance, contextual utility can violate both SST and simple scalability across pairs that have different contexts. The WV model can violate both SST and simple scalability even for pairs that share the same context, since its heteroscedasticity varies across pairs with the same context (unlike contextual utility). Violations of simple scalability for pairs that share the same context would therefore reject strong, strict, and contextual utility in favor of the WV model or some other alternative, such as decision field theory (Busemeyer & Townsend, 1993), that permits heteroscedasticity across pairs with a common context.

Recall that simple scalability implies an ordering independence property of choice probabilities across special sets of four pairs, which we can call a quadruple. Let the pairs {C, E} and {D, E} be indexed by ce and de, respectively, and let the pairs {C, E′} and {D, E′} be indexed by ce′ and de′, respectively. Then simple scalability requires P_ce ≥ P_de iff P_ce′ ≥ P_de′. RP models do not in general require this, as shown by the following counterexample. Consider an urn with three linear orderings (from best to worst) in it: Two "copies" of the ordering DE′CE, and one of the ordering CE′ED. Supposing that each of these three orderings is equally likely to be drawn on any choice trial, we have P_ce = 1 and P_de = 2/3, but also P_ce′ = 1/3 and P_de′ = 2/3. So like the WV model and decision field theory, RPs do not in general need to satisfy simple scalability.
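The urn counterexample is mechanical enough to verify by enumeration (a sketch; E′ is written E2 as a plain identifier):

```python
# RP urn: three equally likely linear orderings, listed best to worst.
urn = [
    ["D", "E2", "C", "E"],   # first copy of the ordering D > E' > C > E
    ["D", "E2", "C", "E"],   # second copy
    ["C", "E2", "E", "D"],   # the ordering C > E' > E > D
]

def prob(a, b):
    """P(a chosen over b) = share of orderings ranking a above b."""
    hits = sum(1 for o in urn if o.index(a) < o.index(b))
    return hits / len(urn)

p_ce, p_de = prob("C", "E"), prob("D", "E")
p_ce2, p_de2 = prob("C", "E2"), prob("D", "E2")
print(p_ce, p_de, p_ce2, p_de2)

# Ordering independence (simple scalability) requires these differences
# to have the same sign -- but they do not:
print((p_ce - p_de) > 0 and (p_ce2 - p_de2) < 0)   # -> True
```

Against the standard lottery E, C beats D (1 versus 2/3); against E′ the ranking reverses (1/3 versus 2/3), which is exactly the violation claimed in the text.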
Though Hey's (2001) data set provides many opportunities to test this property, most of the suitable quadruples are ones where C first-order stochastically dominates D. Of course, the pair {C, D} is not itself involved in a test of the ordering independence property, and this property must still
hold when {C, D} is an FOSD pair. Unfortunately, such quadruples make less sharp distinctions between the stochastic models. For instance, if {C, D} is an FOSD pair, every linear ordering in which D precedes E must also be an ordering in which C precedes E (since linear orderings are transitive), for any structure that satisfies FOSD. Consider, then, an urn filled with linear orderings of the lotteries in the quadruple, and suppose {C, D} is an FOSD pair: The number of linear orderings in this urn for which D is preferred to E cannot exceed the number for which C is preferred to E. Therefore, even an RP model should obey the ordering independence property implied by simple scalability whenever {C, D} is an FOSD pair. Additionally, quadruples where the sample proportions P_ce and P_de (or P_ce′ and P_de′) happen to be very close to either zero or one cannot significantly violate the ordering independence property for any appreciable number of subjects: The constraint P^n_ce = P^n_de (or P^n_ce′ = P^n_de′) will necessarily be satisfied with little loss of fit for virtually every subject n in such cases. Therefore, ordering independence imposed as a constraint will necessarily result in little loss of fit, for virtually all subjects, in any such quadruple. Unfortunately, many potentially interesting quadruples in Hey's design have this sample characteristic in his data, making them relatively uninformative about simple scalability.

Table 3 shows the only pairs from Hey's (2001) data set that I regard as "suitable" for a test of simple scalability by the ordering independence property. Suitability is defined in two ways, based on the immediately preceding discussion. First, the pair {C, D} is not an FOSD pair; and second,
Table 3. Pairs from Hey (2001) Used for a Limited Test of Simple Scalability.

The lotteries, all on the context (£0, £50, £150):
C = (2/8, 3/8, 3/8), D = (3/8, 1/8, 4/8), E = (0, 7/8, 1/8), E′ = (1/8, 6/8, 1/8), E″ = (1/8, 7/8, 0)

The pairs:

  Pair      Sample proportion (choices of C or D)
  {C, E}    0.132
  {D, E}    0.117
  {C, E′}   0.498
  {D, E′}   0.309
  {C, E″}   0.626
  {D, E″}   0.449
  {C, D}    0.785
the sample proportions P_ce and P_de are in the interval [0.10, 0.90], and hence bounded away from zero and one, for each "standard lottery" E involved in these pairs. As Table 3 shows, there are six pairs all involving the same two lotteries C and D, which are each paired against three different "standard lotteries" denoted by E, E′, and E″. Thus, we have six pairs in all, in principle forming three quadruples. These are not three independent quadruples: If we impose ordering independence for any two of them, then ordering independence will hold in the third quadruple as well. Put differently, ordering independence for these six pairs is the imposition of the two nonlinear constraints (P^n_ce − P^n_de)(P^n_ce′ − P^n_de′) ≥ 0 and (P^n_ce′ − P^n_de′)(P^n_ce″ − P^n_de″) ≥ 0, for each subject n. Hey's design also happens to present the pair {C, D} directly, and the logic of the ordering independence property implies that we must also have P^n_ce ≥ P^n_de iff P^n_cd ≥ 0.5. Therefore, we can add each subject's choice data for the direct choice between C and D to the test, and add a third nonlinear constraint (P^n_ce − P^n_de)(P^n_cd − 0.5) ≥ 0 to the previous two for each subject n. As usual, with three constraints, twice the difference between the unrestricted and restricted log likelihood for each subject is distributed χ² with three degrees of freedom. This restriction is not rejected for any of Hey's 53 subjects at the 10% level (and obviously holds overall).

This is a "happenstance test" of simple scalability: I am simply working with what happens to be available in Hey's (2001) data set. Yet it is of some interest, since the three "standard lotteries" E, E′, and E″ happen to be distinctive (see Table 3). Lottery E has a zero probability of the lowest outcome on the context: This may call extra attention to the nonzero probabilities in C and D of receiving that lowest outcome, and that might make D look especially poor in comparison to E.
Likewise, lottery E″ has a zero probability of the highest outcome on the context: This may call extra attention to the nonzero probabilities in C and D of receiving the highest outcome, and that might make D look especially good in comparison to E″. So the test does in principle put some stress on simple scalability, which is intuitively the assumption that such differential effects of the standard of comparison are weak or nonexistent. On the other hand, I take little comfort from this test. Hey (2001) did not deliberately design his experiment as a test of simple scalability. The test performed here uses pairs that are entirely on a single context. The most robust violations of simple scalability found in the psychological canon involve pairs with different contexts: To explain these violations, we need theories like decision field theory and contextual utility that relax SST across contexts.19 Experimental economists need to deliberately set about testing
simple scalability with suitable designs that replicate and extend what psychologists have already done. Simple scalability is at the heart of latent variable approaches to modeling discrete choice under risk.
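As a purely descriptive companion to the happenstance test above, the pooled Table 3 proportions can be checked against the three ordering-independence constraints directly. This is only a sign check on pooled proportions, not the per-subject likelihood-ratio test actually reported:

```python
# Pooled sample proportions of C/D choices from Table 3
# (E' coded as ce2/de2, E'' coded as ce3/de3).
p = {"ce": 0.132, "de": 0.117, "ce2": 0.498, "de2": 0.309,
     "ce3": 0.626, "de3": 0.449, "cd": 0.785}

constraints = [
    (p["ce"] - p["de"]) * (p["ce2"] - p["de2"]),    # quadruple with E, E'
    (p["ce2"] - p["de2"]) * (p["ce3"] - p["de3"]),  # quadruple with E', E''
    (p["ce"] - p["de"]) * (p["cd"] - 0.5),          # direct {C, D} choice
]
# Ordering independence requires every product to be nonnegative.
print(all(c >= 0 for c in constraints))  # -> True
```

All three products are positive here, which is consistent with the report that the restriction is not rejected for any subject.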
4.5. Generalizability and Tractability: The Special Problem of Random Preferences

When contexts vary in a data set, or when we wish to predict choices from one context to another, the generalizability of stochastic models across contexts becomes an important consideration for choosing amongst them. For most of the stochastic models discussed here this is not a pressing issue. RP models, however, are inherently difficult to generalize across contexts with structures that are more complex than EU. In fact, RDEU models with RPs quickly become econometrically intractable except for special cases – and even in these special cases, generalizing the model across all contexts is not transparent. I now work out one such special case that illustrates these problems.

Consider an experiment such as Hey and Orme (1994) that uses four equally spaced money outcomes including zero. Let (0,1,2,3) denote such outcomes, and let a subject's random utility vector for the four outcomes be (0, 1, u_2, u_3), where u_2 ≥ 1 and u_3 ≥ u_2. In Hey and Orme (1994), as well as Hey (2001) and Harrison and Rutström (2005), pairs are on the four possible three-outcome contexts one may create from the four outcomes (0,1,2,3). Index the four contexts by their omitted outcome: For instance, c = ¬3 (read as "not outcome 3") indexes the context (0,1,2). The three left columns of Table 4 summarize these four contexts and their utility vectors.

Table 4. Contexts and Random Preference Representation for Experiments with Four Overlapping Three-Outcome Contexts.

  Context     Context c, with      Utility vector   v_m ≡ (u_l − u_k)/(u_k − u_j)   v_m in terms of the underlying
  index (c)   outcomes (j, k, l)   on context c     on context c                    random variables g_1 and g_2
  ¬3          (0,1,2)              (0, 1, u_2)      u_2 − 1                         g_1
  ¬2          (0,1,3)              (0, 1, u_3)      u_3 − 1                         g_1 + g_2
  ¬1          (0,2,3)              (0, u_2, u_3)    (u_3 − u_2)/u_2                 g_2/(g_1 + 1)
  ¬0          (1,2,3)              (1, u_2, u_3)    (u_3 − u_2)/(u_2 − 1)           g_2/g_1
It should be clear that random preference RDEU requires a choice for the joint distribution of the two utility parameters u_2 and u_3 in a subject's "RP urn." But in order to use the elegant specification of Loomes et al. (2002) introduced earlier, we will need to choose that joint distribution cleverly, so that v_m ≡ (u_l − u_k)/(u_k − u_j) has a tractable distribution on each context (j, k, l): This is because v_m is the key random variable of that specification, as shown in the discussion of Eqs. (9) and (10) earlier. To explore this, let g_1 ≡ u_2 − 1 ∈ R₊ and g_2 ≡ u_3 − u_2 ∈ R₊ be two underlying random variables generating the two random utilities as u_2 = 1 + g_1 and u_3 = 1 + g_1 + g_2. Then, algebra shows the following:

v_m = g_1             for pairs m on c = ¬3, that is the context (0,1,2);
v_m = g_1 + g_2       for pairs m on c = ¬2, that is the context (0,1,3);
v_m = g_2/(g_1 + 1)   for pairs m on c = ¬1, that is the context (0,2,3); and
v_m = g_2/g_1         for pairs m on c = ¬0, that is the context (1,2,3)     (38)
These results are also summarized in the two right columns of Table 4. With v_m expressed in terms of the two underlying random variables g_1 and g_2, we need a joint distribution of g_1 and g_2 that will generate tractable parametric distributions of as many of the context-specific forms taken by v_m as possible. The best choice I am aware of still only works for three of the four forms in Eq. (38). That choice is two independent gamma variates, each with the gamma distribution c.d.f. G(x|φ, k), with identical "scale" parameter k but possibly different "shape" parameters φ_1 and φ_2. Under this choice, v_m is distributed:

Gamma, with c.d.f. G(x|φ_1, k), for pairs m on context c = ¬3;
Gamma, with c.d.f. G(x|φ_1 + φ_2, k), for pairs m on context c = ¬2; and
Beta-prime, with c.d.f. B′(x|φ_2, φ_1), for pairs m on context c = ¬0     (39)
Sums of independent gamma variates with common scale have gamma distributions, and ratios of independent gamma variates with common scale have beta-prime distributions on R₊, also called "beta distributions of the second kind" (Aitchison, 1963).20 These assumptions also imply a joint
distribution of u_2 − 1 and u_3 − 1 known as "McKay's bivariate gamma distribution," with a correlation coefficient √(φ_1/(φ_1 + φ_2)) between u_2 and u_3 in the subject's "RP urn" (McKay, 1934; Hutchinson & Lai, 1990). Notice that Eq. (39) only involves three parameters – two shape parameters φ_1 and φ_2, and a scale parameter k. These parameters correspond to the three parameters one would find in the other four stochastic models when combined with EU, in the form of the (nonrandom) utilities u_2 and u_3 and the precision parameter λ. Of course, when combined with RDEU, all five models of choice probabilities would also include a weighting function parameter such as γ.

An acquaintance with the literature on estimation of random utility models may make these assumptions seem very special and unnecessary. They are very special, but this is because theories of risk preferences over money outcomes are very special relative to the kinds of preferences that typically get treated in that literature. Consider the classic example of transportation choice well known from Domencich and McFadden (1975). Certainly we expect the value of time and money to be correlated across the population of commuters. But for a single commuter making a specific choice between car and bus on a specific morning, we do not require a specific relationship between the disutility of commuting time and the marginal utility of income she happens to "draw" from her random utility urn on that particular morning. This gives us fairly wide latitude when we choose a distribution for the unobserved parts of her utilities of various commuting alternatives. This is definitely not true of any specific trial of her choice from a lottery pair m. The spirit of the RP model is that every preference ordering drawn from the urn obeys all properties of the preference structure (Loomes & Sugden, 1995).
We demand, for instance, that she "draw" a vector of outcome utilities that respects monotonicity in z; this implies that the joint distribution of u_2 and u_3 must have the property that u_3 ≥ u_2 ≥ 1. Moreover, the assumptions we make about the v_m must be probabilistically consistent across pair contexts. Choosing a joint distribution of u_2 and u_3 immediately implies exact commitments regarding the distribution of any and all functions of u_2 and u_3. The issue does not arise in a data set where subjects make choices from pairs on just one context, as in Loomes et al. (2002): In this simplest of cases, any distribution of v_m on R₊, including the lognormal choice they make, is a wholly legitimate hypothesis. But as soon as each subject makes choices from pairs on several different overlapping contexts, staying true to the demands of RP models is much more exacting. Unless we can specify a joint distribution of g_1 and g_2 that implies it, we are not entitled (for instance) to assume that v_m follows lognormal distributions in all of three overlapping contexts for a single subject.21 Put differently, a
choice of a joint distribution for v_m in two contexts has exact and inescapable implications for the distribution of v_m on a third context that shares outcomes with the first two contexts. Carbone (1997) correctly saw this in her treatment of EU with RPs. Under these circumstances, a cagey choice of the joint distribution of g_1 and g_2 is necessary.

RP models can be quite limiting in practical applications. For instance, notice that Eq. (39) gives no distribution of v_m on the context c = ¬1 (i.e., (0,2,3)). Fully a quarter of the data from experiments such as Hey and Orme (1994) are on that context. As far as I am aware, the following is a true statement, though I may yet see it disproved.

Conjecture. There is no nondegenerate joint distribution of g_1 and g_2 on (R₊)² such that g_1, g_1 + g_2, g_2/(g_1 + 1) and g_2/g_1 all have tractable parametric distributions.

Shortly I compare the stochastic models using Hey and Orme's data, and limit myself to just the choices subjects made from pairs on the contexts (0,1,2), (0,1,3) and (1,2,3). These are the contexts that the "independent gamma model" of RPs developed above can be applied to, and I am not aware of any alternative that would permit parametric estimation of random preference RDEU across all four contexts. There are no similar practical econometric modeling constraints on strict, strong, or contextual utility models, or WV models, with RDEU (a considerable practical point in their favor); these models are all applied with relative ease to choices on any number of different outcome contexts.

Specifications that adopt some parametric form for the utility of money, and then regard the randomness of preference as arising from the randomness of a utility function parameter, offer no obvious escape from these difficulties, at least for RDEU. For instance, if we adopt the CRRA form, it is fairly simple to show that this implies v_m = (l^{1−φ} − k^{1−φ})/(k^{1−φ} − j^{1−φ}), where (j, k, l) is the context of pair m. Substituting into Eq.
(10), we then have

P_m = Pr[(l^{1−φ} − k^{1−φ})/(k^{1−φ} − j^{1−φ}) ≤ (w(s_mk + s_ml|γ) − w(r_mk + r_ml|γ))/(w(r_ml|γ) − w(s_ml|γ))]     (40)

There are two possible routes for implementing this when 1 − φ is a random variable. The first is to solve the inequality in Eq. (40) for 1 − φ as a function of j, k, and l, the pair characteristics, and whatever parameters of w(q) we have. We could then choose a distribution for 1 − φ and be done. I invite
readers to try it with any popular weighting function: Contexts (0,1,2) and (0,1,3) are simple, but the context (1,2,3) is intractable. A second route is suggested by an approach that works well for the EU structure, where w(q) ≡ q. In the EU case, although we still cannot analytically solve Eq. (40) for all contexts, we can easily use numerical methods to find 1 − φ_m (to any desired degree of accuracy) prior to estimation, for each pair m on whatever context, such that

(l^{1−φ_m} − k^{1−φ_m})/(k^{1−φ_m} − j^{1−φ_m}) = (s_mk + s_ml − (r_mk + r_ml))/(r_ml − s_ml)     (41)
Here, φ_m is the coefficient of relative risk aversion that makes a subject indifferent between the lotteries in pair m. With this in hand, we can choose a distribution H_{1−φ}(x|α) for 1 − φ and use P_m = H_{1−φ}(1 − φ_m|α) as our model of considered choice probabilities under EU with RPs: The probability of choosing the safe lottery is simply the probability that the subject draws a coefficient of relative risk aversion larger than φ_m from her RP urn. For RDEU, however, 1 − φ_m is a function of any parameters of the weighting function w(q). In terms of well-known theory, risk aversion arises from both the utilities of money and the weighting function, so there is no unique coefficient of relative risk aversion, independent of the weighting function, that makes the subject indifferent between the lotteries in pair m. Therefore we cannot simply provide a constant 1 − φ_m to our model as we can with EU: We need the function 1 − φ_m(γ) (in the case of the Prelec weighting function with parameter γ) so that we can write P_m = H_{1−φ}[1 − φ_m(γ)|α]. But we have been here before: We cannot analytically solve Eq. (40) for this function, so it would have to be approximated numerically on the fly, for each pair m, within our estimation. Numerical methods probably exist for such tasks, but they are beyond my current knowledge. On the basis of this discussion, I think it fair to say that in the case of RDEU, RP models are much less generalizable (in the sense of econometric tractability) across contexts than are other stochastic models.
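The numerical route just described for the EU case is short enough to sketch. Everything here is illustrative: the pair, the bisection bracket, and the choice of H as a Normal(0.5, 0.3) distribution are all assumptions of mine, not features of any estimated model.

```python
import math

j, k, l = 1.0, 2.0, 3.0                  # pair context (j, k, l)
S = (0.0, 1.0, 0.0)                      # safe lottery: outcome k for sure
R = (0.4, 0.0, 0.6)                      # risky lottery (probabilities on j, k, l)

def lhs(t):
    """Left side of Eq. (41) with t = 1 - phi; increasing in t on this context."""
    return (l ** t - k ** t) / (k ** t - j ** t)

# Right side of Eq. (41): (s_mk + s_ml - (r_mk + r_ml)) / (r_ml - s_ml).
rhs = (S[1] + S[2] - (R[1] + R[2])) / (R[2] - S[2])

lo, hi = 1e-9, 5.0                       # bracket containing t_m = 1 - phi_m
for _ in range(200):                     # plain bisection
    mid = 0.5 * (lo + hi)
    if lhs(mid) < rhs:
        lo = mid
    else:
        hi = mid
t_m = 0.5 * (lo + hi)                    # 1 - phi_m for this pair

# P_m = H(1 - phi_m): probability of drawing phi >= phi_m from the urn.
# H = Normal(0.5, 0.3) purely for illustration -- not an estimate.
H = lambda x: 0.5 * (1.0 + math.erf((x - 0.5) / (0.3 * math.sqrt(2.0))))
print(round(t_m, 3), round(H(t_m), 3))
```

With these illustrative values the indifference point t_m falls near 0.24, and H(t_m) is the implied probability of the safe choice for this pair.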
4.6. Summary of Stochastic Model Properties Table 5 summarizes the properties of the stochastic models at a glance. The following conclusion is inescapable: All of the stochastic models, when combined with an EU structure, have a prediction or property which either (a) can be, and has been, taken as a violation of EU, or (b) is
No (false CRE possible with heterogeneity) No
Not without trembles Yes Yes No
No (false CRE possible with heterogeneity) No
Not without trembles
No
Yes
No
Invariance of choice probabilities to common ratio change with EU Invariance of choice probabilities in spread triple with EU Near-zero probability of choosing stochastically dominated lottery CARA and CRRA neutrality Tractable generalization across contexts Sensible stochastic meaning of ‘‘more risk averse’’ in Pratt’s sense
SST
Strict utility
SST
Strong utility
Yes
Yes
Yes
Not without trembles
SST within a context, MST across contexts No (false CRE possible with heterogeneity) No
Contextual utility
Stochastic Model Wandering vector
No
Yes
No
Possible but not always meaningful
Not for RDEU
Yes
Yes
Yes
Yes
Not without trembles
Yes
No
Random preferences
Yes
MST
A Summary of Stochastic Model Properties.
Stochastic transitivity
Property
Table 5.
Stochastic Models for Binary Discrete Choice Under Risk 257
258
NATHANIEL T. WILCOX
econometrically problematic. For instance, strong, strict, and contextual utility are all capable of producing the ‘‘false common ratio effect’’ described in Section 3.1.2 when subjects are heterogeneous; the WV model is neither CARA- nor CRRA-neutral; and RPs have no stochastic transitivity properties at all. My own view, in the end, is that it is rather pointless to single out some specific ‘‘weird’’ or ‘‘difficult’’ feature of a stochastic model as a criterion for rejecting it: Each model has its own weird and/or difficult features which are not shared by all models. This is just another way of saying that stochastic models are unavoidably consequential when it comes to discrete choice and choosing between structures such as EU and RDEU. It is simply wrong to claim otherwise.
5. AN OVERALL ECONOMETRIC COMPARISON OF THE STOCHASTIC MODELS

Henceforth, I refer to any combination of a structure and a stochastic model as a specification and denote these as ordered pairs. The two structures used here are denoted EU and RDEU as always, while the stochastic models will be denoted strong, strict, contextual, WV, and RP. For instance, the specification (EU,Contextual) is an EU structure paired with the contextual utility stochastic model, while the specification (RDEU,RP) is an RDEU structure paired with the RP stochastic model. On occasion an index for specifications will be helpful; let this be s, not to be confused with the subscripted s that are probabilities in a safe lottery, as in S_m = (s_mj, s_mk, s_ml). In Sections 3 and 4, specific predictions and properties of various specifications were discussed and in some cases tested with Hey’s (2001) data. These piecemeal tests have some usefulness because they identify specific ways in which specifications fail; in doing so, they can suggest specific avenues for theoretical improvement. Additionally, most of these tests are free of assumptions about cumulative distribution functions, functional forms, and/or parameter values. But these piecemeal tests confine attention to very narrow sets of lottery pairs. In the tests of Section 4, I used (in all) 16 lottery pairs from Hey’s data, but that data set has choices from 92 distinct pairs. There is an obvious danger associated with focusing attention only on sets of pairs where specifications deliver crisp predictions: We could miss the fact (if it is a fact) that some specifications have relatively good explanatory and/or predictive performance across broad sets of pairs,
even if they fail specific tests across narrow sets of pairs. So this question naturally arises: Which stochastic model, when combined with EU or RDEU, actually explains and predicts binary choice under risk best, in an overall sense – that is across a broad collection of pairs? In this section, I bring together what I regard as some of the best insights and methods, both large and small, for answering such questions. Although my particular combination of these insights and methods is unique, I do not view it as particularly innovative. All I really do here is combine and extend contributions made by many others. Readers can usefully view what I do here as an elaboration of Loomes et al.’s (2002) approach that allows for more pair contexts, more kinds of heterogeneity and more stochastic models, though it also calls on certain independent insights, such as those of Carbone (1997) about RP models. In the large, I add an emphasis on prediction as opposed to explanation – an emphasis that certainly precedes me (Busemeyer & Wang, 2000). There are things I will not do here. This chapter is a drama about stochastic models; the structures are ‘‘bit parts’’ in this play. My strategy is to write down and estimate specifications so that they all depend on equal numbers of parameters, conditional on their structure. The question I then focus on is this: Holding structure constant (i.e., given an EU or RDEU structure), which stochastic model performs best? With numbers of parameters deliberately equalized across stochastic models, holding structure constant, this question can then be answered without taking a position on the value of parsimony. Others will decide whether they are willing, for instance, to pay the extra parameters required to get the extra fit (if indeed there is any) of RDEU over EU: My rhetorical pose is that this does not interest me. 
Yet as will be clear later, this is more than a pose: The data may tell us that stochastic models are more consequential than structures, and this appears to be the case in prediction. Recently, finite mixture models have appeared, in which a population is viewed as a mixture of two or more specifications. For instance, Harrison and Rutström (2005) consider a ‘‘wedding’’ of EU and cumulative prospect theory, and Conte, Hey, and Moffatt (2007) later considered an alternative marriage of EU and RDEU. In both cases, the population is viewed as composed of some fraction f of one specification, and a fraction 1 − f of another. The fraction f then becomes an extra parameter to estimate. Without prejudice, I do not pursue this kind of heterogeneity here; consult Harrison and Rutström (2008 – this volume) for an example and discussion.
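The mixture likelihood just described can be sketched as follows; the component choice probabilities and data below are hypothetical stand-ins, not estimates from either cited study.

```python
import math

def mixture_log_likelihood(f, p_A, p_B, choices):
    """Log likelihood when a fraction f of the population follows specification A
    and a fraction 1 - f follows specification B.
    p_A[m], p_B[m]: each specification's probability of the safe choice on pair m.
    choices: one list of 0/1 safe-choice indicators per subject."""
    ll = 0.0
    for subject in choices:
        # A subject is type A with probability f and type B with probability
        # 1 - f, so the subject-level likelihoods are mixed, not the
        # choice-level probabilities.
        like_A = math.prod(p if y else 1.0 - p for y, p in zip(subject, p_A))
        like_B = math.prod(p if y else 1.0 - p for y, p in zip(subject, p_B))
        ll += math.log(f * like_A + (1.0 - f) * like_B)
    return ll

# Hypothetical probabilities and data: two pairs, three subjects.
p_A, p_B = [0.8, 0.7], [0.3, 0.4]
data = [[1, 1], [0, 0], [1, 0]]
print(mixture_log_likelihood(0.5, p_A, p_B, data))
```

Estimating f by maximum likelihood then means maximizing this function jointly over f and the parameters generating p_A and p_B.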
5.1. The Data and its Special Features

To compare models in an overall way, we need a suitable data set. Recall that strong utility and contextual utility are observationally identical on any one context. Therefore, data from any experiment where no subject makes choices from pairs on two or more contexts, such as Loomes and Sugden’s (1998) data, are not suitable: Such data cannot distinguish between strong and contextual utility. The experiment of Hey and Orme (1994), hereafter HO, is suitable since all subjects make choices from pairs on four distinct contexts. However, Section 4.5 showed that the tractable parametric version of random preference RDEU can only be extended across three of those contexts. Therefore, I confine attention to those three contexts: ‘‘The HO data’’ henceforth means the 150 choices Hey and Orme’s 80 subjects made from pairs on the contexts (0,£10,£20), (0,£10,£30), and (£10,£20,£30). As in Section 4.5, these contexts are denoted (0,1,2), (0,1,3), and (1,2,3), and indexed by their omitted outcome as 3, 2, and 0, respectively.

The HO design has another relatively nice feature. In Section 4.5, φ_m was defined as that coefficient of relative risk aversion that would produce indifference between the lotteries in basic pair m under the EU structure. Let φ_max and φ_min be the maximum and minimum values, respectively, of φ_m across the pairs used in some experiment. We can call [φ_min, φ_max] the identifying range of the experiment, since the experiment’s pairs cannot identify coefficients of relative risk aversion falling outside this range. A big identifying range is desirable if we suspect that the distribution of φ may have substantial tails in a sampled population.22 In Loomes and Sugden (1998), the identifying range is [0.32,0.68] for subjects choosing from pairs on the context (0,£10,£20), and [0.17,0.74] for subjects choosing from pairs on the context (0,£10,£30). For Harrison and Rutström’s (2005) subjects who make choices from ‘‘gain only’’ pairs on contexts formed from the outcomes (0,$5,$10,$15), the identifying range [0.15,2.05] is substantially broader. In HO’s design, we have a still broader identifying range of [0.71,2.87]. So the HO data is relatively attractive in this sense.

The HO experiment allowed subjects to express indifference between lotteries. HO model this with an added ‘‘threshold of discrimination’’ parameter within a strong utility model. An alternative parameter-free approach, and the one I take here, treats indifference in the manner suggested by decision theory, where the indifference relation S_m ∼^n R_m is defined as the intersection of two weak preference relations, that is, S_m ≿^n R_m and R_m ≿^n S_m. This suggests treating indifference responses as two responses in likelihood functions – one of S_m being chosen from pair m, and another of R_m
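The quantity φ_m can be computed numerically for any pair. The sketch below assumes a conventional CRRA form u(x) = x^(1−φ)/(1−φ) and a hypothetical pair (not one drawn from any of the cited designs), finding the indifference point by bisection; an experiment's identifying range is then just the minimum and maximum of these values across its pairs.

```python
import math

def crra_u(x, phi):
    """CRRA utility of a positive outcome x; log utility at phi = 1."""
    if abs(phi - 1.0) < 1e-12:
        return math.log(x)
    return x ** (1.0 - phi) / (1.0 - phi)

def eu_diff(safe, risky, phi):
    """EU(S) - EU(R), lotteries given as lists of (probability, outcome)."""
    eu = lambda lot: sum(p * crra_u(x, phi) for p, x in lot)
    return eu(safe) - eu(risky)

def indifference_phi(safe, risky, lo=-5.0, hi=5.0):
    """Bisection for phi_m, the CRRA coefficient at which EU(S) = EU(R).
    Assumes eu_diff changes sign on [lo, hi]."""
    f_lo = eu_diff(safe, risky, lo)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if f_lo * eu_diff(safe, risky, mid) <= 0.0:
            hi = mid
        else:
            lo = mid
            f_lo = eu_diff(safe, risky, lo)
    return 0.5 * (lo + hi)

# Hypothetical pair on a (10, 20, 30)-style context: 18 for sure versus a
# 50/50 gamble over 10 and 30.
phi_m = indifference_phi([(1.0, 18.0)], [(0.5, 10.0), (0.5, 30.0)])
print(phi_m)
```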
being chosen from pair m – but dividing that total log likelihood by two, since it is really based on just one independent observation. Formally, the definite choice of S_m by subject n adds ln(P_m^n) to the total log likelihood; the definite choice of R_m adds ln(1 − P_m^n) to that total; and indifference adds [ln(P_m^n) + ln(1 − P_m^n)]/2 to that total. See also Papke and Wooldridge (1996) and Andersen, Harrison, Lau, and Rutström (2008) for related justifications of this approach. The HO experiment contains no FOSD pairs; therefore, we need no special specification to account for low rates of violation of transparent FOSD. However, Moffatt and Peters (2001) found significant evidence of nonzero tremble probabilities using the HO data, so I nevertheless add a tremble probability to all specifications after the manner of Eq. (6). Using Hey’s (2001) still larger data set, which contains 125 observations on each of four contexts for each subject, I have estimated tremble probabilities ω^n separately on all four contexts, for each subject. This estimation reveals no significant correlation of these subject-specific estimates of ω^n across contexts, suggesting that there is no reliable between-subjects variance in tremble probabilities – that is, that ω^n = ω for all n in the sampled population – and I will henceforth assume that this is true in all cases. Under this assumption, likelihood functions are in all instances built from probabilities that contain this invariant tremble probability, P*_m^n = (1 − ω)P_m^n + ω/2 for all n. The discussion here concentrates wholly on specifications of P_m^n and its distribution in the sampled population.

5.2. Two Kinds of Comparisons: In-Sample Versus Out-of-Sample Fit

I compare the performance of specifications in two ways.
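Concretely, the indifference and tremble conventions just described amount to the following likelihood contributions (a minimal sketch; the function names are illustrative):

```python
import math

def tremble_prob(p, omega):
    """Tremble-adjusted choice probability, after the manner of Eq. (6):
    with probability omega the subject chooses at random between the pair."""
    return (1.0 - omega) * p + omega / 2.0

def ll_contribution(response, p_star):
    """Log-likelihood contribution of one response on pair m, where p_star is
    the (tremble-adjusted) probability of choosing the safe lottery S_m.
    'S' = safe chosen, 'R' = risky chosen, 'I' = indifference expressed."""
    if response == 'S':
        return math.log(p_star)
    if response == 'R':
        return math.log(1.0 - p_star)
    # Indifference counts as both responses, but divided by two because it
    # rests on just one independent observation.
    return (math.log(p_star) + math.log(1.0 - p_star)) / 2.0

p = tremble_prob(0.75, 0.04)   # underlying P = 0.75, tremble probability 0.04
print(ll_contribution('I', p))
```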
The first way (very common in this literature) is the ‘‘in-sample fit comparison.’’ Parameters are estimated for each specification by maximum likelihood, using choice data from all three of the HO contexts (0, 2, and 3), and the resulting log likelihoods of the specifications for all three contexts are compared. The second way, which is rare in this literature but well-known generally, compares the ‘‘out-of-sample’’ fit of specifications – that is, their ability to predict choices on pairs that are not used in estimation. For these comparisons, parameters are again estimated for each specification by maximum likelihood, but using only choice data from the two HO contexts 2 and 3, that is, contexts (0,1,3) and (0,1,2). These estimated parameters are then used to predict
choice probabilities and calculate the log likelihood of observed choices on HO context 0, that is, context (1,2,3), for each specification. This is something more than a simple out-of-sample prediction, which could simply be a prediction to new choices made from pairs on the same contexts used for estimation – what Busemeyer and Wang (2000) call ‘‘cross-validation’’: It is additionally an ‘‘out-of-context’’ prediction, which Busemeyer and Wang call ‘‘generalization.’’ This particular kind of out-of-sample fit comparison may be quite difficult in the HO data. Relatively safe choices are the norm for pairs on the contexts 2 and 3 of the HO data: The mean proportion of safe choices made by HO subjects in these contexts is 0.764, and at the individual level this proportion exceeds ½ for 70 of the 80 subjects. But relatively risky choices are the norm for pairs on the context 0 of the HO data: The mean proportion of safe choices there is just 0.379, and falls short of ½ for 58 of the 80 subjects. Out-of-sample prediction will be difficult: From largely safe choices in the ‘‘estimation contexts’’ 2 and 3, specifications need to predict largely risky choices in the ‘‘prediction context’’ 0.
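In code, the distinction between the two comparisons is just a matter of which choices enter the estimation and which enter the scored likelihood. A minimal sketch with a hypothetical one-parameter model and made-up data, standing in for the actual specifications and the HO pairs:

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def log_lik(theta, pairs):
    """pairs: list of (z_m, y_m), with z_m a pair-level covariate and y_m the
    observed safe choice (1) or risky choice (0); this one-parameter logistic
    model is a hypothetical stand-in for the specifications in the text."""
    total = 0.0
    for z, y in pairs:
        p = logistic(theta * z)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

# Made-up pairs grouped by context: estimate on the 'estimation contexts',
# then score the fitted parameter on the held-out 'prediction context'.
estimation_pairs = [(1.0, 1), (0.5, 1), (1.5, 1), (0.8, 0)]   # contexts 2 and 3
prediction_pairs = [(-0.5, 0), (-1.0, 0), (-0.2, 1)]          # context 0

grid = [i / 100.0 for i in range(-300, 301)]
theta_hat = max(grid, key=lambda t: log_lik(t, estimation_pairs))
in_sample_ll = log_lik(theta_hat, estimation_pairs)      # in-sample fit
out_of_sample_ll = log_lik(theta_hat, prediction_pairs)  # out-of-context fit
print(theta_hat, in_sample_ll, out_of_sample_ll)
```

Cross-validation would instead hold out new choices from the estimation contexts themselves; generalization, as here, scores a context never seen in estimation.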
5.3. Choosing an Approach to the Utility of Money

The apparent switch in the balance of safe choices across contexts has its counterpart in Hey and Orme’s (1994) estimation results. HO estimate a variety of structures combined with strong utility, and estimate these specifications individually – that is, each structure is estimated separately for each subject n, using strong utility as the stochastic model. Additionally, for all structures that specify a utility function on outcomes, HO take a nonparametric approach to the utility function. Given the latent variable form of strong utility models and the affine transformation property of the utility of money, just three of the five potential parameters λ^n, u_0^n, u_1^n, u_2^n, and u_3^n are identified. HO set u_0^n = 0 and λ^n = 1, and estimate u_1^n, u_2^n, and u_3^n directly. This allows the utility function to take arbitrary shapes across the outcome vector (0,1,2,3). HO found that estimated utility functions overwhelmingly fall into two classes: Concave utility functions, and inflected utility functions that are concave on the context (0,1,2) but convex on the context (1,2,3). The latter class is quite common, accounting for 30–40% of subjects (depending on the structure estimated). Because of this, I follow HO and avoid simple parametric functional forms (such as CARA or CRRA) that force concavity or convexity across the entire outcome vector (0,1,2,3), instead adopting their nonparametric
treatment of utility functions in strong, strict, contextual, and WV specifications – that is, non-RP specifications. This seems especially advisable here, where the focus is on the performance of the stochastic models. However, I set u_0^n = 0 and u_1^n = 1 for all subjects n, and view λ^n, u_2^n, and u_3^n as the individual parameters of interest in non-RP specifications, in keeping with the parameterization conventions of this chapter. The similar move in the case of RP specifications is allowing independent draws of the gamma variates g_1^n and g_2^n that determine the distribution of subject n’s random utilities u_2^n and u_3^n; for this purpose we view the shape parameters f_1^n and f_2^n, and the scale parameter k^n, as the individual parameters of interest. This also allows for both the concave and inflected shapes of mean utility functions across subjects, as reported by Hey and Orme (1994).
5.4. Allowing for Heterogeneity

One of my themes has been that aggressive aggregation can destroy or distort specification predictions and properties at the level of individuals when subjects in fact differ, as illustrated earlier in Sections 3.1.2 and 4.1.3. Therefore, it seems prudent to allow for heterogeneity in econometric comparisons of specifications. There are several different ways to approach heterogeneity. Perhaps the most obvious way is to treat every subject separately, estimating parameters of specifications separately for every subject: Call this individual estimation. This approach has much to recommend it in principle, and admirable exemplars both in economics (Hey & Orme, 1994; Hey, 2001) and psychology (Tversky, 1969). If individual subject samples were ‘‘large’’ in the sense that they were big enough for asymptotic properties to approximately hold true with individual estimation, there would perhaps be nothing left to say. Many would say that in this case, individual estimation dominates any alternative for the purpose of evaluating stochastic models of individual behavior. In the HO data, we have 150 observations per subject. Is this sample size ‘‘large’’ in the aforementioned sense? Each discrete choice carries very little information about any hypothesized continuous latent construct we wish to estimate, such as parameters of a V-distance or the precision parameter λ. Additionally, estimating k parameters of a nonlinear function is very different from estimating effects of k orthogonal regressors, such as k independently varied treatment variables. This is because the first derivatives of a nonlinear function with
respect to parameters, which play the mathematical role of regressors in nonlinear estimation, are usually correlated with one another (as orthogonal treatment indicators are not). For both these reasons (because our data is discrete and our specifications are nonlinear) estimation of specifications of discrete choice under risk is potentially a very data-hungry enterprise – much more so than intuition might suggest. In Wilcox (2007b), Monte Carlo simulations suggest that for the purpose of in-sample comparisons of the fit of different stochastic models, the HO data is indeed ‘‘large’’ in the aforementioned sense. For instance, consider the 100 HO data set observations of choices on the contexts (0,1,2) and (0,1,3). Monte Carlo methods allow us to create simulated data sets that resemble this real data set, except that a particular specification can be made the ‘‘true’’ specification or ‘‘data-generating process’’ in the simulated data sets. We can estimate both the true specification and other specifications on such simulated data, and see whether log likelihood comparisons correctly choose the true specification. In fact, this seems to be the case in most such simulated data sets with individual estimation when the fit comparison is confined to the same choice data used for estimation – that is, for in-sample fit comparisons. This nice result does not hold for out-of-sample comparisons. We can also create simulated data sets of 150 choices on the three contexts (0,1,2), (0,1,3), and (1,2,3), where again we know the true specification or data-generating process. We can again perform individual estimation using the data on the contexts (0,1,2) and (0,1,3), but now use those estimates to predict choices and compute out-of-sample log likelihoods for the choices from pairs on the context (1,2,3). We can then see whether comparisons of these out-of-sample log likelihoods correctly choose the true specification. 
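The model-recovery logic of such a Monte Carlo study can be sketched generically: simulate data from a known ‘‘true’’ family, fit rival families by maximum likelihood, and count how often in-sample comparisons pick the truth. The two one-parameter families below are hypothetical stand-ins, not the chapter's specifications.

```python
import math, random

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def simulate_choices(z_values, theta, rng):
    """Simulate 0/1 choices from the 'true' family A: P(safe) = logistic(theta*z)."""
    return [(z, 1 if rng.random() < logistic(theta * z) else 0) for z in z_values]

def max_log_lik(prob_fn, data):
    """Crude grid-search maximum likelihood for a one-parameter family."""
    def ll(theta):
        return sum(math.log(prob_fn(theta, z)) if y else math.log(1.0 - prob_fn(theta, z))
                   for z, y in data)
    return max(ll(i / 50.0) for i in range(-150, 151))

family_A = lambda theta, z: logistic(theta * z)                         # true family
family_B = lambda theta, z: logistic(theta * (1.0 if z > 0 else -1.0))  # rival family

rng = random.Random(0)
z_values = [((-1) ** i) * (0.2 + 0.1 * (i % 7)) for i in range(150)]  # 150 choices per set
wins = 0
for _ in range(20):
    data = simulate_choices(z_values, 1.0, rng)
    if max_log_lik(family_A, data) > max_log_lik(family_B, data):
        wins += 1
print(f"family A (the truth) wins {wins} of 20 in-sample comparisons")
```

The text's point is that, while such in-sample comparisons work reasonably well at the HO sample size, the analogous out-of-sample comparison can fail badly under individual estimation.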
It turns out that this procedure produces an extreme bias favoring strong utility models. For instance, Wilcox (2007b) reports a Monte Carlo simulation in which (EU,RP) is the true specification. Out-of-sample log likelihood comparisons using individual estimation never correctly identify RP as the true stochastic model, and in half of these samples strong utility is incorrectly identified as fitting significantly better than RP out-of-sample. Similar results hold when contextual utility is the true stochastic model. This seems inescapable: For the purpose of out-of-sample prediction based on individual estimation, the HO data set is not ‘‘large’’ in the aforementioned sense. Or, put differently, individual estimation suffers from a powerful finite sample bias when it comes to out-of-sample prediction as a method of evaluating alternative stochastic models.
Individual estimation is not, therefore, a suitable treatment of heterogeneity in samples of the HO size, when the purpose of the estimation is out-of-sample prediction and comparison of specifications. The alternative for the HO data set is random parameters estimation, which is illustrated well by Loomes et al. (2002) and Moffatt (2005), and this is the method I will use here. There are 80 subjects in the HO data set, which is half again as many as the 53 subjects in Hey (2001). This larger cross-sectional size in the HO data is better for the purpose of random parameters estimation. Unfortunately, the HO data does not contain any covariates: It can also be helpful to condition structural or stochastic parameters (or their distributions in random parameters estimations) on covariates such as demographic variables, cognitive ability measures, and/or personality scales. Readers should consult Harrison and Rutström (2008 – this volume) to see examples of this with demographic variables. Although it is tempting to view conditioning on covariates and random parameters estimation as substitutes, I think that ideally one would want to do both at once. Surely only a part of the valid (i.e., stable, repeatable, and reliable) cross-sectional variance in risk parameters is explainable by easily observed covariates. Therefore, attempting to account for heterogeneity solely by conditioning on them will surely leave a potential for residual aggregation biases associated with what they miss. Correspondingly, surely every random parameters estimation is based on distributional assumptions that are at best only approximately true: If the approximation is poor, then that estimation will also suffer from bias. When we do both at once, the covariates will ease some of the inferential burden borne by distributional assumptions of the random parameters, and the random parameters will catch much of the variance missed by the covariates.
In the end, the only reason I do not condition my estimations on covariates here is because the HO data set does not have any. But we should also remember that as we add either covariates or distributional parameters or both, we are burning precious degrees of freedom. The truth is that we need data sets that are bigger in almost every conceivable dimension: More subjects, more pairs, more repetitions, and more covariates.

5.4.1. General Framework for Random Parameters Estimation

Let s denote a particular specification. Suppose that s is the ‘‘true data-generating process’’ or DGP for all subject types in the sampled population. Let ψ_s = (β_s, α_s) denote a vector of parameters governing choice from pairs for specification s. Here, β_s is the structural parameter vector of specification s: It contains utilities of outcomes u_2 and u_3 whenever s is a non-RP specification, and also the weighting function parameter γ whenever s is an RDEU specification. The vector α_s is the stochastic parameter vector of
specification s, which governs the shape and/or variance of distributions determining choice probabilities: This is λ in non-RP specifications, and is the vector (f_1, f_2, k) in RP specifications. Let J_s(ψ_s | θ_s) denote the joint c.d.f. governing the distribution of ψ_s in the population from which subjects are sampled, where θ_s are parameters governing J_s. Notice that we are now thinking of a subject type ψ as a parameter vector, and we are thinking of this vector as following some joint distribution J_s in the sampled population; that distribution’s shape is governed by another vector of parameters θ_s. Let θ*_s be the true value of θ_s in that population. We want an estimate of θ*_s: This is what is meant by random parameters estimation. Sensible random parameters estimation requires a reasonable and tractable form for J_s that arguably characterizes the main features of the joint distribution of parameter vectors ψ_s in the sample. The approach I take to choosing J_s is empirical: In essence, exploratory individual estimations produce some rough facts about the ‘‘look’’ of the distribution of vectors ψ_s in the sample, under the null of specification s, in the form of correlations and a first principal component. The idea is to build a distribution from independent standard normal variates that captures the most salient features of that ‘‘look.’’ The best way to explain the approach, I believe, is by a detailed example using the most prosaic specification – the (EU,Strong) specification. Appendix E outlines the approach more generally, and provides the exact random parameters form used for all specifications. Obviously, judgment enters into this approach, and the judgments could be very different from different perspectives.
For instance, self-selection plays some role in the composition of our laboratory samples, and this may quite literally shape ‘‘the sampled population’’ in real and important ways.23 Moreover, each of the ten specifications could, to some extent, produce very different looking rough facts. Fortunately, this does not seem to be an issue: There is a surprising degree of similarity between the rough distributional facts that emerge from individual estimations of the different specifications, and this is noted in Appendix E. This is good, because it allows the form of the distribution J_s to be very similar across specifications (and it is, as elaborated in Appendix E).

5.4.2. The Random Parameters Approach for EU with Strong Utility: An Illustration

At the level of an individual subject, without trembles and suppressing the subject superscript n, the (EU,Strong) model is

P_m = Λ(λ[(s_mj − t_mj)u_j + (s_mk − t_mk)u_k + (s_ml − t_ml)u_l])   (42)
where Λ(x) = [1 + exp(−x)]^(−1) is the logistic c.d.f. (which will be consistently employed as the function H(x) for strong, strict, contextual, and WV models). In terms of the two underlying utility parameters u_2 and u_3 to be estimated, the utilities (u_j, u_k, u_l) in Eq. (42) are

(u_j, u_k, u_l) = (1, u_2, u_3) for pairs m on context c = 0, that is, (1,2,3);
(u_j, u_k, u_l) = (0, 1, u_3) for pairs m on context c = 2, that is, (0,1,3); and
(u_j, u_k, u_l) = (0, 1, u_2) for pairs m on context c = 3, that is, (0,1,2).   (43)
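Eqs. (42) and (43) translate directly into code. The sketch below is a minimal implementation of the individual-level (EU,Strong) choice probability; the pair, parameter values, and function names are illustrative only.

```python
import math

def logistic_cdf(x):
    """Lambda(x) = 1/(1 + exp(-x)), the logistic c.d.f. in Eq. (42)."""
    return 1.0 / (1.0 + math.exp(-x))

def utilities_on_context(context, u2, u3):
    """Eq. (43): the utilities (u_j, u_k, u_l) on each context, with the
    normalizations u(0) = 0 and u(£10) = 1 built in."""
    if context == 0:   # outcomes (£10, £20, £30)
        return (1.0, u2, u3)
    if context == 2:   # outcomes (0, £10, £30)
        return (0.0, 1.0, u3)
    if context == 3:   # outcomes (0, £10, £20)
        return (0.0, 1.0, u2)
    raise ValueError("context must be 0, 2, or 3")

def p_safe(s, t, context, u2, u3, lam):
    """Eq. (42): probability that the safe lottery S_m is chosen over R_m.
    s and t are the probability triples of S_m and R_m on the context's outcomes."""
    uj, uk, ul = utilities_on_context(context, u2, u3)
    eu_diff = (s[0] - t[0]) * uj + (s[1] - t[1]) * uk + (s[2] - t[2]) * ul
    return logistic_cdf(lam * eu_diff)

# Hypothetical pair on context 3 = (0, £10, £20): £10 for sure versus a 50/50
# chance of 0 or £20, with illustrative parameter values.
print(p_safe((0.0, 1.0, 0.0), (0.5, 0.0, 0.5), context=3, u2=1.8, u3=2.5, lam=3.0))
# about 0.574
```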
I begin by estimating the parameters of a simplified version of Eq. (42) individually, for 68 of HO’s 80 subjects,24 using all 150 observations of choices on the contexts 0, 2, and 3 combined. This initial subject-by-subject estimation gives a rough impression of the look of the joint distribution of parameter vectors ψ = (u_2, u_3, λ) across subjects, and of how we might choose J(ψ|θ) to represent that distribution. At this initial step, ω is not estimated, but rather assumed constant across subjects and equal to 0.04.25 Estimation of ω is undertaken later in the random parameters estimation. Therefore, I begin by estimating u_2, u_3, and λ, using the choice probabilities

P*_m = (1 − ω)P_m + ω/2
     = 0.96Λ(λ[(s_mj − t_mj)u_j + (s_mk − t_mk)u_k + (s_ml − t_ml)u_l]) + 0.02   (44)

for each subject (temporarily fixing ω at 0.04 for each subject). The log likelihood function for subject n is

LL^n(u_2, u_3, λ) = Σ_m [y_m^n ln(P*_m) + (1 − y_m^n) ln(1 − P*_m)]   (45)

with the specification of P*_m in Eq. (44). This is maximized for each subject n, yielding initial estimates ψ̃^n = (ũ_2^n, ũ_3^n, λ̃^n) for each subject n. Fig. 3 graphs ln(ũ_2^n − 1), ln(ũ_3^n − 1), and ln(λ̃^n) against their first principal component, which accounts for about 69% of their collective variance.26 The figure also shows regression lines on the first principal component. The Pearson correlation between ln(ũ_2^n − 1) and ln(ũ_3^n − 1) is fairly high (0.848). Given that these are both estimates and hence contain some pure sampling error, it appears that an assumption of perfect correlation between them in the underlying population may not do too much violence to truth. Therefore, I make this assumption about the joint distribution of ψ in the population. While ln(λ̃^n) does appear to share limited variance with ln(ũ_2^n − 1) and ln(ũ_3^n − 1) (Pearson correlations of 0.22 and 0.45, respectively), it obviously either has independent variance of its own or is estimated with relatively low precision.

[Fig. 3. Shared Variance of Initial Individual Parameter Estimates in the (EU,Strong) Model. The figure plots ln(λ̃^n), ln(ũ_2^n − 1), and ln(ũ_3^n − 1) against their first principal component of variance, with regression lines on that component.]

These observations suggest modeling the joint distribution J(ψ|θ) of ψ = (u_2, u_3, λ, ω) as being generated by two independent standard normal deviates x_u and x_λ, as follows:

u_2(x_u; θ) = 1 + exp(a_2 + b_2 x_u);
u_3(x_u; θ) = 1 + exp(a_3 + b_3 x_u);
λ(x_u, x_λ; θ) = exp(a_λ + b_λ x_u + c_λ x_λ); and ω a constant,
where (θ, ω) = (a_2, b_2, a_3, b_3, a_λ, b_λ, c_λ, ω) are parameters to be estimated.   (46)

In essence, Eq. (46) characterizes the sampled population as having two heterogeneous dimensions ‘‘indexed’’ by two independent normal variates. The first variate x_u can be thought of as the first principal component of the vector ψ = (u_2, u_3, λ), mainly associated with heterogeneity of utility functions, while the second variate x_λ captures a second dimension of heterogeneity mainly associated with precision. The term b_λ x_u in the
equation for precision, however, allows for a relationship between utility functions and precision in the sampled population, by allowing precision to partake of some of the first principal component of variance. The (EU,Strong) specification, conditional on x_u, x_λ, and (θ, ω), then becomes

P*_m(x_u, x_λ; θ, ω) = (1 − ω)Λ(λ(x_u, x_λ; θ)[(s_mj − t_mj)u_j + (s_mk − t_mk)u_k + (s_ml − t_ml)u_l]) + ω/2,
where u_j = 1 for pairs m on context c = 0; u_j = 0 otherwise;
u_k = u_2(x_u; θ) for pairs m on context c = 0; u_k = 1 otherwise; and
u_l = u_2(x_u; θ) for pairs m on context c = 3; u_l = u_3(x_u; θ) otherwise.   (47)

Now, we estimate θ and ω by maximizing this random parameters log likelihood function in these parameters:

LL(θ, ω) = Σ_n ln ∫∫ Π_m [P*_m(x_u, x_λ; θ, ω)]^(y_m^n) [1 − P*_m(x_u, x_λ; θ, ω)]^(1 − y_m^n) dF(x_u) dF(x_λ)   (48)

where F is the standard normal c.d.f. and P*_m(x_u, x_λ; θ, ω) is as given in Eq. (47).27 The integrations in Eq. (48) take account of how heterogeneity in the sampled population modifies population choice proportions in ways that do not necessarily match the individual level properties of the specification. This is how the random parameters approach accounts for the potentially confounding effects of heterogeneity discussed earlier. The estimation problem is recast as choosing parameters θ that govern the distribution of the specification’s parameter vector (u_2, u_3, λ) in the population, rather than choosing a specific pooled ‘‘representative decision maker’’ parameter vector or individual vectors for each decision maker. Maximizing expressions like Eq. (48) can be difficult, but fortunately the linear regression lines in Fig. 3 may provide reasonable starting values for the parameter vector θ. That is, initial estimates of the a and b coefficients in θ are the intercepts and slopes from the linear regressions of ln(ũ_2^n − 1), ln(ũ_3^n − 1), and ln(λ̃^n) on their first principal component; and the root mean squared error of the regression of ln(λ̃^n) on the first principal component provides an initial estimate of c_λ. Table 6 shows the results of maximizing Eq. (48) in (θ, ω). As can be seen, the initial parameter estimates are good starting values, though some final estimates are significantly different from the initial estimates (judging from
Table 6. Random Parameters Estimates of the (EU,Strong) Model, Using Choice Data from the Contexts (0,1,2), (0,1,3), and (1,2,3) of the Hey and Orme (1994) Sample.

Structural and Stochastic Parameter Models | Distributional Parameter | Initial Estimate | Final Estimate | Asymptotic Standard Error | Asymptotic t-statistic
u_2 = 1 + exp(a_2 + b_2 x_u) | a_2 | −1.2 | −1.28 | 0.0411 | −31.0
u_2 = 1 + exp(a_2 + b_2 x_u) | b_2 | 0.57 | 0.514 | 0.0311 | 16.5
u_3 = 1 + exp(a_3 + b_3 x_u) | a_3 | 0.51 | 0.653 | 0.0329 | 16.9
u_3 = 1 + exp(a_3 + b_3 x_u) | b_3 | 0.63 | 0.657 | 0.0316 | 20.8
λ = exp(a_λ + b_λ x_u + c_λ x_λ) | a_λ | 3.2 | 3.39 | 0.101 | 33.8
λ = exp(a_λ + b_λ x_u + c_λ x_λ) | b_λ | 0.49 | 0.658 | 0.124 | 5.32
λ = exp(a_λ + b_λ x_u + c_λ x_λ) | c_λ | 0.66 | 0.584 | 0.0571 | 10.2
ω constant | ω | 0.04 | 0.0446 | 0.0105 | 4.26

Log likelihood = −5311.44

Notes: x_u and x_λ are independent standard normal variates. Standard errors are calculated using the ‘‘sandwich estimator’’ (Wooldridge, 2002) and treating all of each subject’s choices as a single ‘‘super-observation,’’ that is, using degrees of freedom equal to the number of subjects rather than the number of subjects times the number of choices made.
the asymptotic standard errors of the final estimates). These estimates produce the log likelihood in the first column of the top row of Table 7, to be discussed shortly. Note that wherever b̃_2 ≠ b̃_3, sufficiently large or small values of the underlying standard normal deviate x_u imply a violation of monotonicity (i.e., u_2 > u_3). Rather than imposing b_2 = b_3 as a constraint on the estimations, I impose the weaker constraint |(a_2 − a_3)/(b_3 − b_2)| > 4.2649, making the estimated population fraction of such violations no larger than 10^−5. This constraint does not bind for the estimates shown in Table 6. For other non-RP estimations, it rarely binds (and when it does, it is never close to significantly binding). Recall that the nonparametric treatment of the utility of outcomes avoids a fixed risk attitude across the outcome vector (0,1,2,3), as would be implied by a one-parameter parametric form such as CARA or CRRA utility. The estimates shown in Table 6 imply a population in which about 68% of subjects have a weakly concave utility function, while the remaining 32% have an inflected ‘‘concave then convex’’ utility function. This is very similar to the results of Hey and Orme’s (1994) individual estimations: That is, the random parameters estimation used here produces estimated sample heterogeneity of utility function shapes much like that suggested by Hey and
271
Stochastic Models for Binary Discrete Choice Under Risk
Table 7. Log Likelihoods of Random Parameters Characterizations of the Models in the Hey and Orme Sample.

Stochastic Model       Estimated on all Three Contexts:    Estimated on Contexts (0,1,2) and (0,1,3):
                       Log likelihood on all three         Log likelihood on context (1,2,3)
                       contexts (in-sample fit)            (out-of-sample fit)

EU structure
Strong utility         −5311.44                            −2409.38
Strict utility         −5448.50                            −2373.12
Contextual utility     −5297.08                            −2302.55
Wandering vector       −5362.61                            −2417.76
Random preferences     −5348.36                            −2356.60

RDEU structure
Strong utility         −5207.81                            −2394.75
Strict utility         −5306.48                            −2450.41
Contextual utility     −5190.43                            −2281.36
Wandering vector       −5251.82                            −2397.91
Random preferences     −5218.00                            −2335.55
Orme’s individual strong utility estimations. The random parameters treatment of the other specifications is very similar to what has been discussed here in detail for the (EU,Strong) specification, with the necessary changes made; Appendix E shows this in detail.
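The arithmetic behind the |(a2 − a3)/(b3 − b2)| > 4.2649 constraint is easy to check. A hedged sketch (the helper names are mine, and it assumes only that, under the logit-normal utility transforms of Table 6, the u2/u3 ordering reverses for draws of the standard normal xu beyond the crossing point x* = (a2 − a3)/(b3 − b2)):

```python
import math

def normal_tail(z):
    """P(X > z) for a standard normal X."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

def violation_fraction(a2, b2, a3, b3):
    """Fraction of standard normal draws x_u lying beyond the crossing
    point x* = (a2 - a3)/(b3 - b2), where the u2/u3 ordering reverses."""
    x_star = (a2 - a3) / (b3 - b2)
    return normal_tail(abs(x_star))

# 4.2649 is (approximately) the standard normal quantile for a 1e-5 tail,
# so the constraint caps the violating population fraction at about 10^-5:
print(normal_tail(4.2649))
```

The bound binds the tail probability, not the parameter estimates themselves, which is why it almost never constrains the estimations in practice.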
5.5. A Comparison of the Specifications

Table 7 displays both the in-sample and out-of-sample log likelihoods for the ten specifications. The top five rows are the EU specifications, and the bottom five rows are the RDEU specifications; for each structure, the five rows show results for strong utility, strict utility, contextual utility, the WV model and RPs. The first column shows total in-sample log likelihoods, and the second column shows total out-of-sample log likelihoods. Contextual utility always produces the highest log likelihood, whether it is combined with EU or RDEU, and whether we look at in-sample or out-of-sample log likelihoods (though the log likelihood advantage of contextual utility is most pronounced in the out-of-sample comparisons). Buschena and Zilberman (2000) and Loomes et al. (2002) point out that the best-fitting stochastic model may depend on the structure estimated, a very sensible econometric
NATHANIEL T. WILCOX
point, and offer empirical illustrations. Yet in Table 7 contextual utility is the best stochastic model regardless of whether we view the matter from the perspective of the EU or RDEU structures, or from the perspective of in-context or out-of-context fit. Table 7 suggests that the relative consequence of structure and stochastic model depends on whether we examine in-sample or out-of-sample fit. Consider first the in-sample fit column. Holding stochastic models constant, the maximum improvement in log likelihood associated with moving from EU to RDEU is 142.02 (with strict utility), and the improvement is 106.64 for the best-fitting stochastic model (contextual utility). Holding structures constant instead, the maximum improvement in log likelihood associated with changing the stochastic model is 151.48 (with the EU structure, switching from strict to contextual utility), but this is atypical: Omitting strict utility specifications, which have unusually poor in-sample fit, the maximum improvement is 65.53 (with the EU structure, switching from the WV model to contextual utility). Therefore, except for the especially poor strict utility fits, in-sample comparisons make stochastic models appear to be a sideshow relative to choices of structure. This appearance is reversed when we look at out-of-sample comparisons – that is, predictive power. Looking now at the out-of-sample fit column, notice first that under strict utility, RDEU actually fits worse than EU does. But strict utility is an unusually poor performer overall, so perhaps we should set it aside. Among the remaining four stochastic models, the maximum out-of-sample fit improvement associated with switching from EU to RDEU is 21.19 (for contextual utility). Holding structures constant instead, the maximum out-of-sample fit difference between the stochastic models (again omitting strict utility) is 116.55 (for the RDEU structure, switching from the WV model to contextual utility). 
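The comparisons in the last two paragraphs can be reproduced mechanically from the Table 7 entries; a throwaway sketch (log likelihoods written with their usual negative signs):

```python
# Table 7 log likelihoods, keyed by stochastic model within each structure.
models = ["strong", "strict", "contextual", "wandering vector", "random preferences"]
in_sample = {
    "EU":   dict(zip(models, [-5311.44, -5448.50, -5297.08, -5362.61, -5348.36])),
    "RDEU": dict(zip(models, [-5207.81, -5306.48, -5190.43, -5251.82, -5218.00])),
}
out_of_sample = {
    "EU":   dict(zip(models, [-2409.38, -2373.12, -2302.55, -2417.76, -2356.60])),
    "RDEU": dict(zip(models, [-2394.75, -2450.41, -2281.36, -2397.91, -2335.55])),
}

# Contextual utility fits best for every structure, in and out of sample.
for panel in (in_sample, out_of_sample):
    for structure in ("EU", "RDEU"):
        assert max(panel[structure], key=panel[structure].get) == "contextual"

# Two of the EU-to-RDEU improvements quoted in the text:
strict_gain = in_sample["RDEU"]["strict"] - in_sample["EU"]["strict"]
ctx_gain = out_of_sample["RDEU"]["contextual"] - out_of_sample["EU"]["contextual"]
print(round(strict_gain, 2), round(ctx_gain, 2))  # 142.02 21.19
```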
In the realm of out-of-sample prediction, then, structures seem inconsequential relative to the stochastic models. Moreover, it is worth emphasizing that the improvements associated with changing stochastic models ‘‘cost no parameters’’ here since the number of parameters estimated is fixed for a given structure. There has been a tendency toward structural innovation rather than stochastic model innovation over the last quarter century. Perhaps, at least in the realm of prediction, we ought to be paying more attention to stochastic models, as repeatedly urged by Hey (Hey & Orme, 1994; Hey, 2001; Hey, 2005) and suggested by Ballinger and Wilcox (1997). Table 8 reports the results of a more formal comparison between the stochastic models, conditional on each structure. Let D̃ⁿ denote the difference between the estimated log likelihoods (in-sample or out-of-sample)
Table 8. Vuong (1989) Nonnested Comparisons between Fit of Stochastic Model Pairs, In-sample and Out-of-sample Fit.

Estimated on all Three Contexts, and Comparing Fit on all Three Contexts (In-context Fit Comparison)

EU structure         Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 0.981, p = 0.163   z = 0.877, p = 0.190    z = 2.239, p = 0.013    z = 4.352, p < 0.0001
Random preferences   –                      z = −0.44, p = 0.330    z = 1.765, p = 0.0388   z = 3.808, p < 0.0001
Strong utility       –                      –                       z = 2.697, p = 0.0035   z = 5.973, p < 0.0001
Wandering vector     –                      –                       –                       z = 2.700, p = 0.0035

RDEU structure       Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 1.723, p = 0.042   z = 0.703, p = 0.241    z = 2.354, p = 0.0093   z = 6.067, p < 0.0001
Random preferences   –                      z = −1.574, p = 0.058   z = 0.739, p = 0.230    z = 5.419, p < 0.0001
Strong utility       –                      –                       z = 3.236, p = 0.0006   z = 5.961, p < 0.0001
Wandering vector     –                      –                       –                       z = 5.079, p < 0.0001

Estimated on Contexts (0,1,2) and (0,1,3), and Comparing Fit on Context (1,2,3) (Out-of-context Fit Comparison)

EU structure         Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 3.879, p < 0.0001  z = 3.304, p = 0.0005   z = 4.040, p < 0.0001   z = 5.978, p < 0.0001
Random preferences   –                      z = 1.652, p = 0.049    z = 2.073, p = 0.0191   z = 3.831, p < 0.0001
Strong utility       –                      –                       z = 0.261, p = 0.397    z = −3.918, p < 0.0001
Wandering vector     –                      –                       –                       z = −5.695, p < 0.0001

RDEU structure       Random preferences     Strong utility          Wandering vector        Strict utility
Contextual utility   z = 4.387, p < 0.0001  z = 3.044, p = 0.0012   z = 3.509, p = 0.0002   z = 2.739, p = 0.0031
Random preferences   –                      z = 1.639, p = 0.051    z = 2.148, p = 0.016    z = 1.422, p = 0.078
Strong utility       –                      –                       z = 0.965, p = 0.167    z = 0.028, p = 0.49
Wandering vector     –                      –                       –                       z = 0.322, p = 0.37

Notes: Positive z means the row stochastic model fits better than the column stochastic model.
from a pair of specifications, for subject n. Vuong (1989) provides an asymptotic justification for treating a z-score based on the D̃ⁿ as following a normal distribution under the hypothesis that the two non-nested specifications are equally good, in the sense that they are equally close to the true specification (neither specification needs to be the true specification). The statistic is computed as z = (Σₙ D̃ⁿ)/(s̃D √N), where s̃D is the sample standard deviation of the D̃ⁿ across subjects n (calculated without the usual adjustment for a degree of freedom) and N is the number of subjects. Table 8 reports these z-statistics, and associated p-values against the null of equally good fit, with a one-tailed alternative that the directionally better fit is significantly better. While contextual utility is always directionally better than its competitors, no convincingly significant ordering of the stochastic models emerges from the in-sample comparisons in Table 8, though strict utility is clearly significantly worse than the other four stochastic models. Contextual utility shines, though, in the out-of-sample fit comparisons in Table 8, regardless of whether the structure is EU or RDEU, where it beats the other four stochastic models with strong significance. In spite of the problems with individual estimation and prediction discussed in Wilcox (2007b), it is worth remarking on the relative performance of an individual estimation approach. Unsurprisingly, total in-sample fits of specifications with individual estimation are much better than the in-sample fits shown in Table 7. Yet total out-of-sample fits of specifications with individual estimation are uniformly worse than the out-of-sample random parameter fits in Table 7 for all ten specifications. There is, of course, one prosaic reason to expect this.
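A sketch of the statistic (function names are mine; the inputs are assumed to be each subject's total log likelihood under each specification):

```python
import math

def vuong_z(ll_a, ll_b):
    """Vuong (1989) z for two non-nested specifications A and B, given each
    subject's total log likelihood under each specification.  Positive z
    means A fits better; under the null of equally good fit, z is
    asymptotically standard normal."""
    d = [a - b for a, b in zip(ll_a, ll_b)]
    n = len(d)
    mean = sum(d) / n
    # Sample standard deviation without the degrees-of-freedom adjustment.
    sd = math.sqrt(sum((x - mean) ** 2 for x in d) / n)
    return sum(d) / (sd * math.sqrt(n))

def one_tailed_p(z):
    """p-value for the one-tailed alternative that the better fit is better."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
```

Note that the numerator is just the total log-likelihood difference, so the sign of z always matches the direction of the fit difference.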
The random parameter model fits are based on at most 11 parameters (RDEU models) for characterizing the entire sample (the parameters in y), whereas individual model fits are based on many more parameters (RDEU models have five parameters per subject; this gives 400 parameters for 80 subjects). We should be unsurprised that an out-of-sample prediction based on 400 parameter estimates fares worse than one based on 11 parameters: Shrinkage associated with liberal burning of degrees of freedom is to be expected, after all. However, there is some surprise here too. Consider that as an asymptotic matter, the individual estimation fits must be better than the random parameters fits, even if the random parameters characterization of heterogeneity – that is, the specification of the joint distribution function J(c|y) – is exactly right. This is because a random parameters likelihood function takes the expectation of probabilities with respect to J before taking logs, while the individual estimations do not. Since the log likelihood
function is concave in P, Jensen’s inequality implies that asymptotically (i.e., as estimated probabilities converge to true probabilities for both the random parameters and individual estimations) the ‘‘expected log likelihoods’’ of individual estimation must exceed the ‘‘log expected likelihoods’’ of random parameters estimation. That this asymptotic expectation is so clearly reversed for out-of-sample predictions (even though our choice of J, the distribution of parameters in the sampled population, is surely approximate at best) just hammers home how far individual estimations are from large sample consistency, as noted in Wilcox (2007b), even in a sample as ‘‘large’’ as the HO data.
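The direction of this Jensen-type comparison can be illustrated with a deliberately stylized Bernoulli population. This sketch collapses the chapter's mixture over J to a single mixed choice probability, so it is a caricature of the point rather than the chapter's estimator:

```python
import math
import random

random.seed(11)

def expected_ll(p_true, p_model):
    """Expected per-choice log likelihood when choices are generated with
    probability p_true but scored at probability p_model."""
    return p_true * math.log(p_model) + (1.0 - p_true) * math.log(1.0 - p_model)

# A heterogeneous population of true choice probabilities (a stand-in for J).
ps = [random.uniform(0.05, 0.95) for _ in range(10_000)]
p_bar = sum(ps) / len(ps)

# Scoring each subject at her own p (the individual-estimation limit) versus
# scoring everyone at the single mixed probability p_bar.
individual = sum(expected_ll(p, p) for p in ps) / len(ps)
mixed = sum(expected_ll(p, p_bar) for p in ps) / len(ps)

assert individual >= mixed  # Jensen/Gibbs: the truth scores best in expectation
print(individual, mixed)
```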
6. CONCLUSIONS: A BLUNT PERSONAL VIEW

I take two facts as given. First, discrete choice is highly stochastic; and second, people differ a lot. To me, any account of the structure of discrete choice under risk that attempts to skirt these facts is unacceptable. The reasons should now be clear. First, stochastic models spindle, fold, and in general mutilate the properties and predictions of structures, and each stochastic model produces its own distinctive mutilations (see Table 5). Second, aggregation across different types of people further hides, distorts, and in general destroys the individual level properties of specifications – that is, particular combinations of structure and stochastic model. This should be clear from the discussions of the common ratio effect (Section 3.1.2) and how differently individual and aggregate tests of betweenness in spread triples appear (Section 4.1.3). I conclude that the practice of testing decision theories by looking only at sample choice proportions, sample switching rates, sample proportions of predicted versus unpredicted violations, and so on is indefensible. It follows from exactly the same considerations that the common practice of estimating a single pooled ‘‘representative decision maker’’ preference functional is equally indefensible. Stochastic model implications and heterogeneity must be dealt with: They are at least as important in determining sample properties as structures are. When we turn from the realm of theory-testing to the twin realms of estimation and prediction, the case is similar and if anything stronger. It turns out that different stochastic models imply different things about the empirical meaning of estimated risk aversion parameters across people and contexts (Wilcox, 2007a). Some stochastic models (those that are CARA- and CRRA-neutral) identify changes in risk-taking across context shifts as changes in structural risk aversion, while others do not. And as
shown in Table 7, stochastic models appear to have much more to do with successful prediction than structures do, at least in one well-known data set. It is hard to escape the conclusion that decision research could benefit strongly from more work on stochastic models. Structure has been worried to death for a quarter of a century. How much better has this enabled us to predict? If the findings of Table 7 are general, the answer is: ‘‘Not that much, and more effort should have been put into stochastic models.’’ At any rate, it is not clear that improving the prediction fit by 21.19 of log likelihood (switching from EU to RDEU, with contextual utility), at the cost of three extra parameters, should earn any trips to Stockholm. Stochastic models have been unaccountably neglected, and gains in predictive power are likely to come from working on improving them. It will be no surprise that I like my own model, contextual utility, better than the other alternatives I have closely considered here. It predicts best; it makes sense of the ‘‘more risk averse’’ relation across people and contexts; it is CARA- and CRRA-neutral; and (I view this as good, though others dislike this property of the strong/strict/contextual family) it can explain parts of common ratio effects, betweenness violations and other phenomena normally associated with nonlinearity in probability through its form and through heterogeneity, without recourse to probability weighting functions. Yet I would not be hugely surprised if bona fide theorists can do better than contextual utility, and I hope they will try. To aid these theorists, we experimentalists might do more tests of stochastic model properties that hold for broad classes of structures. I have in mind here the varieties of stochastic transitivity and simple scalability: Stochastic model predictions about these properties are the same for all transitive structures, and not simply EU and RDEU.
Replicating and extending the psychologists’ experimental canon on these kinds of properties will help us build a strong base of stochastic facts that are at least general for transitive structures. For instance, contextual utility should obey simple scalability for pairs that share the same context, but should violate it in distinctive ways for pairs on different contexts, much in the manner of the Myers effect discussed by Busemeyer and Townsend (1993). This is true of contextual utility whether the structure is EU, RDEU, or any transitive structure. RPs are attractive to many economists, but they suffer from several problems, not least of which is their intractability across contexts for structures more complex than EU. But I think the really deep problem with RPs is the near impossibility of building an interesting cumulative and general empirical research program about them. This is no problem at all for other models: As discussed above, models like strong, strict, and contextual
utility make distinctive predictions (about stochastic transitivities and simple scalability) that should hold for all transitive structures. But the only RP prediction that is shared by a broad collection of structures is that the probability of an FOSD violation is zero. This prediction is wrong for the ‘‘nontransparent’’ violations discussed by Birnbaum and Navarrete (1998), and I argued here that FOSD properties are relatively weak ones when choosing among stochastic models. Therefore, RP models produce almost no interesting predictions that hold across a large class of structures. So it seems that there can be little accumulation of interesting knowledge about the performance of the RP hypothesis that is applicable across structures: Any particular study tests its predictions with a specific structure, and the predictions are wholly idiosyncratic to that structure. This looks to me like a recipe for little or no cumulative knowledge about the general truth or applicability of the RP hypothesis itself, since there is almost no general prediction that it makes. That problem is not shared by the other stochastic models. Finally, this chapter has been selective. Neither Blavatskyy’s (2007) truncated error model nor Busemeyer and Townsend’s (1993) decision field theory was a part of the contest in Section 5. These are also heteroscedastic models, resembling contextual utility and the WV model in various ways. I do believe that we are witnessing a fertile period for stochastic model innovation now. The likelihood-based ‘‘fit comparison’’ approach taken here and elsewhere is good, but it needs to be complemented by some testing of general predictions that transcend particular functional forms, structures and parameter values. So I will close by urging exactly that. Proponents of models like contextual utility, decision field theory, and truncated error models need to figure out what these models rule out, and not just show what they allow and how well they fit.
The stochastic transitivities and simple scalability properties, or testable modifications of these suited to heteroscedastic models, are the likely places to begin such work.
NOTES

1. Psychologists ask similar questions (see Busemeyer & Townsend, 1993). 2. There could be more than one ‘‘true’’ stochastic model in populations. Without prejudice, I ignore this here. 3. My restriction to lotteries without losses is for expositional clarity and focus; ignoring loss aversion here has no consequences for my main econometric points.
4. Some experiments show a substantial ‘‘drift’’ with repetition toward increasingly safe choices (Hey & Orme, 1994; Loomes & Sugden, 1998) or a small one (Ballinger & Wilcox, 1997). If most subjects are risk averse, decreased random parts of decision making with repetition can explain this (see Loomes et al., 2002 for details). Harrison et al. (2005) find order effects with just two trials. I abstract from these phenomena here. 5. The two structures considered here are transitive ones. A broader definition allowing for both transitive and nontransitive structures is a function D such that D(Sm, Rm|βn) = 0.5 ⇔ Pnm = 0.5. 6. There are alternative stochastic choice models under which this is not innocuous (e.g., Machina, 1985). The evidence on these alternatives is not encouraging, though as yet meager (see Hey & Carbone, 1995). 7. Such evidence (also found in Tversky & Kahneman, 1986) comes from hypothetical design or ‘‘near-hypothetical designs’’ (designs with vanishing likelihoods of decisions actually counting), but my hunch is that we would also see this in an experiment with incentive-compatible mechanisms and more typical likelihoods that the decisions count, though perhaps at a somewhat reduced frequency. 8. The proviso ‘‘binary’’ in this statement is quite important. There are phenomena that violate almost all stochastic models for choice amongst three or more alternatives in choice sets. Perhaps the best-known of these is the ‘‘asymmetrically dominated alternative effect’’ that violates regularity and independence from irrelevant alternatives, as well as Debreu’s (1960) similarity-based exception to the latter (see Huber, Payne, & Puto, 1982). 9. One may also condition on a task order subscript t if, for instance, one believes that trembles become less likely with experience, as in Loomes et al. (2002) and Moffatt (2005). 10.
For simplicity’s sake I assume throughout this chapter that parameter vectors producing indifference in any pair m have zero measure for all n, so that the sets B̄nm and Bnm = {β | V(Sm|β) − V(Rm|β) > 0} have equal measure. However, one may make indifference a positive probability event in various ways; for a strong utility approach based on a threshold of discrimination, see Hey and Orme (1994). 11. If tmk − smk ≥ 0 for all k, either Tm and Sm are identical, or m is an FOSD pair. In this latter case, the RP model implies a zero probability of choosing the dominated lottery, or with a small tremble probability, an ωn/2 probability of choosing the dominated lottery, as shown later. 12. Briefly, let there be just three equally likely linear (and hence transitive) orderings in subject n’s urn of orderings of lotteries C, D, and E, denoted CDE, DEC, and ECD, where each ordering is from best to worst. As usual, consider the pairs {C,D}, {D,E}, and {C,E}, calling them pairs 1, 2, and 3, respectively. Then Pn1 = 2/3 and Pn2 = 2/3, but Pn3 = 1/3, violating weak stochastic transitivity. 13. I have not seen this discussed in the literature, and it is not clear to me what restrictions on the preference orderings in the urn would be required to guarantee single-peakedness of all orders for all lottery triples. This could be a very mild, or a very strong, restriction in practice. 14. If we used the standard normal c.d.f., the ‘‘standard variance’’ would be 1; if we used the logistic c.d.f., the ‘‘standard variance’’ would be π²/3.
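Note 12's urn can be verified by brute force (a throwaway sketch):

```python
# Three equally likely transitive orderings (best to worst) in the urn.
orderings = ["CDE", "DEC", "ECD"]

def choice_prob(x, y):
    """Probability that the drawn ordering ranks x above y."""
    return sum(o.index(x) < o.index(y) for o in orderings) / len(orderings)

p_cd = choice_prob("C", "D")  # 2/3
p_de = choice_prob("D", "E")  # 2/3
p_ce = choice_prob("C", "E")  # 1/3

# Weak stochastic transitivity would require p_ce >= 1/2 here; it fails.
assert p_cd > 0.5 and p_de > 0.5 and p_ce < 0.5
print(p_cd, p_de, p_ce)
```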
15. Note that since the distances dm are distances between entire lotteries, this is a measure of the similarity of two lotteries. One may also ask questions about the similarity of individual dimensions of lotteries, for example, are these two probabilities of receiving outcome zi very close, and hence so similar that the outcome zi can safely be ignored as approximately irrelevant to making a decision? This ‘‘dimension-level similarity’’ is a different kind of similarity not dealt with by dm, but it also has decision-theoretic force: It implies a different structure, usually an intransitive one with a D representation rather than a V one (see Tversky, 1969; Rubinstein, 1988; or Leland, 1994). 16. Appendix B shows that this satisfies the triangle inequality, and hence shows that contextual utility is a moderate utility model (obeys MST) for all lottery triples that generate three basic pairs. This rules out triples with FOSD pairs in them, but such pairs are taken care of in the special manner discussed later in the section on FOSD. 17. Hilton (1989, p. 214) originally pointed out that there is no necessary correspondence between expected risk premium orderings and choice probability orderings under random preference EU models. 18. They look at a smaller number of fifty- and ninety-fold proportional shifts. The twenty-fold shifts are far more numerous. 19. See Busemeyer and Townsend (1993), page 438. The most robust violation of simple scalability is known as the Myers effect, which explicitly involves pairs with different contexts. In decision field theory, the variance of evaluative noise increases with the range of lottery utilities in pairs (holding other pair features constant), and this largely accounts for the Myers effect. Contextual utility obviously has the same property, and so gives a similar explanation of the Myers effect. 20. 
The ratio relationship here is a generalization of the well-known fact that the ratio of independent χ² variates follows an F distribution. χ² variates are gamma variates with common scale parameter k = 2. In fact, a beta-prime variate can be transformed into an F variate: If x is a beta-prime variate with parameters a and b, then bx/a is an F variate with degrees of freedom 2a and 2b. This is convenient because almost all statistics software packages contain thorough call routines for F variates, but not necessarily any call routines for beta-prime variates. 21. Although ratios of lognormal variates are lognormal, there is no similar simple parametric family for sums of lognormal variates. The independent gammas with common scale are the only workable choice I am aware of. 22. There is another reason why a relatively wide identifying range is useful even if we suspect that the actual range of φ is narrow in the sampled population. This has to do with estimation of stochastic parameters in non-RP models. Suppose we happened to use a set of lotteries that all have an identical φm that also happens to equal the actual φ of some subject. It is clear that Pm = 0.5 for this subject for all the lottery pairs at φm regardless of our choice of λm in any non-RP model: In other words, the stochastic parameter λm is unidentified in this instance. More generally, the stochastic parameter λm in non-RP models is best identified for pairs that are well away from indifference, and this implies that an identifying range that is wider than the actual range of φ still serves good identification purposes. 23. For instance, consider an experiment with no fixed participation payment (no ‘‘show-up fee’’ in experimental lingo) that requires a substantial investment of
subject time and has uncertain payments. It wouldn’t be surprising if this design attracts a relatively risk-seeking subset of a campus student population. Supposing some measure of risk aversion was distributed normally in that population, then, we wouldn’t necessarily expect a normal distribution of that same measure of risk aversion in our laboratory sample. We probably don’t know enough about general distributions of any measure of risk aversion in any campus student body to know what the unconditional distribution actually looks like, or to know what portion of that distribution would be drawn to an experiment by self-selection. Nevertheless, it should be clear that self-selection is literally expected to influence the shape of the distributions of subject types we get in our laboratory samples. See Harrison, Lau, and Rutström (2007) for evidence on this matter. 24. Twelve of the eighty HO subjects make three or fewer choices of the riskier lottery in any pair on contexts 2 and 3. They can ultimately be included in random parameters estimations, but at this initial stage of individual estimation it is either not useful (due to poor identification) or simply not possible to estimate models for these subjects. 25. Estimation of ω is a nuisance at the individual level. Trembles are rare enough that individual estimates of ω are typically zero for individuals. Even when estimates are nonzero, the addition of an extra parameter to estimate increases the noisiness of the remaining estimates and hides the pattern of variance and covariance of these parameters that we wish to see at this step. 26. Two hugely obvious outliers have been removed for both the principal components extraction and the graph. 27. Such integrations must be performed numerically in some manner for estimation. I use Gauss-Hermite quadratures, which are practical up to two or three integrals; for integrals of higher dimension, simulated maximum likelihood is usually more practical.
Judd (1998) and Train (2003) are good sources for these methods.
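Note 20's beta-prime-to-F transformation can be exercised with standard-library draws (a sketch; the beta-prime variate is constructed as a ratio of independent unit-scale gammas):

```python
import random

random.seed(3)
a, b, n = 3.0, 5.0, 200_000

# x ~ beta-prime(a, b): a ratio of independent gammas with common scale.
xs = [random.gammavariate(a, 1.0) / random.gammavariate(b, 1.0) for _ in range(n)]

# Per note 20: b*x/a is then an F variate with 2a and 2b degrees of freedom.
fs = [b * x / a for x in xs]

# Sanity check against the F(2a, 2b) mean, 2b/(2b - 2) = b/(b - 1):
mean_f = sum(fs) / n
print(mean_f)  # should be close to b/(b - 1) = 1.25
```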
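Note 27's Gauss-Hermite quadratures can be sketched with NumPy's hermgauss routine (the integrand here is my example, not the chapter's likelihood):

```python
import math
import numpy as np

# Gauss-Hermite nodes and weights approximate integrals of exp(-t^2)*g(t).
# The change of variables x = sqrt(2)*t turns E[f(x)] for x ~ N(0,1) into
# (1/sqrt(pi)) * sum_i w_i * f(sqrt(2)*t_i).
nodes, weights = np.polynomial.hermite.hermgauss(20)

def normal_expectation(f):
    """Quadrature approximation of E[f(x)] for a standard normal x."""
    return float(np.dot(weights, f(math.sqrt(2.0) * nodes)) / math.sqrt(math.pi))

# Check against a known expectation: E[exp(x)] = exp(1/2).
print(normal_expectation(np.exp))  # ≈ 1.6487
```

Twenty nodes already reproduce smooth one-dimensional expectations like this to near machine precision, which is why quadrature is practical for low-dimensional random parameters integrals.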
ACKNOWLEDGMENTS

John Hey generously made his remarkable data sets available to me. I also thank John Hey and Pavlo Blavatskyy, Jim Cox, Soo Hong Chew, Edi Karni, Ondrej Rydval, and especially Glenn Harrison for conversation, commentary, and/or help, though any errors here are solely my own. I thank the National Science Foundation for support under grant SES 0350565.
REFERENCES

Aitchison, J. (1963). Inverse distributions and independent gamma-distributed products of random variables. Biometrika, 50, 505–508. Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2008). Eliciting risk and time preferences. Econometrica, 76 (forthcoming).
Ballinger, T. P., & Wilcox, N. T. (1997). Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105. Becker, G. M., DeGroot, M. H., & Marschak, J. (1963a). Stochastic models of choice behavior. Behavioral Science, 8, 41–55. Becker, G. M., DeGroot, M. H., & Marschak, J. (1963b). An experimental study of some stochastic models for wagers. Behavioral Science, 8, 199–202. Birnbaum, M. H., & Navarrete, J. B. (1998). Testing descriptive utility theories: Violations of stochastic dominance and cumulative independence. Journal of Risk and Uncertainty, 17, 49–78. Black, D. (1948). On the rationale of group decision making. Journal of Political Economy, 56, 23–34. Blavatskyy, P. R. (2006). Violations of betweenness or random errors? Economics Letters, 91, 34–38. Blavatskyy, P. R. (2007). Stochastic choice under risk. Journal of Risk and Uncertainty, 34, 259–286. Block, H. D., & Marschak, J. (1960). Random orderings and stochastic theories of responses. In: I. Olkin, S. G. Ghurye, W. Hoeffding, W. G. Madow & H. B. Mann (Eds), Contributions to probability and statistics: Essays in honor of Harold Hotelling (pp. 97–132). Stanford, CA: Stanford University Press. Buschena, D. E., & Zilberman, D. (2000). Generalized expected utility, heteroscedastic error, and path dependence in risky choice. Journal of Risk and Uncertainty, 20, 67–88. Busemeyer, J. R., & Townsend, J. T. (1993). Decision field theory: A dynamic-cognitive approach to decision making in an uncertain environment. Psychological Review, 100, 432–459. Busemeyer, J. R., & Wang, Y.-M. (2000). Model comparisons and model selections based on generalization criterion methodology. Journal of Mathematical Psychology, 44, 171–189. Camerer, C. (1989). An experimental test of several generalized expected utility theories. Journal of Risk and Uncertainty, 2, 61–104. Camerer, C., & Ho, T.-H. (1999). Experience weighted attraction learning in normal-form games. Econometrica, 67, 827–874. Camerer, C., & Hogarth, R. (1999). 
The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19, 7–42. Carbone, E. (1997). Investigation of stochastic preference theory using experimental data. Economics Letters, 57, 305–311. Carroll, J. D. (1980). Models and methods for multidimensional analysis of preferential choice (or other dominance) data. In: E. D. Lantermann & H. Feger (Eds), Similarity and choice (pp. 234–289). Bern, Switzerland: Huber. Carroll, J. D., & De Soete, G. (1991). Toward a new paradigm for the study of multiattribute choice behavior. American Psychologist, 46, 342–351. Chew, S. H. (1983). A generalization of the quasilinear mean with applications to the measurement of income inequality and decision theory resolving the Allais paradox. Econometrica, 51, 1065–1092. Chew, S. H., Karni, E., & Safra, Z. (1987). Risk aversion in the theory of expected utility with rank-dependent preferences. Journal of Economic Theory, 42, 370–381. Chipman, J. (1963). Stochastic choice and subjective probability. In: D. Willner (Ed.), Decisions, values and groups (pp. 70–95). New York: Pergamon. Conlisk, J. (1989). Three variants on the Allais example. The American Economic Review, 79, 392–407.
Conte, A., Hey, J., & Moffatt, P. (2007). Mixture models of choice under risk. University of York, Discussion Paper in Economics 2007/6. Cox, J., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60. Cummings, R. G., Harrison, G. W., & Rutström, E. E. (1995). Homegrown values and hypothetical surveys: Is the dichotomous choice approach incentive-compatible? American Economic Review, 85, 260–266. Debreu, G. (1958). Stochastic choice and cardinal utility. Econometrica, 26, 440–444. Debreu, G. (1960). Review of R. D. Luce – Individual choice behavior: A theoretical analysis. American Economic Review, 50, 186–188. Domencich, T., & McFadden, D. (1975). Urban travel demand: A behavioral analysis. Amsterdam: North-Holland. Edwards, W. (1954). A theory of decision making. Psychological Bulletin, 51, 380–417. Fechner, G. (1966/1860). Elements of psychophysics (Vol. 1). New York: Holt, Rinehart and Winston. Fishburn, P. (1999). Stochastic utility. In: S. Barberà, P. Hammond & C. Seidl (Eds), Handbook of utility theory (Vol. 1, pp. 273–320). Berlin: Springer. Grether, D., & Plott, C. (1979). Economic theory of choice and the preference reversal phenomenon. American Economic Review, 69, 623–638. Gul, F., & Pesendorfer, W. (2006). Random expected utility. Econometrica, 74, 121–146. Halff, H. M. (1976). Choice theories for differentially comparable alternatives. Journal of Mathematical Psychology, 14, 244–246. Harless, D., & Camerer, C. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62, 1251–1289. Harrison, G. W., Johnson, E., McInnes, M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95, 897–901. Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007). Risk attitudes, randomization to treatment, and self-selection into experiments. Working Paper no. 05-01.
Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper no. 05-18. Department of Economics, College of Business Administration, University of Central Florida.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Research in experimental economics: Risk aversion in experiments (Vol. 12, pp. 41–196). Bingley, UK: Emerald (forthcoming).
Hey, J. D. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2001). Does repetition improve consistency? Experimental Economics, 4, 5–54.
Hey, J. D. (2005). Why we should not be silent about noise. Experimental Economics, 8, 325–345.
Hey, J. D., & Carbone, E. (1995). Stochastic choice with deterministic preferences: An experimental investigation. Economics Letters, 47, 161–167.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62, 1291–1329.
Hilton, R. W. (1989). Risk attitude under random utility. Journal of Mathematical Psychology, 33, 206–222.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Stochastic Models for Binary Discrete Choice Under Risk
Huber, J., Payne, J. W., & Puto, C. (1982). Adding asymmetrically dominated alternatives: Violations of regularity and the similarity hypothesis. Journal of Consumer Research, 9, 90–98.
Hutchinson, T. P., & Lai, C. D. (1990). Continuous bivariate distributions, emphasizing applications. Adelaide, Australia: Rumsby Scientific Publishers.
Judd, K. L. (1998). Numerical methods in economics. Cambridge, MA: MIT Press.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Leland, J. (1994). Generalized similarity judgments: An alternative explanation for choice anomalies. Journal of Risk and Uncertainty, 9, 151–172.
Loomes, G. (2005). Modeling the stochastic component of behaviour in experiments: Some issues for the interpretation of data. Experimental Economics, 8, 301–323.
Loomes, G., Moffatt, P., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24, 103–130.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Loomes, G., & Sugden, R. (1998). Testing different stochastic specifications of risky choice. Economica, 65, 581–598.
Luce, R. D. (1959). Individual choice behavior: A theoretical analysis. New York: Wiley.
Luce, R. D. (1977). The choice axiom after twenty years. Journal of Mathematical Psychology, 15, 215–233.
Luce, R. D., & Suppes, P. (1965). Preference, utility and subjective probability. In: R. D. Luce, R. R. Bush & E. Galanter (Eds), Handbook of mathematical psychology (Vol. III, pp. 249–410). New York: Wiley.
Machina, M. (1985). Stochastic choice functions generated from deterministic preferences over lotteries. Economic Journal, 95, 575–594.
McKay, A. T. (1934). Sampling from batches. Supplement to the Journal of the Royal Statistical Society, 1, 207–216.
McKelvey, R., & Palfrey, T. (1995).
Quantal response equilibria for normal form games. Games and Economic Behavior, 10, 6–38.
Moffatt, P. (2005). Stochastic choice and the allocation of cognitive effort. Experimental Economics, 8, 369–388.
Moffatt, P., & Peters, S. (2001). Testing for the presence of a tremble in economics experiments. Experimental Economics, 4, 221–228.
Morrison, H. W. (1963). Testable conditions for triads of paired comparison choices. Psychometrika, 28, 369–390.
Mosteller, F., & Nogee, P. (1951). An experimental measurement of utility. Journal of Political Economy, 59, 371–404.
Myers, J. L., & Sadler, E. (1960). Effects of range of payoffs as a variable in risk taking. Journal of Experimental Psychology, 60, 306–309.
Papke, L. E., & Wooldridge, J. M. (1996). Econometric methods for fractional response variables with an application to 401(k) plan participation rates. Journal of Applied Econometrics, 11, 619–632.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32, 122–136.
Prelec, D. (1998). The probability weighting function. Econometrica, 66, 497–527.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323–343.
Quiggin, J. (1991). Comparative statics for rank-dependent expected utility theory. Journal of Risk and Uncertainty, 4, 339–350.
Rabin, M. (2000). Risk aversion and expected-utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Rothschild, M., & Stiglitz, J. E. (1970). Increasing risk I: A definition. Journal of Economic Theory, 2, 225–243.
Rubinstein, A. (1988). Similarity and decision making under risk (Is there a utility theory resolution to the Allais paradox?). Journal of Economic Theory, 46, 145–153.
Sonsino, D., Benzion, U., & Mador, G. (2002). The complexity effects on choice with uncertainty: Experimental evidence. Economic Journal, 112, 936–965.
Starmer, C., & Sugden, R. (1989). Probability and juxtaposition effects: An experimental investigation of the common ratio effect. Journal of Risk and Uncertainty, 2, 159–178.
Thurstone, L. L. (1927). A law of comparative judgment. Psychological Review, 34, 273–286.
Train, K. (2003). Discrete choice methods with simulation. Cambridge, UK: Cambridge University Press.
Tversky, A. (1969). Intransitivity of preferences. Psychological Review, 76, 31–48.
Tversky, A. (1972). Elimination by aspects: A theory of choice. Psychological Review, 79, 281–299.
Tversky, A., & Kahneman, D. (1986). Rational choice and the framing of decisions. Journal of Business, 59, S251–S278.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
Tversky, A., & Russo, J. E. (1969). Substitutability and similarity in binary choices. Journal of Mathematical Psychology, 6, 1–12.
Vuong, Q. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57, 307–333.
Wilcox, N. T. (1993). Lottery choice: Incentives, complexity and decision time. Economic Journal, 103, 1397–1417.
Wilcox, N. T. (2007a). Stochastically more risk averse: A contextual theory of stochastic discrete choice under risk.
Journal of Econometrics (forthcoming).
Wilcox, N. T. (2007b). Predicting risky choices out-of-context: A Monte Carlo study. Working Paper. Department of Economics, University of Houston.
Wooldridge, J. M. (2002). Econometric analysis of cross section and panel data. Cambridge, MA: MIT Press.
APPENDIX A

Definitions. Let {C, D, E} be any triple of lotteries generating the pairs {C, D}, {D, E}, and {C, E}, denoted as pairs m = 1, 2, and 3, respectively. Consider a heteroscedastic latent variable specification of the form $P_m = F([V(S_m|\beta) - V(R_m|\beta)]/\sigma_m)$, where $S_m$ and $R_m$ are the lotteries making up any pair m. Let $V_S \equiv V(S|\beta)$ be shorthand notation for the structural value V of any lottery S, where the structural parameter $\beta$ is suppressed but assumed fixed throughout the discussion (i.e., the discussion is about an individual decision maker).
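The latent-variable form can be made concrete with a minimal numeric sketch (not from the chapter; the values, noise scales, and the choice of F as the standard normal CDF are illustrative): a fixed structural-value difference translates into a choice probability closer to 1/2 as the pair's noise scale grows.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, standing in for the generic CDF F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def choice_prob(v_s, v_r, sigma):
    """P_m = F((V(S_m) - V(R_m)) / sigma_m) for one pair with noise scale sigma."""
    return norm_cdf((v_s - v_r) / sigma)

# The same value difference looks noisier (closer to 1/2) when sigma is larger.
p_small_noise = choice_prob(1.0, 0.5, sigma=0.25)
p_large_noise = choice_prob(1.0, 0.5, sigma=2.0)
```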
Halff's Theorem (Halff, 1976). Consider a heteroscedastic latent variable specification in which the standard deviations $\sigma_m$ of the latent error obey the triangle inequality across the three pairs generated by any triple of lotteries. Such specifications satisfy MST.

Proof. I will prove the contrapositive. MST fails if we have (i) both $P_3 < P_1$ and $P_3 < P_2$, and (ii) both $P_1 \geq 0.5$ and $P_2 \geq 0.5$. Conditions (i) imply that both $(V_C - V_E)/\sigma_3 < (V_C - V_D)/\sigma_1$ and $(V_C - V_E)/\sigma_3 < (V_D - V_E)/\sigma_2$. Conditions (ii) imply that both $V_C - V_D \geq 0$ and $V_D - V_E \geq 0$, so that $V_C - V_E \geq 0$ as well. Note that for conditions (i) to hold, it cannot be the case that conditions (ii) both hold as equalities, for then we would have $P_3 < 0.5$ from condition (i), implying $V_C - V_E < 0$, which contradicts conditions (ii). Therefore, either $V_C - V_D \geq 0$ or $V_D - V_E \geq 0$ (or both) hold as a strict inequality, and it follows that $V_C - V_E > 0$. Therefore, one may divide the inequalities $(V_C - V_E)/\sigma_3 < (V_C - V_D)/\sigma_1$ and $(V_C - V_E)/\sigma_3 < (V_D - V_E)/\sigma_2$ through by $V_C - V_E$ to get both $1/\sigma_3 < t/\sigma_1$ and $1/\sigma_3 < (1 - t)/\sigma_2$, where $t = (V_C - V_D)/(V_C - V_E)$. These imply $\sigma_1 < t\sigma_3$ and $\sigma_2 < (1 - t)\sigma_3$, which sum to $\sigma_1 + \sigma_2 < \sigma_3$ and contradict the triangle inequality. □
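Halff's result can also be checked by simulation. The sketch below (not from the chapter; the value and noise-scale distributions are arbitrary assumptions) draws random value triples with $V_C \geq V_D \geq V_E$, so the MST antecedent $P_1, P_2 \geq 0.5$ always holds, and counts MST failures with and without the triangle inequality on the noise scales.

```python
import math
import random

def F(x):
    """Standard normal CDF, standing in for the generic CDF F."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

random.seed(7)
mst_violations = 0
mst_violations_without_triangle = 0
for _ in range(20000):
    # Ordered values guarantee P1 >= 0.5 and P2 >= 0.5 (the MST antecedent).
    vC, vD, vE = sorted((random.random() for _ in range(3)), reverse=True)
    s1 = random.uniform(0.1, 1.0)
    s2 = random.uniform(0.1, 1.0)
    s3 = random.uniform(abs(s1 - s2), s1 + s2)   # triangle inequality holds
    P1, P2 = F((vC - vD) / s1), F((vD - vE) / s2)
    # With the triangle inequality, MST never fails (Halff's Theorem):
    if F((vC - vE) / s3) < min(P1, P2) - 1e-9:
        mst_violations += 1
    # With sigma_3 inflated well beyond sigma_1 + sigma_2, failures appear:
    if F((vC - vE) / (3.0 * (s1 + s2))) < min(P1, P2) - 1e-9:
        mst_violations_without_triangle += 1
```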
APPENDIX B

Definitions. A basic triple of lotteries {C, D, E} is one where the pairs {C, D}, {D, E}, and {C, E}, denoted pairs 1, 2, and 3 respectively, are all basic pairs (i.e., none are FOSD pairs). Let $\underline{V}_C$ and $\overline{V}_C$ denote the value (to some agent) of degenerate lotteries that pay the minimum and maximum outcomes in lottery C with certainty, respectively. Notice that in a basic triple, the intersection of the three intervals $[\underline{V}_C, \overline{V}_C]$, $[\underline{V}_D, \overline{V}_D]$, and $[\underline{V}_E, \overline{V}_E]$ must be nonempty; otherwise, the outcome ranges of two lotteries would be disjoint, and the pair composed of them would be an FOSD pair.

Proposition. Contextual utility obeys MST, but not SST, on all basic triples.

Remark. This rules out only triples that contain "glaringly transparent" FOSD pairs in which all the outcomes in one lottery exceed all the outcomes in another lottery. See Section 4.2.3 for a suitable treatment of transparent FOSD pairs by the use of trembles.

Proof. Let $d_{CD} \equiv \max(\overline{V}_C, \overline{V}_D) - \min(\underline{V}_C, \underline{V}_D)$; this is equivalent to the divisor in the latent variable of contextual utility, as given by Eq. (18).
Notice that $d_{CD} \geq \overline{V}_C - \underline{V}_C$ and $d_{DE} \geq \overline{V}_E - \underline{V}_E$, since the utility range of a pair cannot be less than the utility range of either of the lotteries in the pair. Summing, we have $d_{CD} + d_{DE} \geq \overline{V}_C - \underline{V}_C + \overline{V}_E - \underline{V}_E$. Since {C, D, E} is a basic triple, the intersection of the intervals $[\underline{V}_C, \overline{V}_C]$ and $[\underline{V}_E, \overline{V}_E]$ is nonempty (otherwise pair {C, E} would be an FOSD pair). Therefore, the utility range in pair {C, E} cannot exceed the sum of the utility ranges of its component lotteries C and E; that is, $d_{CE} \leq \overline{V}_C - \underline{V}_C + \overline{V}_E - \underline{V}_E$. Combining this with the previous inequality, we have $d_{CD} + d_{DE} \geq d_{CE}$: the divisor d in Eq. (18) obeys the triangle inequality. So by Halff's Theorem, contextual utility obeys MST for all basic triples.

To show that contextual utility can violate SST on basic triples, it is sufficient to show an example. Consider an expected value maximizer. Assume that C, D, and E have outcome ranges [0, 200], [100, 300], and [100, 400], respectively, and expected values 162, 160, and 150, respectively. The latent variable in contextual utility is the ratio of a pair's V-distance to the pair's range of possible utilities. In this example, these ratios are 2/300 in pair {C, D}, 10/300 in pair {D, E}, and 12/400 = 9/300 in pair {C, E}. All are positive, implying that all choice probabilities (of the first lottery in each pair) exceed 1/2, satisfying WST. However, the probability that C is chosen over E will be less than the probability that D is chosen over E, since the latent variable in the former pair (9/300) falls short of the latent variable in the latter pair (10/300). This violates SST. □
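The arithmetic of the counterexample is easy to verify directly. A minimal sketch using exactly the expected values and outcome ranges of the example above:

```python
# Expected values and outcome ranges from the Appendix B example.
ev = {"C": 162.0, "D": 160.0, "E": 150.0}
rng = {"C": (0.0, 200.0), "D": (100.0, 300.0), "E": (100.0, 400.0)}

def latent(a, b):
    """Contextual-utility latent variable: EV difference over the pair's outcome range."""
    lo = min(rng[a][0], rng[b][0])
    hi = max(rng[a][1], rng[b][1])
    return (ev[a] - ev[b]) / (hi - lo)

cd = latent("C", "D")   # 2/300
de = latent("D", "E")   # 10/300
ce = latent("C", "E")   # 12/400 = 9/300
wst_holds = cd > 0 and de > 0 and ce > 0   # all choice probabilities exceed 1/2
sst_fails = ce < de                         # P(C over E) < P(D over E)
```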
APPENDIX C

Definitions. Let $(z_j, z_k, z_l)$ be the context of any MPS pair $\{S_m, R_m\} = \{(s_{mj}, s_{mk}, s_{ml}), (r_{mj}, r_{mk}, r_{ml})\}$. Rewrite the lotteries as $\{(s_{mj}, 1 - s_{mj} - s_{ml}, s_{ml}), (r_{mj}, 1 - r_{mj} - r_{ml}, r_{ml})\}$, and measure the outcome vector in terms of $z_{mk}$, writing it instead as $(x_{mj}, 1, x_{ml})$, where $x_{mj} = z_{mj}/z_{mk} < 1$ and $x_{ml} = z_{ml}/z_{mk} > 1$. Since this is an MPS pair, we have

$$s_{mj} x_{mj} + (1 - s_{mj} - s_{ml}) + s_{ml} x_{ml} = r_{mj} x_{mj} + (1 - r_{mj} - r_{ml}) + r_{ml} x_{ml},$$

which implies that $(r_{mj} - s_{mj}) = ab$, where $a = (x_{ml} - 1)/(1 - x_{mj}) = (z_{ml} - z_{mk})/(z_{mk} - z_{mj}) > 0$ and $b = (r_{ml} - s_{ml})$. Call b the spread size. Obviously a is positive for any nondegenerate lottery, and b is positive too, since by convention $R_m$ is the riskier lottery in all basic pairs (and so has a higher probability of the highest outcome on the context). Also, notice that $(1 - s_{mj} - s_{ml}) - (1 - r_{mj} - r_{ml}) = (1 + a)b$.
Proposition. Under an (EU,WV) specification, choice probabilities are invariant across pairs in any set of MPS pairs on a given context.

Proof. It follows from the definitions that the difference between lottery probability vectors in any MPS pair, that is $(s_{mj} - r_{mj}, s_{mk} - r_{mk}, s_{ml} - r_{ml})$, can be expressed in the form $b(-a, 1 + a, -1)$, where b is the spread size and a depends only on the context. The EU V-distance between the lotteries in an MPS pair is therefore $V(S_m|\beta) - V(R_m|\beta) = b[-a u_j + (1 + a) u_k - u_l]$, and the Euclidean distance between the probability vectors in an MPS pair is $b\sqrt{a^2 + (1 + a)^2 + 1}$. Under an (EU,WV) specification, the considered choice probability in an MPS pair would therefore be

$$P_m = F\left(\frac{-a u_j + (1 + a) u_k - u_l}{\sqrt{a^2 + (1 + a)^2 + 1}}\right) \quad \forall\, b > 0$$

Therefore, choice probabilities in any MPS pair depend on the context only through the utilities of outcomes and the parameter a; in particular, they are independent of the size b of the spread. The proposition follows from the fact that pairs in any set of MPSs on a single context differ only by the size of the spread b and (perhaps) their expected values. But clearly the expression above is independent of expected values in lottery pairs as well: it depends only on the context and the utilities of the outcomes on the context. □
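The cancellation of the spread size b is visible in a short numeric sketch (the utilities, the context parameter a, and the use of the normal CDF for F are illustrative assumptions, not values from the chapter):

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

u = (0.0, 0.6, 1.0)   # utilities (u_j, u_k, u_l) of the three context outcomes
a = 1.5               # a = (z_l - z_k)/(z_k - z_j), e.g. for a context like (0, 2, 5)

def p_wv(b):
    """(EU,WV) choice probability for an MPS pair with spread size b."""
    v_dist = b * (-a * u[0] + (1.0 + a) * u[1] - u[2])   # EU V-distance
    e_dist = b * math.sqrt(a**2 + (1.0 + a)**2 + 1.0)    # probability-vector distance
    return norm_cdf(v_dist / e_dist)

# b cancels in the ratio, so all three probabilities coincide.
probs = [p_wv(b) for b in (0.02, 0.05, 0.1)]
```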
APPENDIX D

Definitions. Let $\{C_h, D_h, E_h\}$ be a spread triple. From Appendix C, we can write the difference $S_m - R_m$ between probability vectors in any MPS pair $\{S_m, R_m\}$ as $b(-a, 1 + a, -1)$, where b is the spread size $r_{ml} - s_{ml}$ in the pair and $a = (z_{ml} - z_{mk})/(z_{mk} - z_{mj})$ depends only on the context of the pair. Thus, for a spread triple h, we can write: (i) $C_h - D_h = b_{CD}(-a_h, 1 + a_h, -1)$, where $b_{CD}$ is the spread size in MPS pair $\{C_h, D_h\}$; and (ii) $D_h - E_h = b_{DE}(-a_h, 1 + a_h, -1)$, where $b_{DE}$ is the spread size in MPS pair $\{D_h, E_h\}$.
Proposition. Betweenness implies that either $C_h \succsim D_h$ and $D_h \succsim E_h$, or $E_h \succsim D_h$ and $D_h \succsim C_h$, in every spread triple h.

Proof. This follows immediately from betweenness if we can show that in every spread triple there exists $t \in [0, 1]$ such that $D_h = t C_h + (1 - t) E_h$. From (ii), $D_h = E_h + b_{DE}(-a_h, 1 + a_h, -1) = E_h + t(b_{CD} + b_{DE})(-a_h, 1 + a_h, -1)$, where $t = b_{DE}/(b_{CD} + b_{DE})$ is obviously in the unit interval. Adding (i) and (ii), we get $C_h - E_h = (b_{CD} + b_{DE})(-a_h, 1 + a_h, -1)$, which can be substituted into the previous result to give $D_h = E_h + t(C_h - E_h) = t C_h + (1 - t) E_h$, which is as required. □
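The vector identity in the proof can be checked with concrete numbers (a sketch; the lottery E, the spread sizes, and the context parameter are illustrative assumptions):

```python
a_h = 1.0                      # context parameter (illustrative)
d = (-a_h, 1.0 + a_h, -1.0)    # direction of a mean-preserving spread on this context
bCD, bDE = 0.05, 0.10          # spread sizes (illustrative)

E = (0.3, 0.3, 0.4)                                   # a valid lottery on the context
D = tuple(E[i] + bDE * d[i] for i in range(3))        # -> approximately (0.2, 0.5, 0.3)
C = tuple(D[i] + bCD * d[i] for i in range(3))        # -> approximately (0.15, 0.6, 0.25)

# D is the convex combination t*C + (1-t)*E with t = b_DE / (b_CD + b_DE).
t = bDE / (bCD + bDE)
mix = tuple(t * C[i] + (1.0 - t) * E[i] for i in range(3))
```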
APPENDIX E

A process very similar to that detailed in Section 5.4.2 was used to select and estimate random parameters characterizations of heterogeneity for all specifications. In each case, a specification's parameter vector $\psi$ is first estimated separately for each subject n with a fixed tremble probability $\omega = 0.04$. Let these estimates be $\tilde{\psi}^n$. The correlation matrix of these parameters is then computed, and the vectors $\tilde{\psi}^n$ are also subjected to a principal components analysis, with particular attention to the first principal component. As with the detailed example of the (EU,Strong) specification, all non-RP specifications with utility parameters $u_2$ and $u_3$ (i.e., any strict, strong, contextual, or WV specification) yield quite high Pearson correlations between $\ln(\tilde{u}_2^n - 1)$ and $\ln(\tilde{u}_3^n - 1)$ across subjects, and heavy loadings of these on first principal components of estimated parameter vectors $\tilde{\psi}^n$. Therefore, the population distributions $J(\psi|\theta)$, where $\psi = (u_2, u_3, \lambda)$ (non-RP EU specifications) or $\psi = (u_2, u_3, \gamma, \lambda)$ (non-RP RDEU specifications), are in all cases modeled as having a perfect correlation between $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$, generated by an underlying standard normal deviate $\xi_u$.

Quite similarly, individual estimations of the two RP specifications, with Gamma distribution shape parameters $\phi_1$ and $\phi_2$, yield quite high Pearson correlations between $\ln(\tilde{\phi}_1^n)$ and $\ln(\tilde{\phi}_2^n)$ across subjects, and heavy loadings of these on first principal components of estimated parameter vectors $\tilde{\psi}^n$. Therefore, the joint distributions $J(\psi|\theta)$, where $\psi = (\phi_1, \phi_2, \kappa)$ in the (EU,RP) specification, or $\psi = (\gamma, \phi_1, \phi_2, \kappa)$ in the (RDEU,RP) specification, are both assumed to have a perfect correlation between $\ln(\phi_1^n)$ and $\ln(\phi_2^n)$ in the population, generated by an underlying standard normal deviate $\xi_\phi$.

In all cases, all other specification parameters are characterized as possibly partaking of some of the variance represented by $\xi_u$ (in non-RP specifications) or $\xi_\phi$ (in RP specifications), but also having independent variance represented by an independent standard normal variate. In essence, all correlations between specification parameters are represented as arising from a single underlying first principal component ($\xi_u$ or $\xi_\phi$), which in all cases accounts for two-thirds (frequently more) of the shared variance of parameters in $\tilde{\psi}^n$ according to the principal components analyses. The correlation is assumed to be a perfect one for $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$ (in non-RP specifications) or $\ln(\phi_1^n)$ and $\ln(\phi_2^n)$ (in RP specifications), since this seems very nearly characteristic of all of the preliminary individual estimations; but aside from $\omega$, other specification parameters are given their own independent variance, since their correlations with $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$ are always weaker than that observed between $\ln(u_2^n - 1)$ and $\ln(u_3^n - 1)$ (similarly for $\ln(\phi_1^n)$ and $\ln(\phi_2^n)$ in RP specifications).

The following equation systems show the characterization for all specifications, where any subset of $\xi_u$, $\xi_\phi$, $\xi_\lambda$, $\xi_\kappa$, and $\xi_\gamma$ found in each characterization are jointly independent standard normal variates. Tremble probabilities $\omega$ are modeled as constant in the population, for reasons discussed in Section 5.1, and so there are no equations below for $\omega$. The systems represent the choice probabilities $P_m$, but of course $P_m^* = (1 - \omega)P_m + \omega/2$ is used to build likelihood functions allowing for trembles. As in the text, $\Lambda$, $G$, and $B'$ are the logistic, gamma, and beta-prime cumulative distribution functions, respectively. In all non-RP specifications, the utility vector $(u_j, u_k, u_l)$ for pair m is related to the functions $u_2(\xi_u, \theta)$ and $u_3(\xi_u, \theta)$ in the precise way shown below Eq. (47) in the text, that is:

$u_j = 1$ for pairs m on c = 0, that is, context (1,2,3); and $u_j = 0$ otherwise;
$u_k = u_2(\xi_u, \theta)$ for pairs m on c = 0, that is, context (1,2,3); and $u_k = 1$ otherwise; and
$u_l = u_2(\xi_u, \theta)$ for pairs m on c = 3, that is, context (0,1,2); and $u_l = u_3(\xi_u, \theta)$ otherwise.

The non-RP specifications below also include a divisor $d_m$. This divisor is as follows:

for contextual specifications, $d_m = u_2(\xi_u, \theta)$ for pairs m on context c = 3, $d_m = u_3(\xi_u, \theta)$ for pairs m on context c = 2, and $d_m = u_3(\xi_u, \theta) - 1$ for pairs m on context c = 0;

$d_m = ((s_{mj} - r_{mj})^2 + (s_{mk} - r_{mk})^2 + (s_{ml} - r_{ml})^2)^{0.5}$ for WV specifications; and

$d_m \equiv 1$ for strong and strict specifications.

Finally, the non-RP specifications below also include a function $f(x)$: it is $f(x) = \ln(x)$ for strict specifications and $f(x) = x$ for strong, contextual, and WV specifications.

(EU,Strong), (EU,Strict), (EU,Contextual), and (EU,WV) specifications:

$$P_m(\xi_u, \xi_\lambda, \theta) = \Lambda\left(\lambda(\xi_u, \xi_\lambda, \theta)\,\frac{f(s_{mj}u_j + s_{mk}u_k + s_{ml}u_l) - f(r_{mj}u_j + r_{mk}u_k + r_{ml}u_l)}{d_m}\right),$$

where $u_2(\xi_u, \theta) = 1 + \exp(a_2 + b_2\xi_u)$, $u_3(\xi_u, \theta) = 1 + \exp(a_3 + b_3\xi_u)$, and $\lambda(\xi_u, \xi_\lambda, \theta) = \exp(a_\lambda + b_\lambda\xi_u + c_\lambda\xi_\lambda)$, with $\theta = (a_2, b_2, a_3, b_3, a_\lambda, b_\lambda, c_\lambda)$.

(EU,RP) specification:

$$P_m(\xi_\phi, \xi_\kappa, \theta) = \begin{cases} G(R_m \mid \phi_1(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 3; \\ G(R_m \mid \phi_1(\xi_\phi, \theta) + \phi_2(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 2; \text{ and} \\ B'(R_m \mid \phi_2(\xi_\phi, \theta), \phi_1(\xi_\phi, \theta)) & \text{for pairs } m \text{ on } c = 0, \end{cases}$$

where

$$R_m = \frac{s_{mk} + s_{ml} - r_{mk} - r_{ml}}{r_{ml} - s_{ml}},$$

and $\phi_1(\xi_\phi, \theta) = \exp(a_1 + b_1\xi_\phi)$, $\phi_2(\xi_\phi, \theta) = \exp(a_2 + b_2\xi_\phi)$, and $\kappa(\xi_\phi, \xi_\kappa, \theta) = \exp(a_\kappa + b_\kappa\xi_\phi + c_\kappa\xi_\kappa)$, with $\theta = (a_1, b_1, a_2, b_2, a_\kappa, b_\kappa, c_\kappa)$.

(RDEU,Strong), (RDEU,Strict), (RDEU,Contextual), and (RDEU,WV) specifications:

$$P_m(\xi_u, \xi_\lambda, \xi_\gamma, \theta) = \Lambda\Big(\lambda(\xi_u, \xi_\lambda, \theta)\,\big[f(w^s_{mj}u_j + w^s_{mk}u_k + w^s_{ml}u_l) - f(w^r_{mj}u_j + w^r_{mk}u_k + w^r_{ml}u_l)\big]/d_m\Big),$$

where, suppressing the arguments $(\xi_u, \xi_\gamma, \theta)$ of $w^s_{mi}$ and $w^r_{mi}$,

$$w^s_{mi} = w\Big(\textstyle\sum_{z \geq z_{mi}} s_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big) - w\Big(\textstyle\sum_{z > z_{mi}} s_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big) \quad \text{and}$$

$$w^r_{mi} = w\Big(\textstyle\sum_{z \geq z_{mi}} r_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big) - w\Big(\textstyle\sum_{z > z_{mi}} r_{mz} \,\Big|\, \gamma(\xi_u, \xi_\gamma, \theta)\Big),$$

with $w(q \mid \gamma(\xi_u, \xi_\gamma, \theta)) = \exp(-[-\ln(q)]^{\gamma(\xi_u, \xi_\gamma, \theta)})$; and $u_2(\xi_u, \theta) = 1 + \exp(a_2 + b_2\xi_u)$, $u_3(\xi_u, \theta) = 1 + \exp(a_3 + b_3\xi_u)$, $\gamma(\xi_u, \xi_\gamma, \theta) = \exp(a_\gamma + b_\gamma\xi_u + c_\gamma\xi_\gamma)$, and $\lambda(\xi_u, \xi_\lambda, \theta) = \exp(a_\lambda + b_\lambda\xi_u + c_\lambda\xi_\lambda)$, with $\theta = (a_2, b_2, a_3, b_3, a_\gamma, b_\gamma, c_\gamma, a_\lambda, b_\lambda, c_\lambda)$.

(RDEU,RP) specifications:

$$P_m(\xi_\phi, \xi_\kappa, \xi_\gamma, \theta) = \begin{cases} G(WR_m(\xi_\phi, \xi_\gamma, \theta) \mid \phi_1(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 3; \\ G(WR_m(\xi_\phi, \xi_\gamma, \theta) \mid \phi_1(\xi_\phi, \theta) + \phi_2(\xi_\phi, \theta), \kappa(\xi_\phi, \xi_\kappa, \theta)) & \text{for pairs } m \text{ on } c = 2; \text{ and} \\ B'(WR_m(\xi_\phi, \xi_\gamma, \theta) \mid \phi_2(\xi_\phi, \theta), \phi_1(\xi_\phi, \theta)) & \text{for pairs } m \text{ on } c = 0, \end{cases}$$

where

$$WR_m(\xi_\phi, \xi_\gamma, \theta) = \frac{w[s_{mk} + s_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)] - w[r_{mk} + r_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)]}{w[r_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)] - w[s_{ml} \mid \gamma(\xi_\phi, \xi_\gamma, \theta)]},$$

with $w(q \mid \gamma(\xi_\phi, \xi_\gamma, \theta)) = \exp(-[-\ln(q)]^{\gamma(\xi_\phi, \xi_\gamma, \theta)})$; and $\phi_1(\xi_\phi, \theta) = \exp(a_1 + b_1\xi_\phi)$, $\phi_2(\xi_\phi, \theta) = \exp(a_2 + b_2\xi_\phi)$, $\gamma(\xi_\phi, \xi_\gamma, \theta) = \exp(a_\gamma + b_\gamma\xi_\phi + c_\gamma\xi_\gamma)$, and $\kappa(\xi_\phi, \xi_\kappa, \theta) = \exp(a_\kappa + b_\kappa\xi_\phi + c_\kappa\xi_\kappa)$, with $\theta = (a_1, b_1, a_2, b_2, a_\gamma, b_\gamma, c_\gamma, a_\kappa, b_\kappa, c_\kappa)$.
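The Prelec weighting function $w(q \mid \gamma) = \exp(-[-\ln q]^{\gamma})$ used in the RDEU specifications can be sanity-checked numerically (a sketch; the value $\gamma = 0.7$ is an illustrative choice): for $\gamma < 1$ it has the inverse-S shape, overweighting small probabilities and underweighting large ones, with a fixed point at $q = 1/e$.

```python
import math

def prelec_w(q, gamma):
    """Prelec probability weighting: w(q) = exp(-(-ln q)^gamma), with w(0) = 0, w(1) = 1."""
    if q <= 0.0:
        return 0.0
    if q >= 1.0:
        return 1.0
    return math.exp(-((-math.log(q)) ** gamma))

w_small = prelec_w(0.05, 0.7)          # overweighted: w(0.05) > 0.05
w_large = prelec_w(0.95, 0.7)          # underweighted: w(0.95) < 0.95
fixed = prelec_w(1.0 / math.e, 0.7)    # fixed point: w(1/e) = 1/e for any gamma
```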
For specifications with the EU structure, a likelihood function nearly identical to Eq. (48) is maximized in $\theta$; for instance, for the (EU,RP) specification, simply replace $P_m^*(\xi_u, \xi_\lambda, \theta, \omega)$ with $P_m^*(\xi_\phi, \xi_\kappa, \theta, \omega) \equiv (1 - \omega)P_m(\xi_\phi, \xi_\kappa, \theta) + \omega/2$, and replace $dF(\xi_u)\,dF(\xi_\lambda)$ with $dF(\xi_\phi)\,dF(\xi_\kappa)$. For specifications with the RDEU structure, a third integration appears, since these specifications allow for independent variance in $\gamma$ (the Prelec weighting function parameter) through the addition of a third standard normal variate $\xi_\gamma$. In all cases, the integrations are carried out by Gauss–Hermite quadrature. For specifications with the EU structure, where there are two nested integrations, 14 nodes are used for each nested quadrature of an integral. For specifications with the RDEU structure, 10 nodes are used for each nested quadrature. In all cases, starting values for these numerical maximizations are computed in the manner described for the (EU,Strong) specification: parameters in $\tilde{\psi}^n$ are regressed on their first principal component, and the intercepts and slopes of these regressions are the starting values for the a and b coefficients in the specifications, while the root mean squared errors of these regressions are the starting values for the c coefficients found in the equations for $\lambda$, $\kappa$, and/or $\gamma$.
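The nested Gauss–Hermite step can be illustrated with a hand-coded three-node rule (the chapter uses 14 or 10 nodes; three suffice here because the test integrands are low-degree polynomials, and the nodes and weights for n = 3 are known in closed form):

```python
import math

# Three-node Gauss-Hermite rule (physicists' convention): nodes 0, +/- sqrt(3/2),
# weights 2*sqrt(pi)/3 and sqrt(pi)/6.
nodes = (-math.sqrt(1.5), 0.0, math.sqrt(1.5))
weights = (math.sqrt(math.pi) / 6.0, 2.0 * math.sqrt(math.pi) / 3.0, math.sqrt(math.pi) / 6.0)

def normal_expectation(f):
    """E[f(xi)] for xi ~ N(0,1): (1/sqrt(pi)) * sum_i w_i * f(sqrt(2) * x_i)."""
    return sum(w * f(math.sqrt(2.0) * x) for x, w in zip(nodes, weights)) / math.sqrt(math.pi)

def nested_expectation(f2):
    """Nested quadrature over two independent N(0,1) variates, as in the double integrals above."""
    return normal_expectation(lambda xu: normal_expectation(lambda xl: f2(xu, xl)))

m2 = normal_expectation(lambda x: x * x)                        # E[xi^2] = 1
var_sum = nested_expectation(lambda xu, xl: xu * xu + xl * xl)  # E[xi_u^2 + xi_l^2] = 2
```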
MEASURING RISK AVERSION AND THE WEALTH EFFECT

Frank Heinemann

ABSTRACT

Measuring risk aversion is sensitive to assumptions about the wealth in subjects' utility functions. Data from the same subjects in low- and high-stake lottery decisions allow estimating the wealth in a pre-specified one-parameter utility function simultaneously with risk aversion. This paper first shows how wealth estimates can be identified assuming constant relative risk aversion (CRRA). Using the data from a recent experiment by Holt and Laury (2002a), it is shown that most subjects' behavior is consistent with CRRA at some wealth level. However, for realistic wealth levels most subjects' behavior implies decreasing relative risk aversion. An alternative explanation is that subjects do not fully integrate their wealth with income from the experiment. Within-subject data do not allow discriminating between the two hypotheses. Using between-subject data, maximum-likelihood estimates of a hybrid utility function indicate that aggregate behavior can be described by expected utility from income rather than expected utility from final wealth, and that partial relative risk aversion is increasing in the scale of payoffs.
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 293–313
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00005-7
1. INTRODUCTION

It is an open question whether subjects integrate their wealth with the potential income from laboratory experiments when deciding between lotteries. Most theoretical economists assume that agents evaluate decisions under uncertainty by the expected utility that they achieve from consuming the prospective final wealth associated with potential gains and losses from their decisions. This requires that agents fully integrate income from all sources in every decision. The well-known examples provided by Samuelson (1963) and Rabin (2000) raise serious doubts about such behavior. As Rabin points out, a person who refuses to participate in a lottery with a 50–50 chance of winning $105 or losing $100 at all initial wealth levels up to $300,000 should also turn down a 50–50 bet of losing $10,000 and gaining $5.5 million at a wealth level of $290,000. More generally, Arrow (1971) already observed that maximizing expected utility from consumption implies almost risk-neutral behavior towards decisions that have a small impact on wealth, while many subjects behave in a risk-averse manner in experiments.

There is a long tradition of distinguishing two versions of expected utility theory: expected utility from wealth (EUW) versus expected utility from income (EUI). EUW assumes full integration of income from all sources in each decision and is basically another name for expected utility from consumption over time (Johansson-Stenman, 2006). EUI assumes that agents decide by evaluating the prospective gains and losses associated with the current decision, independent of initial wealth. Agents who behave according to EUI isolate risky decisions of an experiment from other decisions in their lives. EUI is inconsistent with maximizing expected utility from consumption. It amounts to preferences (on consumption) depending on the reference point given by the initial wealth position (Wakker, 2005).
Markowitz (1952) has already demonstrated that EUI may explain some puzzles like people buying insurance and lottery tickets at the same time.1 Prospect theory, introduced by Kahneman and Tversky (1979), is in the tradition of EUI by stating that people evaluate prospective gains and losses in relation to a status quo. Cox and Sadiraj (2004, 2006) provide an example showing that maximizing EUI is consistent with observable behavior in small- and large-stake lottery decisions. Barberis, Huang, and Thaler (2006) argue in the same direction, calling EUI ‘‘narrow framing.’’ Using data from medium-scale lottery decisions in low-income countries, Binswanger (1981) and Schechter (2007) argue that the evidence is consistent with EUI, but inconsistent with EUW, because asset integration implies absurdly high levels of risk aversion.
However, it is hard to imagine that people strictly behave according to EUI: consider, for example, a person deciding whether to invest in a certain industry or buy a risky bond. If the investor has a given utility function defined over prospective gains and losses independent of the status quo, then the decision would be independent of her or his initial portfolio, which seems implausible and contradicts all theories of optimal portfolio composition.2 Sugden (2003) suggests an axiomatic approach of reference-dependent preferences that generalizes expected utility and includes EUW, EUI, and prospect theory as special cases. Cox and Sadiraj (2004, 2006) also suggest a broader model of expected utility depending on two variables, initial (non-random) wealth and prospective gains from risky decisions, which may enter the utility function without being fully integrated. One way to test this is a two-parameter approach to measuring utility functions: one parameter determines the local curvature of the utility function (like traditional risk aversion) and a second parameter determines the degree to which a subject integrates wealth with potential gains and losses from a lottery.

Holt and Laury (2002a) report an experiment designed to measure how risk aversion is affected by an increase in the scale of payoffs. In this experiment each subject participates in two treatments that are distinguished by the scale of payoffs. Holt and Laury observe that most subjects increase the number of safe choices as the payoff scale increases, and conclude that relative risk aversion (RRA) must be rising in the scale of payoffs. In this paper, we show that within-subject data from subjects who participate in small- and large-stake lottery decisions can be used to estimate simultaneously the coefficient of constant relative risk aversion (CRRA) and the degree to which subjects integrate their wealth. The wealth effect is identified only if there is a substantial difference in the scales.
The experiment by Holt and Laury (2002a) and a follow-up study by Harrison, Johnson, McInnes, and Rutström (2005) satisfy this requirement. We use their data and show:

1. For 90% of all subjects whose behavior is consistent with expected utility maximization, the hypothesis of CRRA cannot be rejected.
2. If subjects integrate their true wealth, then most subjects have a decreasing RRA.
3. If subjects have a non-decreasing RRA, the degree to which most subjects integrate initial wealth with lottery income is extremely small.
4. Combining the ideas of Holt and Laury (2002a) with Cox and Sadiraj (2004, 2006), we construct an error-response model with a three-parameter hybrid utility function generalizing CRRA and constant
absolute risk aversion (CARA) and containing a parameter that measures the integration of initial wealth. A maximum-likelihood estimate based on between-subject data yields the result that subjects fail to integrate initial wealth in their decisions. Thus, it confirms EUI. For the estimated utility function, partial RRA is increasing in the scale of payoffs. In the next section, we explain how the degree to which subjects integrate their wealth in laboratory decisions can be measured by within-subject data from small- and large-stake lottery decisions. Section 3 applies this idea to the data obtained by Holt and Laury (2002a) and Harrison et al. (2005). Section 4 uses the data from their experiments to estimate a three-parameter hybrid utility function. Section 5 concludes and raises questions for future research.
2. THEORETICAL CONSIDERATIONS

Let us first consider the traditional approach of EUW, which assumes that decisions are based on comparisons of utility from consumption that can be financed with the financial resources available to a decision maker. Let U(y) be the indirect utility, i.e., the utility that an agent obtains from spending an amount y, and assume $U'(y) > 0$.

Consider a subject asked to decide between two lotteries R and S. Lottery R (risky) yields a high payoff of $x_R^H$ with probability p and a low payoff $x_R^L$ with probability $1 - p$. Lottery S (safe) yields $x_S^H$ with probability p and $x_S^L$ with probability $1 - p$, where $x_R^H > x_S^H > x_S^L > x_R^L$. Let p vary from 0 to 1 continuously and ask the subject for the preferred lottery for different values of p. An expected utility maximizer should choose S for low probabilities p of gaining the high payoff and switch to R at some level $p_1$ that depends on the person's utility function. At $p_1$ the subject may be thought of as being indifferent between both lotteries, i.e.,

$$p_1 U(W + x_R^H) + (1 - p_1) U(W + x_R^L) = p_1 U(W + x_S^H) + (1 - p_1) U(W + x_S^L) \quad (1)$$
where W is the wealth of this subject from other sources. Now, assume that the utility function has just one free parameter r determining the degree of risk aversion. If W is known, this free parameter is identified by p_1. For example, if we assume CRRA, the utility function is given by U(x) = sgn(1 − r) x^(1−r) for r ≠ 1 and U(x) = ln x for r = 1, where r is
Measuring Risk Aversion and the Wealth Effect
the Arrow–Pratt measure of relative risk aversion (RRA).3 The unknown parameter r is identified by the probability p_1 at which the subject is indifferent and can be obtained by solving Eq. (1) for r. However, if W is not known, Eq. (1) has two unknowns and the degree of risk aversion is not identified. Here, we can solve Eq. (1) for a function r_1(W).

Let us now ask the subject to choose between lotteries R′ and S′ that yield k times the payoffs of lotteries R and S, where the scaling factor k differs from 1. Again, the subject should choose S′ for low values of p and R′ otherwise. Denote the switching point by p_k. Now, we have a second equation

p_k U(W + k x_R^H) + (1 − p_k) U(W + k x_R^L) = p_k U(W + k x_S^H) + (1 − p_k) U(W + k x_S^L)    (2)

and the two Eqs. (1) and (2) may yield a unique solution for both unknowns W and r. Assuming CRRA, the solution to this second equation is characterized by a function r_k(W). If the subject is an expected utility maximizer with a CRRA r ≠ 0, then the two functions r_1 and r_k have a unique intersection. Thereby, the simultaneous solution to Eqs. (1) and (2) identifies the wealth level and the degree of risk aversion. Denote this solution by (Ŵ, r̂). If the subject is risk neutral, then r_1(W) = r_k(W) = 0 for all W. Risk aversion is still identified (r̂ = 0), but not the wealth level. If the two functions do not intersect at any W, then the model is misspecified: either the subject does not have a CRRA or she is not an expected utility maximizer.

Simulations show that for k close to 1, the difference r_1(W) − r_k(W) is very flat at Ŵ. This implies that small errors in the observations have a large impact on the estimated values Ŵ and r̂. Reliable estimates require that the scaling factor k is sufficiently different from 1 (at least 10 or at most 0.1). Obviously, one could also identify W and r by different pairs of lotteries in the same payoff scale. Unfortunately, this leads to the same problem as having a low scaling factor: measurement errors have an extremely large impact on Ŵ and r̂.

Fig. 1 below shows functions r_1 (dashed curves) and r_k (solid curves) for a particular example. The difference in slopes between dashed and solid curves identifies W. This difference is due to the scaling factor and diminishes for scaling factors k close to 1.
[Fig. 1 here: r (0 to 1.4) plotted against W (0 to 8); dashed curves for p = 0.5 and p = 0.6 at k = 1 and solid curves for p = 0.6 and p = 0.7 at k = 20 enclose the consistent (W,r) combinations of the median subject in the experiment by Holt and Laury (2002a).]

Fig. 1. (W,r)-Combinations in the Rhombic Area are Consistent with CRRA of a Subject with Median Number of Safe Choices in Both Treatments.
The bottom line of these considerations is that we may estimate individual degrees of risk aversion and the wealth effect simultaneously from small- and large-stake lottery decisions of the same subjects.
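To make the identification argument concrete, the following sketch solves Eq. (1) for r_1(W) and Eq. (2) for r_k(W) by bisection and locates their intersection (Ŵ, r̂). It uses the Holt–Laury baseline payoffs introduced in Section 3; the exact indifference probabilities p_1 = 0.55 and p_k = 0.65 are hypothetical, since observed choices only bracket them.

```python
import math

# Holt-Laury baseline payoffs (dollars): risky option R and safe option S.
X_R_HI, X_R_LO = 3.85, 0.10
X_S_HI, X_S_LO = 2.00, 1.60

def crra(x, r):
    """CRRA utility sgn(1 - r) * x**(1 - r), with the log form at r = 1."""
    if abs(r - 1.0) < 1e-12:
        return math.log(x)
    return math.copysign(1.0, 1.0 - r) * x ** (1.0 - r)

def eu_gap(r, W, p, k):
    """EU(risky) minus EU(safe) at wealth W, switching probability p, scale k.
    Setting this gap to zero is Eq. (1) for k = 1 and Eq. (2) for k != 1."""
    eu_r = p * crra(W + k * X_R_HI, r) + (1 - p) * crra(W + k * X_R_LO, r)
    eu_s = p * crra(W + k * X_S_HI, r) + (1 - p) * crra(W + k * X_S_LO, r)
    return eu_r - eu_s

def bisect(f, lo, hi, n=60):
    """Sign-based bisection; assumes f changes sign exactly once on [lo, hi]."""
    pos = f(lo) > 0
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == pos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def indifference_r(W, p, k):
    """r_1(W) for k = 1, r_k(W) otherwise: the CRRA coefficient at which the
    subject is indifferent between the two lotteries at wealth W."""
    return bisect(lambda r: eu_gap(r, W, p, k), -1.0, 20.0)

# Hypothetical exact switching probabilities inside the median subject's bands.
p1, pk, k = 0.55, 0.65, 20

# The intersection of r_1(W) and r_k(W) identifies wealth and risk aversion.
W_hat = bisect(lambda W: indifference_r(W, p1, 1) - indifference_r(W, pk, k),
               1e-9, 50.0)
r_hat = indifference_r(W_hat, p1, 1)
```

At W = 0 the indifference coefficient is independent of k (scale invariance of CRRA), which is the property Holt and Laury exploit; away from W = 0 the two curves separate, and their crossing pins down both unknowns.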
3. ANALYZING INDIVIDUAL DATA FROM THE EXPERIMENT BY HOLT AND LAURY

Holt and Laury (2002a) present a carefully designed experiment in which subjects first participate in low-scale lottery decisions and then in large-scale lottery decisions. Both treatments are designed as multiple-price lists, where probabilities vary in steps of 0.1. This leaves a range for all estimates that may be in the order of measurement errors. Before participating in the large-scale lottery, subjects must give up their earnings from the previous low-scale treatment. Thereby, subjects' initial wealth is the same in both treatments.4

In the experiment, subjects make 10 choices between paired lotteries as laid out in Table 1. In the low-stake treatment, payoffs for option S are x_S^H = $2.00 with probability p and x_S^L = $1.60 with probability 1 − p. Payoffs
Table 1. The 10-Paired Lottery-Choice Decisions with Low Payoffs.

Option S                         Option R                         Expected Payoff Difference
1/10 of $2.00, 9/10 of $1.60     1/10 of $3.85, 9/10 of $0.10      $1.17
2/10 of $2.00, 8/10 of $1.60     2/10 of $3.85, 8/10 of $0.10      $0.83
3/10 of $2.00, 7/10 of $1.60     3/10 of $3.85, 7/10 of $0.10      $0.50
4/10 of $2.00, 6/10 of $1.60     4/10 of $3.85, 6/10 of $0.10      $0.16
5/10 of $2.00, 5/10 of $1.60     5/10 of $3.85, 5/10 of $0.10     −$0.18
6/10 of $2.00, 4/10 of $1.60     6/10 of $3.85, 4/10 of $0.10     −$0.51
7/10 of $2.00, 3/10 of $1.60     7/10 of $3.85, 3/10 of $0.10     −$0.85
8/10 of $2.00, 2/10 of $1.60     8/10 of $3.85, 2/10 of $0.10     −$1.18
9/10 of $2.00, 1/10 of $1.60     9/10 of $3.85, 1/10 of $0.10     −$1.52
10/10 of $2.00, 0/10 of $1.60    10/10 of $3.85, 0/10 of $0.10    −$1.85
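The expected-payoff column of Table 1 follows directly from the stated payoffs; a quick check:

```python
def ev_diff(p):
    """Expected payoff of option S minus option R; algebraically 1.50 - 3.35 * p."""
    ev_safe = p * 2.00 + (1 - p) * 1.60
    ev_risky = p * 3.85 + (1 - p) * 0.10
    return ev_safe - ev_risky

# One value per table row; Table 1 reports these rounded to whole cents.
diffs = [ev_diff(i / 10) for i in range(1, 11)]
```

The sign flips between row 4 (p = 0.4) and row 5 (p = 0.5), which is the crossover referred to in the text.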
for option R are x_R^H = $3.85 with probability p and x_R^L = $0.10 with probability 1 − p. Probabilities vary for the 10 pairs from p = 0.1 to 1.0 in increments of 0.1. The difference in expected payoffs for option S versus option R decreases with rising p. For p ≤ 0.4, option S has a higher expected payoff than option R. For p ≥ 0.5 the order is reversed. In the high-stake treatments, payoffs are scaled up by a factor k which is either 20, 50, or 90 in different sessions. Harrison et al. (2005) repeated the experiment with a scaling factor k = 10. In addition, they control for order effects arising from the subsequent presentation of lotteries with different scales. Comparing results from these two samples will show how robust our conclusions are.

Subjects typically choose option S for low values of p and option R for high values of p. Most subjects switch at probabilities ranging from 0.4 to 0.9, with the proportion of choices for option S increasing in the scaling factor. Observing the probabilities at which subjects switch from S to R, Holt and Laury estimate individual degrees of RRA on the basis of a CRRA utility function U(x), where x is replaced by the gains from the lotteries. Initial wealth W is assumed to be zero. The median subject chooses S for p ≤ 0.5 and R for p ≥ 0.6 in the treatment with k = 1. For k = 20 the median subject chooses S for p ≤ 0.6 and R for p ≥ 0.7. Median and average numbers of safe choices are increasing in the scaling factor. For W = 0, RRA is independent of k. Holt and Laury use this property to argue that higher switching probabilities at high stakes are evidence for increasing RRA. For a positive initial wealth, however, the evidence is consistent with constant or even decreasing RRA, as will be shown now.
Consider, for example, the behavior of the median subject. In the low-stake treatment, she switches from S to R at some probability p_1 with 0.5 ≤ p_1 ≤ 0.6. Solving Eq. (1) for r at p_1 = 0.5 gives us a function r_1^min(W). Solving Eq. (1) for r at p_1 = 0.6 yields r_1^max(W). Combinations of wealth and risk aversion in the area between these two functions are consistent with CRRA. These (W,r)-combinations are illustrated in Fig. 1 above by the area between the two dashed curves. In the high-stake treatment with k = 20, the median subject switches at some probability p_k between 0.6 and 0.7. This behavior is consistent with CRRA for all (W,r)-combinations indicated by the range between the two solid curves in Fig. 1. The two areas intersect, and for any (W,r)-combination in this intersection her behavior in both treatments is consistent with CRRA. As Fig. 1 indicates, the behavior of the median subject is consistent with CRRA if 0 ≤ W ≤ 7.5. Without knowing her initial wealth, we cannot reject the hypothesis that the median subject has a CRRA. Participants of the experiment were US-American students, and their initial wealth is most certainly higher than $7.50. If we impose the restriction W > 7.50, then we can reject the hypothesis that the median subject has a constant or even increasing RRA. For any realistic wealth level her RRA must be decreasing. There are nine subjects who behaved like the median subject.

By the same logic we can test whether the behavior of other subjects is consistent with constant, increasing, or decreasing RRA, and at which wealth levels. Table 2 gives an account of the distribution of choices for subjects who switched at most once from S to R (for increasing p) and never switched in the other direction. In Holt and Laury (2002a), there were 187 subjects participating in sessions with a low-scale and a real high-scale treatment.
Twenty-five of these subjects were switching back at some point, making their behavior inconsistent with maximizing expected utility. We exclude these subjects from our analysis. This leaves us with 162 subjects for whom we can analyze whether their behavior is consistent with CRRA.5 In Table 2, rows count the number of safe choices in the low-stake treatment, while columns count the number of safe choices in the high-stake treatment. For the purpose of testing whether RRA is constant, increasing or decreasing, we can join the data from sessions with k=20, 50, and 90.6 One hundred five subjects are counted in cells above the diagonal. They made more safe choices in the high-stake treatment than in the low-stake treatment. Forty subjects are counted on the diagonal: they made the same choices in both treatments. Seventeen subjects below the diagonal have chosen more of the risky options in the high-stake treatment than for low payoffs.
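The claim that the median subject's choices are consistent with CRRA only for 0 ≤ W ≤ 7.5 can be checked numerically: compute the dashed band r_1(W; p = 0.5)…r_1(W; p = 0.6) and the solid band r_k(W; p = 0.6)…r_k(W; p = 0.7) from Fig. 1 on a grid of wealth levels and test where they overlap. A sketch assuming the Holt–Laury payoffs:

```python
import math

def crra(x, r):
    """CRRA utility sgn(1 - r) * x**(1 - r) (log form at r = 1)."""
    if abs(r - 1.0) < 1e-12:
        return math.log(x)
    return math.copysign(1.0, 1.0 - r) * x ** (1.0 - r)

def indiff_r(W, p, k, lo=-1.0, hi=20.0, n=60):
    """CRRA coefficient at which Eq. (1) (k = 1) or Eq. (2) holds with equality,
    found by sign-based bisection."""
    def gap(r):
        eu_risky = p * crra(W + k * 3.85, r) + (1 - p) * crra(W + k * 0.10, r)
        eu_safe = p * crra(W + k * 2.00, r) + (1 - p) * crra(W + k * 1.60, r)
        return eu_risky - eu_safe
    pos = gap(lo) > 0
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if (gap(mid) > 0) == pos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def median_bands_overlap(W):
    """Median subject: switches between p = 0.5/0.6 at k = 1 and p = 0.6/0.7 at k = 20."""
    lo1, hi1 = indiff_r(W, 0.5, 1), indiff_r(W, 0.6, 1)
    lo20, hi20 = indiff_r(W, 0.6, 20), indiff_r(W, 0.7, 20)
    return lo1 <= hi20 + 1e-9 and lo20 <= hi1 + 1e-9

# Largest wealth level (on a 0.1 grid up to 12) at which both bands overlap.
W_max = max(w / 10 for w in range(0, 121) if median_bands_overlap(w / 10))
```

The scan reproduces the rhombic area of Fig. 1: the two bands touch at W = 0 (at r ≈ 0.41) and separate at a wealth level of roughly $7.5.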
Harrison et al. (2005) had 123 subjects participating in their 1×10 treatment. One hundred two of these subjects behaved consistently with expected utility maximization, i.e., never switched from R to S for increasing p. Sixty-five subjects are counted in cells above the diagonal, 28 subjects are counted on the diagonal, and 9 subjects below the diagonal. In the remainder of this section we refer to the data by Holt and Laury (2002a, 2002b) without brackets and to data from Harrison et al. (2005) in brackets [ ]. To count how many subjects behave in accordance with CRRA, we sort subjects into 10 groups indicated by letters A to J in Tables 2 and 3.

A. Group A contains 10 [8] subjects. Their behavior implies increasing RRA at all wealth levels.
B. Group B contains 2 [0] subjects. Their behavior is consistent with CRRA at W = 0. For W > 5 their behavior is consistent only with increasing RRA.
C. Group C contains 25 [21] subjects. Their behavior is consistent with constant, increasing, or decreasing RRA at all wealth levels.
D. Group D contains 32 [18] subjects. Their behavior is consistent with increasing RRA at all wealth levels and with constant or decreasing

Table 2. Distribution of Choices in the Experiment by Holt and Laury (2002a).
[Matrix of subject counts by number of safe choices (rows 0–9: low-stake treatment; columns 0–9: real high-stake treatment), with group letters in parentheses.]
Note: Rows indicate the number of safe choices in the low-stake treatment, columns indicate the number of safe choices in the real high-stake treatment. Letters in parentheses refer to the groups. Source: Holt and Laury (2002b).
Table 3. Distribution of Choices in the Experiment by Harrison et al. (2005).
[Matrix of subject counts by number of safe choices (rows 0–9: low-stake treatment; columns 0–9: high-stake treatment), with group letters in parentheses.]
Note: Rows indicate the number of safe choices in the low-stake treatment, columns indicate the number of safe choices in the high-stake treatment. Letters in parentheses refer to the groups. Source: Harrison et al. (2005).
RRA at all wealth levels W > 50. It is inconsistent with non-increasing RRA at W = 0.
E. Group E contains 20 [8] subjects. Their behavior is consistent with increasing RRA at W = 0, with decreasing RRA for W > 50, and with CRRA for some wealth level in the range 0 < W < 50. It is inconsistent with non-increasing RRA at W = 0 and with non-decreasing RRA at W > 50.
F. Group F contains 59 [37] subjects. Their behavior is consistent with decreasing RRA at all wealth levels and with constant or increasing RRA at W = 0. It is inconsistent with non-decreasing RRA at W > 15.
G. Group G contains 8 [7] subjects. Their behavior is inconsistent with increasing RRA at all wealth levels. It is consistent with CRRA at W = 0. For W > 0 their behavior is consistent only with decreasing RRA.
H. Group H contains 6 [1] subjects. Their behavior implies decreasing RRA at all wealth levels.
I. Group I contains 0 [1] subject. Her or his behavior is consistent with constant, increasing, and decreasing RRA if W > 5. W < 5 implies increasing RRA.
J. Group J contains 0 [1] subject. Her or his behavior is consistent with constant, increasing, and decreasing RRA for 2 < W < 1,000. Both W < 2 and W > 1,000 imply decreasing RRA.
Summing up, the behavior of 146 [93] out of 162 [102] subjects (90% [91%]) is consistent with CRRA at some wealth level (groups B–G, I, J). For 62 [35] subjects (38% [34%]) a wealth level of W = 0 implies an increasing RRA (groups A, D, E, and I); for 6 [2] subjects (4% [2%]) W = 0 implies decreasing RRA (groups H and J). Realistic wealth levels are certainly all above $50. For 93 [53] subjects (57% [52%]) W > 50 implies decreasing RRA (groups E–H), and only for 12 [8] subjects (7% [8%]) does a realistic wealth level imply increasing RRA (groups A–B).

This analysis shows that the data do not provide firm grounds for the hypothesis that RRA is increasing in the scale of payoffs. There seems to be more evidence for the opposite conclusion: most subjects' behavior rejects constant or increasing RRA in favor of decreasing RRA at any realistic wealth level.

Decreasing RRA is a possible explanation for behavior in low-stake and high-stake lottery decisions in the experiment. An alternative explanation is that subjects do not fully integrate their wealth with the prospective income from lotteries. Cox and Sadiraj (2004) suggest a two-parameter utility function

U(W, x) = sgn(1 − r)(dW + x)^(1−r)    (3)

where W is initial wealth, x the gain from a lottery, and d a parameter thought to be smaller than 1 that rules the degree to which a subject integrates initial wealth with prospective gains from the lottery. Parameter r is the curvature of this function with respect to dW + x. Although the functional form is similar to CRRA, r cannot be interpreted as the Arrow–Pratt measure of RRA, as will be explained below. Cox and Sadiraj (2004) provide an example to show that the puzzle raised by Rabin (2000) can be resolved if dW is close to 1.

In the experiment, we do not know subjects' wealth W. Hence, d is not identified. But we can estimate integrated wealth dW by the same method that we applied for analyzing wealth levels at which observed behavior is consistent with CRRA. Each cell in Tables 2 and 3 is associated with a range for integrated wealth dW that is consistent with utility function (3). From the previous analysis we know already that the behavior of 90% of all subjects who never switch from R to S for increasing p is consistent with Eq. (3) at some level of integrated wealth. Going through all cells and counting at which wealth levels their behavior is consistent with Eq. (3) yields the result that the proportion of subjects whose behavior is consistent with Eq. (3) has a maximum at dW ≈ 1.
For dW = 0, there are 54 [37] subjects whose behavior is consistent with Eq. (3) only for one particular value of r. The median subject shown in Fig. 1 is such a case: her behavior is consistent with Eq. (3) at dW = 0 if and only if r is precisely 0.4115. It is unlikely that more than 30% of all subjects have a degree of risk aversion that comes from a set with measure zero. It is much more likely that these subjects have a positive level of integrated wealth, which opens a range for r at which behavior is consistent with Eq. (3). The proportion of subjects whose behavior is robustly consistent with Eq. (3) at dW = 0 drops to 25% [27%], and we get a unique maximum for this proportion at dW = 1. We illustrate this proportion in Figs. 2 and 3 for the two data sets. Some subjects' behavior is consistent with Eq. (3) only for sufficiently high levels of dW, while others require dW to be small. For 55% [51%] of all subjects, behavior is consistent with Eq. (3) only if dW < 50. Thus, it seems that most subjects integrate initial wealth in their evaluation of lotteries only to a very small degree.

Let us now analyze what this means for the question of whether RRA is increasing or decreasing. The answer may depend on how we define RRA for this utility function. The Arrow–Pratt measure is defined by RRA = −y U″(y)/U′(y), where y is the single argument of the indirect utility function comprising initial wealth with potential gains from lotteries. Utility function (3) has two arguments, though. Suppose that d is a positive constant. Then one might define RRA by the derivatives of Eq. (3) with
[Fig. 2 here: proportion (20–60%) plotted against integrated wealth dW from 0 to 50.]

Fig. 2. Proportion of Subjects Whose Behavior is Consistent with Utility Function (3). Non-robust Cases Counted as Inconsistent. Source: Holt and Laury (2002b).
[Fig. 3 here: proportion (20–70%) plotted against integrated wealth dW from 0 to 50.]

Fig. 3. Proportion of Subjects Whose Behavior is Consistent with Utility Function (3). Non-robust Cases Counted as Inconsistent. Source: Harrison et al. (2005).
respect to W or x:

RRA_W = −W (∂²U/∂W²)/(∂U/∂W) = r dW/(dW + x)    (4)

is increasing in W if r > 0 and decreasing if r < 0. Thus, increasing wealth increases the curvature of the utility function with respect to wealth.

RRA_x = −x (∂²U/∂x²)/(∂U/∂x) = r x/(dW + x)    (5)

is increasing in x if r > 0 and decreasing if r < 0. Thereby, increasing the scale of lottery payments x increases the absolute value of this measure of RRA for all subjects who are not risk neutral. Following Binswanger (1981), we may call RRA_x "partial RRA," because it defines the curvature of the utility function with respect to the potential income from the next decision only. RRA_W is the curvature of the utility function with respect to wealth, which is relevant for portfolio choice and all kinds of normative questions. While the absolute value of RRA_W is increasing in W, it is decreasing in x. This means that increasing the scale of lottery payments reduces RRA_W. This reconciles the results for utility function (3) with the previous result that for fully integrated wealth, risk aversion must be decreasing to explain
the predominant behavior. On the other hand, the absolute value of partial risk aversion is decreasing in W. Thus, subjects with a higher wealth should (on average) accept more risky bets. These properties are inherent in utility function (3) and can, therefore, not be rejected without rejecting utility function (3). As we laid out before, 90% of the subjects who never switch from R to S behave in a way that is consistent with Eq. (3). It follows that the experiment is not well suited to discriminate between the two hypotheses: (i) agents fully integrate wealth and RRA is decreasing, and (ii) agents do not fully integrate wealth. Furthermore, if subjects do not fully integrate wealth, an experiment with lotteries of different scales cannot answer the question of whether RRA is increasing or decreasing in the wealth level.

Harrison et al. (2005) have shown that there is an order effect: subjects who participate in a low-scale treatment first choose safer actions in a subsequent high-scale treatment than subjects in an experiment that consists of the high-scale treatment only. This order effect may account for a substantial part of the observed increase in the number of safe choices in the high-scale treatments. Although the order effect does not reverse the responses to an increasing payoff scale, the numerical estimates are affected. If the increase in safe choices with rising payoff scale had been smaller, then we would find fewer subjects in upper-right cells of Table 2 and more in cells for which consistency with fully integrated wealth requires decreasing RRA. The proportion of subjects whose behavior is consistent with utility function (3) would be shifted to the left, indicating an even lower level of integrated wealth. We infer that accounting for the order effect strengthens our results.
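The closed forms in Eqs. (4) and (5) can be checked numerically; the sketch below differentiates utility function (3) by central finite differences at illustrative parameter values (r = 0.5 and d = 0.3 are examples, not estimates):

```python
import math

R, D = 0.5, 0.3  # illustrative curvature and wealth-integration parameters

def U(W, x):
    """Utility function (3): sgn(1 - r) * (dW + x)**(1 - r)."""
    return math.copysign(1.0, 1.0 - R) * (D * W + x) ** (1.0 - R)

def rra_W(W, x, h=1e-4):
    """Eq. (4): -W * U_WW / U_W via central finite differences in W."""
    u_w = (U(W + h, x) - U(W - h, x)) / (2 * h)
    u_ww = (U(W + h, x) - 2 * U(W, x) + U(W - h, x)) / h ** 2
    return -W * u_ww / u_w

def rra_x(W, x, h=1e-4):
    """Eq. (5): -x * U_xx / U_x via central finite differences in x."""
    u_x = (U(W, x + h) - U(W, x - h)) / (2 * h)
    u_xx = (U(W, x + h) - 2 * U(W, x) + U(W, x - h)) / h ** 2
    return -x * u_xx / u_x

W0, x0 = 10.0, 2.0
closed_form_W = R * D * W0 / (D * W0 + x0)  # r dW / (dW + x), Eq. (4)
closed_form_x = R * x0 / (D * W0 + x0)      # r x / (dW + x),  Eq. (5)
```

Raising the lottery scale x lowers RRA_W while raising RRA_x, which is exactly the divergence between the wealth measure and partial RRA discussed above.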
4. ESTIMATING A HYBRID UTILITY FUNCTION

Hybrid utility functions with more than two parameters cannot be estimated individually if within-subject data are only elicited for lotteries of two different scales. In principle, one could do the same exercise with a third parameter if subjects participate in lottery decisions of three very distinct scales. However, between-subject data can be used to estimate models with more parameters than lottery scales. The obvious disadvantage of this procedure is that one assumes a representative utility function governing the choices of all subjects. Idiosyncratic differences are then attributed to "errors" and assumed to be random.7

Holt and Laury (2002a) apply such an error-response model. They assume a representative agent with a probabilistic choice rule, where the
individual probability of choosing lottery S is given by

[EU(S)]^(1/m) / ([EU(S)]^(1/m) + [EU(R)]^(1/m))

where EU(·) is the expected utility from the respective lottery and m is an error term. For m → 0 the agent chooses the option with the higher expected utility almost certainly (rational choice). For m → ∞, the behavior approaches a 50:50 random choice. Utility is defined by a "power-expo" utility function

U(x) = [1 − exp(−a x^(1−r))]/a

This function converges to CRRA for a → 0 and to CARA for r → 0. For x, Holt and Laury insert the respective gains from lotteries. Again, they assume that initial wealth does not enter the utility function. We extend this approach by including a parameter for integrated wealth, i.e., we use the utility function

U(W, x) = [1 − exp(−a (dW + x)^(1−r))]/a    (6)
where dW is integrated wealth and x is replaced by the respective gains from lotteries. As in the previous analysis, lack of data on personal income prevents an estimation of d. Instead, we may treat dW as a parameter of the utility function that is identified. Following Holt and Laury (2002a), we estimate this model using a maximum-likelihood procedure. Table 4 reports the results of these estimates: the first row for data from decisions in real-payoff treatments by all subjects in Holt and Laury's sample, the second row for the data from Harrison et al. (2005).

Table 4. Estimated Parameters of the Error-Response Model.

                                      m                  r                  a                   dW
Data from Holt and Laury (2002b)      0.1156 (0.0063)    0.324 (0.0251)     0.0326 (0.00323)    0.189 (0.069)
Data from Harrison et al. (2005)      0.1324 (0.0100)    0.0327 (0.0441)    0.0500 (0.0056)     0.737 (0.210)
Data from Holt and Laury (2002b)      0.1315 (0.0046)    0.273 (0.0172)     0.0286 (0.00244)    –
Data from Harrison et al. (2005)      0.1726 (0.0074)    0.0050 (0.0258)    0.0459 (0.0034)     –
Rows 3 and 4 contain the estimates of Holt and Laury's model for both data sets, which has the additional restriction dW = 0. Numbers in parentheses denote standard errors. We can formally reject the hypothesis that dW = 0: p-values are 0.6% for the data from Holt and Laury and below 0.1% for the data from Harrison et al.8 However, for both data sets, the estimated amount of asset integration dW is below $1. This shows that subjects behave as if they almost neglect their wealth from other sources.

Note that for dW = 0, utility function (6) implies that partial RRA is increasing in x. On the other hand, partial RRA is decreasing in W if 0 < r < 1 and d > 0. We may conclude that partial RRA is increasing in the scale of lottery payments but not in wealth. RRA_W is zero for d = 0. This seems to imply a CRRA with respect to wealth. This conclusion is rash, though: since subjects do not integrate wealth at all, the experiment is inappropriate for measuring how risk aversion depends on wealth.

It is worth noting that the data from Harrison et al. (2005) do not support the hybrid utility function. The estimated value of r is not significantly different from 0 (all p-values are above 45%). Thus, their data are consistent with constant partial absolute risk aversion.
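A minimal implementation of this estimation setup may clarify what is being maximized. The parameter values below are the first-row Table 4 estimates; the ten-row choice vector and the log-likelihood numbers fed to the test are illustrative, not data:

```python
import math

def u_hybrid(x, a, r, dW):
    """Hybrid power-expo utility (6) at integrated wealth dW plus lottery gain x."""
    return (1.0 - math.exp(-a * (dW + x) ** (1.0 - r))) / a

def prob_safe(p, a, r, dW, m, k=1):
    """Probabilistic choice rule: Pr(S) = EU(S)^(1/m) / (EU(S)^(1/m) + EU(R)^(1/m)),
    with the Holt-Laury payoffs scaled by k."""
    eu_s = p * u_hybrid(k * 2.00, a, r, dW) + (1 - p) * u_hybrid(k * 1.60, a, r, dW)
    eu_r = p * u_hybrid(k * 3.85, a, r, dW) + (1 - p) * u_hybrid(k * 0.10, a, r, dW)
    ws, wr = eu_s ** (1.0 / m), eu_r ** (1.0 / m)
    return ws / (ws + wr)

def log_likelihood(choices, a, r, dW, m):
    """choices: (p, chose_safe) pairs pooled over subjects and price-list rows."""
    ll = 0.0
    for p, safe in choices:
        q = prob_safe(p, a, r, dW, m)
        ll += math.log(q if safe else 1.0 - q)
    return ll

def lr_pvalue(ll_unrestricted, ll_restricted):
    """Likelihood-ratio test of a single restriction such as dW = 0
    (chi-square with 1 df: P(X > s) = erfc(sqrt(s / 2)))."""
    stat = 2.0 * (ll_unrestricted - ll_restricted)
    return math.erfc(math.sqrt(stat / 2.0))

# First-row estimates from Table 4 (Holt-Laury data, dW unrestricted).
A_HAT, R_HAT, DW_HAT, M_HAT = 0.0326, 0.324, 0.189, 0.1156

# An illustrative subject who chooses S up to p = 0.6 and R afterwards.
choices = [(i / 10, i <= 6) for i in range(1, 11)]
ll_example = log_likelihood(choices, A_HAT, R_HAT, DW_HAT, M_HAT)
```

In the actual estimation, the four parameters are chosen to maximize this log-likelihood over all pooled choices; the restricted model of rows 3 and 4 fixes dW = 0, and a one-degree-of-freedom likelihood-ratio comparison as in lr_pvalue is one way to obtain a p-value for that restriction.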
5. CONCLUSION AND OUTLOOK ON FUTURE RESEARCH

The extent to which subjects integrate wealth with potential income from lottery decisions in laboratory experiments can be identified if subjects participate in lottery decisions with small and large payoffs and enter both decisions with the same wealth. To avoid order effects, these decisions should be made simultaneously, for example, by using two multiple-price lists with different scaling factors and then randomly selecting one situation for payoffs.

Although the experiment by Holt and Laury (2002a) suffers from an order effect, their within-subject data indicate that most subjects either have a decreasing RRA or integrate their wealth only to a very small extent. Within-subject data do not allow us to discriminate between these two hypotheses. Neither can the hybrid utility function given by Eq. (6), because it implies increasing RRA for d = 1. The calibrations provided by Rabin (2000) are based on full integration of wealth and do not rely on any assumptions
about increasing or decreasing risk aversion. These examples indicate that behavior in low- and medium-stake lotteries (as employed in experiments) can be reconciled with observed behavior on financial markets only if initial wealth is not fully integrated in laboratory decisions. Our estimates of a common hybrid utility function also indicate that subjects do not integrate initial wealth; partial RRA is increasing in the scale of lotteries but not in wealth. Harrison, List, and Towe (2007) apply this method to another experiment on risk aversion and report a similar result. Sasaki, Xie, Ohtake, Qin, and Tsutsui (2006) report a small experiment on sequential lottery decisions with 30 Chinese students, where neither external income, nor initial wealth, nor previous gains within the experiment have a significant impact on choices.

Andersen, Harrison, and Rutström (2006c) estimate integration of gains in sequential lottery decisions. They allow for observations being partially explained by maximizing expected utility from a CRRA utility function and partially by prospect theory. They find that about 67% of observations are better explained by maximizing expected utility. The estimated degree of CRRA is negative (indicating risk-loving behavior), and they cannot reject the hypothesis that those who maximize expected utility integrate their earnings from previous lotteries. When assuming that all subjects have a CRRA utility function, however, they find that earnings from previous lotteries are not integrated in decisions. Although they do not test the integration of initial wealth, their exercise demonstrates that non-integration results may be an artifact of assuming expected utility theory when in fact a substantial proportion of subjects follows other decision rules. The common evidence from these studies is that initial wealth from outside the laboratory is not fully integrated, but income from previous decisions within the lab may be.
Non-integration of initial wealth is also called "narrow framing" by Barberis et al. (2006). Non-integration of laboratory income from subsequent decisions has been called "myopic risk aversion" by Gneezy and Potters (1997). While the evidence for narrow framing is strong, it is still debatable whether subjects suffer from myopic risk aversion.

One possible explanation for the lack of integration of initial wealth with laboratory income can be found in mental accounting:9 subjects treat an experiment, and possibly each decision situation, as one entity for which they have an aspiration level that they try to achieve with low risk. The dark side of explaining non-integration by mental accounting is that it opens Pandora's box to context-specific explanations for all kinds of behavior and severely limits the external validity of experiments.
Subjects who do not integrate wealth treat decision situations as being to some degree independent from other (previous) decisions, even though the budget constraint connects all economic decisions. They behave as if wealth from other sources were small compared to the amounts under consideration in a particular decision situation. We have formalized this by assuming that a subject considers only a fraction d of wealth from other sources in each decision. In our analysis, we have assumed that d is a constant parameter. However, it seems perfectly reasonable that d might be higher if a subject has more reasons to consider her wealth in a particular decision. For example, Holt and Laury (2002a) observe that the number of safe choices in high-stake treatments with hypothetical earnings was significantly lower than in high-stake treatments with real earnings. This may be explained by d being higher in situations with real earnings. Andersen, Harrison, Lau, and Rutström (2006b) estimate integration of initial wealth in a power-expo utility function using data from a British television lottery show, where prizes range up to £250,000 and average earnings are above £10,000. They find that participants integrate on average an amount of £26,011, which is likely to be a substantial part of their wealth.

Narrow framing and myopic risk aversion have interesting consequences for behavior in financial markets: if a person does not integrate her wealth with the potential income from financial assets that she decides to buy or sell, she does not consider the correlation between the payoffs of different assets and evaluates each asset only by the moments of the distribution of this asset's returns. Her decision is independent from the distribution of returns from other assets, which results in a portfolio that is not optimally diversified.

It is an interesting question for future research under which circumstances subjects consider a substantial part of their wealth in decisions.
Gneezy and Potters (1997) have gone so far as to draw practical conclusions for fund managers from the observed disintegration, or "myopic behavior" as they call it. However, it is an open question whether and to what extent the framing of a decision situation raises the awareness that decisions affect final wealth. This awareness might be systematically higher for decisions in financial markets than for lottery choices in laboratory experiments. A worthwhile study could compare lottery decisions with decisions over an identical payoff structure in which the lotteries are framed as financial assets.

Another possible explanation for the disintegration of wealth in laboratory decisions is the high value that humans place on immediate rewards. Decisions in a laboratory have immediate consequences: subjects get money at the end of a session, or at least they get to know how much money they
will receive. The positive or negative feedback (depending on outcome and aspiration level) affects personal happiness, although the absolute amount of money is rather small (for k = 90 they win at most $346.50). By introspection, I would suggest that this feeling is much weaker for an unexpected increase in the value of a portfolio by the same amount. People know that immediate rewards evoke emotions, and the high degree of risk aversion exhibited in the lab may be a consequence of this foresight.10 Subjects may try to maximize a myopic utility arising from immediate feedback. To test this hypothesis, one might compare behavior in sessions where the outcome of a lottery is announced immediately after the decision with sessions in which the outcome is announced with delay. In both treatments, payments would need to be delayed by the same time interval to prevent time preference for cash from exerting a confounding effect.
NOTES

1. For a detailed description of the history of reference-dependent utility see Wakker (2005).
2. These theories are, of course, based on EUW.
3. An alternative notion of CRRA utility rescales the function by 1/|1 − r| for r ≠ 1, which does not affect the results.
4. Before participating in lottery choices, subjects participated in another unrelated experiment. It cannot be ruled out that previous earnings affected behavior in lottery choices.
5. Provided that non-satiation holds, switching back is inconsistent with expected utility theory. Andersen, Harrison, Lau, and Rutström (2006a) argue that some of these subjects may be indifferent to monetary payoffs. Switching back occurs more often in the small-stake treatment (14%) than in the large-stake treatments (9% for k = 10 and 5–6% for k = 20, 50, 90). This may be seen as evidence for indifference toward small payoffs. Another explanation would be stochastic mistakes: if each subject makes a mistake with some probability, this probability would need to be in the range of 1–2% to get the observed 90% threshold players. Then, more than 85% of non-threshold players should not make more than one mistake. However, we observe that most non-threshold players make at least two mistakes, which is at odds with the overall small number of non-threshold players, provided that the error probability is the same for all decisions.
6. A detailed analysis of the wealth levels at which observed choices are consistent with constant, increasing or decreasing RRA is provided by Heinemann (2006).
7. Personal characteristics can explain some of the data variation between subjects (Harrison et al., 2005) and reduce the estimated error rate. By using personal characteristics, the assumption of a common utility function can be replaced by a common function explaining differences in preferences or behavior. Still, one
FRANK HEINEMANN
assumes a common function and attributes all unexplained differences to errors, while within-subject data allow estimating one utility function for each subject.
8. Note, however, that p-values are underestimated because different decisions by the same subject are treated as independent observations.
9. Thaler (1999) and Rabin and Thaler (2001) provide nice surveys of these arguments. Schechter (2005, Footnote 2) provides anecdotal evidence for mental accounting.
10. For a theoretical treatment of this issue see Kreps and Porteus (1978).
ACKNOWLEDGMENTS

The author is grateful to Werner Güth, Peter Wakker, Martin Weber, two anonymous referees, and the editors of this issue, Jim Cox and Glenn Harrison, for their valuable comments.
REFERENCES

Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Elicitation using multiple price list formats. Experimental Economics, 9, 383–405.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006b). Dynamic choice behavior in a natural experiment. Working Paper no. 06-10. Department of Economics, College of Business Administration, University of Central Florida, http://www.bus.ucf.edu/wp/Working%20Papers/papers_2006.htm
Andersen, S., Harrison, G. W., & Rutström, E. E. (2006c). Dynamic choice behavior: Asset integration and natural reference points. Working Paper no. 06-07. Department of Economics, College of Business Administration, University of Central Florida, http://www.bus.ucf.edu/wp/Working%20Papers/papers_2006.htm
Arrow, K. (1971). Essays in the theory of risk-bearing. Chicago, IL: Markham Publishing Company.
Barberis, N., Huang, M., & Thaler, R. H. (2006). Individual preferences, monetary gambles, and stock market participation: A case for narrow framing. American Economic Review, 96(4), 1069–1090.
Binswanger, H. P. (1981). Attitudes toward risk: Theoretical implications of an experiment in rural India. The Economic Journal, 91, 867–890.
Cox, J. C., & Sadiraj, V. (2004). Implications of small- and large-stakes risk aversion for decision theory. Working paper prepared for workshop on Measuring Risk and Time Preferences by the Centre for Economic and Business Research in Copenhagen, June 2004.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56, 45–60.
Gneezy, U., & Potters, J. (1997). An experiment on risk taking and evaluation periods. Quarterly Journal of Economics, 112, 631–645.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95, 897–901.
Harrison, G. W., List, J. A., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study for risk aversion. Econometrica, 75, 433–458.
Heinemann, F. (2006). Measuring risk aversion and the wealth effect: Calculations, available at http://anna.ww.tu-berlin.de/~makro/Heinemann/publics/measuring-ra.html
Holt, C. A., & Laury, S. K. (2002a). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Holt, C. A., & Laury, S. K. (2002b). Risk aversion and incentive effects: Appendix, available at http://www2.gsu.edu/~ecoskl/Highdata.pdf
Johansson-Stenman, O. (2006). A note on the risk behavior and death of Homo Economicus. Working Papers in Economics no. 211. Göteborg University.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Kreps, D. M., & Porteus, E. L. (1978). Temporal resolution of uncertainty and dynamic choice theory. Econometrica, 46, 185–200.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Rabin, M. (2000). Risk aversion and expected-utility theory: A calibration theorem. Econometrica, 68, 1281–1292.
Rabin, M., & Thaler, R. H. (2001). Anomalies: Risk aversion. Journal of Economic Perspectives, 15, 219–232.
Samuelson, P. (1963). Risk and uncertainty: A fallacy of large numbers. Scientia, 98, 108–113.
Sasaki, S., Xie, S., Ohtake, F., Qin, J., & Tsutsui, Y. (2006). Experiments on risk attitude: The case of Chinese students. Discussion Paper No. 664. Institute of Social and Economic Research, Osaka University.
Schechter, L. (2007). Risk aversion and expected-utility theory: A calibration exercise. Journal of Risk and Uncertainty, 35, 67–76.
Sugden, R. (2003). Reference-dependent subjective expected utility. Journal of Economic Theory, 111, 172–191.
Thaler, R. H. (1999). Mental accounting matters. Journal of Behavioral Decision Making, 12, 183–206.
Wakker, P. P. (2005). Formalizing reference dependence and initial wealth in Rabin's calibration theorem. Working Paper. Econometric Institute, Erasmus University Rotterdam, http://people.few.eur.nl/wakker/pdf/calibcsocty05.pdf
RISK AVERSION IN THE PRESENCE OF BACKGROUND RISK: EVIDENCE FROM AN ECONOMIC EXPERIMENT

Jayson L. Lusk and Keith H. Coble

ABSTRACT

This paper investigates whether individuals' risk-taking behavior is affected by background risk by analyzing individuals' choices over a series of lotteries in a laboratory setting in the presence and absence of independent, uncorrelated background risks. Overall, our results were mixed. We found some support for the notion that individuals were more risk averse when faced with the introduction of an unfair or mean-preserving background risk than when no background risk was present, but this finding depends on how individuals incorporate endowments and background gains and losses into their utility functions and how error variance is modeled.
Characterizing individual behavior in the presence of risk is a fundamental concept in a variety of disciplines. Most risk analysis focuses on individuals' behavior when faced with single risky decisions such as whether to buy (or sell) an asset with an uncertain return, whether to purchase insurance, or

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 315–340
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00006-9
how much to pay for other forms of risk protection. However, individuals are rarely faced with a single risk. Individuals are constantly confronted with a variety of risks, some of which can be influenced and others of which are exogenous to the individual. Inevitably, an individual must make a particular risky decision before the outcomes of other exogenous or "background" risks are fully realized. Theoretical models that ignore background risk have the potential to generate biased estimates of optimal risk-taking behavior. For example, Weil (1992) argued that prices of risky assets are likely to be overestimated and equity premiums underestimated if background risks, such as risk on human capital, are not taken into consideration when calculating optimal portfolio allocations.

It seems natural to expect that changes in exogenous background risk might affect individuals' choices between risky prospects. However, there is not universal agreement on the anticipated effect of background risk on risk aversion. Based on their own intuition, Gollier and Pratt (1996) and Eeckhoudt, Gollier, and Schlesinger (1996) derived necessary and sufficient restrictions on utility such that an addition of, or increase in, background risk will cause a utility-maximizing individual to make more conservative choices in other risky situations. In contrast, Diamond (1984) investigated conditions under which individuals would find a gamble more attractive when another independent risky gamble was added to the portfolio. Quiggin (2003) showed that aversion to one risk will be reduced by the presence of an independent background risk for certain classes of non-expected utility preferences that are consistent with constant risk aversion as in Safra and Segal (1998). It is clear that theoretical expositions cannot provide an unambiguous indication of the effect of background risk on risk aversion.
It is ultimately an empirical question as to whether and how individuals' risk preferences are actually affected by background risk. Unfortunately, existing empirical evidence has provided conflicting results. For example, Heaton and Lucas (2000) found that higher levels of background risk were associated with reduced stock market participation; Guiso, Jappelli, and Terlizzese (1996) found that demand for risky assets fell as uninsurable background risks increased; Alessie, Hochguertel, and van Soest (2001) found no relationship between income uncertainty and demand for risky assets; and Arrondel and Masson (1996) found that increases in earnings risk were associated with higher levels of stock ownership. To date, such evidence has been based primarily on household survey data, which pose a variety of statistical and inferential challenges. One exception is Harrison, List, and Towe (2007), who studied background risk in the market for rare coins. They compared choices
between gambles to obtain coins of known, certified quality (i.e., low background risk) to choices between gambles to obtain coins with quality certifications removed (i.e., high background risk) and found that increasing background risk increased risk aversion.

Harrison et al. (2007) deliberately used a "naturally occurring" form of background risk by removing the quality certification of the coins. This approach provides a qualitative test of the effect of an increase in background risk on foreground risk aversion. One premise of their design, completely plausible in their context, is that virtually all subjects would view the lack of certification as adding risk to the final outcome. However, there is some subjectivity in the amount of background risk that was added in their study. For some purposes it is useful to be able to control this level of background risk explicitly and artefactually, as we do in the laboratory. One reason is that it is possible to imagine naturally occurring contexts where the lack of certification does not generate background risk with the clarity that it does in the coin market setting (e.g., payola scandals in the entertainment industry, or reviews written by film producers). Another reason is that one might want to compare utility functions incorporating final monetary outcomes including the background risk, and one cannot do that without artefactual, objective background risk treatments. Thus, this study complements the field study of Harrison et al. (2007) by studying the effect of adding an explicit background risk on risk-taking behavior in a laboratory setting.

In this study we conduct what we believe is the first laboratory experiment to investigate the effect of background risk on risk aversion; we do so by investigating how choices between gambles change when individuals are forced to play another, exogenous gamble.
Our experiments were primarily constructed to test for "risk vulnerability," as defined by Eeckhoudt et al. (1996), by analyzing the effect of adding an exogenous unfair background risk (e.g., a lottery with a negative expected value) and a mean-zero background risk on subjects' behavior in a risk preference elicitation experiment proposed by Holt and Laury (2002). Our results provide some, but far from unequivocal, support for the notion that individuals exposed to background risks behaved in a more risk-averse manner than subjects with no background risk.
RISK VULNERABILITY

Gollier and Pratt (1996) sought to determine the weakest conditions under which (p. 1110) "... adding an unfair background risk to wealth makes
risk-averse individuals behave in a more risk-averse way with respect to another independent risk." They define this condition as "risk vulnerability" because the condition ensures that an individual's (p. 1110) "willingness to bear risk is vulnerable to the introduction of another unfair risk." This condition ensures that: (a) the introduction of an unfair background risk reduces the certainty equivalent of any other independent risk (i.e., introduction of an unfair background risk reduces the demand for risky assets), and (b) a lottery is never complementary to an unfair gamble (i.e., introduction of an independent, unfair risk cannot make a previously undesirable risk become desirable). Standard risk aversion as defined by Kimball (1990) and proper risk aversion as defined by Pratt and Zeckhauser (1987) both imply risk vulnerability. In general, risk vulnerability implies that the first two derivatives of the utility function are concave transformations of the original utility function.

Eeckhoudt et al. (1996) sought to determine the conditions under which any increase in background risk would generate more risk-averse behavior. They focused on first- and second-degree stochastic dominance changes in background risk. Concerning a first-degree stochastic dominance change in background risk, Eeckhoudt et al. (1996) show that decreasing absolute risk aversion (DARA) is sufficient to guarantee that adding a negative noise to background wealth (i.e., an unfair background risk) makes people behave in a more risk-averse way; however, this condition is not sufficient for any first-degree stochastic dominance change in background risk. They also show that if the third and fourth derivatives of the utility function are negative, then a mean-preserving increase in background risk will generate more risk-averse behavior.
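The DARA sufficiency result can be illustrated numerically. The wealth level, lotteries, and square-root utility below are illustrative assumptions, not part of the authors' design: under CRRA utility (which exhibits DARA), adding an unfair background risk lowers the certainty equivalent of an independent foreground gamble, i.e., the individual behaves in a more risk-averse way.

```python
import math

def sqrt_u(x):
    # Square-root utility: CRRA with r = 0.5, hence DARA (an assumed example).
    return math.sqrt(x)

def expected_utility(lottery, wealth, u):
    return sum(p * u(wealth + x) for p, x in lottery)

def certainty_equivalent(foreground, background, wealth, u):
    """CE of the foreground gamble: the sure payment c such that receiving c
    (alongside whatever background risk is present) is as good as playing
    the foreground gamble alongside that background risk."""
    combined = [(pf * pb, xf + xb)
                for pf, xf in foreground
                for pb, xb in background]
    target = expected_utility(combined, wealth, u)
    lo, hi = 0.0, 100.0          # bisect for c
    for _ in range(100):
        c = 0.5 * (lo + hi)
        if expected_utility(background, wealth + c, u) < target:
            lo = c
        else:
            hi = c
    return 0.5 * (lo + hi)

foreground = [(0.5, 0.0), (0.5, 20.0)]   # independent foreground gamble
no_bg      = [(1.0, 0.0)]
unfair_bg  = [(0.5, -10.0), (0.5, 0.0)]  # unfair background risk

ce_without = certainty_equivalent(foreground, no_bg, 30.0, sqrt_u)
ce_with    = certainty_equivalent(foreground, unfair_bg, 30.0, sqrt_u)
```

In this example the certainty equivalent falls (from roughly 9.4 to roughly 9.2) once the unfair background risk is introduced, consistent with risk vulnerability.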
In this paper, we consider the effect of three types of background risks: none, an unfair background risk, and a mean-preserving increase in background risk. By comparing risk attitudes when subjects are exposed to an unfair background risk versus no background risk, we test the concept of risk vulnerability. Adding a mean-zero background risk (versus no background risk) constitutes a second-degree stochastic dominance change in background risk. By comparing risk attitudes when subjects are exposed to a mean-zero background risk versus no background risk, we test whether individuals have preferences consistent with those outlined in Eeckhoudt et al. (1996) regarding mean-preserving increases in risk.
EXPERIMENTAL PROCEDURES

We elicited individuals' risk attitudes following Holt and Laury (2002). Their approach, which resembles that of Binswanger (1980), entails
individuals making a series of 10 choices between two lotteries, A and B, where lottery A is the "safe" lottery and lottery B is the "risky" lottery. Table 1 reports the series of decisions subjects were asked to make in all treatments.1 For each decision, a subject chose either option A or option B. Although 10 decisions were made, only one was randomly selected as binding by rolling a 10-sided die.2 Once the binding decision was determined, the die was thrown again to determine whether the subject received the high or low payoff for the chosen gamble.

Subjects participated in one of three treatments: no background risk, mean-zero background risk, or unfair background risk. That is, our experiment involved between-subject comparisons. The treatment with no background risk was a replication of Holt and Laury's experiment with slightly different payoffs. In the two treatments involving background risk, subjects completed the decision task outlined in Table 1 prior to playing, but with full knowledge that they would play, a background risk lottery.3 That is, individuals' risk preferences were elicited via the decision task when individuals knew they would subsequently face an independent, exogenous background risk over which they had no control.

Table 1. Decision Task.

Decision | Option A                                    | Option B
1        | 10% chance of $10.00, 90% chance of $8.00   | 10% chance of $19.00, 90% chance of $1.00
2        | 20% chance of $10.00, 80% chance of $8.00   | 20% chance of $19.00, 80% chance of $1.00
3        | 30% chance of $10.00, 70% chance of $8.00   | 30% chance of $19.00, 70% chance of $1.00
4        | 40% chance of $10.00, 60% chance of $8.00   | 40% chance of $19.00, 60% chance of $1.00
5        | 50% chance of $10.00, 50% chance of $8.00   | 50% chance of $19.00, 50% chance of $1.00
6        | 60% chance of $10.00, 40% chance of $8.00   | 60% chance of $19.00, 40% chance of $1.00
7        | 70% chance of $10.00, 30% chance of $8.00   | 70% chance of $19.00, 30% chance of $1.00
8        | 80% chance of $10.00, 20% chance of $8.00   | 80% chance of $19.00, 20% chance of $1.00
9        | 90% chance of $10.00, 10% chance of $8.00   | 90% chance of $19.00, 10% chance of $1.00
10       | 100% chance of $10.00, 0% chance of $8.00   | 100% chance of $19.00, 0% chance of $1.00

In the mean-zero
treatment, after completing the decision task, each subject participated in a mean-zero lottery with a 50% chance of losing $10.00 and a 50% chance of winning $10.00. In the unfair treatment, after completing the decision task, each subject played a lottery with a 50% chance of losing $10.00 and a 50% chance of winning $0.00.

One hundred thirty undergraduate students were recruited from introductory economics and business courses by passing around sign-up sheets containing session times and dates. Upon arrival at a session (a typical session contained about 20 subjects), students were given a $10 show-up fee and were asked to complete a lengthy survey on food consumption habits. The purpose of the lengthy survey was to make subjects feel as though they had earned their show-up fee prior to participating in the risk preference elicitation experiment. After the risk preference elicitation experiment, subjects were individually paid their earnings in cash (except for the few cases in the background risk treatments where individuals owed us money, in which case subjects paid us for their losses). Sessions lasted approximately one hour.
ANALYSIS AND RESULTS

There are a variety of methods that can be used to determine risk preferences based on the choices in the experiment. Before proceeding, distinctions need to be drawn concerning the different types of analysis that can be undertaken and the different measures of risk preferences that can be calculated. First, certain analyses can be carried out where risk preferences are permitted to vary from individual to individual. That is, based on choices in the decision task, we can create measures of each individual's risk preferences. However, the decision task only permits rather crude measures of each individual's risk preferences (e.g., a range on an individual's coefficient of relative risk aversion rather than a point estimate). The second type of analysis is to estimate aggregate risk preferences in each treatment. Although this approach has the disadvantage of combining individuals with different preferences, it permits more precise estimates of risk aversion and permits us to relax the assumption of strict expected utility preferences.

In addition to this issue, we carry out our analysis using risk preferences estimated in one of two manners: (a) the initial $10 endowment and the income/loss from the background risk are explicitly assumed to enter individuals' utility functions in addition to the potential winnings from the
decision task, or (b) individual risk preferences are calculated based only on the winnings from the decision task. To illustrate, consider the expected utility of option A in decision 1 from Table 1. Under approach (a), expected utility would be calculated as 0.05U($10+$10−$10 = $10) + 0.45U($10+$8−$10 = $8) + 0.05U($10+$10+$10 = $30) + 0.45U($10+$8+$10 = $28) in the mean-zero background risk treatment, as 0.05U($10+$10−$10 = $10) + 0.45U($10+$8−$10 = $8) + 0.05U($10+$10+$0 = $20) + 0.45U($10+$8+$0 = $18) in the unfair background risk treatment, and as 0.1U($10+$10 = $20) + 0.9U($10+$8 = $18) in the no background risk treatment. In contrast, under approach (b), the expected utility for all three treatments would simply be calculated as 0.1U($10) + 0.9U($8). Although most work in economics incorporates final wealth (as opposed to income) as the argument of the utility function, expected utility theory, in and of itself, is silent regarding whether approach (a) or (b) is appropriate, and as such, we carry out our analysis both ways.

With the stage set, we now turn our attention to characterizing individuals' risk preferences. Individuals' choices in the decision task shown in Table 1 can be used to determine risk preferences. Regardless of whether income/loss from background risk is included in the utility calculation, a risk-neutral individual would choose option A for the first four decisions listed in Table 1 because the expected value of lottery A exceeds that of lottery B for the first four choices. As one moves down Table 1, the chances of winning the higher payoff increase for both options. In fact, decision 10 is a simple check of participant understanding, as subjects are simply asked to choose between $10.00 and $19.00. When completing the decision task, most individuals start with option A and at some point switch to option B, which they choose for the remainder of the decision task.
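Under approach (a), the four-outcome gambles arise from taking the product of the foreground decision-task lottery and the independent background lottery, shifted by the endowment. A minimal sketch (the square-root utility function is an illustrative assumption, not the authors' specification):

```python
import math

def combine(foreground, background, endowment):
    """Product lottery: each (prob, payoff) pair of the foreground task
    is paired with each (prob, payoff) pair of the background risk."""
    return [(pf * pb, endowment + xf + xb)
            for pf, xf in foreground
            for pb, xb in background]

def expected_utility(lottery, u):
    return sum(p * u(x) for p, x in lottery)

# Decision 1, option A: 10% chance of $10, 90% chance of $8.
option_a  = [(0.10, 10.0), (0.90, 8.0)]
mean_zero = [(0.5, -10.0), (0.5, 10.0)]  # mean-zero background risk
unfair    = [(0.5, -10.0), (0.5, 0.0)]   # unfair background risk

u = math.sqrt  # illustrative utility (CRRA with r = 0.5, an assumption)

# Approach (a): endowment and background gains/losses enter utility.
eu_mean_zero = expected_utility(combine(option_a, mean_zero, 10.0), u)
eu_unfair    = expected_utility(combine(option_a, unfair, 10.0), u)
# Approach (b): only decision-task winnings enter utility.
eu_narrow    = expected_utility(option_a, u)
```

For option A in decision 1, the mean-zero combination yields final outcomes of $10, $8, $30, and $28 with probabilities 0.05, 0.45, 0.05, and 0.45, matching the approach (a) expression above.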
Although this behavior is the norm, there was no requirement that subjects behave in such a manner; that is, individuals could choose A, then B, and then A again. As a first step in characterizing individuals' risk preferences, we follow Holt and Laury (2002) and report the number of safe choices an individual made in the decision task. In addition, we report an alternative but similar measure of risk aversion: the decision at which an individual first chose option B, the risky prospect. Although both measures provide some indication of an individual's risk preference, a more appropriate and useful approach is to analyze the range of a local measure of an individual's coefficient of absolute or relative risk aversion. Assuming subjects exhibit constant relative risk aversion (CRRA), i.e., U(x) = x^(1−rr)/(1−rr), where rr is a local measure of the coefficient of relative risk aversion, choices in the
decision task can be used to determine a range on a subject's coefficient of relative risk aversion. Coefficients corresponding to rr < 0, rr = 0, and rr > 0 are associated with risk-loving, risk-neutral, and risk-averse behavior, respectively. It is important to note that the assumption of CRRA preferences generates DARA, which, as previously mentioned, is a sufficient condition to guarantee that adding an unfair background risk increases risk aversion. Alternatively, one could assume subjects exhibit constant absolute risk aversion (CARA), i.e., U(x) = −exp(−ar·x), where ar is a local measure of the coefficient of absolute risk aversion. Risk-loving, risk-neutral, and risk-averse behavior is associated with ar < 0, ar = 0, and ar > 0, respectively. With the CARA specification, it is inconsequential whether the endowment and income/losses from the background risk are incorporated into the utility calculations or whether utility is only calculated based on earnings from the decision task; one arrives at the same estimate of ar in either case.

Turning to the individual-level results, Table 2 reports the distributions of the number of safe choices in each experimental treatment.4 In addition, Table 2 reports the range of ar and rr (for the situation where the $10 endowment and background risk gains/losses are not incorporated into the expected utility formula) corresponding to the situation where an individual starts the task by choosing option A and makes one switch to option B, which he/she chooses thereafter. Fig. 1 plots the percentage of safe choices

Table 2. Risk Aversion Classification Based on Lottery Choices.

Number of    | Range of Relative   | Range of Absolute   | No background | Mean-zero       | Unfair
Safe Choices | Risk Aversion (a)   | Risk Aversion (b)   | risk          | background risk | background risk
0–1          | rr < −0.97          | ar < −0.11          | 2.0%          | 0.0%            | 0.0%
2            | −0.97 < rr < −0.49  | −0.11 < ar < −0.06  | 0.0%          | 0.0%            | 0.0%
3            | −0.49 < rr < −0.12  | −0.06 < ar < −0.02  | 10.0%         | 0.0%            | 3.8%
4            | −0.12 < rr < 0.19   | −0.02 < ar < 0.03   | 24.0%         | 11.1%           | 20.7%
5            | 0.19 < rr < 0.49    | 0.03 < ar < 0.07    | 12.0%         | 22.2%           | 18.9%
6            | 0.49 < rr < 0.79    | 0.07 < ar < 0.11    | 24.0%         | 40.7%           | 35.9%
7            | 0.79 < rr < 1.13    | 0.11 < ar < 0.17    | 16.0%         | 18.6%           | 7.5%
8            | 1.13 < rr < 1.61    | 0.17 < ar < 0.25    | 10.0%         | 7.4%            | 7.5%
9–10         | 1.61 < rr           | 0.25 < ar           | 2.0%          | 0.0%            | 5.7%
Number of observations                                   | 50            | 27              | 53

a. Assuming U(x) = x^(1−rr)/(1−rr) and x only includes the gains from the decision task.
b. Assuming U(x) = −exp(−ar·x) and x only includes the gains from the decision task.
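The CRRA cutoffs reported in Table 2 follow from the indifference points between adjacent decisions in Table 1. A sketch that reproduces them by bisection, evaluating utility over decision-task winnings only (as in note a of Table 2):

```python
import math

def crra_u(x, r):
    """CRRA utility; the r = 1 case is log utility."""
    return math.log(x) if abs(r - 1.0) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def eu_diff(r, k):
    """EU(option A) - EU(option B) at decision k (k = 1..10), payoffs from Table 1."""
    p = k / 10.0
    eu_a = p * crra_u(10.0, r) + (1 - p) * crra_u(8.0, r)
    eu_b = p * crra_u(19.0, r) + (1 - p) * crra_u(1.0, r)
    return eu_a - eu_b

def cutoff(k, lo=-2.0, hi=3.0):
    """The rr at which a CRRA subject is indifferent at decision k;
    below the cutoff the subject prefers B, above it A (bisection)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if eu_diff(mid, k) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `cutoff(5)` is about 0.19 and `cutoff(6)` about 0.49, matching the boundaries between the 4, 5, and 6 safe-choice rows of Table 2.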
Fig. 1. Percentage of Safe Choices in Each Decision and Treatment. (Series plotted: No Background Risk; Mean-Zero Background Risk; Unfair Background Risk; Risk Neutral Behavior.)
in each of the 10 decision tasks shown in Table 1. First, it is apparent that the majority of subjects in the sample are risk averse. A risk-neutral individual would choose option A for the first four decision tasks; however, the majority of respondents chose option A five or more times. Second, there appears to be a slight treatment effect. In particular, Fig. 1 shows that the data in the mean-zero treatment lie to the right of those in the no background risk treatment for most of the decision tasks. Although the unfair background risk treatment closely paralleled the no background risk treatment for most decision tasks, a larger percentage of safe choices is observed for the final three decision tasks in the unfair background risk treatment.

Table 3 reports the mean, median, and standard deviation of the number of safe choices and the first risky choice for each treatment. On average, subjects in both background risk treatments behaved in a more risk-averse manner than subjects that did not face a background risk. The median number of safe choices and first risky choice were similar across all three treatments. Regardless of whether one focuses on the number of safe choices or the first risky choice, ANOVA tests are unable to reject the hypothesis that mean risk aversion levels were identical across treatments at any standard significance level, and Wilcoxon rank sum tests are unable to reject the hypothesis of equality of distributions across treatments at any standard significance level. Interestingly, the standard deviation of the number
Table 3. Summary Statistics of Number of Safe Choices and First Risky Choice Across Treatment.

                                             | No background | Mean-zero       | Unfair
                                             | risk          | background risk | background risk
Mean number of safe choices (a)              | 5.40          | 5.89            | 5.68
Median number of safe choices                | 6.00          | 6.00            | 6.00
Standard deviation of number of safe choices | 1.78          | 1.09            | 1.48
Average first risky choice (b)               | 6.32          | 6.70            | 6.34
Median first risky choice                    | 6.50          | 7.00            | 6.00
Standard deviation of first risky choice     | 1.81          | 1.35            | 1.75
Number of participants                       | 50            | 27              | 53

a. The p-value from an ANOVA test associated with the null hypothesis that the mean number of safe choices is equal across treatments is p = 0.38. The p-value from a Wilcoxon rank sum test of the equality of distributions of safe choices across treatments is p = 0.43.
b. The p-value from an ANOVA test associated with the null hypothesis that the mean first risky choice is equal across treatments is p = 0.60. The p-value from a Wilcoxon rank sum test of the equality of distributions of first risky choices across treatments is p = 0.46.
of safe choices and the first risky choice were greater when no background risk was present, a result that is statistically significant.

Although comparing data across treatments in Table 3 is useful for summarizing individuals' behavior, such an approach does not control for potential differences in subject-specific characteristics across treatments, nor does it explicitly incorporate the precision with which we were able to measure an individual's level of risk aversion. That is, an analysis focused solely on the number of safe choices or the first risky choice would not account for the fact that individuals who switched back and forth between options A and B contribute less information (i.e., have greater variance) regarding their risk preferences. To address both issues, we estimated interval-censored models with and without multiplicative heteroscedasticity.

Table 4 reports three models, the first two of which use interval-censored rr as the dependent variable and the last of which uses interval-censored ar as the dependent variable. The two dummy variables at the bottom of the table show the effect of background
risk on risk aversion, holding constant other subject-specific effects. The dummy variables are statistically significant only in the CRRA model in which the endowment and background risk income/loss are incorporated into the expected utility calculation, in which case both background risk treatments are associated with lower rr. This result implies that background risk increases risk-taking behavior, which is contrary to the assumptions in Gollier and Pratt (1996). However, caution should be taken in interpreting this result. First, the result may be an artifact of the fact that the final monetary outcomes of the treatment without background risk did not span the range of final outcomes in the treatments with background risk. As a result, an individual who chose option A for the first seven decision tasks, for example, would have a lower bound on rr of about 0.94 in both background risk treatments, but an identical individual who chose option A for the first seven tasks in the no background risk treatment would have a lower bound on rr of 2.04; that is, exactly the same choices generate different estimates of rr. Second, the models in Table 4 do not control for heteroscedasticity that might arise due to differences in variance across treatments or other explanatory variables.

Table 5 reports results from interval-censored models with multiplicative heteroscedasticity. Once heteroscedasticity is taken into account, we find that subjects in the mean-zero background risk treatment behaved in a more risk-averse manner (i.e., exhibited higher levels of rr and ar) than individuals who were not exposed to a background risk, according to the CRRA model that did not incorporate income from the background risk and the CARA model.
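The mechanics of an interval-censored likelihood with multiplicative heteroscedasticity can be sketched as follows. This is an illustration of the general technique, not the authors' exact specification, and the parameter values are made up: each subject's switch point implies an interval for rr, the likelihood contribution is the normal probability mass assigned to that interval, and the standard deviation is scaled by an exponential function of covariates such as a treatment dummy.

```python
import math

def norm_cdf(z):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def interval_loglik(lo, hi, mean, sigma):
    """Log-likelihood contribution of one subject whose implied rr
    lies in [lo, hi] under a normal model with the given mean and sigma."""
    return math.log(norm_cdf((hi - mean) / sigma) - norm_cdf((lo - mean) / sigma))

def sigma(treated, gamma0=-0.5, gamma1=-0.8):
    # Multiplicative heteroscedasticity: sigma = exp(z'gamma).
    # The gamma values here are purely illustrative.
    return math.exp(gamma0 + gamma1 * treated)

# A subject with six safe choices implies 0.49 < rr < 0.79 (Table 2 range).
ll_close = interval_loglik(0.49, 0.79, mean=0.6, sigma=sigma(0))
ll_far   = interval_loglik(0.49, 0.79, mean=2.0, sigma=sigma(0))
```

Maximizing the sum of such contributions over subjects yields the coefficient and scale estimates; a mean parameter near the observed interval fits far better than one outside it.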
In both of these specifications, a similar result was obtained for the unfair background risk treatment, although it was less significant (p = 0.15).5 In the CRRA model that incorporates the endowment and background risk gains/losses into the utility calculation, neither of the background risk treatments was statistically significant. One interesting result from Table 5, which is not addressed by theory, is that both background risk treatments generated less variability around measured levels of rr and ra than when no background risk was present. Although the interval-censored models have appealing features in that they utilize individual estimates of risk preference and permit a straightforward way to control for subject-specific effects across treatments, there are some drawbacks. In particular, the estimates rest on the assumption of a particular functional form for the utility function, CRRA or CARA. Further, the models do not permit one to determine whether individuals have non-expected utility preferences.6 To address both issues, we used the choices in the decision task to estimate a variety of preference functionals by
Table 4. Effect of Background Risk on Risk Aversion: Interval-Censored Regressions.

                                            Relative Risk Aversion Models(a)              Absolute Risk
                                            Income not integrated(c)  Income integrated(d)  Aversion Model(b)
Constant                                    0.028 (0.374)             0.787 (0.746)         0.004 (0.054)
Gender (1 = female; 0 = male)               0.010 (0.107)             0.034 (0.212)         0.005 (0.016)
Age (age in years)                          0.014 (0.014)             0.016 (0.028)         0.002 (0.002)
Freshman (1 = freshman; 0 = otherwise)      0.0008 (0.173)            0.125 (0.341)         0.001 (0.025)
Sophomore (1 = sophomore; 0 = otherwise)    0.172 (0.129)             0.309 (0.258)         0.025 (0.019)
Junior (1 = junior; 0 = otherwise)          0.132 (0.124)             0.189 (0.249)         0.019 (0.018)
Employment (1 = not employed;
  0 = employed at least part time)          0.035 (0.100)             0.059 (0.199)         0.007 (0.015)
Income (annual income from all sources)     0.037 (0.020)             0.051 (0.039)         0.006 (0.003)
Race (1 = white; 0 = otherwise)             0.211 (0.146)             0.229 (0.264)         0.033 (0.021)
Mean-zero background risk (1 = mean-zero
  background risk treatment; 0 = otherwise) 0.150 (0.133)             0.519 (0.229)         0.018 (0.019)
Unfair background risk (1 = unfair
  background risk treatment; 0 = otherwise) 0.142 (0.114)             0.501 (0.120)         0.019 (0.017)
Scale                                       0.509 (0.034)             1.020 (0.069)         0.074 (0.005)

Note: Numbers in parentheses are standard errors; *, **, and *** represent statistical significance at the 0.10, 0.05, and 0.01 levels, respectively; log-likelihood function values are −224.81, −220.83, and −224.08, respectively, for the three models above.
(a) Dependent variable is the range of individuals' coefficient of relative risk aversion; number of observations = 130.
(b) Dependent variable is the range of individuals' coefficient of absolute risk aversion; number of observations = 130.
(c) Only the earnings from the decision task are incorporated into the expected utility formula used to calculate the CRRA intervals.
(d) The $10 endowment and background risk gains/losses are incorporated into the expected utility formula used to calculate the CRRA intervals.
Table 5. Effect of Background Risk on Risk Aversion: Interval-Censored Regressions with Multiplicative Heteroscedasticity.

                           RRA Model: Income from          RRA Model: Income from          Absolute Risk
                           Background Risk Not             Background Risk                 Aversion Model
                           Incorporated(a)                 Incorporated(b)
                           Mean           Variance         Mean           Variance         Mean           Variance
Constant                   0.782 (0.451)  2.011 (0.648)    1.303 (0.789)  0.863 (0.751)    0.116 (0.064)  3.921 (0.646)
Gender                     0.086 (0.106)  0.618 (0.189)    0.048 (0.134)  0.578 (0.257)    0.013 (0.015)  0.623 (0.197)
Age                        0.016 (0.019)  0.057 (0.025)    0.019 (0.034)  0.053 (0.031)    0.002 (0.003)  0.055 (0.024)
Freshman                   0.044 (0.117)  0.466 (0.308)    0.017 (0.179)  0.731 (0.355)    0.008 (0.017)  0.503 (0.320)
Sophomore                  0.092 (0.121)  0.392 (0.239)    0.113 (0.187)  0.396 (0.292)    0.011 (0.017)  0.440 (0.241)
Junior                     0.128 (0.107)  0.054 (0.208)    0.106 (0.163)  0.002 (0.232)    0.019 (0.016)  0.093 (0.209)
Employment                 0.005 (0.080)  0.424 (0.185)    0.081 (0.106)  0.328 (0.222)    0.001 (0.012)  0.451 (0.188)
Income                     0.043 (0.011)  0.094 (0.041)    0.047 (0.017)  0.072 (0.042)    0.006 (0.002)  0.094 (0.040)
Race                       0.226 (0.106)  0.578 (0.256)    0.300 (0.121)  0.445 (0.260)    0.032 (0.015)  0.599 (0.262)
Mean-zero background risk  0.263 (0.100)  0.779 (0.236)    0.199 (0.232)  1.734 (0.324)    0.036 (0.014)  0.800 (0.243)
Unfair background risk     0.432 (0.212)  0.311 (0.235)    1.367 (0.242)  0.147 (0.097)    0.021 (0.014)  0.419 (0.220)

Note: Numbers in parentheses are standard errors; *, **, and *** represent statistical significance at the 0.10, 0.05, and 0.01 levels, respectively; log-likelihood function values are −205.69, −203.96, and −205.63, respectively, for the three models above.
(a) Dependent variable is the range of individuals' coefficient of relative risk aversion; number of observations = 130; only the earnings from the decision task are incorporated into the expected utility formula used to calculate the CRRA intervals.
(b) Dependent variable is the range of individuals' coefficient of relative risk aversion; number of observations = 130; the $10 endowment and background risk gains/losses are incorporated into the expected utility formula used to calculate the CRRA intervals.
treatment. To carry out this task, an individual is assumed to choose option A if the difference in (rank-dependent) expected utility between options A and B exceeds zero. Adding a mean-zero, normally distributed error term to this difference produces a familiar probit specification. Because the
utility functions we estimate have the properties that U(0) = 0 and that A is chosen if EU(A) − EU(B) > 0, these normalizations allow us to directly estimate the standard deviation of the error in the probit such that the utility coefficients are directly interpretable and comparable across treatments. All the model specifications we consider can be derived from the following (rank-dependent) expected utility preference function:

    RDEU = Σ_{i=1}^{N} π_i [1 − exp(−α x_i^(1−r))]/α    (1)
where N is the number of outcomes (x_i) from a lottery and the x_i are ordered such that x_1 > x_2 > … > x_N. The utility function is the ''power expo'' function used in Holt and Laury (2002), for which the Pratt–Arrow coefficient of relative risk aversion is r + α(1−r)x^(1−r). If r = 0, the utility function exhibits CARA of degree α. If α = 0, the utility function exhibits CRRA, where r is the coefficient of relative risk aversion. Thus, the utility function nests constant relative and constant absolute risk aversion as special cases. In Eq. (1), π_i is a ''decision weight'' that takes the form of rank dependence such as that proposed by Quiggin (1982): π_i = w(p_1 + … + p_i) − w(p_1 + … + p_{i−1}), where p_i is the probability of obtaining x_i and w(p) is a probability weighting function, which we assume to take the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ). If γ = 1, the weighting function is linear and π_i = p_i, which implies that Eq. (1) reduces to expected utility. For values of γ < 1, individuals overweight low-probability events and underweight medium-to-high-probability events. Table 6 reports utility function and probability weighting function estimates for each experimental treatment assuming α = 0 (CRRA) and further assuming that individuals do not incorporate their endowment and background risk gains/losses into utility calculations. The first three columns of results assume expected utility theory is the appropriate model of behavior by fixing γ = 1, whereas the last three columns directly estimate γ. In addition to the probit estimates, the last two rows of Table 6 report results from unconditional interval-censored models for the sake of comparability. Assuming linear probability weighting, we find results very similar to those presented in Tables 4 and 5. The coefficient of CRRA is higher in the two background risk treatments than in the treatment without background risk, although the 95% confidence intervals overlap. We also find lower variance in the background risk treatments.
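The decision-weight construction can be sketched in a few lines. This is an illustrative implementation of the weighting function and rank-dependent weights just defined (names are ours, not the authors' code), with outcome probabilities listed from best outcome to worst:

```python
def w(p, gamma):
    """Probability weighting function w(p) = p^g / [p^g + (1-p)^g]^(1/g)."""
    return p ** gamma / (p ** gamma + (1.0 - p) ** gamma) ** (1.0 / gamma)

def decision_weights(probs, gamma):
    """Rank-dependent weights pi_i = w(p1+...+pi) - w(p1+...+p_{i-1}),
    where probs lists outcome probabilities from best outcome to worst."""
    weights, cum_prev = [], 0.0
    for p in probs:
        cum = cum_prev + p
        weights.append(w(cum, gamma) - w(cum_prev, gamma))
        cum_prev = cum
    return weights

# With gamma = 1 the weights reduce to the raw probabilities (expected
# utility); with gamma < 1 the 10% chance of the best outcome is overweighted.
eu_weights  = decision_weights([0.1, 0.9], 1.0)
rdu_weights = decision_weights([0.1, 0.9], 0.7)
```

Because the weights telescope, they always sum to w(1) = 1, so Eq. (1) remains a proper weighted average of the outcome utilities.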
We also note that the probit and interval-censored specifications generate nearly identical results. The last three columns in Table 6 allow for non-linear probability
Table 6. Preference Function Estimates by Background Risk Treatment, Ignoring Endowment and Income/Loss from Background Risk (Assuming α = 0)(a).

                 Models Assuming Linear Probability Weighting            Models Assuming Nonlinear Probability Weighting
                                                                         and Rank Dependence(b)
                 No background      Mean-zero          Unfair            No background       Mean-zero           Unfair
                 risk(c)            background risk(d) background risk(e) risk(c)            background risk(d)  background risk(e)

Probit model estimates
σ/2(f)           0.60 [0.36, 0.83](g) 0.33 [0.21, 0.45]  0.51 [0.38, 0.64]  0.62 [0.19, 1.05]   0.47 [0.08, 1.02]   0.61 [0.41, 0.81]
rr               0.46 [0.31, 0.61]    0.60 [0.48, 0.72]  0.55 [0.43, 0.68]  0.27 [−0.60, 1.14]  0.08 [−1.02, 1.19]  0.55 [0.38, 0.71]
γ                1                    1                  1                  0.70 [0.00, 1.41]   0.56 [0.11, 0.99]   1.32 [0.95, 1.69]

Interval-censored model estimates
σ(f)             0.62 [0.49, 0.75]    0.33 [0.23, 0.43]  0.51 [0.40, 0.62]
rr               0.46 [0.29, 0.63]    0.62 [0.49, 0.75]  0.57 [0.42, 0.72]

(a) Utility function takes the form U(x) = x^(1−rr)/(1−rr), where rr is the coefficient of relative risk aversion and x are the prizes in the decision task in Table 1.
(b) Probability weighting function is of the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ).
(c) Sample size = 50 in the interval-censored model and 50 individuals × 10 choices = 500 in the probit model.
(d) Sample size = 27 in the interval-censored model and 27 individuals × 10 choices = 270 in the probit model.
(e) Sample size = 53 in the interval-censored model and 53 individuals × 10 choices = 530 in the probit model.
(f) σ is the standard deviation of the error term in the model.
(g) Numbers in brackets are 95% confidence intervals.
weighting. For the no background risk and mean-zero background risk treatments, estimates of γ are less than one and are consistent with previously published estimates, which range from 0.56 to 0.71 (e.g., Camerer & Ho, 1994; Tversky & Kahneman, 1992; Wu & Gonzalez, 1996), although the mean-zero background risk treatment is the only treatment for which the 95% confidence interval for γ does not include one. Once probability weighting is taken into account, we are no longer able to reject the hypothesis that individuals' utility functions are linear in the no background risk and mean-zero background risk treatments; however, individuals in the unfair background risk treatment still exhibit risk aversion after probability weighting is taken into account.
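The probit estimation idea behind these tables can be sketched as follows. This is a simplified illustration for the CRRA, γ = 1 case ignoring the endowment, using the $10/$8 and $19/$1 decision-task payoffs; the authors' actual estimation code is available on ExLab, and the names below are ours:

```python
import math

def crra_u(x, rr):
    """CRRA utility with U(0) = 0 for rr < 1; log form at rr = 1."""
    return math.log(x) if abs(rr - 1.0) < 1e-9 else x ** (1.0 - rr) / (1.0 - rr)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def log_likelihood(choices, rr, sigma):
    """Probit log-likelihood: P(choose A) = Phi((EU_A - EU_B) / sigma).

    choices: (p_high, chose_a) pairs over the ten decision rows, where
    option A pays $10/$8 and option B pays $19/$1.
    """
    ll = 0.0
    for p, chose_a in choices:
        eu_a = p * crra_u(10.0, rr) + (1.0 - p) * crra_u(8.0, rr)
        eu_b = p * crra_u(19.0, rr) + (1.0 - p) * crra_u(1.0, rr)
        p_a = norm_cdf((eu_a - eu_b) / sigma)
        p_a = min(max(p_a, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        ll += math.log(p_a if chose_a else 1.0 - p_a)
    return ll

# A subject with rr near 0.5 switches to option B at decision 7:
choices = [(k / 10.0, k <= 6) for k in range(1, 11)]
```

Maximizing this likelihood over (rr, σ), choice pattern by choice pattern, yields point estimates and confidence intervals of the kind reported in the probit rows of Table 6.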
Table 7. Preference Function Estimates by Background Risk Treatment, Ignoring Endowment and Income/Loss from Background Risk(a).

                 Models Assuming Linear Probability Weighting              Models Assuming Nonlinear Probability Weighting
                                                                           and Rank Dependence(b)
                 No background      Mean-zero          Unfair              No background       Mean-zero           Unfair
                 risk(c)            background risk(d) background risk(e)  risk(c)             background risk(d)  background risk(e)

Probit model estimates
σ/2(f)           0.51 [0.47, 0.54](g) 0.47 [0.45, 0.48]  0.51 [0.49, 0.53]   0.55 [0.50, 0.60]   0.59 [−0.67, 1.85]  0.58 [0.29, 0.87]
r                0.23 [0.09, 0.37]    0.03 [−0.12, 0.18] 0.18 [0.10, 0.26]   0.16 [0.04, 0.28]   0.11 [−0.99, 0.77]  0.20 [0.10, 0.30]
α                0.07 [0.02, 0.12]    0.09 [0.01, 0.11]  0.09 [0.06, 0.12]   0.02 [−0.19, 0.23]  0.02 [−0.05, 0.09]  0.09 [0.05, 0.13]
γ                1                    1                  1                   0.70 [0.01, 1.41]   0.56 [0.09, 1.03]   1.32 [0.95, 1.69]

(a) Utility function takes the form U(x) = [1 − exp(−α x^(1−r))]/α, where r is the coefficient of relative risk aversion, α is the coefficient of absolute risk aversion, and x are the prizes in the decision task in Table 1.
(b) Probability weighting function is of the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ).
(c) Sample size = 50 individuals × 10 choices = 500.
(d) Sample size = 27 individuals × 10 choices = 270.
(e) Sample size = 53 individuals × 10 choices = 530.
(f) σ is the standard deviation of the error term in the model.
(g) Numbers in brackets are 95% confidence intervals.
Table 7 reports estimates similar to those in Table 6 except that the restriction α = 0 is relaxed. Overall, our estimates are similar to those of Holt and Laury (2002), who estimated α = 0.27 and r = 0.03, which implies increasing relative risk aversion and decreasing absolute risk aversion (DARA). Although the point estimates reveal differences in behavior across the three experimental treatments, the 95% confidence intervals overlap for every parameter of interest, regardless of whether we assume linear probability weighting. Table 8 reports utility function estimates assuming α = 0 (CRRA) and that individuals incorporate their endowment and background risk gains/losses into utility calculations. In addition to the probit estimates, we also present results from the simple interval-censored models for comparison. As in Table 4, we find higher levels of risk aversion in the no background risk treatment than in the two treatments that incorporated background risk; however, the 95% confidence intervals overlap.
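The IRRA/DARA reading of such estimates follows directly from the power-expo curvature measure quoted earlier, relative risk aversion r + α(1−r)x^(1−r), since absolute risk aversion is relative risk aversion divided by the stake x. A small numerical check, using the printed point estimates purely for illustration:

```python
def rra(x, r, a):
    """Relative risk aversion of the power-expo function: r + a(1-r)x^(1-r)."""
    return r + a * (1.0 - r) * x ** (1.0 - r)

def ara(x, r, a):
    """Absolute risk aversion is relative risk aversion divided by x."""
    return rra(x, r, a) / x

# With a > 0 and 0 < r < 1, relative risk aversion rises with the stakes
# while absolute risk aversion falls (illustrative values, not a re-estimation):
r, a = 0.03, 0.27
```

Evaluating at, say, x = 1 and x = 5 shows rra increasing and ara decreasing, which is the increasing-relative, decreasing-absolute pattern described above.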
Table 8. Preference Function Estimates by Background Risk Treatment, Incorporating $10 Endowment and Income/Losses from Background Risk (Assuming α = 0)(a).

                 Models Assuming Linear Probability Weighting             Models Assuming Nonlinear Probability Weighting
                                                                          and Rank Dependence(b)
                 No background      Mean-zero          Unfair             No background       Mean-zero           Unfair
                 risk(c)            background risk(d) background risk(e) risk(c)             background risk(d)  background risk(e)

Probit model estimates
σ/2(f)           0.10 [0.02, 0.22]    0.35 [0.23, 0.47](g) 0.62 [0.58, 0.64]  0.30 [−1.31, 1.92]  1.12 [0.81, 3.05]   0.65 [0.39, 0.91]
rr               1.15 [0.76, 1.54]    0.73 [0.61, 0.86]    0.69 [0.55, 0.83]  0.67 [−1.54, 2.88]  0.01 [−1.34, 1.36]  0.74 [0.56, 0.92]
γ                1                    1                    1                  0.70 [0.00, 1.40]   0.48 [0.11, 1.07]   1.45 [1.10, 1.80]

Interval-censored model estimates
σ(f)             1.58 [1.25, 1.91]    0.35 [0.24, 0.46]    0.57 [0.45, 0.69]
rr               1.24 [0.79, 1.69]    0.74 [0.60, 0.88]    0.66 [0.50, 0.82]

(a) Utility function takes the form U(x) = x^(1−rr)/(1−rr), where rr is the coefficient of relative risk aversion and x are final wealth states including the $10 endowment and the income from the background risk lotteries.
(b) Probability weighting function is of the form w(p) = p^γ/[p^γ + (1−p)^γ]^(1/γ).
(c) Sample size = 50 in the interval-censored model and 50 individuals × 10 choices = 500 in the probit model.
(d) Sample size = 27 in the interval-censored model and 27 individuals × 10 choices = 270 in the probit model.
(e) Sample size = 53 in the interval-censored model and 53 individuals × 10 choices = 530 in the probit model.
(f) σ is the standard deviation of the error term in the model.
(g) Numbers in brackets are 95% confidence intervals.
CONCLUSION Whether and to what extent preferences for risk are affected by background risk carries important implications for economic analysis. If subjects are significantly influenced by background risk, economic analyses must move beyond studies of behavior in single, isolated risky situations, which can almost never be expected to arise in practice. For example, changes in public policy that limit risk-taking behavior in one domain might generate seemingly counterintuitive results by increasing risk-taking behavior in other domains.
We found little support for the notion that individuals made choices consistent with risk vulnerability as defined by Gollier and Pratt (1996). We found that a mean-preserving increase in background risk had a stronger influence on risk aversion than the addition of an unfair background risk. Individuals that were forced to play a lottery with a 50% chance of winning $10 and a 50% chance of losing $10 behaved in a more risk-averse manner than individuals that were not exposed to such a lottery. However, this finding depends on: (1) how individuals incorporate endowments and background gains and losses into their utility functions and (2) how error variance is modeled. It is also important to note that much of the risk-averse behavior in this treatment may arise from non-linear probability weighting. We found weak evidence that individuals may weight probabilities differently in the unfair background risk treatment than in the other treatments; only for this treatment were we able to reject the hypothesis of linear probability weighting. Finally, we found that background risk, whether mean-preserving or unfair, generated less variable estimates of the coefficients of relative and absolute risk aversion than when no background risk was present, although some of this effect dissipates when we allow for non-linear probability weighting. Although previous theoretical work has generated plausible signs on the effect of background risk on risk-taking behavior, it is silent regarding the distribution of risk preferences with and without background risk. In general, however, we found that the effect of background risk on risk preferences was not particularly large in this experiment. There may be a variety of factors contributing to this result, some relating to experimental design issues and others that are farther reaching. Regarding the experimental design, future work on this issue might consider using a more precise risk-elicitation approach.
Although the decision task shown in Table 1 is easy for subjects to complete, it identifies only a range of plausible risk preferences. To the extent that background risks have only a small effect on risk-taking behavior, a more refined elicitation tool is required to measure the effect. Future experiments might also vary the range of earnings in the no background risk treatment. In our experiment, the final monetary outcomes of the treatment without background risk did not span the range of final outcomes in the treatments with background risk. Aside from experimental design issues, other factors might be related to our finding that background risk has a small to no effect on risk-taking behavior. First, experimental subjects may bring a number of background risks with them into the experiment. If so, non-experimental background risks might swamp the effect of experimentally induced background risk.
Future laboratory research investigating the effect of background risk on risk preferences might focus on methods for measuring and controlling for other ''field'' background risks. Second, the sample of respondents might have been heterogeneous with regard to preferences; some individuals might have been expected utility maximizers, while others might have had generalized expected utility preferences. If some portion of the sample had generalized expected utility preferences, the results in Quiggin (2003) suggest these individuals would behave in a less risk-averse way when confronted with a background risk, which would tend to dampen the aggregate results presented here. A risk preference elicitation approach that permitted a test of expected utility for each individual would be able to sort out these issues. Finally, behavioral research suggests that when confronted with several risky choices, individuals tend to assess each risky choice in isolation rather than assessing all risks jointly (Benartzi & Thaler, 1995; Kahneman & Lovallo, 1993; Read, Loewenstein, & Rabin, 1999). This behavior might cause individuals to at least partially disregard background risks when making endogenous risky decisions. Such behavior would cause background risk to have a smaller effect on risk preferences than predicted by the models of Gollier and Pratt (1996) or Quiggin (2003). Given the implications of background risk, our results clearly suggest that this is a research area meriting further experimental work to test these alternative theories.
NOTES
1. The values in Table 1 are roughly five times the baseline treatment used by Holt and Laury (2002).
2. Because only 1 of the 10 choices was picked at random, there is some background risk present in all treatments; however, this particular background risk is constant across all treatments.
3. Instructions for each treatment are in the appendix.
4. All data and computer code used to generate the results in this paper are available on ExLab (http://exlab.bus.ucf.edu).
5. We are able to reject the joint hypothesis that rr is unaffected by mean-zero and unfair background risks at the p = 0.05 level of statistical significance for the CRRA model without income from background risk. A similar result (p = 0.06) is obtained for ra.
6. As shown by Harrison (2006), allowing for non-EU preferences can have a substantive impact on the interpretation of results. Harrison (2006) showed that while there are significant differences in behavior between real and hypothetical treatments, some non-EU models suggest that the difference arises due to changes in the probability weighting function and not due to changes in the utility or value function.
7. Note to the reader: Strictly speaking, a 10-sided die cannot be constructed to provide an exact uniform distribution; the 10-sided die gives an approximately equal chance of each decision being binding.
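The random-lottery payment mechanism referenced in Notes 2 and 7 (two die throws: one to pick the binding decision, one to resolve the chosen option) can be simulated in a few lines. This is an illustrative sketch of the procedure described in the appendix instructions, not software used in the experiment:

```python
import random

# (high, low) payoffs for each option; decision k pays `high` when the
# second throw is 1..k and `low` otherwise.
PAYOFFS = {"A": (10.0, 8.0), "B": (19.0, 1.0)}

def resolve_earnings(choices, rng):
    """choices: list of 'A'/'B' for decisions 1..10."""
    decision = rng.randint(1, 10)     # first throw: which decision counts
    high, low = PAYOFFS[choices[decision - 1]]
    throw = rng.randint(1, 10)        # second throw: resolve the option
    return high if throw <= decision else low

rng = random.Random(0)
choices = ["A"] * 6 + ["B"] * 4       # safe for rows 1-6, risky thereafter
earnings = resolve_earnings(choices, rng)
```

Note that if decision 10 is drawn, `throw <= decision` always holds, so each option pays its high prize for sure, matching the instructions' statement that the die is not needed for that row.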
ACKNOWLEDGMENTS The authors would like to thank Glenn Harrison, Jason Shogren, and Eric Rasmusen for their helpful comments on the previous version of this paper.
REFERENCES
Alessie, R., Hochguertel, S., & van Soest, A. (2001). Household portfolios in the Netherlands. In: L. Guiso, M. Haliassos & T. Jappelli (Eds), Household portfolios. Cambridge, MA: MIT Press.
Arrondel, L., & Masson, A. (1996). Gestion du risque et comportements patrimoniaux. Economie et Statistique, 296–297, 63–89.
Benartzi, S., & Thaler, R. H. (1995). Myopic loss aversion and the equity premium puzzle. Quarterly Journal of Economics, 110, 73–93.
Binswanger, H. P. (1980). Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62, 396–407.
Camerer, C., & Ho, T. (1994). Violations of the betweenness axiom and nonlinearity in probability. Journal of Risk and Uncertainty, 8, 167–196.
Diamond, D. W. (1984). Financial intermediation and delegated monitoring. Review of Economic Studies, 51, 393–414.
Eeckhoudt, L., Gollier, C., & Schlesinger, H. (1996). Changes in background risk and risk taking behavior. Econometrica, 64, 683–689.
Gollier, C., & Pratt, J. W. (1996). Risk vulnerability and the tempering effect of background risk. Econometrica, 64, 1109–1123.
Guiso, L., Jappelli, T., & Terlizzese, D. (1996). Income risk, borrowing constraints and portfolio choice. American Economic Review, 86, 158–172.
Harrison, G. W. (2006). Hypothetical bias over uncertain outcomes. In: J. A. List (Ed.), Using experimental methods in environmental and resource economics. Northampton, MA: Elgar.
Harrison, G. W., List, J. A., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75, 433–458.
Heaton, J., & Lucas, D. (2000). Portfolio choice in the presence of background risk. Economic Journal, 110, 1–26.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92, 1644–1655.
Kahneman, D., & Lovallo, D. (1993). Timid choices and bold forecasts: A cognitive perspective on risk taking. Management Science, 39, 17–32.
Kimball, M. S. (1990). Precautionary savings in the small and in the large. Econometrica, 58, 53–73.
Pratt, J. W., & Zeckhauser, R. (1987). Proper risk aversion. Econometrica, 55, 143–154.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3, 323–343.
Quiggin, J. (2003). Background risk in generalized expected utility theory. Economic Theory, 22, 607–611.
Read, D., Loewenstein, G., & Rabin, M. (1999). Choice bracketing. Journal of Risk and Uncertainty, 19, 171–197.
Safra, Z., & Segal, U. (1998). Constant risk aversion. Journal of Economic Theory, 83, 19–42.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5, 297–323.
Weil, P. (1992). Equilibrium asset prices with undiversifiable labor income risk. Journal of Economic Dynamics and Control, 16, 769–790.
Wu, G., & Gonzalez, R. (1996). Curvature of the probability weighting function. Management Science, 42, 1676–1690.
APPENDIX. EXPERIMENT INSTRUCTIONS Beginning Instructions – Common to All Three Treatments Thank you for agreeing to participate in today's session. Before beginning today's exercise, I have two requests. First, you should sit some distance from any of the other participants. Second, other than questions directed toward me, there is to be NO talking. Failure to comply with the no talking policy will result in immediate disqualification from this exercise. Before we begin, I want to emphasize that your participation in this session is completely voluntary. If you do not wish to participate in the experiment, please say so at any time. Non-participants will not be penalized in any way. I want to assure you that the information you provide will be kept strictly confidential and used only for the purposes of this research. At this time, you should have been given a consent form. Please sign this form and return it to me. Now, you will be given $10.00 and a packet with two separate documents. The $10.00 is yours to keep and it has been provided to compensate you for your time. In the upper right hand corner of the documents is an ID number. This ID number is used to ensure confidentiality. In today's session, you will participate in two exercises. First, I would like you all to look at the document titled ''Survey on Consumer Opinions.'' At this time, take the next 20–30 min to complete the survey. When you complete the survey, then we will proceed to the second exercise, which will be explained after everyone has completed the survey. Are there any questions before we begin? {to be read after completion of the survey}
Has everyone completed the survey? Please return the completed survey to me. Now, you will participate in an exercise where you will have the opportunity to earn money. You will be asked to make several choices, which will determine how much money you will earn. Please turn your attention to the second document you have been given, which is titled, ‘‘Decision Record Sheet.’’ Instructions for the No-Background Risk Treatment Your decision sheet shows ten decisions numbered one to ten on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices (either A or B) and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your ten choices, please let me explain how these choices will affect your earnings for the experiment. Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, either A or B. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end.7 Now, please look at Decision 1 at the top. Option A pays $10.00 if the throw of the ten-sided die is 1, and it pays $8.00 if the throw is 2–10. Option B yields $19.00 if the throw of the die is 1, and it pays $1.00 if the throw is 2–10. Similarly, for Decision 2, Option A will pay $10.00 if the throw of the die is 1 or 2 and will pay $8.00 if the throw of the die is 3–10. The other decisions are similar, except that as you move down the table, the chances of the higher payoff for each option increase. 
In fact, for Decision 10 in the bottom row, the die will not be needed since each option pays the highest payoff for sure, so your choice here is between $10.00 or $19.00. To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your money earnings for the option you chose for that decision. Earnings for this choice will be paid in cash when we finish.
So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B in each of the ten boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings. Then you will write your earnings in the blank at the bottom of the page. Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question. Instructions for Mean-Zero Background Risk Treatment Your decision sheet shows ten decisions numbered one to ten on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices (either A or B) and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your ten choices, please let me explain how these choices will affect your earnings for the experiment. Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, either A or B. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end. Now, please look at Decision 1 at the top. Option A pays $10.00 if the throw of the ten-sided die is 1, and it pays $8.00 if the throw is 2–10. Option B yields $19.00 if the throw of the die is 1, and it pays $1.00 if the throw is 2–10. 
Similarly, for Decision 2, Option A will pay $10.00 if the throw of the die is 1 or 2 and will pay $8.00 if the throw of the die is 3–10. The other decisions are similar, except that as you move down the table, the chances of the higher payoff for each option increase. In fact, for Decision 10 in the bottom row, the die will not be needed since each option pays the highest payoff for sure, so your choice here is between $10.00 or $19.00. To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your money
earnings for the option you chose for that decision. Earnings for this choice will be paid in cash when we finish. So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B in each of the ten boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings. Then you will write your earnings from the Decision Task in the first blank at the bottom of the page marked ''Earnings from Decision Task.'' Are there any questions about the Decision Task before the next part of this exercise is explained? After your earnings from the Decision Task are determined, you will participate in a lottery. In this lottery, there is a 50% chance of losing $10.00 and a 50% chance of winning $10.00. So, after your earnings from the Decision Task are determined, while we are still at your desk, we will roll the die again. If the throw of the die is 1–5, you will lose $10.00, but if the throw of the die comes up 6–10, you will earn $10.00. After your earnings from the lottery are determined, you will write this amount on the second blank at the bottom of the page marked ''Earnings from Lottery.'' Total earnings for the experiment are determined by adding ''Earnings from Decision Task'' and ''Earnings from Lottery.'' Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question. Instructions for Unfair Background Risk Treatment Your decision sheet shows ten decisions numbered one to ten on the left. Each decision is a paired choice between ''Option A'' and ''Option B.'' You will make ten choices (either A or B) and record these in the final column, but only one of them will be used in the end to determine your earnings.
Before you start making your ten choices, please let me explain how these choices will affect your earnings for the experiment. Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, either A or B. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end. Now, please look at Decision 1 at the top. Option A pays $10.00 if the throw of the ten-sided die is 1, and it pays $8.00 if the throw is 2–10.
Risk Aversion in the Presence of Background Risk
339
Option B yields $19.00 if the throw of the die is 1, and it pays $1.00 if the throw is 2–10. Similarly, for Decision 2, Option A will pay $10.00 if the throw of the die is 1 or 2 and will pay $8.00 if the throw of the die is 3–10. The other decisions are similar, except that as you move down the table, the chances of the higher payoff for each option increase. In fact, for Decision 10 in the bottom row, the die will not be needed since each option pays the highest payoff for sure, so your choice here is between $10.00 and $19.00. To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your money earnings for the option you chose for that decision. Earnings for this choice will be paid in cash when we finish. So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B, in each of the ten boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings. Then you will write your earnings from the Decision Task in the first blank at the bottom of the page marked ‘‘Earnings from Decision Task.’’ Are there any questions about the Decision Task before the next part of this exercise is explained? After your earnings from the Decision Task are determined, you will participate in a lottery. In this lottery, there is a 50% chance of losing $10.00 and a 50% chance of winning $0.00. So, after your earnings from the Decision Task are determined, while we are still at your desk, we will roll the die again.
If the throw of the die is 1–5, you will lose $10.00, but if the throw of the die comes up 6–10, you will earn $0.00. After your earnings from the lottery are determined, you will write this amount on the second blank at the bottom of the page marked ‘‘Earnings from Lottery.’’ Total earnings for the experiment are determined by adding ‘‘Earnings from Decision Task’’ and ‘‘Earnings from Lottery.’’ Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question.
Participant Number _______________

Decision Record Sheet

Decision Task

Decision   Option A                                     Option B                                     Which Option is Preferred?
1          10% chance of $10.00, 90% chance of $8.00    10% chance of $19.00, 90% chance of $1.00
2          20% chance of $10.00, 80% chance of $8.00    20% chance of $19.00, 80% chance of $1.00
3          30% chance of $10.00, 70% chance of $8.00    30% chance of $19.00, 70% chance of $1.00
4          40% chance of $10.00, 60% chance of $8.00    40% chance of $19.00, 60% chance of $1.00
5          50% chance of $10.00, 50% chance of $8.00    50% chance of $19.00, 50% chance of $1.00
6          60% chance of $10.00, 40% chance of $8.00    60% chance of $19.00, 40% chance of $1.00
7          70% chance of $10.00, 30% chance of $8.00    70% chance of $19.00, 30% chance of $1.00
8          80% chance of $10.00, 20% chance of $8.00    80% chance of $19.00, 20% chance of $1.00
9          90% chance of $10.00, 10% chance of $8.00    90% chance of $19.00, 10% chance of $1.00
10         100% chance of $10.00, 0% chance of $8.00    100% chance of $19.00, 0% chance of $1.00

Earnings from Decision Task   $_______________
Earnings from Lottery         $_______________
Total Earnings                $_______________
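The record sheet above has the standard multiple-price-list structure, so the row at which a risk-neutral subject should switch from Option A to Option B can be checked with simple expected-value arithmetic. The following sketch is ours, not part of the original instructions; it only uses the payoffs printed on the record sheet:

```python
# Expected payoff of each option in decision row n: the chance of the
# higher payoff is n/10, with Option A paying $10.00 or $8.00 and
# Option B paying $19.00 or $1.00.
def expected_values(n):
    p = n / 10.0
    ev_a = p * 10.00 + (1 - p) * 8.00
    ev_b = p * 19.00 + (1 - p) * 1.00
    return ev_a, ev_b

for n in range(1, 11):
    ev_a, ev_b = expected_values(n)
    print(f"Decision {n:2d}: EV(A) = {ev_a:5.2f}, EV(B) = {ev_b:5.2f}")
```

Option B first has the higher expected value at Decision 5, so a risk-neutral subject would choose A in rows 1–4 and B thereafter; switching to B later than row 5 indicates risk aversion.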
RISK AVERSION IN LABORATORY ASSET MARKETS

Peter Bossaerts and William R. Zame

ABSTRACT

This paper reports findings from a series of laboratory asset markets. Although stakes in these markets are modest, asset prices display a substantial equity premium (risky assets are priced substantially below their expected payoffs) – indicating substantial risk aversion. Moreover, the differences between expected asset payoffs and asset prices are in the direction predicted by standard asset-pricing theory: assets with higher beta have higher returns. This work suggests ways to separate the effects of risk aversion from competing explanations in other experimental environments.
1. INTRODUCTION

Forty years of econometric tests have provided only weak support for the predictions of asset-pricing theories (see Davis, Fama, & French, 2000, for instance). However, it is difficult to know where the problems in such models lie, or how to improve them, because basic parameters of the theories – including the market portfolio, the true distribution of asset returns, and the information available to investors – cannot be observed in the
Risk Aversion in Experiments Research in Experimental Economics, Volume 12, 341–358 Copyright r 2008 by Emerald Group Publishing Limited All rights of reproduction in any form reserved ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00007-0
historical record. Laboratory tests of these theories are appealing because these basic parameters (and others) can be observed accurately – or even controlled. However, most asset-pricing theories rest on the assumption that individuals are risk averse.1 Because risks and rewards in laboratory experiments are (almost of necessity) small (in comparison to subjects’ lifetime wealth, or even current wealth), the degree of risk aversion observable in the laboratory might be so small as to be undetectable in the unavoidable noise, which would present an insurmountable problem. This paper reports findings from a series of laboratory asset markets that belie this concern: despite relatively small risks and rewards, the effects of risk aversion are detectable and significant. Most obviously, observed asset prices imply a significant equity premium: risky assets are priced significantly below their expected payoffs. Moreover, the differences between expected asset payoffs and returns (payoffs per unit of investment) are in the direction predicted by standard asset-pricing theory: assets with higher beta have higher returns. In our laboratory markets, 30–60 subjects trade one riskless and two risky securities (whose dividends depend on the state of nature) and cash. Each experiment is divided into 6–9 periods. At the beginning of each period, subjects are endowed with a portfolio of securities and cash. During the period, subjects trade through a continuous, web-based open-book system (a form of double auction that keeps track of infra-marginal bids and offers). After a pre-specified time, trading halts, the state of nature is drawn, and subjects are paid according to their terminal holdings. The entire situation is repeated in each period but the state of nature is drawn anew at the end of each period. 
Subjects know the dividend structure (the payoff of each security in each state of nature) and the probability that each state will occur, and of course they know their own holdings and their own attitudes toward wealth and risk. They also have access to the history of orders and trades. Subjects do not know the number of participants in any given experiment, nor the holdings of other participants, nor the market portfolio. Typical earnings in a single experiment (lasting more than 2 h) are $50–100 per subject. Although this is a substantial wage for some subjects, it is small in comparison to lifetime wealth, or indeed to current wealth (the pool of subjects consists of undergraduates and MBA students). Small rewards suggest approximately risk-neutral behavior, asset prices nearly coincident with expected payoffs, little incentive to trade, and hence little trade at all. However, our experimental data are inconsistent with these implications of risk neutrality; rather the data suggest significant risk aversion. Most obviously, substantial trade takes place and market prices are below expected
returns; moreover, assets with higher beta have higher returns/lower prices (as predicted by standard asset-pricing theories). Quantitative measures of risk aversion are provided by the Sharpe ratios of the market portfolio, which are in the range 0.2–1.7 – on the same order as the Sharpe ratio of the New York Stock Exchange (NYSE; computed on the basis of yearly data), which is 0.43 – and the imputed market risk aversion derived from CAPM, which is approximately 10⁻³. Following this introduction, Section 2 describes our experimental asset markets, and Section 3 presents the data generated by these experiments and the relationship of these data to standard asset-pricing theories. Section 4 suggests implications of our experiments for the design and interpretation of other experiments where risk aversion may play a role, and concludes.
2. EXPERIMENTAL DESIGN In our laboratory markets the objects of trade are assets (state-dependent claims to wealth at the terminal time) A, B, N (Notes), and Cash. Notes are riskless and can be held in positive or negative amounts (can be sold short); assets A and B are risky and can only be held in non-negative amounts (cannot be sold short). Each experimental session of approximately 2 h is divided into 6–9 periods, lasting 15–20 min. (The length of the period is determined and announced to subjects in advance. Within each period, subject computers show time remaining.) At the beginning of a period, each subject (investor) is endowed with a portfolio of assets and Cash; the endowment of risky assets and Cash are non-negative, the endowment of Notes is negative (representing a loan that must be repaid). During the period, the market is open and assets may be traded for Cash. Trades are executed through an electronic open book system (a continuous double auction). During the period, while the market is open, no information about the state of nature is revealed, and no credits are made to subject accounts; in effect, consumption takes place only at the close of the market. At the end of each period, the market closes, the state of nature is drawn, payments on assets are made, and dividends are credited to subject accounts. (In some experiments, subjects were also given a bonus upon completion of the experiment.) Accounting in these experiments is in a fictitious currency called francs, to be exchanged for dollars at the end of the experiment at a pre-announced exchange rate. Subjects whose cumulative earnings at the end of a period are not sufficient to repay their loan are bankrupt; subjects who are bankrupt
for two consecutive trading periods are barred from trading in future periods.2 In effect, therefore, consumption in a given period can be negative. Subjects know their own endowments, and are informed about asset payoffs in each of the three states of nature X, Y, Z, and of the objective probability distribution over states of nature. We use two treatments of uncertainty. In the first treatment, states of nature for each period are drawn independently with probabilities 1/3, 1/3, 1/3; randomization is achieved by using a random number generator or by drawing with replacement from an urn containing equal numbers of balls representing each state. In the second treatment, balls, marked with the state, are drawn without replacement from an urn initially containing 18 balls, 6 for each state.3 (In each treatment, subjects are informed of the procedure.) Asset payoffs are shown in Table 1 (1 unit of Cash is 1 franc in each state of nature), and the remaining parameters for each experiment are shown in Table 2. (Experiments are identified by year-month-day.) In all experiments, subjects were given complete instructions, including descriptions of some portfolio strategies (but no suggestions as to which strategies to choose). Complete instructions and other details are available at http://eeps3.caltech.edu/market-011126; use anonymous login, ID 1, and password a. Subjects are not informed of the endowments of others, or of the market portfolio (the social endowment of all assets), or the number of subjects, or whether these are the same from one period to the next. The information provided to subjects parallels the information available to participants in stock markets such as the NYSE and the Paris Bourse.
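Under the second treatment, each draw changes the composition of the urn and hence the state probabilities subjects face in subsequent periods. A minimal sketch of that bookkeeping (our illustration, not the experimental software):

```python
from fractions import Fraction

# 18 balls, 6 per state; drawing without replacement shifts the state
# probabilities from period to period.
def state_probabilities(remaining):
    total = sum(remaining.values())
    return {state: Fraction(count, total) for state, count in remaining.items()}

urn = {"X": 6, "Y": 6, "Z": 6}
print(state_probabilities(urn))   # each state has probability 1/3

urn["X"] -= 1                     # suppose state X is drawn in period 1
print(state_probabilities(urn))   # X: 5/17, Y: 6/17, Z: 6/17
```

Under the first (with-replacement) treatment the probabilities would instead stay at 1/3 every period.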
We are especially careful not to provide information about the market portfolio, so that subjects cannot easily deduce the nature of aggregate risk – lest they attempt to use a standard model (such as CAPM) to predict prices, rather than to take observed prices as given. Keep in mind that neither general equilibrium theory nor asset-pricing theory requires that participants have any more information than is provided in these experiments. Indeed, much of the power of these theories comes precisely from the fact that agents know only market prices and their own preferences and endowments.

Table 1. Asset Payoffs.

State     X     Y     Z
A       170   370   150
B       160   190   250
N       100   100   100
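Under the independent-draw treatment (equal state probabilities), the expected payoff of each security follows directly from Table 1; a quick check of the arithmetic (ours, not the authors' code):

```python
# Expected payoff of each security when states X, Y, Z are equally
# likely (payoffs from Table 1).
payoffs = {
    "A": [170, 370, 150],
    "B": [160, 190, 250],
    "N": [100, 100, 100],
}

expected_payoff = {sec: sum(divs) / len(divs) for sec, divs in payoffs.items()}
print(expected_payoff)   # {'A': 230.0, 'B': 200.0, 'N': 100.0}
```

These expectations (230 francs for A, 200 for B, 100 for Notes) are the risk-neutral price predictions against which observed prices are compared in Section 3.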
Table 2. Experimental Parameters.

Date      Subject Pool(a)  Draw Type(b)  Subject Category (Number)  Bonus Reward (franc)  Endowment A  Endowment B  Endowment Notes(c)  Cash (franc)  Exchange Rate ($/franc)
98-10-07  Yale       I     30     0      4    4    19      400    0.03
98-11-16  UCLA       I     23     0      5    4    20      400    0.03
                           21     0      2    7    20      400    0.03
99-02-11  Yale       I      8     0      5    4    20      400    0.03
                           11     0      2    7    20      400    0.03
99-04-07  Stanford   I     22     175    9    1    25      400    0.03
                           22     175    1    9    24      400    0.04
99-11-10  Tulane     I     33     175    5    4    22      400    0.04
                           30     175    2    8    23.1    400    0.04
99-11-11  Berkeley   I     22     175    5    4    22      400    0.04
                           23     175    2    8    23.1    400    0.04
01-11-14  Caltech    D     21     125    5    4    22      400    0.04
                           12     125    2    8    23.1    400    0.04
01-11-26  Sofia      D     18     125    5    4    22      400    0.04
                           18     125    2    8    23.1    400    0.04
01-12-05  Caltech    D     17     125    5    4    22      400    0.04
                           17     125    2    8    23.1    400    0.04

(a) Place where subjects attended college.
(b) I, states are drawn independently across periods; D, states are drawn without replacement, starting from a population of 18 balls, 6 of each type (state).
(c) As discussed in the text, endowment of Notes includes loans to be repaid at the end of the period.
Keep in mind that the social endowment (the market portfolio), the distribution of endowments, and the set of subjects and hence preferences differ across experiments. Indeed, because preferences may be affected by earnings during the experiment, the possibility of bankruptcy, and the time to the end of the experiment, preferences may even be different across periods in the same experiment. Because equilibrium prices and choices depend on all of these, and because of the inevitable noise present in every experiment, there is every reason to expect equilibrium prices and choices to be different across experiments or even across different periods in a given experiment. Most of the subjects in these experiments had some knowledge of economics in general and of financial economics in particular. In one experiment (01-11-26), subjects were mathematics undergraduates at the University of Sofia (Bulgaria), and were perhaps less knowledgeable about economics and finance. The experiments reported here were conducted between 1998 and 2001. More recently, we have used a different trading platform, which, among other things, avoids bankruptcy issues. Bossaerts, Meloso, and Zame (2006) report data that replicate the features that we document here, in particular risk aversion.
3. FINDINGS

Because all trading is done through a computerized continuous double auction, we can observe and record every transaction – indeed, every offer – but we focus on end-of-period prices: that is, the prices of the last transaction in each period.4 Because no uncertainty is resolved while the market is open, it is natural to organize the data using a static model of asset trading: investors trade assets before the state of nature is known, assets yield dividends and consumption takes place after the state of nature is revealed (see Arrow & Hahn, 1971 or Radner, 1972).5 Because Notes and Cash are both riskless, we simplify slightly and treat them as redundant assets.6 We therefore model our environment as involving trade in risky assets A, B, and one riskless asset N (Notes). Assets are claims to consumption in each of the three possible states of nature X, Y, Z. Write $\mathrm{div}\,A$ for the state-dependent dividends of asset A, $\mathrm{div}\,A(s)$ for dividends in state $s$, and so forth. If $\theta = (\theta_A, \theta_B, \theta_N) \in \mathbb{R}^3$ is a portfolio of assets, we write $\mathrm{div}\,\theta = \theta_A(\mathrm{div}\,A) + \theta_B(\mathrm{div}\,B) + \theta_N(\mathrm{div}\,N)$ for the state-dependent dividends on the portfolio $\theta$.
There are $I$ investors, each characterized by an endowment portfolio $\omega^i = (\omega^i_A, \omega^i_B, \omega^i_N) \in \mathbb{R}^2_+ \times \mathbb{R}$ of risky and riskless assets, and a strictly concave, strictly monotone utility function $U^i : \mathbb{R}^3 \to \mathbb{R}$ defined over state-dependent terminal consumptions. (To be consistent with our experimental design, we allow consumption to be negative but we require holdings of A, B to be non-negative.) Investors care only about consumption, so given asset prices $q$, investor $i$ chooses a portfolio $\theta^i$ to maximize $U^i(\mathrm{div}\,\theta^i)$ subject to the budget constraint $q \cdot \theta^i \le q \cdot \omega^i$.

An equilibrium consists of asset prices $q \in \mathbb{R}^3_{++}$ and portfolio choices $\theta^i \in \mathbb{R}^2_+ \times \mathbb{R}$ for each investor such that:

choices are budget feasible: for each $i$, $q \cdot \theta^i \le q \cdot \omega^i$;

choices are budget optimal: for each $i$ and each $\varphi \in \mathbb{R}^2_+ \times \mathbb{R}$, $U^i(\mathrm{div}\,\varphi) > U^i(\mathrm{div}\,\theta^i) \Rightarrow q \cdot \varphi > q \cdot \omega^i$;

asset markets clear: $\sum_{i=1}^{I} \theta^i = \sum_{i=1}^{I} \omega^i$.
In the following sections, we show, first, that observed prices are generally below risk-neutral prices, which implies risk aversion; second, that risk aversion is systematic; third, that the effects of risk aversion can be quantified; and fourth, that risk aversion can be estimated.
3.1. Risk-neutral Pricing and Observed Pricing

Risk neutrality for investor $i$ means that $U^i(x) = E(x)$ (where the expectation is taken with respect to the true probabilities). If all investors are risk neutral then (normalizing so that the price of Cash is 1 and the price of Notes is 100), the unique equilibrium price is the risk-neutral price $q^* = (E(A), E(B), E(N)) = (E(A), E(B), 100)$. Table 3 displays end-of-period prices in 72 periods across 9 experiments: the end-of-period price of asset A is below its expectation in 64 periods,
Table 3. End-of-Period Transaction Prices.

Date      Sec(a)   Period 1    2         3         4         5         6         7         8         9
98-10-07  A    220/230(b) 216/230   215/230   218/230   208/230   205/230
          B    194/200    197/200   192/200   192/200   193/200   195/200
          N(c) 95(d)      98        99        97        99        99
98-11-16  A    215(e)     203       210       211       185       201
          B    187        194       195       193       190       185
          N    99         100       98        100       100       99
99-02-11  A    219        230       220       201       219       230       240
          B    190        183       187       175       190       180       200
          N    96         95        95        98        96        99        97
99-04-07  A    224        210       205       200       201       213       201       208
          B    195        198       203       209       215       200       204       220
          N    99         99        100       99        99        99        99        99
99-11-10  A    203        212       214       214       210       204
          B    166        172       180       190       192       189
          N    96         97        97        99        98        101
99-11-11  A    225        217       225       224       230       233       215       209
          B    196        200       181       184       187       188       188       190
          N    99         99        99        99        99        99        99        99
01-11-14  A    230/230    207/225   200/215   210/219   223/223   226/228   233/234   246/242   209/228
          B    189/200    197/203   197/204   200/207   189/204   203/208   211/212   198/208   203/210
          N    99         99        99        99        99        99        99        98        99
01-11-26  A    180/230    175/222   195/226   183/217   200/220   189/225   177/213   190/219
          B    144/200    190/201   178/198   178/198   190/201   184/197   188/198   175/193
          N    93         110       99        100       98        99        102       99
01-12-05  A    213/230    212/235   228/240   205/231   207/237   232/242   242/248   255/257   229/246
          B    195/200    180/197   177/194   180/194   172/190   180/192   190/195   185/190   185/190
          N    99         100       99        99        99        99        99        99        100

(a) Security.
(b) End-of-period transaction price/expected payoff.
(c) Notes.
(d) For Notes, end-of-period transaction prices only are displayed. Payoff equals 100.
(e) End-of-period transaction prices only are displayed. Expected payoffs are as in 98-10-07. Same for 99-02-11, 99-04-07, 99-11-10, and 99-11-11.
equal to its expectation in 5 periods, above its expectation in 3 periods; the end-of-period price of asset B is below its expectation in 64 periods, equal to its expectation in 3 periods, above its expectation in 5 periods. Indeed, in many experiments, all or nearly all transactions take place at a price below the asset expectation. For example, Fig. 1 records all the purchases/sales of assets throughout the eight periods of an experiment conducted on November 26, 2001: all of the more than 500 trades of the risky assets take place at a price below the assets’ expected payoffs. Two aspects of the data deserve further discussion. As may be seen from Fig. 1 and Table 3, Notes – which are riskless – may sell at a substantial discount throughout a trading period. As Bossaerts and Plott (2004) discuss, this discount is the effect of the cash-in-advance constraint imposed by the trading mechanism. Because trades require cash, subjects who wish to purchase a risky asset must either sell the other risky asset or sell Notes. This puts downward pressure on the pricing of all assets. However, because Notes
Fig. 1. Transaction Prices in Experiment 01-11-26. (Vertical axis: prices in francs; horizontal axis: time in seconds; series shown for A, B, and Notes.)
can be sold short, while risky assets cannot, there is greater downward pressure on the pricing of Notes than on the pricing of other assets. However, because there is downward pressure on the pricing of risky assets, it is useful to have an additional test that the discounts at which they sell reflect risk aversion and not solely this downward pressure. Such a test is readily available, because we have two risky securities, with correlated final payoffs. In particular, CAPM predicts that the security with the lower beta (lower covariance of final payoff with the market portfolio) will have lower expected returns, and hence will be priced at a lower discount relative to expected payoff. Inspection of Fig. 1 provides suggestive evidence for this: the discount for security B is generally less than that for security A; in experiment 01-11-26, it is precisely security B that had the lower beta. In the next two sections, we provide a systematic study of the relationship between discounts and betas. As mentioned before, prices within a period generally start out low and increase toward the end. This is most pronounced for the Notes, but the phenomenon occurs for the risky securities as well (in Fig. 1, one can detect it in all periods except the first one). Again, the cash-in-advance constraint may explain this drift – subjects first obtain cash by selling securities early on, and the subsequent execution of buy orders puts upward pressure on prices. An alternative explanation for the drift in prices of risky securities comes from out-of-equilibrium trading. Bossaerts (2006) shows that such a drift obtained in a world where subjects only attempt to trade in locally optimal directions. Local optimization makes sense when subjects cannot execute large orders without affecting prices, and when it is hard to put any prior on possible future price movements for lack of knowledge of the structure of the economy (number of traders, preferences of traders, endowments, etc.). 
Importantly, this explanation builds on risk aversion; under risk neutrality, the drift would disappear. As such, the upward pressure on prices of risky securities during a period could be attributed to risk aversion as well as the cash-in-advance constraint. This dual possibility requires that we provide an independent test of the importance of risk aversion, to which we now turn.
3.2. Prices and Betas Section 3.1 shows that asset prices are below risk-neutral prices, which implies risk aversion on the part of subjects. To see that the effect of risk aversion is systematic, we examine expected returns and asset betas.
Recall that the market portfolio is the social endowment of all assets:
$$M = \sum_{i=1}^{I} \omega^i$$
The beta of a portfolio $\theta$ is the ratio of the covariance of $\theta$ with the market portfolio to the variance of the market portfolio:
$$\beta(\theta) = \frac{\operatorname{cov}(\mathrm{div}\,\theta,\ \mathrm{div}\,M)}{\operatorname{var}(\mathrm{div}\,M)}$$
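With the Table 1 payoffs and an illustrative per capita market portfolio of one unit each of A and B (the actual market portfolios varied across experiments; see Table 2), the betas can be computed directly. A sketch of the arithmetic, ours rather than the authors' code:

```python
# Betas of the risky assets against a hypothetical market portfolio
# M = A + B, with states X, Y, Z equally likely (payoffs from Table 1).
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

div_A = [170, 370, 150]
div_B = [160, 190, 250]
div_M = [a + b for a, b in zip(div_A, div_B)]

beta_A = cov(div_A, div_M) / cov(div_M, div_M)
beta_B = cov(div_B, div_M) / cov(div_M, div_M)
print(round(beta_A, 3), round(beta_B, 3))   # 0.957 0.043
```

Asset A's beta far exceeds asset B's, consistent with the text; the two betas sum to 1 here only because this illustrative M is exactly A + B.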
Given prices $q$, the expected rate of return of a portfolio $\theta$ is $E(\mathrm{div}\,\theta / q \cdot \theta)$. Most asset-pricing theories predict that assets with higher betas should have higher expected rates of return. (For example, the Capital Asset Pricing Model (CAPM) predicts $E(\mathrm{div}\,\theta / q \cdot \theta) - 1 = \beta(\theta)\,[E(\mathrm{div}\,M / q \cdot M) - 1]$.) In our laboratory markets, asset A always has higher beta than asset B so should have higher expected rate of return. Fig. 2 plots the difference in
Fig. 2. Differences of Betas versus Differences of Expected Returns. (Horizontal axis: difference in beta; vertical axis: difference in expected return.)
expected rates of return (expected rate of return of A minus expected rate of return of B) against the difference in betas (beta of A minus beta of B) for all 67 observations (all periods of all experiments).7 As the reader can see, the difference in expected rate of return is positive roughly 75% of the time. Applying a binomial test to the data yields a z-score of 8, so the correlation is very unlikely to be accidental.
3.3. Sharpe Ratios

The data discussed above show that asset prices in our laboratory asset markets reflect significant risk aversion; Sharpe ratios provide a useful way to quantify the effect of this risk aversion. Given asset prices $q$, the excess rate of return is the difference between the rate of return on $\theta$ and the rate of return on the riskless asset. In our context, the rate of return on the riskless asset is 1, so the excess rate of return on the portfolio $\theta$ is $E[\mathrm{div}\,\theta / q \cdot \theta] - 1$. By definition, the Sharpe ratio of $\theta$ is the ratio of its excess return to its volatility:
$$\mathrm{Sh}(\theta) = \frac{E[\mathrm{div}\,\theta / q \cdot \theta] - 1}{\sqrt{\operatorname{var}(\mathrm{div}\,\theta / q \cdot \theta)}}$$
In particular, the Sharpe ratio of the market portfolio $M$ is
$$\mathrm{Sh}(M) = \frac{E[\mathrm{div}\,M / q \cdot M] - 1}{\sqrt{\operatorname{var}(\mathrm{div}\,M / q \cdot M)}}$$
If investors were risk neutral, asset prices would equal expected dividends, so the numerator would be 0, and the Sharpe ratio of the market portfolio (indeed of every portfolio) would be 0. Roughly speaking, increasing risk aversion leads to lower equilibrium prices and hence to a higher Sharpe ratio (as we see below, CAPM leads to a precise statement), so the Sharpe ratio is a quantitative – although indirect – measure of market risk aversion. As Fig. 3 shows, except for one outlier, Sharpe ratios in our laboratory markets are in the range 0.2–1.7, clustering in the range 0.4–0.6. For comparison, recall that the Sharpe ratio of the market portfolio of stocks traded on the NYSE (computed on yearly data) is about 0.43. (Keep in mind that risks and rewards on the NYSE are enormously greater than in our experiments, so similar Sharpe ratios do not translate precisely into similar risk attitudes.)
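For concreteness, the Sharpe ratio of a market portfolio holding one unit of each risky asset can be computed at illustrative end-of-period prices of the kind shown in Table 3; the prices q_A = 215 and q_B = 187 below are our example values, not data from a particular period:

```python
import math

# Sharpe ratio of the portfolio (1, 1) in A and B at example prices,
# with states equally likely (payoffs from Table 1).
div_A = [170, 370, 150]
div_B = [160, 190, 250]
q_A, q_B = 215.0, 187.0

cost = q_A + q_B
gross_returns = [(a + b) / cost for a, b in zip(div_A, div_B)]

mean_r = sum(gross_returns) / len(gross_returns)
var_r = sum((r - mean_r) ** 2 for r in gross_returns) / len(gross_returns)

sharpe = (mean_r - 1.0) / math.sqrt(var_r)
print(round(sharpe, 2))   # 0.29, inside the reported 0.2-1.7 range
```

Raising the prices toward the expected payoffs (230 and 200) drives this ratio to zero, which is the risk-neutral benchmark described above.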
Fig. 3. Sharpe Ratios: All Periods, All Experiments. (Market Sharpe ratio by period for each of the nine experiments.)
3.4. CAPM

An alternative approach to quantifying the risk aversion in our laboratory markets is to use a particular asset-pricing model to impute the market risk aversion. The CAPM of Sharpe (1964) is particularly well-suited to this exercise. CAPM can be derived from various sets of assumptions on primitives. For our purposes, assume that each investor's utility for risky consumption depends only on the mean and variance; specifically, investor $i$'s utility function for state-dependent wealth $x$ is
$$U^i(x) = E(x) - \frac{b^i}{2}\operatorname{var}(x)$$
where expectations and variances are computed with respect to the true probabilities, and $b^i$ is absolute risk aversion. We assume throughout that risk aversion is sufficiently small that the utility functions $U^i$ are strictly monotone
in the range of feasible consumptions, or at least observed consumptions. Because we allow consumption to be negative, and individual endowments are portfolios of assets, this is enough to imply that CAPM holds.8 To formulate the pricing conclusion of CAPM, write $m = \sum_i (\omega^i_A, \omega^i_B)$ for the market portfolio of risky assets, and $\bar{m} = m/I$ for the per capita portfolio of risky assets. Write $\mu = (E(A), E(B))$ for the vector of expected dividends of risky assets,
$$\Delta = \begin{pmatrix} \operatorname{cov}[A,A] & \operatorname{cov}[A,B] \\ \operatorname{cov}[B,A] & \operatorname{cov}[B,B] \end{pmatrix}$$
for the covariance matrix of risky assets, and
$$\Gamma = \left( \frac{1}{I} \sum_{i=1}^{I} \frac{1}{b^i} \right)^{-1}$$
for the market risk aversion. Write $p = (p_A, p_B)$ for the vector of prices of risky assets. The pricing conclusion of CAPM is that the equilibrium price of risky assets is given by the formula
$$\tilde{p} = \mu - \Gamma \Delta \bar{m}$$
In our setting, we know equilibrium prices, expected dividends, asset dividends and true probabilities, hence the covariance matrix, and the per capita market portfolio but not individual risk aversions. If CAPM pricing held exactly, we could impute the market risk aversion by solving the pricing formula for $\Gamma$. In our experiments, CAPM pricing does not hold exactly (see Bossaerts, Plott, and Zame (2007) for discussion of the distance of actual pricing to CAPM pricing), but we can impute market risk aversion as the best-fitting $\Gamma$. Several possible notions of ‘‘best-fitting’’ might be natural; we use Generalized Least Squares, where weights are based on the dispersion of individual holdings from the market portfolio; this is an economic measure of distance used and discussed in more detail in Bossaerts et al. (2007). This approach generates a direct estimate of the harmonic average risk aversion of the subjects, as opposed to individual estimates of the risk aversion coefficients, from which the harmonic mean could be computed. Fig. 4 shows the imputed market risk aversion for all periods in all experiments.
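As an illustration of the imputation, Γ can be recovered from the pricing formula by regressing the price discounts on the covariance-weighted portfolio. The sketch below uses ordinary least squares and made-up inputs (prices of 215 and 187 francs, a per capita risky portfolio of one unit of each asset), whereas the paper uses Generalized Least Squares with economically motivated weights:

```python
# Impute market risk aversion Gamma from p = mu - Gamma * Delta * m_bar
# (states equally likely; payoffs from Table 1; prices and m_bar assumed).
def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])

div_A = [170, 370, 150]
div_B = [160, 190, 250]

mu = [mean(div_A), mean(div_B)]                   # expected dividends
Delta = [[cov(div_A, div_A), cov(div_A, div_B)],
         [cov(div_B, div_A), cov(div_B, div_B)]]  # covariance matrix
m_bar = [1.0, 1.0]                                # per capita risky portfolio (assumed)
p = [215.0, 187.0]                                # observed prices (assumed)

# x = Delta * m_bar; fit (mu - p) = Gamma * x by ordinary least squares
x = [sum(Delta[i][j] * m_bar[j] for j in range(2)) for i in range(2)]
y = [mu[i] - p[i] for i in range(2)]
gamma = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print(f"{gamma:.2e}")   # on the order of 10**-3
```

With these assumed inputs the imputed Γ falls in the same 10⁻³ range as the estimates plotted in Fig. 4.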
Note that there is considerable variation across experiments, and even within a given experiment; as we have noted earlier, subject preferences certainly vary across experiments and may even vary within a given experiment.
Fig. 4. Imputed Market Risk Aversion: All Periods, All Experiments. (Vertical axis: estimated risk aversion, in units of 10⁻³; one series per experiment.)
4. CONCLUSION We have argued here that the effects of risk aversion in laboratory asset markets are observable and significant, the observed effects are in the direction predicted by theory, and these effects are quantifiable. A crucial feature of our experimental design is that two risky assets are traded, so that the realization of uncertainty has two separate – but correlated – effects. It is this correlation that makes it possible to make quantitative inferences about the effects of risk aversion. In particular, willingness to pay for either risky asset depends on the price of the other risky asset and on the correlation between asset payoffs. (This is perhaps the central insight of CAPM.) In particular, if asset payoffs are negatively correlated, holding a portfolio of both assets (diversifying) is less risky than holding either asset separately, and more risk averse bidders should be willing to pay more to purchase a portfolio of both assets. Manipulation of the correlation between asset payoffs can therefore provide a rich variety of
PETER BOSSAERTS AND WILLIAM R. ZAME
choices, enabling the experimenter to better determine to what extent risk aversion influences behavior. These insights also suggest an approach to other laboratory settings in which risk aversion may play a role. For example, Harrison (1990) argues that deviations of observed behavior from theoretical predictions in laboratory tests of auction theory may be interpreted in a number of different ways: as failures of the theory, or as effects of risk aversion of bidders, or as effects of bidders' (possibly incorrect) beliefs about the risk aversion of other bidders. It seems possible that these competing explanations might be disentangled by auctioning two prizes whose payoffs are risky but correlated, and by manipulating the correlation between values. In particular, it seems that bidders' own risk aversion should drive up bids for prizes whose payoffs are negatively correlated (in comparison to bids for prizes whose payoffs are positively correlated). Because correlated risk is central to our work, it is less closely connected to laboratory and naturally occurring experiments concerning gambles in the presence of background risk (Lusk & Coble, 2006; Harrison, List, & Towe, 2007).
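The diversification argument here can be checked with a one-line variance calculation: for a two-asset portfolio, portfolio variance falls one-for-one with the payoff correlation. A small illustration (unit-variance payoffs and equal weights; the numbers are ours, not from the experiments):

```python
import numpy as np

w = np.array([0.5, 0.5])                  # equal-weighted two-asset portfolio
for rho in (0.5, 0.0, -0.5):              # payoff correlation
    cov = np.array([[1.0, rho], [rho, 1.0]])
    var = float(w @ cov @ w)              # equals 0.5 + 0.5 * rho here
    print(rho, var)                       # variance falls as rho falls
```

At rho = −0.5 the portfolio variance is a quarter of either asset's own variance, which is exactly why risk averse traders should pay a premium to hold the diversified combination.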
NOTES 1. Here we refer to theories such as the Capital Asset Pricing Model of Sharpe (1964) that predict the prices of fundamental assets, rather than to theories such as the pricing formula of Black and Scholes (1973) that predicts the prices of options or other derivative assets. The latter theories do not rest on assumptions about investor risk attitudes, but rather on the absence of arbitrage. 2. However, the bankruptcy rule was never triggered more than twice in any experiment, and in half of the experiments was never triggered at all. 3. The second treatment was introduced because we noticed that some subjects fell prey to the gambler’s fallacy, behaving as if balls were drawn without replacement even when they were drawn with replacement. This suggested the second treatment, in which we actually used the procedure that some subjects believed to be used in the first treatment. Note that, in the second treatment, true probabilities – hence payoff distributions – changed every period, and hence, that markets definitely had to find a new equilibrium. However, Bossaerts and Plott (2004) report that prices generally remain much closer to CAPM under the second treatment than under the first one. 4. See Asparouhova, Bossaerts, and Plott (2003) and Bossaerts and Plott (2004) for discussion of the evolution of prices during the experiment. 5. Because there is only one good, there is no trade in commodities, hence no trade after the state of nature is revealed. 6. In fact, Cash and Notes are not quite perfect substitutes because all transactions must take place through Cash, so that there is a transaction value to
Cash. As Table 3 shows, however, Cash and Notes are nearly perfect substitutes at the end of most periods in most experiments.

7. Expected return is computed as the ratio of the expected payoff under the theoretical distribution to the last transaction price for the period, minus 1; beta is computed analogously, as the ratio of: (i) the theoretical covariance of the payoff of the security with the payoff of the market portfolio, divided by the product of the last transaction price of the security and the last-traded price of the market portfolio; and (ii) the theoretical market payoff variance divided by the square of the last-traded price of the market portfolio. The last-traded price of the market portfolio is obtained from the last transactions of the two risky securities.

8. In the usual CAPM, all assets can be sold short, while in our framework the risky assets A, B cannot be sold short. However, in Appendix A of Bossaerts et al. (2007) we show that, given the particular asset structure here, the restriction on short sales does not change the conclusions.
ACKNOWLEDGMENTS Comments from the editors and an anonymous referee were very helpful; the authors remain responsible for any mistakes or omissions. Bossaerts is grateful for financial support from the R. G. Jenkins Family Fund, the National Science Foundation, and the Swiss Finance Institute. Zame is grateful for financial support from the John Simon Guggenheim Memorial Foundation, the National Science Foundation, the Social and Information Sciences Laboratory at Caltech, and the UCLA Academic Senate Committee on Research. Opinions, findings, conclusions, and recommendations expressed in this material are those of the authors and do not necessarily reflect the views of any funding agency.
REFERENCES

Arrow, K., & Hahn, F. (1971). General competitive analysis. San Francisco: Holden-Day.
Asparouhova, E., Bossaerts, P., & Plott, C. (2003). Excess demand and equilibration in multi-security financial markets: The empirical evidence. Journal of Financial Markets, 6, 1–21.
Black, F., & Scholes, M. (1973). The pricing of options and corporate liabilities. Journal of Political Economy, 81, 637–654.
Bossaerts, P. (2006). Equilibration under competition in smalls: Theory and experimental evidence. Caltech Working Paper.
Bossaerts, P., Meloso, D., & Zame, W. (2006). Pricing in experimental dynamically complete asset markets. Caltech Working Paper.
Bossaerts, P., & Plott, C. (2004). Basic principles of asset pricing theory: Evidence from large-scale experimental financial markets. Review of Finance, 8, 135–169.
Bossaerts, P., Plott, C., & Zame, W. (2007). Prices and portfolio choices in financial markets: Theory, econometrics, experiments. Econometrica, 75(4), 993–1038.
Davis, J., Fama, E., & French, K. (2000). Characteristics, covariances, and average returns: 1929 to 1997. Journal of Finance, 55, 389–406.
Harrison, G. W. (1990). Risk attitudes in first-price auction experiments: A Bayesian analysis. Review of Economics and Statistics, 72, 541–546.
Harrison, G. W., List, J., & Towe, C. (2007). Naturally occurring preferences and exogenous laboratory experiments: A case study of risk aversion. Econometrica, 75(2), 433–458.
Lusk, J. L., & Coble, K. H. (2006). Risk aversion in the presence of background risks: Evidence from an economic experiment. Oklahoma State University Working Paper.
Radner, R. (1972). Existence of equilibrium of plans, prices, and price expectations in a sequence of markets. Econometrica, 40, 289–303.
Sharpe, W. (1964). Capital asset prices: A theory of market equilibrium under conditions of risk. Journal of Finance, 19, 425–442.
RISK AVERSION IN GAME SHOWS

Steffen Andersen, Glenn W. Harrison, Morten I. Lau and E. Elisabet Rutström

ABSTRACT

We review the use of behavior from television game shows to infer risk attitudes. These shows provide evidence on decisions over very large stakes, made in a replicated and structured way. Inferences are generally confounded by the subjective assessment of skill in some games, and the dynamic nature of the task in most games. We consider the game shows Card Sharks, Jeopardy!, Lingo, and finally Deal Or No Deal. We provide a detailed case study of the analyses of Deal Or No Deal, since it is suitable for inference about risk attitudes and has attracted considerable attention.
Observed behavior on television game shows constitutes a controlled natural experiment that has been used to estimate risk attitudes. Contestants are presented with well-defined choices where the stakes are real and sizeable, and the tasks are repeated in the same manner from contestant to contestant. We review behavior in these games, with an eye to inferring risk attitudes. We describe the types of assumptions needed to evaluate behavior, and propose a general method for estimating the parameters of structural models of choice behavior for these games. We illustrate with a detailed case study of behavior in the U.S. version of Deal Or No Deal (DOND).

Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 359–404
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00008-2
In Section 1 we review the existing literature in this area that is focused on risk attitudes, starting with Gertner (1993) and the Card Sharks program. We then review the analysis of behavior on Jeopardy! by Metrick (1995) and on Lingo by Beetsma and Schotman (2001).1 In Section 2 we turn to a detailed case study of the DOND program that has generated an explosion of analyses trying to estimate large-stakes risk aversion. We explain the basic rules of the game, which is shown with some variations in many countries. We then review complementary laboratory experiments that correspond to the rules of the naturally occurring game show. Finally, we discuss alternative modeling strategies employed in the related DOND literature. Section 3 proposes a general method for estimating choice models in the stochastic dynamic programming environment that most of these game shows employ. We resolve the "curse of dimensionality" in this setting by using randomization methods and certain simplifications to the forward-looking strategies adopted. We discuss the ability of our approach to closely approximate the fully dynamic path that agents might adopt. We illustrate the application of the method using data from the U.S. version of DOND, and estimate a simple structural model of expected utility theory choice behavior. The manner in which our method can be extended to other models is also discussed. Finally, in Section 4 we identify several weaknesses of game show data, and how they might be addressed. We stress the complementary use of natural experiments, such as game shows, and laboratory experiments.
1. PREVIOUS LITERATURE

1.1. Card Sharks

The game show Card Sharks provided an opportunity for Gertner (1993) to examine dynamic choice under uncertainty involving substantial gains and losses. Two key features of the show allowed him to examine the hypothesis of asset integration: each contestant's stake accumulates from round to round within a game, and some contestants come back for repeat plays after winning substantial amounts.

The game involves each contestant deciding in a given round whether to bet that the next card drawn from a deck will be higher or lower than
Fig. 1. Money Cards Board in Card Sharks.
some "face card" on display. Fig. 1 provides a rough idea of the layout of the "Money Cards" board before any face cards are shown. Fig. 2 provides a representation of the board from a computerized laboratory implementation2 of Card Sharks. In Fig. 2 the subject has a face card with a 3, and is about to enter the first bet. Cards are drawn without replacement from a standard 52-card deck, with no Jokers and with Aces high. Contestants decide on the relative value of the next card, and then on an amount to bet that their choice is correct. If they are correct, their stake increases by the amount bet; if they are incorrect, their stake is reduced by the amount bet; and if the new card is the same as the face card, there is no change in the stake. Every contestant starts off with an initial stake of $200, and bets can be made in $50 increments up to the available stake. After three rounds in the first, bottom "row" of cards, they move to the second, middle "row" and receive an additional $200 (or $400 in some versions). If the stake goes to zero in the first row, contestants go straight to the second row and receive the new stake; otherwise, the additional stake is added to what remains from row one. The second row includes three choices, just as in the first row. After these three choices, and if the stakes have not dropped to zero, they can play the final bet. In this case they have to bet at least one-half of their stake, but otherwise the betting works the same way. One feature of the game is that contestants
Fig. 2.
Money Cards Board from Lab Version of Card Sharks.
sometimes have the option to switch face cards in the hope of getting one that is easier to win against.3 The show aired in the United States in two major versions. The first, between April 1978 and October 1981, was on NBC and had Jim Perry as the host. The second, between January 1986 and March 1989, was on CBS and had Bob Eubanks as the host.4 The maximum prize was $28,800 on the NBC version and $32,000 on the CBS version, and would be won if the contestant correctly bet the maximum amount in every round. This only occurred once. Using official inflation calculators,5 the maximum prize converts to between $89,138 (for 1978) and $63,936 (for 1981) in 2006 dollars for the NBC version, and between $58,920 (for 1986) and $52,077 (for 1989) for the CBS version.
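The betting odds implied by the rules above are simple to compute, since draws are without replacement from a known deck. A sketch (our illustration, not code from any of the papers reviewed):

```python
from fractions import Fraction

def bet_odds(face):
    """Win/lose/tie odds of betting the more likely direction against a
    face card (ranks 2..14, Aces high), the face card itself being out
    of a standard 52-card deck."""
    counts = {r: 4 for r in range(2, 15)}
    counts[face] -= 1                     # the face card is on display
    total = sum(counts.values())          # 51 cards remain
    higher = sum(n for r, n in counts.items() if r > face)
    lower = sum(n for r, n in counts.items() if r < face)
    p_win = Fraction(max(higher, lower), total)   # bet the likelier direction
    p_lose = Fraction(min(higher, lower), total)
    p_tie = Fraction(counts[face], total)         # same card: bet returned
    return p_win, p_lose, p_tie

print(bet_odds(3))    # a 3 is a strong card: bet 'higher'
print(bet_odds(8))    # an 8 splits the deck almost evenly
```

A face card of 3 gives win odds of 44/51 against lose odds of 4/51, while at a face card of 8 the two directions are equally likely, which is consistent with contestants' minimum bets on 8s noted below.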
These stakes are actually quite modest in relation to contemporary game shows in the United States, such as DOND described below, which typically has a maximal stake of $1,000,000. Of course, maximal stakes can be misleading, since Card Sharks and DOND are both "long shot" lotteries. Average earnings in the CBS version used by Gertner (1993) were $4,677, which converts to between $8,611 and $7,611 in 2006, whereas average earnings in DOND have been $131,943 for the sample we report later (excluding a handful of special shows with significantly higher prizes).

1.1.1. Estimates of Risk Attitudes

The analysis of Gertner (1993) assumes a Constant Absolute Risk Aversion (CARA) utility function, since he did not have information on household wealth and viewed that as necessary to estimate a Constant Relative Risk Aversion (CRRA) utility function. We return to the issue of household wealth later.

Gertner (1993) presents several empirical analyses. He initially (p. 511) focuses on the last round, and uses the optimal "investment" formula

$$b = \frac{\ln(p_{\mathrm{win}}) - \ln(p_{\mathrm{lose}})}{2a}$$

where the probabilities of winning and losing the bet b are defined by $p_{\mathrm{win}}$ and $p_{\mathrm{lose}}$, and the utility function is $U(W) = -\exp(-aW)$ for wealth W.6 From observed bets he infers a.

There are several potential problems with this approach. First, there is an obvious sample selection problem from only looking at the last round, although this is not a major issue since relatively few contestants go bankrupt (less than 3%). Second, there is the serious problem of censoring at bets of 50% or 100% of the stake. Gertner (1993, p. 510) is well aware of the issue, and indeed motivates several analytical approaches to these data by a desire to avoid it:

Regression estimates of absolute risk aversion are sensitive to the distribution assumptions one makes to handle the censoring created by the constraints that a contestant must bet no more than her stake and at least half of her stake in the final round. Therefore, I develop two methods to estimate a lower bound on the level of risk aversion that do not rely on assumptions about the error distribution.
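The inversion step itself is mechanical: an observed interior bet pins down a directly. A sketch with illustrative numbers (ours, not Gertner's data):

```python
import math

def optimal_bet(a, p_win, p_lose):
    """Optimal CARA bet: b = (ln p_win - ln p_lose) / (2a)."""
    return (math.log(p_win) - math.log(p_lose)) / (2 * a)

def implied_ara(bet, p_win, p_lose):
    """Invert the formula to read a off an observed interior bet."""
    return (math.log(p_win) - math.log(p_lose)) / (2 * bet)

# Illustrative numbers only: a face card of 3, so betting 'higher' wins
# with probability 44/51 and loses with 4/51, and an observed $4,000 bet.
a = implied_ara(4000, 44 / 51, 4 / 51)
print(round(a, 6))
```

Note that the inversion only works for bets strictly between the 50% and 100% limits, which is precisely the censoring problem Gertner discusses.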
The first method he uses is just to assume that the censored responses are in fact the optimal response. The 50% bets are assumed to be optimal bets, when in fact the contestant might wish to bet less (but cannot due to the
final-round betting rules); thus inferences from these responses will be biased towards showing less risk aversion than there might actually be. Conversely, the contestants making 100% bets are assumed to be risk neutral, when in fact they might be risk loving; thus inferences from these responses will be biased towards showing more risk aversion than there might actually be. Two wrongs do not make a right, although one does encounter such claims in empirical work. Of course, this approach still relies on exactly the same sort of assumptions about the interpretation of behavior, although not formalized in terms of an error distribution. And it is not apparent that the estimates will be lower bounds, since this censoring issue biases inferences in either direction. The average estimate of ARA to emerge is 0.000310, with a standard error of 0.000017, but it is not clear how one should interpret this estimate since it could be an overestimate or an underestimate.

The second approach is a novel and early application of simulation methods, which we will develop in greater detail below. A computer simulates optimal play by a risk-neutral agent playing the entire game 10 million times, recognizing that the cards are drawn without replacement. The computer does not appear to recognize the possibility of switching cards, but that is not central to the methodological point. The average return from this virtual lottery (VL) is $6,987 with a standard deviation of $10,843. It is not apparent that the lottery would have a Gaussian distribution of returns, but that can be allowed for in a more complete numerical analysis as we show later, and is again not central to the main methodological point.
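The virtual-lottery construction can be approximated in a few lines. The sketch below is a deliberately simplified version of the game (it ignores card switching, the $50 bet grid, and the exact bankruptcy rule), so its moments will not reproduce the $6,987/$10,843 figures; it only illustrates how simulated risk-neutral play generates a distribution of outcomes:

```python
import random
import statistics

def play_once(rng):
    """One simplified risk-neutral play of the Money Cards board: three
    bets per row, $200 added before the second row, and a final bet of at
    least half the stake.  Risk-neutral play bets all-in on any edge."""
    deck = [r for r in range(2, 15) for _ in range(4)]  # 52 cards, Aces high
    rng.shuffle(deck)
    face = deck.pop()
    stake = 200
    for bet_no in range(7):
        if bet_no == 3:                    # moving up to the second row
            stake = stake + 200 if stake > 0 else 200
        n_hi = sum(1 for c in deck if c > face)
        n_lo = sum(1 for c in deck if c < face)
        go_higher = n_hi >= n_lo
        if bet_no < 6:
            bet = stake if n_hi != n_lo else 0       # all-in on any edge
        else:
            bet = stake if n_hi != n_lo else stake // 2  # final: half minimum
        nxt = deck.pop()
        if nxt != face:                    # ties leave the stake unchanged
            stake += bet if (nxt > face) == go_higher else -bet
        face = nxt
    return stake

rng = random.Random(7)
sims = [play_once(rng) for _ in range(10000)]
print(round(statistics.mean(sims)), round(statistics.stdev(sims)))
```

Even this crude version shows the key qualitative features: a long right tail (the maximum possible outcome is $28,800, as on the NBC show) and a mean well above the endowment.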
The next step is to compare this distribution with the observed distribution of earnings, which was an average of $4,677 with a standard deviation of $4,258, and use a revealed preference argument to infer what risk attitudes must have been in play for this to have been the outcome instead of the VL: A second approach is to compare the sample distribution of outcomes with the distribution of outcomes if a contestant plays the optimal strategy for a risk-neutral contestant. One can solve for the coefficient of absolute risk aversion that would make an individual indifferent between the two distributions. By revealed preference, an ‘‘average’’ contestant prefers the actual distribution to the expected-value maximizing strategy, so this is an estimate of the lower bound of constant absolute risk aversion (pp. 511/512).
This approach is worth considering in more depth, because it suggests estimation strategies for a wide class of stochastic dynamic programming problems which we develop in Section 3. This exact method will not work once one moves beyond special cases such as risk neutrality, where outcomes
and behavior in later rounds have no effect on optimal behavior in earlier rounds. But we will see that an extension of the method does generalize. The comparison proposed here generates a lower bound on the ARA, rather than a precise estimate, since we know that an agent with an even higher ARA would also implicitly choose the observed distribution over the virtual RN distribution. Obviously, if one could generate VL distributions for a wide range of ARA values, it would be possible to refine this estimation step and select the ARA that maximizes the likelihood of the data. This is, in fact, exactly what we propose later as a general method for estimating risk attitudes in such settings. The ARA bound derived from this approach is 0.0000711, less than one-fourth of the estimate from the first method. Gertner (1993, p. 512) concludes that The ‘‘Card Sharks’’ data indicate a level of risk aversion higher than most existing estimates. Contestants do not seem to behave in a risk-loving and enthusiastic way because they are on television, because anything they win is gravy, or because the producers of the show encourage excessive risk-taking. I think this helps lend credence to the potential importance and wider applicability of the anomalous results I document below.
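To see the flavor of the indifference calculation, suppose (as Gertner does not need to) that both distributions are approximately normal. CARA expected utility then depends only on the certainty equivalent mu − a·sd²/2, and the indifference point has a closed form:

```python
# Moments reported by Gertner (1993): simulated risk-neutral play versus
# the observed earnings distribution.
mu_rn, sd_rn = 6987.0, 10843.0
mu_obs, sd_obs = 4677.0, 4258.0

# Under normality, indifference solves
#   mu_rn - a * sd_rn**2 / 2 = mu_obs - a * sd_obs**2 / 2,
# so a* = 2 * (mu_rn - mu_obs) / (sd_rn**2 - sd_obs**2).
a_star = 2 * (mu_rn - mu_obs) / (sd_rn**2 - sd_obs**2)
print(a_star)
```

This normal approximation gives roughly 0.000046, the same order of magnitude as, but not equal to, the 0.0000711 bound in the text; Gertner's calculation does not require normality, and the simulated distribution is in fact skewed.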
His first method does not provide any basis for these claims, since risk loving is explicitly assumed away. His second method does indicate that the average player behaves as if risk averse, but there are no standard errors on that bound. Thus, one simply cannot say that it is statistically significant evidence of risk aversion. 1.1.2. EUT Anomalies The second broad set of empirical analyses by Gertner (1993) considers a regression model of bets in the final round, and shows some alleged violations of EUT. The model is a two-limit tobit specification, recognizing that bets at 50% and 100% may be censored. However, most of the settings in which contestants might rationally bet 50% or 100% are dropped. Bets with a face card of 2 or an Ace are dropped since they are sure things in the sense that the optimal bet cannot result in a loss (the bet is simply returned if the same card is then turned up). Similarly, bets with a face card of 8 are dropped, since contestants almost always bet the minimum. These deletions amount to 258 of the 844 observations, which is not a trivial sub-sample. The regression model includes several explanatory variables. The central ones are cash and stake. Variable cash is the accumulated earnings by the contestant to that point over all repetitions of the game. So this includes previous plays of the game for ‘‘champions,’’ as well as earnings
accumulated in rounds 1–6 of the current game. Variable stake is the accumulated earnings in the current game, so it excludes earnings from previous games. One might expect the correlation of stake and cash to be positive and high, since the average number of times the game is played in these data is 1.85 ( = 844/457). Additional explanatory variables include a dummy for new players that are in their first game; the ratio of cash to the number of times the contestant has played the whole game (the ratio is 0 for new players); the value of any cars that have been won, given by the stated sticker price of the car; and dummy variables for each of the possible face card pairs (in this game a 3 is essentially the same as a King, a 4 the same as a Queen, etc.). The stake variable is included as an interaction with these face dummies, which are also included by themselves.7 The model is estimated with or without a multiplicative heteroskedasticity correction, and the latter estimates are preferred. Card-counters are ignored when inferring probabilities of a win, and this seems reasonable as a first approximation.

Gertner (1993, Section VI) draws two striking conclusions from this model. The first is that stake is statistically significant in its interactions with the face cards. The second is that the cash variable is not significant. The first result is said to be inconsistent with EUT since earnings in this show are small in relation to wealth, and

The desired dollar bet should depend upon the stakes only to the extent that the stakes impact final wealth. Thus, risky decisions on "Card Sharks" are inconsistent with individuals maximizing a utility function over just final wealth. If one assumes that utility depends only on wealth, estimates of zero on card intercepts and significant coefficients on the stake variable imply that outside wealth is close to zero. Since this does not hold, one must reject utility depending only on final wealth (p. 517).

This conclusion bears close examination. First, there is a substantial debate as to whether EUT has to be defined over final wealth, whatever that is, or can be defined just over outcomes in the choice task before the contestant (e.g., see Cox and Sadiraj (2006) and Harrison, Lau, and Rutström (2007) for references to the historical literature). So even if one concludes that the stake matters, this is not fatal for specifications of EUT defined over prizes, as clearly recognized by Gertner (1993, p. 519) in his reference to Markowitz (1952). Second, the deletion of all extreme bets likely leads to a significant understatement of uncertainty about coefficient estimates. Third, the regression does not correct for panel effects, and these could be significant since the variables cash and stake are correlated with the individual.8 Hence their coefficient estimates might be picking up other, unobservable effects that are individual-specific.
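The two-limit tobit at the heart of this exercise has a simple likelihood. A sketch in our own notation (the limits are 50% and 100% of the final-round stake, so they vary by observation; the synthetic data are ours):

```python
import numpy as np
from scipy import stats

def two_limit_tobit_loglik(params, y, X, lower, upper):
    """Log-likelihood of a two-limit tobit: the latent bet X @ beta + e,
    e ~ N(0, sigma^2), is only observed inside [lower, upper]."""
    beta, sigma = params[:-1], params[-1]
    xb = X @ beta
    at_lo = y <= lower                  # censored at the 50% floor
    at_hi = y >= upper                  # censored at the 100% ceiling
    mid = ~(at_lo | at_hi)
    ll = np.empty(len(y))
    ll[at_lo] = stats.norm.logcdf((lower[at_lo] - xb[at_lo]) / sigma)
    ll[at_hi] = stats.norm.logsf((upper[at_hi] - xb[at_hi]) / sigma)
    ll[mid] = stats.norm.logpdf((y[mid] - xb[mid]) / sigma) - np.log(sigma)
    return ll.sum()

# Synthetic check: bets censored between half the stake and the full stake.
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
stake = rng.uniform(200, 2000, size=n)
lower, upper = 0.5 * stake, stake
latent = X @ np.array([600.0, 100.0]) + rng.normal(0, 150, size=n)
y = np.clip(latent, lower, upper)
ll = two_limit_tobit_loglik(np.array([600.0, 100.0, 150.0]), y, X, lower, upper)
print(np.isfinite(ll))
```

Maximizing this likelihood (e.g., with `scipy.optimize.minimize` on its negative) recovers the slope coefficients; the point of the sketch is only that the censored observations contribute through the normal tail probabilities rather than the density.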
This conclusion bears close examination. First, there is a substantial debate as to whether EUT has to be defined over final wealth, whatever that is, or can be defined just over outcomes in the choice task before the contestant (e.g., see Cox and Sadiraj (2006) and Harrison, Lau, and Rutstro¨m (2007) for references to the historical literature). So even if one concludes that the stake matters, this is not fatal for specifications of EUT defined over prizes, as clearly recognized by Gertner (1993, p. 519) in his reference to Markowitz (1952). Second, the deletion of all extreme bets likely leads to a significant understatement of uncertainty about coefficient estimates. Third, the regression does not correct for panel effects, and these could be significant since the variables cash and stake are correlated with the individual.8 Hence their coefficient estimates might be picking up other, unobservable effects that are individual-specific.
The second result is also said to be inconsistent with EUT, in conjunction with the first result. The logic is that stake and cash should have an equal effect on terminal wealth, if one assumes perfect asset integration and that utility is defined over terminal wealth. But one has a significant effect on bets, and the other does not. Since the assumptions that utility is defined over terminal wealth and that asset integration is perfect are implicitly maintained by Gertner (1993, p. 517ff.), he concludes that EUT is falsified. However, one can include terminal wealth as an argument of utility without also assuming perfect asset integration (e.g., Cox & Sadiraj, 2006). This is also recognized explicitly by Gertner (1993, p. 519), who considers the possibility that "contestants have multi-attribute utility functions, so that they care about something in addition to wealth."9 Thus, if one accepts the statistical caveats about samples and specifications for now, these results point to the rejection of a particular, prominent version of EUT, but they do not imply that all popular versions of EUT are invalid.
1.2. Jeopardy! In the game show Jeopardy! there is a subgame referred to as Final Jeopardy. At this point, three contestants have cash earnings from the initial rounds. The skill component of the game consists of hearing some text read out by the host, at which point the contestants jump in to state the question that the text provides the answer to.10 In Final Jeopardy the contestants are told the general subject matter for the task, and then have to privately and simultaneously state a wager amount from their accumulated points. They can wager any amount up to their earned endowment at that point, and are rewarded with even odds: if they are correct they get that wager amount added, but if they are incorrect they have that amount deducted. The winner of the show is the contestant with the most cash after this final stage. The winner gets to keep the earnings and come back the following day to try and continue as champion. In general, these wagers are affected by the risk attitudes of contestants. But they are also affected by their subjective beliefs about their own skill level relative to the other two contestants, and by what they think the other contestants will do. So this game cannot be fully analyzed without making some game-theoretic assumptions. Jeopardy! was first aired in the United States in 1964, and continued until 1975. A brief season returned between 1978 and 1979, and then the modern era began in 1984 and continues to this day. The format changes have been
relatively small, particularly during the modern era. The data used by Metrick (1995) come from shows broadcast between October 1989 and January 1992, and reflect more than 1,150 decisions. Metrick (1995) examines behavior in Final Jeopardy in two stages.11 The first stage considers the subset of shows in which one contestant is so far ahead in cash that the bet only reveals risk attitudes and beliefs about own skill. In such "runaway games" there exist wagers that will ensure victory, although there might be some rationale prior to September 2003 for someone to bet an amount that could lead to a loss. Until then, the champion had to retire after five wins, so if one had enough confidence in one's skill at answering such questions, one might rationally bet more than was needed to ensure victory. After September 2003 the rules changed, so the champion stays on until defeated. In the runaway games Metrick (1995, p. 244) uses the same formula that Gertner (1993) used for CARA utility functions. The only major difference is that the probability of winning in Jeopardy! is not known objectively to the observer.12 His solution is to substitute the observed fraction of correct answers, akin to a rational expectations assumption, and then solve for the CARA parameter a that accounts for the observed bets. The result is an estimate of a equal to 0.000066 with a standard error of 0.000056. Thus, there is slight evidence of risk aversion, but it is not statistically significant, leading Metrick (1995, p. 245) to conclude that these contestants behaved in a risk-neutral manner. The second stage of the analysis considers subsamples in which two players have accumulated scores that are sufficiently close that they have to take beliefs about the other into account, but where there is a distant third contestant who can be effectively ignored.
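The rational-expectations substitution makes the first-stage inference a one-liner: Final Jeopardy wagers pay even odds, so the Card Sharks formula applies with the empirical success rate in place of the objective probability. An illustrative sketch (the numbers are ours, not Metrick's):

```python
import math

def implied_cara(bet, p_correct):
    """CARA coefficient implied by an even-odds wager when the probability
    of answering correctly is p_correct (the same formula as for Card
    Sharks, with p_win = p_correct and p_lose = 1 - p_correct)."""
    return (math.log(p_correct) - math.log(1 - p_correct)) / (2 * bet)

# Illustrative only: a runaway leader who answers correctly two-thirds
# of the time and wagers $3,000.
print(round(implied_cara(3000, 2 / 3), 6))
```

Averaging such implied coefficients over runaway games, with the success rate estimated from the data, is the essence of the first-stage calculation.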
Metrick (1995) cuts this Gordian knot of strategic considerations by assuming that contestants view themselves as betting against contestants whose behavior can be characterized by their observed empirical frequencies. He does not use these data to make inferences about risk attitudes.
1.3. Lingo

The underlying game in Lingo involves a team of two people guessing a hidden five-letter word. Fig. 3 illustrates one such game from the U.S. version. The team is told the first letter of the word, and can then just state words. Incorrect guesses are used to reveal which of their letters, if any, appear in the correct word. To take the example in Fig. 3, the true word
Fig. 3.
The Word Puzzle in Lingo.
was STALL. So the initial S was shown. The team suggested SAINT and was informed (by light grey coloring) that A and T are present in the correct word. The team was not told the order of the letters A and T in the correct word. The team then suggested STAKE, and was informed (by grey coloring) that the T and A were in the right place and that no other letters were in the correct word. The team then tried STAIR, SEATS, and finally STALL. Most teams are able to guess the correct word in five rounds.

The game occurs in two stages. In the first stage, one team of two plays against another team for several of these Lingo word-guessing games. The couple with the most money then goes on to the second stage, which is the one of interest for measuring risk attitudes because it is non-interactive. So the winning couple comes into the main task with a certain earned endowment (which could be augmented by an unrelated game called "jackpot"). The team also comes in with some knowledge of its own ability to solve these word-guessing puzzles. In the Dutch data used by Beetsma and Schotman (2001), spanning 979 games, the frequency distribution of the number of solutions across rounds
1–5 in the final stage was 0.14, 0.32, 0.23, 0.13, and 0.081, respectively, with the remaining 0.089 of words unsolved. Every round that the couple requires to guess the word means that they have to pick one ball from an urn affecting their payoffs, as described below. If they do not solve the word puzzle, they have to pick six balls. These balls determine if the team goes "bust" or "survives" something called the Lingo Board in that round. An example of the Lingo Board is shown in Fig. 4, from Beetsma and Schotman (2001, Fig. 3).13 There are 35 balls in the urn numbered from 1 to 35, plus one "golden ball." If the golden ball is picked then the team wins the cash prize for that round and gets a free pass to the next round. If one of the numbered balls is picked, then the fate of the team depends on the current state of the Lingo Board. The team goes "bust" if they get a row, column, or diagonal of X's, akin to the parlor game noughts and crosses. So solving the word puzzle in fewer moves is good, since it means that fewer balls have to be drawn from the urn, and hence that the survival probability is higher. In the example from Fig. 4, drawing a 5 would be fatal, drawing an 11 would not be, and drawing a 1 would not be if a 2 or 8 had not been previously drawn. If the team survives a round it gets a cash prize, and is asked if they want to keep going or stop. This lasts for five rounds. So apart from the skill part of the game, guessing the words, this is the only choice the team makes. This is therefore a "stop-go" problem, in which the team balances current earnings with the lottery of continuing and either earning more cash or going bust. If the team chooses to continue, the stake doubles; if the golden ball had been drawn, it is replaced in the urn. If the team goes bust it takes home nothing. Teams can play the game up to three times, then retire from the show.
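Given a board state, the survival probability the team faces can be simulated directly. The sketch below is our hypothetical reconstruction of the mechanics described above (the board layout, helper names, and parameters are assumptions for illustration, not the paper's code):

```python
import random
import numpy as np

def survival_prob(board, crossed, drawn, n_draws, n_sims=20000, seed=1):
    """Monte Carlo survival probability for one round of the Lingo Board.
    board: 5x5 array of ball numbers; crossed: boolean mask of X's;
    drawn: numbered balls already out of the urn; n_draws: balls to pick.
    The urn holds balls 1..35 plus a golden ball (coded 0) that wins the
    round outright; a completed row, column, or diagonal of X's is bust."""
    rng = random.Random(seed)
    urn = [0] + [b for b in range(1, 36) if b not in drawn]
    lines = ([board[i, :].tolist() for i in range(5)]
             + [board[:, j].tolist() for j in range(5)]
             + [board.diagonal().tolist(), np.fliplr(board).diagonal().tolist()])
    base = set(board[crossed].tolist())
    survive = 0
    for _ in range(n_sims):
        marks = set(base)
        alive = True
        for b in rng.sample(urn, n_draws):
            if b == 0:                     # golden ball: win the round
                break
            marks.add(b)
            if any(all(x in marks for x in line) for line in lines):
                alive = False              # a full line of X's: bust
                break
        survive += alive
    return survive / n_sims

board = np.arange(1, 26).reshape(5, 5)    # hypothetical layout of numbers
crossed = np.zeros((5, 5), dtype=bool)
crossed[0, :4] = True                     # four X's already in the top row
p = survival_prob(board, crossed, drawn={1, 2, 3, 4}, n_draws=6)
print(round(p, 2))
```

With four X's in a row, survival requires that the fifth number not be drawn before the golden ball, which is why a poorly solved word (six draws instead of one) is so costly.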
Fig. 4. Example of a Lingo Board.
Risk Aversion in Game Shows
Risk attitudes are involved when the team has to balance the current earnings with the lottery of continuing. That lottery depends on subjective beliefs about the skill level of the team, the state of the Lingo Board at that point, and the perception of the probabilities of drawing a ‘‘fatal’’ number or the golden ball. In many respects, apart from the skill factor and the relative symmetry of prizes, this game is remarkably like DOND, as we see later. Beetsma and Schotman (2001) evaluate data from 979 finals. Each final lasts several rounds, so the sample of binary stop/continue decisions is larger, and constitutes a panel. Average earnings in this final round in their sample are 4,106 Dutch guilders (f), with potential earnings, given the initial stakes brought into the final, of around f 15,136. The average exchange rate in 1997, which is around when these data were from, was $0.514 per guilder, so these stakes are around $2,110 on average, and up to roughly $7,780. These are not life-changing prizes, like the top prizes in DOND, but are clearly substantial in relation to most lab experiments. Beetsma and Schotman (2001, Section 4) show that the stop/continue decisions have a simple monotonic structure if one assumes CRRA or CARA utility. Since the odds of surviving never get better with more rounds, if it is optimal to stop in one round then it will always be optimal to stop in any later round. This property does not necessarily hold for other utility functions. But for these utility functions, which are still an important class, one can calculate a threshold survival probability p_i* for any round i such that the team should stop if the actual survival probability falls below it.
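The flavor of that closed-form threshold can be seen in a deliberately stripped-down version of the stop/go decision, which ignores the golden ball, later rounds, and the option value of returning to the show: suppose continuing doubles the current stake m with survival probability p and pays zero otherwise. This is a sketch of the logic only, not Beetsma and Schotman's actual specification.

```python
# Stylized stop/go threshold under CRRA utility u(m) = m^(1-r)/(1-r), r < 1.
# Stop with stake m, or continue and receive 2m with probability p, else 0
# (golden ball and future option values are ignored in this sketch).
# Indifference u(m) = p * u(2m) gives p* = 2**(r - 1), independent of m.

def crra_utility(m, r):
    return m ** (1.0 - r) / (1.0 - r)

def threshold_survival_prob(m, r):
    """Survival probability below which the team should stop."""
    return crra_utility(m, r) / crra_utility(2.0 * m, r)

for r in (0.0, 0.42, 0.8):
    print(r, threshold_survival_prob(1000.0, r))
```

A risk-neutral team (r = 0) continues whenever the survival probability exceeds 0.5; more risk-averse teams demand better odds. In the actual game the threshold also reflects the golden ball, the prize path, and option values, so it varies by round, but it remains a closed-form object that can be evaluated inside a likelihood routine.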
This threshold probability does depend on the utility function and parameter values for it, but in a closed-form fashion that can be easily evaluated within a maximum-likelihood routine.14 Each team can play the game three times before it has to retire as a champion. The specification of the problem clearly recognizes the option value in the first game of coming back to play the game a second or third time, and then the option value in the second game of coming back to play a third time. The certainty-equivalent of these option values depends, of course, on the risk attitudes of the team. But the estimation procedure ‘‘black boxes’’ these option values to collapse the estimation problem down to a static one: they are free parameters to be estimated along with the parameter of the utility function. Thus, they are not constrained by the expected returns and risk of future games, the functional form of utility, and the specific parameter values being evaluated in the maximum-likelihood routine. Beetsma and Schotman (2001, p. 839) do clearly check that the option value in the first game exceeds the option value in the second game, but (a) they only examine point estimates, and make no claim that this
difference is statistically significant,15 and (b) there is no check that the absolute values of these option values are consistent with the utility function and parameter values. In addition, there is no mention of any corrections for the fact that each team makes several decisions, and that errors for that team are likely correlated. With these qualifications, the estimate of the CRRA parameter is 0.42, with a standard error of 0.05, if one assumes that utility is only defined over the monetary prizes. It rises to 6.99, with a standard error of 0.72, if one assumes a baseline wealth level of f 50,000, which is the preferred estimate. Each of these estimates is significantly different from 0, implying rejection of risk neutrality in favor of risk aversion. The CARA specification generates comparable estimates. One extension is to allow for probability weighting on the actual survival probability p_i in round i. The weighting occurs in the manner of original Prospect Theory, due to Kahneman and Tversky (1979), and not in the rank-dependent manner of Quiggin (1982, 1993) and Cumulative Prospect Theory. One apparent inconsistency is that the actual survival probabilities are assumed to be weighted subjectively, but the threshold survival probabilities p_i* are not (see their Eq. (18), p. 843). The results show that estimates of the degree of concavity of the utility function increase substantially, and that contestants systematically overweight the actual survival probability. We return to some of the issues of structural estimation of models assuming decision weights, in a rank-dependent manner, in the discussion of DOND and Andersen, Harrison, Lau, and Rutström (2006a, 2006b).
2. DEAL OR NO DEAL

2.1. The Game Show as a Natural Experiment

The basic version of DOND is the same across all countries. We explain the general rules by focusing on the version shown in the United States, and then consider variants found in other countries. The show confronts the contestant with a sequential series of choices over lotteries, and asks a simple binary decision: whether to play the (implicit) lottery or take some deterministic cash offer. A contestant is picked from the studio audience. They are told that a known list of monetary prizes, ranging from $0.01 up to $1,000,000, has been placed in 26 suitcases.16 Each suitcase is carried onstage by an attractive female model, and has a number
from 1 to 26 associated with it. The contestant is informed that the money has been put in the suitcase by an independent third party, and in fact it is common that any unopened cases at the end of play are opened so that the audience can see that all prizes were in play. Fig. 5 shows how the prizes are displayed to the subject at the beginning of the game. The contestant starts by picking one suitcase that will be ‘‘his’’ case. In round 1, the contestant must pick 6 of the remaining 25 cases to be opened, so that their prizes can be displayed. Fig. 6 shows how the display changes after the contestant picks the first case: in this case the contestant unfortunately picked the case containing the $300,000 prize. A good round for a contestant occurs if the opened prizes are low, and hence the odds increase that his case holds the higher prizes. At the end of each round the host is phoned by a ‘‘banker’’ who makes a deterministic cash offer to the contestant. In one of the first American shows (12/21/2005) the host made a point of saying clearly that ‘‘I don’t know what’s in the suitcases, the banker doesn’t, and the models don’t.’’ The initial offer in early rounds is typically low in comparison to expected offers in later rounds. We use an empirical offer function later, but the qualitative trend is quite clear: the bank offer starts out at roughly 10% of
Fig. 5. Opening Display of Prizes in TV Game Show Deal or No Deal.
Fig. 6. Prizes Available After One Case Has Been Opened.
the expected value of the unopened cases, and increments by about 10% of that expected value for each round. This trend is significant, and serves to keep all but extremely risk-averse contestants in the game for several rounds. For this reason, it is clear that the case that the contestant ‘‘owns’’ has an option value in future rounds. In round 2, the contestant must pick five cases to open, and then there is another bank offer to consider. In succeeding rounds, 3–10, the contestant must open 4, 3, 2, 1, 1, 1, 1, and 1 cases, respectively. At the end of round 9, there are only two unopened cases, one of which is the contestant’s case. In round 9 the decision is a relatively simple one from an analyst’s perspective: either take the non-stochastic cash offer or take the lottery with a 50% chance of either of the two remaining unopened prizes. We could assume some latent utility function, and estimate parameters for that function that best explain observed binary choices. Unfortunately, relatively few contestants get to this stage, having accepted offers in earlier rounds. In our data, only 9% of contestants reach that point. More serious than the smaller sample size is the fact that one naturally expects risk attitudes to affect who survives to this round. Thus, there would be a serious sample attrition bias if one just studied choices in later rounds.
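The bookkeeping behind these rounds is easy to verify directly. The sketch below tabulates unopened cases in the U.S. version and applies the rough 10%-per-round offer rule described above; the actual offer function is estimated empirically later, so the linear schedule here is only an illustrative assumption.

```python
# Case bookkeeping for the U.S. version: 26 cases, the contestant holds one,
# and opens 6, 5, 4, 3, 2, 1, 1, 1, 1 of the others in rounds 1-9; the final
# opening in round 10 is just the reveal. The offer rule below is the
# stylized "roughly 10% of expected value per round" trend, not the
# empirical offer function estimated later.

CASES_TO_OPEN = [6, 5, 4, 3, 2, 1, 1, 1, 1]  # rounds 1-9

def unopened_after_round(rnd):
    """Unopened cases (including the contestant's own) after round rnd."""
    return 26 - sum(CASES_TO_OPEN[:rnd])

def stylized_offer(rnd, expected_value_unopened):
    """Illustrative bank offer: 10% of expected value per round elapsed."""
    return 0.10 * rnd * expected_value_unopened

for rnd in range(1, 10):
    print(rnd, unopened_after_round(rnd), stylized_offer(rnd, 100000.0))
```

At the end of round 9 two cases remain, the contestant's and one other, which is the 50/50 decision described above.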
The bank offer gets richer and richer over time, ceteris paribus the random realizations of opened cases. In other words, if each unopened case truly has the same subjective probability of having any remaining prize, there is a positive expected return to staying in the game for more and more rounds. A risk-averse subject who might be just willing to accept the bank offer, if the offer were not expected to get better and better, would choose to continue to another round since the expected improvement in the bank offer provides some compensation for the additional risk of going into another round. Thus, to evaluate the parameters of some latent utility function given observed choices in earlier rounds, we have to mentally play out all possible future paths that the contestant faces.17 Specifically, we have to play out those paths assuming the values for the parameters of the likelihood function, since they affect when the contestant will decide to ‘‘deal’’ with the banker, and hence the expected utility of the compound lottery. This corresponds to procedures developed in the finance literature to price path-dependent derivative securities using Monte Carlo simulation (e.g., Campbell, Lo, & MacKinlay, 1997, Section 9.4). We discuss general numerical methods for this type of analysis later. Saying ‘‘no deal’’ in early rounds provides one with the option of being offered a better deal in the future, ceteris paribus the expected value of the unopened prizes in future rounds. Since the process of opening cases is a martingale process, even if the contestant gets to pick the cases to be opened, the expected value of the unopened prizes in any future round equals the current expected value. This implies, given the exogenous bank offers (as a function of expected value), that the dollar value of the offer will get richer and richer as time progresses. Thus, bank offers themselves will be a submartingale process. In the U.S.
version the contestants are joined after the first round by several family members or friends, who offer suggestions and generally add to the entertainment value. But the contestant makes the decisions. For example, in the very first show a lady was offered $138,000, and her hyperactive husband repeatedly screamed out ‘‘no deal!’’ She calmly responded, ‘‘At home, you do make the decisions. But … we’re not at home!’’ She turned the deal down, as it happens, and went on to take an offer of only $25,000 two rounds later. Our sample consists of 141 contestants recorded between December 19, 2005 and May 6, 2007. This sample includes 6 contestants that participated in special versions, for ratings purposes, in which the top prize was increased from $1 million to $2 million, $3 million, $4 million, $5 million or $6 million.18 The biggest winner on the show so far has been Michelle Falco, who was lucky enough to be on the September 22, 2006 show with a top prize
of $6 million. Her penultimate offer was $502,000 when the 3 unopened prizes were $10, $750,000 and $1 million, which had an expected value of $583,337. She declined the offer, and opened the $10 case, resulting in an offer of $808,000 when the expected value of the two remaining prizes was $875,000. She declined the offer, and ended up with $750,000 in her case. In other countries there are several variations. In some cases there are fewer prizes, and fewer rounds. In the United Kingdom there are only 22 monetary prizes, ranging from 1p up to £250,000, and only 7 rounds. In round 1 the contestant must pick 5 boxes, and then in each round until round 6 the contestant has to open 3 boxes per round. So there can be a considerable swing from round to round in the expected value of unopened boxes, compared to the last few rounds of the U.S. version. At the end of round 6 there are only 2 unopened boxes, one of which is the contestant’s box. Some versions substitute the option of switching the contestant’s box for an unopened box, instead of a bank offer. This is particularly common in the French and Italian versions, and relatively rare in other versions. Things become much more complex in those versions in which the bank offer in any round is statistically informative about the prize in the contestant’s case. In that case the contestant has to make some correction for this possibility, and also consider the strategic behavior of the banker’s offer. Bombardini and Trebbi (2005) offer clear evidence that this occurs in the Italian version of the show, but there is no evidence that it occurs in the U.K. version. The Australian version offers several additional options at the end of the normal game, called Chance, SuperCase, and Double Or Nothing. In many cases they are used as ‘‘entertainment filler,’’ for games that otherwise would finish before the allotted 30 min.
It has been argued, most notably by Mulino, Scheelings, Brooks, and Faff (2006), that these options should rationally change behavior in earlier rounds, since they provide some uncertain ‘‘insurance’’ against saying ‘‘deal’’ earlier rather than later.

2.2. Comparable Laboratory Experiments

We also implemented laboratory versions of the DOND game, to complement the natural experimental data from the game shows.19 The instructions were provided by hand and read out to subjects to ensure that every subject took some time to digest them. As far as possible, they rely on screen shots of the software interface that the subjects were to use to enter their choices. The opening page for the common practice session in the lab, shown in Fig. 7, provides the subject with basic information about the task
Fig. 7. Opening Screen Shot for Laboratory Experiment.
before them, such as how many boxes there were and how many boxes needed to be opened in any round.20 In the default setup the subject was given the same frame as in the Australian and U.S. game shows: this version has more prizes (26 instead of 22) and more rounds (9 instead of 6) than the U.K. version. After clicking on the ‘‘Begin’’ box, the lab subject was given the main interface, shown in Fig. 8. This provided the basic information for the DOND task. The presentation of prizes was patterned after the displays used on the actual game shows. The prizes are shown in the same nominal denomination as the Australian daytime game show, and the subject was told that an exchange rate of 1,000:1 would be used to convert earnings in the DOND task into cash payments at the end of the session. Thus, the top cash prize the subject could earn was $200 in this version. The subject was asked to click on a box to select ‘‘his box,’’ and then round 1 began. In the instructions we illustrated a subject picking box #26, and then six boxes, so that at the end of round 1 he was presented with a deal from the banker, shown in Fig. 9. The prizes that had been opened in round 1 were ‘‘shaded’’ on the display, just as they are in the game show display. The subject is then asked to accept $4,000 or continue. When the
Fig. 8. Prize Distribution and Display for Laboratory Experiment.
game ends the DOND task earnings are converted to cash using the exchange rate, and the experimenter was prompted to come over and record those earnings. Each subject played at their own pace after the instructions were read aloud. One important feature of the experimental instructions was to explain how bank offers would be made. The instructions explained the concept of the expected value of unopened prizes, using several worked numerical examples in simple cases. Then subjects were told that the bank offer would be a fraction of that expected value, with the fractions increasing over the rounds as displayed in Fig. 10. This display was generated from Australian game show data available at the time. We literally used the parameters defining the function shown in Fig. 10 when calculating offers in the experiment, rounding to the nearest dollar.
Fig. 9. Typical Bank Offer in Laboratory Experiment.
The subjects for our laboratory experiments were recruited from the general student population of the University of Central Florida in 2006.21 We have information on 676 choices made by 89 subjects. We estimate the same models for the lab data as for the U.S. game show data. We are not particularly interested in getting the same quantitative estimates per se, since the samples, stake, and context differ in obvious ways. Instead our interest is whether we obtain the same qualitative results: is the lab reliable in terms of the qualitative inferences one draws from it? Our null hypothesis is that the lab results are the same as the naturally occurring results. If we reject this hypothesis one could infer that we have just not run the right lab experiments in some respect, and we have some sympathy for that view. On the other hand, we have implemented our lab experiments in exactly the manner that we would normally do as lab experimenters. So we
Fig. 10. Information on Bank Offers in Laboratory Experiment. (The figure plots the bank offer as a fraction of the expected value of unopened cases, rising over rounds 1–9.)
are definitely able to draw conclusions in this domain about the reliability of conventional lab tests compared to comparable tests using naturally occurring data. These conclusions would then speak to the questions raised by Harrison and List (2004) and Levitt and List (2007) about the reliability of lab experiments.

2.3. Other Analyses of Deal or No Deal

A large literature on DOND has evolved quickly.22 Appendix B in the working paper version documents in detail the modeling strategies adopted in the DOND literature, and similarities and differences to the approach we propose.23 In general, three types of empirical strategies have been employed to model observed DOND behavior. The first empirical strategy is the calculation of CRRA bounds at which a given subject is indifferent between one choice and another. These bounds can be calculated for each subject and each choice, so they have the advantage of not assuming that each subject has the same risk preferences, just that they use the same functional form. The studies differ in terms of
how they use these bounds, as discussed briefly below. The use of bounds such as these is familiar from the laboratory experimental literature on risk aversion: see Holt and Laury (2002), Harrison, Johnson, McInnes, and Rutström (2005), and Harrison, Lau, Rutström, and Sullivan (2005) for discussion of how one can then use interval regression methods to analyze them. The limitation of this approach, discussed in Harrison and Rutström (2008, Section 2.1), is that it is difficult to go beyond the CRRA or other one-parameter families, and in particular to examine other components of choice under uncertainty (such as more flexible utility functions, probability weighting or loss aversion).24 Post, van den Assem, Baltussen, and Thaler (2006) use CRRA bounds in their analysis, and the approach has been employed in various forms by others as noted below. The second empirical strategy is the examination of specific choices that provide ‘‘trip wire’’ tests of certain propositions of EUT, or provide qualitative indicators of preferences. For example, decisions made in the very last rounds often confront the contestant with the expected value of the unopened prizes, and allow one to identify those who are risk loving or risk averse directly. The limitation of this approach is that these choices are subject to sample selection bias, since risk attitudes and other preferences presumably played some role in whether the contestant reached these critical junctures. Moreover, they provide limited information at best, and do not allow one to define a metric for errors. If we posit some stochastic error specification for choices, as is now common, then one has no way of knowing if these specific choices are the result of such errors or a manifestation of latent preferences. Blavatskyy and Pogrebna (2006) illustrate the sustained use of this type of empirical strategy, which is also used by other studies in some respects.
The third empirical strategy is to propose a latent decision process and estimate the structural parameters of that process using maximum likelihood. This is the approach we favor, since it allows one to examine structural issues rather than rely on ad hoc proxies for underlying preferences. Harrison and Rutström (2008, Section 2.2) discuss the general methodological advantages of this approach.
3. A GENERAL ESTIMATION STRATEGY

The DOND game is a dynamic stochastic task in which the contestant has to make choices in one round that generally entail consideration of future consequences. The same is true of the other game shows used for estimation
of risk attitudes. In Card Sharks the level of bets in one round generally affects the scale of bets available in future rounds, including bankruptcy, so for plausible preference structures one should take this effect into account when deciding on current bets. Indeed, as explained earlier, one of the empirical strategies employed by Gertner (1993) can be viewed as a precursor to our general method. In Lingo the stop/continue structure, where a certain amount of money is being compared to a virtual money lottery, is evident. We propose a general estimation strategy for such environments, and apply it to DOND. The strategy uses randomization to break the general ‘‘curse of dimensionality’’ that is evident if one considers this general class of dynamic programming problems (Rust, 1997).
3.1. Basic Intuition

The basic logic of our approach can be explained from the data and simulations shown in Table 1. We restrict attention here to the first 75 contestants that participated in the standard version of the television game with a top prize of $1 million, to facilitate comparison of dollar amounts. There are nine rounds in which the banker makes an offer, and in round 10 the contestant simply opens his case. Only 7 contestants, or 9% of the sample of 75, continued to round 10, with most accepting the banker’s offer in rounds 6, 7, 8, and 9. The average offer is shown in column 4. We stress that this offer is stochastic from the perspective of the sample as a whole, even if it is non-stochastic to the specific contestant in that round. Thus, to see the logic of our approach from the perspective of the individual decision-maker, think of the offer as a non-stochastic number, using the average values shown as a proximate indicator of the value of that number in a particular instance. In round 1 the contestant might consider up to nine virtual lotteries (VLs). He might look ahead one round and contemplate the outcomes he would get if he turned down the offer in round 1 and accepted the offer in round 2. This VL, realized in virtual round 2 in the contestant’s thought experiment, would generate an average payoff of $31,141 with a standard deviation of $23,655. The top panel of Fig. 11 shows the simulated distribution of this particular lottery. The distribution of payoffs to these VLs is highly skewed, so the standard deviation may be slightly misleading if one thinks of these as Gaussian distributions. However, we just use the standard deviation as one pedagogic indicator of the uncertainty of the payoff in the VL: in our formal analysis we consider the complete distribution of the VL in a nonparametric manner.
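The construction of one such VL can be sketched with a small simulation. The prize list is the standard U.S. list; the round-2 offer fraction of 0.2 is an illustrative assumption rather than the empirical offer function used in the paper, so the simulated mean and standard deviation will not reproduce the Table 1 entries, but the exercise has the same shape: open cases at random, compute the implied bank offer, and collect the induced distribution.

```python
import random
import statistics

# Standard U.S. prize list: 26 prizes from $0.01 to $1,000,000.
PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750, 1000,
          5000, 10000, 25000, 50000, 75000, 100000, 200000, 300000,
          400000, 500000, 750000, 1000000]

def vl_no_deal_then_deal_round2(rng, offer_fraction=0.2):
    """One draw of the VL: No Deal in round 1, Deal in round 2.

    Open 6 + 5 cases at random; the virtual round-2 offer is an assumed
    fraction of the expected value of the 15 cases still unopened.
    """
    prizes = PRIZES[:]
    rng.shuffle(prizes)
    unopened = prizes[11:]  # 11 cases opened in rounds 1 and 2
    return offer_fraction * statistics.mean(unopened)

rng = random.Random(123)
draws = [vl_no_deal_then_deal_round2(rng) for _ in range(10000)]
print(statistics.mean(draws), statistics.stdev(draws))
```

The resulting distribution is highly skewed, as in Fig. 11, which is why the formal analysis carries the whole simulated distribution rather than a mean and standard deviation.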
Table 1. Virtual Lotteries for US Deal or No Deal Game Show.

Round   Active Contestants   Deal!   Average Offer
  1          75 (100%)          0         $16,180
  2          75 (100%)          0         $33,453
  3          75 (100%)          0         $54,376
  4          75 (100%)          1         $75,841
  5          74 (99%)           5        $103,188
  6          69 (92%)          16        $112,818
  7          53 (71%)          20        $119,746
  8          33 (44%)          16        $107,779
  9          17 (23%)          10         $79,363
 10           7 (9%)            –               –

Looking at virtual lottery realized in round … (mean payoff, with standard deviation in parentheses):

From round 1: round 2: $31,141 ($23,655); round 3: $53,757 ($45,996); round 4: $73,043 ($66,387); round 5: $97,275 ($107,877); round 6: $104,793 ($102,246); round 7: $120,176 ($121,655); round 8: $131,165 ($154,443); round 9: $136,325 ($176,425); round 10: $136,281 ($258,856).
From round 2: round 3: $53,535 ($46,177); round 4: $72,588 ($66,399); round 5: $96,887 ($108,086); round 6: $104,369 ($102,222); round 7: $119,890 ($121,492); round 8: $130,408 ($133,239); round 9: $135,877 ($175,278); round 10: $135,721 ($257,049).
From round 3: round 4: $73,274 ($65,697); round 5: $97,683 ($107,302); round 6: $105,117 ($101,271); round 7: $120,767 ($120,430); round 8: $131,563 ($153,058); round 9: $136,867 ($173,810); round 10: $136,636 ($255,660).
From round 4: round 5: $99,895 ($108,629); round 6: $107,290 ($101,954); round 7: $123,050 ($120,900); round 8: $134,307 ($154,091); round 9: $139,511 ($174,702); round 10: $139,504 ($257,219).
From round 5: round 6: $111,964 ($106,137); round 7: $128,613 ($126,097); round 8: $140,275 ($160,553); round 9: $145,710 ($180,783); round 10: $145,757 ($266,303).
From round 6: round 7: $128,266 ($124,945); round 8: $139,774 ($159,324); round 9: $145,348 ($180,593); round 10: $145,301 ($266,781).
From round 7: round 8: $136,720 ($154,973); round 9: $142,020 ($170,118); round 10: $142,323 ($246,044).
From round 8: round 9: $116,249 ($157,005); round 10: $116,020 ($223,979).
From round 9: round 10: $53,929 ($113,721).

Note: Data drawn from observations of contestants on the U.S. game show, plus the authors’ simulations of virtual lotteries as explained in the text.
Fig. 11. Two Virtual Lottery Distributions in Round 1. (Top panel: density of the VL if No Deal in round 1 and then Deal in round 2; bottom panel: density of the VL if No Deal in rounds 1 and 2 and then Deal in round 3. Prize values range from $0 to $200,000.)
In round 1 the contestant can also consider what would happen if he turned down offers in rounds 1 and 2, and accepted the offer in round 3. This VL would generate, from the perspective of round 1, an average payoff of $53,757 with a standard deviation of $45,996. The bottom panel of Fig. 11 shows the simulated distribution of this particular VL. Compared to the VL in which the contestant said ‘‘No Deal’’ in round 1 and ‘‘Deal’’ in round 2, shown above it in Fig. 11, it gives less weight to the smallest prizes and greater weight to higher prizes. Similarly for each of the other VLs shown. The VL for the final Round 10 is simply the implied lottery over the final two unopened cases, since in this round the contestant would have said ‘‘No Deal’’ to all bank offers. The forward-looking contestant in round 1 is assumed to behave as if he maximizes the expected utility of accepting the current offer or continuing. The expected utility of continuing, in turn, is given by simply evaluating each of the nine VLs shown in the first row of Table 1. The average payoff increases steadily, but so does the standard deviation of payoffs, so this evaluation requires knowledge of the utility function of the contestant. Given that utility function, the contestant is assumed to behave as if they evaluate the expected utility of each of the nine VLs. Thus, we calculate nine expected utility numbers, conditional on the specification of the parameters
of the assumed utility function and the VLs that each subject faces in their round 1 choices. In round 1, the subject then simply compares the maximum of these nine expected utility numbers to the utility of the non-stochastic offer in round 1. If that maximum exceeds the utility of the offer, he turns down the offer; otherwise he accepts it. In round 2, a similar process occurs. One critical feature of our VL simulations is that they are conditioned on the actual outcomes that each contestant has faced in prior rounds. Thus, if a (real) contestant has tragically opened up the six top prizes in round 1, that contestant would not see VLs such as the ones in Table 1 for round 2. They would be conditioned on that player’s history in round 1. We report here averages over all players and all simulations. We undertake 100,000 simulations for each player in each round, so as to condition on their history.25 This example can also be used to illustrate how our maximum-likelihood estimation procedure works. Assume some specific utility function and some parameter values for that utility function, with all prizes scaled by the maximum possible at the outset of the game. The utility of the non-stochastic bank offer in round R is then directly evaluated. Similarly, the VLs in each round R can then be evaluated.26 They are represented numerically as 100-point discrete approximations, with 100 prizes and 100 probabilities associated with those prizes. Thus, by implicitly picking a VL over an offer, it is as if the subject is taking a draw from this 100-point distribution of prizes. In fact, they are playing out the DOND game, but this representation as a VL draw is formally identical. The evaluation of these VLs generates v(R) expected utilities, where v(1) = 9, v(2) = 8, …, v(9) = 1 as shown in Table 1. The maximum expected utility of these v(R) in a given round R is then compared to the utility of the offer, and the likelihood evaluated in the usual manner.
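These evaluations are simple to express in code. The sketch below computes the probability of ‘‘Deal’’ from the CRRA utility of the offer, the maximum expected utility over a set of VLs, and a Fechner noise term with a probit link, matching the formal specification that follows; prizes are scaled by the maximum possible, as in the text, and the lottery, offer, and parameter values are made up for illustration.

```python
import math

def crra_u(m, r):
    # CRRA utility u(m) = m^(1-r)/(1-r), with u(m) = ln(m) at r = 1.
    return math.log(m) if r == 1 else m ** (1.0 - r) / (1.0 - r)

def expected_utility(prizes, probs, r):
    return sum(p * crra_u(z, r) for z, p in zip(prizes, probs))

def prob_deal(offer, virtual_lotteries, r, mu):
    """Probability of 'Deal': probit of (u(offer) - max EU over VLs) / mu.

    virtual_lotteries is a list of (prizes, probs) pairs, e.g. 100-point
    discrete approximations of the simulated VLs; mu is Fechner noise.
    """
    max_eu = max(expected_utility(z, p, r) for z, p in virtual_lotteries)
    index = (crra_u(offer, r) - max_eu) / mu
    return 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))  # standard normal CDF

# Illustration: a $40,000 offer against a single 50/50 'VL' over $10 and
# $100,000, all scaled by the $1,000,000 maximum prize.
vls = [([0.00001, 0.1], [0.5, 0.5])]
print(prob_deal(0.04, vls, r=0.5, mu=0.1))
```

The log-likelihood then simply sums the log of this probability over observed ‘‘Deal’’ choices and the log of its complement over ‘‘No Deal’’ choices.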
We present a formal statement of the latent EUT process leading to a likelihood defined over parameters and the observed choices, and then discuss how this intuition changes when we assume alternative, non-EUT processes.
3.2. Formal Specification

We assume that utility is defined over money m using the popular CRRA function

u(m) = m^(1−r)/(1−r)   (1)
where r is the utility function parameter to be estimated. In this case, for r ≠ 1, r is the RRA coefficient, and u(m) = ln(m) for r = 1. With this parameterization r = 0 denotes risk-neutral behavior, r > 0 denotes risk aversion, and r < 0 denotes risk loving. We review one extension to this simple CRRA model later, but for immediate purposes it is desirable to have a simple specification of the utility function in order to focus on the estimation methodology.27 Probabilities for each outcome k, p_k, are those that are induced by the task, so expected utility is simply the probability-weighted utility of each outcome in each lottery. There were 100 outcomes in each VL i, so

EU_i = Σ_{k=1,…,100} [p_k u_k]   (2)

Of course, we can view the bank offer as being a degenerate lottery. A simple stochastic specification was used to specify likelihoods conditional on the model. The EU for each lottery pair was calculated for a candidate estimate of the utility function parameters, and the index

∇EU = (EU_BO − EU_L)/μ   (3)

is calculated, where EU_L is the EU of the lottery in the task, EU_BO the EU of the degenerate lottery given by the bank offer, and μ a Fechner noise parameter following Hey and Orme (1994).28 The index ∇EU is then used to define the cumulative probability of the observed choice to ‘‘Deal’’ using the cumulative standard normal distribution function:

G(∇EU) = Φ(∇EU)   (4)

This provides a simple stochastic link between the latent economic model and observed choices.29 The likelihood, conditional on the EUT model being true and the use of the CRRA utility function, depends on the estimate of r and μ given the above specification and the observed choices. The conditional log-likelihood is

ln L^EUT(r, μ; y) = Σ_i [(ln G(∇EU) | y_i = 1) + (ln(1 − G(∇EU)) | y_i = 0)]   (5)

where y_i = 1 (0) denotes the choice of ‘‘Deal’’ (‘‘No Deal’’) in task i. We extend this standard formulation to include forward-looking behavior by redefining the lottery that the contestant faces. One such VL reflects the
possible outcomes if the subject always says ‘‘No Deal’’ until the end of the game and receives his prize. We call this a VL since it need not happen; it does happen in some fraction of cases, and it could happen for any subject. Similarly, we can substitute other VLs reflecting other possible choices by the contestant. Just before deciding whether to accept the bank offer in round 1, what if the contestant behaves as if the following simulation were repeated G times: Play out the remaining eight rounds and pick cases at random until all but two cases are opened. Since this is the last round in which one would receive a bank offer, calculate the expected value of the remaining two cases. Then multiply that expected value by the fraction that the bank is expected to use in round 9 to calculate the offer. Pick that fraction from a prior as to the average offer fraction, recognizing that the offer fraction is stochastic.
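The simulation just described can be sketched directly. Opening cases at random until two remain leaves a uniformly random pair of survivors, which the code exploits; the Normal(0.9, 0.05) draw for the round-9 offer fraction is a placeholder for the prior described above, and the round-1 history is made up for illustration.

```python
import random
import statistics

PRIZES = [0.01, 1, 5, 10, 25, 50, 75, 100, 200, 300, 400, 500, 750, 1000,
          5000, 10000, 25000, 50000, 75000, 100000, 200000, 300000,
          400000, 500000, 750000, 1000000]

def virtual_round9_offer(unopened, rng):
    """One virtual round-9 bank offer, viewed from the end of round 1.

    Playing out rounds 2-9 by opening cases at random leaves a uniformly
    random pair of survivors, so we sample the two directly. The offer is
    a stochastic fraction of their expected value; Normal(0.9, 0.05) is an
    illustrative stand-in for the prior over the round-9 offer fraction.
    """
    final_two = rng.sample(unopened, 2)
    fraction = rng.gauss(0.9, 0.05)
    return fraction * statistics.mean(final_two)

rng = random.Random(7)
# Hypothetical history: the six cases opened in round 1 held these prizes.
opened_round1 = {0.01, 50, 750, 25000, 300000, 1000000}
unopened = [z for z in PRIZES if z not in opened_round1]  # 20 cases remain

G = 100000
offers = [virtual_round9_offer(unopened, rng) for _ in range(G)]

# 100-point discrete approximation: percentiles of the G simulated offers,
# each carrying probability 1/100, ready for expected-utility evaluation.
offers.sort()
approx = [offers[i * G // 100] for i in range(100)]
print(statistics.mean(offers), approx[0], approx[-1])
```

Because each of the G draws occurs with probability 1/G, the sorted percentile grid is a faithful, numerically manageable summary of the virtual lottery.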
The end result of this simulation is a sequence of G virtual bank offers in round 9, viewed from the perspective of round 1. This sequence then defines the VL to be used for a contestant in round 1 whose horizon is the last round in which the bank will make an offer. Each of the G bank offers in this virtual simulation occurs with probability 1/G, by construction. To keep things numerically manageable, we can then take a 100-point discrete approximation of this lottery, which would otherwise typically consist of G distinct real values, where one would like G to be relatively large (we use G = 100,000). This simulation is conditional on the six cases that the subject has already selected at the end of round 1. Thus, the lottery reflects the historical fact of the six specific cases that this contestant has already opened. The same process can be repeated for a VL that only involves looking forward to the expected offer in round 8. And for a VL that only involves looking forward to rounds 7, 6, 5, 4, 3, and 2, respectively. Table 1 illustrates the outcome of such calculations. The contestant can be viewed as having a set of nine VLs to compare, each of which entails saying "No Deal" in round 1. The different VLs imply different choices in future rounds, but the same response in round 1. To decide whether to accept the deal in round 1, we assume that the subject simply compares the maximum EU over these nine VLs with the utility of the deterministic offer in round 1. To calculate EU and utility of the offer one needs to know the parameters of the utility function, but these are just nine EU evaluations and one utility evaluation. These evaluations can be undertaken within a likelihood function evaluator, given candidate values of the parameters of the utility function. The same process can be repeated in round 2, generating another set of eight VLs to be compared to the actual bank offer in round 2. This
388
STEFFEN ANDERSEN ET AL.
simulation would not involve opening as many cases, but the logic is the same. Similarly for rounds 3–9. Thus, for each of rounds 1–9, we can compare the utility of the actual bank offer with the maximum EU of the VLs for that round, which in turn reflects the EU of receiving a bank offer in future rounds in the underlying game. In addition, there exists a VL in which the subject says "No Deal" in every round. This is the VL that we view as being realized in round 10 in Table 1.

There are several significant advantages of this VL approach. First, since the round associated with the highest expected utility is not the same for all contestants, due to heterogeneity in risk attitudes, it is of interest to estimate the length of this horizon. Since a contestant with a short horizon behaves in essentially the same manner as one with a longer horizon, merely substituting different VLs into the latent EUT calculus, it is easy to test hypotheses about restrictions on the horizon generated by more myopic behavior. Second, one can specify mixture models of different horizons, and let the data determine what fraction of the sample employs which horizon. Third, the approach generalizes to any known offer function, not just the ones assumed here and in Table 1. Thus, it is not as specific to the DOND task as it might initially appear. This is important if one views DOND as a canonical task for examining fundamental methodological aspects of dynamic choice behavior. Those methods should not exploit the specific structure of DOND unless there is no loss in generality. In fact, other versions of DOND can be used to illustrate the flexibility of this approach, since they sometimes employ "follow-on" games that can simply be folded into the VL simulation.
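A minimal sketch of how Eqs. (3)–(5) combine with the VL construction, assuming equiprobable VL outcomes and CRRA utility u(x) = x^(1-r)/(1-r). All names are hypothetical, and note that with unnormalized utilities the Fechner parameter is scale-dependent, so the mu used in the example is not comparable to the estimates reported below.

```python
import math
from statistics import NormalDist

def crra_u(x, r):
    """CRRA utility u(x) = x**(1-r)/(1-r); r = 1 handled as log."""
    return math.log(x) if r == 1 else x ** (1 - r) / (1 - r)

def eu(lottery, r):
    """Expected utility of a virtual lottery given as equiprobable outcomes."""
    return sum(crra_u(x, r) for x in lottery) / len(lottery)

def deal_loglik(offer, vls, chose_deal, r, mu):
    """Contribution of one Deal/No Deal choice to ln L, per Eqs. (3)-(5).

    The No Deal side is the best of the contestant's virtual lotteries;
    the index (EU_BO - EU_L)/mu is mapped through the standard normal CDF.
    """
    eu_bo = crra_u(offer, r)                # degenerate bank-offer lottery
    eu_nd = max(eu(vl, r) for vl in vls)    # best forward-looking VL
    index = (eu_bo - eu_nd) / mu            # Fechner latent index
    p_deal = NormalDist().cdf(index)
    return math.log(p_deal if chose_deal else 1 - p_deal)
```

Summing `deal_loglik` over all round-by-round choices gives the conditional log-likelihood that a maximum-likelihood routine would search over candidate (r, mu) values.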
Finally, and not least, this approach imposes virtually no numerical burden on the maximum-likelihood optimization part of the numerical estimation stage: all that the likelihood function evaluator sees in a given round is a non-stochastic bank offer, a handful of (virtual) lotteries to compare it to given certain proposed parameter values for the latent choice model, and the actual decision of the contestant to accept the offer or not. This parsimony makes it easy to examine non-CRRA and non-EUT specifications of the latent dynamic choice process, illustrated in Andersen et al. (2006a, 2006b). All estimates allow for the possibility of correlation between responses by the same subject, so the standard errors on estimates are corrected for the possibility that the responses are clustered for the same subject. The use of clustering to allow for ‘‘panel effects’’ from unobserved individual effects is common in the statistical survey literature.30 In addition, we consider allowances for random effects from unobserved individual heterogeneity31
after estimating the initial model that assumes that all subjects have the same preferences for risk.
3.3. Estimates from Behavior on the Game Show

We estimate the CRRA coefficient to be 0.18 with a standard error of 0.030, implying a 95% confidence interval between 0.12 and 0.24. So this provides evidence of moderate risk aversion over this large domain. The noise parameter \mu is estimated to be 0.077, with a standard error of 0.015. Based on the estimated risk coefficients we can calculate the future round for which each contestant had the highest expected utility, seen from the perspective of the round in which each decision is made. Fig. 12 displays histograms of these implied maximum-EU rounds for each round-specific decision. For example, when contestants are in round 1 making a decision over "Deal" or "No Deal," we see that there is a strong mode for future round 9 as the round with the maximum EU, given the estimated risk coefficients. The prominence of round 9 remains across all rounds in which contestants face a "Deal" or "No Deal" choice, although we can
[Fig. 12. Evaluation Horizon by Round. Nine histogram panels, one per decision round (1–9); x-axis: Future Round Used (1–10); y-axis: Frequency (0–150).]
see that in rounds 5–7 there is a slight increase in the frequency with which earlier rounds provide the maximum EU for some contestants. The expected utilities of other VLs may well have generated the same binary decision, but the round-9 VL was the one that appeared to be used, since its expected utility was greater than that of the others. We assume in the above analysis that all contestants can and do evaluate the expected utility of all VLs, defined as the EU of bank offers in future rounds. Nevertheless, it is possible that some, perhaps all, contestants used a more myopic approach and evaluated EU over much shorter horizons. It is a simple matter to examine the effects of constraining the horizon over which the contestant is assumed to evaluate options. If one assumed that choices in each round were based on a comparison of the bank offer and the expected outcome from the terminal round, ignoring the possibility that the maximum EU may be found for an intervening round, then the CRRA estimate becomes 0.12, with a 95% confidence interval between 0.10 and 0.15. We cannot reject the hypothesis that subjects behave as if they are less risk averse if they are only assumed to look to the terminal round and ignore the intervening bank offers. If one instead assumes that choices in each round were based on a myopic horizon, in which the contestant just considers the distribution of likely offers in the very next round, the CRRA estimate becomes 0.22, with a 95% confidence interval between 0.18 and 0.42. Thus, we obtain results that are similar to those obtained when we allow subjects to consider all horizons, although the estimates are biased and imply greater risk aversion than the unconstrained estimates. The estimated noise parameter increases to 0.12, with a standard error of 0.043.
Overall, the estimates assuming myopia are statistically significantly different from the unconstrained estimates, even if the estimates of risk attitudes are substantively similar. Our specification of alternative evaluation horizons does not lead to a nested hypothesis test of parameter restrictions, so a formal test of the differences in these estimates requires a non-nested hypothesis test. We use the popular Vuong (1989) procedure, even though it rests on some strong assumptions, discussed in Harrison and Rutström (2005). We find that we can reject the hypothesis that the evaluation horizon is only the terminal horizon with a p-value of 0.026, and also reject the hypothesis that the evaluation horizon is myopic with a p-value of less than 0.0001. Finally, we can consider the validity of the CRRA assumption in this setting by allowing RRA to vary with prizes. One natural candidate utility function to replace (1) is the Hyperbolic Absolute Risk Aversion (HARA) function of Merton (1971). We use a specification of HARA32
given in Gollier (2001):

U(y) = \zeta \left( \eta + \frac{y}{\gamma} \right)^{1-\gamma}, \quad \gamma \neq 0    (1')

where the parameter \zeta can be set to 1 for estimation purposes without loss of generality. This function is defined over the domain of y such that \eta + y/\gamma > 0. The first-order derivative with respect to income is

U'(y) = \frac{\zeta(1-\gamma)}{\gamma} \left( \eta + \frac{y}{\gamma} \right)^{-\gamma}

which is positive if and only if \zeta(1-\gamma)/\gamma > 0 for the given domain of y. The second-order derivative is

U''(y) = -\frac{\zeta(1-\gamma)}{\gamma} \left( \eta + \frac{y}{\gamma} \right)^{-\gamma-1} < 0

which is negative for the given domain of y. Hence it is not possible to specify risk-loving behavior with this specification when non-satiation is assumed. This is not a particularly serious restriction for a model of aggregate behavior in DOND. With this specification ARA is 1/(\eta + y/\gamma), so the inverse of ARA is linear in income; RRA is y/(\eta + y/\gamma), which can both increase and decrease with income. Relative risk aversion is independent of income and equal to \gamma when \eta = 0. Using the HARA utility function, we estimate \eta to be 0.30, with a standard error of 0.070 and a 95% confidence interval between 0.15 and 0.43. Thus, we can easily reject the assumption of CRRA over this domain. We estimate \gamma to be 0.992, with a standard error of 0.001. Evaluating RRA over various prize levels reveals an interesting pattern: RRA is virtually 0 for all prize levels up to around $10,000, when it becomes 0.03, indicating very slight risk aversion. It then increases sharply as prize levels increase. At $100,000 RRA is 0.24, at $250,000 it is 0.44, at $500,000 it is 0.61, at $750,000 it is 0.70, and finally at $1 million it is 0.75. Thus, we observe striking evidence of risk neutrality for small stakes, at least within the context of this task, and risk aversion for large stakes. If contestants are constrained to only consider the options available to them in the next round, roughly the same estimates of risk attitudes obtain, even if one can again statistically reject this implicit restriction.
RRA is again overestimated, reaching 0.39 for prizes of $100,000, 0.61 for prizes of $250,000, and 0.86 for prizes of $1 million. On the other hand, assuming that contestants only evaluate the terminal option leads to much lower
estimates of risk aversion, consistent with the findings assuming CRRA. In this case there is virtually no evidence of risk aversion at any prize level up to $1 million, which is clearly implausible a priori.
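As a check on the reported pattern, the implied RRA schedule can be computed directly from the HARA point estimates. This assumes income is measured in millions of dollars, a units assumption not stated in the text, but one that reproduces the reported magnitudes to within a point or two of rounding.

```python
def hara_rra(y, eta=0.30, gamma=0.992):
    """Relative risk aversion implied by the HARA utility of Eq. (1'):
    RRA(y) = y / (eta + y/gamma), using the reported point estimates
    and income y in millions of dollars (an assumed unit of account)."""
    return y / (eta + y / gamma)

# RRA at the prize levels discussed in the text ($10k ... $1m):
profile = {y: round(hara_rra(y), 2) for y in (0.01, 0.1, 0.25, 0.5, 0.75, 1.0)}
```

With these parameters RRA is near zero for small prizes and rises steeply toward the top of the prize distribution, matching the qualitative pattern described above.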
3.4. Approximation to the Fully Dynamic Path

Our VL approach makes one simplifying assumption which dramatically enhances its ability to handle complicated sequences of choices, but which can lead to a bias in the resulting estimates of risk attitudes. To illustrate, consider the contestant in round 8, facing three unopened prizes and having to open one prize if he declines the bank offer in round 8. Call these prizes X, Y, and Z. There are three combinations of prizes that could remain after opening one prize. Our approach to the VL, from the perspective of the round-8 decision, evaluates the payoffs that confront the contestant for each of these three combinations if he "mentally locks himself into saying Deal (D) in round 9 and then gets the stochastic offer given the unopened prizes" or if he "mentally locks himself into saying No Deal (ND) in round 9 and then opens one more prize." The former is the VL associated with the strategy of saying ND in round 8 and D in round 9, and the latter is the VL associated with the strategy of saying ND in round 8 and ND again in round 9. We compare the EU of these two VLs as seen from round 8, and pick the larger as representing the EU from saying ND in round 8. Finally, we compare this EU to the utility from saying D in round 8, since the offer in round 8 is known and deterministic. The simplification comes from the fact that we do not evaluate the utility function in each of the possible virtual round-9 decisions. A complete enumeration of each possible path would undertake three paired comparisons. Consider the three possible outcomes: If prize X had been opened we would have Y and Z unopened coming into virtual round 9. This would generate a distribution of offers in virtual round 9 (it is a distribution since the expected offer as a percent of the EV of unopened prizes is stochastic as viewed from round 8). It would also generate two outcomes if the contestant said ND: either he opens Y or he opens Z.
A complete enumeration in this case should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of Y and Z. If prize Y had been opened we would have X and Z unopened coming into virtual round 9. A complete enumeration should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of X and Z.
If prize Z had been opened we would have X and Y unopened coming into virtual round 9. A complete enumeration should evaluate the EU of saying D and compare it to the EU of a 50–50 mix of X and Y. Instead of these three paired comparisons in virtual round 9, our approach collapses all of the offers from saying D in virtual round 9 into one VL, and all of the final prize earnings from saying ND in virtual round 9 into another single VL. Our approach can be viewed as a valid solution to the dynamic problem the contestant faces if one accepts the restriction in the set of control strategies considered by the contestant. This restriction could be justified on behavioral grounds, since it does reduce the computational burden if in fact the contestant was using a process such as we use to evaluate the path. On the other hand, economists typically view the adoption of the optimal path as an "as if" prediction, in which case this behavioral justification would not apply. Or our approach may just be viewed as one way to descriptively model the forward-looking behavior of contestants, which is one of the key features of the analysis of the DOND game show. Just as we have alternative ways of modeling static choice under uncertainty, we can have alternative ways of modeling dynamic choice under uncertainty. At some point it would be valuable to test these alternative models against each other, but that does not have to be the first priority in trying to understand DOND behavior. It is possible to extend our general VL approach to take these possibilities into account, since one could keep track of all three pairs of VLs in the above complete enumeration, rather than collapsing them down to just one pair. Refer to this complete enumeration as VL*. From the perspective of the contestant, we know that EU(VL*) >= EU(VL), since VL* contains VL as a special case.
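The distinction between the collapsed lottery and the complete enumeration (call it VL*) can be made concrete for the round-8 example; the concave utility and deterministic offer function used below are illustrative assumptions, not the estimated ones.

```python
def eu(outcomes, u):
    """Expected utility of an equiprobable lottery."""
    return sum(u(x) for x in outcomes) / len(outcomes)

def nd_eu_collapsed(prizes, offer_fn, u):
    """Collapsed VL approach: pool all virtual round-9 offers into one
    lottery and all final prizes into another, then take the max of
    the two expected utilities."""
    offers, finals = [], []
    for opened in prizes:                      # case opened in round 8
        rest = [p for p in prizes if p != opened]
        offers.append(offer_fn(rest))          # virtual round-9 offer
        finals.extend(rest)                    # outcomes of ND in round 9
    return max(eu(offers, u), eu(finals, u))

def nd_eu_full(prizes, offer_fn, u):
    """Complete enumeration (VL*): choose D vs. ND separately within
    each virtual round-9 branch, then average across branches."""
    branch_eus = []
    for opened in prizes:
        rest = [p for p in prizes if p != opened]
        branch_eus.append(max(u(offer_fn(rest)), eu(rest, u)))
    return sum(branch_eus) / len(branch_eus)
```

Because the average of per-branch maxima can never fall below the maximum of the pooled averages, the full enumeration always weakly dominates the collapsed lottery, which is exactly the EU(VL*) >= EU(VL) ranking invoked in the text.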
We can therefore identify the implication of using VL instead of VL* for our inferences about risk attitudes, again considering the contestant in round 8 for ease of exposition, and assuming that the contestant actually undertakes full enumeration as reflected in VL*. Specifically, we will understate the EU of saying ND in round 8. This means that our ML estimation procedure would be biased toward finding less risk aversion than there actually is. To see this, assume some trial value of a CRRA risk aversion parameter. There are three possible cases, taking strict inequalities to be able to state matters crisply:

1. If this trial parameter r generates EU(VL*) > EU(VL) > U(D) then the VL approach would make the same qualitative inference as the
VL* approach, but would understate the likelihood of that observation. This understatement comes from the implication that EU(VL) - U(D) < EU(VL*) - U(D), and it is this difference that determines the probability of the observed choice (after some adjustment for a stochastic error).

2. If this trial parameter r generates EU(VL) < EU(VL*) < U(D) then the VL approach would again make the same qualitative inference as the VL* approach, but would overstate the likelihood of that observation. This overstatement comes from the implication in this case that EU(VL) - U(D) < EU(VL*) - U(D).

3. If this trial parameter r generates EU(VL*) > U(D) > EU(VL), then the VL approach would lead us to predict that the subject would make the D decision, whereas the VL* approach would lead us to predict that the subject would make the ND decision. If we assume that the subject is actually motivated by VL*, and we incorrectly use VL, we would observe a choice of ND and would be led to lower our trial parameter r to better explain the observed choice; lowering r would make the subject less risk averse, and more likely to reject the D decision under VL. But we should not have lowered the parameter r; we should just have calculated the EU of the ND choice using VL* instead of VL.

Note that one cannot just tabulate the incidence of these three cases at the final ML estimate of r, and check to see whether the vast bulk of choices fall into case #1 or case #2, since that estimate would have been adjusted to avoid case #3 if possible. And there is no presumption that the bias of the likelihood estimation in case #1 is just offset by the bias in case #2. So the bias from case #3 would lead us to expect that risk aversion would be underestimated, but the secondary effects from cases #1 and #2 should also be taken into account. Of course, if the contestant does not undertake full enumeration, and instead behaves consistently with the logic of our VL model, there is no bias at all in our estimates.
The only way to evaluate the extent of the bias is to undertake the complete enumeration required by VL* and compare it to the approximation obtained with VL. We have done this for the game show data in the United States, starting with behavior in round 6. By skipping behavior in rounds 1–5 we drop only 15 of 141 subjects, since undertaking the complete enumeration from earlier rounds is computationally intensive. We employ a 19-point approximation of the empirical distribution of bank offers in each round; in the VL approach we sampled 100,000 times from those distributions as part of the VL simulations. We then estimate the CRRA
model using VL, estimate the same model for the same behavior using VL*, and compare results. We find that the inferred CRRA coefficient increases as we move to VL*, as expected a priori, but by a very small amount. Specifically, we estimate CRRA to be 0.366 if we use VL* and 0.345 if we use VL, and the 95% confidence intervals comfortably overlap (they are 0.25 and 0.48 for the VL* approach, and 0.25 and 0.44 for the VL approach). The log-likelihood under the VL approach is -212.54824, and it is -211.27711 under the VL* approach, consistent with the VL* approach providing a better fit, but only a marginally better one. Thus, we can claim that our VL approach provides an excellent approximation to the fully dynamic solution. It is worth stressing that which estimate is the correct one depends on the assumptions made about contestant behavior. If one assumes that contestants in fact use strategies such as those embodied in VL, then using VL* would actually overstate true risk aversion, albeit by a trivial amount.
3.5. Estimates from Behavior in the Laboratory

The lab results indicate a CRRA coefficient of 0.45 and a 95% confidence interval between 0.38 and 0.52, comparable to results obtained using more familiar risk elicitation procedures due to Holt and Laury (2002) on the same subject pool. When we restrict the estimation model to only use the terminal period we again infer a much lower degree of risk aversion, consistent with risk neutrality; the CRRA coefficient is estimated to be -0.02 with a 95% confidence interval between -0.07 and 0.03. Constraining the estimation model to only consider prospects one period ahead leads to higher inferred risk aversion; the CRRA coefficient is estimated to be 0.48 with a 95% confidence interval between 0.41 and 0.55.
4. CONCLUSIONS

Game shows offer obvious advantages for the estimation of risk attitudes, not the least being the use of large stakes. Our review of analyses of these data reveals a steady progression of sophistication in terms of the structural estimation of models of choice under uncertainty. Most of these shows, however, put the contestant into a dynamic decision-making environment, so one cannot simply (and reliably) use static models of choice. Using DOND as a detailed case study, we considered a general estimation
methodology for such shows in which randomization of the potential outcomes allows us to break the curse of dimensionality that comes from recognizing these dynamic elements of the task environment. The DOND paradigm is important for several reasons, and more general than it might at first seem. It incorporates many of the dynamic, forward-looking decision processes that strike one as a natural counterpart to a wide range of fundamental economic decisions in the field. The "option value" of saying "No Deal" has clear parallels to the financial literature on stock market pricing, as well as to many investment decisions that have future consequences (so-called real options). There is no frictionless market ready to price these options, so familiar arbitrage conditions for equilibrium valuation play no immediate role, and one must worry about how the individual makes these decisions. The game show offers a natural experiment, with virtually all of the major components replicated carefully from show to show, and even from country to country. The only sense in which DOND is restrictive is that it requires the contestant to make a binary "stop/go" decision. This is already a rich domain, as illustrated by several prominent examples: the evaluation of replacement strategies for capital equipment (Rust, 1987) and the closure of nuclear power plants (Rothwell & Rust, 1997). But it would be valuable to extend the choice variable to be non-binary, as in Card Sharks, where the contestant must decide the bet level in each round as well as some binary decision (whether to switch the face card). Although some progress has been made on this problem, reviewed in Rust (1994), the range of applications has not been wide (e.g., Rust & Rothwell, 1995). Moreover, none of these have considered risk attitudes, let alone associated concepts such as loss aversion or probability weighting.
Thus, the detailed analysis of choice behavior in environments such as Card Sharks should provide a rich test case for many broader applications. These game shows provide a particularly fertile environment to test extensions to standard EUT models, as well as alternatives to EUT models of risk attitudes. Elsewhere, we have discussed applications that consider rank-dependent models such as RDU, and sign-dependent models such as CPT (Andersen et al., 2006a, 2006b). These applications, using the VL approach and U.K. data, have demonstrated the sensitivity of inferences to the manner in which key concepts are operationalized. Andersen et al. (2006a) find striking evidence of probability weighting, which is interesting since the DOND game has symmetric probabilities on each case. Using natural reference points to define contestant-specific gains or losses, they find no evidence of loss aversion. Of course, that inference depends on having
identified the right reference point, but CPT is generally silent on that specification issue when it is not obvious from the frame. Andersen et al. (2006b) illustrate the application of alternative ‘‘dual-criteria’’ models of choice from psychology, built to account for lab behavior with long shot, asymmetric lotteries such as one finds in DOND. No doubt many other specifications will be considered. Within the EUT framework, Andersen et al. (2006a) demonstrate the importance of allowing for asset integration. When utility is assumed to be defined over prizes plus some outside wealth measure,33 behavior is well characterized by a CRRA specification; but when it is assumed to be defined over prizes only, behavior is better characterized by a non-CRRA specification with increasing RRA over prizes. There are three major weaknesses of game shows. The first is that one cannot change the rules of the game or the information that contestants receive, much as one can in a laboratory experiment. Thus, the experimenter only gets to watch and learn, since natural experiments are, as described by Harrison and List (2004), serendipity observed. However, it is a simple matter to design laboratory experiments that match the qualitative task domains in the game show, even if one cannot hope to have stakes to match the game show (e.g., Tenorio & Cason, 2002; Healy & Noussair, 2004; Andersen et al., 2006b; and Post, van den Assem, Baltussen, & Thaler, 2006). Once this has been done, exogenous treatments can be imposed and studied. If behavior in the default version of the game can be calibrated to behavior in a lab environment, then one has some basis for being interested in the behavioral effects of treatments in the lab. The second major weakness of game shows is the concern that the sample might have been selected by some latent process correlated with the behavior of interest to the analyst: the classic sample selection problem. 
Most analyses of game shows are aware of this, and discuss the procedures by which contestants get to participate. At the very least, it is clear that the demographic diversity is wider than found in the convenience samples of the lab. We believe that controlled lab experiments can provide guidance on the extent of sample selection into these tasks, and that the issue is a much more general one. The third major weakness of game shows is the lack of information on observable characteristics, and hence the inability to use that information to examine heterogeneity of behavior. It is possible to observe some information from the contestant, since there is normally some pre-game banter that can be used to identify sex, approximate age, marital status, and ethnicity. But the general solution here is to employ econometric methods that allow one to correct for possible heterogeneity at the level of the individual, even if one
cannot condition on observable characteristics of the individual. Until then, one either pools over subjects under the assumption that they have the same preferences, as we have done; makes restrictive assumptions that allow one to identify bounds for a given contestant and then provides contestant-specific estimates (e.g., Post et al., 2006); or pays more attention to statistical methods that allow for unobserved heterogeneity. One such method is to allow for random coefficients in each structural model to represent an underlying variation in preferences across the sample (e.g., Train, 2003, Chapter 6; De Roos & Sarafidis, 2006; and Botti et al., 2006). This is quite different from allowing for standard errors in the pooled coefficient, as we have done. Another method is to allow for finite mixtures of alternative structural models, recognizing that some choices or subjects may be better characterized in this domain by one latent decision-making process and that others may be better characterized by some other process (e.g., Harrison & Rutström, 2005). These methods are not necessarily alternatives, but they each demand relatively large data sets and considerable attention to statistical detail.
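The finite-mixture idea mentioned above can be sketched in a few lines; the two latent processes and the mixing probability pi are placeholders for whichever structural models one wishes to mix.

```python
import math

def mixture_loglik(choice_liks_a, choice_liks_b, pi):
    """Log-likelihood of a two-component finite mixture.

    `choice_liks_a[i]` and `choice_liks_b[i]` are the likelihoods of
    observation i under latent processes A and B (for example, a
    forward-looking EUT model versus a myopic-horizon model), and
    `pi` is the mixing probability of process A. Grouping terms by
    subject before summing would mirror the clustering correction
    discussed in the text.
    """
    return sum(math.log(pi * la + (1 - pi) * lb)
               for la, lb in zip(choice_liks_a, choice_liks_b))
```

In estimation, pi would be a free parameter searched over alongside the structural parameters of each component, letting the data determine what fraction of the sample each process characterizes.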
NOTES

1. Behavior on Who Wants To Be A Millionaire has been carefully evaluated by Hartley, Lanot, and Walker (2005), but this game involves a large number of options and alternatives that necessitate some strong assumptions before one can pin down risk attitudes rigorously. We focus on games in which risk attitudes are relatively easier to identify.
2. These experiments are from unpublished research by the authors.
3. In the earliest versions of the show this option only applied to the first card in the first row. In later versions it applied to the first card in each row. Finally, in the last major version it applied to any card in any row, but only one card per row could be switched.
4. Two further American versions were broadcast. One was a syndicated version in the 1986/1987 season, with Bill Rafferty as host. Another was a brief syndicated version in 2001. A British version, called Play Your Cards Right, aired in the 1980s and again in the 1990s. A German version called Bube Dame Hörig, and a Swedish version called Lagt Kort Ligger, have also been broadcast. Card Sharks re-runs remain relatively popular on the American Game Show Network, a cable station.
5. Available at http://data.bls.gov/cgi-bin/cpicalc.pl
6. Let the expected utility of the bet b be p_win U(b) + p_lose U(-b). The first-order condition for a maximum over b is then p_win U'(b) - p_lose U'(-b) = 0. Since U'(b) = \alpha exp(-\alpha b) and U'(-b) = \alpha exp(\alpha b), substitution and simple manipulation yield the formula.
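The "simple manipulation" alluded to in note 6 can be spelled out; this assumes the CARA form U(y) = -exp(-\alpha y), which is consistent with the derivatives quoted in the note.

```latex
% Optimal bet under CARA utility U(y) = -\exp(-\alpha y).
% The first-order condition p_{win} U'(b) - p_{lose} U'(-b) = 0 becomes
p_{win}\,\alpha e^{-\alpha b} = p_{lose}\,\alpha e^{\alpha b}
\;\Longrightarrow\;
e^{2\alpha b} = \frac{p_{win}}{p_{lose}}
\;\Longrightarrow\;
b^{*} = \frac{1}{2\alpha}\,\ln\!\left(\frac{p_{win}}{p_{lose}}\right).
```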
7. In addition, a variable given by stake²/2,000 is included by itself to account for possible nonlinearities.
8. Gertner (1993, p. 512): "I treat each bet as a single observation, ignoring any contestant-specific effects."
9. He rejects this hypothesis, for reasons not important here.
10. For example, in a game aired on 9/16/2004, the category was "Speaking in Tongues." The $800 text was "A 1996 Oakland School Board decision made many aware of this term for African-American English." Uber-champion Ken Jennings correctly responded, "What be Ebonics?"
11. Nalebuff (1990, p. 182) proposed the idea of the analysis, and the use of empirical responses to avoid formal analysis of the strategic aspects of the game.
12. One formal difference is that the first-order condition underlying that formula assumes an interior solution, and the decision-maker in runaway games has to ensure that he does not bet so much that he could fall below the highest possible score of his rival. Since this constraint did not bind in the 110 data points available, it can be glossed over.
13. The Lingo Board in the U.S. version is larger, and there are more balls in the urn, with implications for the probabilities needed to infer risk attitudes.
14. Their Eq. (12) shows the formula for the general case, and Eqs. (5) and (8) for the special final-round cases assuming CRRA or CARA. There is no statement that this is actually evaluated within the maximum-likelihood evaluator, but pni is not listed as a parameter to be estimated separately from the utility function parameter, so this is presumably what was done.
15. The point estimates for the CRRA function (their Table 6, p. 837) are generally around ƒ1,800 and ƒ1,500, with standard errors of roughly ƒ200 on each. Similar results obtain for the CARA function (their Table 7, p. 839). So these differences are not obviously significant at standard critical levels.
16. A handful of special shows, such as season finales and season openers, have higher stakes, up to $6 million. Our later statistical analysis includes these data, and adjusts the stakes accordingly.
17. Or make some a priori judgments about the bounded rationality of contestants. For example, one could assume that contestants only look forward one or two rounds, or that they completely ignore bank offers.
18. Other top prizes were increased as well. For example, in the final show of the first season, the top five prizes were changed from $200k, $300k, $400k, $500k, and $1m to $300k, $400k, $500k, $2.5m, and $5m, respectively.
19. The instructions are available in Appendix A of the working paper version, available online at http://www.bus.ucf.edu/wp/
20. The screen shots provided in the instructions and computer interface were much larger, and easier to read. Baltussen, Post, and van den Assem (2006) also conducted laboratory experiments patterned on DOND. They used instructions which were literally taken from the instructions given to participants in the Dutch DOND game show, with some introductory text from the experimenters explaining the exchange rate between the experimental game show earnings and take-home payoffs. Their approach has the advantage of using the wording of instructions used in the field. Our objective was to implement a laboratory experiment based on the DOND task, and clearly referencing it as a natural counterpart to the lab
STEFFEN ANDERSEN ET AL.
experiment. But we wanted to use instructions which we had complete control over. We wanted subjects to know exactly what bank offer function was going to be used. In our view the two types of DOND laboratory experiments complement each other, in the same sense in which lab experiments, field experiments, and natural experiments are complementary (see Harrison & List, 2004). 21. Virtually all subjects indicated that they had seen the U.S. version of the game show, which was a major ratings hit on network television in five episodes screened daily at prime time just prior to Christmas in 2005. Our experiments were conducted about a month after the return of the show in the U.S., following the 2006 Olympic Games. 22. The literature has already generated a lengthy lead article in the Wall Street Journal (January 12, 2006, p. A1) and National Public Radio interviews in the U.S. with researchers Thaler and Post on the programs Day to Day (http://www.npr.org/ templates/story/story.php?storyId=5243893) and All Things Considered (http:// www.npr.org/templates/story/story.php?storyId=5244516) on March 3, 2006. 23. Appendix B is available in the working paper version, available online at http://www.bus.ucf.edu/wp/ 24. Abdellaoui, Barrios, and Wakker (2007, p. 363) offer a one-parameter version of the Expo-Power function which exhibits non-constant RRA for empirically plausible parameter values. It does impose some restrictions on the variations in RRA compared to the two-parameter EP function, but is valuable as a parsimonious way to estimate non-CRRA specifications, and could be used for ‘‘bounds analyses’’ such as these. 25. If bank offers were a deterministic and known function of the expected value of unopened prizes, we would not need anything like 100,000 simulations for later rounds. For the last few rounds of a full game, in which the bank offer is relatively predictable, the use of this many simulations is a numerically costless redundancy. 26. 
There is no need to know risk attitudes, or other preferences, when the distributions of the virtual lotteries are generated by simulation. But there is definitely a need to know these preferences when the virtual lotteries are evaluated. Keeping these computational steps separate is essential for computational efficiency, and is the same procedurally as pre-generating “smart” Halton sequences of uniform deviates for later, repeated use within a maximum-simulated likelihood evaluator (e.g., Train, 2003, p. 224ff.). 27. It is possible to extend the analysis by allowing the core parameter r to be a function of observable characteristics. Or one could view the CRRA coefficient as a random coefficient reflecting a subject-specific random effect u, so that one would estimate r̂ = r̂_0 + u instead. This is what De Roos and Sarafidis (2006) do for their core parameters, implicitly assuming that the mean of u is zero and estimating the standard deviation of u. Our approach is just to estimate r̂_0. 28. Harless and Camerer (1994), Hey and Orme (1994), and Loomes and Sugden (1995) provided the first wave of empirical studies including some formal stochastic specification in the version of EUT tested. There are several species of “errors” in use, reviewed by Hey (1995, 2002), Loomes and Sugden (1995), Ballinger and Wilcox (1997), and Loomes, Moffatt, and Sugden (2002). Some place the error at the final choice between one lottery or the other after the subject has decided deterministically which one has the higher expected utility; some place the error earlier, on the comparison of preferences leading to the choice; and some place the error even earlier, on the determination of the expected utility of each lottery.
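Notes 25 and 26 describe generating the distribution of each virtual lottery by simulation once, without reference to preferences, and only later evaluating those distributions under candidate risk attitudes. A minimal Python sketch of that two-step separation, using a hypothetical prize list and a stylized bank-offer rule (the actual DOND prizes and offer function differ):

```python
import random

# Hypothetical prize list and offer rule, for illustration only.
PRIZES = [0.01, 1, 10, 100, 1000, 10000, 75000, 500000]

def bank_offer(remaining, fraction=0.6):
    # Stylized rule: a fixed fraction of the expected value of unopened prizes.
    return fraction * sum(remaining) / len(remaining)

def virtual_lottery(prizes, n_open, n_sims=10000, seed=123):
    """Step 1: simulate the distribution of next-round bank offers.

    No preference information is needed here; we only open n_open random
    cases and record the resulting offer."""
    rng = random.Random(seed)
    return [bank_offer(rng.sample(prizes, len(prizes) - n_open))
            for _ in range(n_sims)]

def certainty_equivalent(offers, r):
    """Step 2: evaluate the pre-simulated virtual lottery under CRRA
    u(x) = x**(1 - r) / (1 - r); preferences enter only at this stage."""
    if r == 1.0:
        raise ValueError("r = 1 requires the logarithmic form")
    eu = sum(o ** (1 - r) for o in offers) / len(offers) / (1 - r)
    return (eu * (1 - r)) ** (1 / (1 - r))

offers = virtual_lottery(PRIZES, n_open=3)
ce_neutral = certainty_equivalent(offers, r=0.0)  # equals the mean offer
ce_averse = certainty_equivalent(offers, r=0.5)   # below the mean for a risk-averse agent
```

Because Step 1 is preference-free, the same simulated offer distributions can be reused for every trial value of r inside a maximum-likelihood loop, which is the efficiency point made in note 26.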
Risk Aversion in Game Shows
29. De Roos and Sarafidis (2006) assume a random effects term v for each individual and add it to the latent index defining the probability of choosing deal. This is the same thing as changing our specification (4) to G(∇EU) = F(∇EU) + v, and adding the standard deviation of v as a parameter to be estimated (the mean of v is assumed to be 0). 30. Clustering commonly arises in national field surveys from the fact that physically proximate households are often sampled to save time and money, but it can also arise from more homely sampling procedures. For example, Williams (2000, p. 645) notes that it could arise from dental studies that “collect data on each tooth surface for each of several teeth from a set of patients” or “repeated measurements or recurrent events observed on the same person.” The procedures for allowing for clustering allow heteroskedasticity between and within clusters, as well as autocorrelation within clusters. They are closely related to the “generalized estimating equations” approach to panel estimation in epidemiology (see Liang & Zeger, 1986), and generalize the “robust standard errors” approach popular in econometrics (see Rogers, 1993). Wooldridge (2003) reviews some issues in the use of clustering for panel effects, noting that significant inferential problems may arise with small numbers of panels. 31. In the DOND literature, De Roos and Sarafidis (2006) demonstrate that alternative ways of correcting for unobserved individual heterogeneity (random effects or random coefficients) generally provide similar estimates, but that they are quite different from estimates that ignore that heterogeneity. Botti, Conte, DiCagno, and D’Ippoliti (2006) also consider unobserved individual heterogeneity, and show that it is statistically significant in their models (which ignore dynamic features of the game). 32. Gollier (2001, p. 25) refers to this as a Harmonic Absolute Risk Aversion, rather than the Hyperbolic Absolute Risk Aversion of Merton (1971, p. 389). 33. This estimated measure might be interpreted as wealth, or as some function of wealth in the spirit of Cox and Sadiraj (2006).
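The clustering correction described in note 30 can be illustrated in the simplest estimator, the sample mean: the cluster-robust variance sums residuals within each cluster before squaring, so any correlation pattern inside a cluster is permitted. A sketch with simulated, hypothetical panel data (all parameters are illustrative):

```python
import random

def clustered_se_of_mean(values, clusters):
    """Cluster-robust standard error for a sample mean: residuals are summed
    within each cluster before squaring, so arbitrary correlation and
    heteroskedasticity within clusters are permitted."""
    n = len(values)
    mean = sum(values) / n
    cluster_sums = {}
    for v, g in zip(values, clusters):
        cluster_sums[g] = cluster_sums.get(g, 0.0) + (v - mean)
    return (sum(s ** 2 for s in cluster_sums.values()) / n ** 2) ** 0.5

# Hypothetical panel: 30 subjects x 10 choices each, with a subject-level
# shock, so residuals are positively correlated within subject.
rng = random.Random(7)
values, clusters = [], []
for subject in range(30):
    shock = rng.gauss(0, 1)
    for _ in range(10):
        values.append(shock + rng.gauss(0, 0.3))
        clusters.append(subject)

grand_mean = sum(values) / len(values)
se_iid = (sum((v - grand_mean) ** 2 for v in values)
          / (len(values) - 1) / len(values)) ** 0.5
se_clustered = clustered_se_of_mean(values, clusters)
# With positive within-cluster correlation, se_clustered exceeds se_iid,
# which is why ignoring clustering overstates precision.
```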
ACKNOWLEDGMENTS

Harrison and Rutström thank the U.S. National Science Foundation for research support under grants NSF/IIS 9817518, NSF/HSD 0527675, and NSF/SES 0616746. We are grateful to Andrew Theophilopoulos for artwork.
REFERENCES

Abdellaoui, M., Barrios, C., & Wakker, P. P. (2007). Reconciling introspective utility with revealed preference: Experimental arguments based on prospect theory. Journal of Econometrics, 138, 356–378.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006a). Dynamic choice behavior in a natural experiment. Working Paper 06-10, Department of Economics, College of Business Administration, University of Central Florida.
Andersen, S., Harrison, G. W., Lau, M. I., & Rutström, E. E. (2006b). Dual criteria decisions. Working Paper 06-11, Department of Economics, College of Business Administration, University of Central Florida.
Ballinger, T. P., & Wilcox, N. T. (1997). Decisions, error and heterogeneity. Economic Journal, 107, 1090–1105.
Baltussen, G., Post, T., & van den Assem, M. (2006). Stakes, prior outcomes and distress in risky choice: An experimental study based on Deal or No Deal. Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University.
Beetsma, R. M. W. J., & Schotman, P. C. (2001). Measuring risk attitudes in a natural experiment: Data from the television game show Lingo. Economic Journal, 111, 821–848.
Blavatskyy, P., & Pogrebna, G. (2006). Testing the predictions of decision theories in a natural experiment when half a million is at stake. Working Paper 291, Institute for Empirical Research in Economics, University of Zurich.
Bombardini, M., & Trebbi, F. (2005). Risk aversion and expected utility theory: A field experiment with large and small stakes. Working Paper 05-20, Department of Economics, University of British Columbia.
Botti, F., Conte, A., DiCagno, D., & D’Ippoliti, C. (2006). Risk attitude in real decision problems. Unpublished manuscript, LUISS Guido Carli, Rome.
Campbell, J. Y., Lo, A. W., & MacKinlay, A. C. (1997). The econometrics of financial markets. Princeton: Princeton University Press.
Cox, J. C., & Sadiraj, V. (2006). Small- and large-stakes risk aversion: Implications of concavity calibration for decision theory. Games and Economic Behavior, 56(1), 45–60.
De Roos, N., & Sarafidis, Y. (2006). Decision making under risk in Deal or No Deal. Working Paper, School of Economics and Political Science, University of Sydney.
Gertner, R. (1993). Game shows and economic behavior: Risk-taking on Card Sharks. Quarterly Journal of Economics, 108(2), 507–521.
Gollier, C. (2001). The economics of risk and time. Cambridge, MA: MIT Press.
Harless, D. W., & Camerer, C. F. (1994). The predictive utility of generalized expected utility theories. Econometrica, 62(6), 1251–1289.
Harrison, G. W., Johnson, E., McInnes, M. M., & Rutström, E. E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Harrison, G. W., Lau, M. I., & Rutström, E. E. (2007). Estimating risk attitudes in Denmark: A field experiment. Scandinavian Journal of Economics, 109(2), 341–368.
Harrison, G. W., Lau, M. I., Rutström, E. E., & Sullivan, M. B. (2005). Eliciting risk and time preferences using field experiments: Some methodological issues. In: J. Carpenter, G. W. Harrison & J. A. List (Eds), Field experiments in economics (Vol. 10). Greenwich, CT: JAI Press, Research in Experimental Economics.
Harrison, G. W., & List, J. A. (2004). Field experiments. Journal of Economic Literature, 42(4), 1013–1059.
Harrison, G. W., & Rutström, E. E. (2005). Expected utility theory and prospect theory: One wedding and a decent funeral. Working Paper 05-18, Department of Economics, College of Business Administration, University of Central Florida; Experimental Economics, forthcoming.
Harrison, G. W., & Rutström, E. E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Vol. 12). Bingley, UK: Emerald, Research in Experimental Economics.
Hartley, R., Lanot, G., & Walker, I. (2005). Who really wants to be a millionaire? Estimates of risk aversion from gameshow data. Working Paper, Department of Economics, University of Warwick.
Healy, P., & Noussair, C. (2004). Bidding behavior in the Price Is Right game: An experimental study. Journal of Economic Behavior and Organization, 54, 231–247.
Hey, J. (1995). Experimental investigations of errors in decision making under risk. European Economic Review, 39, 633–640.
Hey, J. D. (2002). Experimental economics and the theory of decision making under uncertainty. Geneva Papers on Risk and Insurance Theory, 27(1), 5–21.
Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47, 263–291.
Levitt, S. D., & List, J. A. (2007). What do laboratory experiments measuring social preferences reveal about the real world? Journal of Economic Perspectives, 21(2), 153–174.
Liang, K.-Y., & Zeger, S. L. (1986). Longitudinal data analysis using generalized linear models. Biometrika, 73, 13–22.
Loomes, G., Moffatt, P. G., & Sugden, R. (2002). A microeconometric test of alternative stochastic theories of risky choice. Journal of Risk and Uncertainty, 24(2), 103–130.
Loomes, G., & Sugden, R. (1995). Incorporating a stochastic element into decision theories. European Economic Review, 39, 641–648.
Markowitz, H. (1952). The utility of wealth. Journal of Political Economy, 60, 151–158.
Merton, R. C. (1971). Optimum consumption and portfolio rules in a continuous-time model. Journal of Economic Theory, 3, 373–413.
Metrick, A. (1995). A natural experiment in ‘Jeopardy!’. American Economic Review, 85(1), 240–253.
Mulino, D., Scheelings, R., Brooks, R., & Faff, R. (2006). An empirical investigation of risk aversion and framing effects in the Australian version of Deal or No Deal. Working Paper, Department of Economics, Monash University.
Nalebuff, B. (1990). Puzzles: Slot machines, zomepirac, squash, and more. Journal of Economic Perspectives, 4(1), 179–187.
Post, T., van den Assem, M., Baltussen, G., & Thaler, R. (2006). Deal or no deal? Decision making under risk in a large-payoff game show. Working Paper, Department of Finance, Erasmus School of Economics, Erasmus University; American Economic Review, forthcoming.
Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior and Organization, 3(4), 323–343.
Quiggin, J. (1993). Generalized expected utility theory: The rank-dependent model. Norwell, MA: Kluwer Academic.
Rogers, W. H. (1993). Regression standard errors in clustered samples. Stata Technical Bulletin, 13, 19–23.
Rothwell, G., & Rust, J. (1997). On the optimal lifetime of nuclear power plants. Journal of Business and Economic Statistics, 15(2), 195–208.
Rust, J. (1987). Optimal replacement of GMC bus engines: An empirical model of Harold Zurcher. Econometrica, 55, 999–1033.
Rust, J. (1994). Structural estimation of Markov decision processes. In: D. McFadden & R. Engle (Eds), Handbook of econometrics (Vol. 4). Amsterdam, NL: North-Holland.
Rust, J. (1997). Using randomization to break the curse of dimensionality. Econometrica, 65(3), 487–516.
Rust, J., & Rothwell, G. (1995). Optimal response to a shift in regulatory regime: The case of the US nuclear power industry. Journal of Applied Econometrics, 10, S75–S118.
Tenorio, R., & Cason, T. (2002). To spin or not to spin? Natural and laboratory experiments from The Price Is Right. Economic Journal, 112, 170–195.
Train, K. E. (2003). Discrete choice methods with simulation. New York, NY: Cambridge University Press.
Vuong, Q. H. (1989). Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica, 57(2), 307–333.
Williams, R. L. (2000). A note on robust variance estimation for cluster-correlated data. Biometrics, 56, 645–646.
Wooldridge, J. (2003). Cluster-sample methods in applied econometrics. American Economic Review (Papers and Proceedings), 93, 133–138.
FURTHER REFLECTIONS ON THE REFLECTION EFFECT

Susan K. Laury and Charles A. Holt

ABSTRACT

This paper reports a new experimental test of the notion that behavior switches from risk averse to risk seeking when gains are “reflected” into the loss domain. We conduct a sequence of experiments that allows us to directly compare choices under reflected gains and losses where real and hypothetical payoffs range from several dollars to over $100. Lotteries with positive payoffs are transformed into lotteries over losses by multiplying all payoffs by –1, that is, by reflecting payoffs around zero. When we use hypothetical payments, more than half of the subjects who are risk averse for gains turn out to be risk seeking for losses. This reflection effect is diminished considerably with cash payoffs, where the modal choice pattern is to exhibit risk aversion for both gains and losses. However, we do observe a significant difference in risk attitudes between losses (where most subjects are approximately risk neutral) and gains (where most subjects are risk averse). Reflection rates are further reduced when payoffs are scaled up by a factor of 15 (for both real and hypothetical payoffs).
Risk Aversion in Experiments
Research in Experimental Economics, Volume 12, 405–440
Copyright © 2008 by Emerald Group Publishing Limited
All rights of reproduction in any form reserved
ISSN: 0193-2306/doi:10.1016/S0193-2306(08)00009-4
SUSAN K. LAURY AND CHARLES A. HOLT
1. INTRODUCTION

One of the most widely cited articles in economics is Kahneman and Tversky’s (1979) paper on prospect theory, which is designed to explain a range of lottery choice anomalies. This theory is motivated by the authors’ laboratory surveys and by subsequent field observations (e.g., Camerer, 2001). A key observation is that decision making begins by identifying a reference point, often the current wealth position, from which people tend to be risk averse for gains and risk loving for losses. A striking prediction of the theory is the “reflection effect”: replacing all positive payoffs by their negatives (reflection around zero) reverses the choice pattern. For example, a choice between a sure payoff of 3,000 and an 80 percent chance of getting 4,000 would be replaced by a choice between a certain loss of 3,000 and an 80 percent chance of losing 4,000. The typical reflection effect would imply a risk-averse preference for the sure gain of 3,000, but a reversed preference for the risky lottery in the loss domain. Rates of reflection reported by Kahneman and Tversky (1979) were quite high; for example, 80 percent of subjects chose the sure gain of 3,000, but only 8 percent chose the sure outcome when all payoffs were transformed into losses. The intuition is that “… certainty increases the aversiveness of losses as well as the desirability of gains” (Kahneman & Tversky, 1979, p. 269). The mathematical value functions used in prospect theory (concave for gains, convex for losses) can explain such a reflection effect, even when the safer prospect is not certain. This paper reports new experiments involving choice patterns with reflected gains and losses, using lotteries with real payoffs that range from several dollars to over $100. In this paper, we will use the terms “risk aversion” and “risk seeking” to refer to concavity and convexity of the utility function.
It is worth noting that the nonlinear probability weighting present in many non-expected utility theories can also generate behavior that exhibits risk aversion or risk seeking (Tversky & Wakker, 1995).1 For example, consider an S-shaped weighting function that overweights small probabilities and underweights large probabilities, with probabilities of 0 and 1 getting weights of 0 and 1, respectively. In this setting, a person who prefers a 0.05 chance of 100 (otherwise 0) to a sure gain of 10 would exhibit risk seeking, which could be explained by overweighting of the low-probability gain. Similarly, a person who prefers a sure payoff of 85 to a 0.95 chance of 100 (otherwise 0) would exhibit risk aversion, which could be explained by underweighting of the high-probability gain. A similar analysis can explain risk seeking for
high-probability losses and risk aversion for low-probability losses. Evidence supporting this “fourfold pattern” is provided by Tversky and Kahneman (1992). Notice that each of the above choices involved a comparison of a certain payoff with an uncertain one; because probabilities of 0 and 1 are not over- or underweighted (as is typically assumed), probability weighting can have a major effect in such comparisons, generating a “certainty effect.” The experiment design used in this paper involves paired lottery choices in which the probabilities of the high and low payoffs are held constant for a given pair of lotteries, but the payoffs for one of the lotteries are closer together, that is, it involves less risk. Another key element of risk preferences in prospect theory is loss aversion, which is typically modeled as a kink in the value function at the reference point, for example, 0. The intuition is that loss aversion causes the value function to decline more rapidly as the payoff is reduced below zero, and the kink at zero produces a concavity with respect to a pair of positive and negative payoffs. The experiments reported in this paper only involved pairs of payoffs that were either both positive or both negative, in order to avoid the confounding effect of loss aversion. Despite the widespread references to prospect theory, the decision patterns reported in Kahneman and Tversky (1979) and Tversky and Kahneman (1992) are based on hypothetical payoffs, set to be about equal to median monthly income in Israeli pounds at the time. They acknowledged that using real payoffs might change some behavioral patterns. However, their interest was in economic phenomena with larger stakes than those typically used in the lab; therefore they believed that using high hypothetical payoffs was the preferred method of eliciting choices.
In doing so, they relied on the “assumption that people often know how they would behave in actual situations of choice, and on the further assumption that subjects have no special reason to disguise their true preferences” (Kahneman & Tversky, 1979). In Tversky and Kahneman (1992) they state that they found little difference in behavior between subjects who faced real and hypothetical payoffs.2 In contrast, effects of switching from hypothetical to monetary payoffs in choices between gambles were documented early on (Slovic, 1969). While the use of hypothetical payoffs may not affect behavior much when low amounts of money are involved, this may not be the case with the very high payoffs of the type used by Kahneman and Tversky to document the reflection effect. For example, Holt and Laury (2002) report that switching from hypothetical to real money payoffs has no significant effect in a series of lottery choices when the scale of payoffs is in the range of several dollars per decision problem, as is typical in economics experiments. In addition,
there is no significant effect on choices when hypothetical payoffs are scaled up by factors of 20, 50, and 90, yielding (hypothetical) payoffs of several hundred dollars in the highest payoff conditions. This might lead researchers to conclude that increasing the scale of payoffs, or using hypothetical incentives, does not affect behavioral patterns. However, risk aversion increases sharply when real payoffs in these lotteries are increased in an identical manner.3 A similar increase in risk aversion as real payments are scaled up was reported by Binswanger (1980a, 1980b). Not all studies have shown evidence of ‘‘hypothetical bias,’’ but Harrison (2006) makes a strong case for the presence of such a bias in lottery choice experiments. In particular, he reexamines the widely cited Battalio, Kagel, and Jiranyakul (1990) study that found no qualitative effects of using hypothetical rather than real payoffs. Harrison reevaluates the data using a within-subject analysis, instead of a between-subjects analysis, and finds a significant difference between risk attitudes in real payoff and hypothetical payoff settings.4 These results are not surprising to the extent that risk aversion may be influenced by emotional considerations that psychologists call ‘‘affect’’ (Slovic, 2001), since emotional responses are likely to be stronger when gains and losses must be faced in reality.5 In view of economists’ skepticism about hypothetical incentives and of psychologists’ notions of affect, we decided to reevaluate the reflection effect using hypothetical gains and losses and real monetary gains and losses, and also to test the effect of payoff scale on choices under gains and losses of differing magnitudes. Risk seeking over losses has been observed in experiments with financial incentives that implement insurance markets. 
For example, Myagkov and Plott (1997) use market price and quantity data to infer that a majority of subjects are risk seeking in the loss domain in early periods of trading, but this tendency diminishes with experience. In contrast, Bosch-Domènech and Silvestre (1999) report a very strong tendency for subjects to purchase actuarially fair insurance against relatively large losses. This observation may indicate risk aversion in the loss domain; alternatively, it may be attributed to overweighting the low (0.2) probability of a loss (as suggested by the probability weighting function typically assumed in prospect theory).6 Laury and McInnes (2003) also find that almost all subjects choose to purchase fair insurance against low-probability losses. The percentage insuring decreases as the probability of incurring a loss increases, but about two-thirds purchase insurance when the probability of a loss is close to one-half and systematic probability misperceptions cannot be a factor. Laury, McInnes, and Swarthout (2007) report that over
90 percent of subjects purchase insurance against a 1-percent chance of losing their full $60 earned endowment when the insurance is fair, and 85 percent purchase it when the insurance price is four times the actuarially fair price. None of these studies was primarily focused on the reflection effect, and therefore none had parallel gain/loss treatments. Taken together, these market experiments provide no strong evidence either for or against such an effect, although there is some evidence in each direction. Some lottery choice experiments have directly tested the reflection effect. Hershey and Schoemaker (1980) find evidence of reflection using hypothetical choices; in their study the highest rates of reflection were observed when probabilities were extreme. Cohen, Jaffray, and Said (1987) report only mixed support for a reflection effect, with about 40 percent of the subjects exhibiting risk aversion for gains and risk preference for losses. Real payoffs were used, but the probability that any decision would be relevant was less than one in 5,000.7 Both Battalio et al. (1990) and Camerer (1989) report lottery choice experiments in which reflection patterns are present with real payoffs. These two studies involve choices where one gamble is a mean-preserving spread of the other, which is typically a certain amount of money. However, the amount of reflection (about 50 percent) is less than that reported by Kahneman and Tversky for most of their gambles. Harbaugh, Krause, and Vesterlund (2002) find that support for reflection depends on how the choice problem is presented. Specifically, they report that risk attitudes are consistent with prospect theory when subjects are asked to price gambles, but not when they choose between the gamble and its expected value. Market and insurance purchase experiments are useful in that they provide a rich, economically relevant context.
Our approach is complementary; we use a simple tool to measure risk preferences directly, based on a series of lottery choices with significant money payoffs in parallel gain and loss treatments. This menu of choices allows us to obtain a well-calibrated measure of risk attitudes, which is not possible given the single pair-wise choices used in many of the earlier studies. The goal of the paper is to document the effect of reflecting payoffs (multiplying by –1) on lottery choice data for different payoff scales: hypothetical low payoffs, hypothetical high payoffs, low money payoffs, and high money payoffs. Our design, procedures, results (for low then high payoff conditions), maximum-likelihood estimation, and conclusions are presented in Sections 2–7, respectively.
2. LOTTERY CHOICE DESIGN AND THEORETICAL PREDICTIONS

The lottery choice task for the loss domain is shown in Table 1, as a menu of 10 decisions between lotteries that we will denote by S and R. These will be referred to as Decisions 1–10 (from top to bottom). In Decision 1 at the top of the table, the choice is between a certain loss of $3.20 for S and a certain loss of 20 cents for R, so subjects should start out choosing R at the top of the table, and then switch to S as the probability of the worse outcome (a loss of $4.00 for S or of $7.70 for R) gets high enough. The optimal choice for a risk-neutral expected-utility maximizer is to choose R for the first five decisions, and then switch to S, as indicated by the sign change in the expected payoff differences shown in the right column of the table. In fact, the payoff numbers were selected so that the risk-neutral choice pattern (five risky followed by five safe choices) was optimal for constant absolute risk aversion in the range (–0.05, 0.05), which is symmetric around zero.
Table 1. Lottery Choices in the Loss Domain.

Decision   Lottery S                          Lottery R                          Expected Payoff of S –
                                                                                Expected Payoff of R
 1         0/10 of –$4.00, 10/10 of –$3.20   0/10 of –$7.70, 10/10 of –$0.20    –$3.00
 2         1/10 of –$4.00,  9/10 of –$3.20   1/10 of –$7.70,  9/10 of –$0.20    –$2.33
 3         2/10 of –$4.00,  8/10 of –$3.20   2/10 of –$7.70,  8/10 of –$0.20    –$1.66
 4         3/10 of –$4.00,  7/10 of –$3.20   3/10 of –$7.70,  7/10 of –$0.20    –$0.99
 5         4/10 of –$4.00,  6/10 of –$3.20   4/10 of –$7.70,  6/10 of –$0.20    –$0.32
 6         5/10 of –$4.00,  5/10 of –$3.20   5/10 of –$7.70,  5/10 of –$0.20     $0.35
 7         6/10 of –$4.00,  4/10 of –$3.20   6/10 of –$7.70,  4/10 of –$0.20     $1.02
 8         7/10 of –$4.00,  3/10 of –$3.20   7/10 of –$7.70,  3/10 of –$0.20     $1.69
 9         8/10 of –$4.00,  2/10 of –$3.20   8/10 of –$7.70,  2/10 of –$0.20     $2.36
10         9/10 of –$4.00,  1/10 of –$3.20   9/10 of –$7.70,  1/10 of –$0.20     $3.03
Since the two payoffs for the S lottery are of roughly the same magnitude, this lottery is relatively “safe” (i.e., the variance of outcomes is low relative to the R lottery). Therefore, increases in risk aversion will tend to cause one to switch to the S side before Decision 6. For example, with absolute risk aversion of r = 0.1 in the utility function u(x) = 1 – e^(–rx), it is straightforward to show that the expected-utility maximizing choice is R in the first four decisions, and S in subsequent decisions. Conversely, risk-loving preferences will cause a person to wait longer before switching to S, for example, to choose R in the six decisions at the top of the table for an absolute risk aversion coefficient of –0.1.8 The gain treatment was obtained from Table 1 by replacing each loss with the corresponding gain, so that Decision 1 involves a choice between certain earnings of $3.20 for S and a certain gain of $0.20 for R. This reverses the signs of the expected payoff differences shown in the final column of Table 1, so a risk-neutral person will choose S for the first five decisions before switching to Lottery R. A risk-averse person will wait longer to switch, therefore making more than five safe choices. With constant relative risk aversion (CRRA), u(x) = x^(1–r)/(1 – r) for x > 0, the expected-utility maximizing decision is to choose S in the top four rows of the transformed table with gains when r = –0.3, and to choose S in the top six rows when r = 0.3. To summarize, a risk-neutral expected-utility maximizer would make five safe choices in each treatment, risk aversion (in the sense of concave utility) implies more than five safe choices in either treatment, and risk seeking (in the sense of convex utility) implies less than five safe choices in the loss treatment.
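The switch points described above can be checked numerically. A minimal sketch, using (1 – e^(–rx))/r as the CARA normalization — an order-preserving rescaling of the text’s 1 – e^(–rx) for r > 0 that remains increasing when r < 0 — and the text’s CRRA form for the gain menu:

```python
import math

# Payoff pairs from the design; the first payoff in each pair receives
# probability (d - 1)/10 in decision d, as in Table 1 and its gain-domain mirror.
S_LOSS, R_LOSS = (-4.00, -3.20), (-7.70, -0.20)
S_GAIN, R_GAIN = (4.00, 3.20), (7.70, 0.20)

def u_cara(r):
    """CARA utility normalized as (1 - exp(-r*x)) / r, increasing for r of
    either sign; for r > 0 it is a positive rescaling of u(x) = 1 - exp(-r*x),
    so it implies the same choices."""
    return (lambda x: x) if r == 0 else (lambda x: (1 - math.exp(-r * x)) / r)

def u_crra(r):
    """CRRA utility u(x) = x**(1 - r) / (1 - r), defined here for x > 0."""
    return (lambda x: math.log(x)) if r == 1 else (lambda x: x ** (1 - r) / (1 - r))

def n_safe(s_pair, r_pair, u):
    """Number of the 10 decisions at which Lottery S is weakly preferred."""
    count = 0
    for d in range(1, 11):
        p = (d - 1) / 10
        eu_s = p * u(s_pair[0]) + (1 - p) * u(s_pair[1])
        eu_r = p * u(r_pair[0]) + (1 - p) * u(r_pair[1])
        count += eu_s >= eu_r
    return count

# Loss menu: 5 safe choices when risk neutral, 6 when r = 0.1, 4 when r = -0.1.
# Gain menu: 5 safe choices when risk neutral, 6 when r = 0.3, 4 when r = -0.3.
```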
We will interpret seeing more than five safe choices in the gain treatment and less than five safe choices in the loss treatment as behavioral evidence of a reflection effect.9 Note that this type of reflection is an empirical pattern that fits nicely with the notion of a reference point in prospect theory from which gains and losses are measured. The predictions of a formal version of prospect theory will be considered next.
2.1. Prospect Theory

We begin by reviewing the essential components of prospect theory. A prospect consists of a set of money prizes and associated probabilities. Consider a simple prospect that offers an amount x with probability p and y with probability 1 – p, where x > y > 0 are gains. The valuation
functional is PT(p: x, y) = w+(p)u(x) + (1 – w+(p))u(y), where u designates the utility of money, and w+ the probability weighting function for gains. Next consider the case where x < y < 0 are losses, which yields: PT(p: x, y) = w–(p)u(x) + (1 – w–(p))u(y). Here, u again designates the utility of money, now for losses, and w– is the probability weighting function for losses. The standard approach for losses is to first transform the probability of the lowest outcome, and not the probability of the highest outcome as is done for gains. Tversky and Kahneman (1992) estimated a value function parameterized by a utility function x^α where x > 0, and –λ(–x)^β when x < 0, where λ is a loss aversion parameter. The estimate of λ was 2.25, which creates a sharp “concave kink” in the value function at x = 0. The estimates of α and β were both 0.88, which correspond to concavity in the gain domain and convexity in the loss domain. They also concluded that w+ was not very different from w–. In what follows, we will assume that w+ and w– are the same and we will therefore denote them both by w. We will also assume that the value functions for gains and losses are symmetric in the sense that the utility of a loss is found by multiplying the utility of a gain of equal absolute value by –λ, for example, α = β in the power function parameterization. Although Tversky and Kahneman (1992) distinguish between these two parameters in their theoretical exposition, many others have adopted the simplifying assumption that α and β are identical. Further, Köbberling and Wakker (2005, p. 127) note that the assumption of CRRA and α = β allows for the identification of loss aversion without making other strong assumptions about utility. It easily follows that for x > y > 0, the prospect theory valuation functionals are “reflected” in the sense that PT(p: –x, –y) = –λPT(p: x, y), or equivalently

w(p)u(–x) + (1 – w(p))u(–y) = –λ[w(p)u(x) + (1 – w(p))u(y)]     (1)
The parameter λ is important for evaluations of mixed lotteries with both gains and losses, but such lotteries are not present in our experiment. The parameter λ plays no role in the ordering of lotteries with only losses (or only gains). Some studies conducted after Tversky and Kahneman (1992), including the data to be reported in this paper, suggest that reflection does not hold exactly, but as a benchmark it is useful to know what an exact-reflection "straw man" would imply for the choice menus that we use. Recall that our treatment transforms gains into losses of equal absolute value. The safe option is preferred in the gain domain if
w(p)u(4.00) + [1 − w(p)]u(3.20) > w(p)u(7.70) + [1 − w(p)]u(0.20), or equivalently, Option S is preferred if

w(p)/(1 − w(p)) < [u(3.20) − u(0.20)]/[u(7.70) − u(4.00)]    (2)
Similarly, in the loss domain it is straightforward to show that Option S is preferred if

w(p)/(1 − w(p)) > [u(−0.20) − u(−3.20)]/[u(−4.00) − u(−7.70)]    (3)
But it follows from (1) that the right side of (3) is the same as the right side of (2), since the λ expressions in the numerator and denominator cancel under the maintained assumptions. The reversal of the inequalities in (2) and (3) means that if Lottery S is preferred in the gain domain for any particular value of p, then Lottery R will be preferred in the loss domain for that probability. Thus, an exact reflection in the value function (1) results in an exact reflection in lottery choices. Such a reflection occurs when, for example, α = β = 0.88, as noted above. Although exact reflection (e.g., seven safe choices in the gain domain and seven risky choices in the loss domain) can be predicted under these strong parametric conditions, such behavior is not pervasive in our data. Following Tversky and Kahneman (1992), we will focus on the qualitative predictions: whether there is risk aversion in the gain domain, and if so, whether this aversion becomes risk preference in the loss domain. As noted above, the observation of more than five safe choices in either treatment is implied by risk aversion (concave utility), and the observation of fewer than five safe choices is implied by risk preference (convex utility).10
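To make the reflection identity concrete, the following sketch (our illustration, not code from the paper) implements the power value function with α = β = 0.88 and λ = 2.25, together with an inverse-S weighting function; the weighting parameter γ = 0.61 is an illustrative assumption, and it cancels in the identity in any case:

```python
# Illustrative prospect theory valuation; checks PT(p: -x, -y) = -lambda*PT(p: x, y).
ALPHA = 0.88   # curvature (alpha = beta assumed, as in the text)
LAM = 2.25     # loss aversion parameter lambda
GAMMA = 0.61   # assumed inverse-S weighting parameter (illustrative only)

def u(x):
    """Power value function: x^alpha for gains, -lambda*(-x)^alpha for losses."""
    return x ** ALPHA if x >= 0 else -LAM * (-x) ** ALPHA

def w(p):
    """Inverse-S probability weighting function."""
    return p ** GAMMA / (p ** GAMMA + (1 - p) ** GAMMA) ** (1 / GAMMA)

def pt(p, x, y):
    """PT value of a two-outcome prospect; w(p) weights the extreme outcome x
    (the highest outcome for gains, the lowest for losses)."""
    return w(p) * u(x) + (1 - w(p)) * u(y)

# Exact reflection: transforming gains into losses of equal absolute value
# multiplies the prospect's value by -lambda, for every probability p.
for p in (0.1, 0.4, 0.7):
    assert abs(pt(p, -7.70, -0.20) + LAM * pt(p, 7.70, 0.20)) < 1e-9
```

Because the same w appears on both sides, the identity holds for any weighting function, which is why the λ terms cancel in the ratio comparison of (2) and (3).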
3. PROCEDURES

All experiments were conducted at Georgia State University; participants responded to classroom and campus announcements about an opportunity to earn money in an economics research experiment. We recruited a total of 253 subjects in 25 groups, ranging in size from 4 to 16. No subject participated in more than one session. Subjects were separated by privacy dividers and were instructed not to communicate with each other after we began reading the instructions. Losses typically cannot be deducted from participants' out-of-pocket cash reserves, so it was necessary to provide an initial cash balance. For example, Myagkov and Plott (1997) began by
giving each participant a cash balance of $60. We chose to have subjects earn their initial balance; therefore all first participated in another decision-making task. We hoped that by doing so they would not view these earnings as windfall gains.11 Therefore, we appended the lottery choices for losses and gains to the end of research experiments being used for other projects.12 Instructions (contained in the Appendix) and the choice tasks were identical between the real and hypothetical sessions. At the beginning of the hypothetical payment sessions, subjects were given a handout (contained in the Appendix) that informed them that all earnings were hypothetical. The instructions read, in part, "The instructions … describe how your earnings depend on your decisions (and sometimes on the decisions of others). It is important that you understand that you will not actually be receiving any of this additional money (other than your $45 participation fee)." All subjects signed a statement indicating that they understood this. All sessions (real and hypothetical) began with a simple lottery choice to acquaint subjects with the procedures and the 10-sided die that was used to determine the random outcomes. The payoffs in this initial lottery choice task differed from those used later. After finishing these initial tasks, subjects knew their earnings up to that point. In the real payment sessions, these initial earnings averaged $43, and ranged from $21.68 to $92.08. As noted above, subjects in hypothetical sessions received a $45 participation fee. Even though the average cash amounts were about the same in the two treatments, the initial cash amounts differed from person to person in the real-payoff treatments, which could have an effect on variations in observed risk attitudes. The experiments reported here consisted of four choice tasks.
The first and third of these were the lottery choice menus shown in Table 1, with alternation in the order of the gain and loss treatments in each pair of sessions to ensure that approximately the same number of subjects encountered each order. Thus, potential order effects were controlled in the low-payoff treatments (top two rows of Table 2) by alternating the order of the gain and loss treatments. As explained below, the average numbers of safe choices observed for the two orders were essentially the same, so for the high-payoff treatments (real and hypothetical) shown in the bottom three rows of Table 2, all sessions were conducted with the loss treatment first. In order to minimize "carry-over effects," these lottery choice tasks were separated by an intentionally neutral decision, a symmetric matching pennies game with (real or hypothetical) payoffs of $3.00 for the "winner" or $2.00 for the "loser" in each cell. In the lottery choice parts, all 10 choices were presented as in Table 1, but with the lotteries labeled as Option A and Option B, and without the expected payoff calculations that might bias subjects toward risk-neutral decisions. Option A was always listed on the left side of the decision sheet. For about half of these subjects, Option A was the safe lottery; it was the risky lottery for the remaining subjects. Table 2 shows the number of subjects in each treatment and presentation order. Probabilities were presented in terms of the outcome of a throw of a 10-sided die, for example, "$3.20 if the throw is 1 or 2, …" The instructions also specified that payoffs would be determined by one decision selected ex post (again with the throw of a 10-sided die).13 We collected decisions for all four parts (the gain and loss menus and the two matching pennies games) before determining earnings for any of them. While this does not exactly hold (anticipated) wealth effects constant, it does control for emotional responses to good or bad outcomes in each part. Moreover, wealth effects do not matter in prospect theory, since the utility valuations are based on gains and losses from the current wealth position.

Table 2. Number of Subjects by Treatment and Order.

Payoff Treatment       Initial     Option A "Safe"               Option A "Risky"
                       Earnings    Gains First   Losses First    Gains First   Losses First
Low hypothetical       $45         19            19              23            20
Low real               $43         19            19              22            16
High hypothetical      $45         0             16              0             16
High hypothetical(a)   $132        0             16              0             16
High real              $140        0             16              0             16

(a) Decisions in this hypothetical payoff experiment followed another experiment that used very high real earnings.
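The ex post payment procedure can be sketched as follows (our reconstruction of the mechanics, not the authors' software; the task labels are hypothetical):

```python
# Sketch of the random payment procedure: one of the four decision tasks is
# selected at random ex post, and a throw of a 10-sided die resolves the
# chosen lottery, as in the instructions quoted in the text.
import random

def throw_d10(rng):
    """One throw of a 10-sided die, numbered 1-10."""
    return rng.randint(1, 10)

def resolve_lottery(rng, p_first, first, second):
    """E.g. "$3.20 if the throw is 1 or 2, otherwise $4.00" has p_first = 0.2."""
    return first if throw_d10(rng) <= round(10 * p_first) else second

rng = random.Random(7)  # seeded only so the sketch is reproducible
task = rng.choice(["gain menu", "loss menu", "pennies 1", "pennies 2"])
payoff = resolve_lottery(rng, p_first=0.2, first=3.20, second=4.00)
assert payoff in (3.20, 4.00)
```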
4. RESULTS FROM LOW-PAYOFF SESSIONS

In this section, we present an overview of our data patterns and a nonparametric analysis of treatment effects in our experiment. More formal statistical analysis is presented in Section 6, below. We first compare the overall pattern of choices between the lotteries over gains and the lotteries over losses. This allows us to look for a reflection effect (risk aversion over gains and risk seeking over losses) in the aggregate data from our experiment.
Recall that a risk-averse person would choose the safe lottery more than five times in each set of 10 paired choices, and that an approximately risk-neutral person would choose the safe lottery five times before switching to the lottery with a wider range of payoffs. Some people are risk neutral in this sense, particularly when payoffs involve losses or are hypothetical. Fig. 1 shows cumulative choice frequencies for the number of safe choices for hypothetical payoffs (top) and real payoffs (bottom). In each panel, the thin line shows the risk-neutral prediction, for which the cumulative probability of four or fewer safe choices is zero, and the cumulative probability goes to one at five safe choices. The actual cumulative distributions for the gain treatment are below those of the loss treatment, indicating the tendency to make more safe choices in the gain domain, regardless of whether payoffs are real or hypothetical. These distributions indicate that, in aggregate, people are risk averse in the gain domain and approximately risk neutral in the loss domain. The difference between choices in the gain and loss domains is significant, both for real and hypothetical payoffs. We use a matched-pairs Wilcoxon test (one-tailed) because each subject made choices under both gains and losses. We do not observe any significant effect from the order in which the loss and gain treatments were conducted. Table 3 shows the mean number of safe choices by treatment (gain or loss, real or hypothetical). The top row shows all data for low (1x) payoffs with both treatment orders combined (gains first and losses first). There was no clear effect of treatment order (gains first or losses first), as can be seen by comparing the top row (all data for both orders) with the second row (low 1x payoffs with losses presented to subjects first).14 Next, we turn our attention to the evidence for reflection at the individual level. The top panel of Fig.
2 summarizes the choice data for the low-payoff hypothetical choice sessions. We begin by looking at count data (an econometric analysis will follow in Section 6). We use the number of safe choices to categorize individuals as being risk averse, risk neutral, or risk seeking, both in the loss domain (left to right) and the gain domain (back to front). The ‘‘spike’’ at the back, right corner of the graph represents those who exhibit the predicted reflection effect: risk seeking for losses and risk aversion for gains. Fifty percent of the subjects are risk averse over gains (back row of the figure); of these, just over half are risk-loving for losses. Of those subjects who do reflect, 40 percent involve exact reflection, that is, the number of safe choices in the gain domain exactly matches the number of risky choices in the loss domain. The modal choice pattern under hypothetical payoffs is reflection, and in this sense we are able to replicate the predicted choice
[Fig. 1. Cumulative Choice Frequencies. Two panels (hypothetical payments, top; real payments, bottom) plot the cumulative frequency of the number of safe choices (0-10) for the gain treatment, the loss treatment, and the risk-neutral prediction.]
Table 3. Mean Number of Safe Choices by Treatment.

Treatment           Real Payoffs           Hypothetical Payoffs
                    Gains     Losses       Gains     Losses
1x, all data        5.91      5.21         5.53      4.98
1x, loss-gain       5.71      5.26         5.62      5.13
15x, loss-gain      6.31      5.22         5.69      5.13
15x, loss-gain(a)   -         -            4.91      5.31

Note: 1x, low-payoff treatment; 15x, high-payoff treatment; all data, both presentation orders combined (losses then gains and gains then losses); loss-gain, losses presented first, then gains.
(a) Decisions in this hypothetical payoff experiment followed an experiment that used very high real earnings.
pattern using our hypothetical lotteries, neither of which involves a certain prospect. However, when real cash payments are used, the results are quite different, as shown in the lower panel of Fig. 2. The modal outcome (shown in the back left corner) involves risk aversion for both gains and losses, even though these gains and losses are "low" (less than $8 in absolute value). Over gains, there is a little more risk aversion with (low) real payoffs: 60 percent of subjects exhibit risk aversion in the gain condition (back row of Fig. 2). Of these, only about one-fifth are risk seeking for losses (see the bar in the back, right corner). The rate of reflection in the bottom panel with real payoffs (13 percent) is half the rate of reflection observed under hypothetical payoffs (26 percent). Recall that the predicted choice pattern involves switching between the safe lottery and the risky lottery once, with the switch point determining the inferred risk attitude. In total, 44 of 157 subjects switched more than once in either the gain or loss treatment (or both).15 Since such multiple switching introduces some noise due to confusion or other considerations, it is instructive to look at choice patterns for those who switched only once in either treatment. These data produce a little more risk aversion in the gain domain, but the basic patterns shown in Fig. 2 remain unchanged. With real payoffs, for example, 67 percent are risk averse in the gain domain, but less than one-fifth of these subjects exhibit reflection. Using hypothetical payoffs, the modal decision is still reflection; half of those who are risk averse in the gain domain are risk seeking in the loss domain. Just as when the full dataset is used, we find about twice as much reflection with hypothetical payoffs as with real payoffs (26 percent compared with 12 percent, respectively).
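The counting exercise above can be made explicit with a small helper (a hypothetical illustration, not the authors' code), using the five-safe-choice benchmark introduced at the start of this section:

```python
# Classify a subject's risk attitude from the number of safe choices in a
# 10-row menu, and flag the reflection pattern discussed in the text.
def attitude(safe_choices):
    """More than five safe choices: averse; exactly five: neutral; fewer: seeking."""
    if safe_choices > 5:
        return "averse"
    if safe_choices == 5:
        return "neutral"
    return "seeking"

def reflects(safe_gains, safe_losses):
    """Predicted reflection: risk averse over gains, risk seeking over losses."""
    return attitude(safe_gains) == "averse" and attitude(safe_losses) == "seeking"

def exact_reflection(safe_gains, safe_losses):
    """Safe choices over gains equal risky choices over losses."""
    return safe_gains == 10 - safe_losses

# Example: seven safe choices over gains and three over losses reflect exactly.
assert reflects(7, 3) and exact_reflection(7, 3)
```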
[Fig. 2. Risk Aversion Categories for Low Losses and Low Gains. Two panels (hypothetical payoffs, top; real payoffs, bottom) plot the number of observations in each risk-attitude category (averse, neutral, loving) over the loss domain and the gain domain.]
5. RESULTS FROM HIGH-PAYOFF SESSIONS

Kahneman and Tversky's (1979, p. 265) initial tests of prospect theory used high hypothetical payoffs, and they questioned the generality of data derived from small-stakes lotteries. One might also suppose that large gains and losses have a higher emotional impact than low-payoff lotteries, so the
predicted effects of a psychologically motivated theory like prospect theory might be more apparent with very high payoffs. Given this, we decided to scale up the stakes to levels that had a clear effect on risk attitudes in Holt and Laury (2002). To do this, we ran high-payoff treatments (with gains and losses, real and hypothetical) in which the payoff numbers shown in Table 1 were multiplied by a factor of 15. This multiplicative scaling of all payoff amounts does not alter the risk-neutral crossover point at five safe choices. The result of this scaling was that the safe lottery had payoffs of $60 and $48 (positive or negative) and the risky lottery had payoffs of $115.50 and $3.00. The real-incentive sessions were quite expensive, since pre-lottery-choice earnings had to be built up to high levels in order to make real losses credible. The real-payoff sessions were therefore preceded by a high real-payoff public goods experiment to raise subjects' earnings, and the high hypothetical payoff sessions were preceded by an analogous experiment with high hypothetical payoffs. The initial earnings in the high real-payoff sessions averaged about $140 (and ranged from $112 to $190). We did not provide a higher initial payoff for the high hypothetical sessions, since losses were hypothetical.16 Because of the additional expense associated with these high-payoff sessions, we have about half the number of observations as for the low-payoff sessions. Given that we did not observe any systematic effect of the order in which gains and losses were presented to subjects, we chose to use only one treatment order in the high-payoff sessions. Therefore, in all sessions the lottery over losses was given first.
As before, the lotteries over losses and gains were separated by a matching pennies game (with payoffs scaled up by a factor of 15), and the results for choices under both treatments were not announced until all decisions had been made. There were 32 subjects who faced high real payoffs and 32 who faced high hypothetical payoffs, and in both cases exactly half of the observations were for the treatment with the risky lottery listed on the left, and half with the risky lottery listed on the right (see Table 2). In Table 3, rows 2 and 3 (the 1x and 15x loss-gain treatments) allow a comparison of the average number of safe choices, holding the treatment order (losses first) constant. There are no obvious effects of scaling up payoffs, except for an increase in risk aversion in the real gain domain. Fig. 3 shows the cumulative choice frequencies for high hypothetical (top) and high real (bottom) payoffs. Notice that the gain and loss lines are closer together for hypothetical payoffs, shown in the top panel. However, a matched-pairs Wilcoxon test (using the difference between an individual's
[Fig. 3. Cumulative Choice Frequencies for High Losses and High Gains. Two panels (15x hypothetical payments, top; 15x real payments, bottom; losses then gains) plot the cumulative frequency of the number of safe choices (0-10) for gains, losses, and the risk-neutral prediction.]
choice in the gain and loss treatment as the unit of observation) rejects the null hypothesis of no difference in favor of the one-tailed alternative that fewer safe choices are made in the loss treatment. The top panel of Fig. 4 summarizes individual data for the 32 subjects in the high hypothetical payoff sessions. As before, the number of safe choices is used to categorize risk attitudes. Just as we observed for low payoffs, about half of these subjects are risk averse over gains (53 percent); however, reflection is no longer the modal outcome. Only about one-third of those who are risk averse for gains turn out to be risk preferring for losses, while the modal pattern (28 percent of all subjects) is risk aversion over both gains and losses. The outcomes for high real cash payoffs are shown in the bottom panel of Fig. 4. About two-thirds of subjects are risk averse over gains (back row); of these, only about 15 percent are also risk preferring for losses. Overall, we observe less reflection when we scale up payoffs, both real and hypothetical. And as before, we observe about twice the rate of reflection for high hypothetical payoffs (19 percent) as for high real payoffs (9 percent). In these high-payoff sessions, 49 of 64 subjects exhibited a clean switch point between the safe and risky lotteries. With real payoffs, 73 percent of these subjects exhibit risk aversion over gains. Of these, only about 10 percent also show risk preference over losses. Little difference is observed in the hypothetical data. As before, reflection occurs about twice as often with hypothetical payoffs (17 percent of subjects) as with real payoffs (8 percent). There is one potentially important procedural difference between these high real and high hypothetical payoff sessions. The high real-payoff sessions were preceded by a real-payoff experiment in which earnings averaged about $140. In contrast, the high hypothetical payoff sessions were preceded by a hypothetical choice task, with earnings set equal to $45 for the entire session (which is identical to earnings in the low hypothetical payoff sessions). If previously earned high payoffs affect risk attitudes, this could bias the comparison between these real and hypothetical payoff sessions.

[Fig. 4. Risk Aversion Categories for High Losses and High Gains. Two panels (15x hypothetical payoffs, top; 15x real payoffs, bottom; losses then gains) plot the share of observations in each risk-attitude category (averse, neutral, loving) over the loss domain and the gain domain.]
In order to address this, we ran two additional high hypothetical payoff sessions.17 All procedures were identical to those described above (32 subjects participated, all faced the loss condition first, and half of the subjects saw the risky lottery on the left of their decision sheet); however, both sessions were preceded by a high real-payoff experiment. Earnings in these sessions were quite close to those that preceded the high real-payoff sessions: average earnings were $132 (compared with $140 for the real payment sessions reported above), and ranged from $111 to $182 ($112 to $190 for the real payment sessions). This high initial stake had a large effect on choices in the hypothetical gain treatment, but only a small effect in the loss domain. On average, individuals are very slightly risk seeking in the gain domain (4.9 safe choices), as shown in the bottom row of Table 3, while they are still
somewhat risk averse over losses. This pattern (higher risk aversion over losses than gains) is the opposite of that predicted by prospect theory, although the difference in choices between the gain and loss treatments is not significant. Overall, only 25 percent of subjects are risk averse over gains; of these, about one-third are risk seeking over losses. The rate of reflection (9 percent) is comparable to that observed with high real payoffs. Using the subset of data from those subjects who switch only one time strengthens these conclusions: 29 percent of subjects are risk averse over gains; however, only 8 percent of all subjects in this treatment reflect. At the end of each session, we asked subjects to complete a demographic questionnaire. Our subject pool was almost equally divided between men and women (46 percent male and 54 percent female). Looking at our data by gender does not change our primary conclusion: the modal outcome is reflection only for low hypothetical payoffs. All sessions were held at Georgia State University, which is an urban campus located in downtown Atlanta and has a very diverse student body. Almost half of these subjects (43 percent) were raised outside of North America (in Europe, South America, Asia, and Africa). The rate of reflection is generally higher among subjects from North America (the notable exception is the low hypothetical treatment, where reflection occurs 50 percent more often among those raised outside of North America). However, none of our main results change when looking only at those raised in North America or only at those raised abroad. The interpretation of our data is complicated by those individuals classified as being risk neutral over gains or losses. Recall that (for low payoffs) five safe choices are consistent with constant absolute risk aversion in the interval (−0.05, 0.05).
This is symmetric around zero (risk neutrality), but is also consistent with a very small degree of risk aversion or risk preference. An alternative interpretation is to assume that those we classified as risk neutral are evenly divided between being risk averse and risk seeking. If we eliminate the risk-neutral category and classify subjects in this manner, our primary conclusions stand. When payments are real, the modal outcome under both high and low incentives is risk aversion under gains and losses. For low hypothetical payments, the modal outcome is reflection; however, for high hypothetical payoffs (preceded by an experiment that uses hypothetical payments), the modal outcome is risk aversion under gains and losses. Using high hypothetical payoffs (preceded by a high real-payoff experiment), the modal outcome is the reverse pattern of reflection: risk preference over gains and risk aversion over losses.18
6. MAXIMUM-LIKELIHOOD ESTIMATION

The nonparametric statistical tests presented thus far fail to support the notion that a full reflection of payoffs (multiplication by −1) causes subjects to exhibit risk aversion for the lotteries involving gains and risk preference for the lotteries involving losses. However, interpretation of the data is complicated by the fact that subjects entered the lottery choice part of the experiment with different earnings. Moreover, there were differences in presentation, and different subjects (with differing demographic characteristics) faced the real and hypothetical treatments, and the low- and high-payoff treatments. In this section we present results from maximum-likelihood estimation that controls for (and measures the impact of) these factors. Recall that prior to the start of this part of the session, subjects participated in another experiment in which they earned their initial endowment. Before facing their first lottery choice task, subjects were told:

The remaining part of today's experiment will consist of a series of choices given to you one at a time. Although each part will count toward your final earnings, you will not find out how much you have earned for any of these decisions until you have completed all of them. For one of these decision tasks, all payoffs are negative; for this decision, payoffs will be subtracted from your earnings in the other parts of today's experiment. For all of the other decision tasks, payoffs are positive and will be added to your earnings in the other parts of today's experiment.
In the high-payoff treatment, subjects faced a maximum loss of $115.50. In the real-payoff sessions, four subjects entered this part of the experiment with earnings below this level, and so when they faced a possible loss of $115.50 they could lose more money than their accumulated earnings. These subjects knew only that they would have future opportunities to earn money, but did not know the size of those earnings opportunities.19 Because of this uncertainty, it is unclear how these subjects perceived the potential losses. For example, a subject who started with $110 might perceive the possible $115.50 loss as a loss of $110 (the initial endowment) instead. Therefore, these subjects are omitted from the following analysis. The estimation presented here follows the structural estimation procedures employed in Holt and Laury (2002) to estimate the parameters of an Expo-Power utility function. The extension to estimate a structural model using individual observations, rather than grouped observations, is described in Appendix F of Harrison and Rutström (2008), who also discuss the extension that allows each core parameter to be estimated as a linear function of observable demographic or treatment characteristics. For
a given lottery choice, the probabilities and values of the prizes are used to determine the expected utility of each lottery, using a CRRA specification u(x) = x^(1−r)/(1−r). The model estimated here assumes this functional form for utility, an expected utility theory representation of the latent decision process, a cumulative normal distribution to link predicted choices and actual choices, and a Fechner structural error specification. The estimation procedures account for the fact that we have choices in 10 lotteries for each subject under both gains and losses, and we therefore use clustered-robust standard errors to allow for correlated errors within each subject. The estimates are obtained using standard maximum-likelihood methods in Stata. The top panel of Table 4 presents maximum-likelihood estimates for the baseline (low-payoff) data. One can compute the size of the CRRA coefficient by taking the coefficient on the regression constant and then adding in the marginal effects from the demographic and treatment variables. In this case, the CRRA coefficient is calculated as 0.242 − 0.02 × loss − 0.004 × male + 0.002 × age − … The loss variable is an indicator variable that is set equal to one when the lotteries involve losses and zero otherwise. The negative coefficient suggests that there is less risk aversion under losses; however, the effect is not significant at any standard level of confidence. In fact, the only variable that is significant on its own is the "white" indicator variable: subjects who classify themselves as white or Caucasian are less risk averse than non-white subjects. These results also show that, for the baseline payoff data, the subject's sex (male = 1 for male subjects), age (in years), where they were raised (North America or abroad), the use of hypothetical payments (hyp), the order in which they faced gains and losses (gl_order), and whether the safe lottery was listed as Option A or Option B (safe_left) have no significant effect on the CRRA coefficient.
The coefficient "mu" gives the estimate of the Fechner noise term. The top panel of Table 5 presents the predicted CRRA coefficient, using the characteristics of each subject, for the baseline payoff data. The coefficient is slightly smaller under losses than under gains (r = 0.189 under losses compared to 0.217 under gains), but these values indicate risk aversion under both gains and losses. Turning to the high-payoff sessions (second panel of Tables 4 and 5), subjects are approximately risk neutral, but again there is no significant effect of losses on the risk aversion coefficient. The coefficient is both very small (−0.002) and insignificant (p = 0.93). It is important to note that there is also a large increase in the noise coefficient (mu) for the high-payoff data. Therefore, the effect of payoff scale on the Fechner noise term must be recognized and dealt with when both payoff scales are combined.
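A minimal sketch of the likelihood just described (our illustration, not the authors' Stata code; the parameter values are made up for the example) combines CRRA utility, the cumulative-normal link, and the Fechner noise term mu:

```python
# Log-likelihood of a single binary lottery choice under CRRA expected utility
# with a Fechner error and a cumulative-normal link, as described in the text.
import math

def crra(x, r):
    """CRRA utility u(x) = x^(1-r)/(1-r), with the usual log form at r = 1."""
    return math.log(x) if abs(1.0 - r) < 1e-9 else x ** (1.0 - r) / (1.0 - r)

def eu(lottery, r):
    """Expected utility of a lottery given as [(probability, prize), ...]."""
    return sum(p * crra(z, r) for p, z in lottery)

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def choice_loglik(safe, risky, chose_safe, r, mu):
    """Fechner specification: Pr(choose safe) = Phi((EU_safe - EU_risky)/mu)."""
    p_safe = norm_cdf((eu(safe, r) - eu(risky, r)) / mu)
    return math.log(p_safe if chose_safe else 1.0 - p_safe)

# One gain-menu row (p = 0.4) with illustrative parameter values.
safe = [(0.4, 4.00), (0.6, 3.20)]
risky = [(0.4, 7.70), (0.6, 0.20)]
ll = choice_loglik(safe, risky, True, r=0.22, mu=0.58)
```

Summing choice_loglik over all of a subject's choices gives that subject's contribution to the sample log-likelihood; the clustering correction for within-subject correlation affects only the standard errors, not the likelihood itself.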
Table 4. Maximum-likelihood Estimation of CRRA Utility Function. Variable
Description
Baseline payoff dataa Cons Loss Indicator for loss treatment Male Indicator for male subjects Age Subject’s age (in years) White Indicator for White/ Caucasian NAmerican Indicator raised in North America hyp Indicator for hypothetical treatment gl_order Indicator for gains presented first safe_left Indicator for safe presented on left mu Fechner noise parameter High-payoff datab Cons Loss Indicator for loss treatment Male Indicator for male subjects Age Subject’s age (in years) White Indicator for White/ Caucasian NAmerican Indicator raised in North America hyp Indicator for hypothetical treatment safe_left Indicator for safe presented on left mu Fechner noise parameter All data, contextual utility modelc Cons Loss Indicator for loss treatment
Estimate
Standard p-Value Error
Lower 95% Confidence Interval
Upper 95% Confidence Interval
0.242 0.028
0.223 0.022
0.277 0.193
0.195 0.071
0.679 0.014
0.005
0.026
0.856
0.055
0.046
0.002
0.003
0.381
0.003
0.007
0.066
0.027
0.014
0.119
0.014
0.041
0.028
0.141
0.100
0.014
0.010
0.026
0.701
0.061
0.041
0.011
0.027
0.679
0.065
0.042
0.036
0.027
0.185
0.089
0.017
0.580
0.455
0.202
0.311
1.471
0.066 0.002
0.111 0.027
0.550 0.927
0.151 0.055
0.283 0.050
0.042
0.029
0.152
0.016
0.100
0.000
0.003
0.953
0.006
0.006
0.026
0.035
0.460
0.095
0.043
0.048
0.034
0.162
0.115
0.019
0.026
0.029
0.364
0.083
0.030
0.041
0.028
0.152
0.015
0.096
16.593
6.896
0.016
3.076
30.109
2.212 0.784
0.556 0.221
0.000 0.000
1.123 1.217
3.301 0.351
428
SUSAN K. LAURY AND CHARLES A. HOLT
Table 4. (Continued)

Variable    Description                             Estimate   Std. Error   p-Value   Lower 95% CI   Upper 95% CI
hyp         Indicator for hypothetical treatment     -0.113      0.225       0.617      -0.554          0.329
gl_order    Indicator for gains presented first      -0.219      0.694       0.752      -1.579          1.141
safe_left   Indicator for safe presented on left      0.067      0.224       0.766      -0.373          0.506
Scale       Indicator for high scale                 -0.095      0.036       0.008      -0.165         -0.024
Noise       Noise parameter                           5.081      0.317       0.000       4.460          5.703
a. Log-likelihood = -1006.437; Wald test for the null hypothesis that all coefficients are zero has a χ² value of 15.87 with eight degrees of freedom, implying a p-value of 0.0443.
b. Log-likelihood = -535.229; Wald test for the null hypothesis that all coefficients are zero has a χ² value of 12.78 with seven degrees of freedom, implying a p-value of 0.0777.
c. Log-likelihood = -1655.181; Wald test for the null hypothesis that all coefficients are zero has a χ² value of 29.48 with five degrees of freedom, implying a p-value of 0.000.
Table 5. Predicted CRRA Coefficients.

                                       Mean     Standard Deviation   Minimum Value   Maximum Value
Baseline payoff data
  Gains                                0.217    0.044                0.128           0.321
  Losses                               0.189    0.044                0.100           0.292
High-payoff data
  Gains                                0.063    0.048                0.031           0.153
  Losses                               0.063    0.046                0.033           0.126
All data, contextual utility model
  Gains                                1.561    0.592                0.676           2.184
  Losses                               0.868    0.549                0.108           1.400
The bottom panels of Tables 4 and 5 present results from the pooled (baseline and high-payoff) data, using a contextual utility model that incorporates the heteroscedasticity in the noise term attributable to the change in context from low payoffs to high payoffs (see Wilcox, 2007, 2008, for a derivation of and further details on the contextual utility model). These estimates show that framing the lottery choice problem in terms of losses causes a significant decrease in risk aversion (Table 4); however, the predicted values show that subjects are still risk averse under both gains and losses.
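The contextual utility correction can be illustrated in the same spirit: roughly, Wilcox’s model normalizes the expected-utility difference by the utility range of the outcomes in the choice context, so the effective noise is comparable across the low- and high-payoff frames. The parameterization below (logistic link, a precision parameter multiplying the normalized difference, and r = 0.3) is an assumption for illustration, not the authors’ specification:

```python
import math

def crra_u(x, r):
    # CRRA over gains; assumes r < 1 so x ** (1 - r) is defined for x > 0
    return x ** (1.0 - r) / (1.0 - r)

def prob_safe_contextual(p, safe, risky, r, precision):
    """Logistic choice with the expected-utility difference divided by the
    utility range of the context (best minus worst outcome), in the spirit
    of Wilcox's contextual utility model (exact parameterization assumed)."""
    eu_s = p * crra_u(safe[0], r) + (1 - p) * crra_u(safe[1], r)
    eu_r = p * crra_u(risky[0], r) + (1 - p) * crra_u(risky[1], r)
    outcomes = list(safe) + list(risky)
    u_range = crra_u(max(outcomes), r) - crra_u(min(outcomes), r)
    return 1.0 / (1.0 + math.exp(-precision * (eu_s - eu_r) / u_range))

# Baseline menu and the same menu scaled up by a factor of 15:
low = prob_safe_contextual(0.5, (4.00, 3.20), (7.70, 0.20), 0.3, 5.081)
high = prob_safe_contextual(0.5, (60.00, 48.00), (115.50, 3.00), 0.3, 5.081)
```

Under CRRA, multiplying every payoff by a constant scales the expected-utility difference and the utility range by the same factor, so `low` and `high` coincide: the normalization absorbs the payoff-scale effect that otherwise shows up in the Fechner noise term.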
7. CONCLUSION

This paper adds to the literature on experimental tests of elements of prospect theory, which in its various versions is the leading alternative to expected utility theory. The design uses a menu of lottery choices structured to allow an inference about risk aversion as gains are transformed into losses, holding payoff probabilities constant. When hypothetical payoffs are used, we do see that the modal choice pattern is for subjects to ‘‘reflect’’ from risk-averse behavior over gains to risk-seeking behavior over losses. This reflection rate is reduced by more than half when we use lotteries with real money payoffs, and the modal tendency is to be risk averse for both gains and losses. There is a significant difference in risk attitudes, however, with less risk aversion observed in the loss domain. When payoffs are scaled up by a factor of 15 (yielding potential gains and losses of over $100), there is even less support for reflection. Sharper results are obtained when we remove the ‘‘noisy’’ subjects who switch between the safe and risky lotteries more than once. There is a little more risk aversion in the no-switch data, and the scaling up of payoffs cuts reflection rates by almost half (for both real and hypothetical payoffs). In fact, the incidence of reflection with high real payoffs is only about 7 percent, lower than the rate of ‘‘reverse reflections’’ (risk seeking for gains and risk aversion for losses), which is the opposite of the pattern predicted by prospect theory. The lack of a clear reflection effect in our data is a little surprising, given the results of other studies that report reflection effects with real money incentives (Camerer, 1989; Battalio et al., 1990). One procedural difference is the nature of what was held constant between treatments.
Instead of holding initial wealth roughly constant in both treatments as we did, these studies provided a high initial stake in the loss treatment, so the final wealth position is constant across treatments. For example, a lottery over gains of $20 and $0 could be replaced with an initial payoff of $20 and a choice involving -$20 and $0. Each ‘‘frame’’ yields the same possible final wealth positions ($0 or $20), but the framing is in terms of gains in one treatment and in terms of losses in the other.20 A setup like this is precisely what is needed to isolate a ‘‘framing effect.’’ Such an effect is present since both studies report a tendency for subjects to be risk averse in the gain frame and
risk seeking in the loss frame. Whether these results indicate a reflection effect is less clear, since the higher stake provided in the loss treatment may have induced more risk-seeking behavior.
NOTES

1. Hilton (1988) provides a neat decomposition of an ‘‘overall risk premium’’ into a standard Arrow–Pratt risk premium from expected utility theory and a ‘‘decision weight risk premium’’ resulting from nonlinear probability weighting. Levy and Levy (2002) characterize the risk premium ‘‘in the small’’ for the case of cumulative prospect theory.
2. ‘‘In the present study we did not pay subjects on the basis of their choices because in our experience with choices between prospects of the type used in the present study, we did not find much difference between subjects who were paid a flat fee and subjects whose payoffs were contingent on their decisions’’ (Tversky & Kahneman, 1992, p. 315). The choices being referred to were choices between gambles and sure money amounts.
3. Harrison, Johnson, McInnes, and Rutström (2005) report a follow-up experiment and conclude that the payoff-scale effects reported by Holt and Laury (2002) were, in part, due to a treatment-order effect, but that the qualitative results (higher risk aversion for higher stakes) were replicated. Holt and Laury (2005) ran a subsequent experiment with no order effects that also resulted in a clear effect of payoff scale on risk aversion, although the magnitude of the payoff-scale effect was diminished (consistent with the findings of Harrison et al.).
4. For other surveys of the effects of using money payoffs in economics experiments, see Smith and Walker (1993), Hertwig and Ortmann (2001), Laury and Holt (2007), Harrison and Rutström (2008), and Camerer and Hogarth (1999).
5. In addition, the idea that one might respond to losses and gains differently is supported by Gehring and Willoughby (2002), who measure brain activity (event-related brain potentials measured with an EEG) milliseconds after a subject makes a choice that results in a gain or loss. They find that this brain activity is greater in amplitude after a (real) loss is experienced than when a gain is experienced.
Moreover, choices made after losses were riskier than choices made after gains. Dickhaut et al. (2003) also observed brain activity in choice tasks with monetary gains and losses; they report that subjects are risk averse for gains but not for losses, and that reaction time and brain activation patterns differ for these two contexts.
6. For example, suppose a subject must choose between a small certain loss (payment for insurance) and a gamble with a small probability of a large loss. Overweighting this small probability would cause the subject to appear to be quite risk averse. If the payoffs are multiplied by -1, then the small probability of a large loss becomes a small probability of a large gain, and if that probability is overweighted, the subject would be more willing to take an actuarially fair risk. Bosch-Domènech and Silvestre (2006) deal with the problem of probability
weighting by cleverly decomposing reflection into a payoff translation and a probability switch. The payoff translation involves subtracting a constant from all payoffs, holding probabilities fixed. A probability switch involves assigning the probability of the high payoff to the low payoff instead, and vice versa. In their setup (one option is a certainty and the other involves only one non-zero payoff), the reflection obtained by multiplying all payoffs by -1 can be decomposed into a payoff translation and a probability switch. They consider four cases: the base lottery choice with gains, a probability switch (still with gains), a payoff translation into losses (with no switch in probabilities), and full reflection (both a payoff translation and a probability switch). They find equally strong payoff translation and probability switch effects.
7. In this study, only one of 134 subjects was selected at random ex post to be paid, for one of the 20 questionnaires that they completed over a 10-week period, with 1 of the 21 paired-choice questions for that questionnaire actually used to determine the payoff.
8. These calculations are meant to be illustrative; we do not mean to imply that absolute risk aversion will be constant over a wide range of payoffs. The lottery choice experiments in Holt and Laury (2002) involve scaling up payoffs by factors of 20, 50, and 90, and we find evidence of decreasing absolute risk aversion when utility is expressed as a function of income, not wealth. This result is not surprising, since it is well known that the absolute risk aversion needed to explain choices between low-stakes gambles implies absurd amounts of risk aversion over high stakes (Rabin, 2000). Rabin's theorem pertains to a standard utility of final wealth function, but similar considerations apply when utility is a function of only gains and losses around a reference point (utility of income).
To see this, consider the utility function u(x) = -exp(-rx), which exhibits a constant absolute risk aversion of r. Notice that scaling up all money prizes by a factor of, say, 100 yields utilities of -exp(-100rx), so this is equivalent to leaving the stakes the same and increasing risk aversion by a factor of 100, which yields an absurd amount of risk aversion.
9. For subjects with multiple ‘‘switch points’’ (i.e., subjects who switch from making a safe choice to a risky choice, back to a safe choice, before finally settling on the risky choice), using the total number of safe choices results in an approximation of their risk attitude. A more precise characterization of risk attitude, based on an individual's choice in each of the 10 gambles, is used when maximum likelihood estimates are presented in Section 6 below.
10. To clarify these qualitative predictions, consider the Arrow–Pratt coefficient of risk aversion, r(x) = -u″(x)/u′(x), and suppose that r(x) is higher for one utility function than for another on some interval of payoffs, with strict inequality holding for at least one point. Then it is a direct implication of parts (a) and (e) of Pratt's (1964) Theorem 1 that the right side of (2) is higher for the more risk-averse utility function. Since the left side is increasing in p, this increases the range of probabilities for which the safe option is preferred. Conversely, Pratt's Theorem 1 implies that the right side of (3) is lower for the more risk-averse utility function, which again widens the interval of probabilities over which the safe option is preferred.
11. If time permits, we prefer this approach because, as Camerer (1989) notes, losses from such a windfall stake obtained without any effort may be coded as foregone gains. For example, if a subject is given $20 and then experiences a loss of
$5, the subject may consider this $15 in earnings and not a $5 loss. There is also clear evidence that earned endowments tend to increase the incidence of self-interested decisions in dictator and division games; see Rutström and Williams (2000) and Cherry, Frykbom, and Shogren (2002).
12. This initial phase involved a sequential search task in about half of the sessions, and a public goods experiment in the other half.
13. Similarly, Myagkov and Plott (1997) told subjects that cash earnings would be based on the outcome of one market period, selected at random ex post. As Holt (1986) notes, the random selection method produces a compound lottery composed of the simple lotteries. There is no clear experimental evidence for such a compound lottery effect, however. For example, Laury (2002) finds no significant difference in behavior between lottery choice treatments where subjects are paid for one of 10 decisions or paid for all 10 decisions. See also Harrison and Rutström (2007) for a survey of evidence on the random selection method.
14. In the hypothetical treatment, the presentation of the S/R lotteries has some effect on behavior (i.e., whether the safe or the risky lottery is shown to subjects on the left side of the decision sheet as ‘‘Option A’’). However, as shown in Table 2, observations are about equally divided between these orders. Moreover, Kahneman and Tversky (1979) and Tversky and Kahneman (1992) alternated their presentation of lotteries in a similar manner and did not separate their data by presentation order. For consistency, we do not do so either, but we do control for this presentation order in the maximum likelihood estimation contained in Section 6.
15. For example, in the lotteries over gains, subjects should initially choose the safe lottery and then switch to the risky lottery when the probability of the high-payoff outcome is high enough.
Some subjects initially chose the safe lottery, switched to the risky lottery, then switched back to the safe lottery before returning to the risky lottery.
16. We chose this treatment order (real preceded by real, and hypothetical preceded by hypothetical) for consistency with the low-payoff experiments reported above. Of course, if differences are observed between our high-payoff real and hypothetical reflection experiments, it could be because one was preceded by a real-payoff experiment and the other by a hypothetical experiment (where total earnings were $45, regardless of one's choices). We consider this below.
17. We thank Colin Camerer for suggesting this treatment.
18. Of course, those who are most supportive of prospect theory's reflection effect might suggest that those individuals in the category centered around risk neutrality are not evenly distributed between risk aversion and risk preference. Instead, they might classify risk-neutral individuals in the manner most supportive of prospect theory. We can do so by classifying anyone risk neutral over gains as risk averse, and anyone risk neutral over losses as risk seeking. Under this interpretation, the four upper-right bars in Figures 2 and 4 are combined to create the category for reflection. This includes those classified as risk neutral for both gains and losses. When risk-neutral individuals are reclassified in this manner, the modal choice pattern is reflection in all treatments. However, as reported by Camerer (1989) and Battalio et al. (1990), reflection is far from universal. In our low real payoff treatment, only 45 percent of all subjects exhibit reflection (compared with 38 percent who are risk averse for both gains and losses). There is a little more reflection
in the high real payoff treatment when risk-neutral subjects are reclassified in this manner: 56 percent reflect, while 31 percent are risk averse over gains and losses. As before, the strongest support for the reflection effect comes from subjects who faced low hypothetical payoffs; 63 percent of these (reclassified) subjects exhibited the predicted risk aversion for gains and risk preference over losses. When high hypothetical payoffs follow a hypothetical payoff experiment, 44 percent of subjects reflect (and 34 percent are risk averse over both gains and losses). Following a high real payoff experiment, only 41 percent of subjects exhibit reflection under high hypothetical payoffs. Because the risk-neutral data are categorized in the way most favorable to prospect theory, it is not surprising that there is much more support for reflection when the data are presented in this manner. Moreover, this would indicate that the strongest support for the reflection effect comes from those who are at best very slightly risk averse over gains and very slightly risk loving over losses.
19. In fact, earnings in the matching pennies game were set to ensure that all subjects would receive a positive payment in the session.
20. Similarly, Cohen et al. (1987) informed subjects in advance that a constant amount of money sufficient to cover losses would be added to the payoff before the determination of losses in the loss treatment.
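The constant-absolute-risk-aversion argument in note 8 can be checked numerically (a minimal verification; the coefficient r = 0.05 and the payoff values are arbitrary choices made here):

```python
import math

def cara_u(x, r):
    """CARA utility u(x) = -exp(-r * x); the Arrow-Pratt measure is constant at r."""
    return -math.exp(-r * x)

# Note 8's point: scaling every prize by 100 under CARA is the same as
# keeping prizes fixed and multiplying the risk-aversion coefficient by 100.
r, scale = 0.05, 100.0
for x in (0.2, 3.2, 4.0, 7.7):
    assert math.isclose(cara_u(scale * x, r), cara_u(x, scale * r))
```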
ACKNOWLEDGMENTS

We wish to thank Ron Cummings for his helpful suggestions and for funding the human subjects' payments, and Glenn Harrison for his helpful suggestions and assistance. We also thank Eunice Heredia for research assistance, and Colin Camerer, Glenn Harrison, Peter Moffatt, Lindsay Osco, Alda Turfo, and Peter Wakker for their comments and suggestions. Any remaining errors are our own. This work was funded in part by the National Science Foundation (SBR-9753125 and SBR-0094800).
REFERENCES

Battalio, R. C., Kagel, J. H., & Jiranyakul, K. (1990). Testing between alternative models of choice under uncertainty: Some initial results. Journal of Risk and Uncertainty, 3(1), 25–50.
Binswanger, H. P. (1980). Attitudes toward risk: Experimental measurement in rural India. American Journal of Agricultural Economics, 62(3), 395–407.
Bosch-Domènech, A., & Silvestre, J. (1999). Does risk aversion or attraction depend on income? An experiment. Economics Letters, 65(3), 265–273.
Bosch-Domènech, A., & Silvestre, J. (2006). Reflections on gains and losses: A 2 × 2 × 7 experiment. Journal of Risk and Uncertainty, 33, 217–235.
Camerer, C. F. (1989). An experimental test of several generalized utility theories. Journal of Risk and Uncertainty, 2(1), 61–104.
Camerer, C. F. (2001). Prospect theory in the wild: Evidence from the field. In: D. Kahneman & A. Tversky (Eds), Choices, values, and frames (pp. 288–300). Cambridge: Cambridge University Press.
Camerer, C. F., & Hogarth, R. M. (1999). The effects of financial incentives in experiments: A review and capital-labor-production framework. Journal of Risk and Uncertainty, 19(1–3), 7–42.
Cherry, T., Frykbom, P., & Shogren, J. (2002). Hardnose the dictator. American Economic Review, 92(4), 1218–1221.
Cohen, M., Jaffray, J., & Said, T. (1987). Experimental comparisons of individual behavior under risk and under uncertainty for gains and losses. Organizational Behavior and Human Decision Processes, 39, 1–22.
Dickhaut, J., McCabe, K., Nagode, J. C., Rustichini, A., Smith, K., & Pardo, J. V. (2003). The impact of certainty context on the process of choice. Proceedings of the National Academy of Sciences, 100(18 March), 3536–3541.
Gehring, W. J., & Willoughby, A. R. (2002). The medial frontal cortex and the rapid processing of monetary gains and losses. Science, 295(22 March), 2279–2282.
Harbaugh, W. T., Krause, K., & Vesterlund, L. (2002). Prospect theory in choice and pricing tasks. Working Paper. University of Oregon.
Harrison, G., & Rutström, E. (2007). Experimental evidence on the existence of hypothetical bias in value elicitation experiments. In: C. R. Plott & V. L. Smith (Eds), Handbook of experimental economics results. New York: Elsevier Press.
Harrison, G., & Rutström, E. (2008). Risk aversion in the laboratory. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Research in Experimental Economics, Vol. 12). Greenwich, CT: JAI Press.
Harrison, G. W. (2006). Hypothetical bias over uncertain outcomes. In: J. A. List (Ed.), Using experimental methods in environmental and resource economics (pp. 41–69).
Northampton, MA: Edward Elgar.
Harrison, G. W., Johnson, E., McInnes, M., & Rutström, E. (2005). Risk aversion and incentive effects: Comment. American Economic Review, 95(3), 897–901.
Hershey, J. C., & Schoemaker, P. J. H. (1980). Risk taking and problem context in the domain of losses: An expected utility analysis. Journal of Risk and Insurance, 47(1), 111–132.
Hilton, R. W. (1988). Risk attitude under two alternative theories of choice under risk. Journal of Economic Behavior and Organization, 9, 119–136.
Holt, C. A. (1986). Preference reversals and the independence axiom. American Economic Review, 76(3), 508–515.
Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American Economic Review, 92(5), 1644–1655.
Holt, C. A., & Laury, S. K. (2005). Risk aversion and incentive effects: New data without order effects. American Economic Review, 95(3), 902–912.
Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–291.
Köbberling, V., & Wakker, P. (2005). An index of loss aversion. Journal of Economic Theory, 122, 119–132.
Laury, S. K. (2002). Pay one or pay all: Random selection of one choice for payment. Working Paper. Georgia State University.
Laury, S. K., & Holt, C. A. (2007). Payoff effects and risk preference under real and hypothetical conditions. In: C. Plott & V. Smith (Eds), Handbook of experimental economics results. Amsterdam: Elsevier.
Laury, S. K., & McInnes, M. M. (2003). The impact of insurance prices on decision-making biases: An experimental analysis. Journal of Risk and Insurance, 70(2), 219–233.
Laury, S. K., McInnes, M. M., & Swarthout, J. T. (2007). Catastrophic insurance: New experimental evidence. Working Paper. Georgia State University.
Levy, H., & Levy, M. (2002). Arrow–Pratt risk aversion, risk premium, and decision weights. Journal of Risk and Uncertainty, 25(3), 265–290.
Myagkov, M., & Plott, C. (1997). Exchange economies and loss exposure: Experiments exploring prospect theory and competitive equilibria in market environments. American Economic Review, 87(5), 801–828.
Pratt, J. W. (1964). Risk aversion in the small and in the large. Econometrica, 32(1–2), 122–136.
Rabin, M. (2000). Risk aversion and expected utility theory: A calibration theorem. Econometrica, 68(5), 1281–1292.
Rutström, E., & Williams, M. (2000). Entitlements and fairness: An experimental study of distributive preferences. Journal of Economic Behavior and Organization, 43(1).
Slovic, P. (1969). Differential effects of real versus hypothetical payoffs on choices among gambles. Journal of Experimental Psychology, 79, 434–437.
Slovic, P. (2001). Rational actors or rational fools: Implications of the affect heuristic for behavioral economics. Working Paper. University of Oregon.
Smith, V. L., & Walker, J. M. (1993). Monetary rewards and decision cost in experimental economics. Economic Inquiry, 31(2), 245–261.
Tversky, A., & Kahneman, D. (1992). Advances in prospect theory: Cumulative representation of uncertainty. Journal of Risk and Uncertainty, 5(4), 297–323.
Tversky, A., & Wakker, P. (1995). Risk attitudes and decision weights. Econometrica, 63(6), 1255–1280.
Wilcox, N. (2007).
‘Stochastically more risk averse:’ A contextual theory of stochastic discrete choice under risk. Journal of Econometrics, forthcoming.
Wilcox, N. (2008). Stochastic models for binary discrete choice under risk: A critical primer and econometric comparison. In: J. C. Cox & G. W. Harrison (Eds), Risk aversion in experiments (Research in Experimental Economics, Vol. 12). Greenwich, CT: JAI Press.
APPENDIX. EXPERIMENT INSTRUCTIONS

Initial Instructions for Hypothetical Payment Sessions

Today you will be participating in several experiments about decision making. Typically, in an experiment like this one, you would earn money. The amount of money that you would earn would depend on the choices that you and the other participants would make. In the experiment today, however, you will be paid $45 for participating in the experiment. You can write this amount now on your receipt form.
You will not earn any additional money today based on the choices that you and the other participants make. The instructions for each part of today's experiment will describe how your earnings depend on your decisions (and sometimes on the decisions of others). It is important that you understand that you will not actually be receiving any of this additional money (other than your $45 participation fee). We would like for you to sign the statement below indicating that you understand this.

I understand that I will be paid $45 for participation in today's experiment. All other earnings described in the instructions that I receive are hypothetical and will not actually be paid to me.

__________________________ Signature
Although you will not actually earn any additional money today, we ask that you make choices in the following experiments as if you could earn more money, and the amount that you could earn would depend on choices that you and the others make. You will not actually be paid any additional money, but we want you to make decisions as if you would be paid additional money.
Instructions for Lottery Choice Tasks (Real and Hypothetical)

The remaining part of today's experiment will consist of a series of choices given to you one at a time. Although each part will count toward your final earnings, you will not find out how much you have earned for any of these decisions until you have completed all of them. For one of these decision tasks, all payoffs are negative; for this decision, payoffs will be subtracted from your earnings in the other parts of today's experiment. For all of the other decision tasks, payoffs are positive and will be added to your earnings in the other parts of today's experiment.
Instructions

Your decision sheet shows ten decisions listed on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your
ten choices, please let me explain how these choices will affect your earnings for this part of the experiment.

Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, A or B, for the particular decision selected. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end.

Now, please look at Decision 1 at the top. Option A yields a sure gain of $0.20 (20 cents), and option B yields a sure gain of $3.20 (320 cents). Next look at Decision 2 in the second row. Option A yields $7.70 if the throw of the ten-sided die is 1, and it yields $0.20 if the throw is 2–10. Option B yields $4.00 if the throw of the die is 1, and it yields $3.20 if the throw is 2–10. The other decisions are similar, except that as you move down the table, the chances of the better payoff for each option increase.

To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your payoff for the option you chose for that decision. Payoffs for this choice are positive and will be added to your previous earnings, and you will be paid the sum of all earnings in cash when we finish.

So now please look at the empty boxes on the right side of the record sheet.
You will have to write a decision, A or B, in each of these boxes, and then the die throw will determine which one is going to count. We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings for this part. Then you will write your earnings in the blank at the bottom of the page. Please note that these gains will be added to your previous earnings up to now.

Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question.
1/11/01,1                                                              ID: _______

Decision   Option A                             Option B                             Your Choice (A or B)

1          $3.20 if throw of die is 1–10        $0.20 if throw of die is 1–10
2          $4.00 if throw of die is 1           $7.70 if throw of die is 1
           $3.20 if throw of die is 2–10        $0.20 if throw of die is 2–10
3          $4.00 if throw of die is 1 or 2      $7.70 if throw of die is 1 or 2
           $3.20 if throw of die is 3–10        $0.20 if throw of die is 3–10
4          $4.00 if throw of die is 1–3         $7.70 if throw of die is 1–3
           $3.20 if throw of die is 4–10        $0.20 if throw of die is 4–10
5          $4.00 if throw of die is 1–4         $7.70 if throw of die is 1–4
           $3.20 if throw of die is 5–10        $0.20 if throw of die is 5–10
6          $4.00 if throw of die is 1–5         $7.70 if throw of die is 1–5
           $3.20 if throw of die is 6–10        $0.20 if throw of die is 6–10
7          $4.00 if throw of die is 1–6         $7.70 if throw of die is 1–6
           $3.20 if throw of die is 7–10        $0.20 if throw of die is 7–10
8          $4.00 if throw of die is 1–7         $7.70 if throw of die is 1–7
           $3.20 if throw of die is 8–10        $0.20 if throw of die is 8–10
9          $4.00 if throw of die is 1–8         $7.70 if throw of die is 1–8
           $3.20 if throw of die is 9 or 10     $0.20 if throw of die is 9 or 10
10         $4.00 if throw of die is 1–9         $7.70 if throw of die is 1–9
           $3.20 if the throw of die is 10      $0.20 if the throw of die is 10

Decision used: ________, Die throw: _____, Your earnings: _______.
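For reference, a risk-neutral subject facing the menu above compares expected values row by row. The quick check below is illustrative, not part of the original instructions (note that the sample sheet lists the $4.00/$3.20 pair under Option A, while the worked examples in the instruction text attach the risky pair to Option A):

```python
# Payoffs from the decision sheet: one option pays $4.00 or $3.20, the
# other $7.70 or $0.20; in decision k the better payoff of each pair
# occurs on k - 1 of the ten die faces (decision 1 is a sure thing).
def expected_values(k):
    p = (k - 1) / 10.0                    # probability of the better payoff
    ev_safe = p * 4.00 + (1 - p) * 3.20   # low-variance ("safe") pair
    ev_risky = p * 7.70 + (1 - p) * 0.20  # high-variance ("risky") pair
    return ev_safe, ev_risky

# First decision at which the risky pair has the higher expected value.
switch = next(k for k in range(1, 11)
              if expected_values(k)[1] > expected_values(k)[0])
```

A risk-neutral subject therefore takes the safe pair in the early rows and switches once the risky pair's expected value overtakes, from decision 6 onward; more risk-averse subjects switch later in the menu.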
Instructions

Your decision sheet shows ten decisions listed on the left. Each decision is a paired choice between ‘‘Option A’’ and ‘‘Option B.’’ You will make ten choices and record these in the final column, but only one of them will be used in the end to determine your earnings. Before you start making your ten choices, please let me explain how these choices will affect your earnings for this part of the experiment.
Here is a ten-sided die that will be used to determine payoffs; the faces are numbered from 1 to 10 (the ‘‘0’’ face of the die will serve as 10). After you have made all of your choices, we will throw this die twice, once to select one of the ten decisions to be used, and a second time to determine what your payoff is for the option you chose, A or B, for the particular decision selected. Even though you will make ten decisions, only one of these will end up affecting your earnings, but you will not know in advance which decision will be used. Obviously, each decision has an equal chance of being used in the end.

Now, please look at Decision 1 at the top. Option A yields a sure loss of $0.20 (minus 20 cents), and option B yields a sure loss of $3.20 (minus 320 cents). Next look at Decision 2 in the second row. Option A yields $7.70 if the throw of the ten-sided die is 1, and it yields $0.20 if the throw is 2–10. Option B yields $4.00 if the throw of the die is 1, and it yields $3.20 if the throw is 2–10. The other decisions are similar, except that as you move down the table, the chances of the worse payoff for each option increase.

To summarize, you will make ten choices: for each decision row you will have to choose between Option A and Option B. You may choose A for some decision rows and B for other rows, and you may change your decisions and make them in any order. When you are finished, we will come to your desk and throw the ten-sided die to select which of the ten decisions will be used. Then we will throw the die again to determine your payoff for the option you chose for that decision. Payoffs for this choice are negative and will be subtracted from your previous earnings, and you will be paid the sum of all earnings in cash when we finish.

So now please look at the empty boxes on the right side of the record sheet. You will have to write a decision, A or B, in each of these boxes, and then the die throw will determine which one is going to count.
We will look at the decision that you made for the choice that counts, and circle it, before throwing the die again to determine your earnings for this part. Then you will write your earnings in the blank at the bottom of the page. Please note that losses will be subtracted from your previous earnings up to now. Are there any questions? Now you may begin making your choices. Please do not talk with anyone while we are doing this; raise your hand if you have a question.
1/11/01,2                                                              ID: _______

Decision   Option A                             Option B                             Your Choice (A or B)

1          $3.20 if throw of die is 1–10        $0.20 if throw of die is 1–10
2          $4.00 if throw of die is 1           $7.70 if throw of die is 1
           $3.20 if throw of die is 2–10        $0.20 if throw of die is 2–10
3          $4.00 if throw of die is 1 or 2      $7.70 if throw of die is 1 or 2
           $3.20 if throw of die is 3–10        $0.20 if throw of die is 3–10
4          $4.00 if throw of die is 1–3         $7.70 if throw of die is 1–3
           $3.20 if throw of die is 4–10        $0.20 if throw of die is 4–10
5          $4.00 if throw of die is 1–4         $7.70 if throw of die is 1–4
           $3.20 if throw of die is 5–10        $0.20 if throw of die is 5–10
6          $4.00 if throw of die is 1–5         $7.70 if throw of die is 1–5
           $3.20 if throw of die is 6–10        $0.20 if throw of die is 6–10
7          $4.00 if throw of die is 1–6         $7.70 if throw of die is 1–6
           $3.20 if throw of die is 7–10        $0.20 if throw of die is 7–10
8          $4.00 if throw of die is 1–7         $7.70 if throw of die is 1–7
           $3.20 if throw of die is 8–10        $0.20 if throw of die is 8–10
9          $4.00 if throw of die is 1–8         $7.70 if throw of die is 1–8
           $3.20 if throw of die is 9 or 10     $0.20 if throw of die is 9 or 10
10         $4.00 if throw of die is 1–9         $7.70 if throw of die is 1–9
           $3.20 if the throw of die is 10      $0.20 if the throw of die is 10

Decision used: ________, Die throw: _____, Your earnings: _______.