Uncertainty in Economic Theory
Recent decades have witnessed developments in decision theory that propose an alternative to the accepted Bayesian view, according to which all uncertainty can be quantified by probability measures. That view has been criticized on empirical as well as on conceptual grounds. David Schmeidler has offered an alternative way of thinking about decision under uncertainty, which has become popular in recent years. This book provides a review of, and an introduction to, this new decision theory under uncertainty. The first part focuses on theory: axiomatizations, the definitions of uncertainty aversion, of updating and independence, and so forth. The second part deals with applications to economic theory, game theory, and finance. This is the first collection to include chapters on this topic, and it can thus serve as an introduction for researchers who are new to the field as well as a graduate course textbook. With this goal in mind, the book contains survey introductions that are aimed at graduate students and help explain the main ideas and put them in perspective.

Itzhak Gilboa is Professor at the Eitan Berglas School of Economics, Tel-Aviv University, Israel. He is also a Fellow of the Cowles Foundation for Research in Economics at Yale University, USA.
Routledge Frontiers of Political Economy

1 Equilibrium Versus Understanding: Towards the rehumanization of economics within social theory. Mark Addleson
2 Evolution, Order and Complexity. Edited by Elias L. Khalil and Kenneth E. Boulding
3 Interactions in Political Economy: Malvern after ten years. Edited by Steven Pressman
4 The End of Economics. Michael Perelman
5 Probability in Economics. Omar F. Hamouda and Robin Rowley
6 Capital Controversy, Post-Keynesian Economics and the History of Economics: Essays in honour of Geoff Harcourt, volume one. Edited by Philip Arestis, Gabriel Palma and Malcolm Sawyer
7 Markets, Unemployment and Economic Policy: Essays in honour of Geoff Harcourt, volume two. Edited by Philip Arestis, Gabriel Palma and Malcolm Sawyer
8 Social Economy: The logic of capitalist development. Clark Everling
9 New Keynesian Economics/Post-Keynesian Alternatives. Edited by Roy J. Rotheim
10 The Representative Agent in Macroeconomics. James E. Hartley
11 Borderlands of Economics: Essays in honour of Daniel R. Fusfeld. Edited by Nahid Aslanbeigui and Young Back Choi
12 Value, Distribution and Capital: Essays in honour of Pierangelo Garegnani. Edited by Gary Mongiovi and Fabio Petri
13 The Economics of Science: Methodology and epistemology as if economics really mattered. James R. Wible
14 Competitiveness, Localised Learning and Regional Development: Specialisation and prosperity in small open economies. Peter Maskell, Heikki Eskelinen, Ingjaldur Hannibalsson, Anders Malmberg and Eirik Vatne
15 Labour Market Theory: A constructive reassessment. Ben J. Fine
16 Women and European Employment. Jill Rubery, Mark Smith, Colette Fagan, Damian Grimshaw
17 Explorations in Economic Methodology: From Lakatos to empirical philosophy of science. Roger Backhouse
18 Subjectivity in Political Economy: Essays on wanting and choosing. David P. Levine
19 The Political Economy of Middle East Peace: The impact of competing trade agendas. Edited by J.W. Wright, Jnr
20 The Active Consumer: Novelty and surprise in consumer choice. Edited by Marina Bianchi
21 Subjectivism and Economic Analysis: Essays in memory of Ludwig Lachmann. Edited by Roger Koppl and Gary Mongiovi
22 Themes in Post-Keynesian Economics: Essays in honour of Geoff Harcourt, volume three. Edited by Claudio Sardoni and Peter Kriesler
23 The Dynamics of Technological Knowledge. Cristiano Antonelli
24 The Political Economy of Diet, Health and Food Policy. Ben J. Fine
25 The End of Finance: Capital market inflation, financial derivatives and pension fund capitalism. Jan Toporowski
26 Political Economy and the New Capitalism. Edited by Jan Toporowski
27 Growth Theory: A philosophical perspective. Patricia Northover
28 The Political Economy of the Small Firm. Edited by Charlie Dannreuther
29 Hahn and Economic Methodology. Edited by Thomas Boylan and Paschal F. O'Gorman
30 Gender, Growth and Trade: The miracle economies of the postwar years. David Kucera
31 Normative Political Economy: Subjective freedom, the market and the state. David Levine
32 Economist with a Public Purpose: Essays in honour of John Kenneth Galbraith. Edited by Michael Keaney
33 Involuntary Unemployment: The elusive quest for a theory. Michel De Vroey
34 The Fundamental Institutions of Capitalism. Ernesto Screpanti
35 Transcending Transaction: The search for self-generating markets. Alan Shipman
36 Power in Business and the State: An historical analysis of its concentration. Frank Bealey
37 Editing Economics: Essays in honour of Mark Perlman. Hank Lim, Ungsuh K. Park and Geoff Harcourt
38 Money, Macroeconomics and Keynes: Essays in honour of Victoria Chick, volume 1. Philip Arestis, Meghnad Desai and Sheila Dow
39 Methodology, Microeconomics and Keynes: Essays in honour of Victoria Chick, volume 2. Philip Arestis, Meghnad Desai and Sheila Dow
40 Market Drive and Governance: Reexamining the rules for economic and commercial contest. Ralf Boscheck
41 The Value of Marx: Political economy for contemporary capitalism. Alfredo Saad-Filho
42 Issues in Positive Political Economy. S. Mansoob Murshed
43 The Enigma of Globalisation: A journey to a new stage of capitalism. Robert Went
44 The Market: Equilibrium, stability, mythology. S.N. Afriat
45 The Political Economy of Rule Evasion and Policy Reform. Jim Leitzel
46 Unpaid Work and the Economy. Edited by Antonella Picchio
47 Distributional Justice: Theory and measurement. Hilde Bojer
48 Cognitive Developments in Economics. Edited by Salvatore Rizzello
49 Social Foundations of Markets, Money and Credit. Costas Lapavitsas
50 Rethinking Capitalist Development: Essays on the economics of Josef Steindl. Edited by Tracy Mott and Nina Shapiro
51 An Evolutionary Approach to Social Welfare. Christian Sartorius
52 Kalecki's Economics Today. Edited by Zdzislaw L. Sadowski and Adam Szeworski
53 Fiscal Policy from Reagan to Blair: The Left veers Right. Ravi K. Roy and Arthur T. Denzau
54 The Cognitive Mechanics of Economic Development and Institutional Change. Bertin Martens
55 Individualism and the Social Order: The social element in liberal thought. Charles R. McCann, Jnr
56 Affirmative Action in the United States and India: A comparative perspective. Thomas E. Weisskopf
57 Global Political Economy and the Wealth of Nations: Performance, institutions, problems and policies. Edited by Phillip Anthony O'Hara
58 Structural Economics. Thijs ten Raa
59 Macroeconomic Theory and Economic Policy: Essays in honour of Jean-Paul Fitoussi. Edited by K. Vela Velupillai
60 The Struggle Over Work: The "end of work" and employment alternatives in post-industrial societies. Shaun Wilson
61 The Political Economy of Global Sporting Organisations. John Forster and Nigel Pope
62 The Flawed Foundations of General Equilibrium: Critical essays on economic theory. Frank Ackerman and Alejandro Nadal
63 Uncertainty in Economic Theory: Essays in honor of David Schmeidler's 65th birthday. Edited by Itzhak Gilboa
Uncertainty in Economic Theory Essays in honor of David Schmeidler’s 65th birthday
Edited by Itzhak Gilboa
First published 2004 by Routledge 11 New Fetter Lane, London EC4P 4EE Simultaneously published in the USA and Canada by Routledge 29 West 35th Street, New York, NY 10001 This edition published in the Taylor & Francis e-Library, 2006.
“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.” Routledge is an imprint of the Taylor & Francis Group © 2004 selection and editorial matter, Itzhak Gilboa; individual chapters, the contributors All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging in Publication Data A catalog record for this book has been requested ISBN 0-415-32494-7
Contents

List of contributors xii
Preface xv

PART I
Theory 1

1 Introduction 3
ITZHAK GILBOA

2 Preference axiomatizations for decision under uncertainty 20
PETER P. WAKKER

3 Defining ambiguity and ambiguity attitude 36
PAOLO GHIRARDATO

4 Introduction to the mathematics of ambiguity 46
MASSIMO MARINACCI AND LUIGI MONTRUCCHIO

5 Subjective probability and expected utility without additivity 108
DAVID SCHMEIDLER

6 Maxmin expected utility with non-unique prior 125
ITZHAK GILBOA AND DAVID SCHMEIDLER

7 A simple axiomatization of nonadditive expected utility 136
RAKESH SARIN AND PETER P. WAKKER

8 Updating ambiguous beliefs 155
ITZHAK GILBOA AND DAVID SCHMEIDLER

9 A definition of uncertainty aversion 171
LARRY G. EPSTEIN

10 Ambiguity made precise: a comparative foundation 209
PAOLO GHIRARDATO AND MASSIMO MARINACCI

11 Stochastically independent randomization and uncertainty aversion 244
PETER KLIBANOFF

12 Decomposition and representation of coalitional games 261
MASSIMO MARINACCI

PART II
Applications 281

13 An overview of economic applications of David Schmeidler's models of decision making under uncertainty 283
SUJOY MUKERJI AND JEAN-MARC TALLON

14 Ambiguity aversion and incompleteness of contractual form 303
SUJOY MUKERJI

15 Ambiguity aversion and incompleteness of financial markets 336
SUJOY MUKERJI AND JEAN-MARC TALLON

16 A quartet of semigroups for model specification, robustness, prices of risk, and model detection 364
EVAN W. ANDERSON, LARS PETER HANSEN, AND THOMAS J. SARGENT

17 Uncertainty aversion, risk aversion, and the optimal choice of portfolio 419
JAMES DOW AND SÉRGIO RIBEIRO DA COSTA WERLANG

18 Intertemporal asset pricing under Knightian uncertainty 429
LARRY G. EPSTEIN AND TAN WANG

19 Sharing beliefs: between agreeing and disagreeing 472
ANTOINE BILLOT, ALAIN CHATEAUNEUF, ITZHAK GILBOA, AND JEAN-MARC TALLON

20 Equilibrium in beliefs under uncertainty 483
KIN CHUNG LO

21 The right to remain silent 522
JOSEPH GREENBERG

22 On the measurement of inequality under uncertainty 531
ELCHANAN BEN-PORATH, ITZHAK GILBOA, AND DAVID SCHMEIDLER

Index 541
Contributors
Evan W. Anderson is Assistant Professor of Economics, University of North Carolina at Chapel Hill. His research interests include heterogeneous agents, recursive utility, robustness, and computational methods.

Elchanan Ben-Porath is Professor of Economics at the Hebrew University of Jerusalem. His fields of interest include game theory, decision theory, and social choice theory.

Antoine Billot is Professor of Economics at the Université de Paris II, Panthéon-Assas, and junior member of the Institut Universitaire de France. His research interests are in the field of preference theory, social choice theory, and decision theory.

Alain Chateauneuf is Professor of Mathematics at the Université de Paris I, Panthéon-Sorbonne. His research is mainly concerned with mathematical economics, focusing on decision theory and particularly on decision under uncertainty.

James Dow is Professor of Finance at London Business School. His recent research has been on models that integrate the financial markets with corporate finance. He has also worked on executive compensation and leadership. In his work on Knightian uncertainty with Sérgio Werlang, he applied the models developed by David Schmeidler to portfolio choice, to stock price volatility, to the no-trade theorem, and to Nash equilibrium.

Larry Epstein is the Elmer B. Milliman Professor of Economics at the University of Rochester. His research interests include decision theory and its applications to macroeconomics and finance.

Paolo Ghirardato is Associate Professor of Mathematical Economics at the Università di Torino. His main research interest is decision theory and its consequences for economic, political, and financial modeling.

Itzhak Gilboa is Professor of Economics at Tel-Aviv University and a Fellow of the Cowles Foundation for Research in Economics at Yale University. He is interested in decision theory, game theory, and social choice.
Joseph Greenberg is the Dow Professor of Political Economy at McGill University. His research interests include economic theory, game theory, and the theory of social situations.

Lars Peter Hansen is the Homer J. Livingston Distinguished Service Professor at the University of Chicago. He is interested in macroeconomic theory, risk, and uncertainty.

Peter Klibanoff is Associate Professor of Managerial Economics and Decision Sciences at the Kellogg School of Management, Northwestern University. His research interests include decision making under uncertainty, microeconomic theory, and behavioral finance.

Kin Chung Lo is Associate Professor of Economics at York University. He specializes in game theory and decision theory. His publications cover areas such as nonexpected utility, auctions, and the foundations of solution concepts in games.

Massimo Marinacci is Professor of Applied Mathematics at the Università di Torino, Italy. His main research interest is mathematical economics, in particular choice theory.

Luigi Montrucchio is Professor of Economics at the Università di Torino, Italy. His main research interest is mathematical economics, in particular economic dynamics and optimal growth.

Sujoy Mukerji is a University Lecturer in Economic Theory at the University of Oxford and Fellow of University College. His research has primarily been on decision making under ambiguity, its foundations, and its relevance in economic contexts. His broader research interests lie in the intersection of bounded rationality and economic theory.

Thomas Sargent is Professor of Economics at New York University and senior fellow at the Hoover Institution at Stanford. He is interested in macroeconomics and applied economic dynamics.

Rakesh Sarin is Professor of Decisions, Operations, and Technology Management and the Paine Chair in Management at the Anderson School of Management at the University of California at Los Angeles. He is interested in decision analysis and societal risk analysis.

David Schmeidler is Professor of Statistics and Management at Tel-Aviv University and Professor of Economics at Ohio State University. His research topics include economic theory, game theory, decision theory, and social choice.

Jean-Marc Tallon is Directeur de Recherche at CNRS and Université Paris I. His research mainly deals with economic applications of models of decision under uncertainty, and general equilibrium models with incomplete markets.
Peter P. Wakker is Professor of Decision under Uncertainty at the University of Amsterdam. His research is on normative foundations of Bayesianism and on descriptive deviations from Bayesianism, the latter both theoretically and empirically.

Tan Wang is Associate Professor at the Sauder School of Business, University of British Columbia. His research interest is in decision theory under risk and uncertainty and asset pricing, focusing on the implications of uncertainty aversion.

Sérgio Ribeiro da Costa Werlang is Professor of Economics at the Getulio Vargas Foundation. His interests include economic theory and macroeconomics.
Preface
This book is published in celebration of David Schmeidler's 65th birthday. It is a collection of seventeen papers that have appeared in refereed journals, combined with five introductory chapters that were written for this volume. All papers deal with uncertainty (or "ambiguity") in economic theory. They range from purely theoretical issues such as axiomatic foundations, definitions, and measurement, to economic applications in fields such as contract theory and finance.

There is a large and rapidly growing literature on uncertainty in economic theory, following David's seminal work on non-additive expected utility. But there is no general introduction to the topic, and scholars who are interested in it are usually referred to the original papers, which are scattered in various journals and are often rather technical. We felt that a collection of papers and introductory surveys would make a significant contribution to the literature, serving both to introduce the novice and to guide the expert. Thus, David's birthday was the impetus for the publication of this collection, but the latter has an independent raison d'être.

We have no intention or pretense of summarizing David Schmeidler's research. Indeed, uncertainty in economic theory is but one topic that David has worked on. He has made many other remarkable and path-breaking contributions to game theory, mathematical economics, economic theory, and decision theory. We do not attempt to give an overview of David's contributions, for two reasons. First, such an overview is a daunting task. Second, this book does not mark David's retirement in any way. David Schmeidler is a very active researcher. We hope and believe that he will continue to produce new breakthroughs in the future. This book marks a special birthday, but by no means the end of a research career.

We thank the contributors to the volume, as well as the publishers, for the right to reprint published papers. We hope that this collection, while obviously partial, will give readers a preliminary overview of the research on uncertainty in economic theory. We are grateful to Ms. Lada Burde for her invaluable help in editing and proofreading this volume.

Paolo Ghirardato, Itzhak Gilboa, Massimo Marinacci, Luigi Montrucchio, Sujoy Mukerji, Jean-Marc Tallon, and Peter P. Wakker
Part I
Theory
1 Introduction

Itzhak Gilboa
1.1. Uncertainty and Bayesianism

Ever since economic theory started to engage in formal modeling of uncertainty, it has espoused the Bayesian paradigm. In the mid-twentieth century, the Bayesian approach came to dominate decision theory and game theory, and it has remained a dominant paradigm in the applications of these theories to economics to this day. Economic problems ranging from insurance and portfolio selection to signaling and health policy are typically analyzed in a Bayesian way. In fact, there is probably no other field of formal inquiry involving uncertainty in which Bayesianism enjoys such a predominant status as it does in economic theory.

But what exactly does it mean to be Bayesian? One may discern at least three distinct tenets that are often assumed to be held by Bayesians. First, a Bayesian quantifies uncertainty in a probabilistic way. Second, Bayesianism entails updating one's beliefs given new information in accordance with Bayes' law. Finally, in light of the axiomatizations of the Bayesian approach (Ramsey, 1931; de Finetti, 1937; Savage, 1954), Bayesianism is often taken to also imply the maximization of expected utility (EU) relative to probabilistic beliefs.1

Taken as assumptions regarding the behavior of economic agents, all three tenets of Bayesianism have come under attack. The assumption of EU maximization was challenged by Allais (1953). The famous Allais paradox, combined with the body of work starting with Kahneman and Tversky's Prospect Theory (1979), aimed to show that people may fail to maximize EU even in the face of decisions under risk, namely, where probabilities are known. Tversky and Kahneman (1974) have also shown that people may fail to perform Bayesian updating. That is, even when probabilities are given in a problem, they might not be manipulated in accordance with Bayes' law. Thus, the second tenet of Bayesianism has also been criticized in terms of descriptive validity. Moreover, other work by Kahneman and Tversky, such as the documentation of framing effects (Tversky and Kahneman, 1981), has shown that some of the implicit assumptions of the Bayesian model are also descriptively inaccurate.

Yet, violations of the second and third tenets have not amounted to a serious critique of Bayesianism per se. Violations of Bayesian updating are viewed by most researchers as mistakes. While these mistakes pose a challenge to descriptive
Bayesian theories, they fail to sway one from the belief that Bayes' law should be the way probabilities are updated. Some researchers also view violations of EU maximization (given a probability measure) as plain mistakes, which do not challenge the normative validity of the theory. Other researchers disagree. At any rate, these violations do not clash with the Bayesian view as statisticians or computer scientists understand it. That is, an agent may quantify uncertainty by a prior probability measure, and update this prior to a posterior in a Bayesian way, without maximizing EU with respect to her probabilistic beliefs.2

This book is devoted to behavioral violations of the first tenet of Bayesianism, namely, that all uncertainty can be quantified by a probability measure. In contrast to the other two types of violations, the rejection of the first tenet is a direct attack on the essence of the Bayesian approach, even when the latter is interpreted as a normative theory. As we argue shortly, there are situations in which violations of the first tenet cannot be viewed as mistakes, and cannot be easily corrected even by decision makers who are willing to convert to Bayesianism.

When explaining the basic notion of uncertainty, as opposed to risk, one often starts out with Ellsberg's (1961) famous examples (the "Ellsberg paradox", which refers both to the two-urn and to the one-urn experiments). These experiments show that many people tend to prefer bets with known probabilities to bets with unknown ones, in a way that cannot be reconciled with the first tenet of Bayesianism. Specifically, Ellsberg's paradox provides an example in which Savage's axiom P2 is consistently violated by a nonnegligible proportion of decision makers.3 A decision maker who violates P2 as in Ellsberg's paradox does not merely deviate from EU maximization. Rather, such a decision maker exhibits a mode of behavior that cannot be described as a function of a probability measure. To the extent that behavioral data can challenge a purely cognitive assumption, Ellsberg's paradox exhibits a violation of the first tenet of Bayesianism.

Yet, David Schmeidler's interest in uncertainty was not aroused by Ellsberg's paradox4 or by any other behavioral manifestation of a non-Bayesian approach. Rather, Schmeidler's starting point was purely cognitive: like Knight (1921) and Ellsberg, he did not find the first tenet of Bayesianism plausible. Specifically, Schmeidler argued that the Bayesian approach "does not reflect the heuristic amount of information that led to the assignment of […] probability" (Schmeidler, 1989: 571). His example was the following: assume that you take a coin out of your pocket, and that you are about to bet on it. You have tossed this coin many times in the past, and you have not observed any significant deviations from the assumption of fairness. For the sake of argument, assume that you have tossed the coin 1,000 times and that it has come up Heads exactly 500 times and Tails exactly 500 times. Thus, you assign a probability of 50 percent to the coin coming up Heads, as well as Tails, in the next toss. Next assume that your friend takes a coin out of her pocket. You have absolutely no information about this coin. If you are asked to assign probabilities to the two sides of the coin, you may well follow symmetry considerations (equivalently, Laplace's principle of insufficient reason) and assign a probability of 50 percent to each side of this coin as well. However,
argued Schmeidler, the 50 percent that are based on empirical frequencies in large databases do not "feel" the same as the 50 percent that were assigned based on symmetry considerations. The Bayesian approach, in insisting that every source of uncertainty be quantified by a (single, additive) probability measure, is too restrictive. It does not allow the amount of information used for probabilistic assessments to be reflected in these assessments.

It seems to be a natural step to couch this cognitive observation in a behavioral setup. Indeed, Ellsberg's two-urn experiment is very similar to Schmeidler's contrast between the two coins. It is, however, important to note that Schmeidler's critique of Bayesianism starts from a cognitive perspective. It is not motivated by an observed pattern of behavior, as is much of the work ensuing from Allais's paradox. Relatedly, Schmeidler's critique of the first tenet of Bayesianism is not solely on descriptive grounds. Starting with the logic of the Bayesian approach, rather than with experimental evidence, this critique cannot be dismissed as focusing on a setup in which decision makers err. Rather, Schmeidler's point was that in many situations there is not enough information for the generation of a Bayesian prior. In these situations, it is not clear that the rational thing to do is to behave as if one had such a prior.

These considerations also raise doubts regarding the definition of rationality by internal consistency of decisions or of statements. If we were to assume that "rationality" only means coherence, Savage's axioms would appear to be a very promising candidate for a canon of rationality. However, if we take these axioms as a rationality test, the test is too easy to pass: in a situation of uncertainty, one can arbitrarily choose any prior probability and behave so as to maximize EU with respect to this prior. This will clearly suffice to pass Savage's rationality test. But it would not seem rational by any intuitive definition of the term.

Ellsberg's paradox is an extremely elegant illustration of a behavioral rejection of the first tenet of Bayesianism. It manages to translate the cognitive unease with the Bayesian approach into observed choice, thanks to certain symmetries in the decision problem. But these symmetries may also be misleading. While many decision makers violate Savage's P2 in Ellsberg's paradox, it seems easy enough to "correct" their choices so that they correspond to Savage's theory. In both of Ellsberg's examples there is enough symmetry to allow for Laplace's principle of insufficient reason to pinpoint a single probability measure. This symmetric prior might appear to be a natural candidate for the would-be Bayesian, and it might give the impression that violations of P2 can easily be worked around. Cognitive ease aside, the decision maker may behave as if she were Bayesian.

This impression would be wrong. Most real-life problems do not exhibit enough symmetries to allow for a Laplacian prior. To consider a simple example, assume that one faces the uncertainty of war. There are only two states of the world to consider – war and no war. Empirical frequencies surely do not suffice to generate a prior probability over these two states, since the uncertain situation cannot be construed as a repeated experiment. Therefore, this is a situation of uncertainty as opposed to risk. But it would be ludicrous to suggest that the probability of war should be 50 percent, simply because there are two states of the world
with no historical data on their relative frequencies. Indeed, in this situation there is sufficient reason to distinguish between the two states, though not sufficient information to generate a Bayesian prior. Ellsberg’s paradox, as well as Schmeidler’s coin example, should therefore be taken with a grain of salt. They drive home the point that the cognitive unease generated by uncertainty may have behavioral implications. But they do not capture the complexity of a multitude of real-life decisions in which there is sufficient reason (to distinguish among states) but not sufficient information (to generate a prior).
1.2. Nonadditive probabilities (CEU)

David Schmeidler's first attempt to model a non-Bayesian approach to uncertainty involved nonadditive probabilities. This term refers to mathematical entities that resemble probability measures, except that they need not satisfy the additivity axiom. The idea can be explained simply in Schmeidler's coin example (equivalently, in Ellsberg's two-urn paradox). Assume, again, that there are two coins. One, the "known" coin, has been tossed many times, with a relative frequency of 50 percent Heads and 50 percent Tails. The other, the "unknown" one, has never been tossed before. Assume further that a decision maker feels uneasy about betting on the unknown coin. That is, she would rather bet on the known coin coming up Heads than on the unknown coin coming up Heads, and the same applies to Tails. This preference pattern holds despite the fact that the decision maker agrees that both coins will eventually come up either Heads or Tails. To be more concrete, the decision maker is indifferent between betting on "the known coin comes up Heads or Tails" and on "the unknown coin comes up Heads or Tails". Should probabilities reflect willingness to bet, argued Schmeidler, the probability that the decision maker assigns to the unknown coin coming up Heads is lower than the probability she assigns to the known coin coming up Heads, and the same applies to Tails.

To be precise, consider a model with four states of the world, each one describing the outcome of both coin tosses: S = {HH, HT, TH, TT}. HH denotes the state in which both coins come up Heads; HT the state in which the known coin comes up Heads and the unknown coin Tails; and so forth. In this setup, imagine that the probability of {HH, HT} and of {TH, TT} is 50 percent, whereas the probability of {HH, TH} and of {HT, TT} is 40 percent. This would reflect the fact that the EU of a bet on any side of the known coin is higher than that of a bet on either side of the unknown coin. Yet, the union of the first pair of events, {HH, HT} and {TH, TT}, equals the union of the second pair, {HH, TH} and {HT, TT}, and both equal S. Thus, if probabilities reflect willingness to bet, they are nonadditive: the probability of each of {HH, TH} and {HT, TT} is 40 percent, while the probability of their union is 100 percent.

More generally, nonadditive probabilities are real-valued set functions that are defined over a sigma-algebra of events. They are assumed to satisfy three conditions: (i) monotonicity with respect to set inclusion; (ii) assigning zero to the empty
set; and (iii) assigning 1 to the entire state space (normalization). Observe that no continuity is generally assumed. Hence, adding the requirement of additivity (with respect to the union of two disjoint events) would result in a finitely additive (rather than sigma-additive) probability, in accordance with the derivations of de Finetti (1937) and Savage (1954).

As illustrated by the coin example, nonadditive probability measures can reflect the amount of information that was used in estimating the probability of an event. But how does one compute EU with respect to a nonadditive probability measure? Or, how does one define an integral of a real-valued function with respect to such a measure? In the simple case where the function assumes a positive value x on an event A, and zero otherwise, the answer seems simple: the integral should be xv(A), where v(A) denotes the nonadditive probability of A. Indeed, this definition has been implicit in our discussion of "willingness to bet on an event" mentioned earlier. If x stands for the utility level of the more desirable outcome, and zero for that of the less desirable one, then 0.4x would be the integral of the bet on each side of the unknown coin, whereas 0.5x would be the integral of the bet on each side of the known coin.

What happens, then, if a function assumes two positive values, x on A and y on B (where A and B are two disjoint events)? The straightforward extension would be to define the integral as xv(A) + yv(B). Indeed, this definition would seem to generalize the Riemann integral to the case in which v is nonadditive: it sums the areas of rectangles whose bases are events in the domain of the function and whose heights are the values of the function (Figure 1.1). Yet, this definition is problematic. First, letting y approach x, one finds that the integral thus defined is not continuous with respect to the integrand. Specifically, for x = y the value of the integral would be xv(A ∪ B), which will, in general, differ from xv(A) + yv(B) = x[v(A) + v(B)].
Figure 1.1 The naive integral: rectangles with areas xv(A) and yv(B) over the events A and B.
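To see the problem concretely, here is a minimal sketch (in Python, our choice of language; the variable and function names are purely illustrative) of the naive integral on the two-coin example, exhibiting the discontinuity just noted as well as the failure of monotonicity discussed next.

```python
# The two-coin state space: first letter = known coin, second = unknown coin.
A = frozenset({"HH", "TH"})       # the unknown coin comes up Heads
B = frozenset({"HT", "TT"})       # the unknown coin comes up Tails
v = {A: 0.4, B: 0.4, A | B: 1.0}  # the nonadditive probability from the text

def naive_integral(x, y):
    """Sum of rectangle areas x*v(A) + y*v(B), as in Figure 1.1 (x, y > 0)."""
    return x * v[A] + y * v[B]

x = 1.0
# Discontinuity: as y approaches x, the naive integral tends to
# x*(v(A) + v(B)) = 0.8, yet the constant function equal to x on
# A | B = S should be worth x*v(A | B) = 1.0.
print(naive_integral(x, x - 1e-9))  # ~0.8
print(x * v[A | B])                 # 1.0

# Non-monotonicity: a function paying x on A and x + 0.1 on B dominates
# the constant x pointwise, yet its naive integral is lower.
print(naive_integral(x, x + 0.1))   # 0.84 < 1.0
```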
Figure 1.2 The Choquet integral: rectangles with areas yv(A ∪ B) and (x − y)v(A).
Second, the same example can serve to show that the integral is not monotone with respect to the integrand: a function f may dominate another function g (pointwise), yet the integral of f will be strictly lower than that of g.

Schmeidler's solution to these difficulties was to use the Choquet integral. Choquet (1953–54) dealt with capacities, which he defined as nonadditive probability measures that satisfy certain continuity conditions. Choquet defined a notion of integration of real-valued functions with respect to capacities that satisfies both continuity and monotonicity with respect to the integrand. In the earlier example, the Choquet integral would be computed as follows: assume that x > y. Over the event A ∪ B, one is guaranteed the value y. Hence, let us first calculate yv(A ∪ B). Next, over the event A the function is above y. The additional value, (x − y), is added to y over the event A, but not over B. Thus, we add to the integral (x − y)v(A). Overall, the integral of the function would be (x − y)v(A) + yv(A ∪ B). This is also a sum of areas of rectangles. But this time the heights are not the values of the function. Rather, each height is the difference between two consecutive values that the function assumes. (See Figure 1.2.) Observe that this value equals xv(A) + yv(B) if v happens to be additive. That is, this definition generalizes the standard one for additive measures. But even if v is not additive, the Choquet integral retains the properties of continuity and monotonicity.

Consider a simplified version of the coin example mentioned earlier, where we ignore the known coin and focus on the unknown coin. There are only two states of the world, H and T. Let us assume that v(H) = v(T) = 0.4. Assume that a function f takes the value x at H and the value y at T. Then, if x ≥ y ≥ 0 the Choquet integral of f is 0.4(x − y) + y = 0.4x + 0.6y. If, however, y > x ≥ 0, the Choquet integral of f is 0.4(y − x) + x = 0.6x + 0.4y.
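The layered construction extends to any non-negative function on a finite state space: order the states by decreasing value, and weight each successive layer by the capacity of the event on which the function reaches that layer. The following minimal Python sketch (ours, not Schmeidler's; all names are illustrative) reproduces the two-state computations above.

```python
def choquet_integral(f, v, states):
    """Choquet integral of f (a dict: state -> non-negative value) with
    respect to a capacity v (a callable on frozensets of states)."""
    ranked = sorted(states, key=lambda s: f[s], reverse=True)
    values = [f[s] for s in ranked] + [0.0]
    total, upper = 0.0, frozenset()
    for i, s in enumerate(ranked):
        upper = upper | {s}  # states on which f is at least values[i]
        # Layer between consecutive values, weighted by the capacity of
        # the event on which the function reaches that layer.
        total += (values[i] - values[i + 1]) * v(upper)
    return total

# The unknown coin: v(H) = v(T) = 0.4, v(S) = 1.
v = lambda event: {frozenset(): 0.0, frozenset({"H"}): 0.4,
                   frozenset({"T"}): 0.4, frozenset({"H", "T"}): 1.0}[event]

print(choquet_integral({"H": 1.0, "T": 0.5}, v, ["H", "T"]))  # 0.4x + 0.6y = 0.7
print(choquet_integral({"H": 0.2, "T": 0.8}, v, ["H", "T"]))  # 0.6x + 0.4y = 0.44
```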
The definition of the Choquet integral for general functions follows the intuition stated earlier. The following chapters contain a precise definition and other details. For the time being it suffices to mention that the Choquet integral is, in general, continuous and monotone with respect to its integrand.

Schmeidler proposed that decision makers behave as if they were maximizing the Choquet integral of their utility function, where the integral is computed with respect to their subjective beliefs, and the latter are modeled by a nonadditive probability measure. This claim is often referred to as "Choquet Expected Utility" (CEU) theory. Whereas the notions of capacities and the Choquet integral already existed in the mathematical literature, Schmeidler (1989) was the seminal paper that first applied these concepts to decision under uncertainty, and that also provided an axiomatic foundation for CEU, comparable to the derivation of Expected Utility Theory (EUT) by Anscombe and Aumann (1963). In particular, the axiomatic derivation identifies the nonadditive probability v uniquely, and the utility function up to a positive linear transformation.5
1.3. A cognitive interpretation of CEU with convex capacities

Schmeidler defined a nonadditive (Choquet) EU maximizer to be uncertainty averse if her nonadditive subjective probability v is convex. Convexity, as defined in cooperative game theory,6 means that for any two events A and B,

v(A) + v(B) ≤ v(A ∩ B) + v(A ∪ B).

This condition is also referred to as supermodularity, or 2-monotonicity, and it is equivalent to stating that the marginal v-contribution of an event is always nondecreasing in the following sense: suppose that an event R is "added", in the sense of set union, to events S and T that are disjoint from R. Suppose further that S is a subset of T. Then v is convex if and only if, for any three such events R, S, and T, the marginal contribution of R to T, v(T ∪ R) − v(T), is at least as high as the marginal contribution of R to S, v(S ∪ R) − v(S).

Later literature has questioned the appropriateness of this definition of uncertainty aversion, and has provided several alternative definitions. Chapter 3 is devoted to this issue and we do not dwell on it here. But convexity of nonadditive measures has remained an important property for other reasons. It is well known in cooperative game theory that convex games have a nonempty core. That is, if a nonadditive measure v is convex, then there are (finitely) additive probability measures p that dominate it pointwise (p(A) ≥ v(A) for every event A). In the context of cooperative game theory, a dominating measure p suggests a stable imputation: a way to split the worth of the grand coalition, v(S) = p(S) = 1, among its members, in such a way that no coalition A has an incentive to deviate and operate on its own.

Schmeidler (1986) showed that, for a convex game v, the Choquet integral of every real-valued function with respect to v equals the minimum over all integrals of this function with respect to the various (additive) measures in the core of v. Conversely, if a game v has a nonempty core, and the Choquet integral of every
function with respect to v equals the minimum, over additive measures in the core, of the integrals of this function, then v is convex.

Consider again the simplified version of the coin example mentioned earlier, where there are only two states of the world, H and T. Assume that v(H) = v(T) = 0.4. This v is convex and its core is Core(v) = {(p, 1 − p) | 0.4 ≤ p ≤ 0.6}. For a function f that takes the value x ≥ 0 at H and the value y ≥ 0 at T, we computed the Choquet integral with respect to v, and found that it is 0.4x + 0.6y if x ≥ y and 0.6x + 0.4y if y > x. It is readily observed that this integral is precisely

min{px + (1 − p)y | 0.4 ≤ p ≤ 0.6}.

That is, the Choquet integral (of any non-negative f) with respect to the convex v equals the minimum of the integrals of f relative to additive measures, over all additive measures in Core(v).

The decision maker might therefore be viewed as if she does not know what probability measure governs the unknown coin. But she believes that each side of the coin cannot have a probability lower than 40 percent. Thus, she considers all the probability measures that are consistent with this estimate. Each such probability measure defines an integral of the function f. Faced with this set of integral values, the CEU maximizer behaves as if the lowest possible expected value of f is the relevant one. In other words, CEU (with respect to a convex capacity) may be viewed as a theory combining the maxmin principle and EU: the decision maker first computes all possible EU values, then considers the minimal of those, and finally chooses an act that maximizes this minimal EU.

Whenever a CEU maximizer has a convex capacity v, this cognitive interpretation of the Choquet integral holds. The set of probabilities with respect to which one takes the minimum of the integral may be interpreted as representing the information available to the decision maker. In the example stated earlier, the decision maker might not be able to specify the probabilities of the events in question, but she may be able to provide bounds on these probabilities.

This cognitive interpretation should be taken with a grain of salt. Observe that in the coin example, the decision maker has no information about the unknown coin. If we were to ask what is the set of probabilities that she deems possible, we would have to include all probability measures, {(p, 1 − p) | 0 ≤ p ≤ 1}. Yet, the decision maker behaves as if only the measures {(p, 1 − p) | 0.4 ≤ p ≤ 0.6} were indeed possible. Thus, the core of the capacity v need not coincide with the probabilities that are, indeed, possible according to available information. Rather, the core of v is the set of probabilities that the decision maker appears to entertain, given her choices, in the context of the maxmin decision rule. One may conceive of other decision rules that would give rise to other sets of probability measures, and it is not clear, a priori, that the maxmin framework is the appropriate one for eliciting the decision maker's "real" beliefs.
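Both properties are easy to verify mechanically on the two-state example. The sketch below (Python; names are ours, and the tolerance guards against floating-point noise) checks that v is convex and that the Choquet formula from the text coincides with the minimum of the ordinary expectation over the core.

```python
from itertools import combinations

# The capacity of the unknown coin, defined on all four events.
CAP = {frozenset(): 0.0, frozenset({"H"}): 0.4,
       frozenset({"T"}): 0.4, frozenset({"H", "T"}): 1.0}

def is_convex(cap, states):
    """v(A) + v(B) <= v(A & B) + v(A | B) for every pair of events."""
    events = [frozenset(c) for r in range(len(states) + 1)
              for c in combinations(states, r)]
    return all(cap[a] + cap[b] <= cap[a & b] + cap[a | b] + 1e-12
               for a in events for b in events)

def min_over_core(x, y):
    """Min of px + (1 - p)y over the core {(p, 1 - p) : 0.4 <= p <= 0.6};
    the objective is linear in p, so the minimum sits at an endpoint."""
    return min(0.4 * x + 0.6 * y, 0.6 * x + 0.4 * y)

print(is_convex(CAP, ["H", "T"]))                 # True
for x, y in [(1.0, 0.5), (0.5, 1.0), (0.3, 0.3)]:
    choquet = 0.4 * max(x, y) + 0.6 * min(x, y)   # the formula from the text
    print(abs(choquet - min_over_core(x, y)) < 1e-12)  # True: Choquet = core min
```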
1.4. Multiple priors (MMEU)

In the coin example, as well as in Ellsberg's experiments, the information that is explicitly provided to the decision maker can be fully captured by placing lower and/or upper bounds on the probabilities of specific events. For instance, in the coin example it is natural to imagine that the probability of each side of the "known" coin is known to be 50 percent, whereas the probability of each side of the "unknown" coin is only known to be in the range (0, 1). As mentioned earlier, the decision maker's behavior may not be guided by this set of probabilities. Rather, the decision maker may behave as if each side of the unknown coin has some probability in the range (0.4, 0.6). Further, her behavior may exhibit some uncertainty about the probability governing the known coin as well, and we may find that she behaves as if each side of the known coin has some probability in the range, say, (0.45, 0.55).

In all these examples, both explicitly given information and behaviorally derived uncertainty are reflected by lower and upper bounds on the probabilities of specific events. Since an upper bound on the probability of an event may be written as a lower bound on the probability of its complement, lower bounds suffice. That is, one may define the set of relevant probability measures by simple constraints of the form p(A) ≥ v(A) for various events A and an appropriately chosen v. In other words, the set of probabilities may be defined by a nonadditive probability measure, interpreted as a lower bound on the unknown probability.7

But should one follow this cognitive interpretation of CEU, one may find it too restrictive. For example, one might believe that an event A is at least twice as likely as an event B. Thus, one would like to consider only probability measures that satisfy p(A) ≥ 2p(B). This is a simple linear constraint, but it cannot be reduced to constraints of the form p(A) ≥ v(A). Moreover, one may have various pieces of information that restrict the set of probability measures that might be governing the decision problem but that are not representable by linear constraints at all. For example, assume that a random variable is known (or assumed) to have a normal distribution, with unknown expectation and variance. Ranging over all the possible values of the unknown parameters results in a set of probability measures. This set will generally not be defined by a lower-bound nonadditive probability v. In fact, all problems that are analyzed by the tools of classical statistics are modeled by a set of possible probability measures, over which one has no prior distribution. Thus, a huge variety of problems encountered on a daily basis by individual decision makers, scientists, professional consultants, and other experts involve sets of probabilities, where, for the most part, these sets do not constitute the core of a nonadditive measure.

Whereas classical statistics does not offer a general decision theory, it is natural to extend the CEU interpretation for the case of a convex v to this more general setup. Specifically, assume that a decision maker conceives of a state space S. Over this state space, she considers as possible a set of probability measures C. Given a choice problem, and assuming that the decision maker has a utility function u
defined over possible outcomes, she might adopt the following decision rule: for each possible act f, and for each possible (additive) probability measure p ∈ C, compute the EU of f relative to p. Next consider the minimum (or infimum) of these EU values of f, ranging over all measures in C. Evaluate f by this minimum value, and choose the act that maximizes this index.

This theory was suggested and axiomatized by Gilboa and Schmeidler (1989). It has come to be known as the Maxmin Expected Utility (MMEU) model, or the "multiple prior" model. The axiomatization derives a set of priors C that is convex.8 Indeed, given the decision rule of MMEU, any set of probability measures C is observationally equivalent to its convex hull. Given the restriction of convexity, the set C is uniquely identified by the decision maker's preferences. The utility function in this model is unique up to positive linear transformation, and it is identified in tandem with the set C.

In a sense, MMEU theory provided classical statistics with the foundations that Ramsey, de Finetti, and Savage provided for Bayesian statistics.9 EUT specified how a Bayesian prior might be used for decision making, and the axiomatizations of (subjective) EUT offered a derivation of this prior from observable behavior. Similarly, MMEU specified how decisions might be made given a set of priors, and the related axiomatization provided a derivation of the set of priors. However, with a set of priors there seems to be a much lower degree of agreement about the appropriate way to use it for decision making. In particular, the maxmin criterion has often been criticized for being too extreme, and several alternatives have been offered. For instance, Jaffray (1989) proposed using Hurwicz's α-criterion over the set of EU values of an act, and Klibanoff et al. (2003) propose aggregating all these EU values.

The Gilboa–Schmeidler axiomatization is based on behavioral data. As such, the set of priors that it derives shares the duality of interpretation with the core of a convex capacity. That is, while it is tempting to interpret the set of priors as reflecting the information available to the decision maker, the two might differ. The set of priors simply comprises those probabilities that describe the decision maker's behavior, via the maxmin rule, should she satisfy the Gilboa–Schmeidler axioms. It is possible that a decision maker would have actual information represented by a set C, but that she would behave according to the maxmin rule with respect to a different set of priors C′.

With the caveat mentioned earlier, MMEU has two main advantages over CEU. First, a general set of priors, restricted only by convexity, may represent a much larger variety of decision situations than may a set that has to be the core of a convex capacity.10 Second, to many authors MMEU appears to be a more intuitive theory than does CEU.11 MMEU is almost as simple to explain as classical EUT. At the same time, MMEU may be easier to implement than EUT, because the former relaxes the informational requirements imposed by the latter. Given that Schmeidler's interest in uncertainty started with a cognitive unease generated by the assumptions of the Bayesian approach, it is comforting to know that an alternative theory can be offered that relaxes the first tenet of Bayesianism but retains the cognitive appeal of EUT.
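As a toy illustration of the rule just described, here is a short Python sketch (the acts, the priors, and all names are our own illustrative choices; utilities are assumed to have been applied to outcomes already).

```python
def mmeu_value(act, priors):
    """Worst-case expected utility of an act over a finite set of priors."""
    return min(sum(p[s] * u for s, u in act.items()) for p in priors)

def mmeu_choice(acts, priors):
    """Choose the act maximizing the minimal expected utility."""
    return max(acts, key=lambda name: mmeu_value(acts[name], priors))

# Bets on the unknown coin; since EU is linear in p, a few grid points of
# {(p, 1 - p) : 0.4 <= p <= 0.6} suffice to find the minimum here.
priors = [{"H": p, "T": 1 - p} for p in (0.4, 0.5, 0.6)]
acts = {"bet on H": {"H": 1.0, "T": 0.0},    # worst-case EU: 0.4
        "bet on T": {"H": 0.0, "T": 1.0},    # worst-case EU: 0.4
        "sure 0.45": {"H": 0.45, "T": 0.45}} # worst-case EU: 0.45
print(mmeu_choice(acts, priors))  # "sure 0.45"
```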
1.5. Related literature

So far we have followed the development of CEU and of MMEU in an associative and chronological order, tracing the path that Schmeidler took in his thinking about decision under uncertainty. Indeed, CEU and MMEU will remain the focus of this volume. However, these theories bear some similarities to other theories of belief representation and/or of decision making. While we do not intend to provide here a complete history of reasoning about uncertainty, the reader will probably benefit from a brief survey of a few other, closely related theories. All of them were developed independently of CEU and MMEU, and some of these developments were more or less concurrent with the development of CEU and MMEU.

1.5.1. Rank-dependent expected utility (RDEU)12

Several psychologists have suggested the notion that individuals may not perceive probabilities correctly. This idea dates back to Preston and Baratta (1948) and Edwards (1954), but it has gained popularity among economists mostly with prospect theory (PT), suggested by Kahneman and Tversky (1979). Specifically, it is postulated that, in describing decision making under risk, an event with a stated probability of p has a decision weight f(p) which is, in general, different from p. It is typically assumed that small probabilities are weighted in a disproportionate way, namely, that f(p) > p for small values of p, and that a converse inequality holds when p is close to 1. If we were to separate this idea from the other ingredients of PT (most notably, from gain–loss asymmetry), we would have the following generalization of EUT: faced with a lottery that promises an outcome xi with probability pi, the decision maker evaluates it by ∑i f(pi)u(xi) rather than by ∑i pi u(xi).

While this idea can quite intuitively explain many violations of EUT under risk, it poses several theoretical difficulties. First, the evaluation of a lottery depends on its presentation: if the same outcome appears twice in the lottery (with two distinct probabilities), it will enter the utility calculation differently than if it appears only once, with the sum of the corresponding probabilities. Second, the functional ∑i f(pi)u(xi) fails to be continuous in the outcomes xi. Finally, it fails to respect first order stochastic dominance. Specifically, it may decrease as some of the xi increase. All of these problems disappear if f is additive, but in that case it has to be the identity function and the model fails to capture distortion of probabilities (Fishburn, 1978).

Prospect theory dealt with the first difficulty by an editing phase that the decision maker goes through before evaluating lotteries.13 But it did not offer solutions to the other two problems. These problems are reminiscent of those that one encounters when one attempts to use a naïve definition of integration with respect to a nonadditive measure, as discussed earlier (see Figure 1.1). Specifically, if one starts out with an additive probability measure P, and "distorts" it by a function f, one obtains a nonadditive probability v defined by v(A) = f(P(A)). In this case,
the functional ∑i f(pi)u(xi) can be thought of as the naïvely defined integral of the utility of the outcome x with respect to v. It comes as no surprise, then, that maximization of the functional ∑i f(pi)u(xi) poses the same difficulties as those discussed earlier.

The discussion of Choquet integration earlier may suggest that, in the context of decision under risk, PT may be modified so that it respects first order stochastic dominance and continuity. To this end, one would like to apply the distortion function f not to the probability that a certain outcome is obtained, but to the cumulative probability that at least a certain outcome is obtained. Defining v = f(P), this is tantamount to defining an integral as in Figure 1.2 as opposed to Figure 1.1. This idea was proposed, independently and more or less concurrently, by Quiggin (1982) and by Yaari (1987). Both were developed independently of Schmeidler's work. Moreover, Weymark (1981) offered yet another independent derivation of the same functional in the context of social choice. The resulting model in the context of decision under risk has come to be known as the "rank-dependent expected utility model" (see Chew (1983)), because in this model the decision weight of outcome xi does not depend only on the probability of that outcome, but also on the aggregate probabilities of all outcomes that are ranked above (or below) it. The rank-dependent model has been elaborated on by Segal (1989), and it has been applied to a range of economic problems, as well as tested experimentally. More recently, the rank-dependent model was combined with other ideas of PT to generate cumulative prospect theory (CPT; see Tversky and Kahneman, 1992).

The rank-dependent model (without additional ingredients of PT) is a special case of CEU. Specifically, defining v = f(P), CEU reduces to RDEU. However, the converse is false: not every CEU model can be represented as an RDEU model. Only a very special class of nonadditive probability measures v can be represented by an additive measure P and a distortion function f as stated earlier. The RDEU model does not deal with uncertainty. It is restricted to situations of risk, that is, of known probabilities. In particular, the RDEU model cannot help explain the pattern of choices observed in Ellsberg's paradox. We therefore do not discuss the RDEU model in this book.
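To illustrate the rank-dependent construction, here is a short Python sketch (ours; the square-root distortion is an arbitrary illustrative choice) that applies the distortion to the cumulative probability of obtaining at least each outcome, working from the best outcome down. This is exactly the Choquet integral with respect to v = f(P).

```python
def rdeu(outcomes, probs, distort, utility=lambda x: x):
    """Rank-dependent expected utility of a lottery: the distortion acts
    on cumulative probabilities, taken from the best outcome downward."""
    pairs = sorted(zip(outcomes, probs), key=lambda t: t[0], reverse=True)
    total, cum = 0.0, 0.0
    for x, p in pairs:
        w = distort(cum + p) - distort(cum)  # decision weight of outcome x
        total += w * utility(x)
        cum += p
    return total

distort = lambda p: p ** 0.5  # overweights small probabilities: f(p) > p
# A 5% chance of 100 gets weight sqrt(0.05) ~ 0.224, versus EU weight 0.05.
print(rdeu([100, 0], [0.05, 0.95], distort))  # ~22.4, versus EU of 5.0
```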
1.5.2. Belief functions

The notion that a nonadditive set function may represent bounds on the probabilities of events dates back to the theory of belief functions suggested by Dempster (1967) and Shafer (1976). Assume that one gathers evidence for various events. Some evidence is very specific, suggesting that a particular state of the world is likely to be the case. Other evidence may be more nebulous, suggesting that a nonsingleton event is likely to have obtained, without specifying which state in it has indeed materialized. Generally, evidence is conceptualized as a non-negative number attached to an event. Thus, evidence is specific to an event, and the weight of evidence is measured numerically.

Given a collection of such number–event pairs, what is the total weight of evidence supporting a particular event? In answering this question according to Dempster–Shafer theory, one first normalizes the weight of all evidence gathered so that it adds up to unity. Then one sums up the evidence for the event in question, as well as the evidence for each subset thereof. The resulting function is a nonadditive probability. It can also be shown that this nonadditive probability is convex. In fact, it satisfies a stronger condition than convexity, called infinite monotonicity. Conversely, an infinitely monotone nonadditive measure can be obtained from a set of non-negative weights as described earlier. Such functions are called belief functions.

In this theory, nonadditivity arises from the fact that evidence is not fully specified. In the context of the coin example, one might imagine that we have evidence for the fact that one of the sides of the coin will come up, but no evidence that specifically points to either side. The weight of evidence for the event {H, T} cannot be split between {H} and {T}. If one were to think in terms of a "true", "objective" probability measure, one would view the belief function as a lower bound on the values of this probability measure: each event should be assigned a probability that is at least as large as the value attributed to it by the belief function. Since belief functions are convex, and hence have a nonempty core, there are always probability measures that satisfy the constraints represented by a belief function.

Dempster–Shafer theory is purely cognitive. It has no behavioral component, and no decision theory attached to it. Dempster and Shafer did not offer an axiomatic foundation for their theory. That is, there is no set of axioms on observable data, such as likelihood judgments, that characterizes a unique belief function related to these data. Yet, the theory shares with CEU the representation of uncertainty by a nonadditive measure, and the potential interpretation of this nonadditive measure as a lower bound on what the "real", additive probability might be.
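A small sketch (Python; the evidence weights are our own toy numbers) of how a belief function arises from evidence: each piece of evidence is a normalized weight on an event, and the belief in an event sums the weights of all events contained in it.

```python
# Normalized evidence weights ("masses") attached to events of {H, T}:
# some evidence points specifically to Heads; the rest is unspecific.
masses = {frozenset({"H"}): 0.2,
          frozenset({"H", "T"}): 0.8}
assert abs(sum(masses.values()) - 1.0) < 1e-12  # already normalized

def bel(event):
    """Total weight of evidence supporting the event: sum the masses of
    the event itself and of every subset of it."""
    event = frozenset(event)
    return sum(m for e, m in masses.items() if e <= event)

print(bel({"H"}), bel({"T"}), bel({"H", "T"}))  # 0.2 0.0 1.0
# bel({H}) + bel({T}) = 0.2 < 1 = bel({H, T}): nonadditive, and a lower
# bound on any additive probability consistent with the evidence.
```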
1.5.3. Multiple priors with unanimity ranking

Suppose that a decision maker entertains a set of probability measures as possible priors. For every act, she has a range of possible expected utility values, computed according to these priors. In making a decision, the decision maker may summarize this range by a single number, as suggested by the maxmin, Hurwicz's, or some other criterion. But she may also refrain from collapsing this set of expected utilities into a single number. Rather, she may retain the entire set of EU values, indexed by the priors, as a representation of the act's desirability. It is then natural to suggest that act f is preferred to act g if and only if, for each and every possible probability measure p, the EU value of f with respect to p is above that of g. If we think of each possible prior as the opinion of a given individual, then f is preferred to g if and only if f is unanimously considered to be better than g. Alternatively, if we were to think of probability measures as columns in a decision matrix that specifies, for every act, a row of EU values, this criterion would coincide with strict domination.14 This decision rule was axiomatized, independently, by Gilboa (1984) and Bewley (1986, 2003), both relying on a theorem of Aumann (1962).15
Strict domination is, obviously, a partial ordering. It follows that the decision theory one ends up with will have to remain silent on certain choices. Specifically, if an act f is preferred to another act g according to some priors, but the converse preference holds for other priors, the theory does not offer any prediction of choice. This violation of the completeness axiom is viewed by many as problematic, for several reasons. Theoretically, the completeness axiom is often justified by necessity: a decision has to be made, so that ultimately revealed preference will have to decide whether f is (at least weakly) preferred to g or vice versa. From a more practical viewpoint, it is often hard to conduct economic analysis when the theory leaves considerable freedom in terms of its predictions. Bewley’s attempt to deal with these difficulties was to suggest that there is always a “status quo” act, f0, which gets to be chosen unless another act dominates it; when one does, this new act becomes the new status quo. Bewley has not offered a theory of how the status quo is generated. However, such a theory may yet be offered, complementing the unanimity multiple prior model and making it an alternative to MMEU.
1.6. Conclusion
Choquet expected utility and MMEU were suggested as theories of decision making under uncertainty, rejecting the first tenet of Bayesianism. While some researchers view them solely as theories of bounded rationality, their starting point is not the mistakes that decision makers might commit, but the theoretical inadequacy of the Bayesian paradigm. Specifically, when there is insufficient information for the generation of a prior probability, it is not obvious how one can choose a prior rationally. Instead, one may entertain uncertainty, and make decisions in a way that reflects one’s state of knowledge. The chapters in this volume constitute a sample of the works published on uncertainty in economic theory. They are divided into two main parts, theory and applications, each containing a more detailed introduction that may help to orient the reader. It (almost) goes without saying that this volume is not exhaustive. Many very good papers, published and unpublished, have been written on the topics discussed here, but could not be included in the volume due to obvious constraints. In making the particular selection we offer the reader here, we strove for brevity and variety, in the hope of whetting the reader’s appetite, and with no claim to exhaust the important contributions to this literature. A final caveat relates to terminology. As is often the case, authors who contributed to this volume do not always agree on the appropriate terms for various concepts. In particular, some authors feel very strongly about the choice between “uncertainty” and “ambiguity” (and, correspondingly, between “uncertainty aversion” and “ambiguity aversion”), while they refer, for the most part, to the same concept. Similarly, “MMEU” is sometimes referred to as “MEU” (leaving the maximization out, as in “EU” and “CEU”), or as “the multiple prior model”. Since
all relevant concepts are defined formally in a way that leaves no room for confusion, we decided to let everyone use the terms they prefer, in the hope that the diverse terms would lead to more fruitful associations.
Acknowledgments I thank my colleagues for comments and suggestions. In particular, this introduction has benefited greatly from many comments by Peter Wakker.
Notes
1 Two more assumptions are often entailed by “Bayesianism”. First, a Bayesian is supposed to conceive of all relevant eventualities. Second, she is expected to apply the Bayesian approach to any decision problem. We do not dwell on these assumptions here.
2 Moreover, the agent may use her probabilistic beliefs in her decisions in a way that uniquely identifies these beliefs, yet that differs from EU maximization, as suggested by Machina and Schmeidler’s (1992) “probabilistic sophistication”.
3 Savage’s axiom P2 states that, if two acts are equal on a given event, then it should not matter what they are equal to on that event. That is, one can determine preference between them based solely on their values on the event on which they differ. This axiom is often referred to as “the Sure Thing Principle”, though this term has been used in several other ways as well. See Wakker (Chapter 2).
4 In fact, Schmeidler was not aware of Ellsberg’s work when he started his study in the early 1980s.
5 Schmeidler’s work appeared as a working paper in 1982. Some of the mathematical analysis was published separately in Schmeidler (1986). However, it was not until 1989 that his main paper appeared in print.
6 Schmeidler’s choice of the letter v to denote a nonadditive measure was probably guided by his past experience with cooperative game theory, in which the letter v is a standard notation for the characteristic function of a transferable utility cooperative game (which is a nonadditive set function). See Shapley (1965).
7 This set of constraints defines the core of v. Yet, even if this set is nonempty, v need not be convex. It need not even be exact (see Schmeidler, 1972).
8 That is, if p and q are in C, then so is αp + (1 − α)q.
9 Gilboa and Schmeidler’s (1989) derivation was conducted in the framework of Anscombe and Aumann (1963). Derivations that do not resort to objective probabilities were provided by Casadesus-Masanell et al. (2000) and Ghirardato et al. (2001).
10 To be concrete, with a finite state space, the beliefs modeled by CEU are characterized by finitely many parameters, whereas the beliefs modeled by MMEU constitute an infinite-dimensional class. This does not imply that MMEU is a generalization of CEU. MMEU only generalizes CEU with a convex capacity. Generally, capacities need not be convex. Moreover, CEU may reflect uncertainty-liking behavior, whereas a certain form of uncertainty aversion is built into MMEU.
11 The axioms of MMEU in the Anscombe–Aumann framework are also easier to interpret than those of CEU.
12 The term “rank-dependent utility” is also used for this model.
13 Prospect theory deals with prospects rather than with lotteries, namely, with outcomes as they are perceived relative to a reference point. The notion of a reference point, and the idea that people respond to changes rather than to absolute levels, are ingredients of PT that we ignore here.
14 Alternatively, one may borrow the idea of interval orders (Fishburn, 1985) and argue that f is preferred to g if and only if the entire interval of expected utility values of f
is above that of g. This would correspond to the notion of an “overwhelming” strategy in game theory, which is evidently stronger than domination.
15 Gilboa (1984) is a master’s thesis, and has not been translated from Hebrew. Bewley’s paper appeared as a Cowles Foundation discussion paper in 1986. Bewley took this decision rule as a building block of an elaborate theory.
References
Allais, M. (1953), “Le Comportement de l’Homme Rationnel devant le Risque: Critique des Postulats et Axiomes de l’École Américaine,” Econometrica, 21: 503–546.
Anscombe, F. J. and R. J. Aumann (1963), “A Definition of Subjective Probability,” Annals of Mathematical Statistics, 34: 199–205.
Aumann, R. J. (1962), “Utility Theory without the Completeness Axiom,” Econometrica, 30: 445–462.
Bayes, T. (1764), “Essay Towards Solving a Problem in the Doctrine of Chances,” Philosophical Transactions of the Royal Society of London.
Bewley, T. F. (2003), “Knightian Decision Theory, Part I,” Decisions in Economics and Finance, 25: 79–110 (Cowles Foundation Discussion Paper, 1986).
Casadesus-Masanell, R., P. Klibanoff, and E. Ozdenoren (2000), “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory, 92: 35–65.
Chew, S. H. (1983), “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica, 51: 1065–1092.
Choquet, G. (1953–4), “Theory of Capacities,” Annales de l’Institut Fourier (Grenoble), 5: 131–295.
de Finetti, B. (1937), “La Prévision: Ses Lois Logiques, Ses Sources Subjectives,” Annales de l’Institut Henri Poincaré, 7: 1–68.
Dempster, A. P. (1967), “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38: 325–339.
Edwards, W. (1954), “The Theory of Decision Making,” Psychological Bulletin, 51: 380–417.
Ellsberg, D. (1961), “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75: 643–669.
Fishburn, P. C. (1978), “On Handa’s ‘New Theory of Cardinal Utility’ and the Maximization of Expected Return,” Journal of Political Economy, 86: 321–324.
Fishburn, P. C. (1985), Interval Orders and Interval Graphs. New York: John Wiley and Sons.
Ghirardato, P., F. Maccheroni, M. Marinacci, and M. Siniscalchi (2001), “A Subjective Spin on Roulette Wheels,” Econometrica, 71(6): 1897–1908 (published November 2003).
Gilboa, I. (1984), Aggregation of Preferences, MA Thesis, Tel-Aviv University.
Gilboa, I. and D. Schmeidler (1989), “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics, 18: 141–153. (Reprinted as Chapter 6 in this volume.)
Jaffray, J.-Y. (1989), “Linear Utility Theory for Belief Functions,” Operations Research Letters, 8: 107–112.
Kahneman, D. and A. Tversky (1979), “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47: 263–291.
Klibanoff, P., M. Marinacci, and S. Mukerji (2003), “A Smooth Model of Decision Making under Ambiguity,” mimeo.
Knight, F. H. (1921), Risk, Uncertainty, and Profit. Boston, New York: Houghton Mifflin.
Machina, M. and D. Schmeidler (1992), “A More Robust Definition of Subjective Probability,” Econometrica, 60: 745–780.
Preston, M. G. and P. Baratta (1948), “An Experimental Study of the Auction Value of an Uncertain Outcome,” American Journal of Psychology, 61: 183–193.
Quiggin, J. (1982), “A Theory of Anticipated Utility,” Journal of Economic Behaviour and Organization, 3: 323–343.
Ramsey, F. P. (1931), “Truth and Probability,” The Foundations of Mathematics and Other Logical Essays. New York: Harcourt, Brace and Co.
Savage, L. J. (1954), The Foundations of Statistics. New York: John Wiley and Sons.
Schmeidler, D. (1972), “Cores of Exact Games, I,” Journal of Mathematical Analysis and Applications, 40: 214–225.
Schmeidler, D. (1986), “Integral Representation without Additivity,” Proceedings of the American Mathematical Society, 97: 255–261.
Schmeidler, D. (1989), “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57: 571–587. (Reprinted as Chapter 5 in this volume.)
Segal, U. (1989), “Anticipated Utility: A Measure Representation Approach,” Annals of Operations Research, 19: 359–373.
Shafer, G. (1976), A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press.
Shapley, L. S. (1965), “Notes on n-Person Games VII: Cores of Convex Games,” The RAND Corporation R.M. Reprinted as: Shapley, L. S. (1972), “Cores of Convex Games,” International Journal of Game Theory, 1: 11–26.
Tversky, A. and D. Kahneman (1974), “Judgment under Uncertainty: Heuristics and Biases,” Science, 185(4157): 1124–1131.
Tversky, A. and D. Kahneman (1981), “The Framing of Decisions and the Psychology of Choice,” Science, 211(4481): 453–458.
Tversky, A. and D. Kahneman (1992), “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty, 5: 297–323.
Weymark, J. A. (1981), “Generalized Gini Inequality Indices,” Mathematical Social Sciences, 1: 409–430.
Yaari, M. E. (1987), “The Dual Theory of Choice under Risk,” Econometrica, 55: 95–115.
2 Preference axiomatizations for decision under uncertainty
Peter P. Wakker
Several contributions in this book present axiomatizations of decision models, and of special forms thereof. This chapter explains the general usefulness of such axiomatizations, and reviews the basic axiomatizations for static individual decisions under uncertainty. It will demonstrate that David Schmeidler’s contributions to this field were crucial.
2.1. The general purpose of axiomatizations
In this section we discuss some general purposes of axiomatizations. In particular, the aim is to convince the reader that axiomatizations are an essential step in the development of new models. To start, imagine that you are a novice in decision theory, and have an important decision to take, say which of several risky medical treatments to undergo. You consult a decision theorist, and she gives you a first piece of advice, as follows:
1 List all relevant uncertainties. In your case we assume that the uncertainty concerns which of n potential diseases s1, . . . , sn is the one you have.
2 Express your uncertainty about what your disease is numerically through probabilities p1, . . . , pn, subjective if necessary.
3 Express numerically how good you think the result is of each treatment conditional upon each disease. Call these numbers utilities.
4 Of the available treatments, choose the one that maximizes expected utility, that is, the probability-weighted average utility.
Presented in this way, the first piece of advice is ad hoc, and will not convince you. What are such subjective probabilities, and how are you to choose them? Similar questions apply to the utility numbers. And, if such numbers can be chosen, why should you take products of probabilities and utilities, and then sum these products? Why not use other mathematical operations? The main problem with the first piece of advice is that its concepts of probabilities and utilities do not have a clear meaning. They are theoretical constructs, which means that they have no meaning in isolation, but can only get meaning within a model, in relation to other concepts. The decision theorist did not succeed in convincing you, and she now turns to a second piece of advice, seemingly very different. She explains the meaning of transitivity
and completeness of preferences to you, and you declare that you want to satisfy these conditions. She next explains the sure-thing principle to you, meaning that a choice between two treatments should depend only on their results under those diseases where the treatments differ, and not on the results for diseases for which the two treatments give the same results. Let us assume that you want to satisfy this condition as well. Next the decision theorist succeeds in convincing you of the appropriateness of the other preference conditions of Savage (1954). Satisfying these conditions is the decision analyst’s second piece of advice.
The second piece of advice is of a different nature than the first. All of its conditions have been stated directly in terms of choice making. Even if you would not agree with the appropriateness of all conditions, at least you can relate to them, and know what they mean. They do not concern strange undefined theoretical concepts. Still, and this was Savage’s (1954) surprising result, the two pieces of advice turn out to be identical. One holds if and only if the other holds, given a number of technical assumptions that we ignore here. Whereas the second piece of advice seemed to be entirely different from the first, it turns out to be the same. The second piece of advice translates the first, which was stated in a theoretical language, into the meaningful language of empirical primitives, that is, preferences. Such translations are called axiomatizations. They reformulate, directly in terms of the observable primitives such as choices, what it means to assume that some theoretical model holds. A decision model is normatively appropriate if and only if its characterizing axioms are, and is descriptively valid if and only if the characterizing axioms are. Axiomatizations can be used to justify a model, but also to criticize it. Expected utility can be criticized by criticizing, for instance, the sure-thing principle. This is what Allais (1953) did. If a model is to be falsified empirically, then axioms can be of help because they are stated in terms of directly testable empirical primitives.
In applications, we usually do not believe models to hold perfectly true, and use them as approximations or as metaphors, to clarify some aspects of reality that are relevant to us. We mostly do not actually measure the concepts used in models. For instance, most economic models assume that consumers maximize utility, but we rarely measure consumers’ utility functions. The assumption of utility maximization is justified by the belief that for the topics considered, completeness and transitivity of preference are reasonable assumptions. These preference axioms, jointly with continuity, axiomatize the maximization of utility and clarify the validity and limitations thereof.
Axiomatizations are crucial at an early stage of the development of new models or concepts, namely at the stage where setups and intuitions are qualitative but quantifications seem to be desirable. Not only do axiomatizations show how to verify or falsify, and how to justify or criticize given models, but they also demonstrate which parameters and concepts are the essential ones to be measured or determined. Without axiomatizations of expected utility, Choquet expected utility (CEU), and multiple priors, it would not be clear whether their concepts, such as utility, are sensible concepts at all, or whether they are the parameters to be assessed. A historical example may illustrate the importance of axiomatizations.
For a long time, models were popular that deviated from expected utility by transforming
probabilities of separate outcomes, such as those examined by Edwards (1955) and Kahneman and Tversky (1979). These models were never axiomatized, which could have served as a warning signal that something was wrong. Indeed, in 1978, Fishburn discovered that no sensible axiomatization of such models will ever be found because these models violate basic axioms such as continuity and, even more seriously, stochastic dominance. When Quiggin (1982) and Schmeidler (1989, first version 1982) introduced alternative models of nonlinear probabilities, they took good care to provide axiomatic foundations. This made clear what the empirical meaning of their models is, that these models do not contain intrinsic inconsistencies, and that their concepts of utilities and nonlinear probabilities are sensible. Quiggin (1982) and Schmeidler (1989) independently developed the idea of rank-dependence and, thus, were the first to present sound models that allow for a new component in individual decision theory: a subjective decision attitude toward incomplete information (i.e. risk and uncertainty). This new component is essential for the study of decision under incomplete information, and sound models for handling it had been sorely missing in the literature up to that point. I consider this development the main step forward for decision under incomplete information of the last decades. Quiggin developed his idea for decision under risk, Schmeidler for the more important and more subtle domain of decision under uncertainty, which is the topic of this book.
Axioms can be divided into three different classes. First there are the basic rationality axioms such as transitivity, completeness, and monotonicity, which are satisfied by most models studied today. For descriptive purposes, it has become understood during the last decades that these very basic axioms are the main cause of most deviations from theoretical models. For normative applications, these axioms are relatively uncontroversial, although there is no unanimous agreement on any axiom. The second class of axioms consists of technical axioms, mostly continuity, that impose a richness on the structures considered. For decision under uncertainty, these axioms impose a richness on the state space or on the outcome space. They are usually necessary for obtaining mathematical proofs, and will be further discussed later in this chapter. The third and final class of axioms consists of the “intuitive” axioms that are most characteristic of the models they characterize. They vary from model to model. For expected utility, the sure-thing principle (which amounts to the independence axiom for given probabilities) is the most characteristic axiom. Most axiomatizations of nonexpected utility models have relaxed this axiom. Many examples will be discussed in the following sections, and in other chapters in this book.
I end this introduction with a citation from Gilboa and Schmeidler (2001), who concisely listed the purposes of axiomatizations as follows:
Meta-theoretical: Define theoretical terms by observables (and enable their elicitation).
Descriptive: Define terms of refutability.
Normative: Do the right thing.
2.2. General conditions for decision under uncertainty
S denotes a state space, with elements called states (of nature). Exactly one state is true, the others are not true. The decision maker does not know which state is the true one, and has no influence on the truth of the states (no moral hazard). For example, assume that a horse race will take place. Exactly one horse will win the race. Every s ∈ S refers to one of the horses participating, and designates the “state of nature” that this horse will win the race. Alternative terms for state of nature are state of the world or proposition. An event is a subset of S, and is true or obtains if it contains the true state of nature. For example, the event “A Spanish horse will win” is the set {s ∈ S: s is Spanish}.
C denotes the outcome space, and F the set of acts. Formally, acts are functions from S to C, and F contains all such functions. A decision maker should choose between different acts. An act will yield the outcome f(s) for the decision maker where s is the true state of nature. Because the decision maker is uncertain about which state is true, she is uncertain about what outcome will result from an act, and has to make decisions under uncertainty. An alternative term for an act is state-contingent payoffs, and acts can refer to financial assets. Acts can be considered random variables with the randomness not expressed through probabilities but through states of nature. David Schmeidler is known for his concise ways of formulating things. In the abstract of Schmeidler (1989), he used only seven words to describe the above model: “Acts map states of nature to outcomes.”
By ≽, a binary relation on F, we denote the preference relation of the decision maker over acts. In decision under uncertainty, we study properties of the quadruple ⟨S, C, F, ≽⟩. A function V represents ≽ if V : F → R and f ≽ g if and only if V(f) ≥ V(g). If a representing function exists, then ≽ must be a weak order, that is, it is complete (f ≽ g or g ≽ f for all acts f, g) and transitive. Completeness implies reflexivity, that is, f ≽ f for all acts f. We write f ≻ g if f ≽ g and not g ≽ f, f ∼ g if f ≽ g and g ≽ f, f ≺ g if g ≻ f, and f ≼ g if g ≽ f. For a weak order ≽, ∼ is an equivalence relation, that is, it is symmetric (f ∼ g if g ∼ f), transitive, and reflexive. Outcomes are often identified with the corresponding constant acts. In this way, ≽ on F generates a binary relation on C, denoted by the same symbol and identified with the restriction of ≽ to the constant acts.
Decision under risk refers to the special case of decision under uncertainty where an objective probability measure Q on S is given, and f ∼ g whenever f and g generate the same probability distribution over C. Then the only information relevant for the preference value of an act is the probability distribution that the act generates over the outcomes. Therefore, acts are usually identified with the probability distributions generated over the outcomes, and S is suppressed from the model. It is useful to keep in mind, though, that probabilities must be generated by some random process, and that some randomizing state space S is underlying, even if not an explicit part of the model. It is commonly assumed in decision under risk that S is rich enough to generate all probabilities, and all probability distributions. My experience in decision under risk and uncertainty has been that
formulations of concepts for the general context of uncertainty are more clarifying and intuitive than formulations only restricted to the special case of risk. This chapter will focus on axiomatizations for decision under uncertainty, the central topic of this book, and will not discuss axiomatizations for decision under risk. Often, axiomatizations for decision under risk readily follow simply by restricting the axioms of uncertainty to the special case of risk. For example, Yaari’s (1987) axiomatization of rank-dependent utility for risk can be obtained as a mathematical corollary of Schmeidler (1989); I will not elaborate on this point. We will also restrict attention to static models, and will not consider dynamic decision making or multistage models such as examined by Luce (2000) unless serving to interpret static models. Other restrictions are that we only consider individual decisions, and do not examine decompositions of multiattribute outcomes. We will not discuss topological or measure-theoretic details, and primarily refer to works introducing results and not to follow-up works and generalizations.
The most well-known representation for decision under uncertainty is subjective expected utility (SEU). SEU holds if there exists a probability measure P on S, and a utility function U : C → R, such that f ↦ ∫S U(f(s)) dP(s), the SEU of f, represents preferences. For infinite state spaces S, measure-theoretical conditions can be imposed to ensure that the expectation is well defined for all acts considered. For the special case of decision under risk, P has to agree with the objective probability measure on S under mild richness assumptions regarding S, contrary to what has often been thought in the psychological literature. In general, P need not be based on objective statistical information, and may be based on subjective judgments of the decision situation in the same way as U is. P is, therefore, often called a subjective probability measure. SEU implies monotonicity, that is, f ≽ g whenever f(s) ≽ g(s) for all s, where furthermore f ≻ g if f(s) = α ≻ β = g(s) for outcomes α, β and all s in an event E that is “nonnull” in some sense. E being nonnull means that the outcomes of E can affect the preference value of an act, in a way that depends on the theory considered and that will not be formalized here.
The most important implication of SEU is the sure-thing principle, discussed informally in the introduction. It means that a preference between two acts is not affected if, for an event for which the two acts yield the same outcome, that common outcome is changed into another common outcome. The condition holds true under SEU, because an event with a common outcome contributes the same term to the expected-utility integral of both acts, which will cancel from the comparison irrespective of what that common outcome is. Savage (1954) introduced this condition as his P2. He did not use the term sure-thing principle for this condition alone, but for a broader idea. The term is, however, used exclusively for Savage’s P2 nowadays. In a mathematical sense, the sure-thing principle can be equated with separability from consumer demand theory, although Savage developed his idea independently. The condition can be derived from principles for dynamic decisions (Burks, 1977: chapter 5; Hammond, 1988), a topic that falls outside the scope of this chapter.
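As a minimal sketch (with a hypothetical prior and utility, chosen only for illustration), the SEU representation defined above can be computed for a finite state space, and the sure-thing principle checked directly: changing an outcome that two acts share on an event does not affect the preference between them.

```python
# A minimal sketch of the SEU representation for a finite state space;
# the prior P and utility U are hypothetical placeholders.

S = ["s1", "s2", "s3"]
P = {"s1": 0.5, "s2": 0.3, "s3": 0.2}   # subjective probability measure
U = lambda x: x ** 0.5                  # utility over monetary outcomes

def seu(act):
    # act: a function from states to outcomes, here a dict
    return sum(P[s] * U(act[s]) for s in S)

f = {"s1": 100, "s2": 0, "s3": 49}      # f and g agree on s3
g = {"s1": 25, "s2": 36, "s3": 49}

print(seu(f) >= seu(g))                 # True: f is weakly preferred to g

# Change the common outcome on s3 into another common outcome:
f2, g2 = dict(f, s3=4), dict(g, s3=4)
print(seu(f2) >= seu(g2))               # still True: the common term cancels
```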
The sure-thing principle is too weak to imply SEU. For instance, for a fixed partition (A1, . . . , An) of S, and acts (A1: x1; . . . ; An: xn) yielding xj for each s ∈ Aj, the sure-thing principle amounts to an additively decomposable representation V1(x1) + · · · + Vn(xn), under some technical assumptions discussed later. This representation is strictly more general than the SEU representation P(A1)U(x1) + · · · + P(An)U(xn), for instance if V2 = exp(V1). It can be interpreted as state-dependent expected utility (Karni, 1985). Therefore, additional conditions are required to imply the SEU model. The particular reinforcements of the sure-thing principle depend on the particular model chosen, and are discussed in the next section.
2.3. Conditions to characterize subjective expected utility
The most desirable characterization of SEU, or any model, would concern an arbitrary set of preferences over acts, not necessarily a complete set of preferences over a set F, and would give necessary and sufficient conditions for the preferences considered to be representable by SEU. Most important would be the case of a finite set of preferences, to truly capture the empirical and normative meaning of models such as SEU. Unfortunately, such general results are very difficult to obtain. For SEU, necessary and sufficient conditions for finite models were given by Shapiro (1979). These conditions are, however, extremely complex, and amount to general solvability requirements of inequalities for mathematical models called rings. They do not clarify the intuitive meaning of the model. Therefore, people have usually resorted to continuity conditions so as to simplify the axiomatizations of models. These continuity conditions imply richness of either the state space or the outcome space. Difficulties in using such technical richness conditions are discussed by Krantz et al. (1971: section 9.1) and Pfanzagl (1968: section 9.5). The following discussion is illustrated in Table 2.1.
The most prominent model with richness of the state space is Savage (1954). Savage added an axiom P4 to the sure-thing principle, requiring that a preference for betting on one event rather than another is independent of the stakes of the bets. The richness of the state space was ensured by an axiom P6 requiring arbitrarily fine partitions of the state space to exist, so that the state space must be atomless. Decision under risk can be considered a special case of decision under uncertainty where the state space is rich, because it is commonly assumed that all probabilities can be generated by random events. Other than that, there have not been many derivations of SEU with a rich state space. Most axiomatizations have imposed richness structure on the outcome space, to which we turn in the rest of this section.
We start with approaches that assume convex subsets of linear spaces as outcome space, with linear utility. In these approaches, outcomes are either monetary, with C ⊂ R an interval, or they are probability distributions over a set of prizes. The sure-thing principle is reinforced into linearity with respect to addition (f ≽ g ⇒ f + c ≽ g + c for acts f, g, c, where addition is statewise), or mixing (f ≽ g ⇒ λf + (1 − λ)c ≽ λg + (1 − λ)c for acts f, g, c, where mixing is statewise, and under continuity can be restricted to λ = 1/2).
Table 2.1 Axiomatizations and their structural assumptions

| Structural assumption | SEU | CEU | PT | Multiple priors |
|---|---|---|---|---|
| Continuous state space | Savage (1954) | Gilboa (1987)∗ | — | — |
| U linear in money | de Finetti (1931, 1937) | Chateauneuf (1991) | — | Chateauneuf (1991) |
| U linear in probability; mixing, 2-stage | Anscombe and Aumann (1963) | Schmeidler (1989) | — | Gilboa and Schmeidler (1989) |
| Canonical probabilities | Raiffa (1968), Sarin and Wakker (1997) | Sarin and Wakker (1992) | Sarin and Wakker (1994) | — |
| Continuous U, tradeoff consistency | Wakker (1984) | Wakker (1989) | Tversky and Kahneman (1992) | Casadesus-Masanell et al. (2000) |
| Continuous U, multisymmetry | Nakamura (1990) | Nakamura (1990) | × | Ghirardato et al. (2003) |
| Continuous U, act-independence | Gul (1992) | Chew and Karni (1994), Ghirardato et al. (2003) | × | — |

Notes
× Such an extension is not possible, because the required certainty equivalents are not contained in most of the sign-comonotonic sets.
∗ Required more modifications than only comonotonic restrictions.
Both of these approaches characterize SEU with a linear utility function. The additive approach was followed by de Finetti (1931, 1937) and Blackwell and Girshick (1954: theorem 4.3.1 and problem 4.3.1). For the mixture approach, Anscombe and Aumann (1963) provided the most appealing result. For earlier results on mixture spaces, see Arrow (1951: 431–432). In addition to the axioms mentioned, these works used weak ordering, monotonicity (this, together with additivity, is what de Finetti’s book-making amounts to), and some continuity (existence of “fair prizes” for de Finetti, continuous mixing for the mixture approaches). In the mixture approaches, the linear utility function is interpreted as an expected utility functional for the probability distributions over prizes, and acts are two-stage: In the first stage, the uncertainty about the true state of nature is resolved, yielding a probability distribution over prizes; in the second stage the probability distribution is resolved, finally leading to a prize. This approach assumes that the two stages are processed through backwards induction (“folding back”). The second-stage probabilities could also be modeled through a rich product state space, but for this survey the categorization as rich outcomes is more convenient.
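A minimal sketch of this two-stage evaluation, with a hypothetical prior, lotteries, and utility, folding back the second stage first:

```python
# A sketch of the two-stage (Anscombe-Aumann style) evaluation described
# above: an act maps states to objective lotteries over prizes; lotteries
# are folded back to their expected utilities, which are then aggregated
# by the subjective prior. All numbers are hypothetical.

P = {"s1": 0.6, "s2": 0.4}            # subjective prior over states
U = lambda prize: prize ** 0.5        # utility over prizes

def lottery_eu(lottery):
    # second stage: an objective lottery, evaluated by expected utility
    return sum(q * U(x) for x, q in lottery)

def two_stage_seu(act):
    # first stage: subjective expectation of the folded-back values
    return sum(P[s] * lottery_eu(act[s]) for s in P)

act = {"s1": [(100, 0.5), (0, 0.5)],  # in s1: a 50-50 lottery over $100/$0
       "s2": [(25, 1.0)]}             # in s2: $25 for sure

print(two_stage_seu(act))  # 0.6*(0.5*10) + 0.4*5 = 5.0
```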
An alternative to Anscombe and Aumann’s (1963) approach was customary in the early decision-analysis literature of the 1960s (Raiffa, 1968: chapter 5). As in Anscombe and Aumann (1963), a rich set of events with objectively given probabilities was assumed present, with preferences over acts on these events governed by expected utility. However, these events were not part of a second stage to be resolved after the events of interest, but they were simply a subset of the collection of events considered in the first, and only, stage. Formally, this approach belongs to the category that requires a rich state space. To evaluate an arbitrary act (A1: x1; . . . ; An: xn), where no objective probabilities are given for the events Aj, a canonical representation (E1: x1; . . . ; En: xn) is constructed. Here each event Ej does have an objective probability and is equally likely as event Aj in the sense that one would just as well bet $1 on Ej as on Aj. It is assumed that such canonical representations can be constructed and are preferentially equivalent. In this manner, SEU is obtained over all acts. Sarin and Wakker (1997) formalized this approach. Ramsey (1931) can be interpreted as a variation of this canonical approach, with his “ethically neutral” event an event with probability half, utility derived from gambles on this event, and the extension of SEU to all acts and events not formalized.
Returning to the approach with rich outcome sets, more general axiomatizations have been derived for continuous instead of linear utility. Then C can, more generally, be a connected topological space. For simplicity, we continue to assume that C is a convex subset of a linear space. Pfanzagl (1959) gave an axiomatization of SEU when restricted to two-outcome acts. He added a bisymmetry axiom to the sure-thing principle. Denote by CE(f) a certainty equivalent of act f, that is, an outcome (identified with a constant act) equivalent to f. For events A, M with complements Ac, Mc, bisymmetry requires that
(A: CE(M: x1; Mc: y1); Ac: CE(M: x2; Mc: y2)) ∼ (M: CE(A: x1; Ac: x2); Mc: CE(A: y1; Ac: y2)).
For arbitrary finite state spaces S, Grodal (1978) axiomatized SEU with continuous utility using a mean-groupoid operation (a generalized mixture operation derived from preference) developed by Vind. These works were finally published in Vind (2003). Wakker (1984, 1993) characterized SEU for continuous utility using a tradeoff consistency technique based on conjoint measurement theory of Krantz et al. (1971) and suggested by Pfanzagl (1968: end of remark 9.4.5). The basic axiom requires that
(A1: α; A2: x2; . . . ; An: xn) ≽ (A1: β; A2: y2; . . . ; An: yn),
(A1: γ; A2: x2; . . . ; An: xn) ≼ (A1: δ; A2: y2; . . . ; An: yn), and
(A1: v1; . . . ; An−1: vn−1; An: α) ≼ (A1: v1; . . . ; An−1: vn−1; An: β)
imply
(A1: v1; . . . ; An−1: vn−1; An: γ) ≼ (A1: v1; . . . ; An−1: vn−1; An: δ),
where (A1, . . . , An) can be any partition of S. By renumbering, similar conditions follow for outcomes α, β, γ, δ conditional on all pairs of events Ai, Aj.
Nakamura (1990) used multisymmetry, a generalization of Pfanzagl’s (1959, 1968) bisymmetry to general acts, to characterize SEU with continuous utility for finite state spaces. Similar conditions had appeared before in decision under risk (Quiggin, 1982; Chew, 1989). Chew called the condition event commutativity. Consider a partition (A1, . . . , An) and a “mixing” event M with complementary event Mc. Multisymmetry requires that
(A1: CE(M: x1; Mc: y1); . . . ; An: CE(M: xn; Mc: yn)) ∼ (M: CE(A1: x1; . . . ; An: xn); Mc: CE(A1: y1; . . . ; An: yn)).
Multisymmetry implies that (x1, . . . , xn) is separable in (A1: CE(M: x1; Mc: c1); . . . ; An: CE(M: xn; Mc: cn)). This implication is called act-independence, and was introduced by Gul (1992). Formally, the condition requires that
(A1: x1; . . . ; An: xn) ≽ (A1: y1; . . . ; An: yn)
implies
(A1: CE(M: x1; Mc: c1); . . . ; An: CE(M: xn; Mc: cn)) ≽ (A1: CE(M: y1; Mc: c1); . . . ; An: CE(M: yn; Mc: cn)).
Gul showed that this condition suffices to characterize SEU with continuous utility for finite state spaces, under the usual other assumptions. Gul used an additional symmetry requirement that was shown to be redundant by Chew and Karni (1994). Using bisymmetry axioms for two-outcome acts, Ghirardato et al. (2003a) defined a mixture operation that can be interpreted as an endogenous analog of the mixture operation used in Anscombe and Aumann (1963). They also used it to derive nonexpected utility models discussed in the next section.
Characterizations of properties of utility such as concavity have mostly been studied for decision under risk, and less so for decision under uncertainty. Also for uncertainty, utility is concave if and only if the subjective expected value of an act is always preferred to the act (Wakker, 1989: proposition VII.6.3.ii). This result is more difficult to prove than for decision under risk because not all probabilities need to be available, and is less useful because the subjective expected value is not directly observable, in the same way as subjective probabilities are not. More interesting for uncertainty is that utility is concave if and only if preferences are convex with respect to the mixing of outcomes, that is, if f ≽ g then (1/2)f + (1/2)g ≽ g, where outcomes are mixed statewise (Wakker, 1989: proposition VII.6.3.iv). This condition has the advantage that it is directly observable.
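A small numerical check of this observable condition, under SEU with a hypothetical concave utility and prior:

```python
# A sketch of preference convexity under statewise outcome mixing, with
# hypothetical acts: concave utility makes the mixture weakly preferred.

P = {"s1": 0.5, "s2": 0.5}
U = lambda x: x ** 0.5            # a concave utility

def seu(act):
    return sum(P[s] * U(act[s]) for s in P)

f = {"s1": 100, "s2": 0}
g = {"s1": 0, "s2": 100}
mix = {s: 0.5 * f[s] + 0.5 * g[s] for s in P}   # statewise mixture

print(seu(f), seu(g))   # 5.0 and 5.0: f ~ g, so in particular f is weakly preferred
print(seu(mix))         # about 7.07: (1/2)f + (1/2)g is weakly (here strictly) preferred to g
```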
2.4. Nonexpected utility models
This section considers models deviating from SEU.
Abandoning basic axioms. Models abandoning completeness (Bewley, 1986; Dubra et al., 2004), transitivity (Fishburn, 1982; Loomes and Sugden, 1982; Vind, 2003), or continuity (Fishburn and LaValle, 1993) will not be discussed. We will only discuss models that weaken the sure-thing principle. In this class, we will not discuss betweenness models (Chew, 1983; Dekel, 1986; Epstein, 1992). These models have been examined almost exclusively for risk, with statements for uncertainty only in Hazen (1987) and Sarin and Wakker (1998), and have nowadays lost popularity. Nor will we discuss quadratic utility (Chew et al., 1991), which has been stated only for decision under risk.
Choquet expected utility. The first nonexpected utility model that we discuss is rank-dependent utility, or Choquet expected utility (CEU) as it is often called when considered for uncertainty. We assume a utility function as under SEU, but instead of a subjective probability P on S we assume, more generally, a capacity W on S. W is defined on the collection of subsets of S with W(Ø) = 0, W(S) = 1, and A ⊃ B ⇒ W(A) ≥ W(B). ≽ is represented by f ↦ ∫S U(f(s)) dW(s), the CEU of f, defined next. Assume that f = (E1: x1; . . . ; En: xn). The integral is the sum Σj πj U(xj), where the πj s are defined as follows. Take a permutation ρ on {1, . . . , n} such that xρ(1) ≥ · · · ≥ xρ(n). πρ(j) is W(Eρ(1) ∪ · · · ∪ Eρ(j)) − W(Eρ(1) ∪ · · · ∪ Eρ(j−1)); in particular, πρ(1) = W(Eρ(1)).
An important concept in CEU, introduced by Schmeidler (1989), is comonotonicity. Two acts f and g are comonotonic if f(s) > f(t) and g(s) < g(t) for no states s, t. A set of acts is comonotonic if every pair of its elements is comonotonic. Comonotonicity is an important concept because, as can be proved, within any comonotonic subset of F the CEU functional is an SEU functional (with numbers such as the above πρ(j) playing the role of probabilities). It is, therefore, obvious that a necessary requirement for CEU is that all conditions of SEU hold within comonotonic subsets. Such restrictions are indicated by the prefix comonotonic, leading to the comonotonic sure-thing principle, etc. It is more complex to demonstrate that these comonotonic restrictions are also sufficient to imply CEU, but this can be proved in many circumstances. The third column of Table 2.1 gives the axiomatizations of CEU.
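The decision weights πρ(j) just defined can be computed directly; the following sketch evaluates acts under a hypothetical capacity on a two-state space:

```python
# A sketch of the Choquet expected utility computation defined above,
# for a hypothetical convex capacity W on subsets of S = {a, b}.

def ceu(act, W, U):
    # act: dict state -> outcome; W: capacity on frozensets of states
    states = sorted(act, key=lambda s: U(act[s]), reverse=True)  # the ranking rho
    value, prev = 0.0, frozenset()
    for s in states:
        cur = prev | {s}
        pi = W[cur] - W[prev]     # decision weight pi of this outcome
        value += pi * U(act[s])
        prev = cur
    return value

# Hypothetical convex capacity: W({a}) + W({b}) < W({a, b}).
W = {frozenset(): 0.0,
     frozenset({"a"}): 0.3,
     frozenset({"b"}): 0.3,
     frozenset({"a", "b"}): 1.0}

U = lambda x: x
print(ceu({"a": 10, "b": 0}, W, U))  # 0.3*10 + 0.7*0 = 3.0
print(ceu({"a": 0, "b": 10}, W, U))  # 3.0: the worse outcome carries weight 0.7
```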
Prospect theory. Original prospect theory, introduced by Kahneman and Tversky (1979), assumed nonlinear probability weighting but had theoretical problems, and was defined only for risk, not for uncertainty. Only when Schmeidler (1989) introduced a sound model for nonlinear probabilities could a model of prospect theory be developed that is theoretically sound and that also deals with uncertainty (Tversky and Kahneman, 1992). We define it next. Under prospect theory, one outcome, called the reference outcome, plays a special role. Outcomes preferred to the reference outcome are gains, outcomes preferred less than the reference outcome are losses. The main deviation from other theories is that in different decision situations the decision maker may choose different reference points, and remodel her decisions accordingly. Although there is much empirical evidence for such procedures, formal theories to describe them have not yet been developed. We will therefore restrict attention, in this theoretical chapter, to one fixed reference point. For results on varying reference points, see Schmidt (2003).
With a fixed reference point, prospect theory generalizes CEU and SEU in that it allows for a different capacity, W−, for losses than for gains, where the gain capacity is denoted as W+. Under prospect theory we define, for an act f, f+ by replacing all losses of f by the reference outcome, and f− by replacing all gains of f by the reference outcome. Our notation f− deviates from mathematical conventions that, for real-valued functions f, take f− as a positive function, being our function f− multiplied by −1. For general outcomes, however, such a multiplication cannot be defined, which explains our definition. The prospect theory (PT) value of an act f is
PT(f) = CEU(f+) + CEU(f−),
where CEU(f+) is with respect to W+ and CEU(f−) is with respect to the dual of W−, assigning 1 − W−(Ac) to each event A (Ac denotes complement).
Two acts f, g are sign-comonotonic if they are comonotonic and, further, there is no state s such that one of f(s), g(s) is a gain and the other a loss. A set of acts is sign-comonotonic if any pair of its elements is sign-comonotonic. Sign-comonotonicity plays the same role for PT as comonotonicity for CEU. Within any sign-comonotonic set, PT agrees with SEU and, therefore, all conditions of SEU are satisfied within sign-comonotonic sets. A more difficult result, which can be proved in several situations, is that PT holds as soon as the sign-comonotonic conditions of SEU hold, that is, the restrictions of these conditions to sign-comonotonic subsets of acts. Axiomatizations of PT are given in the fourth column of Table 2.1.
Properties of utility and capacities under CEU and PT. Specific properties of utilities and capacities have been characterized for CEU and PT alike. Schmeidler (1989) demonstrated, in his CEU model with linear utility, that the capacity is convex (W(A ∪ B) + W(A ∩ B) ≥ W(A) + W(B)) if and only if preferences are convex. Chateauneuf and Tallon (2002) generalized this result by showing that, under differentiability assumptions, preferences are convex if and only if both utility is concave and the capacity W is convex. Wakker (2001) gave necessary and sufficient conditions for convexity of the capacity, without restricting the form of utility other than being continuous. Tversky and Wakker (1995) characterized a number of other conditions on capacities, such as bounded subadditivity, that are often found in experimental tests of prospect theory.
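A sketch of the functional PT(f) = CEU(f+) + CEU(f−) just defined, with the loss part evaluated under the dual of W−; the capacities, the reference outcome (zero), and the utility with loss aversion are hypothetical choices, and the Choquet computation is restated from the previous sketch:

```python
# A sketch of the prospect theory functional described above.

def choquet(act, W, U):
    states = sorted(act, key=lambda s: U(act[s]), reverse=True)
    value, prev = 0.0, frozenset()
    for s in states:
        cur = prev | {s}
        value += (W(cur) - W(prev)) * U(act[s])
        prev = cur
    return value

def pt(act, W_plus, W_minus, U, S):
    f_plus = {s: max(act[s], 0) for s in S}    # losses replaced by the reference outcome 0
    f_minus = {s: min(act[s], 0) for s in S}   # gains replaced by the reference outcome 0
    dual = lambda E: 1.0 - W_minus(frozenset(S) - E)   # dual of W-
    return choquet(f_plus, W_plus, U) + choquet(f_minus, dual, U)

S = ["a", "b"]
W_plus = lambda E: {0: 0.0, 1: 0.3, 2: 1.0}[len(E)]    # hypothetical gain capacity
W_minus = lambda E: {0: 0.0, 1: 0.4, 2: 1.0}[len(E)]   # hypothetical loss capacity
U = lambda x: x if x >= 0 else 2.0 * x                 # loss aversion, for illustration

print(pt({"a": 10, "b": -10}, W_plus, W_minus, U, S))  # 3.0 + (-8.0) = -5.0
```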
Multiple priors. Another popular deviation from expected utility is the multiple priors model. As in SEU, it assumes a utility function U over outcomes. It deviates by not considering one fixed probability measure, but a set of probability measures. Say C is such a set of probability measures over S. Then an act f is evaluated by minP∈C SEUP(f), where SEUP is taken with respect to P. This defines the multiple priors model. It was first characterized by Gilboa and Schmeidler (1989) in an Anscombe–Aumann setup where outcomes designate probability distributions over prizes, evaluated through a linear utility function (an expected utility functional). In a comprehensive paper, Chateauneuf (1991) obtained the same characterization independently, also for linear utility, but with linearity relating to monetary outcomes. For two-outcome acts, the multiple priors model coincides with CEU, so that the common generalization of these two models that imposes the representation only on two-outcome acts can serve as a good starting point (Ghirardato and Marinacci, 2001).
The axiomatization of the multiple priors model requires convexity of preference, implying that a representing functional is quasi-concave. Mainly, independence with respect to constant acts (f ≽ g ⇒ λf + (1 − λ)c ≽ λg + (1 − λ)c for acts f, g and constant acts c), used both in the Gilboa–Schmeidler approach and in Chateauneuf’s approach, ensures that the representing functional is even concave. A functional is concave if and only if it is the minimum of dominating linear functionals, which, under appropriate monotonicity, must be expected utility functionals. Thus, the multiple priors model results. The axiomatization of multiple priors for continuous instead of linear utility has been obtained by Casadesus-Masanell et al. (2000), who used both bisymmetry-like and tradeoff-consistency-like axioms, and by Ghirardato et al. (2003a), who used bisymmetry-like axioms to define an endogenous mixture operation. A less conservative extension of the multiple priors model is the α-Hurwicz criterion, where acts are evaluated by α times the minimal SEU plus 1 − α times the maximal SEU over C. It was axiomatized by Ghirardato et al. (2003b).
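A sketch of the multiple priors evaluation and of the α-Hurwicz extension just described, with a hypothetical set of priors and α:

```python
# A sketch of the multiple priors (min over the set C) evaluation and
# the alpha-Hurwicz criterion; the priors and alpha are hypothetical.

def eu(act_utils, prior):
    return sum(p * u for p, u in zip(prior, act_utils))

def multiple_priors(act_utils, priors):
    return min(eu(act_utils, p) for p in priors)

def alpha_hurwicz(act_utils, priors, alpha):
    eus = [eu(act_utils, p) for p in priors]
    return alpha * min(eus) + (1 - alpha) * max(eus)

priors = [(0.5, 0.5), (0.3, 0.7)]        # the set C of priors
f = (10, 0)                               # a utility profile over two states

print(multiple_priors(f, priors))         # 3.0: the worst-case expected utility
print(alpha_hurwicz(f, priors, 0.8))      # 0.8*3.0 + 0.2*5.0 = 3.4
```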
Probabilistic sophistication. We finally discuss probabilistic sophistication. The derivation of SEU can be divided into two steps. In the first step, uncertainty is quantified through probabilities and the only relevant aspect for the preference value of an act is the probability distribution that it generates over the outcomes. In the second step, the probability distribution over outcomes is evaluated through expected utility. Probabilistic sophistication refers to the first of these steps without imposing expected utility in the second step. A first characterization was given by Machina and Schmeidler (1992), with an appealing generalization in Epstein and Le Breton (1993). The main axiom is de Finetti’s (1949) additivity: if you would rather bet on A than on B, then you would also rather bet on A ∪ D than on B ∪ D for any event D disjoint from A and B. Under appropriate richness of the event space, this axiom implies that there exists a probability measure P on the events such that you would rather bet on A than on B if and only if P(A) ≥ P(B) (for a review, see Fishburn, 1986). Additional assumptions then guarantee that two different acts that generate the same probability distribution over outcomes are equivalent, which implies probabilistic sophistication.

2.5. Conclusion
For all models discussed, axiomatizations provided a crucial step in the beginning of their developments, when it was not entirely clear what the right subjective
parameters and their quantitative rules of combination were. It is remarkable that prospect theory could be modeled in a sound way only after Schmeidler (1989) had developed the first axiomatization of decision under uncertainty with nonlinear decision weights.
References Allais, Maurice (1953), “Fondements d’une Théorie Positive des Choix Comportant un Risque et Critique des Postulats et Axiomes de l’Ecole Américaine,” Colloques Internationaux du Centre National de la Recherche Scientifique 40, Econométrie, 257–332. Paris: Centre National de la Recherche Scientifique. Translated into English, with additions, as “The Foundations of a Positive Theory of Choice Involving Risk and a Criticism of the Postulates and Axioms of the American School,” in Maurice Allais and Ole Hagen (1979, eds), Expected Utility Hypotheses and the Allais Paradox, 27–145, Reidel, Dordrecht, The Netherlands. Anscombe, F. J. and Robert J. Aumann (1963), “A Definition of Subjective Probability,” Annals of Mathematical Statistics 34, 199–205. Arrow, Kenneth J. (1951), “Alternative Approaches to the Theory of Choice in Risk-Taking Situations,” Econometrica 19, 404–437. Bewley, Truman F. (1986), “Knightian Decision Theory Part I,” Cowles Foundation Discussion Paper No. 807. Blackwell, David and M. A. Girshick (1954), “Theory of Games and Statistical Decisions.” Wiley, New York. Burks, Arthur W. (1977), “Chance, Cause, Reason (An Inquiry into the Nature of Scientific Evidence).” The University of Chicago Press, Chicago. Casadesus-Masanell, Ramon, Peter Klibanoff, and Emre Ozdenoren (2000), “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory 92, 35–65. Chateauneuf, Alain (1991), “On the Use of Capacities in Modeling Uncertainty Aversion and Risk Aversion,” Journal of Mathematical Economics 20, 343–369. Chateauneuf, Alain and Jean-Marc Tallon (2002), “Diversification, Convex Preferences and Non-Empty Core,” Economic Theory, 19, 509–523. Chew, Soo Hong (1983), “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica 51, 1065–1092. Chew, Soo Hong (1989), “The Rank-Dependent Quasilinear Mean,” Unpublished manuscript, Department of Economics, University of California, Irvine, USA. Chew, Soo Hong and Edi Karni (1994), “Choquet Expected Utility with a Finite State Space: Commutativity and Act-Independence,” Journal of Economic Theory 62, 469–479. Chew, Soo Hong, Larry G. Epstein, and Uzi Segal (1991), “Mixture Symmetric and Quadratic Utility,” Econometrica 59, 139–163. de Finetti, Bruno (1931), “Sul Significato Soggettivo della Probabilità,” Fundamenta Mathematicae 17, 298–329. Translated into English as “On the Subjective Meaning of Probability,” in Paola Monari and Daniela Cocchi (eds, 1993) “Probabilità e Induzione,” Clueb, Bologna, 291–321. de Finetti, Bruno (1937), “La Prévision: Ses Lois Logiques, ses Sources Subjectives,” Annales de l’Institut Henri Poincaré 7, 1–68. Translated into English by Henry E. Kyburg Jr., “Foresight: Its Logical Laws, its Subjective Sources,” in Henry E. Kyburg Jr.
and Howard E. Smokler (1964, eds), Studies in Subjective Probability, Wiley, New York; 2nd edition 1980, Krieger, New York.
de Finetti, Bruno (1949), “La ‘Logica del Plausible’ Secondo la Concezione di Pòlya,” Atti della XLII Riunione della Società Italiana per il Progresso delle Scienze, 227–236.
Dekel, Eddie (1986), “An Axiomatic Characterization of Preferences under Uncertainty: Weakening the Independence Axiom,” Journal of Economic Theory 40, 304–318.
Dubra, Juan, Fabio Maccheroni, and Efe A. Ok (2004), “Expected Utility without the Completeness Axiom,” Journal of Economic Theory, 115, 118–133.
Edwards, Ward (1955), “The Prediction of Decisions Among Bets,” Journal of Experimental Psychology 50, 201–214.
Epstein, Larry G. (1992), “Behavior under Risk: Recent Developments in Theory and Applications.” In Jean-Jacques Laffont (ed.), Advances in Economic Theory II, 1–63, Cambridge University Press, Cambridge, UK.
Epstein, Larry G. and Michel Le Breton (1993), “Dynamically Consistent Beliefs Must Be Bayesian,” Journal of Economic Theory 61, 1–22.
Fishburn, Peter C. (1978), “On Handa’s ‘New Theory of Cardinal Utility’ and the Maximization of Expected Return,” Journal of Political Economy 86, 321–324.
Fishburn, Peter C. (1982), “Nontransitive Measurable Utility,” Journal of Mathematical Psychology 26, 31–67.
Fishburn, Peter C. (1986), “The Axioms of Subjective Probability,” Statistical Science 1, 335–358.
Fishburn, Peter C. and Irving H. LaValle (1993), “On Matrix Probabilities in Nonarchimedean Decision Theory,” Journal of Risk and Uncertainty 7, 283–299.
Ghirardato, Paolo and Massimo Marinacci (2001), “Risk, Ambiguity, and the Separation of Utility and Beliefs,” Mathematics of Operations Research 26, 864–890.
Ghirardato, Paolo, Fabio Maccheroni, Massimo Marinacci, and Marciano Siniscalchi (2003a), “A Subjective Spin on Roulette Wheels,” Econometrica, 71, 1897–1908.
Ghirardato, Paolo, Fabio Maccheroni, and Massimo Marinacci (2003b), “Differentiating Ambiguity and Ambiguity Attitude,” Economic Dept, University of Torino.
Gilboa, Itzhak (1987), “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics 16, 65–88.
Gilboa, Itzhak and David Schmeidler (1989), “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Gilboa, Itzhak and David Schmeidler (2001), lecture at 22nd Linz Seminar on Fuzzy Set Theory, Linz, Austria.
Grodal, Birgit (1978), “Some Further Results on Integral Representation of Utility Functions,” Institute of Economics, University of Copenhagen, Copenhagen. Appeared in Vind, Karl (2003), “Independence, Additivity, Uncertainty.” With contributions by B. Grodal. Springer, Berlin.
Gul, Faruk (1992), “Savage’s Theorem with a Finite Number of States,” Journal of Economic Theory 57, 99–110. (“Erratum,” 1993, Journal of Economic Theory 61, 184.)
Hammond, Peter J. (1988), “Consequentialist Foundations for Expected Utility,” Theory and Decision 25, 25–78.
Hazen, Gordon B. (1987), “Subjectively Weighted Linear Utility,” Theory and Decision 23, 261–282.
Kahneman, Daniel and Amos Tversky (1979), “Prospect Theory: An Analysis of Decision under Risk,” Econometrica 47, 263–291.
Karni, Edi (1985), “Decision-Making under Uncertainty: The Case of State-Dependent Preferences.” Harvard University Press, Cambridge, MA. Krantz, David H., R. Duncan Luce, Patrick Suppes, and Amos Tversky (1971), “Foundations of Measurement, Vol. I. (Additive and Polynomial Representations).” Academic Press, New York. Loomes, Graham and Robert Sugden (1982), “Regret Theory: An Alternative Theory of Rational Choice under Uncertainty,” Economic Journal 92, 805–824. Luce, R. Duncan (2000), “Utility of Gains and Losses: Measurement-Theoretical and Experimental Approaches.” Lawrence Erlbaum Publishers, London. Machina, Mark J. and David Schmeidler (1992), “A More Robust Definition of Subjective Probability,” Econometrica 60, 745–780. Nakamura, Yutaka (1990), “Subjective Expected Utility with Non-Additive Probabilities on Finite State Spaces,” Journal of Economic Theory 51, 346–366. Pfanzagl, Johann (1959), “A General Theory of Measurement—Applications to Utility,” Naval Research Logistics Quarterly 6, 283–294. Pfanzagl, Johann (1968), “Theory of Measurement.” Physica-Verlag, Vienna. Quiggin, John (1982), “A Theory of Anticipated Utility,” Journal of Economic Behaviour and Organization 3, 323–343. Raiffa, Howard (1968), “Decision Analysis.” Addison-Wesley, London. Ramsey, Frank P. (1931), “Truth and Probability.” In “The Foundations of Mathematics and other Logical Essays,” 156–198, Routledge and Kegan Paul, London. Reprinted in Henry E. Kyburg Jr. and Howard E. Smokler (1964, eds), Studies in Subjective Probability, 61–92, Wiley, New York. (2nd edition 1980, Krieger, New York.) Sarin, Rakesh K. and Peter P. Wakker (1992), “A Simple Axiomatization of Nonadditive Expected Utility,” Econometrica 60, 1255–1272. (Reprinted as Chapter 7 in this volume.) Sarin, Rakesh K. and Peter P. Wakker (1994), “Gains and Losses in Nonadditive Expected Utility.” In Mark J. Machina and Bertrand R. Munier (eds), Models and Experiments on Risk and Rationality, Kluwer Academic Publishers, Dordrecht, The Netherlands, 157–172. Sarin, Rakesh K. and Peter P. Wakker (1997), “A Single-Stage Approach to Anscombe and Aumann’s Expected Utility,” Review of Economic Studies 64, 399–409. Sarin, Rakesh K. and Peter P. Wakker (1998), “Dynamic Choice and Nonexpected Utility,” Journal of Risk and Uncertainty 17, 87–119. Savage, Leonard J. (1954), “The Foundations of Statistics.” Wiley, New York. (2nd edition 1972, Dover, New York.) Schmeidler, David (1989), “Subjective Probability and Expected Utility without Additivity,” Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.) Schmidt, Ulrich (2003), “Reference Dependence in Cumulative Prospect Theory,” Journal of Mathematical Psychology 47, 122–131. Shapiro, Leonard (1979), “Necessary and Sufficient Conditions for Expected Utility Maximizations: The Finite Case, with a Partial Order,” Annals of Statistics 7, 1288–1302. Tversky, Amos and Daniel Kahneman (1992), “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty 5, 297–323. Tversky, Amos and Peter P. Wakker (1995), “Risk Attitudes and Decision Weights,” Econometrica 63, 1255–1280. Vind, Karl (2003), “Independence, Additivity, Uncertainty.” With contributions by B. Grodal. Springer, Berlin. Wakker, Peter P. (1984), “Cardinal Coordinate Independence for Expected Utility,” Journal of Mathematical Psychology 28, 110–117.
Wakker, Peter P. (1989), "Additive Representations of Preferences, A New Foundation of Decision Analysis." Kluwer Academic Publishers, Dordrecht, The Netherlands.
Wakker, Peter P. (1993), "Unbounded Utility for Savage's 'Foundations of Statistics,' and other Models," Mathematics of Operations Research 18, 446–485.
Wakker, Peter P. (2001), "Testing and Characterizing Properties of Nonadditive Measures through Violations of the Sure-Thing Principle," Econometrica 69, 1039–1059.
Yaari, Menahem E. (1987), "The Dual Theory of Choice under Risk," Econometrica 55, 95–115.
3
Defining ambiguity and ambiguity attitude
Paolo Ghirardato
According to the well-known distinction attributed to Knight (1921), there are two kinds of uncertainty. The first, called “risk,” corresponds to situations in which all events relevant to decision making are associated with obvious probability assignments (which every decision maker agrees to). The second, called “(Knightian) uncertainty” or (following Ellsberg (1961)) “ambiguity,” corresponds to situations in which some events do not have an obvious, unanimously agreeable, probability assignment. As Chapter 1 makes clear, this collection focuses on the issues related to decision making under ambiguity. In this chapter, I briefly discuss the issue of the formal definition of ambiguity and ambiguity attitude. In his seminal paper on the Choquet expected utility (CEU) model David Schmeidler (1989) proposed a behavioral definition of ambiguity aversion, showing that it is represented mathematically by the convexity of the decision maker’s capacity v. The property he proposed can be understood by means of the example of the two coins used in Chapter 1. Assume that the decision maker places bets that depend on the result of two coin flips, the first of a coin that she is very familiar with, the second of a coin provided by somebody else. Given that she is not familiar with the second coin, it is possible that she would consider “ambiguous” all the bets whose payoff depends on the result of the second flip. (For instance, a bet that pays $1 if the second coin lands with heads up, or equivalently if the event {HH, TH} obtains.) If she is averse to ambiguity, she may therefore see such bets as somewhat less desirable than bets that are “unambiguous,” that is, only depend on the result of the first flip. (For instance, a bet that pays $1 if the first coin lands with heads up, or equivalently if the event {HH, HT} obtains.) However, suppose that we give the decision maker the possibility of buying shares of each bet. Then, if she is offered a bet that pays $0.50 on {HH} and $0.50 on {HT}, she may prefer it to either of the two bets that pay $1 contingently on {HH} or on {HT}, which are ambiguous. In fact, such a bet has the same contingent payoffs as a bet which pays $0.50 if the first coin lands with heads up, which is unambiguous. That is, a decision maker who is averse to ambiguity may prefer the equal-probability “mixture” of two ambiguous acts to either of the acts. In contrast, a decision maker who is attracted to ambiguity may prefer to choose one of the ambiguous acts.
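This hedging logic can be made concrete with a toy calculation. The sketch below is ours, not the chapter's: it models the unfamiliar coin by a set of candidate biases (the particular numbers are illustrative assumptions) and scores each bet by its worst-case expected payoff, in the spirit of the maxmin criterion discussed later in this chapter.

```python
# Candidate biases for the unfamiliar (second) coin; the familiar coin is fair.
# The values 0.3 and 0.7 are illustrative assumptions, not from the text.
biases = [0.3, 0.7]

def worst_case_value(payoff):
    """Worst-case expected payoff over the candidate biases.
    payoff maps a state (first flip, second flip) to dollars."""
    values = []
    for q in biases:
        prob = {"HH": 0.5 * q, "HT": 0.5 * (1 - q),
                "TH": 0.5 * q, "TT": 0.5 * (1 - q)}
        values.append(sum(prob[s] * payoff[s] for s in prob))
    return min(values)

bet_HH = {"HH": 1.0, "HT": 0.0, "TH": 0.0, "TT": 0.0}
bet_HT = {"HH": 0.0, "HT": 1.0, "TH": 0.0, "TT": 0.0}
mixture = {s: 0.5 * (bet_HH[s] + bet_HT[s]) for s in bet_HH}

print(worst_case_value(bet_HH), worst_case_value(bet_HT))   # 0.15 0.15
print(worst_case_value(mixture))                            # 0.25: mixing hedges
```

The mixture pays $0.50 whenever the first coin lands heads up, so its value no longer depends on the unknown bias; this is exactly the preference for mixtures that the formal definition below captures.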
Formally, Schmeidler called ambiguity averse a decision maker who prefers the even mixture1 (1/2)f + (1/2)g of two acts that she finds indifferent to either of the two acts. That is, (1/2)f + (1/2)g ≽ f for all f and g such that f ∼ g. As recalled earlier, if the decision maker has CEU preferences, this property implies that her capacity v is convex. If, instead, she has maxmin expected utility (MMEU) preferences, then she satisfies this property automatically (indeed, it is one of the axioms that characterize the model).
While this is certainly a compelling definition, it does not seem to be fully satisfactory as a definition of ambiguity aversion. First of all, it explicitly relies on the availability of mixtures of acts, and thus apparently on the existence of objective randomizing devices. This is not a serious problem, for it has been shown by Ghirardato et al. (2001) that mixtures can be defined without invoking randomizing devices, provided the set of prizes is rich and preferences satisfy some mild restrictions. (Moreover, Casadesus-Masanell et al. (2000) show that Schmeidler's definition can be formulated in a Savage setting which does not explicitly involve mixtures.) Second—and more important—Schmeidler's definition is not satisfied by preferences that do seem to embody ambiguity aversion, as illustrated by the following example.
Example 3.1. Consider again the decision maker facing the set S = {HH, HT, TH, TT} of results of flips of a familiar and an unfamiliar coin. Suppose that she has CEU preferences represented by a capacity v on S which:
• assigns 1/8 to each singleton state, that is, v({HH}) = v({HT}) = v({TH}) = v({TT}) = 1/8;
• assigns 1/2 to the results of the familiar coin flip, that is, v({HH, HT}) = v({TH, TT}) = 1/2;
• assigns 9/16 to any 3-state event (like {HH, HT, TH}) and 1 to the whole state space;
• assigns the sum of the weights of its (singleton) elements to each other event.
Such a preference embodies a dislike of ambiguity: The decision maker prefers to bet on the familiar coin rather than on the unfamiliar one (notice that v({HH, TH}) = 1/4 < 1/2 = v({HH, HT})). However, the capacity v is not convex,2 so that she is not ambiguity averse according to Schmeidler’s definition.
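A quick computational check of Example 3.1 may help. The encoding below is a hypothetical sketch of ours (events as frozensets of states); it verifies both claims: v fails convexity, yet the uniform probability dominates v setwise, a fact used in the next section.

```python
from itertools import chain, combinations

S = ["HH", "HT", "TH", "TT"]

def events(states):
    return [frozenset(e) for e in
            chain.from_iterable(combinations(states, r)
                                for r in range(len(states) + 1))]

def v(event):
    """The capacity of Example 3.1."""
    e = frozenset(event)
    if len(e) == 0: return 0.0
    if len(e) == 4: return 1.0
    if len(e) == 3: return 9/16
    if e in (frozenset({"HH", "HT"}), frozenset({"TH", "TT"})):
        return 1/2                    # bets on the familiar coin
    return len(e) / 8                 # sum of singleton weights

EV = events(S)
print(all(v(A | B) + v(A & B) >= v(A) + v(B) - 1e-12 for A in EV for B in EV))
# False: convexity fails, e.g. for A = {HH, HT} and B = {TH} (see note 2).

print(all(len(A) / 4 >= v(A) for A in EV))
# True: the uniform probability setwise dominates v, so its core is nonempty.
```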
3.1. Comparative foundations to ambiguity aversion

Motivated by these problems with Schmeidler's definition, Epstein (1999) tried a different approach to defining aversion to ambiguity, inspired by Yaari's (1969) general definition of risk aversion for non-expected utility preferences.
He suggested using a two-stage approach, first defining a notion of comparative ambiguity aversion, and then calling averse to ambiguity any preference which is more averse than (what we establish to be) an ambiguity neutral preference. Ghirardato and Marinacci (2002, GM) followed his example, employing a different comparative notion and a different definition of ambiguity neutrality. For reasons that will become clear presently, I shall discuss these contributions in inverse chronological order.
Ghirardato and Marinacci start from the observation that preferences that obey classical Expected Utility Theory (EUT) are intuitively ambiguity neutral, and propose using such preferences as the benchmark to measure ambiguity aversion. As to the comparative ambiguity aversion notion, they suggest calling a preference ≽2 more ambiguity averse than a preference ≽1 if both preferences are represented by the same utility function3 and, given any constant act x and any act f, whenever the first preference favors the (certainly unambiguous) constant x over the (possibly ambiguous) f, the second does the same; that is,

x ≽1 (≻1) f  ⟹  x ≽2 (≻2) f.   (3.1)
Thus, a preference is ambiguity averse if it is more averse to ambiguity than some EUT preference. GM show that every MMEU preference is averse to ambiguity in this sense (while "maximax EU" preferences are ambiguity seeking). In contrast, a CEU preference is ambiguity averse if and only if its capacity v has a nonempty core, a strictly weaker property than convexity. Therefore, GM conclude that Schmeidler's definition captures strictly more than aversion to ambiguity. (Notice that the capacity v in Example 3.1 does have a nonempty core; the uniform probability on S is in Core(v).)
This definition is simple and it has intuitive characterizations,4 but it can be criticized in an important respect. It does not distinguish between those departures from EUT which are unrelated to ambiguity (like the celebrated "Allais paradox")—in the terminology of Chapter 1, the violations of the third tenet of Bayesianism—and those which are. Every departure from the EUT benchmark is attributed to the presence of ambiguity. To see why this may be an issue, consider the following example.
Example 3.2. Using again the two-coin example, consider a decision maker with CEU (indeed, RDEU) preferences and the capacity v′ defined by: v′(S) = 1 and v′(A) = P(A)/2 for A ≠ S, where P is the uniform probability on the state space S. At first blush, we may invoke aversion to ambiguity (recall that the second coin is the unfamiliar one) to explain the fact that v′({TH, HH}) = v′({TT, HT}) = 1/4. However, we also see that v′({HH, HT}) = v′({TH, TT}) = 1/4; that is, the decision maker is similarly unwilling to bet on the familiar, unambiguous coin. What we are observing is a dislike of uncertainty which is more general than just aversion to ambiguity: The decision maker treats even events with "known" probability 1/2 as if they really had probability 1/4. This is a trait usually called probabilistic risk aversion; the decision maker appears in fact to be neutral to
the ambiguity in this problem. However, the capacity v′ is convex, so that both Schmeidler and GM would classify this decision maker as ambiguity averse.
Epstein (1999) offers a definition that avoids this problem, carefully distinguishing between "risk-based" behavioral traits and "ambiguity-based" ones. The key idea is to use a set A of events which are exogenously known to be considered unambiguous by every decision maker, like the results of the flips of the familiar coin in the example stated earlier. Acts which only depend on the events in A are called unambiguous. The comparative definition is then modified as follows: say that preference ≽2 is more ambiguity averse than preference ≽1 if for any act f and any unambiguous act h, we have

h ≽1 (≻1) f  ⟹  h ≽2 (≻2) f.   (3.2)
Notice that this definition is strictly stronger than GM's, as constant acts are unambiguous, while in general (i.e. for nontrivial A) there will be unambiguous acts which are not constant. As long as the set A (and hence the set of unambiguous acts) is sufficiently rich, Eq. (3.2) implies that the two preferences have identical utility functions as well as identical probabilistic risk aversion. For instance, the CEU decision maker with capacity v in Example 3.1 cannot be compared to the one with capacity v′ in Example 3.2; their willingness to bet on the unambiguous results of the flips of the familiar coin is different. A CEU preference comparable to that with capacity v′ must also "transform" an objective probability of 1/2 into a 1/4.
The choice of the benchmark with respect to which ambiguity aversion is to be measured is made consistently with this modified comparative notion. EUT preferences are probabilistic risk neutral, and do not "transform" the probabilities of unambiguous events, so they cannot be compared to preferences like the CEU preference with capacity v′. Epstein uses preferences which satisfy Machina and Schmeidler's (1992) probabilistic sophistication model, which allows nonexpected utility preferences as long as their ranking of bets on events can be represented by a probability.5 He calls a decision maker ambiguity averse if his preference is more averse to ambiguity than a probabilistically sophisticated preference. His characterization results are not as clear-cut as those in GM: While basically every MMEU preference is ambiguity averse, the characterization of CEU preferences is less straightforward. Epstein does provide a full characterization for those CEU preferences that satisfy a certain smoothness condition, which he calls "eventwise differentiability." I refer the reader to his chapter for details.
Epstein's definition of ambiguity aversion is limited by the requirement of a rich set A of exogenously unambiguous events. Suppose that we observe a decision maker who has CEU preferences with capacity v′ as in Example 3.2, but we do not know what the decision maker knows about these two coins. Can we conclude that he is ambiguity neutral and probabilistic risk averse? If both coins were unfamiliar, his capacity would instead reflect ambiguity aversion—for all we know, he may even have EUT preferences (i.e. be probabilistic risk neutral) when betting on familiar coins. The problem is that in this case the set A is just the trivial {Ø, S},
too poor to enable us to distinguish between "pure" ambiguity aversion and probabilistic risk aversion. (As a consequence, the observation that the capacity v′ is convex yet induces behavior that is not intuitively ambiguity averse may be in need of reconsideration.) We reach the conclusion that a theory of "pure" ambiguity aversion (as opposed to what is measured by GM) must be founded on an endogenous theory of ambiguity, if it is to be generally valid. This is what Epstein next turned his attention to; it is discussed in the next subsection.
Before closing this discussion of the comparative foundation to ambiguity aversion, I remark that, while Epstein's (1999) chapter is the earliest to use a comparative approach to provide an absolute notion of ambiguity aversion, others discussed comparative ambiguity aversion much earlier. Tversky and Wakker (1995) present and characterize some different comparative notions related to ambiguity and probabilistic risk aversion. Kelsey and Nandeibam (1996) propose a comparative notion similar to GM's, implicitly assuming the equality of utilities, and show its characterization for CEU and MMEU preferences.
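Before moving on, the probabilistic risk aversion of Example 3.2 can also be checked numerically. The following sketch (our own encoding, with ad hoc names) shows that v′ treats bets on the familiar and the unfamiliar coin identically, and that v′ is nevertheless convex.

```python
S = ["HH", "HT", "TH", "TT"]

def v_prime(event):
    """The capacity v' of Example 3.2: v'(S) = 1, v'(A) = P(A)/2 otherwise,
    with P the uniform probability on S."""
    e = frozenset(event)
    return 1.0 if e == frozenset(S) else (len(e) / 4) / 2

print(v_prime({"HH", "HT"}), v_prime({"HH", "TH"}))   # 0.25 0.25
# The familiar-coin bet and the unfamiliar-coin bet get the same weight 1/4.

EV = events(S)   # reusing the helper from the Example 3.1 sketch
print(all(v_prime(A | B) + v_prime(A & B) >= v_prime(A) + v_prime(B) - 1e-12
          for A in EV for B in EV))                    # True: v' is convex
```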
3.2. What is ambiguity?

As observed earlier, the quest for a distinction between ambiguity aversion and behavioral traits unrelated to the presence of ambiguity was a driving force behind the more recent attempts (like Epstein and Zhang (2001)) at understanding the behavioral consequences of the presence of ambiguity. However, there have been others who have addressed the definition of ambiguity.
Fishburn (1993) considers a primitive ambiguity relation over events, and discusses its properties and representation by an ambiguity measure. Nehring (1999) defines an event A unambiguous for a MMEU preference with set of priors C if P(A) = P′(A) for every P, P′ ∈ C. As to CEU preferences, Nehring recalls that any capacity v on a finite state space S = {s1, s2, . . . , sn} can be canonically associated with the set Cv of the probabilities Pσ defined as follows. Let σ denote a permutation of the indices {1, . . . , n}, and define6

Pσ(sσ(i)) = v({sσ(1), sσ(2), . . . , sσ(i)}) − v({sσ(1), sσ(2), . . . , sσ(i−1)}).

Using this fact allows him to define ambiguity of events analogously to the MMEU case, with Cv in place of C. In both cases, an event is unambiguous if it is given identical weight in the evaluation of any act. Nehring shows that while for MMEU preferences the set of unambiguous events is a λ-system (a class closed with respect to complements and disjoint unions), for CEU preferences it is an algebra (i.e. it is also closed with respect to intersections). As there are situations in which the set of unambiguous events is not an algebra, this suggests that CEU preferences cannot be used to model all decision problems under ambiguity.7
A notion of ambiguity for events that holds for a wider class of preferences was introduced in Zhang (2002). Loosely put, Zhang calls unambiguous an event A such that Savage's sure-thing principle holds for acts separated on the partition {A, Ac}. He then shows that the set of such events is a λ-system, and that for
a subset of CEU preferences (those which induce an exact v; details are found in GM) it has a simple representation in terms of the capacity v: It is the set of the A's such that v(A) + v(Ac) = 1.
Zhang's definition of unambiguous event was later modified in Epstein and Zhang (2001, EZ), the announced attempt to endogenize the class of unambiguous events used in Epstein's definition of ambiguity aversion. The idea of EZ's definition is similar to Zhang's (2002), though it yields a larger collection of unambiguous events. Axioms on the decision maker's preferences are introduced, which guarantee that the resulting collection of events is a λ-system and that the preferences over the sets of unambiguous acts (those which are measurable with respect to unambiguous events) are probabilistically sophisticated in the sense of Machina and Schmeidler (1992). This yields an interesting extension of Machina and Schmeidler's and Savage's models, wherein the set of events on which the decision maker satisfies the first and second tenets of Bayesianism is determined endogenously.8 However, it does not fully solve the problem of screening "risk-based" behavioral traits. In fact, if a preference is probabilistically sophisticated then every event is unambiguous in the EZ sense. It follows that the decision maker with CEU preferences and capacity v′ in Example 3.2 (who, recall, is probabilistically sophisticated) considers every event unambiguous and is probabilistic risk averse. This is regardless of the information that is available to her; it does not matter whether she is betting on familiar or unfamiliar coins. The problem is that EZ's definition does not distinguish between the events that are really unambiguous and those which only appear to be.
It seems likely that such a distinction could only be assessed by enriching the decision framework; that is, by allowing the theorist to observe more than just the decision maker's preferences over acts. Going back to the two-coin flip example, regardless of what a decision maker thinks about the unfamiliar coin, she may believe that the event that it lands heads up on a single flip is more likely than the event that it lands heads up twice in a row. That is, she may hold that a bet on one head in two flips is "unambiguously better than" a bet on two heads in two flips. All the notions of ambiguity introduced thus far cannot formally capture this possibility.
In an unpublished 1996 conference talk, Nehring suggested doing so using the largest subrelation of ≽ that satisfies independence, which I shall label ≽I. He argued that if S is finite, for a class of preferences9 the results in Bewley (2002) can be used to show that ≽I has a multiple-prior representation with unanimity, with a set of priors D. In particular, when the decision maker satisfies MMEU with set of priors C we have C = D, while D = Cv when she satisfies CEU with capacity v. Although the relation ≽I thus obtained can in principle be constructed using only behavioral data, its derivation is not simple. Independently, Nehring (2001) and Ghirardato et al. (2002) proposed to derive from the decision maker's preference an unambiguous preference relation as follows: Say that act f is unambiguously preferred to act g, which is denoted f ≽∗ g, if αf + (1 − α)h ≽ αg + (1 − α)h for every α and every h. That is, f ≽∗ g if the preference of f over g cannot be overturned by mixing them with another act h, regardless of whether the latter allows one to hedge (or speculate on) ambiguity. It turns out that ≽∗ = ≽I, providing
a more immediate behavioral foundation to the approach proposed by Nehring in his 1996 talk.
The set of priors D representing ≽∗ by unanimity is naturally interpreted as the ambiguity that the decision maker perceives—better, appears to perceive—in her problem. The events on which all probabilities in D agree (which can simply be characterized in terms of the primitive ≽; see Ghirardato et al. (2002: prop. 24)) are natural candidates for being called unambiguous, and the collection of unambiguous events forms a λ-system.
Unlike his 1996 talk, Nehring (2001) considers a countably infinite S and preferences whose induced ≽∗ is represented by a D satisfying a "range convexity" condition. Among various consequences of such range convexity, he shows the characterization of two intuitive notions of absolute ambiguity aversion. In particular, say that a preference relation is weakly ambiguity averse if for every pair of partitions of S, {A1, A2, . . . , An} and {T1, T2, . . . , Tn}, such that each Ti is unambiguous, we cannot have that the decision maker prefers betting on Ai over betting on Ti for every i. Under Nehring's assumptions, a decision maker is weakly ambiguity averse if her ranking of bets can be represented by a capacity v with a nonempty core. A stronger property, which Nehring calls "ambiguity aversion," is shown instead to be equivalent to the fact that the decision maker's ranking of bets is represented by the lower envelope of D.
Ghirardato et al. (2002) consider an arbitrary S and a different class of preferences.10 They show that the set D representing ≽∗ can also be obtained as an (appropriately defined) "derivative" of the functional that represents the preferences. In particular, when the state space S is finite, this characterization implies that D is the (closed convex hull of the) set of all the Gateaux derivatives of the preference functional, where they exist. This result generalizes the EUT intuition that a decision maker's subjective probability of state s is the shadow price for changes in the utility received in state s, by allowing a multiplicity of shadow prices. A consequence is the extension to preferences with nonlinear utility of Nehring's 1996 result that D corresponds to C (resp. Cv) in the MMEU (resp. CEU) case—which in turn implies that the set of unambiguous events coincides with that defined for such preferences in Nehring (1999).
Ghirardato et al. (2002) also prove that the preferences they study can in general be given a representation which is a generalization of the MMEU representation. More precisely, an act f is evaluated via

a(f) min_{P∈D} ∫ u(f(s)) dP(s) + (1 − a(f)) max_{P∈D} ∫ u(f(s)) dP(s),

where a(·) is a function taking values in [0, 1] which represents the decision maker's aversion to perceived ambiguity in the sense of GM. They also axiomatize the so-called α-maxmin EU model, in which a(·) ≡ α.11 The interesting aspect of this representation is its clear separation of ambiguity (represented by D) and ambiguity attitude (represented by a(·)), and it is encouraging that the model does not impose cross-restrictions between these two aspects of the representation.
As can be seen from the foregoing discussion, the "relation-based" approach to modeling ambiguity is, at least in terms of its consequences, a significant
improvement over the previous "event-based" approaches. It has also yielded some interesting new perspectives on the characterization of ambiguity aversion and love. On the other hand, it is important to stress that this approach suffers from the same shortcoming as GM's theory of ambiguity aversion: It does not really describe "pure" ambiguity aversion, but rather the conjunction of all those behavioral features that induce departures from the independence axiom of EUT. In the terminology of Chapter 1, it does not distinguish between the violations of the first and the third tenets of Bayesianism. As observed earlier, it is not obvious that a solution to this identification problem can be reached without departing from a purely behavioral approach. Besides, a difficulty with such a departure is that it would require some prejudgment as to what really constitutes ambiguity, which is the very question that we set out to answer.
Another limitation of the "relation-based" approach due to its purely behavioral nature is the identification of ambiguity neutrality with lack of ambiguity. If a decision maker's preferences satisfy EUT, she is deemed to perceive no ambiguity, while it may be the case that she perceives ambiguity and is neutral with respect to it. Clearly, distinctions could be drawn if we considered ancillary information about the ambiguity present in the problem, at the mentioned cost of prejudging the nature of ambiguity. On the other hand, this is not as serious a concern as the one mentioned earlier, for ultimately our interest is in modeling ambiguity as it affects decision makers' behavior, and not otherwise.
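To make the objects of this section concrete, here is a small numerical sketch of ours: it builds Nehring's canonical set Cv from a capacity via the permutation formula quoted earlier, and evaluates an act by the α-maxmin rule with constant a(·) ≡ α. The names and the example act are assumptions for illustration only.

```python
from itertools import permutations

S = ["HH", "HT", "TH", "TT"]

def canonical_priors(v, states):
    """Nehring's Cv: for each permutation sigma, P_sigma gives s_sigma(i) the
    marginal contribution v({s_sigma(1),...,s_sigma(i)}) - v({...,s_sigma(i-1)})."""
    priors = set()
    for sigma in permutations(states):
        P, cum, prev = {}, [], 0.0
        for s in sigma:
            cum.append(s)
            P[s] = v(frozenset(cum)) - prev
            prev = v(frozenset(cum))
        priors.add(tuple(round(P[s], 12) for s in states))
    return [dict(zip(states, p)) for p in priors]

def alpha_meu(act, priors, alpha):
    """alpha-maxmin EU: weigh the worst expected value against the best."""
    evs = [sum(P[s] * act[s] for s in P) for P in priors]
    return alpha * min(evs) + (1 - alpha) * max(evs)

D = canonical_priors(v, S)          # with v from the Example 3.1 sketch
bet = {"HH": 1.0, "HT": 0.0, "TH": 1.0, "TT": 0.0}   # bet on the unfamiliar coin
print(alpha_meu(bet, D, alpha=1.0))  # pure pessimism: worst prior in Cv
print(alpha_meu(bet, D, alpha=0.0))  # pure optimism: best prior in Cv
```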
Notes
1 Recall that Schmeidler used the Anscombe–Aumann setting, in which mixtures of acts can be defined state by state. Also, he used the term "uncertainty" averse rather than ambiguity averse.
2 For instance, we have that v({HH, HT, TH}) = 9/16 < 10/16 = v({HH, HT}) + v({TH}).
3 The preferences considered in GM, called "biseparable preferences," induce state-independent and cardinally unique utilities. They include CEU and MMEU preferences (and other models as well) as special cases.
4 The idea that nonemptiness of the core could be a more appropriate formalization of ambiguity aversion for CEU preferences had already been suggested by Montesano and Giovannoni (1996).
5 For instance, a CEU preference is probabilistically sophisticated if its capacity v is ordinally equivalent to a probability; that is, if it is RDEU. Such is the case of the preference with capacity v′ in Example 3.2.
6 Given utility u and an act f, it can be seen from the definition of Choquet integral that if σ is such that u(f(sσ(1))) ≥ u(f(sσ(2))) ≥ · · · ≥ u(f(sσ(n))), then ∫ u(f) dv = ∫ u(f) dPσ.
7 The fact that unambiguous events should form λ-systems and not algebras was observed earlier in Zhang (2002), whose first version predates Nehring's.
8 Further extensions in this spirit are found in Kopylov (2002). In that chapter it is also shown that in general the sets of unambiguous events of Zhang and EZ are not λ-systems, but less structured families called "mosaics."
9 Those, among the preferences that satisfy all the axioms in Gilboa and Schmeidler (1989) except their "uncertainty aversion" axiom, that have linear utility. The latter are called invariant biseparable preferences by Ghirardato et al. (2002).
10 Invariant biseparable preferences (see note 9). Such preferences do not yield specific restrictions on D (beyond convexity, nonemptiness, and closedness), but they embody a mild restriction that Nehring (2001) calls "utility sophistication." Nehring shows that under range convexity it is possible to define an unambiguous likelihood relation on events even without utility sophistication; see his paper for details.
11 Variants of this representation have been well known at least since the seminal work of Hurwicz (published in Arrow and Hurwicz (1972)). See, in particular, Jaffray (1989).
References
Arrow, K. J. and L. Hurwicz (1972). "An Optimality Criterion for Decision Making under Ignorance," in Uncertainty and Expectations in Economics, ed. by C. Carter and J. Ford. Basil Blackwell, Oxford.
Bewley, T. (2002). "Knightian Decision Theory: Part I," Decisions in Economics and Finance, 25(2), 79–110 (first version 1986).
Casadesus-Masanell, R., P. Klibanoff, and E. Ozdenoren (2000). "Maxmin Expected Utility over Savage Acts with a Set of Priors," Journal of Economic Theory, 92, 33–65.
Ellsberg, D. (1961). "Risk, Ambiguity, and the Savage Axioms," Quarterly Journal of Economics, 75, 643–669.
Epstein, L. G. (1999). "A Definition of Uncertainty Aversion," Review of Economic Studies, 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Epstein, L. G. and J. Zhang (2001). "Subjective Probabilities on Subjectively Unambiguous Events," Econometrica, 69, 265–306.
Fishburn, P. C. (1993). "The Axioms and Algebra of Ambiguity," Theory and Decision, 34, 119–137.
Ghirardato, P., F. Maccheroni, and M. Marinacci (2002). "Ambiguity from the Differential Viewpoint," Social Science Working Paper 1130, Caltech, http://www.hss.caltech.edu/∼paolo/differential.pdf.
Ghirardato, P., F. Maccheroni, M. Marinacci, and M. Siniscalchi (2001). "A Subjective Spin on Roulette Wheels," Econometrica, 71(6), 1897–1908 (published November 2003).
Ghirardato, P. and M. Marinacci (2002). "Ambiguity Made Precise: A Comparative Foundation," Journal of Economic Theory, 102, 251–289. (Reprinted as Chapter 10 in this volume.)
Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-Unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Jaffray, J.-Y. (1989). "Linear Utility Theory for Belief Functions," Operations Research Letters, 8, 107–112.
Kelsey, D. and S. Nandeibam (1996). "On the Measurement of Uncertainty Aversion," Mimeo, University of Birmingham.
Knight, F. H. (1921). Risk, Uncertainty and Profit. Houghton Mifflin, Boston.
Kopylov, I. (2002). "Subjective Probabilities on 'Small' Domains," Work in progress, University of Rochester.
Machina, M. J. and D. Schmeidler (1992). "A More Robust Definition of Subjective Probability," Econometrica, 60, 745–780.
Montesano, A. and F. Giovannoni (1996). "Uncertainty Aversion and Aversion to Increasing Uncertainty," Theory and Decision, 41, 133–148.
Nehring, K. (1999). "Capacities and Probabilistic Beliefs: A Precarious Coexistence," Mathematical Social Sciences, 38, 197–213.
—— (2001). "Ambiguity in the Context of Probabilistic Beliefs," Mimeo, UC Davis.
Schmeidler, D. (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Tversky, A. and P. P. Wakker (1995). "Risk Attitudes and Decision Weights," Econometrica, 63, 1255–1280.
Yaari, M. E. (1969). "Some Remarks on Measures of Risk Aversion and on Their Uses," Journal of Economic Theory, 1, 315–329.
Zhang, J. (2002). "Subjective Ambiguity, Probability and Capacity," Economic Theory, 20, 159–181.
4
Introduction to the mathematics of ambiguity
Massimo Marinacci and Luigi Montrucchio
4.1. Introduction

As discussed at length in Chapters 1–3, some mathematical objects play a central role in Schmeidler's decision-theoretic ideas. In this chapter we provide some more details on them.
One of the novelties of Schmeidler's decision theory papers was the use of general set functions, not necessarily additive, to model "ambiguous" beliefs. This provided a new and intriguing motivation for the study of these mathematical objects, already studied from a different standpoint in cooperative game theory, another field where David Schmeidler has made important contributions. Here we overview the main properties of such set functions.
Most of the results we will present are known, though often not in the generality in which we state and prove them. In the attempt to provide streamlined proofs and more general statements, we sometimes came up with novel arguments.
4.2. Set functions

4.2.1. Basic properties

We begin by studying the basic properties of set functions. We use the setting of cooperative game theory, as most of these concepts originated there; their decision-theoretic interpretation is treated in great detail in Chapters 1–3 and 13, as well as in the other chapters in this book.
Let Ω be a set of players and Σ an algebra of admissible coalitions in Ω. A (transferable utility) game is a real-valued set function ν : Σ → R with the only requirement that ν(Ø) = 0. Given a coalition A ∈ Σ, the number ν(A) is interpreted as its worth, that is, the overall value that its members can achieve by teaming up. The condition ν(Ø) = 0 reflects the obvious fact that the worth of the empty coalition is zero; a priori, nothing more is assumed in defining a game ν.
In the game theory literature several additional conditions have been considered.
In particular, a game ν is:1
1 positive if ν(A) ≥ 0 for all A;
2 bounded if sup_{A∈Σ} |ν(A)| < +∞;
3 monotone if ν(A) ≤ ν(B) whenever A ⊆ B;
4 superadditive if ν(A ∪ B) ≥ ν(A) + ν(B) for all pairwise disjoint sets A and B;
5 convex (supermodular) if ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B) for all A, B;
6 additive (a charge) if ν(A ∪ B) = ν(A) + ν(B) for all pairwise disjoint sets A and B.
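Conditions 1–6 are easy to test mechanically on small examples. The following sketch is our own encoding, not the chapter's (a finite game as a dictionary from frozenset coalitions to worths); it will also be reused for the examples later in the chapter.

```python
from itertools import chain, combinations

OMEGA = frozenset({1, 2, 3})

def all_coalitions(omega):
    xs = list(omega)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(xs, r) for r in range(len(xs) + 1))]

def is_monotone(v):
    return all(v[a] <= v[b] for a in v for b in v if a <= b)

def is_superadditive(v):
    return all(v[a | b] >= v[a] + v[b] for a in v for b in v if not (a & b))

def is_convex(v):
    return all(v[a | b] + v[a & b] >= v[a] + v[b] for a in v for b in v)

# The game used in Example 4.1 below: singletons are worth 0, two-player
# coalitions 5/6, and the grand coalition 1.
v = {c: (0.0 if len(c) <= 1 else (5/6 if len(c) == 2 else 1.0))
     for c in all_coalitions(OMEGA)}
print(is_monotone(v), is_superadditive(v), is_convex(v))   # True True False
```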
All these conditions have natural game-theoretic interpretations (see, e.g. Moulin (1995) and Owen (1995)). For example, a game is monotone when larger coalitions can achieve higher values, and it is superadditive when combining disjoint coalitions results in more than proportional increases in value. As to supermodularity, it is a stronger property than superadditivity and it can be equivalently formulated as

ν(B ∪ C ∪ A) − ν(B ∪ C) ≥ ν(B ∪ A) − ν(B),   (4.1)
for all disjoint sets A, B, and C; hence, it can be interpreted as a property of increasing marginal values (see Proposition 4.15).
Some assumptions of a more technical nature are also often made. For example, a game ν is:
7 outer (inner, resp.) continuous at A if lim_{n→∞} ν(An) = ν(A) whenever An ↓ A (An ↑ A, resp.);
8 continuous at A if it is both inner and outer continuous at A;
9 continuous if it is continuous at each A;
10 countably additive (a measure) if ν(∪_{i=1}^∞ Ai) = Σ_{i=1}^∞ ν(Ai) for all countable collections of pairwise disjoint sets {Ai}_{i=1}^∞ such that ∪_{i=1}^∞ Ai ∈ Σ.
We get important classes of games by combining some of the previous properties. In particular, monotone games are called capacities, additive games are called charges, and countably additive games are called measures. Finally, positive games ν that are normalized with ν(Ω) = 1 are called probabilities. Notice that capacities are always positive and bounded, while positive superadditive games are always capacities.
Given a charge µ, its total variation norm ||µ|| is given by

sup Σ_{i=1}^n |µ(Ai) − µ(Ai−1)|,   (4.2)

where the supremum is taken over all finite chains Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = Ω. Denote by ba(Σ) and ca(Σ) the vector spaces of all charges and of all measures having finite total variation norm, respectively. By classic results (e.g. Dunford and Schwartz (1958) and Rao and Rao (1983)), a charge has finite total variation
if and only if it is bounded, and both ba(Σ) and ca(Σ) are Banach spaces when endowed with the total variation norm. In particular, ca(Σ) is a closed subspace of ba(Σ).
In view of these classic results, it is natural to wonder whether a useful norm can be introduced in more general spaces of games. Aumann and Shapley (1974) showed that this is the case by introducing the variation norm on the space of all games. Given a game ν, its variation norm ||ν|| is given by

sup Σ_{i=1}^n |ν(Ai) − ν(Ai−1)|,   (4.3)

where the supremum is taken over all finite chains Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = Ω.
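For a finite game the supremum in (4.3) is attained and can be computed by brute force over the subset lattice. A hypothetical sketch, reusing the dictionary encoding introduced above:

```python
from functools import lru_cache

def variation_norm(v, omega):
    """Value of (4.3) for a finite game: the most 'oscillating' chain from Ø
    to omega, found by dynamic programming over coalitions."""
    coalitions = list(v)

    @lru_cache(maxsize=None)
    def best_from(a):
        if a == omega:
            return 0.0
        return max(abs(v[b] - v[a]) + best_from(b)
                   for b in coalitions if a < b)      # strict supersets only
    return best_from(frozenset())

print(variation_norm(v, OMEGA))   # for a capacity this telescopes to v(omega)
```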
If ν is a charge, the variation norm ||ν|| reduces to the total variation norm. Moreover, all finite games are of bounded variation as they have a finite number of finite chains. Denote by bv(Σ) the vector space of all games ν having finite variation norm. Aumann and Shapley (1974) proved the following noteworthy properties.

Proposition 4.1. A game belongs to bv(Σ) if and only if it can be written as the difference of two capacities. Moreover, bv(Σ) endowed with the variation norm is a Banach space, and ba(Σ) and ca(Σ) are closed subspaces of bv(Σ).2

In view of this result, we can say that bv(Σ) is a Banach environment for not necessarily additive games that generalizes the classic spaces ba(Σ) and ca(Σ). In the sequel we will mostly consider games belonging to it.
We close this section by observing that each game ν has a dual game ν̄ defined by ν̄(A) = ν(Ω) − ν(Ac) for each A. From the definition it immediately follows that:
• ν̄̄ = ν;
• ν is monotone if and only if ν̄ is;
• ν belongs to bv(Σ) if and only if ν̄ does.
More important, dual games have "dual" properties relative to the original game. For example:
• ν is convex if and only if ν̄ is concave, that is, ν̄(A ∪ B) + ν̄(A ∩ B) ≤ ν̄(A) + ν̄(B) for all A, B;
• ν is inner continuous at A if and only if ν̄ is outer continuous at Ac.
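The dual game and the duality claims just listed can likewise be checked mechanically; a small sketch of ours, using the helpers defined above:

```python
def dual(v, omega):
    """The dual game: nu_bar(A) = nu(omega) - nu(A complement)."""
    return {a: v[omega] - v[omega - a] for a in v}

def is_concave(v):
    return all(v[a | b] + v[a & b] <= v[a] + v[b] for a in v for b in v)

v_bar = dual(v, OMEGA)
assert all(abs(dual(v_bar, OMEGA)[a] - v[a]) < 1e-12 for a in v)  # dual of dual
assert is_monotone(v) == is_monotone(v_bar)                       # monotonicity
assert is_convex(v) == is_concave(v_bar)                          # convex/concave
```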
For charges µ it clearly holds that µ = µ̄. Without additivity, ν and ν̄ are in general distinct games (see Proposition 4.3) and sometimes it is useful to consider the pair (ν, ν̄) rather than only ν.

Example 4.1. The duality between ν and ν̄ does not hold for all properties. For example, it is false that ν is superadditive if and only if ν̄ is subadditive. Consider
the game ν on Ω = {ω1, ω2, ω3} given by ν(ωi) = 0 for i = 1, 2, 3, ν(ωi ∪ ωj) = 5/6 for all distinct i, j = 1, 2, 3, and ν(Ω) = 1. Its dual ν̄ is given by ν̄(ωi) = 1/6 for i = 1, 2, 3, ν̄(ωi ∪ ωj) = 1 for all distinct i, j = 1, 2, 3, and ν̄(Ω) = 1. While ν is superadditive, its dual is not subadditive. In fact, ν̄(ω1 ∪ ω2) = 1 > ν̄(ω1) + ν̄(ω2) = 1/3.
Normalized superadditive games having subadditive duals are sometimes called upper probabilities (see Wolfenson and Fine (1982) and the references therein contained).

4.2.2. The core

Given a game ν, its core is the (possibly empty) set given by

core(ν) = {µ ∈ ba(Σ) : µ(A) ≥ ν(A) for each A and µ(Ω) = ν(Ω)}.

In other words, the core of ν is the set of all suitably normalized charges that setwise dominate ν. Notice that

core(ν) = {µ ∈ ba(Σ) : ν ≤ µ ≤ ν̄} = {µ ∈ ba(Σ) : µ(A) ≤ ν̄(A) for each A and µ(Ω) = ν(Ω)},

and so the core can also be regarded as the set of charges "sandwiched" between the game and its dual, as well as the set of charges setwise dominated by the dual game.
The core is a fundamental solution concept in cooperative game theory, where it is interpreted as the set of undominated allocations (see Moulin (1995) and Owen (1995)). After Schmeidler's seminal works, the core plays an important role in decision theory as well, as detailed in Chapters 1–3. Mathematically, the interest of the core lies in the connection it provides between games and charges, which, unlike games, are familiar objects in measure theory. As will be seen later, useful properties of games can be deduced via the core from classic properties of charges.
The core is a convex subset of ba(Σ). More interestingly, it has the following compactness property.3

Proposition 4.2. When nonempty, the core of a bounded game is weak*-compact.

Proof. Let µ ∈ core(ν) and let k = 2 sup_{A∈Σ} |ν(A)|. For each A it clearly holds µ(A) ≥ ν(A) ≥ −k. On the other hand,

µ(A) = µ(Ω) − µ(Ac) ≤ ν(Ω) − ν(Ac) ≤ 2 sup_{A∈Σ} |ν(A)|,

and so |µ(A)| ≤ k. By (Dunford and Schwartz, 1958: 97), ||µ|| ≤ 2k, which implies core(ν) ⊆ {µ ∈ ba(Σ) : ||µ|| ≤ 2k}. By the Alaoglu Theorem (see Dunford and Schwartz, 1958: 424), {µ ∈ ba(Σ) : ||µ|| ≤ 2k} is weak*-compact. Therefore, to complete the proof it remains
to show that core(ν) is weak*-closed. Let {µα}α be a net in core(ν) that weak*-converges to µ ∈ ba(Σ). Using the properties of the weak* topology, it is easy to see that µ ∈ core(ν). Hence, core(ν) is weak*-closed.

Remark. When Σ is a σ-algebra, the condition of boundedness of the game in Proposition 4.2 is superfluous by the Nikodym Uniform Boundedness Theorem (e.g. Rao and Rao, 1983: 204–205).

The core suggests some further taxonomy on games. A game ν is:
11 balanced if its core is nonempty;
12 totally balanced if all its subgames νA have nonempty cores.4
We already observed that for a charge µ it holds µ = µ̄. This property actually characterizes charges among balanced games.

Proposition 4.3. A balanced game ν is a charge if and only if ν = ν̄.

Proof. The "only if" part is trivial. As to the "if" part, let µ ∈ core(ν). As ν ≤ µ ≤ ν̄, we have ν = µ = ν̄, as desired.

The next result characterizes balanced games directly in terms of properties of the game ν. It was proved by Bondareva (1963) and Shapley (1967) for finite games, and extended to infinite games by Schmeidler (1968).

Theorem 4.1. A bounded game is balanced if and only if, for all λ1, . . . , λn ≥ 0 and all A1, . . . , An ∈ Σ, it holds

Σ_{i=1}^n λi ν(Ai) ≤ ν(Ω) whenever Σ_{i=1}^n λi 1_{Ai} = 1.   (4.4)
Proof. As the converse is trivial, we only show that ν is balanced provided it satisfies (4.4). By (4.4), ν(A) + ν(Ac) ≤ ν(Ω) for all A, so that ν ≤ ν̄. Let E be the collection of all finite subalgebras Σ0 of Σ; for each Σ0 ∈ E set

c(Σ0) = {γ ∈ R^Σ : ν(A) ≤ γ(A) ≤ ν̄(A) for each A ∈ Σ and γ|Σ0 is a charge},

where R^Σ is the collection of all set functions on Σ, and γ|Σ0 is the restriction of γ to Σ0. The set c(Σ0) is nonempty. In fact, as Σ0 is finite and the restriction ν|Σ0 satisfies (4.4), by Bondareva (1963) and Shapley (1967) there exists a charge γ0
on Σ0 satisfying ν(A) ≤ γ0(A) ≤ ν̄(A) for each A ∈ Σ0. If we set γ(A) = γ0(A) for A ∈ Σ0 and γ(A) = ν(A) otherwise, we have γ ∈ c(Σ0), so that c(Σ0) ≠ Ø.
Set a = inf_{A∈Σ} ν(A) and b = sup_{A∈Σ} ν̄(A). Both a and b belong to R since ν is bounded, and so by the Tychonoff Theorem (see Aliprantis and Border, 1999: 52) the set Π_{B∈Σ}[a, b] is compact in the product topology of R^Σ. Clearly, c(Σ0) ⊆ Π_{B∈Σ}[a, b]. We want to show that c(Σ0) is actually a closed subset of Π_{B∈Σ}[a, b]. Let γt be a net in c(Σ0) such that γt → γ ∈ R^Σ in the product topology, that is, γt(A) → γ(A) for all A ∈ Σ. For each A and each t, we have ν(A) ≤ γt(A) ≤ ν̄(A); hence, ν(A) ≤ γ(A) ≤ ν̄(A). For each t and for all disjoint A and B in Σ0, we have γt(A ∪ B) = γt(A) + γt(B); hence, γ(A ∪ B) = γ(A) + γ(B). We conclude that γ ∈ c(Σ0), and so c(Σ0) is a closed (and so compact) subset of Π_{B∈Σ}[a, b].
If Σ0 ⊆ Σ0′, then c(Σ0′) ⊆ c(Σ0). Hence, denoting by Σ̃0 ∈ E the algebra generated by a finite sequence {Σ0^i}_{i=1}^n ⊆ E, we have

Ø ≠ c(Σ̃0) ⊆ ∩_{i=1}^n c(Σ0^i).

In other words, the collection of compact sets {c(Σ0)}_{Σ0∈E} satisfies the finite intersection property. In turn, this implies ∩_{Σ0∈E} c(Σ0) ≠ Ø (see Aliprantis and Border, 1999: 38), which means that there exists a charge γ such that ν(A) ≤ γ(A) ≤ ν̄(A) for each A ∈ Σ. Since γ ∈ Π_{B∈Σ}[a, b], the charge γ is bounded and so it belongs to ba(Σ). We conclude that core(ν) ≠ Ø, as desired.

Remark. As observed by Kannai (1969: 229–230), for positive games Theorem 4.1 also follows from a result of Fan (1956) on systems of linear inequalities in normed spaces.

Since countable additivity is a most useful technical property, it is natural to wonder when it is the case that a nonempty core actually contains some measures. The next example of Kannai (1969) shows that this might well not happen.

Example 4.2. Let Ω = N and consider the game ν : 2^N → R defined by ν(A) = 0 if Ac is infinite, and ν(A) = 1 otherwise. Here core(ν) ≠ Ø. In fact, let ∇ be any ultrafilter containing the filter of all sets having finite complements. The two-valued charge u∇ : 2^N → R, defined by u∇(A) = 1 if A ∈ ∇ and u∇(A) = 0 otherwise,
belongs to core(ν). On the other hand, core(ν) ∩ ca(Σ) = Ø. For, suppose per contra that µ ∈ core(ν) ∩ ca(Σ). For each n ∈ N we have µ({n}) = µ(N) − µ(N − {n}) = 0. The countable additivity of µ then implies µ(N) = Σ_n µ({n}) = 0, which contradicts µ(N) = ν(N) = 1.

For positive games it is trivially true that core(ν) ⊆ ca(Σ) provided ν is continuous at Ω. In fact, for each monotone sequence An ↑ Ω it holds

ν(Ω) = µ(Ω) ≥ lim_n µ(An) ≥ lim_n ν(An) = ν(Ω),

for all µ ∈ core(ν). Hence, µ(Ω) = lim_n µ(An), which implies µ ∈ ca(Σ). For signed games we have a more interesting result, based on Aumann and Shapley (1974: 173).

Proposition 4.4. Given a balanced game ν, it holds core(ν) ⊆ ca(Σ) provided ν is continuous at both Ω and Ø.

Proof. Consider An ↑ Ω. Let µ ∈ core(ν). We want to show that µ(Ω) = lim_n µ(An). Since µ(An) ≥ ν(An) for each n, by the continuity of ν at Ω we have lim inf_n µ(An) ≥ lim inf_n ν(An) = ν(Ω). On the other hand, since An^c ↓ Ø and ν is continuous at Ø, we have

lim sup_n µ(An) = µ(Ω) − lim inf_n µ(An^c) ≤ ν(Ω) − lim inf_n ν(An^c) = ν(Ω).

In sum,

lim sup_n µ(An) ≤ ν(Ω) ≤ lim inf_n µ(An),

and so µ(Ω) = lim_n µ(An), as desired.

The next example shows that in general these continuity properties are only sufficient for the core being contained in ca(Σ).

Example 4.3. Let λ be the Lebesgue measure on [0, 1] and let f : [0, 1] → R be given by f(x) = x for 0 ≤ x ≤ 1/2, f(x) = 1/2 for 1/2 < x < 1, and f(x) = 1 for x = 1. Consider the game ν(A) = f(λ(A)) for each A. Though this game is not continuous at Ω, we have core(ν) = {λ} ⊆ ca(Σ). For, let µ ∈ core(ν). We want to show that µ = λ. Given any A, there is a partition {Ai}_{i=1}^n of A such that λ(Ai) ≤ 1/2 for each i. Hence, µ(A) = Σ_{i=1}^n µ(Ai) ≥ Σ_{i=1}^n λ(Ai) = λ(A). Since A was arbitrary, this implies µ ≥ λ, and so µ = λ.
Intuitively, this example works because the connection between the form of the game ν = f(λ) and its core is a bit "loose." Formally, there are gaps between ν and the core's lower envelope min_{µ∈core(ν)} µ(A). For example, if A is such that λ(A) = 3/4, then ν(A) = 1/2 < 3/4 = min_{µ∈core(ν)} µ(A). To fix this problem, Schmeidler (1972) introduced the following class of games: a game ν is
13 exact if it is balanced and ν(A) = min_{µ∈core(ν)} µ(A) for each A.
In other words, a game is exact if for each A there is µ ∈ core(ν) such that ν(A) = µ(A). Exact games can thus be viewed as games in which there is a tight connection between the form of the game and its core.
Schmeidler (1972) provided a characterization of exact games in terms of the game ν, related to (4.4). Moreover, he was able to prove that for exact games continuity becomes a necessary and sufficient condition for the core to be a subset of ca(Σ). To see why this is the case, we need a remarkable property of weak*-compact subsets of ca(Σ), due to Bartle, Dunford and Schwartz (see Maccheroni and Marinacci (2000) and the references therein contained). The result requires Σ to be a σ-algebra, a natural domain for continuous set functions.

Lemma 4.1. If Σ is a σ-algebra, then a subset of ca(Σ) is weak*-compact if and only if it is weakly compact.

Remark. As the proof shows, this lemma is a consequence of the Dini Theorem when K ⊆ ca+(Σ).

Proof. It is enough to prove that a weak*-compact subset of ca(Σ) is weakly compact, the converse being trivial. Suppose K ⊆ ca(Σ) is weak*-compact. Since K is bounded and weakly closed, by Dunford and Schwartz (1958: Theorem IV.9.1) the set K is sequentially weakly compact if and only if, given An ↑ Ω, for each ε > 0 there is a positive integer n(ε) such that |µ(Ω) − µ(An)| < ε for all µ ∈ K and all n ≥ n(ε). In other words, if and only if the measures in K are uniformly countably additive. For convenience, we only consider the case K ⊆ ca+(Σ) (see, e.g., Maccheroni and Marinacci (2000) for the general case). For each n ≥ 1 consider the evaluation functions φn : ba(Σ) → R defined by φn(µ) = µ(An) for each µ ∈ ba(Σ). Moreover, let φ : ba(Σ) → R be defined by φ(µ) = µ(Ω) for each µ ∈ ba(Σ). Both the function φ and each function φn are weak*-continuous, and the sequence {φn}_{n≥1} is increasing on K. As K is weak*-compact and

lim_n φn(µ) = lim_n µ(An) = µ(Ω) = φ(µ) for each µ ∈ K,
by the Dini Theorem (see Aliprantis and Border, 1999: 55) φn converges uniformly to φ. In turn, this easily implies the desired uniform countable additivity of the
measures in K, and so K is sequentially weakly compact. By the Eberlein–Smulian Theorem (see Aliprantis and Border, 1999: 256), K is then weakly compact as well.
Using this lemma we can prove the following result, due to Schmeidler (1972) for positive games. Here |µ|(A) denotes the total variation of µ at A (see Aliprantis and Border, 1999: 360).

Theorem 4.2. Let ν : Σ → R be an exact game defined on a σ-algebra Σ. Then, the following conditions are equivalent:
(i) ν is continuous at Ω and Ø.
(ii) ν is continuous at each A.
(iii) core(ν) is a weakly compact subset of ca(Σ).
(iv) There exists λ ∈ ca+(Σ) such that for all ε > 0 there exists δ > 0 such that, given any A,

λ(A) < δ ⟹ |µ|(A) < ε for all µ ∈ core(ν).   (4.5)
Remark. Inspection of the proof shows that when ν is positive, in (i) we can just assume continuity at Ω, while in (iv) we can choose λ so that it belongs to core(ν).

Proof. (ii) trivially implies (i), which in turn implies core(ν) ⊆ ca(Σ) by Proposition 4.4. By Proposition 4.2, core(ν) is weak*-compact, and so, by Lemma 4.1, it is weakly compact as well.
Assume (iii) holds. Since core(ν) is a weakly compact subset of ca(Σ), by Dunford and Schwartz (1958: Theorem IV.9.2) there is λ ∈ ca+(Σ) such that (iv) holds. If ν is positive, following Delbaen (1974: 226) replace 1/2^i by 1/m_n at the bottom of Dunford and Schwartz (1958: 307) to get λ ∈ core(ν).
It remains to show that (iv) implies (ii). Assume (iv). Since λ is countably additive, (4.5) implies that each µ ∈ core(ν) is countably additive as well, that is, core(ν) ⊆ ca(Σ). By Lemma 4.1, core(ν) is weakly compact. We are now ready to show that ν is continuous at each A. Per contra, suppose there is some A at which ν is not continuous, that is, there is a sequence, say An ↑ A (the argument for An ↓ A is similar), and some η > 0 such that |ν(An) − ν(A)| ≥ η. As ν is exact, for each n there is µn ∈ core(ν) such that ν(An) = µn(An). By the Eberlein–Smulian Theorem (see Aliprantis and Border, 1999: 256), core(ν) is sequentially weakly compact as well. Hence, there is a suitable subsequence {µnk}_{nk} of {µn}_n such that µnk weakly converges to some µ̃ ∈ core(ν). By Dunford and Schwartz (1958: Theorem IV.9.5), this means that lim_k µnk(A) = µ̃(A) for each A. Now, consider

ν(Ank) = µnk(Ank) = µnk(A) − µnk(A \ Ank).   (4.6)
Clearly, A \ Ank ↓ Ø. Since core(ν) is weakly compact, by Dunford and Schwartz (1958: Theorem IV.9.1) the measures in core(ν) are uniformly countably additive,
and so for each ε > 0 there is k(ε) ≥ 1 such that |µ(A \ Ank)| < ε for all µ ∈ core(ν) and all k ≥ k(ε). In particular, |µnk(A \ Ank)| < ε for all k ≥ k(ε). As ε is arbitrary, this implies lim_k µnk(A \ Ank) = 0. By (4.6), we then have

lim_k ν(Ank) = lim_k µnk(Ank) = µ̃(A) ≥ ν(A).   (4.7)

On the other hand, there exists a µ̂ ∈ core(ν) such that µ̂(A) = ν(A). Hence,

ν(A) = µ̂(A) = lim_k µ̂(Ank) ≥ lim_k ν(Ank).   (4.8)
Putting together (4.7) and (4.8), we get ν(A) = lim_k ν(Ank), thus contradicting |ν(An) − ν(A)| ≥ η. We conclude that ν is continuous at A, as desired.

Point (iv) is noteworthy. It says that the continuity of ν guarantees the existence of a positive control measure λ for core(ν), that is, a measure λ ∈ ca+(Σ) such that µ ≪ λ for all µ ∈ core(ν). This is a very useful property; inter alia, it implies that core(ν) can be identified with a subset of L1(λ), the set of all (equivalence classes of) Σ-measurable functions that are integrable with respect to λ. In fact, by the Radon–Nikodym Theorem (see Aliprantis and Border, 1999: 437), to each µ ≪ λ corresponds a unique f ∈ L1(λ) such that µ(A) = ∫_A f dλ for all A.

Corollary 4.1. Let ν : Σ → R be an exact game defined on a σ-algebra Σ. Then, ν is continuous at Ω and Ø if and only if there is λ ∈ ca+(Σ) such that core(ν) is a weakly compact subset of L1(λ).

Proof. Set ca(λ) = {µ ∈ ca(Σ) : µ ≪ λ}. By the Radon–Nikodym Theorem, there is an isometric isomorphism between ca(λ) and L1(λ) determined by the formula µ(A) = ∫_A f dλ (see Dunford and Schwartz, 1958: 306). Hence, a subset is weakly compact in ca(λ) if and only if it is in L1(λ) as well.

It is sometimes useful to know when the core of a continuous game consists of non-atomic measures. We close the section by studying this problem, which also provides a further illustration of the usefulness of the control measure λ. In order to do so, we need to introduce null sets. Given a game ν, a set N is ν-null if

ν(N ∪ A) = ν(A) for all A ∈ Σ.   (4.9)

The next lemma collects some basic properties of null sets.

Lemma 4.2. Given a game ν, let N be a ν-null set. Then:
(i) each subset B ⊆ N is ν-null;
(ii) ν(B) = 0 and ν(A \ B) = ν(A) for any B ⊆ N;
(iii) N is ν̄-null.
Proof. (i) Let B ⊆ N and let A be any set in Σ. By (4.9), ν(B ∪ A) = ν(B ∪ A ∪ N) = ν(A ∪ N) = ν(A), and so B is ν-null.
(ii) If we put A = Ø in (4.9), we get ν(N) = 0. By (i), each B ⊆ N is ν-null, so that ν(B) = 0 by what we have just established. It remains to show that ν(A \ B) = ν(A) for any B ⊆ N. By (i), A ∩ B is ν-null. Hence, ν(A \ B) = ν((A \ B) ∪ (A ∩ B)) = ν(A), as desired.
(iii) Let A be any set in Σ. By (ii) we then have ν̄(A ∪ N) = ν(Ω) − ν(Ac \ N) = ν(Ω) − ν(Ac) = ν̄(A), as desired.

For a charge µ, a set N is µ-null if and only if |µ|(N) = 0. For, suppose N is µ-null. We have (see Aliprantis and Border, 1999: 360)

|µ|(N) = sup{|µ(B)| + |µ(N \ B)| : B ⊆ N},

and so point (ii) of Lemma 4.2 implies |µ|(N) = 0. Conversely, suppose |µ|(N) = 0. Then |µ(B)| = 0 for each B ⊆ N, and so µ(A ∪ N) = µ(A ∪ (N \ A)) = µ(A) + µ(N \ A) = µ(A) for each set A ∈ Σ. We conclude that N is µ-null, as desired.
Given two games ν1 and ν2, we say that ν1 is absolutely continuous with respect to ν2 (written ν1 ≪ ν2) when each ν2-null set is ν1-null; we say that the two games are equivalent (written ν1 ≡ ν2) when a set is ν1-null if and only if it is ν2-null. In the special case of charges we get back the standard definitions of absolute continuity (see Aliprantis and Border, 1999: 363).
Given a balanced game ν, we have µ ≪ ν for each µ ∈ core(ν). For, let m ∈ core(ν) and suppose N is ν-null. For each A ⊆ N, we have m(A) ≥ ν(A) = 0, and

m(Ac) ≥ ν(Ac) = ν(Ω) = m(Ω) = m(A) + m(Ac).

Hence, m(A) = 0 for all A ⊆ N; namely, |m|(N) = 0.
For continuous exact games we have the following deeper result, due to Schmeidler (1972: Theorem 3.10), which provides a further useful property of the control measure λ.

Lemma 4.3. Given an exact and continuous game ν defined on a σ-algebra Σ, let λ be the control measure of Theorem 4.2. Then, ν ≡ λ.
Proof. By Dunford and Schwartz (1958: Theorem IV.9.2), we have

λ = Σ_{n=1}^∞ (2^{−n}/k_n) Σ_{i=1}^{k_n} |µ_i^n|,   (4.10)
with each µ_i^n ∈ core(ν). Let N be ν-null. As µ ≪ ν for each µ ∈ core(ν), N is µ-null for each µ ∈ core(ν). Hence, |µ|(N) = 0 for all µ ∈ core(ν). By (4.10), λ(N) = 0. Therefore N is λ-null.
Conversely, suppose λ(N) = 0. As µ ≪ λ for each µ ∈ core(ν), we have |µ|(N) = 0 for each µ ∈ core(ν). By exactness, there are µ, µ′ ∈ core(ν) such that

ν(N ∪ F) = µ(N ∪ F) = µ(F) ≥ ν(F) = µ′(F) = µ′(N ∪ F) ≥ ν(N ∪ F),

and so N is ν-null. We conclude that ν ≡ λ, as desired.

A game ν is non-atomic if for each ν-nonnull set A there is a set B ⊆ A such that both B and A \ B are ν-nonnull. In particular, a charge µ is non-atomic if and only if for each A with |µ|(A) > 0 there is B ⊆ A such that 0 < |µ|(B) < |µ|(A). In turn, this is equivalent to requiring that for each A with µ(A) ≠ 0 there is B ⊆ A such that both µ(B) ≠ 0 and µ(A \ B) ≠ 0 (see Rao and Rao, 1983: 141–142).
We can now state and prove the announced result on "non-atomic" cores.

Proposition 4.5. Let ν be a continuous exact game defined on a σ-algebra Σ. Then, ν is non-atomic if and only if core(ν) consists of non-atomic measures.

Proof. "Only if" part. Suppose ν is non-atomic. By Lemma 4.3, λ as well is non-atomic. In turn, this implies that each µ ∈ core(ν) is non-atomic. In fact, let |µ|(A) ≠ 0 for some A, so that λ(A) > 0. Since λ is non-atomic, there is a partition A^1_{1/2}, B^1_{1/2} of A such that λ(A^1_{1/2}) = λ(B^1_{1/2}) = (1/2)λ(A) (see Rao and Rao, 1983: Theorem 5.1.6). If 0 < |µ|(A^1_{1/2}) < |µ|(A) or 0 < |µ|(B^1_{1/2}) < |µ|(A), we are done. Suppose, in contrast, that either |µ|(A^1_{1/2}) = |µ|(A) or |µ|(B^1_{1/2}) = |µ|(A). Without loss, let |µ|(A^1_{1/2}) = |µ|(A). Let A^2_{1/2} and B^2_{1/2} be a partition of A^1_{1/2} such that λ(A^2_{1/2}) = λ(B^2_{1/2}) = (1/2)λ(A^1_{1/2}). If 0 < |µ|(A^2_{1/2}) < |µ|(A^1_{1/2}) or 0 < |µ|(B^2_{1/2}) < |µ|(A^1_{1/2}), we are done. Suppose, in contrast, that either |µ|(A^2_{1/2}) = |µ|(A^1_{1/2}) or |µ|(B^2_{1/2}) = |µ|(A^1_{1/2}). Without loss, let |µ|(A^2_{1/2}) = |µ|(A^1_{1/2}). By proceeding in this way, either we find a set B ⊆ A such that 0 < |µ|(B) < |µ|(A) or we can construct a chain {A^n_{1/2}}_{n≥1} such that λ(A^n_{1/2}) = (1/2^n)λ(A) and |µ|(A^n_{1/2}) = |µ|(A) for all n ≥ 1. In the latter case, since ∩_{n≥1} A^n_{1/2} ∈ Σ, we get λ(∩_{n≥1} A^n_{1/2}) = 0 and |µ|(∩_{n≥1} A^n_{1/2}) = |µ|(A) > 0. Since µ ≪ λ, this is impossible, and so there exists some set B ⊆ A such that 0 < |µ|(B) < |µ|(A). We conclude that µ is non-atomic, as desired.
"If" part. Suppose that each µ ∈ core(ν) is non-atomic. Set λn = (2^{−n}/k_n) Σ_{i=1}^{k_n} |µ_i^n| in (4.10), so that λ = Σ_{n=1}^∞ λn. Each positive measure λn is non-atomic. For, suppose λn(A) > 0. There is some |µ_i^n| such that |µ_i^n|(A) > 0. Hence, there is B ⊆ A such that |µ_i^n|(B) > 0 and |µ_i^n|(A \ B) > 0. Since λn ≥ (2^{−n}/k_n)|µ_i^n|, we then have λn(B) > 0 and λn(A \ B) > 0, as desired. Since each λn is non-atomic, λ as well is non-atomic. For, suppose λ(A) > 0. There is some λn such that λn(A) > 0. Hence, there is B ⊆ A such that λn(B) > 0 and λn(A \ B) > 0. Since λ ≥ λn, we then have λ(B) > 0 and λ(A \ B) > 0, and so λ is non-atomic. By Lemma 4.3, ν ≡ λ. As λ is non-atomic, this implies that ν as well is non-atomic, as desired.
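For a finite game the core is a polytope cut out by finitely many linear inequalities, so the notions of this section (balancedness, exactness, the lower envelope min_{µ∈core(ν)} µ(A)) can be explored numerically with a linear programming solver. A minimal sketch, with our own naming and scipy as an assumed dependency:

```python
import numpy as np
from scipy.optimize import linprog

def min_over_core(v, omega, target):
    """Minimize mu(target) over core(v) for a finite game {frozenset: worth}.
    Returns None when the core is empty."""
    states = sorted(omega)
    idx = {s: i for i, s in enumerate(states)}
    rows = [c for c in v if c and c != omega]
    A_ub = np.zeros((len(rows), len(states)))
    for r, c in enumerate(rows):
        for s in c:
            A_ub[r, idx[s]] = -1.0            # encodes mu(c) >= v(c)
    b_ub = np.array([-v[c] for c in rows])
    A_eq = np.ones((1, len(states)))          # encodes mu(omega) = v(omega)
    b_eq = np.array([v[omega]])
    cost = np.zeros(len(states))
    for s in target:
        cost[idx[s]] = 1.0
    res = linprog(cost, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] * len(states))
    return res.fun if res.success else None

print(min_over_core(v, OMEGA, frozenset({1})))
# None: the game of Example 4.1 is not balanced. Indeed, taking lambda = 1/2
# on each of the three two-player coalitions violates condition (4.4), since
# 3 * (1/2) * (5/6) = 5/4 > 1 = v(omega).
```

When the core is nonempty, the game is exact precisely when the returned minimum equals v(A) for every coalition A, which gives a direct numerical test of definition 13.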
4.3. Choquet integrals

Given a game ν : Σ → R and a real-valued function f : Ω → R, a natural question is whether there is a meaningful way to write an integral ∫ f dν that extends the standard notions of integrals for additive games. Fortunately, Choquet (1953: 265) has shown that it is possible to develop a rich theory of integration in a non-additive setting. As usual with notions of integration, we will present Choquet's integral in a few steps, beginning with positive functions.

4.3.1. Positive functions

A function f : Ω → R is Σ-measurable if f^{−1}(I) ∈ Σ for each open and each closed interval I of R (see Dunford and Schwartz, 1958: 240). The set of all bounded Σ-measurable f : Ω → R is denoted by B(Σ).

Proposition 4.6. The set B(Σ) is a lattice. If, in addition, Σ is a σ-algebra, then B(Σ) is a vector lattice.

Proof. Let f, g ∈ B(Σ). We only prove that (f ∨ g)^{−1}(a, b) ∈ Σ for any open (possibly unbounded) interval (a, b) ⊆ R, the other cases being similar. For each t ∈ R, the following holds:

(f ∨ g > t) = (f > t) ∪ (g > t),
(f ∨ g < t) = (f < t) ∩ (g < t).

Hence,

(f ∨ g)^{−1}(a, b) = (f ∨ g > a) ∩ (f ∨ g < b) = ((f > a) ∪ (g > a)) ∩ ((f < b) ∩ (g < b)) ∈ Σ,

as desired. Finally, the fact that B(Σ) is a vector space when Σ is a σ-algebra is a standard result in measure theory (see Aliprantis and Border, 1999: Theorem 4.26).
Introduction to the mathematics of ambiguity
59
Given a capacity ν : → R and a positive -measurable function f : → R, the Choquet integral of f with respect to ν is given by ∞ ν({ω ∈ : f (ω) ≥ t}) dt, (4.11) f dν = 0
where on the right we have a Riemann integral. To see why the Riemann integral is well defined, first observe that f −1 ([t, +∞)) = {ω ∈ : f (ω) ≥ t} ∈
for each t ∈ R.
Set Et = {ω ∈ : f (ω) ≥ t}; the survival function Gν : R → R of f with respect to ν is defined = ν(Et ) for each t ∈ R. Using this function, we by Gν (t) ∞ can write (4.11) as f dν = 0 Gν (t) dt. The family {Et }t∈R is a chain, with Et ⊇ Et if t ≤ t .5 Since ν is a capacity, we have ν(Et ) ≥ ν(Et ) if t ≤ t , and so Gν is a decreasing function. Moreover, since f is both positive and bounded, the function Gν is positive, decreasing and with compact support. By standard +∞ results on Riemann integration, we conclude that the Riemann integral 0 Gν (t) dt exists, and so the Choquet integral (4.11) is well defined. The Choquet integral f dν reduces to the standard additive integral when ν is additive. Given a positive charge µ and a function f in B(), let f dµ be the standard additive integral for charges (see Aliprantis and Border, 1999: 399 and Rao and Rao, 1983: 115–121). Proposition 4.7. Given a positive function f ∈ B() and a positive charge µ ∈ ba(), it holds f dµ = µ(f ≥ t) dt = f dµ. Proof. We use an argument of Rudin (1987: 172). Set Et = (f ≥ t). Given ω ∈ , we have
∞
1Et (ω) dt =
0
0
∞
1[0,f (ω)] (t) dt =
f (ω)
dt = f (ω).
0
∞ Equivalently, f (ω) = 0 1Et (ω) dλ, where λ is the Lebesgue measure on R. By the Fubini Theorem for the integral (e.g., Marinacci, 1997), we can write
f dµ =
= 0
as desired.
∞
1Et (ω) dλ
0 ∞
dµ =
µ(f ≥ t) dλ = 0
∞
1Et (ω) dµ
0 ∞
µ(f ≥ t) dt,
dλ
60 Massimo Marinacci and Luigi Montrucchio We close by observing that in defining Choquet integrals we could have equivalently used the “strict” upper sets (f > t). Proposition 4.8. Let ν be a capacity and f a positive function in B(). Then,
∞
∞
ν(f t) dt =
ν(f > t) dt.
0
0
Proof. As before, set Gν (t) = ν(f ≥ t) for each t ∈ R. Moreover, set Gν (t) = ν(f > t) for each t ∈ R. We have (f ≥ t + 1/n] ⊆ (f > t) ⊆ (f ≥ t) for each t ∈ R, and so Gν (t +1/n) ≤ Gν (t) ≤ Gν (t) for each t ∈ R. If Gν is continuous at t, we have Gν (t) = limn Gν (t + 1/n) ≤ Gν (t) ≤ Gν (t), so that Gν (t) = Gν (t). On the other hand, as Gν is a decreasing function, it is continuous except on an / T, at most countable setT ⊆ R. As a result, ∞ it holds Gν (t) = Gν (t) for all t ∈ ∞ which in turn implies 0 Gν (t) dt = 0 Gν (t) dt by standard results on Riemann integration. 4.3.2. General functions We now extend the definition of the Choquet integral to general -measurable functions. In the previous subsection we have defined the Choquet integral on B + (), the cone of all positive elements of B(). Each capacity ν induces a functional νc : B + () → R on this cone, given by νc (f ) = f dν for each f ∈ B + (). If f is a characteristic function 1A , we get νc (1A ) = 1A dν = ν(A); thus, the functional νc —which we call the Choquet functional—can be viewed as an extension of the capacity ν from to B + (). Our problem of defining a Choquet integral on B() can be viewed as the problem of how to extend the Choquet functional on the entire space B(). In principle, there are many different ways to extend it. To make the extension problem meaningful we have to set a desideratum for the extension, that is, a property we want it to satisfy. A natural property to require is that the extended functional νc : B() → R be translation invariant, that is, νc (f + α1) = νc (f ) + ανc (1) for each α ∈ R and each f ∈ B(). The next result shows that this desideratum pins down the extension to a particular form. Proposition 4.9. A Choquet functional νc : B + () → R induced by a capacity admits a unique translation invariant extension, given by 0
∞
ν(f ≥ t) dt +
0
−∞
[ν(f ≥ t) − ν( )] dt
for each f ∈ B(), where on the right we have two Riemann integrals.
(4.12)
Introduction to the mathematics of ambiguity
61
Proof. Set
∞
νc (f ) =
ν(f ≥ t) dt +
0
−∞
0
[ν(f ≥ t) − ν( )] dt.
The functional νc is well defined and some simple algebra shows that it is translation invariant and that it reduces to the Choquet integral when f ∈ B + (). Assume ν : B() → R is a translation invariant functional such that ν(f ) = νc (f ) ν satisfies (4.12), so that ν = νc . whenever f ∈ B + (). We want to show that Let f ∈ B() be such that inf f = γ < 0. By translation invariance, ν(f − γ) = ν(f ) − γ ν(1 ). As f − γ belongs to B + (), we can then write: ν(f ) = ν(f − γ ) + γ ν(1 ) = νc (f − γ ) + γ νc (1 ) ∞ ν((f − γ ) ≥ t) dt + γ νc (1 ) = 0
=
∞
ν(f ≥ t + γ ) dt + γ νc (1 )
0
=
∞
ν(f ≥ τ ) dτ + γ νc (1 )
γ
=
0
∞
ν(f ≥ τ ) dτ +
ν(f ≥ τ ) dτ −
0
γ
0
ν( ) dτ γ
where the penultimate equality is due to the change of variable τ = t + γ . As [ν(f ≥ τ ) − ν( )] = 0 for all τ ≤ γ , the following holds: ν(f ) =
∞
ν(f ≥ τ ) dτ +
0
0
−∞
(ν(f ≥ τ ) − ν( )) dτ .
Hence, ν = νc , as desired. Before moving on, observe that the Riemann integrals in (4.12) exist even if ν is a game of bounded variation, that is, if ν ∈ bv(). In fact, for each such game there exist two capacities ν1 and ν2 with ν = ν1 − ν2 . Hence, ν(f ≥ t) = ν1 (f ≥ t) − ν2 (f ≥ t) for each t ∈ R, and so ν(f ≥ t) is a function of bounded variation in t. The Riemann integrals in (4.12) then exist by standard results on Riemann integrals. In view of Proposition 4.9 and the above observation, next we define the Choquet integral for functions in B() with respect to games in bv() as the translation invariant extension of the definition given in (4.11) for positive functions.
62
Massimo Marinacci and Luigi Montrucchio
Definition 4.1. Given a game ν ∈ bv() and a function f ∈ B(), the Choquet integral f dν is defined by
f dν =
∞
ν(f ≥ t) dt +
0
0
−∞
[ν(f ≥ t) − ν( )] dt.
The associated Choquet functional νc : B() → R is given by νc (f ) = for each f ∈ B().
(4.13)
f dν
Translation invariance and Proposition 4.7 imply that when ν is a bounded charge, the Choquet integral f dν of a f ∈ B() reduces to the standard additive integral. Moreover, it is easy to check that Proposition 4.8 holds for general Choquet integrals, that is, ∞ 0 f dν = ν(f > t) dt + [ν(f > t) − ν( )] dt. −∞
0
Finally, the Choquet integral (4.13) is well defined for all finite games since they belong to bv(). As in the finite case B() = R , this means that finite games induce Choquet functional νc : R → R. Example 4.4. Given a nonempty coalition A, the unanimity game uA : → R is the two-valued convex game defined by 1 A⊆B uA (B) = 0 else for all B ∈ . For each f ∈ B() it holds f duA = inf ω∈A f (ω). In fact, we have A ⊆ (f ≥ t) if and only if t ≤ inf ω∈A f (ω), and so GuA (t) = 1(−∞,inf ω∈A f (ω)) (t). Example 4.5. Let = {ω1 , ω2 } and suppose ν is a capacity on 2 with 0 < ν(ω1 ) < 1, 0 < ν(ω2 ) < 1, and ν( ) = 1. Then, νc : R2 → R is given by x1 (1 − ν(ω2 )) + x2 ν(ω2 ) if x2 ≥ x1 , νc (x1 , x2 ) = x1 ν(ω1 ) + x2 (1 − ν(ω1 )) if x2 < x1 Given any k ∈ R, the level curve {(x1 , x2 ) ∈ R2 : νc (x1 , x2 ) = k} is 2) x2 = ν(ωk 2 ) − 1−ν(ω if x2 ≥ x1 , ν(ω2 ) x1 x2 =
k 1−ν(ω1 )
−
ν(ω1 ) 1−ν(ω1 ) x1
if x2 < x1
As a result, the level curve is a straight line when ν is a charge—that is, when ν(ω1 ) + ν(ω2 ) = 1—and it has, in contrast, a kink at the 45-degree line {(x1 , x2 ) ∈ R2 : x1 = x2 } when ν is not a charge. The non additivity of ν is thus reflected
Introduction to the mathematics of ambiguity
63
by kinks in the level curves. In general, level curves of Choquet integrals are not affine spaces, unless the game is a charge. A function f in B() is simple if it is finite-valued, that is, if the set {f (ω) : ω ∈ } is finite. Each simple function f admits a unique represenk k tation f = i=1 αi 1Ai , where {Ai }i=1 ⊆ is a suitable partition of and α1 > · · · > αk . Using this representation, we can rewrite formula (4.13) in a couple of equivalent ways, which are sometimes useful (e.g., the discussion of the Choquet Expected Utility model of Schmeidler (1989) in Chapter 1). Proposition 4.10. Given a game ν ∈ bv() and a simple function f ∈ B(), it holds ⎞ ⎛ i k f dν = Aj ⎠ (αi − αi+1 )ν ⎝ i=1
=
k
⎛ ⎛ αi ⎝ν ⎝
j =1
⎞
i
⎛
Aj ⎠ − ν ⎝
j =0
i=1
⎞⎞
i−1
Aj ⎠⎠ ,
j =0
where we set αk+1 = 0 and A0 = Ø. Proof. It is enough to prove the first equality, the other being a simple rearrangement of its terms. Let f be positive, so that αk ≥ 0. If t > α1 , then {ω ∈ : f (ω) ≥ t} = ∅. If t ∈ (αi+1 , αi ], then (recall that αk+1 = 0): {ω ∈ : f (ω) ≥ t} =
i
Aj .
j =1
Hence, ν(f ≥ t) =
k
⎛ ν⎝
i
⎞ Aj ⎠ 1(αi+1 ,αi ] (t) for each t ∈ R+ ,
j =1
i=1
so that
f dν =
∞
ν(f ≥ t) dt =
0
=
k i=1
k ∞
0
⎛ ν⎝
i
j =1
⎞ Aj ⎠
0
∞
i=1
⎛ ν⎝
i
⎞ Aj ⎠ 1(αi+1 ,αi ] (t) dt
j =1
1(αi+1 ,αi ] (t) dt =
k i=1
⎛ (αi − αi+1 )ν ⎝
i
⎞ Aj ⎠ ,
j =1
as desired. This proves the first equality for a positive f . The case of a general f is easily obtained using translation invariance.
64
Massimo Marinacci and Luigi Montrucchio
When ν ∈ ba(), the above formulae reduce to k αi ν(Ai ), f dν = i=1
which is the standard integral of f with respect to the charge ν. Example 4.6. Let P : → [0, 1] be a probability charge with range R(P ) = {P (A) : A ∈ }. Given a real-valued function g : R(P ) → R, the game ν = f (P ) is called a scalar measure game. It holds ∞ 0 f dν = [g(P (f ≥ t)) − g(1)] dt. g(P (f ≥ t)) dt + −∞
0
The right-hand side becomes ⎡ ⎛ ⎛ ⎞⎞ ⎛ ⎛ ⎞⎞⎤ i i−1 k αi ⎣g ⎝P ⎝ Aj ⎠⎠ − g ⎝P ⎝ Aj ⎠⎠⎦ i=1
j =0
j =0
when f is a simple function. This is a familiar formula in Rank Dependent Expected Utility (see Chapters 1 and 2). 4.3.3. Basic properties We begin by collecting a few basic properties of Choquet integrals. Here, · on bv() is the variation norm given by (4.3), while ≥ and · on B() are the pointwise order and the supnorm, respectively.6 Proposition 4.11. Suppose νc : B() → R is the Choquet functional induced by a game ν ∈ bv(). Then (i) (ii) (iii) (iv)
(Positive homogeneity): νc (αf ) = ανc (f ) for each α ≥ 0. (Translation invariance): νc (f + α1 ) = νc (f ) + ανc (1 ) for each α ∈ R. (Monotonicity): νc (f ) ≥ νc (g) if f ≥ g, provided ν is a capacity. (Lipschitz continuity): for all f , g ∈ B(), |νc (f ) − νc (g)| ≤ νf − g.
(4.14)
Proof. Properties (i) and (ii) are easily established. To see that (iii) holds it is enough to observe that, being ν a capacity, it holds ν(g ≥ t) ≤ ν(f ≥ t) for each t ∈ R since f ≥ g implies (g ≥ t) ⊆ (f ≥ t) for each t ∈ R. As to (iv), suppose first that ν is a capacity. Assume νc (f ) ≥ νc (g) (the other case is similar). As f ≤ g + f − g, by (ii) and (iii) we have νc (f ) ≤ νc (g) + f − gν( ). This implies |νc (f ) − νc (g)| ≤ ν( )f − g, which is (4.14) when ν is monotonic. For, in this case ν = ν( ).
(4.15)
Introduction to the mathematics of ambiguity
65
Now, let ν ∈ bv(). By Aumann and Shapley (1974: 28), ν can be written as ν = ν + − ν − , where ν + and ν − are capacities such that ν = ν + ( ) + ν − ( ). By (4.15), we then have |νc (f ) − νc (g) ≤ [ν + ( ) + ν − ( )]f − g, as desired. If a game ν belongs to bv(), its dual ν¯ as well belongs to bv(). The Choquet functional ν¯ c is therefore well defined and next we show that it can be viewed as the dual functional of νc . Proposition 4.12. Let ν ∈ bv(). Then, ν¯ c (f ) = −νc (−f ) for each f ∈ B(). If, in addition, ν is balanced, then νc (f ) ≤ µ(f ) ≤ ν¯ c (f ) for each f ∈ B() and each µ ∈ core (ν). Proof. Given f ∈ B(), we have ν¯ c (f ) =
∞
ν¯ (f ≥ t) dt +
0
=
∞
0
−∞
[¯ν (f ≥ t) − ν¯ ( )] dt
0
=
∞
[ν( ) − ν(f ≤ t)] dt −
0
=
∞
[ν( ) − ν(f < t)] dt +
0 −∞ 0
−∞
[ν( ) − ν(−f ≥ −t)] dt −
0
−∞
0 −∞
∞
[ν( ) − ν(−f ≥ t)] dt −
=−
ν(f ≤ t) dt
0
=
−ν(f < t) dt
0 0
−∞
ν(−f ≥ −t) dt
ν(−f ≥ t) dt
∞
[ν(−f ≥ t) − ν( )] dt +
ν(−f ≥ t) dt
0
= −νc (−f ). Suppose ν is balanced. Then ν(A) ≤ µ(A) ≤ ν¯ (A) for each A ∈ and each µ ∈ core (ν). In turn this implies that, given any f ∈ B(), ν(f ≥ t) ≤ µ
66
Massimo Marinacci and Luigi Montrucchio
(f ≥ t) ≤ ν¯ (f ≥ t) for each t ∈ R. By the monotonicity of the Riemann integral, 0 ∞ ν¯ (f ≥ t) dt + [¯ν (f ≥ t) − ν¯ ( )] dt 0
≥
−∞
∞
0
≥
0
∞
µ(f ≥ t) dt + ν(f ≥ t) dt +
0
−∞ 0
−∞
[µ(f ≥ t) − ν¯ ( )] dt
[ν(f ≥ t) − ν( )] dt,
and so νc (f ) ≤ µ(f ) ≤ ν¯ c (f ), as desired. In general, Choquet functionals ν : B() → R are not additive, that is, it is in general false that νc (f + g) = νc (f ) + νc (g). However, the next result, due to Dellacherie (1971), shows that additivity holds in a restricted sense. Say that two functions f , g ∈ B() are comonotonic (short for “commonly monotonic”) if (f (ω) − f (ω ))(g(ω) − g(ω )) ≥ 0 for any pair ω, ω ∈ . That is, two functions are comonotonic provided they have a similar pattern. Theorem 4.3. Suppose ν : B() → R is the Choquet functional induced by a game ν ∈ bv(). Then, νc (f + g) = νc (f ) + νc (g) provided f and g are comonotonic, and f + g ∈ B(). To prove this result we need a couple of useful lemmas. The first one says that two functions f and g are comonotonic if and only if all their upper sets are nested. This is trivially true for the two collections (f ≥ t) and (g ≥ t) separately; the interesting part here is that f and g are comonotonic if and only if this is still the case for the combined collection {(f ≥ t)}t∈R ∪ {(g ≥ t)}t∈R . For a proof of this lemma we refer to Denneberg (1994: Prop. 4.5). Lemma 4.4. Two functions f , g ∈ B() are comonotonic if and only if the overall collection of all upper sets (f ≥ t) and (g ≥ t) is a chain. The next lemma says that we can replicate games over chains with suitable charges. The non-additivity of a game is, therefore, immaterial as long as we restrict ourselves to chains. Lemma 4.5. Let ν ∈ bv(). Given any chain C in there is µ ∈ ba() such that µ(A) = ν(A) for all A ∈ C.
(4.16)
If, in addition, ν is a capacity, then we can take µ ∈ ba + (). Proof. It is enough to prove the result for a capacity ν, as the extension to any game in bv() is routine in view of their decomposition as differences of capacities given in Proposition 4.1.
Introduction to the mathematics of ambiguity
67
Consider first a finite chain Ø = A0 ⊆ A1 ⊆ · · · ⊆ An ⊆ An+1 = . Let 0 be the finite subalgebra of generated by such chain. Let µ0 ∈ ba + (0 ) be defined by µ0 (Ai+1 \Ai ) = ν(Ai+1 ) − ν(Ai ) for i = 1, . . . , n. By standard extension theorems for positive charges (see Rao and Rao, 1983: Corollary 3.3.4), there exists µ ∈ ba + () which extends µ0 on , that is, µ(A) = µ0 (A) for each A ∈ 0 . Hence, µ is the desired charge. Now, let C be any chain. Let {Cα }a be the collection of all its finite subchains, and set α = {µ ∈ ba + () : µ(A) = ν(A) for each A ∈ Cα }. By what we just proved, each α is nonempty. Moreover, the collection {α }a has the finite intersection property. For, let {Ci }ni=1 ⊆ {Cα }a be a finite collection. Since ni=1 Ci is in turn a finite chain, by proceeding as before it is easy to establish the existence
of a µ ∈ ba() such that µ(A) = ν(A) for each A ∈ ni=1 Ci . As µ ∈ ni=1 i , the intersection ni=1 i nonempty, as desired. Each α is a weak∗ -closed subset of the weak∗ -compact set {µ ∈ ba + () : µ( ) = ν( )}. Since {α }a has the finite intersection property, we conclude that Any charge µ ∈ α α satisfies (4.16).
α
α = Ø.
Proof of Theorem 4.3. Suppose f and g are comonotonic functions in B(). Then, the sum f + g is comonotonic with both f and g, so that the collection {f , g, f + g} consists of pairwise comonotonic functions. Let C = {(f ≥ t)}t∈R ∪ {(g ≥ t)}t∈R ∪ {(f + g ≥ t)}t∈R . By Lemma 4.4, C is a chain. By Lemma 4.5, there is µ ∈ ba() such that µ(A) = ν(A) for all A ∈ C. Hence, f dν + g dν = f dµ + g dµ = (f + g) dµ = (f + g) dν, as desired. As constant functions are comonotonic with all other functions, comonotonic additivity is a much stronger property than translation invariance. The next result of Bassanezi and Greco (1984: Theorem 2.1) shows that comonotonic additivity is actually the “best” possible type of additivity for Choquet functionals. Proposition 4.13. Suppose contains all singletons. Then, two functions f , g ∈ B(), with f + g ∈ B(), are comonotonic if and only if it holds νc (f + g) = νc (f )+νc (g) for all Choquet functionals induced by convex capacities ν : → R.
68
Massimo Marinacci and Luigi Montrucchio
Proof. The “only if ” part holds by Theorem 4.3. As to “if ” part, assume it holds νc (f + g) = νc (f ) + νc (g)
(4.17)
for all Choquet functionals induced by convex capacities. Suppose, per contra, that f and g are not comonotonic. Then, there exist ω , ω ∈ such that [f (ω ) − f (ω )][g(ω ) − g(ω )] < 0. Say that f (ω ) < f (ω ) and g(ω ) > g(ω ), and consider the convex game u{ω ,ω } (A) =
1 if {ω , ω } ⊆ A . 0 else
By Example 4.4, u{ω ,ω },c (f ) = f (ω ) and u{ω ,ω },c (g) = g(ω ). Hence, u{ω ,ω },c (f + g) = min{(f + g)(ω ), (f + g)(ω )} = f (ω ) + g(ω ) = u{ω ,ω },c (f ) + u{ω ,ω },c (g), which contradicts (4.17). Notice that the argument used to prove the last result can be adapted to give the following characterization of comonotonicity: when contains all singletons, two functions f , g ∈ B() are comonotonic if and only if inf (f (ω) + g(ω)) = inf f (ω) + inf g(ω)
ω∈A
ω∈A
ω∈A
for all A ∈ . Lemmas 4.4 and 4.5 are especially useful in finding counterparts for games and for their Choquet integrals of standard results that hold in the additive case. Theorem 4.3 is a first important example since through these lemmas we could derive the counterpart for Choquet integrals of the additivity of standard integrals. We close this subsection with another simple illustration of this feature of Lemmas 4.4 and 4.5 by showing a version for Choquet integrals of the classic Jensen inequality. Proposition 4.14. Let ν be a capacity with ν( ) = 1. Given a monotone convex function φ : R → R, for each f ∈ B() the following holds:
φ(f ) dν ≥ φ
f dν .
Proof. Given any f ∈ B(), the functions φ ◦ f and f are comonotonic. By Lemmas 4.4 and 4.5, there is µ ∈ ba + () such that µ(f ≥ t) = ν(f ≥ t) and µ(φ(f ) ≥ t) = ν(φ(f ) ≥ t) for each t ∈ R. In turn, this implies µ( ) =
Introduction to the mathematics of ambiguity 69 φ(f ) dµ, and f dν = f dµ. By the standard
ν( ) = 1, φ(f ) dν = Jensen inequality: φ(f ) dν = φ(f ) dµ ≥ φ f dµ = φ f dν , as desired.
4.4. Representation Summing up, Choquet functionals are positively homogeneous, comonotonic additive, and Lipschitz continuous; they are also monotone provided the underlying game does. A natural question is whether these properties actually characterize Choquet functionals among all the functionals defined on B(). Schmeidler (1986) showed that this is the case and we now present his result. Theorem 4.4. Let ν : B() → R be a functional. Define the game ν(A) = ν(1A ) on . The following conditions are equivalent: (i) ν is monotone and comonotonic additive; (ii) ν is a capacity and, for all f ∈ B(), it holds: ν(f ) = 0
∞
ν(f ≥ t) dt +
0
−∞
[ν(f ≥ t) − ν( )] dt.
(4.18)
Remarks. (i) Positive homogeneity is a redundant condition here as it is implied by comonotonic additivity and monotonicity, as shown in the proof. (ii) Zhou (1998) proved a version of this result on Stone lattices. Proof. (ii) trivially implies (i) Conversely, assume (i). We divide the proof into three steps. Step 1. For
and any integer
n, by comonotonic additivity we have
any f ∈ B() f f ν n . Namely, ν fn = n1 ν(f ). Hence, given any positive ν(f ) = ν n n = n rational number α = m/n,
m f f f f m f = ν + ··· + = ν + · · · + ν = ν(f ). ν n n n n n n As a result, we have ν(λf ) = λ ν(f ) for any λ ∈ Q+ . In particular, this implies 0 = ν(λ1 − λ1 ) = λν( ) + ν(−λ1 ) for each λ ∈ Q+ , and so ν(f + λ1 ) = ν(f ) + ν(λ1 ) = ν(f ) + λν( ) for each f ∈ B() and each λ ∈ Q. Step 2. We now prove that ν is supnorm continuous. Let f , g ∈ B() and let {rn }n be a sequence of rationals such that rn ↓ f − g. As f ≤ g + f − g ≤ g + rn ,
70
Massimo Marinacci and Luigi Montrucchio
it follows that ν(f ) ≤ ν(g) + rn ν( ). Consequently, | ν(f ) − ν(g)| ≤ rn ν( ). As n → ∞, we get | ν(f ) − ν(g)| ≤ f − gν( ). Hence, ν is Lipschitz continuous, and so supnorm continuous. In turn, this implies ν(λf ) = λ ν(f ) for all λ ≥ 0 and ν(f + λ1 ) = ν(f ) + λν( ) for each f ∈ B() and each λ ∈ R, that is, ν is translation invariant. Step 3. It remains to show that (4.18) holds, that is, that ν(f ) = νc (f ) for all f ∈ B(). Since both ν and νc are supnorm continuous and B0 () is supnorm dense in B(), it is enough to show that ν(f ) = νc (f ) for all f ∈ B0 (). Let f ∈ B0 (). Since both ν and νc are translation invariant, it is enough to show that ν(f ) = νc (f ) for f ≥ 0. As f ∈ B0 (), we can write f = ki=1 αi 1Ai , where {Ai }ki=1 ⊆ is a suitable partition of and α1 > · · · > αk . Setting Di = ij =1 Aj and αk+1 = 0, we can then write f = k−1 i=1 (αi −αi+1 )1Di +αk 1 . k−1 As the functions {(αi − αi+1 )1Di }i=1 and αk 1 are pairwise comonotonic, by the comonotonic additivity and positive homogeneity of ν we have ⎞ ⎛ k−1 i ν(f ) = (αi − αi+1 )ν ⎝ Aj ⎠ + αk 1. j =1
i=1
∞ i Since ki=1 (αi −αi+1 )ν ν(f ) = j =1 Aj = 0 ν(f ≥ t) dt, we conclude that ∞ ν(f ) = νc (f ), as desired. 0 ν(f ≥ t) dt, that is, Next we extend Schmeidler’s Theorem to the non-monotonic case. Given a functional ν : B() → R and any two f , g ∈ B() with f ≤ g, set V (f ; g) = sup
n−1
| ν(fi+1 ) − ν(fi )|,
i=0
where the supremum is taken over all finite chains f = f0 ≤ f1 ≤ · · · ≤ fn = g. We say that ν is of bounded variation if V (0; f ) < +∞ for all f ∈ B + (). Theorem 4.5. Let ν : B() → R be a functional. Define the game ν(A) = ν(1A ) on . The following conditions are equivalent: (i) ν is comonotonic additive and of bounded variation; (ii) ν is comonotonic additive and supnorm continuous on B + (), and ν ∈ bv(); (iii) ν ∈ bv() and, for all f ∈ B(), ∞ 0 ν(f ) = ν(f ≥ t) dt + [ν(f ≥ t) − ν( )] dt. 0
−∞
Remark. When is finite, the requirement ν ∈ bv() becomes superfluous in conditions (ii) and (iii) as all finite games are of bounded variation.
Introduction to the mathematics of ambiguity
71
Before proving the result, we give a useful lemma. Observe that the decomposition f = (f − t)+ + (f ∧ t) reduces to the standard f = f + − f − when t = 0. Lemma 4.6. Let ν : B() → R be a comonotonic additive functional. Then, ν(f ) = ν((f − t)+ ) + ν(f ∧ t) for each t ∈ R and f ∈ B(). Proof. Given any t ∈ R, the functions (f − t)+ and f ∧ t are comonotonic. In fact, for any ω, ω ∈ we have [(f − t)+ (ω) − (f − t)+ (ω )][(f ∧ t)(ω) − (f ∧ t)(ω )] = (f − t)+ (ω)(f ∧ t)(ω) − (f − t)+ (ω)(f ∧ t)(ω ) − (f − t)+ (ω )(f ∧ t)(ω) + (f − t)+ (ω )(f ∧ t)(ω ) = (f − t)+ (ω)(f − t)− (ω ) + (f − t)+ (ω )(f − t)− (ω) ≥ 0, as desired. Proof of Theorem 4.5. (i) implies (ii). Clearly, ν ∈ bv(). We want to show that (i) implies that ν is supnorm continuous over B + (). As Step 1 of the proof of Theorem 4.4 still holds here, we have ν(f + λ1 ) = ν(f ) + λν( ) for each f ∈ B() and each λ ∈ Q. That is, ν is translation invariant w.r.t. Q. Let f , g ∈ B() with f ≤ g. If f ≥ 0, then V (f ; g) ≤ V (0; g) < +∞. Suppose f is not necessarily positive. There exists λ ∈ Q+ such that f + λ ≥ 0 and g + λ ≥ 0. By the translation invariance w.r.t. Q of ν, we have V (f ; g) = V (f + λ; g + λ) for all λ ∈ Q. Hence, V (f ; g) = V (f + λ; g + λ) < +∞. It is easy to see that V (0; λf ) = λV (0; f ) for all λ ∈ Q+ . The next claim gives a deeper property of V (f ; g). Claim. For all f ≥ 0 and all λ ∈ Q+ , it holds V (−λ; f ) = V (−λ; 0) + V (0; f ). Proof of the Claim. If f ≤ h ≤ g, we have V (f ; g) ≥ V (f ; h)+V (h; g). Hence, it suffices to show that V (−λ; f ) ≤ V (−λ; 0) + V (0; f ). By definition, for any ε > 0 there exists a chain {ϕi }ni=0 such that n−1
| ν(ϕi+1 ) − ν(ϕi )| ≥ V (−λ; f ) − ε,
i=0
with ϕ0 = −λ and ϕn = f . For each ϕi consider the two functions ϕi− = −(ϕi ∧0) and ϕi+ = ϕi ∨ 0 and the two chains {−ϕi− } and {ϕi+ }. The former chain is relative
72
Massimo Marinacci and Luigi Montrucchio
to V (−λ; 0), while the latter is relative to V (0; f ). Therefore, we have V (−λ; 0) + V (0; f ) ≥
n−1
− ) − ν(−ϕi− )| + | ν(−ϕi+1
i=0
=
n−1
n−1
+ | ν(ϕi+1 ) − ν(ϕi+ )|
i=0 − + (| ν(−ϕi+1 ) − ν(−ϕi− )| + | ν(ϕi+1 ) − ν(ϕi+ )|).
(4.19)
i=0
ν(ϕi+ ) + ν(−ϕi− ), On the other hand, by Lemma 4.6 for each i we have ν(ϕi ) = and so + − ν(ϕi )| = | ν(ϕi+1 ) + ν(−ϕi+1 ) − ν(ϕi+ ) − ν(−ϕi− )| | ν(ϕi+1 ) − + − ≤ | ν(ϕi+1 ) − ν(ϕi+ )| + | ν(−ϕi+1 ) − ν(−ϕi− )|.
In view of (4.19), we can write V (−λ; 0) + V (0; f ) ≥
n−1
| ν(ϕi+1 ) − ν(ϕi )| ≥ V (−λ; f ) − ε,
i=0
which proves our claim. Define the monotone functional ν1 (f ) = V (0; f ) on B + (). For each λ ∈ Q+ we have ν1 (f + λ) = V (0; f + λ) = V (−λ; f ) = V (−λ; 0) + V (0; f ) ν1 (f ). = V (0; λ) + V (0; f ) = λV (0; 1) + V (0; f ) = λ ν1 (1 ) + Hence, ν1 is translation invariant w.r.t. Q+ . Since ν1 is monotone, by Step 2 of the proof of Theorem 4.4 it is Lipschitz continuous, and so supnorm continuous. ν1 − ν on B + (). The functional ν2 is monotone; Consider the functional ν2 = moreover, it is translation invariant w.r.t. Q as both ν1 and ν do. Consequently, by Step 2 of the proof of Theorem 4.4 ν2 is supnorm continuous. As ν = ν1 − ν2 , we conclude that also ν is supnorm continuous, thus completing the proof that (i) implies (ii). (ii) implies (iii). Step 1 of the proof of Theorem 4.4 holds here as well. Hence, ν(f + λ1 ) = ν(f ) + λν( ) for each ν(λf ) = λ ν(f ) for all λ ∈ Q+ , and f ∈ B() and each λ ∈ Q. By supnorm continuity, ν(λf ) = λ ν(f ) for all λ ≥ 0, and ν(f + λ1 ) = ν(f ) + λν( ) for each λ ∈ R. The functional ν is, therefore, positively homogeneous and translation invariant. Let νc be the Choquet functional associated with ν. As ν ∈ bv(), νc is well ν and defined and supnorm continuous. We want to show that ν = νc . Since both νc are supnorm continuous and B0 () is supnorm dense in B(), it is enough
Introduction to the mathematics of ambiguity
73
to show that ν(f ) = νc (f ) for each f ∈ B0 (). This can be established by proceeding as in Step 3 of the proof of Theorem 4.4. (iii) implies (i). It remains to show that the Choquet functional νc is of bounded variation as long as ν ∈ bv(). By Proposition 4.1, there exist capacities ν 1 and ν 2 such that ν = ν 1 − ν 2 . Hence, νc = νc1 − νc2 and so the functional νc is the difference of two monotone functionals. This implies V (f ; g) ≤ νc1 (g) − νc1 (f ) + νc2 (g) − νc2 (f ), and we conclude that νc is of bounded variation.
4.5. Convex games Convex games are an interesting class of games and played an important role in Schmeidler’s approach to ambiguity, as explained in Chapter 1. Here we show some of their remarkable mathematical properties. We begin by proving formally that convexity can be formulated as in Equation (4.1), a version useful in game theory for interpreting supermodularity in terms of marginal values (see Moulin, 1995). Proposition 4.15. For any game ν, the following properties are equivalent: (i) ν is convex; (ii) for all sets A, B, and C such that A ⊆ B and B ∩ C = Ø, ν(A ∪ C) − ν(A) ≤ ν(B ∪ C) − ν(B); (iii) for all disjoint sets A, B, and C: ν(B ∪ A) − ν(B) ≤ ν(B ∪ C ∪ A) − ν(B ∪ C). Proof. (ii) easily implies (iii). Assume (ii) holds. Since (A∪B)\A = B \(A∩B), to check the supermodularity of ν is enough to set C = (A∪B)\A. Finally, assume (i) holds. If the sets A, B, and C are disjoint, then (B ∪ C) ∩ (B ∪ A) = B, and so supermodularity implies (iii), as desired. The next result, due to Choquet (1953: 289), shows that the convexity of the game and the superlinearity of the associated Choquet functional are two faces of the same coin.7 Recall that, by Proposition 4.6, B() is a lattice and it becomes a vector lattice when is σ -algebra. Theorem 4.6. For any game ν in bv(), the following conditions are equivalent: (i) ν is convex, (ii) νc is superadditive on B(), that is, νc (f + g) ≥ νc (f ) + νc (g) for all f , g ∈ B() such that f + g ∈ B(). (iii) νc is supermodular on B(), that is, νc (f ∨ g) + νc (f ∧ g) ≥ νc (f ) + νc (g) for all f , g ∈ B().
74
Massimo Marinacci and Luigi Montrucchio
Proof. We prove that both (ii) and (iii) are equivalent to (i). (i) implies (ii). Given f ∈ B + () and E ∈ , we have (f + 1E ≥ t) = (f ≥ t) ∪ (E ∩ (f ≥ t − 1)), and so f + 1E ∈ B + (). In turn, this implies f + g ∈ B + () whenever g ∈ B + () is simple. Moreover, as ν is convex, we get ν(f + 1E ≥ t) ≥ ν(f ≥ t) + ν(E ∩ (f ≥ t − 1)) − ν(E ∩ (f ≥ t)). Consequently, νc (f + 1E ) ∞ = ν(f + 1E ≥ t) dt
0
≥
∞
ν(f ≥ t) dt +
0
= νc (f ) +
∞
0 0 −1
∞
ν(E ∩ (f ≥ t − 1)) dt −
ν(E ∩ (f ≥ t)) dt
0
ν(E ∩ (f ≥ t)) dt = νc (f ) + ν(E).
As νc is positive homogeneous, for each λ ≥ 0 we have f f + 1E ≥ λ νc + ν(E) νc (f + λ1E ) = λνc λ λ = νc (f ) + λν(E).
n Let g ∈ B + () be a simple function. We can write g = i=1 λi 1Di , where D1 ⊆ · · · ⊆ Dn and λi ≥ 0 for each i = 1, . . . , n. As g is simple, we have f + g ∈ B + (). Hence, ! ! n n λi 1Di ≥ νc f + λi 1Di + λ1 ν(D1 ) νc (f + g) = νc f + i=1
≥ · · · ≥ νc (f ) +
i=2 n
λi ν(Di ) = νc (f ) + νc (g),
i=1
as desired. To show that the inequality ν(f + g) ≥ ν(f ) + ν(g) holds for all f , g ∈ B() it is now enough to use the translation invariance and supnorm continuity of νc . (ii) implies (i). Given any sets A and B, it holds 1A∪B + 1A∩B = 1A + 1B . Since the characteristic functions 1A∪B and 1A∩B are comonotonic, we then have ν(A ∪ B) + ν(A ∩ B) = νc (1A∪B ) + νc (1A∩B ) = νc (1A∪B + 1A∩B ) = νc (1A + 1B ) ≥ νc (1A ) + νc (1B ) = ν(A) + ν(B), and so the game ν is convex, as desired.
Introduction to the mathematics of ambiguity
75
(i) implies (iii). As νc is translation invariant, it is enough to prove the implication for f and g positive. It is easy to check that, for each t ∈ R, it holds (f ∨ g ≥ t) = (f ≥ t) ∪ (g ≥ t) (f ∧ g ≥ t) = (f ≥ t) ∩ (g ≥ t). Therefore, if ν is convex, then ν(f ∨ g ≥ t) + ν(f ∧ g ≥ t) ≥ ν(f ≥ t) + ν(g ≥ t). Hence,
∞
νc (f ∨ g) + νc (f ∧ g) = 0
=
∞
ν(f ∨ g ≥ t) dt +
ν(f ∧ g ≥ t) dt
0 ∞
[ν(f ∨ g ≥ t) + ν(f ∧ g ≥ t)] dt
0 ∞
≥
[ν(f ≥ t) + ν(g ≥ t)] dt = νc (f ) + νc (g),
0
as desired. (iii) implies (i). We have 1A ∨ 1B = 1A∪B and 1A ∧ 1B = 1A∩B . Hence, if we put f = 1A and g = 1B in the inequality νc (f ∨ g) + νc (f ∧ g) ≥ νc (f ) + νc (g), we get ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B), as desired. By Theorem 4.6, a game is convex if and only if the associated Choquet functional νc is superlinear, that is, superadditive and positively homogeneous. This is a useful property that, for example, makes it possible to use the classic Hahn–Banach Theorem in studying convex games. In order to do so, however, we first have to deal with a technical problem: unless is a σ -algebra, the space B() is not in general a vector space, something needed to apply the Hahn–Banach Theorem and other standard functional analytic results. There are at least two ways to bypass the problem. The first one is to consider the vector space B0 () of -measurable simple functions in place of the whole set B(). This can be enough as long as one is interested in using results that, like the Hahn–Banach Theorem, hold on any vector space. There are important results, however, that only hold on Banach spaces (e.g. the Uniform Boundedness Principle). In this case B0 (), which is not a Banach space, is useless. A solution is to consider B(), the supnorm closure B() of B0 (),8 which is a Banach lattice under the supnorm (Dunford and Schwartz, 1958: 258). B() is a dense subset of B(); it holds B() = B() when is a σ -algebra, and so in this case B() itself is a Banach lattice. If is not a σ -algebra, to work with the Banach lattice B() we have to extend on it the Choquet functional νc , which is originally defined on B().
76
Massimo Marinacci and Luigi Montrucchio
Lemma 4.7. Any Choquet functional νc : B() → R induced by a game ν ∈ bv() admits a unique supnorm continuous extension on B(). Such extension is positively homogeneous and comonotonic additive. Proof. By Proposition 4.11(iv), νc is Lipschitz continuous on B(). By standard results (Aliprantis and Border, 1999: 77), it then admits a unique supnorm continuous extension on the closure B(). Using its supnorm continuity, such extension is easily seen to be positively homogeneous. As to comonotonic additivity, we first prove the following claim. Claim. Given any two comonotonic and supnorm bounded functions f and g, there exist two sequences of simple functions {fn }n and {gn }n uniformly converging to f and g, respectively, and such that fn and gn are comonotonic for each n. Proof of the Claim. It is enough to prove the claim for positive functions. Let f : → R be positive and supnorm bounded, so that there exists a constant M > 0 such that 0 ≤ f (ω) ≤ M for each ω ∈ . Let M = αn > αn−1 > · · · > α1 > α0 = 0, with αi = (i/n)M for each i = 0, 1, . . . , n. SetAi = (f ≥ αi ) for each i = 1, . . . , n − 1, and define fn : → R as fn = n−1 i=1 αi 1Ai . The collection of upper sets {(fn ≥ t)}t∈R is included in {(f ≥ t)}t∈R and f − fn = maxi∈{0,...,n−1} (αi+1 − αi ) = M/n. In a similar way, for each n we can construct a simple function gn such that the collection of upper sets {(gn ≥ t)}t∈R is included in {(g ≥ t)}t∈R and g − gn = M/n. By Lemma 4.4 the collections {(g ≥ t)}t∈R and {(f ≥ t)}t∈R together form a chain. Hence, by what we just proved, for each n the collections {(g ≥ t)}t∈R , {(gn ≥ t)}t∈R , {(f ≥ t)}t∈R , and {(fn ≥ t)}t∈R together form a chain as well. Again by Lemma 4.4, fn and gn are then comonotonic functions, and so the sequences {fn }n and {gn }n we have constructed have the desired properties. This completes the proof of the Claim. Let f , g ∈ B(). Consider the sequences {fn }n and {gn }n of simple functions given by the Claim. As such sequences belong to B(), by the supnorm continuity of νc we have: νc (f + g) = lim νc (fn + gn ) = lim νc (fn ) + lim νc (gn ) = νc (f ) + νc (g), n
n
n
as desired. It is convenient to denote this extension still by νc , and in the sequel we will write νc : B() → R. In the enlarged domain B() the following cleaner version of Theorem 4.6 holds. As B() is a vector space, here we can consider concavity and quasi-concavity. The latter property is the only nontrivial feature of the next result relative to Theorem 4.6.9 Corollary 4.2. For any game ν in bv(), the following conditions are equivalent: (i) ν is convex,
Introduction to the mathematics of ambiguity (ii) (iii) (iv) (v)
νc νc νc νc
77
is superlinear on B(), is supermodular on B(), is concave on B(), is quasi-concave on B(), provided ν( ) = 0.
Proof. In view of Theorem 4.6, the only nontrivial part is to show that (v) implies (iv). We will actually prove the stronger result that (iv) is equivalent to the convexity of the cone {f : νc (f ) ≥ 0}. Set K = {f ∈ B() : νc (f ) ≥ 0}. Given two functions f , g ∈ B(), we have νc (f ) νc (g) νc g − νc f − 1 = 0, 1 = 0. ν( ) ν( ) Hence, both f − (νc (f )/ν( ))1 , and g − (νc (g)/ν( ))1 lie in K. By the convexity of K, taken α ∈ [0, 1] and α ≡ 1 − α, we have αf − α
νc (f ) νc (g) 1 + αg − α 1 ∈ K. ν( ) ν( )
Namely, νc (f ) νc (g) 1 + αg − α 1
νc αf − α ν( ) ν( ) = νc (αf + αg) − ανc (f ) − ανc (g) ≥ 0. Therefore, νc is concave. Remarks. (i) Dual properties hold for submodular games. For example, a game ν is submodular if and only if its Choquet functional νc is convex on B(); equivalently, a game ν is convex if and only if its dual Choquet functional ν c is convex on B(). For brevity, we omit these dual properties. (ii) Condition ν( ) = 0 in point (v) is needed. Consider the game ν on = {ω1 , ω2 } with ν(ω1 ) = 2, ν(ω2 ) = −1, and ν( ) = 0. Being subadditive, ν is not convex. On the other hand, its Choquet integral is 2(x1 − x2 ) x1 ≥ x2 νc (x1 , x2 ) = , x1 − x 2 x2 > x 1 which is quasi-concave. The next result is a first consequence of the use of functional analytic tools in the study of convex games. The equivalence between (i) and (v) is due to Schmeidler (1986) for positive games and to De Waegenaere and Wakker (2001) for finite games; for the other equivalences we refer to Delbaen (1974) and Marinacci and Montrucchio (2003).
78
Massimo Marinacci and Luigi Montrucchio
Theorem 4.7. For a bounded game ν, the following conditions are equivalent: (i) ν is convex; (ii) for any A ⊆ B there is µ ∈ core(ν), such that µ(A) = ν(A) and µ(B) = ν(B); (iii) for any finite chain {Ai }ni=1 , there is µ ∈ core(ν) such that µ(Ai ) = ν(Ai ) for all i = 1, . . . , n; (iv) ν ∈ bv() and, for any chain {Ai }i∈I , there is µ ∈ ext (core (ν)) such that µ(Ai ) = ν(Ai ) for all i ∈ I ; (v) ν ∈ bv() and νc (f ) = minµ∈core(ν) f dµ for all f ∈ B(); (vi) νc (f ) = minµ∈core(ν) f dµ for all f ∈ B0 (). This theorem has a few noteworthy features. First, it shows that bounded and convex games belong to bv(), so that they always have well-defined Choquet integrals on B(). Second, it improves Lemma 4.5 by showing that in the convex case the “replicating” measures over chains can be assumed to be in the core. Finally, Theorem 4.7 shows that Choquet functionals of convex games can be viewed as lower envelopes of the linear functional on B() induced by the measures in the cores. In other words, convex games are exact games of a special type, in which the close connection between the game and the measures in the core holds on the entire space B(), and not just on . Proof. The proof proceeds as follows: (i) ⇒ (vi) ⇒ (iv) ⇒ (v) ⇒ (iii) ⇒ (ii) ⇒ (i). (i) implies (vi). Given any f ∈ B0 (), the Choquet integral f dν is well defined since ν ∈ bv(f ), where f is the finite algebra generated by f . Hence, the Choquet functional νc : B0 () → R exists on the vector space B0 (), and it is positively homogeneous and translation invariant. Let f , g : → R be any two functions in B0 (). Let f ,g be the smallest algebra that makes both f and g measurable. As f ,g is finite, ν ∈ bv(f ,g ) and so we can apply Theorem 4.6 to the restricted Choquet integral νc : B(f ,g ) → R. Thus, νc (f +g) ≥ νc (f )+νc (g). Since f and g were arbitrary elements of B0 (), we conclude that νc : B0 () → R is a superlinear functional on B0 (). Let f ∈ B0 (). The algebraic dual of B0 () is the space f a() of all finitely additive games on .10 As νc : B0 () → R is superlinear, by the Hahn–Banach Theorem there is µc ∈ f a() such that µc (f ) = νc (f ) and µc (g) ≥ νc (g) for each g ∈ B0 (). In other words, νc (f ) = min µc (f ), µ∈C
where C = {µ ∈ f a() : µc (f ) ≥ νc (f ) for each f ∈ B0 ()}. Next we show that C coincides with the set C = {µ ∈ f a() : µ ≥ ν and µ( ) = ν( )}.
Introduction to the mathematics of ambiguity
79
Let µ ∈ C. Then, µ(A) = µc (1A ) ≥ νc (1A ) = ν(A) for all A ∈ ; moreover, −µ( ) = µc (−1 ) ≥ νc (−1 ) = −ν( ). Hence, µ ∈ C . Conversely, suppose µ ∈ C . As µ ≥ ν and µ( ) = ν( ), the definition of Choquet integral immediately implies that νc (f ) µ(f ). Hence, µ ∈ C. It remains to show that C = core(ν). As ba() ⊆ f a(), core(ν) ⊆ C . As to the converse inclusion, suppose µ ∈ C . Since ν is bounded, for each µ ∈ C we have |µ(A)| ≤ 2 supA∈ |ν(A)| (see Proposition 4.2). Then, µ ∈ ba() (see Dunford and Schwartz, 1958: 97) and we conclude that C ⊆ core(ν), as desired. (vi) implies (iv). Consider first a finite chain A1 ⊆ · · · ⊆ An . By (vi), there exists µ ∈ core(ν) such that
µc
n i=1
! 1Ai
= νc
n
! 1Ai .
i=1
n By comonotonic additivity, ni=1 µ(Ai ) = i=1 ν(Ai ). As µ ∈ core(ν), we have µ(Ai ) ≥ ν(Ai ) for all i = 1, . . . , n, which in turn implies µ(Ai ) = ν(Ai ) for all i = 1, . . . , n. Now, let {Ai }i∈I be any chain in . Let J be the (finite) algebra generated by a finite subchain {Ai }i∈J and J = {µ ∈ core(ν) : µ(Aj ) = ν(Aj ) for all j ∈ J }. Since core(ν) is weak∗ -compact, the set J is weak∗ -compact. Moreover, it is convex and, by what we just proved, J = Ø. It is easily seen that J is also extremal in core(ν). The collection of weak∗ -compact sets {J }{J : J ⊆I and |J |<∞} has the finite intersection property, and so its overall intersection {J : J ⊆I and |J |<∞} J is nonempty (see Aliprantis and Border, 1999: 38). Moreover, such intersection ∗ is extremal
in core(ν). Being convex and weak -compact, by the Krein–Milman Theorem {J : J ⊆I and |J |<∞} J has then an extreme point µ. We conclude that µ ∈ ext(core(ν)) and µ(Ai ) = ν(Ai ) for all i ∈ I , as desired. To complete the proof that (vi) implies (iv) it remains to show that ν ∈ bv(). Since core(ν) is weak∗ -compact, it is bounded; that is, there exists M ∈ R such that µ ≤ M for all µ ∈ core(ν). Since, given any finite chain Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = , there exists µ ∈ core(ν) such that µ(Ai ) = ν(Ai ) for all i = 0, . . . , n, we conclude that n
|ν(Ai ) − ν(Ai−1 )| ≤ µ ≤ M.
i=1
(iv) implies (v). Let f ∈ B(). Since νc is translation invariant, assume w.l.o.g. that f ≥ 0. Consider the chain of all upper sets {(f ≥ t)}t∈R . Given any
80
Massimo Marinacci and Luigi Montrucchio
µ ∈ core(ν), the following holds: νc (f ) = ν(f ≥ t) dt ≤ µ(f ≥ t) dt = µ(f ). By (iv), there is µ ∈ core(ν) such that ν(A) = µ(A) for all A ∈ . Hence, νc (f ) = ν(f ≥ t) dt = µ(f ≥ t) dt = µ(f ), and we conclude that νc (f ) = minµ∈core(ν) f dµ. Since B() is supnorm dense in B(), the supnorm continuous functional νc : B() → R given by Lemma 4.7 is superlinear. By proceeding as before, we can show that core(ν) = {µ ∈ ba() : µc (f ) ≥ νc (f ) for each f ∈ B0 ()} = {µ ∈ ba() : µc (f ) ≥ νc (f ) for each f ∈ B()} Hence, by the Hahn–Banach Theorem: νc (f ) = min{µc (f ) : µ ∈ ba() and µc (g) ≥ νc (g) for each g ∈ B()} = min{µc (f ) : µ ∈ ba() and µc (g) ≥ νc (g) for each g ∈ B0 ()} = min{µc (f ) : µ ∈ core(ν)} = min f dµ, µ∈core(ν)
as desired. (v) implies (iii). Consider a finite chain {Ai }ni=1 and set f = ni=1 1Ai . By (v), such that µ(f ) = ν(f ). By comonotonic additivity, core(ν) n there is µ ∈ n n µ(A ) = ν(A ), and so i i i=1 i=1 i=1 [µ(Ai ) − ν(Ai )] = 0. Since µ ≥ ν we conclude that µ(Ai ) = ν(Ai ) for each i = 1, . . . , n, as desired. As (iii) trivially implies (ii), it remains to show that (ii) implies (i). Given any A and B, by (ii) there is µ ∈ core(ν) such that ν(A) = µ(A) and ν(B) = µ(B). Hence, ν(A ∩ B) + ν(A ∪ B) = µ(A ∩ B) + µ(A ∪ B) = µ(B) + µ(A) ≥ ν(B) + ν(A), where the last inequality follows from µ ∈ core(ν). We close with some characterizations of convexity through properties of subgames, thus providing an “hereditary” perspective on it. Theorem 4.8. For a bounded game ν, the following conditions are equivalent: (i) ν is convex; (ii) each subgame of ν is exact;
Introduction to the mathematics of ambiguity
81
(iii) ν is totally balanced and, given any A ⊆ B, each charge in core (νA ) has an extension belonging to core (νB ); (iv) ν is balanced and, given any finite subalgebra 0 of , each charge in core (ν0 ) has an extension on belonging to core (ν). The equivalence between (i) and (iv) is essentially due to Kelley (1959) (see also Delbaen, 1974: 218), that between (i) and (ii) to Biswas et al. (1999: 10), and that between (i) and (iii) to Einy and Shitovitz (1996: 197–199). The second part of condition (iii) is a property introduced by Kikuta and Shapley (1986). They call extendable the games satisfying this property for B = , which turns out to be useful in studying the von Neumann–Morgenstern stability of cores.11 The proof of Theorem 4.8 uses the following straightforward lemma. In this regard, notice that Schmeidler (1972: 219) gives an example of an exact game with four players that is not convex. Lemma 4.8. A finite game with at most three players is exact if and only if it is convex. Proof of Theorem 4.8. For convenience, we first prove the equivalence between (i)–(iii), and then that between (i) and (iv). (ii) implies (i). Given any A and B, consider the subgame νA∪B . By (ii), there is µ ∈ core(νA∪B ) such that µ(A ∩ B) = νA∪B (A ∩ B). Hence, ν(A ∪ B) + ν(A ∩ B) = νA∪B (A ∪ B) + νA∪B (A ∩ B) = µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) ≥ νA∪B (A) + νA∪B (B) = ν(A) + ν(B), as desired. (i) implies (iii). Since A ⊆ B, the space B0 (A ) of simple A -measurable functions can be regarded as a vector subspace of B0 (B ). Let µ ∈ core(νA ). Given any f ∈ B0+ (A ), it holds νA,c (f ) = νB,c (f ), where νA,c : B0 (A ) → R is the Choquet functional induced by the subgame νA (νB,c is similarly defined). Therefore, µ(f ) ≥ νB,c (f ) for all f ∈ B0+ (A ). Given any f ∈ B0 (A ), there is k > 0 large enough so that f +k1A ∈ B0+ (A ). Since µ(A) = ν(A), by Theorem 4.6 we have µ(f ) + kµ(A) = µ(f + k1A ) ≥ νB,c (f + k1A ) ≥ νB,c (f ) + kν(A) = νB,c (f ) + kµ(A). Hence, µ(f ) ≥ νB,c (f ). We conclude that µ(f ) ≥ νB,c (f ) for all f ∈ B0 (A ). By the Hahn–Banach Theorem, there exists a charge µ∗ : B → R which extends µ and such that µ∗ (f ) ≥ νB,c (f ) for all f ∈ B0 (B ). Hence, µ∗ ∈ core(νB ).
82
Massimo Marinacci and Luigi Montrucchio
(iii) implies (ii). Given any B, let νB be the associated subgame. Given any A ⊆ B, let µ ∈ core(νA ). By hypothesis, there is µ∗ ∈ core(νB ) that extends µ. Hence, µ∗ (A) = µ(A) = νA (A) = νB (A), which implies that νB is exact, as desired. To complete the proof it remains to show that (iv) is equivalent to (i). (i) implies (iv). Let µ ∈ core(ν0 ). By Theorem 4.7, ν ∈ bv(), and so νc : B0 () → R is superlinear by Theorem 4.6. By the Hahn–Banach Theorem, there is µ∗ ∈ ba() that extends µ and such that µ∗ (f ) ≥ νc (f ) for all f ∈ B0 (). Hence, µ∗ ∈ core(ν). (iv) implies (i). If µ ∈ core(ν), then its restriction µ0 on any subalgebra 0 belongs to core(ν0 ). Therefore, the fact that ν is balanced implies that core(ν0 ) = Ø for each 0 . In particular, (iv) implies that: core(ν0 ) = {µ0 ∈ ba(0 ) : µ ∈ core(ν)}.
(4.20)
Given any A, consider first the finite subalgebra 0 = {Ø, A, Ac , }. It is easy to see that there exists an element of core(ν0 ) that on A takes on value ν(A). By (4.20), this amounts to say that there exists µ ∈ core(ν) such that µ(A) = µ0 (A) = ν(A). We conclude that ν is exact. Given any A and B, consider the finite subalgebra 0 generated by the partition {A ∩ B, (A ∪ B)c , A ∪ B \ A ∩ B}. Let C ∈ 0 . As ν is exact, there is µ ∈ core(ν) such that µ(C) = ν(C). As µ0 ∈ core(ν0 ), we have µ0 (C) = ν0 (C) and so ν0 is exact. Since 0 is generated by a partition consisting of three elements, Lemma 4.8 then implies that ν0 is convex. Since A ∩ B, A ∪ B ∈ 0 , by Theorem 4.7 and (4.20) there is µ ∈ core(ν) such that µ0 (A ∩ B) = ν0 (A ∩ B) and µ0 (A ∪ B) = ν0 (A ∪ B). Hence, ν(A ∪ B) + ν(A ∩ B) = ν0 (A ∪ B) + ν0 (A ∩ B) = µ0 (A ∪ B) + µ0 (A ∩ B) = µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) ≥ ν(A) + ν(B), which shows that ν is convex.
4.6. Finite games 4.6.1. The space of finite games Games defined on finite spaces have some noteworthy peculiar properties, thanks to the special form of their domain. We devote this section to their study.12 Let be a finite set {ω1 , . . . , ωn } of n players and its power set. Vn denotes the space of all finite games on , which is a vector space under the setwise operations ν1 + ν2 and αν for ν1 , ν2 , ν ∈ Vn and α ∈ R. The next result, due to Shapley (1953: Lemma 3), shows the crucial importance of unanimity games, introduced in Example 4.4.
Introduction to the mathematics of ambiguity
83
Theorem 4.9. Unanimity games form a basis for the (2|| −1)-dimensional vector ν uA space Vn . For any ν ∈ Vn , the unique coefficients satisfying ν = Ø=A∈ αA are given by ν = (−1)|A|−|B| ν(B). (4.21) αA B⊆A
Proof. We first show that unanimity games are linearly independent in Vn . Suppose n α i=1 i uAi = θ , where θ is the trivial game such that θ (A) = 0 for each A. We want to show that αi = 0 for each i = 1, . . . , n. Suppose, per contra, that there is a subset I ⊆ {1, . . . , n} such that αi = 0 for each i ∈ I . As in Myerson (1991: 440), let i0 ∈ I be such that Ai0 is of minimal size among the coalitions {Ai }i∈I . By construction, αi = 0 for each i such that Ai Ai0 , so that 0=
n
αi uAi (Ai0 ) =
αi uAi (Ai0 ) = αi0 ,
{i : Ai ⊆Ai0 }
i=1
a contradiction. To complete the proof it remains to prove that, for each ν ∈ Vn , it holds ⎛ ⎞ ⎝ (−1)|A|−|B| ν(B)⎠ uA . ν= Ø=A∈
B⊆A
The needed combinatorial argument is detailed in, for example, Owen (1995: 263), to which we refer the reader. Example 4.7. For a charge µ we have µ(ω) if A = {ω} µ αA = 0 else That is, a game is additive if and only if all coefficients in (4.21) associated with non-singletons are zero. Example 4.8. Let f : R → R with f (0) = 0. Its first difference is f (n) = f (n+1)−f (n). By iteration, the k-order difference is k f = k−1 f . Consider the scalar measure game ν : 2{1,...,n} → R defined by ν(A) = f (|A|) for each A ⊆ {1, . . . , n}. The following holds: ν = |A| f (0). αA
To see why this is the case, observe that (4.21) implies m m ν f (m − 1) + f (m − 2) − · · · = αA = f (m) − 1 2 m m f (m − k), (−1)k = k k=0
(4.22)
84
Massimo Marinacci and Luigi Montrucchio
where we set |A| = m. Denote by I : N → N the identity operator and by S : N → N the shift operator defined by Sf (m) = f (m + 1). As = S − I , we have m m m−k m = (S − I )m = S . (−1)k k k=0
Hence m f (0) =
m k=0
(−1)k
m m m−k m S f (m − k), f (0) = (−1)k k k k=0
and so (4.22) holds. ν By Theorem 4.9, each game ν is uniquely determined by the coefficients {αA } given by (4.21). A natural question is whether there is a significant class of games identified by the requirement that all such coefficients be positive. Fortunately, Theorem 4.10 will show that there is such a class, which we now introduce. A game ν : → R is
14 monotone of order k (with k ≥ 2) if, for every A1 , . . . , Ak ∈ , ν
k i=1
! Ai
≥
{I : Ø =I ⊆{1,...,k}}
(−1)|I |+1 ν
! Ai ,
(4.23)
i∈I
15 totally monotone if it is positive and k-monotone for all k ≥ 2, 16 a belief function if it is a totally monotone probability. These definitions works for any algebra , not necessarily finite. For k = 2, we get back to convexity. Hence, totally monotone games are convex, though the converse is false. When ν is a charge, in (4.23) we have an equality. Totally monotone games are studied at length in Choquet (1953), and belief functions play a central role in the works of Dempster (1967, 1968) and Shafer (1976). They are also related to the theory of Mobius transforms pioneered by Rota (1964), as detailed in Chateauneuf and Jaffray (1989) and Grabisch et al. (2000) (see also Subsection 4.6.4). Example 4.9. All {0, 1}-valued convex games (e.g. unanimity games) are totally monotone (e.g. Marinacci, 1996: 1005) for a proof. Example 4.10. Let ( 1 , 1 , P1 ) be a probability space and 2 a finite space. A correspondence f : 1 → 2 2 is a random set if it is measurable, that is, f −1 (A) = {ω ∈ 1 : f (ω) ⊆ A} ∈ 1 for each A ⊆ 2 . Consider the distribution νf : 2 → R induced by a random set f , defined by νf (A) = P (f −1 (A)) for each A ∈ 2 . The distribution νf is a belief function (e.g. Nguyen, 1978). Random
Introduction to the mathematics of ambiguity
85
sets reduce to standard random variables when the images f (ω) are singletons; in this case, νf is the usual additive distribution induced by a random variable f . Under suitable topological conditions, random sets with values in infinite spaces
2 can also be considered (there is a large literature on them; e.g. Salinetti and Wets, 1986). We can now state the announced result. Theorem 4.10. Let ν be a game defined on the power set of a finite space . Then, the coefficients given by (4.21) are all positive if and only if ν is totally monotone. Remark. This theorem is essentially due to Dempster and Shafer (see Shafer, 1976). It has been extended to games on lattices by Gilboa and Lehrer (1991). ν = ν(A) ≥ 0. Proof. “If part”. Suppose ν is totally monotone. If |A| = 1, then αA Suppose |A| > 1 and set A = {ω1 , . . . , ωk } and Ai = A\{ωi } for each i = 1, . . . , k. We have ν = αA
(−1)|A|−|B| ν(B)
B⊆A
= ν(A) −
ν(Ai ) +
ν(Ai ∩ Aj ) − · · · + (−1)k ν(A1 ∩ · · · ∩ Ak )
i =j
i
= ν(A) −
(−1)|I |+1 ν
Ø=I ⊆{1,...,k}
As A = ν αA
k
i=1 Ai ,
=ν
k i=1
! Ai .
i∈I
we then have ! Ai
−
Ø=I ⊆{1,...,k}
(−1)
|I |+1
ν
! Ai ,
i∈I
ν so that αA ≥ 0, as desired. ν “If ” part. Suppose αA ≥ 0 for each Ø = A ∈ . By Example 4.9, each unanimity game u is totally monotone. Hence, by Theorem 4.9, we have ν = A ν α u . The positive linear combination of totally monotone game is clearly A Ø=A A totally monotone. We infer that ν is totally monotone as desired.
Example 4.11. Given a function f : N → R with f (0) = 0, each scalar measure game f (|A|) is totally monotone if and only if f is absolutely monotone à la Bernstein, that is, k f (n) ≥ 0 for each n and k (see Widder, 1941). By (4.22) and by Theorem 4.10, to prove this fact it is enough to show that f is absolutely
86
Massimo Marinacci and Luigi Montrucchio
monotone if and only if k f (0) ≥ 0 for each k. As S = [ + I ] = n k
n
k
n
k k r=1
k k = r+n , r r r
r=1
we get f (k) = n
k k r=1
r
r+n f (0) ≥ 0,
which gives the desired conclusion. Totally monotone games are therefore the convex cone of Vn consisting of all its elements featuring positive coefficients in (4.21). Denote this cone by Vn+ ; being pointed,13 it induces a partial order ! on Vn defined by ν ! ν if ν − ν ∈ Vn+ . In particular, ν ! θ if and only if ν is totally monotone, so that Vn+ = {ν ∈ Vn : ν ! θ }. The partial order ! makes Vn an ordered vector space. More is true, under the lattice operations ∨ and ∧ induced by !.14 Lemma 4.9. The ordered vector space (Vn , !) is a Riesz space with lattice operations given by ν1 ∨ ν2 =
ν1 ν2 )uA , (αA ∨ αA
Ø=A∈
and ν1 ∧ ν2 =
ν1 ν2 (αA ∧ αA )uA ,
Ø=A∈
for each ν1 and ν2 in Vn . Proof. We only prove the result for ν1 ∨ ν2 , as a similar argument can be used for ν1 ν2 ν1 ∧ ν2 . Set ν = Ø=A∈ (αA ∨ αA )uA . We want to show that ν = ν1 ∨ ν2 . First observe that, for i = 1, 2, νi ν1 ν2 ν − νi = [(αA ∨ αA ) − αA ]uA . Ø=A∈ νi ν1 ν2 Hence, by Theorem 4.10, ν − νi ∈ Vn+ , all coefficients [(αA ∨ αA ) − αA ] being positive. This shows that ν is an upper bound for {ν1 , ν2 }. It remains to show that ν for any game ν such that ν ! νi for it is the least such bound, that is, ν ! i = 1, 2.
Introduction to the mathematics of ambiguity
87
As ν − νi ∈ Vn+ , it holds
ν α A uA −
ν −νi α uA ∈ Vn+ . A
Ø =A∈
Ø =A∈
Ø=A∈
νi αA uA =
ν −νi νi ν By Theorem 4.10, α ≥ 0 for each A, and so α A A ≥ αA for each A. Therefore, ν1 ν2 ν α A ≥ αA ∨ αA for each A, and so the difference
ν − ν=
ν1 ν2 ν [α A − (αA ∨ αA )]uA
Ø=A∈
ν1 ν2 ν belongs to Vn+ by Theorem 4.10, all the coefficients α A −(αA ∨αA ) being positive. We conclude that ν ! νi for each i, as desired. ||
The Riesz space (Vn , !) is lattice isomorphic to the Euclidean space (R2 ||
Lemma 4.10. The function T : Vn → R2
−1
−1 , ≥).
defined by
ν T (ν) = (αA ) for all ν ∈ Vn ||
is a lattice preserving isomorphism between (Vn , !) and (R2
−1 , ≥).
ν Proof. By Theorem 4.9, the vector (αA ) is uniquely determined. Hence, T is one-to-one. Now, let ν1 , ν2 ∈ Vn and α, β ∈ R. By Theorem 4.9,
⎛ αν1 +βν2
T (αν1 + βν2 ) = (αA ⎛ = ⎝α
)=⎝
⎞ (−1)|A|−|B| (αν1 + βν2 )(B)⎠
B⊆A
(−1)|A|−|B| ν1 (B) + β
B⊆A
⎞ (−1)|A|−|B| ν2 (B)⎠
B⊆A
= αT (ν1 ) + βT (ν2 ), and so T is an isomorphism. Moreover, by Lemma 4.9, ||
−1
||
−1
ν1 ν2 T (ν1 ∨ ν2 ) = T (ν1 ) ∨ T (ν2 ) = (αA ∨ αA ) ∈ R2 ν1 ν2 ∧ αA ) ∈ R2 T (ν1 ∧ ν2 ) = T (ν1 ) ∧ T (ν2 ) = (αA
as desired.
, ,
88
Massimo Marinacci and Luigi Montrucchio By Lemma 4.9, the positive ν + and negative ν − parts of a game ν, defined by = ν ∨ 0 and ν − = −(ν ∧ 0), are given by ν (αA ∨ 0)uA , ν+ =
ν+
Ø=A∈
ν− =
ν (αA ∧ 0)uA .
Ø=A∈
The absolute value |ν|, defined by |ν| = ν + + ν − , is given by ν |αA |uA . |ν| = Ø=A∈
Notice that ν + ν + = T −1 ((αA ) ),
ν − ν − = T −1 ((αA ) ),
and
ν |ν| = T −1 (|αA |),
in accordance with Lemma 4.10. The associated norm · c is given by νc = |ν|( ) = ν + ( ) + ν − ( ) for each ν ∈ Vn , that is, ν |αA | = T (ν)1 . (4.24) νc = Ø=A∈
Following Gilboa and Schmeidler (1995), we call $\|\cdot\|_c$ the composition norm. It is an L-norm since $\|\nu_1 + \nu_2\|_c = \|\nu_1\|_c + \|\nu_2\|_c$ whenever $\nu_1$ and $\nu_2$ belong to $V_n^+$. As a result, $(V_n, \succeq, \|\cdot\|_c)$ is an AL-space. Since
$$\|\nu\|_c = \sum_{\emptyset \neq A \in \Sigma} |\alpha_A^{\nu}| = \|T(\nu)\|_1, \tag{4.25}$$
where $\|\cdot\|_1$ is the $l^1$-norm of $\mathbb{R}^{2^{|\Omega|}-1}$, the isomorphism $T$ is therefore an isometry between $(V_n, \|\cdot\|_c)$ and $(\mathbb{R}^{2^{|\Omega|}-1}, \|\cdot\|_1)$ (see note 15). Summing up,

Theorem 4.11. There is a lattice preserving and isometric isomorphism $T$ between the AL-spaces $(V_n, \succeq, \|\cdot\|_c)$ and $(\mathbb{R}^{2^{|\Omega|}-1}, \geq, \|\cdot\|_1)$ determined by the identity
$$\nu = \sum_{\emptyset \neq A \in \Sigma} \alpha_A u_A. \tag{4.26}$$
Moreover, $\nu$ is totally monotone if and only if the corresponding vector $(\alpha_A)$ in $\mathbb{R}^{2^{|\Omega|}-1}$ is nonnegative.

In other words, for each $\nu$ in $V_n$ there is a unique $(\alpha_A)$ in $\mathbb{R}^{2^{|\Omega|}-1}$ such that (4.26) holds; conversely, for each vector $(\alpha_A)$ in $\mathbb{R}^{2^{|\Omega|}-1}$ there is a unique $\nu$ in $V_n$ such that (4.26) holds. Moreover, the correspondence $T$ between $\nu$ and $(\alpha_A)$ is linear, lattice preserving, and isometric.

Consider the restriction of the partial order $\succeq$ on $ba(\Sigma)$, the vector subspace of $V_n$ consisting of charges. Since $ba^+(\Sigma) = V_n^+ \cap ba(\Sigma)$, given any $\mu_1$ and $\mu_2$ in $ba(\Sigma)$, we have $\mu_1 \succeq \mu_2$ if and only if $\mu_1 - \mu_2 \in ba^+(\Sigma)$. Equivalently, $\mu_1 \succeq \mu_2$ if and only if $\mu_1 \geq \mu_2$ setwise, that is, $\mu_1(A) \geq \mu_2(A)$ for each $A$. This is the standard partial order studied on $ba(\Sigma)$ (see Rao and Rao, 1983), which can therefore be viewed as the restriction of $\succeq$ on $ba(\Sigma)$. As a result, the standard lattice structure on $ba(\Sigma)$ coincides with the one it inherits as a subspace of $(V_n, \succeq)$. In particular, on $ba(\Sigma)$ the norm $\|\cdot\|_c$ reduces to the total variation norm $\|\cdot\|$. All this shows that the standard structures on $ba(\Sigma)$ studied in measure theory are consistent with the ones we have identified on $V_n$ so far. In the sequel we will denote by $\succeq_{ba}$ the restriction of $\succeq$ on $ba(\Sigma)$.

4.6.2. A decomposition

The lattice structure of $V_n$ suggests the possibility of achieving a decomposition à la Riesz for finite games. Given the close connection between $\|\cdot\|_c$ and the $l^1$-norm $\|\cdot\|_1$ established in Theorem 4.11, it is natural to expect that such a decomposition would resemble the one available for the familiar $l^1$-norm. For this reason, we first recall a simple decomposition result for the $l^1$-norm.

Lemma 4.11. Given any vector $z \in \mathbb{R}^n$, the vectors $z^+$ and $z^-$ are the unique vectors in $\mathbb{R}^n_+$ such that
$$z = z^+ - z^-, \tag{4.27}$$
and
$$\|z\|_1 = \|z^+\|_1 + \|z^-\|_1. \tag{4.28}$$
Proof. Clearly, the decomposition $z = z^+ - z^-$ satisfies (4.27) and (4.28). Suppose $x, y \in \mathbb{R}^n_+$ satisfy (4.27). We want to show that $x \geq z^+$ and $y \geq z^-$. As $x = z + y$ and $x \geq 0$, we have $x \geq z \vee 0 = z^+$. Likewise, $y = x - z \geq -z$ and $y \geq 0$ imply $y \geq (-z) \vee 0 = z^-$. Hence, $\|x\|_1 \geq \|z^+\|_1$ and $\|y\|_1 \geq \|z^-\|_1$. If $x$ and $y$ also satisfy (4.28), then $\|x\|_1 + \|y\|_1 = \|z\|_1 = \|z^+\|_1 + \|z^-\|_1$, so we must have $\|x\|_1 = \|z^+\|_1$ and $\|y\|_1 = \|z^-\|_1$. Since $x \geq z^+ \geq 0$ and $y \geq z^- \geq 0$, this forces $x = z^+$ and $y = z^-$.

Lemma 4.11 leads to the following decomposition result, which generalizes to our finite setting the Jordan Decomposition Theorem for charges. Versions of this result for finite and infinite games have been proved by Revuz (1955), Gilboa and Schmeidler (1994, 1995), and Marinacci (1996).
Theorem 4.12. Given any $\nu \in V_n$, the games $\nu^+$ and $\nu^-$ are the unique totally monotone games such that
$$\nu = \nu^+ - \nu^- \tag{4.29}$$
and
$$\|\nu\|_c = \|\nu^+\|_c + \|\nu^-\|_c. \tag{4.30}$$

Proof. Let $\nu_1$ and $\nu_2$ be any two games in $V_n^+$ satisfying (4.29) and (4.30). Then, the positive vectors $T(\nu_1)$ and $T(\nu_2)$ of $\mathbb{R}^{2^{|\Omega|}-1}_+$ are such that
$$T(\nu) = T(\nu_1) - T(\nu_2)$$
and
$$\|T(\nu)\|_1 = \|T(\nu_1)\|_1 + \|T(\nu_2)\|_1.$$
By Lemma 4.11, $T(\nu_1) = T(\nu)^+ = T(\nu^+)$ and $T(\nu_2) = T(\nu)^- = T(\nu^-)$. Since $T$ is an isomorphism, we conclude that $\nu_1 = \nu^+$ and $\nu_2 = \nu^-$, as desired.
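Since $T$ carries the decomposition to coordinatewise positive and negative parts, the decomposition of Theorem 4.12 can be computed directly from the Möbius coefficients. A minimal sketch, reusing the hypothetical `subsets` and `mobius_transform` helpers from the earlier block:

```python
def jordan_decomposition(nu, omega):
    """Split nu into totally monotone parts nu = nu_plus - nu_minus by taking
    positive and negative parts of its Mobius coefficients (Theorem 4.12)."""
    alpha = mobius_transform(nu, omega)

    def game_from_coeffs(coeffs):
        # nu(E) = sum of coefficients over A contained in E, since u_A(E) = 1 iff A <= E.
        return {frozenset(E): sum(c for A, c in coeffs.items() if A <= frozenset(E))
                for E in map(frozenset, subsets(omega))}

    nu_plus = game_from_coeffs({A: max(c, 0.0) for A, c in alpha.items()})
    nu_minus = game_from_coeffs({A: max(-c, 0.0) for A, c in alpha.items()})
    return nu_plus, nu_minus
```

For the running example $\nu(A) = |A|^2$ all coefficients are nonnegative, so $\nu^- = 0$ and $\nu = \nu^+$.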
4.6.3. Additive representation

By Theorem 4.9, each finite game $\nu$ can be uniquely written as
$$\nu = \sum_{\emptyset \neq A \in \Sigma} \alpha_A^{\nu} u_A. \tag{4.31}$$
Let $\hat{\Sigma}$ be the collection of all nonempty sets of $\Sigma$, that is, $\hat{\Sigma} = \{A \in \Sigma : A \neq \emptyset\}$. The collection $\hat{\Sigma}$ can be viewed as a new space, whose "points" are the nonempty sets of $\Sigma$. By identifying $\Omega$ with the collection of all singletons $\{\{\omega\} : \omega \in \Omega\}$, we can actually view the space $\hat{\Sigma}$ as an enlargement of the original space $\Omega$.

Define on the power set $2^{\hat{\Sigma}}$ of the space $\hat{\Sigma}$ a charge $\mu_\nu$ as follows: $\mu_\nu(A) = \alpha_A^{\nu}$ for each $A \in \hat{\Sigma}$. By additivity, this is enough to define the charge $\mu_\nu$ on the entire power set $2^{\hat{\Sigma}}$. For example, for the set $\{A, B\} \in 2^{\hat{\Sigma}}$ we have $\mu_\nu(\{A, B\}) = \alpha_A^{\nu} + \alpha_B^{\nu}$; more generally, given any collection $\{A_1, \ldots, A_n\} \in 2^{\hat{\Sigma}}$, we have $\mu_\nu(\{A_1, \ldots, A_n\}) = \sum_{i=1}^n \alpha_{A_i}^{\nu}$.

Each game $\nu$ is thus associated with a charge $\mu_\nu$ on $2^{\hat{\Sigma}}$. Denote by $I : V_n \to ba(2^{\hat{\Sigma}})$ this correspondence $\nu \mapsto \mu_\nu$, which is well defined by Theorem 4.9. It is also linear, that is, $I(\alpha\nu_1 + \beta\nu_2) = \alpha I(\nu_1) + \beta I(\nu_2)$ for all $\alpha, \beta \in \mathbb{R}$ and all $\nu_1, \nu_2 \in V_n$.

The linear correspondence $I$ provides some noteworthy insights into Choquet integrals. To see why, given a set $E$, consider the function $\hat{1}_E : \hat{\Sigma} \to \mathbb{R}$ defined by
$$\hat{1}_E(A) = \int 1_E \, du_A = u_A(E) = \begin{cases} 1 & A \subseteq E \\ 0 & \text{else} \end{cases}$$
for each $A$. If we set $\hat{E} = \{A \in \hat{\Sigma} : A \subseteq E\}$, then $\hat{1}_E = 1_{\hat{E}}$. That is, $\hat{1}_E$ is a characteristic function on the enlarged space $\hat{\Sigma}$.
Using $\mu_\nu$ and $\hat{1}_E$, we can rewrite (4.31) as
$$\nu(E) = \sum_{A \in \hat{\Sigma}} \hat{1}_E(A)\mu_\nu(A) = \int \hat{1}_E \, d\mu_\nu = \int 1_{\hat{E}} \, d\mu_\nu = \mu_\nu(\hat{E})$$
for each $E \in \Sigma$. Equivalently,
$$\int 1_E \, d\nu = \int \hat{1}_E \, d\mu_\nu \quad \text{for each } E \in \Sigma. \tag{4.32}$$
Therefore, thanks to the linear correspondence $I$, we can represent the Choquet integral $\int 1_E \, d\nu$ as a standard additive integral on the enlarged space $\hat{\Sigma}$. In this extended domain, the set $E$ of $\Sigma$ is replaced by the set $\hat{E} = \{A \in \hat{\Sigma} : A \subseteq E\}$ of $2^{\hat{\Sigma}}$. We call $\int \hat{1}_E \, d\mu_\nu$ the additive representation of $\int 1_E \, d\nu$.

In a sense, (4.32) says that the Choquet integral $\int 1_E \, d\nu$ can be viewed as a "zipped" version of the additive integral $\int \hat{1}_E \, d\mu_\nu$. The trade-off here is between a more economical domain, that is, $(\Omega, \Sigma)$ rather than $(\hat{\Sigma}, 2^{\hat{\Sigma}})$, and a better behaved integral, that is, the additive integral rather than the non-additive one. In any case, to compute both representations we need to know the $2^{|\Omega|}-1$ values of $\nu$ and $\mu_\nu$, respectively; hence, both representations involve the same amount of information, though processed in different ways.

Next we formally collect the relevant properties of the additive representation. Observe that the correspondence $I$ is actually an isomorphism (see note 16).

Theorem 4.13. There is a lattice preserving and isometric isomorphism $I$ between the AL-spaces $(V_n, \succeq, \|\cdot\|_c)$ and $(ba(2^{\hat{\Sigma}}), \succeq_{ba}, \|\cdot\|)$ determined by the identity
$$\nu(E) = \mu(\hat{E}) \quad \text{for each } E \in \Sigma. \tag{4.33}$$
The game $\nu$ is totally monotone if and only if the corresponding $\mu$ is nonnegative.

Versions of this result for finite and infinite games can be found in Revuz (1955), Gilboa and Schmeidler (1994, 1995), Marinacci (1996), and Philippe et al. (1999). Denneberg (1997) provides an overview and alternative proofs of some of these results.
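Computationally, (4.33) is a change of indexing: the charge $\mu_\nu$ assigns each point $A \in \hat{\Sigma}$ its Möbius coefficient, and $\nu(E)$ is recovered by summing $\mu_\nu$ over $\hat{E}$, the nonempty subsets of $E$. A minimal sketch, continuing the running example (the helper names are ours):

```python
def game_from_charge(mu, E):
    """Recover nu(E) = mu_nu(E-hat): sum the charge over all nonempty A contained in E."""
    return sum(c for A, c in mu.items() if A <= frozenset(E))

mu_nu = mobius_transform(nu, omega)  # the charge on the enlarged space Sigma-hat
assert all(abs(game_from_charge(mu_nu, E) - nu[frozenset(E)]) < 1e-9
           for E in map(frozenset, subsets(omega)))
```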
Proof. Given a charge $\mu \in ba(2^{\hat{\Sigma}})$, the set function $\nu$ on $\Sigma$ defined by (4.33) is clearly a game. As to the converse, the charge $\mu_\nu$ defined earlier belongs to $ba(2^{\hat{\Sigma}})$ and satisfies (4.33). It is also the unique charge in $ba(2^{\hat{\Sigma}})$ satisfying (4.33). In fact, let $\mu$ be any other charge in $ba(2^{\hat{\Sigma}})$ satisfying (4.33). Consider the collection $\hat{\Sigma}^* = \{\hat{E} : E \in \Sigma\}$ of subsets of $\hat{\Sigma}$. As
$$\widehat{E_1 \cap E_2} = \hat{E}_1 \cap \hat{E}_2 \quad \text{and} \quad \widehat{E_1 \cup E_2} \supseteq \hat{E}_1 \cup \hat{E}_2,$$
the collection $\hat{\Sigma}^*$ is, in general, only closed under intersections, that is, it is a $\pi$-class (see Aliprantis and Border, 1999: 132). As $\mu$ and $\mu_\nu$ coincide on a $\pi$-class, they coincide on the algebra $\mathcal{A}(\hat{\Sigma}^*)$ generated by $\hat{\Sigma}^*$ (see Aliprantis and Border, 1999: Theorem 9.10). But, $\mathcal{A}(\hat{\Sigma}^*)$ coincides with the power set $2^{\hat{\Sigma}}$ of $\hat{\Sigma}$. For, $\mathcal{A}(\hat{\Sigma}^*)$ contains all singletons: given $A \in \hat{\Sigma}$, we have $\{A\} = \hat{A} \setminus \bigcup_{\omega \in A} \widehat{A - \omega}$. As a result, $\mu$ and $\mu_\nu$ coincide on the power set $2^{\hat{\Sigma}}$, thus proving that $\mu_\nu$ is the unique charge in $ba(2^{\hat{\Sigma}})$ satisfying (4.33).

All this shows that the linear correspondence $I$ we introduced above is an isomorphism between $V_n$ and $ba(2^{\hat{\Sigma}})$. It is also an isometry: the equality $\|I(\nu)\| = \|\nu\|_c$ follows from
$$\|\mu_\nu\| = \sum_{A \in \hat{\Sigma}} |\mu_\nu(A)| = \sum_{A \in \hat{\Sigma}} |\alpha_A^{\nu}| = \|\nu\|_c.$$
It remains to show that $I$ is lattice preserving. We will only consider $\vee$, the argument for $\wedge$ being similar. For each $A \in \hat{\Sigma}$, we have
$$\mu_{\nu_1 \vee \nu_2}(A) = \alpha_A^{\nu_1 \vee \nu_2} = \alpha_A^{\nu_1} \vee \alpha_A^{\nu_2} = \max\{\mu_{\nu_1}(A), \mu_{\nu_2}(A)\} = (\mu_{\nu_1} \vee \mu_{\nu_2})(A),$$
as desired (the last equality holds because $A$ is a singleton when viewed as a member of $\hat{\Sigma}$).

The additive representation is not limited to integrals of characteristic functions, but it holds for all functions in $B(\Sigma)$. To see why this is the case, observe that the additivity of the Riemann integral immediately implies that the Choquet integral is linear on games, that is, $\int f \, d(\nu_1 + \nu_2) = \int f \, d\nu_1 + \int f \, d\nu_2$ for any $\nu_1$ and $\nu_2$ in $V_n$. Therefore, (4.31) implies that
$$\int f \, d\nu = \int f \, d\left(\sum_{A \in \hat{\Sigma}} \alpha_A^{\nu} u_A\right) = \sum_{A \in \hat{\Sigma}} \alpha_A^{\nu} \int f \, du_A \tag{4.34}$$
for each $f \in B(\Sigma)$. Define a function $\hat{f} : \hat{\Sigma} \to \mathbb{R}$ by
$$\hat{f}(A) = \int f \, du_A \quad \text{for each } A \in \hat{\Sigma}. \tag{4.35}$$
As $\int f \, du_A = \min_{\omega \in A} f(\omega)$ (see Example 4.4), it actually holds
$$\hat{f}(A) = \min_{\omega \in A} f(\omega) \quad \text{for each } A \in \hat{\Sigma}.$$
By (4.34) we have
$$\int f \, d\nu = \sum_{A \in \hat{\Sigma}} \alpha_A^{\nu} \hat{f}(A) = \sum_{A \in \hat{\Sigma}} \alpha_A^{\nu} \min_{\omega \in A} f(\omega) = \int \hat{f} \, d\mu_\nu,$$
and so the representation (4.34) can be written as
$$\int f \, d\nu = \int \hat{f} \, d\mu_\nu \quad \text{for each } f \in B(\Sigma). \tag{4.36}$$
This is the desired extension of (4.32) to all functions in $B(\Sigma)$. In fact, if $f = 1_E$, we have $\hat{f} = 1_{\hat{E}}$, and so (4.36) reduces to (4.32) for characteristic functions.
Summing up, the additive representation of the Choquet integral $\int f \, d\nu$ is given by
$$\int \hat{f} \, d\mu_\nu = \int \min_{\omega \in A} f(\omega) \, d\mu_\nu(A) = \sum_{A \in \hat{\Sigma}} \alpha_A^{\nu} \min_{\omega \in A} f(\omega).$$
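Identity (4.36) is easy to verify numerically: the Choquet integral computed from its definition must equal $\sum_{A \in \hat{\Sigma}} \alpha_A^{\nu} \min_{\omega \in A} f(\omega)$. A sketch for nonnegative $f$, under the same hypothetical conventions as the earlier blocks:

```python
def choquet_integral(f, nu, omega):
    """Choquet integral of a nonnegative f: sum (t_i - t_{i+1}) * nu({f >= t_i})
    over the distinct values t_1 > t_2 > ... of f, with t_{k+1} = 0."""
    values = sorted({f[w] for w in omega}, reverse=True)
    integral = 0.0
    for i, t in enumerate(values):
        upper = frozenset(w for w in omega if f[w] >= t)
        t_next = values[i + 1] if i + 1 < len(values) else 0.0
        integral += (t - t_next) * nu[upper]
    return integral

def choquet_via_mobius(f, nu, omega):
    """Additive representation (4.36): sum of alpha_A * min_{w in A} f(w)."""
    alpha = mobius_transform(nu, omega)
    return sum(c * min(f[w] for w in A) for A, c in alpha.items())

f = {1: 3.0, 2: 1.0, 3: 2.0}
# Both routes give 14 for the running example nu(A) = |A|^2.
assert abs(choquet_integral(f, nu, omega) - choquet_via_mobius(f, nu, omega)) < 1e-9
```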
Theorem 4.13 can be extended from games to Choquet integrals along these lines. In order to do so, consider the space $V_n^c$ of all Choquet functionals on $B(\Sigma)$. It is a vector space since $(\alpha\nu + \beta\nu')_c = \alpha\nu_c + \beta\nu'_c$ for all $\nu, \nu' \in V_n$ and all $\alpha, \beta \in \mathbb{R}$. By the next result, $V_n^c$ is isomorphic to the dual space $B(2^{\hat{\Sigma}})^*$ of $B(2^{\hat{\Sigma}})$ (see note 17).

Corollary 4.3. There is an isomorphism between the vector spaces $V_n^c$ and $B(2^{\hat{\Sigma}})^*$ determined by the identity
$$\nu_c(f) = \mu(\hat{f}) \quad \text{for each } f \in B(\Sigma). \tag{4.37}$$
In particular, $I(\nu) = \mu$, where $\nu$ is the game associated to $\nu_c$ and $I$ is the isomorphism of Theorem 4.13.

Remark. For convenience, here $\mu$ denotes the linear functional in $B(2^{\hat{\Sigma}})^*$ given by $f \mapsto \int f \, d\mu$ for each $f \in B(2^{\hat{\Sigma}})$.

Proof. We first show that given any $\mu \in B(2^{\hat{\Sigma}})^*$, the functional $\nu_c : B(\Sigma) \to \mathbb{R}$ defined by (4.37) is comonotonic additive. Observe that, given any two comonotonic $f_1$ and $f_2$ in $B(\Sigma)$, it holds
$$\widehat{(f_1 + f_2)}(A) = \min_{\omega \in A}(f_1 + f_2)(\omega) = \min_{\omega \in A} f_1(\omega) + \min_{\omega \in A} f_2(\omega) = \hat{f}_1(A) + \hat{f}_2(A) \tag{4.38}$$
for each $A \in \hat{\Sigma}$. Hence,
$$\nu_c(f_1 + f_2) = \mu(\widehat{f_1 + f_2}) = \mu(\hat{f}_1 + \hat{f}_2) = \mu(\hat{f}_1) + \mu(\hat{f}_2) = \nu_c(f_1) + \nu_c(f_2),$$
and so $\nu_c$ is comonotonic additive, as desired. It remains to prove that, given any Choquet functional $\nu_c \in V_n^c$, the functional $\mu$ defined by (4.37) is linear on $B(2^{\hat{\Sigma}})$. By Theorem 4.13, Equation (4.37) uniquely determines a charge $\mu$ on the power set $2^{\hat{\Sigma}}$. Hence, the associated linear functional $f \mapsto \int f \, d\mu$ belongs to $B(2^{\hat{\Sigma}})^*$, as desired.
4.6.4. Polynomial representation

A further possible way to represent finite games is in terms of polynomials. Consider the set $\{0, 1\}^n$ of the vertices of the hypercube $[0, 1]^n$. Functions $f : \{0, 1\}^n \to \mathbb{R}$ are called pseudo-Boolean (see Boros and Hammer, 2002 and Grabisch et al., 2000). Say that a pseudo-Boolean function $f$ is grounded if $f(0, \ldots, 0) = 0$.

Finite games can be regarded as grounded pseudo-Boolean functions. In fact, w.l.o.g. set $\Omega = \{1, \ldots, n\}$ and $\Sigma = 2^{\{1,\ldots,n\}}$, so that $V_n$ is the set of all games $\nu : 2^{\{1,\ldots,n\}} \to \mathbb{R}$. Given $A \subseteq \{1, \ldots, n\}$, consider the characteristic vector $1_A \in \{0, 1\}^n$ given by
$$1_A(i) = \begin{cases} 1 & i \in A \\ 0 & \text{else} \end{cases}$$
Since $\{0, 1\}^n = \{1_A : A \subseteq \{1, \ldots, n\}\}$, each game $\nu$ uniquely determines a grounded pseudo-Boolean function $f$ by setting $f(1_A) = \nu(A)$ for each $A \subseteq \{1, \ldots, n\}$. Conversely, each grounded pseudo-Boolean function $f$ induces a game $\nu : 2^{\{1,\ldots,n\}} \to \mathbb{R}$ by setting $\nu(A) = f(1_A)$ for each $A \subseteq \{1, \ldots, n\}$.

Given a pseudo-Boolean function $f$, consider the polynomial
$$B_f(x) = \sum_{A \subseteq \{1,\ldots,n\}} f(1_A) \prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \quad \text{for each } x \in \mathbb{R}^n. \tag{4.39}$$
This polynomial is an extension of $f$ on $\mathbb{R}^n$ as $B_f(1_A) = f(1_A)$ for each $A \subseteq \{1, \ldots, n\}$. More important, $B_f$ is a Bernstein polynomial of $f$. For, recall (see Schultz, 1969) that given a function $f : [0, 1]^n \to \mathbb{R}$ and an $n$-tuple $m = (m_1, m_2, \ldots, m_n)$ with non-negative integer components, its Bernstein polynomial $B^m f : \mathbb{R}^n \to \mathbb{R}$ is
$$B^m f(x) = \sum_{k_1=0}^{m_1} \sum_{k_2=0}^{m_2} \cdots \sum_{k_n=0}^{m_n} f\left(\frac{k_1}{m_1}, \ldots, \frac{k_n}{m_n}\right) \prod_{i=1}^n \binom{m_i}{k_i} x_i^{k_i} (1 - x_i)^{m_i - k_i}.$$
In particular, the least-degree Bernstein polynomial $B^{(1,\ldots,1)} f : \mathbb{R}^n \to \mathbb{R}$ associated with $f$ is given by
$$B^{(1,\ldots,1)} f(x) = \sum_{k = (k_1, \ldots, k_n) \in \{0,1\}^n} f(k) \, x_1^{k_1} \cdots x_n^{k_n} (1 - x_1)^{1-k_1} \cdots (1 - x_n)^{1-k_n}.$$
To define $B^{(1,\ldots,1)}$ we only need to know the values of $f$ at the vertices $\{0, 1\}^n$, and this makes it possible to associate $B^{(1,\ldots,1)}$ to any pseudo-Boolean function. The polynomial (4.39) is, therefore, the least-degree Bernstein polynomial $B^{(1,\ldots,1)}$ of $f : \{0, 1\}^n \to \mathbb{R}$. When $f$ is grounded, the polynomial $B_f$ is multilinear, that is, it is linear in each variable $x_i$. In particular, $B_f$ is the unique multilinear extension of $f$ on $\mathbb{R}^n$
and it can also be written as
$$B_\nu(x) = \sum_{\emptyset \neq A \in \Sigma} \nu(A) \prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \quad \text{for each } x \in \mathbb{R}^n, \tag{4.40}$$
where $\nu$ is the game associated with the grounded function $f$. The polynomial $B_\nu$ is called the Owen multilinear extension of the game $\nu$, and it was introduced by Owen (1972). In view of our previous discussion, $B_\nu$ is the least-degree Bernstein polynomial of the grounded pseudo-Boolean function induced by the game.

Example 4.12. Consider the game $\nu \in V_n$ given by $\nu(A) = |A|^2$ for each $A \subseteq \{1, \ldots, n\}$. We have
$$B_\nu(x) = \sum_{i=1}^n x_i + 2 \sum_{i < j} x_i x_j.$$
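Formula (4.40) can be evaluated directly from the game's values; a minimal sketch, again with our hypothetical helpers and the running example $\nu(A) = |A|^2$ on $\{1, 2, 3\}$:

```python
from math import prod

def owen_extension(nu, omega, x):
    """Evaluate B_nu(x) = sum_A nu(A) * prod_{i in A} x_i * prod_{j not in A} (1 - x_j),
    where x maps each element of omega to a real number."""
    return sum(nu[A]
               * prod(x[i] for i in A)
               * prod(1 - x[j] for j in omega if j not in A)
               for A in map(frozenset, subsets(omega)) if A)

# At a vertex x = 1_A the extension returns nu(A), as expected of an extension:
x = {1: 1.0, 2: 1.0, 3: 0.0}   # the vertex 1_{1,2}
assert abs(owen_extension(nu, omega, x) - 4.0) < 1e-9
```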
Denote by $P_n$ the vector space of all multilinear polynomials on $\mathbb{R}^n$. The next result provides two bases for $P_n$ and a formula for the relative change of basis.

Theorem 4.14. The monomials $\prod_{i \in A} x_i$ form a basis for the $(2^n - 1)$-dimensional vector space $P_n$, as do the polynomials $\prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$. Given $P \in P_n$, if
$$P(x) = \sum_{\emptyset \neq A \in \Sigma} \alpha_A \prod_{i \in A} x_i = \sum_{\emptyset \neq A \in \Sigma} \beta_A \prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i),$$
then
$$\alpha_A = \sum_{B \subseteq A} (-1)^{|A|-|B|} \beta_B. \tag{4.41}$$

Proof. Each multilinear polynomial $P$ can be written as a linear combination $\sum_{\emptyset \neq A \in \Sigma} \alpha_A \prod_{i \in A} x_i$ of monomials. Let us prove that such a combination is unique. As in Boros and Hammer (2002), we proceed by induction on the size of the subsets $A$. Begin with $|A| = 1$. In this case $\alpha_A = P(1_A)$, and so the coefficient $\alpha_A$ is uniquely determined. Assume next that all $\alpha_A$, with $|A| \leq k - 1$, are uniquely determined. Let $A$ be such that $|A| = k$. Since $P(1_A) = \sum_{B \subseteq A} \alpha_B$, we have
$$\alpha_A = P(1_A) - \sum_{B \subsetneq A} \alpha_B.$$
The coefficient $\alpha_A$ is then uniquely determined as all coefficients $\alpha_B$ are uniquely determined by the induction hypothesis. We conclude that the monomials are a basis for $P_n$. As there are $2^n - 1$ monomials, the space $P_n$ has dimension $2^n - 1$.
There are $2^n - 1$ polynomials of the form $\prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$. Hence, they form a basis provided they are linearly independent. To see that this is the case, suppose
$$P(x) \equiv \sum_{\emptyset \neq A \in \Sigma} \beta_A \prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i) = 0 \quad \text{for each } x \in \mathbb{R}^n.$$
Then $P(1_A) = 0$ for each $A$, and so $\beta_A = 0$. This shows that these polynomials are linearly independent, and so a basis. It remains to prove (4.41). Since $P(1_A) = \sum_{B \subseteq A} \alpha_B$ for each $A \subseteq \{1, \ldots, n\}$, we can obtain (4.41) by using a combinatorial argument that can be found in Shafer (1976: 48) and Chateauneuf and Jaffray (1989: Lemma 2.3).

Remark. Consider the function $M : P_n \to P_n$ given by
$$M(P)(1_A) = \sum_{B \subseteq A} (-1)^{|A|-|B|} P(1_B) \tag{4.42}$$
for each index set $A \subseteq \{1, \ldots, n\}$. This is the Möbius transform on $P_n$ and, by (4.41), it can be viewed as a change of basis formula.

By Theorem 4.14, the polynomials $\prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$ form a basis for $P_n$ and so each multilinear polynomial can be represented as in (4.39) and viewed as the least-degree Bernstein polynomial of a suitable grounded pseudo-Boolean function. Equivalently, each multilinear polynomial can be viewed as the Owen polynomial of a suitable game. Moreover, by Theorem 4.14 we can represent the polynomial $B_f$ of a grounded $f$ in a unique way as
$$B_f(x) = \sum_{\emptyset \neq A \in \Sigma} \left(\sum_{B \subseteq A} (-1)^{|A|-|B|} f(1_B)\right) \prod_{i \in A} x_i. \tag{4.43}$$
Hence, the relative Owen polynomial can be uniquely written as
$$B_\nu(x) = \sum_{\emptyset \neq A \in \Sigma} \left(\sum_{B \subseteq A} (-1)^{|A|-|B|} \nu(B)\right) \prod_{i \in A} x_i.$$
Let us get back to finite games. Denote by $B$ the Owen correspondence $\nu \mapsto B_\nu$ between $V_n$ and $P_n$. The next lemma collects a few simple properties of $B$. Here $e_A$ denotes the game in $V_n$ given by
$$e_A(B) = \begin{cases} 1 & A = B \\ 0 & \text{else} \end{cases}$$
The family $\{e_A\}_{\emptyset \neq A \in \Sigma}$ is clearly a basis in $V_n$, and any game $\nu$ can be represented by $\nu = \sum_{\emptyset \neq A \in \Sigma} \nu(A) e_A$.
Lemma 4.12. The Owen correspondence $B$ is an isomorphism between the vector spaces $V_n$ and $P_n$. Moreover,

(i) for each unanimity game $u_A$, we have
$$B_{u_A}(x) = \prod_{i \in A} x_i \quad \text{for each } x \in \mathbb{R}^n;$$
(ii) for each game $e_A$, we have
$$B_{e_A}(x) = \prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i) \quad \text{for each } x \in \mathbb{R}^n;$$
(iii) for each charge $\mu$, we have
$$B_\mu(x) = \sum_{i=1}^n \mu(i) x_i \quad \text{for each } x \in \mathbb{R}^n;$$
(iv) a game $\nu$ is positive if and only if $B_\nu(x) \geq 0$ for all $x \in [0, 1]^n$;
(v) a game $\nu$ is convex if, for each $i \neq j$,
$$\frac{\partial^2 B_\nu(x)}{\partial x_i \partial x_j} \geq 0 \quad \text{for all } x \in (0, 1)^n.$$

Proof. By Theorem 4.14, $B$ is one-to-one. As it is also linear, $B$ is an isomorphism between the vector spaces $V_n$ and $P_n$.

Let us prove (i). As $\prod_{i \in A} x_i \in P_n$ and $B$ is a linear isomorphism, there exists a unique game $\nu$ such that $B_\nu(x) = \prod_{i \in A} x_i$. As $\nu(B) = B_\nu(1_B)$, we have $\nu(B) = 1$ if $B \supseteq A$ and $\nu(B) = 0$ elsewhere. Hence, $\nu = u_A$.

As (ii) is trivially true, let us prove (iii). By Example 4.7, $\mu = \sum_{i=1}^n \mu(i)\delta_i$, where $\delta_i$ is the Dirac charge concentrated on $i \in \Omega$. By the linearity of $B$ and by point (i),
$$B_\mu(x) = B_{\sum_{i=1}^n \mu(i)\delta_i}(x) = \sum_{i=1}^n \mu(i) B_{\delta_i}(x) = \sum_{i=1}^n \mu(i) x_i,$$
as desired.

(iv) If $\nu \geq 0$, the Owen polynomial (4.40) has all nonnegative coefficients. As $\prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \geq 0$ on $[0, 1]^n$, we then have $B_\nu(x) \geq 0$ on $[0, 1]^n$. The converse is obvious as $\nu(A) = B_\nu(1_A) \geq 0$.

(v) This condition on the second derivatives implies that $B_\nu$ is supermodular on $(0, 1)^n$. As it is continuous, $B_\nu$ is then supermodular on the hypercube $[0, 1]^n$. In turn this implies the convexity of $\nu$.

Lemma 4.12(i) shows that unanimity games are the game counterpart of monomials. By Theorem 4.14, monomials form a basis of the space $P_n$ of multilinear polynomials. As a result, Theorem 4.9 can be viewed as a corollary of Theorem 4.14, and the representation (4.21) as a consequence of the polynomial representation (4.43).

Remark. As we did in $P_n$ with (4.42), here as well we can define a Möbius transform $M : V_n \to V_n$ by
$$M(\nu)(A) = \sum_{B \subseteq A} (-1)^{|A|-|B|} \nu(B)$$
for each $A \subseteq \{1, \ldots, n\}$. The Möbius transform on $V_n$ can be viewed as a change of basis formula, between the basis $\{e_A\}_{\emptyset \neq A \in \Sigma}$ and the basis $\{u_A\}_{\emptyset \neq A \in \Sigma}$.

The next result completes Lemma 4.12 by showing what is the polynomial counterpart of total monotonicity.

Lemma 4.13. A game $\nu$ is totally monotone if and only if its Owen polynomial $B_\nu$ is nonnegative on $\mathbb{R}^n_+$, that is, $B_\nu(x) \geq 0$ for each $x \in \mathbb{R}^n_+$.

Proof. Suppose $\nu$ is totally monotone. By Lemma 4.12, we can write
$$B_\nu(x) = \sum_{\emptyset \neq A \in \Sigma} \alpha_A^{\nu} B_{u_A}(x) = \sum_{\emptyset \neq A \in \Sigma} \alpha_A^{\nu} \prod_{i \in A} x_i.$$
Hence, if $\nu$ is totally monotone, then $B_\nu(x) \geq 0$ for all $x \in \mathbb{R}^n_+$. Conversely, assume $B_\nu(x) \geq 0$ for all $x \in \mathbb{R}^n_+$. We want to show that $\nu$ is totally monotone, that is, $\alpha_A^{\nu} \geq 0$ for each $A$. Suppose, per contra, that $\alpha_A^{\nu} < 0$ for some $A$. Consider the vector $t 1_A \in \mathbb{R}^n_+$, with $t > 0$. Then,
$$B_\nu(t 1_A) = \alpha_A^{\nu} t^{|A|} + \text{terms of lower degree}.$$
Hence, for $t$ large enough we have $B_\nu(t 1_A) < 0$, a contradiction. We conclude that $\alpha_A^{\nu} \geq 0$ for each $A$, as desired.

This lemma is the reason why we considered multilinear polynomials defined on $\mathbb{R}^n$ rather than on $[0, 1]^n$, as is usually the case. In fact, by Lemma 4.12(iv) the positivity of the Owen polynomial on $[0, 1]^n$ only reflects the positivity of the associated game, not its total monotonicity. We now illustrate Lemmas 4.12 and 4.13 with a couple of examples.

Example 4.13. Consider the game $\nu(A) = |A|^2$ of Example 4.12. As
$$B_\nu(x) = \sum_{i=1}^n x_i + 2 \sum_{i < j} x_i x_j \geq 0 \quad \text{for each } x \in \mathbb{R}^n_+,$$
by Lemma 4.13 the game $\nu$ is totally monotone.
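In practice, rather than checking $B_\nu \geq 0$ on all of $\mathbb{R}^n_+$, one can check the equivalent condition of Theorem 4.11: nonnegativity of all Möbius coefficients. A sketch with the hypothetical helpers from above, including the game of Example 4.14 below with $\varepsilon = 1$ (convex but not totally monotone):

```python
def is_totally_monotone(nu, omega, tol=1e-9):
    """Total monotonicity = nonnegativity of all Mobius coefficients (Theorem 4.11),
    equivalently B_nu >= 0 on R^n_+ (Lemma 4.13)."""
    return all(c >= -tol for c in mobius_transform(nu, omega).values())

assert is_totally_monotone(nu, omega)  # the game |A|^2 of Example 4.13

# Game of Example 4.14 with eps = 1: pair coefficients are 1, the triple gets -1.
nu14 = {A: 0.0 for A in map(frozenset, subsets({1, 2, 3}))}
for pair in ({1, 2}, {1, 3}, {2, 3}):
    nu14[frozenset(pair)] = 1.0
nu14[frozenset({1, 2, 3})] = 2.0  # B(1,1,1) = 3 - eps with eps = 1
assert not is_totally_monotone(nu14, {1, 2, 3})
```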
Example 4.14. Consider the game associated with the multilinear polynomial $B(x) = x_1 x_2 + x_1 x_3 + x_2 x_3 - \varepsilon x_1 x_2 x_3$ with $\varepsilon > 0$. As $B(10/\varepsilon, 10/\varepsilon, 2/\varepsilon) < 0$ for each $\varepsilon > 0$, this game is not totally monotone. The game is positive and convex when $\varepsilon \leq 1$. In fact, $B(x) = x_1 x_2(1 - \varepsilon x_3) + x_1 x_3 + x_2 x_3 \geq 0$ on $[0, 1]^3$, and so by Lemma 4.12(iv) the game is positive. On the other hand, for distinct $i$, $j$, and $k$,
$$\frac{\partial^2 B}{\partial x_i \partial x_j} = 1 - \varepsilon x_k \geq 0$$
on $(0, 1)^3$, so that, by Lemma 4.12(v), the game is convex.

In view of Lemma 4.13, it is natural to consider the pointed convex cone
$$P_n^+ = \{P \in P_n : P(x) \geq 0 \text{ for each } x \in \mathbb{R}^n_+\}.$$
It induces in the usual way an order $\succeq_p$ on $P_n$ as follows: given $P_1, P_2 \in P_n$, write $P_1 \succeq_p P_2$ if $P_1 - P_2 \in P_n^+$. In turn, $\succeq_p$ induces a lattice structure and a norm, denoted by $\|\cdot\|_p$, that make $P_n$ an AL-space. For brevity, we omit the details of these by now standard notions. The next result summarizes the relations existing between the space of finite games and the space of multilinear polynomials just introduced.

Theorem 4.15. There is a lattice preserving and isometric isomorphism $B$ between the AL-spaces $(V_n, \succeq, \|\cdot\|_c)$ and $(P_n, \succeq_p, \|\cdot\|_p)$ determined by the identity
$$P(x) = \sum_{\emptyset \neq A \in \Sigma} \nu(A) \prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \quad \text{for each } x \in \mathbb{R}^n.$$
The game $\nu$ is totally monotone if and only if the corresponding polynomial $P$ in $P_n$ is nonnegative on $\mathbb{R}^n_+$.

Summing up, Theorems 4.11, 4.13, and 4.15 established the following lattice isometries:
$$(\mathbb{R}^{2^{|\Omega|}-1}, \geq, \|\cdot\|_1) \overset{T}{\longleftrightarrow} (V_n, \succeq, \|\cdot\|_c) \overset{I}{\longleftrightarrow} (ba(2^{\hat{\Sigma}}), \succeq_{ba}, \|\cdot\|), \qquad (V_n, \succeq, \|\cdot\|_c) \overset{B}{\longleftrightarrow} (P_n, \succeq_p, \|\cdot\|_p).$$
The resulting isometries $I \circ T^{-1}$ and $B \circ T^{-1}$ between $\mathbb{R}^{2^{|\Omega|}-1}$, $ba(2^{\hat{\Sigma}})$, and $P_n$ are obviously well known. The interesting part here is given by the possibility of representing finite games in different ways, each useful for different purposes.
4.6.5. Convex games

In this last subsection we show some noteworthy properties of finite convex games. A first important property has been already mentioned right after Theorem 4.12: any finite game can be written as the difference of two convex games.

To see other properties of finite convex games, we have to turn our attention to chains of subsets of $\Omega$. As $\Omega = \{\omega_1, \ldots, \omega_n\}$, the collection $\mathcal{C}$ given by
$$\{\omega_1\}, \{\omega_1, \omega_2\}, \ldots, \{\omega_1, \ldots, \omega_n\}$$
forms a maximal chain, that is, no other chain can contain it. More generally, given any permutation $\sigma$ on $\{1, \ldots, n\}$, the collection $\mathcal{C}_\sigma$ given by
$$\{\omega_{\sigma(1)}\}, \{\omega_{\sigma(1)}, \omega_{\sigma(2)}\}, \ldots, \{\omega_{\sigma(1)}, \ldots, \omega_{\sigma(n)}\}$$
forms another maximal chain. All maximal chains in $\Omega$ have this form, and so there are $n!$ of them.

Let $\nu$ be any game. By Lemma 4.5, for each $\mathcal{C}_\sigma$ there is a charge $\mu_\sigma \in ba(\Sigma)$ such that $\mu_\sigma(A) = \nu(A)$ for each $A \in \mathcal{C}_\sigma$. Because of the maximality of $\mathcal{C}_\sigma$, the charge $\mu_\sigma$ is easily seen to be unique. We call $\mu_\sigma$ the marginal worth charge associated with the permutation $\sigma$.

Marginal worth charges play a central role in studying finite convex games. We begin by providing a characterization of convexity based on them, due to Ichiishi (1981).

Theorem 4.16. A finite game $\nu$ is convex if and only if all its marginal worth charges $\mu_\sigma$ belong to the core.

Proof. "Only if". Suppose $\nu$ is convex. We want to show that each $\mu_\sigma$ belongs to $\mathrm{core}(\nu)$. By Theorem 4.7, there exists $\mu \in \mathrm{core}(\nu)$ such that $\mu(A) = \nu(A)$ for each $A \in \mathcal{C}_\sigma$. By the maximality of $\mathcal{C}_\sigma$, $\mu_\sigma$ is the unique charge having such property. Hence, $\mu = \mu_\sigma$, as desired.

"If". Suppose $\mu_\sigma \in \mathrm{core}(\nu)$ for all permutations $\sigma$. Given any $A$ and $B$, let $\mathcal{C}_\sigma$ be a maximal chain containing $A \cap B$, $A$, and $A \cup B$. Then
$$\nu(A \cup B) + \nu(A \cap B) - \nu(A) = \mu_\sigma(A \cup B) + \mu_\sigma(A \cap B) - \mu_\sigma(A) = \mu_\sigma(B).$$
As $\mu_\sigma \in \mathrm{core}(\nu)$, we have $\mu_\sigma(B) \geq \nu(B)$, and so $\nu$ is convex.

Turn now to cores of finite games. The first observation to make is that the core of a finite game is a subset of the $|\Omega|$-dimensional space $\mathbb{R}^\Omega$ of the form:
$$\mathrm{core}(\nu) = \left\{x \in \mathbb{R}^\Omega : \sum_{\omega \in \Omega} x_\omega = \nu(\Omega) \text{ and } \sum_{\omega \in A} x_\omega \geq \nu(A) \text{ for each } A\right\}.$$
Equivalently,
$$\mathrm{core}(\nu) = \bigcap_{A \in \Sigma} \left\{x \in \mathbb{R}^\Omega : \sum_{\omega \in A} x_\omega \geq \nu(A)\right\} \cap \left\{x \in \mathbb{R}^\Omega : \sum_{\omega \in \Omega} x_\omega \leq \nu(\Omega)\right\},$$
that is, $\mathrm{core}(\nu)$ is the set of solutions of a finite system of linear inequalities on $\mathbb{R}^\Omega$. Sets of this form are called polyhedra.

By Proposition 4.2 the core is weak*-compact. In this finite setting, this means that it is a compact subset of $\mathbb{R}^\Omega$, where compactness is in the standard norm topology of $\mathbb{R}^\Omega$. The core of a finite game is, therefore, a compact polyhedron. As a result, we have the following geometric property of cores of finite games.

Proposition 4.16. The core of a finite game is a polytope in $\mathbb{R}^\Omega$, that is, it is the convex hull of a finite set.

Proof. By a standard result (see Aliprantis and Border, 1999: 233–234 or Webster, 1994: 114), compact polyhedra are polytopes.

The extreme points of a polytope are called vertices and they form a finite set. As each element of a polytope can be represented as a convex combination of its vertices, the knowledge of the set of vertices is, therefore, key in describing the structure of a polytope. All this means that, by Proposition 4.16, in order to understand the structure of the core it is crucial to identify the set of its vertices. This is achieved by the next result, due to Shapley (1971). Interestingly, the marginal worth charges, which by Theorem 4.16 always belong to the core of a convex game, turn out to be exactly the sought-after vertices.

Theorem 4.17. Let $\nu$ be a finite convex game. Then, a charge $\mu \in ba(\Sigma)$ is a vertex of $\mathrm{core}(\nu)$ if and only if it is a marginal worth charge, that is, if and only if there is a maximal chain $\mathcal{C}_\sigma$ such that $\nu(A) = \mu(A)$ for all $A \in \mathcal{C}_\sigma$.

Proof. An element of a polytope is a vertex if and only if it is an exposed point. Hence, it is enough to show that the marginal worth charges are the set of exposed points of $\mathrm{core}(\nu)$.

"If". Suppose $\mu_\sigma$ is a marginal worth charge, with associated maximal chain $\mathcal{C}_\sigma$. We want to show that $\mu_\sigma$ is an exposed point of $\mathrm{core}(\nu)$. Since $\mathcal{C}_\sigma$ is a maximal chain, there is an injective function $f_\sigma$ whose upper sets are given by $\mathcal{C}_\sigma$, that is, $\mathcal{C}_\sigma = \{(f_\sigma \geq t)\}_{t \in \mathbb{R}}$. For example, if $\mathcal{C}_\sigma = \{A_{\sigma(i)}\}$, take $f_\sigma = \sum_{i=1}^n 1_{A_{\sigma(i)}}$. By the definition of the Choquet integral, we have $\int f_\sigma \, d\mu_\sigma = \int f_\sigma \, d\nu$. Since $\mathcal{C}_\sigma$ is maximal, $\mu_\sigma$ is the unique charge replicating $\nu$ on $\mathcal{C}_\sigma$. Therefore, given any other charge $\mu$ in $\mathrm{core}(\nu)$, there exists $A \in \mathcal{C}_\sigma$ such that $\mu_\sigma(A) < \mu(A)$. Equivalently, there is some $t \in \mathbb{R}$ such that $\nu(f_\sigma \geq t) = \mu_\sigma(f_\sigma \geq t) < \mu(f_\sigma \geq t)$. Hence,
$$\int f_\sigma \, d\nu = \int f_\sigma \, d\mu_\sigma < \int f_\sigma \, d\mu$$
for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu_\sigma$, and this proves that $\mu_\sigma$ is an exposed point, as desired.
"Only if". Suppose $\mu^*$ is an exposed point of $\mathrm{core}(\nu)$. We want to show that $\mu^*$ is a marginal worth charge, that is, that there exists a maximal chain $\mathcal{C}^*$ in $\Sigma$ such that $\mu^*(A) = \nu(A)$ for each $A \in \mathcal{C}^*$.

Let $\{\mu_i\}_{i=1}^m$ be the set of all exposed points of $\mathrm{core}(\nu)$, except $\mu^*$. Set $k_1 = \|\mu^*\| \vee (\max_{i=1,\ldots,m} \|\mu_i\|)$. Since $\mu^*$ is an exposed point, there exists $f : \Omega \to \mathbb{R}$ such that $\int f \, d\mu^* < \int f \, d\mu$ for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu^*$. Set $k_2 = \min_{i=1,\ldots,m} (\int f \, d\mu_i - \int f \, d\mu^*)$. Clearly, $k_2 > 0$. Given $0 < \varepsilon < k_2/2k_1$, there is an injective $g : \Omega \to \mathbb{R}$ such that $\|f - g\| < \varepsilon$. Hence, for each $i$ we have
$$\int g \, d\mu_i - \int g \, d\mu^* = \left(\int g \, d\mu_i - \int f \, d\mu_i\right) + \left(\int f \, d\mu_i - \int f \, d\mu^*\right) + \left(\int f \, d\mu^* - \int g \, d\mu^*\right) \geq -\varepsilon k_1 + k_2 - \varepsilon k_1 > 0.$$
We conclude that $\int g \, d\mu^* < \int g \, d\mu_i$ for each $i$, and so $\int g \, d\mu^* < \int g \, d\mu$ for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu^*$.

Since $\nu$ is convex, by Theorem 4.7 it holds $\int g \, d\nu = \min_{\mu \in \mathrm{core}(\nu)} \int g \, d\mu$, and so $\int g \, d\nu = \int g \, d\mu^* < \int g \, d\mu$ for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu^*$. The equality $\int g \, d\nu = \int g \, d\mu^*$ implies that $\mu^*(g \geq t) = \nu(g \geq t)$ for all $t \in \mathbb{R}$. Since $g$ is injective, the chain of upper sets $\{g \geq t\}$ is maximal in $\Sigma$, and it is actually the desired maximal chain $\mathcal{C}^*$.

Denote by $M(\nu)$ the set of all marginal worth charges of a game $\nu$. By Theorem 4.17, we have $\mathrm{core}(\nu) = \mathrm{co}(M(\nu))$, and so all elements of the core can be represented as convex combinations of marginal worth charges. This result has been recently generalized to infinite games by Marinacci and Montrucchio (forthcoming).

Putting together Theorems 4.16 and 4.17, we have the following remarkable property of finite games.

Corollary 4.4. A finite game $\nu$ is convex if and only if $M(\nu) = \exp(\mathrm{core}(\nu))$.

Therefore, given a game, the knowledge of its $n!$ marginal worth charges makes it possible to determine both whether the game is convex and what is the structure of its core.

We close by observing that it is not by chance that in Corollary 4.4 we use the set of exposed points $\exp$ rather than that of extreme points $\mathrm{ext}$. For a polytope these two sets coincide and they form the set of vertices. For general compact convex sets, even in finite dimensional spaces, this is no longer the case and exposed points are only a subset of the set of extreme points. Inspection of the proof of Theorem 4.17 shows that what we have actually proved is that marginal worth charges are the set of exposed points of the core. The fact that they then turn out to coincide with the set of extreme points is a consequence of properties of polytopes, which are immaterial for the proof.
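Corollary 4.4 yields a concrete procedure: enumerate the $n!$ permutations, build each marginal worth charge from the increments of $\nu$ along its chain, and test core membership. A minimal sketch, with the hypothetical `subsets` helper and the running example from the earlier blocks:

```python
from itertools import permutations

def marginal_worth_charges(nu, omega):
    """For each permutation, the unique charge matching nu along its maximal chain:
    mu(w_i) = nu({first i elements}) - nu({first i-1 elements})."""
    charges = []
    for order in permutations(sorted(omega)):
        mu, prev = {}, 0.0
        for i, w in enumerate(order):
            val = nu[frozenset(order[: i + 1])]
            mu[w] = val - prev
            prev = val
        charges.append(mu)
    return charges

def in_core(mu, nu, omega, tol=1e-9):
    """Core membership: mu(A) >= nu(A) for every A, with equality at omega."""
    if abs(sum(mu[w] for w in omega) - nu[frozenset(omega)]) > tol:
        return False
    return all(sum(mu[w] for w in A) >= nu[frozenset(A)] - tol
               for A in map(frozenset, subsets(omega)) if A)

def is_convex(nu, omega):
    """Ichiishi's criterion (Theorem 4.16)."""
    return all(in_core(mu, nu, omega) for mu in marginal_worth_charges(nu, omega))

assert is_convex(nu, omega)
# For nu(A) = |A|^2 on three players the 3! charges are the permutations
# of (1, 3, 5); by Theorem 4.17 these are exactly the vertices of the core.
```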
When extending Theorem 4.17 to infinite convex games the observation just made about exposed points is important: in the more general setting, where exposed and extreme points no longer necessarily coincide, the analog of marginal worth charges will actually characterize the exposed points. We refer the interested reader to Marinacci and Montrucchio (forthcoming) for details.
4.7. Concluding remarks

1 In this chapter we only considered games defined on spaces having no topological structure. There is a large literature on suitably "regular" set functions defined on topological spaces, tracing back to Choquet (1953). We refer the interested reader to Huber and Strassen (1973) and Dellacherie and Meyer (1978). Epstein and Wang (1996) and Philippe et al. (1999) provide some decision-theoretic applications of capacities on topological domains.
2 In a series of papers, Gabriele Greco proposed an interesting notion of measurability on algebras. A noteworthy feature of his approach is that, unlike $B(\Sigma)$, the resulting class of measurable functions forms a vector space. Greco's approach is, therefore, a further way to bypass the lack of vector structure of $B(\Sigma)$ that we discussed in some detail after Theorem 4.6. In this chapter, we preferred to define the Choquet functional on the smaller domain $B(\Sigma)$ and then extend it to the vector space $\bar{B}(\Sigma)$ using its Lipschitz continuity, following in this way a standard procedure in functional analysis. In any case, details on Greco's approach can be found in his papers (e.g. Greco, 1981 and Bassanezi and Greco, 1984) and in Denneberg (1994).
3 We did not consider here games and Choquet functionals defined on product algebras. For details on this topic we refer the interested reader to Ben Porath et al. (1997), Ghirardato (1997), and to the references contained therein.
4 Throughout the chapter we only considered Choquet functionals defined on bounded functions. Results for the unbounded case can be found in Greco (1976, 1982), Bassanezi and Greco (1984), and Wakker (1993).
5 Sipos (1979a,b) introduced a different notion of integral for capacities. It coincides with the Choquet integral for positive functions, but the extension to general functions is done according to the standard procedure used to extend the Lebesgue integral from positive functions to general functions, based on the decomposition $f = f^+ - f^-$. The resulting integral is in general different from the Choquet integral and it has turned out to be useful in some applications. We refer the interested reader to Sipos' original papers and to Denneberg (1994).
6 Theorem 4.6 and Corollary 4.2 make it possible to use convex analysis tools in studying convex games and their Choquet integrals. For example, Carlier and Dana (forthcoming) and Marinacci and Montrucchio (forthcoming) use such tools to study the structure of cores of convex games and the differentiability and subdifferentiability properties of their Choquet integrals.
Acknowledgments

We thank Fabio Maccheroni for his very insightful suggestions, which greatly improved this chapter. The financial support of MIUR (Ministero dell'Istruzione, Università e Ricerca Scientifica) is gratefully acknowledged.
Notes

1 In the sequel subsets of $\Omega$ are understood to be in $\Sigma$ even where not stated explicitly, and they are referred to both as sets and as coalitions.
2 Maccheroni and Ruckle (2002) proved that $(bv(\Sigma), \|\cdot\|)$ is a dual Banach space.
3 The weak*-topology and its properties can be found in, for example, Aliprantis and Border (1999), Dunford and Schwartz (1958), and Rudin (1973).
4 The subgame $\nu_A$ is the restriction of $\nu$ on the induced algebra $\Sigma_A = \Sigma \cap A$, given by $\nu_A(B) = \nu(B)$ for all $B \subseteq A$.
5 A collection $\mathcal{C}$ in $\Sigma$ is a chain if for each $A$ and $B$ in $\mathcal{C}$ it holds either $A \subseteq B$ or $B \subseteq A$. Throughout we assume that $\emptyset, \Omega \in \mathcal{C}$.
6 That is, $f \geq g$ if $f(\omega) \geq g(\omega)$ for each $\omega \in \Omega$, and $\|f\| = \sup_{\omega \in \Omega} |f(\omega)|$.
7 A functional is superlinear if it is positively homogeneous and superadditive. Recall that, by Proposition 4.11, Choquet functionals are always positively homogeneous.
8 That is, $f \in \bar{B}(\Sigma)$ provided there is a sequence $\{f_n\}_n \subseteq B_0(\Sigma)$ such that $\lim_n \|f - f_n\| = 0$. Here we are viewing $B_0(\Sigma)$ as a subset of the set of all bounded functions $f : \Omega \to \mathbb{R}$.
9 The equivalence between the convexity of $\nu$ and the concavity of $\nu_c$ established in Corollary 4.2 is also a curious terminological phenomenon, which may give rise to some confusion. A simple way to avoid any problem is to use the terminology "supermodular games."
10 Notice that $ba(\Sigma)$ is the subspace of $fa(\Sigma)$ consisting of all bounded charges.
11 See, for example, Biswas et al. (1999) and the references therein contained. For characterizations of convexity and exactness related to stability, see Kikuta (1988) and Sharkey (1982).
12 Needless to say, the properties we will establish for finite games also hold for games defined on finite algebras of subsets of infinite spaces.
13 That is, $V^+ \cap (-V^+) = \{0\}$.
14 See Aliprantis and Border (1999: 263–330) for a definition of these lattice operations, as well as for all notions on vector lattices needed in the sequel.
15 The $l^1$-norm $\|\cdot\|_1$ of $\mathbb{R}^n$ is given by $\|x\|_1 = \sum_{i=1}^n |x_i|$ for each $x \in \mathbb{R}^n$.
16 In the statement, $\succeq_{ba}$ denotes the restriction of $\succeq$ on $ba(2^{\hat{\Sigma}})$, as discussed right after Theorem 4.11.
17 $B(2^{\hat{\Sigma}})^*$ is the vector space of all linear functionals defined on the vector space $B(2^{\hat{\Sigma}})$ of all functions defined on the enlarged space $\hat{\Sigma}$.
References

Aliprantis, C. D. and K. C. Border (1999) Infinite dimensional analysis, Springer-Verlag, New York.
Aumann, R. and L. Shapley (1974) Values of non-atomic games, Princeton University Press, Princeton.
Bassanezi, R. C. and G. H. Greco (1984) Sull'additività dell'integrale, Rendiconti Seminario Matematico Università di Padova, 72, 249–275.
Ben Porath, E., I. Gilboa and D. Schmeidler (1997) On the measurement of inequality under uncertainty, Journal of Economic Theory, 75, 194–204. (Reprinted as Chapter 22 in this volume.)
Bhaskara Rao, K. P. S. and M. Bhaskara Rao (1983) Theory of charges, Academic Press, New York.
Biswas, A. K., T. Parthasarathy, J. A. M. Potters, and M. Voorneveld (1999) Large cores and exactness, Games and Economic Behavior, 28, 1–12.
Bondareva, O. (1963) Certain applications of the methods of linear programming to the theory of cooperative games (in Russian), Problemy Kibernetiki, 10, 119–139.
Boros, E. and P. L. Hammer (2002) Pseudo-Boolean optimization, Discrete Applied Mathematics, 123, 155–225.
Carlier, G. and R. A. Dana (2003) Core of convex distortions of a probability on a non-atomic space, Journal of Economic Theory, 173, 199–222.
Chateauneuf, A. and J.-Y. Jaffray (1989) Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion, Mathematical Social Sciences, 17, 263–283.
Choquet, G. (1953) Theory of capacities, Annales de l'Institut Fourier, 5, 131–295.
Delbaen, F. (1974) Convex games and extreme points, Journal of Mathematical Analysis and Applications, 45, 210–233.
Dellacherie, C. (1971) Quelques commentaires sur les prolongements de capacités, Séminaire Probabilités V, Lecture Notes in Math. 191, Springer-Verlag, New York.
Dellacherie, C. and P.-A. Meyer (1978) Probabilities and potential, North-Holland, Amsterdam.
Dempster, A. (1967) Upper and lower probabilities induced by a multivalued mapping, Annals of Mathematical Statistics, 38, 325–339.
Dempster, A. (1968) A generalization of Bayesian inference, Journal of the Royal Statistical Society (B), 30, 205–247.
Denneberg, D. (1994) Non-additive measure and integral, Kluwer, Dordrecht.
Denneberg, D. (1997) Representation of the Choquet integral with the σ-additive Möbius transform, Fuzzy Sets and Systems, 92, 139–156.
De Waegenaere, A. and P. Wakker (2001) Nonmonotonic Choquet integrals, Journal of Mathematical Economics, 36, 45–60.
Dunford, N. and J. T. Schwartz (1958) Linear operators, part I: general theory, Wiley-Interscience, London.
Einy, E. and B. Shitovitz (1996) Convex games and stable sets, Games and Economic Behavior, 16, 192–201.
Epstein, L. G. and T. Wang (1996) "Beliefs about beliefs" without probabilities, Econometrica, 64, 1343–1373.
Fan, K. (1956) On systems of linear inequalities, in Linear inequalities and related systems, Annals of Math. Studies, 38, 99–156.
Ghirardato, P. (1997) On independence for non-additive measures, with a Fubini theorem, Journal of Economic Theory, 73, 261–291.
Gilboa, I. and E. Lehrer (1991) Global games, International Journal of Game Theory, 20, 129–147.
Gilboa, I. and D. Schmeidler (1994) Additive representations of non-additive measures and the Choquet integral, Annals of Operations Research, 52, 43–65.
Gilboa, I. and D. Schmeidler (1995) Canonical representation of set functions, Mathematics of Operations Research, 20, 197–212.
Grabisch, M., J.-L. Marichal and M. Roubens (2000) Equivalent representations of set functions, Mathematics of Operations Research, 25, 157–178.
Greco, G. H. (1976) Integrale monotono, Rendiconti Seminario Matematico Università di Padova, 57, 149–166.
Greco, G. H. (1981) Sur la mesurabilité d'une fonction numérique par rapport à une famille d'ensembles, Rendiconti Seminario Matematico Università di Padova, 65, 163–176.
Greco, G. H. (1982) Sulla rappresentazione di funzionali mediante integrali, Rendiconti Seminario Matematico Università di Padova, 66, 21–42.
Huber, P. J. and V. Strassen (1973) Minimax tests and the Neyman–Pearson lemma for capacities, Annals of Statistics, 1, 251–263.
Ichiishi, T. (1981) Super-modularity: applications to convex games and to the greedy algorithm for LP, Journal of Economic Theory, 25, 283–286.
Kannai, Y. (1969) Countably additive measures in cores of games, Journal of Mathematical Analysis and Applications, 27, 227–240.
Kelley, J. L. (1959) Measures on Boolean algebras, Pacific Journal of Mathematics, 9, 1165–1177.
Kikuta, K. (1988) A condition for a game to be convex, Mathematica Japonica, 33, 425–430.
Kikuta, K. and L. S. Shapley (1986) Core stability in n-person games, mimeo.
Maccheroni, F. and M. Marinacci (2000) A Heine–Borel theorem for ba(Σ), mimeo.
Maccheroni, F. and W. H. Ruckle (2002) BV as a dual space, Rendiconti Seminario Matematico Università di Padova, 107, 101–109.
Marinacci, M. (1996) Decomposition and representation of coalitional games, Mathematics of Operations Research, 21, 1000–1015. (Reprinted as Chapter 12 in this volume.)
Marinacci, M. (1997) Finitely additive and epsilon Nash equilibria, International Journal of Game Theory, 26, 315–333.
Marinacci, M. and L. Montrucchio (2003) Subcalculus for set functions and cores of TU games, Journal of Mathematical Economics, 39, 1–25.
Marinacci, M. and L. Montrucchio (2004) A characterization of the core of convex games through Gateaux derivatives, Journal of Economic Theory, 116, 229–248.
Moulin, H. (1995) Cooperative microeconomics, Princeton University Press, Princeton.
Myerson, R. (1991) Game theory, Harvard University Press, Cambridge.
Nguyen, H. T. (1978) On random sets and belief functions, Journal of Mathematical Analysis and Applications, 65, 531–542.
Owen, G. (1972) Multilinear extensions of games, Management Science, 18, 64–79.
Owen, G. (1995) Game theory, Academic Press, New York.
Philippe, F., G. Debs and J.-Y. Jaffray (1999) Decision making with monotone lower probabilities of infinite order, Mathematics of Operations Research, 24, 767–784.
Revuz, A. (1955) Fonctions croissantes et mesures sur les espaces topologiques ordonnés, Annales de l'Institut Fourier, 6, 187–269.
Rota, G. C. (1964) Theory of Möbius functions, Z. Wahrsch. und Verw. Geb., 2, 340–368.
Rudin, W. (1973) Functional analysis, McGraw-Hill, New York.
Rudin, W. (1987) Real and complex analysis (3rd edition), McGraw-Hill, New York.
Salinetti, G. and R. Wets (1986) On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima, Mathematics of Operations Research, 11, 385–419.
Schmeidler, D. (1968) On balanced games with infinitely many players, Research Program in Game Theory and Mathematical Economics, RM 28, The Hebrew University of Jerusalem.
Schmeidler, D. (1972) Cores of exact games, Journal of Mathematical Analysis and Applications, 40, 214–225.
Schmeidler, D. (1986) Integral representation without additivity, Proceedings of the American Mathematical Society, 97, 255–261.
Schmeidler, D. (1989) Subjective probability and expected utility without additivity, Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Schultz, M. H. (1969) L∞-multivariate approximation theory, SIAM Journal on Numerical Analysis, 6, 161–183.
Shafer, G. (1976) A mathematical theory of evidence, Princeton University Press, Princeton.
Shapley, L. S. (1953) A value for n-person games, in Contributions to the Theory of Games (H. Kuhn and A. W. Tucker, eds), Princeton University Press, Princeton.
Shapley, L. S. (1967) On balanced sets and cores, Naval Research Logistics Quarterly, 14, 453–460.
Shapley, L. S. (1971) Cores of convex games, International Journal of Game Theory, 1, 12–26.
Sharkey, W. W. (1982) Cooperative games with large cores, International Journal of Game Theory, 11, 175–182.
Sipos, J. (1979a) Integral with respect to a pre-measure, Mathematica Slovaca, 29, 141–155.
Sipos, J. (1979b) Non linear integrals, Mathematica Slovaca, 29, 257–270.
Wakker, P. (1993) Unbounded utility for Savage's "foundations of statistics" and other models, Mathematics of Operations Research, 18, 446–485.
Webster, R. (1994) Convexity, Oxford University Press, Oxford.
Widder, D. V. (1941) The Laplace transform, Princeton University Press, Princeton.
Wolfenson, M. and T.-L. Fine (1982) Bayes-like decision making with upper and lower probabilities, Journal of the American Statistical Association, 77, 80–88.
Zhou, L. (1998) Integral representation of continuous comonotonically additive functionals, Transactions of the American Mathematical Society, 350, 1811–1822.
5 Subjective probability and expected utility without additivity

David Schmeidler

Schmeidler, D. (1989). "Subjective probability and expected utility without additivity," Econometrica, 57, 571–587.

5.1. Introduction

Bayesian statistical techniques are applicable when the information and uncertainty with respect to the parameters or hypotheses in question can be expressed by a probability distribution. This prior probability is also the focus of most of the criticism against the Bayesian school. My starting point is to join the critics in attacking a certain aspect of the prior probability: The probability attached to an uncertain event does not reflect the heuristic amount of information that led to the assignment of that probability. For example, when the information on the occurrence of two events is symmetric they are assigned equal prior probabilities. If the events are complementary the probabilities will be 1/2, independently of whether the symmetric information is meager or abundant.

There are two (unwritten?) rules for assigning prior probabilities to events in case of uncertainty. The first says that symmetric information with respect to the occurrence of events results in equal probabilities. The second says that if the space is partitioned into k symmetric (i.e. equiprobable) events, then the probability of each event is 1/k. I agree with the first rule and object to the second. In the example mentioned earlier, if each of the symmetric and complementary uncertain events is assigned the index 3/7, the number 1/7, 1/7 = 1 − (3/7 + 3/7), would indicate the decision maker's confidence in the probability assessment. Thus, allowing nonadditive (not necessarily additive) probabilities enables transmission or recording of information that additive probabilities cannot represent.

The idea of nonadditive probabilities is not new. Nonadditive (objective) probabilities have been in use in physics for a long time (Feynman, 1963). The nonadditivity describes the deviation of elementary particles from mechanical behavior toward wave-like behavior. Daniel Ellsberg (1961) presented his arguments against necessarily additive (subjective) probabilities with the help of the following "mind experiments": There are two urns each containing one hundred balls. Each ball is either red or black. In urn I there are fifty balls of each color and
there is no additional information about urn II. One ball is chosen at random from each urn. There are four events, denoted IR, IB, IIR, IIB, where IR denotes the event that the ball chosen from urn I is red, etc. On each of the events a bet is offered: $100 if the event occurs and zero if it does not. According to Ellsberg most decision makers are indifferent between betting on IR and betting on IB and are similarly indifferent between bets on IIR and IIB. It may be that the majority are indifferent among all four bets. However, there is a nonnegligible proportion of decision makers who prefer every bet from urn I (IB or IR) to every bet from urn II (IIB or IIR). These decision makers cannot represent their beliefs with respect to the occurrence of uncertain events through an additive probability.

The most compelling justification for representation of beliefs about uncertain events through additive prior probability has been suggested by Savage. Building on previous work by Ramsey, de Finetti, and von Neumann–Morgenstern (N–M), Savage suggested axioms for decision theory that lead to the criterion of maximization of expected utility. The expectation operation is carried out with respect to a prior probability derived uniquely from the decision maker's preferences over acts. The axiom violated by the preference of the select minority in the example above is the "sure thing principle," that is, Savage's P2.

In this chapter a simplified version of Savage's model is used. The simplification consists of the introduction of objective or physical probabilities. An act in this model assigns to each state an objective lottery over deterministic outcomes. The uncertainty concerns which state will occur. Such a model containing objective and subjective probabilities has been suggested by Anscombe and Aumann (1963). They speak about roulette lotteries (objective) and horse lotteries (subjective). In the presentation here the version in Fishburn (1970) is used. The N–M utility theorem used here can also be found in Fishburn (1970).

The concept of objective probability is considered here as a physical concept like acceleration, momentum, or temperature; to construct a lottery with given objective probabilities (a roulette lottery) is a technical problem conceptually not different from building a thermometer. When a person has constructed a "perfect" die, he assigns a probability of 1/6 to each outcome. This probability is objective in the same sense as the temperature measured by the thermometer. Another person can check and verify the calibration of the thermometer. Similarly, he can verify the perfection of the die by measuring its dimensions, scanning it to verify uniform density, etc. Rolling the die many times is not necessarily the exclusive test for verification of objective probability.

On the other hand, the subjective or personal probability of an event is interpreted here as the number used in calculating the expectation (integral) of a random variable. This definition includes objective or physical probabilities as a special case where there is no doubt as to which number is to be used. This interpretation does not impose any restriction of additivity on probabilities, as long as it is possible to perform the expectation operation which is the subject of this work. Subjective probability is derived from a person's preferences over acts.

In the Anscombe–Aumann type model usually five assumptions are imposed on preferences to define unique additive subjective probability and N–M utility over
outcomes. The first three assumptions are essentially N–M's: weak order, independence, and continuity. The fourth assumption is equivalent to Savage's P3, that is, state-independence of preferences. The additional assumption is nondegeneracy; without it uniqueness is not guaranteed.

The example quoted earlier can be embedded in such a model. There are four states: (IB, IIB), (IB, IIR), (IR, IIB), (IR, IIR). The deterministic outcomes are sums of dollars. For concreteness of the example, assume that there are 101 deterministic outcomes: $0, $1, $2, ..., $100. An act assigns to each state a probability distribution over the outcomes. The bet "$100 if IIB" is an act which assigns the (degenerate objective) lottery of receiving "$100 with probability one" to each state in the event IIB and "zero dollars with probability one" to each state in the event IIR. The bet on IIR is similarly interpreted. Indifference between these two acts (bets), the independence condition, continuity, and weak order imply indifference between either of them and the constant act which assigns to each state the objective lottery of receiving $100 with probability 1/2 and receiving zero dollars with probability 1/2. The same considerations imply that the constant act above is indifferent to either of the two acts (bets): "$100 if IB" and "$100 if IR". Hence the indifference between IB and IR and the indifference between IIB and IIR in Ellsberg's example, together with the von N–M conditions, imply indifference between all four bets. The nonnegligible minority of Ellsberg's example does not share this indifference: they are indifferent between the constant act (stated earlier) and each bet from urn I, and prefer the constant act to each bet from urn II.

Our first objective consists of restatement, or more specifically of weakening, of the independence condition such that the new assumption together with the other three assumptions can be consistently imposed on the preference relation over acts. In particular the special preferences of the example become admissible. It is obvious that the example's preferences between bets (acts) do not admit additive subjective probability. Do they define in some consistent way a unique nonadditive subjective probability, and if so, is there a way to define the expected utility maximization criterion for the nonadditive case? An affirmative answer to this problem is presented in the third section. Thus the new model rationalizes nonadditive (personal) probabilities and admits the computation of expected utility with respect to these probabilities. It formally extends the additive model and it makes the expected utility criterion applicable to cases where additive expected utility is not applicable.

Before turning to a precise and detailed presentation of the model, another heuristic observation is made. The nomenclature used in economics distinguishes between risk and uncertainty. Decisions in a risk situation are precisely the choices among roulette lotteries. The probabilities are objectively given; they are part of the data. For this case the economic theory went beyond N–M utility and defined concepts of risk aversion, risk premium, and certainty equivalence. Translating these concepts to the case of decisions under uncertainty we can speak about uncertainty aversion, uncertainty premium, and risk equivalence.

Returning to the example, suppose that betting $100 on IIR is indifferent to betting $100 on a risky event with an (objective) probability of 3/7. Thus, the subjective probability of an event is its risk equivalent (P(IIR) = 3/7). In this example the number 1/7 computed earlier expresses the uncertainty premium in terms of risk. Note that nonadditive probability may not exhibit consistently either uncertainty aversion or uncertainty attraction. This is similar to the case of decisions in risk situations where N–M utility (of money) may be neither concave nor convex.
5.2. Axioms and background

Let X be a set and Y be the set of distributions over X with finite supports:
$$Y = \left\{y : X \to [0, 1] \;\middle|\; y(x) \neq 0 \text{ for only finitely many } x\text{'s in } X \text{ and } \sum_{x \in X} y(x) = 1\right\}.$$
For notational simplicity we identify X with the subset $\{y \in Y \mid y(x) = 1 \text{ for some } x \text{ in } X\}$ of Y. Let S be a set and let $\Sigma$ be an algebra of subsets of S. Both sets, X and S, are assumed to be nonempty. Denote by $L_0$ the set of all $\Sigma$-measurable finite valued functions from S to Y and denote by $L_c$ the constant functions in $L_0$. Let L be a convex subset of $Y^S$ which includes $L_c$. Note that Y can be considered a subset of some linear space, and $Y^S$, in turn, can then be considered as a subspace of the linear space of all functions from S to the first linear space. Whereas it is obvious how to perform convex combinations in Y, it should be stressed that convex combinations in $Y^S$ are performed pointwise. That is, for f and g in $Y^S$ and α in [0, 1], $\alpha f + (1 - \alpha)g = h$ where $h(s) = \alpha f(s) + (1 - \alpha)g(s)$ on S.

In the neo-Bayesian nomenclature, elements of X are (deterministic) outcomes, elements of Y are random outcomes or (roulette) lotteries, and elements of L are acts (or horse lotteries). Elements of S are states (of nature) and elements of $\Sigma$ are events. The primitive of a neo-Bayesian decision model is a binary (preference) relation over L to be denoted by $\succsim$. Next are stated several properties (axioms) of the preference relation, which will be used in the sequel.

(i) Weak order. (a) For all f and g in L: $f \succsim g$ or $g \succsim f$. (b) For all f, g, and h in L: If $f \succsim g$ and $g \succsim h$, then $f \succsim h$.

The relation $\succsim$ on L induces a relation also denoted by $\succsim$ on Y: $y \succsim z$ iff $y^S \succsim z^S$, where $y^S$ denotes the constant function y on S (i.e. $\{y\}^S$). As usual, $\succ$ and $\sim$ denote the asymmetric and symmetric parts, respectively, of $\succsim$.

Definition 5.1. Two acts f and g in $Y^S$ are said to be comonotonic if for no s and t in S, $f(s) \succ f(t)$ and $g(t) \succ g(s)$.

A constant act f, that is, $f = y^S$ for some y in Y, and any act g are comonotonic. An act f whose statewise lotteries $\{f(s)\}$ are mutually indifferent, that is, $f(s) \sim y$ for all s in S, and any act g are comonotonic. If X is a set of numbers and
preferences respect the usual order on numbers, then any two X-valued functions f and g are comonotonic iff $(f(s) - f(t))(g(s) - g(t)) \geq 0$ for all s and t in S. Clearly, IIR and IIB of the Introduction are not comonotonic. (Comonotonicity stands for common monotonicity.)

Next our new axiom for neo-Bayesian decision theory is introduced.

(ii) Comonotonic Independence. For all pairwise comonotonic acts f, g, and h in L and for all α in ]0, 1[: $f \succ g$ implies $\alpha f + (1 - \alpha)h \succ \alpha g + (1 - \alpha)h$. (]0, 1[ is the open unit interval.)

Elaboration of this condition is delayed until after condition (vii). Comonotonic independence is clearly a less restrictive condition than the independence condition stated below.

(iii) Independence. For all f, g, and h in L and for all α in ]0, 1[: $f \succ g$ implies $\alpha f + (1 - \alpha)h \succ \alpha g + (1 - \alpha)h$.

(iv) Continuity. For all f, g, and h in L: If $f \succ g$ and $g \succ h$, then there are α and β in ]0, 1[ such that $\alpha f + (1 - \alpha)h \succ g$ and $g \succ \beta f + (1 - \beta)h$.

Next, two versions of state-independence are introduced. The intuitive meaning of each of these conditions is that the preferences over random outcomes do not depend on the state that occurred. The first version is the one to be used here. The second version is stated for comparison since it is the common one in the literature.

(v) Monotonicity. For all f and g in L: If $f(s) \succsim g(s)$ on S then $f \succsim g$.

(vi) Strict monotonicity. For all f and g in L, y and z in Y, and E in $\Sigma$: If $f \succ g$, $f(s) = y$ on E and $g(s) = z$ on E, and $f(s) = g(s)$ on $E^c$, then $y \succ z$.

Observation. If $L = L_0$, then (vi) and (i) imply (v).

Proof. Let f and g be finite step functions such that $f(s) \succsim g(s)$ on S. There is a finite chain $f = h_0, h_1, \ldots, h_k = g$ where each pair of consecutive functions $h_{i-1}, h_i$ are constant on the set on which they differ. For each such pair, (vi) and (i) imply $h_{i-1} \succsim h_i$. Transitivity (i)(b) of $\succsim$ concludes the proof. Clearly (i) and (v) imply (vi).

For the sake of completeness we list as axiom:

(vii) Nondegeneracy. Not for all f and g in L, $f \succsim g$.

Out of the seven axioms listed here the completeness of the preferences, (i)(a), seems to be the most restrictive and most imposing assumption of the theory. One can view the weakening of the completeness assumption as a main contribution of all other axioms. Imagine a decision maker who initially has a partial preference relation over acts. After an additional introspection she accepts the validity of several of the axioms. She can then extend her preferences using these axioms.
For example, if she ranks $f \succsim g$ and $g \succsim h$, and if she accepts transitivity, then she concludes that $f \succsim h$. From this point of view, the independence axiom, (iii), seems the most powerful axiom for extending partial preferences. Given $f \succ g$ and independence we get for all h in L and α in ]0, 1[:
$$f' \equiv \alpha f + (1 - \alpha)h \succ \alpha g + (1 - \alpha)h \equiv g'.$$
However, after additional retrospection this implication may be too powerful to be acceptable. For example, consider the case where outcomes are real numbers and $S = [0, 2\pi]$. Let f and g be two acts defined by $f(s) = \sin(s)$ and $g(s) = \sin(s + \pi/2) = \cos(s)$. The preferences $f \succ g$ may be induced by the rough evaluation that the event $[\pi/3, 4\pi/3]$ is more probable than its complement. Define the act h by $h(s) = \sin(77s)$. In this case the structure of the acts $f' = \frac{1}{2}f + \frac{1}{2}h$ and $g' = \frac{1}{2}g + \frac{1}{2}h$ is far from transparent and the automatic implication of independence, $f' \succ g'$, may seem doubtful to the decision maker.

More generally: the ranking $f \succ g$ implies some rough estimation by the decision maker of the probabilities of events (in the algebra) defined by the acts f and g. If mixture with an arbitrary act h is allowed, the resulting acts $f'$ and $g'$ may define a much finer (larger) algebra (especially when the algebra defined by h is qualitatively independent of the algebras of f and g). Careful retrospection and comparison of the acts $f'$ and $g'$ may lead the decision maker to the ranking $g' \succ f'$ (as in the case of the Ellsberg paradox), contradictory to the implication of the independence axiom. Qualifying the comparisons and the application of independence to comonotonic acts rules out the possibility of contradiction. If f, g, and h are pairwise comonotonic, then the comparison of $f'$ to $g'$ is not very different from the comparison of f to g. Hence the decision maker can accept the validity of the implication: $f \succ g \iff f' \succ g'$, without fear of running into a contradiction. Note that accepting the validity of comonotonic independence, (ii), means accepting the validity of the implication mentioned earlier without knowing the specific acts f, g, h, $f'$, $g'$, but knowing that all five are pairwise comonotonic.

Before presenting the von Neumann–Morgenstern Theorem we point out that stating the axioms of (i) weak order, (iii) independence, and (iv) continuity does not require that the preference relation be defined on a set L containing $L_c$. Only the convexity of L is required for (ii) and (iii).

von Neumann–Morgenstern Theorem. Let M be a convex subset of some linear space, with a binary relation $\succsim$ defined on it. A necessary and sufficient condition for the relation to satisfy (i) weak order, (iii) independence, and (iv) continuity is the existence of an affine real-valued function, say w, on M such that for all f and g in M: $f \succsim g$ iff $w(f) \geq w(g)$. (Affinity of w means that $w(\alpha f + (1 - \alpha)g) = \alpha w(f) + (1 - \alpha)w(g)$ for $0 < \alpha < 1$.) Furthermore, an affine real-valued function $w'$ on M can replace w in the above statement iff there exist a positive number α and a real number β such that $w'(f) = \alpha w(f) + \beta$ on M.

As mentioned earlier, for proof of this theorem and the statement and proof of the Anscombe–Aumann Theorem stated later, the reader is referred to Fishburn (1970).
Implication. Suppose that a binary relation ≽ on some convex subset L of Y^S with Lc ⊂ L satisfies (i) weak order, (ii) comonotonic independence, and (iv) continuity. Suppose also that there is a convex subset M of L with Lc ⊂ M such that any two acts in M are comonotonic. Then by the von Neumann–Morgenstern Theorem there is an affine function on M, to be denoted by J, which represents the binary relation on M. That is, for all f and g in M: f ≽ g iff J(f) ≥ J(g). Clearly, if M = Lc ≡ {y^S | y ∈ Y}, any two acts in M are comonotonic. Hence, if a function u is defined on Y by u(y) = J(y^S), then u is affine and represents the induced preferences on Y. The affinity of u implies u(y) = ∑_{x∈X} y(x)u(x).
When subjective probability enters into the calculation of expected utility of an act, an integral with respect to a finitely additive set function has to be defined. Denote by P a finitely additive probability measure on Σ and let a be a real-valued Σ-measurable function on S. For the special case where a is a finite step function, a can be uniquely represented by ∑_{i=1}^k α_i E_i*, where α_1 > α_2 > ··· > α_k are the values that a attains and E_i* is the indicator function on S of E_i ≡ {s ∈ S | a(s) = α_i} for i = 1, . . . , k. Then

∫_S a dP = ∑_{i=1}^k α_i P(E_i).
The more general case where a is not finitely valued is treated as a special case of nonadditive probability.
Anscombe–Aumann Theorem. Suppose that a preference relation ≽ on L = L0 satisfies (i) weak order, (iii) independence, (iv) continuity, (vi) strict monotonicity, and (vii) nondegeneracy. Then there exist a unique finitely additive probability measure P on Σ and an affine real-valued function u on Y such that for all f and g in L0:

f ≽ g iff ∫_S u(f(·)) dP ≥ ∫_S u(g(·)) dP.
Furthermore, if there exist P and u as above, then the preference relation they induce on L0 satisfies conditions (i), (iii), (iv), (vi), and (vii). Finally, the function u is unique up to a positive linear transformation.
There are three apparent differences between the statement of the main result in the next section and the Anscombe–Aumann Theorem stated earlier: (i) Instead of strict monotonicity, monotonicity is used. It has been shown in the Observation that this does not make a difference. However, for the forthcoming extension, monotonicity is the natural condition. (ii) Independence is replaced with comonotonic independence. (iii) The finitely additive probability measure P is replaced with a nonadditive probability v.
5.3. Theorem

A real-valued set function v on Σ is termed a nonadditive probability if it satisfies the normalization conditions v(∅) = 0 and v(S) = 1, and monotonicity, that is, for all E and G in Σ: E ⊂ G implies v(E) ≤ v(G). We now introduce the definition of ∫_S a dv for a nonadditive probability v and a finite step function a = ∑_{i=1}^k α_i E_i* with α_1 > α_2 > ··· > α_k and (E_i)_{i=1}^k a partition of S. Let α_{k+1} = 0 and define

∫_S a dv = ∑_{i=1}^k (α_i − α_{i+1}) v(E_1 ∪ ··· ∪ E_i).
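For a finite state space this definition is easy to implement. The following Python sketch (ours, for illustration only) computes the integral of a finite step function, with the capacity v supplied as a function on events (frozensets of states); for an additive v it reduces to the usual expectation.

```python
def choquet_step(values, v):
    """Choquet integral of a finite step function.
    `values` maps states to real numbers; `v` maps a frozenset of
    states to its capacity, with v(empty set) = 0 and v(S) = 1."""
    alphas = sorted(set(values.values()), reverse=True)  # alpha_1 > ... > alpha_k
    total, cum = 0.0, frozenset()
    for i, alpha in enumerate(alphas):
        # cum becomes E_1 u ... u E_i after this update:
        cum |= frozenset(s for s, x in values.items() if x == alpha)
        alpha_next = alphas[i + 1] if i + 1 < len(alphas) else 0.0  # alpha_{k+1} = 0
        total += (alpha - alpha_next) * v(cum)
    return total

# With the uniform additive measure the integral is just the mean:
a = {"s1": 3.0, "s2": 1.0, "s3": 2.0}
print(choquet_step(a, lambda E: len(E) / 3))  # 2.0
```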
For the special case of v additive the definition stated earlier coincides with the usual one mentioned in the previous section.
Theorem 5.1. Suppose that the preference relation ≽ on L = L0 satisfies (i) weak order, (ii) comonotonic independence, (iv) continuity, (v) monotonicity, and (vii) nondegeneracy. Then there exist a unique nonadditive probability v on Σ and an affine real-valued function u on Y such that for all f and g in L0:

f ≽ g iff ∫_S u(f(·)) dv ≥ ∫_S u(g(·)) dv.
Conversely, if there exist v and u as above, u nonconstant, then the preference relation they induce on L0 satisfies (i), (ii), (iv), (v), and (vii). Finally, the function u is unique up to positive linear transformations.
Proof. From the Implication of the von N–M Theorem we get a N–M utility u representing the preference relation ≽ induces on Y. By nondegeneracy there are f* and f∗ in L0 with f* ≻ f∗. Monotonicity, (v), implies the existence of a state s in S such that f*(s) ≡ y* ≻ f∗(s) ≡ y∗. Since u is given up to a positive linear transformation, suppose from now on that u(y*) = 1 and u(y∗) = −1. Denote K = u(Y). Hence K is a convex subset of the real line including the interval [−1, 1]. For an arbitrary f in L0 denote

Mf = {αf + (1 − α)y^S | y ∈ Y and α ∈ [0, 1]}.
Thus Mf is the convex hull of the union of {f} and Lc. It is easy to see that any two acts in Mf are comonotonic. Hence, there is an affine real-valued function on Mf which represents the preference relation restricted to Mf. After rescaling, this function, Jf, satisfies Jf(y*^S) = 1 and Jf(y∗^S) = −1. Clearly, if h ∈ Mf ∩ Mg, then Jf(h) = Jg(h). So, defining J(f) = Jf(f) for f in L0, we get a real-valued function on L0 which represents the preferences on L0 and satisfies for all y in Y: J(y^S) = u(y). Let B0(K) denote the Σ-measurable, K-valued finite step functions on S. Let U: L0 → B0(K) be defined by U(f)(s) = u(f(s)) for s in S
116
David Schmeidler
and f in L0. The function U is onto, and if U(f) = U(g), then by monotonicity f ∼ g, which in turn implies J(f) = J(g). We now define a real-valued function I on B0(K). Given a in B0(K), let f in L0 be such that U(f) = a. Then define I(a) = J(f). I is well defined since, as mentioned earlier, J is constant on U^{−1}(a):
(Diagram: J = I ∘ U, where U maps L0 onto B0(K), J maps L0 to R, and I maps B0(K) to R.)

We now have a real-valued function I on B0(K) which satisfies the following three conditions:
(i) For all α in K: I(αS*) = α.
(ii) For all pairwise comonotonic functions a, b, and c in B0(K) and α in [0, 1]: if I(a) > I(b) then I(αa + (1 − α)c) > I(αb + (1 − α)c).
(iii) If a(s) ≥ b(s) on S for a and b in B0(K), then I(a) ≥ I(b).
To see that (i) is satisfied, let y in Y be such that u(y) = α. Then J(y^S) = α and U(y^S) = αS*. Hence I(αS*) = α. Similarly, (ii) is satisfied because comonotonicity is preserved by U and J represents ≽, which satisfies comonotonic independence. Finally, (iii) holds because U preserves monotonicity. The corollary of Section 3 and the Remark following it in Schmeidler (1986) say that if a real-valued function I on B0(K) satisfies conditions (i), (ii), and (iii), then the nonadditive probability v on Σ defined by v(E) = I(E*) satisfies for all a and b in B0(K):
I(a) ≥ I(b)   iff   ∫_S a dv ≥ ∫_S b dv.   (5.1)
Hence, for all f and g in L0:

f ≽ g iff ∫_S U(f) dv ≥ ∫_S U(g) dv,
and the proof of the main part of the theorem is completed.
To prove the opposite direction, note first that it is shown and referenced in Schmeidler (1986) that if I on B0(K) is defined by (5.1), then it satisfies conditions (i), (ii), and (iii). (Only (ii) requires some proof.) Second, the assumptions of the opposite direction say that J is defined as a composition of U and I as in the diagram. Hence the preference relation on L0 induced by J satisfies all the required conditions. (U preserves monotonicity and comonotonicity, and ∫_S a dv is a (sup) norm continuous function of a.)
Finally, the uniqueness properties of the expected utility representation will be proved. Suppose that there exist an affine real-valued function u′ on Y and a
nonadditive probability v′ on Σ such that for all f and g in L0:

f ≽ g iff ∫_S u′(f(s)) dv′ ≥ ∫_S u′(g(s)) dv′.   (5.2)
Note that monotonicity of v′ can be derived instead of assumed. When considering (5.2) for all f and g in Lc we immediately obtain, from the uniqueness part of the von N–M Theorem, that u′ is a positive linear transformation of u. On the other hand it is obvious that the inequality in (5.2) is preserved under positive linear transformations of the utility. Hence, in order to prove that v′ = v we may assume without loss of generality that u′ = u. For an arbitrary E in Σ let f in L0 be such that U(f) = E*. (E.g., f(s) = y* on E and f(s) = y*/2 + y∗/2 on E^C. Then ∫_S U(f) dv = v(E) and ∫_S U(f) dv′ = v′(E).) Let y in Y be such that u(y) = v(E). (E.g., y = v(E)y* + (1 − v(E))(y*/2 + y∗/2).) Then f ∼ y^S, which in turn implies u(y) = u′(y) = ∫_S u′(y^S) dv′ = v′(E). The last equality is implied by (5.2).
In order to extend the Theorem to more general acts, we have to specify precisely the set of acts L on which the extension holds, and we have to extend correspondingly the definition of the integral with respect to nonadditive probability. We start with the latter. Denote by B the set of real-valued, bounded, Σ-measurable functions on S. Given a in B and a nonadditive probability v on Σ we define

∫_S a dv = ∫_{−∞}^0 (v(a ≥ α) − 1) dα + ∫_0^∞ v(a ≥ α) dα,

where v(a ≥ α) abbreviates v({s ∈ S | a(s) ≥ α}).
Each of the integrands mentioned earlier is monotonic, bounded, and identically zero where |α| > λ for some number λ. This definition of integration for nonnegative functions in B has been suggested by Choquet (1955). A more detailed exposition appears in Schmeidler (1986). It should be mentioned here that this definition coincides, of course, with the one at the beginning of this section when a attains finitely many values.
For the next definition, the existence of a weak order ≽ over Lc is presupposed. An act f: S → Y is said to be Σ-measurable if for all y in Y the sets {s | f(s) ≻ y} and {s | f(s) ≽ y} belong to Σ. It is said to be bounded if there are y and z in Y such that y ≽ f(s) ≽ z on S. The set of all Σ-measurable bounded acts in Y^S is denoted by L(≽). Clearly, it contains L0.
Corollary 5.1. (a) Suppose that a preference relation ≽ over L0 satisfies (i) weak order, (ii) comonotonic independence, (iv) continuity, and (v) monotonicity. Then it has a unique extension to all of L(≽) which satisfies the same conditions (over L(≽)). (b) If the extended relation, also to be denoted by ≽, is nondegenerate, then there exist a unique nonadditive probability v on Σ and an affine real-valued function u (unique up to positive linear transformations) such that for all f and g in L(≽): f ≽ g iff ∫_S u(f(·)) dv ≥ ∫_S u(g(·)) dv.
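Before the proof, a quick numerical sanity check may help: for a nonnegative step function, the Riemann-integral form of the definition must agree with the step-function formula of the beginning of this section. The sketch below (ours, reusing the hypothetical choquet_step from above) approximates the integral of v(a ≥ α) over [0, max a] on a grid.

```python
def choquet_riemann(values, v, grid=4000):
    # Approximate the integral of v(a >= alpha) over [0, max a]
    # for a nonnegative step function a, by a midpoint Riemann sum.
    top = max(values.values())
    da = top / grid
    total = 0.0
    for k in range(grid):
        alpha = (k + 0.5) * da
        event = frozenset(s for s, x in values.items() if x >= alpha)
        total += v(event) * da
    return total

a = {"s1": 3.0, "s2": 1.0, "s3": 2.0}
v = lambda E: (len(E) / 3) ** 2          # a nonadditive (convex) capacity
print(choquet_step(a, v))                # 14/9 = 1.555...
print(choquet_riemann(a, v))             # approximately the same
```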
Proof. The case of degeneracy is obvious, so assume nondegenerate preferences. Consider the following diagram:

(Diagram: the inner triangle L0 → B0(K) → R, with maps U, I, and J = I ∘ U, embeds via the inclusions i into the outer triangle L(≽) → B(K) → R, with maps U′, I′, and J′ = I′ ∘ U′.)

The inner triangle is that of the Proof of the Theorem. B(K) is the set of K-valued, Σ-measurable, bounded functions on S, and i denotes the identity (inclusion). U′ is the natural extension of U and is also onto. Because B0(K) is (sup) norm dense in B(K) and I satisfies condition (iii), I′ is the unique extension of I that satisfies on B(K) the three conditions that I satisfies on B0(K). The functional J′, defined on L(≽) by J′(f) = I′(U′(f)), extends J. Hence, the relation on L(≽) defined by f ≽ g iff J′(f) ≥ J′(g) extends the relation on L0, and satisfies the desired properties. By the corollary of Section 3 in Schmeidler (1986) there exists a nonadditive probability v on Σ such that for all f and g in L(≽): I′(U′(f)) ≥ I′(U′(g)) iff ∫_S U′(f) dv ≥ ∫_S U′(g) dv. Hence, the expected utility representation of the preference relation has been shown. To complete the proof of (b), the uniqueness of v and the uniqueness up to a positive linear transformation of u have to be established. However, this follows from the corresponding part of the Theorem. The uniqueness properties also imply that the extension of ≽ from L0 to L(≽) is unique.
Remark 5.1. Instead of first stating the Theorem for L0 and then extending it to L(≽), one can state the extended Theorem directly. More precisely, a preference relation ≽ on L, L0 ⊂ L ⊂ Y^S, is defined such that in addition to the conditions (i), (ii), (iv), and (vii) it satisfies L = L(≽). It can then be represented by expected utility with respect to a nonadditive probability. However, the first part of the Corollary shows that in this case the preference relation on L(≽) is overspecified: the preferences on L0 dictate those over L(≽).
Remark 5.2. If Σ does not contain all subsets of S, and #X ≥ 3, then L(≽) contains finite step functions that do not belong to L0. Let y and z in Y be such that y ∼ z but y ≠ z, and let E ⊂ S but E ∉ Σ. Define f(s) = y on E and f(s) = z on E^C. Clearly f ∈ L(≽) but f ∉ L0. The condition #X ≥ 3 is required to guarantee the existence of y and z as mentioned earlier.
Remark 5.3. It is an elementary exercise to show that under the conditions of the Theorem, v is additive iff ≽ satisfies (iii) independence (instead of, or in addition to, (ii) comonotonic independence). Also an extension of an independent relation,
as in Corollary 5.1(a), is independent. Hence our results formally extend the additive theory.
We now introduce formally the concept of uncertainty aversion alluded to in the Introduction. A binary relation ≽ on L is said to reveal uncertainty aversion if for any three acts f, g, and h in L and any α in [0, 1]: if f ≽ h and g ≽ h, then αf + (1 − α)g ≽ h. Equivalently we may state: if f ≽ g, then αf + (1 − α)g ≽ g. For the definition of strict uncertainty aversion the conclusion should be a strict preference ≻. However, some restrictions then have to be imposed on f and g. One such obvious restriction is that f and g are not comonotonic. We will return to this question in a subsequent Remark. Intuitively, uncertainty aversion means that “smoothing” or averaging utility distributions makes the decision maker better off. Another way to say this is that substituting objective mixing for subjective mixing makes the decision maker better off. The definition of uncertainty aversion may become more transparent when its full mathematical characterization is presented.
Proposition 5.1. Suppose that ≽ on L = L(≽) is the extension of ≽ on L0 according to the Corollary. Let v be the derived nonadditive subjective probability and let I (the I′ of the Corollary) be the functional on B, I(a) = ∫_S a dv. Then the following conditions are equivalent:
(i) ≽ reveals uncertainty aversion.
(ii) For all a and b in B: I(a + b) ≥ I(a) + I(b).
(iii) For all a and b in B and for all α in [0, 1]: I(αa + (1 − α)b) ≥ αI(a) + (1 − α)I(b).
(iv) For all a and b in B and for all α in [0, 1]: I(αa + (1 − α)b) ≥ min{I(a), I(b)}.
(v) For all α in R the sets {a ∈ B | I(a) ≥ α} are convex.
(vi) There exists an ᾱ in R s.t. the set {a ∈ B | I(a) ≥ ᾱ} is convex.
(vii) For all a and b in B and for all α in [0, 1]: if I(a) = I(b), then I(αa + (1 − α)b) ≥ I(a).
(viii) For all a and b in B: if I(a) = I(b), then I(a + b) ≥ I(a) + I(b).
(ix) v is convex. That is, for all E and F in Σ: v(E) + v(F) ≤ v(EF) + v(E + F) (where EF denotes E ∩ F and E + F denotes E ∪ F).
(x) For all a in B: I(a) = min{∫_S a dp | p ∈ core(v)}, where core(v) = {p: Σ → R | p is additive, p(S) = v(S), and for all E in Σ, p(E) ≥ v(E)}.
Proof. For any functional on B: (iii) implies (iv), (iv) implies (vii), (iv) is equivalent to (v), and (v) implies (vi). The positive homogeneity of degree one of I results in: (ii) equivalent to (iii), and (vii) equivalent to (viii). (vi) implies (v) because for all β in R (β = α − ᾱ), I(a + βS*) = I(a) + β, and because adding βS* preserves convexity. (viii) implies (ix): Suppose, without loss of generality, that v(E) ≥ v(F). Then there is γ ≥ 1 such that v(E) = γv(F). Since I(E*) = v(E) = γv(F) = I(γF*), we have by (viii), v(E) + γv(F) ≤ I(E* + γF*). But E* + γF* = (EF)* + (γ − 1)F* + (E + F)*, which implies I(E* + γF*) = v(EF) + (γ − 1)v(F) + v(E + F). Inserting the last equality in the inequality above leads to the inequality in (ix). The equivalence of (ix), (x), and (ii) is stated as Proposition 3 in Schmeidler (1986). Last but not least, (i) is equivalent to (iv). This becomes obvious after considering the mapping U′ from the diagram in the Proof of the Corollary.
The basic result of the Proposition is the equivalence of (i), (iii), (iv), (ix), and (x). (iv) is quasiconcavity of I and it is the translation of (i) by U′ from L to B. (iii) is concavity, which usually is a stronger assumption. Here I is concave iff it is quasiconcave. Concavity captures best the heuristic meaning of uncertainty aversion.
Remark 5.4. The Proposition holds if all the inequalities are strict and in (i) it is strict uncertainty aversion. To state this precisely, null or dummy events in Σ have to be defined. An event E in Σ is termed dummy if for all F in Σ: v(F + E) = v(F). In (ii)–(vii), in order to state a strict inequality one has to assume that a and b′ are not comonotonic for any b′ which differs from b only on a dummy set. To have a strict inequality in (ix) one has to assume that (E − F)*, (EF)*, and (F − E)* are not dummies. In (x) a geometric condition on the core of v has to be assumed.
Remark 5.5. The point of view of this work is that if the information is too vague to be represented by an additive prior, it still may be represented by a nonadditive prior. Another possibility is to represent vague information by a set of priors. Condition (x) and its equivalence to the other conditions of the Proposition point out when the two approaches coincide.
Remark 5.6. The concept of uncertainty appeal can be defined by: f ≽ g implies f ≽ αf + (1 − α)g. In the Proposition all the inequalities then have to be reversed and maxima have to replace minima. Obviously, an additive probability, or the independence axiom, reveals uncertainty neutrality.
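Condition (x) can be verified numerically on small examples. For a convex capacity on a finite state space, the extreme points of the core are the "marginal vectors" obtained by adding the states one at a time in some order (a standard fact from cooperative game theory), so the minimum in (x) can be computed by enumerating permutations. The sketch below (our illustration, reusing the hypothetical choquet_step) confirms the equality of the Choquet integral and the minimum expectation over the core for a convex v.

```python
from itertools import permutations

def core_min_expectation(values, v):
    # For a convex capacity v, the extreme points of core(v) are the
    # marginal vectors p(s_i) = v({s_1..s_i}) - v({s_1..s_{i-1}}),
    # one for each ordering of the states.
    states = list(values)
    best = float("inf")
    for order in permutations(states):
        prev, p = frozenset(), {}
        for s in order:
            cur = prev | {s}
            p[s] = v(cur) - v(prev)
            prev = cur
        best = min(best, sum(p[s] * values[s] for s in states))
    return best

a = {"s1": 3.0, "s2": 1.0, "s3": 2.0}
v = lambda E: (len(E) / 3) ** 2      # convex: v(E) + v(F) <= v(EF) + v(E + F)
print(choquet_step(a, v), core_min_expectation(a, v))  # both equal 14/9
```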
5.4. Concluding remarks

5.4.1 In the Introduction a point of view distinguishing between objective and subjective probabilities has been articulated. It is not necessary for the results
of this work. What matters is that the lotteries in Y be constructed of additive probabilities. These probabilities can be subjectively arrived upon. This is the point of view of Anscombe and Aumann (1963). They describe their result as a way to assess complicated probabilities, “horse lotteries,” assuming that the probabilities used in the simpler “roulette lotteries” are already known. The Theorem here can also be interpreted in this way, and one can consider the lotteries in Y as derived within the behavioristic framework as follows: Let Ω be a set (a roulette). An additive probability P on all subsets of Ω is derived via Savage’s Theorem. More specifically, let Z be a set of outcomes with two or more elements. (Suppose that the sets Z and X are disjoint.) Let F denote the set of Savage’s acts, that is, all functions from Ω to Z. Postulating the existence of a preference relation on F satisfying Savage’s axioms leads to an additive probability P on Ω. Next we identify a lottery, say y, in Y with all the acts from Ω to X which induce the probability distribution y. Thus we have a two-step model within the framework of a behavioristic (or personal or subjective) theory of probability. Since the motivation of our Theorem is behavioristic (i.e. derivation of utility and probability from preference), the conceptual consistency of the work requires that the probabilities in Y could also be derived from preferences. We will return to the question of conceptual consistency in the next Remark.
Instead of the two-step model of the previous paragraph one can think of omitting the roulette lotteries from the model. One natural way to do this is to try to extend Savage’s Theorem to nonadditive probability. This has been done by Gilboa (1987). Another approach has been followed by Wakker (1986), wherein he substituted a connected topological space for the linear structure of Y.
5.4.2 In recent years many articles have been written which challenged the expected utility hypothesis in the von Neumann–Morgenstern model and in the model with state-dependent acts. We restrict our attention to models that (i) introduce a functional representation of a preference relation derived from axioms, and (ii) separate “utilities” from “probabilities” (in the representation). Furthermore, (iii) we consider functional representations which are sums of products of two numbers; one number has a “probability” interpretation and the other number has a “utility” interpretation. (For recent works disregarding restriction (iii) the reader may consult Fishburn (1985) and the references there.) Restriction (iii) is tantamount to the functional representation used in the Theorem (the Choquet integral). An article that preceded the present work in this kind of representation using nonadditive probability is Quiggin (1982). (Thanks for this reference are due to a referee.) His result will be introduced here somewhat indirectly.
5.4.2.1 Consider a preference relation over acts satisfying the assumptions, and hence the conclusions, of the Theorem. Does there exist an additive probability P on Σ and a nondecreasing function f from the unit interval onto itself such that v(E) = f(P(E)) on Σ? (Such a function f is referred to as a distortion function.) Conditions leading to a positive answer when the function f is increasing are well known. (They are stated as a step in the proof in Savage (1954); see also Fishburn (1970).) In this case v represents a qualitative (or ordinal) probability, and
the question we deal with can be restated as follows: under what conditions does a qualitative probability have an additive representation? The problem is much more difficult when f is just nondecreasing but not necessarily increasing. A solution has been provided by Gilboa (1985).
5.4.2.2 The set of nonadditive probabilities which can be represented as a composition of a distortion function f and an additive probability P is “small” relative to the set of all nonadditive probabilities. For example, consider the following version of the Ellsberg paradox. There are 90 balls in an urn, 30 black, B, balls, and all the other balls are either white, W, or red, R. Bets on the color of a ball drawn at random from the urn are offered. A correct guess is awarded $100. There are six bets: “B”, “R”, “W”, “B or W”, “R or W”, and “B or R”. The following preferences constitute an Ellsberg paradox: B ≻ R ∼ W, R or W ≻ B or R ∼ B or W. It is impossible to define an additive probability on the events B, R, and W such that this probability’s (nondecreasing) distortion will be compatible with the preferences mentioned earlier. (A numerical illustration with a nonadditive v is given at the end of this subsection.)
5.4.2.3 In Quiggin’s model X is the set of real numbers. An act is a lottery of the form y = (x_i, p_i)_{i=1}^k where k ≥ 1, x_1 ≥ x_2 ≥ ··· ≥ x_k, p_i ≥ 0 and ∑p_i = 1. Quiggin postulates a weak order over all such acts which satisfies several axioms. As a result he gets a unique distortion function f and a monotonic utility function u on X, unique up to a positive linear transformation, such that the mapping y ↦ ∑_{i=1}^k (u(x_i) − u(x_{i+1})) f(∑_{j=1}^i p_j), with u(x_{k+1}) ≡ 0, represents the preferences. However, f(1/2) = 1/2. Quiggin’s axioms are not immediate analogues of the assumptions in Section 5.2. For example, he postulates the existence of a certainty equivalent for each act, that is, for every y there is x in X such that y ∼ x. Yaari (1987) simplified Quiggin’s axioms and got rid of the restriction f(1/2) = 1/2 on the distortion function. However, Yaari’s main interest was in the uncertainty aversion properties of the distortion function f. Hence his simplified axioms result in a linear utility over the set of incomes, X. He explored the duality between the concavity of the utility function in the theory of risk aversion and the convexity of the distortion function in the theory of uncertainty aversion. Quiggin extended his results from distributions over the real numbers with finite support to distributions over the real line having density functions. Yaari dealt with arbitrary distribution functions over the real line. Finally, Segal (1984) and Chew (1984) got the most general representation for Quiggin’s model. I conclude my remark on the works of Quiggin, Yaari, and Segal with a criticism from a normative, behavioristic point of view: it may seem conceptually inconsistent to postulate a decision maker who, while computing anticipated utility, assigns weight f(p) to an event known to him to be of probability p, p ≠ f(p). His knowledge of p is derived, within the behavioristic model, from preferences over acts (as in 5.4.1). The use of the terms “anticipation” and “weight,” instead of “expectation” and “probability,” does not resolve, in my opinion, the inconsistencies. One way out would be to follow paragraph 5.4.2.1 and to try to derive simultaneously distorted and additive probabilities of events.
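Returning to the urn of 5.4.2.2, here is the promised numerical illustration (the specific capacity values are ours, chosen only to exhibit the pattern). Since each bet pays a fixed prize on its event and nothing otherwise, its Choquet expected utility is simply v of that event, so the Ellsberg preferences amount to inequalities between capacity values that no distortion of an additive probability can produce.

```python
# Capacity for the urn: 30 black balls are known, so "B" and "R or W"
# are unambiguous; the split between red and white is not.
v = {
    frozenset("B"): 1/3, frozenset("R"): 1/6, frozenset("W"): 1/6,
    frozenset("RW"): 2/3, frozenset("BR"): 1/2, frozenset("BW"): 1/2,
    frozenset("BRW"): 1.0, frozenset(): 0.0,
}
# The Choquet EU of a bet on event E (prize u = 1 on E, 0 elsewhere) is v[E]:
assert v[frozenset("B")] > v[frozenset("R")] == v[frozenset("W")]
assert v[frozenset("RW")] > v[frozenset("BR")] == v[frozenset("BW")]
# Nonadditivity: v(R) + v(W) = 1/3 < 2/3 = v(R or W).
```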
5.4.3 The first version of this work (Schmeidler (1982)) includes a slightly extended version of the present Theorem. First recall that Savage termed an event E null if for all f and g in L: f = g on E^C implies f ∼ g. Clearly, if the conditions of the Theorem are satisfied, then an event is null iff it is dummy. The extended version of the Theorem includes the following addition: the nonadditive probability v of the Theorem satisfies the condition “v(E) = 0 implies E is dummy” if and only if the preference relation also satisfies: E is not null, f = g on E^C, and f(s) ≻ g(s) on E imply f ≻ g.
5.4.4 The expected utility model has in economic theory two other interpretations in addition to decisions under uncertainty. One interpretation is decisions over time: s in S represents a time or period. The other interpretation of S is the set of persons or agents in the society, and the model is applied to the analysis of social welfare functions. Our extension of the expected utility model may have the same uses. Consider the special case where f(s) is person s’s income. Two income allocations f and g are comonotonic if the social rank (according to income) of any two persons is not reversed between f and g. Comonotonic f, g, and h induce the same social rank on individuals, and then f ≽ g implies γf + (1 − γ)h ≽ γg + (1 − γ)h. This restriction on independence is, of course, consistent with strict uncertainty aversion, which can here be interpreted as inequality (or inequity) aversion. In other words, we have here an “Expected Utility” representation of a concave Bergson–Samuelson social welfare function (see the numerical sketch following 5.4.5).
5.4.5 One of the puzzling phenomena of decisions under uncertainty is people buying life insurance and gambling at the same time.¹ This behavior is compatible with the model of this chapter. Let S^0 = S^1 × S^2 × S^3, where s^1 in S^1 describes a possible state of health of the decision maker, s^2 in S^2 describes a possible resolution of the gamble, and s^3 in S^3 describes a possible resolution of all other relevant uncertainties. Let v^i be a nonadditive probability on S^i, i = 0, 1, 2, 3. Suppose that v^1 is strictly convex (i.e. satisfying strict uncertainty aversion) and v^2 is strictly concave (i.e. v^2(E) + v^2(F) > v^2(E ∪ F) + v^2(E ∩ F) if E\F and F\E are nonnull). Furthermore, if E^0 = E^1 × E^2 × E^3 and E^i ⊂ S^i, then v^0(E^0) = v^1(E^1)v^2(E^2)v^3(E^3). To simplify matters suppose that X is a bounded interval of real numbers (representing income in dollars), and the utility u is linear on X. Let the preference relation over acts on S^0 be represented by f ↦ ∫ u(f) dv^0. In this case buying insurance and gambling (betting) simultaneously is preferred to buying insurance only or gambling only, ceteris paribus. Also, either of these last two acts is preferred to “no insurance, no gambling.”
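As promised in 5.4.4, here is a small numerical sketch of the social-welfare reading (ours, with made-up incomes, reusing the hypothetical choquet_step): with a convex capacity depending only on the number of individuals, averaging two allocations that rank individuals oppositely strictly raises Choquet welfare, which is the inequality-aversion analogue of hedging.

```python
v = lambda E: (len(E) / 3) ** 2            # convex, hence inequality averse
f = {"ann": 9.0, "bob": 3.0, "eve": 6.0}   # one income allocation
g = {"ann": 3.0, "bob": 9.0, "eve": 6.0}   # same incomes, ann and bob swapped
mix = {i: 0.5 * (f[i] + g[i]) for i in f}  # the fully equal allocation (6, 6, 6)
print(choquet_step(f, v), choquet_step(g, v), choquet_step(mix, v))
# 4.67  4.67  6.0 -- mixing the two non-comonotonic allocations is
# strictly better, as strict uncertainty (inequality) aversion requires.
```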
Acknowledgments
I am thankful to Roy Radner for comments on the previous version presented at Oberwolfach, 1982. Thanks are due also to Benyamin Shitovitz and anonymous referees for pointing out numerous typos in previous versions. Partial financial support from the Foerder Institute and NSF Grant No. SES 8026086 is gratefully acknowledged. Parts of this research have been done at the University of Pennsylvania and at the Institute for Mathematics and its Applications at the University of Minnesota.
Note
1 It is not puzzling, as a referee pointed out, if one accepts the Friedman–Savage (1948) explanation of this phenomenon.
References
Anscombe, F. J. and R. J. Aumann (1963). “A Definition of Subjective Probability,” The Annals of Mathematical Statistics, 34, 199–205.
Chew, Soo Hong (1984). “An Axiomatization of the Rank-dependent Quasilinear Mean Generalizing the Gini Mean and the Quasilinear Mean,” mimeo.
Choquet, G. (1955). “Theory of Capacities,” Annales de l’Institut Fourier (Grenoble), 5, 131–295.
Dunford, N. and J. T. Schwartz (1957). Linear Operators, Part I. New York: Interscience.
Ellsberg, D. (1961). “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669.
Feynman, R. P. et al. (eds) (1963, 1965). The Feynman Lectures on Physics, Vol. I, Sections 37-4, 37-5, 37-6, and 37-7; Vol. III, Chapter 1.
Fishburn, P. C. (1970). Utility Theory for Decision Making. New York: John Wiley & Sons.
—— (1985). “Uncertainty Aversion and Separated Effects in Decision Making Under Uncertainty,” mimeo.
Friedman, M. and L. J. Savage (1948). “The Utility Analysis of Choices Involving Risk,” Journal of Political Economy, 56, 279–304.
Gilboa, I. (1985). “Subjective Distortions of Probabilities and Non-Additive Probabilities,” Working Paper, The Foerder Institute for Economic Research, Tel Aviv University.
—— (1987). “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88.
von Neumann, J. and O. Morgenstern (1947). Theory of Games and Economic Behavior, 2nd ed. Princeton: Princeton University Press.
Quiggin, J. (1982). “A Theory of Anticipated Utility,” Journal of Economic Behavior and Organization, 3, 323–343.
Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley & Sons (2nd ed. 1972, New York: Dover Publications).
Schmeidler, D. (1982). “Subjective Probability without Additivity” (Temporary Title), Working Paper, The Foerder Institute for Economic Research, Tel Aviv University.
—— (1986). “Integral Representation without Additivity,” Proceedings of the American Mathematical Society, 97, 253–261.
Segal, U. (1984). “Nonlinear Decision Weights with the Independence Axiom,” UCLA Working Paper #353.
Wakker, P. P. (1986). “Representations of Choice Situations,” Ph.D. Thesis, Tilburg; rewritten as Additive Representations of Preferences (1989). Norwell, MA: Kluwer Academic Publishers (Ch. VI).
Yaari, M. E. (1987). “The Dual Theory of Choice under Risk,” Econometrica, 55, 95–115.
6 Maxmin expected utility with non-unique prior
Itzhak Gilboa and David Schmeidler
6.1. Introduction

One of the first objections to Savage’s paradigm was raised by Ellsberg (1961). He suggested the following mind experiment, challenging the expected utility hypotheses: a subject is asked to rank four bets by preference. He/she is shown two urns, each containing 100 balls, each ball either red or black. Urn A contains 50 black balls and 50 red ones, while there is no additional information about urn B. One ball is drawn at random from each urn. Bet 1 is “the ball drawn from urn A is black,” and will be denoted by AB. Bet 2 is “the ball drawn from urn A is red,” and will be denoted by AR; similarly we have BB and BR. Winning a bet entitles the subject to $100. The following preferences have been observed empirically: AB ∼ AR ≻ BB ∼ BR. It is easy to see that there is no probability measure supporting these preferences through expected utility maximization. One conceivable explanation of this phenomenon, which we adopt here, is as follows: in the case of urn B, the subject has too little information to form a prior. Hence he/she considers a set of priors as possible. Being uncertainty averse, he/she takes into account the minimal expected utility (over all priors in the set) while evaluating a bet. For instance, one may consider the extreme case in which our decision maker takes into account all possible priors over urn B. In this case the minimal expected utility of each one of the bets AB and AR is $50, while that of bets BB and BR is $0, so that the observed preferences are compatible with the maxmin expected utility decision rule.
These ideas are not new. Hurwicz (1951) showed an example of statistical analysis where the statistician is too ignorant to have a unique “Bayesian” prior, but “not quite as ignorant” as to apply Wald’s decision rule with respect to all priors. Smith (1961) suggested considering an interval of priors in such situations. He tried to axiomatize this behavior pattern using the “odds” concept. Other works utilize Choquet integration with respect to capacities (Choquet (1955)) to deal with the
Gilboa, I. and D. Schmeidler (1989). Maxmin expected utility with non-unique prior, Journal of Mathematical Economics, 18, 141–153.
problem of a nonunique prior. Huber and Strassen (1973) use the Choquet integral in testing hypotheses regarding the choice between two disjoint sets of measures. Schmeidler (1982, 1984, 1986) axiomatizes the preferences representable via the Choquet integral of the utility with respect to a nonadditive probability measure. He used a framework including both “horse lotteries” and “roulette lotteries,” à la Anscombe and Aumann (1963). Gilboa (1987) obtains the same representation in the original framework of Savage (1954). (See also Wakker (1986).) In Schmeidler (1986) it has been shown, roughly speaking, that when the nonadditive probability v on S is convex (i.e. v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B)), the Choquet integral of a real-valued function, say a, with respect to v is equal to the minimum of {∫ a dP | P is in the core of v}. The core of v, by definition, consists of all finitely additive probability measures that majorize v pointwise (i.e. event-wise). That is to say, the nonadditive expected utility theory coincides with the decision rule we propose here, where the set of possible priors is the core of v. However, when an arbitrary (closed and convex) set of priors C is given, and one defines v(A) = min{P(A) | P ∈ C}, v need not be convex, though it is exact, that is, a pointwise minimum of additive set functions. (See examples in Schmeidler (1972) and Huber and Strassen (1973).) Furthermore, even if v happens to be convex, C does not have to be its core. It is not hard to construct an example in which C is a proper subset of the core of v.
This chapter proposes an axiomatic foundation of the maxmin expected utility decision rule. As in Schmeidler (1984), some of whose notation we repeat, we use the framework of Anscombe and Aumann (1963). The main difference among the models of Anscombe and Aumann (1963), Schmeidler (1984), and the present one lies in the phrasing of the independence axiom (sure-thing principle). Unlike in the other two works, we also use here an axiom of uncertainty aversion. Similarly to the nonadditive expected utility theory, this model extends classical expected utility. In general, the theories differ from each other; as mentioned earlier, they coincide in the case of a convex v.
The straightforward interpretation of our result is an extension of the neo-Bayesian paradigm which leads to a set of priors instead of a unique one. However, with a different interpretation, in which the set C is the set of possible probability distributions in a statistical decision problem, our result sheds light on Wald’s minimax criterion and on its relation to personalistic probability. (We refer here to the minimax loss criterion, which is equivalent to maximin utility, and not to the minimax regret criterion suggested by Savage (1954: Ch. 9).) In Wald (1950: Section 1.4.2), we find: “A minimax solution seems, in general, to be a reasonable solution of the decision problem when an a priori distribution in Ω does not exist or is unknown to the experimenter.” Hence our main result can be considered as an axiomatic foundation of Wald’s criterion.
The detailed exposition of the model and the main result are stated in the next section. The proof is given in Section 6.3, and Section 6.4 is devoted to an extension and several concluding remarks. Especially, we deal there with the definition of the concept of independence in the case of a nonunique prior.
Finally, we would like to note that different approaches to the phenomenon of a nonunique prior appear in Lindley et al. (1979), Vardeman and Meeden (1983), Agnew (1985), Genest and Schervish (1985), Bewley (1986), and others.
6.2. Statement of the main result

Let X be a set and let Y be the set of distributions over X with finite supports:

Y = {y: X → [0, 1] | y(x) ≠ 0 for only finitely many x’s in X and ∑_{x∈X} y(x) = 1}.
For notational simplicity we identify X with the subset {y ∈ Y | y(x) = 1 for some x in X} of Y. Let S be a set and let Σ be an algebra of subsets of S. Both sets, X and S, are assumed to be nonempty. Denote by L0 the set of all Σ-measurable finite step functions from S to Y and denote by Lc the constant functions in L0. Let L be a convex subset of Y^S which includes Lc. Note that Y can be considered a subset of some linear space, and Y^S, in turn, can then be considered as a subspace of the linear space of all functions from S to the first linear space. Whereas it is obvious how to perform convex combinations in Y, it should be stressed that convex combinations in Y^S are performed pointwise. That is, for f and g in Y^S and α in [0, 1], αf + (1 − α)g = h where h(s) = αf(s) + (1 − α)g(s) for s ∈ S.
In the neo-Bayesian nomenclature, elements of X are (deterministic) outcomes, elements of Y are random outcomes or (roulette) lotteries, and elements of L are acts (or horse lotteries). Elements of S are states (of nature) and elements of Σ are events. The primitive of a neo-Bayesian decision model is a binary (preference) relation over L, to be denoted by ≽. Next are stated several properties (axioms) of the preference relation, which will be used in the sequel.
A1 Weak order. (a) For all f and g in L: f ≽ g or g ≽ f. (b) For all f, g, and h in L: if f ≽ g and g ≽ h then f ≽ h.
The relation ≽ on L induces a relation also denoted by ≽ on Y: y ≽ z iff y* ≽ z*, where y*(s) = y for all s ∈ S. When no confusion is likely to arise, we shall not distinguish between y* and y. As usual, ≻ and ∼ denote the asymmetric and symmetric parts, respectively, of ≽.
A2 Certainty-Independence (C-independence for short). For all f, g in L and h in Lc and for all α in ]0, 1[: f ≻ g iff αf + (1 − α)h ≻ αg + (1 − α)h.
A3 Continuity. For all f, g, and h in L: if f ≻ g and g ≻ h then there are α and β in ]0, 1[ such that αf + (1 − α)h ≻ g and g ≻ βf + (1 − β)h.
A4 Monotonicity. For all f and g in L: if f(s) ≽ g(s) on S then f ≽ g.
A5 Uncertainty aversion. For all f, g ∈ L and α ∈ ]0, 1[: f ∼ g implies αf + (1 − α)g ≽ f.
A6 Nondegeneracy. Not for all f and g in L, f ≽ g.
All the assumptions except for A2 and A5 are quite common. The standard independence axiom is stronger than C-independence, as it allows h to be any act in L rather than restricting it to constant acts. This axiom seems heuristically more appealing: a decision maker who prefers f to g can more easily visualize the mixtures of f and g with a constant h than with an arbitrary one, hence he is less likely to reverse his preferences. An intuitive objection to the standard independence axiom is that it ignores the phenomenon of hedging. Like comonotonic independence (Schmeidler (1984)), C-independence does not exclude hedging. However, C-independence is much simpler than, and is implied by, comonotonic independence. Uncertainty aversion (which was introduced in Schmeidler (1984)) captures the phenomenon of hedging, especially when the preference is strict. Thus this assumption complements C-independence.
Before stating the main result we mention that the topology to be used on the space of finitely additive set functions on Σ is the product topology, that is, the weak* topology in Dunford and Schwartz (1957) terms. Recall that in this topology the set of finitely additive probability measures on Σ is compact.
Theorem 6.1. Let ≽ be a binary relation on L0. Then the following conditions are equivalent:
(1) ≽ satisfies assumptions A1–A5 for L = L0.
(2) There exist an affine function u: Y → R and a nonempty, closed and convex set C of finitely additive probability measures on Σ such that:

(∗) f ≽ g iff min_{P∈C} ∫ u ◦ f dP ≥ min_{P∈C} ∫ u ◦ g dP (for all f, g ∈ L0).

Furthermore: (a) The function u in (2) is unique up to a positive linear transformation; (b) The set C in (2) is unique iff assumption A6 is added to (1).
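A minimal numerical sketch of the rule (∗) on the two-urn example from the introduction (our illustration; utility is taken linear in money, and a finite grid of compositions of urn B stands in for the closed convex set C):

```python
def maxmin_eu(act, priors):
    # Evaluate an act (state -> utility) by its minimal expected utility.
    return min(sum(p[s] * u for s, u in act.items()) for p in priors)

# The decision maker entertains every composition of urn B:
C = [{"black": k / 100, "red": 1 - k / 100} for k in range(101)]
BB = {"black": 100.0, "red": 0.0}     # $100 if the B-ball is black
BR = {"black": 0.0, "red": 100.0}     # $100 if the B-ball is red
print(maxmin_eu(BB, C), maxmin_eu(BR, C))   # 0.0 0.0
# Urn A's composition is known, so AB and AR are each worth 0.5 * 100 = 50,
# yielding AB ~ AR > BB ~ BR, exactly the Ellsberg pattern.
```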
6.3. Proof of Theorem 6.1

The crucial part of the proof is that (1) implies (2). If A6 fails to hold, then a constant function u and any closed and convex subset C will satisfy (2); hence for the next several lemmata we suppose assumptions A1–A6.
Lemma 6.1. There exists an affine u: Y → R such that for all y, z ∈ Y: y ≽ z iff u(y) ≥ u(z). Furthermore, u is unique up to a positive linear transformation.
Proof. This is an immediate consequence of the von Neumann–Morgenstern theorem, since the independence assumption for Lc is implied by C-independence. (See Fishburn (1970: Ch. 8).)
Lemma 6.2. Given u: Y → R from Lemma 6.1, there exists a unique J: L0 → R such that: (i) f ≽ g iff J(f) ≥ J(g) (for all f, g ∈ L0); (ii) for f = y* ∈ Lc, J(f) = u(y).
Proof. On Lc, J is uniquely determined by (ii). We extend J to L0 as follows: given f ∈ L0, there are ȳ, y ∈ Y such that ȳ ≽ f ≽ y. By the continuity assumption and the other assumptions, there exists a unique α ∈ [0, 1] such that f ∼ αȳ + (1 − α)y. Define J(f) = J(αȳ + (1 − α)y). By construction, J satisfies (i), hence it is also unique.
We shall henceforth choose a specific u: Y → R such that there are y1, y2 ∈ Y for which u(y1) < −1 and u(y2) > 1. (Such a choice of a utility u is possible in view of the nondegeneracy assumption.) We denote by B the space of all bounded Σ-measurable real-valued functions on S (which is denoted B(S, Σ) in Dunford and Schwartz (1957)). B0 will denote the space of functions in B which assume finitely many values. Let K = u(Y), and let B0(K) be the subset of functions in B0 with values in K. For γ ∈ R, let γ* ∈ B0 be the constant function on S the value of which is γ.
Lemma 6.3. There exists a functional I: B0 → R such that:
(i) For all f ∈ L0, I(u ◦ f) = J(f) (hence I(1*) = 1).
(ii) I is monotonic (i.e. for a, b ∈ B0: a ≥ b ⇒ I(a) ≥ I(b)).
(iii) I is superlinear (i.e. superadditive and homogeneous of degree 1).
(iv) I is C-independent: for any a ∈ B0 and γ ∈ R, I(a + γ*) = I(a) + I(γ*).
Proof. We first define I on B0(K) by condition (i). (Lemma 6.2 and the monotonicity assumption assure that I is thus well defined.) We now show that I is homogeneous on B0(K). Assume a = αb where a, b ∈ B0(K) and 0 < α ≤ 1. We have to show that I(a) = αI(b). (This will imply the equality for α > 1.) Let g ∈ L0 satisfy u ◦ g = b. Let z ∈ Y satisfy J(z) = 0 and define f = αg + (1 − α)z. Hence u ◦ f = αu ◦ g + (1 − α)u ◦ z = αb = a, so I(a) = J(f). Let y ∈ Y satisfy y ∼ g (hence J(y) = J(g) = I(b)). By C-independence, αy + (1 − α)z ∼ αg + (1 − α)z = f. Hence J(f) = J(αy + (1 − α)z) = αJ(y) + (1 − α)J(z) = αJ(y). Whence I(a) = J(f) = αJ(y) = αI(b). We now extend I by homogeneity to all of B0. Note that I is monotone and homogeneous of degree 1 on B0.
Next we show that I is C-independent (part (iv) of the Lemma). Let there be given a ∈ B0 and γ ∈ R. By homogeneity we may assume without loss of generality that 2a, 2γ* ∈ B0(K). Now define β = I(2a) = 2I(a). Let f ∈ L0 satisfy u ◦ f = 2a and let y, z ∈ Y satisfy u ◦ y = β* and u ◦ z = 2γ*. Since
f ∼ y, C-independence of ≽ implies that ½f + ½z ∼ ½y + ½z. Hence

I(a + γ*) = I(½β* + γ*) = ½β + γ = I(a) + γ,

and I is C-independent.
It is left to show that I is superadditive. Let there be given a, b ∈ B0. Once again, by homogeneity we may assume without loss of generality that a, b ∈ B0(K). Furthermore, for the same reason it suffices to prove that I(½a + ½b) ≥ ½I(a) + ½I(b). Suppose that f, g ∈ L0 are such that u ◦ f = a and u ◦ g = b. If I(a) = I(b), then f ∼ g and by uncertainty aversion (assumption A5), ½f + ½g ≽ f, which, in turn, implies I(½a + ½b) ≥ I(a) = ½I(a) + ½I(b). Assume, then, I(a) > I(b), and let γ = I(a) − I(b). Set c = b + γ* and note that I(c) = I(b) + γ = I(a) by C-independence of I. Using the C-independence of I twice more and its superadditivity for the case proven earlier, one obtains:

I(½a + ½b) + ½γ = I(½a + ½c) ≥ ½I(a) + ½I(c) = ½I(a) + ½I(b) + ½γ,
which completes the Proof of the Lemma.
Recall that the space B is a Banach space with the sup norm ‖·‖, and B0 is a norm-dense subspace of B. Lemma 6.4 will also be used in an extension of the Theorem.
Lemma 6.4. There exists a unique continuous extension of I to B. Furthermore, this extension is monotonic, superlinear and C-independent.
Proof. We first show that for each a, b ∈ B0, |I(a) − I(b)| ≤ ‖a − b‖. Indeed, a = b + (a − b) ≤ b + ‖a − b‖*. Monotonicity and C-independence of I imply that I(a) ≤ I(b + ‖a − b‖*) = I(b) + ‖a − b‖, or I(a) − I(b) ≤ ‖a − b‖. The same argument implies I(b) − I(a) ≤ ‖b − a‖. Thus there exists a unique continuous extension of I. Obviously, it is superlinear, monotonic and C-independent.
In Lemma 6.5 the convex set of finitely additive probability measures C of Theorem 6.1 will be constructed via a separation theorem.
Lemma 6.5. If I is a monotonic, superlinear and C-independent functional on B with I(1*) = 1, there exists a closed and convex set C of finitely additive probability measures on Σ such that: for all b ∈ B, I(b) = min{∫ b dP | P ∈ C}.
Proof. Let b ∈ B with I(b) > 0 be given. We will construct a finitely additive probability measure P_b such that I(b) = ∫ b dP_b and I(a) ≤ ∫ a dP_b for all
a ∈ B. To this end we define D1 = {a ∈ B | I(a) > 1}, D2 = conv({a ∈ B | a ≤ 1*} ∪ {a ∈ B | a ≤ b/I(b)}). We now show that D1 ∩ D2 = ∅. Let d2 ∈ D2 satisfy d2 = αa1 + (1 − α)a2 where a1 ≤ 1*, a2 ≤ b/I(b), and α ∈ [0, 1]. By monotonicity, homogeneity, and C-independence of I, I(d2) ≤ α + (1 − α)I(a2) ≤ 1. Note that each of the sets D1, D2 has an interior point and that they are both convex. Thus, by a separation theorem (see Dunford and Schwartz (1957, V.2.8)) there exists a nonzero continuous linear functional p_b and an α ∈ R such that: for all d1 ∈ D1 and d2 ∈ D2,

p_b(d1) ≥ α ≥ p_b(d2).   (6.1)
Since the unit ball of B is included in D2, α > 0. (Otherwise p_b would have been identically zero.) We may therefore assume without loss of generality that α = 1. By (6.1), p_b(1*) ≤ 1. Since 1* is a limit point of D1, p_b(1*) ≥ 1 is also true, hence p_b(1*) = 1. We now show that p_b is non-negative, or, more specifically, that p_b(1_E) ≥ 0 whenever 1_E is the indicator function of some E ∈ Σ. Since p_b(1_E) + p_b(1* − 1_E) = p_b(1*) = 1, and 1* − 1_E ∈ D2, the inequality follows. By the classical representation theorem there exists a finitely additive probability measure P_b on Σ such that p_b(a) = ∫ a dP_b for all a ∈ B.
We will now show that p_b(a) ≥ I(a) for all a ∈ B, with equality for a = b. First assume I(a) > 0. It is easily seen that a/I(a) + (1/n)* ∈ D1, so the continuity of p_b and (6.1) imply p_b(a) ≥ I(a). For the case I(a) ≤ 0 the inequality follows from C-independence. Since b/I(b) ∈ D2, we obtain the converse inequality for b, thus p_b(b) = I(b).
We now define the set C as the closure of the convex hull of {P_b | I(b) > 0} (which, of course, is convex). It is easy to see that I(a) ≤ min{∫ a dP | P ∈ C}. For a such that I(a) > 0, we have shown the converse inequality to hold as well. For a such that I(a) ≤ 0, it is again a simple implication of C-independence.
Conclusion of the Proof of Theorem 6.1. Lemmata 6.1–6.5 prove that (1) implies (2). Assuming (2), define I on B by I(b) = min{∫ b dP | P ∈ C}, C compact and convex. It is easy to see that I is monotonic, superlinear, C-independent, and continuous. So, in turn, the preference relation defined on L0 by (2) satisfies A1–A5.
We now turn to prove the uniqueness properties of u and C. The uniqueness of u up to positive linear transformation is implied by Lemma 6.1. If Assumption A6 does not hold, the range of u, K, is a singleton, and C can be any nonempty closed and convex set. We shall now show that if assumption A6
does hold, C is unique. Assume the contrary, that is, that there are C1 ≠ C2, both nonempty, closed and convex, such that the two functions on L0,

J1(f) = min{∫ u(f) dP | P ∈ C1},
J2(f) = min{∫ u(f) dP | P ∈ C2},

both represent ≽. Without loss of generality one may assume that there exists P1 ∈ C1\C2. By a separation theorem (Dunford and Schwartz (1957: V.2.10)), there exists a ∈ B such that ∫ a dP1 < min{∫ a dP | P ∈ C2}. Without loss of generality we may assume that a ∈ B0(K). Hence there exists f ∈ L0 such that J1(f) < J2(f). Now let y ∈ Y satisfy y ∼ f. We get J1(y) = J1(f) < J2(f) = J2(y), a contradiction.
6.4. Extension and concluding remarks

A natural question arising in view of Theorem 6.1 is whether it holds when the set of acts L, on which the preference relation is given, is a convex superset of L0. A partial answer is presented in the sequel. It will be shown that, for a certain superset of L0, the preference relation on it is completely determined by its restriction to L0, should it satisfy the assumptions introduced in Section 6.2.
Given a weak order ≽ on Lc, an act f: S → Y is said to be Σ-measurable if for all y ∈ Y the sets {s | f(s) ≻ y} and {s | f(s) ≽ y} belong to Σ. It is said to be bounded (or, more precisely, ≽-bounded) if there are y1, y2 ∈ Y such that y1 ≽ f(s) ≽ y2 for all s ∈ S. The set of all Σ-measurable bounded acts in Y^S is denoted by L(≽). It is obvious that L(≽) is convex and contains L0.
Proposition 6.1. Suppose that a preference relation ≽ over L0 satisfies assumptions A1–A5. Then it has a unique extension to L(≽) which satisfies the same assumptions (over L(≽)).
Proof. Because of monotonicity, the proposition is obvious in case Assumption A6 does not hold. Therefore we assume it does, and we may apply Lemmata 6.1–6.4. We then define the extension of ≽ (also to be denoted by ≽) as follows: f ≽ g iff I(u ◦ f) ≥ I(u ◦ g). It is obvious that ≽ satisfies A1–A5 and that ≽ on L(≽) is the unique monotonic extension of ≽ on L0.
Remark. Suppose that ≽ satisfies A1–A5 over L, which is convex and contains L0. Then, in view of Proposition 6.1, ≽ may be represented as in Theorem 6.1 on L ∩ L(≽).
We now introduce the concepts of independence of acts and products of binary relations. Suppose that a given preference relation ≽ satisfies A1–A6 over L0. By Proposition 6.1 we extend it to L = L(≽) and let u and C be as in Theorem 6.1. Two acts f, g ∈ L are said to be independent if the following two conditions hold:
(1) There exists P0 ∈ C such that

∫ u ◦ f dP0 = min{∫ u ◦ f dP | P ∈ C} and ∫ u ◦ g dP0 = min{∫ u ◦ g dP | P ∈ C};
(2) u ◦ f and u ◦ g are two stochastically independent random variables with respect to any extreme point of C (for short: any element of Ext(C)).
As expected, this notion of independence turns out to be closely related to that of product spaces, once the latter is defined. We will refer to a triple (S, Σ, C) as a nonunique probability space. Given two nonunique probability spaces (S_i, Σ_i, C_i), i = 1, 2, we define their product (S, Σ, C) as follows: S = S1 × S2, Σ = Σ1 ⊗ Σ2, and C is the closed convex hull of {P1 ⊗ P2 | P1 ∈ C1, P2 ∈ C2}. Suppose that for a given set of outcomes X there are given two act spaces L0^i ⊂ Y^{S_i}, i = 1, 2, and two preference relations ≽_i correspondingly, such that the restrictions of ≽_1 and ≽_2 to Y coincide. As before, we suppose that each ≽_i satisfies A1–A6 and we consider its extension to L^i = L^i(≽_i). For the product act space L0 ⊂ Y^{S1×S2} we define the product preference relation ≽ = ≽_1 ⊗ ≽_2 as the one derived from u and C. It is obvious that ≽ also satisfies A1–A6, and it has a unique extension to L = L(≽). Given f^i ∈ L^i, it has a unique trivial extension f̄^i ∈ L. Now we formulate the result which justifies our definition of independence:
Proposition 6.2. Given L^1, ≽_1, L^2, ≽_2, and L as stated earlier, ≽ is the unique preference relation over L satisfying:
(1) assumptions A1–A6;
(2) for all f^i, g^i ∈ L^i, f^i ≽_i g^i iff f̄^i ≽ ḡ^i (i = 1, 2);
(3) for all f ∈ L^1 and g ∈ L^2, f̄ and ḡ are independent.
Proof. It is trivial to see that ≽ indeed satisfies (1)–(3). To see that it is unique, let ≽′ also satisfy (1)–(3). By (1) and our main result, ≽′ is representable by a utility
u′ and a convex and closed set of finitely additive measures C′. By Lemma 6.1 we assume without loss of generality that u = u′. We now wish to show that C′ = C.
Step 1. C′ ⊂ C.
Proof of Step 1. As C′ is convex, it suffices to show that Ext(C′) ⊂ C. Let, then, P0 ∈ Ext(C′). Define P_i to be the restriction of P0 to Σ_i (i = 1, 2). Choose A ∈ Σ1 and B ∈ Σ2, and let f ∈ L^1 and g ∈ L^2 satisfy u ◦ f = 1_A, u ◦ g = 1_B. Since f̄ and ḡ are independent, they are independent with respect to P0. Hence P0(A × B) = P0(A × S2)P0(S1 × B) = P1(A)P2(B). This implies P0 = P1 ⊗ P2 ∈ C.
Step 2. C ⊂ C′.
Proof of Step 2. We begin with
Step 2a. If Σ1 and Σ2 are finite, then C ⊂ C′.
Proof of Step 2a. By a theorem of Straszewicz (1935), it suffices to show that P1 ⊗ P2 ∈ C′ for all P1 ∈ Exp(C1) and P2 ∈ Exp(C2), where Exp(C) denotes the set of exposed points of C, that is, the points at which there exists a supporting hyperplane which does not pass through any other point of C. Let there be given, then, P1 ∈ Exp(C1) and P2 ∈ Exp(C2). Let f ∈ L^1 and g ∈ L^2 be such that

∫ u ◦ f dP1 = min{∫ u ◦ f dP | P ∈ C1} and ∫ u ◦ g dP2 = min{∫ u ◦ g dP | P ∈ C2}.

By the independence of f̄ and ḡ, there exists P0 ∈ C′ at which ∫ u ◦ f̄ dP and ∫ u ◦ ḡ dP are minimized simultaneously. By Step 1, P0 ∈ C, hence there are P1′ ∈ C1 and P2′ ∈ C2 such that P0 = P1′ ⊗ P2′. However, ∫ u ◦ f̄ dP0 = ∫ u ◦ f dP1′ and ∫ u ◦ ḡ dP0 = ∫ u ◦ g dP2′. By the uniqueness property of Exp(C_i) (i = 1, 2), we obtain P1′ = P1 and P2′ = P2. Hence P1 ⊗ P2 = P0 ∈ C′, and Step 2a is proved.
We will now complete the Proof of Step 2. Assume, by way of negation, that C\C′ ≠ ∅, that is, ≽ ≠ ≽′. As in the Proof of the Theorem, there exist f ∈ L0 and y ∈ Y such that f ≻ y* and y* ≻′ f. Consider the finite sub-algebra, say Σ̃, of Σ generated by f. There are finite sub-algebras Σ̃_i of Σ_i (i = 1, 2) such that Σ̃ ⊂ Σ̄ = Σ̃1 ⊗ Σ̃2. Next consider the restrictions of ≽_i to the Σ̃_i-measurable functions, and the restrictions of ≽, ≽′ to the Σ̄-measurable functions. Obviously, both ≽ and ≽′ satisfy requirements (1)–(3) of the Proposition, although they differ on the set of Σ̄-measurable functions (to which f and y* belong). This contradicts Step 2a, and the Proof of the Proposition is thus completed.
Acknowledgments The authors acknowledge partial financial support by the Foerder Institute for Economic Research and by The Keren Rauch Fund at Tel Aviv University.
References
Agnew, C. E. (1985). Multiple probability assessments by dependent experts, Journal of the American Statistical Association, 80, 343–347.
Anscombe, F. J. and R. J. Aumann (1963). A definition of subjective probability, The Annals of Mathematical Statistics, 34, 199–205.
Bewley, T. (1986). Knightian decision theory: Part 1, Mimeo (Yale University, New Haven, CT).
Choquet, G. (1955). Theory of capacities, Annales de l'Institut Fourier, 5, 131–295.
Dunford, N. and J. T. Schwartz (1957). Linear Operators, Part I (Interscience, New York).
Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly Journal of Economics, 75, 643–669.
Fishburn, P. C. (1970). Utility Theory for Decision Making (Wiley, New York).
Genest, C. and M. J. Schervish (1985). Modeling expert judgments for Bayesian updating, The Annals of Statistics, 13, 1198–1212.
Gilboa, I. (1987). Expected utility with purely subjective non-additive probabilities, Journal of Mathematical Economics, 16, 65–88.
Huber, P. J. and V. Strassen (1973). Minimax tests and the Neyman–Pearson lemma for capacities, The Annals of Statistics, 1, 251–263.
Hurwicz, L. (1951). Some specification problems and application to econometric models, Econometrica, 19, 343–344.
Lindley, D. V., A. Tversky and R. V. Brown (1979). On the reconciliation of probability assessments, Journal of the Royal Statistical Society, Series A, 142, 146–180.
Savage, L. J. (1954). The Foundations of Statistics (Wiley, New York).
Schmeidler, D. (1972). Cores of exact games, I, Journal of Mathematical Analysis and Applications, 40, 214–225.
Schmeidler, D. (1982). Subjective probability without additivity (temporary title), Working paper (Foerder Institute for Economic Research, Tel Aviv University, Tel Aviv).
Schmeidler, D. (1984). Subjective probability and expected utility without additivity, IMA Preprint Series. (Reprinted as Chapter 5 in this volume.)
Schmeidler, D. (1986). Integral representation without additivity, Proceedings of the American Mathematical Society, 97, No. 2.
Smith, Cedric A. B. (1961). Consistency in statistical inference and decision, Journal of the Royal Statistical Society, Series B, 23, 1–25.
Straszewicz, S. (1935). Über exponierte Punkte abgeschlossener Punktmengen, Fundamenta Mathematicae, 24, 139–143.
Vardeman, S. and G. Meeden (1983). Calibration, sufficiency and domination considerations for Bayesian probability assessors, Journal of the American Statistical Association, 78, 808–816.
Wakker, P. (1986). Ch. 6 in a draft of a Ph.D. thesis.
Wald, A. (1950). Statistical Decision Functions (Wiley, New York).
7
A simple axiomatization of nonadditive expected utility Rakesh Sarin and Peter P. Wakker
7.1. Introduction

Savage's (1954) subjective expected utility (SEU) theory has been widely adopted as the guide for rational decision making in the face of uncertainty. In SEU theory both the probabilities and the utilities are derived from preferences (see also Ramsey (1931)). This represents a hallmark contribution, as it avoids the reliance on introspection for quantifying tastes and beliefs. We continue in Savage's vein and extend his theory to derive a more general nonadditive expected utility representation, called Choquet expected utility (CEU). Schmeidler (1989, first version 1982) made the first contribution in providing a CEU representation and Gilboa (1987) extended this work. We develop this line of research further by providing an intuitive axiomatization of CEU.

The key distinction between our work and that of Savage is that we identify two types of events—unambiguous and ambiguous. People feel relatively "sure" about the probabilities of unambiguous events. An example of an unambiguous event could be the outcome of a toss of a fair coin (heads or tails). We assume that Savage's axioms hold for a sufficiently rich set of "unambiguous acts," that is, acts measurable with respect to the unambiguous events. The probabilities of ambiguous events, however, are not known with precision. An example of such an event could be next week's weather conditions (rain or sunshine). Ambiguity in the probability of such events may be caused, for example, by a lack of available information relative to the amount of conceivable information (Keynes, 1921). Most people exhibit a reluctance to bet on events with ambiguous probabilities. This reluctance leads to a violation of Savage's sure-thing principle (P2). The CEU theory proposed here does not impose the sure-thing principle for all events and is therefore capable of permitting a liking for specificity and a dislike for ambiguity in probability.

The key condition in this chapter to provide the CEU representation is "cumulative dominance" (P4 in Section 7.3). Simply stated, this condition requires that
Sarin, R. and P. P. Wakker (1992). "A simple axiomatization of nonadditive expected utility," Econometrica, 60, 1255–1272.
if receiving consequence α or a superior consequence is considered more likely for an act f than for an act g, for every α, then the act f is preferred to the act g. This condition is trivially satisfied for an SEU maximizer. Unlike the sure-thing principle, which forces the probabilities for all events to be additive, cumulative dominance permits that probabilities for some events could be nonadditive. A probability function is nonadditive if the probability of the union of two disjoint events is not equal to the sum of the individual probabilities of each event. An example will show how nonadditive probabilities could accommodate an aversion toward ambiguity.

The judgments and preferences that may lead to nonadditive probability have been rationalized by many authors. For example, Keynes (1921) has argued that confidence in probability influences decisions under uncertainty. Knight (1921) made the distinction between risk and uncertainty based on whether the event probabilities are known or unknown. Recently Schmeidler (1989) has argued that the amount of information available about an event may influence probabilities in such a way that probabilities are not necessarily additive. In a seminal paper, Ellsberg (1961) showed that if one accepts Savage's definition of probability then a majority of subjects violates additivity of probability. Numerous experiments since then have confirmed Ellsberg's findings. Even though Ellsberg's example is well known, we present it as it serves to illustrate the motivation and direction for our proposed modification of Savage's theory.

Suppose an urn is filled with 90 balls, 30 of which are red (R), and 60 of which are white (W) and yellow (Y) in an unknown proportion. One ball will be drawn randomly from the urn and your payoff will depend on the color of the drawn ball and the "act" (decision alternative) you choose. See Table 7.1. When subjects are asked to choose between acts f and g, a majority chooses act f, presumably because in act f the chance of winning $1,000 is precisely known to be 1/3. In act g the chance of drawing a white ball is ambiguous since the number of white balls is unknown. Now, when the same subjects are asked to choose between acts f′ and g′, a majority chooses act g′. Again, in act g′, the chance of winning $1,000 is precisely known to be 2/3, whereas in act f′, the chance of winning is ambiguous. Thus, subjects tend to like specificity and to avoid ambiguity.
Table 7.1 The Ellsberg options

          30 balls     60 balls
Act       Red          White        Yellow
f         $1,000       $0           $0
g         $0           $1,000       $0
f′        $1,000       $0           $1,000
g′        $0           $1,000       $1,000
By denoting v(R), v(W), and v(Y) the probabilities of drawing a red, white, or yellow ball, respectively, we obtain, assuming expected utility with u(0) = 0:

f ≻ g implies v(R)u(1,000) > v(W)u(1,000), or v(R) > v(W);
g′ ≻ f′ implies v(W)u(1,000) + v(Y)u(1,000) > v(R)u(1,000) + v(Y)u(1,000), or v(W) > v(R).

Thus, consistent probabilities cannot be assigned to the states, as v(R) cannot simultaneously be larger as well as smaller than v(W). Clearly, in this example no inconsistency results if v(R ∪ Y) ≠ v(R) + v(Y). In our development we permit nonadditive probabilities for some events (such as R ∪ Y) that we call ambiguous events. Our strategy is to differentiate between ambiguous and unambiguous events by requiring that only the acts that are measurable with respect to unambiguous events satisfy Savage's axioms. General acts are assumed to satisfy somewhat weaker conditions that may yield nonadditive probabilities for ambiguous events. It is to be noted that we do not require an a priori definition of unambiguous or ambiguous events (for the latter see Fishburn, 1991). We do, however, assume that there exists a subclass of events, such as those generated by a roulette wheel, such that an SEU representation holds with respect to these events. The idea is that these events are unambiguous. The subclass of unambiguous events should be rich enough to ensure that all ambiguous events can be calibrated by appropriate bets contingent on unambiguous events.

The strategy of permitting probabilities to be nonadditive and using them in CEU was first proposed by Schmeidler (1989, first version 1982). Schmeidler uses the set-up of Anscombe and Aumann (1963) (as refined in Fishburn, 1967, 1970, 1982), where for every state an act leads to an objective probability distribution, to formulate his axioms and derive the result. A nonadditive probability extension for the approach of Savage (1954) in full generality is very complicated. Gilboa (1987) succeeded in finding such an extension. The resulting axioms are, however, quite complicated and do not seem to have simple intuitive interpretations (see Fishburn 1988: 202). In this chapter, we propose another extension of Schmeidler's model that in our view has greater intuitive appeal. The basic idea is to reformulate Savage's axioms to permit nonadditivity in probability for ambiguous events (event R ∪ Y in Table 7.1) while preserving additivity for unambiguous events (event Y ∪ W in Table 7.1). Technically, our work may be viewed as a sort of unification of Gilboa (1987) and Schmeidler (1989), and builds heavily on these works. Additional axiomatizations of CEU that assume some rich structure on the consequences instead of the states have been provided in Wakker (1989a,b, 1993a) and Nakamura (1990, 1992). Wakker (1990) has shown that CEU, when applied to decision making under risk (where probabilities are extraneously specified), is identical to rank-dependent (anticipated) utility. A survey of several independent discoveries of the CEU form is given in Wakker (1991).
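The arithmetic of this escape route is easy to check numerically. The following sketch is not part of the original chapter; the specific values v(W) = 1/5, v(R ∪ Y) = 1/2, and v(W ∪ Y) = 2/3 are illustrative assumptions chosen only to respect the inequalities just derived.

```python
# Hypothetical capacity values for the urn of Table 7.1 (our choice, not the
# chapter's). These values cannot come from an additive measure: additivity
# would force v(R u Y) - v(W u Y) = v(R) - v(W) > 0, yet here v(R u Y) < v(W u Y).
u1000 = 1.0                       # utility of $1,000, with u($0) = 0

v = {
    frozenset({"R"}): 1/3,        # 30 of 90 balls are red: unambiguous
    frozenset({"W"}): 1/5,        # ambiguous, discounted below 1/3
    frozenset({"R", "Y"}): 1/2,   # ambiguous union
    frozenset({"W", "Y"}): 2/3,   # 60 of 90 balls: unambiguous
}

# A bet paying $1,000 on event A and $0 otherwise has CEU = u(1,000) * v(A).
ceu = {
    "f":  u1000 * v[frozenset({"R"})],
    "g":  u1000 * v[frozenset({"W"})],
    "f'": u1000 * v[frozenset({"R", "Y"})],
    "g'": u1000 * v[frozenset({"W", "Y"})],
}

assert ceu["f"] > ceu["g"]        # the majority choice: f over g
assert ceu["g'"] > ceu["f'"]      # the majority choice: g' over f'
print(ceu)
```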
Schmeidler’s lottery-acts formulation may be viewed as a two-stage process where a state s occurs in the first stage and in the second stage a lottery is played to determine the final consequence. If probabilities are additive the one-stage formulation (e.g. of Savage) and the two-stage formulation (e.g., of Anscombe and Aumann) yield the same conclusion. However, as we shall see, in the nonadditive case the two formulations yield different conclusions about the preference rankings of acts. We begin by presenting some notations and definitions in Section 7.2. Our axioms and main result are stated in Section 7.3. In Section 7.4 we explore the relationship between CEU and SEU models. An example and a general result showing the irreconcilability of Schmeidler’s two-stage formulation with a naturally equivalent one-stage formulation are presented in Section 7.5. Finally, Section 7.6 contains conclusions, and proofs are given in the Appendix.
7.2. Definitions

7.2.1. Elementary definitions

In this section we present the notation for the Savage (1954) style formulation for decisions under uncertainty and introduce some definitions that are useful in developing our results. There is a set C of consequences (payoffs, prizes, outcomes) and a set S of states of nature. The states in S are mutually exclusive and collectively exhaustive, so that exactly one state is the true state. We shall let A denote a σ-algebra of subsets of S; that is, A contains S, A ∈ A implies Ac (the complement of A) ∈ A, and A is closed under countable unions (this will be generalized in Remark 7.1). Thus A also contains Ø, and is closed under countable intersections. Subjective probabilities or "capacities" will be assigned to the elements of A; these elements are called events. An event A is informally said to occur if A contains the true state. The set C is also assumed to be endowed with a σ-algebra D; this will only play a role for acts with an infinite number of consequences.

A decision alternative or an act is a function from S to C that is measurable, that is, f⁻¹(D) ∈ A for all D ∈ D. If the decision maker chooses an act f, then the consequence f(s) will result where s is the true state. The decision maker is uncertain about which state is true, hence about which consequence will result from an act. The set of acts is denoted as F. Act f is constant if, for some α ∈ C, f(s) = α for all states s. Often a constant act is identified with the resulting consequence. Statements of conditions are simplified by defining fA as the restriction of f to A, and fAh as the act that assigns consequences f(s) to all s ∈ A, and consequences h(s) to all s ∈ S \ A. Given that consequences are identified with constant acts, fAα designates the act that is identical to f on A and constant α on S \ A; αAβ is similar. Further, for a partition {A1, . . . , Am}, we denote by α¹A1 . . . αᵐAm the act that assigns consequence αʲ to each s ∈ Aj, j = 1, . . . , m. Such acts are called step acts.¹ A binary relation ≽ over F gives the decision maker's preferences.
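Since the chapter leans heavily on the splicing notation fAh and on step acts, a small sketch may help fix ideas. It is ours, not the authors'; acts are simply encoded as Python dicts from states to consequences.

```python
def splice(f, A, h):
    """The act f_A h: agrees with f on A and with h on S \\ A."""
    return {s: (f[s] if s in A else h[s]) for s in h}

def step_act(assignments):
    """The step act a1_A1 ... am_Am, built from pairs (consequence, block)."""
    return {s: alpha for alpha, block in assignments for s in block}

f = {"s1": "x", "s2": "x", "s3": "x"}      # a constant act with consequence x
h = {"s1": "y", "s2": "y", "s3": "y"}      # a constant act with consequence y
print(splice(f, {"s1"}, h))                # the bet "x on {s1}, y otherwise"
print(step_act([("a", {"s1", "s2"}), ("b", {"s3"})]))
```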
The notations ≽, ≻, ≺, and ∼ are as usual. Further, ≽ is a weak order if it is complete (f ≽ g or g ≽ f for all f, g) and transitive. We define ≽ on C from ≽ on F through constant acts: α ≽ β if f ≽ g where f is constant α, g is constant β. Postulate P3 will ensure that ≽ on F and ≽ on C are in proper agreement. We assume that ≽ and D are compatible in the sense that all "preference intervals" are contained in D. A preference interval, as defined in Fishburn (1982), is a set E ⊂ C such that α, γ ∈ E, α ≽ β ≽ γ imply β ∈ E. A special case is a set E such that α ∈ E, β ≽ α implies β ∈ E. Such sets are called cumulative consequence sets. They will play a central role in this chapter. Example 7.A.1 shows why, in the absence of set continuity, cumulative dominance must include all cumulative consequence sets and not just sets of the form {β: β ≽ α}; in the latter case cumulative dominance would become too strong.

Following Savage (1954) (see also de Finetti (1931, 1937) and Ramsey (1931)), we define ≽ on A from ≽ on F through "bets on events": A ≽ B if there exist consequences α ≻ β such that αAβ ≽ αBβ. We then say that A is more likely than B. Postulate P4 will ensure that ≽ on A satisfies the usual conditions such as transitivity and completeness, and is in proper agreement with ≽ on F; see also Lemma 7.1 in Section 7.2.2. Obviously, in this chapter the more-likely-than relation will not correspond to an additive probability; it will correspond to a "capacity," that is, a nonadditive probability (see Lemma 7.1).²

We will make use of a sub-σ-algebra Aua of A that should be thought of as containing unambiguous events, for example events generated by the spin of a roulette wheel, or by repeated tosses of a coin. We denote by F ua the set of acts that are D–Aua measurable; that is, F ua contains the acts f for which f⁻¹(E) ∈ Aua for each E ∈ D. We will assume that Savage's (1954) axioms are satisfied if attention is restricted to the unambiguous events and F ua. An event A ∈ Aua is null if fAh ∼ gAh for all f, g ∈ F ua; it is non-null otherwise.

7.2.2. Choquet expected utility

A function v: A → [0, 1] is a capacity if v(Ø) = 0, v(S) = 1, and v is monotonic with respect to set-inclusion, that is, A ⊃ B ⇒ v(A) ≥ v(B). The capacity v is a (finitely additive) probability measure if, in addition, v is additive, that is, v(A ∪ B) = v(A) + v(B) for all disjoint A, B. A capacity v is convex-ranged if for every A ⊃ C and every µ between v(A) and v(C) there exists A ⊃ B ⊃ C such that v(B) = µ. For a capacity v and a measurable function φ: S → R, the Choquet integral of φ (with respect to v), denoted ∫S φ dv, or ∫ φ dv, or ∫ φ, and introduced in Choquet (1953–1954), is

∫R+ v({s ∈ S: φ(s) ≥ τ}) dτ + ∫R− [v({s ∈ S: φ(s) ≥ τ}) − 1] dτ.    (7.1)
In Wakker (1989b, Chapter VI) illustrations are given for the Choquet integral. We say that ≽ maximizes Choquet expected utility (CEU) if there exist a capacity v
on A and a measurable utility function U: C → R such that the preference relation ≽ is represented by f ↦ ∫S U(f(s)) dv; the latter is called the Choquet expected utility of f, denoted CEU(f). Suppose there are n states s1, . . . , sn and U(f(s1)) ≥ · · · ≥ U(f(sn)). Then

CEU(f) = ∑_{i=1}^{n−1} (U(f(si)) − U(f(si+1))) v({s1, . . . , si}) + U(f(sn)).
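For readers who want to experiment, the next sketch (not in the original text) implements this finite-state formula directly; with an additive v it collapses to ordinary expected utility.

```python
def choquet(u, v):
    """Choquet integral of a finite utility profile u: state -> real with
    respect to a capacity v on frozensets of states (v(S) = 1 assumed)."""
    states = sorted(u, key=u.get, reverse=True)      # U(f(s1)) >= ... >= U(f(sn))
    total, cum = 0.0, frozenset()
    for i, s in enumerate(states):
        cum = cum | {s}
        if i + 1 < len(states):                      # (U_i - U_{i+1}) * v({s1..si})
            total += (u[s] - u[states[i + 1]]) * v(cum)
        else:
            total += u[s]                            # last term: U_n * v(S) = U_n
    return total

# With an additive capacity this is just expected utility:
p = {"s1": 0.5, "s2": 0.2, "s3": 0.3}
u = {"s1": 3.0, "s2": 1.0, "s3": 2.0}
print(choquet(u, lambda A: sum(p[s] for s in A)))    # 2.3
```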
The proof of the following lemma is left to the reader.

Lemma 7.1. If ≽ on F maximizes CEU, then the relation ≽ on C is represented by the utility function U, and the relation ≽ on A is represented by the capacity v whenever U is nonconstant.
7.3. The main result

Apart from the well-known postulates of Savage on the unambiguous acts, we shall use one additional postulate, "cumulative dominance" (P4 below), to govern preferences over ambiguous acts. It is a natural extension of Savage's P4 to acts with more than two consequences. When restricted to acts with exactly two consequences, our P4 is identical to Savage's P4. It is best appreciated as an adaptation of the stochastic dominance condition. Let us recall that stochastic dominance applies to decision making under risk, where for each uncertain event A ∈ A a probability P(A) is well specified, and usually C is an interval within R. In this setting, an act (or its probability distribution as generated over consequences) stochastically dominates another if it assigns to each cumulative consequence set³ at least as high a probability. In the present set-up, without probabilities attached to each event, it is natural to say that an act f stochastically ("cumulatively") dominates an act g if the decision maker regards each cumulative consequence set at least as likely under f as under g. Monotonicity with respect to stochastic dominance, reformulated with this adaptation, is our additional postulate P4. It turns out that this condition, in the presence of the usual conditions and of Savage's conditions on a rich set of unambiguous acts, is necessary and sufficient for CEU.

To readers familiar with CEU and with Savage's set-up, the proof of the main result may be transparent if P4 is assumed. We hope that this mathematical simplicity is viewed as a strength of the chapter, because P4, in our opinion, is an intuitively appealing assumption about behavior under uncertainty as well. We first state the axioms and then the main theorem, which is followed by a discussion.

Postulate P1. Weak ordering.

Postulate P2. (The sure-thing principle for unambiguous acts). For all events A and acts f, g, h, h′ with fAh, gAh, fAh′, gAh′ ∈ F ua: fAh ≽ gAh ⇐⇒ fAh′ ≽ gAh′.
Postulate P3. For all events A ∈ A, acts f ∈ F, and consequences α, β: α ≽ β ⇒ αAf ≽ βAf. The reversed implication holds as well if A ∈ Aua, A is non-null, and f ∈ F ua.

Postulate P4. (Cumulative dominance). For all acts f, g we have: f ≽ g whenever f⁻¹(E) ≽ g⁻¹(E) for all cumulative consequence sets E.

Postulate P5. (Nontriviality). There exist consequences α, β such that α ≻ β.

Postulate P6. (Fineness of the unambiguous events). If α ∈ C and, for f ∈ F ua, g ∈ F, f ≻ g, then there exists a partition (A1, . . . , Am) of S, with all elements in Aua, such that αAj f ≻ g for all j; and the same holds with ≺ instead of ≻.

The following postulate is Gilboa's adaptation of Savage's P7 to the case of CEU. It is a technical condition, and is only needed for the extension of CEU to acts with infinite range. In order to state the postulate, we define an event A to be f-convex if for any s, s′ ∈ A and s″ ∈ S, f(s) ≽ f(s″) ≽ f(s′) ⇒ s″ ∈ A. Note that, for some fixed s ∈ A, f(s)Ah denotes the act that assigns f(s) to each s′ ∈ A, and is identical to h on Ac.

Postulate P7. For all f, g ∈ F, and nonempty f-convex events A, f(s)Af ≽ g
for all s ∈ A ⇒ f ≽ g,
and the same holds with ≼ instead of ≽.

We now state the main theorem. In it, cardinal abbreviates "unique up to scale and location."

Theorem 7.1. The following two statements are equivalent:

(i) The preference relation ≽ maximizes CEU for a bounded nonconstant utility function U on C, and for a capacity v on A. On Aua the capacity is additive and convex-ranged.
(ii) Postulates P1–P7 are satisfied.

Further, the utility function in statement (i) is cardinal, and the capacity is unique.

In this result, condition P4 can be weakened to the following "cumulative reduction" condition if, in addition, we include Savage's P4 (i.e. our P4 restricted to two-consequence acts). Cumulative reduction says that the only relevant aspect of an act is its "decumulative" distribution. Cumulative reduction follows from a two-fold application of P4, with the roles of f and g interchanged. This condition is the only implication of P4 that we shall use in the proof of Theorem 7.1 for acts
with more than two consequences. We have preferred to present the stronger P4 in the theorem because of its close relationship with stochastic dominance.

Postulate P4′. (Cumulative reduction). For all acts f, g we have: f ∼ g whenever
f⁻¹(E) ∼ g⁻¹(E)
for all cumulative consequence sets E.

Let us also point out that all conditions can be weakened to hold only for step acts, with the exception of P1, the act g in P6, and P7. If P4/P4′ is restricted to step acts, then cumulative consequence sets can be restricted to sets of the form {β ∈ C: β ≽ α} for some α ∈ C.

The next example considers the case where the state space is a product space. These are the cases considered by Schmeidler. The above theorem applies to any case where there is a sub-σ-algebra isomorphic to the Borel sets on [0, 1] endowed with the Lebesgue measure; the latter is somewhat more general than product spaces. The technique of this chapter allows for more generality: the sets of ambiguous acts and events can be quite general, as long as the set of unambiguous acts and events is sufficiently rich. This will be explicated in Remark 7.1. A further generalization can be obtained in our one-stage approach by imposing on F ua the conditions of Gilboa (1987), which lead to CEU, instead of using Savage's conditions, which lead to additive expected utility. The proof of this more general result is almost identical to the proof of Theorem 7.1. In other words, as soon as there is a sufficiently rich subset of acts on which CEU holds, then by cumulative dominance CEU will spread over all acts. Alternatively, for the rich subset of acts, we could have taken the set of probability distributions over the consequences, with expected utility or rank-dependent utility maximized there. We chose Savage's set-up because it is very appealing.

Example 7.1. Let [0, 1] be endowed with the usual Lebesgue measure (i.e. uniform distribution) over the usual Borel σ-algebra. Ω can be any set endowed with any σ-algebra. Let S = Ω × [0, 1], endowed with the usual product σ-algebra; v is any capacity that assigns the Lebesgue measure of E to any set Ω × E. C can be any arbitrary set, and U: C → R any function, nonconstant to avoid triviality. Preferences maximize CEU. With Aua the σ-algebra of all sets of the form Ω × E for E a Borel subset of [0, 1], all Postulates P1–P7 are satisfied.

Remark 7.1. The requirement that A should be a σ-algebra, and that all A–D measurable functions from S to C should be included in F, can be restricted to the unambiguous acts and events, as follows: (i) Aua should be a σ-algebra, and all Aua–D measurable functions from S to C should be included in F. Then, in addition, the following adaptations should be made. First, the measurability requirement should be imposed that for all f ∈ F and cumulative consequence sets E, f⁻¹(E) ∈ A. Second, Postulate P3 should be required only if αAf, βAf ∈ F. Third, the nontriviality Postulate P5 should be changed as follows.
Postulate P5′. There exist consequences α ≻ β such that αAβ ∈ F for all events A ∈ A.

P5′ as such is not a necessary condition for the CEU representation. Fourth and finally, for Postulate P7, needed for nonsimple acts, it should be required that for all acts f ∈ F, f-convex events A, and states s ∈ A, f(s)Af be contained in F (consequences can be "collapsed"). Note that this allows for great generality. For instance, A may consist of Aua, events described by a roulette wheel, and a collection of events entirely unrelated to the roulette wheel. There is no need to incorporate intersections or unions of events described by the roulette wheel and other events.

Let us finally comment further on the uniqueness of the capacity in Theorem 7.1. Suppose Statement (i) in Theorem 7.1 holds. Would there exist CEU representations that also represent the preference relation but have v nonadditive on Aua? The following observation answers this question.

Observation 7.1. Suppose Statement (i) in Theorem 7.1 holds. If there exist three or more equivalence classes of consequences, then for any CEU representation the capacity will be additive on Aua. If there exist no more than two equivalence classes of consequences, then any capacity can be taken that is a strictly increasing transform of the capacity of Theorem 7.1.⁴
7.4. Revealed unambiguous events

In this section we characterize revealed unambiguous events and partitions, that is, those for which the capacity is additive (defined hereafter). It is possible that a decision maker considers some events as ambiguous but nevertheless reveals an additive capacity with respect to these. The characterization of this section will lead to a generalization of the theorem of Anscombe and Aumann (1963).

A capacity is additive on a partition {A1, . . . , Am} if v(A ∪ B) = v(A) + v(B) for all disjoint events A, B that are unions of elements of the partition. This is equivalent to additivity of the capacity on the algebra generated by the partition. A capacity is additive with respect to an event A if it is additive with respect to the partition {A, Ac}, that is, if v(A) = 1 − v(Ac). Gilboa (1989) used the term symmetry for a capacity that is additive with respect to each event. As shown there, symmetry does not imply that the capacity is additive. A capacity is additive if and only if it is additive on each partition, which holds if and only if it is additive on each partition consisting of three events (consider, for disjoint events A, B, the partition {A, B, (A ∪ B)c}).

In the presence of the rich Aua in Theorem 7.1, the characterization of revealed unambiguous partitions is easy. Note that in CEU additivity of the capacity immediately leads to SEU. Machina and Schmeidler (1992) consider the case with an additive probability measure on the events, and a general (nonexpected utility) functional, such as used in Machina (1982). Like our main result, their main result weakens Savage's sure-thing principle and strengthens his P4. Their P4* implies the sure-thing principle for two-consequence acts, which our P4
obviously does not. In addition, it implies, mainly in the presence of P6, our P4. The Ellsberg paradoxes give examples where our P4 is satisfied but their P4* is not.

Proposition 7.1. Suppose Statement (i) in Theorem 7.1 holds. Let {A1, . . . , Am} be a partition. The following four statements are equivalent:

(i) The capacity is additive on the partition.
(ii) For all disjoint A and A′ that are unions of elements of the partition, and for disjoint unambiguous events Bua ∼ A, B′ua ∼ A′, we have A ∪ A′ ∼ Bua ∪ B′ua.
(iii) There exists an unambiguous partition {B1ua, . . . , Bmua} such that α¹A1 . . . αᵐAm ∼ α¹B1ua . . . αᵐBmua for all consequences α¹, . . . , αᵐ.
(iv) For each unambiguous partition {B1ua, . . . , Bmua} we have:

A1 ∪ . . . ∪ Aj ∼ B1ua ∪ . . . ∪ Bjua for all j ⇒ α¹A1 . . . αᵐAm ∼ α¹B1ua . . . αᵐBmua    (7.2)

for all consequences α¹, . . . , αᵐ.
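Statement (i) is easy to test mechanically. Here is a brute-force sketch (ours, with an assumed dictionary encoding of the capacity) that checks additivity on the algebra generated by a partition; the numerical values are the lower envelope of the Ellsberg urn and serve only as an illustration.

```python
from itertools import chain, combinations

def additive_on_partition(v, partition):
    """True iff v(A u B) = v(A) + v(B) for all disjoint A, B that are
    unions of cells of the partition."""
    cells = [frozenset(c) for c in partition]
    idx_sets = [set(ix) for r in range(len(cells) + 1)
                for ix in combinations(range(len(cells)), r)]
    union = lambda ix: frozenset(chain.from_iterable(cells[i] for i in ix))
    return all(abs(v[union(I | J)] - v[union(I)] - v[union(J)]) < 1e-12
               for I in idx_sets for J in idx_sets if not I & J)

# Illustrative capacity (assumed values, with v(W) = v(Y) = 0): additive on
# the partition {R} vs. {W, Y}, but not on the three-cell partition.
v = {frozenset(): 0.0, frozenset("R"): 1/3, frozenset("W"): 0.0,
     frozenset("Y"): 0.0, frozenset("WY"): 2/3, frozenset("RW"): 1/3,
     frozenset("RY"): 1/3, frozenset("RWY"): 1.0}
print(additive_on_partition(v, [{"R"}, {"W", "Y"}]))       # True
print(additive_on_partition(v, [{"R"}, {"W"}, {"Y"}]))     # False
```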
We could obviously obtain additivity of the capacity v in Statement (i) of Theorem 7.1 by adding any of the conditions in Statements (ii), (iii), or (iv) above, for each partition, to Statement (ii) of Theorem 7.1. Given the importance of the result that can be derived from Statement (iv), let us make the condition explicit:

Postulate P4″. (Reduction). For each partition {A1, . . . , Am} and each unambiguous partition {B1ua, . . . , Bmua}, (7.2) holds true.

If in the definition of reduction we had added the condition that the consequences in (7.2) are rank-ordered, that is, α¹ ≽ · · · ≽ αᵐ, then the condition would have been identical to P4′ (cumulative reduction) restricted to step acts, which is all of P4′ that is needed apart from its restriction to two-consequence acts (i.e. Savage's P4). P4″ resembles the reduction principle in Fishburn (1988), which is called neutrality in Yaari (1987). This principle says that if for two acts consequences are in some sense equally likely, then the acts are equivalent.

Corollary 7.1. In Statement (i) of Theorem 7.1 additivity of the capacity can be added if in Statement (ii) P4 (cumulative dominance) is replaced by P4″ (reduction) plus the restriction of P4 to two-consequence acts.

The above corollary can be regarded as a generalization of the result of Anscombe and Aumann (1963) and Fishburn (1967). Their structure is rich enough to satisfy P1–P3, P4″, and P5–P7. The set-up of the above corollary is more general in exactly the same way that the set-up of Theorem 7.1 is more general than the result of Schmeidler (1989): the state space is not required to be a Cartesian product of ambiguous and unambiguous events. All that is needed is that the set of unambiguous events be rich enough. In the same way that Theorem 7.1 can be
considered a unification of the results of Schmeidler (1989) and Gilboa (1987), the above corollary can be considered a unification of the results of Anscombe and Aumann (1963) and Savage (1954). The key feature in either case is that the events generated by a random device are incorporated within the state space. We think this is more natural than the two-stage approach of Anscombe and Aumann (1963). In the practice of decision analysis, objective probabilities of events in Aua generated by a roulette wheel will typically be used, as in Lemma 7.A.1 in the Appendix, to elicit "unknown" probabilities. This in no way requires a two-stage structure. While Theorem 7.1 was (apart from convex-rangedness) less general than Gilboa's result, the above corollary is a generalization of both Anscombe and Aumann's result and Savage's result. A generalization as indicated in Remark 7.1 can also be obtained for the above corollary. An earlier result along these lines, within the classical additive set-up, is Bernardo et al. (1985). Corollary 7.1 is more general, mainly because, unlike Bernardo et al., we do not require a stochastic independence relation as a primitive, or the existence of independent unambiguous events.
7.5. Nonequivalence of one- and two-stage approaches Schmeidler made the novel contribution of showing that CEU is capable of permitting attitudes toward ambiguity that are disallowed by Savage’s SEU. Schmeidler stated his axioms using the horserace–roulette wheel set-up of Anscombe and Aumann (1963). This is a two-stage set-up; that is, in the first stage an event (e.g. the horse Secretariat winning) obtains and in the second stage the consequence is determined depending, for example, on a roulette wheel. In Schmeidler’s model capacities are assigned to first-stage events. Further, the lotteries in the second stage are evaluated by the usual additive expected utility. An act assigns to each first-stage event a lottery, thus an expected utility value. The Choquet integral of these (with respect to the capacity over the first-stage events) gives the evaluation of the act. In our one-stage approach we embed the roulette wheel lotteries within Savage’s formulation by enlarging the state space S. Our one-stage approach is complementary to the two-stage approach of Schmeidler as it provides additional flexibility in modeling decisions under uncertainty. This one-stage approach to CEU was introduced in Becker and Sarin (1989). In the SEU theory, whether the one-stage or a two-stage approach is employed is purely a matter of taste or convenience in modeling. In the CEU framework, however, these two variations produce theoretically different results. We demonstrate this theoretical nonequivalence of one- and two-stage approaches through an example. Our analysis gives further evidence that multi-stage set-ups in nonexpected utility may cause complications. Gärdenfors and Sahlin (1983), Luce and Narens (1985), Luce (1988, 1991, 1992), Luce and Fishburn (1991), Segal (1987, 1990) focus on distinctions between one- and two-or-more-stage set-ups. Segal (1990) uses a two-stage set-up to describe an ambiguous event. Probabilities within each stage are assumed to be additive but they do not follow multiplicative rules between
[Figure 7.1, not reproduced here: panel (a) shows the two-stage (horserace–roulette) tree and panel (b) the one-stage formulation of the act f of Example 7.2. The act f yields utility 1 on HᵇHᵘᵇ, −1 on HᵇTᵘᵇ, and 0 on TᵇHᵘᵇ and TᵇTᵘᵇ; the boxed columns are the two calibrating bets, paying 1 on HᵇHᵘᵇ and 1 on {HᵇHᵘᵇ, TᵇHᵘᵇ, TᵇTᵘᵇ}, respectively.]
Figure 7.1 (a) Two-stage formulation of Example 7.2; (b) one-stage formulation of Example 7.2.
the two stages. Segal showed how dominance-type axioms can provide nonexpected utility characterizations in the two-stage set-up (see also Wakker, 1993b).

Example 7.2. This example is a small variation on one of the paradoxes of Ellsberg. The preferences used in the example are consistent with those observed in the Ellsberg paradox. Further, the single-stage capacities are uniquely determined by the equivalent two-stage model of Schmeidler. Suppose a biased coin and an unbiased coin will be tossed. The possible states of nature are HᵇHᵘᵇ, HᵇTᵘᵇ, TᵇHᵘᵇ, TᵇTᵘᵇ, where HᵇTᵘᵇ denotes the state where the biased coin lands heads up and the unbiased coin lands tails up, and so on. For simplicity assume that utility is known and that payment is in utility. It follows in Schmeidler's model that subjects consider a bet of 1 on Hᵘᵇ⁵ as well as a bet of 1 on Tᵘᵇ equivalent to 1/2 for certain (given that payment is in utility). It has been observed that subjects will typically consider a bet of 1 on Hᵇ as well as a bet of 1 on Tᵇ less preferable. Let us assume the latter bets are equivalent to α for certain, for some number α < 1/2.

In the two-stage set-up of Anscombe–Aumann and Schmeidler, decisions are formulated as shown in Figure 7.1(a). For the act f shown in Figure 7.1(a), the two-stage approach yields CEU(f) = 0, because the probability of Hᵘᵇ and Tᵘᵇ is 1/2. Thus, f is judged indifferent to a constant act g with consequence 0. Note that our assumption stated in the preceding paragraph implies that, with vᵐ denoting the capacity in the two-stage approach, vᵐ(Hᵇ) = vᵐ(Tᵇ) = α.

Now consider the one-stage formulation of the act in Figure 7.1(a) as depicted in Figure 7.1(b). To evaluate CEU(f) in Figure 7.1(b) we need the single-stage capacities, now denoted vʲ to distinguish them from the capacities in the two-stage approach: vʲ(HᵇHᵘᵇ) and vʲ({HᵇHᵘᵇ, TᵇHᵘᵇ, TᵇTᵘᵇ}). For consistency with
the two-stage approach (see the boxed columns in Figure 7.1(a) and (b)), the first column in Schmeidler's two-stage approach is equivalent to α/2 and the second column to α × 1 + (1 − α) × 1/2 = 1/2 + α/2, so vʲ(HᵇHᵘᵇ) = α/2 and vʲ({HᵇHᵘᵇ, TᵇHᵘᵇ, TᵇTᵘᵇ}) = 1/2 + α/2 must be chosen. Hence, in the one-stage approach, CEU(f) is α/2 + (1 − (1/2 + α/2)) × (−1) = α − 1/2 < 0; it follows that f ≺ g (≡ 0). Thus the one- and the two-stage approaches yield different results, and are irreconcilable. They only agree in the additive case α = 1/2. In Sarin and Wakker (1990: Theorem 10) it is shown that the result of the above example holds in full generality. That is to say, only under expected utility can the one- and two-stage approaches of CEU be equivalent. As soon as the capacity is nonadditive in Schmeidler's two-stage approach, the equivalent one-stage approach is not a CEU model.
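The computation in Example 7.2 can be replayed for a range of α. The following sketch uses our own encoding of the act f (utilities 1, −1, 0, 0 on the four states, as in Figure 7.1) and is not code from the chapter.

```python
def example_7_2(alpha):
    """Two-stage vs. one-stage CEU of the act f of Example 7.2,
    where v_m(H_b) = v_m(T_b) = alpha."""
    # Two-stage: reduce each unbiased-coin lottery by expected utility first.
    eu_Hb = 0.5 * 1 + 0.5 * (-1)              # f pays 1 on H_ub, -1 on T_ub
    eu_Tb = 0.5 * 0 + 0.5 * 0                 # f pays 0 on T_b regardless
    hi, lo = max(eu_Hb, eu_Tb), min(eu_Hb, eu_Tb)
    two_stage = (hi - lo) * alpha + lo        # Choquet over the first stage: 0
    # One-stage: capacities calibrated to the two-stage model (see text).
    v_top1 = alpha / 2                        # v_j(HbHub)
    v_top3 = 0.5 + alpha / 2                  # v_j({HbHub, TbHub, TbTub})
    # Choquet integral of the utilities 1 >= 0 >= 0 >= -1:
    one_stage = (1 - 0) * v_top1 + (0 - (-1)) * v_top3 - 1
    return two_stage, one_stage

for alpha in (0.3, 0.4, 0.5):
    print(alpha, example_7_2(alpha))          # the two agree only at alpha = 0.5
```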
7.6. Conclusion Savage’s SEU theory is widely accepted as a rational theory of decision making under uncertainty in economics and decision sciences. Unfortunately, however, people’s choices violate the axioms of SEU theory in some well-defined situations. One such situation is when event probabilities are ambiguous. In this chapter we have shown that a simple extension of SEU theory called CEU theory can be derived by assuming a natural cumulative dominance condition. CEU permits a subject to assign probabilities to events so that the probability of a union of two disjoint events is not necessarily the sum of the individual event probabilities. The violation of additivity may occur because a person’s choice may be influenced by the degree of confidence or specificity about the event probabilities. Schmeidler and Gilboa have also proposed axioms to derive the CEU representation. Building on their work, we have provided the simplest derivation of CEU presently available. Also, conditions have been given under which CEU reduces to SEU. It is also shown that unlike SEU theory, where a one-stage set-up of Savage or a two-stage set-up of Anscombe and Aumann yield identical results, the two-stage set-up of Schmeidler cannot be reconciled with a one-stage formulation unless event probabilities are additive. In our opinion the one-stage set-up as used by Gilboa seems more appropriate in single-person decision theory. We hope that our work has clarified the distinction between CEU and SEU theories and that it will stimulate further research and additional explorations of CEU.
Appendix: Proofs

A1. Proof of Theorem 7.1, Remark 7.1, and Observation 7.1

For the implication (i) ⇒ (ii) in Theorem 7.1, suppose (i) holds. Then P1 follows directly. P2 and P3 are standard results from, mainly, the usual additive expected utility theory. For Postulate P4, note that if [f⁻¹(E) ≽ g⁻¹(E) for all cumulative consequence sets E], then by Lemma 7.1 the integrand in (7.1) is at least as large for φ = U ◦ f as for φ = U ◦ g. So f ≽ g, as P4 requires. P5 is direct from
nonconstantness of U. For P6, let f ∈ F ua, g ∈ F, f ≻ g (the case f ≺ g is similar) and α ∈ C. By boundedness of utility, there exists µ > 0 such that ∀s ∈ S: U(f(s)) − U(α) < µ. Because v is convex-ranged within Aua, we can take a partition {A1, . . . , Am} of S such that Aj ∈ Aua and v(Aj) < (CEU(f) − CEU(g))/µ for all j. For P7, let f, g ∈ F, and let A ∈ A be a nonempty event (f-convexity of A will not be used). Then, with U∗ = U ◦ f on Ac, and U∗ = infA U ◦ f on A (the inf is real-valued by nonemptiness of A and boundedness of U), the premise in P7 implies

∫ U ◦ f dv ≥ ∫ U∗ dv = inf_{s∈A} ∫ U ◦ (f(s)Af) dv ≥ CEU(g).
Next we suppose (ii) holds, and derive (i) and the uniqueness results, including Observation 7.1. It is immediate that Savage's postulates P1–P6 hold true on F ua. So we get an SEU representation on F s,ua, which denotes the set of step acts in F ua: there exist a cardinal utility function U: C → R and a unique additive probability measure P on Aua such that expected utility represents preferences on F s,ua. We call P(A) the "probability" of A. As follows from Savage (1954), P is atomless and satisfies convex-rangedness. Obviously, P will be the restriction of v to Aua.

Let us next extend the CEU representation, as now established for all unambiguous step acts, to all step acts. First, we define the capacity v. By P5 there are consequences ζ ≻ η, which are kept fixed throughout the proof.

Lemma 7.A.1. For each event A there exists an Aua ∈ Aua such that ζAη ∼ ζAuaη.

Proof. By P2, ζSη ≽ ζAη ≽ ζØη. Suppose that in fact ζSη ≻ ζAη ≻ ζØη (otherwise we are done immediately), and that for an event Bua ∈ Aua we have ζAη ≻ ζBuaη (e.g. Bua = Ø). This implies P((Bua)c) > 0. By P6, there exists a partition C1, . . . , Cn of S, with all Cj ∈ Aua, such that ζBua∪Cjη ≺ ζAη for all j. There exists at least one Cj ∩ (Bua)c with strictly positive probability. So there exists an event B′ua := Bua ∪ Cj with probability strictly greater than that of Bua, and such that still ζAη ≻ ζB′uaη. So, using convex-rangedness, the set of probabilities of events Bua as above must be of the form [0, p−[ for some 0 < p− ≤ 1. Similarly, the set of probabilities of events Cua ∈ Aua such that ζAη ≺ ζCuaη must be of the form ]p+, 1] for some 0 ≤ p+ < 1. The only possibility is p− = p+. By convex-rangedness there exists an event Aua ∈ Aua with probability p−. Now ζAη ∼ ζAuaη is the only possibility. Q.E.D.

Thus, for every A ∈ A, there exists an Aua that is equally likely. Because each possible choice of Aua has the same P value, we can define v by v(A) := P(Aua), extending v from Aua (where v = P) to the entire A. For monotonicity with respect to set-inclusion, suppose that A ⊃ B. Then, by P2, ζAη ≽ ζBη. From this v(A) ≥ v(B) follows, and v is a capacity.

To establish the CEU representation for all step acts, we construct for each ambiguous step act an unambiguous one "with the same cumulative distribution."
That is, for the ambiguous and the unambiguous acts the events of obtaining a consequence at least as good as α are equally likely, for each consequence α. For step acts this is not only necessary, but also sufficient, to have all cumulative consequence sets equally likely under the two acts. First, we extend Lemma 7.A.1. The proof of the extension is completely similar, with µ, ν in the place of ζ, η, further f in the place of ζAη, and µ ≽ f ≽ ν implied by P4.
Lemma 7.A.2. For each act f for which there exist consequences µ, ν such that [∀s ∈ S: µ ≽ f(s) ≽ ν], there exists an Aua ∈ Aua such that µAuaν ∼ f.

Obviously, by the SEU representation as already established, ζAuaη ∼ ζBuaη for each unambiguous event Bua equally likely as Aua. By convex-rangedness of P and Lemma 7.A.1, for each partition A1, . . . , Am of S we can find an unambiguous partition B1, . . . , Bm of S such that A1 ∪ · · · ∪ Aj is equally likely as B1 ∪ · · · ∪ Bj, for each j. To do so, first we find an unambiguous B′1 ∼ A1, and set B1 := B′1. Next we find an unambiguous B′2 ∼ A1 ∪ A2. By convex-rangedness of P, we can find an unambiguous B2 with B2 ∩ B1 = Ø such that P(B1 ∪ B2) = P(B′2), so that B1 ∪ B2 ∼ A1 ∪ A2, and so on.

The next paragraph is the central part of the proof, and is simple. The other parts of the proof are all standard after Savage (1954), using Gilboa's (1987) P7. Let α¹A1 . . . αᵐAm be an arbitrary step act, with α¹ ≽ · · · ≽ αᵐ. We take an unambiguous partition {B1, . . . , Bm} as described earlier. The unambiguous act α¹B1 . . . αᵐBm, by two-fold application of P4 (once with ≽, once with ≼), is equivalent to the ambiguous act. Its SEU value can, similarly to the Choquet integral, be written as P(B1)U(α¹) + [P(B1 ∪ B2) − P(B1)]U(α²) + · · · + [1 − P(B1 ∪ · · · ∪ Bm−1)]U(αᵐ). This shows that it is identical to the CEU value of the ambiguous act. So indeed CEU represents preferences between all step acts.

The extension of the CEU representation to non-step acts is mainly by P7, and is similar to Gilboa (1987). Note that this in particular establishes the expected utility representation on the entire set F ua. Contrary to Gilboa (1987), our capacity need not be convex-ranged. We can however follow the reasoning of his subsection 4.3 with only unambiguous step acts f̄, ḡ. Convex-rangedness is used there for the existence of ḡ, while convex-rangedness of P suffices for that. In the proof of his Theorem 4.3.4, in Statement (i), the act f̄ can always be chosen unambiguous, by Lemma 7.A.2. Let us also mention that one cannot restrict P7 to F ua. This would be possible if for each ambiguous act there would exist an unambiguous act with the same cumulative distribution. This however is not the case in general. For example, if P is countably additive, then it cannot generate strictly finitely additive distributions; for example, with C = R, it does not generate cumulative distribution functions that are not continuous from the right. Also it is possible that, for instance, U(C) = [0, 1[, P is countably additive, and there exists a positive ε such that under an ambiguous act f each cumulative event {s ∈ S: f(s) ≽ α} (0 ≤ U(α) < 1) has capacity at least ε.
The utility functions must be bounded, as follows from the representation on F ua. This is shown in Fishburn (1970: section 14.1), and in the second, 1972, edition of Savage (1954: footnote on p. 80).

Finally we establish the uniqueness results. By the standard results of Savage (1954) we get cardinality of U, and uniqueness of the restriction P of v to Aua. The extension of v to A \ Aua shows that v is uniquely determined. Next let us suppose that v is allowed to be nonadditive on Aua, as studied in Observation 7.1. Let us at first also suppose that there are three or more nonequivalent consequences. Then the representation, if restricted to F ua, satisfies all conditions in Gilboa (1987); hence by his uniqueness results the restriction of v to Aua is unique, so additive. The uniqueness of v follows in the same way as above. Let us finally consider the case where there are exactly two equivalence classes of consequences, with say ζ ≻ η. Any U′ instead of U in a CEU representation is constant on equivalence classes of consequences and satisfies U′(ζ) > U′(η). So U′ is a strictly increasing transform of U, and obviously is bounded. Given the two-valued range, U′ is cardinal. Because any v′ in a CEU representation has to represent the same ordering over events as v, v′ must be a strictly increasing transform of v. Conversely, any such v′ will do. Thus, it is possible to choose v′ such that it is not convex-ranged and not additive on Aua. It can however always be chosen such that it is convex-ranged and additive on Aua.

For the proof of Remark 7.1, note that all constructions in the proof of the implication (ii) ⇒ (i) of Theorem 7.1 (including the extension to nonsimple acts, following Gilboa 1987) remain possible under the conditions of Remark 7.1.

Our result has not established convex-rangedness of the capacity v. That can be characterized by the addition of one condition, Gilboa's P6*. We propose to rename this as "solvability." Solvability is satisfied if for all acts f, g, consequences α ≻ β, and events A, if αAf ≽ g ≽ βAf, with αAf, βAf "comonotonic" (∀s ∈ Ac: f(s) ≽ α or f(s) ≼ β), there exists an event B ⊂ A such that αBβA\BfAc ∼ g. That solvability, even if restricted to two-consequence acts, is sufficient for convex-rangedness of v follows mainly from convex-rangedness of P, which gives all desired "intermediate" g. Necessity is straightforward.

Proposition 7.A.1. Suppose Statement (i) in Theorem 7.1 holds. Then v is convex-ranged if and only if ≽ satisfies solvability.

For the case of three or more equivalence classes of consequences, a more general derivation, without use of F ua, is given in Gilboa (1987). If there are exactly two equivalence classes of consequences and v is not required to be additive on Aua, then, by Observation 7.1, v need not be convex-ranged, even if solvability is satisfied.

The following example shows why we used cumulative consequence sets, instead of the less general sets of the form {β ∈ C: β ≽ α} for α ∈ C, in the definition P4 of cumulative dominance and its derivatives P4′ and P4″. Note that the distinction is relevant only for nonstep acts, and that we could have restricted
P4, P4′, and P4″ to step acts. In that case, we could have used the less general sets mentioned earlier.

Example 7.A.1. Suppose the special case of Statement (i) in Theorem 7.1 holds where in fact all of Savage's axioms are satisfied, so that v is an additive probability measure, which we denote by P. Let C = {−1/j: j ∈ N} ∪ {1 + 1/j: j ∈ N}, and let U be the identity. Let {Aj}∞j=1 ∪ {Bj}∞j=1 be a partition of S, and {A′j}∞j=1 ∪ {B′j}∞j=1 another partition of S. Let A := ∪∞j=1 Aj; B, A′, and B′ are defined similarly. Suppose that P(Aj) = P(A′j) and P(Bj) = P(B′j) for all j, and further that P(A) > P(A′). Such cases can be constructed if P is not set-continuous, that is, not countably additive. Let f assign 1 + 1/j to each Aj, and −1/j to each Bj; similarly, f′ assigns 1 + 1/j to each A′j, and −1/j to each B′j. For each consequence 1 + 1/j we have

P(f(s) ≽ 1 + 1/j) = P(A1) + · · · + P(Aj) = P(A′1) + · · · + P(A′j) = P(f′(s) ≽ 1 + 1/j).

For each consequence −1/j we have

P(f(s) ≽ −1/j) = 1 − P(f(s) ≺ −1/j) = 1 − [P(B1) + · · · + P(Bj−1)] = 1 − [P(B′1) + · · · + P(B′j−1)] = P(f′(s) ≽ −1/j).

So for each α ∈ C: {s ∈ S: f(s) ≽ α} ∼ {s ∈ S: f′(s) ≽ α}. However, for 0 ≤ µ ≤ 1, P(f(s) ≥ µ) = P(A) > P(A′) = P(f′(s) ≥ µ). By Formula (7.1), CEU(f) − CEU(f′) = 1 × (P(A) − P(A′)) > 0. So f ≻ f′. Only for cumulative consequence sets E := [µ, ∞[ with µ as above do we not have f⁻¹(E) ∼ f′⁻¹(E).

A2. Proof of Proposition 7.1

The implications (i) ⇒ (ii) and (i) ⇒ (iv) are direct. The implication (i) ⇒ (iii) follows from convex-rangedness of P. Next we prove that Statement (i) is implied by each of the other statements. (ii) ⇒ (i) is direct. (iii) ⇒ (ii) follows from taking A and A′ as unions of Aj's, taking Bua and B′ua as unions of the corresponding Bjua's, and from the equivalences (with ζ ≻ η) ζAη ∼ ζBuaη, ζA′η ∼ ζB′uaη, and ζA∪A′η ∼ ζBua∪B′uaη. Finally, suppose (iv) holds. Similarly to the reasoning below Lemma 7.A.2, we can show the existence of an unambiguous partition {B1ua, . . . , Bmua} such that A1 ∪ · · · ∪ Aj ∼ B1ua ∪ · · · ∪ Bjua for all j. For any A that is a union of Aj's different from A1, and Bua a union of the corresponding Bjua's, we have, by (7.2), ζA∪A1η ∼ ζBua∪B1uaη and ζAη ∼ ζBuaη. Taking differences and dividing by the positive U(ζ) − U(η) we get v(A ∪ A1) − v(A) = P(Bua ∪ B1ua) − P(Bua) = P(B1ua). So the "decision weight" that A1 contributes to each union of the other Aj's is independent of those other Aj's. The same holds for each Aj. Hence, the capacity of a union of different Aj's is the sum of the separate capacities: v is additive on the partition.
Acknowledgments The support for this research was provided in part by the Decision, Risk, and Management Science branch of the National Science Foundation; the research of Peter Wakker has been made possible by a fellowship of the Royal Netherlands Academy of Arts and Sciences, and a fellowship of the Netherlands Organization for Scientific Research. The authors are thankful to two referees for many detailed comments.
Notes 1 Every step act is “simple,” that is, is measurable and has a finite range. If D contains every one-element subset, then every simple act is a step act. Step acts turn out to be easier to work with than simple acts. 2 Sometimes a nonadditive capacity is a strictly increasing transform of a probability measure, which then also represents the “more-likely-than” relation. In general, however, a capacity will not be of that form. 3 For example, receiving α or a superior consequence. 4 Only one will be additive on Aua of course. 5 Such a bet gives 1 if H ub obtains and 0 if T ub obtains.
References

Anscombe, F. J. and R. J. Aumann (1963). "A Definition of Subjective Probability," Annals of Mathematical Statistics, 34, 199–205.
Becker, J. L. and R. K. Sarin (1989). "Economics of Ambiguity," Duke University, Fuqua School of Business, Durham, NC, USA.
Bernardo, J. M., J. R. Ferrandiz, and A. F. M. Smith (1985). "The Foundations of Decision Theory: An Intuitive, Operational Approach with Mathematical Extensions," Theory and Decision, 19, 127–150.
Choquet, G. (1953–1954). "Theory of Capacities," Annales de l'Institut Fourier, 5 (Grenoble), 131–295.
de Finetti, B. (1931). "Sul Significato Soggettivo della Probabilità," Fundamenta Mathematicae, 17, 298–329.
—— (1937). "La Prévision: Ses Lois Logiques, ses Sources Subjectives," Annales de l'Institut Henri Poincaré, 7, 1–68. Translated into English by H. E. Kyburg, "Foresight: Its Logical Laws, its Subjective Sources," in Studies in Subjective Probability, ed. by H. E. Kyburg and H. E. Smokler. New York: Wiley, 1964, 53–118; 2nd edition, New York: Krieger, 1980.
Ellsberg, D. (1961). "Risk, Ambiguity and the Savage Axioms," Quarterly Journal of Economics, 75, 643–669.
Fishburn, P. C. (1967). "Preference-Based Definitions of Subjective Probability," Annals of Mathematical Statistics, 38, 1605–1617.
—— (1970). Utility Theory for Decision Making. New York: Wiley.
—— (1982). The Foundations of Expected Utility. Dordrecht: Reidel.
—— (1988). Nonlinear Preference and Utility Theory. Baltimore: Johns Hopkins University Press.
—— (1991). "On the Theory of Ambiguity," International Journal of Information and Management Science, 2, 1–16.
Gärdenfors, P. and N.-E. Sahlin (1983). "Decision Making with Unreliable Probabilities," British Journal of Mathematical and Statistical Psychology, 36, 240–251.
Gilboa, I. (1987). "Expected Utility with Purely Subjective Non-Additive Probabilities," Journal of Mathematical Economics, 16, 65–88.
—— (1989). "Duality in Non-Additive Expected Utility Theory," in Choice under Uncertainty, Annals of Operations Research, ed. by P. C. Fishburn and I. H. LaValle. Basel: J. C. Baltzer AG, 405–414.
Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan. Second edition, 1948.
Knight, F. H. (1921). Risk, Uncertainty, and Profit. New York: Houghton Mifflin.
Luce, R. D. (1988). "Rank-Dependent, Subjective Expected-Utility Representations," Journal of Risk and Uncertainty, 1, 305–332.
—— (1991). "Rank- and Sign-Dependent Linear Utility Models for Binary Gambles," Journal of Economic Theory, 53, 75–100.
—— (1992). "Where Does Subjective Expected Utility Fail Descriptively?" Journal of Risk and Uncertainty, 4, 5–27.
Luce, R. D. and P. C. Fishburn (1991). "Rank- and Sign-Dependent Linear Utility Models for Finite First-Order Gambles," Journal of Risk and Uncertainty, 4, 29–59.
Luce, R. D. and L. Narens (1985). "Classification of Concatenation Measurement Structures According to Scale Type," Journal of Mathematical Psychology, 29, 1–72.
Machina, M. J. (1982). "'Expected Utility' Analysis without the Independence Axiom," Econometrica, 50, 277–323.
Machina, M. J. and D. Schmeidler (1992). "A More Robust Definition of Subjective Probability," Econometrica, 60, 745–780.
Nakamura, Y. (1990). "Subjective Expected Utility with Non-Additive Probabilities on Finite State Spaces," Journal of Economic Theory, 51, 346–366.
—— (1992). "Multi-Symmetric Structures and Non-Expected Utility," Journal of Mathematical Psychology, 36, 375–395.
Ramsey, F. P. (1931). "Truth and Probability," in The Foundations of Mathematics and Other Logical Essays. London: Routledge and Kegan Paul, 156–198. Reprinted in Studies in Subjective Probability, ed. by H. E. Kyburg and H. E. Smokler. New York: Wiley, 1964, 61–92.
Sarin, R. and P. P. Wakker (1990). "Incorporating Attitudes towards Ambiguity in Savage's Set-up," University of California, Los Angeles, CA.
Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley. Second edition, New York: Dover, 1972.
Schmeidler, D. (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Segal, U. (1987). "The Ellsberg Paradox and Risk Aversion: An Anticipated Utility Approach," International Economic Review, 28, 175–202.
—— (1990). "Two-Stage Lotteries without the Reduction Axiom," Econometrica, 58, 349–377.
Wakker, P. P. (1989a). "Continuous Subjective Expected Utility with Nonadditive Probabilities," Journal of Mathematical Economics, 18, 1–27.
—— (1989b). Additive Representations of Preferences: A New Foundation of Decision Analysis. Dordrecht: Kluwer.
—— (1990). "Under Stochastic Dominance Choquet-Expected Utility and Anticipated Utility are Identical," Theory and Decision, 29, 119–132.
—— (1991). "Additive Representations of Preferences on Rank-Ordered Sets I. The Algebraic Approach," Journal of Mathematical Psychology, 35, 501–531.
—— (1993a). "Unbounded Utility for Savage's 'Foundations of Statistics', and Other Models," Mathematics of Operations Research, 18, 446–485.
—— (1993b). "Counterexamples to Segal's Measure Representation Theorem," Journal of Risk and Uncertainty, 6, 91–98.
Yaari, M. E. (1987). "The Dual Theory of Choice under Risk," Econometrica, 55, 95–115.
8
Updating ambiguous beliefs Itzhak Gilboa and David Schmeidler
8.1. Introduction

The Bayesian approach to decision making under uncertainty prescribes that a decision maker have a unique prior probability and a utility function such that decisions are made so as to maximize the expected utility. In particular, in a statistical inference problem the decision maker is assumed to have a probability distribution over all possible distributions which may govern a certain random process. This paradigm was justified by axiomatic treatments, most notably that of Savage (1954), and it enjoys unrivaled popularity in economic theory, game theory, and so forth. However, this theory is challenged by two classes of evidence: on the one hand, there are experiments and thought experiments (such as Ellsberg (1961) and many others) which seem to show that individuals tend to violate the consistency conditions underlying (and implied by) the Bayesian approach. On the other hand, people seem to have difficulties with the specification of a prior for actual statistical inference problems. Thus, classical—rather than Bayesian—methods are used for practical purposes, although they are theoretically less satisfactory.

The last decade has witnessed—among numerous and various generalizations of von Neumann and Morgenstern's (1947) expected utility theory—generalizations of the Bayesian paradigm as well. We will not attempt to provide a survey of them here. Instead, we only mention the models which are relevant to the sequel.
1 Nonadditive probabilities. First introduced by Schmeidler (1982, 1986, 1989) and also axiomatized in Gilboa (1987), Fishburn (1988), and Wakker (1989), nonadditive probabilities are monotone set-functions which do not have to satisfy additivity. Using the Choquet integral (Choquet, 1953–1954) one may define expected utility, and the works cited before axiomatize preference relations which are representable by expected utility in this sense.
2 Multiple priors (MP). As axiomatized by Gilboa and Schmeidler (1989), this model assumes that the decision maker has a set of priors, and each alternative
Gilboa, I. and D. Schmeidler (1993). Updating ambiguous beliefs, J. Econ. Theory, 59, 33–49.
156
Itzhak Gilboa and David Schmeidler is assessed according to its minimal expected utility, where the minimum is taken over all priors in the set. (This idea is also related to Bewley (1986– 1988), who suggests a partial order over alternatives, such that one alternative is preferred to another only if its expected utility is higher according to all priors in the set.)
Of particular interest to this study is the intersection of the two models: it turns out that if a nonadditive measure (NA) exhibits uncertainty aversion (technically, if it is convex in the sense v(A ∪ B) + v(A ∩ B) v(A) + v(B)), then the Choquet integral of a real-valued function with respect to v equals the minimum of all its integrals with respect to additive priors taken from the core of v. (The core is defined as in cooperative game theory, that is, p is in the core of v if p(A) v(A) for every event A with equality for the whole sample space. Convex NA have nonempty cores.) While these models—as many others—suggested generalizations of the Bayesian approach for a one-shot decision problem, they shed very little light on the problem of dynamically updating probabilities as new information is gathered. We find this problem to be of paramount importance for several interrelated reasons: 1 2
3
4
The theoretical validity of any model of decision making under uncertainty is quite dubious if it cannot cope successfully with the dynamic aspect. The updating problem is at the heart of statistical theory. In fact, it may be viewed as the problem statistical inference is trying to solve. Some of the works in the statistical literature which pertain to this study are Agnew (1985), Genest and Schervish (1985), and Lindley et al. (1979). Applications of these models to economic and game theory models require some assumptions on how economic agents change their beliefs over time. The question naturally arises, then: What are the reasonable ways to update such beliefs? The theory of artificial intelligence, which in general seems to have much in common with the foundations of economic, decision, and game theory, also tries to cope with this problem. See, for instance, Fagin and Halpern (1989), Halpern and Fagin (1989), and Halpern and Tuttle (1989).
In this study we try to deal with the problem axiomatically and suggest plausible update rules which satisfy some basic requirements. We present a family of pseudoBayesian rules, each of which may be considered a generalization of Bayes’ rule for a unique additive prior. We also present a family of “classical” update rules, each of which starts out with a given set of priors, possibly rules some of them out in the face of new information, and continues with the (Bayesian) updates of the remaining ones. In particular, a maximum-likelihood-update rule would be the following: consider only those priors which ascribe the maximal probability to the event
Updating ambiguous beliefs
157
that is known to have occurred, update each of them according to Bayes’ rule, and continue in this fashion. It turns out that if the set of priors one starts out with can also be represented by a nonadditive probability, the results of this rule are independent of the order in which information is gathered. Furthermore, for those preferences which can be simultaneously represented by a NA and by MP, the maximum-likelihood-update rule coincides with one of the more intuitive Bayesian rules, and they boil down to the Dempster–Shafer rule (see Dempster (1967, 1968), Shafer (1976), and Smets (1986)). For recent work on belief functions and their updating, see Jaffray (1989), Chateauneuf and Jaffray (1989), and especially Jaffray (1990). Thus, we find that an axiomatically based generalization of the Bayesian approach can accommodate MP (which are used in classical statistics). Moreover, the maximum-likelihood principle, which is at the heart of statistical inference (and implicit in the techniques of confidence sets and hypothesis testing) coincides with the generalized Bayesian updating. Due to the prominence of this rule, it may be a source of insight to study it in a simple example. Consider Ellsberg’s example in which an urn with 90 balls is given, out of which 30 are red, and 60 are either blue or yellow. For simplicity of exposition, let us model this situation in a somewhat extreme fashion, allowing for all distributions of blue and yellow balls. Maxmin expected utility with respect to this set of priors is equivalent to the maximization of the Choquet integral of utility with respect to (w.r.t.) a NA v defined as 1 , 3 1 v(R ∪ B) = v(R ∪ Y ) = , 3 v(R ∪ B ∪ Y ) = 1, v(R) =
v(B) = v(Y ) = 0 v(B ∪ Y ) =
2 3
where R, B, and Y denote the events of a red, blue, or yellow ball being drawn, respectively. Assume now that it is known that a ball (which, say, has already been drawn) is not red. Conditioning on the event B ∪ Y , all priors in the set ascribe probability of 2 3 to it. Thus, they are all left in the set and updated according to Bayes’ rule. This captures our intuition that no ambiguity was resolved, and our complete ignorance regarding the event B ∪ Y has not changed. (Actually, it is now highlighted by the fact that this event, about which we know the least, is now known to have occurred.) Consider, on the other hand, the same update rules in the case that R ∪ B is known. The priors we started out with ascribe to this event probabilities ranging from 13 to 23 . According to the maximum-likelihood principle, only one of them is chosen—namely, the p which satisfies p(R) =
1 , 3
p(B) =
2 , 3
p(Y ) = 0.
158
Itzhak Gilboa and David Schmeidler
In this particular case, the set of priors shrinks to a singleton and, equivalently, the updated measure v is additive (and equals p itself). Ambiguity is thus reduced (in the case, eliminated) by the generalized Bayesian learning embodied in the exclusion of some priors. In the context of such examples it is sometimes argued that the maximumlikelihood rule is too extreme, and that priors which, say, only ε-maximize the likelihood function should not be ruled out. Indeed, classical statistics techniques such as hypothesis testing do allow for ranges of the likelihood function. At present we are not aware of a nice axiomatization of such rules. We point out, however, that the other extreme rule, that is, updating all priors without excluding any of them (see, for instance, Fagin and Halpern (1989), and Jaffray (1990)), does not appear to be any less “extreme” in general, nor does it seem to be implied by more compelling axioms. We believe that our theory can be applied to a variety of economic models, explaining phenomena which are incompatible with the Bayesian theory, and possibly providing better predictions. As a matter of fact, this belief may be updated given new evidence: Dow and Werlang (1990) and Simonsen and Werlang (1990) have already applied the MP theory to portfolio selection problems. These studies have shown that a decision maker having ambiguous beliefs will have a (nontrivial) range of prices at which he/she will neither buy or sell an uncertain asset, exhibiting inertia in portfolio selection. Applying our new results regarding ambiguous beliefs update, one may study the conditions under which these price ranges will shrink in the face of new information. Dow et al. (1989) studied trade among agents, at least one of whom has ambiguous beliefs. They show that the celebrated no-trade result of Milgrom and Stokey (1982) fails to hold in this context. In this study, the Dempster–Shafer rule for updating NA was used, a rule which is justified by the current chapter. Casting the trade setup into a dynamic context raises the question of an asymptotic no-trade theorem: Under what conditions will additional information reduce the volume or probability or trade? In another recent study, Yoo (1990) addressed the question of why stock prices tend to fall after the initial public offering and rise at a later stage. Although Yoo uses ambiguous beliefs mostly as in Bewley’s (1986) model, his results can also be obtained using the models mentioned earlier. It seems that the update rule justified by our study may explain the price dynamics. These various models seem to point at a basic problem: given a convex NA (or, equivalently, a set of priors which is the core of such a measure), under what conditions will the Dempster–Shafer rule yield convergence of beliefs to a single additive prior? Obviously, the answer cannot be “always.” Consider a “large” measurable space with all possible priors (equivalently, with the “unanimity game” as an NA measure). In this setup of “complete ignorance,” no conclusions about the future may be drawn from past observations—that is, the updated beliefs still include all possible priors. However, with some initial information (say, finitely many extreme points of the set of priors) convergence is possible. Conditions that will guarantee such convergence call for further study.
Updating ambiguous beliefs
159
The rest of this chapter is organized as follows. Section 8.2 presents the framework and quotes some results. Section 8.3 defines the update rules and states the theorems. Finally, Section 8.4 includes proofs, related analysis, and some remarks regarding possible generalizations.
8.2. Framework and preliminaries Let X be a set of consequences endowed with a weak order . Let (S, ) be a measurable space of states of the world, where is the algebra of events. A function f : S → X is -measurable if for every x ∈ X {s|f (s) > x},
{s|f (s) x} ∈ .
Let F = {f : S → X|f is -measurable} be the set of acts. Let F0 = {f ∈ F | |range(f )| < ∞} be the set of simple acts. A function u: X → R, which represents , that is, u(x) u(y) ⇐⇒ x y,
∀x, y ∈ X
is called a utility function. A function v: → [0, 1] satisfying (i) v(Ø) = 0; v(S) = 1; (ii) A ⊆ B ⇒ v(A) v(B) is an NA. It is convex if v(A ∪ B) + v(A ∩ B) v(A) + v(B) for all A, B ∈ . It is additive, or simply a measure, if the above inequality is always satisfied as an equality. A real-valued function is -measurable if for every t ∈ R {s|w(s) t},
{s|w(s) > t} ∈ .
Given such a function w and an NA v, the (Choquet) integral of w w.r.t. v on S is
w dv =
w dv =
S
0
∞
v({s|w(s) t}) dt +
0 ∞
[v({s|w(s) t}) − 1] dt.
For an NA measure v we define the core as for a cooperative game, that is, Core(v) = {p|p is a measure s.t. p(A) v(A)∀A ∈ }. Recall that a convex v has a nonempty core (see Shapley (1965)). We are now about to define two classes of binary relations on F : those represented by maximization of expected utility with NA, and those represented by maxmin of expected utility with MP.
160
Itzhak Gilboa and David Schmeidler
Denote by NA◦ (= NA◦ (X, , S, )) the set of binary relations on F such that there are a utility u, unique up to positive linear transformation (p.l.t.), and a unique NA v satisfying: (i) for every f ∈ F , u ◦ f is -measurable; (ii) for every f , g ∈ F f g ⇐⇒
u ◦ f dv
u◦g dv.
Note that in general the measurability of f does not guarantee that of u◦f , and that (ii) implies that on F , when restricted to constant functions, extends on X. Hence we use this convenient abuse of notation. Similarly, we will not distinguish between x ∈ X and the constant act which equals x on S. Characterizations of NA◦ were given by Schmeidler (1986, 1989) for the Anscombe–Aumann (1963) framework, where X is a mixture space and u is assumed affine; by Gilboa (1987) in the Savage (1954) framework, where X is arbitrary but = 2S and v is nonatomic; and by Wakker (1989) for the case where X is a connected topological space. Fishburn (1988) extended the characterization to non-transitive relations. Let MP◦ (= MP◦ (X, , S, )) denote the set of binary relations of F such that there are a utility u unique up to a p.l.t., and a unique nonempty, closed (in the weak* topology), and convex set C of (finitely additive) measures on such that: (i) for every f ∈ F , u ◦ f is -measurable; (ii) for every f , g ∈ F f g ⇐⇒ min p∈C
u ◦ f dp min p∈C
u ◦ g dp.
A characterization of MP◦ in the Anscombe–Aumann framework was given in Gilboa and Schmeidler (1989). To the best of our knowledge, there is no such axiomatization in the framework of Savage. However, the set NA◦ ∩ MP◦ , which will play an important role in the sequel, may be characterized by strengthening the axioms in Gilboa (1987). It will be convenient to include the trivial weak order ∗ = F × F in both NA and MP. Hence, we define NA = NA◦ ∪ {∗ } and MP = MP◦ ∪ {∗ }. For simplicity we assume that X has -maximal and -minimal elements. More specifically, let x ∗ , x∗ ∈ X satisfy x∗ x x ∗ for all x ∈ X. Without loss of generality (w.l.o.g.), assume that x∗ and x ∗ are unique. Since for both NA◦ and MP◦ the utility function is unique only up to a p.l.t. we will assume w.l.o.g. that u(x∗ ) = 0 and u(x ∗ ) = 1 for all utilities henceforth considered. When X is a mixture space we define NA and MP to be the subsets of NA and MP, respectively, where the utility u is also required to be affine. For such spaces X we recall the following results.
Updating ambiguous beliefs
161
Proposition 8.1. Suppose that ∈ NA and let v be the associated NA. Then ∈ MP iff v is convex. Proposition 8.2. Suppose that ∈ MP and let C be the associated set of measures. Define v(A) = min p(A) for A ∈ . p∈C
Then v is an NA and ∈ NA iff v is convex and C = Core(v). The proofs of these appear, explicity or implicity, in Schmeidler (1984, 1986, 1989). Note that the axiomatization of NA (Schmeidler, 1989) uses comonotonic independence, and given this property the convexity of v is equivalent to uncertainty aversion. The axiomatization of MP (Gilboa and Schmeidler, 1989) uses a weaker independence notion—termed C-independence and uncertainty aversion. Given these, the convexity of v and the equality C = Core(v) (where v is defined as in Proposition 8.2) is equivalent to comonotonic independence. We now define update rules. We need the following definitions. Given a measurable partition = {Ai }ni−1 of S and {fi }ni=1 ⊆ F , let (f1 , A1 ; . . . ;fn , An ) denote the act g ∈ F satisfying g(s) = fi (s) for all s ∈ Ai and all 1 i n. Given a binary relation on F , an event A ∈ is -null iff the following holds: for every f , g, h1 , h2 ∈ F , f g
iff (f , Ac ; h1 , A) (g, Ac ; h2 , A).
¯ an update Let B¯ denote the set of all binary relations on F . Given B ⊆ B, rule for B is a collection of functions, U = {UA }A∈ , where UA :B → B¯ such that for all ∈ B and A ∈ , Ac is UA ()-null and US () = . UA () should be thought of as the preference relation once A is known to have occurred. Given B and an update rule for it, U = {UA }A∈ , U is said to be commutative w.r.t. or -commutative if for every A, B ∈ we have UA () ∈ B and UB (UA ()) = UA∩B (). It is commutative if it is commutative w.r.t. for all ∈ B. (Note that this condition is stronger than strict commutativity, that is, UA ◦ UB = UB ◦ UA . However, “commutativity” seems to be a suggestive name which is not overburdened with other meanings.)
8.3. Bayesian and classical rules Given a set B of binary relations of F , every f ∈ F suggests a natural update rule f for B: define BUf = {BUA }A∈ by f
g BUA ()h ⇐⇒ (g, A; f , Ac ) (h, A; f , Ac )
for all g, h ∈ F .
162
Itzhak Gilboa and David Schmeidler f
It is obvious that for every f , BUf is an update rule, that is, that Ac is BUA ()null for all ∈ B and A ∈ . We will refer to it as the f -Bayesian update rule and {BUf }f ∈F will be called the set of Bayesian-update rules. Note that for ∈ NA with an additive v, all the Bayesian update rules coincide with Bayes’ rule, hence the definition of the Bayesian-update rules may be considered a formulation and axiomatization of Bayes’ rule in the case of (a unique) additive prior. Proposition 8.3. For every ∈ B¯ and f ∈ F , BUf is -commutative. Theorem 8.1. Let f ∈ F and assume that || > 4. Then the following are equivalent: (i) BUA (NA ) ⊆ NA for all A ∈ ; (ii) f = (x ∗ , T ; x∗ , T c ) for some T ∈ . f
Of particular interest are the Bayesian update rules corresponding to f = x ∗ and f = x∗ (i.e. T = S or T = Ø in (ii) earlier). For the latter (x∗ ) there is an “optimistic” interpretation: when comparing two actions given a certain event A, the decision maker implicitly assumes that had A not occurred, the worst possible outcome (x∗ ) would have resulted. In other words, the behavior given f A—BUA ()—exhibits “happiness” that A has occurred; the decisions are made as if we are always in “the best of all possible worlds.” Note that the corresponding NA is vA (B) = v(B ∩ A)/v(A). On the other hand, for f = x ∗ , we consider a “pessimistic” decision maker, whose choices reveal the hidden assumption that all the impossible worlds are the best conceivable ones. This rule defines the nonadditive function by vA (B) = [v((B ∩ A) ∪ Ac ) − v(Ac )]/(1 − v(Ac )), which is identical to the Dempster–Shafer rule for updating probabilities. It should not surprise us that this “pessimistic” rule is going to play a major role in relation to MP—that is, to uncertainty averse decision makers who follow a maxmin (expected utility) decision rule. In a similar way one may develop a “dual” theory of “optimism” in which uncertainty seeking will replace uncertainty aversion, concavity of v will replace convexity, and maxmax will supercede maxmin. For this “dual” theory, the update rule vA (B) = v(B ∩ A)/v(A) would be the “appropriate” one (in a sense that will be clear shortly). Note that this rule was used—without axiomatization—as a definition of probability update in Gilboa (1989).
Updating ambiguous beliefs
163
Taking a classical statistics point of view, it is natural to start out with a set of priors. Hence we only define classical update rules for B = MP . A natural procedure in the classical updating process is to rule out some of the given priors, and update the rest according to Bayes’ rule. Thus, we get a family of update rules, which differ in the way the priors are selected. Formally, a classical update rule is characterized by a function R: (C, A) → C such that C ⊆ C is a closed and convex set of measures for every such C and every A ∈ , with R(C, S) = C. The associated update rule will be denoted CU R = {CUAR }A∈ . (If R(C, A) = Ø we define CUAR () = ∗ .) Note that these are indeed update rules, that is, for every ∈ MP , every R and every A ∈ , Ac is CUAR ()-null. Furthermore, for ∈ MP with an associated set C, CUAR () ∈ MP provided that inf {p(A)|R(C, A)} > 0 for all A ∈ . Of particular interest will be the classical update rule called maximum likelihood and defined by R 0 (C A) = {p ∈ C|p(A) = max q(A) > 0}. q∈C
Theorem 8.2. CU R ∈ NA ∩ MP , BU(x A
∗
,S)
0
is commutative on NA ∩ MP . Furthermore, for
() = CUR A () ∈ NA ∩ MP . 0
That is, the Bayesian update rule with f = (x ∗ , S) coincides with the maximumlikelihood classical update rule. Moreover, they are also equivalent to the Dempster–Shafer update rule for belief functions. (Note that every belief function (see Shafer (1976)) is convex, though the converse is false. Yet one may apply the Dempster–Shafer rule for every NA v.)
8.4. Proofs and related analysis Proof of Proposition 8.1. It only requires to note that for every f , g ∈ F , A, B ∈ ((g, A; f , Ac ), B; f , B c ) = (g, A ∩ B; f , (A ∩ B)c ). Proof of Theorem 8.1. 8.1. First assume (ii). Let there be given ∈ NA with associated u and v. Define for B ∈ an NA vB by vB (A) = [v((A ∩ B) ∪ (T ∩ B c )) − v(T ∩ B c )]/[v(B ∪ T ) − v(T ∩ B c )]
164
Itzhak Gilboa and David Schmeidler
if the denominator is positive. (Otherwise the result is trivial.) For every g ∈ F we have
u ◦ (g, B; f , B ) dv = c
1
v({s|u ◦ (g, B; f , B c )(s) t}) dt
0
S
=
1
v((T ∩ B c ) ∪ ({s|u ◦ g(s) t} ∩ B)) dt
0 1
=
[v(T ∩ B c )
0
+ [v(B ∪ T ) − v(T ∩ B c )]vB ({s|u ◦ g(s) t})] dt c c = v(T ∩ B ) + [v(B ∪ T ) − v(T ∩ B )] u ◦ g dvB , where vB and u represent BUB , which implies that the latter is in NA . Conversely, assume (i) holds. Assume, to the contrary, that f (s) ∼ x for s ∈ D where D ∈ , D = S and x∗ < x < x ∗ (where ∼ denotes -equivalence). Let E, F ∈ satisfy E ∩ F = F ∩ D = D ∩ F = Ø. Denote α = u(x) (where 0 < α < 1). Choose m ∈ (α, 1) and a nonadditive v such that f
v(E) = v(F ) = v(D) = m v(E ∪ F ) = v(E ∪ D) = v(F ∪ D) = m and v(T ) = v(T ∩ (E ∪ F ∪ D)) for all T ∈ . Next define ∈ NA by v and (the unique) u. Choose g1 , g2 such that u ◦ g1 (s) = u ◦ g2 (s) = α s∈D u ◦ g1 (s) = 1, u ◦ g2 (s) = α + (1 − α/m) s ∈ E u ◦ g1 (s) = 0, u ◦ g2 (s) = α + (1 − α/m) s ∈ F . Let be BUE∪F (). By assumption it belongs to NA ; hence, there correspond to it u = u and v . Note that v is unique as is nontrivial, and that v (T )= v (T ∩ (E ∪ F )) for all T ∈ . As u ◦ g dv = u ◦ g2 dv, g1 ∼ g2 , whence g1 ∼ g2 . Hence, u ◦ g1 dv = 1 u ◦ g2 dv , that is, v (E) = α + (1 − α/m). Next choose β ∈ (0, α) and choose an act g3 ∈ F such that f
⎧ ⎪ ⎨α u ◦ g3 (s) = β ⎪ ⎩ 0
s∈D s∈E s ∈ F.
Updating ambiguous beliefs
165
For every γ ∈ (0, α) choose gγ ∈ F such that α s∈D u ◦ gγ (s) = γ s ∈ E ∪ F. Then u ◦ g3 dv = αm and u ◦gγ dv = αm + γ (1 − m). Hence, gγ > g3 and gγ > g3 for all γ > 0. However, u ◦ gγ dv = γ and u ◦ g3 dv = βv (E), where v (E) = 0, which is a contradiction. Remark 8.1. In the case of no extreme outcomes, that is, when X has no -maximal or no -minimal elements, and in particular when the utility is not bounded, there are no update rules BUf which map NA into itself. However, one may choose for g, h ∈ F ◦ x ∗ , x∗ ∈ X such that x ∗ g(s), h(s) x∗ , ∀s ∈ S, and for every f T ∈ define BUf () = {BUA }A∈ between g and h by f = (x ∗ , T ; x∗ , T c ). If ∈ NA, this definition is independent of the choice of x ∗ and x∗ . The resulting update rule will be commutative for any (fixed) T ∈ . Proof of Theorem 8.2. Let ∈ MP be given, and let C denote its associated set of additive measures. Define v(·) = minp∈C p(·). Assume that v is convex and C = Core(v). For A ∈ with q(A) > 0 for some q ∈ C, we have " % R 0 (C, A) = p ∈ C|p(A) = max q(A) = {p ∈ C|p(Ac ) = v(Ac )}. q∈C
R R R ∗ (Note that if v(Ac ) = 1, CUR A (CUB ()) = CUB (CUA () = .) As was shown in Schmeidler (1984), v is convex iff for every chain Ø = E0 ⊆ E1 ⊂ · · · ⊂ En = S there is an additive measure p in Core(v) = C such that p(Ei ) = v(Ei ), 0 i n. Furthermore, this requirement for n = 3 is also equivalent to convexity. Next define 0
0
vA (T ) = min{p(T ∩ A)|p ∈ R 0 (C, A)}. (T ) = v((T ∩ A) ∪ Ac ) − v(Ac ). Claim. vA
Proof. For p ∈ R 0 (C, A) we have p(T ∩ A) = p((T ∩ A) ∪ Ac ) − p(Ac ) = p((T ∩ A) ∪ Ac ) − v(Ac ) v((T ∩ A) ∪ Ac ) − v(Ac ) whence vA (T ) v((T ∩ A) ∪ Ac ) − v(Ac ).
0
0
166
Itzhak Gilboa and David Schmeidler
To show the converse inequality, consider the chain Ø ⊆ Ac ⊆ Ac ∪ (A ∩ T ) ⊆ S. By convexity there is p ∈ Core(v) = C satisfying p(Ac ) = v(Ac ) and p(Ac ∪ (T ∩ A)) = v(Ac ∪ (T ∩ A)) which also implies p ∈ R 0 (C, A). Then vA (T ) p(T ∩ A) = p((T ∩ A) ∪ Ac ) − p(Ac )
= v((T ∩ A) ∪ Ac ) − v(Ac ). ∗ c Consider CUR A (). If it is not equal to , it has to be the case that v(A ) < 1, and then it is defined by the set of additive measures 0
CA = {pA |p ∈ R 0 (C, A)} where pA (T ) = p(T ∩ A)/p(A) = p(T ∩ A)/(1 − v(Ac )). Note that CA is nonempty, closed, and convex. Define vA (T ) = min{p(T )|p ∈ CA }, (T )/(1 − v(Ac )), that is, and observe that vA (T ) = vA
vA (T ) = [v((T ∩ A) ∪ Ac ) − v(Ac )]/[1 − v(Ac )].
(8.1)
Hence, vA is also convex. We have to show that CA = Core(vA ). To see this, let p ∈ Core(vA ). We will show that p = qA for some q ∈ R 0 (C, A). Take any q ∈ Core(v) and define q(T ) = p(T ∩ A)[1 − v(Ac )] + q (T ∩ Ac ). Note that q(T ∩ A) = p(T ∩ A)[1 − v(Ac )] vA (T ∩ A)[1 − v(Ac )] = v((T ∩ A) ∪ Ac ) − v(Ac ). (As p ∈ Core(vA ) and by definition of the latter.) Next, since q ∈ Core(v), q(T ∩ Ac ) = q (T ∩ Ac ) v(T ∩ Ac ). Hence, q(T ) = q(T ∩ A) + q(T ∩ Ac ) v((T ∩ A) ∪ Ac ) − v(Ac ) + v(T ∩ Ac ) = v(T ∪ Ac ) − v(Ac ) + v(T ∩ Ac ) v(T ), where the last inequality follows from the convexity of v. Finally, q(S) = q(A) + q(Ac ) = p(A)[1 − v(Ac )] + v(Ac ) = 1. Hence, q ∈ Core(v). Furthermore, q ∈ R 0 (C, A). Obviously, p = qA .
Updating ambiguous beliefs
167
∗
(x , S) R Thus we establish CUR () and A () ∈ NA . Furthermore, CUA () = BUA the non-additive probability update rule (8.1) coincides with the Dempster–Shafer 0 rule. Any of these two facts, combined with the observation CUR A () ∈ NA , 0 R implies that CU is commutative. 0
0
Remark 8.2. It is not difficult to see that the maximum-likelihood update rule is not commutative in general. In fact, one may ask whether the converse of Theorem 8.2 is true, that is, whether a relation ≥ ∈ MP w.r.t. which CUR0 is commutative has to define a set C which is a core of an NA. The negative answer is given by the following example: S = {1, 2, 3, 4}, = 2S , C = conv{p1 , p2 } defined by
p1 p2
1
2
3
4
0.7 0.1
0.1 0.3
0.1 0.3
0.1 0.3
It is easily verifiable that the maximum-likelihood update rule is commutative w.r.t. the induced ∈ MP , though C is not the core of any v. Remark 8.3. It seems that the maximum-likelihood update rule is not commutative in general, because it lacks some “look-ahead” property. One is tempted to define an update rule that will retain all the priors which may, at some point in the future, turn out to be likelihood maximizers. Thus, we are led to the “semi-generalized maximum likelihood”: " R 1 (C, A) = cl conv p ∈ C|p(E) = max q(E) > 0 q∈C
%
for some measurableE ⊆ A (where cl means closure in the weak* topology). Note that the resulting set of measures may include p ∈ C such that p(A) = 0. In this case, define 1 ∗ CUR A () = . However, the following example shows that this update rule also fails to be commutative in general. Consider S = {1, 2, 3, 4, 5}, = 2S , and let C be conv {p1 , p2 , p3 , p4 } defined by the following table:
p1 p2 p3 p4
1
2
3
4
5
0.2 0 0.27 0
0.2 0 0 0.27
0.01 0.4 0.03 0.03
0.09 0.4 0 0
0.5 0.2 0.7 0.7
168
Itzhak Gilboa and David Schmeidler
Taking A = {1, 2, 3, 4} and B = {1, 2, 3}, one may verify that R 1 (R 1 (C, A), B) = {p2 , p3 , p4 } and R 1 (C, B) = {p1 , p2 , p3 , p4 } and that p1B is not in the convex hull of {p2B , p3B , p4B }. We may try an even more generalized version of the maximum-likelihood criterion: retain all priors according to which the integral of some non-negative simple function is maximized. That is, define " " % R 2 (C, A) = cl conv p ∈ C u ◦ f dp = max u ◦ f dq q ∈ C > 0 % for some f ∈ F◦ . The maximization of u ◦ f dp for some f may be viewed as maximization of some convex combination of the likelihood function at several points of time. 2 However, the same example shows that CUR is not commutative in general. Remark 8.4. Although our results are formulated for NA and MP , they may be generalized easily. First, one should note that none of the results actually requires that X be a mixture space. All that is needed is that the utility on X be uniquely defined (up to a p.l.t.) and that its range will contain an open interval. In particular, connected topological spaces with a continuous utility function will do. Moreover, most of the results do not even require such richness of the utility’s range. In fact, this richness was only used in the proof of (i) ⇒ (iii) in Theorem 8.1.
Acknowledgments We thank James Dow, Joe Halpern, Jean-Yves Jaffray, Bart Lipman, Klaus Nehring, Sergiu Werlang, and an anonymous referee for stimulating discussions and comments. NSF grants SES-9113108 and SES-9111873 are gratefully acknowledged.
References C. E. Agnew (1985) Multiple probability assessments by dependent experts, J. Amer. Statist. Assoc., 80, 343–347. F. J. Anscombe and R. J. Aumann (1963) A definition of subjective probability, Ann. Math. Statist., 34, 199–205. T. Bewley (1986) “Knightian Decision Theory: Part I,” Cowles Foundation Discussion paper No. 807, Yale University.
Updating ambiguous beliefs
169
T. Bewley (1987) “Knightian Decision Theory: Part II: Intertemporal Problems,” Cowles Foundation Discussion paper No. 835, Yale University. T. Bewley (1988) “Knightian Decision Theory and Econometric Inference,” Cowles Foundation Discussion paper No. 868, Yale University. A. Chateauneuf and J.-Y. Jaffray (1989) Some characterizations of lower probabilities and other monotone capacities through the use of Moebius inversion, Mathematical Social Sciences, 17, 263–283. G. Choquet (1953–1954) Theory of capacities, Ann. l’Institut Fourier, 5, 131–295. A. P. Dempster (1967) Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Statist., 38, 325–339. A. P. Dempster (1968) A generalization of Bayesian inference, J. Roy. Statist. Soc. Series B, 30, 205–247. J. Dow and S. R. C. Werlang (1990) “Risk Aversion, Uncertainty Aversion and the Optimal Choice of Portfolio,” London Business School Working Paper. (Reprinted as Chapter 17 in this volume.) J. Dow, V. Madrigal, and S. R. C. Werlang (1989) “Preferences, Common Knowledge and Speculative Trade,” Fundacao Getulio Vargas Working paper. D. Ellsberg (1961) Risk, ambiguity and the Savage axioms, Quart. J. Econ., 75, 643–669. R. Fagin and J. Y. Halpern (1989) A new approach to updating beliefs, mimeo. P. C. Fishburn (1988) Uncertainty aversion and separated effects in decision making under uncertainty, in “Combining Fuzzy Imprecision with Probabilistic Uncertainty in Decision Making,” (J. Kacprzyk and M. Fedrizzi, eds), pp. 10–25, Springer-Verlag, New York/Berlin. C. Genest and M. J. Schervish (1985) Modeling expert judgments for Bayesian updating, Ann. Statist., 13, 1198–1212. I. Gilboa (1987) Expected utility theory with purely subjective non-additive probabilities, J. Math. Econ., 16, 65–88. I. Gilboa (1989)Additivizations of non-additive measures, Math. Operations Res., 14, 1–17. I. Gilboa and D. Schmeidler (1989) Maxmin expected utility with non-unique prior, J. Math. Econ., 18, 141–153. (Reprinted as Chapter 6 in this volume.) J. Y. Halpern and R. Fagin (1989) Two views of belief: Belief as generalized probability and belief as evidence, mimeo. J. Y. Halpern and M. R. Tuttle (1989) Knowledge, probability and adversaries, mimeo. J. Y. Jaffray (1989) Linear utility theory for belief functions, Operations Res. Lett., 8, 107–112. J. Y. Jaffray (1990) Bayesian updating and belief functions, mimeo. D. V. Lindley, A. Tversky, and R. V. Brown (1979) On the reconciliation of probability assessments, J. Roy. Statist. Soc. Series A, 142, 146–180. P. Milgrom and N. Stokey (1982) Information, trade and common knowledge, J. Econ. Theory, 26, 17–27. J. von Neumann and O. Morgenstern (1947) “Theory of Games and Economic Behavior,” 2nd edn, Princeton University Press, Princeton, NJ. L. J. Savage (1954) “The Foundations of Statistics,” Wiley, New York. D. Schmeidler (1982) “Subjective Probability Without Additivity,” (temporary title), Working paper, Foerder Institute for Economic Research. Tel Aviv University. D. Schmeidler (1984) Nonadditive probabilities and convex games, mimeo. D. Schmeidler (1986) Integral representation without additivity, Proc. Amer. Math. Soc., 97, 253–261.
170
Itzhak Gilboa and David Schmeidler
D. Schmeidler (1989) Subjective probability and expected utility without additivity, Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.) G. Shafer (1976) “A Mathematical Theory of Evidence,” Princeton University Press. Princeton, NJ. M. H. Simonsen and S. R. C. Werlang (1990) Subadditive probabilities and portfolio inertia, mimeo. L. S. Shapley (1971) “Notes on n-person Games VII: Cores of Convex Games,” The Rand Corporation R. M. (1965); also as Cores of convex games. Int. J. Game Theory, 1, 12–26. Ph. Smets (1986) “Combining Non-distinct Evidences,” Technical report ULBIRIDIA-86/3. P. Wakker (1989) Continuous subjective expected utility with non-additive probabilities, J. Math. Econ., 18, 1–17. K. R. Yoo (1990) A theory of the underpricing of initial public offerings, mimeo.
9
A definition of uncertainty aversion Larry G. Epstein
9.1. Introduction 9.1.1. Objectives The concepts of risk and risk aversion are cornerstones of a broad range of models in economics and finance. In contrast, relatively little attention is paid in formal models to the phenomenon of uncertainty that is arguably more prevalent than risk. The distinction between them is roughly that risk refers to situations where the perceived likelihoods of events of interest can be represented by probabilities, whereas uncertainty refers to situations where the information available to the decision-maker is too imprecise to be summarized by a probability measure. Thus the terms “vagueness” or “ambiguity” can serve as close substitutes. Ellsberg, in his famous experiment, has demonstrated that such a distinction is meaningful empirically, but it cannot be accommodated within the subjective expected utility (SEU) model. Perhaps because this latter model has been so dominant, our formal understanding of uncertainty and uncertainty aversion is poor. There exists a definition of uncertainty aversion, due to Schmeidler (1989), for the special setting of Anscombe–Aumann (AA) horse-race/roulette-wheel acts. Though it has been transported and widely adopted in models employing the Savage domain of acts, I feel that it is both less appealing and less useful in such contexts. Because the Savage domain is typically more appropriate and also more widely used in descriptive modeling, this suggests the need for an alternative definition of uncertainty aversion that is more suited to applications in a Savage domain. Providing such a definition is the objective of this chapter. Uncertainty aversion is defined for a large class of preferences. This is done for the obvious reason that a satisfactory understanding of uncertainty aversion can be achieved only if its meaning does not rely on preference axioms that are auxiliary rather than germane to the issue. On the other hand, Choquet expected utility (CEU) theory Schmeidler (1989) and its close relative, the multiple-priors model Epstein, L.G. (1999) “A Definition of Uncertainty Aversion,” Review of Economic Studies, 66, 579–608.
172
Larry G. Epstein
Gilboa and Schmeidler (1989), provide important examples for understanding the nature of our definition, as they are the most widely used and studied theories of preference that can accommodate Ellsberg-type behavior. Recall that risk aversion has been defined and characterized for general preferences, including those that lie outside the expected utility class (see Yaari (1969) and Chew and Mao (1995), for example). There is a separate technical or methodological contribution of the paper. After the formulation and initial examination of the definition of uncertainty aversion, subsequent analysis is facilitated by assuming eventwise differentiability of utility. The role of eventwise differentiability may be described roughly as follows: The notion of uncertainty aversion leads to concern with the “local probabilistic beliefs” implicit in an arbitrary preference order or utility function. These beliefs represent the decision-maker’s underlying “mean” or “ambiguity-free” likelihood assessments for events. In general, they need not be unique. But they are unique if utility is eventwise differentiable (given suitable additional conditions). Further perspective is provided by recalling the role of differentiability in decision theory under risk, where utility functions are defined on cumulative distribution functions. Much as calculus is a powerful tool, Machina (1982) has shown that differential methods are useful in decision theory under risk. He employs Frechet differentiability; others have shown that Gateaux differentiability suffices for many purposes (Chew et al., 1987). In the present context of decision-making under uncertainty, where utility functions are defined over acts, the preceding two notions of differentiability are not useful for the task of uncovering implicit local beliefs. On the other hand, eventwise differentiability “works.” Because local probabilistic beliefs are likely to be useful more broadly, so it seems will the notion of eventwise differentiability. It must be acknowledged, however, that eventwise differentiability has close relatives in the literature, namely in Rosenmuller (1972) and Machina (1992).1 The differences from this chapter and the value added here are clarified later (Appendix C). It seems accurate to say that this chapter adds to the demonstration in Machina (1992) that differential techniques are useful also for analysis of decision-making under uncertainty. The chapter proceeds as follows: The Schmeidler definition of uncertainty aversion is examined first. This is accompanied by examples that motivate the search for an alternative definition. Then, because the parallel with the well understood theory of risk aversion is bound to be helpful, relevant aspects of that theory are reviewed. A new definition of uncertainty aversion is formulated in the remainder of Section 9.2 and some attractive properties are described in Section 9.3. In particular, uncertainty aversion is shown to have intuitive empirical content and to admit simple characterizations within the CEU and multiple-priors models. The notion of “eventwise derivative” and the analysis of uncertainty aversion given eventwise differentiability follow in Section 9.4. It is shown that eventwise differentiability of utility simplifies the task of checking whether the corresponding preference order is uncertainty averse and thus enhances the tractability of the proposed definition. 
Section 9.5 concludes with remarks on the significance of the choice between the domain of Savage acts versus the larger AA domain of horse-race/roulette-wheel
A definition of uncertainty aversion
173
acts. This difference in domains is central to understanding the relation between this chapter and Schmeidler (1989). Two important limitations of the analysis should be acknowledged at the start. First, uncertainty aversion is defined relative to an exogenously specified collection of events A. Events in A are thought of as unambiguous or uncertainty free. They play a role here parallel to that played by constant (or risk-free) acts in the standard analysis of risk aversion. However, whether or not an event is ambiguous is naturally viewed as subjective or derived from preference. Accordingly, it seems desirable to define uncertainty aversion relative to the collection of subjectively unambiguous events. Unfortunately, such a formulation is beyond the scope of this chapter.2 In defense of the exogenous specification of the collection A, observe that Schmeidler (1989) relies on a comparable specification through the presence of objective lotteries in the AA domain. In addition, it seems likely that given any future success in endogenizing ambiguity, the present analysis of uncertainty aversion relative to a given collection A will be useful. The other limitation concerns the limited success in this chapter in achieving the ultimate objective of deriving the behavioral consequences of uncertainty aversion. The focus here is on the definition of uncertainty aversion. Some behavioral implications are derived but much is left for future work. In particular, applications to standard economic contexts, such as asset pricing or games, are beyond the scope of the chapter. However, the importance of the groundwork laid here for future applications merits emphasis—an essential precondition for understanding the behavioral consequences of uncertainty aversion is that the latter term has a precise and intuitively satisfactory meaning. Admittedly, there have been several papers in the literature claiming to have derived consequences of uncertainty aversion for strategic behavior and also for asset pricing. To varying degrees these studies either adopt the Schmeidler definition of uncertainty aversion or they do not rely on a precise definition. In the latter case, they adopt a model of preference that has been developed in order to accommodate an intuitive notion of uncertainty aversion and interpret the implications of this preference specification as due to uncertainty aversion. (This author is partly responsible for such an exercise (Epstein and Wang, 1995); there are other examples in the literature.) There is an obvious logical flaw in such a procedure and the claims made (or the interpretations proposed) are unsupportable without a satisfactory definition of uncertainty aversion. 9.1.2. The current definition of uncertainty aversion In order to motivate the chapter further, consider briefly Schmeidler’s definition of uncertainty aversion. See Section 9.5 for a more complete description and for a discussion of the importance of the choice between the AA domain (as in Schmeidler, 1989) and the Savage domain (as in this chapter). Fix a state space (S, ), where is an algebra, and an outcome set X . Denote by F the Savage domain, that is, the set of all finite-ranged (simple) and measurable acts e from (S, ) into X . Choice behavior relative to F is the object of study.
174
Larry G. Epstein
Accordingly, postulate a preference order ! and a representing utility function U defined on F. Schmeidler’s definition of uncertainty aversion has been used primarily in the context of CEU theory, according to which uncertain prospects are evaluated by a utility function having the following form: ceu u(e) dν, e ∈ F. (9.1) U (e) = S
Here, u : X −→ R1 is a vNM utility index, ν is a capacity (or nonadditive probability) on , integration is in the sense of Choquet and other details will be provided later.3 For such a preference order, uncertainty aversion in the sense of Schmeidler is equivalent to convexity of the capacity ν, that is, to the property whereby ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B),
(9.2)
for all measurable events A and B. Additivity is a special case that characterizes uncertainty neutrality (suitably defined). However, Ellsberg’s single-urn experiment illustrates the weak connection between convexity of the capacity and behavior that is intuitively uncertainty averse.4 The urn is represented by the state space S = {R, B, G}, where the symbols represent the possible colors, red, blue, and green of a ball drawn at random from an urn. The information provided the decision-maker is that the urn contains 30 red balls and 90 balls in total. Thus, while he knows that there are 60 balls that are either blue or green, the relative proportions of each are not given. Let ! be the decision-maker’s preference over bets on events E ⊂ S. Typical choices in such a situation correspond to the following rankings of events:5 {R} {B} ∼ {G}, {B, G} {R, B} ∼ {R, G}.
(9.3)
The intuition for these rankings is well known and is based on the fact that {R} and {B, G} have objective probabilities, while the other events are “ambiguous,” or have “ambiguous probabilities.” Thus these rankings correspond to an intuitive notion of uncertainty or ambiguity aversion. Next suppose that the decision-maker has CEU preferences with capacity ν. Then convexity is neither necessary nor sufficient for the above rankings. For example, if ν(R) = 8/24, ν(B) = ν(G) = 7/24, and ν({B, G}) = 13/24, ν({R, G}) = ν({R, B}) = 1/2, then (9.3) is implied but ν is not convex. For the fact that convexity is not sufficient, observe that convexity does not even exclude the “opposite” rankings that intuitively reflect an affinity for ambiguity. (Let ν(R) = 1/12, ν(B) = ν(G) = 1/6, ν({B, G}) = 1/3, ν({R, G}) = ν({R, B}) = 1/2.) An additional example, taken from Zhang (1997), will reinforce that stated earlier and also illustrate a key feature of the analysis to follow. An urn contains 100 balls in total, with color composition R, B, W , and G, such
A definition of uncertainty aversion
175
that R + B = 50 = G + B. Thus S = {R, B, G, W } and the collection A = {∅, S, {B, G}, {R, W }, {B, R}, {G, W }} contains the events that are intuitively unambiguous. It is natural to suppose that the decision-maker would use the probability measure p on A, where p assigns probability 1/2 to each binary event. For other subsets of S, she might use the capacity p∗ defined by6 p∗ (E) = sup {p(B) : B ⊂ E, B ∈ A},
E ⊂ S.
The fact that the capacity of any E is computed by means of an inner approximation by unambiguous events seems to capture a form of aversion to ambiguity. However, p∗ is not convex because 1 = p∗ ({B, G}) + p∗ ({B, R}) > p∗ ({B, G, R}) + p∗ ({B}) =
1 . 2
Finally, observe that the collection A is not an algebra, because it is not closed with respect to intersections. Each of {R, B} and {G, B} is unambiguous, but {B} is ambiguous, showing that an algebra is not the appropriate mathematical structure for modeling collections of unambiguous events. This important insight is due to Zhang (1997). He further proposes an alternative structure, called a λ-system, that is adopted in Section 9.2.2.
9.2. Aversion to risk and uncertainty 9.2.1. Risk aversion Recall first some aspects of the received theory of risk aversion. This will provide some perspective for the analysis of uncertainty aversion. In addition, it will become apparent that if a distinction between risk and uncertainty is desired, then the theory of risk aversion must be modified. Because a subjective approach to risk aversion is the relevant one, adapt Yaari’s analysis (Yaari, 1969), which applies to the primitives (S, ), X ⊂ RN and !, a preference over the set of acts F. Turn first to “comparative risk aversion.” Say that !2 is more risk averse than 1 ! if for every act e and outcome x, x !1 (1 )e =⇒ x !2 (2 )e.
(9.4)
The two acts that are being compared here differ in that the variable outcomes prescribed by e are replaced by the single outcome x. The intuition for this definition is clear given the identification of constant acts with the absence of risk or perfect certainty. To define absolute (rather than comparative) risk aversion, it is necessary to adopt a “normalization” for risk neutrality. Note that this normalization is exogenous to the model. The standard normalization is the “expected value function,”
176
Larry G. Epstein
that is, risk neutral orders !rn are those satisfying:
e !rn e ⇐⇒
e(s)dm(s) !rn S
e (s)dm(s),
(9.5)
S
for some probability measure m on (S, ), where the R N -valued integrals are interpreted as constant acts and accordingly are ranked by !rn . This leads to the following definition of risk aversion: Say that ! is risk averse if there exists a risk neutral order !rn such that ! is more risk averse than !rn . Risk loving and risk neutrality can be defined in the obvious ways. In the SEU framework, this notion of risk aversion is the familiar one characterized by concavity of the vNM index, with the required m being the subjective beliefs or prior. By examining the implications of risk aversion for choice between binary acts, Yaari (1969), argues that this interpretation for m extends to more general preferences. Three points from this review merit emphasis. First, the definition of comparative risk aversion requires an a priori definition for the absence of risk. Observe that the identification of risklessness with constant acts is not tautological. For example, Karni (1983) argues that in a state-dependent expected utility model “risklessness” may very well correspond to acts that are not constant. Thus the choice of how to model risklessness is a substantive normalization that precedes the definition of “more risk averse.” Second, the definition of risk aversion requires further an a priori definition of risk neutrality. The final point is perhaps less evident or familiar. Consider rankings of the sort used in (9.4) to define “more risk averse.” A decision-maker may prefer the constant act because she dislikes variable outcomes even when they are realized on events that are understood well enough to be assigned probabilities (risk aversion). Alternatively, the reason for the indicated preference may be that the variable outcomes occur on events that are ambiguous and because she dislikes ambiguity or uncertainty. Thus it seems more appropriate to describe (9.4) as revealing that !2 is “more risk and uncertainty averse than !1 ,” with no attempt being made at a distinction. However, the importance of the distinction between these two underlying reasons seems self-evident; it is reflected also in recent concern with formal models of “Knightian uncertainty” and decision theories that accommodate the Ellsberg (as opposed to Allais) Paradox. The second possibility mentioned earlier can be excluded, and thus a distinction made, by assuming that the decision-maker is indifferent to uncertainty, or put another way, by assuming that there is no uncertainty (all events are assigned probabilities). But these are extreme assumptions that are contradicted in Ellsberg-type situations. This chapter identifies and focuses upon the uncertainty aversion component implicit in the comparisons (9.4) and, to a limited extent, achieves a separation between risk aversion and uncertainty aversion.
A definition of uncertainty aversion
177
9.2.2. Uncertainty aversion Once again, consider orders ! on F, where for the rest of the chapter the outcome set X is arbitrary rather than Euclidean. The objective now is to formulate intuitive notions of comparative and absolute uncertainty aversion. Turn first to comparative uncertainty aversion. It is clear intuitively and also from the discussion of risk aversion that one can proceed only given a prior specification of the “absence of uncertainty.” This specification takes the form of an exogenous family A ⊂ of “unambiguous” events. Assume throughout the following intuitive requirements for A: It contains S and A∈A A1 , A2 ∈ A
implies that Ac ∈ A; and
A1 ∩ A2 = ∅ imply that A1 ∪ A2 ∈ A.
Zhang (1997) argues that these properties are natural for a collection of unambiguous events and, following (Billingsley, 1986: 36), calls such collections λ-systems. Intuitively, if an event being unambiguous means that it can be assigned a probability by the decision-maker, then the sum of the individual probabilities is naturally assigned to a disjoint union, while the complementary probability is naturally assigned to the complementary event. As demonstrated earlier, it is not intuitive to require that A be closed with respect to non-disjoint unions or intersections, that is, that A be an algebra. Denote by F ua the set of A-measurable acts, also called unambiguous acts. The following definition parallels the earlier one for comparative risk aversion. Given two orderings, say that !2 is more uncertainty averse than !1 if for every unambiguous act h and every act e in F, h !1 (1 )e =⇒ h !2 (2 )e.
(9.6)
There is no loss of generality in supposing that the acts h and e deliver the identical outcomes. The difference between the acts lies in the nature of the events where these outcomes are delivered (some of these events may be empty). For h, the typical outcome x is delivered on the unambiguous event h−1 (x), while it occurs on an ambiguous event given e. Then whenever the greater ambiguity inherent in e leads !1 to prefer h, the more ambiguity averse !2 will also prefer h. This interpretation relies on the assumption that each event in A is unambiguous and thus is (weakly) less ambiguous than any E ∈ . Fix an order !. To define absolute (rather than comparative) uncertainty aversion for !, it is necessary to adopt a “normalization” for uncertainty neutrality. As in the case of risk, a natural though exogenous normalization exists, namely that preference is based on probabilities in the sense of being probabilistically sophisticated as defined in Machina and Schmeidler (1992). The functional form of representing utility functions reveals clearly the sense in which preference is based on probabilities. The components of that functional form are a probability measure m on the state space (S, ) and a functional W : (X ) −→ R1 , where (X ) denotes the set of all simple (finite support) probability measures
178
Larry G. Epstein
on the outcome set X . Using m, any act e induces such a probability distribution m,e . Probabilistic sophistication requires that e be evaluated only through the distribution over outcomes m,e that it induces. More precisely, utility has the form U ps (e) = W (m,e ),
e ∈ F.
(9.7)
Following Machina and Schmeidler (1992: 754), assume also that W is strictly increasing in the sense of first-order stochastic dominance, suitably defined.7 Denote any such order by !ps . A decision-maker with !ps assigns probabilities to all events and in this way transforms any act into a lottery, or pure risk. Such exclusive reliance on probabilities is, in particular, inconsistent with the typical “uncertainty averse” behavior exhibited in Ellsberg-type experiments. Thus it is both intuitive and consistent with common practice to identify probabilistic sophistication with uncertainty neutrality. Think of m and W as the “beliefs” (or probability measure) and “risk preferences” underlying !ps .8 This normalization leads to the following definition: Say that ! is uncertainty averse if there exists a probabilistically sophisticated order !ps such that ! is more uncertainty averse than !ps . In other words, under the conditions stated in (9.6), h !ps (ps )e =⇒ h ! ()e.
(9.8)
The intuition is similar to that for (9.6). It is immediate that ! and !ps agree on unambiguous acts. Further, !ps is indifferent to uncertainty and thus views all acts as being risky only. Therefore, interpret (9.8) as stating that !ps is a “risk preference component” of !. The indefinite article is needed for two reasons—first because all definitions depend on the exogenously specified collection A and second, because !ps need not be unique even given A. Subject to these same qualifications, the probability measure underlying !ps is naturally interpreted as “mean” or “uncertainty-free” beliefs underlying !. The formal analysis stated later does not depend on these interpretations. It might be useful to adapt familiar terminology and refer to !ps satisfying (9.8) as constituting a support for ! at h. Then uncertainty aversion for ! means that there exists a single order !ps supporting ! at every unambiguous act. A parallel requirement in consumer theory is that there exist a single price vector supporting the indifference curve at each consumption bundle on the 45◦ line. (This parallel is developed further in Section 9.3.4 and via Theorem 9.3(c).) Turn next to uncertainty loving and uncertainty neutrality. For the definition of the former, reverse the inequalities in (9.8). That is, say that ! is uncertainty loving if there exists a probabilistically sophisticated order !ps such that, under the conditions stated in (9.6), h )ps (≺ps )e =⇒ h ) (≺)e.
(9.9)
The conjunction of uncertainty aversion and uncertainty loving is called uncertainty neutrality.
A definition of uncertainty aversion
179
9.2.3. A degree of separation Consider the question of a separation between attitudes toward uncertainty and attitudes toward risk. Suppose that ! is uncertainty averse with support !ps . Because ! and !ps agree on the set F ua of unambiguous acts, ! is probabilistically sophisticated there. Thus, treating the probability measure underlying !ps as objective, one may adopt the standard notion of risk aversion (or loving) for objective lotteries (see e.g. Machina (1982)) in order to give precise meaning to the statement that ! is risk averse (or loving). In the same way, such risk attitudes are well defined if ! is uncertainty loving. That a degree of separation between risk and uncertainty attitudes has been achieved is reflected in the fact that all four logically possible combinations of risk and uncertainty attitudes are admissible. On the other hand, the separation is partial: If !1 is more uncertainty averse than !2 , then these two preference orders must agree on F ua and thus embody the same risk aversion. As emphasized earlier, the meaning of uncertainty aversion depends on the exogenously specified A. That specification also bears on the distinction between risk aversion and uncertainty aversion. The suggestion just expressed is that the risk attitude of an order ! is embodied in the ranking it induces on F ua , while the attitude toward uncertainty is reflected in the way in which ! relates arbitrary acts e with unambiguous acts h as in (9.6). Thus if the modeler specifies that A = {∅, S}, and hence that F ua contains only constant acts, then she is assuming that the decision-maker is not facing any meaningful risk. Accordingly, the modeler is led to interpret comparisons of the form (9.4) as reflecting (comparative) uncertainty aversion exclusively. At the other extreme, if the modeler specifies that A = , and hence that all acts in F are unambiguous, then she is assuming that the decision-maker faces only risk, which leads to the interpretation of (9.4) as reflecting (comparative) risk aversion exclusively. More generally, the specification of A reflects the modeler’s prior view of the decision-maker’s perception of his environment.
9.3. Is the definition attractive?

9.3.1. Some attractive properties

The definition of uncertainty aversion has been based on the a priori identification of uncertainty neutrality (defined informally) with probabilistic sophistication. Therefore, internal consistency of the approach should deliver this identification as a formal result. On the other hand, because attitudes toward uncertainty have been defined relative to a given A, such a result cannot be expected unless it is assumed that A is "large." Suppose, therefore, that A is rich: there exist outcomes x^* ≻ x_* such that for every Ē ⊂ E in Σ and A in A satisfying

(x^*, A; x_*, Ac) ∼ (x^*, E; x_*, Ec),
there exists Ā in A, Ā ⊂ A, such that

(x^*, Ā; x_*, Āc) ∼ (x^*, Ē; x_*, Ēc).

A corresponding notion of richness is valid for the roulette-wheel lotteries in the AA framework adopted by Schmeidler (1989).9 The next theorem (proved in Appendix A) establishes the internal consistency of our approach.

Theorem 9.1. If ≽ is probabilistically sophisticated, then it is uncertainty neutral. The converse is true if A is rich.

The potential usefulness of the notion of uncertainty aversion depends on being able to check for the existence of a probabilistically sophisticated order supporting a given ≽. This concern with tractability motivates the later analysis of eventwise differentiability. Anticipating that analysis, consider here the narrower question "does there exist ≽ps that both supports ≽ and has underlying beliefs represented by the given probability measure m on Σ?" On its own, the question may seem to be of limited interest. But once eventwise differentiability delivers m, its answer completes a procedure for checking for uncertainty aversion.

Lemma 9.1. Let ≽ps support ≽ in the sense of (9.8) and have underlying probability measure m on Σ. Then:

(i) For any two unambiguous acts h and h′, if Ψm,h first-order stochastically dominates Ψm,h′, then U(h) ≥ U(h′).
(ii) For all acts e and unambiguous acts h, Ψm,e = Ψm,h =⇒ U(e) ≤ U(h).

The converse is true if m satisfies: for each unambiguous A and 0 < r < mA, there exists unambiguous B ⊂ A with mB = r.

The added assumption for m is satisfied if S = S1 × S2, unambiguous events are measurable subsets of S1 and the marginal of m on S1 is convex-ranged in the usual sense. The role of the assumption is to ensure that, using the notation surrounding (9.7), {Ψm,h : h ∈ F ua} = Δ(X).

9.3.2. Multiple-priors and CEU utilities

The two most widely used generalizations of SEU theory are CEU and the multiple-priors model. In this subsection, uncertainty aversion is examined in the context of these models.
Say that ≽ is a multiple-priors preference order if it is represented by a utility function U mp of the form

U mp(e) = min_{m∈P} ∫_S u(e) dm,        (9.10)

for some set P of probability measures on (S, Σ) and some vNM index u : X −→ R1. Given a class A, it is natural to model the unambiguous nature of events in A by supposing that all measures in P are identical when restricted to A; that is,

mA = m′A  for all m and m′ in P and A in A.        (9.11)
These two restrictions on ≽ imply uncertainty aversion, because ≽ is more uncertainty averse than the expected utility order ≽ps with vNM index u and any probability measure m in P. More precisely, the following intuitive result is valid:

Theorem 9.2. Any multiple-priors order satisfying (9.11) is uncertainty averse.

Proof. Let ≽ps denote an expected utility order with vNM index u and any probability measure m in P. Then h ≽ps e ⇐⇒ ∫ u(h) dm ≥ ∫ u(e) dm =⇒ U mp(h) = ∫ u(h) dm ≥ ∫ u(e) dm ≥ U mp(e).

A commonly studied special case of the multiple-priors model is a CEU order with convex capacity ν. Then (9.10) applies with P = core(ν) = {m : m(·) ≥ ν(·) on Σ}. Thus convexity of the capacity implies uncertainty aversion given (9.11).

Focus more closely on the CEU model, with particular emphasis on the connection between uncertainty aversion and convexity of the capacity. The next result translates Lemma 9.1 into the present setting, thus providing necessary and sufficient conditions for uncertainty aversion combined with a prespecified supporting probability measure m. For necessity, an added assumption is adopted. Say that a capacity ν is convex-ranged if for all events E1 ⊂ E2 and ν(E1) < r < ν(E2), there exists E, E1 ⊂ E ⊂ E2, such that ν(E) = r. This terminology applies in particular if ν is additive, where it is standard.10 For axiomatizations of CEU that deliver a convex-ranged capacity, see Gilboa (1987: 73) and Sarin and Wakker (1992: Proposition A.3). Savage's axiomatization of expected utility delivers a convex-ranged probability measure.

Use the notation U ceu to refer to utility functions defined by (9.1), where the vNM index u : X −→ R1 is such that u(X) has nonempty interior in R1. For those unfamiliar with Choquet integration, observe that for simple acts it yields

U ceu(e) = Σ_{i=1}^{n−1} [u(xi) − u(xi+1)] ν(∪_{j=1}^{i} Ej) + u(xn),        (9.12)

where the outcomes are ranked as x1 ≻ x2 ≻ · · · ≻ xn and the act e has e(xi) = Ei, i = 1, . . . , n.
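To make (9.10) and (9.12) concrete, here is a small numerical sketch; the three-colour urn, the particular capacity values and the set P of priors are all hypothetical choices, not taken from the chapter.

```python
from itertools import chain, combinations

S = ["R", "B", "G"]

def subsets(states):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(states, k) for k in range(len(states) + 1))]

# A hypothetical capacity nu (monotone, nu(empty) = 0, nu(S) = 1).
nu = {E: 0.0 for E in subsets(S)}
nu[frozenset({"R"})] = 1/3
nu[frozenset({"B"})] = nu[frozenset({"G"})] = 1/4
nu[frozenset({"B", "G"})] = 2/3
nu[frozenset({"R", "B"})] = nu[frozenset({"R", "G"})] = 0.55
nu[frozenset(S)] = 1.0

def choquet(act, u, nu):
    """Choquet integral of a simple act, following (9.12).

    act maps states to outcomes, u maps outcomes to utilities;
    outcomes are ranked best-to-worst and the capacity is applied
    to the cumulative upper sets union_{j<=i} E_j."""
    xs = sorted(set(act.values()), key=u, reverse=True)
    total, upper = 0.0, frozenset()
    for i, x in enumerate(xs):
        upper = upper | frozenset(s for s in act if act[s] == x)
        next_u = u(xs[i + 1]) if i + 1 < len(xs) else 0.0
        total += (u(x) - next_u) * nu[upper]
    return total

def multiple_priors(act, u, priors):
    """U^mp of (9.10): the worst expected utility over the set P."""
    return min(sum(m[s] * u(act[s]) for s in act) for m in priors)

u = {"win": 1.0, "lose": 0.0}.__getitem__
bet_on_R = {"R": "win", "B": "lose", "G": "lose"}
# Hypothetical P: priors agreeing on {R} and {B, G}, as in (9.11).
priors = [{"R": 1/3, "B": b, "G": 2/3 - b} for b in (0.0, 1/3, 2/3)]
print(choquet(bet_on_R, u, nu), multiple_priors(bet_on_R, u, priors))
```

For the bet on R every prior in P assigns the winning event probability 1/3, so the two evaluations agree; for a bet on the ambiguous colour B they diverge (Choquet value 1/4 against a multiple-priors value of 0).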
Lemma 9.2. Let U ceu be a CEU utility function with capacity ν.

(a) The following conditions are sufficient for U ceu to be uncertainty averse with supporting U ps having m as underlying probability measure: there exists a bijection g : [0, 1] −→ [0, 1] such that

m ∈ core(g⁻¹(ν))        (9.13)

and

m(·) = g⁻¹(ν(·)) on A.        (9.14)

(b) Suppose that ν is convex-ranged and that A is rich. Then the conditions in (a) are necessary in order that U ceu be uncertainty averse with supporting U ps having m as underlying probability measure.

(c) Finally, in each of the preceding parts, the supporting utility U ps can be taken to be an expected utility function if and only if in addition g is the identity function, that is,

m = ν on A  and  m ≥ ν on Σ.        (9.15)
See Appendix A for a proof. The supporting utility function U ps that is provided by the proof of (a) has the form (9.7), where the risk preference functional W is

W(Ψ) = ∫_X u(x) d(g ◦ Ψ)(x),

a member of the rank-dependent expected utility class (Chew et al., 1987).

Observe first that attitudes toward uncertainty do not depend on properties of the vNM index u. More surprising is that, given m, the conditions on ν described in (a) are ordinal invariants; that is, if ν satisfies them, then so does ϕ(ν) for any monotonic transformation ϕ. In other words, ν and g satisfy these conditions if and only if ϕ(ν) and g′ = ϕ ◦ g do. Consequently, under the regularity conditions in the lemma, the CEU utility function ∫ u(e) dν is uncertainty averse if and only if the same is true for ∫ u(e) dϕ(ν). The fact that uncertainty aversion is determined by ordinal properties of the capacity makes it perfectly clear that uncertainty aversion has little to do with convexity, a cardinal property.

Thus far, only parts (a) and (b) of the lemma have been used. Focus now on (c), characterizing conditions under which U ceu is "more uncertainty averse than some expected utility order with probability measure m." Because the CEU utility functions studied by Schmeidler are defined on horse-race/roulette-wheels and conform with expected utility on the objective roulette-wheels, this latter comparison may be more relevant than uncertainty aversion per se for understanding the connection with convexity. The lemma delivers the requirement that ν be additive on A and that it admit an extension to a measure lying in its core. It is well known that convexity of ν is sufficient for nonemptiness of the core, but that seems to be the extent of the link with uncertainty aversion. The final example in Section 9.1.2, as completed in the next subsection, shows that U ceu may be more uncertainty averse than some expected utility order even though its capacity is not convex.
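The claim that convexity is cardinal while uncertainty aversion is ordinal can be checked mechanically. A sketch, on a hypothetical two-state space: an additive capacity is convex, yet its monotonic transform under ϕ(t) = √t fails convexity, while by the lemma the transformation leaves the uncertainty-aversion status of the CEU utility unchanged.

```python
import math
from itertools import chain, combinations

S = ["s1", "s2"]
events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, k) for k in range(3))]

p = {frozenset(): 0.0, frozenset({"s1"}): 0.5,
     frozenset({"s2"}): 0.5, frozenset(S): 1.0}      # additive, hence convex
phi_p = {E: math.sqrt(v) for E, v in p.items()}       # monotone transform

def is_convex(nu):
    # Convexity: nu(A | B) + nu(A & B) >= nu(A) + nu(B) for all events.
    return all(nu[A | B] + nu[A & B] >= nu[A] + nu[B] - 1e-12
               for A in events for B in events)

print(is_convex(p))      # True
print(is_convex(phi_p))  # False: sqrt(.5) + sqrt(.5) > 1 = nu(S) + nu(empty)
```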
To summarize, there appears to be no logical connection in the Savage framework between uncertainty aversion and convexity. Convexity does not imply uncertainty aversion, unless added conditions such as (9.11) are imposed. Furthermore, convexity is not necessary even for the stricter notion "more uncertainty averse than some expected utility order" that seems closer to Schmeidler's notion. This is not to say that convexity and the associated multiple-priors functional structure that it delivers are not useful hypotheses. Rather, the point is to object to the widely adopted behavioral interpretation of convexity as uncertainty aversion.

9.3.3. Inner measures

Zhang (1997) argues that it is capacities that are inner measures, rather than convex capacities, that model uncertainty aversion. These capacities are defined as follows: let p be a probability measure on A; its existence reflects the unambiguous nature of events in A. Then the corresponding inner measure p∗ is the capacity given by

p∗(E) = sup{p(B) : B ⊂ E, B ∈ A},  E ∈ Σ.
The fact that the capacity of any E is computed by means of an inner approximation by unambiguous events seems to capture a form of aversion to ambiguity. Zhang provides axioms for preference that are consistent with this intuition and that deliver the subclass of CEU preferences having an inner measure as the capacity ν. It is interesting to ask whether CEU preferences with inner measures are uncertainty averse in the formal sense of this chapter. The answer is "sometimes," as described in the next lemma.

Lemma 9.3. Let U ceu(·) ≡ ∫ u(·) dp∗, where p∗ is the inner measure generated as above from the probability measure p on A.

(a) If p admits an extension to a probability measure on Σ, then U ceu is more uncertainty averse than the expected utility function ∫ u(·) dp.
(b) Adopt the auxiliary assumptions in Lemma 9.2(b). If U ceu is uncertainty averse, then p admits an extension from A to a measure on all of Σ.

Proof. (a) p∗ and p coincide on A. For every B ⊂ E with B in A, p(B) ≤ p(E), where p denotes also the assumed extension. Therefore, p∗(E) ≤ p(E). From the formula (9.12) for the Choquet integral, conclude that for all acts e and unambiguous acts h,

∫ u(h) dp∗ = ∫ u(h) dp  and  ∫ u(e) dp∗ ≤ ∫ u(e) dp.

(b) By Lemma 9.2 and its proof, p = p∗ = g(m) on A and m(A) = [0, 1]. Therefore, g must be the identity function. Again by the previous lemma, m lies in core(p∗), implying that m ≥ p∗ = p on A. Because A is closed with respect to complements, conclude that m = p on A and hence that m is the asserted extension of p.
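Since the example from Section 9.1.2 is not reproduced here, the sketch below uses a different, hypothetical four-state λ-system to illustrate both points at once: the inner measure p∗ is computable directly from its definition, it fails convexity, and yet p extends to the uniform measure on the power set, so Lemma 9.3(a) applies.

```python
from itertools import chain, combinations

S = [1, 2, 3, 4]
events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, k) for k in range(5))]

# A hypothetical lambda-system of unambiguous events (closed under
# complements and disjoint unions) and a probability p on it.
A = [frozenset(), frozenset(S), frozenset({1, 2}), frozenset({3, 4}),
     frozenset({1, 3}), frozenset({2, 4})]
p = {B: len(B) / 4 for B in A}

def inner(E):
    """p_*(E) = sup{p(B) : B subset of E, B in A}."""
    return max(p[B] for B in A if B <= E)

p_star = {E: inner(E) for E in events}

# Not convex: with E = {1,2} and F = {1,3},
# p_star(E | F) + p_star(E & F) = 1/2 + 0 < p_star(E) + p_star(F) = 1.
E, F = frozenset({1, 2}), frozenset({1, 3})
print(p_star[E | F] + p_star[E & F], p_star[E] + p_star[F])

# Yet p extends to the uniform measure on the power set, so by
# Lemma 9.3(a) the CEU preference with capacity p_star is more
# uncertainty averse than expected utility with the uniform prior.
m = {G: len(G) / 4 for G in events}
print(all(abs(m[B] - p[B]) < 1e-12 for B in A))
```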
Both directions in the lemma are of interest. In general, a probability measure on the λ-system A need not admit an extension to the algebra Σ.11 Therefore, (b) shows that the intuition surrounding "inner approximation" is flawed or incomplete, demonstrating the importance of a formal definition of uncertainty aversion. Part (a) provides a class of examples of CEU functions that are more uncertainty averse than some expected utility order. These can be used to show that even if this stricter notion of (more) uncertainty averse is adopted, the capacity p∗ need not be convex. For instance, the last example in Section 9.1.2 satisfies the conditions in (a)—the required extension is the equally likely (counting) probability measure on the power set. Thus preference is uncertainty averse, even though p∗ is not convex.

9.3.4. Bets, beliefs and uncertainty aversion

This section examines some implications of uncertainty aversion for the ranking of binary acts. Because the ranking of bets reveals the decision-maker's underlying beliefs or likelihoods, these implications clarify the meaning of uncertainty aversion and help to demonstrate its intuitive empirical content. The generic binary act is denoted xEy, indicating that x is obtained if E is realized and y otherwise.

Let ≽ be uncertainty averse with probabilistically sophisticated order ≽ps satisfying (9.8). Apply the latter to binary acts, to obtain the following relation: for all unambiguous A, events E and outcomes x1 and x2,

x1Ax2 ≽ps (≻ps) x1Ex2 =⇒ x1Ax2 ≽ (≻) x1Ex2.

Proceed to transform this relation into a more illuminating form. Exclude the uninteresting case x1 ∼ x2 and assume that x1 ≻ x2. Then x1Ex2 can be viewed as a bet on the event E. As noted earlier, ≽ps necessarily agrees with the given ≽ in the ranking of unambiguous acts and hence also constant acts or outcomes, so x1 ≻ps x2. Let m be the subjective probability measure on the state space (S, Σ) that underlies ≽ps. Then the monotonicity property inherent in probabilistic sophistication implies that

x1Ax2 ≽ps (≻ps) x1Ex2 ⇐⇒ m(A) ≥ (>) m(E).

Conclude that uncertainty aversion implies the existence of a probability measure m such that: for all A, E, x1 and x2 as given earlier,

m(A) ≥ (>) m(E) =⇒ x1Ax2 ≽ (≻) x1Ex2.

One final rewriting is useful. Define, for the given pair x1 ≻ x2,

ν(E) = U(x1Ex2).
Then, mA ≥ (>)mE =⇒ νA ≥ (>)νE,
(9.16)
which is the sought-after implication of uncertainty aversion.12 In the special case of CEU (9.1), with vNM index satisfying u(x1) = 1 and u(x2) = 0, ν defined as stated earlier coincides with the capacity in the CEU functional form. Even when CEU is not assumed, (suppose that ν is monotone with respect to set inclusion and) refer to ν as a capacity. The interpretation is that ν represents ≽ numerically over bets on various events with the given stakes x1 and x2, or alternatively, that it represents numerically the likelihood relation underlying preference ≽. From this perspective, only the ordinal properties of ν are significant.13

An implication of (9.16) is that ν and m must be ordinally equivalent on A (though not on Σ). In other words, uncertainty aversion implies the existence of a probability measure m that supports {E ∈ Σ : ν(E) ≥ ν(A)} at each unambiguous A, where support is in a sense analogous to the usual meaning, except that the usual linear supporting function defined on a linear space is replaced by an additive function defined on an algebra. Think of the measure m as describing the (not necessarily unique) "mean ambiguity-free likelihoods" implicit in ν and ≽. This interpretation and the "support" analogy are pursued and developed further in Section 9.4.3 under the assumption that preference is eventwise differentiable.

In a similar fashion, one can show that uncertainty loving implies the existence of a probability measure q on (S, Σ) such that

q(A) ≤ (<) q(E) =⇒ ν(A) ≤ (<) ν(E),        (9.17)
for every E ∈ Σ and A ∈ A. The conjunction of (9.16) and (9.17) implies, under a mild additional assumption, that ν is ordinally equivalent to a probability measure (see Lemma 9.A.1), which is one step in the proof of Theorem 9.1.

Because choice between bets provides much of the experimental evidence regarding nonindifference to uncertainty, the implication (9.16) is convenient for demonstrating the intuitive empirical content of uncertainty aversion. The Ellsberg urn discussed in the introduction provides the natural vehicle. Consider again the typical choices in (9.3). In order to relate these rankings to the formal definition of uncertainty aversion, adopt the natural specification A = {∅, S, {R}, {B, G}}. Given this specification, it is easy to see that these rankings imply uncertainty aversion—the measure m assigning 1/3 probability to each color is a support in the sense of (9.16); a numerical version of this support check appears at the end of this subsection.

Equally revealing is that the notion of uncertainty aversion excludes behavior that is interpreted intuitively as reflecting an affinity for ambiguity.14 To see this, suppose that the decision-maker's rankings are changed by reversing the strict preference "≻" to "≺". These new rankings contradict uncertainty aversion: let
m be a support as in the implication (9.16) of uncertainty aversion and take A = {B, G}. Then {B, G} ≺ {R, B} implies that m({B, G}) < m({R, B}). Because m is additive, conclude that m(G) < m(R). But then uncertainty aversion applied to the unambiguous event {R} implies that {R} ≻ {G}, contrary to the hypothesis.

Though a general formal result seems unachievable, there is an informal sense in which these results seem to be valid much more broadly than the specific Ellsberg experiment considered. Typically when choices are viewed as paradoxical relative to probabilistically sophisticated preferences, there is a natural probability measure on the state space that is "contradicted" by observed choices. This seems close to saying precisely that the measure is a support.

Another revealing implication of uncertainty aversion is readily derived from (9.16). Notation that is useful here and later is: given A, write an arbitrary event E in the form

E = A + F − G,  where  F = E\A  and  G = A\E.        (9.18)
Henceforth, E + F denotes both E ∪ F and the assumption that the sets are disjoint. Similarly, implicit in the notation E − G is that G ⊂ E. Now let m be the supporting measure delivered by uncertainty aversion. Then for any unambiguous A′ and A, if F ⊂ A′ ∩ Ac and G ⊂ A′c ∩ A,

A′ − F + G ≽ A′ =⇒ A + F − G ≼ A,        (9.19)

because the first ranking implies (by the support property at A′) that mF ≤ mG and this implies the second ranking (by the support property at A).15 In particular, taking A′ = Ac,

Ac − F + G ≽ Ac =⇒ A + F − G ≼ A,        (9.20)

for all F ⊂ Ac and G ⊂ A. The interpretation is that if F seems small relative to G when (as at A′) one is contemplating subtracting F and adding G, then it also seems small when (as at A) one is contemplating adding F and subtracting G. This is reminiscent of the familiar inequality between the compensating and equivalent variations for an economic change, or the property of diminishing marginal rate of substitution. A closer connection between uncertainty aversion and such familiar notions from consumer theory is possible if eventwise differentiability of preference is assumed, as in the next section.
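Here is the numerical version of the Ellsberg support check promised above: a sketch that tests (9.16) directly, with hypothetical capacity values chosen to match the typical ambiguity-averse rankings and then their reversal.

```python
from itertools import chain, combinations

S = ["R", "B", "G"]
events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, k) for k in range(4))]
A = [frozenset(), frozenset(S), frozenset({"R"}), frozenset({"B", "G"})]
m = {E: len(E) / 3 for E in events}   # 1/3 per color

def supports(m, nu):
    """Check (9.16): m(A) >= m(E) implies nu(A) >= nu(E), strictly for >."""
    for A_ in A:
        for E in events:
            if m[A_] >= m[E] and nu[A_] < nu[E] - 1e-12:
                return False
            if m[A_] > m[E] and nu[A_] <= nu[E] + 1e-12:
                return False
    return True

def capacity(single_ambiguous, pair_ambiguous):
    nu = {E: 0.0 for E in events}
    nu[frozenset({"R"})] = 1/3
    nu[frozenset({"B"})] = nu[frozenset({"G"})] = single_ambiguous
    nu[frozenset({"B", "G"})] = 2/3
    nu[frozenset({"R", "B"})] = nu[frozenset({"R", "G"})] = pair_ambiguous
    nu[frozenset(S)] = 1.0
    return nu

print(supports(m, capacity(1/4, 0.55)))   # averse rankings: True
print(supports(m, capacity(0.40, 0.70)))  # reversed rankings: False
```

As the text argues, the reversed rankings admit no support at all; the sketch only exhibits the failure of the natural uniform candidate.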
9.4. Differentiable utilities

Tractability in applying the notion of uncertainty aversion raises the following question: is there a procedure for deriving from ≽ all probabilistically sophisticated orders satisfying (9.8), or for deriving from ν all candidate supporting measures m satisfying (9.16)? Such a procedure is essential for the hypothesis of uncertainty aversion to be verifiable. For example, within CEU, Lemma 9.2 describes the probability measures that can serve as supports. However, to apply
the description, one must be able to compute the cores of the capacity ν and of monotonic transformations of ν, while even the core of ν alone is typically not easily computed from ν.

In order to address the question of tractability, this section introduces the notion of eventwise differentiability of preference. Much as, within expected utility theory (where outcomes lie in some Rn), differentiability of the vNM index simplifies the task of checking for concavity and hence risk aversion, so eventwise differentiability simplifies the task of checking for uncertainty aversion. That is because such differentiability permits the candidate supporting measures to be derived via convenient calculations of the sort familiar from calculus. Further, conditions are provided that deliver a unique supporting measure from the eventwise derivative of utility. When combined with Lemmas 9.1 and 9.2, this provides a practicable characterization of uncertainty aversion.

9.4.1. Definition of eventwise differentiability

The standard representation of an act, used earlier, is as a measurable map from states into outcomes. Let e : S −→ X be such an act. An alternative representation of this act is by means of the inverse correspondence e⁻¹, denoted simply by e. Thus e : X −→ Σ, where e(x) denotes the event E on which the act assumes the outcome x. For notational simplicity, the same symbol e is used for both representations, leaving it to the context to make clear whether e denotes a mapping from states into consequences or alternatively from outcomes into events.

Henceforth, when examining the decision-maker's ranking of a pair of acts, view those acts as assigning a common set of outcomes to different events. This perspective is "dual" to the more common one, where distinct acts are viewed as assigning different outcomes to common events. These two perspectives are mathematically equally valid; the choice between them is a matter of convenience. The latter is well suited to the study of risk aversion (attitudes toward variability in outcomes) and, it is argued here, the former is well suited to the study of uncertainty aversion. The intuition is that uncertainty or ambiguity stems from events and that aversion to uncertainty reflects attitudes toward changes in those events.

Because acts are simple,

{x ∈ X : e(x) ≠ ∅} is finite.        (9.21)

In addition,

{e(x) : x ∈ X, e(x) ≠ ∅} partitions S.        (9.22)
The set of acts F may be identified with the set of all maps satisfying these two conditions. In particular, F ⊂ Σ^X, where the latter is defined as the set of all maps from X into Σ satisfying (9.21). Let U : F −→ R be a utility function for ≽ and define the "eventwise derivative" of U. Because utility is defined on a subset of Σ^X, it is convenient to define derivatives first for functions that are defined on all of Σ^X. Continue to refer to elements e ∈ Σ^X as acts even when they are not elements of F.
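In code, the dual representation is nothing more than a map from outcomes to events. A minimal sketch (hypothetical finite state space and outcome labels) of acts as elements of Σ^X, together with the coordinatewise operations defined next:

```python
# An act in dual form maps each outcome to the event on which it obtains.
e = {"x1": frozenset({1, 2}), "x2": frozenset({3, 4})}

def union(e1, e2):
    """(e1 ∪ e2)(x) = e1(x) ∪ e2(x), coordinatewise in the outcome x."""
    keys = set(e1) | set(e2)
    return {x: e1.get(x, frozenset()) | e2.get(x, frozenset()) for x in keys}

def disjoint(e1, e2):
    return all(not (e1.get(x, frozenset()) & e2.get(x, frozenset()))
               for x in set(e1) | set(e2))

def perturb(e, f, g):
    """e + f - g, for f inside e's complement and g inside e, outcome by outcome."""
    keys = set(e) | set(f)
    return {x: (e.get(x, frozenset()) | f.get(x, frozenset()))
               - g.get(x, frozenset()) for x in keys}

# Shift state 3 from the x2-event to the x1-event of e.
f = {"x1": frozenset({3})}     # f(x1) lies outside e(x1)
g = {"x2": frozenset({3})}     # g(x2) lies inside e(x2)
print(disjoint(f, g))          # True
print(union(f, g))             # f + g, since f and g are disjoint
print(perturb(e, f, g))        # x1 on {1, 2, 3}, x2 on {4}
```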
The following structure on Σ^X is useful. Define the operations "∪", "∩" and "complementation" (e −→ ec) on Σ^X coordinatewise; for example,

(e ∪ f)(x) ≡ e(x) ∪ f(x),  for all x ∈ X.

Say that e and f are disjoint if e(x) ∩ f(x) = ∅ for all x, abbreviated e ∩ f = ∅. In that case, denote the above union by e + f. The notation e′\e and e′△e indicates set difference and symmetric difference applied outcome by outcome. Similar meaning is given to g ⊂ e. Say that {f j}nj=1 partitions f if {f j(x)} partitions f(x) for each x. Define the refinement partial ordering of partitions in the obvious way. Given an act f, {{f j,λ}j=1,...,nλ}λ denotes the net of all finite partitions of f, where λ′ < λ if and only if the partition corresponding to λ refines the partition corresponding to λ′.

A real-valued function µ on Σ^X is called additive if it is additive across disjoint acts. Refer to such a function as a (signed) measure even though that terminology is usually reserved for functions defined on algebras, while Σ^X is not an algebra.16 Expected utility functions, U(e) = Σx u(x) p(e(x)), are additive and hence measures in this terminology. The properties of boundedness and convex-rangedness for a measure µ on Σ^X can be defined in the natural way (see Appendix B).

Turn now to differentiability for a function Φ : Σ^X −→ R1. In order to better understand the essence of the definition, some readers may wish to focus on the special case where the domain of Φ is Σ. Then each act e is simply an event E. One can think of Φ as a capacity and of δΦ(·; E) as its derivative at E.

Definition 9.1. Φ is (eventwise) differentiable at e ∈ Σ^X if there exists a bounded and convex-ranged measure δΦ(·; e) on Σ^X such that, for all f ⊂ ec and g ⊂ e,

Σ_{j=1}^{nλ} | Φ(e + f j,λ − g j,λ) − Φ(e) − δΦ(f j,λ; e) + δΦ(g j,λ; e) | −→λ 0.        (9.23)

Any utility function U is defined on the proper subset F of Σ^X. Define δU(·; e) as given earlier, with the exception that the perturbations f j,λ and g j,λ are restricted so that e + f j,λ − g j,λ lies in F. Say that U is eventwise differentiable if the derivative exists at each e in F. (To clarify the notation, suppose that e is an act in F that assumes the outcomes x1 and x2 on E and Ec, respectively. Let f assume (only) these outcomes on events F ⊂ Ec and G ⊂ E, while g assumes (only) x1 and x2 on G and F, respectively. Then f and g lie in Σ^X, f ⊂ ec, g ⊂ e, and e + f − g is the act in F that yields x1 on E + F − G and x2 on its complement. Further, if {F j,λ} and {Gj,λ} are partitions of F and G and if f j,λ and g j,λ are defined in fashion paralleling the definitions given for f and g, then {f j,λ} and {g j,λ} are partitions of f and g that enter into the definition of δU(·; e).)

The suggested interpretation is that δU(·; e) represents the "mean" or "uncertainty-free" assessment of acts implicit in utility, as viewed from the perspective of
the act e. It may help to recall that in the theory of expected utility over objective lotteries or risk, if the vNM index is differentiable, then utility is linear to the first order and hence preference is risk neutral for small gambles. The suggested parallel here is that a differentiable utility is additive (rather than linear) and uncertainty neutral (rather than risk neutral) to the "first order."

Before applying eventwise differentiability to the analysis of uncertainty aversion, the next section provides some examples. See Appendix C for some technical aspects of eventwise differentiability, for a stronger form of differentiability (similar to that in Machina, 1992) and for a brief comparison with Rosenmuller (1972), which inspired the definition given earlier.

9.4.2. Examples

Turn to some examples that illustrate both differentiability and uncertainty aversion. All are special cases of the CEU model (9.12), though other examples are readily constructed. Because the discussion of differentiability dealt with functions defined on Σ^X rather than just F, rewrite the CEU functional form here using this larger domain. If the outcomes satisfy x1 ≻ x2 ≻ · · · ≻ xn and the act e has e(xi) = Ei, i = 1, . . . , n, then

U ceu(e) = Σ_{i=1}^{n−1} [u(xi) − u(xi+1)] ν(∪_{j=1}^{i} Ej) + u(xn) ν(∪_{j=1}^{n} Ej).
Suppose that the capacity ν is eventwise differentiable with derivative δν(·; E) at E; naturally, differentiability is in the sense of the last section (with |X| = 1). Then U ceu(·) is eventwise differentiable with derivative

δU(e′; e) = Σ_{i=1}^{n−1} [u(xi) − u(xi+1)] δν(∪_{j=1}^{i} E′j ; ∪_{j=1}^{i} Ej) + u(xn) δν(∪_{j=1}^{n} E′j ; ∪_{j=1}^{n} Ej),        (9.24)

where e′(xi) = E′i. (This follows as in calculus from the additivity property of differentiation.) Because differentiability of utility is determined totally by that of the capacity, it is enough to consider examples of differentiable (and nondifferentiable) capacities. In each case where the capacity is differentiable, (9.24) describes the corresponding derivative of utility.

The CEU case demonstrates clearly that eventwise differentiability is distinct from more familiar notions, such as Gateaux differentiability. It is well known that a CEU utility function is not Gateaux differentiable, even if the vNM index is smooth, unless it is an expected utility function. In contrast, many CEU utility functions are eventwise differentiable, regardless of the nature of u(·).

Verification of the formulae provided for derivatives is possible using the definition (9.23). Alternatively, verification of the stronger µ-differentiability (see Appendix C) is straightforward. (Define µ by (9.C.2) and µ0 = p in the first two
examples, µ0 = q in the third example and µ0 = λ^*/λ^*(S) in the final example, where only "one-sided" derivatives exist.)

Example. (Probability measure) Let p be a convex-ranged probability measure. Then δp(·; E) = p(·), the same measure for all E. Application of (9.24) yields

δU(e′; e) = Σ_{i=1}^{n} u(xi) p(E′i).
Example. (Probabilistic sophistication within CEU ) Let

ν = g(p),        (9.25)

where p is a convex-ranged probability measure and g : [0, 1] −→ [0, 1] is increasing, onto and continuously differentiable. The corresponding utility function lies in the rank-dependent-expected-utility class of functions studied in the case of risk, where p is taken to be objective. (See Chew et al. (1987) and the references therein.) Then δν(·; E) = g′(pE) p(·) and

δU(e′; e) = Σ_{i=1}^{n} [u(xi) − u(xi+1)] g′(p(∪_{j=1}^{i} Ej)) p(∪_{j=1}^{i} E′j),  u(xn+1) ≡ 0.
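A numerical check of this example is straightforward. In the sketch below (the uniform p on [0, 1) and g(t) = t² are hypothetical choices), refining the partition of a perturbing event F drives the sum in (9.23) to zero, at rate 1/k, when the candidate derivative δν(f; E) = g′(pE) p(f) is used.

```python
# nu = g(p), with p the uniform (Lebesgue) measure on [0, 1) so that
# p of an interval is its length; g(t) = t**2 is an arbitrary smooth choice.
g = lambda t: t * t
g_prime = lambda t: 2 * t

p_E = 0.5        # the event E, with p(E) = 0.5
p_F = 0.25       # a disjoint perturbing event F, with p(F) = 0.25

def residual(k):
    """Sum in (9.23) for a partition of F into k p-equal pieces f_j,
    each added to the original E, with delta_nu(f; E) = g'(p(E)) p(f)."""
    pf = p_F / k
    return sum(abs(g(p_E + pf) - g(p_E) - g_prime(p_E) * pf)
               for _ in range(k))

for k in (1, 4, 16, 64, 256):
    print(k, residual(k))    # shrinks like 1/k, consistent with (9.23)
```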
Example. (Quadratic capacity) Let ν(E) = p(E) q(E), where p and q are convex-ranged probability measures with p ≠ q. Then

δν(·; E) = p(E) q(·) + p(·) q(E),

a formula that is reminiscent of standard calculus.17 Direct verification shows that ν is convex. As for uncertainty aversion, if p and q agree on A, then the probability measure on Σ defined by

m(·) = δν(·; A)/δν(S; A) = [q(·) + p(·)]/2

serves as a support in the sense of (9.16). That the implied CEU utility function is uncertainty averse in the full sense of (9.8) may be established by application of Lemma 9.2. Observe that ν = p² = m² on A; thus g(t) = t². Then m lies in the core of (pq)^{1/2}, because [p(·) + q(·)]² ≥ 4 p(·) q(·). The probabilistically sophisticated supporting utility function U ps is

U ps(e) = ∫_S u(e) dm².
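Both computations used in this example, the support property of m = [p(·) + q(·)]/2 and the core inclusion behind it, reduce to the arithmetic-geometric mean inequality, as the following sketch with hypothetical p and q confirms on a three-state space.

```python
from itertools import chain, combinations

S = ["a", "b", "c"]
events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, k) for k in range(4))]

# Hypothetical priors, agreeing only on the trivial events.
p = {"a": 0.5, "b": 0.3, "c": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.5}

def P(meas, E):
    return sum(meas[s] for s in E)

nu = {E: P(p, E) * P(q, E) for E in events}          # quadratic capacity
m = {E: (P(p, E) + P(q, E)) / 2 for E in events}      # candidate support

# m lies in core(sqrt(nu)): (p(E) + q(E))**2 >= 4 p(E) q(E) for every E.
print(all(m[E] ** 2 >= nu[E] - 1e-12 for E in events))

# nu is convex: nu(E | F) + nu(E & F) >= nu(E) + nu(F).
print(all(nu[E | F] + nu[E & F] >= nu[E] + nu[F] - 1e-12
          for E in events for F in events))
```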
Example. (Interval beliefs) Let λ_* and λ^* be two non-negative, convex-ranged measures on (S, Σ) such that λ_*(·) ≤ λ^*(·) and

0 < λ_*(S) < 1 < λ^*(S).

Define ξ = λ^*(S) − 1 and

ν(E) = max{λ_*(E), λ^*(E) − ξ}.        (9.26)

Then ν is a convex capacity on (S, Σ) and has the core

core(ν) = {p ∈ M(S, Σ) : λ_*(·) ≤ p(·) ≤ λ^*(·) on Σ}.

This representation for the core provides intuition for ν and the reason for its name. See Wasserman (1990) for details regarding this capacity and its applications in robust statistics.

Because the capacity is "piecewise additive," one can easily see that though it has "one-sided derivatives," ν is generally not eventwise differentiable at any E such that λ_*(E) = λ^*(E) − ξ. It follows from Theorem 9.3 and the nature of core(ν) that a CEU utility U ceu with capacity ν is uncertainty averse for any class A such that λ_*(·) = λ^*(·) on A\{S}. Because any such class A excludes events that are "close to" S, such an A cannot be rich. In fact, one can show using Lemma 9.2 that it is impossible for U ceu to be uncertainty averse relative to any rich class of unambiguous events, unless λ_*(·)/λ_*(S) = λ^*(·)/λ^*(S) on Σ, in which case U ceu is probabilistically sophisticated. This provides another illustration of the lack of a connection between uncertainty aversion and convexity.

9.4.3. Uncertainty aversion under differentiability

To begin this section, the discussion will be restricted to binary acts; that is, uncertainty aversion will refer to (9.9), or equivalently, to (9.16). Implications are then drawn for uncertainty aversion in the full sense of general acts and (9.8). The relevant derivative is δν(·; E), where νE ≡ U(x1Ex2) and U need not be a CEU function. Assume that νE is increasing in E (with respect to set inclusion). Thus δν(·; E) is a non-negative measure, though not necessarily a probability measure. The suggested interpretation from Section 9.4.1, specialized to this case, is that δν(·; E) represents the "mean" or "uncertainty-free" likelihoods implicit in ν, as viewed from the perspective of the event E. This interpretation is natural given that δν(·; E) is additive over events and hence ordinally equivalent to a probability measure on Σ.

Turn to the relation between differentiability and uncertainty aversion. When ν is differentiable, analogy with calculus might suggest that the support at any event A, in the sense of (9.16), should be unique and given by δν(·; A), perhaps up to a scalar multiple. Though the analogy with calculus is imperfect, it is nevertheless the case that, under additional assumptions, differentiability provides information about the set of supports.
The principal additional assumption may be stated as follows. Define

A0 ≡ {A ∈ A : ν(S) > max{νA, νAc}},

the set of unambiguous events A such that A and its complement are each strictly less likely than S. Say that ν is coherent if there exists a positive real-valued function κ defined on A0 such that

δν(·; A) = κ(A) δν(·; Ac)  on Σ,        (9.27)

for each A in A0. Coherence is satisfied by all the differentiable examples in Section 9.4.2. By the Chain Rule for eventwise differentiability (Theorem 9.C.1), coherence is invariant to suitable monotonic transformations of ν and thus is an assumption about the preference ranking of binary acts.

Coherence is arguably an expression of the unambiguous nature of events in A. To see this, it may help to consider first the following addition to (9.20):

A + F − G ≼ A =⇒ Ac − F + G ≽ Ac.

This is a questionable assumption because the events Ac − F + G and A + F − G are both ambiguous. Therefore, there is no reason to expect the perspective on the change "add F and subtract G" to be similar at Ac as at A. However, if F and G are both "small," then only mean likelihoods matter and it is reasonable that the relative mean likelihoods of F and G be the same from the two perspectives. In fact, such agreement seems to be an expression of the existence of "coherent" ambiguity-free beliefs underlying preference. This condition translates into the following restriction on derivatives:

δν(F; A) ≤ δν(G; A) =⇒ δν(F; Ac) ≤ δν(G; Ac).

By arguments similar to those in the proof of the theorem, this implication delivers (9.27) under the assumptions in part (b). (Observe that the reverse implication follows from (9.20).)

The following result is proven in Appendix A.

Theorem 9.3. Let ν be eventwise differentiable.

(a) If ν is uncertainty averse, then for all A ∈ A, F ⊂ Ac and G ⊂ A,

δν(F; Ac) ≤ δν(G; Ac) =⇒ ν(A + F − G) ≤ ν(A).        (9.28)
(b) Suppose further that Σ is a σ-algebra and that m and each δν(·; A), A ∈ A0, are countably additive, where m is a support in the sense of (9.16). Then for each A in A0,

δν(F; A) m(G) ≤ δν(G; A) m(F)        (9.29)

and

δν(G; Ac) m(F) ≤ δν(F; Ac) m(G).        (9.30)
(c) Suppose further that A0 is nonempty and that ν is coherent. Then the unique countably additive supporting probability measure m is given by m(·) = δν(·; A)/δν(S; A), for any A in A0 .
When division is permitted, the inequalities in (b) imply that

δν(F; A)/δν(G; A) ≤ m(F)/m(G) ≤ δν(F; Ac)/δν(G; Ac),        (9.31)
which suggests an interpretation as an interval bound for the "marginal rate of substitution at any A between F and G."

The relation (9.28) states roughly that, for each A, δν(·; Ac) serves as a support at A. Given our earlier interpretation for the derivative, it states that if the decision-maker would rather bet on A + F − G than on A when ambiguity is ignored and when mean likelihoods are computed from the perspective of Ac, then she would make the same choice also when ambiguity is considered. That is because the former event is more ambiguous and the decision-maker dislikes ambiguity or uncertainty.

Finally, part (c) of the theorem describes conditions under which the parallel with calculus is valid: the (countably additive) supporting measure is unique and given essentially by the derivative of ν. Note that the support property in question here is global in that the same measure "works" at each unambiguous A, and not just at a single given A.18 This explains the need for the coherence assumption, which helps to ensure that δν(·; A)/δν(S; A) is independent of A.

Turn to uncertainty aversion for general nonbinary acts, that is, in the sense of (9.8). Lemma 9.1 characterizes uncertainty aversion for preferences or utility functions, assuming a given supporting measure. Theorem 9.3 delivers the uniqueness of the supporting measure under the stated conditions. Combining these two results produces our most complete characterization of uncertainty aversion.

Theorem 9.4. Let U be a utility function, x1 ≻ x2, ν(E) ≡ U(x1Ex2), and suppose that ν is eventwise differentiable. Suppose further that each δν(·; A), A ∈ A0, is countably additive, A0 is nonempty and ν is coherent. Then (1) implies (2), where:

(1) U is uncertainty averse with countably additive supporting probability measure.
(2) U satisfies conditions (i) and (ii) of Lemma 9.1 with measure m given by

m(·) = δν(·; A)/δν(S; A),  for any A in A0.        (9.32)
Conversely, if δν(·; A) is convex-ranged on A for any A in A0, then (2) implies (1).

The combination of Theorem 9.3 with Lemma 9.2 delivers a comparable result for CEU utility functions. In particular, to verify "more uncertainty averse than some expected utility function" (Lemma 9.2(c)), one need only verify (9.15) for the particular measure m defined in (9.32), a much easier task than computing the complete core of ν.
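The procedure that Theorem 9.4 licenses, recover the unique candidate m from the eventwise derivative via (9.32) and then test it, is mechanical. A sketch for the quadratic capacity of Section 9.4.2, with hypothetical priors p and q that agree on the unambiguous event A1:

```python
from itertools import chain, combinations

S = ["a", "b", "c", "d"]
events = [frozenset(c) for c in chain.from_iterable(
    combinations(S, k) for k in range(5))]

# Hypothetical priors that agree on A1 = {a, b}, so A1 is unambiguous.
p = {"a": 0.3, "b": 0.2, "c": 0.3, "d": 0.2}
q = {"a": 0.2, "b": 0.3, "c": 0.1, "d": 0.4}
P = lambda meas, E: sum(meas[s] for s in E)

nu = {E: P(p, E) * P(q, E) for E in events}          # nu(E) = p(E) q(E)
A1 = frozenset({"a", "b"})
unamb = [frozenset(), frozenset(S), A1, frozenset(S) - A1]

# Eventwise derivative of the quadratic capacity at A1 (Section 9.4.2),
# and the candidate support delivered by (9.32).
d_nu = {E: P(p, A1) * P(q, E) + P(p, E) * P(q, A1) for E in events}
m = {E: d_nu[E] / d_nu[frozenset(S)] for E in events}

# The recovered m equals (p + q)/2, independently of the base event.
print(all(abs(m[E] - (P(p, E) + P(q, E)) / 2) < 1e-12 for E in events))

# Support test (9.16): m(A) >= m(E) implies nu(A) >= nu(E), strictly for >.
ok = all(((m[A] < m[E]) or (nu[A] >= nu[E] - 1e-12)) and
         ((m[A] <= m[E]) or (nu[A] > nu[E]))
         for A in unamb for E in events)
print(ok)
```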
9.5. Concluding remarks

Within the CEU framework, convexity of the capacity has been widely taken to characterize uncertainty aversion. This chapter has questioned the appeal of this characterization and has proposed an alternative. To conclude, consider further the relation between the two definitions and, in particular, the significance of the difference in the domains adopted in Schmeidler (1989) and in this chapter.

Denote by H the set of all finite-ranged (simple) and measurable acts e from (S, Σ) into Δ(X). Then H is the domain of horse-race/roulette-wheel acts used by Anscombe and Aumann. Each such act h involves two stages—in the first, uncertainty is resolved through realization of the horse-race winner s ∈ S, and in the second stage the risk associated with the objective lottery h(s) is resolved. An act h that yields a degenerate lottery h(s) in the second stage for every s can be identified with a Savage act; in other words, F ⊂ H.

Schmeidler assumes that preference ≽ and the representing utility function U are defined on the larger domain H. He calls U uncertainty averse if it is quasiconcave, that is, if

U(e) ≥ U(f) =⇒ U(αe + (1 − α)f) ≥ U(f),        (9.33)
for all α ∈ [0, 1], where the mixture αe + (1 − α)f is defined in the obvious way. The suggested interpretation (p. 119) is that “substituting objective mixing for subjective mixing makes the decision-maker better off.” Within CEU theory, expanded to the domain H, U ceu is uncertainty averse if and only if the corresponding capacity ν is convex. Though formulated and motivated by Schmeidler within the AA framework, the identification of convexity of ν with uncertainty aversion has been widely adopted in many instances where the Savage domain F, rather than H, is the relevant one, that is, where choice behavior over F is the object of study and in which only such behavior is observable to the analyst. The Ellsberg single-urn experiment provides such a setting, but it was shown in Section 9.1.2 that convexity has little to do with intuitively uncertainty averse behavior in that setting. One possible reaction is to suggest that the single-urn experiment is special and that convexity is better suited to Ellsberg’s other principal experiment involving two urns, one ambiguous and the other unambiguous.19 Because behavior in this experiment is also prototypical of the behavior that is to be modeled and because it might be unrealistic to expect a single definition of uncertainty aversion to perform well in all settings, good performance of the convexity definition in this setting might restore its appeal. Moreover, such good performance might be expected because the Cartesian product state space that is natural for modeling the two-urn experiment suggests a connection with the horse-race/roulette-wheel acts in the AA domain. According to this view, the state space for the ambiguous urn “corresponds” to the horse-race stage of the AA acts and the state space for the unambiguous urn “corresponds” to the roulette-wheel component. In fact, the performance of the convexity definition is no better in the two-urn experiment than in the single-urn case. Rather than providing specific examples
of capacities supporting this assertion, it may be more useful to point out why the grounds for optimism described earlier are unsound. In spite of the apparent correspondence between the AA setup and the Savage domain with a Cartesian product state space, these are substantially different specifications because, as pointed out by Sarin and Wakker (1992), only the AA domain involves two-stage acts (the horse-race first and then the roulette-wheel) and in Schmeidler’s formulation of CEU, these are evaluated in an iterative fashion. Eichberger and Kelsey (1996) show that this difference leads to different conclusions about the connection between convexity of the capacity and attitudes toward randomization. For the same reason the difference in domains leads to different conclusions about the connection between convexity of the capacity and attitudes toward uncertainty. In particular, convexity is not closely connected to typical behavior in the two-urn experiment. While the preceding discussion has centered on examples, albeit telling examples, there is a general point that may be worth making explicit. The general point concerns the practice of transferring to the Savage domain notions, such as uncertainty aversion, that have been formulated and motivated in the AA framework. The difference between the decision-maker’s attitude toward the second-stage roulette-wheel risk as opposed to the uncertainty inherent in the first-stage horse-race is the basis for Schmeidler’s definition of uncertainty aversion. The upshot is that uncertainty aversion is not manifested exclusively or primarily through the choice of pure horse-races or acts over S. Frequently, however, it is the latter choice behavior that is of primary interest to the modeller. This is the case, for example, in the Ellsberg experiments discussed earlier and is the reason for the weak (or nonexistent) connection between convexity and intuitive behavior in those experiments. This is not to deny that convexity may be a useful hypothesis even in a Savage framework nor that its interpretation as uncertainty aversion may be warranted where preferences over AA acts are observable, say in laboratory experiments. Accordingly, this is not a criticism of Schmeidler’s definition within his chosen framework. It argues only against the common practice of interpreting convexity as uncertainty aversion outside that framework. (An alternative behavioral interpretation for convexity is provided in Wakker (1996).) I conclude with one last remark on the AA domain. The extension of the Savage domain of acts to the AA domain is useful because the inclusion of second-stage lotteries delivers greater analytical power or simplicity. This is the reason for their inclusion by Anscombe and Aumann—to simplify the derivation of subjective probabilities—as well as in the axiomatizations of the CEU and multiple-priors utility functions in Schmeidler (1989) and Gilboa and Schmeidler (1989), respectively. In all these cases, roulette-wheels are a tool whose purpose is to help in delivering the representation of utility for acts over S. Kreps (1988: 101) writes that this is sensible in a normative application but “is a very dicey and perhaps completely useless procedure in descriptive applications” if only choices between acts over S are observable. Emphasizing and elaborating this point has been the objective of this section.
Appendix A: Proofs

Proof of Lemma 9.1. U ps and U agree on F ua. Therefore, (i) follows from (9.7) and the monotonicity assumed for W. That U ps supports U implies by (9.8) that for all e ∈ F and h ∈ F ua,

W(Ψm,e) ≤ W(Ψm,h) =⇒ U(e) ≤ U(h).

This implies (ii). For the converse, define ≽ps as the order represented numerically by U ps,

U ps(e) = W(Ψm,e),  e ∈ F,

where W : Δ(X) −→ R1 is defined by

W(Ψ) = U(h) for any h ∈ F ua satisfying Ψm,h = Ψ.

Part (i) ensures that W(Ψ) does not depend on the choice of h, making W well-defined. The assumption added for m ensures that this defines W on all of Δ(X). Then U ps supports U.

Proof of Lemma 9.2. (b) U ceu and U ps must agree on F ua, implying that ν and m are ordinally equivalent on A. Because ν is convex-ranged and A is rich, ν(A) = ν(Σ) = [0, 1]. Conclude that m(A) = [0, 1] also. Thus (9.14) is proven. Lemma 9.1(ii) implies that for all acts e and unambiguous acts h,
U ceu(e) = Σ_{i=1}^{n−1} [u(xi) − u(xi+1)] ν(∪_{j=1}^{i} e(xj)) + u(xn)
         ≤ Σ_{i=1}^{n−1} [u(xi) − u(xi+1)] ν(∪_{j=1}^{i} h(xj)) + u(xn) = U ceu(h)
         = Σ_{i=1}^{n−1} [u(xi) − u(xi+1)] g ◦ m(∪_{j=1}^{i} h(xj)) + u(xn)
if m(e(xj)) = m(h(xj)) for all j. Because this inequality obtains for all u(x1) > · · · > u(xn), and these utility levels can be varied over an open set containing some point (u(x), . . . , u(x)), it follows that

g(m(∪_{j=1}^{i} e(xj))) = g(m(∪_{j=1}^{i} h(xj))) ≥ ν(∪_{j=1}^{i} e(xj)),

for all e and h as stated earlier. Given E ∈ Σ, let e(x1) = E and e(x2) = Ec, x1 ≻ x2. There exists unambiguous A such that mE = mA. Let h(x1) = A and h(x2) = Ac. Then g(m(E)) ≥ ν(E) follows, proving (9.13). The sufficiency portion (a) can be proven by suitably reversing the preceding argument.
Proof of Theorem 9.1. The following lemma is of independent interest because of the special significance of bets as a subclass of all acts. Notation from Section 9.3.4 is used below.

Lemma 9.A.1. Suppose that A is rich, with outcomes x^* and x_* as in the definition of richness. Let ν(E) ≡ U(x^*Ex_*). Then the conjunction of (9.16) and (9.17) implies that ν is ordinally equivalent to a probability measure on Σ (or equivalently, ν satisfies (9.25)). A fortiori, the conclusion is valid if ≽ is both uncertainty averse and uncertainty loving.

Proof. Let m and q be the hypothesized supports. Their defining properties imply that

mF ≤ mG =⇒ qF ≤ qG,

for all A ∈ A, F ⊂ Ac and G ⊂ A. But if this relation is applied to Ac in place of A, noting that Ac ∈ A, then the roles of F and G are reversed and one obtains

mF ≥ mG =⇒ qF ≥ qG.

In other words,

mF ≤ mG ⇐⇒ qF ≤ qG,

for all A ∈ A, F ⊂ Ac and G ⊂ A. Conclude from (9.16) and (9.17) that

mF ≤ mG ⇐⇒ ν(A + F − G) ≤ νA

for all A ∈ A, F ⊂ Ac and G ⊂ A; or equivalently, that for all A ∈ A,

mE ≤ mA ⇐⇒ νE ≤ νA.

In other words, every indifference curve for ν containing some unambiguous event is also an indifference curve for m. The stated hypothesis regarding A ensures that every indifference curve contains some unambiguous A and therefore that ν and m are ordinally equivalent on all of Σ.
Complete the proof of Theorem 9.1. Denote by ≽ps and ≽ps∗ the probabilistically sophisticated preference orders supporting ≽ in the sense of (9.8) and (9.9), respectively, and having underlying probability measures m and q defined on Σ. From the proof of the lemma, m and q are ordinally equivalent on Σ.

Claim. For each act e, there exists h ∈ F ua such that

e ∼ps h  and  e ∼ps∗ h.
To see this, let e = ((xi, Ei)ni=1). By the richness of A, there exists an unambiguous event H1 such that x^*H1x_* ∼ x^*E1x_*, or, in the notation of the lemma, ν(H1) = ν(E1). Because ν and m are ordinally equivalent, m(H1) = m(E1) and thus also m(H1c) = m(E1c) and ν(H1c) = ν(E1c). Thus one can apply richness again to find a suitable unambiguous subset H2 of H1c. Proceeding in this way, one constructs an unambiguous act h = ((xi, Hi)ni=1) such that

ν(Hi) = ν(Ei)  and  m(Hi) = m(Ei)  for all i.

By the ordinal equivalence of m and q,

q(Hi) = q(Ei),  all i.
The claim now follows immediately from the nature of probabilistic sophistication.

From (9.8), ≽ and ≽ps agree on F ua. Similarly, ≽ and ≽ps∗ agree on F ua. Therefore, ≽ps and ≽ps∗ agree there. From the claim, it follows that they agree on the complete set of acts F. The support properties (9.8) and (9.9) thus imply that

h ≽ps e ⇐⇒ h ≽ e,  for all h ∈ F ua and e ∈ F.
In particular, every indifference curve for ≽ps containing some unambiguous act is also an indifference curve for ≽. But the qualification can be dropped because of the claim. It follows that ≽ and ≽ps coincide on F.

Proof of Theorem 9.3. (a) Let m satisfy (9.16) at A. Show first that

mF ≤ mG =⇒ δν(F; A) ≤ δν(G; A),        (9.A.1)
for all F ⊂ Ac and G ⊂ A. Fix ε > 0 and let λ0 be such that the expression defining δν(·; A) is less than ε whenever λ > λ0. By Lemma 9.B.1, there exist partitions {F j,λ} and {Gj,λ} of F and G such that

mF j,λ ≤ mGj,λ,  j = 1, . . . , nλ,

and λ > λ0, hence

Σ_{j=1}^{nλ} | [ν(A) − ν(A + F j,λ − Gj,λ)] − [δν(Gj,λ; A) − δν(F j,λ; A)] | < ε.

Because m is a support, ν(A + F j,λ − Gj,λ) ≤ ν(A). Thus20

δν(G; A) − δν(F; A) = Σ_{j=1}^{nλ} [δν(Gj,λ; A) − δν(F j,λ; A)] > −ε.

However, ε is arbitrary. This proves (9.A.1).
Replace A by Ac, in which case F and G reverse roles, and deduce that

mF ≥ mG =⇒ δν(F; Ac) ≥ δν(G; Ac),

or equivalently,

δν(F; Ac) ≤ δν(G; Ac) =⇒ mF ≤ mG.        (9.A.2)

Because m is a support, this yields (9.28).

(b) Let A ∈ A satisfy

S ≻ A  and  S ≻ Ac.        (9.A.3)
Claim 1. δν(Ac; A) > 0. If it equals zero, then δν(Ac; A) = δν(∅; A) implies, by (9.28), that A + Ac ≼ A, or S ∼ A, contrary to (9.A.3).

Claim 2. mAc > 0. If not, then mS ≤ mA = 1 and (9.16) implies that S ∼ A, contrary to (9.A.3).

Claim 3. δν(A; Ac) > 0 and mA > 0. Replace A by Ac above.

Claim 4. δν(Ac; Ac) > 0. If it equals zero, then δν(A; Ac) mAc = 0 by (9.29), contradicting Claim 3.

Claim 5. For any G ⊂ A, δν(G; A) = 0 =⇒ mG = 0. Let F = Ac. By Claim 1, δν(F; A) > 0. Therefore, Lemma 9.B.1 implies that for every λ0 there exists λ > λ0 with δν(F j,λ; A) > 0 = δν(G; A) for all j. By (9.A.1), m(F j,λ) > m(G) for all j, and thus also mF > Σ_{j=1}^{nλ} mG = nλ mG. This implies mG = 0.

Claim 6. For any F ⊂ Ac, mF = 0 =⇒ δν(F; A) = 0. mF = 0 =⇒ (by (9.A.1)) δν(F; A) ≤ δν(G; A) for all G ⊂ A. Claim 4 implies δν(G; A) > 0 if G = A. Therefore, δν(·; A) convex-ranged implies (Lemma 9.B.1) that δν(F; A) = 0.

Claim 7. m is convex-ranged. By Claim 5, m is absolutely continuous with respect to δν(·; A) on A. The latter measure is convex-ranged. Therefore, m has no atoms in A. Replace A by Ac and use the convex range of δν(·; Ac) to deduce in a similar fashion that m has no atoms in Ac. Thus m is non-atomic. Because it is also countably additive by hypothesis, conclude that it is convex-ranged (Rao and Rao, 1983: Theorem 5.1.6).

Turn to (9.29); (9.30) may be proven similarly. Define the measures µ and p on Ac × A as follows:

µ = m ⊗ δν(·; A),  p = δν(·; A) ⊗ m.

Claims 5 and 6 prove that p ≪ µ. Denote by h ≡ dp/dµ the Radon-Nikodym density. (Countable additivity is used here.)
Claim 8. µ{(s, t) ∈ Ac × A : h(s, t) > 1} = 0. If not, then there exist F0 ⊂ Ac and G0 ⊂ A, with µ(F0 × G0) > 0, such that h > 1 on F0 × G0.

Case 1. mF0 = mG0. Integration delivers that ∫_{F0×G0} [h(s, t) − 1] dµ > 0, implying

δν(F0; A) mG0 − mF0 δν(G0; A) > 0.

Consequently, mF0 = mG0 and δν(F0; A) > δν(G0; A), contradicting (9.A.1).

Case 2. mF0 < mG0. Because m is convex-ranged (Claim 7), there exists G1 ⊂ G0 such that mG1 = mF0 and µ(F0 × G1) > 0. Thus the argument in Case 1 can be applied.

Case 3. mF0 > mG0. Similar to Case 2.

This proves Claim 8. Finally, for any F ⊂ Ac and G ⊂ A,

δν(F; A) (mG) − (mF) δν(G; A) = ∫_{F×G} (h − 1) dµ ≤ 0,

proving (9.29).

(c) Though at first glance the proof may seem obvious given (9.31), some needed details are provided here. Let A ∈ A0. Multiply through (9.29) by δν(G; Ac) to obtain that
δν(F; A) δν(G; Ac) mG ≤ δν(G; A) δν(G; Ac) mF,

for all F ⊂ Ac and G ⊂ A. Similarly, multiplying through (9.30) by δν(G; A) yields

δν(G; A) δν(G; Ac) mF ≤ δν(G; A) δν(F; Ac) mG,

for all such F and G. Conclude from coherence that

δν(G; A) δν(G; Ac) mF = δν(G; A) δν(F; Ac) mG,        (9.A.4)

for all F ⊂ Ac and G ⊂ A. Take G = A in (9.A.4) to deduce

δν(F; Ac) = δν(A; Ac) m(F)/m(A),  for all F ⊂ Ac.        (9.A.5)

Next take F = Ac in (9.A.4). If δν(G; A) > 0, then

δν(G; Ac) = δν(Ac; Ac) m(G)/m(Ac),  for all G ⊂ A.        (9.A.6)

This equation is true also if δν(G; A) = 0, because then (9.29), with F = Ac, implies δν(Ac; A) m(G) = 0, which implies mG = 0 by Claim 1.
Substitute the expressions for δν(F; Ac) and δν(G; Ac) into (9.A.4) and set F = Ac and G = A to derive

δν(Ac; Ac)/m(Ac) = δν(A; Ac)/m(A) ≡ α(A) > 0.

Thus

δν(·; Ac) = α(A) m(·) on Σ ∩ Ac  and  δν(·; Ac) = α(A) m(·) on Σ ∩ A.

By additivity, it follows that δν(·; Ac) = α(A) m(·) on all of Σ. Thus δν(·; A) = κ(A) α(A) m(·), completing the proof.
Appendix B: Additive functions on Σ^X

Some details are provided for such functions, as defined in Section 9.4.1. For any additive µ, µ(∅) = 0 and

µ(e) = Σx µx(e(x)),        (9.B.1)

where µx is the marginal measure on Σ defined by µx(E) = the µ-measure of the act that assigns E to the outcome x and the empty set to every other outcome. Apply to each marginal the standard notions and results for finitely additive measures on an algebra (see Rao and Rao, 1983). In this way, one obtains a decomposition of µ, µ = µ+ − µ−, where µ+ and µ− are non-negative measures. Define |µ| = µ+ + µ−.

Say that the measure µ is bounded if

sup_f |µ|(f) = sup { Σ_{j=1}^{nλ} |µ(f j,λ)| : f ∈ Σ^X, λ } < ∞.        (9.B.2)

Call the measure µ on Σ^X convex-ranged if for every e and r ∈ (0, |µ|(e)), there exists b, b ⊂ e, such that |µ|(b) = r, where e and b are elements of Σ^X.

Lemma 9.B.1 summarizes some useful properties of convex-ranged measures on Σ^X. See Rao and Rao (1983: 142-3) for comparable results for measures on an algebra; there, property (b) is referred to as strong continuity.

Lemma 9.B.1. Let µ be a measure on Σ^X. Then the following statements are equivalent:

(a) µ is convex-ranged.
(b) For any act f, with corresponding net of all finite partitions {f j,λ}j=1,...,nλ, and for any ε > 0, there exists λ0 such that

λ > λ0 =⇒ |µ|(f j,λ) < ε,  for j = 1, . . . , nλ.

(c) For any acts f, g, and h ≡ f + g with µ(f) > µ(g), and for any ε > 0, there exists a partition {hj,λ}j=1,...,nλ of h such that µ(hj,λ) < ε and µ(hj,λ ∩ f) > µ(hj,λ ∩ g), j = 1, . . . , nλ.
Appendix C: Differentiability

This appendix elaborates on mathematical aspects of the definition of eventwise differentiability. Then it describes a stronger differentiability notion.

The requirement of convex range for δΦ(·; e) is not needed everywhere, but is built into the definition for ease of exposition. Though I use the term derivative, δΦ(·; e) is actually the counterpart of a differential. The need for a signed measure arises from the absence of any monotonicity assumptions. If Φ(·) is monotone with respect to inclusion ⊂, then each δΦ(·; e) is a non-negative measure.

The limiting condition (9.23) may seem unusual because it does not involve a difference quotient. It may be comforting, therefore, to observe that a comparable condition can be identified in calculus: for a function ϕ : R1 −→ R1 that is differentiable at some x in the usual sense, elementary algebraic manipulation of the definition of the derivative ϕ′(x) yields the following expression paralleling (9.23):

Σ_{i=1}^{N} [ϕ(x + N⁻¹) − ϕ(x) − N⁻¹ ϕ′(x)] −→_{N→∞} 0.
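A quick numerical check of this calculus analogue (ϕ(x) = x³ at x = 1 is an arbitrary smooth choice): the sum of N identical errors, each of order N⁻², vanishes like N⁻¹.

```python
phi = lambda x: x ** 3
dphi = lambda x: 3 * x ** 2
x = 1.0

for N in (10, 100, 1000, 10000):
    # N identical terms, each an error of order 1/N**2, so the sum is O(1/N).
    print(N, N * (phi(x + 1 / N) - phi(x) - dphi(x) / N))
```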
Further clarification is afforded by comparison with Gateaux differentiability. Roughly speaking, eventwise differentiability at e states that the difference Φ(e + f − g) − Φ(e) can be approximated by δΦ(f; e) − δΦ(g; e) for suitably "small" f and g, where the small size of the perturbation "f − g" is in the sense of the fineness of the partitions as λ grows. Naturally, it is important that the approximating functional δΦ(·; e) is additive (a signed measure). There is an apparent parallel with Gateaux (directional) differentiability of functions defined on a linear space—"f − g" represents the "direction" of perturbation and the additive approximation replaces the usual linear one. Note that the perturbation from e to e + f − g is perfectly general; any e′ can be expressed (uniquely) in the form e′ = e + f − g, with f ⊂ ec and g ⊂ e (see (9.18)).

A natural question is "how restrictive is the assumption of eventwise differentiability?" In this connection, the reader may have noted that the definition is formulated for an arbitrary state space S and algebra Σ. However, eventwise differentiability is potentially interesting only in cases where these are both infinite. That is because if Σ is finite, then Φ is differentiable if and only if it is additive.
Another question concerns the uniqueness of the derivative. The limiting condition (9.23) has at most one solution; that is, the derivative is unique if it exists: if p and q are two measures on Σ^X satisfying the limiting property, then for each g ⊂ e,

|p(g) − q(g)| ≤ Σ_{j=1}^{nλ} |p(g j,λ) − q(g j,λ)| −→λ 0.

Therefore, p(g) = q(g) for all g ⊂ e. Similarly, prove equality for all f ⊂ ec and then apply additivity.

Next I describe a Chain Rule for eventwise differentiability.

Theorem 9.C.1. Let Φ : Σ^X −→ R1 be eventwise differentiable at e and ϕ : Φ(Σ^X) −→ R1 be strictly increasing and continuously differentiable. Then ϕ ◦ Φ is eventwise differentiable at e and

δ(ϕ ◦ Φ)(·; e) = ϕ′(Φ(e)) δΦ(·; e).

Proof. Consider the sum whose convergence defines the eventwise derivative of ϕ ◦ Φ. By the Mean Value Theorem,

ϕ ◦ Φ(e + f j,λ − g j,λ) − ϕ ◦ Φ(e) = ϕ′(zj,λ) [Φ(e + f j,λ − g j,λ) − Φ(e)]

for suitable real numbers zj,λ. Therefore, it suffices to prove that

Σ_{j=1}^{nλ} | Φ(e + f j,λ − g j,λ) − Φ(e) | · | ϕ′(zj,λ) − ϕ′(Φ(e)) | −→λ 0.
By the continuity of ϕ′, the second term converges to zero uniformly in j. Eventwise differentiability of Φ implies that, given ε, there exists λ0 such that λ > λ0 implies

Σ_{j=1}^{nλ} | Φ(e + f j,λ − g j,λ) − Φ(e) |
  ≤ ε + Σ_{j=1}^{nλ} | δΦ(f j,λ; e) − δΦ(g j,λ; e) |
  ≤ ε + Σ_{j=1}^{nλ} [ | δΦ(f j,λ; e) | + | δΦ(g j,λ; e) | ]
j =1
≤ K, for some K < ∞ that is independent of λ, f and g, as provided by the boundedness of the measure δ(·; e). Eventwise differentiability is inspired by Rosenmuller’s (1972) notion, but there are differences. Rosenmuller deals with convex capacities defined on , rather than with utility functions defined on acts. Even within that framework, his formulation differs from (9.23) and relies on the assumed convexity. Moreover, he restricts attention to “one-sided” derivatives, that is, where the inner perturbation g is identically empty (producing an outer derivative), or where the outer perturbation
f is identically empty (producing an inner derivative). Finally, Rosenmuller's application is to cooperative game theory rather than to decision theory.

A strengthening of eventwise differentiability, called µ-differentiability, is described here. The stronger notion is more easily interpreted, thus casting further light on eventwise differentiability, and it delivers a form of the Fundamental Theorem of Calculus. Machina (1992) introduces a very similar notion. Because it is new and still unfamiliar, and because our formulation is somewhat different and arguably more transparent, a detailed description seems in order.21

To proceed, adopt as another primitive a non-negative, bounded and convex-ranged measure µ on 𝒳. This measure serves the "technical role" of determining the distance between acts. To be precise, if e and e′ are identified whenever µ(e △ e′) = 0, then

d(e, e′) = µ(e △ e′)  (9.C.1)

defines a metric on 𝒳; the assumption of convex range renders the metric space path-connected (by Volkmer and Weber, 1983; see also Landers, 1973: Lemma 4). One way in which such a measure can arise is from a convex-ranged probability measure µ0 on Σ. Given µ0, define µ by

µ(e) = Σ_{x∈X} µ0(e(x)).  (9.C.2)
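To make the distance (9.C.1)-(9.C.2) concrete, here is a minimal finite-state sketch; the names (mu0, act_distance) and the three-state example are assumptions of this illustration, not objects from the text:

```python
# Acts are modelled as sets of (state, outcome) pairs; mu0 is a measure on
# states, and mu is built from it as in (9.C.2).
S = ["s1", "s2", "s3"]
mu0 = {"s1": 0.2, "s2": 0.3, "s3": 0.5}

def mu(e):
    # (9.C.2): mu(e) = sum over outcomes x of mu0(e(x)), where
    # e(x) = {s : (s, x) in e}; with finitely many states this is just
    # the sum of mu0 over the pairs in e.
    return sum(mu0[s] for (s, _) in e)

def act_distance(e1, e2):
    # (9.C.1): d(e, e') = mu(e symmetric-difference e')
    return mu(e1 ^ e2)

e1 = frozenset((s, "x") for s in S)                      # pays x everywhere
e2 = frozenset({("s1", "x"), ("s2", "x"), ("s3", "y")})  # pays y on s3
print(act_distance(e1, e2))  # 1.0: (s3, x) and (s3, y) both differ
```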
Once again let Φ : 𝒳 → R1. Because acts e and e′ are identified when µ(e △ e′) = 0, Φ is assumed to satisfy the condition

µ(e △ e′) = 0 =⇒ Φ(e ∪ f) = Φ(e′ ∪ f), for all f.  (9.C.3)

In particular, acts of µ-measure 0 are assumed to be "null" with respect to Φ.

Definition 9.C.1. Φ is µ-differentiable at e ∈ 𝒳 if there exists a bounded and convex-ranged measure δ(·; e) on 𝒳, such that for all f ⊂ e^c and g ⊂ e,

| Φ(e + f − g) − Φ(e) − δ(f; e) + δ(g; e) | / µ(f + g) → 0  (9.C.4)

as µ(f + g) → 0.

The presence of a "difference quotient" makes the definition more familiar in appearance and permits an obvious interpretation. Think in particular of the case (|X| = 1) where the domain of Φ is Σ. It is easy to see that δ(·; e) is absolutely continuous with respect to µ for each e. (Use additivity of the derivative and (9.C.3).) Eventwise and µ-derivatives have not been distinguished notationally because they coincide whenever both exist.

Lemma 9.C.1. If Φ is µ-differentiable at some e in 𝒳, then Φ is also eventwise differentiable at e and the two derivatives coincide.
Proof. Let δ(·; e) be the µ-derivative at e, f ⊂ e^c and g ⊂ e. Given ε > 0, there exists (by µ-differentiability) ε′ > 0 such that

| Φ(e + f′ − g′) − Φ(e) − δ(f′; e) + δ(g′; e) | < ε µ(f′ + g′),  (9.C.5)

if µ(f′ + g′) < ε′. By Lemma 9.B.1 applied to the convex-ranged µ, there exists λ0 such that

µ(f^{j,λ} + g^{j,λ}) < ε′, for all λ > λ0.

Therefore, one can apply (9.C.5) to the acts (f′, g′) = (f^{j,λ}, g^{j,λ}). Deduce that

Σ_{j=1}^{n_λ} | Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e) − δ(f^{j,λ}; e) + δ(g^{j,λ}; e) |
< ε Σ_{j=1}^{n_λ} µ(f^{j,λ} + g^{j,λ}) = ε µ(f + g) ≤ ε sup µ(·).
A consequence is that the µ-derivative of Φ is independent of µ; that is, if µ1 and µ2 are two measures satisfying the conditions in the lemma, then they imply the identical derivatives for Φ. This follows from the uniqueness of the eventwise derivative noted earlier. Such invariance is important in light of the exogenous and ad hoc nature of µ. This result is evident because of the deeper perspective afforded by the notion of eventwise differentiability and reflects its superiority over the notion of µ-differentiability.

Finally, under a slight strengthening of µ-differentiability, one can "integrate" back to Φ from its derivatives. That is, a form of the Fundamental Theorem of Calculus is valid.

Lemma 9.C.2. Let Φ be µ-differentiable and suppose that the convergence in (9.C.4) is uniform in e. For every ε > 0, f ⊂ e^c and g ⊂ e, there exist finite partitions f = Σ_j f^j and g = Σ_j g^j such that

ε > | Φ(e + f − g) − Φ(e) − Σ_i δ(f^i; e + F^{i−1} − G^{i−1}) + Σ_i δ(g^i; e + F^{i−1} − G^{i−1}) |,  (9.C.6)

where F^i = Σ_{j=1}^{i} f^j and G^i = Σ_{j=1}^{i} g^j.
Proof. µ-differentiability and the indicated uniform convergence imply that

| Φ(e + F^{i−1} − G^{i−1} + f^i − g^i) − Φ(e + F^{i−1} − G^{i−1}) − δ(f^i; e + F^{i−1} − G^{i−1}) + δ(g^i; e + F^{i−1} − G^{i−1}) | < ε µ(f^i + g^i),

for any partitions {f^j} and {g^j} such that µ(f^j + g^j) is sufficiently small for all j. But the latter can be ensured by taking the partitions {f^{j,λ}} and {g^{j,λ}} for λ sufficiently large. The convex range assumption for µ enters here; use Lemma 9.B.1.
Therefore, the triangle inequality delivers

| Φ(e + f − g) − Φ(e) − Σ_i δ(f^i; e + F^{i−1} − G^{i−1}) + Σ_i δ(g^i; e + F^{i−1} − G^{i−1}) | ≤ ε Σ_i µ(f^i + g^i) = ε µ(f + g).
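To see the telescoping logic of (9.C.6) at work, the following toy computation takes |X| = 1 (so that acts are simply events), a uniform µ on a large finite S, and Φ(e) = φ(µ(e)) for a smooth φ, whose µ-derivative is δ(h; e) = φ′(µ(e))µ(h). All names are illustrative, and the finite setting serves only to make the sum computable (eventwise differentiability proper is interesting only for infinite S and Σ):

```python
# Integrate back to Phi from its mu-derivatives along ever finer partitions.
import math

w = 1.0 / 1000                                # uniform measure on 1000 states
mu = lambda e: w * len(list(e))

phi, dphi = math.exp, math.exp                # Phi(e) = exp(mu(e))
Phi = lambda e: phi(mu(e))
delta = lambda h, e: dphi(mu(e)) * mu(h)      # mu-derivative at e

e = set(range(0, 500))                        # base act
f = list(range(500, 900))                     # outer perturbation, in e^c
g = list(range(0, 300))                       # inner perturbation, in e

def chunks(lst, k):                           # partition lst into k pieces
    n = len(lst) // k
    return [lst[i:i + n] for i in range(0, len(lst), n)]

exact = Phi((e | set(f)) - set(g)) - Phi(e)
for k in (1, 10, 100):
    cur, total = set(e), 0.0
    for fi, gi in zip(chunks(f, k), chunks(g, k)):
        total += delta(fi, cur) - delta(gi, cur)
        cur = (cur | set(fi)) - set(gi)       # e + F^i - G^i
    print(k, abs(exact - total))
# The gap shrinks as the partitions get finer, as (9.C.6) asserts.
```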
Acknowledgments An earlier version of this chapter was circulated in July 1997 under the title “Uncertainty Aversion.” The financial support of the Social Sciences and Humanities Research Council of Canada and the hospitality of the Hong Kong University of Science and Technology are gratefully acknowledged. I have also benefitted from discussions with Paolo Ghirardato, Jiankang Zhang and especially Kin Chung Lo, Massimo Marinacci and Uzi Segal, and from comments by audiences at HKUST and the Chantilly Workshop on Decision Theory, June 1997. The suggestions of an editor and referee led to an improved exposition.
Notes
1 After a version of this chapter was completed, I learned of a revision of Machina (1992), dated 1997, that is even more closely related.
2 Zhang (1997) is the first paper to propose a definition of ambiguity that is derived from preference, but his definition is problematic. An improved definition is the subject of current research by this author and Zhang.
3 See Section 9.3.2 for the definition of Choquet integration.
4 As explained in Section 9.5, the examples to follow raise questions about the widespread use that has been made of Schmeidler's definition rather than about the definition itself. Section 9.3.4 describes the performance of this chapter's definition of uncertainty aversion in the Ellsbergian setting.
5 In terms of acts, {R} ≻ {B} means 1_R ≻ 1_B and so on. For CEU, a decision-maker always prefers to bet on the event having the larger capacity.
6 p∗ is an inner measure, as defined and discussed further in Section 9.3.3.
7 Write y ≥ x if receiving outcome y with probability 1 is weakly preferable, according to U^{ps}, to receiving x for sure. A measure m′ first-order stochastically dominates m if for all outcomes y, m′({x ∈ X : y ≥ x}) ≤ m({x ∈ X : y ≥ x}). Thus the partial order depends on the utility function U^{ps}, but that causes no difficulties. See Machina and Schmeidler (1992) for further details.
8 Subjective expected utility is the special case of (9.7) with W(m) = ∫_X u(x) dm(x). But more general risk preferences W are admitted, subject only to the noted monotonicity restriction. In particular, probabilistically sophisticated preference can rationalize behavior such as that exhibited in the Allais Paradox. It follows that uncertainty aversion, as defined shortly, is concerned with Ellsberg-type, and not Allais-type, behavior.
9 It merits emphasis that richness of A is needed only for some results stated later; for example, for the necessity parts of Theorem 9.1 and Lemma 9.2. Richness is not used to describe conditions that are sufficient for uncertainty aversion (or neutrality). In particular, the approach and definitions of this chapter are potentially useful even if A = {∅, S}.
10 See Rao and Rao (1983). Given countable additivity, convex-ranged is equivalent to non-atomicity.
11 Massimo Marinacci provided the following example: Let S be the set of integers {1, . . . , 6} and A the λ-system {∅, S} ∪ {A_i, A_i^c : 1 ≤ i ≤ 3}, where A1 = {1, 2, 3},
A2 = {3, 4, 5} and A3 = {1, 5, 6}. Define p on A as the unique probability measure satisfying p(A_i) = 1/6 for all i. If p has an extension to the power set, then p(∪_i A_i) = 1 > 1/2 = Σ_i p(A_i). However, the reverse inequality must obtain for any probability measure.
12 This condition is necessary for uncertainty aversion but not sufficient, even if there are only two possible outcomes. That is because by taking h in (9.8) to be a constant act, one concludes that an uncertainty averse order ≽ assigns a lower certainty equivalent to any act than does the supporting order ≽^{ps}. In contrast, (9.16) contains information only on the ranking of bets and not on their certainty equivalents. (I am assuming here that certainty equivalents exist.)
13 These ordinal properties are independent of the particular pair of outcomes satisfying x1 ≻ x2 if (and only if) ≽ satisfies Savage's axiom P4: For any events A and B and outcomes x1 ≻ x2 and y1 ≻ y2, x1Ax2 ≽ x1Bx2 implies that y1Ay2 ≽ y1By2.
14 Alternatively, we could show that the rankings in (9.3) are inconsistent with the implication (9.17) of uncertainty loving.
15 A slight strengthening of (9.19) is valid. Suppose that A − F^i + G^i ≽ A for all i, for some partitions F = Σ_i F^i and G = Σ_i G^i. (Only the trivial partitions were admitted earlier.) Then additivity of the supporting measure implies as above that mF ≤ mG and hence that A + F − G ≼ A.
16 In particular, 𝒳 is not the product algebra on S × X induced by Σ. However, 𝒳 is a ring, that is, it is closed with respect to unions and differences.
17 More generally, a counterpart of the usual product rule of differentiation is valid for eventwise differentiation.
18 Even given (9.27), the supporting measure at a given single A is not unique, contrary to the intuition suggested by calculus. If the support property "mF ≤ mG =⇒ ν(A + F − G) ≤ νA" is satisfied by m, then it is also satisfied by any m′ satisfying m(·) ≤ m′(·) on Σ ∩ A^c and m(·) ≥ m′(·) on Σ ∩ A. For example, let m′ be the conditional of m given A^c.
19 Each urn contains 100 balls that are either red or blue. For the ambiguous urn this is all the information provided. For the unambiguous urn, the decision-maker is told that there are 50 balls of each color. The choice problem is whether to bet on drawing a red (or blue) ball from the ambiguous urn versus the unambiguous one.
20 Given |x_j − y_j| < ε and y_j ≥ 0 for all j, then x_j ≥ −|x_j − y_j| + y_j ≥ −|x_j − y_j|, implying that x_j > −ε.
21 As mentioned earlier, after a version of this chapter was completed, I learned of a revision of Machina (1992), dated 1997, in which Machina provides a formulation very similar to that provided in this subsection. The connection with the more general "partitions-based" notion of eventwise differentiability, inspired by Rosenmuller (1972), is not observed by Machina.
References
F. Anscombe and R.J. Aumann (1963) "A Definition of Subjective Probability," Ann. Math. Stat., 34, 199–205.
P. Billingsley (1986) Probability and Measure, John Wiley.
S.H. Chew, E. Karni and Z. Safra (1987) "Risk Aversion in the Theory of Expected Utility with Rank Dependent Probabilities," J. Econ. Theory, 42, 370–381.
S.H. Chew and M.H. Mao (1995) "A Schur Concave Characterization of Risk Aversion for Non-Expected Utility Preferences," J. Econ. Theory, 67, 402–435.
J. Eichberger and D. Kelsey (1996) "Uncertainty Aversion and Preference for Randomization," J. Econ. Theory, 71, 31–43.
L.G. Epstein and T. Wang (1995) "Uncertainty, Risk-Neutral Measures and Security Price Booms and Crashes," J. Econ. Theory, 67, 40–82.
I. Gilboa (1987) "Expected Utility with Purely Subjective Non-Additive Probabilities," J. Math. Econ., 16, 65–88.
I. Gilboa and D. Schmeidler (1989) "Maxmin Expected Utility With Nonunique Prior," J. Math. Econ., 18, 141–153. (Reprinted as Chapter 6 in this volume.)
P.R. Halmos (1974) Measure Theory, Springer-Verlag.
E. Karni (1983) "Risk Aversion for State-Dependent Utility Functions: Measurement and Applications," Int. Ec. Rev., 24, 637–647.
D.M. Kreps (1988) Notes on the Theory of Choice, Westview.
D. Landers (1973) "Connectedness Properties of the Range of Vector and Semimeasures," Manuscripta Math., 9, 105–112.
M. Machina (1982) "Expected Utility Analysis Without the Independence Axiom," Econometrica, 50, 277–323.
M. Machina (1992) "Local Probabilistic Sophistication," mimeo.
M. Machina and D. Schmeidler (1992) "A More Robust Definition of Subjective Probability," Econometrica, 60, 745–780.
K.P. Rao and M.B. Rao (1983) Theory of Charges, Academic Press.
J. Rosenmuller (1972) "Some Properties of Convex Set Functions, Part II," Methods of Oper. Research, 17, 277–307.
R. Sarin and P. Wakker (1992) "A Simple Axiomatization of Nonadditive Expected Utility," Econometrica, 60, 1255–1272. (Reprinted as Chapter 7 in this volume.)
L. Savage (1954) The Foundations of Statistics, John Wiley.
D. Schmeidler (1972) "Cores of Exact Games," J. Math. Anal. and Appl., 40, 214–225.
D. Schmeidler (1986) "Integral Representation without Additivity," Proc. Amer. Math. Soc., 97, 255–261.
D. Schmeidler (1989) "Subjective Probability and Expected Utility Without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
G. Shafer (1979) "Allocations of Probability," Ann. Prob., 7, 827–839.
H. Volkmer and H. Weber (1983) "Der Wertebereich atomloser Inhalte," Archiv der Mathematik, 40, 464–474.
P. Wakker (1996) "Preference Conditions for Convex and Concave Capacities in Choquet Expected Utility," mimeo.
L. Wasserman (1990) "Bayes' Theorem for Choquet Capacities," Ann. Stat., 18, 1328–1339.
M. Yaari (1969) "Some Remarks on Measures of Risk Aversion and on Their Uses," J. Econ. Theory, 1, 315–329.
J. Zhang (1997) "Subjective Ambiguity, Probability and Capacity," U. Toronto, mimeo.
10 Ambiguity made precise
A comparative foundation
Paolo Ghirardato and Massimo Marinacci
10.1. Introduction
In this chapter we propose and characterize a formal definition of ambiguity aversion for a class of preference models which encompasses the most popular models developed to allow ambiguity attitude in decision making. Using this notion, we define and characterize ambiguity of events for ambiguity averse or loving preferences. Our analysis is based on a fully "subjective" framework with no extraneous devices (like a roulette wheel, or a rich set of exogenously "unambiguous" events). This yields a definition that can be fruitfully used with any preference in the mentioned class, though it imposes a limitation in the definition's ability to distinguish "real" ambiguity aversion from other behavioral traits that have been observed experimentally.

The subjective expected utility (SEU) theory of decision making under uncertainty of Savage (1954) is firmly established as the choice-theoretic underpinning of modern economic theory. However, such success has well-known costs: SEU's simple and powerful representation is often violated by actual behavior, and it imposes unwanted restrictions. In particular, Ellsberg's (1961) famous thought experiment (see Section 10.6) convincingly shows that SEU cannot take into account the possibility that the information a decision maker (DM) has about some relevant uncertain event is vague or imprecise, and that such "ambiguity" affects her behavior. Ellsberg observed that ambiguity affected his "nonexperimental" subjects in a consistent fashion: Most of them preferred to bet on unambiguous rather than ambiguous events. Furthermore, he found that even when shown the inconsistency of their behavior with SEU, the subjects stood their ground "because it seems to them the sensible way to behave." This attitude was later named ambiguity aversion, and has received ample experimental confirmation.1 Savage was well aware of this limit of SEU, for he wrote:

There seem to be some probability relations about which we feel relatively "sure" as compared with others. . . . The notion of "sure" and "unsure" introduced here is vague, and my complaint is precisely that neither the theory of personal probability, as it is developed in this book, nor any other device known to me renders the notion less vague. (Savage 1954: 57–58 of the 1972 edition)

Ghirardato, P. and M. Marinacci (2002), "Ambiguity made precise: A comparative foundation," Journal of Economic Theory, 102, 251–289.
In the wake of Ellsberg's contribution, extensions of SEU have been developed allowing ambiguity, and the DM's attitude towards it, to play a role in her choices. Two methods for extending SEU have established themselves as the standards of this literature. The first, originally proposed in Schmeidler (1989), is to allow the DM's beliefs on the state space to be represented by nonadditive probabilities, called capacities, and her preferences by Choquet integrals (which are just standard integrals when integrated with respect to additive probabilities). For this reason, this generalization is called the theory of Choquet expected utility (CEU) maximization. The second, axiomatized by Gilboa and Schmeidler (1989), allows the DM's beliefs to be represented by multiple probabilities, and represents her preferences by the "maximin" on the set of the expected utilities. This generalization is thus called the maxmin expected utility (MEU) theory. Here we use the general class of preferences with ambiguity attitudes developed in Ghirardato and Marinacci (2000a). These orderings, which we call biseparable preferences, are all those such that the ranking of consequences can be represented by a state-independent cardinal utility u, and the ranking of bets on events by u and a unique numerical function (a capacity) ρ.2 The latter represents the DM's willingness to bet; that is, ρ(A) is roughly the number of euros she is willing to exchange for a bet that pays 1 euro if event A obtains and 0 euros otherwise. The only restriction imposed on the ranking of nonbinary acts is a mild dominance condition. CEU and MEU are special cases of biseparable preferences, where ρ is respectively the DM's nonadditive belief and the lower envelope of her multiple probabilities. An important reason for the lasting success of SEU theory is the elegant theory of the measurement of risk aversion developed from the seminal contributions of de Finetti (1952), Arrow (1974) and Pratt (1964). Unlike risk aversion, ambiguity aversion is yet without a fully general formalization, one that does not require extraneous devices and applies to most if not all the existing models of ambiguity averse behavior. This chapter attempts to fill this gap: We propose a definition of ambiguity aversion and show its formal characterization in the general decision-theoretic framework of Savage, whose only restriction is a richness condition on the set of consequences. Our definition is behavioral; that is, it only requires observation of the DM's preferences on acts in this fully subjective setting. However, the definition works as well (indeed better, see Proposition 10.2) in the Anscombe–Aumann framework, a special case of Savage's framework which presumes the existence of an auxiliary device with "known" probabilities. Decision models with ambiguity averse preferences are the objects of increasing attention by economists and political scientists interested in explaining phenomena at odds with SEU. For example, they have been used to explain the existence of incomplete contracts (Mukerji, 1998), the existence of substantial volatility in stock markets (Epstein and Wang, 1994; Hansen et al., 1999), or selective
abstention in political elections (Ghirardato and Katz, 2000). We hope that the characterization provided here will turn out to be useful for the "applications" of models of ambiguity aversion, as that of risk aversion was for the applications of SEU. More concretely, we hope that it will help to understand the predictive differences of risk and ambiguity attitudes. To understand our definition, it is helpful to go back to the characterization of risk aversion in the SEU model. The following approach to defining risk aversion was inspired by Yaari (1969). Given a state space S, let F denote a collection of "acts", maps from S into R (e.g. monetary payoffs). Define a comparative notion of risk aversion for SEU preferences as follows: Say that ≽2 is more risk averse than ≽1 if they have identical beliefs and the following implications hold for every "riskless" (i.e. constant) act x and every "risky" act f:

x ≽1 f =⇒ x ≽2 f  (10.1)

and

x ≻1 f =⇒ x ≻2 f  (10.2)

(where ≻ is the asymmetric component of ≽). Identity of beliefs is required to avoid possible confusions between differences in risk attitudes and in beliefs (cf. Yaari, 1969: 317). We can use this comparative ranking to obtain an absolute notion of risk aversion by calling some DMs—for instance expected value maximizers—risk neutral, and by then calling risk averse those DMs who are more risk averse than risk neutrals. As is well known, this "comparatively founded" notion has the usual characterization. Like the traditional "direct" definition of risk aversion, it is fully behavioral in the sense defined above. However, its interpretation is based on two primitive assumptions. First, constant acts are intuitively riskless. Second, expected value maximization intuitively reflects risk neutral behavior, so that it can be used as our benchmark for measuring risk aversion. In this chapter, we follow the example of Epstein (1999) in giving a comparative foundation to ambiguity attitude: We start from a "more ambiguity averse than . . ." ranking and then establish a benchmark, thus obtaining an "absolute" definition of ambiguity aversion. Analogously to Yaari's, our "more ambiguity averse . . ." relation is based on the following intuitive consideration: If a DM prefers an unambiguous (resp. ambiguous) act to an ambiguous (resp. unambiguous) one, a more (resp. less) ambiguity averse one will do the same. This is natural, but it raises the obvious question of which acts should be used as the "unambiguous" acts for this ranking. Depending on the decision problem the DM is facing and on her information, there might be different sets of "obviously" unambiguous acts; that is, acts that we are confident that any DM perceives as unambiguous. It seems intuitive to us that in any well-formulated problem, the constant acts will be in this set. Hence, we make our first primitive assumption: Constant acts are the only acts that are "obviously" unambiguous in any problem, since other acts may not be perceived as unambiguous by some DM in some state of information. This assumption implies that a preference (not necessarily SEU) ≽2 is more ambiguity averse than ≽1 whenever Equations (10.1) and (10.2) hold.
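Rankings like (10.1) and (10.2) are easy to test numerically. The following sketch (monetary consequences and the specific utilities are assumptions of this illustration) verifies (10.1) for two SEU DMs with identical beliefs, the second of whom has a concave transform of the first's utility; the example that follows interprets this pattern:

```python
# Check (10.1) for two SEU DMs with common beliefs P: whenever DM 1 weakly
# prefers the constant act x to f, so does the more risk averse DM 2.
import math, random

P = {"s1": 0.5, "s2": 0.5}                  # identical beliefs
u1 = lambda z: z                            # DM 1: risk neutral
u2 = lambda z: math.sqrt(z)                 # DM 2: concave transform of u1

def seu(u, act):
    return sum(P[s] * u(act[s]) for s in P)

random.seed(0)
for _ in range(10_000):
    f = {s: random.uniform(0, 100) for s in P}
    x = random.uniform(0, 100)              # a "riskless" constant act
    if seu(u1, {s: x for s in P}) >= seu(u1, f):
        assert seu(u2, {s: x for s in P}) >= seu(u2, f)   # (10.1)
print("no counterexample to (10.1) found")
```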
However, the following example casts some doubts as to the intuitive appeal of such a definition.

Example 10.1. Consider an (Ellsberg) urn containing balls of two colors: Black and Red. Two DMs are facing this urn, and they have no information on its composition. The first DM has SEU preferences ≽1, with a utility function on the set of consequences R given by u1(x) = x, and beliefs on the state space of ball extractions S = {B, R} given by

ρ1(B) = 1/2 and ρ1(R) = 1/2.

The second DM also has SEU preferences, and identical beliefs: Her preference ≽2 is represented by u2(x) = √x and ρ2 = ρ1. Both (10.1) and (10.2) hold, but it is quite clear that this is due to differences in the DMs' risk attitudes, and not in their ambiguity attitudes: They both apparently disregard the ambiguity in their information.

Given a biseparable preference, call cardinal risk attitude the psychological trait described by the utility function u—what explains any differences in the choices over bets of two biseparable preferences with the same willingness to bet ρ. The problem with the example is that the two DMs have different cardinal risk attitude. To avoid confusions of this sort, our comparative ambiguity ranking uses Equations (10.1) and (10.2) only on pairs which satisfy a behavioral condition, called cardinal symmetry, that implies that two DMs have identical u. As it only looks at each DM's preferences over bets on one event (which may be different across DMs), cardinal symmetry does not impose any restriction on the DMs' relative ambiguity attitudes.

Having thus constructed the comparative ambiguity ranking, we next choose a benchmark against which to measure ambiguity aversion. It seems generally agreed that SEU preferences are intuitively ambiguity neutral. We use SEU preferences as benchmarks because we posit—our second primitive assumption—that they are the only ones that are "obviously" ambiguity neutral in any decision problem and in any situation. Thus, ambiguity averse is any preference relation for which there is a SEU preference "less ambiguity averse than" it. Ambiguity love and (endogenous) neutrality are defined in the obvious way. The main results in the chapter present the characterization of these notions of ambiguity attitude for biseparable preferences. The characterization of ambiguity neutrality is simply stated: A preference is ambiguity neutral if and only if it has a SEU representation. That is, the only preferences which are endogenously ambiguity neutral are SEU. The general characterization of ambiguity aversion (resp. love) implies in particular that a preference is ambiguity averse (resp. loving) only if its willingness to bet ρ is pointwise dominated by (resp. pointwise dominates) a probability. In the CEU case, the converse is also true: A CEU preference is ambiguity averse if and only if its belief (which is equal to ρ) is dominated by a probability; that is, it has a nonempty "core." On the other hand, all MEU preferences are ambiguity averse, as is intuitive. As to comparative ambiguity aversion, we find that if ≽2 is more ambiguity averse than ≽1 then ρ1 ≥ ρ2. That is, a less ambiguity averse DM will have uniformly higher willingness to
bet. The latter condition is also sufficient for CEU preferences, whereas for MEU preferences containment of the sets of probabilities is necessary and sufficient for relative ambiguity. We next briefly turn to the issue of defining ambiguity itself. A "behavioral" notion of unambiguous act follows naturally from our earlier analysis: Say that an act is unambiguous if an ambiguity averse (or loving) DM evaluates it in an ambiguity neutral fashion. The unambiguous events are those that unambiguous acts depend upon. We obtain the following simple characterization of the set of unambiguous events for biseparable preferences: For an ambiguity averse (or loving) DM with willingness to bet ρ, event A is unambiguous if and only if ρ(A) + ρ(A^c) = 1. (A more extensive discussion of ambiguity is contained in the companion paper (Ghirardato and Marinacci, 2000a).) Finally, as an application of the previous analysis, we consider the classical Ellsberg problem with a 3-color urn. We show that the theory delivers the intuitive answers, once the information provided to the DM is correctly incorporated. It is important to underscore from the outset two limitations of the notions of ambiguity attitude we propose. The first limitation is that while the comparative foundation makes our absolute notion "behavioral," in the sense defined above, it also makes it computationally demanding. A more satisfactory definition would be one which is more "direct:" It can be verified by observing a smaller subset of the DM's preference relation. While we conjecture that it may be possible to construct such a definition—obtaining the same characterization as the one proposed here—we leave its development to future work. Our comparative notion is more direct, thus less amenable to this criticism. However, it is in turn limited by the requirement of the identity of cardinal risk attitude. The absolute notion is not, as it conceptually builds on the comparison of the DM with an idealized version of herself, identical to her in all traits but her ambiguity aversion. The second limitation stems from the fact that no extraneous devices are used in this chapter. An advantage of this is that our notions apply to any decision problem under uncertainty, and our results to any biseparable preference. However, such wide scope carries costs: Our notion of ambiguity aversion comprises behavioral traits that may not be due to ambiguity—like probabilistic risk aversion, the tendency to discount "objective" probabilities that has been observed in many experiments on decision making under risk (including the celebrated "Allais paradox"). Thus, one may consider it more appropriate to use a different name for what is measured here, like "chance aversion" or "extended ambiguity aversion." The reason for our choice of terminology is that we see a ranking of conceptual importance between ambiguity aversion/love and other departures from SEU maximization. As we argued above using Savage's words, the presence of ambiguity provides a normatively compelling reason for violating SEU. We do not feel that other documented reasons are similarly compelling. Moreover, we hold (see below and Subsection 10.7.3) that extraneous devices—say, a rich set of exogenously "unambiguous" events—are required for ascertaining the reason for a given departure. Thus, when these devices are not available—say, because the set
of “unambiguous” events is not rich enough—we prefer to attribute a departure to the reasons we find normatively more compelling. However, the reader is warned, so that he/she may choose to give a different name to the phenomenon we formally describe.
10.1.1. The related literature
The problem of defining ambiguity and ambiguity aversion is discussed in a number of earlier papers. The closest to ours in spirit and generality is Epstein (1999), the first paper to develop a notion of absolute ambiguity aversion from a comparative foundation.3 As we discuss in more detail in Subsection 10.7.3, the comparative notion and benchmarks he uses are different from ours. Epstein's objective is to provide a more precise measurement of ambiguity attitude than the one we attempt here; in particular, to filter out probabilistic risk aversion. For this reason, he assumes that in the absence of ambiguity a DM's preferences are "probabilistically sophisticated" in the sense of Machina and Schmeidler (1992). However, we argue that for its conclusions to conform with intuition, Epstein's approach requires an extraneous device: a rich set of acts which are exogenously established to be "unambiguous," much larger than the set of the constants that we use. Thus, the higher accuracy of his approach limits its applicability vis-à-vis our cruder but less demanding approach. The most widely known and accepted definition of absolute ambiguity aversion is that proposed by Schmeidler in his seminal CEU model (Schmeidler, 1989). Employing an Anscombe–Aumann framework, he defines ambiguity aversion as the preference for "objective mixtures" of acts, and he shows that for CEU preferences this notion is characterized by the convexity of the capacity representing the DM's beliefs. While the intuition behind this definition is certainly compelling, Schmeidler's axiom captures more than our notion of ambiguity aversion. It gives rise to ambiguity averse behavior, but it entails additional structure that does not seem to be related to ambiguity aversion (see Example 10.4). Doubts about the relation of convexity to ambiguity aversion in the CEU case are also raised by Epstein (1999), but he concludes that they are completely unrelated (see Section 10.6 for a discussion). There are other interesting papers dealing with ambiguity and ambiguity aversion. In a finite setting, Kelsey and Nandeibam (1996) propose a notion of comparative ambiguity for the CEU and MEU models similar to ours and obtain a similar characterization, as well as an additional characterization in the CEU case. Unlike us, they do not consider absolute ambiguity attitude, and they do not discuss the issue of the distinction of cardinal risk and ambiguity attitude. Montesano and Giovannoni (1996) notice a connection between absolute ambiguity aversion in the CEU model and nonemptiness of the core, but they base themselves purely on intuitive considerations on Ellsberg's example. Chateauneuf and Tallon (1998) present an intuitive necessary and sufficient condition for nonemptiness of the core of CEU preferences in an Anscombe–Aumann framework. Zhang
(1996), Nehring (1999), and Epstein and Zhang (2001) propose different definitions of unambiguous event and act. Fishburn (1993) characterizes axiomatically a primitive notion of ambiguity.

10.1.2. Organization
The structure of the chapter is as follows. Section 10.2 provides the necessary definitions and set-up. Section 10.3 introduces the notions of ambiguity aversion. The cardinal symmetry condition is introduced in Subsection 10.3.1, and the comparative and absolute definitions in 10.3.2. Section 10.4 presents the characterization results. Section 10.5 contains the notions of unambiguous act and event, and the characterization of the latter. In Section 10.6, we go back to the Ellsberg urn and show the implications of our results for that example. Section 10.7 discusses the key aspects of our approach, in particular, the choices of the comparative ambiguity ranking and the benchmark for defining ambiguity neutrality; it thus provides a more detailed comparison with Epstein's (1999) approach. The Appendices contain the proofs and some technical material.
10.2. Set-up and preliminaries
The general set-up of Savage (1954) is the following. There is a set S of states of the world, an algebra Σ of subsets of S, and a set X of consequences. The choice set F is the set of all finite-valued acts f : S → X which are measurable w.r.t. Σ. With the customary abuse of notation, for x ∈ X we define x ∈ F to be the constant act x(s) = x for all s ∈ S, so that X ⊆ F. Given A ∈ Σ, we denote by xAy the binary act (bet) f ∈ F such that f(s) = x for s ∈ A, and f(s) = y for s ∉ A. Our definitions require that the DM's preferences be represented by a weak order ≽ on F: a complete and transitive binary relation, with asymmetric (resp. symmetric) component ≻ (resp. ∼). The weak order ≽ is called nontrivial if there are f, g ∈ F such that f ≻ g. We henceforth call preference relation any nontrivial weak order on F. A functional V : F → R is a representation of ≽ if for every f, g ∈ F, f ≽ g if and only if V(f) ≥ V(g). A representation V is called: monotonic if f(s) ≽ g(s) for every s ∈ S implies V(f) ≥ V(g); nontrivial if V(f) > V(g) for some f, g ∈ F. While the definitions apply to any preference relation, our results require a little more structure, provided by a general decision model introduced in Ghirardato and Marinacci (2000a). To present it, we need the following notion of "nontrivial" event: Given a preference relation ≽, A ∈ Σ is essential for ≽ if for some x, y ∈ X, we have x ≻ xAy ≻ y.

Definition 10.1. Let ≽ be a binary relation. We say that a representation V : F → R of ≽ is canonical if it is nontrivial, monotonic, and there exists
a set-function ρ : Σ → [0, 1] such that, letting u(x) ≡ V(x) for all x ∈ X, for all consequences x ≻ y and all events A,

V(xAy) = u(x) ρ(A) + u(y) (1 − ρ(A)).  (10.3)
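As a two-line illustration of how (10.3) prices a bet (the names and the deliberately sub-additive willingness to bet are assumptions of this sketch):

```python
# Canonical evaluation of the binary act xAy, for x preferred to y.
def bet_value(u, rho, x, A, y):
    # (10.3): V(xAy) = u(x) rho(A) + u(y) (1 - rho(A))
    return u[x] * rho[A] + u[y] * (1.0 - rho[A])

u = {"good": 1.0, "bad": 0.0}
rho = {frozenset({"R"}): 0.3, frozenset({"B"}): 0.3}  # 0.3 + 0.3 < 1
print(bet_value(u, rho, "good", frozenset({"R"}), "bad"))  # 0.3
```

Note that ρ({R}) + ρ({B}) < 1 here: a capacity need not be additive, which is precisely what lets ρ record an Ellsberg-style reluctance to bet on either color.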
A relation ≽ is called a biseparable preference if it admits a canonical representation, and moreover such representation is unique up to a positive affine transformation when ≽ has at least one essential event. Clearly, a biseparable preference is a preference relation. If V is a canonical representation of ≽, then u is a cardinal state-independent representation of the DM's preferences over consequences, hence we call it his canonical utility index. Moreover, for all x ≻ y and all events A, B ∈ Σ we have xAy ≽ xBy if and only if ρ(A) ≥ ρ(B). Thus, ρ represents the DM's willingness to bet (likelihood relation) on events. ρ is easily shown to be a capacity—a set-function normalized and monotonic w.r.t. set inclusion—so that V evaluates binary acts by taking the Choquet expectation of u with respect to ρ.4 However, the DM's preferences over nonbinary acts are not constrained to a specific functional form. To understand the rationale of the clause relating to essential events, first observe that for any ≽ with a canonical representation with willingness to bet ρ, an event A is essential if and only if 0 < ρ(A) < 1. Thus, there are no essential events iff ρ(A) is either 0 or 1 for every A; that is, the DM behaves as if he does not judge any bet to be uncertain, and his canonical utility index is ordinal. In such a case, the DM's cardinal risk attitude is intuitively not defined: without an uncertain event there is no risk. On the other hand, it can be shown (Ghirardato and Marinacci, 2000a: Theorem 4) that cardinal risk attitude is characterized by a cardinal property of the canonical utility index, its concavity. Hence the additional requirement in Definition 10.1 guarantees that when there is some uncertain event cardinal risk aversion is well defined. As the differences in two DMs' cardinal risk attitudes might play a role in the choices in Equations (10.1) and (10.2), it is useful to identify the situation in which these attitudes are defined: Say that preference relations ≽1 and ≽2 have essential events if there are events A1, A2 ∈ Σ such that for each i = 1, 2, Ai is essential for ≽i. To avoid repetitions, the following lists all the assumptions on the structure of the decision problem and on the DM's preferences that are tacitly assumed in all results in the chapter:

Structural Assumption. X is a connected and separable topological space (e.g. a convex subset of R^n with the usual topology). Every biseparable preference on F has a continuous canonical utility function.

A full axiomatic characterization of the biseparable preferences satisfying the Structural Assumption is provided in Ghirardato and Marinacci (2000a).

10.2.1. Some examples of biseparable preferences
As mentioned earlier, the biseparable preference model is very general. In fact, it contains most of the known preference models that obtain a separation between
cardinal (state-independent) utility and willingness to bet. We now illustrate this claim by showing some examples of decision models which under mild additional restrictions (e.g. the Structural Assumption) belong to the biseparable class. (More examples and details are found in Ghirardato and Marinacci, 2000a.)

(i) A binary relation ≽ on F is a CEU ordering if there exist a cardinal utility index u on X and a capacity ν on (S, Σ) such that ≽ can be represented by the functional V : F → R defined by the following equation:

V(f) = ∫_S u(f(s)) ν(ds),  (10.4)

where the integral is taken in the sense of Choquet (notice that it is finite because each act in F is finite-valued). The functional V is immediately seen to be a canonical representation of ≽, and ρ = ν is its willingness to bet. An important subclass of CEU orderings are the SEU orderings, which correspond to the special case in which ν is a probability measure, that is, a finitely additive capacity. See Wakker (1989) for an axiomatization of CEU and SEU preferences (satisfying the Structural Assumption) in the Savage setting.

(ii) Let Δ denote the set of all the probability measures on (S, Σ). A binary relation ≽ on F is a MEU ordering if there exist a cardinal utility index u and a unique nonempty, (weak∗)-compact and convex set C ⊆ Δ such that ≽ can be represented by the functional V : F → R defined by the following equation:

V(f) = min_{P∈C} ∫_S u(f(s)) P(ds).  (10.5)
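Both representations are straightforward to compute for finite-valued acts. As a first illustration, a minimal Choquet integral for the CEU formula (10.4) on a finite state space (the capacity and the act are assumptions of this sketch):

```python
# Choquet integral of a utility profile with respect to a capacity nu:
# sort states by decreasing utility and weight each utility level by a
# capacity difference.
def choquet(utilities, nu):
    states = sorted(utilities, key=utilities.get, reverse=True)
    total, prev, upper = 0.0, 0.0, set()
    for s in states:
        upper.add(s)
        w = nu(frozenset(upper)) - prev    # marginal capacity weight
        total += w * utilities[s]
        prev += w
    return total

nu = lambda A: (len(A) / 2.0) ** 2         # a convex capacity on two states
print(choquet({1: 10.0, 2: 4.0}, nu))      # 0.25 * 10 + 0.75 * 4 = 5.5
```

When ν is additive the weights reduce to probabilities and the computation collapses to ordinary expected utility, which is the SEU subclass mentioned above.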
SEU also corresponds to the special case of MEU in which C = {P} for some probability measure P. If we now let, for any A ∈ Σ,

P̲(A) = min_{P∈C} P(A),  (10.6)

we see that P̲ is an exact capacity. While in general V(f) is not equal to the Choquet integral of u(f) with respect to P̲, this is the case for binary acts f. This shows that V is a canonical representation of ≽, with willingness to bet ρ = P̲. See Casadesus-Masanell et al. (2000) for an axiomatization of MEU preferences (satisfying the Structural Assumption) in the Savage setting. More generally, consider an α-MEU preference which assigns some weight to both the worst-case and best-case scenarios. Formally, there is a cardinal utility u, a set of probabilities C, and α ∈ [0, 1], such that ≽ is
represented by

V(f) = α min_{P∈C} ∫_S u(f(s)) P(ds) + (1 − α) max_{P∈C} ∫_S u(f(s)) P(ds).
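With a finite set of priors, the min and max in these representations are immediate to compute; an illustrative sketch (the priors and names are assumptions, with α = 1 giving the MEU value of (10.5) and α = 0 the maximax value discussed next):

```python
# alpha-MEU evaluation over a finite set C of priors.
def eu(P, utilities):
    return sum(P[s] * utilities[s] for s in P)

def alpha_meu(C, utilities, alpha=1.0):
    vals = [eu(P, utilities) for P in C]
    return alpha * min(vals) + (1.0 - alpha) * max(vals)

C = [{"B": p, "R": 1.0 - p} for p in (0.3, 0.5, 0.7)]
bet_on_B = {"B": 1.0, "R": 0.0}
print(alpha_meu(C, bet_on_B))        # 0.3: worst-case probability of B
print(alpha_meu(C, bet_on_B, 0.0))   # 0.7: best-case probability of B
```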
This includes the case of a "maximax" DM, who has α ≡ 0. V is canonical, so that ≽ is biseparable, with ρ given by ρ(A) = α min_{P∈C} P(A) + (1 − α) max_{P∈C} P(A), for A ∈ Σ.

(iii) Consider a binary relation ≽ constructed as follows: There is a cardinal utility u, a probability P and a number β ∈ [0, 1] such that ≽ is represented by

V(f) ≡ (1 − β) ∫_S u(f(s)) P(ds) + β ϕ(u ◦ f),

where

ϕ(u ◦ f) ≡ sup { ∫_S u(g(s)) P(ds) : g ∈ F binary, u(g(s)) ≤ u(f(s)) for all s ∈ S }.

≽ describes a DM who behaves as if he were maximizing SEU when choosing among binary acts, but not when comparing more complex acts. The higher the parameter β, the farther the preference relation is from SEU on nonbinary acts. V is monotonic and it satisfies Equation (10.3) with ρ = P, so that it is a canonical representation of ≽.

10.2.2. The Anscombe–Aumann case
The Anscombe–Aumann framework is a widely used special case of our framework in which the consequences have an objective feature: X is also a convex subset of a vector space. For instance, X is the set of all the lotteries on a set of prizes if the DM has access to an "objective" independent randomizing device. In this framework, it is natural to consider the following variant of the biseparable preference model—where for every f, g ∈ F and α ∈ [0, 1], αf + (1 − α)g denotes the act which pays αf(s) + (1 − α)g(s) ∈ X for every s ∈ S.

Definition 10.2. A canonical representation V of a preference relation ≽ is constant linear (c-linear for short) if V(αf + (1 − α)x) = αV(f) + (1 − α)V(x) for all binary f ∈ F, x ∈ X, and α ∈ [0, 1]. A relation ≽ is called a c-linearly biseparable preference if it admits a c-linear canonical representation.

Again, an axiomatic characterization of this model is found in Ghirardato and Marinacci (2000a). It generalizes the SEU model of Anscombe and Aumann (1963) and many non-EU extensions that followed, like the CEU and MEU models of Schmeidler (1989) and Gilboa and Schmeidler (1989) respectively. In fact, a c-linearly biseparable preference behaves in a SEU fashion over the set X of the
constant acts, but it is almost unconstrained over nonbinary acts. (C-linearity guarantees the cardinality of V and hence u.) All the results in this chapter are immediately translated to this class of preferences, in particular to the CEU and MEU models in the Anscombe–Aumann framework mentioned earlier. Indeed, as we show in Proposition 10.2 later, in this case removing cardinal risk aversion is much easier than in the more general framework we use.
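C-linearity is also easy to verify numerically for the MEU functional: a constant act shifts every prior's expected utility by the same amount, so the minimum shifts linearly. A quick check with illustrative numbers (and linear utility, so that statewise mixing of acts mixes utilities):

```python
# Verify V(a*f + (1-a)*x) = a*V(f) + (1-a)*V(x) for an MEU functional.
C = [{"s1": p, "s2": 1.0 - p} for p in (0.2, 0.5, 0.8)]
V = lambda f: min(sum(P[s] * f[s] for s in P) for P in C)

f = {"s1": 3.0, "s2": 9.0}
x = 5.0                                       # a constant act
for a in (0.0, 0.25, 0.5, 1.0):
    mix = {s: a * f[s] + (1 - a) * x for s in f}
    lhs = V(mix)
    rhs = a * V(f) + (1 - a) * V({s: x for s in f})
    assert abs(lhs - rhs) < 1e-12
print("c-linearity holds in this example")
```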
10.3. The definitions
As anticipated in the Introduction, the point of departure of our search for an extended notion of ambiguity aversion is the following partial order on preference relations:

Definition 10.3. Let ≽1 and ≽2 be two preference relations. We say that ≽2 is more uncertainty averse than ≽1 if: For all x ∈ X and f ∈ F, both

x ≽1 f =⇒ x ≽2 f  (10.7)

and

x ≻1 f =⇒ x ≻2 f.  (10.8)
This order has the advantage of making the weakest prejudgment on which acts are "intuitively" unambiguous: The constants. However, Example 10.1 illustrates that it does not discriminate between cardinal risk attitude and ambiguity attitude: DMs 1 and 2 are intuitively both ambiguity neutral, but 1 is more cardinal risk averse, and hence more uncertainty averse than 2. The problem is that constant acts are "neutral" with respect to ambiguity and with respect to cardinal risk. Given that our objective is comparing ambiguity attitudes, we thus need to find ways to coarsen the ranking above, so as to identify which part is due to differences in cardinal risk attitude and which is due to differences in ambiguity attitude.

10.3.1. Filtering cardinal risk attitude
While the "factorization" just described can be achieved easily if we impose more structure on the decision framework (see, e.g. the discussion in Subsection 10.7.3), we present a method for separating cardinal risk and ambiguity attitude which is only based on preferences, does not employ extraneous devices, and obtains the result for all biseparable preferences. Moreover, this approach does not impose any restrictions on the two DMs' beliefs (and hence on their relative ambiguity attitude), a problem that all alternatives share. The key step is coarsening comparative uncertainty aversion by adding the following restriction on which pairs of preferences are to be compared (we write {x, y} ≻ z as a short-hand for x ≻ z and y ≻ z, and similarly for ≺):
Definition 10.4. Two preference relations ≽1 and ≽2 are cardinally symmetric if for any pair (A1, A2) ∈ Σ × Σ such that each Ai is essential for ≽i, i = 1, 2, and any v_*, v^*, w_*, w^* ∈ X such that v_* ≺1 v^* and w_* ≺2 w^*, we have:

• If there are x, y ∈ X such that v_* ≻1 {x, y}, w_* ≻2 {x, y}, and

v_* A1 x ∼1 v^* A1 y and w_* A2 x ∼2 w^* A2 y,  (10.9)

then for every x′, y′ ∈ X such that v_* ≻1 {x′, y′} and w_* ≻2 {x′, y′} we have

v_* A1 x′ ∼1 v^* A1 y′ ⇐⇒ w_* A2 x′ ∼2 w^* A2 y′.  (10.10)

• Symmetrically, if there are x, y ∈ X such that v^* ≺1 {x, y}, w^* ≺2 {x, y}, and

x A1 v^* ∼1 y A1 v_* and x A2 w^* ∼2 y A2 w_*,  (10.11)

then for every x′, y′ ∈ X such that v^* ≺1 {x′, y′} and w^* ≺2 {x′, y′} we have

x′ A1 v^* ∼1 y′ A1 v_* ⇐⇒ x′ A2 w^* ∼2 y′ A2 w_*.  (10.12)
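The mechanics of this definition can be made concrete. For a biseparable DM, (10.3) turns the indifference v_* A x ∼ v^* A y into the utility equation u(x) − u(y) = (u(v^*) − u(v_*)) ρ(A)/(1 − ρ(A)): the indifference pins down a utility gap on the "loss" side, which then either transfers to every pair (x′, y′), as (10.10) requires, or to none. The sketch below (utilities, prizes and willingness to bet are all illustrative assumptions) exhibits the pattern when u2 is a positive affine transform of u1:

```python
# Two biseparable DMs with u2 = 2*u1 + 1, possibly different beliefs.
u1 = lambda z: z
u2 = lambda z: 2 * z + 1
r1, r2 = 0.4, 0.5                      # rho1(A1), rho2(A2)

def gap(u, v_lo, v_hi, r):
    # loss-side utility gap forced by indifference between the two bets
    return (u(v_hi) - u(v_lo)) * r / (1 - r)

g1 = gap(u1, 10.0, 16.0, r1)           # DM 1's required gap: 4.0
g2 = gap(u2, 10.0, 14.0, r2)           # DM 2's required gap: 8.0 = 2 * g1
# (10.9) holds at (x, y) iff u1(x) - u1(y) = g1 and u2(x) - u2(y) = g2;
# with u2 affine in u1 both reduce to x - y = 4, so the two DMs'
# indifferences select exactly the same pairs, which is (10.10):
for x, y in [(9.0, 5.0), (7.5, 3.5), (9.0, 4.0)]:
    dm1 = abs((u1(x) - u1(y)) - g1) < 1e-9
    dm2 = abs((u2(x) - u2(y)) - g2) < 1e-9
    assert dm1 == dm2
print("indifference patterns coincide")
```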
This condition is inspired by the utility construction technique used in the axiomatizations of additive conjoint measurement in, for example, Krantz et al. (1971) and Wakker (1989). A few remarks are in order: First, cardinal symmetry holds vacuously for any pair of preferences which do not have essential events. Second, cardinal symmetry does not impose restrictions on the DMs' relative ambiguity attitudes. In fact, for all acts ranked by ≽i, the consequence obtained if Ai obtains is always strictly better than that obtained if A^c_i obtains, so that all acts are bets on the same event Ai. Intuitively, a DM's ambiguity attitude affects these bets symmetrically, so that his preferences do not convey any information about it. Moreover, cardinal symmetry does not constrain the DMs' relative confidence on A1 and A2, since the "win" (or "loss") payoffs can be different for the two DMs. On the other hand, it does unsurprisingly restrict their relative cardinal risk attitudes. To better understand the relative restrictions implied by cardinal symmetry, assume that consequences are monetary payoffs and that both DMs like more money to less. Suppose that, when betting on events (A1, A2), (10.9) holds for some "loss" payoffs x and y and "win" payoffs v^* ≻1 v_* and w^* ≻2 w_* respectively. This says that exchanging v_* for v^* as the prize for A1, and w_* for w^* as the prize for A2, can for both DMs be traded off with a reduction in "loss" from x to y. Suppose that when the initial loss is x′ < x, ≽1 is willing to trade off the increase in "win" with a reduction in "loss" to y′, but ≽2 accepts reducing "loss" only to y′′ > y′ (i.e., w_* A2 x′ ∼2 w^* A2 y′′, in violation of (10.10)). That is, as the amount of the low payoff decreases, 2 becomes more sensitive to differences in payoffs than 1. Such diversity of behavior—that we intuitively attribute to differences in the DMs' risk attitude—is ruled out by cardinal symmetry, which requires that the two DMs consistently agree on the acceptable tradeoff for improving their "win"
payoff, and similarly for the "loss" payoff. It is important to stress that this discussion makes sense only when both DMs are faced with nontrivial uncertainty (i.e. they are both betting on essential events). Thus, we do not use "trade-off" to mean certain substitution; rather, substitution in the context of an uncertain prospect. To see how cardinal symmetry is used to show that two biseparable preferences have the same cardinal risk attitude, assume first that the two relations are ordinally equivalent: for every x, y ∈ X, x ≽1 y ⇔ x ≽2 y. When that is the case, cardinal symmetry holds if and only if their canonical utility indices are positive affine transformations of each other. In order to simplify the statements, we write u1 ≈ u2 to denote such "equality" of indices.

Proposition 10.1. Suppose that ≽1 and ≽2 are ordinally equivalent biseparable preferences which have essential events. Then ≽1 and ≽2 are cardinally symmetric if and only if their canonical utility indices satisfy u1 ≈ u2.

The intuition of the proof (see Appendix B) can be quickly grasped by rewriting, say, Equations (10.9) and (10.10) in terms of the canonical representations to find that for every x, y, x′, y′ ∈ X,

u1(x) − u1(y) = u1(x′) − u1(y′) ⇐⇒ u2(x) − u2(y) = u2(x′) − u2(y′).

Notice however that this does not imply that the preferences are identical on binary acts: The DMs' beliefs on events could be totally different. The comparative notion of ambiguity aversion we propose in the next subsection checks comparative uncertainty aversion in preferences with the same cardinal risk attitude. Clearly, it would be nicer to have a comparative notion that also ranks preferences without the same cardinal risk attitude. In Subsection 10.7.1, we discuss how to extend our notion to deal with these cases. This extension requires the exact measurement of the two preferences' canonical utility indices, and is thus "less behavioral" than the one we just anticipated. Finally, we remark that a symmetric exercise to that performed here is to coarsen comparative uncertainty aversion so as to rank preferences by their cardinal risk aversion only. In Ghirardato and Marinacci (2000a) it is shown that for biseparable preferences such a ranking is represented by the ordering of canonical utilities by their relative concavity, thus generalizing the standard result.

10.3.2. Comparative and absolute ambiguity aversion
Having thus prepared the ground, our comparative notion of ambiguity is immediately stated:

Definition 10.5. Let ≽1 and ≽2 be two preference relations. We say that ≽2 is more ambiguity averse than ≽1 whenever both the following
conditions hold: (A) ≽2 is more uncertainty averse than ≽1; (B) ≽1 and ≽2 are cardinally symmetric.

Thus, we restrict our attention to pairs which are cardinally symmetric. As explained earlier, when one DM's preference does not have an essential event, cardinal risk aversion does not play a role in that DM's choices, so that we do not need to remove it from the picture.

Remark 10.1. So far, we have tacitly assumed that cardinal risk and ambiguity attitude completely characterize biseparable preferences. Indeed, the validity of this can be easily verified by observing that if two such preferences are "as uncertainty averse as" each other (i.e., ≽1 is more uncertainty averse than ≽2, and vice versa), they are identical.

We finally come to the absolute definition of ambiguity aversion and love. Let ≽′ be a preference relation on F with a SEU representation.5 As we observed in Section 10.1, these relations intuitively embody ambiguity neutrality. We propose to use them as the benchmark for defining ambiguity aversion. Of course, one could intuitively hold that the SEU ones are not the only relations embodying ambiguity neutrality, and thus prefer using a wider set of benchmarks. This alternative route is discussed in Subsection 10.7.3.

Definition 10.6. A preference relation ≽ is ambiguity averse (loving) if there exists a SEU preference relation ≽′ which is less (more) ambiguity averse than ≽. It is ambiguity neutral if it is both ambiguity averse and ambiguity loving.

If ≽′ is a SEU preference which is less ambiguity averse than ≽, we call it a benchmark preference for ≽. We denote by R(≽) the set of all benchmark preferences for ≽. That is, R(≽) ≡ {≽′ ⊆ F × F : ≽′ is SEU and ≽ is more ambiguity averse than ≽′}. Each benchmark preference ≽′ ∈ R(≽) induces a probability measure P on Σ, so a natural twin of R(≽) is the set of the benchmark measures: M(≽) = {P ∈ Δ : P represents ≽′, for ≽′ ∈ R(≽)}. Using this notation, Definition 10.6 can be rewritten as follows: ≽ is ambiguity averse if either R(≽) ≠ Ø, or M(≽) ≠ Ø.
10.4. The characterizations
We now characterize the notions of comparative and absolute ambiguity aversion defined in the previous section for the general case of biseparable preferences, and the important subcases of CEU and MEU preferences. To start, we use
Proposition 10.1 and the observation that the canonical utility index of a preference with no essential events is ordinal, to show that if two preferences are biseparable and they are ranked by Definition 10.5, they have the same canonical utility index:

Theorem 10.1. Suppose that ≽1 and ≽2 are biseparable preferences, and that ≽2 is more ambiguity averse than ≽1. Then u1 ≈ u2.

Checking cardinal symmetry is clearly not a trivial task, but for an important subclass of preference relations—the c-linearly biseparable preferences in an Anscombe–Aumann setting—it is implied by comparative uncertainty aversion. In fact, under c-linearity, ordinal equivalence easily implies cardinal symmetry, so that we get:

Proposition 10.2. Suppose that X is a convex subset of a vector space, and that ≽1 and ≽2 are c-linearly biseparable preferences. ≽2 is more ambiguity averse than ≽1 if and only if ≽2 is more uncertainty averse than ≽1.

Therefore, in this case Definition 10.3 can be directly used as our definition of comparative ambiguity attitude.

10.4.1. Absolute ambiguity aversion
We first characterize absolute ambiguity aversion for a general biseparable preference ≽. Suppose that V is a canonical representation of ≽, with canonical utility u. We let

D(≽) ≡ { P ∈ Δ : ∫_S u(f(s)) P(ds) ≥ V(f) for all f ∈ F }.
That is, D(≽), which depends only on V, is the set of beliefs inducing preferences which assign (weakly) higher expected utility to every act f. These preferences exhaust the set of the benchmarks of ≽:

Theorem 10.2. Let ≽ be a biseparable preference. Then, M(≽) = D(≽). In particular, ≽ is ambiguity averse if and only if D(≽) ≠ Ø.

Let ρ be the capacity associated with the canonical representation V. It is immediate to see that if P ∈ D(≽), then P ≥ ρ. Thus, nonemptiness of the core of ρ (the set of the probabilities that dominate ρ pointwise, which we denote C(ρ)) is necessary for ≽ to be ambiguity averse. In Subsection 10.4.2 it is shown to be not sufficient in general. Turn now to the characterization of ambiguity aversion for the popular CEU and MEU models. Suppose first that ≽ is a CEU preference relation represented by the capacity ν, and let C(ν) denote ν's possibly empty core. It is shown that D(≽) = C(ν), so that the following result—which also provides a novel decision-theoretic interpretation of the core as the set of all the benchmark measures—follows as a corollary of Theorem 10.2.
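The corollary below reduces ambiguity aversion of a CEU preference to nonemptiness of the core of ν, which on a finite state space is a small linear feasibility problem. A minimal sketch using scipy (the example capacity and all names are assumptions of this illustration):

```python
# Is there a probability p with p(A) >= nu(A) for every event A?
from itertools import combinations
from scipy.optimize import linprog

S = ["s1", "s2", "s3"]
nu = lambda A: (len(A) / len(S)) ** 2   # a convex, hence balanced, capacity

events = [set(c) for r in range(1, len(S)) for c in combinations(S, r)]
A_ub = [[-(1.0 if s in A else 0.0) for s in S] for A in events]
b_ub = [-nu(A) for A in events]
res = linprog(c=[0.0] * len(S), A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0] * len(S)], b_eq=[1.0], bounds=(0.0, 1.0))
print("core nonempty:", res.success)    # True: a benchmark measure exists
```

A zero objective makes this a pure feasibility check; any feasible p is a benchmark measure in the sense of the corollary.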
Corollary 10.1. Suppose that ≽ is a CEU preference relation, represented by capacity ν. Then C(ν) = M(≽). In particular, ≽ is ambiguity averse if and only if C(ν) ≠ Ø.

Thus, the core of an ambiguity averse capacity is equal to the set of its benchmark measures, and the ambiguity averse capacities are those with a nonempty core, called "balanced." A classical result (see, e.g. Kannai, 1992) thus provides an internal characterization of ambiguity aversion in the CEU case: Letting 1_A denote the characteristic function of A ∈ Σ, a capacity reflects ambiguity aversion if and only if for all λ1, . . . , λn ≥ 0 and all A1, . . . , An ∈ Σ such that Σ_{i=1}^{n} λi 1_{Ai} ≤ 1_S, we have Σ_{i=1}^{n} λi ν(Ai) ≤ 1. As convex capacities are balanced, but not conversely, the corollary motivates our claim that convexity does not characterize our notion of ambiguity aversion. This point is illustrated by Example 10.4 below, which presents a capacity that intuitively reflects ambiguity aversion but is not convex. On the other hand, given a MEU preference relation with set of priors C, it is shown that D(≽) = C. Thus, Theorem 10.2 implies that any MEU preference is ambiguity averse (as is intuitive) and, more interestingly, that the set C can be interpreted as the set of the benchmark measures for ≽.

Corollary 10.2. Suppose that ≽ is a MEU preference relation, represented by the set of probabilities C. Then C = M(≽), so that ≽ is ambiguity averse.

As to ambiguity love, reversing the proof of Theorem 10.2 shows that for any biseparable preference, ambiguity love is characterized by nonemptiness of the set

E(≽) ≡ { P ∈ Δ : ∫_S u(f(s)) P(ds) ≤ V(f) for all f ∈ F }.
In particular, a CEU preference with capacity ν is ambiguity loving if and only if the set of probabilities dominated by ν is nonempty. As for MEU preferences: None is ambiguity loving. Conversely, any “maximax” EU preference is ambiguity loving, with E () = C. Finally, we look at ambiguity neutrality. Since we started with an informal intuition of SEU preferences as reflecting neutrality to ambiguity, an important consistency check on our analysis is to verify that they are ambiguity neutral in the formal sense. This is the case: Proposition 10.3. Let be a biseparable preference. Then is ambiguity neutral if and only if it is a SEU preference relation. 10.4.2. Comparative ambiguity aversion We conclude the section with the characterization of comparative ambiguity aversion. The general result on comparative ambiguity, an immediate consequence
Ambiguity made precise
225
of Theorem 10.2, is stated as follows (where ρ1 and ρ2 represent the willingness to bet of 1 and 2 respectively): Proposition 10.4. Let 1 and 2 be two biseparable preferences. If 2 is more ambiguity averse than 1 , then ρ1 ρ2 , D(1 ) ⊆ D(2 ), E (1 ) ⊇ E (2 ) and u1 ≈ u2 . Thus, relative ambiguity implies containment of the sets D() and E () (clearly in opposite directions), and dominance of the willingness to bet ρ. Of course, the proposition lacks a converse, and thus it does not offer a full characterization. As we argue below, biseparable preferences seem to have too little structure for obtaining a general characterization result. Things are different if we restrict our attention to specific models. For instance, the next result characterizes comparative ambiguity for the CEU and MEU models: Theorem 10.3. Let 1 and 2 be biseparable preferences, with canonical utilities u1 and u2 respectively. (i) Suppose that 1 and 2 are CEU, with respective capacities ν1 and ν2 . Then 2 is more ambiguity averse than 1 if and only if ν1 ν2 and u1 ≈ u2 . (ii) Suppose that 1 is MEU, with set of probabilities C1 . Then 2 is more ambiguity averse than 1 if and only if C1 = D(1 ) ⊆ D(2 ) and u1 ≈ u2 . Observe that part (ii) of the theorem does more than characterize comparative ambiguity for MEU preferences, as it applies to any biseparable 2 . For instance, it is immediate to notice that one can characterize absolute ambiguity aversion using that result and the fact that if 1 is a SEU preference relation with beliefs P , then C1 = {P }. Also, a symmetric result to (ii) holds: If 2 is “maximax” EU, it is more ambiguity averse than 1 iff C2 = E (2 ) ⊆ E (1 ). Remark 10.2. Theorem 10.3 can be used to explain the apparent incongruence of the characterization of comparative risk aversion in SEU (in the sense of Yaari, 1969) and of comparative ambiguity aversion in CEU: Convexity of ν seems to be the natural counterpart of concavity of u, but it is not. This is due to the different uniqueness properties of utility functions and capacities. A SEU 2 is more risk averse than a SEU 1 iff for every common normalization of the utilities, we have u2 (x) u(x) inside the interval of normalization. Since any normalization is allowed, u2 must then be a concave transformation of u1 . In the case of capacities only one normalization is allowed, so we only have ν1 ν2 . It is not difficult to show that the necessary conditions of Proposition 10.4 are not sufficient if taken one by one. For instance, there are pairs of MEU (resp. CEU) preferences 1 and 2 such that ρ1 ρ2 (resp. C(ν1 ) = D(1 ) ⊆ D(2 ) = C(ν2 )) does not entail that 2 is more ambiguity averse than 1 .
226
Paolo Ghirardato and Massimo Marinacci
Example 10.2. Let S = {s1 , s2 , s3 }, the power set of S. Consider the probabilities P , Q and R defined by P = [1/2, 0, 1/2], Q = [0, 1, 0] and R = [1/2, 1/2, 0]. Let C1 and C2 respectively be the closed convex hull of {P , Q} and {P , Q, R}. Then, ρ1 = P 1 = P 2 = ρ2 , but C2 ⊆ C1 , and indeed by Theorem 10.3 the MEU preference 2 inducing C2 is more ambiguity averse than the MEU preference 1 inducing C1 . Consider next a capacity ν such that ν(A) = 1/3 for any A = {Ø, S}, and a probability P equal to 1/3 on each singleton. Then C(ν) = {P }, so that ν is balanced, but not exact (for instance, P ({s1 , s2 }) = 2/3 > 1/3 = ν({s1 , s2 })). We have C(ν) ⊆ C(P ) but ν P , and by Theorem 10.3 the CEU preference inducing ν is not more ambiguity averse than that inducing P . In contrast, P is exact, and we have both C(P ) ⊆ C(ν) and P ν. This example illustrates two conceptual observations. The first (anticipated in Subsection 10.4.1) is that nonemptiness of the core of ρ is not sufficient for absolute ambiguity aversion: A probability can dominate ρ without being a benchmark measure for . Unsurprisingly, in general the capacity ρ does not completely describe the DM’s ambiguity attitude. The second observation is that, while D() does characterize the DM’s absolute ambiguity aversion, it is also an incomplete description of the DM’s ambiguity attitude: There can be preferences 1 and 2 strictly ranked by comparative ambiguity even though D(1 ) = D(2 ). To better appreciate the difficulty of obtaining a general sufficiency result for biseparable preferences, we now present an example in which all the necessary conditions hold but the comparative ranking does not obtain. Example 10.3. For a general S and (but see the restriction on P below), consider two preference relations 1 and 2 which behave according to example (iii) of biseparable preference in Section 10.2. Both have identical P and u (which ranges in a nondegenerate interval of R), with the following restriction on P : There are at least three disjoint events in , A1 , A2 and A3 such that P (Ai ) > 0 for i = 1, 2, 3 (otherwise both preferences are indistinguishable from SEU preferences with utility u and beliefs P ). Their β parameters are different, in particular β2 > β1 > 0. Clearly ρ1 = ρ2 = P and u1 ≈ u2 . It is also immediate to verify that, under the assumption on P , D(1 ) = D(2 ) = {P } and E (1 ) = E (2 ) = Ø, so that both preferences are (strictly) ambiguity averse. However, 1 is not more ambiguity averse than 2 (nor are 1 and 2 equal, which would follow from two applications of the converse). Indeed, the parameter β measures comparative ambiguity for these preferences, so that 2 is more ambiguity averse than 1 .
10.5. Unambiguous acts and events Let be an ambiguity averse or loving preference relation. Even though the preference relation has a strict ambiguity attitude, it may nevertheless behave in an ambiguity neutral fashion with respect to some subclass of acts and events,
Ambiguity made precise
227
that we may like to consider “unambiguous.” The purpose of this section is to identify the class of the unambiguous acts and the related class of unambiguous events, and to present a characterization of the latter for biseparable (in particular CEU and MEU) preference relations. We henceforth focus on ambiguity averse preference relations, but it is easy to see that all the results in this section can be shown for ambiguity loving preferences. A more extensive discussion of the behavioral definition of ambiguity for events and acts is found in Ghirardato and Marinacci, 2000b. In view of our results so far, the natural approach in defining the class of unambiguous events of a preference relation is to fix a benchmark ∈ R(), and to consider the subset of all the acts in F over which is as ambiguity averse as . Intuitively, ambiguity is a property that the DM attaches to partitions of events, so that nonconstant acts which generate the same partition should be consistently deemed either both ambiguous or both unambiguous. Hence, we consider as “truly” unambiguous only the acts which belong to the set defined next. Definition 10.7. Given a preference relation and ∈ R(), the set of -unambiguous acts, denoted H , is the largest subset of F satisfying the following two conditions:6 (A) For every x ∈ X and every f ∈ H , and agree on the ranking of f and x. (B) For every f ∈ H and every g ∈ F , if {{s : g(s) ∼ x} : x ∈ X} ⊆ {{s : f (s) ∼ x} : x ∈ X}, then g ∈ H . Given a preference relation , for any f ∈ F denote by f the collection of all the “upper pre-image” sets of f , that is, f = {{s : f (s) x} : x ∈ X} .
(10.13)
Since any benchmark ∈ R() is ordinally equivalent to , for any act f ∈ F the upper pre-images of f with respect to and coincide: for all x ∈ X, {s: f (s) x} = {s: f (s) x}. The set ⊆ of the -unambiguous events is thus naturally defined to be the collection of all sets of upper pre-images of the acts in H . That is, ≡ f . f ∈H
It is immediate to observe that if A ∈ , then for every x, y ∈ X the binary act xAy belongs to H . This implies that Ac ∈ (i.e., is closed w.r.t. complements). We now present the characterization of the set . This turns out to be quite simple and intuitive: It is the subset of the events over which the capacity ρ representing ’s willingness to bet is complement-additive (sometimes called “symmetric”):
228
Paolo Ghirardato and Massimo Marinacci
Proposition 10.5. Let be an ambiguity averse biseparable preference with willingness to bet ρ. Then for every ∈ R(), the set satisfies: , + = A ∈ : ρ(A) + ρ(Ac ) = 1 . (10.14) It immediately follows from the proposition that the choice of the specific benchmark does not change the resulting set of events. In light of this, we henceforth call = the set of unambiguous events for . The consequences of the proposition for the CEU and MEU models are clear: Just substitute ν or P for ρ. In particular, when is a MEU preference with a set of probabilities C, it can be further shown that is the set of events on which all probabilities agree: = {A ∈ : ρ(A) = P (A) for all P ∈ C} . It is also interesting to observe that is in general not an algebra. This is intuitive, as the intersection of unambiguous events could be ambiguous.7 As to the set of unambiguous acts H , it can also be seen to be independent of the choice of benchmark. In general, the only way to ascertain which acts are unambiguous is to construct the set H . However, for MEU preferences and for CEU preferences whose capacity is exact (the lower envelope of its core), the set H is the set of all the acts which are measurable with respect to the events in . Therefore, in these cases characterizes the set of unambiguous acts as well. (All these results are proved in Ghirardato and Marinacci, 2000b).
10.6. Back to Ellsberg We now illustrate our results using the classical Ellsberg urn. The urn contains 90 balls of three colors: red, blue and yellow. The DM knows that there are 30 red balls and that the other 60 balls are either blue or yellow. However, he does not know their relative proportion. The state space for an extraction from the urn is S = {B, R, Y }. Given the nature of his information, it is natural to assume that the DM’s preference relation will be such that its set of unambiguous events satisfies ⊇ {Ø, {R}, {B, Y }, S}. In particular, assume that the DM’s preference relation is CEU and it induces the capacity ν. To reflect the fact that {R} and {B, Y } form an unambiguous partition, we know from Section 10.5 that if the DM is ambiguity averse (or loving) ν must satisfy ν(R) + ν(B, Y ) = 1.
(10.15)
Also, because of the symmetry of the information that the DM is given, it is natural to assume that ν(B) = ν(Y )
and ν(B, R) = ν(R, Y ).
(10.16)
We first show that, if the ambiguity restriction (10.15) is imposed, ambiguity aversion is not compatible with the following beliefs, which induce behavior that
Ambiguity made precise
229
would on intuitive grounds be considered “ambiguity loving”: ν(R) < ν(B) = ν(Y ); ν(B, Y ) < ν(B, R) = ν(R, Y ).
(10.17)
Proposition 10.6. No ambiguity averse CEU preference relation such that its set of unambiguous events contains {{R}, {B, Y }} can agree with the ranking (10.17). In his paper on ambiguity aversion, Epstein (1999) also discusses the Ellsberg urn, and he presents a convex capacity compatible with ambiguity loving in his sense (see Subsection 10.7.3 for a brief review), which satisfies the conditions in (10.17). This is the capacity ν1 defined by 1 , ν1 (B, Y ) = 13 , ν1 (R) = 12 1 ν1 (B) = ν1 (Y ) = 6 , ν1 (B, R) = ν1 (R, Y ) = 12 .
He thus concludes that convexity of beliefs does not imply ambiguity aversion for CEU preferences (it is also not implied, in his definition). We know from Corollary 10.1 that convexity implies ambiguity aversion in our sense. Proposition 10.6 helps clarifying why this example does not conflict with the intuition developed earlier: In fact, ν1 does embody ambiguity aversion in our sense, but it does not reflect the usual presumption that {R} and {B, Y } are seen as unambiguous events. If it did, it would have to satisfy (10.15), which is not the case (it cannot be, since convex capacities are balanced). For us, the DM with beliefs ν1 does not perceive {R} and {B, Y } as unambiguous. Of course, then it is not clear in which sense the conditions in (10.17) should “intuitively” embody ambiguity loving behavior. Going back to the example, we would say that the DM’s preferences intuitively reflect ambiguity aversion if the reverse inequalities held: ν (R) ν (B) = ν (Y ) ; ν (B, Y ) ν (B, R) = ν (R, Y ) .
(10.18)
We now show that the notion of ambiguity aversion proposed earlier characterizes this intuitive ranking when, besides the obvious symmetry restrictions in (10.16), we strengthen the requirement in (10.15) in the following natural way: ν(R) =
1 3
and
ν(B, Y ) =
2 . 3
(10.19)
Proposition 10.7. Let be a CEU preference relation such that its representing capacity ν satisfies the equalities (10.16) and (10.19). Then is ambiguity averse if and only if ν agrees with the ranking (10.18). In closing our discussion of Ellsberg’s problem, we provide further backing for our belief that convexity is not necessary for ambiguity aversion. Here is a capacity which is not convex, and still makes the typical Ellsberg choices.
230
Paolo Ghirardato and Massimo Marinacci
Example 10.4. Consider the capacity ν2 defined by (10.19) and ν2 (B) = ν2 (Y ) =
7 , 24
ν2 (B, R) = ν2 (R, Y ) =
1 . 2
This capacity satisfies (10.18), so that it reflects ambiguity aversion both formally and intuitively, but it is not superadditive, let alone convex.
10.7. Discussion In this section we discuss some of the choices we have made in the previous sections. First we briefly discuss how the comparative ambiguity ranking can be extended to preferences with different cardinal risk attitude. Then we discuss in more detail how the unambiguous acts described in Section 10.5 can be used in the comparative ranking, and why we chose SEU preferences as benchmarks. 10.7.1. Comparative ambiguity and equality of cardinal risk attitude As we observed earlier, our comparative ambiguity aversion notion cannot compare biseparable preferences with different canonical utility indices. Of course, the characterization results of Section 10.4 can be used to qualitatively compare two preferences by ambiguity: For instance, we can look at two CEU preferences and compare their willingness to bet, or we can use utility functions to compare two SEU preferences by risk aversion, even if they do not have the same beliefs. However, when dealing with biseparable preferences, it is easy to apply the intuition of our comparative ranking to compare preferences which do not have the same canonical utility. This requires eliciting the canonical utility indices first, and then using acts and constants that are “utility equivalents” in Equations (10.7) and (10.8).8 The ranking thus obtained is very general (it does not even entail ordinal equivalence), but it yields mutatis mutandis the same characterization results that we obtained with the more restrictive one. For instance: is ambiguity averse iff D() = Ø, and CEU (MEU) preference 2 is more ambiguity averse than CEU (MEU) preference 1 iff ν1 ν2 (C1 ⊆ C2 ) (but of course in general u1 ≈ u2 ). Nonetheless, this ranking requires the full elicitation of the DMs’ canonical utility indices, and is thus operationally more complex than that in Definition 10.5. 10.7.2. Using unambiguous acts in the comparative ranking One of the intuitive assumptions that our analysis builds on is that constant acts are primitively “unambiguous:” That is, we assume that every DM perceives constants as unambiguous. No other acts are “unambiguous” in this primitive sense. However, one could argue that it is natural to use in the comparative ranking also those acts which are revealed to be deemed unambiguous by both DMs, even if they are not constant.
Ambiguity made precise
231
Suppose that is an ambiguity averse biseparable preference, and let H ( ) be its set of unambiguous acts (events), as defined in Section 10.5. It is possible to see Ghirardato and Marinacci (2000b) that for every ∈ R() and every h ∈ H and f ∈ F , we have hf ⇒hf
and
h > f ⇒ h f.
(10.20)
That is, all benchmarks according to Definition 10.5 satisfy the stronger comparative ranking suggested above. Conversely, it is obvious that if and a SEU preference are cardinally symmetric and satisfy (10.20), they satisfy Definition 10.5. Thus, modifying Definition 10.5 to have (10.20) in part (A) does not change the set of the ambiguity averse preferences. 10.7.3. A more general benchmark We chose SEU maximization as the benchmark representing ambiguity neutrality. While few would disagree that SEU preferences are “ambiguity neutral” (in a primitive, nonformal sense), some readers may find that the result of Proposition 10.3 that SEU maximization characterizes ambiguity neutrality does not agree with their intuition of what constitutes ambiguity neutral behavior. In particular, they might feel that we should also classify as ambiguity neutral any non-SEU preference whose likelihood relation can still be represented by a probability measure. This would clearly be the case if we let such preferences be benchmarks for our comparative ambiguity notion. Here we explain why we have not followed that route, and the consequences of this choice for the interpretation of our notions. The non-SEU preferences in question are those that are probabilistically sophisticated (PS) in the sense of Machina and Schmeidler (1992). For example, consider a CEU preference whose willingness to bet is ρ = g(P ) for some probability measure P and “distortion” function g; that is, an increasing g : [0, 1] → [0, 1] such that g(0) = 0 and g(1) = 1. Such is PS since its ranking of bets (likelihood relation) is represented by the probability P , but it is not SEU if g is different from the identity function. According to the point of view suggested above, such is “ambiguity neutral”; it should thus be used as a benchmark in characterizing ambiguity aversion. Moreover, if we used PS preferences as benchmarks it might be possible to avoid attributing to ambiguity aversion the effects of probabilistic risk aversion. However, go back to the ambiguous urn of Example 10.1 and consider the following: Example 10.1. (continued) In the framework of Example 10.1, consider a third DM with CEU preferences 3 , with canonical utility u(x) = x and willingness to bet defined by ρ3 (B) =
1 4
and
ρ3 (R) =
1 . 4
It is immediate to verify that according to Definition 10.5, DM 3 is more ambiguity averse than DM 1 (who is SEU), so that he is ambiguity averse in our sense.
232
Paolo Ghirardato and Massimo Marinacci
That seems quite natural, since he is willing to invest less in bets on the ball extractions. With PS benchmarks, we conclude that both DMs are ambiguity neutral, since their willingness to bet are ordinally equivalent to the probability ρ1 (ρ3 = g(ρ1 ) for any distortion g such that g(1/2) = 1/4), so that both are PS. Hence, DM 3’s behavior is only due to his probabilistic risk aversion. Yet, it seems that the fact that DM 3 is only willing to bet 1/4 utils on any color may at least in part be due to the ambiguity of the urn and his possible ambiguity aversion. This example is not the only case in which using PS benchmarks yields counterintuitive conclusions. When the state space is finite, if we use PS preferences as benchmarks we find that almost every CEU preference inducing a strictly positive ρ on a finite state space is both ambiguity averse and loving. Thus, a large set of preferences are shown to be ambiguity neutral. Including, as the following example illustrates, many preferences which are not PS. Example 10.5. Suppose that two DMs are faced with the following decision problem. There are two urns, both containing 100 balls, either red or black. The DMs are told that Urn I contains at least 40 balls of each color, while Urn II contains at least 10 balls of each color. One ball will be extracted from each urn. Thus, the state space is S = {Rr, Rb, Br, Bb}, where the upper (lower) case letter stands for the color of the ball extracted from Urn I (II). Suppose that both DMs have CEU preferences 1 and 2 , with respective willingness to bet ρ1 and ρ2 . Using obvious notation, suppose that ρ1 (b) = ρ1 (r) = 0.1 and ρ1 (B) = ρ1 (R) = 0.4, that ρ1 (s) = 0.04 for each singleton s, and for every other event ρ1 is obtained by additivity. According to Definition 13, DM 1 is strictly ambiguity averse. In contrast, with PS benchmarks the result mentioned above shows that he is ambiguity neutral. Let ρ2 be as follows: ρ2 (b) = ρ2 (r) = 0.9 and ρ2 (B) = ρ2 (R) = 0.6, ρ2 (s) = 0.54 for each singleton s, ρ2 (A) = 0.92 for each A ∈ {Rr ∪ Bb, Rb ∪ Br}, and ρ2 (A) = 0.95 for each ternary set. According to Corollary 10.1, DM 2 is ambiguity loving, but if we use PS benchmarks we conclude that she is ambiguity neutral. Both conclusions go against our intuition. Moreover, since both ρ1 and ρ2 are not ordinally equivalent to a probability, 1 and 2 are not PS. The foregoing discussion shows some of the difficulties that may arise if we use PS, rather than SEU, preferences as benchmarks with our comparative ambiguity aversion notion: We end up attributing too much explanatory power to probabilistic risk aversion. Instead, with SEU benchmarks we overemphasize the role of ambiguity aversion. Is it possible to remove probabilistic risk attitude from the picture, as we did for cardinal risk attitude?9 10.7.3.1. Removing probabilistic risk aversion Suppose that there is a subset E of acts which are universally accepted as “unambiguous,” in the sense that we are sure that a DM’s choices among these acts are unaffected by his ambiguity attitude. Then, if E (and the
Ambiguity made precise
233
associated set of “unambiguous” events, denoted ) is sufficiently rich, we can discriminate between probabilistic risk and ambiguity aversion. For instance, modify Example 10.1 by assuming the availability of an “unambiguous” randomizing device, so that each state describes the result of the device as well. Now, find a set A of results of the device (obviously, here is the family of all such sets) which is as likely as R(ed) and then check if B(lack) is as likely as Ac . If it is, the DM behaves identically when faced with (equally likely) ambiguous and unambiguous events, so that all the nonadditivity of ρ3 on {B, R} must be due to his probabilistic risk aversion. His preferences are also PS on the extended problem. If it is not, then DM 3’s behavior is affected by ambiguity, and his preferences are not PS on the extended problem. The point is that in the presence of a sufficiently rich , a DM whose preferences are PS is treating ambiguous and unambiguous events symmetrically, and is hence intuitively ambiguity neutral. Therefore, in such a case we would expect PS preferences to be found ambiguity neutral. This is not the case in the original version of Example 10.1, since a rich set of “unambiguous” events is missing. More generally, consider a biseparable preference which is not PS overall, but is PS when comparing only unambiguous acts. That is, the DM behaves as if he forms a probability P on the set , and calculates his willingness to bet on these events by means of a distortion function g which only reflects his probabilistic risk attitude. As we did in controlling for cardinal risk attitude, we want to use as benchmarks for only those PS preferences—that with a small abuse of notation we also denote —which have the same probabilistic risk attitude; for example, those biseparable preferences which share g as distortion function. Interestingly, it turns out that if the set E is rich enough, any PS preference satisfying Equation (10.20) for all h ∈ E has this property. This is exactly the approach followed by Epstein (1999) in his work on ambiguity aversion: He assumes the existence of a suitably rich set of “unambiguous” events,10 defines E as the set of all the -measurable acts, and uses Equation (10.20) with h ∈ E as his comparative ambiguity notion. His choice of benchmark are PS preferences. This approach attains the objective of “filtering” the effects of probabilistic risk attitude from our absolute ambiguity notion. It thus yields a finer assessment of the DM’s ambiguity attitude. However, the foregoing discussion has illustrated that a crucial ingredient to this filtration is the existence of a set of “unambiguous” acts which is sufficiently rich: If it is too poor (e.g. it contains only the constants, as in Example 10.5), we may use benchmarks whose probabilistic risk attitude is different from the DM’s. This may cause Epstein’s approach to reach counterintuitive conclusions, as illustrated in the previous examples. The main problem we have with this approach is that we find it undesirable to base our measurement of ambiguity attitude on an exogenous notion of “ambiguity,” especially in view of the richness requisite. It seems that in many cases of interest the “obvious” set of “unambiguous” acts does not satisfy such requisite; for example, Ellsberg’s example. 
Our objective is to develop a notion of ambiguity attitude which is based on the weakest set of primitive requisites (like the two assumptions stated in the Introduction), even though this has a cost in terms of the “purity” of the interpretation of the behavioral feature we measure.
234
Paolo Ghirardato and Massimo Marinacci
Epstein and Zhang (2001) propose a behavioral foundation to the notion of “ambiguity,” so that the existence of a rich set E can be objectively verified, solving the problem mentioned earlier. In Ghirardato and Marinacci (2000b) we present an example which suggests that their behavioral notion can lead to counterintuitive conclusions (in that case, an intuitively ambiguous event is found unambiguous). More generally, we see the following problem with this enterprise: There may be events which are “unambiguous” (resp. “ambiguous”) with respect to which the DM nonetheless behaves in an ambiguity nonneutral (resp. neutral) fashion. Consider a DM who listens to a weather forecast stated as a probabilistic judgment. If the DM does not consider the specific source reliable, he might express a willingness to bet which is a distortion of this judgment, while being probabilistic risk neutral. Alternatively, he may find the source reliable, hence perceive no ambiguity, but be probabilistically risk averse. A preference-based notion of ambiguity must be able to distinguish between these two cases, classifying the relevant events ambiguous in the first case and unambiguous in the second. And this without using any auxiliary information. Considering moreover that the set of “verifiably unambiguous” events must be rich, we are skeptical that this feat is possible: The problem is that the Savage set-up does not provide us with enough instruments; it is too abstract. 10.7.3.2. Summing up We have argued that what motivates using PS (rather than SEU) preferences as benchmarks is the objective of discriminating between probabilistic risk aversion and ambiguity attitude. We have shown that this requires a rich set of “verifiably unambiguous” events, and briefly reviewed our doubts about the possibility of providing a behavioral foundation to this “verifiable ambiguity” notion in a general subjective setting without extraneous devices. In contrast, the analysis in this chapter shows that there are no such problems in using SEU benchmarks to identify an “extended” notion of ambiguity attitude, which can be disentangled from cardinal risk attitude using only behavioral data and no extraneous devices. Though it does not distinguish “real” ambiguity and probabilistic risk attitudes, we think that this “extended” ambiguity attitude is worthwhile, especially because of its wider applicability.
Appendix A: Capacities and Choquet integrals A set-function ν on (S, ) is called a capacity if it is monotone and normalized. That is: if for A, B ∈ , A ⊆ B, then ν(A) ν(B); ν(Ø) = 0 and ν(S) = 1. A capacity is called a probability measure if it is finitely additive: ν(A ∪ B) = ν(A) + ν(B) for all A disjoint from B. It is called convex if for every pair A, B ∈ , we have ν(A ∪ B) ν(A) + ν(B) − ν(A ∩ B). The core of a capacity ν is the (possibly empty) set C(ν) of all the probability measures on (S, ) which dominate it, that is, C(ν) ≡ {P : P ∈ , P (A) ν(A) for all A ∈ }.
Ambiguity made precise
235
Following the usage in Cooperative Game Theory (e.g., Kannai, 1992), all capacities with nonempty core are called balanced. A capacity ν is called exact it is balanced and it is equal to the lower envelope of its core (i.e., for all A ∈ , ν(A) = minP ∈C(ν) P (A)). Convex implies exact, which in turn implies balanced, but the converse implications are all false. The notion of integral used for capacities is the Choquet integral, due to Choquet (1953). For a given -measurable function ϕ : S → R, the Choquet integral of ϕ with respect to a capacity ν is defined as:
∞
ϕ dν = S
ν({s ∈ S : ϕ(s) α}) dα
0
+
0
−∞
[1 − ν({s ∈ S : ϕ(s) α})] dα,
(10.A.1)
where the r.h.s. is a Riemann integral (which is well defined because ν is monotone). When ν is additive, (10.A.1) becomes a standard (additive) integral. In general it is seen to be monotonic, positive homogeneous and comonotonic addi tive: If ϕ, ψ : S → R are non-negative and comonotonic, then (ϕ + ψ) dν = ϕ dν + ψ dν. Two functions ϕ, ψ : S → R are called comonotonic if there are no s, s ∈ S such that ϕ(s) > ϕ(s ) and ψ(s) < ψ(s ).
Appendix B: Cardinal symmetry and biseparable preferences In this Appendix, we prove Proposition 10.1. In order to make the proof as clear as possible, we first explain the notion of “standard sequence,” and then show how the latter can be used to prove the proposition. 10.B.1. Standard sequences Consider a DM whose preferences have a canonical representation V , with canonical utility index u, willingness to bet ρ, and an essential event A ∈ . Fix a pair of consequences v ∗ v∗ , and consider x 0 ∈ X such that x 0 v ∗ . If there is an x ∈ X such that x A v∗ x 0 A v ∗ , then by (10.3) and the convexity of the range of u, there is x 1 ∈ X such that x 1 A v∗ ∼ x 0 A v ∗ .
(10.B.1)
It is easy to verify that x 1 x 0 : If x 0 x 1 held, by monotonicity and biseparability, we would have x 0 A v ∗ x 1 A v ∗ and x 1 A v ∗ x 1 A v∗ . This yields x 0 A v ∗ x 1 A v∗ , a contradiction. Assuming that there is an x ∈ X such that x A v∗ x 1 A v ∗ , as above we can find x 2 ∈ X such that x 2 A v∗ ∼ x 1 A v ∗ .
(10.B.2)
236
Paolo Ghirardato and Massimo Marinacci
Again, x 2 x 1 . We can use the representation V to check that the equivalences in (10.B.1) and (10.B.2) translate to u(x 1 ) − u(x 0 ) =
1 − ρ(A) (u(v ∗ ) − u(v∗ )) = u(x 2 ) − u(x 1 ), ρ(A)
(10.B.3)
that is, the three points x 0 , x 1 , x 2 , are equidistant in u. Proceeding in this fashion we can construct a sequence of points {x 0 , x 1 , x 2 , . . .} all evenly spaced in utility. Such sequence we call an increasing standard sequence with base x 0 , carrier A, and mesh (v∗ , v ∗ ). (Notice that the distance in utility between the points in the sequence is proportional to the distance in utility between v∗ and v ∗ , which is used as the “measuring rod.”) Analogously, we can construct a decreasing standard sequence with base x 0 , carrier A and mesh (v∗ , v ∗ ) where v∗ x 0 . This will be a sequence starting again from x 0 , but now moving in the direction of decreasing utility: For every n 0, v ∗ A x n+1 ∼ v∗ A x n . Henceforth, we call a standard sequence w.r.t. (x 0 , A) any sequence {x¯ 0 , x¯ 1 , x¯ 2 , . . .} such that x¯ 0 = x 0 , and there is a pair of points (above or below x 0 ) which provides the mesh for obtaining {x¯ 0 , x¯ 1 , x¯ 2 , . . .} as a decreasing/increasing standard sequence with carrier A. It is simple to see how—having fixed an essential event A, and a base x 0 which is non-extremal in the ordering on X (i.e. there are y, z ∈ X such that y x 0 z)— standard sequences can be used to measure the canonical utility index u of a biseparable preference (extending the scope of the method proposed by Wakker and Deneffe, 1996): One just needs to construct (increasing and decreasing) standard sequences with base x 0 and finer and finer mesh. In what follows we use standard sequences and cardinal symmetry to show that equality of the ui , i = 1, 2, can be verified without eliciting them. 10.B.2. Equality of utilities: Proof of Proposition 10.1 The proof of Proposition 10.1 builds on two lemmas. The first lemma, whose simple proof we omit, shows the following: Suppose that a pair of biseparable preferences are cardinally symmetric, then for fixed non-extremal x 0 and essential events A1 and A2 , the sets of the standard sequences (with respect to (x 0 , A1 ) and (x 0 , A2 ) respectively) of the orderings are “nested” into each other. Stating this lemma requires some terminology and notation: Given a standard sequence {x n } for preference relation i , we say that a sequence {y m } ⊆ X is a refinement of {x n } if it is itself a standard sequence, and it is such that y m = x n whenever m = kn for some k ∈ N. Two canonical utility indices are subject to a common normalization if they take identical values on two consequences x, y ∈ X such that x i y for both i. Finally, for the rest of this section: For each i = 1, 2, the carrier of any standard sequence for i is a fixed essential event Ai , and SQ(i , x 0 ) ⊆ X denotes the set of the points belonging to some standard sequence of i with base x 0 and carrier Ai .
Ambiguity made precise
237
Lemma 10.B.1. Suppose that 1 , 2 are as assumed in Proposition 10.1. Fix a non-extremal x 0 ∈ X. If 1 and 2 are cardinally symmetric, then the following holds: Either every standard sequence for ordering 1 is a refinement of a standard sequence for 2 , or every standard sequence for ordering 2 is a refinement of a standard sequence for 1 . Hence, SQ(1 , x 0 ) = SQ(2 , x 0 ) ≡ SQ(x 0 ). The second lemma shows that, because of cardinal symmetry, the result holds on SQ(x 0 ): Lemma 10.B.2. Suppose that 1 , 2 are as assumed in Proposition 10.1. If 1 and 2 are cardinally symmetric, then for any non-extremal x 0 ∈ X and any common normalization of the two indices, u1 (x) = u2 (x) for every x ∈ SQ(x 0 ). Proof. Fix a non-extremal x 0 . Suppose that x belongs to an increasing standard sequence for i , {x n }. Since the relations are cardinally symmetric, by Lemma 10.B.1 it is w.l.o.g. (taking refinements if necessary) to take the sequence to be standard for both orderings. That is, there are v∗ , v ∗ , w∗ , w∗ ∈ X such that v ∗ 1 v∗ , w ∗ 2 w∗ and for n 0, x n+1 A1 v∗ ∼1 x n A1 v ∗ , and analogously for 2 (with w replacing v). Moreover, there is n 0 such that x = x n . Choose x m for some m > n, and take positive affine transformations of the two canonical utility functions so as to obtain u1 (x 0 ) = u2 (x 0 ) = 0 and u1 (x m ) = u2 (x m ) = 1. All points in the sequence are evenly spaced for both preferences (cf. Equation (10.B.3)). Hence we have u1 (x n ) = u2 (x n ) = n/m. The case in which x belongs to a decreasing standard sequence is treated symmetrically. Finally, we have the immediate observation that if u1 (x) = u2 (x) for one common normalization, the equality holds for every common normalization. Proof of Proposition 10.1. The “if” part follows immediately from the canonical representation. We now prove the “only if.” Start by fixing a non-extremal x 0 and adding a constant to both indices, so that u1 (x 0 ) = u2 (x 0 ) = 0. Suppose that (after this transformation) there is x ∈ X such that u1 (x) = u2 (x). By relabelling if necessary, assume that u1 (x) = α > β = u2 (x). There are different cases to consider, depending on where α and β are located. Suppose first that β 0. Choose v ∗ ∈ X such that x 0 1 v ∗ and further transform the utilities so that u¯ 1 (v ∗ ) = u¯ 2 (v ∗ ) = −1, to obtain u¯ 1 (x) = α¯ > β¯ = u¯ 2 (x). Choose ε > 0 such that α¯ − β¯ > ε. By the connectedness of the range of each ui and Lemma 10.B.1, there are v∗ , w∗ ∈ X such that (v∗ , v ∗ ) and (w∗ , v ∗ ) generate the same standard sequence {x n } and u¯ 1 (x n+1 ) − u¯ 1 (x n ) = u¯ 2 (x n+1 ) − u¯ 2 (x n ) < ε. So the “length” of the utility interval between each element in the increasing ¯ We also proved standard sequence is smaller than the distance between α¯ and β.
238
Paolo Ghirardato and Massimo Marinacci
in Lemma 10.B.2 that for each element in the standard sequence, we have equality of the utilities (since we imposed a common normalization). Hence there must be ¯ α). n 0 such that u¯ 1 (x n ) = u¯ 2 (x n ) = γ ∈ (β, ¯ We then have u¯ 1 (x n ) > u¯ 1 (x) ⇐⇒ x n 1 x
and u¯ 2 (x n ) < u¯ 2 (x) ⇐⇒ x n ≺2 x,
which contradicts the assumption of ordinal equivalence. The case in which α 0 is treated symmetrically. If, finally, α > 0 > β then, using an argument similar to the one just presented, one can find x¯ ∈ X such that u1 (x) ¯ = u2 (x) ¯ ∈ (0, α) and obtain a similar contradiction. This shows that u1 (x) = u2 (x) for every x ∈ X.
Appendix C: Proofs for Sections 10.4–10.6 10.C.1. Section 10.4 Proof of Theorem 10.1. We first state without proof an immediate result: Lemma 10.C.1. Two preference relations 1 and 2 satisfying Equations (10.7) and (10.8) are ordinally equivalent. Given this lemma, if 1 and 2 have essential events the result follows immediately from Proposition 10.1. If, say, relation i does not have essential events, any ordinal transformations of ui is still a canonical utility. Since the two preferences are ordinally equivalent by the lemma, it is then w.l.o.g. to use uj (j = i) to represent both of them. Proof of Theorem 10.2. We first prove that D() ⊆ M(). Given a canonical representation V of with canonical utility u, suppose that P ∈ D(), and consider the relation induced by P and u. We want to show that is more ambiguity averse than . Since P ∈ D(), u(f ) dP V (f ) for all f ∈ F , so that for every x ∈ X and f ∈ F , u(x) u(f (s)) P (ds) =⇒ V (x) V (f ), S
where the implication follows from the definition u(x) = V (x) for all x ∈ X. This proves that (10.7) holds. Similarly one shows the validity of (10.8). Part (B) of Definition 10.5 is immediate: If and have essential events, then the result follows from Proposition 10.1. Hence ∈ R(), or in other words P ∈ M(). We now prove the opposite inclusion D() ⊇ M(). Suppose that P ∈ M(). Let be the benchmark preference corresponding to P , and let u be the canonical utility index of . Since is a benchmark for , we have for every x ∈ X and f ∈ F , u (x) u (f (s)) P (ds) =⇒ u(x) V (f ), (10.C.1) S
Ambiguity made precise
239
and the same with strict inequality. We have to show that P ∈ D(). By Theorem 10.1, it is w.l.o.g. to take u = u . Hence, (10.C.1) implies that u(f ) dP V (f ) for all f ∈ F , and so P ∈ D(). Proof of Corollary 10.1. By Theorem 10.2, M() = D(). Let P ∈ D(). For every A ∈ and x ∗ x∗ , consider the act f = x ∗ A x∗ . Normalizing u(x ∗ ) = 1 and u(x∗ ) = 0, we have P (A) = u(f (s)) P (ds) u(f (s)) ν(ds) = ν(A), S
S
and so P ∈ C(ν). This implies D() ⊆ C(ν). The converse inclusion is trivial, since P ∈ C(ν) implies u(f ) dP u(f ) dν for all f ∈ F . Proof of Corollary 10.2. We are done if we show that for all f , g ∈ F , u(f (s)) P (ds) min u(g(s)) P (ds). f g ⇐⇒ min P ∈D() S
P ∈D() S
(10.C.2) This follows from the fact that there exists a unique weak∗ -compact and convex set C representing . D() is clearly weak∗ -compact (so that the minimum in (10.C.2) is well defined) and convex. Hence, if (10.C.2) holds C = D(), and by Theorem 1 10.2, D() = M(). To prove (10.C.2), suppose there are f , g ∈ F such that min u(f ) dP min u(g) dP P ∈C
P ∈C
and
min
P ∈D()
u(f ) dP < min
P ∈D()
u(g) dP .
Let P ∗ ∈ arg min{ S u(f (s)) P (ds) : P ∈ D()}. Since C ⊆ D(), we have: min u(f (s)) P (ds) u(f (s)) P ∗ (ds) < P ∈C S
min
P ∈D() S
S
u(g(s)) P (ds) min
P ∈C S
u(g(s)) P (ds),
a contradiction. Similarly, one shows that there cannot be f , g ∈ F such that the preference based on D() prefers weakly f to g, while g f . This shows that Equation (10.C.2) holds, concluding the proof. Proof of Proposition 10.3. That every SEU preference is ambiguity neutral follows immediately from two applications of Theorem 10.1. As for the converse: If is both ambiguity averse and ambiguity loving, there are a SEU preference
240
Paolo Ghirardato and Massimo Marinacci
relation 1 (represented by probability P1 ) such that is more ambiguity averse than 1 , and a SEU preference relation 2 (represented by probability P2 ) which is more ambiguity averse than . Applying Definition 10.5 twice, we obtain that for every f ∈ F and x ∈ X, x 1 f ⇒ x 2 f
and
x >1 f ⇒ x >2 f .
We show that 1 and 2 are cardinally symmetric. This requires first showing that if 2 has an essential event, so must . Suppose that A ∈ is essential for 2 , so that for some x y (remember that and 1 and 2 are all ordinally equivalent), x >2 x A y >2 y. Using the contrapositive of (10.7), we then have x A y y. Since 2 is a SEU preference, Ac is also 2 -essential, similarly implying x Ac y y. Now, suppose that has no essential event. Because of the preferences we just derived, we must have both x ∼ x A y and x ∼ x Ac y. This is impossible since 1 ∈ R(), for the contrapositive of (10.8) then yields x A y 1 x, which implies P1 (A) = 1, and x Ac y 1 x, which implies P1 (A) = 0. This gives us a contradiction, so that must have an essential event if 2 does. Hence, 2 and have essential events, and they are cardinally symmetric by assumption. Similarly one shows that 1 and have essential events and are cardinally symmetric. It is now immediate to check that these facts imply that 1 and 2 are cardinally symmetric. We thus conclude that 2 is more ambiguity averse than 1 . Mimicking the last part of the proof of Theorem 10.2, we then show that then P1 P2 , which immediately implies P1 = P2 , so that 1 = 2 ≡ . Thus is both more and less ambiguity averse than , which immediately implies = . Proof of Theorem 10.3. Part (i) follows immediately along the lines of the proofs of Theorem 10.2 and Corollary 10.1. As for part (ii), it is similarly immediate to show that if 2 is more ambiguity averse than 1 , then C1 ⊆ D(2 ) and u1 ≈ u2 . We show the converse. Let V1 and V2 denote the canonical representations of 1 and 2 , and w.l.o.g. assume that u1 = u2 = u. Then C1 ⊆ D(2 ) implies that for every f ∈ F and every P ∈ C1 , V2 (f ) u(f ) dP . Hence, using the fact that 1 is MEU, we find V2 (f ) min u(f (s)) P (ds) = V1 (f ), P ∈C1 S
which immediately yields the desired result. 10.C.2. Section 10.5 Proof of Proposition 10.5. Let ∈ R() and set ρ(Ac ) = 1}. If A ∈ for all x ∈ X we have u(x) = P (A) ⇐⇒ u(x) = ρ(A) u(x) = P (Ac ) ⇐⇒ u(x) = ρ(Ac ),
≡ {A ∈ : ρ(A) +
Ambiguity made precise and so ρ(A) = P (A) and ρ(Ac ) = P (Ac ). This implies that A ∈ ⊆ . Now, if A ∈ we have ρ(A) = P (A)
and
ρ(Ac ) = P (Ac ).
241
, so that
(10.C.3)
In order to show that A ∈ , we need to show that any act measurable w.r.t. the partition {A, Ac } is in H . This follows from (10.C.3), as for every x, y ∈ X we have V (x A y) = V (x A y). Thus ⊆ , which concludes the proof. 10.C.3. Section 10.6 Proof of Proposition 10.6. Suppose, to the contrary, that ν agrees with (10.17). If Eq. (10.15) holds then P (R) = ν(R) and P (B, Y ) = ν(B, Y ) for all P ∈ C(ν), so that we have P (B, Y ) = ν(B, Y ) < ν(B, R) P (B, R). In turn, this implies P (Y ) < P (R), yielding ν(Y ) P (Y ) < P (R) = ν(R). Hence ν(Y ) < ν(R), contradicting (10.17). Proof of Proposition 10.7. Every ν which satisfies (10.18) is such that C(ν) = Ø. For, the measure P such that P (R) = P (B) = P (Y ) = 1/3 belongs to C(ν). This proves that all preferences satisfying (10.18) are ambiguity averse. As to the converse, let be ambiguity averse, that is, C(ν) = Ø. Let P ∈ C(ν). Assume first that ν(B) = ν(Y ) > ν(R). Since P (B) ν(B) and P (Y ) ν(Y ), P (B) + P (R) + P (Y ) > ν(B) + ν(R) + ν(Y ) > 1, a contradiction. Assume now ν(B, Y ) < ν(B, R) = ν(R, Y ). This implies P (B, Y ) < P (B, R) and P (B, Y ) < P (R, Y ), so that P (Y ) < P (R), P (B) < P (R), and P (B) + P (R) + P (Y ) < 1, a contradiction.
Acknowledgments An earlier version of this chapter was circulated with the title “Ambiguity Made Precise: A Comparative Foundation and Some Implications.” We thank Kim Border, Eddie Dekel, Itzhak Gilboa, Tony Kwasnica, Antonio Rangel, David Schmeidler, audiences at Caltech, Johns Hopkins, Northwestern, NYU, Rochester, UC-Irvine, Université Paris I, the TARK VII-Summer Micro Conference (Northwestern, July 1998), the 1999 RUD Workshop, and especially Simon Grant, Peter Klibanoff, Biung-Ghi Ju, Peter Wakker, and an anonymous referee for helpful comments and discussion. Our greatest debt of gratitude is however to Larry Epstein, who sparked our interest on this subject with his paper (Epstein (1999)) and stimulated it with many discussions. Marinacci gratefully acknowledges the financial support of MURST.
242
Paolo Ghirardato and Massimo Marinacci
Notes 1 Other widespread names are “uncertainty aversion” and “aversion to Knightian uncertainty.” We like to use “uncertainty” in its common meaning of any situation in which the consequences of the DM’s possible actions are not known at the time of choice. 2 A bet “on” an event is any binary act in which a better payoff (“win”) is received when the event obtains. 3 There are earlier papers that use a comparative approach for studying ambiguity attitude, but they do not use it as a basis for defining absolute notions. For example, Tversky and Wakker (1995). 4 See Appendix A for the definition of capacities, Choquet integrals, and some of their properties. 5 We use the symbols (and >) to denote SEU weak (and strict) preferences. 6 Such set is well-defined since it is trivially true that the union of any collection of sets satisfying (A) and (B) below also satisfies the two conditions. 7 See Zhang (1996) for a compelling urn example in which this happens. 8 For any pair of biseparable preferences which have essential events, this elicitation can be done without extraneous devices by using the tradeoff method briefly outlined in Appendix B. 9 We thank Peter Klibanoff for his substantial help in developing the ensuing discussion. 10 The richness condition is: For every F ⊆ E in and A ∈ such that A is as likely as E, there is B ⊆ A in such that B is as likely as F . Epstein remarks that richness of is not required for some of his results.
References F. I. Anscombe and R. J. Aumann (1963), A definition of subjective probability, Ann. Math. Stat. 34, 199–205. K. J. Arrow (1974), The theory of risk aversion, in “Essays in the Theory of Risk-Bearing,” Chap. 3, North-Holland, Amsterdam. R. Casadesus-Masanell, P. Klibanoff, and E. Ozdenoren (2000), Maxmin expected utility over Savage acts with a set of priors, J. Econ. Theory 92, 33–65. A. Chateauneuf and J. M. Tallon (1998), Diversification, convex preferences and non-empty core, mimeo, Université Paris I, July. G. Choquet (1953), Theory of capacities, Ann. Inst. Fourier (Grenoble) 5, 131–295. B. de Finetti (1952), Sulla preferibilitá, Giorn. Econ. 6, 3–27. D. Ellsberg (1961), Risk, ambiguity, and the Savage axioms, Quart. J. Econ. 75, 643–669. L. G. Epstein (1999), A definition of uncertainty aversion, Rev. Econ. Stud. 66, 579–608. (Reprinted as Chapter 9 in this volume.) L. G. Epstein and T. Wang (1994), Intertemporal asset pricing under Knightian uncertainty, Econometrica 62, 283–322. (Reprinted as Chapter 18 in this volume.) L. G. Epstein and J. Zhang (2001), Subjective probabilities on subjectively unambiguous events, Econometrica 69, 265–306. P. C. Fishburn (1993), The axioms and algebra of ambiguity, Theory Dec. 34, 119–137. P. Ghirardato and J. N. Katz (2000a), “Indecision Theory: Explaining Selective Abstention in Multiple Elections,” Social Science Working Paper 1106, Caltech, November. P. Ghirardato and M. Marinacci (2000b), “Risk, Ambiguity, and the Separation of Utility and Beliefs,” Social Science Working Paper 1085, Caltech, March (Revised: January 2001).
Ambiguity made precise
243
P. Ghirardato and M. Marinacci (2000), A subjective definition of ambiguity, Work in progress, Caltech and Università di Torino. I. Gilboa and D. Schmeidler (1989), Maxmin expected utility with a non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.) L. P. Hansen, T. Sargent, and T. D. Tallarini (1999), Robust permanent income and pricing, Rev. Econ. Stud. 66, 873–907. Y. Kannai (1992), The core and balancedness, in “Handbook of Game Theory” (R. J. Aumann and S. Hart, eds), pp. 355–395, North-Holland, Amsterdam. D. Kelsey and S. Nandeibam (1996), On the measurement of uncertainty aversion, mimeo, University of Birmingham, September. D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky (1971), “Foundations of Measurement: Additive and Polynomial Representations,” Vol. 1, Academic Press, San Diego. M. J. Machina and D. Schmeidler (1992), A more robust definition of subjective probability, Econometrica 60, 745–780. A. Montesano and F. Giovannoni (1996), Uncertainty aversion and aversion to increasing uncertainty, Theory Dec. 41, 133–148. S. Mukerji (1998), Ambiguity aversion and incompleteness of contractual form, Amer. Econ. Rev. 88, 1207–1231. (Reprinted as Chapter 14 in this volume.) K. Nehring (1999), Capacities and probabilistic beliefs: A precarious coexistence, Math. Soc. Sci. 38, 197–213. J. W. Pratt (1964), Risk aversion in the small and in the large, Econometrica 32, 122–136. L. J. Savage (1954), “The Foundations of Statistics,” Wiley, New York. D. Schmeidler (1989), Subjective probability and expected utility without additivity, Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.) A. Tversky and P. P. Wakker (1995), Risk attitudes and decision weights, Econometrica 63, 1255–1280. P. P. Wakker (1989), “Additive Representations of Preferences,” Kluwer, Dordrecht. P. P. Wakker and D. Deneffe (1996), Eliciting von Neumann–Morgenstern utilities when probabilities are distorted or unknown, Manage. Sci. 42, 1131–1150. M. E. Yaari (1969), Some remarks on measures of risk aversion and on their uses, J. Econ. Theory 1, 315–329. J. Zhang (1996), Subjective ambiguity, probability and capacity, mimeo, University of Toronto, October.
11 Stochastically independent randomization and uncertainty aversion Peter Klibanoff
11.1. Introduction An example seminal to interest in uncertainty (or ambiguity) aversion is Ellsberg’s (1961) “two-color” problem. There is a “known urn” which contains 50 red balls and 50 black balls, and an “unknown urn” which contains a mix of red and black balls, totaling 100, about which no information is given. Ellsberg observed (as did many afterwards, more carefully) that a substantial fraction of individuals were indifferent between the colors in both urns, but preferred to bet on either color in the “known urn” rather than the corresponding color in the “unknown urn.” This violates not only expected utility (EU), but probabilistically sophisticated behavior more generally. One contemporary criticism of the displayed behavior was put forward by Raiffa (1961) who pointed out that flipping a coin to decide which color to bet on in the unknown urn should be viewed as equivalent to betting on the “known” 50–50 urn. One can think of such preferences as displaying a preference for randomization. Jumping ahead to more recent work, there is a burgeoning literature attempting to model uncertainty (or ambiguity) aversion in decision makers using representations with nonadditive probabilities or sets of probabilities. Some of this work (e.g. Lo, 1996; Klibanoff, 1994) accepts this preference for mixture or randomization as a facet of uncertainty aversion, while other work (e.g. Dow and Werlang, 1994; Eichberger and Kelsey, 2000) does not. This has led to several papers, most directly Eichberger and Kelsey (1996), but also Ghirardato (1997) and Sarin and Wakker (1992), related to this difference. In particular, all three papers observe that the choice of a “one-stage” or Savage model as opposed to a “two-stage” or Anscombe–Aumann model can lead to different preferences when modeling uncertainty aversion. In Eichberger and Kelsey (1996) the authors set out to “show that while individuals with nonadditive beliefs may display a strict preference for randomization in an Anscombe–Aumann framework they will not do so in a Savage-style decision theory.”1
Klibanoff, P. Stochastically independent randomization and uncertainty aversion. Economic Theory 18, 605–620.
Stochastically independent randomization
245
This chapter was motivated in part by the intuition that the one-stage/two-stage modeling distinction is largely a red herring, at least as it relates to preference for randomization. In particular, while appreciating that there can be differences between the frameworks, one goal of this chapter is to relate these differences to violations of stochastic independence and to point out that they have essentially no role to play in the debate over preference for randomization in uncertainty aversion. In making this point, the related finding of the restrictiveness of Choquet expected utility (CEU) preferences in allowing for randomizing devices is key. An additional contribution of the chapter is to provide preference based conditions to describe a stochastically independent randomizing device in a nonBayesian environment. Section 11.2 sets out some preliminaries and notation. Section 11.3 describes two frameworks in which a randomizing device can be modeled. Section 11.4 provides the key preference conditions and contains the main results on the restrictiveness of CEU when stochastic independence is required and the relative flexibility of Maxmin expected utility (MMEU) with multiple priors. Section 11.5 concludes.
11.2. Preliminaries and notation We will consider two representations of preferences, each of which generalizes EU and allows for uncertainty aversion. The first model is CEU. CEU was axiomatized first in an Anscombe–Aumann framework by Schmeidler (1989), and then in a Savage framework by Gilboa (1987) and Sarin and Wakker (1992). In a Savage framework, but assuming a rich set of consequences and a finite state space, Wakker (1989), Nakamura (1990), and Chew and Karni (1994) have axiomatized CEU. The second model is MMEU with non-unique prior. MMEU was first axiomatized in an Anscombe–Aumann framework by Gilboa and Schmeidler (1989). In a Savage framework, but assuming a rich set of consequences and allowing a finite or infinite state space, MMEU has been axiomatized by Casadesus-Masanell et al. (2000b). Consider a finite set of states of the world S. Let X be a set of consequences. An act f is a function from S to X. Denote the set of acts by F . A function v : 2S → [0, 1] is a capacity or nonadditive probability if it satisfies, (i) v(∅) = 0, (ii) v(S) = 1, and (iii) A ⊆ B implies v(A) ≤ v(B). It is convex if, in addition, (iv) For all A, B ⊆ S, v(A) + v(B) ≤ v(A ∪ B) + v(A ∩ B). Now define the (finite) Choquet integral of a real-valued function a to be: a dv = α1 v(E1 ) +
n
αi [v(∪ij =1 Ej ) − v(∪i−1 j =1 Ej )],
i=2
where αi is the ith largest value that a takes on, and Ei = a −1 (αi ).
246
Peter Klibanoff
Let ! be a binary relation on acts, F , that represents (weak) preferences. A decision maker is said to have CEU preferences if there exists a utility function S u : X → / and a nonadditive probability v : 2 → / such that, for all f , g ∈ F , f ! g if and only if u ◦ f dv ≥ u ◦ g dv. CEU preferences are said to display uncertainty aversion if v is convex.2 A decision maker is said to have MMEU preferences if there exists a utility function u : X → / and a non-empty, closed and convex set B of additive probability measures on S such that, for all f , g ∈ F , f ! g if and only if minp∈B u ◦ f dp ≥ minp∈B u ◦ g dp. All MMEU preferences display uncertainty aversion.3 Finally, note that the set of MMEU preferences strictly contains the set of CEU preferences with convex capacities.
11.3. Modeling a randomizing device

Corresponding to the two standard frameworks for modeling uncertainty (Anscombe–Aumann and Savage) there are at least two alternative ways to model a randomizing device. In an Anscombe–Aumann setting, a randomizing device is incorporated in the structure of the consequence space. Specifically, the "consequences" X are often taken to be the set of all simple probability distributions over some more primitive set of outcomes, Z. In this setup, a randomization over two acts f and g with probabilities p and 1 − p respectively is modeled by an act h where h(s)(z) = pf(s)(z) + (1 − p)g(s)(z), for all s ∈ S, z ∈ Z. Observe that h is, indeed, a well-defined act because the set of simple probability distributions is closed under mixture. Returning to the "unknown urn" of the introduction, Table 11.1 shows the three acts (a) "bet on red," (b) "bet on black," and (c) "randomize 50–50 over betting on red or on black" as modeled in this setting.

Alternatively, consider a Savage-style setting with a finite state space (e.g. Wakker (1984), Nakamura (1990), or Gul (1992)). Here a convex combination of two elements of the consequence space X need not be an element of X (and need not even be defined). Therefore, to model a randomization, we may instead expand the original state space, S, by forming the cross product of S with the possible outcomes (or "states") of the randomizing device. For example, Table 11.2 shows the acts (a) "bet on red," (b) "bet on black," (c) "bet on red if heads, black if tails," and (d) "bet on black if heads, red if tails" in the case of the unknown urn with a coin used to randomize.

Table 11.1 Unknown urn with randomization in the consequence space (Anscombe–Aumann)

        R(ed)              B(lack)
(a)     $100               $0
(b)     $0                 $100
(c)     ½ $100 ⊕ ½ $0      ½ $100 ⊕ ½ $0
Table 11.2 Unknown urn with randomization in the state space only (Savage)

        R(ed), H(eads)    B(lack), H(eads)    R(ed), T(ails)    B(lack), T(ails)
(a)     $100              $0                  $100              $0
(b)     $0                $100                $0                $100
(c)     $100              $0                  $0                $100
(d)     $0                $100                $100              $0
In comparing the two models, observe that the Anscombe–Aumann setting builds in several key properties that a randomizing device should satisfy while the Savage setting does not. In particular, the probabilities attached to the outcomes of the randomizing device should be unambiguous and the device should be stochastically independent from the (rest of the) state space. Arguably these two properties capture the essence of what is meant by a randomizing device. Both properties are automatically satisfied in an Anscombe–Aumann setting. In a Savage setting, as we will see later, these properties require additional restrictions on preferences.4

Several recent papers (including Eichberger and Kelsey, 1996; Ghirardato, 1997; and Sarin and Wakker, 1992) have noted that CEU need not give identical results in the two frameworks. Specifically, they suggest that the choice of a one-stage (Savage) or two-stage (Anscombe–Aumann) model can lead to different behavior. To see this in the unknown urn example, consider the case where the decision maker's marginal capacity over the colors is v(R) = v(B) = 1/3. In the Anscombe–Aumann setting this is enough to pin down preferences as c ≻ a ∼ b (i.e. the Raiffa preferences, or preference for randomization). In the Savage setting, consider the capacity given by

v(R × {H, T}) = v(B × {H, T}) = 1/3,
v({R, B} × H) = v({R, B} × T) = 1/2,
v(R × H) = v(R × T) = v(B × H) = v(B × T) = 1/6,
v((R × H) ∪ (B × T)) = v((R × T) ∪ (B × H)) = 1/3,
v(any 3 states) = 2/3.
This capacity yields the preferences a ∼ b ∼ c ∼ d, and thus does not provide a preference for randomization as in the Anscombe–Aumann setting. Why does this occur despite the fact that the marginals are identical in the two cases and the product capacity is equal to the product of the marginals on all rectangles? Mathematically, as Ghirardato (1997) explains, the source is a failure of the usual Fubini Theorem to hold for Choquet integrals. Intuitively, however, it is not clear what is going “wrong” in the example.
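The indifference a ∼ b ∼ c ∼ d is easy to verify numerically. The sketch below (reusing the hypothetical choquet helper from Section 11.2, and written by me rather than taken from the chapter) encodes the capacity above and evaluates the four acts of Table 11.2, with u($100) = 1 and u($0) = 0.

S = [("R", "H"), ("B", "H"), ("R", "T"), ("B", "T")]

def v(event):
    """The Savage-setting capacity of this section, written out by cases."""
    e = frozenset(event)
    if len(e) in (0, 1, 3, 4):
        return {0: 0.0, 1: 1/6, 3: 2/3, 4: 1.0}[len(e)]
    if len({color for color, _ in e}) == 1:
        return 1/3          # R x {H,T} or B x {H,T}
    if len({coin for _, coin in e}) == 1:
        return 1/2          # {R,B} x H or {R,B} x T
    return 1/3              # the two diagonal events, e.g. (R x H) u (B x T)

acts = {
    "a": {s: 1.0 if s[0] == "R" else 0.0 for s in S},                     # bet on red
    "b": {s: 1.0 if s[0] == "B" else 0.0 for s in S},                     # bet on black
    "c": {s: 1.0 if (s[0] == "R") == (s[1] == "H") else 0.0 for s in S},  # red if H, black if T
    "d": {s: 1.0 if (s[0] == "B") == (s[1] == "H") else 0.0 for s in S},  # black if H, red if T
}
for name, act in acts.items():
    print(name, choquet(act, v))  # all four evaluate to 1/3

By contrast, in the Anscombe–Aumann model act c yields the constant lottery ½ $100 ⊕ ½ $0 in every state, worth 1/2 > 1/3, which is exactly the preference for randomization that disappears here.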
Table 11.3 Non-product weights for randomized act

           R, H     B, H     R, T     B, T
c          $100     $0       $0       $100
weights    1/6      1/3      1/3      1/6
To gain some insight, it is useful to examine the weights applied to each state when evaluating the randomized acts using the Choquet integral. For example, as Table 11.3 shows, "Bet on Red if Heads, Black if Tails" is evaluated using non-product weights. The fact that such non-product weights can be applied suggests that the CEU preferences with the capacity above reflect ambiguity not only about the color of the ball drawn from the urn but also about the correlation between the randomizing device and the color of the ball. This can also be seen by noting that v({R, B} × H) > v((R × H) ∪ (B × T)), in contrast to the equality one might expect if H and T are really produced by a symmetric, independent randomization. While such ambiguity is certainly possible, it runs directly counter to the stochastic independence we would expect of a randomizing device. In the next section, therefore, I propose conditions on preferences that ensure this independence.
11.4. Stochastically independent randomization and preferences

Here I propose conditions on preferences that are designed to reflect two properties of a randomizing device: unambiguous probabilities and stochastic independence. These two properties are essential to what is meant by a randomizing device. Formally, consider preferences, ≿, over acts, f : S → X, on a finite product state space, S = S1 × S2 × · · · × SN. Let S−i denote the product of all ordinates other than i. Denote by FSi the subset of acts for which outcomes are determined entirely by the ith ordinate. This means that f ∈ FSi implies f(si, s−i) = f(si, ŝ−i) for all s−i, ŝ−i ∈ S−i and si ∈ Si. For f, g ∈ F and A ⊆ S, denote by f_A g the act which equals f(s) for s ∈ A and equals g(s) for s ∉ A. We now state some useful definitions concerning preferences.

Definition 11.1. ≿ satisfies solvability on Si if, for f ∈ FSi, x, y, z ∈ X and Ai ⊆ Si, x_{Ai×S−i} z ≿ f ≿ y_{Ai×S−i} z implies f ∼ w_{Ai×S−i} z for some w ∈ X.

Solvability should be seen as a joint richness condition on ≿ and X. It is satisfied in all axiomatizations of which I am aware of EU, CEU, or MMEU over Savage acts on a finite state space. For example, Nakamura (1991) imposes solvability directly, while Wakker (1984, 1989), Gul (1992) and Casadesus-Masanell et al. (2000a,b) ensure it is satisfied through topological assumptions on X and continuity assumptions on ≿.
Definition 11.2. ≿ satisfies expected utility (EU) on Si if ≿ restricted to FSi can be represented by expected utility where the utility function is unique up to a positive affine transformation and the probability measure on the set of all subsets of Si is unique.

While the definition is intentionally stated somewhat flexibly, it could easily be made more primitive/rigorous by assuming that preferences restricted to FSi satisfy the axioms in one of the existing axiomatizations of EU over Savage acts on a finite state space such as Wakker (1984), Nakamura (1991), Gul (1992), or Chew and Karni (1994). This definition is intended to capture the fact that the decision maker associates a unique probability distribution with Si and uses that distribution to weight outcomes. Note that the uniqueness requirement on the probability measure entails the existence of consequences x, y ∈ X such that x ≻ y (where preferences over X are derived from preferences over the associated constant-consequence acts in the usual way). Furthermore, any of the axiomatizations cited will imply solvability on Si as well.

Definition 11.3. si ∈ Si is null if f_{si×S−i} h ∼ g_{si×S−i} h for all f, g, h ∈ FSi.

Note that given EU, a state is null if and only if it is assigned zero probability.

Definition 11.4. Si is stochastically independent of S−i if, for all ŝ−i ∈ S−i, f ∈ FSi and w ∈ X,

f ∼ w    (11.1)

implies

f_{Si×ŝ−i} w ∼ w.    (11.2)
While this is formulated as a general definition of stochastic independence of an ordinate, this chapter will focus only on independence of a randomizing device. For this purpose, the main definition is the following:

Definition 11.5. Si is a stochastically independent randomizing device (SIRD) if Si is stochastically independent and contains at least two non-null states, and ≿ satisfies solvability and EU on Si.

This condition is designed to differentiate between EU ordinates that are stochastically independent from the rest of the state space and those that are dependent, while still allowing for possible uncertainty aversion on other ordinates. A useful way to understand this definition is as follows. There are several potential reasons why Equation (11.1) could hold while Equation (11.2) is violated. First, it might be that uncertainty aversion over Si leads to a different marginal probability measure over Si being used when evaluating the acts in (11.2) than when evaluating acts
in (11.1). This is ruled out by the assumption that preferences satisfy EU on Si. Second, it might be that the marginal over Si conditional on ŝ−i is different from the unconditional marginal over Si due to some stochastic dependence (or uncertainty about stochastic independence) between Si and S−i. Since we want to model an independent randomizing device, it is proper that the SIRD condition does not allow for such dependence.

Also supporting the idea that this definition reflects stochastic independence is the observation that if preferences are EU and nontrivial, then Si being a SIRD is equivalent to requiring that the representing probability measure be a product measure on Si × S−i. Note also that all of the results that follow will also hold true if we additionally impose that S−i is stochastically independent of Si (by switching the roles of i and −i in the definition of stochastic independence). Thus this concept shares the symmetry that a notion of stochastic independence should intuitively possess. In the next two sections, we develop the implications of SIRD for some common classes of uncertainty averse preferences.

11.4.1. MMEU and randomizing devices

This section develops the implications for MMEU preferences of one ordinate of the state space being a SIRD. MMEU will be found to be flexible enough to easily incorporate both a SIRD and uncertainty aversion.

Theorem 11.1. Assume ≿ are MMEU preferences satisfying solvability for some Si that contains at least two non-null states. Then the following are equivalent:

(i) Si is a SIRD;
(ii) There exists a probability measure p̂ on 2^Si such that all probability measures p in the closed, convex set of measures B of the MMEU representation satisfy p(s) = p̂(si) p(Si × s−i) for all s ∈ S.

Proof. ((i) ⇒ (ii)) We first show that all p ∈ B must have the same marginal on Si. Fix outcomes x, y ∈ X such that x ≻ y. EU on Si implies that ≿ restricted to FSi may be represented by Σ_{si∈Si} u(f(si)) p̂(si), where p̂ is the unique representing probability measure on 2^Si and u is unique up to a positive affine transformation. Using the MMEU representation of ≿ yields a utility function ũ and a set of measures B such that, for all f, g ∈ FSi,

min_{p∈B} Σ_{si∈Si} ũ(f(si)) p(si × S−i) ≥ min_{p∈B} Σ_{si∈Si} ũ(g(si)) p(si × S−i)
⇐⇒ Σ_{si∈Si} u(f(si)) p̂(si) ≥ Σ_{si∈Si} u(g(si)) p̂(si).
Without loss of generality, set u(x) = ũ(x) = 1 and u(y) = ũ(y) = 0. Using the fact that Si satisfies EU and solvability, combined with the MMEU representation, allows one to apply Nakamura (1990: lemma 3) and conclude that, given the normalization, the two utility functions must be the same (i.e. ũ(x) = u(x) for all x ∈ X). Therefore,

min_{p∈B} Σ_{si∈Si} u(f(si)) p(si × S−i) ≥ min_{p∈B} Σ_{si∈Si} u(g(si)) p(si × S−i)
⇐⇒ Σ_{si∈Si} u(f(si)) p̂(si) ≥ Σ_{si∈Si} u(g(si)) p̂(si).
Suppose there is some p′ ∈ B such that p′(si × S−i) ≠ p̂(si) for some si ∈ Si. Without loss of generality, assume that p̂(si) > p′(si × S−i) for an si ∈ Si. Consider the act f = x_{si×S−i} y. Solvability guarantees that there exists a z ∈ X such that z ∼ f. Thus,

u(z) = min_{p∈B} Σ_{si∈Si} u(f(si)) p(si × S−i) ≤ p′(si × S−i) < p̂(si) = Σ_{si∈Si} u(f(si)) p̂(si) = u(z),

a contradiction. Therefore, it must be that p ∈ B implies p(si × S−i) = p̂(si) for all si ∈ Si. In other words, all the marginals on Si agree.

Now we show that each p ∈ B is a product measure on Si × S−i. This part of the argument proceeds by contradiction. Suppose that p ∈ B does not imply that p(s) = p̂(si) p(Si × s−i) for all s ∈ S. Then there must exist a p0 ∈ B and an ŝ ∈ S such that

p0(ŝ) < p̂(ŝi) p0(Si × ŝ−i).    (11.3)
According to p0, the probability of ŝi and ŝ−i occurring together is less than the product of the respective marginal probabilities. We now show that this is inconsistent with the assumption that Si is stochastically independent. Consider the act f ∈ FSi such that f = x_{ŝi×S−i} y. Since x ≿ f ≿ y, solvability on Si implies there exists a w ∈ X such that w ∼ f. Observe that our normalization of u and the preference representation imply u(w) = p̂(ŝi)u(x) + (1 − p̂(ŝi))u(y) = p̂(ŝi). Define the act h = f_{Si×ŝ−i} w. By SIRD, f ∼ w implies h ∼ w. We have the following contradiction:

u(w) = min_{p∈B} Σ_{s∈S} u(h(s)) p(s)
     ≤ Σ_{s∈S} u(h(s)) p0(s)
     = Σ_{s∈Si×ŝ−i} u(f(s)) p0(s) + Σ_{s∉Si×ŝ−i} u(w) p0(s)
     = p0(ŝ) + u(w)(1 − p0(Si × ŝ−i))
     = u(w) + (p0(ŝ) − p̂(ŝi) p0(Si × ŝ−i))
     < u(w).

(Note that the last inequality follows from (11.3).) Therefore each p ∈ B must in fact be a product measure on Si × S−i and (ii) is proved.

((ii) ⇒ (i)) That (ii) implies EU is satisfied on Si is clear because p̂ is the unique representing probability measure. To see that (11.1) implies (11.2) is satisfied on Si, consider any f ∈ FSi and w ∈ X such that f ∼ w. Fix an ŝ−i ∈ S−i and define h = f_{Si×ŝ−i} w. By (ii),
min_{p∈B} Σ_{s∈S} u(h(s)) p(s) = min_{p∈B} Σ_{s−i∈S−i} p(Si × s−i) [ Σ_{si∈Si} u(h(si, s−i)) p̂(si) ]

and

min_{p∈B} Σ_{s∈S} u(w) p(s) = min_{p∈B} Σ_{s−i∈S−i} p(Si × s−i) [ Σ_{si∈Si} u(w) p̂(si) ].

Since f ∼ w,

Σ_{si∈Si} u(h(si, s−i)) p̂(si) = Σ_{si∈Si} u(w) p̂(si)    for all s−i ∈ S−i.

Therefore the two minimization problems are the same and h ∼ w.

Thus, we get quite a natural representation in the MMEU framework:

• All the marginals on the randomizing device agree, reflecting the lack of ambiguity about the device.
• All the measures in B are product measures on Si × S−i, reflecting the independence of the randomizing device.
Remark 11.1. It is not hard to see from the theorem that, in the Ellsberg "unknown urn" example, if "bet on red" is indifferent to "bet on black" then any MMEU preferences that are not EU and for which the coin is a SIRD lead to the "Raiffa" preference for randomization. As a concrete example, consider the MMEU preferences with set of measures

{ p | p(R × H) = p(R × T) = ½x, p(B × H) = p(B × T) = ½(1 − x), 1/3 ≤ x ≤ 2/3 }.

By the theorem, these preferences make {H, T} a SIRD and it is easy to verify that they exhibit the "Raiffa" preference for randomization.
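A quick numerical check of this example, as a sketch: the helper mmeu and the grid over x are my illustrative choices, with the grid standing in for the interval [1/3, 2/3] of urn compositions.

def mmeu(act, priors):
    """Maxmin expected utility: the worst expected payoff over the set of priors."""
    return min(sum(p[s] * act[s] for s in act) for p in priors)

xs = [1/3 + k * (1/3) / 100 for k in range(101)]          # x in [1/3, 2/3]
B = [{("R", "H"): x / 2, ("R", "T"): x / 2,
      ("B", "H"): (1 - x) / 2, ("B", "T"): (1 - x) / 2} for x in xs]

states = [("R", "H"), ("B", "H"), ("R", "T"), ("B", "T")]
bet_red   = {s: 1.0 if s[0] == "R" else 0.0 for s in states}
randomize = {s: 1.0 if (s[0] == "R") == (s[1] == "H") else 0.0 for s in states}
print(mmeu(bet_red, B))    # 1/3: the worst urn composition is x = 1/3
print(mmeu(randomize, B))  # 1/2: every prior in B gives the hedged act exactly 1/2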
Remark 11.2. The set of product measures that emerges from the MMEU characterization is consistent with a notion of independent product of two sets of measures proposed by Gilboa and Schmeidler (1989). Specifically, the set B is trivially the independent product (in their sense) of the (unique) marginal on Si and the set of marginals on S−i used in representing preferences over FS−i. It is worth noting that no purely preference-based justification for their broader notion (when neither of the sets is a singleton) is known.

11.4.2. CEU, uncertainty aversion, and randomizing devices

This section examines uncertainty averse CEU preferences on a product state space where one of the ordinates is assumed to be a candidate randomizing device. In stark contrast to the results of the previous section, this class is shown to include only EU preferences. This suggests that CEU preferences with a convex capacity are incapable of modeling both a randomizing device and uncertainty aversion simultaneously.

Theorem 11.2. If CEU preferences ≿ display uncertainty aversion and, for some i, Si is a SIRD, then ≿ must be EU preferences.

Proof. Recall that the state space is S = S1 × S2 × · · · × SN. Without loss of generality, let S1 be a SIRD. Uncertainty aversion implies that the capacity v in the CEU representation is convex. The core of a capacity v is the set of probability measures that pointwise dominate v (i.e. {p | p(A) ≥ v(A) for all A ⊆ S; p a probability measure}). Any CEU preferences with a convex v are also MMEU preferences with the set of measures B equal to the core of v (Schmeidler, 1986). It follows that v(A) = min_{p∈core(v)} p(A) for all A ⊆ S (i.e. v is the lower envelope of its core). Since preferences are MMEU and S1 is a SIRD, Theorem 11.1 implies that there exists a probability measure p̂ on 2^S1 such that all probability measures p in the core of v satisfy p(s) = p̂(s1) p(S1 × s−1) for all s ∈ S. Thus the core of v must be of a very special form. The remainder of the proof is devoted to showing that convexity of v and a core of this form are only compatible when preferences are EU.

First I derive a key equality implied by convexity together with the form of the core. To this end, fix any s1 ∈ S1 and A−1, B−1 ⊆ S−1.
Denote the complement of s−1 relative to S−1 by s−1^c and the complement of s1 relative to S1 by s1^c. Define the sets C = s1 × S−1 and D = (s1 × A−1) ∪ (s1^c × B−1). Convexity of v implies that
v(C) + v(D) ≤ v(C ∪ D) + v(C ∩ D).

Using the structure of the core of v and the fact that v is the lower envelope of its core yields the opposite inequality:

v(C) + v(D) = p̂(s1) + min_{p∈core(v)} [ p̂(s1) p(S1 × A−1) + (1 − p̂(s1)) p(S1 × B−1) ]
            ≥ p̂(s1) + p̂(s1) min_{p∈core(v)} p(S1 × A−1) + (1 − p̂(s1)) min_{p∈core(v)} p(S1 × B−1)
            = v(C ∪ D) + v(C ∩ D).

Combining the two inequalities, it must be that, for all s1 ∈ S1 and all A−1, B−1 ⊆ S−1,

min_{p∈core(v)} [ p̂(s1) p(S1 × A−1) + (1 − p̂(s1)) p(S1 × B−1) ]
    = p̂(s1) min_{p∈core(v)} p(S1 × A−1) + (1 − p̂(s1)) min_{p∈core(v)} p(S1 × B−1).    (11.4)
Now, using Equation (11.4), an argument by contradiction shows that the core of v cannot contain more than one measure. Suppose core(v) contains more than one probability measure. Then there exists an s̄ ∈ S such that arg min_{p∈core(v)} p(s̄) ⊂ core(v). Since all the measures in the core are of the form p(s) = p̂(s1) p(S1 × s−1), it must be that arg min_{p∈core(v)} p(S1 × s̄−1) = arg min_{p∈core(v)} p(s̄). Since p(S1 × s̄−1^c) = 1 − p(S1 × s̄−1) for any p ∈ core(v),

arg min_{p∈core(v)} p(S1 × s̄−1) ∩ arg min_{p∈core(v)} p(S1 × s̄−1^c) = Ø.

Thus, for any non-null s1 ∈ S1,

min_{p∈core(v)} [ p̂(s1) p(S1 × s̄−1) + (1 − p̂(s1)) p(S1 × s̄−1^c) ] > p̂(s1) min_{p∈core(v)} p(S1 × s̄−1) + (1 − p̂(s1)) min_{p∈core(v)} p(S1 × s̄−1^c),

in violation of Equation (11.4). Therefore, the core of v must be a singleton and, since v is the lower envelope of its core, v must be a probability measure and preferences are EU.
Remark 11.3. This theorem shows that CEU with a convex capacity is a very restrictive class of preferences in a Savage-like setting. In particular, a decision maker with such preferences must be either uncertainty neutral (i.e. an EU maximizer) or must not view any ordinate of the state space as a stochastically independent randomizing device. Note that this fact is disguised in an Anscombe–Aumann setting because there the randomizing device is built into the outcome space and thus automatically separated from the uncertainty over the rest of the world.

Remark 11.4. The theorem allows us to better understand the result of Eichberger and Kelsey (1996), who find that convexity of v, a symmetric additive marginal on S1, and a requirement that relabeling the states in S1 does not affect preference, together imply no preference for randomization. The result shown here makes clear that the lack of preference for randomization in their paper comes from the fact that decision makers having preferences in this class (with v somewhere strictly convex) cannot act as if the randomizing device is stochastically independent in the sense of SIRD. In other words, the uncertainty averse preferences they consider rule out a priori the possibility of a stochastically independent device and thus of true randomization. In this light, their result arises because all of the non-EU preferences they consider force a range of possible correlations (which are then viewed pessimistically, since they are another source of uncertainty) between the device and the rest of the state space. Once they admit preferences like MMEU, which, as shown earlier, can reflect a proper randomizing device (a SIRD) as well as uncertainty aversion, preference for randomization reappears.

Remark 11.5. If convexity of v is replaced by the weaker requirement that v be balanced (or, equivalently, that the core of v be nonempty), as advocated by Ghirardato and Marinacci (1998), preferences do not collapse to EU. For example, if the capacity used in Section 11.3 is modified by setting v((R × H) ∪ (B × T)) = v((R × T) ∪ (B × H)) = 1/2 rather than 1/3, the resulting preferences make {H, T} a SIRD and are not EU. This capacity has a nonempty core, but is not convex. Note that these preferences still display a preference for randomization. To the extent that one is willing to accept this weaker characterization of uncertainty aversion and wants to use CEU preferences in a Savage-like setting, these findings suggest that the class of capacities with nonempty cores that are not convex may be of particular interest.
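The claims in Remark 11.5 can be checked by brute force. In the sketch below (again mine, not from the original text), v_mod is the Section 11.3 capacity with the two diagonal events raised to 1/2; the code confirms that convexity fails, that the uniform measure lies in the core (so the core is nonempty), and that the randomized act is now worth 1/2 > 1/3.

from itertools import combinations

S = [("R", "H"), ("B", "H"), ("R", "T"), ("B", "T")]

def v_mod(event):
    e = frozenset(event)
    if len(e) in (0, 1, 3, 4):
        return {0: 0.0, 1: 1/6, 3: 2/3, 4: 1.0}[len(e)]
    # pairs: the two color events keep 1/3; coin events and diagonals are 1/2
    return 1/3 if len({color for color, _ in e}) == 1 else 1/2

events = [frozenset(c) for k in range(5) for c in combinations(S, k)]
convex = all(v_mod(A) + v_mod(B) <= v_mod(A | B) + v_mod(A & B) + 1e-12
             for A in events for B in events)
core_has_uniform = all(len(E) / 4 >= v_mod(E) - 1e-12 for E in events)
print(convex)            # False: e.g. A = (RxH) u (BxT) and B = {R,B} x H violate convexity
print(core_has_uniform)  # True: the uniform measure dominates v_mod, so the core is nonempty
print(v_mod({("R", "H"), ("B", "T")}))  # 0.5: the randomized act's Choquet value, > 1/3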
11.4.3. Further discussion of the SIRD condition

The key to these results is the definition of a SIRD, in particular the assumption that Equation (11.1) implies Equation (11.2). I argued above that given preferences satisfying EU on Si and given the restriction of the acts in (11.1) to be Si-measurable, it is quite natural to accept that Equation (11.1) implies Equation (11.2) as reflecting the stochastic independence of Si from the rest of the state space.
It is worth elaborating a bit on why SIRD is appropriate for a randomizing device. Since stochastic independence does concern independence, and uncertainty aversion fundamentally involves violations of the independence axiom/sure-thing principle of subjective EU theory, it is fair to ask whether imposing SIRD unnecessarily restricts uncertainty aversion. Does SIRD confound stochastic independence with the violations of independence inherent in uncertainty aversion? Theorem 11.1 answers this question in the negative and suggests that uncertainty aversion is not restricted at all by imposing SIRD. Specifically, any MMEU preferences5 over FS−i are compatible with Si being a SIRD. Notice that it is exactly and only uncertainty aversion over S−i that is unrestricted. This is appropriate, since any other uncertainty aversion must be either over Si (ruled out by EU) or over the correlation between Si and S−i (incompatible with the presumption of stochastic independence). There simply is nothing else to be uncertain about. It seems that SIRD strikes a reasonable balance – enforcing stochastic independence of a randomizing device while allowing uncertainty aversion on the other ordinates of the state space.
11.5. Conclusion

This chapter has provided preference-based conditions that a randomizing device should satisfy. When these conditions are applied to the class of CEU preferences with convex capacities in a product state space model, a collapse to EU results. This does not occur with MMEU preferences in the same setting. In particular, it appears that some previous results on the absence of preference for randomization were driven not by some deep difference between Anscombe–Aumann and Savage style models as they relate to uncertainty aversion, but by the restrictiveness, as it relates to stochastic independence, of the CEU functional form with a convex capacity, which is exacerbated in Savage style models. When stochastic independence is properly accounted for, preference for randomization by uncertainty averse decision makers arises in both one- and two-stage models.

To my knowledge, Blume et al. (1991) are the only others to have developed a preference axiom for stochastic independence. Their work is in the context of preferences satisfying the decision-theoretic independence axiom. This leads their condition to be unsatisfactory in the setting of this chapter. In particular, their axiom asks more of conditional preferences than is reasonable in the presence of uncertainty aversion and does not need to address the consistency of conditional with unconditional preferences.
There have been several functional (i.e. not preference-axiom based) notions of stochastically independent product that have been proposed for the MMEU or CEU models. For the case where one marginal is additive in the MMEU model, as was mentioned following Theorem 11.1, the results of the approach taken here agree with the notion proposed in Gilboa and Schmeidler (1989). Approaches specific to the CEU model have been suggested by Ghirardato (1997) and Hendon et al. (1996). If one marginal is additive and the product capacity is convex (as in Theorem 11.2), these approaches are weaker than the one advocated here. Specifically, preferences that are independent in the sense of Ghirardato (1997) or Hendon et al. (1996) may violate SIRD. Some other recent work on shortcomings of the CEU model in capturing probabilistic features is Nehring (1999). An analysis relating separability of events in the CEU model to EU is given in Sarin and Wakker (1997). In the context of inequality measurement under uncertainty, Ben-Porath et al. (1997) advocate MMEU type functionals and show that they are closed under iterated application while CEU functionals are not. Differences between CEU and MMEU are also discussed in Klibanoff (2001) and Ghirardato et al. (1996).

Any discussion of behavior, such as preference for randomization, that departs from what is considered standard raises some natural questions. First, descriptively, do actual decision makers behave in this way? Unfortunately, there are no studies that I am aware of that examine this issue. To do so properly would require: (1) taking some device like a coin and verifying that the decision maker viewed it as a SIRD; and (2) making it explicit to the decision maker that it is possible to choose acts that depend not only on the main feature of interest (e.g. the color of the ball drawn) but simultaneously on the realization of the coin flip. It is worth noting that many standard Ellsberg-style experiments do not offer an opportunity to examine preference for randomization because they tend to ask only questions such as "Do you prefer betting on red (black) in urn I or red (black) in urn II?" rather than allowing a fuller range of choices that provide a role for randomization.

Second, normatively, is the preference for randomization described here "reasonable" or "rational" behavior, or is it normatively unacceptable? In examples where uncertainty averse behavior seems reasonable, I find preference for randomization to be just as reasonable. Randomization acts to limit the negative influence of uncertainty on expected payoffs. In a nutshell, if one is afraid that the distribution over colors will be an unfavorable one, if one does not suffer this fear regarding the outcome of a coin flip, and if one is sure that the realization of the coin is independent, then the fact that any joint distribution must respect this independence limits the extent to which acts that pay based on the coin as well as the color ("randomizations") can be hurt by uncertainty over the colors.

This chapter has shown that to reject this argument, one must either (1) reject uncertainty aversion as defined here or (2) reject the possibility of committing to acts that are contingent on a SIRD (i.e. reject the static, Savage-like model of independent randomization). To argue the former, as in Raiffa (1961), one may invoke reasoning based on the decision-theoretic independence axiom/sure-thing principle to reject Ellsberg-type behavior as irrational. However, at the very least, the normative force of the independence axiom/sure-thing principle is a topic on which there are a wide range of opinions. Arguments relying on an inability to commit to a randomized action bring in an explicit dynamic component that is beyond the scope of this chapter to fully address. Such arguments are not, in my view, particularly satisfying, since they leave open the question of why such acts would not be introduced, by third parties if necessary, given that the decision maker desires them.
Finally, it is important to emphasize that, although the language of this chapter has been in terms of independent randomization, fundamentally a SIRD is simply an ordinate of a product state space over which preferences are EU and which is viewed as stochastically independent of the rest of the state space. In many situations where an individual faces a number of uncertainties, it may be useful to be able to assume that the individual is EU on some dimensions but uncertainty averse on others, and that the EU facets of the uncertainty are stochastically independent of others. In this regard, Theorem 11.2 has shown that CEU with a convex capacity is not an appropriate class of preferences, while, by Theorem 11.1, MMEU preferences are capable of reflecting these features.
Acknowledgments Previous versions were circulated with the title “Stochastic Independence and Uncertainty Aversion.” I thank Massimo Marinacci, Paolo Ghirardato, Bart Lipman, Ed Schlee, Peter Wakker, Drew Fudenberg, an anonymous referee and especially Eddie Dekel for helpful discussions and comments and also thank a number of seminar audiences.
Notes

1 Eichberger and Kelsey, 1996, Abstract.
2 This characterization of uncertainty aversion for the CEU model stems from an axiom of Schmeidler's (1989) of the same name in an Anscombe–Aumann framework. Casadesus-Masanell et al. (2000a,b) develop analogous axioms for a Savage-style framework. This notion of uncertainty aversion has been by far the most common in the literature. However, recently, Epstein (1999) and Ghirardato and Marinacci (1998) have proposed alternative notions of uncertainty aversion. In particular, for the case of CEU, Ghirardato and Marinacci's characterization requires the capacity v to be balanced. All convex capacities are balanced, but the converse is not true. Epstein's notion neither implies nor is implied by convexity of v. However, the reason for this is that he uses a set of preferences larger than EU as an uncertainty neutral benchmark. If (as is the philosophy in this chapter and in Ghirardato and Marinacci) EU is the uncertainty neutral benchmark, then Epstein's notion also requires v to be balanced. The reason these notions are weaker than Schmeidler's is that they are based solely on preference comparisons for which at least one of the two acts being compared is "unambiguous." In contrast, Schmeidler's approach relies, in addition, on comparisons between certain pairs of "ambiguous" acts that are implicitly ranked as more or less ambiguous by his axiom (or its Savage counterpart). See Casadesus-Masanell et al. (2000b) for a more detailed discussion along these lines.
3 This is true using the approach of either Schmeidler (1989), Casadesus-Masanell et al. (2000a,b), or Ghirardato and Marinacci (1998). Under the assumption that preferences over "unambiguous" acts are EU, it is true in Epstein's (1999) approach as well.
4 A randomizing device could be modeled in an Anscombe–Aumann setting by expanding the state space in exactly the same way as illustrated for the Savage setting. In this case, the same additional restrictions on preferences as in the latter setting would be required to ensure that the randomizing device was unambiguous and stochastically independent.
5 Recall that this includes any CEU preferences with a convex capacity as well.
References

Ben-Porath, E., Gilboa, I., Schmeidler, D. (1997) On the measurement of inequality under uncertainty. Journal of Economic Theory 75, 194–204. (Reprinted as Chapter 22 in this volume.)
Blume, L., Brandenburger, A., Dekel, E. (1991) Lexicographic probabilities and choice under uncertainty. Econometrica 59, 61–79.
Casadesus-Masanell, R., Klibanoff, P., Ozdenoren, E. (2000a) Maxmin expected utility through statewise combinations. Economics Letters 66, 49–54.
Casadesus-Masanell, R., Klibanoff, P., Ozdenoren, E. (2000b) Maxmin expected utility over Savage acts with a set of priors. Journal of Economic Theory 92, 35–65.
Chew, S. H., Karni, E. (1994) Choquet expected utility with a finite state space: commutativity and act-independence. Journal of Economic Theory 62, 469–479.
Dow, J., Werlang, S. (1994) Nash equilibrium under Knightian uncertainty: breaking down backward induction. Journal of Economic Theory 64, 305–324.
Eichberger, J., Kelsey, D. (1996) Uncertainty aversion and preference for randomization. Journal of Economic Theory 71, 31–43.
Eichberger, J., Kelsey, D. (2000) Nonadditive beliefs and strategic equilibria. Games and Economic Behavior 30, 183–215.
Ellsberg, D. (1961) Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics 75, 643–669.
Epstein, L. (1999) A definition of uncertainty aversion. Review of Economic Studies 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Ghirardato, P. (1997) On independence for non-additive measures with a Fubini theorem. Journal of Economic Theory 73, 261–291.
Ghirardato, P., Klibanoff, P., Marinacci, M. (1996) Additivity with multiple priors. Journal of Mathematical Economics 30, 405–420.
Ghirardato, P., Marinacci, M. (1998) Ambiguity made precise: a comparative foundation. Social Science Working Paper 1026, California Institute of Technology. (Reprinted as Chapter 10 in this volume.)
Gilboa, I. (1987) Expected utility theory with purely subjective non-additive probabilities. Journal of Mathematical Economics 16, 141–153. (Reprinted as Chapter 6 in this volume.)
Gilboa, I., Schmeidler, D. (1989) Maxmin expected utility with non-unique prior. Journal of Mathematical Economics 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Gul, F. (1992) Savage's theorem with a finite number of states. Journal of Economic Theory 57, 99–110.
Hendon, E., Jacobsen, H. J., Sloth, B., Tranaes, T. (1996) The product of capacities and belief functions. Mathematical Social Sciences 32, 95–108.
Klibanoff, P. (1994) Uncertainty, decision, and normal form games. Mimeo, Northwestern University.
Klibanoff, P. (2001) Characterizing uncertainty aversion through preference for mixtures. Social Choice and Welfare 18, 289–301.
Lo, K. C. (1996) Equilibrium in beliefs under uncertainty. Journal of Economic Theory 71, 443–484. (Reprinted as Chapter 20 in this volume.)
Nakamura, Y. (1990) Subjective expected utility with non-additive probabilities on finite state spaces. Journal of Economic Theory 51, 346–366.
Nehring, K. (1999) Capacities and probabilistic beliefs: a precarious coexistence. Mathematical Social Sciences 38, 197–213.
Raiffa, H. (1961) Risk, ambiguity, and the Savage axioms: comment. Quarterly Journal of Economics 75, 690–694.
Sarin, R., Wakker, P. (1992) A simple axiomatization of nonadditive expected utility. Econometrica 60, 1255–1272. (Reprinted as Chapter 7 in this volume.)
Sarin, R., Wakker, P. (1997) Dynamic choice and nonexpected utility. Journal of Risk and Uncertainty 17, 87–119.
Schmeidler, D. (1986) Integral representation without additivity. Proceedings of the American Mathematical Society 97, 255–261.
Schmeidler, D. (1989) Subjective probability and expected utility without additivity. Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Wakker, P. (1984) Cardinal coordinate independence for expected utility. Journal of Mathematical Psychology 28, 110–117.
Wakker, P. (1989) Continuous subjective expected utility with non-additive probabilities. Journal of Mathematical Economics 18, 1–27.
12 Decomposition and representation of coalitional games

Massimo Marinacci
12.1. Introduction

Let F be an algebra of subsets of a given space X, and V the set of all set functions ν on F such that ν(Ø) = 0. These set functions are called transferable utility coalitional games (games, for short). Gilboa and Schmeidler (1995) prove the existence of a one-to-one correspondence between the games in V that have finite composition norm (see Section 12.2) and the bounded finitely additive measures defined on an appropriate algebra of subsets of 2^F. The algebra is constructed as follows: for T ∈ F, define T∗ ⊆ F by T∗ = {S ∈ F : Ø ≠ S ⊆ T}. Denote 𝒯 = {T∗ : Ø ≠ T ∈ F}. Then 𝒜 is defined as the algebra of subsets of 2^F generated by 𝒯. On the basis of this representation, Gilboa and Schmeidler (1995) show that every game with finite composition norm can be decomposed into the difference of two totally monotone games (i.e., belief functions). Their work is related to Choquet (1953–1954) and Revuz (1955–1956), as discussed on p. 211 of their article.

In this chapter we first give a direct proof of the mentioned decomposition theorem, a proof based on the well-known Dempster–Shafer–Shapley Representation Theorem for finite games (see, e.g. Shapley, 1953; Shafer, 1976). On the basis of such a decomposition we obtain a one-to-one correspondence between the games in V that have finite composition norm and the bounded regular countably additive measures defined on an appropriate Borel σ-algebra. To construct this σ-algebra we proceed as follows. Let Ub be the set of all {0, 1}-valued convex games (a game ν is convex if ν(A) + ν(B) ≤ ν(A ∪ B) + ν(A ∩ B) for all A, B ∈ F). The set Ub can be endowed with a natural topology τv, as defined in Section 12.3. Let B(Ub) be the Borel σ-algebra on (Ub, τv). This is the σ-algebra we use to get the mentioned one-to-one correspondence.

A main advantage of this novel representation theorem is that the space of bounded regular countably additive measures on B(Ub) has much more structure than the space of finitely additive measures on the algebra 𝒜. Besides, unlike
Marinacci, M. (1996) Decomposition and representation of coalitional games, Mathematics of Operations Research 21, 1000–1015.
finitely additive measures, regular countably additive measures are widely studied in measure theory and are technically more convenient. For finite algebras, both the Gilboa–Schmeidler representation and the one proved here reduce to the Dempster–Shafer–Shapley Representation Theorem.

Finally, in this work we use a topological approach, while Gilboa and Schmeidler (1995) use an algebraic one. This is a secondary contribution of the chapter. In particular, with our topological approach it is also possible to reobtain their finitely additive representation. In sum, the approach taken in this chapter leads to novel results, without losing any of the results already proved with the different algebraic approach taken by Gilboa and Schmeidler (1995). This gives a unified perspective on this topic.

The chapter is organized as follows. Section 12.2 contains some preliminary material. In Section 12.3 some properties of a locally convex topological vector space on V are proved. In Section 12.4 a direct proof of the decomposition theorem is provided. In Section 12.5, which is the heart of the chapter, the main result is proved. As a consequence of this result, in Section 12.6 it is proved that every Choquet integral on F can be represented by a standard additive integral on B(Ub). In Section 12.7 it is shown how the finitely additive representation result of Gilboa and Schmeidler (1995) can be reobtained in our setup. Finally, in Section 12.8 some dual spaces of V are studied.
12.2. Preliminaries

A set function ν on the algebra F is said to be a game if ν(Ø) = 0. The symbol V denotes the set of all games defined on F. The space V becomes a vector space if we define addition and multiplication elementwise:

(ν1 + ν2)(A) = ν1(A) + ν2(A) and (αν)(A) = αν(A)

for all A ∈ F and α ∈ ℝ. A game ν is monotone if ν(A) ≤ ν(B) whenever A ⊆ B. A game ν is convex if ν(A) + ν(B) ≤ ν(A ∩ B) + ν(A ∪ B) for all A, B ∈ F. A game ν is normalized if ν(X) = 1. A game ν is totally monotone if it is non-negative and if for every n ≥ 2 and A1, . . . , An ∈ F we have:

ν(∪_{i=1}^{n} A_i) ≥ Σ_{{I : Ø≠I⊆{1,...,n}}} (−1)^{|I|+1} ν(∩_{i∈I} A_i).
For T ∈ F, the {0, 1}-valued game u_T ∈ V such that u_T(A) = 1 if and only if T ⊆ A is called a unanimity game. We can now present the Dempster–Shafer–Shapley Representation Theorem, which will play a central role in the sequel. Given a finite algebra F = {T1, . . . , Tn}, the atoms of F are the sets of the form T_1^{i_1} ∩ T_2^{i_2} ∩ · · · ∩ T_n^{i_n}, where i_j ∈ {0, 1} and T_j^0 = −T_j, T_j^1 = T_j (−T denotes the complement of T). We denote by Ω the set of all atoms of F. It holds that n ≤ |Ω| ≤ 2^n.
Theorem 12.1. Suppose F is finite. Then {u_T : Ø ≠ T ∈ F} is a linear basis for V. Given ν ∈ V, the unique coefficients {α_T^ν : Ø ≠ T ∈ F} satisfying

ν(A) = Σ_{Ø≠T∈F} α_T^ν u_T(A)    for all A ∈ F

are given by

α_T^ν = Σ_{S⊆T} (−1)^{|T|−|S|} ν(S) = ν(T) − Σ_{{I : Ø≠I⊆{1,...,k}}} (−1)^{|I|+1} ν(∩_{i∈I} T_i),

where T_i = T \ ω_i and T = ∪_{i=1}^{k} ω_i. Moreover, ν is totally monotone on F if and only if α_T^ν ≥ 0 for all nonempty T ∈ F.

For a finite algebra F, Gilboa and Schmeidler (1995) define the composition norm of ν ∈ V to be ‖ν‖ = Σ_{T∈F} |α_T^ν|. On infinite algebras, Gilboa and Schmeidler (1995) define the composition norm ‖·‖ of ν ∈ V in the following way. Given a subalgebra F0 ⊆ F, let ν|F0 denote the restriction of ν to F0. Then define:

‖ν‖ = sup{ ‖ν|F0‖ : F0 is a finite subalgebra of F }.

The function ‖·‖ is a norm, and in what follows ‖·‖ will always denote the composition norm. V^b will denote the set {ν ∈ V : ‖ν‖ is finite}. The pair (V^b, ‖·‖) is a Banach space (see Gilboa and Schmeidler, 1995: 204). Another important norm in transferable utility cooperative game theory is the variation norm ‖·‖_v, introduced by Aumann and Shapley (1974). This norm is defined by

‖ν‖_v = sup{ Σ_{i=0}^{n} |ν(A_{i+1}) − ν(A_i)| : Ø = A_0 ⊆ A_1 ⊆ · · · ⊆ A_{n+1} = X }.
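As an illustration, the following sketch (mine, not the chapter's) computes the coefficients α_T of Theorem 12.1 and the resulting composition norm for a game on the power set of a small finite space:

from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for k in range(len(s) + 1) for c in combinations(s, k)]

def mobius(v, X):
    """Theorem 12.1 coefficients: alpha_T = sum over S subset of T of (-1)^(|T|-|S|) v(S)."""
    return {T: sum((-1) ** (len(T) - len(S)) * v(S) for S in subsets(T))
            for T in subsets(X) if T}

def composition_norm(v, X):
    """||v|| = sum over nonempty T of |alpha_T|, on a finite algebra."""
    return sum(abs(a) for a in mobius(v, X).values())

X = {1, 2, 3}
v = lambda E: (len(E) / 3) ** 2           # a convex capacity on the power set of X
alpha = mobius(v, X)
print(all(a >= -1e-12 for a in alpha.values()))  # True: v is totally monotone
print(composition_norm(v, X))                     # 1.0 = v(X), as for any totally monotone game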
BV^b will denote the set {ν ∈ V : ‖ν‖_v is finite}. If ν is additive, then ‖·‖ coincides with ‖·‖_v, and both norms coincide with the standard variation norm for additive set functions (cf. Gilboa and Schmeidler, 1995: 201).

A filter p of F is a collection of sets in F such that

(1) X ∈ p,
(2) if A ∈ p, B ∈ F, and A ⊆ B, then B ∈ p,
(3) if A ∈ p and B ∈ p, then A ∩ B ∈ p.

Let p be a filter of F. Then

(1) p is a proper filter if Ø ∉ p, that is, p ≠ F,
(2) p is a principal filter if p = {B ∈ F : A ⊆ B} for some A ∈ F; p is then the principal filter generated by A,
(3) p is a free filter if it is not a principal filter.
In a finite algebra all filters are principal. This is no longer true in infinite algebras. For example, let X be an infinite space. A simple example of a free filter in the power set of X is the collection of all cofinite sets {A ⊆ X : − A is finite}. Every filter p can be directed by the binary relation ≥ defined by A ≥ B ⇐⇒ A ⊆ B
where A, B ∈ p.
Let f : X → ℝ be a real-valued function on X. Set

f_A = inf_{x∈A} f(x)    for each A ∈ p.

The pair (f_A, ≥) is a monotone increasing net. Using it we can define lim inf_p f as follows:

lim inf_p f ≡ lim_{A∈p} f_A.
If p is a principal filter generated by a set A ∈ F, then lim inf_p f = inf_{x∈A} f(x). This shows that lim inf_p f is the appropriate generalization of inf_{x∈A} f(x) needed to take care of free filters. We denote by 𝔽 the set of all bounded functions f : X → ℝ such that for every t ∈ ℝ the sets {x : f(x) > t} and {x : f(x) ≥ t} belong to F. For a monotone set function ν ∈ V and a function f ∈ 𝔽, the Choquet integral is defined as
∫ f dν = ∫_0^∞ ν({x : f(x) ≥ t}) dt + ∫_{−∞}^0 [ν({x : f(x) ≥ t}) − ν(X)] dt,

where the r.h.s. is a Riemann integral.
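As a sanity check on this definition, integrating against a unanimity game u_T (the principal-filter case) recovers the minimum over T, which is exactly the quantity that lim inf_p generalizes. A sketch, assuming a finite-state choquet() helper like the one sketched in Chapter 11:

T = frozenset({"s1", "s2"})
u_T = lambda E: 1.0 if T <= frozenset(E) else 0.0   # the unanimity game on T
f = {"s1": 3.0, "s2": 5.0, "s3": 1.0}
print(choquet(f, u_T))       # 3.0
print(min(f[s] for s in T))  # 3.0: Choquet integration against u_T is min over T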
12.3. A locally convex topological vector space on V

A natural topology on V has as a local base at ν0 ∈ V the sets of the form

B(ν0; A1, . . . , An; ε) = {ν ∈ V : |ν0(Ai) − ν(Ai)| < ε for 1 ≤ i ≤ n},

where Ai ∈ F for 1 ≤ i ≤ n, and ε > 0. We call this topology the V-topology of V. In the next proposition we claim that under this topology the vector space V becomes a locally convex and Hausdorff topological vector space. The proof is standard, and it is therefore omitted.

Proposition 12.1. Under the V-topology the vector space V is a locally convex and Hausdorff topological vector space.

The next proposition is rather important for the rest of the chapter and it is a simple extension of the Alaoglu Theorem to this setup.

Proposition 12.2. The set {ν ∈ V : ‖ν‖_v ≤ 1} is V-compact in V.
Proof. Let MO be the set of all monotone set functions on F. Set K = {ν ∈ MO : ν(X) ≤ 1}. Let I = Π_{A∈F} [0, 1]_A. By the Tychonoff Theorem, I is compact w.r.t. the product topology. We can define a map τ : K → I by τ(ν) = Π_{A∈F} ν(A). It is easy to check that τ is a homeomorphism between K endowed with the relative V-topology, and τ(K) endowed with the relative product topology. Therefore, to prove that K is V-compact it suffices to show that K is V-closed. Let να be a net in K that V-converges to a game ν. Since lim_α να(A) = ν(A) for all A ∈ F, it is easy to check that ν ∈ MO and ν(X) ≤ 1. We conclude that K is V-closed, and, therefore, V-compact.

Let ν ∈ BV^b. In Aumann and Shapley (1974: 28) it is proved that there exists a decomposition ν1, ν2 of ν such that ν1, ν2 ∈ MO and ‖ν‖_v = ‖ν1‖_v + ‖ν2‖_v. Therefore, {ν ∈ V : ‖ν‖_v ≤ 1} ⊆ K − K. Since V is a locally convex and Hausdorff topological vector space, the set K − K is V-compact (see e.g. Schaefer, 1966: I.5.2). Therefore, to prove that {ν ∈ V : ‖ν‖_v ≤ 1} is V-compact it suffices to show that it is V-closed. Let να be a net in {ν ∈ V : ‖ν‖_v ≤ 1} that V-converges to a game ν. For each α there exists a decomposition ν1,α, ν2,α such that both ν1,α and ν2,α are in K and ‖να‖_v = ‖ν1,α‖_v + ‖ν2,α‖_v. Since K is V-compact and να V-converges to ν, there exist two subnets ν1,β and ν2,β that V-converge, respectively, to two games ν1 and ν2 such that ν1, ν2 ∈ K and ν = ν1 − ν2. We can write:

‖ν‖_v = ‖ν1 − ν2‖_v ≤ ‖ν1‖_v + ‖ν2‖_v = ν1(X) + ν2(X) = lim_β {ν1,β(X) + ν2,β(X)} = lim_β {‖ν1,β‖_v + ‖ν2,β‖_v} = lim_β ‖νβ‖_v.

Therefore, ‖ν‖_v ≤ 1, as wanted.
266
Massimo Marinacci
F+ = {Ø = A ∈ F (A1 , . . . , An ) : αTν ≥ 0}, and F = {A ∈ F (A1 , . . ., An ) : A = Ø}. Set ν+ = (−αTν )uT . αTν uT and ν − = T ∈F+
T ∈F \F+
As observed by Gilboa and Schmeidler (1994: 56), we have ν0 (A) = ν + (A) − ν − (A) for all F (A1 , . . . , An ) and ν0|F (A1 ,...,An ) = ν + +ν − . Moreover, each unanimity game uT is totally monotone on the entire algebra F . Set ν = ν + − ν − . Clearly, ν = (ν + − ν − ) ∈ B(ν0 ; A1 , . . . , An ; ε). Since ν + is totally monotone, ν + = ν + (X). Therefore, ν + = + − − ν|F (A1 ,...,An ) . Similarly, ν = ν|F (A1 ,...,An ) . We now show that ν = − + ν + ν . Since ν(A) = ν0 (A) for all A ∈ F (A1 , . . . , An ), we have + − ν|F (A1 ,...,An ) = ν0|F (A1 ,...,An ) = ν|F (A1 ,...,An ) + ν|F (A1 ,...,An ) . + − + − Hence, ν ≥ ν|F (A1 ,...,An ) +ν|F (A1 ,...,An ) = ν +ν . On the other hand, since · is a norm, ν ≤ ν + + ν − . We conclude ν = ν + + ν − , as claimed. Using this equality we can write: + − + − ν0 ≥ ν0|F (A1 ,...,An ) = ν|F (A1 ,...,An ) + ν|F (A1 ,...,An ) = ν + ν . (12.1)
By what has just been proved, if we consider the family of all V-neighborhoods of ν0 as directed by the inclusion ⊆, there exists a net να that V-converges to ν0, and such that for all α we have: (i) να = να+ − να−; (ii) ‖να‖ = ‖να+‖ + ‖να−‖; (iii) ‖να‖ ≤ ‖ν0‖. Set M = ‖ν0‖ and U^M(X) = {ν ∈ V : ‖ν‖ ≤ M}. If ν ∈ TM, then ‖ν‖ = ‖ν‖_v = ν(X). Therefore, using Proposition 12.2, it is easy to check that the set TM ∩ U^M(X) is V-compact. Since να+ is a net in TM ∩ U^M(X), there exists a subnet νβ+ that V-converges to a game ν0+ ∈ TM ∩ U^M(X). Since the net να V-converges, this implies that also the subnet νβ− (which is equal to νβ+ − νβ) V-converges to a game ν0− ∈ TM ∩ U^M(X). Clearly, ν0 = lim_β (νβ+ − νβ−) = ν0+ − ν0−. Moreover:

‖ν0‖ ≥ lim_β ‖νβ‖ = lim_β {‖νβ+‖ + ‖νβ−‖} = lim_β {νβ+(X) + νβ−(X)} = ν0+(X) + ν0−(X) = ‖ν0+‖ + ‖ν0−‖,    (12.2)

where the first inequality follows from expression (12.1). On the other hand, ν0 = ν0+ − ν0− implies ‖ν0‖ ≤ ‖ν0+‖ + ‖ν0−‖. Together with (12.2), this implies
‖ν0‖ = ‖ν0+‖ + ‖ν0−‖. This proves the existence of the decomposition. As to uniqueness, to prove it we need the techniques used in the proof of Theorem 12.3. Consequently, uniqueness is proved in the proof of Theorem 12.3.

As to part (ii), we first show that U(X) is V-closed. Let να be a net in U(X) that V-converges to an element ν ∈ V^b. By part (i), there exists a decomposition να = να+ − να− with να+, να− ∈ U(X) ∩ TM and ‖να‖ = ‖να+‖ + ‖να−‖. Proceeding as before, we can prove that there exist two subnets νβ+ and νβ− that V-converge, respectively, to ν+ and ν−, where ν+, ν− ∈ TM ∩ U(X). We have:

‖ν‖ = ‖ν+ − ν−‖ ≤ ‖ν+‖ + ‖ν−‖ = ν+(X) + ν−(X) = lim_β {νβ+(X) + νβ−(X)} = lim_β {‖νβ+‖ + ‖νβ−‖} = lim_β ‖νβ‖.

Since ‖νβ‖ ≤ 1 for all β, we can conclude ‖ν‖ ≤ 1, so that ν ∈ U(X). This proves that U(X) is V-closed. On the other hand, from part (i) it follows that U(X) ⊆ [TM ∩ U(X)] − [TM ∩ U(X)]. Since the set TM ∩ U(X) is V-compact, also the set [TM ∩ U(X)] − [TM ∩ U(X)] is V-compact, and this implies that U(X) is V-compact, as desired.
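On a finite algebra, the decomposition of Theorem 12.2 is just a sign-split of the coefficients of Theorem 12.1. A minimal sketch, reusing the hypothetical mobius() helper from the Section 12.2 sketch:

X = {1, 2}
nu = lambda E: {0: 0.0, 1: 0.6, 2: 1.0}[len(E)]   # not totally monotone: alpha_X = -0.2
alpha = mobius(nu, X)
nu_plus  = lambda E: sum(a for T, a in alpha.items() if T <= frozenset(E) and a > 0)
nu_minus = lambda E: sum(-a for T, a in alpha.items() if T <= frozenset(E) and a < 0)
for E in [frozenset(), frozenset({1}), frozenset({2}), frozenset({1, 2})]:
    assert abs(nu(E) - (nu_plus(E) - nu_minus(E))) < 1e-12   # nu = nu+ - nu-
print(sum(abs(a) for a in alpha.values()))  # 1.4 = ||nu|| = ||nu+|| + ||nu-||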
12.5. Countably additive representation of games in V^b

We first introduce a class of simple games that will play a key role in the sequel. As observed in Shafer (1979), these games can already be found in Choquet (1953–1954).

Definition 12.1. Let p be a proper filter of F. A normalized game u_p(·) on F is called a filter game if u_p(A) = 1 whenever A ∈ p, and u_p(A) = 0 whenever A ∉ p. We denote by Ub the set of all filter games on F.

Unanimity games are a subclass of filter games. In particular, a filter game u_p is a unanimity game if and only if p is a principal filter. For, if p is a principal filter, that is, p = {B ∈ F : A ⊆ B} for some A ∈ F, then u_p coincides with the unanimity game u_A. In finite algebras all filters are principal, so that all filter games are unanimity games. This is no longer true in infinite algebras, where there are free filters to consider. For example, if P(X) is the power set of an infinite space, it is known that there are 2^{2^{|X|}} filters (see e.g. Balcar and Franek, 1982), and just 2^{|X|} of them are principal. In sum, filter games are the natural generalization of unanimity games to infinite algebras. We now list some very simple properties of filter games.

Proposition 12.3. (i) A game is {0, 1}-valued and convex if and only if it is a filter game. (ii) Every filter game is totally monotone. (iii) The set Ub is V-compact in V.
Remark. Of course, (i) and (ii) together imply that a game is {0, 1}-valued and totally monotone if and only if it is a filter game.

Proof. (i) "Only if" part: Let ν be a {0, 1}-valued and convex game. Then ν is monotone. Let p = {A : ν(A) = 1}. By monotonicity, if A ∈ p, then B ∈ p whenever A ⊆ B. Now, let A, B ∈ p. Then ν(A) = ν(B) = ν(A ∪ B) = 1. By convexity, 1 = ν(A) + ν(B) − ν(A ∪ B) ≤ ν(A ∩ B), and so ν(A ∩ B) = 1. We conclude that p is a filter, and ν a filter game. "If" part: Tedious, but obvious.

(ii) Let u_p be a filter game, and A1, . . . , An ∈ F. Let I∗ = {i : Ai ∈ p}. If I∗ is empty, the claim is obvious. Let I∗ ≠ Ø, with |I∗| = k. Let C_{k,i} be a binomial coefficient. Then:

Σ_{{I : Ø≠I⊆{1,...,n}}} (−1)^{|I|+1} u_p(∩_{i∈I} A_i) = Σ_{i=1}^{k} C_{k,i} (−1)^{i+1} = 1.

Since u_p(∪_{i=1}^{n} A_i) = 1, it follows that u_p is totally monotone.

(iii) Let να be a net in Ub that V-converges to an element ν ∈ V. By hypothesis, να(A) ∈ {0, 1} for all A ∈ F. Then ν(A) ∈ {0, 1} for all A ∈ F. Hence ν is {0, 1}-valued. It is easy to check that ν is also convex. Therefore, by part (i), ν is a filter game, as wanted.
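Part (i) is easy to confirm exhaustively on a small finite algebra, where every filter is principal and so every filter game should be a unanimity game. A sketch (the enumeration style is mine):

from itertools import combinations, product

X = [1, 2]
events = [frozenset(c) for k in range(3) for c in combinations(X, k)]

def is_filter_game(v):
    convex = all(v[A] + v[B] <= v[A | B] + v[A & B] for A in events for B in events)
    return convex and v[frozenset()] == 0 and v[frozenset(X)] == 1

for bits in product([0, 1], repeat=2):               # candidate values on the singletons
    v = {frozenset(): 0, frozenset({1}): bits[0],
         frozenset({2}): bits[1], frozenset(X): 1}
    if is_filter_game(v):
        p = [A for A in events if v[A] == 1]         # the filter {A : v(A) = 1}
        T = frozenset.intersection(*p)               # generator of the principal filter
        print(bits, "-> u_T with T =", set(T))       # three games: u_{1}, u_{2}, u_{1,2}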
Let B(Ub) be the Borel σ-algebra on the space Ub w.r.t. the V-topology, and let rca(Ub) be the set of all regular and bounded Borel measures on B(Ub). Moreover, set rca+(Ub) = {µ ∈ rca(Ub) : µ(A) ≥ 0 for all A ∈ B(Ub)} and rca1(Ub) = {µ ∈ rca+(Ub) : µ(Ub) = 1}. When the space rca(Ub) is endowed with the weak∗-topology, we write (rca(Ub), τw). Similarly, (V^b, τV) denotes the space V^b endowed with the V-topology. We recall that an isometric isomorphism between two normed spaces is a one-to-one continuous linear map which preserves the norm. As in the previous section, U(X) = {ν ∈ V : ‖ν‖ ≤ 1}.

Theorem 12.3. There is an isometric isomorphism J∗ between (V^b, ‖·‖) and (rca(Ub), ‖·‖) determined by the identity

ν(A) = ∫_{Ub} u_p(A) dµ    for all A ∈ F.    (12.3)
The correspondence J∗ is linear and isometric, that is, ‖µ‖ = ‖J∗(ν)‖ = ‖ν‖. Moreover, ν is totally monotone if and only if the corresponding µ is nonnegative. Finally, the map J∗ is a homeomorphism between (V^b ∩ U(X), τV) and (rca(Ub) ∩ U(X), τw).
In other words, we claim that for each ν ∈ V^b there is a unique µ ∈ rca(Ub) such that (12.3) holds; and, conversely, for each µ ∈ rca(Ub) there is a unique ν such that (12.3) holds.

Remark. The following proof is based on Theorem 12.2. The hard part is uniqueness. The proof of existence of the measure µ when ν is totally monotone is based on the well-known Dempster–Shafer–Shapley Representation Theorem for games defined on finite algebras (Theorem 12.1), and on a simple compactness argument (similar to those used in Choquet Representation Theory (see Phelps, 1965)). Other remarks on the existence part can be found after Corollary 12.1 below.

Proof. Let TM1 = {ν ∈ V : ν is totally monotone and ‖ν‖ = 1}. Let Una = {u_T : T ∈ F and T ≠ Ø} be the set of all unanimity games on F. Let ν ∈ TM1, and let B(ν; A1, . . . , An; ε) be a neighborhood of ν. Let F(A1, . . . , An) be the algebra generated by {A1, . . . , An}. By Theorem 12.1 there exist α_T^ν ≥ 0 such that

ν(A) = Σ_{Ø≠T∈F(A1,...,An)} α_T^ν u_T(A)

for all A ∈ F(A1, . . . , An), with α_T^ν ≥ 0 for all Ø ≠ T ∈ F(A1, . . . , An) and Σ_T α_T^ν = 1. Hence Σ_{Ø≠T∈F(A1,...,An)} α_T^ν u_T(·) belongs both to B(ν; A1, . . . , An; ε) and to co(Una). This implies TM1 = cl{co(Una)}. Clearly, Una ⊆ Ub. Moreover, we know by Proposition 12.3 (ii) that Ub ⊆ TM1. Therefore, TM1 = cl{co(Ub)}. Hence, there exists a net λβ contained in co(Ub) that V-converges to ν. For A ∈ F, let f_A : Ub → ℝ be defined by f_A(u_p) = u_p(A). The map f_A is V-continuous on Ub. By definition,

λβ(A) = Σ_{j∈Iβ} α_j u_{p_j}(A)

for all A ∈ F and some finite index set Iβ. Since Σ_{j∈Iβ} α_j = 1 and α_j ≥ 0, we can write

λβ(A) = ∫_{Ub} f_A dµβ,
where µβ(u_{p_j}) = α_j if j ∈ Iβ, and µβ(u_p) = 0 otherwise. Since, by Proposition 12.3 (iii), Ub is V-compact, it is known that rca1(Ub) is weak∗-compact. Then there exists a subnet µγ of µβ that weak∗-converges to some µ0 ∈ rca1(Ub). Since f_A is V-continuous, ∫_{Ub} f_A dµγ converges to ∫_{Ub} f_A dµ0 for all A ∈ F. Set νγ(A) = ∫_{Ub} f_A dµγ for all A ∈ F. The net νγ(A) converges to ν(A) for all A ∈ F because λβ V-converges to ν. Therefore, it follows that

ν(A) = ∫_{Ub} u_p(A) dµ0

for all A ∈ F, and we conclude that µ0 is the measure we were looking for. We have therefore proved the existence of a correspondence J1 between TM1 and rca1(Ub), where J1 is determined by ν(A) = ∫_{Ub} u_p(A) dµ for all A ∈ F. Clearly, ν(X) = ∫_{Ub} dµ = µ(Ub) because X ∈ p for every filter p in F. Hence, if µ ∈ J1(ν), then ‖ν‖ = ‖µ‖.

Let TM = {ν ∈ V^b : ν is totally monotone}. A simple argument now shows that there exists a correspondence J between TM and rca+(Ub), where J is determined by

ν(A) = ∫_{Ub} u_p(A) dµ

for all A ∈ F. Moreover, if µ ∈ J(ν), then ‖ν‖ = ‖µ‖. Let ν ∈ V^b. By Theorem 12.2 (existence part), there exist ν+, ν− ∈ TM such that ν = ν+ − ν− and ‖ν‖ = ‖ν+‖ + ‖ν−‖. Let µ+ ∈ J(ν+) and µ− ∈ J(ν−). Set µ = µ+ − µ−. Since {u_p : A ∈ p} ∈ B(Ub), we have:
Ub
Ub
for all A ∈ F . We claim that ν = µ. On one hand, since ν + = µ+ and ν − = µ− , we have: µ ≤ µ+ + µ− = ν + + ν − = ν. On the other hand, let µ1 and µ2 be the Jordan decomposition of the signed measure µ. We have µ = µ1 + µ2 = ν1 + ν2 ≥ ν, where νi (A) = Ub up (A) dµi for all A ∈ F and i = 1, 2. The inequality holds because ν = ν1 − ν2 by construction. We conclude that ν = µ, as wanted. We now prove that this µ is unique. Indeed, suppose to the contrary that there exist two signed regular Borel measures µ, µ such that µ = µ = ν and ν(A) = up (A) dµ = up (A)dµ , (12.4) Ub
Ub
for all A ∈ F . We first observe that µ = µ = ν < ∞ implies sup |µ(B)| < ∞ and sup |µ (B)| < ∞, that is, µ and µ are bounded. Next, define a map s from F into the power set of Ub by s(A) = {up : A ∈ p}. Let = {s(A) : A ∈ F }. Every set s(A) is V-closed, so that ⊆ B(Ub ).
From (12.4) it follows that $\mu$ and $\mu'$ coincide on $\Sigma$. The set $\Sigma$ is a $\pi$-class (i.e., it is closed under intersection) because, as is easy to check, $s(A) \cap s(B) = s(A \cap B)$. Then $\mu$ and $\mu'$ coincide on $A(\Sigma)$, the algebra generated by $\Sigma$. For, let $L = \{B \subseteq U_b : \mu(B) = \mu'(B)\}$. We check that $L$ is a $\lambda$-system (see, e.g., Billingsley, 1985: 36–38). Since $X \in p$ for every filter $p$ in $F$, (12.4) implies $U_b \in L$. Moreover, if $B \in L$, then $B^c \in L$ because $\mu$ and $\mu'$ are additive. Finally, suppose $\{B_i\}_{i=1}^\infty$ is an infinite sequence of pairwise disjoint subsets of $U_b$. If $B_i \in L$ for all $i \geq 1$, then $\bigcup_{i=1}^\infty B_i \in L$ because both $\mu$ and $\mu'$ are countably additive. We conclude that $L$ is a $\lambda$-system. By the $\pi$–$\lambda$ theorem (see, e.g., Billingsley, 1985: 37), this implies that $\mu$ and $\mu'$ coincide on $A(\Sigma)$, as wanted.

The algebra $A(\Sigma)$ is a base for a topology on $U_b$; let us denote this topology by $\tau_s$. Next we prove that $\tau_s$ coincides with the relative V-topology $\tau_v$ on $U_b$. Let $B(u_{p_0}; A_1, \dots, A_n; \varepsilon)$ be a neighborhood of $u_{p_0}$. Set $I_1 = \{i \in \{1, \dots, n\} : u_{p_0}(A_i) = 1\}$ and $I_2 = \{i \in \{1, \dots, n\} : u_{p_0}(A_i) = 0\}$, and
$$G = \left[\bigcap_{i \in I_1} s(A_i)\right] \cap \left[\bigcap_{i \in I_2} (s(A_i))^c\right].$$
Of course, $G \in A(\Sigma)$. Moreover, $u_{p_0} \in G$. If $u_p \in G$, then $u_p(A_i) = u_{p_0}(A_i)$ for all $1 \leq i \leq n$, so that $u_p \in B(u_{p_0}; A_1, \dots, A_n; \varepsilon)$. Therefore, we conclude $u_{p_0} \in G \subseteq B(u_{p_0}; A_1, \dots, A_n; \varepsilon)$, and this implies $\tau_v \subseteq \tau_s$ because the sets of the form $B(u_p; A_1, \dots, A_n; \varepsilon)$ are a local base for the V-topology.

As to the converse, let $G \in A(\Sigma)$ be a neighborhood of $u_{p_0}$. W.l.o.g. the set $G$ has the form $[\bigcap_{i \in I_1} s(A_i)] \cap [\bigcap_{i \in I_2} (s(A_i))^c]$. This follows from the usual procedure used for the construction of $A(\Sigma)$ from $\Sigma$ and from the fact that $u_{p_0} \in G$. Let us consider $B(u_{p_0}; A_1, \dots, A_n; \varepsilon)$. Clearly, $u_{p_0} \in B(u_{p_0}; A_1, \dots, A_n; \varepsilon)$. Let $u_p \in B(u_{p_0}; A_1, \dots, A_n; \varepsilon)$. Then $u_p(A_i) = u_{p_0}(A_i)$ for all $1 \leq i \leq n$. This implies $A_i \in p$ if $i \in I_1$ and $A_i \notin p$ if $i \in I_2$. Then $u_p \in s(A_i)$ if $i \in I_1$ and $u_p \notin s(A_i)$ if $i \in I_2$. Consequently, $u_p \in G$. Therefore, $u_{p_0} \in B(u_{p_0}; A_1, \dots, A_n; \varepsilon) \subseteq G$, and this implies $\tau_s \subseteq \tau_v$. We can conclude $\tau_s = \tau_v$, as desired.

Propositions 12.1 and 12.3 imply that $\tau_v$ is a compact Hausdorff topology. From $\tau_s = \tau_v$ it follows that $\tau_s$ is a compact Hausdorff topology as well. Since $\mu$ and $\mu'$ are regular Borel measures on a compact Hausdorff space, they are $\tau$-additive. For, let $\mu_1, \mu_2$ be the Jordan decomposition of $\mu$, and let $\{G_\alpha\}$ be a net of open sets such that $G_\alpha \subseteq G_\beta$ for $\alpha \leq \beta$. Both $\mu_1$ and $\mu_2$ are regular (see Dunford and Schwartz, 1957: 137).
Therefore, they are $\tau$-additive (see Gardner, 1981: 47), that is,
$$\lim_\alpha \mu_i(G_\alpha) = \mu_i\left(\bigcup_\alpha G_\alpha\right) \quad \text{for } i = 1, 2.$$
On the other hand, it holds that
$$\mu\left(\bigcup_\alpha G_\alpha\right) = \mu_1\left(\bigcup_\alpha G_\alpha\right) - \mu_2\left(\bigcup_\alpha G_\alpha\right) = \lim_\alpha \mu_1(G_\alpha) - \lim_\alpha \mu_2(G_\alpha) = \lim_\alpha [\mu_1(G_\alpha) - \mu_2(G_\alpha)] = \lim_\alpha \mu(G_\alpha),$$
and this proves that $\mu$ is $\tau$-additive. A similar argument holds for $\mu'$.

Now, let $G$ be an open set in $\tau_v$. Since $A(\Sigma)$ is a base for $\tau_s$, we have $G = \bigcup_{i \in I} G_i$, where $G_i \in A(\Sigma)$ for all $i \in I$. Let $|I|$ be the cardinal number of $I$. If $|I| \leq |\mathbb{N}|$, set $G_n^* = \bigcup_{i=1}^n G_i$. Since $A(\Sigma)$ is an algebra, $G_n^* \in A(\Sigma)$, so that countable additivity implies
$$\mu(G) = \lim_n \mu(G_n^*) = \lim_n \mu'(G_n^*) = \mu'(G).$$
If $|I|$ is any infinite cardinal, we can again order $\{G_i\}_{i \in I}$ so that $\{G_i\}_{i \in I} = \{G_\alpha : \alpha < |I|\}$ (Greek letters denote ordinal numbers). Define $G_\alpha^*$ as follows: (i) $G_1^* = G_1$; (ii) if $\alpha$ is not a limit ordinal, then set $G_\alpha^* = G_{\alpha-1}^* \cup G_\alpha$; (iii) if $\alpha$ is a limit ordinal, then set $G_\alpha^* = \bigcup_{\gamma < \alpha} G_\gamma^*$. To prove that $\mu(G) = \mu'(G)$ we can then use a transfinite induction argument on the increasing net of open sets $G_\alpha^*$, an argument based on $\tau$-additivity and on the fact that $\alpha < |I|$.

Of course, $\mu(F) = \mu'(F)$ for all closed subsets $F \subseteq U_b$. The class of all closed subsets is a $\pi$-class, and $B(U_b)$ is the $\sigma$-algebra generated by the closed sets. We have already proved that $L = \{B \subseteq U_b : \mu(B) = \mu'(B)\}$ is a $\lambda$-system. Therefore, by the $\pi$–$\lambda$ theorem, $B(U_b) \subseteq L$, as wanted. This completes the proof that $\mu = \mu'$.

This implies that there exists a unique decomposition of $\nu$ that satisfies the norm equation. For, suppose there exist two pairs $\nu_1, \nu_2$ and $\nu_1', \nu_2'$ such that
$$\nu(A) = \nu_1(A) - \nu_2(A) = \nu_1'(A) - \nu_2'(A) \quad \text{for all } A \in F,$$
and $\|\nu\| = \|\nu_1\| + \|\nu_2\| = \|\nu_1'\| + \|\nu_2'\|$.
Let $\mu$, $\mu_i$, and $\mu_i'$ be the unique measures on $B(U_b)$ associated to $\nu$, $\nu_i$, and $\nu_i'$ for $i = 1, 2$. It is easy to check that
$$\mu(s(A)) = \mu_1(s(A)) - \mu_2(s(A)) = \mu_1'(s(A)) - \mu_2'(s(A)) \quad \text{for all } A \in F,$$
and $\|\mu\| = \|\mu_1\| + \|\mu_2\| = \|\mu_1'\| + \|\mu_2'\|$. It is then easy to check that
$$\mu(A) = \mu_1(A) - \mu_2(A) = \mu_1'(A) - \mu_2'(A) \quad \text{for all } A \in A(\Sigma).$$
Using transfinite induction as we did before, this equality can be extended to all open sets in $U_b$, and it is then easy to see that
$$\mu(A) = \mu_1(A) - \mu_2(A) = \mu_1'(A) - \mu_2'(A) \quad \text{for all } A \in B(U_b).$$
But there is only one decomposition of $\mu$ on $B(U_b)$ that satisfies the norm equation, namely the Jordan decomposition. Therefore, $\mu_i = \mu_i'$ for $i = 1, 2$, and so $\nu_i = \nu_i'$ for $i = 1, 2$, as desired.

We have already defined a correspondence $J$ between $TM$ and $rca^+(U_b)$. By what has been proved, this correspondence is indeed a function, that is, $J(\nu)$ is a singleton for every $\nu \in TM$. Define a function $J^*$ on $V^b$ by $J^*(\nu) = J(\nu^+) - J(\nu^-)$, where $\nu^+, \nu^-$ is the unique decomposition of $\nu$ that satisfies the norm equation. Clearly $J^*(\nu) \in rca(U_b)$, and $J^*$ is onto. By now we know that $\mu = J^*(\nu)$ implies $\|\nu\| = \|\mu\|$. Therefore, we conclude that $J^*$ is an isometric isomorphism.

Finally, we show that $J^*$ is a homeomorphism between $(V^b \cap U(X), \tau_V)$ and $(rca(U_b) \cap U(X), \tau_w)$. Since $V^b \cap U(X)$ is V-compact and $J^*$ is a bijection, it suffices to show that $J^*$ is continuous on $V^b \cap U(X)$. Let $\nu_\alpha$ be a net in $V^b \cap U(X)$ that V-converges to an element $\nu \in V^b \cap U(X)$. Since $rca(U_b) \cap U(X)$ is weak*-compact, to show that $J^*(\nu_\alpha)$ weak*-converges to $J^*(\nu)$ it suffices to prove that every convergent subnet $J^*(\nu_\beta)$ of $J^*(\nu_\alpha)$ weak*-converges to $J^*(\nu)$. Suppose $\lim_\beta J^*(\nu_\beta) = J^*(\nu')$. Then
$$\nu(A) = \lim_\beta \nu_\beta(A) = \lim_\beta \int_{U_b} u_p(A) \, dJ^*(\nu_\beta) = \int_{U_b} u_p(A) \, dJ^*(\nu').$$
Since $J^*$ is bijective, this implies $\nu = \nu'$, as wanted.

As a simple corollary of Theorem 12.3, we can obtain the following interesting result, proved in a completely different way in Choquet (1953–1954). Let $TM$ be the set of all totally monotone games on $F$.

Corollary 12.1. A game is an extreme point of the convex set $\{\nu \in TM : \|\nu\| = 1\}$ if and only if it is a filter game.
Proof. It is well known that the Dirac measures are the extreme points of the set of all regular probability measures defined on the Borel $\sigma$-algebra of a compact Hausdorff space. Since $B(U_b)$ is such a $\sigma$-algebra, and since under the isomorphism of Theorem 12.3 the Dirac measure $\delta_{u_p}$ corresponds to the filter game $u_p$, a simple application of Theorem 12.3 proves the result.

As observed in Shafer (1979), using this result of Choquet, the existence part in Theorem 12.3 for totally monotone set functions can be obtained as a consequence of the celebrated Krein–Milman Theorem. However, we think that the simple proof of existence we have given, based on the well-known Dempster–Shafer–Shapley Representation Theorem for finite algebras, is more attractive in the context of this chapter. Indeed, in Section 12.7 it will be proved that this technique leads to a new proof of the finitely additive representation of Revuz (1955–1956) and Gilboa and Schmeidler (1995).
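The finite-algebra building block invoked in the existence argument is easy to illustrate computationally. The sketch below is an editorial illustration in Python, not part of the original chapter; the game and its generating weights are hypothetical. It constructs a totally monotone $\nu$ on the power set of a three-point $X$ from nonnegative weights on unanimity games and recovers those weights via the Möbius transform, which is the content of the Dempster–Shafer–Shapley theorem in the finite case.

```python
from itertools import combinations

X = (0, 1, 2)

def subsets(s):
    """All subsets of the tuple s, as frozensets."""
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

# A totally monotone game built from nonnegative weights on unanimity games.
weights = {frozenset({0}): 0.2, frozenset({1, 2}): 0.3,
           frozenset({0, 1, 2}): 0.5}

def u(T, A):
    """Unanimity game u_T: equals 1 iff T is contained in A."""
    return 1.0 if T <= A else 0.0

def nu(A):
    return sum(w * u(T, A) for T, w in weights.items())

def moebius(v):
    """Moebius transform: alpha_T = sum over S subset T of (-1)^{|T-S|} v(S)."""
    return {T: sum((-1) ** (len(T) - len(S)) * v(S)
                   for S in subsets(tuple(T)))
            for T in subsets(X) if T}

alpha = moebius(nu)
# The nonzero coefficients coincide with the generating weights ...
recovered = {T: a for T, a in alpha.items() if abs(a) > 1e-12}
assert all(abs(recovered[T] - w) < 1e-12 for T, w in weights.items())
# ... and they reconstruct nu as a nonnegative combination of unanimity games.
A = frozenset({0, 2})
assert abs(nu(A) - sum(a * u(T, A) for T, a in alpha.items())) < 1e-12
```

In the infinite setting of Theorem 12.3, the finite sum over unanimity games is replaced by the integral over filter games in (12.3).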
12.6. Integral representation

As a consequence of Theorem 12.3, we have the following representation result for the Choquet integral.

Theorem 12.4. Let $\nu$ be a monotone set function in $V^b$ and $f \in F$. Then
$$\int_X f \, d\nu = \int_{U_b} \left[\liminf_p f\right] d\mu,$$
where $\mu = J^*(\nu)$.

Proof. If $f \in F$ is a simple function, it is easy to check that
$$\int_X f \, d\nu = \int_{U_b} \left(\int_X f \, du_p\right) d\mu.$$
If $f \in F$ is not a simple function, then there exists a sequence of simple functions that converges uniformly to $f$. Since the standard convergence theorems hold for the Choquet integral under uniform convergence, we conclude that the above equality is true for any $f \in F$. To complete the proof we now show that $\int_{U_b} (\int_X f \, du_p) \, d\mu = \int_{U_b} [\liminf_p f] \, d\mu$.

The pair $(u_A, \geq)$ is a net, where $\geq$ is the binary relation that directs $p$. We want to show that $\lim_{A \in p} u_A = u_p$. Let $B(u_p; A_1, \dots, A_n; \varepsilon)$ be a neighborhood of $u_p$. Set $I = \{i \in \{1, \dots, n\} : A_i \in p\}$. Suppose first that $I = \emptyset$. Then $u_A(A_i) = u_p(A_i) = 0$ for all $1 \leq i \leq n$ and all $A \in p$. This implies $u_A \in B(u_p; A_1, \dots, A_n; \varepsilon)$. Suppose $I \neq \emptyset$. Set $T = \bigcap_{i \in I} A_i$. Let $A \in p$ be such that $A \geq T$. Then $u_A(A_i) = u_p(A_i) = 0$ whenever $i \notin I$, and $u_A(A_i) = u_p(A_i) = 1$ whenever $i \in I$.
Again, this implies $u_A \in B(u_p; A_1, \dots, A_n; \varepsilon)$. All this proves that $\lim_{A \in p} u_A = u_p$. Then
$$\lim_{A \in p} \int_X f \, du_A = \int_X f \, du_p.$$
But $\int_X f \, du_A = \inf_{x \in A} f(x)$. Therefore
$$\int_{U_b} \left(\int_X f \, du_p\right) d\mu = \int_{U_b} \left[\lim_{A \in p} \int_X f \, du_A\right] d\mu = \int_{U_b} \left[\lim_{A \in p} \inf_{x \in A} f(x)\right] d\mu = \int_{U_b} \left[\liminf_p f\right] d\mu,$$
as wanted.

This result suggests a simple but useful observation. Let $f : \mathbb{N} \to \mathbb{R}$ be a bounded infinite sequence. For convenience, set $x_n = f(n)$ for all $n \geq 1$, and let us consider the power set $P(\mathbb{N})$. Let $p_c$ be the free filter of all cofinite subsets of $\mathbb{N}$, and $\delta_{p_c}$ the Dirac measure concentrated on $u_{p_c}$. Then
$$\int_{\mathbb{N}} f \, du_{p_c} = \int_{U_b} \left[\liminf_p f\right] d\delta_{p_c} = \liminf_n x_n. \qquad (12.5)$$
This shows that the lim inf of a bounded infinite sequence may be seen as a Choquet integral. This is interesting because Choquet integrals have been axiomatized as a decision criterion in so-called Choquet subjective expected utility (CSEU, for short; see Schmeidler, 1989). As Equation (12.5) shows, the ranking of two infinite payoff streams through their lim inf can then be naturally embedded in CSEU. Of course, here we interpret games as weighting functions over periods and not as beliefs. In repeated games, choice criteria based on the lim inf have played an important role (see, e.g., Myerson, 1991: Ch. 7). One might hope that, by elaborating on Equation (12.5), a better understanding of the decision-theoretic bases of these criteria may be obtained. This is the subject of future research.
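Numerically, the identity (12.5) can be checked on truncations: under the cofinite filter $p_c$, the net $A \mapsto \inf_{k \in A} f(k)$ is sandwiched by the tail infima $\inf_{k \geq n} x_k$, which increase to $\liminf_n x_n$. A minimal illustration in Python follows (an editorial sketch; the sequence and the finite horizon used to approximate the tails are hypothetical choices):

```python
# A bounded sequence with liminf = -1: x_n = (-1)^n * (1 + 1/n), n >= 1.
def x(n):
    return (-1) ** n * (1.0 + 1.0 / n)

# Under the cofinite filter p_c, the Choquet integral in (12.5) is the limit
# of the tail infima inf_{k >= n} x_k, i.e. liminf_n x_n.
HORIZON = 10_000   # finite horizon standing in for the (cofinite) tails

def tail_inf(n):
    return min(x(k) for k in range(n, HORIZON))

for n in (1, 10, 100, 1000):
    print(n, tail_inf(n))          # increases toward liminf x_n = -1

assert abs(tail_inf(1000) - (-1.0)) < 1e-2
```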
12.7. Finitely additive representation of games in $V^b$

In this section we give a proof of the finitely additive representation already proved, with an algebraic approach, by Revuz (1955–1956) and by Gilboa and Schmeidler (1995). The proof is a modification of the one we used for the countably additive representation of the previous section.

Let $Una = \{u_T : T \in F \text{ and } T \neq \emptyset\}$ be the set of all unanimity games on $F$. Unlike $U_b$, the set $Una$ is not V-compact. Consequently, the set of all bounded regular Borel measures on $Una$ is not as well behaved as it was on $U_b$. To get a representation theorem based on $Una$ it is therefore natural to look directly at
$ba(Una, \Sigma)$, the set of all bounded finitely additive measures on an appropriate algebra $\Sigma \subseteq 2^{Una(X,F)}$. Indeed, the unit ball in $ba(Una, \Sigma)$ is weak*-compact, whatever topological structure on $X$ we have. The problem is now to find an appropriate algebra $\Sigma$.

For any $A \in F$ let $f_A : Una \to \mathbb{R}$ be defined by $f_A(u_T) = u_T(A)$. Moreover, let $\Sigma^*$ be the algebra generated by the sets $\{u_T : T \subseteq A \text{ and } \emptyset \neq T \in F\}$ for $A \in F$. It turns out that the appropriate algebra is just $\Sigma^*$, which is also the smallest algebra w.r.t. which the functions $f_A$ belong to $B(Una, \Sigma^*)$, where $B(Una, \Sigma^*)$ denotes the closure w.r.t. the supnorm of the set of all simple functions on $\Sigma^*$. Since, as it is easy to check, there is a one-to-one correspondence between $\Sigma^*$ and $\Sigma$, where $\Sigma$ is the algebra generated by the sets of the form $\{T : T \subseteq A \text{ and } \emptyset \neq T \in F\}$ for $A \in F$, we finally have the following result.

Theorem 12.5. There is an isometric isomorphism $T^*$ between $(V^b, \|\cdot\|)$ and $(ba(Una, \Sigma), \|\cdot\|)$ determined by the identity
$$\nu(A) = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu \quad \text{for all } A \in F. \qquad (12.6)$$
The correspondence $T^*$ is linear and isometric, that is, $\|\mu\| = \|T^*(\nu)\| = \|\nu\|$. Moreover, $\nu$ is totally monotone if and only if the corresponding $\mu$ is nonnegative.

In other words, we claim that for each $\nu \in V^b$ there is a unique $\mu \in ba(Una, \Sigma)$ such that (12.6) holds; conversely, for each $\mu \in ba(Una, \Sigma)$ there is a unique $\nu$ such that (12.6) holds.

Proof. Let $\Sigma^{**}$ be the algebra on $Una$ generated by the singletons $\{u_T\}$, $\emptyset \neq T \in F$, together with $\Sigma^*$. Let $TM_1 = \{\nu \in V : \nu \text{ is totally monotone and } \|\nu\| = 1\}$. From the proof of Theorem 12.3 we already know that $TM_1 = \mathrm{cl}\{\mathrm{co}(Una)\}$. Since the unit ball in $ba(Una, \Sigma^{**})$ is weak*-compact, an existence argument similar to that used in the proof of Theorem 12.3 proves that there exists a finitely additive probability measure $\mu^{**} \in ba(Una, \Sigma^{**})$ such that
$$\nu(A) = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu^{**} \quad \text{for all } A \in F. \qquad (12.7)$$
At this stage we have to consider the whole $\Sigma^{**}$ because measures on $Una$ with finite support might not be in $ba(Una, \Sigma^*)$, while they always are in $ba(Una, \Sigma^{**})$. We can rewrite (12.7) as $\nu(A) = \int_{Una} f_A \, d\mu^{**}$ for all $A \in F$. Let $\mu^*$ be the restriction of $\mu^{**}$ to $\Sigma^*$. Since $f_A \in B(Una, \Sigma^*)$, by Lemma III.8.1 of Dunford and Schwartz (1957) we have
$$\nu(A) = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu^* \quad \text{for all } A \in F. \qquad (12.8)$$
As to uniqueness, suppose there exists a probability measure $\mu' \in ba(Una, \Sigma^*)$ such that
$$\nu(A) = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu^* = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu' \quad \text{for all } A \in F.$$
This implies that $\mu^*$ and $\mu'$ coincide on the sets $\{u_T : T \subseteq A \text{ and } \emptyset \neq T \in F\}$. Since the sets of this form are a $\pi$-class and $\Sigma^*$ is generated by them, it is easy to check that $\mu^* = \mu'$, as wanted.

There is a one-to-one correspondence $g$ between $\Sigma^*$ and $\Sigma$ such that for $A \in F$ we have $g(\{u_T : T \subseteq A \text{ and } \emptyset \neq T \in F\}) = \{T : T \subseteq A \text{ and } \emptyset \neq T \in F\}$. Setting $\mu(g(A)) = \mu^*(A)$ for all $A \in \Sigma^*$, we finally get
$$\nu(A) = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu \quad \text{for all } A \in F. \qquad (12.9)$$
We have therefore proved the existence of a bijection $T_1$ between $TM_1$ and the set of all probability measures in $ba(Una, \Sigma)$, where $T_1$ is determined by $\nu(A) = \int_{\{T \in F : T \neq \emptyset\}} u_T(A) \, d\mu$ for all $A \in F$. Clearly, $\nu(X) = \int d\mu = \mu(Una)$ because $T \subseteq X$ for all $T \in F$. Hence $\|\nu\| = \|T_1(\nu)\|$; this shows that $T_1$ is isometric. The rest of the proof, that is, the construction of the isometric isomorphism $T^*$ that extends $T_1$ from $TM_1$ to $V^b$, can be done through a simple application of the decomposition obtained in Theorem 12.2. This observation completes the proof.
As a corollary we can also obtain Theorem E of Gilboa and Schmeidler (1995). It is interesting to compare this result with Theorem 12.4. In the next corollary the argument of the Choquet integral is $\inf_{x \in T} f(x)$ because only unanimity games are considered, while in Theorem 12.4 we used the more general $\liminf_p f$ because we integrated over all filter games, including those defined by free filters.

Corollary 12.2. Let $\nu$ be a monotone set function in $V^b$ and $f \in F$. Then
$$\int_X f \, d\nu = \int_{Una} \left[\inf_{x \in T} f(x)\right] d\mu,$$
where $\mu = T^*(\nu)$.

Remark. This corollary is a bit sharper than Theorem E in Gilboa and Schmeidler (1995). In fact, instead of $V^b$ they use its subset $V^\sigma = \{\nu \in V^b : \mu = T^*(\nu) \text{ is a } \sigma\text{-additive signed measure}\}$.
Proof. If $f \in F$, we can apply the same argument (based only on finite additivity) used in the first part of the proof of Theorem 12.4 to prove that
$$\int_X f \, d\nu = \int_{Una} \left(\int_X f \, du_T\right) d\mu.$$
Since $\int_X f \, du_T = \inf_{x \in T} f(x)$, we get the desired result.
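In the finite case the content of Corollary 12.2 reduces to elementary arithmetic: for a unanimity game $u_T$ the Choquet integral of $f$ is $\inf_{x \in T} f(x)$, so a totally monotone $\nu = \sum_T \alpha_T u_T$ integrates $f$ to $\sum_T \alpha_T \inf_{x \in T} f(x)$. A small numerical check along these lines (an illustrative Python sketch with hypothetical numbers, not part of the original chapter):

```python
X = (0, 1, 2)
f = {0: 3.0, 1: 1.0, 2: 2.0}       # a nonnegative payoff function on X

# A totally monotone nu: sum of alpha_T * u_T over nonempty T's.
alpha = {frozenset({0}): 0.2, frozenset({1, 2}): 0.3,
         frozenset({0, 1, 2}): 0.5}

def nu(A):
    return sum(a for T, a in alpha.items() if T <= A)

def choquet(f, v, X):
    """Choquet integral of a nonnegative f w.r.t. a capacity v on finite X."""
    xs = sorted(X, key=lambda x: f[x], reverse=True)   # decreasing payoffs
    total = 0.0
    for i in range(len(xs)):
        upper = frozenset(xs[: i + 1])                 # upper level set
        nxt = f[xs[i + 1]] if i + 1 < len(xs) else 0.0
        total += (f[xs[i]] - nxt) * v(upper)
    return total

lhs = choquet(f, nu, X)
# Corollary 12.2, finite case: integrate the minima over the T's instead.
rhs = sum(a * min(f[x] for x in T) for T, a in alpha.items())
print(lhs, rhs)                    # both equal 0.2*3 + 0.3*1 + 0.5*1 = 1.4
assert abs(lhs - rhs) < 1e-12
```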
12.8. Dual spaces

On the basis of the representation results proved in Sections 12.5 and 12.7, in this section we exhibit two Banach spaces whose duals are congruent to $V^b$, that is, such that there exists an isometric isomorphism between $V^b$ and these dual spaces.

Proposition 12.4. The following Banach spaces have duals congruent with the space $V^b$:

(i) Let $C(U_b)$ be the space of all continuous functions on the space $U_b$ endowed with the V-topology, and let $\|\cdot\|_s$ be the supnorm. Then the dual space of the Banach space $(C(U_b), \|\cdot\|_s)$ is congruent to $(V^b, \|\cdot\|)$.

(ii) Set $H = \{T : T \in F \text{ and } T \neq \emptyset\}$. Let $B(H, \Sigma)$ be the closure w.r.t. the supnorm of the set of all simple functions on $\Sigma$. Then the dual space of the Banach space $(B(H, \Sigma), \|\cdot\|_s)$ is congruent to $(V^b, \|\cdot\|)$.

Ruckle (1982) proves that $BV^b([0,1], B)$ is congruent with a dual space. We can obtain his result as a consequence of Propositions 12.1 and 12.2, together with the following result from functional analysis (see, e.g., Holmes, 1975: 211, Theorem 23.A).

Proposition 12.5. Suppose that there is a Hausdorff locally convex topology $\tau$ on a Banach space $X$ such that the unit ball $U(X)$ is $\tau$-compact. Then $X$ is congruent to a dual space.
Acknowledgments

The author wishes to thank Itzhak Gilboa for his guidance, and Larry Epstein, Ehud Lehrer, David Schmeidler, two anonymous referees, and an Associate Editor of Mathematics of Operations Research for helpful comments, discussions, and references. Financial support from Ente Einaudi and Università Bocconi is gratefully acknowledged.
References

Aumann, R. J., L. S. Shapley (1974). Values of Non-Atomic Games. Princeton University Press, Princeton, NJ.
Balcar, B., F. Franek (1982). Independent families in complete Boolean algebras. Trans. Amer. Math. Soc. 274, 607–618.
Billingsley, P. (1985). Probability and Measure. John Wiley and Sons, New York.
Choquet, G. (1953–1954). Theory of capacities. Ann. Inst. Fourier 5, 131–295.
Dunford, N., J. T. Schwartz (1957). Linear Operators. Interscience, New York.
Gardner, R. J. (1981). The regularity of Borel measures. Proc. Measure Theory Conf., Oberwolfach, Lecture Notes in Math., vol. 945, pp. 42–100, Springer-Verlag, New York.
Gilboa, I., D. Schmeidler (1994). Additive representations of non-additive measures and the Choquet integral. Ann. Oper. Res. 52, 43–65.
——, —— (1995). Canonical representation of set functions. Math. Oper. Res. 20, 197–212.
Holmes, R. B. (1975). Geometric Functional Analysis and Its Applications. Springer-Verlag, New York.
Myerson, R. B. (1991). Game Theory. Harvard University Press, Cambridge.
Phelps, R. (1965). Lectures on Choquet's Theorem. Van Nostrand, Princeton, NJ.
Revuz, A. (1955–1956). Fonctions croissantes et mesures sur les espaces topologiques ordonnés. Ann. Inst. Fourier 6, 187–268.
Ruckle, W. H. (1982). Projections in certain spaces of set functions. Math. Oper. Res. 7, 314–318.
Schaefer, H. H. (1966). Topological Vector Spaces. The Macmillan Co., New York.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
—— (1979). Allocations of probability. Ann. Probab. 7, 827–839.
Shapley, L. S. (1953). A value for n-person games. Ann. Math. Stud. 28, 307–317.
Part II
Applications
13 An overview of economic applications of David Schmeidler's models of decision making under uncertainty

Sujoy Mukerji and Jean-Marc Tallon
13.1. Introduction

What do ambiguity averse decision makers do when they are not picking balls out of urns—when they find themselves in contexts that are "more realistic" in terms of economic institutions involved? In this part, the reader is provided with a sample of economic applications of the decision theoretic framework pioneered by David Schmeidler. Indeed, decision theoretic models are designed, at least in part, to eventually be used to answer questions about behavior and outcomes in interesting economic environments. Does it make a difference for the outcome of a given game, market interaction, or contractual arrangement if we were to assume that decision makers are ambiguity averse rather than Bayesian? What kinds of insights are gained by introducing ambiguity averse agents in our economic models? What are the phenomena that can be explained in the "ambiguity aversion paradigm" that did not have a (convincing) explanation in the expected utility framework? Do equilibrium conditions (e.g. rational expectations equilibrium, or any sort of equilibrium in a game) that place constraints on agents' beliefs rule out certain types of beliefs and attitudes toward uncertainty? These are but a few of the questions that the contributions collected in this part of the volume have touched upon. In this introduction we discuss these contributions along with several other chapters which, while not included in the volume, make important related points and therefore play a significant role in the literature.

We have organized the discussion of economic applications principally around three themes: financial markets, contractual arrangements, and game theory. In all these contexts, it is found that ambiguity aversion does make a difference in terms of the qualitative predictions of the models and, furthermore, often provides an explanation of contextual phenomena that is, arguably, more straightforward and intuitive than that provided by the expected utility model. The first section discusses chapters that have contributed to a better understanding of financial market outcomes based on ambiguity aversion. The second section focuses on contractual arrangements and is divided into two subsections: the first reports research on optimal risk sharing arrangements, while the second discusses research on incentive contracts. The third section concentrates on strategic interaction and reviews several papers that have extended different game theoretic
solution concepts to settings with ambiguity averse players. A final section deals with several contributions which, while not dealing with ambiguity per se, are linked at a formal level, in terms of the pure mathematical structures involved, to Schmeidler's models of decision making under ambiguity. These contributions involve issues such as inequality measurement, intertemporal decision making, and multi-attribute choice.
13.2. Financial market outcomes

In a pioneering work, Dow and Werlang (1992) applied the Choquet expected utility (CEU) model of Schmeidler (1989) to the portfolio choice problem and identified an important implication of Schmeidler's model. They showed, in a model with one risky and one riskless asset, that there exists a nondegenerate price interval at which a CEU agent will strictly prefer to take a zero position in the risky asset (rather than to sell it short or to buy it). This constitutes a striking difference with an expected utility decision maker, for whom this price interval is reduced to a point (as known since Arrow, 1965). The intuition behind this finding may be grasped in the following example. Consider an asset that pays off 1 in state L and 3 in state H, and assume that the DM is of the CEU type with capacity ν(L) = 0.3 and ν(H) = 0.4 and a linear utility function. The expected payoff (that is, the Choquet integral computed in a way explained in the introduction of this volume) of buying a unit of the risky asset (the act z_b) is given by CE_ν(z_b) = 0.6 × 1 + 0.4 × 3 = 1.8. On the other hand, the payoff from going short on a unit of the risky asset (the act z_s) is higher at L than at H. Hence, the relevant minimizing probability when evaluating CE_ν(z_s) is the probability in the core of ν that puts most weight on H. Thus, CE_ν(z_s) = 0.3 × (−1) + 0.7 × (−3) = −2.4. Hence, if the price of the asset z were to lie in the open interval (1.8, 2.4), then the investor would strictly prefer a zero position to either going short or buying. Unlike in the case of unambiguous beliefs, there is no single price at which to switch from buying to selling. Taking a zero position on the risky asset has the unique advantage that its evaluation is not affected by ambiguity.
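The arithmetic of this example is mechanical enough to be worth writing down. The following sketch is an editorial illustration in Python, not part of the original chapter; only the numbers of the example above are used.

```python
# States L, H; the asset pays 1 in L and 3 in H; capacity nu(L) = 0.3,
# nu(H) = 0.4 (the numbers of the example above); linear utility.
nu = {"L": 0.3, "H": 0.4}

def choquet2(z_L, z_H):
    """Choquet expectation of the payoff (z_L, z_H) under nu (two states)."""
    (_, worst), (best_state, best) = sorted(
        [("L", z_L), ("H", z_H)], key=lambda s: s[1])
    # Best outcome weighted by nu(best state); the residual 1 - nu(best state)
    # falls on the worst outcome: the pessimistic prior in the core.
    return best * nu[best_state] + worst * (1.0 - nu[best_state])

buy = choquet2(1.0, 3.0)       # 0.6 * 1 + 0.4 * 3 = 1.8
sell = choquet2(-1.0, -3.0)    # 0.3 * (-1) + 0.7 * (-3) = -2.4
print(buy, -sell)              # no-trade interval: prices q in (1.8, 2.4)
# For such q, buying is worth buy - q < 0 and selling is worth sell + q < 0,
# so a zero position is strictly preferred to either trade.
```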
The "inertia" zone demonstrated by Dow and Werlang was a statement about optimal portfolio choice at exogenously determined prices, given an initially riskless position.1 It leaves open the issue of whether this could be an equilibrium outcome. Epstein and Wang (1994) is the first chapter that looked at an equilibrium model with multiple-prior agents. Its main contribution is twofold. First, they provide an extension of Gilboa and Schmeidler's multiple-prior model to a dynamic (infinite horizon) setting. Gilboa and Schmeidler's model is static and considers only one-shot choice. The extension to a dynamic setting poses the difficult problem of ensuring dynamic consistency of choices, together with the choice of a revision rule for beliefs that are not expressed probabilistically. The recursive approach developed in Epstein and Wang (1994) (which was subsequently axiomatized by Epstein and Schneider (2003)) allows one to bypass these problems and ensures that intertemporal choices are dynamically consistent. The authors also provide
techniques enabling one to find optimal solutions of such a problem; techniques that amount to generalizing Euler equalities and dealing with Euler inequalities. Second, they develop a model of intertemporal asset pricing à la Lucas (1978). They thus construct a representative agent economy, in which the prices of the assets are derived at equilibrium. One central result shows that asset prices can be indeterminate at equilibrium. Indeterminacy, for instance, would result when there are factors that influence the dividend process while leaving the endowment of the agent unchanged. In that case, we are back to the intuition developed in Dow and Werlang (1992): there exists a multiplicity of prices supporting the initial endowment. The economically important consequence of this finding is that large volatility of asset prices is consistent with equilibrium.

Chen and Epstein (2002) develops the continuous-time counterpart of the model in Epstein and Wang (1994). They show that excess returns for a security can be decomposed as the sum of a risk premium and an ambiguity premium. Epstein and Miao (2003) use this model to provide an explanation of the home-bias puzzle: when agents perceive domestic assets as nonambiguous and foreign assets as ambiguous, they will hold "too much" (compared to a model with probabilistic beliefs) of the former.

The framework developed in Epstein and Wang (1994) has the feature that the equilibrium is Pareto optimal and, somewhat less importantly, the equilibrium allocation necessarily entails no trade (given the representative agent structure). Mukerji and Tallon (2001) develops a static model with heterogeneous agents. They show that ambiguity aversion could be the cause of less than full risk sharing and, consequently, of an imperfect functioning of financial markets. Indeed, ambiguity aversion could lead to the absence of trade on financial markets. This could be perceived as a direct generalization of Dow and Werlang (1992)'s no-trade price interval result. However, simply closing Dow and Werlang's model is not enough to obtain this result, as can be seen in an Edgeworth box; some other ingredient has to be added. Similar to the crucial ingredient leading to equilibrium price indeterminacy in Epstein and Wang (1994), what is needed here is the introduction of a component in asset payoffs that is independent of the endowments of the agents. Actually, one also needs to ensure that some component of an asset's payoff is independent of both the endowments and the payoff of any other asset as well. Mukerji and Tallon (2001) prove that, when the assets available to trade risk among agents are affected by this kind of idiosyncratic risk, and if agents perceive this idiosyncratic component as ambiguous and the ambiguity is high enough, then the financial market equilibrium entails no trade at all and is suboptimal. This is to be contrasted with the situation in which agents do not perceive any ambiguity, in which standard replication and diversification arguments ensure that, eventually, full risk sharing is obtained and the equilibrium is Pareto optimal. Thus, ambiguity aversion is identified as a cause of market breakdown: assets are there to be traded, but agents, because of aversion toward ambiguity, prefer to hold on to their (suboptimal) endowments rather than bear the ambiguity associated with holding the assets. The absence of trade is of course an extreme result, which in particular is due to the fact that all the assets are internal assets. It would
in particular be interesting to obtain results concerning the volume of trade, especially if outside assets were present as well.

Building on a similar intuition, Mukerji and Tallon (2004a) explain why unindexed debt is often preferred to indexed debt: the indexation potentially introduces some extra risk into one's portfolio (essentially, the risk due to relative price variation of goods that appear in the indexation bundle but that the asset holder neither consumes nor possesses in his endowments). This provides further evidence that risk sharing and market functioning might be altered when ambiguity is perceived in financial markets.

Financial applications of the decision theoretic models developed by David Schmeidler are, of course, not limited to the ones reported earlier. There is by now a host of studies that address issues such as under-diversification (Uppal and Wang, 2003), cross-sectional properties of asset prices in the presence of uncertainty (Kogan and Wang, 2002), and liquidity when the model of the economy is uncertain (Routledge and Zin, 2001). What is probably most needed now is an econometric evaluation of these ideas; more work is needed to assess precisely how (non-probabilistic) uncertainty can be measured in the data. Some econometric techniques are being developed (see Henry, 2001) but applications are still rare.

In a series of contributions, Hansen et al. (1999; 2001; 2004)2 have developed an approach to understanding a decision maker's concern about "model uncertainty," which, although not formally building on Schmeidler's work, is based upon a similar intuition. The idea goes back to what the "rational expectations revolution" in macroeconomics wanted to achieve: that the econometrician modeler and the agents within the model be placed on an equal footing concerning the knowledge they have of the model. This led to the construction of models in which agents have an exact knowledge of the model, in particular in which they know the equilibrium price function. However, econometricians typically acknowledge that their models might be mis-specified. Thus, Hansen and Sargent argue, these doubts should also be present in the agents' minds. They hence came to develop a model of robust decision making, wherein agents have a model in mind but also acknowledge the fact that this model might be wrong: they therefore want to take decisions that are robust against possible mis-specifications. Since a particular model implies a particular probability distribution over the evolution of the economic system, a concern for robustness can be understood, in familiar terms of Schmeidler's decision theory, as a concern for the uncertainty about which probability distribution is the true description of the relevant environment. Wang (2003) examines the axiomatic foundation of a related decision model and compares it with the multiple-prior model.

The chapter by Anderson et al. (2003) included in this volume belongs to this line of research and takes an important step in formulating the kind of model mis-specifications the decision maker may take into consideration. As the authors emphasize, "a main theme of the present chapter is to advocate a workable strategy for actually specifying those [Gilboa-Schmeidler] multiple priors in applied work." The analysis is based on the assumption that the agents, given that they have access to a limited amount of data, cannot discriminate among various models
of the economy. This makes them value decision rules that perform well across a set of models. What is of particular interest is that this cautious behavior will show up in the data generated by the model as an uncertainty premium incorporated in equilibrium security market prices, which goes toward an explanation of the equity premium puzzle.
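The flavor of such robust decision rules can be conveyed with a small sketch (illustrative Python with hypothetical numbers; a maxmin evaluation in the spirit of the multiple-priors model rather than the Hansen–Sargent apparatus itself): each action is scored by its worst-case expected utility over a set of candidate models, and the robust choice maximizes that score.

```python
# Two actions, two states (g, b); "models" are candidate probability vectors.
actions = {"safe": (1.0, 1.0), "risky": (3.0, -1.0)}   # payoffs in (g, b)
models = [(0.6, 0.4), (0.5, 0.5), (0.4, 0.6)]          # candidate (p(g), p(b))

def worst_case(payoffs):
    """Minimal expected payoff over the set of candidate models."""
    return min(p_g * payoffs[0] + p_b * payoffs[1] for p_g, p_b in models)

best = max(actions, key=lambda a: worst_case(actions[a]))
print({a: worst_case(z) for a, z in actions.items()}, best)
# safe: 1.0; risky: min(1.4, 1.0, 0.6) = 0.6 -> the robust choice is "safe",
# although "risky" does better under the first model taken alone.
```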
13.3. Optimal contractual arrangements

13.3.1. Risk sharing

Optimal risk-sharing arrangements have been studied extensively, in contract theory, in general equilibrium analysis, and so on. It is a priori not clear whether the risk-sharing arrangements that were optimal under risk remain so when one reconsiders their efficacy in the context of non-Bayesian uncertainty. Chateauneuf et al. (2000) studies the way economy-wide risk sharing is affected when agents behave according to the CEU model. They show that the Pareto optimal allocations of an economy in which all agents are of the von Neumann–Morgenstern type are still optimal in the economy in which agents behave according to CEU, provided all agents' beliefs are described by the same convex capacity. Things are, however, different when agents have different beliefs. To understand why, consider the particular case of betting: there is no aggregate uncertainty and agents have certain endowments. The only reason why there might be room for Pareto improving trade is if agents have different beliefs. This is the situation treated in Billot et al. (2000). They show that in this case, Pareto optimal allocations are full insurance allocations (i.e. all agents have a consumption profile that is constant across states) if and only if the intersection of the cores of the capacities representing their beliefs is nonempty; a two-state illustration of this core-intersection condition is sketched below. This is to be contrasted with the case in which agents have probabilistic beliefs: then, betting will take place as soon as agents have different beliefs, no matter how "small" this difference is. Thus, the fact that people do not bet against one another on many issues could be interpreted not as evidence that they have the same beliefs but rather that they have vague beliefs about these issues, and that the vagueness is sufficiently large to ensure that agents have overlapping beliefs.
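On a two-state space the core of a convex capacity ν is the interval of probabilities p(H) ∈ [ν(H), 1 − ν(L)], so the Billot et al. condition is an interval-overlap check. A minimal sketch (Python, with hypothetical beliefs; an editorial illustration rather than part of the original chapter):

```python
# Two agents, two states {L, H}; beliefs are convex capacities. The core of
# nu on two states is the probability interval [nu(H), 1 - nu(L)] for p(H).
def core_interval(nu_L, nu_H):
    return (nu_H, 1.0 - nu_L)

# Hypothetical beliefs that differ but are vague enough to overlap:
a = core_interval(0.3, 0.4)    # agent 1: p(H) in [0.4, 0.7]
b = core_interval(0.2, 0.5)    # agent 2: p(H) in [0.5, 0.8]

overlap = max(a[0], b[0]) <= min(a[1], b[1])
print(overlap)   # True: by Billot et al. (2000), no mutually agreeable bet
                 # exists, and full insurance remains Pareto optimal.
```

With probabilistic (singleton-core) beliefs the two intervals degenerate to points, and any difference in beliefs reopens the bet.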
Rigotti and Shannon (2004) look at similar issues when agents have "multiple priors with unanimity ranking" à la Bewley (1986) (see the introduction to this volume).

Mukerji and Tallon (2004b) also consider a risk-sharing problem, but in the context of a wage contract. The chapter studies the optimal degree of (price) indexation in a wage contract between a risk-averse employee and a risk-neutral firm, given the presence of two types of price risk. The two types of risk are an aggregate price risk, arising from an economy-wide shock (possibly monetary) that multiplies the prices of all goods by the same factor, and specific risks, arising from demand/supply shocks to specific commodities that affect the price of a single good or a restricted class of goods. If the contracting parties were SEU maximizers, an optimal wage contract would typically involve partial indexation (i.e. a certain fraction of wages, strictly greater than zero, would be index linked). However, this
chapter shows that if agents are ambiguity averse with CEU preferences, and if they have an ambiguous belief about specific price risks involving goods that are neither in the employee's consumption basket nor in the firm's production set, then zero indexation coverage is optimal so long as the variability of inflation is anticipated to lie within a certain bound. What is crucial is the ambiguity of belief about specific price risks. The intuition for this result is again rather simple: ambiguity averse workers will not want to bear the risk associated with changes in the relative prices of the goods composing the indexation bundle if these changes are difficult to anticipate. Thus, even though indexation insures them against the risk of inflation, if this risk is well apprehended (which is the case in most countries, where inflation is low and not very variable) workers prefer to bear this (known) risk rather than the ambiguous risk associated with relative price movements, which are less predictable.

13.3.2. Incentive contracts

Typically, incentive contracts involve arrangements about contingent events. As such, the relevant trade-offs hinge crucially on the likelihoods of the relevant contingencies. Hence, it is a reasonable conjecture that the domain of contractual transactions is one area of economics that is significantly affected by agents' knowledge of the odds. Such contractual relations are thus a natural choice as a particular focus of research on the principal economic effects of ambiguity aversion.

Why firms exist, and what productive processes and activities are typically integrated within the boundaries of a firm, is largely explicated on the understanding that under certain conditions it is difficult or impossible to write supply and delivery contracts that are complete in relevant respects. A contract may be said to be incomplete if the contingent instructions included in the contract do not exhaust all possible contingencies; for some contingencies, arrangements are left to be completed ex post. Incomplete contracts are, typically, inefficient. It is held that firms emerge to coordinate related productive activities through administrative hierarchies if such productive activities may only be inefficiently coordinated using incentives delivered through contracts, as would happen if conditions are such that the best possible contractual arrangements are incomplete. Mukerji (1998) shows that uncertainty, together with an ambiguity averse perception of and attitude to this uncertainty, is one instance of a set of conditions wherein the best possible contracts may be incomplete and inefficient. The formal analysis there basically involves a reconsideration of the canonical model of a vertical relationship (i.e. a relationship in which one firm's output is an input in the other firm's production activity) between two contracting firms, under the assumption that the agents' common belief about the contingent events (which affect the gains from trade) is described by a convex capacity rather than a probability. A complete contract which is appropriate, in the sense of being able to deliver enough incentives to the contracting parties to undertake efficient actions, will require that the payments from the contract be uneven across contingencies. For instance, the contract would reward a party in those contingencies which are more likely if the party takes the "right" action. However,
the Choquet evaluation of such a contract, for either party, may be low because the expected value of the contracted payoffs varies significantly across the different probabilities in the core of the convex capacity. Thus a null contract, an extreme example of an incomplete contract, may well be preferred to the contract which delivers "more appropriate" incentives. This would be so because the null contract would imply that the ex post surplus is divided by an uncontingent rule and, as such, would deliver payoffs that are more even across contingencies, thereby ensuring that the expected value is more robust to variation in probabilities. Hence, the best contractual agreement under ambiguity might not be a good one, in the sense of being unable to deliver appropriate incentives, and therefore may be improved upon by vertical integration, which delivers incentives through a hierarchical authority structure.
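A toy version of this uneven-versus-null comparison may help fix ideas. In the sketch below (an editorial illustration in Python; the payoffs and the core bounds are hypothetical), an incentive contract with uneven payoffs has a higher expected value under the "central" prior, yet its Choquet (worst-case) evaluation falls below that of the flat contract:

```python
# Two contingencies; the agent's convex capacity has core {p : p(s1) in
# [0.3, 0.7]}; CEU with linear utility evaluates an act by its worst
# expectation over the core. All numbers are hypothetical.
CORE = (0.3, 0.7)              # endpoints of the set of priors p(s1)

def ceu(w1, w2):
    """Worst-case expected payment of the contract (w1, w2) over the core."""
    # Linear in p, so the minimum is attained at an endpoint of the core.
    return min(p * w1 + (1 - p) * w2 for p in CORE)

incentive = ceu(10.0, 2.0)     # uneven payments: min at p(s1) = 0.3 -> 4.4
flat = ceu(5.0, 5.0)           # uncontingent rule: worth 5.0 under every prior
print(incentive, flat)
# The flat ("null") contract is strictly preferred (5.0 > 4.4) even though the
# incentive contract has the higher mean (6.0) under the central prior 0.5.
```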
Why might an explanation like the one given earlier be of interest? A recurrent claim among business people is that they integrate vertically because of uncertainty in input supply, a point well supported in empirical studies (see discussion and references in Shelanski and Klein (1999)). The claim, however, has always caused difficulties for economists, in the sense that it has been hard to rationalize on the basis of standard theory (see, for instance, remarks in Carlton, 1979). The analysis in the present chapter explains how the idea of ambiguity aversion provides one precise understanding of the link between uncertainty and vertical integration.

In a related vein, Mukerji (2002) finds that ambiguity could provide a theoretical link between uncertainty and a very prevalent contractual practice in procurement contracting. The prevalent practice in question is the use of contracts which reimburse costs (wholly or partially) ex post and therefore provide very weak cost control incentives to the contractor. It is argued in that paper that, while there is ample empirical evidence for this link, for instance in the case of research and development procurement by the US Defense Department, the existing theoretical underpinnings for this link based on expected utility are less than plausible.

It is worth pointing out that the analyses in both papers, Mukerji (1998) and Mukerji (2002), model firms as ambiguity averse entities. Economists have traditionally preferred to model firms as risk neutral, citing diversification opportunities. Based on formal laws of large numbers proved in Marinacci (1999), and following upon the intuition in Mukerji and Tallon (2001), it may be conjectured that diversification opportunities, even in the limit, are not good enough to neutralize ambiguity the way they can neutralize risk. The conjecture, to date, remains an interesting open question.

Intriguingly, the optimal contractual forms characterized in both Mukerji (1998) and Mukerji (2002) are instances where ambiguity aversion leads to contracts with low powered incentives. It is an interesting but open question how general this is. It has been widely observed that optimal contracts, say under moral hazard, as predicted on the basis of expected utility analysis, are far more high powered than those seen in the real world. It is an intriguing conjecture that high powered, fine tuned incentive contracts are not robust to ambiguity about the relevant odds, and that optimal incentive schemes under, say, moral hazard would be far less complex when ambiguity considerations are taken into account than they are in standard theory. While this question is an important one, finding an answer is not likely
to be easy. Ghirardato (1994) investigated the principal–agent problem under moral hazard with the players' preferences given by CEU. A significant finding there is that many of the standard techniques used to characterize optimal incentive schemes in standard theory seemingly do not work with CEU/MEU preferences. For instance, the Grossman–Hart "trick" of separating the principal's objective function into a revenue component (from implementing a given action) and a cost component (of implementing a given action) is not available with CEU/MEU preferences. On a more positive note, the chapter reports interesting findings about the comparative statics of the optimal incentive scheme with respect to changes in the agent's uncertainty aversion. One result shows that as uncertainty aversion decreases, the agent will be willing to implement an arbitrary action for a uniformly lower incentive scheme. The point of interest is that this is in contrast with what happens for decreases in risk aversion: typically, a decrease in risk aversion will have an asymmetric effect on contingent payments, making high payments higher and low payments lower.
13.4. Strategic interaction

In recent years noncooperative game theory, the theory of strategic decision making, has come to be the basic building block of economic theory. Naturally, one of the first points of inspiration stimulated by Schmeidler's ideas was the question of incorporating these ideas into noncooperative game theory. The general research question was, "What if players in a game were allowed to have beliefs and preferences as in the CEU/MEU model?" More particularly, there were at least three interrelated sets of questions: (1) a set of purely modeling/conceptual questions, for example, how should solution concepts, such as strategic equilibrium, be defined given the new decision theoretic foundations; (2) questions about the general behavioral implications of the new solution concepts; and (3) questions about the insights such innovations might bring to applied contexts. The research so far has largely focused on clarifying conceptual questions, such as defining the appropriate analogue of solution concepts like Nash equilibrium, and that too almost exclusively in the domain of strategic form games with complete information. Questions about the appropriate equilibrium concepts in incomplete information games and refinements of equilibrium in extensive form games remain largely unanswered. However, the progress on conceptual clarification has provided significant clues about behavioral implications and, in turn, has led to some important insights in applied contexts.

One reason why progress has been largely limited to complete information normal form games is the host of supplementary questions that one has to face up to in order to tackle the question of defining equilibrium even in this simplest of strategic contexts. Defining a strategic equilibrium under ambiguity involves making several nontrivial modeling choices—namely, whether to use multiple priors or capacities to represent beliefs, and if the latter, what specific class of capacities; whether to allow for a strict preference for randomization; whether to fix actions explicitly in the description of the equilibrium, or whether, instead of
explicitly describing actions, to simply describe the supports of the beliefs; and, if the latter, which among the various possible notions of support to adopt (see Ryan (1999) for a perspective on this choice). Unsurprisingly, the definition of equilibrium varies across the literature, each definition involving a particular set of modeling choices.

Lo (1996) considers the question of an appropriate notion of strategic equilibrium, for normal form games, when players' preferences conform to the multiple-priors MEU model. In Lo's conceptualization, equilibrium is a profile of beliefs (about other players' strategic choices) that satisfies certain conditions. To see the key ideas, consider a two-player game. The component of the equilibrium profile that describes player i's belief about the strategic choice of player j is a (convex) set of priors such that all of j's strategies in the support of each prior are best responses for j, given j's belief component in the equilibrium profile. (Lo also extends the concept to n-player games, requiring players' beliefs to satisfy stochastic independence as defined in Gilboa and Schmeidler (1989).) This notion of equilibrium predicts that player i chooses some (possibly mixed) strategy that is in the set of priors describing player j's belief about i's choice. In terms of behavioral implications, the notion implies that an outsider who can only observe the actual strategy choices (and not beliefs) will not be able to distinguish uncertainty averse players from Bayesian players. Intuitively, the reason why uncertainty aversion has seemingly so limited a "bite" in this construction is that players' belief sets are severely restricted by equilibrium knowledge: every prior in i's belief set about j's strategic choice must be a best response mixed strategy. In other words, given equilibrium knowledge, there are too few possible priors, too little to be uncertain about, so to speak.

Dow and Werlang (1994), Klibanoff (1996), and Marinacci (2000) all offer equilibrium concepts with uncertainty aversion that differ from Lo's in one key way. They do not restrict the equilibrium belief to only those priors which are best responses (as mixed strategies); other priors are also possible, thus enriching the uncertainty, in a manner of speaking. One principal effect of this is that these notions of equilibria "rationalize" more strategy profiles compared to Lo's concept, indeed even strategy profiles that are not Nash equilibria.

Dow and Werlang (1994) defines equilibrium in two-player normal form games where players have CEU preferences. Equilibrium is simply a pair of capacities, where each capacity gives a particular player's belief about the strategic choice made by the other player. Further, the support of each capacity is restricted to include only those strategies which are best responses with respect to the counterpart capacity. Significantly, the equilibrium notion only considers pure strategies; a pure strategy is deemed to be a best response if it maximizes the Choquet expectation over the set of pure strategies. Much depends on how the support of a capacity is defined. Indeed, the only restriction on equilibrium behavior is that only those strategies which appear in the set defined to be the support of the equilibrium beliefs may be played in an equilibrium. Dow and Werlang (1994) define the support $A$ of a capacity $\mu_i$ to be a set (a subset of $S_j$, the set of strategies of player $j$, in the present context) such that $\mu_i(A^c) = 0$ and $\mu_i(B^c) > 0$ for all $B \subset A$.
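The definition can be checked mechanically on small examples. The sketch below (illustrative Python; the game and the capacity are hypothetical) enumerates the Dow–Werlang supports of an "ε-contaminated" convex capacity on a three-strategy set:

```python
from itertools import combinations

Sj = ("a", "b", "c")            # the opponent's pure strategies (hypothetical)

def subsets(s):
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

# An epsilon-contaminated convex capacity: mu(E) = (1 - eps) * P(E), E != Sj.
eps = 0.5
P = {s: 1.0 / len(Sj) for s in Sj}           # uniform baseline probability
def mu(E):
    return 1.0 if E == frozenset(Sj) else (1 - eps) * sum(P[s] for s in E)

def is_dw_support(A):
    """Dow-Werlang support: mu(A^c) = 0, and mu(B^c) > 0 for every B
    strictly contained in A."""
    comp = lambda E: frozenset(Sj) - E
    return (mu(comp(A)) == 0.0 and
            all(mu(comp(B)) > 0.0 for B in subsets(tuple(A)) if B < A))

print([sorted(A) for A in subsets(Sj) if is_dw_support(A)])
# -> [['a', 'b', 'c']]: only the full set qualifies, since mu puts positive
#    weight on the complement of every proper subset of Sj.
```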
The nature of this definition may be more readily appreciated if we
restrict attention to convex capacities and consider the set of priors in the core of such a capacity. Significantly, the set includes priors that put positive probability on pure strategies in $A^c$ which may not be best responses. The convex capacity $\mu_i$ is the lower envelope of the priors in its core; hence, as long as there is one prior in the core which puts zero probability weight on $s_j \in A^c$, we have $\mu_i(s_j) = 0$. Hence, player i's evaluation of a pure strategy $s_i$ will take into account the payoff $u_i(s_i, s_j)$, $s_j \in A^c$, even though $s_j$ may not be a best response for player j, given j's equilibrium belief.

Klibanoff (1996) defines equilibrium in normal form games and, like Lo, applies the multiple-priors framework to model players' preferences. But there are important differences. Klibanoff defines equilibrium as a profile of tuples $(\sigma_i, m_i)_{i=1}^n$, where $\sigma_i$ is the actual mixed strategy used by player i and $m_i$ is a set of priors of player i denoting his belief about opponents' strategy choices. The profile has to satisfy two "consistency" conditions. One, each $\sigma_i$ has to be consistent with $m_i$ in the sense that $\sigma_i$ is a best response for i given his belief set $m_i$. Two, the strategy profile $\sigma_{-i}$ chosen by the other players should be considered possible by player i. The second condition is a consistency condition on the set $m_i$ in the sense that it has to include the actual (possibly mixed) strategy chosen by the other players. However, it is permitted that $m_i$ may contain priors that are not mixed strategies chosen by other players, and indeed strategies that are not best responses. Hence, Klibanoff's equilibrium differs from Lo's in that it puts a weaker restriction on equilibrium beliefs, a restriction that is very similar to that implicit in Dow and Werlang's definition. But Klibanoff's definition differs from Dow and Werlang's in that it explicitly allows players to choose a mixed strategy, and allows for mixed strategies to be strictly preferred to pure strategies. Moreover, it differs from both Lo's definition and Dow and Werlang's in that it specifies more than just equilibrium beliefs: as noted, it explicitly states which strategies will be played in equilibrium.

Marinacci (2000) defines equilibrium in two-player normal form games and, like Dow and Werlang, applies the CEU framework to model players' preferences. He also defines an equilibrium in beliefs, again much like Dow and Werlang, where beliefs are modeled by convex capacities. However, he employs a slightly different notion of support for equilibrium capacities: his support $A$ of a capacity $\mu_i$ consists of all elements $s_j \in S_j$ such that $\mu_i(s_j) > 0$. This puts a weaker restriction on beliefs than Lo's definition, in very much the same spirit as Dow and Werlang's and Klibanoff's definitions. But the true distinctiveness of Marinacci's definition lies elsewhere. His definition includes an explicit, exogenous parametric restriction on the ambiguity incorporated in equilibrium beliefs. Given a capacity $\mu(\cdot)$, the ambiguity of belief about an event $A$, denoted $\lambda(A)$, is measured by $1 - \mu(A) - \mu(A^c)$. The measure is intuitive: $\lambda(A)$ is precisely the difference between the maximum likelihood put on $A$ by any probability measure in the "core" of $\mu(\cdot)$ and the minimum likelihood put on $A$ by some other probability measure appearing in the core. Thus $\lambda(A)$ is indeed a measure of the fuzziness of belief about $A$.
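Computing λ is immediate. The sketch below (illustrative Python with hypothetical numbers) evaluates the index for the same ε-contaminated family used in the support sketch above, µ(E) = (1 − ε)P(E) for E ≠ S, for which the ambiguity of every nontrivial event equals ε:

```python
# Marinacci's ambiguity index lambda(A) = 1 - mu(A) - mu(A^c): the gap between
# the maximal and minimal probability of A over the "core" of mu.
def ambiguity(mu_A, mu_Ac):
    return 1.0 - mu_A - mu_Ac

# For the epsilon-contaminated capacity mu(E) = (1 - eps) * P(E), E != S:
eps = 0.25
for P_A in (0.1, 0.4, 0.9):                  # hypothetical baseline P(A)'s
    lam = ambiguity((1 - eps) * P_A, (1 - eps) * (1 - P_A))
    print(P_A, round(lam, 10))               # always 0.25: lambda(A) = eps
```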
Marinacci views the ambiguity in the description of the strategic situation as a primitive, and characterizes a two-player ambiguous
game $G$ by the tuple $\{S_i, u_i, \lambda_i : i = 1, 2\}$. The addition to the usual menu of strategies and utility functions is the ambiguity functional $\lambda_i : 2^{S_j} \to [0, 1]$, restricting the possible beliefs of player i to have a given level of ambiguity: player i's belief $\mu_i : 2^{S_j} \to [0, 1]$ must be such that $1 - \mu_i(A) - \mu_i(A^c) = \lambda_i(A)$. In the models of Dow and Werlang, Lo, and Klibanoff, the equilibrium beliefs are freely equilibrating variables and, as such, the level of ambiguity in the beliefs is endogenous. Given the endogeneity, it is not possible in these models, strictly speaking, to pose the comparative statics question, "What happens to the equilibrium if the ambiguity in the way players perceive information about the strategic environment changes?" In Marinacci's model, on the other hand, this question is well posed, since beliefs, as equilibrating variables, are not free to the extent that they are subject to the ambiguity level constraint imposed by the parameter $\lambda$. Hence, the answer to the question (a very natural one in an applied context) involves a well posed comparative static exercise showing how the equilibrium changes when $\lambda$ changes.

The following is one way of understanding why the notions of equilibrium in Dow and Werlang (1994), Klibanoff (1996), and Marinacci (2000) allow a "rationalization" of non-Nash strategy profiles. For instance, in a two-player game, it is possible to have as an equilibrium a pair $(\mu_1^*, \mu_2^*)$ of convex capacities denoting the equilibrium beliefs of players 1 and 2, where $\{s_i^*\} = \mathrm{supp}(\mu_j^*)$ is not a best response to $\{s_j^*\} = \mathrm{supp}(\mu_i^*)$. The key to the understanding lies in the fact that the support of $\mu_i^*$, $i = 1, 2$, is so defined that it may exclude strategies $\hat{s}_j \in \hat{S}_j \subset S_j$ such that $\mu_i^*(\hat{S}_j) > 0$ but $\mu_i^*(\hat{s}_j) = 0$. Since the strategy $\hat{s}_j \in \hat{S}_j$ is not in the support of $\mu_i^*$, it is not required to be a best response to $\mu_j^*$; but nevertheless the Choquet integral evaluation of $s_i^*$, with respect to the belief $\mu_i^*$, may attach a positive weight to the payoff $u_i(s_i^*, \hat{s}_j)$, given that $\mu_i^*(\hat{S}_j) > 0$. Hence, $s_i^*$ can be an equilibrium best response even though it may not be a best response to a belief that puts probability 1 on $s_j^*$. It is as if player i, when evaluating $s_i^*$, allows for the possibility that j may play a strategy that is not a best response.

The discussion in the preceding paragraph suggests that equilibrium, as defined by Dow and Werlang (1994), Klibanoff (1996), and Marinacci (2000), incorporates the flavor of a (standard) Bayesian equilibrium involving "irrational types." This point has been further investigated in Mukerji and Shin (2002). That paper concerns the interpretation of equilibrium in non-additive beliefs in two-player normal form games. It is argued that such equilibria involve beliefs and actions which are consistent with a lack of common knowledge of the game. The argument rests on representation results which show that different notions of equilibrium in games with non-additive beliefs may be reinterpreted as standard notions of equilibrium in associated games of incomplete information with additive (Bayesian) beliefs, where common knowledge of the (original) game does not apply.
More precisely, it is shown that any pair of non-additive belief functions (and actions, to the extent these are explicit in the relevant notion of equilibrium) which constitute an equilibrium in the game with Knightian uncertainty/ambiguity may be replicated as beliefs and actions of a specific pair of types, one for each player, in an equilibrium of an orthodox Bayesian game, in which there is a common prior over the type
space. The representation results provide one way of comparing and understanding the various notions of equilibrium for games with non-additive beliefs, such as those in Dow and Werlang (1994), Klibanoff (1996), and Marinacci (2000).

Greenberg (2000) analyzes an example of an equilibrium in a dynamic game wherein beliefs about strategic choice off the equilibrium path of play are modeled using ideas of Knightian uncertainty/ambiguity. The example illustrates both the appropriateness of this modeling innovation and its potential for generating singular insight in the context of extensive form games. The example is a game consisting of three players. Players 1 and 2 first play a "bargaining game" which can end in agreement or disagreement. Disagreement may arise due to the "intransigence" of either player. However, player 3, who comes into play only in the instance of disagreement, does not have perfect information as to which of players 1 or 2 was responsible for the disagreement; at the point player 3 makes his choice, the responsibility for disagreement is private information to players 1 and 2. Player 3 has two actions, one of which is disliked by player 1 while the other is disliked by player 2, and disliked more than disagreement. However, player 3 is indifferent between his two choices, though he prefers the agreement outcome to either of them. The conditions of Nash equilibrium require that players have a common probabilistic belief about 3's choice. The details of payoffs are such that any common probability belief would make disagreement more attractive to at least one of players 1 and 2. Greenberg argues that agreement, even though not a Nash equilibrium, can be supported as the unique equilibrium outcome if the first two players' beliefs about what 3 will do in the event of disagreement (an off-equilibrium choice) are ambiguous and players are known to be ambiguity averse. A common set of probabilities describes players 1 and 2's prediction about 3's choice. But given uncertainty aversion, say as in MEU, each of players 1 and 2 evaluates his options as if his belief is described by the probability that mirrors his most pessimistic prediction. Hence the two players, given their respective apprehensions, choose to agree, thereby behaving as if they had two different probability beliefs about player 3's choice. Greenberg further observes that player 3 may actually be able to facilitate this "good" equilibrium by not announcing or precommitting to the action he would choose if called upon to play following disagreement; the player would strictly prefer to exercise "the right to remain silent." The silence "creates" ambiguity of belief and, given aversion to this ambiguity, in turn "brings about" the equilibrium. The question of appropriate modeling of beliefs about off-equilibrium path choices has been a source of vexation for about as long as extensive form games have been around. It may be argued persuasively that on the equilibrium path of play, beliefs are pinned down by actual play. The argument is far less persuasive, if at all, for beliefs off the equilibrium path of play; hence the appropriateness of modeling such beliefs as ambiguous. But off-equilibrium path beliefs may be crucial for the construction of equilibrium. As has been noted, the good equilibrium described in Greenberg's example would not obtain if players 1 and 2 were required to have a common probabilistic belief about 3's choice.
Of course, this profile would not be ruled out by a solution concept that allows for “disparate” beliefs off the path of play, for instance, self-confirming equilibrium
(Fudenberg and Levine, 1993), subjective equilibrium (Kalai and Lehrer, 1994), and extensive form rationalizability (Bernheim, 1984; Pearce, 1984). What ambiguity aversion adds, compared to these solution concepts, is a positive theory as to why the players (1 and 2) would choose to behave as if they had particular differing probabilistic beliefs even though they are commonly informed. While Greenberg does not give a formal definition of equilibrium for extensive form games where players may be uncertainty averse, Lo (1999) does. However, Lo does not go far enough to consider the question of extensive form refinements. Hence, determining reasonable restrictions on beliefs about off-equilibrium play, while allowing them to be ambiguous, remains an exciting open question, hopefully to be taken up in future research.

Analysis of behavior in auctions has been a prime area of application of game theory, especially in recent years. Traditional analysis of auctions assumes that the seller's and each bidder's beliefs about rival bidders' valuations are represented by probability measures. Lo (1998) makes an interesting departure from this tradition by proceeding on the assumption that such beliefs are instead described by sets of multiple priors and that players are uncertainty averse (in the sense of MEU). Lo analyzes first and second price sealed bid auctions with independent private values. In a more significant finding, he shows, under an "interesting" parametric specification of beliefs, that the first price auction Pareto dominates the second price auction. A rough intuition for the result is as follows. Suppose the seller and the bidders are commonly informed about "others'" valuations; that is, the information is described by the same set of probabilistic priors. When the seller is considering which of the two auction formats to adopt, the first or the second price sealed bid auction, he evaluates his options using that probabilistic prior (from the "common" set of priors) which reflects his worst fears, namely, that bidders have low valuations. Recall that bidders always bid their true valuation in the second price auction. Therefore, the usual Revenue Equivalence Theorem implies that the seller would be indifferent between the two auction formats if bidders' strategy (in the first price auction) were based on the same probabilistic prior that the seller effectively applies for his own strategy evaluations. However, given uncertainty aversion, bidders will behave as if the probability relevant for their purposes is the one that reflects their apprehensions: the fear that their rivals have high valuations, which means that under uncertainty aversion the optimal bid will be higher. On the other hand, because of his apprehensions, the seller will choose a reserve price for the first price auction that is strictly lower than the one he chooses for the second price auction. Hence, the first price auction format is Pareto preferred to the second price format. Ozdenoren (2002) generalizes Lo's results by relaxing the parametric restriction on beliefs significantly. These successful investigations of behavior in auctions point to the potential for further research to understand the behavioral implications of uncertainty aversion in incomplete information environments in general, and in implementation theory in particular.
In general, strategic interaction in incomplete information environments would appear to be a particularly appropriate setting for investigation, since the scope for ambiguity to play a role is far greater than what is possible under equilibrium restrictions in complete information settings. More
particularly, Bayesian implementation theory has frequently been criticized for prescribing schemes which are "too finely tuned" to the principal's and agents' knowledge of the precise prior/posteriors. Perhaps introducing the effect of uncertainty aversion will lead to the rationalization of schemes which are more robust in this respect (and, hopefully, to an understanding of implementation schemes which are more implementable in the real world!).

Eichberger and Kelsey (2002) apply Dow and Werlang's notion of equilibrium to analyze the effect of ambiguity aversion on a public good contribution game. They show that it is possible to sustain as equilibrium (under ambiguity) a strategy profile which involves higher contributions than are possible under standard beliefs. The rough idea is as follows. Recall our discussion about how the Dow and Werlang notion may allow a non-Nash profile to be the support of equilibrium beliefs. Working with the CEU model, Eichberger and Kelsey construct an equilibrium belief profile wherein each player i behaves as if there is a chance that another player j plays a strategy lying outside the support of i's equilibrium belief, in particular, a strategy that is "bad" for i, that is, makes a contribution lower than the equilibrium contribution. Essentially, it is this "fear" of lower contributions by others, given the strategic uncertainty, which drives up i's equilibrium contribution. The chapter also extends the analysis from public good provision games to more general games classified in terms of strategic substitutability and complementarity.

Ghirardato and Katz (2002) apply the MEU framework to the analysis of voting behavior to give an explanation of the phenomenon of selective abstention. Consider a multiple-office election, that is, an election in which the same ballot form asks for the voter's opinion on multiple electoral races. It is typically observed in such elections that voters choose to vote on the higher-profile races (say, the state governor), but simultaneously abstain from expressing an opinion on other races (say, the county sheriff). Ghirardato and Katz's objective is to formalize a commonly held intuition that the reason voters selectively abstain is that they believe they are relatively poorly informed about the candidates involved in the low-profile races. The chapter contends that the key to the intuition is the issue of modeling the sensitivity of a decision maker's choice to what he perceives to be the quality of his information, and that this issue cannot be adequately addressed in a standard Bayesian framework. On the other hand, they argue, it can be addressed in a decision theoretic framework which incorporates ideas of ambiguity; roughly put, information about an issue is comparatively more ambiguous and of lower quality if it is represented by a more encompassing set of priors. This point would seem to be of wider interest and worth pursuing in future research.
13.5. Other applications
Finally, we survey some applications of the tools developed by David Schmeidler that do not, per se, involve decision making under uncertainty. The applications covered relate to the measurement of inequality, intertemporal choice, and multi-attribute choice.
Inequality measurement. Decision theory under risk and the theory of inequality measurement essentially deal with the same mathematical objects (probability distributions). Therefore, these two fields are closely related, and their relationship has long been acknowledged.3 However, surprisingly enough, almost all the literature on inequality measurement deals with certain incomes. This is probably due, in part, to a widely held opinion that the problem of measuring inequality of uncertain incomes can be reduced to a problem of individual choice under uncertainty (e.g. by first computing in each state a traditional welfare function à la Atkinson-Kolm-Sen, and then reducing the problem to a single decision maker's choice among prospects of welfare) or, alternatively, to a problem of inequality measurement over sure incomes (e.g. by evaluating each individual's welfare by his expected utility, and then considering the distribution of the certainty equivalent incomes). In a path-breaking paper, Ben Porath et al. (1997) show that such is not the case, and that inequality and uncertainty should be analyzed jointly and not separately in two stages. They present the following example which serves to illustrate their point. Consider a society with two individuals, a and b, facing two equally likely possible states of the world, s and t, and assume that the planner has to choose among the three following social policies, P1, P2, and P3:

        P1            P2            P3
        a    b        a    b        a    b
   s    0    0        1    0        1    0
   t    1    1        0    1        1    0
Observe that in P1, both individuals face the same income prospects as in P2; but in P1, there is no ex post inequality, whatever the state of the world. This could lead one to prefer P1 over P2. Similarly, P2 and P3 are ex post equivalent, since in both cases, whatever the state of the world, the final income distribution is identical; but P3 gives 1 for sure to one individual, and 0 to the other, while P2 provides both individuals with the same ex ante income prospects. On these grounds, it is reasonable to think that P2 should be ranked above P3. Thus, a natural ordering would be P1 ≻ P2 ≻ P3. Ben Porath et al. (1997) point out that there is no hope of obtaining such an ordering by two-stage procedures. Indeed, the first two-stage procedure (mentioned earlier) would lead us to neglect ex ante considerations and to judge P2 and P3 as equivalent. In contrast, the second procedure would lead us to neglect ex post considerations and to see P1 and P2 as equivalent. In other words, these procedures would fail to simultaneously take into account the ex ante and the ex post income distributions. They suggest solving this problem by considering a linear combination of the two procedures, that is, a linear combination of the expected Gini index and the Gini index of expected income. This solution captures both ex ante and ex post inequalities. Furthermore, it is a natural generalization of the principles commonly used for evaluating inequality under certainty on the one hand, and for decision making under uncertainty on the other hand.
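To see the arithmetic, here is a minimal Python sketch (ours, not from Ben Porath et al.; the equal weight alpha = 0.5 is an arbitrary choice) computing the expected Gini, the Gini of expected income, and their linear combination for the three policies:

```python
# Illustrative sketch (ours). Each two-stage procedure alone ties two of the
# policies, while the linear combination of the expected Gini and the Gini of
# expected income ranks P1 ahead of P2 ahead of P3 (lower index is better).

def gini(x):
    """Gini index of an income vector (taken as 0 for an all-zero vector)."""
    n, mu = len(x), sum(x) / len(x)
    if mu == 0:
        return 0.0
    return sum(abs(a - b) for a in x for b in x) / (2 * n * n * mu)

# Incomes of (a, b) in the two equally likely states s and t.
policies = {"P1": [(0, 0), (1, 1)], "P2": [(1, 0), (0, 1)], "P3": [(1, 0), (1, 0)]}

def index(policy, alpha=0.5):
    expected_gini = sum(gini(state) for state in policy) / len(policy)
    expected_income = [sum(col) / len(policy) for col in zip(*policy)]
    return alpha * expected_gini + (1 - alpha) * gini(expected_income)

for name, policy in policies.items():
    print(name, index(policy))   # P1: 0.0, P2: 0.25, P3: 0.5
```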
The procedure suggested in Ben Porath et al. (1997) is not the only possible evaluation principle that takes into account both ex ante and ex post inequalities. Any functional that is increasing in both individuals' expected income and snapshot inequalities (say, measured by the Gini index) has the same nice property, provided that it takes its values between the expected Gini and the Gini of the expectation. Furthermore, it is unclear why we should restrict ourselves, as Ben Porath et al. (1997) did, to decision makers who behave in accordance with the multiple-priors model. Finally, they do not provide an axiomatization for the specific functional forms they propose. This problem is partially dealt with in Gajdos and Maurin (2004), who provide an axiomatization for a broad class of functionals that can accommodate Ben Porath, Gilboa, and Schmeidler's example.

Intertemporal choice and multi-attribute choice. Schmeidler's models of decision under uncertainty have also been shown to hold new insights when applied to decision-making contexts and questions that do not (necessarily) involve uncertainty, for instance, intertemporal decision making (under certainty). In terms of abstract construction, an intertemporal decision setting is essentially the same as that of decision making under uncertainty, with time periods replacing states of nature. Gilboa (1989) transposes the CEU model of decision under uncertainty to intertemporal choices. He shows that using the axiomatization of Schmeidler (1989) and adding an axiom called the "variation-preserving sure-thing principle," the decision rule is given by a weighted average of the utility in each period and the utility variation between each two consecutive periods (a stylized numerical sketch appears at the end of this subsection). Aversion toward uncertainty is now replaced by aversion toward variation over time of the consumption profile. Wakai (2001), in a similar vein, uses the idea that agents dislike time-variability of consumption and axiomatizes a notion of non-separability in the decision criterion. He then goes on to show how such a decision criterion modifies consumption smoothing and can help provide an explanation of the equity premium and the risk-free rate puzzles. Marinacci (1998) also uses a transposition of the Gilboa and Schmeidler (1989) model to intertemporal choice and axiomatizes a complete patience criterion with a new choice criterion (the Polya index). De Waegenaere and Wakker (2001) generalize the Choquet integral to the signed Choquet integral, which captures violations of both separability and monotonicity. This tool can be used to model agents whose aversion toward volatility of their consumption is so high that they could prefer a uniformly smaller profile of consumption if it entails sufficiently less volatility. A work of related interest is Shalev (1997), which uses a mathematically similar technique to represent preferences incorporating a notion of loss aversion in an explicitly intertemporal setting, that is, where objects of choice are intertemporal income/consumption streams and the decision maker is averse to consumption decreasing between successive periods.

In the context of decision making under uncertainty, the Choquet integral may be viewed as a way of aggregating utility across different states in order to arrive at an (ex ante) decision criterion. Multi-attribute choice concerns the question of aggregating over different attributes, or characteristics, of commodities in order to formulate an appropriate decision criterion for choosing among the multi-attributed
objects (see, for instance, Grabisch (1996) and Dubois et al. (2000)). The use of variants of the Choquet integral allows some flexibility in the way attributes are weighted and combined. In an interesting paper, Nehring and Puppe (2002) use the multi-attribute approach to modeling (bio-)diversity. In doing so, they develop tools to measure diversity, based on the notion of the (Möbius) inverse of a capacity. Interestingly, this is also related to another line of research developed by Gilboa and Schmeidler, namely Case-Based Decision Theory (Gilboa and Schmeidler, 1995, 2001).
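Returning to the intertemporal transposition of the CEU model mentioned above, the following sketch (ours; the utility function, weights, and consumption streams are all invented) illustrates a variation-averse criterion of the kind the text describes for Gilboa (1989):

```python
# A stylized sketch (ours, not Gilboa's exact rule): a weighted sum of
# per-period utilities plus a (negative) weight on the absolute utility
# variation between consecutive periods, capturing aversion to fluctuation.

def evaluate(stream, u=lambda c: c ** 0.5, a=1.0, b=-0.5):
    utils = [u(c) for c in stream]
    level = a * sum(utils)
    variation = b * sum(abs(utils[t + 1] - utils[t]) for t in range(len(utils) - 1))
    return level + variation

smooth = [4, 4, 4, 4]
bumpy = [1, 9, 1, 9]   # same sum of period utilities (2+2+2+2 = 1+3+1+3 = 8)
print(evaluate(smooth), evaluate(bumpy))   # 8.0 vs 5.0: smoothness is rewarded
```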
13.6. Concluding remarks
This was a review of a sample of the rich, varied, and very much "alive" literature that has been inspired by David Schmeidler's path-breaking contributions to decision theory. We do want to emphasize that it is very much a sample; the list of papers discussed or cited is far from exhaustive. Nevertheless, we hope the following conclusions are justifiable even on the basis of the limited discussion. First, that thinking of decision making under uncertainty in the way Schmeidler's models prompt and allow us to do, incorporating issues such as sensitivity of decision makers to the ambiguity of information, does lead to new and important insights about economic phenomena. Second, that while it has long been suspected that issues like ambiguity could be of significance in economics, the great merit of Schmeidler's contribution has been to provide us with tools that have allowed us to develop tractable models to investigate these intuitions in formal exercises, per standard practice in economic theory. The opportunity we have had is quite unique. There are several other branches of decision theory that depart from the standard expected utility paradigm. But comparatively, these branches have seen far less applied work. One cannot help but speculate that the relative success of the ambiguity paradigm is in no small measure due to the tractability of Schmeidler's models. Tractability is a hallmark of a classic model. Lastly, we hope the review has demonstrated that there are many important and exciting questions that remain unanswered. Indeed, enough progress has been made with modeling issues to give us grounds to be optimistic that the answers to the open questions cannot be far away; such questions are definitely worth investigating. While we have mentioned several issues worthy of future investigation in the course of our discussions, there are a couple of issues that we have not touched upon. One, we hope there will be more empirical work, directed at testing predictions and also the basis of some of the assumptions, for instance in financial markets. Two, as would have been evident in our survey, most of the work has been of the "positive" variety, meant to help us "understand" puzzling phenomena. Perhaps we also want to think about more normative questions: for instance, "What is a good way of deciding between alternatives, in a particular applied context, when information is ambiguous?" The work of Hansen, Sargent, and their coauthors is one notable exception, but more is needed, perhaps in fields like environmental policy making where, arguably, information is in many instances ambiguous.
Acknowledgment
We thank Thibault Gajdos and Itzhak Gilboa for helpful comments.
Notes
1 See Mukerji and Tallon (2003) for a derivation of this inertia property from a primitive notion of ambiguity without relying on a parametric preference model.
2 See for instance Hansen et al. (1999), Hansen et al. (2001), and their book Hansen and Sargent (2004).
3 For instance, it is easy to check that the well-known Gini index relies on a Choquet integral (with respect to a symmetric capacity). Indeed, as recalled in the introduction to this volume, the first axiomatization of a rank-dependent model was provided in the framework of inequality measurement (Weymark, 1981). We will not expand on this literature, which has very close links with the RDEU model, here.
References
Anderson, E., L. Hansen, and T. Sargent (2003). "A Quartet of Semigroups for Model Specification, Robustness, Prices of Risk, and Model Detection," Journal of the European Economic Association. (Reprinted as Chapter 16 in this volume.)
Arrow, K. (1965). "The Theory of Risk Aversion," in Aspects of the Theory of Risk Bearing. Yrjö Jahnssonin Säätiö, Helsinki.
Ben Porath, E., I. Gilboa, and D. Schmeidler (1997). "On the Measurement of Inequality under Uncertainty," Journal of Economic Theory, 75, 443–467. (Reprinted as Chapter 22 in this volume.)
Bernheim, D. (1984). "Rationalizable Strategic Behavior," Econometrica, 52, 1007–1028.
Bewley, T. (1986). "Knightian Decision Theory: Part I," Discussion Paper 807, Cowles Foundation.
Billot, A., A. Chateauneuf, I. Gilboa, and J.-M. Tallon (2000). "Sharing Beliefs: Between Agreeing and Disagreeing," Econometrica, 68(3), 685–694. (Reprinted as Chapter 19 in this volume.)
Carlton, D. W. (1979). "Vertical Integration in Competitive Markets Under Uncertainty," Journal of Industrial Economics, 27(3), 189–209.
Chateauneuf, A., R. Dana, and J.-M. Tallon (2000). "Optimal Risk Sharing Rules and Equilibria with Choquet-Expected-Utility," Journal of Mathematical Economics, 34, 191–214.
Chen, Z. and L. Epstein (2002). "Ambiguity, Risk, and Asset Returns in Continuous Time," Econometrica, 70, 1403–1443.
De Waegenaere, A. and P. Wakker (2001). "Nonmonotonic Choquet Integrals," Journal of Mathematical Economics, 36, 45–60.
Dow, J. and S. Werlang (1992). "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, 60(1), 197–204. (Reprinted as Chapter 17 in this volume.)
—— (1994). "Nash Equilibrium Under Knightian Uncertainty: Breaking Down Backward Induction," Journal of Economic Theory, 64(2), 305–324.
Dubois, D., M. Grabisch, F. Modave, and H. Prade (2000). "Relating Decision Under Uncertainty and Multicriteria Decision Making Models," International Journal of Intelligent Systems, 15(10), 967–979.
Eichberger, J. and D. Kelsey (2002). "Strategic Complements, Substitutes, and Ambiguity: The Implications for Public Goods," Journal of Economic Theory, 106(2), 436–466.
Epstein, L. and J. Miao (2003). "A Two-person Dynamic Equilibrium Under Ambiguity," Journal of Economic Dynamics and Control, 27, 1253–1288.
Epstein, L. and M. Schneider (2003). "Recursive Multiple-Priors," Journal of Economic Theory, 113, 1–31.
Epstein, L. and T. Wang (1994). "Intertemporal Asset Pricing Under Knightian Uncertainty," Econometrica, 62(3), 283–322. (Reprinted as Chapter 18 in this volume.)
Fudenberg, D. and D. K. Levine (1993). "Self-Confirming Equilibrium," Econometrica, 61, 523–545.
Gajdos, T. and E. Maurin (2004). "Unequal Uncertainties and Uncertain Inequalities: An Axiomatic Approach," Journal of Economic Theory, 116(1), 93–118.
Ghirardato, P. (1994). "Agency Theory with Non-additive Uncertainty," mimeo.
Ghirardato, P. and J. N. Katz (2002). "Indecision Theory: Quality of Information and Voting Behavior," Discussion Paper 1106, California Institute of Technology, Pasadena.
Gilboa, I. (1989). "Expectation and Variation in Multi-period Decisions," Econometrica, 57, 1153–1169.
Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-Unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
—— (1995). "Case-Based Decision Theory," Quarterly Journal of Economics, 110, 605–639.
—— (2001). A Theory of Case-Based Decisions. Cambridge University Press, Cambridge.
Grabisch, M. (1996). "The Application of Fuzzy Integrals in Multicriteria Decision Making," European Journal of Operational Research, 89, 445–456.
Greenberg, J. (2000). "The Right to Remain Silent," Theory and Decision, 48(2), 193–204. (Reprinted as Chapter 21 in this volume.)
Hansen, L. and T. Sargent (2004). Robust Control and Economic Model Uncertainty. Princeton University Press.
Hansen, L., T. Sargent, and T. Tallarini (1999). "Robust Permanent Income and Pricing," Review of Economic Studies, 66, 873–907.
Hansen, L., T. Sargent, G. Turmuhambetova, and N. Williams (2001). "Robustness and Uncertainty Aversion," mimeo.
Henry, H. (2001). "Generalized Entropy Measures of Ambiguity and its Estimation," Working Paper, Columbia University.
Kalai, E. and E. Lehrer (1994). "Subjective Games and Equilibria," Games and Economic Behavior, 8, 123–163.
Klibanoff, P. (1996). "Uncertainty, Decision, and Normal Form Games," mimeo, Northwestern University.
Kogan, L. and T. Wang (2002). "A Simple Theory of Asset Pricing Under Model Uncertainty," Working Paper, University of British Columbia.
Lo, K. C. (1996). "Equilibrium in Beliefs Under Uncertainty," Journal of Economic Theory, 71(2), 443–484. (Reprinted as Chapter 20 in this volume.)
—— (1998). "Sealed Bid Auctions with Uncertainty Averse Bidders," Economic Theory, 12, 1–20.
—— (1999). "Extensive Form Games with Uncertainty Averse Players," Games and Economic Behavior, 28, 256–270.
Lucas, R. (1978). "Asset Prices in an Exchange Economy," Econometrica, 46, 1429–1445.
Marinacci, M. (1998). "An Axiomatic Approach to Complete Patience," Journal of Economic Theory, 83, 105–144.
—— (1999). "Limit Laws for Non-additive Probabilities, and their Frequentist Interpretation," Journal of Economic Theory, 84, 145–195.
—— (2000). "Ambiguous Games," Games and Economic Behavior, 31(2), 191–219.
Mukerji, S. (1998). "Ambiguity Aversion and Incompleteness of Contractual Form," American Economic Review, 88(5), 1207–1231. (Reprinted as Chapter 14 in this volume.)
—— (2002). "Ambiguity Aversion and Cost-Plus Contracting," Discussion Paper, Department of Economics, Oxford University.
Mukerji, S. and H. S. Shin (2002). "Equilibrium Departures from Common Knowledge in Games with Non-additive Expected Utility," Advances in Theoretical Economics, 2(1), http://www.bepress.com/bejte/advances/vol2/iss1/art2.
Mukerji, S. and J.-M. Tallon (2001). "Ambiguity Aversion and Incompleteness of Financial Markets," Review of Economic Studies, 68(4), 883–904. (Reprinted as Chapter 15 in this volume.)
—— (2003). "Ellsberg's 2-Color Experiment, Portfolio Inertia and Ambiguity," Journal of Mathematical Economics, 39(3–4), 299–316.
—— (2004a). "Ambiguity Aversion and the Absence of Indexed Debt," Economic Theory, 24(3), 665–685.
—— (2004b). "Ambiguity Aversion and the Absence of Wage Indexation," Journal of Monetary Economics, 51(3), 653–670.
Nehring, K. and C. Puppe (2002). "A Theory of Diversity," Econometrica, 70, 1155–1198.
Ozdenoren, E. (2002). "Auctions and Bargaining with a Set of Priors."
Pearce, D. (1984). "Rationalizable Strategic Behavior and the Problem of Perfection," Econometrica, 52, 1029–1050.
Rigotti, L. and C. Shannon (2004). "Uncertainty and Risk in Financial Markets," Discussion Paper, University of California, Berkeley; forthcoming in Econometrica.
Routledge, B. and S. Zin (2001). "Model Uncertainty and Liquidity," NBER Working Paper 8683, Carnegie Mellon University.
Ryan, M. J. (1999). "What Do Uncertainty-Averse Decision-Makers Believe?," mimeo, University of Auckland.
Schmeidler, D. (1989). "Subjective Probability and Expected Utility Without Additivity," Econometrica, 57(3), 571–587. (Reprinted as Chapter 5 in this volume.)
Shalev, J. (1997). "Loss Aversion in a Multi-Period Model," Mathematical Social Sciences, 33, 203–226.
Shelanski, H. and P. Klein (1999). "Empirical Research in Transaction Cost Economics: A Review and Assessment," in Firms, Markets and Hierarchies: A Transactions Cost Economics Perspective, ed. by G. Carroll and D. Teece, Chap. 6. Oxford University Press.
Uppal, R. and T. Wang (2003). "Model Misspecification and Under-diversification," Journal of Finance, 58(6), 2465–2486.
Wakai, K. (2001). "A Model of Consumption Smoothing with an Application to Asset Pricing," Working Paper, Yale University.
Wang, T. (2003). "A Class of Multi-prior Preferences," Discussion Paper, University of British Columbia.
Weymark, J. (1981). "Generalized Gini Inequality Indices," Mathematical Social Sciences, 1, 409–430.
14 Ambiguity aversion and incompleteness of contractual form

Sujoy Mukerji

American Economic Review, 88 (1998), pp. 1207–32.
State-contingent contracts record agreements about rights and obligations of contracting parties at uncertain future scenarios: descriptions of possible future events are listed and the action to be taken by each party on the realization of each listed contingency is specified. Casual empiricism suggests that a real-world contract is very often incomplete in the sense that it may not include any instruction for some possible events. The actual actions to be taken in such events are thus left to ex post negotiation. The fact that real-world contracts are incomplete explains the working of many economic institutions (see surveys by Jean Tirole (1994), Oliver D. Hart (1995), and James M. Malcomson (1997)). Take for instance the institution of the business firm. What determines whether all stages of production will take place within a single firm or will be coordinated through markets? In a world of complete contingent contracts there is no benefit from integrating activities within a single firm as opposed to transacting via the market. However, contractual incompleteness can explain why integration might be desirable and, more generally, why the allocation of authority and of ownership rights matters. This insight, developed by Ronald H. Coase (1937), Herbert A. Simon (1951), Benjamin Klein et al. (1978), Oliver E. Williamson (1985), and Sanford J. Grossman and Hart (1986), among others, underscores the importance of understanding why, and in what circumstances, contracts are left incomplete. Traditionally, incompleteness of contracts has been explained by appealing to a combination of transactions costs (e.g. Williamson, 1985) and bounded rationality (e.g. Barton L. Lipman, 1992; Luca Anderlini and Leonardo Felli, 1994). These rationalizations validate the incompleteness as an economizing action on the hypothesis that including more detail in a contract involves direct costs. This chapter will provide an alternative explanation based on the hypothesis that decision behavior under subjective uncertainty is affected by ambiguity aversion. Indeed, it will be assumed throughout that there is no direct cost to introducing a marginal contingent instruction into a contract. Suppose an agent's subjective knowledge about the likelihood of contingent events is consistent with more than one probability distribution, and further that
what the agent knows does not inform him of a precise (second-order) probability distribution over the set of "possible" probabilities. We say then that the agent's beliefs about contingent events are characterized by ambiguity. If ambiguous,1 the agent's beliefs are captured not by a unique probability distribution in the standard Bayesian fashion but instead by a set of probabilities, any one of which could be the "true" distribution. Thus not only is the particular outcome of an act uncertain but also the expected payoff of the action, since the payoff may be measured with respect to more than one probability. An ambiguity-averse decision maker evaluates an act by the minimum expected value that may be associated with it. Thus the decision rule is to compute all possible expected values for each action and then choose the act which has the best minimum expected outcome. The idea is that the more an act is adversely affected by the ambiguity, the less its appeal to the ambiguity-averse decision maker. The formal analysis in this chapter basically involves a reconsideration of the canonical model of a vertical relationship between two contracting firms under the assumption that the agents' common information about the contingent events is ambiguous and that the agents are ambiguity averse. Next, I preview this exercise with a simple example.

Consider two vertically related risk-neutral firms, B and S. B is an automobile manufacturer planning to introduce a new line of models. B wishes to purchase a consignment of car bodies (tailor-made for the new models) from S. The firms may sign a contract at some initial date 0 specifying the terms of trade of the sale at date 2; that is, whether trade takes place and at what price. The gains from trade are contingent upon the state of nature realized in date 1. There are three possible contingencies, ω0, ωb, ωs, with corresponding tradeable surpluses s0, sb, ss. After date 0 but before date 1, S invests in research for a die that will efficiently cast the car bodies required for the new model, while B invests effort to put together an appropriate marketing campaign for the new model. The investments affect the likelihood of realizing a particular state of nature. Each firm may choose between a low and a high level of investment effort. The investments are not contractible per se, but the terms of trade specification may be made as contingency specific as required. In the case that the contract is incomplete and an "unmentioned" event arises with sure potential for surplus, it is commonly anticipated by the parties that trade will be negotiated ex post and the surplus split evenly. Consider the two possibilities X and Y. X: there is a longer list of reservations for the new model than for comparable makes, and at a price higher than those for comparable makes; Y: the variable cost of production of car bodies is low. The state of the world ω0 is characterized by the fact that both statements are false. At ωb, X is true but not Y; conversely, at ωs, X is false but Y holds.2 Correspondingly, suppose s0 < sb = ss. The common belief about the likelihood of ωb is at the margin affected (positively) more by B choosing the high investment effort over low effort than by S doing the same, while the opposite is true of ωs. As is customary, we define a (first-best) efficient investment profile as one that would be chosen if investment effort were verifiable and contractible.
Bear in mind that parties are able to write complete contingent contracts and that the institutional setting is a vertical interfirm relationship. As will be
formally argued in a subsequent section, given all this and that decision makers are subjective expected utility (henceforth, SEU) maximizers, the non-verifiability of investment will not impede efficiency. In our example, for instance, a contract which distinguishes the three contingencies and sets prices that reward B sufficiently more at ωb than at the other contingencies (and similarly reward S at ωs) will enforce the first-best effort profile. The general conclusion is that if agents are SEU maximizers then an incomplete contract which implements an inefficient profile cannot be rationalized. Such a contract can never be optimal because it will be possible to find a complete contract that dominates it (i.e. a contract that obtains higher ex ante payoffs for both parties).

However, this conclusion is overturned if agents are ambiguity averse. The logic of this may be seen by reevaluating the above-mentioned example with the sole amendment that agents are ambiguity averse. To provide sufficient incentive to take the efficient investment, the ex post payoffs in the contract have to treat the two firms asymmetrically at ωb and ωs: for B the payoff is higher at ωb than at ωs, while it is the other way around for S. This implies that the firms must use different probability distributions to evaluate their expected payoffs. From the set of probabilities embodying the firms' symmetric information, B measures its payoffs using a probability distribution that puts a relatively higher weight on ωs than the distribution S thinks it prudent to check its payoff against. Consequently, the sum of the expected payoffs will fall short of the expected total surplus: there is a "virtual loss" of the expected surplus. It follows that if this "loss" is large enough the participation constraints will break, thereby making such a contract impossible. An incomplete contract, say the null contract (one that leaves all allocation of contingent surplus to ex post negotiation), is not similarly vulnerable to ambiguity aversion. Such a contract will lead to a proportionate division of surplus at each contingency, implying that each firm will use the same probability to evaluate its payoffs. Additivity of the standard expectation operator then ensures that no "virtual loss" occurs. It will be shown that from all this it follows that there will be parametric configurations for which an incomplete contract, even though only implementing an inefficient investment profile, is not dominated by any other contract. Under such circumstances the market transaction, if maintained, may justifiably be conducted with an "inefficient" incomplete contract. The "inefficiency" of the market transaction would also explain why it might be abandoned in favor of vertical integration.

Why might an explanation like the one given earlier be of interest? The final section of the chapter will discuss historic instances of vertical mergers and empirical regularities about supply contracts that are understandable on the basis of ambiguity aversion, but are not well explained by "physical" transactions costs of writing contingent details into contracts. A recurrent claim among business people is that they integrate vertically because of uncertainty in input supply. This idea has always caused difficulties for economists (see, for instance, Dennis W. Carlton, 1979), who have been unable to rationalize it and have generally regarded it as misguided (see, however, George Baker et al., 1997).
The analysis in the present chapter explains how the idea of ambiguity aversion provides one precise understanding of the link between uncertainty and vertical integration.3 Moreover, since
violations of SEU in general, and evidence of ambiguity aversion in particular, have long been noted in laboratory settings, it is worth uncovering what implications such "pathologies" have for "real-world" economics outside the laboratory. The exercise in this chapter is at least partly inspired by this thought. Finally, at a more abstract level, a significant insight obtained is that even if there were no direct cost to conditioning contractual terms on "finely described" events, one may well end up with only "coarse" arrangements because the value of fine-tuning is not robust to the agents' misgivings that they have only a vague assessment of the likelihoods of the relevant "fine" events. The rest of the chapter is organized as follows: Section 14.1 introduces the framework of the formal decision model and its underlying motivation; Section 14.2 analyzes the holdup model under the assumption of ambiguity aversion; Section 14.3 concludes the chapter with a discussion of the empirical significance of the results. The Appendix contains the formal proofs. Those eager for a first pass at the arithmetic of the main results may wish to look at Example 14.2 (which basically fleshes out the above example) in Section 14.2.
14.1. An introduction to the model of decision-making by ambiguity-averse agents
It is often the case that a decision maker's (DM) perception of the uncertain environment is ambiguous in the sense that his knowledge is consistent with more than one probability function. The theory of ambiguity aversion is inspired by two simple hypotheses about decision-making in such situations. First, that behavior is influenced by ambiguity: that is, DM's behavior actually reflects the fact that his guess about a likelihood may be given by a probability interval. By presumption agents do not necessarily behave as if they have reduced all their ambiguity to a belief consistent with a unique probability using a "second-order" probability over the different probability distributions consistent with their knowledge. Second, that agents are ambiguity averse. That is, ceteris paribus, the more ambiguous their knowledge of the uncertainty the more conservative is their choice. David Schmeidler (1989) pioneered an axiomatic derivation of a model of DMs with preferences incorporating ambiguity aversion. This chapter uses Schmeidler's model, termed the Choquet expected utility (CEU) model, in the formal arguments. The DM's domain of uncertainty is the finite state space Ω = {ω1, . . . , ωN}. The DM chooses between acts whose payoffs are state contingent: for example, an act f, f : Ω → R. In the CEU model an ambiguity-averse DM's subjective belief is represented by a convex nonadditive probability function, π. Like a standard probability function it assigns to each event a number between 0 and 1, and it is also true that (i) π(∅) = 0 and (ii) π(Ω) = 1. Where a convex nonadditive probability function differs from a standard probability is in the third property, (iii) π(X ∪ Y) ≥ π(X) + π(Y) − π(X ∩ Y), for all X, Y ⊂ Ω. By this third property,4 the measure of the union of two disjoint events may be greater than the sum of the measure of each individual event.5 A convex nonadditive probability function is actually a parsimonious representation of the full range of
probabilities compatible with the DM's knowledge. π(X) is interpreted as the minimum possible probability of X. This is readily seen from the fact that a given convex nonadditive probability π corresponds to a unique convex set of probability functions identified by the core6 of π, denoted by Δ(π) (notation: Δ(Ω) is the set of all additive probability measures on Ω): Δ(π) = {πj ∈ Δ(Ω) | πj(X) ≥ π(X), for all X ⊂ Ω}. Hence, π(X) = min{πj(X) : πj ∈ Δ(π)}. The convex nonadditive probability representation enables us to express the notion of ambiguity precisely. We say π is ambiguous if there are two events X, Y such that axiom (iii) holds with a strict inequality; π is unambiguous if axiom (iii) holds as an equality everywhere. (A DM with unambiguous belief is an SEU maximizer.) The ambiguity7 of the belief about an event X is measured by the expression A(π(X)) ≡ 1 − π(X) − π(X^c). The relation between π and Δ(π) shows that A is indeed a measure of the "fuzziness" of the belief, since A(π(X)) = max{πj(X) : πj ∈ Δ(π)} − min{πj(X) : πj ∈ Δ(π)}. The DM evaluates the Choquet expectation of each act with respect to the nonadditive probability, and chooses the act with the highest evaluation. Given a convex nonadditive probability π, the Choquet expectation8 of an act is simply the minimum of all possible "standard" expected values obtained by measuring acts with respect to each of the additive probabilities in Δ(π), the core of π:
CE(f) = min_{πj ∈ Δ(π)} Σ_{ωi ∈ Ω} f(ωi) πj(ωi).
The Choquet expectation of an act is just its standard expectation calculated with respect to a "minimizing probability" corresponding to this act. Hence, in the Choquet method the DM's appraisal is not only informed by his knowledge of the odds but is also automatically adjusted downwards to the extent it may be affected by the imprecision of his knowledge. The fact that the same additive probability (in the core of the relevant nonadditive probability) will not in general "minimize" the expectation for two different acts explains why the Choquet expectations operator, unlike the standard operator, is not additive:

Property. For any two acts f, g: CE(f) + CE(g) ≤ CE(f + g).

Two acts are comonotonic if their outcomes are monotonic across the state space in the same way: that is, the acts f and g are comonotonic if for every ω, ω′ ∈ Ω, (f(ω) − f(ω′))(g(ω) − g(ω′)) ≥ 0. Clearly, for comonotonic acts the "minimizing probability" will be the same. Hence, the Choquet expectations operator is assuredly additive if the acts being considered are comonotonic, but not otherwise. The first example explains how noncomonotonicity may lead to the failure of additivity.
Table 14.1 Relevant details about the contingent states

Possible states                              ωs             ωb
Nonadditive probability of the state         π(ωs) = 0.4    π(ωb) = 0.3
Total surplus in the state                   100            100
Surplus designated for B in the state        40             60
Surplus designated for S in the state        60             40
Example 14.1. Two agents, B and S, are considering their respective payoffs from an agreement for sharing a contingent "surplus." Table 14.1 indicates the (nonadditive) probability describing the common information about the uncertainty, the contingent surpluses, and the division of the surplus specified in the agreement. Given π(·), B's expected payoff is obtained by taking expectations with respect to the relevant minimizing probability in the core of π: CE(b) = 0.7 × 40 + 0.3 × 60 = 46. Similarly, S's expected payoff is CE(s) = 0.4 × 60 + 0.6 × 40 = 48. Finally, the expectation of the total surplus is CE(b + s) = 100. Clearly, CE(b) + CE(s) = 94 < CE(b + s). Note that the payoff vectors chosen for B and S are noncomonotonic. This is responsible for the fact that (given π) the minimizing probability corresponding to B's payoffs, (0.7, 0.3), is different from the one corresponding to S's payoffs, (0.4, 0.6); hence the evident failure of additivity. Notice how b and s mutually "hedge" against the ambiguity. This is the "economic" intuition of why the "integrated" payoff given by (b + s) is relatively robust (in the sense that its expected payoff is less affected by the possible mistake about the actual probability) to ambiguity aversion.9

In the example, the Choquet operator calculates expectations using the nonadditive probability directly by multiplying the nonadditive probability of each ωi with the payoff at the respective ωi and then multiplying the "residual" [π({ωs, ωb}) − π(ωs) − π(ωb)] with the minimum outcome of the act across the two states. Thus, CE(b) = 40 × π(ωs) + 60 × π(ωb) + min{40, 60} × [π({ωs, ωb}) − π(ωs) − π(ωb)] = 0.4 × 40 + 0.3 × 60 + 40 × 0.3 = 46. In general, the operator will associate the "residual" in an event with the worst outcome in the event. See Example 14.A.1 for further clarification.
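A minimal Python sketch (ours, not part of the original chapter) reproduces these numbers. It implements the rule just described for a general finite state space: outcomes are sorted from best to worst and each payoff is weighted by the capacity increment of the corresponding upper set, so any "residual" mass ends up attached to the worst outcomes.

```python
# Sketch (ours) verifying Example 14.1 with the capacity from Table 14.1.

states = ("ws", "wb")
capacity = {frozenset(): 0.0,
            frozenset({"ws"}): 0.4,
            frozenset({"wb"}): 0.3,
            frozenset({"ws", "wb"}): 1.0}

def choquet(act):
    """Choquet expectation of act (a dict state -> payoff) w.r.t. capacity."""
    order = sorted(states, key=lambda w: act[w], reverse=True)
    total, prev = 0.0, frozenset()
    for k, w in enumerate(order):
        event = frozenset(order[:k + 1])      # the k+1 best states
        total += act[w] * (capacity[event] - capacity[prev])
        prev = event
    return total

b = {"ws": 40, "wb": 60}
s = {"ws": 60, "wb": 40}
print(choquet(b), choquet(s))                       # 46.0 48.0
print(choquet({w: b[w] + s[w] for w in states}))    # 100.0 > 46 + 48 = 94
```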
A convex nonadditive probability function expresses the idea that an agent's knowledge of the true likelihood of an event E is less vague than his knowledge of the likelihood of a cell in a "fine" partition of E.10 It is common experience that the evidence and received knowledge that informs one of the likelihood of a "large" event does not readily break down to give similar information about the "finer" constituent events. While it is routine to work out an "objective" next-day forecast of the probability of rain in the New York area, the same analytics generally would not yield a similar forecast for a particular borough in New York. Beliefs are not built "bottom up": one typically does not figure out the belief about a "large" event by putting together the beliefs about all possible subevents. This rationale for nonadditive probabilities is formalized in Paolo Ghirardato (1994) and Mukerji (1997). These papers also point out how a similar rationale explains why the DM's awareness that the precise implication of some contingencies is inevitably left unforeseen will typically lead to beliefs that have a nonadditive representation. The papers explain the Choquet decision rule as a "procedurally rational" agent's means of "handicapping" the evaluation of an act to the extent the estimate of its "expected performance" is adversely affected by his imprecise knowledge of the odds.

There is considerable experimental evidence (see Colin F. Camerer and Martin Weber, 1992) demonstrating that DMs' choices are influenced by ambiguity and also that aversion to the perceived ambiguity is typical. The classic experiment, due to Daniel Ellsberg (1961), runs as follows. There are two urns, each containing one hundred balls. Each ball is either red or black. The subjects are told that there are fifty balls of each color in urn I, but no information is provided about the proportion of red and black balls in urn II. One ball is chosen at random from each urn. There are four events, denoted IR, IB, IIR, IIB, where IR denotes the event that the ball chosen from urn I is red, etc. On each of the events a bet is offered: $100 if the event occurs and $0 if it does not. The modal response is for a subject to prefer every bet from urn I (IR or IB) to every bet from urn II (IIR or IIB). That is, the typical revealed preference is IB ≻ IIB and IR ≻ IIR. (The preferences are strict.) The DM's beliefs about the likelihood of the events, as revealed in the preferences, cannot be described by a unique probability distribution. The story goes: people dislike the ambiguity that comes with choice under uncertainty; they dislike the possibility that they may have the odds wrong and so make a wrong choice (ex ante). Hence they go with the gamble where they know the odds: betting from urn I. It is straightforward to check that the choice is consistent with convex nonadditive probabilities: for instance, let π(IR) = π(IB) and π(IR) + π(IB) = π(IR ∪ IB) = 1; also let π(IIR) = π(IIB), but allow π(IIR) + π(IIB) < π(IIR ∪ IIB) = 1. It follows that the expected payoff from betting on IR is CE(IR) = CE(IB) = 50, while CE(IIR) = CE(IIB) = π(IIR) × 100 = π(IIB) × 100 < 50.

The theory of ambiguity aversion lends fresh insight into the analysis of important economic problems. Despite the relative novelty of the theory, there is already convincing evidence of this. Interesting applications in the area of finance include James P. Dow and Sergio R. Werlang (1992), Larry G. Epstein and Tan Wang (1994, 1995), and Jean-Marc Tallon (1998). Specific applications to strategic interaction
include Dow and Werlang (1994), Mukerji (1995), Jurgen Eichberger and David Kelsey (1996), and Kin Chung Lo (1998).
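To verify the Ellsberg arithmetic above numerically, here is a tiny sketch (ours; the capacity value 0.4 for the ambiguous urn is an invented illustration):

```python
# Tiny check (ours). For a binary bet paying 100 on an event E and 0
# otherwise, the Choquet value is 100 * pi(E): the "residual" capacity mass
# is attached to the worst outcome, 0.

def ce_bet(pi_event: float) -> float:
    return 100 * pi_event + 0 * (1 - pi_event)

pi_IR = pi_IB = 0.5        # urn I: unambiguous fifty-fifty
pi_IIR = pi_IIB = 0.4      # urn II: invented convex capacity, 0.4 + 0.4 < 1
print(ce_bet(pi_IR), ce_bet(pi_IIR))   # 50.0 40.0: urn I bets are preferred
```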
14.2. Ambiguous beliefs, investment holdup and incomplete contracts: a formal analysis
Consider a (downstream) buyer B who wishes to purchase a unit of a homogeneous input from an (upstream) seller S. B and S, who are assumed to be risk neutral, may sign a contract at an initial date 0. The contract will specify the terms of trade at date 2, which is when 0 or 1 unit of the input may be produced and traded. After date 0 but before date 1, B and S make relation-specific sunk investments β ∈ {βH, βL} and σ ∈ {σH, σL} respectively. Investments are like effort, in the sense of having an unverifiable component; and so it is supposed that β and σ are not contractible. The buyer's valuation, v, and the seller's production costs, c, are contingent on the state of the world ωi realized at date 1; ωi is drawn from a finite set Ω = {ω1, . . . , ωN}, and v = v(ωi), c = c(ωi). The (contingent) surplus at ωi is s(ωi) ≡ v(ωi) − c(ωi). The contingencies are indexed such that s(ωi) is (weakly) increasing in i. The vector of joint investments determines the belief (common information to B and S) about the likelihood of each contingent event, represented by a (possibly nonadditive and convex) probability distribution over 2^Ω. π(E|β, σ) is the probability that the event E ⊆ Ω is realized when the investment profile is (β, σ). hB(β), hS(σ) denote the respective (private) costs of the actions β and σ. Since the investment of one party may affect the likelihood of the ex post surplus and hence the expected gains from trade of the other party, there is a potential problem of untapped externalities, and therein lies the genesis of the holdup problem. The primitives of the model are then described by a tuple (π(·|β, σ), hB(·), hS(·), v(·), c(·)). If f : Ω → R, then f denotes the vector [f(ω1), . . . , f(ωN)]; π(β, σ) denotes the vector [π(ω1|β, σ), . . . , π(ωN|β, σ)]. Given a contingency ωi, let X(ωi) denote the event X ⊆ Ω with the property that ωj ∈ X ⇔ j ≥ i. The following assumptions on the specification of the model are maintained throughout.

Assumption 14.1. hB(βH) > hB(βL) > 0, hS(σH) > hS(σL) > 0.

Assumption 14.2a. π(X(ωi)|βH, σ) ≥ π(X(ωi)|βL, σ) for all i ∈ {1, . . . , N} and there is an ωn with the property that s(ωn) > s(ωn−1), where the above inequality holds strictly.

Assumption 14.2b. π(X(ωi)|β, σH) ≥ π(X(ωi)|β, σL) for all i ∈ {1, . . . , N} and there is an ωn∗ with the property that s(ωn∗) > s(ωn∗−1), where the above inequality holds strictly.

The first assumption has it that "H" actions are costlier than "L" actions. Assumption 14.2a(b) simply says that βH (σH) stochastically dominates βL (σL) in the first-order sense.11
ES(β, σ), defined in Equation (14.1), is understood to be the expected net surplus under vertical integration when the profile (β, σ) is chosen:

ES(β, σ) ≡ E(max{s(ωi), 0} | β, σ) − hB(β) − hS(σ).    (14.1)
(E denotes the standard expectations operator when π is additive and the Choquet expectations operator if π is nonadditive.) The expression ES(β, σ) provides a natural way to rank profiles (β, σ) as first best,12 second best, and so on. For instance, (β∗, σ∗) is a first-best profile if (β∗, σ∗) = arg max{ES(β, σ)}. To keep matters interesting, I will restrict attention to parameter configurations which ensure that the expected net surplus from a second-best profile is nonnegative. It is assumed that parties can write contracts as contingency specific as they choose; that is, parties may make prices and delivery contingent on events in Ω
in any way they wish.13 A contingent price p(ωi) is the payment by B to S subject to the realization of ωi. The price may take a negative value; one may informally interpret the price p(ωi) < 0 as a fine paid by S to B. A contingent delivery rule is a function δ : Ω → {0, 1}, where δ(ωi) = 1 (or 0) indicates an agreement for contingent delivery14 (or nondelivery) on date 2. A contract is a list of contingent instructions {(p(ωi), δ(ωi))}i∈N′, N′ ⊆ {1, . . . , N}. A given contract will imply a particular allocation of the surplus between the buyer and seller at each contingency. Correspondingly, a contingent transfer t(ωi) ≡ p(ωi) − c(ωi)δ(ωi) is S's ex post payoff at ωi; the complement s(ωi)δ(ωi) − t(ωi) goes to B. A contract implements an action profile (β, σ) if for all β′ ∈ {βH, βL} and σ′ ∈ {σH, σL} the conditions ICB, PCB, ICS, PCS stated below are satisfied.

E(v(ωi)δ(ωi) − p(ωi) | β, σ) − E(v(ωi)δ(ωi) − p(ωi) | β′, σ) ≥ hB(β) − hB(β′)    (14.2)

E(v(ωi)δ(ωi) − p(ωi) | β, σ) − hB(β) ≥ 0    (14.3)

E(p(ωi) − c(ωi)δ(ωi) | β, σ) − E(p(ωi) − c(ωi)δ(ωi) | β, σ′) ≥ hS(σ) − hS(σ′)    (14.4)

E(p(ωi) − c(ωi)δ(ωi) | β, σ) − hS(σ) ≥ 0.    (14.5)
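To fix ideas, here is a hypothetical numerical sketch (ours; all primitives are invented, and the additive, unambiguous special case of E is assumed) that computes ES(β, σ) as in (14.1) and checks conditions (14.2)–(14.5) for a candidate contract:

```python
# Hypothetical sketch (ours, invented primitives; additive beliefs only).

from itertools import product

states = ["w0", "wb", "ws"]
v = {"w0": 1.0, "wb": 5.0, "ws": 5.0}            # buyer's contingent valuations
c = {"w0": 0.5, "wb": 1.0, "ws": 1.0}            # seller's contingent costs
h_B = {"L": 0.1, "H": 0.6}                       # buyer's private effort costs
h_S = {"L": 0.1, "H": 0.6}                       # seller's private effort costs
pi = {("H", "H"): {"w0": 0.2, "wb": 0.4, "ws": 0.4},   # beliefs pi(.|beta, sigma)
      ("H", "L"): {"w0": 0.4, "wb": 0.4, "ws": 0.2},
      ("L", "H"): {"w0": 0.4, "wb": 0.2, "ws": 0.4},
      ("L", "L"): {"w0": 0.6, "wb": 0.2, "ws": 0.2}}

def E(x, beta, sigma):
    return sum(pi[(beta, sigma)][w] * x[w] for w in states)

def ES(beta, sigma):                              # expected net surplus, eq. (14.1)
    s = {w: max(v[w] - c[w], 0.0) for w in states}
    return E(s, beta, sigma) - h_B[beta] - h_S[sigma]

def implements(p, d, beta, sigma):
    """True iff (beta, sigma) satisfies IC_B, PC_B, IC_S, PC_S under (p, d)."""
    b_pay = {w: v[w] * d[w] - p[w] for w in states}        # buyer's ex post payoff
    s_pay = {w: p[w] - c[w] * d[w] for w in states}        # seller's ex post payoff
    ic_b = all(E(b_pay, beta, sigma) - h_B[beta] >= E(b_pay, b2, sigma) - h_B[b2]
               for b2 in "HL")
    ic_s = all(E(s_pay, beta, sigma) - h_S[sigma] >= E(s_pay, beta, s2) - h_S[s2]
               for s2 in "HL")
    pc_b = E(b_pay, beta, sigma) - h_B[beta] >= 0
    pc_s = E(s_pay, beta, sigma) - h_S[sigma] >= 0
    return ic_b and ic_s and pc_b and pc_s

print(max(product("HL", repeat=2), key=lambda bs: ES(*bs)))  # ('H', 'H') is first best
# A contract rewarding B relatively more at wb and S relatively more at ws:
p = {"w0": 0.7, "wb": 2.0, "ws": 4.0}
d = {"w0": 1, "wb": 1, "ws": 1}
print(implements(p, d, "H", "H"))                            # True
```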
The conditions ensure that payments obtained in the contract meet the incentive and participation constraints of the buyer and seller, so that they choose β and σ . A contract is deemed incomplete if it fails to include instructions for the value of p(ωi ) or of δ(ωi ) for at least one contingency ωi . The null contract is an extreme example of an incomplete contract. It is a contract which does not specify instructions about the terms of trade to be observed in any contingency (or whether any trade is to be conducted at all). It is assumed that if a contingency promising
positive gains from trade is reached, parties engage in trade in spite of the absence of instructions. In such an instance the terms of trade are negotiated ex post. It will be taken as given that any instance of ex post bargaining results in a λ̄ (to B), 1 − λ̄ (to S) proportional division of the surplus arising from trade. It is assumed that the value of λ̄ is commonly known to both parties at date 0. Consider a contract C and let tC(·) be the associated transfer rule (where tC(·) is suitably extended in accordance with our assumption about the division of surplus at unwritten contingencies). Denote I(C) to be the set of profiles that can be implemented by C. Then the expected payoff from C is

max_{(β,σ)∈I(C)} {E(s(ωi)δ(ωi) − tC(ωi)|β, σ) + E(tC(ωi)|β, σ)}.
(In this maximand, E(s(ωi)δ(ωi) − tC(ωi)) is B's expected payoff and E(tC(ωi)) is S's expected payoff.) I define optimal contracts to be Pareto optimal (ex ante). Hence a contract is optimal if there does not exist another contract with a greater expected payoff. The point of writing a contingent contract in this setting is to make payoffs contingent on events in a way that ensures that parties have the right incentives for undertaking the requisite actions. In other words, the contingent events serve as proxies for the actual (noncontractible) actions. Hence, for the exercise to be meaningful at all, the contingency space has to be minimally informative. Condition 14.1 ensures that there are contingencies which are differentially informative of agents' actions. In other words, for any action there is at least one contingency whose likelihood is differently affected (at the margin) by this action than by any other. If this were not to hold, contingent events would be completely uninformative about the actions taken by the parties and writing contingent contracts would not be a meaningful exercise. Henceforth, a convex nonadditive probability π(·|β, σ) will be referred to as an informed belief if it satisfies Condition 14.1.

Condition 14.1 (Informativeness). Let Δ(π(·|β, σ)) denote the set of additive probability measures in the core of the convex nonadditive probability function π(·|β, σ). Suppose πm and πn are probability functions in Δ(π(·|βH, σH)), πr is any member of Δ(π(·|βL, σH)), and πq is any member of Δ(π(·|βH, σL)). Then there exists at least one pair of contingencies ωk and ωl such that the vectors (πm(ωk) − πr(ωk), πm(ωl) − πr(ωl)) and (πn(ωk) − πq(ωk), πn(ωl) − πq(ωl)) are linearly independent.

The condition, as stated, applies to any convex nonadditive probability, which includes as a special case the instance when beliefs are additive (i.e. they are trivially nonadditive). The technical import of the condition is simpler to grasp if
we were to focus on this special case. In this special case, the condition requires independence between the vectors indicating the marginal effect of β and σ on the unique probability describing the uncertainty. In general, the condition requires the same of every distribution in the set of probabilities consistent with the nonadditive probability that might describe the uncertainty. The condition ensures that there are at least two contingencies whose likelihoods (as described by any of the probability functions consistent with the DM's knowledge) are differently affected at the margin by βH and σH. The following lemma assures us that Condition 14.1 does the job required of it: if the condition is satisfied, then there exist ways of conditioning payments to meet the incentive compatibility requirements of implementing the first best.

Lemma 14.1. Suppose π(·|β, σ) is an informed convex nonadditive probability function and that (βH, σH) is the first-best action profile. Then there will always be a bounded transfer rule, t̃, which satisfies the incentive compatibility conditions IC_B, IC_S:

(IC_B) E(max{s(ωi), 0} − t̃(ωi)|βH, σH) − E(max{s(ωi), 0} − t̃(ωi)|βL, σH) ≥ hB(βH) − hB(βL)
(IC_S) E(t̃(ωi)|βH, σH) − E(t̃(ωi)|βH, σL) ≥ hS(σH) − hS(σL).

The first proposition proves that if agents' beliefs are not ambiguous, then informativeness of the contingency space guarantees the existence of a contract that implements the first best. The result is not really new; Steven R. Williams and Roy Radner (1988) and Patrick Legros and Hitoshi Matsushima (1991), for example, arrive at a similar conclusion.

Proposition 14.1. Suppose π(·|β, σ) is unambiguous and informed. Let (βH, σH) be the first-best action profile. Then there exists a contract with an associated bounded transfer rule, t, that implements the first best.

The formal proof appears in the Appendix but the basic argument is straightforward: if, at the margin, the effect of B's action on the likelihood of event {ωk} is greater than its effect on the likelihood of event {ωl} (and the converse is true for S's action vis-à-vis events {ωk} and {ωl}), then a unit of the contingent surplus assigned to B in the event {ωk} has a relatively greater impact on B's decisions than a unit of contingent surplus assigned to it in the event {ωl}; a parallel argument works for S and the events {ωl} and {ωk}. Hence adequate incentives can be put in place by rewarding B sufficiently more at {ωk} than at {ωl}, and by rewarding S more at {ωl} than at {ωk}. Since the expectation operator is additive, the sum of the agents' expected payments would be equal to the net surplus obtainable under vertical integration. Thus the participation constraints may be satisfied by appropriate ex ante transfers. Indeed, Williams and Radner (1988) demonstrate that generically, in the space of probability distributions, there exist contingent transfers that would
enforce efficient implementation. In other words, a condition such as 14.1 will hold generically in the data of the model. An argument invoking mathematical genericity is not necessarily compelling though, since the data of the model are not necessarily generated by "random sampling." Arguably, the parameter values are likely to be specific to the institutional setting. However, bear in mind that the relevant institutional setting is one of a vertical relationship between two firms operating distinct (even though complementary) production processes. As such, it is only reasonable to assume that events such as {ωl} and {ωk} are bound to exist. (In Example 14.2, and its verbal description in Section 14.1, {ωb} and {ωs} are such events.) Therefore, Proposition 14.1 gives us compelling reason to conclude that, at least as far as vertical relationships between firms are concerned, if agents' common beliefs were unambiguous and there were no direct cost to drafting contingent payments, contracts can achieve efficient implementation. Clearly then, in such a world, mergers cannot better what can be achieved with contracts and, furthermore, an incomplete contract that does not implement efficiently cannot be the best possible contract. The following Corollaries 14.1 and 14.2 summarize these conclusions.

Corollary 14.1. Suppose π(·|β, σ) is unambiguous and informed. Then any contract that does not implement the first best is not an optimal contract.

Corollary 14.2. Suppose π(·|β, σ) is unambiguous and informed. Then the maximum net expected surplus under vertical integration, max{ES(β, σ)}, is no greater than the expected payoff obtainable from an optimal contract.

The next and key proposition shows that, contrary to the first result, even if informative events exist, ambiguity-averse agents may not be able to draft contracts that implement the first best. Proposition 14.2 says that provided the convex nonadditive probability π is ambiguous and satisfies two conditions, there exists a nonempty set of cost and value functions such that the efficient investment cannot be implemented with a contract—and this in spite of there being sufficiently informative contractible events. The first of these two conditions, encapsulated in Conditions 14.2a and 14.2b, essentially rules out cases where the contingency space (labeled according to increasing surplus) may be partitioned into two cells, one of which is such that its likelihood is affected (at the margin) by only one party (B or S, not both). The final condition, Condition 14.3, may be interpreted to mean that the belief about any event consisting of two contiguous contingencies is ambiguous.15 The role played by the two conditions is explained along with the general intuition for the proof, after the statement of the proposition. A precise statement of Condition 14.2 and the details of the proof are facilitated by defining a concept of the "social benefit of an action." Define the Social Benefit of the action βH over the action βL [denoted SocBen(βH/βL)] as E(max{s(ωi), 0}|βH, σH) − E(max{s(ωi), 0}|βL, σH). Analogously, SocBen(σH/σL) ≡ E(max{s(ωi), 0}|βH, σH) − E(max{s(ωi), 0}|βH, σL). Define π(X|β/β′, σ) ≡ π(X|β, σ) − π(X|β′, σ),
and π(X|β, σ/σ′) ≡ π(X|β, σ) − π(X|β, σ′).

Condition 14.2a. There does not exist an i* ∈ {1, ..., N} such that

Σ_{N>k≥i*} s(ωk)[π(X(ωk)|βH/βL, σH) − π(X(ωk+1)|βH/βL, σH)] + s(ωN)π({ωN}|βH/βL, σH) = SocBen(βH/βL)

and

Σ_{i*>k>0} s(ωk)[π(X(ωk)|βH, σH/σL) − π(X(ωk+1)|βH, σH/σL)] = SocBen(σH/σL).

Condition 14.2b. There does not exist a j* ∈ {1, ..., N} such that

Σ_{N>k≥j*} s(ωk)[π(X(ωk)|βH, σH/σL) − π(X(ωk+1)|βH, σH/σL)] + s(ωN)π({ωN}|βH, σH/σL) = SocBen(σH/σL)

and

Σ_{j*>k>0} s(ωk)[π(X(ωk)|βH/βL, σH) − π(X(ωk+1)|βH/βL, σH)] = SocBen(βH/βL).

Condition 14.3. π({ωk+1, ωk}|β, σ) > π(ωk+1|β, σ) + π(ωk|β, σ) for all k ∈ {1, ..., N − 1} and for all β ∈ {βH, βL}, σ ∈ {σH, σL}.

Proposition 14.2. Suppose π(·|βH, σH), π(·|βH, σL), π(·|βL, σH), π(·|βL, σL) are informed and ambiguous beliefs, and let (βH, σH) be the first-best profile. Provided the beliefs also satisfy Conditions 14.2a, 14.2b, and 14.3, there exist investment cost functions hB(·), hS(·), and value and cost functions v(·) and c(·), such that no contract may implement (βH, σH).

The proposition follows immediately from Lemmas 14.A.1, 14.A.2, and 14.A.3 proved in the Appendix. The simple intuition inspiring the formal proof may be stated as follows: If contingent payments are comonotonic (e.g. a proportional split of the surplus) then the full incremental social benefit of an agent's action does not get passed on to the agent. Hence, the agent's individual incentive to take the first-best action will be lower than the full marginal benefit of the action. So if the marginal cost of the first-best action is high enough, only noncomonotonic contingent payments could satisfy the relevant incentive constraints.
But with noncomonotonic payments, given Condition 14.3, the sum of the individual expected payoffs is bound to be less than the expected surplus under vertical integration. Therefore, if the marginal cost of the first-best action is high enough, one may not find contingent payments that satisfy both the incentive and participation constraints. However, if Condition 14.2 were not to hold, then it would be possible to design comonotonic contingent payoffs that enforce the efficient investment. Basically, such a payoff scheme would partition Ω into two cells {ω1, ..., ωi*} and {ωi*+1, ..., ωN}: all the surplus in {ω1, ..., ωi*} will go to one party and all the incremental surplus in the complementary cell (i.e. s(ωi*+1) − s(ωi*), s(ωi*+2) − s(ωi*), ...) will go to the other party.

The argument in Proposition 14.2 suggests why incomplete contracts may not be a paradox in a world with ambiguity aversion, since their inefficiency will not be "readily fixable"—complete contracts do not help very much. Proposition 14.3 shows that the same conditions as in Proposition 14.2 also imply that there will exist cost and value functions corresponding to which the null contract is an optimal contract, even though it implements a less than first-best profile. This shows formally that ambiguity aversion can rationalize an "inefficient" incomplete contract even when there exist sufficiently informative events for conditioning contractual payments. As is known from Corollary 14.1, this is impossible without ambiguity aversion. Before stating the formal result, we look at an example. It will illustrate the arithmetic of how, because of ambiguity aversion, a null incomplete contract that can at best implement an inefficient profile can dominate a contract which satisfies the incentive constraints for implementing the first-best profile. (A verbal description of the example appeared in the introduction.)

Example 14.2. Consider two vertically related firms B and S. Ω = {ω0, ωb, ωs}; the values of the other parameters are as follows:

s(ω0) = 0, s(ωb) = s(ωs) = 200;
hB(βL) = 10, hS(σL) = 10, hB(βH) = 85, hS(σH) = 85;
π(βL, σL) = (0.78, 0.01, 0.01), π(βH, σH) = (0.02, 0.39, 0.39), π(βH, σL) = (0.42, 0.365, 0.015), π(βL, σH) = (0.42, 0.015, 0.365);
π({ωb, ωs}|β, σ) − π({ωb}|β, σ) − π({ωs}|β, σ) = 0.1,
π({ω0, ωb}|β, σ) − π({ω0}|β, σ) − π({ωb}|β, σ) = 0.1,
π({ω0, ωs}|β, σ) − π({ω0}|β, σ) − π({ωs}|β, σ) = 0,
π(Ω|β, σ) = 1;
λ̄ = 0.5.
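All of the example's Choquet computations can be replicated mechanically. The sketch below (Python, reusing choquet_expectation from the earlier sketch; every name and the encoding are illustrative assumptions) builds the capacity from the additive parts (summarized in Table 14.2 below) and the profile-independent residuals, and reproduces the expected-net-surplus ranking computed in the text:

```python
S = {'w0': 0.0, 'wb': 200.0, 'ws': 200.0}              # surpluses s(.)
ADD = {                                                 # pi(w | beta, sigma)
    ('bL', 'sL'): {'w0': .78, 'wb': .01,  'ws': .01},
    ('bH', 'sL'): {'w0': .42, 'wb': .365, 'ws': .015},
    ('bL', 'sH'): {'w0': .42, 'wb': .015, 'ws': .365},
    ('bH', 'sH'): {'w0': .02, 'wb': .39,  'ws': .39},
}
RES = {frozenset({'wb', 'ws'}): .1, frozenset({'w0', 'wb'}): .1,
       frozenset({'w0', 'ws'}): .0}                     # residual probabilities
COST = {('bL', 'sL'): 20, ('bH', 'sL'): 95, ('bL', 'sH'): 95, ('bH', 'sH'): 170}

def cap(profile):
    """The convex nonadditive probability pi(.|beta, sigma) as a set function."""
    add = ADD[profile]
    def nu(X):
        if len(X) == 3:
            return 1.0                                  # pi(Omega) = 1
        return sum(add[w] for w in X) + (RES[X] if len(X) == 2 else 0.0)
    return nu

for prof in ADD:                                        # uses choquet_expectation above
    gross = choquet_expectation({w: max(S[w], 0.0) for w in S}, cap(prof))
    print(prof, round(gross - COST[prof], 6))
# -> ES = 4, 1, 1, 6: (bH, sH) is the first best and (bL, sL) the second best.
```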
Table 14.2 Nonadditive probability over each state corresponding to each action profile

             ω0      ωb      ωs
βL, σL      0.78    0.01    0.01
βH, σL      0.42    0.365   0.015
βL, σH      0.42    0.015   0.365
βH, σH      0.02    0.39    0.39
The first point to observe is that the essential effect of B's taking the H action is to "shift likelihood" from the low surplus state ω0 to the high surplus state ωb. Symmetrically, S's H action would "shift likelihood" from the low surplus state ω0 to the high surplus state ωs. This information is summarized in Table 14.2. (NB, the actions do not affect the "residual probability" over any event; for the events {ωb, ωs} and {ω0, ωb} the residual is constant at 0.1, while it is 0 for {ω0, ωs}.) B and S have a "comparative advantage" in making ωb and ωs, respectively, more likely at the margin, as it were. Next note that (βH, σH) is the first-best, and (βL, σL) the second-best, action profile. To confirm this, compare the expected net surplus from the possible action profiles:

ES(βH, σH) = 0.39 × s(ωb) + 0.39 × s(ωs) + 0.1 × min{s(ωb), s(ωs)} + 0.02 × s(ω0) + 0.1 × min{s(ωb), s(ω0)} − 170 = 6,
ES(βL, σH) = 0.015 × 200 + 0.365 × 200 + 0.1 × min{200, 200} + 0.42 × 0 + 0.1 × min{200, 0} − 95 = 1,
ES(βH, σL) = 0.365 × 200 + 0.015 × 200 + 0.1 × min{200, 200} + 0.42 × 0 + 0.1 × min{200, 0} − 95 = 1,
ES(βL, σL) = 0.01 × 200 + 0.01 × 200 + 0.1 × min{200, 200} + 0.78 × 0 + 0.1 × min{200, 0} − 20 = 4.

Next consider the contract C′:

C′(ωi):  p(ωb) = c(ωb), δ(ωb) = 1;  p(ωs) = v(ωs), δ(ωs) = 1;  p(ω0) = 0, δ(ω0) = 0.

This contract allocates the entire surplus at ωb to B and rewards S similarly at ωs. Given the way the parameters lie, this would seem to be a natural way to deliver the
right incentives. Indeed, C′ does satisfy the incentive constraints for implementing the profile (βH, σH):

IC_B: [0.39 × s(ωb) + 0.39 × 0 + 0.1 × min{s(ωb), 0}] − [0.015 × s(ωb) + 0.365 × 0 + 0.1 × min{s(ωb), 0}] = 75 = hB(βH) − hB(βL)

IC_S: [0.39 × 0 + 0.39 × s(ωs) + 0.1 × min{0, s(ωs)}] − [0.365 × 0 + 0.015 × s(ωs) + 0.1 × min{0, s(ωs)}] = 75 = hS(σH) − hS(σL).
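These incentive computations, the negative expected payoff from C′ noted next, and the claim that the null contract implements (βL, σL) can all be replicated numerically, continuing the sketch given after the example's parameters (same choquet_expectation, cap, and S; every name is an illustrative assumption):

```python
hB = {'bL': 10, 'bH': 85}
hS = {'sL': 10, 'sH': 85}
fB = {'w0': 0.0, 'wb': 200.0, 'ws': 0.0}   # B's ex post payoff under C'
fS = {'w0': 0.0, 'wb': 0.0, 'ws': 200.0}   # S's ex post payoff under C'
CE = lambda f, prof: choquet_expectation(f, cap(prof))

# The incentive constraints for (bH, sH) hold with equality ...
print(round(CE(fB, ('bH', 'sH')) - CE(fB, ('bL', 'sH')), 6))  # 75.0 = hB['bH'] - hB['bL']
print(round(CE(fS, ('bH', 'sH')) - CE(fS, ('bH', 'sL')), 6))  # 75.0 = hS['sH'] - hS['sL']
# ... but the sum of the expected payoffs cannot cover the investment costs.
print(round(CE(fB, ('bH', 'sH')) + CE(fS, ('bH', 'sH')) - 170, 6))  # -14.0

# Null contract: an equal split (lambda-bar = 0.5) of the realized surplus.
half = {w: 0.5 * max(S[w], 0.0) for w in S}
print(round(CE(half, ('bL', 'sL')) - hB['bL'], 6))  # 2.0 >= 0: participation holds
print(round(CE(half, ('bH', 'sL')) - hB['bH'], 6))  # -37.0: deviating to H does not pay
```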
However, no ex ante transfers may be arranged to satisfy the participation constraints since the expected payoff from C′ is negative. The expected payoff is (0.39 × 200 + 0.39 × 0 + 0.1 × min{200, 0}) × 2 − 85 × 2 = −14 < 0. Also note that the incentive constraints corresponding to C′ hold without the slightest slack. Any other contract which "smoothed" out the ex post payoffs (so as to increase the sum of the ex ante payoffs) would break at least one of the incentive constraints for the H actions. Thus the first best cannot be implemented. On the other hand, it is simple to check that the null contract will successfully implement the second-best profile (βL, σL) given the ex post renegotiation anticipated at the time of signing the contract. Hence the null contract dominates C′. The null contract is equivalent to having a complete contract that instructs no trade at each contingency, with renegotiation and trade arising when the states ωb and ωs are realized. Obviously, completing the contract in this way is as good as leaving the contingencies unmentioned. (It is also possible to mimic the null contract with a complete contract by actually instructing formally in the contract an equal split of the surplus and full delivery at all contingencies.)

The following lemma, which is used in proving Proposition 14.3, shows how incomplete contracts (especially null contracts) are relatively "robust" to ambiguity aversion. Such contracts actually entail a comonotonic (ex post) payoff scheme across the set of contingencies where the contract (by its silence) leaves payoffs to be determined by ex post negotiation. Compared to contracts with noncomonotonic payoffs, these contracts are robust in the sense that the sum of the value of a contract to B and to S is relatively less adversely affected by ambiguity. Thus if the contingent payoffs from a null contract can satisfy the individual incentive constraints and the aggregate participation constraint for a particular profile, it can necessarily implement the profile.

Lemma 14.2. Suppose the following inequalities hold for all β ∈ {βH, βL} and σ ∈ {σH, σL}, the expectations being evaluated with respect to a nonadditive probability:

λ̄[E(max{s(ωi), 0}|β̂, σ̂) − E(max{s(ωi), 0}|β, σ̂)] ≥ hB(β̂) − hB(β)    (14.6)
(1 − λ̄)[E(max{s(ωi), 0}|β̂, σ̂) − E(max{s(ωi), 0}|β̂, σ)] ≥ hS(σ̂) − hS(σ)    (14.7)

E(max{s(ωi), 0}|β̂, σ̂) ≥ hS(σ̂) + hB(β̂).    (14.8)
Then a contract will implement the profile (β̂, σ̂) if it specifies an allocation of the ex post surplus in accordance with the uncontingent sharing rule λ̄ (to B), 1 − λ̄ (to S).

Proposition 14.3. Let (π*(·|βH, σH), π*(·|βH, σL), π*(·|βL, σH), π*(·|βL, σL)) be a tuple of ambiguous and informed beliefs that satisfy Conditions 14.2a, 14.2b, and 14.3. Then for each such tuple there exist investment cost functions h*B(·), h*S(·), and value and cost functions v*(·) and c*(·), such that the null contract is an optimal contract, even though the null contract does not implement the first-best profile.

A gist of the proof runs as follows: By Proposition 14.2 we are assured that corresponding to any belief satisfying Conditions 14.2 and 14.3, there is a set of parametric configurations for which there is no contract that implements the first best. Then it can be shown that for a nonempty subset of the set of such parametric configurations a null contract will satisfy the necessary incentive constraints for implementing a second-best profile. By Lemma 14.2 it then follows that the null contract will actually implement the second-best profile. Since the first-best profile cannot be implemented by any contract, the null contract in such cases must be the optimal contract. Since this is true even in the case of an informed belief, the result stands in stark contrast to Corollary 14.1.

It is important to clarify the sense in which this reasoning "predicts" incomplete contracts. Formally, the incomplete contract corresponds to the parties knowing that they will split the surplus ex post according to a state-independent sharing rule. But they could write a complete contract that calls for this explicitly. Hence the argument per se does not show that ambiguity aversion can imply that the optimal contract must be incomplete. Nevertheless, the argument does imply that even the smallest transactions costs would make the incomplete contract (strictly) preferable.

Proposition 14.3 also suggests a rationalization of the elusive connection between supply uncertainty and vertical integration mentioned in the introductory section. Suppose, as seems natural, we interpret "supply uncertainty" to mean the uncertainty about the (realized) price and delivery associated with a supply contract. Could there be circumstances wherein the supply uncertainty accompanying any contract with sufficient incentives for the efficient action is more than what the transacting parties want to bear? In such cases vertical integration as an institutional form would have the potential of generating strictly greater value than a contractual transaction. Corollary 14.2 rules out such circumstances if the agents' beliefs about the uncertainty are unambiguous. However, Proposition 14.3 allows us to infer that such cases do exist under ambiguity aversion, as Corollary 14.3 records. Thus it is the possible ambiguity aversion associated with supply uncertainty which provides the logical link to vertical integration.
To see the point a little differently, given that agents perceive the uncertainty to be ambiguous and are ambiguity averse, an "efficient" contractual relationship (because of the "high-powered" nature of the incentive scheme it must embody) would only exacerbate the adverse effect of the uncertainty. This "loss of surplus" is avoided by integrating. By monitoring via an administrative hierarchy, integration makes it possible to deliver adequate incentives without involving contingent payments. Obviously, integration would be the preferred alternative only if the costs of "physical" monitoring are less than the "loss of surplus" that accompanies the use of "financial" monitoring via a contract.

Corollary 14.3. Let {π(·|β, σ)} be a tuple of informed and ambiguous beliefs that satisfy Conditions 14.2a, 14.2b, and 14.3. Then for each such tuple there exist investment cost functions and value and cost functions such that the maximum expected net surplus under vertical integration, max{ES(β, σ)}, is strictly greater than the expected payoff obtainable from an optimal contract.
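Example 14.2 already provides a worked instance of this corollary: there, the maximum expected net surplus under vertical integration is max{ES(β, σ)} = ES(βH, σH) = 6, whereas the optimal contract (the null contract, which implements (βL, σL)) yields an expected payoff of only ES(βL, σL) = 4.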
14.3. Concluding discussion

The incomplete contracts literature focusing on investment holdup has for the most part maintained the salience of transactions costs in providing the empirical cornerstone for the theory. Since the effect of ambiguity aversion is essentially to reduce the marginal gains from including more details in a contract, under ambiguity aversion even small transactions costs may result in incompleteness. But there is reason to think that the role of ambiguity aversion is not merely supplementary to transactions costs. It is crucial to take into account the nature of the uncertainty characterizing the decision environment in order to explain the empirical realities about contractual forms and vertical relationships. It is a fact that detailed contingent contracts successfully underpin many business relationships; it is where uncertainty is rife that contracts fail or exist only as incompletely specified arrangements. Paul L. Joskow (1985) notes examples of lasting long-term contractual relationships between electricity generating plants and mine-mouth coal suppliers. The story of the merger between Fisher Body and GM is an oft-cited example in the incomplete contracts/vertical integration literature. It is worth remembering that Fisher and GM did actually successfully transact business via a contract for almost ten years before the merger occurred. The reason for the collapse of the contract, as explained by Klein et al. (1978), was the unprecedented and dramatic demand uncertainty faced by the automobile market in the 1920s. The transactions-cost paradigm cannot explain why detailed, long-term contingent contracts thrive in a relatively stable world like that of the coal–electrical utility nexus but not in more complex, uncertain environments. Ambiguity aversion, on the other hand, would explain this in more intuitive terms.16,17 Ambiguity aversion also explains why long-term vertical relationships exist when the input supplied is "standard equipment," whereas R&D-intensive inputs are typically integrated within the firm. While it may appear that the last observation is fully explained by the notion of "transaction-specific assets," there is reason to
think otherwise. Scott E. Masten et al. (1989) report evidence regarding the relative influence of transaction-specific investments in physical and human capital on the pattern of vertical integration, using data obtained directly from U.S. auto manufacturers. Their results support the proposition that investments in specialized technical know-how have a stronger influence than those in specialized physical capital on the decision to integrate production within the firm. It is well known that government/defense procurement contracts are typically incompletely specified and prone to renegotiation when R&D is a significant component of the goods being supplied. But complicated contracts running to several thousand pages are quite common with "large" orders of goods not involving significant R&D. That we inhabit a world of incomplete contracts is not necessarily because agents are constrained from conditioning contractual instructions on "finely" described events by the direct costs of envisaging and/or writing down the relevant details. Rather, it could simply be because DMs perceive that they have very vague ideas about the likelihood of such events. The understanding that how well the DM thinks he knows the relevant likelihoods explains what events are used to condition contractual instructions is a novel contribution of the theory of ambiguity aversion to the debate about the foundations of incomplete contracts. The understanding is indeed novel since to a SEU maximizer the quality or accuracy of his belief does not matter. The explanation of contractual incompleteness advanced in this chapter turns on the assumption that the specific investments are not contractible. At least as long as value and cost are simply functions of the realized contingency [i.e., v = v(ω) and c = c(ω)], and contingencies are "informative," the fact that investments are not contractible would not of itself allow "inefficient" incomplete contracts to be optimal contracts. Further, the requisite "informativeness" is an empirically valid assumption in the relevant economic context. Suppose instead that the value and cost are not just functions of the realized contingency but, say, v = v(ω; q) and c = c(ω; q), where q is a variable measuring the quality or quantity of the input traded. Then it is possible that contractual incompleteness (in the sense of not mentioning trade prices for some events) can arise as a strategic imperative as long as q is also not contractible. This result is demonstrated in B. Douglas Bernheim and Michael D. Whinston (1997). One contribution of the present chapter lies in demonstrating how it is possible to rationalize incompleteness even if q were degenerate. The model presented in this chapter assumes firms are risk neutral. There is a dominant tradition in economics of modeling firms as such. For the typical "big" firm, pervasive risk spreading ensures that the ownership of the firm's returns is sufficiently diffuse. Thus a firm may act as if its utility is linear in profits; that is, it only cares about expected profits and may ignore risk completely. When analyzing contractual incompleteness in interfirm relationships, the challenge thus is to explain it without invoking risk aversion—hence, the assumption of risk neutrality. Notice though that the arguments used to urge the analyst to ignore risk would not work to rule out the effect of ambiguity.
While risk spreading ensures that firms behave as if their utility functions are linear, as we have observed, ambiguity aversion bites in spite of linear utilities.
Finally, I turn my attention to some general theoretical issues raised by the analysis. We have obtained at least a preliminary understanding of the role of ambiguity aversion in the design of optimal contracts. For instance, we have been alerted to the fact that the trade-off between exacerbating the effect of ambiguity and the provision of incentives may make it optimal to ignore “information-rich” signals. This is very analogous to the well-researched trade-offs between risk and incentives and between “information rent” and incentives. One suspects that the trade-off associated with ambiguity aversion will have as wide-ranging and significant implications as these other trade-offs. For example, the above-mentioned analysis prompts the question: Can ambiguity aversion force contracts in models of double-sided moral hazard to have linear or even approximately linear sharing rules? This is one open question to be dealt with in future research along with the broader issue of obtaining a complete characterization of optimal contracts under ambiguity aversion.
Appendix

Example 14.A.1. Let the universe Ω consist of two complementary events E and E^c. E, in turn, consists of two subevents E1 and E2. E1 is further partitioned into E11 and E12. The nonadditive probability function π which describes the DM's belief is: π(E11) = 0.1; π(E12) = 0.2; π(E1) = 0.5; π(E2) = 0.2; π(E) = 0.8; π(E^c) = 0.2; π(E11 ∪ X) = π(E11) + π(X) where X ∈ {E2, E^c, E2 ∪ E^c}; π(E12 ∪ X) = π(E12) + π(X) where X ∈ {E2, E^c, E2 ∪ E^c}; π(E1 ∪ E^c) = π(E1) + π(E^c); π(E2 ∪ E^c) = π(E2) + π(E^c); π(Ω) = 1. The payoffs of the act f are as indicated in Figure 14.A.1.
Figure 14.A.1 The space of events and outcomes of f.
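The computation reported below can be checked numerically. The payoff values of f shown in the original figure are not recoverable here, so the sketch uses hypothetical payoffs, f(E11) = 10, f(E12) = 15, f(E2) = 5, f(E^c) = 5, chosen only to be consistent with the value CEπ(f) = 8.5 derived in the text; the cells of the partition are treated as states:

```python
# pi values from Example 14.A.1; only the capacity of the "upper sets"
# along the payoff ranking of f enters the Choquet sum.
cap = {
    frozenset({'E12'}): 0.2,                      # pi(E12)
    frozenset({'E12', 'E11'}): 0.5,               # pi(E1)
    frozenset({'E12', 'E11', 'E2'}): 0.8,         # pi(E)
    frozenset({'E12', 'E11', 'Ec'}): 0.7,         # pi(E1 u E^c) = pi(E1) + pi(E^c)
    frozenset({'E12', 'E11', 'E2', 'Ec'}): 1.0,   # pi(Omega)
}
f = {'E11': 10.0, 'E12': 15.0, 'E2': 5.0, 'Ec': 5.0}   # hypothetical payoffs

total, prev, upper = 0.0, 0.0, set()
for w in sorted(f, key=lambda x: f[x], reverse=True):
    upper.add(w)
    total += f[w] * (cap[frozenset(upper)] - prev)
    prev = cap[frozenset(upper)]
print(round(total, 6))   # 8.5
```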
CEπ(f) = f(E^c)[π(Ω) − π(E)] + f(E2)[π(E) − π(E1)] + f(E11)[π(E1) − π(E12)] + f(E12)π(E12) = 8.5.

It is instructive to note that an equivalent computation is

CEπ(f) = π(E11)f(E11) + π(E12)f(E12) + π(E2)f(E2) + π(E^c)f(E^c) + [π(E1) − {π(E11) + π(E12)}] min{f(E11), f(E12)} + [π(E) − {π(E1) + π(E2)}] × min{f(E11), f(E12), f(E2)} = 8.5.

The second computation shows that the Choquet method evaluates an act in much the same way as the usual expectation operator, with the amendment that the "residual" measure in any subset of contingencies (for example, the residual at E1 is [π(E1) − {π(E11) + π(E12)}]) is "attached" to the contingency bearing the worst consequence of the act in that subset.

Proof of Lemma 14.1. First I establish some notation. Given a convex nonadditive probability π(·|β, σ), let Δ(π(·|β, σ)) denote the set of additive probabilities in the "core" of π(·|β, σ), that is, Δ(π(·|β, σ)) = {π̃ ∈ Δ(Ω) | π̃(X) ≥ π(X|β, σ), ∀X ∈ 2^Ω}. Note that if π(·|β, σ) is additive then Δ(π(·|β, σ)) is a singleton, the only element being π(·|β, σ) itself. Next, given a payoff function f, f : Ω → R, define the set Δπ(f; β, σ) as follows:

Δπ(f; β, σ) ≡ {π̃ ∈ Δ(π(·|β, σ)) | Σ_{ωi∈Ω} f(ωi)π̃(ωi) = CE_{π(·|β,σ)} f}.

That is, evaluating the Choquet expectation of f with respect to the nonadditive probability π(·|β, σ) is equivalent to evaluating the expectation of f with respect to any additive probability in Δπ(f; β, σ). Now, t̃ will solve IC_B and IC_S if there exists a π̃b(ωi|β, σ) ∈ Δπ(sδ − t̃; β, σ) and also a π̃s(ωi|β, σ) ∈ Δπ(t̃; β, σ), such that the following inequalities (14.A.1) and (14.A.2) are satisfied:

Σ_{ωi∈Ω} [π̃b(ωi|βH, σH) − π̃b(ωi|βL, σH)][s(ωi)δ(ωi) − t̃(ωi)] ≥ hB(βH) − hB(βL)    (14.A.1)

Σ_{ωi∈Ω} [π̃s(ωi|βH, σH) − π̃s(ωi|βH, σL)][t̃(ωi)] ≥ hS(σH) − hS(σL).    (14.A.2)
Suppose π̃o(ωi|β, σ) ∈ Δπ(s; β, σ). Since (βH, σH) is the first best, the inequalities (14.A.3) and (14.A.4) must be true:

Σ_{ωi∈Ω} π̃o(ωi|βH, σH)(max{s(ωi), 0}) − hB(βH) − hS(σH) ≥ Σ_{ωi∈Ω} π̃o(ωi|βL, σH)(max{s(ωi), 0}) − hB(βL) − hS(σH)    (14.A.3)

Σ_{ωi∈Ω} π̃o(ωi|βH, σH)(max{s(ωi), 0}) − hB(βH) − hS(σH) ≥ Σ_{ωi∈Ω} π̃o(ωi|βH, σL)(max{s(ωi), 0}) − hB(βH) − hS(σL).    (14.A.4)

Setting δ(·) such that δ(ωi) > 0 ⇔ s(ωi) ≥ 0, and rearranging terms, (14.A.3) yields (14.A.5) and (14.A.4) yields (14.A.6):

Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βL, σH))(s(ωi)δ(ωi)) ≥ hB(βH) − hB(βL)    (14.A.5)

Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βH, σL))(s(ωi)δ(ωi)) ≥ hS(σH) − hS(σL).    (14.A.6)
Hence solutions to (14.A.1) and (14.A.2) will exist if we find t̃ that solves (14.A.7) and (14.A.8):

Σ_{ωi∈Ω} [π̃b(ωi|βH, σH) − π̃b(ωi|βL, σH)] × [s(ωi)δ(ωi) − t̃(ωi)] ≥ Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βL, σH))(s(ωi)δ(ωi))    (14.A.7)

Σ_{ωi∈Ω} [π̃s(ωi|βH, σH) − π̃s(ωi|βH, σL)][t̃(ωi)] ≥ Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βH, σL))(s(ωi)δ(ωi)).    (14.A.8)
Using matrix notation, the inequalities (14.A.7) and (14.A.8) may be replaced by (14.A.9):

[−π̃b(βH, σH) + π̃b(βL, σH); π̃s(βH, σH) − π̃s(βH, σL)] [t̃(ω1), ..., t̃(ωN)]ᵀ ≥ [Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βL, σH))(s(ωi)δ(ωi)); Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βH, σL))(s(ωi)δ(ωi))],    (14.A.9)

where the first factor is the 2 × N matrix whose rows are the vectors −π̃b(βH, σH) + π̃b(βL, σH) and π̃s(βH, σH) − π̃s(βH, σL), and ";" separates the rows (and, on the right-hand side, the two entries of the column vector).

Consider the case where π(·|β, σ) is unambiguous. It is possible to find a (bounded) t̃ if the vectors −π̃b(βH, σH) + π̃b(βL, σH) and π̃s(βH, σH) − π̃s(βH, σL) are linearly independent, which in turn is ensured by Condition 14.1. Thus the proof is complete for this case. However, in general, the fact that the system in (14.A.9) has a bounded solution does not immediately follow from our assumption on independence. The reason is that, when the core of π(β, σ) is a nonsingleton set, the vectors −π̃b(βH, σH) + π̃b(βL, σH) and π̃s(βH, σH) − π̃s(βH, σL) are not completely "exogenous" to the system; they are "endogenous" in so far as they depend on t̃. However, Condition 14.1 applied in conjunction with a standard fixed-point argument shows that (14.A.9) indeed has a bounded solution. To proceed with the fixed-point argument, I first construct an appropriate mapping in the following three steps.

Step 1. Pick any two elements from Δ(π(·|βH, σH)) and an element each from Δ(π(·|βH, σL)) and Δ(π(·|βL, σH)). Denote them as π̃1, π̃2, π̃3, π̃4, respectively.

Step 2. Consider the solution set to the system given by (14.A.10):

[−π̃1 + π̃4; π̃2 − π̃3] [t(ω1), ..., t(ωN)]ᵀ ≥ [Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βL, σH))(s(ωi)δ(ωi)); Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βH, σL))(s(ωi)δ(ωi))].    (14.A.10)

Let τ = {t | t solves (14.A.10)}. Recall the role of contingencies ωk and ωl as stated in Condition 14.1. Make a selection t̄ from τ such that t(ωi) = 0 if i ∉ {k, l}. It follows from Condition 14.1 that such a selection exists and is unique. Furthermore, t̄ is bounded.
Step 3. Finally consider a set {π̃1^t̄, π̃2^t̄, π̃3^t̄, π̃4^t̄} where π̃1^t̄, π̃2^t̄ ∈ Δπ(t̄; βH, σH), π̃3^t̄ ∈ Δπ(t̄; βH, σL), π̃4^t̄ ∈ Δπ(t̄; βL, σH).

Steps 1 and 2 together define a continuous function

Φ1 : Δ(π(·|βH, σH)) × Δ(π(·|βH, σH)) × Δ(π(·|βH, σL)) × Δ(π(·|βL, σH)) → R^N,

while Step 3 defines a convex-valued upper hemicontinuous correspondence

Φ2 : R^N ⇒ Δ(π(·|βH, σH)) × Δ(π(·|βH, σH)) × Δ(π(·|βH, σL)) × Δ(π(·|βL, σH)).

Hence the composition Ψ ≡ Φ2 ∘ Φ1 defines a convex-valued upper hemicontinuous correspondence from the convex domain Δ(π(·|βH, σH)) × Δ(π(·|βH, σH)) × Δ(π(·|βH, σL)) × Δ(π(·|βL, σH)) into itself. Kakutani's fixed-point theorem ensures that Ψ has a fixed point. Let {π̃1^t*, π̃2^t*, π̃3^t*, π̃4^t*} be a fixed point of Ψ. If t* solves (14.A.11), then clearly t* satisfies the conditions required of the transfer t̃ in (14.A.1) and (14.A.2):

[−π̃1^t* + π̃4^t*; π̃2^t* − π̃3^t*] [t(ω1), ..., t(ωN)]ᵀ ≥ [Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βL, σH))(s(ωi)δ(ωi)); Σ_{ωi∈Ω} (π̃o(ωi|βH, σH) − π̃o(ωi|βH, σL))(s(ωi)δ(ωi))].    (14.A.11)

Proof of Proposition 14.1. Lemma 14.1 proves that a (bounded) t exists which satisfies the incentive constraints relevant to implementing the first best. Given such a t, the expected payoffs to B and S are E(s(ωi)δ(ωi) − t(ωi)|βH, σH) − hB(βH) and E(t(ωi)|βH, σH) − hS(σH), respectively. Since the expectations operator is additive, the sum of the expected payoffs to the two parties is E(s(ωi)δ(ωi)|βH, σH) − hB(βH) − hS(σH). (βH, σH) is the first best, implying E(s(ωi)δ(ωi)|βH, σH) − hB(βH) − hS(σH) ≥ 0. Hence participation constraints can be taken care of by a transfer
τ ∈ R transacted when the contract is signed, so long as τ satisfies the following conditions:

(PC_B*) E(s(ωi)δ(ωi) − t(ωi)|βH, σH) − τ − hB(βH) ≥ 0
(PC_S*) E(t(ωi)|βH, σH) + τ − hS(σH) ≥ 0.

Lemma 14.A.1. Given that f : Ω → R, g : Ω → R, and that π is a convex nonadditive probability function, π : 2^Ω → [0, 1]; and the labeling of the state space Ω = {ωi}, i = 1, ..., N, is such that f(ωm) > f(ωn) ⇒ m > n:

(a) If f and g are comonotonic then Eπ(f + g) = Eπ(f) + Eπ(g).
(b) Let f and f + g be comonotonic, and suppose that there are ωk+1, ωk such that (i) and (ii) hold: (i) g(ωk+1) < g(ωk); (ii) π({ωk+1}) + π({ωk}) < π({ωk+1, ωk}). Then Eπ(f + g) > Eπ(f) + Eπ(g).

Proof. The proof is straightforward and hence omitted.

Lemma 14.A.2. Assume π(·|·, ·) satisfies Conditions 14.2a, 14.2b, and that (βH, σH) is the first-best action profile. Then there exist hB(·), hS(·) such that for any t̃ which satisfies the incentive compatibility conditions IC_B, IC_S in Lemma 14.1, t̃ and the vector [max{s(·), 0} − t̃(·)] are not comonotonic.

Proof. The strategy of the proof will be to choose hS(·) such that hS(σH) − hS(σL) = SocBen(σH/σL) and then show that if t̃ and s − t̃ are comonotonic, B's marginal private benefit from choosing βH (i.e. E(s − t̃|βH, σH) − E(s − t̃|βL, σH)) falls short of SocBen(βH/βL). Hence we can choose hB(·) with the difference hB(βH) − hB(βL) large enough [but less than SocBen(βH/βL)], so that if t̃, s − t̃ are to satisfy (IC_B), (IC_S), then it must be that t̃ and s − t̃ are not comonotonic. Fix s(·) such that SocBen(σH/σL) > 0 and SocBen(βH/βL) > 0 and consider hS(·) and hB(·) such that (βH, σH) is the first-best action profile. Choose hS(·) such that hS(σH) − hS(σL) = SocBen(σH/σL). Choose t̃ such that s, t̃ and s − t̃ are comonotonic and t̃ satisfies IC_S. Let i_S be the smallest value of the contingent state index i such that π(X(ωi+1)|βH, σH/σL) > 0. Similarly, let ī_S be the highest index i such that π(X(ωi)|βH, σH/σL) > 0. Note, Assumption 14.2b guarantees that the set {ωi ∈ Ω | i_S ≤ i ≤ ī_S} is not a singleton set.
Claim 14.A.1. s(ωi) − t̃(ωi) is the same for all i satisfying the condition i_S ≤ i ≤ ī_S.

Proof. Since for all i ≤ i_S, π(X(ωi)|βH, σH/σL) = 0, it follows that

π(X(ω_{i_S})|βH, σH/σL) − π(X(ω_{i_S+1})|βH, σH/σL) < 0.    (14.A.12)

Since for all i > ī_S, π(X(ωi)|βH, σH/σL) = 0, it follows that

π(X(ω_{ī_S})|βH, σH/σL) − π(X(ω_{ī_S+1})|βH, σH/σL) > 0.    (14.A.13)

At this point it is useful to recall that E(s − t̃|β, σ) may be written as

Σ_{i=1}^{N−1} [s(ωi) − t̃(ωi)] × [π(X(ωi)|β, σ) − π(X(ωi+1)|β, σ)] + [s(ωN) − t̃(ωN)]π(ωN|β, σ).

Suppose the claim is false. That is, s(ωi) − t̃(ωi) is not constant when i varies in the interval [i_S, ī_S]. By comonotonicity of s and s − t̃, s(ωi) − t̃(ωi) is weakly increasing in ωi for i such that i_S ≤ i ≤ ī_S. Hence it must be that s(ω_{i_S}) − t̃(ω_{i_S}) < s(ω_{ī_S}) − t̃(ω_{ī_S}). Next notice,

E(s − t̃|βH, σH) − E(s − t̃|βH, σL)
= Σ_{i=i_S}^{ī_S} [s(ωi) − t̃(ωi)] × [π(X(ωi)|βH, σH) − π(X(ωi+1)|βH, σH)] − Σ_{i=i_S}^{ī_S} [s(ωi) − t̃(ωi)] × [π(X(ωi)|βH, σL) − π(X(ωi+1)|βH, σL)]
= Σ_{i=i_S}^{ī_S} [s(ωi) − t̃(ωi)] × [π(X(ωi)|βH, σH/σL) − π(X(ωi+1)|βH, σH/σL)].    (14.A.14)

By inspecting the final expression (14.A.14), it may be checked that

E(s − t̃|βH, σH) − E(s − t̃|βH, σL) > 0.    (14.A.15)

To see this, first note that for any i = ι̂,

π(X(ω_ι̂)|β, σ) = Σ_{i=ι̂}^{N−1} [π(X(ωi)|β, σ) − π(X(ωi+1)|β, σ)] + π(ωN|β, σ).
Hence, Assumption 14.2b (π(X(ωi)|βH, σH/σL) ≥ 0) implies

Σ_{i=ι̂}^{ī_S} [π(X(ωi)|βH, σH/σL) − π(X(ωi+1)|βH, σH/σL)] ≥ 0.    (14.A.16)

Then (14.A.15) finally follows from the fact that s(ωi) − t̃(ωi) is weakly increasing in ωi, (14.A.16), (14.A.12), (14.A.13), and that s(ω_{i_S}) − t̃(ω_{i_S}) < s(ω_{ī_S}) − t̃(ω_{ī_S}). But (14.A.15) in turn implies

E(t̃|βH, σH) − E(t̃|βH, σL) < SocBen(σH/σL).    (14.A.17)
Hence, given that we chose t̃ which satisfies IC_S, we have arrived at a contradiction.

Claim 14.A.2. E(s − t̃|βH, σH) − E(s − t̃|βL, σH) < SocBen(βH/βL).

Proof. Suppose not; that is, E(s − t̃|βH, σH) − E(s − t̃|βL, σH) = SocBen(βH/βL). Then, by an argument as in Claim 14.A.1, one may show that there must be a set of contiguous contingencies {ω_{i_B}, ..., ω_{ī_B}} where t̃(ωi) is constant when i varies in the closed interval [i_B, ī_B]. Further, Condition 14.2 (a and b taken together) ensures that the intervals [i_S, ī_S] and [i_B, ī_B] overlap; that is, i_B < ī_S and i_S < ī_B. Since it has already been established that s(ωi) − t̃(ωi) is constant when i ∈ [i_S, ī_S], the fact that t̃(ωi) is constant for i ∈ [i_B, ī_B] therefore implies that both s(ωi) − t̃(ωi) and t̃(ωi) are constant when i ∈ [min{i_B, i_S}, max{ī_B, ī_S}]. Thus we are left with the contradictory conclusion that SocBen(βH/βL) = 0 = SocBen(σH/σL).

With Claim 14.A.2 we have established that if t̃ and s − t̃ are comonotonic, and if hS(σH) − hS(σL) = SocBen(σH/σL), then we can find hB(·) such that s − t̃ will not satisfy (IC_B) if t̃ satisfies (IC_S).

Lemma 14.A.3. Suppose Conditions 14.2a, 14.2b, and 14.3 are satisfied and (βH, σH) is the first-best profile. Then there will exist hB(·) and hS(·) such that (βH, σH) cannot be implemented even if there are t̃ and s − t̃ which satisfy (IC_B), (IC_S).

Proof. We choose two investment cost functions h̃B(·) and h̃S(·) such that any t and s − t which satisfy the corresponding (IC_B), (IC_S) are necessarily noncomonotonic. Lemma 14.A.2 assures that such a choice is available. Of all t and s − t that satisfy the corresponding (IC_B), (IC_S) with investment cost functions h̃B(·) and h̃S(·), let t̃ and s − t̃ be the pair which maximizes

E(max{s(ωi), 0} − t(ωi)|βH, σH) + E(t(·)|βH, σH).
Given what has been assumed about π(·|β, σ), t̃ and s − t̃ are not comonotonic. Hence, it follows from Lemma 14.A.1(b) and Condition 14.3 that

E(max{s(ωi), 0} − t̃(ωi)|βH, σH) + E(t̃(·)|βH, σH) < E(max{s(ωi), 0}|βH, σH),

implying that there exist nonnegative real numbers ξb and ξs such that

E(max{s(ωi), 0} − t̃(ωi)|βH, σH) + E(t̃(·)|βH, σH) − h̃B(·) − ξb − h̃S(·) − ξs < 0,

even though

E(max{s(ωi), 0}|βH, σH) − h̃B(·) − ξb − h̃S(·) − ξs > 0.

Choose hB(·) = h̃B(·) + ξb and hS(·) = h̃S(·) + ξs; this choice will satisfy (IC_B), (IC_S) for the division of surplus t̃ and s − t̃; however, it will fail to satisfy the participation constraint(s) for implementing (βH, σH).

Proof of Lemma 14.2. Clearly t and s − t satisfy the appropriate incentive constraints. Notice, t and s − t are comonotonic and thus, by Lemma 14.A.1(a), the expectations operator is additive. Hence ex ante payments can be arranged to satisfy the individual participation constraints given that the aggregate participation constraint (14.8) is satisfied.

Proof of Proposition 14.3. We know from Proposition 14.2 that there exists a tuple (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)) satisfying Conditions 14.1, 14.2a, 14.2b, and 14.3 such that no contract may implement the first best (βH, σH). Recall, (β*, σ*) is a second-best profile if ES(β′, σ′) > ES(β*, σ*) ⇒ (β′, σ′) is the first best. If (βL, σL) is the second best in the model described by the tuple (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)), then it must be the case that the inequalities (14.A.18), (14.A.19), (14.A.20) are satisfied, implying that the null contract will implement (βL, σL) (by Lemma 14.2):

λ̄[E(max{ŝ(ωi), 0}|βL, σL) − E(max{ŝ(ωi), 0}|βH, σL)] ≥ ĥB(βL) − ĥB(βH)    (14.A.18)

(1 − λ̄)[E(max{ŝ(ωi), 0}|βL, σL) − E(max{ŝ(ωi), 0}|βL, σH)] ≥ ĥS(σL) − ĥS(σH)    (14.A.19)
E(max{ŝ(ωi), 0}|βL, σL) ≥ ĥS(σL) + ĥB(βL).    (14.A.20)
To see why (14.A.18) and (14.A.19) must be satisfied, suppose (14.A.18) does not hold. That is,

λ̄[E(max{ŝ(ωi), 0}|βH, σL) − E(max{ŝ(ωi), 0}|βL, σL)] > ĥB(βH) − ĥB(βL).    (14.A.21)

But the assumption of stochastic dominance (in Assumption 14.2a) implies that

(1 − λ̄)E(max{ŝ(ωi), 0}|βH, σL) − ĥS(σL) ≥ (1 − λ̄)E(max{ŝ(ωi), 0}|βL, σL) − ĥS(σL).    (14.A.22)

Summing up (14.A.21) and (14.A.22), we get (14.A.23):

E(max{ŝ(ωi), 0}|βH, σL) − ĥB(βH) − ĥS(σL) > E(max{ŝ(ωi), 0}|βL, σL) − ĥB(βL) − ĥS(σL).    (14.A.23)

But (14.A.23) contradicts the hypothesis that (βL, σL) is the second best. Hence if (βL, σL) is the second best, the null contract is an optimal contract for the model (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)).

Next consider the case where (βL, σL) is not the second best. Assume w.l.o.g. that (βH, σL) is the second best. Adjust the investment cost functions as follows: Let h̃B(βH) = ĥB(βH); h̃S(σH) = ĥS(σH); and h̃B(βL) = ĥB(βL) + ε; h̃S(σL) = ĥS(σL) − ε, where ε is such that

λ̄[E(max{ŝ(ωi), 0}|βH, σL) − E(max{ŝ(ωi), 0}|βL, σL)] = h̃B(βH) − h̃B(βL).    (14.A.24)

It may be checked that with the adjusted cost functions, (βH, σL) will be implemented by the null contract and, further, the adjustment will not alter the fact that (βH, σH) is the first best. One has to verify that (βH, σH) cannot be implemented given the adjusted cost functions h̃B and h̃S. To that end, suppose to the contrary. By this hypothesis there exists a transfer t̃ which meets the required incentive and participation constraints corresponding to the adjusted cost functions h̃B and h̃S. That implies there exists a transfer t̂ = t̃ + α which ensures that the incentive and participation constraints are met for the implementation of (βH, σH) corresponding to the original investment cost functions ĥB and ĥS. This contradicts the fact that the model given by (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)) is such that the first best (βH, σH) cannot be implemented by a contract.
Acknowledgments

The chapter has benefited very substantially from the many constructive suggestions of the two anonymous referees. Their efforts went much beyond the call
of duty and I remain very grateful. Jim Malcomson's painstaking scrutiny of an earlier draft made possible the much-needed expositional improvements. I also thank Dieter Balkenborg, Jacques Crémer, David Kelsey, Fahad Khalil, Peter Klibanoff, Andrew Mountford, David Pearce, R. Edward Ray, Gerd Weinrich, and seminar members at various universities and conferences (especially the audience at the Conference on Decision Making Under Uncertainty held at Saarbrücken, Germany, University of Saarland) for helpful discussions and comments.
Notes

1 To preempt misunderstandings it is emphasized that the term "ambiguity," as used in this chapter, refers purely to the fuzzy perception of the likelihood subjectively associated with an event (e.g. when asked about his subjective estimate of the probability of an event, the agent replies, "It is between 50 and 60 percent."). It does not refer to a lack of clarity in the description of contingent events and actions. Also note, some authors and researchers refer to ambiguity as "Knightian Uncertainty" or even simply as "uncertainty." As it is used in this chapter, the word "uncertainty" is simply the defining characteristic of any environment where the consequence of at least one action is not known for certain.
2 The reader is assured that the example is essentially unaffected by also having a state in which both the statements are true.
3 The author remains most grateful to the two anonymous referees for drawing his attention to this point.
4 In general, a nonadditive probability (or capacity) π obeys the axioms (i), (ii), and the condition that X ⊇ Y ⇒ π(X) ≥ π(Y). The axiom (iii) applies to the special case of a convex nonadditive probability. The term "convex" points to the requirement that the nonadditive probability of a set is (weakly) greater than the sum of the nonadditive probabilities of the cells of a partition of the set. Presumably, the analogy is to the property of any increasing convex function, say φ : R+ → R+ with φ(0) = 0, that φ(x) + φ(y) ≤ φ(x + y). It is when the nonadditive probability is convex that the CEU decision rule corresponds to ambiguity aversion.
5 Consider the following stronger version of the third property:
(iii′) For every n > 0 and every collection χ1, ..., χn ∈ 2^Ω,

π(∪_{i=1}^{n} χi) ≥ Σ_{I⊆{1,...,n}, I≠∅} (−1)^{|I|+1} π(∩_{i∈I} χi),

where |I| denotes the cardinality of I. Nonadditive probabilities which satisfy (iii′), in addition to (i) and (ii), have been variously referred to as "belief functions," "totally monotone capacities," and "n-convex capacities." In the rest of the chapter, all references to convex nonadditive probability measures should be understood to be referring to nonadditive probabilities which satisfy (i), (ii), and (iii′), rather than to those satisfying (i), (ii), and (iii), as they did in the version published in the AER. The amendment ensures that Lemma 14.A.1 is correct as stated. The author thanks Ben Polak for bringing to his notice that Lemma 14.A.1 need not hold for convex capacities which satisfy only (iii), the weaker version of the third property.
6 This follows from the celebrated theorem in Lloyd S. Shapley (1971) which asserts the existence of a core allocation corresponding to any convex characteristic value function defined on possible coalitions in a cooperative game.
7 Peter C. Fishburn (1993) provides an axiomatic justification of this definition of ambiguity and Mukerji (1997) demonstrates its equivalence to a more primitive and epistemic
notion of ambiguity (expressed in terms of the DM's knowledge of the state space). Massimo M. Marinacci (1995) applies the idea to game theory, while David Kelsey and Shasikanta Nandeibam's (1996) analysis explains why this definition is sometimes interpreted as a measure of "uncertainty aversion."
8 The Choquet expectation operator may be directly defined with respect to a nonadditive probability. Label the ωi such that f(ω1) ≤ · · · ≤ f(ωN). Then,

CEπ(f) = f(ω1) + Σ_{i=2}^{N} [f(ωi) − f(ωi−1)] × π({ωi, ..., ωN})
       = Σ_{i=1}^{N−1} f(ωi)[π({ωi, ..., ωN}) − π({ωi+1, ..., ωN})] + f(ωN)π({ωN}).

9 For a fuller review of the arithmetic of the Choquet expectation operator, see Example 14.A.1 in the Appendix.
10 This is technically evident from the fact that if {X, Y} is a partition of the set E, then convexity of the belief (on E) implies A(π(X)) + A(π(Y)) ≥ A(π(E)).
11 Usually stochastic dominance is defined with respect to the payoff or the outcome space. As stated here, instead, the reference is to the underlying contingency space. Thus we have to suitably amend the usual definition to accommodate the fact that contiguous contingencies may yield the same outcome, that is, the same surplus.
12 The reader will observe that this notion of the first best is "vindicated" by the fact that this is the profile that would be chosen if the investment effort were contractible.
13 In particular, this allows for terms of trade being contingent on realizations of v(·) and c(·) by the simple expedient of making the terms contingent on events such as E(V; C) ≡ {ωi ∈ Ω | v(ωi) = V and c(ωi) = C}.
14 By taking δ as a mapping into {0, 1}, ex post randomization is ruled out. This follows the dominant tradition in the literature on incomplete contracting; see, for example, Hart and John H. Moore (1988).
15 It seems a reasonable conjecture that Conditions 14.2 and 14.3 are generic. If, for instance, Condition 14.2 fails to hold, even the slightest perturbation of beliefs should restore the condition. A similar understanding can be suggested for Condition 14.3. In a suitably rich space of measures that includes all convex nonadditive measures, the subspace of beliefs that are strictly additive (over at least some events) would appear to be nongeneric. (NB the parametric specification in Example 14.2 satisfies Conditions 14.1, 14.2a, 14.2b, and 14.3.)
16 A referee has remarked, "the claim . . . that [a] transactions cost argument cannot rationalize the use of long-term contracts in stable environments but not in highly uncertain ones is debatable . . . if the greater uncertainty means more contingencies that must be foreseen, described, bargained over, and ultimately recognized . . . ." While admittedly debatable, the claim is definitely defensible. It is certainly intuitive to posit a link between the nature of the incumbent uncertainty and the extent of transactions costs. But one is yet to see a formal clarification of such a story. For instance, it is hard to figure precisely what primitives and principles imply that "greater uncertainty means more contingencies that must be foreseen (etc.)." The point is, while ambiguity aversion does manage to convey a precise and coherent account of a link between uncertainty and contracting costs, the transactions-costs paradigm is yet to find one.
17 James M. Malcomson and W. Bentley MacLeod (1993) explain the Joskow contracts by essentially arguing that in such contexts conditioning instructions over a coarse partition
of the contingency space is sufficient. This explanation is very consistent with the ambiguity aversion story: as has been observed earlier (Note 10), the coarser the partition the less the bite from ambiguity.
References

Anderlini, Luca and Felli, Leonardo. 1994, "Incomplete Written Contracts: Undescribable States of Nature," Quarterly Journal of Economics, November, 109(4), pp. 1085–124.
Baker, George; Gibbons, Robert and Murphy, Kevin J. 1997, "Implicit Contracts and the Theory of the Firm," Unpublished manuscript.
Bernheim, B. Douglas and Whinston, Michael D. 1997, "Incomplete Contracts and Strategic Ambiguity," Discussion Paper No. 1787, Harvard University.
Camerer, Colin F. and Weber, Martin. 1992, "Recent Developments in Modelling Preferences: Uncertainty and Ambiguity," Journal of Risk and Uncertainty, October, 5(4), pp. 325–70.
Carlton, Dennis W. 1979, "Vertical Integration in Competitive Markets Under Uncertainty," Journal of Industrial Economics, March, 27(3), pp. 189–209.
Coase, Ronald. 1937, "The Nature of the Firm," Economica, November, 39(4), pp. 386–405.
Dow, James P. and Werlang, Sergio R. 1992, "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, January, 60(1), pp. 197–204. (Reprinted as Chapter 17 in this volume.)
—— 1994, "Nash Equilibrium Under Knightian Uncertainty: Breaking Down Backward Induction," Journal of Economic Theory, December, 64(2), pp. 305–24.
Eichberger, Jurgen and Kelsey, David. 1996, "Signalling Games with Uncertainty," Mimeo, University of Birmingham, U.K.
Ellsberg, Daniel. 1961, "Risk, Ambiguity, and the Savage Axioms," Quarterly Journal of Economics, November, 75(4), pp. 643–69.
Epstein, Larry G. and Wang, Tan. 1994, "Intertemporal Asset Pricing Under Knightian Uncertainty," Econometrica, March, 62(2), pp. 283–322. (Reprinted as Chapter 18 in this volume.)
—— 1995, "Uncertainty, Risk-Neutral Measures and Security Price Booms and Crashes," Journal of Economic Theory, October, 67(1), pp. 40–82.
Fishburn, Peter C. 1993, "The Axioms and Algebra of Ambiguity," Theory and Decision, March, 34(2), pp. 119–37.
Ghirardato, Paolo. 1995, "Coping with Ignorance: Unforeseen Contingencies and Nonadditive Uncertainty," Mimeo, University of California, Berkeley.
Grossman, Sanford J. and Hart, Oliver D. 1986, "The Costs and Benefits of Ownership: A Theory of Vertical and Lateral Integration," Journal of Political Economy, August, 94(4), pp. 691–719.
Hart, Oliver D. 1995, Contracts and financial structure. Oxford: Clarendon Press.
Hart, Oliver D. and Moore, John H. 1988, "Incomplete Contracts and Renegotiation," Econometrica, July, 56(4), pp. 755–85.
Joskow, Paul L. 1985, "Vertical Integration and Long-Term Contracts: The Case of Coal-Burning Electric Generating Plants," Journal of Law, Economics and Organization, Spring, 1(1), pp. 33–80.
Kelsey, David and Nandeibam, Shasikanta. 1996, "On the Measurement of Uncertainty Aversion," Mimeo, University of Birmingham, U.K.
Incompleteness of contractual form
335
Klein, Benjamin; Crawford, Robert G. and Alchian, Armen A. 1978, “Vertical Integration, Appropriable Rents and the Competitive Contracting Process,” Journal of Law and Economics, October, 21(2), pp. 297–326. Legros, Patrick and Matsushima, Hitoshi. 1991, “Efficiency in Partnerships,” Journal of Economic Theory, December, 55(2), pp. 296–322. Lipman, Barton L. 1992, “Limited Rationality and Endogenously Incomplete Contracts,” Queen’s Institute for Economic Research Discussion Paper No. 858. October. Lo, Kin Chung. 1998, “Sealed Bid Auctions with Uncertainty Averse Bidders,” Economic Theory, July, 12(1), pp. 1–20. MacLeod, W. Bentley and Malcomson, James M. 1993 “Investments, Holdup, and the Form of Market Contracts,” American Economic Review, September, 83(4), pp. 811–37. Malcomson, James M. 1997, “Contracts, Hold-Up, and Labor Markets,” Journal of Economic Literature, December, 35(4), pp. 1916–57. Marinacci, Massimo M. 1995, “Ambiguous Games,” Mimeo, Northwestern University. Masten, Scott E.; Meehan, James W. and Snyder, Edward A. 1989, “Vertical Integration in the U.S. Auto Industry: A Note on the Influence of Transaction Specific Assets,” Journal of Economic Behavior and Organization, October, 12(2), pp. 265–73. Mukerji, Sujoy. 1995, “A Theory of Play for Games in Strategic Form when Rationality Is Not Common Knowledge,” Mimeo, University of Southampton, U.K. —— 1997 “Understanding the Nonadditive Probability Decision Model,” Economic Theory, January, 9(1), pp. 23–46. Schmeidler, David. 1989, “Subjective Probability and Expected Utility without Additivity,” Econometrica, May, 57(3), pp. 571–87. (Reprinted as Chapter 5 in this volume.) Shapley, Llyod. S. 1971, “Cores of Convex Games,” International Journal of Game Theory, January, 1(1), pp. 12–26. Simon, Herbert A. 1951, “A Formal Theory of the Employment Relationship,” Econometrica, July 19(3), pp. 293–305. Tallon, Jean-Marc. 1998, “Asymmetric Information, Nonadditive Expected Utility, and the Information Revealed by Prices: An Example,” International Economic Review, May 39(2), pp. 329–42. Tirole, Jean. 1994, “Incomplete Contracts: Where Do We Stand?” Walras-Bowley Lecture, Summer Meetings of the Econometric Society. Williams, Steven R. and Radner, Roy. 1988, “Efficiency in Partnership When the Joint Output Is Uncertain,” Northwestern Center for Mathematical Studies in Economics and Management Science Working Paper No. 760. Williamson, E. Oliver, 1985, The economic institutions of capitalism. New York: Free Press.
15 Ambiguity aversion and incompleteness of financial markets

Sujoy Mukerji and Jean-Marc Tallon
15.1. Introduction

Suppose an agent’s subjective knowledge about the likelihood of contingent events is consistent with more than one probability distribution, and, further, that what the agent knows does not inform him of a precise (second order) probability distribution over the set of “possible” (first order) probabilities. We say then that the agent’s beliefs about contingent events are characterized by ambiguity. If ambiguous, the agent’s beliefs are captured not by a unique probability distribution in the standard Bayesian fashion but instead by a set of probabilities. Thus not only is the outcome of an act uncertain but also the expected payoff of the action, since the payoff may be measured with respect to more than one probability. An ambiguity averse decision maker evaluates an act by the minimum expected value that may be associated with it: the decision rule is to compute all possible expected values for each action and then choose the act which has the best minimum expected outcome. This (informal) notion of ambiguity aversion inspires the formal model of Choquet expected utility (CEU) preferences introduced in Schmeidler (1989). The present chapter considers a model of financial markets populated by agents with CEU preferences, with the interpretation that the agents’ preferences demonstrate ambiguity aversion.1 Typically, economic agents are endowed with income streams that are not evenly spread over time or across uncertain states of nature. A financial contract is a claim to a contingent income stream, hence the logic of the financial markets: by exchanging such claims agents change the shapes of their income streams, obtaining a more even consumption across time and the uncertain contingencies. A financial market is said to be complete if contingent payoffs from the different marketed financial contracts are varied enough to span all the contingencies. However, casual empiricism suggests that in just about every financial market in the real world the span is less than the full set of contingencies, that is, the markets are incomplete. The primary implication of incompleteness of financial markets is that agents may transfer income only across a limited set of contingencies and are thus left to share risk in a suboptimal manner.2
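As a concrete (and deliberately toy) illustration of the decision rule just described, the following Python sketch scores each act by its worst-case expected payoff over a set of priors. The state names, priors, and payoffs are our own illustrative numbers, not part of the chapter’s model.

```python
# Minimal sketch of the maxmin evaluation described above: beliefs are
# a *set* of probability distributions, and each act is scored by its
# worst-case expected payoff over that set.
priors = [
    {"boom": 0.6, "bust": 0.4},
    {"boom": 0.4, "bust": 0.6},
]
acts = {
    "stock": {"boom": 120.0, "bust": 80.0},
    "bond":  {"boom": 100.0, "bust": 100.0},
}

def worst_case_value(act):
    return min(sum(p[s] * act[s] for s in act) for p in priors)

best = max(acts, key=lambda a: worst_case_value(acts[a]))
print({a: worst_case_value(acts[a]) for a in acts}, "->", best)
# stock: 96.0, bond: 100.0 -> the ambiguity averse agent picks "bond"
```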
Mukerji, Sujoy and Tallon, Jean-Marc (2001). “Ambiguity aversion and incompleteness of financial markets,” Review of Economic Studies, 68(4), pp. 883–904.
Consider the following question: Take a (financial) economy with complete markets, but suppose agents are not subjective expected utility (SEU) maximizers, but rather CEU maximizers; are there conditions under which it is possible that at a competitive equilibrium agents do not trade some assets and hence their equilibrium allocations are equivalent to competitive allocations deriving from some incomplete market economy wherein the allocations are not Pareto optimal? The answer to the question is a qualified yes. The qualification is important, and the essential contribution of the present chapter is in identifying this qualification. Imposing CEU maximization in a complete market economy does not by itself generate no-trade; but, as this chapter shows, one can construct a robust sequence of incomplete market economies whose equilibria would converge to those of a complete market economy with SEU agents but do not with CEU agents. The key characteristic of such a sequence of economies is that they include, as nonredundant instruments of risk-sharing, financial assets which are affected by idiosyncratic risk.3 We establish that trade in financial assets, whose payoffs have idiosyncratic components, may break down because of ambiguity aversion. We find, furthermore, that the no-trade due to ambiguity aversion is a robust occurrence, in the sense that it takes place even in the limit replica economy, with enough replicas of the financial assets such that idiosyncratic risk may be completely hedged. Hence, the behavior of the limit replica economy is markedly different depending on whether agents are SEU maximizers or CEU maximizers: in the former case the allocation is precisely that of a complete markets economy, whereas in the latter case, because of the endogenous breakdown of trade, the equilibrium allocation, given a “high enough” level of ambiguity aversion and idiosyncratic risk, is not Pareto optimal and the nature of risk-sharing is as in an incomplete markets economy. These findings are of interest, both for the way they complement the related literature and for the substantive economic insight they give rise to. Dow and Werlang (1992) showed, in a model with one risky and one riskless asset, a single ambiguity averse agent with CEU preferences, exogenously determined asset prices, and a riskless initial endowment, that there exists a nondegenerate price interval at which the agent will strictly prefer to take a zero position in the risky asset. Recall, the logic of this result essentially rests on the observation that a CEU agent when going short in the risky asset will use a different probability to evaluate the expected return as compared to when going long, since an agent taking a short (long) position is relatively better (respectively, worse) off in states where the asset payoff is shocked adversely. Having (robustly) rationalized a zero position in a single decision-maker framework, one might be tempted to conjecture (even though such a conjecture is not made by Dow and Werlang) that it is but a short step to generate no-trade in a full equilibrium model. But, as we remarked earlier, simply imposing CEU maximization in a complete market economy does not generate no-trade unless endowments are Pareto optimal to begin with. The point is that, with complete markets, allocations are Pareto optimal and hence comonotonic (i.e. every agent’s ranking of states, in accordance with the agent’s ex post utility from the given allocation, is identical) (Chateauneuf et al. (2000)).
Comonotonicity implies that all agents evaluate the returns of assets with
the same probability measure in a CEU world. Thus, closing the Dow and Werlang model in the obvious way makes it apparent that, for generic endowments, assets will surely be traded. Hence it is, at least, of academic interest to find what condition actually generates an endogenous closure of some financial markets and a consequent lessening of risk-sharing opportunities, when moving from an SEU to a CEU world. Perhaps a more compelling reason for interest in our findings is their economic significance. It is widely regarded that a crucial function of financial markets is that they allow individuals to hedge their income (from, say, human capital/labor) risk even though such risks are not, per se, contractible in appropriate detail because of the usual reasons of asymmetric information and/or transactions costs. For instance, take X, a shopowner in Detroit, whose fortunes are heavily dependent on the fortunes of the automobile industry centered in Detroit. While X would love to smooth consumption across the various possible income shocks, it is hardly likely that an insurance company would be willing to insure X against anything other than accidents like fire and theft. But, standard economic/finance theory would argue, even though such personalized contracts may not be available, X should be able to hedge his income shocks in the stock market. To transfer income from the “good” states to the “bad,” all that is required is that X take a short position on a portfolio of shares of different firms in the automobile (and related) industries and a long position on a “safe” asset (e.g. a government bond). Of course, the returns of any particular share will not be perfectly correlated with X’s income; in particular, each individual share return will be subject to some idiosyncratic risk. But, with a large enough number of such equities in the portfolio, the idiosyncracies may be hedged away, and X would find the (almost) perfect hedge for his income shocks. To X, therefore, for all practical purposes, the economy is very much a complete market economy. However, what this chapter shows is that the story only runs so far in an SEU world, not in a CEU world. Consider two agents trading an equity subject to idiosyncratic risk, with one agent taking a short position while the other goes long. Evidently, then, the variation in each agent’s consumption across states which differ only in terms of the idiosyncratic shocks would be exclusively determined by the nature and extent of the shocks and the agent’s position on the asset. Moreover, the variations of the agents’ consumption across such states will be inversely related, and therefore, their consumption will not be comonotonic. Hence, given ambiguity aversion with CEU preferences, an agent will behave as if he applies a different probability measure depending on whether he is choosing to go short or to go long. Therefore, it may be that the minimum asking price of the agent choosing to go short will be higher than the maximum bid of the agent choosing to go long. Thus, no trade may result, and the chapter provides sufficient conditions under which this obtains. Indeed, as we show, the no-trade outcome will survive even in the limit, when there are an arbitrarily large number of (independent) replicas of the equity. The intuition here is that the law of large numbers implies that the agents’ beliefs on the payoff of a portfolio of risky assets, hit (in part) by idiosyncratic shocks, converge to some mean, but the mean is in principle different for
agents taking differently signed positions on the (relevant) assets. In this fashion, ambiguity aversion creates an endogenous limit to the extent of risk sharing possible through financial markets, thereby providing a (theoretical) justification for the basic premise of the general equilibrium with incomplete markets (GEI) model. To see it through the eyes of X: in a CEU world, unlike in an SEU world, there may not exist prices that would allow X to go short on automobile industry equities as he needs to do to “export” his income risk. The same market which offers possibilities of risk-sharing equivalent to complete markets when beliefs and behavior are in accordance with SEU offers only the Pareto suboptimal risk-sharing possibilities of an incomplete market economy when agents are CEU maximizers with beliefs that are “sufficiently” ambiguous. The rest of the chapter is organized as follows. Section 15.2 provides an introduction to the formal model of ambiguity aversion applied in this chapter. Section 15.3 contains the formal model of the finance economy and the main result. Section 15.4 concludes the chapter. Appendix A contains some technical material on independence and the law of large numbers for capacities. All formal proofs are in Appendix B.
15.2. Choquet expected utility and the related literature

Let Ω = {ω_i}^N_{i=1} be a finite state space, and assume that the decision maker (DM) chooses among acts with state contingent payoffs, z : Ω → R. In the CEU model (Schmeidler, 1989) an ambiguity averse DM’s subjective belief is represented by a convex non-additive probability function (or a convex capacity), ν, such that (i) ν(∅) = 0, (ii) ν(Ω) = 1 and (iii) ν(X ∪ Y) ≥ ν(X) + ν(Y) − ν(X ∩ Y), for all X, Y ⊂ Ω. Define the core of ν (notation: Δ(Ω) is the set of all additive probability measures on Ω):

C(ν) = {µ ∈ Δ(Ω) | µ(X) ≥ ν(X), for all X ⊂ Ω}.

Hence, ν(X) = min_{µ∈C(ν)} µ(X). The ambiguity4 of the belief about an event X is measured by the expression

A(X; ν) ≡ 1 − ν(X) − ν(X^c) = max_{µ∈C(ν)} µ(X) − min_{µ∈C(ν)} µ(X).

As in SEU, a utility function u : R_+ → R, u′(·) ≥ 0, describes the DM’s attitude to risk and wealth. Given a convex non-additive probability ν, the Choquet expected utility5 of an act is simply the minimum of all possible “standard” expected utility values obtained by measuring the contingent utilities possible from the act with respect to each of the additive probabilities in the core of ν:

CE_ν u(z) = min_{µ∈C(ν)} Σ_{ω∈Ω} u(z(ω)) µ(ω).

The fact that the same additive probability in C(ν) will not in general “minimize” the expectation for two different acts explains why the Choquet expectations operator is not additive, that is, given any acts z, w: CE_ν(z) + CE_ν(w) ≤ CE_ν(z + w).
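To make the operator concrete, here is a minimal sketch, under our own illustrative numbers and linear utility, of the computation just defined: with a convex capacity on two states the core has two extreme points, and the Choquet expectation is the minimum of the two expected values. The final line checks the non-additivity inequality above.

```python
# Choquet expectation as a minimum over the core (two states, linear
# utility u(x) = x for readability; the numbers are illustrative).
nu_1, nu_2 = 0.25, 0.35              # nu({s1}), nu({s2}); nu_1 + nu_2 < 1

core_vertices = [
    {"s1": nu_1, "s2": 1 - nu_1},    # excess mass 1 - nu_1 - nu_2 on s2
    {"s1": 1 - nu_2, "s2": nu_2},    # excess mass on s1
]

def ceu(act):
    return min(sum(p[s] * act[s] for s in act) for p in core_vertices)

z = {"s1": 1.0, "s2": 3.0}
w = {"s1": 3.0, "s2": 1.0}           # ranks the states oppositely to z
zw = {s: z[s] + w[s] for s in z}

print(ceu(z), ceu(w), ceu(zw))       # 1.7 1.5 4.0
print(ceu(z) + ceu(w) <= ceu(zw))    # True: the operator is not additive
```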
The operator is additive, however, if the two acts z and w are comonotonic, that is, if (z(ω_i) − z(ω_j))(w(ω_i) − w(ω_j)) ≥ 0 for all ω_i, ω_j ∈ Ω. In our analysis, we will need to consider the independent product of capacities. The independent product of two convex capacities, ν_1 and ν_2, according to the definition (suggested by Gilboa and Schmeidler, 1989) we apply in this chapter, may be (informally) understood as the lower envelope of the set {µ_1 × µ_2 | µ_1 ∈ C(ν_1), µ_2 ∈ C(ν_2)}. Unlike what is true with “standard” probabilities, there is more than one way to define the independent product of two capacities. As it turns out, the formal analysis in this chapter is unaffected if an alternative definition of independence were applied. We refer the interested reader to Appendix A and to the discussion at the end of Section 15.3 for more on the independent product of capacities, and turn next to the use of capacities and CEU in portfolio decision problems. Dow and Werlang (1992), as noted earlier, identified an important implication of Schmeidler’s model. They showed, in a model with one risky and one riskless asset, that if a CEU maximizer has a riskless endowment then there exists a set of asset prices that support the optimal choice of a riskless portfolio. The intuition behind this finding may be grasped in the following example. Consider an asset z that pays off 1 in state L and 3 in state H, and assume that ν(L) = 0.3 and ν(H) = 0.4. Assuming that the DM has a linear utility function, the expected payoff of buying a unit of z, the act z_b, is given by CE_ν(z_b) = 0.6 × 1 + 0.4 × 3 = 1.8. On the other hand, the payoff from going short on a unit of z (the act z_s) is higher at L than at H. Hence, the relevant minimizing probability when evaluating CE_ν(z_s) is that probability in C(ν) that puts most weight on H. Thus, CE_ν(z_s) = 0.3 × (−1) + 0.7 × (−3) = −2.4. Hence, if the price of the asset z were to lie in the open interval (1.8, 2.4), then the investor would strictly prefer a zero position to either going short or buying. Unlike in the case of unambiguous beliefs, there is no single price at which to switch from buying to selling. Taking a zero position on the risky asset has the unique advantage that its evaluation is not affected by ambiguity. The “inertia” zone demonstrated by Dow and Werlang was simply a statement about optimal portfolio choice corresponding to exogenously determined prices, given an initially riskless position. However, it does not follow from this result at the individual level that no-trade is an equilibrium when closing the model by allowing agents to trade their risks, as we illustrate next using the Edgeworth box diagram in Figure 15.1. The diagram depicts the possibilities of risk-sharing (one may think of the risk-sharing as being obtained through the exchange of two Arrow securities, one for each contingency) between two CEU agents, h = 1, 2, with uncertain endowments in the two states, ω_a and ω_b. W is the endowment vector. Notice that, because of ambiguity aversion, the indifference curves are kinked at the point of intersection with the 45° ray through the origin. The shaded area in the diagram represents the area of mutually advantageous trade. Hence, no-trade is an equilibrium outcome in this economy if and only if the endowment is Pareto optimal to begin with. Introduction of ambiguity aversion in an economy, seemingly, would neither impede trade in risk-sharing contracts nor be a reason for incomplete risk sharing.
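Before turning to the Edgeworth box, a short numerical sketch of the inertia zone just described (ours, with linear utility); it reproduces the buyer value 1.8 and the seller ask 2.4 computed in the text.

```python
# Dow-Werlang inertia zone for the asset paying 1 in L and 3 in H,
# under the capacity nu(L) = 0.3, nu(H) = 0.4 from the text.
payoff = {"L": 1.0, "H": 3.0}
NU = {"L": 0.3, "H": 0.4}

def choquet_value(act):
    # Two-outcome Choquet integral: the better state keeps its own
    # capacity weight; the worse state gets the complementary weight.
    best = max(act, key=act.get)
    worst = min(act, key=act.get)
    return act[best] * NU[best] + act[worst] * (1 - NU[best])

buy = choquet_value(payoff)                                # value of going long
sell = -choquet_value({s: -v for s, v in payoff.items()})  # ask of going short
print(buy, sell)  # 1.8 2.4: a zero position is strictly best for any
                  # price q in the open interval (1.8, 2.4)
```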
[Figure 15.1 near here: an Edgeworth box with origins O1 and O2, axes x^a_1, x^b_1 and x^a_2, x^b_2, the endowment point W, and 45° rays through each origin.]

Figure 15.1 Risk sharing with two CEU agents.

The reason for this “absence of no-trade” goes as follows: Pareto optimal allocations lie within the “tramlines,” the 45° rays through each origin, that is, they are comonotonic. Hence, at a Pareto optimal allocation, the ranking of the states is identical for both agents and is given by the ordering of aggregate endowment. Now with complete markets, equilibrium allocations are Pareto optimal and therefore comonotonic as well. Thus, agents use the same “minimizing probability” at equilibrium, and agree on asset valuation. Risk-sharing proceeds just as in an economy with SEU agents (see Chateauneuf et al. (2000)). Thus, if one wants to obtain that equilibrium be characterized by absence of trade, one has to move away from this (canonical) example, something that is accomplished by introducing into the model assets with idiosyncratic payoff components. Epstein and Wang (1994) recognized the role of the first of the two conditions defining idiosyncratic risk (as defined in this chapter) in obtaining nonunique equilibrium asset prices in a CEU world. That result is related to ours. The precise relationship between the results deserves careful discussion. For expository purposes, we turn to this discussion at the end of the next section, after the presentation of our model. We end this section with a discussion of another model of behavior under Knightian uncertainty due to Bewley (1986), distinct from the one applied in this chapter, which easily generates a no-trade result. Bewley, essentially, drops Savage’s assumption that preferences are complete and adds an axiom of the “status quo.” In our Edgeworth box this would amount to assuming that indifference curves are kinked precisely at the endowment point, irrespective of its position in the box. If indifference curves are “kinked enough,” the incompleteness of markets for contingent deliveries (the absence of trade) is then a direct consequence of a preference for the status quo which is exogenously imposed as a part of the definition of ambiguity aversion.
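The comonotonicity logic above can be checked mechanically. The sketch below (the allocations are made-up numbers) tests whether two agents rank states identically, which is what guarantees that they apply the same minimizing probability.

```python
from itertools import combinations

def comonotonic(x1, x2):
    """True if the two state-contingent allocations never rank a pair
    of states in opposite orders: (x1_i - x1_j)(x2_i - x2_j) >= 0."""
    return all((x1[i] - x1[j]) * (x2[i] - x2[j]) >= 0
               for i, j in combinations(range(len(x1)), 2))

# Pareto optimal sharing of an aggregate endowment (10, 6): both agents
# consume more in the first state, so their rankings agree.
print(comonotonic([6.0, 4.0], [4.0, 2.0]))   # True
# Opposite positions on an idiosyncratic payoff break comonotonicity.
print(comonotonic([5.0, 7.0], [5.0, 3.0]))   # False
```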
15.3. The model and the main result

The setting for our formal analysis is a model of a stylized two-period finance economy which we call an n-financial asset economy with idiosyncracy. Households (h = 1, …, H) trade assets in period 0, before uncertainty is resolved, and consume the one (and only) good in period 1. The assets available for trade are claims on deliveries of the consumption good in period 1. There are two sources of uncertainty. First, there is some “economic uncertainty”: agents do not know their endowments tomorrow. An economic state of the world, s, s = 1, 2, is completely identified by the endowment vector for that state, (e^s_1, …, e^s_h, …, e^s_H), where each component of the vector, e^s_h ∈ R_+, gives a particular household’s endowment of the consumption good in state s (arising in period 1). We have restricted our analysis to the case of risk-sharing across only two economic states, to make the argument as transparent as possible. Second, there is idiosyncratic financial uncertainty. An idiosyncratic state of the world completely characterizes the realization of the idiosyncratic components of payoffs of the available financial assets (described below); it is identified by the vector t = (t_1, t_2, …, t_n), where t_i ∈ {0, 1}, i ∈ {1, 2, …, n}, and n is the total number of financial assets. τ_n denotes the set of all t’s, that is, τ_n ≡ {0, 1}^n. Hence, to obtain a complete description of a state of the world, exhausting all uncertainty relevant to the model, the economic states s must each be further partitioned into cells denoted (s, t). A typical state of the world is denoted by the letter ω, ω ∈ Ω ≡ {(1, t)_{t∈τ_n}, (2, t)_{t∈τ_n}}. The assets available for trade at date 0 are as follows:

1 Financial assets, z_i, i = 1, …, n, with payoffs that have idiosyncratic components. An asset z_i yields a payoff of y^s + y(t_i) > 0 units of the good; s = 1, 2, t_i ∈ τ ≡ {0, 1}. y(t_i) is the idiosyncratic component, in the sense that it is independent of the realized economic state and independent of the realization of the payoff from any other financial asset z_j, where j ≠ i. It is assumed that y(1) > y(0) and that y^1 ≠ y^2. The price of an asset z_i is denoted by q^{z_i}_n.
2 A safe asset, b, which delivers one unit of the good irrespective of the realized state of the world. The price of this security is normalized to 1.
A point behind modeling the asset structure as above is to ensure that in order to transfer resources across the two economic states the agents would have to rely on financial assets whose payoffs are affected by idiosyncratic shocks. Prior to the resolution of uncertainty, agents are endowed with a common belief about the likelihood of state ω. The (marginal) beliefs about a particular idiosyncratic component t_i are described by a capacity ν_i, ν_i(0) + ν_i(1) ≤ 1. To model the assumption that the realizations of t_i and t_j are believed to be independent, the beliefs on τ_n are described by the independent product (defined in Appendix A) ν ≡ ⊗^n_{i=1} ν_i. For simplicity, we shall assume that ν_i(t_i = r) = ν_j(t_j = r) = ν^r, r = 0, 1, i, j ∈ {1, …, n}. The belief on an economic state s is given by π(s). To make it transparent that it is the ambiguity of beliefs about the idiosyncratic realizations which is responsible for the possibility of no-trade in financial assets,
and also to make the computation less tedious, we assume π(1) + π(2) = 1. Finally, the common belief on Ω is given by the independent product π ⊗ ν. Let e^ω_{h,n} and x^ω_{h,n} be h’s endowment and consumption, respectively, in state ω = (s, t), given that the total number of financial assets in the economy is n. Note, the definition of an economic state implies e^{(s,t)}_{h,n} = e^{(s,t′)}_{h,n}. Hence, we may use e^s_{h,n} as a complete description of state contingent endowment. Holding of the asset b by h is denoted b_{h,n} and holding of the asset z_i by h is denoted z^i_{h,n}. Agent h has a von Neumann–Morgenstern utility index u_h : R_+ → R, which is assumed to be strictly increasing, smooth and strictly concave. Furthermore, u′_h(0) = ∞ and e^ω_{h,n} > 0 for all h and all ω. P^n_h, which denotes the maximization program of agent h, is as follows:

max_{b_{h,n}, z^1_{h,n}, …, z^n_{h,n}}  CE_{π⊗ν} u_h(x^{s,t}_{h,n})

s.t.  b_{h,n} + Σ^n_{i=1} q^{z_i}_n z^i_{h,n} = 0,
      x^{s,t}_{h,n} − e^s_{h,n} = b_{h,n} + Σ^n_{i=1} (y^s + y(t_i)) z^i_{h,n},  s = 1, 2, t ∈ τ_n.
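To fix the belief objects appearing in P^n_h, the following sketch (our illustration of the lower-envelope product informally defined in Section 15.2; the parameter values are arbitrary) builds the common belief π ⊗ ν for n = 2 and recovers the marginal capacity and the ambiguity of an idiosyncratic event.

```python
from itertools import product
from math import prod

pi = {1: 0.5, 2: 0.5}                # additive belief on economic states
nu0, nu1 = 0.2, 0.3                  # ambiguous two-point marginals
n = 2                                # two financial assets

# Extreme points of the core of each two-point capacity nu_i.
vertices_t = [{0: nu0, 1: 1 - nu0}, {0: 1 - nu1, 1: nu1}]

states = [(s, t) for s in pi for t in product((0, 1), repeat=n)]

def capacity(event):
    """Lower envelope of {pi x mu_1 x ... x mu_n : mu_i in C(nu_i)};
    the minimum of a multilinear form is attained at vertex products."""
    best = 1.0
    for vs in product(vertices_t, repeat=n):
        m = sum(pi[s] * prod(vs[i][t[i]] for i in range(n))
                for (s, t) in event)
        best = min(best, m)
    return best

E = {w for w in states if w[1][0] == 1}      # "asset 1 draws the high shock"
Ec = set(states) - E
print(capacity(E), 1 - capacity(E) - capacity(Ec))
# 0.3 and 0.5: the marginal nu1 is recovered, and the event's ambiguity
# equals 1 - nu0 - nu1.
```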
An equilibrium consists of a set of asset prices, q_n ≡ {1, q^{z_1}_n, …, q^{z_n}_n}, a set of asset holdings, (b_n, z_n) ≡ {(b_{h,n}, z^1_{h,n}, …, z^n_{h,n})}^H_{h=1}, and a consumption vector, x_n ≡ (x^ω_{h,n})_{h=1,…,H; ω∈Ω}, such that, given q_n, all agents solve P^n_h, the asset markets clear, that is,

Σ_h b_{h,n} = 0,  Σ_h z^i_{h,n} = 0,  ∀ i ∈ {1, …, n},

and the consumption vector is feasible in each state, that is, Σ_h x^ω_{h,n} = Σ_h e^ω_{h,n}. Notice, a tuple (q_n, (b_n, z_n)) uniquely pins down the equilibrium; hence we may denote an equilibrium of an n-financial asset economy using such a tuple. In interpreting various aspects of the model it helps to bear in mind the economic issue the model has been formulated to examine, which is, how economic agents may share risks, inherent to their labor/human capital endowment, by trading in financial markets. Hence, as it appears in the model, a household’s endowment income is distinct from the household’s income obtained from the ownership of assets. Portfolio income is the instrument the household is allowed to use to absorb the shocks it faces in its endowed income. But the instrument is not a perfect one. The presence of idiosyncratic risk embodies the notion that the payoff from a financial asset is affected not only by some of the same shocks that affect individual households’ endowment income and are common to many assets, but also by risks specific to each asset. While most firms’ profits are naturally affected by aggregate or sectorial demand shifts and supply shocks, other factors, more idiosyncratic to the firm, do typically matter.6 Finally, notice, we have assumed that the assets are in zero net supply. This implies that the asset trading to which our analysis applies includes
all manner of trade in corporate bonds;7 but for general assets (e.g. equities) the analysis is (formally) restricted to those trades which involve one side of the market going short. The main point of the assumption is that it allows us the abstraction to study how an agent may use a financial asset (say, an equity) to share the risk in his exogenously endowed income: by going short on an asset he issues contingent claims on his risky income, thereby trading out his risk. To fix ideas, it might help to refer back to the example of X, the Detroit shopowner. X would be very representative of the agents in our model presented earlier. Think of the economic states 1 and 2 as states defined by shocks to X’s income from his shop. X may hedge his income shock by trading in a “safe” asset, such as a treasury bond, and financial assets, such as corporate bonds/equities issued by the various automobile and ancillary firms located in and around Detroit. The payoff to each such financial asset is affected by the same income shock that affects X’s shop profits. In addition, each financial asset is also affected by shocks idiosyncratic to the issuing firm. Assuming the firms’ profits and the shop’s profits are affected in the same direction by the income shock, X’s hedging strategy would be, presumably, to take a short position on a portfolio of the available financial assets while simultaneously going long on the treasury bond. Our analysis, in effect, compares how such a strategy would fare in an SEU world and in a CEU world. Formally, the analysis compares equilibrium allocations across two cases: one, where beliefs about the idiosyncratic outcomes are unambiguous (ν^0 + ν^1 = 1), and the other where beliefs about the idiosyncracy are ambiguous (ν^0 + ν^1 < 1). In order to make the comparison stark, the analysis will relate the two cases to two benchmarks. One benchmark is a complete market economy which we call an economy without idiosyncracy, that is, an economy which is identical to the n-financial asset economy with idiosyncracy described above in every respect except that there is only a single financial asset z which pays off y^s + E_ν y(t) ≡ ȳ^s units in the economic states s = 1, 2. Correspondingly, q^z denotes the price of z and z_h denotes the amount held by household h. (Note, when denoting endogenous variables in the economy without idiosyncracy we may omit the subscript n.) The second benchmark is an incomplete market economy which is identical to the n-financial asset economy with idiosyncracy in every respect except that the only asset available is the safe asset. The following Lemma simplifies the analysis greatly.

Lemma. Let (q_n, (b_n, z_n)) be an equilibrium of the n-financial assets economy with idiosyncracy. Suppose ν^0 + ν^1 ≤ 1. Then z^i_{h,n} = z^{i′}_{h,n}, ∀ i, i′ ∈ {1, …, n}, ∀ h ∈ {1, …, H}.

According to the Lemma, at an equilibrium, agents will hold all the financial assets in the same proportion. This is essentially a consequence of the fact that agents are risk averse and that the n financial assets are simply “independent replicas.” Let z̃_n denote a unit of a portfolio composed of 1/n unit of each asset z_i, i = 1, …, n; z̃_{h,n} is the amount held of this portfolio by h and q̃_n is the price
of a unit of this portfolio. Given the Lemma, we may assume, without loss of generality, that it is only the asset z̃_n, instead of the individual assets z_i, that is available for trade in the economy. Hence, an equilibrium of an n-financial assets economy with idiosyncracy, (q_n, (b_n, z_n)), may equivalently be denoted by the tuple (q̃_n, (b_n, z̃_n)), q̃_n ≡ {1, q̃_n} and (b_n, z̃_n) ≡ {(b_{h,n}, z̃_{h,n})}^H_{h=1}. The above characterization of the equilibrium in turn facilitates a simple definition of what it means to satisfy the conditions of equilibrium when n is arbitrarily large. We say (q̃_∞, (b_∞, z̃_∞), x_∞) satisfies the conditions of equilibrium of the n-financial assets economy with idiosyncracy where n is arbitrarily large, that is, n → ∞, if:8

1 Given q̃_∞, ((b_∞, z̃_∞), x_∞) is a solution to the problem P̃_{h,∞} defined as follows:

max  CE_{π⊗ν} u_h(x^{s,t}_{h,∞})

s.t.  b_{h,∞} + q̃_∞ z̃_{h,∞} = 0,
      x^{s,t}_{h,∞} − e^s_{h,∞} = b_{h,∞} + z̃_{h,∞} lim_{n→∞} [Σ^n_{i=1} (y^s + y(t_i))/n],  s = 1, 2, with probability 1.

2 Markets clear, that is, Σ_h b_{h,∞} = Σ_h z̃_{h,∞} = 0, and the consumption vector is feasible in each state, that is, Σ_h x^ω_{h,∞} = Σ_h e^ω_{h,∞} with probability 1.
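The limit appearing in the constraint of P̃_{h,∞} can be visualized by simulation. In this sketch (ours; the parameter values are illustrative), each additive prior in the core pins down a different limiting portfolio payoff, so across the core the possible limits span an interval rather than a point.

```python
import random

# Portfolio average (1/n) * sum_i (y_s + y(t_i)) under i.i.d. draws of
# t_i from one additive prior in the core of the two-point capacity
# nu0 = nu1 = 1/4; priors put P(t = 1) anywhere in [nu1, 1 - nu0].
y_s, y_low, y_high = 2.0, 0.0, 2.0
nu0 = nu1 = 0.25
n = 100_000
random.seed(0)

for p_high in (nu1, 0.5, 1 - nu0):           # three priors from the core
    avg = sum(y_s + (y_high if random.random() < p_high else y_low)
              for _ in range(n)) / n
    print(p_high, round(avg, 3))
# The limits range over [y_s + 2*nu1, y_s + 2*(1 - nu0)] = [2.5, 3.5].
```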
Theorem. Suppose ν^0 + ν^1 = 1. Then (q̃_∞, (b_∞, z̃_∞)) satisfies the conditions of equilibrium of the n-financial assets economy with idiosyncracy where n is arbitrarily large, if and only if (q̃_∞, (b_∞, z̃_∞)) describes an equilibrium of an economy without idiosyncracy, wherein the price of a unit of z is equal to q̃_∞, and a household’s holding of the asset z, z_h, is equal to z̃_{h,∞}.

The theorem shows that equilibrium allocations of the n-financial assets economy with idiosyncracy are essentially identical to those of the economy without idiosyncracy, in which financial markets are complete, provided the number of available financial assets is large enough and agents’ beliefs are unambiguous. The result follows from an application of the usual diversification principle, stating that in the limit idiosyncracies are “washed away,” in conjunction with the assumption that y^1 ≠ y^2. However, if the model of the n-financial assets economy with idiosyncracy were to be reconsidered with the sole amendment that beliefs about idiosyncracies are ambiguous, that is, ν^0 + ν^1 < 1, then the result no longer holds. In such an economy, however large the n, given sufficient ambiguity, the equilibrium allocation is bounded away from Pareto optimal risk-sharing. The allocation actually coincides with the allocation of an incomplete market economy in which it is impossible to transfer resources between states 1 and 2, as we show in our main theorem, later. But, first, we present Example 15.1 to convey an intuition for the result.
Example 15.1. Consider a 2-period finance economy with two risk averse agents, h = 1, 2, and two economic states. There are two assets available, b and z. b is a safe asset; it delivers one unit of the good in each of the two economic states. The payoff of z in state (s, t) is y^s + y(t), s = α, β; t = 0, 1. Fix y^α = 1, y^β = 2, y(0) = 0, y(1) = 2. First consider the case where ν^0 + ν^1 = 1. The model reduces to a standard incomplete market equilibrium with two assets and four states, in which, for “generic” endowments, there is trade, that is, some partial insurance among agents.9 Next, suppose, to simplify matters drastically, that ν^0 = ν^1 = 0. Consider an agent h contemplating buying the uncertain asset at a price q^z, given the safe asset is priced q^b = 1. h may buy z_h units of the uncertain asset and take a position b_h in the safe asset such that b_h + q^z z_h = 0. His utility functional is then given by:

CE_{π⊗ν} u_h(e^s_h + z_h(y^s + y(t)) + b_h)
  = u_h(e^α_h + z_h(y^α + y(0)) + b_h) π(α)(1 − ν^1)
  + u_h(e^α_h + z_h(y^α + y(1)) + b_h) π(α) ν^1
  + u_h(e^β_h + z_h(y^β + y(0)) + b_h) π(β)(1 − ν^1)
  + u_h(e^β_h + z_h(y^β + y(1)) + b_h) π(β) ν^1.

Once we substitute in ν^1 = 0, it is clear from the above functional that the payoff matrix the agent (as a buyer of z) will consider is (columns correspond to the assets b and z, rows to the economic states α and β):

  [ 1  1 ]
  [ 1  2 ]

If q^z ≥ 2, any balanced portfolio with z_h > 0 yields negative payoffs and is therefore not worth buying. Thus, an agent will wish to buy the uncertain asset only if q^z < 2. Next consider an agent h who contemplates going short on asset z. His utility functional is therefore:

CE_{π⊗ν} u_h(e^s_h + z_h(y^s + y(t)) + b_h)
  = u_h(e^α_h + z_h(y^α + y(0)) + b_h) π(α) ν^0
  + u_h(e^α_h + z_h(y^α + y(1)) + b_h) π(α)(1 − ν^0)
  + u_h(e^β_h + z_h(y^β + y(0)) + b_h) π(β) ν^0
  + u_h(e^β_h + z_h(y^β + y(1)) + b_h) π(β)(1 − ν^0).

Notice now the functional is dependent on ν^0 since the agent is going short, that is, z_h < 0. Substituting ν^0 = 0, we find the payoff matrix the agent h will consider:

  [ 1  3 ]
  [ 1  4 ]
For q^z ≤ 3 any balanced portfolio with z_h < 0 yields negative payoffs. Thus, an agent will wish to sell the risky asset only if q^z > 3. Thus, buyers of asset z will not want to pay more than 2, while sellers will not sell it for less than 3. Hence, there does not exist an equilibrium price such that agents will have a nonzero holding of the uncertain asset. Next, consider another extreme, a case in which ambiguity appears only on the economic states while the agents are able to assess (additive) probabilities for the idiosyncratic states. In fact, to keep matters stark, assume π(α) = π(β) = 0, though the additive probability on idiosyncratic states is arbitrary, simply ensuring that ν^0 + ν^1 = 1. Suppose that, for agent h, e^α_h > e^β_h. Then, for z_h ∈ (−ε, ε), for ε small enough,

CE_{π⊗ν} u_h(e^s_h + z_h(y^s + y(t)) + b_h)
  = ν^0 u_h(e^β_h + z_h(y^β + y(0)) + b_h) + ν^1 u_h(e^β_h + z_h(y^β + y(1)) + b_h),

since for z_h small enough e^α_h + z_h(y^α + y(t)) + b_h > e^β_h + z_h(y^β + y(t)) + b_h. Hence, z_h = 0 if and only if q^z = y^β + ν^0 y(0) + ν^1 y(1) (the fact that endowments and the utility function do not appear in this expression is due to the extreme form of ambiguity assumed, that is, maximin behavior). Thus, the only candidate for a no-trade equilibrium price is q^z = y^β + ν^0 y(0) + ν^1 y(1). Now, assume that for at least one other agent the order of the endowment is reversed, that is, e^β_h > e^α_h; then a computation similar to the one earlier shows that such agents will not want to trade the risky asset if and only if q^z = y^α + ν^0 y(0) + ν^1 y(1). Hence, if both types of agents are present in the economy, trade will occur, since y^α ≠ y^β. If we were not to assume the extreme maximin form of preferences but that π(α) + π(β) < 1 with, say, π(α) > 0 and π(β) > 0, the no-trade price for agent h (say with e^α_h > e^β_h) depends on his initial endowment and utility function (i.e. relative attitude to risk). In that case, even if endowments of all agents were comonotonic (i.e. e^α_h ≥ e^β_h for all h) there would not exist, for the generic endowment vector, an asset price q^z that would support no-trade as an equilibrium of this economy.

The two more significant ways in which the main theorem generalizes the demonstration in Example 15.1 are: one, it shows that no-trade obtains even when beliefs have a degree of ambiguity strictly less than 1; two, it allows for any arbitrary number of financial assets, in particular, for n → ∞. We consider the intuition for each of these generalizations in turn. First, consider a 2-(economic) state, 2 agent, 1-financial asset (and 1 safe asset) economy with idiosyncracy,
in which the financial asset’s payoffs are as in Example 15.1. Consider an agent thinking of buying the financial asset. The maximum payoff he expects in any economic state is 2 + [0 × (1 − ν^1) + 2 × ν^1] ≡ V(B), the amount he expects in state β. This implies, whatever his utility function, whatever his endowment vector, whatever his beliefs about the economic uncertainty, he will not want to buy the asset for more than V(B). Now, instead, if an agent were to go short with the asset, the least he expects to have to repay in any economic state is 1 + [0 × ν^0 + 2 × (1 − ν^0)] ≡ V(S), and therefore, he will not want to sell the asset if the price is less than this. Clearly, if ν^0 and ν^1 were small enough, V(B) < V(S). Therefore, if ν^0 and ν^1 were small enough, agents will not trade in the financial asset. Intuition about the second bit of generalization is difficult to obtain without some understanding of how the law of large numbers works for nonadditive beliefs. Specifically, let us consider an i.i.d. sequence {X_n}_{n≥1} of {0, 1}-valued random variables. Suppose ν({X_n = 0}) = ν({X_n = 1}) = 1/4 for all n ≥ 1. As is usual with laws of large numbers, the question is about the limiting distribution of the sample average, (1/n) Σ^n_{i=1} X_i. The law10 implies:

ν({ 1/4 ≤ lim inf_{n→∞} (1/n) Σ^n_{i=1} X_i ≤ lim sup_{n→∞} (1/n) Σ^n_{i=1} X_i ≤ 3/4 }) = 1.

This shows that the DM has a probability 1 belief that the limiting value of the sample average lies in the (closed) interval [1/4, 3/4]. However, unlike in the case of additive probabilities, the DM is not able to further pin down its value. Thus, even with non-additive probabilities the law of large numbers works in the usual way, in the sense that here too the tails of the distribution are “canceled out” and the distribution “converges on the mean.” But of course here, given that the DM’s knowledge is consistent with more than one prior, there is more than one mean to converge on; hence, the convergence is to the set of means corresponding to the set of priors consistent with the DM’s knowledge. Hence, a CEU maximizer whose (ex post) utility is increasing in X (e.g. when the DM is a buyer of an asset with payoff X) will behave as if the convergence of the sample average occurred at 1/4, the lower boundary of the interval, while a DM whose utility is increasing in −X (e.g. when the DM is a seller of an asset with payoff X) will behave as if the convergence of the sample average occurred at 3/4, the upper boundary of the interval. Now we can complete our intuition for the main result. Consider a modification of the simplified financial economy of Example 15.1 such that, ceteris paribus, there are now n-fold replicas of the financial asset, n → ∞. We consider trade between “two” assets, one the safe asset and the other the “portfolio” asset, containing each of the independent replica assets in equal proportion. The law of large numbers result, explained earlier, implies that any agent contemplating going long on the portfolio asset will behave as if a unit of the portfolio will pay off y^s + [0 × (1 − ν^1) + 2 × ν^1] with probability 1 in economic state s,
while an agent contemplating going short will behave as if a unit of the portfolio will pay off y^s + [0 × ν^0 + 2 × (1 − ν^0)] with probability 1 in economic state s. Hence, exactly the same argument as before applies: for ν^0 and ν^1 sufficiently small, V(B) < V(S) and there will not be any trade in the portfolio. The important insight here is that while agents are fully aware that a “well diversified” portfolio “averages out” the idiosyncracies, they only have an imprecise knowledge of what it averages out to. Another important point demonstrated in Example 15.1, as modified earlier, is how equilibrium risk-sharing is affected by ambiguity aversion. If 1 − ν^0 − ν^1 > 1/2, then the equilibrium allocation is necessarily not Pareto optimal unless endowments are, no matter how large the value of n. Consider an economy, E, which is the same as in the original example except that there is only one financial asset available in this economy, the safe asset b. Given ambiguity greater than 1/2, there is no-trade in the portfolio of uncertain assets in the economy in (the modified) Example 15.1; hence an equilibrium allocation of this economy is an equilibrium allocation of E. E has two states, α and β, but one asset, and therefore, is an incomplete markets economy with sub-optimal risk-sharing. We now state our main result:

Main Theorem. Consider the n-financial assets economy with idiosyncracy. Let y̲ ≡ min_s{y^s} and ȳ ≡ max_s{y^s} and suppose that ȳ − y̲ < y(1) − y(0). Then there exists an Ā, 0 < Ā < 1, such that if 1 − ν^0 − ν^1 > Ā, then z̃_{h,n} = 0 and x^{s,t}_{h,n} = x^{s,t′}_{h,n}, s = 1, 2, t ≠ t′, for all h ∈ {1, …, H}, at every equilibrium (q_n, (b_n, z_n)), for all n ∈ N, including n → ∞.
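Before the theorem is restated in words, here is a quick numerical check of the bound (our sketch; linear payoff evaluation and the asset numbers of the modified Example 15.1): the no-trade comparison V(B) < V(S) flips exactly where the ambiguity 1 − ν^0 − ν^1 crosses Ā.

```python
# No-trade threshold check for the modified Example 15.1 (n -> infinity).
y_alpha, y_beta = 1.0, 2.0        # economic payoff components
y0, y1 = 0.0, 2.0                 # idiosyncratic payoff components

def no_trade(nu0, nu1):
    # Limit evaluations under the law of large numbers for capacities:
    # a buyer acts as if the idiosyncratic average converged to its
    # lowest possible mean, a seller as if to its highest.
    V_B = y_beta + y0 * (1 - nu1) + y1 * nu1       # most a buyer pays
    V_S = y_alpha + y0 * nu0 + y1 * (1 - nu0)      # least a seller takes
    return V_B < V_S

A_bar = (y_beta - y_alpha) / (y1 - y0)             # = 0.5, the theorem's bound
for nu0, nu1 in [(0.3, 0.3), (0.2, 0.2), (0.05, 0.1)]:
    print(1 - nu0 - nu1 > A_bar, no_trade(nu0, nu1))   # the two always agree
# -> (False, False), (True, True), (True, True)
```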
Stated differently, this says that if the range of variation of the idiosyncratic component of the financial asset is greater than the range of variation due to the economic shocks, if the beliefs over the idiosyncratic states are ambiguous enough, and if agents are ambiguity averse, then irrespective of the utility functions of the agents and the endowment vector, the equilibrium of an n-financial assets economy with idiosyncracy is an equilibrium of the economy with one safe asset, that is, an economy with incomplete markets, since the financial assets are not traded in equilibrium, whatever the value of n. Notice further, if the conditions described in the theorem are met, then for a generic endowment, an equilibrium allocation of the n-financial assets economy with idiosyncracy is necessarily not Pareto optimal. This follows simply from the understanding that an equilibrium allocation of the n-financial assets economy with idiosyncracy, given the conditions of the theorem, is an equilibrium of the economy with one safe asset. The latter economy is an incomplete market economy in which it would not be possible to transfer resources between states 1 and 2. The significant sufficient condition to ensure no-trade, irrespective of the utility functions of the agents and the endowment vector, is that ȳ − y̲ < y(1) − y(0). The bound follows from the expression for Ā, Ā = (ȳ − y̲)/(y(1) − y(0)), constructed in the proof of the main theorem. Notice, Ā is the supremum among the values of ambiguity required for no-trade, across all the possible combinations of parameters of utilities or endowments, and is independent of any parameter of utility or endowment. So, typically, the ambiguity required for no-trade will be less than Ā; further, no-trade may result even if y(1) − y(0) < ȳ − y̲. Also, the required ambiguity will be greater, the greater the risk aversion and/or the riskiness of the endowment (see Example 3 in Mukerji and Tallon (1999)). One might be tempted to conjecture that the results of the chapter may be replicated by simply assuming heterogeneous beliefs among agents. Or to conjecture, since with incomplete markets comonotonicity of equilibrium allocations is in general broken, so that different (CEU) agents would evaluate their prospects using different (effective) probabilities, that adding CEU agents might “worsen” incompleteness even in the absence of idiosyncratic risks. Both conjectures are, however, false. What is at work in obtaining no-trade is not that different agents have different beliefs but that any given agent behaves as if he evaluates the two different actions, going short and going long, with different (probabilistic) beliefs. Neither does market incompleteness, in the absence of idiosyncratic risk, make for this peculiarity; it does not therefore, in and of itself, lead to no-trade. We illustrate this with the following example.
the values of ambiguity required for no-trade, across all the possible combinations of parameters of utilities or endowments, and is independent of any parameter of utility or endowment. So, typically, the ambiguity required for no-trade be less ¯ further, no-trade will result even if y(1) − y(0) < y s − y s . Also, the than A; required ambiguity will be greater, greater the risk aversion and/or riskiness of the endowment (see example 3 in Mukerji and Tallon (1999)). One might be tempted to conjecture that results of the chapter may be replicated by simply assuming heterogeneous beliefs among agents. Or to conjecture, since with incomplete markets comonotonicity of equilibrium allocations is in general broken so that different (CEU) agents would evaluate their prospects using different (effective) probabilities, that adding CEU agents might “worsen” incompleteness even in the absence of idiosyncratic risks. Both conjectures are, however, false. What is at work in obtaining no-trade is not that different agents have different beliefs but that any given agent behaves as if he evaluates the two different actions, going short and going long, with different (probabilistic) beliefs. Neither does market incompleteness, in the absence of idiosyncratic risk, make for this peculiarity and therefore does not, in and of itself, lead to no-trade. We illustrate this with the following example. Example 15.2. Suppose there are S states, H agents, one safe asset and one risky asset that pays off y s unit of the good in state s. Agent h’s budget constraints are (we normalize the price of the safe asset in the first period as well as the price of the good in all states to be equal to 1): "
bh + qzh = 0 xhs = ehs + (y s − q)zh
s = 1, . . . , S
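The following sketch (ours; log utility and an ε-contamination capacity are illustrative choices) numerically previews the Claim established below: with no pair of states sharing an endowment level, the no-trade price q_h is pinned down uniquely.

```python
from math import log

# Endowments all distinct, so the Claim's assumption holds.
e = [1.0, 2.0, 3.0]                  # agent h's endowment by state
y = [0.5, 1.0, 2.0]                  # risky asset payoff by state
S = len(e)
du = lambda x: 1.0 / x               # u'(x) for u = log

# Convex capacity: an eps-contamination of the uniform prior.
c = 0.2
def nu(A):                           # A: set of state indices
    return 1.0 if len(A) == S else (1 - c) * len(A) / S

def decision_weights(cons):
    """Choquet weights: rank states best-first, take capacity increments."""
    order = sorted(range(S), key=lambda s: -cons[s])
    w, seen, prev = [0.0] * S, set(), 0.0
    for s in order:
        seen.add(s)
        w[s] = nu(seen) - prev
        prev += w[s]
    return w

def ceu(cons):
    w = decision_weights(cons)
    return sum(w[s] * log(cons[s]) for s in range(S))

# The unique no-trade price: a weighted "risk-neutral" price under the
# minimizing measure at the endowment (cf. the proof below).
mu = decision_weights(e)
q_h = sum(mu[s] * y[s] * du(e[s]) for s in range(S)) / \
      sum(mu[s] * du(e[s]) for s in range(S))

# Brute-force check that z = 0 is optimal at q_h (grid includes 0).
zs = [k / 1000 - 0.3 for k in range(601)]
best_z = max(zs, key=lambda z: ceu([e[s] + (y[s] - q_h) * z for s in range(S)]))
print(round(q_h, 4), best_z)         # best_z == 0.0
```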
Claim. Assume that there are no pairs of states s and s′ such that y^s ≠ y^{s′} and e^s_h = e^{s′}_h. Then, there exists a unique price q_h such that z*_h(q_h) = 0.
Proof. Assume w.l.o.g. that e^1_h ≤ e^2_h ≤ · · · ≤ e^S_h. Since by assumption e^s_h = e^{s′}_h ⇒ y^s = y^{s′}, there exists ε > 0 such that for all z_h ∈ (−ε, ε):

e^1_h + (y^1 − q)z_h ≤ e^2_h + (y^2 − q)z_h ≤ · · · ≤ e^S_h + (y^S − q)z_h.

Let Π(z_h) be the set of probability measures in C(ν) that minimize E_µ u_h(e^s_h + (y^s − q)z_h), that is, Π(z_h) = {(µ_1, …, µ_S) ∈ C(ν) | E_µ u_h(e^s_h + (y^s − q)z_h) = CE_ν u_h(e^s_h + (y^s − q)z_h)}. Observe that if µ, µ′ ∈ Π(z_h) are different, then they must disagree on those states where consumption is identical, or, said differently (given the order we adopted on h’s endowment):

e^s_h + (y^s − q)z_h ≠ e^{s′}_h + (y^{s′} − q)z_h  ∀ s′ ≠ s
  ⇒ µ_s = µ′_s = ν({s, …, S}) − ν({s + 1, …, S}).
Hence, z_h = 0 is optimal at price q_h if and only if there exists µ ∈ Π(0) such that:

q = q_h ≡ ( Σ_s µ_s y^s u′_h(e^s_h) ) / ( Σ_s µ_s u′_h(e^s_h) ).
Recall now that probability measures in Π(0) can differ only on those states in which the endowment is constant. Since, by assumption, e^s_h = e^{s′}_h ⇒ y^s = y^{s′}, one obtains E_µ[y^s u′_h(e^s_h)] = E_{µ′}[y^s u′_h(e^s_h)] for all µ, µ′ ∈ Π(0). Since E_µ u′_h(e^s_h) = E_{µ′} u′_h(e^s_h) for all µ, µ′ ∈ Π(0), q_h as defined above is unique. We just established that there is only one price q_h, defined in the proof earlier, such that at this price agent h optimally wants a zero position in the risky asset. Now, unless the endowment allocation is Pareto optimal, q_h ≠ q_{h′} for some agents h ≠ h′. Hence, at an equilibrium, trade on the market for the risky asset will be observed. This establishes that, “generically,” in order for z_h = 0 for all h to be an equilibrium of the model, there must be pairs of states s, s′ such that e^s_h = e^{s′}_h for all h and y^s ≠ y^{s′}; in other words, an idiosyncratic element is necessary to obtain no-trade. Before we close this section, we attempt to clarify further how our main result adds to the findings in the related literature. In Example 15.2, in spite of an incomplete markets environment, in spite of CEU agents, no-trade fails to materialize because each agent has a unique price at which he takes a zero position in the asset, and in general, this price is different for different agents. Dow and Werlang (1992) may be read as an exercise in purely deriving the demand function for a risky asset, given an initial riskless position. By putting together two Dow and Werlang agents one does obtain an economy where an equilibrium may be defined, but given that such agents’ endowments are riskless, agents do not have any risks to share in such an economy. Hence simply “completing” the Dow and Werlang exercise to obtain an equilibrium model does not allow one to investigate the question addressed in the present chapter, which is, whether ambiguity aversion affects risk-sharing possibilities in the economy. And, as explained in the previous section, even if we were to make the simple further extension of allowing uncertain endowments, given complete markets, we will find ambiguity has no effect. Finally, as Example 15.2 demonstrates, an even further extension of allowing market incompleteness does not provide the answer either. Evidently, one has to move further afield from the Dow and Werlang analysis to address our question. Epstein and Wang (1994) significantly generalized the Dow and Werlang (1992) result to find that price intervals supporting the zero position occurred (in equilibrium) if there were some states across which asset payoffs differ while endowments remain identical. The intuition for this is as follows. To obtain a range of supporting prices for the zero position, there must occur a “switch” in the effective probability distribution precisely at the zero position. That is, depending on whether he takes a position + or − away from 0, howsoever small, the agent evaluates his position using a different probability. For this to happen, the agent’s ranking
of states (according to his consumption) must switch depending exclusively on whether he takes a positive or negative position on the asset. Hence, there must be at least two states for which even the smallest movement away from the zero position would cause a difference in the ranking of the states depending on which side of zero one moves to. Clearly, this may only be true if the endowment were constant across the two states while the asset payoff were not. The clarification obtained in Epstein and Wang (1994) of the condition that enables multiple price supports to emerge was the point of inspiration for the research reported in the present chapter. Indeed, the condition of Epstein and Wang (1994) is one of the two conditions we apply to define idiosyncratic risk. Where the present chapter has gone further, and what, in essence, is its contribution, is in finding conditions for an economy wherein the agents’ price intervals overlap in such a manner that every equilibrium of the economy involves no-trade in an asset, and, more importantly, conditions under which ambiguity aversion demonstrably “worsens” risk sharing and incompleteness of markets. These are issues that were neither addressed nor even raised in Epstein and Wang (1994), formally or informally, and understandably so, since the principal model in that paper was the Lucas (1978) pure exchange economy amended to include ambiguity averse beliefs. This is a model with a single representative agent, or equivalently, a number of agents with identical preferences and endowments. In an equilibrium of such an economy, trade and risk-sharing are trivial since agents will consume their endowments; endowments are, by construction, Pareto optimal.11 Kelsey and Milne (1995) extend the equilibrium arbitrage pricing theory (APT) by allowing for various kinds of nonexpected utility preferences. One of the cases they consider is the CEU model. The model in the present chapter may be thought of as a special case of the equilibrium APT framework: what are labeled as factor risks in APT are precisely what we call economic states, and idiosyncratic risk is present in both models, though in our model the idiosyncratic risk has a simpler structure in that there are only two possible idiosyncratic states corresponding to each asset. Only a special case of CEU preferences is investigated by Kelsey and Milne (1995): their Assumption 3.3 allows nonadditive beliefs only with respect to factor risks; idiosyncratic risk is described only by additive probabilities (see Assumption 3.3, the Remark following the assumption, and footnote 2). The formal result of their analysis appears in Corollary 3.1 and shows, given the qualifications, that the usual APT result continues to hold: diversification may proceed as usual, idiosyncratic risk disappears in the limit as the number of assets tends to infinity, and the price of any asset is, consequently, a linear function of factor risks. This formal result is readily understandable given our analysis. As is repeatedly stressed in the present chapter, what drives our result is the nonadditivity of beliefs over the idiosyncratic states. While it is not necessary that ambiguity aversion be restricted to idiosyncratic states for our result to hold, it is necessary that there be some ambiguity about idiosyncracies. The no-trade result fails if ambiguity is merely restricted to economic states, as we explained in the latter part of Example 15.1 and in Example 15.2.
With ambiguity only on economic states, ambiguity aversion has no bite, irrespective of whether there is only a single asset or infinitely many and
hence diversification proceeds as with SEU. Hence, their result would not obtain without the restriction imposed by (their) Assumption 3.3. Our analysis therefore warns against informally extrapolating the Kelsey and Milne (1995) result to think that diversification would proceed as usual even when the special circumstances of Assumption 3.3 do not hold (i.e. the ambiguity is not restricted to economic states but occurs more generally over the state space). Further, it would appear to be a compelling description of the economic environment to assume that, if an agent is at all ambiguity averse, the agent will be ambiguity averse about an idiosyncratic risk. By definition, such a risk is unrelated to his own income risk and the macroeconomic environment; the risk stems from the internal workings of a particular firm, something about which the typical agent is likely to have little knowledge. It is well known that it is possible to define more than one notion of independence for nonadditive beliefs. Ghirardato (1997) presents a comprehensive analysis of the various notions. As Ghirardato notes (p. 263), the problem of defining an independent product had been studied, previous to Ghirardato’s investigation, by Hendon et al. (1996), Gilboa and Schmeidler (1989) and Walley and Fine (1982). The definition invoked in the present chapter, suggested by Gilboa and Schmeidler (1989) and Walley and Fine (1982), is arguably the most prominent in the literature. However, the formal analysis in the present chapter, given the primitives of our model, does not hinge on this particular choice of the notion of independence. An important finding of Ghirardato’s analysis was that the proposed specific notions of independent product give rise to a unique product for cases in which the marginals have some additional structural properties. The capacity we use in our model is a product of an additive probability and n two-point capacities (each ν_i is characterized by the two values ν^0 and ν^1). A two-point capacity is, of course, a convex capacity and (trivially) a belief function. As is explicit in Theorems 2 and 3 in Ghirardato (1997), if the marginals satisfy the structural properties, as the marginals we use do, then uniqueness of the product capacity obtains. That is, irrespective of which of the two definitions of independence is adopted, the one suggested by Hendon et al. (1996) or the one we use, the computed product capacity is the same. The law of large numbers that we use formally invokes the Gilboa–Schmeidler notion (see Marinacci, 1999: Theorem 15 and Section 7.2). Since the two notions of independence are equivalent given the primitives of our model, it is immaterial to our analysis whether the law of large numbers that we use also holds under the alternative notion of independence. In other words, the conclusions of our formal analysis are robust to the adoption of the alternative notion of independence.
15.4. Concluding remarks

Financial assets typically carry some risk idiosyncratic to them; hence, disposing of income risk using financial assets will involve buying into the inherent idiosyncratic risk. However, standard theory argues that diversification would, in principle, reduce the inconvenience of idiosyncratic risk to arbitrarily low levels, thereby making the tradeoff between the two types of risk much less severe. This argument is less robust than what standard theory leads us to believe. Ambiguity
aversion can actually exacerbate the tension between the two kinds of risk to the point that classes of agents may find it impossible to trade some financial assets: they can no longer rely on such assets as routes for "exporting" their income risks. Thus, theoretically, the effect of ambiguity aversion on financial markets is to make the risk-sharing opportunities offered by financial markets less complete than they would be otherwise. This is the principal conclusion of the exercise in this chapter.

This conclusion is robust, to the extent that many of the assumptions of the model presented in the last section could be substantially relaxed without losing the substance of the analytical results. First, it does not matter whether the beliefs about the economic states are ambiguous; the no-trade result still obtains. Second, given that diversification with replica assets does not work with ambiguous beliefs, one might wonder whether diversification can be achieved through assets which are not replicas (in terms of payoffs). It turns out that it does not make any difference (to the main result) if we were to relax the assumption of "strict" replicas (see Mukerji and Tallon, 1999).

It is instructive to note the distinction between the empirical content of a theory of no-trade based on the "lemons" problem (e.g. Morris (1997)) and the theory based on ambiguity aversion. The primitive of the former theory is asymmetric information between the transacting parties, and, significantly, no-trade may result even if there is no idiosyncratic component. Thus that theory, per se, does not link the presence and extent of an idiosyncratic component to no-trade. To obtain such a link, one has to assume, a priori, that there is sufficient asymmetric information only in the presence of idiosyncratic components. On the other hand, the theory based on ambiguity aversion does not require one to assume that ambiguity is present only with idiosyncracies, or that agents have ambiguous beliefs especially with respect to payoffs of assets with idiosyncratic components. One may well begin with the primitive that ambiguity is present in a "general" way, across all contingencies. However, since ambiguity aversion selectively attacks only those assets whose payoffs have idiosyncratic components, the link between idiosyncracy and no-trade is endogenously generated in the theory based on ambiguity aversion. This positive understanding is of significance. The history of financial markets is replete with episodes of increased uncertainty leading to a thinning out of trade (or even its seizing up completely), particularly in assets such as high-yield corporate bonds ("junk" bonds) and bonds issued in "emerging markets" (namely, Latin America, Eastern Europe and East Asia) (see Mukerji and Tallon, 1999). The understanding also explains certain institutional structures adopted in some countries to protect markets from such episodes (see Mukerji and Tallon, 1999).
Appendix A: Some formal details relating to the CEU model

Independent product for capacities

We consider here the formal modeling of the idea of stochastic independence of random variables when beliefs are ambiguous. Let $y$ be a function from a given space $\tau$ to $\mathbb{R}$, and $\sigma(y)$ be the smallest $\sigma$-algebra that makes $y$ a random variable.
$\tau^n$ denotes the $n$-fold Cartesian product of $\tau$, and $\sigma(y_1, \ldots, y_n)$ the product $\sigma$-algebra on $\tau^n$ generated by the $\sigma$-algebras $\{\sigma(y_i)\}_{i=1}^n$. The following definition was proposed by Gilboa and Schmeidler (1989), and earlier, by Walley and Fine (1982).

Definition 15.A.1. Let $\nu_i$ be a convex non-additive probability defined on $\sigma(y_i)$. The independent product, denoted $\bigotimes_{i=1}^n \nu_i$, is defined as follows:
$$\left(\bigotimes_{i=1}^n \nu_i\right)(A) = \min\left\{(\mu_1 \times \cdots \times \mu_n)(A) : \mu_i \in C(\nu_i) \text{ for } 1 \leq i \leq n\right\}$$
for every $A \in \sigma(y_1, \ldots, y_n)$, where $\mu_1 \times \cdots \times \mu_n$ is the standard additive product measure and $C(\nu_i)$ denotes the core of $\nu_i$. We denote by $\bigotimes \nu_i$ any non-additive probability on $\sigma(y_1, \ldots, y_n, \ldots)$ such that for any finite class $\{y_{t_1}, \ldots, y_{t_n}\}$ it holds that $(\bigotimes_{i \geq 1} \nu_i)(A) = (\bigotimes_{i=1}^n \nu_i)(A)$ for every $A \in \sigma(y_1, \ldots, y_n)$.

The computation of the Choquet expectation operator using product capacities is particularly simple for slice comonotonic functions (Ghirardato (1997)), defined now. Let $X_1, \ldots, X_n$ be $n$ (finite) sets and let $\Omega = X_1 \times \cdots \times X_n$. Correspondingly, let $\nu_i$ be convex non-additive probabilities defined on algebras of subsets of $X_i$, $i = 1, \ldots, n$.

Definition 15.A.2. Let $f\colon \Omega \to \mathbb{R}$. We say that $f$ has comonotonic $x_i$-sections if for every $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n), (x'_1, \ldots, x'_{i-1}, x'_{i+1}, \ldots, x'_n) \in X_1 \times \cdots \times X_{i-1} \times X_{i+1} \times \cdots \times X_n$, the functions $f(x_1, \ldots, x_{i-1}, \cdot, x_{i+1}, \ldots, x_n)\colon X_i \to \mathbb{R}$ and $f(x'_1, \ldots, x'_{i-1}, \cdot, x'_{i+1}, \ldots, x'_n)\colon X_i \to \mathbb{R}$ are comonotonic functions. $f$ is called slice-comonotonic if it has comonotonic $x_i$-sections for every $i \in \{1, \ldots, n\}$.

The following fact follows from Proposition 7 and Theorem 1 in Ghirardato (1997).

Fact 15.A.1. Suppose that $f\colon \Omega \to \mathbb{R}$ is slice comonotonic. Then
$$CE_{\otimes \nu_i}\, f(x_1, \ldots, x_n) = CE_{\nu_1} \cdots CE_{\nu_n}\, f(x_1, \ldots, x_n).$$
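To make Definition 15.A.1 and Fact 15.A.1 concrete, the following is a minimal numerical sketch; the capacity values $\nu_0 = 0.3$, $\nu_1 = 0.4$ and the payoffs are hypothetical choices for illustration only. It computes a Choquet integral by the standard decreasing-rearrangement formula, checks that for a convex capacity it coincides with the minimum expectation over the core, and evaluates the independent product of two two-point capacities by direct minimization over product measures with marginals in the cores:

```python
import numpy as np

def choquet(f, nu, omega):
    """Choquet integral of f on finite omega w.r.t. capacity nu.
    Sorts outcomes in decreasing order of f and weights the increments
    by the capacities of the upper level sets."""
    pts = sorted(omega, key=f, reverse=True)
    total, prev, upper = 0.0, 0.0, set()
    for w in pts:
        upper.add(w)
        cur = nu(frozenset(upper))
        total += f(w) * (cur - prev)
        prev = cur
    return total

# A two-point capacity on {0, 1}; convexity requires nu0 + nu1 <= 1.
nu0, nu1 = 0.3, 0.4                          # hypothetical values
table = {frozenset(): 0.0, frozenset({0}): nu0,
         frozenset({1}): nu1, frozenset({0, 1}): 1.0}
def nu_i(A):
    return table[frozenset(A)]

y = {0: 1.0, 1: 2.0}                         # payoffs with y(0) < y(1)
ce = choquet(lambda w: y[w], nu_i, [0, 1])
# For a convex capacity the Choquet integral equals the minimum expectation
# over the core; here the core is the set of p(0) in [nu0, 1 - nu1].
grid = np.linspace(nu0, 1 - nu1, 201)
core_min = min(p * y[0] + (1 - p) * y[1] for p in grid)
print(ce, core_min)                          # both equal (1-nu1)*y(0) + nu1*y(1)

# Independent product (Definition 15.A.1) on {0,1}^2: minimize the product
# measure of an event over marginals drawn from the two cores.
event_prob = min(p * q + (1 - p) * (1 - q)   # event: both draws alike
                 for p in grid for q in grid)
print(event_prob)
```

Because the marginals here are two-point capacities, the grid search could be replaced by checking the finitely many extreme points of each core; the brute-force minimization is kept only for transparency.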
In what follows we verify that Fact 15.A.1 applies to the calculation of the Choquet expected utility of an agent's contingent consumption vector. As in the main text, let $\Omega = S \times \{0,1\}^n$ be the state space, with generic element $\omega = (s, t_1, \ldots, t_n) = (s, t)$. For a given $h$ let $x(\omega) = x_{h,n}^{s,t}$, $h$'s consumption at state $\omega = (s,t)$. Finally, let $u\colon \mathbb{R} \to \mathbb{R}$ denote the strictly increasing utility index. It will be shown that the composite function $u \circ x(\cdot)\colon \Omega \to \mathbb{R}$ is slice comonotonic, and therefore the calculation of $CE\, u(x(\omega))$ may obtain as in Fact 15.A.1. Recall,
$$x(\omega) = x(s,t) = e_h^s + b_h + \tilde{z}_h \left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right),$$
where $\tilde{z}_h$ is the holding of the diversified portfolio consisting of $1/n$ units of each financial asset. We first show that $x(\cdot)$ is slice comonotonic. This is done
by demonstrating, in turn, that $x$ has comonotonic $s$-sections and comonotonic $t_j$-sections.

Fix $t = (t_1, \ldots, t_n)$ and $t' = (t'_1, \ldots, t'_n)$. Assume that $x(s,t) \geq x(s',t)$. Then, as required in Definition 15.A.2 (slice comonotonicity), we want to show that $x(s,t') \geq x(s',t')$. Now,
$$x(s,t) \geq x(s',t) \iff e_h^s + b_h + \tilde{z}_h \left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right) \geq e_h^{s'} + b_h + \tilde{z}_h \left( \sum_{i=1}^n \frac{y^{s'} + y(t_i)}{n} \right)$$
$$\iff e_h^s + b_h + \tilde{z}_h y^s \geq e_h^{s'} + b_h + \tilde{z}_h y^{s'}$$
$$\iff e_h^s + b_h + \tilde{z}_h \left( \sum_{i=1}^n \frac{y^s + y(t'_i)}{n} \right) \geq e_h^{s'} + b_h + \tilde{z}_h \left( \sum_{i=1}^n \frac{y^{s'} + y(t'_i)}{n} \right)$$
$$\iff x(s,t') \geq x(s',t').$$
Hence, $x$ has comonotonic $s$-sections.

Next, fix $(s, t_{-j})$ where $t_{-j} = (t_1, \ldots, t_{j-1}, t_{j+1}, \ldots, t_n)$, and $(s', t'_{-j})$. Now,
$$x(s, t_{-j}, t_j) \geq x(s, t_{-j}, t'_j)$$
$$\iff e_h^s + b_h + \tilde{z}_h \left( \sum_{i \neq j} \frac{y^s + y(t_i)}{n} + \frac{y^s + y(t_j)}{n} \right) \geq e_h^s + b_h + \tilde{z}_h \left( \sum_{i \neq j} \frac{y^s + y(t_i)}{n} + \frac{y^s + y(t'_j)}{n} \right)$$
$$\iff y(t_j) \geq y(t'_j)$$
$$\iff x(s', t'_{-j}, t_j) \geq x(s', t'_{-j}, t'_j).$$
Repeating this, one shows that $x$ has comonotonic $t_j$-sections for all $j = 1, \ldots, n$. Hence, $x$ is slice comonotonic. Now, it is possible to see that slice comonotonicity of $u \circ x(\cdot)\colon \Omega \to \mathbb{R}$ follows readily from the assumption that $u$ is strictly increasing. To this end, notice:
$$x(s,t) \geq x(s',t) \iff u(x(s,t)) \geq u(x(s',t))$$
and
$$x(s, t_{-j}, t_j) \geq x(s, t_{-j}, t'_j) \iff u(x(s, t_{-j}, t_j)) \geq u(x(s, t_{-j}, t'_j)).$$
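The verification just given is mechanical enough to automate. The following sketch (Python; the endowments, holdings and payoffs are hypothetical numbers, not a calibration from the chapter) checks Definition 15.A.2 directly for a consumption array $x(s,t)$ built from the formula above:

```python
import numpy as np
from itertools import product

def sections_comonotonic(f, sizes, i):
    """Check that all x_i-sections of the array f are pairwise comonotonic
    (Definition 15.A.2): (g(a)-g(b))*(h(a)-h(b)) >= 0 for all sections g, h."""
    others = [range(sz) for j, sz in enumerate(sizes) if j != i]
    secs = []
    for rest in product(*others):
        idx = list(rest)
        idx.insert(i, slice(None))      # hold the other coordinates fixed
        secs.append(f[tuple(idx)])
    for g in secs:
        for h in secs:
            dg = g[:, None] - g[None, :]
            dh = h[:, None] - h[None, :]
            if np.any(dg * dh < -1e-12):
                return False
    return True

def slice_comonotonic(f, sizes):
    return all(sections_comonotonic(f, sizes, i) for i in range(len(sizes)))

# Hypothetical two economic states and n = 3 replica assets.
rng = np.random.default_rng(0)
n = 3
e = rng.uniform(1.0, 2.0, size=2)       # endowments e_h^s
b, zbar = 0.1, 0.5                      # bond and diversified-portfolio holdings
ys = np.array([1.0, 1.5])               # economic-state payoffs y^s
y = np.array([0.0, 1.0])                # idiosyncratic payoffs, y(0) < y(1)

x = np.empty((2,) * (n + 1))            # consumption indexed by (s, t_1, ..., t_n)
for idx in product(range(2), repeat=n + 1):
    s, t = idx[0], idx[1:]
    x[idx] = e[s] + b + zbar * sum(ys[s] + y[ti] for ti in t) / n
print(slice_comonotonic(x, x.shape))    # True, confirming the derivation above
```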
Law of large numbers for capacities (Marinacci (1996), Theorem 7.7; Walley and Fine (1982))

Let $y$ be a function from a given (countably) finite space $\Omega$ to the real line $\mathbb{R}$, and $\sigma(y)$ the smallest $\sigma$-algebra that makes $y$ a random variable. $\Omega^n$ denotes the $n$-fold Cartesian product of $\Omega$, and $\sigma(y_1, \ldots, y_n)$ the product $\sigma$-algebra on $\Omega^n$ generated by the $\sigma$-algebras $\{\sigma(y_i)\}_{i=1}^n$. Let each $\nu_i$ be a convex capacity on $\sigma(y_i)$, and let $\{y_i\}_{i \geq 1}$ be a sequence of random variables independent and identically distributed relative to $\bigotimes \nu_i$. Set $S_n = (1/n)\sum_{i=1}^n y_i$. Suppose both $CE_{\nu_1}(y_1)$ and $CE_{\nu_1}(-y_1)$ exist. Then:

1. $\bigotimes \nu_i \left( \left\{ \omega \in \Omega^\infty : CE_{\nu_1}(y_1) \leq \liminf_n S_n(\omega) \leq \limsup_n S_n(\omega) \leq -CE_{\nu_1}(-y_1) \right\} \right) = 1$.
2. $\bigotimes \nu_i \left( \left\{ \omega \in \Omega^\infty : CE_{\nu_1}(y_1) < \liminf_n S_n(\omega) \leq \limsup_n S_n(\omega) < -CE_{\nu_1}(-y_1) \right\} \right) = 0$.
3. $\bigotimes \nu_i \left( \left\{ \omega \in \Omega^\infty : CE_{\nu_1}(y_1) = \liminf_n S_n(\omega) \right\} \right) = 0$.
4. $\bigotimes \nu_i \left( \left\{ \omega \in \Omega^\infty : -CE_{\nu_1}(-y_1) = \limsup_n S_n(\omega) \right\} \right) = 0$.
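A quick Monte Carlo sketch of the band in items 1 and 2, for a two-point capacity with hypothetical values: even if an adversary re-draws each period's marginal anywhere in the core, sample averages still settle inside $[CE_{\nu_1}(y_1), -CE_{\nu_1}(-y_1)]$:

```python
import numpy as np

rng = np.random.default_rng(1)
nu0, nu1 = 0.3, 0.4                  # hypothetical convex two-point capacity
y = np.array([0.0, 1.0])             # payoffs with y(0) < y(1)
ce_low  = (1 - nu1) * y[0] + nu1 * y[1]   # CE(y1): min expectation over the core
ce_high = nu0 * y[0] + (1 - nu0) * y[1]   # -CE(-y1): max expectation over the core

n = 200_000
# The adversary picks a fresh marginal from the core every period:
# Prob(y = y(1)) may be anything in [nu1, 1 - nu0].
p_t = rng.uniform(nu1, 1 - nu0, size=n)
draws = y[(rng.uniform(size=n) < p_t).astype(int)]
print(ce_low, draws.mean(), ce_high)      # the sample mean sits inside the band
```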
Appendix B: Proofs of results in the main text

Proof of the Lemma. Suppose w.l.o.g. $q^{z^i} \geq q^{z^{i'}}$ for some $i, i' \in \{1, \ldots, n\}$. First we show that $z_{h,n}^i \leq z_{h,n}^{i'}$, $\forall h \in \{1, \ldots, H\}$. Indeed, assume $z_{h,n}^i > z_{h,n}^{i'}$ for some $h$ and construct the portfolio $\hat{z}_{h,n}$ as follows:
$$\hat{z}_{h,n}^i = z_{h,n}^i - \varepsilon, \qquad \hat{z}_{h,n}^{i'} = z_{h,n}^{i'} + \frac{q^{z^i}}{q^{z^{i'}}}\,\varepsilon, \qquad \hat{z}_{h,n}^j = z_{h,n}^j \;\; \forall j \neq i, i',$$
where $\varepsilon$ is small enough so that $\hat{z}_{h,n}^i > \hat{z}_{h,n}^{i'}$. Note, $\hat{z}_{h,n}$ is budget feasible. Let
$$\hat{x}_{h,n}^{s,t} \equiv e_h^s + b_{h,n} + \sum_{i=1}^n \hat{z}_{h,n}^i \,(y^s + y(t_i)) \quad \text{for } s = 1, 2.$$
Because $x_{h,n}^{s,t}$ and $\hat{x}_{h,n}^{s,t}$ are comonotonic, and $u_h$ is strictly increasing, it follows from Definition 15.A.1 that there exist additive product measures $\mu \equiv \times_{i=1}^n \mu_i$ and $\hat{\mu} \equiv \times_{i=1}^n \hat{\mu}_i$, with $\mu_i, \hat{\mu}_i\colon 2^{\{0,1\}} \to [0,1]$ additive measures, such that
$$CE_{\pi \otimes \nu}\, x_{h,n}^{s,t} = E_{\pi \times \mu}\, x_{h,n}^{s,t}, \qquad CE_{\pi \otimes \nu}\, \hat{x}_{h,n}^{s,t} = E_{\pi \times \hat{\mu}}\, \hat{x}_{h,n}^{s,t},$$
and
$$CE_{\pi \otimes \nu}\, u_h\!\left(x_{h,n}^{s,t}\right) = E_{\pi \times \mu}\, u_h\!\left(x_{h,n}^{s,t}\right), \qquad CE_{\pi \otimes \nu}\, u_h\!\left(\hat{x}_{h,n}^{s,t}\right) = E_{\pi \times \hat{\mu}}\, u_h\!\left(\hat{x}_{h,n}^{s,t}\right), \quad s = 1, 2,\ \forall t \in \tau^n.$$
Furthermore,
$$E_{\hat{\mu}}\left[\hat{x}_{h,n}^{s,t} \mid s\right] = E_{\mu}\left[x_{h,n}^{s,t} \mid s\right] + E_{\hat{\mu}_{i'}}\,\varepsilon y(t_{i'}) - E_{\mu_{i'}}\,\varepsilon y(t_{i'}), \quad s = 1, 2.$$
Next, notice $E_{\hat{\mu}_{i'}}\,\varepsilon y(t_{i'}) - E_{\mu_{i'}}\,\varepsilon y(t_{i'}) \leq 0$. Indeed,
either $z_{h,n}^{i'}$ and $\hat{z}_{h,n}^{i'}$ have the same sign, in which case $\mu_{i'} = \hat{\mu}_{i'}$ and $E_{\hat{\mu}_{i'}}\,\varepsilon y(t_{i'}) - E_{\mu_{i'}}\,\varepsilon y(t_{i'}) = 0$; or $\hat{z}_{h,n}^{i'} > 0 > z_{h,n}^{i'}$, and then
$$E_{\hat{\mu}_{i'}}\,\varepsilon y(t_{i'}) - E_{\mu_{i'}}\,\varepsilon y(t_{i'}) = \varepsilon\,[1 - \nu_0 - \nu_1]\,[y(0) - y(1)] \leq 0.$$
Hence, $\hat{x}^s$ stochastically dominates $x^s$. Given $u'' < 0$, therefore, $E_{\pi \times \hat{\mu}}\, u_h(\hat{x}_{h,n}^{s,t}) > E_{\pi \times \mu}\, u_h(x_{h,n}^{s,t})$. As a consequence, $CE_{\pi \otimes \nu}\, u_h(\hat{x}_{h,n}^{s,t}) > CE_{\pi \otimes \nu}\, u_h(x_{h,n}^{s,t})$. But this is a contradiction to the hypothesis that $(q_n, (b_n, z_n, x_n))$ is an equilibrium. Therefore $z_{h,n}^i \leq z_{h,n}^{i'}$, $\forall h \in \{1, \ldots, H\}$.

Since $(q_n, (b_n, z_n, x_n))$ is an equilibrium, $\sum_{h=1}^H z_{h,n}^i = \sum_{h=1}^H z_{h,n}^{i'} = 0$. Therefore, using the fact that $z_{h,n}^i \leq z_{h,n}^{i'}$ for all $h$, we get that $z_{h,n}^i = z_{h,n}^{i'}$ for all $h$. $\square$

Proof of the Theorem. The maximization problem $\hat{P}_h^\infty$ given asset prices $\tilde{q}_\infty$ may be written as follows:
$$\max\; E_{\pi \otimes \nu}\, u_h\!\left( e_h^s + b_{h,\infty} + \tilde{z}_{h,\infty} \lim_{n \to \infty} \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right) \quad \text{s.t. } b_{h,\infty} + \tilde{q}_\infty \tilde{z}_{h,\infty} = 0.$$
And the maximization problem $P_h$, solved by the agent in an economy without idiosyncracy, given asset prices $q = \tilde{q}_\infty$:
$$\max\; \sum_{s \in \{1,2\}} \pi(s)\, u_h\!\left(e_h^s + b_h + z_h \hat{y}^s\right) \quad \text{s.t. } b_h + \tilde{q}_\infty z_h = 0.$$
If $n \to \infty$, by the law of large numbers, with probability 1 a unit of the portfolio $\tilde{z}_n$ yields a payoff of $y^s + E_{t \in \{0,1\}}\, y(t) \equiv \hat{y}^s$ units. That is, $\sum_{i=1}^n ((y^s + y(t_i))/n) \xrightarrow{\text{a.s.}} \hat{y}^s$ as $n \to \infty$. Recall, the financial asset $\hat{z}$ yields $\hat{y}^s$ units of the good in the economic states $s = 1, 2$. Hence, $(\hat{b}_\infty, \hat{z}_\infty)$ solves the maximization problem $\hat{P}_h^\infty$ at prices $\hat{q}_\infty$ if and only if $(\hat{b}_\infty, \hat{z}_\infty)$ also solves the maximization problem $P_h$ at prices $\hat{q}_\infty$.

Finally note, if $(\hat{q}_\infty, (\hat{b}_\infty, \hat{z}_\infty, \hat{x}_\infty))$ describes an equilibrium of the $n$-financial-assets economy with idiosyncracy, it must be that $(\hat{b}_\infty, \hat{z}_\infty)$ satisfies the conditions of (asset) market clearing at the price vector $\hat{q}_\infty$. Hence, $\hat{q}_\infty$ will also clear asset markets in the economy without idiosyncracy. Conversely, if $(\hat{q}_\infty, (\hat{b}_\infty, \hat{z}_\infty, \hat{x}_\infty))$ describes an equilibrium of the economy without idiosyncracy, then $\hat{q}_\infty$ will also clear asset markets in the $n$-financial-assets economy with idiosyncracy. $\square$

Proof of the Main Theorem. Consider $\hat{P}_h^n$, the maximization problem in the $n$-financial-asset economy with idiosyncracy. Suppose that, at equilibrium, there exists $h'$ such that $\tilde{z}_{h',n} \neq 0$; say $\tilde{z}_{h',n} > 0$. Then, there must be $h''$ such that
$\tilde{z}_{h'',n} < 0$. Next, since $\tilde{z}_{h',n} > 0$ and $y(0) < y(1)$, Fact 15.A.1, together with the fact that $u_h(x_{h,n}^\omega)$ is slice-comonotonic (see Appendix A), implies that $CE_{\pi \otimes \nu}\, u_{h'}(x_{h',n}^{s,t})$ is a standard expectation with respect to the additive measure $\pi \times \mu(t)$, where $\mu(t) = (1 - \nu_1)^{n_0} \times (\nu_1)^{n - n_0}$, $n_0$ being the number of financial assets whose idiosyncratic payoff is $y(0)$ at state $(s,t)$. This is because $x_{h',n}^{s,(t_i, t_{-i})}$ is necessarily smaller at a state $(s, (0, t_{-i}))$ than at the state $(s, (1, t_{-i}))$, $s = 1, 2$. The first-order conditions of the problem $\hat{P}_{h'}^n$ (for agent $h'$) then give:
$$\tilde{q}_n = \frac{E_{\pi \times \mu}\left[ \left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right) u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right]}{E_{\pi \times \mu}\left[ u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right]}.$$
Notice, for $s = 1, 2$, $x_{h',n}^{s,t}$ and $\sum_{i=1}^n ((y^s + y(t_i))/n)$ are positively dependent given $s$ (see Magill and Quinzii (1996)) since $\tilde{z}_{h',n} > 0$. Hence, because $u''(\cdot) < 0$,
$$\mathrm{Cov}\left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n},\; u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right) < 0, \quad \text{given } s.$$
Now,
$$E_{\mu}\left[ \left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right) u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right] = \mathrm{Cov}\left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n},\; u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right) + E_{\mu}\left[ \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right] E_{\mu}\left[ u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right].$$
Thus,
$$E_{\mu}\left[ \left( \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right) u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right] < E_{\mu}\left[ \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right] E_{\mu}\left[ u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right].$$
Hence,
$$\tilde{q}_n < \frac{\sum_{s=1}^2 \pi(s)\, E_{\mu}\left[ \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right] E_{\mu}\left[ u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right]}{\sum_{s=1}^2 \pi(s)\, E_{\mu}\left[ u'_{h'}\!\left(x_{h',n}^{s,t}\right) \right]}$$
$$\Rightarrow\; \tilde{q}_n < \max_s \left\{ E_{\mu}\left[ \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \right] \right\}$$
$$\Rightarrow\; \tilde{q}_n < y^{\bar{s}} + (1 - \nu_1)\,y(0) + \nu_1\, y(1), \qquad (15.\text{A}.1)$$
where $\bar{s}$ denotes the economic state with the higher payoff $y^s$. Consider next $h''$ such that $\tilde{z}_{h'',n} < 0$. By a reasoning similar to that followed for the agent $h'$ (noticing that $x_{h'',n}^{s,t}$ and $\sum_{i=1}^n (y^s + y(t_i))/n$ are negatively dependent given $s$) one gets
$$\tilde{q}_n > y^{\underline{s}} + (1 - \nu_0)\,y(1) + \nu_0\, y(0), \qquad (15.\text{A}.2)$$
where $\underline{s}$ denotes the state with the lower payoff. Therefore, a necessary condition for having an equilibrium with $\tilde{z}_{h,n} \neq 0$ for at least some $h$ is that $y^{\bar{s}} - y^{\underline{s}} > (1 - \nu_0 - \nu_1)(y(1) - y(0))$. Set $\bar{A} = (y^{\bar{s}} - y^{\underline{s}})/(y(1) - y(0)) \in (0, 1)$. If $1 - \nu_0 - \nu_1 > \bar{A}$, then $\tilde{z}_{h,n} = 0$ for all $h$, at any equilibrium.

Finally, note that if $n \to \infty$, $CE_{\pi \otimes \nu}\, u_{h'}(x_{h',n}^{s,t})$ is just a standard expectation operator with respect to the additive measure $\pi \times \mu(t)$, where $\mu(t)$ is such that
$$\mu\left( \left\{ t : \lim_{n \to \infty} \sum_{i=1}^n \frac{y^s + y(t_i)}{n} = y^s + (1 - \nu_1)\,y(0) + \nu_1\, y(1) \right\} \right) = 1.$$
The proof then proceeds as in the case of finite $n$, except that the inequality (15.A.1) reads
$$\lim_{n \to \infty} \tilde{q}_n \leq y^{\bar{s}} + (1 - \nu_1)\,y(0) + \nu_1\, y(1)$$
and the inequality (15.A.2) reads
$$\lim_{n \to \infty} \tilde{q}_n \geq y^{\underline{s}} + (1 - \nu_0)\,y(1) + \nu_0\, y(0). \qquad \square$$
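The no-trade condition just derived is easy to explore numerically. In the sketch below, with hypothetical parameter values, the highest price any prospective buyer would pay, bound (15.A.1), falls short of the lowest price any prospective seller would accept, bound (15.A.2), precisely because $1 - \nu_0 - \nu_1 > \bar{A}$:

```python
import numpy as np

# Hypothetical calibration of the two-state, replica-asset economy.
y_s = np.array([1.0, 1.3])       # economic-state payoffs y^s, s = 1, 2
y0, y1 = 0.0, 0.5                # idiosyncratic payoffs, y(0) < y(1)
nu0, nu1 = 0.15, 0.20            # two-point capacity; ambiguity = 1 - nu0 - nu1

buy_bound  = y_s.max() + (1 - nu1) * y0 + nu1 * y1   # (15.A.1): q_n < this
sell_bound = y_s.min() + (1 - nu0) * y1 + nu0 * y0   # (15.A.2): q_n > this
A_bar = (y_s.max() - y_s.min()) / (y1 - y0)

print(buy_bound, sell_bound)     # 1.40 < 1.425: no price satisfies both bounds
print("trade possible:", 1 - nu0 - nu1 <= A_bar)     # False here: markets seize up
```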
Acknowledgments

We thank the referees and the Managing Editor, M. Armstrong, as well as A. Bisin, S. Bose, P. Ghirardato, I. Gilboa, R. Guesnerie, B. Lipman, J. Malcomson, J. Marin, M. Marinacci, M. Piccione, Z. Safra, H. S. Shin and J. C. Vergnaud for helpful comments. The chapter has also benefited from the responses of seminar members at the University of British Columbia, University of Essex, University of Evry,
Johns Hopkins, Nuffield College, Tilburg University, CORE-Louvain-la-Neuve, NYU, UPenn, University of Paris I, University of Toulouse, Université du Maine, ENPC-Paris, University of Venezia and the ESRC Economic Theory Conference at Kenilworth. The first author gratefully acknowledges financial assistance from an Economic and Social Research Council of U.K. Research Fellowship (# R000 27 1065).
Notes
1 Recent literature has debated the merits of the CEU framework as a model of ambiguity aversion. For instance, Epstein (1999) contends that CEU preferences associated with convex capacities (see Section 15.2) do not always conform with a "natural" notion of ambiguity-averse behavior. On the other hand, Ghirardato and Marinacci (1997) argue that ambiguity aversion is demonstrated in the CEU model by a broad class of capacities which includes convex capacities.
2 And, indeed, formal empirical investigations overwhelmingly confirm that the data on individual consumption are more consistent with incomplete than complete markets. Among others, see Zeldes (1989), Carroll (1992), Deaton and Paxson (1994) and Hayashi et al. (1996). The evidence, however, is not unanimous; see, for example, Mace (1991).
3 We will say an asset's payoff has an idiosyncratic component if at least some component of the payoff is independent of (1) the realized endowments of agents and (2) the payoff of any other asset.
4 Fishburn (1993) provides an axiomatic justification of this definition of ambiguity and Mukerji (1997) demonstrates its equivalence to a more primitive and epistemic notion of ambiguity (expressed in terms of the DM's knowledge of the state space).
5 The Choquet expectation operator may be directly defined with respect to a non-additive probability; see Schmeidler (1989). Also, for an intuitive introduction to the CEU model see Section 2 in Mukerji (1998).
6 For instance, suppose a firm introduces a new product line, an innovation, into the market. In such a case, typically, it is not just the shocks commonly affecting firms in the same trade that will affect the sales of the new product but also more (brand-)specific elements, for example, whether (or not) the innovation has a "special" appeal for the consumers. Another example of idiosyncratic shocks are shocks to firms' internal organizational capabilities.
7 In this context it is worth noting that, reportedly, almost 70 percent of corporate borrowing in the US is through bonds. Default rates on bonds are also significant. The Financial Times, October 13, 1998, in its report headlined "US corporate bond market hit," notes, "the rate of default on US high-yield bonds was running at 10% in the early 1990s … today the default rate is hovering around 3% but creeping higher."
8 Werner (1997) considers a finance economy of which this is just a special case. There are standard arguments that ensure the existence of equilibria of such economies (op. cit., p. 100).
9 This has to be qualified since there exist some nongeneric constraints among endowments in different states, namely $e_h^{s,t} = e_h^{s,t'} \equiv e_h^s$.
10 Laws of large numbers for ambiguous beliefs have been studied by, among others, Walley and Fine (1982) and Marinacci (1996, 1999). Appendix A contains a formal statement of the version we apply. This version was, essentially, originally proved in Walley and Fine (1982). The statement given here is from Marinacci (1996), Theorem 7.7. However, the result is a direct implication of the more general Theorem 15 in Marinacci (1999).
11 Section 3.4 of Epstein and Wang (1994) presents an example of an economy with heterogeneous agents. But in this model, markets are assumed to be complete, and hence risk-sharing continues to be efficient (Pareto optimal), as is explicitly observed by the authors.
References

Bewley, T. (1986). "Knightian Decision Theory: Part I," Discussion Paper 807, Cowles Foundation.
Carroll, C. (1992). "The Buffer Stock Theory of Saving: Some Macroeconomic Evidence," Brookings Papers on Economic Activity, 2, 61–135.
Chateauneuf, A., R. Dana, and J.-M. Tallon (2000). "Optimal Risk Sharing Rules and Equilibria with Choquet-Expected-Utility," Journal of Mathematical Economics, 34, 191–214.
Deaton, A. and C. Paxson (1994). "Intertemporal Choice and Inequality," Journal of Political Economy, 102(3), 437–467.
Dow, J. and S. Werlang (1992). "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, 60(1), 197–204. (Reprinted as Chapter 17 in this volume.)
Epstein, L. (1999). "A Definition of Uncertainty Aversion," Review of Economic Studies, 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Epstein, L. and T. Wang (1994). "Intertemporal Asset Pricing under Knightian Uncertainty," Econometrica, 62(3), 283–322. (Reprinted as Chapter 18 in this volume.)
Fishburn, P. (1993). "The Axioms and Algebra of Ambiguity," Theory and Decision, 34, 119–137.
Ghirardato, P. (1997). "On Independence for Non-Additive Measures, with a Fubini Theorem," Journal of Economic Theory, 73, 261–291.
Ghirardato, P. and M. Marinacci (1997). "Ambiguity Aversion Made Precise: A Comparative Foundation and Some Implications," Social Science Working Paper 1026, CalTech.
Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-Unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Hayashi, F., J. Altonji, and L. Kotlikoff (1996). "Risk Sharing Between and Within Families," Econometrica, 64(2), 261–294.
Hendon, E., H. Jacobsen, B. Sloth, and T. Tranaes (1996). "The Product of Capacities and Belief Functions," Mathematical Social Sciences, 32(2), 95–108.
Kelsey, D. and F. Milne (1995). "The Arbitrage Pricing Theorem with Non-Expected Utility Preferences," Journal of Economic Theory, 65(2), 557–574.
Lucas, R. (1978). "Asset Prices in an Exchange Economy," Econometrica, 46, 1429–1445.
Mace, B. (1991). "Full Insurance in the Presence of Aggregate Uncertainty," Journal of Political Economy, 99(5), 928–956.
Magill, M. and M. Quinzii (1996). Theory of Incomplete Markets, Vol. 1, MIT Press.
Marinacci, M. (1996). "Limit Laws for Non-Additive Probabilities, and their Frequentist Interpretation," mimeo.
Marinacci, M. (1999). "Limit Laws for Non-Additive Probabilities and their Frequentist Interpretation," Journal of Economic Theory, 84, 145–195.
Morris, S. (1997). "Risk, Uncertainty and Hidden Information," Theory and Decision, 42(3), 235–269.
Mukerji, S. (1997). "Understanding the Nonadditive Probability Decision Model," Economic Theory, 9(1), 23–46.
Mukerji, S. (1998). "Ambiguity Aversion and Incompleteness of Contractual Form," American Economic Review, 88(5), 1207–1231. (Reprinted as Chapter 14 in this volume.)
Mukerji, S. and J.-M. Tallon (1999). "Ambiguity Aversion and Incompleteness of Financial Markets: Extended Version," Mimeo 99-28, Cahiers de la Maison des Sciences Economiques, Université Paris I, available for download at http://eurequa.univparis1.fr/membros/tallon/tallon.htm
Schmeidler, D. (1989). "Subjective Probability and Expected Utility Without Additivity," Econometrica, 57(3), 571–587. (Reprinted as Chapter 5 in this volume.)
Walley, P. and T. L. Fine (1982). "Towards a Frequentist Theory of Upper and Lower Probability," Annals of Statistics, 10, 741–761.
Werner, J. (1997). "Diversification and Equilibrium in Securities Markets," Journal of Economic Theory, 75, 89–103.
Zeldes, S. (1989). "Consumption and Liquidity Constraints: An Empirical Investigation," Journal of Political Economy, 97, 305–346.
16 A quartet of semigroups for model specification, robustness, prices of risk, and model detection

Evan W. Anderson, Lars Peter Hansen, and Thomas J. Sargent
Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent (2003), "A quartet of semigroups for model specification, robustness, prices of risk, and model detection," Journal of the European Economic Association, 1(1): 68–123.

16.1. Introduction

16.1.1. Rational expectations and model misspecification

A rational expectations econometrician or calibrator typically attributes no concern about specification error to agents even as he shuttles among alternative specifications.1 Decision makers inside a rational expectations model know the model.2 Their confidence contrasts with the attitudes of both econometricians and calibrators. Econometricians routinely use likelihood-based specification tests (information criteria or IC) to organize comparisons between models and empirical distributions. Less formally, calibrators sometimes justify their estimation procedures by saying that they regard their models as incorrect and unreliable guides to parameter selection if taken literally as likelihood functions. But the agents inside a calibrator's model do not share the model-builder's doubts about specification. By equating agents' subjective probability distributions to the objective one implied by the model, the assumption of rational expectations precludes any concerns that agents should have about the model's specification. The empirical power of the rational expectations hypothesis comes from having decision makers' beliefs be outcomes, not inputs, of the model-building enterprise. A standard argument that justifies equating objective and subjective probability distributions is that agents would eventually detect any difference between them, and would adjust their subjective distributions accordingly. This argument implicitly gives agents an infinite history of observations, a point that is formalized by the literature on convergence of myopic learning algorithms to rational expectations equilibria of games and dynamic economies.3 Specification tests leave applied econometricians in doubt because they have too few observations to discriminate among alternative models. Econometricians with finite data sets thus face a model detection problem that builders of rational
expectations models let agents sidestep by endowing them with infinite histories of observations "before time zero." This chapter is about models with agents whose databases are finite, like econometricians and calibrators. Their limited data leave agents with model specification doubts that are quantitatively similar to those of econometricians and that make them value decision rules that perform well across a set of models. In particular, agents fear misspecifications of the state transition law that are sufficiently small that they are difficult to detect because they are obscured by random shocks that impinge on the dynamical system. Agents adjust decision rules to protect themselves against modeling errors, a precaution that puts model uncertainty premia into equilibrium security market prices. Because we work with Markov models, we can avail ourselves of a powerful tool called a semigroup.

16.1.2. Iterated laws and semigroups

The law of iterated expectations imposes consistency requirements that cause a collection of conditional expectations operators associated with a Markov process to form a mathematical object called a semigroup. The operators are indexed by the time that elapses between when the forecast is made and when the random variable being forecast is realized. This semigroup and its associated generator characterize the Markov process. Because we consider forecasting random variables that are functions of a Markov state, the current forecast depends only on the current value of the Markov state.4

The law of iterated values embodies analogous consistency requirements for a collection of economic values assigned to claims to payoffs that are functions of future values of a Markov state. The family of valuation operators indexed by the time that elapses between when the claims are valued and when their payoffs are realized forms another semigroup. Just as a Markov process is characterized by its semigroup, so prices of payoffs that are functions of a Markov state can be characterized by a semigroup. Hansen and Scheinkman (2002) exploited this insight. Here we extend their insight to other semigroups. In particular, we describe four semigroups: (1) one that describes a Markov process; (2) another that adjusts continuation values in a way that rewards decision rules that are robust to misspecification of the approximating model; (3) another that models the equilibrium pricing of securities with payoff dates in the future; and (4) another that governs statistics for discriminating between alternative Markov processes using a finite time series data record.5 We show the close connections that bind these four semigroups.
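For a finite-state Markov chain the semigroup structure is transparent: the date-$t$ conditional expectation operator is the $t$-th power of the one-step transition matrix, and the law of iterated expectations is exactly the semigroup property $P^{t+s} = P^t P^s$. A minimal sketch, with a hypothetical transition matrix chosen only for illustration:

```python
import numpy as np

# The date-t conditional expectation operator of a finite-state Markov chain
# is P^t, and the law of iterated expectations is the semigroup property
# P^(t+s) = P^t P^s.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # hypothetical one-step transition matrix
f = np.array([1.0, -1.0])             # a payoff that is a function of the state

t, s = 3, 5
lhs = np.linalg.matrix_power(P, t + s) @ f
rhs = np.linalg.matrix_power(P, t) @ (np.linalg.matrix_power(P, s) @ f)
print(np.allclose(lhs, rhs))          # True: forecasts of forecasts are forecasts
```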
16.1.3. Model detection errors and market prices of risk

In earlier work (Hansen, Sargent, and Tallarini (1999), henceforth denoted HST, and Hansen, Sargent, and Wang (2002), henceforth denoted HSW), we studied various discrete time asset pricing models in which decision makers' fear of model misspecification put model uncertainty premia into market prices of risk, thereby potentially helping to account for the equity premium. Transcending the detailed dynamics of our examples was a tight relationship between the market price of risk and the probability of distinguishing the representative decision maker's approximating model from a worst-case model that emerges as a byproduct of his cautious decision making procedure. Although we had offered only a heuristic explanation for that relationship, we nevertheless exploited it to help us calibrate the set of alternative models that the decision maker should plausibly seek robustness against. In the context of continuous time Markov models, this chapter analytically establishes a precise link between the uncertainty component of risk prices and a bound on the probability of distinguishing the decision maker's approximating and worst-case models. We also develop new ways of representing decision makers' concerns about model misspecification and their equilibrium consequences.
16.1.4. Related literature

In the context of a discrete-time, linear-quadratic permanent income model, HST considered model misspecifications measured by a single robustness parameter. HST showed how robust decision making promotes behavior like that induced by risk aversion. They interpreted a preference for robustness as a decision maker's response to Knightian uncertainty and calculated how much concern about robustness would be required to put market prices of risk into empirically realistic regions. Our fourth semigroup, which describes model detection errors, provides a statistical method for judging whether the required concern about robustness is plausible. HST and HSW allowed the robust decision maker to consider only a limited array of specification errors, namely, shifts in the conditional mean of shocks that are i.i.d. and normally distributed under an approximating model. In this chapter, we consider more general approximating models and motivate the form of potential specification errors by using specification test statistics. We show that HST's perturbations to the approximating model emerge in the linear-quadratic, Gaussian control problem as well as in a more general class of control problems in which the stochastic evolution of the state is a Markov diffusion process. However, we also show that misspecifications different from HST's must be entertained when the approximating model includes Markov jump components. As in HST, our formulation of robustness allows us to reinterpret one of Epstein and Zin's (1989) recursions as reflecting a preference for robustness rather than aversion to risk. As we explain in Hansen, Sargent, Turmuhambetova, and Williams (henceforth HSTW) (2002), the robust control theory described in Section 16.5 is closely connected to the minmax expected utility or multiple priors model of Gilboa and Schmeidler (1989). A main theme of this chapter is to advocate a workable strategy for actually specifying those multiple priors in applied work. Our strategy is to use detection error probabilities to surround the single model that is typically specified in applied work with a set of empirically plausible but vaguely specified alternatives.
16.1.5. Robustness versus learning

A convenient feature of rational expectations models is that the model builder imputes a unique and explicit model to the decision maker. Our analysis shares this analytical convenience. While an agent distrusts his model, he still uses it to guide his decisions.6 But the agent uses his model in a way that recognizes that it is an approximation. To quantify approximation, we measure discrepancy between the approximating model and other models with relative entropy, an expected log likelihood ratio, where the expectation is taken with respect to the distribution from the alternative model. Relative entropy is used in the theory of large deviations, a powerful mathematical theory about the rate at which uncertainty about unknown distributions is resolved as the number of observations grows.7 An advantage of using entropy to restrain model perturbations is that we can appeal to the theory of statistical detection to provide information about how much concern about robustness is quantitatively reasonable.

Our decision maker confronts alternative models that can be discriminated among only with substantial amounts of data, so much data that, because he discounts the future, the robust decision maker simply accepts model misspecification as a permanent situation. He designs robust controls, and does not use data to improve his model specification over time. He adopts this stance because, relative to his discount factor, it would take too much time for enough data to accrue for him to dispose of the alternative models that concern him. In contrast, many formulations of learning have decision makers fully embrace an approximating model when making their choices.8 Despite their different orientations, learners and robust decision makers both need a convenient way to measure the proximity of two probability distributions. This fact builds technical bridges between robust decision theory and learning theory. The same expressions from large deviation theory that govern bounds on rates of learning also provide bounds on value functions across alternative possible models in robust decision theory.9 More importantly here, we shall show that the tight relationship between detection error probabilities and the market price of risk that was encountered by HST and HSW can be explained by formally studying the rate at which detection errors decrease as sample size grows.
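As a concrete illustration of relative entropy as an expected log likelihood ratio, consider two unit-variance Gaussian transition densities that differ only by a hypothetical mean perturbation $g$ (echoing the drift distortions introduced below); the relative entropy is $g^2/2$. A minimal sketch under these assumptions:

```python
import numpy as np

# Relative entropy between N(g, 1) (the alternative) and N(0, 1) (the
# approximating model): the expected log likelihood ratio computed under
# the alternative, which equals g^2 / 2 in closed form.
rng = np.random.default_rng(4)
g = 0.1                                            # assumed mean perturbation
x = rng.normal(loc=g, scale=1.0, size=1_000_000)   # draws from the alternative
log_lr = 0.5 * x**2 - 0.5 * (x - g)**2             # log(alt density / approx density)
print(log_lr.mean(), g**2 / 2)                     # Monte Carlo vs closed form
```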
16.1.6. Reader's guide

A reader interested only in our main results can read Section 16.2, then jump to the empirical applications in Section 16.9.
16.2. Overview

This section briefly tells how our main results apply in the special case in which the approximating model is a diffusion. Later sections provide technical details and show how things change when we allow jump components.
A representative agent's model asserts that the state of an economy $x_t$ in a state space $D$ follows a diffusion10
$$dx_t = \mu(x_t)\,dt + \sigma(x_t)\,dB_t, \qquad (16.1)$$
where $B_t$ is a Brownian vector. The agent wants decision rules that work well not just when (16.1) is true but also when the data conform to models that are statistically difficult to distinguish from (16.1). A robust control problem to be studied in Section 16.5 leads to such a robust decision rule together with a value function $V(x_t)$ and a process $\gamma(x_t)$ for the marginal utility of consumption of a representative agent.

As a byproduct of the robust control problem, the decision maker computes a worst-case diffusion that takes the form
$$dx_t = \left[\mu(x_t) + \sigma(x_t)\hat{g}(x_t)\right]dt + \sigma(x_t)\,dB_t, \qquad (16.2)$$
where $\hat{g} = -(1/\theta)\,\sigma(x)'(\partial V/\partial x)$ and $\theta > 0$ is a parameter measuring the size of potential model misspecifications. Notice that (16.2) modifies the drift but not the volatility relative to (16.1). The formula for $\hat{g}$ tells us that large values of $\theta$ are associated with $\hat{g}_t$'s that are small in absolute value, making model (16.2) difficult to distinguish statistically from model (16.1). The diffusion (16.6) lets us quantify just how difficult this statistical detection problem is.
Without a preference for robustness to model misspecification, the usual approach to asset pricing is to compute the expected discounted value of payoffs with respect to the "risk-neutral" probability measure that is associated with the following twisted version of the physical measure (diffusion (16.1)):
$$dx_t = \left[\mu(x_t) + \sigma(x_t)\bar{g}(x_t)\right]dt + \sigma(x_t)\,dB_t. \qquad (16.3)$$
In using the risk-neutral measure to price assets, future expected returns are discounted at the risk-free rate $\rho(x_t)$, obtained as follows. The marginal utility of the representative household $\gamma(x_t)$ conforms to $d\gamma_t = \mu_\gamma(x_t)\,dt + \sigma_\gamma(x_t)\,dB_t$. Then the risk-free rate is $\rho(x_t) = \delta - (\mu_\gamma(x_t)/\gamma(x_t))$, where $\delta$ is the instantaneous rate at which the household discounts future utilities; the risk-free rate thus equals the negative of the expected growth rate of the representative household's discounted marginal utility. The price of a payoff $\phi(x_N)$ contingent on a Markov state in period $N$ is then
$$\bar{E}\left[ \exp\left( -\int_0^N \rho(x_u)\,du \right) \phi(x_N) \,\Big|\, x_0 = x \right], \qquad (16.4)$$
where $\bar{E}$ is the expectation evaluated with respect to the distribution generated by (16.3). This formula gives rise to a pricing operator for every horizon $N$. Relative to the approximating model, the diffusion (16.3) for the risk-neutral measure distorts the drift in the Brownian motion by adding the term $\sigma(x_t)\bar{g}(x_t)$, where $\bar{g} = \sigma(x)'(\partial \log \gamma(x)/\partial x)$. Here $\bar{g}$ is a vector of "factor risk prices" or "market prices of risk." The equity premium puzzle is the finding that with plausible quantitative specifications for the marginal utility $\gamma(x)$, factor risk prices $\bar{g}$ are too small relative to their empirically estimated counterparts.
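A Monte Carlo sketch of the pricing formula (16.4): simulate the risk-neutral diffusion (16.3) and average payoffs discounted at the state-dependent rate $\rho$. All functional forms below are hypothetical placeholders, not a calibration from the chapter:

```python
import numpy as np

# Monte Carlo evaluation of (16.4) under the risk-neutral diffusion (16.3).
rng = np.random.default_rng(3)
mu    = lambda x: -0.5 * x
sigma = lambda x: 0.5
g_bar = lambda x: -0.3                    # market price of risk (assumed)
rho   = lambda x: 0.02 + 0.01 * x**2      # state-dependent short rate (assumed)
phi   = lambda x: np.maximum(x, 0.0)      # payoff at the horizon (assumed)

dt, N, paths = 0.01, 1.0, 20_000
steps = int(N / dt)
x = np.zeros(paths)                       # all paths start at x_0 = 0
disc = np.zeros(paths)                    # accumulates the integral of rho
for _ in range(steps):
    disc += rho(x) * dt
    dB = np.sqrt(dt) * rng.standard_normal(paths)
    x += (mu(x) + sigma(x) * g_bar(x)) * dt + sigma(x) * dB
price = np.mean(np.exp(-disc) * phi(x))
print(price)
```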
A quartet of semigroups for model specification
369
In Section 16.7, we show that when the planner and a representative consumer want robustness, the diffusion associated with the risk-neutral measure appropriate for pricing becomes
$$dx_t = \left(\mu(x_t) + \sigma(x_t)\left[\bar{g}(x_t) + \hat{g}(x_t)\right]\right)dt + \sigma(x_t)\,dB_t, \qquad (16.5)$$
where $\hat{g}$ is the same process that appears in (16.2). With robustness sought over a set of alternative models that is indexed by $\theta$, factor risk prices become augmented to $\bar{g} + \hat{g}$. The representative agent's concerns about model misspecification contribute the $\hat{g}$ component of the factor risk prices.

To evaluate the quantitative potential for attributing parts of the market prices of risk to agents' concerns about model misspecification, we need to calibrate $\theta$ and therefore $|\hat{g}|$. To calibrate $\theta$ and $\hat{g}$, we turn to a closely related fourth diffusion that governs the probability distribution of errors from using likelihood ratio tests to detect which of two models generated a continuous record of length $N$ of observations on $x_t$. Here the key idea is that we can represent the average error in using a likelihood ratio test to detect the difference between the two models (16.1) and (16.2) from a continuous record of data of length $N$ as $0.5E\left(\min\{\exp(\ell_N), 1\} \mid x_0 = x\right)$, where $E$ is evaluated with respect to model (16.1) and $\ell_N$ is the log likelihood ratio of model (16.2) with respect to model (16.1). For each $\alpha \in (0,1)$, we can use the inequality
$$E\left(\min\{\exp(\ell_N), 1\} \mid x_0 = x\right) \leq E\left(\exp(\alpha \ell_N) \mid x_0 = x\right)$$
to attain a bound on the detection error probability. For each $\alpha$, we show that the bound can be calculated by forming a new diffusion that uses (16.1) and (16.2) as ingredients, and in which the drift distortion $\hat{g}$ from (16.2) plays a key role. In particular, for $\alpha \in (0,1)$, define
$$dx_t^\alpha = \left[\mu(x_t) + \alpha\,\sigma(x_t)\hat{g}(x_t)\right]dt + \sigma(x_t)\,dB_t \qquad (16.6)$$
and define the local rate function $\rho^\alpha(x) = ((1-\alpha)\alpha/2)\,\hat{g}(x)'\hat{g}(x)$. Then the bound on the average error in using a likelihood ratio test to discriminate between the approximating model (16.1) and the worst-case model (16.2) from a continuous data record of length $N$ is
$$\text{av error} \leq 0.5\,E^\alpha\left[ \exp\left( -\int_0^N \rho^\alpha(x_t)\,dt \right) \Big|\, x_0 = x \right], \qquad (16.7)$$
where $E^\alpha$ is the mathematical expectation evaluated with respect to the diffusion (16.6). The error rate $\rho^\alpha(x)$ is maximized by setting $\alpha = 0.5$. Notice that the right side of (16.7) is one half the price of a pure discount bond that pays off one unit of consumption for sure $N$ periods in the future, treating $\rho^\alpha$ as the risk-free rate and the measure induced by (16.6) as the risk-neutral probability measure.
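When $\hat{g}$ is constant, the rate $\rho^\alpha$ is constant too, so the bound (16.7) collapses to $0.5\exp(-\alpha(1-\alpha)|\hat{g}|^2 N/2)$, tightest at $\alpha = 0.5$; for state-dependent $\hat{g}$ one would instead simulate (16.6), as in the pricing sketch above. A small sketch with a hypothetical $|\hat{g}|$ and sample span $N$:

```python
import numpy as np

# Detection-error bound (16.7) for a constant drift distortion g_hat:
# bound(alpha) = 0.5 * exp(-alpha * (1 - alpha) * g_hat^2 * N / 2),
# minimized over alpha at alpha = 0.5.
g_hat, N = 0.25, 100.0                     # hypothetical distortion and span
alphas = np.linspace(0.01, 0.99, 99)
bounds = 0.5 * np.exp(-alphas * (1 - alphas) * g_hat**2 * N / 2)
best = alphas[np.argmin(bounds)]
print(best, bounds.min())                  # ~0.5 and 0.5*exp(-g_hat^2 * N / 8)
```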
It is remarkable that the three diffusions (16.2), (16.5), and (16.6), which describe the worst-case model, asset pricing under a preference for robustness, and the local behavior of a bound on model detection errors, respectively, are all obtained by perturbing the drift in the approximating model (16.1) with functions of the same drift distortion $\hat{g}(x)$ that emerges from the robust control problem. To the extent that the bound on detection probabilities is informative about the detection probabilities themselves, our theoretical results thus neatly explain the pattern that was observed in the empirical applications of HST and HSW, namely, that there is a tight link between calculated detection error probabilities and the market price of risk. That link transcends all details of the model specification.11 In Section 16.9, we shall encounter this tight link again when we calibrate the contribution to market prices of risk that can plausibly be attributed to a preference for robustness in the context of three continuous time asset pricing models.

Subsequent sections of this chapter substantiate these and other results in a more general Markov setting that permits $x$ to have jump components, so that jump distortions also appear in the Markov processes for the worst-case model, asset pricing, and model detection error. We shall exploit and extend the asset-pricing structure of