Uncertainty in Economic Theory
Recent decades have witnessed developments in decision theory that propose an alternative to the accepted Bayesian view, according to which all uncertainty can be quantified by probability measures. The Bayesian view has been criticized on empirical as well as conceptual grounds. David Schmeidler has offered an alternative way of thinking about decision under uncertainty, which has become popular in recent years. This book provides a review of, and an introduction to, this new decision theory under uncertainty. The first part focuses on theory: axiomatizations, the definitions of uncertainty aversion, updating, independence, and so forth. The second part deals with applications to economic theory, game theory, and finance. This is the first collection of chapters on this topic, and it can thus serve both as an introduction for researchers who are new to the field and as a graduate course textbook. With this goal in mind, the book contains survey introductions, aimed at graduate students, that explain the main ideas and put them in perspective. Itzhak Gilboa is Professor at the Eitan Berglas School of Economics, Tel-Aviv University, Israel. He is also a Fellow of the Cowles Foundation for Research in Economics at Yale University, USA.
Routledge frontiers of political economy
1 Equilibrium Versus Understanding: Towards the rehumanization of economics within social theory (Mark Addleson)
2 Evolution, Order and Complexity (edited by Elias L. Khalil and Kenneth E. Boulding)
3 Interactions in Political Economy: Malvern after ten years (edited by Steven Pressman)
4 The End of Economics (Michael Perelman)
5 Probability in Economics (Omar F. Hamouda and Robin Rowley)
6 Capital Controversy, Post-Keynesian Economics and the History of Economics: Essays in honour of Geoff Harcourt, volume one (edited by Philip Arestis, Gabriel Palma and Malcolm Sawyer)
7 Markets, Unemployment and Economic Policy: Essays in honour of Geoff Harcourt, volume two (edited by Philip Arestis, Gabriel Palma and Malcolm Sawyer)
8 Social Economy: The logic of capitalist development (Clark Everling)
9 New Keynesian Economics/Post-Keynesian Alternatives (edited by Roy J. Rotheim)
10 The Representative Agent in Macroeconomics (James E. Hartley)
11 Borderlands of Economics: Essays in honour of Daniel R. Fusfeld (edited by Nahid Aslanbeigui and Young Back Choi)
12 Value, Distribution and Capital: Essays in honour of Pierangelo Garegnani (edited by Gary Mongiovi and Fabio Petri)
13 The Economics of Science: Methodology and epistemology as if economics really mattered (James R. Wible)
14 Competitiveness, Localised Learning and Regional Development: Specialisation and prosperity in small open economies (Peter Maskell, Heikki Eskelinen, Ingjaldur Hannibalsson, Anders Malmberg and Eirik Vatne)
15 Labour Market Theory: A constructive reassessment (Ben J. Fine)
16 Women and European Employment (Jill Rubery, Mark Smith, Colette Fagan, Damian Grimshaw)
17 Explorations in Economic Methodology: From Lakatos to empirical philosophy of science (Roger Backhouse)
18 Subjectivity in Political Economy: Essays on wanting and choosing (David P. Levine)
19 The Political Economy of Middle East Peace: The impact of competing trade agendas (edited by J.W. Wright, Jnr)
20 The Active Consumer: Novelty and surprise in consumer choice (edited by Marina Bianchi)
21 Subjectivism and Economic Analysis: Essays in memory of Ludwig Lachmann (edited by Roger Koppl and Gary Mongiovi)
22 Themes in Post-Keynesian Economics: Essays in honour of Geoff Harcourt, volume three (edited by Claudio Sardoni and Peter Kriesler)
23 The Dynamics of Technological Knowledge (Cristiano Antonelli)
24 The Political Economy of Diet, Health and Food Policy (Ben J. Fine)
25 The End of Finance: Capital market inflation, financial derivatives and pension fund capitalism (Jan Toporowski)
26 Political Economy and the New Capitalism (edited by Jan Toporowski)
27 Growth Theory: A philosophical perspective (Patricia Northover)
28 The Political Economy of the Small Firm (edited by Charlie Dannreuther)
29 Hahn and Economic Methodology (edited by Thomas Boylan and Paschal F. O’Gorman)
30 Gender, Growth and Trade: The miracle economies of the postwar years (David Kucera)
31 Normative Political Economy: Subjective freedom, the market and the state (David Levine)
32 Economist with a Public Purpose: Essays in honour of John Kenneth Galbraith (edited by Michael Keaney)
33 Involuntary Unemployment: The elusive quest for a theory (Michel De Vroey)
34 The Fundamental Institutions of Capitalism (Ernesto Screpanti)
35 Transcending Transaction: The search for self-generating markets (Alan Shipman)
36 Power in Business and the State: An historical analysis of its concentration (Frank Bealey)
37 Editing Economics: Essays in honour of Mark Perlman (Hank Lim, Ungsuh K. Park and Geoff Harcourt)
38 Money, Macroeconomics and Keynes: Essays in honour of Victoria Chick, volume 1 (Philip Arestis, Meghnad Desai and Sheila Dow)
39 Methodology, Microeconomics and Keynes: Essays in honour of Victoria Chick, volume 2 (Philip Arestis, Meghnad Desai and Sheila Dow)
40 Market Drive and Governance: Reexamining the rules for economic and commercial contest (Ralf Boscheck)
41 The Value of Marx: Political economy for contemporary capitalism (Alfredo Saad-Filho)
42 Issues in Positive Political Economy (S. Mansoob Murshed)
43 The Enigma of Globalisation: A journey to a new stage of capitalism (Robert Went)
44 The Market: Equilibrium, stability, mythology (S.N. Afriat)
45 The Political Economy of Rule Evasion and Policy Reform (Jim Leitzel)
46 Unpaid Work and the Economy (edited by Antonella Picchio)
47 Distributional Justice: Theory and measurement (Hilde Bojer)
48 Cognitive Developments in Economics (edited by Salvatore Rizzello)
49 Social Foundations of Markets, Money and Credit (Costas Lapavitsas)
50 Rethinking Capitalist Development: Essays on the economics of Josef Steindl (edited by Tracy Mott and Nina Shapiro)
51 An Evolutionary Approach to Social Welfare (Christian Sartorius)
52 Kalecki’s Economics Today (edited by Zdzislaw L. Sadowski and Adam Szeworski)
53 Fiscal Policy from Reagan to Blair: The Left veers Right (Ravi K. Roy and Arthur T. Denzau)
54 The Cognitive Mechanics of Economic Development and Institutional Change (Bertin Martens)
55 Individualism and the Social Order: The social element in liberal thought (Charles R. McCann, Jnr)
56 Affirmative Action in the United States and India: A comparative perspective (Thomas E. Weisskopf)
57 Global Political Economy and the Wealth of Nations: Performance, institutions, problems and policies (edited by Phillip Anthony O’Hara)
58 Structural Economics (Thijs ten Raa)
59 Macroeconomic Theory and Economic Policy: Essays in honour of Jean-Paul Fitoussi (edited by K. Vela Velupillai)
60 The Struggle Over Work: The “end of work” and employment alternatives in post-industrial societies (Shaun Wilson)
61 The Political Economy of Global Sporting Organisations (John Forster and Nigel Pope)
62 The Flawed Foundations of General Equilibrium: Critical essays on economic theory (Frank Ackerman and Alejandro Nadal)
63 Uncertainty in Economic Theory: Essays in honor of David Schmeidler’s 65th birthday (edited by Itzhak Gilboa)
Uncertainty in Economic Theory Essays in honor of David Schmeidler’s 65th birthday
Edited by Itzhak Gilboa
First published 2004 by Routledge 11 New Fetter Lane, London EC4P 4EE Simultaneously published in the USA and Canada by Routledge 29 West 35th Street, New York, NY 10001 This edition published in the Taylor & Francis e-Library, 2006.
“To purchase your own copy of this or any of Taylor & Francis or Routledge’s collection of thousands of eBooks please go to www.eBookstore.tandf.co.uk.” Routledge is an imprint of the Taylor & Francis Group © 2004 selection and editorial matter, Itzhak Gilboa; individual chapters, the contributors All rights reserved. No part of this book may be reprinted or reproduced or utilized in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers. British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library Library of Congress Cataloging in Publication Data A catalog record for this book has been requested ISBN 0-415-32494-7
Contents
List of contributors xii
Preface xv

PART I: Theory 1

1 Introduction (Itzhak Gilboa) 3
2 Preference axiomatizations for decision under uncertainty (Peter P. Wakker) 20
3 Defining ambiguity and ambiguity attitude (Paolo Ghirardato) 36
4 Introduction to the mathematics of ambiguity (Massimo Marinacci and Luigi Montrucchio) 46
5 Subjective probability and expected utility without additivity (David Schmeidler) 108
6 Maxmin expected utility with non-unique prior (Itzhak Gilboa and David Schmeidler) 125
7 A simple axiomatization of nonadditive expected utility (Rakesh Sarin and Peter P. Wakker) 136
8 Updating ambiguous beliefs (Itzhak Gilboa and David Schmeidler) 155
9 A definition of uncertainty aversion (Larry G. Epstein) 171
10 Ambiguity made precise: a comparative foundation (Paolo Ghirardato and Massimo Marinacci) 209
11 Stochastically independent randomization and uncertainty aversion (Peter Klibanoff) 244
12 Decomposition and representation of coalitional games (Massimo Marinacci) 261

PART II: Applications 281

13 An overview of economic applications of David Schmeidler’s models of decision making under uncertainty (Sujoy Mukerji and Jean-Marc Tallon) 283
14 Ambiguity aversion and incompleteness of contractual form (Sujoy Mukerji) 303
15 Ambiguity aversion and incompleteness of financial markets (Sujoy Mukerji and Jean-Marc Tallon) 336
16 A quartet of semigroups for model specification, robustness, prices of risk, and model detection (Evan W. Anderson, Lars Peter Hansen, and Thomas J. Sargent) 364
17 Uncertainty aversion, risk aversion, and the optimal choice of portfolio (James Dow and Sérgio Ribeiro da Costa Werlang) 419
18 Intertemporal asset pricing under Knightian uncertainty (Larry G. Epstein and Tan Wang) 429
19 Sharing beliefs: between agreeing and disagreeing (Antoine Billot, Alain Chateauneuf, Itzhak Gilboa, and Jean-Marc Tallon) 472
20 Equilibrium in beliefs under uncertainty (Kin Chung Lo) 483
21 The right to remain silent (Joseph Greenberg) 522
22 On the measurement of inequality under uncertainty (Elchanan Ben-Porath, Itzhak Gilboa, and David Schmeidler) 531

Index 541
Contributors
Evan W. Anderson is Assistant Professor of Economics, University of North Carolina at Chapel Hill. His research interests include heterogeneous agents, recursive utility, robustness, and computational methods.

Elchanan Ben-Porath is Professor of Economics at the Hebrew University of Jerusalem. His fields of interest include game theory, decision theory, and social choice theory.

Antoine Billot is Professor of Economics at the Université de Paris II, Panthéon-Assas, and junior member of the Institut Universitaire de France. His research interests are in the field of preference theory, social choice theory, and decision theory.

Alain Chateauneuf is Professor of Mathematics at the Université de Paris I, Panthéon-Sorbonne. His research is mainly concerned with mathematical economics, focusing on decision theory and particularly on decision under uncertainty.

James Dow is Professor of Finance at London Business School. His recent research has been on models that integrate the financial markets with corporate finance. He has also worked on executive compensation and leadership. In his work on Knightian uncertainty with Sérgio Werlang, he applied the models developed by David Schmeidler to portfolio choice, to stock price volatility, to the no-trade theorem, and to Nash equilibrium.

Larry Epstein is the Elmer B. Milliman Professor of Economics at the University of Rochester. His research interests include decision theory and its applications to macroeconomics and finance.

Paolo Ghirardato is Associate Professor of Mathematical Economics at the Università di Torino. His main research interest is decision theory and its consequences for economic, political, and financial modeling.

Itzhak Gilboa is Professor of Economics at Tel-Aviv University and a Fellow of the Cowles Foundation for Research in Economics at Yale University. He is interested in decision theory, game theory, and social choice.

Joseph Greenberg is the Dow Professor of Political Economy at McGill University. His research interests include economic theory, game theory, and the theory of social situations.

Lars Peter Hansen is the Homer J. Livingston Distinguished Service Professor at the University of Chicago. He is interested in macroeconomic theory, risk, and uncertainty.

Peter Klibanoff is Associate Professor of Managerial Economics and Decision Sciences at the Kellogg School of Management, Northwestern University. His research interests include decision making under uncertainty, microeconomic theory, and behavioral finance.

Kin Chung Lo is Associate Professor of Economics at York University. He specializes in game theory and decision theory. His publications cover areas such as nonexpected utility, auctions, and the foundations of solution concepts in games.

Massimo Marinacci is Professor of Applied Mathematics at the Università di Torino, Italy. His main research interest is mathematical economics, in particular choice theory.

Luigi Montrucchio is Professor of Economics at the Università di Torino, Italy. His main research interest is mathematical economics, in particular economic dynamics and optimal growth.

Sujoy Mukerji is a University Lecturer in Economic Theory at the University of Oxford and Fellow of University College. His research has primarily been on decision making under ambiguity, its foundations, and its relevance in economic contexts. His broader research interests lie at the intersection of bounded rationality and economic theory.

Thomas Sargent is Professor of Economics at New York University and senior fellow at the Hoover Institution at Stanford. He is interested in macroeconomics and applied economic dynamics.

Rakesh Sarin is Professor of Decisions, Operations, and Technology Management and the Paine Chair in Management at the Anderson School of Management at the University of California at Los Angeles. He is interested in decision analysis and societal risk analysis.

David Schmeidler is Professor of Statistics and Management at Tel-Aviv University and Professor of Economics at Ohio State University. His research topics include economic theory, game theory, decision theory, and social choice.

Jean-Marc Tallon is Directeur de Recherche at CNRS and Université Paris I. His research mainly deals with economic applications of models of decision under uncertainty, and with general equilibrium models with incomplete markets.

Peter P. Wakker is Professor of Decision under Uncertainty at the University of Amsterdam. His research is on normative foundations of Bayesianism and on descriptive deviations from Bayesianism, the latter both theoretically and empirically.

Tan Wang is Associate Professor at the Sauder School of Business, University of British Columbia. His research interest is in decision theory under risk and uncertainty and asset pricing, focusing on the implications of uncertainty aversion.

Sérgio Ribeiro da Costa Werlang is Professor of Economics at the Getulio Vargas Foundation. His interests include economic theory and macroeconomics.
Preface
This book is published in celebration of David Schmeidler’s 65th birthday. It is a collection of seventeen papers that have appeared in refereed journals, combined with five introductory chapters that were written for this volume. All the papers deal with uncertainty (or “ambiguity”) in economic theory. They range from purely theoretical issues, such as axiomatic foundations, definitions, and measurement, to economic applications in fields such as contract theory and finance.

There is a large and rapidly growing literature on uncertainty in economic theory, following David’s seminal work on non-additive expected utility. But there is no general introduction to the topic, and scholars who are interested in it are usually referred to the original papers, which are scattered across various journals and are often rather technical. We felt that a collection of papers and introductory surveys would make a significant contribution to the literature, serving both to introduce the novice and to guide the expert. Thus, David’s birthday was the impetus for the publication of this collection, but the latter has an independent raison d’être.

We have no intention or pretense of summarizing David Schmeidler’s research. Indeed, uncertainty in economic theory is but one topic that David has worked on. He has made many other remarkable and path-breaking contributions to game theory, mathematical economics, economic theory, and decision theory. We do not attempt to give an overview of David’s contributions for two reasons. First, such an overview is a daunting task. Second, this book does not mark David’s retirement in any way. David Schmeidler is a very active researcher. We hope and believe that he will continue to produce new breakthroughs in the future. This book marks a special birthday, but by no means the end of a research career.

We thank the contributors to the volume, as well as the publishers, for the right to reprint published papers.
We hope that this collection, while obviously partial, will give readers a preliminary overview of the research on uncertainty in economic theory. We are grateful to Ms. Lada Burde for her invaluable help in editing and proofreading this volume. Paolo Ghirardato, Itzhak Gilboa, Massimo Marinacci, Luigi Montrucchio, Sujoy Mukerji, Jean-Marc Tallon, and Peter P. Wakker
Part I
Theory
1 Introduction

Itzhak Gilboa
1.1. Uncertainty and Bayesianism

Ever since economic theory started to engage in formal modeling of uncertainty, it has espoused the Bayesian paradigm. In the mid-twentieth century, the Bayesian approach came to dominate decision theory and game theory, and it has remained a dominant paradigm in the applications of these theories to economics to this day. Economic problems ranging from insurance and portfolio selection to signaling and health policy are typically analyzed in a Bayesian way. In fact, there is probably no other field of formal inquiry involving uncertainty in which Bayesianism enjoys such a predominant status as it does in economic theory.

But what exactly does it mean to be Bayesian? One may discern at least three distinct tenets that are often assumed to be held by Bayesians. First, a Bayesian quantifies uncertainty in a probabilistic way. Second, Bayesianism entails updating one’s beliefs given new information in accordance with Bayes’ law. Finally, in light of the axiomatizations of the Bayesian approach (Ramsey, 1931; de Finetti, 1937; Savage, 1954), Bayesianism is often taken to also imply the maximization of expected utility (EU) relative to probabilistic beliefs.1

Taken as assumptions regarding the behavior of economic agents, all three tenets of Bayesianism have come under attack. The assumption of EU maximization was challenged by Allais (1953). The famous Allais paradox, combined with the body of work starting with Kahneman and Tversky’s Prospect Theory (1979), aimed to show that people may fail to maximize EU even in decisions under risk, namely, where probabilities are known. Tversky and Kahneman (1974) have also shown that people may fail to perform Bayesian updating. That is, even when probabilities are given in a problem, they might not be manipulated in accordance with Bayes’ law. Thus, the second tenet of Bayesianism has also been criticized in terms of descriptive validity.
Moreover, other work by Kahneman and Tversky, such as the documentation of framing effects (Tversky and Kahneman, 1981), has shown that some of the implicit assumptions of the Bayesian model are also descriptively inaccurate.

Yet, violations of the second and third tenets have not amounted to a serious critique of Bayesianism per se. Violations of Bayesian updating are viewed by most researchers as mistakes. While these mistakes pose a challenge to descriptive
Bayesian theories, they fail to sway one from the belief that Bayes’ law should be the way in which probabilities are updated. Some researchers also view violations of EU maximization (given a probability measure) as plain mistakes, which do not challenge the normative validity of the theory. Other researchers disagree. At any rate, these violations do not clash with the Bayesian view as statisticians or computer scientists understand it. That is, an agent may quantify uncertainty by a prior probability measure, and update this prior to a posterior in a Bayesian way, without maximizing EU with respect to her probabilistic beliefs.2

This book is devoted to behavioral violations of the first tenet of Bayesianism, namely, that all uncertainty can be quantified by a probability measure. In contrast to the other two types of violations, the rejection of the first tenet is a direct attack on the essence of the Bayesian approach, even when the latter is interpreted as a normative theory. As we argue shortly, there are situations in which violations of the first tenet cannot be viewed as mistakes, and cannot be easily corrected even by decision makers who are willing to convert to Bayesianism.

When explaining the basic notion of uncertainty, as opposed to risk, one often starts out with Ellsberg’s (1961) famous examples (the “Ellsberg paradox”, which refers both to the two-urn and to the one-urn experiments). These experiments show that many people tend to prefer bets with known probabilities to bets with unknown ones, in a way that cannot be reconciled with the first tenet of Bayesianism. Specifically, Ellsberg’s paradox provides an example in which Savage’s axiom P2 is consistently violated by a nonnegligible proportion of decision makers.3 A decision maker who violates P2 as in Ellsberg’s paradox does not merely deviate from EU maximization. Rather, such a decision maker exhibits a mode of behavior that cannot be described as a function of a probability measure.
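To make this concrete, the following sketch (my own illustration, not part of the original text; the function name is hypothetical) checks mechanically that the two-urn preference pattern cannot be rationalized by any single additive prior:

```python
# Ellsberg-style two-urn pattern: the decision maker strictly prefers
# betting on either side of a known 50-50 device to betting on the
# corresponding side of an unknown device.  If her willingness to bet
# were represented by a single additive prior p over {Heads, Tails}
# for the unknown device, both strict preferences would require
# p(Heads) < 0.5 and p(Tails) < 0.5; additivity rules this out,
# since p(Heads) + p(Tails) = 1.

def rationalizable_by_additive_prior(step=0.001):
    """Grid-search for an additive prior consistent with both strict preferences."""
    n = int(1 / step)
    for i in range(n + 1):
        p_heads = i * step
        p_tails = 1 - p_heads          # additivity forces this
        if p_heads < 0.5 and p_tails < 0.5:
            return p_heads             # a rationalizing prior, if one existed
    return None

print(rationalizable_by_additive_prior())  # -> None: no additive prior works
```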
To the extent that behavioral data can challenge a purely cognitive assumption, Ellsberg’s paradox exhibits a violation of the first tenet of Bayesianism.

Yet, David Schmeidler’s interest in uncertainty was not aroused by Ellsberg’s paradox4 or by any other behavioral manifestation of a non-Bayesian approach. Rather, Schmeidler’s starting point was purely cognitive: like Knight (1921) and Ellsberg, he did not find the first tenet of Bayesianism plausible. Specifically, Schmeidler argued that the Bayesian approach “does not reflect the heuristic amount of information that led to the assignment of […] probability” (Schmeidler, 1989: 571). His example was the following: assume that you take a coin out of your pocket, and that you are about to bet on it. You have tossed this coin many times in the past, and you have not observed any significant deviations from the assumption of fairness. For the sake of argument, assume that you have tossed the coin 1,000 times and that it has come up exactly 500 times each, Heads and Tails. Thus, you assign a probability of 50 percent to the coin coming up Heads, as well as Tails, in the next toss. Next assume that your friend takes a coin out of her pocket. You have absolutely no information about this coin. If you are asked to assign probabilities to the two sides of the coin, you may well follow symmetry considerations (equivalently, Laplace’s principle of insufficient reason) and assign a probability of 50 percent to each side of this coin as well. However,
argued Schmeidler, the 50 percent that is based on empirical frequencies in large databases does not “feel” the same as the 50 percent that was assigned based on symmetry considerations. The Bayesian approach, in insisting that every source of uncertainty be quantified by a (single, additive) probability measure, is too restrictive. It does not allow the amount of information used for probabilistic assessments to be reflected in these assessments.

It seems a natural step to couch this cognitive observation in a behavioral setup. Indeed, Ellsberg’s two-urn experiment is very similar to Schmeidler’s contrast between the two coins. It is, however, important to note that Schmeidler’s critique of Bayesianism starts from a cognitive perspective. It is not motivated by an observed pattern of behavior, unlike much of the work ensuing from Allais’s paradox. Relatedly, Schmeidler’s critique of the first tenet of Bayesianism is not solely on descriptive grounds. Starting with the logic of the Bayesian approach, rather than with experimental evidence, this critique cannot be dismissed as focusing on a setup in which decision makers err. Rather, Schmeidler’s point was that in many situations there is not enough information for the generation of a Bayesian prior. In these situations, it is not clear that the rational thing to do is to behave as if one had such a prior.

These considerations also raise doubts regarding the definition of rationality by internal consistency of decisions or statements. If we were to assume that “rationality” means only coherence, Savage’s axioms would appear to be a very promising candidate for a canon of rationality. However, if we take these axioms as a rationality test, it is too easy to pass: in a situation of uncertainty, one can arbitrarily choose any prior probability and behave so as to maximize EU with respect to this prior. This will clearly suffice to pass Savage’s rationality test.
But it would not seem rational by any intuitive definition of the term.

Ellsberg’s paradox is an extremely elegant illustration of a behavioral rejection of the first tenet of Bayesianism. It manages to translate the cognitive unease with the Bayesian approach into observed choice, thanks to certain symmetries in the decision problem. But these symmetries may also be misleading. While many decision makers violate Savage’s P2 in Ellsberg’s paradox, it seems easy enough to “correct” their choices so that they correspond to Savage’s theory. In both of Ellsberg’s examples there is enough symmetry to allow Laplace’s principle of insufficient reason to pinpoint a single probability measure. This symmetric prior might appear to be a natural candidate for the would-be Bayesian, and it might give the impression that violations of P2 can easily be worked around. Cognitive ease aside, the decision maker may behave as if she were Bayesian.

This impression would be wrong. Most real-life problems do not exhibit enough symmetries to allow for a Laplacian prior. To consider a simple example, assume that one faces the uncertainty of war. There are only two states of the world to consider: war and no war. Empirical frequencies surely do not suffice to generate a prior probability over these two states, since the uncertain situation cannot be construed as a repeated experiment. Therefore, this is a situation of uncertainty as opposed to risk. But it would be ludicrous to suggest that the probability of war should be 50 percent, simply because there are two states of the world
with no historical data on their relative frequencies. Indeed, in this situation there is sufficient reason to distinguish between the two states, though not sufficient information to generate a Bayesian prior. Ellsberg’s paradox, as well as Schmeidler’s coin example, should therefore be taken with a grain of salt. They drive home the point that the cognitive unease generated by uncertainty may have behavioral implications. But they do not capture the complexity of a multitude of real-life decisions in which there is sufficient reason (to distinguish among states) but not sufficient information (to generate a prior).
1.2. Nonadditive probabilities (CEU)

David Schmeidler’s first attempt to model a non-Bayesian approach to uncertainty involved nonadditive probabilities. This term refers to mathematical entities that resemble probability measures, with the exception that they need not satisfy the additivity axiom. The idea can be simply explained in Schmeidler’s coin example (equivalently, in Ellsberg’s two-urn paradox). Assume, again, that there are two coins. One, the “known” coin, has been tossed many times, with a relative frequency of 50 percent Heads and 50 percent Tails. The other, the “unknown” one, has never been tossed before. Assume further that a decision maker feels uneasy about betting on the unknown coin. That is, she prefers to bet on the known coin coming up Heads rather than on the unknown coin coming up Heads, and the same applies to Tails. This preference pattern holds despite the fact that the decision maker agrees that both coins will eventually come up either Heads or Tails. To be more concrete, the decision maker is indifferent between betting on “the known coin comes up Heads or Tails” and on “the unknown coin comes up Heads or Tails”. Should probabilities reflect willingness to bet, argued Schmeidler, the probability that the decision maker assigns to the unknown coin coming up Heads is lower than the probability she assigns to the known coin coming up Heads, and the same applies to Tails.

To be precise, consider a model with four states of the world, each one describing the outcome of both coin tosses: S = {HH, HT, TH, TT}. HH denotes the state in which both coins come up Heads; HT the state in which the known coin comes up Heads and the unknown coin Tails; and so forth. In this setup, imagine that the probability of {HH, HT} and of {TH, TT} is 50 percent, whereas the probability of {HH, TH} and of {HT, TT} is 40 percent. This would reflect the fact that the EU of a bet on either side of the known coin is higher than that of a bet on either side of the unknown coin.
Yet, the union of the first pair of events, {HH, HT} and {TH, TT}, equals the union of the second pair, namely, {HH, TH} and {HT, TT}, and it equals S. Thus, if probabilities reflect willingness to bet, they are nonadditive: the probability of each of {HH, TH} and {HT, TT} is 40 percent, while the probability of their union is 100 percent. More generally, nonadditive probabilities are real-valued set functions that are defined over a sigma-algebra of events. They are assumed to satisfy three conditions: (i) monotonicity with respect to set inclusion; (ii) assigning zero to the empty
set; and (iii) assigning 1 to the entire state space (normalization). Observe that no continuity is generally assumed. Hence, adding the requirement of additivity (with respect to the union of two disjoint events) would result in a finitely additive (rather than sigma-additive) probability, in accordance with the derivations of de Finetti (1937) and Savage (1954). As illustrated by the coin example, nonadditive probability measures can reflect the amount of information that was used in estimating the probability of an event.

But how does one compute EU with respect to a nonadditive probability measure? Or, how does one define an integral of a real-valued function with respect to such a measure? In the simple case where the function assumes a positive value x on an event A, and zero otherwise, the answer seems simple: the integral should be xv(A), where v(A) denotes the nonadditive probability of A. Indeed, this definition has been implicit in our discussion of “willingness to bet on an event” mentioned earlier. If x stands for the utility level of the more desirable outcome, and zero for that of the less desirable one, then 0.4x would be the integral of the bet on each side of the unknown coin, whereas 0.5x would be the integral of the bet on each side of the known coin.

What happens, then, if a function assumes two positive values, x on A and y on B (where A and B are two disjoint events)? The straightforward extension would be to define the integral as xv(A) + yv(B). Indeed, this definition would seem to generalize the Riemann integral to the case in which v is nonadditive: it sums the areas of rectangles whose base is the domain of the function and whose height is the value of the function (Figure 1.1). Yet, this definition is problematic. First, letting y approach x, one finds that the integral thus defined is not continuous with respect to the integrand.
Specifically, for x = y the value of the integral would be xv(A ∪ B), which will, in general, differ from xv(A) + yv(B) = x[v(A) + v(B)].
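The discontinuity can be seen concretely in a small numeric sketch (the values v(A) = v(B) = 0.4 and v(A ∪ B) = 1 are the coin example's; the function names are invented for illustration):

```python
# Numeric sketch of the discontinuity of the naive "Riemann-style" integral
# with respect to a nonadditive v. Assumed values from the coin example:
v = {"A": 0.4, "B": 0.4, "AB": 1.0}  # v(A), v(B), v(A ∪ B)

def naive_integral(x, y):
    """Naive sum x·v(A) + y·v(B) for a function equal to x on A and y on B."""
    return x * v["A"] + y * v["B"]

x = 1.0
# As y approaches x, the naive integral tends to x·(v(A) + v(B)) = 0.8 ...
limit = naive_integral(x, 0.999999)
# ... but for the constant function (y = x) the only sensible value is
# x·v(A ∪ B) = 1, so the integral jumps at y = x: a discontinuity.
constant_value = x * v["AB"]
print(round(limit, 3), constant_value)  # 0.8 vs 1.0
```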
Figure 1.1 The naive integral as a sum of rectangles of areas xv(A) and yv(B) over the events A and B of the state space S.
Figure 1.2 The Choquet integral as a sum of rectangles of areas yv(A ∪ B) and (x − y)v(A).
Second, the same example can serve to show that the integral is not monotone with respect to the integrand: a function f may dominate another function g (pointwise), yet the integral of f will be strictly lower than that of g. Schmeidler’s solution to these difficulties was to use the Choquet integral. Choquet (1953–54) dealt with capacities, which he defined as nonadditive probability measures that satisfy certain continuity conditions. Choquet defined a notion of integration of real-valued functions with respect to capacities that satisfies both continuity and monotonicity with respect to the integrand. In the earlier example, the Choquet integral would be computed as follows: assume that x > y. Over the event A ∪ B, one is guaranteed the value y. Hence, let us first calculate yv(A ∪ B). Next, over the event A the function is above y. The additional value, (x − y), is added to y over the event A, but not over B. Thus, we add to the integral (x−y)v(A). Overall, the integral of the function would be (x−y)v(A)+yv(A∪B). This is also the sum of areas of rectangles. But this time their height is not the value of the function. Rather, it is the difference between two consecutive values that the function assumes. (See Figure 1.2.) Observe that this value equals xv(A) + yv(B) if v happens to be additive. That is, this definition generalizes the standard one for additive measures. But even if v is not additive, the Choquet integral retains the properties of continuity and monotonicity. Consider a simplified version of the coin example mentioned earlier, where we ignore the known coin and focus on the unknown coin. There are only two states of the world, H and T. Let us assume that v(H) = v(T) = 0.4. Assume that a function f takes the value x at H and the value y at T. Then, if x ≥ y ≥ 0 the Choquet integral of f is 0.4(x − y) + y = 0.4x + 0.6y. If, however, y > x ≥ 0, the Choquet integral of f is 0.4(y − x) + x = 0.6x + 0.4y.
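The layer-by-layer computation just described can be sketched in code. The following is a minimal Python implementation of the Choquet integral for non-negative functions on a finite state space, checked against the unknown-coin capacity (the function and variable names are my own):

```python
def choquet(values, capacity):
    """Choquet integral of a non-negative function on a finite state space.

    values:   dict mapping each state to the (non-negative) value of the function
    capacity: function from frozensets of states to their nonadditive probability
    """
    # Order states so the function is non-increasing along the order.
    states = sorted(values, key=values.get, reverse=True)
    total, upper = 0.0, set()
    for i, s in enumerate(states):
        upper.add(s)  # upper is now the level set {f >= values[s]}
        nxt = values[states[i + 1]] if i + 1 < len(states) else 0.0
        # Weight each decrement in value by the capacity of the upper level set.
        total += (values[s] - nxt) * capacity(frozenset(upper))
    return total

# Unknown coin: v(H) = v(T) = 0.4, v({H, T}) = 1.
def v(event):
    return {frozenset(): 0.0, frozenset({"H"}): 0.4,
            frozenset({"T"}): 0.4, frozenset({"H", "T"}): 1.0}[event]

a = choquet({"H": 3.0, "T": 1.0}, v)  # x >= y: 0.4·3 + 0.6·1 = 1.8
b = choquet({"H": 1.0, "T": 2.0}, v)  # y > x: 0.6·1 + 0.4·2 = 1.4
```

For x = y the two branches agree, which is the continuity the naive definition lacked.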
The definition of the Choquet integral for general functions follows the intuition stated earlier. The following chapters contain a precise definition and other details. For the time being it suffices to mention that the Choquet integral is, in general, continuous and monotone with respect to its integrand. Schmeidler proposed that decision makers behave as if they were maximizing the Choquet integral of their utility function, where the integral is computed with respect to their subjective beliefs, and the latter are modeled by a nonadditive probability measure. This claim is often referred to as “Choquet Expected Utility” (CEU) theory. Whereas the notion of capacities and the Choquet integral existed in the mathematical literature, Schmeidler (1989) was the seminal paper that first applied these concepts to decision under uncertainty, and that also provided an axiomatic foundation for CEU, comparable to the derivation of Expected Utility Theory (EUT) by Anscombe and Aumann (1963). In particular, the axiomatic derivation identifies the nonadditive probability v uniquely, and the utility function up to a positive linear transformation.5
1.3. A cognitive interpretation of CEU with convex capacities

Schmeidler defined a nonadditive (Choquet) EU maximizer to be uncertainty averse if her nonadditive subjective probability v was convex. Convexity, as defined in cooperative game theory,6 means that for any two events A and B, v(A) + v(B) ≤ v(A ∩ B) + v(A ∪ B). This condition is also referred to as supermodularity, or 2-monotonicity, and it is equivalent to stating that the marginal v-contribution of an event is always nondecreasing in the following sense: suppose that an event R is “added”, in the sense of set union, to events S and T that are disjoint from R. Suppose further that S is a subset of T. Then v is convex if and only if, for all such events R, S, and T, the marginal contribution of R to T, v(T ∪ R) − v(T), is at least as high as the marginal contribution of R to S, v(S ∪ R) − v(S). Later literature has questioned the appropriateness of this definition of uncertainty aversion, and has provided several alternative definitions. Chapter 3 is devoted to this issue and we do not dwell on it here. But convexity of nonadditive measures has remained an important property for other reasons. It is well known in cooperative game theory that convex games have a nonempty core. That is, if a nonadditive measure v is convex, then there are (finitely) additive probability measures p that dominate it pointwise (p(A) ≥ v(A) for every event A). In the context of cooperative game theory, a dominating measure p suggests a stable imputation: a way to split the worth of the grand coalition, v(S) = p(S) = 1, among its members, in such a way that no coalition A has an incentive to deviate and operate on its own. Schmeidler (1986) showed that, for a convex game v, the Choquet integral of every real-valued function with respect to v equals the minimum over all integrals of this function with respect to the various (additive) measures in the core of v.
Conversely, if a game v has a nonempty core, and the Choquet integral of every
function with respect to v equals the minimum, over additive measures in the core, of the integrals of this function, then v is convex. Consider again the simplified version of the coin example mentioned earlier, where there are only two states of the world, H and T. Assume that v(H) = v(T) = 0.4. This v is convex and its core is Core(v) = {(p, 1 − p) | 0.4 ≤ p ≤ 0.6}. For a function f that takes the value x ≥ 0 at H and the value y ≥ 0 at T, we computed the Choquet integral with respect to v, and found that it is 0.4x + 0.6y if x ≥ y and 0.6x + 0.4y if y > x. It is readily observed that this integral is precisely min_{0.4 ≤ p ≤ 0.6} [px + (1 − p)y]. That is, the Choquet integral of any non-negative f with respect to the convex v equals the minimum of the integrals of f relative to additive measures, over all additive measures in Core(v). The decision maker might therefore be viewed as if she does not know what probability measure governs the unknown coin. But she believes that each side of the coin cannot have a probability lower than 40 percent. Thus, she considers all the probability measures that are consistent with this estimate. Each such probability measure defines an integral of the function f. Faced with this set of integral values, the CEU maximizer behaves as if the lowest possible expected value of f is the relevant one. In other words, CEU (with respect to a convex capacity) may be viewed as a theory combining the maxmin principle and EU: the decision maker first computes all possible EU values, then considers the minimal of those, and finally chooses an act that maximizes this minimal EU. Whenever a CEU maximizer has a convex capacity v, this cognitive interpretation of the Choquet integral holds. The set of probabilities with respect to which one takes the minimum of the integral may be interpreted as representing the information available to the decision maker.
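The two-state equivalence can be checked numerically. The sketch below discretizes the core on a grid; since the minimum of a linear function over an interval is attained at an endpoint, and both endpoints lie on the grid, the grid minimum is exact:

```python
def choquet_coin(x, y):
    # Choquet integral for the two-state capacity v(H) = v(T) = 0.4
    return 0.4 * x + 0.6 * y if x >= y else 0.6 * x + 0.4 * y

def min_over_core(x, y, steps=1000):
    # Core(v) = {(p, 1 - p) : 0.4 <= p <= 0.6}, sampled on a grid that
    # includes both endpoints, where the minimum of px + (1 - p)y lies.
    ps = [0.4 + 0.2 * k / steps for k in range(steps + 1)]
    return min(p * x + (1 - p) * y for p in ps)

# The Choquet integral coincides with the minimal expectation over the core.
for x, y in [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0)]:
    assert abs(choquet_coin(x, y) - min_over_core(x, y)) < 1e-9
```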
In the example stated earlier, the decision maker might not be able to specify the probabilities of the events in question, but she may be able to provide bounds on these probabilities. This cognitive interpretation should be taken with a grain of salt. Observe that in the coin example, the decision maker has no information about the unknown coin. If we were to ask what is the set of probabilities that she deems as possible, we would have to include all probability measures, {(p, 1 − p) | 0 ≤ p ≤ 1}. Yet, the decision maker behaves as if only the measures {(p, 1 − p) | 0.4 ≤ p ≤ 0.6} were indeed possible. Thus, the core of the capacity v need not coincide with the probabilities that are, indeed, possible according to available information. Rather, the core of v is the set of probabilities that the decision maker appears to entertain, given her choices, and in the context of the maxmin decision rule. One may conceive of other decision rules that would give rise to other sets of probability measures, and it is not clear, a priori, that the maxmin framework is the appropriate one to elicit the decision maker’s “real” beliefs.
1.4. Multiple priors (MMEU)

In the coin example, as well as in Ellsberg’s experiments, the information that is explicitly provided to the decision maker can be fully captured by placing lower and/or upper bounds on the probabilities of specific events. For instance, in the coin example it is natural to imagine that the probability of each side of the “known” coin is known to be 50 percent, whereas the probability of each side of the “unknown” coin is only known to be in the range (0, 1). As mentioned earlier, the decision maker’s behavior may not be guided by this set of probabilities. Rather, the decision maker may behave as if each side of the unknown coin has some probability in the range (0.4, 0.6). Further, her behavior may exhibit some uncertainty about the probability governing the known coin as well, and we may find that she behaves as if each side of the known coin has some probability in the range, say, (0.45, 0.55). In all these examples, both explicitly given information and behaviorally derived uncertainty are reflected by lower and upper bounds on the probabilities of specific events. Since an upper bound on the probability of an event may be written as a lower bound on the probability of its complement, lower bounds suffice. That is, one may define the set of relevant probability measures by simple constraints of the form p(A) ≥ v(A) for various events A and an appropriately chosen v. In other words, the set of probabilities may be defined by a nonadditive probability measure, interpreted as a lower bound on the unknown probability.7 But should one follow this cognitive interpretation of CEU, one may find it too restrictive. For example, one might believe that an event A is at least twice as likely as an event B. Thus, one would like to consider only probability measures that satisfy p(A) ≥ 2p(B). This is a simple linear constraint, but it cannot be reduced to constraints of the form p(A) ≥ v(A).
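As an illustration, one can enumerate discrete probability vectors on a small state space and compare a set of priors cut out by lower-bound constraints p(A) ≥ v(A) with one cut out by a ratio constraint such as p(A) ≥ 2p(B). The three-state space, the bound values, and the grid resolution below are all made up for the sketch:

```python
def grid_measures(n=20):
    """All probability vectors (i/n, j/n, k/n) on a three-state space."""
    for i in range(n + 1):
        for j in range(n + 1 - i):
            yield (i / n, j / n, (n - i - j) / n)

# Hypothetical lower bounds v on two events (states are indexed 0, 1, 2).
lower_bounds = {frozenset({0}): 0.2, frozenset({1}): 0.1}

def satisfies_bounds(p):
    return all(sum(p[s] for s in A) >= b - 1e-12 for A, b in lower_bounds.items())

# A set of priors of the form {p : p >= v} ...
core_like = [p for p in grid_measures() if satisfies_bounds(p)]
# ... versus one defined by the linear constraint p({0}) >= 2·p({1}),
# which cannot be written as lower bounds on individual events.
ratio_set = [p for p in grid_measures() if p[0] >= 2 * p[1] - 1e-12]
```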
Moreover, one may have various pieces of information that restrict the set of probability measures that might be governing the decision problem, but that are not representable by linear constraints. For example, assume that a random variable is known (or assumed) to have a normal distribution, with unknown expectation and variance. Ranging over all the possible values of the unknown parameters results in a set of probability measures. This set will generally not be defined by a lower-bound nonadditive probability function v. In fact, all problems that are analyzed by the tools of classical statistics are modeled by a set of possible probability measures, over which one has no prior distribution. Thus, a huge variety of problems encountered on a daily basis by individual decision makers, scientists, professional consultants, and other experts involve sets of probabilities, where, for the most part, these sets do not constitute the core of a nonadditive measure. Whereas classical statistics does not offer a general decision theory, it is natural to extend the CEU interpretation for the case of a convex v to this more general setup. Specifically, assume that a decision maker conceives of a state space S. Over this state space, she considers as possible a set of probability measures C. Given a choice problem, and assuming that the decision maker has a utility function u
defined over possible outcomes, she might adopt the following decision rule: for each possible act f, and for each possible (additive) probability measure p ∈ C, compute the EU of f relative to p. Next consider the minimum (or infimum) of these EU values of f, ranging over all measures in C. Evaluate f by this minimum value, and choose the act that maximizes this index. This theory was suggested and axiomatized by Gilboa and Schmeidler (1989). It has come to be known as the Maxmin Expected Utility (MMEU) model, or the “multiple prior” model. The axiomatization derives a set of priors C that is convex.8 Indeed, given the decision rule of MMEU, any set of probability measures C is observationally equivalent to its convex hull. Given the restriction of convexity, the set C is uniquely identified by the decision maker’s preferences. The utility function in this model is unique up to positive linear transformation, and it is identified in tandem with the set C. In a sense, MMEU theory provided classical statistics with the foundations that Ramsey, de Finetti, and Savage provided Bayesian statistics.9 EUT specified how a Bayesian prior might be used for decision making, and the axiomatizations of (subjective) EUT offered a derivation of this prior from observable behavior. Similarly, MMEU specified how decisions might be made given a set of priors, and the related axiomatization provided a derivation of the set of priors. However, with a set of priors there seems to be a much lower degree of agreement about the appropriate way to use it for decision making. In particular, the maxmin criterion was often criticized for being too extreme, and several alternatives have been offered. For example, Jaffray (1989) proposed using Hurwicz’s α-criterion over the set of EU values of an act, and Klibanoff et al. (2003) propose aggregating all these EU values. The Gilboa–Schmeidler axiomatization is based on behavioral data.
As such, the set of priors that they derive shares the duality of interpretation with the core of a convex capacity. That is, while it is tempting to interpret the set of priors as reflecting the information available to the decision maker, the two might differ. The set of priors is simply those probabilities that describe the decision maker’s behavior, via the maxmin rule, should she satisfy the Gilboa–Schmeidler axioms. It is possible that a decision maker would have actual information represented by a set C, but that she would behave according to the maxmin rule with respect to a different set of priors C′. With the caveat mentioned earlier, MMEU has two main advantages over CEU. First, a general set of priors, restricted only by convexity, may represent a much larger variety of decision situations than may a set that has to be the core of a convex capacity.10 Second, to many authors MMEU appears to be a more intuitive theory than does CEU.11 MMEU is almost as simple to explain as classical EUT. At the same time, MMEU may be easier to implement than EUT, because the former relaxes the informational requirements imposed by the latter. Given that Schmeidler’s interest in uncertainty started with a cognitive unease generated by the assumptions of the Bayesian approach, it is comforting to know that an alternative theory can be offered that relaxes the first tenet of Bayesianism, but retains the cognitive appeal of EUT.
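The MMEU rule itself can be sketched in a few lines on the two-coin story. The finite set of priors below (sampling q from [0.4, 0.6] at three points) and all names are assumptions made for the illustration:

```python
def mmeu_choice(acts, priors):
    """Pick the act maximizing the minimal expected utility over the priors.

    acts:   dict mapping act names to utility vectors over the states
    priors: list of probability vectors over the same states
    """
    def min_eu(u):
        return min(sum(ps * us for ps, us in zip(p, u)) for p in priors)
    return max(acts, key=lambda a: min_eu(acts[a]))

# States: (known coin, unknown coin), ordered (H,H), (H,T), (T,H), (T,T).
# The known coin lands heads with probability 0.5; the unknown coin's
# heads probability q is only known to lie in [0.4, 0.6], sampled here.
priors = [(0.5 * q, 0.5 * (1 - q), 0.5 * q, 0.5 * (1 - q))
          for q in (0.4, 0.5, 0.6)]
acts = {"bet_known_H": (1, 1, 0, 0),    # pays 1 if the known coin lands heads
        "bet_unknown_H": (1, 0, 1, 0)}  # pays 1 if the unknown coin lands heads
# Minimal EU is 0.5 for the known bet and 0.4 for the unknown bet, so the
# maxmin rule reproduces the Ellsberg-type preference for the known coin.
choice = mmeu_choice(acts, priors)
```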
1.5. Related literature

In the introduction, we followed the development of CEU and of MMEU in an associative and chronological order, tracing the path that Schmeidler had taken in his thoughts about decision under uncertainty. Indeed, CEU and MMEU will remain the focus of this volume. However, these theories bear some similarities to other theories of belief representation and/or of decision making. While we do not intend to provide here a complete history of reasoning about uncertainty, the reader would probably benefit from a brief survey of a few other, closely related theories. All of them were developed independently of CEU and MMEU, and some of these developments were more or less concurrent with the development of CEU and MMEU.

1.5.1. Rank dependent expected utility (RDEU)12

Several psychologists have suggested the notion that individuals may not perceive probabilities correctly. This idea dates back to Preston and Baratta (1948) and Edwards (1954), but it has gained popularity among economists mostly with prospect theory (PT), suggested by Kahneman and Tversky (1979). Specifically, it is postulated that, in describing decision making under risk, an event with a stated probability of p has a decision weight f(p) which is, in general, different from p. It is typically assumed that small probabilities are weighted in a disproportionate way, namely, that f(p) > p for small values of p, and that a converse inequality holds when p is close to 1. If we were to separate this idea from the other ingredients of PT (most notably, from gain–loss asymmetry), we would have the following generalization of EUT: faced with a lottery that promises an outcome xi with probability pi, the decision maker evaluates it by Σi f(pi)u(xi) rather than by Σi pi u(xi). While this idea can quite intuitively explain many violations of EUT under risk, it poses several theoretical difficulties.
First, the evaluation of a lottery depends on its presentation: if the same outcome appears twice in the lottery (with two distinct probabilities), it will enter the utility calculation differently than in the case in which it appears only once, with the sum of the corresponding probabilities. Second, the functional Σi f(pi)u(xi) fails to be continuous in the outcomes xi. Finally, it fails to respect first order stochastic dominance. Specifically, it may decrease as some of the xi increase. All of these problems disappear if f is additive, but in this case it has to be the identity function and the model fails to capture distortion of probabilities (Fishburn, 1978). Prospect theory dealt with the first difficulty by an editing phase that the decision maker goes through before evaluating lotteries.13 But it did not offer solutions to the other two problems. These problems are reminiscent of those that one encounters when one attempts to use a naïve definition of integration with respect to a nonadditive measure, as discussed earlier (see Figure 1.1). Specifically, if one starts out with an additive probability measure P, and “distorts” it by a function f, one obtains a nonadditive probability v defined by v(A) = f(P(A)). In this case,
the functional Σi f(pi)u(xi) can be thought of as the naïvely defined integral of the utility of the outcome x, with respect to v. It comes as no surprise, then, that maximization of the functional Σi f(pi)u(xi) poses the same difficulties as those discussed earlier. The discussion of Choquet integration earlier may suggest that, in the context of decision under risk, PT may be modified so that it respects first order stochastic dominance and continuity. To this end, one would like to apply the distortion function f not to the probability that a certain outcome is obtained, but to the cumulative probability that at least a certain outcome is obtained. Defining v = f(P), this is tantamount to defining an integral as in Figure 1.2 as opposed to Figure 1.1. This idea was proposed, independently and more or less concurrently, by Quiggin (1982) and by Yaari (1987). Both were developed independently of Schmeidler’s work. Moreover, Weymark (1981) offered yet another independent derivation of the same functional in the context of social choice. The resulting model in the context of decision under risk has come to be known as the “rank-dependent expected utility model” (see Chew (1983)), because in this model the decision weight of outcome xi does not depend only on the probability of that outcome, but also on the aggregate probabilities of all outcomes that are ranked above (or below) it. The rank-dependent model has been elaborated on by Segal (1989), and it has been applied to a range of economic problems, as well as tested experimentally. More recently, the rank-dependent model was combined with other ideas of PT to generate cumulative prospect theory (CPT, see Tversky and Kahneman, 1992). The rank-dependent model (without additional ingredients of PT) is a special case of CEU. Specifically, defining v = f(P), CEU reduces to RDEU. However, the converse is false: not every CEU model can be represented as an RDEU model.
Only a very special class of nonadditive probability measures v can be represented by an additive measure P and a distortion function f as stated earlier. The RDEU model does not deal with uncertainty. It is restricted to situations of risk, that is, of known probabilities. In particular, the RDEU model cannot help explain the pattern of choices observed in Ellsberg’s paradox. We therefore do not discuss the RDEU model in this book.

1.5.2. Belief functions

The notion that a nonadditive set function may represent bounds on the probabilities of events dates back to the theory of belief functions suggested by Dempster (1967) and Shafer (1976). Assume that one gathers evidence for various events. Some evidence is very specific, suggesting that a particular state of the world is likely to be the case. Other evidence may be more nebulous, suggesting that a nonsingleton event is likely to have obtained, without specifying which state in it has indeed materialized. Generally, evidence is conceptualized as a non-negative number attached to an event. Thus, evidence is specific to an event, and the weight of evidence is measured numerically. Given a collection of such number–event pairs, what is the total weight of evidence supporting a particular event? In answering this question according to
Dempster–Shafer’s theory, one first normalizes the weight of all evidence gathered so that it adds up to unity. Then one sums up the evidence for the event in question, as well as the evidence for each subset thereof. The resulting function is a nonadditive probability. It can also be shown that this nonadditive probability is convex. In fact, it satisfies a stronger condition than convexity, called infinite monotonicity. Conversely, an infinitely monotone nonadditive measure can be obtained from a set of non-negative weights as described earlier. Such functions are called belief functions. In this theory, nonadditivity arises from the fact that evidence is not fully specified. In the context of the coin example, one might imagine that we have evidence for the fact that one of the sides of the coin will come up, but no evidence that specifically points to any side. The weight of evidence for the event {H, T} cannot be split between {H} and {T}. If one were to think in terms of a “true”, “objective” probability measure, one would view the belief function as a lower bound on the values of this probability measure: each event should be assigned a probability that is at least as large as the value attributed to it by the belief function. Since belief functions are convex, and hence have a nonempty core, there are always probability measures that satisfy the constraints represented by a belief function. Dempster–Shafer’s theory is purely cognitive. It has no behavioral component, and no decision theory attached to it. Dempster and Shafer have not offered an axiomatic foundation for their theory. That is, there is no set of axioms on observable data, such as likelihood judgments, that characterize a unique belief function related to these data. Yet, it shares with CEU the representation of uncertainty by a nonadditive measure, and the potential interpretation of this nonadditive measure as a lower bound on what the “real”, additive probability might be.
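The construction can be sketched as follows. The evidence weights below are invented for the example; only the unspecific weight on {H, T}, which cannot be split between the two states, creates the nonadditivity:

```python
# Hypothetical normalized evidence weights (a "basic mass assignment"):
# 0.5 points specifically to {H}, 0.1 to {T}, and 0.4 is nonspecific,
# attached to {H, T} without singling out either state.
masses = {frozenset({"H"}): 0.5,
          frozenset({"T"}): 0.1,
          frozenset({"H", "T"}): 0.4}

def belief(event):
    """Total weight of evidence for `event`: the masses of all its subsets."""
    return sum(m for focal, m in masses.items() if focal <= event)

H, T, HT = frozenset({"H"}), frozenset({"T"}), frozenset({"H", "T"})
# The resulting belief function is nonadditive:
# bel({H}) + bel({T}) = 0.6 < 1 = bel({H, T}),
# and it is a lower bound: any additive p with p >= bel has 0.5 <= p(H) <= 0.9.
```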
1.5.3. Multiple priors with unanimity ranking

Suppose that a decision maker entertains a set of probability measures as possible priors. For every act, she has a range of possible expected utility values, computed according to these priors. In making a decision, the decision maker may summarize this range by a single number, as suggested by the maxmin, Hurwicz’s, or some other criterion. But she may also refrain from collapsing this set of expected utilities into a single number. Rather, she may retain the entire set of EU values, indexed by the priors, as a representation of the act’s desirability. It is then natural to suggest that act f is preferred to act g if and only if, for each and every possible probability measure p, the EU value of f with respect to p is above that of g. If we think of each possible prior as the opinion of a given individual, then f is preferred to g if and only if f is considered to be better than g unanimously. Alternatively, if we were to think of probability measures as columns in a decision matrix that specifies, for every act, a row of EU values, this criterion would coincide with strict domination.14 This decision rule was axiomatized, independently, by Gilboa (1984) and Bewley (1986, 2003), both relying on a theorem of Aumann (1962).15
Strict domination is, obviously, a partial ordering. It follows that the decision theory one ends up with will have to remain silent on certain choices. Specifically, if an act f is preferred to another act g according to some priors, but the converse preference holds for other priors, the theory does not offer any prediction of choice. This violation of the completeness axiom is viewed by many as problematic for several reasons. Theoretically, the completeness axiom is often justified by necessity: a decision has to be made, so that ultimately revealed preference will have to decide whether f is (at least weakly) preferred to g or vice versa. From a more practical viewpoint, it is often hard to conduct economic analysis when the theory leaves considerable freedom in terms of its predictions. Bewley’s attempt to deal with these difficulties was to suggest that there always is a “status quo” act, f0, which gets to be chosen unless another act dominates it, in which case the dominating act becomes the new status quo. Bewley has not offered a theory of how the status quo is generated. However, it is still possible that such a theory will be offered and complement the unanimity multiple prior model as an alternative to MMEU.
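The unanimity ranking, and its incompleteness, can be sketched as follows. The priors and utility vectors are made up; the third comparison is one on which the rule stays silent:

```python
def eu(u, p):
    """Expected utility of utility vector u under probability vector p."""
    return sum(ps * us for ps, us in zip(p, u))

def unanimously_preferred(f, g, priors):
    """f is preferred to g iff its expected utility is higher under every prior."""
    return all(eu(f, p) > eu(g, p) for p in priors)

priors = [(p, 1 - p) for p in (0.4, 0.5, 0.6)]  # assumed set of priors over {H, T}
f, g, h = (3, 3), (1, 2), (6, 0)
# f dominates g under every prior, but f and h are incomparable:
# h is better when p(H) = 0.6, worse when p(H) = 0.4, so neither
# unanimously_preferred(f, h, priors) nor unanimously_preferred(h, f, priors)
# holds, and the theory offers no prediction for that choice.
```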
1.6. Conclusion

Choquet expected utility and MMEU were suggested as theories of decision making under uncertainty, rejecting the first tenet of Bayesianism. While some researchers view them solely as theories of bounded rationality, their starting point is not mistakes that decision makers might commit, but the theoretical inadequacy of the Bayesian paradigm. Specifically, when there is insufficient information for the generation of a prior probability, it is not obvious how one can choose a prior rationally. Instead, one may entertain uncertainty, and make decisions in a way that reflects one’s state of knowledge. The chapters in this volume constitute a sample of the works published on uncertainty in economic theory. They are divided into two main parts, theory and applications, each containing a more detailed introduction that may help to orient the reader. It (almost) goes without saying that this volume is not exhaustive. Many very good papers, published and unpublished, were written on the topics discussed here, but could not be included in the volume due to obvious constraints. In making the particular selection we offer the reader here, we strove for brevity and variety, in the hope of whetting the reader’s appetite, and with no claim to exhaust the important contributions to this literature. A final caveat relates to terminology. As is often the case, authors who contributed to this volume do not always agree on the appropriate terms for various concepts. In particular, some authors feel very strongly about the choice between “uncertainty” and “ambiguity” (and, correspondingly, between “uncertainty aversion” and “ambiguity aversion”), while they refer, for the most part, to the same concept. Similarly, “MMEU” is sometimes referred to as “MEU” (leaving the maximization out, as in “EU” and “CEU”), or as “the multiple prior model”. Since
all relevant concepts are defined formally in a way that leaves no room for confusion, we decided to let everyone use the terms they prefer, in the hope that the diverse terms would lead to more fruitful associations.
Acknowledgments

I thank my colleagues for comments and suggestions. In particular, this introduction has benefited greatly from many comments by Peter Wakker.
Notes

1 Two more assumptions are often entailed by “Bayesianism”. First, a Bayesian is supposed to conceive of all relevant eventualities. Second, she is expected to apply the Bayesian approach to any decision problem. We do not dwell on these assumptions here.
2 Moreover, the agent may use her probabilistic beliefs in her decisions in a way that uniquely identifies these beliefs, yet that differs from EU maximization, as suggested by Machina and Schmeidler’s (1992) “probabilistic sophistication”.
3 Savage’s axiom P2 states that, if two acts are equal on a given event, then it should not matter what they are equal to on that event. That is, one can determine preference between them based solely on their values on the event on which they differ. This axiom is often referred to as “the Sure Thing Principle”, though this term has been used in several other ways as well. See Wakker (Chapter 2).
4 In fact, Schmeidler was not aware of Ellsberg’s work when he started his study in the early 1980s.
5 Schmeidler’s work appeared as a working paper in 1982. Some of the mathematical analysis was published separately in Schmeidler (1986). However, it was not until 1989 that his main paper appeared in print.
6 Schmeidler’s choice of the letter v to denote a nonadditive measure was probably guided by his past experience with cooperative game theory, in which the letter v is a standard notation for the characteristic function of a transferable utility cooperative game (which is a nonadditive set function). See Shapley (1965).
7 This set of constraints defines the core of v. Yet, even if this set is nonempty, v need not be convex. It will, however, be exact (see Schmeidler, 1972).
8 That is, if p and q are in C, then so is αp + (1 − α)q.
9 Gilboa and Schmeidler’s (1989) derivation was conducted in the framework of Anscombe and Aumann (1963). Derivations that do not resort to objective probabilities were provided by Casadesus-Masanell et al. (2000) and Ghirardato et al. (2001).
10 To be concrete, with a finite state space, the beliefs modeled by CEU are characterized by finitely many parameters, whereas the beliefs modeled by MMEU constitute an infinitely dimensional class. This does not imply that MMEU is a generalization of CEU. MMEU only generalizes CEU with a convex capacity. Generally, capacities need not be convex. Moreover, CEU may reflect uncertainty liking behavior, whereas a certain form of uncertainty aversion is built into MMEU.
11 The axioms of MMEU in the Anscombe–Aumann framework are also easier to interpret than those of CEU.
12 The term “rank dependent utility” is also used for this model.
13 Prospect theory deals with prospects rather than with lotteries, namely, with outcomes as they are perceived relative to a reference point. The notion of a reference point, and the idea that people respond to changes rather than to absolute levels, are ingredients of PT that we ignore here.
14 Alternatively, one may borrow the idea of interval orders (Fishburn, 1985) and argue that f is preferred to g if and only if the entire interval of expected utility values of f
is above that of g. This would correspond to the notion of “overwhelming” strategy in game theory, which is evidently stronger than domination.
15 Gilboa (1984) appeared in a master’s thesis, and has not been translated from Hebrew. Bewley’s paper appeared as a Cowles Foundation discussion paper in 1986. Bewley took this decision rule as a building block of an elaborate theory.
References

Allais, M. (1953), “Le Comportement de l’Homme Rationnel devant le Risque: Critique des Postulats et Axiomes de l’Ecole Américaine,” Econometrica, 21: 503–546.
Anscombe, F. J. and R. J. Aumann (1963), “A Definition of Subjective Probability,” The Annals of Mathematical Statistics, 34: 199–205.
Aumann, R. J. (1962), “Utility Theory without the Completeness Axiom,” Econometrica, 30: 445–462.
Bayes, T. (1764), “Essay Towards Solving a Problem in the Doctrine of Chances,” Philosophical Transactions of the Royal Society of London.
Bewley, T. F. (2003), “Knightian Decision Theory, Part I,” Decisions in Economics and Finance, 25: 79–110 (Cowles Foundation Discussion Paper, 1986).
Casadesus-Masanell, R., P. Klibanoff, and E. Ozdenoren (2000), “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory, 92: 35–65.
Chew, S. H. (1983), “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica, 51: 1065–1092.
Choquet, G. (1953–4), “Theory of Capacities,” Annales de l’Institut Fourier, 5 (Grenoble): 131–295.
de Finetti, B. (1937), “La Prévision: Ses Lois Logiques, Ses Sources Subjectives,” Annales de l’Institut Henri Poincaré, 7: 1–68.
Dempster, A. P. (1967), “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38: 325–339.
Edwards, W. (1954), “The Theory of Decision Making,” Psychological Bulletin, 51: 380–417.
Ellsberg, D. (1961), “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75: 643–669.
Fishburn, P. C. (1978), “On Handa’s ‘New Theory of Cardinal Utility’ and the Maximization of Expected Return,” Journal of Political Economy, 86: 321–324.
Fishburn, P. C. (1985), Interval Orders and Interval Graphs. New York: John Wiley and Sons.
Ghirardato, P., F. Maccheroni, M. Marinacci et al. (2001), “A Subjective Spin on Roulette Wheels,” Econometrica, 71 (6): 1897–1908, November 2003. (Reprinted as Chapter 6 in this volume.)
Gilboa, I. (1984), Aggregation of Preferences, MA Thesis, Tel-Aviv University.
Gilboa, I. and D. Schmeidler (1989), “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics, 18: 141–153.
Jaffray, J.-Y. (1989), “Linear Utility Theory for Belief Functions,” Operations Research Letters, 8: 107–112.
Kahneman, D. and A. Tversky (1979), “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47: 263–291.
Klibanoff, P., M. Marinacci, and S. Mukerji (2003), “A Smooth Model of Decision Making under Ambiguity,” mimeo.
Introduction
19
Knight, F. H. (1921), Risk, Uncertainty, and Profit. Boston and New York: Houghton Mifflin.
Machina, M. and D. Schmeidler (1992), “A More Robust Definition of Subjective Probability,” Econometrica, 60: 745–780.
Preston, M. G. and P. Baratta (1948), “An Experimental Study of the Auction Value of an Uncertain Outcome,” American Journal of Psychology, 61: 183–193.
Quiggin, J. (1982), “A Theory of Anticipated Utility,” Journal of Economic Behavior and Organization, 3: 323–343.
Ramsey, F. P. (1931), “Truth and Probability,” in The Foundations of Mathematics and Other Logical Essays. New York: Harcourt, Brace and Co.
Savage, L. J. (1954), The Foundations of Statistics. New York: John Wiley and Sons.
Schmeidler, D. (1972), “Cores of Exact Games, I,” Journal of Mathematical Analysis and Applications, 40: 214–225.
Schmeidler, D. (1986), “Integral Representation without Additivity,” Proceedings of the American Mathematical Society, 97: 255–261.
Schmeidler, D. (1989), “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57: 571–587.
Segal, U. (1989), “Anticipated Utility: A Measure Representation Approach,” Annals of Operations Research, 19: 359–373.
Shafer, G. (1976), A Mathematical Theory of Evidence. Princeton, NJ: Princeton University Press.
Shapley, L. S. (1965), “Notes on n-Person Games VII: Cores of Convex Games,” The RAND Corporation R.M. Reprinted as: Shapley, L. S. (1972), “Cores of Convex Games,” International Journal of Game Theory, 1: 11–26. (Reprinted as Chapter 5 in this volume.)
Tversky, A. and D. Kahneman (1974), “Judgment under Uncertainty: Heuristics and Biases,” Science, 185(4157): 1124–1131.
Tversky, A. and D. Kahneman (1981), “The Framing of Decisions and the Psychology of Choice,” Science, 211(4481): 453–458.
Tversky, A. and D. Kahneman (1992), “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty, 5: 297–323.
Weymark, J. A. (1981), “Generalized Gini Inequality Indices,” Mathematical Social Sciences, 1: 409–430.
Yaari, M. E. (1987), “The Dual Theory of Choice under Risk,” Econometrica, 55: 95–115.
2
Preference axiomatizations for decision under uncertainty Peter P. Wakker
Several contributions in this book present axiomatizations of decision models, and of special forms thereof. This chapter explains the general usefulness of such axiomatizations, and reviews the basic axiomatizations for static individual decisions under uncertainty. It will demonstrate that David Schmeidler’s contributions to this field were crucial.
2.1. The general purpose of axiomatizations

In this section we discuss some general purposes of axiomatizations. In particular, the aim is to convince the reader that axiomatizations are an essential step in the development of new models. To start, imagine that you are a novice in decision theory, and have an important decision to take, say which of several risky medical treatments to undergo. You consult a decision theorist, and she gives you a first piece of advice, as follows:

1 List all relevant uncertainties. In your case we assume that the uncertainty concerns which of n potential diseases s1, . . . , sn is the one you have.
2 Express your uncertainty about what your disease is numerically through probabilities p1, . . . , pn, subjective if necessary.
3 Express numerically how good you think the result of each treatment is conditional upon each disease. Call these numbers utilities.
4 Of the available treatments, choose the one that maximizes expected utility, that is, the probability-weighted average utility.
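The recipe above amounts to computing a probability-weighted average of utilities for each treatment and picking the maximizer. A minimal sketch, with entirely made-up probabilities and utilities for two hypothetical treatments:

```python
# Hypothetical numbers for the treatment example: three candidate diseases
# s1, s2, s3 with subjective probabilities, and two invented treatments.
probs = [0.5, 0.3, 0.2]                      # p1, ..., pn (must sum to 1)
utilities = {                                # utility of each treatment's
    "treatment A": [0.9, 0.4, 0.1],          # result, conditional on disease
    "treatment B": [0.6, 0.6, 0.5],
}

def seu(util_row, probs):
    """Expected utility: the probability-weighted average utility."""
    return sum(p * u for p, u in zip(probs, util_row))

best = max(utilities, key=lambda t: seu(utilities[t], probs))
print("choose:", best)
```

Here treatment A narrowly wins (0.59 versus 0.58); the numbers are illustrative only.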
Presented in this way, the first piece of advice is ad hoc, and will not convince you. What are such subjective probabilities, and how are you to choose them? Similar questions apply to the utility numbers. And, if such numbers can be chosen, why should you take products of probabilities and utilities, and then sum these products? Why not use other mathematical operations? The main problem with the first piece of advice is that its concepts of probabilities and utilities do not have a clear meaning. They are theoretical constructs, which means that they have no meaning in isolation, but can only get meaning within a model, in relation to other concepts. The decision theorist did not succeed in convincing you, and she now turns to a second piece of advice, seemingly very different. She explains the meaning of transitivity
and completeness of preferences to you, and you declare that you want to satisfy these conditions. She next explains the sure-thing principle to you, meaning that a choice between two treatments should depend only on their results under those diseases where the treatments differ, and not on the results for diseases for which the two treatments give the same results. Let us assume that you want to satisfy this condition as well. Next the decision theorist succeeds in convincing you of the appropriateness of the other preference conditions of Savage (1954). Satisfying these conditions is the decision analyst’s second piece of advice. The second piece of advice is of a different nature from the first. All of its conditions are stated directly in terms of choices. Even if you do not agree with the appropriateness of all conditions, at least you can relate to them and know what they mean. They do not concern strange undefined theoretical concepts. Still, and this was Savage’s (1954) surprising result, the two pieces of advice turn out to be equivalent: one holds if and only if the other holds, given a number of technical assumptions that we ignore here. Whereas the second piece of advice seemed to be entirely different from the first, it turns out to be the same. The second piece of advice translates the first, which was stated in a theoretical language, into the meaningful language of empirical primitives, that is, preferences. Such translations are called axiomatizations. They reformulate, directly in terms of the observable primitives such as choices, what it means to assume that some theoretical model holds. A decision model is normatively appropriate if and only if its characterizing axioms are, and is descriptively valid if and only if the characterizing axioms are. Axiomatizations can be used to justify a model, but also to criticize it. Expected utility can be criticized by criticizing, for instance, the sure-thing principle. This is what Allais (1953) did.
If a model is to be falsified empirically, then axioms can help because they are stated in terms of directly testable empirical primitives. In applications, we usually do not believe models to hold perfectly, and use them as approximations or as metaphors, to clarify some aspects of reality that are relevant to us. We mostly do not actually measure the concepts used in models. For instance, most economic models assume that consumers maximize utility, but we rarely measure consumers’ utility functions. The assumption of utility maximization is justified by the belief that, for the topics considered, completeness and transitivity of preference are reasonable assumptions. These preference axioms, jointly with continuity, axiomatize the maximization of utility and clarify the validity and limitations thereof. Axiomatizations are crucial at an early stage of the development of new models or concepts, namely at the stage where setups and intuitions are qualitative but quantifications seem to be desirable. Not only do axiomatizations show how to verify or falsify, and how to justify or criticize, given models, but they also identify the essential parameters and concepts to be measured or determined. Without axiomatizations of expected utility, Choquet expected utility (CEU), and multiple priors, it would not be clear whether their concepts, such as utility, are sensible at all, or whether they are the right parameters to be assessed. A historical example may illustrate the importance of axiomatizations. For a long time, models were popular that deviated from expected utility by transforming
probabilities of separate outcomes, such as those examined by Edwards (1955) and Kahneman and Tversky (1979). These models were never axiomatized, which could have served as a warning signal that something was wrong. Indeed, in 1978, Fishburn discovered that no sensible axiomatization of such models will ever be found because these models violate basic axioms such as continuity and, even more seriously, stochastic dominance. When Quiggin (1982) and Schmeidler (1989, first version 1982) introduced alternative models of nonlinear probabilities, they took good care of providing axiomatic foundations. This made clear what the empirical meaning of their models is, that these models do not contain intrinsic inconsistencies, and that their concepts of utilities and nonlinear probabilities are sensible. Quiggin (1982) and Schmeidler (1989) independently developed the idea of rank-dependence and, thus, were the first to present sound models that allow for a new component in individual decision theory: a subjective decision attitude toward incomplete information (i.e. risk and uncertainty). This new component is essential for the study of decision under incomplete information, and sound models for handling it had been dearly missing in the literature up to that point. I consider this development the main step forward for decision under incomplete information of the last decades. Quiggin developed his idea for decision under risk, Schmeidler for the more important and more subtle domain of decision under uncertainty, which is the topic of this book. Axioms can be divided into three different classes. First there are the basic rationality axioms such as transitivity, completeness, and monotonicity, which are satisfied by most models studied today. For descriptive purposes, it has become understood during the last decades that these very basic axioms are the main cause of most deviations from theoretical models. 
For normative applications, these axioms are relatively uncontroversial, although there is no unanimous agreement on any axiom. The second class of axioms consists of technical axioms, mostly continuity, that impose a richness on the structures considered. For decision under uncertainty, these axioms impose a richness on the state space or on the outcome space. They are usually necessary for obtaining mathematical proofs, and will be further discussed later in this chapter. The third and final class of axioms consists of the “intuitive” axioms that are most characteristic of the models they characterize. They vary from model to model. For expected utility, the sure-thing principle (which amounts to the independence axiom for given probabilities) is the most characteristic axiom. Most axiomatizations of nonexpected utility models have relaxed this axiom. Many examples will be discussed in the following sections, and in other chapters in this book. I end this introduction with a quotation from Gilboa and Schmeidler (2001), who concisely listed the purposes of axiomatizations as follows:

Meta-theoretical: Define theoretical terms by observables (and enable their elicitation).
Descriptive: Define terms of refutability.
Normative: Do the right thing.
2.2. General conditions for decision under uncertainty

S denotes a state space, with elements called states (of nature). Exactly one state is true, the others are not true. The decision maker does not know which state is the true one, and has no influence on the truth of the states (no moral hazard). For example, assume that a horse race will take place. Exactly one horse will win the race. Every s ∈ S refers to one of the horses participating, and designates the “state of nature” that this horse will win the race. Alternative terms for state of nature are state of the world or proposition. An event is a subset of S, and is true or obtains if it contains the true state of nature. For example, the event “A Spanish horse will win” is the set {s ∈ S: s is Spanish}. C denotes the outcome space, and F the set of acts. Formally, acts are functions from S to C, and F contains all such functions. A decision maker should choose between different acts. An act f will yield the outcome f(s) for the decision maker where s is the true state of nature. Because the decision maker is uncertain about which state is true, she is uncertain about what outcome will result from an act, and has to make decisions under uncertainty. An alternative term for an act is state-contingent payoffs, and acts can refer to financial assets. Acts can be considered random variables with the randomness not expressed through probabilities but through states of nature. David Schmeidler is known for his concise ways of formulating things. In the abstract of Schmeidler (1989), he used only seven words to describe the above model: “Acts map states of nature to outcomes.” By ≽, a binary relation on F, we denote the preference relation of the decision maker over acts. In decision under uncertainty, we study properties of the quadruple (S, C, F, ≽). A function V represents ≽ if V: F → R and f ≽ g if and only if V(f) ≥ V(g).
If a representing function exists, then ≽ must be a weak order, that is, it is complete (f ≽ g or g ≽ f for all acts f, g) and transitive. Completeness implies reflexivity, that is, f ≽ f for all acts f. We write f ≻ g if f ≽ g and not g ≽ f, f ∼ g if f ≽ g and g ≽ f, f ≺ g if g ≻ f, and f ≼ g if g ≽ f. For a weak order ≽, ∼ is an equivalence relation, that is, it is symmetric (f ∼ g if g ∼ f), transitive, and reflexive. Outcomes are often identified with the corresponding constant acts. In this way, ≽ on F generates a binary relation on C, denoted by the same symbol and identified with the restriction of ≽ to the constant acts. Decision under risk refers to the special case of decision under uncertainty where an objective probability measure Q on S is given, and f ∼ g whenever f and g generate the same probability distribution over C. Then the only information relevant for the preference value of an act is the probability distribution that the act generates over the outcomes. Therefore, acts are usually identified with the probability distributions generated over the outcomes, and S is suppressed from the model. It is useful to keep in mind, though, that probabilities must be generated by some random process, and that some randomizing state space S is underlying, even if not an explicit part of the model. It is commonly assumed in decision under risk that S is rich enough to generate all probabilities, and all probability distributions. My experience in decision under risk and uncertainty has been that
formulations of concepts for the general context of uncertainty are more clarifying and intuitive than formulations restricted only to the special case of risk. This chapter will focus on axiomatizations for decision under uncertainty, the central topic of this book, and will not discuss axiomatizations for decision under risk. Often, axiomatizations for decision under risk readily follow simply by restricting the axioms for uncertainty to the special case of risk. For example, Yaari’s (1987) axiomatization of rank-dependent utility for risk can be obtained as a mathematical corollary of Schmeidler (1989); I will not elaborate on this point. We will also restrict attention to static models, and will not consider dynamic decision making or multistage models such as examined by Luce (2000) unless they serve to interpret static models. Other restrictions are that we only consider individual decisions, and do not examine decompositions of multiattribute outcomes. We will not discuss topological or measure-theoretic details either, and primarily refer to works introducing results rather than to follow-up works and generalizations.

The most well-known representation for decision under uncertainty is subjective expected utility (SEU). SEU holds if there exist a probability measure P on S and a utility function U: C → R such that f ↦ ∫S U(f(s)) dP(s), the SEU of f, represents preferences. For infinite state spaces S, measure-theoretic conditions can be imposed to ensure that the expectation is well defined for all acts considered. For the special case of decision under risk, P has to agree with the objective probability measure on S under mild richness assumptions regarding S, contrary to what has often been thought in the psychological literature. In general, P need not be based on objective statistical information, and may be based on subjective judgments of the decision situation in the same way as U is. P is, therefore, often called a subjective probability measure.
SEU implies monotonicity, that is, f ≽ g whenever f(s) ≽ g(s) for all s, where furthermore f ≻ g if f(s) = α ≻ β = g(s) for outcomes α, β and all s in an event E that is “nonnull” in some sense. E being nonnull means that the outcomes of E can affect the preference value of an act, in a way that depends on the theory considered and that will not be formalized here. The most important implication of SEU is the sure-thing principle, discussed informally in the introduction. It means that a preference between two acts is not affected if, for an event for which the two acts yield the same outcome, that common outcome is changed into another common outcome. The condition holds true under SEU, because an event with a common outcome contributes the same term to the expected-utility integral of both acts, which will cancel from the comparison irrespective of what that common outcome is. Savage (1954) introduced this condition as his P2. He did not use the term sure-thing principle for this condition alone, but for a broader idea. The term is, however, used exclusively for Savage’s P2 nowadays. In a mathematical sense, the sure-thing principle can be equated with separability from consumer demand theory, although Savage developed his idea independently. The condition can be derived from principles for dynamic decisions (Burks, 1977: chapter 5; Hammond, 1988), a topic that falls outside the scope of this chapter.
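The cancellation argument behind the sure-thing principle is easy to check numerically. The sketch below, with assumed probabilities, utility function, and outcomes, varies the common outcome on one event and confirms that the SEU comparison never reverses:

```python
# Two acts f and g that agree on event E3; the probabilities and the
# utility function are assumed for illustration only.
P = [0.2, 0.3, 0.5]                 # subjective probabilities of E1, E2, E3

def U(x):
    return x ** 0.5                 # some increasing utility function

def seu(outcomes):
    return sum(p * U(x) for p, x in zip(P, outcomes))

baseline = seu([9.0, 1.0, 0.0]) >= seu([4.0, 4.0, 0.0])
for common in [0.0, 1.0, 100.0]:    # vary the shared outcome on E3
    f = [9.0, 1.0, common]
    g = [4.0, 4.0, common]
    # the common term P[2]*U(common) appears in both integrals and cancels
    assert (seu(f) >= seu(g)) == baseline
print("comparison unchanged for every common outcome")
```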
The sure-thing principle is too weak to imply SEU. For instance, for a fixed partition (A1, . . . , An) of S, and acts (A1: x1; . . . ; An: xn) yielding xj for each s ∈ Aj, the sure-thing principle amounts to an additively decomposable representation V1(x1) + · · · + Vn(xn), under some technical assumptions discussed later. This representation is strictly more general than the SEU representation P(A1)U(x1) + · · · + P(An)U(xn), for instance if V2 = exp(V1). It can be interpreted as state-dependent expected utility (Karni, 1985). Therefore, additional conditions are required to imply the SEU model. The particular reinforcements of the sure-thing principle depend on the particular model chosen, and are discussed in the next section.
2.3. Conditions to characterize subjective expected utility

The most desirable characterization of SEU, or any model, would concern an arbitrary set of preferences over acts, not necessarily a complete set of preferences over a set F, and would give necessary and sufficient conditions for the preferences considered to be representable by SEU. Most important would be the case of a finite set of preferences, to truly capture the empirical and normative meaning of models such as SEU. Unfortunately, such general results are very difficult to obtain. For SEU, necessary and sufficient conditions for finite models were given by Shapiro (1979). These conditions are, however, extremely complex, and amount to general solvability requirements of inequalities for mathematical models called rings. They do not clarify the intuitive meaning of the model. Therefore, people have usually resorted to continuity conditions so as to simplify the axiomatizations of models. These continuity conditions imply richness of either the state space or the outcome space. Difficulties in using such technical richness conditions are discussed by Krantz et al. (1971: section 9.1) and Pfanzagl (1968: section 9.5). The following discussion is illustrated in Table 2.1.

The most prominent model with richness of the state space is Savage (1954). Savage added an axiom P4 to the sure-thing principle, requiring that a preference for betting on one event rather than another is independent of the stakes of the bets. The richness of the state space was ensured by an axiom P6 requiring arbitrarily fine partitions of the state space to exist, so that the state space must be atomless. Decision under risk can be considered a special case of decision under uncertainty where the state space is rich, because it is commonly assumed that all probabilities can be generated by random events. Other than that, there have not been many derivations of SEU with a rich state space.
Most axiomatizations have imposed richness structure on the outcome space, to which we turn in the rest of this section. We start with approaches that assume convex subsets of linear spaces as outcome space, with linear utility. In these approaches, outcomes are either monetary, with C ⊂ R an interval, or they are probability distributions over a set of prizes. The sure-thing principle is reinforced into linearity with respect to addition (f ≽ g ⇒ f + c ≽ g + c for acts f, g, c, where addition is statewise), or mixing (f ≽ g ⇒ λf + (1 − λ)c ≽ λg + (1 − λ)c for acts f, g, c,
Table 2.1 Axiomatizations and their structural assumptions

| Structural assumption | SEU | CEU | PT | Multiple priors |
| --- | --- | --- | --- | --- |
| Continuous state space | Savage (1954) | Gilboa (1987)* | | |
| U linear in money | de Finetti (1931, 1937) | Chateauneuf (1991) | | Chateauneuf (1991) |
| U linear in probability mixing, 2-stage | Anscombe and Aumann (1963) | Schmeidler (1989) | | Gilboa and Schmeidler (1989) |
| Canonical probabilities | Raiffa (1968), Sarin and Wakker (1997) | Sarin and Wakker (1992) | Sarin and Wakker (1994) | |
| Continuous U, tradeoff consistency | Wakker (1984) | Wakker (1989) | Tversky and Kahneman (1992) | |
| Continuous U, multisymmetry | Nakamura (1990) | Nakamura (1990) | × | |
| Continuous U, act-independence | Gul (1992) | Chew and Karni (1994), Ghirardato et al. (2003) | × | Ghirardato et al. (2003); Casadesus-Masanell et al. (2000) |

Notes
× Such an extension is not possible, because the required certainty equivalents are not contained in most of the sign-comonotonic sets.
* Required more modifications than only comonotonic restrictions.
where mixing is statewise, and under continuity can be restricted to λ = 1/2). Both of these approaches characterize SEU with a linear utility function. The additive approach was followed by de Finetti (1931, 1937) and Blackwell and Girshick (1954: theorem 4.3.1 and problem 4.3.1). For the mixture approach, Anscombe and Aumann (1963) provided the most appealing result. For earlier results on mixture spaces, see Arrow (1951: 431–432). In addition to the axioms mentioned, these works used weak ordering, monotonicity (this, together with additivity, is what de Finetti’s book-making amounts to), and some continuity (existence of “fair prizes” for de Finetti, continuous mixing for the mixture approaches). In the mixture approaches, the linear utility function is interpreted as an expected utility functional for the probability distributions over prizes, and acts are two-stage: In the first stage, the uncertainty about the true state of nature is resolved, yielding a probability distribution over prizes; in the second stage, the probability distribution is resolved, finally leading to a prize. This approach assumes that the two stages are processed through backwards induction (“folding back”). The second-stage probabilities could also be modeled through a rich product state space, but for this survey the categorization as rich outcomes is more convenient.
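The folding-back evaluation of a two-stage act can be sketched as follows; the states, subjective probabilities, lotteries, and linear utility are all illustrative assumptions, not data from the text:

```python
# A two-stage act: each state yields an objective lottery over prizes.
subjective_P = {"s1": 0.6, "s2": 0.4}   # first-stage (subjective) beliefs
act = {
    "s1": [(0.5, 10.0), (0.5, 0.0)],    # (objective probability, prize)
    "s2": [(1.0, 4.0)],
}

def U(prize):
    return prize                        # linear utility, for simplicity

def lottery_eu(lottery):
    # second stage: expected utility of the objective lottery
    return sum(q * U(z) for q, z in lottery)

def two_stage_seu(act):
    # first stage ("folding back"): weight each state's lottery value
    # by its subjective probability
    return sum(p * lottery_eu(act[s]) for s, p in subjective_P.items())

print(round(two_stage_seu(act), 6))     # 4.6
```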
An alternative to Anscombe and Aumann’s (1963) approach was customary in the early decision-analysis literature of the 1960s (Raiffa, 1968: chapter 5). As in Anscombe and Aumann (1963), a rich set of events with objectively given probabilities was assumed present, with preferences over acts on these events governed by expected utility. However, these events were not part of a second stage to be resolved after the events of interest, but they were simply a subset of the collection of events considered in the first, and only, stage. Formally, this approach belongs to the category that requires a rich state space. To evaluate an arbitrary act (A1 : x1 ; . . . ; An : xn ), where no objective probabilities are given for the events Aj , a canonical representation (E1 : x1 ; . . . ; En : xn ) is constructed. Here each event Ej does have an objective probability and is equally likely as event Aj in the sense that one would just as well bet $1 on Ej as on Aj . It is assumed that such canonical representations can be constructed and are preferentially equivalent. In this manner, SEU is obtained over all acts. Sarin and Wakker (1997) formalized this approach. Ramsey (1931) can be interpreted as a variation of this canonical approach, with his “ethically neutral” event an event with probability half, utility derived from gambles on this event, and the extension of SEU to all acts and events not formalized. Returning to the approach with rich outcome sets, more general axiomatizations have been derived for continuous instead of linear utility. Then C can, more generally, be a connected topological space. For simplicity, we continue to assume that C is a convex subset of a linear space. Pfanzagl (1959) gave an axiomatization of SEU when restricted to two-outcome acts. He added a bisymmetry axiom to the sure-thing principle. Denote by CE(f ) a certainty equivalent of act f , that is, an outcome (identified with a constant act) equivalent to f . 
For events A, M with complements Aᶜ, Mᶜ, bisymmetry requires that
(A: CE(M: x1; Mᶜ: y1); Aᶜ: CE(M: x2; Mᶜ: y2)) ∼ (M: CE(A: x1; Aᶜ: x2); Mᶜ: CE(A: y1; Aᶜ: y2)).
For arbitrary finite state spaces S, Grodal (1978) axiomatized SEU with continuous utility using a mean-groupoid operation (a generalized mixture operation derived from preference) developed by Vind. These works were finally published in Vind (2003). Wakker (1984, 1993) characterized SEU for continuous utility using a tradeoff consistency technique based on the conjoint measurement theory of Krantz et al. (1971) and suggested by Pfanzagl (1968: end of remark 9.4.5). The basic axiom requires that
(A1: α; A2: x2; . . . ; An: xn) ≽ (A1: β; A2: y2; . . . ; An: yn),
(A1: γ; A2: x2; . . . ; An: xn) ≺ (A1: δ; A2: y2; . . . ; An: yn), and
(A1: v1; . . . ; An−1: vn−1; An: α) ≼ (A1: v1; . . . ; An−1: vn−1; An: β)
imply
(A1: v1; . . . ; An−1: vn−1; An: γ) ≼ (A1: v1; . . . ; An−1: vn−1; An: δ),
where (A1, . . . , An) can be any partition of S. By renumbering, similar conditions follow for outcomes α, β, γ, δ conditional on all pairs of events Ai, Aj. Nakamura (1990) used multisymmetry, a generalization of Pfanzagl’s (1959, 1968) bisymmetry to general acts, to characterize SEU with continuous utility for finite state spaces. Similar conditions had appeared before in decision under risk (Quiggin, 1982; Chew, 1989). Chew called the condition event commutativity. Consider a partition (A1, . . . , An) and a “mixing” event M with complementary event Mᶜ. Multisymmetry requires that
(A1: CE(M: x1; Mᶜ: y1); . . . ; An: CE(M: xn; Mᶜ: yn)) ∼ (M: CE(A1: x1; . . . ; An: xn); Mᶜ: CE(A1: y1; . . . ; An: yn)).
Multisymmetry implies that (x1, . . . , xn) is separable in (A1: CE(M: x1; Mᶜ: c1); . . . ; An: CE(M: xn; Mᶜ: cn)). This implication is called act-independence, and was introduced by Gul (1992). Formally, the condition requires that (A1: x1; . . . ; An: xn) ≽ (A1: y1; . . . ; An: yn) implies
(A1: CE(M: x1; Mᶜ: c1); . . . ; An: CE(M: xn; Mᶜ: cn)) ≽ (A1: CE(M: y1; Mᶜ: c1); . . . ; An: CE(M: yn; Mᶜ: cn)).
Gul showed that this condition suffices to characterize SEU with continuous utility for finite state spaces, under the usual other assumptions. Gul used an additional symmetry requirement that was shown to be redundant by Chew and Karni (1994). Using bisymmetry axioms for two-outcome acts, Ghirardato et al. (2003a) defined a mixture operation that can be interpreted as an endogenous analog of the mixture operation used in Anscombe and Aumann (1963). They used it also to derive nonexpected utility models discussed in the next section. Characterizations of properties of utility such as concavity have mostly been studied for decision under risk, and less so for decision under uncertainty.
Also for uncertainty, utility is concave if and only if the subjective expected value of an act is always preferred to the act (Wakker, 1989: proposition VII.6.3.ii). This result is more difficult to prove than for decision under risk because not all probabilities need be available, and it is less useful because the subjective expected value is not directly observable, in the same way as subjective probabilities are not. More interesting for uncertainty is that utility is concave if and only if preferences are convex with respect to the mixing of outcomes, that is, if f ≽ g then ½f + ½g ≽ g, where outcomes are mixed statewise (Wakker, 1989: proposition VII.6.3.iv). This condition has the advantage that it is directly observable.
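This equivalence can be illustrated numerically: under SEU with a concave utility function, statewise mixtures are indeed weakly preferred. A sketch with an assumed concave utility, assumed probabilities, and randomly sampled acts:

```python
# Check, under SEU with a concave utility, that f preferred to g implies
# (1/2)f + (1/2)g preferred to g, with outcomes mixed statewise (Jensen).
import math
import random

P = [0.25, 0.25, 0.5]                   # assumed subjective probabilities

def seu(f):
    return sum(p * math.sqrt(x) for p, x in zip(P, f))   # concave utility

random.seed(0)
for _ in range(1000):
    f = [random.uniform(0.0, 100.0) for _ in P]
    g = [random.uniform(0.0, 100.0) for _ in P]
    if seu(f) >= seu(g):
        mix = [0.5 * a + 0.5 * b for a, b in zip(f, g)]
        assert seu(mix) >= seu(g)       # the statewise mixture is preferred
print("convexity of preference held in all sampled cases")
```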
2.4. Nonexpected utility models

This section considers models deviating from SEU.

Abandoning basic axioms. Models abandoning completeness (Bewley, 1986; Dubra et al., 2004), transitivity (Fishburn, 1982; Loomes and Sugden, 1982; Vind, 2003), or continuity (Fishburn and LaValle, 1993) will not be discussed. We will only discuss models that weaken the sure-thing principle. In this class, we will not discuss betweenness models (Chew, 1983; Dekel, 1986; Epstein, 1992). These models have been examined almost exclusively for risk, with statements for uncertainty only in Hazen (1987) and Sarin and Wakker (1998), and have nowadays lost popularity. Nor will we discuss quadratic utility (Chew et al., 1991), which has been stated only for decision under risk.

Choquet expected utility. The first nonexpected utility model that we discuss is rank-dependent utility, or Choquet expected utility (CEU) as it is often called when considered for uncertainty. We assume a utility function as under SEU, but instead of a subjective probability P on S we assume, more generally, a capacity W on S. W is defined on the collection of subsets of S, with W(∅) = 0, W(S) = 1, and A ⊇ B ⇒ W(A) ≥ W(B). ≽ is represented by f ↦ ∫S U(f(s)) dW(s), the CEU of f, defined next. Assume that f = (E1: x1; . . . ; En: xn). The integral equals π1U(x1) + · · · + πnU(xn), where the πj’s are defined as follows. Take a permutation ρ on {1, . . . , n} such that xρ(1) ≽ · · · ≽ xρ(n). Then πρ(j) = W(Eρ(1) ∪ · · · ∪ Eρ(j)) − W(Eρ(1) ∪ · · · ∪ Eρ(j−1)); in particular, πρ(1) = W(Eρ(1)). An important concept in CEU, introduced by Schmeidler (1989), is comonotonicity. Two acts f and g are comonotonic if f(s) ≻ f(t) and g(s) ≺ g(t) for no states s, t. A set of acts is comonotonic if every pair of its elements is comonotonic.
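The rank-dependent computation of the πj’s can be sketched as a short routine; the capacity and the act below are illustrative assumptions, not taken from the chapter:

```python
# A small Choquet-integral calculator following the definition above.
def ceu(outcomes, events, W, U=lambda x: x):
    """outcomes[j] obtains on the event labelled events[j]; W maps a
    frozenset of event labels to its capacity value."""
    # rank positions from best to worst outcome (the permutation rho)
    order = sorted(range(len(outcomes)), key=lambda j: U(outcomes[j]),
                   reverse=True)
    total, prev, cum = 0.0, 0.0, set()
    for j in order:
        cum.add(events[j])
        w = W[frozenset(cum)]
        total += (w - prev) * U(outcomes[j])   # pi_j = W(cum) - W(previous cum)
        prev = w
    return total

# an assumed capacity on two events a, b, with W(a) = W(b) = 0.3
W = {frozenset("a"): 0.3, frozenset("b"): 0.3, frozenset("ab"): 1.0}
print(round(ceu([10.0, 2.0], ["a", "b"], W), 6))   # 0.3*10 + 0.7*2 = 4.4
```

Note that the weight on the best outcome is W of its event, while worse outcomes receive marginal capacity increments, exactly as in the πρ(j) formula above.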
Comonotonicity is an important concept because, as can be proved, within any comonotonic subset of F the CEU functional is an SEU functional (with numbers such as the above π_ρ(j) playing the role of probabilities). It is, therefore, obvious that a necessary requirement for CEU is that all conditions of SEU hold within comonotonic subsets. Such restrictions are indicated by the prefix comonotonic, leading to the comonotonic sure-thing principle, etc. It is more complex to demonstrate that these comonotonic restrictions are also sufficient to imply CEU, but this can be proved in many circumstances. The third column of Table 2.1 gives the axiomatizations of CEU.

Prospect theory. Original prospect theory, introduced by Kahneman and Tversky (1979), assumed nonlinear probability weighting but had theoretical problems, and was defined only for risk, not for uncertainty. Only when Schmeidler (1989) introduced a sound model for nonlinear probabilities could a model of prospect theory be developed that is theoretically sound and that also deals with uncertainty (Tversky and Kahneman, 1992). We define it next. Under prospect theory, one outcome, called the reference outcome, plays a special role. Outcomes preferred to the reference outcome are gains; outcomes less preferred than the reference outcome are losses. The main deviation from other theories is that in different decision situations the decision maker may choose
Peter P. Wakker
different reference points, and remodel her decisions accordingly. Although there is much empirical evidence for such procedures, formal theories to describe them have not yet been developed. We will therefore restrict attention, in this theoretical chapter, to one fixed reference point. For results on varying reference points, see Schmidt (2003). With a fixed reference point, prospect theory generalizes CEU and SEU in that it allows for a different capacity, W⁻, for losses than for gains, where the gain capacity is denoted W⁺. Under prospect theory we define, for an act f, f⁺ by replacing all losses of f by the reference outcome, and f⁻ by replacing all gains of f by the reference outcome. Our notation f⁻ deviates from the mathematical convention that, for real-valued functions f, takes f⁻ to be a positive function, namely our f⁻ multiplied by −1. For general outcomes, however, such a multiplication cannot be defined, which explains our definition. The prospect theory (PT) value of an act f is PT(f) = CEU(f⁺) + CEU(f⁻), where CEU(f⁺) is taken with respect to W⁺ and CEU(f⁻) with respect to the dual of W⁻, which assigns 1 − W⁻(Aᶜ) to each event A (Aᶜ denotes the complement of A). Two acts f, g are sign-comonotonic if they are comonotonic and, further, there is no state s such that one of f(s), g(s) is a gain and the other a loss. A set of acts is sign-comonotonic if every pair of its elements is sign-comonotonic. Sign-comonotonicity plays the same role for PT as comonotonicity does for CEU. Within any sign-comonotonic set, PT agrees with SEU and, therefore, all conditions of SEU are satisfied within sign-comonotonic sets. A more difficult result, which can be proved in several situations, is that PT holds as soon as the sign-comonotonic conditions of SEU hold, that is, the restrictions of these conditions to sign-comonotonic subsets of acts. Axiomatizations of PT are given in the fourth column of Table 2.1.

Properties of utility and capacities under CEU and PT.
Specific properties of utilities and capacities have been characterized for CEU and PT alike. Schmeidler (1989) demonstrated, in his CEU model with linear utility, that the capacity is convex (W(A ∪ B) + W(A ∩ B) ≥ W(A) + W(B) for all events A, B) if and only if preferences are convex. Chateauneuf and Tallon (2002) generalized this result by showing that, under differentiability assumptions, preferences are convex if and only if utility is concave and the capacity W is convex. Wakker (2001) gave necessary and sufficient conditions for convexity of the capacity, without restricting the form of utility other than requiring continuity. Tversky and Wakker (1995) characterized a number of other conditions on capacities, such as bounded subadditivity, that are often found in experimental tests of prospect theory.

Multiple priors. Another popular deviation from expected utility is the multiple priors model. As in SEU, it assumes a utility function U over outcomes. It deviates by considering not one fixed probability measure but a set of probability measures. Say C is such a set of probability measures over S. Then an act f is evaluated by min_{P∈C} SEU_P(f), where SEU_P denotes SEU taken with respect to P. This defines the multiple priors model. It was first characterized by Gilboa and
Schmeidler (1989) in an Anscombe–Aumann setup, where outcomes designate probability distributions over prizes, evaluated through a linear utility function (an expected utility functional). In a comprehensive paper, Chateauneuf (1991) obtained the same characterization independently, also for linear utility, but with linearity relating to monetary outcomes. For two-outcome acts, the multiple priors model coincides with CEU, so that the common generalization of these two models that imposes the representation only on two-outcome acts can serve as a good starting point (Ghirardato and Marinacci, 2001). The axiomatization of the multiple priors model requires convexity of preference, implying that a representing functional is quasi-concave. It is mainly independence with respect to constant acts (f ≽ g ⇒ λf + (1−λ)c ≽ λg + (1−λ)c for all acts f, g and all constant acts c), imposed both in the Gilboa–Schmeidler approach and in Chateauneuf's approach, that ensures that the representing functional is even concave. A functional is concave if and only if it is the minimum of dominating linear functions, which, under appropriate monotonicity, must be expected utility functionals. Thus, the multiple priors model results. The axiomatization of multiple priors for continuous instead of linear utility has been obtained by Casadesus-Masanell et al. (2000), who used both bisymmetry-like and tradeoff-consistency-like axioms, and by Ghirardato et al. (2003a), who used bisymmetry-like axioms to define an endogenous mixture operation. A less conservative extension of the multiple priors model is the α-Hurwicz criterion, where acts are evaluated by α times the minimal SEU plus (1 − α) times the maximal SEU over C. It was axiomatized by Ghirardato et al. (2003b).

Probabilistic sophistication. We finally discuss probabilistic sophistication. The derivation of SEU can be divided into two steps.
In the first step, uncertainty is quantified through probabilities, and the only aspect relevant to the preference value of an act is the probability distribution that it generates over the outcomes. In the second step, the probability distribution over outcomes is evaluated through expected utility. Probabilistic sophistication refers to the first of these steps without imposing expected utility in the second step. A first characterization was given by Machina and Schmeidler (1992), with an appealing generalization in Epstein and Le Breton (1993). The main axiom is de Finetti's (1949) additivity: if you would rather bet on A than on B, then you would also rather bet on A ∪ D than on B ∪ D for any event D disjoint from A and B. Under appropriate richness of the event space, this axiom implies that there exists a probability measure P on the events such that you would rather bet on A than on B if and only if P(A) ≥ P(B) (for a review, see Fishburn, 1986). Additional assumptions then guarantee that two acts that generate the same probability distribution over outcomes are equivalent, which implies probabilistic sophistication.
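The multiple priors evaluation min_{P∈C} SEU_P(f) and the α-Hurwicz criterion discussed above can be sketched in a few lines. The two-state Ellsberg-style set of priors below is a hypothetical illustration, and C is taken to be finite for simplicity (since SEU is linear in P, the minimum over a closed convex set is attained at its extreme points).

```python
def seu(act, prior, utility=lambda x: x):
    """Subjective expected utility of act = {state: outcome} under one prior."""
    return sum(prior[s] * utility(act[s]) for s in act)

def maxmin_value(act, priors, utility=lambda x: x):
    """Multiple priors (maxmin) evaluation: worst-case SEU over the set C."""
    return min(seu(act, p, utility) for p in priors)

def alpha_hurwicz(act, priors, alpha, utility=lambda x: x):
    """alpha times the minimal SEU plus (1 - alpha) times the maximal SEU."""
    vals = [seu(act, p, utility) for p in priors]
    return alpha * min(vals) + (1 - alpha) * max(vals)

# Hypothetical illustration: the probability of state B is only known to lie
# between 1/3 and 2/3, so C contains the two extreme priors.
C = [{'R': 2 / 3, 'B': 1 / 3}, {'R': 1 / 3, 'B': 2 / 3}]
bet_on_B = {'R': 0, 'B': 1}   # maxmin value 1/3, reflecting the worst prior
```

The constant act paying 1/2 in every state gets maxmin value 1/2, strictly above the value 1/3 of a bet on either state, illustrating the preference for hedging that the convexity axiom encodes.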
2.5. Conclusion

For all the models discussed, axiomatizations provided a crucial step at the beginning of their development, when it was not yet entirely clear what the right subjective
parameters and their quantitative rules of combination were. It is remarkable that prospect theory could be modeled in a sound way only after Schmeidler (1989) had developed the first axiomatization of decision under uncertainty with nonlinear decision weights.
References

Allais, Maurice (1953), “Fondements d’une Théorie Positive des Choix Comportant un Risque et Critique des Postulats et Axiomes de l’Ecole Américaine,” Colloques Internationaux du Centre National de la Recherche Scientifique 40, Econométrie, 257–332. Paris: Centre National de la Recherche Scientifique. Translated into English, with additions, as “The Foundations of a Positive Theory of Choice Involving Risk and a Criticism of the Postulates and Axioms of the American School,” in Maurice Allais and Ole Hagen (1979, eds), Expected Utility Hypotheses and the Allais Paradox, 27–145, Reidel, Dordrecht, The Netherlands.
Anscombe, F. J. and Robert J. Aumann (1963), “A Definition of Subjective Probability,” Annals of Mathematical Statistics 34, 199–205.
Arrow, Kenneth J. (1951), “Alternative Approaches to the Theory of Choice in Risk-Taking Situations,” Econometrica 19, 404–437.
Bewley, Truman F. (1986), “Knightian Decision Theory Part I,” Cowles Foundation Discussion Paper No. 807.
Blackwell, David and M. A. Girshick (1954), “Theory of Games and Statistical Decisions.” Wiley, New York.
Burks, Arthur W. (1977), “Chance, Cause, Reason (An Inquiry into the Nature of Scientific Evidence).” The University of Chicago Press, Chicago.
Casadesus-Masanell, Ramon, Peter Klibanoff, and Emre Ozdenoren (2000), “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory 92, 35–65.
Chateauneuf, Alain (1991), “On the Use of Capacities in Modeling Uncertainty Aversion and Risk Aversion,” Journal of Mathematical Economics 20, 343–369.
Chateauneuf, Alain and Jean-Marc Tallon (2002), “Diversification, Convex Preferences and Non-Empty Core,” Economic Theory 19, 509–523.
Chew, Soo Hong (1983), “A Generalization of the Quasilinear Mean with Applications to the Measurement of Income Inequality and Decision Theory Resolving the Allais Paradox,” Econometrica 51, 1065–1092.
Chew, Soo Hong (1989), “The Rank-Dependent Quasilinear Mean,” unpublished manuscript, Department of Economics, University of California, Irvine, USA.
Chew, Soo Hong and Edi Karni (1994), “Choquet Expected Utility with a Finite State Space: Commutativity and Act-Independence,” Journal of Economic Theory 62, 469–479.
Chew, Soo Hong, Larry G. Epstein, and Uzi Segal (1991), “Mixture Symmetric and Quadratic Utility,” Econometrica 59, 139–163.
de Finetti, Bruno (1931), “Sul Significato Soggettivo della Probabilità,” Fundamenta Mathematicae 17, 298–329. Translated into English as “On the Subjective Meaning of Probability,” in Paola Monari and Daniela Cocchi (1993, eds), “Probabilità e Induzione,” Clueb, Bologna, 291–321.
de Finetti, Bruno (1937), “La Prévision: Ses Lois Logiques, ses Sources Subjectives,” Annales de l’Institut Henri Poincaré 7, 1–68. Translated into English by Henry E. Kyburg Jr. as “Foresight: Its Logical Laws, its Subjective Sources,” in Henry E. Kyburg Jr.
and Howard E. Smokler (1964, eds), Studies in Subjective Probability, Wiley, New York; 2nd edition 1980, Krieger, New York.
de Finetti, Bruno (1949), “La ‘Logica del Plausible’ Secondo la Concezione di Pòlya,” Atti della XLII Riunione della Società Italiana per il Progresso delle Scienze, 227–236.
Dekel, Eddie (1986), “An Axiomatic Characterization of Preferences under Uncertainty: Weakening the Independence Axiom,” Journal of Economic Theory 40, 304–318.
Dubra, Juan, Fabio Maccheroni, and Efe A. Ok (2004), “Expected Utility without the Completeness Axiom,” Journal of Economic Theory 115, 118–133.
Edwards, Ward (1955), “The Prediction of Decisions Among Bets,” Journal of Experimental Psychology 50, 201–214.
Epstein, Larry G. (1992), “Behavior under Risk: Recent Developments in Theory and Applications.” In Jean-Jacques Laffont (ed.), Advances in Economic Theory II, 1–63, Cambridge University Press, Cambridge, UK.
Epstein, Larry G. and Michel Le Breton (1993), “Dynamically Consistent Beliefs Must Be Bayesian,” Journal of Economic Theory 61, 1–22.
Fishburn, Peter C. (1978), “On Handa’s ‘New Theory of Cardinal Utility’ and the Maximization of Expected Return,” Journal of Political Economy 86, 321–324.
Fishburn, Peter C. (1982), “Nontransitive Measurable Utility,” Journal of Mathematical Psychology 26, 31–67.
Fishburn, Peter C. (1986), “The Axioms of Subjective Probability,” Statistical Science 1, 335–358.
Fishburn, Peter C. and Irving H. LaValle (1993), “On Matrix Probabilities in Nonarchimedean Decision Theory,” Journal of Risk and Uncertainty 7, 283–299.
Ghirardato, Paolo and Massimo Marinacci (2001), “Risk, Ambiguity, and the Separation of Utility and Beliefs,” Mathematics of Operations Research 26, 864–890.
Ghirardato, Paolo, Fabio Maccheroni, Massimo Marinacci, and Marciano Siniscalchi (2003a), “A Subjective Spin on Roulette Wheels,” Econometrica 71, 1897–1908.
Ghirardato, Paolo, Fabio Maccheroni, and Massimo Marinacci (2003b), “Differentiating Ambiguity and Ambiguity Attitude,” Department of Economics, University of Torino.
Gilboa, Itzhak (1987), “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics 16, 65–88.
Gilboa, Itzhak and David Schmeidler (1989), “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Gilboa, Itzhak and David Schmeidler (2001), lecture at 22nd Linz Seminar on Fuzzy Set Theory, Linz, Austria.
Grodal, Birgit (1978), “Some Further Results on Integral Representation of Utility Functions,” Institute of Economics, University of Copenhagen, Copenhagen. Appeared in Vind, Karl (2003), “Independence, Additivity, Uncertainty.” With contributions by B. Grodal. Springer, Berlin.
Gul, Faruk (1992), “Savage’s Theorem with a Finite Number of States,” Journal of Economic Theory 57, 99–110. (“Erratum,” 1993, Journal of Economic Theory 61, 184.)
Hammond, Peter J. (1988), “Consequentialist Foundations for Expected Utility,” Theory and Decision 25, 25–78.
Hazen, Gordon B. (1987), “Subjectively Weighted Linear Utility,” Theory and Decision 23, 261–282.
Kahneman, Daniel and Amos Tversky (1979), “Prospect Theory: An Analysis of Decision under Risk,” Econometrica 47, 263–291.
Karni, Edi (1985), “Decision-Making under Uncertainty: The Case of State-Dependent Preferences.” Harvard University Press, Cambridge, MA.
Krantz, David H., R. Duncan Luce, Patrick Suppes, and Amos Tversky (1971), “Foundations of Measurement, Vol. I (Additive and Polynomial Representations).” Academic Press, New York.
Loomes, Graham and Robert Sugden (1982), “Regret Theory: An Alternative Theory of Rational Choice under Uncertainty,” Economic Journal 92, 805–824.
Luce, R. Duncan (2000), “Utility of Gains and Losses: Measurement-Theoretical and Experimental Approaches.” Lawrence Erlbaum Publishers, London.
Machina, Mark J. and David Schmeidler (1992), “A More Robust Definition of Subjective Probability,” Econometrica 60, 745–780.
Nakamura, Yutaka (1990), “Subjective Expected Utility with Non-Additive Probabilities on Finite State Spaces,” Journal of Economic Theory 51, 346–366.
Pfanzagl, Johann (1959), “A General Theory of Measurement—Applications to Utility,” Naval Research Logistics Quarterly 6, 283–294.
Pfanzagl, Johann (1968), “Theory of Measurement.” Physica-Verlag, Vienna.
Quiggin, John (1982), “A Theory of Anticipated Utility,” Journal of Economic Behaviour and Organization 3, 323–343.
Raiffa, Howard (1968), “Decision Analysis.” Addison-Wesley, London.
Ramsey, Frank P. (1931), “Truth and Probability.” In “The Foundations of Mathematics and other Logical Essays,” 156–198, Routledge and Kegan Paul, London. Reprinted in Henry E. Kyburg Jr. and Howard E. Smokler (1964, eds), Studies in Subjective Probability, 61–92, Wiley, New York. (2nd edition 1980, Krieger, New York.)
Sarin, Rakesh K. and Peter P. Wakker (1992), “A Simple Axiomatization of Nonadditive Expected Utility,” Econometrica 60, 1255–1272. (Reprinted as Chapter 7 in this volume.)
Sarin, Rakesh K. and Peter P. Wakker (1994), “Gains and Losses in Nonadditive Expected Utility.” In Mark J. Machina and Bertrand R. Munier (eds), Models and Experiments on Risk and Rationality, Kluwer Academic Publishers, Dordrecht, The Netherlands, 157–172.
Sarin, Rakesh K. and Peter P. Wakker (1997), “A Single-Stage Approach to Anscombe and Aumann’s Expected Utility,” Review of Economic Studies 64, 399–409.
Sarin, Rakesh K. and Peter P. Wakker (1998), “Dynamic Choice and Nonexpected Utility,” Journal of Risk and Uncertainty 17, 87–119.
Savage, Leonard J. (1954), “The Foundations of Statistics.” Wiley, New York. (2nd edition 1972, Dover, New York.)
Schmeidler, David (1989), “Subjective Probability and Expected Utility without Additivity,” Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Schmidt, Ulrich (2003), “Reference Dependence in Cumulative Prospect Theory,” Journal of Mathematical Psychology 47, 122–131.
Shapiro, Leonard (1979), “Necessary and Sufficient Conditions for Expected Utility Maximizations: The Finite Case, with a Partial Order,” Annals of Statistics 7, 1288–1302.
Tversky, Amos and Daniel Kahneman (1992), “Advances in Prospect Theory: Cumulative Representation of Uncertainty,” Journal of Risk and Uncertainty 5, 297–323.
Tversky, Amos and Peter P. Wakker (1995), “Risk Attitudes and Decision Weights,” Econometrica 63, 1255–1280.
Vind, Karl (2003), “Independence, Additivity, Uncertainty.” With contributions by B. Grodal. Springer, Berlin.
Wakker, Peter P. (1984), “Cardinal Coordinate Independence for Expected Utility,” Journal of Mathematical Psychology 28, 110–117.
Wakker, Peter P. (1989), “Additive Representations of Preferences, A New Foundation of Decision Analysis.” Kluwer Academic Publishers, Dordrecht, The Netherlands.
Wakker, Peter P. (1993), “Unbounded Utility for Savage’s ‘Foundations of Statistics,’ and other Models,” Mathematics of Operations Research 18, 446–485.
Wakker, Peter P. (2001), “Testing and Characterizing Properties of Nonadditive Measures through Violations of the Sure-Thing Principle,” Econometrica 69, 1039–1059.
Yaari, Menahem E. (1987), “The Dual Theory of Choice under Risk,” Econometrica 55, 95–115.
3
Defining ambiguity and ambiguity attitude
Paolo Ghirardato
According to the well-known distinction attributed to Knight (1921), there are two kinds of uncertainty. The first, called “risk,” corresponds to situations in which all events relevant to decision making are associated with obvious probability assignments (which every decision maker agrees to). The second, called “(Knightian) uncertainty” or (following Ellsberg (1961)) “ambiguity,” corresponds to situations in which some events do not have an obvious, unanimously agreeable, probability assignment. As Chapter 1 makes clear, this collection focuses on the issues related to decision making under ambiguity. In this chapter, I briefly discuss the issue of the formal definition of ambiguity and ambiguity attitude. In his seminal paper on the Choquet expected utility (CEU) model David Schmeidler (1989) proposed a behavioral definition of ambiguity aversion, showing that it is represented mathematically by the convexity of the decision maker’s capacity v. The property he proposed can be understood by means of the example of the two coins used in Chapter 1. Assume that the decision maker places bets that depend on the result of two coin flips, the first of a coin that she is very familiar with, the second of a coin provided by somebody else. Given that she is not familiar with the second coin, it is possible that she would consider “ambiguous” all the bets whose payoff depends on the result of the second flip. (For instance, a bet that pays $1 if the second coin lands with heads up, or equivalently if the event {HH, TH} obtains.) If she is averse to ambiguity, she may therefore see such bets as somewhat less desirable than bets that are “unambiguous,” that is, only depend on the result of the first flip. (For instance, a bet that pays $1 if the first coin lands with heads up, or equivalently if the event {HH, HT} obtains.) However, suppose that we give the decision maker the possibility of buying shares of each bet. 
Then, if she is offered a bet that pays $0.50 on {HH} and $0.50 on {HT}, she may prefer it to either of the two bets that pay $1 contingently on {HH} or on {HT}, which are ambiguous. In fact, such a bet has the same contingent payoffs as a bet which pays $0.50 if the first coin lands with heads up, which is unambiguous. That is, a decision maker who is averse to ambiguity may prefer the equal-probability “mixture” of two ambiguous acts to either of the acts. In contrast, a decision maker who is attracted to ambiguity may prefer to choose one of the ambiguous acts.
Formally, Schmeidler called ambiguity averse a decision maker who prefers the even mixture¹ (1/2)f + (1/2)g of two acts that she finds indifferent to either of the two acts. That is, (1/2)f + (1/2)g ≽ f for all f and g such that f ∼ g. As recalled earlier, if the decision maker has CEU preferences, this property implies that her capacity v is convex. If, instead, she has maxmin expected utility (MMEU) preferences, then she satisfies this property automatically (indeed, it is one of the axioms that characterize the model). While this is certainly a compelling definition, it does not seem to be fully satisfactory as a definition of ambiguity aversion. First of all, it explicitly relies on the availability of mixtures of acts, and thus apparently on the existence of objective randomizing devices. This is not a serious problem, for it has been shown by Ghirardato et al. (2001) that mixtures can be defined without invoking randomizing devices, provided the set of prizes is rich and preferences satisfy some mild restrictions. (Moreover, Casadesus-Masanell et al. (2000) show that Schmeidler’s definition can be formulated in a Savage setting which does not explicitly involve mixtures.) Second, and more important, Schmeidler’s definition is not satisfied by preferences that do seem to embody ambiguity aversion, as illustrated by the following example.

Example 3.1. Consider again the decision maker facing the set S = {HH, HT, TH, TT} of results of flips of a familiar and an unfamiliar coin. Suppose that she has CEU preferences represented by a capacity v on S which:

• assigns 1/8 to each singleton state, that is, v({HH}) = v({HT}) = v({TH}) = v({TT}) = 1/8;
• assigns 1/2 to the results of the familiar coin flip, that is, v({HH, HT}) = v({TH, TT}) = 1/2;
• assigns 9/16 to any 3-state event (like {HH, HT, TH}) and 1 to the whole state space;
• assigns the sum of the weights of its (singleton) elements to each other event.

Such a preference embodies a dislike of ambiguity: The decision maker prefers to bet on the familiar coin rather than on the unfamiliar one (notice that v({HH, TH}) = 1/4 < 1/2 = v({HH, HT})). However, the capacity v is not convex,² so that she is not ambiguity averse according to Schmeidler’s definition.
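The claims of Example 3.1 can be verified mechanically. The sketch below (an exhaustive enumeration over the 16 events of S, not part of the original chapter) confirms that v favors the familiar coin, that v violates convexity (a witness is A = {HH, HT}, B = {HT, TH}), and that the uniform probability nonetheless dominates v eventwise, so the core of v is nonempty, a fact invoked later in the chapter.

```python
from itertools import combinations

S = ('HH', 'HT', 'TH', 'TT')

def events(states):
    """All subsets of the state space, as frozensets."""
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

def v(A):
    """The capacity of Example 3.1."""
    A = frozenset(A)
    if len(A) == 4:
        return 1.0
    if len(A) == 3:
        return 9 / 16
    if A in (frozenset({'HH', 'HT'}), frozenset({'TH', 'TT'})):
        return 1 / 2            # bets on the familiar coin
    return len(A) / 8           # singletons get 1/8, remaining pairs 1/4

def is_convex(cap, states):
    """Check cap(A | B) + cap(A & B) >= cap(A) + cap(B) for all events."""
    ev = events(states)
    return all(cap(A | B) + cap(A & B) >= cap(A) + cap(B) - 1e-12
               for A in ev for B in ev)

# The uniform probability P(A) = |A|/4 dominates v on every event,
# so it lies in Core(v) even though v is not convex.
uniform_in_core = all(len(A) / 4 >= v(A) - 1e-12 for A in events(S))
```

Running `is_convex(v, S)` returns False, while `uniform_in_core` is True: dislike of ambiguity without Schmeidler-convexity.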
3.1. Comparative foundations to ambiguity aversion

Motivated by these problems with Schmeidler’s definition, Epstein (1999) tried a different approach to defining aversion to ambiguity, inspired by Yaari’s (1969) general definition of risk aversion for non-expected utility preferences.
He suggested using a two-stage approach, first defining a notion of comparative ambiguity aversion, and then calling averse to ambiguity any preference which is more averse than (what we establish to be) an ambiguity neutral preference. Ghirardato and Marinacci (2002, GM) followed his example, employing a different comparative notion and a different definition of ambiguity neutrality. For reasons that will become clear presently, I shall discuss these contributions in inverse chronological order. Ghirardato and Marinacci start from the observation that preferences that obey classical Expected Utility Theory (EUT) are intuitively ambiguity neutral, and propose using such preferences as the benchmark to measure ambiguity aversion. As to the comparative ambiguity aversion notion, they suggest calling a preference ≽₂ more ambiguity averse than a preference ≽₁ if both preferences are represented by the same utility function³ and, given any constant act x and any act f, whenever the first preference favors the (certainly unambiguous) constant x over the (possibly ambiguous) f, the second does the same; that is,

x ≽₁ (≻₁) f  ⟹  x ≽₂ (≻₂) f.    (3.1)
Thus, a preference is ambiguity averse if it is more averse to ambiguity than some EUT preference. GM show that every MMEU preference is averse to ambiguity in this sense (while “maximax EU” preferences are ambiguity seeking). In contrast, a CEU preference is ambiguity averse if and only if its capacity v has a nonempty core, a strictly weaker property than convexity. Therefore, GM conclude that Schmeidler’s definition captures strictly more than aversion to ambiguity. (Notice that the capacity v in Example 3.1 does have a nonempty core; the uniform probability on S is in Core(v).) This definition is simple and it has intuitive characterizations,⁴ but it can be criticized in an important respect. It does not distinguish between departures from EUT that are unrelated to ambiguity, like the celebrated “Allais paradox” (in the terminology of Chapter 1, violations of the third tenet of Bayesianism), and those that are. Every departure from the EUT benchmark is attributed to the presence of ambiguity. To see why this may be an issue, consider the following example.

Example 3.2. Using again the two-coin example, consider a decision maker with CEU (indeed, RDEU) preferences and the capacity v′ defined by v′(S) = 1 and v′(A) = P(A)/2 for A ≠ S, where P is the uniform probability on the state space S. At first blush, we may invoke aversion to ambiguity (recall that the second coin is the unfamiliar one) to explain the fact that v′({TH, HH}) = v′({TT, HT}) = 1/4. However, we also see that v′({HH, HT}) = v′({TH, TT}) = 1/4; that is, the decision maker is similarly unwilling to bet on the familiar, unambiguous coin. What we are observing is a dislike of uncertainty which is more general than just aversion to ambiguity: the decision maker treats even events with “known” probability 1/2 as if they really had probability 1/4. This is a trait usually called probabilistic risk aversion; the decision maker appears in fact to be neutral to
the ambiguity in this problem. However, the capacity v′ is convex, so that both Schmeidler and GM would classify this decision maker as ambiguity averse. Epstein (1999) offers a definition that avoids this problem, carefully distinguishing between “risk-based” behavioral traits and “ambiguity-based” ones. The key idea is to use a set A of events which are exogenously known to be considered unambiguous by every decision maker, like the results of the flips of the familiar coin in the example stated earlier. Acts which depend only on the events in A are called unambiguous. The comparative definition is then modified as follows: say that preference ≽₂ is more ambiguity averse than preference ≽₁ if for any act f and any unambiguous act h, we have

h ≽₁ (≻₁) f  ⟹  h ≽₂ (≻₂) f.    (3.2)
Notice that this definition is strictly stronger than GM’s, as constant acts are unambiguous, while in general (i.e. for nontrivial A) there will be unambiguous acts which are not constant. As long as the set A (and hence the set of unambiguous acts) is sufficiently rich, Eq. (3.2) implies that the two preferences have identical utility functions as well as identical probabilistic risk aversion. For instance, the CEU decision maker with capacity v in Example 3.1 cannot be compared to the one with capacity v′ in Example 3.2; their willingness to bet on the unambiguous results of the flips of the familiar coin is different. A CEU preference comparable to that with capacity v′ must also “transform” an objective probability of 1/2 into a 1/4. The choice of the benchmark with respect to which ambiguity aversion is to be measured is made consistently with this modified comparative notion. EUT preferences are probabilistic risk neutral, and do not “transform” the probabilities of unambiguous events, so they cannot be compared to preferences like the CEU preference with capacity v′. Epstein uses preferences which satisfy Machina and Schmeidler’s (1992) probabilistic sophistication model, which allows nonexpected utility preferences as long as their ranking of bets on events can be represented by a probability.⁵ He calls a decision maker ambiguity averse if his preference is more averse to ambiguity than a probabilistically sophisticated preference. His characterization results are not as clear-cut as those in GM: while basically every MMEU preference is ambiguity averse, the characterization of CEU preferences is less straightforward. Epstein does provide a full characterization for those CEU preferences that satisfy a certain smoothness condition, which he calls “eventwise differentiability.” I refer the reader to his chapter for details. Epstein’s definition of ambiguity aversion is limited by the requirement of a rich set A of exogenously unambiguous events.
Suppose that we observe a decision maker who has CEU preferences with capacity v′ as in Example 3.2, but we do not know what the decision maker knows about these two coins. Can we conclude that he is ambiguity neutral and probabilistic risk averse? If both coins were unfamiliar, his capacity would instead reflect ambiguity aversion; for all we know, he may even have EUT preferences (i.e. be probabilistic risk neutral) when betting on familiar coins. The problem is that in this case the set A is just the trivial {∅, S},
too poor to enable us to distinguish between “pure” ambiguity aversion and probabilistic risk aversion. (As a consequence, the observation that the capacity v′ is convex yet induces behavior that is not intuitively ambiguity averse may be in need of reconsideration.) We reach the conclusion that a theory of “pure” ambiguity aversion (as opposed to what is measured by GM) must be founded on an endogenous theory of ambiguity, if it is to be generally valid. This is what Epstein next turned his attention to; it is discussed in the next subsection. Before closing this discussion of the comparative foundation of ambiguity aversion, I remark that, while Epstein’s (1999) chapter is the earliest to use a comparative approach to provide an absolute notion of ambiguity aversion, others discussed comparative ambiguity aversion much earlier. Tversky and Wakker (1995) present and characterize several comparative notions related to ambiguity and probabilistic risk aversion. Kelsey and Nandeibam (1996) propose a comparative notion similar to GM’s, implicitly assuming the equality of utility, and show its characterization for CEU and MMEU preferences.
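The properties of the capacity v′ of Example 3.2 that drive this discussion can likewise be checked by enumeration. The sketch below (illustrative, not part of the original text) confirms that v′ is convex and that it treats bets on the familiar and unfamiliar coins identically; its ranking of bets agrees with the uniform probability, which is what makes it probabilistically sophisticated.

```python
from itertools import combinations

S = ('HH', 'HT', 'TH', 'TT')

def events(states):
    """All subsets of the state space, as frozensets."""
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

def v_prime(A):
    """The capacity of Example 3.2: v'(S) = 1 and v'(A) = P(A)/2 for A != S,
    with P uniform on the four states."""
    A = frozenset(A)
    return 1.0 if len(A) == 4 else (len(A) / 4) / 2

def is_convex(cap, states):
    """Check cap(A | B) + cap(A & B) >= cap(A) + cap(B) for all events."""
    ev = events(states)
    return all(cap(A | B) + cap(A & B) >= cap(A) + cap(B) - 1e-12
               for A in ev for B in ev)

# v' halves the weight of every proper event, familiar coin or not:
# probabilistic risk aversion, with no differential treatment of ambiguity.
```

Here `is_convex(v_prime, S)` returns True, and `v_prime` ranks events exactly as the uniform probability does, so both Schmeidler's and GM's criteria label as "ambiguity averse" a trait that is arguably pure probabilistic risk aversion.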
3.2. What is ambiguity?

As observed earlier, the quest to distinguish ambiguity aversion from behavioral traits unrelated to the presence of ambiguity was a driving force behind the more recent attempts (like Epstein and Zhang (2001)) at understanding the behavioral consequences of the presence of ambiguity. However, others have also addressed the definition of ambiguity itself. Fishburn (1993) considers a primitive ambiguity relation over events, and discusses its properties and representation by an ambiguity measure. Nehring (1999) defines an event A to be unambiguous for an MMEU preference with set of priors C if P(A) = P′(A) for every P, P′ ∈ C. As to CEU preferences, Nehring recalls that any capacity v on a finite state space S = {s_1, s_2, …, s_n} can be canonically associated with the set C_v of the probabilities P_σ defined as follows. Let σ denote a permutation of the indices {1, …, n}, and define⁶ P_σ(s_σ(i)) = v({s_σ(1), s_σ(2), …, s_σ(i)}) − v({s_σ(1), s_σ(2), …, s_σ(i−1)}). This fact allows him to define ambiguity of events analogously to the MMEU case, with C_v in place of C. In both cases, an event is unambiguous if it is given identical weight in the evaluation of any act. Nehring shows that while for MMEU preferences the set of unambiguous events is a λ-system (a class closed with respect to complements and disjoint unions), for CEU preferences it is an algebra (i.e. it is also closed with respect to intersections). As there are situations in which the set of unambiguous events is not an algebra, this suggests that CEU preferences cannot be used to model all decision problems under ambiguity.⁷ A notion of ambiguity for events that holds for a wider class of preferences was introduced in Zhang (2002). Loosely put, Zhang calls unambiguous an event A such that Savage’s sure-thing principle holds for acts separated on the partition {A, Aᶜ}. He then shows that the set of such events is a λ-system, and that for
Defining ambiguity and ambiguity attitude
41
a subset of CEU preferences (those which induce an exact v; details are found in GM) it has a simple representation in terms of the capacity v: it is the set of the events A such that v(A) + v(Ac) = 1. Zhang’s definition of unambiguous event was later modified in Epstein and Zhang (2001, EZ), the announced attempt to endogenize the class of unambiguous events used in Epstein’s definition of ambiguity aversion. The idea of EZ’s definition is similar to Zhang’s (2002), though it yields a larger collection of unambiguous events. Axioms on the decision maker’s preferences are introduced which guarantee that the resulting collection of events is a λ-system and that the preferences over the sets of unambiguous acts (those which are measurable with respect to unambiguous events) are probabilistically sophisticated in the sense of Machina and Schmeidler (1992). This yields an interesting extension of Machina and Schmeidler’s and Savage’s models, wherein the set of events on which the decision maker satisfies the first and second tenets of Bayesianism is determined endogenously.8 However, it does not fully solve the problem of screening “risk-based” behavioral traits. In fact, if a preference is probabilistically sophisticated, then every event is unambiguous in the EZ sense. It follows that the decision maker with CEU preferences and capacity v in Example 3.2 (who, recall, is probabilistically sophisticated) considers every event unambiguous and displays probabilistic risk aversion. This is regardless of the information that is available to her; it does not matter whether she is betting on familiar or unfamiliar coins. The problem is that EZ’s definition does not distinguish between the events that are really unambiguous and those which merely appear to be. It seems likely that such a distinction could only be assessed by enriching the decision framework; that is, by allowing the theorist to observe more than just the decision maker’s preferences over acts.
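Nehring’s canonical construction of Cv recalled above is mechanical enough to compute exhaustively on a small state space. A minimal Python sketch follows; the helper name and the example capacity values are mine, not from the text:

```python
from itertools import permutations

def canonical_priors(states, v):
    """Nehring's construction: for each permutation sigma of the states,
    P_sigma(s_sigma(i)) = v({s_sigma(1),...,s_sigma(i)}) - v({s_sigma(1),...,s_sigma(i-1)})."""
    priors = set()
    for sigma in permutations(states):
        P, prefix = {}, frozenset()
        for s in sigma:
            P[s] = v[prefix | {s}] - v[prefix]   # capacity increment along the chain
            prefix = prefix | {s}
        priors.add(tuple(round(P[s], 10) for s in states))
    return priors

# A hypothetical convex capacity on three states: 0 on singletons, 1/4 on pairs, 1 overall.
states = ("a", "b", "c")
v = {frozenset(c): w for c, w in [
    ("", 0.0), ("a", 0.0), ("b", 0.0), ("c", 0.0),
    ("ab", 0.25), ("ac", 0.25), ("bc", 0.25), ("abc", 1.0)]}

for p in sorted(canonical_priors(states, v)):
    print(p)
```

For this (convex) capacity the six permutations yield six distinct probabilities, each of which setwise dominates v, in line with the fact that for convex capacities the Pσ all lie in the core.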
Going back to the two-coins example, regardless of what a decision maker thinks about the unfamiliar coin, she may believe that the event that it lands heads up on a single flip is more likely than the event that it lands heads up twice in a row. That is, she may hold that a bet on one head in two flips is “unambiguously better” than a bet on two heads in two flips. None of the notions of ambiguity introduced thus far can formally capture this possibility. In an unpublished 1996 conference talk, Nehring suggested doing so using the largest subrelation of the preference ≽ that satisfies independence, which I shall label ≽I. He argued that, if S is finite, for a class of preferences9 the results in Bewley (2002) can be used to show that ≽I has a multiple-priors unanimity representation, with a set of priors D. In particular, when the decision maker satisfies MMEU with set of priors C we have C = D, while D = Cv when she satisfies CEU with capacity v. Although the relation ≽I thus obtained can in principle be constructed using only behavioral data, its derivation is not simple. Independently, Nehring (2001) and Ghirardato et al. (2002) proposed to derive from the decision maker’s preference an unambiguous preference relation as follows: Say that act f is unambiguously preferred to act g, denoted f ≽∗ g, if αf + (1 − α)h ≽ αg + (1 − α)h for every α and every h. That is, f ≽∗ g if the preference of f over g cannot be overturned by mixing them with another act h, regardless of whether the latter allows her to hedge (or speculate on) ambiguity. It turns out that ≽∗ = ≽I, providing
42
Paolo Ghirardato
a more immediate behavioral foundation to the approach proposed by Nehring in his 1996 talk. The set of priors D representing ≽∗ by unanimity is naturally interpreted as the ambiguity that the decision maker perceives, or better, appears to perceive, in her problem. The events on which all probabilities in D agree (which can be characterized simply in terms of the primitive ≽; see Ghirardato et al. (2002: prop. 24)) are natural candidates for being called unambiguous, and the collection of unambiguous events forms a λ-system. Unlike his 1996 talk, Nehring (2001) considers a countably infinite S and preferences whose induced ≽∗ is represented by a D satisfying a “range convexity” condition. Among various consequences of such range convexity, he shows the characterization of two intuitive notions of absolute ambiguity aversion. In particular, say that a preference relation is weakly ambiguity averse if for every pair of partitions of S, {A1, A2, . . . , An} and {T1, T2, . . . , Tn}, such that each Ti is unambiguous, we cannot have that the decision maker prefers betting on Ai over betting on Ti for every i. Under Nehring’s assumptions, a decision maker is weakly ambiguity averse if her ranking of bets can be represented by a capacity v with a nonempty core. A stronger property, which Nehring calls “ambiguity aversion,” is shown instead to be equivalent to the fact that the decision maker’s ranking of bets is represented by the lower envelope of D. Ghirardato et al. (2002) consider an arbitrary S and a different class of preferences.10 They show that the set D representing ≽∗ can also be obtained as an (appropriately defined) “derivative” of the functional that represents the preferences. In particular, when the state space S is finite, this characterization implies that D is the (closed convex hull of the) set of all the Gateaux derivatives of the preference functional, where they exist.
This result generalizes the EUT intuition that a decision maker’s subjective probability of state s is the shadow price for changes in the utility received in state s, by allowing a multiplicity of shadow prices. A consequence is the extension to preferences with nonlinear utility of Nehring’s 1996 result that D corresponds to C (resp. Cv) in the MMEU (resp. CEU) case, which in turn implies that the set of unambiguous events coincides with that defined for such preferences in Nehring (1999). Ghirardato et al. (2002) also prove that the preferences they study can in general be given a representation which generalizes the MMEU representation. More precisely, an act f is evaluated via

a(f) min P∈D ∫ u(f(s)) dP(s) + (1 − a(f)) max P∈D ∫ u(f(s)) dP(s),
where a(·) is a function taking values in [0, 1] which represents the decision maker’s aversion to perceived ambiguity in the sense of GM. They also axiomatize the so-called α-maxmin EU model, in which a(·) ≡ α.11 The interesting aspect of this representation is its clear separation of ambiguity (represented by D) and ambiguity attitude (represented by a(·)), and it is encouraging that the model does not impose cross-restrictions between these two aspects of the representation. As can be seen from the foregoing discussion, the “relation-based” approach to modeling ambiguity is, at least in terms of its consequences, a significant
improvement over the previous “event-based” approaches. It has also yielded some interesting new perspectives on the characterization of ambiguity aversion and ambiguity love. On the other hand, it is important to stress that this approach suffers from the same shortcoming as GM’s theory of ambiguity aversion: It does not really describe “pure” ambiguity aversion, but rather the conjunction of all those behavioral features that induce departures from the independence axiom of EUT. In the terminology of Chapter 1, it does not distinguish between violations of the first and of the third tenets of Bayesianism. As observed earlier, it is not obvious that a solution to this identification problem can be reached without departing from a purely behavioral approach. Besides, a difficulty with such a departure is that it would require some prejudgment as to what really constitutes ambiguity, which is the very question that we set out to answer. Another limitation of the “relation-based” approach due to its purely behavioral nature is the identification of ambiguity neutrality with the absence of ambiguity. If a decision maker’s preferences satisfy EUT, she is deemed to perceive no ambiguity, while it may be the case that she perceives ambiguity and is neutral toward it. Clearly, the distinction could be drawn if we considered ancillary information about the ambiguity present in the problem, at the mentioned cost of prejudging the nature of ambiguity. On the other hand, this is not as serious a concern as the one mentioned earlier, for ultimately our interest is in modeling ambiguity as it affects decision makers’ behavior, and not otherwise.
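For a finite set of priors, the α-maxmin representation recalled above reduces to a min and a max over ordinary expectations. A small numerical sketch (the priors, utilities, and function name are hypothetical, chosen only for illustration):

```python
def alpha_maxmin(utility, priors, alpha):
    """Evaluate an act via alpha * min_P E_P[u] + (1 - alpha) * max_P E_P[u].
    priors is a finite list of probability vectors whose convex hull plays the
    role of the set D; since expectations are linear in P, the min and max over
    the convex hull are attained at these extreme points."""
    expectations = [sum(u * p for u, p in zip(utility, P)) for P in priors]
    return alpha * min(expectations) + (1 - alpha) * max(expectations)

# Hypothetical example: two states, ambiguity described by P(state 1) in [0.4, 0.6].
priors = [(0.4, 0.6), (0.6, 0.4)]
u_bet = (1.0, 0.0)   # utility of a bet paying off on the first state

print(alpha_maxmin(u_bet, priors, alpha=1.0))   # alpha = 1: the pure maxmin value, 0.4
print(alpha_maxmin(u_bet, priors, alpha=0.5))   # alpha = 1/2: the midpoint of [0.4, 0.6]
```

Here a(·) is held constant (the α-maxmin special case); in the general representation the weight a(f) may vary with the act.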
Notes
1 Recall that Schmeidler used the Anscombe–Aumann setting, in which mixtures of acts can be defined state by state. Also, he used the term “uncertainty averse” rather than ambiguity averse.
2 For instance, we have that v({HH, HT, TH}) = 9/16 < 10/16 = v({HH, HT}) + v({TH}).
3 The preferences considered in GM, called “biseparable preferences,” induce state-independent and cardinally unique utilities. They include CEU and MMEU preferences (and other models as well) as special cases.
4 The idea that nonemptiness of the core could be a more appropriate formalization of ambiguity aversion for CEU preferences had already been suggested by Montesano and Giovannoni (1996).
5 For instance, a CEU preference is probabilistically sophisticated if its capacity v is ordinally equivalent to a probability; that is, if it is RDEU. Such is the case of the preference with capacity v in Example 3.2.
6 Given utility u and an act f, it can be seen from the definition of the Choquet integral that if σ is such that u(f(sσ(1))) ≥ u(f(sσ(2))) ≥ · · · ≥ u(f(sσ(n))), then ∫ u(f) dv = ∫ u(f) dPσ.
7 The fact that unambiguous events should form λ-systems and not algebras was observed earlier in Zhang (2002), whose first version predates Nehring’s.
8 Further extensions in this spirit are found in Kopylov (2002). In that chapter it is also shown that in general the sets of unambiguous events of Zhang and EZ are not λ-systems, but less structured families called “mosaics.”
9 Those that have linear utility among the preferences that satisfy all the axioms in Gilboa and Schmeidler (1989) but their “uncertainty aversion” axiom. The latter are called invariant biseparable preferences by Ghirardato et al. (2002).
10 Invariant biseparable preferences (see note 9). Such preferences do not yield specific restrictions on D (beyond convexity, nonemptiness, and closedness), but they embody a mild restriction that Nehring (2001) calls “utility sophistication.” Nehring shows that under range convexity it is possible to define an unambiguous likelihood relation on events even without utility sophistication; see Nehring (2001) for details.
11 Variants of this representation are well known at least since the seminal work of Hurwicz (published in Arrow and Hurwicz (1972)). See, in particular, Jaffray (1989).
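The identity in note 6 can be seen at work in a short computation: ranking states by decreasing utility and weighting each by the capacity increment of its upper-level set yields the Choquet integral, and those increments are exactly the Pσ-weights of Nehring’s construction. A sketch with made-up numbers (the function name and example capacity are mine):

```python
def choquet_integral(values, v):
    """Choquet integral of a real function on a finite state space with respect
    to capacity v (a dict from frozensets of states to numbers): rank states by
    decreasing value and weight each by the capacity increment of its
    upper-level set, as in note 6."""
    order = sorted(values, key=values.get, reverse=True)   # the permutation sigma
    total, upper, prev = 0.0, frozenset(), 0.0
    for s in order:
        upper = upper | {s}
        total += values[s] * (v[upper] - prev)
        prev = v[upper]
    return total

# Hypothetical two-state capacity and act: v({s1}) = v({s2}) = 0.25, v({s1, s2}) = 1.
v = {frozenset(): 0.0, frozenset({"s1"}): 0.25, frozenset({"s2"}): 0.25,
     frozenset({"s1", "s2"}): 1.0}
u = {"s1": 8.0, "s2": 2.0}
print(choquet_integral(u, v))   # 8 * 0.25 + 2 * (1 - 0.25) = 3.5
```

Because the weights used here are precisely Pσ for the decreasing ranking σ, the value computed is simultaneously ∫ u(f) dv and the expectation of u(f) under Pσ.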
References
Arrow, K. J. and L. Hurwicz (1972). “An Optimality Criterion for Decision Making under Ignorance,” in Uncertainty and Expectations in Economics, ed. by C. Carter and J. Ford. Basil Blackwell, Oxford.
Bewley, T. (2002). “Knightian Decision Theory: Part I,” Decisions in Economics and Finance, 25(2), 79–110 (first version 1986).
Casadesus-Masanell, R., P. Klibanoff, and E. Ozdenoren (2000). “Maxmin Expected Utility over Savage Acts with a Set of Priors,” Journal of Economic Theory, 92, 33–65.
Ellsberg, D. (1961). “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669.
Epstein, L. G. (1999). “A Definition of Uncertainty Aversion,” Review of Economic Studies, 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Epstein, L. G. and J. Zhang (2001). “Subjective Probabilities on Subjectively Unambiguous Events,” Econometrica, 69, 265–306.
Fishburn, P. C. (1993). “The Axioms and Algebra of Ambiguity,” Theory and Decision, 34, 119–137.
Ghirardato, P., F. Maccheroni, and M. Marinacci (2002). “Ambiguity from the Differential Viewpoint,” Social Science Working Paper 1130, Caltech, http://www.hss.caltech.edu/∼paolo/differential.pdf.
Ghirardato, P., F. Maccheroni, M. Marinacci, and M. Siniscalchi (2003). “A Subjective Spin on Roulette Wheels,” Econometrica, 71(6), 1897–1908.
Ghirardato, P. and M. Marinacci (2002). “Ambiguity Made Precise: A Comparative Foundation,” Journal of Economic Theory, 102, 251–289. (Reprinted as Chapter 10 in this volume.)
Gilboa, I. and D. Schmeidler (1989). “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Jaffray, J.-Y. (1989). “Linear Utility Theory for Belief Functions,” Operations Research Letters, 8, 107–112.
Kelsey, D. and S. Nandeibam (1996). “On the Measurement of Uncertainty Aversion,” Mimeo, University of Birmingham.
Knight, F. H. (1921). Risk, Uncertainty and Profit. Houghton Mifflin, Boston.
Kopylov, I. (2002). “Subjective Probabilities on ‘Small’ Domains,” Work in progress, University of Rochester.
Machina, M. J. and D. Schmeidler (1992). “A More Robust Definition of Subjective Probability,” Econometrica, 60, 745–780.
Montesano, A. and F. Giovannoni (1996). “Uncertainty Aversion and Aversion to Increasing Uncertainty,” Theory and Decision, 41, 133–148.
Nehring, K. (1999). “Capacities and Probabilistic Beliefs: A Precarious Coexistence,” Mathematical Social Sciences, 38, 197–213.
—— (2001). “Ambiguity in the Context of Probabilistic Beliefs,” Mimeo, UC Davis.
Schmeidler, D. (1989). “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Tversky, A. and P. P. Wakker (1995). “Risk Attitudes and Decision Weights,” Econometrica, 63, 1255–1280.
Yaari, M. E. (1969). “Some Remarks on Measures of Risk Aversion and on Their Uses,” Journal of Economic Theory, 1, 315–329.
Zhang, J. (2002). “Subjective Ambiguity, Probability and Capacity,” Economic Theory, 20, 159–181.
4
Introduction to the mathematics of ambiguity Massimo Marinacci and Luigi Montrucchio
4.1. Introduction
As discussed at length in Chapters 1–3, some mathematical objects play a central role in Schmeidler’s decision-theoretic ideas. In this chapter we provide some more details on them. One of the novelties of Schmeidler’s decision theory papers was the use of general set functions, not necessarily additive, to model “ambiguous” beliefs. This provided a new and intriguing motivation for the study of these mathematical objects, already studied from a different standpoint in cooperative game theory, another field where David Schmeidler has made important contributions. Here we overview the main properties of such set functions. Most of the results we will present are known, though often not in the generality in which we state and prove them. In the attempt to provide streamlined proofs and more general statements, we sometimes came up with novel arguments.
4.2. Set functions
4.2.1. Basic properties
We begin by studying the basic properties of set functions. We use the setting of cooperative game theory, as most of these concepts originated there; their decision-theoretic interpretation is treated in great detail in Chapters 1–3 and 13, as well as in the other chapters in this book. Let Ω be a set of players and Σ an algebra of admissible coalitions in Ω. A (transferable utility) game is a real-valued set function ν : Σ → R with the only requirement that ν(Ø) = 0. Given a coalition A ∈ Σ, the number ν(A) is interpreted as its worth, that is, the overall value that its members can achieve by teaming up. The condition ν(Ø) = 0 reflects the obvious fact that the worth of the empty coalition is zero; a priori, nothing more is assumed in defining a game ν. In the game theory literature several additional conditions have been considered. In
Introduction to the mathematics of ambiguity
47
particular, a game ν is1
1 positive if ν(A) ≥ 0 for all A;
2 bounded if supA∈Σ |ν(A)| < +∞;
3 monotone if ν(A) ≤ ν(B) whenever A ⊆ B;
4 superadditive if ν(A ∪ B) ≥ ν(A) + ν(B) for all pairwise disjoint sets A and B;
5 convex (supermodular) if ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B) for all A, B;
6 additive (a charge) if ν(A ∪ B) = ν(A) + ν(B) for all pairwise disjoint sets A and B.
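On a small finite Ω these properties can be verified by brute force over all pairs of coalitions. A sketch in Python (the helper names and the example capacity are mine, not from the text):

```python
from itertools import chain, combinations

def all_coalitions(omega):
    """Every subset of omega, as frozensets."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def is_monotone(v, omega):
    S = all_coalitions(omega)
    return all(v[A] <= v[B] for A in S for B in S if A <= B)

def is_superadditive(v, omega):
    S = all_coalitions(omega)
    return all(v[A | B] >= v[A] + v[B] - 1e-12
               for A in S for B in S if not (A & B))   # disjoint pairs only

def is_convex(v, omega):
    S = all_coalitions(omega)
    return all(v[A | B] + v[A & B] >= v[A] + v[B] - 1e-12 for A in S for B in S)

# A hypothetical capacity on two states.
omega = {1, 2}
v = {frozenset(): 0.0, frozenset({1}): 0.1, frozenset({2}): 0.2, frozenset({1, 2}): 1.0}
print(is_monotone(v, omega), is_superadditive(v, omega), is_convex(v, omega))
```

The exhaustive checks are exponential in |Ω|, which is harmless here since the point is only to make the definitions concrete.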
All these conditions have natural game-theoretic interpretations (see, e.g. Moulin (1995) and Owen (1995)). For example, a game is monotone when larger coalitions can achieve higher values, and it is superadditive when combining disjoint coalitions results in more than proportional increases in value. As to supermodularity, it is a stronger property than superadditivity and it can be equivalently formulated as

ν(B ∪ C ∪ A) − ν(B ∪ C) ≥ ν(B ∪ A) − ν(B),   (4.1)
for all disjoint sets A, B, and C; hence, it can be interpreted as a property of increasing marginal values (see Proposition 4.15). Some assumptions of a more technical nature are also often made. For example, a game ν is
7 outer (inner, resp.) continuous at A if limn→∞ ν(An) = ν(A) whenever An ↓ A (An ↑ A, resp.);
8 continuous at A if it is both inner and outer continuous at A;
9 continuous if it is continuous at each A;
10 countably additive (a measure) if ν(∪i Ai) = ∑i ν(Ai) for every countable collection {Ai}i≥1 of pairwise disjoint sets such that ∪i Ai ∈ Σ.
We get important classes of games by combining some of the previous properties. In particular, monotone games are called capacities, additive games are called charges, and countably additive games are called measures. Finally, positive games ν that are normalized with ν(Ω) = 1 are called probabilities. Notice that capacities are always positive and bounded, while positive superadditive games are always capacities.
Given a charge µ, its total variation norm ‖µ‖ is given by

‖µ‖ = sup ∑i=1,…,n |µ(Ai) − µ(Ai−1)|,   (4.2)

where the supremum is taken over all finite chains Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = Ω. Denote by ba(Σ) and ca(Σ) the vector spaces of all charges and of all measures having finite total variation norm, respectively. By classic results (e.g. Dunford and Schwartz (1958) and Rao and Rao (1983)), a charge has finite total variation
48
Massimo Marinacci and Luigi Montrucchio
if and only if it is bounded, and both ba(Σ) and ca(Σ) are Banach spaces when endowed with the total variation norm. In particular, ca(Σ) is a closed subspace of ba(Σ). In view of these classic results, it is natural to wonder whether a useful norm can be introduced in more general spaces of games. Aumann and Shapley (1974) showed that this is the case by introducing the variation norm on the space of all games. Given a game ν, its variation norm ‖ν‖ is given by

‖ν‖ = sup ∑i=1,…,n |ν(Ai) − ν(Ai−1)|,   (4.3)

where the supremum is taken over all finite chains Ø = A0 ⊆ A1 ⊆ · · · ⊆ An = Ω. If ν is a charge, the variation norm ‖ν‖ reduces to the total variation norm. Moreover, all finite games are of bounded variation as they have a finite number of finite chains. Denote by bv(Σ) the vector space of all games ν having finite variation norm. Aumann and Shapley (1974) proved the following noteworthy properties.
Proposition 4.1. A game belongs to bv(Σ) if and only if it can be written as the difference of two capacities. Moreover, bv(Σ) endowed with the variation norm is a Banach space, and ba(Σ) and ca(Σ) are closed subspaces of bv(Σ).2
In view of this result, we can say that bv(Σ) is a Banach environment for not necessarily additive games that generalizes the classic spaces ba(Σ) and ca(Σ). In the sequel we will mostly consider games belonging to it.
We close this section by observing that each game ν has a dual game ν̄ defined by ν̄(A) = ν(Ω) − ν(Ac) for each A. From the definition it immediately follows that
• the dual of ν̄ is ν itself;
• ν is monotone if and only if ν̄ is;
• ν belongs to bv(Σ) if and only if ν̄ does.
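The dual game is easy to compute directly. The following sketch (helper names are mine) uses the three-point game of Example 4.1 below, and confirms both the involution property and the failure of subadditivity for the dual:

```python
from itertools import chain, combinations

def all_coalitions(omega):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def dual(v, omega):
    """Dual game: v_bar(A) = v(omega) - v(complement of A)."""
    full = frozenset(omega)
    return {A: v[full] - v[full - A] for A in all_coalitions(omega)}

# The three-point game of Example 4.1: 0 on singletons, 5/6 on pairs, 1 on omega.
omega = {1, 2, 3}
v = {A: (0.0 if len(A) <= 1 else (5/6 if len(A) == 2 else 1.0))
     for A in all_coalitions(omega)}
vb = dual(v, omega)

# v_bar(singleton) = 1 - 5/6 = 1/6 and v_bar(pair) = 1 - 0 = 1, so subadditivity
# fails: v_bar({1, 2}) > v_bar({1}) + v_bar({2}).
assert vb[frozenset({1, 2})] > vb[frozenset({1})] + vb[frozenset({2})]
# Taking the dual twice returns the original game.
assert all(abs(dual(vb, omega)[A] - v[A]) < 1e-12 for A in all_coalitions(omega))
```
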
More important, dual games have “dual” properties relative to the original game. For example,
• ν is convex if and only if ν̄ is concave, that is, ν̄(A ∪ B) + ν̄(A ∩ B) ≤ ν̄(A) + ν̄(B) for all A, B;
• ν is inner continuous at A if and only if ν̄ is outer continuous at Ac.
For charges µ it clearly holds that µ = µ̄. Without additivity, ν and ν̄ are in general distinct games (see Proposition 4.3), and sometimes it is useful to consider the pair (ν, ν̄) rather than ν alone.
Example 4.1. The duality between ν and ν̄ does not hold for all properties. For example, it is false that ν is superadditive if and only if ν̄ is subadditive. Consider
the game ν on Ω = {ω1, ω2, ω3} given by ν(ωi) = 0 for i = 1, 2, 3, ν(ωi ∪ ωj) = 5/6 for i ≠ j, and ν(Ω) = 1. Its dual ν̄ is given by ν̄(ωi) = 1/6 for i = 1, 2, 3, ν̄(ωi ∪ ωj) = 1 for i ≠ j, and ν̄(Ω) = 1. While ν is superadditive, its dual is not subadditive. In fact, ν̄(ω1 ∪ ω2) = 1 > ν̄(ω1) + ν̄(ω2) = 1/3. Normalized superadditive games having subadditive duals are sometimes called upper probabilities (see Wolfenson and Fine (1982) and the references contained therein).
4.2.2. The core
Given a game ν, its core is the (possibly empty) set given by

core(ν) = {µ ∈ ba(Σ) : µ(A) ≥ ν(A) for each A and µ(Ω) = ν(Ω)}.

In other words, the core of ν is the set of all suitably normalized charges that setwise dominate ν. Notice that

core(ν) = {µ ∈ ba(Σ) : ν ≤ µ ≤ ν̄} = {µ ∈ ba(Σ) : µ(A) ≤ ν̄(A) for each A and µ(Ω) = ν(Ω)},

and so the core can also be regarded as the set of charges “sandwiched” between the game and its dual, as well as the set of charges setwise dominated by the dual game. The core is a fundamental solution concept in cooperative game theory, where it is interpreted as the set of undominated allocations (see Moulin (1995) and Owen (1995)). After Schmeidler’s seminal works, the core plays an important role in decision theory as well, as detailed in Chapters 1–3. Mathematically, the interest of the core lies in the connection it provides between games and charges, which, unlike games, are familiar objects in measure theory. As will be seen later, useful properties of games can be deduced via the core from classic properties of charges. The core is a convex subset of ba(Σ). More interestingly, it has the following compactness property.3
Proposition 4.2. When nonempty, the core of a bounded game is weak*-compact.
Proof. Let µ ∈ core(ν) and let k = 2 supA∈Σ |ν(A)|. For each A it clearly holds that µ(A) ≥ ν(A) ≥ −k. On the other hand,

µ(A) = µ(Ω) − µ(Ac) ≤ ν(Ω) − ν(Ac) ≤ 2 supA∈Σ |ν(A)| = k,

and so |µ(A)| ≤ k. By (Dunford and Schwartz, 1958: 97), ‖µ‖ ≤ 2k, which implies core(ν) ⊆ {µ ∈ ba(Σ) : ‖µ‖ ≤ 2k}. By the Alaoglu Theorem (see Dunford and Schwartz, 1958: 424), {µ ∈ ba(Σ) : ‖µ‖ ≤ 2k} is weak*-compact. Therefore, to complete the proof it remains
to show that core(ν) is weak*-closed. Let {µα}α be a net in core(ν) that weak*-converges to µ ∈ ba(Σ). Using the properties of the weak* topology, it is easy to see that µ ∈ core(ν). Hence, core(ν) is weak*-closed.
Remark. When Σ is a σ-algebra, the boundedness condition on the game in Proposition 4.2 is superfluous by the Nikodym Uniform Boundedness Theorem (e.g. Rao and Rao, 1983: 204–205).
The core suggests some further taxonomy on games. A game ν is
11 balanced if its core is nonempty;
12 totally balanced if all its subgames νA have nonempty cores.4
We already observed that for a charge µ it holds that µ = µ̄. This property actually characterizes charges among balanced games.
Proposition 4.3. A balanced game ν is a charge if and only if ν = ν̄.
Proof. The “only if” part is trivial. As to the “if” part, let µ ∈ core(ν). As ν ≤ µ ≤ ν̄, we have ν = µ = ν̄, as desired.
The next result characterizes balanced games directly in terms of properties of the game ν. It was proved by Bondareva (1963) and Shapley (1967) for finite games, and extended to infinite games by Schmeidler (1968).
Theorem 4.1. A bounded game is balanced if and only if, for all λ1, . . . , λn ≥ 0 and all A1, . . . , An ∈ Σ, it holds

∑i=1,…,n λi ν(Ai) ≤ ν(Ω) whenever ∑i=1,…,n λi 1Ai = 1.   (4.4)

Proof. As the converse is trivial, we only show that ν is balanced provided it satisfies (4.4). By (4.4), ν(A) + ν(Ac) ≤ ν(Ω) for all A, so that ν ≤ ν̄. Let E be the collection of all finite subalgebras Σ0 of Σ; for each Σ0 ∈ E set

c(Σ0) = {γ ∈ RΣ : ν(A) ≤ γ(A) ≤ ν̄(A) for each A ∈ Σ and γ|Σ0 is a charge},

where RΣ is the collection of all set functions on Σ, and γ|Σ0 is the restriction of γ to Σ0. The set c(Σ0) is nonempty. In fact, as Σ0 is finite and the restriction ν|Σ0 satisfies (4.4), by Bondareva (1963) and Shapley (1967) there exists a charge γ0
on Σ0 satisfying ν(A) ≤ γ0(A) ≤ ν̄(A) for each A ∈ Σ0. If we set

γ(A) = γ0(A) if A ∈ Σ0, and γ(A) = ν(A) otherwise,

we have γ ∈ c(Σ0), so that c(Σ0) ≠ Ø. Set a = inf A∈Σ ν(A) and b = sup A∈Σ ν̄(A). Both a and b belong to R since ν is bounded, and so by the Tychonoff Theorem (see Aliprantis and Border, 1999: 52) the set ∏B∈Σ [a, b] is compact in the product topology of RΣ. Clearly, c(Σ0) ⊆ ∏B∈Σ [a, b]. We want to show that c(Σ0) is actually a closed subset of ∏B∈Σ [a, b]. Let γt be a net in c(Σ0) such that γt → γ ∈ RΣ in the product topology, that is, γt(A) → γ(A) for all A ∈ Σ. For each A and each t, we have ν(A) ≤ γt(A) ≤ ν̄(A); hence, ν(A) ≤ γ(A) ≤ ν̄(A). For each t and for all disjoint A and B in Σ0, we have γt(A ∪ B) = γt(A) + γt(B); hence, γ(A ∪ B) = γ(A) + γ(B). We conclude that γ ∈ c(Σ0), and so c(Σ0) is a closed (and so compact) subset of ∏B∈Σ [a, b].
If Σ0 ⊆ Σ0′, then c(Σ0′) ⊆ c(Σ0). Hence, denoting by Σ̃0 ∈ E the algebra generated by a finite sequence {Σ0i}i=1,…,n ⊆ E, we have

Ø ≠ c(Σ̃0) ⊆ ∩i=1,…,n c(Σ0i).

In other words, the collection of compact sets {c(Σ0)}Σ0∈E satisfies the finite intersection property. In turn, this implies ∩Σ0∈E c(Σ0) ≠ Ø (see Aliprantis and Border, 1999: 38), which means that there exists a charge γ such that ν(A) ≤ γ(A) ≤ ν̄(A) for each A ∈ Σ. Since γ ∈ ∏B∈Σ [a, b], the charge γ is bounded and so it belongs to ba(Σ). We conclude that core(ν) ≠ Ø, as desired.
Remark. As observed by Kannai (1969: 229–230), for positive games Theorem 4.1 also follows from a result of Fan (1956) on systems of linear inequalities in normed spaces.
Since countable additivity is a most useful technical property, it is natural to wonder when a nonempty core actually contains some measures. The next example of Kannai (1969) shows that this might well not happen.
Example 4.2. Let Ω = N and consider the game ν : 2N → R defined by

ν(A) = 0 if Ac is infinite, and ν(A) = 1 otherwise.

Here core(ν) ≠ Ø. In fact, let ∇ be any ultrafilter containing the filter of all sets having finite complements. The two-valued charge u∇ : 2N → R defined by

u∇(A) = 1 if A ∈ ∇, and u∇(A) = 0 otherwise
belongs to core(ν). On the other hand, core(ν) ∩ ca(Σ) = Ø. For, suppose per contra that µ ∈ core(ν) ∩ ca(Σ). For each n ∈ N we have µ({n}) = µ(N) − µ(N − {n}) = 0. The countable additivity of µ then implies µ(N) = ∑n µ({n}) = 0, which contradicts µ(N) = ν(N) = 1.
For positive games it is trivially true that core(ν) ⊆ ca(Σ) provided ν is continuous at Ω. In fact, for each monotone sequence An ↑ Ω it holds that

ν(Ω) = µ(Ω) ≥ limn µ(An) ≥ limn ν(An) = ν(Ω),

for all µ ∈ core(ν). Hence, µ(Ω) = limn µ(An), which implies µ ∈ ca(Σ). For signed games we have a more interesting result, based on Aumann and Shapley (1974: 173).
Proposition 4.4. Given a balanced game ν, it holds that core(ν) ⊆ ca(Σ) provided ν is continuous at both Ω and Ø.
Proof. Consider An ↑ Ω. Let µ ∈ core(ν). We want to show that µ(Ω) = limn µ(An). Since µ(An) ≥ ν(An) for each n, by the continuity of ν at Ω we have lim inf n µ(An) ≥ lim inf n ν(An) = ν(Ω). On the other hand, since Acn ↓ Ø and ν is continuous at Ø, we have

lim supn µ(An) = µ(Ω) − lim inf n µ(Acn) ≤ ν(Ω) − lim inf n ν(Acn) = ν(Ω).

In sum, lim supn µ(An) ≤ ν(Ω) ≤ lim inf n µ(An), and so µ(Ω) = limn µ(An), as desired.
The next example shows that in general these continuity properties are only sufficient for the core to be contained in ca(Σ).
Example 4.3. Let λ be the Lebesgue measure on [0, 1] and let f : [0, 1] → R be given by

f(x) = x for 0 ≤ x ≤ 1/2,  f(x) = 1/2 for 1/2 < x < 1,  f(x) = 1 for x = 1.

Consider the game ν(A) = f(λ(A)) for each A. Though this game is not continuous at Ω, we have core(ν) = {λ} ⊆ ca(Σ). For, let µ ∈ core(ν). We want to show that µ = λ. Given any A, there is a partition {Ai}i=1,…,n of A such that λ(Ai) ≤ 1/2 for each i, so that µ(Ai) ≥ ν(Ai) = f(λ(Ai)) = λ(Ai). Hence, µ(A) = ∑i µ(Ai) ≥ ∑i λ(Ai) = λ(A). Since A was arbitrary, this implies µ ≥ λ, and so µ = λ.
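On a finite state space, core membership reduces to finitely many inequalities, so definitions like these can be probed numerically. A sketch with a hypothetical three-state game (all names and values are mine, chosen only for illustration):

```python
from itertools import chain, combinations

def all_coalitions(omega):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1))]

def in_core(mu, v, omega):
    """A charge mu (dict state -> mass) lies in core(v) iff mu(A) >= v(A) for
    every coalition A and mu(omega) = v(omega)."""
    def m(A):
        return sum(mu[s] for s in A)
    full = frozenset(omega)
    return (abs(m(full) - v[full]) < 1e-9
            and all(m(A) >= v[A] - 1e-9 for A in all_coalitions(omega)))

# Hypothetical three-state game: 0 on singletons, 1/2 on pairs, 1 on omega.
omega = {1, 2, 3}
v = {A: (0.0 if len(A) <= 1 else (0.5 if len(A) == 2 else 1.0))
     for A in all_coalitions(omega)}

print(in_core({1: 1/3, 2: 1/3, 3: 1/3}, v, omega))   # the uniform charge dominates v
print(in_core({1: 0.7, 2: 0.2, 3: 0.1}, v, omega))   # fails on the pair {2, 3}
```
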
Intuitively, this example works because the connection between the form of the game ν = f(λ) and its core is a bit “loose.” Formally, there are gaps between ν and the core’s lower envelope minµ∈core(ν) µ(A). For example, if A is such that λ(A) = 3/4, then ν(A) = 1/2 < 3/4 = minµ∈core(ν) µ(A). To fix this problem, Schmeidler (1972) introduced the following class of games: a game ν is
13 exact if it is balanced and ν(A) = minµ∈core(ν) µ(A) for each A.
In other words, a game is exact if for each A there is µ ∈ core(ν) such that ν(A) = µ(A). Exact games can thus be viewed as games in which there is a tight connection between the form of the game and its core. Schmeidler (1972) provided a characterization of exact games in terms of the game ν, related to (4.4). Moreover, he was able to prove that for exact games continuity becomes a necessary and sufficient condition for the core to be a subset of ca(Σ). To see why this is the case, we need a remarkable property of weak*-compact subsets of ca(Σ), due to Bartle, Dunford and Schwartz (see Maccheroni and Marinacci (2000) and the references contained therein). The result requires Σ to be a σ-algebra, a natural domain for continuous set functions.
Lemma 4.1. If Σ is a σ-algebra, then a subset of ca(Σ) is weak*-compact if and only if it is weakly compact.
Remark. As the proof shows, this lemma is a consequence of the Dini Theorem when K ⊆ ca⁺(Σ).
Proof. It is enough to prove that a weak*-compact subset of ca(Σ) is weakly compact, the converse being trivial. Suppose K ⊆ ca(Σ) is weak*-compact. Since K is bounded and weakly closed, by Dunford and Schwartz (1958: Theorem IV.9.1) the set K is sequentially weakly compact if and only if, given An ↑ Ω, for each ε > 0 there is a positive integer n(ε) such that |µ(Ω) − µ(An)| < ε for all µ ∈ K and all n ≥ n(ε). In other words, if and only if the measures in K are uniformly countably additive.
For convenience, we only consider the case K ⊆ ca⁺(Σ) (see, e.g., Maccheroni and Marinacci (2000) for the general case). For each n ≥ 1 consider the evaluation functions φn : ba(Σ) → R defined by φn(µ) = µ(An) for each µ ∈ ba(Σ). Moreover, let φ : ba(Σ) → R be defined by φ(µ) = µ(Ω) for each µ ∈ ba(Σ). Both the function φ and each function φn are weak*-continuous, and the sequence {φn}n≥1 is increasing on K. As K is weak*-compact and

limn φn(µ) = limn µ(An) = µ(Ω) = φ(µ) for each µ ∈ K,

by the Dini Theorem (see Aliprantis and Border, 1999: 55) φn converges uniformly to φ. In turn, this easily implies the desired uniform countable additivity of the
measures in K, and so K is sequentially weakly compact. By the Eberlein–Smulian Theorem (see Aliprantis and Border, 1999: 256), K is then weakly compact as well.
Using this lemma we can prove the following result, due to Schmeidler (1972) for positive games. Here |µ|(A) denotes the total variation of µ at A (see Aliprantis and Border, 1999: 360).
Theorem 4.2. Let ν : Σ → R be an exact game defined on a σ-algebra Σ. Then, the following conditions are equivalent:
(i) ν is continuous at Ω and Ø.
(ii) ν is continuous at each A.
(iii) core(ν) is a weakly compact subset of ca(Σ).
(iv) There exists λ ∈ ca⁺(Σ) such that, given any A, for all ε > 0 there exists δ > 0 such that

λ(A) < δ =⇒ |µ|(A) < ε for all µ ∈ core(ν).   (4.5)
Remark. Inspection of the proof shows that when ν is positive, in (i) we can just assume continuity at Ω, while in (iv) we can choose λ so that it belongs to core(ν).
Proof. (ii) trivially implies (i), which in turn implies core(ν) ⊆ ca(Σ) by Proposition 4.4. By Proposition 4.2, core(ν) is weak*-compact, and so, by Lemma 4.1, it is weakly compact as well.
Assume (iii) holds. Since core(ν) is a weakly compact subset of ca(Σ), by Dunford and Schwartz (1958: Theorem IV.9.2) there is λ ∈ ca⁺(Σ) such that (iv) holds. If ν is positive, following Delbaen (1974: 226) replace 1/2i by 1/mn at the bottom of Dunford and Schwartz (1958: 307) to get λ ∈ core(ν).
It remains to show that (iv) implies (ii). Assume (iv). Since λ is countably additive, (4.5) implies that each µ ∈ core(ν) is countably additive as well, that is, core(ν) ⊆ ca(Σ). By Lemma 4.1, core(ν) is weakly compact. We are now ready to show that ν is continuous at each A. Per contra, suppose there is some A at which ν is not continuous, that is, there is a sequence, say An ↑ A (the argument for An ↓ A is similar), and some η > 0 such that |ν(An) − ν(A)| ≥ η. As ν is exact, for each n there is µn ∈ core(ν) such that ν(An) = µn(An). By the Eberlein–Smulian Theorem (see Aliprantis and Border, 1999: 256), core(ν) is sequentially weakly compact as well. Hence, there is a suitable subsequence {µnk}k of {µn}n such that µnk weakly converges to some µ̃ ∈ core(ν). By Dunford and Schwartz (1958: Theorem IV.9.5), this means that limk µnk(A) = µ̃(A) for each A. Now, consider

ν(Ank) = µnk(Ank) = µnk(A) − µnk(A \ Ank).   (4.6)

Clearly, A \ Ank ↓ Ø. Since core(ν) is weakly compact, by Dunford and Schwartz (1958: Theorem IV.9.1) the measures in core(ν) are uniformly countably additive,
Introduction to the mathematics of ambiguity
Hence, for each ε > 0 there is k(ε) ≥ 1 such that |µ(A \ A_{n_k})| < ε for all µ ∈ core(ν) and all k ≥ k(ε). In particular, |µ_{n_k}(A \ A_{n_k})| < ε for all k ≥ k(ε). As ε is arbitrary, this implies lim_k µ_{n_k}(A \ A_{n_k}) = 0. By (4.6), we then have

lim_k ν(A_{n_k}) = lim_k µ_{n_k}(A_{n_k}) = µ̃(A) ≥ ν(A).    (4.7)

On the other hand, there exists a µ̂ ∈ core(ν) such that µ̂(A) = ν(A). Hence,

ν(A) = µ̂(A) = lim_k µ̂(A_{n_k}) ≥ lim_k ν(A_{n_k}).    (4.8)
Putting together (4.7) and (4.8), we get ν(A) = lim_k ν(A_{n_k}), thus contradicting |ν(A_n) − ν(A)| ≥ η. We conclude that ν is continuous at A, as desired.

Point (iv) is noteworthy. It says that the continuity of ν guarantees the existence of a positive control measure λ for core(ν), that is, a measure λ ∈ ca⁺(Σ) such that µ ≪ λ for all µ ∈ core(ν). This is a very useful property; inter alia, it implies that core(ν) can be identified with a subset of L¹(λ), the set of all (equivalence classes of) Σ-measurable functions that are integrable with respect to λ. In fact, by the Radon–Nikodym Theorem (see Aliprantis and Border, 1999: 437), to each µ ≪ λ there corresponds a unique f ∈ L¹(λ) such that µ(A) = ∫_A f dλ for all A ∈ Σ.

Corollary 4.1. Let ν : Σ → R be an exact game defined on a σ-algebra Σ. Then, ν is continuous at Ω and Ø if and only if there is λ ∈ ca⁺(Σ) such that core(ν) is a weakly compact subset of L¹(λ).

Proof. Set ca(λ) = {µ ∈ ca(Σ) : µ ≪ λ}. By the Radon–Nikodym Theorem, there is an isometric isomorphism between ca(λ) and L¹(λ) determined by the formula µ(A) = ∫_A f dλ (see Dunford and Schwartz, 1958: 306). Hence, a subset is weakly compact in ca(λ) if and only if it is weakly compact in L¹(λ) as well.

It is sometimes useful to know when the core of a continuous game consists of non-atomic measures. We close the section by studying this problem, which also provides a further illustration of the usefulness of the control measure λ. In order to do so, we need to introduce null sets. Given a game ν, a set N ∈ Σ is ν-null if

ν(N ∪ A) = ν(A) for all A ∈ Σ.    (4.9)

The next lemma collects some basic properties of null sets.

Lemma 4.2. Given a game ν, let N be a ν-null set. Then: (i) each subset B ⊆ N is ν-null; (ii) ν(B) = 0 and ν(A \ B) = ν(A) for any B ⊆ N; (iii) N is ν̄-null.
Massimo Marinacci and Luigi Montrucchio
Proof. (i) Let B ⊆ N and let A be any set in Σ. By (4.9),

ν(B ∪ A) = ν(B ∪ A ∪ N) = ν(A ∪ N) = ν(A),

and so B is ν-null.

(ii) If we put A = Ø in (4.9), we get ν(N) = 0. By (i), each B ⊆ N is ν-null, so that ν(B) = 0 by what we have just established. It remains to show that ν(A \ B) = ν(A) for any B ⊆ N. By (i), A ∩ B is ν-null. Hence, ν(A \ B) = ν((A \ B) ∪ (A ∩ B)) = ν(A), as desired.

(iii) Let A be any set in Σ. By (ii) we then have

ν̄(A ∪ N) = ν(Ω) − ν(Aᶜ \ N) = ν(Ω) − ν(Aᶜ) = ν̄(A),

as desired.

For a charge µ, a set N is µ-null if and only if |µ|(N) = 0. For, suppose N is µ-null. We have (see Aliprantis and Border, 1999: 360)

|µ|(N) = sup{|µ(B)| + |µ(N \ B)| : B ⊆ N},

and so point (ii) of Lemma 4.2 implies |µ|(N) = 0. Conversely, suppose |µ|(N) = 0. Then |µ(B)| = 0 for each B ⊆ N, and so

µ(A ∪ N) = µ(A ∪ (N \ A)) = µ(A) + µ(N \ A) = µ(A)

for each set A ∈ Σ. We conclude that N is µ-null, as desired.

Given two games ν₁ and ν₂, we say that ν₁ is absolutely continuous with respect to ν₂ (written ν₁ ≪ ν₂) when each ν₂-null set is ν₁-null; we say that the two games are equivalent (written ν₁ ≡ ν₂) when a set is ν₁-null if and only if it is ν₂-null. In the special case of charges we get back to the standard definitions of absolute continuity (see Aliprantis and Border, 1999: 363).

Given a balanced game ν, we have µ ≪ ν for each µ ∈ core(ν). For, let m ∈ core(ν) and suppose N is ν-null. For each A ⊆ N, we have m(A) ≥ ν(A) = 0, and

m(Aᶜ) ≥ ν(Aᶜ) = ν(Ω) = m(Ω) = m(A) + m(Aᶜ).

Hence, m(A) = 0 for all A ⊆ N, namely, |m|(N) = 0.

For continuous exact games we have the following deeper result, due to Schmeidler (1972: Theorem 3.10), which provides a further useful property of the control measure λ.

Lemma 4.3. Given an exact and continuous game ν defined on a σ-algebra Σ, let λ be the control measure of Theorem 4.2. Then, ν ≡ λ.
Proof. By Dunford and Schwartz (1958: Theorem IV.9.2), we have

λ = ∑_{n=1}^∞ (2⁻ⁿ/k_n) ∑_{i=1}^{k_n} µ_i^n    (4.10)

with each µ_i^n ∈ core(ν). Let N be ν-null. As µ ≪ ν for each µ ∈ core(ν), N is µ-null for each µ ∈ core(ν). Hence, |µ|(N) = 0 for all µ ∈ core(ν). By (4.10), λ(N) = 0. Therefore N is λ-null.

Conversely, suppose λ(N) = 0. As µ ≪ λ for each µ ∈ core(ν), we have |µ|(N) = 0 for each µ ∈ core(ν). By exactness, given any F ∈ Σ there are µ, µ′ ∈ core(ν) such that

ν(N ∪ F) = µ(N ∪ F) = µ(F) ≥ ν(F) = µ′(F) = µ′(N ∪ F) ≥ ν(N ∪ F),

and so N is ν-null. We conclude that ν ≡ λ, as desired.

A game ν is non-atomic if for each ν-nonnull set A there is a set B ⊆ A such that both B and A \ B are ν-nonnull. In particular, a charge µ is non-atomic if and only if for each A with |µ|(A) > 0 there is B ⊆ A such that 0 < |µ|(B) < |µ|(A). In turn, this is equivalent to requiring that for each A with µ(A) ≠ 0 there is B ⊆ A such that both µ(B) ≠ 0 and µ(A \ B) ≠ 0 (see Rao and Rao, 1983: 141–142). We can now state and prove the announced result on “non-atomic” cores.

Proposition 4.5. Let ν be a continuous exact game defined on a σ-algebra Σ. Then, ν is non-atomic if and only if core(ν) consists of non-atomic measures.

Proof. “Only if” part. Suppose ν is non-atomic. By Lemma 4.3, λ is non-atomic as well. In turn, this implies that each µ ∈ core(ν) is non-atomic. In fact, let |µ|(A) > 0 for some A; since µ ≪ λ, λ(A) > 0. Since λ is non-atomic, there is a partition A¹, B¹ of A such that λ(A¹) = λ(B¹) = (1/2)λ(A) (see Rao and Rao, 1983: Theorem 5.1.6). If 0 < |µ|(A¹) < |µ|(A) or 0 < |µ|(B¹) < |µ|(A), we are done. Suppose, in contrast, that either |µ|(A¹) = |µ|(A) or |µ|(B¹) = |µ|(A); without loss of generality, let |µ|(A¹) = |µ|(A). Let A², B² be a partition of A¹ such that λ(A²) = λ(B²) = (1/2)λ(A¹). If 0 < |µ|(A²) < |µ|(A¹) or 0 < |µ|(B²) < |µ|(A¹), we are done. Suppose, in contrast, that either |µ|(A²) = |µ|(A¹) or |µ|(B²) = |µ|(A¹); without loss of generality, let |µ|(A²) = |µ|(A¹). By proceeding in this way, either we find a set B ⊆ A such that 0 < |µ|(B) < |µ|(A), or we can construct a chain {Aⁿ}_{n≥1} such that λ(Aⁿ) = (1/2ⁿ)λ(A) and |µ|(Aⁿ) = |µ|(A) for all n ≥ 1. In the latter case, since ∩_{n≥1} Aⁿ ∈ Σ, we get λ(∩_{n≥1} Aⁿ) = 0 and |µ|(∩_{n≥1} Aⁿ) = |µ|(A) > 0. Since µ ≪ λ, this is impossible, and so there exists some set B ⊆ A such that 0 < |µ|(B) < |µ|(A). We conclude that µ is non-atomic, as desired.
“If” part. Suppose that each µ ∈ core(ν) is non-atomic. Set λ_n = (2⁻ⁿ/k_n) ∑_{i=1}^{k_n} |µ_i^n| in (4.10), so that λ = ∑_{n=1}^∞ λ_n. Each positive measure λ_n is non-atomic. For, suppose λ_n(A) > 0. There is some |µ_i^n| such that |µ_i^n|(A) > 0. Hence, there is B ⊆ A such that |µ_i^n|(B) > 0 and |µ_i^n|(A \ B) > 0. Since λ_n ≥ (2⁻ⁿ/k_n)|µ_i^n|, we then have λ_n(B) > 0 and λ_n(A \ B) > 0, as desired. Since each λ_n is non-atomic, λ is non-atomic as well. For, suppose λ(A) > 0. There is some λ_n such that λ_n(A) > 0. Hence, there is B ⊆ A such that λ_n(B) > 0 and λ_n(A \ B) > 0. Since λ ≥ λ_n, we then have λ(B) > 0 and λ(A \ B) > 0, and so λ is non-atomic. By Lemma 4.3, ν ≡ λ. As λ is non-atomic, this implies that ν is non-atomic as well, as desired.
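Before turning to integration, note that on a finite state space the defining condition (4.9) for null sets can be checked exhaustively. The sketch below is our own illustration, not from the text (Python; the dict-of-frozensets representation and all numbers are assumptions): it verifies that a state carrying no weight in an additive game is null, and confirms Lemma 4.2(ii).

```python
from itertools import chain, combinations

def subsets(omega):
    """All subsets of a finite state space, as frozensets."""
    s = list(omega)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

def is_null(nu, N, omega):
    """Condition (4.9): N is nu-null iff nu(N | A) = nu(A) for every A."""
    return all(abs(nu[A | N] - nu[A]) < 1e-12 for A in subsets(omega))

# an additive game in which state c carries no weight (illustrative numbers)
omega = ("a", "b", "c")
weight = {"a": 0.4, "b": 0.6, "c": 0.0}
nu = {A: sum(weight[w] for w in A) for A in subsets(omega)}

assert is_null(nu, frozenset("c"), omega)       # {c} is nu-null
assert not is_null(nu, frozenset("a"), omega)   # {a} is not
assert nu[frozenset("c")] == 0.0                # Lemma 4.2(ii): null sets have value 0
```

The same brute-force check applies unchanged to a genuinely non-additive ν, since (4.9) never appeals to additivity.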
4.3. Choquet integrals

Given a game ν : Σ → R and a real-valued function f : Ω → R, a natural question is whether there is a meaningful way to write an integral ∫ f dν that extends the standard notions of integrals for additive games. Fortunately, Choquet (1953: 265) has shown that it is possible to develop a rich theory of integration in a non-additive setting. As usual with notions of integration, we will present Choquet's integral in a few steps, beginning with positive functions.

4.3.1. Positive functions

A function f : Ω → R is Σ-measurable if f⁻¹(I) ∈ Σ for each open and each closed interval I of R (see Dunford and Schwartz, 1958: 240). The set of all bounded Σ-measurable f : Ω → R is denoted by B(Σ).

Proposition 4.6. The set B(Σ) is a lattice. If, in addition, Σ is a σ-algebra, then B(Σ) is a vector lattice.

Proof. Let f, g ∈ B(Σ). We only prove that (f ∨ g)⁻¹(a, b) ∈ Σ for any open (possibly unbounded) interval (a, b) ⊆ R, the other cases being similar. For each t ∈ R, the following holds:

(f ∨ g > t) = (f > t) ∪ (g > t),
(f ∨ g < t) = (f < t) ∩ (g < t).

Hence,

(f ∨ g)⁻¹(a, b) = (f ∨ g > a) ∩ (f ∨ g < b) = ((f > a) ∪ (g > a)) ∩ ((f < b) ∩ (g < b)) ∈ Σ,

as desired. Finally, the fact that B(Σ) is a vector space when Σ is a σ-algebra is a standard result in measure theory (see Aliprantis and Border, 1999: Theorem 4.26).
Given a capacity ν : Σ → R and a positive Σ-measurable function f : Ω → R, the Choquet integral of f with respect to ν is given by

∫ f dν = ∫_0^∞ ν({ω ∈ Ω : f(ω) ≥ t}) dt,    (4.11)

where on the right we have a Riemann integral. To see why the Riemann integral is well defined, first observe that

f⁻¹([t, +∞)) = {ω ∈ Ω : f(ω) ≥ t} ∈ Σ for each t ∈ R.

Set E_t = {ω ∈ Ω : f(ω) ≥ t}; the survival function G_ν : R → R of f with respect to ν is defined by G_ν(t) = ν(E_t) for each t ∈ R. Using this function, we can write (4.11) as ∫ f dν = ∫_0^∞ G_ν(t) dt. The family {E_t}_{t∈R} is a chain, with E_t ⊇ E_{t′} if t ≤ t′.⁵ Since ν is a capacity, we have ν(E_t) ≥ ν(E_{t′}) if t ≤ t′, and so G_ν is a decreasing function. Moreover, since f is both positive and bounded, the function G_ν is positive, decreasing, and with compact support. By standard results on Riemann integration, we conclude that the Riemann integral ∫_0^{+∞} G_ν(t) dt exists, and so the Choquet integral (4.11) is well defined.

The Choquet integral ∫ f dν reduces to the standard additive integral when ν is additive. Given a positive charge µ and a function f in B(Σ), let ∫ f dµ be the standard additive integral for charges (see Aliprantis and Border, 1999: 399 and Rao and Rao, 1983: 115–121).

Proposition 4.7. Given a positive function f ∈ B(Σ) and a positive charge µ ∈ ba(Σ), the Choquet integral ∫ f dµ = ∫_0^∞ µ(f ≥ t) dt coincides with the standard additive integral of f with respect to µ.

Proof. We use an argument of Rudin (1987: 172). Set E_t = (f ≥ t). Given ω ∈ Ω, we have

∫_0^∞ 1_{E_t}(ω) dt = ∫_0^∞ 1_{[0, f(ω)]}(t) dt = ∫_0^{f(ω)} dt = f(ω).

Equivalently, f(ω) = ∫_0^∞ 1_{E_t}(ω) dλ, where λ is the Lebesgue measure on R. By the Fubini Theorem for the integral (e.g., Marinacci, 1997), we can write

∫ f dµ = ∫ (∫_0^∞ 1_{E_t}(ω) dλ) dµ = ∫_0^∞ (∫ 1_{E_t}(ω) dµ) dλ = ∫_0^∞ µ(f ≥ t) dλ = ∫_0^∞ µ(f ≥ t) dt,

as desired.
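On a finite state space the survival function G_ν is a step function, so the Riemann integral in (4.11) collapses to a finite sum over the distinct values of f. The sketch below is our own illustration, not from the chapter (Python; the capacity is stored as a dict keyed by frozenset, and all numbers are illustrative); it also checks Proposition 4.7 against an additive charge.

```python
def choquet_positive(f, nu, omega):
    """Choquet integral (4.11) of a positive f: the integral of
    nu({f >= t}) over t > 0, computed exactly since the survival
    function is a step function on a finite state space."""
    total, prev = 0.0, 0.0
    for v in sorted(set(f.values())):
        upper = frozenset(w for w in omega if f[w] >= v)  # {f >= t} for t in (prev, v]
        total += (v - prev) * nu[upper]
        prev = v
    return total

omega = ("a", "b")
# a non-additive capacity: nu({a}) + nu({b}) < nu(Omega)
nu = {frozenset(): 0.0, frozenset("a"): 0.2,
      frozenset("b"): 0.3, frozenset("ab"): 1.0}
f = {"a": 1.0, "b": 3.0}
# by hand: int_0^1 nu(Omega) dt + int_1^3 nu({b}) dt = 1 + 2*0.3 = 1.6
assert abs(choquet_positive(f, nu, omega) - 1.6) < 1e-12

# Proposition 4.7: with an additive mu the Choquet integral is the usual one
mu = {frozenset(): 0.0, frozenset("a"): 0.4,
      frozenset("b"): 0.6, frozenset("ab"): 1.0}
assert abs(choquet_positive(f, mu, omega)
           - sum(f[w] * mu[frozenset(w)] for w in omega)) < 1e-12
```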
We close by observing that in defining Choquet integrals we could have equivalently used the “strict” upper sets (f > t).

Proposition 4.8. Let ν be a capacity and f a positive function in B(Σ). Then,

∫_0^∞ ν(f ≥ t) dt = ∫_0^∞ ν(f > t) dt.
Proof. As before, set G_ν(t) = ν(f ≥ t) for each t ∈ R. Moreover, set G′_ν(t) = ν(f > t) for each t ∈ R. We have

(f ≥ t + 1/n) ⊆ (f > t) ⊆ (f ≥ t) for each t ∈ R,

and so G_ν(t + 1/n) ≤ G′_ν(t) ≤ G_ν(t) for each t ∈ R. If G_ν is continuous at t, we have

G_ν(t) = lim_n G_ν(t + 1/n) ≤ G′_ν(t) ≤ G_ν(t),

so that G′_ν(t) = G_ν(t). On the other hand, as G_ν is a decreasing function, it is continuous except on an at most countable set T ⊆ R. As a result, G′_ν(t) = G_ν(t) for all t ∉ T, which in turn implies ∫_0^∞ G_ν(t) dt = ∫_0^∞ G′_ν(t) dt by standard results on Riemann integration.

4.3.2. General functions

We now extend the definition of the Choquet integral to general Σ-measurable functions. In the previous subsection we have defined the Choquet integral on B⁺(Σ), the cone of all positive elements of B(Σ). Each capacity ν induces a functional ν_c : B⁺(Σ) → R on this cone, given by ν_c(f) = ∫ f dν for each f ∈ B⁺(Σ). If f is a characteristic function 1_A, we get ν_c(1_A) = ∫ 1_A dν = ν(A); thus, the functional ν_c, which we call the Choquet functional, can be viewed as an extension of the capacity ν from Σ to B⁺(Σ). Our problem of defining a Choquet integral on B(Σ) can be viewed as the problem of how to extend the Choquet functional to the entire space B(Σ).

In principle, there are many different ways to extend it. To make the extension problem meaningful we have to set a desideratum for the extension, that is, a property we want it to satisfy. A natural property to require is that the extended functional ν_c : B(Σ) → R be translation invariant, that is,

ν_c(f + α1_Ω) = ν_c(f) + αν_c(1_Ω) for each α ∈ R and each f ∈ B(Σ).

The next result shows that this desideratum pins down the extension to a particular form.

Proposition 4.9. A Choquet functional ν_c : B⁺(Σ) → R induced by a capacity admits a unique translation invariant extension, given by

ν_c(f) = ∫_0^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt    (4.12)

for each f ∈ B(Σ), where on the right we have two Riemann integrals.
Proof. Set

ν_c(f) = ∫_0^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.

The functional ν_c is well defined, and some simple algebra shows that it is translation invariant and that it reduces to the Choquet integral when f ∈ B⁺(Σ). Assume ν̂ : B(Σ) → R is a translation invariant functional such that ν̂(f) = ν_c(f) whenever f ∈ B⁺(Σ). We want to show that ν̂ satisfies (4.12), so that ν̂ = ν_c.

Let f ∈ B(Σ) be such that inf f = γ < 0. By translation invariance, ν̂(f − γ) = ν̂(f) − γν̂(1_Ω). As f − γ belongs to B⁺(Σ), we can then write:

ν̂(f) = ν̂(f − γ) + γν̂(1_Ω) = ν_c(f − γ) + γν_c(1_Ω)
= ∫_0^∞ ν((f − γ) ≥ t) dt + γν_c(1_Ω)
= ∫_0^∞ ν(f ≥ t + γ) dt + γν_c(1_Ω)
= ∫_γ^∞ ν(f ≥ τ) dτ + γν_c(1_Ω)
= ∫_0^∞ ν(f ≥ τ) dτ + ∫_γ^0 ν(f ≥ τ) dτ − ∫_γ^0 ν(Ω) dτ,

where the penultimate equality is due to the change of variable τ = t + γ. As ν(f ≥ τ) − ν(Ω) = 0 for all τ ≤ γ, the following holds:

ν̂(f) = ∫_0^∞ ν(f ≥ τ) dτ + ∫_{−∞}^0 [ν(f ≥ τ) − ν(Ω)] dτ.

Hence, ν̂ = ν_c, as desired.

Before moving on, observe that the Riemann integrals in (4.12) exist even if ν is a game of bounded variation, that is, if ν ∈ bv(Σ). In fact, for each such game there exist two capacities ν₁ and ν₂ with ν = ν₁ − ν₂. Hence, ν(f ≥ t) = ν₁(f ≥ t) − ν₂(f ≥ t) for each t ∈ R, and so ν(f ≥ t) is a function of bounded variation in t. The Riemann integrals in (4.12) then exist by standard results on Riemann integrals.

In view of Proposition 4.9 and the above observation, next we define the Choquet integral for functions in B(Σ) with respect to games in bv(Σ) as the translation invariant extension of the definition given in (4.11) for positive functions.
Definition 4.1. Given a game ν ∈ bv(Σ) and a function f ∈ B(Σ), the Choquet integral ∫ f dν is defined by

∫ f dν = ∫_0^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.    (4.13)

The associated Choquet functional ν_c : B(Σ) → R is given by ν_c(f) = ∫ f dν for each f ∈ B(Σ).

Translation invariance and Proposition 4.7 imply that when ν is a bounded charge, the Choquet integral ∫ f dν of an f ∈ B(Σ) reduces to the standard additive integral. Moreover, it is easy to check that Proposition 4.8 holds for general Choquet integrals, that is,

∫ f dν = ∫_0^∞ ν(f > t) dt + ∫_{−∞}^0 [ν(f > t) − ν(Ω)] dt.
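Definition 4.1 is also directly computable on a finite state space: both integrands in (4.13) are step functions, and the negative part only contributes on [min f, 0]. The sketch below is our own illustration, not from the chapter (Python; capacity as a dict keyed by frozenset, illustrative numbers); it evaluates (4.13) for a function taking a negative value and checks translation invariance of the extension.

```python
def choquet(f, nu, omega):
    """Choquet integral (4.13) on a finite state space: sum the exact
    contributions of each interval between consecutive breakpoints,
    using nu(f >= t) for t > 0 and nu(f >= t) - nu(Omega) for t < 0."""
    full = frozenset(omega)
    pts = sorted(set(f.values()) | {0.0})  # 0 separates the two integrals
    total = 0.0
    for lo, hi in zip(pts, pts[1:]):
        # on (lo, hi] the upper set {f >= t} is constant and equals {f >= hi}
        g = nu[frozenset(w for w in omega if f[w] >= hi)]
        total += (hi - lo) * (g if lo >= 0.0 else g - nu[full])
    return total

omega = ("a", "b")
nu = {frozenset(): 0.0, frozenset("a"): 0.2,
      frozenset("b"): 0.3, frozenset("ab"): 1.0}
f = {"a": -1.0, "b": 2.0}

# by hand: -1 + 3 * nu({b}) = -0.1
assert abs(choquet(f, nu, omega) - (-0.1)) < 1e-12

# translation invariance of the extension (Proposition 4.9)
shifted = {w: f[w] + 1.5 for w in omega}
assert abs(choquet(shifted, nu, omega)
           - (choquet(f, nu, omega) + 1.5 * nu[frozenset(omega)])) < 1e-12
```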
Finally, the Choquet integral (4.13) is well defined for all finite games since they belong to bv(Σ). Since in the finite case B(Σ) = R^Ω, this means that finite games induce Choquet functionals ν_c : R^Ω → R.

Example 4.4. Given a nonempty coalition A, the unanimity game u_A : Σ → R is the two-valued convex game defined by

u_A(B) = 1 if A ⊆ B, and u_A(B) = 0 otherwise,

for all B ∈ Σ. For each f ∈ B(Σ) it holds that ∫ f du_A = inf_{ω∈A} f(ω). In fact, we have A ⊆ (f ≥ t) if and only if t ≤ inf_{ω∈A} f(ω), and so G_{u_A}(t) = 1_{(−∞, inf_{ω∈A} f(ω)]}(t).

Example 4.5. Let Ω = {ω₁, ω₂} and suppose ν is a capacity on 2^Ω with 0 < ν(ω₁) < 1, 0 < ν(ω₂) < 1, and ν(Ω) = 1. Then ν_c : R² → R is given by

ν_c(x₁, x₂) = x₁(1 − ν(ω₂)) + x₂ν(ω₂) if x₂ ≥ x₁,
ν_c(x₁, x₂) = x₁ν(ω₁) + x₂(1 − ν(ω₁)) if x₂ < x₁.

Given any k ∈ R, the level curve {(x₁, x₂) ∈ R² : ν_c(x₁, x₂) = k} is

x₂ = k/ν(ω₂) − [(1 − ν(ω₂))/ν(ω₂)] x₁ if x₂ ≥ x₁,
x₂ = k/(1 − ν(ω₁)) − [ν(ω₁)/(1 − ν(ω₁))] x₁ if x₂ < x₁.

As a result, the level curve is a straight line when ν is a charge, that is, when ν(ω₁) + ν(ω₂) = 1; in contrast, it has a kink at the 45-degree line {(x₁, x₂) ∈ R² : x₁ = x₂} when ν is not a charge. The non-additivity of ν is thus reflected
by kinks in the level curves. In general, level curves of Choquet integrals are not affine spaces, unless the game is a charge.

A function f in B(Σ) is simple if it is finite-valued, that is, if the set {f(ω) : ω ∈ Ω} is finite. Each simple function f admits a unique representation f = ∑_{i=1}^k α_i 1_{A_i}, where {A_i}_{i=1}^k ⊆ Σ is a suitable partition of Ω and α₁ > · · · > α_k. Using this representation, we can rewrite formula (4.13) in a couple of equivalent ways, which are sometimes useful (e.g., in the discussion of the Choquet Expected Utility model of Schmeidler (1989) in Chapter 1).

Proposition 4.10. Given a game ν ∈ bv(Σ) and a simple function f ∈ B(Σ), it holds that

∫ f dν = ∑_{i=1}^k (α_i − α_{i+1}) ν(∪_{j=1}^i A_j) = ∑_{i=1}^k α_i [ν(∪_{j=0}^i A_j) − ν(∪_{j=0}^{i−1} A_j)],

where we set α_{k+1} = 0 and A₀ = Ø.

Proof. It is enough to prove the first equality, the other being a simple rearrangement of its terms. Let f be positive, so that α_k ≥ 0. If t > α₁, then {ω ∈ Ω : f(ω) ≥ t} = Ø. If t ∈ (α_{i+1}, α_i], then (recall that α_{k+1} = 0):

{ω ∈ Ω : f(ω) ≥ t} = ∪_{j=1}^i A_j.

Hence,

ν(f ≥ t) = ∑_{i=1}^k ν(∪_{j=1}^i A_j) 1_{(α_{i+1}, α_i]}(t) for each t ∈ R₊,

so that

∫ f dν = ∫_0^∞ ν(f ≥ t) dt = ∫_0^∞ ∑_{i=1}^k ν(∪_{j=1}^i A_j) 1_{(α_{i+1}, α_i]}(t) dt = ∑_{i=1}^k ν(∪_{j=1}^i A_j) ∫_0^∞ 1_{(α_{i+1}, α_i]}(t) dt = ∑_{i=1}^k (α_i − α_{i+1}) ν(∪_{j=1}^i A_j),

as desired. This proves the first equality for a positive f. The case of a general f is easily obtained using translation invariance.
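Proposition 4.10 gives the familiar “rank-ordered” computation used throughout the decision-theory literature: sort outcomes from best to worst and weight each increment by the capacity of the growing union of upper-level coalitions. A sketch (our own, not from the chapter; Python with illustrative numbers), cross-checked against the two affine pieces of Example 4.5:

```python
def choquet_ranked(f, nu, omega):
    """Choquet integral of a positive simple f via Proposition 4.10:
    sum over i of (alpha_i - alpha_{i+1}) * nu(A_1 | ... | A_i),
    with alpha_1 > ... > alpha_k the distinct values of f."""
    alphas = sorted(set(f.values()), reverse=True) + [0.0]  # alpha_{k+1} = 0
    total, cum = 0.0, set()
    for a, nxt in zip(alphas, alphas[1:]):
        cum |= {w for w in omega if f[w] == a}  # grow the union A_1 | ... | A_i
        total += (a - nxt) * nu[frozenset(cum)]
    return total

omega = ("w1", "w2")
nu = {frozenset(): 0.0, frozenset({"w1"}): 0.2,
      frozenset({"w2"}): 0.3, frozenset({"w1", "w2"}): 1.0}

# Example 4.5, case x2 >= x1: nu_c(x1, x2) = x1*(1 - nu(w2)) + x2*nu(w2)
x1, x2 = 1.0, 3.0
assert abs(choquet_ranked({"w1": x1, "w2": x2}, nu, omega)
           - (x1 * (1.0 - 0.3) + x2 * 0.3)) < 1e-12

# the other affine piece (x2 < x1): x1*nu(w1) + x2*(1 - nu(w1))
assert abs(choquet_ranked({"w1": 3.0, "w2": 1.0}, nu, omega)
           - (3.0 * 0.2 + 1.0 * (1.0 - 0.2))) < 1e-12
```

The two checks land on different affine formulas on the two sides of the 45-degree line, which is exactly the kink discussed in Example 4.5.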
When ν ∈ ba(Σ), the above formulae reduce to

∫ f dν = ∑_{i=1}^k α_i ν(A_i),

which is the standard integral of f with respect to the charge ν.

Example 4.6. Let P : Σ → [0, 1] be a probability charge with range R(P) = {P(A) : A ∈ Σ}. Given a real-valued function g : R(P) → R, the game ν = g(P) is called a scalar measure game. It holds that

∫ f dν = ∫_0^∞ g(P(f ≥ t)) dt + ∫_{−∞}^0 [g(P(f ≥ t)) − g(1)] dt.

The right-hand side becomes

∑_{i=1}^k α_i [g(P(∪_{j=0}^i A_j)) − g(P(∪_{j=0}^{i−1} A_j))]

when f is a simple function. This is a familiar formula in Rank Dependent Expected Utility (see Chapters 1 and 2).

4.3.3. Basic properties

We begin by collecting a few basic properties of Choquet integrals. Here, ‖·‖ on bv(Σ) is the variation norm given by (4.3), while ≥ and ‖·‖ on B(Σ) are the pointwise order and the supnorm, respectively.⁶

Proposition 4.11. Suppose ν_c : B(Σ) → R is the Choquet functional induced by a game ν ∈ bv(Σ). Then:

(i) (Positive homogeneity): ν_c(αf) = αν_c(f) for each α ≥ 0.
(ii) (Translation invariance): ν_c(f + α1_Ω) = ν_c(f) + αν_c(1_Ω) for each α ∈ R.
(iii) (Monotonicity): ν_c(f) ≥ ν_c(g) if f ≥ g, provided ν is a capacity.
(iv) (Lipschitz continuity): for all f, g ∈ B(Σ),

|ν_c(f) − ν_c(g)| ≤ ‖ν‖ ‖f − g‖.    (4.14)

Proof. Properties (i) and (ii) are easily established. To see that (iii) holds, it is enough to observe that, ν being a capacity, we have ν(g ≥ t) ≤ ν(f ≥ t) for each t ∈ R, since f ≥ g implies (g ≥ t) ⊆ (f ≥ t) for each t ∈ R.

As to (iv), suppose first that ν is a capacity. Assume ν_c(f) ≥ ν_c(g) (the other case is similar). As f ≤ g + ‖f − g‖1_Ω, by (ii) and (iii) we have ν_c(f) ≤ ν_c(g) + ‖f − g‖ν(Ω). This implies

|ν_c(f) − ν_c(g)| ≤ ν(Ω)‖f − g‖,    (4.15)

which is (4.14) when ν is monotonic. For, in this case ‖ν‖ = ν(Ω).
Now, let ν ∈ bv(Σ). By Aumann and Shapley (1974: 28), ν can be written as ν = ν⁺ − ν⁻, where ν⁺ and ν⁻ are capacities such that ‖ν‖ = ν⁺(Ω) + ν⁻(Ω). Applying (4.15) to each of them, we then have |ν_c(f) − ν_c(g)| ≤ [ν⁺(Ω) + ν⁻(Ω)]‖f − g‖, as desired.

If a game ν belongs to bv(Σ), its dual ν̄ belongs to bv(Σ) as well. The Choquet functional ν̄_c is therefore well defined, and next we show that it can be viewed as the dual functional of ν_c.

Proposition 4.12. Let ν ∈ bv(Σ). Then, ν̄_c(f) = −ν_c(−f) for each f ∈ B(Σ). If, in addition, ν is balanced, then ν_c(f) ≤ µ(f) ≤ ν̄_c(f) for each f ∈ B(Σ) and each µ ∈ core(ν).

Proof. Given f ∈ B(Σ), we have

ν̄_c(f) = ∫_0^∞ ν̄(f ≥ t) dt + ∫_{−∞}^0 [ν̄(f ≥ t) − ν̄(Ω)] dt
= ∫_0^∞ [ν(Ω) − ν(f < t)] dt − ∫_{−∞}^0 ν(f < t) dt
= ∫_0^∞ [ν(Ω) − ν(f ≤ t)] dt − ∫_{−∞}^0 ν(f ≤ t) dt
= ∫_0^∞ [ν(Ω) − ν(−f ≥ −t)] dt − ∫_{−∞}^0 ν(−f ≥ −t) dt
= ∫_{−∞}^0 [ν(Ω) − ν(−f ≥ t)] dt − ∫_0^∞ ν(−f ≥ t) dt
= −(∫_0^∞ ν(−f ≥ t) dt + ∫_{−∞}^0 [ν(−f ≥ t) − ν(Ω)] dt)
= −ν_c(−f),

where the second equality uses ν̄(A) = ν(Ω) − ν(Aᶜ) and ν̄(Ω) = ν(Ω), the third holds because ν(f < t) and ν(f ≤ t) differ for at most countably many t (as in the proof of Proposition 4.8), and the fifth is the change of variable t ↦ −t.

Suppose ν is balanced. Then ν(A) ≤ µ(A) ≤ ν̄(A) for each A ∈ Σ and each µ ∈ core(ν). In turn, this implies that, given any f ∈ B(Σ), ν(f ≥ t) ≤ µ(f ≥ t) ≤ ν̄(f ≥ t) for each t ∈ R. By the monotonicity of the Riemann integral (recall that ν̄(Ω) = µ(Ω) = ν(Ω)),

∫_0^∞ ν̄(f ≥ t) dt + ∫_{−∞}^0 [ν̄(f ≥ t) − ν̄(Ω)] dt
≥ ∫_0^∞ µ(f ≥ t) dt + ∫_{−∞}^0 [µ(f ≥ t) − µ(Ω)] dt
≥ ∫_0^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt,

and so ν_c(f) ≤ µ(f) ≤ ν̄_c(f), as desired.

In general, Choquet functionals ν_c : B(Σ) → R are not additive, that is, it is in general false that ν_c(f + g) = ν_c(f) + ν_c(g). However, the next result, due to Dellacherie (1971), shows that additivity holds in a restricted sense. Say that two functions f, g ∈ B(Σ) are comonotonic (short for “commonly monotonic”) if

(f(ω) − f(ω′))(g(ω) − g(ω′)) ≥ 0 for any pair ω, ω′ ∈ Ω.

That is, two functions are comonotonic provided they have a similar pattern.

Theorem 4.3. Suppose ν_c : B(Σ) → R is the Choquet functional induced by a game ν ∈ bv(Σ). Then, ν_c(f + g) = ν_c(f) + ν_c(g) provided f and g are comonotonic, and f + g ∈ B(Σ).

To prove this result we need a couple of useful lemmas. The first one says that two functions f and g are comonotonic if and only if all their upper sets are nested. This is trivially true for the two collections (f ≥ t) and (g ≥ t) separately; the interesting part here is that f and g are comonotonic if and only if this is still the case for the combined collection {(f ≥ t)}_{t∈R} ∪ {(g ≥ t)}_{t∈R}. For a proof of this lemma we refer to Denneberg (1994: Prop. 4.5).

Lemma 4.4. Two functions f, g ∈ B(Σ) are comonotonic if and only if the overall collection of all upper sets (f ≥ t) and (g ≥ t) is a chain.

The next lemma says that we can replicate games over chains with suitable charges. The non-additivity of a game is, therefore, immaterial as long as we restrict ourselves to chains.

Lemma 4.5. Let ν ∈ bv(Σ). Given any chain C in Σ there is µ ∈ ba(Σ) such that

µ(A) = ν(A) for all A ∈ C.    (4.16)
If, in addition, ν is a capacity, then we can take µ ∈ ba⁺(Σ).

Proof. It is enough to prove the result for a capacity ν, as the extension to any game in bv(Σ) is routine in view of their decomposition as differences of capacities given in Proposition 4.1.

Consider first a finite chain Ø = A₀ ⊆ A₁ ⊆ · · · ⊆ A_n ⊆ A_{n+1} = Ω. Let Σ₀ be the finite subalgebra of Σ generated by this chain. Let µ₀ ∈ ba⁺(Σ₀) be defined by µ₀(A_{i+1} \ A_i) = ν(A_{i+1}) − ν(A_i) for i = 0, 1, …, n. By standard extension theorems for positive charges (see Rao and Rao, 1983: Corollary 3.3.4), there exists µ ∈ ba⁺(Σ) which extends µ₀ to Σ, that is, µ(A) = µ₀(A) for each A ∈ Σ₀. Hence, µ is the desired charge.

Now, let C be any chain. Let {C_α}_α be the collection of all its finite subchains, and set Θ_α = {µ ∈ ba⁺(Σ) : µ(A) = ν(A) for each A ∈ C_α}. By what we just proved, each Θ_α is nonempty. Moreover, the collection {Θ_α}_α has the finite intersection property. For, let {C_i}_{i=1}^n ⊆ {C_α}_α be a finite collection. Since ∪_{i=1}^n C_i is in turn a finite chain, by proceeding as before it is easy to establish the existence of a µ ∈ ba(Σ) such that µ(A) = ν(A) for each A ∈ ∪_{i=1}^n C_i. As µ ∈ ∩_{i=1}^n Θ_i, the intersection ∩_{i=1}^n Θ_i is nonempty, as desired. Each Θ_α is a weak*-closed subset of the weak*-compact set {µ ∈ ba⁺(Σ) : µ(Ω) = ν(Ω)}. Since {Θ_α}_α has the finite intersection property, we conclude that ∩_α Θ_α ≠ Ø. Any charge µ ∈ ∩_α Θ_α satisfies (4.16).
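The construction in the proof of Lemma 4.5 is fully explicit for a finite chain: assign to each “layer” A_{i+1} \ A_i the mass ν(A_{i+1}) − ν(A_i). The sketch below is our own illustration, not from the chapter (Python; the chain is passed as a list of frozensets containing Ø and Ω, and all numbers are illustrative).

```python
def charge_on_chain(nu, chain_sets, omega):
    """Lemma 4.5, finite case: build a point-mass charge mu with
    mu(A) = nu(A) for every A in the chain, by spreading the increment
    nu(A_{i+1}) - nu(A_i) over the layer A_{i+1} minus A_i."""
    links = sorted(chain_sets, key=len)   # empty set first, Omega last
    mass = {w: 0.0 for w in omega}
    for small, big in zip(links, links[1:]):
        layer = big - small
        inc = nu[big] - nu[small]         # >= 0 when nu is a capacity
        for w in layer:
            mass[w] += inc / len(layer)   # any split of the layer works
    return mass

omega = ("a", "b", "c")
chain_sets = [frozenset(), frozenset("a"), frozenset("ab"), frozenset("abc")]
nu = {frozenset(): 0.0, frozenset("a"): 0.1,
      frozenset("ab"): 0.5, frozenset("abc"): 1.0}

mass = charge_on_chain(nu, chain_sets, omega)
mu = lambda A: sum(mass[w] for w in A)
for A in chain_sets:                      # mu replicates nu along the chain
    assert abs(mu(A) - nu[A]) < 1e-12
assert all(m >= 0.0 for m in mass.values())
```

Off the chain, µ is just one of many additive extensions; only the values along the chain are pinned down, which is all that Theorem 4.3 needs.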
Proof of Theorem 4.3. Suppose f and g are comonotonic functions in B(Σ). Then, the sum f + g is comonotonic with both f and g, so that the collection {f, g, f + g} consists of pairwise comonotonic functions. Let

C = {(f ≥ t)}_{t∈R} ∪ {(g ≥ t)}_{t∈R} ∪ {(f + g ≥ t)}_{t∈R}.

By Lemma 4.4, C is a chain. By Lemma 4.5, there is µ ∈ ba(Σ) such that µ(A) = ν(A) for all A ∈ C. Hence,

∫ f dν + ∫ g dν = ∫ f dµ + ∫ g dµ = ∫ (f + g) dµ = ∫ (f + g) dν,

as desired.

As constant functions are comonotonic with all other functions, comonotonic additivity is a much stronger property than translation invariance. The next result of Bassanezi and Greco (1984: Theorem 2.1) shows that comonotonic additivity is actually the “best” possible type of additivity for Choquet functionals.

Proposition 4.13. Suppose Σ contains all singletons. Then, two functions f, g ∈ B(Σ), with f + g ∈ B(Σ), are comonotonic if and only if

ν_c(f + g) = ν_c(f) + ν_c(g)    (4.17)

holds for all Choquet functionals induced by convex capacities ν : Σ → R.

Proof. The “only if” part holds by Theorem 4.3. As to the “if” part, assume (4.17) holds for all Choquet functionals induced by convex capacities. Suppose, per contra, that f and g are not comonotonic. Then, there exist ω′, ω″ ∈ Ω such that [f(ω′) − f(ω″)][g(ω′) − g(ω″)] < 0. Say that f(ω′) < f(ω″) and g(ω′) > g(ω″), and consider the convex game

u_{{ω′,ω″}}(A) = 1 if {ω′, ω″} ⊆ A, and u_{{ω′,ω″}}(A) = 0 otherwise.

By Example 4.4, ∫ f du_{{ω′,ω″}} = f(ω′) and ∫ g du_{{ω′,ω″}} = g(ω″). Hence,

∫ (f + g) du_{{ω′,ω″}} = min{(f + g)(ω′), (f + g)(ω″)} > f(ω′) + g(ω″) = ∫ f du_{{ω′,ω″}} + ∫ g du_{{ω′,ω″}},

which contradicts (4.17).

Notice that the argument used to prove the last result can be adapted to give the following characterization of comonotonicity: when Σ contains all singletons, two functions f, g ∈ B(Σ) are comonotonic if and only if

inf_{ω∈A} (f(ω) + g(ω)) = inf_{ω∈A} f(ω) + inf_{ω∈A} g(ω)

for all A ∈ Σ.

Lemmas 4.4 and 4.5 are especially useful in finding counterparts for games and for their Choquet integrals of standard results that hold in the additive case. Theorem 4.3 is a first important example, since through these lemmas we could derive the counterpart for Choquet integrals of the additivity of standard integrals. We close this subsection with another simple illustration of this feature of Lemmas 4.4 and 4.5 by showing a version for Choquet integrals of the classic Jensen inequality.

Proposition 4.14. Let ν be a capacity with ν(Ω) = 1. Given a monotone convex function φ : R → R, for each f ∈ B(Σ) the following holds:

∫ φ(f) dν ≥ φ(∫ f dν).

Proof. Given any f ∈ B(Σ), the functions φ ∘ f and f are comonotonic. By Lemmas 4.4 and 4.5, there is µ ∈ ba⁺(Σ) such that µ(f ≥ t) = ν(f ≥ t) and µ(φ(f) ≥ t) = ν(φ(f) ≥ t) for each t ∈ R. In turn, this implies µ(Ω) = ν(Ω) = 1, ∫ φ(f) dν = ∫ φ(f) dµ, and ∫ f dν = ∫ f dµ. By the standard Jensen inequality:

∫ φ(f) dν = ∫ φ(f) dµ ≥ φ(∫ f dµ) = φ(∫ f dν),

as desired.
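Both Theorem 4.3 and Proposition 4.14 are easy to check numerically on a small finite example. The sketch below is our own illustration, not from the chapter (Python; the convex capacity and all numbers are illustrative; φ = exp is a monotone convex function, and ν(Ω) = 1 as Proposition 4.14 requires).

```python
import math

def choquet_pos(f, nu, omega):
    """Choquet integral of a positive f on a finite space (rank formula)."""
    alphas = sorted(set(f.values()), reverse=True) + [0.0]
    total, cum = 0.0, set()
    for a, nxt in zip(alphas, alphas[1:]):
        cum |= {w for w in omega if f[w] == a}
        total += (a - nxt) * nu[frozenset(cum)]
    return total

omega = ("a", "b")
nu = {frozenset(): 0.0, frozenset("a"): 0.2,
      frozenset("b"): 0.3, frozenset("ab"): 1.0}   # convex: 0.2 + 0.3 < 1

f = {"a": 1.0, "b": 3.0}
g = {"a": 3.0, "b": 7.0}   # g = 2f + 1, hence comonotonic with f
h = {"a": 3.0, "b": 1.0}   # moves against f: not comonotonic

add = lambda u, v: {w: u[w] + v[w] for w in omega}

# Theorem 4.3: additivity for the comonotonic pair ...
assert abs(choquet_pos(add(f, g), nu, omega)
           - (choquet_pos(f, nu, omega) + choquet_pos(g, nu, omega))) < 1e-12
# ... and strict superadditivity for the non-comonotonic one (nu is convex)
assert choquet_pos(add(f, h), nu, omega) > \
       choquet_pos(f, nu, omega) + choquet_pos(h, nu, omega)

# Proposition 4.14 (Choquet-Jensen) with phi = exp and nu(Omega) = 1
phi_f = {w: math.exp(f[w]) for w in omega}
assert choquet_pos(phi_f, nu, omega) >= math.exp(choquet_pos(f, nu, omega))
```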
4.4. Representation

Summing up, Choquet functionals are positively homogeneous, comonotonic additive, and Lipschitz continuous; they are also monotone provided the underlying game is. A natural question is whether these properties actually characterize Choquet functionals among all the functionals defined on B(Σ). Schmeidler (1986) showed that this is the case, and we now present his result.

Theorem 4.4. Let ν̂ : B(Σ) → R be a functional. Define the game ν(A) = ν̂(1_A) on Σ. The following conditions are equivalent:

(i) ν̂ is monotone and comonotonic additive;
(ii) ν is a capacity and, for all f ∈ B(Σ), it holds:

ν̂(f) = ∫_0^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.    (4.18)

Remarks. (i) Positive homogeneity is a redundant condition here, as it is implied by comonotonic additivity and monotonicity, as shown in the proof. (ii) Zhou (1998) proved a version of this result on Stone lattices.

Proof. (ii) trivially implies (i). Conversely, assume (i). We divide the proof into three steps.

Step 1. For any f ∈ B(Σ) and any integer n, by comonotonic additivity we have ν̂(f) = ν̂(n(f/n)) = nν̂(f/n), namely ν̂(f/n) = (1/n)ν̂(f). Hence, given any positive rational number α = m/n,

ν̂((m/n)f) = ν̂(f/n + · · · + f/n) = ν̂(f/n) + · · · + ν̂(f/n) = (m/n)ν̂(f).

As a result, we have ν̂(λf) = λν̂(f) for any λ ∈ Q⁺. In particular, this implies 0 = ν̂(λ1_Ω − λ1_Ω) = λν(Ω) + ν̂(−λ1_Ω) for each λ ∈ Q⁺, and so

ν̂(f + λ1_Ω) = ν̂(f) + ν̂(λ1_Ω) = ν̂(f) + λν(Ω)

for each f ∈ B(Σ) and each λ ∈ Q.

Step 2. We now prove that ν̂ is supnorm continuous. Let f, g ∈ B(Σ) and let {r_n}_n be a sequence of rationals such that r_n ↓ ‖f − g‖. As f ≤ g + ‖f − g‖1_Ω ≤ g + r_n1_Ω,
it follows that ν̂(f) ≤ ν̂(g) + r_nν(Ω). Consequently, |ν̂(f) − ν̂(g)| ≤ r_nν(Ω). As n → ∞, we get |ν̂(f) − ν̂(g)| ≤ ‖f − g‖ν(Ω). Hence, ν̂ is Lipschitz continuous, and so supnorm continuous. In turn, this implies ν̂(λf) = λν̂(f) for all λ ≥ 0 and ν̂(f + λ1_Ω) = ν̂(f) + λν(Ω) for each f ∈ B(Σ) and each λ ∈ R; that is, ν̂ is translation invariant.

Step 3. It remains to show that (4.18) holds, that is, that ν̂(f) = ν_c(f) for all f ∈ B(Σ). Since both ν̂ and ν_c are supnorm continuous and B₀(Σ), the set of simple functions, is supnorm dense in B(Σ), it is enough to show that ν̂(f) = ν_c(f) for all f ∈ B₀(Σ). Let f ∈ B₀(Σ). Since both ν̂ and ν_c are translation invariant, it is enough to show that ν̂(f) = ν_c(f) for f ≥ 0. As f ∈ B₀(Σ), we can write f = ∑_{i=1}^k α_i 1_{A_i}, where {A_i}_{i=1}^k ⊆ Σ is a suitable partition of Ω and α₁ > · · · > α_k. Setting D_i = ∪_{j=1}^i A_j and α_{k+1} = 0, we can then write f = ∑_{i=1}^{k−1} (α_i − α_{i+1})1_{D_i} + α_k1_Ω. As the functions {(α_i − α_{i+1})1_{D_i}}_{i=1}^{k−1} and α_k1_Ω are pairwise comonotonic, by the comonotonic additivity and positive homogeneity of ν̂ we have

ν̂(f) = ∑_{i=1}^{k−1} (α_i − α_{i+1}) ν(∪_{j=1}^i A_j) + α_kν(Ω).

Since ∑_{i=1}^k (α_i − α_{i+1}) ν(∪_{j=1}^i A_j) = ∫_0^∞ ν(f ≥ t) dt by Proposition 4.10, we conclude that ν̂(f) = ν_c(f), as desired.

Next we extend Schmeidler's Theorem to the non-monotonic case. Given a functional ν̂ : B(Σ) → R and any two f, g ∈ B(Σ) with f ≤ g, set

V(f; g) = sup ∑_{i=0}^{n−1} |ν̂(f_{i+1}) − ν̂(f_i)|,

where the supremum is taken over all finite chains f = f₀ ≤ f₁ ≤ · · · ≤ f_n = g. We say that ν̂ is of bounded variation if V(0; f) < +∞ for all f ∈ B⁺(Σ).

Theorem 4.5. Let ν̂ : B(Σ) → R be a functional. Define the game ν(A) = ν̂(1_A) on Σ. The following conditions are equivalent:

(i) ν̂ is comonotonic additive and of bounded variation;
(ii) ν̂ is comonotonic additive and supnorm continuous on B⁺(Σ), and ν ∈ bv(Σ);
(iii) ν ∈ bv(Σ) and, for all f ∈ B(Σ),

ν̂(f) = ∫_0^∞ ν(f ≥ t) dt + ∫_{−∞}^0 [ν(f ≥ t) − ν(Ω)] dt.

Remark. When Σ is finite, the requirement ν ∈ bv(Σ) becomes superfluous in conditions (ii) and (iii) as all finite games are of bounded variation.
Before proving the result, we give a useful lemma. Observe that the decomposition f = (f − t)⁺ + (f ∧ t) reduces to the standard f = f⁺ − f⁻ when t = 0.

Lemma 4.6. Let ν̂ : B(Σ) → ℝ be a comonotonic additive functional. Then, ν̂(f) = ν̂((f − t)⁺) + ν̂(f ∧ t) for each t ∈ ℝ and f ∈ B(Σ).

Proof. Given any t ∈ ℝ, the functions (f − t)⁺ and f ∧ t are comonotonic. In fact, since f ∧ t = t − (f − t)⁻, for any ω, ω′ ∈ Ω we have

[(f − t)⁺(ω) − (f − t)⁺(ω′)][(f ∧ t)(ω) − (f ∧ t)(ω′)]
= [(f − t)⁺(ω) − (f − t)⁺(ω′)][(f − t)⁻(ω′) − (f − t)⁻(ω)]
= (f − t)⁺(ω)(f − t)⁻(ω′) + (f − t)⁺(ω′)(f − t)⁻(ω) ≥ 0,

where the last equality uses (f − t)⁺(ω)(f − t)⁻(ω) = 0 at each ω, as desired.

Proof of Theorem 4.5. (i) implies (ii). Clearly, ν ∈ bv(Σ). We want to show that (i) implies that ν̂ is supnorm continuous on B⁺(Σ). As Step 1 of the proof of Theorem 4.4 still holds here, we have ν̂(f + λ1_Ω) = ν̂(f) + λν(Ω) for each f ∈ B(Σ) and each λ ∈ ℚ; that is, ν̂ is translation invariant w.r.t. ℚ. Let f, g ∈ B(Σ) with f ≤ g. If f ≥ 0, then V(f; g) ≤ V(0; g) < +∞. Suppose f is not necessarily positive. There exists λ ∈ ℚ⁺ such that f + λ ≥ 0 and g + λ ≥ 0. By the translation invariance w.r.t. ℚ of ν̂, we have V(f; g) = V(f + λ; g + λ) for all λ ∈ ℚ. Hence, V(f; g) = V(f + λ; g + λ) < +∞. It is easy to see that V(0; λf) = λV(0; f) for all λ ∈ ℚ⁺. The next claim gives a deeper property of V(f; g).

Claim. For all f ≥ 0 and all λ ∈ ℚ⁺, it holds that V(−λ; f) = V(−λ; 0) + V(0; f).

Proof of the Claim. If f ≤ h ≤ g, we have V(f; g) ≥ V(f; h) + V(h; g). Hence, it suffices to show that V(−λ; f) ≤ V(−λ; 0) + V(0; f). By definition, for any ε > 0 there exists a chain {ϕᵢ}ᵢ₌₀ⁿ such that

Σ_{i=0}^{n−1} |ν̂(ϕᵢ₊₁) − ν̂(ϕᵢ)| ≥ V(−λ; f) − ε,
with ϕ₀ = −λ and ϕₙ = f. For each ϕᵢ consider the two functions ϕᵢ⁻ = −(ϕᵢ ∧ 0) and ϕᵢ⁺ = ϕᵢ ∨ 0, and the two chains {−ϕᵢ⁻} and {ϕᵢ⁺}. The former chain is relative to V(−λ; 0), while the latter is relative to V(0; f). Therefore, we have

V(−λ; 0) + V(0; f) ≥ Σ_{i=0}^{n−1} |ν̂(−ϕ⁻ᵢ₊₁) − ν̂(−ϕᵢ⁻)| + Σ_{i=0}^{n−1} |ν̂(ϕ⁺ᵢ₊₁) − ν̂(ϕᵢ⁺)|
= Σ_{i=0}^{n−1} (|ν̂(−ϕ⁻ᵢ₊₁) − ν̂(−ϕᵢ⁻)| + |ν̂(ϕ⁺ᵢ₊₁) − ν̂(ϕᵢ⁺)|).   (4.19)
On the other hand, by Lemma 4.6, for each i we have ν̂(ϕᵢ) = ν̂(ϕᵢ⁺) + ν̂(−ϕᵢ⁻), and so

|ν̂(ϕᵢ₊₁) − ν̂(ϕᵢ)| = |ν̂(ϕ⁺ᵢ₊₁) + ν̂(−ϕ⁻ᵢ₊₁) − ν̂(ϕᵢ⁺) − ν̂(−ϕᵢ⁻)|
≤ |ν̂(ϕ⁺ᵢ₊₁) − ν̂(ϕᵢ⁺)| + |ν̂(−ϕ⁻ᵢ₊₁) − ν̂(−ϕᵢ⁻)|.

In view of (4.19), we can write

V(−λ; 0) + V(0; f) ≥ Σ_{i=0}^{n−1} |ν̂(ϕᵢ₊₁) − ν̂(ϕᵢ)| ≥ V(−λ; f) − ε,
which proves our claim.

Define the monotone functional ν̂₁(f) = V(0; f) on B⁺(Σ). For each λ ∈ ℚ⁺ we have

ν̂₁(f + λ) = V(0; f + λ) = V(−λ; f) = V(−λ; 0) + V(0; f) = V(0; λ) + V(0; f) = λV(0; 1_Ω) + V(0; f) = λν̂₁(1_Ω) + ν̂₁(f).

Hence, ν̂₁ is translation invariant w.r.t. ℚ⁺. Since ν̂₁ is monotone, by Step 2 of the proof of Theorem 4.4 it is Lipschitz continuous, and so supnorm continuous. Consider the functional ν̂₂ = ν̂₁ − ν̂ on B⁺(Σ). The functional ν̂₂ is monotone; moreover, it is translation invariant w.r.t. ℚ because both ν̂₁ and ν̂ are. Consequently, by Step 2 of the proof of Theorem 4.4, ν̂₂ is supnorm continuous. As ν̂ = ν̂₁ − ν̂₂, we conclude that ν̂ too is supnorm continuous, thus completing the proof that (i) implies (ii).

(ii) implies (iii). Step 1 of the proof of Theorem 4.4 holds here as well. Hence, ν̂(λf) = λν̂(f) for all λ ∈ ℚ⁺, and ν̂(f + λ1_Ω) = ν̂(f) + λν(Ω) for each f ∈ B(Σ) and each λ ∈ ℚ. By supnorm continuity, ν̂(λf) = λν̂(f) for all λ ≥ 0, and ν̂(f + λ1_Ω) = ν̂(f) + λν(Ω) for each λ ∈ ℝ. The functional ν̂ is, therefore, positively homogeneous and translation invariant. Let νc be the Choquet functional associated with ν. As ν ∈ bv(Σ), νc is well defined and supnorm continuous. We want to show that ν̂ = νc. Since both ν̂ and νc are supnorm continuous and B₀(Σ) is supnorm dense in B(Σ), it is enough
to show that ν̂(f) = νc(f) for each f ∈ B₀(Σ). This can be established by proceeding as in Step 3 of the proof of Theorem 4.4.

(iii) implies (i). It remains to show that the Choquet functional νc is of bounded variation as long as ν ∈ bv(Σ). By Proposition 4.1, there exist capacities ν¹ and ν² such that ν = ν¹ − ν². Hence, νc = νc¹ − νc², and so the functional νc is the difference of two monotone functionals. This implies

V(f; g) ≤ νc¹(g) − νc¹(f) + νc²(g) − νc²(f),

and we conclude that νc is of bounded variation.
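Lemma 4.6 can be illustrated numerically on a finite space: splitting f into the comonotonic pieces (f − t)⁺ and f ∧ t, the Choquet functional adds across them. The sketch below is in Python; the capacity is a hypothetical example invented for the illustration, not one from the text.

```python
OMEGA = (0, 1, 2)

def choquet(f, nu):
    # telescoping form of the Choquet functional on a finite space
    total, prev = 0.0, frozenset()
    for w in sorted(OMEGA, key=lambda w: -f[w]):
        cur = prev | {w}
        total += f[w] * (nu[cur] - nu[prev])
        prev = cur
    return total

# a hypothetical monotone capacity with nu(empty) = 0 and nu(Omega) = 1
nu = {frozenset(): 0.0, frozenset({0}): 0.1, frozenset({1}): 0.2,
      frozenset({2}): 0.1, frozenset({0, 1}): 0.5, frozenset({0, 2}): 0.4,
      frozenset({1, 2}): 0.6, frozenset({0, 1, 2}): 1.0}

f = {0: 3.0, 1: -2.0, 2: 1.0}
t = 0.5
f_plus = {w: max(f[w] - t, 0.0) for w in OMEGA}  # (f - t)^+
f_meet = {w: min(f[w], t) for w in OMEGA}        # f ∧ t

# Lemma 4.6: the two pieces are comonotonic, so the functional is additive on them
lhs = choquet(f, nu)
rhs = choquet(f_plus, nu) + choquet(f_meet, nu)
```

The same decomposition works for any t, since the upper sets of (f − t)⁺ and of f ∧ t all belong to the single chain of upper sets of f.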
4.5. Convex games

Convex games are an interesting class of games, and they played an important role in Schmeidler's approach to ambiguity, as explained in Chapter 1. Here we show some of their remarkable mathematical properties. We begin by proving formally that convexity can be formulated as in Equation (4.1), a version useful in game theory for interpreting supermodularity in terms of marginal values (see Moulin, 1995).

Proposition 4.15. For any game ν, the following properties are equivalent:

(i) ν is convex;
(ii) for all sets A, B, and C such that A ⊆ B and B ∩ C = Ø, ν(A ∪ C) − ν(A) ≤ ν(B ∪ C) − ν(B);
(iii) for all disjoint sets A, B, and C: ν(B ∪ A) − ν(B) ≤ ν(B ∪ C ∪ A) − ν(B ∪ C).

Proof. Clearly, (iii) implies (ii): given A ⊆ B and B ∩ C = Ø, the sets A, B \ A, and C are pairwise disjoint, and (iii) applied to them yields (ii). Next assume (ii) holds. Since (A ∪ B) \ A = B \ (A ∩ B), to check the supermodularity of ν it is enough to apply (ii) to the sets A ∩ B ⊆ A and C = (A ∪ B) \ A. Finally, assume (i) holds. If the sets A, B, and C are disjoint, then (B ∪ C) ∩ (B ∪ A) = B and (B ∪ C) ∪ (B ∪ A) = B ∪ C ∪ A, and so supermodularity implies (iii), as desired.

The next result, due to Choquet (1953: 289), shows that the convexity of the game and the superlinearity of the associated Choquet functional are two faces of the same coin.⁷ Recall that, by Proposition 4.6, B(Σ) is a lattice, and that it becomes a vector lattice when Σ is a σ-algebra.

Theorem 4.6. For any game ν in bv(Σ), the following conditions are equivalent:

(i) ν is convex;
(ii) νc is superadditive on B(Σ), that is, νc(f + g) ≥ νc(f) + νc(g) for all f, g ∈ B(Σ) such that f + g ∈ B(Σ);
(iii) νc is supermodular on B(Σ), that is, νc(f ∨ g) + νc(f ∧ g) ≥ νc(f) + νc(g) for all f, g ∈ B(Σ).
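On a finite space, convexity (supermodularity) can be checked by brute force directly from the definition. The Python sketch below, with capacities invented for the example, tests the inequality ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B) over all pairs of sets.

```python
from itertools import combinations

def powerset(states):
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

def is_convex(nu, states):
    # supermodularity: nu(A | B) + nu(A & B) >= nu(A) + nu(B) for all A, B
    sets = powerset(states)
    return all(nu[a | b] + nu[a & b] >= nu[a] + nu[b] - 1e-12
               for a in sets for b in sets)

states = (0, 1, 2)
convex_nu = {A: len(A) ** 2 for A in powerset(states)}     # supermodular
additive_nu = {A: len(A) for A in powerset(states)}        # a charge: boundary case
capped_nu = {A: min(len(A), 1) for A in powerset(states)}  # submodular, not convex
```

The additive game satisfies the inequality with equality for every pair, matching the remark that charges are exactly the games that are both super- and submodular.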
Proof. We prove that both (ii) and (iii) are equivalent to (i).

(i) implies (ii). Given f ∈ B⁺(Σ) and E ∈ Σ, we have (f + 1_E ≥ t) = (f ≥ t) ∪ (E ∩ (f ≥ t − 1)), and so f + 1_E ∈ B⁺(Σ). In turn, this implies f + g ∈ B⁺(Σ) whenever g ∈ B⁺(Σ) is simple. Moreover, as ν is convex, we get

ν(f + 1_E ≥ t) ≥ ν(f ≥ t) + ν(E ∩ (f ≥ t − 1)) − ν(E ∩ (f ≥ t)).

Consequently,

νc(f + 1_E) = ∫₀^∞ ν(f + 1_E ≥ t) dt
≥ ∫₀^∞ ν(f ≥ t) dt + ∫₀^∞ ν(E ∩ (f ≥ t − 1)) dt − ∫₀^∞ ν(E ∩ (f ≥ t)) dt
= νc(f) + ∫_{−1}^0 ν(E ∩ (f ≥ t)) dt = νc(f) + ν(E).
As νc is positively homogeneous, for each λ > 0 we have

νc(f + λ1_E) = λνc(f/λ + 1_E) ≥ λ[νc(f/λ) + ν(E)] = νc(f) + λν(E).

Let g ∈ B⁺(Σ) be a simple function. We can write g = Σ_{i=1}^n λᵢ1_{Dᵢ}, where D₁ ⊆ ··· ⊆ Dₙ and λᵢ ≥ 0 for each i = 1, ..., n. As g is simple, we have f + g ∈ B⁺(Σ). Hence,

νc(f + g) = νc(f + Σ_{i=1}^n λᵢ1_{Dᵢ}) ≥ νc(f + Σ_{i=2}^n λᵢ1_{Dᵢ}) + λ₁ν(D₁) ≥ ··· ≥ νc(f) + Σ_{i=1}^n λᵢν(Dᵢ) = νc(f) + νc(g),

as desired. To show that the inequality νc(f + g) ≥ νc(f) + νc(g) holds for all f, g ∈ B(Σ), it is now enough to use the translation invariance and supnorm continuity of νc.

(ii) implies (i). Given any sets A and B, it holds that 1_{A∪B} + 1_{A∩B} = 1_A + 1_B. Since the characteristic functions 1_{A∪B} and 1_{A∩B} are comonotonic, we then have

ν(A ∪ B) + ν(A ∩ B) = νc(1_{A∪B}) + νc(1_{A∩B}) = νc(1_{A∪B} + 1_{A∩B}) = νc(1_A + 1_B) ≥ νc(1_A) + νc(1_B) = ν(A) + ν(B),

and so the game ν is convex, as desired.
(i) implies (iii). As νc is translation invariant, it is enough to prove the implication for f and g positive. It is easy to check that, for each t ∈ ℝ,

(f ∨ g ≥ t) = (f ≥ t) ∪ (g ≥ t)  and  (f ∧ g ≥ t) = (f ≥ t) ∩ (g ≥ t).

Therefore, if ν is convex, then ν(f ∨ g ≥ t) + ν(f ∧ g ≥ t) ≥ ν(f ≥ t) + ν(g ≥ t). Hence,

νc(f ∨ g) + νc(f ∧ g) = ∫₀^∞ ν(f ∨ g ≥ t) dt + ∫₀^∞ ν(f ∧ g ≥ t) dt
= ∫₀^∞ [ν(f ∨ g ≥ t) + ν(f ∧ g ≥ t)] dt
≥ ∫₀^∞ [ν(f ≥ t) + ν(g ≥ t)] dt = νc(f) + νc(g),
as desired.

(iii) implies (i). We have 1_A ∨ 1_B = 1_{A∪B} and 1_A ∧ 1_B = 1_{A∩B}. Hence, if we put f = 1_A and g = 1_B in the inequality νc(f ∨ g) + νc(f ∧ g) ≥ νc(f) + νc(g), we get ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B), as desired.

By Theorem 4.6, a game is convex if and only if the associated Choquet functional νc is superlinear, that is, superadditive and positively homogeneous. This is a useful property that, for example, makes it possible to use the classic Hahn–Banach Theorem in studying convex games. In order to do so, however, we first have to deal with a technical problem: unless Σ is a σ-algebra, the space B(Σ) is not in general a vector space, something needed to apply the Hahn–Banach Theorem and other standard functional analytic results. There are at least two ways to bypass the problem. The first one is to consider the vector space B₀(Σ) of Σ-measurable simple functions in place of the whole set B(Σ). This can be enough as long as one is interested in using results that, like the Hahn–Banach Theorem, hold on any vector space. There are important results, however, that hold only on Banach spaces (e.g. the Uniform Boundedness Principle). In this case B₀(Σ), which is not a Banach space, is useless. A solution is to consider B̄(Σ), the supnorm closure of B₀(Σ),⁸ which is a Banach lattice under the supnorm (Dunford and Schwartz, 1958: 258). B(Σ) is a dense subset of B̄(Σ); it holds that B(Σ) = B̄(Σ) when Σ is a σ-algebra, and so in this case B(Σ) itself is a Banach lattice. If Σ is not a σ-algebra, to work with the Banach lattice B̄(Σ) we have to extend to it the Choquet functional νc, which is originally defined on B(Σ).
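Theorem 4.6 can be seen at work on a small example: for a convex capacity, the Choquet functional computed below is both superadditive and supermodular on the chosen pair of acts. The two-state capacity is hypothetical, picked only for illustration (Python sketch).

```python
OMEGA = (0, 1)

def choquet(f, nu):
    # telescoping form of the Choquet functional on a finite space
    total, prev = 0.0, frozenset()
    for w in sorted(OMEGA, key=lambda w: -f[w]):
        cur = prev | {w}
        total += f[w] * (nu[cur] - nu[prev])
        prev = cur
    return total

# hypothetical convex capacity: nu({0}) + nu({1}) <= nu(OMEGA)
nu = {frozenset(): 0.0, frozenset({0}): 0.2,
      frozenset({1}): 0.3, frozenset({0, 1}): 1.0}

f = {0: 2.0, 1: 0.0}
g = {0: 0.0, 1: 1.0}
f_sum = {w: f[w] + g[w] for w in OMEGA}
f_max = {w: max(f[w], g[w]) for w in OMEGA}  # f ∨ g
f_min = {w: min(f[w], g[w]) for w in OMEGA}  # f ∧ g

superadditive = choquet(f_sum, nu) >= choquet(f, nu) + choquet(g, nu) - 1e-12
supermodular = (choquet(f_max, nu) + choquet(f_min, nu)
                >= choquet(f, nu) + choquet(g, nu) - 1e-12)
```

Replacing the capacity with a non-convex one breaks both inequalities for suitable acts, in line with the equivalences of the theorem.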
Lemma 4.7. Any Choquet functional νc : B(Σ) → ℝ induced by a game ν ∈ bv(Σ) admits a unique supnorm continuous extension to B̄(Σ). Such an extension is positively homogeneous and comonotonic additive.

Proof. By Proposition 4.11(iv), νc is Lipschitz continuous on B(Σ). By standard results (Aliprantis and Border, 1999: 77), it then admits a unique supnorm continuous extension to the closure B̄(Σ). Using its supnorm continuity, such an extension is easily seen to be positively homogeneous. As to comonotonic additivity, we first prove the following claim.

Claim. Given any two comonotonic and supnorm bounded functions f and g, there exist two sequences of simple functions {fₙ}ₙ and {gₙ}ₙ uniformly converging to f and g, respectively, and such that fₙ and gₙ are comonotonic for each n.

Proof of the Claim. It is enough to prove the claim for positive functions. Let f : Ω → ℝ be positive and supnorm bounded, so that there exists a constant M > 0 such that 0 ≤ f(ω) ≤ M for each ω ∈ Ω. Let M = αₙ > αₙ₋₁ > ··· > α₁ > α₀ = 0, with αᵢ = (i/n)M for each i = 0, 1, ..., n. Set Aᵢ = (f ≥ αᵢ) for each i = 1, ..., n − 1, and define fₙ : Ω → ℝ as fₙ = Σ_{i=1}^{n−1} (αᵢ − αᵢ₋₁)1_{Aᵢ}. The collection of upper sets {(fₙ ≥ t)}_{t∈ℝ} is included in {(f ≥ t)}_{t∈ℝ}, and ‖f − fₙ‖ ≤ max_{i∈{0,...,n−1}} (αᵢ₊₁ − αᵢ) = M/n. In a similar way, for each n we can construct a simple function gₙ such that the collection of upper sets {(gₙ ≥ t)}_{t∈ℝ} is included in {(g ≥ t)}_{t∈ℝ} and ‖g − gₙ‖ ≤ M/n. By Lemma 4.4, the collections {(g ≥ t)}_{t∈ℝ} and {(f ≥ t)}_{t∈ℝ} together form a chain. Hence, by what we just proved, for each n the collections {(g ≥ t)}_{t∈ℝ}, {(gₙ ≥ t)}_{t∈ℝ}, {(f ≥ t)}_{t∈ℝ}, and {(fₙ ≥ t)}_{t∈ℝ} together form a chain as well. Again by Lemma 4.4, fₙ and gₙ are then comonotonic functions, and so the sequences {fₙ}ₙ and {gₙ}ₙ we have constructed have the desired properties. This completes the proof of the Claim.

Let f, g ∈ B̄(Σ) be comonotonic. Consider the sequences {fₙ}ₙ and {gₙ}ₙ of simple functions given by the Claim. As such sequences belong to B(Σ), by the supnorm continuity of νc we have

νc(f + g) = limₙ νc(fₙ + gₙ) = limₙ νc(fₙ) + limₙ νc(gₙ) = νc(f) + νc(g),
as desired.

It is convenient to denote this extension still by νc, and in the sequel we will write νc : B̄(Σ) → ℝ. On the enlarged domain B̄(Σ) the following cleaner version of Theorem 4.6 holds. As B̄(Σ) is a vector space, here we can consider concavity and quasi-concavity. The latter property is the only nontrivial feature of the next result relative to Theorem 4.6.⁹

Corollary 4.2. For any game ν in bv(Σ), the following conditions are equivalent:

(i) ν is convex;
(ii) νc is superlinear on B̄(Σ);
(iii) νc is supermodular on B̄(Σ);
(iv) νc is concave on B̄(Σ);
(v) νc is quasi-concave on B̄(Σ), provided ν(Ω) ≠ 0.
Proof. In view of Theorem 4.6, the only nontrivial part is to show that (v) implies (iv). We will actually prove the stronger result that (iv) is equivalent to the convexity of the cone {f : νc(f) ≥ 0}. Set K = {f ∈ B̄(Σ) : νc(f) ≥ 0}. Given two functions f, g ∈ B̄(Σ), by translation invariance we have

νc(f − (νc(f)/ν(Ω))1_Ω) = 0  and  νc(g − (νc(g)/ν(Ω))1_Ω) = 0.

Hence, both f − (νc(f)/ν(Ω))1_Ω and g − (νc(g)/ν(Ω))1_Ω lie in K. By the convexity of K, taking α ∈ [0, 1] and setting ᾱ = 1 − α, we have

αf − α(νc(f)/ν(Ω))1_Ω + ᾱg − ᾱ(νc(g)/ν(Ω))1_Ω ∈ K.

Namely,

νc(αf − α(νc(f)/ν(Ω))1_Ω + ᾱg − ᾱ(νc(g)/ν(Ω))1_Ω) = νc(αf + ᾱg) − ανc(f) − ᾱνc(g) ≥ 0.

Therefore, νc is concave.

Remarks. (i) Dual properties hold for submodular games. For example, a game ν is submodular if and only if its Choquet functional νc is convex on B̄(Σ); equivalently, a game ν is convex if and only if its dual Choquet functional ν̄c is convex on B̄(Σ). For brevity, we omit these dual properties.

(ii) The condition ν(Ω) ≠ 0 in point (v) is needed. Consider the game ν on Ω = {ω₁, ω₂} with ν(ω₁) = 2, ν(ω₂) = −1, and ν(Ω) = 0. Since ν(Ω) + ν(Ø) = 0 < ν(ω₁) + ν(ω₂) = 1, ν is not convex. On the other hand, its Choquet integral is

νc(x₁, x₂) = 2(x₁ − x₂) if x₁ ≥ x₂, and νc(x₁, x₂) = x₁ − x₂ if x₂ > x₁,

which is quasi-concave.

The next result is a first consequence of the use of functional analytic tools in the study of convex games. The equivalence between (i) and (v) is due to Schmeidler (1986) for positive games and to De Waegenaere and Wakker (2001) for finite games; for the other equivalences we refer to Delbaen (1974) and Marinacci and Montrucchio (2003).
Theorem 4.7. For a bounded game ν, the following conditions are equivalent:

(i) ν is convex;
(ii) for any A ⊆ B there is µ ∈ core(ν) such that µ(A) = ν(A) and µ(B) = ν(B);
(iii) for any finite chain {Aᵢ}ᵢ₌₁ⁿ, there is µ ∈ core(ν) such that µ(Aᵢ) = ν(Aᵢ) for all i = 1, ..., n;
(iv) ν ∈ bv(Σ) and, for any chain {Aᵢ}_{i∈I}, there is µ ∈ ext(core(ν)) such that µ(Aᵢ) = ν(Aᵢ) for all i ∈ I;
(v) ν ∈ bv(Σ) and νc(f) = min_{µ∈core(ν)} ∫f dµ for all f ∈ B̄(Σ);
(vi) νc(f) = min_{µ∈core(ν)} ∫f dµ for all f ∈ B₀(Σ).

This theorem has a few noteworthy features. First, it shows that bounded and convex games belong to bv(Σ), so that they always have well-defined Choquet integrals on B̄(Σ). Second, it improves Lemma 4.5 by showing that in the convex case the "replicating" measures over chains can be taken in the core. Finally, Theorem 4.7 shows that the Choquet functionals of convex games can be viewed as lower envelopes of the linear functionals on B̄(Σ) induced by the measures in the core. In other words, convex games are exact games of a special type, in which the close connection between the game and the measures in the core holds on the entire space B̄(Σ), and not just on Σ.

Proof. The proof proceeds as follows: (i) ⇒ (vi) ⇒ (iv) ⇒ (v) ⇒ (iii) ⇒ (ii) ⇒ (i).

(i) implies (vi). Given any f ∈ B₀(Σ), the Choquet integral ∫f dν is well defined since ν ∈ bv(Σ_f), where Σ_f is the finite algebra generated by f. Hence, the Choquet functional νc : B₀(Σ) → ℝ exists on the vector space B₀(Σ), and it is positively homogeneous and translation invariant. Let f, g : Ω → ℝ be any two functions in B₀(Σ), and let Σ_{f,g} be the smallest algebra that makes both f and g measurable. As Σ_{f,g} is finite, ν ∈ bv(Σ_{f,g}), and so we can apply Theorem 4.6 to the restricted Choquet integral νc : B(Σ_{f,g}) → ℝ. Thus, νc(f + g) ≥ νc(f) + νc(g). Since f and g were arbitrary elements of B₀(Σ), we conclude that νc : B₀(Σ) → ℝ is a superlinear functional on B₀(Σ). Let f ∈ B₀(Σ).
The algebraic dual of B₀(Σ) is the space fa(Σ) of all finitely additive games on Σ.¹⁰ As νc : B₀(Σ) → ℝ is superlinear, by the Hahn–Banach Theorem there is µ ∈ fa(Σ) such that µc(f) = νc(f) and µc(g) ≥ νc(g) for each g ∈ B₀(Σ). In other words,

νc(f) = min_{µ∈C} µc(f),

where C = {µ ∈ fa(Σ) : µc(g) ≥ νc(g) for each g ∈ B₀(Σ)}. Next we show that C coincides with the set

C′ = {µ ∈ fa(Σ) : µ ≥ ν and µ(Ω) = ν(Ω)}.
Let µ ∈ C. Then, µ(A) = µc(1_A) ≥ νc(1_A) = ν(A) for all A ∈ Σ; moreover, −µ(Ω) = µc(−1_Ω) ≥ νc(−1_Ω) = −ν(Ω). Hence, µ ∈ C′. Conversely, suppose µ ∈ C′. As µ ≥ ν and µ(Ω) = ν(Ω), the definition of the Choquet integral immediately implies that νc(f) ≤ µc(f). Hence, µ ∈ C. It remains to show that C′ = core(ν). As ba(Σ) ⊆ fa(Σ), core(ν) ⊆ C′. As to the converse inclusion, since ν is bounded, for each µ ∈ C′ we have |µ(A)| ≤ 2 sup_{A∈Σ} |ν(A)| (see Proposition 4.2). Then, µ ∈ ba(Σ) (see Dunford and Schwartz, 1958: 97), and we conclude that C′ ⊆ core(ν), as desired.

(vi) implies (iv). Consider first a finite chain A₁ ⊆ ··· ⊆ Aₙ. By (vi), there exists µ ∈ core(ν) such that

µc(Σ_{i=1}^n 1_{Aᵢ}) = νc(Σ_{i=1}^n 1_{Aᵢ}).
By comonotonic additivity, Σ_{i=1}^n µ(Aᵢ) = Σ_{i=1}^n ν(Aᵢ). As µ ∈ core(ν), we have µ(Aᵢ) ≥ ν(Aᵢ) for all i = 1, ..., n, which in turn implies µ(Aᵢ) = ν(Aᵢ) for all i = 1, ..., n.

Now, let {Aᵢ}_{i∈I} be any chain in Σ. Given a finite subchain {Aⱼ}_{j∈J}, let Σ_J be the (finite) algebra it generates, and set K_J = {µ ∈ core(ν) : µ(Aⱼ) = ν(Aⱼ) for all j ∈ J}. Since core(ν) is weak*-compact, the set K_J is weak*-compact. Moreover, it is convex and, by what we just proved, K_J ≠ Ø. It is easily seen that K_J is also extremal in core(ν). The collection of weak*-compact sets {K_J}_{J⊆I, |J|<∞} has the finite intersection property, and so its overall intersection ∩_{J⊆I, |J|<∞} K_J is nonempty (see Aliprantis and Border, 1999: 38). Moreover, such an intersection is extremal in core(ν). Being convex and weak*-compact, by the Krein–Milman Theorem ∩_{J⊆I, |J|<∞} K_J then has an extreme point µ. We conclude that µ ∈ ext(core(ν)) and µ(Aᵢ) = ν(Aᵢ) for all i ∈ I, as desired.

To complete the proof that (vi) implies (iv) it remains to show that ν ∈ bv(Σ). Since core(ν) is weak*-compact, it is bounded; that is, there exists M ∈ ℝ such that ‖µ‖ ≤ M for all µ ∈ core(ν). Since, given any finite chain Ø = A₀ ⊆ A₁ ⊆ ··· ⊆ Aₙ = Ω, there exists µ ∈ core(ν) such that µ(Aᵢ) = ν(Aᵢ) for all i = 0, ..., n, we conclude that

Σ_{i=1}^n |ν(Aᵢ) − ν(Aᵢ₋₁)| ≤ ‖µ‖ ≤ M.
(iv) implies (v). Let f ∈ B(Σ). Since νc is translation invariant, assume w.l.o.g. that f ≥ 0. Consider the chain of all upper sets {(f ≥ t)}_{t∈ℝ}. Given any
µ ∈ core(ν), the following holds:

νc(f) = ∫ ν(f ≥ t) dt ≤ ∫ µ(f ≥ t) dt = ∫ f dµ.

By (iv), there is µ ∈ core(ν) such that ν(A) = µ(A) for all sets A in the chain. Hence,

νc(f) = ∫ ν(f ≥ t) dt = ∫ µ(f ≥ t) dt = ∫ f dµ,

and we conclude that νc(f) = min_{µ∈core(ν)} ∫f dµ. Since B(Σ) is supnorm dense in B̄(Σ), the supnorm continuous functional νc : B̄(Σ) → ℝ given by Lemma 4.7 is superlinear. By proceeding as before, we can show that

core(ν) = {µ ∈ ba(Σ) : µc(f) ≥ νc(f) for each f ∈ B₀(Σ)} = {µ ∈ ba(Σ) : µc(f) ≥ νc(f) for each f ∈ B̄(Σ)}.

Hence, by the Hahn–Banach Theorem:

νc(f) = min{µc(f) : µ ∈ ba(Σ) and µc(g) ≥ νc(g) for each g ∈ B̄(Σ)}
= min{µc(f) : µ ∈ ba(Σ) and µc(g) ≥ νc(g) for each g ∈ B₀(Σ)}
= min{µc(f) : µ ∈ core(ν)} = min_{µ∈core(ν)} ∫ f dµ,

as desired.

(v) implies (iii). Consider a finite chain {Aᵢ}ᵢ₌₁ⁿ and set f = Σ_{i=1}^n 1_{Aᵢ}. By (v), there is µ ∈ core(ν) such that µc(f) = νc(f). By comonotonic additivity, Σ_{i=1}^n µ(Aᵢ) = Σ_{i=1}^n ν(Aᵢ), and so Σ_{i=1}^n [µ(Aᵢ) − ν(Aᵢ)] = 0. Since µ ≥ ν, we conclude that µ(Aᵢ) = ν(Aᵢ) for each i = 1, ..., n, as desired.

As (iii) trivially implies (ii), it remains to show that (ii) implies (i). Given any A and B, by (ii) there is µ ∈ core(ν) such that ν(A) = µ(A) and ν(B) = µ(B). Hence,

ν(A ∩ B) + ν(A ∪ B) = µ(A ∩ B) + µ(A ∪ B) = µ(B) + µ(A) ≥ ν(B) + ν(A),

where the last inequality follows from µ ∈ core(ν).

We close with some characterizations of convexity through properties of subgames, thus providing a "hereditary" perspective on it.

Theorem 4.8. For a bounded game ν, the following conditions are equivalent:

(i) ν is convex;
(ii) each subgame of ν is exact;
(iii) ν is totally balanced and, given any A ⊆ B, each charge in core(ν_A) has an extension belonging to core(ν_B);
(iv) ν is balanced and, given any finite subalgebra Σ₀ of Σ, each charge in core(ν_{Σ₀}) has an extension on Σ belonging to core(ν).

The equivalence between (i) and (iv) is essentially due to Kelley (1959) (see also Delbaen, 1974: 218), that between (i) and (ii) to Biswas et al. (1999: 10), and that between (i) and (iii) to Einy and Shitovitz (1996: 197–199). The second part of condition (iii) is a property introduced by Kikuta and Shapley (1986). They call extendable the games satisfying this property for B = Ω, which turns out to be useful in studying the von Neumann–Morgenstern stability of cores.¹¹

The proof of Theorem 4.8 uses the following straightforward lemma. In this regard, notice that Schmeidler (1972: 219) gives an example of an exact game with four players that is not convex.

Lemma 4.8. A finite game with at most three players is exact if and only if it is convex.

Proof of Theorem 4.8. For convenience, we first prove the equivalence between (i)–(iii), and then that between (i) and (iv).

(ii) implies (i). Given any A and B, consider the subgame ν_{A∪B}. By (ii), there is µ ∈ core(ν_{A∪B}) such that µ(A ∩ B) = ν_{A∪B}(A ∩ B). Hence,

ν(A ∪ B) + ν(A ∩ B) = ν_{A∪B}(A ∪ B) + ν_{A∪B}(A ∩ B) = µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) ≥ ν_{A∪B}(A) + ν_{A∪B}(B) = ν(A) + ν(B),

as desired.

(i) implies (iii). Since A ⊆ B, the space B₀(Σ_A) of simple Σ_A-measurable functions can be regarded as a vector subspace of B₀(Σ_B). Let µ ∈ core(ν_A). Given any f ∈ B₀⁺(Σ_A), it holds that ν_{A,c}(f) = ν_{B,c}(f), where ν_{A,c} : B₀(Σ_A) → ℝ is the Choquet functional induced by the subgame ν_A (ν_{B,c} is similarly defined). Therefore, µ(f) ≥ ν_{B,c}(f) for all f ∈ B₀⁺(Σ_A). Given any f ∈ B₀(Σ_A), there is k > 0 large enough so that f + k1_A ∈ B₀⁺(Σ_A). Since µ(A) = ν(A), by Theorem 4.6 we have

µ(f) + kµ(A) = µ(f + k1_A) ≥ ν_{B,c}(f + k1_A) ≥ ν_{B,c}(f) + kν(A) = ν_{B,c}(f) + kµ(A).

Hence, µ(f) ≥ ν_{B,c}(f). We conclude that µ(f) ≥ ν_{B,c}(f) for all f ∈ B₀(Σ_A). By the Hahn–Banach Theorem, there exists a charge µ* : Σ_B → ℝ which extends µ and is such that µ*(f) ≥ ν_{B,c}(f) for all f ∈ B₀(Σ_B). Hence, µ* ∈ core(ν_B).
(iii) implies (ii). Given any B, let ν_B be the associated subgame. Given any A ⊆ B, let µ ∈ core(ν_A). By hypothesis, there is µ* ∈ core(ν_B) that extends µ. Hence, µ*(A) = µ(A) = ν_A(A) = ν_B(A), which implies that ν_B is exact, as desired.

To complete the proof it remains to show that (iv) is equivalent to (i).

(i) implies (iv). Let µ ∈ core(ν_{Σ₀}). By Theorem 4.7, ν ∈ bv(Σ), and so νc : B₀(Σ) → ℝ is superlinear by Theorem 4.6. By the Hahn–Banach Theorem, there is µ* ∈ ba(Σ) that extends µ and is such that µ*(f) ≥ νc(f) for all f ∈ B₀(Σ). Hence, µ* ∈ core(ν).

(iv) implies (i). If µ ∈ core(ν), then its restriction µ_{Σ₀} to any subalgebra Σ₀ belongs to core(ν_{Σ₀}). Therefore, the fact that ν is balanced implies that core(ν_{Σ₀}) ≠ Ø for each Σ₀. In particular, (iv) implies that

core(ν_{Σ₀}) = {µ_{Σ₀} ∈ ba(Σ₀) : µ ∈ core(ν)}.   (4.20)

Given any A, consider first the finite subalgebra Σ₀ = {Ø, A, Aᶜ, Ω}. It is easy to see that there exists an element of core(ν_{Σ₀}) that takes on the value ν(A) at A. By (4.20), this amounts to saying that there exists µ ∈ core(ν) such that µ(A) = µ_{Σ₀}(A) = ν(A). We conclude that ν is exact.

Given any A and B, consider the finite subalgebra Σ₀ generated by the partition {A ∩ B, (A ∪ B)ᶜ, (A ∪ B) \ (A ∩ B)}. Let C ∈ Σ₀. As ν is exact, there is µ ∈ core(ν) such that µ(C) = ν(C). As µ_{Σ₀} ∈ core(ν_{Σ₀}), we have µ_{Σ₀}(C) = ν_{Σ₀}(C), and so ν_{Σ₀} is exact. Since Σ₀ is generated by a partition consisting of three elements, Lemma 4.8 then implies that ν_{Σ₀} is convex. Since A ∩ B, A ∪ B ∈ Σ₀, by Theorem 4.7 and (4.20) there is µ ∈ core(ν) such that µ_{Σ₀}(A ∩ B) = ν_{Σ₀}(A ∩ B) and µ_{Σ₀}(A ∪ B) = ν_{Σ₀}(A ∪ B). Hence,

ν(A ∪ B) + ν(A ∩ B) = ν_{Σ₀}(A ∪ B) + ν_{Σ₀}(A ∩ B) = µ_{Σ₀}(A ∪ B) + µ_{Σ₀}(A ∩ B) = µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) ≥ ν(A) + ν(B),

which shows that ν is convex.
4.6. Finite games

4.6.1. The space of finite games

Games defined on finite spaces have some noteworthy peculiar properties, thanks to the special form of their domain. We devote this section to their study.¹² Let Ω be a finite set {ω₁, ..., ωₙ} of n players and Σ its power set. Vₙ denotes the space of all finite games on Σ, which is a vector space under the setwise operations ν₁ + ν₂ and αν for ν₁, ν₂, ν ∈ Vₙ and α ∈ ℝ. The next result, due to Shapley (1953: Lemma 3), shows the crucial importance of unanimity games, introduced in Example 4.4.
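Unanimity games are straightforward to code. The Python sketch below defines u_A, checks a few values, and verifies the supermodularity inequality over all pairs of coalitions (unanimity games are convex; see Example 4.9 below).

```python
from itertools import combinations

def powerset(states):
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

def unanimity(A):
    """u_A(B) = 1 if A ⊆ B and 0 otherwise."""
    A = frozenset(A)
    return lambda B: 1.0 if A <= frozenset(B) else 0.0

states = (0, 1, 2)
u = unanimity({0, 1})

# supermodularity of u_A, checked over all pairs of coalitions
supermodular = all(u(S | T) + u(S & T) >= u(S) + u(T)
                   for S in powerset(states) for T in powerset(states))
```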
Theorem 4.9. Unanimity games form a basis for the (2^{|Ω|} − 1)-dimensional vector space Vₙ. For any ν ∈ Vₙ, the unique coefficients satisfying ν = Σ_{Ø≠A∈Σ} α_A^ν u_A are given by

α_A^ν = Σ_{B⊆A} (−1)^{|A|−|B|} ν(B).   (4.21)
Proof. We first show that unanimity games are linearly independent in Vₙ. Suppose Σ_{i=1}^n αᵢ u_{Aᵢ} = θ, where θ is the trivial game such that θ(A) = 0 for each A. We want to show that αᵢ = 0 for each i = 1, ..., n. Suppose, per contra, that there is a nonempty subset I ⊆ {1, ..., n} such that αᵢ ≠ 0 for each i ∈ I. As in Myerson (1991: 440), let i₀ ∈ I be such that A_{i₀} is of minimal size among the coalitions {Aᵢ}_{i∈I}. By construction, αᵢ = 0 for each i such that Aᵢ ⊊ A_{i₀}, so that

0 = Σ_{i=1}^n αᵢ u_{Aᵢ}(A_{i₀}) = Σ_{i : Aᵢ⊆A_{i₀}} αᵢ u_{Aᵢ}(A_{i₀}) = α_{i₀},

a contradiction. To complete the proof it remains to show that, for each ν ∈ Vₙ, it holds that

ν = Σ_{Ø≠A∈Σ} (Σ_{B⊆A} (−1)^{|A|−|B|} ν(B)) u_A.
The needed combinatorial argument is detailed in, for example, Owen (1995: 263), to which we refer the reader.

Example 4.7. For a charge µ we have

α_A^µ = µ(ω) if A = {ω}, and α_A^µ = 0 otherwise.

That is, a game is additive if and only if all coefficients in (4.21) associated with non-singletons are zero.

Example 4.8. Let f : ℕ → ℝ with f(0) = 0. Its first difference is ∆f(n) = f(n + 1) − f(n). By iteration, the k-order difference is ∆^k f = ∆(∆^{k−1} f). Consider the scalar measure game ν : 2^{{1,...,n}} → ℝ defined by ν(A) = f(|A|) for each A ⊆ {1, ..., n}. The following holds:

α_A^ν = ∆^{|A|} f(0).

To see why this is the case, observe that (4.21) implies

α_A^ν = f(m) − C(m, 1)f(m − 1) + C(m, 2)f(m − 2) − ··· = Σ_{k=0}^m (−1)^k C(m, k) f(m − k),   (4.22)
where we set |A| = m and C(m, k) denotes the binomial coefficient. Denote by I the identity operator and by S the shift operator, defined by Sf(m) = f(m + 1), on the functions f : ℕ → ℝ. As ∆ = S − I, we have

∆^m = (S − I)^m = Σ_{k=0}^m (−1)^k C(m, k) S^{m−k}.

Hence,

∆^m f(0) = Σ_{k=0}^m (−1)^k C(m, k) S^{m−k} f(0) = Σ_{k=0}^m (−1)^k C(m, k) f(m − k),

and so (4.22) holds.

By Theorem 4.9, each game ν is uniquely determined by the coefficients {α_A^ν} given by (4.21). A natural question is whether there is a significant class of games identified by the requirement that all such coefficients be positive. Fortunately, Theorem 4.10 will show that there is such a class, which we now introduce. A game ν : Σ → ℝ is:
14 monotone of order k (with k ≥ 2) if, for every A₁, ..., A_k ∈ Σ,

ν(∪_{i=1}^k Aᵢ) ≥ Σ_{Ø≠I⊆{1,...,k}} (−1)^{|I|+1} ν(∩_{i∈I} Aᵢ);   (4.23)

15 totally monotone if it is positive and k-monotone for all k ≥ 2;
16 a belief function if it is a totally monotone probability.

These definitions work for any algebra Σ, not necessarily finite. For k = 2, we get back convexity. Hence, totally monotone games are convex, though the converse is false. When ν is a charge, (4.23) holds with equality. Totally monotone games are studied at length in Choquet (1953), and belief functions play a central role in the works of Dempster (1967, 1968) and Shafer (1976). They are also related to the theory of Möbius transforms pioneered by Rota (1964), as detailed in Chateauneuf and Jaffray (1989) and Grabisch et al. (2000) (see also Subsection 4.6.4).

Example 4.9. All {0, 1}-valued convex games (e.g. unanimity games) are totally monotone (see Marinacci, 1996: 1005, for a proof).

Example 4.10. Let (Ω₁, Σ₁, P₁) be a probability space and Ω₂ a finite space. A correspondence f : Ω₁ → 2^{Ω₂} is a random set if it is measurable, that is, f⁻¹(A) = {ω ∈ Ω₁ : f(ω) ⊆ A} ∈ Σ₁ for each A ⊆ Ω₂. Consider the distribution ν_f : 2^{Ω₂} → ℝ induced by a random set f, defined by ν_f(A) = P₁(f⁻¹(A)) for each A ∈ 2^{Ω₂}. The distribution ν_f is a belief function (e.g. Nguyen, 1978). Random
sets reduce to standard random variables when the images f(ω) are singletons; in this case, ν_f is the usual additive distribution induced by a random variable f. Under suitable topological conditions, random sets with values in infinite spaces Ω₂ can also be considered (there is a large literature on them; e.g. Salinetti and Wets, 1986).

We can now state the announced result.

Theorem 4.10. Let ν be a game defined on the power set Σ of a finite space Ω. Then, the coefficients given by (4.21) are all positive if and only if ν is totally monotone.

Remark. This theorem is essentially due to Dempster and Shafer (see Shafer, 1976). It has been extended to games on lattices by Gilboa and Lehrer (1991).

Proof. "If" part. Suppose ν is totally monotone. If |A| = 1, then α_A^ν = ν(A) ≥ 0. Suppose |A| > 1, and set A = {ω₁, ..., ω_k} and Aᵢ = A \ {ωᵢ} for each i = 1, ..., k. We have
α_A^ν = Σ_{B⊆A} (−1)^{|A|−|B|} ν(B)
= ν(A) − Σᵢ ν(Aᵢ) + Σ_{i≠j} ν(Aᵢ ∩ Aⱼ) − ··· + (−1)^k ν(A₁ ∩ ··· ∩ A_k)
= ν(A) − Σ_{Ø≠I⊆{1,...,k}} (−1)^{|I|+1} ν(∩_{i∈I} Aᵢ).

As A = ∪_{i=1}^k Aᵢ, we then have

α_A^ν = ν(∪_{i=1}^k Aᵢ) − Σ_{Ø≠I⊆{1,...,k}} (−1)^{|I|+1} ν(∩_{i∈I} Aᵢ),

so that α_A^ν ≥ 0 by (4.23), as desired.

"Only if" part. Suppose α_A^ν ≥ 0 for each Ø ≠ A ∈ Σ. By Example 4.9, each unanimity game u_A is totally monotone. Hence, by Theorem 4.9, ν = Σ_{Ø≠A} α_A^ν u_A is a positive linear combination of totally monotone games, and such a combination is clearly totally monotone. We infer that ν is totally monotone, as desired.
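The coefficients (4.21) are easy to compute by brute force, and Theorem 4.10 then gives a practical test for total monotonicity. The Python sketch below uses a symmetric game ν(A) = f(|A|) with f = (0, 1, 3, 5), chosen for the example because its increments are nondecreasing (so the game is convex) while its top coefficient turns out to be negative; by Theorem 4.10 it is therefore convex but not totally monotone.

```python
from itertools import combinations

def powerset(states):
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

def mobius(nu, states):
    """Coefficients (4.21): alpha_A = sum over B ⊆ A of (-1)^(|A|-|B|) nu(B)."""
    return {A: sum((-1) ** (len(A) - len(B)) * nu[B] for B in powerset(A))
            for A in powerset(states) if A}

states = (0, 1, 2)
f = [0, 1, 3, 5]                      # f(|A|); increments 1, 2, 2 are nondecreasing
nu = {A: f[len(A)] for A in powerset(states)}
alpha = mobius(nu, states)

# convexity (supermodularity) holds for this game ...
convex = all(nu[a | b] + nu[a & b] >= nu[a] + nu[b]
             for a in powerset(states) for b in powerset(states))

# ... and Möbius inversion recovers nu: nu(A) = sum of alpha_B over nonempty B ⊆ A
recovered = {A: sum(alpha[B] for B in powerset(A) if B)
             for A in powerset(states) if A}
```

The singleton and pair coefficients equal 1 while the grand-coalition coefficient is −1, illustrating that convexity (2-monotonicity) does not imply total monotonicity.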
Example 4.11. Given a function f : ℕ → ℝ with f(0) = 0, the scalar measure game ν(A) = f(|A|) is totally monotone if and only if f is absolutely monotone à la Bernstein, that is, ∆^k f(n) ≥ 0 for each n and k (see Widder, 1941). By (4.22) and by Theorem 4.10, to prove this fact it is enough to show that f is absolutely
monotone if and only if ∆^k f(0) ≥ 0 for each k. As

S^k = (∆ + I)^k = Σ_{r=0}^k C(k, r) ∆^r,

we have ∆^n S^k = Σ_{r=0}^k C(k, r) ∆^{r+n}, and so we get

∆^n f(k) = ∆^n S^k f(0) = Σ_{r=0}^k C(k, r) ∆^{r+n} f(0) ≥ 0,
which gives the desired conclusion.

Totally monotone games therefore form the convex cone of Vₙ consisting of all its elements featuring positive coefficients in (4.21). Denote this cone by Vₙ⁺; being pointed,¹³ it induces a partial order ≽ on Vₙ defined by ν ≽ ν′ if ν − ν′ ∈ Vₙ⁺. In particular, ν ≽ θ if and only if ν is totally monotone, so that Vₙ⁺ = {ν ∈ Vₙ : ν ≽ θ}. The partial order ≽ makes Vₙ an ordered vector space. More is true under the lattice operations ∨ and ∧ induced by ≽.¹⁴

Lemma 4.9. The ordered vector space (Vₙ, ≽) is a Riesz space with lattice operations given by

ν₁ ∨ ν₂ = Σ_{Ø≠A∈Σ} (α_A^{ν₁} ∨ α_A^{ν₂}) u_A  and  ν₁ ∧ ν₂ = Σ_{Ø≠A∈Σ} (α_A^{ν₁} ∧ α_A^{ν₂}) u_A,
for each ν₁ and ν₂ in Vₙ.

Proof. We only prove the result for ν₁ ∨ ν₂, as a similar argument can be used for ν₁ ∧ ν₂. Set ν̄ = Σ_{Ø≠A∈Σ} (α_A^{ν₁} ∨ α_A^{ν₂}) u_A. We want to show that ν̄ = ν₁ ∨ ν₂. First observe that, for i = 1, 2,

ν̄ − νᵢ = Σ_{Ø≠A∈Σ} [(α_A^{ν₁} ∨ α_A^{ν₂}) − α_A^{νᵢ}] u_A.

Hence, by Theorem 4.10, ν̄ − νᵢ ∈ Vₙ⁺, all the coefficients [(α_A^{ν₁} ∨ α_A^{ν₂}) − α_A^{νᵢ}] being positive. This shows that ν̄ is an upper bound for {ν₁, ν₂}. It remains to show that it is the least such bound, that is, that ν̃ ≽ ν̄ for any game ν̃ such that ν̃ ≽ νᵢ for i = 1, 2.
As ν̃ − νᵢ ∈ Vₙ⁺, it holds that

Σ_{Ø≠A∈Σ} α_A^{ν̃} u_A − Σ_{Ø≠A∈Σ} α_A^{νᵢ} u_A = Σ_{Ø≠A∈Σ} α_A^{ν̃−νᵢ} u_A ∈ Vₙ⁺.

By Theorem 4.10, α_A^{ν̃−νᵢ} ≥ 0 for each A, and so α_A^{ν̃} ≥ α_A^{νᵢ} for each A. Therefore, α_A^{ν̃} ≥ α_A^{ν₁} ∨ α_A^{ν₂} for each A, and so the difference

ν̃ − ν̄ = Σ_{Ø≠A∈Σ} [α_A^{ν̃} − (α_A^{ν₁} ∨ α_A^{ν₂})] u_A

belongs to Vₙ⁺ by Theorem 4.10, all the coefficients α_A^{ν̃} − (α_A^{ν₁} ∨ α_A^{ν₂}) being positive. We conclude that ν̃ ≽ ν̄, as desired.
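Lemma 4.9's lattice operations act on the unanimity coefficients, not on the set values, and the two can differ. The Python sketch below computes the join of the two unanimity games u_{ω₁} and u_{ω₂} on a two-player space and finds u_{ω₁} ∨ u_{ω₂} = u_{ω₁} + u_{ω₂}, whose value at Ω is 2, strictly larger than the setwise maximum 1.

```python
from itertools import combinations

def powerset(states):
    return [frozenset(c) for r in range(len(states) + 1)
            for c in combinations(states, r)]

def mobius(nu, states):
    # unanimity coefficients (4.21)
    return {A: sum((-1) ** (len(A) - len(B)) * nu[B] for B in powerset(A))
            for A in powerset(states) if A}

def from_mobius(alpha, states):
    # rebuild a game from its coefficients: nu(A) = sum of alpha_B, B ⊆ A
    return {A: sum(v for B, v in alpha.items() if B <= A)
            for A in powerset(states)}

states = (0, 1)
u0 = {A: 1 if frozenset({0}) <= A else 0 for A in powerset(states)}
u1 = {A: 1 if frozenset({1}) <= A else 0 for A in powerset(states)}

a0, a1 = mobius(u0, states), mobius(u1, states)
sup_coeffs = {A: max(a0[A], a1[A]) for A in a0}   # coefficientwise join
u_join = from_mobius(sup_coeffs, states)
```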
The Riesz space (Vₙ, ≽) is lattice isomorphic to the Euclidean space (ℝ^{2^{|Ω|}−1}, ≥).

Lemma 4.10. The function T : Vₙ → ℝ^{2^{|Ω|}−1} defined by T(ν) = (α_A^ν) for all ν ∈ Vₙ is a lattice preserving isomorphism between (Vₙ, ≽) and (ℝ^{2^{|Ω|}−1}, ≥).
Proof. By Theorem 4.9, the vector (α_A^ν) is uniquely determined. Hence, T is one-to-one. Now, let ν₁, ν₂ ∈ Vₙ and α, β ∈ ℝ. By Theorem 4.9,

T(αν₁ + βν₂) = (α_A^{αν₁+βν₂}) = (Σ_{B⊆A} (−1)^{|A|−|B|} (αν₁ + βν₂)(B))
= (α Σ_{B⊆A} (−1)^{|A|−|B|} ν₁(B) + β Σ_{B⊆A} (−1)^{|A|−|B|} ν₂(B))
= αT(ν₁) + βT(ν₂),

and so T is an isomorphism. Moreover, by Lemma 4.9,

T(ν₁ ∨ ν₂) = T(ν₁) ∨ T(ν₂) = (α_A^{ν₁} ∨ α_A^{ν₂}) ∈ ℝ^{2^{|Ω|}−1},
T(ν₁ ∧ ν₂) = T(ν₁) ∧ T(ν₂) = (α_A^{ν₁} ∧ α_A^{ν₂}) ∈ ℝ^{2^{|Ω|}−1},

as desired.
By Lemma 4.9, the positive part ν⁺ and the negative part ν⁻ of a game ν, defined by ν⁺ = ν ∨ 0 and ν⁻ = −(ν ∧ 0), are given by

ν⁺ = Σ_{Ø≠A∈Σ} (α_A^ν ∨ 0) u_A  and  ν⁻ = −Σ_{Ø≠A∈Σ} (α_A^ν ∧ 0) u_A.

The absolute value |ν|, defined by |ν| = ν⁺ + ν⁻, is given by

|ν| = Σ_{Ø≠A∈Σ} |α_A^ν| u_A.
Notice that

ν⁺ = T⁻¹((α_A^ν)⁺),  ν⁻ = T⁻¹((α_A^ν)⁻),  and  |ν| = T⁻¹(|α_A^ν|),
in accordance with Lemma 4.10. The associated norm ‖·‖_c is given by ‖ν‖_c = |ν|(Ω) = ν⁺(Ω) + ν⁻(Ω) for each ν ∈ Vₙ, that is,

‖ν‖_c = Σ_{Ø≠A∈Σ} |α_A^ν| = ‖T(ν)‖₁.   (4.24)

Following Gilboa and Schmeidler (1995), we call ‖·‖_c the composition norm. It is an L-norm, since ‖ν₁ + ν₂‖_c = ‖ν₁‖_c + ‖ν₂‖_c whenever ν₁ and ν₂ belong to Vₙ⁺. As a result, (Vₙ, ≽, ‖·‖_c) is an AL-space. Since

‖ν‖_c = Σ_{Ø≠A∈Σ} |α_A^ν| = ‖T(ν)‖₁,   (4.25)

where ‖·‖₁ is the l₁-norm of ℝ^{2^{|Ω|}−1}, the isomorphism T is therefore an isometry between (Vₙ, ‖·‖_c) and (ℝ^{2^{|Ω|}−1}, ‖·‖₁).¹⁵ Summing up:

Theorem 4.11. There is a lattice preserving and isometric isomorphism T between the AL-spaces (Vₙ, ≽, ‖·‖_c) and (ℝ^{2^{|Ω|}−1}, ≥, ‖·‖₁), determined by the identity

ν = Σ_{Ø≠A∈Σ} α_A u_A.   (4.26)

Moreover, ν is totally monotone if and only if the corresponding vector (α_A) in ℝ^{2^{|Ω|}−1} is nonnegative.
In other words, for each ν in Vn there is a unique (αA ) in R2 −1 such that (4.26) || holds; conversely, for each vector (αA ) in R2 −1 there is a unique ν in Vn such
Introduction to the mathematics of ambiguity
89
that (4.26) holds. Moreover, the correspondence T between ν and (αA ) is linear, lattice preserving, and isometric. Consider the restriction of the partial order ! on ba(), the vector subspace of Vn consisting of charges. Since ba + () = Vn+ ∩ ba(), given any µ1 and µ2 in ba(), we have µ1 ! µ2 if and only if µ1 − µ2 ∈ ba + (). Equivalently, µ1 ! µ2 if and only if µ1 ≥ µ2 setwise, that is, µ1 (A) ≥ µ2 (A) for each A. This is the standard partial order studied on ba() (see Rao and Rao, 1983), which can therefore be viewed as the restriction of ! on ba(). As a result, the standard lattice structure on ba() coincides with the one it inherits as a subspace of (Vn , !). In particular, on ba() the norm · c reduces to the total variation norm · . All this shows that the standard structures on ba() studied in measure theory are consistent with the ones we have identified on Vn so far. In the sequel we will denote by !ba the restriction of ! on ba(). 4.6.2. A decomposition The lattice structure of Vn suggests the possibility of achieving a decomposition à la Riesz for finite games. Given the close connection between · and the l1 -norm · 1 established in Theorem 4.11, it is natural to expect that such decomposition would resemble the one available for the familiar l1 -norm. For this reason, we first recall a simple decomposition result for the l1 -norm. Lemma 4.11. Given any vector z ∈ Rn , the vectors z+ and z− are the unique vectors in Rn+ such that z = z+ − z− ,
(4.27)
z1 = z+ 1 + z− 1 .
(4.28)
and
Proof. Clearly, the decomposition $z = z^+ - z^-$ satisfies both (4.27) and (4.28). Suppose $x, y \in \mathbb{R}^n_+$ satisfy (4.27). We want to show that $x \geq z^+$ and $y \geq z^-$. As $x = z + y$ and $y \geq 0$, we have $x \geq z$; being also positive, $x \geq z^+$. Likewise, $y = x - z$ implies $y \geq -z$, and so $y \geq z^-$. On the other hand, we have $\|x\|_1 + \|y\|_1 \geq \|z^+\|_1 + \|z^-\|_1 = \|z\|_1$. As $\|x\|_1 \geq \|z^+\|_1$ and $\|y\|_1 \geq \|z^-\|_1$, to get (4.28) we must have $\|x\|_1 = \|z^+\|_1$ and $\|y\|_1 = \|z^-\|_1$. Hence, $x = z^+$ and $y = z^-$. □

Lemma 4.11 leads to the following decomposition result, which generalizes to our finite setting the Jordan Decomposition Theorem for charges. Versions of this result for finite and infinite games have been proved by Revuz (1955), Gilboa and Schmeidler (1994, 1995), and Marinacci (1996).
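As a small numerical illustration (our own sketch, not part of the chapter), the componentwise positive and negative parts of a sample vector satisfy (4.27) and (4.28):

```python
z = [3.0, -1.5, 0.0, 2.0]
z_plus  = [max(t, 0.0) for t in z]     # z+, componentwise
z_minus = [max(-t, 0.0) for t in z]    # z-, componentwise

# (4.27): z = z+ - z-
assert all(abs(p - m - t) < 1e-12 for p, m, t in zip(z_plus, z_minus, z))
# (4.28): the l1-norm splits additively across the two parts
assert abs(sum(map(abs, z)) - (sum(z_plus) + sum(z_minus))) < 1e-12
```

Any other nonnegative pair satisfying (4.27) would strictly inflate the right-hand side of (4.28), which is the content of the lemma.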
Theorem 4.12. Given any $\nu \in V_n$, the games $\nu^+$ and $\nu^-$ are the unique totally monotone games such that
$$\nu = \nu^+ - \nu^- \tag{4.29}$$
and
$$\|\nu\|_c = \|\nu^+\|_c + \|\nu^-\|_c. \tag{4.30}$$
Proof. Let $\nu_1$ and $\nu_2$ be any two games in $V_n^+$ satisfying (4.29) and (4.30). Then the positive vectors $T(\nu_1)$ and $T(\nu_2)$ of $\mathbb{R}^{2^{|\Omega|}-1}_+$ are such that
$$T(\nu) = T(\nu_1) - T(\nu_2)$$
and
$$\|T(\nu)\|_1 = \|T(\nu_1)\|_1 + \|T(\nu_2)\|_1.$$
By Lemma 4.11, $T(\nu_1) = T(\nu)^+ = T(\nu^+)$ and $T(\nu_2) = T(\nu)^- = T(\nu^-)$. Since $T$ is an isomorphism, we conclude that $\nu_1 = \nu^+$ and $\nu_2 = \nu^-$, as desired. □

4.6.3. Additive representation

By Theorem 4.9, each finite game $\nu$ can be uniquely written as
$$\nu = \sum_{\emptyset \neq A \in \Sigma} \alpha_A^\nu u_A. \tag{4.31}$$
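The coefficients $\alpha_A^\nu$ in (4.31) are the Möbius transform of $\nu$, $\alpha_A^\nu = \sum_{B\subseteq A}(-1)^{|A|-|B|}\nu(B)$ (Theorem 4.9), which makes the representation easy to check numerically. The following Python sketch is ours (the helper names are illustrative); it uses the game $\nu(A) = |A|^2$, which reappears in Example 4.12, as a test case:

```python
from itertools import chain, combinations

def subsets(s):
    """All subsets of s (as frozensets), including the empty set."""
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def mobius(v, omega):
    """Coefficients alpha_A = sum_{B subset of A} (-1)^(|A|-|B|) v(B), for nonempty A."""
    return {A: sum((-1) ** (len(A) - len(B)) * v(B) for B in subsets(A))
            for A in subsets(omega) if A}

omega = frozenset({1, 2, 3})
v = lambda A: len(A) ** 2          # test game; it reappears in Example 4.12
alpha = mobius(v, omega)

# Representation (4.31): since u_A(E) = 1 iff A is a subset of E,
# v(E) must equal the sum of alpha_A over nonempty A contained in E.
recon = lambda E: sum(a for A, a in alpha.items() if A <= E)
assert all(recon(E) == v(E) for E in subsets(omega))
```

The reconstruction step is exactly the evaluation of $\sum_A \alpha_A^\nu u_A$ at a coalition $E$.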
Let $\hat\Omega$ be the collection of all nonempty sets of $\Sigma$, that is, $\hat\Omega = \{A \in \Sigma : A \neq \emptyset\}$. The collection $\hat\Omega$ can be viewed as a new space, whose "points" are the nonempty sets of $\Sigma$. By identifying $\Omega$ with the collection of all singletons $\{\{\omega\} : \omega \in \Omega\}$, we can actually view the space $\hat\Omega$ as an enlargement of the original space $\Omega$.

Define on the power set $2^{\hat\Omega}$ of the space $\hat\Omega$ a charge $\mu_\nu$ as follows: $\mu_\nu(\{A\}) = \alpha_A^\nu$ for each $A \in \hat\Omega$. By additivity, this is enough to define the charge $\mu_\nu$ on the entire power set $2^{\hat\Omega}$. For example, for the set $\{A, B\} \in 2^{\hat\Omega}$ we have $\mu_\nu(\{A, B\}) = \alpha_A^\nu + \alpha_B^\nu$; more generally, given any collection $\{A_1, \ldots, A_n\} \in 2^{\hat\Omega}$, we have $\mu_\nu(\{A_1, \ldots, A_n\}) = \sum_{i=1}^n \alpha_{A_i}^\nu$.

Each game $\nu$ is thus associated with a charge $\mu_\nu$ on $2^{\hat\Omega}$. Denote by $I: V_n \to ba(2^{\hat\Omega})$ the correspondence $\nu \mapsto \mu_\nu$, which is well defined by Theorem 4.9. It is also linear, that is, $I(\alpha\nu_1 + \beta\nu_2) = \alpha I(\nu_1) + \beta I(\nu_2)$ for all $\alpha, \beta \in \mathbb{R}$ and all $\nu_1, \nu_2 \in V_n$.

The linear correspondence $I$ provides some noteworthy insights into Choquet integrals. To see why, given a set $E$, consider the function $\hat 1_E: \hat\Omega \to \mathbb{R}$ defined by
$$\hat 1_E(A) = \int 1_E \, du_A = u_A(E) = \begin{cases} 1 & A \subseteq E \\ 0 & \text{else} \end{cases}$$
for each $A \in \hat\Omega$. If we set $\hat E = \{A \in \hat\Omega : A \subseteq E\}$, then $\hat 1_E = 1_{\hat E}$. That is, $\hat 1_E$ is a characteristic function on the enlarged space $\hat\Omega$.
Using $\mu_\nu$ and $1_{\hat E}$, we can rewrite (4.31) as
$$\nu(E) = \sum_{A \in \hat\Omega} \hat 1_E(A)\,\mu_\nu(\{A\}) = \int \hat 1_E \, d\mu_\nu = \int 1_{\hat E}\, d\mu_\nu = \mu_\nu(\hat E)$$
for each $E \in \Sigma$. Equivalently,
$$\int 1_E \, d\nu = \int \hat 1_E \, d\mu_\nu \quad \text{for each } E \in \Sigma. \tag{4.32}$$
Therefore, thanks to the linear correspondence $I$, we can represent the Choquet integral $\int 1_E\, d\nu$ as a standard additive integral on the enlarged space $\hat\Omega$. In this extended domain, the set $E$ of $\Sigma$ is replaced by the set $\hat E = \{A \in \hat\Omega : A \subseteq E\}$ of $2^{\hat\Omega}$. We call $\int \hat 1_E\, d\mu_\nu$ the additive representation of $\int 1_E\, d\nu$.

In a sense, (4.32) says that the Choquet integral $\int 1_E\, d\nu$ can be viewed as a "zipped" version of the additive integral $\int \hat 1_E\, d\mu_\nu$. The trade-off here is between a more economical domain (that is, $(\Omega, \Sigma)$ rather than $(\hat\Omega, 2^{\hat\Omega})$) and a better behaved integral (that is, the additive integral rather than the non-additive one). In any case, to compute both representations we need to know the $2^{|\Omega|} - 1$ values of $\nu$ and $\mu_\nu$, respectively; hence, both representations involve the same amount of information, though processed in different ways.

Next we formally collect the relevant properties of the additive representation. Observe that the correspondence $I$ is actually an isomorphism.$^{16}$

Theorem 4.13. There is a lattice preserving and isometric isomorphism $I$ between the AL-spaces $(V_n, \succeq, \|\cdot\|_c)$ and $(ba(2^{\hat\Omega}), \succeq_{ba}, \|\cdot\|)$ determined by the identity
$$\nu(E) = \mu(\hat E) \quad \text{for each } E \in \Sigma. \tag{4.33}$$
The game $\nu$ is totally monotone if and only if the corresponding $\mu$ is nonnegative.

Versions of this result for finite and infinite games can be found in Revuz (1955), Gilboa and Schmeidler (1994, 1995), Marinacci (1996), and Philippe et al. (1999). Denneberg (1997) provides an overview and alternative proofs of some of these results.
Proof. Given a charge $\mu \in ba(2^{\hat\Omega})$, the set function $\nu$ on $\Sigma$ defined by (4.33) is clearly a game. As to the converse, the charge $\mu_\nu$ defined earlier belongs to $ba(2^{\hat\Omega})$ and satisfies (4.33). It is also the unique charge in $ba(2^{\hat\Omega})$ satisfying (4.33). In fact, let $\mu$ be any other charge in $ba(2^{\hat\Omega})$ satisfying (4.33). Consider the collection $\hat\Sigma = \{\hat E : E \in \Sigma\}$ of subsets of $\hat\Omega$. As
$$\widehat{E_1 \cap E_2} = \hat E_1 \cap \hat E_2 \quad \text{and} \quad \widehat{E_1 \cup E_2} \supseteq \hat E_1 \cup \hat E_2,$$
the collection $\hat\Sigma$ is, in general, only closed under intersections, that is, it is a $\pi$-class (see Aliprantis and Border, 1999: 132). As $\mu$ and $\mu_\nu$ coincide on a $\pi$-class,
they coincide on the algebra $\mathcal{A}(\hat\Sigma)$ generated by $\hat\Sigma$ (see Aliprantis and Border, 1999: Theorem 9.10). But $\mathcal{A}(\hat\Sigma)$ coincides with the power set $2^{\hat\Omega}$ of $\hat\Omega$. For, $\mathcal{A}(\hat\Sigma)$ contains all singletons: given $A \in \hat\Omega$, we have
$$\{A\} = \hat A \setminus \bigcup_{\omega \in A} \widehat{A \setminus \{\omega\}}.$$
As a result, $\mu$ and $\mu_\nu$ coincide on the power set $2^{\hat\Omega}$, thus proving that $\mu_\nu$ is the unique charge in $ba(2^{\hat\Omega})$ satisfying (4.33).

All this shows that the linear correspondence $I$ introduced above is an isomorphism between $V_n$ and $ba(2^{\hat\Omega})$. It is also an isometry: the equality $\|I(\nu)\| = \|\nu\|_c$ follows from
$$\|\mu_\nu\| = \sum_{A \in \hat\Omega} |\mu_\nu(\{A\})| = \sum_{A \in \hat\Omega} |\alpha_A^\nu| = \|\nu\|_c.$$
It remains to show that $I$ is lattice preserving. We will only consider $\vee$, the argument for $\wedge$ being similar. For each $A \in \hat\Omega$, we have
$$\mu_{\nu_1 \vee \nu_2}(\{A\}) = \alpha_A^{\nu_1} \vee \alpha_A^{\nu_2} = \max\{\mu_{\nu_1}(\{A\}), \mu_{\nu_2}(\{A\})\} = (\mu_{\nu_1} \vee \mu_{\nu_2})(\{A\}),$$
as desired (the last equality holds because $\{A\}$ is a singleton when viewed as a member of $2^{\hat\Omega}$). □

The additive representation is not limited to integrals of characteristic functions: it holds for all functions in $B(\Sigma)$. To see why this is the case, observe that the additivity of the Riemann integral immediately implies that the Choquet integral is linear in the game, that is, $\int f\, d(\nu_1 + \nu_2) = \int f\, d\nu_1 + \int f\, d\nu_2$ for any $\nu_1$ and $\nu_2$ in $V_n$. Therefore, (4.31) implies that
$$\int f\, d\nu = \int f\, d\Big(\sum_{A \in \hat\Omega} \alpha_A^\nu u_A\Big) = \sum_{A \in \hat\Omega} \alpha_A^\nu \int f\, du_A \tag{4.34}$$
for each $f \in B(\Sigma)$. Define a function $\hat f: \hat\Omega \to \mathbb{R}$ by
$$\hat f(A) = \int f\, du_A \quad \text{for each } A \in \hat\Omega. \tag{4.35}$$
As $\int f\, du_A = \min_{\omega \in A} f(\omega)$ (see Example 4.4), it actually holds that
$$\hat f(A) = \min_{\omega \in A} f(\omega) \quad \text{for each } A \in \hat\Omega.$$
By (4.34) we have
$$\int f\, d\nu = \sum_{A \in \hat\Omega} \alpha_A^\nu \hat f(A) = \sum_{A \in \hat\Omega} \alpha_A^\nu \min_{\omega \in A} f(\omega) = \int \hat f\, d\mu_\nu,$$
and so the representation (4.34) can be written as
$$\int f\, d\nu = \int \hat f\, d\mu_\nu \quad \text{for each } f \in B(\Sigma). \tag{4.36}$$
This is the desired extension of (4.32) to all functions in $B(\Sigma)$. In fact, if $f = 1_E$, we have $\hat f = 1_{\hat E}$, and so (4.36) reduces to (4.32) for characteristic functions.
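The identity (4.36) lends itself to a direct numerical check. The sketch below is our own (helper names are not from the chapter): it computes the Choquet integral once from its definition, by summing the values of $f$ against increments of the game on upper-level sets, and once as $\sum_A \alpha_A^\nu \min_{\omega \in A} f(\omega)$:

```python
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def mobius(v, omega):
    return {A: sum((-1) ** (len(A) - len(B)) * v(B) for B in subsets(A))
            for A in subsets(omega) if A}

def choquet(f, v, omega):
    """Choquet integral of f (a dict omega -> R) against the game v:
    walk through outcomes in decreasing order of f, weighting each value
    by the increment of v on the growing upper-level set (v(empty) = 0)."""
    total, prev, upper = 0.0, 0.0, set()
    for w in sorted(omega, key=lambda w: -f[w]):
        upper.add(w)
        total += f[w] * (v(frozenset(upper)) - prev)
        prev = v(frozenset(upper))
    return total

omega = frozenset({1, 2, 3})
v = lambda A: len(A) ** 2          # test game of Example 4.12
f = {1: 3.0, 2: 1.0, 3: 2.0}

alpha = mobius(v, omega)
additive = sum(a * min(f[w] for w in A) for A, a in alpha.items())
assert abs(choquet(f, v, omega) - additive) < 1e-9    # identity (4.36)
```

The second computation is precisely the integral of $\hat f$ against the charge $\mu_\nu$ on the enlarged space.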
Summing up, the additive representation of the Choquet integral $\int f\, d\nu$ is given by
$$\int \hat f\, d\mu_\nu = \int \min_{\omega \in A} f(\omega)\, d\mu_\nu(A) = \sum_{A \in \hat\Omega} \alpha_A^\nu \min_{\omega \in A} f(\omega).$$
Theorem 4.13 can be extended from games to Choquet integrals along these lines. In order to do so, consider the space $V_n^c$ of all Choquet functionals on $B(\Sigma)$. It is a vector space since $(\alpha\nu' + \beta\nu'')_c = \alpha\nu'_c + \beta\nu''_c$ for all $\nu', \nu'' \in V_n$ and all $\alpha, \beta \in \mathbb{R}$. By the next result, $V_n^c$ is isomorphic to the dual space $B(2^{\hat\Omega})^*$ of $B(2^{\hat\Omega})$.$^{17}$

Corollary 4.3. There is an isomorphism between the vector spaces $V_n^c$ and $B(2^{\hat\Omega})^*$ determined by the identity
$$\nu_c(f) = \mu(\hat f) \quad \text{for each } f \in B(\Sigma). \tag{4.37}$$
In particular, $I(\nu) = \mu$, where $\nu$ is the game associated to $\nu_c$ and $I$ is the isomorphism of Theorem 4.13.

Remark. For convenience, here $\mu$ also denotes the linear functional in $B(2^{\hat\Omega})^*$ given by $f \mapsto \int f\, d\mu$ for each $f \in B(2^{\hat\Omega})$.
Proof. We first show that given any $\mu \in B(2^{\hat\Omega})^*$, the functional $\nu_c: B(\Sigma) \to \mathbb{R}$ defined by (4.37) is comonotonic additive. Observe that, given any two comonotonic $f_1$ and $f_2$ in $B(\Sigma)$, it holds that
$$\widehat{(f_1 + f_2)}(A) = \min_{\omega \in A}(f_1 + f_2)(\omega) = \min_{\omega \in A} f_1(\omega) + \min_{\omega \in A} f_2(\omega) = \hat f_1(A) + \hat f_2(A) \tag{4.38}$$
for each $A \in \hat\Omega$. Hence,
$$\nu_c(f_1 + f_2) = \mu\big(\widehat{f_1 + f_2}\big) = \mu(\hat f_1 + \hat f_2) = \mu(\hat f_1) + \mu(\hat f_2) = \nu_c(f_1) + \nu_c(f_2),$$
and so $\nu_c$ is comonotonic additive, as desired. It remains to prove that, given any Choquet functional $\nu_c \in V_n^c$, the functional $\mu$ defined by (4.37) is linear on $B(2^{\hat\Omega})$. By Theorem 4.13, Equation (4.37) uniquely determines a charge $\mu$ on the power set $2^{\hat\Omega}$. Hence, the associated linear functional $f \mapsto \int f\, d\mu$ belongs to $B(2^{\hat\Omega})^*$, as desired. □
4.6.4. Polynomial representation

A further possible way to represent finite games is in terms of polynomials. Consider the set $\{0,1\}^n$ of the vertices of the hypercube $[0,1]^n$. Functions $f: \{0,1\}^n \to \mathbb{R}$ are called pseudo-Boolean (see Boros and Hammer, 2002, and Grabisch et al., 2000). Say that a pseudo-Boolean function $f$ is grounded if $f(0, \ldots, 0) = 0$.

Finite games can be regarded as grounded pseudo-Boolean functions. In fact, w.l.o.g. set $\Omega = \{1, \ldots, n\}$ and $\Sigma = 2^{\{1,\ldots,n\}}$, so that $V_n$ is the set of all games $\nu: 2^{\{1,\ldots,n\}} \to \mathbb{R}$. Given $A \subseteq \{1, \ldots, n\}$, consider the characteristic vector $1_A \in \{0,1\}^n$ given by
$$1_A(i) = \begin{cases} 1 & i \in A \\ 0 & \text{else.} \end{cases}$$
Since $\{0,1\}^n = \{1_A : A \subseteq \{1,\ldots,n\}\}$, each game $\nu$ uniquely determines a grounded pseudo-Boolean function $f$ by setting $f(1_A) = \nu(A)$ for each $A \subseteq \{1,\ldots,n\}$. Conversely, each grounded pseudo-Boolean function $f$ induces a game $\nu: 2^{\{1,\ldots,n\}} \to \mathbb{R}$ by setting $\nu(A) = f(1_A)$ for each $A \subseteq \{1,\ldots,n\}$.

Given a pseudo-Boolean function $f$, consider the polynomial
$$B_f(x) = \sum_{A \subseteq \{1,\ldots,n\}} f(1_A) \prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \quad \text{for each } x \in \mathbb{R}^n. \tag{4.39}$$
This polynomial is an extension of $f$ to $\mathbb{R}^n$, as $B_f(1_A) = f(1_A)$ for each $A \subseteq \{1,\ldots,n\}$. More important, $B_f$ is a Bernstein polynomial of $f$. For, recall (see Schultz, 1969) that given a function $f: [0,1]^n \to \mathbb{R}$ and an $n$-tuple $m = (m_1, m_2, \ldots, m_n)$ with non-negative integer components, its Bernstein polynomial $B^m f: \mathbb{R}^n \to \mathbb{R}$ is
$$B^m f(x) = \sum_{k_1=0}^{m_1} \sum_{k_2=0}^{m_2} \cdots \sum_{k_n=0}^{m_n} f\Big(\frac{k_1}{m_1}, \ldots, \frac{k_n}{m_n}\Big) \prod_{i=1}^n \binom{m_i}{k_i} x_i^{k_i} (1 - x_i)^{m_i - k_i}.$$
In particular, the least-degree Bernstein polynomial $B^{(1,\ldots,1)} f: \mathbb{R}^n \to \mathbb{R}$ associated with $f$ is given by
$$B^{(1,\ldots,1)} f(x) = \sum_{k = (k_1,\ldots,k_n) \in \{0,1\}^n} f(k)\, x_1^{k_1} \cdots x_n^{k_n} (1 - x_1)^{1-k_1} \cdots (1 - x_n)^{1-k_n}.$$
To define $B^{(1,\ldots,1)} f$ we only need to know the values of $f$ at the vertices $\{0,1\}^n$, and this makes it possible to associate $B^{(1,\ldots,1)} f$ with any pseudo-Boolean function. The polynomial (4.39) is, therefore, the least-degree Bernstein polynomial $B^{(1,\ldots,1)} f$ of $f: \{0,1\}^n \to \mathbb{R}$.

When $f$ is grounded, the polynomial $B_f$ is multilinear, that is, it is linear in each variable $x_i$. In particular, $B_f$ is the unique multilinear extension of $f$ to $\mathbb{R}^n$,
and it can also be written as
$$B_\nu(x) = \sum_{\emptyset \neq A \in \Sigma} \nu(A) \prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \quad \text{for each } x \in \mathbb{R}^n, \tag{4.40}$$
where $\nu$ is the game associated with the grounded function $f$. The polynomial $B_\nu$ is called the Owen multilinear extension of the game $\nu$, and it was introduced by Owen (1972). In view of our previous discussion, $B_\nu$ is the least-degree Bernstein polynomial of the grounded pseudo-Boolean function induced by the game.

Example 4.12. Consider the game $\nu \in V_n$ given by $\nu(A) = |A|^2$ for each $A \subseteq \{1,\ldots,n\}$. We have
$$B_\nu(x) = \sum_{i=1}^n x_i + 2 \sum_{i < j} x_i x_j.$$
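A brute-force evaluation of the Owen extension (4.40) makes Example 4.12 easy to verify. The following Python sketch is our own and the names are illustrative:

```python
from itertools import chain, combinations

def subsets(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def owen(v, n):
    """Owen multilinear extension (4.40) of a game v on {1,...,n}, by brute force."""
    players = frozenset(range(1, n + 1))
    def B(x):                      # x: dict i -> x_i
        total = 0.0
        for A in subsets(players):
            prod = 1.0
            for i in players:
                prod *= x[i] if i in A else 1.0 - x[i]
            total += v(A) * prod
        return total
    return B

n = 3
v = lambda A: len(A) ** 2
B = owen(v, n)

# B extends v: at each vertex 1_A it returns v(A).
for A in subsets(frozenset(range(1, n + 1))):
    x = {i: (1.0 if i in A else 0.0) for i in range(1, n + 1)}
    assert abs(B(x) - v(A)) < 1e-9

# Interior point, checked against the closed form of Example 4.12.
x = {1: 0.5, 2: 0.25, 3: 0.75}
closed = sum(x.values()) + 2 * (x[1] * x[2] + x[1] * x[3] + x[2] * x[3])
assert abs(B(x) - closed) < 1e-9
```

The vertex check is the statement that $B_\nu$ extends the game; the interior check matches the monomial form obtained by Möbius inversion.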
Denote by $P_n$ the vector space of all multilinear polynomials on $\mathbb{R}^n$. The next result provides two bases for $P_n$ and a formula for the relative change of basis.

Theorem 4.14. The monomials $\prod_{i \in A} x_i$ form a basis for the $(2^n - 1)$-dimensional vector space $P_n$, as do the polynomials $\prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$. Given $P \in P_n$, if
$$P(x) = \sum_{\emptyset \neq A \in \Sigma} \alpha_A \prod_{i \in A} x_i = \sum_{\emptyset \neq A \in \Sigma} \beta_A \prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i),$$
then
$$\alpha_A = \sum_{B \subseteq A} (-1)^{|A|-|B|} \beta_B. \tag{4.41}$$

Proof. Each multilinear polynomial $P$ can be written as a linear combination $\sum_{\emptyset \neq A \in \Sigma} \alpha_A \prod_{i \in A} x_i$ of monomials. Let us prove that such a combination is unique. As in Boros and Hammer (2002), we proceed by induction on the size of the subsets $A$. Begin with $|A| = 1$. In this case $\alpha_A = P(1_A)$, and so the coefficient $\alpha_A$ is uniquely determined. Assume next that all $\alpha_A$, with $|A| \leq k - 1$, are uniquely determined. Let $A$ be such that $|A| = k$. Since $P(1_A) = \sum_{B \subseteq A} \alpha_B$, we have
$$\alpha_A = P(1_A) - \sum_{B \subsetneq A} \alpha_B.$$
The coefficient $\alpha_A$ is then uniquely determined, as all coefficients $\alpha_B$ are uniquely determined by the induction hypothesis. We conclude that the monomials form a basis for $P_n$. As there are $2^n - 1$ monomials, the space $P_n$ has dimension $2^n - 1$.
There are $2^n - 1$ polynomials of the form $\prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$. Hence, they form a basis provided they are linearly independent. To see that this is the case, suppose
$$P(x) \equiv \sum_{\emptyset \neq A \in \Sigma} \beta_A \prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i) = 0 \quad \text{for each } x \in \mathbb{R}^n.$$
Then $P(1_A) = 0$ for each $A$, and so $\beta_A = 0$. This shows that these polynomials are linearly independent, and so form a basis.

It remains to prove (4.41). Since $\beta_A = P(1_A) = \sum_{B \subseteq A} \alpha_B$ for each $A \subseteq \{1,\ldots,n\}$, we can obtain (4.41) by using a combinatorial inversion argument that can be found in Shafer (1976: 48) and Chateauneuf and Jaffray (1989: Lemma 2.3). □

Remark. Consider the function $M: P_n \to P_n$ given by
$$M(P)(1_A) = \sum_{B \subseteq A} (-1)^{|A|-|B|} P(1_B) \tag{4.42}$$
for each index set $A \subseteq \{1,\ldots,n\}$. This is the Möbius transform on $P_n$ and, by (4.41), it can be viewed as a change of basis formula.

By Theorem 4.14, the polynomials $\prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$ form a basis for $P_n$, and so each multilinear polynomial can be represented as in (4.39) and viewed as the least-degree Bernstein polynomial of a suitable grounded pseudo-Boolean function. Equivalently, each multilinear polynomial can be viewed as the Owen polynomial of a suitable game. Moreover, by Theorem 4.14 we can represent the polynomial $B_f$ of a grounded $f$ in a unique way as
$$B_f(x) = \sum_{\emptyset \neq A \in \Sigma} \Big(\sum_{B \subseteq A} (-1)^{|A|-|B|} f(1_B)\Big) \prod_{i \in A} x_i. \tag{4.43}$$
Hence, the relative Owen polynomial can be uniquely written as
$$B_\nu(x) = \sum_{\emptyset \neq A \in \Sigma} \Big(\sum_{B \subseteq A} (-1)^{|A|-|B|} \nu(B)\Big) \prod_{i \in A} x_i.$$

Let us get back to finite games. Denote by $B$ the Owen correspondence $\nu \mapsto B_\nu$ between $V_n$ and $P_n$. The next lemma collects a few simple properties of $B$. Here $e_A$ denotes the game in $V_n$ given by
$$e_A(B) = \begin{cases} 1 & A = B \\ 0 & \text{else.} \end{cases}$$
The family $\{e_A\}_{\emptyset \neq A \in \Sigma}$ is clearly a basis of $V_n$, and any game $\nu$ can be represented as $\nu = \sum_{\emptyset \neq A \in \Sigma} \nu(A) e_A$.
Lemma 4.12. The Owen correspondence $B$ is an isomorphism between the vector spaces $V_n$ and $P_n$. Moreover:

(i) for each unanimity game $u_A$, we have $B_{u_A}(x) = \prod_{i \in A} x_i$ for each $x \in \mathbb{R}^n$;
(ii) for each game $e_A$, we have $B_{e_A}(x) = \prod_{i \in A} x_i \prod_{i \in A^c} (1 - x_i)$ for each $x \in \mathbb{R}^n$;
(iii) for each charge $\mu$, we have $B_\mu(x) = \sum_{i=1}^n \mu(i) x_i$ for each $x \in \mathbb{R}^n$;
(iv) a game $\nu$ is positive if and only if $B_\nu(x) \geq 0$ for all $x \in [0,1]^n$;
(v) a game $\nu$ is convex if, for each $i \neq j$,
$$\frac{\partial^2 B_\nu(x)}{\partial x_i\, \partial x_j} \geq 0 \quad \text{for all } x \in (0,1)^n.$$

Proof. By Theorem 4.14, $B$ is one-to-one. As it is also linear, $B$ is an isomorphism between the vector spaces $V_n$ and $P_n$.

Let us prove (i). As $\prod_{i \in A} x_i \in P_n$ and $B$ is a linear isomorphism, there exists a unique game $\nu$ such that $B_\nu(x) = \prod_{i \in A} x_i$. As $\nu(B) = B_\nu(1_B)$, we have $\nu(B) = 1$ if $B \supseteq A$ and $\nu(B) = 0$ otherwise. Hence, $\nu = u_A$.

As (ii) is trivially true, let us prove (iii). By Example 4.7, $\mu = \sum_{i=1}^n \mu(i) \delta_i$, where $\delta_i$ is the Dirac charge concentrated on $i \in \Omega$. By the linearity of $B$ and by point (i),
$$B_\mu(x) = B_{\sum_{i=1}^n \mu(i)\delta_i}(x) = \sum_{i=1}^n \mu(i) B_{\delta_i}(x) = \sum_{i=1}^n \mu(i) x_i,$$
as desired.

(iv) If $\nu \geq 0$, the Owen polynomial (4.40) has all positive coefficients. As $\prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \geq 0$ on $[0,1]^n$, we then have $B_\nu(x) \geq 0$ on $[0,1]^n$. The converse is obvious, as $\nu(A) = B_\nu(1_A) \geq 0$.

(v) This condition on the second derivatives implies that $B_\nu$ is supermodular on $(0,1)^n$. As it is continuous, $B_\nu$ is then supermodular on the hypercube $[0,1]^n$. In turn this implies the convexity of $\nu$. □

Lemma 4.12(i) shows that unanimity games are the game counterpart of monomials. By Theorem 4.14, monomials form a basis of the space $P_n$ of multilinear polynomials. As a result, Theorem 4.9 can be viewed as a corollary of Theorem 4.14, and the representation (4.21) as a consequence of the polynomial representation (4.43).

Remark. As we did in $P_n$ with (4.42), here as well we can define a Möbius transform $M: V_n \to V_n$ by
$$M(\nu)(A) = \sum_{B \subseteq A} (-1)^{|A|-|B|} \nu(B)$$
for each $A \subseteq \{1,\ldots,n\}$. The Möbius transform on $V_n$ can be viewed as a change of basis formula between the bases $\{e_A\}_{\emptyset \neq A \in \Sigma}$ and $\{u_A\}_{\emptyset \neq A \in \Sigma}$.

The next result completes Lemma 4.12 by showing the polynomial counterpart of total monotonicity.

Lemma 4.13. A game $\nu$ is totally monotone if and only if its Owen polynomial $B_\nu$ is nonnegative on $\mathbb{R}^n_+$, that is, $B_\nu(x) \geq 0$ for each $x \in \mathbb{R}^n_+$.

Proof. Suppose $\nu$ is totally monotone. By Lemma 4.12, we can write
$$B_\nu(x) = \sum_{\emptyset \neq A \in \Sigma} \alpha_A^\nu B_{u_A}(x) = \sum_{\emptyset \neq A \in \Sigma} \alpha_A^\nu \prod_{i \in A} x_i.$$
Hence, if $\nu$ is totally monotone, then $B_\nu(x) \geq 0$ for all $x \in \mathbb{R}^n_+$. Conversely, assume $B_\nu(x) \geq 0$ for all $x \in \mathbb{R}^n_+$. We want to show that $\nu$ is totally monotone, that is, that $\alpha_A^\nu \geq 0$ for each $A$. Suppose, per contra, that $\alpha_A^\nu < 0$ for some $A$. Consider the vector $t1_A \in \mathbb{R}^n_+$, with $t > 0$. Then
$$B_\nu(t1_A) = \alpha_A^\nu t^{|A|} + \text{terms of lower degree}.$$
Hence, for $t$ large enough we have $B_\nu(t1_A) < 0$, a contradiction. We conclude that $\alpha_A^\nu \geq 0$ for each $A$, as desired. □

This lemma is the reason why we consider multilinear polynomials defined on $\mathbb{R}^n$ rather than on $[0,1]^n$, as is usually the case. In fact, by Lemma 4.12(iv) the positivity of the Owen polynomial on $[0,1]^n$ only reflects the positivity of the associated game, not its total monotonicity.

We now illustrate Lemmas 4.12 and 4.13 with a couple of examples.

Example 4.13. Consider the game $\nu(A) = |A|^2$ of Example 4.12. As
$$B_\nu(x) = \sum_{i=1}^n x_i + 2 \sum_{i < j} x_i x_j \geq 0 \quad \text{for each } x \in \mathbb{R}^n_+,$$
by Lemma 4.13 the game $\nu$ is totally monotone.
Example 4.14. Consider the game associated with the multilinear polynomial $B(x) = x_1 x_2 + x_1 x_3 + x_2 x_3 - \varepsilon x_1 x_2 x_3$, with $\varepsilon > 0$. As $B(10/\varepsilon, 10/\varepsilon, 2/\varepsilon) < 0$ for each $\varepsilon > 0$, this game is not totally monotone. The game is positive and convex when $\varepsilon \leq 1$. In fact, $B(x) = x_1 x_2 (1 - \varepsilon x_3) + x_1 x_3 + x_2 x_3 \geq 0$ on $[0,1]^3$, and so by Lemma 4.12(iv) the game is positive. On the other hand,
$$\frac{\partial^2 B}{\partial x_i\, \partial x_j} = 1 - \varepsilon x_k \geq 0$$
on $(0,1)^3$ (where $k \neq i, j$), so that, by Lemma 4.12(v), the game is convex.

In view of Lemma 4.13, it is natural to consider the pointed convex cone
$$P_n^+ = \{P \in P_n : P(x) \geq 0 \text{ for each } x \in \mathbb{R}^n_+\}.$$
It induces in the usual way an order $\succeq_p$ on $P_n$ as follows: given $P_1, P_2 \in P_n$, write $P_1 \succeq_p P_2$ if $P_1 - P_2 \in P_n^+$. In turn, $\succeq_p$ induces a lattice structure and a norm, denoted by $\|\cdot\|_p$, that make $P_n$ an AL-space. For brevity, we omit the details of these by now standard notions.

The next result summarizes the relations between the space of finite games and the space of multilinear polynomials just introduced.

Theorem 4.15. There is a lattice preserving and isometric isomorphism $B$ between the AL-spaces $(V_n, \succeq, \|\cdot\|_c)$ and $(P_n, \succeq_p, \|\cdot\|_p)$ determined by the identity
$$P(x) = \sum_{\emptyset \neq A \in \Sigma} \nu(A) \prod_{i \in A} x_i \prod_{j \in A^c} (1 - x_j) \quad \text{for each } x \in \mathbb{R}^n.$$
The game $\nu$ is totally monotone if and only if the corresponding polynomial $P$ in $P_n$ is nonnegative on $\mathbb{R}^n_+$.

Summing up, Theorems 4.11, 4.13, and 4.15 establish the following lattice isometries:
$$\big(\mathbb{R}^{2^{|\Omega|}-1}, \geq, \|\cdot\|_1\big) \overset{T}{\longleftrightarrow} \big(V_n, \succeq, \|\cdot\|_c\big) \overset{I}{\longleftrightarrow} \big(ba(2^{\hat\Omega}), \succeq_{ba}, \|\cdot\|\big),$$
$$\big(V_n, \succeq, \|\cdot\|_c\big) \overset{B}{\longleftrightarrow} \big(P_n, \succeq_p, \|\cdot\|_p\big).$$
The resulting isometries $I \circ T^{-1}$ and $B \circ T^{-1}$ between $\mathbb{R}^{2^{|\Omega|}-1}$, $ba(2^{\hat\Omega})$, and $P_n$ are obviously well known. The interesting part here is the possibility of representing finite games in different ways, each useful for different purposes.
4.6.5. Convex games

In this last subsection we show some noteworthy properties of finite convex games. A first important property has already been mentioned right after Theorem 4.12: any finite game can be written as the difference of two convex games.

To see other properties of finite convex games, we have to turn our attention to chains of subsets of $\Omega$. As $\Omega = \{\omega_1, \ldots, \omega_n\}$, the collection $C$ given by
$$\{\omega_1\}, \{\omega_1, \omega_2\}, \ldots, \{\omega_1, \ldots, \omega_n\}$$
forms a maximal chain, that is, no other chain can contain it. More generally, given any permutation $\sigma$ on $\{1, \ldots, n\}$, the collection $C_\sigma$ given by
$$\{\omega_{\sigma(1)}\}, \{\omega_{\sigma(1)}, \omega_{\sigma(2)}\}, \ldots, \{\omega_{\sigma(1)}, \ldots, \omega_{\sigma(n)}\}$$
forms another maximal chain. All maximal chains in $\Sigma$ have this form, and so there are $n!$ of them.

Let $\nu$ be any game. By Lemma 4.5, for each $C_\sigma$ there is a charge $\mu_\sigma \in ba(\Sigma)$ such that $\mu_\sigma(A) = \nu(A)$ for each $A \in C_\sigma$. Because of the maximality of $C_\sigma$, the charge $\mu_\sigma$ is easily seen to be unique. We call $\mu_\sigma$ the marginal worth charge associated with the permutation $\sigma$.

Marginal worth charges play a central role in studying finite convex games. We begin by providing a characterization of convexity based on them, due to Ichiishi (1981).

Theorem 4.16. A finite game $\nu$ is convex if and only if all its marginal worth charges $\mu_\sigma$ belong to the core.

Proof. "Only if". Suppose $\nu$ is convex. We want to show that each $\mu_\sigma$ belongs to $\mathrm{core}(\nu)$. By Theorem 4.7, there exists $\mu \in \mathrm{core}(\nu)$ such that $\mu(A) = \nu(A)$ for each $A \in C_\sigma$. By the maximality of $C_\sigma$, $\mu_\sigma$ is the unique charge having this property. Hence, $\mu = \mu_\sigma$, as desired.

"If". Suppose $\mu_\sigma \in \mathrm{core}(\nu)$ for all permutations $\sigma$. Given any $A$ and $B$, let $C_\sigma$ be a maximal chain containing $A \cap B$, $A$, and $A \cup B$. Then
$$\nu(A \cup B) + \nu(A \cap B) - \nu(A) = \mu_\sigma(A \cup B) + \mu_\sigma(A \cap B) - \mu_\sigma(A) = \mu_\sigma(B).$$
As $\mu_\sigma \in \mathrm{core}(\nu)$, we have $\mu_\sigma(B) \geq \nu(B)$, and so $\nu$ is convex. □

Turn now to cores of finite games. The first observation to make is that the core of a finite game is a subset of the $|\Omega|$-dimensional space $\mathbb{R}^\Omega$ of the form
$$\mathrm{core}(\nu) = \Big\{x \in \mathbb{R}^\Omega : \sum_{\omega \in \Omega} x_\omega = \nu(\Omega) \text{ and } \sum_{\omega \in A} x_\omega \geq \nu(A) \text{ for each } A\Big\}.$$
Equivalently,
$$\mathrm{core}(\nu) = \bigcap_{A \in \Sigma} \Big\{x \in \mathbb{R}^\Omega : \sum_{\omega \in A} x_\omega \geq \nu(A)\Big\} \cap \Big\{x \in \mathbb{R}^\Omega : \sum_{\omega \in \Omega} x_\omega \leq \nu(\Omega)\Big\},$$
that is, $\mathrm{core}(\nu)$ is the set of solutions of a finite system of linear inequalities on $\mathbb{R}^\Omega$. Sets of this form are called polyhedra.

By Proposition 4.2 the core is weak*-compact. In this finite setting, this means that it is a compact subset of $\mathbb{R}^\Omega$, where compactness is in the standard norm topology of $\mathbb{R}^\Omega$. The core of a finite game is, therefore, a compact polyhedron. As a result, we have the following geometric property of cores of finite games.

Proposition 4.16. The core of a finite game is a polytope in $\mathbb{R}^\Omega$, that is, it is the convex hull of a finite set.

Proof. By a standard result (see Aliprantis and Border, 1999: 233–234, or Webster, 1994: 114), compact polyhedra are polytopes. □

The extreme points of a polytope are called vertices and they form a finite set. As each element of a polytope can be represented as a convex combination of its vertices, the knowledge of the set of vertices is, therefore, key in describing the structure of a polytope. All this means that, by Proposition 4.16, in order to understand the structure of the core it is crucial to identify the set of its vertices. This is achieved by the next result, due to Shapley (1971). Interestingly, the marginal worth charges, which by Theorem 4.16 always belong to the core of a convex game, turn out to be exactly the sought-after vertices.

Theorem 4.17. Let $\nu$ be a finite convex game. Then a charge $\mu \in ba(\Sigma)$ is a vertex of $\mathrm{core}(\nu)$ if and only if it is a marginal worth charge, that is, if and only if there is a maximal chain $C_\sigma$ such that $\nu(A) = \mu(A)$ for all $A \in C_\sigma$.

Proof. An element of a polytope is a vertex if and only if it is an exposed point. Hence, it is enough to show that the marginal worth charges are the set of exposed points of $\mathrm{core}(\nu)$.

"If". Suppose $\mu_\sigma$ is a marginal worth charge, with associated maximal chain $C_\sigma$. We want to show that $\mu_\sigma$ is an exposed point of $\mathrm{core}(\nu)$. Since $C_\sigma$ is a maximal chain, there is an injective function $f_\sigma$ whose upper sets are given by $C_\sigma$, that is, $C_\sigma = \{(f_\sigma \geq t)\}_{t \in \mathbb{R}}$.
For example, if $C_\sigma = \{A_{\sigma(i)}\}$, take $f_\sigma = \sum_{i=1}^n 1_{A_{\sigma(i)}}$. By the definition of the Choquet integral, we have $\int f_\sigma\, d\mu_\sigma = \int f_\sigma\, d\nu$. Since $C_\sigma$ is maximal, $\mu_\sigma$ is the unique charge replicating $\nu$ on $C_\sigma$. Therefore, given any other charge $\mu$ in $\mathrm{core}(\nu)$, there exists $A \in C_\sigma$ such that $\mu_\sigma(A) < \mu(A)$. Equivalently, there is some $t \in \mathbb{R}$ such that $\nu(f_\sigma \geq t) = \mu_\sigma(f_\sigma \geq t) < \mu(f_\sigma \geq t)$. Hence,
$$\int f_\sigma\, d\nu = \int f_\sigma\, d\mu_\sigma < \int f_\sigma\, d\mu \quad \text{for all } \mu \in \mathrm{core}(\nu) \text{ with } \mu \neq \mu_\sigma,$$
and this proves that $\mu_\sigma$ is an exposed point, as desired.
"Only if". Suppose $\mu^*$ is an exposed point of $\mathrm{core}(\nu)$. We want to show that $\mu^*$ is a marginal worth charge, that is, that there exists a maximal chain $C^*$ in $\Sigma$ such that $\mu^*(A) = \nu(A)$ for each $A \in C^*$.

Let $\{\mu_i\}_{i=1}^m$ be the set of all exposed points of $\mathrm{core}(\nu)$, except $\mu^*$. Set $k_1 = \|\mu^*\| \vee (\max_{i=1,\ldots,m} \|\mu_i\|)$. Since $\mu^*$ is an exposed point, there exists $f: \Omega \to \mathbb{R}$ such that $\int f\, d\mu^* < \int f\, d\mu$ for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu^*$. Set $k_2 = \min_{i=1,\ldots,m} \big(\int f\, d\mu_i - \int f\, d\mu^*\big)$. Clearly, $k_2 > 0$. Given $0 < \varepsilon < k_2/2k_1$, there is an injective $g: \Omega \to \mathbb{R}$ such that $\|f - g\| < \varepsilon$. Hence, for each $i$ we have
$$\int g\, d\mu_i - \int g\, d\mu^* = \Big(\int g\, d\mu_i - \int f\, d\mu_i\Big) + \Big(\int f\, d\mu_i - \int f\, d\mu^*\Big) + \Big(\int f\, d\mu^* - \int g\, d\mu^*\Big) \geq -\varepsilon k_1 + k_2 - \varepsilon k_1 > 0.$$
We conclude that $\int g\, d\mu^* < \int g\, d\mu_i$ for each $i$, and so $\int g\, d\mu^* < \int g\, d\mu$ for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu^*$. Since $\nu$ is convex, by Theorem 4.7 it holds that $\int g\, d\nu = \min_{\mu \in \mathrm{core}(\nu)} \int g\, d\mu$, and so $\int g\, d\nu = \int g\, d\mu^* < \int g\, d\mu$ for all $\mu \in \mathrm{core}(\nu)$ with $\mu \neq \mu^*$. The equality $\int g\, d\nu = \int g\, d\mu^*$ implies that $\mu^*(g \geq t) = \nu(g \geq t)$ for all $t \in \mathbb{R}$. Since $g$ is injective, the chain of upper sets $\{g \geq t\}$ is maximal in $\Sigma$, and it is actually the desired maximal chain $C^*$. □

Denote by $M(\nu)$ the set of all marginal worth charges of a game $\nu$. By Theorem 4.17, we have $\mathrm{core}(\nu) = \mathrm{co}(M(\nu))$, and so all elements of the core can be represented as convex combinations of marginal worth charges. This result has recently been generalized to infinite games by Marinacci and Montrucchio (forthcoming).

Putting together Theorems 4.16 and 4.17, we have the following remarkable property of finite games.

Corollary 4.4. A finite game $\nu$ is convex if and only if $M(\nu) = \exp(\mathrm{core}(\nu))$.

Therefore, given a game, the knowledge of its $n!$ marginal worth charges makes it possible to determine both whether the game is convex and what the structure of its core is.

We close by observing that it is not by chance that in Corollary 4.4 we use the set of exposed points $\exp$ rather than the set of extreme points $\mathrm{ext}$. For a polytope these two sets coincide and they form the set of vertices. For general compact convex sets, even in finite dimensional spaces, this is no longer the case, and exposed points are only a subset of the set of extreme points. Inspection of the proof of Theorem 4.17 shows that what we have actually proved is that the marginal worth charges are the set of exposed points of the core. The fact that they then turn out to coincide with the set of extreme points is a consequence of properties of polytopes, which are immaterial for the proof.
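Theorems 4.16 and 4.17 are easy to explore computationally: each permutation $\sigma$ yields a marginal worth charge by taking successive increments of $\nu$ along the induced chain, and for a convex game every such charge lies in the core (and is one of its vertices). A minimal Python sketch of ours, with hypothetical helper names:

```python
from itertools import chain, combinations, permutations

def subsets(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def marginal_worth(v, order):
    """Marginal worth charge along the maximal chain induced by the permutation `order`:
    each omega gets the increment of v when it joins the growing coalition."""
    mu, prev, S = {}, 0.0, set()
    for w in order:
        S.add(w)
        mu[w] = v(frozenset(S)) - prev
        prev = v(frozenset(S))
    return mu

def in_core(mu, v, omega):
    """Efficiency plus all coalitional constraints, up to rounding."""
    if abs(sum(mu.values()) - v(omega)) > 1e-9:
        return False
    return all(sum(mu[w] for w in A) >= v(A) - 1e-9 for A in subsets(omega) if A)

omega = frozenset({1, 2, 3})
v = lambda A: len(A) ** 2          # |A|^2 is supermodular, hence a convex game
charges = [marginal_worth(v, p) for p in permutations(omega)]
assert all(in_core(mu, v, omega) for mu in charges)    # Theorem 4.16, "if" direction
```

By Theorem 4.17, the (distinct) charges so obtained are exactly the vertices of the core, so the full core is their convex hull.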
When extending the result to infinite convex games this observation is important: in the more general setting, where exposed and extreme points no longer necessarily coincide, the analog of the marginal worth charges will actually characterize the exposed points. We refer the interested reader to Marinacci and Montrucchio (forthcoming) for details.
4.7. Concluding remarks

1 In this chapter we only considered games defined on spaces having no topological structure. There is a large literature on suitably "regular" set functions defined on topological spaces, tracing back to Choquet (1953). We refer the interested reader to Huber and Strassen (1973) and Dellacherie and Meyer (1978). Epstein and Wang (1996) and Philippe et al. (1999) provide some decision-theoretic applications of capacities on topological domains.
2 In a series of papers, Gabriele Greco proposed an interesting notion of measurability on algebras. A noteworthy feature of his approach is that, unlike $B(\Sigma)$, the resulting class of measurable functions forms a vector space. Greco's approach is, therefore, a further way to bypass the lack of vector structure of $B(\Sigma)$ that we discussed in some detail after Theorem 4.6. In this chapter, we preferred to define the Choquet functional on the smaller domain $B_0(\Sigma)$ and then extend it to the vector space $B(\Sigma)$ using its Lipschitz continuity, following in this way a standard procedure in functional analysis. In any case, details on Greco's approach can be found in his papers (e.g. Greco, 1981, and Bassanezi and Greco, 1984) and in Denneberg (1994).
3 We did not consider here games and Choquet functionals defined on product algebras. For details on this topic we refer the interested reader to Ben Porath et al. (1997), Ghirardato (1997), and the references contained therein.
4 Throughout the chapter we only considered Choquet functionals defined on bounded functions. Results for the unbounded case can be found in Greco (1976, 1982), Bassanezi and Greco (1984), and Wakker (1993).
5 Sipos (1979a,b) introduced a different notion of integral for capacities. It coincides with the Choquet integral for positive functions, but the extension to general functions follows the standard procedure used to extend the Lebesgue integral from positive functions to general functions, based on the decomposition $f = f^+ - f^-$. The resulting integral is in general different from the Choquet integral, and it has turned out to be useful in some applications. We refer the interested reader to Sipos' original papers and to Denneberg (1994).
6 Theorem 4.6 and Corollary 4.2 make it possible to use convex analysis tools in studying convex games and their Choquet integrals. For example, Carlier and Dana (2003) and Marinacci and Montrucchio (forthcoming) use such tools to study the structure of cores of convex games and the differentiability and subdifferentiability properties of their Choquet integrals.
Acknowledgments

We thank Fabio Maccheroni for his very insightful suggestions, which greatly improved this chapter. The financial support of MIUR (Ministero dell'Istruzione, Università e Ricerca Scientifica) is gratefully acknowledged.
Notes 1 In the sequel subsets of are understood to be in even where not stated explicitly and they are referred to both as sets and as coalitions. 2 Maccheroni and Ruckle (2002) proved that (bv(), · ) is a dual Banach space. 3 The weak∗ -topology and its properties can be found in, for example, Aliprantis and Border (1999), Dunford and Schwartz (1958) and Rudin (1973). 4 The subgame νA is the restriction of ν on the induced algebra A = ∩ A given by νA (B) = ν(B) for all B ⊆ A. 5 A collection C in is chain if for each A and B in C it holds either A ⊆ B or B ⊆ A. Throughout we assume that Ø, ∈ C. 6 That is, f ≥ g if f (ω) ≥ g(ω) for each ω ∈ , and f = supω∈ |f (ω)|. 7 A functional is superlinear if it is positively homogeneous and superadditive. Recall that, by Proposition 4.11, Choquet functionals are always positively homogeneous. 8 That is, f ∈ B() provided there is a sequence {fn }n ⊆ B0 () such that limn f − fn = 0. Here we are viewing B0 () as a subset of the set of all bounded functions f : → R. 9 The equivalence between the convexity of ν and the concavity of νc established in Corollary 4.2 is also a curious terminological phenomenon, which may give rise to some confusion. A simple way to avoid any problem is to use the terminology “supermodular games.” 10 Notice that ba() is the subspace of f a() consiting of all bounded charges. 11 See, for example, Biswas et al. (1999) and the references therein contained. For characterizations of convexity and exactness related to stability, see Kikuta (1988) and Sharkey (1982). 12 Needless to say, the properties we will establish for finite games also hold for games defined on finite algebras of subsets of infinite spaces. 13 That is, V + ∩ (−V + ) = {0}. 14 See Aliprantis and Border (1999: 263–330) for a definition of these lattice operations, as well as for all notions on vector lattices needed in the sequel. 15 The l1 -norm · 1 of Rn is given by x1 = ni=1 |xi | for each x ∈ Rn . 
16 In the statement !ba denotes the restriction of ! to ba(Σ), as discussed right after Theorem 4.11. 17 B(2^Ω)∗ is the vector space of all linear functionals defined on the vector space B(2^Ω) of all functions defined on the enlarged space.
References Aliprantis, C. D. and K. C. Border (1999) Infinite dimensional analysis, Springer-Verlag, New York. Aumann, R. and L. Shapley (1974) Values of non-atomic games, Princeton University Press, Princeton. Bassanezi, R. C. and G. H. Greco (1984) Sull’additività dell’integrale, Rendiconti Seminario Matematico Università di Padova, 72, 249–275.
Introduction to the mathematics of ambiguity
105
Ben Porath, E., I. Gilboa and D. Schmeidler (1997) On the measurement of inequality under uncertainty, Journal of Economic Theory, 75, 194–204. (Reprinted as Chapter 22 in this volume.) Bhaskara Rao, K. P. S. and M. Bhaskara Rao (1983) Theory of charges, Academic Press, New York. Biswas, A. K., T. Parthasarathy, J. A. M. Potters, and M. Voorneveld (1999) Large cores and exactness, Games and Economic Behavior, 28, 1–12. Bondareva, O. (1963) Certain applications of the methods of linear programming to the theory of cooperative games (in Russian), Problemy Kibernetiki, 10, 119–139. Boros, E. and P. L. Hammer (2002) Pseudo-Boolean optimization, Discrete Applied Mathematics, 123, 155–225. Carlier, G. and R. A. Dana (2003) Core of convex distortions of a probability on a non-atomic space, Journal of Economic Theory, 113, 199–222. Chateauneuf, A. and J.-Y. Jaffray (1989) Some characterizations of lower probabilities and other monotone capacities through the use of Möbius inversion, Mathematical Social Sciences, 17, 263–283. Choquet, G. (1953) Theory of capacities, Annales de l’Institut Fourier, 5, 131–295. Delbaen, F. (1974) Convex games and extreme points, Journal of Mathematical Analysis and Applications, 45, 210–233. Dellacherie, C. (1971) Quelques commentaires sur les prolongements de capacités, Séminaire de Probabilités V, Lecture Notes in Mathematics 191, Springer-Verlag, New York. Dellacherie, C. and P.-A. Meyer (1978) Probabilities and potential, North-Holland, Amsterdam. Dempster, A. (1967) Upper and lower probabilities induced by a multivalued mapping, Annals of Mathematical Statistics, 38, 325–339. Dempster, A. (1968) A generalization of Bayesian inference, Journal of the Royal Statistical Society (B), 30, 205–247. Denneberg, D. (1994) Non-additive measure and integral, Kluwer, Dordrecht. Denneberg, D. (1997) Representation of the Choquet integral with the σ-additive Möbius transform, Fuzzy Sets and Systems, 92, 139–156. De Waegenaere, A. and P. 
Wakker (2001) Nonmonotonic Choquet integrals, Journal of Mathematical Economics, 36, 45–60. Dunford, N. and J. T. Schwartz (1958) Linear operators, part I: general theory, Wiley-Interscience, London. Einy, E. and B. Shitovitz (1996) Convex games and stable sets, Games and Economic Behavior, 16, 192–201. Epstein, L. G. and T. Wang (1996) “Beliefs about beliefs” without probabilities, Econometrica, 64, 1343–1373. Fan, K. (1956) On systems of linear inequalities, in Linear inequalities and related systems, Annals of Math. Studies, 38, 99–156. Ghirardato, P. (1997) On independence for non-additive measures, with a Fubini theorem, Journal of Economic Theory, 73, 261–291. Gilboa, I. and E. Lehrer (1991) Global games, International Journal of Game Theory, 20, 129–147. Gilboa, I. and D. Schmeidler (1994) Additive representations of non-additive measures and the Choquet integral, Annals of Operations Research, 52, 43–65. Gilboa, I. and D. Schmeidler (1995) Canonical representation of set functions, Mathematics of Operations Research, 20, 197–212.
106
Massimo Marinacci and Luigi Montrucchio
Grabisch, M., J.-L. Marichal and M. Roubens (2000) Equivalent representations of set functions, Mathematics of Operations Research, 25, 157–178. Greco, G. H. (1976) Integrale monotono, Rendiconti Seminario Matematico Università di Padova, 57, 149–166. Greco, G. H. (1981) Sur la mesurabilité d’une fonction numérique par rapport à une famille d’ensembles, Rendiconti Seminario Matematico Università di Padova, 65, 163–176. Greco, G. H. (1982) Sulla rappresentazione di funzionali mediante integrali, Rendiconti Seminario Matematico Università di Padova, 66, 21–42. Huber, P. J. and V. Strassen (1973) Minimax tests and the Neyman-Pearson lemma for capacities, Annals of Statistics, 1, 251–263. Ichiishi, T. (1981) Super-modularity: applications to convex games and to the greedy algorithm for LP, Journal of Economic Theory, 25, 283–286. Kannai, Y. (1969) Countably additive measures in cores of games, Journal of Mathematical Analysis and Applications, 27, 227–240. Kelley, J. L. (1959) Measures on Boolean algebras, Pacific Journal of Mathematics, 9, 1165–1177. Kikuta, K. (1988) A condition for a game to be convex, Mathematica Japonica, 33, 425–430. Kikuta, K. and L. S. Shapley (1986) Core stability in n-person games, mimeo. Maccheroni, F. and M. Marinacci (2000) A Heine-Borel theorem for ba(Σ), mimeo. Maccheroni, F. and W. H. Ruckle (2002) BV as a dual space, Rendiconti Seminario Matematico Università di Padova, 107, 101–109. Marinacci, M. (1996) Decomposition and representation of coalitional games, Mathematics of Operations Research, 21, 1000–1015. (Reprinted as Chapter 12 in this volume.) Marinacci, M. (1997) Finitely additive and epsilon Nash equilibria, International Journal of Game Theory, 26, 315–333. Marinacci, M. and L. Montrucchio (2003) Subcalculus for set functions and cores of TU games, Journal of Mathematical Economics, 39, 1–25. Marinacci, M. and L. 
Montrucchio (2004) A characterization of the core of convex games through Gateaux derivatives, Journal of Economic Theory, 116, 229–248. Moulin, H. (1995) Cooperative microeconomics, Princeton University Press, Princeton. Myerson, R. (1991) Game theory, Harvard University Press, Cambridge. Nguyen, H. T. (1978) On random sets and belief functions, Journal of Mathematical Analysis and Applications, 65, 531–542. Owen, G. (1972) Multilinear extensions of games, Management Science, 18, 64–79. Owen, G. (1995) Game theory, Academic Press, New York. Philippe, F., G. Debs and J.-Y. Jaffray (1999) Decision making with monotone lower probabilities of infinite order, Mathematics of Operations Research, 24, 767–784. Revuz, A. (1955) Fonctions croissantes et mesures sur les espaces topologiques ordonnés, Annales de l’Institut Fourier, 6, 187–269. Rota, G. C. (1964) Theory of Möbius functions, Z. Wahrsch. und Verw. Geb., 2, 340–368. Rudin, W. (1973) Functional analysis, McGraw-Hill, New York. Rudin, W. (1987) Real and complex analysis (3rd edition), McGraw-Hill, New York. Salinetti, G. and R. Wets (1986) On the convergence in distribution of measurable multifunctions (random sets), normal integrands, stochastic processes and stochastic infima, Mathematics of Operations Research, 11, 385–419. Schmeidler, D. (1968) On balanced games with infinitely many players, Research Program in Game Theory and Mathematical Economics, RM 28, The Hebrew University of Jerusalem.
Schmeidler, D. (1972) Cores of exact games, Journal of Mathematical Analysis and Applications, 40, 214–225. Schmeidler, D. (1986) Integral representation without additivity, Proceedings of the American Mathematical Society, 97, 255–261. Schmeidler, D. (1989) Subjective probability and expected utility without additivity, Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.) Schultz, M. H. (1969) L∞-multivariate approximation theory, SIAM Journal on Numerical Analysis, 6, 161–183. Shafer, G. (1976) A mathematical theory of evidence, Princeton University Press, Princeton. Shapley, L. S. (1953) A value for n-person games, in Contributions to the Theory of Games (H. Kuhn and A. W. Tucker, eds), Princeton University Press, Princeton. Shapley, L. S. (1967) On balanced sets and cores, Naval Research Logistics Quarterly, 14, 453–460. Shapley, L. S. (1971) Cores of convex games, International Journal of Game Theory, 1, 12–26. Sharkey, W. W. (1982) Cooperative games with large cores, International Journal of Game Theory, 11, 175–182. Sipos, J. (1979a) Integral with respect to a pre-measure, Mathematica Slovaca, 29, 141–155. Sipos, J. (1979b) Non linear integrals, Mathematica Slovaca, 29, 257–270. Wakker, P. (1993) Unbounded utility for Savage’s “foundations of statistics” and other models, Mathematics of Operations Research, 18, 446–485. Webster, R. (1994) Convexity, Oxford University Press, Oxford. Widder, D. V. (1941) The Laplace transform, Princeton University Press, Princeton. Wolfenson, M. and T.-L. Fine (1982) Bayes-like decision making with upper and lower probabilities, Journal of the American Statistical Association, 77, 80–88. Zhou, L. (1998) Integral representation of continuous comonotonically additive functionals, Transactions of the American Mathematical Society, 350, 1811–1822.
5
Subjective probability and expected utility without additivity David Schmeidler
5.1. Introduction Bayesian statistical techniques are applicable when the information and uncertainty with respect to the parameters or hypotheses in question can be expressed by a probability distribution. This prior probability is also the focus of most of the criticism against the Bayesian school. My starting point is to join the critics in attacking a certain aspect of the prior probability: The probability attached to an uncertain event does not reflect the heuristic amount of information that led to the assignment of that probability. For example, when the information on the occurrence of two events is symmetric they are assigned equal prior probabilities. If the events are complementary the probabilities will be 1/2, independently of whether the symmetric information is meager or abundant. There are two (unwritten?) rules for assigning prior probabilities to events in case of uncertainty. The first says that symmetric information with respect to the occurrence of events results in equal probabilities. The second says that if the space is partitioned into k symmetric (i.e. equiprobable) events, then the probability of each event is 1/k. I agree with the first rule and object to the second. In the example mentioned earlier, if each of the symmetric and complementary uncertain events is assigned the index 3/7, the number 1/7, 1/7 = 1 − (3/7 + 3/7), would indicate the decision maker’s confidence in the probability assessment. Thus, allowing nonadditive (not necessarily additive) probabilities enables transmission or recording of information that additive probabilities cannot represent. The idea of nonadditive probabilities is not new. Nonadditive (objective) probabilities have been in use in physics for a long time (Feynman, 1963). The nonadditivity describes the deviation of elementary particles from mechanical behavior toward wave-like behavior. 
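The arithmetic of the 3/7 example can be made concrete with a minimal sketch (illustrative only, not from the text): a nonadditive probability on two complementary events, each assigned the index 3/7, leaves a shortfall of 1/7 that records the decision maker's confidence in the assessment.

```python
# Illustrative sketch: a nonadditive probability on the two complementary
# events "black" (B) and "red" (R), each assigned the index 3/7.
v = {
    frozenset(): 0.0,            # v(empty set) = 0
    frozenset({"B"}): 3 / 7,
    frozenset({"R"}): 3 / 7,
    frozenset({"B", "R"}): 1.0,  # v(S) = 1
}

# An additive probability would force v({B}) + v({R}) = v(S) = 1.
# The shortfall records the confidence in the probability assessment.
gap = v[frozenset({"B", "R"})] - v[frozenset({"B"})] - v[frozenset({"R"})]
print(gap)  # 1 - 6/7 = 1/7, approximately 0.142857
```

Additivity is thus the special case gap = 0; any positive gap is information an additive prior cannot carry.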
Daniel Ellsberg (1961) presented his arguments against necessarily additive (subjective) probabilities with the help of the following “mind experiments”: There are two urns each containing one hundred balls. Each ball is either red or black. In urn I there are fifty balls of each color and
Schmeidler, D. (1989). “Subjective probability and expected utility without additivity,” Econometrica, 57, 571–587.
there is no additional information about urn II. One ball is chosen at random from each urn. There are four events, denoted IR, IB, IIR, IIB, where IR denotes the event that the ball chosen from urn I is red, etc. On each of the events a bet is offered: $100 if the event occurs and zero if it does not. According to Ellsberg most decision makers are indifferent between betting on IR and betting on IB and are similarly indifferent between bets on IIR and IIB. It may be that the majority are indifferent among all four bets. However, there is a nonnegligible proportion of decision makers who prefer every bet from urn I (IB or IR) to every bet from urn II (IIB or IIR). These decision makers cannot represent their beliefs with respect to the occurrence of uncertain events through an additive probability. The most compelling justification for representation of beliefs about uncertain events through additive prior probability has been suggested by Savage. Building on previous work by Ramsey, de Finetti, and von Neumann–Morgenstern (N–M), Savage suggested axioms for decision theory that lead to the criterion of maximization of expected utility. The expectation operation is carried out with respect to a prior probability derived uniquely from the decision maker’s preferences over acts. The axiom violated by the preference of the select minority in the example above is the “sure thing principle,” that is, Savage’s P2. In this chapter a simplified version of Savage’s model is used. The simplification consists of the introduction of objective or physical probabilities. An act in this model assigns to each state an objective lottery over deterministic outcomes. The uncertainty concerns which state will occur. Such a model containing objective and subjective probabilities has been suggested by Anscombe and Aumann (1963). They speak about roulette lotteries (objective) and horse lotteries (subjective). 
In the presentation here the version in Fishburn (1970) is used. The N–M utility theorem used here can also be found in Fishburn (1970). The concept of objective probability is considered here as a physical concept like acceleration, momentum, or temperature; to construct a lottery with given objective probabilities (a roulette lottery) is a technical problem conceptually not different from building a thermometer. When a person has constructed a “perfect” die, he assigns a probability of 1/6 to each outcome. This probability is objective in the same sense as the temperature measured by the thermometer. Another person can check and verify the calibration of the thermometer. Similarly, he can verify the perfection of the die by measuring its dimensions, scanning it to verify uniform density, etc. Rolling the die many times is not necessarily the exclusive test for verification of objective probability. On the other hand, the subjective or personal probability of an event is interpreted here as the number used in calculating the expectation (integral) of a random variable. This definition includes objective or physical probabilities as a special case where there is no doubt as to which number is to be used. This interpretation does not impose any restriction of additivity on probabilities, as long as it is possible to perform the expectation operation which is the subject of this work. Subjective probability is derived from a person’s preferences over acts. In the Anscombe–Aumann type model usually five assumptions are imposed on preferences to define unique additive subjective probability and N–M utility over
outcomes. The first three assumptions are essentially N–M’s—weak order, independence, and continuity—and the fourth assumption is equivalent to Savage’s P3, that is, state-independence of preferences. The additional assumption is nondegeneracy; without it uniqueness is not guaranteed. The example quoted earlier can be embedded in such a model. There are four states: (IB, IIB), (IB, IIR), (IR, IIB), (IR, IIR). The deterministic outcomes are sums of dollars. For concreteness of the example, assume that there are 101 deterministic outcomes: $0, $1, $2, . . . , $100. An act assigns to each state a probability distribution over the outcomes. The bet “$100 if IIB” is an act which assigns the (degenerate objective) lottery of receiving “$100 with probability one” to each state in the event IIB and “zero dollars with probability one” to each state in the event IIR. The bet on IIR is similarly interpreted. Indifference between these two acts (bets), the independence condition, continuity, and weak order imply indifference between either of them and the constant act which assigns to each state the objective lottery of receiving $100 with probability 1/2 and receiving zero dollars with probability 1/2. The same considerations imply that the constant act above is indifferent to either of the two acts (bets): “$100 if IB” and “$100 if IR”. Hence the indifference between IB and IR and the indifference between IIB and IIR in Ellsberg’s example, together with the von N–M conditions, imply indifference between all four bets. The nonnegligible minority of Ellsberg’s example does not share this indifference: they are indifferent between the constant act (stated earlier) and each bet from urn I, and prefer the constant act to each bet from urn II. 
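The mixture step in this argument can be sketched numerically (state and function names are illustrative, not from the text): acts map states to objective lotteries, convex combinations are taken statewise, and the half-and-half mixture of the bets on IIB and IIR is the constant 50/50 lottery.

```python
# Sketch of the embedding: four states, acts as maps from states to
# objective lotteries written as dicts {outcome: probability}.
states = [("IB", "IIB"), ("IB", "IIR"), ("IR", "IIB"), ("IR", "IIR")]

def bet_urn2(color):
    # The act "$100 if II<color>": the degenerate lottery on $100 in states
    # where the urn-II ball has that color, and on $0 elsewhere.
    return {s: ({100: 1.0} if s[1] == "II" + color else {0: 1.0}) for s in states}

def mix(f, g, alpha):
    # Convex combinations of acts are performed pointwise (statewise).
    return {
        s: {x: alpha * f[s].get(x, 0.0) + (1 - alpha) * g[s].get(x, 0.0)
            for x in set(f[s]) | set(g[s])}
        for s in states
    }

half_half = mix(bet_urn2("B"), bet_urn2("R"), 0.5)
# In every state the mixture is the 50/50 objective lottery over $100 and $0.
print(all(half_half[s] == {100: 0.5, 0: 0.5} for s in states))  # True
```

This is exactly why independence forces indifference among all four bets, and why the Ellsberg minority must violate it.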
Our first objective consists of restatement, or more specifically of weakening, of the independence condition such that the new assumption together with the other three assumptions can be consistently imposed on the preference relation over acts. In particular the special preferences of the example become admissible. It is obvious that the example’s preferences between bets (acts) do not admit additive subjective probability. Do they define in some consistent way a unique nonadditive subjective probability, and if so, is there a way to define the expected utility maximization criterion for the nonadditive case? An affirmative answer to this problem is presented in the third section. Thus the new model rationalizes nonadditive (personal) probabilities and admits the computation of expected utility with respect to these probabilities. It formally extends the additive model and it makes the expected utility criterion applicable to cases where additive expected utility is not applicable. Before turning to a precise and detailed presentation of the model, another heuristic observation is made. The nomenclature used in economics distinguishes between risk and uncertainty. Decisions in a risk situation are precisely the choices among roulette lotteries. The probabilities are objectively given; they are part of the data. For this case the economic theory went beyond N–M utility and defined concepts of risk aversion, risk premium, and certainty equivalence. Translating these concepts to the case of decisions under uncertainty we can speak about uncertainty aversion, uncertainty premium, and risk equivalence. Returning to the example, suppose that betting $100 on IIR is indifferent to betting $100 on a risky event with an (objective) probability of 3/7. Thus, the subjective probability
of an event is its risk equivalent (P(IIR) = 3/7). In this example the number 1/7 computed earlier expresses the uncertainty premium in terms of risk. Note that nonadditive probability may not exhibit consistently either uncertainty aversion or uncertainty attraction. This is similar to the case of decisions in risk situations where N–M utility (of money) may be neither concave nor convex.
5.2. Axioms and background Let X be a set and Y be the set of distributions over X with finite supports:

Y = {y : X → [0, 1] | y(x) ≠ 0 for finitely many x’s in X and Σ_{x∈X} y(x) = 1}.
For notational simplicity we identify X with the subset {y ∈ Y | y(x) = 1 for some x in X} of Y. Let S be a set and let Σ be an algebra of subsets of S. Both sets, X and S, are assumed to be nonempty. Denote by L0 the set of all Σ-measurable finite valued functions from S to Y and denote by Lc the constant functions in L0. Let L be a convex subset of Y^S which includes Lc. Note that Y can be considered a subset of some linear space, and Y^S, in turn, can then be considered as a subspace of the linear space of all functions from S to the first linear space. Whereas it is obvious how to perform convex combinations in Y, it should be stressed that convex combinations in Y^S are performed pointwise. That is, for f and g in Y^S and α in [0, 1], αf + (1 − α)g = h where h(s) = αf(s) + (1 − α)g(s) on S. In the neo-Bayesian nomenclature, elements of X are (deterministic) outcomes, elements of Y are random outcomes or (roulette) lotteries, and elements of L are acts (or horse lotteries). Elements of S are states (of nature) and elements of Σ are events. The primitive of a neo-Bayesian decision model is a binary (preference) relation over L, to be denoted by ≿. Next are stated several properties (axioms) of the preference relation, which will be used in the sequel. (i) Weak order. (a) For all f and g in L: f ≿ g or g ≿ f. (b) For all f, g, and h in L: If f ≿ g and g ≿ h, then f ≿ h. The relation ≿ on L induces a relation, also denoted by ≿, on Y: y ≿ z iff y^S ≿ z^S, where y^S denotes the constant function y on S (i.e. {y}^S). As usual, ≻ and ∼ denote the asymmetric and symmetric parts, respectively, of ≿. Definition 5.1. Two acts f and g in Y^S are said to be comonotonic if for no s and t in S, f(s) ≻ f(t) and g(t) ≻ g(s). A constant act f, that is, f = y^S for some y in Y, and any act g are comonotonic. An act f whose statewise lotteries {f(s)} are mutually indifferent, that is, f(s) ∼ y for all s in S, and any act g are comonotonic. If X is a set of numbers and
preferences respect the usual order on numbers, then any two X-valued functions f and g are comonotonic iff (f(s) − f(t))(g(s) − g(t)) ≥ 0 for all s and t in S. Clearly, IIR and IIB of the Introduction are not comonotonic. (Comonotonicity stands for common monotonicity.) Next our new axiom for neo-Bayesian decision theory is introduced. (ii) Comonotonic independence. For all pairwise comonotonic acts f, g, and h in L and for all α in ]0, 1[: f ≻ g implies αf + (1 − α)h ≻ αg + (1 − α)h. (]0, 1[ is the open unit interval.) Elaboration of this condition is delayed until after condition (vii). Comonotonic independence is clearly a less restrictive condition than the independence condition stated below. (iii) Independence. For all f, g, and h in L and for all α in ]0, 1[: f ≻ g implies αf + (1 − α)h ≻ αg + (1 − α)h. (iv) Continuity. For all f, g, and h in L: If f ≻ g and g ≻ h, then there are α and β in ]0, 1[ such that αf + (1 − α)h ≻ g and g ≻ βf + (1 − β)h. Next, two versions of state-independence are introduced. The intuitive meaning of each of these conditions is that the preferences over random outcomes do not depend on the state that occurred. The first version is the one to be used here. The second version is stated for comparison since it is the common one in the literature. (v) Monotonicity. For all f and g in L: If f(s) ≿ g(s) on S, then f ≿ g. (vi) Strict monotonicity. For all f and g in L, y and z in Y, and E in Σ: If f ≻ g, f(s) = y on E and g(s) = z on E, and f(s) = g(s) on E^c, then y ≻ z. Observation.
If L = L0 , then (vi) and (i) imply (v).
Proof. Let f and g be finite step functions such that f(s) ≿ g(s) on S. There is a finite chain f = h0, h1, . . . , hk = g where each pair of consecutive functions hi−1, hi are constant on the set on which they differ. For each such pair, (vi) and (i) imply hi−1 ≿ hi. Transitivity, (i)(b), of ≿ concludes the proof. Clearly, (i) and (v) imply (vi). For the sake of completeness we list as axiom: (vii) Nondegeneracy. Not for all f and g in L, f ≿ g. Out of the seven axioms listed here, the completeness of the preferences, (i)(a), seems to be the most restrictive and most imposing assumption of the theory. One can view the weakening of the completeness assumption as a main contribution of all other axioms. Imagine a decision maker who initially has a partial preference relation over acts. After additional introspection she accepts the validity of several of the axioms. She can then extend her preferences using these axioms.
For example, if she ranks f ≻ g and g ≻ h, and if she accepts transitivity, then she concludes that f ≻ h. From this point of view, the independence axiom, (iii), seems the most powerful axiom for extending partial preferences. Given f ≻ g and independence we get, for all h in L and α in ]0, 1[: f′ ≡ αf + (1 − α)h ≻ αg + (1 − α)h ≡ g′. However, after additional retrospection this implication may be too powerful to be acceptable. For example, consider the case where outcomes are real numbers and S = [0, 2π]. Let f and g be two acts defined by f(s) = sin(s) and g(s) = sin(s + π/2) = cos(s). The preference f ≻ g may be induced by the rough evaluation that the event [π/3, 4π/3] is more probable than its complement. Define the act h by h(s) = sin(77s). In this case the structure of the acts f′ = ½f + ½h and g′ = ½g + ½h is far from transparent and the automatic implication of independence, f′ ≻ g′, may seem doubtful to the decision maker. More generally: the ranking f ≻ g implies some rough estimation by the decision maker of the probabilities of events (in the algebra Σ) defined by the acts f and g. If mixture with an arbitrary act h is allowed, the resulting acts f′ and g′ may define a much finer (larger) algebra (especially when the algebra defined by h is qualitatively independent of the algebras of f and g). Careful retrospection and comparison of the acts f′ and g′ may lead the decision maker to the ranking g′ ≻ f′ (as in the case of the Ellsberg paradox), contradictory to the implication of the independence axiom. Qualifying the comparisons and the application of independence to comonotonic acts rules out the possibility of contradiction. If f, g, and h are pairwise comonotonic, then the comparison of f′ to g′ is not very different from the comparison of f to g. Hence the decision maker can accept the validity of the implication f ≻ g ⇐⇒ f′ ≻ g′ without fear of running into a contradiction. 
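For real-valued acts on a finite state space, the numeric comonotonicity criterion above, (f(s) − f(t))(g(s) − g(t)) ≥ 0 for all s and t, can be checked directly. A minimal sketch (acts and function name are illustrative):

```python
from itertools import combinations

def comonotonic(f, g):
    # Real-valued acts (dicts state -> number) are comonotonic iff
    # (f(s) - f(t)) * (g(s) - g(t)) >= 0 for all pairs of states s, t.
    return all((f[s] - f[t]) * (g[s] - g[t]) >= 0
               for s, t in combinations(f, 2))

up     = {"s1": 0, "s2": 1, "s3": 2}  # increasing across states
up_too = {"s1": 5, "s2": 5, "s3": 9}  # weakly increasing: comonotonic with `up`
down   = {"s1": 2, "s2": 1, "s3": 0}  # reversed, like the bets on IIR and IIB
print(comonotonic(up, up_too), comonotonic(up, down))  # True False
```

Note that a constant act passes this test against any act, matching the remarks after Definition 5.1.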
Note that accepting the validity of comonotonic independence, (ii), means accepting the validity of the implication mentioned earlier without knowing the specific acts f, g, h, f′, g′, but knowing that all five are pairwise comonotonic. Before presenting the von Neumann–Morgenstern Theorem we point out that stating the axioms (i) weak order, (iii) independence, and (iv) continuity does not require that the preference relation be defined on a set L containing Lc. Only the convexity of L is required for (ii) and (iii). von Neumann–Morgenstern Theorem. Let M be a convex subset of some linear space, with a binary relation ≿ defined on it. A necessary and sufficient condition for the relation to satisfy (i) weak order, (iii) independence, and (iv) continuity is the existence of an affine real-valued function, say w, on M such that for all f and g in M: f ≿ g iff w(f) ≥ w(g). (Affinity of w means that w(αf + (1 − α)g) = αw(f) + (1 − α)w(g) for 0 < α < 1.) Furthermore, an affine real-valued function w′ on M can replace w in the above statement iff there exist a positive number α and a real number β such that w′(f) = αw(f) + β on M. As mentioned earlier, for a proof of this theorem and the statement and proof of the Anscombe–Aumann Theorem stated later, the reader is referred to Fishburn (1970).
Implication. Suppose that a binary relation ≿ on some convex subset L of Y^S with Lc ⊂ L satisfies (i) weak order, (ii) comonotonic independence, and (iv) continuity. Suppose also that there is a convex subset M of L with Lc ⊂ M such that any two acts in M are comonotonic. Then by the von Neumann–Morgenstern Theorem there is an affine function on M, to be denoted by J, which represents the binary relation on M. That is, for all f and g in M: f ≿ g iff J(f) ≥ J(g). Clearly, if M = Lc ≡ {y^S | y ∈ Y}, any two acts in M are comonotonic. Hence, if a function u is defined on Y by u(y) = J(y^S), then u is affine and represents the induced preferences on Y. The affinity of u implies u(y) = Σ_{x∈X} y(x)u(x). When subjective probability enters into the calculation of expected utility of an act, an integral with respect to a finitely additive set function has to be defined. Denote by P a finitely additive probability measure on Σ and let a be a real-valued Σ-measurable function on S. For the special case where a is a finite step function, a can be uniquely represented by Σ_{i=1}^k α_i E_i^*, where α_1 > α_2 > · · · > α_k are the values that a attains and E_i^* is the indicator function on S of E_i ≡ {s ∈ S | a(s) = α_i} for i = 1, . . . , k. Then

∫_S a dP = Σ_{i=1}^k P(E_i) α_i.
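For a finite state space, the step-function integral with respect to an additive P can be sketched in a few lines (a toy example; names and values are illustrative): each E_i collects the states where a equals α_i.

```python
def integral(a, P):
    # a: dict state -> value; P: dict state -> probability (additive case).
    # Step-function integral: sum of alpha_i * P(E_i) over the distinct
    # values alpha_1 > ... > alpha_k of a.
    values = sorted(set(a.values()), reverse=True)
    return sum(ai * sum(P[s] for s in a if a[s] == ai) for ai in values)

a = {"s1": 10.0, "s2": 10.0, "s3": 4.0}   # E_1 = {s1, s2}, E_2 = {s3}
P = {"s1": 0.25, "s2": 0.25, "s3": 0.5}
print(integral(a, P))  # 10 * 0.5 + 4 * 0.5 = 7.0
```

In the additive case the grouping by values is of course redundant; it is spelled out here because the same grouping is what survives in the nonadditive definition of Section 5.3.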
The more general case where a is not finitely valued is treated as a special case of nonadditive probability. Anscombe–Aumann Theorem. Suppose that a preference relation ≿ on L = L0 satisfies (i) weak order, (iii) independence, (iv) continuity, (vi) strict monotonicity, and (vii) nondegeneracy. Then there exist a unique finitely additive probability measure P on Σ and an affine real-valued function u on Y such that for all f and g in L0:

f ≿ g iff ∫_S u(f(·)) dP ≥ ∫_S u(g(·)) dP.
Furthermore, if there exist P and u as above, then the preference relation they induce on L0 satisfies conditions (i), (iii), (iv), (vi), and (vii). Finally, the function u is unique up to a positive linear transformation. There are three apparent differences between the statement of the main result in the next section and the Anscombe–Aumann Theorem stated earlier: (i) Instead of strict monotonicity, monotonicity is used. It has been shown in the Observation that this does not make a difference. However, for the forthcoming extension, monotonicity is the natural condition. (ii) Independence is replaced with comonotonic independence. (iii) The finitely additive probability measure P is replaced with a nonadditive probability v.
5.3. Theorem A real-valued set function v on Σ is termed a nonadditive probability if it satisfies the normalization conditions v(∅) = 0 and v(S) = 1, and monotonicity, that is, for all E and G in Σ: E ⊂ G implies v(E) ≤ v(G). We now introduce the definition of ∫_S a dv for v a nonadditive probability and a = Σ_{i=1}^k α_i E_i^* a finite step function with α_1 > α_2 > · · · > α_k and (E_i)_{i=1}^k a partition of S. Let α_{k+1} = 0 and define

∫_S a dv = Σ_{i=1}^k (α_i − α_{i+1}) v(∪_{j=1}^i E_j).
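A minimal finite-state sketch of this definition (function names and the linear utility are illustrative assumptions, not from the text; the capacity is given as a function on frozensets of states):

```python
def choquet(a, v):
    # a: dict state -> value. Sort the distinct values alpha_1 > ... > alpha_k,
    # set alpha_{k+1} = 0, and sum (alpha_i - alpha_{i+1}) * v(E_1 u ... u E_i).
    alphas = sorted(set(a.values()), reverse=True) + [0.0]
    total, cum = 0.0, frozenset()
    for i in range(len(alphas) - 1):
        cum |= frozenset(s for s in a if a[s] == alphas[i])  # E_1 u ... u E_i
        total += (alphas[i] - alphas[i + 1]) * v(cum)
    return total

# The urn-II bet "$100 if IIB" (utility taken linear for the illustration),
# with v(IIB) = v(IIR) = 3/7 as in the Introduction.
v = lambda E: {frozenset(): 0.0, frozenset({"IIB"}): 3 / 7,
               frozenset({"IIR"}): 3 / 7, frozenset({"IIB", "IIR"}): 1.0}[E]
bet = {"IIB": 100.0, "IIR": 0.0}
print(choquet(bet, v))  # 100 * 3/7, approximately 42.857

# With an additive P the same routine returns the usual expectation.
P = lambda E: sum({"IIB": 0.5, "IIR": 0.5}[s] for s in E)
print(choquet(bet, P))  # 50.0
```

The second call illustrates the coincidence with the additive integral noted in the text.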
For the special case of v additive, the definition stated earlier coincides with the usual one mentioned in the previous section. Theorem 5.1. Suppose that the preference relation ≿ on L = L0 satisfies (i) weak order, (ii) comonotonic independence, (iv) continuity, (v) monotonicity, and (vii) nondegeneracy. Then there exist a unique nonadditive probability v on Σ and an affine real-valued function u on Y such that for all f and g in L0:

f ≿ g iff ∫_S u(f(·)) dv ≥ ∫_S u(g(·)) dv.
Conversely, if there exist v and u as above, u nonconstant, then the preference relation they induce on L0 satisfies (i), (ii), (iv), (v), and (vii). Finally, the function u is unique up to positive linear transformations. Proof. From the Implication of the von N–M Theorem we get a N–M utility u representing the preference relation induced on Y. By nondegeneracy there are f^* and f_* in L0 with f^* ≻ f_*. Monotonicity, (v), implies the existence of a state s in S such that f^*(s) ≡ y^* ≻ f_*(s) ≡ y_*. Since u is given up to a positive linear transformation, suppose from now on that u(y^*) = 1 and u(y_*) = −1. Denote K = u(Y). Hence K is a convex subset of the real line including the interval [−1, 1]. For an arbitrary f in L0 denote

M_f = {αf + (1 − α)y^S | y ∈ Y and α ∈ [0, 1]}.

Thus M_f is the convex hull of the union of {f} and Lc. It is easy to see that any two acts in M_f are comonotonic. Hence, there is an affine real-valued function on M_f which represents the preference relation restricted to M_f. After rescaling, this function, J_f, satisfies J_f(y^{*S}) = 1 and J_f(y_*^S) = −1. Clearly, if h ∈ M_f ∩ M_g, then J_f(h) = J_g(h). So, defining J(f) = J_f(f) for f in L0, we get a real-valued function on L0 which represents the preferences on L0 and satisfies J(y^S) = u(y) for all y in Y. Let B0(K) denote the Σ-measurable, K-valued finite step functions on S. Let U : L0 → B0(K) be defined by U(f)(s) = u(f(s)) for s in S
and f in L0 . The function U is onto, and if U (f ) = U (g), then by monotonicity f ∼ g, which in turn implies J (f ) = J (g). We now define a real-valued function I on B0 (K). Given a in B0 (K), let f in L0 be such that U (f ) = a. Then define I (a) = J (f ). I is well defined since as mentioned earlier J is constant on U −1 (a):
L0 --U--> B0(K) --I--> R,   with J = I ∘ U.
We now have a real-valued function I on B0(K) which satisfies the following three conditions: (i) For all α in K: I(αS^*) = α. (ii) For all pairwise comonotonic functions a, b, and c in B0(K) and α in [0, 1]: if I(a) > I(b) then I(αa + (1 − α)c) > I(αb + (1 − α)c). (iii) If a(s) ≥ b(s) on S for a and b in B0(K), then I(a) ≥ I(b). To see that (i) is satisfied, let y in Y be such that u(y) = α. Then J(y^S) = α and U(y^S) = αS^*. Hence I(αS^*) = α. Similarly, (ii) is satisfied because comonotonicity is preserved by U and J represents ⪰, which satisfies comonotonic independence. Finally, (iii) holds because U preserves monotonicity. The corollary of Section 3 and the Remark following it in Schmeidler (1986) say that if a real-valued function I on B0(K) satisfies conditions (i), (ii), and (iii), then the nonadditive probability v on Σ defined by v(E) = I(E^*) satisfies for all a and b in B0(K):
I(a) ≥ I(b)  iff  ∫_S a dv ≥ ∫_S b dv.    (5.1)
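The step-function integral in (5.1) can be checked numerically. The following sketch (all names are ours, not the chapter's) computes the Choquet integral of a finite step function by taking states in decreasing order of outcome and weighting each outcome by the capacity increment of the corresponding upper level set; when v is additive it reduces, as noted for the special case at the start of this section, to the ordinary expectation.

```python
from itertools import chain, combinations

def subsets(states):
    """All subsets of `states`, as frozensets."""
    s = list(states)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def choquet(a, v):
    """Choquet integral of the step function a (dict: state -> real) with
    respect to the capacity v (dict: frozenset of states -> real): sum of
    a(s) weighted by the capacity increments of upper level sets, visiting
    states in decreasing order of a."""
    total, upper = 0.0, frozenset()
    for s in sorted(a, key=lambda s: -a[s]):
        nxt = upper | {s}
        total += a[s] * (v[nxt] - v[upper])
        upper = nxt
    return total

# an additive capacity built from a probability p: the integral is E_p[a]
states = ['s1', 's2', 's3']
p = {'s1': 0.5, 's2': 0.3, 's3': 0.2}
v_add = {A: sum(p[s] for s in A) for A in subsets(states)}
a = {'s1': -1.0, 's2': 2.0, 's3': 0.5}
assert abs(choquet(a, v_add) - sum(a[s] * p[s] for s in states)) < 1e-9
```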
Hence, for all f and g in L0:

f ⪰ g  iff  ∫_S U(f) dv ≥ ∫_S U(g) dv,
and the proof of the main part of the theorem is completed. To prove the opposite direction, note first that in Schmeidler (1986) it is shown and referenced that if I on B0(K) is defined by (5.1), then it satisfies conditions (i), (ii), and (iii). (Only (ii) requires some proof.) Second, the assumptions of the opposite direction say that J is defined as a composition of U and I in the diagram. Hence the preference relation on L0 induced by J satisfies all the required conditions. (U preserves monotonicity and comonotonicity, and ∫_S a dv is a (sup) norm continuous function of a.) Finally, the uniqueness properties of the expected utility representation will be proved. Suppose that there exist an affine real-valued function u' on Y and a
nonadditive probability v' on Σ such that for all f and g in L0:

f ⪰ g  iff  ∫_S u'(f(s)) dv' ≥ ∫_S u'(g(s)) dv'.    (5.2)
Note that monotonicity of v' can be derived instead of assumed. When considering (5.2) for all f and g in Lc we immediately obtain, from the uniqueness part of the von N–M Theorem, that u' is a positive linear transformation of u. On the other hand, it is obvious that the inequality in (5.2) is preserved under positive linear transformations of the utility. Hence, in order to prove that v' = v we may assume without loss of generality that u' = u. For an arbitrary E in Σ let f in L0 be such that U(f) = E^*. (E.g. f(s) = y^* on E and f(s) = y^*/2 + y_*/2 on E^C. Then ∫_S U(f) dv = v(E) and ∫_S U(f) dv' = v'(E).) Let y in Y be such that u(y) = v(E). (E.g. y = v(E)y^* + (1 − v(E))(y^*/2 + y_*/2).) Then f ∼ y^S, which in turn implies u(y) = u'(y) = ∫_S u'(y^S) dv' = v'(E). The last equality is implied by (5.2).

In order to extend the Theorem to more general acts, we have to specify precisely the set of acts L on which the extension holds, and we have to extend correspondingly the definition of the integral with respect to nonadditive probability. We start with the latter. Denote by B the set of real-valued, bounded, Σ-measurable functions on S. Given a in B and a nonadditive probability v on Σ we define

∫_S a dv = ∫_{−∞}^{0} (v(a ≥ α) − 1) dα + ∫_{0}^{∞} v(a ≥ α) dα,

where (a ≥ α) denotes the event {s ∈ S | a(s) ≥ α}.
Each of the integrands mentioned earlier is monotonic, bounded, and identically zero where |α| > λ for some number λ. This definition of integration for nonnegative functions in B has been suggested by Choquet (1955). A more detailed exposition appears in Schmeidler (1986). It should be mentioned here that this definition coincides, of course, with the one at the beginning of this section when a obtains finitely many values. For the next definition, existence of a weak order ⪰ over Lc is presupposed. An act f: S → Y is said to be ⪰-measurable if for all y in Y the sets {s | f(s) ≻ y} and {s | y ≻ f(s)} belong to Σ. It is said to be bounded if there are y and z in Y such that y ⪰ f(s) ⪰ z on S. The set of all ⪰-measurable bounded acts in Y^S is denoted by L(⪰). Clearly, it contains L0.

Corollary 5.1. (a) Suppose that a preference relation ⪰ over L0 satisfies (i) weak order, (ii) comonotonic independence, (iv) continuity, and (v) monotonicity. Then it has a unique extension to all of L(⪰) which satisfies the same conditions (over L(⪰)). (b) If the extended relation, also to be denoted by ⪰, is nondegenerate, then there exist a unique nonadditive probability v on Σ and an affine real-valued function u (unique up to positive linear transformations) such that for all f and g in L(⪰): f ⪰ g iff ∫_S u(f(·)) dv ≥ ∫_S u(g(·)) dv.
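For finite step functions the level-set formula above can be checked against the decreasing-rearrangement formula from the beginning of the section. The sketch below (function names and the particular capacity are our illustrative assumptions) approximates the two improper integrals by a midpoint Riemann sum.

```python
def choquet_steps(a, v_of):
    """Choquet integral of a finite step function a (dict: state -> real)
    for a capacity given as a set function v_of(frozenset) -> real."""
    total, upper = 0.0, frozenset()
    for s in sorted(a, key=lambda s: -a[s]):
        nxt = upper | {s}
        total += a[s] * (v_of(nxt) - v_of(upper))
        upper = nxt
    return total

def choquet_levels(a, v_of, lo=-10.0, hi=10.0, n=100000):
    """The Choquet (1955) level-set formula:
    int_{-inf}^{0} (v(a >= t) - 1) dt + int_{0}^{inf} v(a >= t) dt,
    approximated by a midpoint Riemann sum on [lo, hi]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        t = lo + (i + 0.5) * h
        w = v_of(frozenset(s for s in a if a[s] >= t))
        total += (w - 1.0 if t < 0 else w) * h
    return total

def v_of(A):
    """A convex distortion of the uniform measure on 3 states: (|A|/3)^2."""
    return (len(A) / 3) ** 2

a = {'s1': -2.0, 's2': 1.0, 's3': 3.0}
assert abs(choquet_steps(a, v_of) - (-4.0 / 9.0)) < 1e-9
assert abs(choquet_levels(a, v_of) - choquet_steps(a, v_of)) < 1e-2
```

The two values agree up to the discretization error of the Riemann sum, since the integrands are piecewise constant with finitely many jumps.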
Proof. The case of degeneracy is obvious, so assume nondegenerate preferences. Consider the following diagram:

L(⪰) --U'--> B(K) --I'--> R
  ↑ i              ↑ i
 L0  --U--> B0(K) --I--> R

with J = I ∘ U and J' = I' ∘ U'.
The inner triangle is that of the Proof of the Theorem. B(K) is the set of K-valued, Σ-measurable, bounded functions on S, and i denotes the inclusion maps. U' is the natural extension of U and is also onto. Because B0(K) is (sup) norm dense in B(K) and I satisfies condition (iii), I' is the unique extension of I that satisfies on B(K) the three conditions that I satisfies on B0(K). The functional J', defined on L(⪰) by J'(f) = I'(U'(f)), extends J. Hence, the relation on L(⪰) defined by f ⪰ g iff J'(f) ≥ J'(g) extends the relation ⪰ on L0, and satisfies the desired properties. By the corollary of Section 3 in Schmeidler (1986) there exists a nonadditive probability v on Σ such that for all f and g in L(⪰): J'(f) ≥ J'(g) iff ∫_S U'(f) dv ≥ ∫_S U'(g) dv. Hence, the expected utility representation of the preference relation has been shown. To complete the proof of (b), uniqueness of v and uniqueness up to a positive linear transformation of u have to be established. However, this follows from the corresponding part of the Theorem. The uniqueness properties also imply that the extension of ⪰ from L0 to L(⪰) is unique.

Remark 5.1. Instead of first stating the Theorem for L0 and then extending it to L(⪰), one can state directly the extended Theorem. More precisely, a preference relation ⪰ on L, L0 ⊂ L ⊂ Y^S, is defined such that in addition to the conditions (i), (ii), (iv), and (vii) it satisfies L = L(⪰). It can then be represented by expected utility with respect to nonadditive probability. However, the first part of the Corollary shows that in this case the preference relation on L(⪰) is overspecified: the preferences over L0 dictate those over L(⪰).

Remark 5.2. If Σ does not contain all subsets of S, and #X ≥ 3, then L(⪰) contains finite step functions that do not belong to L0. Let y and z in Y be such that y ∼ z but y ≠ z, and let E ⊂ S with E ∉ Σ. Define f(s) = y on E and f(s) = z on E^C. Clearly f ∉ L0, yet f ∈ L(⪰). The condition #X ≥ 3 is required to guarantee the existence of y and z as mentioned earlier.
Remark 5.3. It is an elementary exercise to show that under the conditions of the Theorem, v is additive iff ⪰ satisfies (iii) independence (instead of, or in addition to, (ii) comonotonic independence). Also, an extension of an independent relation,
as in Corollary (a), is independent. Hence our results formally extend the additive theory.

We now introduce formally the concept of uncertainty aversion alluded to in the Introduction. A binary relation ⪰ on L is said to reveal uncertainty aversion if for any three acts f, g, and h in L and any α in [0, 1]: if f ⪰ h and g ⪰ h, then αf + (1 − α)g ⪰ h. Equivalently we may state: if f ∼ g, then αf + (1 − α)g ⪰ g. For the definition of strict uncertainty aversion the conclusion should be a strict preference ≻. However, some restrictions then have to be imposed on f and g. One such obvious restriction is that f and g are not comonotonic. We will return to this question in a subsequent Remark. Intuitively, uncertainty aversion means that "smoothing" or averaging utility distributions makes the decision maker better off. Another way to say it is that substituting objective mixing for subjective mixing makes the decision maker better off. The definition of uncertainty aversion may become more transparent when its full mathematical characterization is presented.

Proposition 5.1. Suppose that ⪰ on L = L(⪰) is the extension of ⪰ on L0 according to the Corollary. Let v be the derived nonadditive subjective probability and let I (the I' of the Corollary) be the functional on B, I(a) = ∫_S a dv. Then the following conditions are equivalent:

(i) ⪰ reveals uncertainty aversion.
(ii) For all a and b in B: I(a + b) ≥ I(a) + I(b).
(iii) For all a and b in B and for all α in [0, 1]: I(αa + (1 − α)b) ≥ αI(a) + (1 − α)I(b).
(iv) For all a and b in B and for all α in [0, 1]: I(αa + (1 − α)b) ≥ min{I(a), I(b)}.
(v) For all α in R the sets {a ∈ B | I(a) ≥ α} are convex.
(vi) There exists an ᾱ in R such that the set {a ∈ B | I(a) ≥ ᾱ} is convex.
(vii) For all a and b in B and for all α in [0, 1]: if I(a) = I(b), then I(αa + (1 − α)b) ≥ I(a).
(viii) For all a and b in B: if I(a) = I(b), then I(a + b) ≥ I(a) + I(b).
(ix) v is convex. That is, for all E and F in Σ: v(E) + v(F) ≤ v(E ∩ F) + v(E ∪ F).
(x) For all a in B: I(a) = min{∫_S a dp | p ∈ core(v)}, where core(v) = {p: Σ → R | p is additive, p(S) = v(S), and for all E in Σ, p(E) ≥ v(E)}.
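Condition (x) can be illustrated computationally. The sketch below (names and the particular capacity are our assumptions) relies on a known fact due to Shapley: for a convex capacity, the extreme points of the core are the "marginal" additive measures obtained from orderings of the states, so minimizing ∫ a dp over those measures recovers the Choquet integral.

```python
from itertools import chain, combinations, permutations

def subsets(states):
    s = list(states)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def choquet(a, v):
    """Choquet integral of a step function a against a capacity dict v."""
    total, upper = 0.0, frozenset()
    for s in sorted(a, key=lambda s: -a[s]):
        nxt = upper | {s}
        total += a[s] * (v[nxt] - v[upper])
        upper = nxt
    return total

def marginal_measures(states, v):
    """One additive measure per ordering of the states; for convex v these
    are exactly the extreme points of core(v) (Shapley)."""
    out = []
    for perm in permutations(states):
        p, cur = {}, frozenset()
        for s in perm:
            nxt = cur | {s}
            p[s] = v[nxt] - v[cur]
            cur = nxt
        out.append(p)
    return out

states = ['s1', 's2', 's3']
m = {'s1': 1.0, 's2': 2.0, 's3': 3.0}  # additive weights, total 6
# a convex capacity: a convex distortion (squaring) of an additive measure
v = {A: (sum(m[s] for s in A) / 6.0) ** 2 for A in subsets(states)}
a = {'s1': 5.0, 's2': 1.0, 's3': 2.5}
core_min = min(sum(p[s] * a[s] for s in states)
               for p in marginal_measures(states, v))
assert abs(choquet(a, v) - core_min) < 1e-9  # condition (x) in action
```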
Proof. For any functional on B: (iii) implies (iv), (iv) implies (vii), (iv) is equivalent to (v), and (v) implies (vi). The positive homogeneity of degree one of I results in: (ii) is equivalent to (iii) and (vii) is equivalent to (viii). (vi) implies (v) because for all β in R (β = α − ᾱ), I(a + βS^*) = I(a) + β, and because adding βS^* preserves convexity. (viii) implies (ix): suppose, without loss of generality, that v(E) ≥ v(F). Then there is γ ≥ 1 such that v(E) = γv(F). Since I(E^*) = v(E) = γv(F) = I(γF^*), we have by (viii), v(E) + γv(F) ≤ I(E^* + γF^*). But E^* + γF^* = (E ∩ F)^* + (γ − 1)F^* + (E ∪ F)^*, which implies I(E^* + γF^*) = v(E ∩ F) + (γ − 1)v(F) + v(E ∪ F). Inserting the last equality in the inequality above leads to the inequality in (ix). The equivalence of (ix), (x), and (ii) is stated as Proposition 3 in Schmeidler (1986). Last but not least, (i) is equivalent to (iv). This becomes obvious after considering the mapping U' from the diagram in the Proof of the Corollary.

The basic result of the Proposition is the equivalence of (i), (iii), (iv), (ix), and (x). (iv) is quasiconcavity of I, and it is the translation of (i) by U' from L to B. (iii) is concavity, which usually is a stronger assumption. Here I is concave iff it is quasiconcave. Concavity captures best the heuristic meaning of uncertainty aversion.

Remark 5.4. The Proposition holds if all the inequalities are strict and in (i) it is strict uncertainty aversion. To state it precisely, null or dummy events in Σ have to be defined. An event E in Σ is termed dummy if for all F in Σ: v(F ∪ E) = v(F). In (ii)–(vii), in order to state strict inequality one has to assume that a and b' are not comonotonic for any b' which differs from b on a dummy event. To have a strict inequality in (ix) one has to assume that E − F, E ∩ F, and F − E are not dummies. In (x) a geometric condition on the core of v has to be assumed.

Remark 5.5.
The point of view of this work is that if the information is too vague to be represented by an additive prior, it still may be represented by a nonadditive prior. Another possibility is to represent vague information by a set of priors. Condition (x) and its equivalence to the other conditions of the Proposition point out when the two approaches coincide.

Remark 5.6. The concept of uncertainty appeal can be defined by: f ∼ g implies f ⪰ αf + (1 − α)g. In the Proposition all the inequalities then have to be reversed and maxima have to replace minima. Obviously, additive probability, or the independence axiom, reveals uncertainty neutrality.
5.4. Concluding remarks

5.4.1. In the Introduction a point of view distinguishing between objective and subjective probabilities has been articulated. It is not necessary for the results
of this work. What matters is that the lotteries in Y be constructed of additive probabilities. These probabilities can be subjectively arrived at. This is the point of view of Anscombe and Aumann (1963). They describe their result as a way to assess complicated probabilities, "horse lotteries," assuming that the probabilities used in the simpler "roulette lotteries" are already known. The Theorem here can also be interpreted in this way, and one can consider the lotteries in Y as derived within the behavioristic framework as follows. Let Ω be a set (a roulette). An additive probability P on all subsets of Ω is derived via Savage's Theorem. More specifically, let Z be a set of outcomes with two or more elements. (Suppose that the sets Z and X are disjoint.) Let F denote the set of Savage's acts, that is, all functions from Ω to Z. Postulating existence of a preference relation on F satisfying Savage's axioms leads to an additive probability P on Ω. Next we identify a lottery, say y, in Y with all the acts from Ω
to X which induce the probability distribution y. Thus we have a two-step model within the framework of a behavioristic (or personal, or subjective) theory of probability. Since the motivation of our Theorem is behavioristic (i.e. derivation of utility and probability from preference), the conceptual consistency of the work requires that the probabilities in Y could also be derived from preferences. We will return to the question of conceptual consistency in the next Remark. Instead of the two-step model of the previous paragraph one can think of omitting the roulette lotteries from the model. One natural way to do this is to try to extend Savage's Theorem to nonadditive probability. This has been done by Gilboa (1987). Another approach has been followed by Wakker (1986), wherein he substituted a connected topological space for the linear structure of Y.

5.4.2. In recent years many articles have been written which challenge the expected utility hypothesis in the von Neumann–Morgenstern model and in the model with state-dependent acts. We restrict our attention to models that (i) introduce a functional representation of a preference relation derived from axioms, and (ii) separate "utilities" from "probabilities" (in the representation). Furthermore, (iii) we consider functional representations which are sums of products of two numbers, one number having a "probability" interpretation and the other a "utility" interpretation. (For recent works disregarding restriction (iii) the reader may consult Fishburn (1985) and the references there.) Restriction (iii) is tantamount to the functional representation used in the Theorem (the Choquet integral). An article that preceded the present work in this kind of representation using nonadditive probability is Quiggin (1982). (Thanks for this reference are due to a referee.) His result will be introduced here somewhat indirectly.
5.4.2.1. Consider a preference relation over acts satisfying the assumptions, and hence the conclusions, of the Theorem. Does there exist an additive probability P on Σ and a nondecreasing function f from the unit interval onto itself such that v(E) = f(P(E)) on Σ? (Such a function f is referred to as a distortion function.) Conditions leading to a positive answer when the function f is increasing are well known. (They are stated as a step in the proof in Savage (1954); see also Fishburn (1970).) In this case v represents a qualitative (or ordinal) probability, and
the question we deal with can be restated as follows: under what conditions does a qualitative probability have an additive representation? The problem is much more difficult when f is just nondecreasing but not necessarily increasing. A solution has been provided by Gilboa (1985).

5.4.2.2. The set of nonadditive probabilities which can be represented as a composition of a distortion function f and an additive probability P is "small" relative to the set of all nonadditive probabilities. For example, consider the following version of the Ellsberg paradox. There are 90 balls in an urn: 30 black (B) balls, while all the other balls are either white (W) or red (R). Bets on the color of a ball drawn at random from the urn are offered. A correct guess is awarded $100. There are six bets: "B", "R", "W", "B or W", "R or W", and "B or R". The following preferences constitute an Ellsberg paradox: B ≻ R ∼ W, and "R or W" ≻ "B or R" ∼ "B or W". It is impossible to define an additive probability on the events B, R, and W such that this probability's (nondecreasing) distortion will be compatible with the preferences mentioned earlier.

5.4.2.3. In Quiggin's model X is the set of real numbers. An act is a lottery of the form y = (x_i, p_i)_{i=1}^{k} where k ≥ 1, x_1 ≥ x_2 ≥ · · · ≥ x_k, p_i ≥ 0 and Σ p_i = 1. Quiggin postulates a weak order over all such acts which satisfies several axioms. As a result he gets a unique distortion function f and a monotonic utility function u on X, unique up to a positive linear transformation, such that the mapping y ↦ Σ_{i=1}^{k} (u(x_i) − u(x_{i+1})) f(Σ_{j=1}^{i} p_j), with the convention u(x_{k+1}) = 0, represents the preferences. However, f(1/2) = 1/2. Quiggin's axioms are not immediate analogues of the assumptions in Section 5.2. For example, he postulates the existence of a certainty equivalent for each act, that is, for every y there is x in X such that y ∼ x. Yaari (1987) simplified Quiggin's axioms and got rid of the restriction f(1/2) = 1/2 on the distortion function.
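While no distorted additive probability fits the three-color example, a nonadditive probability does so directly. The capacity below is an illustrative assumption of ours (not from the chapter): v(B) = 1/3, v(R) = v(W) = 1/6, v({R,W}) = 2/3, v({B,R}) = v({B,W}) = 1/2. A bet paying $100 on an event E and $0 otherwise then has Choquet value 100·v(E), and the Ellsberg ranking emerges.

```python
# capacity on the three-color state space {B, R, W}; illustrative numbers
v = {
    frozenset(): 0.0,
    frozenset({'B'}): 1/3, frozenset({'R'}): 1/6, frozenset({'W'}): 1/6,
    frozenset({'R', 'W'}): 2/3, frozenset({'B', 'R'}): 1/2,
    frozenset({'B', 'W'}): 1/2, frozenset({'B', 'R', 'W'}): 1.0,
}

def bet_value(event):
    """Choquet value of a bet paying 100 on `event` and 0 otherwise:
    for a 0/100 step function this is simply 100 * v(event)."""
    return 100.0 * v[frozenset(event)]

# the Ellsberg pattern: B > R ~ W  and  "R or W" > "B or R" ~ "B or W"
assert bet_value('B') > bet_value('R') == bet_value('W')
assert bet_value('RW') > bet_value('BR') == bet_value('BW')
```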
However, Yaari's main interest was the uncertainty aversion properties of the distortion function f. Hence his simplified axioms result in linear utility over the set of incomes, X. He explored the duality between concavity of the utility function in the theory of risk aversion and convexity of the distortion function in the theory of uncertainty aversion. Quiggin extended his results from distributions over the real numbers with finite support to distributions over the real line having density functions. Yaari dealt with arbitrary distribution functions over the real line. Finally, Segal (1984) and Chew (1984) obtained the most general representation for Quiggin's model. I conclude my remark on the works of Quiggin, Yaari, and Segal with a criticism from a normative, behavioristic point of view: it may seem conceptually inconsistent to postulate a decision maker who, while computing anticipated utility, assigns weight f(p) to an event known to him to be of probability p, with f(p) ≠ p. His knowledge of p is derived, within the behavioristic model, from preferences over acts (as in 5.4.1). The use of the terms "anticipation" and "weight," instead of "expectation" and "probability," does not resolve, in my opinion, the inconsistency. One way out would be to follow paragraph 5.4.2.1 and to try to derive simultaneously distorted and additive probabilities of events.

5.4.3. The first version of this work (Schmeidler (1982)) includes a slightly extended version of the present Theorem. First recall that Savage termed an event
E null if for all f and g in L: f = g on E^C implies f ∼ g. Clearly, if the conditions of the Theorem are satisfied then an event is null iff it is dummy. The extended version of the Theorem includes the following addition: the nonadditive probability v of the Theorem satisfies the condition that v(E) = 0 implies E is dummy, if and only if the preference relation also satisfies: E not null, f = g on E^C, and f(s) ≻ g(s) on E imply f ≻ g.

5.4.4. The expected utility model has in economic theory two other interpretations in addition to decisions under uncertainty. One interpretation is decisions over time: s in S represents a time or period. In the other interpretation S is the set of persons or agents in the society, and the model is applied to the analysis of social welfare functions. Our extension of the expected utility model may have the same uses. Consider the special case where f(s) is person s's income. Two income allocations f and g are comonotonic if the social rank (according to income) of any two persons is not reversed between f and g. Comonotonic f, g, and h induce the same social rank on individuals, and then f ⪰ g implies γf + (1 − γ)h ⪰ γg + (1 − γ)h. This restriction on independence is, of course, consistent with strict uncertainty aversion, which can here be interpreted as inequality (or inequity) aversion. In other words, we have here an "Expected Utility" representation of a concave Bergson–Samuelson social welfare function.

5.4.5. One of the puzzling phenomena of decisions under uncertainty is that people buy life insurance and gamble at the same time.1 This behavior is compatible with the model of this chapter. Let S^0 = S^1 × S^2 × S^3, where s^1 in S^1 describes a possible state of health of the decision maker, s^2 in S^2 describes a possible resolution of the gamble, and s^3 in S^3 describes a possible resolution of all other relevant uncertainties. Let v^i be a nonadditive probability on S^i, i = 0, 1, 2, 3.
Suppose that v^1 is strictly convex (i.e. satisfying strict uncertainty aversion) and v^2 is strictly concave (i.e. v^2(E) + v^2(F) > v^2(E ∪ F) + v^2(E ∩ F) if E\F and F\E are nonnull). Furthermore, if E^0 = E^1 × E^2 × E^3 with E^i ⊂ S^i, then v^0(E^0) = v^1(E^1)v^2(E^2)v^3(E^3). To simplify matters suppose that X is a bounded interval of real numbers (representing an income in dollars), and the utility u is linear on X. Let the preference relation over acts on S^0 be represented by f ↦ ∫ u(f) dv^0. In this case buying insurance and gambling (betting) simultaneously is preferred to buying insurance only or gambling only, ceteris paribus. Also, either of these last two acts is preferred to "no insurance, no gambling."
Acknowledgments I am thankful to Roy Radner for comments on the previous version presented at Oberwolfach, 1982. Thanks are due also to Benyamin Shitovitz and to anonymous referees for pointing out numerous typos in previous versions. Partial financial support from the Foerder Institute and NSF Grant No. SES 8026086 is gratefully acknowledged. Parts of this research have been done at the University of Pennsylvania and at the Institute for Mathematics and its Applications at the University of Minnesota.
Note 1 It is not puzzling, as a referee pointed out, if one accepts the Friedman–Savage (1948) explanation of this phenomenon.
References

Anscombe, F. J. and R. J. Aumann (1963). "A Definition of Subjective Probability," The Annals of Mathematical Statistics, 34, 199–205.
Chew, Soo Hong (1984). "An Axiomatization of the Rank-dependent Quasilinear Mean Generalizing the Gini Mean and the Quasilinear Mean," mimeo.
Choquet, G. (1955). "Theory of Capacities," Annales de l'Institut Fourier (Grenoble), 5, 131–295.
Dunford, N. and J. T. Schwartz (1957). Linear Operators, Part I. New York: Interscience.
Ellsberg, D. (1961). "Risk, Ambiguity, and the Savage Axioms," Quarterly Journal of Economics, 75, 643–669.
Feynman, R. P. et al. (eds) (1963, 1965). The Feynman Lectures on Physics, Vol. I, Sections 37-4 to 37-7; Vol. III, Chapter 1.
Fishburn, P. C. (1970). Utility Theory for Decision Making. New York: John Wiley & Sons.
—— (1985). "Uncertainty Aversion and Separated Effects in Decision Making Under Uncertainty," mimeo.
Friedman, M. and L. J. Savage (1948). "The Utility Analysis of Choices Involving Risk," Journal of Political Economy, 56, 279–304.
Gilboa, I. (1987). "Expected Utility with Purely Subjective Non-Additive Probabilities," Journal of Mathematical Economics, 16, 65–88.
—— (1985). "Subjective Distortions of Probabilities and Non-Additive Probabilities," Working Paper, The Foerder Institute for Economic Research, Tel Aviv University.
von Neumann, J. and O. Morgenstern (1947). Theory of Games and Economic Behavior, 2nd ed. Princeton: Princeton University Press.
Quiggin, J. (1982). "A Theory of Anticipated Utility," Journal of Economic Behavior and Organization, 3, 323–343.
Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley & Sons (2nd ed. 1972, New York: Dover Publications).
Schmeidler, D. (1982). "Subjective Probability without Additivity" (temporary title), Working Paper, The Foerder Institute for Economic Research, Tel Aviv University.
—— (1986). "Integral Representation without Additivity," Proceedings of the American Mathematical Society, 97, 253–261.
Segal, U. (1984).
"Nonlinear Decision Weights with the Independence Axiom," UCLA Working Paper #353.
Wakker, P. P. (1986). "Representations of Choice Situations," Ph.D. Thesis, Tilburg; rewritten as Additive Representations of Preferences (1989). Norwell, MA: Kluwer Academic Publishers (Ch. VI).
Yaari, M. E. (1987). "The Dual Theory of Choice under Risk," Econometrica, 55, 95–115.
6
Maxmin expected utility with non-unique prior Itzhak Gilboa and David Schmeidler
6.1. Introduction

One of the first objections to Savage's paradigm was raised by Ellsberg (1961). He suggested the following mind experiment, challenging the expected utility hypothesis: a subject is asked to preference rank four bets. He/she is shown two urns, each containing 100 balls, each ball either red or black. Urn A contains 50 black balls and 50 red ones, while there is no additional information about urn B. One ball is drawn at random from each urn. Bet 1 is "the ball drawn from urn A is black," and will be denoted by AB. Bet 2 is "the ball drawn from urn A is red," and will be denoted by AR; similarly we have BB and BR. Winning a bet entitles the subject to $100. The following preferences have been observed empirically: AB ∼ AR ≻ BB ∼ BR. It is easy to see that there is no probability measure supporting these preferences through expected utility maximization. One conceivable explanation of this phenomenon, which we adopt here, is as follows: in the case of urn B, the subject has too little information to form a prior. Hence he/she considers a set of priors as possible. Being uncertainty averse, he/she takes into account the minimal expected utility (over all priors in the set) while evaluating a bet. For instance, one may consider the extreme case in which our decision maker takes into account all possible priors over urn B. In this case the minimal expected payoff of each of the bets AB and AR is $50, while that of bets BB and BR is $0, so that the observed preferences are compatible with the maxmin expected utility decision rule. These ideas are not new. Hurwicz (1951) showed an example of statistical analysis where the statistician is too ignorant to have a unique "Bayesian" prior, but "not quite as ignorant" as to apply Wald's decision rule with respect to all priors. Smith (1961) suggested considering an interval of priors in such situations. He tried to axiomatize this behavior pattern using the "odds" concept.
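Under the stated assumptions ($100 payoff, linear utility, and the full set of priors for urn B), the maxmin evaluation of the four bets can be sketched as follows; the function and variable names are ours.

```python
def maxmin_value(payoff, priors):
    """Minimal expected payoff of an act over a set of priors."""
    return min(sum(p[s] * payoff[s] for s in payoff) for p in priors)

# urn A: the 50-50 composition is known, so the set of priors is a singleton
priors_A = [{'black': 0.5, 'red': 0.5}]
# urn B: nothing is known; take (a fine grid over) all possible priors
priors_B = [{'black': q / 100, 'red': 1 - q / 100} for q in range(101)]

AB = {'black': 100.0, 'red': 0.0}   # bet on black
AR = {'black': 0.0, 'red': 100.0}   # bet on red
BB, BR = AB, AR                     # same payoffs, evaluated over priors_B

assert maxmin_value(AB, priors_A) == maxmin_value(AR, priors_A) == 50.0
assert maxmin_value(BB, priors_B) == maxmin_value(BR, priors_B) == 0.0
```

The minimal values reproduce AB ∼ AR ≻ BB ∼ BR, the empirically observed pattern.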
Other works utilize Choquet integration with respect to capacities (Choquet (1955)) to deal with the
Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-unique Prior," Journal of Mathematical Economics, 18, 141–153.
problem of a nonunique prior. Huber and Strassen (1973) use the Choquet integral in testing hypotheses regarding the choice between two disjoint sets of measures. Schmeidler (1982, 1984, 1986) axiomatizes the preferences representable via the Choquet integral of the utility with respect to a nonadditive probability measure. He used a framework including both "horse lotteries" and "roulette lotteries," à la Anscombe and Aumann (1963). Gilboa (1987) obtains the same representation in the original framework of Savage (1954). (See also Wakker (1986).) In Schmeidler (1986) it has been shown, roughly speaking, that when the nonadditive probability v on S is convex (i.e. v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B)), the Choquet integral of a real-valued function, say a, with respect to v is equal to the minimum of {∫ a dP | P is in the core of v}. The core of v, by definition, consists of all finitely additive probability measures that majorize v pointwise (i.e. event-wise). That is to say, the nonadditive expected utility theory coincides with the decision rule we propose here, where the set of possible priors is the core of v. However, when an arbitrary (closed and convex) set of priors C is given, and one defines v(A) = min{P(A) | P ∈ C}, v need not be convex, though it is exact, that is, a pointwise minimum of additive set functions. (See examples in Schmeidler (1972) and Huber and Strassen (1973).) Furthermore, even if v happens to be convex, C does not have to be its core. It is not hard to construct an example in which C is a proper subset of the core of v. This chapter proposes an axiomatic foundation of the maxmin expected utility decision rule. As in Schmeidler (1984), some of whose notation we repeat, we use the framework of Anscombe and Aumann (1963). The main difference among the models of Anscombe and Aumann (1963), Schmeidler (1984), and the present one lies in the phrasing of the independence axiom (sure-thing principle).
Unlike in the other two works, we also use here an axiom of uncertainty aversion. Similarly to the nonadditive expected utility theory, this model extends classical expected utility. In general, the theories differ from each other; as mentioned earlier, they coincide in the case of a convex v. The straightforward interpretation of our result is an extension of the neo-Bayesian paradigm which leads to a set of priors instead of a unique one. However, with a different interpretation, in which the set C is the set of possible probability distributions in a statistical decision problem, our result sheds light on Wald's minimax criterion and on its relation to personalistic probability. (We refer here to the minimax loss criterion, which is equivalent to maximin utility, and not to the minimax regret criterion suggested by Savage (1954: Ch. 9).) In Wald (1950: Section 1.4.2) we find: "A minimax solution seems, in general, to be a reasonable solution of the decision problem when an a priori distribution in Ω does not exist or is unknown to the experimenter." Hence our main result can be considered an axiomatic foundation of Wald's criterion. The detailed exposition of the model and the main result are stated in the next section. The proof is in Section 6.3, and Section 6.4 is devoted to an extension and several concluding remarks. In particular, we deal there with the definition of the concept of independence in the case of a nonunique prior.
Finally, we would like to note that different approaches to the phenomenon of a nonunique prior appear in Lindley et al. (1979), Vardeman and Meeden (1983), Agnew (1985), Genest and Schervish (1985), Bewley (1986), and others.
6.2. Statement of the main result

Let X be a set and let Y be the set of distributions over X with finite supports:

Y = {y: X → [0, 1] | y(x) ≠ 0 for only finitely many x's in X and Σ_{x∈X} y(x) = 1}.
For notational simplicity we identify X with the subset {y ∈ Y | y(x) = 1 for some x in X} of Y. Let S be a set and let Σ be an algebra of subsets of S. Both sets, X and S, are assumed to be nonempty. Denote by L0 the set of all Σ-measurable finite step functions from S to Y, and denote by Lc the constant functions in L0. Let L be a convex subset of Y^S which includes Lc. Note that Y can be considered a subset of some linear space, and Y^S, in turn, can then be considered as a subspace of the linear space of all functions from S to the first linear space. Whereas it is obvious how to perform convex combinations in Y, it should be stressed that convex combinations in Y^S are performed pointwise. That is, for f and g in Y^S and α in [0, 1], αf + (1 − α)g = h where h(s) = αf(s) + (1 − α)g(s) for s ∈ S. In the neo-Bayesian nomenclature, elements of X are (deterministic) outcomes, elements of Y are random outcomes or (roulette) lotteries, and elements of L are acts (or horse lotteries). Elements of S are states (of nature) and elements of Σ are events. The primitive of a neo-Bayesian decision model is a binary (preference) relation over L, to be denoted by ⪰. Next are stated several properties (axioms) of the preference relation which will be used in the sequel.

A1 Weak order. (a) For all f and g in L: f ⪰ g or g ⪰ f. (b) For all f, g, and h in L: if f ⪰ g and g ⪰ h then f ⪰ h. The relation ⪰ on L induces a relation, also denoted by ⪰, on Y: y ⪰ z iff y^* ⪰ z^*, where for any y in Y, y^* denotes the constant act y^*(s) = y for all s ∈ S. When no confusion is likely to arise, we shall not distinguish between y^* and y. As usual, ≻ and ∼ denote the asymmetric and symmetric parts, respectively, of ⪰.

A2 Certainty independence (C-independence for short). For all f, g in L, h in Lc, and for all α in ]0, 1[: f ≻ g iff αf + (1 − α)h ≻ αg + (1 − α)h.

A3 Continuity. For all f, g, and h in L: if f ≻ g and g ≻ h then there are α and β in ]0, 1[ such that αf + (1 − α)h ≻ g and g ≻ βf + (1 − β)h.

A4 Monotonicity.
For all f and g in L: if f(s) ≽ g(s) on S then f ≽ g.
128
Itzhak Gilboa and David Schmeidler
A5 Uncertainty aversion. For all f, g ∈ L and α ∈ ]0, 1[: f ∼ g implies αf + (1 − α)g ≽ f.

A6 Nondegeneracy. Not for all f and g in L, f ≽ g.

All the assumptions except for A2 and A5 are quite common. The standard independence axiom is stronger than C-independence, as it allows h to be any act in L rather than restricting it to constant acts. This axiom seems heuristically more appealing: a decision maker who prefers f to g can more easily visualize the mixtures of f and g with a constant h than with an arbitrary one; hence he is less likely to reverse his preferences. An intuitive objection to the standard independence axiom is that it ignores the phenomenon of hedging. Like comonotonic independence (Schmeidler (1984)), C-independence does not exclude hedging. However, C-independence is much simpler than, and implied by, comonotonic independence. Uncertainty aversion (which was introduced in Schmeidler (1984)) captures the phenomenon of hedging, especially when the preference is strict. Thus this assumption complements C-independence.

Before stating the main result we mention that the topology to be used on the space of finitely additive set functions on Σ is the product topology, that is, the weak* topology in Dunford and Schwartz (1957) terms. Recall that in this topology the set of finitely additive probability measures on Σ is compact.

Theorem 6.1. Let ≽ be a binary relation on L0. Then the following conditions are equivalent:

(1) ≽ satisfies assumptions A1–A5 for L = L0.
(2) There exist an affine function u: Y → R and a nonempty, closed and convex set C of finitely additive probability measures on Σ such that:

(∗) f ≽ g iff min_{P∈C} ∫ u∘f dP ≥ min_{P∈C} ∫ u∘g dP (for all f, g ∈ L0).

Furthermore: (a) the function u in (2) is unique up to a positive linear transformation; (b) the set C in (2) is unique iff assumption A6 is added to (1).
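For a finite state space the representation (∗) can be evaluated directly: expected utility is computed under each prior in C and acts are ranked by the worst case. The following minimal sketch illustrates this, together with the hedging effect behind A5; the two-state priors and utility profiles below are our own invented example, not taken from the chapter.

```python
def maxmin_eu(utilities, priors):
    """Worst-case expected utility of an act over a finite set of priors.

    utilities: list of u(f(s)), one entry per state s.
    priors: list of probability vectors; they stand in for the extreme
    points of C (the minimum of a linear functional over a convex set
    is attained at an extreme point)."""
    return min(sum(p * u for p, u in zip(P, utilities)) for P in priors)

# Hypothetical set C: the convex hull of two priors over two states.
C = [(0.25, 0.75), (0.75, 0.25)]

f = [10.0, 0.0]   # u∘f: a bet on state 1
g = [4.0, 4.0]    # a constant act with sure utility 4

print(maxmin_eu(f, C))  # min(2.5, 7.5) = 2.5, so the sure 4 is preferred
print(maxmin_eu(g, C))  # 4.0

# Uncertainty aversion (A5): f and its mirror image [0, 10] are
# indifferent (worst case 2.5 each), but their 1/2-1/2 mixture [5, 5]
# is a perfect hedge with worst case 5.
print(maxmin_eu([5.0, 5.0], C))  # 5.0
```

Note that only the extreme points of C matter for the minimum, which is why a finite list suffices in the sketch.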
6.3. Proof of Theorem 6.1

The crucial part of the proof is that (1) implies (2). If A6 fails to hold, then a constant function u and any closed and convex subset C will satisfy (2); hence for the next several lemmata we suppose assumptions A1–A6.

Lemma 6.1. There exists an affine u: Y → R such that for all y, z ∈ Y: y ≽ z iff u(y) ≥ u(z). Furthermore, u is unique up to a positive linear transformation.

Proof. This is an immediate consequence of the von Neumann–Morgenstern theorem, since the independence assumption for Lc is implied by C-independence. (See Fishburn (1970: Ch. 8).)
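Affinity of u means u(αy + (1 − α)z) = αu(y) + (1 − α)u(z); on finite-support lotteries this is just the expectation of the outcome utilities. A small sketch of that identity (the outcome set and utility values are invented for illustration):

```python
def mix(y, z, alpha):
    """Pointwise mixture alpha*y + (1-alpha)*z of two finite-support
    lotteries, each represented as a dict outcome -> probability."""
    support = set(y) | set(z)
    return {x: alpha * y.get(x, 0.0) + (1 - alpha) * z.get(x, 0.0)
            for x in support}

def u(y, outcome_utility):
    """Affine utility on Y: u(y) = sum over x of y(x) * u(x)."""
    return sum(p * outcome_utility[x] for x, p in y.items())

ux = {"a": 1.0, "b": 0.0, "c": -2.0}   # hypothetical outcome utilities
y = {"a": 0.5, "b": 0.5}
z = {"b": 0.25, "c": 0.75}

m = mix(y, z, 0.4)
# Affinity: u of the mixture equals the mixture of the u-values.
assert abs(u(m, ux) - (0.4 * u(y, ux) + 0.6 * u(z, ux))) < 1e-12
```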
Maxmin expected utility with non-unique prior
129
Lemma 6.2. Given u: Y → R from Lemma 6.1, there exists a unique J: L0 → R such that: (i) f ≽ g iff J(f) ≥ J(g) (for all f, g ∈ L0); (ii) for f = y* ∈ Lc, J(f) = u(y).

Proof. On Lc, J is uniquely determined by (ii). We extend J to L0 as follows. Given f ∈ L0, there are y, ȳ ∈ Y such that y ≽ f ≽ ȳ. By the continuity assumption and the other assumptions, there exists a unique α ∈ [0, 1] such that f ∼ αy + (1 − α)ȳ. Define J(f) = J(αy + (1 − α)ȳ). By construction, J satisfies (i); hence it is also unique.

We shall henceforth choose a specific u: Y → R such that there are y1, y2 ∈ Y for which u(y1) < −1 and u(y2) > 1. (Such a choice of a utility u is possible in view of the nondegeneracy assumption.) We denote by B the space of all bounded Σ-measurable real-valued functions on S (which is denoted B(S, Σ) in Dunford and Schwartz (1957)). B0 will denote the space of functions in B which assume finitely many values. Let K = u(Y), and let B0(K) be the subset of functions in B0 with values in K. For γ ∈ R, let γ* ∈ B0 be the constant function on S the value of which is γ.

Lemma 6.3. There exists a functional I: B0 → R such that:

(i) For all f ∈ L0, I(u∘f) = J(f) (hence I(1*) = 1).
(ii) I is monotonic (i.e. for a, b ∈ B0: a ≥ b ⇒ I(a) ≥ I(b)).
(iii) I is superlinear (i.e. superadditive and homogeneous of degree 1).
(iv) I is C-independent: for any a ∈ B0 and γ ∈ R, I(a + γ*) = I(a) + I(γ*).
Proof. We first define I on B0(K) by condition (i). (Lemma 6.2 and the monotonicity assumption assure that I is thus well defined.) We now show that I is homogeneous on B0(K). Assume a = αb where a, b ∈ B0(K) and 0 < α ≤ 1. We have to show that I(a) = αI(b). (This will imply the equality for α > 1.) Let g ∈ L0 satisfy u∘g = b. Let z ∈ Y satisfy J(z) = 0 and define f = αg + (1 − α)z. Hence u∘f = αu∘g + (1 − α)u∘z = αb = a, so I(a) = J(f). Let y ∈ Y satisfy y ∼ g (hence J(y) = J(g) = I(b)). By C-independence, αy + (1 − α)z ∼ αg + (1 − α)z = f. Hence J(f) = J(αy + (1 − α)z) = αJ(y) + (1 − α)J(z) = αJ(y). Whence I(a) = J(f) = αJ(y) = αI(b). We now extend I by homogeneity to all of B0. Note that I is monotone and homogeneous of degree 1 on B0.

Next we show that I is C-independent (part (iv) of the Lemma). Let there be given a ∈ B0 and γ ∈ R. By homogeneity we may assume without loss of generality that 2a, 2γ* ∈ B0(K). Now define β = I(2a) = 2I(a). Let f ∈ L0 satisfy u∘f = 2a and let y, z ∈ Y satisfy u∘y = β* and u∘z = 2γ*. Since
f ∼ y, C-independence of ≽ implies that ½f + ½z ∼ ½y + ½z. Hence

I(a + γ*) = I(½β* + γ*) = ½β + γ = I(a) + γ,

and I is C-independent.

It is left to show that I is superadditive. Let there be given a, b ∈ B0. Once again, by homogeneity we may assume without loss of generality that a, b ∈ B0(K). Furthermore, for the same reason it suffices to prove that I(½a + ½b) ≥ ½I(a) + ½I(b). Suppose that f, g ∈ L0 are such that u∘f = a and u∘g = b. If I(a) = I(b), then f ∼ g and by uncertainty aversion (assumption A5), ½f + ½g ≽ f, which, in turn, implies I(½a + ½b) ≥ I(a) = ½I(a) + ½I(b). Assume, then, I(a) > I(b), and let γ = I(a) − I(b). Set c = b + γ* and note that I(c) = I(b) + γ = I(a) by C-independence of I. Using the C-independence of I twice more and its superadditivity for the case proven earlier, one obtains:

I(½a + ½b) + ½γ = I(½a + ½c) ≥ ½I(a) + ½I(c) = ½I(a) + ½I(b) + ½γ,

which completes the proof of the Lemma.

Recall that the space B is a Banach space with the sup norm ‖·‖, and B0 is a norm-dense subspace of B. Lemma 6.4 will also be used in an extension of the Theorem.

Lemma 6.4. There exists a unique continuous extension of I to B. Furthermore, this extension is monotonic, superlinear and C-independent.

Proof. We first show that for each a, b ∈ B0, |I(a) − I(b)| ≤ ‖a − b‖. Indeed, a = b + (a − b) ≤ b + ‖a − b‖*. Monotonicity and C-independence of I imply that I(a) ≤ I(b + ‖a − b‖*) = I(b) + ‖a − b‖, or I(a) − I(b) ≤ ‖a − b‖. The same argument implies I(b) − I(a) ≤ ‖b − a‖. Thus there exists a unique continuous extension of I. Obviously, it is superlinear, monotonic and C-independent.

In Lemma 6.5 the convex set of finitely additive probability measures C of Theorem 6.1 will be constructed via a separation theorem.

Lemma 6.5. If I is a monotonic, superlinear and C-independent functional on B with I(1*) = 1, then there exists a closed and convex set C of finitely additive probability measures on Σ such that: for all b ∈ B, I(b) = min{∫ b dP | P ∈ C}.

Proof. Let b ∈ B with I(b) > 0 be given. We will construct a finitely additive probability measure Pb such that I(b) = ∫ b dPb and I(a) ≤ ∫ a dPb for all
a ∈ B. To this end we define

D1 = {a ∈ B | I(a) > 1},
D2 = conv({a ∈ B | a ≤ 1*} ∪ {a ∈ B | a ≤ b/I(b)}).

We now show that D1 ∩ D2 = ∅. Let d2 ∈ D2 satisfy d2 = αa1 + (1 − α)a2 where a1 ≤ 1*, a2 ≤ b/I(b), and α ∈ [0, 1]. By monotonicity, homogeneity, and C-independence of I, I(d2) ≤ α + (1 − α)I(a2) ≤ 1. Note that each of the sets D1, D2 has an interior point and that they are both convex. Thus, by a separation theorem (see Dunford and Schwartz (1957: V.2.8)) there exists a nonzero continuous linear functional pb and an α ∈ R such that for all d1 ∈ D1 and d2 ∈ D2:

pb(d1) ≥ α ≥ pb(d2).   (6.1)

Since the unit ball of B is included in D2, α > 0. (Otherwise pb would have been identically zero.) We may therefore assume without loss of generality that α = 1. By (6.1), pb(1*) ≤ 1. Since 1* is a limit point of D1, pb(1*) ≥ 1 is also true; hence pb(1*) = 1. We now show that pb is non-negative or, more specifically, that pb(1E) ≥ 0 whenever 1E is the indicator function of some E ∈ Σ. Since pb(1E) + pb(1* − 1E) = pb(1*) = 1, and 1* − 1E ∈ D2, the inequality follows. By the classical representation theorem there exists a finitely additive probability measure Pb on Σ such that pb(a) = ∫ a dPb for all a ∈ B.

We will now show that pb(a) ≥ I(a) for all a ∈ B, with equality for a = b. First assume I(a) > 0. It is easily seen that a/I(a) + (1/n)* ∈ D1, so the continuity of pb and (6.1) imply pb(a) ≥ I(a). For the case I(a) ≤ 0 the inequality follows from C-independence. Since b/I(b) ∈ D2, we obtain the converse inequality for b; thus pb(b) = I(b).

We now define the set C as the closure of the convex hull of {Pb | I(b) > 0} (which, of course, is convex). It is easy to see that I(a) ≤ min{∫ a dP | P ∈ C}. For a such that I(a) > 0, we have shown the converse inequality to hold as well. For a such that I(a) ≤ 0, it is again a simple implication of C-independence.

Conclusion of the Proof of Theorem 6.1. Lemmata 6.1–6.5 prove that (1) implies (2). Assuming (2), define I on B by I(b) = min{∫ b dP | P ∈ C}, C compact and convex. It is easy to see that I is monotonic, superlinear, C-independent, and continuous. So, in turn, the preference relation defined on L0 by (2) satisfies A1–A5.

We now turn to prove the uniqueness properties of u and C. The uniqueness of u up to positive linear transformation is implied by Lemma 6.1. If assumption A6 does not hold, the range of u, K, is a singleton, and C can be any nonempty closed and convex set. We shall now show that if assumption A6
does hold, C is unique. Assume the contrary, that is, that there are C1 ≠ C2, both nonempty, closed and convex, such that the two functions on L0,

J1(f) = min{∫ u∘f dP | P ∈ C1},
J2(f) = min{∫ u∘f dP | P ∈ C2},

both represent ≽. Without loss of generality one may assume that there exists P1 ∈ C1\C2. By a separation theorem (Dunford and Schwartz (1957: V.2.10)), there exists a ∈ B such that

∫ a dP1 < min{∫ a dP | P ∈ C2}.

Without loss of generality we may assume that a ∈ B0(K). Hence there exists f ∈ L0 such that J1(f) < J2(f). Now let y ∈ Y satisfy y ∼ f. We get J1(y) = J1(f) < J2(f) = J2(y), a contradiction.
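The separation step behind the uniqueness argument can be seen in miniature: with two states a prior is a point on a segment, and a bet on a single state separates a prior P1 ∈ C1\C2 from C2. The two sets of priors below are our own invented illustration.

```python
def min_exp(a, C):
    """min over P in C of the expectation of a (C given by its finitely
    many extreme points; the min over the convex hull is attained there)."""
    return min(sum(p * x for p, x in zip(P, a)) for P in C)

# Two distinct closed convex sets of priors over two states
# (listed by their extreme points).
C1 = [(0.2, 0.8), (0.6, 0.4)]
C2 = [(0.4, 0.6), (0.6, 0.4)]

a = (1.0, 0.0)  # a bet on state 1 acts as the separating functional
assert min_exp(a, C1) == 0.2  # attained at (0.2, 0.8), which is not in C2
assert min_exp(a, C2) == 0.4
# The two min-expectation functionals disagree on a, so an act f with
# u∘f = a is ranked differently by J1 and J2; hence C is unique.
```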
6.4. Extension and concluding remarks

A natural question arising in view of Theorem 6.1 is whether it holds when the set of acts L, on which the preference relation is given, is a convex superset of L0. A partial answer is presented in the sequel. It will be shown that, for a certain superset of L0, the preference relation on it is completely determined by its restriction to L0, should it satisfy the assumptions introduced in Section 6.2.

Given a weak order ≽ on Lc, an act f: S → Y is said to be ≽-measurable if for all y ∈ Y the sets {s | f(s) ≻ y} and {s | f(s) ≽ y} belong to Σ. It is said to be bounded (or, more precisely, ≽-bounded) if there are y1, y2 ∈ Y such that y1 ≽ f(s) ≽ y2 for all s ∈ S. The set of all ≽-measurable bounded acts in Y^S is denoted by L(≽). It is obvious that L(≽) is convex and contains L0.

Proposition 6.1. Suppose that a preference relation ≽ over L0 satisfies assumptions A1–A5. Then it has a unique extension to L(≽) which satisfies the same assumptions (over L(≽)).

Proof. Because of monotonicity, the proposition is obvious in case assumption A6 does not hold. Therefore we assume it does, and we may apply Lemmata 6.1–6.4. We then define the extension of ≽ (also to be denoted by ≽) as follows: f ≽ g iff I(u∘f) ≥ I(u∘g). It is obvious that ≽ satisfies A1–A5 and that ≽ on L(≽) is the unique monotonic extension of ≽ on L0.
Remark. Suppose that ≽ satisfies A1–A5 over L, which is convex and contains L0. Then, in view of Proposition 6.1, ≽ may be represented as in Theorem 6.1 on L ∩ L(≽).

We now introduce the concepts of independence of acts and products of binary relations. Suppose that a given preference relation ≽ satisfies A1–A6 over L0. By Proposition 6.1 we extend it to L = L(≽), and let u and C be as in Theorem 6.1. Two acts f, g ∈ L are said to be independent if the following two conditions hold:

(1) There exists P0 ∈ C such that

∫ u∘f dP0 = min{∫ u∘f dP | P ∈ C}  and  ∫ u∘g dP0 = min{∫ u∘g dP | P ∈ C};

(2) u∘f and u∘g are two stochastically independent random variables with respect to any extreme point of C (for short: Ext(C)).

As expected, this notion of independence turns out to be closely related to that of product spaces, once the latter is defined. We will refer to a triple (S, Σ, C) as a nonunique probability space. Given two nonunique probability spaces (Si, Σi, Ci), i = 1, 2, we define their product (S, Σ, C) as follows: S = S1 × S2, Σ = Σ1 ⊗ Σ2, and C is the closed convex hull of {P1 ⊗ P2 | P1 ∈ C1, P2 ∈ C2}.

Suppose that for a given set of outcomes X there are given two act spaces L0^i ⊂ Y^Si, i = 1, 2, and two preference relations ≽i correspondingly, such that the restrictions of ≽1 and ≽2 to Y coincide. As before, we suppose that each ≽i satisfies A1–A6 and we consider its extension to L^i = L^i(≽i). For the product act space L0 ⊂ Y^(S1×S2) we define the product preference relation ≽ = ≽1 ⊗ ≽2 as derived from u and C. It is obvious that ≽ also satisfies A1–A6, and it has a unique extension to L = L(≽). Given f^i ∈ L^i, it has a unique trivial extension f̄^i ∈ L. Now we formulate the result which justifies our definition of independence:

Proposition 6.2. Given L1, ≽1, L2, ≽2 and L as stated earlier, ≽ is the unique preference relation over L satisfying: (1) assumptions A1–A6; (2) for all f^i, g^i ∈ L^i: f^i ≽i g^i iff f̄^i ≽ ḡ^i (i = 1, 2); (3) for all f ∈ L1 and g ∈ L2, f̄ and ḡ are independent.

Proof. It is trivial to see that ≽ indeed satisfies (1)–(3). To see that it is unique, let ≽′ also satisfy (1)–(3). By (1) and our main result, ≽′ is representable by a utility
u′ and a convex and closed set of finitely additive measures C′. By Lemma 6.1 we assume without loss of generality that u = u′. We now wish to show that C′ = C.

Step 1. C′ ⊂ C.

Proof of Step 1. As C′ is convex, it suffices to show that Ext(C′) ⊂ C. Let, then, P0 ∈ Ext(C′). Define Pi to be the restriction of P0 to Σi (i = 1, 2). Choose A ∈ Σ1 and B ∈ Σ2, and let f ∈ L1 and g ∈ L2 satisfy u∘f = 1A, u∘g = 1B. Since f̄ and ḡ are independent, they are independent with respect to P0. Hence P0(A × B) = P0(A × S2)P0(S1 × B) = P1(A)P2(B). This implies P0 = P1 ⊗ P2 ∈ C.

Step 2. C ⊂ C′.

Proof of Step 2. We begin with

Step 2a. If Σ1 and Σ2 are finite, then C ⊂ C′.

Proof of Step 2a. By a theorem of Straszewicz (1935), it suffices to show that P1 ⊗ P2 ∈ C′ for all P1 ∈ Exp(C1) and P2 ∈ Exp(C2), where Exp(C) denotes the set of exposed points of C, that is, the points at which there exists a supporting hyperplane which does not pass through any other point of C. Let there be given, then, P1 ∈ Exp(C1) and P2 ∈ Exp(C2). Let f ∈ L1 and g ∈ L2 be such that

∫ u∘f dP1 = min{∫ u∘f dP | P ∈ C1}

and

∫ u∘g dP2 = min{∫ u∘g dP | P ∈ C2}.

By the independence of f̄ and ḡ, there exists P0 ∈ C′ for which ∫ u∘f̄ dP and ∫ u∘ḡ dP are minimized simultaneously. By Step 1, P0 ∈ C, hence there are P1′ ∈ C1 and P2′ ∈ C2 such that P0 = P1′ ⊗ P2′. However, ∫ u∘f̄ dP0 = ∫ u∘f dP1′ and ∫ u∘ḡ dP0 = ∫ u∘g dP2′. By the uniqueness property of Exp(Ci) (i = 1, 2), we obtain P1 = P1′ and P2 = P2′. Hence P1 ⊗ P2 = P0 ∈ C′, and Step 2a is proved.

We will now complete the Proof of Step 2. Assume, by way of negation, that C\C′ ≠ ∅, that is, ≽′ ≠ ≽. As in the Proof of the Theorem, there exist f ∈ L0 and y ∈ Y such that f ≻ y* and y* ≻′ f. Consider the finite sub-algebra, say Σ̃, of Σ generated by f. There are finite sub-algebras Σ̄i of Σi (i = 1, 2) such that Σ̃ ⊂ Σ̄ = Σ̄1 ⊗ Σ̄2. Next consider the restrictions of ≽i to the Σ̄i-measurable functions, and the restrictions of ≽, ≽′ to the Σ̄-measurable functions. Obviously, both ≽ and ≽′ satisfy requirements (1)–(3) of the Proposition, although they differ on the set of Σ̄-measurable functions (to which f and y* belong). This contradicts Step 2a, and the Proof of the Proposition is thus completed.
Acknowledgments The authors acknowledge partial financial support by the Foerder Institute for Economic Research and by The Keren Rauch Fund at Tel Aviv University.
References

Agnew, C. E. (1985). Multiple probability assessments by dependent experts, Journal of the American Statistical Association, 80, 343–347.
Anscombe, F. J. and R. J. Aumann (1963). A definition of subjective probability, Annals of Mathematical Statistics, 34, 199–205.
Bewley, T. (1986). Knightian decision theory: Part I, Mimeo (Yale University, New Haven, CT).
Choquet, G. (1955). Theory of capacities, Annales de l'Institut Fourier, 5, 131–295.
Dunford, N. and J. T. Schwartz (1957). Linear operators, Part I (Interscience, New York).
Ellsberg, D. (1961). Risk, ambiguity and the Savage axioms, Quarterly Journal of Economics, 75, 643–669.
Fishburn, P. C. (1970). Utility theory for decision making (Wiley, New York).
Genest, C. and M. J. Schervish (1985). Modeling expert judgments for Bayesian updating, The Annals of Statistics, 13, 1198–1212.
Gilboa, I. (1987). Expected utility theory with purely subjective non-additive probabilities, Journal of Mathematical Economics, 16, 65–88.
Huber, P. J. and V. Strassen (1973). Minimax tests and the Neyman–Pearson lemma for capacities, The Annals of Statistics, 1, 251–263.
Hurwicz, L. (1951). Some specification problems and application to econometric models, Econometrica, 19, 343–344.
Lindley, D. V., A. Tversky and R. V. Brown (1979). On the reconciliation of probability assessments, Journal of the Royal Statistical Society, Series A, 142, 146–180.
Savage, L. J. (1954). The foundations of statistics (Wiley, New York).
Schmeidler, D. (1972). Cores of exact games, I, Journal of Mathematical Analysis and Applications, 40, 214–225.
Schmeidler, D. (1982). Subjective probability without additivity (temporary title), Working paper (Foerder Institute for Economic Research, Tel Aviv University, Tel Aviv).
Schmeidler, D. (1984). Subjective probability and expected utility without additivity, IMA Preprint Series. (Reprinted as Chapter 5 in this volume.)
Schmeidler, D. (1986). Integral representation without additivity, Proceedings of the American Mathematical Society, 97, No. 2.
Smith, Cedric A. B. (1961). Consistency in statistical inference and decision, Journal of the Royal Statistical Society, Series B, 23, 1–25.
Straszewicz, S. (1935). Über exponierte Punkte abgeschlossener Punktmengen, Fundamenta Mathematicae, 24, 139–143.
Vardeman, S. and G. Meeden (1983). Calibration, sufficiency and domination considerations for Bayesian probability assessors, Journal of the American Statistical Association, 78, 808–816.
Wakker, P. (1986). Ch. 6 in a draft of a Ph.D. thesis.
Wald, A. (1950). Statistical decision functions (Wiley, New York).
7
A simple axiomatization of nonadditive expected utility Rakesh Sarin and Peter P. Wakker
7.1. Introduction

Savage's (1954) subjective expected utility (SEU) theory has been widely adopted as the guide for rational decision making in the face of uncertainty. In SEU theory both the probabilities and the utilities are derived from preferences (see also Ramsey (1931)). This represents a hallmark contribution, as it avoids the reliance on introspection for quantifying tastes and beliefs. We continue in Savage's vein and extend his theory to derive a more general nonadditive expected utility representation, called Choquet expected utility (CEU). Schmeidler (1989, first version 1982) made the first contribution in providing a CEU representation and Gilboa (1987) extended this work. We develop this line of research further by providing an intuitive axiomatization of CEU. The key distinction between our work and that of Savage is that we identify two types of events—unambiguous and ambiguous. People feel relatively "sure" about the probabilities of unambiguous events. An example of an unambiguous event could be the outcome of a toss of a fair coin (heads or tails). We assume that Savage's axioms hold for a sufficiently rich set of "unambiguous acts," that is, acts measurable with respect to the unambiguous events. The probabilities of ambiguous events, however, are not known with precision. An example of such an event could be next week's weather conditions (rain or sunshine). Ambiguity in the probability of such events may be caused, for example, by a lack of available information relative to the amount of conceivable information (Keynes, 1921). Most people exhibit a reluctance to bet on events with ambiguous probabilities. This reluctance leads to a violation of Savage's sure-thing principle (P2). The CEU theory proposed here does not impose the sure-thing principle for all events and is therefore capable of permitting a liking for specificity and a dislike for ambiguity in probability.
The key condition used in this chapter to obtain the CEU representation is "cumulative dominance" (P4 in Section 7.3). Simply stated, this condition requires that
Sarin, R. and P. P. Wakker (1992). "A simple axiomatization of nonadditive expected utility," Econometrica, 60, 1255–1272.
Axiomatization of nonadditive expected utility
137
if receiving consequence α or a superior consequence is considered more likely for an act f than for an act g, for every α, then the act f is preferred to the act g. This condition is trivially satisfied for an SEU maximizer. Unlike the sure-thing principle that forces the probabilities for all events to be additive, cumulative dominance permits that probabilities for some events could be nonadditive. A probability function is nonadditive if the probability of the union of two disjoint events is not equal to the sum of the individual probabilities of each event. An example will show how nonadditive probabilities could accommodate an aversion toward ambiguity. The judgments and preferences that may lead to nonadditive probability have been rationalized by many authors. For example, Keynes (1921) has argued that confidence in probability influences decisions under uncertainty. Knight (1921) made the distinction between risk and uncertainty based on whether the event probabilities are known or unknown. Recently Schmeidler (1989) has argued that the amount of information available about an event may influence probabilities in such a way that probabilities are not necessarily additive. In a seminal paper, Ellsberg (1961) showed that if one accepts Savage’s definition of probability then a majority of subjects violates additivity of probability. Numerous experiments since then have confirmed Ellsberg’s findings. Even though Ellsberg’s example is well known, we present it as it serves to illustrate the motivation and direction for our proposed modification of Savage’s theory. Suppose an urn is filled with 90 balls, 30 of which are red (R), and 60 of which are white (W ) and yellow (Y ) in an unknown proportion. One ball will be drawn randomly from the urn and your payoff will depend on the color of the drawn ball and the “act” (decision alternative) you choose. See Table 7.1. 
When subjects are asked to choose between acts f and g, a majority chooses act f, presumably because in act f the chance of winning $1,000 is precisely known to be 1/3. In act g the chance of drawing a white ball is ambiguous since the number of white balls is unknown. Now, when the same subjects are asked to choose between acts f′ and g′, a majority chooses the act g′. Again, in act g′, the chance of winning $1,000 is precisely known to be 2/3, whereas in act f′ the chance of winning is ambiguous. Thus, subjects tend to like specificity and to avoid ambiguity. Denoting by v(R), v(W), and v(Y) the probability of drawing a red, white, or yellow ball respectively, we obtain, assuming expected utility with
Table 7.1 The Ellsberg options

                30 balls    60 balls
Act             Red         White       Yellow
f               $1,000      $0          $0
g               $0          $1,000      $0
f′              $1,000      $0          $1,000
g′              $0          $1,000      $1,000
u(0) = 0:

f ≻ g implies v(R)u(1,000) > v(W)u(1,000), or v(R) > v(W);

g′ ≻ f′ implies v(W)u(1,000) + v(Y)u(1,000) > v(R)u(1,000) + v(Y)u(1,000), or v(W) > v(R).

Thus, consistent probabilities cannot be assigned to the states, as v(R) cannot simultaneously be larger as well as smaller than v(W). Clearly, in this example no inconsistency results if v(R ∪ Y) ≠ v(R) + v(Y). In our development we permit nonadditive probabilities for some events (such as R ∪ Y) that we call ambiguous events. Our strategy is to differentiate between ambiguous and unambiguous events by requiring that only the acts that are measurable with respect to unambiguous events satisfy Savage's axioms. General acts are assumed to satisfy somewhat weaker conditions that may yield nonadditive probabilities for ambiguous events. It is to be noted that we do not require an a priori definition of unambiguous or ambiguous events (for the latter see Fishburn, 1991). We do, however, assume that there exists a subclass of events, such as those generated by a roulette wheel, such that an SEU representation holds with respect to these events. The idea is that these events are unambiguous. The subclass of unambiguous events should be rich enough to ensure that all ambiguous events can be calibrated by appropriate bets contingent on unambiguous events.

The strategy of permitting probabilities to be nonadditive and using them in CEU was first proposed by Schmeidler (1989, first version 1982). Schmeidler uses the set-up of Anscombe and Aumann (1963) (as refined in Fishburn, 1967, 1970, 1982), where for every state an act leads to an objective probability distribution, to formulate his axioms and derive the result. A nonadditive probability extension of the approach of Savage (1954) in full generality is very complicated. Gilboa (1987) succeeded in finding such an extension. The resulting axioms are, however, quite complicated and do not seem to have simple intuitive interpretations (see Fishburn 1988: 202).
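To see concretely how a nonadditive v accommodates the modal pattern f ≻ g and g′ ≻ f′, here is a sketch using the Choquet expected utility form defined later in Section 7.2.2. The particular capacity values below are our own illustration, not taken from the chapter: red has known probability 1/3, the 60 white-or-yellow balls have known total probability 2/3, and the ambiguous singletons and the ambiguous event R ∪ Y are undervalued.

```python
# A capacity on the Ellsberg state space {R, W, Y} (invented values).
v = {
    frozenset(): 0.0,
    frozenset("R"): 1/3, frozenset("W"): 1/4, frozenset("Y"): 1/4,
    frozenset("RW"): 5/9, frozenset("RY"): 5/9, frozenset("WY"): 2/3,
    frozenset("RWY"): 1.0,
}

def ceu(act, v):
    """Choquet expected utility of an act (dict state -> utility):
    walk down from the best outcome, weighting each utility by the
    capacity increment of the growing upper set."""
    states = sorted(act, key=act.get, reverse=True)
    total, top = 0.0, frozenset()
    for s in states:
        bigger = top | {s}
        total += act[s] * (v[bigger] - v[top])
        top = bigger
    return total

# u($1,000) = 1, u($0) = 0.
f  = {"R": 1, "W": 0, "Y": 0}
g  = {"R": 0, "W": 1, "Y": 0}
fp = {"R": 1, "W": 0, "Y": 1}   # f′
gp = {"R": 0, "W": 1, "Y": 1}   # g′

assert ceu(f, v) > ceu(g, v)     # f ≻ g:   1/3 > 1/4
assert ceu(gp, v) > ceu(fp, v)   # g′ ≻ f′: 2/3 > 5/9
# The capacity is nonadditive: v(R ∪ Y) = 5/9 < v(R) + v(Y) = 7/12.
```

Both modal choices are now consistent, precisely because v(R ∪ Y) ≠ v(R) + v(Y).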
In this chapter, we propose another extension of Schmeidler’s model that in our view has a greater intuitive appeal. The basic idea is to reformulate Savage’s axioms to permit nonadditivity in probability for ambiguous events (event R ∪ Y in Table 7.1) while preserving additivity for unambiguous events (event Y ∪ W in Table 7.1). Technically, our work may be viewed as a sort of unification of Gilboa (1987) and Schmeidler (1989), and builds heavily on these works. Additional axiomatizations of CEU that assume some rich structure on the consequences instead of the states have been provided in Wakker (1989a,b, 1993a), and Nakamura (1990, 1992). Wakker (1990) has shown that CEU when applied to decision making under risk (where probabilities are extraneously specified) is identical to rank-dependent (anticipated) utility. A survey of several independent discoveries of the CEU form has been given in Wakker (1991).
Schmeidler’s lottery-acts formulation may be viewed as a two-stage process where a state s occurs in the first stage and in the second stage a lottery is played to determine the final consequence. If probabilities are additive the one-stage formulation (e.g. of Savage) and the two-stage formulation (e.g., of Anscombe and Aumann) yield the same conclusion. However, as we shall see, in the nonadditive case the two formulations yield different conclusions about the preference rankings of acts. We begin by presenting some notations and definitions in Section 7.2. Our axioms and main result are stated in Section 7.3. In Section 7.4 we explore the relationship between CEU and SEU models. An example and a general result showing the irreconcilability of Schmeidler’s two-stage formulation with a naturally equivalent one-stage formulation are presented in Section 7.5. Finally, Section 7.6 contains conclusions, and proofs are given in the Appendix.
7.2. Definitions

7.2.1. Elementary definitions

In this section we present the notation for the Savage (1954) style formulation for decisions under uncertainty and introduce some definitions that are useful in developing our results. There is a set C of consequences (payoffs, prizes, outcomes) and a set S of states of nature. The states in S are mutually exclusive and collectively exhaustive, so that exactly one state is the true state. We shall let A denote a σ-algebra of subsets of S; that is, A contains S, A ∈ A implies A^c (the complement of A) ∈ A, and A is closed under countable unions (this will be generalized in Remark 7.1). Thus A also contains Ø, and is closed under countable intersections. Subjective probabilities or "capacities" will be assigned to the elements of A; these elements are called events. An event A is informally said to occur if A contains the true state. The set C is also assumed to be endowed with a σ-algebra D; this will only play a role for acts with an infinite number of consequences. A decision alternative or an act is a function from S to C that is measurable, that is, f^(−1)(D) ∈ A for all D ∈ D. If the decision maker chooses an act f, then the consequence f(s) will result where s is the true state. The decision maker is uncertain about which state is true, hence about which consequence will result from an act. The set of acts is denoted as F. Act f is constant if, for some α ∈ C, f(s) = α for all states s. Often a constant act is identified with the resulting consequence. Statements of conditions are simplified by defining fA as the restriction of f to A, and fA h as the act that assigns consequences f(s) to all s ∈ A and consequences h(s) to all s ∈ S\A. Given that consequences are identified with constant acts, fA α designates the act that is identical to f on A and constant α on S\A; αA β is similar. Further, for a partition {A1, . . . , Am}, we denote by α1A1 · · · αmAm the act that assigns consequence αj to each s ∈ Aj, j = 1, . . . , m. Such acts are called step acts.¹ A binary relation ≽ over F gives the decision
maker's preferences. The notations ≽, ≻, ≺, and ∼ are as usual. Further, ≽ is a weak order if it is complete (f ≽ g or g ≽ f for all f, g) and transitive. We define ≽ on C from ≽ on F through constant acts: α ≽ β if f ≽ g where f is constant α, g is constant β. Postulate P3 will ensure that ≽ on F and ≽ on C are in proper agreement. We assume that ≽ and D are compatible in the sense that all "preference intervals" are contained in D. A preference interval, as defined in Fishburn (1982), is a set E ⊂ C such that α, γ ∈ E and α ≽ β ≽ γ imply β ∈ E. A special case is a set E such that α ∈ E, β ≽ α implies β ∈ E. Such sets are called cumulative consequence sets. They will play a central role in this chapter. Example 7A.1 shows why, in the absence of set continuity, cumulative dominance must include all cumulative consequence sets and not just sets of the form {β : β ≽ α}; in the latter case cumulative dominance would become too strong. Following Savage (1954) (see also de Finetti (1931, 1937) and Ramsey (1931)), we define ≽ on A from ≽ on F through "bets on events": A ≽ B if there exist consequences α ≻ β such that αA β ≽ αB β. We then say that A is more likely than B. Postulate P4 will ensure that ≽ on A satisfies the usual conditions such as transitivity and completeness, and is in proper agreement with ≽ on F; see also Lemma 7.1 in Section 7.2.2. Obviously, in this chapter the more-likely-than relation will not correspond to an additive probability; it will correspond to a "capacity," that is, a nonadditive probability; see Lemma 7.1.² We will make use of a sub-σ-algebra A^ua of A that should be thought of as containing unambiguous events, for example events generated by the spin of a roulette wheel, or by repeated tosses of a coin. We denote by F^ua the set of acts that are D–A^ua measurable; that is, F^ua contains the acts f for which f^(−1)(E) ∈ A^ua for each E ∈ D. We will assume that Savage's (1954) axioms are satisfied if attention is restricted to the unambiguous events and F^ua.
An event A ∈ Aua is null if fA h ∼ gA h for all f, g ∈ F ua; it is non-null otherwise.

7.2.2. Choquet expected utility

A function v : A → [0, 1] is a capacity if v(Ø) = 0, v(S) = 1, and v is monotonic with respect to set-inclusion, that is, A ⊃ B ⇒ v(A) ≥ v(B). The capacity v is a (finitely additive) probability measure if, in addition, v is additive, that is, v(A ∪ B) = v(A) + v(B) for all disjoint A, B. A capacity v is convex-ranged if for every A ⊃ C and every µ between v(A) and v(C) there exists A ⊃ B ⊃ C such that v(B) = µ. For a capacity v and a measurable function φ : S → R, the Choquet integral of φ (with respect to v), denoted ∫_S φ dv, or ∫ φ dv, or ∫ φ, and introduced in Choquet (1953–1954), is

∫_{R+} v({s ∈ S : φ(s) ≥ τ}) dτ + ∫_{R−} [v({s ∈ S : φ(s) ≥ τ}) − 1] dτ. (7.1)
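For a finite state space the integrand in (7.1) is a step function of τ, so the Choquet integral can be computed exactly. The following Python sketch is our illustration (not part of the chapter); it represents a capacity as a dict from frozensets of states to numbers:

```python
def choquet(phi, v):
    """Choquet integral of phi (dict: state -> real) with respect to the
    capacity v (dict: frozenset of states -> value in [0, 1]), computed
    via formula (7.1): integrate tau -> v({s : phi(s) >= tau}) over R+,
    and tau -> v({s : phi(s) >= tau}) - 1 over R-."""
    taus = sorted(set(phi.values()) | {0.0})  # breakpoints of the integrand
    total = 0.0
    for lo, hi in zip(taus, taus[1:]):
        # on the interval (lo, hi] the event {phi >= tau} is constant:
        event = frozenset(s for s in phi if phi[s] >= hi)
        if not event:
            w = 0.0                      # v(empty set) = 0
        elif len(event) == len(phi):
            w = 1.0                      # v(S) = 1
        else:
            w = v[event]
        if lo >= 0.0:
            total += (hi - lo) * w           # positive part of (7.1)
        else:
            total += (hi - lo) * (w - 1.0)   # negative part of (7.1)
    return total
```

With an additive v the Choquet integral reduces to an ordinary expectation; with a nonadditive v it need not.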
Axiomatization of nonadditive expected utility

In Wakker (1989b, Chapter VI) illustrations are given for the Choquet integral. We say that ≽ maximizes Choquet expected utility (CEU) if there exist a capacity v on A and a measurable utility function U : C → R such that the preference relation ≽ is represented by f ↦ ∫_S U(f(s)) dv; the latter is called the Choquet expected utility of f, denoted CEU(f). Suppose there are n states s1, …, sn and U(f(s1)) ≥ · · · ≥ U(f(sn)). Then

CEU(f) = Σ_{i=1}^{n−1} [U(f(si)) − U(f(si+1))] v({s1, …, si}) + U(f(sn)).
The proof of the following lemma is left to the reader.

Lemma 7.1. If ≽ on F maximizes CEU, then the relation ≽ on C is represented by the utility function U, and the relation ≽ on A is represented by the capacity v whenever U is nonconstant.
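Lemma 7.1 can be illustrated numerically. The sketch below (ours, with hypothetical names) evaluates CEU by the rank-ordered sum displayed above; for a bet αA β with U(α) = 1 and U(β) = 0 the sum collapses to v(A), so betting preferences reveal the capacity:

```python
def ceu(util, v):
    """CEU of a finite act via the rank-ordered sum
    sum_i [U(f(s_i)) - U(f(s_{i+1}))] * v({s_1,...,s_i}) + U(f(s_n)).
    util: dict state -> utility of f at that state; v: dict frozenset -> [0,1]."""
    states = sorted(util, key=util.get, reverse=True)  # best consequence first
    total = util[states[-1]]                           # U(f(s_n))
    for i in range(len(states) - 1):
        cum = frozenset(states[:i + 1])                # {s_1, ..., s_i}
        total += (util[states[i]] - util[states[i + 1]]) * v[cum]
    return total
```

For example, with v({a}) = 0.2 and v({a, b}) = 0.7, the bet paying 1 on {a, b} and 0 on c has CEU 0.7 = v({a, b}).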
7.3. The main result

Apart from the well-known postulates of Savage on the unambiguous acts, we shall use one additional postulate, “cumulative dominance” (P4 below), to govern preferences over ambiguous acts. It is a natural extension of Savage’s P4 to acts with more than two consequences. When restricted to acts with exactly two consequences, our P4 is identical to Savage’s P4. It is best appreciated as an adaptation of the stochastic dominance condition. Let us recall that stochastic dominance applies to decision making under risk, where for each uncertain event A ∈ A a probability P(A) is well specified, and usually C is an interval within R. In this setting, an act (or the probability distribution it generates over consequences) stochastically dominates another if it assigns to each cumulative consequence set³ at least as high a probability. In the present set-up, without probabilities attached to each event, it is natural to say that an act f stochastically (“cumulatively”) dominates an act g if the decision maker regards each cumulative consequence set as at least as likely under f as under g. Monotonicity with respect to stochastic dominance, reformulated with this adaptation, is our additional postulate P4. It turns out that this condition, in the presence of the usual conditions and Savage’s conditions on a rich set of unambiguous acts, is necessary and sufficient for CEU. To readers familiar with CEU and with Savage’s set-up, the proof of the main result may be transparent if P4 is assumed. We hope that this mathematical simplicity is viewed as a strength of the chapter, because P4, in our opinion, is an intuitively appealing assumption about behavior under uncertainty as well. We first state the axioms and then the main theorem, which is followed by a discussion.

Postulate P1. Weak ordering.

Postulate P2. (The sure-thing principle for unambiguous acts). For all events A and acts f, g, h, h′ with fA h, gA h, fA h′, gA h′ ∈ F ua:

fA h ≽ gA h ⇐⇒ fA h′ ≽ gA h′.
Postulate P3. For all events A ∈ A, acts f ∈ F, and consequences α, β: α ≽ β ⇒ αA f ≽ βA f. The reversed implication holds as well if A ∈ Aua, A is non-null, and f ∈ F ua.

Postulate P4. (Cumulative dominance). For all acts f, g we have: f ≽ g whenever f −1 (E) ≽ g −1 (E) for all cumulative consequence sets E.

Postulate P5. (Nontriviality). There exist consequences α, β such that α ≻ β.

Postulate P6. (Fineness of the unambiguous events). If α ∈ C and, for f ∈ F ua, g ∈ F, f ≻ g, then there exists a partition (A1, …, Am) of S, with all elements in Aua, such that αAj f ≻ g for all j; and the same holds with ≺ instead of ≻.

The following postulate is Gilboa’s adaptation of Savage’s P7 to the case of CEU. It is a technical condition, and is only needed for the extension of CEU to acts with infinite range. In order to state the postulate, we define an event A to be f-convex if for any s, s′ ∈ A and s′′ ∈ S, f(s) ≽ f(s′′) ≽ f(s′) ⇒ s′′ ∈ A. Note that, for some fixed s ∈ A, f(s)A h denotes the act that assigns f(s) to each state in A, and is identical to h on Ac.

Postulate P7. For all f, g ∈ F, and nonempty f-convex events A,

f(s)A f ≽ g for all s ∈ A ⇒ f ≽ g,

and the same holds with ≼ instead of ≽.

We now state the main theorem. In it, cardinal abbreviates “unique up to scale and location.”

Theorem 7.1. The following two statements are equivalent:

(i) The preference relation ≽ maximizes CEU for a bounded nonconstant utility function U on C, and for a capacity v on A. On Aua the capacity is additive and convex-ranged.

(ii) Postulates P1–P7 are satisfied.

Further, the utility function in statement (i) is cardinal, and the capacity is unique.

In this result, condition P4 can be weakened to the following “cumulative reduction” condition P4′, if in addition we include Savage’s P4 (i.e. our P4 restricted to two-consequence acts). Cumulative reduction says that the only relevant aspect of an act is its “decumulative” distribution. Cumulative reduction follows from two-fold application of P4, with the roles of f and g interchanged. This condition is the only implication of P4 that we shall use in the proof of Theorem 7.1 for acts
with more than two consequences. We have preferred to present the stronger P4 in the theorem because of its close relationship with stochastic dominance.

Postulate P4′. (Cumulative reduction). For all acts f, g we have: f ∼ g whenever f −1 (E) ∼ g −1 (E) for all cumulative consequence sets E.

Let us also point out that all conditions can be weakened to hold only for step acts, with the exception of P1, the act g in P6, and P7. If P4/P4′ is restricted to step acts, then cumulative consequence sets can be restricted to sets of the form {β ∈ C : β ≽ α} for some α ∈ C. The next example considers the case where the state space is a product space. These are the cases considered by Schmeidler. The above theorem applies to any case where there is a sub-σ-algebra isomorphic to the Borel sets on [0, 1] endowed with the Lebesgue measure; the latter is somewhat more general than product spaces. The technique of this chapter allows for more generality: the sets of ambiguous acts and events can be quite general, as long as the set of unambiguous acts and events is sufficiently rich. This will be explicated in Remark 7.1. A further generalization can be obtained in our one-stage approach by imposing on F ua the conditions of Gilboa (1987) which lead to CEU, instead of using Savage’s conditions which lead to additive expected utility. The proof of this more general result is almost identical to the proof of Theorem 7.1. In other words, as soon as there is a sufficiently rich subset of acts on which CEU holds, then by cumulative dominance CEU will spread over all acts. Alternatively, for the rich subset of acts, we could have taken the set of probability distributions over the consequences, with expected utility or rank-dependent utility maximized there. We chose Savage’s set-up because it is very appealing.

Example 7.1. Let [0, 1] be endowed with the usual Lebesgue measure (i.e. the uniform distribution) over the usual Borel σ-algebra. Ω can be any set endowed with any σ-algebra. Let S = Ω × [0, 1], endowed with the usual product σ-algebra; v is any capacity that assigns the Lebesgue measure of E to any set Ω × E. C can be any arbitrary set, and U : C → R any function, nonconstant to avoid triviality. Preferences maximize CEU.
With Aua the σ-algebra of all sets of the form Ω × E for E a Borel subset of [0, 1], all Postulates P1–P7 are satisfied.

Remark 7.1. The requirement that A should be a σ-algebra, and that all A–D measurable functions from S to C should be included in F, can be restricted to the unambiguous acts and events, as follows: Aua should be a σ-algebra, and all Aua–D measurable functions from S to C should be included in F. Then, in addition, the following adaptations should be made. First, the measurability requirement should be imposed that for all f ∈ F and cumulative consequence sets E, f −1 (E) ∈ A. Second, Postulate P3 should be required only if αA f, βA f ∈ F. Third, the nontriviality Postulate P5 should be changed as follows.
Postulate P5′. There exist consequences α ≻ β such that αA βAc ∈ F for all events A ∈ A.

P5′ as such is not a necessary condition for the CEU representation. Fourth and finally, for Postulate P7, needed for nonsimple acts, it should be required that for all acts f ∈ F, f-convex events A, and states s ∈ A, f(s)A f be contained in F (consequences can be “collapsed”). Note that this allows for great generality. For instance, A may consist of Aua, events described by a roulette wheel, and a collection of events entirely unrelated to the roulette wheel. There is no need to incorporate intersections or unions of events described by the roulette wheel and other events. Let us finally comment further on the uniqueness of the capacity in Theorem 7.1. Suppose Statement (i) in Theorem 7.1 holds. Would there exist CEU representations that also represent the preference relation but have v nonadditive on Aua? The following observation answers this question.

Observation 7.1. Suppose Statement (i) in Theorem 7.1 holds. If there exist three or more equivalence classes of consequences, then for any CEU representation the capacity will be additive on Aua. If there exist no more than two equivalence classes of consequences, then any capacity can be taken that is a strictly increasing transform of the capacity of Theorem 7.1.⁴
7.4. Revealed unambiguous events

In this section we characterize revealed unambiguous events and partitions, that is, those with respect to which the capacity is additive (defined hereafter). It is possible that a decision maker considers some events as ambiguous but nevertheless reveals an additive capacity with respect to these. The characterization of this section will lead to a generalization of the theorem of Anscombe and Aumann (1963). A capacity is additive on a partition {A1, …, Am} if v(A ∪ B) = v(A) + v(B) for all disjoint events A, B that are unions of elements of the partition. This is equivalent to additivity of the capacity on the algebra generated by the partition. A capacity is additive with respect to an event A if it is additive with respect to the partition {A, Ac}, that is, if v(A) = 1 − v(Ac). Gilboa (1989) used the term symmetry for a capacity that is additive with respect to each event. As shown there, symmetry does not imply that the capacity is additive. A capacity is additive if and only if it is additive on each partition, which holds if and only if it is additive on each partition consisting of three events (consider, for disjoint events A, B, the partition {A, B, (A ∪ B)c}). In the presence of the rich Aua in Theorem 7.1, the characterization of revealed unambiguous partitions is easy. Note that in CEU additivity of the capacity immediately leads to SEU. Machina and Schmeidler (1992) consider the case with an additive probability measure on the events, and a general (nonexpected utility) functional, such as used in Machina (1982). Like our main result, their main result weakens Savage’s sure-thing principle and strengthens his P4. Their P4 implies the sure-thing principle for two-consequence acts, which our P4
obviously does not. In addition, it implies, mainly in the presence of P6, our P4. The Ellsberg paradoxes give examples where their P4 is violated while our P4 is satisfied.

Proposition 7.1. Suppose Statement (i) in Theorem 7.1 holds. Let {A1, …, Am} be a partition. The following four statements are equivalent:

(i) The capacity is additive on the partition.

(ii) For all disjoint A and A′ that are unions of elements of the partition, and for disjoint unambiguous events B ua ∼ A, B′ua ∼ A′, we have A ∪ A′ ∼ B ua ∪ B′ua.

(iii) There exists an unambiguous partition {B1ua, …, Bmua} such that

α1_A1 · · · αm_Am ∼ α1_B1ua · · · αm_Bmua

for all consequences α1, …, αm, where α1_A1 · · · αm_Am denotes the act yielding αj on Aj.

(iv) For each unambiguous partition {B1ua, …, Bmua} we have:

A1 ∪ · · · ∪ Aj ∼ B1ua ∪ · · · ∪ Bjua for all j ⇒ α1_A1 · · · αm_Am ∼ α1_B1ua · · · αm_Bmua (7.2)

for all consequences α1, …, αm.
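Additivity on a finite partition, as in statement (i), can be verified mechanically. This sketch (our illustration; the names are hypothetical) checks v(A ∪ B) = v(A) + v(B) for all disjoint unions of cells:

```python
from itertools import combinations

def additive_on_partition(v, cells, tol=1e-9):
    """Check whether the capacity v (dict: frozenset -> value) satisfies
    v(A | B) == v(A) + v(B) for all disjoint events A, B that are unions
    of cells of the given partition (a list of frozensets)."""
    for r in range(1, len(cells)):
        for picked in combinations(cells, r):
            A = frozenset().union(*picked)
            rest = [c for c in cells if c not in picked]
            for r2 in range(1, len(rest) + 1):
                for picked2 in combinations(rest, r2):
                    B = frozenset().union(*picked2)
                    if abs(v[A | B] - (v[A] + v[B])) > tol:
                        return False
    return True
```

With v({a}) = v({b}) = 0.3 and v({a, b}) = 1, additivity fails on {{a}, {b}}, as in the Ellsberg-type examples above.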
We could obviously obtain additivity of the capacity v in Statement (i) of Theorem 7.1 by adding any of the conditions in Statements (ii), (iii), or (iv) above, for each partition, to Statement (ii) of Theorem 7.1. Given the importance of the result that can be derived from Statement (iv), let us make the condition explicit:

Postulate P4′′. (Reduction). For each partition {A1, …, Am} and each unambiguous partition {B1ua, …, Bmua}, (7.2) holds true.

If in the definition of reduction we had added the condition that the consequences in (7.2) are rank-ordered, that is, α1 ≽ · · · ≽ αm, then the condition would have been identical to P4′ (cumulative reduction) restricted to step acts, which is all of P4′ that is needed apart from its restriction to two-consequence acts (i.e. Savage’s P4). P4′′ resembles the reduction principle in Fishburn (1998), which is called neutrality in Yaari (1987). This principle says that if for two acts the consequences are in some sense equally likely, then the acts are equivalent.

Corollary 7.1. In Statement (i) of Theorem 7.1 additivity of the capacity can be added if in Statement (ii) P4 (cumulative dominance) is replaced by P4′′ (reduction) plus the restriction of P4 to two-consequence acts.

The above corollary can be regarded as a generalization of the results of Anscombe and Aumann (1963) and Fishburn (1967). Their structure is rich enough to satisfy P1–P3, P4′′, and P5–P7. The set-up of the above corollary is more general in exactly the same way that the set-up of Theorem 7.1 is more general than the result of Schmeidler (1989): the state space is not required to be a Cartesian product of ambiguous and unambiguous events. All that is needed is that the set of unambiguous events be rich enough. In the same way that Theorem 7.1 can be
considered a unification of the results of Schmeidler (1989) and Gilboa (1987), the above corollary can be considered a unification of the results of Anscombe and Aumann (1963) and Savage (1954). The key feature in either case is that the events generated by a random device are incorporated within the state space. We think this is more natural than the two-stage approach of Anscombe and Aumann (1963). In the practice of decision analysis, objective probabilities of events in Aua generated by a roulette wheel will typically be used, as in Lemma 7.A.1 in the Appendix, to elicit “unknown” probabilities. This in no way requires a two-stage structure. While Theorem 7.1 was (apart from convex-rangedness) less general than Gilboa’s result, the above corollary is a generalization of both Anscombe and Aumann’s result and Savage’s result. A generalization as indicated in Remark 7.1 can also be obtained for the above corollary. An earlier result along these lines, within the classical additive set-up, is Bernardo et al. (1985). Corollary 7.1 is more general, mainly because, unlike Bernardo et al., we do not require a stochastic independence relation as a primitive, or the existence of independent unambiguous events.
7.5. Nonequivalence of one- and two-stage approaches

Schmeidler made the novel contribution of showing that CEU is capable of permitting attitudes toward ambiguity that are disallowed by Savage’s SEU. Schmeidler stated his axioms using the horserace–roulette wheel set-up of Anscombe and Aumann (1963). This is a two-stage set-up; that is, in the first stage an event (e.g. the horse Secretariat winning) obtains, and in the second stage the consequence is determined depending, for example, on a roulette wheel. In Schmeidler’s model capacities are assigned to first-stage events. Further, the lotteries in the second stage are evaluated by the usual additive expected utility. An act assigns to each first-stage event a lottery, thus an expected utility value. The Choquet integral of these (with respect to the capacity over the first-stage events) gives the evaluation of the act. In our one-stage approach we embed the roulette wheel lotteries within Savage’s formulation by enlarging the state space S. Our one-stage approach is complementary to the two-stage approach of Schmeidler, as it provides additional flexibility in modeling decisions under uncertainty. This one-stage approach to CEU was introduced in Becker and Sarin (1989). In SEU theory, whether the one-stage or a two-stage approach is employed is purely a matter of taste or convenience in modeling. In the CEU framework, however, these two variations produce theoretically different results. We demonstrate this theoretical nonequivalence of one- and two-stage approaches through an example. Our analysis gives further evidence that multi-stage set-ups in nonexpected utility may cause complications. Gärdenfors and Sahlin (1983), Luce and Narens (1985), Luce (1988, 1991, 1992), Luce and Fishburn (1991), and Segal (1987, 1990) focus on distinctions between one- and two-or-more-stage set-ups. Segal (1990) uses a two-stage set-up to describe an ambiguous event.
Probabilities within each stage are assumed to be additive, but they do not follow multiplicative rules between the two stages.

[Figure 7.1: (a) Two-stage formulation of Example 7.2, a horserace–roulette tree with first-stage events H b, T b and second-stage events H ub, T ub; (b) one-stage formulation of Example 7.2 over the states H b H ub, H b T ub, T b H ub, T b T ub. Consequences are in utilities; the act f yields 1 on H b H ub, −1 on H b T ub, and 0 otherwise.]

Segal showed how dominance-type axioms can provide nonexpected utility characterizations in the two-stage set-up (see also Wakker, 1993b).

Example 7.2. This example is a small variation on one of the paradoxes of Ellsberg. The preferences used in the example are consistent with those observed in the Ellsberg paradox. Further, the single-stage capacities are uniquely determined by the equivalent two-stage model of Schmeidler. Suppose a biased coin and an unbiased coin will be tossed. The possible states of nature are H b H ub, H b T ub, T b H ub, T b T ub, where H b T ub denotes the state where the biased coin lands heads up and the unbiased coin lands tails up, and so on. For simplicity assume that utility is known and that payment is in utility. It follows in Schmeidler’s model that subjects consider a bet of 1 on H ub⁵ as well as a bet of 1 on T ub equivalent to 1/2 for certain (given that payment is in utility). It has been observed that subjects will typically consider a bet of 1 on H b, as well as a bet of 1 on T b, less preferable. Let us assume the latter bets are equivalent to α for certain, for some number α < 1/2. In the two-stage set-up of Anscombe–Aumann and Schmeidler, decisions are formulated as shown in Figure 7.1(a). For the act f shown in Figure 7.1(a), the two-stage approach yields CEU(f) = 0, because the probability of H ub and of T ub is 1/2. Thus, f is judged indifferent to a constant act g with consequence 0. Note that our assumption stated in the preceding paragraph implies that, with v m denoting the capacity in the two-stage approach, v m (H b) = v m (T b) = α. Now consider the one-stage formulation of the act in Figure 7.1(a) as depicted in Figure 7.1(b). To evaluate CEU(f) in Figure 7.1(b) we need the single-stage capacities, now denoted v j to distinguish them from the capacities in the two-stage approach: v j ({H b H ub}) and v j ({H b H ub, T b H ub, T b T ub}). For consistency with
the two-stage approach (see the boxed columns in Figure 7.1(a) and (b)), the first column in Schmeidler’s two-stage approach is equivalent to α/2 and the second column to α × 1 + (1 − α) × 1/2 = 1/2 + α/2, so v j ({H b H ub}) = α/2 and v j ({H b H ub, T b H ub, T b T ub}) = 1/2 + α/2 must be chosen. Hence, in the one-stage approach, CEU(f) is α/2 + (1 − (1/2 + α/2)) × (−1) = α − 1/2 < 0; it follows that f ≺ g (≡ 0). Thus the one- and the two-stage approaches yield different results, and are irreconcilable. They agree only in the additive case α = 1/2. In Sarin and Wakker (1990: Theorem 10) it is shown that the result of the above example holds in full generality. That is to say, only under expected utility can the one- and two-stage approaches of CEU be equivalent. As soon as the capacity is nonadditive in Schmeidler’s two-stage approach, the equivalent one-stage approach is not a CEU model.
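The arithmetic of Example 7.2 is easy to verify. The sketch below (ours; the function names are hypothetical) computes CEU(f) in both formulations for a given α ≤ 1/2:

```python
def two_stage_ceu(alpha):
    """Schmeidler's two-stage evaluation of the act f of Example 7.2:
    on H^b the fair-coin lottery (+1 on H^ub, -1 on T^ub), on T^b the
    constant lottery 0; v^m(H^b) = v^m(T^b) = alpha."""
    eu_hb = 0.5 * 1 + 0.5 * (-1)   # expected utility of the lottery on H^b
    eu_tb = 0.0                    # expected utility of the lottery on T^b
    hi, lo = max(eu_hb, eu_tb), min(eu_hb, eu_tb)
    return (hi - lo) * alpha + lo  # Choquet over the two first-stage events

def one_stage_ceu(alpha):
    """One-stage evaluation with the single-stage capacities forced by
    consistency: v^j({H^b H^ub}) = alpha/2 and
    v^j({H^b H^ub, T^b H^ub, T^b T^ub}) = 1/2 + alpha/2."""
    v_top = alpha / 2               # capacity of the best consequence's event
    v_top_or_mid = 0.5 + alpha / 2  # capacity of the event {consequence >= 0}
    # rank-ordered sum for the consequences 1 > 0 > -1:
    return (1 - 0) * v_top + (0 - (-1)) * v_top_or_mid + (-1)
```

For α < 1/2 the two evaluations disagree (0 versus α − 1/2 < 0); they coincide only in the additive case α = 1/2.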
7.6. Conclusion

Savage’s SEU theory is widely accepted as a rational theory of decision making under uncertainty in economics and the decision sciences. Unfortunately, however, people’s choices violate the axioms of SEU theory in some well-defined situations. One such situation is when event probabilities are ambiguous. In this chapter we have shown that a simple extension of SEU theory, called CEU theory, can be derived by assuming a natural cumulative dominance condition. CEU permits a subject to assign probabilities to events so that the probability of a union of two disjoint events is not necessarily the sum of the individual event probabilities. The violation of additivity may occur because a person’s choice may be influenced by the degree of confidence or specificity about the event probabilities. Schmeidler and Gilboa have also proposed axioms to derive the CEU representation. Building on their work, we have provided the simplest derivation of CEU presently available. Also, conditions have been given under which CEU reduces to SEU. It is also shown that, unlike in SEU theory, where the one-stage set-up of Savage and the two-stage set-up of Anscombe and Aumann yield identical results, the two-stage set-up of Schmeidler cannot be reconciled with a one-stage formulation unless event probabilities are additive. In our opinion the one-stage set-up as used by Gilboa seems more appropriate in single-person decision theory. We hope that our work has clarified the distinction between CEU and SEU theories and that it will stimulate further research and additional explorations of CEU.
Appendix: Proofs

A1. Proof of Theorem 7.1, Remark 7.1, and Observation 7.1

For the implication (i) ⇒ (ii) in Theorem 7.1, suppose (i) holds. Then P1 follows directly. P2 and P3 are standard results from, mainly, the usual additive expected utility theory. For Postulate P4, note that if [f −1 (E) ≽ g −1 (E) for all cumulative consequence sets E], then by Lemma 7.1 the integrand in (7.1) is at least as large for φ = U ◦ f as for φ = U ◦ g. So f ≽ g, as P4 requires. P5 is direct from
nonconstantness of U. For P6, let f ∈ F ua, g ∈ F, f ≻ g (the case f ≺ g is similar) and α ∈ C. By boundedness of utility, there exists µ > 0 such that ∀s ∈ S: |U(f(s)) − U(α)| < µ. Because v is convex-ranged within Aua, we can take a partition {A1, …, Am} of S such that Aj ∈ Aua and v(Aj) < (CEU(f) − CEU(g))/µ for all j. For P7, let f, g ∈ F, and let A ∈ A be a nonempty event (f-convexity of A will not be used). Then, with U∗ = U ◦ f on Ac, and U∗ = inf_A U ◦ f on A (the inf is real-valued by nonemptiness of A and boundedness of U), the premise in P7 implies

∫ U ◦ f dv ≥ ∫ U∗ dv = inf_{s∈A} ∫ U ◦ (f(s)A f) dv ≥ CEU(g).

Next we suppose (ii) holds, and derive (i) and the uniqueness results, including Observation 7.1. It is immediate that Savage’s postulates P1–P6 hold true on F ua. So we get an SEU representation on F s,ua, which denotes the set of step acts in F ua: there exist a cardinal utility function U : C → R and a unique additive probability measure P on Aua such that expected utility represents preferences on F s,ua. We call P(A) the “probability” of A. As follows from Savage (1954), P is atomless and satisfies convex-rangedness. Obviously, P will be the restriction of v to Aua. Let us next extend the CEU representation, as now established for all unambiguous step acts, to all step acts. First, we define the capacity v. By P5 there are consequences ζ ≻ η, which are kept fixed throughout the proof.

Lemma 7.A.1. For each event A there exists an Aua ∈ Aua such that ζA η ∼ ζAua η.

Proof. By P3, ζS η ≽ ζA η ≽ ζØ η. Suppose that in fact ζS η ≻ ζA η ≻ ζØ η (otherwise we are done immediately), and that for an event B ua ∈ Aua we have ζA η ≻ ζB ua η (e.g. B ua = Ø). This implies P((B ua)c) > 0. By P6, there exists a partition C1, …, Cn of S, with all Cj ∈ Aua, such that ζB ua ∪ Cj η ≺ ζA η for all j. There exists at least one Cj ∩ (B ua)c with strictly positive probability. So there exists an event B′ua := B ua ∪ Cj with probability strictly greater than that of B ua, and such that still ζA η ≻ ζB′ua η. So, using convex-rangedness, the set of probabilities of events B ua as above must be of the form [0, p−[ for some 0 < p− ≤ 1. Similarly, the set of probabilities of events C ua ∈ Aua such that ζA η ≺ ζC ua η must be of the form ]p+, 1] for some 0 ≤ p+ < 1. The only possibility is p− = p+. By convex-rangedness there exists an event Aua ∈ Aua with probability p−. Now ζA η ∼ ζAua η is the only possibility. Q.E.D.

Thus, for every A ∈ A, there exists an Aua that is equally likely.
Because each possible choice of Aua has the same P-value, we can define v : A → [0, 1] by v(A) := P(Aua), extending v from Aua (where v = P) to the entire A. For monotonicity with respect to set-inclusion, suppose that A ⊃ B. Then, by P3, ζA η ≽ ζB η. From this v(A) ≥ v(B) follows, and v is a capacity. To establish the CEU representation for all step acts, we construct for each ambiguous step act an unambiguous one “with the same cumulative distribution.”
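The construction of a matching unambiguous act can be sketched concretely when the unambiguous events are intervals of [0, 1] with Lebesgue measure, as in Example 7.1. Given the rank-ordered consequences of a step act and the capacities of its cumulative events, the following illustration (ours; the helper name is hypothetical) returns interval cells whose cumulative Lebesgue measures match:

```python
def matching_unambiguous_act(consequences, cum_caps):
    """consequences: list of consequences, best first; cum_caps: a
    nondecreasing list with cum_caps[j] = v({s_1, ..., s_{j+1}}), ending
    in 1.0. Returns a dict mapping each consequence to an interval
    [l, r) of [0, 1], so that the first j cells together have Lebesgue
    measure cum_caps[j-1], matching the original cumulative events."""
    bounds = [0.0] + list(cum_caps)
    return {c: (bounds[i], bounds[i + 1]) for i, c in enumerate(consequences)}
```

By construction the cumulative events of the interval act carry exactly the capacities of the original act, which is what the two-fold application of P4 below exploits.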
That is, for the ambiguous and the unambiguous act the events of obtaining a consequence at least as good as α are equally likely, for each consequence α. For step acts this is not only necessary, but also sufficient, to have all cumulative consequence sets equally likely under the two acts. First, we extend Lemma 7.A.1. The proof of the extension is completely similar, with µ, ν in the place of ζ, η, further f in the place of ζA η, and µ ≽ f ≽ ν implied by P4.
Lemma 7.A.2. For each act f for which there exist consequences µ, ν such that [∀s ∈ S: µ ≽ f(s) ≽ ν], there exists an Aua ∈ Aua such that µAua ν ∼ f.

Obviously, by the SEU representation as already established, ζAua η ∼ ζB ua η for each unambiguous event B ua equally likely as Aua. By convex-rangedness of P and Lemma 7.A.1, for each partition A1, …, Am of S we can find an unambiguous partition B1, …, Bm of S such that A1 ∪ · · · ∪ Aj is equally likely as B1 ∪ · · · ∪ Bj, for each j. To do so, first we find an unambiguous B1′ ∼ A1, and set B1 := B1′. Next we find an unambiguous B2′ ∼ A1 ∪ A2. By convex-rangedness of P, we can find an unambiguous B2 with B2 ∩ B1 = Ø such that P(B1 ∪ B2) = P(B2′), so that B1 ∪ B2 ∼ A1 ∪ A2, and so on. The next paragraph is the central part of the proof, and is simple. The other parts of the proof are all standard after Savage (1954), using Gilboa’s (1987) P7.

Let α1_A1 · · · αm_Am be an arbitrary step act, with α1 ≽ · · · ≽ αm. We take an unambiguous partition {B1, …, Bm} as described earlier. The unambiguous act α1_B1 · · · αm_Bm, by two-fold application of P4 (once with ≽, once with ≼), is equivalent to the ambiguous act. Its SEU value can, similarly to the Choquet integral, be written as

P(B1)U(α1) + [P(B1 ∪ B2) − P(B1)]U(α2) + · · · + [1 − P(B1 ∪ · · · ∪ Bm−1)]U(αm).

This shows that it is identical to the CEU value of the ambiguous act. So indeed CEU represents preferences between all step acts. The extension of the CEU representation to non-step acts is mainly by P7, and is similar to Gilboa (1987). Note that this in particular establishes the expected utility representation on the entire set F ua. Contrary to Gilboa (1987), our capacity need not be convex-ranged. We can however follow the reasoning of his subsection 4.3 with only unambiguous step acts f̄, ḡ. Convex-rangedness is used there for the existence of ḡ, while convex-rangedness of P suffices for that.
In the proof of his Theorem 4.3.4, in Statement (i), the act f̄ can always be chosen unambiguous, by Lemma 7.A.2. Let us also mention that one cannot restrict P7 to F ua. This would be possible if for each ambiguous act there existed an unambiguous act with the same cumulative distribution. This however is not the case in general. For example, if P is countably additive, then it cannot generate strictly finitely additive distributions; for example, with C = R, it does not generate cumulative distribution functions that are not continuous from the right. Also it is possible that, for instance, U(C) = [0, 1[, P is countably additive, and there exists a positive ε such that under an ambiguous act f each cumulative event {s ∈ S : U(f(s)) ≥ α} (0 ≤ α < 1) has capacity at least ε.
The utility functions must be bounded, as follows from the representation on F ua. This is shown in Fishburn (1970: Section 14.1), and in the second, 1972, edition of Savage (1954: footnote on p. 80). Finally we establish the uniqueness results. By the standard results of Savage (1954) we get cardinality of U, and uniqueness of the restriction P of v to Aua. The extension of v to A \ Aua shows that v is uniquely determined. Next let us suppose that v is allowed to be nonadditive on Aua, as studied in Observation 7.1. Let us at first also suppose that there are three or more nonequivalent consequences. Then the representation, if restricted to F ua, satisfies all conditions in Gilboa (1987); hence by his uniqueness results the restriction of v to Aua is unique, so additive. The uniqueness of v follows in the same way as above. Let us finally consider the case where there are exactly two equivalence classes of consequences, with say ζ ≻ η. Any U′ instead of U in a CEU representation is constant on equivalence classes of consequences and satisfies U′(ζ) > U′(η). So U′ is a strictly increasing transform of U, and obviously is bounded. Given the two-valued range, U is cardinal. Because any v′ in a CEU representation has to represent the same ordering over events as v, v′ must be a strictly increasing transform of v. Conversely, any such v′ will do. Thus, it is possible to choose v′ such that it is not convex-ranged and not additive on Aua. It can however always be chosen such that it is convex-ranged and additive on Aua. For the proof of Remark 7.1, note that all constructions in the proof of the implication (ii) ⇒ (i) of Theorem 7.1 (including the extension to nonsimple acts, following Gilboa 1987) remain possible under the conditions of Remark 7.1. Our result has not established convex-rangedness of the capacity v. That can be characterized by the addition of one condition, Gilboa’s P6*.
We propose to rename this condition “solvability.” Solvability is satisfied if for all acts f, g, consequences α ≻ β, and events A, if αA f ≽ g ≽ βA f, with αA f, βA f “comonotonic” (∀s ∈ Ac: f(s) ≽ α or β ≽ f(s)), there exists an event B ⊂ A such that αB βA\B fAc ∼ g. That solvability, even if restricted to two-consequence acts, is sufficient for convex-rangedness of v follows mainly from convex-rangedness of P, which gives all desired “intermediate” g. Necessity is straightforward.

Proposition 7.A.1. Suppose Statement (i) in Theorem 7.1 holds. Then v is convex-ranged if and only if ≽ satisfies solvability.

For the case of three or more equivalence classes of consequences, a more general derivation, without use of F ua, is given in Gilboa (1987). If there are exactly two equivalence classes of consequences and v is not required to be additive on Aua, then, by Observation 7.1, v need not be convex-ranged, even if solvability is satisfied. The following example shows why we used cumulative consequence sets, instead of the less general sets of the form {β ∈ C : β ≽ α} for α ∈ C, in the definition P4 of cumulative dominance and its derivatives P4′ and P4′′. Note that the distinction is relevant only for nonstep acts, and that we could have restricted
Rakesh Sarin and Peter P. Wakker
P4, P4′, P4′′ to step acts. In that case, we could have used the less general sets as mentioned earlier.

Example 7.A.1. Suppose the special case of Statement (i) in Theorem 7.1 holds where in fact all of Savage's axioms are satisfied. So v is an additive probability measure, which we denote by P. Let C = {−1/j: j ∈ N} ∪ {1 + 1/j: j ∈ N}, and let U be the identity. Let {A_j}_{j=1}^∞ ∪ {B_j}_{j=1}^∞ be a partition of S, and {A′_j}_{j=1}^∞ ∪ {B′_j}_{j=1}^∞ another partition of S. A := ∪_{j=1}^∞ A_j; B, A′, B′ are defined similarly. Suppose that P(A_j) = P(A′_j) and P(B_j) = P(B′_j) for all j. Further suppose that P(A) > P(A′). Such cases can be constructed if P is not set-continuous, that is, not countably additive. Let f assign 1 + 1/j to each A_j, and −1/j to each B_j. Similarly f′ assigns 1 + 1/j to each A′_j, and −1/j to each B′_j. For each consequence 1 + 1/j we have P(f(s) ⪰ 1 + 1/j) = Σ_{i=1}^j P(A_i) = Σ_{i=1}^j P(A′_i) = P(f′(s) ⪰ 1 + 1/j). For each consequence −1/j we have P(f(s) ⪰ −1/j) = 1 − P(f(s) ≺ −1/j) = 1 − Σ_{i=1}^{j−1} P(B_i) = 1 − Σ_{i=1}^{j−1} P(B′_i) = P(f′(s) ⪰ −1/j). So for each α ∈ C: {s ∈ S: f(s) ⪰ α} ∼ {s ∈ S: f′(s) ⪰ α}. However, for 0 < µ < 1, P(f(s) ⪰ µ) = P(A) > P(A′) = P(f′(s) ⪰ µ). By Formula (7.1), CEU(f) − CEU(f′) = 1 × (P(A) − P(A′)) > 0. So f ≻ f′. Only for cumulative consequence sets E := [µ, ∞[ with µ as above do we not have f^{−1}(E) ∼ f′^{−1}(E).

A1. Proof of Proposition 7.1

The implications (i) ⇒ (ii) and (i) ⇒ (iv) are direct. The implication (i) ⇒ (iii) follows from convex-rangedness of P. Next we prove that Statement (i) is implied by each of the other statements. (ii) ⇒ (i) is direct. (iii) ⇒ (ii) follows from taking A and A′ as unions of A_j's, taking B^ua and B′^ua as unions of corresponding B_j^ua's, and from the equivalences (with ζ ≻ η) ζ_A η ∼ ζ_{B^ua} η, ζ_{A′} η ∼ ζ_{B′^ua} η, ζ_{A∪A′} η ∼ ζ_{B^ua ∪ B′^ua} η. Finally, suppose (iv) holds.
Similarly to the reasoning below Lemma 7.A.2, we can show the existence of an unambiguous partition {B_1^ua, ..., B_m^ua} such that A_1 ∪ ··· ∪ A_j ∼ B_1^ua ∪ ··· ∪ B_j^ua for all j. For any A′ that is a union of A_j's-different-from-A_1, and B′^ua a union of corresponding B_j^ua's, we have, by (7.2), ζ_{A′∪A_1} η ∼ ζ_{B′^ua ∪ B_1^ua} η and ζ_{A′} η ∼ ζ_{B′^ua} η. Taking differences and dividing by the positive U(ζ) − U(η) we get v(A′ ∪ A_1) − v(A′) = P(B′^ua ∪ B_1^ua) − P(B′^ua) = P(B_1^ua). So the "decision weight" that A_1 contributes to each union of the other A_j's is independent of those other A_j's. The same holds for each A_j. Hence, the capacity of a union of different A_j's is the sum of the separate capacities: v is additive on the partition.
Acknowledgments The support for this research was provided in part by the Decision, Risk, and Management Science branch of the National Science Foundation; the research of Peter Wakker has been made possible by a fellowship of the Royal Netherlands Academy of Arts and Sciences, and a fellowship of the Netherlands Organization for Scientific Research. The authors are thankful to two referees for many detailed comments.
Axiomatization of nonadditive expected utility
Notes

1 Every step act is "simple," that is, is measurable and has a finite range. If D contains every one-element subset, then every simple act is a step act. Step acts turn out to be easier to work with than simple acts.
2 Sometimes a nonadditive capacity is a strictly increasing transform of a probability measure, which then also represents the "more-likely-than" relation. In general, however, a capacity will not be of that form.
3 For example, receiving α or a superior consequence.
4 Only one will be additive on A^ua, of course.
5 Such a bet gives 1 if H^ub obtains and 0 if T^ub obtains.
References Anscombe, F. J. and R. J. Aumann (1963). “A Definition of Subjective Probability,” Annals of Mathematical Statistics, 34, 199–205. Becker, J. L. and R. K. Sarin (1989). “Economics of Ambiguity,” Duke University, Fuqua School of Business, Durham, NC, USA. Bernardo, J. M., J. R. Ferrandiz, and A. F. M. Smith (1985). “The Foundations of Decision Theory: An Intuitive, Operational Approach with Mathematical Extensions,” Theory and Decision, 19, 127–150. Choquet, G. (1953–1954). “Theory of Capacities,” Annales de l’Institut Fourier, 5 (Grenoble), 131–295. de Finetti, B. (1931). “Sul Significato Soggettivo della Probabilità,” Fundamenta Mathematicae, 17, 298–329. —— (1937). “La Prévision: Ses Lois Logiques, ses Sources Subjectives,” Annales de l’ Institut Henri Poincaré, 7, 1–68. Translated into English by H. E. Kyburg, “Foresight: Its Logical Laws, its Subjective Sources,” in Studies in Subjective Probability, ed. by H. E. Kyburg and H. E. Smokler. New York: Wiley, 1964, 53–118; 2nd edition, New York: Krieger, 1980. Ellsberg, D. (1961). “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669. Fishburn, P. C. (1967). “Preference-Based Definitions of Subjective Probability,” Annals of Mathematical Statistics, 38, 1605–1617. —— (1970). Utility Theory for Decision Making. New York: Wiley. —— (1982). The Foundations of Expected Utility. Dordrecht: Reidel. —— (1988). Nonlinear Preference and Utility Theory. Baltimore: Johns Hopkins University Press. —— (1991). “On the Theory of Ambiguity,” International Journal of Information and Management Science, 2, 1–16. Gärdenfors, P. and N.-E. Sahlin (1983). “Decision Making with Unreliable Probabilities,” British Journal of Mathematical and Statistical Psychology, 36, 240–251. Gilboa, I. (1987). “Expected Utility with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88. —— (1989). 
“Duality in Non-Additive Expected Utility Theory,” in Choice under Uncertainty, Annals of Operations Research, ed. by P. C. Fishburn and I. H. LaValle. Basel: J. C. Baltzer AG, 405–414. Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan. Second edition, 1948. Knight, F. H. (1921). Risk, Uncertainty, and Profit. New York: Houghton Mifflin.
Luce, R. D. (1988). "Rank-Dependent, Subjective Expected-Utility Representations," Journal of Risk and Uncertainty, 1, 305–332. —— (1991). "Rank- and Sign-Dependent Linear Utility Models for Binary Gambles," Journal of Economic Theory, 53, 75–100. —— (1992). "Where Does Subjective Expected Utility Fail Descriptively?" Journal of Risk and Uncertainty, 4, 5–27. Luce, R. D. and P. C. Fishburn (1991). "Rank- and Sign-Dependent Linear Utility Models for Finite First-Order Gambles," Journal of Risk and Uncertainty, 4, 29–59. Luce, R. D. and L. Narens (1985). "Classification of Concatenation Measurement Structures According to Scale Type," Journal of Mathematical Psychology, 29, 1–72. Machina, M. J. (1982). " 'Expected Utility' Analysis without the Independence Axiom," Econometrica, 50, 277–323. Machina, M. J. and D. J. Schmeidler (1992). "A More Robust Definition of Subjective Probability," Econometrica, 60, 745–780. Nakamura, Y. (1990). "Subjective Expected Utility with Non-Additive Probabilities on Finite State Spaces," Journal of Economic Theory, 51, 346–366. —— (1992). "Multi-Symmetric Structures and Non-Expected Utility," Journal of Mathematical Psychology, 36, 375–395. Ramsey, F. P. (1931). "Truth and Probability," in The Foundations of Mathematics and other Logical Essays. London: Routledge and Kegan Paul, 156–198. Reprinted in Studies in Subjective Probability, ed. by H. E. Kyburg and H. E. Smokler. New York: Wiley, 1964, 61–92. Sarin, R. and P. P. Wakker (1990). "Incorporating Attitudes towards Ambiguity in Savage's Set-up," University of California, Los Angeles, CA. Savage, L. J. (1954). The Foundations of Statistics. New York: Wiley. Second edition, New York: Dover, 1972. Schmeidler, D. (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.) Segal, U. (1987). "The Ellsberg Paradox and Risk Aversion: An Anticipated Utility Approach," International Economic Review, 28, 175–202.
—— (1990). "Two-Stage Lotteries without the Reduction Axiom," Econometrica, 58, 349–377. Wakker, P. P. (1989a). "Continuous Subjective Expected Utility with Nonadditive Probabilities," Journal of Mathematical Economics, 18, 1–27. —— (1989b). Additive Representations of Preferences, A New Foundation of Decision Analysis. Dordrecht: Kluwer. —— (1990). "Under Stochastic Dominance Choquet-Expected Utility and Anticipated Utility are Identical," Theory and Decision, 29, 119–132. —— (1991). "Additive Representations of Preferences on Rank-Ordered Sets I. The Algebraic Approach," Journal of Mathematical Psychology, 35, 501–531. —— (1993a). "Unbounded Utility for Savage's 'Foundations of Statistics', and Other Models," Mathematics of Operations Research, 18, 446–485. —— (1993b). "Counterexamples to Segal's Measure Representation Theorem," Journal of Risk and Uncertainty, 6, 91–98. Yaari, M. E. (1987). "The Dual Theory of Choice under Risk," Econometrica, 55, 95–115.
8
Updating ambiguous beliefs Itzhak Gilboa and David Schmeidler
8.1. Introduction

The Bayesian approach to decision making under uncertainty prescribes that a decision maker have a unique prior probability and a utility function such that decisions are made so as to maximize the expected utility. In particular, in a statistical inference problem the decision maker is assumed to have a probability distribution over all possible distributions which may govern a certain random process. This paradigm was justified by axiomatic treatments, most notably that of Savage (1954), and it enjoys unrivaled popularity in economic theory, game theory, and so forth. However, this theory is challenged by two classes of evidence: on the one hand, there are experiments and thought experiments (such as Ellsberg (1961) and many others) which seem to show that individuals tend to violate the consistency conditions underlying (and implied by) the Bayesian approach. On the other hand, people seem to have difficulties with specification of a prior for actual statistical inference problems. Thus, classical—rather than Bayesian—methods are used for practical purposes, although they are theoretically less satisfactory. The last decade has witnessed—among numerous and various generalizations of von Neumann and Morgenstern's (1947) expected utility theory—generalizations of the Bayesian paradigm as well. We will not attempt to provide a survey of them here. Instead, we only mention the models which are relevant to the sequel.

1 Nonadditive probabilities. First introduced by Schmeidler (1982, 1986, 1989) and also axiomatized in Gilboa (1987), Fishburn (1988), and Wakker (1989), nonadditive probabilities are monotone set-functions which do not have to satisfy additivity. Using the Choquet integral (Choquet, 1953–1954) one may define expected utility, and the works cited before axiomatize preference relations which are representable by expected utility in this sense.
2 Multiple priors (MP). As axiomatized by Gilboa and Schmeidler (1989), this model assumes that the decision maker has a set of priors, and each alternative
Gilboa, I. and D. Schmeidler (1993). Updating ambiguous beliefs, J. Econ. Theory, 59, 33–49.
is assessed according to its minimal expected utility, where the minimum is taken over all priors in the set. (This idea is also related to Bewley (1986–1988), who suggests a partial order over alternatives, such that one alternative is preferred to another only if its expected utility is higher according to all priors in the set.)
Of particular interest to this study is the intersection of the two models: it turns out that if a nonadditive measure (NA) exhibits uncertainty aversion (technically, if it is convex in the sense v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B)), then the Choquet integral of a real-valued function with respect to v equals the minimum of all its integrals with respect to additive priors taken from the core of v. (The core is defined as in cooperative game theory, that is, p is in the core of v if p(A) ≥ v(A) for every event A, with equality for the whole sample space. Convex NA have nonempty cores.) While these models—as many others—suggested generalizations of the Bayesian approach for a one-shot decision problem, they shed very little light on the problem of dynamically updating probabilities as new information is gathered. We find this problem to be of paramount importance for several interrelated reasons:

1 The theoretical validity of any model of decision making under uncertainty is quite dubious if it cannot cope successfully with the dynamic aspect.
2 The updating problem is at the heart of statistical theory. In fact, it may be viewed as the problem statistical inference is trying to solve. Some of the works in the statistical literature which pertain to this study are Agnew (1985), Genest and Schervish (1985), and Lindley et al. (1979).
3 Applications of these models to economic and game theory models require some assumptions on how economic agents change their beliefs over time. The question naturally arises, then: What are the reasonable ways to update such beliefs?
4 The theory of artificial intelligence, which in general seems to have much in common with the foundations of economic, decision, and game theory, also tries to cope with this problem. See, for instance, Fagin and Halpern (1989), Halpern and Fagin (1989), and Halpern and Tuttle (1989).
In this study we try to deal with the problem axiomatically and suggest plausible update rules which satisfy some basic requirements. We present a family of pseudo-Bayesian rules, each of which may be considered a generalization of Bayes' rule for a unique additive prior. We also present a family of "classical" update rules, each of which starts out with a given set of priors, possibly rules some of them out in the face of new information, and continues with the (Bayesian) updates of the remaining ones. In particular, a maximum-likelihood-update rule would be the following: consider only those priors which ascribe the maximal probability to the event
that is known to have occurred, update each of them according to Bayes' rule, and continue in this fashion. It turns out that if the set of priors one starts out with can also be represented by a nonadditive probability, the results of this rule are independent of the order in which information is gathered. Furthermore, for those preferences which can be simultaneously represented by an NA and by MP, the maximum-likelihood-update rule coincides with one of the more intuitive Bayesian rules, and they boil down to the Dempster–Shafer rule (see Dempster (1967, 1968), Shafer (1976), and Smets (1986)). For recent work on belief functions and their updating, see Jaffray (1989), Chateauneuf and Jaffray (1989), and especially Jaffray (1990). Thus, we find that an axiomatically based generalization of the Bayesian approach can accommodate MP (which are used in classical statistics). Moreover, the maximum-likelihood principle, which is at the heart of statistical inference (and implicit in the techniques of confidence sets and hypothesis testing) coincides with the generalized Bayesian updating. Due to the prominence of this rule, it may be a source of insight to study it in a simple example. Consider Ellsberg's example in which an urn with 90 balls is given, out of which 30 are red, and 60 are either blue or yellow. For simplicity of exposition, let us model this situation in a somewhat extreme fashion, allowing for all distributions of blue and yellow balls. Maxmin expected utility with respect to this set of priors is equivalent to the maximization of the Choquet integral of utility with respect to (w.r.t.) an NA v defined as

v(R) = 1/3,  v(B) = v(Y) = 0,
v(R ∪ B) = v(R ∪ Y) = 1/3,  v(B ∪ Y) = 2/3,
v(R ∪ B ∪ Y) = 1,
where R, B, and Y denote the events of a red, blue, or yellow ball being drawn, respectively. Assume now that it is known that a ball (which, say, has already been drawn) is not red. Conditioning on the event B ∪ Y, all priors in the set ascribe probability of 2/3 to it. Thus, they are all left in the set and updated according to Bayes' rule. This captures our intuition that no ambiguity was resolved, and our complete ignorance regarding the event B ∪ Y has not changed. (Actually, it is now highlighted by the fact that this event, about which we know the least, is now known to have occurred.) Consider, on the other hand, the same update rules in the case that R ∪ B is known. The priors we started out with ascribe to this event probabilities ranging from 1/3 to 2/3. According to the maximum-likelihood principle, only one of them is chosen—namely, the p which satisfies

p(R) = 1/3,  p(B) = 2/3,  p(Y) = 0.
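These two conditioning exercises can be replayed numerically. Below is a minimal Python sketch; the grid over the set of priors and the state labels R, B, Y are our own illustrative encoding, not part of the chapter:

```python
# Set of priors for the Ellsberg urn: p(R) = 1/3, p(B) = b, p(Y) = 2/3 - b,
# sampled on a grid of values of b (the grid is an illustrative assumption).
priors = [{"R": 1/3, "B": (2/3) * k / 20, "Y": (2/3) * (1 - k / 20)}
          for k in range(21)]

def ml_update(priors, event, tol=1e-12):
    """Maximum-likelihood update: keep the priors that give the observed
    event maximal probability, then Bayes-update each of them."""
    lik = [sum(p[s] for s in event) for p in priors]
    m = max(lik)
    kept = [p for p, l in zip(priors, lik) if l > m - tol]
    return [{s: (p[s] / m if s in event else 0.0) for s in p} for p in kept]

# Conditioning on B ∪ Y: every prior assigns it 2/3, so all are kept and the
# posteriors over {B, Y} still span the whole range -- no ambiguity resolved.
post = ml_update(priors, {"B", "Y"})
assert len(post) == len(priors)
assert min(p["B"] for p in post) < 1e-9 and max(p["B"] for p in post) > 1 - 1e-9

# Conditioning on R ∪ B: only the prior with p(B) = 2/3 maximizes the
# likelihood, so the set shrinks to the single posterior (1/3, 2/3, 0).
post = ml_update(priors, {"R", "B"})
assert len(post) == 1
assert abs(post[0]["R"] - 1/3) < 1e-9 and abs(post[0]["B"] - 2/3) < 1e-9
```

The grid contains the extreme priors b = 0 and b = 2/3, which is all the maximum-likelihood selection depends on in this example.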
In this particular case, the set of priors shrinks to a singleton and, equivalently, the updated measure v is additive (and equals p itself). Ambiguity is thus reduced (in this case, eliminated) by the generalized Bayesian learning embodied in the exclusion of some priors. In the context of such examples it is sometimes argued that the maximum-likelihood rule is too extreme, and that priors which, say, only ε-maximize the likelihood function should not be ruled out. Indeed, classical statistics techniques such as hypothesis testing do allow for ranges of the likelihood function. At present we are not aware of a nice axiomatization of such rules. We point out, however, that the other extreme rule, that is, updating all priors without excluding any of them (see, for instance, Fagin and Halpern (1989), and Jaffray (1990)), does not appear to be any less "extreme" in general, nor does it seem to be implied by more compelling axioms. We believe that our theory can be applied to a variety of economic models, explaining phenomena which are incompatible with the Bayesian theory, and possibly providing better predictions. As a matter of fact, this belief may be updated given new evidence: Dow and Werlang (1990) and Simonsen and Werlang (1990) have already applied the MP theory to portfolio selection problems. These studies have shown that a decision maker having ambiguous beliefs will have a (nontrivial) range of prices at which he/she will neither buy nor sell an uncertain asset, exhibiting inertia in portfolio selection. Applying our new results regarding ambiguous beliefs update, one may study the conditions under which these price ranges will shrink in the face of new information. Dow et al. (1989) studied trade among agents, at least one of whom has ambiguous beliefs. They show that the celebrated no-trade result of Milgrom and Stokey (1982) fails to hold in this context.
In this study, the Dempster–Shafer rule for updating NA was used, a rule which is justified by the current chapter. Casting the trade setup into a dynamic context raises the question of an asymptotic no-trade theorem: Under what conditions will additional information reduce the volume or probability of trade? In another recent study, Yoo (1990) addressed the question of why stock prices tend to fall after the initial public offering and rise at a later stage. Although Yoo uses ambiguous beliefs mostly as in Bewley's (1986) model, his results can also be obtained using the models mentioned earlier. It seems that the update rule justified by our study may explain the price dynamics. These various models seem to point at a basic problem: given a convex NA (or, equivalently, a set of priors which is the core of such a measure), under what conditions will the Dempster–Shafer rule yield convergence of beliefs to a single additive prior? Obviously, the answer cannot be "always." Consider a "large" measurable space with all possible priors (equivalently, with the "unanimity game" as an NA measure). In this setup of "complete ignorance," no conclusions about the future may be drawn from past observations—that is, the updated beliefs still include all possible priors. However, with some initial information (say, finitely many extreme points of the set of priors) convergence is possible. Conditions that will guarantee such convergence call for further study.
The rest of this chapter is organized as follows. Section 8.2 presents the framework and quotes some results. Section 8.3 defines the update rules and states the theorems. Finally, Section 8.4 includes proofs, related analysis, and some remarks regarding possible generalizations.
8.2. Framework and preliminaries

Let X be a set of consequences endowed with a weak order ⪰. Let (S, Σ) be a measurable space of states of the world, where Σ is the algebra of events. A function f: S → X is Σ-measurable if for every x ∈ X

{s | f(s) ≻ x}, {s | f(s) ⪰ x} ∈ Σ.
Let F = {f: S → X | f is Σ-measurable} be the set of acts. Let F0 = {f ∈ F | |range(f)| < ∞} be the set of simple acts. A function u: X → R which represents ⪰, that is,

u(x) ≥ u(y) ⇐⇒ x ⪰ y,  ∀x, y ∈ X,

is called a utility function. A function v: Σ → [0, 1] satisfying (i) v(Ø) = 0; v(S) = 1; (ii) A ⊆ B ⇒ v(A) ≤ v(B) is an NA. It is convex if v(A ∪ B) + v(A ∩ B) ≥ v(A) + v(B) for all A, B ∈ Σ. It is additive, or simply a measure, if the above inequality is always satisfied as an equality. A real-valued function w on S is Σ-measurable if for every t ∈ R

{s | w(s) ≥ t}, {s | w(s) > t} ∈ Σ.
Given such a function w and an NA v, the (Choquet) integral of w w.r.t. v on S is

∫_S w dv = ∫_0^∞ v({s | w(s) ≥ t}) dt + ∫_{−∞}^0 [v({s | w(s) ≥ t}) − 1] dt.
For an NA v we define the core as for a cooperative game, that is, Core(v) = {p | p is a measure s.t. p(A) ≥ v(A) ∀A ∈ Σ}. Recall that a convex v has a nonempty core (see Shapley (1965)). We are now about to define two classes of binary relations on F: those represented by maximization of expected utility with an NA, and those represented by maxmin of expected utility with MP.
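For a finite state space both the Choquet integral and the core minimum mentioned in the introduction are easy to compute. The following Python sketch is an illustration under our own assumptions (the capacity v(A) = (|A|/3)² is an arbitrary convex example, not from the chapter); it evaluates the integral by the layer formula above and compares it with the minimum of the expectations over the extreme points of the core, which for a convex v are the "marginal vectors" obtained by adding states one at a time (cf. Shapley (1965)):

```python
from itertools import permutations

S = ("a", "b", "c")

# An arbitrary convex capacity for illustration:
# v(A) = (|A|/3)^2 satisfies v(A∪B) + v(A∩B) >= v(A) + v(B).
def v(A):
    return (len(A) / 3) ** 2

def choquet(w, v, S):
    """Choquet integral of a nonnegative w: sum layer by layer, from the
    largest value of w down, weighting each layer by v of the upper set."""
    order = sorted(S, key=lambda s: w[s], reverse=True)
    total = 0.0
    for i, s in enumerate(order):
        nxt = w[order[i + 1]] if i + 1 < len(order) else 0.0
        total += (w[s] - nxt) * v(frozenset(order[: i + 1]))
    return total

def marginal_vector(order, v):
    """Extreme point of the core of a convex v induced by an ordering:
    each state gets its marginal contribution v(seen ∪ {s}) - v(seen)."""
    p, seen = {}, frozenset()
    for s in order:
        p[s] = v(seen | {s}) - v(seen)
        seen = seen | {s}
    return p

w = {"a": 1.0, "b": 0.5, "c": 0.0}
core_min = min(sum(p[s] * w[s] for s in S)
               for p in (marginal_vector(o, v) for o in permutations(S)))
assert abs(choquet(w, v, S) - core_min) < 1e-12  # Choquet = min over the core
```

The minimum is attained at the marginal vector whose ordering ranks the states by decreasing w, which is exactly the "pessimistic" weighting the Choquet integral encodes.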
Denote by NA◦ (= NA◦(X, ⪰, S, Σ)) the set of binary relations ⪰ on F such that there are a utility u, unique up to positive linear transformation (p.l.t.), and a unique NA v satisfying: (i) for every f ∈ F, u ◦ f is Σ-measurable; (ii) for every f, g ∈ F

f ⪰ g ⇐⇒ ∫ u ◦ f dv ≥ ∫ u ◦ g dv.
Note that in general the measurability of f does not guarantee that of u ◦ f, and that (ii) implies that ⪰ on F, when restricted to constant functions, extends ⪰ on X. Hence we use this convenient abuse of notation. Similarly, we will not distinguish between x ∈ X and the constant act which equals x on S. Characterizations of NA◦ were given by Schmeidler (1986, 1989) for the Anscombe–Aumann (1963) framework, where X is a mixture space and u is assumed affine; by Gilboa (1987) in the Savage (1954) framework, where X is arbitrary but Σ = 2^S and v is nonatomic; and by Wakker (1989) for the case where X is a connected topological space. Fishburn (1988) extended the characterization to non-transitive relations. Let MP◦ (= MP◦(X, ⪰, S, Σ)) denote the set of binary relations ⪰ on F such that there are a utility u unique up to a p.l.t., and a unique nonempty, closed (in the weak* topology), and convex set C of (finitely additive) measures on Σ such that: (i) for every f ∈ F, u ◦ f is Σ-measurable; (ii) for every f, g ∈ F

f ⪰ g ⇐⇒ min_{p∈C} ∫ u ◦ f dp ≥ min_{p∈C} ∫ u ◦ g dp.
A characterization of MP◦ in the Anscombe–Aumann framework was given in Gilboa and Schmeidler (1989). To the best of our knowledge, there is no such axiomatization in the framework of Savage. However, the set NA◦ ∩ MP◦, which will play an important role in the sequel, may be characterized by strengthening the axioms in Gilboa (1987). It will be convenient to include the trivial weak order ⪰* = F × F in both NA and MP. Hence, we define NA = NA◦ ∪ {⪰*} and MP = MP◦ ∪ {⪰*}. For simplicity we assume that X has ⪰-maximal and ⪰-minimal elements. More specifically, let x*, x∗ ∈ X satisfy x∗ ⪯ x ⪯ x* for all x ∈ X. Without loss of generality (w.l.o.g.), assume that x∗ and x* are unique. Since for both NA◦ and MP◦ the utility function is unique only up to a p.l.t. we will assume w.l.o.g. that u(x∗) = 0 and u(x*) = 1 for all utilities henceforth considered. When X is a mixture space we define NA and MP to be the subsets of NA and MP, respectively, where the utility u is also required to be affine. For such spaces X we recall the following results.
Proposition 8.1. Suppose that ⪰ ∈ NA and let v be the associated NA. Then ⪰ ∈ MP iff v is convex.

Proposition 8.2. Suppose that ⪰ ∈ MP and let C be the associated set of measures. Define v(A) = min_{p∈C} p(A) for A ∈ Σ.
Then v is an NA, and ⪰ ∈ NA iff v is convex and C = Core(v).

The proofs of these appear, explicitly or implicitly, in Schmeidler (1984, 1986, 1989). Note that the axiomatization of NA (Schmeidler, 1989) uses comonotonic independence, and given this property the convexity of v is equivalent to uncertainty aversion. The axiomatization of MP (Gilboa and Schmeidler, 1989) uses a weaker independence notion—termed C-independence—and uncertainty aversion. Given these, the convexity of v and the equality C = Core(v) (where v is defined as in Proposition 8.2) is equivalent to comonotonic independence. We now define update rules. We need the following definitions. Given a measurable partition {A_i}_{i=1}^n of S and {f_i}_{i=1}^n ⊆ F, let (f_1, A_1; ...; f_n, A_n) denote the act g ∈ F satisfying g(s) = f_i(s) for all s ∈ A_i and all 1 ≤ i ≤ n. Given a binary relation ⪰ on F, an event A ∈ Σ is ⪰-null iff the following holds: for every f, g, h1, h2 ∈ F,

f ⪰ g
iff (f, A^c; h1, A) ⪰ (g, A^c; h2, A).
Let B̄ denote the set of all binary relations on F. Given B ⊆ B̄, an update rule for B is a collection of functions, U = {U_A}_{A∈Σ}, where U_A: B → B̄, such that for all ⪰ ∈ B and A ∈ Σ, A^c is U_A(⪰)-null and U_S(⪰) = ⪰. U_A(⪰) should be thought of as the preference relation once A is known to have occurred. Given B and an update rule for it, U = {U_A}_{A∈Σ}, U is said to be commutative w.r.t. ⪰, or ⪰-commutative, if for every A, B ∈ Σ we have U_A(⪰) ∈ B and U_B(U_A(⪰)) = U_{A∩B}(⪰). It is commutative if it is commutative w.r.t. ⪰ for all ⪰ ∈ B. (Note that this condition is stronger than strict commutativity, that is, U_A ◦ U_B = U_B ◦ U_A. However, "commutativity" seems to be a suggestive name which is not overburdened with other meanings.)
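Commutativity is easiest to see on a concrete rule. The Dempster–Shafer update mentioned in the introduction (and defined formally in Section 8.3) satisfies U_B(U_A(⪰)) = U_{A∩B}(⪰) whenever the relevant denominators are nonzero; the short Python check below illustrates this on an arbitrary convex capacity of our own choosing:

```python
from itertools import combinations

S = frozenset("abc")
subsets = [frozenset(c) for n in range(4) for c in combinations("abc", n)]

# An arbitrary convex capacity for illustration: v(A) = (|A|/3)^2.
v = {T: (len(T) / 3) ** 2 for T in subsets}

def ds_update(v, A):
    """Dempster–Shafer update of a capacity on event A:
    v_A(T) = [v((T ∩ A) ∪ A^c) - v(A^c)] / (1 - v(A^c))."""
    Ac = S - A
    d = 1.0 - v[Ac]
    return {T: (v[(T & A) | Ac] - v[Ac]) / d for T in subsets}

# Updating on A and then on B coincides with updating once on A ∩ B.
A, B = frozenset("ab"), frozenset("bc")
once = ds_update(v, A & B)
twice = ds_update(ds_update(v, A), B)
assert all(abs(once[T] - twice[T]) < 1e-12 for T in subsets)
```

So sequential Dempster–Shafer updating is order-independent, which is the content of the commutativity required of update rules above.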
8.3. Bayesian and classical rules

Given a set B of binary relations on F, every f ∈ F suggests a natural update rule BU^f for B: define BU^f = {BU_A^f}_{A∈Σ} by

g BU_A^f(⪰) h ⇐⇒ (g, A; f, A^c) ⪰ (h, A; f, A^c)  for all g, h ∈ F.
It is obvious that for every f, BU^f is an update rule, that is, that A^c is BU_A^f(⪰)-null for all ⪰ ∈ B and A ∈ Σ. We will refer to it as the f-Bayesian update rule and {BU^f}_{f∈F} will be called the set of Bayesian update rules. Note that for ⪰ ∈ NA with an additive v, all the Bayesian update rules coincide with Bayes' rule; hence the definition of the Bayesian update rules may be considered a formulation and axiomatization of Bayes' rule in the case of (a unique) additive prior.

Proposition 8.3. For every ⪰ ∈ B̄ and f ∈ F, BU^f is ⪰-commutative.

Theorem 8.1. Let f ∈ F and assume that |Σ| > 4. Then the following are equivalent:

(i) BU_A^f(NA) ⊆ NA for all A ∈ Σ;
(ii) f = (x*, T; x∗, T^c) for some T ∈ Σ.
Of particular interest are the Bayesian update rules corresponding to f = x* and f = x∗ (i.e. T = S or T = Ø in (ii) earlier). For the latter (x∗) there is an "optimistic" interpretation: when comparing two actions given a certain event A, the decision maker implicitly assumes that had A not occurred, the worst possible outcome (x∗) would have resulted. In other words, the behavior given A—BU_A^f(⪰)—exhibits "happiness" that A has occurred; the decisions are made as if we are always in "the best of all possible worlds." Note that the corresponding NA is v_A(B) = v(B ∩ A)/v(A). On the other hand, for f = x*, we consider a "pessimistic" decision maker, whose choices reveal the hidden assumption that all the impossible worlds are the best conceivable ones. This rule defines the nonadditive function by

v_A(B) = [v((B ∩ A) ∪ A^c) − v(A^c)]/(1 − v(A^c)),

which is identical to the Dempster–Shafer rule for updating probabilities. It should not surprise us that this "pessimistic" rule is going to play a major role in relation to MP—that is, to uncertainty averse decision makers who follow a maxmin (expected utility) decision rule. In a similar way one may develop a "dual" theory of "optimism" in which uncertainty seeking will replace uncertainty aversion, concavity of v will replace convexity, and maxmax will supersede maxmin. For this "dual" theory, the update rule v_A(B) = v(B ∩ A)/v(A) would be the "appropriate" one (in a sense that will be clear shortly). Note that this rule was used—without axiomatization—as a definition of probability update in Gilboa (1989).
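The two rules can be compared directly on the Ellsberg capacity of the introduction. In the Python sketch below (the dictionary encoding of the capacity is ours), the "pessimistic" Dempster–Shafer update on R ∪ B reproduces the additive posterior p(R) = 1/3, p(B) = 2/3 derived in the introduction, while the "optimistic" rule gives a very different answer:

```python
from itertools import combinations

states = ("R", "B", "Y")
subsets = [frozenset(c) for n in range(4) for c in combinations(states, n)]
full = frozenset(states)

# The Ellsberg capacity from the introduction.
v = {T: (1 / 3 if "R" in T else 0.0) for T in subsets}
v[frozenset("BY")] = 2 / 3
v[frozenset("RBY")] = 1.0

def optimistic(v, A):
    """'Optimistic' rule: v_A(T) = v(T ∩ A) / v(A)."""
    return {T: v[T & A] / v[A] for T in subsets}

def pessimistic(v, A):
    """Dempster–Shafer: v_A(T) = [v((T∩A) ∪ A^c) - v(A^c)] / (1 - v(A^c))."""
    Ac = full - A
    return {T: (v[(T & A) | Ac] - v[Ac]) / (1 - v[Ac]) for T in subsets}

A = frozenset("RB")
pess, opt = pessimistic(v, A), optimistic(v, A)

# Pessimistic update: the additive posterior p(R) = 1/3, p(B) = 2/3.
assert abs(pess[frozenset("R")] - 1 / 3) < 1e-12
assert abs(pess[frozenset("B")] - 2 / 3) < 1e-12

# Optimistic update: all the weight goes to R here.
assert abs(opt[frozenset("R")] - 1.0) < 1e-12
assert abs(opt[frozenset("B")] - 0.0) < 1e-12
```

Note that the pessimistic update of this convex capacity comes out additive on {R, B}, matching the singleton set of priors obtained by the maximum-likelihood rule.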
Taking a classical statistics point of view, it is natural to start out with a set of priors. Hence we only define classical update rules for B = MP. A natural procedure in the classical updating process is to rule out some of the given priors, and update the rest according to Bayes' rule. Thus, we get a family of update rules, which differ in the way the priors are selected. Formally, a classical update rule is characterized by a function R: (C, A) ↦ C′ such that C′ ⊆ C is a closed and convex set of measures for every such C and every A ∈ Σ, with R(C, S) = C. The associated update rule will be denoted CU^R = {CU_A^R}_{A∈Σ}. (If R(C, A) = Ø we define CU_A^R(⪰) = ⪰*.) Note that these are indeed update rules, that is, for every ⪰ ∈ MP, every R and every A ∈ Σ, A^c is CU_A^R(⪰)-null. Furthermore, for ⪰ ∈ MP with an associated set C, CU_A^R(⪰) ∈ MP provided that inf{p(A) | p ∈ R(C, A)} > 0 for all A ∈ Σ. Of particular interest will be the classical update rule called maximum likelihood and defined by

R⁰(C, A) = {p ∈ C | p(A) = max_{q∈C} q(A) > 0}.
Theorem 8.2. CU^{R⁰} is commutative on NA ∩ MP. Furthermore, for ⪰ ∈ NA ∩ MP,

BU_A^{(x*, S)}(⪰) = CU_A^{R⁰}(⪰) ∈ NA ∩ MP.
That is, the Bayesian update rule with f = (x*, S) coincides with the maximum-likelihood classical update rule. Moreover, they are also equivalent to the Dempster–Shafer update rule for belief functions. (Note that every belief function (see Shafer (1976)) is convex, though the converse is false. Yet one may apply the Dempster–Shafer rule for every NA v.)
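The equivalence asserted by Theorem 8.2 can be spot-checked numerically on the Ellsberg example of the introduction. In the sketch below (the encoding and the grid over the core are our own assumptions; the grid contains the extreme priors, which is all the minima depend on), the Dempster–Shafer update of v coincides, event by event, with the lower envelope of the maximum-likelihood classical update of its core:

```python
from itertools import combinations

states = ("R", "B", "Y")
subsets = [frozenset(c) for n in range(4) for c in combinations(states, n)]
full = frozenset(states)

# The Ellsberg capacity from the introduction.
v = {T: (1 / 3 if "R" in T else 0.0) for T in subsets}
v[frozenset("BY")] = 2 / 3
v[frozenset("RBY")] = 1.0

def ds_update(v, A):
    """Dempster–Shafer update of the capacity on event A."""
    Ac = full - A
    return {T: (v[(T & A) | Ac] - v[Ac]) / (1 - v[Ac]) for T in subsets}

# Core of v: all p with p(R) = 1/3, p(B) + p(Y) = 2/3 (grid approximation).
priors = [{"R": 1 / 3, "B": (2 / 3) * k / 60, "Y": (2 / 3) * (1 - k / 60)}
          for k in range(61)]

def ml_classical_update(priors, A, tol=1e-12):
    """Keep the priors maximizing p(A), Bayes-update each on A."""
    lik = [sum(p[s] for s in A) for p in priors]
    m = max(lik)
    kept = [p for p, l in zip(priors, lik) if l > m - tol]
    return [{s: (p[s] / m if s in A else 0.0) for s in states} for p in kept]

def lower_envelope(ps):
    """Capacity induced by a set of priors: v(T) = min_p p(T)."""
    return {T: min(sum(p[s] for s in T) for p in ps) for T in subsets}

for A in (frozenset("RB"), frozenset("BY")):
    lhs = ds_update(v, A)
    rhs = lower_envelope(ml_classical_update(priors, A))
    assert all(abs(lhs[T] - rhs[T]) < 1e-9 for T in subsets)
```

Both conditioning events of the introduction are checked: on R ∪ B the two sides agree on the additive posterior, and on B ∪ Y they agree on the still fully ambiguous capacity.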
8.4. Proofs and related analysis

Proof of Proposition 8.3. It only requires noting that for every f, g ∈ F and A, B ∈ Σ,

((g, A; f, A^c), B; f, B^c) = (g, A ∩ B; f, (A ∩ B)^c).

Proof of Theorem 8.1. First assume (ii). Let there be given ⪰ ∈ NA with associated u and v. Define for B ∈ Σ an NA v_B by

v_B(A) = [v((A ∩ B) ∪ (T ∩ B^c)) − v(T ∩ B^c)]/[v(B ∪ T) − v(T ∩ B^c)]
if the denominator is positive. (Otherwise the result is trivial.) For every g ∈ F we have

∫_S u ◦ (g, B; f, B^c) dv
 = ∫_0^1 v({s | u ◦ (g, B; f, B^c)(s) ≥ t}) dt
 = ∫_0^1 v((T ∩ B^c) ∪ ({s | u ◦ g(s) ≥ t} ∩ B)) dt
 = ∫_0^1 [v(T ∩ B^c) + [v(B ∪ T) − v(T ∩ B^c)] v_B({s | u ◦ g(s) ≥ t})] dt
 = v(T ∩ B^c) + [v(B ∪ T) − v(T ∩ B^c)] ∫ u ◦ g dv_B,

where v_B and u represent BU_B^f(⪰), which implies that the latter is in NA. Conversely, assume (i) holds. Assume, to the contrary, that f(s) ∼ x for s ∈ D, where D ∈ Σ, D ≠ S, and x∗ ≺ x ≺ x* (where ∼ denotes ⪰-equivalence). Let E, F ∈ Σ satisfy E ∩ F = E ∩ D = F ∩ D = Ø. Denote α = u(x) (where 0 < α < 1). Choose m ∈ (α, 1) and a nonadditive v such that
v(E) = v(F) = v(D) = m,
v(E ∪ F) = v(E ∪ D) = v(F ∪ D) = m,

and v(T) = v(T ∩ (E ∪ F ∪ D)) for all T ∈ Σ. Next define ⪰ ∈ NA by v and (the unique) u. Choose g1, g2 such that

u ◦ g1(s) = u ◦ g2(s) = α  for s ∈ D,
u ◦ g1(s) = 1, u ◦ g2(s) = α + (1 − α/m)  for s ∈ E,
u ◦ g1(s) = 0, u ◦ g2(s) = α + (1 − α/m)  for s ∈ F.

Let ⪰′ be BU_{E∪F}^f(⪰). By assumption it belongs to NA; hence, there correspond to it u′ = u and v′. Note that v′ is unique as ⪰′ is nontrivial, and that v′(T) = v′(T ∩ (E ∪ F)) for all T ∈ Σ. As ∫ u ◦ g1 dv = ∫ u ◦ g2 dv, g1 ∼ g2, whence g1 ∼′ g2. Hence ∫ u ◦ g1 dv′ = ∫ u ◦ g2 dv′, that is, v′(E) = α + (1 − α/m). Next choose β ∈ (0, α) and choose an act g3 ∈ F such that
⎧ ⎪ ⎨α u ◦ g3 (s) = β ⎪ ⎩ 0
s∈D s∈E s ∈ F.
Updating ambiguous beliefs
For every γ ∈ (0, α) choose g_γ ∈ F such that

u∘g_γ(s) = α for s ∈ D, γ for s ∈ E ∪ F.

Then ∫ u∘g₃ dv = αm and ∫ u∘g_γ dv = αm + γ(1 − m). Hence g_γ ≻ g₃ and g_γ ≻′ g₃ for all γ > 0. However, ∫ u∘g_γ dv′ = γ and ∫ u∘g₃ dv′ = βv′(E), where v′(E) ≠ 0, which yields a contradiction for small enough γ.

Remark 8.1. In the case of no extreme outcomes, that is, when X has no ≽-maximal or no ≽-minimal elements, and in particular when the utility is not bounded, there are no update rules BU^f = {BU^f_A}_{A∈Σ} which map NA into itself. However, one may choose, for g, h ∈ F°, outcomes x^*, x_* ∈ X such that x^* ≽ g(s), h(s) ≽ x_* for all s ∈ S, and for every T ∈ Σ define BU^f(≽) between g and h by f = (x^*, T; x_*, T^c). If ≽ ∈ NA, this definition is independent of the choice of x^* and x_*. The resulting update rule will be commutative for any (fixed) T ∈ Σ.

Proof of Theorem 8.2. Let ≽ ∈ MP be given, and let C denote its associated set of additive measures. Define v(·) = min_{p∈C} p(·). Assume that v is convex and C = Core(v). For A ∈ Σ with q(A) > 0 for some q ∈ C, we have

R⁰(C, A) = {p ∈ C | p(A) = max_{q∈C} q(A)} = {p ∈ C | p(A^c) = v(A^c)}.
(Note that if v(A^c) = 1, CU^{R⁰}_A(CU^{R⁰}_B(≽)) = CU^{R⁰}_B(CU^{R⁰}_A(≽)) = ≽^*.) As was shown in Schmeidler (1984), v is convex iff for every chain ∅ = E₀ ⊂ E₁ ⊂ · · · ⊂ E_n = S there is an additive measure p in Core(v) = C such that p(E_i) = v(E_i), 0 ≤ i ≤ n. Furthermore, this requirement for n = 3 is also equivalent to convexity. Next define

v⁰_A(T) = min{p(T ∩ A) | p ∈ R⁰(C, A)}.

Claim. v⁰_A(T) = v((T ∩ A) ∪ A^c) − v(A^c).

Proof. For p ∈ R⁰(C, A) we have

p(T ∩ A) = p((T ∩ A) ∪ A^c) − p(A^c) = p((T ∩ A) ∪ A^c) − v(A^c) ≥ v((T ∩ A) ∪ A^c) − v(A^c),

whence v⁰_A(T) ≥ v((T ∩ A) ∪ A^c) − v(A^c).
To show the converse inequality, consider the chain ∅ ⊆ A^c ⊆ A^c ∪ (A ∩ T) ⊆ S. By convexity there is p ∈ Core(v) = C satisfying p(A^c) = v(A^c) and p(A^c ∪ (T ∩ A)) = v(A^c ∪ (T ∩ A)), which also implies p ∈ R⁰(C, A). Then

v⁰_A(T) ≤ p(T ∩ A) = p((T ∩ A) ∪ A^c) − p(A^c) = v((T ∩ A) ∪ A^c) − v(A^c).

Consider CU^{R⁰}_A(≽). If it is not equal to ≽^*, it has to be the case that v(A^c) < 1, and then it is defined by the set of additive measures

C_A = {p_A | p ∈ R⁰(C, A)}, where p_A(T) = p(T ∩ A)/p(A) = p(T ∩ A)/(1 − v(A^c)).

Note that C_A is nonempty, closed, and convex. Define v_A(T) = min{p(T) | p ∈ C_A}, and observe that v_A(T) = v⁰_A(T)/(1 − v(A^c)), that is,

v_A(T) = [v((T ∩ A) ∪ A^c) − v(A^c)]/[1 − v(A^c)].   (8.1)

Hence v_A is also convex. We have to show that C_A = Core(v_A). To see this, let p ∈ Core(v_A). We will show that p = q_A for some q ∈ R⁰(C, A). Take q′ ∈ Core(v) with q′(A^c) = v(A^c) and define

q(T) = p(T ∩ A)[1 − v(A^c)] + q′(T ∩ A^c).

Note that q(T ∩ A) = p(T ∩ A)[1 − v(A^c)] ≥ v_A(T ∩ A)[1 − v(A^c)] = v((T ∩ A) ∪ A^c) − v(A^c) (as p ∈ Core(v_A) and by definition of the latter). Next, since q′ ∈ Core(v), q(T ∩ A^c) = q′(T ∩ A^c) ≥ v(T ∩ A^c). Hence

q(T) = q(T ∩ A) + q(T ∩ A^c) ≥ v((T ∩ A) ∪ A^c) − v(A^c) + v(T ∩ A^c) = v(T ∪ A^c) − v(A^c) + v(T ∩ A^c) ≥ v(T),

where the last inequality follows from the convexity of v. Finally, q(S) = q(A) + q(A^c) = p(A)[1 − v(A^c)] + v(A^c) = 1. Hence q ∈ Core(v). Furthermore, q ∈ R⁰(C, A). Obviously, p = q_A.
Thus we have established that CU^{R⁰}_A(≽) ∈ NA. Furthermore, CU^{R⁰}_A(≽) = BU^{(x^*,S)}_A(≽), and the nonadditive probability update rule (8.1) coincides with the Dempster–Shafer rule. Either of these two facts, combined with the observation CU^{R⁰}_A(≽) ∈ NA, implies that CU^{R⁰} is commutative.
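As a numerical check on the proof (an illustrative sketch, not part of the original argument), one can build the core of a convex capacity from its marginal vectors (Shapley, 1971), apply the maximum-likelihood selection R⁰ followed by Bayesian updating, and confirm that the lower envelope of the updated set matches the Dempster–Shafer formula (8.1). The capacity below is an arbitrary belief function chosen for illustration:

```python
from itertools import combinations, permutations

def subsets(xs):
    xs = list(xs)
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

# A convex capacity (a belief function) on S = {1,2,3}.
S = frozenset({1, 2, 3})
m = {frozenset({1}): 0.2, frozenset({2}): 0.1, frozenset({3}): 0.1, S: 0.6}
v = {T: sum(w for B, w in m.items() if B <= T) for T in subsets(S)}

def p_of(p, T):
    return sum(p[s] for s in T)

# For a convex capacity, the core is the convex hull of its marginal vectors:
# order the states and give each its marginal contribution to v.
core_vertices = []
for order in permutations(S):
    p, prev = {}, frozenset()
    for s in order:
        cur = prev | {s}
        p[s] = v[cur] - v[prev]
        prev = cur
    core_vertices.append(p)

A, Ac = frozenset({1, 2}), frozenset({3})
# R0 keeps the core elements maximizing p(A), i.e. with p(A^c) = v(A^c) ...
R0 = [p for p in core_vertices if abs(p_of(p, Ac) - v[Ac]) < 1e-9]
# ... and then each is updated by Bayes' rule on A.
updated = [{s: p[s] / p_of(p, A) for s in A} for p in R0]

for T in subsets(A):
    lower = min(sum(p[s] for s in T) for p in updated)   # lower envelope of CU^R0
    ds = (v[(T & A) | Ac] - v[Ac]) / (1 - v[Ac])         # formula (8.1)
    assert abs(lower - ds) < 1e-9, (T, lower, ds)
print("maximum-likelihood update agrees with Dempster-Shafer on all T contained in A")
```

The marginal-vector construction of the core vertices is exactly the chain condition quoted from Schmeidler (1984) above, specialized to maximal chains.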
Remark 8.2. It is not difficult to see that the maximum-likelihood update rule is not commutative in general. In fact, one may ask whether the converse of Theorem 8.2 is true, that is, whether a relation ≽ ∈ MP w.r.t. which CU^{R⁰} is commutative has to define a set C which is the core of a nonadditive measure. The negative answer is given by the following example: S = {1, 2, 3, 4}, Σ = 2^S, C = conv{p₁, p₂} defined by

        1     2     3     4
p₁     0.7   0.1   0.1   0.1
p₂     0.1   0.3   0.3   0.3
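The negative claim in this example can be checked directly. If C were the core of some capacity, it would in particular have to be the core of its own lower envelope v(T) = min(p1(T), p2(T)); the sketch below (illustrative code, with the witness q chosen by hand) exhibits a core element of v outside C:

```python
from itertools import combinations

states = [1, 2, 3, 4]
p1 = {1: 0.7, 2: 0.1, 3: 0.1, 4: 0.1}
p2 = {1: 0.1, 2: 0.3, 3: 0.3, 4: 0.3}
q  = {1: 0.4, 2: 0.3, 3: 0.2, 4: 0.1}   # hand-picked candidate outside C

def mass(p, T):
    return sum(p[s] for s in T)

subsets = [set(c) for r in range(5) for c in combinations(states, r)]

# Lower envelope v(T) = min over C; the minimum of a linear functional over
# the segment conv{p1, p2} is attained at an endpoint.
v = {frozenset(T): min(mass(p1, T), mass(p2, T)) for T in subsets}

# q dominates v eventwise and is a probability, hence q is in Core(v).
assert abs(mass(q, states) - 1) < 1e-12
assert all(mass(q, T) >= v[frozenset(T)] - 1e-12 for T in subsets)

# But q is not in C = conv{p1, p2}: on the segment, states 2, 3, 4 always
# receive equal probability, while q assigns them 0.3, 0.2, 0.1.
assert not (q[2] == q[3] == q[4])
print("Core(v) strictly contains C, so C is not the core of any capacity")
```

The last step uses the observation that if C = Core(w) for some capacity w, then C also equals the core of its lower envelope, so checking the lower envelope suffices.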
It is easily verifiable that the maximum-likelihood update rule is commutative w.r.t. the induced ≽ ∈ MP, though C is not the core of any v.

Remark 8.3. It seems that the maximum-likelihood update rule is not commutative in general because it lacks some "look-ahead" property. One is tempted to define an update rule that will retain all the priors which may, at some point in the future, turn out to be likelihood maximizers. Thus, we are led to the "semi-generalized maximum likelihood" rule:

R¹(C, A) = cl conv{p ∈ C | p(E) = max_{q∈C} q(E) > 0 for some measurable E ⊆ A}

(where cl means closure in the weak* topology). Note that the resulting set of measures may include p ∈ C such that p(A) = 0. In this case, define CU^{R¹}_A(≽) = ≽^*. However, the following example shows that this update rule also fails to be commutative in general. Consider S = {1, 2, 3, 4, 5}, Σ = 2^S, and let C be conv{p₁, p₂, p₃, p₄} defined by the following table:
        1      2      3      4      5
p₁     0.2    0.2    0.01   0.09   0.5
p₂     0      0      0.4    0.4    0.2
p₃     0.27   0      0.03   0      0.7
p₄     0      0.27   0.03   0      0.7
Taking A = {1, 2, 3, 4} and B = {1, 2, 3}, one may verify that R¹(R¹(C, A), B) = {p₂, p₃, p₄} while R¹(C, B) = {p₁, p₂, p₃, p₄}, and that p₁B is not in the convex hull of {p₂B, p₃B, p₄B}. We may try an even more generalized version of the maximum-likelihood criterion: retain all priors according to which the integral of some non-negative simple function is maximized. That is, define

R²(C, A) = cl conv{p ∈ C | ∫ u∘f dp = max_{q∈C} ∫ u∘f dq > 0 for some f ∈ F°}.

The maximization of ∫ u∘f dp for some f may be viewed as maximization of some convex combination of the likelihood function at several points of time. However, the same example shows that CU^{R²} is not commutative in general.

Remark 8.4. Although our results are formulated for NA and MP, they may be generalized easily. First, one should note that none of the results actually requires that X be a mixture space. All that is needed is that the utility on X be uniquely defined (up to a positive linear transformation) and that its range contain an open interval. In particular, connected topological spaces with a continuous utility function will do. Moreover, most of the results do not even require such richness of the utility's range. In fact, this richness was only used in the proof of (i) ⇒ (iii) in Theorem 8.1.
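Returning to the example of Remark 8.3, the verification left to the reader can be sketched as follows (illustrative code; exact rational arithmetic avoids rounding issues in the tie comparisons):

```python
from fractions import Fraction as Fr
from itertools import combinations

S = [1, 2, 3, 4, 5]
C = {
    "p1": [Fr(20, 100), Fr(20, 100), Fr(1, 100), Fr(9, 100), Fr(50, 100)],
    "p2": [Fr(0), Fr(0), Fr(40, 100), Fr(40, 100), Fr(20, 100)],
    "p3": [Fr(27, 100), Fr(0), Fr(3, 100), Fr(0), Fr(70, 100)],
    "p4": [Fr(0), Fr(27, 100), Fr(3, 100), Fr(0), Fr(70, 100)],
}

def mass(p, E):
    return sum(p[s - 1] for s in E)

def cond(p, A):
    """Bayesian conditional of p on A, as a dict over the states of A."""
    pA = mass(p, A)
    return {s: p[s - 1] / pA for s in A}

def R1_names(priors, A):
    """Names of extreme priors retained by the semi-generalized
    maximum-likelihood rule: p(E) = max_q q(E) > 0 for some E in A."""
    Es = [set(c) for r in range(1, len(A) + 1) for c in combinations(sorted(A), r)]
    keep = set()
    for E in Es:
        top = max(mass(p, E) for p in priors.values())
        if top > 0:
            keep |= {n for n, p in priors.items() if mass(p, E) == top}
    return keep

A, B = {1, 2, 3, 4}, {1, 2, 3}

# Updating by B directly retains p1 (it maximizes the mass of E = {1, 2}).
assert R1_names(C, B) == {"p1", "p2", "p3", "p4"}

# Updating first by A (conditioning) and then applying the rule w.r.t. B
# drops the conditional of p1.
CA = {n: [cond(p, A).get(s, Fr(0)) for s in S] for n, p in C.items()}
assert R1_names(CA, B) == {"p2", "p3", "p4"}

# Finally, p1B lies outside conv{p2B, p3B, p4B}: matching states 1 and 2
# forces weights on p3B and p4B that already sum to more than 1.
p1B, p3B, p4B = cond(C["p1"], B), cond(C["p3"], B), cond(C["p4"], B)
lam3, lam4 = p1B[1] / p3B[1], p1B[2] / p4B[2]
assert lam3 + lam4 > 1
print("CU^R1 updates do not commute on this example")
```

Since R¹ applies the maximum of a linear functional over a polytope, it suffices here to track the extreme priors; the closed convex hull in the definition is generated by the retained extreme points in this example.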
Acknowledgments We thank James Dow, Joe Halpern, Jean-Yves Jaffray, Bart Lipman, Klaus Nehring, Sergiu Werlang, and an anonymous referee for stimulating discussions and comments. NSF grants SES-9113108 and SES-9111873 are gratefully acknowledged.
References
C. E. Agnew (1985) Multiple probability assessments by dependent experts, J. Amer. Statist. Assoc., 80, 343–347.
F. J. Anscombe and R. J. Aumann (1963) A definition of subjective probability, Ann. Math. Statist., 34, 199–205.
T. Bewley (1986) "Knightian Decision Theory: Part I," Cowles Foundation Discussion paper No. 807, Yale University.
T. Bewley (1987) "Knightian Decision Theory: Part II: Intertemporal Problems," Cowles Foundation Discussion paper No. 835, Yale University.
T. Bewley (1988) "Knightian Decision Theory and Econometric Inference," Cowles Foundation Discussion paper No. 868, Yale University.
A. Chateauneuf and J.-Y. Jaffray (1989) Some characterizations of lower probabilities and other monotone capacities through the use of Moebius inversion, Mathematical Social Sciences, 17, 263–283.
G. Choquet (1953–1954) Theory of capacities, Ann. l'Institut Fourier, 5, 131–295.
A. P. Dempster (1967) Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Statist., 38, 325–339.
A. P. Dempster (1968) A generalization of Bayesian inference, J. Roy. Statist. Soc. Series B, 30, 205–247.
J. Dow and S. R. C. Werlang (1990) "Risk Aversion, Uncertainty Aversion and the Optimal Choice of Portfolio," London Business School Working Paper. (Reprinted as Chapter 17 in this volume.)
J. Dow, V. Madrigal, and S. R. C. Werlang (1989) "Preferences, Common Knowledge and Speculative Trade," Fundacao Getulio Vargas Working paper.
D. Ellsberg (1961) Risk, ambiguity and the Savage axioms, Quart. J. Econ., 75, 643–669.
R. Fagin and J. Y. Halpern (1989) A new approach to updating beliefs, mimeo.
P. C. Fishburn (1988) Uncertainty aversion and separated effects in decision making under uncertainty, in "Combining Fuzzy Imprecision with Probabilistic Uncertainty in Decision Making" (J. Kacprzyk and M. Fedrizzi, eds), pp. 10–25, Springer-Verlag, New York/Berlin.
C. Genest and M. J. Schervish (1985) Modeling expert judgments for Bayesian updating, Ann. Statist., 13, 1198–1212.
I. Gilboa (1987) Expected utility theory with purely subjective non-additive probabilities, J. Math. Econ., 16, 65–88.
I. Gilboa (1989) Additivizations of non-additive measures, Math. Operations Res., 14, 1–17.
I. Gilboa and D. Schmeidler (1989) Maxmin expected utility with non-unique prior, J. Math. Econ., 18, 141–153.
(Reprinted as Chapter 6 in this volume.)
J. Y. Halpern and R. Fagin (1989) Two views of belief: Belief as generalized probability and belief as evidence, mimeo.
J. Y. Halpern and M. R. Tuttle (1989) Knowledge, probability and adversaries, mimeo.
J. Y. Jaffray (1989) Linear utility theory for belief functions, Operations Res. Lett., 8, 107–112.
J. Y. Jaffray (1990) Bayesian updating and belief functions, mimeo.
D. V. Lindley, A. Tversky, and R. V. Brown (1979) On the reconciliation of probability assessments, J. Roy. Statist. Soc. Series A, 142, 146–180.
P. Milgrom and N. Stokey (1982) Information, trade and common knowledge, J. Econ. Theory, 26, 17–27.
J. von Neumann and O. Morgenstern (1947) "Theory of Games and Economic Behavior," 2nd edn, Princeton University Press, Princeton, NJ.
L. J. Savage (1954) "The Foundations of Statistics," Wiley, New York.
D. Schmeidler (1982) "Subjective Probability Without Additivity" (temporary title), Working paper, Foerder Institute for Economic Research, Tel Aviv University.
D. Schmeidler (1984) Nonadditive probabilities and convex games, mimeo.
D. Schmeidler (1986) Integral representation without additivity, Proc. Amer. Math. Soc., 97, 253–261.
D. Schmeidler (1989) Subjective probability and expected utility without additivity, Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
G. Shafer (1976) "A Mathematical Theory of Evidence," Princeton University Press, Princeton, NJ.
M. H. Simonsen and S. R. C. Werlang (1990) Subadditive probabilities and portfolio inertia, mimeo.
L. S. Shapley (1965) "Notes on n-Person Games VII: Cores of Convex Games," The Rand Corporation; also as Cores of convex games, Int. J. Game Theory, 1 (1971), 11–26.
Ph. Smets (1986) "Combining Non-distinct Evidences," Technical report ULB-IRIDIA-86/3.
P. Wakker (1989) Continuous subjective expected utility with non-additive probabilities, J. Math. Econ., 18, 1–17.
K. R. Yoo (1990) A theory of the underpricing of initial public offerings, mimeo.
9
A definition of uncertainty aversion
Larry G. Epstein
9.1. Introduction 9.1.1. Objectives The concepts of risk and risk aversion are cornerstones of a broad range of models in economics and finance. In contrast, relatively little attention is paid in formal models to the phenomenon of uncertainty that is arguably more prevalent than risk. The distinction between them is roughly that risk refers to situations where the perceived likelihoods of events of interest can be represented by probabilities, whereas uncertainty refers to situations where the information available to the decision-maker is too imprecise to be summarized by a probability measure. Thus the terms “vagueness” or “ambiguity” can serve as close substitutes. Ellsberg, in his famous experiment, has demonstrated that such a distinction is meaningful empirically, but it cannot be accommodated within the subjective expected utility (SEU) model. Perhaps because this latter model has been so dominant, our formal understanding of uncertainty and uncertainty aversion is poor. There exists a definition of uncertainty aversion, due to Schmeidler (1989), for the special setting of Anscombe–Aumann (AA) horse-race/roulette-wheel acts. Though it has been transported and widely adopted in models employing the Savage domain of acts, I feel that it is both less appealing and less useful in such contexts. Because the Savage domain is typically more appropriate and also more widely used in descriptive modeling, this suggests the need for an alternative definition of uncertainty aversion that is more suited to applications in a Savage domain. Providing such a definition is the objective of this chapter. Uncertainty aversion is defined for a large class of preferences. This is done for the obvious reason that a satisfactory understanding of uncertainty aversion can be achieved only if its meaning does not rely on preference axioms that are auxiliary rather than germane to the issue. 
Epstein, L. G. (1999) "A Definition of Uncertainty Aversion," Review of Economic Studies, 66, 579–608.

On the other hand, Choquet expected utility (CEU) theory (Schmeidler, 1989) and its close relative, the multiple-priors model (Gilboa and Schmeidler, 1989), provide important examples for understanding the nature of our definition, as they are the most widely used and studied theories of preference that can accommodate Ellsberg-type behavior. Recall that risk aversion has been defined and characterized for general preferences, including those that lie outside the expected utility class (see Yaari (1969) and Chew and Mao (1995), for example). There is a separate technical or methodological contribution of the chapter. After the formulation and initial examination of the definition of uncertainty aversion, subsequent analysis is facilitated by assuming eventwise differentiability of utility. The role of eventwise differentiability may be described roughly as follows: the notion of uncertainty aversion leads to concern with the "local probabilistic beliefs" implicit in an arbitrary preference order or utility function. These beliefs represent the decision-maker's underlying "mean" or "ambiguity-free" likelihood assessments for events. In general, they need not be unique. But they are unique if utility is eventwise differentiable (given suitable additional conditions). Further perspective is provided by recalling the role of differentiability in decision theory under risk, where utility functions are defined on cumulative distribution functions. Machina (1982) has shown that differential methods are a powerful tool in decision theory under risk. He employs Frechet differentiability; others have shown that Gateaux differentiability suffices for many purposes (Chew et al., 1987). In the present context of decision-making under uncertainty, where utility functions are defined over acts, the preceding two notions of differentiability are not useful for the task of uncovering implicit local beliefs.
On the other hand, eventwise differentiability "works." Because local probabilistic beliefs are likely to be useful more broadly, so too, it seems, will be the notion of eventwise differentiability. It must be acknowledged, however, that eventwise differentiability has close relatives in the literature, namely in Rosenmuller (1972) and Machina (1992).1 The differences from those papers and the value added here are clarified later (Appendix C). It seems accurate to say that this chapter adds to the demonstration in Machina (1992) that differential techniques are useful also for the analysis of decision-making under uncertainty. The chapter proceeds as follows: the Schmeidler definition of uncertainty aversion is examined first. This is accompanied by examples that motivate the search for an alternative definition. Then, because the parallel with the well understood theory of risk aversion is bound to be helpful, relevant aspects of that theory are reviewed. A new definition of uncertainty aversion is formulated in the remainder of Section 9.2 and some attractive properties are described in Section 9.3. In particular, uncertainty aversion is shown to have intuitive empirical content and to admit simple characterizations within the CEU and multiple-priors models. The notion of "eventwise derivative" and the analysis of uncertainty aversion given eventwise differentiability follow in Section 9.4. It is shown that eventwise differentiability of utility simplifies the task of checking whether the corresponding preference order is uncertainty averse and thus enhances the tractability of the proposed definition. Section 9.5 concludes with remarks on the significance of the choice between the domain of Savage acts versus the larger AA domain of horse-race/roulette-wheel
acts. This difference in domains is central to understanding the relation between this chapter and Schmeidler (1989). Two important limitations of the analysis should be acknowledged at the start. First, uncertainty aversion is defined relative to an exogenously specified collection of events A. Events in A are thought of as unambiguous or uncertainty free. They play a role here parallel to that played by constant (or risk-free) acts in the standard analysis of risk aversion. However, whether or not an event is ambiguous is naturally viewed as subjective or derived from preference. Accordingly, it seems desirable to define uncertainty aversion relative to the collection of subjectively unambiguous events. Unfortunately, such a formulation is beyond the scope of this chapter.2 In defense of the exogenous specification of the collection A, observe that Schmeidler (1989) relies on a comparable specification through the presence of objective lotteries in the AA domain. In addition, it seems likely that given any future success in endogenizing ambiguity, the present analysis of uncertainty aversion relative to a given collection A will be useful. The other limitation concerns the limited success in this chapter in achieving the ultimate objective of deriving the behavioral consequences of uncertainty aversion. The focus here is on the definition of uncertainty aversion. Some behavioral implications are derived but much is left for future work. In particular, applications to standard economic contexts, such as asset pricing or games, are beyond the scope of the chapter. However, the importance of the groundwork laid here for future applications merits emphasis—an essential precondition for understanding the behavioral consequences of uncertainty aversion is that the latter term has a precise and intuitively satisfactory meaning. 
Admittedly, there have been several papers in the literature claiming to have derived consequences of uncertainty aversion for strategic behavior and also for asset pricing. To varying degrees these studies either adopt the Schmeidler definition of uncertainty aversion or they do not rely on a precise definition. In the latter case, they adopt a model of preference that has been developed in order to accommodate an intuitive notion of uncertainty aversion and interpret the implications of this preference specification as due to uncertainty aversion. (This author is partly responsible for such an exercise (Epstein and Wang, 1995); there are other examples in the literature.) There is an obvious logical flaw in such a procedure, and the claims made (or the interpretations proposed) are unsupportable without a satisfactory definition of uncertainty aversion.

9.1.2. The current definition of uncertainty aversion

In order to motivate the chapter further, consider briefly Schmeidler's definition of uncertainty aversion. See Section 9.5 for a more complete description and for a discussion of the importance of the choice between the AA domain (as in Schmeidler, 1989) and the Savage domain (as in this chapter). Fix a state space (S, Σ), where Σ is an algebra, and an outcome set X. Denote by F the Savage domain, that is, the set of all finite-ranged (simple) and measurable acts e from (S, Σ) into X. Choice behavior relative to F is the object of study.
Accordingly, postulate a preference order ≽ and a representing utility function U defined on F. Schmeidler's definition of uncertainty aversion has been used primarily in the context of CEU theory, according to which uncertain prospects are evaluated by a utility function having the following form:

U^{ceu}(e) = ∫_S u(e) dν,   e ∈ F.   (9.1)
Here, u : X → R¹ is a vNM utility index, ν is a capacity (or nonadditive probability) on Σ, integration is in the sense of Choquet, and other details will be provided later.3 For such a preference order, uncertainty aversion in the sense of Schmeidler is equivalent to convexity of the capacity ν, that is, to the property whereby

ν(A ∪ B) + ν(A ∩ B) ≥ ν(A) + ν(B),   (9.2)

for all measurable events A and B. Additivity is a special case that characterizes uncertainty neutrality (suitably defined). However, Ellsberg's single-urn experiment illustrates the weak connection between convexity of the capacity and behavior that is intuitively uncertainty averse.4 The urn is represented by the state space S = {R, B, G}, where the symbols represent the possible colors, red, blue, and green, of a ball drawn at random from the urn. The information provided the decision-maker is that the urn contains 30 red balls and 90 balls in total. Thus, while he knows that there are 60 balls that are either blue or green, the relative proportions of each are not given. Let ≽ be the decision-maker's preference over bets on events E ⊂ S. Typical choices in such a situation correspond to the following rankings of events:5

{R} ≻ {B} ∼ {G},   {B, G} ≻ {R, B} ∼ {R, G}.   (9.3)
The intuition for these rankings is well known and is based on the fact that {R} and {B, G} have objective probabilities, while the other events are "ambiguous," or have "ambiguous probabilities." Thus these rankings correspond to an intuitive notion of uncertainty or ambiguity aversion. Next suppose that the decision-maker has CEU preferences with capacity ν. Then convexity is neither necessary nor sufficient for the above rankings. For example, if ν(R) = 8/24, ν(B) = ν(G) = 7/24, ν({B, G}) = 13/24, and ν({R, G}) = ν({R, B}) = 1/2, then (9.3) is implied but ν is not convex (convexity fails for the pair {B}, {G}). For the fact that convexity is not sufficient, observe that convexity does not even exclude the "opposite" rankings that intuitively reflect an affinity for ambiguity. (Let ν(R) = 1/12, ν(B) = ν(G) = 1/6, ν({B, G}) = 1/3, ν({R, G}) = ν({R, B}) = 1/2.) An additional example, taken from Zhang (1997), will reinforce the point just made and also illustrate a key feature of the analysis to follow. An urn contains 100 balls in total, with color composition R, B, W, and G, such
that R + B = 50 = G + B. Thus S = {R, B, G, W} and the collection A = {∅, S, {B, G}, {R, W}, {B, R}, {G, W}} contains the events that are intuitively unambiguous. It is natural to suppose that the decision-maker would use the probability measure p on A, where p assigns probability 1/2 to each binary event. For other subsets of S, she might use the capacity p∗ defined by6

p∗(E) = sup{p(B) : B ⊂ E, B ∈ A},   E ⊂ S.

The fact that the capacity of any E is computed by means of an inner approximation by unambiguous events seems to capture a form of aversion to ambiguity. However, p∗ is not convex, because

1 = p∗({B, G}) + p∗({B, R}) > p∗({B, G, R}) + p∗({B}) = 1/2.

Finally, observe that the collection A is not an algebra, because it is not closed with respect to intersections. Each of {R, B} and {G, B} is unambiguous, but {B} is ambiguous, showing that an algebra is not the appropriate mathematical structure for modeling collections of unambiguous events. This important insight is due to Zhang (1997). He further proposes an alternative structure, called a λ-system, that is adopted in Section 9.2.2.
9.2. Aversion to risk and uncertainty

9.2.1. Risk aversion

Recall first some aspects of the received theory of risk aversion. This will provide some perspective for the analysis of uncertainty aversion. In addition, it will become apparent that if a distinction between risk and uncertainty is desired, then the theory of risk aversion must be modified. Because a subjective approach to risk aversion is the relevant one, adapt Yaari's analysis (Yaari, 1969), which applies to the primitives (S, Σ), X ⊂ R^N, and ≽, a preference over the set of acts F. Turn first to "comparative risk aversion." Say that ≽₂ is more risk averse than ≽₁ if for every act e and outcome x,

x ≽₁ (≻₁) e ⟹ x ≽₂ (≻₂) e.   (9.4)
The two acts that are being compared here differ in that the variable outcomes prescribed by e are replaced by the single outcome x. The intuition for this definition is clear given the identification of constant acts with the absence of risk or perfect certainty. To define absolute (rather than comparative) risk aversion, it is necessary to adopt a “normalization” for risk neutrality. Note that this normalization is exogenous to the model. The standard normalization is the “expected value function,”
that is, risk neutral orders ≽rn are those satisfying:

e ≽rn e′ ⟺ ∫_S e(s) dm(s) ≽rn ∫_S e′(s) dm(s),   (9.5)
for some probability measure m on (S, Σ), where the R^N-valued integrals are interpreted as constant acts and accordingly are ranked by ≽rn. This leads to the following definition of risk aversion: say that ≽ is risk averse if there exists a risk neutral order ≽rn such that ≽ is more risk averse than ≽rn. Risk loving and risk neutrality can be defined in the obvious ways. In the SEU framework, this notion of risk aversion is the familiar one characterized by concavity of the vNM index, with the required m being the subjective beliefs or prior. By examining the implications of risk aversion for choice between binary acts, Yaari (1969) argues that this interpretation for m extends to more general preferences. Three points from this review merit emphasis. First, the definition of comparative risk aversion requires an a priori definition for the absence of risk. Observe that the identification of risklessness with constant acts is not tautological. For example, Karni (1983) argues that in a state-dependent expected utility model "risklessness" may very well correspond to acts that are not constant. Thus the choice of how to model risklessness is a substantive normalization that precedes the definition of "more risk averse." Second, the definition of risk aversion requires further an a priori definition of risk neutrality. The final point is perhaps less evident or familiar. Consider rankings of the sort used in (9.4) to define "more risk averse." A decision-maker may prefer the constant act because she dislikes variable outcomes even when they are realized on events that are understood well enough to be assigned probabilities (risk aversion). Alternatively, the reason for the indicated preference may be that the variable outcomes occur on events that are ambiguous and because she dislikes ambiguity or uncertainty.
Thus it seems more appropriate to describe (9.4) as revealing that ≽₂ is "more risk and uncertainty averse than ≽₁," with no attempt being made at a distinction. However, the importance of the distinction between these two underlying reasons seems self-evident; it is reflected also in recent concern with formal models of "Knightian uncertainty" and decision theories that accommodate the Ellsberg (as opposed to Allais) Paradox. The second possibility mentioned earlier can be excluded, and thus a distinction made, by assuming that the decision-maker is indifferent to uncertainty, or, put another way, by assuming that there is no uncertainty (all events are assigned probabilities). But these are extreme assumptions that are contradicted in Ellsberg-type situations. This chapter identifies and focuses upon the uncertainty aversion component implicit in the comparisons (9.4) and, to a limited extent, achieves a separation between risk aversion and uncertainty aversion.
9.2.2. Uncertainty aversion

Once again, consider orders ≽ on F, where for the rest of the chapter the outcome set X is arbitrary rather than Euclidean. The objective now is to formulate intuitive notions of comparative and absolute uncertainty aversion. Turn first to comparative uncertainty aversion. It is clear intuitively, and also from the discussion of risk aversion, that one can proceed only given a prior specification of the "absence of uncertainty." This specification takes the form of an exogenous family A ⊂ Σ of "unambiguous" events. Assume throughout the following intuitive requirements for A: it contains S;

A ∈ A implies that A^c ∈ A; and
A₁, A₂ ∈ A and A₁ ∩ A₂ = ∅ imply that A₁ ∪ A₂ ∈ A.
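For the four-color urn of Section 9.1.2, these closure requirements can be verified mechanically; a minimal sketch (illustrative code, not from the chapter):

```python
from itertools import combinations

S = frozenset("RBGW")
# Zhang's collection of unambiguous events from Section 9.1.2.
A = [frozenset(), S, frozenset("BG"), frozenset("RW"),
     frozenset("BR"), frozenset("GW")]

# The three lambda-system requirements:
assert S in A
assert all(S - E in A for E in A)                            # complements
assert all(E1 | E2 in A
           for E1, E2 in combinations(A, 2) if not E1 & E2)  # disjoint unions

# ... but A is not an algebra: it is not closed under intersections.
assert frozenset("BR") & frozenset("BG") == frozenset("B")
assert frozenset("B") not in A
```

The same checks fail for no λ-system requirement, illustrating why an algebra is too strong a structure for collections of unambiguous events.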
Zhang (1997) argues that these properties are natural for a collection of unambiguous events and, following (Billingsley, 1986: 36), calls such collections λ-systems. Intuitively, if an event being unambiguous means that it can be assigned a probability by the decision-maker, then the sum of the individual probabilities is naturally assigned to a disjoint union, while the complementary probability is naturally assigned to the complementary event. As demonstrated earlier, it is not intuitive to require that A be closed with respect to non-disjoint unions or intersections, that is, that A be an algebra. Denote by F^ua the set of A-measurable acts, also called unambiguous acts. The following definition parallels the earlier one for comparative risk aversion. Given two orderings, say that ≽₂ is more uncertainty averse than ≽₁ if for every unambiguous act h and every act e in F,

h ≽₁ (≻₁) e ⟹ h ≽₂ (≻₂) e.   (9.6)
There is no loss of generality in supposing that the acts h and e deliver the identical outcomes. The difference between the acts lies in the nature of the events where these outcomes are delivered (some of these events may be empty). For h, the typical outcome x is delivered on the unambiguous event h⁻¹(x), while it occurs on an ambiguous event given e. Then whenever the greater ambiguity inherent in e leads ≽₁ to prefer h, the more ambiguity averse ≽₂ will also prefer h. This interpretation relies on the assumption that each event in A is unambiguous and thus is (weakly) less ambiguous than any E ∈ Σ. Fix an order ≽. To define absolute (rather than comparative) uncertainty aversion for ≽, it is necessary to adopt a "normalization" for uncertainty neutrality. As in the case of risk, a natural though exogenous normalization exists, namely that preference is based on probabilities in the sense of being probabilistically sophisticated as defined in Machina and Schmeidler (1992). The functional form of representing utility functions reveals clearly the sense in which preference is based on probabilities. The components of that functional form are a probability measure m on the state space (S, Σ) and a functional W : Δ(X) → R¹, where Δ(X) denotes the set of all simple (finite support) probability measures
on the outcome set X. Using m, any act e induces such a probability distribution, denoted here P_{m,e}. Probabilistic sophistication requires that e be evaluated only through the distribution over outcomes P_{m,e} that it induces. More precisely, utility has the form

U^{ps}(e) = W(P_{m,e}),   e ∈ F.   (9.7)
Following Machina and Schmeidler (1992: 754), assume also that W is strictly increasing in the sense of first-order stochastic dominance, suitably defined.7 Denote any such order by ≽ps. A decision-maker with ≽ps assigns probabilities to all events and in this way transforms any act into a lottery, or pure risk. Such exclusive reliance on probabilities is, in particular, inconsistent with the typical "uncertainty averse" behavior exhibited in Ellsberg-type experiments. Thus it is both intuitive and consistent with common practice to identify probabilistic sophistication with uncertainty neutrality. Think of m and W as the "beliefs" (or probability measure) and "risk preferences" underlying ≽ps.8 This normalization leads to the following definition: say that ≽ is uncertainty averse if there exists a probabilistically sophisticated order ≽ps such that ≽ is more uncertainty averse than ≽ps. In other words, under the conditions stated in (9.6),

h ≽ps (≻ps) e ⟹ h ≽ (≻) e.   (9.8)
The intuition is similar to that for (9.6). It is immediate that ≽ and ≽^ps agree on unambiguous acts. Further, ≽^ps is indifferent to uncertainty and thus views all acts as being risky only. Therefore, interpret (9.8) as stating that ≽^ps is a "risk preference component" of ≽. The indefinite article is needed for two reasons: first, because all definitions depend on the exogenously specified collection A, and second, because ≽^ps need not be unique even given A. Subject to these same qualifications, the probability measure underlying ≽^ps is naturally interpreted as the "mean" or "uncertainty-free" beliefs underlying ≽. The formal analysis stated later does not depend on these interpretations. It might be useful to adapt familiar terminology and refer to a ≽^ps satisfying (9.8) as constituting a support for ≽ at h. Then uncertainty aversion for ≽ means that there exists a single order ≽^ps supporting ≽ at every unambiguous act. A parallel requirement in consumer theory is that there exist a single price vector supporting the indifference curve at each consumption bundle on the 45° line. (This parallel is developed further in Section 9.3.4 and via Theorem 9.3(c).) Turn next to uncertainty loving and uncertainty neutrality. For the definition of the former, reverse the inequalities in (9.8). That is, say that ≽ is uncertainty loving if there exists a probabilistically sophisticated order ≽^ps such that, under the conditions stated in (9.6),

h ≼^ps (≺^ps) e ⟹ h ≼ (≺) e.
(9.9)
The conjunction of uncertainty aversion and uncertainty loving is called uncertainty neutrality.
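As a concrete illustration of probabilistic sophistication, here is a minimal numerical sketch of a utility of the form (9.7); the three-state space, the measure m and the expected-value choice of W are invented for the example, not taken from the chapter.

```python
from fractions import Fraction

def induced_lottery(act, m):
    """Push the measure m on states forward through the act e,
    giving the induced outcome distribution (the object W evaluates)."""
    dist = {}
    for state, prob in m.items():
        x = act[state]
        dist[x] = dist.get(x, Fraction(0)) + prob
    return dist

def U_ps(act, m, W):
    """Probabilistically sophisticated utility (9.7): the act matters
    only through the lottery over outcomes that it induces."""
    return W(induced_lottery(act, m))

# Illustrative risk functional W: expected value of the outcome.
W = lambda dist: sum(x * p for x, p in dist.items())

m = {"s1": Fraction(1, 2), "s2": Fraction(1, 4), "s3": Fraction(1, 4)}
e = {"s1": 10, "s2": 0, "s3": 0}   # pays 10 on s1 only
f = {"s1": 0, "s2": 10, "s3": 10}  # pays 10 on s2 and s3
```

Here e and f induce the same fifty-fifty lottery over {0, 10}, so a probabilistically sophisticated order is indifferent between them; this is exactly the exclusive reliance on probabilities that Ellsberg-type behavior violates.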
A definition of uncertainty aversion
179
9.2.3. A degree of separation

Consider the question of a separation between attitudes toward uncertainty and attitudes toward risk. Suppose that ≽ is uncertainty averse with support ≽^ps. Because ≽ and ≽^ps agree on the set F^ua of unambiguous acts, ≽ is probabilistically sophisticated there. Thus, treating the probability measure underlying ≽^ps as objective, one may adopt the standard notion of risk aversion (or loving) for objective lotteries (see e.g. Machina (1982)) in order to give precise meaning to the statement that ≽ is risk averse (or loving). In the same way, such risk attitudes are well defined if ≽ is uncertainty loving. That a degree of separation between risk and uncertainty attitudes has been achieved is reflected in the fact that all four logically possible combinations of risk and uncertainty attitudes are admissible. On the other hand, the separation is partial: If ≽₁ is more uncertainty averse than ≽₂, then these two preference orders must agree on F^ua and thus embody the same risk aversion. As emphasized earlier, the meaning of uncertainty aversion depends on the exogenously specified A. That specification also bears on the distinction between risk aversion and uncertainty aversion. The suggestion just expressed is that the risk attitude of an order ≽ is embodied in the ranking it induces on F^ua, while the attitude toward uncertainty is reflected in the way in which ≽ relates arbitrary acts e with unambiguous acts h as in (9.6). Thus if the modeler specifies that A = {∅, S}, and hence that F^ua contains only constant acts, then she is assuming that the decision-maker is not facing any meaningful risk. Accordingly, the modeler is led to interpret comparisons of the form (9.4) as reflecting (comparative) uncertainty aversion exclusively.
At the other extreme, if the modeler specifies that A = Σ, and hence that all acts in F are unambiguous, then she is assuming that the decision-maker faces only risk, which leads to the interpretation of (9.4) as reflecting (comparative) risk aversion exclusively. More generally, the specification of A reflects the modeler's prior view of the decision-maker's perception of his environment.
9.3. Is the definition attractive?

9.3.1. Some attractive properties

The definition of uncertainty aversion has been based on the a priori identification of uncertainty neutrality (defined informally) with probabilistic sophistication. Therefore, internal consistency of the approach should deliver this identification as a formal result. On the other hand, because attitudes toward uncertainty have been defined relative to a given A, such a result cannot be expected unless it is assumed that A is "large." Suppose, therefore, that A is rich: There exist x^* ≻ x_* such that for every Ē ⊂ E in Σ and A in A satisfying

(x^*, A; x_*, Aᶜ) ∼ (x^*, E; x_*, Eᶜ),
there exists Ā in A, Ā ⊂ A, such that

(x^*, Ā; x_*, Āᶜ) ∼ (x^*, Ē; x_*, Ēᶜ).

A corresponding notion of richness is valid for the roulette-wheel lotteries in the AA framework adopted by Schmeidler (1989).9 The next theorem (proved in Appendix A) establishes the internal consistency of our approach.

Theorem 9.1. If ≽ is probabilistically sophisticated, then it is uncertainty neutral. The converse is true if A is rich.

The potential usefulness of the notion of uncertainty aversion depends on being able to check for the existence of a probabilistically sophisticated order supporting a given ≽. This concern with tractability motivates the later analysis of eventwise differentiability. Anticipating that analysis, consider here the narrower question "does there exist ≽^ps that both supports ≽ and has underlying beliefs represented by the given probability measure m on Σ?" On its own, the question may seem to be of limited interest. But once eventwise differentiability delivers m, its answer completes a procedure for checking for uncertainty aversion.

Lemma 9.1. Let ≽^ps support ≽ in the sense of (9.8) and have underlying probability measure m on Σ. Then:

(i) For any two unambiguous acts h and h′, if Ψ_{m,h} first-order stochastically dominates Ψ_{m,h′}, then U(h) ≥ U(h′).
(ii) For all acts e and unambiguous acts h, Ψ_{m,e} = Ψ_{m,h} ⟹ U(e) ≤ U(h).

The converse is true if m satisfies: For each unambiguous A and 0 < r < m(A), there exists unambiguous B ⊂ A with m(B) = r. The added assumption for m is satisfied if S = S₁ × S₂, unambiguous events are measurable subsets of S₁ and the marginal of m on S₁ is convex-ranged in the usual sense. The role of the assumption is to ensure that, using the notation surrounding (9.7), {Ψ_{m,h} : h ∈ F^ua} = Δ(X).

9.3.2. Multiple-priors and CEU utilities

The two most widely used generalizations of SEU theory are CEU and the multiple-priors model. In this subsection, uncertainty aversion is examined in the context of these models.
Say that ≽ is a multiple-priors preference order if it is represented by a utility function U^mp of the form

U^mp(e) = min_{m∈P} ∫_S u(e) dm,   (9.10)
for some set P of probability measures on (S, Σ) and some vNM index u : X → ℝ. Given a class A, it is natural to model the unambiguous nature of events in A by supposing that all measures in P are identical when restricted to A; that is,

m(A) = m′(A)   for all m and m′ in P and A in A.
(9.11)
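A minimal numerical sketch of (9.10) and (9.11) may help (the state names, the two priors and the linear vNM index are all hypothetical): utility is the worst-case expected utility over the set P, and because the priors agree on unambiguous events, bets on them are evaluated by a single probability.

```python
from fractions import Fraction

def U_mp(act, priors, u):
    """Multiple-priors utility (9.10): minimum over m in P of E_m[u(act)]."""
    return min(sum(m[s] * u(act[s]) for s in m) for m in priors)

u = lambda x: x  # illustrative linear vNM index
F = Fraction

# Priors on S = {R, B, G} agreeing on the unambiguous events {R} and
# {B, G}, as condition (9.11) requires, but disagreeing inside {B, G}:
P = [{"R": F(1, 3), "B": F(1, 2), "G": F(1, 6)},
     {"R": F(1, 3), "B": F(1, 6), "G": F(1, 2)}]

bet_R = {"R": 1, "B": 0, "G": 0}  # unambiguous bet
bet_B = {"R": 0, "B": 1, "G": 0}  # ambiguous bet
```

U_mp(bet_R, P, u) equals 1/3 under either prior, while U_mp(bet_B, P, u) is the worst case 1/6, so the unambiguous bet is strictly preferred even though one prior assigns B probability 1/2; this is the uncertainty-averse pattern that Theorem 9.2 formalizes.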
These two restrictions on ≽ imply uncertainty aversion, because ≽ is more uncertainty averse than the expected utility order ≽^ps with vNM index u and any probability measure m̄ in P. More precisely, the following intuitive result is valid:

Theorem 9.2. Any multiple-priors order satisfying (9.11) is uncertainty averse.

Proof. Let ≽^ps denote an expected utility order with vNM index u and any probability measure m̄ in P. Then h ≽^ps e ⟺ ∫ u(h) dm̄ ≥ ∫ u(e) dm̄ ⟹ U^mp(h) = ∫ u(h) dm̄ ≥ ∫ u(e) dm̄ ≥ U^mp(e).

A commonly studied special case of the multiple-priors model is a CEU order with convex capacity ν. Then (9.10) applies with P = core(ν) = {m : m(·) ≥ ν(·) on Σ}. Thus convexity of the capacity implies uncertainty aversion given (9.11). Focus more closely on the CEU model, with particular emphasis on the connection between uncertainty aversion and convexity of the capacity. The next result translates Lemma 9.1 into the present setting, thus providing necessary and sufficient conditions for uncertainty aversion combined with a prespecified supporting probability measure m. For necessity, an added assumption is adopted. Say that a capacity ν is convex-ranged if for all events E₁ ⊂ E₂ and ν(E₁) < r < ν(E₂), there exists E, E₁ ⊂ E ⊂ E₂, such that ν(E) = r. This terminology applies in particular if ν is additive, where it is standard.10 For axiomatizations of CEU that deliver a convex-ranged capacity, see Gilboa (1987: 73) and Sarin and Wakker (1992: Proposition A.3). Savage's axiomatization of expected utility delivers a convex-ranged probability measure. Use the notation U^ceu to refer to utility functions defined by (9.1), where the vNM index u : X → ℝ satisfies: u(X) has nonempty interior in ℝ. For those unfamiliar with Choquet integration, observe that for simple acts it yields

U^ceu(e) = Σ_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] ν(∪_{j=1}^{i} E_j) + u(x_n),   (9.12)

where the outcomes are ranked as x₁ ≻ x₂ ≻ ⋯ ≻ x_n and the act e has e(x_i) = E_i, i = 1, …, n.
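Formula (9.12) can be sketched directly (a toy two-state example; the capacity values are invented): sort outcomes from best to worst and weight successive utility drops by the capacity of the cumulative "at least this good" event.

```python
from fractions import Fraction

def choquet(act, u, nu):
    """Choquet integral of u(act) against the capacity nu, via (9.12):
    sum of [u(x_i) - u(x_{i+1})] * nu(E_1 ∪ ... ∪ E_i), with the
    convention u(x_{n+1}) = 0; this matches the chapter's formula
    whenever nu(S) = 1."""
    outcomes = sorted(set(act.values()), key=u, reverse=True)
    total, cum = Fraction(0), frozenset()
    for i, x in enumerate(outcomes):
        cum = cum | {s for s in act if act[s] == x}
        next_u = u(outcomes[i + 1]) if i + 1 < len(outcomes) else 0
        total += (u(x) - next_u) * nu(cum)
    return total

F = Fraction
# Hypothetical capacity on S = {a, b}, subadditive on singletons:
table = {frozenset(): F(0), frozenset({"a"}): F(3, 10),
         frozenset({"b"}): F(3, 10), frozenset({"a", "b"}): F(1)}
nu = lambda E: table[frozenset(E)]

act = {"a": 10, "b": 2}  # pays 10 on a, 2 on b
```

With u(x) = x, the integral is (10 − 2)·ν({a}) + 2·ν(S) = 22/5, strictly below the expectation 6 that the even-split prior in core(ν) would assign; the discount reflects the capacity's pessimism about the good event.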
Lemma 9.2. Let U^ceu be a CEU utility function with capacity ν.

(a) The following conditions are sufficient for U^ceu to be uncertainty averse with supporting U^ps having m as underlying probability measure: There exists a bijection g : [0, 1] → [0, 1] such that

m ∈ core(g⁻¹(ν));   (9.13)

and

m(·) = g⁻¹(ν(·)) on A.   (9.14)
(b) Suppose that ν is convex-ranged and that A is rich. Then the conditions in (a) are necessary in order that U^ceu be uncertainty averse with supporting U^ps having m as underlying probability measure.

(c) Finally, in each of the preceding parts, the supporting utility U^ps can be taken to be an expected utility function if and only if in addition g is the identity function, that is,

m = ν on A and m ≥ ν on Σ.
(9.15)
See Appendix A for a proof. The supporting utility function U^ps that is provided by the proof of (a) has the form (9.7), where the risk preference functional W is

W(Ψ) = ∫_X u(x) d(g ∘ Ψ)(x),
a member of the rank-dependent expected utility class (Chew et al., 1987). Observe first that attitudes toward uncertainty do not depend on properties of the vNM index u. More surprising is that, given m, the conditions on ν described in (a) are ordinal invariants; that is, if ν satisfies them, then so does ϕ(ν) for any monotonic transformation ϕ. In other words, ν and g satisfy these conditions if and only if ϕ(ν) and g′ = ϕ ∘ g do. Consequently, under the regularity conditions in the lemma, the CEU utility function ∫ u(e) dν is uncertainty averse if and only if the same is true for ∫ u(e) dϕ(ν). The fact that uncertainty aversion is determined by ordinal properties of the capacity makes it perfectly clear that uncertainty aversion has little to do with convexity, a cardinal property. Thus far, only parts (a) and (b) of the lemma have been used. Focus now on (c), characterizing conditions under which U^ceu is "more uncertainty averse than some expected utility order with probability measure m." Because the CEU utility functions studied by Schmeidler are defined on horse-race/roulette-wheels and conform with expected utility on the objective roulette-wheels, this latter comparison may be more relevant than uncertainty aversion per se for understanding the connection with convexity. The lemma delivers the requirement that ν be additive on A and that it admit an extension to a measure lying in its core. It is well known that convexity of ν is sufficient for nonemptiness of the core, but that seems to be the extent of the link with uncertainty aversion. The final example in Section 9.1.2, as completed in the next subsection, shows that U^ceu may be more uncertainty averse than some expected utility order even though its capacity is not convex.
To summarize, there appears to be no logical connection in the Savage framework between uncertainty aversion and convexity. Convexity does not imply uncertainty aversion, unless added conditions such as (9.11) are imposed. Furthermore, convexity is not necessary even for the stricter notion "more uncertainty averse than some expected utility order," which seems closer to Schmeidler's notion. This is not to say that convexity, and the associated multiple-priors functional structure that it delivers, is not a useful hypothesis. Rather, the point is to object to the widely adopted behavioral interpretation of convexity as uncertainty aversion.

9.3.3. Inner measures

Zhang (1997) argues that it is capacities that are inner measures, rather than convex capacities, that model uncertainty aversion. These capacities are defined as follows: Let p be a probability measure on A; its existence reflects the unambiguous nature of events in A. Then the corresponding inner measure p∗ is the capacity given by

p∗(E) = sup{p(B) : B ⊂ E, B ∈ A},   E ∈ Σ.
The fact that the capacity of any E is computed by means of an inner approximation by unambiguous events seems to capture a form of aversion to ambiguity. Zhang provides axioms for preference that are consistent with this intuition and that deliver the subclass of CEU preferences having an inner measure as the capacity ν. It is interesting to ask whether CEU preferences with inner measures are uncertainty averse in the formal sense of this chapter. The answer is "sometimes," as described in the next lemma.

Lemma 9.3. Let U^ceu(·) ≡ ∫ u(·) dp∗, where p∗ is the inner measure generated as above from the probability measure p on A.

(a) If p admits an extension to a probability measure on Σ, then U^ceu is more uncertainty averse than the expected utility function ∫ u(·) dp.

(b) Adopt the auxiliary assumptions in Lemma 9.2(b). If U^ceu is uncertainty averse, then p admits an extension from A to a measure on all of Σ.

Proof. (a) p∗ and p coincide on A. For every B ⊂ E with B in A, p(B) ≤ p(E), where p denotes also the assumed extension. Therefore, p∗(E) ≤ p(E). From the formula (9.12) for the Choquet integral, conclude that for all acts e and unambiguous acts h,
∫ u(h) dp∗ = ∫ u(h) dp   and   ∫ u(e) dp∗ ≤ ∫ u(e) dp.
(b) By Lemma 9.2 and its proof, p = p∗ = g(m) on A and m(A) = [0, 1]. Therefore, g must be the identity function. Again by the previous lemma, m lies in core(p∗ ), implying that m ≥ p∗ = p on A. Because A is closed with respect to complements, conclude that m = p on A and hence that m is the asserted extension of p.
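The inner-measure construction is easy to compute on a finite example (the Ellsberg-style λ-system below is chosen for illustration and is not the chapter's formal example): p∗(E) is the largest p-mass of an unambiguous event contained in E.

```python
from fractions import Fraction

F = Fraction
S = frozenset({"R", "B", "G"})
# A probability p on the lambda-system A of unambiguous events:
p = {frozenset(): F(0), frozenset({"R"}): F(1, 3),
     frozenset({"B", "G"}): F(2, 3), S: F(1)}

def inner(E):
    """Inner measure: p*(E) = sup{p(B) : B in A, B a subset of E}."""
    E = frozenset(E)
    return max(mass for B, mass in p.items() if B <= E)

# p* agrees with p on unambiguous events, but discounts ambiguous
# ones: inner({"B"}) is 0, and inner({"R", "B"}) is only 1/3, below
# the 2/3 that the uniform extension (1/3 per colour) would assign.
```

The uniform measure with mass 1/3 per colour extends p from A to the power set, so by Lemma 9.3(a) the induced CEU preference is more uncertainty averse than expected utility under that extension; numerically, p∗ lies below the extension everywhere, with equality on A.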
Both directions in the lemma are of interest. In general, a probability measure on the λ-system A need not admit an extension to the algebra Σ.11 Therefore, (b) shows that the intuition surrounding "inner approximation" is flawed or incomplete, demonstrating the importance of a formal definition of uncertainty aversion. Part (a) provides a class of examples of CEU functions that are more uncertainty averse than some expected utility order. These can be used to show that even if this stricter notion of (more) uncertainty averse is adopted, the capacity p∗ need not be convex. For instance, the last example in Section 9.1.2 satisfies the conditions in (a): the required extension is the equally likely (counting) probability measure on the power set. Thus preference is uncertainty averse, even though p∗ is not convex.

9.3.4. Bets, beliefs and uncertainty aversion

This section examines some implications of uncertainty aversion for the ranking of binary acts. Because the ranking of bets reveals the decision-maker's underlying beliefs or likelihoods, these implications clarify the meaning of uncertainty aversion and help to demonstrate its intuitive empirical content. The generic binary act is denoted x E y, indicating that x is obtained if E is realized and y otherwise. Let ≽ be uncertainty averse with probabilistically sophisticated order ≽^ps satisfying (9.8). Apply the latter to binary acts, to obtain the following relation: For all unambiguous A, events E and outcomes x₁ and x₂,

x₁Ax₂ ≽^ps (≻^ps) x₁Ex₂ ⟹ x₁Ax₂ ≽ (≻) x₁Ex₂.

Proceed to transform this relation into a more illuminating form. Exclude the uninteresting case x₁ ∼ x₂ and assume that x₁ ≻ x₂. Then x₁Ex₂ can be viewed as a bet on the event E. As noted earlier, ≽^ps necessarily agrees with the given ≽ in the ranking of unambiguous acts, and hence also of constant acts or outcomes, so x₁ ≻^ps x₂. Let m be the subjective probability measure on the state space (S, Σ) that underlies ≽^ps.
Then the monotonicity property inherent in probabilistic sophistication implies that

x₁Ax₂ ≽^ps (≻^ps) x₁Ex₂ ⟺ m(A) ≥ (>) m(E).

Conclude that uncertainty aversion implies the existence of a probability measure m such that: For all A, E, x₁ and x₂ as given earlier,

m(A) ≥ (>) m(E) ⟹ x₁Ax₂ ≽ (≻) x₁Ex₂.

One final rewriting is useful. Define, for the given pair x₁ ≻ x₂,

ν(E) = U(x₁Ex₂).
Then,

m(A) ≥ (>) m(E) ⟹ ν(A) ≥ (>) ν(E),
(9.16)
which is the sought-after implication of uncertainty aversion.12 In the special case of CEU (9.1), with vNM index satisfying u(x₁) = 1 and u(x₂) = 0, ν defined as stated earlier coincides with the capacity in the CEU functional form. Even when CEU is not assumed, (suppose that ν is monotone with respect to set inclusion and) refer to ν as a capacity. The interpretation is that ν represents ≽ numerically over bets on various events with the given stakes x₁ and x₂, or alternatively, that it represents numerically the likelihood relation underlying the preference ≽. From this perspective, only the ordinal properties of ν are significant.13 An implication of (9.16) is that ν and m must be ordinally equivalent on A (though not on Σ). In other words, uncertainty aversion implies the existence of a probability measure m that supports {E ∈ Σ : ν(E) ≥ ν(A)} at each unambiguous A, where support is in a sense analogous to the usual meaning, except that the usual linear supporting function defined on a linear space is replaced by an additive function defined on an algebra. Think of the measure m as describing the (not necessarily unique) "mean ambiguity-free likelihoods" implicit in ν and ≽. This interpretation and the "support" analogy are pursued and developed further in Section 9.4.3 under the assumption that preference is eventwise differentiable. In a similar fashion, one can show that uncertainty loving implies the existence of a probability measure q on (S, Σ) such that

q(A) ≤ (<) q(E) ⟹ ν(A) ≤ (<) ν(E),
(9.17)
for every E ∈ Σ and A ∈ A. The conjunction of (9.16) and (9.17) implies, under a mild additional assumption, that ν is ordinally equivalent to a probability measure (see Lemma 9.A.1), which is one step in the proof of Theorem 9.1. Because choice between bets provides much of the experimental evidence regarding nonindifference to uncertainty, the implication (9.16) is convenient for demonstrating the intuitive empirical content of uncertainty aversion. The Ellsberg urn discussed in the introduction provides the natural vehicle. Consider again the typical choices in (9.3). In order to relate these rankings to the formal definition of uncertainty aversion, adopt the natural specification A = {∅, S, {R}, {B, G}}. Given this specification, it is easy to see that these rankings imply uncertainty aversion: the measure m assigning 1/3 probability to each color is a support in the sense of (9.16). Equally revealing is that the notion of uncertainty aversion excludes behavior that is interpreted intuitively as reflecting an affinity for ambiguity.14 To see this, suppose that the decision-maker's rankings are changed by reversing the strict preference "≻" to "≺". These new rankings contradict uncertainty aversion: Let
m be a support as in the implication (9.16) of uncertainty aversion, and take A = {B, G}. Then {B, G} ≺ {R, B} implies that m({B, G}) < m({R, B}). Because m is additive, conclude that m(G) < m(R). But then uncertainty aversion applied to the unambiguous event {R} implies that {R} ≻ {G}, contrary to the hypothesis. Though a general formal result seems unachievable, there is an informal sense in which these results seem to be valid much more broadly than in the specific Ellsberg experiment considered. Typically, when choices are viewed as paradoxical relative to probabilistically sophisticated preferences, there is a natural probability measure on the state space that is "contradicted" by observed choices. This seems close to saying precisely that the measure is a support. Another revealing implication of uncertainty aversion is readily derived from (9.16). Notation that is useful here and later is: given A, write an arbitrary event E in the form

E = A + F − G,   where   F = E\A   and   G = A\E.   (9.18)
Henceforth, E + F denotes both E ∪ F and the assumption that the sets are disjoint. Similarly, implicit in the notation E − G is that G ⊂ E. Now let m be the supporting measure delivered by uncertainty aversion. Then for any unambiguous A and A′, if F ⊂ A′ ∩ Aᶜ and G ⊂ A′ᶜ ∩ A,

A′ − F + G ≽ A′ ⟹ A + F − G ≼ A,   (9.19)

because the first ranking implies (by the support property at A′) that m(F) ≤ m(G), and this implies the second ranking (by the support property at A).15 In particular, taking A′ = Aᶜ,

Aᶜ − F + G ≽ Aᶜ ⟹ A + F − G ≼ A,
(9.20)
for all F ⊂ Aᶜ and G ⊂ A. The interpretation is that if F seems small relative to G when (as at A′) one is contemplating subtracting F and adding G, then it also seems small when (as at A) one is contemplating adding F and subtracting G. This is reminiscent of the familiar inequality between the compensating and equivalent variations for an economic change, or of the property of diminishing marginal rate of substitution. A closer connection between uncertainty aversion and such familiar notions from consumer theory is possible if eventwise differentiability of preference is assumed, as in the next section.
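The Ellsberg reasoning in this subsection can be checked mechanically (the numerical ν values below are hypothetical; only their ordering matters): the uniform measure supports the typical uncertainty-averse rankings in the sense of (9.16), while the reversed, ambiguity-seeking rankings fail the test at that same measure, in line with the argument above.

```python
from fractions import Fraction
from itertools import combinations

F = Fraction
S = ("R", "B", "G")
events = [frozenset(c) for r in range(4) for c in combinations(S, r)]
A_class = [frozenset(), frozenset({"R"}), frozenset({"B", "G"}), frozenset(S)]

def is_support(m, nu):
    """Check (9.16): for unambiguous A and any event E,
    m(A) >= m(E) implies nu(A) >= nu(E), strictly when strict."""
    mass = lambda E: sum(m[s] for s in E)
    for A in A_class:
        for E in events:
            if mass(A) >= mass(E) and nu[A] < nu[E]:
                return False
            if mass(A) > mass(E) and nu[A] <= nu[E]:
                return False
    return True

m = {s: F(1, 3) for s in S}
# Willingness-to-bet levels matching the typical Ellsberg rankings:
averse = {frozenset(): F(0), frozenset({"R"}): F(1, 3),
          frozenset({"B"}): F(1, 6), frozenset({"G"}): F(1, 6),
          frozenset({"R", "B"}): F(1, 2), frozenset({"R", "G"}): F(1, 2),
          frozenset({"B", "G"}): F(2, 3), frozenset(S): F(1)}
# Reversed (ambiguity-seeking) rankings:
seeking = {frozenset(): F(0), frozenset({"R"}): F(1, 6),
           frozenset({"B"}): F(1, 3), frozenset({"G"}): F(1, 3),
           frozenset({"R", "B"}): F(2, 3), frozenset({"R", "G"}): F(2, 3),
           frozenset({"B", "G"}): F(1, 2), frozenset(S): F(1)}
```

is_support(m, averse) holds, while is_support(m, seeking) fails at A = {B, G} and E = {R, B}: the two events have equal m-mass but the ambiguous one is ranked strictly higher, which is exactly the contradiction derived in the text.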
9.4. Differentiable utilities

Tractability in applying the notion of uncertainty aversion raises the following question: Is there a procedure for deriving from ≽ all probabilistically sophisticated orders satisfying (9.8), or for deriving from ν all candidate supporting measures m satisfying (9.16)? Such a procedure is essential for the hypothesis of uncertainty aversion to be verifiable. For example, within CEU, Lemma 9.2 describes the probability measures that can serve as supports. However, to apply
the description, one must be able to compute the cores of the capacity ν and of monotonic transformations of ν, while even the core of ν alone is typically not easily computed from ν. In order to address the question of tractability, this section introduces the notion of eventwise differentiability of preference. Much as, within expected utility theory (where outcomes lie in some ℝⁿ), differentiability of the vNM index simplifies the task of checking for concavity and hence risk aversion, eventwise differentiability simplifies the task of checking for uncertainty aversion. That is because such differentiability permits the candidate supporting measures to be derived via convenient calculations of the sort familiar from calculus. Further, conditions are provided that deliver a unique supporting measure from the eventwise derivative of utility. When combined with Lemmas 9.1 and 9.2, this provides a practicable characterization of uncertainty aversion.

9.4.1. Definition of eventwise differentiability

The standard representation of an act, used earlier, is as a measurable map from states into outcomes. Let e : S → X be such an act. An alternative representation of this act is by means of the inverse correspondence e⁻¹, denoted here by ê. Thus ê : X → Σ, where ê(x) denotes the event E on which the act assumes the outcome x. For notational simplicity, it is convenient to write e rather than ê and to leave it to the context to make clear whether e denotes a mapping from states into consequences or alternatively from outcomes into events. Henceforth, when examining the decision-maker's ranking of a pair of acts, view those acts as assigning a common set of outcomes to different events. This perspective is "dual" to the more common one, where distinct acts are viewed as assigning different outcomes to common events. These two perspectives are mathematically equally valid; the choice between them is a matter of convenience.
The latter is well suited to the study of risk aversion (attitudes toward variability in outcomes) and, it is argued here, the former is well suited to the study of uncertainty aversion. The intuition is that uncertainty or ambiguity stems from events and that aversion to uncertainty reflects attitudes toward changes in those events. Because acts are simple,

{x ∈ X : e(x) ≠ ∅} is finite.
(9.21)
In addition,

{e(x) : x ∈ X, e(x) ≠ ∅}
partitions S.
(9.22)
The set of acts F may be identified with the set of all maps satisfying these two conditions. In particular, F ⊂ Σ^X, where the latter is defined as the set of all maps from X into Σ satisfying (9.21). Let U : F → ℝ be a utility function for ≽ and define the "eventwise derivative of U." Because utility is defined on a subset of Σ^X, it is convenient to define derivatives first for functions that are defined on all of Σ^X. Continue to refer to elements e ∈ Σ^X as acts even when they are not elements of F.
The following structure for Σ^X is useful. Define the operations "∪", "∩" and "complementation" (e ↦ eᶜ) on Σ^X coordinatewise; for example,

(e ∪ f)(x) ≡ e(x) ∪ f(x),
for all x ∈ X .
Say that e and f are disjoint if e(x) ∩ f(x) ≡ ∅ for all x, abbreviated e ∩ f = ∅. In that case, denote the above union by e + f. The notation e′\e and e′ △ e indicates set difference and symmetric difference applied outcome by outcome. Similar meaning is given to g ⊂ e. Say that {f^j}_{j=1}^{n} partitions f if {f^j(x)} partitions f(x) for each x. Define the refinement partial ordering of partitions in the obvious way. Given an act f, {{f^{j,λ}}_{j=1}^{n_λ}}_λ denotes the net of all finite partitions of f, where λ < λ′ if and only if the partition corresponding to λ′ refines the partition corresponding to λ. A real-valued function µ on Σ^X is called additive if it is additive across disjoint acts. Refer to such a function as a (signed) measure, even though that terminology is usually reserved for functions defined on algebras, while Σ^X is not an algebra.16 Expected utility functions, U(e) = Σ_x u(x) p(e(x)), are additive and hence measures in this terminology. The properties of boundedness and convex-rangedness for a measure µ on Σ^X can be defined in the natural way (see Appendix B).

Define differentiability for a function Φ : Σ^X → ℝ. In order to better understand the essence of the definition, some readers may wish to focus on the special case where the domain of Φ is Σ. Then each act e is simply an event E. One can think of Φ as a capacity and of δΦ(·; E) as its derivative at E.

Definition 9.1. Φ is (eventwise) differentiable at e ∈ Σ^X if there exists a bounded and convex-ranged measure δΦ(·; e) on Σ^X such that: For all f ⊂ eᶜ and g ⊂ e,

Σ_{j=1}^{n_λ} | Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e) − δΦ(f^{j,λ}; e) + δΦ(g^{j,λ}; e) | →_λ 0.   (9.23)

Any utility function U is defined on the proper subset F of Σ^X. Define δU(·; e) as given earlier, with the exception that the perturbations f^{j,λ} and g^{j,λ} are restricted so that e + f^{j,λ} − g^{j,λ} lies in F. Say that U is eventwise differentiable if the derivative exists at each e in F. (To clarify the notation, suppose that e is an act in F that assumes the outcomes x₁ and x₂ on E and Eᶜ, respectively. Let f assume (only) these outcomes on events F ⊂ Eᶜ and G ⊂ E, while g assumes (only) x₁ and x₂ on G and F, respectively. Then f and g lie in Σ^X, f ⊂ eᶜ, g ⊂ e, and e + f − g is the act in F that yields x₁ on E + F − G and x₂ on its complement. Further, if {F^{j,λ}} and {G^{j,λ}} are partitions of F and G and if f^{j,λ} and g^{j,λ} are defined in fashion paralleling the definitions given for f and g, then {f^{j,λ}} and {g^{j,λ}} are partitions of f and g that enter into the definition of δU(·; e).)

The suggested interpretation is that δU(·; e) represents the "mean" or "uncertainty-free" assessment of acts implicit in utility, as viewed from the perspective of
the act e. It may help to recall that in the theory of expected utility over objective lotteries or risk, if the vNM index is differentiable, then utility is linear to the first order and hence preference is risk neutral for small gambles. The suggested parallel here is that a differentiable utility is additive (rather than linear) and uncertainty neutral (rather than risk neutral) to the "first order." Before applying eventwise differentiability to the analysis of uncertainty aversion, the next section provides some examples. See Appendix C for some technical aspects of eventwise differentiability, for a stronger form of differentiability (similar to that in Machina, 1992) and for a brief comparison with Rosenmuller (1972), which inspired the definition given earlier.

9.4.2. Examples

Turn to some examples that illustrate both differentiability and uncertainty aversion. All are special cases of the CEU model (9.12), though other examples are readily constructed. Because the discussion of differentiability dealt with functions defined on Σ^X rather than just F, rewrite the CEU functional form here using this larger domain. If the outcomes satisfy x₁ ≻ x₂ ≻ ⋯ ≻ x_n and the act e has e(x_i) = E_i, i = 1, …, n, then

U^ceu(e) = Σ_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] ν(∪_{j=1}^{i} E_j) + u(x_n) ν(∪_{j=1}^{n} E_j).

Suppose that the capacity ν is eventwise differentiable with derivative δν(·; E) at E; naturally, differentiability is in the sense of the last section (with |X| = 1). Then U^ceu(·) is eventwise differentiable with derivative

δU(e′; e) = Σ_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] δν(∪_{j=1}^{i} E′_j ; ∪_{j=1}^{i} E_j) + u(x_n) δν(∪_{j=1}^{n} E′_j ; ∪_{j=1}^{n} E_j),   (9.24)

where e′(x_i) = E′_i. (This follows, as in calculus, from the additivity property of differentiation.) Because differentiability of utility is determined totally by that of the capacity, it is enough to consider examples of differentiable (and nondifferentiable) capacities. In each case where the capacity is differentiable, (9.24) describes the corresponding derivative of utility. The CEU case demonstrates clearly that eventwise differentiability is distinct from more familiar notions, such as Gateaux differentiability. It is well known that a CEU utility function is not Gateaux differentiable, even if the vNM index is smooth, unless it is an expected utility function. In contrast, many CEU utility functions are eventwise differentiable, regardless of the nature of u(·). Verification of the formulae provided for derivatives is possible using the definition (9.23). Alternatively, verification of the stronger µ-differentiability (see Appendix C) is straightforward. (Define µ by (9.C.2) and µ₀ = p in the first two
examples, µ₀ = q in the third example, and µ₀ = λ^*/λ^*(S) in the final example, where only "one-sided" derivatives exist.)

Example. (Probability measure) Let p be a convex-ranged probability measure. Then δp(·; E) = p(·), the same measure for all E. Application of (9.24) yields

δU(e′; e) = Σ_{i=1}^{n} u(x_i) p(E′_i).
Example. (Probabilistic sophistication within CEU ) Let ν = g(p),
(9.25)
where p is a convex-ranged probability measure and g : [0, 1] → [0, 1] is increasing, onto and continuously differentiable. The corresponding utility function lies in the rank-dependent-expected-utility class of functions studied in the case of risk, where p is taken to be objective. (See Chew et al. (1987) and the references therein.) Then δν(·; E) = g′(p(E)) p(·) and

δU(e′; e) = Σ_{i=1}^{n} [u(x_i) − u(x_{i+1})] g′(p(∪_{j=1}^{i} E_j)) p(∪_{j=1}^{i} E′_j),   u(x_{n+1}) ≡ 0.
Example. (Quadratic capacity) Let ν(E) = p(E) q(E), where p and q are convex-ranged probability measures with p ≠ q. Then

δν(·; E) = p(E) q(·) + p(·) q(E),

a formula that is reminiscent of standard calculus.17 Direct verification shows that ν is convex. As for uncertainty aversion, if p and q agree on A, then the probability measure on Σ defined by

m(·) = δν(·; A)/δν(S; A) = [q(·) + p(·)]/2

serves as a support in the sense of (9.16). That the implied CEU utility function is uncertainty averse in the full sense of (9.8) may be established by application of Lemma 9.2. Observe that ν = p² = m² on A; thus g(t) = t². Then m lies in the core of (pq)^{1/2}, because [p(·) + q(·)]² ≥ 4 p(·) q(·). The probabilistically sophisticated supporting utility function U^ps is

U^ps(e) = ∫_S u(e) d(m²).
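The quadratic-capacity claims can be verified exactly on a small finite space (the four states and the measures p and q below are invented for the check): the derivative formula misses ν(E + G) − ν(E) only by the second-order term p(G)q(G), and m = (p + q)/2 dominates (pq)^{1/2} because [p + q]² ≥ 4pq.

```python
from fractions import Fraction
from itertools import combinations

F = Fraction
S = ("a", "b", "c", "d")
p = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 4), "d": F(1, 4)}
q = {"a": F(1, 4), "b": F(1, 4), "c": F(1, 8), "d": F(3, 8)}
# p and q agree on the (hypothetical) unambiguous events, e.g. {a, b}.

mass = lambda mu, E: sum(mu[s] for s in E)
nu = lambda E: mass(p, E) * mass(q, E)  # quadratic capacity
dnu = lambda G, E: mass(p, E) * mass(q, G) + mass(p, G) * mass(q, E)

events = [set(c) for r in range(5) for c in combinations(S, r)]

# First-order accuracy: for disjoint E and G,
#   nu(E + G) - nu(E) = dnu(G; E) + p(G) q(G),
# so the error of the derivative is second order in the perturbation.

# Candidate support: m = (p + q)/2, which equals the normalized
# derivative dnu(.; A)/dnu(S; A) at any A on which p and q agree.
m = {s: (p[s] + q[s]) / 2 for s in S}
```

Enumerating all events confirms m(E)² ≥ ν(E), so m lies in core(g⁻¹(ν)) with g(t) = t², and at A = {a, b} the normalization dnu({c}, A)/dnu(S, A) returns m({c}) = 3/16, matching the formula m(·) = [p(·) + q(·)]/2 in the example.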
A definition of uncertainty aversion
191
Example. (Interval beliefs) Let λ∗ and λ* be two non-negative, convex-ranged measures on (S, Σ), such that λ∗(·) ≤ λ*(·) and

0 < λ∗(S) < 1 < λ*(S).

Define ξ = λ*(S) − 1 and

ν(E) = max{λ∗(E), λ*(E) − ξ}.   (9.26)
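A toy tabulation of (9.26) (the four-state space, base weights and band multipliers below are our own illustrative assumptions): it checks that ν(S) = 1 and that a probability in the band, here the base measure itself, dominates ν, consistent with the core representation discussed below.

```python
from itertools import chain, combinations

STATES = (0, 1, 2, 3)
r = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}   # toy base probability (ours)
LO, HI = 0.8, 1.3                      # lambda_* = 0.8 r and lambda^* = 1.3 r
XI = HI - 1.0                          # xi = lambda^*(S) - 1 = 0.3

def rmass(E):
    return sum(r[s] for s in E)

def nu(E):
    """Interval-beliefs capacity (9.26)."""
    return max(LO * rmass(E), HI * rmass(E) - XI)

events = list(chain.from_iterable(combinations(STATES, k) for k in range(5)))
# r lies in the band, since 0.8 r <= r <= 1.3 r, and it dominates nu eventwise
dominated = all(rmass(E) >= nu(E) - 1e-12 for E in events)
```

Note that ν(S) = max{λ∗(S), λ*(S) − ξ} = 1 by the normalization of ξ.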
Then ν is a convex capacity on (S, Σ) and has the core

core(ν) = {p ∈ M(S, Σ) : λ∗(·) ≤ p(·) ≤ λ*(·) on Σ}.

This representation for the core provides intuition for ν and the reason for its name. See Wasserman (1990) for details regarding this capacity and its applications in robust statistics. Because the capacity is "piecewise additive," one can easily see that though it has "one-sided derivatives," ν is generally not eventwise differentiable at any E such that λ∗(E) = λ*(E) − ξ. It follows from Theorem 9.3 and the nature of core(ν) that a CEU utility U^{ceu} with capacity ν is uncertainty averse for any class A such that λ∗(·) = λ*(·) on A\{S}. Because any such class A excludes events that are "close to" S, such an A cannot be rich. In fact, one can show using Lemma 9.2 that it is impossible for U^{ceu} to be uncertainty averse relative to any rich class of unambiguous events, unless λ∗(·)/λ∗(S) = λ*(·)/λ*(S) on Σ, in which case U^{ceu} is probabilistically sophisticated, providing another illustration of the lack of a connection between uncertainty aversion and convexity.

9.4.3. Uncertainty aversion under differentiability

To begin this section, the discussion will be restricted to binary acts; that is, uncertainty aversion will refer to (9.9), or equivalently, to (9.16). Implications are then drawn for uncertainty aversion in the full sense of general acts and (9.8). The relevant derivative is δν(·; E), where ν(E) ≡ U(x₁Ex₂) and U need not be a CEU function. Assume that ν(E) is increasing in E. Thus δν(·; E) is a non-negative measure, though not necessarily a probability measure. The suggested interpretation from Section 9.4.1, specialized to this case, is that δν(·; E) represents the "mean" or "uncertainty-free" likelihoods implicit in ν, as viewed from the perspective of the event E. This interpretation is natural given that δν(·; E) is additive over events and hence ordinally equivalent to a probability measure on Σ.
Turn to the relation between differentiability and uncertainty aversion. When ν is differentiable, analogy with calculus might suggest that the support at any event A, in the sense of (9.16), should be unique and given by δν(·; A), perhaps up to a scalar multiple. Though the analogy with calculus is imperfect, it is nevertheless the case that, under additional assumptions, differentiability provides information about the set of supports.
The principal additional assumption may be stated as follows. Define

A₀ ≡ {A ∈ A : ν(S) > max{ν(A), ν(Aᶜ)}},

the set of unambiguous events A such that A and its complement are each strictly less likely than S. Say that ν is coherent if there exists a positive real-valued function κ defined on A₀, such that

δν(·; A) = κ(A) δν(·; Aᶜ) on Σ,   (9.27)
for each A in A₀. Coherence is satisfied by all the differentiable examples in Section 9.4.2. By the Chain Rule for eventwise differentiability (Theorem 9.C.1), coherence is invariant to suitable monotonic transformations of ν and thus is an assumption about the preference ranking of binary acts. It is arguably an expression of the unambiguous nature of events in A. To see this, it may help to consider first the following addition to (9.20):

A + F − G ≼ A ⟹ Aᶜ − F + G ≽ Aᶜ.

This is a questionable assumption because the events Aᶜ − F + G and A + F − G are both ambiguous. Therefore, there is no reason to expect the perspective on the change "add F and subtract G" to be similar at Aᶜ as at A. However, if F and G are both "small," then only mean likelihoods matter and it is reasonable that the relative mean likelihoods of F and G be the same from the two perspectives. In fact, such agreement seems to be an expression of the existence of "coherent" ambiguity-free beliefs underlying preference. This condition translates into the following restriction on derivatives:

δν(F; A) ≤ δν(G; A) ⟹ δν(F; Aᶜ) ≤ δν(G; Aᶜ).

By arguments similar to those in the proof of the theorem, this implication delivers (9.27) under the assumptions in part (b). (Observe that the reverse implication follows from (9.20).) The following result is proven in Appendix A:

Theorem 9.3. Let ν be eventwise differentiable.

(a) If ν is uncertainty averse, then for all A ∈ A, F ⊂ Aᶜ and G ⊂ A,

δν(F; Aᶜ) ≤ δν(G; Aᶜ) ⟹ ν(A + F − G) ≤ ν(A).   (9.28)
(b) Suppose further that Σ is a σ-algebra and that m and each δν(·; A), A ∈ A₀, are countably additive, where m is a support in the sense of (9.16). Then for each A in A₀,

δν(F; A) m(G) ≤ δν(G; A) m(F)   (9.29)

and

δν(G; Aᶜ) m(F) ≤ δν(F; Aᶜ) m(G).   (9.30)
(c) Suppose further that A₀ is nonempty and that ν is coherent. Then the unique countably additive supporting probability measure m is given by m(·) = δν(·; A)/δν(S; A), for any A in A₀.
When division is permitted, the inequalities in (b) imply that

δν(F; A)/δν(G; A) ≤ m(F)/m(G) ≤ δν(F; Aᶜ)/δν(G; Aᶜ),   (9.31)
which suggests an interpretation as an interval bound for the "marginal rate of substitution at any A between F and G." The relation (9.28) states roughly that for each A, δν(·; Aᶜ) serves as a support at A. Given our earlier interpretation for the derivative, it states that if the decision-maker would rather bet on A + F − G than on A when ambiguity is ignored and when mean-likelihoods are computed from the perspective of Aᶜ, then she would make the same choice also when ambiguity is considered. That is because the former event is more ambiguous and the decision-maker dislikes ambiguity or uncertainty. Finally, part (c) of the theorem describes conditions under which the parallel with calculus is valid: the (countably additive) supporting measure is unique and given essentially by the derivative of ν. Note that the support property in question here is global in that the same measure "works" at each unambiguous A, and not just at a single given A.¹⁸ This explains the need for the coherence assumption, which helps to ensure that δν(·; A)/δν(S; A) is independent of A.

Turn to uncertainty aversion for general nonbinary acts, that is, in the sense of (9.8). Lemma 9.1 characterizes uncertainty aversion for preferences or utility functions, assuming a given supporting measure. Theorem 9.3 delivers the uniqueness of the supporting measure under the stated conditions. Combining these two results produces our most complete characterization of uncertainty aversion.

Theorem 9.4. Let U be a utility function, x₁ ≻ x₂, ν(E) ≡ U(x₁Ex₂) and suppose that ν is eventwise differentiable. Suppose further that each δν(·; A), A ∈ A₀, is countably additive, A₀ is nonempty and ν is coherent. Then (1) implies (2), where:

(1) U is uncertainty averse with countably additive supporting probability measure.

(2) U satisfies conditions (i) and (ii) of Lemma 9.1 with measure m given by

m(·) = δν(·; A)/δν(S; A), for any A in A₀.   (9.32)
Conversely, if δν(·; A) is convex-ranged on A for any A in A₀, then (2) implies (1).

The combination of Theorem 9.3 with Lemma 9.2 delivers a comparable result for CEU utility functions. In particular, to verify "more uncertainty averse than some expected utility function" (Lemma 9.2(c)), one need only verify (9.15) for the particular measure m defined in (9.32), a much easier task than computing the complete core of ν.
9.5. Concluding remarks

Within the CEU framework, convexity of the capacity has been widely taken to characterize uncertainty aversion. This chapter has questioned the appeal of this characterization and has proposed an alternative. To conclude, consider further the relation between the two definitions and, in particular, the significance of the difference in the domains adopted in Schmeidler (1989) and in this chapter. Denote by H the set of all finite-ranged (simple) and measurable acts e from (S, Σ) into Δ(X). Then H is the domain of horse-race/roulette-wheel acts used by Anscombe and Aumann. Each such act h involves two stages: in the first, uncertainty is resolved through realization of the horse-race winner s ∈ S and in the second stage the risk associated with the objective lottery h(s) is resolved. An act h that yields a degenerate lottery h(s) in the second stage for every s can be identified with a Savage act; in other words, F ⊂ H. Schmeidler assumes that preference ≽ and the representing utility function U are defined on the larger domain H. He calls U uncertainty averse if it is quasiconcave, that is, if

U(e) ≥ U(f) ⟹ U(αe + (1 − α)f) ≥ U(f),
(9.33)
for all α ∈ [0, 1], where the mixture αe + (1 − α)f is defined in the obvious way. The suggested interpretation (p. 119) is that “substituting objective mixing for subjective mixing makes the decision-maker better off.” Within CEU theory, expanded to the domain H, U ceu is uncertainty averse if and only if the corresponding capacity ν is convex. Though formulated and motivated by Schmeidler within the AA framework, the identification of convexity of ν with uncertainty aversion has been widely adopted in many instances where the Savage domain F, rather than H, is the relevant one, that is, where choice behavior over F is the object of study and in which only such behavior is observable to the analyst. The Ellsberg single-urn experiment provides such a setting, but it was shown in Section 9.1.2 that convexity has little to do with intuitively uncertainty averse behavior in that setting. One possible reaction is to suggest that the single-urn experiment is special and that convexity is better suited to Ellsberg’s other principal experiment involving two urns, one ambiguous and the other unambiguous.19 Because behavior in this experiment is also prototypical of the behavior that is to be modeled and because it might be unrealistic to expect a single definition of uncertainty aversion to perform well in all settings, good performance of the convexity definition in this setting might restore its appeal. Moreover, such good performance might be expected because the Cartesian product state space that is natural for modeling the two-urn experiment suggests a connection with the horse-race/roulette-wheel acts in the AA domain. According to this view, the state space for the ambiguous urn “corresponds” to the horse-race stage of the AA acts and the state space for the unambiguous urn “corresponds” to the roulette-wheel component. In fact, the performance of the convexity definition is no better in the two-urn experiment than in the single-urn case. 
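Schmeidler's quasiconcavity condition (9.33) can be illustrated for CEU on a finite state space. With u affine in second-stage lotteries, mixing acts statewise amounts to mixing state-utility vectors, so (9.33) becomes quasiconcavity of the Choquet integral. The sketch below is our own toy: a squared uniform distortion, which is a convex capacity, with mixtures sampled at random.

```python
import random

def choquet(v, nu):
    """Choquet integral of a state-utility vector v with respect to the
    capacity nu, summing from the best state down."""
    order = sorted(range(len(v)), key=lambda s: -v[s])
    total, prev, top = 0.0, 0.0, []
    for s in order:
        top.append(s)
        w = nu(frozenset(top))
        total += v[s] * (w - prev)
        prev = w
    return total

def nu(E):
    return (len(E) / 3.0) ** 2   # convex capacity on three states (ours)

random.seed(0)
ok = True
for _ in range(500):
    v1 = [random.random() for _ in range(3)]
    v2 = [random.random() for _ in range(3)]
    a = random.random()
    mix = [a * x + (1 - a) * y for x, y in zip(v1, v2)]
    if choquet(mix, nu) < min(choquet(v1, nu), choquet(v2, nu)) - 1e-9:
        ok = False
```

For convex nu the Choquet integral is the minimum of expectations over the core, hence concave in the utility vector, so no violation can occur.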
Rather than providing specific examples
of capacities supporting this assertion, it may be more useful to point out why the grounds for optimism described earlier are unsound. In spite of the apparent correspondence between the AA setup and the Savage domain with a Cartesian product state space, these are substantially different specifications because, as pointed out by Sarin and Wakker (1992), only the AA domain involves two-stage acts (the horse-race first and then the roulette-wheel) and in Schmeidler’s formulation of CEU, these are evaluated in an iterative fashion. Eichberger and Kelsey (1996) show that this difference leads to different conclusions about the connection between convexity of the capacity and attitudes toward randomization. For the same reason the difference in domains leads to different conclusions about the connection between convexity of the capacity and attitudes toward uncertainty. In particular, convexity is not closely connected to typical behavior in the two-urn experiment. While the preceding discussion has centered on examples, albeit telling examples, there is a general point that may be worth making explicit. The general point concerns the practice of transferring to the Savage domain notions, such as uncertainty aversion, that have been formulated and motivated in the AA framework. The difference between the decision-maker’s attitude toward the second-stage roulette-wheel risk as opposed to the uncertainty inherent in the first-stage horse-race is the basis for Schmeidler’s definition of uncertainty aversion. The upshot is that uncertainty aversion is not manifested exclusively or primarily through the choice of pure horse-races or acts over S. Frequently, however, it is the latter choice behavior that is of primary interest to the modeller. This is the case, for example, in the Ellsberg experiments discussed earlier and is the reason for the weak (or nonexistent) connection between convexity and intuitive behavior in those experiments. 
This is not to deny that convexity may be a useful hypothesis even in a Savage framework nor that its interpretation as uncertainty aversion may be warranted where preferences over AA acts are observable, say in laboratory experiments. Accordingly, this is not a criticism of Schmeidler’s definition within his chosen framework. It argues only against the common practice of interpreting convexity as uncertainty aversion outside that framework. (An alternative behavioral interpretation for convexity is provided in Wakker (1996).) I conclude with one last remark on the AA domain. The extension of the Savage domain of acts to the AA domain is useful because the inclusion of second-stage lotteries delivers greater analytical power or simplicity. This is the reason for their inclusion by Anscombe and Aumann—to simplify the derivation of subjective probabilities—as well as in the axiomatizations of the CEU and multiple-priors utility functions in Schmeidler (1989) and Gilboa and Schmeidler (1989), respectively. In all these cases, roulette-wheels are a tool whose purpose is to help in delivering the representation of utility for acts over S. Kreps (1988: 101) writes that this is sensible in a normative application but “is a very dicey and perhaps completely useless procedure in descriptive applications” if only choices between acts over S are observable. Emphasizing and elaborating this point has been the objective of this section.
Appendix A: Proofs

Proof of Lemma 9.1. U^{ps} and U agree on F^{ua}. Therefore, (i) follows from (9.7) and the monotonicity assumed for W. That U^{ps} supports U implies by (9.8) that for all e ∈ F and h ∈ F^{ua},

W(ψ_{m,e}) ≤ W(ψ_{m,h}) ⟹ U(e) ≤ U(h).

This implies (ii). For the converse, define ≽^{ps} as the order represented numerically by U^{ps},

U^{ps}(e) = W(ψ_{m,e}), e ∈ F,

where W: Δ(X) → ℝ¹ is defined by

W(ψ) = U(h) for any h ∈ F^{ua} satisfying ψ_{m,h} = ψ.

Part (i) ensures that W(ψ) does not depend on the choice of h, making W well-defined. The assumption added for m ensures that this defines W on all of Δ(X). Then U^{ps} supports U.

Proof of Lemma 9.2. (b) U^{ceu} and U^{ps} must agree on F^{ua}, implying that ν and m are ordinally equivalent on A. Because ν is convex-ranged and A is rich, ν(A) = ν(Σ) = [0, 1]. Conclude that m(A) = [0, 1] also. Thus (9.14) is proven. Lemma 9.1(ii) implies that for all acts e and unambiguous acts h, U^{ceu}(e) =
∑_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] ν(∪_1^i e(x_j)) + u(x_n)

≤ ∑_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] ν(∪_1^i h(x_j)) + u(x_n) = U^{ceu}(h)

= ∑_{i=1}^{n−1} [u(x_i) − u(x_{i+1})] g ∘ m(∪_1^i h(x_j)) + u(x_n)

if m(e(x_j)) = m(h(x_j)) for all j. Because this inequality obtains for all u(x₁) > ··· > u(x_n) and these utility levels can be varied over an open set containing some point (u(x), …, u(x)), it follows that

g(m(∪_1^i e(x_j))) = g(m(∪_1^i h(x_j))) ≥ ν(∪_1^i e(x_j)),

for all e and h as stated earlier. Given E ∈ Σ, let e(x₁) = E and e(x₂) = Eᶜ, x₁ ≻ x₂. There exists unambiguous A such that mE = mA. Let h(x₁) = A and h(x₂) = Aᶜ. Then g(m(E)) ≥ ν(E) follows, proving (9.13). The sufficiency portion (a) can be proven by suitably reversing the preceding argument.
Proof of Theorem 9.1. The following lemma is of independent interest because of the special significance of bets as a subclass of all acts. Notation from Section 9.3.4 is used below.

Lemma 9.A.1. Suppose that A is rich, with outcomes x* and x∗ as in the definition of richness. Let ν(E) ≡ U(x*Ex∗). Then the conjunction of (9.16) and (9.17) implies that ν is ordinally equivalent to a probability measure on Σ (or equivalently, ν satisfies (9.25)). A fortiori, the conclusion is valid if ≽ is both uncertainty averse and uncertainty loving.

Proof. Let m and q be the hypothesized supports. Their defining properties imply that

mF ≤ mG ⟹ qF ≤ qG, for all A ∈ A, F ⊂ Aᶜ and G ⊂ A.

But if this relation is applied to Aᶜ in place of A, noting that Aᶜ ∈ A, then the roles of F and G are reversed and one obtains

mF ≥ mG ⟹ qF ≥ qG.

In other words,

mF ≤ mG ⟺ qF ≤ qG, for all A ∈ A, F ⊂ Aᶜ and G ⊂ A.

Conclude from (9.16) and (9.17) that

mF ≤ mG ⟺ ν(A + F − G) ≤ νA, for all A ∈ A, F ⊂ Aᶜ and G ⊂ A;

or equivalently, that for all A ∈ A,

mE ≤ mA ⟺ νE ≤ νA.

In other words, every indifference curve for ν containing some unambiguous event is also an indifference curve for m. The stated hypothesis regarding A ensures that every indifference curve contains some unambiguous A and therefore that ν and m are ordinally equivalent on all of Σ.

Complete the proof of Theorem 9.1. Denote by ≽^{ps} and ≽^{ps}_∗ the probabilistically sophisticated preference orders supporting ≽ in the sense of (9.8) and (9.9), respectively, and having underlying probability measures m and q defined on Σ. From the proof of the Lemma, m and q are ordinally equivalent on Σ.

Claim. For each act e, there exists h ∈ F^{ua} such that

e ∼^{ps} h and e ∼^{ps}_∗ h.
To see this, let e = ((x_i, E_i)_{i=1}^n). By the richness of A, there exists an unambiguous event H₁ such that x*H₁x∗ ∼ x*E₁x∗, or, in the notation of the lemma, ν(H₁) = ν(E₁). Because ν and m are ordinally equivalent, m(H₁) = m(E₁) and thus also m(H₁ᶜ) = m(E₁ᶜ) and ν(H₁ᶜ) = ν(E₁ᶜ). Thus one can apply richness again to find a suitable unambiguous subset H₂ of H₁ᶜ. Proceeding in this way, one constructs an unambiguous act h = ((x_i, H_i)_{i=1}^n) such that

ν(H_i) = ν(E_i) and m(H_i) = m(E_i) for all i.

By the ordinal equivalence of m and q,

q(H_i) = q(E_i), all i.

The claim now follows immediately from the nature of probabilistic sophistication.

From (9.8), ≽ and ≽^{ps} agree on F^{ua}. Similarly, ≽ and ≽^{ps}_∗ agree on F^{ua}. Therefore, ≽^{ps} and ≽^{ps}_∗ agree there. From the claim, it follows that they agree on the complete set of acts F. The support properties (9.8) and (9.9) thus imply that

h ≽^{ps} e ⟺ h ≽ e, for all h ∈ F^{ua} and e ∈ F.
In particular, every indifference curve for ≽^{ps} containing some unambiguous act is also an indifference curve for ≽. But the qualification can be dropped because of the claim. It follows that ≽ and ≽^{ps} coincide on F.

Proof of Theorem 9.3. (a) Let m satisfy (9.16) at A. Show first that

mF ≤ mG ⟹ δν(F; A) ≤ δν(G; A),   (9.A.1)

for all F ⊂ Aᶜ and G ⊂ A: Fix ε > 0 and let λ₀ be such that the expression defining δν(·; A) is less than ε whenever λ > λ₀. By Lemma 9.B.1, there exist partitions {F^{j,λ}}_1^{n_λ} and {G^{j,λ}}_1^{n_λ} such that

mF^{j,λ} ≤ mG^{j,λ}, j = 1, …, n_λ,

and λ > λ₀, hence

∑_{j=1}^{n_λ} |[ν(A) − ν(A + F^{j,λ} − G^{j,λ})] − [δν(G^{j,λ}; A) − δν(F^{j,λ}; A)]| < ε.

Because m is a support, ν(A + F^{j,λ} − G^{j,λ}) ≤ ν(A). Thus²⁰

δν(G; A) − δν(F; A) = ∑_{j=1}^{n_λ} [δν(G^{j,λ}; A) − δν(F^{j,λ}; A)] > −ε.

However, ε is arbitrary. This proves (9.A.1).
Replace A by Aᶜ, in which case F and G reverse roles, and deduce that

mF ≥ mG ⟹ δν(F; Aᶜ) ≥ δν(G; Aᶜ),

or equivalently,

δν(F; Aᶜ) ≤ δν(G; Aᶜ) ⟹ mF ≤ mG.   (9.A.2)

Because m is a support, this yields (9.28).

(b) Let A ∈ A satisfy

S ≻ A and S ≻ Aᶜ.   (9.A.3)

Claim 1. δν(Aᶜ; A) > 0. If it equals zero, then δν(Aᶜ; A) = δν(∅; A) implies, by (9.28), that A + Aᶜ ≼ A, or S ∼ A, contrary to (9.A.3).

Claim 2. mAᶜ > 0. If not, then mS ≤ mA = 1 and (9.16) implies that S ∼ A, contrary to (9.A.3).

Claim 3. δν(A; Aᶜ) > 0 and mA > 0. Replace A by Aᶜ above.

Claim 4. δν(Aᶜ; Aᶜ) > 0. If it equals zero, then δν(A; Aᶜ) mAᶜ = 0 by (9.29), contradicting Claim 3.

Claim 5. For any G ⊂ A, δν(G; A) = 0 ⟹ mG = 0: Let F = Aᶜ. By Claim 1, δν(F; A) > 0. Therefore, Lemma 9.B.1 implies that ∀λ₀ ∃λ > λ₀ such that δν(F^{j,λ}; A) > 0 = δν(G; A) for all j. By (9.A.1), m(F^{j,λ}) > m(G) for all j, and thus also mF > ∑_{j=1}^{n_λ} mG = n_λ mG. This implies mG = 0.

Claim 6. For any F ⊂ Aᶜ, mF = 0 ⟹ δν(F; A) = 0: mF = 0 implies, by (9.A.1), that δν(F; A) ≤ δν(G; A) for all G ⊂ A. Claim 4 implies δν(G; A) > 0 if G = A. Therefore, δν(·; A) convex-ranged implies (Lemma 9.B.1) that δν(F; A) = 0.

Claim 7. m is convex-ranged: By Claim 5, m is absolutely continuous with respect to δν(·; A) on A. The latter measure is convex-ranged. Therefore, m has no atoms in A. Replace A by Aᶜ and use the convex range of δν(·; Aᶜ) to deduce in a similar fashion that m has no atoms in Aᶜ. Thus m is non-atomic. Because it is also countably additive by hypothesis, conclude that it is convex-ranged (Rao and Rao, 1983: Theorem 5.1.6).

Turn to (9.29); (9.30) may be proven similarly. Define the measures µ and p on Aᶜ × A as follows:

µ = m ⊗ δν(·; A), p = δν(·; A) ⊗ m.

Claims 5 and 6 prove that p ≪ µ. Denote by h ≡ dp/dµ the Radon–Nikodym density. (Countable additivity is used here.)
Claim 8. µ{(s, t) ∈ Aᶜ × A : h(s, t) > 1} = 0: If not, then there exist F₀ ⊂ Aᶜ and G₀ ⊂ A, with µ(F₀ × G₀) > 0, such that

h > 1 on F₀ × G₀.

Case 1. mF₀ = mG₀. Integration delivers that ∫_{F₀×G₀} [h(s, t) − 1] dµ > 0, implying

δν(F₀; A) mG₀ − mF₀ δν(G₀; A) > 0.

Consequently, δν(F₀; A) > δν(G₀; A), contradicting (9.A.1).

Case 2. mF₀ < mG₀. Because m is convex-ranged (Claim 7), there exists G₁ ⊂ G₀ such that mG₁ = mF₀ and µ(F₀ × G₁) > 0. Thus the argument in Case 1 can be applied.

Case 3. mF₀ > mG₀. Similar to Case 2.

This proves Claim 8. Finally, for any F ⊂ Aᶜ and G ⊂ A,

δν(F; A) mG − mF δν(G; A) = ∫_{F×G} (h − 1) dµ ≤ 0,

proving (9.29).

(c) Though at first glance the proof may seem obvious given (9.31), some needed details are provided here. Let A ∈ A₀. Multiply through (9.29) by δν(G; Aᶜ) to obtain that
δν(F; A) δν(G; Aᶜ) mG ≤ δν(G; A) δν(G; Aᶜ) mF,

for all F ⊂ Aᶜ and G ⊂ A. Similarly, multiplying through (9.30) by δν(G; A) yields

δν(G; A) δν(G; Aᶜ) mF ≤ δν(G; A) δν(F; Aᶜ) mG,

for all such F and G. Conclude from coherence that

δν(G; A) δν(G; Aᶜ) mF = δν(G; A) δν(F; Aᶜ) mG,   (9.A.4)

for all F ⊂ Aᶜ and G ⊂ A. Take G = A in (9.A.4) to deduce

δν(F; Aᶜ) = δν(A; Aᶜ) m(F)/m(A), for all F ⊂ Aᶜ.   (9.A.5)

Next take F = Aᶜ in (9.A.4). If δν(G; A) > 0, then

δν(G; Aᶜ) = δν(Aᶜ; Aᶜ) m(G)/m(Aᶜ), for all G ⊂ A.   (9.A.6)

This equation is true also if δν(G; A) = 0, because then (9.29), with F = Aᶜ, implies δν(Aᶜ; A) m(G) = 0, which implies mG = 0 by Claim 1.
Substitute the expressions for δν(F; Aᶜ) and δν(G; Aᶜ) into (9.A.4) and set F = Aᶜ and G = A to derive

δν(Aᶜ; Aᶜ)/m(Aᶜ) = δν(A; Aᶜ)/m(A) ≡ α(A) > 0.

Thus

δν(·; Aᶜ) = α(A) m(·) on Σ ∩ Aᶜ and δν(·; Aᶜ) = α(A) m(·) on Σ ∩ A.

By additivity, it follows that δν(·; Aᶜ) = α(A) m(·) on all of Σ. Thus δν(·; A) = κ(A) α(A) m(·), completing the proof.
Appendix B: Additive functions on Σ_X

Some details are provided for such functions, as defined in Section 9.4.1. For any additive µ, µ(∅) = 0 and

µ(e) = ∑_x µ_x(e(x)),   (9.B.1)

where µ_x is the marginal measure on Σ defined by µ_x(E) = the µ-measure of the act that assigns E to the outcome x and the empty set to every other outcome. Apply to each marginal the standard notions and results for finitely additive measures on an algebra (see Rao and Rao, 1983). In this way, one obtains a decomposition of µ, µ = µ⁺ − µ⁻, where µ⁺ and µ⁻ are non-negative measures. Define |µ| = µ⁺ + µ⁻. Say that the measure µ is bounded if

sup_f |µ|(f) = sup { ∑_{j=1}^{n_λ} |µ(f^{j,λ})| : f ∈ Σ_X, λ } < ∞.   (9.B.2)

Call the measure µ on Σ_X convex-ranged if for every e and r ∈ (0, |µ|(e)), there exists b, b ⊂ e, such that |µ|(b) = r, where e and b are elements of Σ_X. Lemma 9.B.1 summarizes some useful properties of convex-ranged measures on Σ_X. See Rao and Rao (1983: 142–3) for comparable results for measures on an algebra. In Rao and Rao (1983), property (b) is referred to as strong continuity.

Lemma 9.B.1. Let µ be a measure on Σ_X. Then the following statements are equivalent:

(a) µ is convex-ranged.
(b) For any act f, with corresponding net of all finite partitions {f^{j,λ}}_{j=1}^{n_λ}, and for any ε > 0, there exists λ₀ such that

λ > λ₀ ⟹ |µ|(f^{j,λ}) < ε, for j = 1, …, n_λ.

(c) For any acts f, g, and h ≡ f + g, if µ(f) > µ(g), then for any ε > 0 there exists a partition {h^{j,λ}}_{j=1}^{n_λ} of h, such that µ(h^{j,λ}) < ε and µ(h^{j,λ} ∩ f) > µ(h^{j,λ} ∩ g), j = 1, …, n_λ.
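Property (b), strong continuity, can be illustrated numerically (an editorial toy, not the chapter's construction): for an atomless measure on [0, 1), refining dyadic partitions drive every cell's mass to zero, while a point mass blocks this.

```python
def max_cell(point_masses, k):
    """Max cell mass of the level-k dyadic partition of [0, 1), for the toy
    measure = (1/2) * Lebesgue plus the given point masses (a dict mapping
    location -> weight). Boundedness, not normalization, matters here."""
    n = 2 ** k
    cells = []
    for j in range(n):
        lo, hi = j / n, (j + 1) / n
        atoms = sum(w for x, w in point_masses.items() if lo <= x < hi)
        cells.append((hi - lo) * 0.5 + atoms)
    return max(cells)

atomless = max_cell({}, 10)             # shrinks like 2**-k: strong continuity
with_atom = max_cell({0.25: 0.5}, 10)   # stuck above the atom's weight 0.5
```

The atomless case satisfies (b); the atom destroys the convex range, exactly as in the algebra-level results of Rao and Rao (1983) cited above.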
Appendix C: Differentiability

This appendix elaborates on mathematical aspects of the definition of eventwise differentiability. Then it describes a stronger differentiability notion. The requirement of convex range for δΦ(·; e) is not needed everywhere, but is built into the definition for ease of exposition. Though I use the term derivative, δΦ(·; e) is actually the counterpart of a differential. The need for a signed measure arises from the absence of any monotonicity assumptions. If Φ(·) is monotone with respect to inclusion ⊂, then each δΦ(·; e) is a non-negative measure. The limiting condition (9.23) may seem unusual because it does not involve a difference quotient. It may be comforting, therefore, to observe that a comparable condition can be identified in calculus: For a function ϕ: ℝ¹ → ℝ¹ that is differentiable at some x in the usual sense, elementary algebraic manipulation of the definition of the derivative ϕ′(x) yields the following expression paralleling (9.23):

∑_{i=1}^{N} [ϕ(x + N⁻¹) − ϕ(x) − N⁻¹ ϕ′(x)] → 0 as N → ∞.
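This calculus analogue is easy to verify numerically (a toy check of ours, with an illustrative cubic): each of the N identical summands is o(1/N), so the sum vanishes.

```python
def residual(phi, dphi, x, N):
    """The N identical terms of the sum: N * [phi(x + 1/N) - phi(x) - phi'(x)/N]."""
    return N * (phi(x + 1.0 / N) - phi(x) - dphi(x) / N)

phi = lambda t: t ** 3
dphi = lambda t: 3 * t ** 2
r_small = residual(phi, dphi, 1.0, 10)      # 3/N + 1/N**2 = 0.31 at N = 10
r_large = residual(phi, dphi, 1.0, 10_000)  # near zero
```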
Further clarification is afforded as follows by comparison with Gateaux differentiability: Roughly speaking, eventwise differentiability at e states that the difference Φ(e + f − g) − Φ(e) can be approximated by δΦ(f; e) − δΦ(g; e) for suitably "small" f and g, where the small size of the perturbation "f − g" is in the sense of the fineness of the partitions as λ grows. Naturally, it is important that the approximating functional δΦ(·; e) is additive (a signed measure). There is an apparent parallel with Gateaux (directional) differentiability of functions defined on a linear space: "f − g" represents the "direction" of perturbation and the additive approximation replaces the usual linear one. Note that the perturbation from e to e + f − g is perfectly general; any e′ can be expressed (uniquely) in the form e′ = e + f − g, with f ⊂ eᶜ and g ⊂ e (see (9.18)). A natural question is "how restrictive is the assumption of eventwise differentiability?" In this connection, the reader may have noted that the definition is formulated for an arbitrary state space S and algebra Σ. However, eventwise differentiability is potentially interesting only in cases where these are both infinite. That is because if Σ is finite, then Φ is differentiable if and only if it is additive.
Another question concerns the uniqueness of the derivative. The limiting condition (9.23) has at most one solution, that is, the derivative is unique if it exists: If p and q are two measures on Σ_X satisfying the limiting property, then for each g ⊂ e,

|p(g) − q(g)| ≤ ∑_{j=1}^{n_λ} |p(g^{j,λ}) − q(g^{j,λ})| →_λ 0.

Therefore, p(g) = q(g) for all g ⊂ e. Similarly, prove equality for all f ⊂ eᶜ and then apply additivity.

Next I describe a Chain Rule for eventwise differentiability.

Theorem 9.C.1. Let Φ: Σ_X → ℝ¹ be eventwise differentiable at e and ϕ: Φ(Σ_X) → ℝ¹ be strictly increasing and continuously differentiable. Then ϕ ∘ Φ is eventwise differentiable at e and

δ(ϕ ∘ Φ)(·; e) = ϕ′(Φ(e)) δΦ(·; e).

Proof. Consider the sum whose convergence defines the eventwise derivative of ϕ ∘ Φ. By the Mean Value Theorem,

ϕ ∘ Φ(e + f^{j,λ} − g^{j,λ}) − ϕ ∘ Φ(e) = ϕ′(z^{j,λ}) [Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e)]

for suitable real numbers z^{j,λ}. Therefore, it suffices to prove that

∑_{j=1}^{n_λ} |Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e)| · |ϕ′(z^{j,λ}) − ϕ′(Φ(e))| →_λ 0.

By the continuity of ϕ′, the second term converges to zero uniformly in j. Eventwise differentiability of Φ implies that given ε, there exists λ₀ such that λ > λ₀ implies

∑_{j=1}^{n_λ} |Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e)| ≤ ε + ∑_{j=1}^{n_λ} |δΦ(f^{j,λ}; e) − δΦ(g^{j,λ}; e)|

≤ ε + ∑_{j=1}^{n_λ} [|δΦ(f^{j,λ}; e)| + |δΦ(g^{j,λ}; e)|]

≤ K,

for some K < ∞ that is independent of λ, f and g, as provided by the boundedness of the measure δΦ(·; e).

Eventwise differentiability is inspired by Rosenmuller's (1972) notion, but there are differences. Rosenmuller deals with convex capacities defined on Σ, rather than with utility functions defined on acts. Even within that framework, his formulation differs from (9.23) and relies on the assumed convexity. Moreover, he restricts attention to "one-sided" derivatives, that is, where the inner perturbation g is identically empty (producing an outer derivative), or where the outer perturbation
f is identically empty (producing an inner derivative). Finally, Rosenmuller's application is to cooperative game theory rather than to decision theory. A strengthening of eventwise differentiability, called µ-differentiability, is described here. The stronger notion is more easily interpreted, thus casting further light on eventwise differentiability, and it delivers a form of the Fundamental Theorem of Calculus. Machina (1992) introduces a very similar notion. Because it is new and still unfamiliar and because our formulation is somewhat different and arguably more transparent, a detailed description seems in order.²¹ To proceed, adopt as another primitive a non-negative, bounded and convex-ranged measure µ on Σ_X. This measure serves the "technical role" of determining the distance between acts. To be precise, if e and e′ are identified whenever µ(e △ e′) = 0, then

d(e, e′) = µ(e △ e′)   (9.C.1)

defines a metric on Σ_X; the assumption of convex range renders the metric space path-connected (by Volkmer and Weber, 1983; see also Landers, 1973: Lemma 4). One way in which such a measure can arise is from a convex-ranged probability measure µ₀ on Σ. Given µ₀, define µ by

µ(e) = ∑_x µ₀(e(x)).   (9.C.2)
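A toy check of the metric in (9.C.1), with a uniform measure on a four-state algebra standing in for the convex-ranged µ (this is the |X| = 1 case, where acts reduce to events; the states and weights are our own illustrative assumptions):

```python
from itertools import chain, combinations

STATES = (0, 1, 2, 3)

def mu(E):
    return len(E) / 4.0   # uniform toy stand-in for a convex-ranged measure

def d(A, B):
    """Pseudometric (9.C.1): mu of the symmetric difference."""
    return mu(set(A) ^ set(B))

events = [set(E) for E in chain.from_iterable(
    combinations(STATES, k) for k in range(5))]

# triangle inequality: A △ C ⊆ (A △ B) ∪ (B △ C)
triangle = all(
    d(A, C) <= d(A, B) + d(B, C) + 1e-12
    for A in events for B in events for C in events
)
```

Identifying events at zero distance (here impossible, since the toy measure has full support) is exactly the quotient taken in the text.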
Once again let Φ: Σ_X → ℝ¹. Because acts e and e′ are identified when µ(e △ e′) = 0, Φ is assumed to satisfy the condition

µ(e △ e′) = 0 ⟹ Φ(e ∪ f) = Φ(e′ ∪ f), for all f.   (9.C.3)

In particular, acts of µ-measure 0 are assumed to be "null" with respect to Φ.

Definition 9.C.1. Φ is µ-differentiable at e ∈ Σ_X if there exists a bounded and convex-ranged measure δΦ(·; e) on Σ_X, such that for all f ⊂ eᶜ and g ⊂ e,

|Φ(e + f − g) − Φ(e) − δΦ(f; e) + δΦ(g; e)| / µ(f + g) → 0   (9.C.4)

as µ(f + g) → 0.

The presence of a "difference quotient" makes the definition more familiar in appearance and permits an obvious interpretation. Think in particular of the case (|X| = 1) where the domain of Φ is Σ. It is easy to see that δΦ(·; e) is absolutely continuous with respect to µ for each e. (Use additivity of the derivative and (9.C.3).) Eventwise and µ-derivatives have not been distinguished notationally because they coincide whenever both exist.

Lemma 9.C.1. If Φ is µ-differentiable at some e in Σ_X, then Φ is also eventwise differentiable at e and the two derivatives coincide.
Proof. Let δΦ(·; e) be the µ-derivative at e, f ⊂ eᶜ and g ⊂ e. Given ε > 0, there exists (by µ-differentiability) ε′ > 0 such that

|Φ(e + f′ − g′) − Φ(e) − δΦ(f′; e) + δΦ(g′; e)| < ε µ(f′ + g′),   (9.C.5)

if µ(f′ + g′) < ε′. By Lemma 9.B.1 applied to the convex-ranged µ, there exists λ₀ such that

µ(f^{j,λ} + g^{j,λ}) < ε′, for all λ > λ₀.

Therefore, one can apply (9.C.5) to the acts (f′, g′) = (f^{j,λ}, g^{j,λ}). Deduce that

∑_{j=1}^{n_λ} |Φ(e + f^{j,λ} − g^{j,λ}) − Φ(e) − δΦ(f^{j,λ}; e) + δΦ(g^{j,λ}; e)|

< ε ∑_{j=1}^{n_λ} µ(f^{j,λ} + g^{j,λ}) = ε µ(f + g) ≤ ε sup µ(·).
A consequence is that the µ-derivative of Φ is independent of µ; that is, if µ1 and µ2 are two measures satisfying the conditions in the lemma, then they imply identical derivatives for Φ. This follows from the uniqueness of the eventwise derivative noted earlier. Such invariance is important in light of the exogenous and ad hoc nature of µ. This result is evident because of the deeper perspective afforded by the notion of eventwise differentiability, and it reflects the latter’s superiority over the notion of µ-differentiability. Finally, under a slight strengthening of µ-differentiability, one can “integrate” back to Φ from its derivatives. That is, a form of the Fundamental Theorem of Calculus is valid.
Lemma 9.C.2. Let Φ be µ-differentiable and suppose that the convergence in (9.C.4) is uniform in e. For every ε > 0, f ⊂ e^c and g ⊂ e, there exist finite partitions f = ∪_j f^j and g = ∪_j g^j such that
ε > | Φ(e + f − g) − Φ(e) − Σ_i δ(f^i; e + F^{i−1} − G^{i−1}) + Σ_i δ(g^i; e + F^{i−1} − G^{i−1}) | (9.C.6)
where F^i = ∪_{j=1}^{i} f^j and G^i = ∪_{j=1}^{i} g^j.
Proof. µ-differentiability and the indicated uniform convergence imply that
| Φ(e + F^{i−1} − G^{i−1} + f^i − g^i) − Φ(e + F^{i−1} − G^{i−1}) − δ(f^i; e + F^{i−1} − G^{i−1}) + δ(g^i; e + F^{i−1} − G^{i−1}) | < εµ(f^i + g^i),
for any partitions {f^j} and {g^j} such that µ(f^j + g^j) is sufficiently small for all j. But the latter can be ensured by taking the partitions {f^{j,λ}} and {g^{j,λ}} for λ sufficiently large. The convex range assumption for µ enters here; use Lemma 9.B.1.
206
Larry G. Epstein
Therefore, the triangle inequality delivers
| Φ(e + f − g) − Φ(e) − Σ_i δ(f^i; e + F^{i−1} − G^{i−1}) + Σ_i δ(g^i; e + F^{i−1} − G^{i−1}) | ≤ ε Σ_i µ(f^i + g^i) = εµ(f + g).
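The telescoping approximation in Lemma 9.C.2 can be illustrated numerically on a finite measure space. Everything concrete below is an assumption for illustration only: the functional Φ(e) = µ(e)², the uniform weights, and the singleton partitions; its µ-derivative δ(f; e) = 2µ(e)µ(f) follows by expanding the square, with one-step error (µ(f) − µ(g))²:

```python
# Finite illustration (hypothetical numbers): acts are subsets of a
# 100-point ground set; mu is uniform with weight 0.01 per point.
w = 0.01

def mu(e):
    return w * len(e)

def phi(e):
    return mu(e) ** 2          # hypothetical functional: Phi(e) = mu(e)^2

def delta(f, e):
    return 2 * mu(e) * mu(f)   # its mu-derivative: delta(f; e) = 2 mu(e) mu(f)

e = set(range(50))             # current act, mu(e) = 0.5
f = set(range(50, 90))         # f in e^c, mu(f) = 0.4
g = set(range(20))             # g in e,   mu(g) = 0.2
target = (e | f) - g

# Single application of the derivative, as in (9.C.4):
one_step = abs(phi(target) - phi(e) - delta(f, e) + delta(g, e))

# Telescoping sum over singleton partitions, as in (9.C.6): re-evaluate the
# derivative at the updated act e + F^{i-1} - G^{i-1} after each piece.
cur, approx = set(e), 0.0
for i in sorted(f):
    approx += delta({i}, cur)
    cur.add(i)
for i in sorted(g):
    approx -= delta({i}, cur)
    cur.remove(i)
chained = abs(phi(target) - phi(e) - approx)

print(one_step, chained)       # the chained error is markedly smaller
```

Refining the partition shrinks the error (here from 0.04 to 0.006), which is exactly the “integrate back to Φ” idea behind the lemma.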
Acknowledgments An earlier version of this chapter was circulated in July 1997 under the title “Uncertainty Aversion.” The financial support of the Social Sciences and Humanities Research Council of Canada and the hospitality of the Hong Kong University of Science and Technology are gratefully acknowledged. I have also benefitted from discussions with Paolo Ghirardato, Jiankang Zhang and especially Kin Chung Lo, Massimo Marinacci and Uzi Segal, and from comments by audiences at HKUST and the Chantilly Workshop on Decision Theory, June 1997. The suggestions of an editor and referee led to an improved exposition.
Notes
1 After a version of this chapter was completed, I learned of a revision of Machina (1992), dated 1997, that is even more closely related.
2 Zhang (1997) is the first paper to propose a definition of ambiguity that is derived from preference, but his definition is problematic. An improved definition is the subject of current research by this author and Zhang.
3 See Section 9.3.2 for the definition of Choquet integration.
4 As explained in Section 9.5, the examples to follow raise questions about the widespread use that has been made of Schmeidler’s definition rather than about the definition itself. Section 9.3.4 describes the performance of this chapter’s definition of uncertainty aversion in the Ellsbergian setting.
5 In terms of acts, {R} ≻ {B} means 1_R ≻ 1_B and so on. For CEU, a decision-maker always prefers to bet on the event having the larger capacity.
6 p∗ is an inner measure, as defined and discussed further in Section 9.3.3.
7 Write y ≥ x if receiving outcome y with probability 1 is weakly preferable, according to U^ps, to receiving x for sure. Say that p first-order stochastically dominates q if for all outcomes y, p({x ∈ X : y ≥ x}) ≤ q({x ∈ X : y ≥ x}). Thus the partial order depends on the utility function U^ps, but that causes no difficulties. See Machina and Schmeidler (1992) for further details.
8 Subjective expected utility is the special case of (9.7) with W(p) = ∫_X u(x) dp(x). But more general risk preferences W are admitted, subject only to the noted monotonicity restriction. In particular, probabilistically sophisticated preference can rationalize behavior such as that exhibited in the Allais Paradox. It follows that uncertainty aversion, as defined shortly, is concerned with Ellsberg-type, and not Allais-type, behavior.
9 It merits emphasis that richness of A is needed only for some results stated later; for example, for the necessity parts of Theorem 9.1 and Lemma 9.2. Richness is not used to describe conditions that are sufficient for uncertainty aversion (or neutrality). In particular, the approach and definitions of this chapter are potentially useful even if A = {∅, S}.
10 See Rao and Rao (1983). Given countable additivity, convex-ranged is equivalent to non-atomicity.
11 Massimo Marinacci provided the following example: Let S be the set of integers {1, . . . , 6} and A the λ-system {∅, S} ∪ {A_i, A_i^c : 1 ≤ i ≤ 3}, where A1 = {1, 2, 3},
A2 = {3, 4, 5} and A3 = {1, 5, 6}. Define p on A as the unique probability measure satisfying p(A_i) = 1/6 for all i. If p has an extension to the power set, then p(∪_i A_i) = 1 > 1/2 = Σ_i p(A_i). However, the reverse inequality must obtain for any probability measure.
12 This condition is necessary for uncertainty aversion but not sufficient, even if there are only two possible outcomes. That is because by taking h in (9.8) to be a constant act, one concludes that an uncertainty averse order ≽ assigns a lower certainty equivalent to any act than does the supporting order ≽^ps. In contrast, (9.16) contains information only on the ranking of bets and not on their certainty equivalents. (I am assuming here that certainty equivalents exist.)
13 These ordinal properties are independent of the particular pair of outcomes satisfying x1 ≻ x2 if (and only if) ≽ satisfies Savage’s axiom P4: For any events A and B and outcomes x1 ≻ x2 and y1 ≻ y2, x1Ax2 ≽ x1Bx2 implies that y1Ay2 ≽ y1By2.
14 Alternatively, we could show that the rankings in (9.3) are inconsistent with the implication (9.17) of uncertainty loving.
15 A slight strengthening of (9.19) is valid. Suppose that A − F^i + G^i ≽ A for all i, for some partitions F = ∪F^i and G = ∪G^i. Only the trivial partitions were admitted earlier. Then additivity of the supporting measure implies as above that mF ≤ mG and hence that A + F − G ≼ A.
16 In particular, Σ_X is not the product algebra on S × X induced by Σ. However, Σ_X is a ring, that is, it is closed with respect to unions and differences.
17 More generally, a counterpart of the usual product rule of differentiation is valid for eventwise differentiation.
18 Even given (9.27), the supporting measure at a given single A is not unique, contrary to the intuition suggested by calculus. If the support property “mF ≤ mG =⇒ ν(A + F − G) ≤ ν(A)” is satisfied by m, then it is also satisfied by any m′ satisfying m(·) ≤ m′(·) on Σ ∩ A^c and m(·) ≥ m′(·) on Σ ∩ A. For example, let m′ be the conditional of m given A^c.
19 Each urn contains 100 balls that are either red or blue. For the ambiguous urn this is all the information provided. For the unambiguous urn, the decision-maker is told that there are 50 balls of each color. The choice problem is whether to bet on drawing a red (or blue) ball from the ambiguous urn versus the unambiguous one.
20 Given Σ_j |x_j − y_j| < ε and y_j ≥ 0 for all j, then Σ_j x_j ≥ −Σ_j |x_j − y_j| + Σ_j y_j ≥ −Σ_j |x_j − y_j|, implying that Σ_j x_j > −ε.
21 As mentioned earlier, after a version of this chapter was completed, I learned of a revision of Machina (1992), dated 1997, in which Machina provides a formulation very similar to that provided in this subsection. The connection with the more general “partitions-based” notion of eventwise differentiability, inspired by Rosenmuller (1972), is not observed by Machina.
References F. Anscombe and R.J. Aumann (1963) “A Definition of Subjective Probability,” Ann. Math. Stat., 34, 199–205. P. Billingsley (1986) Probability and Measure, John Wiley. S.H. Chew, E. Karni and Z. Safra (1987) “Risk Aversion in the Theory of Expected Utility with Rank Dependent Probabilities,” J. Econ. Theory, 42, 370–381. S.H. Chew and M.H. Mao (1995) “A Schur Concave Characterization of Risk Aversion for Non-Expected Utility Preferences,” J. Econ. Theory, 67, 402–435.
J. Eichberger and D. Kelsey (1996) “Uncertainty Aversion and Preference for Randomization,” J. Econ. Theory, 71, 31–43. L.G. Epstein and T. Wang (1995) “Uncertainty, Risk-Neutral Measures and Security Price Booms and Crashes,” J. Econ. Theory, 67, 40–82. I. Gilboa (1987) “Expected Utility with Purely Subjective Non-Additive Probabilities,” J. Math. Econ., 16, 65–88. I. Gilboa and D. Schmeidler (1989) “Maxmin Expected Utility With Nonunique Prior,” J. Math. Econ., 18, 141–153. (Reprinted as Chapter 6 in this volume.) P.R. Halmos (1974) Measure Theory, Springer–Verlag. E. Karni (1983) “Risk Aversion for State-Dependent Utility Functions: Measurement and Applications,” Int. Ec. Rev., 24, 637–647. D.M. Kreps (1988) Notes on the Theory of Choice, Westview. D. Landers (1973) “Connectedness Properties of the Range of Vector and Semimeasures,” Manuscripta Math., 9, 105–112. M. Machina (1982) “Expected Utility Analysis Without the Independence Axiom,” Econometrica, 50, 277–323. M. Machina (1992) “Local Probabilistic Sophistication,” mimeo. M. Machina and D. Schmeidler (1992) “A More Robust Definition of Subjective Probability,” Econometrica, 60, 745–780. K.P. Rao and M.B. Rao (1983) Theory of Charges, Academic Press. J. Rosenmuller (1972) “Some Properties of Convex Set Functions, Part II,” Methods of Oper. Research, 17, 277–307. R. Sarin and P. Wakker (1992) “A Simple Axiomatization of Nonadditive Expected Utility,” Econometrica, 60, 1255–1272. (Reprinted as Chapter 7 in this volume.) L. Savage (1954) The Foundations of Statistics, John Wiley. D. Schmeidler (1972) “Cores of Exact Games,” J. Math. Anal. and Appl., 40, 214–225. D. Schmeidler (1986) “Integral Representation without Additivity,” Proc. Amer. Math. Soc., 97, 255–261. D. Schmeidler (1989) “Subjective Probability and Expected Utility Without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.) G. Shafer (1979) “Allocations of Probability,” Ann. Prob., 7, 827–839. H. Volkmer and H. 
Weber (1983) “Der Wertebereich Atomloser Inhalte,” Archiv der Mathematik, 40, 464–474. P. Wakker (1996) “Preference Conditions for Convex and Concave Capacities in Choquet Expected Utility,” mimeo. L. Wasserman (1990) “Bayes’ Theorem for Choquet Capacities,” Ann. Stat., 18, 1328–1339. M. Yaari (1969) “Some Remarks on Measures of Risk Aversion and on Their Uses,” J. Econ. Theory, 1, 315–329. J. Zhang (1997) “Subjective Ambiguity, Probability and Capacity,” U. Toronto, mimeo.
10 Ambiguity made precise
A comparative foundation
Paolo Ghirardato and Massimo Marinacci
10.1. Introduction
In this chapter we propose and characterize a formal definition of ambiguity aversion for a class of preference models which encompasses the most popular models developed to allow ambiguity attitude in decision making. Using this notion, we define and characterize ambiguity of events for ambiguity averse or loving preferences. Our analysis is based on a fully “subjective” framework with no extraneous devices (like a roulette wheel, or a rich set of exogenously “unambiguous” events). This yields a definition that can be fruitfully used with any preference in the mentioned class, though it imposes a limitation on the definition’s ability to distinguish “real” ambiguity aversion from other behavioral traits that have been observed experimentally. The subjective expected utility (SEU) theory of decision making under uncertainty of Savage (1954) is firmly established as the choice-theoretic underpinning of modern economic theory. However, such success has well-known costs: SEU’s simple and powerful representation is often violated by actual behavior, and it imposes unwanted restrictions. In particular, Ellsberg’s (1961) famous thought experiment (see Section 10.6) convincingly shows that SEU cannot take into account the possibility that the information a decision maker (DM) has about some relevant uncertain event is vague or imprecise, and that such “ambiguity” affects her behavior. Ellsberg observed that ambiguity affected his “nonexperimental” subjects in a consistent fashion: Most of them preferred to bet on unambiguous rather than ambiguous events.
Furthermore, he found that even when shown the inconsistency of their behavior with SEU, the subjects stood their ground “because it seems to them the sensible way to behave.” This attitude has later been named ambiguity aversion, and has received ample experimental confirmation.1 Savage was well aware of this limit of SEU, for he wrote: There seem to be some probability relations about which we feel relatively “sure” as compared with others. . . . The notion of “sure” and “unsure”
Ghirardato, P. and M. Marinacci (2002), “Ambiguity made precise: A comparative foundation,” Journal of Economic Theory, 102, 251–289.
introduced here is vague, and my complaint is precisely that neither the theory of personal probability, as it is developed in this book, nor any other device known to me renders the notion less vague. (Savage 1954: 57–58 of the 1972 edition)
In the wake of Ellsberg’s contribution, extensions of SEU have been developed allowing ambiguity, and the DM’s attitude towards it, to play a role in her choices. Two methods for extending SEU have established themselves as the standards of this literature. The first, originally proposed in Schmeidler (1989), is to allow the DM’s beliefs on the state space to be represented by nonadditive probabilities, called capacities, and her preferences by Choquet integrals (which are just standard integrals when computed with respect to additive probabilities). For this reason, this generalization is called the theory of Choquet expected utility (CEU) maximization. The second, axiomatized by Gilboa and Schmeidler (1989), allows the DM’s beliefs to be represented by multiple probabilities, and represents her preferences by the “maximin” on the set of the expected utilities. This generalization is thus called the maxmin expected utility (MEU) theory. Here we use the general class of preferences with ambiguity attitudes developed in Ghirardato and Marinacci (2000a). These orderings, which we call biseparable preferences, are all those such that the ranking of consequences can be represented by a state-independent cardinal utility u, and the ranking of bets on events by u and a unique numerical function (a capacity) ρ.2 The latter represents the DM’s willingness to bet; that is, ρ(A) is roughly the number of euros she is willing to exchange for a bet that pays 1 euro if event A obtains and 0 euros otherwise. The only restriction imposed on the ranking of nonbinary acts is a mild dominance condition. CEU and MEU are special cases of biseparable preferences, where ρ is respectively the DM’s nonadditive belief and the lower envelope of her multiple probabilities. An important reason for the lasting success of SEU theory is the elegant theory of the measurement of risk aversion developed from the seminal contributions of de Finetti (1952), Arrow (1974) and Pratt (1964).
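The two functionals just described can be made concrete on a two-state Ellsberg urn. The numbers and the capacity below are hypothetical illustrations (they are not taken from the chapter); for a finite state space, the Choquet integral sorts states from best to worst outcome and weights utility increments by the capacity of the successively larger “upper” sets:

```python
states = ["B", "R"]
# A hypothetical convex capacity: each color gets weight 0.3, so the
# willingness to bet on B and on R sums to less than one.
capacity = {frozenset(): 0.0,
            frozenset({"B"}): 0.3,
            frozenset({"R"}): 0.3,
            frozenset({"B", "R"}): 1.0}

def choquet(utils, cap):
    """Choquet integral of a finite utility profile w.r.t. a capacity."""
    order = sorted(utils, key=utils.get, reverse=True)  # best state first
    total, upper, prev = 0.0, set(), 0.0
    for s in order:
        upper.add(s)
        v = cap[frozenset(upper)]
        total += utils[s] * (v - prev)
        prev = v
    return total

# CEU value of a bet paying 100 utils on Black and 0 on Red:
bet = {"B": 100.0, "R": 0.0}
ceu = choquet(bet, capacity)   # 100 * 0.3 + 0 * (1 - 0.3) = 30

# MEU over a set of priors with p(B) ranging over [0.3, 0.7]:
priors = [{"B": p, "R": 1 - p} for p in (0.3, 0.5, 0.7)]
meu = min(sum(p[s] * bet[s] for s in states) for p in priors)
print(ceu, meu)                # both equal 30 in this example
```

In this example the CEU value coincides with the minimum expected utility over the chosen priors, illustrating the remark that for CEU the relevant ρ is the lower envelope of the multiple probabilities.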
Unlike risk aversion, ambiguity aversion is yet without a fully general formalization, one that does not require extraneous devices and applies to most if not all the existing models of ambiguity averse behavior. This chapter attempts to fill this gap: We propose a definition of ambiguity aversion and show its formal characterization in the general decisiontheoretic framework of Savage, whose only restriction is a richness condition on the set of consequences. Our definition is behavioral; that is, it only requires observation of the DM’s preferences on acts in this fully subjective setting. However, the definition works as well (indeed better, see Proposition 10.2) in the Anscombe– Aumann framework, a special case of Savage’s framework which presumes the existence of an auxiliary device with “known” probabilities. Decision models with ambiguity averse preferences are the objects of increasing attention by economists and political scientists interested in explaining phenomena at odds with SEU. For example, they have been used to explain the existence of incomplete contracts (Mukerji, 1998), the existence of substantial volatility in stock markets (Epstein and Wang, 1994; Hansen et al., 1999), or selective
abstention in political elections (Ghirardato and Katz, 2000). We hope that the characterization provided here will turn out to be useful for the “applications” of models of ambiguity aversion, as that of risk aversion was for the applications of SEU. More concretely, we hope that it will help to understand the predictive differences of risk and ambiguity attitudes. To understand our definition, it is helpful to go back to the characterization of risk aversion in the SEU model. The following approach to defining risk aversion was inspired by Yaari (1969). Given a state space S, let F denote a collection of “acts”, maps from S into R (e.g. monetary payoffs). Define a comparative notion of risk aversion for SEU preferences as follows: Say that ≽₂ is more risk averse than ≽₁ if they have identical beliefs and the following implications hold for every “riskless” (i.e. constant) act x and every “risky” act f:
x ≽₁ f ⇒ x ≽₂ f (10.1)
x ≻₁ f ⇒ x ≻₂ f (10.2)
(where ≻ is the asymmetric component of ≽). Identity of beliefs is required to avoid possible confusions between differences in risk attitudes and in beliefs (cf. Yaari, 1969: 317). We can use this comparative ranking to obtain an absolute notion of risk aversion by calling some DMs (for instance, expected value maximizers) risk neutral, and by then calling risk averse those DMs who are more risk averse than risk neutrals. As is well known, this “comparatively founded” notion has the usual characterization. Like the traditional “direct” definition of risk aversion, it is fully behavioral in the sense defined above. However, its interpretation is based on two primitive assumptions. First, constant acts are intuitively riskless. Second, expected value maximization intuitively reflects risk neutral behavior, so that it can be used as our benchmark for measuring risk aversion. In this chapter, we follow the example of Epstein (1999) in giving a comparative foundation to ambiguity attitude: We start from a “more ambiguity averse than . . .” ranking and then establish a benchmark, thus obtaining an “absolute” definition of ambiguity aversion. Analogously to Yaari’s, our “more ambiguity averse . . .” relation is based on the following intuitive consideration: If a DM prefers an unambiguous (resp. ambiguous) act to an ambiguous (resp. unambiguous) one, a more (resp. less) ambiguity averse one will do the same. This is natural, but it raises the obvious question of which acts should be used as the “unambiguous” acts for this ranking. Depending on the decision problem the DM is facing and on her information, there might be different sets of “obviously” unambiguous acts; that is, acts that we are confident that any DM perceives as unambiguous. It seems intuitive to us that in any well-formulated problem, the constant acts will be in this set.
Hence, we make our first primitive assumption: Constant acts are the only acts that are “obviously” unambiguous in any problem, since other acts may not be perceived as unambiguous by some DM in some state of information. This assumption implies that a preference (not necessarily SEU) ≽₂ is more ambiguity averse than ≽₁ whenever Equations (10.1) and (10.2) hold. However, the following example casts some doubts as to the intuitive appeal of such a definition.
Example 10.1. Consider an (Ellsberg) urn containing balls of two colors: Black and Red. Two DMs are facing this urn, and they have no information on its composition. The first DM has SEU preferences ≽₁, with a utility function on the set of consequences R given by u₁(x) = x, and beliefs on the state space of ball extractions S = {B, R} given by
ρ₁(B) = 1/2 and ρ₁(R) = 1/2.
That is, the only preferences which are endogenously ambiguity neutral are SEU. The general characterization of ambiguity aversion (resp. love) implies in particular that a preference is ambiguity averse (resp. loving) only if its willingness to bet ρ is pointwise dominated by (resp. pointwise dominates) a probability. In the CEU case, the converse is also true: A CEU preference is ambiguity averse if and only if its belief (which is equal to ρ) is dominated by a probability; that is, it has a nonempty “core.” On the other hand, all MEU preferences are ambiguity averse, as it is intuitive. As to comparative ambiguity aversion, we find that if 2 is more ambiguity averse than 1 then ρ1 ρ2 . That is, a less ambiguity averse DM will have uniformly higher willingness to
bet. The latter condition is also sufficient for CEU preferences, whereas for MEU preferences containment of the sets of probabilities is necessary and sufficient for relative ambiguity. We next briefly turn to the issue of defining ambiguity itself. A “behavioral” notion of unambiguous act follows naturally from our earlier analysis: Say that an act is unambiguous if an ambiguity averse (or loving) DM evaluates it in an ambiguity neutral fashion. The unambiguous events are those that unambiguous acts depend upon. We obtain the following simple characterization of the set of unambiguous events for biseparable preferences: For an ambiguity averse (or loving) DM with willingness to bet ρ, event A is unambiguous if and only if ρ(A) + ρ(Ac) = 1. (A more extensive discussion of ambiguity is contained in the companion paper (Ghirardato and Marinacci, 2000a).) Finally, as an application of the previous analysis, we consider the classical Ellsberg problem with a 3-color urn. We show that the theory delivers the intuitive answers, once the information provided to the DM is correctly incorporated. It is important to underscore from the outset two important limitations of the notions of ambiguity attitude we propose. The first limitation is that while the comparative foundation makes our absolute notion “behavioral,” in the sense defined above, it also makes it computationally demanding. A more satisfactory definition would be one which is more “direct”: one that can be verified by observing a smaller subset of the DM’s preference relation. While we conjecture that it may be possible to construct such a definition (obtaining the same characterization as the one proposed here), we leave its development to future work. Our comparative notion is more direct, thus less amenable to this criticism. However, it is in turn limited by the requirement of the identity of cardinal risk attitude.
The absolute notion is not, as it conceptually builds on the comparison of the DM with an idealized version of herself, identical to her in all traits but her ambiguity aversion. The second limitation stems from the fact that no extraneous devices are used in this chapter. An advantage of this is that our notions apply to any decision problem under uncertainty, and our results to any biseparable preference. However, such wide scope carries costs: Our notion of ambiguity aversion comprises behavioral traits that may not be due to ambiguity—like probabilistic risk aversion, the tendency of discounting “objective” probabilities that has been observed in many experiments on decision making under risk (including the celebrated “Allais paradox”). Thus, one may consider it more appropriate to use a different name for what is measured here, like “chance aversion” or “extended ambiguity aversion.” The reason for our choice of terminology is that we see a ranking of conceptual importance between ambiguity aversion/love and other departures from SEU maximization. As we argued above using Savage’s words, the presence of ambiguity provides a normatively compelling reason for violating SEU. We do not feel that other documented reasons are similarly compelling. Moreover, we hold (see below and Subsection 10.7.3) that extraneous devices—say, a rich set of exogenously “unambiguous” events—are required for ascertaining the reason of a given departure. Thus, when these devices are not available—say, because the set
of “unambiguous” events is not rich enough—we prefer to attribute a departure to the reasons we find normatively more compelling. However, the reader is warned, so that he/she may choose to give a different name to the phenomenon we formally describe.
10.1.1. The related literature
The problem of defining ambiguity and ambiguity aversion is discussed in a number of earlier papers. The closest to ours in spirit and generality is Epstein (1999), the first paper to develop a notion of absolute ambiguity aversion from a comparative foundation.3 As we discuss in more detail in Subsection 10.7.3, the comparative notion and benchmarks he uses are different from ours. Epstein’s objective is to provide a more precise measurement of ambiguity attitude than the one we attempt here; in particular, to filter out probabilistic risk aversion. For this reason, he assumes that in the absence of ambiguity a DM’s preferences are “probabilistically sophisticated” in the sense of Machina and Schmeidler (1992). However, we argue that for its conclusions to conform with intuition, Epstein’s approach requires an extraneous device: a rich set of acts which are exogenously established to be “unambiguous,” much larger than the set of the constants that we use. Thus, the higher accuracy of his approach limits its applicability vis-à-vis our cruder but less demanding approach. The most widely known and accepted definition of absolute ambiguity aversion is that proposed by Schmeidler in his seminal CEU model (Schmeidler, 1989). Employing an Anscombe–Aumann framework, he defines ambiguity aversion as the preference for “objective mixtures” of acts, and he shows that for CEU preferences this notion is characterized by the convexity of the capacity representing the DM’s beliefs. While the intuition behind this definition is certainly compelling, Schmeidler’s axiom captures more than our notion of ambiguity aversion. It gives rise to ambiguity averse behavior, but it entails additional structure that does not seem to be related to ambiguity aversion (see Example 10.4).
Doubts about the relation of convexity to ambiguity aversion in the CEU case are also raised by Epstein (1999), but he concludes that they are completely unrelated (see Section 10.6 for a discussion). There are other interesting papers dealing with ambiguity and ambiguity aversion. In a finite setting, Kelsey and Nandeibam (1996) propose a notion of comparative ambiguity for the CEU and MEU models similar to ours and obtain a similar characterization, as well as an additional characterization in the CEU case. Unlike us, they do not consider absolute ambiguity attitude, and they do not discuss the issue of the distinction of cardinal risk and ambiguity attitude. Montesano and Giovannoni (1996) notice a connection between absolute ambiguity aversion in the CEU model and nonemptiness of the core, but they base themselves purely on intuitive considerations on Ellsberg’s example. Chateauneuf and Tallon (1998) present an intuitive necessary and sufficient condition for nonemptiness of the core of CEU preferences in an Anscombe–Aumann framework. Zhang
(1996), Nehring (1999), and Epstein and Zhang (2001) propose different definitions of unambiguous event and act. Fishburn (1993) characterizes axiomatically a primitive notion of ambiguity.
10.1.2. Organization
The structure of the chapter is as follows. Section 10.2 provides the necessary definitions and set-up. Section 10.3 introduces the notions of ambiguity aversion. The cardinal symmetry condition is introduced in Subsection 10.3.1, and the comparative and absolute definitions in 10.3.2. Section 10.4 presents the characterization results. Section 10.5 contains the notions of unambiguous act and event, and the characterization of the latter. In Section 10.6, we go back to the Ellsberg urn and show the implications of our results for that example. Section 10.7 discusses the key aspects of our approach, in particular, the choices of the comparative ambiguity ranking and the benchmark for defining ambiguity neutrality; it thus provides a more detailed comparison with Epstein’s (1999) approach. The Appendices contain the proofs and some technical material.
10.2. Set-up and preliminaries
The general set-up of Savage (1954) is the following. There is a set S of states of the world, an algebra Σ of subsets of S, and a set X of consequences. The choice set F is the set of all finite-valued acts f: S → X which are measurable w.r.t. Σ. With the customary abuse of notation, for x ∈ X we define x ∈ F to be the constant act x(s) = x for all s ∈ S, so that X ⊆ F. Given A ∈ Σ, we denote by xAy the binary act (bet) f ∈ F such that f(s) = x for s ∈ A, and f(s) = y for s ∉ A. Our definitions require that the DM’s preferences be represented by a weak order ≽ on F: a complete and transitive binary relation, with asymmetric (resp. symmetric) component ≻ (resp. ∼). The weak order is called nontrivial if there are f, g ∈ F such that f ≻ g. We henceforth call preference relation any nontrivial weak order on F. A functional V: F → R is a representation of ≽ if for every f, g ∈ F, f ≽ g if and only if V(f) ≥ V(g). A representation V is called: monotonic if f(s) ≽ g(s) for every s ∈ S implies V(f) ≥ V(g); nontrivial if V(f) > V(g) for some f, g ∈ F. While the definitions apply to any preference relation, our results require a little more structure, provided by a general decision model introduced in Ghirardato and Marinacci (2000a). To present it, we need the following notion of “nontrivial” event: Given a preference relation ≽, A ∈ Σ is essential for ≽ if for some x, y ∈ X, we have x ≻ xAy ≻ y.
Definition 10.1. Let ≽ be a binary relation. We say that a representation V: F → R of ≽ is canonical if it is nontrivial, monotonic and there exists
Paolo Ghirardato and Massimo Marinacci
a set-function ρ : Σ → [0, 1] such that, letting u(x) ≡ V(x) for all x ∈ X, for all consequences x ≽ y and all events A,

V(xAy) = u(x) ρ(A) + u(y) (1 − ρ(A)).   (10.3)
A relation ≽ is called a biseparable preference if it admits a canonical representation, and moreover such a representation is unique up to a positive affine transformation when ≽ has at least one essential event. Clearly, a biseparable preference is a preference relation. If V is a canonical representation of ≽, then u is a cardinal state-independent representation of the DM's preferences over consequences; hence we call it his canonical utility index. Moreover, for all x ≻ y and all events A, B ∈ Σ we have xAy ≽ xBy if and only if ρ(A) ≥ ρ(B). Thus, ρ represents the DM's willingness to bet (likelihood relation) on events. ρ is easily shown to be a capacity—a set-function normalized and monotonic w.r.t. set inclusion—so that V evaluates binary acts by taking the Choquet expectation of u with respect to ρ.4 However, the DM's preferences over nonbinary acts are not constrained to a specific functional form. To understand the rationale of the clause relating to essential events, first observe that for any ≽ with a canonical representation with willingness to bet ρ, an event A is essential if and only if 0 < ρ(A) < 1. Thus, there are no essential events iff ρ(A) is either 0 or 1 for every A; that is, the DM behaves as if he does not judge any bet to be uncertain, and his canonical utility index is ordinal. In such a case, the DM's cardinal risk attitude is intuitively not defined: without an uncertain event there is no risk. On the other hand, it can be shown (Ghirardato and Marinacci, 2000a: Theorem 4) that cardinal risk attitude is characterized by a cardinal property of the canonical utility index, its concavity. Hence the additional requirement in Definition 10.1 guarantees that when there is some uncertain event, cardinal risk aversion is well defined.
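The canonical evaluation of binary acts in Equation (10.3), and the way ρ represents the willingness to bet, can be sketched numerically. The state space, utility index, and capacity below are hypothetical illustrations, not taken from the chapter:

```python
# A minimal numerical sketch of Equation (10.3). The state space,
# canonical utility index u, and capacity rho are illustrative
# assumptions, not data from the chapter.

S = frozenset({"s1", "s2", "s3"})

# Hypothetical canonical utility index over monetary consequences.
u = {0: 0.0, 50: 0.6, 100: 1.0}

# Hypothetical capacity: normalized (rho(empty)=0, rho(S)=1) and monotone.
rho = {
    frozenset(): 0.0,
    frozenset({"s1"}): 0.3,
    frozenset({"s2"}): 0.2,
    frozenset({"s3"}): 0.2,
    frozenset({"s1", "s2"}): 0.6,
    frozenset({"s1", "s3"}): 0.6,
    frozenset({"s2", "s3"}): 0.5,
    S: 1.0,
}

def V_bet(x, A, y):
    """Canonical value of the binary act xAy (x on A, y off A), with x weakly preferred to y."""
    assert u[x] >= u[y]
    return u[x] * rho[A] + u[y] * (1.0 - rho[A])

# rho represents the likelihood relation: betting 100-vs-0 on A is weakly
# preferred to the same bet on B exactly when rho(A) >= rho(B).
A, B = frozenset({"s1", "s2"}), frozenset({"s2", "s3"})
print(V_bet(100, A, 0), V_bet(100, B, 0))   # 0.6 0.5
assert (V_bet(100, A, 0) >= V_bet(100, B, 0)) == (rho[A] >= rho[B])
```

Note that the sketch evaluates only binary acts, mirroring the model: nonbinary acts are left unconstrained by (10.3).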
As the differences in two DMs' cardinal risk attitudes might play a role in the choices in Equations (10.1) and (10.2), it is useful to identify the situation in which these attitudes are defined: Say that preference relations ≽₁ and ≽₂ have essential events if there are events A₁, A₂ ∈ Σ such that for each i = 1, 2, Aᵢ is essential for ≽ᵢ. To avoid repetitions, the following lists all the assumptions on the structure of the decision problem and on the DM's preferences that are tacitly assumed in all results in the chapter: Structural Assumption. X is a connected and separable topological space (e.g. a convex subset of Rⁿ with the usual topology). Every biseparable preference on F has a continuous canonical utility function. A full axiomatic characterization of the biseparable preferences satisfying the Structural Assumption is provided in Ghirardato and Marinacci (2000a). 10.2.1. Some examples of biseparable preferences As mentioned earlier, the biseparable preference model is very general. In fact, it contains most of the known preference models that obtain a separation between
Ambiguity made precise
cardinal (state-independent) utility and willingness to bet. We now illustrate this claim by showing some examples of decision models which, under mild additional restrictions (e.g. the Structural Assumption), belong to the biseparable class. (More examples and details are found in Ghirardato and Marinacci, 2000a.) (i) A binary relation ≽ on F is a CEU ordering if there exist a cardinal utility index u on X and a capacity ν on (S, Σ) such that ≽ can be represented by the functional V : F → R defined by the following equation:

V(f) = ∫_S u(f(s)) ν(ds),   (10.4)
where the integral is taken in the sense of Choquet (notice that it is finite because each act in F is finite-valued). The functional V is immediately seen to be a canonical representation of ≽, and ρ = ν is its willingness to bet. An important subclass of CEU orderings are the SEU orderings, which correspond to the special case in which ν is a probability measure, that is, a finitely additive capacity. See Wakker (1989) for an axiomatization of CEU and SEU preferences (satisfying the Structural Assumption) in the Savage setting. (ii) Let Δ denote the set of all the probability measures on (S, Σ). A binary relation ≽ on F is a MEU ordering if there exist a cardinal utility index u and a unique nonempty, (weak∗)-compact and convex set C ⊆ Δ such that ≽ can be represented by the functional V : F → R defined by the following equation:

V(f) = min_{P∈C} ∫_S u(f(s)) P(ds).   (10.5)
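The MEU functional (10.5) is easy to illustrate on a finite state space. The prizes, utilities, and prior set below are hypothetical; since the minimum of a linear functional over a convex compact set is attained at an extreme point, it suffices to scan a finite list of priors:

```python
# Illustrative computation of the MEU functional on a three-state space.
# The prizes, utility index, and prior set are assumptions made for
# exposition, not examples from the chapter.

states = ["s1", "s2", "s3"]
u = {"x": 1.0, "y": 0.5, "z": 0.0}   # hypothetical utilities of three prizes

# Extreme points of a hypothetical (convex, compact) prior set C.
C = [
    {"s1": 1/3, "s2": 1/3, "s3": 1/3},
    {"s1": 1/3, "s2": 2/3, "s3": 0.0},
    {"s1": 1/3, "s2": 0.0, "s3": 2/3},
]

def expected_u(act, P):
    """Expected utility of an act (dict state -> prize) under prior P."""
    return sum(P[s] * u[act[s]] for s in states)

def V_meu(act):
    """MEU value: the worst expected utility over the prior set."""
    return min(expected_u(act, P) for P in C)

f = {"s1": "x", "s2": "y", "s3": "z"}   # a nonbinary act
print(V_meu(f))   # 1/3, attained at the third prior
```

The SEU special case corresponds to C containing a single prior, in which case the minimum is vacuous.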
SEU also corresponds to the special case of MEU in which C = {P} for some probability measure P. If we now let, for any A ∈ Σ,

P̲(A) = min_{P∈C} P(A),   (10.6)

we see that P̲ is an exact capacity. While in general V(f) is not equal to the Choquet integral of u(f) with respect to P̲, this is the case for binary acts f. This shows that V is a canonical representation of ≽, with willingness to bet ρ = P̲. See Casadesus-Masanell et al. (2000) for an axiomatization of MEU preferences (satisfying the Structural Assumption) in the Savage setting. More generally, consider an α-MEU preference ≽, which assigns some weight to both the worst-case and the best-case scenarios. Formally, there are a cardinal utility u, a set of probabilities C, and α ∈ [0, 1], such that ≽ is represented by

V(f) = α min_{P∈C} ∫_S u(f(s)) P(ds) + (1 − α) max_{P∈C} ∫_S u(f(s)) P(ds).

This includes the case of a "maximax" DM, who has α ≡ 0. V is canonical, so that ≽ is biseparable, with ρ given by ρ(A) = α min_{P∈C} P(A) + (1 − α) max_{P∈C} P(A), for A ∈ Σ. (iii) Consider a binary relation ≽ constructed as follows: There are a cardinal utility u, a probability P and a number β ∈ [0, 1] such that ≽ is represented by

V(f) ≡ (1 − β) ∫_S u(f(s)) P(ds) + β φ(u ∘ f),
where

φ(u ∘ f) ≡ sup { ∫_S u(g(s)) P(ds) : g ∈ F binary, u(g(s)) ≤ u(f(s)) for all s ∈ S }.

≽ describes a DM who behaves as if he were maximizing SEU when choosing among binary acts, but not when comparing more complex acts. The higher the parameter β, the farther the preference relation is from SEU on nonbinary acts. V is monotonic and it satisfies Equation (10.3) with ρ = P, so that it is a canonical representation of ≽. 10.2.2. The Anscombe–Aumann case The Anscombe–Aumann framework is a widely used special case of our framework in which the consequences have an objective feature: X is also a convex subset of a vector space. For instance, X is the set of all the lotteries on a set of prizes if the DM has access to an "objective" independent randomizing device. In this framework, it is natural to consider the following variant of the biseparable preference model—where for every f, g ∈ F and α ∈ [0, 1], αf + (1 − α)g denotes the act which pays αf(s) + (1 − α)g(s) ∈ X for every s ∈ S. Definition 10.2. A canonical representation V of a preference relation ≽ is constant linear (c-linear for short) if V(αf + (1 − α)x) = αV(f) + (1 − α)V(x) for all binary f ∈ F, x ∈ X, and α ∈ [0, 1]. A relation is called a c-linearly biseparable preference if it admits a c-linear canonical representation. Again, an axiomatic characterization of this model is found in Ghirardato and Marinacci (2000a). It generalizes the SEU model of Anscombe and Aumann (1963) and many non-EU extensions that followed, like the CEU and MEU models of Schmeidler (1989) and Gilboa and Schmeidler (1989) respectively. In fact, a c-linearly biseparable preference behaves in a SEU fashion over the set X of the
constant acts, but it is almost unconstrained over nonbinary acts. (C-linearity guarantees the cardinality of V and hence u.) All the results in this chapter are immediately translated to this class of preferences, in particular to the CEU and MEU models in the Anscombe–Aumann framework mentioned earlier. Indeed, as we show in Proposition 10.2 later, in this case removing cardinal risk aversion is much easier than in the more general framework we use.
10.3. The definitions As anticipated in the Introduction, the point of departure of our search for an extended notion of ambiguity aversion is the following partial order on preference relations: Definition 10.3. Let ≽₁ and ≽₂ be two preference relations. We say that ≽₂ is more uncertainty averse than ≽₁ if, for all x ∈ X and f ∈ F, both

x ≽₁ f ⟹ x ≽₂ f   (10.7)

and

x ≻₁ f ⟹ x ≻₂ f.   (10.8)
This order has the advantage of making the weakest prejudgment on which acts are "intuitively" unambiguous: the constants. However, Example 10.1 illustrates that it does not discriminate between cardinal risk attitude and ambiguity attitude: DMs 1 and 2 are intuitively both ambiguity neutral, but 1 is more cardinal risk averse, and hence more uncertainty averse, than 2. The problem is that constant acts are "neutral" with respect to ambiguity and with respect to cardinal risk. Given that our objective is comparing ambiguity attitudes, we thus need to find ways to coarsen the ranking above, so as to identify which part of it is due to differences in cardinal risk attitude and which is due to differences in ambiguity attitude. 10.3.1. Filtering cardinal risk attitude While the "factorization" just described can be achieved easily if we impose more structure on the decision framework (see, e.g. the discussion in Subsection 10.7.3), we present a method for separating cardinal risk and ambiguity attitude which is based only on preferences, does not employ extraneous devices, and obtains the result for all biseparable preferences. Moreover, this approach does not impose any restrictions on the two DMs' beliefs (and hence on their relative ambiguity attitude), a problem that all the alternatives share. The key step is coarsening comparative uncertainty aversion by adding the following restriction on which pairs of preferences are to be compared (we write {x, y} ≻ z as a short-hand for x ≻ z and y ≻ z, and similarly for ≺ and for mirror-image expressions such as z ≻ {x, y}):
Definition 10.4. Two preference relations ≽₁ and ≽₂ are cardinally symmetric if for any pair (A₁, A₂) ∈ Σ × Σ such that each Aᵢ is essential for ≽ᵢ, i = 1, 2, and any v∗, v*, w∗, w* ∈ X such that v∗ ≺₁ v* and w∗ ≺₂ w*, we have:

• If there are x, y ∈ X such that v∗ ≻₁ {x, y}, w∗ ≻₂ {x, y}, and

  v∗A₁x ∼₁ v*A₁y  and  w∗A₂x ∼₂ w*A₂y,   (10.9)

  then for every x′, y′ ∈ X such that v∗ ≻₁ {x′, y′} and w∗ ≻₂ {x′, y′} we have

  v∗A₁x′ ∼₁ v*A₁y′ ⟺ w∗A₂x′ ∼₂ w*A₂y′.   (10.10)

• Symmetrically, if there are x, y ∈ X such that v* ≺₁ {x, y}, w* ≺₂ {x, y}, and

  xA₁v* ∼₁ yA₁v∗  and  xA₂w* ∼₂ yA₂w∗,   (10.11)

  then for every x′, y′ ∈ X such that v* ≺₁ {x′, y′} and w* ≺₂ {x′, y′} we have

  x′A₁v* ∼₁ y′A₁v∗ ⟺ x′A₂w* ∼₂ y′A₂w∗.   (10.12)
This condition is inspired by the utility construction technique used in the axiomatizations of additive conjoint measurement in, for example, Krantz et al. (1971) and Wakker (1989). A few remarks are in order: First, cardinal symmetry holds vacuously for any pair of preferences which do not have essential events. Second, cardinal symmetry does not impose restrictions on the DMs' relative ambiguity attitudes. In fact, for all acts ranked by ≽ᵢ, the consequence obtained if Aᵢ obtains is always strictly better than that obtained if Aᵢᶜ obtains, so that all acts are bets on the same event Aᵢ. Intuitively, a DM's ambiguity attitude affects these bets symmetrically, so that his preferences do not convey any information about it. Moreover, cardinal symmetry does not constrain the DMs' relative confidence in A₁ and A₂, since the "win" (or "loss") payoffs can be different for the two DMs. On the other hand, it does, unsurprisingly, restrict their relative cardinal risk attitudes. To better understand the restrictions implied by cardinal symmetry, assume that consequences are monetary payoffs and that both DMs prefer more money to less. Suppose that, when betting on events (A₁, A₂), (10.9) holds for some "loss" payoffs x and y and "win" payoffs v* ≻₁ v∗ and w* ≻₂ w∗ respectively. This says that exchanging v∗ for v* as the prize for A₁, and w∗ for w* as the prize for A₂, can for both DMs be traded off with a reduction in "loss" from x to y. Suppose that when the initial loss is x′ < x, ≽₁ is willing to trade off the increase in "win" with a reduction in "loss" to y′, but ≽₂ accepts reducing the "loss" only to y″ > y′ (i.e., w∗A₂x′ ≻₂ w*A₂y′, in violation of (10.10)). That is, as the amount of the low payoff decreases, DM 2 becomes more sensitive to differences in payoffs than DM 1.
Such diversity of behavior—which we intuitively attribute to differences in the DMs' risk attitudes—is ruled out by cardinal symmetry, which requires that the two DMs consistently agree on the acceptable trade-off for improving their "win"
payoff, and similarly for the "loss" payoff. It is important to stress that this discussion makes sense only when both DMs are faced with nontrivial uncertainty (i.e. they are both betting on essential events). Thus, we do not use "trade-off" to mean certain substitution; rather, substitution in the context of an uncertain prospect. To see how cardinal symmetry is used to show that two biseparable preferences have the same cardinal risk attitude, assume first that the two relations are ordinally equivalent: for every x, y ∈ X, x ≽₁ y ⟺ x ≽₂ y. When that is the case, cardinal symmetry holds if and only if their canonical utility indices are positive affine transformations of each other. In order to simplify the statements, we write u₁ ≈ u₂ to denote such "equality" of indices. Proposition 10.1. Suppose that ≽₁ and ≽₂ are ordinally equivalent biseparable preferences which have essential events. Then ≽₁ and ≽₂ are cardinally symmetric if and only if their canonical utility indices satisfy u₁ ≈ u₂. The intuition of the proof (see Appendix B) can be quickly grasped by rewriting, say, Equations (10.9) and (10.10) in terms of the canonical representations to find that for every x, y, x′, y′ ∈ X,

u₁(x) − u₁(y) = u₁(x′) − u₁(y′) ⟺ u₂(x) − u₂(y) = u₂(x′) − u₂(y′).

Notice however that this does not imply that the preferences are identical on binary acts: The DMs' beliefs on events could be totally different. The comparative notion of ambiguity aversion we propose in the next subsection checks comparative uncertainty aversion in preferences with the same cardinal risk attitude. Clearly, it would be nicer to have a comparative notion that also ranks preferences without the same cardinal risk attitude. In Subsection 10.7.1, we discuss how to extend our notion to deal with these cases. This extension requires the exact measurement of the two preferences' canonical utility indices, and is thus "less behavioral" than the one just presented.
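The difference-matching equivalence behind Proposition 10.1 can be checked mechanically on a finite grid of consequences. The indices below are hypothetical: u₂ is a positive affine transformation of u₁ (so u₁ ≈ u₂), while u₃ is only ordinally, not cardinally, equivalent to u₁:

```python
# Hypothetical illustration of the difference-matching condition: two
# indices agree on all equalities of utility differences exactly when
# one is a positive affine transformation of the other. The grid of
# consequences and the indices are assumptions, not from the chapter.

xs = [0, 10, 25, 40, 60]                    # hypothetical monetary consequences
u1 = {x: x ** 0.5 for x in xs}              # some cardinal utility index
u2 = {x: 3.0 * u1[x] - 1.0 for x in xs}     # positive affine transform of u1
u3 = {x: u1[x] ** 2 for x in xs}            # ordinally equivalent, not affine

def same_difference_equalities(ua, ub, tol=1e-9):
    """Do ua and ub agree on which utility differences are equal?"""
    for x in xs:
        for y in xs:
            for xp in xs:
                for yp in xs:
                    eq_a = abs((ua[x] - ua[y]) - (ua[xp] - ua[yp])) < tol
                    eq_b = abs((ub[x] - ub[y]) - (ub[xp] - ub[yp])) < tol
                    if eq_a != eq_b:
                        return False
    return True

print(same_difference_equalities(u1, u2))  # True: u1 and u2 are cardinally "equal"
print(same_difference_equalities(u1, u3))  # False: e.g. u3(25)-u3(10) = u3(40)-u3(25), but not for u1
```

The exhaustive scan over quadruples is, of course, only feasible on a small grid; it is meant to mirror the displayed equivalence, not the behavioral elicitation itself.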
Finally, we remark that a symmetric exercise to the one performed here is to coarsen comparative uncertainty aversion so as to rank preferences by their cardinal risk aversion only. In Ghirardato and Marinacci (2000a) it is shown that for biseparable preferences such a ranking is represented by the ordering of the canonical utilities by their relative concavity, thus generalizing the standard result. 10.3.2. Comparative and absolute ambiguity aversion Having thus prepared the ground, our comparative notion of ambiguity aversion is immediately stated: Definition 10.5. Let ≽₁ and ≽₂ be two preference relations. We say that ≽₂ is more ambiguity averse than ≽₁ whenever both of the following
conditions hold: (A) ≽₂ is more uncertainty averse than ≽₁; (B) ≽₁ and ≽₂ are cardinally symmetric. Thus, we restrict our attention to pairs which are cardinally symmetric. As explained earlier, when one DM's preference does not have an essential event, cardinal risk aversion does not play a role in that DM's choices, so that we do not need to remove it from the picture. Remark 10.1. So far, we have tacitly assumed that cardinal risk and ambiguity attitude completely characterize biseparable preferences. Indeed, the validity of this assumption can be easily verified by observing that if two such preferences are "as uncertainty averse as" each other (i.e., ≽₁ is more uncertainty averse than ≽₂, and vice versa), they are identical. We finally come to the absolute definition of ambiguity aversion and love. Let ≽ be a preference relation on F with a SEU representation.5 As we observed in Section 10.1, these relations intuitively embody ambiguity neutrality. We propose to use them as the benchmark for defining ambiguity aversion. Of course, one could intuitively hold that the SEU relations are not the only ones embodying ambiguity neutrality, and thus prefer using a wider set of benchmarks. This alternative route is discussed in Subsection 10.7.3. Definition 10.6. A preference relation ≽ is ambiguity averse (loving) if there exists a SEU preference relation ≽′ which is less (more) ambiguity averse than ≽. It is ambiguity neutral if it is both ambiguity averse and ambiguity loving. If ≽′ is a SEU preference which is less ambiguity averse than ≽, we call it a benchmark preference for ≽. We denote by R(≽) the set of all benchmark preferences for ≽. That is,

R(≽) ≡ { ≽′ ⊆ F × F : ≽′ is SEU and ≽ is more ambiguity averse than ≽′ }.

Each benchmark preference ≽′ ∈ R(≽) induces a probability measure P on Σ, so a natural twin of R(≽) is the set of the benchmark measures:

M(≽) = { P ∈ Δ : P represents ≽′ for some ≽′ ∈ R(≽) }.
Using this notation, Definition 10.6 can be rewritten as follows: ≽ is ambiguity averse if R(≽) ≠ Ø or, equivalently, M(≽) ≠ Ø.
10.4. The characterizations We now characterize the notions of comparative and absolute ambiguity aversion defined in the previous section for the general case of biseparable preferences, and the important subcases of CEU and MEU preferences. To start, we use
Proposition 10.1 and the observation that the canonical utility index of a preference with no essential events is ordinal, to show that if two preferences are biseparable and they are ranked by Definition 10.5, they have the same canonical utility index: Theorem 10.1. Suppose that ≽₁ and ≽₂ are biseparable preferences, and that ≽₂ is more ambiguity averse than ≽₁. Then u₁ ≈ u₂. Checking cardinal symmetry is clearly not a trivial task, but for an important subclass of preference relations—the c-linearly biseparable preferences in an Anscombe–Aumann setting—it is implied by comparative uncertainty aversion. In fact, under c-linearity, ordinal equivalence easily implies cardinal symmetry, so that we get: Proposition 10.2. Suppose that X is a convex subset of a vector space, and that ≽₁ and ≽₂ are c-linearly biseparable preferences. Then ≽₂ is more ambiguity averse than ≽₁ if and only if ≽₂ is more uncertainty averse than ≽₁. Therefore, in this case Definition 10.3 can be directly used as our definition of comparative ambiguity attitude. 10.4.1. Absolute ambiguity aversion We first characterize absolute ambiguity aversion for a general biseparable preference ≽. Suppose that V is a canonical representation of ≽, with canonical utility u. We let

D(≽) ≡ { P ∈ Δ : ∫_S u(f(s)) P(ds) ≥ V(f) for all f ∈ F }.

That is, D(≽), which depends only on V, is the set of beliefs inducing preferences which assign (weakly) higher expected utility to every act f. These preferences exhaust the set of the benchmarks of ≽: Theorem 10.2. Let ≽ be a biseparable preference. Then, M(≽) = D(≽). In particular, ≽ is ambiguity averse if and only if D(≽) ≠ Ø. Let ρ be the capacity associated with the canonical representation V. It is immediate to see that if P ∈ D(≽), then P ≥ ρ. Thus, nonemptiness of the core of ρ (the set of the probabilities that dominate ρ pointwise, which we denote C(ρ)) is necessary for ≽ to be ambiguity averse. In Subsection 10.4.2 it is shown to be not sufficient in general. Turn now to the characterization of ambiguity aversion for the popular CEU and MEU models. Suppose first that ≽ is a CEU preference relation represented by the capacity ν, and let C(ν) denote ν's possibly empty core. It is shown that D(≽) = C(ν), so that the following result—which also provides a novel decision-theoretic interpretation of the core as the set of all the benchmark measures—follows as a corollary of Theorem 10.2.
Corollary 10.1. Suppose that ≽ is a CEU preference relation, represented by the capacity ν. Then C(ν) = M(≽). In particular, ≽ is ambiguity averse if and only if C(ν) ≠ Ø. Thus, the core of an ambiguity averse capacity is equal to the set of its benchmark measures, and the ambiguity averse capacities are those with a nonempty core, called "balanced." A classical result (see, e.g. Kannai, 1992) thus provides an internal characterization of ambiguity aversion in the CEU case: Letting 1_A denote the characteristic function of A ∈ Σ, a capacity ν reflects ambiguity aversion if and only if for all λ₁, …, λₙ ≥ 0 and all A₁, …, Aₙ ∈ Σ such that ∑_{i=1}^n λᵢ 1_{Aᵢ} ≤ 1_S, we have ∑_{i=1}^n λᵢ ν(Aᵢ) ≤ 1. As convex capacities are balanced, but not conversely, the corollary motivates our claim that convexity does not characterize our notion of ambiguity aversion. This point is illustrated by Example 10.4 below, which presents a capacity that intuitively reflects ambiguity aversion but is not convex. On the other hand, given a MEU preference relation ≽ with set of priors C, it is shown that D(≽) = C. Thus, Theorem 10.2 implies that any MEU preference is ambiguity averse (as is intuitive) and, more interestingly, that the set C can be interpreted as the set of the benchmark measures for ≽. Corollary 10.2. Suppose that ≽ is a MEU preference relation, represented by the set of probabilities C. Then C = M(≽), so that ≽ is ambiguity averse. As to ambiguity love, reversing the proof of Theorem 10.2 shows that for any biseparable preference, ambiguity love is characterized by nonemptiness of the set

E(≽) ≡ { P ∈ Δ : ∫_S u(f(s)) P(ds) ≤ V(f) for all f ∈ F }.

In particular, a CEU preference with capacity ν is ambiguity loving if and only if the set of probabilities dominated by ν is nonempty. As for MEU preferences: None is ambiguity loving. Conversely, any "maximax" EU preference is ambiguity loving, with E(≽) = C. Finally, we look at ambiguity neutrality. Since we started with an informal intuition of SEU preferences as reflecting neutrality to ambiguity, an important consistency check on our analysis is to verify that they are ambiguity neutral in the formal sense. This is the case: Proposition 10.3. Let ≽ be a biseparable preference. Then ≽ is ambiguity neutral if and only if it is a SEU preference relation.
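Corollary 10.1 reduces ambiguity aversion of a CEU preference to nonemptiness of the core, which on a small state space can be verified by exhibiting a dominating probability. A sketch with an illustrative, hypothetical capacity:

```python
from itertools import chain, combinations

# Core membership check for a capacity on a three-point state space.
# Per Corollary 10.1, a CEU preference is ambiguity averse iff some
# probability measure dominates its capacity eventwise. The capacity
# below is a hypothetical illustration, not an example from the chapter.

S = ("s1", "s2", "s3")

def subsets(S):
    """All subsets of S, as tuples."""
    return chain.from_iterable(combinations(S, k) for k in range(len(S) + 1))

# Hypothetical capacity nu (normalized and monotone).
nu = {frozenset(A): v for A, v in {
    (): 0.0, ("s1",): 0.2, ("s2",): 0.2, ("s3",): 0.2,
    ("s1", "s2"): 0.5, ("s1", "s3"): 0.5, ("s2", "s3"): 0.5,
    ("s1", "s2", "s3"): 1.0,
}.items()}

def in_core(P, nu):
    """Is the probability P (dict state -> mass) in the core of nu?"""
    return all(sum(P[s] for s in A) >= nu[frozenset(A)] - 1e-12
               for A in subsets(S))

P_unif = {"s1": 1/3, "s2": 1/3, "s3": 1/3}
print(in_core(P_unif, nu))  # True: the core is nonempty, so the CEU preference is ambiguity averse
```

Exhibiting one core element suffices for nonemptiness; certifying emptiness in general requires the balancedness test (a linear-programming feasibility problem), which this sketch does not implement.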
of Theorem 10.2, is stated as follows (where ρ₁ and ρ₂ represent the willingness to bet of ≽₁ and ≽₂ respectively): Proposition 10.4. Let ≽₁ and ≽₂ be two biseparable preferences. If ≽₂ is more ambiguity averse than ≽₁, then ρ₁ ≥ ρ₂, D(≽₁) ⊆ D(≽₂), E(≽₁) ⊇ E(≽₂) and u₁ ≈ u₂. Thus, relative ambiguity implies containment of the sets D(≽) and E(≽) (clearly in opposite directions), and dominance of the willingness to bet ρ. Of course, the proposition lacks a converse, and thus it does not offer a full characterization. As we argue below, biseparable preferences seem to have too little structure for obtaining a general characterization result. Things are different if we restrict our attention to specific models. For instance, the next result characterizes comparative ambiguity for the CEU and MEU models: Theorem 10.3. Let ≽₁ and ≽₂ be biseparable preferences, with canonical utilities u₁ and u₂ respectively. (i) Suppose that ≽₁ and ≽₂ are CEU, with respective capacities ν₁ and ν₂. Then ≽₂ is more ambiguity averse than ≽₁ if and only if ν₁ ≥ ν₂ and u₁ ≈ u₂. (ii) Suppose that ≽₁ is MEU, with set of probabilities C₁. Then ≽₂ is more ambiguity averse than ≽₁ if and only if C₁ = D(≽₁) ⊆ D(≽₂) and u₁ ≈ u₂. Observe that part (ii) of the theorem does more than characterize comparative ambiguity for MEU preferences, as it applies to any biseparable ≽₂. For instance, it is immediate to notice that one can characterize absolute ambiguity aversion using that result and the fact that if ≽₁ is a SEU preference relation with beliefs P, then C₁ = {P}. Also, a symmetric result to (ii) holds: If ≽₂ is "maximax" EU, it is more ambiguity averse than ≽₁ iff C₂ = E(≽₂) ⊆ E(≽₁). Remark 10.2. Theorem 10.3 can be used to explain the apparent incongruence between the characterization of comparative risk aversion in SEU (in the sense of Yaari, 1969) and that of comparative ambiguity aversion in CEU: Convexity of ν seems to be the natural counterpart of concavity of u, but it is not.
This is due to the different uniqueness properties of utility functions and capacities. A SEU preference ≽₂ is more risk averse than a SEU preference ≽₁ iff, for every common normalization of the utilities, we have u₂(x) ≥ u₁(x) inside the interval of normalization. Since any normalization is allowed, u₂ must then be a concave transformation of u₁. In the case of capacities, only one normalization is allowed, so we only obtain ν₁ ≥ ν₂. It is not difficult to show that the necessary conditions of Proposition 10.4 are not sufficient if taken one by one. For instance, there are pairs of MEU (resp. CEU) preferences ≽₁ and ≽₂ for which ρ₁ ≥ ρ₂ (resp. C(ν₁) = D(≽₁) ⊆ D(≽₂) = C(ν₂)) does not entail that ≽₂ is more ambiguity averse than ≽₁.
Example 10.2. Let S = {s₁, s₂, s₃} and let Σ be the power set of S. Consider the probabilities P, Q and R defined by P = [1/2, 0, 1/2], Q = [0, 1, 0] and R = [1/2, 1/2, 0]. Let C₁ and C₂ respectively be the closed convex hulls of {P, Q} and {P, Q, R}. Then ρ₁ = P̲₁ = P̲₂ = ρ₂ but C₁ ⊊ C₂, and indeed by Theorem 10.3 the MEU preference ≽₂ inducing C₂ is more ambiguity averse than the MEU preference ≽₁ inducing C₁. Consider next a capacity ν such that ν(A) = 1/3 for every A ∉ {Ø, S}, and a probability P equal to 1/3 on each singleton. Then C(ν) = {P}, so that ν is balanced, but not exact (for instance, P({s₁, s₂}) = 2/3 > 1/3 = ν({s₁, s₂})). We have C(ν) ⊆ C(P) but ν ≱ P, and by Theorem 10.3 the CEU preference inducing P is not more ambiguity averse than that inducing ν. In contrast, P is exact, and we have both C(P) ⊆ C(ν) and P ≥ ν. This example illustrates two conceptual observations. The first (anticipated in Subsection 10.4.1) is that nonemptiness of the core of ρ is not sufficient for absolute ambiguity aversion: A probability can dominate ρ without being a benchmark measure for ≽. Unsurprisingly, in general the capacity ρ does not completely describe the DM's ambiguity attitude. The second observation is that, while D(≽) does characterize the DM's absolute ambiguity aversion, it too is an incomplete description of the DM's ambiguity attitude: There can be preferences ≽₁ and ≽₂ strictly ranked by comparative ambiguity even though D(≽₁) = D(≽₂). To better appreciate the difficulty of obtaining a general sufficiency result for biseparable preferences, we now present an example in which all the necessary conditions hold but the comparative ranking does not obtain. Example 10.3. For a general S and Σ (but see the restriction on P below), consider two preference relations ≽₁ and ≽₂ which behave according to example (iii) of biseparable preferences in Section 10.2.
Both have identical P and u (which ranges over a nondegenerate interval of R), with the following restriction on P: There are at least three disjoint events A₁, A₂ and A₃ in Σ such that P(Aᵢ) > 0 for i = 1, 2, 3 (otherwise both preferences are indistinguishable from SEU preferences with utility u and beliefs P). Their β parameters are different; in particular, β₂ > β₁ > 0. Clearly ρ₁ = ρ₂ = P and u₁ ≈ u₂. It is also immediate to verify that, under the assumption on P, D(≽₁) = D(≽₂) = {P} and E(≽₁) = E(≽₂) = Ø, so that both preferences are (strictly) ambiguity averse. However, ≽₁ is not more ambiguity averse than ≽₂ (nor are ≽₁ and ≽₂ equal, which would follow from two applications of the converse). Indeed, the parameter β measures comparative ambiguity for these preferences, so that ≽₂ is more ambiguity averse than ≽₁.
10.5. Unambiguous acts and events Let ≽ be an ambiguity averse or ambiguity loving preference relation. Even though ≽ has a strict ambiguity attitude, it may nevertheless behave in an ambiguity neutral fashion with respect to some subclass of acts and events,
which we may like to consider "unambiguous." The purpose of this section is to identify the class of the unambiguous acts and the related class of unambiguous events, and to present a characterization of the latter for biseparable (in particular CEU and MEU) preference relations. We henceforth focus on ambiguity averse preference relations, but it is easy to see that all the results in this section can be shown for ambiguity loving preferences. A more extensive discussion of the behavioral definition of ambiguity for events and acts is found in Ghirardato and Marinacci (2000b). In view of our results so far, the natural approach to defining the class of unambiguous events of a preference relation ≽ is to fix a benchmark ≽′ ∈ R(≽), and to consider the subset of all the acts in F over which ≽ is as ambiguity averse as ≽′. Intuitively, ambiguity is a property that the DM attaches to partitions of events, so that nonconstant acts which generate the same partition should be consistently deemed either both ambiguous or both unambiguous. Hence, we consider as "truly" unambiguous only the acts which belong to the set defined next. Definition 10.7. Given a preference relation ≽ and ≽′ ∈ R(≽), the set of ≽′-unambiguous acts, denoted H, is the largest subset of F satisfying the following two conditions:6 (A) For every x ∈ X and every f ∈ H, ≽ and ≽′ agree on the ranking of f and x. (B) For every f ∈ H and every g ∈ F, if {{s : g(s) ∼ x} : x ∈ X} ⊆ {{s : f(s) ∼ x} : x ∈ X}, then g ∈ H. Given a preference relation ≽, for any f ∈ F denote by Λ_f the collection of all the "upper pre-image" sets of f, that is,

Λ_f = {{s : f(s) ≽ x} : x ∈ X}.   (10.13)

Since any benchmark ≽′ ∈ R(≽) is ordinally equivalent to ≽, for any act f ∈ F the upper pre-images of f with respect to ≽ and ≽′ coincide: for all x ∈ X, {s : f(s) ≽ x} = {s : f(s) ≽′ x}. The set Λ ⊆ Σ of the ≽′-unambiguous events is thus naturally defined to be the collection of all the upper pre-image sets of the acts in H. That is,

Λ ≡ ⋃_{f ∈ H} Λ_f.
It is immediate to observe that if A ∈ Λ, then for every x, y ∈ X the binary act xAy belongs to H. This implies that Aᶜ ∈ Λ (i.e., Λ is closed w.r.t. complements). We now present the characterization of the set Λ. This turns out to be quite simple and intuitive: Λ is the subset of the events over which the capacity ρ representing ≽'s willingness to bet is complement-additive (sometimes called "symmetric"):
Proposition 10.5. Let ≽ be an ambiguity averse biseparable preference with willingness to bet ρ. Then for every ≽′ ∈ R(≽), the set Λ satisfies

Λ = { A ∈ Σ : ρ(A) + ρ(Aᶜ) = 1 }.   (10.14)

It immediately follows from the proposition that the choice of the specific benchmark ≽′ does not change the resulting set of events. In light of this, we henceforth call Λ the set of unambiguous events for ≽. The consequences of the proposition for the CEU and MEU models are clear: Just substitute ν or P̲ for ρ. In particular, when ≽ is a MEU preference with a set of probabilities C, it can be further shown that Λ is the set of events on which all the probabilities agree:

Λ = { A ∈ Σ : P(A) = ρ(A) for all P ∈ C }.

It is also interesting to observe that Λ is in general not an algebra. This is intuitive, as the intersection of unambiguous events could be ambiguous.7 As to the set of unambiguous acts H, it can also be seen to be independent of the choice of benchmark. In general, the only way to ascertain which acts are unambiguous is to construct the set H. However, for MEU preferences and for CEU preferences whose capacity is exact (the lower envelope of its core), the set H is the set of all the acts which are measurable with respect to the events in Λ. Therefore, in these cases Λ characterizes the set of unambiguous acts as well. (All these results are proved in Ghirardato and Marinacci, 2000b.)
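The complement-additivity test of Proposition 10.5 is directly computable. The capacity below is a hypothetical example loosely mimicking an Ellsberg-style urn; the test recovers Ø, {R}, {B, Y} and S as the unambiguous events:

```python
from itertools import chain, combinations

# Computing the unambiguous events of a willingness-to-bet capacity via
# Proposition 10.5: A is unambiguous iff rho(A) + rho(complement of A) = 1.
# The capacity rho is a hypothetical illustration, not from the chapter.

S = ("R", "B", "Y")

def subsets(S):
    """All subsets of S, as tuples."""
    return chain.from_iterable(combinations(S, k) for k in range(len(S) + 1))

rho = {frozenset(A): v for A, v in {
    (): 0.0, ("R",): 1/3, ("B",): 1/4, ("Y",): 1/4,
    ("B", "Y"): 2/3, ("R", "B"): 7/12, ("R", "Y"): 7/12,
    ("R", "B", "Y"): 1.0,
}.items()}

def unambiguous_events(rho):
    """Events where rho is complement-additive, sorted by cardinality."""
    full = frozenset(S)
    return sorted(
        (set(A) for A in subsets(S)
         if abs(rho[frozenset(A)] + rho[full - frozenset(A)] - 1.0) < 1e-12),
        key=len)

print(unambiguous_events(rho))
# The unambiguous events are the empty set, {R}, {B, Y}, and S:
# e.g. rho({B}) + rho({R, Y}) = 1/4 + 7/12 = 5/6, so {B} is ambiguous.
```

Note that the output family is closed under complements but need not be an algebra, in line with the discussion above.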
10.6. Back to Ellsberg

We now illustrate our results using the classical Ellsberg urn. The urn contains 90 balls of three colors: red, blue and yellow. The DM knows that there are 30 red balls and that the other 60 balls are either blue or yellow. However, he does not know their relative proportion. The state space for an extraction from the urn is S = {B, R, Y}. Given the nature of his information, it is natural to assume that the DM's preference relation ≽ will be such that its set of unambiguous events Λ satisfies Λ ⊇ {Ø, {R}, {B, Y}, S}. In particular, assume that the DM's preference relation is CEU and it induces the capacity ν. To reflect the fact that {R} and {B, Y} form an unambiguous partition, we know from Section 10.5 that if the DM is ambiguity averse (or loving) ν must satisfy

ν(R) + ν(B, Y) = 1.   (10.15)
Also, because of the symmetry of the information that the DM is given, it is natural to assume that

ν(B) = ν(Y)   and   ν(B, R) = ν(R, Y).   (10.16)
We first show that, if the ambiguity restriction (10.15) is imposed, ambiguity aversion is not compatible with the following beliefs, which induce behavior that
would on intuitive grounds be considered "ambiguity loving":

ν(R) < ν(B) = ν(Y);   ν(B, Y) < ν(B, R) = ν(R, Y).   (10.17)
Proposition 10.6. No ambiguity averse CEU preference relation ≽ such that its set of unambiguous events contains {{R}, {B, Y}} can agree with the ranking (10.17).

In his paper on ambiguity aversion, Epstein (1999) also discusses the Ellsberg urn, and he presents a convex capacity compatible with ambiguity loving in his sense (see Subsection 10.7.3 for a brief review), which satisfies the conditions in (10.17). This is the capacity ν1 defined by

ν1(R) = 1/12,   ν1(B, Y) = 1/3,   ν1(B) = ν1(Y) = 1/6,   ν1(B, R) = ν1(R, Y) = 1/2.
He thus concludes that convexity of beliefs does not imply ambiguity aversion for CEU preferences (it is also not implied, in his definition). We know from Corollary 10.1 that convexity implies ambiguity aversion in our sense. Proposition 10.6 helps clarify why this example does not conflict with the intuition developed earlier: In fact, ν1 does embody ambiguity aversion in our sense, but it does not reflect the usual presumption that {R} and {B, Y} are seen as unambiguous events. If it did, it would have to satisfy (10.15), which is not the case (it cannot be, since convex capacities are balanced). For us, the DM with beliefs ν1 does not perceive {R} and {B, Y} as unambiguous. Of course, then it is not clear in which sense the conditions in (10.17) should "intuitively" embody ambiguity loving behavior. Going back to the example, we would say that the DM's preferences intuitively reflect ambiguity aversion if the reverse inequalities held:

ν(R) ≥ ν(B) = ν(Y);   ν(B, Y) ≥ ν(B, R) = ν(R, Y).   (10.18)
We now show that the notion of ambiguity aversion proposed earlier characterizes this intuitive ranking when, besides the obvious symmetry restrictions in (10.16), we strengthen the requirement in (10.15) in the following natural way:

ν(R) = 1/3   and   ν(B, Y) = 2/3.   (10.19)
Proposition 10.7. Let ≽ be a CEU preference relation such that its representing capacity ν satisfies the equalities (10.16) and (10.19). Then ≽ is ambiguity averse if and only if ν agrees with the ranking (10.18).

In closing our discussion of Ellsberg's problem, we provide further backing for our belief that convexity is not necessary for ambiguity aversion. Here is a capacity which is not convex, and still makes the typical Ellsberg choices.
Example 10.4. Consider the capacity ν2 defined by (10.19) and

ν2(B) = ν2(Y) = 7/24,   ν2(B, R) = ν2(R, Y) = 1/2.
This capacity satisfies (10.18), so that it reflects ambiguity aversion both formally and intuitively, but it is not superadditive, let alone convex.
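Both capacity claims in this section can be verified by brute force over the eight events. A sketch (taking ν1(R) = 1/12, ν1(B) = ν1(Y) = 1/6, ν1(B, R) = ν1(R, Y) = 1/2, ν1(B, Y) = 1/3 for Epstein's capacity, and ν2 from Example 10.4; the helper names are ours):

```python
from fractions import Fraction as F
from itertools import chain, combinations

STATES = ("R", "B", "Y")
EVENTS = [frozenset(c) for c in chain.from_iterable(
    combinations(STATES, r) for r in range(4))]

def capacity(R, B, Y, BR, RY, BY):
    return {frozenset(): F(0), frozenset(STATES): F(1),
            frozenset("R"): R, frozenset("B"): B, frozenset("Y"): Y,
            frozenset("BR"): BR, frozenset("RY"): RY, frozenset("BY"): BY}

def is_convex(nu):
    # nu(A ∪ B) >= nu(A) + nu(B) - nu(A ∩ B) for every pair of events
    return all(nu[A | B] >= nu[A] + nu[B] - nu[A & B]
               for A in EVENTS for B in EVENTS)

def is_superadditive(nu):
    # nu(A ∪ B) >= nu(A) + nu(B) for disjoint A, B
    return all(nu[A | B] >= nu[A] + nu[B]
               for A in EVENTS for B in EVENTS if not A & B)

nu1 = capacity(F(1, 12), F(1, 6), F(1, 6), F(1, 2), F(1, 2), F(1, 3))
nu2 = capacity(F(1, 3), F(7, 24), F(7, 24), F(1, 2), F(1, 2), F(2, 3))

convex1 = is_convex(nu1)                    # nu1 is convex ...
violates_15 = nu1[frozenset("R")] + nu1[frozenset("BY")] != 1   # ... yet 5/12 != 1
superadd2 = is_superadditive(nu2)           # fails: nu2(B,R) < nu2(B) + nu2(R)
uniform_in_core = all(F(len(A), 3) >= nu2[A] for A in EVENTS)   # nu2 is balanced
```

The last line exhibits the uniform measure as a core element of ν2, so ν2 is balanced (hence ambiguity averse by Corollary 10.1) despite failing superadditivity.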
10.7. Discussion

In this section we discuss some of the choices we have made in the previous sections. First we briefly discuss how the comparative ambiguity ranking can be extended to preferences with different cardinal risk attitude. Then we discuss in more detail how the unambiguous acts described in Section 10.5 can be used in the comparative ranking, and why we chose SEU preferences as benchmarks.

10.7.1. Comparative ambiguity and equality of cardinal risk attitude

As we observed earlier, our comparative ambiguity aversion notion cannot compare biseparable preferences with different canonical utility indices. Of course, the characterization results of Section 10.4 can be used to qualitatively compare two preferences by ambiguity: For instance, we can look at two CEU preferences and compare their willingness to bet, or we can use utility functions to compare two SEU preferences by risk aversion, even if they do not have the same beliefs. However, when dealing with biseparable preferences, it is easy to apply the intuition of our comparative ranking to compare preferences which do not have the same canonical utility. This requires eliciting the canonical utility indices first, and then using acts and constants that are "utility equivalents" in Equations (10.7) and (10.8).8 The ranking thus obtained is very general (it does not even entail ordinal equivalence), but it yields mutatis mutandis the same characterization results that we obtained with the more restrictive one. For instance: ≽ is ambiguity averse iff D(≽) ≠ Ø, and CEU (MEU) preference ≽2 is more ambiguity averse than CEU (MEU) preference ≽1 iff ν1 ≥ ν2 (C1 ⊆ C2) (but of course in general u1 ≉ u2). Nonetheless, this ranking requires the full elicitation of the DMs' canonical utility indices, and is thus operationally more complex than that in Definition 10.5.

10.7.2. Using unambiguous acts in the comparative ranking

One of the intuitive assumptions that our analysis builds on is that constant acts are primitively "unambiguous": That is, we assume that every DM perceives constants as unambiguous. No other acts are "unambiguous" in this primitive sense. However, one could argue that it is natural to use in the comparative ranking also those acts which are revealed to be deemed unambiguous by both DMs, even if they are not constant.
Suppose that ≽ is an ambiguity averse biseparable preference, and let H (Λ) be its set of unambiguous acts (events), as defined in Section 10.5. It is possible to show (see Ghirardato and Marinacci, 2000b) that for every ≽' ∈ R(≽) and every h ∈ H and f ∈ F, we have

h ≽' f ⟹ h ≽ f   and   h ≻' f ⟹ h ≻ f.   (10.20)
That is, all benchmarks according to Definition 10.5 satisfy the stronger comparative ranking suggested above. Conversely, it is obvious that if ≽ and a SEU preference ≽' are cardinally symmetric and satisfy (10.20), they satisfy Definition 10.5. Thus, modifying Definition 10.5 to have (10.20) in part (A) does not change the set of the ambiguity averse preferences.

10.7.3. A more general benchmark

We chose SEU maximization as the benchmark representing ambiguity neutrality. While few would disagree that SEU preferences are "ambiguity neutral" (in a primitive, nonformal sense), some readers may find that the result of Proposition 10.3, that SEU maximization characterizes ambiguity neutrality, does not agree with their intuition of what constitutes ambiguity neutral behavior. In particular, they might feel that we should also classify as ambiguity neutral any non-SEU preference whose likelihood relation can still be represented by a probability measure. This would clearly be the case if we let such preferences be benchmarks for our comparative ambiguity notion. Here we explain why we have not followed that route, and the consequences of this choice for the interpretation of our notions.

The non-SEU preferences in question are those that are probabilistically sophisticated (PS) in the sense of Machina and Schmeidler (1992). For example, consider a CEU preference ≽ whose willingness to bet is ρ = g(P) for some probability measure P and "distortion" function g; that is, an increasing g : [0, 1] → [0, 1] such that g(0) = 0 and g(1) = 1. Such ≽ is PS since its ranking of bets (likelihood relation) is represented by the probability P, but it is not SEU if g is different from the identity function. According to the point of view suggested above, such ≽ is "ambiguity neutral"; it should thus be used as a benchmark in characterizing ambiguity aversion.
Moreover, if we used PS preferences as benchmarks it might be possible to avoid attributing to ambiguity aversion the effects of probabilistic risk aversion. However, go back to the ambiguous urn of Example 10.1 and consider the following:

Example 10.1. (continued) In the framework of Example 10.1, consider a third DM with CEU preferences ≽3, with canonical utility u(x) = x and willingness to bet defined by

ρ3(B) = 1/4   and   ρ3(R) = 1/4.
It is immediate to verify that according to Definition 10.5, DM 3 is more ambiguity averse than DM 1 (who is SEU), so that he is ambiguity averse in our sense.
That seems quite natural, since he is willing to invest less in bets on the ball extractions. With PS benchmarks, we conclude that both DMs are ambiguity neutral, since their willingnesses to bet are ordinally equivalent to the probability ρ1 (ρ3 = g(ρ1) for any distortion g such that g(1/2) = 1/4), so that both are PS. Hence, DM 3's behavior is only due to his probabilistic risk aversion. Yet, it seems that the fact that DM 3 is only willing to bet 1/4 utils on any color may at least in part be due to the ambiguity of the urn and his possible ambiguity aversion.

This example is not the only case in which using PS benchmarks yields counterintuitive conclusions. When the state space is finite, if we use PS preferences as benchmarks we find that almost every CEU preference inducing a strictly positive ρ is both ambiguity averse and loving. Thus, a large set of preferences are shown to be ambiguity neutral, including, as the following example illustrates, many preferences which are not PS.

Example 10.5. Suppose that two DMs are faced with the following decision problem. There are two urns, both containing 100 balls, either red or black. The DMs are told that Urn I contains at least 40 balls of each color, while Urn II contains at least 10 balls of each color. One ball will be extracted from each urn. Thus, the state space is S = {Rr, Rb, Br, Bb}, where the upper (lower) case letter stands for the color of the ball extracted from Urn I (II). Suppose that both DMs have CEU preferences ≽1 and ≽2, with respective willingness to bet ρ1 and ρ2. Using obvious notation, suppose that ρ1(b) = ρ1(r) = 0.1 and ρ1(B) = ρ1(R) = 0.4, that ρ1(s) = 0.04 for each singleton s, and that for every other event ρ1 is obtained by additivity. According to Definition 13, DM 1 is strictly ambiguity averse. In contrast, with PS benchmarks the result mentioned above shows that he is ambiguity neutral.
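The claim that DM 1 is ambiguity averse can be confirmed by exhibiting a probability in the core of ρ1. A sketch (the uniform witness and the additive fill-in for unlisted events are our choices, consistent with the text's description):

```python
from itertools import chain, combinations

STATES = ("Rr", "Rb", "Br", "Bb")
EVENTS = [frozenset(c) for c in chain.from_iterable(
    combinations(STATES, k) for k in range(5))]

# the events named in the text: the marginal events of the two urns
NAMED = {frozenset({"Rr", "Rb"}): 0.4,  # R: red from Urn I
         frozenset({"Br", "Bb"}): 0.4,  # B: black from Urn I
         frozenset({"Rr", "Br"}): 0.1,  # r: red from Urn II
         frozenset({"Rb", "Bb"}): 0.1}  # b: black from Urn II

def rho1(A):
    """DM 1's willingness to bet; other events filled additively (0.04 per state)."""
    if A == frozenset(STATES):
        return 1.0
    return NAMED.get(A, 0.04 * len(A))

# the uniform probability dominates rho1 on every event, so C(rho1) is nonempty
P = {s: 0.25 for s in STATES}
in_core = all(sum(P[s] for s in A) >= rho1(A) - 1e-12 for A in EVENTS)
```

Since ρ1 has a nonempty core, Corollary 10.1 delivers ambiguity aversion for the CEU preference it induces.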
Let ρ2 be as follows: ρ2(b) = ρ2(r) = 0.9 and ρ2(B) = ρ2(R) = 0.6, ρ2(s) = 0.54 for each singleton s, ρ2(A) = 0.92 for each A ∈ {Rr ∪ Bb, Rb ∪ Br}, and ρ2(A) = 0.95 for each ternary set. According to Corollary 10.1, DM 2 is ambiguity loving, but if we use PS benchmarks we conclude that she is ambiguity neutral. Both conclusions go against our intuition. Moreover, since neither ρ1 nor ρ2 is ordinally equivalent to a probability, ≽1 and ≽2 are not PS.

The foregoing discussion shows some of the difficulties that may arise if we use PS, rather than SEU, preferences as benchmarks with our comparative ambiguity aversion notion: We end up attributing too much explanatory power to probabilistic risk aversion. Instead, with SEU benchmarks we overemphasize the role of ambiguity aversion. Is it possible to remove probabilistic risk attitude from the picture, as we did for cardinal risk attitude?9

10.7.3.1. Removing probabilistic risk aversion

Suppose that there is a subset E of acts which are universally accepted as "unambiguous," in the sense that we are sure that a DM's choices among these acts are unaffected by his ambiguity attitude. Then, if E (and the
associated set of "unambiguous" events, denoted 𝒜) is sufficiently rich, we can discriminate between probabilistic risk and ambiguity aversion. For instance, modify Example 10.1 by assuming the availability of an "unambiguous" randomizing device, so that each state describes the result of the device as well. Now, find a set A of results of the device (obviously, here 𝒜 is the family of all such sets) which is as likely as R(ed) and then check if B(lack) is as likely as A^c. If it is, the DM behaves identically when faced with (equally likely) ambiguous and unambiguous events, so that all the nonadditivity of ρ3 on {B, R} must be due to his probabilistic risk aversion. His preferences are also PS on the extended problem. If it is not, then DM 3's behavior is affected by ambiguity, and his preferences are not PS on the extended problem. The point is that in the presence of a sufficiently rich 𝒜, a DM whose preferences are PS is treating ambiguous and unambiguous events symmetrically, and is hence intuitively ambiguity neutral. Therefore, in such a case we would expect PS preferences to be found ambiguity neutral. This is not the case in the original version of Example 10.1, since a rich set of "unambiguous" events is missing.

More generally, consider a biseparable preference ≽ which is not PS overall, but is PS when comparing only unambiguous acts. That is, the DM behaves as if he forms a probability P on the set 𝒜, and calculates his willingness to bet on these events by means of a distortion function g which only reflects his probabilistic risk attitude. As we did in controlling for cardinal risk attitude, we want to use as benchmarks for ≽ only those PS preferences (which with a small abuse of notation we also denote R(≽)) that have the same probabilistic risk attitude; for example, those biseparable preferences which share g as distortion function.
Interestingly, it turns out that if the set E is rich enough, any PS preference satisfying Equation (10.20) for all h ∈ E has this property. This is exactly the approach followed by Epstein (1999) in his work on ambiguity aversion: He assumes the existence of a suitably rich set 𝒜 of "unambiguous" events,10 defines E as the set of all the 𝒜-measurable acts, and uses Equation (10.20) with h ∈ E as his comparative ambiguity notion. His benchmarks of choice are PS preferences. This approach attains the objective of "filtering" the effects of probabilistic risk attitude from our absolute ambiguity notion. It thus yields a finer assessment of the DM's ambiguity attitude. However, the foregoing discussion has illustrated that a crucial ingredient of this filtration is the existence of a set of "unambiguous" acts which is sufficiently rich: If it is too poor (e.g. it contains only the constants, as in Example 10.5), we may use benchmarks whose probabilistic risk attitude is different from the DM's. This may cause Epstein's approach to reach counterintuitive conclusions, as illustrated in the previous examples.

The main problem we have with this approach is that we find it undesirable to base our measurement of ambiguity attitude on an exogenous notion of "ambiguity," especially in view of the richness requirement. It seems that in many cases of interest, such as Ellsberg's example, the "obvious" set of "unambiguous" acts does not satisfy this requirement. Our objective is to develop a notion of ambiguity attitude which is based on the weakest set of primitive requisites (like the two assumptions stated in the Introduction), even though this has a cost in terms of the "purity" of the interpretation of the behavioral feature we measure.
Epstein and Zhang (2001) propose a behavioral foundation for the notion of "ambiguity," so that the existence of a rich set E can be objectively verified, solving the problem mentioned earlier. In Ghirardato and Marinacci (2000b) we present an example which suggests that their behavioral notion can lead to counterintuitive conclusions (in that case, an intuitively ambiguous event is found unambiguous). More generally, we see the following problem with this enterprise: There may be events which are "unambiguous" (resp. "ambiguous") with respect to which the DM nonetheless behaves in an ambiguity nonneutral (resp. neutral) fashion. Consider a DM who listens to a weather forecast stated as a probabilistic judgment. If the DM does not consider the specific source reliable, he might express a willingness to bet which is a distortion of this judgment, while being probabilistically risk neutral. Alternatively, he may find the source reliable, hence perceive no ambiguity, but be probabilistically risk averse. A preference-based notion of ambiguity must be able to distinguish between these two cases, classifying the relevant events ambiguous in the first case and unambiguous in the second, and it must do so without using any auxiliary information. Considering moreover that the set of "verifiably unambiguous" events must be rich, we are skeptical that this feat is possible: The problem is that the Savage set-up does not provide us with enough instruments; it is too abstract.

10.7.3.2. Summing up

We have argued that what motivates using PS (rather than SEU) preferences as benchmarks is the objective of discriminating between probabilistic risk aversion and ambiguity attitude. We have shown that this requires a rich set of "verifiably unambiguous" events, and briefly reviewed our doubts about the possibility of providing a behavioral foundation to this "verifiable ambiguity" notion in a general subjective setting without extraneous devices.
In contrast, the analysis in this chapter shows that there are no such problems in using SEU benchmarks to identify an "extended" notion of ambiguity attitude, which can be disentangled from cardinal risk attitude using only behavioral data and no extraneous devices. Though it does not distinguish between "real" ambiguity and probabilistic risk attitudes, we think that this "extended" ambiguity attitude is worthwhile, especially because of its wider applicability.
Appendix A: Capacities and Choquet integrals

A set-function ν on (S, Σ) is called a capacity if it is monotone and normalized. That is: if for A, B ∈ Σ, A ⊆ B, then ν(A) ≤ ν(B); ν(Ø) = 0 and ν(S) = 1. A capacity is called a probability measure if it is finitely additive: ν(A ∪ B) = ν(A) + ν(B) for all A disjoint from B. It is called convex if for every pair A, B ∈ Σ, we have ν(A ∪ B) ≥ ν(A) + ν(B) − ν(A ∩ B). The core of a capacity ν is the (possibly empty) set C(ν) of all the probability measures on (S, Σ) which dominate it, that is,

C(ν) ≡ {P a probability measure on (S, Σ) : P(A) ≥ ν(A) for all A ∈ Σ}.
Following the usage in Cooperative Game Theory (e.g., Kannai, 1992), all capacities with nonempty core are called balanced. A capacity ν is called exact if it is balanced and equal to the lower envelope of its core (i.e., for all A ∈ Σ, ν(A) = min_{P ∈ C(ν)} P(A)). Convex implies exact, which in turn implies balanced, but the converse implications are all false.

The notion of integral used for capacities is the Choquet integral, due to Choquet (1953). For a given Σ-measurable function ϕ : S → R, the Choquet integral of ϕ with respect to a capacity ν is defined as:

∫_S ϕ dν = ∫_0^∞ ν({s ∈ S : ϕ(s) ≥ α}) dα − ∫_{−∞}^0 [1 − ν({s ∈ S : ϕ(s) ≥ α})] dα,   (10.A.1)
where the r.h.s. is a Riemann integral (which is well defined because ν is monotone). When ν is additive, (10.A.1) becomes a standard (additive) integral. In general the Choquet integral is monotonic, positively homogeneous and comonotonic additive: If ϕ, ψ : S → R are non-negative and comonotonic, then ∫(ϕ + ψ) dν = ∫ϕ dν + ∫ψ dν. Two functions ϕ, ψ : S → R are called comonotonic if there are no s, s′ ∈ S such that ϕ(s) > ϕ(s′) and ψ(s) < ψ(s′).
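On a finite state space, (10.A.1) reduces to a finite sum: scan the states in decreasing order of ϕ and weight each value by the capacity increment of the corresponding upper set {ϕ ≥ α}. A minimal sketch in Python (the capacity values are illustrative, mirroring the three-color urn of Section 10.6; `choquet` is our own helper name):

```python
def choquet(phi, nu, states):
    """Discrete Choquet integral: sort states by decreasing phi and weight
    each value by the capacity increment of the growing upper set."""
    order = sorted(states, key=lambda s: phi[s], reverse=True)
    total, prev, upper = 0.0, 0.0, set()
    for s in order:
        upper.add(s)
        total += phi[s] * (nu[frozenset(upper)] - prev)
        prev = nu[frozenset(upper)]
    return total

states = ("R", "B", "Y")
nu = {frozenset(): 0.0, frozenset("R"): 1/3, frozenset("B"): 7/24,
      frozenset("Y"): 7/24, frozenset("BR"): 0.5, frozenset("RY"): 0.5,
      frozenset("BY"): 2/3, frozenset("RBY"): 1.0}

# a bet on red (1 util on R, 0 otherwise) integrates to nu(R) = 1/3,
# while a bet on blue integrates to nu(B) = 7/24 < 1/3,
# reproducing the typical Ellsberg preference for betting on red
bet_red = choquet({"R": 1.0, "B": 0.0, "Y": 0.0}, nu, states)
bet_blue = choquet({"R": 0.0, "B": 1.0, "Y": 0.0}, nu, states)
```

When ν is additive the increments are just the probabilities of the single states, and the sum collapses to the ordinary expectation, matching the remark above.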
Appendix B: Cardinal symmetry and biseparable preferences

In this Appendix, we prove Proposition 10.1. In order to make the proof as clear as possible, we first explain the notion of "standard sequence," and then show how the latter can be used to prove the proposition.

10.B.1. Standard sequences

Consider a DM whose preferences ≽ have a canonical representation V, with canonical utility index u, willingness to bet ρ, and an essential event A ∈ Σ. Fix a pair of consequences v^* ≻ v_*, and consider x^0 ∈ X such that x^0 ≽ v^*. If there is an x ∈ X such that x A v_* ≽ x^0 A v^*, then by (10.3) and the convexity of the range of u, there is x^1 ∈ X such that

x^1 A v_* ∼ x^0 A v^*.   (10.B.1)
It is easy to verify that x^1 ≻ x^0: If x^0 ≽ x^1 held, by monotonicity and biseparability, we would have x^0 A v^* ≽ x^1 A v^* and x^1 A v^* ≻ x^1 A v_*. This yields x^0 A v^* ≻ x^1 A v_*, a contradiction. Assuming that there is an x ∈ X such that x A v_* ≽ x^1 A v^*, as above we can find x^2 ∈ X such that

x^2 A v_* ∼ x^1 A v^*.   (10.B.2)
Again, x^2 ≻ x^1. We can use the representation V to check that the equivalences in (10.B.1) and (10.B.2) translate to

u(x^1) − u(x^0) = [(1 − ρ(A))/ρ(A)] (u(v^*) − u(v_*)) = u(x^2) − u(x^1),   (10.B.3)
that is, the three points x^0, x^1, x^2 are equidistant in u. Proceeding in this fashion we can construct a sequence of points {x^0, x^1, x^2, . . .} all evenly spaced in utility. Such a sequence we call an increasing standard sequence with base x^0, carrier A, and mesh (v_*, v^*). (Notice that the distance in utility between the points in the sequence is proportional to the distance in utility between v_* and v^*, which is used as the "measuring rod.") Analogously, we can construct a decreasing standard sequence with base x^0, carrier A and mesh (v_*, v^*) where v_* ≽ x^0. This will be a sequence starting again from x^0, but now moving in the direction of decreasing utility: For every n ≥ 0, v^* A x^{n+1} ∼ v_* A x^n. Henceforth, we call a standard sequence w.r.t. (x^0, A) any sequence {x̄^0, x̄^1, x̄^2, . . .} such that x̄^0 = x^0, and there is a pair of points (above or below x^0) which provides the mesh for obtaining {x̄^0, x̄^1, x̄^2, . . .} as a decreasing/increasing standard sequence with carrier A.

It is simple to see how, having fixed an essential event A and a base x^0 which is non-extremal in the ordering on X (i.e. there are y, z ∈ X such that y ≻ x^0 ≻ z), standard sequences can be used to measure the canonical utility index u of a biseparable preference (extending the scope of the method proposed by Wakker and Deneffe, 1996): One just needs to construct (increasing and decreasing) standard sequences with base x^0 and finer and finer mesh. In what follows we use standard sequences and cardinal symmetry to show that equality of the u_i, i = 1, 2, can be verified without eliciting them.

10.B.2. Equality of utilities: Proof of Proposition 10.1

The proof of Proposition 10.1 builds on two lemmas.
The first lemma, whose simple proof we omit, shows the following: If a pair of cardinally symmetric biseparable preferences are given, then for fixed non-extremal x^0 and essential events A_1 and A_2, the sets of the standard sequences (with respect to (x^0, A_1) and (x^0, A_2) respectively) of the two orderings are "nested" into each other. Stating this lemma requires some terminology and notation: Given a standard sequence {x^n} for preference relation ≽_i, we say that a sequence {y^m} ⊆ X is a refinement of {x^n} if it is itself a standard sequence, and it is such that y^m = x^n whenever m = kn for some k ∈ N. Two canonical utility indices are subject to a common normalization if they take identical values on two consequences x, y ∈ X such that x ≻_i y for both i. Finally, for the rest of this section: For each i = 1, 2, the carrier of any standard sequence for ≽_i is a fixed essential event A_i, and SQ(≽_i, x^0) ⊆ X denotes the set of the points belonging to some standard sequence of ≽_i with base x^0 and carrier A_i.
Lemma 10.B.1. Suppose that ≽1, ≽2 are as assumed in Proposition 10.1. Fix a non-extremal x^0 ∈ X. If ≽1 and ≽2 are cardinally symmetric, then the following holds: Either every standard sequence for ordering ≽1 is a refinement of a standard sequence for ≽2, or every standard sequence for ordering ≽2 is a refinement of a standard sequence for ≽1. Hence, SQ(≽1, x^0) = SQ(≽2, x^0) ≡ SQ(x^0).

The second lemma shows that, because of cardinal symmetry, the result holds on SQ(x^0):

Lemma 10.B.2. Suppose that ≽1, ≽2 are as assumed in Proposition 10.1. If ≽1 and ≽2 are cardinally symmetric, then for any non-extremal x^0 ∈ X and any common normalization of the two indices, u1(x) = u2(x) for every x ∈ SQ(x^0).

Proof. Fix a non-extremal x^0. Suppose that x belongs to an increasing standard sequence for ≽_i, {x^n}. Since the relations are cardinally symmetric, by Lemma 10.B.1 it is w.l.o.g. (taking refinements if necessary) to take the sequence to be standard for both orderings. That is, there are v_*, v^*, w_*, w^* ∈ X such that v^* ≻1 v_*, w^* ≻2 w_* and, for n ≥ 0, x^{n+1} A_1 v_* ∼1 x^n A_1 v^*, and analogously for ≽2 (with w replacing v). Moreover, there is n ≥ 0 such that x = x^n. Choose x^m for some m > n, and take positive affine transformations of the two canonical utility functions so as to obtain u1(x^0) = u2(x^0) = 0 and u1(x^m) = u2(x^m) = 1. All points in the sequence are evenly spaced for both preferences (cf. Equation (10.B.3)). Hence we have u1(x^n) = u2(x^n) = n/m. The case in which x belongs to a decreasing standard sequence is treated symmetrically. Finally, we have the immediate observation that if u1(x) = u2(x) for one common normalization, the equality holds for every common normalization.

Proof of Proposition 10.1. The "if" part follows immediately from the canonical representation. We now prove the "only if." Start by fixing a non-extremal x^0 and adding a constant to both indices, so that u1(x^0) = u2(x^0) = 0.
Suppose that (after this transformation) there is x ∈ X such that u1(x) ≠ u2(x). By relabelling if necessary, assume that u1(x) = α > β = u2(x). There are different cases to consider, depending on where α and β are located. Suppose first that β ≥ 0. Choose v^* ∈ X such that x^0 ≻1 v^* and further transform the utilities so that ū1(v^*) = ū2(v^*) = −1, to obtain ū1(x) = ᾱ > β̄ = ū2(x). Choose ε > 0 such that ᾱ − β̄ > ε. By the connectedness of the range of each u_i and Lemma 10.B.1, there are v_*, w_* ∈ X such that (v_*, v^*) and (w_*, v^*) generate the same standard sequence {x^n} and ū1(x^{n+1}) − ū1(x^n) = ū2(x^{n+1}) − ū2(x^n) < ε. So the "length" of the utility interval between each element in the increasing standard sequence is smaller than the distance between ᾱ and β̄. We also proved
in Lemma 10.B.2 that for each element in the standard sequence, we have equality of the utilities (since we imposed a common normalization). Hence there must be n ≥ 0 such that ū1(x^n) = ū2(x^n) = γ ∈ (β̄, ᾱ). We then have

ū1(x^n) < ū1(x), i.e., x^n ≺1 x,   and   ū2(x^n) > ū2(x), i.e., x^n ≻2 x,

which contradicts the assumption of ordinal equivalence. The case in which α ≤ 0 is treated symmetrically. If, finally, α > 0 > β then, using an argument similar to the one just presented, one can find x̄ ∈ X such that u1(x̄) = u2(x̄) ∈ (0, α) and obtain a similar contradiction. This shows that u1(x) = u2(x) for every x ∈ X.
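The even utility spacing in (10.B.3) that drives these proofs is easy to reproduce numerically: each indifference x^{n+1} A v_* ∼ x^n A v^* adds the fixed utility increment ((1 − ρ(A))/ρ(A))(u(v^*) − u(v_*)). A sketch with illustrative primitives (u(x) = √x, ρ(A) = 0.4 and the mesh are our own assumptions, not values from the text):

```python
import math

def standard_sequence(u, u_inv, rho_A, x0, v_low, v_high, n):
    """Increasing standard sequence via (10.B.1)-(10.B.2): each step adds
    the same utility increment, cf. Equation (10.B.3)."""
    step = (1 - rho_A) / rho_A * (u(v_high) - u(v_low))
    seq = [x0]
    for _ in range(n):
        seq.append(u_inv(u(seq[-1]) + step))
    return seq

# illustrative canonical utility and its inverse
u, u_inv = math.sqrt, lambda t: t * t

seq = standard_sequence(u, u_inv, 0.4, x0=1.0, v_low=0.0, v_high=0.25, n=4)
gaps = [u(b) - u(a) for a, b in zip(seq, seq[1:])]
# every utility gap equals (1 - 0.4)/0.4 * 0.5 = 0.75: evenly spaced in u,
# even though the consequences themselves are not evenly spaced
```

Refining the mesh (shrinking u(v_high) − u(v_low)) yields the finer and finer grids used above to measure the canonical utility index.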
Appendix C: Proofs for Sections 10.4-10.6

10.C.1. Section 10.4

Proof of Theorem 10.1. We first state without proof an immediate result:

Lemma 10.C.1. Two preference relations ≽1 and ≽2 satisfying Equations (10.7) and (10.8) are ordinally equivalent.

Given this lemma, if ≽1 and ≽2 have essential events the result follows immediately from Proposition 10.1. If, say, relation ≽_i does not have essential events, any ordinal transformation of u_i is still a canonical utility. Since the two preferences are ordinally equivalent by the lemma, it is then w.l.o.g. to use u_j (j ≠ i) to represent both of them.

Proof of Theorem 10.2. We first prove that D(≽) ⊆ M(≽). Given a canonical representation V of ≽ with canonical utility u, suppose that P ∈ D(≽), and consider the relation ≽' induced by P and u. We want to show that ≽ is more ambiguity averse than ≽'. Since P ∈ D(≽), ∫ u(f) dP ≥ V(f) for all f ∈ F, so that for every x ∈ X and f ∈ F,

u(x) ≥ ∫_S u(f(s)) P(ds) ⟹ V(x) ≥ V(f),

where the implication follows from the definition u(x) = V(x) for all x ∈ X. This proves that (10.7) holds. Similarly one shows the validity of (10.8). Part (B) of Definition 10.5 is immediate: If ≽ and ≽' have essential events, then the result follows from Proposition 10.1. Hence ≽' ∈ R(≽), or in other words P ∈ M(≽).

We now prove the opposite inclusion D(≽) ⊇ M(≽). Suppose that P ∈ M(≽). Let ≽' be the benchmark preference corresponding to P, and let u' be the canonical utility index of ≽'. Since ≽' is a benchmark for ≽, we have for every x ∈ X and f ∈ F,

u'(x) ≥ ∫_S u'(f(s)) P(ds) ⟹ u(x) ≥ V(f),   (10.C.1)
and the same with strict inequality. We have to show that P ∈ D(≽). By Theorem 10.1, it is w.l.o.g. to take u = u'. Hence, (10.C.1) implies that ∫ u(f) dP ≥ V(f) for all f ∈ F, and so P ∈ D(≽).

Proof of Corollary 10.1. By Theorem 10.2, M(≽) = D(≽). Let P ∈ D(≽). For every A ∈ Σ and x^* ≻ x_*, consider the act f = x^* A x_*. Normalizing u(x^*) = 1 and u(x_*) = 0, we have

P(A) = ∫_S u(f(s)) P(ds) ≥ ∫_S u(f(s)) ν(ds) = ν(A),

and so P ∈ C(ν). This implies D(≽) ⊆ C(ν). The converse inclusion is trivial, since P ∈ C(ν) implies ∫ u(f) dP ≥ ∫ u(f) dν for all f ∈ F.

Proof of Corollary 10.2. We are done if we show that for all f, g ∈ F,

f ≽ g ⟺ min_{P ∈ D(≽)} ∫_S u(f(s)) P(ds) ≥ min_{P ∈ D(≽)} ∫_S u(g(s)) P(ds).
(10.C.2)

This follows from the fact that there exists a unique weak∗-compact and convex set C representing ≽. D(≽) is clearly weak∗-compact (so that the minimum in (10.C.2) is well defined) and convex. Hence, if (10.C.2) holds, C = D(≽), and by Theorem 10.2, D(≽) = M(≽). To prove (10.C.2), suppose there are f, g ∈ F such that

min_{P ∈ C} ∫ u(f) dP ≥ min_{P ∈ C} ∫ u(g) dP   and   min_{P ∈ D(≽)} ∫ u(f) dP < min_{P ∈ D(≽)} ∫ u(g) dP.

Let P^* ∈ arg min{∫_S u(f(s)) P(ds) : P ∈ D(≽)}. Since C ⊆ D(≽), we have:

min_{P ∈ C} ∫_S u(f(s)) P(ds) ≤ ∫_S u(f(s)) P^*(ds) < min_{P ∈ D(≽)} ∫_S u(g(s)) P(ds) ≤ min_{P ∈ C} ∫_S u(g(s)) P(ds),
a contradiction. Similarly, one shows that there cannot be f, g ∈ F such that the preference based on D(≽) weakly prefers f to g, while g ≻ f. This shows that Equation (10.C.2) holds, concluding the proof.

Proof of Proposition 10.3. That every SEU preference is ambiguity neutral follows immediately from two applications of Theorem 10.1. As for the converse: If ≽ is both ambiguity averse and ambiguity loving, there are a SEU preference
relation ≽1 (represented by probability P1) such that ≽ is more ambiguity averse than ≽1, and a SEU preference relation ≽2 (represented by probability P2) which is more ambiguity averse than ≽. Applying Definition 10.5 twice, we obtain that for every f ∈ F and x ∈ X,

x ≽1 f ⟹ x ≽2 f   and   x ≻1 f ⟹ x ≻2 f.

We show that ≽1 and ≽2 are cardinally symmetric. This requires first showing that if ≽2 has an essential event, so must ≽. Suppose that A ∈ Σ is essential for ≽2, so that for some x ≻ y (remember that ≽ and ≽1 and ≽2 are all ordinally equivalent), x ≻2 x A y ≻2 y. Using the contrapositive of (10.7), we then have x A y ≻ y. Since ≽2 is a SEU preference, A^c is also ≽2-essential, similarly implying x A^c y ≻ y. Now, suppose that ≽ has no essential event. Because of the preferences we just derived, we must have both x ∼ x A y and x ∼ x A^c y. This is impossible since ≽1 ∈ R(≽), for the contrapositive of (10.8) then yields x A y ≽1 x, which implies P1(A) = 1, and x A^c y ≽1 x, which implies P1(A) = 0. This gives us a contradiction, so that ≽ must have an essential event if ≽2 does. Hence, ≽2 and ≽ have essential events, and they are cardinally symmetric by assumption. Similarly one shows that ≽1 and ≽ have essential events and are cardinally symmetric. It is now immediate to check that these facts imply that ≽1 and ≽2 are cardinally symmetric. We thus conclude that ≽2 is more ambiguity averse than ≽1. Mimicking the last part of the proof of Theorem 10.2, we then show that P1 ≥ P2, which immediately implies P1 = P2, so that ≽1 = ≽2 ≡ ≽'. Thus ≽ is both more and less ambiguity averse than ≽', which immediately implies ≽ = ≽'.

Proof of Theorem 10.3. Part (i) follows immediately along the lines of the proofs of Theorem 10.2 and Corollary 10.1. As for part (ii), it is similarly immediate to show that if ≽2 is more ambiguity averse than ≽1, then C1 ⊆ D(≽2) and u1 ≈ u2. We show the converse. Let V1 and V2 denote the canonical representations of ≽1 and ≽2, and w.l.o.g. assume that u1 = u2 = u. Then C1 ⊆ D(≽2) implies that for every f ∈ F and every P ∈ C1, V2(f) ≤ ∫ u(f) dP. Hence, using the fact that ≽1 is MEU, we find

V2(f) ≤ min_{P ∈ C1} ∫_S u(f(s)) P(ds) = V1(f),
which immediately yields the desired result.

10.C.2. Section 10.5

Proof of Proposition 10.5. Let ≽′ ∈ R(≽) and set Λ ≡ {A ∈ Σ : ρ(A) + ρ(Ac) = 1}. If A ∈ 𝒜, for all x ∈ X we have

u(x) = P(A) ⇐⇒ u(x) = ρ(A)
u(x) = P(Ac) ⇐⇒ u(x) = ρ(Ac),

and so ρ(A) = P(A) and ρ(Ac) = P(Ac). This implies that A ∈ Λ, so that 𝒜 ⊆ Λ. Now, if A ∈ Λ we have

ρ(A) = P(A) and ρ(Ac) = P(Ac).   (10.C.3)
In order to show that A ∈ 𝒜, we need to show that any act measurable w.r.t. the partition {A, Ac} is in H. This follows from (10.C.3), as for every x, y ∈ X we have V(x A y) = V′(x A y). Thus Λ ⊆ 𝒜, which concludes the proof.

10.C.3. Section 10.6

Proof of Proposition 10.6. Suppose, to the contrary, that ν agrees with (10.17). If Eq. (10.15) holds then P(R) = ν(R) and P(B, Y) = ν(B, Y) for all P ∈ C(ν), so that we have P(B, Y) = ν(B, Y) < ν(B, R) ≤ P(B, R). In turn, this implies P(Y) < P(R), yielding ν(Y) ≤ P(Y) < P(R) = ν(R). Hence ν(Y) < ν(R), contradicting (10.17).

Proof of Proposition 10.7. Every ν which satisfies (10.18) is such that C(ν) ≠ Ø. For, the measure P such that P(R) = P(B) = P(Y) = 1/3 belongs to C(ν). This proves that all preferences satisfying (10.18) are ambiguity averse. As to the converse, let ≽ be ambiguity averse, that is, C(ν) ≠ Ø. Let P ∈ C(ν). Assume first that ν(B) = ν(Y) > ν(R). Since P(B) ≥ ν(B) and P(Y) ≥ ν(Y), P(B) + P(R) + P(Y) ≥ ν(B) + ν(R) + ν(Y) > 1, a contradiction. Assume now ν(B, Y) < ν(B, R) = ν(R, Y). This implies P(B, Y) < P(B, R) and P(B, Y) < P(R, Y), so that P(Y) < P(R), P(B) < P(R), and P(B) + P(R) + P(Y) < 1, a contradiction.
Acknowledgments

An earlier version of this chapter was circulated with the title "Ambiguity Made Precise: A Comparative Foundation and Some Implications." We thank Kim Border, Eddie Dekel, Itzhak Gilboa, Tony Kwasnica, Antonio Rangel, David Schmeidler, audiences at Caltech, Johns Hopkins, Northwestern, NYU, Rochester, UC-Irvine, Université Paris I, the TARK VII-Summer Micro Conference (Northwestern, July 1998), the 1999 RUD Workshop, and especially Simon Grant, Peter Klibanoff, Biung-Ghi Ju, Peter Wakker, and an anonymous referee for helpful comments and discussion. Our greatest debt of gratitude is, however, to Larry Epstein, who sparked our interest in this subject with his paper (Epstein (1999)) and stimulated it with many discussions. Marinacci gratefully acknowledges the financial support of MURST.
Paolo Ghirardato and Massimo Marinacci
Notes

1 Other widespread names are "uncertainty aversion" and "aversion to Knightian uncertainty." We like to use "uncertainty" in its common meaning of any situation in which the consequences of the DM's possible actions are not known at the time of choice.
2 A bet "on" an event is any binary act in which a better payoff ("win") is received when the event obtains.
3 There are earlier papers that use a comparative approach for studying ambiguity attitude, but they do not use it as a basis for defining absolute notions. For example, Tversky and Wakker (1995).
4 See Appendix A for the definition of capacities, Choquet integrals, and some of their properties.
5 We use the symbols ≽ (and ≻) to denote SEU weak (and strict) preferences.
6 Such a set is well-defined since it is trivially true that the union of any collection of sets satisfying (A) and (B) below also satisfies the two conditions.
7 See Zhang (1996) for a compelling urn example in which this happens.
8 For any pair of biseparable preferences which have essential events, this elicitation can be done without extraneous devices by using the tradeoff method briefly outlined in Appendix B.
9 We thank Peter Klibanoff for his substantial help in developing the ensuing discussion.
10 The richness condition is: For every F ⊆ E in 𝒜 and A ∈ 𝒜 such that A is as likely as E, there is B ⊆ A in 𝒜 such that B is as likely as F. Epstein remarks that richness of 𝒜 is not required for some of his results.
References

F. J. Anscombe and R. J. Aumann (1963), A definition of subjective probability, Ann. Math. Stat. 34, 199–205.
K. J. Arrow (1974), The theory of risk aversion, in "Essays in the Theory of Risk-Bearing," Chap. 3, North-Holland, Amsterdam.
R. Casadesus-Masanell, P. Klibanoff, and E. Ozdenoren (2000), Maxmin expected utility over Savage acts with a set of priors, J. Econ. Theory 92, 33–65.
A. Chateauneuf and J. M. Tallon (1998), Diversification, convex preferences and non-empty core, mimeo, Université Paris I, July.
G. Choquet (1953), Theory of capacities, Ann. Inst. Fourier (Grenoble) 5, 131–295.
B. de Finetti (1952), Sulla preferibilità, Giorn. Econ. 6, 3–27.
D. Ellsberg (1961), Risk, ambiguity, and the Savage axioms, Quart. J. Econ. 75, 643–669.
L. G. Epstein (1999), A definition of uncertainty aversion, Rev. Econ. Stud. 66, 579–608. (Reprinted as Chapter 9 in this volume.)
L. G. Epstein and T. Wang (1994), Intertemporal asset pricing under Knightian uncertainty, Econometrica 62, 283–322. (Reprinted as Chapter 18 in this volume.)
L. G. Epstein and J. Zhang (2001), Subjective probabilities on subjectively unambiguous events, Econometrica 69, 265–306.
P. C. Fishburn (1993), The axioms and algebra of ambiguity, Theory Dec. 34, 119–137.
P. Ghirardato and J. N. Katz (2000a), "Indecision Theory: Explaining Selective Abstention in Multiple Elections," Social Science Working Paper 1106, Caltech, November.
P. Ghirardato and M. Marinacci (2000b), "Risk, Ambiguity, and the Separation of Utility and Beliefs," Social Science Working Paper 1085, Caltech, March (Revised: January 2001).
P. Ghirardato and M. Marinacci (2000), A subjective definition of ambiguity, Work in progress, Caltech and Università di Torino. I. Gilboa and D. Schmeidler (1989), Maxmin expected utility with a non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.) L. P. Hansen, T. Sargent, and T. D. Tallarini (1999), Robust permanent income and pricing, Rev. Econ. Stud. 66, 873–907. Y. Kannai (1992), The core and balancedness, in “Handbook of Game Theory” (R. J. Aumann and S. Hart, eds), pp. 355–395, North-Holland, Amsterdam. D. Kelsey and S. Nandeibam (1996), On the measurement of uncertainty aversion, mimeo, University of Birmingham, September. D. H. Krantz, R. D. Luce, P. Suppes, and A. Tversky (1971), “Foundations of Measurement: Additive and Polynomial Representations,” Vol. 1, Academic Press, San Diego. M. J. Machina and D. Schmeidler (1992), A more robust definition of subjective probability, Econometrica 60, 745–780. A. Montesano and F. Giovannoni (1996), Uncertainty aversion and aversion to increasing uncertainty, Theory Dec. 41, 133–148. S. Mukerji (1998), Ambiguity aversion and incompleteness of contractual form, Amer. Econ. Rev. 88, 1207–1231. (Reprinted as Chapter 14 in this volume.) K. Nehring (1999), Capacities and probabilistic beliefs: A precarious coexistence, Math. Soc. Sci. 38, 197–213. J. W. Pratt (1964), Risk aversion in the small and in the large, Econometrica 32, 122–136. L. J. Savage (1954), “The Foundations of Statistics,” Wiley, New York. D. Schmeidler (1989), Subjective probability and expected utility without additivity, Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.) A. Tversky and P. P. Wakker (1995), Risk attitudes and decision weights, Econometrica 63, 1255–1280. P. P. Wakker (1989), “Additive Representations of Preferences,” Kluwer, Dordrecht. P. P. Wakker and D. Deneffe (1996), Eliciting von Neumann–Morgenstern utilities when probabilities are distorted or unknown, Manage. Sci. 42, 1131–1150. 
M. E. Yaari (1969), Some remarks on measures of risk aversion and on their uses, J. Econ. Theory 1, 315–329. J. Zhang (1996), Subjective ambiguity, probability and capacity, mimeo, University of Toronto, October.
11 Stochastically independent randomization and uncertainty aversion

Peter Klibanoff
11.1. Introduction An example seminal to interest in uncertainty (or ambiguity) aversion is Ellsberg’s (1961) “two-color” problem. There is a “known urn” which contains 50 red balls and 50 black balls, and an “unknown urn” which contains a mix of red and black balls, totaling 100, about which no information is given. Ellsberg observed (as did many afterwards, more carefully) that a substantial fraction of individuals were indifferent between the colors in both urns, but preferred to bet on either color in the “known urn” rather than the corresponding color in the “unknown urn.” This violates not only expected utility (EU), but probabilistically sophisticated behavior more generally. One contemporary criticism of the displayed behavior was put forward by Raiffa (1961) who pointed out that flipping a coin to decide which color to bet on in the unknown urn should be viewed as equivalent to betting on the “known” 50–50 urn. One can think of such preferences as displaying a preference for randomization. Jumping ahead to more recent work, there is a burgeoning literature attempting to model uncertainty (or ambiguity) aversion in decision makers using representations with nonadditive probabilities or sets of probabilities. Some of this work (e.g. Lo, 1996; Klibanoff, 1994) accepts this preference for mixture or randomization as a facet of uncertainty aversion, while other work (e.g. Dow and Werlang, 1994; Eichberger and Kelsey, 2000) does not. This has led to several papers, most directly Eichberger and Kelsey (1996), but also Ghirardato (1997) and Sarin and Wakker (1992), related to this difference. In particular, all three papers observe that the choice of a “one-stage” or Savage model as opposed to a “two-stage” or Anscombe–Aumann model can lead to different preferences when modeling uncertainty aversion. 
In Eichberger and Kelsey (1996) the authors set out to “show that while individuals with nonadditive beliefs may display a strict preference for randomization in an Anscombe–Aumann framework they will not do so in a Savage-style decision theory.”1
Klibanoff, P. Stochastically independent randomization and uncertainty aversion. Economic Theory 18, 605–620.
This chapter was motivated in part by the intuition that the one-stage/two-stage modeling distinction is largely a red herring, at least as it relates to preference for randomization. In particular, while appreciating that there can be differences between the frameworks, one goal of this chapter is to relate these differences to violations of stochastic independence and to point out that they have essentially no role to play in the debate over preference for randomization in uncertainty aversion. In making this point, the related finding of the restrictiveness of Choquet expected utility (CEU) preferences in allowing for randomizing devices is key. An additional contribution of the chapter is to provide preference-based conditions to describe a stochastically independent randomizing device in a non-Bayesian environment. Section 11.2 sets out some preliminaries and notation. Section 11.3 describes two frameworks in which a randomizing device can be modeled. Section 11.4 provides the key preference conditions and contains the main results on the restrictiveness of CEU when stochastic independence is required and the relative flexibility of Maxmin expected utility (MMEU) with multiple priors. Section 11.5 concludes.
11.2. Preliminaries and notation

We will consider two representations of preferences, each of which generalizes EU and allows for uncertainty aversion. The first model is CEU. CEU was axiomatized first in an Anscombe–Aumann framework by Schmeidler (1989), and then in a Savage framework by Gilboa (1987) and Sarin and Wakker (1992). In a Savage framework, but assuming a rich set of consequences and a finite state space, Wakker (1989), Nakamura (1990), and Chew and Karni (1994) have axiomatized CEU. The second model is MMEU with non-unique prior. MMEU was first axiomatized in an Anscombe–Aumann framework by Gilboa and Schmeidler (1989). In a Savage framework, but assuming a rich set of consequences and allowing a finite or infinite state space, MMEU has been axiomatized by Casadesus-Masanell et al. (2000b). Consider a finite set of states of the world S. Let X be a set of consequences. An act f is a function from S to X. Denote the set of acts by F. A function v : 2^S → [0, 1] is a capacity or nonadditive probability if it satisfies (i) v(∅) = 0, (ii) v(S) = 1, and (iii) A ⊆ B implies v(A) ≤ v(B). It is convex if, in addition, (iv) for all A, B ⊆ S, v(A) + v(B) ≤ v(A ∪ B) + v(A ∩ B). Now define the (finite) Choquet integral of a real-valued function a to be:

∫ a dv = α1 v(E1) + Σ_{i=2}^{n} αi [v(∪_{j=1}^{i} Ej) − v(∪_{j=1}^{i−1} Ej)],

where αi is the ith largest value that a takes on, and Ei = a^{−1}(αi).
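The finite Choquet integral above can be computed by sorting states from best to worst outcome and weighting each outcome by the marginal increment of the capacity. A minimal Python sketch (the function name and the two-state capacity are illustrative, not from the chapter):

```python
def choquet(a, v):
    """Choquet integral of a real-valued function a (dict: state -> value)
    w.r.t. a capacity v (dict: frozenset of states -> number in [0, 1])."""
    order = sorted(a, key=a.get, reverse=True)  # states, best outcome first
    total, prev, upper = 0.0, 0.0, set()
    for s in order:
        upper.add(s)                   # upper is the event "a at least a(s)"
        w = v[frozenset(upper)]
        total += a[s] * (w - prev)     # marginal capacity weight of s
        prev = w
    return total

# Illustrative convex capacity on a two-state space (values are mine):
v = {frozenset(): 0.0, frozenset({'s1'}): 0.3,
     frozenset({'s2'}): 0.3, frozenset({'s1', 's2'}): 1.0}
print(choquet({'s1': 100.0, 's2': 0.0}, v))  # 100 * v({s1}) = 30.0
```

With ties in the outcomes, any tie-breaking order gives the same value, which is why the sort alone suffices.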
Let ≽ be a binary relation on acts, F, that represents (weak) preferences. A decision maker is said to have CEU preferences if there exists a utility function u : X → ℝ and a nonadditive probability v : 2^S → [0, 1] such that, for all f, g ∈ F, f ≽ g if and only if ∫ u ◦ f dv ≥ ∫ u ◦ g dv. CEU preferences are said to display uncertainty aversion if v is convex.2 A decision maker is said to have MMEU preferences if there exists a utility function u : X → ℝ and a non-empty, closed and convex set B of additive probability measures on S such that, for all f, g ∈ F, f ≽ g if and only if min_{p∈B} ∫ u ◦ f dp ≥ min_{p∈B} ∫ u ◦ g dp. All MMEU preferences display uncertainty aversion.3 Finally, note that the set of MMEU preferences strictly contains the set of CEU preferences with convex capacities.
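The MMEU rule can likewise be sketched directly: an act is evaluated by its worst-case expected utility over the set of priors. A minimal Python sketch with linear utility (the two-state space and the interval of beliefs are illustrative); a finite list of priors stands in for a closed convex set, since the minimum is attained at extreme points:

```python
from fractions import Fraction

def expected_value(act, p):
    # Expected utility of an act (dict: state -> utility) under prior p.
    return sum(act[s] * p[s] for s in act)

def mmeu(act, priors):
    # MMEU value: worst-case expected utility over the set of priors.
    return min(expected_value(act, p) for p in priors)

# Illustrative set of priors: p(s1) ranges over 3/10, ..., 7/10.
priors = [{'s1': Fraction(x, 10), 's2': 1 - Fraction(x, 10)}
          for x in range(3, 8)]
bet = {'s1': 100, 's2': 0}
print(mmeu(bet, priors))  # worst case puts only 3/10 on s1 -> 30
```

A constant act is unaffected by the minimum, so MMEU agrees with EU on constants, while bets are discounted toward their worst-case prior.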
11.3. Modeling a randomizing device

Corresponding to the two standard frameworks for modeling uncertainty (Anscombe–Aumann and Savage) there are at least two alternative ways to model a randomizing device. In an Anscombe–Aumann setting, a randomizing device is incorporated in the structure of the consequence space. Specifically the "consequences" X are often taken to be the set of all simple probability distributions over some more primitive set of outcomes, Z. In this setup, a randomization over two acts f and g with probabilities p and 1 − p respectively is modeled by an act h where h(s)(z) = pf(s)(z) + (1 − p)g(s)(z), for all s ∈ S, z ∈ Z. Observe that h is, indeed, a well-defined act because the set of simple probability distributions is closed under mixture. Returning to the "unknown urn" of the introduction, Table 11.1 shows the three acts (a) "bet on red," (b) "bet on black," and (c) "randomize 50–50 over betting on red or on black" as modeled in this setting. Alternatively, consider a Savage-style setting with a finite state space (e.g. Wakker (1984), Nakamura (1990), or Gul (1992)). Here a convex combination of two elements of the consequence space X need not be an element of X (and need not even be defined). Therefore, to model a randomization, we may instead expand the original state space, S, by forming the cross product of S with the possible outcomes (or "states") of the randomizing device. For example, Table 11.2 shows the acts (a) "bet on red," (b) "bet on black," (c) "bet on red if heads, black if tails," and (d) "bet on black if heads, red if tails" in the case of the unknown urn with a coin used to randomize.

Table 11.1 Unknown urn with randomization in the consequence space (Anscombe–Aumann)

        R(ed)              B(lack)
(a)     $100               $0
(b)     $0                 $100
(c)     1/2 $100 ⊕ 1/2 $0  1/2 $100 ⊕ 1/2 $0
Table 11.2 Unknown urn with randomization in the state space only (Savage)

        R(ed), H(eads)   B(lack), H(eads)   R(ed), T(ails)   B(lack), T(ails)
(a)     $100             $0                 $100             $0
(b)     $0               $100               $0               $100
(c)     $100             $0                 $0               $100
(d)     $0               $100               $100             $0
In comparing the two models, observe that the Anscombe–Aumann setting builds in several key properties that a randomizing device should satisfy while the Savage setting does not. In particular, the probabilities attached to the outcomes of the randomizing device should be unambiguous and the device should be stochastically independent from the (rest of the) state space. Arguably these two properties capture the essence of what is meant by a randomizing device. Both properties are automatically satisfied in an Anscombe–Aumann setting. In a Savage setting, as we will see later, these properties require additional restrictions on preferences.4 Several recent papers (including Eichberger and Kelsey, 1996; Ghirardato, 1997; and Sarin and Wakker, 1992) have noted that CEU need not give identical results in the two frameworks. Specifically, they suggest that the choice of a one-stage (Savage) or two-stage (Anscombe–Aumann) model can lead to different behavior. To see this in the unknown urn example, consider the case where the decision maker's marginal capacity over the colors is v(R) = v(B) = 1/3. In the Anscombe–Aumann setting this is enough to pin down preferences as c ≻ a ∼ b (i.e. the Raiffa preferences or preference for randomization). In the Savage setting, consider the capacity given by

v(R × {H, T}) = v(B × {H, T}) = 1/3,
v({R, B} × H) = v({R, B} × T) = 1/2,
v(R × H) = v(R × T) = v(B × H) = v(B × T) = 1/6,
v((R × H) ∪ (B × T)) = v((R × T) ∪ (B × H)) = 1/3,
v(any 3 states) = 2/3.

This capacity yields the preferences a ∼ b ∼ c ∼ d, and thus does not provide a preference for randomization as in the Anscombe–Aumann setting. Why does this occur despite the fact that the marginals are identical in the two cases and the product capacity is equal to the product of the marginals on all rectangles? Mathematically, as Ghirardato (1997) explains, the source is a failure of the usual Fubini Theorem to hold for Choquet integrals. Intuitively, however, it is not clear what is going "wrong" in the example.
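These Choquet computations can be checked mechanically. A sketch in Python (my encoding of the four states; exact fractions avoid rounding) evaluating the four acts of Table 11.2 under the capacity above:

```python
from fractions import Fraction as F

STATES = ('RH', 'BH', 'RT', 'BT')   # color x coin, e.g. 'RH' = (Red, Heads)

def v(event):
    """The example capacity above, in my encoding of the states."""
    A = frozenset(event)
    if len(A) == 0: return F(0)
    if len(A) == 1: return F(1, 6)
    if len(A) == 3: return F(2, 3)
    if len(A) == 4: return F(1)
    if A in ({'RH', 'BH'}, {'RT', 'BT'}): return F(1, 2)  # coin events
    return F(1, 3)   # color events and the two "diagonal" events

def choquet(act):
    order = sorted(STATES, key=lambda s: act[s], reverse=True)
    total, prev, upper = F(0), F(0), set()
    for s in order:
        upper.add(s)
        total += act[s] * (v(upper) - prev)
        prev = v(upper)
    return total

acts = {'a': {'RH': 100, 'BH': 0, 'RT': 100, 'BT': 0},   # bet on red
        'b': {'RH': 0, 'BH': 100, 'RT': 0, 'BT': 100},   # bet on black
        'c': {'RH': 100, 'BH': 0, 'RT': 0, 'BT': 100},   # red if H, black if T
        'd': {'RH': 0, 'BH': 100, 'RT': 100, 'BT': 0}}   # black if H, red if T
print({name: choquet(act) for name, act in acts.items()})
# all four Choquet values equal 100/3, so a ~ b ~ c ~ d
```

Under the Anscombe–Aumann modeling, by contrast, act (c) is the constant lottery 1/2 $100 ⊕ 1/2 $0, worth 50 > 100/3 with the same normalization, reproducing c ≻ a ∼ b.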
Table 11.3 Non-product weights for randomized act

            R, H    B, H    R, T    B, T
c           $100    $0      $0      $100
weights     1/6     1/3     1/3     1/6
To gain some insight, it is useful to examine the weights applied to each state when evaluating the randomized acts using the Choquet integral. For example, as Table 11.3 shows, “Bet on Red if Heads, Black if Tails” is evaluated using nonproduct weights. The fact that such non-product weights can be applied suggests that the CEU preferences with the capacity above reflect ambiguity not only about the color of the ball drawn from the urn but also about the correlation between the randomizing device and the color of the ball. This can also be seen by noting that v({R, B} × H ) > v((R × H ) ∪ (B × T )), in contrast to the equality one might expect if H and T are really produced by a symmetric, independent randomization. While such ambiguity is certainly possible, it runs directly counter to the stochastic independence we would expect of a randomizing device. In the next section, therefore, I propose conditions on preferences that ensure this independence.
11.4. Stochastically independent randomization and preferences

Here I propose conditions on preferences that are designed to reflect two properties of a randomizing device: unambiguous probabilities and stochastic independence. These two properties are essential to what is meant by a randomizing device. Formally, consider preferences, ≽, over acts, F : S → X, on a finite product state space, S = S1 × S2 × · · · × SN. Let S−i denote the product of all ordinates other than i. Denote by FSi the subset of acts for which outcomes are determined entirely by the ith ordinate. This means that f ∈ FSi implies f(si, s−i) = f(si, ŝ−i) for all s−i, ŝ−i ∈ S−i and si ∈ Si. For f, g ∈ F and A ⊆ S, denote by fA g the act which equals f(s) for s ∈ A and equals g(s) for s ∉ A. We now state some useful definitions concerning preferences.

Definition 11.1. ≽ satisfies solvability on Si if, for f ∈ FSi, x, y, z ∈ X and Ai ⊆ Si, xAi×S−i z ≽ f ≽ yAi×S−i z implies f ∼ wAi×S−i z for some w ∈ X.

Solvability should be seen as a joint richness condition on ≽ and X. It is satisfied in all axiomatizations of EU, CEU, or MMEU over Savage acts on a finite state space of which I am aware. For example, Nakamura (1991) imposes solvability directly, while Wakker (1984, 1989), Gul (1992) and Casadesus-Masanell et al. (2000a,b) ensure it is satisfied through topological assumptions on X and continuity assumptions on ≽.
Definition 11.2. ≽ satisfies expected utility (EU) on Si if ≽ restricted to FSi can be represented by expected utility where the utility function is unique up to a positive affine transformation and the probability measure on the set of all subsets of Si is unique.

While the definition is intentionally stated somewhat flexibly, it could easily be made more primitive/rigorous by assuming that preferences restricted to FSi satisfy the axioms in one of the existing axiomatizations of EU over Savage acts on a finite state space such as Wakker (1984), Nakamura (1991), Gul (1992), or Chew and Karni (1994). This definition is intended to capture the fact that the decision maker associates a unique probability distribution with Si and uses that distribution to weight outcomes. Note that the uniqueness requirement on the probability measure entails the existence of consequences x, y ∈ X such that x ≻ y (where preferences over X are derived from preferences over the associated constant-consequence acts in the usual way). Furthermore, any of the axiomatizations cited will imply solvability on Si as well.

Definition 11.3. si ∈ Si is null if fsi×S−i h ∼ gsi×S−i h for all f, g, h ∈ FSi.

Note that given EU, a state is null if and only if it is assigned zero probability.

Definition 11.4. Si is stochastically independent of S−i if, for all ŝ−i ∈ S−i, f ∈ FSi and w ∈ X,

f ∼ w   (11.1)

implies

fSi×ŝ−i w ∼ w.   (11.2)

While this is formulated as a general definition of stochastic independence of an ordinate, this chapter will focus only on independence of a randomizing device. For this purpose, the main definition is the following:

Definition 11.5. Si is a stochastically independent randomizing device (SIRD) if Si is stochastically independent and contains at least two non-null states, and ≽ satisfies solvability and EU on Si.

This condition is designed to differentiate between EU ordinates that are stochastically independent from the rest of the state space and those that are dependent, while still allowing for possible uncertainty aversion on other ordinates. A useful way to understand this definition is as follows: There are several potential reasons why Equation (11.1) could hold while Equation (11.2) is violated. First, it might be that uncertainty aversion over Si leads a different marginal probability measure over Si to be used when evaluating the acts in (11.2) than when evaluating acts
in (11.1). This is ruled out by the assumption that preferences satisfy EU on Si. Second, it might be that the marginal over Si conditional on ŝ−i is different than the unconditional marginal over Si due to some stochastic dependence (or uncertainty about stochastic independence) between Si and S−i. Since we want to model an independent randomizing device, it is proper that the SIRD condition does not allow for such dependence. Also supporting the idea that this definition reflects stochastic independence is the observation that if preferences are EU and nontrivial, then Si an SIRD is equivalent to requiring that the representing probability measure be a product measure on Si × S−i. Note also that all of the results that follow will also hold true if we additionally impose that S−i is stochastically independent of Si (by switching the role of i and −i in the definition of stochastically independent). Thus this concept shares the symmetry that a notion of stochastic independence should intuitively possess. In the next two sections, we develop the implications of SIRD for some common classes of uncertainty averse preferences.

11.4.1. MMEU and randomizing devices

This section develops the implications for MMEU preferences of one ordinate of the state space being a SIRD. MMEU will be found to be flexible enough to easily incorporate both a SIRD and uncertainty aversion.

Theorem 11.1. Assume ≽ are MMEU preferences satisfying solvability for some Si that contains at least two non-null states. Then the following are equivalent: (i) Si is a SIRD; (ii) there exists a probability measure p̂ on 2^Si such that all probability measures p in the closed, convex set of measures B of the MMEU representation satisfy p(s) = p̂(si) p(Si × s−i), for all s ∈ S.

Proof. ((i) ⇒ (ii)) We first show that all p ∈ B must have the same marginal on Si. Fix outcomes x, y ∈ X such that x ≻ y. EU on Si implies that ≽ restricted to FSi may be represented by Σ_{si∈Si} u(f(si)) p̂(si), where p̂ is the unique representing probability measure on 2^Si, and u is unique up to a positive affine transformation. Using the MMEU representation of ≽ yields a utility function ũ and a set of measures B such that, for all f, g ∈ FSi,

min_{p∈B} Σ_{si∈Si} ũ(f(si)) p(si × S−i) ≥ min_{p∈B} Σ_{si∈Si} ũ(g(si)) p(si × S−i)
⇐⇒
Σ_{si∈Si} u(f(si)) p̂(si) ≥ Σ_{si∈Si} u(g(si)) p̂(si).
Without loss of generality, set u(x) = ũ(x) = 1 and u(y) = ũ(y) = 0. Using the fact that Si satisfies EU and solvability, combined with the MMEU representation, allows one to apply Nakamura (1990: lemma 3) and conclude that, given the normalization, the two utility functions must be the same (i.e. ũ(x) = u(x) for all x ∈ X). Therefore,

min_{p∈B} Σ_{si∈Si} u(f(si)) p(si × S−i) ≥ min_{p∈B} Σ_{si∈Si} u(g(si)) p(si × S−i)
⇐⇒
Σ_{si∈Si} u(f(si)) p̂(si) ≥ Σ_{si∈Si} u(g(si)) p̂(si).
Suppose there is some p′ ∈ B such that p′(si × S−i) ≠ p̂(si) for some si ∈ Si. Without loss of generality, assume that p̂(si) > p′(si × S−i) for an si ∈ Si. Consider the act f = xsi×S−i y. Solvability guarantees that there exists a z ∈ X such that z ∼ f. Thus,

u(z) = min_{p∈B} Σ_{si∈Si} u(f(si)) p(si × S−i)
     ≤ p′(si × S−i)
     < p̂(si)
     = Σ_{si∈Si} u(f(si)) p̂(si)
     = u(z),

for a contradiction. Therefore, it must be that p ∈ B implies p(si × S−i) = p̂(si) for all si ∈ Si. In other words, all the marginals on Si agree.

Now we show that each p ∈ B is a product measure on Si × S−i. This part of the argument proceeds by contradiction. Suppose that p ∈ B does not imply that p(s) = p̂(si) p(Si × s−i), for all s ∈ S. Then there must exist a p0 ∈ B and an ŝ ∈ S such that

p0(ŝ) < p̂(ŝi) p0(Si × ŝ−i).   (11.3)
According to p0, the probability of ŝi and ŝ−i occurring together is less than the product of the respective marginal probabilities. We now show that this is inconsistent with the assumption that Si is stochastically independent. Consider the act f ∈ FSi such that f = xŝi×S−i y. Since x ≽ f ≽ y, solvability on Si implies there exists a w ∈ X such that w ∼ f. Observe that our normalization of u and the preference representation imply u(w) = p̂(ŝi)u(x) + (1 − p̂(ŝi))u(y) = p̂(ŝi). Define the act h = fSi×ŝ−i w. By SIRD, f ∼ w implies h ∼ w. We have the following contradiction:

u(w) = min_{p∈B} Σ_{s∈S} u(h(s)) p(s)
     ≤ Σ_{s∈S} u(h(s)) p0(s)
     = Σ_{s∈Si×ŝ−i} u(f(s)) p0(s) + Σ_{s∉Si×ŝ−i} u(w) p0(s)
     = p0(ŝ) + u(w)(1 − p0(Si × ŝ−i))
     = u(w) + (p0(ŝ) − p̂(ŝi) p0(Si × ŝ−i))
     < u(w).

(Note that the last inequality follows from (11.3).) Therefore each p ∈ B must in fact be a product measure on Si × S−i and (ii) is proved.

((ii) ⇒ (i)) That (ii) implies EU is satisfied on Si is clear because p̂ is the unique representing probability measure. To see that (11.1) implies (11.2) is satisfied on Si, consider any f ∈ FSi and w ∈ X such that f ∼ w. Fix an ŝ−i ∈ S−i and define h = fSi×ŝ−i w. By (ii),
min_{p∈B} Σ_{s∈S} u(h(s)) p(s) = min_{p∈B} Σ_{s−i∈S−i} p(Si × s−i) [ Σ_{si∈Si} u(h(si, s−i)) p̂(si) ]

and

min_{p∈B} Σ_{s∈S} u(w) p(s) = min_{p∈B} Σ_{s−i∈S−i} p(Si × s−i) [ Σ_{si∈Si} u(w) p̂(si) ].

Since f ∼ w,

Σ_{si∈Si} u(h(si, s−i)) p̂(si) = Σ_{si∈Si} u(w) p̂(si) for all s−i ∈ S−i.
All the marginals on the randomizing device agree, reflecting the lack of ambiguity about the device. All the measures in B are product measures on Si × S−i , reflecting the independence of the randomizing device.
Remark 11.1. It is not hard to see from the theorem that, in the Ellsberg “unknown urn” example, if “bet on red” is indifferent to “bet on black” then any MMEU preferences that are not EU and for which the coin is a SIRD lead to the “Raiffa” preference for randomization. As a concrete example, consider the MMEU
Stochastically independent randomization preferences with set of measures " 1 p | p(R × H ) = p(R × T ) = x, 2 % 1 1 2 = (1 − x), ≤x≤ . 2 3 3
253
p(B × H ) = p(B × T )
By the theorem, these preferences make {H , T } a SIRD and it is easy to verify that they exhibit the “Raiffa” preference for randomization. Remark 11.2. The set of product measures that emerges from the MMEU characterization is consistent with a notion of independent product of two sets of measures proposed by Gilboa and Schmeidler (1989). Specifically, the set B is trivially the independent product (in their sense) of the (unique) marginal on Si and the set of marginals on S−i used in representing preferences over FS−i . It is worth noting that no purely preference based justification for their broader notion (when neither of the sets is a singleton) is known. 11.4.2. CEU, uncertainty aversion, and randomizing devices This section examines uncertainty averse CEU preferences on a product state space where one of the ordinates is assumed to be a candidate randomizing device. In stark contrast to the results of the previous section, this class is shown to include only EU preferences. This suggests that CEU preferences with a convex capacity are incapable of modeling both a randomizing device and uncertainty aversion simultaneously. Theorem 11.2. If CEU preferences, !, display uncertainty aversion and, for some i, Si is a SIRD then ! must be EU preferences. Proof. Recall that the state space is S = S1 × S2 × · · · × SN . Without loss of generality, let S1 be a SIRD. Uncertainty aversion implies that the capacity v in the CEU representation is convex. The core of a capacity v is the set of probability measures that pointwise dominate v (i.e. {p | p(A) ≥ v(A), for all A ⊆ S; p a probability measure}). Any CEU preferences with a convex v are also MMEU preferences with the set of measures, B, equal to the core of v (Schmeidler, 1986). It follows that v(A) = minp∈core(v) p(A) for all A ⊆ S (i.e. v is the lower envelope of its core). 
Since preferences are MMEU and S1 is a SIRD, Theorem 11.1 implies ˆ such that all probability measures, that there exists a probability measure on 2S1 , p, p, in the core of v satisfy p(s) = p(s ˆ 1 )p(S1 × s−1 ), for all s ∈ S. Thus the core of v must be of a very special form. The remainder of the proof is devoted to showing that convexity of v and a core of this form are only compatible when preferences are EU. First I derive a key equality implied by convexity together with the form of the core. To this end, fix any s1 ∈ S1 and A−1 , B−1 ⊆ S−1 . Denote the complement
of s−1 relative to S−1 by s−1^c and the complement of s1 relative to S1 by s1^c. Define the sets C = s1 × S−1 and D = (s1 × A−1) ∪ (s1^c × B−1). Convexity of v implies that

v(C) + v(D) ≤ v(C ∪ D) + v(C ∩ D).

Using the structure of the core of v and the fact that v is the lower envelope of its core yields the opposite inequality:

v(C) + v(D) = p̂(s1) + min_{p∈core(v)} [p̂(s1) p(S1 × A−1) + (1 − p̂(s1)) p(S1 × B−1)]
            ≥ p̂(s1) + p̂(s1) min_{p∈core(v)} p(S1 × A−1) + (1 − p̂(s1)) min_{p∈core(v)} p(S1 × B−1)
            = v(C ∪ D) + v(C ∩ D).

Combining the two inequalities, it must be that, for all s1 ∈ S1 and all A−1, B−1 ⊆ S−1,

min_{p∈core(v)} [p̂(s1) p(S1 × A−1) + (1 − p̂(s1)) p(S1 × B−1)]
= p̂(s1) min_{p∈core(v)} p(S1 × A−1) + (1 − p̂(s1)) min_{p∈core(v)} p(S1 × B−1).   (11.4)
Now, using Equation (11.4), an argument by contradiction shows that the core of v cannot contain more than one measure. Suppose core(v) contains more than one probability measure. Then there exists an s̄ ∈ S such that arg min_{p∈core(v)} p(s̄) is a proper subset of core(v). Since all the measures in the core are of the form p(s) = p̂(s1)p(S1 × s−1), it must be that arg min_{p∈core(v)} p(S1 × s̄−1) = arg min_{p∈core(v)} p(s̄). Since p(S1 × s̄−1^c) = 1 − p(S1 × s̄−1) for any p ∈ core(v),

arg min_{p∈core(v)} p(S1 × s̄−1) ∩ arg min_{p∈core(v)} p(S1 × s̄−1^c) = Ø.

Thus, for any non-null s1 ∈ S1,

min_{p∈core(v)} [p̂(s1)p(S1 × s̄−1) + (1 − p̂(s1))p(S1 × s̄−1^c)]
    > p̂(s1) min_{p∈core(v)} p(S1 × s̄−1) + (1 − p̂(s1)) min_{p∈core(v)} p(S1 × s̄−1^c),

in violation of Equation (11.4). Therefore, the core of v must be a singleton and, since v is the lower envelope of its core, v must be a probability measure and preferences are EU.
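The final step of the proof can be seen numerically. In this hypothetical two-measure sketch (the numbers are illustrative, not taken from the chapter), p̂ is the additive marginal on S1 = {H, T} and the core contains two product measures whose marginals on S−1 assign different probabilities to an event and its complement; because the two minimizers differ, the minimum of the mixture strictly exceeds the mixture of the minima:

```python
# Hypothetical illustration: a core containing two distinct product measures
# p(s) = p_hat(s1) * q(S1 x s_{-1}) cannot satisfy Equation (11.4).
p_hat = 0.5                    # additive marginal on S1 = {H, T}
q = [(0.2, 0.8), (0.7, 0.3)]   # two marginals, each giving (q(A), q(A^c)) on S_{-1}

# Minimum over the core of the mixture p_hat*q(A) + (1 - p_hat)*q(A^c) ...
lhs = min(p_hat * qa + (1 - p_hat) * qc for qa, qc in q)
# ... versus the mixture of the separate minima, as on the right of (11.4).
rhs = p_hat * min(qa for qa, _ in q) + (1 - p_hat) * min(qc for _, qc in q)
assert lhs > rhs   # Equation (11.4) fails, so the core must be a singleton
```

Here lhs = 0.5 while rhs = 0.25; equality in (11.4) forces every event to be minimized by the same measure, which is exactly the singleton-core conclusion.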
Stochastically independent randomization
255
Remark 11.3. This theorem shows that CEU with a convex capacity is a very restrictive class of preferences in a Savage-like setting. In particular, a decision maker with such preferences must be either uncertainty neutral (i.e. an EU maximizer) or must not view any ordinate of the state space as a stochastically independent randomizing device. Note that this fact is disguised in an Anscombe–Aumann setting because there the randomizing device is built into the outcome space and thus automatically separated from the uncertainty over the rest of the world.

Remark 11.4. The theorem allows us to better understand the result of Eichberger and Kelsey (1996), who find that convexity of v, a symmetric additive marginal on S1, and a requirement that relabeling the states in S1 does not affect preferences, together imply no preference for randomization. The result shown here makes clear that the lack of preference for randomization in their paper comes from the fact that decision makers having preferences in this class (with v somewhere strictly convex) cannot act as if the randomizing device is stochastically independent in the sense of SIRD. In other words, the uncertainty averse preferences they consider rule out a priori the possibility of a stochastically independent device and thus of true randomization. In this light, their result arises because all of the non-EU preferences they consider force a range of possible correlations (which are then viewed pessimistically since they are another source of uncertainty) between the device and the rest of the state space. Once they admit preferences like MMEU, which, as shown earlier, can reflect a proper randomizing device (a SIRD) as well as uncertainty aversion, preference for randomization reappears.

Remark 11.5. If convexity of v is replaced by the weaker requirement that v be balanced (or, equivalently, that the core of v be nonempty), as advocated by Ghirardato and Marinacci (1998), preferences do not collapse to EU.
For example, if the capacity used in Section 11.3 is modified by setting v((R × H) ∪ (B × T)) = v((R × T) ∪ (B × H)) = 1/2 rather than 1/3, the resulting preferences make {H, T} a SIRD and are not EU. This capacity has a nonempty core, but is not convex. Note that these preferences still display a preference for randomization. To the extent that one is willing to accept this weaker characterization of uncertainty aversion and wants to use CEU preferences in a Savage-like setting, these findings suggest that the class of capacities with nonempty cores that are not convex may be of particular interest.

11.4.3. Further discussion of the SIRD condition

The key to these results is the definition of a SIRD, in particular the assumption that Equation (11.1) implies Equation (11.2). I argued above that, given preferences satisfying EU on Si and given the restriction of the acts in (11.1) to be Si-measurable, it is quite natural to accept that Equation (11.1) implies Equation (11.2) as reflecting the stochastic independence of Si from the rest of the state space.
It is worth elaborating a bit on why SIRD is appropriate for a randomizing device. Since stochastic independence does concern independence, and uncertainty aversion fundamentally involves violations of the independence axiom/sure-thing principle of subjective EU theory, it is fair to ask whether imposing SIRD unnecessarily restricts uncertainty aversion. Does SIRD confound stochastic independence with the violations of independence inherent in uncertainty aversion? Theorem 11.1 answers this question in the negative and suggests that uncertainty aversion is not restricted at all by imposing SIRD. Specifically, any MMEU preferences5 over FS−i are compatible with Si being a SIRD. Notice that it is exactly and only uncertainty aversion over S−i that is unrestricted. This is appropriate, since any other uncertainty aversion must be either over Si (ruled out by EU) or over the correlation between Si and S−i (incompatible with the presumption of stochastic independence). There simply is nothing else to be uncertain about. It seems that SIRD strikes a reasonable balance – enforcing stochastic independence of a randomizing device while allowing uncertainty aversion on the other ordinates of the state space.
11.5. Conclusion

This chapter has provided preference-based conditions that a randomizing device should satisfy. When these conditions are applied to the class of CEU preferences with convex capacities in a product state space model, a collapse to EU results. This does not occur with MMEU preferences in the same setting. In particular, it appears that some previous results on the absence of preference for randomization were driven not by some deep difference in Anscombe–Aumann and Savage style models as they relate to uncertainty aversion, but by the restrictiveness, as it relates to stochastic independence, of the CEU functional form with a convex capacity, which is exacerbated in Savage style models. When stochastic independence is properly accounted for, preference for randomization by uncertainty averse decision makers arises in both one- and two-stage models.

To my knowledge, Blume et al. (1991) are the only others to have developed a preference axiom for stochastic independence. Their work is in the context of preferences satisfying the decision-theoretic independence axiom. This leads their condition to be unsatisfactory in the setting of this chapter. In particular, their axiom asks more of conditional preferences than is reasonable in the presence of uncertainty aversion and does not need to address the consistency of conditional with unconditional preferences.

There have been several functional (i.e. not preference axiom based) notions of stochastically independent product that have been proposed for the MMEU or CEU models. For the case where one marginal is additive in the MMEU model, as was mentioned following Theorem 11.1, the results of the approach taken here agree with the notion proposed in Gilboa and Schmeidler (1989). Approaches specific to the CEU model have been suggested by Ghirardato (1997) and Hendon et al. (1996).
If one marginal is additive and the product capacity is convex (as in Theorem 11.2), these approaches are weaker than the one advocated here.
Specifically, preferences that are independent in the sense of Ghirardato (1997) or Hendon et al. (1996) may violate SIRD. Some other recent work on shortcomings of the CEU model in capturing probabilistic features is Nehring (1999). An analysis relating separability of events in the CEU model to EU is given in Sarin and Wakker (1997). In the context of inequality measurement under uncertainty, Ben-Porath et al. (1997) advocate MMEU type functionals and show that they are closed under iterated application while CEU functionals are not. Differences between CEU and MMEU are also discussed in Klibanoff (2001) and Ghirardato et al. (1996).

Any discussion of behavior, such as preference for randomization, that departs from what is considered standard raises some natural questions. First, descriptively, do actual decision makers behave in this way? Unfortunately, there are no studies that I am aware of that examine this issue. To do so properly would require: (1) taking some device like a coin and verifying that the decision maker viewed it as a SIRD; and (2) making it explicit to the decision maker that it is possible to choose acts that depend not only on the main feature of interest (e.g. the color of the ball drawn) but simultaneously on the realization of the coin flip. It is worth noting that many standard Ellsberg-style experiments do not offer an opportunity to examine preference for randomization because they tend to ask only questions such as "Do you prefer betting on red (black) in urn I or red (black) in urn II?" rather than allowing a fuller range of choices that provide a role for randomization.

Second, normatively, is the preference for randomization described here "reasonable" or "rational" behavior, or is it normatively unacceptable? In examples where uncertainty averse behavior seems reasonable, I find preference for randomization to be just as reasonable. Randomization acts to limit the negative influence of uncertainty on expected payoffs.
In a nutshell, if one is afraid that the distribution over colors will be an unfavorable one, if one does not suffer this fear regarding the outcome of a coin flip, and if one is sure that the realization of the coin is independent, then the fact that any joint distribution must respect this independence limits the extent to which acts that pay based on the coin as well as the color (“randomizations”) can be hurt by uncertainty over the colors. This chapter has shown that to reject this argument, one must either (1) reject uncertainty aversion as defined here or (2) reject the possibility of committing to acts that are contingent on a SIRD (i.e. reject the static, Savage-like model of independent randomization). To argue the former, as in Raiffa (1961), one may invoke reasoning based on the decision-theoretic independence axiom/sure-thing principle to reject Ellsberg-type behavior as irrational. However, at the very least, the normative force of the independence axiom/sure-thing principle is a topic on which there are a wide range of opinions. Arguments relying on an inability to commit to a randomized action bring in an explicit dynamic component that is beyond the scope of this chapter to fully address. Such arguments are not, in my view, particularly satisfying since they leave open the question of why such acts would not be introduced, by third parties if necessary, given that the decision maker desires them.
Finally, it is important to emphasize that, although the language of this chapter has been in terms of independent randomization, fundamentally a SIRD is simply an ordinate of a product state space over which preferences are EU and which is viewed as stochastically independent of the rest of the state space. In many situations where an individual faces a number of uncertainties, it may be useful to be able to assume that the individual is EU on some dimensions but uncertainty averse on others, and that the EU facets of the uncertainty are stochastically independent of others. In this regard, Theorem 11.2 has shown that CEU with a convex capacity is not an appropriate class of preferences, while, by Theorem 11.1, MMEU preferences are capable of reflecting these features.
Acknowledgments

Previous versions were circulated with the title "Stochastic Independence and Uncertainty Aversion." I thank Massimo Marinacci, Paolo Ghirardato, Bart Lipman, Ed Schlee, Peter Wakker, Drew Fudenberg, an anonymous referee and especially Eddie Dekel for helpful discussions and comments and also thank a number of seminar audiences.
Notes

1 Eichberger and Kelsey, 1996, Abstract.
2 This characterization of uncertainty aversion for the CEU model stems from an axiom of Schmeidler's (1989) of the same name in an Anscombe–Aumann framework. Casadesus-Masanell et al. (2000a,b) develop analogous axioms for a Savage-style framework. This notion of uncertainty aversion has been by far the most common in the literature. However, recently, Epstein (1999) and Ghirardato and Marinacci (1998) have proposed alternative notions of uncertainty aversion. In particular, for the case of CEU, Ghirardato and Marinacci's characterization requires the capacity v to be balanced. All convex capacities are balanced, but the converse is not true. Epstein's notion neither implies nor is implied by convexity of v. However, the reason for this is that he uses a set of preferences larger than EU as an uncertainty neutral benchmark. If (as is the philosophy in this chapter and in Ghirardato and Marinacci) EU is the uncertainty neutral benchmark, then Epstein's notion also requires v to be balanced. The reason these notions are weaker than Schmeidler's is that they are based solely on preference comparisons for which at least one of the two acts being compared is "unambiguous." In contrast, Schmeidler's approach relies, in addition, on comparisons between certain pairs of "ambiguous" acts that are implicitly ranked as more or less ambiguous by his axiom (or its Savage counterpart). See Casadesus-Masanell et al. (2000b) for a more detailed discussion along these lines.
3 This is true using the approach of either Schmeidler (1989), Casadesus-Masanell et al. (2000a,b), or Ghirardato and Marinacci (1998). Under the assumption that preferences over "unambiguous" acts are EU, it is true in Epstein's (1999) approach as well.
4 A randomizing device could be modeled in an Anscombe–Aumann setting by expanding the state space in exactly the same way as illustrated for the Savage setting.
In this case, the same additional restrictions on preferences as in the latter setting would be required to ensure that the randomizing device was unambiguous and stochastically independent.
5 Recall that this includes any CEU preferences with a convex capacity as well.
References

Ben-Porath, E., Gilboa, I., Schmeidler, D. (1997) On the measurement of inequality under uncertainty. Journal of Economic Theory 75, 194–204. (Reprinted as Chapter 22 in this volume.)
Blume, L., Brandenburger, A., Dekel, E. (1991) Lexicographic probabilities and choice under uncertainty. Econometrica 59, 61–79.
Casadesus-Masanell, R., Klibanoff, P., Ozdenoren, E. (2000a) Maxmin expected utility through statewise combinations. Economics Letters 66, 49–54.
Casadesus-Masanell, R., Klibanoff, P., Ozdenoren, E. (2000b) Maxmin expected utility over Savage acts with a set of priors. Journal of Economic Theory 92, 35–65.
Chew, S. H., Karni, E. (1994) Choquet expected utility with a finite state space: commutativity and act-independence. Journal of Economic Theory 62, 469–479.
Dow, J., Werlang, S. (1994) Nash equilibrium under Knightian uncertainty: breaking down backward induction. Journal of Economic Theory 64, 305–324.
Eichberger, J., Kelsey, D. (1996) Uncertainty aversion and preference for randomization. Journal of Economic Theory 71, 31–43.
Eichberger, J., Kelsey, D. (2000) Nonadditive beliefs and strategic equilibria. Games and Economic Behavior 30, 183–215.
Ellsberg, D. (1961) Risk, ambiguity, and the Savage axioms. Quarterly Journal of Economics 75, 643–669.
Epstein, L. (1999) A definition of uncertainty aversion. Review of Economic Studies 66, 579–608. (Reprinted as Chapter 9 in this volume.)
Ghirardato, P. (1997) On independence for non-additive measures with a Fubini theorem. Journal of Economic Theory 73, 261–291.
Ghirardato, P., Klibanoff, P., Marinacci, M. (1996) Additivity with multiple priors. Journal of Mathematical Economics 30, 405–420.
Ghirardato, P., Marinacci, M. (1998) Ambiguity made precise: a comparative foundation. Social Science Working Paper 1026, California Institute of Technology. (Reprinted as Chapter 10 in this volume.)
Gilboa, I. (1987) Expected utility theory with purely subjective non-additive probabilities. Journal of Mathematical Economics 16, 65–88. (Reprinted as Chapter 6 in this volume.)
Gilboa, I., Schmeidler, D. (1989) Maxmin expected utility with non-unique prior. Journal of Mathematical Economics 18, 141–153.
Gul, F. (1992) Savage's theorem with a finite number of states. Journal of Economic Theory 57, 99–110.
Hendon, E., Jacobsen, H. J., Sloth, B., Tranaes, T. (1996) The product of capacities and belief functions. Mathematical Social Sciences 32, 95–108.
Klibanoff, P. (1994) Uncertainty, decision, and normal form games. Mimeo, Northwestern University.
Klibanoff, P. (2001) Characterizing uncertainty aversion through preference for mixtures. Social Choice and Welfare 18, 289–301.
Lo, K. C. (1996) Equilibrium in beliefs under uncertainty. Journal of Economic Theory 71, 443–484. (Reprinted as Chapter 20 in this volume.)
Nakamura, Y. (1990) Subjective expected utility with non-additive probabilities on finite state spaces. Journal of Economic Theory 51, 346–366.
Nehring, K. (1999) Capacities and probabilistic beliefs: a precarious coexistence. Mathematical Social Sciences 38, 197–213.
Raiffa, H. (1961) Risk, ambiguity, and the Savage axioms: comment. Quarterly Journal of Economics 75, 690–694.
Sarin, R., Wakker, P. (1992) A simple axiomatization of nonadditive expected utility. Econometrica 60, 1255–1272. (Reprinted as Chapter 7 in this volume.)
Sarin, R., Wakker, P. (1997) Dynamic choice and nonexpected utility. Journal of Risk and Uncertainty 17, 87–119.
Schmeidler, D. (1986) Integral representation without additivity. Proceedings of the American Mathematical Society 97, 255–261.
Schmeidler, D. (1989) Subjective probability and expected utility without additivity. Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Wakker, P. (1984) Cardinal coordinate independence for expected utility. Journal of Mathematical Psychology 28, 110–117.
Wakker, P. (1989) Continuous subjective expected utility with non-additive probabilities. Journal of Mathematical Economics 18, 1–27.
12 Decomposition and representation of coalitional games

Massimo Marinacci
12.1. Introduction

Let F be an algebra of subsets of a given space X, and V the set of all set functions ν on F such that ν(Ø) = 0. These set functions are called transferable utility coalitional games (games, for short). Gilboa and Schmeidler (1995) prove the existence of a one-to-one correspondence between the games in V that have finite composition norm (see Section 12.2) and the bounded finitely additive measures defined on an appropriate algebra of subsets of 2^F. The algebra is constructed as follows: for T ∈ F, define T* ⊆ F by T* = {S ∈ F : Ø ≠ S ⊆ T}, and take the algebra of subsets of 2^F generated by the collection {T* : Ø ≠ T ∈ F}. On the basis of this representation, Gilboa and Schmeidler (1995) show that every game with finite composition norm can be decomposed into the difference of two totally monotone games (i.e., belief functions). Their work is related to Choquet (1953–1954) and Revuz (1955–1956), as discussed on p. 211 of their article.

In this chapter we first give a direct proof of the mentioned decomposition theorem, a proof based on the well-known Dempster–Shafer–Shapley Representation Theorem for finite games (see, e.g. Shapley, 1953; Shafer, 1976). On the basis of such a decomposition we obtain a one-to-one correspondence between the games in V that have finite composition norm and the bounded regular countably additive measures defined on an appropriate Borel σ-algebra. To construct this σ-algebra we proceed as follows. Let Ub be the set of all {0, 1}-valued convex games (a game ν is convex if ν(A) + ν(B) ≤ ν(A ∪ B) + ν(A ∩ B) for all A, B ∈ F). The set Ub can be endowed with a natural topology τν, as defined in Section 12.3. Let B(Ub) be the Borel σ-algebra on (Ub, τν). This is the σ-algebra we use to get the mentioned one-to-one correspondence.
A main advantage of this novel representation theorem is that the space of bounded regular countably additive measures on B(Ub) has much more structure than the space of finitely additive measures on the algebra described above. Besides, unlike
Marinacci, M. (1996) Decomposition and representation of coalitional games, Mathematics of Operations Research 21, 1000–1015.
262
Massimo Marinacci
finitely additive measures, regular countably additive measures are widely studied in measure theory and are technically more convenient. For finite algebras, both the Gilboa–Schmeidler representation and the one proved here reduce to the Dempster–Shafer–Shapley Representation Theorem. Finally, in this work we use a topological approach, while Gilboa and Schmeidler (1995) use an algebraic one. This is a secondary contribution of the chapter. In particular, with our topological approach it is also possible to reobtain their finitely additive representation. In sum, the approach taken in this chapter leads to novel results, without losing any of the results already proved with the different algebraic approach taken by Gilboa and Schmeidler (1995). This gives a unified perspective on this topic. The chapter is organized as follows. Section 12.2 contains some preliminary material. In Section 12.3 some properties of a locally convex topological vector space on V are proved. In Section 12.4 a direct proof of the decomposition theorem is provided. In Section 12.5, which is the heart of the chapter, the main result is proved. As a consequence of this result, in Section 12.6 it is proved that every Choquet integral on F can be represented by a standard additive integral on B(Ub). In Section 12.7 it is shown how the finitely additive representation result of Gilboa and Schmeidler (1995) can be reobtained in our setup. Finally, in Section 12.8 some dual spaces of V are studied.
12.2. Preliminaries

A set function ν on the algebra F is said to be a game if ν(Ø) = 0. The symbol V denotes the set of all games defined on F. The space V becomes a vector space if we define addition and multiplication elementwise:

(ν1 + ν2)(A) = ν1(A) + ν2(A) and (αν)(A) = αν(A)

for all A ∈ F and α ∈ ℝ. A game ν is monotone if ν(A) ≤ ν(B) whenever A ⊆ B. A game ν is convex if ν(A) + ν(B) ≤ ν(A ∩ B) + ν(A ∪ B) for all A, B ∈ F. A game ν is normalized if ν(X) = 1. A game ν is totally monotone if it is non-negative and if for every n ≥ 2 and A1, ..., An ∈ F we have:

ν(⋃_{i=1}^{n} Ai) ≥ Σ_{Ø ≠ I ⊆ {1,...,n}} (−1)^{|I|+1} ν(⋂_{i∈I} Ai).
For T ∈ F, the {0, 1}-valued game uT ∈ V such that uT(A) = 1 if and only if T ⊆ A is called a unanimity game. We can now present the Dempster–Shafer–Shapley Representation Theorem, which will play a central role in the sequel. Given a finite algebra F = {T1, ..., Tn}, the atoms of F are the sets of the form T1^{i1} ∩ T2^{i2} ∩ · · · ∩ Tn^{in}, where ij ∈ {0, 1}, Tj^0 = −Tj, and Tj^1 = Tj (−T denotes the complement of T). The number of atoms of F lies between n and 2^n.
Representation of coalitional games
263
Theorem 12.1. Suppose F is finite. Then {uT : Ø ≠ T ∈ F} is a linear basis for V. Given ν ∈ V, the unique coefficients {αT^ν : Ø ≠ T ∈ F} satisfying

ν(A) = Σ_{Ø ≠ T ∈ F} αT^ν uT(A) for all A ∈ F

are given by

αT^ν = Σ_{S ⊆ T} (−1)^{|T|−|S|} ν(S) = ν(T) − Σ_{Ø ≠ I ⊆ {1,...,k}} (−1)^{|I|+1} ν(⋂_{i∈I} Ti),

where Ti = T \ ωi and T = ⋃_{i=1}^{k} ωi for atoms ω1, ..., ωk. Moreover, ν is totally monotone on F if and only if αT^ν ≥ 0 for all nonempty T ∈ F.

For a finite algebra F, Gilboa and Schmeidler (1995) define the composition norm of ν ∈ V to be ‖ν‖ = Σ_{T ∈ F} |αT^ν|. On infinite algebras, Gilboa and Schmeidler (1995) define the composition norm ‖·‖ of ν ∈ V in the following way. Given a subalgebra F0 ⊆ F, let ν|F0 denote the restriction of ν to F0. Then define:

‖ν‖ = sup{‖ν|F0‖ : F0 is a finite subalgebra of F}.

The function ‖·‖ is a norm, and in what follows ‖·‖ will always denote the composition norm. V^b will denote the set {ν ∈ V : ‖ν‖ is finite}. The pair (V^b, ‖·‖) is a Banach space (see Gilboa and Schmeidler, 1995: 204). Another important norm in transferable utility cooperative game theory is the variation norm ‖·‖v, introduced by Aumann and Shapley (1974). This norm is defined by

‖ν‖v = sup{Σ_{i=0}^{n} |ν(A_{i+1}) − ν(A_i)| : Ø = A0 ⊆ A1 ⊆ · · · ⊆ A_{n+1} = X}.
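The coefficients of Theorem 12.1 and the finite-algebra composition norm can be checked mechanically. The following sketch (the two-point game and its values are invented for illustration) computes αT^ν on the power set of a two-element space, verifies the reconstruction ν(A) = Σ_{Ø ≠ T ⊆ A} αT^ν, and evaluates ‖ν‖ = Σ_T |αT^ν|:

```python
from fractions import Fraction as Fr
from itertools import chain, combinations

def subsets(xs):
    return chain.from_iterable(combinations(xs, k) for k in range(len(xs) + 1))

def mobius(v, X):
    """Coefficients a_T = sum over S subset of T of (-1)^(|T|-|S|) v(S),
    as in Theorem 12.1."""
    return {frozenset(T): sum((-1) ** (len(T) - len(S)) * v[frozenset(S)]
                              for S in subsets(T))
            for T in subsets(X) if T}

# Invented game on the power set of X = {1, 2}; the values are illustrative only.
X = (1, 2)
v = {frozenset(): Fr(0), frozenset({1}): Fr(1, 4),
     frozenset({2}): Fr(1, 4), frozenset({1, 2}): Fr(1)}

a = mobius(v, X)
# Reconstruction: v(A) equals the sum of a_T over nonempty T contained in A,
# since the unanimity game u_T(A) = 1 exactly when T is contained in A.
assert all(v[frozenset(A)] == sum(c for T, c in a.items() if T <= frozenset(A))
           for A in subsets(X))
# Finite-algebra composition norm: ||v|| is the sum of |a_T|.
assert sum(abs(c) for c in a.values()) == Fr(1)
```

Since all coefficients here are non-negative (1/4, 1/4, 1/2), this particular game is totally monotone and its composition norm equals ν(X) = 1, matching the theorem.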
BV^b will denote the set {ν ∈ V : ‖ν‖v is finite}. If ν is additive, then ‖ν‖ coincides with ‖ν‖v, and both norms coincide with the standard variation norm for additive set functions (cf. Gilboa and Schmeidler, 1995: 201). A filter p of F is a collection of sets in F such that (1) X ∈ p; (2) if A ∈ p, B ∈ F, and A ⊆ B, then B ∈ p; (3) if A ∈ p and B ∈ p, then A ∩ B ∈ p. Let p be a filter of F. Then (1) p is a proper filter if Ø ∉ p, that is, p ≠ F; (2) p is a principal filter if p = {B ∈ F : A ⊆ B} for some A ∈ F; p is then the principal filter generated by A; (3) p is a free filter if it is not a principal filter.
In a finite algebra all filters are principal. This is no longer true in infinite algebras. For example, let X be an infinite space. A simple example of a free filter in the power set of X is the collection of all cofinite sets {A ⊆ X : −A is finite}. Every filter p can be directed by the binary relation ≥ defined by A ≥ B ⟺ A ⊆ B, where A, B ∈ p.
Let f : X → ℝ be a real-valued function on X. Set

fA = inf_{x∈A} f(x) for each A ∈ p.

The pair (fA, ≥) is a monotone increasing net. Using it we can define lim inf_p f as follows:

lim inf_p f ≡ lim_{A∈p} fA.
If p is a principal filter generated by a set A ∈ F, then lim inf_p f = inf_{x∈A} f(x). This shows that lim inf_p f is the appropriate generalization of inf_{x∈A} f(x) needed to take care of free filters. We denote by F the set of all bounded functions f : X → ℝ such that for every t ∈ ℝ the sets {x : f(x) > t} and {x : f(x) ≥ t} belong to F. For a monotone set function ν ∈ V and a function f ∈ F, the Choquet integral is defined as

∫ f dν = ∫_0^∞ ν({x : f(x) ≥ t}) dt + ∫_{−∞}^0 [ν({x : f(x) ≥ t}) − ν(X)] dt,

where the right-hand side is a Riemann integral.
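For a nonnegative simple function the second integral vanishes and the first reduces to a finite sum over the "layers" {x : f(x) ≥ t}. A minimal sketch (the two-state space, the capacities, and the payoffs are invented for illustration):

```python
from fractions import Fraction as Fr

def choquet(f, v, X):
    """Choquet integral of a nonnegative simple function f (a dict state -> value)
    with respect to a monotone set function v, via the layer formula:
    sum over the distinct values t of f, ascending, of (t - previous t) * v({f >= t})."""
    total, prev = Fr(0), Fr(0)
    for t in sorted({f[x] for x in X}):
        layer = frozenset(x for x in X if f[x] >= t)   # the upper set {x : f(x) >= t}
        total += (t - prev) * v[layer]
        prev = t
    return total

# Invented two-state example: f pays 2 in state 1 and 4 in state 2.
X = (1, 2)
f = {1: Fr(2), 2: Fr(4)}

# Against an additive v (a probability), the Choquet integral is the expectation.
prob = {frozenset(): Fr(0), frozenset({1}): Fr(1, 2),
        frozenset({2}): Fr(1, 2), frozenset({1, 2}): Fr(1)}
assert choquet(f, prob, X) == Fr(3)   # 2*(1/2) + 4*(1/2)

# Against the vacuous capacity (v = 0 except v(X) = 1), it is the minimum payoff.
vac = {frozenset(): Fr(0), frozenset({1}): Fr(0),
       frozenset({2}): Fr(0), frozenset({1, 2}): Fr(1)}
assert choquet(f, vac, X) == Fr(2)
```

The two assertions illustrate the two extremes: additivity recovers the ordinary expectation, while the vacuous capacity evaluates every act at its worst payoff.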
12.3. A locally convex topological vector space on V

A natural topology on V has as a local base at ν0 ∈ V the sets of the form

B(ν0; A1, ..., An; ε) = {ν ∈ V : |ν0(Ai) − ν(Ai)| < ε for 1 ≤ i ≤ n},

where Ai ∈ F for 1 ≤ i ≤ n, and ε > 0. We call this topology the V-topology of V. In the next proposition we claim that under this topology the vector space V becomes a locally convex and Hausdorff topological vector space. The proof is standard, and it is therefore omitted.

Proposition 12.1. Under the V-topology the vector space V is a locally convex and Hausdorff topological vector space.

The next proposition is rather important for the rest of the chapter, and it is a simple extension of the Alaoglu Theorem to this setup.

Proposition 12.2. The set {ν ∈ V : ‖ν‖v ≤ 1} is V-compact in V.
Proof. Let MO be the set of all monotone set functions on F. Set K = {ν ∈ MO : ν(X) ≤ 1}. Let I = ∏_{A∈F} [0, 1]_A. By the Tychonoff Theorem, I is compact w.r.t. the product topology. We can define a map τ : K → I by τ(ν) = (ν(A))_{A∈F}. It is easy to check that τ is a homeomorphism between K, endowed with the relative V-topology, and τ(K), endowed with the relative product topology. Therefore, to prove that K is V-compact it suffices to show that K is V-closed. Let να be a net in K that V-converges to a game ν. Since limα να(A) = ν(A) for all A ∈ F, it is easy to check that ν ∈ MO and ν(X) ≤ 1. We conclude that K is V-closed, and, therefore, V-compact.

Let ν ∈ BV^b. In Aumann and Shapley (1974: 28) it is proved that there exists a decomposition ν1, ν2 of ν such that ν1, ν2 ∈ MO and ‖ν‖v = ‖ν1‖v + ‖ν2‖v. Therefore, {ν ∈ V : ‖ν‖v ≤ 1} ⊆ K − K. Since V is a locally convex and Hausdorff topological vector space, the set K − K is V-compact (see e.g. Schaefer, 1966: I.5.2). Therefore, to prove that {ν ∈ V : ‖ν‖v ≤ 1} is V-compact it suffices to show that it is V-closed. Let να be a net in {ν ∈ V : ‖ν‖v ≤ 1} that V-converges to a game ν. For each α there exists a decomposition ν1,α, ν2,α such that both ν1,α and ν2,α are in K and ‖να‖v = ‖ν1,α‖v + ‖ν2,α‖v. Since K is V-compact and να V-converges to ν, there exist two subnets ν1,β and ν2,β that V-converge, respectively, to two games ν1 and ν2 such that ν1, ν2 ∈ K and ν = ν1 − ν2. We can write:

‖ν‖v = ‖ν1 − ν2‖v ≤ ‖ν1‖v + ‖ν2‖v = ν1(X) + ν2(X)
     = lim_β {ν1,β(X) + ν2,β(X)} = lim_β {‖ν1,β‖v + ‖ν2,β‖v} = lim_β ‖νβ‖v.

Therefore, ‖ν‖v ≤ 1, as wanted.
12.4. Decomposition

In this section we prove the decomposition result mentioned in the Introduction. The proof is rather different from that of Gilboa and Schmeidler (1995), and it is essentially based on the properties of the V-topology. It is worth noting that this result is an extension to set functions in V^b of the well-known Jordan Decomposition Theorem for measures.

Theorem 12.2. (i) Let ν ∈ V^b. Then there exist two totally monotone games ν⁺ and ν⁻ such that ν(A) = ν⁺(A) − ν⁻(A) for all A ∈ F and ‖ν‖ = ‖ν⁺‖ + ‖ν⁻‖. Moreover, this is the unique decomposition that satisfies the norm equation. (ii) The set U(X) = {ν ∈ V : ‖ν‖ ≤ 1} is V-compact.

Proof. Let TM = {ν ∈ V : ν is totally monotone}. Let ν0 ∈ V^b. Let B(ν0; A1, ..., An; ε) be a neighborhood of ν0. Let F(A1, ..., An) be the algebra generated by {A1, ..., An}. Since F(A1, ..., An) is finite, Theorem 12.1 holds. Let
F+ = {Ø ≠ T ∈ F(A1, ..., An) : αT^{ν0} ≥ 0} and F̄ = {T ∈ F(A1, ..., An) : T ≠ Ø}. Set

ν⁺ = Σ_{T ∈ F+} αT^{ν0} uT and ν⁻ = Σ_{T ∈ F̄ \ F+} (−αT^{ν0}) uT.
By what has just been proved, if we consider the family of all V-neighborhoods of ν0 as directed by the inclusion ⊆, there exists a net να that V-converges to ν0 , and such that for all α we have: (i) να = να+ − να− ; (ii) να = να+ + να− ; (iii) να ≤ ν0 . Set M = ν0 and U M (X) = {ν ∈ V : ν ≤ M}. If ν ∈ TM , then ν = νv = ν(X). Therefore, using Proposition 12.2, it is easy to check that the set TM ∩ U M (X) is V-compact. Since να+ is a net in TM ∩ U M (X), there exists a subnet νβ+ that V-converges to a game ν0+ ∈ TM ∩ U M (X). Since the net να V-converges, this implies that also the subnet νβ− (which is equal to νβ+ − νβ )V-converges to a game ν0− ∈ T M ∩ U M (X). Clearly, ν0 = limβ (νβ+ − νβ− ) = ν0+ − ν0− . Moreover: ν0 ≥ lim νβ = lim{νβ+ + νβ− } = lim{νβ+ (X) + νβ− (X)} β
=
β
ν0+ (X) + ν0− (X)
β
=
ν0+ + ν0− ,
(12.2)
where the first inequality follows from expression (12.1). On the other hand, ν0 = ν0+ − ν0− implies ν0 ≤ ν0+ + ν0− . Together with (12.2), this implies
‖ν0‖ = ‖ν0⁺‖ + ‖ν0⁻‖. This proves the existence of the decomposition. As to uniqueness, to prove it we need the techniques used in the proof of Theorem 12.3. Consequently, uniqueness is proved in the proof of Theorem 12.3. As to part (ii), we first show that U(X) is V-closed. Let να be a net in U(X) that V-converges to an element ν ∈ V b. By part (i), there exists a decomposition να = να⁺ − να⁻ with να⁺, να⁻ ∈ U(X) ∩ TM and ‖να‖ = ‖να⁺‖ + ‖να⁻‖. Proceeding as before, we can prove that there exist two subnets νβ⁺ and νβ⁻ that V-converge, respectively, to ν⁺ and ν⁻, where ν⁺, ν⁻ ∈ TM ∩ U(X). We have:

‖ν‖ = ‖ν⁺ − ν⁻‖ ≤ ‖ν⁺‖ + ‖ν⁻‖ = ν⁺(X) + ν⁻(X) = limβ {νβ⁺(X) + νβ⁻(X)} = limβ {‖νβ⁺‖ + ‖νβ⁻‖} = limβ ‖νβ‖.
Since ‖νβ‖ ≤ 1 for all β, we can conclude ‖ν‖ ≤ 1, so that ν ∈ U(X). This proves that U(X) is V-closed. On the other hand, from part (i) it follows that U(X) ⊆ [TM ∩ U(X)] − [TM ∩ U(X)]. Since the set TM ∩ U(X) is V-compact, also the set [TM ∩ U(X)] − [TM ∩ U(X)] is V-compact, and this implies that U(X) is V-compact, as desired.
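On a finite algebra the decomposition just established can be computed explicitly: by the Dempster–Shafer–Shapley theorem every game is a linear combination of unanimity games, and splitting the Möbius coefficients αT by sign yields totally monotone parts ν⁺ and ν⁻ with ν = ν⁺ − ν⁻ and ν⁺(X) + ν⁻(X) = ΣT |αT|. The following Python sketch illustrates this on the power set of a three-point space (the particular game and all names are our own, chosen for illustration only):

```python
from itertools import combinations

def subsets(s):
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def mobius(v, X):
    # Dempster-Shafer-Shapley coefficients: a_T = sum_{S subset of T} (-1)^(|T|-|S|) v(S)
    return {T: sum((-1) ** (len(T) - len(S)) * v[S] for S in subsets(T))
            for T in subsets(X) if T}

X = frozenset({1, 2, 3})
v = {S: 0.0 for S in subsets(X)}          # an arbitrary game with v(emptyset) = 0
v[frozenset({1})] = 0.5
v[frozenset({1, 2})] = 0.2                # not monotone: v({1,2}) < v({1})
v[frozenset({1, 3})] = 0.6
v[X] = 1.0

a = mobius(v, X)
u = lambda T, A: 1.0 if T <= A else 0.0   # unanimity game u_T evaluated at A

# split the coefficients by sign: v = v_plus - v_minus, and each part is a
# nonnegative combination of unanimity games, hence totally monotone
v_plus = {A: sum(c * u(T, A) for T, c in a.items() if c > 0) for A in subsets(X)}
v_minus = {A: sum(-c * u(T, A) for T, c in a.items() if c < 0) for A in subsets(X)}

assert all(abs(v[A] - (v_plus[A] - v_minus[A])) < 1e-12 for A in subsets(X))
print(v_plus[X] + v_minus[X])             # equals sum_T |a_T|, here 1.6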
12.5. Countably additive representation of games in V b

We first introduce a class of simple games that will play a key role in the sequel. As observed in Shafer (1979), these games can already be found in Choquet (1953–1954).

Definition 12.1. Let p be a proper filter of F. A normalized game up(·) on F is called a filter game if up(A) = 1 whenever A ∈ p, and up(A) = 0 whenever A ∉ p.

We denote by Ub the set of all filter games on F. Unanimity games are a subclass of filter games. In particular, a filter game up is a unanimity game if and only if p is a principal filter. For, if p is a principal filter, that is, p = {B ∈ F : A ⊆ B} for some A ∈ F, then up coincides with the unanimity game uA. In finite algebras all filters are principal, so that all filter games are unanimity games. This is no longer true in infinite algebras, where there are free filters to consider. For example, if P(X) is the power set of an infinite space, it is known that there are 2^(2^|X|) filters (see e.g. Balcar and Franek, 1982), and just 2^|X| of them are principal. In sum, filter games are the natural generalization of unanimity games to infinite algebras. We now list some very simple properties of filter games.

Proposition 12.3. (i) A game is {0, 1}-valued and convex if and only if it is a filter game. (ii) Every filter game is totally monotone. (iii) The set Ub is V-compact in V.
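Parts (i) and (ii) can be checked mechanically on a small finite space, where every filter is principal and the corresponding filter game is a unanimity game. The sketch below is our own illustration (the choice of X and of the generating set is arbitrary); it verifies by brute force that a principal filter game is {0, 1}-valued, convex, and totally monotone for collections of three sets:

```python
from itertools import combinations

X = frozenset(range(4))
T0 = frozenset({0, 1})                     # p = {A : T0 <= A} is a principal filter
u_p = lambda A: 1 if T0 <= A else 0        # the filter (= unanimity) game u_p

subsets = [frozenset(s) for r in range(5) for s in combinations(range(4), r)]

# (i) {0,1}-valued and convex: u(A) + u(B) <= u(A | B) + u(A & B)
for A in subsets:
    for B in subsets:
        assert u_p(A) + u_p(B) <= u_p(A | B) + u_p(A & B)

# (ii) total monotonicity for n = 3: u(A1 | A2 | A3) >= sum over nonempty
#      I of (-1)^(|I|+1) u(intersection of the A_i with i in I)
for As in combinations(subsets, 3):
    rhs = 0
    for r in (1, 2, 3):
        for I in combinations(As, r):
            inter = X
            for A in I:
                inter = inter & A
            rhs += (-1) ** (r + 1) * u_p(inter)
    assert u_p(As[0] | As[1] | As[2]) >= rhs
print("convexity and total monotonicity verified")
```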
268
Massimo Marinacci
Remark. Of course, (i) and (ii) together imply that a game is {0, 1}-valued and totally monotone if and only if it is a filter game.

Proof. (i) “Only if” part: Let ν be a {0, 1}-valued and convex game. Then ν is monotone. Let p = {A : ν(A) = 1}. By monotonicity, if A ∈ p, then B ∈ p whenever A ⊆ B. Now, let A, B ∈ p. Then ν(A) = ν(B) = ν(A ∪ B) = 1. By convexity, 1 = ν(A) + ν(B) − ν(A ∪ B) ≤ ν(A ∩ B), and so ν(A ∩ B) = 1. We conclude that p is a filter, and ν a filter game. “If” part: Tedious, but obvious.

(ii) Let up be a filter game, and A1, . . . , An ∈ F. Let I∗ = {i : Ai ∈ p}. If I∗ is empty, the claim is obvious. Let I∗ ≠ Ø, with |I∗| = k. Let Ck,i be a binomial coefficient. Then:

Σ{I : Ø≠I⊆{1,...,n}} (−1)^(|I|+1) up(∩i∈I Ai) = Σ_{i=1}^{k} Ck,i (−1)^(i+1) = 1.
Since up(∪_{i=1}^{n} Ai) = 1, it follows that up is totally monotone.

(iii) Let να be a net in Ub that V-converges to an element ν ∈ V. By hypothesis, να(A) ∈ {0, 1} for all A ∈ F. Then ν(A) ∈ {0, 1} for all A ∈ F. Hence ν is {0, 1}-valued. It is easy to check that ν is also convex. Therefore, by part (i), ν is a filter game, as wanted.

Let B(Ub) be the Borel σ-algebra on the space Ub w.r.t. the V-topology, and let rca(Ub) be the set of all regular and bounded Borel measures on B(Ub). Moreover, set rca⁺(Ub) = {µ ∈ rca(Ub) : µ(A) ≥ 0 for all A ∈ B(Ub)} and rca1(Ub) = {µ ∈ rca⁺(Ub) : µ(Ub) = 1}. When the space rca(Ub) is endowed with the weak∗-topology, we write (rca(Ub), τw). Similarly, (V b, τV) denotes the space V b endowed with the V-topology. We recall that an isometric isomorphism between two normed spaces is a one-to-one continuous linear map which preserves the norm. As in the previous section, U(X) = {ν ∈ V : ‖ν‖ ≤ 1}.

Theorem 12.3. There is an isometric isomorphism J∗ between (V b, ‖·‖) and (rca(Ub), ‖·‖) determined by the identity

ν(A) = ∫Ub up(A) dµ for all A ∈ F.  (12.3)

The correspondence J∗ is linear and isometric, that is, ‖µ‖ = ‖J∗(ν)‖ = ‖ν‖. Moreover, ν is totally monotone if and only if the corresponding µ is nonnegative. Finally, the map J∗ is a homeomorphism between (V b ∩ U(X), τV) and (rca(Ub) ∩ U(X), τw).
In other words, we claim that for each ν ∈ V b there is a unique µ ∈ rca(Ub) such that (12.3) holds; and, conversely, for each µ ∈ rca(Ub) there is a unique ν such that (12.3) holds.

Remark. The following proof is based on Theorem 12.2. The hard part is uniqueness. The proof of existence of the measure µ when ν is totally monotone is based on the well-known Dempster–Shafer–Shapley Representation Theorem for games defined on finite algebras (Theorem 12.1), and on a simple compactness argument (similar to those used in Choquet Representation Theory (see Phelps, 1965)). Other remarks on the existence part can be found after Corollary 12.1 below.

Proof. Let TM1 = {ν ∈ V : ν is totally monotone and ‖ν‖ = 1}. Let Una = {uT : T ∈ F and T ≠ Ø} be the set of all unanimity games on F. Let ν ∈ TM1, and let B(ν; A1, . . . , An; ε) be a neighborhood of ν. Let F(A1, . . . , An) be the algebra generated by {A1, . . . , An}. By Theorem 12.1 there exist αTν ≥ 0 such that

ν(A) = Σ{Ø≠T∈F(A1,...,An)} αTν uT(A),

for all A ∈ F(A1, . . . , An), with αTν ≥ 0 for all Ø ≠ T ∈ F(A1, . . . , An) and ΣT αTν = 1. Hence Σ{Ø≠T∈F(A1,...,An)} αTν uT(·) belongs both to B(ν; A1, . . . , An; ε) and to co(Una). This implies TM1 = cl{co(Una)}. Clearly, Una ⊆ Ub. Moreover, we know by Proposition 12.3 (ii) that Ub ⊆ TM1. Therefore, TM1 = cl{co(Ub)}. Hence, there exists a net λβ contained in co(Ub) that V-converges to ν. For A ∈ F, let fA : Ub → R be defined by fA(up) = up(A). The map fA is V-continuous on Ub. By definition,

λβ(A) = Σ{j∈Iβ} αj upj(A)

for all A ∈ F and some finite index set Iβ. Since Σ{j∈Iβ} αj = 1 and αj ≥ 0, we can write

λβ(A) = ∫Ub fA dµβ,

where µβ(upj) = αj if j ∈ Iβ, and µβ(upj) = 0 otherwise. Since, by Proposition 12.3 (iii), Ub is V-compact, it is known that rca1(Ub) is weak∗-compact. Then there exists a subnet µγ of µβ that weak∗-converges to some µ0 ∈ rca1(Ub). Since fA is V-continuous, ∫Ub fA dµγ converges to ∫Ub fA dµ0 for all A ∈ F. Set νγ(A) = ∫Ub fA dµγ for all A ∈ F. The net νγ(A) converges to ν(A) for all
A ∈ F because λβ V-converges to ν. Therefore, it follows that

ν(A) = ∫Ub up(A) dµ0,
for all A ∈ F, and we conclude that µ0 is the measure we were looking for. We have therefore proved the existence of a correspondence J1 between TM1 and rca1(Ub), where J1 is determined by ν(A) = ∫Ub up(A) dµ for all A ∈ F. Clearly, ν(X) = ∫Ub dµ = µ(Ub) because X ∈ p for every filter p in F. Hence, if µ ∈ J1(ν), then ‖ν‖ = ‖µ‖. Let TM = {ν ∈ V b : ν is totally monotone}. A simple argument now shows that there exists a correspondence J between TM and rca⁺(Ub), where J is determined by

ν(A) = ∫Ub up(A) dµ
for all A ∈ F. Moreover, if µ ∈ J(ν), then ‖ν‖ = ‖µ‖. Let ν ∈ V b. By Theorem 12.2 (existence part), there exist ν⁺, ν⁻ ∈ TM such that ν = ν⁺ − ν⁻ and ‖ν‖ = ‖ν⁺‖ + ‖ν⁻‖. Let µ⁺ ∈ J(ν⁺) and µ⁻ ∈ J(ν⁻). Set µ = µ⁺ − µ⁻. Since {up : A ∈ p} ∈ B(Ub), we have:

ν(A) = ∫Ub up(A) dµ⁺ − ∫Ub up(A) dµ⁻ = ∫Ub up(A) dµ,
for all A ∈ F. We claim that ‖ν‖ = ‖µ‖. On one hand, since ‖ν⁺‖ = ‖µ⁺‖ and ‖ν⁻‖ = ‖µ⁻‖, we have: ‖µ‖ ≤ ‖µ⁺‖ + ‖µ⁻‖ = ‖ν⁺‖ + ‖ν⁻‖ = ‖ν‖. On the other hand, let µ1 and µ2 be the Jordan decomposition of the signed measure µ. We have ‖µ‖ = ‖µ1‖ + ‖µ2‖ = ‖ν1‖ + ‖ν2‖ ≥ ‖ν‖, where νi(A) = ∫Ub up(A) dµi for all A ∈ F and i = 1, 2. The inequality holds because ν = ν1 − ν2 by construction. We conclude that ‖ν‖ = ‖µ‖, as wanted. We now prove that this µ is unique. Indeed, suppose to the contrary that there exist two signed regular Borel measures µ, µ′ such that ‖µ‖ = ‖µ′‖ = ‖ν‖ and

ν(A) = ∫Ub up(A) dµ = ∫Ub up(A) dµ′,  (12.4)

for all A ∈ F. We first observe that ‖µ‖ = ‖µ′‖ = ‖ν‖ < ∞ implies sup |µ(B)| < ∞ and sup |µ′(B)| < ∞, that is, µ and µ′ are bounded. Next, define a map s from F into the power set of Ub by s(A) = {up : A ∈ p}. Let S = {s(A) : A ∈ F}. Every set s(A) is V-closed, so that S ⊆ B(Ub).
From (12.4) it follows that µ and µ′ coincide on S. The set S is a π-class (i.e., it is closed under intersection) because, as it is easy to check, it holds s(A) ∩ s(B) = s(A ∩ B). Then µ and µ′ coincide on A(S), the algebra generated by S. For, let L = {B ⊆ Ub : µ(B) = µ′(B)}. We check that L is a λ-system (see, e.g., Billingsley, 1985: 36–38). Since X ∈ p for every filter p in F, (12.4) implies Ub ∈ L. Moreover, if B ∈ L, then Bᶜ ∈ L because µ and µ′ are additive. Finally, suppose {Bi}∞i=1 is an infinite sequence of pairwise disjoint subsets of Ub. If Bi ∈ L for all i ≥ 1, then ∪∞i=1 Bi ∈ L because both µ and µ′ are countably additive. We conclude that L is a λ-system. By the π−λ theorem (see, e.g., Billingsley, 1985: 37), this implies that µ and µ′ coincide on A(S), as wanted. The algebra A(S) is a base for a topology on Ub. Let us denote by τs such a topology. Next we prove that τs coincides with the relative V-topology τv on Ub. Let B(up0; A1, . . . , An; ε) be a neighborhood of up0. Set

I1 = {i ∈ {1, . . . , n} : up0(Ai) = 1} and I2 = {i ∈ {1, . . . , n} : up0(Ai) = 0},

and

G = [∩i∈I1 s(Ai)] ∩ [∩i∈I2 (s(Ai))ᶜ].
Of course, G ∈ A(S). Moreover, up0 ∈ G. If up ∈ G, then up(Ai) = up0(Ai) for all 1 ≤ i ≤ n, so that up ∈ B(up0; A1, . . . , An; ε). Therefore, we conclude up0 ∈ G ⊆ B(up0; A1, . . . , An; ε) and this implies τv ⊆ τs because the sets of the form B(up; A1, . . . , An; ε) are a local base for V. As to the converse, let G ∈ A(S) be a neighborhood of up0. W.l.o.g. the set G has the form [∩i∈I1 s(Ai)] ∩ [∩i∈I2 (s(Ai))ᶜ]. This follows from the usual procedure used for the construction of A(S) from S and from the fact that up0 ∈ G. Let us consider B(up0; A1, . . . , An; ε). Clearly, up0 ∈ B(up0; A1, . . . , An; ε). Let up ∈ B(up0; A1, . . . , An; ε). Then up(Ai) = up0(Ai) for all 1 ≤ i ≤ n. This implies Ai ∈ p if i ∈ I1 and Ai ∉ p if i ∈ I2. Then up ∈ s(Ai) if i ∈ I1 and up ∉ s(Ai) if i ∈ I2. Consequently, up ∈ G. Therefore, up0 ∈ B(up0; A1, . . . , An; ε) ⊆ G and this implies τs ⊆ τv. We can conclude τs = τv, as desired. Propositions 12.1 and 12.3 imply that (Ub, τv) is a compact Hausdorff space. From τs = τv it follows that (Ub, τs) is also a compact Hausdorff space. Since µ and µ′ are regular Borel measures on a compact Hausdorff space, they are τ-additive. For, let µ1, µ2 be the Jordan decomposition of µ, and let {Gα} be a net of open sets such that Gα ⊆ Gβ for α ≤ β. Both µ1 and µ2 are regular (see Dunford and Schwartz, 1957: 137).
Therefore, they are τ-additive (see Gardner, 1981: 47), that is,

limα µi(Gα) = µi(∪α Gα) for i = 1, 2.

On the other hand, it holds that

µ(∪α Gα) = µ1(∪α Gα) − µ2(∪α Gα) = limα µ1(Gα) − limα µ2(Gα) = limα [µ1(Gα) − µ2(Gα)] = limα µ(Gα),

and this proves that µ is τ-additive. A similar argument holds for µ′. Now, let G be an open set in τv. Since A(S) is a base for τs, we have G = ∪i∈I Gi where Gi ∈ A(S) for all i ∈ I. Let |I| be the cardinal number of I. If |I| ≤ |N| set G∗n = ∪_{i=1}^{n} Gi. Since A(S) is an algebra, G∗n ∈ A(S), so that countable additivity implies

µ(G) = limn µ(G∗n) = limn µ′(G∗n) = µ′(G).
If |I| is any infinite cardinal, we can again order {Gi}i∈I so that {Gi}i∈I = {Gα : α < |I|} (Greek letters denote ordinal numbers). Define G∗α as follows: (i) G∗1 = G1; (ii) if α is not a limit ordinal, then set G∗α = G∗α−1 ∪ Gα; (iii) if α is a limit ordinal, then set G∗α = ∪γ<α G∗γ. To prove that µ(G) = µ′(G) we can then use a transfinite induction argument on the increasing net of open sets G∗α, an argument based on τ-additivity and on the fact that α < |I|. Of course, µ(F) = µ′(F) for all closed subsets F ⊆ Ub. The class of all closed subsets is a π-class, and B(Ub) is the σ-algebra generated by the closed sets. We have already proved that L = {B ⊆ Ub : µ(B) = µ′(B)} is a λ-system. Therefore, by the π−λ theorem, B(Ub) ⊆ L, as wanted. This completes the proof that µ = µ′. This implies that there exists a unique decomposition of ν that satisfies the norm equation. For, suppose there exist two pairs ν1, ν2 and ν1′, ν2′ such that

ν(A) = ν1(A) − ν2(A) = ν1′(A) − ν2′(A) for all A ∈ F,

and

‖ν‖ = ‖ν1‖ + ‖ν2‖ = ‖ν1′‖ + ‖ν2′‖.
Let µ, µi and µi′ be the unique measures on B(Ub) associated to ν, νi and νi′ for i = 1, 2. It is easy to check that

µ(s(A)) = µ1(s(A)) − µ2(s(A)) = µ1′(s(A)) − µ2′(s(A)) for all A ∈ F,

and ‖µ‖ = ‖µ1‖ + ‖µ2‖ = ‖µ1′‖ + ‖µ2′‖. It is then easy to check that

µ(A) = µ1(A) − µ2(A) = µ1′(A) − µ2′(A) for all A ∈ A(S).

Using transfinite induction as we did before, this equality can be extended to all open sets in Ub, and it is then easy to see that

µ(A) = µ1(A) − µ2(A) = µ1′(A) − µ2′(A) for all A ∈ B(Ub).
But, there is only a unique decomposition of µ on B(Ub) that satisfies the norm equation, that is, the Jordan decomposition. Therefore, µi = µi′ for i = 1, 2 and so νi = νi′ for i = 1, 2, as desired. We have already defined a correspondence J between TM and rca⁺(Ub). By what has been proved, this correspondence is indeed a function, that is, J(ν) is a singleton for every ν ∈ TM. Define a function J∗ on V b by J∗(ν) = J(ν⁺) − J(ν⁻), where ν⁺, ν⁻ is the unique decomposition of ν that satisfies the norm equation. Clearly J∗(ν) ∈ rca(Ub), and J∗ is onto. By now we know that µ = J∗(ν) implies ‖ν‖ = ‖µ‖. Therefore, we conclude that J∗ is an isometric isomorphism. Finally, we show that J∗ is a homeomorphism between (V b ∩ U(X), τV) and (rca(Ub) ∩ U(X), τw). Since V b ∩ U(X) is V-compact and J∗ is a bijection, it suffices to show that J∗ is continuous on V b ∩ U(X). Let να be a net in V b ∩ U(X) that V-converges to an element ν ∈ V b ∩ U(X). Since rca(Ub) ∩ U(X) is weak∗-compact, to show that J∗(να) weak∗-converges to J∗(ν) it suffices to prove that every convergent subnet J∗(νβ) of J∗(να) converges to J∗(ν). Suppose limβ J∗(νβ) = J∗(ν′). Then

ν(A) = limβ νβ(A) = limβ ∫Ub up(A) dJ∗(νβ) = ∫Ub up(A) dJ∗(ν′).

Since J∗ is bijective, this implies ν = ν′, as wanted. As a simple corollary of Theorem 12.3, we can obtain the following interesting result, proved in a completely different way in Choquet (1953–1954). Let TM be the set of all totally monotone games on F.

Corollary 12.1. A game is an extreme point of the convex set {ν ∈ TM : ‖ν‖ = 1} if and only if it is a filter game.
Proof. It is well known that the Dirac measures are the extreme points of the set of all regular probability measures defined on the Borel σ -algebra of a compact Hausdorff space. Since B(Ub ) satisfies this condition, a simple application of Theorem 12.3 proves the result. As observed in Shafer (1979), using this result of Choquet, the existence part in Theorem 12.3 for totally monotone set functions can be obtained as a consequence of the celebrated Krein–Milman Theorem. However, we think that the simple proof of existence we have given, based on the well-known Dempster–Shafer–Shapley Representation Theorem for finite algebras, is more attractive in the context of this chapter. Indeed, in Section 12.7 it will be proved that this technique leads to a new proof of the finitely additive representation of Revuz (1955–1956) and Gilboa and Schmeidler (1995).
12.6. Integral representation

As a consequence of Theorem 12.3, we have the following representation result for the Choquet integral.

Theorem 12.4. Let ν be a monotone set function in V b and f ∈ F. Then

∫X f dν = ∫Ub [lim infp f] dµ,

where µ = J∗(ν).

Proof. If f ∈ F is a simple function, it is easy to check that

∫X f dν = ∫Ub (∫X f dup) dµ.
If f ∈ F is not a simple function, then there exists a sequence of simple functions that converges uniformly to f. Since the standard convergence theorems hold for the Choquet integral under uniform convergence, we conclude that the above equality is true for any f ∈ F. To complete the proof we now show that ∫Ub (∫X f dup) dµ = ∫Ub [lim infp f] dµ. The pair (uA, ≥) is a net, where ≥ is the binary relation that directs p. We want to show that limA∈p uA = up. Let B(up; A1, . . . , An; ε) be a neighborhood of up. Set I = {i ∈ {1, . . . , n} : Ai ∈ p}. Suppose first that I = Ø. Then uA(Ai) = up(Ai) = 0 for all 1 ≤ i ≤ n and all A ∈ p. This implies uA ∈ B(up; A1, . . . , An; ε). Suppose I ≠ Ø. Set T = ∩i∈I Ai. Let A ∈ p be such that A ≥ T. Then uA(Ai) = up(Ai) = 0 whenever i ∉ I, and uA(Ai) = up(Ai) = 1 whenever i ∈ I. Again, this implies
uA ∈ B(up; A1, . . . , An; ε). All this proves that limA∈p uA = up. Then

limA∈p ∫X f duA = ∫X f dup.

But ∫X f duA = infx∈A f(x). Therefore

∫Ub (∫X f dup) dµ = ∫Ub (limA∈p ∫X f duA) dµ = ∫Ub [limA∈p infx∈A f(x)] dµ = ∫Ub [lim infp f] dµ,
as wanted. This result suggests a simple, but useful observation. Let f : N → R be a bounded infinite sequence of real numbers. For convenience, set xn = f(n) for all n ≥ 1. Let us consider the power set P(N). Let pc be the free filter of all cofinite subsets of N, and δpc the Dirac measure concentrated on pc. Then

∫N f dupc = ∫Ub [lim infpc f] dδpc = lim infn xn.  (12.5)
This shows that the lim inf of an infinite bounded sequence may be seen as a Choquet integral. This is interesting because Choquet integrals have been axiomatized as a decision criterion in the so-called Choquet subjective expected utility (CSEU, for short; see Schmeidler, 1989). As Equation (12.5) shows, the ranking of two infinite payoff streams through their lim inf can then be naturally embedded in CSEU. Of course, here we interpret games as weighting functions over periods and not as beliefs. In repeated games, choice criteria based on the lim inf have played an important role (see, e.g., Myerson, 1991: Ch. 7). One might hope that, by elaborating on Equation (12.5), a better understanding of the decision-theoretic bases of these criteria may be obtained. This is the subject of future research.
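A numerical sketch of Equation (12.5): along the cofinite filter, the Choquet integral reduces to the limit of the tail infima inf_{n≥N} xn, which is exactly lim inf_n xn. The sequence below is our own illustrative choice (with lim inf equal to −1), and the infinite tails are truncated at a finite horizon purely for computability:

```python
def x(n):
    # a bounded sequence: transient dips early on, (-1)^n afterwards; liminf = -1
    return -1.0 - 1.0 / n if n < 10 else float((-1) ** n)

def tail_inf(N, horizon=10_000):
    # inf_{n >= N} x(n), truncated at a finite horizon for computability
    return min(x(n) for n in range(N, N + horizon))

# the tail infima are nondecreasing in N and converge to liminf x_n = -1,
# i.e. to the Choquet integral of x w.r.t. the cofinite-filter game
print([tail_inf(N) for N in (1, 5, 10, 100)])  # [-2.0, -1.2, -1.0, -1.0]
```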
12.7. Finitely additive representation of games in V b

In this section we give a proof of the finitely additive representation already proved, with an algebraic approach, by Revuz (1955–1956) and by Gilboa and Schmeidler (1995). The proof we give is a modification of the proof we used to prove the countably additive representation of Section 12.5. Let Una = {uT : T ∈ F and T ≠ Ø} be the set of all unanimity games on F. Unlike Ub, the set Una is not V-compact. Consequently, the set of all bounded regular Borel measures on Una is not as well behaved as it was on Ub. To get a representation theorem based on Una it is therefore natural to look directly at
ba(Una, Σ), the set of all bounded finitely additive measures on an appropriate algebra Σ ⊆ 2^Una. Indeed, the unit ball in ba(Una, Σ) is weak∗-compact, whatever topological structure on X we have. The problem is now to find an appropriate algebra Σ. For any A ∈ F let fA : Una → R be defined by fA(uT) = uT(A). Moreover, let Σ∗ be the algebra generated by the sets {uT : T ⊆ A and Ø ≠ T ∈ F} for A ∈ F. It turns out that the appropriate algebra is just Σ∗, which is also the smallest algebra w.r.t. which the functions fA belong to B(Una, Σ∗), where B(Una, Σ∗) denotes the closure w.r.t. the supnorm of the set of all simple functions on Σ∗. Since, as it is easy to check, there is a one-to-one correspondence between Σ∗ and Σ, where Σ is the algebra generated by the sets of the form {T : T ⊆ A and Ø ≠ T ∈ F} for A ∈ F, we finally have the following result.

Theorem 12.5. There is an isometric isomorphism T∗ between (V b, ‖·‖) and (ba(Una, Σ), ‖·‖) determined by the identity

ν(A) = ∫{T∈F : T≠Ø} uT(A) dµ for all A ∈ F.  (12.6)
The correspondence T∗ is linear and isometric, that is, ‖µ‖ = ‖T∗(ν)‖ = ‖ν‖. Moreover, ν is totally monotone if and only if the corresponding µ is nonnegative.

In other words, we claim that for each ν ∈ V b there is a unique µ ∈ ba(Una, Σ) such that (12.6) holds; conversely, for each µ ∈ ba(Una, Σ) there is a unique ν such that (12.6) holds.

Proof. Let Σ∗∗ be the algebra on Una generated by the singletons {uT} and Σ∗. Let TM1 = {ν ∈ V : ν is totally monotone and ‖ν‖ = 1}. From the proof of Theorem 12.3 we already know that TM1 = cl{co(Una)}. Since the unit ball in ba(Una, Σ∗∗) is weak∗-compact, an existence argument similar to that used in the proof of Theorem 12.3 proves that there exists a finitely additive probability measure µ∗∗ ∈ ba(Una, Σ∗∗) such that:

ν(A) = ∫{T∈F : T≠Ø} uT(A) dµ∗∗ for all A ∈ F.  (12.7)

At this stage we have to consider the whole Σ∗∗ because measures on Una with finite support might not be in ba(Una, Σ∗), while they always are in ba(Una, Σ∗∗). We can rewrite (12.7) as ν(A) = ∫Una fA dµ∗∗ for all A ∈ F. Let µ∗ be the restriction of µ∗∗ on Σ∗. Since fA ∈ B(Una, Σ∗), by Lemma III.8.1 of Dunford and Schwartz (1957) we have:

ν(A) = ∫{T∈F : T≠Ø} uT(A) dµ∗ for all A ∈ F.  (12.8)
As to uniqueness, suppose there exists a probability measure µ′ ∈ ba(Una, Σ∗) such that

ν(A) = ∫{T∈F : T≠Ø} uT(A) dµ∗ = ∫{T∈F : T≠Ø} uT(A) dµ′ for all A ∈ F.

This implies that µ∗ and µ′ coincide on the sets {uT : T ⊆ A and Ø ≠ T ∈ F}. Since the sets of the form {uT : T ⊆ A and Ø ≠ T ∈ F} are a π-class and Σ∗ is generated by them, it is easy to check that µ∗ = µ′, as wanted. There is a one-to-one correspondence g between Σ∗ and Σ such that for A ∈ F we have g({uT : T ⊆ A and Ø ≠ T ∈ F}) = {T : T ⊆ A and Ø ≠ T ∈ F}. Setting µ(g(A)) = µ∗(A) for all A ∈ Σ∗, we finally get

ν(A) = ∫{T∈F : T≠Ø} uT(A) dµ for all A ∈ F.  (12.9)

We have therefore proved the existence of a bijection T1 between TM1 and the set of all probability measures in ba(Una, Σ), and T1 is determined by ν(A) = ∫{T∈F : T≠Ø} uT(A) dµ for all A ∈ F. Clearly, ν(X) = ∫Una dµ = µ(Una) because T ⊆ X for all T ∈ F. Hence ‖ν‖ = ‖T1(ν)‖. This shows that T1 is isometric. The rest of the proof, that is, the construction of the isometric isomorphism T∗ that extends T1 from TM1 to V b, can be done through a simple application of the decomposition obtained in Theorem 12.2. This observation completes the proof.
As a corollary we can also obtain Theorem E of Gilboa and Schmeidler (1995). It is interesting to compare this result with Theorem 12.4. In the next corollary the argument of the Choquet integral is infx∈T f(x) because only unanimity games are considered, while in Theorem 12.4 we used the more general lim infp f because we integrated over all filter games, including those defined by free filters.

Corollary 12.2. Let ν be a monotone set function in V b and f ∈ F. Then

∫X f dν = ∫Una [infx∈T f(x)] dµ,

where µ = T∗(ν).

Remark. This corollary is a bit sharper than Theorem E in Gilboa and Schmeidler (1995). In fact, instead of V b they use its subset V σ = {ν ∈ V b : µ = T∗(ν) is a σ-additive signed measure}.
Proof. If f ∈ F, we can apply the same argument (based only on finite additivity) used in the first part of the proof of Theorem 12.4 to prove that

∫X f dν = ∫Una (∫X f duT) dµ.

Since ∫X f duT = infx∈T f(x), we get the desired result.
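On a finite space the integral in Corollary 12.2 becomes a finite sum: if ν has nonnegative Möbius weights αT (so that ν is totally monotone), then ∫ f dν = ΣT αT min_{x∈T} f(x). The sketch below (the game, weights and payoffs are our own illustrative choices) checks this identity against a direct computation of the Choquet integral:

```python
from itertools import combinations

def choquet(f, v, X):
    # Choquet integral of f w.r.t. a monotone game v on the finite space X:
    # sum of f(x_(i)) * [v(top i states) - v(top i-1 states)], f sorted decreasingly
    xs = sorted(X, key=lambda x: -f[x])
    total, prev, upper = 0.0, 0.0, []
    for x in xs:
        upper.append(x)
        cur = v[frozenset(upper)]
        total += f[x] * (cur - prev)
        prev = cur
    return total

X = [1, 2, 3]
subsets = [frozenset(s) for r in range(4) for s in combinations(X, r)]
# a totally monotone game given by nonnegative Moebius weights a_T summing to 1
a = {frozenset({1}): 0.2, frozenset({1, 2}): 0.3, frozenset({1, 2, 3}): 0.5}
v = {S: sum(w for T, w in a.items() if T <= S) for S in subsets}

f = {1: 5.0, 2: 1.0, 3: 3.0}
lhs = choquet(f, v, X)
rhs = sum(w * min(f[x] for x in T) for T, w in a.items())
assert abs(lhs - rhs) < 1e-9
print(lhs)  # 0.2*5 + 0.3*min(5,1) + 0.5*min(5,1,3) = 1.8
```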
12.8. Dual spaces

On the basis of the representation results proved in Sections 12.5 and 12.7, in this section we give two different Banach spaces that have duals congruent to V b, that is, there exists an isometric isomorphism between V b and these dual spaces.

Proposition 12.4. The following Banach spaces have their duals congruent with the space V b:

(i) Let C(Ub) be the space of all continuous functions on the space Ub endowed with the V-topology, and let ‖·‖s be the supnorm. Then the dual space of the Banach space (C(Ub), ‖·‖s) is congruent to (V b, ‖·‖).

(ii) Set H = {T : T ∈ F and T ≠ Ø}. Let B(H, Σ) be the closure w.r.t. the supnorm of the set of all simple functions on Σ. Then the dual space of the Banach space (B(H, Σ), ‖·‖s) is congruent to (V b, ‖·‖).

Ruckle (1982) proves that BV b([0, 1], B) is congruent with a dual space. We can obtain his result as a consequence of Propositions 12.1 and 12.2, together with the following result from Functional Analysis (see e.g. Holmes, 1975: 211, Theorem 23.A).

Proposition 12.5. Suppose that there is a Hausdorff locally convex topology τ on a Banach space X such that the unit ball U(X) is τ-compact. Then X is congruent to a dual space.
Acknowledgments

The author wishes to thank Itzhak Gilboa for his guidance, and Larry Epstein, Ehud Lehrer, David Schmeidler, two anonymous referees and an Associate Editor of Mathematics of Operations Research for helpful comments, discussions and references. Financial support from Ente Einaudi and Università Bocconi is gratefully acknowledged.
References

Aumann, R. J., L. S. Shapley (1974). Values of Nonatomic Games. Princeton University Press, Princeton, NJ.
Balcar, B., F. Franek (1982). Independent families in complete Boolean algebras. Trans. Amer. Math. Soc. 274, 607–618.
Billingsley, P. (1985). Probability and Measure. John Wiley and Sons, New York.
Choquet, G. (1953–1954). Theory of capacities. Ann. Inst. Fourier 5, 131–295.
Dunford, N., J. T. Schwartz (1957). Linear Operators. Interscience, New York.
Gardner, R. J. (1981). The regularity of Borel measures. Proc. Measure Theory Conf., Oberwolfach, Lecture Notes in Math., vol. 945, pp. 42–100, Springer-Verlag, New York.
Gilboa, I., D. Schmeidler (1994). Additive representations of nonadditive measures and the Choquet integral. Ann. Oper. Res. 52, 43–65.
——, —— (1995). Canonical representation of set functions. Math. Oper. Res. 20, 197–212.
Holmes, R. B. (1975). Geometric Functional Analysis and Its Applications. Springer-Verlag, New York.
Myerson, R. B. (1991). Game Theory. Harvard University Press, Cambridge.
Phelps, R. (1965). Lectures on Choquet’s Theorem. Van Nostrand, Princeton, NJ.
Revuz, A. (1955–1956). Fonctions croissantes et mesures sur les espaces topologiques ordonnés. Ann. Inst. Fourier 6, 187–268.
Ruckle, W. H. (1982). Projections in certain spaces of set functions. Math. Oper. Res. 7, 314–318.
Schaefer, H. H. (1966). Topological Vector Spaces. The Macmillan Co., New York.
Schmeidler, D. (1989). Subjective probability and expected utility without additivity. Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ.
—— (1979). Allocations of probability. Ann. Probab. 7, 827–839.
Shapley, L. S. (1953). A value for n-person games. Ann. Math. Stud. 28, 307–317.
Part II
Applications
13 An overview of economic applications of David Schmeidler’s models of decision making under uncertainty Sujoy Mukerji and Jean-Marc Tallon
13.1. Introduction

What do ambiguity averse decision makers do when they are not picking balls out of urns—when they find themselves in contexts that are “more realistic” in terms of economic institutions involved? In this part, the reader is provided with a sample of economic applications of the decision theoretic framework pioneered by David Schmeidler. Indeed, decision theoretic models are designed, at least in part, to eventually be used to answer questions about behavior and outcomes in interesting economic environments. Does it make a difference for the outcome of a given game, or market interaction or contractual arrangement if we were to assume that decision makers are ambiguity averse rather than Bayesian? What kind of insights are gained by introducing ambiguity averse agents in our economic models? What are the phenomena that can be explained in the “ambiguity aversion paradigm” that did not have a (convincing) explanation in the expected utility framework? Do equilibrium conditions (e.g. rational expectations equilibrium or any sort of equilibrium in a game) that place constraints on agents’ beliefs rule out certain types of beliefs and attitudes toward uncertainty? These are but a few of the questions that the contributions collected in this part of the volume have touched upon. In this introduction we discuss these contributions along with several other chapters which, while not included in the volume, make important related points and therefore play a significant role in the literature. We have organized the discussion of economic applications principally around three themes: financial markets, contractual arrangements and game theory. In all these contexts, it is found that ambiguity aversion does make a difference in terms of qualitative predictions of the models and, furthermore, often provides an explanation of contextual phenomena that is, arguably, more straightforward and intuitive than that provided by the expected utility model.
The first section discusses chapters that have contributed to a better understanding of financial market outcomes based on ambiguity aversion. The second section focusses on contractual arrangements and is divided into two subsections. The first subsection reports research on optimal risk sharing arrangements, while the second subsection discusses research on incentive contracts. The third section concentrates on strategic interaction and reviews several papers that have extended different game theoretic
solution concepts to settings with ambiguity averse players. A final section deals with several contributions which, while not dealing with ambiguity per se, are linked at a formal level, in terms of the pure mathematical structures involved, with Schmeidler’s models of decision making under ambiguity. These contributions involve issues such as inequality measurement, intertemporal decision making and multi-attribute choice.
13.2. Financial market outcomes

In a pioneering work, Dow and Werlang (1992) applied the Choquet expected utility (CEU) model of Schmeidler (1989) to the portfolio choice problem and identified an important implication of Schmeidler’s model. They showed, in a model with one risky and one riskless asset, that there exists a nondegenerate price interval at which a CEU agent will strictly prefer to take a zero position in the risky asset (rather than to sell it short or to buy it). This constitutes a striking difference with an expected utility decision maker, for whom this price interval is reduced to a point (as known since Arrow, 1965). The intuition behind this finding may be grasped in the following example. Consider an asset that pays off 1 in state L and 3 in state H and assume that the DM is of the CEU type with capacity ν(L) = 0.3 and ν(H) = 0.4 and linear utility function. The expected payoff (that is, the Choquet integral computed in a way explained in the introduction of this volume) of buying a unit of the risky asset (the act zb) is given by CEν(zb) = 0.6 × 1 + 0.4 × 3 = 1.8. On the other hand, the payoff from going short on a unit of the risky asset (the act zs) is higher at L than at H. Hence, the relevant minimizing probability when evaluating CEν(zs) is that probability in the core of ν that puts most weight on H. Thus, CEν(zs) = 0.3 × (−1) + 0.7 × (−3) = −2.4. Hence, if the price of the asset z were to lie in the open interval (1.8, 2.4), then the investor would strictly prefer a zero position to either going short or buying. Unlike in the case of unambiguous beliefs, there is no single price at which to switch from buying to selling. Taking a zero position on the risky asset has the unique advantage that its evaluation is not affected by ambiguity.
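The arithmetic of this example is easy to check mechanically. The short sketch below (function and variable names are our own) computes the Choquet expected values of the long and short positions with respect to the capacity ν(L) = 0.3, ν(H) = 0.4 of the example:

```python
def choquet(f, v, states):
    # Choquet integral: sort states by decreasing payoff and cumulate capacities
    xs = sorted(states, key=lambda s: -f[s])
    total, prev = 0.0, 0.0
    for i, s in enumerate(xs):
        cur = v[frozenset(xs[: i + 1])]
        total += f[s] * (cur - prev)
        prev = cur
    return total

v = {frozenset(): 0.0, frozenset({"L"}): 0.3,
     frozenset({"H"}): 0.4, frozenset({"L", "H"}): 1.0}
buy = {"L": 1.0, "H": 3.0}     # long one unit of the risky asset (z_b)
sell = {"L": -1.0, "H": -3.0}  # short one unit (z_s)

print(round(choquet(buy, v, ["L", "H"]), 10),
      round(choquet(sell, v, ["L", "H"]), 10))  # 1.8 -2.4
```

Any price in the open interval (1.8, 2.4) then makes both trades unattractive relative to the zero position, which is the no-trade interval of the example.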
The “inertia” zone demonstrated by Dow and Werlang was a statement about optimal portfolio choice corresponding to exogenously determined prices, given an initially riskless position.1 It leaves open the issue whether this could be an equilibrium outcome. Epstein and Wang (1994) is the first chapter that looked at an equilibrium model with multiple-prior agents. The main contribution is twofold. First, they provide an extension of Gilboa and Schmeidler’s multiple-prior model to a dynamic (infinite horizon) setting. Gilboa and Schmeidler’s model is static and considers only one-shot choice. The extension to a dynamic setting poses the difficult problem of ensuring dynamic consistency of choices, together with the choice of a revision rule for beliefs that are not expressed probabilistically. The recursive approach developed in Epstein and Wang (1994) (and which was subsequently axiomatized by Epstein and Schneider (2003)) allows one to bypass these problems and ensures that intertemporal choices are dynamically consistent. The authors also provide
Overview of economic applications
285
techniques enabling one to find optimal solutions of such a problem, techniques that amount to generalizing Euler equalities and dealing with Euler inequalities. Second, they develop a model of intertemporal asset pricing à la Lucas (1978). They thus construct a representative agent economy, in which asset prices are derived at equilibrium. One central result shows that asset prices can be indeterminate at equilibrium. Indeterminacy, for instance, results when there are factors that influence the dividend process while leaving the endowment of the agent unchanged. In that case, we are back to the intuition developed in Dow and Werlang (1992): there exists a multiplicity of prices supporting the initial endowment. The economically important consequence of this finding is that large volatility of asset prices is consistent with equilibrium. Chen and Epstein (2002) develop the continuous-time counterpart of the model in Epstein and Wang (1994). They show that excess returns for a security can be decomposed as the sum of a risk premium and an ambiguity premium. Epstein and Miao (2003) use this model to provide an explanation of the home-bias puzzle: when agents perceive domestic assets as unambiguous and foreign assets as ambiguous, they will hold “too little” (compared to a model with probabilistic beliefs) of the latter. The framework developed in Epstein and Wang (1994) has the feature that the equilibrium is Pareto optimal and, somewhat less importantly, that the equilibrium allocation necessarily entails no trade (given the representative agent structure). Mukerji and Tallon (2001) develop a static model with heterogeneous agents. They show that ambiguity aversion can cause less than full risk sharing and, consequently, an imperfect functioning of financial markets. Indeed, ambiguity aversion could lead to the absence of trade on financial markets.
This could be perceived as a direct generalization of Dow and Werlang’s (1992) no-trade price interval result. However, simply closing Dow and Werlang’s model is not enough to obtain this result, as can be seen in an Edgeworth box. Hence, some other ingredient has to be introduced. Similar to the crucial ingredient leading to equilibrium price indeterminacy in Epstein and Wang (1994), what is needed here is a component of asset payoffs that is independent of the endowments of the agents. In fact, one also needs to ensure that some component of an asset’s payoff is independent of both the endowments and the payoff of any other asset. Mukerji and Tallon (2001) prove that, when the assets available to trade risk among agents are affected by this kind of idiosyncratic risk, and if agents perceive this idiosyncratic component as ambiguous and the ambiguity is high enough, then financial market equilibrium entails no trade at all and is suboptimal. This is to be contrasted with the situation in which agents do not perceive any ambiguity, in which standard replication and diversification arguments ensure that, eventually, full risk sharing is obtained and the equilibrium is Pareto optimal. Thus, ambiguity aversion is identified as a cause of market breakdown: assets are there to be traded, but agents, because of aversion toward ambiguity, prefer to hold on to their (suboptimal) endowments rather than bear the ambiguity associated with holding the assets. The absence of trade is of course an extreme result, due in particular to the fact that all the assets are inside assets. It would
286
Sujoy Mukerji and Jean-Marc Tallon
in particular be interesting to obtain results concerning the volume of trade, especially if outside assets were present as well. Building on a similar intuition, Mukerji and Tallon (2004a) explain why unindexed debt is often preferred to indexed debt: indexation potentially introduces some extra risk into one’s portfolio (essentially, the risk due to relative price variation of goods that appear in the indexation bundle but that the asset holder neither consumes nor holds in his endowments). This provides further evidence that risk sharing and market functioning may be altered when ambiguity is perceived in financial markets. Financial applications of the decision theoretic models developed by David Schmeidler are, of course, not limited to the ones reported earlier. There is by now a host of studies that address issues such as under-diversification (Uppal and Wang, 2003), cross-sectional properties of asset prices in the presence of uncertainty (Kogan and Wang, 2002), and liquidity when the model of the economy is uncertain (Routledge and Zin, 2001). What is probably most needed now is an econometric evaluation of these ideas; more work is needed to assess precisely how (non-probabilistic) uncertainty can be measured in the data. Some econometric techniques are being developed (see Henry (2001)) but applications are still rare. In a series of contributions, Hansen et al. (1999; 2001; 2004)2 have developed an approach to understanding a decision maker’s concern about “model uncertainty” which, although not formally building on Schmeidler’s work, is based upon a similar intuition. The idea goes back to what the “rational expectations revolution” in macroeconomics wanted to achieve: that the econometrician modeler and the agents within the model be placed on equal footing with respect to the knowledge they have of the model.
This led to the construction of models in which agents have exact knowledge of the model, in particular in which they know the equilibrium price function. However, econometricians typically acknowledge that their models might be mis-specified. Thus, Hansen and Sargent argue, these doubts should also be present in the agents’ minds. They hence developed a model of robust decision making, wherein agents have a model in mind but also acknowledge that this model might be wrong: they therefore want to make decisions that are robust against possible mis-specifications. Since a particular model implies a particular probability distribution over the evolution of the economic system, a concern for robustness can be understood, in the familiar terms of Schmeidler’s decision theory, as a concern about which probability distribution is a true description of the relevant environment. Wang (2003) examines the axiomatic foundation of a related decision model and compares it with the multiple-prior model. The chapter by Anderson et al. (2003) included in this volume belongs to this line of research and takes an important step in formulating the kind of model mis-specifications the decision maker may take into consideration. As the authors emphasize, “a main theme of the present chapter is to advocate a workable strategy for actually specifying those [Gilboa-Schmeidler] multiple priors in applied work.” The analysis is based on the assumption that agents, given that they have access to a limited amount of data, cannot discriminate among various models
of the economy. This makes them value decision rules that perform well across a set of models. What is of particular interest is that this cautious behavior shows up in the data generated by the model as an uncertainty premium incorporated in equilibrium security market prices, which goes some way toward an explanation of the equity premium puzzle.
13.3. Optimal contractual arrangements

13.3.1. Risk sharing

Optimal risk-sharing arrangements have been studied extensively, in contract theory, in general equilibrium analysis, and so on. It is a priori not clear whether the risk-sharing arrangements that were optimal under risk remain so when one reconsiders their efficacy in the context of non-Bayesian uncertainty. Chateauneuf et al. (2000) study the way economy-wide risk sharing is affected when agents behave according to the CEU model. They show that the Pareto optimal allocations of an economy in which all agents are of the von Neumann–Morgenstern type are still optimal in the economy in which agents behave according to CEU, provided all agents’ beliefs are described by the same convex capacity. Things are however different when agents have different beliefs. To understand why, consider the particular case of betting: there is no aggregate uncertainty and agents have certain endowments. The only reason there might be room for Pareto improving trade is if agents have different beliefs. This is the situation treated in Billot et al. (2000). They show that in this case, Pareto optimal allocations are full insurance allocations (i.e. each agent’s consumption profile is constant across states) if and only if the intersection of the cores of the capacities representing their beliefs is nonempty. This is to be contrasted with the case in which agents have probabilistic beliefs: then, betting will take place as soon as agents have different beliefs, no matter how “small” the difference. Thus, the fact that people do not bet against one another on many issues could be interpreted not as evidence that they have the same beliefs, but rather that they have vague beliefs about these issues, and that this vagueness is sufficiently large to ensure that agents have overlapping beliefs.
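The core-intersection condition of Billot et al. (2000) is easy to check in a two-state setting. The following sketch uses illustrative capacity values (assumptions for this example, not taken from the paper): each agent's set of priors is the core of a convex capacity, which with two states is an interval of values for p(L).

```python
# Each agent's belief: a convex capacity (nu(L), nu(H)) on two states.
# Its core is the interval [nu_L, 1 - nu_H] of possible values of p(L).

def core_interval(nu_L, nu_H):
    return (nu_L, 1 - nu_H)

def cores_intersect(capacity_1, capacity_2):
    lo1, hi1 = core_interval(*capacity_1)
    lo2, hi2 = core_interval(*capacity_2)
    return max(lo1, lo2) <= min(hi1, hi2)

# Different but vague beliefs: the cores overlap, so full insurance is
# Pareto optimal and there is no betting.
print(cores_intersect((0.3, 0.4), (0.45, 0.35)))  # True
# Sharp (additive) but different beliefs: degenerate, disjoint cores,
# so agents would bet against each other.
print(cores_intersect((0.5, 0.5), (0.6, 0.4)))    # False
```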
Rigotti and Shannon (2004) look at similar issues when agents have “multiple priors with unanimity ranking” à la Bewley (1986) (see the introduction to this volume). Mukerji and Tallon (2004b) also consider a risk-sharing problem, but in the context of a wage contract. The chapter studies the optimal degree of (price) indexation in a wage contract between a risk-averse employee and a risk-neutral firm, given the presence of two types of price risk: an aggregate price risk, arising from an economy-wide shock (possibly monetary) that multiplies the prices of all goods by the same factor, and specific risks, arising from demand/supply shocks to specific commodities that affect the price of a single good or a restricted class of goods. If the contracting parties were SEU maximizers, the optimal wage contract would typically involve partial indexation (i.e. a certain fraction of wages, strictly greater than zero, would be index linked). However, this
chapter shows that if agents are ambiguity averse with CEU preferences, and if they have ambiguous beliefs about specific price risks involving goods that are neither in the employee’s consumption basket nor in the firm’s production set, then zero indexation coverage is optimal so long as the variability of inflation is anticipated to lie within a certain bound. What is crucial is the ambiguity of belief about specific price risks. The intuition for this result is again rather simple: ambiguity averse workers will not want to bear the risk associated with changes in the relative prices of the goods composing the indexation bundle if these changes are difficult to anticipate. Thus, even though indexation insures them against the risk of inflation, if this risk is well apprehended (which is the case in most countries, where inflation is low and not very variable) workers prefer to bear this (known) risk rather than the ambiguous risk associated with relative price movements, which are less predictable.

13.3.2. Incentive contracts

Typically, incentive contracts involve arrangements about contingent events. As such, the relevant trade-offs hinge crucially on the likelihoods of the relevant contingencies. Hence, it is a reasonable conjecture that the domain of contractual transactions is one area of economics significantly affected by agents’ knowledge of the odds. Such contractual relations are thus a natural choice as a particular focus of research on the principal economic effects of ambiguity aversion. Why firms exist, and what productive processes and activities are typically integrated within the boundaries of a firm, is largely explained by the understanding that under certain conditions it is difficult or impossible to write supply and delivery contracts that are complete in relevant respects.
A contract may be said to be incomplete if the contingent instructions included in the contract do not exhaust all possible contingencies; for some contingencies, arrangements are left to be completed ex post. Incomplete contracts are typically inefficient. It is held that firms emerge to coordinate related productive activities through administrative hierarchies when such activities can only be inefficiently coordinated using incentives delivered through contracts, as would happen if conditions are such that the best possible contractual arrangements are incomplete. Mukerji (1998) shows that uncertainty, together with an ambiguity averse perception of and attitude toward this uncertainty, is one instance of a set of conditions under which the best possible contracts may be incomplete and inefficient. The formal analysis basically involves a reconsideration of the canonical model of a vertical relationship (i.e. a relationship in which one firm’s output is an input in the other firm’s production activity) between two contracting firms, under the assumption that the agents’ common belief about the contingent events (which affect the gains from trade) is described by a convex capacity rather than a probability. A complete contract which is appropriate, in the sense of being able to deliver enough incentives to the contracting parties to undertake efficient actions, will require that the payments from the contract be uneven across contingencies. For instance, the contract would reward a party in those contingencies which are more likely if the party takes the “right” action. However,
the Choquet evaluation of such a contract, for either party, may be low because the expected value of the contracted payoffs varies significantly across the different probabilities in the core of the convex capacity. Thus a null contract, an extreme example of an incomplete contract, may well be preferred to the contract which delivers “more appropriate” incentives. This is because the null contract implies that the ex post surplus is divided by an uncontingent rule and, as such, delivers payoffs that are more even across contingencies, thereby ensuring that the expected value is more robust to variation in probabilities. Hence, the best contractual agreement under ambiguity might not be a good one, in the sense of being unable to deliver appropriate incentives, and may therefore be improved upon by vertical integration, which delivers incentives through a hierarchical authority structure. Why might an explanation like this be of interest? A recurrent claim among business people is that they integrate vertically because of uncertainty in input supply, a point well supported in empirical studies (see the discussion and references in Shelanski and Klein (1999)). The claim, however, has always caused difficulties for economists, in the sense that it has been hard to rationalize on the basis of standard theory (see, for instance, the remarks in Carlton, 1979). The analysis in the present chapter explains how the idea of ambiguity aversion provides one precise understanding of the link between uncertainty and vertical integration. In a related vein, Mukerji (2002) finds that ambiguity could provide a theoretical link between uncertainty and a very prevalent contractual practice in procurement contracting: the use of contracts which reimburse costs (wholly or partially) ex post and therefore provide very weak cost control incentives to the contractor.
It is argued in that paper that, while there is ample empirical evidence for this link, for instance in the case of research and development procurement by the US Defense Department, the existing theoretical underpinnings for this link based on expected utility are less than plausible. It is worth pointing out that the analyses in both papers, Mukerji (1998) and Mukerji (2002), model firms as ambiguity averse entities. Economists have traditionally preferred to model firms as risk neutral, citing diversification opportunities. Based on the formal law of large numbers results proved in Marinacci (1999), and following the intuition in Mukerji and Tallon (2001), it may be conjectured that diversification opportunities, even in the limit, are not good enough to neutralize ambiguity the way they can neutralize risk. The conjecture, to date, remains an interesting open question. Intriguingly, the optimal contractual forms characterized in both Mukerji (1998) and Mukerji (2002) are instances where ambiguity aversion leads to contracts with low-powered incentives. It is an interesting but open question how general this is. It has been widely observed that optimal contracts, say under moral hazard, as predicted on the basis of expected utility analysis, are far more high-powered than those seen in the real world. It is an intriguing conjecture that high-powered, fine-tuned incentive contracts are not robust to ambiguity about the relevant odds, and that optimal incentive schemes under, say, moral hazard would be far less complex when ambiguity considerations are taken into account than they are in standard theory. While this question is an important one, finding an answer is not likely
to be easy. Ghirardato (1994) investigated the principal-agent problem under moral hazard with the players’ preferences given by CEU. A significant finding there is that many of the standard techniques used in characterizing optimal incentive schemes in standard theory seemingly do not work with CEU/MEU preferences. For instance, the Grossman-Hart “trick” of separating the principal’s objective function into a revenue component (from implementing a given action) and a cost component (of implementing that action) is not possible with CEU/MEU preferences. On a more positive note, the chapter reports interesting findings about the comparative statics of the optimal incentive scheme with respect to changes in the agent’s uncertainty aversion. One result shows that as uncertainty aversion decreases, the agent will be willing to implement an arbitrary action for a uniformly lower incentive scheme. The point of interest is that this contrasts with what happens for decreases in risk aversion: typically, a decrease in risk aversion will have an asymmetric effect on contingent payments, making high payments higher and low payments lower.
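The null-contract intuition from Mukerji (1998) discussed earlier can be illustrated numerically. The payoff and capacity values below are assumptions chosen purely for illustration, not taken from the paper:

```python
# Two contingencies {G, B}; common convex capacity nu(G) = nu(B) = 0.3,
# so the core is p(G) in [0.3, 0.7]. The Choquet value of a contract is
# the minimum expected payoff over the core (attained at an endpoint).

def choquet_value(pay_G, pay_B):
    return min(p * pay_G + (1 - p) * pay_B for p in (0.3, 0.7))

incentive = choquet_value(10, 0)  # high-powered: payoffs uneven across states
null = choquet_value(4, 4)        # null contract: even split, uncontingent rule
print(incentive < null)  # True: the even payoff is worth more under ambiguity
```

The uneven contract's expected value swings with the probability used, so its worst-case (Choquet) evaluation is low, while the uncontingent split is immune to the ambiguity.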
13.4. Strategic interaction

In recent years noncooperative game theory, the theory of strategic decision making, has come to be the basic building block of economic theory. Naturally, one of the first points of inspiration stimulated by Schmeidler’s ideas was the question of incorporating them into noncooperative game theory. The general research question was, “What if players in a game were allowed to have beliefs and preferences as in the CEU/MEU model?” More particularly, there were at least three interrelated sets of questions: (1) purely modeling/conceptual questions, for example, how solution concepts such as strategic equilibrium should be defined given the new decision theoretic foundations; (2) questions about the general behavioral implications of the new solution concepts; and (3) questions about the insights such innovations might bring to applied contexts. The research so far has largely focused on clarifying conceptual questions, such as defining the appropriate analogue of solution concepts like Nash equilibrium, and that almost exclusively in the domain of strategic form games with complete information. Questions about the appropriate equilibrium concepts in incomplete information games, and about refinements of equilibrium in extensive form games, remain largely unanswered. However, the progress on conceptual clarification has provided significant clues about behavioral implications and, in turn, has led to some important insights in applied contexts. One reason why progress has been largely limited to complete information normal form games is the host of supplementary questions that one has to face in order to define equilibrium even in this simplest of strategic contexts.
Defining a strategic equilibrium under ambiguity involves several nontrivial modeling choices: whether to use multiple priors or capacities to represent beliefs, and if the latter, what specific class of capacities; whether to allow for a strict preference for randomization; and whether to fix actions explicitly in the description of the equilibrium, or whether, instead of
explicitly describing actions, to simply describe the supports of the beliefs, and if the latter, which among the various possible notions of support to adopt (see Ryan (1999) for a perspective on this choice). Unsurprisingly, the definition of equilibrium varies across the literature, each definition involving a particular set of modeling choices. Lo (1996) considers the question of an appropriate notion of strategic equilibrium, for normal form games, when players’ preferences conform to the multiple-prior MEU model. In Lo’s conceptualization, equilibrium is a profile of beliefs (about other players’ strategic choices) that satisfies certain conditions. To see the key ideas, consider a two-player game. The component of the equilibrium profile that describes player i’s belief about the strategic choice of player j is a (convex) set of priors such that all of j’s strategies in the support of each prior are best responses for j, given j’s belief component in the equilibrium profile. (Lo also extends the concept to n-player games, requiring players’ beliefs to satisfy stochastic independence as defined in Gilboa and Schmeidler (1989).) This notion of equilibrium predicts that player i chooses some (possibly mixed) strategy that is in the set of priors describing player j’s belief about i’s choice. In terms of behavioral implications, the notion implies that an outsider who can only observe the actual strategy choices (and not beliefs) will not be able to distinguish uncertainty averse players from Bayesian players. Intuitively, the reason why uncertainty aversion has so seemingly limited a “bite” in this construction is that players’ belief sets are severely restricted by equilibrium knowledge: every prior in i’s belief set about j’s strategic choice must be a best response mixed strategy. In other words, given equilibrium knowledge, there are too few possible priors, too little to be uncertain about, so to speak.
Dow and Werlang (1994), Klibanoff (1996) and Marinacci (2000) all offer equilibrium concepts with uncertainty aversion that differ from Lo’s in one key way. They do not restrict the equilibrium belief to only those priors which are best responses (as mixed strategies); other priors are also possible, thus enriching the uncertainty, in a manner of speaking. One principal effect of this is that these notions of equilibrium “rationalize” more strategy profiles than Lo’s concept, indeed even strategy profiles that are not Nash equilibria. Dow and Werlang (1994) define equilibrium in two-player normal form games where players have CEU preferences. Equilibrium is simply a pair of capacities, where each capacity gives a particular player’s belief about the strategic choice made by the other player. Further, the support of each capacity is restricted to include only strategies which are best responses with respect to the counterpart capacity. Significantly, the equilibrium notion considers only pure strategies. A pure strategy is deemed a best response if it maximizes the Choquet expectation over the set of pure strategies. Much depends on how the support of a capacity is defined. Indeed, the only restriction on equilibrium behavior is that only those strategies which appear in the set defined to be the support of the equilibrium beliefs may be played in equilibrium. Dow and Werlang (1994) define the support A of a capacity µ_i to be a set (a subset of S_j, the set of strategies of player j, in the present context) such that µ_i(A^c) = 0 and µ_i(B^c) > 0 for every proper subset B ⊂ A. The nature of this definition may be more readily appreciated if we
restrict attention to convex capacities and consider the set of priors in the core of such a capacity. Significantly, this set includes priors that put positive probability on pure strategies in A^c which may not be best responses. The convex capacity µ_i is the lower envelope of the priors in its core; hence, as long as there is one prior in the core which puts zero probability weight on s_j ∈ A^c, µ_i(s_j) = 0. Hence, player i’s evaluation of a pure strategy s_i will take into account the payoff u_i(s_i, s_j), s_j ∈ A^c, even though s_j may not be a best response for player j, given j’s equilibrium belief. Klibanoff (1996) defines equilibrium in normal form games and, like Lo, applies the multiple-prior framework to model players’ preferences. But there are important differences. Klibanoff defines equilibrium as a profile (σ_i, m_i), i = 1, …, n, where σ_i is the actual mixed strategy used by player i and m_i is a set of priors of player i denoting his belief about opponents’ strategy choices. The profile has to satisfy two “consistency” conditions. One, each σ_i has to be consistent with m_i in the sense that σ_i is a best response for i given his belief set m_i. Two, the strategy profile σ_{−i} chosen by the other players should be considered possible by player i. The second condition is a consistency condition on the set m_i, in the sense that it has to include the actual (possibly mixed) strategies chosen by the other players. However, m_i may also contain priors that are not mixed strategies chosen by other players, and indeed strategies that are not best responses. Hence, Klibanoff’s equilibrium differs from Lo’s in that it puts a weaker restriction on equilibrium beliefs, a restriction very similar to that implicit in Dow and Werlang’s definition.
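The Dow and Werlang support definition can be checked mechanically. The brute-force sketch below (with illustrative capacity values assumed for the example) enumerates subsets of a two-strategy set and returns the sets satisfying the definition:

```python
from itertools import combinations

# Dow-Werlang support of a capacity mu on strategy set S: a set A with
# mu(complement of A) = 0 and mu(complement of B) > 0 for every proper
# subset B of A.

def subsets(s):
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def dw_supports(S, mu):
    S = frozenset(S)
    found = []
    for A in subsets(S):
        if mu[S - A] != 0:
            continue  # the complement of a support must have capacity zero
        if all(mu[S - B] > 0 for B in subsets(A) if B != A):
            found.append(A)
    return found

# Convex capacity on S = {"x", "y"}: mu({x}) = 0.6, mu({y}) = 0, mu(S) = 1.
mu = {frozenset(): 0.0, frozenset({"x"}): 0.6,
      frozenset({"y"}): 0.0, frozenset({"x", "y"}): 1.0}
print(dw_supports({"x", "y"}, mu))  # [frozenset({'x'})]
```

Here {x} is the unique Dow-Werlang support: its complement {y} carries zero capacity, even though priors in the core may still put positive weight on y, which is exactly the feature discussed in the text.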
But Klibanoff’s definition differs from Dow and Werlang’s in that it explicitly allows players to choose a mixed strategy and allows mixed strategies to be strictly preferred to pure strategies. Moreover, it differs from both Lo’s definition and Dow and Werlang’s in that it specifies more than just equilibrium beliefs: as noted, it explicitly states which strategies will be played in equilibrium. Marinacci (2000) defines equilibrium in two-player normal form games and, like Dow and Werlang, applies the CEU framework to model players’ preferences. He also defines an equilibrium in beliefs, again much like Dow and Werlang, where beliefs are modeled by convex capacities. However, he employs a slightly different notion of support for equilibrium capacities. His definition of the support A of a capacity µ_i consists of all elements s_j ∈ S_j such that µ_i(s_j) > 0. This puts a weaker restriction on beliefs than Lo’s definition, in very much the same spirit as Dow and Werlang’s and Klibanoff’s definitions. But the true distinctiveness of Marinacci’s definition lies elsewhere. His definition includes an explicit, exogenous parametric restriction on the ambiguity incorporated in equilibrium beliefs. Given a capacity µ(·), the ambiguity of belief about an event A, denoted λ(A), is measured by 1 − µ(A) − µ(A^c). The measure is intuitive: λ(A) is precisely the difference between the maximum likelihood put on A by any probability measure in the “core” of µ(·) and the minimum likelihood put on A by some other probability measure in the core. Thus λ(A) is indeed a measure of the fuzziness of belief about A. Marinacci views the ambiguity in the description of the strategic situation as a primitive, and characterizes a two-player ambiguous
game G by the tuple {S_i, u_i, λ_i : i = 1, 2}. The addition to the usual menu of strategies and utility functions is the ambiguity functional, λ_i : 2^{S_j} → [0, 1], restricting the possible beliefs of player i to have a given level of ambiguity: player i’s belief µ_i : 2^{S_j} → [0, 1] must be such that 1 − µ_i(A) − µ_i(A^c) = λ_i(A). In the models of Dow and Werlang, Lo, and Klibanoff, the equilibrium beliefs are freely equilibrating variables and, as such, the level of ambiguity in the beliefs is endogenous. Given this endogeneity, it is not possible in these models, strictly speaking, to pose the comparative statics question, “What happens to the equilibrium if the ambiguity in the way players perceive information about the strategic environment changes?” In Marinacci’s model, on the other hand, this question is well posed, since beliefs, as equilibrating variables, are not free to the extent that they are subject to the ambiguity level constraint imposed by the parameter λ. Hence, the answer to the question (a very natural one in an applied context) involves a well-posed comparative statics exercise showing how the equilibrium changes when λ changes. The following is one way of understanding why the notions of equilibrium in Dow and Werlang (1994), Klibanoff (1996) and Marinacci (2000) allow a “rationalization” of non-Nash strategy profiles. For instance, in a two-player game, it is possible to have as an equilibrium (µ*_1, µ*_2), convex capacities denoting the equilibrium beliefs of players 1 and 2, where s*_i = supp(µ*_j) is not a best response to s*_j = supp(µ*_i). The key to the understanding lies in the fact that the support of µ*_i, i = 1, 2, is so defined that it may exclude strategies ŝ_j ∈ Ŝ_j ⊂ S_j such that µ*_i(Ŝ_j) > 0 but µ*_i(ŝ_j) = 0.
Since the strategy ŝ_j ∈ Ŝ_j is not in the support of µ*_i, it is not required to be a best response to µ*_j; nevertheless, the Choquet integral evaluation of s*_i with respect to the belief µ*_i may attach a positive weight to the payoff u_i(s*_i, ŝ_j), given that µ*_i(Ŝ_j) > 0. Hence, s*_i can be an equilibrium best response even though it may not be a best response to a belief that puts probability 1 on s*_j. It is as if player i, when evaluating s*_i, allows for the possibility that j may play a strategy that is not a best response. The discussion in the preceding paragraph suggests that equilibrium, as defined by Dow and Werlang (1994), Klibanoff (1996) and Marinacci (2000), incorporates a flavor of a (standard) Bayesian equilibrium involving “irrational types.” This point has been further investigated in Mukerji and Shin (2002). That paper concerns the interpretation of equilibrium in non-additive beliefs in two-player normal form games. It is argued that such equilibria involve beliefs and actions which are consistent with a lack of common knowledge of the game. The argument rests on representation results which show that different notions of equilibrium in games with non-additive beliefs may be reinterpreted as standard notions of equilibrium in associated games of incomplete information with additive (Bayesian) beliefs, in which common knowledge of the (original) game does not apply. More precisely, it is shown that any pair of non-additive belief functions (and actions, to the extent these are explicit in the relevant notion of equilibrium) which constitute an equilibrium in the game with Knightian uncertainty/ambiguity may be replicated as the beliefs and actions of a specific pair of types, one for each player, in an equilibrium of an orthodox Bayesian game in which there is a common prior over the type
space. The representation results provide one way of comparing and understanding the various notions of equilibrium for games with non-additive beliefs, such as those in Dow and Werlang (1994), Klibanoff (1996), and Marinacci (2000). Greenberg (2000) analyzes an example of an equilibrium in a dynamic game wherein beliefs about strategic choice off the equilibrium path of play are modeled using ideas of Knightian uncertainty/ambiguity. The example illustrates both the appropriateness of this modeling innovation and its potential for generating singular insight in the context of extensive form games. The example is a game with three players. Players 1 and 2 first play a “bargaining game” which can end in agreement or disagreement. Disagreement may arise from the “intransigence” of either player. However, player 3, who comes into play only in the event of disagreement, does not have perfect information as to which of players 1 or 2 was responsible for the disagreement; at the point player 3 makes his choice, the responsibility for the disagreement is private information to players 1 and 2. Player 3 has two actions, one of which is disliked by player 1 and the other by player 2, in each case more than disagreement itself. Player 3 is indifferent between his two choices, though he prefers the agreement outcome to either of them. The conditions of Nash equilibrium require that players have a common probabilistic belief about 3’s choice. The details of the payoffs are such that any common probabilistic belief would make disagreement more attractive to at least one of players 1 and 2. Greenberg argues that agreement, even though not a Nash equilibrium, can be supported as the unique equilibrium outcome if the first two players’ beliefs about what 3 will do in the event of disagreement (an off-equilibrium choice) are ambiguous and players are known to be ambiguity averse.
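The logic of Greenberg's example can be illustrated with a small MEU computation. The payoff numbers and the belief set below are assumptions chosen for illustration only, not Greenberg's own specification:

```python
# Players 1 and 2 share a maximally ambiguous belief about player 3's
# choice between actions "a" and "b": p(a) can be anything in [0, 1].
# Under MEU, each player evaluates disagreement at his own worst-case
# prior, since the objective is linear in p, the endpoints suffice.

def meu(payoff_a, payoff_b, prior_endpoints=(0.0, 1.0)):
    """Worst-case expected payoff over the shared set of priors."""
    return min(p * payoff_a + (1 - p) * payoff_b for p in prior_endpoints)

agree = 0                # agreement payoff, normalized to zero
disagree_1 = meu(-2, 1)  # player 1 dislikes action "a"
disagree_2 = meu(1, -2)  # player 2 dislikes action "b"
print(disagree_1 < agree and disagree_2 < agree)  # True: both prefer to agree
# Each player acts on a different worst-case prior (p(a) = 1 vs p(a) = 0),
# behaving as if the two held different probabilistic beliefs.
```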
A common set of probabilities describes players 1 and 2’s prediction about 3’s choice. But given uncertainty aversion, say as in MEU, each of players 1 and 2 evaluates his options as if his belief were described by the probability that mirrors his most pessimistic prediction. Hence the two players, given their respective apprehensions, choose to agree, thereby behaving as if they had two different probability beliefs about player 3’s choice. Greenberg further observes that player 3 may actually be able to facilitate this “good” equilibrium by not announcing or precommitting to the action he would choose if called upon to play following disagreement; the player would strictly prefer to exercise “the right to remain silent.” The silence “creates” ambiguity of belief and, given aversion to this ambiguity, in turn “brings about” the equilibrium. The question of how to model beliefs about off-equilibrium path choices appropriately has been a source of vexation for about as long as extensive form games have been around. It may be argued persuasively that on the equilibrium path of play, beliefs are pinned down by actual play. The argument is far less persuasive, if at all, for beliefs off the equilibrium path of play; hence the appropriateness of modeling such beliefs as ambiguous. But off-equilibrium path beliefs may be crucial for the construction of equilibrium. As has been noted, the good equilibrium described in Greenberg’s example would not obtain if players 1 and 2 were required to have a common probabilistic belief about 3’s choice. Of course, this profile would not be ruled out by a solution concept that allows for “disparate” beliefs off the path of play, for instance, self-confirming equilibrium
Overview of economic applications
295
(Fudenberg and Levine, 1993), subjective equilibrium (Kalai and Lehrer, 1994), or extensive form rationalizability (Bernheim, 1984; Pearce, 1984). What ambiguity aversion adds, compared to these solution concepts, is a positive theory as to why the players (1 and 2) would choose to behave as if they had particular differing probabilistic beliefs even though they are commonly informed. While Greenberg does not give a formal definition of equilibrium for extensive form games where players may be uncertainty averse, Lo (1999) does. However, Lo stops short of considering the question of extensive form refinements. Hence, determining reasonable restrictions on beliefs about off-equilibrium play, while allowing them to be ambiguous, remains an exciting open question, hopefully to be taken up in future research. Analysis of behavior in auctions has been a prime area of application of game theory, especially in recent years. Traditional analysis of auctions assumes that the seller’s and each bidder’s beliefs about rival bidders’ valuations are represented by probability measures. Lo (1998) makes an interesting departure from this tradition by proceeding on the assumption that such beliefs are instead described by sets of multiple priors and that players are uncertainty averse (in the sense of MEU). Lo analyzes first and second price sealed bid auctions with independent private values. In his most significant finding, he shows, under an interesting parametric specification of beliefs, that the first price auction Pareto dominates the second price auction. A rough intuition for the result is as follows. Suppose the seller and the bidders are commonly informed about “others’” valuations, that is, the information is described by the same set of probabilistic priors.
When the seller is considering which of the two auction formats to adopt, the first or the second price sealed bid auction, he evaluates his options using that probabilistic prior (from the “common” set of priors) which reflects his worst fears, namely, that bidders have low valuations. Recall that bidders always bid their true valuation in the second price auction. Therefore, the usual Revenue Equivalence Theorem implies that the seller would be indifferent between the two auction formats if bidders’ strategies (in the first price auction) were based on the same probabilistic prior that the seller effectively applies in evaluating his own options. However, given uncertainty aversion, bidders will behave as if the probability relevant for their purposes is the one that reflects their apprehensions: the fear that their rivals have high valuations. This means that, under uncertainty aversion, the optimal bid will be higher. On the other hand, because of his apprehensions, the seller will choose a reserve price for the first price auction that is strictly lower than the one he chooses for the second price auction. Hence, the first price auction format is Pareto preferred to the second price format. Ozdenoren (2002) generalizes Lo’s results by relaxing the parametric restriction on beliefs significantly. These successful investigations of behavior in auctions point to the potential for further research to understand the behavioral implications of uncertainty aversion in incomplete information environments in general, and in implementation theory in particular. Strategic interaction in incomplete information environments would appear to be a particularly appropriate setting for investigation, since the scope for ambiguity to play a role is far greater than is possible under equilibrium restrictions in complete information settings. More
particularly, Bayesian implementation theory has frequently been criticized for prescribing schemes which are “too finely tuned” to the principal’s and agents’ knowledge of the precise prior/posteriors. Perhaps introducing the effect of uncertainty aversion will lead to the rationalization of schemes which are more robust in this respect (and, hopefully, to an understanding of implementation schemes which are more implementable in the real world!). Eichberger and Kelsey (2002) apply Dow and Werlang’s notion of equilibrium to analyze the effect of ambiguity aversion on a public good contribution game. They show that it is possible to sustain as an equilibrium (under ambiguity) a strategy profile which involves higher contributions than is possible under standard beliefs. The rough idea is as follows. Recall our discussion about how the Dow and Werlang notion may allow a non-Nash profile to be the support of equilibrium beliefs. Working with the CEU model, Eichberger and Kelsey construct an equilibrium belief profile wherein each player i behaves as if there is a chance that another player j plays a strategy lying outside the support of i’s equilibrium belief, in particular, a strategy that is “bad” for i, that is, one that makes a contribution lower than the equilibrium contribution. Essentially, it is this “fear” of lower contributions by others, given the strategic uncertainty, which drives up i’s equilibrium contribution. The chapter also extends the analysis from public good provision games to more general games classified in terms of strategic substitutability and complementarity. Ghirardato and Katz (2002) apply the MEU framework to the analysis of voting behavior to give an explanation of the phenomenon of selective abstention. Consider a multiple office election, that is, an election in which the same ballot form asks for the voter’s opinion on multiple electoral races.
It is typically observed in such elections that voters choose to vote on the higher profile races (say, the state governor) but simultaneously abstain from expressing an opinion on other races (say, the county sheriff). Ghirardato and Katz’s objective is to formalize a commonly held intuition that the reason voters selectively abstain is that they believe they are relatively poorly informed about the candidates involved in the low profile races. The chapter contends that the key to the intuition is the issue of modeling the sensitivity of a decision maker’s choice to what he perceives to be the quality of his information, and that this issue cannot be adequately addressed in a standard Bayesian framework. On the other hand, they argue, it can be addressed in a decision theoretic framework which incorporates ideas of ambiguity; roughly put, information about an issue is comparatively more ambiguous and of lower quality if it is represented by a more encompassing set of priors. This point would seem to be of wider interest and worth pursuing in future research.
13.5. Other applications Finally, we survey some applications of the tools developed by David Schmeidler that do not, per se, involve decision making under uncertainty. The applications covered relate to the measurement of inequality, intertemporal choice and multi-attribute choice.
Inequality measurement. Decision theory under risk and the theory of inequality measurement essentially deal with the same mathematical objects (probability distributions). Therefore, these two fields are closely related, and their relationship has long been acknowledged.3 However, surprisingly enough, almost all the literature on inequality measurement deals with certain incomes. This is probably due, in part, to a widely held opinion that the problem of measuring inequality of uncertain incomes can be reduced to a problem of individual choice under uncertainty (e.g. by first computing in each state a traditional welfare function à la Atkinson-Kolm-Sen, and then reducing the problem to a single decision maker’s choice among prospects of welfare) or alternatively to a problem of inequality measurement over sure incomes (e.g. by evaluating each individual’s welfare by his expected utility, and then considering the distribution of the certainty equivalent incomes). In a path-breaking paper, Ben Porath et al. (1997) show that this is not the case, and that inequality and uncertainty should be analyzed jointly and not separately in two stages. They present the following example, which serves to illustrate their point. Consider a society with two individuals, a and b, facing two equally likely possible states of the world, s and t, and assume that the planner has to choose among the three following social policies, P1, P2, and P3 (entries give each individual’s income in each state):

        P1          P2          P3
      s    t      s    t      s    t
  a   0    1      1    0      1    1
  b   0    1      0    1      0    0
Observe that in P1, both individuals face the same income prospects as in P2; but in P1, there is no ex post inequality, whatever the state of the world. This could lead one to prefer P1 over P2. Similarly, P2 and P3 are ex post equivalent, since in both cases, whatever the state of the world, the final income distribution is identical; but P3 gives 1 for sure to one individual and 0 to the other, while P2 provides both individuals with the same ex ante income prospects. On these grounds, it is reasonable to think that P2 should be ranked above P3. Thus, a natural ordering would be P1 ≻ P2 ≻ P3. Ben Porath et al. (1997) point out that there is no hope of obtaining such an ordering by two-stage procedures. Indeed, the first two-stage procedure (mentioned earlier) would lead us to neglect ex ante considerations and to judge P2 and P3 as equivalent. In contrast, the second procedure would lead us to neglect ex post considerations and to see P1 and P2 as equivalent. In other words, these procedures would fail to take into account the ex ante and the ex post income distributions simultaneously. They suggest solving this problem by considering a linear combination of the two procedures, that is, a linear combination of the expected Gini index and the Gini index of expected income. This solution captures both ex ante and ex post inequalities. Furthermore, it is a natural generalization of the principles commonly used for evaluating inequality under certainty on the one hand, and for decision making under uncertainty on the other.
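The failure of the two-stage procedures, and the proposed fix, can be checked in a few lines. The sketch below computes the linear combination of the expected Gini index and the Gini index of expected income for the three policies, with equally likely states; the Gini formula is the standard one, and the weight lam is an illustrative assumption. Since all three policies distribute the same expected total income, a lower index means a less unequal, hence preferred, policy:

```python
# Sketch of the criterion in Ben Porath et al. (1997): a linear combination
# of the expected Gini index and the Gini index of expected income.
# The weight `lam` is an illustrative assumption.

def gini(x):
    """Gini index: mean absolute income difference over twice the mean."""
    n, mu = len(x), sum(x) / len(x)
    if mu == 0:
        return 0.0
    return sum(abs(a - b) for a in x for b in x) / (2 * n * n * mu)

def index(policy, lam=0.5):
    """policy: mapping from equally likely states to income profiles (a, b)."""
    states = list(policy.values())
    expected_gini = sum(gini(x) for x in states) / len(states)
    expected_income = [sum(x[i] for x in states) / len(states)
                       for i in range(len(states[0]))]
    return lam * expected_gini + (1 - lam) * gini(expected_income)

P1 = {"s": (0, 0), "t": (1, 1)}   # same prospects as P2, no ex post inequality
P2 = {"s": (1, 0), "t": (0, 1)}   # ex post identical to P3, equal ex ante prospects
P3 = {"s": (1, 0), "t": (1, 0)}   # gives 1 for sure to one individual

print([index(P) for P in (P1, P2, P3)])  # [0.0, 0.25, 0.5]
```

The expected-Gini term alone cannot separate P2 from P3, and the Gini-of-expected-income term alone cannot separate P1 from P2; the combination ranks P1 above P2 above P3.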
The procedure suggested in Ben Porath et al. (1997) is not the only possible evaluation principle that takes into account both ex ante and ex post inequalities. Any functional that is increasing in both the inequality of individuals’ expected incomes and the snapshot inequalities (say, measured by the Gini index) has the same nice property, provided that it takes its values between the expected Gini and the Gini of the expectation. Furthermore, it is unclear why we should restrict ourselves, as Ben Porath et al. (1997) did, to decision makers who behave in accordance with the multiple-priors model. Finally, they do not provide an axiomatization for the specific functional forms they propose. This problem is partially dealt with in Gajdos and Maurin (2004), who provide an axiomatization for a broad class of functionals that can accommodate Ben Porath, Gilboa, and Schmeidler’s example. Intertemporal choice and multi-attribute choice. Schmeidler’s models of decision under uncertainty have also been shown to hold new insights when applied to decision making contexts and questions that do not (necessarily) involve uncertainty, for instance intertemporal decision making under certainty. In terms of abstract construction, an intertemporal decision setting is essentially the same as that of decision making under uncertainty, with time periods replacing states of nature. Gilboa (1989) transposes the CEU model of decision under uncertainty to intertemporal choices. He shows that, using the axiomatization of Schmeidler (1989) and adding an axiom called the “variation preserving sure-thing principle,” the decision rule is given by a weighted average of the utility in each period and the utility variation between consecutive periods. Aversion toward uncertainty is now replaced by aversion toward variation of the consumption profile over time.
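A criterion of this kind can be sketched as follows; the weights alpha and beta and the utility streams are illustrative assumptions, with beta > 0 encoding aversion to variation:

```python
# Sketch of a variation-averse intertemporal criterion in the spirit of
# Gilboa (1989): a weighted combination of per-period utility and of the
# utility variation between consecutive periods. Weights are assumptions.

def variation_averse_value(utils, alpha=1.0, beta=0.5):
    level = sum(utils)
    variation = sum(abs(utils[t] - utils[t - 1]) for t in range(1, len(utils)))
    return alpha * level - beta * variation

smooth = [1.0, 1.0, 1.0, 1.0]
volatile = [2.0, 0.0, 2.0, 0.0]   # same total utility, higher variation
print(variation_averse_value(smooth), variation_averse_value(volatile))  # 4.0 1.0
```

The two streams have the same total utility, so an additively separable criterion would be indifferent between them; the variation term makes the smooth profile strictly preferred.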
In a similar vein, Wakai (2001) uses the idea that agents dislike time-variability of consumption and axiomatizes a notion of non-separability in the decision criterion. He then goes on to show how such a decision criterion modifies consumption smoothing and can help provide an explanation for the equity premium and risk-free rate puzzles. Marinacci (1998) also transposes the Gilboa and Schmeidler (1989) model to intertemporal choice and axiomatizes complete patience via a new choice criterion (the Polya index). De Waegenaere and Wakker (2001) generalize the Choquet integral to a signed Choquet integral, which captures violations of both separability and monotonicity. This tool can be used to model agents whose aversion toward volatility of their consumption is so high that they could prefer a uniformly smaller profile of consumption if it entails sufficiently less volatility. A work of related interest is Shalev (1997), which uses a mathematically similar technique to represent preferences incorporating a notion of loss aversion in an explicitly intertemporal setting, that is, one where the objects of choice are intertemporal income/consumption streams and the decision maker is averse to consumption decreasing between successive periods. In the context of decision making under uncertainty, the Choquet integral may be viewed as a way of aggregating utility across different states in order to arrive at an (ex ante) decision criterion. Multi-attribute choice concerns the question of aggregating over different attributes, or characteristics, of commodities in order to formulate an appropriate decision criterion for choosing among the multi-attributed
objects (see for instance Grabisch (1996) and Dubois et al. (2000)). The use of variants of the Choquet integral allows some flexibility in the way attributes are weighted and combined. In an interesting paper, Nehring and Puppe (2002) use the multi-attribute approach to model (bio-)diversity. In doing so, they develop tools to measure diversity, based on the notion of the (Möbius) inverse of a capacity. Interestingly, this is also related to another line of research developed by Gilboa and Schmeidler, namely Case-Based Decision Theory (Gilboa and Schmeidler, 1995, 2001).
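As a concrete illustration of the aggregation step, here is a minimal sketch of the Choquet integral of a payoff vector with respect to a capacity, using the standard rank-ordered formula; the capacity below is an illustrative convex example, so the weight “missing” from the singletons falls on the worse outcome:

```python
# Choquet integral by the rank-ordered formula: sort states from best to
# worst payoff and weight each payoff by the increment in the capacity of
# the growing upper-level set.

def choquet(payoff, capacity):
    """payoff: dict state -> utility; capacity: dict frozenset(states) -> weight."""
    states = sorted(payoff, key=payoff.get, reverse=True)  # best payoff first
    total, prev = 0.0, 0.0
    upper = set()
    for s in states:
        upper.add(s)
        w = capacity[frozenset(upper)]
        total += payoff[s] * (w - prev)
        prev = w
    return total

# A convex (ambiguity-averse) capacity: singletons get 0.25 each, not 0.5.
v = {frozenset({"a"}): 0.25, frozenset({"b"}): 0.25, frozenset({"a", "b"}): 1.0}
print(choquet({"a": 1.0, "b": 0.0}, v))  # 0.25, below the additive value 0.5
```

With an additive capacity the formula reduces to ordinary expected utility; with a convex capacity, as here, the evaluation tilts toward the worst outcome, which is the CEU expression of uncertainty aversion.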
13.6. Concluding remarks This was a review of a sample of the rich, varied, and very much “alive” literature that has been inspired by David Schmeidler’s path breaking contributions to decision theory. We do want to emphasize that it is very much a sample; the list of papers discussed or cited is far from exhaustive. Nevertheless, we hope the following conclusions are justifiable even on the basis of the limited discussion. First, thinking of decision making under uncertainty in the way Schmeidler’s models prompt and allow us to do, incorporating issues such as the sensitivity of decision makers to the ambiguity of information, does lead to new and important insights about economic phenomena. Second, while it has long been suspected that issues like ambiguity could be of significance in economics, the great merit of Schmeidler’s contribution has been to provide us with tools that have allowed us to develop tractable models to investigate these intuitions in formal exercises, per standard practice in economic theory. The opportunity we have had is unique. There are several other branches of decision theory that depart from the standard expected utility paradigm, but comparatively, these branches have seen far less applied work. One cannot help but speculate that the relative success of the ambiguity paradigm is in no small measure due to the tractability of Schmeidler’s models. Tractability is a hallmark of a classic model. Lastly, we hope the review has demonstrated that there are many important and exciting questions that remain unanswered. Indeed, enough progress has been made on modeling issues to give us grounds for optimism that the answers to the open questions cannot be far away; such questions are definitely worth investigating. While we have mentioned several issues worthy of future investigation in the course of our discussions, there are a couple of issues that we have not touched upon.
One, we hope there will be more empirical work directed at testing predictions, and also the basis of some of the assumptions, for instance in financial markets. Two, as would have been evident in our survey, most of the work has been of the “positive” variety, meant to help us “understand” puzzling phenomena. Perhaps we should also think about more normative questions: for instance, “What is a good way of deciding between alternatives, in a particular applied context, when information is ambiguous?” The work of Hansen, Sargent, and their coauthors is one notable exception, but more is needed, perhaps in fields like environmental policy making where, arguably, information is in many instances ambiguous.
Acknowledgment We thank Thibault Gajdos and Itzhak Gilboa for helpful comments.
Notes 1 See Mukerji and Tallon (2003) for a derivation of this inertia property from a primitive notion of ambiguity without relying on a parametric preference model. 2 See for instance Hansen et al. (1999), Hansen et al. (2001) and their book Hansen and Sargent (2004). 3 For instance, it is easy to check that the well-known Gini index relies on a Choquet integral (with respect to a symmetric capacity). Indeed, as recalled in the introduction to this volume, the first axiomatization of a rank dependent model was provided in the framework of inequality measurement (Weymark, 1981). We will not expand on this literature, which has very close links with the RDEU model, here.
References Anderson, E., L. Hansen, and T. Sargent (2003). “A Quartet of Semigroups for Model Specification Robustness, Prices of Risk, and Model Detection,” Journal of the European Economic Association. (Reprinted as Chapter 16 in this volume.) Arrow, K. (1965). “The Theory of Risk Aversion,” in Aspects of the Theory of Risk Bearing. Yrjo J. Saatio, Helsinki. Ben Porath, E., I. Gilboa, and D. Schmeidler (1997). “On the Measurement of Inequality under Uncertainty,” Journal of Economic Theory, 75, 443–467. (Reprinted as Chapter 22 in this volume.) Bernheim, D. (1984). “Rationalizable Strategic Behavior,” Econometrica, 52, 1007–1028. Bewley, T. (1986). “Knightian Decision Theory: Part I,” Discussion Paper 807, Cowles Foundation. Billot, A., A. Chateauneuf, I. Gilboa, and J.-M. Tallon (2000). “Sharing Beliefs: Between Agreeing and Disagreeing,” Econometrica, 68(3), 685–694. (Reprinted as Chapter 19 in this volume.) Carlton, D. W. (1979). “Vertical Integration in Competitive Markets Under Uncertainty,” Journal of Industrial Economics, 27(3), 189–209. Chateauneuf, A., R. Dana, and J.-M. Tallon (2000). “Optimal Risk Sharing Rules and Equilibria with Choquet-Expected-Utility,” Journal of Mathematical Economics, 34, 191–214. Chen, Z. and L. Epstein (2002). “Ambiguity, Risk, and Asset Returns in Continuous Time,” Econometrica, 70, 1403–1443. De Waegenaere, A. and P. Wakker (2001). “Nonmonotonic Choquet Integrals,” Journal of Mathematical Economics, 36, 45–60. Dow, J. and S. Werlang (1992). “Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio,” Econometrica, 60(1), 197–204. (Reprinted as Chapter 17 in this volume.) —— (1994). “Nash Equilibrium Under Knightian Uncertainty: Breaking Down Backward Induction,” Journal of Economic Theory, 64(2), 305–324. Dubois, D., M. Grabisch, F. Modave, and H. Prade (2000). “Relating Decision Under Uncertainty and Multicriteria Decision Making Models,” International Journal of Intelligent Systems, 15(10), 967–979.
Eichberger, J. and D. Kelsey (2002). “Strategic Complements, Substitutes, and Ambiguity: The Implications for Public Goods,” Journal of Economic Theory, 106(2), 436–466. Epstein, L. and J. Miao (2003). “A Two-person Dynamic Equilibrium Under Ambiguity,” Journal of Economic Dynamics and Control, 27, 1253–1288. Epstein, L. and M. Schneider (2003). “Recursive Multiple-Priors,” Journal of Economic Theory, 113, 1–31. Epstein, L. and T. Wang (1994). “Intertemporal Asset Pricing Under Knightian Uncertainty,” Econometrica, 62(3), 283–322. (Reprinted as Chapter 18 in this volume.) Fudenberg, D. and D. K. Levine (1993). “Self-Confirming Equilibrium,” Econometrica, 61, 523–545. Gajdos, T. and E. Maurin (2004). “Unequal Uncertainties and Uncertain Inequalities: An Axiomatic Approach,” Journal of Economic Theory, 116(1), 93–118. Ghirardato, P. (1994). “Agency Theory with Non-additive Uncertainty,” mimeo. Ghirardato, P. and J. N. Katz (2002). “Indecision Theory: Quality of Information and Voting Behavior,” Discussion Paper 1106, California Institute of Technology, Pasadena. Gilboa, I. (1989). “Expectation and Variation in Multi-period Decisions,” Econometrica, 57, 1153–1169. Gilboa, I. and D. Schmeidler (1989). “Maxmin Expected Utility with a Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.) —— (1995). “Case-based Decision Theory,” Quarterly Journal of Economics, 110, 605–639. —— (2001). A Theory of Case-Based Decisions. Cambridge University Press, Cambridge. Grabisch, M. (1996). “The Application of Fuzzy Integrals in Multicriteria Decision Making,” European Journal of Operational Research, 89, 445–456. Greenberg, J. (2000). “The Right to Remain Silent,” Theory and Decision, 48(2), 193–204. (Reprinted as Chapter 21 in this volume.) Hansen, L. and T. Sargent (2004). Robust Control and Economic Model Uncertainty. Princeton University Press. Hansen, L., T. Sargent, and T. Tallarini (1999).
“Robust Permanent Income and Pricing,” Review of Economic Studies, 66, 873–907. Hansen, L., T. Sargent, G. Turmuhambetova, and N. Williams (2001). “Robustness and Uncertainty Aversion,” mimeo. Henry, H. (2001). “Generalized Entropy Measures of Ambiguity and its Estimation,” Working paper, Columbia University. Kalai, E. and E. Lehrer (1994). “Subjective Games and Equilibria,” Games and Economic Behavior, 8, 123–163. Klibanoff, P. (1996). “Uncertainty, Decision, and Normal Form Games,” mimeo, Northwestern University. Kogan, L. and T. Wang (2002). “A Simple Theory of Asset Pricing Under Model Uncertainty,” Working paper, University of British Columbia. Lo, K. C. (1996). “Equilibrium in Beliefs Under Uncertainty,” Journal of Economic Theory, 71(2), 443–484. (Reprinted as Chapter 20 in this volume.) —— (1998). “Sealed Bid Auctions with Uncertainty Averse Bidders,” Economic Theory, 12, 1–20. —— (1999). “Extensive Form Games with Uncertainty Averse Players,” Games and Economic Behavior, 28, 256–270. Lucas, R. (1978). “Asset Prices in an Exchange Economy,” Econometrica, 46, 1429–1445.
Marinacci, M. (1998). “An Axiomatic Approach to Complete Patience,” Journal of Economic Theory, 83, 105–144. —— (1999). “Limit Laws for Non-additive Probabilities, and their Frequentist Interpretation,” Journal of Economic Theory, 84, 145–195. —— (2000). “Ambiguous Games,” Games and Economic Behavior, 31(2), 191–219. Mukerji, S. (1998). “Ambiguity Aversion and Incompleteness of Contractual Form,” American Economic Review, 88(5), 1207–1231. (Reprinted as Chapter 14 in this volume.) —— (2002). “Ambiguity Aversion and Cost-Plus Contracting,” Discussion paper, Department of Economics, Oxford University. Mukerji, S. and H. S. Shin (2002). “Equilibrium Departures from Common Knowledge in Games with Non-additive Expected Utility,” Advances in Theoretical Economics, 2(1), http://www.bepress.com/bejte/advances/vol2/iss1/art2. Mukerji, S. and J.-M. Tallon (2001). “Ambiguity Aversion and Incompleteness of Financial Markets,” Review of Economic Studies, 68(4), 883–904. (Reprinted as Chapter 15 in this volume.) —— (2003). “Ellsberg’s 2-Color Experiment, Portfolio Inertia and Ambiguity,” Journal of Mathematical Economics, 39(3–4), 299–316. —— (2004a). “Ambiguity Aversion and the Absence of Indexed Debt,” Economic Theory, 24(3), 665–685. —— (2004b). “Ambiguity Aversion and the Absence of Wage Indexation,” Journal of Monetary Economics, 51(3), 653–670. Nehring, K. and C. Puppe (2002). “A Theory of Diversity,” Econometrica, 70, 1155–1198. Ozdenoren, E. (2002). “Auctions and Bargaining with a Set of Priors”. Pearce, D. (1984). “Rationalizable Strategic Behavior and the Problem of Perfection,” Econometrica, 52, 1029–1050. Rigotti, L. and C. Shannon (2004). “Uncertainty and Risk in Financial Markets,” Discussion Paper, University of California, Berkeley, forthcoming in Econometrica. Routledge, B. and S. Zin (2001). “Model Uncertainty and Liquidity,” NBER Working paper 8683, Carnegie Mellon University. Ryan, M. J. (1999). “What Do Uncertainty-Averse Decision-Makers Believe?,” mimeo.
University of Auckland. Schmeidler, D. (1989). “Subjective Probability and Expected Utility Without Additivity,” Econometrica, 57(3), 571–587. (Reprinted as Chapter 5 in this volume.) Shalev, J. (1997). “Loss Aversion in a Multi-Period Model,” Mathematical Social Sciences, 33, 203–226. Shelanski, H. and P. Klein (1999). “Empirical Research in Transaction Cost Economics: A Review and Assessment,” in Firms, Markets and Hierarchies: A Transactions Cost Economics Perspective, ed. by G. Carroll and D. Teece, Chap. 6. Oxford University Press. Uppal, R. and T. Wang (2003). “Model Misspecification and Under-diversification,” Journal of Finance, 58(6), 2465–2486. Wakai, K. (2001). “A Model of Consumption Smoothing with an Application to Asset Pricing,” Working paper, Yale University. Wang, T. (2003). “A Class of Multi-prior Preferences,” Discussion paper, University of British Columbia. Weymark, J. (1981). “Generalized Gini Inequality Indices,” Mathematical Social Sciences, 1, 409–430.
14 Ambiguity aversion and incompleteness of contractual form Sujoy Mukerji
State-contingent contracts record agreements about the rights and obligations of contracting parties in uncertain future scenarios: descriptions of possible future events are listed, and the action to be taken by each party on the realization of each listed contingency is specified. Casual empiricism suggests that a real-world contract is very often incomplete in the sense that it may not include any instruction for some possible events. The actual actions to be taken in such events are thus left to ex post negotiation. The fact that real-world contracts are incomplete explains the working of many economic institutions (see surveys by Jean Tirole (1994), Oliver D. Hart (1995), and James M. Malcomson (1997)). Take for instance the institution of the business firm. What determines whether all stages of production will take place within a single firm or will be coordinated through markets? In a world of complete contingent contracts there is no benefit from integrating activities within a single firm as opposed to transacting via the market. However, contractual incompleteness can explain why integration might be desirable and, more generally, why the allocation of authority and of ownership rights matters. This insight, developed by Ronald H. Coase (1937), Herbert A. Simon (1951), Benjamin Klein et al. (1978), Oliver E. Williamson (1985), and Sanford J. Grossman and Hart (1986), among others, underscores the importance of understanding why, and in what circumstances, contracts are left incomplete. Traditionally, incompleteness of contracts has been explained by appealing to a combination of transactions costs (e.g. Williamson, 1985) and bounded rationality (e.g. Barton L. Lipman, 1992; Luca Anderlini and Leonardo Felli, 1994). These rationalizations validate the incompleteness as an economizing action, on the hypothesis that including more detail in a contract involves direct costs.
This chapter will provide an alternative explanation based on the hypothesis that decision behavior under subjective uncertainty is affected by ambiguity aversion. Indeed, it will be assumed throughout that there is no direct cost to introducing a marginal contingent instruction into a contract. Suppose an agent’s subjective knowledge about the likelihood of contingent events is consistent with more than one probability distribution, and further that
American Economic Review, 88 (1998), pp. 1207–32.
what the agent knows does not inform him of a precise (second-order) probability distribution over the set of “possible” probabilities. We say then that the agent’s beliefs about contingent events are characterized by ambiguity. If ambiguous,1 the agent’s beliefs are captured not by a unique probability distribution in the standard Bayesian fashion but instead by a set of probabilities, any one of which could be the “true” distribution. Thus not only is the particular outcome of an act uncertain but so is the expected payoff of the action, since the payoff may be measured with respect to more than one probability. An ambiguity-averse decision maker evaluates an act by the minimum expected value that may be associated with it. Thus the decision rule is to compute all possible expected values for each action and then choose the act which has the best minimum expected outcome. The idea is that the more an act is adversely affected by the ambiguity, the less its appeal to the ambiguity-averse decision maker. The formal analysis in this chapter basically involves a reconsideration of the canonical model of a vertical relationship between two contracting firms, under the assumption that the agents’ common information about the contingent events is ambiguous and that the agents are ambiguity averse. Next, I preview this exercise with a simple example. Consider two vertically related risk-neutral firms, B and S. B is an automobile manufacturer planning to introduce a new line of models. B wishes to purchase a consignment of car bodies (tailor-made for the new models) from S. The firms may sign a contract at some initial date 0 specifying the terms of trade of the sale at date 2; that is, whether trade takes place and at what price. The gains from trade are contingent upon the state of nature realized at date 1. There are three possible contingencies, ω0, ωb, ωs, with corresponding tradeable surpluses s0, sb, ss.
After date 0 but before date 1, S invests in research for a die that will efficiently cast car bodies required for the new model while B invests effort to put together an appropriate marketing campaign for the new model. The investments affect the likelihood of realizing a particular state of nature. Each firm may choose between a low and a high level of investment effort. The investments are not contractible per se but the terms of trade specification may be made as contingency specific as required. In the case that the contract is incomplete and an “unmentioned” event arises with sure potential for surplus, it is commonly anticipated by the parties that trade will be negotiated ex post and the surplus split evenly. Consider the two possibilities X and Y : X—there is a longer list of reservations for the new model than for comparable makes and at a price higher than those for comparable makes; Y —the variable cost of production of car bodies is low. The state of the world ω0 is characterized by the fact that both the statements are false. At ωb , X is true but not Y ; conversely, at ωs , X is false but Y holds.2 Correspondingly, suppose s0 < sb = ss . The common belief about the likelihood of ωb is at the margin affected (positively) more by B choosing the high investment effort over low effort than by S doing the same, while the opposite is true of ωs . As is customary, we define a (first-best) efficient investment profile as one that would be chosen if investment effort were verifiable and contractible. Bear in mind the allowance of being able to write complete contingent contracts and the institutional setting of a vertical interfirm relationship. As will be
Incompleteness of contractual form
formally argued in a subsequent section, given all this and that decision makers are subjective expected utility (henceforth, SEU) maximizers, the non-verifiability of investment will not impede efficiency. In our example, for instance, a contract which distinguishes the three contingencies and sets prices that reward B sufficiently more at ωb than at other contingencies (and similarly reward S at ωs) will enforce the first-best effort profile. The general conclusion is that if agents are SEU maximizers then an incomplete contract which implements an inefficient profile cannot be rationalized. Such a contract can never be optimal, because it will be possible to find a complete contract that dominates it (i.e. a contract that obtains higher ex ante payoffs for both parties). However, this conclusion is overturned if agents are ambiguity averse. The logic of this may be seen by reevaluating the above-mentioned example with the sole amendment that agents are ambiguity averse. To provide sufficient incentive to take the efficient investment, the ex post payoffs in the contract have to treat the two firms asymmetrically at ωb and ωs: for B the payoff is higher at ωb than at ωs, while it is the other way around for S. This implies that the firms must use different probability distributions to evaluate their expected payoffs. From the set of probabilities embodying the firms’ symmetric information, B measures its payoffs using a probability distribution that puts a relatively higher weight on ωs than the distribution S thinks prudent to check its payoff against. Consequently, the sum of the expected payoffs will fall short of the expected total surplus—there is a “virtual loss” of the expected surplus. It follows that if this “loss” is large enough the participation constraints will break, thereby making such a contract impossible.
An incomplete contract, say the null contract (one that leaves all allocation of contingent surplus to ex post negotiation), is not similarly vulnerable to ambiguity aversion. Such a contract will lead to a proportionate division of surplus at each contingency, implying that each firm will use the same probability to evaluate its payoffs. Additivity of the standard expectation operator then ensures that no “virtual loss” occurs. It will be shown that from all this it follows that there will be parametric configurations for which an incomplete contract, even though only implementing an inefficient investment profile, is not dominated by any other contract. Under such circumstances the market transaction, if maintained, may justifiably be conducted with an “inefficient” incomplete contract. The “inefficiency” of the market transaction would also explain why it might be abandoned in favor of vertical integration. Why might an explanation like the one given earlier be of interest? The final section of the chapter will discuss historic instances of vertical mergers and empirical regularities about supply contracts that are understandable on the basis of ambiguity aversion, but are not well explained by “physical” transactions costs of writing contingent details into contracts. A recurrent claim among business people is that they integrate vertically because of uncertainty in input supply. This idea has always caused difficulties for economists (see, for instance, Dennis W. Carlton, 1979) who have been unable to rationalize it and have generally regarded it as misguided (see, however, George Baker et al., 1997). The analysis in the present chapter explains how the idea of ambiguity aversion provides one precise understanding of the link between uncertainty and vertical integration.3 Moreover, since
violations of SEU in general, and evidence of ambiguity aversion in particular, have long been noted in laboratory settings, it is worth uncovering what implications such “pathologies” have for “real-world” economics outside the laboratory. The exercise in this chapter is at least partly inspired by this thought. Finally, at a more abstract level, a significant insight obtained is that even if there were no direct cost to conditioning contractual terms on “finely described” events, one may well end up with only “coarse” arrangements because the value of fine-tuning is not robust to the agents’ misgivings that they have only a vague assessment of the likelihoods of the relevant “fine” events. The rest of the chapter is organized as follows: Section 14.1 introduces the framework of the formal decision model and its underlying motivation; Section 14.2 analyses the holdup model under the assumption of ambiguity aversion; Section 14.3 concludes the chapter with a discussion of the empirical significance of the results. The Appendix contains the formal proofs. Those eager for a first pass at the arithmetic of the main results may wish to look at Example 14.2 (which basically fleshes out the above example) in Section 14.2.
14.1. An introduction to the model of decision-making by ambiguity-averse agents

It is often the case that a decision maker’s (DM) perception of the uncertain environment is ambiguous in the sense that his knowledge is consistent with more than one probability function. The theory of ambiguity aversion is inspired by two simple hypotheses about decision-making in such situations. First, that behavior is influenced by ambiguity: that is, the DM’s behavior actually reflects the fact that his guess about a likelihood may be given by a probability interval. By presumption, agents do not necessarily behave as if they have reduced all their ambiguity to a belief consistent with a unique probability using a “second-order” probability over the different probability distributions consistent with their knowledge. Second, that agents are ambiguity averse. That is, ceteris paribus, the more ambiguous their knowledge of the uncertainty the more conservative is their choice. David Schmeidler (1989) pioneered an axiomatic derivation of a model of DMs with preferences incorporating ambiguity aversion. This chapter uses Schmeidler’s model, termed the Choquet expected utility (CEU) model, in the formal arguments. The DM’s domain of uncertainty is the finite state space Ω = {ω1, . . . , ωN}. The DM chooses between acts whose payoffs are state contingent: for example, an act f, f : Ω → R. In the CEU model an ambiguity-averse DM’s subjective belief is represented by a convex non-additive probability function, π. Like a standard probability function it assigns to each event a number between 0 and 1, and it is also true that (i) π(∅) = 0 and (ii) π(Ω) = 1. Where a convex nonadditive probability function differs from a standard probability is in the third property, (iii) π(X ∪ Y) ≥ π(X) + π(Y) − π(X ∩ Y), for all X, Y ⊂ Ω.
By this third property,4 the measure of the union of two disjoint events may be greater than the sum of the measure of each individual event.5 A convex nonadditive probability function is actually a parsimonious representation of the full range of
probabilities compatible with the DM’s knowledge. π(X) is interpreted as the minimum possible probability of X. This is readily seen from the fact that a given convex nonadditive probability π corresponds to a unique convex set of probability functions identified by the core6 of π, denoted by Δ(π) (notation: Δ(Ω) is the set of all additive probability measures on Ω): Δ(π) = {πj ∈ Δ(Ω) | πj(X) ≥ π(X), for all X ⊂ Ω}. Hence, π(X) = min_{πj ∈ Δ(π)} πj(X). The convex nonadditive probability representation enables us to express the notion of ambiguity precisely. We say π is ambiguous if there are two events X, Y such that axiom (iii) holds with a strict inequality; π is unambiguous if axiom (iii) holds as an equality everywhere. (A DM with unambiguous belief is an SEU maximizer.) The ambiguity7 of the belief about an event X is measured by the expression A(π(X)) ≡ 1 − π(X) − π(X^c). The relation between π and Δ(π) shows that A is indeed a measure of the “fuzziness” of the belief, since A(π(X)) = max_{πj ∈ Δ(π)} πj(X) − min_{πj ∈ Δ(π)} πj(X). The DM evaluates the Choquet expectation of each act with respect to the nonadditive probability, and chooses the act with the highest evaluation. Given a convex nonadditive probability π, the Choquet expectation8 of an act is simply the minimum of all possible “standard” expected values obtained by measuring acts with respect to each of the additive probabilities in Δ(π), the core of π:
CE(f) = min_{πj ∈ Δ(π)} { Σ_{ωi ∈ Ω} f(ωi)πj(ωi) }.
The Choquet expectation of an act is just its standard expectation calculated with respect to a “minimizing probability” corresponding to this act. Hence, in the Choquet method the DM’s appraisal is not only informed by his knowledge of the odds but is also automatically adjusted downwards to the extent it may be affected by the imprecision of his knowledge. The fact that the same additive probability (in the core of the relevant nonadditive probability) will not in general “minimize” the expectation for two different acts explains why the Choquet expectations operator, unlike the standard operator, is not additive:

Property. For any two acts f, g: CE(f) + CE(g) ≤ CE(f + g).

Two acts are comonotonic if their outcomes are monotonic across the state space in the same way: that is, the acts f and g are comonotonic if for every ω, ω′ ∈ Ω, (f(ω) − f(ω′))(g(ω) − g(ω′)) ≥ 0. Clearly, for comonotonic acts the “minimizing probability” will be the same. Hence, the Choquet expectations operator is assuredly additive if the acts being considered are comonotonic, but not otherwise. The first example explains how noncomonotonicity may lead to the failure of additivity.
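The Choquet rule and the comonotonicity test above can be sketched in a few lines of code. The two-state capacity and the acts f, g, h below are hypothetical illustrations (they are not the chapter’s example); the capacity is convex since π({1}) + π({2}) < 1.

```python
def choquet(f, cap):
    """Choquet expectation of act f (state -> payoff) w.r.t. capacity `cap`,
    given as a dict from frozensets of states to values in [0, 1]."""
    order = sorted(f, key=f.get)  # states in ascending order of payoff
    # CE = sum_i f(w_(i)) * [cap(states with the i-th smallest payoff or more)
    #                        - cap(states with strictly larger payoffs)]
    return sum(f[w] * (cap[frozenset(order[i:])] - cap[frozenset(order[i + 1:])])
               for i, w in enumerate(order))

def comonotonic(f, g):
    """True if f and g never rank a pair of states in opposite order."""
    return all((f[a] - f[b]) * (g[a] - g[b]) >= 0 for a in f for b in f)

# Hypothetical convex capacity on two states: note 0.3 + 0.4 < 1.
cap = {frozenset(): 0.0, frozenset({1}): 0.3,
       frozenset({2}): 0.4, frozenset({1, 2}): 1.0}

f = {1: 10, 2: 20}
g = {1: 5, 2: 8}     # comonotonic with f
h = {1: 8, 2: 5}     # not comonotonic with f
fg = {w: f[w] + g[w] for w in f}
fh = {w: f[w] + h[w] for w in f}

# Additivity holds for the comonotonic pair, fails (as an inequality) otherwise.
print(comonotonic(f, g), abs(choquet(fg, cap) - choquet(f, cap) - choquet(g, cap)) < 1e-9)  # -> True True
print(comonotonic(f, h), choquet(f, cap) + choquet(h, cap) < choquet(fh, cap))              # -> False True
```

The minimizing probability for f and g puts the largest admissible weight on the low-payoff state 1; for h it shifts to state 2, which is what breaks additivity.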
Table 14.1 Relevant details about the contingent states

Possible states                           ωs              ωb
Nonadditive probability of the state      π(ωs) = 0.4     π(ωb) = 0.3
Total surplus in the state                100             100
Surplus designated for B in the state     40              60
Surplus designated for S in the state     60              40
Example 14.1. Two agents, B and S, are considering their respective payoffs from an agreement for sharing a contingent “surplus.” Table 14.1 indicates the (nonadditive) probability describing the common information about the uncertainty, the contingent surpluses, and the division of the surplus specified in the agreement. Given π(·), B’s expected payoff is obtained by taking expectations with respect to the relevant minimizing probability in the core of π: CE(b) = 0.7 × 40 + 0.3 × 60 = 46. Similarly, S’s expected payoff is CE(s) = 0.4 × 60 + 0.6 × 40 = 48. Finally, the expectation of the total surplus is CE(b + s) = 100. Clearly, CE(b) + CE(s) = 94 < CE(b + s). Note that the payoff vectors chosen for B and S are noncomonotonic. This is responsible for the fact that (given π) the minimizing probability corresponding to B’s payoffs (0.7, 0.3) is different from the one corresponding to S’s payoffs (0.4, 0.6)—hence the evident failure of additivity. Notice how b and s mutually “hedge” against the ambiguity. This is the “economic” intuition of why the “integrated” payoff given by (b + s) is relatively robust to ambiguity aversion (in the sense that its expected payoff is less affected by a possible mistake about the actual probability).9 In the example, the Choquet operator calculates expectations using the nonadditive probability directly by multiplying the nonadditive probability of each ωi with the payoff at the respective ωi and then multiplying the “residual” [π({ωs, ωb}) − π(ωs) − π(ωb)] with the minimum outcome of the act across the two states. Thus, CE(b) = 40 × π(ωs) + 60 × π(ωb) + min{40, 60} × [π({ωs, ωb}) − π(ωs) − π(ωb)] = 0.4 × 40 + 0.3 × 60 + 40 × 0.3 = 46. In general, the operator will associate the “residual” in an event with the worst outcome in the event. See Example 14.A.1 for further clarification.
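The arithmetic of Example 14.1 can be verified directly. This minimal sketch uses the numbers from Table 14.1; the two core endpoints p(ωs) = 0.4 and p(ωs) = 0.7 follow from the capacity, and the Choquet expectation is the smaller of the two expected values.

```python
# Verify Example 14.1.  With pi(ws) = 0.4, pi(wb) = 0.3, pi({ws, wb}) = 1,
# the core is the set of additive p with p(ws) in [0.4, 0.7]; the Choquet
# expectation of an act is attained at one of the two interval endpoints.
b = {"ws": 40, "wb": 60}   # B's contingent payoff
s = {"ws": 60, "wb": 40}   # S's contingent payoff

CORE_ENDPOINTS = [{"ws": 0.4, "wb": 0.6}, {"ws": 0.7, "wb": 0.3}]

def ce(f):
    """Choquet expectation: minimum expected value over the core."""
    return min(sum(p[w] * f[w] for w in f) for p in CORE_ENDPOINTS)

total = {w: b[w] + s[w] for w in b}
print(round(ce(b), 6), round(ce(s), 6), round(ce(total), 6))  # -> 46.0 48.0 100.0
# CE(b) + CE(s) = 94 < 100 = CE(b + s): the failure of additivity in the text.
# The direct Choquet formula gives the same number:
# 0.4*40 + 0.3*60 + (1 - 0.4 - 0.3)*min(40, 60) = 46.
```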
A convex nonadditive probability function expresses the idea that an agent’s knowledge of the true likelihood of an event E is less vague than his knowledge of the likelihood of a cell in a “fine” partition of E.10 It is common experience that the evidence and received knowledge that informs one of the likelihood of a “large”
event does not readily break down to give similar information about the “finer” constituent events. While it is routine to work out an “objective” next-day forecast of the probability of rain in the New York area, the same analytics generally would not yield a similar forecast for a particular borough in New York. Beliefs are not built “bottom up”: one typically does not figure out the belief about a “large” event by putting together the beliefs about all possible subevents. This rationale for nonadditive probabilities is formalized in Paolo Ghirardato (1994) and Mukerji (1997). These papers also point out how a similar rationale explains why the DM’s awareness that the precise implication of some contingencies is inevitably left unforeseen will typically lead to beliefs that have nonadditive representation. The papers explain the Choquet decision rule as a “procedurally rational” agent’s means of “handicapping” the evaluation of an act to the extent the estimate of its “expected performance” is adversely affected by his imprecise knowledge of the odds. There is considerable experimental evidence (see Colin F. Camerer and Martin Weber, 1992) demonstrating that DMs’ choices are influenced by ambiguity and also that aversion to the perceived ambiguity is typical. The classic experiment, due to Daniel Ellsberg (1961), runs as follows: There are two urns each containing one hundred balls. Each ball is either red or black. The subjects are told of the fact that there are fifty balls of each color in urn I . But no information is provided about the proportion of red and black balls in urn II. One ball is chosen at random from each urn. There are four events, denoted IR, IB, IIR, IIB, where IR denotes the event that the ball chosen from urn I is red, etc. On each of the events a bet is offered: $100 if the event occurs and $0 if it does not. The modal response is for a subject to prefer every bet from urn I (IR or IB) to every bet from urn II (IIR or IIB). 
That is, the typical revealed preference is IB ≻ IIB and IR ≻ IIR. (The preferences are strict.) The DM’s beliefs about the likelihood of the events, as revealed in the preferences, cannot be described by a unique probability distribution. The story goes: People dislike the ambiguity that comes with choice under uncertainty; they dislike the possibility that they may have the odds wrong and so make a wrong choice (ex ante). Hence they go with the gamble where they know the odds—betting from urn I. It is straightforward to check that the choice is consistent with convex nonadditive probabilities: For instance, let π(IR) = π(IB) and π(IR) + π(IB) = π(IR ∪ IB) = 1; also let π(IIR) = π(IIB), but allow π(IIR) + π(IIB) < π(IIR ∪ IIB) = 1. It follows that the expected payoff from betting on IR is CE(IR) = CE(IB) = 50; and CE(IIR) = CE(IIB) = π(IIR) × 100 = π(IIB) × 100 < 50. The theory of ambiguity aversion lends fresh insight into the analysis of important economic problems. Despite the relative novelty of the theory, there is already convincing evidence of this. Interesting applications in the area of finance include James P. Dow and Sergio R. Werlang (1992), Larry G. Epstein and Tan Wang (1994, 1995), and Jean-Marc Tallon (1998). Specific applications to strategic interaction
include Dow and Werlang (1994), Mukerji (1995), Jurgen Eichberger and David Kelsey (1996), and Kin Chung Lo (1998).
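The Ellsberg calculation above is easy to reproduce. In this sketch the value π(IIR) = π(IIB) = 0.4 is an illustrative assumption; the text only requires the two urn II values to be equal and to sum to less than 1.

```python
def ce_bet(pi_event, win=100, lose=0):
    """Choquet expectation of a bet paying `win` on the event and `lose`
    otherwise (win >= lose): lose + (win - lose) * pi(event)."""
    return lose + (win - lose) * pi_event

ce_urn1 = ce_bet(0.5)    # urn I: known 50/50, additive belief
ce_urn2 = ce_bet(0.4)    # urn II: pi(IIR) = pi(IIB) = 0.4 < 0.5 (illustrative)
print(ce_urn1, ce_urn2)  # -> 50.0 40.0
# Bets on urn I are strictly preferred, matching the modal Ellsberg choice.
```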
14.2. Ambiguous beliefs, investment holdup and incomplete contracts—a formal analysis

Consider a (downstream) buyer B who wishes to purchase a unit of a homogeneous input from an (upstream) seller S. B and S, who are assumed to be risk neutral, may sign a contract at an initial date 0. The contract will specify the terms of trade at date 2, which is when 0 or 1 unit of the input may be produced and traded. After date 0 but before date 1, B and S make relation-specific sunk investments β ∈ {βH, βL} and σ ∈ {σH, σL} respectively. Investments are like effort, in the sense of having an unverifiable component; and so it is supposed that β and σ are not contractible. The buyer’s valuation, v, and the seller’s production costs, c, are contingent on the state of the world ωi realized at date 1; ωi is drawn from a finite set Ω = {ω1, . . . , ωN}, and v = v(ωi), c = c(ωi). The (contingent) surplus at ωi is s(ωi) ≡ v(ωi) − c(ωi). The contingencies are indexed such that s(ωi) is (weakly) increasing in i. The vector of joint investments determines the belief (common information to B and S) about the likelihood of each contingent event, represented by a (possibly nonadditive and convex) probability distribution over 2^Ω. π(E|β, σ) is the probability that the event E ⊆ Ω is realized when the investment profile is (β, σ). hB(β), hS(σ) denote the respective (private) costs of the actions β and σ. Since the investment of one party may affect the likelihood of the ex post surplus and hence the expected gains from trade of the other party, there is a potential problem of untapped externalities, and therein lies the genesis of the holdup problem. The primitives of the model are then described by a tuple (π(·|β, σ), hB(·), hS(·), v(·), c(·)). If f : Ω → R, then f denotes the vector [f(ω1), . . . , f(ωN)]; π(β, σ) denotes the vector [π(ω1|β, σ), . . . , π(ωN|β, σ)]. Given a contingency ωi, let X(ωi) denote the event {ωj ∈ Ω | j ≥ i}.
The following assumptions on the specification of the model are maintained throughout.

Assumption 14.1. hB(βH) > hB(βL) > 0, hS(σH) > hS(σL) > 0.

Assumption 14.2a. π(X(ωi)|βH, σ) ≥ π(X(ωi)|βL, σ) for all i ∈ {1, . . . , N}, and there is an ωn with the property that s(ωn) > s(ωn−1), where the above inequality holds strictly.

Assumption 14.2b. π(X(ωi)|β, σH) ≥ π(X(ωi)|β, σL) for all i ∈ {1, . . . , N}, and there is an ωn∗ with the property that s(ωn∗) > s(ωn∗−1), where the above inequality holds strictly.

The first assumption has it that “H” actions are costlier than “L” actions. Assumption 14.2a(b) simply says that βH (σH) stochastically dominates βL (σL) in the first-order sense.11
ES(β, σ), defined in Equation (14.1), is understood to be the expected net surplus under vertical integration when the profile (β, σ) is chosen:

ES(β, σ) ≡ E(max{s(ωi), 0}|β, σ) − hB(β) − hS(σ).    (14.1)
(E denotes the standard expectations operator when π is additive and the Choquet expectations operator if π is nonadditive.) The expression ES(β, σ ) provides a natural way to rank profiles (β, σ ) as first best,12 second best, and so on. For instance, (β ∗ , σ ∗ ) is a first-best profile if (β ∗ , σ ∗ ) = arg max{ES(β, σ )}. To keep matters interesting I will restrict attention to parameter configurations which ensure that the expected net surplus from a second-best profile is nonnegative. It is assumed that parties can write contracts as contingency specific as they choose; that is, parties may make prices and delivery contingent on events in
Ω in any way they wish.13 A contingent price p(ωi) is the payment by B to S subject to the realization of ωi. The price may take a negative value; one may informally interpret the price p(ωi) < 0 as a fine paid by S to B. A contingent delivery rule is a function δ : Ω → {0, 1}, where δ(ωi) = 1 (or 0) indicates an agreement for contingent delivery14 (or nondelivery) on date 2. A contract is a list of contingent instructions {(p(ωi), δ(ωi))}i∈N′, where N′ ⊆ {1, . . . , N}. A given contract will imply a particular allocation of the surplus between the buyer and seller at each contingency. Correspondingly, a contingent transfer t(ωi) ≡ p(ωi) − c(ωi)δ(ωi) is S’s ex post payoff at ωi; the complement s(ωi)δ(ωi) − t(ωi) goes to B. A contract implements an action profile (β, σ) if for all β′ ∈ {βH, βL} and σ′ ∈ {σH, σL} the conditions ICB, PCB, ICS, PCS stated below are satisfied.

E(v(ωi)δ(ωi) − p(ωi)|β, σ) − E(v(ωi)δ(ωi) − p(ωi)|β′, σ) ≥ hB(β) − hB(β′)    (14.2)

E(v(ωi)δ(ωi) − p(ωi)|β, σ) − hB(β) ≥ 0    (14.3)

E(p(ωi) − c(ωi)δ(ωi)|β, σ) − E(p(ωi) − c(ωi)δ(ωi)|β, σ′) ≥ hS(σ) − hS(σ′)    (14.4)

E(p(ωi) − c(ωi)δ(ωi)|β, σ) − hS(σ) ≥ 0.    (14.5)
The conditions ensure that payments obtained in the contract meet the incentive and participation constraints of the buyer and seller, so that they choose β and σ . A contract is deemed incomplete if it fails to include instructions for the value of p(ωi ) or of δ(ωi ) for at least one contingency ωi . The null contract is an extreme example of an incomplete contract. It is a contract which does not specify instructions about the terms of trade to be observed in any contingency (or whether any trade is to be conducted at all). It is assumed that if a contingency promising
positive gains from trade is reached, parties engage in trade in spite of the absence of instructions. In such an instance the terms of trade are negotiated ex post. It will be taken as given that any instance of ex post bargaining results in a λ (to B), 1 − λ (to S) proportional division of the surplus arising from trade. It is assumed that the value of λ is commonly known to both parties at date 0. Consider a contract C and let tC(·) be the associated transfer rule (where tC(·) is suitably extended in accordance with our assumption about the division of surplus at unwritten contingencies). Let I(C) denote the set of profiles that can be implemented by C. Then the expected payoff from C is

max_{(β,σ) ∈ I(C)} {E(s(ωi)δ(ωi) − tC(ωi)|β, σ) + E(tC(ωi)|β, σ)}.
(In this maximand, E(s(ωi)δ(ωi) − tC(ωi)) is B’s expected payoff and E(tC(ωi)) is S’s expected payoff.) I define optimal contracts to be Pareto optimal (ex ante). Hence a contract is optimal if there does not exist another contract with a greater expected payoff. The point of writing a contingent contract in this setting is to make payoffs contingent on events in a way that ensures that parties have the right incentives for undertaking the requisite actions. In other words, the contingent events serve as proxies for the actual (noncontractible) actions. Hence, for the exercise to be meaningful at all, the contingency space has to be minimally informative. Condition 14.1 ensures that there are contingencies which are differentially informative of agents’ actions. In other words, for any action there is at least one contingency whose likelihood is differently affected (at the margin) by this action than by any other. If this were not to hold, contingent events would be completely uninformative about the actions taken by the parties and writing contingent contracts would not be a meaningful exercise. Henceforth, a convex nonadditive probability π(·|β, σ) will be referred to as an informed belief if it satisfies Condition 14.1.

Condition 14.1 (Informativeness). Let Δ(π(·|β, σ)) denote the set of additive probability measures in the core of the convex nonadditive probability function π(·|β, σ). Suppose πm and πn are probability functions in Δ(π(·|βH, σH)), πr is any member of Δ(π(·|βL, σH)), and πq is any member of Δ(π(·|βH, σL)). Then there exists at least one pair of contingencies ωk and ωl such that the vectors (πm(ωk) − πr(ωk), πm(ωl) − πr(ωl)) and (πn(ωk) − πq(ωk), πn(ωl) − πq(ωl)) are linearly independent.

The condition, as stated, applies to any convex nonadditive probability, which includes as a special case the instance when beliefs are additive (i.e. trivially convex nonadditive).
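In the additive special case, Condition 14.1 amounts to a linear-independence (determinant) check on the marginal effects of the two investments. A sketch with hypothetical probability numbers (none of these values come from the chapter):

```python
# Condition 14.1 in the additive special case: for some pair of contingencies
# (wk, wl), the marginal effects of beta and sigma on their probabilities must
# be linearly independent.  All probability values below are hypothetical.
p_HH = {"wk": 0.5, "wl": 0.2}    # pi(.|bH, sH) restricted to the pair
p_LH = {"wk": 0.3, "wl": 0.1}    # pi(.|bL, sH)
p_HL = {"wk": 0.4, "wl": 0.05}   # pi(.|bH, sL)

effect_beta  = (p_HH["wk"] - p_LH["wk"], p_HH["wl"] - p_LH["wl"])   # approx (0.2, 0.1)
effect_sigma = (p_HH["wk"] - p_HL["wk"], p_HH["wl"] - p_HL["wl"])   # approx (0.1, 0.15)

# Nonzero determinant <=> the two marginal-effect vectors are independent,
# i.e. the pair (wk, wl) is differentially informative about the actions.
det = effect_beta[0] * effect_sigma[1] - effect_beta[1] * effect_sigma[0]
print(abs(det) > 1e-12)  # -> True
```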
The technical import of the condition is simpler to grasp if
we were to focus on this special case. In this special case, the condition requires independence between the vectors indicating the marginal effect of β and σ on the unique probability describing the uncertainty. In general, the condition requires the same of every distribution in the set of probabilities consistent with the nonadditive probability that might describe the uncertainty. The condition ensures that there are at least two contingencies whose likelihoods (as described by any of the probability functions consistent with the DM’s knowledge) are differently affected at the margin by βH and σH. The following lemma assures us that Condition 14.1 does the job required of it: if the condition is satisfied, then there exist ways of conditioning payments to meet the incentive compatibility requirements of implementing the first best.

Lemma 14.1. Suppose π(·|β, σ) is an informed convex nonadditive probability function and that (βH, σH) is the first-best action profile. Then there will always be a bounded transfer rule, t̃, which satisfies the incentive compatibility conditions ICB, ICS:

(ICB) E(max{s(ωi), 0} − t̃(ωi)|βH, σH) − E(max{s(ωi), 0} − t̃(ωi)|βL, σH) ≥ hB(βH) − hB(βL)

(ICS) E(t̃(ωi)|βH, σH) − E(t̃(ωi)|βH, σL) ≥ hS(σH) − hS(σL).

The first proposition proves that if agents’ beliefs are not ambiguous, then informativeness of the contingency space guarantees the existence of a contract that implements the first best. The result is not really new; Steven R. Williams and Roy Radner (1988) and Patrick Legros and Hitoshi Matsushima (1991), for example, arrive at a similar conclusion.

Proposition 14.1. Suppose π(·|β, σ) is unambiguous and informed. Let (βH, σH) be the first-best action profile. Then there exists a contract with an associated bounded transfer rule, t, that implements the first best.
The formal proof appears in the Appendix but the basic argument is straightforward: if, at the margin, the effect of B’s action on the likelihood of event {ωk } is greater than its effect on the likelihood of event {ωl } (and the converse is true for S’s action vis-à-vis events {ωk } and {ωl }), then a unit of the contingent surplus assigned to B in the event {ωk } has a relatively greater impact on B’s decisions than a unit of contingent surplus assigned to it in the event {ωl }; a parallel argument works for S and the events {ωl } and {ωk }. Hence adequate incentives can be put in place by rewarding B sufficiently higher at {ωk } than at {ωl }, and by rewarding S higher at {ωl } than at {ωk }. Since the expectation operator is additive, the sum of the agents’ expected payments would be equal to the net surplus obtainable under vertical integration. Thus the participation constraints may be satisfied by appropriate ex ante transfers. Indeed, Williams and Radner (1988) demonstrate that generically, in the space of probability distributions, there exist contingent transfers that would
enforce efficient implementation. In other words, a condition such as 14.1 will hold generically in the data of the model. An argument invoking mathematical genericity is not necessarily compelling, though, since the data of the model is not necessarily generated by “random sampling.” Arguably, the parameter values are likely to be specific to the institutional setting. However, bear in mind that the relevant institutional setting is one of a vertical relationship between two firms operating distinct (though complementary) production processes. As such it is only reasonable to assume that events such as {ωl} and {ωk} are bound to exist. (In Example 14.2, and its verbal description in Section 14.1, {ωb} and {ωs} are such events.) Therefore, Proposition 14.1 gives us compelling reason to conclude that, at least as far as vertical relationships between firms are concerned, if agents’ common beliefs were unambiguous and there were no direct cost to drafting contingent payments, contracts could achieve efficient implementation. Clearly then, in such a world, mergers cannot better what can be achieved with contracts; furthermore, an incomplete contract that does not implement efficiently cannot be the best possible contract. The following Corollaries 14.1 and 14.2 summarize these conclusions.

Corollary 14.1. Suppose π(·|β, σ) is unambiguous and informed. Then any contract that does not implement the first best is not an optimal contract.

Corollary 14.2. Suppose π(·|β, σ) is unambiguous and informed. Then the maximum net expected surplus under vertical integration, max{ES(β, σ)}, is no greater than the expected payoff obtainable from an optimal contract.

The next and key proposition shows that contrary to the first result, even if informative events exist, ambiguity-averse agents may not be able to draft contracts that implement the first best.
Proposition 14.2 says that provided the convex nonadditive probability π is ambiguous and satisfies two conditions, there exists a nonempty set of cost and value functions such that the efficient investment cannot be implemented with a contract—and this in spite of there being sufficiently informative contractible events. The first of these two conditions, encapsulated in Conditions 14.2a and 14.2b, essentially rules out cases where the contingency space (labeled according to increasing surplus) may be partitioned into two cells, one of which is such that its likelihood is affected (at the margin) by only one party (B or S, not both). The final condition, Condition 14.3, may be interpreted to mean that the belief about any event consisting of two contiguous contingencies is ambiguous.15 The role played by the two conditions is explained along with the general intuition for the proof, after the statement of the proposition. A precise statement of Condition 14.2 and the details of the proof are especially facilitated if we define a concept of the “social benefit of an action.” Define the social benefit of the action βH over the action βL [denoted SocBen(βH/βL)] as E(max{s(ωi), 0}|βH, σH) − E(max{s(ωi), 0}|βL, σH). Analogously, SocBen(σH/σL) ≡ E(max{s(ωi), 0}|βH, σH) − E(max{s(ωi), 0}|βH, σL). Define π(X|β/β′, σ) ≡ π(X|β, σ) − π(X|β′, σ),
and π(X|β, σ/σ′) ≡ π(X|β, σ) − π(X|β, σ′).

Condition 14.2a. There does not exist an i∗ ∈ {1, . . . , N} such that

Σ_{N>k≥i∗} s(ωk)[π(X(ωk)|βH/βL, σH) − π(X(ωk+1)|βH/βL, σH)] + s(ωN)π({ωN}|βH/βL, σH) = SocBen(βH/βL)

and

Σ_{0<k<i∗} s(ωk)[π(X(ωk)|βH, σH/σL) − π(X(ωk+1)|βH, σH/σL)] = SocBen(σH/σL).

Condition 14.2b. There does not exist a j∗ ∈ {1, . . . , N} such that

Σ_{N>k≥j∗} s(ωk)[π(X(ωk)|βH, σH/σL) − π(X(ωk+1)|βH, σH/σL)] + s(ωN)π({ωN}|βH, σH/σL) = SocBen(σH/σL)

and

Σ_{0<k<j∗} s(ωk)[π(X(ωk)|βH/βL, σH) − π(X(ωk+1)|βH/βL, σH)] = SocBen(βH/βL).

Condition 14.3. π({ωk+1, ωk}|β, σ) > π(ωk+1|β, σ) + π(ωk|β, σ) for all k ∈ {1, . . . , N − 1} and for all β ∈ {βH, βL}, σ ∈ {σH, σL}.

Proposition 14.2. Suppose π(·|βH, σH), π(·|βH, σL), π(·|βL, σH), π(·|βL, σL) are informed and ambiguous beliefs, and let (βH, σH) be the first-best profile. Provided the beliefs also satisfy Conditions 14.2a, 14.2b, and 14.3, there exist investment cost functions hB(·), hS(·), and value and cost functions v(·) and c(·), such that no contract may implement (βH, σH).

The proposition follows immediately from Lemmas 14.A.1, 14.A.2, and 14.A.3 proved in the Appendix. The simple intuition inspiring the formal proof may be stated as follows: If contingent payments are comonotonic (e.g. a proportional split of the surplus) then the full incremental social benefit of an agent’s action does not get passed on to the agent. Hence, the agent’s individual incentive to take the first-best action will be lower than the full marginal benefit of the action. So if the marginal cost of the first-best action is high enough, only noncomonotonic contingent payments could satisfy the relevant incentive constraints.
Sujoy Mukerji
But with noncomonotonic payments, given Condition 14.3, the sum of the individual expected payoffs is bound to be less than the expected surplus under vertical integration. Therefore, if the marginal cost of the first-best action is high enough, one may not find contingent payments that satisfy both the incentive and participation constraints. However, if Condition 14.2 were not to hold, then it would be possible to design comonotonic contingent payoffs that enforce the efficient investment. Basically, such a payoff scheme would partition Ω into two cells, {ω1 , . . . , ωi∗ } and {ωi∗+1 , . . . , ωN }: all the surplus in {ω1 , . . . , ωi∗ } would go to one party and all the incremental surplus in the complementary cell (i.e. s(ωi∗+1 ) − s(ωi∗ ), s(ωi∗+2 ) − s(ωi∗ ), . . .) would go to the other party. The argument in Proposition 14.2 suggests why incomplete contracts may not be a paradox in a world with ambiguity aversion, since their inefficiency will not be “readily fixable”: complete contracts do not help very much. Proposition 14.3 shows that the same conditions as in Proposition 14.2 also imply that there will exist cost and value functions corresponding to which the null contract is an optimal contract, even though it implements a less than first-best profile. This shows formally that ambiguity aversion can rationalize an “inefficient” incomplete contract even when there exist sufficiently informative events for conditioning contractual payments. As is known from Corollary 14.1, this is impossible without ambiguity aversion. Before stating the formal result, we look at an example. It will illustrate the arithmetic of how, because of ambiguity aversion, a null incomplete contract that can at best implement an inefficient profile can dominate a contract which satisfies the incentive constraints for implementing the first-best profile. (A verbal description of the example appeared in the introduction.) Example 14.2.
Consider two vertically related firms B and S. Ω = {ω0 , ωb , ωs }; the values of the other parameters are as follows: s(ω0 ) = 0, s(ωb ) = s(ωs ) = 200; hB (βL ) = 10, hS (σL ) = 10, hB (βH ) = 85, hS (σH ) = 85; the singleton probabilities over (ω0 , ωb , ωs ) are π(·|βL , σL ) = (0.78, 0.01, 0.01), π(·|βH , σH ) = (0.02, 0.39, 0.39), π(·|βH , σL ) = (0.42, 0.365, 0.015), π(·|βL , σH ) = (0.42, 0.015, 0.365); π({ωb , ωs }|β, σ ) − π({ωb }|β, σ ) − π({ωs }|β, σ ) = 0.1, π({ω0 , ωb }|β, σ ) − π({ω0 }|β, σ ) − π({ωb }|β, σ ) = 0.1, π({ω0 , ωs }|β, σ ) − π({ω0 }|β, σ ) − π({ωs }|β, σ ) = 0, π(Ω|β, σ ) = 1; λ̄ = 0.5.
Table 14.2 Nonadditive probability over each state corresponding to each action profile

            ω0      ωb      ωs
βL , σL    0.78    0.01    0.01
βH , σL    0.42    0.365   0.015
βL , σH    0.42    0.015   0.365
βH , σH    0.02    0.39    0.39
The first point to observe is that the essential effect of B’s taking the H action is to “shift likelihood” from the low surplus state ω0 to the high surplus state ωb . Symmetrically, S’s H action would “shift likelihood” from the low surplus state ω0 to the high surplus state ωs . This information is summarized in Table 14.2. (NB, the actions do not affect the “residual probability” over any event; for the events {ωb , ωs } and {ω0 , ωb } the residual is constant at 0.1 while it is 0 for {ω0 , ωs }.) B and S have a “comparative advantage” in making ωb and ωs , respectively, more likely at the margin, as it were. Next note that (βH , σH ) is the first-best, and (βL , σL ) the second-best, action profile. To confirm this, compare the expected net surplus from the possible action profiles:

ES(βH , σH ) = 0.39 × s(ωb ) + 0.39 × s(ωs ) + 0.1 × min{s(ωb ), s(ωs )} + 0.02 × s(ω0 ) + 0.1 × min{s(ωb ), s(ω0 )} − 170 = 6,
ES(βL , σH ) = 0.365 × 200 + 0.015 × 200 + 0.1 × min{200, 200} + 0.42 × 0 + 0.1 × min{200, 0} − 95 = 1,
ES(βH , σL ) = 0.015 × 200 + 0.365 × 200 + 0.1 × min{200, 200} + 0.42 × 0 + 0.1 × min{200, 0} − 95 = 1,
ES(βL , σL ) = 0.01 × 200 + 0.01 × 200 + 0.1 × min{200, 200} + 0.78 × 0 + 0.1 × min{200, 0} − 20 = 4.

Next consider the contract C:

C(ωi ):  p(ωb ) = c(ωb ), δ(ωb ) = 1;  p(ωs ) = v(ωs ), δ(ωs ) = 1;  p(ω0 ) = 0, δ(ω0 ) = 0.

This contract allocates the entire surplus at ωb to B and rewards S similarly at ωs . Given the way the parameters lie, this would seem to be a natural way to deliver the
right incentives. Indeed, C does satisfy the incentive constraints for implementing the profile (βH , σH ):

(ICB ): [0.39 × s(ωb ) + 0.39 × 0 + 0.1 × min{s(ωb ), 0}] − [0.015 × s(ωb ) + 0.365 × 0 + 0.1 × min{s(ωb ), 0}] = 75 = hB (βH ) − hB (βL )

(ICS ): [0.39 × 0 + 0.39 × s(ωs ) + 0.1 × min{0, s(ωs )}] − [0.365 × 0 + 0.015 × s(ωs ) + 0.1 × min{0, s(ωs )}] = 75 = hS (σH ) − hS (σL ).
However, no ex ante transfers may be arranged to satisfy the participation constraints, since the expected payoff from C is negative: (0.39 × 200 + 0.39 × 0 + 0.1 × min{200, 0}) × 2 − 85 × 2 = −14 < 0. Also note that the incentive constraints corresponding to C hold without the slightest slack. Any other contract which “smoothed” out the ex post payoffs (so as to increase the sum of the ex ante payoffs) would break at least one of the incentive constraints for the H actions. Thus the first best cannot be implemented. On the other hand, it is simple to check that the null contract will successfully implement the second-best profile (βL , σL ), given the ex post renegotiation anticipated at the time of signing the contract. Hence the null contract dominates C. The null contract is equivalent to having a complete contract that instructs no trade at each contingency, with renegotiation and trade arising when the states ωb and ωs are realized. Obviously, completing the contract in this way is as good as leaving the contingencies unmentioned. (It is also possible to mimic the null contract with a complete contract by formally instructing an equal split of the surplus and full delivery at all contingencies.) The following lemma, which is used in proving Proposition 14.3, shows how incomplete contracts (especially null contracts) are relatively “robust” to ambiguity aversion. Such contracts actually entail a comonotonic (ex post) payoff scheme across the set of contingencies where the contract (by its silence) leaves payoffs to be determined by ex post negotiation. Compared to contracts with noncomonotonic payoffs, these contracts are robust in the sense that the sum of the value of a contract to B and to S is relatively less adversely affected by ambiguity.
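The arithmetic of Example 14.2 can be checked numerically. The sketch below is mine (the helper names `choquet` and `capacity` are not the chapter’s); it computes the Choquet expectations from the example’s parameters and confirms the expected surpluses, the exactly binding incentive margins under C, the failed aggregate participation constraint, and the null contract’s inequalities for the second-best profile (βL, σL).

```python
# Numerical check of Example 14.2 (a sketch; helper and variable names are
# mine). The Choquet expectation of a payoff under a capacity ranks states
# best-to-worst and weights each payoff by the capacity increment of its
# upper set.

def choquet(cap, f):
    """Choquet expectation of payoff dict f under capacity cap (frozenset -> weight)."""
    total, prev, upper = 0.0, 0.0, set()
    for w in sorted(f, key=f.get, reverse=True):   # best state first
        upper.add(w)
        total += f[w] * (cap[frozenset(upper)] - prev)
        prev = cap[frozenset(upper)]
    return total

def capacity(p0, pb, ps):
    """Example 14.2's capacity: residual 0.1 on {wb,ws} and {w0,wb}, 0 on {w0,ws}."""
    return {frozenset({"w0"}): p0, frozenset({"wb"}): pb, frozenset({"ws"}): ps,
            frozenset({"wb", "ws"}): pb + ps + 0.1,
            frozenset({"w0", "wb"}): p0 + pb + 0.1,
            frozenset({"w0", "ws"}): p0 + ps,
            frozenset({"w0", "wb", "ws"}): 1.0}

cap = {("L", "L"): capacity(0.78, 0.01, 0.01),
       ("H", "H"): capacity(0.02, 0.39, 0.39),
       ("H", "L"): capacity(0.42, 0.365, 0.015),
       ("L", "H"): capacity(0.42, 0.015, 0.365)}
s = {"w0": 0.0, "wb": 200.0, "ws": 200.0}   # surplus at each state
h = {"L": 10.0, "H": 85.0}                  # investment costs (same schedule for B and S)

# Expected net surplus of each profile: (H,H) gives 6 (first best),
# (L,L) gives 4 (second best), the mixed profiles give 1 each.
ES = {prof: choquet(cap[prof], s) - h[prof[0]] - h[prof[1]] for prof in cap}

# Contract C: all surplus at wb goes to B, all surplus at ws goes to S.
payoff_B = {"w0": 0.0, "wb": 200.0, "ws": 0.0}
payoff_S = {"w0": 0.0, "wb": 0.0, "ws": 200.0}
icb = choquet(cap[("H", "H")], payoff_B) - choquet(cap[("L", "H")], payoff_B)
ics = choquet(cap[("H", "H")], payoff_S) - choquet(cap[("H", "L")], payoff_S)
# Both margins equal 75 = h("H") - h("L"): the incentive constraints bind exactly.

# Aggregate participation under C fails: the payoffs are noncomonotonic, so
# the parties' Choquet values sum to 78 + 78 - 170 = -14 < 0.
agg = choquet(cap[("H", "H")], payoff_B) + choquet(cap[("H", "H")], payoff_S) - 2 * h["H"]

# The null contract (equal split, lambda-bar = 0.5) satisfies the Lemma 14.2
# inequalities for the second-best profile (beta_L, sigma_L).
lam = 0.5
ic_null_B = lam * (choquet(cap[("L", "L")], s) - choquet(cap[("H", "L")], s))
ic_null_S = (1 - lam) * (choquet(cap[("L", "L")], s) - choquet(cap[("L", "H")], s))
part_null = choquet(cap[("L", "L")], s)
```

With these parameters the check reproduces the text’s figures: the incentive margins under C are exactly 75 while aggregate participation fails by 14, whereas the null contract’s constraints for (βL, σL) hold with slack.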
Thus if the contingent payoffs from a null contract can satisfy individual incentive constraints and the aggregate participation constraint for a particular profile, it can necessarily implement the profile.

Lemma 14.2. Suppose the following inequalities hold, where the expectations are evaluated with respect to a nonadditive probability:

λ̄[E(max{s(ωi ), 0}|β̂, σ̂ ) − E(max{s(ωi ), 0}|β, σ̂ )] ≥ hB (β̂ ) − hB (β )   (14.6)

(1 − λ̄)[E(max{s(ωi ), 0}|β̂, σ̂ ) − E(max{s(ωi ), 0}|β̂, σ )] ≥ hS (σ̂ ) − hS (σ )   (14.7)

E(max{s(ωi ), 0}|β̂, σ̂ ) ≥ hS (σ̂ ) + hB (β̂ ).   (14.8)

Then a contract will implement the profile (β̂, σ̂ ) if it specifies an allocation of ex post surplus in accordance with the uncontingent transfer rule t(·) ≡ λ̄.

Proposition 14.3. Let (π∗ (·|βH , σH ), π∗ (·|βH , σL ), π∗ (·|βL , σH ), π∗ (·|βL , σL )) be a tuple of ambiguous and informed beliefs that satisfy Conditions 14.2a, 14.2b, and 14.3. Then for each such tuple there exist investment cost functions h∗B (·), h∗S (·), and value and cost functions v∗ (·) and c∗ (·), such that the null contract is an optimal contract, even though the null contract does not implement the first-best profile.

A gist of the proof runs as follows. By Proposition 14.2 we are assured that, corresponding to any belief satisfying Conditions 14.2 and 14.3, there is a set of parametric configurations for which there are no contracts that implement the first best. It can then be shown that for a nonempty subset of the set of such parametric configurations a null contract will satisfy the necessary incentive constraints for implementing a second-best profile. By Lemma 14.2 it then follows that the null contract will actually implement the second-best profile. Since the first-best profile cannot be implemented by any contract, the null contract in such cases must be the optimal contract. Since this is true even in the case of an informed belief, the result stands in stark contrast to Corollary 14.1. It is important to clarify the sense in which this reasoning “predicts” incomplete contracts. Formally, the incomplete contract corresponds to the parties knowing that they will split the surplus ex post according to a state-independent sharing rule. But they could write a complete contract that calls for this explicitly. Hence the argument per se does not show that ambiguity aversion can imply that the optimal contract must be incomplete. Nevertheless, the argument does imply that even the smallest transactions costs would make the incomplete contract (strictly) preferable.
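The comonotonicity logic behind Lemma 14.2 can be illustrated numerically with the (βH, σH) capacity from Example 14.2. In this minimal sketch (helper names are mine), an uncontingent proportional split yields comonotonic payoff streams whose Choquet values add up exactly to the Choquet value of the whole surplus, whereas the noncomonotonic split used by contract C “loses” the residual weight on {ωb, ωs}.

```python
# Comonotonic vs noncomonotonic surplus splits under the (beta_H, sigma_H)
# capacity of Example 14.2 (a sketch; helper names are mine).

def choquet(cap, f):
    # rank states best-to-worst and weight payoffs by capacity increments
    total, prev, upper = 0.0, 0.0, set()
    for w in sorted(f, key=f.get, reverse=True):
        upper.add(w)
        total += f[w] * (cap[frozenset(upper)] - prev)
        prev = cap[frozenset(upper)]
    return total

cap = {frozenset({"w0"}): 0.02, frozenset({"wb"}): 0.39, frozenset({"ws"}): 0.39,
       frozenset({"wb", "ws"}): 0.88, frozenset({"w0", "wb"}): 0.51,
       frozenset({"w0", "ws"}): 0.41, frozenset({"w0", "wb", "ws"}): 1.0}
s = {"w0": 0.0, "wb": 200.0, "ws": 200.0}

# Uncontingent 50:50 split: the two payoff streams are comonotonic, so their
# Choquet values add up exactly to the Choquet value of the whole surplus.
half = {w: 0.5 * s[w] for w in s}
comon_sum = choquet(cap, half) + choquet(cap, half)

# Contract C's split (B paid only at wb, S only at ws) is noncomonotonic:
# each party's evaluation attaches the 0.1 residual on {wb, ws} to its own
# worst state, so 20 units of surplus "evaporate" from the sum.
b_payoff = {"w0": 0.0, "wb": 200.0, "ws": 0.0}
s_payoff = {"w0": 0.0, "wb": 0.0, "ws": 200.0}
noncomon_sum = choquet(cap, b_payoff) + choquet(cap, s_payoff)
```

The comonotonic split preserves all 176 units of Choquet-expected surplus, while the noncomonotonic split delivers only 156; that 20-unit wedge is exactly what defeats the participation constraints in the example.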
Proposition 14.3 also suggests a rationalization of the elusive connection between supply uncertainty and vertical integration mentioned in the introductory section. Suppose, as seems natural, we interpret “supply uncertainty” to mean the uncertainty about the (realized) price and delivery associated with a supply contract. Could there be circumstances wherein the supply uncertainty accompanying any contract with sufficient incentives for the efficient action is more than the transacting parties want to bear? In such cases vertical integration as an institutional form would have the potential to generate strictly greater value than a contractual transaction. Corollary 14.2 rules out such circumstances if the agents’ beliefs about the uncertainty are unambiguous. However, Proposition 14.3 allows us to infer that such cases do exist under ambiguity aversion, as Corollary 14.3 records. Thus it is the possible ambiguity aversion associated with supply uncertainty which provides the logical link to vertical integration.
To see the point a little differently: given that agents perceive the uncertainty to be ambiguous and are ambiguity averse, an “efficient” contractual relationship (because of the “high-powered” nature of the incentive scheme it must embody) would only exacerbate the adverse effect of the uncertainty. This “loss of surplus” is avoided by integrating. By monitoring via an administrative hierarchy, integration makes it possible to deliver adequate incentives without involving contingent payments. Obviously, integration would be the preferred alternative only if the costs of “physical” monitoring are less than the “loss of surplus” that accompanies the use of “financial” monitoring via a contract. Corollary 14.3. Let {π(·|β, σ )} be a tuple of informed and ambiguous beliefs that satisfy Conditions 14.2a, 14.2b, and 14.3. Then for each such tuple there exist investment cost functions and value and cost functions such that the maximum expected net surplus under vertical integration, max{ES(β, σ )}, is strictly greater than the expected payoff obtainable from an optimal contract.
14.3. Concluding discussion

The incomplete contracts literature focusing on investment holdup has for the most part maintained the salience of transactions costs in providing the empirical cornerstone for the theory. Since the effect of ambiguity aversion is essentially to reduce the marginal gains from including more details in a contract, under ambiguity aversion even small transactions costs may result in incompleteness. But there is reason to think that the role of ambiguity aversion is not merely supplementary to transactions costs. It is crucial to take into account the nature of the uncertainty characterizing the decision environment in order to explain the empirical realities about contractual forms and vertical relationships. It is a fact that detailed contingent contracts successfully underpin many business relationships; where contracts fail, or exist only as incompletely specified arrangements, uncertainty is typically rife. Paul L. Joskow (1985) notes examples of lasting long-term contractual relationships between electricity generating plants and mine-mouth coal suppliers. The story of the merger between Fisher Body and GM is an oft-cited example in the incomplete contracts/vertical integration literature. It is worth remembering that Fisher and GM did actually successfully transact business via a contract for almost ten years before the merger occurred. The reason for the collapse of the contract, as explained by Klein et al. (1978), was the unprecedented and dramatic demand uncertainty faced by the automobile market in the 1920s. The transactions-cost paradigm cannot explain why detailed, long-term contingent contracts thrive in a relatively stable world like that of the coal–electrical utility nexus but not in more complex, uncertain environments.
Ambiguity aversion, on the other hand, explains this in more intuitive terms.16,17 Ambiguity aversion also explains why long-term vertical relationships exist when the input supplied is “standard equipment,” whereas R&D-intensive inputs are typically integrated within the firm. While it may appear that the last observation is fully explained by the notion of “transaction-specific assets,” there is reason to
think otherwise. Scott E. Masten et al. (1989) report evidence regarding the relative influence of transaction-specific investments in physical and human capital on the pattern of vertical integration, using data obtained directly from U.S. auto manufacturers. Their results support the proposition that investments in specialized technical know-how have a stronger influence than those in specialized physical capital on the decision to integrate production within the firm. It is well known that government/defense procurement contracts are typically incompletely specified and prone to renegotiation when R&D is a significant component of the goods being supplied, whereas complicated contracts running to several thousand pages are quite common for “large” orders of goods not involving significant R&D. That we inhabit a world of incomplete contracts is not necessarily because agents are constrained from conditioning contractual instructions on “finely” described events by the direct costs of envisaging and/or writing down the relevant details. Rather, it could simply be because DMs perceive that they have very vague ideas about the likelihood of such events. The insight that which events are used to condition contractual instructions depends on how well the DM thinks he knows the relevant likelihoods is a novel contribution of the theory of ambiguity aversion to the debate about the foundations of incomplete contracts. The insight is indeed novel, since to an SEU maximizer the quality or accuracy of his belief does not matter. The explanation of contractual incompleteness advanced in this chapter turns on the assumption that the specific investments are not contractible. At least as long as value and cost are simply functions of the realized contingency [i.e., v = v(w) and c = c(w)], and contingencies are “informative,” the fact that investments are not contractible would not of itself allow “inefficient” incomplete contracts to be optimal contracts.
Further, the requisite “informativeness” is an empirically valid assumption in the relevant economic context. Suppose instead that the value and cost are not just functions of the realized contingency but, say, v = v(w; q) and c = c(w; q), where q is a variable measuring the quality or quantity of the input traded. Then it is possible that contractual incompleteness (in the sense of not mentioning trade prices for some events) can arise as a strategic imperative as long as q is also not contractible. This result is demonstrated in B. Douglas Bernheim and Michael D. Whinston (1997). One contribution of the present chapter lies in demonstrating how it is possible to rationalize incompleteness even if q were degenerate. The model presented in this chapter assumes firms are risk neutral. There is a dominant tradition in economics of modeling firms as such. For the typical “big” firm, pervasive risk spreading ensures that the ownership of the firm’s returns is sufficiently diffuse. Thus a firm may act as if its utility is linear in profits; that is, it cares only about expected profits and may ignore risk completely. When analyzing contractual incompleteness in interfirm relationships, the challenge thus is to explain incompleteness without invoking risk aversion; hence the assumption of risk neutrality. Notice, though, that the arguments used to urge the analyst to ignore risk would not work to rule out the effect of ambiguity. While risk spreading ensures that firms behave as if their utility functions are linear, as we have observed, ambiguity aversion bites in spite of linear utilities.
Finally, I turn my attention to some general theoretical issues raised by the analysis. We have obtained at least a preliminary understanding of the role of ambiguity aversion in the design of optimal contracts. For instance, we have been alerted to the fact that the trade-off between exacerbating the effect of ambiguity and providing incentives may make it optimal to ignore “information-rich” signals. This is closely analogous to the well-researched trade-offs between risk and incentives and between “information rent” and incentives. One suspects that the trade-off associated with ambiguity aversion will have implications as wide-ranging and significant as those of these other trade-offs. For example, the above analysis prompts the question: can ambiguity aversion force contracts in models of double-sided moral hazard to have linear, or even approximately linear, sharing rules? This is one open question to be dealt with in future research, along with the broader issue of obtaining a complete characterization of optimal contracts under ambiguity aversion.
Appendix

Example 14.A.1. Let the universe Ω consist of two complementary events E and E c . E, in turn, consists of two subevents E1 and E2 . E1 is further partitioned into E11 and E12 . The nonadditive probability function π which describes the DM’s belief is: π(E11 ) = 0.1; π(E12 ) = 0.2; π(E1 ) = 0.5; π(E2 ) = 0.2; π(E) = 0.8; π(E c ) = 0.2; π(E11 ∪ X ) = π(E11 ) + π(X ) where X ∈ {E2 , E c , E2 ∪ E c }; π(E12 ∪ X ) = π(E12 ) + π(X ) where X ∈ {E2 , E c , E2 ∪ E c }; π(E1 ∪ E c ) = π(E1 ) + π(E c ); π(E2 ∪ E c ) = π(E2 ) + π(E c ); π(Ω) = 1. The payoffs of the act f are as indicated in Figure 14.A.1.
Figure 14.A.1 The space of events and outcomes of f .
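The two equivalent Choquet computations of Example 14.A.1 can be reproduced numerically. Since Figure 14.A.1 is not reproduced here, the payoff values of f below are hypothetical: they are chosen to respect the ranking implicit in the first computation and to be consistent with the CE of 8.5 reported in the text, but they are an assumption, not the figure’s data.

```python
# The two equivalent Choquet computations in Example 14.A.1, checked
# numerically. The payoffs of the act f are ASSUMED values (the original
# figure is not reproduced); they respect the ranking Ec < E2 < E11 < E12
# implicit in the text's first computation.

f = {"E11": 10.0, "E12": 20.0, "E2": 5.0, "Ec": 0.0}   # assumed payoffs of f
pi = {"E11": 0.1, "E12": 0.2, "E1": 0.5, "E2": 0.2, "E": 0.8, "Ec": 0.2}

# First computation: rank-ordered capacity increments.
ce_rank = (f["Ec"] * (1.0 - pi["E"]) + f["E2"] * (pi["E"] - pi["E1"])
           + f["E11"] * (pi["E1"] - pi["E12"]) + f["E12"] * pi["E12"])

# Second computation: additive part plus "residual" weights attached to the
# worst consequence within E1 and within E.
ce_resid = (pi["E11"] * f["E11"] + pi["E12"] * f["E12"] + pi["E2"] * f["E2"]
            + pi["Ec"] * f["Ec"]
            + (pi["E1"] - (pi["E11"] + pi["E12"])) * min(f["E11"], f["E12"])
            + (pi["E"] - (pi["E1"] + pi["E2"])) * min(f["E11"], f["E12"], f["E2"]))
```

Both decompositions agree, illustrating the point made in the example: the Choquet method is the usual expectation with each event’s residual weight attached to the worst payoff inside that event.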
CEπ (f ) = f (E c )[π(Ω) − π(E)] + f (E2 )[π(E) − π(E1 )] + f (E11 )[π(E1 ) − π(E12 )] + f (E12 )π(E12 ) = 8.5.

It is instructive to note that an equivalent computation is

CEπ (f ) = π(E11 )f (E11 ) + π(E12 )f (E12 ) + π(E2 )f (E2 ) + π(E c )f (E c ) + [π(E1 ) − {π(E11 ) + π(E12 )}] min{f (E11 ), f (E12 )} + [π(E) − {π(E1 ) + π(E2 )}] min{f (E11 ), f (E12 ), f (E2 )} = 8.5.

The second computation shows that the Choquet method evaluates an act in much the same way as the usual expectation operator, with the amendment that the “residual” measure in any subset of contingencies (the residual at E1 , for instance, is [π(E1 ) − {π(E11 ) + π(E12 )}]) is “attached” to the contingency bearing the worst consequence of the act in that subset.

Proof of Lemma 14.1. First I establish some notation. Given a convex nonadditive probability π(·|β, σ ), let Π(π(·|β, σ )) denote the set of additive probabilities in the “core” of π(·|β, σ ), that is, Π(π(·|β, σ )) ≡ {π̃ ∈ Δ(Ω) | π̃(X ) ≥ π(X |β, σ ), ∀X ∈ 2^Ω }. Note that if π(·|β, σ ) is additive then Π(π(·|β, σ )) is a singleton, the only element being π(·|β, σ ) itself. Next, given a payoff function f , f : Ω → R, define the set Π(f ; β, σ ) as follows:

Π(f ; β, σ ) ≡ {π̃ ∈ Π(π(·|β, σ )) | Σ_{ωi∈Ω} f (ωi )π̃(ωi ) = CEπ(·|β,σ ) f }.
That is, evaluating the Choquet expectation of f with respect to the nonadditive probability π(·|β, σ ) is equivalent to evaluating the expectation of f with respect to any additive probability in Π(f ; β, σ ). Now, t̃ will solve (ICB ) and (ICS ) if there exist π̃b (·|β, σ ) ∈ Π(sδ − t̃; β, σ ) and π̃s (·|β, σ ) ∈ Π(t̃; β, σ ) such that the following inequalities (14.A.1) and (14.A.2) are satisfied:

Σ_{ωi∈Ω} [π̃b (ωi |βH , σH ) − π̃b (ωi |βL , σH )][s(ωi )δ(ωi ) − t̃(ωi )] ≥ hB (βH ) − hB (βL )   (14.A.1)

Σ_{ωi∈Ω} [π̃s (ωi |βH , σH ) − π̃s (ωi |βH , σL )] t̃(ωi ) ≥ hS (σH ) − hS (σL ).   (14.A.2)
Suppose π̃o (·|β, σ ) ∈ Π(s; β, σ ). Since (βH , σH ) is the first best, the inequalities (14.A.3) and (14.A.4) must be true:

Σ_{ωi∈Ω} π̃o (ωi |βH , σH ) max{s(ωi ), 0} − hB (βH ) − hS (σH ) ≥ Σ_{ωi∈Ω} π̃o (ωi |βL , σH ) max{s(ωi ), 0} − hB (βL ) − hS (σH )   (14.A.3)

Σ_{ωi∈Ω} π̃o (ωi |βH , σH ) max{s(ωi ), 0} − hB (βH ) − hS (σH ) ≥ Σ_{ωi∈Ω} π̃o (ωi |βH , σL ) max{s(ωi ), 0} − hB (βH ) − hS (σL ).   (14.A.4)

Setting δ(·) such that δ(ωi ) > 0 ⇔ s(ωi ) ≥ 0, and rearranging terms, (14.A.3) yields (14.A.5) and (14.A.4) yields (14.A.6):

Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βL , σH )) s(ωi )δ(ωi ) ≥ hB (βH ) − hB (βL )   (14.A.5)

Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βH , σL )) s(ωi )δ(ωi ) ≥ hS (σH ) − hS (σL ).   (14.A.6)
Hence solutions to (14.A.1) and (14.A.2) will exist if we find t̃ that solves (14.A.7) and (14.A.8):

Σ_{ωi∈Ω} [π̃b (ωi |βH , σH ) − π̃b (ωi |βL , σH )][s(ωi )δ(ωi ) − t̃(ωi )] ≥ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βL , σH )) s(ωi )δ(ωi )   (14.A.7)

Σ_{ωi∈Ω} [π̃s (ωi |βH , σH ) − π̃s (ωi |βH , σL )] t̃(ωi ) ≥ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βH , σL )) s(ωi )δ(ωi ).   (14.A.8)
Using matrix notation, the inequalities (14.A.7) and (14.A.8) may be replaced by (14.A.9), where each row of the 2 × N matrix lists the indicated probability differences across the states ω1 , . . . , ωN :

[ −π̃b (·|βH , σH ) + π̃b (·|βL , σH ) ]   [ t̃(ω1 ) ]     [ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βL , σH )) s(ωi )δ(ωi ) ]
[  π̃s (·|βH , σH ) − π̃s (·|βH , σL ) ] · [   ⋮    ]  ≥  [ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βH , σL )) s(ωi )δ(ωi ) ]   (14.A.9)
                                          [ t̃(ωN ) ]
Consider the case where π(·|β, σ ) is unambiguous. It is possible to find a (bounded) t̃ if the vectors −π̃b (·|βH , σH ) + π̃b (·|βL , σH ) and π̃s (·|βH , σH ) − π̃s (·|βH , σL ) are independent, which in turn is ensured by Condition 14.1. Thus the proof is complete for this case. However, in general, the fact that the system in (14.A.9) has a bounded solution does not immediately follow from our assumption on independence. The reason is that, when the core of π(·|β, σ ) is a nonsingleton set, the vectors −π̃b (·|βH , σH ) + π̃b (·|βL , σH ) and π̃s (·|βH , σH ) − π̃s (·|βH , σL ) are not completely “exogenous” to the system; they are “endogenous” in so far as they depend on t̃. However, Condition 14.1 applied in conjunction with a standard fixed-point argument shows that (14.A.9) indeed has a bounded solution. To proceed with the fixed-point argument, I first construct an appropriate mapping in the following three steps.

Step 1. Pick any two elements from Π(π(·|βH , σH )) and an element each from Π(π(·|βH , σL )) and Π(π(·|βL , σH )). Denote them as π̃1 , π̃2 , π̃3 , π̃4 , respectively.

Step 2. Consider the solution set to the system given by (14.A.10):

[ −π̃1 + π̃4 ]   [ t(ω1 ) ]     [ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βL , σH )) s(ωi )δ(ωi ) ]
[  π̃2 − π̃3 ] · [   ⋮    ]  ≥  [ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βH , σL )) s(ωi )δ(ωi ) ]   (14.A.10)
                [ t(ωN ) ]

Let τ ≡ {t | t solves (14.A.10)}. Recall the role of the contingencies ωk and ωl as stated in Condition 14.1. Make a selection t̄ from τ such that t̄(ωi ) = 0 if i ≠ k and i ≠ l. It follows from Condition 14.1 that such a selection exists and is unique. Furthermore, t̄ is bounded.
Step 3. Finally, consider a set {π̃1^t̄ , π̃2^t̄ , π̃3^t̄ , π̃4^t̄ } where π̃1^t̄ , π̃2^t̄ ∈ Π(t̄; βH , σH ), π̃3^t̄ ∈ Π(t̄; βH , σL ), π̃4^t̄ ∈ Π(t̄; βL , σH ).

Steps 1 and 2 together define a continuous function

Ψ1 : Π(π(·|βH , σH )) × Π(π(·|βH , σH )) × Π(π(·|βH , σL )) × Π(π(·|βL , σH )) → R^N,

while Step 3 defines a convex-valued upper hemicontinuous correspondence

Ψ2 : R^N ⇒ Π(π(·|βH , σH )) × Π(π(·|βH , σH )) × Π(π(·|βH , σL )) × Π(π(·|βL , σH )).

Hence the composition Ψ ≡ Ψ2 ∘ Ψ1 defines a convex-valued upper hemicontinuous correspondence from the convex domain Π(π(·|βH , σH )) × Π(π(·|βH , σH )) × Π(π(·|βH , σL )) × Π(π(·|βL , σH )) into itself. Kakutani’s fixed-point theorem ensures that Ψ has a fixed point. Let {π̃1^t∗ , π̃2^t∗ , π̃3^t∗ , π̃4^t∗ } be a fixed point of Ψ. If t∗ solves (14.A.11), then clearly t∗ satisfies the conditions required of the transfer t̃ in (14.A.1) and (14.A.2):

[ −π̃1^t∗ + π̃4^t∗ ]   [ t(ω1 ) ]     [ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βL , σH )) s(ωi )δ(ωi ) ]
[  π̃2^t∗ − π̃3^t∗ ] · [   ⋮    ]  ≥  [ Σ_{ωi∈Ω} (π̃o (ωi |βH , σH ) − π̃o (ωi |βH , σL )) s(ωi )δ(ωi ) ]   (14.A.11)
                      [ t(ωN ) ]
(14.A.11) Proof of Proposition 14.1. Lemma 14.1 proves that a (bounded) t exists which satisfies the incentive constraints relevant to implementing the first best. Given such a t the expected payoffs to B and S are E(s(ωi )δ(ωi ) − t(ωi )|βH , σH ) − hB (βH ) and E(t(ωi )|βH , σH ) − hS (σH ), respectively. Since the expectations operator is additive, the sum of the expected payoffs to the two parties is E(s(ωi )δ(ωi )|βH , σH ) − hB (βH ) − hS (σH ). (βH , σH ) is the first best, implying E(s(ωi )δ(ωi )|βH , σH ) − hB (βH ) − hS (σH ) ≥ 0. Hence participation constraints can be taken care of by a transfer
τ ∈ R transacted when the contract is signed, so long as τ satisfies the following conditions:

(PCB∗ ) E(s(ωi )δ(ωi ) − t(ωi )|βH , σH ) − τ − hB (βH ) ≥ 0

(PCS∗ ) E(t(ωi )|βH , σH ) + τ − hS (σH ) ≥ 0.

Lemma 14.A.1. Given that f : Ω → R, g : Ω → R, that π is a convex nonadditive probability function, π : 2^Ω → [0, 1], and that the labeling of the state space Ω = {ωi }, i = 1, . . . , N, is such that f (ωm ) > f (ωn ) ⇒ m > n:

(a) If f and g are comonotonic, then Eπ (f + g) = Eπ (f ) + Eπ (g).
(b) Let f and f + g be comonotonic, and suppose that there are ωk+1 , ωk such that (i) and (ii) hold: (i) g(ωk+1 ) < g(ωk ); (ii) π({ωk+1 }) + π({ωk }) < π({ωk+1 , ωk }). Then Eπ (f + g) > Eπ (f ) + Eπ (g).

Proof. The proof is straightforward and hence omitted.

Lemma 14.A.2. Assume π(·|·, ·) satisfies Conditions 14.2a, 14.2b, and that (βH , σH ) is the first-best action profile. Then there exist hB (·), hS (·) such that for any t̃ which satisfies the incentive compatibility conditions (ICB ), (ICS ) in Lemma 14.1, t̃ and the vector [max{s(·), 0} − t̃(·)] are not comonotonic.

Proof. The strategy of the proof will be to choose hS (·) such that hS (σH ) − hS (σL ) = SocBen(σH /σL ) and then show that if t̃ and s − t̃ are comonotonic, B’s marginal private benefit from choosing βH (i.e. E(s − t̃|βH , σH ) − E(s − t̃|βL , σH )) falls short of SocBen(βH /βL ). Hence we can choose hB (·) with the difference hB (βH ) − hB (βL ) large enough [but less than SocBen(βH /βL )], so that if t̃, s − t̃ are to satisfy (ICB ), (ICS ), then it must be that t̃ and s − t̃ are not comonotonic. Fix s(·) such that SocBen(σH /σL ) > 0 and SocBen(βH /βL ) > 0, and consider hS (·) and hB (·) such that (βH , σH ) is the first-best action profile. Choose hS (·) such that hS (σH ) − hS (σL ) = SocBen(σH /σL ). Choose t̃ such that s, t̃ and s − t̃ are comonotonic and t̃ satisfies (ICS ). Let i_S be the smallest value of the contingent state index i such that π(X(ωi+1 )|βH , σH /σL ) > 0. Similarly, let ī_S be the highest index i such that π(X(ωi )|βH , σH /σL ) > 0. Note, Assumption 14.2b guarantees that the set {ωi ∈ Ω | i_S ≤ i ≤ ī_S } is not a singleton set.
Claim 14.A.1. s(ωi ) − t̃(ωi ) is the same for all i satisfying the condition i_S ≤ i ≤ ī_S .

Proof. Since for all i ≤ i_S , π(X(ωi )|βH , σH /σL ) = 0, it follows that

π(X(ω_{i_S} )|βH , σH /σL ) − π(X(ω_{i_S +1} )|βH , σH /σL ) < 0.   (14.A.12)

Since for all i > ī_S , π(X(ωi )|βH , σH /σL ) = 0, it follows that

π(X(ω_{ī_S} )|βH , σH /σL ) − π(X(ω_{ī_S +1} )|βH , σH /σL ) > 0.   (14.A.13)

At this point it is useful to recall that E(s − t̃|β, σ ) may be written as

Σ_{i=1}^{N−1} [s(ωi ) − t̃(ωi )][π(X(ωi )|β, σ ) − π(X(ωi+1 )|β, σ )] + [s(ωN ) − t̃(ωN )]π({ωN }|β, σ ).

Suppose the claim is false; that is, s(ωi ) − t̃(ωi ) is not constant when i varies in the interval [i_S , ī_S ]. By comonotonicity of s and s − t̃, s(ωi ) − t̃(ωi ) is weakly increasing in ωi for i such that i_S ≤ i ≤ ī_S . Hence it must be that s(ω_{i_S} ) − t̃(ω_{i_S} ) < s(ω_{ī_S} ) − t̃(ω_{ī_S} ). Next notice,

E(s − t̃|βH , σH ) − E(s − t̃|βH , σL )
= Σ_{i=i_S}^{ī_S} [s(ωi ) − t̃(ωi )][π(X(ωi )|βH , σH ) − π(X(ωi+1 )|βH , σH )] − Σ_{i=i_S}^{ī_S} [s(ωi ) − t̃(ωi )][π(X(ωi )|βH , σL ) − π(X(ωi+1 )|βH , σL )]
= Σ_{i=i_S}^{ī_S} [s(ωi ) − t̃(ωi )][π(X(ωi )|βH , σH /σL ) − π(X(ωi+1 )|βH , σH /σL )].   (14.A.14)
By inspecting the final expression in (14.A.14), it may be checked that

E(s − t̃|βH , σH ) − E(s − t̃|βH , σL ) > 0.   (14.A.15)

To see this, first note that for any i = ι̂,

π(X(ω_ι̂ )|β, σ ) = Σ_{i=ι̂}^{N−1} [π(X(ωi )|β, σ ) − π(X(ωi+1 )|β, σ )] + π({ωN }|β, σ ).
Incompleteness of contractual form
329
Hence, Assumption 14.2b (π(X(ωi )|βH , σH /σL ) ≥ 0) implies

Σ_{i=ι̂}^{ī_S} [π(X(ωi )|βH , σH /σL ) − π(X(ωi+1 )|βH , σH /σL )] ≥ 0.   (14.A.16)

Then, (14.A.15) finally follows from the fact that s(ωi ) − t̃(ωi ) is weakly increasing in ωi , together with (14.A.16), (14.A.12), (14.A.13), and the fact that s(ω_{i_S} ) − t̃(ω_{i_S} ) < s(ω_{ī_S} ) − t̃(ω_{ī_S} ). But (14.A.15) in turn implies

E(t̃|βH , σH ) − E(t̃|βH , σL ) < SocBen(σH /σL ).   (14.A.17)
Hence, given that we chose t̃ to satisfy (ICS ), we have arrived at a contradiction.

Claim 14.A.2. E(s − t̃|βH , σH ) − E(s − t̃|βL , σH ) < SocBen(βH /βL ).

Proof. Suppose not; that is, E(s − t̃|βH , σH ) − E(s − t̃|βL , σH ) = SocBen(βH /βL ). Then, by an argument as in Claim 14.A.1, one may show that there must be a set of contiguous contingencies {ω_{i_B} , . . . , ω_{ī_B} } where t̃(ωi ) is constant when i varies in the closed interval [i_B , ī_B ]. Further, Condition 14.2 (a and b taken together) ensures that the intervals [i_S , ī_S ] and [i_B , ī_B ] overlap; that is, i_B < ī_S and i_S < ī_B . Since it has already been established that s(ωi ) − t̃(ωi ) is constant when i ∈ [i_S , ī_S ], the fact that t̃(ωi ) is constant for i ∈ [i_B , ī_B ] therefore implies that both s(ωi ) − t̃(ωi ) and t̃(ωi ) are constant when i ∈ [i̲, ī ], where i̲ ≡ min{i_B , i_S } and ī ≡ max{ī_B , ī_S }. Thus we are left with the contradictory conclusion that SocBen(βH /βL ) = 0 = SocBen(σH /σL ).

With Claim 14.A.2 we have established that if t̃ and s − t̃ are comonotonic, and if hS (σH ) − hS (σL ) = SocBen(σH /σL ), then we can find hB (·) such that s − t̃ will not satisfy (ICB ) if t̃ satisfies (ICS ).

Lemma 14.A.3. Suppose Conditions 14.2a, 14.2b, and 14.3 are satisfied and (βH , σH ) is the first-best profile. Then there will exist hB (·) and hS (·) such that (βH , σH ) cannot be implemented, even if there are t̃ and s − t̃ which satisfy (ICB ), (ICS ).

Proof. We choose two investment cost functions h̃B (·) and h̃S (·) such that any t and s − t which satisfy the corresponding (ICB ), (ICS ) are necessarily noncomonotonic. Lemma 14.A.2 assures that such a choice is available. Of all t and s − t that satisfy the corresponding (ICB ), (ICS ) with investment cost functions h̃B (·) and h̃S (·), let t̃ and s − t̃ be the pair which maximizes E(max{s(ωi ), 0} − t(ωi )|βH , σH ) + E(t(·)|βH , σH ).
Sujoy Mukerji
Given what has been assumed about π(·|β, σ), t̃ and s − t̃ are not comonotonic. Hence, it follows from Lemma 14.A.1(b) and Condition 14.3 that

E(max{s(ωi), 0} − t̃(ωi)|βH, σH) + E(t̃(·)|βH, σH) < E(max{s(ωi), 0}|βH, σH),

implying that there exist nonnegative real numbers ξb and ξs such that

E(max{s(ωi), 0} − t̃(ωi)|βH, σH) + E(t̃(·)|βH, σH) − h̃B(·) − ξb − h̃S(·) − ξs < 0,

even though

E(max{s(ωi), 0}|βH, σH) − h̃B(·) − ξb − h̃S(·) − ξs > 0.

Choose hB(·) = h̃B(·) + ξb and hS(·) = h̃S(·) + ξs; this choice will satisfy (IC_B), (IC_S) for the division of surplus t̃ and s − t̃; however, it will fail to satisfy the participation constraint(s) for implementing (βH, σH).

Proof of Lemma 14.2. Clearly t and s − t satisfy the appropriate incentive constraints. Notice that t and s − t are comonotonic and thus, by Lemma 14.A.2(a), the expectations operator is additive. Hence ex ante payments can be arranged to satisfy the individual participation constraints, given that the aggregate participation constraint (14.8) is satisfied.

Proof of Proposition 14.3. We know from Proposition 14.2 that there exists a tuple (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)) satisfying Conditions 14.1, 14.2a, 14.2b, and 14.3 such that no contract may implement the first best (βH, σH). Recall, (β∗, σ∗) is a second-best profile if ES(β′, σ′) > ES(β∗, σ∗) ⇒ (β′, σ′) is the first best. If (βL, σL) is the second best in the model described by the tuple (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)), then it must be the case that the inequalities (14.A.18), (14.A.19), (14.A.20) are satisfied, implying that the null contract will implement (βL, σL) (by Lemma 14.2).

λ̄[E(max{ŝ(ωi), 0}|βL, σL) − E(max{ŝ(ωi), 0}|βH, σL)] ≥ ĥB(βL) − ĥB(βH)
(14.A.18)
(1 − λ̄)[E(max{ŝ(ωi), 0}|βL, σL) − E(max{ŝ(ωi), 0}|βL, σH)] ≥ ĥS(σL) − ĥS(σH)
(14.A.19)
E(max{ŝ(ωi), 0}|βL, σL) ≥ ĥB(βL) + ĥS(σL).
(14.A.20)
To see why (14.A.18) and (14.A.19) must be satisfied, suppose (14.A.18) does not hold. That is,

λ̄[E(max{ŝ(ωi), 0}|βH, σL) − E(max{ŝ(ωi), 0}|βL, σL)] > ĥB(βH) − ĥB(βL).
(14.A.21)
But the assumption of stochastic dominance (in Condition 14.2a) implies that

(1 − λ̄)E(max{ŝ(ωi), 0}|βH, σL) − ĥS(σL) ≥ (1 − λ̄)E(max{ŝ(ωi), 0}|βL, σL) − ĥS(σL).
(14.A.22)
Summing up (14.A.21) and (14.A.22), we get (14.A.23):

E(max{ŝ(ωi), 0}|βH, σL) − ĥB(βH) − ĥS(σL) > E(max{ŝ(ωi), 0}|βL, σL) − ĥB(βL) − ĥS(σL).
(14.A.23)
But (14.A.23) contradicts the hypothesis that (βL, σL) is the second best. Hence, if (βL, σL) is the second best, the null contract is the optimal contract for the model (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)). Next consider the case where (βL, σL) is not the second best. Assume w.l.o.g. that (βH, σL) is the second best. Adjust the investment cost functions as follows: let h̃B(βH) = ĥB(βH); h̃S(σH) = ĥS(σH); and h̃B(βL) = ĥB(βL) + ε; h̃S(σL) = ĥS(σL) − ε, where ε is such that

λ̄[E(max{ŝ(ωi), 0}|βH, σL) − E(max{ŝ(ωi), 0}|βL, σL)] = h̃B(βH) − h̃B(βL).
(14.A.24)
It may be checked that with the adjusted cost functions, (βH, σL) will be implemented by the null contract; further, the adjustment will not alter the fact that (βH, σH) is the first best. One has to verify that (βH, σH) cannot be implemented given the adjusted cost functions h̃B and h̃S. To that end, suppose the contrary. By this hypothesis there exists a transfer t̃ which meets the required incentive and participation constraints corresponding to the adjusted cost functions h̃B and h̃S. That implies there exists a transfer t̂ = t̃ + α which ensures that the incentive and participation constraints are met for the implementation of (βH, σH) corresponding to the original investment cost functions ĥB and ĥS. This contradicts the fact that the model given by (π̂(·|β, σ), ĥB(·), ĥS(·), ŝ(·)) is such that the first best (βH, σH) cannot be implemented by a contract.
Acknowledgments

The chapter has benefited very substantially from the many constructive suggestions of the two anonymous referees. Their efforts went much beyond the call
of duty and I remain very grateful. Jim Malcomson's painstaking scrutiny of an earlier draft made possible the much-needed expositional improvements. I also thank Dieter Balkenborg, Jacques Crémer, David Kelsey, Fahad Khalil, Peter Klibanoff, Andrew Mountford, David Pearce, R. Edward Ray, Gerd Weinrich, and seminar members at various universities and conferences (especially the audience at the Conference on Decision Making Under Uncertainty held at Saarbrücken (Germany), University of Saarland) for helpful discussions and comments.
Notes

1 To preempt misunderstandings it is emphasized that the term "ambiguity," as used in this chapter, refers purely to the fuzzy perception of the likelihood subjectively associated with an event (e.g. when asked about his subjective estimate of the probability of an event, the agent replies, "It is between 50 and 60 percent."). It does not refer to a lack of clarity in the description of contingent events and actions. Also note, some authors and researchers refer to ambiguity as "Knightian uncertainty" or even simply as "uncertainty." As it is used in this chapter, the word "uncertainty" is simply the defining characteristic of any environment where the consequence of at least one action is not known for certain.

2 The reader is assured that the example is essentially unaffected by also having a state in which both the statements are true.

3 The author remains most grateful to the two anonymous referees for drawing his attention to this point.

4 In general, a nonadditive probability (or capacity) π obeys the axioms (i), (ii), and the condition that X ⊇ Y ⇒ π(X) ≥ π(Y). The axiom (iii) applies to the special case of a convex nonadditive probability. The term "convex" points to the requirement that the nonadditive probability of a set is (weakly) greater than the sum of the nonadditive probabilities of the cells of a partition of the set. Presumably, the analogy is to the property of any increasing convex function φ : R+ → R+ with φ(0) = 0, that φ(x) + φ(y) ≤ φ(x + y). It is when the nonadditive probability is convex that the CEU decision rule corresponds to ambiguity aversion.

5 Consider the following stronger version of the third property: (iii′) for every n > 0 and every collection χ1, . . . , χn ∈ 2^Ω,

π(∪_{i=1}^{n} χi) ≥ Σ_{I⊆{1,...,n}, I≠∅} (−1)^{|I|+1} π(∩_{i∈I} χi),

where |I| denotes the cardinality of I. Nonadditive probabilities which satisfy (iii′), in addition to (i) and (ii), have been variously referred to as "belief functions," "totally monotone capacities," and "n-convex capacities." In the rest of the chapter, all references to convex nonadditive probability measures should be understood to be referring to nonadditive probabilities which satisfy (i), (ii), and (iii′), rather than to those satisfying (i), (ii), and (iii), as they did in the version published in the AER. The amendment ensures that Lemma A1 is correct as stated. The author thanks Ben Polak for bringing to his notice that Lemma A1 need not hold for convex capacities which satisfy only (iii), the weaker version of the third property.

6 This follows from the celebrated theorem in Lloyd S. Shapley (1971) which asserts the existence of a core allocation corresponding to any convex characteristic value function defined on possible coalitions in a cooperative game.

7 Peter C. Fishburn (1993) provides an axiomatic justification of this definition of ambiguity and Mukerji (1997) demonstrates its equivalence to a more primitive and epistemic
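For n = 2, condition (iii′) reduces exactly to the convexity condition (iii). The following numerical sketch (not from the text; the capacity is a hypothetical ε-contamination of the uniform distribution on a three-state space) checks (iii) over all pairs of events:

```python
# Check the n = 2 instance of (iii'), i.e. convexity (iii):
# pi(X u Y) >= pi(X) + pi(Y) - pi(X n Y) for all events X, Y.
# The capacity is a hypothetical epsilon-contamination of the uniform
# distribution: pi(X) = (1 - eps) * |X| / |Omega| for X != Omega, pi(Omega) = 1.
from itertools import chain, combinations

states = (1, 2, 3)
eps = 0.5

def subsets(s):
    # All events (subsets of the state space), as frozensets.
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))]

def pi(X):
    if len(X) == len(states):
        return 1.0
    return (1 - eps) * len(X) / len(states)

convex = all(pi(X | Y) >= pi(X) + pi(Y) - pi(X & Y) - 1e-12
             for X in subsets(states) for Y in subsets(states))
```

Epsilon-contamination capacities of this kind are in fact belief functions, so they satisfy the stronger condition (iii′) for every n as well; the sketch only verifies the n = 2 case.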
Incompleteness of contractual form
notion of ambiguity (expressed in terms of the DM's knowledge of the state space). Massimo M. Marinacci (1995) applies the idea to game theory, while David Kelsey and Shasikanta Nandeibam's (1996) analysis explains why this definition is sometimes interpreted as a measure of "uncertainty aversion."

8 The Choquet expectation operator may be directly defined with respect to a nonadditive probability. Label ωi such that f(ω1) ≤ · · · ≤ f(ωN). Then,

CEπ(f) = f(ω1) + Σ_{i=2}^{N} [f(ωi) − f(ω_{i−1})] × π({ωi, . . . , ωN})
       = Σ_{i=1}^{N−1} f(ωi)[π({ωi, . . . , ωN}) − π({ω_{i+1}, . . . , ωN})] + f(ωN)π({ωN}).
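The equality of the two expressions above can be checked numerically; the sketch below uses hypothetical payoffs and capacity values (not from the text) on a three-state space:

```python
# Verify that the telescoping and the summation-by-parts forms of the
# Choquet expectation in Note 8 agree. Payoffs are labeled so that
# f(w1) <= f(w2) <= f(w3); pi_upper[i] stores pi({w_{i+1}, ..., w_N}).

def choquet_telescoping(f, pi_upper):
    # CE = f(w1) + sum_{i>=2} [f(wi) - f(w_{i-1})] * pi({wi, ..., wN})
    total = f[0]
    for i in range(1, len(f)):
        total += (f[i] - f[i - 1]) * pi_upper[i]
    return total

def choquet_by_parts(f, pi_upper):
    # CE = sum_{i<N} f(wi) * [pi({wi,...,wN}) - pi({w_{i+1},...,wN})] + f(wN) * pi({wN})
    N = len(f)
    total = f[N - 1] * pi_upper[N - 1]
    for i in range(N - 1):
        total += f[i] * (pi_upper[i] - pi_upper[i + 1])
    return total

f = [1.0, 2.0, 5.0]            # a hypothetical act, sorted ascending
pi_upper = [1.0, 0.5, 0.2]     # pi(Omega) = 1, pi({w2, w3}) = 0.5, pi({w3}) = 0.2

ce_a = choquet_telescoping(f, pi_upper)   # 1 + 1*0.5 + 3*0.2 = 2.1
ce_b = choquet_by_parts(f, pi_upper)      # 1*0.5 + 2*0.3 + 5*0.2 = 2.1
```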
9 For a fuller review of the arithmetic of the Choquet expectation operator, see Example 14.A.1 in the Appendix.

10 This is technically evident from the fact that if {X, Y} is a partition of the set E, then convexity of the belief (on E) implies A(π(X)) + A(π(Y)) ≥ A(π(E)).

11 Usually stochastic dominance is defined with respect to the payoff or the outcome space. As stated here, instead, the reference is to the underlying contingency space. Thus we have to suitably amend the usual definition to accommodate the fact that contiguous contingencies may yield the same outcome, that is, the same surplus.

12 The reader will observe that this notion of the first best is "vindicated" by the fact that this is the profile that would be chosen if the investment effort were contractible.

13 In particular, this allows for terms of trade being contingent on realizations of v(·) and c(·) by the simple expedient of making the terms contingent on events such as E(V; C) ≡ {ωi ∈ Ω | v(ωi) = V and c(ωi) = C}.

14 By taking δ as a mapping into {0, 1}, ex post randomization is ruled out. This follows the dominant tradition in the literature on incomplete contracting; see, for example, Hart and John H. Moore (1988).

15 It seems a reasonable conjecture that Conditions 14.2 and 14.3 are generic. If, for instance, Condition 14.2 fails to hold, even the slightest perturbation of beliefs should restore the condition. A similar understanding can be suggested for Condition 14.3. In a suitably rich space of measures that includes all convex nonadditive measures, the subspace of beliefs that are strictly additive (over at least some events) would appear to be nongeneric. (NB the parametric specification in Example 14.2 satisfies Conditions 14.1, 14.2a, 14.2b, and 14.3.)

16 A referee has remarked, "the claim . . . that [a] transactions cost argument cannot rationalize the use of long-term contracts in stable environments but not in highly uncertain ones is debatable . . .
if the greater uncertainty means more contingencies that must be foreseen, described, bargained over, and ultimately recognized . . . ." While admittedly debatable, the claim is definitely defensible. It is certainly intuitive to posit a link between the nature of the incumbent uncertainty and the extent of transactions costs. But one is yet to see a formal clarification of such a story. For instance, it is hard to figure precisely what primitives and principles imply that "greater uncertainty means more contingencies that must be foreseen (etc.)." The point is, while ambiguity aversion does manage to convey a precise and coherent account of a link between uncertainty and contracting costs, the transactions-costs paradigm is yet to find one.

17 W. Bentley MacLeod and James M. Malcomson (1993) explain Joskow contracts by essentially arguing that in such contexts conditioning instructions over a coarse partition
of the contingency space is sufficient. This explanation is very consistent with the ambiguity aversion story: as has been observed earlier (Note 9), the coarser the partition the less the bite from ambiguity.
References

Anderlini, Luca and Felli, Leonardo. 1994, "Incomplete Written Contracts: Undescribable States of Nature," Quarterly Journal of Economics, November, 109(4), pp. 1085–124.
Baker, George; Gibbons, Robert and Murphy, Kevin J. 1997, "Implicit Contracts and the Theory of the Firm," Unpublished manuscript.
Bernheim, B. Douglas and Whinston, Michael D. 1997, "Incomplete Contracts and Strategic Ambiguity," Discussion Paper No. 1787, Harvard University.
Camerer, Colin F. and Weber, Martin. 1992, "Recent Developments in Modelling Preferences: Uncertainty and Ambiguity," Journal of Risk and Uncertainty, October, 5(4), pp. 325–70.
Carlton, Dennis W. 1979, "Vertical Integration in Competitive Markets Under Uncertainty," Journal of Industrial Economics, March, 27(3), pp. 189–209.
Coase, Ronald. 1937, "The Nature of the Firm," Economica, November, 4(16), pp. 386–405.
Dow, James P. and Werlang, Sergio R. 1992, "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, January, 60(1), pp. 197–204. (Reprinted as Chapter 17 in this volume.)
—— 1994, "Nash Equilibrium Under Knightian Uncertainty: Breaking Down Backward Induction," Journal of Economic Theory, December, 64(2), pp. 305–24.
Eichberger, Jurgen and Kelsey, David. 1996, "Signalling Games with Uncertainty," Mimeo, University of Birmingham, U.K.
Ellsberg, Daniel. 1961, "Risk, Ambiguity, and the Savage Axioms," Quarterly Journal of Economics, November, 75(4), pp. 643–69.
Epstein, Larry G. and Wang, Tan. 1994, "Intertemporal Asset Pricing Under Knightian Uncertainty," Econometrica, March, 62(2), pp. 283–322. (Reprinted as Chapter 18 in this volume.)
—— 1995, "Uncertainty, Risk-Neutral Measures and Security Price Booms and Crashes," Journal of Economic Theory, October, 67(1), pp. 40–82.
Fishburn, Peter C. 1993, "The Axioms and Algebra of Ambiguity," Theory and Decision, March, 34(2), pp. 119–37.
Ghirardato, Paolo. 1995, "Coping with Ignorance: Unforeseen Contingencies and Nonadditive Uncertainty," Mimeo, University of California, Berkeley.
Grossman, Sanford J. and Hart, Oliver D. 1986, "The Costs and Benefits of Ownership: A Theory of Vertical and Lateral Integration," Journal of Political Economy, August, 94(4), pp. 691–719.
Hart, Oliver D. 1995, Firms, contracts, and financial structure. Oxford: Clarendon Press.
Hart, Oliver D. and Moore, John H. 1988, "Incomplete Contracts and Renegotiation," Econometrica, July, 56(4), pp. 755–85.
Joskow, Paul L. 1985, "Vertical Integration and Long-Term Contracts: The Case of Coal-Burning Electric Generating Plants," Journal of Law, Economics and Organization, Spring, 1(1), pp. 33–80.
Kelsey, David and Nandeibam, Shasikanta. 1996, "On the Measurement of Uncertainty Aversion," Mimeo, University of Birmingham, U.K.
Klein, Benjamin; Crawford, Robert G. and Alchian, Armen A. 1978, "Vertical Integration, Appropriable Rents and the Competitive Contracting Process," Journal of Law and Economics, October, 21(2), pp. 297–326.
Legros, Patrick and Matsushima, Hitoshi. 1991, "Efficiency in Partnerships," Journal of Economic Theory, December, 55(2), pp. 296–322.
Lipman, Barton L. 1992, "Limited Rationality and Endogenously Incomplete Contracts," Queen's Institute for Economic Research Discussion Paper No. 858, October.
Lo, Kin Chung. 1998, "Sealed Bid Auctions with Uncertainty Averse Bidders," Economic Theory, July, 12(1), pp. 1–20.
MacLeod, W. Bentley and Malcomson, James M. 1993, "Investments, Holdup, and the Form of Market Contracts," American Economic Review, September, 83(4), pp. 811–37.
Malcomson, James M. 1997, "Contracts, Hold-Up, and Labor Markets," Journal of Economic Literature, December, 35(4), pp. 1916–57.
Marinacci, Massimo M. 1995, "Ambiguous Games," Mimeo, Northwestern University.
Masten, Scott E.; Meehan, James W. and Snyder, Edward A. 1989, "Vertical Integration in the U.S. Auto Industry: A Note on the Influence of Transaction Specific Assets," Journal of Economic Behavior and Organization, October, 12(2), pp. 265–73.
Mukerji, Sujoy. 1995, "A Theory of Play for Games in Strategic Form when Rationality Is Not Common Knowledge," Mimeo, University of Southampton, U.K.
—— 1997, "Understanding the Nonadditive Probability Decision Model," Economic Theory, January, 9(1), pp. 23–46.
Schmeidler, David. 1989, "Subjective Probability and Expected Utility without Additivity," Econometrica, May, 57(3), pp. 571–87. (Reprinted as Chapter 5 in this volume.)
Shapley, Lloyd S. 1971, "Cores of Convex Games," International Journal of Game Theory, January, 1(1), pp. 12–26.
Simon, Herbert A. 1951, "A Formal Theory of the Employment Relationship," Econometrica, July, 19(3), pp. 293–305.
Tallon, Jean-Marc. 1998, "Asymmetric Information, Nonadditive Expected Utility, and the Information Revealed by Prices: An Example," International Economic Review, May, 39(2), pp. 329–42.
Tirole, Jean. 1994, "Incomplete Contracts: Where Do We Stand?" Walras-Bowley Lecture, Summer Meetings of the Econometric Society.
Williams, Steven R. and Radner, Roy. 1988, "Efficiency in Partnership When the Joint Output Is Uncertain," Northwestern Center for Mathematical Studies in Economics and Management Science Working Paper No. 760.
Williamson, Oliver E. 1985, The economic institutions of capitalism. New York: Free Press.
15 Ambiguity aversion and incompleteness of financial markets

Sujoy Mukerji and Jean-Marc Tallon
15.1. Introduction

Suppose an agent's subjective knowledge about the likelihood of contingent events is consistent with more than one probability distribution, and, further, that what the agent knows does not inform him of a precise (second-order) probability distribution over the set of "possible" (first-order) probabilities. We say then that the agent's beliefs about contingent events are characterized by ambiguity. If ambiguous, the agent's beliefs are captured not by a unique probability distribution in the standard Bayesian fashion but instead by a set of probabilities. Thus not only is the outcome of an act uncertain but so too is its expected payoff, since the payoff may be measured with respect to more than one probability. An ambiguity averse decision maker evaluates an act by the minimum expected value that may be associated with it: the decision rule is to compute all possible expected values for each act and then choose the act which has the best minimum expected outcome. This (informal) notion of ambiguity aversion inspires the formal model of Choquet expected utility (CEU) preferences introduced in Schmeidler (1989). The present chapter considers a model of financial markets populated by agents with CEU preferences, with the interpretation that the agents' preferences demonstrate ambiguity aversion.1 Typically, economic agents are endowed with income streams that are not evenly spread over time or across uncertain states of nature. A financial contract is a claim to a contingent income stream; hence the logic of the financial markets: by exchanging such claims agents change the shapes of their income streams, obtaining a more even consumption across time and the uncertain contingencies. A financial market is said to be complete if contingent payoffs from the different marketed financial contracts are varied enough to span all the contingencies.
However, casual empiricism suggests that in just about every financial market in the real world the span is less than the full set of contingencies, that is, the markets are incomplete. The primary implication of incompleteness of financial markets is that agents may transfer income only across a limited set of contingencies and are thus left to share risk in a suboptimal manner.2
Mukerji, Sujoy and Tallon, Jean-Marc (forthcoming). “Ambiguity aversion and incompleteness of financial markets,” Review of Economic Studies (2001), vol. 68(4), 883–904.
Consider the following question: take a (financial) economy with complete markets, but suppose agents are not subjective expected utility (SEU) maximizers but rather CEU maximizers; are there conditions under which, at a competitive equilibrium, agents do not trade some assets, so that their equilibrium allocations are equivalent to competitive allocations deriving from some incomplete market economy wherein the allocations are not Pareto optimal? The answer to the question is a qualified yes. The qualification is important, and the essential contribution of the present chapter is in identifying this qualification. Imposing CEU maximization in a complete market economy does not by itself generate no-trade; but, as this chapter shows, one can construct a robust sequence of incomplete market economies which would converge to complete markets with SEU agents but does not with CEU agents. The key characteristic of such a sequence of economies is that they include, as nonredundant instruments of risk-sharing, financial assets which are affected by idiosyncratic risk.3 We establish that trade in financial assets, whose payoffs have idiosyncratic components, may break down because of ambiguity aversion. We find, furthermore, that the no-trade due to ambiguity aversion is a robust occurrence, in the sense that it takes place even in the limit replica economy, with enough replicas of the financial assets that idiosyncratic risk may be completely hedged. Hence, the behavior of the limit replica economy is markedly different depending on whether agents are SEU maximizers or CEU maximizers: in the former case the allocation is precisely that of a complete markets economy, whereas in the latter case, because of the endogenous breakdown of trade, the equilibrium allocation, given a "high enough" level of ambiguity aversion and idiosyncratic risk, is not Pareto optimal and the nature of risk-sharing is as in an incomplete markets economy.
These findings are of interest both for the way they complement the related literature and for the substantive economic insight they give rise to. Dow and Werlang (1992) showed, in a model with one risky and one riskless asset, a single ambiguity averse agent with CEU preferences, exogenously determined asset prices, and a riskless initial endowment, that there exists a nondegenerate price interval at which the agent will strictly prefer to take a zero position in the risky asset. Recall, the logic of this result essentially rests on the observation that a CEU agent going short in the risky asset will use a different probability to evaluate expected return than when going long, since an agent taking a short (long) position is relatively better (respectively, worse) off in states where the asset payoff is shocked adversely. Having (robustly) rationalized a zero position in a single decision-maker framework, one might be tempted to conjecture (even though such a conjecture is not made by Dow and Werlang) that it were but a short step to generate no-trade in a full equilibrium model. But, as we remarked earlier, simply imposing CEU maximization in a complete market economy does not generate no-trade unless endowments are Pareto optimal to begin with. The point is that, with complete markets, allocations are Pareto optimal and hence comonotonic, that is, every agent ranks the states identically when states are ordered by the agent's ex post utility from the given allocation (Chateauneuf et al. (2000)). Comonotonicity implies that all agents evaluate the returns of assets with
the same probability measure in a CEU world. Thus, closing the Dow and Werlang model in the obvious way makes it apparent that, for generic endowments, assets will surely be traded. Hence it is, at least, of academic interest to find what condition actually generates an endogenous closure of some financial markets and a consequent lessening of risk-sharing opportunities, when moving from an SEU to a CEU world. Perhaps a more compelling reason for interest in our findings is their economic significance. It is widely regarded that a crucial function of financial markets is that they allow individuals to hedge their income (from, say, human capital/labor) risk even though such risks are not, per se, contractible in appropriate detail, for the usual reasons of asymmetric information and/or transactions costs. For instance, take X, a shopowner in Detroit, whose fortunes are heavily dependent on the fortunes of the automobile industry centered in Detroit. While X would love to smooth consumption across the various possible income shocks, it is hardly likely that an insurance company would be willing to insure X against anything other than accidents like fire and theft. But, standard economic/finance theory would argue, even though such personalized contracts may not be available, X should be able to hedge his income shocks in the stock market. To transfer income from the "good" states to the "bad," all that is required is that X take a short position on a portfolio of shares of different firms in the automobile (and related) industries and a long position on a "safe" asset (e.g. a government bond). Of course, the returns of any particular share will not be perfectly correlated with X's income; in particular, each individual share return will be subject to some idiosyncratic risk. But, with a large enough number of such equities in the portfolio, the idiosyncrasies may be hedged away, and X would find the (almost) perfect hedge for his income shocks.
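The diversification logic in X's story can be sketched numerically (purely illustrative; the return process and the parameters are invented here): each share pays a common industry component plus an i.i.d. idiosyncratic shock, and an equal-weighted portfolio of n shares averages the idiosyncratic part away, leaving only the industry risk X wants to export.

```python
# Illustrative simulation: the standard deviation of an equal-weighted
# portfolio of n shares, each paying a common N(0,1) industry shock plus
# an independent N(0,1) idiosyncratic shock, falls toward the industry
# risk alone (std 1) as n grows; with n = 1 it is about sqrt(2).
import random

random.seed(0)

def portfolio_std(n_assets, n_draws=4000):
    draws = []
    for _ in range(n_draws):
        common = random.gauss(0.0, 1.0)                        # industry-wide shock
        idio = sum(random.gauss(0.0, 1.0) for _ in range(n_assets)) / n_assets
        draws.append(common + idio)
    mean = sum(draws) / n_draws
    return (sum((x - mean) ** 2 for x in draws) / n_draws) ** 0.5

s_single = portfolio_std(1)     # idiosyncratic risk fully present, about sqrt(2)
s_many = portfolio_std(100)     # idiosyncratic risk largely hedged away, about 1
```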
To X, therefore, for all practical purposes, the economy is very much a complete market economy. However, what this chapter shows is that the story only runs so far: it holds in an SEU world, not in a CEU world. Consider two agents trading an equity subject to idiosyncratic risk, with one agent taking a short position while the other goes long. Evidently, then, the variation in each agent's consumption across states which differ only in terms of the idiosyncratic shocks would be exclusively determined by the nature and extent of the shocks and the agent's position on the asset. Moreover, the variation of each agent's consumption across such states will be inversely related, and therefore their consumption will not be comonotonic. Hence, given ambiguity aversion with CEU preferences, an agent will behave as if he applies a different probability measure depending on whether he is choosing to go short or to go long. Therefore, it may be that the minimum asking price of the agent when choosing to go short will be higher than the maximum bid of the agent when choosing to go long. Thus no trade may result, and the chapter provides sufficient conditions that deliver this result. Indeed, as we show, the no-trade outcome will survive even in the limit, when there are an arbitrarily large number of (independent) replicas of the equity. The intuition here is that the law of large numbers implies that the agents' beliefs on the payoff of a portfolio of risky assets, hit (in part) by idiosyncratic shocks, converge to some mean, but the mean is in principle different for
agents taking differently signed positions on the (relevant) assets. In this fashion, ambiguity aversion creates an endogenous limit to the extent of risk sharing possible through financial markets, thereby providing a (theoretical) justification for the basic premise of the general equilibrium with incomplete markets (GEI) model. To see it through the eyes of X: in a CEU world, unlike in an SEU world, there may not exist prices that would allow X to go short on automobile industry equities as he needs to do to "export" his income risk. The same market which offers possibilities of risk-sharing equivalent to complete markets when beliefs and behavior are in accordance with SEU offers only the Pareto suboptimal risk-sharing possibilities of an incomplete market economy when agents are CEU maximizers with beliefs that are "sufficiently" ambiguous. The rest of the chapter is organized as follows. Section 15.2 provides an introduction to the formal model of ambiguity aversion applied in this chapter. Section 15.3 contains the formal model of the finance economy and the main result. Section 15.4 concludes the chapter. Appendix A contains some technical material on independence and the law of large numbers for capacities. All formal proofs are in Appendix B.
15.2. Choquet expected utility and the related literature

Let Ω = {ωi}_{i=1}^{N} be a finite state space, and assume that the decision maker (DM) chooses among acts with state-contingent payoffs, z : Ω → R. In the CEU model (Schmeidler, 1989) an ambiguity averse DM's subjective belief is represented by a convex nonadditive probability function (or a convex capacity) ν such that (i) ν(∅) = 0, (ii) ν(Ω) = 1 and (iii) ν(X ∪ Y) ≥ ν(X) + ν(Y) − ν(X ∩ Y), for all X, Y ⊂ Ω. Define the core of ν (notation: Δ(Ω) is the set of all additive probability measures on Ω):

C(ν) = {µ ∈ Δ(Ω) | µ(X) ≥ ν(X), for all X ⊂ Ω}.

Hence, ν(X) = min_{µ∈C(ν)} µ(X). The ambiguity4 of the belief about an event X is measured by the expression A(X; ν) ≡ 1 − ν(X) − ν(X^c) = max_{µ∈C(ν)} µ(X) − min_{µ∈C(ν)} µ(X). As in SEU, a utility function u : R+ → R, u′(·) ≥ 0, describes the DM's attitude to risk and wealth. Given a convex nonadditive probability ν, the Choquet expected utility5 of an act is simply the minimum of all possible "standard" expected utility values obtained by measuring the contingent utilities possible from the act with respect to each of the additive probabilities in the core of ν:

CEν u(z) = min_{µ∈C(ν)} Σ_{ω∈Ω} u(z(ω))µ(ω).
The fact that the same additive probability in C(ν) will not in general "minimize" the expectation for two different acts explains why the Choquet expectations operator is not additive; that is, given any acts z, w: CEν(z) + CEν(w) ≤ CEν(z + w).
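A two-state numerical sketch of this inequality (the capacity values match the Dow and Werlang example discussed below; the acts are invented for illustration): the Choquet expectation is computed as a minimum over the core, and additivity fails exactly because different acts pick different minimizing probabilities.

```python
# Choquet expectation over a two-state space {L, H} as a minimum over the
# core of a convex capacity nu, illustrating CE(z) + CE(w) <= CE(z + w).
# Capacity values nu(L) = 0.3, nu(H) = 0.4; the acts z, w are hypothetical.

def core_extremes(nu_L, nu_H):
    # For two states the core is {mu : nu_L <= mu(L) <= 1 - nu_H};
    # a linear functional attains its minimum at one of the two endpoints.
    return [(nu_L, 1 - nu_L), (1 - nu_H, nu_H)]

def ce(act, nu_L=0.3, nu_H=0.4):
    # act = (payoff in L, payoff in H)
    return min(p_L * act[0] + p_H * act[1] for p_L, p_H in core_extremes(nu_L, nu_H))

z = (1.0, 3.0)          # pays more in H
w = (4.0, 2.0)          # pays more in L: z and w are not comonotonic
zw = (z[0] + w[0], z[1] + w[1])

ce_z, ce_w, ce_zw = ce(z), ce(w), ce(zw)   # about 1.8, 2.6, 5.0; note 1.8 + 2.6 < 5.0
```

Because z and w rank the states oppositely, they are evaluated with different probabilities from the core, and strict superadditivity results; for comonotonic acts the same minimizer serves both, and the inequality holds with equality.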
The operator is additive, however, if the two acts z and w are comonotonic, that is, if (z(ωi) − z(ωj))(w(ωi) − w(ωj)) ≥ 0 for all ωi, ωj ∈ Ω. In our analysis, we will need to consider the independent product of capacities. The independent product of two convex capacities, ν1 and ν2, according to the definition (suggested by Gilboa and Schmeidler, 1989) we apply in this chapter, may be (informally) understood as the lower envelope of the set {µ1 × µ2 | µ1 ∈ C(ν1), µ2 ∈ C(ν2)}. Unlike what is true with "standard" probabilities, there is more than one way to define the independent product of two capacities. As it turns out, the formal analysis in this chapter is unaffected if an alternative definition of independence were applied. We refer the interested reader to Appendix A and to the discussion at the end of Section 15.3 for more on the independent product of capacities, and turn next to the use of capacities and CEU in portfolio decision problems. Dow and Werlang (1992), as noted earlier, identified an important implication of Schmeidler's model. They showed, in a model with one risky and one riskless asset, that if a CEU maximizer has a riskless endowment then there exists a set of asset prices that support the optimal choice of a riskless portfolio. The intuition behind this finding may be grasped in the following example. Consider an asset z that pays off 1 in state L and 3 in state H, and assume that ν(L) = 0.3 and ν(H) = 0.4. Assuming that the DM has a linear utility function, the expected payoff of buying a unit of z, the act zb, is given by CEν(zb) = 0.6 × 1 + 0.4 × 3 = 1.8. On the other hand, the payoff from going short on a unit of z (the act zs) is higher at L than at H. Hence, the relevant minimizing probability when evaluating CEν(zs) is the probability in C(ν) that puts most weight on H. Thus, CEν(zs) = 0.3 × (−1) + 0.7 × (−3) = −2.4.
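The two Choquet values just computed pin down the inertia interval; a small sketch (linear utility, as in the example) recovers them and the resulting no-trade price range:

```python
# Recompute the example's numbers: an asset paying 1 in state L and 3 in
# state H, with nu(L) = 0.3, nu(H) = 0.4 and linear utility. The Choquet
# value of a two-state act puts weight 1 - nu(best state) on the worst state.

def choquet(payoff_L, payoff_H, nu_L=0.3, nu_H=0.4):
    if payoff_L <= payoff_H:                          # worst outcome in L
        return (1 - nu_H) * payoff_L + nu_H * payoff_H
    return nu_L * payoff_L + (1 - nu_L) * payoff_H    # worst outcome in H

buy = choquet(1, 3)        # value of going long one unit:  0.6*1 + 0.4*3 = 1.8
short = choquet(-1, -3)    # value of going short one unit: 0.3*(-1) + 0.7*(-3) = -2.4

# A zero position is strictly preferred at any price p with
# buy < p < -short, i.e. p in the interval (1.8, 2.4).
no_trade = (buy, -short)
```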
Hence, if the price of the asset z were to lie in the open interval (1.8, 2.4), then the investor would strictly prefer a zero position to either going short or buying. Unlike in the case of unambiguous beliefs, there is no single price at which the agent switches from buying to selling. Taking a zero position on the risky asset has the unique advantage that its evaluation is not affected by ambiguity. The "inertia" zone demonstrated by Dow and Werlang was simply a statement about optimal portfolio choice corresponding to exogenously determined prices, given an initially riskless position. However, it does not follow from this result at the individual level that no-trade is an equilibrium when closing the model by allowing agents to trade their risks, as we illustrate next using the Edgeworth box diagram in Figure 15.1. The diagram depicts the possibilities of risk-sharing (one may think of the risk-sharing as being obtained through the exchange of two Arrow securities, one for each contingency) between two CEU agents, h = 1, 2, with uncertain endowments in the two states, ωa and ωb. W is the endowment vector. Notice that, because of ambiguity aversion, the indifference curves are kinked at the point of intersection with the 45° ray through the origin. The shaded area in the diagram represents the area of mutually advantageous trade. Hence, no-trade is an equilibrium outcome in this economy if and only if the endowment is Pareto optimal to begin with. Introduction of ambiguity aversion in an economy, seemingly, would not impede the trade in risk-sharing contracts and would not be a reason for incomplete risk sharing. The reason for this "absence of no-trade" goes as follows: Pareto optimal
Incompleteness of financial markets
[Figure: Edgeworth box with origins O_1 and O_2, axes x_1^a, x_1^b, x_2^a, x_2^b, the endowment point W, and 45° rays through each origin.]
Figure 15.1 Risk sharing with two CEU agents.
allocations lie within the “tramlines,” the 45° rays through each origin, that is, they are comonotonic. Hence, at a Pareto optimal allocation, the ranking of the states is identical for both agents and is given by the ordering of aggregate endowment. Now, with complete markets, equilibrium allocations are Pareto optimal and therefore comonotonic as well. Thus, agents use the same “minimizing probability” at equilibrium, and agree on asset valuation. Risk-sharing proceeds just as in an economy with SEU agents (see Chateauneuf et al. (2000)). Thus, if one wants equilibrium to be characterized by the absence of trade, one has to move away from this (canonical) example, something that is accomplished by introducing into the model assets with idiosyncratic payoff components. Epstein and Wang (1994) recognized the role of the first of the two conditions defining idiosyncratic risk (as defined in this chapter) in obtaining nonunique equilibrium asset prices in a CEU world. That result is related to ours. The precise relationship between the results deserves careful discussion. For expository purposes, we turn to this discussion at the end of the next section, after the presentation of our model. We end this section with a discussion of another model of behavior under Knightian uncertainty, due to Bewley (1986), distinct from the one applied in this chapter, which easily generates a no-trade result. Bewley, essentially, drops Savage’s assumption that preferences are complete and adds an axiom of the “status quo.” In our Edgeworth box this would amount to assuming that indifference curves are kinked precisely at the endowment point, irrespective of its position in the box. If indifference curves are “kinked enough,” the incompleteness of markets for contingent deliveries (the absence of trade) is then a direct consequence of preference for the status quo, which is exogenously imposed as a part of the definition of ambiguity aversion.
Sujoy Mukerji and Jean-Marc Tallon
15.3. The model and the main result

The setting for our formal analysis is a model of a stylized two-period finance economy which we call an n-financial asset economy with idiosyncracy. Households (h = 1, …, H) trade assets in period 0, before uncertainty is resolved, and consume the one (and only) good in period 1. The assets available for trade are claims on deliveries of the consumption good in period 1. There are two sources of uncertainty. First, there is some “economic uncertainty”: agents do not know their endowments tomorrow. An economic state of the world, s, s = 1, 2, is completely identified by the endowment vector for that state, (e_1^s, …, e_h^s, …, e_H^s), where each component of the vector, e_h^s ∈ R_+, gives a particular household’s endowment of the consumption good in state s (arising in period 1). We have restricted our analysis to the case of risk-sharing across only two economic states, to make the argument as transparent as possible. Second, there is idiosyncratic financial uncertainty. An idiosyncratic state of the world completely characterizes the realization of the idiosyncratic components of payoffs of the available financial assets (described below); it is identified by the vector t = (t_1, t_2, …, t_n), where t_i ∈ {0, 1}, i ∈ {1, 2, …, n}, and n is the total number of financial assets. τ_n denotes the set of all t’s, that is, τ_n ≡ {0, 1}^n. Hence, to obtain a complete description of a state of the world, exhausting all uncertainty relevant to the model, the economic states s must each be further partitioned into cells denoted (s, t). A typical state of the world is denoted by the letter ω, ω ∈ Ω ≡ {(1, t)_{t∈τ_n}, (2, t)_{t∈τ_n}}. The assets available for trade at date 0 are as follows:

1 Financial assets, z_i, i = 1, …, n, with payoffs that have idiosyncratic components. An asset z_i yields a payoff of y^s + y(t_i) > 0 units of the good; s = 1, 2, t_i ∈ τ ≡ {0, 1}. y(t_i) is the idiosyncratic component, in the sense that it is independent of the realized economic state and independent of the realization of the payoff from any other financial asset z_j, where j ≠ i. It is assumed that y(1) > y(0) and that y^1 ≠ y^2. The price of an asset z_i is denoted by q_n^{z_i}.
2 A safe asset, b, which delivers one unit of the good irrespective of the realized state of the world. The price of this security is normalized to 1.
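The state space and asset payoffs just defined can be sketched concretely as follows; the numerical values of n, y^s, y(0) and y(1) below are illustrative assumptions, not taken from the chapter:

```python
from itertools import product

n = 3                      # number of financial assets (assumed)
y = {1: 1.0, 2: 2.0}       # economic-state components y^s (assumed values)
y_idio = {0: 0.0, 1: 2.0}  # idiosyncratic components, y(0) < y(1) (assumed)

# tau_n = {0,1}^n; a full state of the world is a pair (s, t).
tau_n = list(product([0, 1], repeat=n))
omega = [(s, t) for s in (1, 2) for t in tau_n]

def payoff(i, state):
    """Payoff of asset z_i in state (s, t): y^s + y(t_i)."""
    s, t = state
    return y[s] + y_idio[t[i]]

print(len(omega))                 # 2 * 2**n = 16 states
print(payoff(0, (2, (1, 0, 0))))  # y^2 + y(1) = 4.0
```

Note that the payoff of z_i depends on the state only through s and the i-th coordinate of t, which is exactly the sense in which the component y(t_i) is idiosyncratic.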
A point behind modeling the asset structure as above is to ensure that, in order to transfer resources across the two economic states, the agents have to rely on financial assets whose payoffs are affected by idiosyncratic shocks. Prior to the resolution of uncertainty, agents are endowed with a common belief about the likelihood of each state ω. The (marginal) beliefs about a particular idiosyncratic component t_i are described by a capacity ν_i, ν_i(0) + ν_i(1) ≤ 1. To model the assumption that the realizations of t_i and t_j are believed to be independent, the beliefs on τ_n are described by the independent product (defined in Appendix A) ν ≡ ⊗_{i=1}^n ν_i. For simplicity, we shall assume that ν_i(t_i = r) = ν_j(t_j = r) = ν^r, r = 0, 1, i, j ∈ {1, …, n}. The belief on an economic state s is given by π(s). To make it transparent that it is the ambiguity of beliefs about the idiosyncratic realizations which is responsible for the possibility of no-trade in financial assets,
and also to make the computation less tedious, we assume π(1) + π(2) = 1. Finally, the common belief on Ω is given by the independent product π ⊗ ν. Let e_{h,n}^ω and x_{h,n}^ω be h’s endowment and consumption, respectively, in state ω = (s, t), given that the total number of financial assets in the economy is n. Note, the definition of an economic state implies e_{h,n}^{(s,t)} = e_{h,n}^{(s,t′)}. Hence, we may use the notation e_{h,n}^s as a complete description of state contingent endowment. Holding of the asset b by h is denoted b_{h,n} and holding of the asset z_i by h is denoted z_{h,n}^i. Agent h has a von Neumann–Morgenstern utility index u_h : R_+ → R, which is assumed to be strictly increasing, smooth and strictly concave. Furthermore, u_h′(0) = ∞ and e_{h,n}^ω > 0 for all h and all ω. P_h^n, which denotes the maximization program of agent h, is as follows:

max_{b_{h,n}, z_{h,n}^1, …, z_{h,n}^n} CE_{π⊗ν} u_h(x_{h,n}^{s,t})

s.t. b_{h,n} + Σ_{i=1}^n q_n^{z_i} z_{h,n}^i = 0,
x_{h,n}^{s,t} − e_{h,n}^s = b_{h,n} + Σ_{i=1}^n (y^s + y(t_i)) z_{h,n}^i, s = 1, 2, t ∈ τ_n.
An equilibrium consists of a set of asset prices, q_n ≡ (1, q_n^{z_1}, …, q_n^{z_n}), a set of asset holdings, (b_n, z_n) ≡ ((b_{h,n}, z_{h,n}^1, …, z_{h,n}^n))_{h=1}^H, and a consumption vector, x_n ≡ (x_{h,n}^ω)_{h=1,…,H; ω∈Ω}, such that, given q_n, all agents solve P_h^n, the asset markets clear, that is,

Σ_h b_{h,n} = Σ_h z_{h,n}^i = 0, ∀i ∈ {1, …, n},

and the consumption vector is feasible at each state, that is, Σ_h x_{h,n}^ω = Σ_h e_{h,n}^ω. Notice, a tuple (q_n, (b_n, z_n)) uniquely pins down the equilibrium; hence we may denote an equilibrium of an n-financial asset economy using such a tuple. In interpreting various aspects of the model it helps to bear in mind the economic issue the model has been formulated to examine, which is how economic agents may share risks, inherent to their labor/human capital endowment, by trading in financial markets. Hence, as it appears in the model, a household’s endowment income is distinct from the household’s income obtained from the ownership of assets. Portfolio income is the instrument the household is allowed to use to absorb the shocks it faces in its endowed income. But the instrument is not a perfect one. The presence of idiosyncratic risk embodies the notion that the payoff from a financial asset is affected not only by some of the same shocks that affect individual households’ endowment income and are common to many assets, but also by risks specific to each asset. While most firms’ profits are naturally affected by aggregate or sectoral demand shifts and supply shocks, other factors, more idiosyncratic to the firm, do typically matter.6 Finally, notice, we have assumed that the assets are in zero net supply. This implies that the asset trading our analysis applies to includes
all manner of trade in corporate bonds;7 but for general assets (e.g. equities) the analysis is (formally) restricted to those trades which involve one side of the market going short. The main point of the assumption is that it allows us the abstraction to study how an agent may use a financial asset (say, an equity) to share the risk in his exogenously endowed income: by going short on an asset he issues contingent claims on his risky income, thereby trading out his risk. To fix ideas, it might help to refer back to the example of X, the Detroit drugstore owner. X would be very representative of the agents in our model presented earlier. Think of the economic states 1 and 2 as states defined by shocks to X’s income from his drugstore. X may hedge his income shock by trading in a “safe” asset, such as a treasury bond, and financial assets, such as corporate bonds/equities issued by the various automobile and ancillary firms located in and around Detroit. The payoff to each such financial asset is affected by the same income shock that affects X’s drugstore profits. In addition, each financial asset is also affected by shocks idiosyncratic to the issuing firm. Assuming the firms’ profits and drugstore profits are affected in the same direction by the income shock, X’s hedging strategy would be, presumably, to take a short position on a portfolio of the available financial assets while simultaneously going long on the treasury bond. Our analysis, in effect, compares how such a strategy would fare in an SEU world and in a CEU world. Formally, the analysis compares equilibrium allocations across two cases: one, where beliefs about idiosyncratic outcomes are unambiguous (ν^0 + ν^1 = 1), and the other, where beliefs about the idiosyncracies are ambiguous (ν^0 + ν^1 < 1). In order to make the comparison stark, the analysis will relate the two cases to two benchmarks.
One benchmark is a complete market economy which we call an economy without idiosyncracy, that is, an economy which is identical to the n-financial asset economy with idiosyncracy described earlier in every respect except that there is only a single financial asset z, which pays off y^s + E_ν y(t) ≡ ȳ^s units in the economic states s = 1, 2. Correspondingly, q^z denotes the price of z and z_h denotes the amount held by household h. (Note, when denoting endogenous variables in the economy without idiosyncracy we may omit the subscript n.) The second benchmark is an incomplete market economy which is identical to the n-financial asset economy with idiosyncracy in every respect except that the only asset available is the safe asset. The following Lemma simplifies the analysis greatly.

Lemma. Let (q_n, (b_n, z_n)) be an equilibrium of the n-financial assets economy with idiosyncracy. Suppose ν^0 + ν^1 ≤ 1. Then, z_{h,n}^i = z_{h,n}^{i′}, ∀i, i′ ∈ {1, …, n}, ∀h ∈ {1, …, H}.

According to the Lemma, at an equilibrium, agents will hold all the financial assets in the same proportion. This is essentially a consequence of the fact that agents are risk averse and that the n financial assets are simply “independent replicas.” Let z̃_n denote a unit of a portfolio composed of 1/n unit of each of the assets z_i, i = 1, …, n; z̃_{h,n} is the amount held of this portfolio by h and q̃_n is the price
of a unit of this portfolio. Given the Lemma, we may assume, without loss of generality, that it is only the asset z̃_n, instead of the individual assets z_i, that is available for trade in the economy. Hence, an equilibrium of an n-financial assets economy with idiosyncracy, (q_n, (b_n, z_n)), may equivalently be denoted by the tuple (q̃_n, (b_n, z̃_n)), q̃_n ≡ (1, q̃_n) and (b_n, z̃_n) ≡ ((b_{h,n}, z̃_{h,n}))_{h=1}^H. The above characterization of the equilibrium in turn facilitates a simple definition of what it means to satisfy the conditions of equilibrium when n is arbitrarily large. We say (q̃_∞, (b_∞, z̃_∞), x_∞) satisfies the conditions of equilibrium of the n-financial assets economy with idiosyncracy where n is arbitrarily large, that is, n → ∞, if:8

1 Given q̃_∞, ((b_∞, z̃_∞), x_∞) is a solution to the problem P̃_{h,∞} defined as follows:

max CE_{π⊗ν} u_h(x_{h,∞}^{s,t})
s.t. b_{h,∞} + q̃_∞ z̃_{h,∞} = 0,
x_{h,∞}^{s,t} − e_{h,∞}^s = b_{h,∞} + z̃_{h,∞} lim_{n→∞} (Σ_{i=1}^n (y^s + y(t_i)))/n, s = 1, 2, with probability 1.

2 Σ_h b_{h,∞} = Σ_h z̃_{h,∞} = 0, and the consumption vector is feasible at each state, that is, Σ_h x_{h,∞}^ω = Σ_h e_{h,∞}^ω with probability 1.
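The limit appearing in the budget constraint can be illustrated by simulation. With additive beliefs the ordinary law of large numbers applies, so the per-unit payoff of the diversified portfolio converges to y^s + E y(t). The parameter values below (ν^1 = 0.5, y^s = 1, y(0) = 0, y(1) = 2) are illustrative assumptions, not taken from the chapter:

```python
import random

random.seed(0)
nu1 = 0.5          # additive case: nu^0 + nu^1 = 1 (assumed value)
y_s = 1.0          # economic-state component y^s (assumed value)
y0, y1 = 0.0, 2.0  # idiosyncratic payoffs y(0), y(1) (assumed values)

def portfolio_payoff(n):
    """Payoff of 1/n units of each of n independent replica assets."""
    draws = [y1 if random.random() < nu1 else y0 for _ in range(n)]
    return y_s + sum(draws) / n

print(y_s + nu1 * y1 + (1 - nu1) * y0)      # limit value: 2.0
print(round(portfolio_payoff(100_000), 2))  # sample value, close to 2.0
```

With unambiguous beliefs the limit is a single number, which is what drives the Theorem below; the point of the subsequent analysis is that with ν^0 + ν^1 < 1 this limit is no longer pinned down to a single value.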
Theorem. Suppose ν^0 + ν^1 = 1. Then (q̃_∞, (b_∞, z̃_∞)) satisfies the conditions of equilibrium of the n-financial assets economy with idiosyncracy where n is arbitrarily large, if and only if, (q̃_∞, (b_∞, z̃_∞)) describes an equilibrium of an economy without idiosyncracy, wherein the price of a unit of z is equal to q̃_∞, and a household’s holding of the asset z, z_h, is equal to z̃_{h,∞}.

The theorem shows that the equilibrium allocations of the n-financial assets economy with idiosyncracy are essentially identical to those of the economy without idiosyncracy, in which financial markets are complete, provided the number of available financial assets is large enough and agents’ beliefs are unambiguous. The result follows from an application of the usual diversification principle, stating that in the limit idiosyncracies are “washed away,” in conjunction with the assumption that y^1 ≠ y^2. However, if the model of the n-financial assets economy with idiosyncracy were to be reconsidered with the sole amendment that beliefs about idiosyncracies are ambiguous, that is, ν^0 + ν^1 < 1, then the result no longer holds. In such an economy, however large the n, given sufficient ambiguity, the equilibrium allocation is bounded away from Pareto optimal risk-sharing. The allocation actually coincides with the allocation of an incomplete market economy in which it is impossible to transfer resources between states 1 and 2, as we show in our main theorem, later. But, first, we present Example 15.1 to convey an intuition for the result.
Example 15.1. Consider a 2-period finance economy with two risk averse agents, h = 1, 2, and two economic states. There are two assets available, b and z. b is a safe asset; it delivers one unit of the good in each of the two economic states. The payoff of z in state (s, t) is y^s + y(t), s = α, β; t = 0, 1. Fix y^α = 1, y^β = 2, y(0) = 0, y(1) = 2. First consider the case where ν^0 + ν^1 = 1. The model reduces to a standard incomplete market equilibrium with two assets and four states, in which, for “generic” endowments, there is trade, that is, some partial insurance among agents.9 Next, suppose, to simplify matters drastically, that ν^0 = ν^1 = 0. Consider an agent h contemplating buying the uncertain asset at a price q^z, given the safe asset is priced q^b = 1. h may buy z_h units of the uncertain asset and take a position b_h in the safe asset such that b_h + q^z z_h = 0. His utility functional is then given by:

CE_{π⊗ν} u_h(e_h^s + z_h(y^s + y(t)) + b_h)
= u_h(e_h^α + z_h(y^α + y(0)) + b_h) π(α)(1 − ν^1)
+ u_h(e_h^α + z_h(y^α + y(1)) + b_h) π(α) ν^1
+ u_h(e_h^β + z_h(y^β + y(0)) + b_h) π(β)(1 − ν^1)
+ u_h(e_h^β + z_h(y^β + y(1)) + b_h) π(β) ν^1

Once we substitute in ν^1 = 0, it is clear from the above functional that the payoff matrix the agent (as a buyer of z) will consider is:

[ 1 1 ]
[ 1 2 ]

If q^z ≥ 2, any balanced portfolio with z_h > 0 yields negative payoffs and is therefore not worth buying. Thus, an agent will wish to buy the uncertain asset only if q^z < 2. Next consider an agent h who contemplates going short on asset z. His utility functional is therefore:

CE_{π⊗ν} u_h(e_h^s + z_h(y^s + y(t)) + b_h)
= u_h(e_h^α + z_h(y^α + y(0)) + b_h) π(α) ν^0
+ u_h(e_h^α + z_h(y^α + y(1)) + b_h) π(α)(1 − ν^0)
+ u_h(e_h^β + z_h(y^β + y(0)) + b_h) π(β) ν^0
+ u_h(e_h^β + z_h(y^β + y(1)) + b_h) π(β)(1 − ν^0)

Notice now the functional depends on ν^0, since the agent is going short, that is, z_h < 0. Substituting ν^0 = 0, we find the payoff matrix the agent h will consider:

[ 1 3 ]
[ 1 4 ]
For q^z ≤ 3, any balanced portfolio with z_h < 0 yields negative payoffs. Thus, an agent will wish to sell the risky asset only if q^z > 3. Thus, buyers of the asset will not want to pay more than 2, while sellers will not sell it for less than 3. Hence, there does not exist an equilibrium price such that agents will have a nonzero holding of the uncertain asset. Next, consider another extreme, a case in which ambiguity appears only on the economic states while the agents are able to assess (additive) probabilities for the idiosyncratic states. In fact, to keep matters stark, assume π(α) = π(β) = 0, though the additive probability on idiosyncratic states is arbitrary, simply ensuring that ν^0 + ν^1 = 1. Suppose that, for agent h, e_h^α > e_h^β. Then, for z_h ∈ (−ε, ε), for ε small enough,

CE_{π⊗ν} u_h(e_h^s + z_h(y^s + y(t)) + b_h)
= ν^0 u_h(e_h^β + z_h(y^β + y(0)) + b_h) + ν^1 u_h(e_h^β + z_h(y^β + y(1)) + b_h)

since for z_h small enough e_h^α + z_h(y^α + y(t)) + b_h > e_h^β + z_h(y^β + y(t)) + b_h. Hence, z_h = 0 if and only if q^z = y^β + ν^0 y(0) + ν^1 y(1) (the fact that endowments and the utility function do not appear in this expression is due to the extreme form of ambiguity assumed, that is, maximin behavior). Thus, the only candidate for a no-trade equilibrium price is q^z = y^β + ν^0 y(0) + ν^1 y(1). Now, assume that for at least one other agent, the order of the endowment is reversed, that is, e_h^β > e_h^α; then a computation similar to the one earlier shows that such agents will not want to trade the risky asset if and only if q^z = y^α + ν^0 y(0) + ν^1 y(1). Hence, if both types of agents are present in the economy, trade will occur since y^α ≠ y^β. If we were not to assume the extreme maximin form of preferences but that π(α) + π(β) < 1 with, say, π(α) > 0 and π(β) > 0, the no-trade price for agent h (say with e_h^α > e_h^β) depends on his initial endowment and utility function (i.e. relative attitude to risk). In that case, even if the endowments of all agents were comonotonic (i.e. e_h^α ≥ e_h^β for all h), there would not exist, for the generic endowment vector, an asset price q^z that would support no-trade as an equilibrium of this economy. The two more significant ways in which the main theorem generalizes the demonstration in Example 15.1 are: one, it shows that no-trade obtains even when beliefs have a degree of ambiguity strictly less than 1; two, it allows for any arbitrary number of financial assets, in particular, for n → ∞. We consider the intuition for each of these generalizations in turn. First, consider a 2-(economic) state, 2-agent, 1-financial asset (and 1 safe asset) economy with idiosyncracy,
in which the financial asset’s payoffs are as in Example 15.1. Consider an agent thinking of buying the financial asset. The maximum payoff he expects in any economic state is 2 + [0 × (1 − ν^1) + 2 × ν^1] ≡ V(B), the amount he expects in state β. This implies, whatever his utility function, whatever his endowment vector, whatever his beliefs about the economic uncertainty, he will not want to buy the asset for more than V(B). Now, instead, if an agent were to go short with the asset, the least he expects to have to repay in any economic state is 1 + [0 × ν^0 + 2 × (1 − ν^0)] ≡ V(S), and therefore, he will not want to sell the asset if the price is less than this. Clearly, if ν^0 and ν^1 were small enough, V(B) < V(S). Therefore, if ν^0 and ν^1 were small enough, agents will not trade in the financial asset. Intuition about the second bit of generalization is difficult to obtain without some understanding of how the law of large numbers works for nonadditive beliefs. Specifically, let us consider an i.i.d. sequence {X_n}_{n≥1} of {0, 1}-valued random variables. Suppose ν({X_n = 0}) = ν({X_n = 1}) = 1/4 for all n ≥ 1. As is usual with laws of large numbers, the question is about the limiting distribution of the sample average, (1/n) Σ_{i=1}^n X_i. The law10 implies:

ν( 1/4 ≤ lim inf_{n→∞} (1/n) Σ_{i=1}^n X_i ≤ lim sup_{n→∞} (1/n) Σ_{i=1}^n X_i ≤ 3/4 ) = 1.
This shows that the DM has a probability 1 belief that the limiting value of the sample average lies in the (closed) interval [1/4, 3/4]. However, unlike in the case of additive probabilities, the DM is not able to further pin down its value. Thus, even with non-additive probabilities the law of large numbers works in the usual way, in the sense that here too the tails of the distribution are “canceled out” and the distribution “converges on the mean.” But of course here, given that the DM’s knowledge is consistent with more than one prior, there is more than one mean to converge on; hence, the convergence is to the set of means corresponding to the set of priors consistent with the DM’s knowledge. Hence, a CEU maximizer whose (ex post) utility is increasing in X (e.g. when the DM is a buyer of an asset with payoff X) will behave as if the convergence of the sample average occurred at 1/4, the lower boundary of the interval, while a DM whose utility is increasing in −X (e.g. when the DM is a seller of an asset with payoff X) will behave as if the convergence of the sample average occurred at 3/4, the upper boundary of the interval. Now we can complete our intuition for the main result. Consider a modification of the simplified financial economy of Example 15.1 such that, ceteris paribus, there are now n-fold replicas of the financial asset, n → ∞. We consider trade between “two” assets, one the safe asset and the other the “portfolio” asset, containing each of the independent replica assets in equal proportion. The law of large numbers result, explained earlier, implies that any agent contemplating going long on the portfolio asset will behave as if a unit of the portfolio will pay off y^s + [0 × (1 − ν^1) + 2 × ν^1] with probability 1 in economic state s,
while an agent contemplating going short will behave as if a unit of the portfolio will pay off y^s + [0 × ν^0 + 2 × (1 − ν^0)] with probability 1 in economic state s. Hence, exactly the same argument as before applies: for ν^0 and ν^1 sufficiently small, V(B) < V(S) and there will not be any trade in the portfolio. The important insight here is that while agents are fully aware that a “well diversified” portfolio “averages out” the idiosyncracies, they only have an imprecise knowledge of what it averages out to. Another important point demonstrated in Example 15.1, as modified earlier, is how equilibrium risk-sharing is affected by ambiguity aversion. If 1 − ν^0 − ν^1 > 1/2, then the equilibrium allocation is necessarily not Pareto optimal unless endowments are, no matter how large the value of n. Consider an economy, E, which is the same as in the original example except that there is only one financial asset available in this economy, the safe asset b. Given ambiguity is greater than 1/2, there is no-trade in the portfolio of uncertain assets in the economy in (the modified) Example 15.1; hence an equilibrium allocation of this economy is an equilibrium allocation of E. E has two states, α and β, but one asset, and therefore is an incomplete markets economy with sub-optimal risk-sharing. We now state our main result:

Main Theorem. Consider the n-financial assets economy with idiosyncracy. Let y̲ ≡ min_s {y^s} and ȳ ≡ max_s {y^s} and suppose that ȳ − y̲ < y(1) − y(0). Then there exists an Ā, 0 < Ā < 1, such that if 1 − ν^0 − ν^1 > Ā, then z̃_{h,n} = 0 for all h ∈ {1, …, H} and x_n^{s,t} = x_n^{s,t′}, s = 1, 2, t ≠ t′, at every equilibrium (q_n, (b_n, z_n)), for all n ∈ N, including n → ∞.
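The bound in the Main Theorem can be checked numerically. The sketch below uses illustrative parameter values (not from the chapter) together with the expression Ā = (ȳ − y̲)/(y(1) − y(0)) constructed in the proof of the theorem:

```python
# Illustrative parameter values (assumptions, not from the chapter).
y_s = {1: 1.0, 2: 2.0}  # economic-state components y^s
y0, y1 = 0.0, 2.5       # idiosyncratic components; y(1) - y(0) = 2.5

y_low, y_bar = min(y_s.values()), max(y_s.values())
# The condition ybar - ylow < y(1) - y(0) holds (1.0 < 2.5), so A_bar < 1.
A_bar = (y_bar - y_low) / (y1 - y0)
print(A_bar)  # 0.4

def no_trade(nu0, nu1):
    """No-trade holds when the ambiguity 1 - nu^0 - nu^1 exceeds A_bar."""
    return 1 - nu0 - nu1 > A_bar

print(no_trade(0.2, 0.2))  # ambiguity 0.6 > 0.4 -> True
print(no_trade(0.4, 0.4))  # ambiguity 0.2 < 0.4 -> False
```

With these numbers, beliefs with ambiguity above 0.4 shut down trade in the financial assets regardless of utilities, endowments, or the number of assets.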
Stated differently, this says that if the range of variation of the idiosyncratic component of the financial asset is greater than the range of variation due to the economic shocks, if the beliefs over the idiosyncratic states are ambiguous enough, and if agents are ambiguity averse, then, irrespective of the utility functions of the agents and the endowment vector, the equilibrium of an n-financial assets economy with idiosyncracy is an equilibrium of the economy with one safe asset, that is, an economy with incomplete markets, since the financial assets are not traded in equilibrium, whatever the value of n. Notice further, if the conditions described in the theorem are met, then for a generic endowment, an equilibrium allocation of the n-financial assets economy with idiosyncracy is necessarily not Pareto optimal. This follows simply from the understanding that an equilibrium allocation of the n-financial assets economy with idiosyncracy, given the conditions of the theorem, is an equilibrium of the economy with one safe asset. The latter economy is an incomplete market economy in which it would not be possible to transfer resources between states 1 and 2. The significant sufficient condition to ensure no-trade, irrespective of the utility functions of the agents and the endowment vector, is that ȳ − y̲ < y(1) − y(0). The bound follows from the expression for Ā, Ā = (ȳ − y̲)/(y(1) − y(0)), constructed in the proof of the main theorem. Notice, Ā is the supremum among
the values of ambiguity required for no-trade, across all the possible combinations of parameters of utilities or endowments, and is independent of any parameter of utility or endowment. So, typically, the ambiguity required for no-trade will be less than Ā; further, no-trade will result even if y(1) − y(0) < ȳ − y̲. Also, the required ambiguity will be greater, the greater the risk aversion and/or riskiness of the endowment (see example 3 in Mukerji and Tallon (1999)). One might be tempted to conjecture that the results of the chapter may be replicated by simply assuming heterogeneous beliefs among agents. Or to conjecture, since with incomplete markets comonotonicity of equilibrium allocations is in general broken, so that different (CEU) agents would evaluate their prospects using different (effective) probabilities, that adding CEU agents might “worsen” incompleteness even in the absence of idiosyncratic risks. Both conjectures are, however, false. What is at work in obtaining no-trade is not that different agents have different beliefs, but that any given agent behaves as if he evaluates the two different actions, going short and going long, with different (probabilistic) beliefs. Market incompleteness in the absence of idiosyncratic risk does not make for this peculiarity, and therefore does not, in and of itself, lead to no-trade. We illustrate this with the following example.

Example 15.2. Suppose there are S states, H agents, one safe asset and one risky asset that pays off y^s units of the good in state s. Agent h’s budget constraints are (we normalize the price of the safe asset in the first period, as well as the price of the good in all states, to be equal to 1):

b_h + q z_h = 0
x_h^s = e_h^s + (y^s − q) z_h, s = 1, …, S
Claim. Assume that there are no pairs of states s and s′ such that y^s ≠ y^{s′} and e_h^s = e_h^{s′}. Then, there exists a unique price q_h such that z_h*(q_h) = 0.
Proof. Assume w.l.o.g. that e_h^1 ≤ e_h^2 ≤ · · · ≤ e_h^S. Since by assumption e_h^s = e_h^{s′} ⇒ y^s = y^{s′}, there exists ε > 0 such that for all z_h ∈ (−ε, ε):

e_h^1 + (y^1 − q)z_h ≤ e_h^2 + (y^2 − q)z_h ≤ · · · ≤ e_h^S + (y^S − q)z_h

Let M(z_h) be the set of probability measures in C(ν) that minimize E_µ u_h(e_h^s + (y^s − q)z_h) over µ ∈ C(ν), that is, M(z_h) = {(µ_1, …, µ_S) ∈ C(ν) | E_µ u_h(e_h^s + (y^s − q)z_h) = E_ν u_h(e_h^s + (y^s − q)z_h)}. Observe that if µ, µ′ ∈ M(z_h) are different, then they can disagree only on those states where consumption is identical, or, said differently (given the order we adopted on h’s endowment):

e_h^s + (y^s − q)z_h ≠ e_h^{s′} + (y^{s′} − q)z_h ∀s′ ≠ s
⇒ µ_s = µ′_s = ν({s, …, S}) − ν({s + 1, …, S})
Hence, z_h = 0 is optimal at price q_h if and only if there exists µ ∈ M(0) such that:

q = q_h ≡ (Σ_s µ_s y^s u_h′(e_h^s)) / (Σ_s µ_s u_h′(e_h^s))

Recall now that probability measures in M(0) can differ only on those states in which the endowment is constant. Since, by assumption, e_h^s = e_h^{s′} ⇒ y^s = y^{s′}, one obtains E_µ[y^s u_h′(e_h^s)] = E_{µ′}[y^s u_h′(e_h^s)] for all µ, µ′ ∈ M(0). Since E_µ u_h′(e_h^s) = E_{µ′} u_h′(e_h^s) for all µ, µ′ ∈ M(0), q_h as defined above is unique. We just established that there is only one price q_h, defined in the proof earlier, such that at this price agent h optimally wants a zero position in the risky asset. Now, unless the endowment allocation is Pareto optimal, q_h ≠ q_{h′}. Hence, at an equilibrium, trade on the market for the risky asset will be observed. This establishes that, “generically,” in order for z_h = 0 for all h to be an equilibrium of the model, there must be pairs of states s, s′ such that e_h^s = e_h^{s′} for all h and y^s ≠ y^{s′}; in other words, an idiosyncratic element is necessary to obtain no-trade. Before we close this section, we attempt to clarify further how our main result adds to the findings in the related literature. In Example 15.2, in spite of an incomplete markets environment, and in spite of CEU agents, no-trade fails to materialize because each agent has a unique price at which he takes a zero position in the asset, and, in general, this price is different for different agents. Dow and Werlang (1992) may be read as an exercise purely in deriving the demand function for a risky asset, given an initial riskless position. By putting together two Dow and Werlang agents one does obtain an economy where an equilibrium may be defined, but given that such agents’ endowments are riskless, agents do not have any risks to share in such an economy. Hence, simply “completing” the Dow and Werlang exercise to obtain an equilibrium model does not allow one to investigate the question addressed in the present chapter, which is whether ambiguity aversion affects risk-sharing possibilities in the economy.
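Returning to the proof of the Claim in Example 15.2, the unique no-trade price q_h = Σ_s µ_s y^s u_h′(e_h^s) / Σ_s µ_s u_h′(e_h^s) can be computed directly. In the sketch below, the utility index (u′(c) = c⁻²) and all endowment, payoff, and probability numbers are illustrative assumptions; the point is that agents whose endowments are ordered differently generally obtain different no-trade prices, so universal no-trade cannot be an equilibrium:

```python
mu = [0.5, 0.5]  # an effective (minimizing) probability over two states (assumed)
y = [1.0, 3.0]   # risky-asset payoffs y^s (assumed values)

def u_prime(c):
    """Marginal utility; u(c) = -1/c, so u'(c) = c**(-2) (assumed CRRA index)."""
    return c ** (-2.0)

def no_trade_price(e):
    """q_h = sum_s mu_s y^s u'(e^s) / sum_s mu_s u'(e^s)."""
    num = sum(m * ys * u_prime(es) for m, ys, es in zip(mu, y, e))
    den = sum(m * u_prime(es) for m, es in zip(mu, e))
    return num / den

q1 = no_trade_price([1.0, 2.0])    # agent 1: poorer in state 1
q2 = no_trade_price([2.0, 1.0])    # agent 2: endowment order reversed
print(round(q1, 3), round(q2, 3))  # 1.4 2.6 -- the prices differ, so trade occurs
```

The marginal-utility weighting pushes each agent's no-trade price toward the payoff of the state in which he is poor, which is why the two prices differ whenever endowments are not already Pareto optimal.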
And, as explained in the previous section, even if we were to make the simple further extension of allowing uncertain endowments, given complete markets, we would find that ambiguity has no effect. Finally, as Example 15.2 demonstrates, the even further extension of allowing market incompleteness does not provide the answer either. Evidently, one has to move further afield from the Dow and Werlang analysis to address our question. Epstein and Wang (1994) significantly generalized the Dow and Werlang (1992) result, finding that price intervals supporting the zero position occurred (in equilibrium) if there were some states across which asset payoffs differ while endowments remain identical. The intuition for this is as follows. To obtain a range of supporting prices for the zero position, there must occur a “switch” in the effective probability distribution precisely at the zero position. That is, depending on whether he takes a position + or − away from 0, howsoever small, the agent evaluates his position using a different probability. For this to happen, the agent’s ranking
of states (according to his consumption) must switch depending exclusively on whether he takes a positive or negative position on the asset. Hence, there must be at least two states for which even the smallest movement away from the zero position would cause a difference in the ranking of the states depending on which side of zero one moves to. Clearly, this may only be true if the endowment were constant across the two states while the asset payoff were not. The clarification obtained in Epstein and Wang (1994) of the condition that enables multiple price supports to emerge was the point of inspiration for the research reported in the present chapter. Indeed, the condition of Epstein and Wang (1994) is one of the two conditions we apply to define idiosyncratic risk. Where the present chapter has gone further, and what, in essence, is its contribution, is in finding conditions for an economy wherein the agents’ price intervals overlap in such a manner that every equilibrium of the economy involves no-trade in an asset, and, more importantly, conditions under which ambiguity aversion demonstrably “worsens” risk sharing and incompleteness of markets. These are issues that were neither addressed nor even raised in Epstein and Wang (1994), formally or informally, and understandably so, since the principal model in that paper was the Lucas (1978) pure exchange economy amended to include ambiguity averse beliefs. This is a model with a single representative agent, or equivalently, a number of agents with identical preferences and endowments. In an equilibrium of such an economy, trade and risk-sharing are trivial since agents will consume their endowments; endowments are, by construction, Pareto optimal.11 Kelsey and Milne (1995) extend the equilibrium arbitrage pricing theory (APT) by allowing for various kinds of non-expected utility preferences. One of the cases they consider is the CEU model.
The model in the present chapter may be thought of as a special case of the equilibrium APT framework: what are labeled as factor risks in APT are precisely what we call economic states, and idiosyncratic risk is present in both models, though in our model the idiosyncratic risk has a simpler structure in that there are only two possible idiosyncratic states corresponding to each asset. Only a special case of CEU preferences is investigated by Kelsey and Milne (1995): their Assumption 3.3 allows nonadditive beliefs only with respect to factor risks; idiosyncratic risk is described only by additive probabilities (see Assumption 3.3, the Remark following the assumption and footnote 2). The formal result of their analysis appears in Corollary 3.1 and shows that, given the qualifications, the usual APT result continues to hold: diversification may proceed as usual, idiosyncratic risk disappears in the limit as the number of assets tends to infinity and the price of any asset is, consequently, a linear function of factor risks. This formal result is readily understandable given our analysis. As is repeatedly stressed in the present chapter, what drives our result is the nonadditivity of beliefs over the idiosyncratic states. While it is not necessary that ambiguity aversion be restricted to idiosyncratic states for our result to hold, it is necessary that there be some ambiguity about idiosyncracies. The no-trade result fails if ambiguity is merely restricted to economic states, as we explained in the latter part of Example 15.1 and in Example 15.2. With ambiguity only on economic states, ambiguity aversion has no bite, irrespective of whether there is only a single asset or infinitely many and
Incompleteness of financial markets
353
hence diversification proceeds as with SEU. Hence, their result would not obtain without the restriction imposed by (their) Assumption 3.3. Our analysis therefore warns against informally extrapolating the Kelsey and Milne (1995) result to think that diversification would proceed as usual even when the special circumstances of Assumption 3.3 do not hold (i.e. the ambiguity is not restricted to economic states but occurs more generally over the state space). Further, it would appear to be a compelling description of the economic environment to assume that, if an agent is at all ambiguity averse, the agent will be ambiguity averse about an idiosyncratic risk. By definition, such a risk is unrelated to his own income risk and the macroeconomic environment; the risk stems from the internal workings of a particular firm, something about which the typical agent is likely to have little knowledge. It is well known that it is possible to define more than one notion of independence for nonadditive beliefs. Ghirardato (1997) presents a comprehensive analysis of the various notions. As Ghirardato notes (p. 263), the problem of defining an independent product has been studied, previous to Ghirardato's investigation, by Hendon et al. (1996), Gilboa and Schmeidler (1989) and Walley and Fine (1982). The definition invoked in the present chapter, suggested by Gilboa and Schmeidler (1989) and Walley and Fine (1982), is arguably the most prominent in the literature. However, the formal analysis in the present chapter, given the primitives of our model, does not hinge on this particular choice of the notion of independence. An important finding of Ghirardato's analysis was that the proposed specific notions of independent product give rise to a unique product for cases in which marginals have some additional structural properties. The capacity we use in our model is a product of an additive probability and $n$ two-point capacities (each two-point capacity $\nu$ is characterized by the two values $\nu^0$ and $\nu^1$).
A two-point capacity is, of course, a convex capacity and (trivially) a belief function. As is explicit in Theorems 2 and 3 in Ghirardato (1997), if the marginals satisfy the structural properties, as the marginals we use do, then uniqueness of the product capacity obtains. That is, irrespective of which of the two definitions of independence is adopted, the one suggested by Hendon et al. (1996) or the one we use, the computed product capacity is the same. The law of large numbers that we use formally invokes the Gilboa–Schmeidler notion (see Marinacci, 1999: Theorem 15 and Section 7.2). Since both notions of independence are equivalent given the primitives of our model, it is irrelevant to our analysis whether the law of large numbers that we use also holds if the alternative notion of independence were adopted. In other words, the conclusions of our formal analysis are robust to the adoption of the alternative notion of independence.
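The claim that nonadditivity of beliefs over idiosyncratic states destroys the benefit of diversification can be checked by brute force. The sketch below uses illustrative payoffs and capacity values (`NU0`, `NU1`, `Y` are assumptions, not numbers from the chapter) and computes the Choquet expectation of the average payoff of $n$ replica assets under the Gilboa–Schmeidler product capacity of Definition 15.A.1:

```python
from itertools import product as tuples
from math import prod

# Illustrative two-point capacity on each idiosyncratic state space {0, 1}:
# nu({0}) = NU0, nu({1}) = NU1, with NU0 + NU1 <= 1 (convexity).
NU0, NU1 = 0.4, 0.4
Y = {0: 0.0, 1: 1.0}   # hypothetical idiosyncratic payoffs, y(0) < y(1)

def product_capacity(event, n):
    """Gilboa-Schmeidler independent product (Definition 15.A.1): the minimum,
    over additive measures in the cores of the marginals, of the product
    measure of `event`.  Each core is {p : p(t_i = 1) in [NU1, 1 - NU0]};
    the product measure is multilinear in the p's, so the minimum is
    attained at a vertex of the box of cores."""
    return min(
        sum(prod(p[i] if t[i] else 1.0 - p[i] for i in range(n)) for t in event)
        for p in tuples([NU1, 1.0 - NU0], repeat=n)
    )

def choquet(f, n):
    """Discrete Choquet integral of f on {0,1}^n w.r.t. the product capacity."""
    omega = list(tuples([0, 1], repeat=n))
    total, prev = 0.0, 0.0
    for v in sorted({f(t) for t in omega}, reverse=True):
        cap = product_capacity([t for t in omega if f(t) >= v], n)
        total += v * (cap - prev)
        prev = cap
    return total

# Choquet expectation of the diversified average payoff (1/n) * sum_i y(t_i):
for n in range(1, 5):
    print(n, round(choquet(lambda t: sum(Y[ti] for ti in t) / len(t), n), 6))
```

Under any single additive belief in the cores, the variance of the average payoff would shrink with $n$; the Choquet expectation instead stays pinned at the pessimistic value $(1-\nu^1)y(0) + \nu^1 y(1) = 0.4$ for every $n$, so increasing $n$ never narrows the ambiguity wedge $1 - \nu^0 - \nu^1$.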
15.4. Concluding remarks Financial assets typically carry some risk idiosyncratic to them; hence, disposing of income risk using financial assets will involve buying into the inherent idiosyncratic risk. However, standard theory argues that diversification would, in principle, reduce the inconvenience of idiosyncratic risk to arbitrarily low levels, thereby making the tradeoff between the two types of risk much less severe. This argument is less robust than standard theory leads us to believe. Ambiguity
aversion can actually exacerbate the tension between the two kinds of risks to the point that classes of agents may find it impossible to trade some financial assets: they can no longer rely on such assets as routes for "exporting" their income risks. Thus, theoretically, the effect of ambiguity aversion on financial markets is to make the risk sharing opportunities offered by financial markets less complete than they would be otherwise. This is the principal conclusion of the exercise in this chapter. This conclusion is robust, to the extent that many of the assumptions of the model presented in the last section could be substantially relaxed without losing the substance of the analytical results. First, it does not matter whether the beliefs about the economic states are ambiguous; the no-trade result still obtains. Second, given that diversification with replica assets doesn't work with ambiguous beliefs, one might wonder whether diversification can be achieved through assets which are not replicas (in terms of payoffs). It turns out that it does not make any difference (to the main result) if we were to relax the assumption about "strict" replicas (see Mukerji and Tallon, 1999). It is instructive to note the distinction between the empirical content of a theory of no-trade based on the "lemons" problem (e.g. Morris (1997)) and the theory based on ambiguity aversion. The primitive of the former theory is asymmetric information between the transacting parties, and significantly, no-trade may result even if there were no idiosyncratic component. Thus that theory, per se, does not link the presence and extent of the idiosyncratic component to no-trade. To obtain such a link, one has to assume, a priori, that there is sufficient asymmetric information only in the presence of an idiosyncratic component.
On the other hand, the theory based on ambiguity aversion does not require that one assume that ambiguity is present only with idiosyncracies, or that agents have ambiguous beliefs especially with respect to payoffs of assets with idiosyncratic components. One may well begin with the primitive that ambiguity is present in a "general" way, across all contingencies. However, since ambiguity aversion selectively attacks only those assets whose payoffs have idiosyncratic components, the link between idiosyncracy and no-trade is endogenously generated in the theory based on ambiguity aversion. This positive understanding is of significance. The history of financial markets is replete with episodes of increased uncertainty leading to a thinning out of trade (or even a complete seizing up), particularly in assets such as high yield corporate bonds ("junk" bonds) and bonds issued in "emerging markets" (namely, Latin America, Eastern Europe and East Asia) (see Mukerji and Tallon, 1999). The understanding also explains certain institutional structures adopted in some countries to protect markets from such episodes (see Mukerji and Tallon, 1999).
Appendix A: Some formal details relating to the CEU model Independent product for capacities We consider here the formal modeling of the idea of stochastic independence of random variables when beliefs are ambiguous. Let y be a function from a given space τ to R, and σ (y) be the smallest σ -algebra that makes y a random variable.
$\tau^n$ denotes the $n$-fold Cartesian product of $\tau$, and $\sigma(y_1, \ldots, y_n)$ the product $\sigma$-algebra on $\tau^n$ generated by the $\sigma$-algebras $\{\sigma(y_i)\}_{i=1}^n$. The following definition was proposed by Gilboa and Schmeidler (1989), and earlier, by Walley and Fine (1982).

Definition 15.A.1. Let $\nu_i$ be a convex non-additive probability defined on $\sigma(y_i)$. The independent product, denoted $\bigotimes_{i=1}^n \nu_i$, is defined as follows:

$$\left(\bigotimes_{i=1}^n \nu_i\right)(A) = \min\left\{(\mu_1 \times \cdots \times \mu_n)(A) : \mu_i \in C(\nu_i) \text{ for } 1 \leq i \leq n\right\}$$

for every $A \in \sigma(y_1, \ldots, y_n)$, where $\mu_1 \times \cdots \times \mu_n$ is the standard additive product measure. We denote by $\bigotimes_{i \geq 1} \nu_i$ any non-additive probability on $\sigma(y_1, \ldots, y_n, \ldots)$ such that for any finite class $\{y_{t_1}, \ldots, y_{t_n}\}$ it holds that $\big(\bigotimes_{i \geq 1} \nu_i\big)(A) = \big(\bigotimes_{i=1}^n \nu_i\big)(A)$ for every $A \in \sigma(y_1, \ldots, y_n)$.

The computation of the Choquet expectation operator using product capacities is particularly simple for slice-comonotonic functions (Ghirardato (1997)), defined now. Let $X_1, \ldots, X_n$ be $n$ (finite) sets and let $\Omega = X_1 \times \cdots \times X_n$. Correspondingly, let $\nu_i$ be convex non-additive probabilities defined on algebras of subsets of $X_i$, $i = 1, \ldots, n$.

Definition 15.A.2. Let $f\colon \Omega \to \mathbb{R}$. We say that $f$ has comonotonic $x_i$-sections if for every $(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$, $(x'_1, \ldots, x'_{i-1}, x'_{i+1}, \ldots, x'_n) \in X_1 \times \cdots \times X_{i-1} \times X_{i+1} \times \cdots \times X_n$, the functions $f(x_1, \ldots, x_{i-1}, \cdot, x_{i+1}, \ldots, x_n)\colon X_i \to \mathbb{R}$ and $f(x'_1, \ldots, x'_{i-1}, \cdot, x'_{i+1}, \ldots, x'_n)\colon X_i \to \mathbb{R}$ are comonotonic functions. $f$ is called slice-comonotonic if it has comonotonic $x_i$-sections for every $i \in \{1, \ldots, n\}$.

The following fact follows from Proposition 7 and Theorem 1 in Ghirardato (1997).

Fact 15.A.1. Suppose that $f\colon \Omega \to \mathbb{R}$ is slice comonotonic. Then

$$CE_{\otimes \nu_i}\, f(x_1, \ldots, x_n) = CE_{\nu_1} \cdots CE_{\nu_n}\, f(x_1, \ldots, x_n).$$

In what follows we verify that Fact 15.A.1 applies to the calculation of the Choquet expected utility of an agent's contingent consumption vector. As in the main text, let $\Omega = S \times \{0,1\}^n$ be the state space, with generic element $\omega = (s, t_1, \ldots, t_n) = (s, t)$. For a given $h$ let $x(\omega) = x_{h,n}^{s,t}$, $h$'s consumption at state $\omega = (s,t)$. Finally, let $u\colon \mathbb{R} \to \mathbb{R}$ denote the strictly increasing utility index. It will be shown that the composite function $u \circ x(\cdot)\colon \Omega \to \mathbb{R}$ is slice comonotonic, and therefore the calculation of $CE\, u(x(\omega))$ may obtain as in Fact 15.A.1. Recall,

$$x(\omega) = x(s,t) = e_h^s + b_h + \tilde z_h \sum_{i=1}^n \frac{y^s + y(t_i)}{n},$$

where $\tilde z_h$ is the holding of the diversified portfolio consisting of $1/n$ units of each financial asset. We first show that $x(\cdot)$ is slice comonotonic. This is done
by demonstrating, in turn, that $x$ has comonotonic $s$-sections and comonotonic $t_j$-sections. Fix $t = (t_1, \ldots, t_n)$ and $t' = (t'_1, \ldots, t'_n)$. Assume that $x(s,t) \geq x(s',t)$. Then, as required in Definition 15.A.2 (slice comonotonicity), we want to show that $x(s,t') \geq x(s',t')$. Now,

$$x(s,t) \geq x(s',t)$$
$$\iff e_h^s + b_h + \tilde z_h \sum_{i=1}^n \frac{y^s + y(t_i)}{n} \;\geq\; e_h^{s'} + b_h + \tilde z_h \sum_{i=1}^n \frac{y^{s'} + y(t_i)}{n}$$
$$\iff e_h^s + b_h + \tilde z_h y^s \;\geq\; e_h^{s'} + b_h + \tilde z_h y^{s'}$$
$$\iff e_h^s + b_h + \tilde z_h \sum_{i=1}^n \frac{y^s + y(t'_i)}{n} \;\geq\; e_h^{s'} + b_h + \tilde z_h \sum_{i=1}^n \frac{y^{s'} + y(t'_i)}{n}$$
$$\iff x(s,t') \geq x(s',t').$$

Hence, $x$ has comonotonic $s$-sections.

Next, fix $(s, t_{-j})$ and $(s', t'_{-j})$, where $t_{-j} = (t_1, \ldots, t_{j-1}, t_{j+1}, \ldots, t_n)$. Now,

$$x(s, t_{-j}, t_j) \geq x(s, t_{-j}, t'_j)$$
$$\iff e_h^s + b_h + \tilde z_h \left( \sum_{i \neq j} \frac{y^s + y(t_i)}{n} + \frac{y^s + y(t_j)}{n} \right) \;\geq\; e_h^s + b_h + \tilde z_h \left( \sum_{i \neq j} \frac{y^s + y(t_i)}{n} + \frac{y^s + y(t'_j)}{n} \right)$$
$$\iff y(t_j) \geq y(t'_j)$$
$$\iff x(s', t'_{-j}, t_j) \geq x(s', t'_{-j}, t'_j).$$

Repeating this, one shows that $x$ has comonotonic $t_j$-sections, for all $j = 1, \ldots, n$. Hence, $x$ is slice comonotonic. Now, it is possible to see that slice comonotonicity of $u \circ x(\cdot)\colon \Omega \to \mathbb{R}$ follows readily from the assumption that $u$ is strictly increasing. To this end, notice:

$$x(s,t) \geq x(s',t) \iff u(x(s,t)) \geq u(x(s',t))$$

and

$$x(s, t_{-j}, t_j) \geq x(s, t_{-j}, t'_j) \iff u(x(s, t_{-j}, t_j)) \geq u(x(s, t_{-j}, t'_j)).$$
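Fact 15.A.1 can also be verified numerically for a small example. The sketch below uses two assumed two-point marginals (`CAPS`) and an assumed slice-comonotonic function `f` (none of these numbers come from the chapter), and compares the Choquet integral under the product capacity with the iterated one-dimensional Choquet integrals:

```python
from itertools import product as tuples
from math import prod

# Two hypothetical convex two-point capacities, written as (nu({0}), nu({1})):
CAPS = [(0.3, 0.5), (0.2, 0.6)]

def ce_1d(g, nu0, nu1):
    """Choquet integral of g on {0, 1} w.r.t. a two-point capacity."""
    if g(1) >= g(0):
        return g(0) + (g(1) - g(0)) * nu1   # effective weight nu1 on the high state
    return g(1) + (g(0) - g(1)) * nu0       # ranking flips: weight nu0 on state 0

def ce_product(f):
    """Choquet integral w.r.t. the Gilboa-Schmeidler product capacity on {0,1}^2."""
    omega = list(tuples([0, 1], repeat=2))

    def cap(event):
        # min over vertices of the cores; core_i = [nu_i({1}), 1 - nu_i({0})]
        return min(
            sum(prod(p[i] if t[i] else 1 - p[i] for i in range(2)) for t in event)
            for p in tuples(*[[n1, 1 - n0] for (n0, n1) in CAPS])
        )

    total, prev = 0.0, 0.0
    for v in sorted({f(*t) for t in omega}, reverse=True):
        c = cap([t for t in omega if f(*t) >= v])
        total += v * (c - prev)
        prev = c
    return total

f = lambda x1, x2: x1 + 2 * x2   # increasing in each coordinate => slice-comonotonic
iterated = ce_1d(lambda x1: ce_1d(lambda x2: f(x1, x2), *CAPS[1]), *CAPS[0])
print(round(ce_product(f), 6), round(iterated, 6))   # both 1.7, as Fact 15.A.1 asserts
```

For functions that are not slice-comonotonic the two computations can differ, which is why the fact carries the comonotonicity hypothesis.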
Law of large numbers for capacities (Marinacci (1996), Theorem 7.7; Walley and Fine (1982)). Let $y$ be a function from a given (countably) finite space $\Omega$ to the real line $\mathbb{R}$, and $\sigma(y)$ the smallest $\sigma$-algebra that makes $y$ a random variable. $\Omega^n$ denotes the $n$-fold Cartesian product of $\Omega$, and $\sigma(y_1, \ldots, y_n)$ the product $\sigma$-algebra on $\Omega^n$ generated by the $\sigma$-algebras $\{\sigma(y_i)\}_{i=1}^n$. Let each $\nu_i$ be a convex capacity on $\sigma(y_i)$, and let $\{y_i\}_{i \geq 1}$ be a sequence of random variables independent and identically distributed relative to $\bigotimes \nu_i$. Set $S_n = (1/n)\sum_{i=1}^n y_i$. Suppose both $CE_{\nu_1}(y_1)$ and $CE_{\nu_1}(-y_1)$ exist. Then:

1. $\bigotimes \nu_i\big(\big\{\omega \in \Omega^\infty : CE_{\nu_1}(y_1) \leq \liminf_n S_n(\omega) \leq \limsup_n S_n(\omega) \leq -CE_{\nu_1}(-y_1)\big\}\big) = 1$.
2. $\bigotimes \nu_i\big(\big\{\omega \in \Omega^\infty : CE_{\nu_1}(y_1) < \liminf_n S_n(\omega) \leq \limsup_n S_n(\omega) < -CE_{\nu_1}(-y_1)\big\}\big) = 0$.
3. $\bigotimes \nu_i\big(\big\{\omega \in \Omega^\infty : CE_{\nu_1}(y_1) = \liminf_n S_n(\omega)\big\}\big) = 0$.
4. $\bigotimes \nu_i\big(\big\{\omega \in \Omega^\infty : -CE_{\nu_1}(-y_1) = \limsup_n S_n(\omega)\big\}\big) = 0$.
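A simulation illustrates item 1 of the law (all numbers below are illustrative assumptions, not from the chapter): whichever additive measure in the core of $\nu$ generates the i.i.d. draws, the long-run sample means settle inside the Choquet interval $[CE_{\nu_1}(y_1), -CE_{\nu_1}(-y_1)]$, which here is $[0.4, 0.6]$.

```python
import random

NU0, NU1 = 0.4, 0.4     # illustrative convex two-point capacity (NU0 + NU1 <= 1)
Y0, Y1 = 0.0, 1.0       # hypothetical payoffs with y(0) < y(1)

# Endpoints of the interval in item 1 of the law:
lower = (1 - NU1) * Y0 + NU1 * Y1    # CE(y_1)   = 0.4
upper = NU0 * Y0 + (1 - NU0) * Y1    # -CE(-y_1) = 0.6

random.seed(0)
for p in (NU1, 0.5, 1 - NU0):        # additive measures in the core of nu
    draws = [Y1 if random.random() < p else Y0 for _ in range(200_000)]
    s_n = sum(draws) / len(draws)
    assert lower - 0.01 <= s_n <= upper + 0.01
    print(p, round(s_n, 3))          # each sample mean lies in [lower, upper]
```

The interval width is exactly the degree of nonadditivity $1 - \nu^0 - \nu^1$ times $y(1) - y(0)$; with additive beliefs it collapses to the classical strong law.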
Appendix B: Proofs of results in the main text

Proof of the Lemma. Suppose w.l.o.g. $q^{z^i} \geq q^{z^{i'}}$ for some $i, i' \in \{1, \ldots, n\}$. First we show that $z_{h,n}^i \leq z_{h,n}^{i'}$, $\forall h \in \{1, \ldots, H\}$. Indeed, assume $z_{h,n}^i > z_{h,n}^{i'}$ for some $h$, and construct the portfolio $\hat z_{h,n}$ as follows:

$$\hat z_{h,n}^i = z_{h,n}^i - \varepsilon, \qquad \hat z_{h,n}^{i'} = z_{h,n}^{i'} + \frac{q^{z^i}}{q^{z^{i'}}}\,\varepsilon, \qquad \text{and } \hat z_{h,n}^j = z_{h,n}^j \;\; \forall j \neq i, i',$$

where $\varepsilon$ is small enough so that $\hat z_{h,n}^i > \hat z_{h,n}^{i'}$. Note, $\hat z_{h,n}$ is budget feasible. Let

$$\hat x_{h,n}^{s,t} \equiv e_h^s + b_{h,n} + \sum_{i=1}^n \hat z_{h,n}^i\,(y^s + y(t_i)) \quad \text{for } s = 1, 2.$$

Because $x_{h,n}^{s,t}$ and $\hat x_{h,n}^{s,t}$ are comonotonic, and $u_h$ is strictly increasing, it follows from Definition 15.A.1 that there exist additive product measures $\mu \equiv \times_{i=1}^n \mu_i$ and $\hat\mu \equiv \times_{i=1}^n \hat\mu_i$, where $\mu_i, \hat\mu_i\colon 2^{\{0,1\}} \to [0,1]$ are additive measures, such that

$$CE_{\pi \otimes \nu}\, x_{h,n}^{s,t} = E_{\pi \times \mu}\, x_{h,n}^{s,t}, \qquad CE_{\pi \otimes \nu}\, \hat x_{h,n}^{s,t} = E_{\pi \times \hat\mu}\, \hat x_{h,n}^{s,t},$$

and

$$CE_{\pi \otimes \nu}\, u_h\big(x_{h,n}^{s,t}\big) = E_{\pi \times \mu}\, u_h\big(x_{h,n}^{s,t}\big), \qquad CE_{\pi \otimes \nu}\, u_h\big(\hat x_{h,n}^{s,t}\big) = E_{\pi \times \hat\mu}\, u_h\big(\hat x_{h,n}^{s,t}\big), \quad s = 1, 2,\ \forall t \in \tau^n.$$

Furthermore, $E_{\hat\mu}\big[\hat x_{h,n}^{s,t} \mid s\big] = E_\mu\big[x_{h,n}^{s,t} \mid s\big] + E_{\hat\mu_i}\,\varepsilon y(t_i) - E_{\mu_i}\,\varepsilon y(t_i)$, $s = 1, 2$. Next, notice $E_{\hat\mu_i}\,\varepsilon y(t_i) - E_{\mu_i}\,\varepsilon y(t_i) \leq 0$. Indeed, either $z_{h,n}^i$ and $\hat z_{h,n}^i$ have the same
i i > 0 > zh,n sign, in which case µi = µi and Eµi εy(ti ) − Eµi εy(ti ) = 0. Or zh,n and then
Eµi εy(ti ) − Eµi εy(ti ) = ε[1 − ν 0 − ν 1 ][y(0) − y(1)] ≤ 0 Hence, x s stochastically dominates x s . Given u < 0, therefore, s,t s,t Eπ ×µ uh (x s,t > h,n ) > Eπ×µ uh (xh,n ). As a consequence, CEπ⊗ν uh (x h,n ) s,t CEπ ⊗ν uh (xh,n ). But this is a contradiction to the hypothesis that (qn , (bn , zn , xn )) i i is an equilibrium. ∴ zh,n ≤ zh,n , ∀h ∈ {1, . . . , H }. H i i Since, (qn , (bn , zn , xn )) is an equilibrium, H h=1 zh,n = 0. Thereh=1 zh,n = i i i i fore, using the fact that zh,n ≤ zh,n for all h, we get that zh,n = zh,n for all h. h∞ given asset prices q˜ ∞ may Proof of the Theorem. The maximization problem P be written as follows: * ) n y s + y(t ) i max Eπ ⊗ν uh ehs + bh,∞ + zh,∞ limn→∞ n i=1 s.t. bh,∞ + q˜∞ z˜ h,∞ = 0 And the maximization problem Ph , solved by the agent in an economy without idiosyncracy, given asset prices q = q˜ ∞ : max s∈{1,2} π(s)uh (ehs + bh,n + zh y s ) s.t. bh + q˜∞ zh = 0 If n → ∞, by the law of large numbers, with probability 1 a unit of the portfolio z˜ n yields a payoff of y s + Et∈{0,1} y(t) ≡ y s units. That is, a.s limn→∞ ni=1 ((y s + y(ti ))/n) −→ y s . Recall, the financial asset z yields y s units of the good in the economic states s = 1, 2. h∞ at prices ( Hence, (b∞ , z∞ ) solves the maximization problem P q∞ ), if and z∞ ) also solves the maximization problem Ph at prices ( q∞ ). only if (b∞ , Finally note, if ( q∞ , (b∞ , z∞ , x∞ )) describes an equilibrium of the n-financial assets economy with idiosyncracy it must be that (b∞ , z∞ ) satisfies the conditions q∞ ) will also clear asset of (asset) market clearing at the price vector q∞ . Hence, ( z∞ , x∞ )) markets in the economy without idiosyncracy. Conversely, if ( q∞ , (b∞ , describes an equilibrium of the economy without idiosyncracy then ( q∞ ) will also clear asset markets in the n-financial assets economy with idiosyncracy. hn , the maximization problem in the Proof of the Main Theorem. Consider P n-financial asset economy with idiosyncracy. 
Suppose that, at equilibrium there exists h such that z˜ h ,n = 0, say z˜ h ,n > 0. Then, there must be h such that
$\tilde z_{h'',n} < 0$. Next, since $\tilde z_{h',n} > 0$ and $y(0) < y(1)$, Fact 15.A.1, together with the fact that $u_h(x_{h,n}^\omega)$ is slice-comonotonic (see Appendix A), implies that $CE_{\pi \otimes \nu}\, u_{h'}(x_{h',n}^{s,t})$ is a standard expectation with respect to the additive measure $\pi \times \mu(t)$, where $\mu(t) = (1 - \nu^1)^{n_0} \times (\nu^1)^{n - n_0}$, $n_0$ being the number of financial assets whose idiosyncratic payoff is $y(0)$ at state $(s,t)$. This is because $x_{h',n}^{s,(t_i,t_{-i})}$ is necessarily smaller at a state $(s,(0,t_{-i}))$ than at the state $(s,(1,t_{-i}))$, $s = 1, 2$. The first order conditions of the problem $\hat P_{h',n}$ (for agent $h'$) then give:

$$\tilde q_n = \frac{E_{\pi \times \mu}\Big[\sum_{i=1}^n \big((y^s + y(t_i))/n\big)\, u'_{h'}\big(x_{h',n}^{s,t}\big)\Big]}{E_{\pi \times \mu}\Big[u'_{h'}\big(x_{h',n}^{s,t}\big)\Big]}.$$

Notice, for $s = 1, 2$, $x_{h',n}^{s,t}$ and $\sum_{i=1}^n ((y^s + y(t_i))/n)$ are positively dependent given $s$ (see Magill and Quinzii (1996)) since $\tilde z_{h',n} > 0$. Hence, because $u''(\cdot) < 0$,

$$\mathrm{Covariance}\left(\sum_{i=1}^n \frac{y^s + y(t_i)}{n},\; u'_{h'}\big(x_{h',n}^{s,t}\big)\right) < 0, \quad \text{given } s.$$

Now,

$$E_\mu\left[\sum_{i=1}^n \frac{y^s + y(t_i)}{n}\; u'_{h'}\big(x_{h',n}^{s,t}\big)\right] = \mathrm{Covariance}\left(\sum_{i=1}^n \frac{y^s + y(t_i)}{n},\; u'_{h'}\big(x_{h',n}^{s,t}\big)\right) + E_\mu\left[\sum_{i=1}^n \frac{y^s + y(t_i)}{n}\right] E_\mu\left[u'_{h'}\big(x_{h',n}^{s,t}\big)\right].$$

Thus,

$$E_\mu\left[\sum_{i=1}^n \frac{y^s + y(t_i)}{n}\; u'_{h'}\big(x_{h',n}^{s,t}\big)\right] < E_\mu\left[\sum_{i=1}^n \frac{y^s + y(t_i)}{n}\right] E_\mu\left[u'_{h'}\big(x_{h',n}^{s,t}\big)\right].$$

Hence,

$$\tilde q_n < \frac{\sum_{s=1}^2 \pi(s)\, E_\mu\big[\sum_{i=1}^n ((y^s + y(t_i))/n)\big]\, E_\mu\big[u'_{h'}(x_{h',n}^{s,t})\big]}{\sum_{s=1}^2 \pi(s)\, E_\mu\big[u'_{h'}(x_{h',n}^{s,t})\big]}$$
$$\Rightarrow \tilde q_n < \max_s E_\mu\left[\sum_{i=1}^n \frac{y^s + y(t_i)}{n}\right]$$
$$\Rightarrow \tilde q_n < \bar y + (1 - \nu^1)y(0) + \nu^1 y(1), \tag{15.A.1}$$

where $\bar y \equiv \max_s y^s$. Consider next $h''$ such that $\tilde z_{h'',n} < 0$. By a reasoning similar to that followed for the agent $h'$ (noticing $x_{h'',n}^{s,t}$ and $\sum_{i=1}^n ((y^s + y(t_i))/n)$ are negatively dependent given $s$) one gets

$$\tilde q_n > \underline{y} + (1 - \nu^0)y(1) + \nu^0 y(0), \tag{15.A.2}$$

where $\underline{y} \equiv \min_s y^s$. Therefore, a necessary condition for having an equilibrium with $\tilde z_{h,n} \neq 0$ for at least some $h$ is that $\bar y - \underline{y} > (1 - \nu^0 - \nu^1)(y(1) - y(0))$. Set $\bar A = (\bar y - \underline{y})/(y(1) - y(0)) \in (0,1)$. If $1 - \nu^0 - \nu^1 > \bar A$, then $\tilde z_{h,n} = 0$ for all $h$, at any equilibrium.

Finally, note if $n \to \infty$, $CE_{\pi \otimes \nu}\, u_h(x_{h',n}^{s,t})$ is just a standard expectation operator with respect to the additive measure $\pi \times \mu(t)$, where $\mu(t)$ is such that

$$\mu\left(\left\{ t : \lim_{n \to \infty} \sum_{i=1}^n \frac{y^s + y(t_i)}{n} = y^s + (1 - \nu^1)y(0) + \nu^1 y(1) \right\}\right) = 1.$$

The proof then proceeds as in the case of finite $n$ except that the inequality (15.A.1) reads

$$\lim_{n \to \infty} \tilde q_n \leq \bar y + (1 - \nu^1)y(0) + \nu^1 y(1)$$

and the inequality (15.A.2) reads

$$\lim_{n \to \infty} \tilde q_n \geq \underline{y} + (1 - \nu^0)y(1) + \nu^0 y(0).$$
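The necessary condition just derived is easy to evaluate numerically. The helper below is a sketch (the function name and all numbers are hypothetical): it checks whether the degree of ambiguity $1 - \nu^0 - \nu^1$ exceeds $\bar A = (\bar y - \underline{y})/(y(1) - y(0))$, in which case every equilibrium involves zero positions in the ambiguous assets.

```python
def no_trade_for_sure(y_econ, y0, y1, nu0, nu1):
    """Hypothetical helper: returns True when the Main Theorem's sufficient
    condition for no-trade holds, i.e. 1 - nu0 - nu1 > A_bar, where
    A_bar = (max_s y^s - min_s y^s) / (y(1) - y(0))."""
    a_bar = (max(y_econ) - min(y_econ)) / (y1 - y0)
    return (1 - nu0 - nu1) > a_bar

# Illustrative numbers: economic-state payoffs (y^1, y^2) = (1.0, 1.2),
# idiosyncratic payoffs y(0) = 0, y(1) = 1, so A_bar = 0.2.
print(no_trade_for_sure((1.0, 1.2), 0.0, 1.0, 0.45, 0.45))  # ambiguity 0.1 < 0.2 -> False
print(no_trade_for_sure((1.0, 1.2), 0.0, 1.0, 0.30, 0.30))  # ambiguity 0.4 > 0.2 -> True
```

Note how the condition involves only the spread of economic-state payoffs relative to the spread of idiosyncratic payoffs, not the number of assets: replication does not relax it.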
Acknowledgments We thank the referees and the Managing Editor, M. Armstrong, as well as A. Bisin, S. Bose, P. Ghirardato, I. Gilboa, R. Guesnerie, B. Lipman, J. Malcomson, J. Marin, M. Marinacci, M. Piccione, Z. Safra, H. S. Shin and J. C. Vergnaud for helpful comments. The chapter has also benefitted from the responses of seminar members at the University of British Columbia, University of Essex, University of Evry,
Johns Hopkins, Nuffield College, Tilburg University, CORE-Louvain-la-Neuve, NYU, UPenn, University of Paris I, University of Toulouse, Université du Maine, ENPC-Paris, University of Venice and the ESRC Economic Theory Conference at Kenilworth. The first author gratefully acknowledges financial assistance from an Economic and Social Research Council of U.K. Research Fellowship (# R000 27 1065).
Notes 1 Recent literature has debated the merits of the CEU framework as a model of ambiguity aversion. For instance, Epstein (1999) contends that CEU preferences associated with convex capacities (see Section 15.2) do not always conform with a "natural" notion of ambiguity averse behavior. On the other hand, Ghirardato and Marinacci (1997) argue that ambiguity aversion is demonstrated in the CEU model by a broad class of capacities which includes convex capacities. 2 And, indeed, formal empirical investigations overwhelmingly confirm that the data on individual consumption are more consistent with incomplete than complete markets. Among others, see Zeldes (1989), Carroll (1992), Deaton and Paxson (1994) and Hayashi et al. (1996). The evidence, however, is not unanimous; see, for example, Mace (1991). 3 We will say an asset's payoff has an idiosyncratic component if at least some component of the payoff is independent of (1) the realized endowments of agents and (2) the payoff of any other asset. 4 Fishburn (1993) provides an axiomatic justification of this definition of ambiguity and Mukerji (1997) demonstrates its equivalence to a more primitive and epistemic notion of ambiguity (expressed in terms of the DM's knowledge of the state space). 5 The Choquet expectation operator may be directly defined with respect to a non-additive probability; see Schmeidler (1989). Also, for an intuitive introduction to the CEU model see Section 2 in Mukerji (1998). 6 For instance, suppose a firm introduces a new product line, an innovation, into the market. In such a case, typically, it is not just the shocks commonly affecting firms in the same trade that will affect the sales of the new product but also more (brand) specific elements, for example, whether (or not) the innovation has a "special" appeal for the consumers. Another example of idiosyncratic shocks are shocks to firms' internal organizational capabilities.
7 In this context it is worth noting that it is reported that almost 70 percent of corporate borrowing in the US is through bonds. Default rates on bonds are also significant. The Financial Times, October 13, 1998, in its report headlined "US corporate bond market hit," notes, "the rate of default on US high-yield bonds was running at 10% in the early 1990s … today the default rate is hovering around 3% but creeping higher." 8 Werner (1997) considers a finance economy of which this is just a special case. There are standard arguments that ensure the existence of equilibria of such economies (op. cit., p. 100). 9 This has to be qualified since there exist some nongeneric constraints among endowments in different states, namely $e_h^{s,t} = e_h^{s,t'} \equiv e_h^s$. 10 Laws of large numbers for ambiguous beliefs have been studied by, among others, Walley and Fine (1982) and Marinacci (1996, 1999). Appendix A contains a formal statement of the version we apply. This version was, essentially, originally proved in Walley and Fine (1982). The statement given here is from Marinacci (1996), Theorem 7.7. However, the result is a direct implication of the more general Theorem 15 in Marinacci (1999).
11 Section 3.4 of Epstein and Wang (1994) presents an example of an economy with heterogeneous agents. But in this model, markets are assumed to be complete, and hence risk-sharing continues to be efficient (Pareto optimal), as is explicitly observed by the authors.
References Bewley, T. (1986). "Knightian Decision Theory: Part I," Discussion Paper 807, Cowles Foundation. Carroll, C. (1992). "The Buffer Stock Theory of Savings: Some Macroeconomic Evidence," Brookings Papers on Economic Activity, 2, 61–135. Chateauneuf, A., R. Dana, and J.-M. Tallon (2000). "Optimal Risk Sharing Rules and Equilibria with Choquet-Expected-Utility," Journal of Mathematical Economics, 34, 191–214. Deaton, A. and C. Paxson (1994). "Intertemporal Choice and Inequality," Journal of Political Economy, 102(3), 437–467. Dow, J. and S. Werlang (1992). "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, 60(1), 197–204. (Reprinted as Chapter 17 in this volume.) Epstein, L. (1999). "A Definition of Uncertainty Aversion," Review of Economic Studies, 66, 579–608. (Reprinted as Chapter 9 in this volume.) Epstein, L. and T. Wang (1994). "Intertemporal Asset Pricing under Knightian Uncertainty," Econometrica, 62(3), 283–322. (Reprinted as Chapter 18 in this volume.) Fishburn, P. (1993). "The Axioms and Algebra of Ambiguity," Theory and Decision, 34, 119–137. Ghirardato, P. (1997). "On Independence for Non-Additive Measures, with a Fubini Theorem," Journal of Economic Theory, 73, 261–291. Ghirardato, P. and M. Marinacci (1997). "Ambiguity Aversion Made Precise: A Comparative Foundation and Some Implications," Social science w.p. 1026, CalTech. Gilboa, I. and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-Unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.) Hayashi, F., J. Altonji, and L. Kotlikoff (1996). "Risk Sharing Between and Within Families," Econometrica, 64(2), 261–294. Hendon, E., H. Jacobsen, B. Sloth, and T. Tranaes (1996). "The Product of Capacities and Belief Functions," Mathematical Social Sciences, 32(2), 95–108. Kelsey, D. and F. Milne (1995).
“The Arbitrage Pricing Theorem with Non-Expected Utility Preferences,” Journal of Economic Theory, 65(2), 557–574. Lucas, R. (1978). “Asset Prices in an Exchange Economy,” Econometrica, 46, 1429–1445. Mace, B. (1991). “Full Insurance in the Presence of Aggregate Uncertainty,” Journal of Political Economy, 99(5), 928–956. Magill, M. and M. Quinzii (1996). Theory of Incomplete Markets, Vol. 1, MIT Press. Marinacci, M. (1996). “Limit Laws for Non-Additive Probabilities, and their Frequentist Interpretation,” mimeo. Marinacci, M. (1999). “Limit Laws for Non-Additive Probabilities and their Frequentist Interpretation,” Journal of Economic Theory, 84, 145–195. Morris, S. (1997). “Risk, Uncertainty and Hidden Information,” Theory and Decision, 42(3), 235–269.
Mukerji, S. (1997). "Understanding the Nonadditive Probability Decision Model," Economic Theory, 9(1), 23–46. Mukerji, S. (1998). "Ambiguity Aversion and Incompleteness of Contractual Form," American Economic Review, 88(5), 1207–1231. (Reprinted as Chapter 14 in this volume.) Mukerji, S. and J.-M. Tallon (1999). "Ambiguity Aversion and Incompleteness of Financial Markets-Extended Version," Mimeo. 99-28, Cahiers de la Maison des Sciences Economiques, Universite Paris I, available for download at http://eurequa.univparis1.fr/membros/tallon/tallon.htm Schmeidler, D. (1989). "Subjective Probability and Expected Utility Without Additivity," Econometrica, 57(3), 571–587. (Reprinted as Chapter 5 in this volume.) Walley, P. and T. L. Fine (1982). "Towards a Frequentist Theory of Upper and Lower Probability," Annals of Statistics, 10, 741–761. Werner, J. (1997). "Diversification and Equilibrium in Securities Markets," Journal of Economic Theory, 75, 89–103. Zeldes, S. (1989). "Consumption and Liquidity: An Empirical Investigation," Journal of Political Economy, 97, 305–346.
16 A quartet of semigroups for model specification, robustness, prices of risk, and model detection Evan W. Anderson, Lars Peter Hansen, and Thomas J. Sargent
16.1. Introduction 16.1.1. Rational expectations and model misspecification A rational expectations econometrician or calibrator typically attributes no concern about specification error to agents even as he shuttles among alternative specifications.1 Decision makers inside a rational expectations model know the model.2 Their confidence contrasts with the attitudes of both econometricians and calibrators. Econometricians routinely use likelihood-based specification tests (information criteria or IC) to organize comparisons between models and empirical distributions. Less formally, calibrators sometimes justify their estimation procedures by saying that they regard their models as incorrect and unreliable guides to parameter selection if taken literally as likelihood functions. But the agents inside a calibrator's model do not share the model-builder's doubts about specification. By equating agents' subjective probability distributions to the objective one implied by the model, the assumption of rational expectations precludes any concerns that agents should have about the model's specification. The empirical power of the rational expectations hypothesis comes from having decision makers' beliefs be outcomes, not inputs, of the model-building enterprise. A standard argument that justifies equating objective and subjective probability distributions is that agents would eventually detect any difference between them, and would adjust their subjective distributions accordingly. This argument implicitly gives agents an infinite history of observations, a point that is formalized by the literature on convergence of myopic learning algorithms to rational expectations equilibria of games and dynamic economies.3 Specification tests leave applied econometricians in doubt because they have too few observations to discriminate among alternative models. Econometricians with finite data sets thus face a model detection problem that builders of rational
Anderson, Evan W., Lars Peter Hansen, and Thomas J. Sargent (2003), "A quartet of semigroups for model specification, robustness, prices of risk, and model detection," Journal of the European Economic Association, March 2003; 1(1): 68–123.
expectations models let agents sidestep by endowing them with infinite histories of observations “before time zero.” This chapter is about models with agents whose databases are finite, like econometricians and calibrators. Their limited data leave agents with model specification doubts that are quantitatively similar to those of econometricians and that make them value decision rules that perform well across a set of models. In particular, agents fear misspecifications of the state transition law that are sufficiently small that they are difficult to detect because they are obscured by random shocks that impinge on the dynamical system. Agents adjust decision rules to protect themselves against modeling errors, a precaution that puts model uncertainty premia into equilibrium security market prices. Because we work with Markov models, we can avail ourselves of a powerful tool called a semigroup. 16.1.2. Iterated laws and semigroups The law of iterated expectations imposes consistency requirements that cause a collection of conditional expectations operators associated with a Markov process to form a mathematical object called a semigroup. The operators are indexed by the time that elapses between when the forecast is made and when the random variable being forecast is realized. This semigroup and its associated generator characterize the Markov process. Because we consider forecasting random variables that are functions of a Markov state, the current forecast depends only on the current value of the Markov state.4 The law of iterated values embodies analogous consistency requirements for a collection of economic values assigned to claims to payoffs that are functions of future values of a Markov state. The family of valuation operators indexed by the time that elapses between when the claims are valued and when their payoffs are realized forms another semigroup. 
Just as a Markov process is characterized by its semigroup, so prices of payoffs that are functions of a Markov state can be characterized by a semigroup. Hansen and Scheinkman (2002) exploited this insight. Here we extend their insight to other semigroups. In particular, we describe four semigroups: (1) one that describes a Markov process; (2) another that adjusts continuation values in a way that rewards decision rules that are robust to misspecification of the approximating model; (3) another that models the equilibrium pricing of securities with payoff dates in the future; and (4) another that governs statistics for discriminating between alternative Markov processes using a finite time series data record.5 We show the close connections that bind these four semigroups. 16.1.3. Model detection errors and market prices of risk In earlier work (Hansen, Sargent, and Tallarini (1999), henceforth denoted HST, and Hansen, Sargent, and Wang (2002), henceforth denoted HSW), we studied various discrete time asset pricing models in which decision makers’ fear of model misspecification put model uncertainty premia into market prices of risk, thereby
366 Anderson et al. potentially helping to account for the equity premium. Transcending the detailed dynamics of our examples was a tight relationship between the market price of risk and the probability of distinguishing the representative decision maker’s approximating model from a worst-case model that emerges as a byproduct of his cautious decision making procedure. Although we had offered only a heuristic explanation for that relationship, we nevertheless exploited it to help us calibrate the set of alternative models that the decision maker should plausibly seek robustness against. In the context of continuous time Markov models, this chapter analytically establishes a precise link between the uncertainty component of risk prices and a bound on the probability of distinguishing the decision maker’s approximating and worst-case models. We also develop new ways of representing decision makers’ concerns about model misspecification and their equilibrium consequences.
16.1.4. Related literature In the context of a discrete-time, linear-quadratic permanent income model, HST considered model misspecifications measured by a single robustness parameter. HST showed how robust decision making promotes behavior like that induced by risk aversion. They interpreted a preference for robustness as a decision maker’s response to Knightian uncertainty and calculated how much concern about robustness would be required to put market prices of risk into empirically realistic regions. Our fourth semigroup, which describes model detection errors, provides a statistical method for judging whether the required concern about robustness is plausible. HST and HSW allowed the robust decision maker to consider only a limited array of specification errors, namely, shifts in the conditional mean of shocks that are i.i.d. and normally distributed under an approximating model. In this chapter, we consider more general approximating models and motivate the form of potential specification errors by using specification test statistics. We show that HST’s perturbations to the approximating model emerge in the linear-quadratic, Gaussian control problem as well as in a more general class of control problems in which the stochastic evolution of the state is a Markov diffusion process. However, we also show that misspecifications different from HST’s must be entertained when the approximating model includes Markov jump components. As in HST, our formulation of robustness allows us to reinterpret one of Epstein and Zin’s (1989) recursions as reflecting a preference for robustness rather than aversion to risk. As we explain in Hansen, Sargent, Turmuhambetova, and Williams (henceforth HSTW) (2002), the robust control theory described in Section 16.5 is closely connected to the minmax expected utility or multiple priors model of Gilboa and Schmeidler (1989).
A main theme of this chapter is to advocate a workable strategy for actually specifying those multiple priors in applied work. Our strategy is to use detection error probabilities to surround the single model that is typically specified in applied work with a set of empirically plausible but vaguely specified alternatives.
A quartet of semigroups for model specification
367
16.1.5. Robustness versus learning A convenient feature of rational expectations models is that the model builder imputes a unique and explicit model to the decision maker. Our analysis shares this analytical convenience. While an agent distrusts his model, he still uses it to guide his decisions.6 But the agent uses his model in a way that recognizes that it is an approximation. To quantify approximation, we measure discrepancy between the approximating model and other models with relative entropy, an expected log likelihood ratio, where the expectation is taken with respect to the distribution from the alternative model. Relative entropy is used in the theory of large deviations, a powerful mathematical theory about the rate at which uncertainty about unknown distributions is resolved as the number of observations grows.7 An advantage of using entropy to restrain model perturbations is that we can appeal to the theory of statistical detection to provide information about how much concern about robustness is quantitatively reasonable. Our decision maker confronts alternative models that can be discriminated among only with substantial amounts of data, so much data that, because he discounts the future, the robust decision maker simply accepts model misspecification as a permanent situation. He designs robust controls, and does not use data to improve his model specification over time. He adopts this stance because relative to his discount factor, it would take too much time for enough data to accrue for him to dispose of the alternative models that concern him. In contrast, many formulations of learning have decision makers fully embrace an approximating model when making their choices.8 Despite their different orientations, learners and robust decision makers both need a convenient way to measure the proximity of two probability distributions. This fact builds technical bridges between robust decision theory and learning theory. 
The same expressions from large deviation theory that govern bounds on rates of learning also provide bounds on value functions across alternative possible models in robust decision theory.9 More importantly here, we shall show that the tight relationship between detection error probabilities and the market price of risk that was encountered by HST and HSW can be explained by formally studying the rate at which detection errors decrease as sample size grows.
16.1.6. Reader’s guide A reader interested only in our main results can read Section 16.2, then jump to the empirical applications in Section 16.9.
16.2. Overview This section briefly tells how our main results apply in the special case in which the approximating model is a diffusion. Later sections provide technical details and show how things change when we allow jump components.
A representative agent’s model asserts that the state of an economy x_t in a state space D follows a diffusion10

dx_t = µ(x_t) dt + σ(x_t) dB_t,    (16.1)
where B_t is a Brownian vector. The agent wants decision rules that work well not just when (16.1) is true but also when the data conform to models that are statistically difficult to distinguish from (16.1). A robust control problem to be studied in Section 16.5 leads to such a robust decision rule together with a value function V(x_t) and a process γ(x_t) for the marginal utility of consumption of a representative agent. As a byproduct of the robust control problem, the decision maker computes a worst-case diffusion that takes the form

dx_t = [µ(x_t) + σ(x_t) ĝ(x_t)] dt + σ(x_t) dB_t,    (16.2)
where ĝ = −(1/θ) σ′ (∂V/∂x) and θ > 0 is a parameter measuring the size of potential model misspecifications. Notice that (16.2) modifies the drift but not the volatility relative to (16.1). The formula for ĝ tells us that large values of θ are associated with ĝ_t’s that are small in absolute value, making model (16.2) difficult to distinguish statistically from model (16.1). The diffusion (16.6) lets us quantify just how difficult this statistical detection problem is. Without a preference for robustness to model misspecification, the usual approach to asset pricing is to compute the expected discounted value of payoffs with respect to the “risk-neutral” probability measure that is associated with the following twisted version of the physical measure (diffusion (16.1)):

dx_t = [µ(x_t) + σ(x_t) ḡ(x_t)] dt + σ(x_t) dB_t.    (16.3)
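To make the drift distortion concrete, here is a minimal simulation sketch of the approximating model (16.1) and a worst-case model of the form (16.2). The scalar mean-reverting drift, the constant volatility, and the constant distortion ĝ are illustrative assumptions of this sketch, not values taken from the chapter.

```python
import numpy as np

def euler_maruyama(mu, sigma, g, x0=0.0, T=10.0, n=10_000, seed=0):
    """Simulate dx = [mu(x) + sigma(x) g(x)] dt + sigma(x) dB by Euler-Maruyama."""
    rng = np.random.default_rng(seed)
    dt = T / n
    x = np.empty(n + 1)
    x[0] = x0
    dB = rng.normal(0.0, np.sqrt(dt), size=n)
    for k in range(n):
        s = sigma(x[k])
        x[k + 1] = x[k] + (mu(x[k]) + s * g(x[k])) * dt + s * dB[k]
    return x

mu = lambda x: -0.5 * x    # illustrative mean-reverting drift (assumption)
sigma = lambda x: 0.2      # illustrative constant volatility (assumption)
g_hat = lambda x: -0.1     # illustrative constant drift distortion (assumption)

x_approx = euler_maruyama(mu, sigma, lambda x: 0.0)  # approximating model, as in (16.1)
x_worst = euler_maruyama(mu, sigma, g_hat)           # drift-distorted model, as in (16.2)
```

Because the same seed feeds both runs, the two paths share shocks and differ only through the drift term σ(x)ĝ(x); the worst-case path drifts persistently below the approximating one, the kind of small, shock-obscured perturbation the text describes.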
In using the risk-neutral measure to price assets, future expected returns are discounted at the risk-free rate ρ(x_t), obtained as follows. The marginal utility of the representative household γ(x_t) conforms to dγ_t = µ_γ(x_t) dt + σ_γ(x_t) dB_t. Then the risk-free rate is ρ(x_t) = δ − (µ_γ(x_t)/γ(x_t)), where δ is the instantaneous rate at which the household discounts future utilities; the risk-free rate thus equals the negative of the expected growth rate of the representative household’s marginal utility. The price of a payoff φ(x_N) contingent on a Markov state in period N is then

Ē[ exp(−∫₀ᴺ ρ(x_u) du) φ(x_N) | x_0 = x ],    (16.4)

where Ē is the expectation evaluated with respect to the distribution generated by (16.3). This formula gives rise to a pricing operator for every horizon N. Relative to the approximating model, the diffusion (16.3) for the risk-neutral measure distorts the drift in the Brownian motion by adding the term σ(x_t)ḡ(x_t), where ḡ = σ′(∂ log γ(x)/∂x). Here ḡ is a vector of “factor risk prices” or “market prices of risk.” The equity premium puzzle is the finding that with plausible quantitative
specifications for the marginal utility γ(x), factor risk prices ḡ are too small relative to their empirically estimated counterparts. In Section 16.7, we show that when the planner and a representative consumer want robustness, the diffusion associated with the risk-neutral measure appropriate for pricing becomes

dx_t = (µ(x_t) + σ(x_t)[ḡ(x_t) + ĝ(x_t)]) dt + σ(x_t) dB_t,    (16.5)
where ĝ is the same process that appears in (16.2). With robustness sought over a set of alternative models that is indexed by θ, factor risk prices become augmented to ḡ + ĝ. The representative agent’s concerns about model misspecification contribute the ĝ component of the factor risk prices. To evaluate the quantitative potential for attributing parts of the market prices of risk to agents’ concerns about model misspecification, we need to calibrate θ and therefore |ĝ|. To calibrate θ and ĝ, we turn to a closely related fourth diffusion that governs the probability distribution of errors from using likelihood ratio tests to detect which of two models generated a continuous record of length N of observations on x_t. Here the key idea is that we can represent the average error in using a likelihood ratio test to detect the difference between the two models (16.1) and (16.2) from a continuous record of data of length N as 0.5 E(min{exp(ℓ_N), 1} | x_0 = x), where E is evaluated with respect to model (16.1) and ℓ_N is the log-likelihood ratio of the data record of model (16.2) with respect to model (16.1). For each α ∈ (0, 1), we can use the inequality

E(min{exp(ℓ_N), 1} | x_0 = x) ≤ E(exp(αℓ_N) | x_0 = x)

to attain a bound on the detection error probability. For each α, we show that the bound can be calculated by forming a new diffusion that uses (16.1) and (16.2) as ingredients, and in which the drift distortion ĝ from (16.2) plays a key role. In particular, for α ∈ (0, 1), define

dx_t^α = [µ(x_t) + ασ(x_t) ĝ(x_t)] dt + σ(x_t) dB_t,    (16.6)
and define the local rate function ρ^α(x) = ((1 − α)α/2) ĝ(x)′ ĝ(x). Then the bound on the average error in using a likelihood ratio test to discriminate between the approximating model (16.1) and the worst-case model (16.2) from a continuous data record of length N is

av error ≤ 0.5 E^α[ exp(−∫₀ᴺ ρ^α(x_t) dt) | x_0 = x ],    (16.7)
where E^α is the mathematical expectation evaluated with respect to the diffusion (16.6). The error rate ρ^α(x) is maximized by setting α = 0.5. Notice that the right side of (16.7) is one half the price of a pure discount bond that pays off one unit of consumption for sure N periods in the future, treating ρ^α as the risk-free rate and the measure induced by (16.6) as the risk-neutral probability measure. It is remarkable that the three diffusions (16.2), (16.5), and (16.6) that describe the worst-case model, asset pricing under a preference for robustness, and the local behavior of a bound on model detection errors, respectively, are all obtained
by perturbing the drift in the approximating model (16.1) with functions of the same drift distortion ĝ(x) that emerges from the robust control problem. To the extent that the bound on detection probabilities is informative about the detection probabilities themselves, our theoretical results thus neatly explain the pattern that was observed in the empirical applications of HST and HSW, namely, that there is a tight link between calculated detection error probabilities and the market price of risk. That link transcends all details of the model specification.11 In Section 16.9, we shall encounter this tight link again when we calibrate the contribution to market prices of risk that can plausibly be attributed to a preference for robustness in the context of three continuous time asset pricing models. Subsequent sections of this chapter substantiate these and other results in a more general Markov setting that permits x to have jump components, so that jump distortions also appear in the Markov processes for the worst-case model, asset pricing, and model detection error. We shall exploit and extend the asset-pricing structure of formulas like (16.4) and (16.7) by recognizing that collections of expectations, values, and bounds on detection error rates can all be described with semigroups.
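When the drift distortion ĝ is constant, the rate ρ^α = α(1 − α)ĝ²/2 in (16.7) is deterministic and the bound has the closed form 0.5·exp(−α(1 − α)ĝ²N/2). The numbers below (ĝ = 0.25, N = 200) are illustrative assumptions, chosen only to show that the tightest bound occurs at α = 0.5:

```python
import numpy as np

def detection_bound(alpha, g_hat, N):
    """Bound of the form (16.7) for a constant drift distortion g_hat: the local
    rate rho^alpha = alpha*(1 - alpha)*g_hat**2/2 is deterministic, so the
    expectation collapses to 0.5*exp(-rho^alpha * N)."""
    rate = alpha * (1.0 - alpha) * g_hat**2 / 2.0
    return 0.5 * np.exp(-rate * N)

alphas = np.linspace(0.01, 0.99, 99)
bounds = detection_bound(alphas, g_hat=0.25, N=200)
alpha_best = alphas[np.argmin(bounds)]   # the rate is maximized at alpha = 0.5
```

Larger |ĝ| or longer samples N shrink the bound, which is the sense in which easily detected distortions are implausible candidates for the set of alternative models.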
16.3. Mathematical preliminaries The remainder of this chapter studies continuous-time Markov formulations of model specification, robust decision making, pricing, and statistical model detection. We use Feller semigroups indexed by time for all four purposes. This section develops the semigroup theory needed for our chapter. 16.3.1. Semigroups and their generators Let D be a Markov state space that is a locally compact and separable subset of R^m. We distinguish two cases. First, when D is compact, we let C denote the space of continuous functions mapping D into R. Second, when we want to study cases in which the state space is unbounded so that D is not compact, we shall use a one-point compactification that enlarges the state space by adding a point at ∞. In this case we let C be the space of continuous functions that vanish at ∞. We can think of such functions as having domain D or domain D ∪ {∞}. The compactification is used to limit the behavior of functions in the tails when the state space is unbounded. We use the sup-norm to measure the magnitude of functions on C and to define a notion of convergence. We are interested in a strongly continuous semigroup of operators {S_t : t ≥ 0} with an infinitesimal generator G. For {S_t : t ≥ 0} to be a semigroup we require that S_0 = I and S_{t+τ} = S_t S_τ for all τ, t ≥ 0. A semigroup is strongly continuous if

lim_{τ↓0} S_τ φ = φ,
where the convergence is uniform for each φ in C. Continuity allows us to compute a time derivative and to define a generator

Gφ = lim_{τ↓0} (S_τ φ − φ)/τ.    (16.8)

This is again a uniform limit and it is well defined on a dense subset of C. A generator describes the instantaneous evolution of a semigroup. A semigroup can be constructed from a generator by solving a differential equation. Thus applying the semigroup property gives

lim_{τ↓0} (S_{t+τ} φ − S_t φ)/τ = G S_t φ,    (16.9)
a differential equation for a semigroup that is subject to the initial condition that S_0 is the identity operator. The solution to differential Equation (16.9) is depicted heuristically as S_t = exp(tG), and thus satisfies the semigroup requirements. The exponential formula can be justified rigorously using a Yosida approximation, which formally constructs a semigroup from its generator. In what follows, we will use semigroups to model Markov processes, intertemporal prices, and statistical discrimination. Using a formulation of Hansen and Scheinkman (2002), we first examine semigroups that are designed to model Markov processes. 16.3.2. Representation of a generator We describe a convenient representation result for a strongly continuous, positive, contraction semigroup. Positivity requires that S_t maps nonnegative functions φ into nonnegative functions for each t. When the semigroup is a contraction, it is referred to as a Feller semigroup. The contraction property restricts the norm of S_t to be less than or equal to one for each t and is satisfied for semigroups associated with Markov processes. Generators of Feller semigroups have a convenient characterization:

Gφ = µ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + Nφ − ρφ,    (16.10)

where N has the product form

Nφ(x) = ∫ [φ(y) − φ(x)] η(dy|x),    (16.11)
where ρ is a nonnegative continuous function, µ is an m-dimensional vector of continuous functions, Σ is a matrix of continuous functions that is positive semidefinite on the state space, and η(·|x) is a finite measure for each x and continuous in x for each Borel subset of D. We require that N map C²_K into C, where C²_K is the subspace of functions that are twice continuously differentiable with compact support in D. Formula (16.11) is valid at least on C²_K.12 To depict equilibrium prices we will sometimes go beyond Feller semigroups. Pricing semigroups are not necessarily contraction semigroups unless the instantaneous yield on a real discount bond is nonnegative. When we use this approach for pricing, we will allow ρ to be negative. While this puts us out of the realm of Feller semigroups, as argued by Hansen and Scheinkman (2002), known results for Feller semigroups can often be extended to pricing semigroups. We can think of the generator (16.10) as being composed of three parts. The first two components are associated with well known continuous-time Markov process models, namely, diffusion and jump processes. The third part discounts. The next three subsections will interpret these components of Equation (16.10). 16.3.2.1. Diffusion processes The generator of a Markov diffusion process is a second-order differential operator:

G_d φ = µ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′),

where the coefficient vector µ is the drift or local mean of the process and the coefficient matrix Σ is the diffusion or local covariance matrix. The corresponding stochastic differential equation is

dx_t = µ(x_t) dt + σ(x_t) dB_t,

where {B_t} is a multivariate standard Brownian motion and Σ = σσ′. Sometimes the resulting process will have attainable boundaries, in which case we either stop the process at the boundary or impose other boundary protocols. 16.3.2.2. Jump processes The generator for a Markov jump process is

G_n φ = Nφ = λ[Qφ − φ],    (16.12)

where the coefficient λ(x) = ∫ η(dy|x) is a possibly state-dependent Poisson intensity parameter that sets the jump probabilities and Q is a conditional expectation operator that encodes the transition probabilities conditioned on a jump taking place. Without loss of generality, we can assume that the transition distribution associated with the operator Q assigns probability zero to the event y = x provided that x ≠ ∞, where x is the current Markov state and y the state after a jump takes place. That is, conditioned on a jump taking place, the process cannot stay put with positive probability unless it reaches a boundary.
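The jump case is easy to compute with when the state space is finite: η(·|x) becomes a matrix of jump rates, the generator of (16.12) is the matrix G = λ(Q − I), and the semigroup is the matrix exponential S_t = exp(tG). The three-state example below is purely illustrative; it checks the semigroup property S_{t+τ} = S_t S_τ and the generator limit (16.8) numerically.

```python
import numpy as np
from scipy.linalg import expm

lam = 2.0                                  # illustrative jump intensity
Q = np.array([[0.0, 0.7, 0.3],             # illustrative jump transition matrix:
              [0.5, 0.0, 0.5],             # zero diagonal, rows sum to one
              [0.2, 0.8, 0.0]])
G = lam * (Q - np.eye(3))                  # generator as in (16.12)

def S(t):
    """Conditional expectation semigroup S_t = exp(tG)."""
    return expm(t * G)

phi = np.array([1.0, -2.0, 0.5])           # a "test function" on the states {0, 1, 2}

semigroup_gap = np.max(np.abs(S(0.7) - S(0.4) @ S(0.3)))              # ~ 0
tau = 1e-6
generator_gap = np.max(np.abs((S(tau) @ phi - phi) / tau - G @ phi))  # ~ 0
```

Row sums of G are zero, the finite-state analogue of a conservative generator; both gaps are numerically negligible, which is the matrix version of (16.8) and the semigroup property.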
The jump and diffusion components can be combined in a model of a Markov process. That is,

G_d φ + G_n φ = µ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + Nφ,    (16.13)
is the generator of a family (semigroup) of conditional expectation operators of a Markov process {x_t}, say S_t(φ)(x) = E[φ(x_t)|x_0 = x]. 16.3.2.3. Discounting The third part of (16.10) accounts for discounting. Thus, consider a Markov process {x_t} with generator G_d + G_n. Construct the semigroup

S_t φ = E[ exp(−∫₀ᵗ ρ(x_τ) dτ) φ(x_t) | x_0 = x ]

on C. We can think of this semigroup as discounting the future state at the stochastic rate ρ(x). Discount rates will play essential roles in representing shadow prices from a robust resource allocation problem and in measuring statistical discrimination between competing models.13 16.3.3. Extending the domain to bounded functions While it is mathematically convenient to construct the semigroup on C, sometimes it is necessary for us to extend the domain to a larger class of functions. For instance, indicator functions 1_D̃ of nondegenerate subsets D̃ are omitted from C. Moreover, 1_D is not in C when D is not compact; nor can this function be approximated uniformly. Thus to extend the semigroup to bounded, Borel measurable functions, we need a weaker notion of convergence. Let {φ_j : j = 1, 2, . . .} be a sequence of uniformly bounded functions that converges pointwise to a bounded function φ_o. We can then extend the S_τ semigroup to φ_o using the formula

S_τ φ_o = lim_{j→∞} S_τ φ_j,
where the limit notion is now pointwise. The choice of approximating sequence does not matter and the extension is unique.14 With this construction, we define the instantaneous discount or interest rate as the pointwise derivative

−lim_{τ↓0} (1/τ) log S_τ 1_D = ρ,

when the derivative exists.
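For a finite-state chain with an illustrative generator A (all numbers below are assumptions of the sketch), discounting just subtracts diag(ρ) from the chain’s generator, and the instantaneous rate can be recovered from S_τ applied to the constant function 1:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.6, 0.4],            # illustrative chain generator
              [0.3, -0.8, 0.5],            # (rows sum to zero)
              [0.1, 0.9, -1.0]])
rho = np.array([0.05, 0.10, 0.02])         # illustrative state-dependent discount rate
G = A - np.diag(rho)                       # generator of the discounted semigroup

def S(t):
    """S_t phi = E[exp(-int_0^t rho(x_s) ds) phi(x_t) | x_0], here exp(tG)."""
    return expm(t * G)

tau = 1e-6
ones = np.ones(3)
recovered_rate = -np.log(S(tau) @ ones) / tau   # approximately rho, state by state
```

The recovered vector matches ρ up to O(τ), a finite-state version of the pointwise derivative above; the discounted semigroup is positive but no longer conserves probability.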
16.3.4. Extending the generator to unbounded functions Value functions for control problems on noncompact state spaces are often not bounded. Thus for our study of robust counterparts to optimization, we must extend the semigroup and its generator to unbounded functions. We adopt an approach that is specific to a Markov process and hence we study this extension only for a semigroup generated by G = G_d + G_n. We extend the generator using martingales. To understand this approach, we first remark that for a given φ in the domain of the generator,

M_t = φ(x_t) − φ(x_0) − ∫₀ᵗ Gφ(x_τ) dτ

is a martingale. In effect, we produce a martingale by subtracting the integral of the local means from the process {φ(x_t)}. This martingale construction suggests a way to build the extended generator. Given φ we find a function ψ such that

M_t = φ(x_t) − φ(x_0) − ∫₀ᵗ ψ(x_τ) dτ    (16.14)
is a local martingale (a process that is a martingale when stopped at each member of an increasing sequence of stopping times that diverges to ∞). We then define Gφ = ψ. This construction extends the operator G to a larger class of functions than those for which the operator differentiation (16.8) is well defined. For every φ in the domain of the generator, ψ = Gφ in (16.14) produces a martingale. However, there are φ’s not in the domain of the generator for which (16.14) also produces a martingale.15 In the case of a Feller process defined on a state space D that is an open subset of R^m, this extended domain contains at least functions in C̃², the functions that are twice continuously differentiable on D. Such functions can be unbounded when the original state space D is not compact.
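The martingale characterization can be checked in expectation on a finite state space (all numbers illustrative): taking conditional expectations in (16.14) with ψ = Gφ gives Dynkin’s formula E[φ(x_t)|x_0] − φ(x_0) = ∫₀ᵗ E[Gφ(x_s)|x_0] ds, i.e. E[M_t|x_0] = 0, which we verify by quadrature:

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[-1.0, 0.6, 0.4],                # illustrative chain generator
              [0.3, -0.8, 0.5],                # (rows sum to zero)
              [0.1, 0.9, -1.0]])
phi = np.array([2.0, -1.0, 0.5])

t, n = 1.0, 2000
dt = t / n
P = expm(dt * A)                               # one-step semigroup S_dt

# Trapezoid rule for int_0^t S_s (A phi) ds, one value per initial state.
vals = []
M = np.eye(3)
for _ in range(n + 1):
    vals.append(M @ (A @ phi))
    M = M @ P
vals = np.array(vals)
integral = dt * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))

lhs = expm(t * A) @ phi - phi                  # E[phi(x_t)|x_0] - phi(x_0)
expected_M = lhs - integral                    # E[M_t | x_0], approximately zero
```

The residual is quadrature error only; subtracting the integral of local means from {φ(x_t)} removes all predictable drift, which is exactly the extended-generator construction.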
16.4. A tour of four semigroups In the remainder of the chapter we will study four semigroups. Before describing each in detail, it is useful to tabulate the four semigroups and their uses. We have already introduced the first semigroup, which describes the evolution of a state vector process {x_t}. This semigroup portrays a decision maker’s approximating model. It has the generator displayed in (16.10) with ρ = 0, which we repeat here for convenience:

Gφ = µ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + Nφ.    (16.15)

While up to now we used G to denote a generic semigroup, from this point forward we will reserve it for the approximating model. We can think of the decision maker as using the semigroup generated by G to forecast functions φ(x_t). This semigroup for the approximating model can have both jump and Brownian components, but
the discount rate ρ is zero. In some settings, the semigroup associated with the approximating model includes a description of endogenous state variables and therefore embeds robust decision rules of one or more decision makers, as for example when the approximating model emerges from a robust resource allocation problem of the kind to be described in Section 16.5. With our first semigroup as a point of reference, we will consider three additional semigroups. The second semigroup represents an endogenous worst-case model that a decision maker uses to promote robustness to possible misspecification of his approximating model (16.15). For reasons that we discuss in Section 16.8, we shall focus the decision maker’s attention on worst-case models that are absolutely continuous with respect to his approximating model. Following Kunita (1969), we shall assume that the decision maker believes that the data are actually generated by a member of a class of models that are obtained as Markov perturbations of the approximating model (16.15). We parameterize this class of models by a pair of functions (g, h), where g is a continuous function of the Markov state x that has the same number of coordinates as the underlying Brownian motion, and h is a nonnegative function of (y, x) that distorts the jump intensities. For the worst-case model, we have the particular settings g = ĝ and h = ĥ. Then we can represent the worst-case generator Ĝ as

Ĝφ = µ̂ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N̂φ,    (16.16)

where

µ̂ = µ + σĝ
η̂(dy|x) = ĥ(y, x) η(dy|x).

The distortion ĝ to the diffusion and the distortion ĥ to the jump component in the worst-case model will also play essential roles both in asset pricing and in the detection probabilities formulas. From (16.12), it follows that the jump intensity under this parameterization is given by λ̂(x) = ∫ ĥ(y, x) η(dy|x) and the jump distribution conditioned on x is (ĥ(y, x)/λ̂(x)) η(dy|x). A generator of the form (16.16) emerges from a robust decision problem, the perturbation pair (ĝ, ĥ) being chosen by a malevolent player, as we discuss next. Our third semigroup modifies one that Hansen and Scheinkman (2002) developed for computing the time zero price of a state contingent claim that pays off φ(x_t) at time t. Hansen and Scheinkman showed that the time zero price can be computed with a risk-free rate ρ̄ and a risk-neutral probability measure embedded in a semigroup with generator

Ḡφ = −ρ̄φ + µ̄ · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N̄φ.    (16.17a)
Here

µ̄ = µ + σπ̄
η̄(dy|x) = Π̄(y, x) η(dy|x).    (16.17b)

In the absence of a concern about robustness, π̄ = ḡ is a vector of prices for the Brownian motion factors and Π̄ = h̄ encodes the jump risk prices. In Markov settings without a concern for robustness, (16.17b) represents the connection between the physical probability and the so-called risk-neutral probability that is widely used for asset pricing along with the interest rate adjustment. We alter generator (16.17) to incorporate a representative consumer’s concern about robustness to model misspecification. Specifically a preference for robustness changes the ordinary formulas for π̄ and Π̄ that are based solely on pricing risks under the assumption that the approximating model is true. A concern about robustness alters the relationship between the semigroups for representing the underlying Markov processes and pricing. With a concern for robustness, we represent factor risk prices by relating µ̄ to the worst-case drift µ̂: µ̄ = µ̂ + σḡ, and risk-based jump prices by relating η̄ to the worst-case jump measure η̂: η̄(dy|x) = h̄(y, x) η̂(dy|x). Combining this decomposition with the relation between the worst-case and the approximating models gives the new vectors of pricing functions

π̄ = ḡ + ĝ
Π̄ = h̄ ĥ,

where the pair (ĝ, ĥ) is used to portray the (constrained) worst-case model in (16.16). Later we will supply formulas for (ρ̄, ḡ, h̄). A fourth semigroup statistically quantifies the discrepancy between two competing models as a function of the time interval of available data. We are particularly interested in measuring the discrepancy between the approximating and worst-case models. For each α ∈ (0, 1), we develop a bound on a detection error probability in terms of a semigroup and what looks like an associated “risk-free interest rate.” The counterpart to the risk-free rate serves as an instantaneous discrimination rate.
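A small numerical illustration of the decomposition π̄ = ḡ + ĝ for a scalar diffusion state: take hypothetical functional forms for the marginal utility γ(x) and the value function V(x) (both are assumptions of this sketch, not objects derived in the chapter), and build the two components by finite differences, with ḡ proportional to ∂ log γ/∂x and ĝ to −(1/θ)∂V/∂x, scaled by σ, following the formulas of Section 16.2.

```python
import numpy as np

sigma, theta, x = 0.2, 5.0, 1.0             # illustrative parameter values
gamma = lambda x: np.exp(-2.0 * x)          # hypothetical marginal utility
V = lambda x: -0.5 * x**2                   # hypothetical value function

eps = 1e-6                                  # central finite differences
dlog_gamma = (np.log(gamma(x + eps)) - np.log(gamma(x - eps))) / (2 * eps)
dV = (V(x + eps) - V(x - eps)) / (2 * eps)

g_bar = sigma * dlog_gamma                  # risk component of the factor price
g_hat = -(sigma / theta) * dV               # model-uncertainty component
pi_bar = g_bar + g_hat                      # total factor risk price
```

The point of the sketch is only bookkeeping: with θ finite, the factor risk price π̄ splits into a pure-risk component ḡ and a model-uncertainty component ĝ; as θ → ∞, ĝ vanishes and ordinary risk-neutral pricing is recovered.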
For each α, the generator for the bound on the detection error probability can be represented as

G^α φ = −ρ^α φ + µ^α · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N^α φ,

where

µ^α = µ + σ g^α
η^α(dy|x) = h^α(y, x) η(dy|x).
Table 16.1 Parameterizations of the generators of four semigroups. The rate modifies the generator associated with the approximating model by adding −ρφ to the generator for a test function φ. The drift distortion adds a term [σ(x)g(x)] · (∂φ/∂x) to the generator associated with the approximating model. The jump distortion density is h(y, x)η(dy|x) instead of the jump distribution η(dy|x) in the generator for the approximating model

Semigroup            Generator   Rate       Drift distortion          Jump dist. density
Approximating model  G           0          0                         1
Worst-case model     Ĝ           0          ĝ(x)                      ĥ(y, x)
Pricing              Ḡ           ρ̄(x)       π̄(x) = ḡ(x) + ĝ(x)        Π̄(y, x) = h̄(y, x)ĥ(y, x)
Detection            G^α         ρ^α(x)     g^α(x)                    h^α(y, x)
The semigroup generated by G^α governs the behavior, as sample size grows, of a bound on the fraction of errors made when distinguishing two Markov models using likelihood ratios or posterior odds ratios. The α associated with the best bound is determined on a case-by-case basis and is especially easy to find in the special case that the Markov process is a pure diffusion. Table 16.1 summarizes our parameterization of these four semigroups. Subsequent sections supply formulas for the entries in this table.
16.5. Model misspecification and robust control We now study the continuous-time robust resource allocation problem. In addition to an approximating model, this analysis will produce a constrained worst-case model that, by helping the decision maker to assess the fragility of any given decision rule, can be used as a device to choose a robust decision rule. 16.5.1. Lyapunov equation under Markov approximating model and a fixed decision rule Under a Markov approximating model with generator G and a fixed policy function i(x), the decision maker’s value function is

V(x) = ∫₀^∞ exp(−δt) E[U[x_t, i(x_t)] | x_0 = x] dt.

The value function V satisfies the continuous-time Lyapunov equation:

δV(x) = U[x, i(x)] + GV(x).    (16.18)
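On a finite state space the Lyapunov equation (16.18) is linear: with an illustrative matrix generator A standing in for G and a utility vector u standing in for U[x, i(x)] (both assumed for this sketch), V = (δI − A)^{-1}u, and it agrees with the discounted integral defining V:

```python
import numpy as np
from scipy.linalg import expm

delta = 0.1
A = np.array([[-0.5, 0.5, 0.0],            # illustrative generator (rows sum to 0)
              [0.2, -0.6, 0.4],
              [0.0, 0.3, -0.3]])
u = np.array([1.0, 0.0, -1.0])             # illustrative flow utility by state

# Solve delta V = u + A V, the finite-state version of (16.18):
V = np.linalg.solve(delta * np.eye(3) - A, u)

# Cross-check against V = int_0^inf exp(-delta t) E[u(x_t)] dt by quadrature:
T, n = 150.0, 15_000
dt = T / n
P = expm(dt * A)
vals, M = [], np.eye(3)
for k in range(n + 1):
    vals.append(np.exp(-delta * k * dt) * (M @ u))
    M = M @ P
vals = np.array(vals)
V_quad = dt * (vals.sum(axis=0) - 0.5 * (vals[0] + vals[-1]))
```

The linear solve and the truncated discounted integral agree up to quadrature and truncation error, which is the content of the Lyapunov characterization.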
Since V may not be bounded, we interpret G as the weak extension of the generator (16.13) defined using local martingales. The local martingale associated with this
equation is:

M_t = V(x_t) − V(x_0) − ∫₀ᵗ (δV(x_s) − U[x_s, i(x_s)]) ds.
As in (16.13), this generator can include diffusion and jump contributions. We will eventually be interested in optimizing over a control i, in which case the generator G will depend explicitly on the control. For now we suppress that dependence. We refer to G as the approximating model; G can be modelled using the triple (µ, Σ, η) as in (16.13). The pair (µ, Σ) consists of the drift and diffusion coefficients while the conditional measure η encodes both the jump intensity and the jump distribution. We want to modify the Lyapunov equation (16.18) to incorporate a concern about model misspecification. We shall accomplish this by replacing G with another generator that expresses the decision maker’s precaution about the specification of G. 16.5.2. Entropy penalties We now introduce perturbations to the decision maker’s approximating model that are designed to make finite horizon transition densities of the perturbed model be absolutely continuous with respect to those of the approximating model. We use a notion of absolute continuity that pertains only to finite intervals of time. In particular, imagine a Markov process evolving for a finite length of time. Our notion of absolute continuity restricts probabilities induced by the path {x_τ : 0 ≤ τ ≤ t} for all finite t. See HSTW (2002), who discuss this notion as well as an infinite history version of absolute continuity. Kunita (1969) shows how to preserve both the Markov structure and absolute continuity. Following Kunita (1969), we shall consider a Markov perturbation that can be parameterized by a pair (g, h), where g is a continuous function of the Markov state x and has the same number of coordinates as the underlying Brownian motion, and h is a nonnegative function of (y, x) used to model the jump intensities. In Section 16.8, we will have more to say about these perturbations including a discussion of why we do not perturb σ.
For the pair (g, h), the perturbed generator is portrayed using a drift µ + Σg, a diffusion matrix Σ, and a jump measure h(y, x)η(dy|x). Thus the perturbed generator is
\[
G(g,h)\phi(x) = G\phi(x) + [\Sigma(x) g(x)] \cdot \frac{\partial \phi(x)}{\partial x} + \int [h(y,x) - 1]\,[\phi(y) - \phi(x)]\, \eta(dy|x).
\]
For this perturbed generator to generate a Feller semigroup would require that we impose additional restrictions on h. For analytical tractability we will limit the perturbations only to have finite entropy. We will be compelled to show, however, that the perturbation used to implement robustness does indeed generate a Markov process. This perturbation will be constructed formally as the solution to a constrained
A quartet of semigroups for model specification
379
minimization problem. In what follows, we continue to use the notation G for the approximating model in place of the more tedious G(0, 1).

16.5.3. Conditional relative entropy

At this point, it is useful to have a local measure of conditional relative entropy.16 Conditional relative entropy plays a prominent role in large deviation theory and in classical statistical discrimination, where it is sometimes used to study the decay in the so-called type II error probabilities, holding fixed type I errors (Stein's Lemma). For the purposes of this section, we will use relative entropy as a discrepancy measure. In Section 16.8 we will elaborate on its connection to the theory of statistical discrimination. As a measure of discrepancy, it has been axiomatized by Csiszar (1991), although his defense shall not concern us here. Let ℓ_t denote the log of the ratio of the likelihood of model one to the likelihood of model zero, given a data record of length t. For now, let the data be either a continuous or a discrete time sample. The relative entropy conditioned on x_0 is defined to be:
\[
E(\ell_t \mid x_0, \text{model 1}) = E[\ell_t \exp(\ell_t) \mid x_0, \text{model 0}]
= \frac{d}{d\alpha} E[\exp(\alpha \ell_t) \mid x_0, \text{model 0}]\Big|_{\alpha=1}, \tag{16.19}
\]
where we have assumed that the model one probability distribution is absolutely continuous with respect to the model zero probability distribution. To evaluate entropy, the second relation differentiates the moment-generating function for the log-likelihood ratio. The same information inequality that justifies maximum likelihood estimation implies that relative entropy is nonnegative. When the model one transition distribution is absolutely continuous with respect to the model zero transition distribution, entropy collapses to zero as the length of the data record t → 0. Therefore, with a continuous data record, we shall use a concept of conditional relative entropy as a rate, specifically the time derivative of (16.19). Thus, as a local counterpart to (16.19), we have the following measure:
\[
\mathcal{E}(g,h)(x) = \frac{g(x)' g(x)}{2} + \int [1 - h(y,x) + h(y,x)\log h(y,x)]\, \eta(dy|x), \tag{16.20}
\]
where model zero is parameterized by (0, 1) and model one is parameterized by (g, h). The quadratic form g′g/2 comes from the diffusion contribution, and the term ∫[1 − h(y, x) + h(y, x) log h(y, x)]η(dy|x) measures the discrepancy in the jump intensities and distributions. It is nonnegative by the convexity of h log h in h.
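For a model whose jump measure concentrates on finitely many targets, (16.20) can be evaluated directly. The following Python sketch (the intensities and distortions are invented for illustration, not taken from the chapter) implements the measure and confirms that it vanishes at the approximating model (g, h) = (0, 1) and is nonnegative elsewhere:

```python
import numpy as np

# Local conditional relative entropy (16.20), specialized to a jump measure
# with finitely many targets; all numbers are illustrative.
def local_entropy(g, h, eta):
    """g: drift distortion; h: positive intensity distortions per jump target;
    eta: the jump intensities eta(dy|x) collapsed to one weight per target."""
    diffusion_term = 0.5 * g @ g                          # g'g/2
    jump_term = np.sum((1.0 - h + h * np.log(h)) * eta)
    return diffusion_term + jump_term

eta = np.array([0.4, 1.1, 0.25])
baseline = local_entropy(np.zeros(2), np.ones(3), eta)    # zero at (g, h) = (0, 1)
```

Convexity of h log h guarantees the jump term is nonnegative, so any perturbation away from (0, 1) carries strictly positive entropy.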
Let 𝒫 denote the space of all such perturbation pairs (g, h). Conditional relative entropy 𝓔 is convex in (g, h). It will be finite only when
\[
0 < \int h(y, x)\, \eta(dy|x) < \infty.
\]
When we introduce adjustments for model misspecification, we modify Lyapunov equation (16.18) in the following way to penalize entropy:
\[
\delta V(x) = \min_{(g,h)\in\mathcal{P}} U[x, i(x)] + \theta \mathcal{E}(g,h)(x) + G(g,h)V(x),
\]
where θ > 0 is a penalty parameter. We are led to the following entropy penalty problem.

Problem A
\[
J(V) = \inf_{(g,h)\in\mathcal{P}} \theta \mathcal{E}(g,h) + G(g,h)V. \tag{16.21}
\]
Theorem 16.1. Suppose that (i) V is in C̃² and (ii) ∫ exp[−V(y)/θ] η(dy|x) < ∞ for all x. The minimizer of Problem A is
\[
\hat g(x) = -\frac{1}{\theta}\, \Sigma(x)' \frac{\partial V(x)}{\partial x}, \qquad
\hat h(y,x) = \exp\left[ \frac{V(x) - V(y)}{\theta} \right]. \tag{16.22a}
\]
The optimized value of the criterion is:
\[
J(V) = -\theta\, \frac{G[\exp(-V/\theta)]}{\exp(-V/\theta)}. \tag{16.22b}
\]
Finally, the implied measure of conditional relative entropy is:
\[
\mathcal{E}^* = \frac{V\, G[\exp(-V/\theta)] - G[V \exp(-V/\theta)] - \theta\, G[\exp(-V/\theta)]}{\theta \exp(-V/\theta)}. \tag{16.22c}
\]
Proof. The proof is in Appendix A.

The formulas (16.22a) for the distortions will play a key role in our applications to asset pricing and statistical detection.
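For a pure jump model with finitely many targets, the minimization in Problem A separates across targets, so the closed forms of Theorem 16.1 can be checked against a generic numerical optimizer. A Python sketch, with θ, the intensities, and the value differences all invented for illustration:

```python
import numpy as np
from scipy.optimize import minimize

theta = 0.8
eta = np.array([0.5, 1.2])        # jump intensities to two targets (illustrative)
dV = np.array([0.3, -0.6])        # V(y) - V(x) at those targets (illustrative)

def problem_a(h):
    # theta * (jump part of the entropy (16.20)) + jump contribution to G(g,h)V
    entropy = np.sum((1.0 - h + h * np.log(h)) * eta)
    return theta * entropy + np.sum(h * dV * eta)

res = minimize(problem_a, x0=np.ones(2), bounds=[(1e-8, None)] * 2)
h_hat = np.exp(-dV / theta)                        # (16.22a): exp([V(x) - V(y)]/theta)
J_closed = -theta * np.sum(eta * (h_hat - 1.0))    # (16.22b) for a pure jump model
```

The numerical minimizer and its optimized value agree with ĥ and J(V) up to optimizer tolerance.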
16.5.4. Risk-sensitivity as an alternative interpretation

In light of Theorem 16.1, our modified version of Lyapunov equation (16.18) is
\[
\delta V(x) = \min_{(g,h)\in\mathcal{P}} U[x, i(x)] + \theta \mathcal{E}(g,h) + G(g,h)V(x)
= U[x, i(x)] - \theta\, \frac{G[\exp(-V/\theta)](x)}{\exp[-V(x)/\theta]}. \tag{16.23}
\]
If we ignore the minimization prompted by fear of model misspecification and instead simply start with that modified Lyapunov equation as a description of preferences, then replacing GV in the Lyapunov equation (16.18) by −θ(G[exp(−V/θ)]/exp(−V/θ)) can be interpreted as adjusting the continuation value for risk. For undiscounted problems, the connection between risk-sensitivity and robustness is developed in a literature on risk-sensitive control (e.g. see James (1992) and Runolfsson (1994)). Hansen and Sargent's (1995) recursive formulation of risk sensitivity accommodates discounting. The connection between the robustness and the risk-sensitivity interpretations is most evident when G = G_d, so that x is a diffusion. Then
\[
-\theta\, \frac{G_d[\exp(-V/\theta)]}{\exp(-V/\theta)} = G_d(V) - \frac{1}{2\theta} \left( \frac{\partial V}{\partial x} \right)' \Sigma \Sigma' \left( \frac{\partial V}{\partial x} \right).
\]
In this case, (16.23) is a partial differential equation. Notice that −1/(2θ) scales (∂V/∂x)′ΣΣ′(∂V/∂x), the local variance of the value function process {V(x_t)}. The interpretation of (16.23) under risk-sensitive preferences would be that the decision maker is concerned not about robustness but about both the local mean and the local variance of the continuation value process. The parameter θ is inversely related to the size of the risk adjustment: larger values of θ imply a smaller concern about risk. The term 1/θ is the so-called risk sensitivity parameter. Runolfsson (1994) deduced the δ = 0 (ergodic control) counterpart to (16.23) to obtain a robust interpretation of risk sensitivity. Partial differential equation (16.23) is also a special case of the equation system that Duffie and Epstein (1992), Duffie and Lions (1992), and Schroder and Skiadas (1999) have analyzed for stochastic differential utility. They showed that for diffusion models, the recursive utility generalization introduces a variance multiplier that can be state dependent. The counterpart to this multiplier in our setup is state independent and equal to the risk sensitivity parameter 1/θ. For a robust decision maker, this variance multiplier restrains entropy between the approximating and alternative models. The mathematical connections between robustness, on the one hand, and risk sensitivity and recursive utility, on the other, let us draw on a set of analytical results from those literatures.17
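The displayed identity can be verified numerically for a scalar diffusion by applying a finite-difference version of G_d to exp(−V/θ). The drift, diffusion coefficient, and candidate value function in this Python sketch are all invented:

```python
import numpy as np

theta, mu, sigma = 2.0, -0.5, 0.4      # illustrative coefficients
x = np.linspace(-1.0, 1.0, 20001)
dx = x[1] - x[0]

def Gd(f):
    # generator of dx_t = mu*x_t dt + sigma dB_t on the grid (central differences)
    fp = np.gradient(f, dx)
    return mu * x * fp + 0.5 * sigma**2 * np.gradient(fp, dx)

V = np.cos(x)                          # a smooth stand-in value function
lhs = -theta * Gd(np.exp(-V / theta)) / np.exp(-V / theta)
rhs = Gd(V) - (sigma**2 / (2 * theta)) * np.gradient(V, dx)**2
```

Away from the grid boundary the two sides agree to finite-difference accuracy, which is the risk-sensitive reading of (16.23): a local mean minus a scaled local variance of the continuation value.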
16.5.5. The θ-constrained worst-case model

Given a value function, Theorem 16.1 reports the formulas for the distortions (ĝ, ĥ) of a worst-case model used to enforce robustness. This worst-case model is Markov and depicted in terms of the value function. The theorem thus gives us a generator Ĝ and shows us how to fill out the second row of Table 16.1. In fact, a separate argument is needed to show formally that Ĝ does in fact generate a Feller process or, more generally, a Markov process. There is a host of alternative sufficient conditions in the probability theory literature. Kunita (1969) gives one of the more general treatments of this problem and goes outside the realm of Feller semigroups. Also, Ethier and Kurtz (1985: Chapter 8) give some sufficient conditions for operators to generate Feller semigroups, including restrictions on the jump component of the operator. Using the Theorem 16.1 characterization of Ĝ, we can apply Theorem 16.3 to obtain the generator of a detection semigroup that measures the statistical discrepancy between the approximating model and the worst-case model.

16.5.6. An alternative entropy constraint

We briefly consider an alternative but closely related way to compute worst-case models and to enforce robustness. In particular, we consider:

Problem B
\[
J^*(V) = \inf_{(g,h)\in\mathcal{P},\; \mathcal{E}(g,h)\le \epsilon} G(g,h)V. \tag{16.24}
\]
This problem has the same solution as that given by Problem A, except that θ must now be chosen so that the relative entropy constraint is satisfied; that is, θ should be chosen so that the minimizing (ĝ, ĥ) satisfies the constraint. The resulting θ will typically depend on x. The optimized objective must now be adjusted to remove the penalty:
\[
J^*(V) = J(V) - \theta \mathcal{E}^* = \frac{G[V \exp(-V/\theta)] - V\, G[\exp(-V/\theta)]}{\exp(-V/\theta)},
\]
which follows from (16.22c). These formulas simplify greatly when the approximating model is a diffusion. Then θ satisfies
\[
\theta^2 = \frac{1}{2\epsilon} \left( \frac{\partial V(x)}{\partial x} \right)' \Sigma(x) \Sigma(x)' \left( \frac{\partial V(x)}{\partial x} \right).
\]
This formulation embeds a version of the continuous-time preference order that Chen and Epstein (2001) proposed to capture uncertainty aversion. We had also suggested the diffusion version of this robust adjustment in our earlier paper (Anderson et al., 1998).
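In the diffusion case the mapping from the entropy budget to the multiplier is immediate. A Python sketch with an invented Σ, value-function gradient, and budget ε:

```python
import numpy as np

eps = 0.05                                    # entropy budget (illustrative)
Sigma = np.array([[0.2, 0.0], [0.1, 0.3]])    # diffusion coefficient (illustrative)
Vx = np.array([1.5, -0.7])                    # value-function gradient (illustrative)

quad = Vx @ Sigma @ Sigma.T @ Vx              # V_x' Sigma Sigma' V_x
theta = np.sqrt(quad / (2 * eps))             # theta implied by the constraint
g_hat = -(Sigma.T @ Vx) / theta               # worst-case drift distortion (16.22a)
attained = 0.5 * g_hat @ g_hat                # diffusion part of (16.20)
```

By construction the worst-case distortion exactly exhausts the budget, 𝓔(ĝ) = ε, which is how the constraint formulation makes θ state dependent.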
16.5.7. Enlarging the class of perturbations

In this chapter we focus on misspecifications or perturbations to an approximating Markov model that are themselves Markov models. But in HSTW, we took a more general approach and began with a family of absolutely continuous perturbations to an approximating model that is a Markov diffusion. Absolute continuity over finite intervals puts a precise structure on the perturbations, even when the Markov specification is not imposed on them. As a consequence, HSTW follow James (1992) by considering path-dependent specifications of the drift of the Brownian motion, ∫₀ᵗ g_s ds, where g_s is constructed as a general function of past x's. Given the Markov structure of this control problem, its solution can be represented as a time-invariant function of the state vector x_t that we denote ĝ_t = ĝ(x_t).

16.5.8. Adding controls to the original state equation

We now allow the generator to depend on a control vector. Consider an approximating Markov control law of the form i(x) and let the generator associated with an approximating model be G(i). For this generator, we introduce a perturbation (g, h) as before. We write the corresponding generator as G(g, h, i). To attain a robust decision rule, we use the Bellman equation for a two-player zero-sum Markov multiplier game:
\[
\delta V = \max_i \min_{(g,h)\in\mathcal{P}} U(x,i) + \theta \mathcal{E}(g,h) + G(g,h,i)V. \tag{16.25}
\]
The Bellman equation for a corresponding constraint game is:
\[
\delta V = \max_i \min_{(g,h)\in\mathcal{P}(i),\; \mathcal{E}(g,h)\le \epsilon} U(x,i) + G(g,h,i)V.
\]
Sometimes infinite-horizon counterparts to terminal conditions must be imposed on the solutions to these Bellman equations. Moreover, application of a Verification Theorem will be needed to guarantee that the implied control laws actually solve the game. Finally, these Bellman equations presume that the value function is twice continuously differentiable. It is well known that this differentiability is not always present in problems in which the diffusion matrix can be singular. In these circumstances there is typically a viscosity generalization to each of these Bellman equations with very similar structures. (See Fleming and Soner (1991) for a development of the viscosity approach to controlled Markov processes.)
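The structure of the multiplier game (16.25) can be illustrated in a stripped-down scalar setting in which the inner minimization over the drift distortion is a quadratic problem with the closed form of (16.22a), and the outer maximization over the control is one-dimensional. All payoffs and coefficients below are invented:

```python
from scipy.optimize import minimize_scalar

theta, sigma, Vx = 2.0, 0.5, 1.4      # illustrative multiplier, loading, gradient

def inner(i):
    # min over g of theta*g**2/2 + (i + sigma*g)*Vx, given the control i
    res = minimize_scalar(lambda g: theta * g**2 / 2 + (i + sigma * g) * Vx,
                          bounds=(-10, 10), method="bounded")
    return -(i - 1.0)**2 + res.fun    # concave payoff U(i) plus minimized value

outer = minimize_scalar(lambda i: -inner(i), bounds=(-10, 10), method="bounded")
i_star = outer.x                      # robust control
g_star = -sigma * Vx / theta          # closed-form inner minimizer (cf. (16.22a))
```

In this quadratic-concave setting the analytic solution is i* = 1 + V_x/2 and g* = −σV_x/θ, and the order of maximization and minimization does not matter.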
16.6. Portfolio allocation

To put some of the results of Section 16.5 to work, we now consider a robust portfolio problem. In Section 16.7 we will use this problem to exhibit how asset prices can be deduced from the shadow prices of a robust resource allocation problem. We depart somewhat from our previous notation and let {x_t : t ≥ 0} denote a state vector that is exogenous to the individual investor. The investor influences
the evolution of his wealth, which we denote by w_t. Thus the investor's composite state at date t is (w_t, x_t). We first consider the case in which the exogenous component of the state vector evolves as a diffusion process. Later we let it be a jump process. Combining the diffusion and jump pieces is straightforward. We focus on the formulation with the entropy penalty used in Problem (16.21), but the constraint counterpart is similar.

16.6.1. Diffusion

An investor confronts asset markets that are driven by a Brownian motion. Under an approximating model, the Brownian increment factors have date t prices given by π(x_t), and x_t evolves according to a diffusion:
\[
dx_t = \mu(x_t)\, dt + \Sigma(x_t)\, dB_t. \tag{16.26}
\]
Equivalently, the x process has a generator G_d that is a second-order differential operator with drift µ and diffusion matrix ΣΣ′. A control vector b_t entitles the investor to an instantaneous payoff b_t · dB_t with a price π(x_t) · b_t in terms of the consumption numeraire. This cost can be positive or negative. Adjusting for cost, the investment has payoff −π(x_t) · b_t dt + b_t · dB_t. There is also a market in a riskless security with an instantaneous risk-free rate ρ(x). The wealth dynamics are therefore
\[
dw_t = [w_t \rho(x_t) - \pi(x_t) \cdot b_t - c_t]\, dt + b_t \cdot dB_t, \tag{16.27}
\]
where c_t is date t consumption. The control vector is i = (b′, c)′. Only consumption enters the instantaneous utility function. By combining (16.26) and (16.27), we form the evolution for a composite Markov process. But the investor has doubts about this approximating model and wants a robust decision rule. Therefore he solves a version of game (16.25), with (16.26) and (16.27) governing the dynamics of his composite state vector (w, x). With only the diffusion component, the investor's Bellman equation is
\[
\delta V(w,x) = \max_{(c,b)} \min_g U(c) + \theta \mathcal{E}(g) + G(g,b,c)V,
\]
where G(g, b, c) is constructed using the drift vector
\[
\begin{bmatrix} \mu(x) + \Sigma(x) g \\ w\rho(x) - \pi(x) \cdot b - c + b \cdot g \end{bmatrix}
\]
and the diffusion coefficient matrix
\[
\begin{bmatrix} \Sigma(x) \\ b' \end{bmatrix}.
\]
The choice of the worst-case shock g satisfies the first-order condition:
\[
\theta g + V_w b + \Sigma(x)' V_x = 0, \tag{16.28}
\]
where V_w = ∂V/∂w and similarly for V_x. Solving (16.28) for g gives a special case of the formula in (16.22a). The resulting worst-case shock would depend on the control vector b. In what follows we seek a solution that does not depend on b. The first-order condition for consumption is V_w(w, x) = U_c(c), and the first-order condition for the risk allocation vector b is
\[
-V_w \pi + V_{ww} b + \Sigma' V_{xw} + V_w g = 0. \tag{16.29}
\]
In the limiting case in which the robustness penalty parameter is set to ∞, we obtain the familiar result that
\[
b = \frac{\pi V_w - \Sigma' V_{xw}}{V_{ww}},
\]
in which the portfolio allocation rule has a contribution from risk aversion measured by −V_w/(w V_{ww}) and a hedging demand contributed by the dynamics of the exogenous forcing process x. Take the Markov perfect equilibrium of the relevant version of game (16.25). Provided that V_{ww} is negative, the same equilibrium decision rules prevail no matter whether one player or the other chooses first, or whether they choose simultaneously. The first-order conditions (16.28) and (16.29) are linear in b and g. Solving these two linear equations gives the control laws for b and g as functions of the composite state (w, x):
\[
\hat b = \frac{\theta \pi V_w - \theta \Sigma' V_{xw} + V_w \Sigma' V_x}{\theta V_{ww} - (V_w)^2}, \qquad
\hat g = \frac{V_w \Sigma' V_{xw} - (V_w)^2 \pi - V_{ww} \Sigma' V_x}{\theta V_{ww} - (V_w)^2}. \tag{16.30}
\]
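Because (16.28) and (16.29) are linear in (b, g), the robust portfolio rule can also be obtained by stacking them into a single linear system; the result should reproduce the closed forms in (16.30). A Python sketch in which the derivatives of V and the price vectors are invented numbers:

```python
import numpy as np

theta, Vw, Vww = 5.0, 1.0, -2.0           # illustrative scalars
pi = np.array([0.4, 0.1])                 # Brownian factor prices (illustrative)
SVx = np.array([0.3, -0.2])               # Sigma' V_x (illustrative)
SVxw = np.array([0.05, 0.15])             # Sigma' V_xw (illustrative)

n = len(pi)
# Rows: (16.28)  Vw*b + theta*g = -Sigma'V_x
#       (16.29)  Vww*b + Vw*g  =  Vw*pi - Sigma'V_xw
A = np.block([[Vw * np.eye(n), theta * np.eye(n)],
              [Vww * np.eye(n), Vw * np.eye(n)]])
rhs = np.concatenate([-SVx, Vw * pi - SVxw])
b_hat, g_hat = np.split(np.linalg.solve(A, rhs), 2)

D = theta * Vww - Vw**2
b_closed = (theta * Vw * pi - theta * SVxw + Vw * SVx) / D      # (16.30)
g_closed = (Vw * SVxw - Vw**2 * pi - Vww * SVx) / D             # (16.30)
```

The direct solve and the closed forms coincide whenever the denominator θV_ww − (V_w)² is nonzero, which holds here because V_ww < 0.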
Notice how the robustness penalty adds terms to the numerator and denominator of the portfolio allocation rule. Of course, the value function V also changes when we introduce θ. Notice also that (16.30) gives decision rules of the form
\[
\hat b = \hat b(w, x), \qquad \hat g = \hat g(w, x), \tag{16.31}
\]
and in particular how the worst-case shock g feeds back on the consumer's endogenous state variable w. Permitting g to depend on w expands the kinds of misspecifications that the consumer considers.

16.6.1.1. Related formulations

So far we have studied portfolio choice in the case of a constant robustness parameter θ. Maenhout (2001) considers portfolio problems in which the robustness
penalty depends on the continuation value. In his case, the preference for robustness is designed so that asset demands are not sensitive to wealth levels, as they are in constant-θ formulations. Lei (2000) uses the instantaneous constraint formulation of robustness described in Section 16.5.6 to investigate portfolio choice. His formulation also makes θ state dependent, since θ now formally plays the role of a Lagrange multiplier that restricts conditional entropy at every instant. Lei specifically considers the case of incomplete asset markets in which the counterpart to b has a lower dimension than the Brownian motion.

16.6.1.2. Ex post Bayesian interpretation

While the dependence of ĝ on the endogenous state w seems reasonable as a way to enforce robustness, it can be unattractive if we wish to interpret the implied worst-case model as one with misspecified exogenous dynamics. It is sometimes asked whether a prescribed decision rule can be rationalized as being optimal for some set of beliefs, and then to find what those beliefs must be. The dependence of the shock distributions on an endogenous state variable such as wealth w might be regarded as a peculiar set of beliefs, because it is egotistical to let an adverse nature feed back on personal state variables. But there is a way to make this feature more acceptable. It requires using a dynamic counterpart to an argument of Blackwell and Girshick (1954). We can produce a different representation of the solution to the decision problem by forming an exogenous state vector W that conforms to the Markov perfect equilibrium of the game. We can confront a decision maker with this law of motion for the exogenous state vector, have him not be concerned with robustness against misspecification of this law by setting θ = ∞, and pose an ordinary decision problem in which the decision maker has a unique model. We initialize the exogenous state at W_0 = w_0.
The optimal decision processes for {(b_t, c_t)} (but not the control laws) will be identical for this decision problem and for game (16.25) (see HSTW). It can be said that this alternative problem gives a Bayesian rationale for the robust decision procedure.

16.6.2. Jumps

Suppose now that the exogenous state vector {x_t} evolves according to a Markov jump process with jump measure η. To accommodate portfolio allocation, introduce the choice of a function a that specifies how wealth changes when a jump takes place. Consider an investor who faces asset markets with date-state Arrow security prices given by Π(y, x_t), where {x_t} is an exogenous state vector with jump dynamics. In particular, a choice a with instantaneous payoff a(y) if the state jumps to y has a price ∫Π(y, x_t)a(y)η(dy|x_t) in terms of the consumption numeraire. This cost can be positive or negative. When a jump does not take place, wealth evolves according to
\[
dw_t = \left[ \rho(x_{t-}) w_{t-} - \int \Pi(y, x_{t-})\, a(y)\, \eta(dy|x_{t-}) - c_{t-} \right] dt,
\]
where ρ(x) is the risk-free rate given state x and, for any variable z, z_{t−} = lim_{τ↑t} z_τ. If the state x jumps to y at date t, the new wealth is a(y). The Bellman equation for this problem is
\[
\delta V(w,x) = \max_{c,a} \min_{h} \; U(c) + V_w(w,x)\left[ \rho(x)w - \int \Pi(y,x)\, a(y)\, \eta(dy|x) - c \right]
\]
\[
{} + \theta \int [1 - h(y,x) + h(y,x)\log h(y,x)]\, \eta(dy|x)
+ \int h(y,x)\bigl( V[a(y), y] - V(w,x) \bigr)\, \eta(dy|x).
\]
The first-order condition for c is the same as for the diffusion case and equates V_w to the marginal utility of consumption. The first-order condition for a requires
\[
\hat h(y,x)\, V_w[\hat a(y), y] = V_w(w,x)\, \Pi(y,x),
\]
and the first-order condition for h requires
\[
-\theta \log \hat h(y,x) = V[\hat a(y), y] - V(w,x).
\]
Solving this second condition for ĥ gives the jump counterpart to the solution asserted in Theorem 16.1. Thus the robust â satisfies:
\[
\Pi(y,x) = \frac{V_w[\hat a(y), y]}{V_w(w,x)} \exp\left[ \frac{V(w,x) - V[\hat a(y), y]}{\theta} \right].
\]
In the limiting no-concern-about-robustness case θ = ∞, ĥ is set to one. Since V_w is equated to the marginal utility of consumption, the first-order condition for a equates the marginal rate of substitution of consumption before and after the jump to the price Π(y, x). Introducing robustness scales the price by the jump distribution distortion. In this portrayal, the worst-case h depends on the endogenous state w, but it is again possible to obtain an alternative representation of the probability distortion that would give an ex post Bayesian justification for the decision process for a.
16.7. Pricing risky claims

By building on findings of Hansen and Scheinkman (2002), we now consider a third semigroup that is used to price risky claims. We denote this semigroup by {P_t : t ≥ 0}, where P_tφ assigns a price at date zero to a date t payoff φ(x_t). That pricing can be described by a semigroup follows from the Law of Iterated Values: a date t state-date claim φ(x_t) can be replicated by first buying a claim to P_τφ(x_{t−τ})
and then at time t − τ buying a claim to φ(x_t). Like our other semigroups, this one has a generator, say Ḡ, that we write as in (16.10):
\[
\bar G\phi = -\bar\rho \phi + \bar\mu \cdot \frac{\partial \phi}{\partial x} + \frac{1}{2}\, \mathrm{trace}\left( \bar\Sigma \bar\Sigma' \frac{\partial^2 \phi}{\partial x\, \partial x'} \right) + \bar N \phi,
\]
where
\[
\bar N \phi = \int [\phi(y) - \phi(x)]\, \bar\eta(dy|x).
\]
The coefficient on the level term, ρ̄, is the instantaneous riskless yield to be given in formula (16.34). It is used to price locally riskless claims. Taken together, the remaining terms,
\[
\bar\mu \cdot \frac{\partial \phi}{\partial x} + \frac{1}{2}\, \mathrm{trace}\left( \bar\Sigma \bar\Sigma' \frac{\partial^2 \phi}{\partial x\, \partial x'} \right) + \bar N \phi,
\]
comprise the generator of the so-called risk neutral probabilities. The risk neutral evolution is Markov. As discussed by Hansen and Scheinkman (2002), we should expect there to be a connection between the semigroup underlying the Markov process and the semigroup that underlies pricing. Like the semigroup for Markov processes, a pricing semigroup is positive: it assigns nonnegative prices to nonnegative functions of the Markov state. We can thus relate the semigroups by importing the measure-theoretic notion of equivalence. Prices of contingent claims that pay off only on probability measure zero events should be zero. Conversely, when the price of a contingent claim is zero, the event associated with that claim should occur only with measure zero; this states the principle of no-arbitrage. We can capture these properties by specifying that the generator Ḡ of the pricing semigroup satisfies:
\[
\bar\mu(x) = \mu(x) + \Sigma(x)\bar\pi(x), \qquad
\bar\Sigma(x) = \Sigma(x), \qquad
\bar\eta(dy|x) = \bar\Pi(y,x)\, \eta(dy|x), \tag{16.32}
\]
where Π̄ is strictly positive. Thus we construct equilibrium prices by producing a triple (ρ̄, π̄, Π̄). We now show how to construct this triple both with and without a preference for robustness.

16.7.1. Marginal rate of substitution pricing

To compute prices, we follow Lucas (1978) and focus on the consumption side of the market. While Lucas used an endowment economy, Brock (1982) showed that the essential thing in Lucas's analysis was not the pure endowment feature. Instead it was the idea of pricing assets from marginal utilities that are evaluated at a candidate equilibrium consumption process that can be computed prior to computing prices. In contrast to Brock, we use a robust planning problem to generate a
candidate equilibrium allocation. As in Breeden (1979), we use a continuous-time formulation that provides simplicity along some dimensions.18

16.7.2. Pricing without a concern for robustness

First consider the case in which the consumer has no concern about model misspecification. Proceeding in the spirit of Lucas (1978) and Brock (1982), we can construct market prices of risk from the shadow prices of a planning problem. Following Lucas and Prescott (1971) and Prescott and Mehra (1980), we solve a representative agent planning problem to get a state process {x_t}, an associated control process {i_t}, and a marginal utility of consumption process {γ_t}. We let G* denote the generator for the state vector process that emerges when the optimal controls from the resource allocation problem with no concern for robustness are imposed. In effect, G* is the generator for the θ = ∞ robust control problem. We construct a stochastic discount factor process by evaluating the marginal rate of substitution at the proposed equilibrium consumption process:
\[
mrs_t = \exp(-\delta t)\, \frac{\gamma(x_t)}{\gamma(x_0)},
\]
where γ(x) denotes the marginal utility of consumption as a function of the state x. Without a preference for robustness, the pricing semigroup satisfies
\[
P_t \phi(x) = E^*[mrs_t\, \phi(x_t) \mid x_0 = x], \tag{16.33}
\]
where the expectation operator E* is the one implied by G*. Individuals solve a version of the portfolio problem described in Section 16.6 without a concern for robustness. This supports the following representation of the generator for the equilibrium pricing semigroup P_t:
\[
\bar\rho = -\frac{G^* \gamma}{\gamma} + \delta, \qquad
\bar\mu = \mu^* + \Sigma^* \bar\pi = \mu^* + \Sigma^* \Sigma^{*\prime}\, \frac{\partial \log \gamma}{\partial x}, \qquad
\bar\eta(dy|x) = \bar\Pi(y,x)\, \eta^*(dy|x) = \frac{\gamma(y)}{\gamma(x)}\, \eta^*(dy|x). \tag{16.34}
\]
These are the usual rational expectations risk prices. The risk-free rate is the subjective rate of discount reduced by the local mean of the equilibrium marginal utility process, scaled by the marginal utility. The vector π̄ of Brownian motion risk prices contains the weights on the Brownian increment in the evolution of the marginal utility of consumption, again scaled by the marginal utility. Finally, the jump risk prices Π̄ are given by the equilibrium marginal rate of substitution between consumption before and after a jump.
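For a finite-state Markov chain, the recipe in (16.32) and (16.34) can be checked end to end: building the pricing generator from (ρ̄, Π̄) and exponentiating it must reproduce marginal rate of substitution pricing (16.33). A Python sketch in which the intensities, marginal utilities, and payoff are all invented:

```python
import numpy as np
from scipy.linalg import expm

delta, t = 0.05, 1.5
eta = np.array([[0.0, 0.6, 0.4],
                [0.3, 0.0, 0.7],
                [0.5, 0.5, 0.0]])         # jump intensities under G* (illustrative)
gamma = np.array([1.0, 0.7, 1.3])         # marginal utility by state (illustrative)

G = eta.copy()
np.fill_diagonal(G, -eta.sum(axis=1))     # generator of the state process

rho_bar = delta - (G @ gamma) / gamma             # riskless yield, as in (16.34)
Pi_bar = gamma[None, :] / gamma[:, None]          # jump risk prices gamma(y)/gamma(x)
Q = eta * Pi_bar                                  # distorted intensities, as in (16.32)
np.fill_diagonal(Q, -Q.sum(axis=1))
A = Q - np.diag(rho_bar)                          # generator of the pricing semigroup

phi = np.array([1.0, 2.0, 0.5])                   # date-t payoff (illustrative)
price_semigroup = expm(t * A) @ phi
price_mrs = np.exp(-delta * t) * (expm(t * G) @ (gamma * phi)) / gamma
```

The two price vectors coincide because the distorted generator is a similarity transform of G minus the discount rate, which is exactly the equivalence between (16.33) and (16.34).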
16.7.3. Pricing with a concern for robustness under the worst-case model

As in our previous analysis, let G denote the approximating model. This is the model that emerges after imposing the robust control law î while assuming that there is no model misspecification (g = 0 and h = 1). It differs from G*, which also assumes no model misspecification but instead imposes a rule derived without any preference for robustness. But simply attributing the beliefs G to private agents in (16.34) will not give us the correct equilibrium prices when there is a preference for robustness. Let Ĝ denote the worst-case model that emerges as part of the Markov perfect equilibrium of the two-player, zero-sum game. However, formula (16.34) will yield the correct equilibrium prices if we in effect impute to the individual agents the worst-case generator Ĝ instead of G* as their model of state evolution when making their decisions without any concerns about its possible misspecification. To substantiate this claim, we consider individual decision makers who, when choosing their portfolios, use the worst-case model Ĝ as if it were correct (i.e. they have no concern about the misspecification of that model, so that rather than entertaining a family of models, the individuals commit to the worst-case Ĝ as a model of the state vector {x_t : t ≥ 0}). The pricing semigroup then becomes
\[
P_t \phi(x) = \hat E[mrs_t\, \phi(x_t) \mid x_0 = x], \tag{16.35}
\]
where Ê denotes the mathematical expectation with respect to the distorted measure described by the generator Ĝ. The generator for this pricing semigroup is parameterized by
\[
\bar\rho = -\frac{\hat G \gamma}{\gamma} + \delta, \qquad
\bar\mu = \hat\mu + \Sigma \bar g = \hat\mu + \Sigma \Sigma'\, \frac{\partial \log \gamma}{\partial x}, \qquad
\bar\eta(dy|x) = \bar h(y,x)\, \hat\eta(dy|x) = \frac{\gamma(y)}{\gamma(x)}\, \hat\eta(dy|x). \tag{16.36}
\]
As in subsection 16.7.2, γ(x) is the marginal utility of consumption, except that it is evaluated at the solution of the robust planning problem. Individuals solve the portfolio problem described in Section 16.6 using the worst-case model of the state {x_t}, with pricing functions π̄ = ḡ and Π̄ = h̄ specified relative to the worst-case model. We refer to ḡ and h̄ as risk prices because they are equilibrium prices that emerge from an economy in which individual agents use the worst-case model as if it were the correct model to assess risk. The vector ḡ contains the so-called factor risk prices associated with the vector of Brownian motion increments. Similarly, h̄ prices jump risk. Comparison of (16.34) and (16.36) shows that the formulas for factor risk prices and the risk-free rate are identical except that we have used the distorted generator Ĝ
in place of G*. This comparison shows that we can use standard characterizations of asset pricing formulas if we simply replace the generator for the approximating model G with the distorted generator Ĝ.19

16.7.4. Pricing under the approximating model

There is another portrayal of prices that uses the approximating model G as a reference point, provides a vehicle for defining model uncertainty prices, and distinguishes between the contributions of risk and model uncertainty. The ḡ and h̄ from subsection 16.7.3 give the risk components. We now use the discrepancy between G and Ĝ to produce the model uncertainty prices. To formulate model uncertainty prices, we consider how prices can be represented under the approximating model when the consumer has a preference for robustness. We want to represent the pricing semigroup as
\[
P_t \phi(x) = E[(mrs_t)(mpu_t)\, \phi(x_t) \mid x_0 = x], \tag{16.37}
\]
where mpu_t is a multiplicative adjustment to the marginal rate of substitution that allows us to evaluate the conditional expectation with respect to the approximating model rather than the distorted model. Instead of (16.34), to attain (16.37), we portray the drift and jump distortion in the generator for the pricing semigroup as
\[
\bar\mu = \hat\mu + \Sigma \bar g = \mu + \Sigma(\bar g + \hat g), \qquad
\bar\eta(dy|x) = \bar h(y,x)\, \hat\eta(dy|x) = \bar h(y,x)\, \hat h(y,x)\, \eta(dy|x).
\]
Changing expectation operators in depicting the pricing semigroup will not change the instantaneous risk-free yield. Thus from Theorem 16.1 we have:

Theorem 16.2. Let V^p be the value function for the robust resource allocation problem. Suppose that (i) V^p is in C̃² and (ii) ∫ exp[−V^p(y)/θ] η(dy|x) < ∞ for all x. Moreover, γ is assumed to be in the domain of the extended generator Ĝ. Then the equilibrium prices can be represented by:
\[
\bar\rho = -\frac{\hat G \gamma}{\gamma} + \delta,
\]
\[
\bar\pi(x) = \Sigma(x)' \left[ -\frac{1}{\theta} V_x^p(x) + \frac{\gamma_x(x)}{\gamma(x)} \right] = \hat g(x) + \bar g(x),
\]
\[
\log \bar\Pi(y,x) = -\frac{1}{\theta}\, [V^p(y) - V^p(x)] + \log \gamma(y) - \log \gamma(x) = \log \hat h(y,x) + \log \bar h(y,x).
\]
This theorem follows directly from the relation between G and Ĝ given in Theorem 16.1 and from the risk prices of subsection 16.7.3. It supplies the third row of Table 16.1.
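The decomposition in Theorem 16.2 is linear in the two gradients, which a short Python sketch makes concrete; θ, Σ, the planner's value-function gradient, and the marginal utility terms are all invented numbers:

```python
import numpy as np

theta = 4.0
Sigma = np.array([[0.25, 0.0], [0.05, 0.3]])   # diffusion coefficient (illustrative)
Vp_x = np.array([2.0, -1.0])                   # gradient of V^p (illustrative)
gamma, gamma_x = 0.8, np.array([-0.4, 0.1])    # marginal utility and its gradient

g_hat = -(Sigma.T @ Vp_x) / theta              # model uncertainty contribution
g_bar = Sigma.T @ (gamma_x / gamma)            # risk contribution
pi_bar = Sigma.T @ (-Vp_x / theta + gamma_x / gamma)   # Theorem 16.2
```

As θ → ∞ the uncertainty component ĝ vanishes and π̄ collapses to the rational expectations risk prices of (16.34).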
16.7.5. Model uncertainty prices: diffusion and jump components

We have already interpreted ḡ and h̄ as risk prices. Thus we view ĝ = −(1/θ)Σ′V_x^p as the contribution to the Brownian exposure prices that comes from model uncertainty. Similarly, we think of ĥ(y, x) = exp{−[V^p(y) − V^p(x)]/θ} as the model uncertainty contribution to the jump exposure prices. HST obtained the additive decomposition for the Brownian motion exposure asserted in Theorem 16.2 as an approximation for linear-quadratic, Gaussian resource allocation problems. By studying continuous-time diffusion models we have been able to sharpen their results and relax the linear-quadratic specification of constraints and preferences.
16.7.6. Subtleties about decentralization

In Hansen and Sargent (2003), we confirm that the solution of a robust planning problem can be decentralized with households who also solve robust decision problems while facing the state-date prices that we derived above. We confront the household with a recursive representation of state-date prices, give the household the same robustness parameter θ as the planner, and allow the household to choose a new worst-case model. The recursive representation of the state-date prices is portrayed in terms of the state vector X for the planning problem. As in the portfolio problems of Section 16.6, among the household's state variables is its endogenously determined financial wealth, w. In equilibrium, the household's wealth can be expressed as a function of the state vector X of the planner. However, in posing the household's problem, it is necessary to include both wealth w and the state vector X that propels the state-date prices as distinct components of the household's state. More generally, it is necessary to include both economy-wide and individual versions of household capital stocks and physical capital stocks in the household's state vector, where the economy-wide components are used to provide a recursive representation of the date-state prices. Thus the controls and the worst-case shocks chosen by the planner, on the one hand, and the households in the decentralized economy, on the other hand, will depend on different state vectors. However, in a competitive equilibrium, the decisions that emerge from these distinct rules will be perfectly aligned. That is, if we take the decision rules of the household in the decentralized economy and impose the equilibrium conditions requiring that "the representative agent be representative," then the decisions and the motion of the state will match. The worst-case models will also match.
In addition, although the worst-case models depend on different state variables, they coincide along an equilibrium path.

16.7.7. Ex post Bayesian equilibrium interpretation of robustness

In a decentralized economy, Hansen and Sargent (2003) also confirm that it is possible to compute robust decision rules for both the planner and the household
A quartet of semigroups for model specification
by (a) endowing each such decision maker with his own worst-case model, and (b) having each solve his decision problem without a preference for robustness, while treating those worst-case models as if they were true. Ex post it is possible to interpret the decisions made by a robust decision maker who has a concern about the misspecification of his model as also being made by an equivalent decision maker who has no concern about the misspecification of a different model that can be constructed from the worst-case model that is computed by the robust decision maker. Hansen and Sargent’s (2003) results thus extend results of HSTW, discussed in Section 16.6.1.2, to a setting where both a planner and a representative household choose worst-case models, and where their worst-case models turn out to be aligned.
16.8. Statistical discrimination

A weakness in what we have achieved up to now is that we have provided the practitioner with no guidance on how to calibrate our model uncertainty premia of Theorem 16.2 or, what formulas (16.22a) tell us is virtually the same thing, the decision maker's robustness parameter θ. It is at this critical point that our fourth semigroup enters the picture.20 Our fourth semigroup governs bounds on detection statistics that we can use to guide our thinking about how to calibrate a concern about robustness. We shall synthesize this semigroup from the objects in two other semigroups that represent alternative models that we want to choose between given a finite data record. We apply the bounds associated with distinguishing between the decision maker's approximating and worst-case models. In designing a robust decision rule, we assume that our decision maker worries about alternative models that available time series data cannot readily dispose of. Therefore, we study a stylized model selection problem. Suppose that a decision maker chooses between two models that we will refer to as zero and one. Both are continuous-time Markov process models. We construct a measure of how much time series data are needed to distinguish these models and then use it to calibrate our robustness parameter θ. Our statistical discrepancy measure is the same one that in Section 16.5 we used to adjust continuation values in a dynamic programming problem that is designed to acknowledge concern about model misspecification.
16.8.1. Measurement and prior probabilities

We assume that there are direct measurements of the state vector {x_t : 0 ≤ t ≤ N} and aim to discriminate between two Markov models: model zero and model one. We assign prior probabilities of one-half to each model. If we choose the model with the maximum posterior probability, two types of errors are possible: choosing model zero when model one is correct and choosing model one when model zero is correct. We weight these errors by the prior probabilities and, following Chernoff (1952), study the error probabilities as the sample interval becomes large.
16.8.2. A semigroup formulation of bounds on error probabilities

We evade the difficult problem of precisely calculating error probabilities for nonlinear Markov processes and instead seek bounds on those error probabilities. To compute those bounds, we adapt Chernoff's (1952) large deviation bounds to discriminate between Markov processes. Large deviation tools apply here because the two types of error both get small as the sample size increases. Let G^0 denote the generator for Markov model zero and G^1 the generator for Markov model one. Both can be represented as in (16.13).

16.8.2.1. Discrimination in discrete time

Before developing results in continuous time, we discuss discrimination between two Markov models in discrete time. Associated with each Markov process is a family of transition probabilities. For any interval τ, these transition probabilities are mutually absolutely continuous when restricted to some event that has positive probability under both probability measures. If no such event existed, then the probability distributions would be orthogonal, making statistical discrimination easy. Let p_τ(y|x) denote the ratio of the transition density over a time interval τ of model one relative to that for model zero. We include the possibility that p_τ(y|x) integrates to a magnitude less than one using the model zero transition probability distribution. This would occur if the model one transition distribution assigned positive probability to an event that has measure zero under model zero. We also allow the density p_τ to be zero with positive model zero transition probability. If discrete time data were available, say x_0, x_τ, x_{2τ}, . . . , x_{Tτ} where N = Tτ, then we could form the log likelihood ratio:

    ℓ_N^τ = ∑_{j=1}^T log p_τ(x_{jτ}, x_{(j−1)τ}).
Model one is selected when

    ℓ_N^τ > 0,    (16.38)

and model zero is selected otherwise. The probability of making a classification error at date zero conditioned on model zero is

    Pr{ℓ_N^τ > 0 | x_0 = x, model 0} = E(1_{ℓ_N^τ > 0} | x_0 = x, model 0).

It is convenient that the probability of making a classification error conditioned on model one can also be computed as an expectation of a transformed random variable conditioned on model zero. Thus,

    Pr{ℓ_N^τ < 0 | x_0 = x, model 1} = E[1_{ℓ_N^τ < 0} | x_0 = x, model 1]
                                     = E[exp(ℓ_N^τ) 1_{ℓ_N^τ < 0} | x_0 = x, model 0].

The second equality follows because multiplication of the indicator function by the likelihood ratio exp(ℓ_N^τ) converts the conditioning model from one to zero.
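To fix ideas, the selection rule (16.38) and the two error types can be simulated directly. The sketch below uses two hypothetical i.i.d. Gaussian models for the increments (the means, variance, sample length, and trial count are illustrative choices, not taken from the chapter) and compares the Monte Carlo average error with the Chernoff-style bound at α = 1/2 that the text develops next:

```python
import math
import random

random.seed(0)

# Two hypothetical scalar models for increments over an interval tau = 1:
# model 0: N(mu0, sig^2), model 1: N(mu1, sig^2).
mu0, mu1, sig = 0.0, 0.3, 1.0
T = 50          # number of observed increments
trials = 5000   # Monte Carlo replications per true model

def log_lik_ratio(xs):
    """Log likelihood of model one relative to model zero for i.i.d. normals."""
    # log p1(x) - log p0(x) = [(x - mu0)^2 - (x - mu1)^2] / (2 sig^2)
    return sum(((x - mu0) ** 2 - (x - mu1) ** 2) / (2 * sig ** 2) for x in xs)

def error_rate(true_mu):
    """Frequency of misclassification when data come from the model with mean true_mu."""
    errors = 0
    for _ in range(trials):
        xs = [random.gauss(true_mu, sig) for _ in range(T)]
        picked_one = log_lik_ratio(xs) > 0   # selection rule (16.38)
        truth_is_one = (true_mu == mu1)
        if picked_one != truth_is_one:
            errors += 1
    return errors / trials

p0 = error_rate(mu0)   # choose model one although model zero is true
p1 = error_rate(mu1)   # choose model zero although model one is true
avg = 0.5 * (p0 + p1)  # average error with equal prior probabilities

# Chernoff-style bound at alpha = 1/2 for i.i.d. Gaussian increments:
# av error <= 0.5 * exp(-T * (mu1 - mu0)^2 / (8 sig^2)).
bound = 0.5 * math.exp(-T * (mu1 - mu0) ** 2 / (8 * sig ** 2))
print(avg, bound)
```

With these illustrative numbers the simulated average error sits comfortably below the bound, anticipating the looseness of the bound documented in Table 16.2.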
Figure 16.1 Graph of min{exp(r), 1} and the dominating function exp(rα) for α = 0.5.
Combining these two expressions, the average error is:

    av error = (1/2) E(min{exp(ℓ_N^τ), 1} | x_0 = x, model 0).    (16.39)
Because we compute expectations only under the model zero probability measure, from now on we leave implicit the conditioning on model zero. Instead of using formula (16.39) to compute the probability of making an error, we will use a convenient upper bound originally suggested by Chernoff (1952) and refined by Hellman and Raviv (1970). To motivate the bound, note that for any 0 < α < 1 the piecewise linear function min{s, 1} is dominated by the concave function s^α and that the two functions agree at the kink point s = 1. The smooth dominating function gives rise to more tractable computations as we alter the amount of available data. Thus, setting log s = r = ℓ_N^τ and using (16.39) gives the bound:

    av error ≤ (1/2) E[exp(α ℓ_N^τ) | x_0 = x],    (16.40)
where the right side is the moment-generating function for the log-likelihood ratio ℓ_N^τ (see Figure 16.1). (Later we shall discuss how to choose α ∈ (0, 1) in order to maximize error detection rates.) Define an operator:

    K_τ^α φ(x) = E[exp(α ℓ_τ) φ(x_τ) | x_0 = x].

Then inequality (16.40) can be portrayed as:

    av error ≤ (1/2) (K_τ^α)^T 1_D(x),    (16.41)
where 1_D is again the indicator function of the state space D for the Markov process, and where superscript T on the right side denotes sequential application of an operator T times. This bound applies for any integer choice of T and any choice of α between zero and one.21 When restricted to a function space C, we have the inequalities

    |K_τ^α φ(x)| ≤ E([exp(ℓ_τ)]^α |φ(x_τ)| | x_0 = x) ≤ E[exp(ℓ_τ) |φ(x_τ)|^{1/α} | x_0 = x]^α ≤ ‖φ‖,

where the second inequality is an application of Jensen's inequality. Thus K_τ^α is a contraction on C.

16.8.3. Rates for measuring discrepancies between models locally

Classification errors become less frequent as more data become available. One common way to study the large sample behavior of classification error probabilities is to investigate the limiting behavior of the operator (K_τ^α)^T as T gets large. This amounts to studying how fast (K_τ^α)^T contracts for large T and results in a large deviation characterization. Chernoff (1952) proposed such a characterization for i.i.d. data that later researchers extended to Markov processes. Formally, a large-deviation analysis can give rise to an asymptotic rate for discriminating between models. Given the state dependence in the Markov formulation, there are two different possible notions of discrimination rates that are based on Chernoff entropy. One notion that we shall dub "long run" is state independent; to construct it requires additional assumptions. This long run rate is computed by studying the semigroup {K_t^α : t ≥ 0} as t gets large. This semigroup can have a positive eigenfunction, that is, a function φ that solves

    K_t^α φ = exp(−δ̃t) φ,    (16.42)

for some positive δ̃. When it exists, this eigenfunction dominates the remaining ones as the time horizon t gets large. As a consequence, for large t this semigroup decays approximately at an exponential rate δ̃. Therefore, δ̃ is a long run measure of the rate at which information for discriminating between two models accrues. By construction and as part of its "long run" nature, the rate δ̃ is independent of the state. In this chapter, we use another approximation that results in a state-dependent or "short-run" discrimination rate. It is this state-dependent rate that is closely linked to our robust decision rule, in the sense that it is governed by the same objects that emerge from the worst-case analysis for our robust control problem. The semigroup {K_t^α : t ≥ 0} has the same properties as a pricing semigroup, and furthermore it contracts. We can define a discrimination rate in the same way that
we define an instantaneous interest rate from a pricing semigroup. This leads us to use Chernoff entropy as ρ^α(x). It differs from the decay rate δ̃ defined by (16.42). For a given state x, it measures the statistical ability to discriminate between models when a small interval of data becomes available. When the rate ρ^α(x) is large, the time series data contain more information for discriminating between models. Before characterizing a local discrimination rate ρ^α that is applicable to continuous-time processes, we consider the following example.

16.8.3.1. Constant drift

Consider sampling a continuous multivariate Brownian motion with a constant drift. Let µ^0, Σ^0 and µ^1, Σ^1 be the drift vectors and constant diffusion matrices for models zero and one, respectively. Thus under model zero, x_{jτ} − x_{(j−1)τ} is normally distributed with mean τµ^0 and covariance matrix τΣ^0. Under an alternative model one, x_{jτ} − x_{(j−1)τ} is normally distributed with mean τµ^1 and covariance matrix τΣ^1. Suppose that Σ^0 ≠ Σ^1 and that the probability distributions implied by the two models are equivalent (i.e. mutually absolutely continuous). Equivalence will always be satisfied when Σ^0 and Σ^1 are nonsingular but will also be satisfied when the degeneracy implied by the covariance matrices coincides. It can be shown that

    lim_{τ↓0} K_τ^α 1_D < 1,
suggesting that a continuous-time limit will not result in a semigroup. Recall that a semigroup of operators must collapse to the identity when the elapsed interval becomes arbitrarily small. When the covariance matrices Σ^0 and Σ^1 differ, the detection-error bound remains positive even when the data interval becomes small. This reflects the fact that while absolute continuity is preserved for each positive τ, it is known from Cameron and Martin (1947) that the probability distributions implied by the two limiting continuous-time Brownian motions will not be mutually absolutely continuous when the covariance matrices differ. Since diffusion matrices can be inferred from high frequency data, differences in these matrices are easy to detect.22 Suppose instead that Σ^0 = Σ^1 = Σ. If µ^0 − µ^1 is not in the range of Σ, then the discrete-time transition probabilities for the two models over an interval τ are not equivalent, making the two models easy to distinguish using data. If, however, µ^0 − µ^1 is in the range of Σ, then the probability distributions are equivalent for any transition interval τ. Using a complete-the-square argument, it can be shown that

    K_τ^α φ(x) = exp(−τρ^α) ∫ φ(y) P_τ^α(y − x) dy,
where P_τ^α is a normal distribution with mean τ(1 − α)µ^0 + ταµ^1 and covariance matrix τΣ,

    ρ^α = [α(1 − α)/2] (µ^0 − µ^1)′ Σ^{−1} (µ^0 − µ^1),    (16.43)

and Σ^{−1} is a generalized inverse when Σ is singular. It is now the case that

    lim_{τ↓0} K_τ^α 1_D = 1.
The parameter ρ^α acts as a discount rate, since K_τ^α 1_D = exp(−τρ^α). The best probability bound (the largest ρ^α) is obtained by setting α = 1/2, and the resulting discount rate is referred to as Chernoff entropy. The continuous-time limit of this example is known to produce probability distributions that are absolutely continuous over any finite horizon (see Cameron and Martin, 1947). For this example, the long-run discrimination rate δ̃ and the short-run rate ρ^α coincide because ρ^α is state independent. This equivalence emerges because the underlying processes have independent increments. For more general Markov processes, this will not be true and the counterpart to the short-run rate will depend on the Markov state. The existence of a well defined long-run rate requires special assumptions.

16.8.3.2. Continuous time

There is a semigroup probability bound analogous to (16.40) that helps to understand how data are informative in discriminating between Markov process models. Suppose that we model two Markov processes as Feller semigroups. The generator of semigroup zero is

    G^0 φ = µ^0 · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N^0 φ

and the generator of semigroup one is

    G^1 φ = µ^1 · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N^1 φ.

In specifying these two semigroups, we assume identical Σ's. As in the example, this assumption is needed to preserve absolute continuity. Moreover, we require that µ^1 can be represented as:

    µ^1 = µ^0 + Λg

for some continuous function g of the Markov state, where we assume that the rank of Σ is constant on the state space and that Σ can be factored as Σ = ΛΛ′ where Λ has full rank. This is equivalent to requiring that µ^0 − µ^1 is in the range of Σ.23
In contrast to the example, however, both of the µ's and Σ can depend on the Markov state. Jump components are allowed for both processes. These two operators are restricted to imply jump probabilities that are mutually absolutely continuous for at least some nondegenerate event. We let h(·, x) denote the density function of the jump distribution of N^1 with respect to the distribution of N^0. We assume that ∫ h(y, x) η^0(dy|x) is finite for all x. Under absolute continuity we write:

    N^1 φ(x) = ∫ h(y, x)[φ(y) − φ(x)] η^0(dy|x).

Associated with these two Markov processes is a positive, contraction semigroup {K_t^α : t ≥ 0} for each α ∈ (0, 1) that can be used to bound the probability of classification errors:

    av error ≤ (1/2) K_N^α 1_D(x).
This semigroup has a generator G^α with the Feller form:

    G^α φ = −ρ^α φ + µ^α · (∂φ/∂x) + (1/2) trace(Σ ∂²φ/∂x∂x′) + N^α φ.    (16.44)

The drift µ^α is formed by taking convex combinations of the drifts for the two models,

    µ^α = (1 − α)µ^0 + αµ^1 = µ^0 + αΛg;

the diffusion matrix Σ is the common diffusion matrix for the two models, and the jump operator N^α is given by:

    N^α φ(x) = ∫ [h(y, x)]^α [φ(y) − φ(x)] η^0(dy|x).

Finally, the rate ρ^α is nonnegative and state dependent and is the sum of contributions from the diffusion and jump components:

    ρ^α(x) = ρ_d^α(x) + ρ_n^α(x).    (16.45)

The diffusion contribution

    ρ_d^α(x) = [(1 − α)α/2] g(x)′ g(x)    (16.46)

is a positive semi-definite quadratic form, and the jump contribution

    ρ_n^α(x) = ∫ ((1 − α) + α h(y, x) − [h(y, x)]^α) η^0(dy|x)    (16.47)

is positive because the tangent line to the concave function h^α at h = 1 must lie above the function.
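The decomposition (16.45) involves only elementary operations, so it can be evaluated directly for simple ingredients. The sketch below uses a hypothetical two-component g(x) and a two-point jump measure (all numbers are illustrative, not from the chapter); it checks that the jump contribution (16.47) is nonnegative and that the diffusion contribution (16.46) on its own peaks at α = 1/2:

```python
# rho_d and rho_n from (16.46)-(16.47) for a toy, state-fixed specification.

def rho_diffusion(alpha, g):
    """Diffusion contribution (16.46): (1 - alpha) * alpha / 2 * g'g."""
    return (1 - alpha) * alpha / 2 * sum(gi * gi for gi in g)

def rho_jump(alpha, h_vals, eta0_weights):
    """Jump contribution (16.47): integral of (1 - alpha) + alpha*h - h^alpha
    against the model-zero jump measure eta0, here a two-point measure."""
    return sum(w * ((1 - alpha) + alpha * h - h ** alpha)
               for h, w in zip(h_vals, eta0_weights))

g = [0.1, -0.2]       # hypothetical drift-distortion vector g(x)
h_vals = [0.5, 2.0]   # model-one jump density relative to model zero
eta0 = [0.3, 0.1]     # model-zero jump intensities at the two destinations

alphas = [k / 10 for k in range(1, 10)]
total = {a: rho_diffusion(a, g) + rho_jump(a, h_vals, eta0) for a in alphas}

best_diffusion = max(alphas, key=lambda a: rho_diffusion(a, g))
best_total = max(alphas, key=lambda a: total[a])
print(best_diffusion, best_total)
```

As the text notes, once jump components are present the total rate need not be maximized at α = 1/2, so `best_diffusion` and `best_total` can in general differ.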
Theorem 16.3. (Newman, 1973 and Newman and Stuck, 1979). The generator of the positive contraction semigroup {K_t^α : t ≥ 0} on C is given by (16.44).

Thus we can interpret the generator G^α that bounds the detection error probabilities as follows. Take model zero to be the approximating model, and model one to be some other competing model. We use 0 < α < 1 to build a mixed diffusion-jump process from the pair (g^α, h^α), where g^α ≐ αg and h^α ≐ (h)^α. Use the notation E^α to depict the associated expectation operator. Then

    av error ≤ (1/2) E^α[ exp(−∫_0^N ρ^α(x_t) dt) | x_0 = x ].    (16.48)
Of particular interest to us is formula (16.45) for ρ^α, which can be interpreted as a local statistical discrimination rate between models. In the case of two diffusion processes, this measure is a state-dependent counterpart to formula (16.43) in the example presented in Section 16.8.3.1. The diffusion component of the rate is maximized by setting α = 1/2. But when jump components are also present, α = 1/2 will not necessarily give the maximal rate. Theorem 16.3 completes the fourth row of Table 16.1.

16.8.4. Detection statistics and robustness

Formulas (16.46) and (16.47) show how the local Chernoff entropy rate is closely related to the conditional relative entropy measure that we used to formulate robust control problems. In particular, the conditional relative entropy rate in continuous time is given by

    −(dρ^α/dα)|_{α=1}.

In the case of a diffusion process, the Chernoff rate equals the conditional relative entropy rate scaled by the proportionality factor α(1 − α).24

16.8.5. Further discussion

In some ways, the statistical decision problem posed above is too simple. It entails a pairwise comparison of ex ante equally likely models and gives rise to a statistical measure of distance. That the contenders are both Markov models greatly simplifies the bounds on the probabilities of making a mistake when choosing between models. The implicit loss function that justifies model choice based on the maximal posterior probabilities is symmetric (e.g. see Chow, 1957). Finally, the detection problem compels the decision maker to select a specific model after a fixed amount of data have been gathered. Bounds like Chernoff's can be obtained when there are more than two models and also when the decision problem is extended to allow waiting for more data before making a decision (e.g. see Hellman and Raviv, 1970 and Moscarini and
Smith, 2002).25 Like our problem, these generalizations can be posed as Bayesian problems with explicit loss functions and prior probabilities. While the statistical decision problem posed here is by design too simple, we nevertheless find it useful in bounding a reasonable taste for robustness. The hypothesis of rational expectations instructs agents and the model builder to eliminate as misspecified those models that are detectable from infinite histories of data. Chernoff entropy gives us one way to extend rational expectations by asking agents to exclude specifications rejected by finite histories of data but to contemplate alternative models that are difficult to detect from finite histories of data. When Chernoff entropy is small, it is challenging to choose between competing models on purely statistical grounds.

16.8.6. Detection and plausibility

In Section 16.6.1.2, we reinterpreted the equilibrium allocation under a preference for robustness as one that instead would be chosen by a Bayesian social planner who holds fast to a particular model that differs from the approximating model. If the approximating model is actually true, then this artificial Bayesian planner has a false model of forcing processes, one that enough data should disabuse him of. However, our detection probability tools let us keep the Bayesian planner's model sufficiently close to the approximating model that more data than are available would be needed to detect that the approximating model is really better. That means that a rational expectations econometrician would have a difficult time distinguishing the forcing process under the approximating model from the Bayesian planner's model. Nevertheless, that the Bayesian planner uses such a nearby model can have important quantitative implications for decisions and/or asset prices. We demonstrate such effects on asset prices in Section 16.9.26
16.9. Entropy and the market price of uncertainty

In comparing different discrete time representative agent economies with robust decision rules, HSW computationally uncovered a connection between the market price of uncertainty and the detection error probabilities for distinguishing between a planner's approximating and worst-case models. The connection was so tight that the market price of uncertainty could be expressed as nearly the same linear function of the detection error probability, regardless of the details of the model specification.27 HSW used this fact to calibrate θ's, which differed across different models because the relationship between detection error probabilities and θ did depend on the detailed model dynamics. As emphasized in Section 16.2, the tight link that we have formally established between the semigroup for pricing under robustness and the semigroup for detection error probability bounds provides the key to understanding HSW's empirical finding, provided that the detection error probability bounds do a good enough job of describing the actual detection error probabilities. Subject to that proviso, our formal results thus provide a way of moving directly from a specification of
a sample size and a detection error probability to a prediction of the market price of uncertainty that transcends details of the model. Partly to explore the quality of the information in our detection error probability bounds, this section takes three distinct example economies and shows within each of them that the probability bound is quite informative and that consequently the links between the detection error probability and the market price of uncertainty are very close across the three different models. All three of our examples use diffusion models, so that the formulas summarized in Section 16.2 apply. Recall from formula (16.46) that for the case of a diffusion the local Chernoff rate for discriminating between the approximating model and the worst-case model is

    α(1 − α) |ĝ|²/2,    (16.49)
which is maximized by setting α = 0.5. Small values of the rate suggest that the competing models would be difficult to detect using time series statistical methods. For a diffusion, we have seen how the price vector for the Brownian increments can be decomposed as ḡ + ĝ. In the standard model without robustness, the conditional slope of a mean-standard deviation frontier is the absolute value of the factor risk price vector, |ḡ|, but with robustness it is |ḡ + ĝ|, where ĝ is the part attributable to aversion to model uncertainty. One possible statement of the equity–premium puzzle is that standard models imply a much smaller slope than is found in the data because plausible risk aversion coefficients imply a small value for ḡ. This conclusion extends beyond the comparison of stocks and bonds and is also apparent in equity return heterogeneity. See Hansen and Jagannathan (1991), Cochrane and Hansen (1992), and Cochrane (1997) for discussions. In this section we explore the potential contribution from model uncertainty. In particular, for three models we compute |ĝ|, the associated bounds on detection error probabilities, and the detection error probabilities themselves. The three models are: (1) a generic one where the worst-case drift distortion ĝ is independent of x; (2) our robust interpretation of a model of Bansal and Yaron (2000) in which ĝ is again independent of x but where its magnitude depends on whether a low frequency component of growth is present; and (3) a continuous time version of HST's equilibrium permanent income model, in which ĝ depends on x. Very similar relations between detection error probabilities and market prices of risk emerge across these three models, though they are associated with different values of θ.

16.9.1. Links among sample size, detection probabilities, and mpu when ĝ is independent of x

In this section we assume that the approximating model is a diffusion and that the worst-case model is such that ĝ is independent of the Markov state x. Without knowing anything more about the model, we can quantitatively explore the links among the market price of model uncertainty (mpu ≡ |ĝ|) and the detection error probabilities.
Table 16.2 Prices of Model Uncertainty and Detection-Error Probabilities when ĝ is independent of x, N = 200; mpu denotes the market price of model uncertainty, measured by |ĝ|. The Chernoff rate is given by (16.49)

    mpu (= |ĝ|)   Chernoff rate (α = 0.5)   Probability bound   Probability
    0.02          0.0001                    0.495               0.444
    0.04          0.0002                    0.480               0.389
    0.06          0.0004                    0.457               0.336
    0.08          0.0008                    0.426               0.286
    0.10          0.0013                    0.389               0.240
    0.12          0.0018                    0.349               0.198
    0.14          0.0025                    0.306               0.161
    0.16          0.0032                    0.264               0.129
    0.18          0.0040                    0.222               0.102
    0.20          0.0050                    0.184               0.079
    0.30          0.0113                    0.053               0.017
    0.40          0.0200                    0.009               0.002
For N = 200, Table 16.2 reports values of mpu = |ĝ| together with Chernoff entropy for α = 0.5, the associated probability-error bound (16.48), and the actual detection-error probability on the left side of (16.48) (which we can calculate analytically in this case). The probability bounds and the probabilities are computed under the simplifying assumption that the drift and diffusion coefficients are constant, as in the example in Section 16.8.3.1. With constant drift and diffusion coefficients, the log-likelihood ratio is normally distributed, which allows us easily to compute the actual detection-error probabilities.28 The numbers in Table 16.2 indicate that market prices of uncertainty somewhat less than 0.2 are associated with misspecified models that are difficult to detect. However, market prices of uncertainty of 0.40 are associated with easily detectable alternative models. The table also reveals that although the probability bounds are weak, they display patterns similar to those of the actual probabilities. Empirical estimates of the slope of the mean–standard deviation frontier are about 0.25 for quarterly data. Given the low volatility of aggregate consumption, risk considerations explain only a small component of the measured risk-return tradeoff using aggregate data (Hansen and Jagannathan, 1991). In contrast, our calculations suggest that concerns about statistically small amounts of model misspecification could account for a substantial component of the empirical estimates. The following subsections confirm that this quantitative conclusion transcends details of the specification of the approximating model.
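Because the drift and diffusion coefficients are constant in this case, every column of Table 16.2 has a closed form. The following sketch reproduces the bound and exact-probability columns; the exact error uses the normality of the log-likelihood ratio noted above (mean −N|ĝ|²/2 and variance N|ĝ|² under the approximating model, and symmetrically under the worst case):

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function via erfc."""
    return 0.5 * math.erfc(-z / math.sqrt(2))

N = 200  # sample length used in Table 16.2

def detection_stats(mpu):
    """Chernoff rate (16.49) at alpha = 1/2, the bound (16.48) with a
    constant rate, and the exact average detection error for the
    constant-drift Gaussian case of Section 16.8.3.1."""
    rate = mpu ** 2 / 8                # alpha(1 - alpha)|g_hat|^2/2 at alpha = 1/2
    bound = 0.5 * math.exp(-N * rate)  # (16.48) when rho is constant
    exact = normal_cdf(-math.sqrt(N) * mpu / 2)
    return rate, bound, exact

for mpu in (0.02, 0.10, 0.20, 0.40):
    rate, bound, exact = detection_stats(mpu)
    print(f"mpu={mpu:4.2f}  rate={rate:.4f}  bound={bound:.3f}  exact={exact:.3f}")
```

Running this recovers, for example, the mpu = 0.20 row of the table: a bound near 0.184 and an exact probability near 0.079.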
16.9.2. Low frequency growth

Using a recursive utility specification, Bansal and Yaron (2000) study how low frequency components in consumption and/or dividends become encoded in risk
premia. Here we take up Bansal and Yaron's theme that the frequency decomposition of risks matters but reinterpret their risk premia as reflecting model uncertainty. Consider a pure endowment economy where the state x1t driving the consumption endowment exp(x1t) is governed by the following process:

    dx1t = (0.0020 + 0.1154 x2t) dt + 0.0048 dBt
    dx2t = −0.0263 x2t dt + 0.0048 dBt.    (16.50)
The logarithm of the consumption endowment has a constant component of the growth rate of 0.002 and a time varying component of the growth rate, x2t; x2t has mean zero and is stationary but is highly temporally dependent. Relative to the i.i.d. specification that would be obtained by attaching a coefficient of zero on x2t in the first equation of (16.50), the inclusion of the x2t component alters the long run properties of consumption growth. We calibrated the state evolution equation (16.50) by taking a discrete-time consumption process that was fit by Bansal and Yaron and embedding it in a continuous state space model.29 We accomplished this using conversions in the MATLAB control toolbox. The impulse response of log consumption to a Brownian motion shock, which is portrayed in Figure 16.2, and the spectral density function, which is shown in Figure 16.3, both show the persistence in consumption growth. The low frequency component is a key feature in the analysis of Bansal and Yaron (2000). The impulse response function converges to its supremum from below. It takes about ten years before the impulse response function is close to its supremum. Corresponding to this behavior of the impulse response function, there is a narrow peak in the spectral density of consumption growth at frequency zero, with the spectral density being much greater than its infimum only for frequencies with periods of more than ten years. In what follows, we will also compute model uncertainty premia for an alternative economy in which the coefficient on x2t in the first equation of (16.50) is set to zero. We calibrate this economy so that the resulting spectral density for consumption growth is flat at the same level as the infimum depicted in Figure 16.3 and so that the corresponding impulse response function is also flat at the same initial response reported in Figure 16.2.
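As a back-of-the-envelope consistency check on the coefficients printed in (16.50), the sketch below computes the half-life of x2t, the long-run impulse response of log consumption to a unit Brownian shock, and the fraction of the gap to that supremum closed after ten years (the x2t volatility is taken as printed; these are implications of the printed numbers, not results from the chapter):

```python
import math

# Coefficients as printed in (16.50).
loading = 0.1154   # loading of dx1 on the persistent state x2
kappa = 0.0263     # mean reversion rate of x2 (per month)
sigma1 = 0.0048    # volatility of the consumption shock
sigma2 = 0.0048    # volatility of the x2 shock, as printed

# Half-life of the persistent component: exp(-kappa * t) = 1/2.
half_life = math.log(2) / kappa

# Long-run impulse response of log consumption to a unit Brownian shock:
# immediate effect sigma1 plus the integrated effect of the x2 response,
# sigma1 + loading * sigma2 * integral of exp(-kappa s) ds over [0, inf).
long_run = sigma1 + loading * sigma2 / kappa

# Fraction of the gap to the long-run response closed after ten years.
closed_10y = 1 - math.exp(-kappa * 120)

print(half_life, long_run, closed_10y)
```

The implied long-run response of roughly 0.026, approached to within about 95 percent after ten years, is consistent with the description of Figure 16.2 above.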
Because we have reduced the long-run variation by eliminating x2t from the first equation, we should expect to see smaller risk premia in this economy. Suppose that the instantaneous utility function is logarithmic. Then the value function implied by Theorem 16.1 is linear in x, so we can write V as V(x) = v0 + v1 x1 + v2 x2. The distortion in the Brownian motion is given by the following special case of equation (16.28):

    ĝ = −(1/θ)(0.0048 v1 + 0.01 v2),
Figure 16.2 Impulse response function for the logarithm of the consumption endowment x1t to a Brownian increment.
Figure 16.3 Spectral density function for the consumption growth rate dx1t .
and is independent of the state x. The coefficients on the vi's are the volatilities attached to dB in Equation (16.50).30 Since ĝ is constant, the worst-case model implies the same impulse response functions but different mean growth rates in consumption. Larger values of 1/θ
lead to more negative values of the drift ĝ. We have to compute this mapping numerically because the value function itself depends on θ. Figure 16.4 reports ĝ as a function of 1/θ (the ĝ's are negative). Larger values of |ĝ| imply larger values of the rate of Chernoff entropy. As in the previous example this rate is constant, and the probability bounds in Table 16.2 continue to apply to this economy.31 The instantaneous risk-free rate for this economy is:

    r_t^f = δ + 0.0020 + 0.1154 x2t + σ1 ĝ − (σ1)²/2,

where σ1 = 0.0048 is the coefficient on the Brownian motion in the evolution equation for x1t. Our calculations hold the risk-free rate fixed as we change θ. This requires that we adjust the subjective discount rate. The predictability of consumption growth emanating from the state variable x2t makes the risk-free rate vary over time.

16.9.3. Risk-sensitivity and calibration of θ

This economy has a risk-sensitive interpretation along the lines asserted by Tallarini (2000) in his analysis of business cycle models, one that makes 1/θ a risk-sensitivity parameter that imputes risk aversion to the continuation utility and makes ĝ become the incremental contribution to risk aversion contributed by this risk adjustment. Under Tallarini's risk-sensitivity interpretation, 1/θ is a risk aversion parameter, and as such is presumably fixed across environments. Figure 16.4 shows that holding θ fixed but changing the consumption endowment process changes the detection error rates. For a given θ, the implied worst-case model could be more easily detected when there is positive persistence in the endowment growth rate. On the robustness interpretation, such detection error calculations suggest that θ should not be taken to be invariant across environments. However, on the risk-sensitivity interpretation, θ should presumably be held fixed across environments. Thus while concerns about risk and robustness have similar predictions within a given environment, calibrations of θ can depend on whether it is interpreted as a risk-sensitivity parameter that is fixed across environments, or a robustness parameter to be adjusted across environments depending on how detection error probabilities differ across environments.

16.9.4. Permanent income economy

Our previous two examples make Chernoff entropy independent of the Markov state.
In our third example, we computed detection-error bounds and detection-error probabilities for the version of HST's robust permanent income model that includes habits.32 The Chernoff entropies are state dependent for this model, but the probability bounds can still be computed numerically. We used the parameter values from HST's discrete-time model to form an approximate continuous-time robust permanent income model, again using conversions in the MATLAB control toolbox. HST allowed for two independent income components when estimating
A quartet of semigroups for model specification

[Figure 16.4 appears here: the worst-case drift distortion (vertical axis) plotted against 1/θ (horizontal axis), for the dependent-growth and i.i.d.-growth economies.]
Figure 16.4 Drift distortion ĝ for the Brownian motion. This distortion is plotted as a function of 1/θ. For comparison, the drift distortions are also given for the economy in which the first difference in the logarithm of consumption is i.i.d. In generating this figure, we set the discount parameter δ so that the mean risk-free rate remained at 3.5 percent when we changed θ. In the model with temporally dependent growth the volatility of the risk-free rate is 0.38 percent. The risk-free rate is constant in the model with i.i.d. growth rates.
their model. The impulse responses for the continuous-time version of the model are depicted in Figure 16.5. The responses are to independent Brownian motion shocks. One of the processes is much more persistent than the other; that persistent process is the one that challenges a permanent-income-style saver. When we change the robustness parameter θ, we alter the subjective discount rate in a way that completely offsets the precautionary motive for saving in HST's economy and its continuous-time counterpart, so that consumption and investment profiles and real interest rates remain fixed.33 It happens that the worst-case ĝ vector is proportional to the marginal utility of consumption and is therefore highly persistent. This outcome reflects that the decision rule for the permanent income model is well designed to protect the consumer against transient fluctuations, but that it is vulnerable to model misspecifications that are highly persistent under the approximating model. Under the approximating model, the marginal utility process is a martingale, but the (constrained) worst-case model makes this process become an explosive scalar autoregression. The choice of θ determines the magnitude of the explosive autoregressive coefficient for the marginal utility process. The distortion is concentrated primarily in the persistent income component. Figure 16.6 compares the impulse response of the distorted income process to that of the income process under the approximating model. Under the distorted model there is considerably more long-run variation. Decreasing θ increases this variation.
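The martingale-versus-explosive-autoregression contrast just described is easy to visualize with a small simulation. The sketch below is illustrative only and is not HST's model: the explosive coefficient 1.005, the innovation scale, and the horizon are hypothetical values standing in for the θ-dependent coefficient discussed in the text.

```python
import random

def simulate_ar(rho, n, sigma=0.01, x0=1.0, seed=1):
    """Simulate x_{t+1} = rho * x_t + sigma * eps_{t+1} with eps ~ N(0, 1).

    rho = 1.0 mimics the approximating model, under which marginal
    utility is a martingale; rho slightly above 1 mimics the worst-case
    model's explosive scalar autoregression."""
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(n):
        x = rho * x + sigma * rng.gauss(0.0, 1.0)
        path.append(x)
    return path

# With the noise switched off, the pure drift behavior is visible:
# the martingale stays put while the worst-case process grows geometrically.
benchmark = simulate_ar(1.0, 200, sigma=0.0)     # stays at 1.0
worst_case = simulate_ar(1.005, 200, sigma=0.0)  # ends near 1.005**200, about 2.7
```

Lowering θ in the text corresponds to raising the autoregressive coefficient further above one, which amplifies the long-run divergence between the two paths.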
[Figure 16.5 appears here: impulse responses (vertical axis) over time in quarters (horizontal axis) for the two income processes.]
Figure 16.5 This figure gives the impulse response functions for the two income processes to two independent Brownian motion shocks.
[Figure 16.6 appears here: impulse responses (vertical axis) over time in quarters (horizontal axis) for the persistent income process under the two models.]
Figure 16.6 Impulse response functions for the persistent income process under the approximating model and the model for which mpu = 0.16. The more enduring response is from the distorted model. The implied risk-free rate is constant and identical across the two models.
Table 16.3 Prices of model uncertainty and detection-error probabilities for the permanent income model, N = 200. The probability bound calculations are computed numerically and optimized over the choice of α. The simulated probabilities were computed by simulating discrete approximations with a time interval of 0.01 of a quarter and with 1000 replications

mpu (= |ĝ|)   1/θ      Probability bound   Simulated probabilities
0.02           1.76    0.495               0.446
0.04           3.51    0.479               0.388
0.06           5.27    0.453               0.334
0.08           7.02    0.416               0.282
0.10           8.78    0.372               0.231
0.12          10.53    0.323               0.185
0.14          12.29    0.271               0.143
0.16          14.04    0.220               0.099
0.18          15.80    0.171               0.072
0.20          17.56    0.128               0.054
0.30          26.33    0.015               0.004
0.40          35.11    0.0004              0.000
Table 16.3 gives the detection-error probabilities corresponding to those reported in Table 16.2. In this case, the Chernoff entropy is state dependent, leading us to compute the right side of (16.48) numerically in Table 16.3.34 The values of θ are set so that the market prices of uncertainty match those in Table 16.2. Despite the different structures of the two models, for mpus up to about 0.12, the results in Table 16.3 are close to those of Table 16.2, though the detection-error probabilities are a little lower in Table 16.3. However, the probabilities do decay faster in the tails for the permanent income model. As noted, the model in Table 16.2 has no dependence of ĝ on the state x. As shown by HST, the fluctuations in |ĝ| are quite small relative to its level, which contributes to the similarity of the results for low mpus. It remains the case that statistical detection is difficult for market prices of uncertainty up to about half the empirically estimated magnitude. We conclude that a preference for robustness that is calibrated to plausible values of the detection error probabilities can account for a substantial fraction, but not all, of estimated equity premia, and that this fraction is impervious to details of the model specification. Introducing other features into a model can help to account more fully for the steep slope of the frontier, but they would have to work through ḡ and not the robustness component ĝ. For example, market frictions or alterations to preferences affect the ḡ component.35
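For intuition about how such detection-error probabilities and bounds are generated, consider the simplest case discussed earlier, in which Chernoff entropy is state independent: discriminating a zero drift from a constant distorted drift g for a unit-diffusion Brownian motion observed over N periods. The sketch below is only illustrative: it does not reproduce the state-dependent permanent income calculations behind Table 16.3, and the drift, horizon, and replication counts are hypothetical choices.

```python
import math
import random

def chernoff_bound(g, N):
    # Bound (1/2) * exp(-N * g**2 / 8) for two unit-diffusion Brownian
    # models whose drifts differ by g (the alpha = 1/2 Chernoff bound).
    return 0.5 * math.exp(-N * g * g / 8.0)

def detection_error_mc(g, N, reps=1000, dt=0.25, seed=0):
    """Monte Carlo detection-error probability.

    Simulate the process under each model in turn with an Euler
    discretization, apply the likelihood-ratio test (which reduces here
    to comparing the terminal value with g * N / 2), and average the
    two error rates."""
    rng = random.Random(seed)
    steps = int(N / dt)
    errors, trials = 0, 0
    for true_drift in (0.0, g):
        for _ in range(reps):
            x = sum(true_drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
                    for _ in range(steps))
            choose_distorted = x > g * N / 2.0
            if choose_distorted != (true_drift == g):
                errors += 1
            trials += 1
    return errors / trials

# market price of uncertainty |g| = 0.2 over N = 200 periods (illustrative)
p_hat = detection_error_mc(0.2, 200)
bound = chernoff_bound(0.2, 200)  # = 0.5 * exp(-1), about 0.18
```

In this Gaussian case the exact detection-error probability is Φ(−|g|√N/2), about 0.08 for these values, comfortably below the Chernoff bound, which is one way to see that the bound is conservative.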
16.10. Concluding remarks

In this chapter we have applied tools from the mathematical theory of continuous-time Markov processes to explore the connections between model
misspecification, risk, robustness, and model detection. We used these tools in conjunction with a decision theory that captures the idea that the decision maker views his model as an approximation. An outcome of the decision-making process is a constrained worst-case model that the decision maker uses to create a robust decision rule. In an equilibrium of a representative agent economy with such robust decision makers, the worst-case model of a fictitious robust planner coincides with the worst-case model of a representative private agent. These worst-case models are endogenous outcomes of decision problems and give rise to model uncertainty premia in security markets. By adjusting constraints or penalties, the worst-case model can be designed to be close to the approximating model in the sense that it is statistically difficult to discriminate it from the original approximating model. These same penalties or constraints limit model uncertainty premia. Since mean returns are hard to estimate, nontrivial departures from an approximating model are difficult to detect. As a consequence, within limits imposed by statistical detection, there can still be sizable model uncertainty premia in security prices. There is another, perhaps less radical, interpretation of our calculations. A rational expectations econometrician is compelled to specify forcing processes, often without much guidance except from statistics, for example, in the form of a least squares estimate of a historical vector autoregression. The exploration of worst-case models can be thought of as suggesting an alternative way of specifying forcing process dynamics that is inherently more pessimistic, but that leads to statistically similar processes. Not surprisingly, changing forcing processes can change equilibrium outcomes. More interesting is the fact that sometimes subtle changes in perceived forcing processes can have quantitatively important consequences for equilibrium prices.
Our altered decision problem introduces a robustness parameter that we restrict by using detection probabilities. In assessing reasonable amounts of risk aversion in stochastic, general equilibrium models, it is common to explore preferences for hypothetical gambles in the manner of Pratt (1964). In assessing reasonable preferences for robustness, we propose using large sample detection probabilities for a hypothetical model selection problem. We envision a decision maker as choosing to be robust to departures that are difficult to detect statistically. Of course, using detection probabilities in this way is only one way to discipline robustness. Exploration of other consequences of being robust, such as utility losses, would also be interesting. We see three important extensions to our current investigation. Like builders of rational expectations models, we have sidestepped the issue of how decision makers select an approximating model. Following the literature on robust control, we envision this approximating model to be analytically tractable, yet to be regarded by the decision maker as not providing a correct model of the evolution of the state vector. The misspecifications we have in mind are small in a statistical sense but can otherwise be quite diverse. Just as we have not formally modelled how agents learned the approximating model, neither have we formally justified why they do not bother to learn about potentially complicated misspecifications of that model. Incorporating forms of learning would be an important extension of our work.
The equilibrium calculations in our model currently exploit the representative agent paradigm in an essential way. Reinterpreting our calculations as applying to a multiple agent setting is straightforward in some circumstances (see Tallarini, 2000), but in general, even with complete security market structures, multiple agent versions of our model look fundamentally different from their single-agent counterparts (see Anderson, 1998). Thus, the introduction of heterogeneous decision makers could lead to new insights about the impact of concerns about robustness for market outcomes. Finally, while the examples of Section 16.9 were all based on diffusions, the effects of concern about robustness are likely to be particularly important in environments with large shocks that occur infrequently, so that we anticipate that our modeling of robustness in the presence of jump components will be useful.
Appendix A: Entropy solution

This appendix contains the proof of Theorem 16.1.

Proof. To verify the conjectured solution, first note that the objective is additively separable in g and h. Moreover, the objective for the quadratic portion in g is:

  (θ/2) g′g + g′Λ′ (∂V/∂x).      (16.A.1)

Minimizing this component of the objective by choice of g verifies the conjectured solution. The diffusion contribution to the optimized objective, including (16.A.1), is:

  Gd(V) − (1/2θ) (∂V/∂x)′ ΛΛ′ (∂V/∂x) = −θ Gd[exp(−V/θ)] / exp(−V/θ),
where we are using the additive decomposition G = Gd + Gn. Very similar reasoning justifies the diffusion contribution to entropy formula (16.22c). Consider next the component of the objective that depends on h:

  θ ∫ [1 − h(y, x) + h(y, x) log h(y, x)] η(dy|x) + ∫ h(y, x)[V(y) − V(x)] η(dy|x).      (16.A.2)
To verify36 that this objective is minimized by ĥ, first use the fact that 1 − h + h log h is convex and hence dominates its tangent line at ĥ. Thus

  1 − h + h log h ≥ 1 − ĥ + ĥ log ĥ + (log ĥ)(h − ĥ).

This inequality continues to hold when we multiply by θ and integrate with respect to η(dy|x). Thus

  θ ∫ [1 − h + h log h] η(dy|x) − θ ∫ h log ĥ η(dy|x) ≥ θ ∫ [1 − ĥ + ĥ log ĥ] η(dy|x) − θ ∫ ĥ log ĥ η(dy|x).

Substituting for log ĥ(y, x) = [V(x) − V(y)]/θ shows that ĥ minimizes (16.A.2), and the resulting objective is:

  θ ∫ [1 − ĥ(y, x)] η(dy|x) = −θ Gn[exp(−V/θ)] / exp(−V/θ),

which establishes the jump contribution to (16.22b). Very similar reasoning justifies the jump contribution to (16.22c).
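As a quick numerical sanity check on this minimization, the pointwise integrand of (16.A.2) can be minimized by grid search and compared with the closed form ĥ = exp[(V(x) − V(y))/θ]. The values of θ and of V(y) − V(x) below are arbitrary illustrative choices.

```python
import math

def jump_integrand(h, dV, theta):
    # Pointwise contribution to (16.A.2):
    # theta * (1 - h + h*log(h)) + h * dV, where dV = V(y) - V(x).
    return theta * (1.0 - h + h * math.log(h)) + h * dV

theta, dV = 2.0, 0.5             # illustrative values
h_hat = math.exp(-dV / theta)    # closed-form minimizer from the proof

# crude grid search over h > 0 confirms the first-order condition
grid = [0.001 * k for k in range(1, 5000)]
h_star = min(grid, key=lambda h: jump_integrand(h, dV, theta))
```

At the minimizer the integrand equals θ(1 − ĥ), matching the jump contribution −θ Gn[exp(−V/θ)]/exp(−V/θ) obtained in the proof.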
Acknowledgments We thank Fernando Alvarez, Pierre-Andre Chiappori, Jose Mazoy, Eric Renault, Jose Scheinkman, Grace Tsiang, and Neng Wang for comments on earlier drafts and Nan Li for valuable research assistance. This chapter supersedes our earlier manuscript Risk and Robustness in Equilibrium (1998). This research provided the impetus for subsequent work including Hansen et al. (2002). Hansen and Sargent gratefully acknowledge support from the National Science Foundation.
Notes

1 For example, see the two papers about specification error in rational expectations models by Sims (1993) and Hansen and Sargent (1993).
2 This assumption is so widely used that it rarely excites comment within macroeconomics. Kurz (1997) is an exception. The rational expectations critique of earlier dynamic models with adaptive expectations was that they implicitly contained two models, one for the econometrician and a worse one for the agents who are forecasting inside the model. See Jorgenson (1967) and Lucas (1976). Rational expectations modeling responded to this critique by attributing a common model to the econometrician and the agents within his model. Econometricians and agents can have different information sets, but they agree about the model (stochastic process).
3 See Evans and Honkapohja (2001) and Fudenberg and Levine (1998).
4 The semigroup formulation of Markov processes is common in the literature on applied probability. See Ethier and Kurtz (1986) for a general treatment of semigroups and Hansen and Scheinkman (1995) for their use in studying the identification of continuous-time Markov models.
5 Here the operator is indexed by the time horizon of the available data. In effect there is a "statistical detection operator" that measures the statistical value of information available to discriminate between two Markov processes.
6 The assumption of rational expectations equates a decision maker's approximating model to the objective distribution. Empirical applications of models with robust decision makers like HST and HSW have equated those distributions too. The statement that the agent regards his model as an approximation, and therefore makes cautious decisions, leaves open the possibility that the agent's concern about model misspecification is "just in his head", meaning that the data are actually generated by the approximating model. The "just in his head" assumption justifies equating the agent's approximating model with the econometrician's model, a step that allows us to bring to bear much of the powerful empirical apparatus of rational expectations econometrics. In particular, it provides the same economical way of imputing an approximating model to the agents as rational expectations does. The difference is that we allow the agent's doubts about that model to affect his decisions.
7 See Cho et al. (2002) for a recent application of large deviation theory to a model of learning dynamics in macroeconomics.
8 See Bray (1982) and Kreps (1998).
9 See Hansen and Sargent (2004) for discussions of these bounds.
10 Diffusion (16.1) describes the "physical probability measure."
11 See Figure 8 of HSW.
12 See Theorem 1.13 in Chapter VII of Revuz and Yor (1994). Revuz and Yor give a more general representation that is valid provided that the functions in C∞K are in the domain of the generator. Their representation does not require that η(·|x) be a finite measure for each x but imposes a weaker restriction on this measure. As we will see, when η(·|x) is finite, we can define a jump intensity. Weaker restrictions permit there to be an infinite number of expected jumps in finite intervals that are arbitrarily small in magnitude. As a consequence, this extra generality involves more cumbersome notation and contributes nothing essential to our analysis.
13 When ρ ≥ 0, the semigroup is a contraction. In this case, we can use G as a generator of a Markov process in which the process is curtailed at rate ρ. Formally, we can let ∞ be a terminal state at which the process stays put. Starting the process at x, E[exp(−∫₀ᵗ ρ(xτ) dτ) | x₀ = x] is the probability that the process is not curtailed after t units of time. See Revuz and Yor (1994: 280) for a discussion. As in Hansen and Scheinkman (2002), we will use the discounting interpretation of the semigroup and not use ρ as a curtailment rate. Discounting will play an important role in our discussion of detection and pricing. In pricing problems, ρ can be negative in some states, as might occur in a real economy, an economy with a consumption numeraire.
14 This extension was demonstrated by Dynkin (1956). Specifically, Dynkin defines a weak (in the sense of functionals) counterpart to this semigroup and shows that there is a weak extension of this semigroup to bounded, Borel measurable functions.
15 There are other closely related notions of an extended generator in the probability literature. Sometimes calendar time dependence is introduced into the function φ, or martingales are used in place of local martingales.
16 This will turn out to be a limiting version of a local Chernoff measure ρα to be defined in Section 16.8.
17 See Section 16.9.2 for alternative interpretations of a particular empirical application in terms of risk-sensitivity and robustness. For that example, we show how the robustness interpretation helps us to restrict θ.
18 This analysis differs from that of Breeden (1979) by its inclusion of jumps.
19 In the applications in HST, HSW, and Section 16.9, we often take the actual data generating model to be the approximating model to study implications. In that sense, the approximating model supplies the same kinds of empirical restrictions that a rational expectations econometric model does.
20 As we shall see in Section 16.9, our approach to disciplining the choice of θ depends critically on our adopting a robustness and not a risk-sensitivity interpretation.
21 This bound covers the case in which the model one density omits probability, and so equivalence between the two measures is not needed for this bound to be informative.
22 The continuous time diffusion specification carries with it the implication that diffusion matrices can be inferred from high frequency data without knowledge of the drift. That data are discrete in practice tempers the notion that the diffusion matrix can be inferred exactly. Nevertheless, estimating conditional means is much more difficult than estimating conditional covariances. Our continuous time formulation simplifies our analysis by focusing on the more challenging drift inference problem.
23 This can be seen by writing g = ( )−1 g.
24 The distributions associated with these rates differ, however. Bound (16.48) also uses a Markov evolution indexed by α, whereas we used the α = 1 model in evaluating the robust control objective.
25 In particular, Moscarini and Smith (2002) consider Bayesian decision problems with a more general but finite set of models and actions. Although they restrict their analysis to i.i.d. data, they obtain a more refined characterization of the large sample consequences of accumulating information. Chernoff entropy is a key ingredient in their analysis too.
26 For heterogeneous agent economies, worst-case models can differ across agents because their preferences differ. But an argument like the one in the text could still be used to keep each agent's worst-case model close to the approximating model as measured by detection probabilities.
27 See figure 8 in HSW.
28 Thinking of a quarter as the unit of time, we took the sample interval to be 200. Alternatively, we might have used a sample interval of 600 to link to monthly postwar data. The market prices of risk and model uncertainty are associated with specific time unit normalizations. Since, at least locally, drift coefficients and diffusion matrices scale linearly with the time unit, the market prices of risk and model uncertainty scale with the square root of the time unit.
29 We base our calculations on an ARMA model for consumption growth, namely, log ct − log ct−1 = 0.002 + [(1 − 0.860L)/(1 − 0.974L)](0.0048)νt, reported by Bansal and Yaron (2000), where {νt : t ≥ 0} is a serially uncorrelated shock with mean zero and unit variance.
30 The state independence implies that we can also interpret these calculations as coming from a decision problem in which |ĝ| is constrained to be less than or equal to the same constant for each date and state. This specification satisfies the dynamic consistency postulate advocated by Epstein and Schneider (2003).
31 However, the exact probability calculations reported in Table 16.2 will not apply to the present economy.
32 HST estimated versions of this model with and without habits.
33 See HST and HSW for a proof in a discrete time setting of an observational equivalence proposition that identifies a locus of (δ, θ) pairs that are observationally equivalent for equilibrium quantities.
34 See Hansen and Sargent (2004) for computational details. In computing the probability bounds, we chose the following initial conditions: we set the initial marginal utility of consumption to 15.75 and the mean zero components of the income processes to their unconditional means of zero.
35 We find it fruitful to explore concern about model uncertainty because these other model modifications are themselves only partially successful. To account fully for the market price of risk, Campbell and Cochrane (1999) adopt specifications with substantial risk aversion during recessions. Constantinides and Duffie (1996) accommodate fully the high market prices of risk by attributing empirically implausible consumption volatility to individual consumers (see Cochrane, 1997). Finally, Heaton and Lucas (1996) show that reasonable amounts of proportional transaction costs can explain only about half of the equity premium puzzle.
36 In the special case in which the number of states is finite and the probability of jumping to any of these states is strictly positive, a direct proof that ĥ is the minimizer is available.
Abusing notation somewhat, let η(yi | x) > 0 denote the probability that the state jumps to its ith possible value yi, given that the current state is x. We can write the component of the objective that depends upon h (which is equivalent to (16.A.2) in this special case) as

  θ + Σ_{i=1}^{n} η(yi | x)[−θ h(yi, x) + θ h(yi, x) log h(yi, x) + h(yi, x)V(yi) − h(yi, x)V(x)].

Differentiating this expression with respect to h(yi, x) yields

  η(yi | x)[θ log h(yi, x) + V(yi) − V(x)].

Setting this derivative to zero and solving for h(yi, x) yields h(yi, x) = exp[(V(x) − V(yi))/θ], which is the formula for ĥ given in the text.
References

Anderson, E. W. (1998), "Uncertainty and the Dynamics of Pareto Optimal Allocations," University of Chicago Dissertation.
Anderson, E. W., L. P. Hansen and T. J. Sargent (1998), "Risk and Robustness in Equilibrium," manuscript.
Bansal, R. and A. Yaron (2000), "Risks for the Long Run: A Potential Resolution of Asset Pricing Puzzles," NBER Working Paper.
Blackwell, D. and M. Girshick (1954), Theory of Games and Statistical Decisions, New York: Wiley.
Bray, M. (1982), "Learning, Estimation, and the Stability of Rational Expectations," Journal of Economic Theory, 26(2), 318–339.
Breeden, D. T. (1979), "An Intertemporal Asset Pricing Model with Stochastic Consumption and Investment Opportunities," Journal of Financial Economics, 7, 265–296.
Brock, W. A. (1979), "An Integration of Stochastic Growth Theory and the Theory of Finance, Part I: The Growth Model," General Equilibrium, Growth and Trade.
Brock, W. A. (1982), "Asset Pricing in a Production Economy," in The Economics of Information and Uncertainty, J. J. McCall, ed., Chicago: University of Chicago Press, for the National Bureau of Economic Research.
Brock, W. A. and M. J. P. Magill (1979), "Dynamics Under Uncertainty," Econometrica, 47, 843–868.
Brock, W. A. and L. J. Mirman (1972), "Optimal Growth Under Uncertainty: The Case of Discounting," Journal of Economic Theory, 4(3), 479–513.
Cameron, R. H. and W. T. Martin (1947), "The Behavior of Measure and Measurability Under Change of Scale in Wiener Space," Bulletin of the American Mathematical Society, 53, 130–137.
Campbell, J. Y. and J. H. Cochrane (1999), "By Force of Habit: A Consumption-Based Explanation of Aggregate Stock Market Behavior," Journal of Political Economy, 107(2), 205–251.
Chen, Z. and L. G. Epstein (2001), "Ambiguity, Risk and Asset Returns in Continuous Time," manuscript.
Chernoff, H. (1952), "A Measure of Asymptotic Efficiency for Tests of a Hypothesis Based on the Sum of Observations," Annals of Mathematical Statistics, 23, 493–507.
Cho, I.-K., N. Williams and T. J. Sargent (2002), "Escaping Nash Inflation," Review of Economic Studies, 69(1), 1–40.
Chow, C. K. (1957), "An Optimum Character Recognition System Using Decision Functions," Institute of Radio Engineers (IRE) Transactions on Electronic Computers, 6, 247–254.
Cochrane, J. H. (1997), "Where is the Market Going? Uncertain Facts and Novel Theories," Federal Reserve Bank of Chicago Economic Perspectives, 21(6), 3–37.
Cochrane, J. H. and L. P. Hansen (1992), "Asset Pricing Lessons for Macroeconomics," 1992 NBER Macroeconomics Annual, O. Blanchard and S. Fischer (eds), 1115–1165.
Constantinides, G. M. and D. Duffie (1996), "Asset Pricing with Heterogeneous Consumers," Journal of Political Economy, 104, 219–240.
Cox, J. C., J. E. Ingersoll, Jr and S. A. Ross (1985), "An Intertemporal General Equilibrium Model of Asset Prices," Econometrica, 53, 363–384.
Csiszar, I. (1991), "Why Least Squares and Maximum Entropy? An Axiomatic Approach to Inference for Linear Inverse Problems," The Annals of Statistics, 19(4), 2032–2066.
Dow, J. and S. R. C. Werlang (1994), "Learning Under Knightian Uncertainty: The Law of Large Numbers for Non-additive Probabilities," manuscript.
Duffie, D. and L. G. Epstein (1992), "Stochastic Differential Utility," Econometrica, 60(2), 353–394.
Duffie, D. and P. L. Lions (1992), "PDE Solutions of Stochastic Differential Utility," Journal of Mathematical Economics, 21, 577–606.
Duffie, D. and C. Skiadas (1994), "Continuous-Time Security Pricing: A Utility Gradient Approach," Journal of Mathematical Economics, 23, 107–131.
Dupuis, P. and R. S. Ellis (1997), A Weak Convergence Approach to the Theory of Large Deviations, New York: John Wiley and Sons.
Dynkin, E. B. (1956), "Markov Processes and Semigroups of Operators," Theory of Probability and its Applications, 1, 22–33.
Elliott, R. J., L. Aggoun and J. B. Moore (1995), Hidden Markov Models: Estimation and Control, New York: Springer.
Ellsberg, D. (1961), "Risk, Ambiguity, and the Savage Axioms," Quarterly Journal of Economics, 75(4), 643–669.
Epstein, L. G. and M.
Schneider (2003), “Recursive Multiple-Priors,” Journal of Economic Theory, November, 113(1), 1–31. Epstein, L. G. and T. Wang (1994), “Intertemporal Asset Pricing Under Knightian Uncertainty,” Econometrica, 62(3), 283–322. (Reprinted as Chapter 18 in this volume.) Epstein, L. G. and S. E. Zin (1989), “Substitution, Risk Aversion and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework,” Econometrica, 57(4), 937–969. Ethier, S. N. and T. G. Kurtz (1986), Markov Processes: Characterization and Convergence, New York: John Wiley and Sons. Evans, G. W. and S. Honkapohja (2001), Learning and Expectations in Macroeconomics, Princeton: Princeton University Press. Fleming, W. H. and H. M. Soner (1991), Controlled Markov Processes and Viscosity Solutions, New York: Springer-Verlag. Fudenberg, D. and D. K. Levine (1988), The Theory of Learning in Games, Cambridge: MIT Press. Gilboa, I. and D. Schmeidler (1989), “Maxmin Expected Utility with Non-unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Hansen, L. P. and R. Jagannathan (1991), "Implications of Security Market Data for Models of Dynamic Economies," Journal of Political Economy, 99(2), 225–262.
Hansen, L. P. and T. J. Sargent (1993), "Seasonality and Approximation Errors in Rational Expectations Models," Journal of Econometrics, 55, 21–55.
Hansen, L. P. and T. J. Sargent (1995), "Discounted Linear Exponential Quadratic Gaussian Control," IEEE Transactions on Automatic Control, 40(5), 968–971.
Hansen, L. P. and T. J. Sargent (2001), "Time Inconsistency of Robust Control?", manuscript.
Hansen, L. P. and T. J. Sargent (2003), "Decentralizing Economies with Preferences for Robustness," University of Chicago and New York University, manuscript.
Hansen, L. P. and T. J. Sargent (2004), Robust Control and Economic Model Uncertainty, Princeton: Princeton University Press.
Hansen, L. P., T. J. Sargent and T. D. Tallarini, Jr (1999), "Robust Permanent Income and Pricing," Review of Economic Studies, 66, 873–907.
Hansen, L. P., T. J. Sargent, G. A. Turmuhambetova and N. Williams (2002), "Robustness and Uncertainty Aversion," manuscript.
Hansen, L. P., T. J. Sargent and N. E. Wang (2002), "Robust Permanent Income and Pricing with Filtering," Macroeconomic Dynamics, 6, February, 40–84.
Hansen, L. P. and J. A. Scheinkman (1995), "Back to the Future: Generating Moment Implications for Continuous-Time Markov Processes," Econometrica, 63, 767–804.
Hansen, L. P. and J. Scheinkman (2002), "Semigroup Pricing," manuscript.
Heaton, J. C. and D. J. Lucas (1996), "Evaluating the Effects of Incomplete Markets on Risk Sharing and Asset Pricing," Journal of Political Economy, 104, 443–487.
Hellman, M. E. and J. Raviv (1970), "Probability of Error, Equivocation, and the Chernoff Bound," IEEE Transactions on Information Theory, 16, 368–372.
James, M. R. (1992), "Asymptotic Analysis of Nonlinear Stochastic Risk-Sensitive Control and Differential Games," Mathematics of Control, Signals, and Systems, 5, 401–417.
Jorgenson, D.
W. (1967), "Discussion," The American Economic Review, Papers and Proceedings, 57(2), 557–559.
Knight, F. H. (1921), Risk, Uncertainty and Profit, Boston: Houghton Mifflin.
Kreps, D. (1998), "Anticipated Utility and Dynamic Choice," in E. Kalai and M. I. Kamien (eds), Frontiers of Research in Economic Theory: The Nancy L. Schwartz Memorial Lectures, Econometric Society Monographs, No. 29, Cambridge: Cambridge University Press, 242–274.
Kunita, H. (1969), "Absolute Continuity of Markov Processes and Generators," Nagoya Mathematical Journal, 36, 1–26.
Kurz, M. (1997), Endogenous Economic Fluctuations: Studies in the Theory of Rational Beliefs, New York: Springer.
Lei, C. I. (2000), Why Don't Investors Have Large Positions in Stocks? A Robustness Perspective, unpublished PhD Dissertation, University of Chicago.
Lucas, R. E., Jr (1976), "Econometric Policy Evaluation: A Critique," Journal of Monetary Economics, 1(2), 19–46.
Lucas, R. E., Jr (1978), "Asset Prices in an Exchange Economy," Econometrica, 46(6), 1429–1445.
Lucas, R. E., Jr and E. C. Prescott (1971), "Investment Under Uncertainty," Econometrica, 39(5), 659–681.
Maenhout, P. (2001), "Robust Portfolio Rules, Hedging and Asset Pricing," manuscript, INSEAD.
Merton, R. (1973), "An Intertemporal Capital Asset Pricing Model," Econometrica, 41, 867–888.
Moscarini, G. and L. Smith (2002), "The Law of Large Demand for Information," manuscript, forthcoming in Econometrica.
Newman, C. M. (1973), "The Orthogonality of Independent Increment Processes," in Topics in Probability Theory, D. W. Stroock and S. R. S. Varadhan (eds), 93–111, Courant Institute of Mathematical Sciences, N.Y.U.
Newman, C. M. and B. W. Stuck (1979), "Chernoff Bounds for Discriminating Between Two Markov Processes," Stochastics, 2, 139–153.
Petersen, I. R., M. R. James and P. Dupuis (2000), "Minimax Optimal Control of Stochastic Uncertain Systems with Relative Entropy Constraints," IEEE Transactions on Automatic Control, 45, 398–412.
Pratt, J. W. (1964), "Risk Aversion in the Small and in the Large," Econometrica, 32(1–2), 122–136.
Prescott, E. C. and R. Mehra (1980), "Recursive Competitive Equilibrium: The Case of Homogeneous Households," Econometrica, 48, 1365–1379.
Revuz, D. and M. Yor (1994), Continuous Martingales and Brownian Motion, 2nd ed., New York: Springer Verlag.
Runolfsson, T. (1994), "The Equivalence Between Infinite-Horizon Optimal Control of Stochastic Systems with Exponential-of-Integral Performance Index and Stochastic Differential Games," IEEE Transactions on Automatic Control, 39, 1551–1563.
Schroder, M. and C. Skiadas (1999), "Optimal Consumption and Portfolio Selection with Stochastic Differential Utility," Journal of Economic Theory, 89, 68–126.
Sims, C. A. (1993), "Rational Expectations Modelling with Seasonally Adjusted Data," Journal of Econometrics, 55.
Tallarini, T. D., Jr (2000), "Risk-Sensitive Real Business Cycles," Journal of Monetary Economics, 45(3), 507–532.
17 Uncertainty aversion, risk aversion, and the optimal choice of portfolio James Dow and Sérgio Ribeiro da Costa Werlang∗
17.1. Introduction
In this chapter, we describe some implications for economic analysis of a model of decision making under uncertainty which generalizes the expected-utility model accepted by most economists as a representation of rational behavior. The model we use is the model of expected utility under a nonadditive probability measure, which seeks to distinguish between quantifiable “risks” and unknown “uncertainties.” An axiomatic treatment of the model may be found in Schmeidler (1982, 1989), Gilboa (1987), and Gilboa and Schmeidler (1989).

The focus of this chapter is the problem of optimal investment decisions. Under the standard theory of expected utility, an agent who must allocate his or her wealth between a safe and a risky asset will buy some of the asset if the price is less than the expected (present) value. Conversely, the agent will sell the asset short when the price is greater than the expected value. Our main theorem is a generalization of this result to the case of uncertainty. We also provide a definition of an increase in perceived uncertainty, and analyze the effects of an increase on the investment decision.

The problem of making decisions under uncertainty has been of central importance to economics and statistics throughout the development of these disciplines. The expected-utility theory, which owes its axiomatic development to von Neumann and Morgenstern (1947), originates in the work of Bernoulli (1730). Savage (1954) has made a persuasive case that rational behavior necessarily entails actions represented by such a utility function and by a prior subjective probability distribution over possible events. For example, an agent gambling on the toss of a coin about which he knows nothing may behave qualitatively differently from when he knows whether the coin is biased and if so by how much.
According to Savage, this distinction would be unreasonable: in the first case the agent should behave exactly as if he knew that the bias was equal to some value (of course, this value need not be the “true” value since the agent does not have sufficient information).
Dow, James and Sérgio Ribeiro da Costa Werlang (1992) “Uncertainty aversion, risk aversion, and the optimal choice of portfolio,” Econometrica, 60, 197–204.
420
James Dow and Sérgio Ribeiro da Costa Werlang
Nevertheless, for both theoretical and empirical reasons economists have developed models which generalize the expected-utility model. One group of these models is based on a distinction between risk and uncertainty: the idea was proposed by Knight (1921) and has been further explored by Ellsberg (1961) and Bewley (1986) among others. In the series of papers referred to above, Schmeidler and Gilboa have given an axiomatic development of a model which incorporates this distinction. Based on a weakening of the independence axiom, the model entails maximizing expected utility with a nonadditive probability measure. With a nonadditive probability measure, the “probability” that either of two mutually exclusive events will occur is not necessarily equal to the sum of their two “probabilities.” If it is less than the sum, then expected-utility calculations using this probability measure will reflect uncertainty aversion as well as (possibly) risk aversion. The reader may be disturbed by “probabilities” that do not sum to one. It should be stressed that the probabilities, together with the utility function, provide a representation of behavior. They are not objective probabilities. Although the expected-utility model has been questioned, there is one factor which is strongly in its favor. While the theory of consumer behavior under certainty has only the most pedestrian empirical implications (homogeneity of degree zero and continuity of the demand function, and symmetry and negative semidefiniteness of the Slutsky matrix where demand is differentiable), the theory of expected utility yields some strong predictions, in particular the results on local risk-neutrality and on complete insurance with actuarially fair policies. A generalization of the theory which eliminated the independence axiom completely would also lead to the loss of these useful predictions. 
The purpose of this chapter is to show that the model of expected-utility maximization with nonadditive probabilities reflecting uncertainty aversion preserves strong results which are analogous to these. We focus on the local risk-neutrality theorem (Arrow (1965)). According to this result, an agent who starts from a position of certainty will invest in an asset if, and only if, the expected value of the asset exceeds the price. The amount of the asset that is bought depends on the agent’s attitude to risk. This result holds in the absence of transactions costs whenever it is possible to buy small quantities of an asset. Conversely, if the expected value is lower than the price of the asset the agent will wish to sell the asset short. Consequently an agent’s demand for an asset should be positive below a certain price, negative above that price, and zero at exactly that price. In case there are many risky assets, this price will not necessarily be the expected value. With a nonadditive subjective probability distribution over returns on the asset, we show that this result has a straightforward analog which is intuitively plausible and is compatible with observed investment behavior. There is an interval of prices within which the agent neither buys nor sells the asset short. At prices below the lower limit of this interval, the agent is willing to buy this asset. At prices above the upper end of this interval, the agent is willing to sell the asset short. The highest price at which the agent will buy the asset is the expected value of the asset under the nonadditive probability measure. The lowest price at which the agent sells the asset is the expected value of selling the asset short. This reservation price is larger
Uncertainty aversion and optimal portfolio
421
than the other one if the beliefs reflect uncertainty aversion: with a nonadditive probability measure, the expectation of a random variable is less than the negative of the expectation of the negative of the random variable. The computation of expected values with nonadditive probability measures is explained below. These two reservation prices, then, depend only on the beliefs and aversion to uncertainty incorporated in the agent’s prior, and not on attitudes to risk. This result is the nonadditive analog of the local risk-neutrality result. The local risk-neutrality result has a counterpart in the analysis of insurance. An agent who can buy actuarially fair insurance in any amount will choose to be fully insured. It follows from the results presented here that there will be a range of premium costs at which the agent buys full insurance (the model, like Savage’s model, has no objective probabilities and hence actuarial fairness is not defined). We suggest that a reasonable person may not act according to Savage’s model. Maximizing utility with a nonadditive prior may be a reasonable model of rational behavior in some circumstances. However, we do not argue that this model is the only way, nor necessarily the best way, to represent genuine uncertainty. What we show here is that it provides a tractable framework for economic analysis of the types of problems for which expected-utility theory itself is useful. In terms of empirical implications of the Schmeidler–Gilboa model, broadly similar types of behavior could be caused by transactions costs or asymmetric information, or by the preferences in Bewley’s (1986) model. The main difference is that in each of those three cases there is a tendency not to trade, whereas in Schmeidler–Gilboa there is a tendency not to hold a position. In other words the agent’s frame of reference here is the safe allocation, rather than the status quo.
In this chapter we have set out the simplest investment decision to analyze, namely where there is only one uncertain asset. When there are several assets, the analysis becomes more complex because one must consider the statistical dependence of the risks and uncertainties of the different asset returns. We hope to pursue this issue in the future. We have also refrained here from describing equilibrium interaction among many uncertainty-averse traders. Dow et al. (1989) discuss this in relation to the no-trade theorem. The organization of the rest of the chapter is as follows. In Section 17.2 we present a simple example which illustrates the basic features of the model. In Section 17.3 we present a definition of an increase in uncertainty aversion and results on the expectation of a random variable with a nonadditive distribution. In Section 17.4 we give our main theorem on asset choice under uncertainty. The Appendix contains mathematical results for reference. Several of the proofs are omitted for brevity, and are available on request from the authors.
17.2. An example In this section, we present an example which illustrates the portfolio decisions of an agent whose preferences are represented by a nonadditive probability measure. The example is based on a risk-neutral agent and an asset which can take only two
possible values. The agent has wealth W and the (present) value of the asset will be either high, H, or low, L. The probabilities of these two outcomes are π and π′ respectively. If π + π′ < 1, the agent’s decisions reflect uncertainty aversion. We stress that the nonadditive prior represents both the presence of uncertainty and the agent’s aversion to it. For instance, in this example we could have π = π′ = 1/2, which does not necessarily mean that the agent “knows” the risk with certainty. It could be that the agent thinks both outcomes are equally likely and is not averse to uncertainty. Consider the expected return from buying one unit of the asset at price p. The value will be at worst (L − p) net of the price, but with probability π it will be (H − p), that is, an improvement of (H − L) over the worst outcome. The assessment of this possible improvement reflects its uncertainty: the expected payoff from buying one unit of the asset is [L + π(H − L)] − p. If the price p is less than [L + π(H − L)], a risk-neutral investor will buy the asset. Now consider the return from selling one unit of the asset short. The payoff will be (p − H) if the asset is worth H, which is the worst outcome. With probability π′ it will increase to (p − L). The expected payoff is therefore p − H + π′(H − L). Thus if p exceeds H − π′(H − L), the investor will sell the asset short. Because π + π′ < 1, H − π′(H − L) > L + π(H − L). At prices in between these two numbers the investor will not hold the asset. Figure 17.1 shows the expected payoff from buying and selling the asset as a function of p. This example illustrates how the expected value is computed under a nonadditive distribution. In this case, E(X) = L + π(H − L) (the details are given in the Appendix). It should be clear from the discussion that adding a constant to a random variable or multiplying it by a positive constant has the same effect on its
Figure 17.1 Expected gains from buying and selling short one unit of the asset. [Figure: the expected gain from buying, L + π(H − L) − p, and from a short sale, p − H + π′(H − L), are plotted against the price p; a long position is held at prices below L + π(H − L) and a short position at prices above H − π′(H − L).]
expectation. On the other hand, this property does not hold for negative constants: −E(−X) = H − π′(H − L), so that −E(−X) > E(X). It is this inequality which gives rise to the interval of prices with no asset holdings. A closely related representation of decisions is to suppose that the agent evaluates expected utility for a set of prior (additive) probability distributions and acts to maximize the minimum of expected utility over these priors (see Gilboa and Schmeidler (1989)). At one extreme, the agent considers only one prior—a “known” distribution—and acts according to the standard theory of expected utility. At the other extreme, if all prior distributions over outcomes are considered, the agent considers only the worst possible outcome. In our example, we would consider a set of additive priors where the chance of a high return lies between π (at least) and 1 − π′ (at most). The payoff from buying a unit of the asset is then

Min{L + λ(H − L) − p : λ ∈ [π, 1 − π′]} = L + π(H − L) − p,

and from selling it short,

Min{p − H + λ′(H − L) : λ′ ∈ [π′, 1 − π]} = p − H + π′(H − L).
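The two-outcome example can be checked numerically. The sketch below uses purely illustrative values of H, L, π, and π′ (not from the text) and confirms that the Choquet reservation prices coincide with the worst-case payoffs over the corresponding set of additive priors:

```python
# Illustrative two-outcome asset: worth H or L next period, with nonadditive
# probabilities P(high) = pi and P(low) = pi_prime, pi + pi_prime < 1.
H, L = 120.0, 80.0
pi, pi_prime = 0.4, 0.4  # 0.4 + 0.4 < 1: the prior exhibits uncertainty aversion

# Choquet reservation prices derived in the text.
buy_value = L + pi * (H - L)          # E(X): highest price at which the agent buys
sell_value = H - pi_prime * (H - L)   # -E(-X): lowest price at which the agent shorts

# Multiple-priors equivalent: the chance of the high outcome ranges over
# [pi, 1 - pi_prime]; each act is evaluated at its worst prior, and the minimum
# of a linear function over an interval is attained at an endpoint.
def buy_payoff(p):
    return min(L + lam * (H - L) - p for lam in (pi, 1 - pi_prime))

def short_payoff(p):
    return min(p - H + (1 - lam) * (H - L) for lam in (pi, 1 - pi_prime))

assert abs(buy_payoff(100.0) - (buy_value - 100.0)) < 1e-9
assert abs(short_payoff(100.0) - (100.0 - sell_value)) < 1e-9
assert buy_value < sell_value  # an interval of prices with no position
```

With these numbers the no-trade interval is (96, 104): a risk-neutral agent buys below 96, shorts above 104, and holds nothing in between.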
17.3. Uncertainty aversion
We define a measure of uncertainty aversion, following an idea of Schmeidler (1989) for the case of two states of nature. The reader should refer as necessary to the Appendix for the notation, the definition of nonadditive probabilities, and a summary of their mathematical properties.

Definition 17.1. Let P be a probability and A ⊂ Ω an event. The uncertainty aversion of P at A is defined by c(P, A) = 1 − P(A) − P(A^c).

This number measures the amount of probability “lost” by the presence of uncertainty aversion. It gives the deviation of P from additivity at A. Notice that c(P, A) = c(P, A^c), which is natural.

Lemma 17.1. c(P, A) = 0 for all events A ⊂ Ω if, and only if, P is additive.

The proof is omitted.

Example 17.1. Constant Uncertainty Aversion. Let Ω be finite with n elements and let the event space be the power set of Ω, 2^Ω. For all ω ∈ Ω, set P({ω}) = (1 − c)/n, where c ∈ [0, 1]. For A ⊂ Ω, A ≠ Ω, define P(A) = Σ_{ω∈A} P({ω}). It is easy to verify that c(P, A) = c, ∀A ≠ Ω, Ø. In other words this is a distribution with constant uncertainty aversion. In general a nonadditive probability need not be so simple.
Example 17.2. Maximin Behavior. A person with extreme uncertainty aversion who is completely uninformed maximizes the payoff of the worst possible outcome. Suppose c(P, A) = 1 for all events A ≠ Ω, Ø. Then P(A) = 0 for all A ≠ Ω. Let u : R → R₊ be the utility function of the agent. Then

Eu = ∫ u dP = ∫_{0}^{∞} P(u ≥ α) dα.

Let u̲ = inf_{x∈R} u(x). Then P(u ≥ u̲) = 1 and P(u ≥ u̲ + ε) = 0 ∀ε > 0. Therefore

Eu = ∫_{0}^{u̲} 1 dα = u̲ = inf_{x∈R} u(x).
This “maximin” rule was proposed by Wald (1950) for situations of complete uncertainty, and Ellsberg (1961) and Rawls (1971) also suggest that this rule should be considered in such circumstances. Simonsen (1986) is a recent application to the theory of inflationary inertia.

We now extend this “local” measure of uncertainty aversion to a comparison of two nonadditive probabilities over all events.

Definition 17.2. Given two nonadditive probabilities P and Q defined on the same space of events, we say that P is at least as uncertainty averse as Q if for all events A ⊂ Ω, c(P, A) ≥ c(Q, A).

The terminology is clumsy, but shorter than alternatives such as “P reflects at least as much perceived uncertainty as Q,” etc. This definition allows us to formalize the statement that the gap between buying and selling prices increases as the uncertainty aversion increases.

Theorem 17.1. The following statements are equivalent: (i) P is at least as uncertainty averse as Q. (ii) For all random variables X for which the integrals are finite, −E_P(−X) − E_P X ≥ −E_Q(−X) − E_Q X.

Proof. (i) ⇒ (ii): Let A(α) = {ω ∈ Ω : X(ω) ≥ α}. Then

E_P X = ∫_{−∞}^{0} [P(A(α)) − 1] dα + ∫_{0}^{∞} P(A(α)) dα.

Notice that {ω ∈ Ω : −X(ω) > α} = A(−α)^c. Thus

E_P(−X) = ∫_{−∞}^{0} [P(A(−α)^c) − 1] dα + ∫_{0}^{∞} P(A(−α)^c) dα
        = ∫_{0}^{∞} [P(A(α)^c) − 1] dα + ∫_{−∞}^{0} P(A(α)^c) dα.

Hence

−E_P(−X) − E_P(X) = ∫_{−∞}^{∞} [1 − P(A(α)) − P(A(α)^c)] dα.

By the same argument,

−E_Q(−X) − E_Q(X) = ∫_{−∞}^{∞} [1 − Q(A(α)) − Q(A(α)^c)] dα.
Since P is at least as uncertainty averse as Q, the result follows immediately.

(ii) ⇒ (i): For all events A, define the random variable X = 1_A (the characteristic function of the set A). Then E_P X = P(A), E_P(−X) = P(A^c) − 1, E_Q X = Q(A), and E_Q(−X) = Q(A^c) − 1. Applying (ii) to X, we get (i).

The next example illustrates the effect of uncertainty aversion on the difference between −E(−X) and E(X).

Example 17.3. Let X be a random variable with X̲ = inf_{ω∈Ω} X(ω) ≥ 0 and X̄ = sup_{ω∈Ω} X(ω) < ∞. Let P be an additive probability, and fix c ∈ [0, 1]. We define a nonadditive probability which is obtained by uniformly increasing the uncertainty aversion from P: let P_c(Ω) = 1, and P_c(A) = (1 − c)P(A) for A ≠ Ω. It is easy to verify that c(P_c, A) = c for all A ≠ Ω, Ø, and that

E_{P_c} X = cX̲ + (1 − c)E_P X   and   −E_{P_c}(−X) = cX̄ + (1 − c)E_P X.

Thus −E_{P_c}(−X) − E_{P_c} X = c(X̄ − X̲), which is increasing in the uncertainty aversion c in accordance with Theorem 17.1. Here we have taken an additive distribution and squeezed it uniformly. A risk-neutral agent whose behavior is represented by this distribution will maximize a weighted average of the worst possible outcome and the expectation of the additive distribution. Ellsberg (1961) suggested this as an ad hoc decision rule; this example provides some rationale for the rule.
17.4. Portfolio choice
In this section we derive our main result, namely that there will be a range of prices, from E(X) to −E(−X), at which the investor has no position in the asset. At prices below these, the investor holds a positive amount of the asset, and at higher prices he holds a short position. Notice that this range of prices depends only on the beliefs and attitudes to uncertainty incorporated in the agent’s prior, and not on the attitudes towards risk captured by the utility function. Let W > 0 be the investor’s initial wealth, u ≥ 0 the utility function, and X a random variable with nonadditive distribution P. We assume that u is C², u′ > 0, and u″ ≤ 0.
Lemma 17.2. Suppose EX < ∞ and −E(−X²) < ∞. For λ ∈ R define f(λ) = Eu(W + λX). Then (i) f is right-differentiable at λ = 0; (ii) f′₊(0) = u′(W)EX.

The proof is omitted. We now proceed to the main result, namely the behavior of the risk-averse or risk-neutral agent under uncertainty aversion. Suppose the investor is faced with the problem of choosing the sum of money S he will invest in an asset. The present value of one unit of the asset next period is a random amount X with nonadditive probability distribution P. We characterize the demand for the asset as a function of the price.

Theorem 17.2. A risk-averse or risk-neutral investor with certain wealth W, who is faced with an asset which yields X per unit, whose price is p > 0 per unit, will buy the asset if p < EX and only if p ≤ EX. He will sell the asset if p > −E(−X) and only if p ≥ −E(−X).

Proof. By Jensen’s inequality (see the Appendix), Eu(W − S + (S/p)X) ≤ u(E[W − S + (S/p)X]). If EX ≤ p, then E[W − S + (S/p)X] ≤ W (by property (iv) of the integral in the Appendix). Thus the investor is at least as well off not holding the asset, giving expected utility u(W), as buying any positive amount. Similarly if EX < p, not holding is strictly better than investing in the asset. We now show that if p < EX the investor will buy some of the asset. The investor’s objective is to maximize g(S) = Eu(W − S + (S/p)X). By Lemma 17.2,

g′₊(0) = u′(W)E[(X − p)/p] > 0,

since EX > p. Thus the investor will buy a strictly positive amount of the asset. Similar arguments give the corresponding results for short sales. Notice that if u is not differentiable at some point W, then there is a range of prices with no trade even with an additive measure (if u is concave, the set of such points has measure zero).
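Theorem 17.2 can be illustrated numerically. The construction below is our own illustration, not from the chapter: a risk-averse (log-utility) investor faces the two-outcome asset of Section 17.2 with hypothetical parameters, and demand is found by a simple grid search. Demand is positive below E(X), zero on the interval [E(X), −E(−X)], and negative above it, regardless of the curvature of u:

```python
import math

# Hypothetical two-outcome setting; u = log is strictly concave (risk-averse).
W, H, L = 100.0, 120.0, 80.0
pi, pi_prime = 0.4, 0.4           # pi + pi_prime < 1: uncertainty-averse beliefs
u = math.log

def ceu(S, p):
    """Choquet expected utility of final wealth when S is invested at price p."""
    wH = W + S * (H / p - 1.0)    # wealth if the asset pays H
    wL = W + S * (L / p - 1.0)    # wealth if the asset pays L
    if S >= 0:                    # long position: the low outcome is the bad one
        return u(wL) + pi * (u(wH) - u(wL))
    return u(wH) + pi_prime * (u(wL) - u(wH))  # short: the high outcome is bad

def demand(p, n=2001, bound=50.0):
    """Grid search for the optimal position S in [-bound, bound]."""
    grid = [-bound + 2 * bound * k / (n - 1) for k in range(n)]
    return max(grid, key=lambda S: ceu(S, p))

# Here E(X) = 96 and -E(-X) = 104: demand changes sign outside that interval
# and is exactly zero inside it, independently of the utility function.
assert demand(90.0) > 0
assert abs(demand(100.0)) < 1e-6
assert demand(110.0) < 0
```

The no-trade region here is an interval rather than the single point predicted by the additive (local risk-neutrality) theory.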
Appendix
The mathematical treatment of nonadditive probabilities may be found in Schmeidler (1982, 1986, 1989), Choquet (1955), Dellacherie (1970), Gilboa (1987), Gilboa and Schmeidler (1989), Shafer (1976), and Dempster (1967). The reader is referred to these sources. In particular, Schmeidler (1986) contains only material related to the mathematical aspects of the theory.

Let Ω be a set, and Σ an algebra, that is, a set of subsets of Ω such that (i) Ω ∈ Σ, (ii) A, B ∈ Σ ⇒ A ∪ B ∈ Σ, and (iii) A ∈ Σ ⇒ A^c ∈ Σ (here A^c means the set of elements of Ω not in A). Ω is the set of states of nature and the elements of Σ are called events. A function P : Σ → [0, 1] is a nonadditive probability if (i) P(Ø) = 0, (ii) P(Ω) = 1, and (iii) P(A) ≤ P(B) if A ⊂ B. We impose an additional restriction (see Gilboa and Schmeidler (1989), Schmeidler (1986) and Shafer (1976)): (iv) ∀A, B ∈ Σ, P(A ∪ B) + P(A ∩ B) ≥ P(A) + P(B). In Section 17.3 of the chapter we show that this corresponds to uncertainty aversion.

A real-valued function X : Ω → R is said to be a random variable if for all open sets O of R, X⁻¹(O) ∈ Σ. The expected value of a random variable X is defined as

EX = ∫ X dP = ∫_{−∞}^{0} (P(X ≥ α) − 1) dα + ∫_{0}^{∞} P(X ≥ α) dα,

whenever these integrals exist (in the improper Riemann sense) and are finite. Notice that since P(X ≥ α) = P(X > α) a.e., the expression for the expected value may also be written with strict inequalities. When it is necessary to distinguish between P and other distributions, we write E_P X. The following properties of the integral are either proved in the papers referred to previously, or else can be proved immediately:

(i) X ≤ Y ⇒ EX ≤ EY;
(ii) E(X + Y) ≥ EX + EY;
(iii) −E(−X) ≥ EX;
(iv) ∀a ≥ 0 and b ∈ R, E(aX + b) = aEX + b;
(v) for all concave functions u : R → R, Eu(X) ≤ u(EX) (Jensen’s inequality).
Note ∗ This research was initiated while the authors were at the University of Pennsylvania.
References Arrow, K. J. (1965). “The Theory of Risk Aversion,” Chapter 2 of Aspects of the Theory of Risk Bearing. Helsinki: Yrjö Jahnssonin Säätiö. Bernoulli, D. (1730). “Exposition of a New Theory of the Measurement of Risk,” (in Latin), English translation in Econometrica, 21 (1953), 503–546. Bewley, T. (1986). “Knightian Decision Theory, Part 1,” Yale University. Choquet, G. (1955). “Theory of Capacities,” Ann. Inst. Fourier, Grenoble, 5, 131–295. Dellacherie, C. (1970). “Quelques Commentaires sur les Prolongements de Capacités,” Séminaire de Probabilités V, Strasbourg. Berlin: Springer-Verlag, Lecture Notes in Mathematics 191. Dempster, A. (1967). “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38, 205–247. Dow, J., V. Madrigal, and S. Werlang (1989). “Preferences, Common Knowledge and Speculative Trade,” Working Paper, EPGE/Fundação Getulio Vargas. Ellsberg, D. (1961). “Risk, Ambiguity and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669. Gilboa, I. (1987). “Expected Utility Theory with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88.
Gilboa, I. and D. Schmeidler (1989). “Maxmin Expected Utility with a Non-unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.) Knight, F. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin. Neumann, J. von and O. Morgenstern (1947). Theory of Games and Economic Behavior. Princeton: Princeton University Press. Rawls, J. (1971). A Theory of Justice. Cambridge: Harvard University Press. Savage, L. J. (1954). The Foundations of Statistics. New York: John Wiley (2nd edn, 1972, New York: Dover). Schmeidler, D. (1982). “Subjective Probability without Additivity (Temporary Title),” Foerder Institute for Economic Research Working Paper, Tel Aviv University. —— (1989). “Subjective Probability and Expected Utility without Additivity,” Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.) —— (1986). “Integral Representation with Additivity,” Proceedings of the American Mathematical Society, 97, 255–261. Shafer, G. (1976). A Mathematical Theory of Evidence. Princeton: Princeton University Press. Simonsen, M. H. (1986). “Rational Expectations, Income Policies and Game Theory,” Revista de Econometria, 6, 7–46. Wald, A. (1950). Statistical Decision Functions. New York: John Wiley.
18 Intertemporal asset pricing under Knightian uncertainty Larry G. Epstein and Tan Wang
18.1. Introduction
Modern asset pricing theory typically adopts strong assumptions about agents’ beliefs. According to the rational expectations hypothesis, for example, there exists an objective probability law describing the state process, and it is assumed that agents know this probability law precisely. More generally, even if existence of the latter is not assumed, each agent’s beliefs about the likelihoods of future states of the world are represented by a subjective probability measure or Bayesian prior, in conformity with the Bayesian model of decision-making and, more particularly, with the Savage (1954) axioms. As a result, no meaningful distinction is allowed between risk, where probabilities are available to guide choice, and uncertainty, where information is too imprecise to be summarized adequately by probabilities. In contrast, Knight (1921) emphasized the distinction between risk and uncertainty and argued that uncertainty is more common in economic decision-making.1 In particular, in the context of asset prices, Keynes emphasized the importance of “animal spirits” when, because of Knightian uncertainty, individuals cannot estimate probabilities reliably and so cannot make a good calculation of expected values. (See Keynes (1936) and (1921: Ch. 6); see also Koppl (1991) for discussion and additional references.)

This chapter provides a formal model of asset price determination in which Knightian uncertainty plays a role. Specifically, we extend the Lucas (1978) general equilibrium pure exchange economy by suitably generalizing the representation of beliefs. Two principal results are the proof of existence of equilibrium and the characterization of equilibrium prices by an “Euler inequality.” The latter represents the appropriate generalization of the standard Euler equation to the context of uncertainty.
A noteworthy feature of our model is that uncertainty may lead to equilibria that are indeterminate; that is, there may exist a continuum of equilibria for given fundamentals. That leaves the determination of a particular equilibrium price process to “animal spirits” and sizable volatility may result.
Epstein, Larry G. and Tan Wang (1994) “Intertemporal asset pricing under Knightian uncertainty,” Econometrica, 62, 283–322.
430
Larry G. Epstein and Tan Wang
Overall our model conforms closely to Keynes’ (1936: p. 154) description of the consequences of uncertainty:

A conventional valuation which is established as the outcome of the mass psychology of a large number of ignorant individuals is liable to change violently as a result of a sudden fluctuation of opinion due to factors which do not really make much difference to the prospective yield; since there will be no strong roots of conviction to hold it steady.

Besides the motivation provided by the intuitively appealing ideas of Knight and Keynes, our chapter is motivated also by evidence that people prefer to act on known rather than unknown or vague probabilities. For example, they typically prefer to bet on drawing a red ball from an urn containing 50 red and 50 black balls, than from an urn containing 100 red and black balls in undisclosed proportions. The best known such evidence is the Ellsberg Paradox (Ellsberg (1961)); the large body of empirical evidence inspired by this paradox, both experimental and market-based, is surveyed by Camerer and Weber (1992). Behavior such as that exhibited in the context of the Ellsberg Paradox contradicts the Bayesian paradigm, that is, the existence of any prior underlying choices. Intuitively, the reason is that a probability measure cannot adequately represent both the relative likelihoods of events and the amount, type, and reliability of the information underlying those likelihoods. On the other hand, in a multiperiod setting such as ours, one may wonder whether “vagueness” might disappear asymptotically as a result of learning by the agent, at least if the environment is stationary. Learning in the presence of uncertainty has not yet been studied sufficiently well to provide a definitive theoretical answer to this question. In any event, it would seem that economic processes are typically too complicated or unstable to be modeled in detail and understood precisely.
Thus we would not presume uncertainty to be strictly a short-run phenomenon. See Walley (1991: Ch. 5) for further arguments about the general prevalence of imprecision and Zarnowitz (1992: 61–63) for cogent arguments in a business cycle setting. In an asset pricing context, Barsky and DeLong (1992) argue that there is substantial uncertainty about the structure of the aggregate dividend process in the United States over the last century, even on the part of current analysts who have the benefit of hindsight. In addition, many processes of interest are presumably physically indeterminate; Papamarcou and Fine (1991) describe an empirical process that generates relative frequencies that can be modeled by a set of probability measures, but not by any single probability measure. Ultimately, our objective is to investigate whether the noted shortcoming of the Bayesian paradigm is at all responsible for any of the empirical failures of the consumption-based asset pricing model derived from Lucas (1978). While serious empirical analysis is beyond the scope of this chapter, we will address the empirical content of our model informally. We do so first in Section 18.3.4 where we indicate the potential usefulness of our model for resolving the excess volatility puzzle (Shiller (1981) and Cochrane (1991)). Further discussion of empirical content is provided in Section 18.4.
Intertemporal asset pricing
431
There are now available a number of extensions of the Bayesian model that admit a distinction between risk and uncertainty. One, due to Bewley (1986), drops Savage’s assumption that preferences are complete and adds a model of the “status quo.” An alternative direction, due to Gilboa and Schmeidler (1989), is to weaken Savage’s Sure-Thing Principle. The consequence for the representation of preferences and beliefs is that Savage’s single prior is replaced by a set of priors. In this chapter, we take this multiple-priors model as our starting point.2 Then, since our framework is intertemporal and since the Gilboa–Schmeidler (1989) framework is atemporal and deals exclusively with one-shot choice, we extend their model (nonaxiomatically) to an intertemporal, infinite-horizon setting. Moreover, this is done in a way that delivers two attractive properties of the standard expected additive utility model that dominates economics and finance—dynamic consistency and tractability. Since such an extension is potentially useful for addressing issues other than asset pricing where uncertainty may be important, we view it as a separate contribution of the chapter. While the rational expectations hypothesis has considerable a priori appeal for economists, it has come under scrutiny in recent years because of apparently contradictory empirical evidence. We have already mentioned the asset pricing anomalies that indicate rejection of a collection of joint hypotheses that includes rational expectations. In addition, where it has been tested separately by means of survey data, the rational expectations hypothesis has generally been rejected (e.g., see Cragg and Malkiel (1982), Zarnowitz (1984), Ito (1990), Frankel and Froot (1990)). As a result, models with “irrational expectations” have been developed, involving “fads” (Shiller (1991), Barsky and DeLong (1992)) or “noise traders” (DeLong et al. (1990)). 
A focus on beliefs is shared by the model proposed in this chapter, though, in a sense, we deviate much less from the standard Lucas-style model. One can interpret our model as differing from Lucas’ only by replacing the Sure-Thing Principle and its implied Bayesian prior, by the Gilboa–Schmeidler set of axioms, suitably adapted to the intertemporal framework, and the resulting set of priors. We proceed as follows: Section 18.2 describes our model of intertemporal utility, including beliefs. Equilibrium asset pricing is studied in Section 18.3. We conclude in Section 18.4 with some comments on the empirical content of our model. Technical details are collected in appendices.
18.2. Intertemporal utility

18.2.1. Background

The standard specification of utility over infinite-horizon consumption processes is given by

U(c) = E[ Σ_{t=1}^∞ β^t u(ct) ],   (18.1)
Larry G. Epstein and Tan Wang
or in recursive form,

U(c) = u(c1) + βE U(c2, c3, . . .).   (18.2)
Here E denotes the expectation operator conditional on available information; other notation is standard and will shortly be defined precisely in any event, as will the underlying stochastic environment. Beliefs about the likelihoods of future underlying states of the world are represented by a conditional probability measure π. In the rational expectations paradigm, π is an objective probability law that governs the evolution of states of the world and is assumed known to the decision-maker. An alternative justification for π is the Savage representation theorem, according to which π is a subjective probability measure; an objective probability law need not exist in principle. In either approach, a role for Knightian uncertainty or imprecise information is excluded a priori, either because information is assumed to be precise or, in the second approach, because the Savage axioms imply that imprecision is a matter of indifference to the decision-maker (as discussed later). Our objective is to investigate the implications of imprecise information, and thus we need to adopt a more general representation for beliefs. In order to focus more sharply on our objective, we consider otherwise "minimal" variations of (18.1) and (18.2).

18.2.2. The environment and beliefs

The set of states is Ω, a compact metric space with Borel σ-algebra B(Ω). Under the weak convergence topology, M(Ω), the space of all Borel probability measures on Ω, is also a compact metric space. At time t the decision-maker observes some realization ωt ∈ Ω. Beliefs about the evolution of the process {ωt} conform to a time-homogeneous Markov structure. In standard models, this would involve a Markov probability kernel giving conditional probabilities. Here we assume that beliefs conditional on ωt are too vague to be represented by a probability measure and are represented instead by a set of probability measures.
Thus we model beliefs by a probability kernel correspondence P, that is, a (nonempty-valued) correspondence P : Ω → M(Ω), assumed to be continuous, compact-valued, and convex-valued. For each ω ∈ Ω, we think of P(ω) as the set of probability measures representing beliefs about next period's state. However, the rigorous interpretation of P is as a component of the representation of the preference ordering over consumption processes, as described in the next subsection. Anticipating somewhat the noted representation of preferences, we adapt common terminology and refer to the multivalued nature of P as reflecting uncertainty aversion of preferences (see Schmeidler (1989: Proposition, p. 532)). In fact, the multivaluedness of P reflects both the presence of uncertainty and the agent's aversion to uncertainty; for our purposes, there is no need to attempt a meaningful distinction between the "absence of uncertainty" on the one hand and the presence of uncertainty accompanied by indifference to it on the other.
Intertemporal asset pricing
If P is singleton-valued, then P = {π}, where π is a probability kernel, that is, a continuous map from Ω into M(Ω). Since this Bayesian representation of beliefs excludes any role for uncertainty, we refer to uncertainty neutrality or indifference in this case. It will be convenient to adopt the following notation: for any bounded, Borel-measurable f : Ω → R and for any set P ⊂ M(Ω),

∫ f dP ≡ inf { ∫ f dm : m ∈ P },   (18.3)

and accordingly,

P(A) ≡ inf {m(A) : m ∈ P},   A ∈ B(Ω).   (18.4)

In particular, if P = P(ω) for some ω, then

P(ω, A) ≡ inf {m(A) : m ∈ P(ω)},   (18.5)

and for any continuous f,

∫ f(·) dP(ω, ·) ≡ ∫ f dP(ω) ≡ min { ∫ f dm : m ∈ P(ω) }.   (18.6)
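On a finite state space, the "integrals" in (18.3) and (18.6) reduce to a minimum of finitely many ordinary expectations, so they are easy to compute directly. A minimal sketch (the function names are ours, for illustration only):

```python
import numpy as np

def lower_expectation(f, measures):
    """The "integral" of (18.3)/(18.6): the infimum over the set of
    measures of the ordinary expectation of f.  Each m in `measures`
    is a probability vector over the finite state space."""
    return min(float(np.dot(m, f)) for m in measures)

def lower_probability(indicator, measures):
    """Eq. (18.4): the lower envelope P(A) = inf{m(A) : m in P},
    obtained by applying the lower expectation to the indicator of A."""
    return lower_expectation(np.asarray(indicator, dtype=float), measures)
```

Unlike an ordinary expectation, this functional is not additive: in general the lower expectation of f + g weakly exceeds the sum of the lower expectations, which is precisely the failure of additivity invoked later when (18.16) and (18.18) are compared.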
Note that the latter minimum exists since P(ω) is compact and the map m → ∫ f dm is continuous by the nature of the weak convergence topology. We stretch common terminology and refer to the expressions on the left sides of (18.3) and (18.6) as "integrals" or "expected values."3

On occasion, we will want to impose a link between beliefs and "reality," at least with respect to which events are null or impossible. Suppose therefore that objectively null events are defined in the obvious way by the probability kernel π*. It is not necessary to assume that the {ωt} process evolves according to π* or any other probability kernel. Say that P is absolutely continuous with respect to π* if ∀ω ∈ Ω, ∀A ∈ B(Ω),

π*(ω, A) = 0 ⇒ m(A) = 0   ∀m ∈ P(ω);   (18.7)

that is, the objective nullity of A (π*(ω, A) = 0) implies the subjective nullity of A, with "conditioning on ω" understood throughout. For other purposes, it is useful to have the following property satisfied for all ω ∈ Ω and all continuous functions f, g : Ω → R+:

f ≥ g, π*(ω, {ω′ : f(ω′) > g(ω′)}) > 0 ⇒ ∫ f dP(ω) > ∫ g dP(ω).   (18.8)

With this in mind, we assume where explicitly stated that P has full support, that is, m(A) > 0 for all ω ∈ Ω, m ∈ P(ω), and nonempty open subsets A ⊂ Ω. Given
this assumption, the indicated strict inequality for the integrals holds if f ≥ g and f ≠ g. Thus we avoid the expositional and notational clutter associated with qualifications of the form "a.e. [π*(ω, ·)]" in various definitions and statements of theorems stated later. Note that P(ω, A) > 0 ⇒ P(ω, Ω \ A) < 1. Therefore, the assumption of full support limits the class of subjectively null events and guarantees that, at least for open sets A, P(ω, Ω \ A) = 1 ⇒ A = Ø ⇒ π*(ω, A) = 0, which is the converse of the implication in (18.7).

18.2.3. Examples of probability kernel correspondences

Many natural and useful specifications of sets of priors have been studied in the statistics literature (see, e.g., Wasserman (1990), Wasserman and Kadane (1990), and Walley (1991)), and many of these are readily extended to probability kernel correspondences. Here we describe two such examples.

Example 18.1. (ε-Contamination) Fix a probability kernel π* and a continuous function ε : Ω → [0, 1]. Let P be defined by

P(ω) ≡ {(1 − ε(ω))π*(ω) + ε(ω)m : m ∈ M(Ω)}.   (18.9)

Then the associated integrals (18.6) take the form

∫ f dP(ω) = (1 − ε(ω)) ∫ f(ω′) π*(ω, dω′) + ε(ω) · inf f,   (18.10)

and for each B ∈ B(Ω),

P(ω, B) = (1 − ε(ω))π*(ω, B) if B ≠ Ω,   and   P(ω, B) = 1 if B = Ω.   (18.11)

The correspondence P has full support if ε < 1 and supp π*(ω) = Ω for all ω ∈ Ω. It reduces to the probability kernel π* if ε ≡ 0. The other extreme, called complete ignorance, has ε ≡ 1 and P(·) ≡ M(Ω), in which case ∫ f dP(ω) = inf f.
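On a finite state space, the closed forms (18.10) and (18.11) can be verified against a brute-force minimization over the contaminating measures. A sketch (names ours, for illustration):

```python
import numpy as np

def contaminated_lower_expectation(f, pi_star, eps):
    """Eq. (18.10): inf over m in M(Omega) of the expectation of f under
    (1-eps)*pi_star + eps*m.  The inf over m of E_m[f] is min(f),
    attained at a point mass on the worst state."""
    return (1.0 - eps) * float(np.dot(pi_star, f)) + eps * float(np.min(f))

def contaminated_lower_probability(indicator, pi_star, eps):
    """Eq. (18.11): (1-eps)*pi_star(B) for a proper subset B, and 1 for
    B = Omega, since the contaminating mass can be placed entirely
    outside B unless B is the whole space."""
    B = np.asarray(indicator, dtype=bool)
    if B.all():
        return 1.0
    return (1.0 - eps) * float(np.dot(pi_star, B.astype(float)))
```

Since the infimum over contaminations is attained at a point mass, checking the formula only requires comparing against the finitely many point-mass contaminations.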
The set P(ω) includes all perturbations of π*(ω, ·), where ε(ω) reflects the amount of error deemed possible. Accordingly, a possible rationale for the specification of P given earlier is that π* represents the "true" probability law on Ω that the agent knows only imprecisely.4 Other forms of perturbation are also possible, as suggested by the examples in the references cited earlier. An attractive feature of the particular perturbation represented by (18.9) is the explicit formula (18.10) available for the associated integrals.

Example 18.2. (Belief Function Kernels) The set of states Ω is assumed to be exhaustive and therefore is presumably large and complex. Consequently, the law of motion on Ω may be too complicated to be understood precisely, or alternatively may not be representable by a probability kernel. Suppose, however, that the agent observes N statistics, each a function of the current state, and that the probability law governing the dynamics of these statistics is known. More precisely, let

G : Ω → R^N   (18.12)

and let p be a probability kernel that describes the evolution of {G(ωt)} as a time-homogeneous Markov process; that is, p(·|y) is a conditional probability measure on G(Ω) that varies continuously with y ∈ G(Ω). We assume both that G is continuous and that the inverse y → G⁻¹(y) is a continuous correspondence. Since, as described later, payoff-relevant variables, such as consumption and dividends, vary with ωt rather than G(ωt), assessment of likelihoods over Ω is essential to the agent. It is important to note that p and G do not imply a probability kernel over Ω unless G is one-to-one. However, a representation of likelihoods in terms of a probability kernel correspondence may be constructed for arbitrary G in the following intuitively plausible fashion: For any ω ∈ Ω and B ∈ B(Ω), let

µ(ω, B) ≡ p({y ∈ R^N : G⁻¹(y) ⊆ B} | G(ω)),   (18.13)

the probability according to p of those realizations of the statistics that imply B, conditional on the values of the statistics at ω. Now define PG by

PG(ω) ≡ {m ∈ M(Ω) : m(B) ≥ µ(ω, B), ∀B ∈ B(Ω)}.   (18.14)
Then PG is continuous and therefore is a probability kernel correspondence.5 To elucidate (18.14), note that for each ω ∈ Ω, µ(ω, ·) is a special capacity, called a belief function (Dempster (1967)), and PG(ω) is the "core" of µ(ω, ·); see also Wasserman (1988, 1990), Jaffray (1992), and Schmeidler (1989). Examination of the integration formulae implied by (18.14) provides further insight into the nature of PG. The analogues of (18.5)–(18.6) are

PG(ω, A) = µ(ω, A),   ω ∈ Ω,  A ∈ B(Ω),

and

∫ f dPG(ω) ≡ ∫ f*(y) dp(y | G(ω)),   (18.15)

where f* : G(Ω) → R is defined by f*(y) ≡ min{f(ω′) : G(ω′) = y}. Since f* is defined as the indicated minimum, the integral on the right side of (18.15) reflects the agent's ignorance on each level set {ω′ : G(ω′) = y}. Thus PG models the situation where the law of motion p for the statistics G is the only information available regarding the law of motion on Ω.
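When Ω is finite and the statistic G takes finitely many values, the integral (18.15) can be computed directly: push f down to f*(y), the minimum of f over the level set G⁻¹(y), then integrate against the known law p. A sketch under those assumptions (names ours):

```python
def belief_lower_expectation(f, G, p):
    """Eq. (18.15): integrate f_*(y) = min{f(w) : G(w) = y} against the
    known one-step law p of the statistics.  `G[w]` is the statistic at
    state w; `p[y]` is the probability of observing y next period."""
    total = 0.0
    for y, prob in p.items():
        level_set = [w for w in range(len(f)) if G[w] == y]  # G^{-1}(y)
        total += prob * min(f[w] for w in level_set)
    return total
```

The value agrees with the infimum of the expectation of f over the core (18.14), since the minimizing measure concentrates the mass p(y) on the worst state within each level set.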
18.2.4. Utility

This subsection completes the description of the utility function over consumption processes, the first component of which is the probability kernel correspondence P. To define the domain of consumption processes, we need some notation and terminology. The measurable space underlying all random processes is (Ω^∞, B(Ω^∞)), where B(Ω^∞) is the product Borel σ-algebra. For ω ∈ Ω^∞ and t ≥ 1, ω^t ≡ (ω1, . . . , ωt); Ω^t is the collection of all such points. Let B(Ω^t) be the product Borel σ-algebra and embed it in the usual fashion in B(Ω^∞). A process {Xt}, Xt : Ω^∞ → R^n for each t, is adapted if Xt is B(Ω^t)-measurable for all t. Given such measurability, we can identify Xt with a map from Ω^t to R^n. If each such map is also continuous, refer to the process {Xt} as a continuous process. The process is real-valued if n = 1. Consumption processes lie in the complete normed space

D ≡ { X = {Xt} : {Xt} is an adapted and continuous real-valued process, Xt(ω^t) ≥ 0 for all t ≥ 1 and ω^t ∈ Ω^t, and ||X|| ≡ sup_t sup_{ω^t} |Xt(ω^t)|/b^t < ∞ },

where b ≥ 1 is a fixed real number that provides an upper bound for the average rate of growth of consumption. The restriction to adapted processes is natural; consumption at time t can depend only on information available then. The assumption of continuity is undoubtedly less natural. Nevertheless, it affords considerable analytical simplification and is important for the analysis of equilibrium asset pricing, and therefore seems appropriate in our attempt to balance mathematical generality with economic significance and accessibility.6 Consumption processes are typically denoted by c = {ct}. Since D will also be the ambient space for utility and price processes, the "neutral" dummy variable X is used above. An element X in D is Markovian if for each t and ωt ∈ Ω, Xt(·, ωt) is constant on Ω^{t−1}, and time-homogeneous if in addition Xt does not vary with t.

Utilities over D are defined by three primitives: a probability kernel correspondence P, a discount factor β ∈ (0, 1), and an instantaneous utility or felicity function u : R+ → R+ assumed to be continuous, increasing, concave, and normalized to satisfy u(0) = 0. For each given c in D we define a utility process {Vt(c)} as the unique element of D satisfying the following recursive relation: for all t ≥ 1 and ω^t in Ω^t,

Vt(c; ω^t) = u(ct(ω^t)) + β ∫ Vt+1(c; ω^t, ·) dP(ωt, ·),   (18.16)

where Vt(c; ω^t) denotes Vt(c)(ω^t). Think of Vt(c; ω^t) as the utility of the continuation consumption process tc ≡ (ct, ct+1, . . .) conditional on the history ω^t. The (initial) utility of the entire path c is V1(c; ω1).

The interpretation of (18.16) is clear. Given the history ω^t at time t, the individual evaluates the consumption process for the remaining future in two stages. First, the
future from (t + 1) onward is evaluated by means of the "expected value" of Vt+1 with respect to beliefs P(ωt). This summary index of the future is then combined with the instantaneous utility of time-t consumption to define the utility of the consumption process from t onward. If each P(ωt) is a singleton probability measure, then (18.16) reduces to the standard model (18.2). Uncertainty aversion is incorporated into preferences in the general case by permitting P(ωt) to be multivalued.7

By routine arguments based on the contraction mapping theorem, we show in Appendix A that utilities are well defined by (18.16). To state our theorem, adopt the notation

tc|ω^{t−1} ≡ {cτ(ω^{t−1}, ·)}, τ ≥ t, an element of D,

for the continuation of c given the history ω^{t−1} preceding t. Also, if c and c′ are elements of D, write c′ > c if c′ ≠ c and c′t ≥ ct for all t; c′ ≫ c if c′ > c and there exists t such that c′t(ω^t) > ct(ω^t) for all ω^t ∈ Ω^t. Finally, if U : D → R+, say that U is increasing if c′ > c implies U(c′) ≥ U(c), and strictly increasing if c′ ≫ c implies U(c′) > U(c).

Theorem 18.1. (Existence of utility). Suppose that βb < 1. Then for each c ∈ D, there exists a unique V(c) ∈ D satisfying (18.16). Moreover, for all c, t ≥ 1, and ω^t,

Vt(c; ω^t) = V1(tc|ω^{t−1}; ωt).   (18.17)
For each ω ∈ Ω, V1(·; ω) is increasing and concave on D; it is strictly increasing if P has full support. Finally, if u satisfies a growth condition, that is, if there exist constants k1 and k2 > 0 such that u(x) ≤ k1 + k2x for all x ∈ R+, then V1(c; ω) is jointly continuous on D × Ω.

Condition (18.17) asserts that time-t utility equals a time-invariant function of the continuation consumption path tc|ω^{t−1} and the current state ωt. This follows from the time-homogeneous, first-order Markov structure assumed for beliefs. Since the time-1 designation is irrelevant, we denote V1(c; ω) simply by V(c; ω) and refer to V as the utility function defined by (18.16). By the last part of the theorem, V possesses some standard regularity properties. Note that the assumption βb < 1 is adopted throughout.

Another important property of V, or at least of the entire utility process, is dynamic consistency. The recursive construction of utility via (18.16) suggests that dynamic consistency (suitably defined) will be satisfied. To be more precise, each Vt(·; ω^t) is a utility function over D; denote by {Vt} the corresponding process of utility functions. Say that {Vt} is dynamically consistent if for all ω1 ∈ Ω, c and c′ in D, and T ≥ 1, V1(c′; ω1) > V1(c; ω1) if: (i) c′t = ct for t = 1, . . . , T − 1, (ii) VT(c′; ω1, ·) ≠ VT(c; ω1, ·), and (iii) VT(c′; ω1, ·) ≥ VT(c; ω1, ·) on Ω^{T−1}. Say that {Vt} is weakly dynamically consistent if (i)–(iii) imply only that V1(c′; ω1) ≥ V1(c; ω1). The stronger notion of dynamic consistency is the counterpart for our framework of the usual definition (e.g. Epstein and Zin (1989), Duffie
and Epstein (1992)). Only the weaker notion is satisfied in general by the process {Vt}, since the set of histories ω2, . . . , ωT−1 where VT(c′; ω1, ·) > VT(c; ω1, ·) could be "null" from the perspective of time 1 and the beliefs prevailing there, and thus V1(c′; ω1) = V1(c; ω1) is possible. That possibility is ruled out if P has full support, in which case dynamic consistency holds (see Appendix A). However, even if only weak dynamic consistency obtains, we show that our asset pricing model of Section 18.3 has an equilibrium along which optimal plans are carried out.

Risk aversion for V has not been mentioned so far since it is well defined only given the existence of probabilities that can be used to define actuarial fairness. For that purpose, suppose there exist events in B(Ω) that can be assigned probabilities; that is, suppose B* is a sub-σ-algebra of B(Ω) such that for each ω, any two measures in P(ω) agree when restricted to B*. Then V has the form (18.1) on the subdomain of consumption processes defined by B*-measurability, and so is clearly risk averse there.

Finally, in this subsection we relate our recursive model of utility to Gilboa and Schmeidler (1989) and argue that (18.16) represents a sensible extension of their atemporal model to an intertemporal framework. An alternative extension has the following form: there exists a correspondence K : Ω → M(Ω^∞), with the set of measures K(ωt) representing beliefs at (t, ωt) about the entire future, such that intertemporal utility Ut is given by

Ut(c; ω^t) = ∫ [ Σ_{i=t}^∞ β^{i−t} u(ci) ] dK(ωt) ≡ inf { ∫ Σ_{i=t}^∞ β^{i−t} u(ci) dm : m ∈ K(ωt) }.   (18.18)
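The tractability of the recursion (18.16) is concrete: on a finite state space, with a Markovian consumption plan and each P(ω) represented by finitely many priors, the utility is the fixed point of a contraction and can be computed by simple iteration. A sketch (the finite-prior representation and all names are our illustrative assumptions):

```python
import numpy as np

def recursive_utility(u_of_c, priors, beta, tol=1e-10):
    """Stationary finite-state version of (18.16):
        V(w) = u(c(w)) + beta * min_{m in P(w)} sum_{w'} m(w') V(w').
    `priors[w]` lists the (finitely many) priors in P(w) as probability
    vectors; the recursion is a contraction since beta < 1."""
    n = len(u_of_c)
    V = np.zeros(n)
    while True:
        V_next = np.array([u_of_c[w] + beta * min(float(np.dot(m, V)) for m in priors[w])
                           for w in range(n)])
        if np.max(np.abs(V_next - V)) < tol:
            return V_next
        V = V_next
```

With singleton priors the fixed point coincides with standard discounted expected utility, as in (18.2); enlarging the sets P(ω) can only lower V, reflecting uncertainty aversion.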
In comparing (18.16) and (18.18), note first that they coincide under uncertainty neutrality but not more generally. In particular, if P is a probability kernel, then, given ωt , it determines a unique probability measure p(ωt ) on ∞ and (18.18) is derived with K(ωt ) equal to the singleton {p(ωt )}. However, such a derivation of (18.18) from (18.16) fails more generally since the additivity property of Lebesgue integration with respect to a probability measure is not satisfied by “integration” with respect to a set of probability measures. Given that (18.16) and (18.18) represent distinct models of intertemporal utility, one is left wondering which is more attractive. A definitive judgment would presumably require an examination of the axiomatic underpinnings of each model.8 While such an examination is beyond the scope of this chapter, we can point to an axiomatic difference between the two models that is important and supportive of (18.16) at least when “time” is taken seriously. That feature is simply that {Ut } is generally weakly dynamically inconsistent. Therefore, in the absence of an explanation of how dynamic inconsistency is resolved, the model (18.18) does not deliver predictions about choice behavior. In an important sense, therefore, the model (18.18) is incomplete; in particular, it is not clear how it should be applied to describe consumption/savings behavior and asset price determination in the model
economy of Section 18.3. A game-theoretic resolution of dynamic inconsistency has been examined in related models, but the tractability of such an approach is a serious concern in the setting of Section 18.3. There is a closely related observation concerning (18.18) that also merits mention. One might think of adopting the specification (18.18) at t = 1 and then suitably updating the set of priors K(ωt) as time proceeds. However, any updating rule will invariably imply the weak dynamic inconsistency of preferences, excluding a "small" number of arguably uninteresting specifications for K(ωt), one of which is that K(ωt) is a single probability measure (see Epstein and LeBreton (1993)). This difficulty reflects the problematic nature of rules for updating vague beliefs, which is now well recognized (see Walley (1991: 279–281), Gilboa and Schmeidler (1993), Jaffray (1992), Epstein and LeBreton (1993)). In contrast, our model of utility delivers weak dynamic consistency. By adopting "conditional beliefs," represented by P, as the primitive, we obviate the need for an updating rule. Moreover, we feel that the recursive framework has some psychological plausibility because of the algorithmic appeal of backward induction.

18.2.5. Utility supergradients

Since we will be concerned later with the (shadow) pricing of securities, we are led naturally to an examination of the supergradients, suitably defined, of our utility function V. A novel feature of V relative to utility functions that have generally been applied previously in the macro/finance literature is that V(·; ω) is "frequently" nondifferentiable in the Gâteaux sense unless P is a probability kernel. However, since V(·; ω) is concave, it possesses one-sided Gâteaux derivatives. Here we derive representations for these one-sided derivatives and the associated supergradients. These representations are applied to the security valuation problem in Section 18.3.2.
Let e ∈ D represent a base consumption process that is everywhere strictly positive, and consider the effect on utility of perturbations in specified directions. It will suffice to consider perturbations in "today's" and "next period's" consumption only; that is, consider the change from e to e + ξh, where ξ ∈ R and h = {ht} is a continuous real-valued process such that h1 ∈ R, h2 ∈ C(Ω), and ht ≡ 0 for t ≥ 3. Note that e + ξh ∈ D for sufficiently small ξ. Therefore, V(e + ξh; ω) is defined for such ξ. Since V is defined via a minimum over a set of probability measures, as in (18.6) and (18.16), one-sided directional derivatives of V may be derived by an appropriate "envelope theorem." The one-sided derivatives are described in Lemma 18.1, which is a special case of the envelope theorem result in Aubin (1979: 118). For simplicity, the Lemma deals with the case where e is Markovian and time-homogeneous.

Lemma 18.1. Let e ∈ D be a positive, Markovian, and time-homogeneous consumption process with et(ω^t) = e*(ωt). Let h = {ht} with h1 ∈ R, h2 ∈ C(Ω), and ht = 0 for t ≥ 3. Define the convex-valued and compact-valued
correspondence Q : Ω → M(Ω) by

Q(ω) = { m ∈ P(ω) : ∫ V* dm = ∫ V* dP(ω) },   (18.19)

where

V*(ω) ≡ V(e; ω),   ω ∈ Ω.   (18.20)

Then the one-sided Gâteaux derivatives of V(·; ω) at e in the direction h are given by

dV(e + ξh; ω)/dξ |_{ξ=0+} = u′(e*(ω))h1 + β min { ∫ u′(e*)h2 dm : m ∈ Q(ω) },   (18.21)

dV(e + ξh; ω)/dξ |_{ξ=0−} = u′(e*(ω))h1 + β max { ∫ u′(e*)h2 dm : m ∈ Q(ω) }.
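The finite-state analogue of (18.21) is elementary: the map ξ ↦ min over m of m·(V + ξh) is a pointwise minimum of linear functions, so its one-sided derivatives at 0 are the min and max of m·h over the argmin set Q. A sketch (names ours):

```python
import numpy as np

def one_sided_derivatives(V, priors, h):
    """One-sided derivatives at xi = 0 of g(xi) = min_m m.(V + xi*h),
    the finite-state analogue of (18.21):
        g'(0+) = min{m.h : m in Q},   g'(0-) = max{m.h : m in Q},
    where Q = argmin_m m.V is the uncertainty-adjusted set of (18.19)."""
    vals = np.array([float(np.dot(m, V)) for m in priors])
    Q = [m for m, v in zip(priors, vals) if np.isclose(v, vals.min())]
    slopes = [float(np.dot(m, h)) for m in Q]
    return min(slopes), max(slopes)
```

When Q is not a singleton the two one-sided derivatives differ, and this kink is exactly what produces the Euler inequalities of Section 18.3.2.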
The Lemma suggests, and this will be confirmed by examples later, that utility is not Gâteaux differentiable in general. The "origin" of this nondifferentiability is clear, since utility is defined via a pointwise minimum, namely the "integral" on the right side of (18.16), corresponding to the Gilboa–Schmeidler (1989) way of modeling uncertainty aversion, and a pointwise minimum of functions is not differentiable in general. The particular representation for one-sided derivatives provided in (18.21) is also intuitive for "envelope theorem" reasons. To elaborate, and in order to pave the way for its role in our study of asset prices, we spell out the following interpretation of Q: m ∈ Q(ω) if and only if m is (i) "compatible" with beliefs, in the sense of lying in P(ω), and (ii) "equivalent" to P(ω), in the sense of the calculation of expected future utility for the given base process e. Since any single prior reflects the absence of (or indifference to) uncertainty, the relation between P(ω) and each m ∈ Q(ω) is akin to that between a random pay-off and its certainty equivalent, familiar in the case of risk. Accordingly, refer to Q(ω) as the set of uncertainty-adjusted probability measures corresponding to P(ω), for the given e. A critical question for our purposes is whether the nondifferentiability suggested by (18.21) is likely to be sufficiently frequent to be "significant." We postpone discussion of this question until Section 18.3.4, at which point the relevance of (18.16) for asset pricing will have been described.

Finally, in this subsection, we provide an alternative formulation of Lemma 18.1. Let e and h be as stated earlier. Refer to s as a (one-period ahead) supergradient
of V(·; ω) at e if s is a continuous linear functional on R × C(Ω) satisfying

V(e + h; ω) − V(e; ω) ≤ s(h1, h2)   (18.22)

for all (h1, h2) such that e + h ∈ D. Denote by M⁺(Ω) the space of positive countably additive measures on Ω, endowed with the weak topology induced by C(Ω). By the Riesz Representation Theorem, each s can be identified with an element (s1, p) ∈ R+ × M⁺(Ω) in the sense that

s(h1, h2) = s1h1 + β ∫ h2 dp,   (h1, h2) ∈ R × C(Ω).

Lemma 18.1 shows that s1 = u′(e*(ω)) and dp = u′(e*) dm for some m ∈ Q(ω). Therefore the set of supergradients of V(·; ω) at e, denoted ∂V(e; ω) and viewed as a subset of R+ × M⁺(Ω), is given by

∂V(e; ω) = { (u′(e*(ω)), p) : p ∈ M⁺(Ω), ∃m ∈ Q(ω), dp = u′(e*) dm }.   (18.23)

The continuity of the correspondence ω → ∂V(e; ω) is important in the proof of existence of an equilibrium in the economies to which we now turn.
18.3. Equilibrium asset pricing

18.3.1. The economy

We consider an extension of the Lucas (1978) pure exchange economy, having a representative agent or, equivalently, a number of agents with identical preferences and endowments. Preferences are as stated earlier, with the exception that we add the assumptions that the felicity function u is strictly increasing and continuously differentiable, with u′(0) = ∞ admissible. Such a "minimal" variation of the Lucas model seems appropriate given our focus on the effects of uncertainty aversion. There is a single perishable consumption good, with the total supply available at any time and state described by the endowment process e = {et} ∈ D. For simplicity, assume that the endowment process has a time-homogeneous Markov structure, in the sense that for some function e*,

et(ω^t) = e*(ωt),   t ≥ 1,  ω^t ∈ Ω^t,   (18.24)

and that endowments are positive, that is,

e*(ω) > 0 on Ω.   (18.25)
There are n securities, where the ith provides the dividend process di = {di,t } ∈ D. In each period, the securities are traded in a competitive market at prices qi = {qi,t } ∈ D, i = 1, . . . , n, with consumption in each period serving as
numeraire.9 Write qt ≡ (q1,t, . . . , qn,t) and q = {qt} ∈ D^n. Without loss of generality, that is, by redefining e if necessary, we can assume that each asset is available in zero net supply at all times and states. At the beginning of each period, the consumer plans consumption and portfolio holdings for the current period and all future periods in order to maximize intertemporal utility. Plans are represented by a pair (c, θ), where c ∈ D and θ = {θt} is a continuous process with θt = (θ1,t, . . . , θn,t) representing the portfolio plan for period t. Consider a time-history pair (t, ω^t). Refer to (c, θ) as (t, ω^t)-feasible if for all τ ≥ t,

qτ · θτ + cτ = θτ−1 · [qτ + dτ] + eτ,   θt−1(ω^{t−1}) ≡ 0,

and

inf { θi,τ(ω^τ) : i, τ, ω^τ } > −∞,

where the latter is a weak restriction on short sales and θ0 ≡ 0.10 Say that the (t, ω^t)-feasible plan (c, θ) is (t, ω^t)-optimal if V(tc|ω^{t−1}; ωt) ≥ V(tc′|ω^{t−1}; ωt) for all other plans (c′, θ′) that are (t, ω^t)-feasible.

An equilibrium is a price process {qt} ∈ D^n such that {(eτ, 0)} is a (t, ω^t)-optimal plan for all t ≥ 1 and ω^t ∈ Ω^t. In an equilibrium, spot asset and consumption good markets clear at any (t, ω^t) when the agent optimizes given expectations regarding future prices described by {qτ}, τ ≥ t + 1; and subsequently, those expectations are fulfilled in that they clear later spot markets. Note that the consumer is dynamically consistent in equilibrium in the sense that the given (t, ω^t)-optimal plan remains optimal from the perspective of all later times and histories. A weaker notion of equilibrium would require only that {(et, 0)} be (1, ω1)-optimal. The relation between these two equilibrium notions is clarified in Theorem 18.2. The term "equilibrium" is reserved for the first definition.

18.3.2. Euler inequalities

In this section we derive necessary conditions for an equilibrium from the first-order conditions for the agent's optimization problem. These conditions generalize the standard Euler equations; they take the form of inequalities, rather than equalities, because V(·; ω) is generally nondifferentiable in the Gâteaux sense (see Section 18.2.5) unless P is a probability kernel. Suppose {qt} is an equilibrium. At any given (t, ω^t), consider a variation (c, θ) of the optimal policy such that cτ = eτ and θτ = 0 for τ ≠ t, t + 1, ct = et − ξ(η · qt), θt = ξη, θt+1 = 0, and ct+1 = et+1 + ξη · (qt+1 + dt+1), where η ∈ R^n represents the direction in which the period-t portfolio is perturbed and ξ ∈ R represents the "size" of the perturbation. Any such perturbation must leave the agent worse off. In other words, if

ht ≡ −η · qt   and   ht+1 ≡ η · (qt+1 + dt+1),
then in the obvious notation,

0 ∈ argmax_ξ V(e + ξ(ht, ht+1, 0, . . .); ωt).   (18.26)

By Lemma 18.1, the first-order conditions for this problem take the form11

β min { ∫ u′(e*) η · (qt+1 + dt+1) dm : m ∈ Q(ωt) } ≤ u′(et) η · qt ≤ β max { ∫ u′(e*) η · (qt+1 + dt+1) dm : m ∈ Q(ωt) }.

We can rewrite these inequalities in the more compact and equivalent form

min { βEm[ (u′(et+1)/u′(et)) η · (qt+1 + dt+1) ] − η · qt : m ∈ Q(ωt) } ≤ 0   ∀η ∈ R^n,   (18.27)

where Em denotes integration with respect to the probability measure m. Denote by F(m, η) the expression being minimized in (18.27). We wish to express this infinite collection of inequalities in a more efficient and useful way. In usual formulations, where differentiability obtains, there is no loss in restricting attention to the coordinate directions. Such an equivalence fails here, however, since the expression in (18.27) is not linear in η, or equivalently, the one-sided Gâteaux derivatives of V described in Lemma 18.1 are not linear in the perturbation. Therefore, a slightly more elaborate procedure is required. First, rewrite (18.27) in the more manageable form

sup_η min { F(m, η) : m ∈ Q(ωt) } ≤ 0.
Since F(m, ·) is linearly homogeneous, this inequality is equivalent to

max_{η∈γ} min_{m∈Q(ωt)} F(m, η) ≤ 0,

where γ is the convex hull of {±ith unit coordinate vector : i = 1, . . . , n}. By Fan's Theorem (see Appendix B), the latter inequality is equivalent to

min_{m∈Q(ωt)} max_{η∈γ} F(m, η) ≤ 0.

By the Maximum Theorem, there exists m* ∈ Q(ωt) for which the minimum over Q(ωt) is attained, so that max_{η∈γ} F(m*, η) ≤ 0. By the linear homogeneity of F(m*, ·) and the fact that η ∈ γ ⇔ −η ∈ γ, we conclude that for all η ∈ γ,

F(m*, η) = min_{m∈Q(ωt)} max_{η′∈γ} F(m, η′) = 0.   (18.28)

Since for each m, F(m, ·) is linear, max{F(m, η) : η ∈ γ} is attained on the set of extreme points of γ. We arrive finally at the following system of Euler inequalities
that must be satisfied in equilibrium: for all (t, ω^t),

min_{m∈Q(ωt)} max_i | βEm[ (u′(et+1)/u′(et)) (qi,t+1 + di,t+1) ] − qi,t | = 0.   (18.29)
The presence of the minimization over Q(ωt) on the left side of (18.29) justifies our use of the term "inequalities" to refer to (18.29), in spite of the equality with zero. The inequality nature of (18.29) is highlighted in the single-asset case (n = 1), where it reduces, in the obvious notation, to

min_{m∈Q(ωt)} βEm[ (u′(et+1)/u′(et)) (qt+1 + dt+1) ] ≤ qt ≤ max_{m∈Q(ωt)} βEm[ (u′(et+1)/u′(et)) (qt+1 + dt+1) ].   (18.30)

When n > 1, (18.29) implies an inequality analogous to (18.30) for each asset, but this collection of n inequalities is not exhaustive, for the reasons given earlier concerning the nonlinearity of one-sided Gâteaux derivatives. Of course, if P is a probability kernel, then both P(ωt) and Q(ωt) are singletons and (18.29) reduces to the standard Euler equation

qi,t = βEµ(ωt,·)[ (u′(et+1)/u′(et)) (qi,t+1 + di,t+1) ],   for all i.

18.3.3. Equilibrium

The Euler inequalities are not only necessary, but also sufficient for an equilibrium; that is, any price process {qt} satisfying (18.29) is an equilibrium, as we show shortly. To establish the existence of solutions to (18.29), and therefore of equilibria, we need to restrict the probability kernel correspondence P. To formulate the added assumption, define the correspondence Qf from Ω into M(Ω), for any given f ∈ C(Ω), by

Qf(ω) ≡ argmin { ∫ f dm : m ∈ P(ω) }.   (18.31)

Assumption (Strict Feller property for P). Qf is a continuous correspondence for each f ∈ C(Ω).

If P is a probability kernel, then Qf is continuous since Qf = P. Another trivial case, termed i.i.d. beliefs, has P(ω) independent of ω; then Qf is constant and a fortiori continuous. The continuity of Qf is trivial also if Ω is finite and endowed with the discrete topology. More generally, we can infer from the continuity of P and the Maximum Theorem only that Qf is upper semi-continuous. The interpretation of the strict Feller property is facilitated by reference to Section 18.2.5.
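The indeterminacy permitted by (18.30) is easy to see in the simplest stationary case: one asset, constant endowment (so that Q = P), and i.i.d. beliefs. A constant price q then satisfies the single-asset Euler inequalities if and only if it lies in a closed interval whose endpoints are determined by the extreme priors. A sketch under those assumptions (names ours):

```python
import numpy as np

def stationary_price_bounds(dividends, priors, beta):
    """One asset, constant endowment, i.i.d. multiple priors: a constant
    price q satisfies (18.30) iff
        beta*min_m E_m[d]/(1-beta) <= q <= beta*max_m E_m[d]/(1-beta),
    since (18.30) then reads beta*(q + min_m E_m[d]) <= q <= beta*(q + max_m E_m[d])."""
    exp_d = [float(np.dot(m, dividends)) for m in priors]
    return beta * min(exp_d) / (1.0 - beta), beta * max(exp_d) / (1.0 - beta)
```

With a nondegenerate set of priors the interval has positive length, which is the nonuniqueness of equilibrium prices discussed at the end of the section.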
From (18.23), we see that it implies that the set of supergradients of V (· ; ω) varies continuously with ω. From the perspective of the question
Intertemporal asset pricing
445
of the existence of equilibria, such "continuous superdifferentiability" is the essential content of the added assumption. Note that for the proof of existence in the economy corresponding to the specific endowment process $e$, it suffices that $Q_{V^*}$ be continuous (see Lemma 18.1 and note that $Q_{V^*} = Q$). In particular, existence is guaranteed if $e^*$ is constant, since then $V^*$ is constant and thus $Q = P$. The proof of existence of solutions to the Euler inequalities now proceeds as follows.12 Since $Q$ is compact- and convex-valued and continuous, it admits a continuous selection; that is, there exists a sequence of probability kernels $\{\pi_t\}$ such that $\pi_t(\omega_t, \cdot) \in Q(\omega_t)$ for all $t$ and $\omega_t$. Now consider the equation

$$q_{i,t} = \beta E_{\pi_t(\omega_t,\cdot)}\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\,(q_{i,t+1}+d_{i,t+1})\right] \qquad (18.32)$$
for all $t$, $\omega_t$, and $i$. By contraction mapping arguments (as extended in Lemma 18.A.1), one can prove the existence of a unique (given $\{\pi_t\}$) price process satisfying (18.32). For that solution $q$, the Euler inequalities follow immediately. The arguments given earlier lead us to the following central theorem, the proof of which is completed in Appendix B:

Theorem 18.2. (Existence and characterization of equilibria): (a) The set of equilibria coincides with the set of price processes satisfying (18.29). (b) If $P$ satisfies the strict Feller property, there exist equilibria. (c) If $P$ has full support, then $q \in D^n$ is an equilibrium if and only if $\{(e_\tau, 0)\}_{\tau=1}^\infty$ is $(1, \omega_1)$-optimal for all $\omega_1 \in \Omega$.

Part (c) shows that the two equilibrium notions described earlier coincide if $P$ has full support. This is not surprising in light of the dynamic consistency property of the utility process implied by the full support assumption, as discussed in Section 18.2.4. There exists an equilibrium for each sequence of selections $\{\pi_t\}$ used as in (18.32), implying that there may be many equilibria in our economy. This nonuniqueness is related to the findings of Dow and Werlang (1992), who show, in a static model with one risky and one riskless asset, that there exists a set of asset prices that support the optimal choice of a riskless portfolio. Here we extend their analysis to an infinite-horizon, multiple-asset framework and show that the nonuniqueness of supporting prices is not restricted to riskless positions. Simonsen and Werlang (1991) also observe the potential nonuniqueness of supporting prices under uncertainty aversion in a static setting. Note also that the nonuniqueness of prices and its "origin" in the multiplicity of underlying priors accord well with Keynes' intuition. He writes (1936: 152) that the "existing market valuation . . .
cannot be uniquely correct, since our existing knowledge does not provide a sufficient basis for a calculated mathematical expectation.” In order to discuss further the nonuniqueness or indeterminacy of equilibrium prices, adopt the following notation and terminology: Denote by E the set of all
446
Larry G. Epstein and Tan Wang
equilibria. Say that the price of the $i$th security is determinate if for all $q$ and $q'$ in $E$, $\{q_{i,t}\}_{t=1}^\infty = \{q'_{i,t}\}_{t=1}^\infty$.

Theorem 18.3. (Structure of the set of equilibria). If $P$ satisfies the strict Feller property, then: (a) $E$ is a closed and connected subset of $D^n$. (b) For each $i$, the equations

$$\bar q_{i,t} = \beta \max_{m\in Q(\omega_t)} E_m\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\,(\bar q_{i,t+1}+d_{i,t+1})\right]$$

and

$$\underline q_{i,t} = \beta \min_{m\in Q(\omega_t)} E_m\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\,(\underline q_{i,t+1}+d_{i,t+1})\right] \qquad (18.33)$$

have unique solutions in $D$, denoted $\bar q_i$ and $\underline q_i$, respectively. These solutions satisfy the condition that for any $q \in E$ and for any $i$ and $t$,

$$\underline q_{i,t} \le q_{i,t} \le \bar q_{i,t} \quad \text{on } \Omega_t. \qquad (18.34)$$

Moreover, given $i$, $t$, and any $\varepsilon > 0$, there exist $q^1$ and $q^2$ in $E$ such that

$$q^1_{i,t} \le \underline q_{i,t} + \varepsilon \quad \text{and} \quad q^2_{i,t} \ge \bar q_{i,t} - \varepsilon \quad \text{on } \Omega_t. \qquad (18.35)$$

Finally, the $i$th security price is indeterminate if and only if for some $t$

$$\underline q_{i,t} \not\equiv \bar q_{i,t}, \qquad (18.36)$$

in which case $\{q_i : q \in E\}$ is an uncountably infinite set.13

Part (a) provides some information regarding the size of $E$. Since $E$ is a connected complete metric space, it follows from the Baire category theorem (Royden (1988: 159)) that if the equilibrium is not unique, then there exists an uncountable infinity of equilibria. This is confirmed by part (b). The latter first provides, via (18.34), bounds for the equilibrium price of any security and then shows that these bounds are tight, in the natural sense of (18.35). Finally, (18.36) provides necessary and sufficient conditions for price indeterminacy. In special circumstances, those conditions assume a simpler form. For instance, the condition

$$\min_{m\in Q(\omega_t)} E_m\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\right] \;\ne\; \max_{m\in Q(\omega_t)} E_m\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\right]$$

characterizes the indeterminacy of the price of a one-period discount bond issued at $(t, \omega_t)$ and paying one unit of consumption at $(t + 1)$. Intuitively, we would expect a link between indeterminacy of asset prices and intertemporal price volatility. This intuition can be confirmed in the special case of "i.i.d. beliefs," that is, where $P(\omega)$ is independent of $\omega$, in which case the
correspondence $Q$ is also constant. Hence, for a security with a time-homogeneous dividend process, if the price of the security is determinate, then it must be constant (across time and states). Consequently, any fluctuation in price is a reflection of indeterminacy. More generally, the link between indeterminacy and volatility can be thought of in the usual way in terms of the existence of "sunspot equilibria." That is, if the selection $\{\pi_t\}$ from $Q$ (see (18.32)) is made to depend on a "sunspot" or "extrinsic" variable, then the corresponding equilibrium price process will also depend on that variable.14

The discussion to this point has assumed implicitly that price indeterminacy is a significant feature of our model in the sense of occurring on a "nonnegligible" set of economies. That this assumption is warranted is most easily demonstrated in the context of specific examples of probability kernel correspondences, and so we defer further discussion to the next section.

The final result of this section provides a further characterization of equilibria. Let $q$ be an equilibrium and reconsider (18.28). For the given $t$, we will now consider $\omega_t$ to be variable, and thus the dependence of $F$ on $\omega_t$ (through $q_t$ and $q_{t+1}$) is made explicit by writing $F(m, \theta, \omega_t)$. From (18.28) and the linearity of $F(m, \cdot, \omega_t)$ we derive

$$\min_{m\in Q(\omega_t)} g(m, \omega_t) = 0,$$

where $g(m, \omega_t) \equiv \max\{F(m, \theta, \omega_t) : \theta \text{ an extreme point of } \Gamma\}$. By the Maximum Theorem, $g$ is continuous and the correspondence of minimizers given earlier is upper semicontinuous. Therefore, it admits a measurable selection (Klein and Thompson (1984: Theorem 4.2.1)); that is, there exists for each $t$

$$\xi_t : \Omega_t \to M(\Omega) \text{ measurable}, \quad \xi_t(\omega_t, \cdot) \in Q(\omega_t) \quad \forall \omega_t \in \Omega_t \qquad (18.37)$$
and $g(\xi_t(\omega_t, \cdot), \omega_t) \equiv 0$. Substitution of the appropriate expressions for $g$ and $F$ establishes the nontrivial portion of the following result.

Theorem 18.4. (Further characterization of equilibria). $q$ is in $E$ if and only if $q$ is in $D^n$ and, for some $\{\xi_t\}$ as in (18.37), $q$ satisfies

$$q_{i,t} = \beta E_{\xi_t(\omega_t,\cdot)}\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\,(q_{i,t+1}+d_{i,t+1})\right], \qquad (18.38)$$

for all $t$ and $i$.15

The characterization provided by Theorem 18.4 is helpful in placing our model of asset price determination in the context of the literature. In order to proceed, adopt the standard assumption that the actual evolution of $\{\omega_t\}$ is described by a probability kernel $\pi^*$. In place of the rational expectations hypothesis that $\pi^*$ is known precisely by the agent, assume instead that $Q$ is absolutely continuous with respect to $\pi^*$, for which it suffices that the probability kernel correspondence $P$
be absolutely continuous (recall (18.7)). Such absolute continuity is assured if $\Omega$ is finite and $\pi^*(\omega, \omega') > 0$ for all $\omega$ and $\omega'$ in $\Omega$. Denote by $z_{t+1}(\omega_t, \cdot) : \Omega \to \mathbb{R}_+$ the Radon–Nikodym derivative of $\xi_t(\omega_t, \cdot)$. Then (18.38) has the form

$$q_{i,t} = \beta E_{\pi^*(\omega_t,\cdot)}\!\left[z_{t+1}\,\frac{u'(e_{t+1})}{u'(e_t)}\,(q_{i,t+1}+d_{i,t+1})\right]. \qquad (18.39)$$

By construction, $\{z_{t+1}\}$ is restricted by

$$z_{t+1} \ge 0, \quad \int z_{t+1}\,d\pi^*(\omega_t, \cdot) \equiv 1, \quad \text{and} \quad \xi_t(\omega_t, \cdot) \in Q(\omega_t), \quad d\xi_t(\omega_t, \cdot) \equiv z_{t+1}(\omega_t, \cdot)\,d\pi^*(\omega_t, \cdot). \qquad (18.40)$$

The relations (18.39), without (18.40) or other restrictions on $\{z_{t+1}\}$, can be established under fairly general conditions and contain commonly considered models as special cases (see Hansen and Richard (1987) and Hansen and Jagannathan (1991)). Generally, (18.39) is rewritten in terms of the "stochastic discount factors" $\gamma_{t+1} \equiv \beta z_{t+1}u'(e_{t+1})/u'(e_t)$ in the form

$$q_{i,t} = E_{\pi^*(\omega_t,\cdot)}\left[\gamma_{t+1}(q_{i,t+1}+d_{i,t+1})\right], \quad i = 1, \ldots, n. \qquad (18.41)$$
Since one can always find some $\{\gamma_{t+1}\}$ so that (18.41) is satisfied, the empirical content of any particular model of asset prices is represented by the restrictions it imposes on the discount factors $\{\gamma_{t+1}\}$ or, equivalently, on $\{z_{t+1}\}$. For our model, those restrictions are represented by (18.40). The standard Lucas-based rational expectations model imposes the stronger restriction $\{z_{t+1}\} \equiv 1$. See Cochrane and Hansen (1992) for examples of other restrictions on stochastic discount factors that have been studied in the literature.

18.3.4. Examples

We illustrate and elaborate upon our analysis of asset price determination in the context of the two examples of probability kernel correspondences of Section 18.2.3. Then, in order to lend indirect support to our "explanation" of price indeterminacy, we examine another model where indeterminacy can occur: a Lucas-style model where the felicity function $u$ is not differentiable. Finally, we consider briefly an example of an economy where agents are uncertainty averse and heterogeneous, so that trade may occur.

$\varepsilon$-Contamination. It follows from (18.10) that for any $f \in C(\Omega)$,

$$Q_f(\omega) = \left\{(1 - \varepsilon(\omega))\pi^*(\omega) + \varepsilon(\omega)m : m \in M\!\left(\operatorname*{argmin} f\right)\right\}.$$

Therefore, the strict Feller property is satisfied. In the particular case $f = V^*$ (see Lemma 18.1),

$$Q(\omega) = (1 - \varepsilon(\omega))\pi^*(\omega) + \varepsilon(\omega)M(\Omega_m), \quad \Omega_m \equiv \operatorname*{argmin} V^*. \qquad (18.42)$$
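For the $\varepsilon$-contamination kernel, lower and upper conditional expectations have the simple closed form $(1-\varepsilon)E_{\pi^*}f + \varepsilon \min f$ (respectively $\max f$), which makes the price bounds of Theorem 18.3 easy to compute. A numerical sketch under assumed parameter values (i.i.d. beliefs, constant endowment, two states; none of the numbers come from the text):

```python
import numpy as np

# Assumed parameters: i.i.d. epsilon-contaminated beliefs, constant
# endowment so u'(e_{t+1})/u'(e_t) = 1, two states.
beta, eps = 0.95, 0.1
pstar = np.array([0.6, 0.4])   # reference measure pi*
d = np.array([1.0, 0.5])       # time-homogeneous dividend d*(omega)

def e_min(f):
    """Lower expectation over the contaminated set of priors."""
    return (1 - eps) * float(pstar @ f) + eps * float(np.min(f))

def e_max(f):
    """Upper expectation over the contaminated set of priors."""
    return (1 - eps) * float(pstar @ f) + eps * float(np.max(f))

# Iterate the two recursions in (18.33); with i.i.d. beliefs and a
# constant endowment the bounds are constant over time and states.
q_lo = q_hi = 0.0
for _ in range(2000):
    q_lo = beta * e_min(q_lo + d)
    q_hi = beta * e_max(q_hi + d)

print(q_lo, q_hi)   # the whole interval [q_lo, q_hi] is supportable
```

Since the dividend is nonconstant and $e^*$ is constant (so $\Omega_m = \Omega$), criterion (18.43) below predicts indeterminacy, and indeed `q_lo < q_hi`; setting `eps = 0` collapses the interval to a single Lucas price.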
For simplicity, assume henceforth that dividend processes are Markovian and time-homogeneous,16

$$d_{i,t}(\omega_t) = d_i^*(\omega_t), \quad \text{for all } i, t, \text{ and } \omega_t.$$

Then it follows from Theorem 18.4 and (18.42) that the price of security $i$ is indeterminate if and only if

$$u'(e^*)d_i^* \text{ is nonconstant on } \Omega_m. \qquad (18.43)$$

The essential economic (as opposed to mathematical) content of this restriction is that knowledge of the level of intertemporal utility $V^*$ is not sufficient to infer the weighted dividend $u'(e^*)d_i^*$, or more precisely,

$$u'(e^*)d_i^* \text{ is not } V^*\text{-measurable.}^{17} \qquad (18.44)$$

The conditions for indeterminacy simplify if we consider the i.i.d. case where $\varepsilon(\omega)$ and $\pi^*(\omega, \cdot)$, and hence also $P(\omega)$ and $Q(\omega)$, are independent of $\omega$. Then $V^*(\cdot) = u(e^*(\cdot)) + \text{constant}$. Therefore, by (18.43), the $i$th price is indeterminate if and only if

$$d_i^* \text{ is nonconstant on } \operatorname*{argmin} e^*, \qquad (18.45)$$

which, in the sense explained in the preceding footnote, is tantamount to18

$$d_i^* \text{ is not } e^*\text{-measurable.} \qquad (18.46)$$
This will be the case, for example, if there exist state variables affecting dividends that do not influence consumption. Since that is a plausible hypothesis, we conclude that our model predicts price indeterminacy for a "broad" or at least economically interesting class of dividend and endowment processes. Moreover, note that the model delivers predictions regarding the cross-sectional (across-asset) variation of the degree of indeterminacy. That is, referring to Theorem 18.4, we see that

$$\underline q_{j,t} \le \underline q_{i,t} \le \bar q_{i,t} \le \bar q_{j,t} \quad \text{if} \quad \min_{\Omega_m} d_j^* \le \min_{\Omega_m} d_i^* \le \max_{\Omega_m} d_i^* \le \max_{\Omega_m} d_j^*,$$

where, given the i.i.d. assumption, $\Omega_m = \operatorname*{argmin} e^*$. Therefore, asset $j$ features a "large" degree of indeterminacy in its price if $[\min_{\Omega_m} d_j^*, \max_{\Omega_m} d_j^*]$ is large, which interval provides a measure of the extent to which $d_j^*$ is "unpredictable" given consumption.19

Finally, consider the counterpart of (18.39)–(18.40), under the assumptions that the contamination function $\varepsilon$ is constant, $\Omega$ is finite, and $\pi^*(\omega, \omega') > 0$ for all
$\omega$ and $\omega'$ in $\Omega$, ensuring thereby the absolute continuity of $P$ with respect to $\pi^*$. Then (18.40) is equivalent to20

$$\int z_{t+1}\,d\pi^*(\omega_t, \cdot) = 1, \quad z_{t+1} \ge 1 - \varepsilon, \quad \text{and} \quad z_{t+1}(\omega_t, \cdot) = 1 - \varepsilon \quad \text{on } \Omega \setminus \Omega_m, \qquad (18.47)$$

and the associated Euler equations (18.39) take the form

$$\beta^{-1} = E_{\pi^*(\omega_t,\cdot)}\!\left[\frac{u'(e_{t+1})}{u'(e_t)}\,z_{t+1}R_{i,t+1}\right], \quad i = 1, \ldots, n, \qquad (18.48)$$

where $R_{i,t+1} \equiv [q_{i,t+1} + d_{i,t+1}]/q_{i,t}$. The potential empirical significance of (18.47)–(18.48) can be illustrated through the analysis of stochastic discount factors in Hansen and Jagannathan (1991), for example. They infer from asset price and aggregate consumption data for the United States that stochastic discount factors that rationalize the data in the sense of (18.41) must have a large variance. The indicated variance is often considered too extreme to be compatible with any "reasonable" model of fundamentals and is occasionally interpreted as evidence for "fads" (Poterba and Summers (1988)). In particular, the consumption-based model, having $z_{t+1} \equiv 1$, is rejected in this way because consumption is too smooth. It is interesting, therefore, to examine whether our specific model of discount factors (18.47) is compatible with a large variance. To highlight the role of uncertainty, we make the challenge facing our model of discount factors as difficult as possible and assume the extreme case of "smooth consumption," $e^*$ constant. We then compute $\mathrm{mvar}(\varepsilon)$, the maximum variance of limiting distributions corresponding to some $\{z_{t+1}\}$ satisfying (18.47) and ergodicity. (Ergodicity justifies the approximation of moments of the limiting distribution by appropriate sample moments.) Assuming that $\{\omega_t\}$ under $\pi^*$ is ergodic with limiting distribution described by $p \in M(\Omega)$, we find that21

$$\mathrm{mvar}(\varepsilon) = \varepsilon^2\left(1 - \min_{\omega\in\Omega} p(\omega)\right)^2 \Big/ \min_{\omega\in\Omega} p(\omega).$$

A consideration in evaluating the implications of this expression is that the underlying state space $\Omega$, and therefore also $\pi^*$, may not be observable to the analyst even if the probability distributions induced by $\pi^*$ on dividends and rates of return are observable or estimable. Note accordingly that for any given $\varepsilon > 0$, $\mathrm{mvar}(\varepsilon) \to \infty$ as $\min_{\omega\in\Omega} p(\omega) \to 0$. It follows that, unless the analyst insists on maintaining assumptions on $\Omega$ and $\pi^*$ that are themselves arguably irrefutable, our model does not restrict the variance of discount factors. Moreover, the above is true for any fixed $\varepsilon > 0$, even arbitrarily small. This suggests, therefore, that some heretofore anomalous features of asset return data can be accommodated if we introduce a "small" amount of uncertainty aversion into the standard model.

The above is not to suggest that other important empirical puzzles are similarly resolvable or that the model (18.47)–(18.48) is irrefutable. Indeed,
in other dimensions the empirical restrictiveness of the generalization (18.42) diminishes "continuously" as $\varepsilon$ increases from 0, the standard model, to the extreme of complete ignorance, $\varepsilon = 1$. For example, assuming for simplicity that $e^*$ is constant, it follows from (18.47)–(18.48) that the return to a one-period pure discount bond equals $\beta^{-1}$ and that

$$E_{\pi^*(\omega_t,\cdot)}R_{t+1} - \beta^{-1} \;\le\; \varepsilon\, E_{\pi^*(\omega_t,\cdot)}\!\left[R_{t+1} - \min R_{t+1}\right],$$

where $R_{t+1} \equiv (q_{t+1} + d_{t+1})/q_t$. Consequently, the largest admissible equity premium is small if $\varepsilon$ is small.22

Belief Function Kernels. Let $f \in C(\Omega)$ and define $\psi_f(y) \equiv \operatorname*{argmin}\{f(\omega) : G(\omega) = y\}$. From (18.15) (see also Wasserman (1990: Theorem 2.1)), it follows that for any given $f \in C(\Omega)$,

$$Q_f(\omega) = \Bigl\{m \in M(\Omega) : m(\cdot) = \int_{G(\Omega)} r(y)(\cdot)\,dp(y \mid G(\omega)) \text{ for some function } r : G(\Omega) \to M(\Omega) \text{ such that } r(y)(\psi_f(y)) = 1 \text{ for all } y\Bigr\}. \qquad (18.49)$$

Therefore, $Q_f$ is a continuous correspondence and $P_G$ satisfies the strict Feller property if the mapping $y \mapsto p(\cdot \mid y)$ is continuous in the strong topology. If $f$ is set equal to $V^*$ in (18.49), we obtain a representation for elements of $Q$ as a suitable mixture of measures $\{r(y) : y \in G(\Omega)\}$, where $r(y)$ has support on $\psi_{V^*}(y)$. Since $V^*$ is constant on each $\psi_{V^*}(y)$, every $m \in Q(\omega)$ induces the identical probability distribution for $V^*$. Nevertheless, $Q(\omega)$ is a nonsingleton if the set of minimizers $\psi_{V^*}(y)$ is a nonsingleton for "many" $y$ values, since then there are many possible choices for the measure $r(y)$ supported on $\psi_{V^*}(y)$. Arguing as in the preceding example, we can show that the essential economic characterization of indeterminacy for the $i$th security price is the condition

$$u'(e^*)d_i^* \text{ is not } (G, V^*)\text{-measurable}; \qquad (18.50)$$

that is, the level of the weighted dividend $u'(e^*)d_i^*$ cannot be inferred from knowledge of the levels of the statistics $G$ and intertemporal utility $V^*$. This can be expected to be the case in situations where the statistics $G$ provide only a crude summary of the underlying state.

Nondifferentiable Lucas Model. Price indeterminacy can occur also in a Lucas-style model where the felicity function $u$ is not necessarily differentiable. However, such an "explanation" of indeterminacy differs from ours in two important respects. First, it does not capture Keynes' intuition, in the citation given earlier, regarding the link between uncertainty and indeterminacy. In our model, $V(c) = \sum_{t=0}^{\infty} \beta^t u(c_t)$ for deterministic consumption processes, that is, those for which each $c_t$ is a
constant function. Therefore, all the usual regularity properties, including the uniqueness of supporting prices, are satisfied on the domain of deterministic consumption processes, supporting our assertion that indeterminacy is due to uncertainty. In contrast, in the modified Lucas model, supporting prices are nonunique even for deterministic consumption processes. The second important difference concerns the robustness of the prediction of indeterminacy. Since $u$ can fail to be differentiable only on a zero-Lebesgue-measure subset $K$ of $\mathbb{R}$, security prices are determinate in the Lucas model as long as all conditional distributions assign zero probability to consumption lying in $K$. For example, if the endowment process is constant with $e^* \equiv \bar e$, then security prices are determinate for all $\bar e \notin K$. On the other hand, for the constant endowment case our model predicts indeterminacy for all values of $\bar e$ and all securities paying nonconstant dividends (see (18.43), for example, and note that $\Omega_m = \Omega$ if $e^*$ is constant). More generally, we have argued earlier that in our model indeterminacy occurs in a "large" set of economies.

Heterogeneous Agents. Our "justification" for representative agent modeling is the usual one, namely that it provides a simple way to organize observations in terms of familiar microeconomic principles and notions. One may also take a more stringent view and ask whether such a model can be justified theoretically in the context of an economy with heterogeneous agents. Here we adopt such an approach and prove a complete-markets aggregation theorem along the lines of Constantinides (1982), thereby providing an additional "example" to which our representative agent analysis applies. The example serves also to suggest an alternative interpretation for our price indeterminacy result in a model with trade and to clarify the "real" consequences of Knightian uncertainty in our model.
Expand the economy defined in Section 18.3.1 to admit $H$ consumers, where consumer $h$ has intertemporal utility function $V^h$ corresponding to discount parameter $\beta$, belief kernel correspondence $P$, and felicity function $u^h$, the only source of differences in consumer preferences. (Though restrictive, these assumptions are weaker than those in Constantinides (1982), where the standard single-prior representation of beliefs is also imposed.) Specialize $P$ further so that it implies i.i.d. beliefs ($P(\omega)$ independent of $\omega$), has full support, and is based on the capacity representation of beliefs (Schmeidler (1989)); both the $\varepsilon$-contamination and belief function kernel examples fulfill the latter requirement. (See Appendix C for clarification and for a proof of the assertions given later under more general assumptions.) Though we will be interested in the competitive equilibria of a decentralized economy, it is useful first to characterize Pareto optimal allocations given the earlier mentioned preferences, an aggregate endowment process $c$ (possibly different from $e$), and the initial state $\omega_0$. For the usual reasons, it is enough to consider, for each vector $\alpha = (\alpha_h)_{h=1}^H$ of nonnegative utility weights, the planning problem

$$U^\alpha(c; \omega_0) \equiv \max\left\{\sum_{h=1}^H \alpha_h V^h(c^h; \omega_0) : c^h \in D,\ \sum_{h=1}^H c^h = c\right\}, \quad c \in D. \qquad (18.51)$$
This U α is a candidate utility for the representative agent in the decentralized economy specified in the usual way (see e.g. Duffie (1992: Chapter 2)). Consumers
begin with endowments $e^h \in D$ of consumption and zero shares of each asset and then trade in complete asset markets. Focus on a (Pareto optimal) equilibrium allocation and denote by $q \in D^n$ a corresponding equilibrium price and by $\alpha$ the utility weights corresponding to (18.51). Then, by suitable adaptations of Duffie (1992: 9–11), $q$ is also an equilibrium in the single-agent model with aggregate endowment $e$ and intertemporal utility $U^\alpha$. The agent with utility $U^\alpha$ is "representative" if the intertemporal utility function $U^\alpha$ lies in the same recursive class, defined in Section 18.2, containing the individual utilities. Under our assumptions, this is indeed the case: The standard risk-sharing rule, which is Pareto optimal in the expected utility framework of Constantinides, is to allocate the endowment $x$ at any time $t$ and state $\omega_t$ by solving

$$u^\alpha(x) \equiv \max\left\{\sum_{h=1}^H \alpha_h u^h(x_h) : \sum_{h=1}^H x_h = x\right\}. \qquad (18.52)$$
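The static problem (18.52) separates across states, so Pareto optimal processes can be built state by state. A minimal numerical sketch with two agents and logarithmic felicities (a functional form chosen here only because it yields a closed-form proportional-sharing rule, not one imposed in the text):

```python
import numpy as np

# Two agents, log felicities u_h = log, assumed weights and endowment.
alpha = np.array([0.3, 0.7])
x = 10.0

# With u_h = log, the maximizer of alpha_1*log(x_1) + alpha_2*log(x - x_1)
# is proportional sharing: x_h = (alpha_h / sum(alpha)) * x.
share = alpha / alpha.sum() * x

# Verify against a fine grid search over feasible allocations.
x1 = np.linspace(1e-6, x - 1e-6, 100001)
objective = alpha[0] * np.log(x1) + alpha[1] * np.log(x - x1)
best = x1[np.argmax(objective)]

print(share, best)
```

Applying this rule at every $(t, \omega_t)$ with $x = c_t(\omega_t)$ yields the efficient processes referred to in the text.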
Under our assumptions, this risk-sharing rule continues to be efficient given aversion to Knightian uncertainty; that is, the set of processes $\{\bar c^h\}_{h=1}^H$ solves (18.51) if and only if $\{\bar c_t^h(\omega_t)\}_{h=1}^H$ solves (18.52) for all $t$, $\omega_t$, and $x = c_t(\omega_t)$. It follows that $U^\alpha$ is the recursive intertemporal utility function corresponding, in the sense of our paper, to $\beta$, $P$, and $u^\alpha$. This aggregation result "justifies" the application to aggregate data of our Euler inequalities (18.29) or the discount-factor model (18.39)–(18.40). However, interpretations of the indeterminacy of prices and its potential empirical relevance must be revised. That is because it is generally not the case that every equilibrium $q$ for the representative agent economy with utility $U^\alpha$ is also a competitive equilibrium for the given initial endowments $\{e^h\}$. The situation is easily visualized in the context of an Edgeworth box, where at an arbitrary point on the contract curve there exists a continuum of price lines that separate the better-than sets for the two agents, but these lines do not all pass through the given initial endowment. Since not all selections from the set of representative agent equilibria are warranted, our earlier discussion of sunspots, animal spirits, and price volatility seems wrong from this perspective. However, not all is lost if the analyst does not know the initial micro endowments. Indeed, if she knows nothing at all about them other than that they sum to $e$, then from her perspective all equilibria in the representative agent model have equal standing, and the significance of price indeterminacy in the representative agent model is restored. More generally, one would expect there to remain a continuum of representative agent price equilibria that are consistent with the analyst's information about the micro endowments, and some potential for explaining price volatility would be retained.
We emphasize that, according to this perspective, the “origin” of price indeterminacy and the associated price volatility lies in the conjunction of: (i) agents’ aversion to Knightian uncertainty and (ii) incompleteness of a model formulated exclusively in terms of aggregate variables, or the analyst’s incomplete information. The preceding also clarifies the differing implications of our model for prices versus allocations. In the general representative agent model, prices may be
indeterminate while consumption is exogenously specified and thus trivially determinate. This can "explain" greater volatility for prices than for consumption. These comparisons are more interesting in the heterogeneous agent model, where the consumption side is nontrivial. Here we see the above confirmed in the sense that the prices supporting a given efficient allocation may be indeterminate. This is not to say, however, that Knightian uncertainty aversion has no real consequences, as it clearly influences the set of efficient and competitive allocations.
18.4. Remarks on empirical content

Alternative models of irrational expectations, such as Shiller's model of "fads," have been criticized for not being well enough specified to produce rejectable implications (West (1988), Cochrane (1991), Leroy (1989)). Some readers may be skeptical also regarding the useful empirical content of our model. The discussions surrounding (18.39)–(18.40) and in Section 18.3.4 provided some indication of the potential usefulness of our model. Here we argue further that empirical investigation of our model is potentially fruitful. However, we caution the reader that the example just described may provide cause for suitably revising and weakening our arguments regarding empirical relevance.

One potential source of skepticism concerns Theorem 18.4. Equation (18.38) is the Euler equation implied by a Lucas-style model in which $\{\xi_t\}$ represents beliefs. Note that $\xi_t$ is not a probability kernel because it (i) depends on the entire history and not just the current state, and (ii) may not be continuous in $\omega_t$. Nevertheless, the theorem raises concerns about whether our model is essentially observationally indistinguishable from a Lucas model, with the rational expectations hypothesis possibly deleted, but where beliefs are represented by probability measures and therefore uncertainty neutrality prevails. Observe, however, that to replicate an equilibrium $q$ as an equilibrium of a Lucas-style model with the associated Euler equation (18.38), the required "shadow" sequence of probability kernels $\{\xi_t\}$ may seem unnatural and contrived. (For convenience, we refer to the $\xi_t$'s as probability kernels though they need not conform to our definition of the term.) For example, the $\xi_t$'s will often depend on history or be time dependent for no "good" reason.
Second, when some states are extrinsic (see the discussion of sunspot equilibria in Section 18.3.3), replication of a sunspot equilibrium $q$ requires that "shadow" beliefs about intrinsic states, represented by $\{\xi_t\}$, depend upon extrinsic states. Therefore, acceptance of the Lucas model approximation requires that one revise the classification of "intrinsic" versus "extrinsic" states. Finally, we point out later that our model has some cross-sectional (across-agent) implications. They can be delivered also by a Lucas-style model with a larger number of agents, if each agent's beliefs are represented by some $\{\xi_t\}$, but the latter would have to vary across agents in an artificial way.

Another possible reason for skepticism is the feeling that our model "can explain anything" by a suitable specification of the capacity kernel representing beliefs, which are presumably unobservable. But similar remarks apply with respect to the specification of utility even if the Bayesian, rational expectations model of beliefs is adopted. That is, in principle, a wide range of specifications are possible
for the intertemporal von Neumann–Morgenstern index $v(c_0, c_1, \ldots, c_t, \ldots)$. The strong predictive content of the Lucas asset pricing model derives in part from the parametric specialization of $v$ to the additive form $\sum_{t=0}^{\infty} \beta^t u(c_t)$. This specialization is widely accepted, at least as a benchmark, both because of the tractability that it delivers and because we have some understanding of its plausibility, via its axiomatic underpinnings, for example. Analogy with the present context of modeling beliefs argues, not for skepticism, but rather for the need to study the properties of alternative specifications for $P$. This chapter points out some attractive features of the $\varepsilon$-contamination model (18.9) and of belief function kernels, but much more work in this direction is required.

In order to derive rejectable predictions for time series data, beliefs must be related to the actual evolution of the state process. One possible link is to posit that $\{\omega_t\}$ is governed by a probability kernel $\pi^*$ and that beliefs incorporate some vagueness about $\pi^*$ on the part of the investor. For reasons of robustness of empirical procedures, Lehmann (1992) suggests studying pricing equations for a range of discount factors, reflecting the analyst's imprecise information about the correct factors. It is at least as plausible to posit that investors' information is imprecise. Here such imprecision is incorporated into the theoretical framework and a "robust" theoretical model is delivered.

Finally, some may disagree with the presumption that beliefs are unobservable; for instance, a number of researchers, cited in the introduction, have used survey data as an independent measure of investors' expectations. Therefore, to conclude, suppose that such information is available for a cross-section of investors and consider some predictions of our model regarding expectations.
We interpret our model as containing a number of agents with identical endowments and preferences, including the probability kernel correspondence $P$. If surveys elicit entire correspondences, then agents will respond identically given our model. However, suppose that they are asked for a conditional probability distribution over next period's state variables, or for some summary moments, and that they respond with an uncertainty-adjusted conditional prior, that is, with an element of $Q(\omega)$. Then there is no reason to expect all investors to report the same element of $Q(\omega)$. Thus our model is consistent with heterogeneous measured forecasts, even though agents have common information in the form of $P$. Moreover, the dispersion of forecasts should increase if $Q(\omega)$ increases in the sense of set inclusion. Specialize to the $\varepsilon$-contamination model of beliefs (18.9) and suppose that $\varepsilon(\omega)$ is larger in those states $\omega$ where the "true" conditional probability measure $\pi^*(\omega)$ is riskier, for example, has larger variance. Then a positive relation is indicated between the dispersion of reported expectations of forecasters, on the one hand, and the poor performance of point forecasts, on the other. For a related prediction, recall our earlier discussion of a link between the indeterminacy and volatility of prices. Given such a link, our model suggests a positive relation between price volatility and the dispersion of reported expectations of forecasters. There is some supporting evidence for such a relation (Cragg and Malkiel (1982), Frankel and Froot (1990)).
One could derive a number of other predictions that would be testable given appropriate survey data. Needless to say, we are not asserting that such data are currently available (see, however, Zarnowitz and Lambros (1987)). The current paucity of suitable data is not damning of our model, however. After all, one of the roles of theory is to guide the collection of data.
Appendix A: Proof of Theorem 18.1

The following Lemma 18.A.1 is an adaptation, to our space D of sequences of real-valued functions, of the well-known Blackwell sufficient condition for a contraction mapping on a space of real-valued functions.

Lemma 18.A.1. Let T : D → D be an operator with the following properties: (i) (Monotonicity) if f, g ∈ D and f ≤ g, that is, f_t(ω^t) ≤ g_t(ω^t) for all t and ω^t, then Tf ≤ Tg; (ii) (Discounting) there exists a real constant β, 0 < βb < 1, such that for any f ∈ D and any sequence of constant functions a = {a_t} ∈ D with a_t ∈ ℝ₊,

(T(f + a))_t(ω^t) ≤ (Tf)_t(ω^t) + βa_{t+1} for all t and ω^t.

Then T has a unique fixed point.

Proof. Let f, g ∈ D. Set a_t = ‖f_t − g_t‖∞. Then f ≤ g + a. By monotonicity and discounting,

(Tf)_t(ω^t) ≤ (T(g + a))_t(ω^t) ≤ (Tg)_t(ω^t) + β‖f_{t+1} − g_{t+1}‖∞.

Thus |(Tf)_t(ω^t) − (Tg)_t(ω^t)|/b^t ≤ βb‖f_{t+1} − g_{t+1}‖∞/b^{t+1}, and further ‖Tf − Tg‖ ≤ βb‖f − g‖, proving that T is a contraction.

Proposition 18.A.1 (Existence of utility). For each c ∈ D there exists a unique V(c) ∈ D such that (18.16) holds for all t and ω^t.

Proof. Define a map T : D → D by, for every f ∈ D,

(Tf)_t(ω^t) = u(c_t(ω^t)) + β ∫ f_{t+1}(ω^t, ·) dP(ω^t, ·).

By the continuity of P, (Tf)_t is continuous. Next,

sup_{ω^t} (Tf)_t(ω^t)/b^t ≤ sup_{ω^t} u(c_t(ω^t))/b^t + (β/b^t) sup_{ω^t} ∫ f_{t+1}(ω^t, ·) dP(ω^t, ·)
= sup_{ω^t} u(c_t(ω^t))/b^t + βb‖f_{t+1}‖∞/b^{t+1}.

Since u is increasing, concave, and u(0) = 0, we have ∞ > u(‖c‖) ≥ u(c_t(ω^t)/b^t) ≥ u(c_t(ω^t))/b^t, and therefore

‖Tf‖ = sup_t sup_{ω^t} (Tf)_t(ω^t)/b^t ≤ u(‖c‖) + βb sup_t ‖f_{t+1}‖∞/b^{t+1} ≤ u(‖c‖) + βb‖f‖.
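On a finite state space, the operator of Lemma 18.A.1 can be checked numerically. The following sketch (purely illustrative; the three states, the two-kernel belief set, the felicity values, and the discount factor are all invented, and the stationary case b = 1 is taken so that the relevant norm is the sup norm) iterates the uncertainty-averse recursion V ↦ u(c) + β·min over candidate kernels of the expected continuation value, and confirms the contraction behavior asserted in the Lemma:

```python
import numpy as np

beta = 0.9
u_c = np.array([1.0, 2.0, 0.5])   # hypothetical u(c(omega)) in each of three states
# Two candidate transition kernels; the belief set is (the hull of) both,
# and the uncertainty-averse integral is the componentwise minimum.
P1 = np.array([[.6, .3, .1], [.2, .6, .2], [.3, .3, .4]])
P2 = np.array([[.4, .4, .2], [.1, .7, .2], [.5, .2, .3]])

def T(V):
    # Bellman-type operator: felicity plus discounted minimum expectation.
    return u_c + beta * np.minimum(P1 @ V, P2 @ V)

V, W = np.zeros(3), np.ones(3)
for _ in range(200):
    V, W = T(V), T(W)
# Both iterates converge to the same fixed point, as Lemma 18.A.1 asserts.
```

Monotonicity and discounting hold by inspection of T, and successive iterates from different starting points coincide to numerical precision.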
Therefore, T is well defined. Monotonicity and discounting for T are obvious. By Lemma 18.A.1, T has a unique fixed point, which is the solution of (18.16).

Proposition 18.A.2 (Approximation of utility). Fix c ∈ D. For each T, define {V_t^T(c)}_{t=1}^∞ ∈ D by V_t^T ≡ 0 for t > T and

V_t^T(c; ω^t) = u(c_t(ω^t)) + β ∫ V_{t+1}^T(c; ω^t, ·) dP(ω^t, ·) for 0 ≤ t ≤ T.

Then lim_{T→∞} V_t^T(c; ω^t) = V_t(c; ω^t) for all t and ω^t.

Proof. Verify that, for any t ≤ T and ω^t,

V_t^T(c; ω^t) ≤ V_t(c; ω^t) ≤ V_t^T(c; ω^t) + ‖V(c)‖(βb)^{T−t+1} b^t.  (18.A.1)

Proposition 18.A.3 (Continuity of utility). If u satisfies the growth condition, then V_t(c; ω^t) is continuous in (c, ω^t).

Proof. Under the growth condition,

|V_t(c; ω^t)| ≤ βk₁/(1 − β) + k₂ Σ_{j=1}^∞ β^j ‖c_{t+j}‖∞.

Hence

‖V(c)‖ ≤ βk₁/(1 − β) + βbk₂‖c‖/(1 − βb).

Thus it follows from (18.A.1) that

|V_t(c; ω^t) − V_t^T(c; ω^t)| ≤ [βk₁/(1 − β) + βbk₂‖c‖/(1 − βb)] (βb)^{T−t+1} b^t.  (18.A.2)

Let cⁿ → c. For fixed t,

|V_t(cⁿ; ω^t) − V_t(c; ω^t)| ≤ |V_t(cⁿ; ω^t) − V_t^T(cⁿ; ω^t)| + |V_t^T(cⁿ; ω^t) − V_t^T(c; ω^t)| + |V_t^T(c; ω^t) − V_t(c; ω^t)|.

For all cⁿ such that ‖cⁿ − c‖ < 1, |c_t^n(ω^t)| ≤ (‖c‖ + 1)b^T for t ≤ T. It follows from the continuity of u that the second term converges to zero as cⁿ → c and that the convergence is uniform in ω^t. By (18.A.2), the first and third terms on the right side converge to zero as T → ∞ uniformly in n and ω^t. Therefore, V_t(·; ω^t)
is continuous at c uniformly in ω^t. The desired joint continuity now follows from the continuity of V_t(c; ω^t) in ω^t. The remaining properties asserted for utility can be proven by standard arguments from the theory of recursive utility (see, e.g., Lucas and Stokey (1984), Stokey and Lucas (1989), and Epstein and Zin (1989)). Footnote 7 clarifies the link with the recursive utility literature; note that W defined there is increasing and concave. For dynamic consistency, note that if P has full support, then (18.8) applies.
Appendix B: Proof of Theorems in Section 18.3

For the convenience of the reader, we provide here statements of two results invoked in Section 18.3. The first is the version of Fan’s Theorem employed in the derivation of the Euler inequalities (18.29). A stronger form is proven in Sion (1958: Theorems 4.2 and 4.2′).

Fan’s Theorem. Let X and Y be metrizable, convex, and compact subsets of some linear topological spaces, and f a continuous real-valued function on X × Y that satisfies (i) f(·, y) is concave on X for each y; and (ii) f(x, ·) is convex on Y for each x. Then

max_{x∈X} min_{y∈Y} f(x, y) = min_{y∈Y} max_{x∈X} f(x, y).
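The minimax equality can be illustrated with a toy computation (not from the chapter; the function and the grids are invented). On [−1, 1]², f(x, y) = xy is affine, hence concave, in x and affine, hence convex, in y, so the theorem applies and the saddle value is f(0, 0) = 0:

```python
import numpy as np

xs = np.linspace(-1.0, 1.0, 41)   # grid on X
ys = np.linspace(-1.0, 1.0, 41)   # grid on Y
F = np.outer(xs, ys)              # f(x, y) = x*y, concave in x, convex in y

maxmin = F.min(axis=1).max()      # max over x of min over y
minmax = F.max(axis=0).min()      # min over y of max over x
# Both order-of-optimization values agree at the saddle value 0.
```

In general one only has max min ≤ min max; the concave–convex structure is what forces equality here.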
Second, the argument surrounding (18.32) relies on the following selection theorem, which is slightly stronger than the one explicitly stated in Michael (1956). This result is also needed below in the proof of Theorem 18.3.

Lemma 18.B.1 (Michael). Suppose that X is paracompact, Y is a topological linear space, and Z is a convex closed subset of Y containing 0 that has a base {B_n} for the neighborhoods of 0 consisting of symmetric and convex sets such that B_{n+1} ⊂ ½B_n. Suppose that ψ : X → Z ⊂ Y is a lower semicontinuous convex-valued correspondence such that ψ(X) + B_n ⊂ Z. Suppose further that for each y ∈ ψ(X), y + B_n is open in Z. Then the correspondence ψ̄ defined by ψ̄(x) = cl ψ(x) admits a continuous selection.

Proof. See the proofs of Lemma 4.1 and Theorem 3.2 in Michael (1956), or the proof of Theorem 9.G of Zeidler (1986).

Proof of Theorem 18.2. Part (a): It remains to show only that if a price process q satisfies (18.29), or equivalently (18.27), then it is an equilibrium. Let (c, θ) be any (t, ω^t)-feasible plan for which θ_{t−1}(ω^{t−1}) = 0. It follows from (18.27), applied to θ_τ, that there exist π_τ : Ω^τ → M(Ω), τ ≥ t, such that π_τ(ω^τ) ∈ Q(ω^τ) for
each ω^τ and

u′(e_τ)θ_τ · q_τ ≥ βE_{π_τ(ω^τ,·)}{u′(e_{τ+1})θ_τ · (q_{τ+1} + d_{τ+1})}.  (18.B.1)

It follows from the budget constraints that

e_τ − c_τ − θ_τ · q_τ = −(q_τ + d_τ) · θ_{τ−1}.  (18.B.2)

By the concavity of u,

u(e_τ) ≥ u(c_τ) + u′(e_τ)(e_τ − c_τ).  (18.B.3)
Define V_t^T as in Proposition 18.A.2. We have the following lengthy but elementary chain of inequalities:

V_t^T(c; ω^t) − V_t(e; ω^t)
= u(c_t) + β ∫ V_{t+1}^T(c; ω^t, ·) dP(ω^t, ·) − u(e_t) − β ∫ V_{t+1}(e; ω^t, ·) dP(ω^t, ·)
≤ u′(e_t)(c_t − e_t) + β ∫ V_{t+1}^T(c; ω^t, ·) dP(ω^t, ·) − β ∫ V_{t+1}(e; ω^t, ·) dP(ω^t, ·)
= −u′(e_t)θ_t · q_t + β ∫ V_{t+1}^T(c; ω^t, ·) dP(ω^t, ·) − β ∫ V_{t+1}(e; ω^t, ·) dP(ω^t, ·)
≤ −u′(e_t)θ_t · q_t + βE_{π_t(ω^t,·)}{V_{t+1}^T(c; ω^t, ·)} − βE_{π_t(ω^t,·)}{V_{t+1}(e; ω^t, ·)}
≤ βE_{π_t(ω^t,·)}{−u′(e_{t+1})θ_t · (q_{t+1} + d_{t+1})} + βE_{π_t(ω^t,·)}{V_{t+1}^T(c; ω^t, ·)} − βE_{π_t(ω^t,·)}{V_{t+1}(e; ω^t, ·)}
= βE_{π_t(ω^t,·)}{−u′(e_{t+1})θ_t · (q_{t+1} + d_{t+1}) + u(c_{t+1}) + β ∫ V_{t+2}^T(c; ω^{t+1}, ·) dP(ω^{t+1}, ·) − u(e_{t+1}) − β ∫ V_{t+2}(e; ω^{t+1}, ·) dP(ω^{t+1}, ·)}
= βE_{π_t(ω^t,·)}{u′(e_{t+1})(e_{t+1} − c_{t+1} − θ_{t+1} · q_{t+1}) + u(c_{t+1}) + β ∫ V_{t+2}^T(c; ω^{t+1}, ·) dP(ω^{t+1}, ·) − u(e_{t+1}) − β ∫ V_{t+2}(e; ω^{t+1}, ·) dP(ω^{t+1}, ·)}
≤ βE_{π_t(ω^t,·)}{−u′(e_{t+1})θ_{t+1} · q_{t+1} + β ∫ V_{t+2}^T(c; ω^{t+1}, ·) dP(ω^{t+1}, ·) − β ∫ V_{t+2}(e; ω^{t+1}, ·) dP(ω^{t+1}, ·)}
≤ βE_{π_t(ω^t,·)}{βE_{π_{t+1}(ω^{t+1},·)}{−u′(e_{t+2})θ_{t+1} · (q_{t+2} + d_{t+2}) + u(c_{t+2}) + β ∫ V_{t+3}^T(c; ω^{t+2}, ·) dP(ω^{t+2}, ·) − u(e_{t+2}) − β ∫ V_{t+3}(e; ω^{t+2}, ·) dP(ω^{t+2}, ·)}}
⋮
≤ βE_{π_t(ω^t,·)}{· · · βE_{π_{t+T}(ω^{t+T},·)}{−u′(e_{t+T+1})θ_{t+T} · (q_{t+T+1} + d_{t+T+1}) + u(c_{t+T+1}) − u(e_{t+T+1}) − β ∫ V_{t+T+2}(e; ω^{t+T+1}, ·) dP(ω^{t+T+1}, ·)} · · ·}
≤ βE_{π_t(ω^t,·)}{· · · βE_{π_{t+T}(ω^{t+T},·)}{−u′(e_{t+T+1})θ_{t+T+1} · q_{t+T+1} − β ∫ V_{t+T+2}(e; ω^{t+T+1}, ·) dP(ω^{t+T+1}, ·)} · · ·}
≤ βE_{π_t(ω^t,·)}{· · · βE_{π_{t+T}(ω^{t+T},·)}{u′(e_{t+T+1})K · q_{t+T+1}} · · ·}
≤ b^t(βb)^{T−t+1}‖u′(e^∗)K · q‖ → 0,

where K ∈ ℝⁿ₊ has all components equal to −inf_{i,τ,ω^τ} θ_{i,τ}(ω^τ). The first inequality follows from (18.B.3); the second equality follows from (18.B.2) and θ_{t−1} = 0; the second inequality follows from π_t(ω^t) ∈ Q(ω^t); the third inequality follows from (18.B.1); the third and fourth equalities follow from (18.16) and (18.B.2); the fourth inequality follows from (18.B.3); the fifth inequality follows from (18.B.1) and π_{t+1}(ω^{t+1}) ∈ Q(ω^{t+1}); the seventh inequality follows from (18.B.2) and (18.B.3); the eighth inequality follows from the short-selling constraint θ ≥ −K and the nonnegativity of utility; and the last inequality follows from the fact that the process {u′(e_t)K · q_t} is in D. Thus V_t(c; ω^t) − V_t(e; ω^t) ≤ 0, which implies that q is an equilibrium.

Part (b): We need to show only that Q admits a continuous selection. Then the claim of (b) follows from (a) and the arguments in the text surrounding (18.32). Let C∗(Ω) be the dual of C(Ω) endowed with the weak∗ topology. Then M(Ω) is a compact subset of C∗(Ω). Since C(Ω) is separable, there exists a countable family {f_n} that is a dense subset of the closed unit ball of C(Ω). Let Z be the closed ball of radius 4 in C∗(Ω), that is, Z ≡ {m ∈ C∗(Ω) : ‖m‖ ≤ 4}, where the norm is the usual norm on the dual space. Define a metric on Z by

d(P, Q) = Σ_n (1/9ⁿ) |∫ f_n dP − ∫ f_n dQ|.
This metric induces the weak∗ topology on Z. In particular, it induces the weak convergence topology on M(Ω), which is a subset of Z. Under this metric, Z is a convex and compact metric space. Define

B ≡ {m ∈ Z : Σ_n (1/9ⁿ) |∫ f_n dm| < 1/2},  B_n = (1/2ⁿ)B,

and apply Lemma 18.B.1.

Part (c): Follows from the dynamic consistency of the utility process under the assumption of full support for P.

Proof of Theorem 18.4. See text.

Proof of Theorem 18.3. Part (b): (i) Proof of (18.33) and (18.34): Define contraction mappings T̄_i : D → D and T̲_i : D → D by, for each f ∈ D,

(T̄_i f)_t(ω^t) = β max_{m∈Q(ω^t)} E_m{f_{t+1} + u′(e_{t+1}) d_{i,t+1}},
(T̲_i f)_t(ω^t) = β min_{m∈Q(ω^t)} E_m{f_{t+1} + u′(e_{t+1}) d_{i,t+1}}.

Denote their unique fixed points by f̄_i and f̲_i and define q̄_{i,t} and q̲_{i,t} by q̄_{i,t}(ω^t) = f̄_{i,t}(ω^t)/u′(e_t(ω^t)) and q̲_{i,t}(ω^t) = f̲_{i,t}(ω^t)/u′(e_t(ω^t)). By construction, {q̄_{i,t}} and {q̲_{i,t}} ∈ D and satisfy (18.33).
Given q ∈ E, let {ξ_t} be as in (18.37) and (18.38). Denote by D̃ the set of processes satisfying the requirements in the definition of D with the possible exception of continuity. Define contraction mappings T̃_i : D̃ → D̃ by

(T̃_i f)_t(ω^t) = βE_{ξ_t(ω^t,·)}{f_{t+1} + u′(e_{t+1}) d_{i,t+1}}.

Denote the unique fixed point by f̃_i ∈ D̃. By (18.38) and the uniqueness of the fixed point, u′(e_t)q_{i,t} = f̃_{i,t} on Ω^t. Now (18.34) follows from the monotonicity of the three maps T̄_i, T̲_i, and T̃_i and the observation that (T̲_i f)_t ≤ (T̃_i f)_t ≤ (T̄_i f)_t.

For the next step, we need the following Lemma 18.B.2 concerning the existence of ε-optimal continuous policies. Bertsekas and Shreve (1978: Section 8.2) contains a parallel result for measurable policies.

Lemma 18.B.2. Suppose that X is paracompact, Y is a topological linear space, and Z is a convex closed subset of Y containing 0 that has a base {B_n} for the neighborhoods of 0 consisting of symmetric and convex sets such that B_{n+1} ⊂ ½B_n. Suppose Φ : X → Z ⊂ Y is a continuous, compact- and convex-valued correspondence such that Φ(X) + B_n ⊂ Z. Suppose further that for each y ∈ Φ(X), y + B_n is open in Z. Let F : X × Z → ℝ be continuous. Define f, g : X → ℝ by

f(x) = min_{y∈Φ(x)} F(x, y)  and  g(x) = max_{y∈Φ(x)} F(x, y).
(a) If F(x, y) is convex in y, then for any ε > 0 there exists a continuous function h : X → Y such that h(x) ∈ Φ(x) and F(x, h(x)) ≤ f(x) + ε for all x ∈ X.
(b) If F(x, y) is concave in y, then for any ε > 0 there exists a continuous function h : X → Y such that h(x) ∈ Φ(x) and F(x, h(x)) ≥ g(x) − ε for all x ∈ X.

Proof. We prove (a). Fix ε > 0. Define a correspondence ψ : X → Z ⊂ Y by

ψ(x) = {y ∈ Φ(x) : F(x, y) < f(x) + ε}.

By the convexity of F(x, y) in y, ψ(x) is convex. Suppose y ∈ ψ(x₀). Then F(x₀, y) < f(x₀) + ε. By the continuity of f (via the Maximum Theorem) and of F, there exists a neighborhood N(x₀) of x₀ such that F(x, y) < f(x) + ε for all x ∈ N(x₀), which implies that y ∈ ψ(x) for all x ∈ N(x₀), which in turn implies that for any open set V, the set {x : ψ(x) ∩ V ≠ ∅} is open. Therefore ψ is lower semicontinuous. By Lemma 18.B.1, ψ̄ admits a continuous selection, say h. Since

ψ̄(x) ⊂ {y ∈ Φ(x) : F(x, y) ≤ f(x) + ε},

we have F(x, h(x)) ≤ f(x) + ε for all x ∈ X.

Lemma 18.B.3. If q_i ∈ D satisfies (18.38) for some {ξ_t}, then

u′(e_t)q_{i,t}/b^t ≤ u(‖e‖)βb/(1 − βb).

Proof. Apply (18.38) and the concavity of u.

(ii) Proof of (18.35). We show the existence of q¹; the existence of q² can be shown similarly. In the following, the superscript 1 is suppressed and, without essential loss of generality, we set t = 0. Choose T such that

2(βb)^T u(‖e‖)βb / [u′(e₀)(1 − βb)] < βε.

By Lemma 18.B.2 (with X = Ω^t, Z as in the proof of part (b) of Theorem 18.2, Φ ≡ Q, F(ω^t, m) ≡ E_m{u′(e_{t+1})(d_{i,t+1} + q̲_{i,t+1})}, and noting that the right side of the last expression is a continuous function of (ω^t, m)), there exists, for each t, a continuous π_t : Ω^t → M(Ω) such that π_t(ω^t) ∈ Q(ω^t) and

βE_{π_t(ω^t,·)}{u′(e_{t+1})(d_{i,t+1} + q̲_{i,t+1})} ≤ u′(e_t(ω^t))q̲_{i,t}(ω^t) + u′(e^∗)(1 − β)²ε.

By the proof of Theorem 18.2(b), there is a unique equilibrium price process q in E associated with {π_t} as in (18.38) with ξ_t replaced by π_t. Now we show that
q_{i,0} satisfies the appropriate form of (18.35). For this purpose, define q_i^T ∈ D by q^T_{i,t} ≡ 0 for t > T + 1, q^T_{i,T+1} = q̲_{i,T+1}, and

q^T_{i,t} = βE_{π_t}{[u′(e_{t+1})/u′(e_t)](d_{i,t+1} + q^T_{i,t+1})} for t ≤ T.

Then we claim that, for t ≤ T,

u′(e_t)q^T_{i,t} ≤ u′(e_t)q̲_{i,t} + u′(e^∗)(1 − β)²(ε + βε + · · · + β^{T−t}ε).  (18.B.4)

This is true when t = T, since

u′(e_T)q^T_{i,T} = βE_{π_T}{u′(e_{T+1})(d_{i,T+1} + q̲_{i,T+1})} ≤ u′(e_T)q̲_{i,T} + u′(e^∗)ε(1 − β)².

Assume that (18.B.4) is true for some t + 1 ≤ T. Then

u′(e_t)q^T_{i,t} = βE_{π_t}{u′(e_{t+1})(d_{i,t+1} + q^T_{i,t+1})}
≤ βE_{π_t}{u′(e_{t+1})d_{i,t+1} + u′(e_{t+1})q̲_{i,t+1} + u′(e^∗)(1 − β)²(ε + βε + · · · + β^{T−t−1}ε)}
≤ u′(e_t)q̲_{i,t} + u′(e^∗)(1 − β)²(ε + βε + · · · + β^{T−t}ε).

Thus (18.B.4) is established. Setting t = 1 and letting T → ∞ on the right side of (18.B.4), we obtain

q^T_{i,1} ≤ q̲_{i,1} + ε(1 − β).

Now by Lemma 18.B.3 and straightforward calculation,

0 ≤ q_{i,1} − q^T_{i,1} ≤ 2(βb)^T u(‖e‖)βb / [u′(e₀)(1 − βb)].

Then by our choice of T, q_{i,1} ≤ q^T_{i,1} + βε ≤ q̲_{i,1} + ε.
(iii) Proof of (18.36). “Only if” follows from (18.34). For the converse, assume (18.36). By choosing ε sufficiently small in (18.35) and noting the proof of the latter, it follows that there exist two equilibria q⁰ and q¹, with q_i⁰ ≠ q_i¹, and corresponding (in the sense of (18.32)) sequences of continuous functions {π_t⁰} and {π_t¹} from Ω^t to M(Ω) with π_t^i(ω^t) ∈ Q(ω^t) for i = 0, 1 and all ω^t ∈ Ω^t. For each α ∈ [0, 1], define π_t^α = απ_t⁰ + (1 − α)π_t¹. By the proof of Theorem 18.2(b), there exists a unique q(α) ∈ E such that

q_{i,t}(α, ω^t) = βE_{π_t^α(ω^t,·)}{[u′(e_{t+1})/u′(e_t)](q_{i,t+1}(α) + d_{i,t+1})}.

If it can be shown that for each i and t the map α → q_{i,t}(α) ∈ C(Ω^t) is continuous, then the proposition is proven, because q_{i,t}⁰ ≠ q_{i,t}¹ implies that q_{i,t}⁰(ω^t) ≠ q_{i,t}¹(ω^t)
for some ω^t. Then q_{i,t}(α, ω^t), as a continuous function of α, assumes at least two distinct values and hence must assume a continuum of distinct values.

It remains to show that q_{i,t}(α) is continuous in α. Let ε > 0. By Lemma 18.B.3,

‖q_{i,t}(α) − q_{i,t}(α₀)‖ ≤ ‖q_{i,t}(α) − q^T_{i,t}(α)‖ + ‖q^T_{i,t}(α) − q^T_{i,t}(α₀)‖ + ‖q^T_{i,t}(α₀) − q_{i,t}(α₀)‖
≤ 2b^t(βb)^{T−t+1}u(‖e‖)βb / [u′(e_t)(1 − βb)] + ‖q^T_{i,t}(α) − q^T_{i,t}(α₀)‖,  (18.B.5)

where q_i^T(α) ∈ D is defined by q^T_{i,t}(α) ≡ 0 for t > T and

q^T_{i,t}(α) = βE_{π_t^α}{[u′(e_{t+1})/u′(e_t)](q^T_{i,t+1}(α) + d_{i,t+1})} for t ≤ T.

The continuity of q^T_{i,t}(α) as a function from [0, 1] to C(Ω^t) follows by straightforward induction. This implies that the second term of (18.B.5) can be made less than ε/2 by choosing |α − α₀| arbitrarily small. Finally, choose T such that the first term of (18.B.5) is less than ε/2.
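The role of the mixing parameter α can be mimicked in a toy computation (an invented two-state example with constant marginal utility, so that the constant price solves q = βE[d + q] under the mixed prior; all numbers are hypothetical): q(α) is continuous, indeed affine, in α, so distinct prices at α = 0 and α = 1 generate a continuum of equilibrium prices:

```python
import numpy as np

beta = 0.9
d = np.array([0.0, 1.0])                                 # dividend in two states
pi0, pi1 = np.array([0.7, 0.3]), np.array([0.3, 0.7])    # extreme priors

def price(alpha):
    """Constant price solving q = beta * E_{pi^alpha}[d + q]."""
    pa = alpha * pi0 + (1 - alpha) * pi1
    return beta * (pa @ d) / (1 - beta)

qs = np.array([price(a) for a in np.linspace(0.0, 1.0, 101)])
# price(alpha) moves continuously and monotonically between the two extremes,
# so every intermediate value is attained by some mixture of the selections.
```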
Part (a): Let qⁿ ∈ E with qⁿ → q ∈ Dⁿ. Then, by the Maximum Theorem, (18.27) is satisfied for q. Therefore, q ∈ E and E is closed. Define PE ⊂ E to consist of those equilibria q for which (18.38) is satisfied by some sequence {ξ_t} as in (18.37), except that “measurability” is strengthened to “continuity.” As in the proof of (18.36), PE can be shown to be path-connected and hence also connected. Second, PE is dense in E. (The argument is similar to the proof of (18.35); in particular, ε-optimal continuous policies are used. A detailed proof is available from the authors upon request.) We conclude (Dugundji (1966: Theorem 1.6, p. 109)) that E is connected.
Appendix C: Aggregation in a heterogeneous agent economy

We provide the details to support the example in Section 18.3.4 dealing with heterogeneous agents. First define the subclass of our model of utility that corresponds to Schmeidler (1989), where beliefs are represented by a capacity. Say that the probability kernel correspondence P is capacity-based if for each ω ∈ Ω: (i) the mapping A → P(ω, A), from B(Ω) into [0, 1], defines a convex capacity; and (ii) P(ω) = {m ∈ M(Ω) : m(A) ≥ P(ω, A), ∀A ∈ B(Ω)}. In that case we have the following convenient Choquet integration formula for any f ∈ C₊(Ω):

∫ f dP(ω) = ∫₀^∞ P(ω, {f ≥ t}) dt.
Moreover, and this is critical for what follows, for any two such functions f and g:

∫ (f + g) dP(ω) ≥ ∫ f dP(ω) + ∫ g dP(ω),  (18.C.1)

and equality prevails if f and g are comonotone, that is, if

[f(ω′) − f(ω)][g(ω′) − g(ω)] ≥ 0,  ∀ω′, ω ∈ Ω.  (18.C.2)
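On a finite state space, the Choquet formula and the superadditivity property can be verified directly. The sketch below (illustrative; the ε-contamination capacity and the payoffs are invented) computes the integral by the layer formula and checks that it is superadditive in general and additive for comonotone functions:

```python
import numpy as np

eps = 0.25
pi = np.array([0.2, 0.3, 0.5])              # reference prior on three states

def capacity(A):
    """eps-contamination capacity: (1-eps)*pi(A) on proper subsets, 1 on the
    full space. This capacity is convex (supermodular)."""
    if len(A) == len(pi):
        return 1.0
    return (1 - eps) * pi[list(A)].sum()

def choquet(f):
    """Choquet integral via the layer formula, summing over the decreasing
    rearrangement of f."""
    order = np.argsort(-f)                  # states ranked by f, largest first
    total, prev = 0.0, 0.0
    for k in range(1, len(f) + 1):
        A = order[:k]
        total += f[order[k - 1]] * (capacity(A) - prev)
        prev = capacity(A)
    return total

f = np.array([1.0, 3.0, 2.0])
g = np.array([0.5, 2.0, 1.0])               # comonotone with f (same ranking)
h = np.array([2.0, 0.0, 1.0])               # not comonotone with f
```

For this capacity the Choquet integral collapses to (1 − ε)E_π f + ε min f, the minimum over the core, matching the ε-contamination examples in the text.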
Note that the ε-contamination and belief function kernel examples are capacity-based. For these and other examples in a static setting, see Wasserman and Kadane (1990).

Suppose further that P has full support and that P(ω) is constant in ω. Let β, {u^h}, {α_h}, and e be as in the text. For each t and ω^t, assign to agent h the consumption given by the solution to (18.52) with x = e^∗(ω_t). Denote the consumption processes defined in this way by c̄^h and the associated utility processes by V̄^h. Then V̄_t^h(ω^t) = u^h(x^{h∗}(e^∗(ω_t))) + constant, where x^{h∗}(x), h = 1, . . . , H, is the solution to (18.52). Since the latter functions are all nondecreasing, we see that for each t, the functions {V̄_t^h : h = 1, . . . , H} are pairwise comonotone. (∗)

That is, given the allocation {c̄^h}, agents agree (weakly) in their induced rankings of states. This occurs because the i.i.d. assumption restricts the dependence of beliefs on the current state so that it does not offset the comonotonicity of current felicities u^h(x^{h∗}(e^∗(·))).

We now show that {c̄^h} solves (18.57) uniquely and in particular is Pareto optimal: For any other feasible utility processes {V_t^h}_{t=0}^∞, h = 1, . . . , H, (18.16), (18.C.1), and (18.52) imply

Σ_h α_h V_t^h(ω^t) = Σ_h α_h u^h(c_t^h(ω^t)) + β Σ_h α_h ∫ V_{t+1}^h(ω^t, ·) dP(ω^t, ·)
≤ Σ_h α_h u^h(c_t^h(ω^t)) + β ∫ Σ_h α_h V_{t+1}^h(ω^t, ·) dP(ω^t, ·)
≤ Σ_h α_h u^h(c̄_t^h(ω^t)) + β ∫ Σ_h α_h V_{t+1}^h(ω^t, ·) dP(ω^t, ·),

whereas

Σ_h α_h V̄_t^h(ω^t) = Σ_h α_h u^h(c̄_t^h(ω^t)) + β ∫ Σ_h α_h V̄_{t+1}^h(ω^t, ·) dP(ω^t, ·).  (18.C.3)

By the contraction mapping arguments in Appendix A, it follows that

Σ_h α_h V_t^h(ω^t) ≤ Σ_h α_h V̄_t^h(ω^t).  (18.C.4)
The full support assumption for P guarantees that {c¯h } is the unique solution to (18.51).
In terms of the candidate representative agent’s intertemporal utility U^α defined by (18.51), it follows from (18.C.3) and (18.C.4) that

U_t^α(e; ω^t) = u^α(e^∗(ω_t)) + β ∫ U_{t+1}^α(e; ω^t, ·) dP(ω^t, ·).

Moreover, a corresponding equality holds also if e is replaced by an arbitrary c ∈ D, since the preceding arguments extend to arbitrary endowment processes e. Therefore, U^α is generated by β, P, and u^α, completing the arguments sketched in the text. Finally, note that the assumption of i.i.d. beliefs was used earlier only to guarantee (∗). Indeed, the latter condition, assumed to hold not only for the given e but for all endowment processes in an open neighborhood of e in the norm topology of D, suffices for our aggregation result.
Acknowledgments We are grateful to the Social Sciences and Humanities Research Council of Canada for financial support and to Chew Soo Hong, Darrell Duffie, Mike Peters, J. C. Rochet, Rishin Roy, and especially to Angelo Melino and Guy Laroque for valuable discussions and suggestions.
Notes
1 See, however, LeRoy and Singell (1987) for a different interpretation of Knight.
2 A closely related model, due to Schmeidler (1989) and Gilboa (1987), replaces the Bayesian prior by a nonadditive probability measure or capacity. In an earlier working paper, the capacity-based model was adopted as a starting point and virtually identical results were obtained.
3 Under suitable additional restrictions on P, the map A → P(ω, A) defines a capacity and ∫ f dP equals the associated Choquet integral that is central to the capacity-based model of Schmeidler (1989) and Gilboa (1987). (See Appendix C.) With this link in mind, we add the following remark concerning the above definition of uncertainty aversion: Not every uncertainty averse P can accommodate Ellsberg-type behavior; the latter is inconsistent with the “small” subclass of correspondences P for which the capacity P(ω, ·), for some ω, defines a qualitative probability (Schmeidler (1989: 585)).
4 Another possible rationale for (18.9) is based on the hypothesis that the set of states is not exhaustive. See Epstein and Wang (1992) for elaboration and for another class of examples motivated by “missing states.”
5 The continuity of P_G follows from Epstein and Wang (1992: Proposition A.2.1). A closely related form of continuity is apparent from (18.15) later. Under our assumptions on p and G, the integrals there vary continuously with ω since f^∗ is continuous by the Maximum Theorem.
6 The class of functions that are analytic in the sense of analytic set theory (see Dellacherie and Meyer (1982)) is the appropriate one for Choquet integration, which is closely related to the integration notion (18.3) employed here. Therefore, future extensions of D may need to go beyond the space of adapted processes to include processes for which each X_t is analytic.
7 Epstein and Zin (1989) study recursive relations of the form V_t(c; ω^t) = W(c(ω^t), m(V_{t+1}(c; ω^t, ·))), where m is a generalized certainty equivalent operator. Relation (18.16) is the special case in which W(c, z) = u(c) + βz and m is the generalized expected value operator (18.6). Several of our results can be extended considerably beyond the specification (18.16) by applying and adapting available results on recursive utility, but such extensions would detract from the main focus of this chapter. Note also that the presence of uncertainty aversion introduces an important technical difference, namely a lack of Gâteaux differentiability, relative to the analysis in Epstein and Zin. See Sections 18.2.5 and 18.3 for elaboration and for the economic significance of the nondifferentiability.
8 Though we have not axiomatized our model, there is reason to believe that it can be provided with a respectable axiomatic basis. That is because, in the literature on preferences under risk, the corresponding question of how to extend atemporal theories has been thoroughly examined and the recursive approach has been provided with respectable axiomatic credentials (see Kreps and Porteus (1978), Chew and Epstein (1991), and, for an overview, Epstein (1993)). In addition, Skiadas (1992) axiomatizes recursive utility in a Savage-style framework where conditional subjective probabilities are derived.
9 In particular, we assume that for each security buying and selling prices coincide. In fact, the presence of uncertainty aversion can “explain” bid-ask spreads even in the absence of transactions costs. We leave this extension of our model to a separate chapter.
10 By c_τ we mean the function c_τ(ω^t, ·) on Ω^{τ−t}, and similarly for q_τ, θ_τ, and so on. The indicated equality and inequality are intended at the level of functions and so apply throughout Ω^{τ−t}. Similar simplifying notation is adopted throughout the chapter. Finally, note that restrictions on short sales are commonly assumed in the literature in order to guarantee existence of planning optima and equilibria.
11 The objective function in (18.26) is concave in ξ and therefore is almost everywhere differentiable in ξ, for given e, d, q, and Ω. It is incorrect, however, to interpret this fact as implying that the price indeterminacy discussed later is “infrequent.” Only differentiability at ξ = 0 is relevant to price determinacy. Thus the relevant question is whether for “many” specifications of e, d, q, and Ω, the objective function in (18.26) is nondifferentiable in ξ at ξ = 0. The frequency of price indeterminacy is examined in Section 18.3.4.
12 We continue to write Q rather than Q_{V^∗}.
13 It is common in the literature to assume a time-homogeneous Markov structure for dividends and to restrict attention to price processes that are time-homogeneous and Markovian. Therefore, we point out that under the earlier mentioned assumption, Theorems 18.2 and 18.3 remain valid if price processes are defined to be elements of D that are time-homogeneous and Markovian.
14 It is well known that sunspot equilibria may exist, even in infinitely lived representative agent models, given financial constraints, externalities, nonconvexities, or other sources of market imperfection that lead to inefficient equilibrium allocations. See Guesnerie and Woodford (1993) for a survey. In contrast, in our model no such imperfections exist and the equilibrium allocation is trivially efficient, but quantities do not vary with the extrinsic state.
15 Note the difference between (18.32) and (18.38). The former is sufficient for q to be an equilibrium since the selection {π_t} is assumed to be continuous and hence the solution q to (18.32) must lie in Dⁿ. On the other hand, as just shown, the existence of a measurable selection {ξ_t} as in (18.37)–(18.38) is a necessary condition for q to be an equilibrium. It is also sufficient only if, as in Theorem 18.4, we assume that the solution q to (18.38) lies in Dⁿ.
16 It is more common to assume a time-homogeneous Markov structure for growth rates rather than levels. Our analysis is readily modified accordingly with no effects on our qualitative results.
17 That is, u′(e^∗)d_i^∗ is not measurable with respect to the σ-algebra on Ω generated by the mapping V^∗ : Ω → ℝ. Note that (18.44) is stronger than (18.43): the latter requires only that one not be able to infer the magnitude of u′(e^∗)d_i^∗ from knowledge that V^∗ = min V^∗, while the former rules out the possibility of such inference given V^∗ = k for any k. This difference does not appear to us to be economically significant and thus we will not differentiate between (18.43) and (18.44).
18 If Ω consists of only two states, then (18.45) and (18.46) are each equivalent to: (∗) e^∗ is constant and d_i^∗ is not constant (on Ω). In particular, for this i.i.d. case, indeterminacy can occur only if consumption is certain. This conclusion that asset price indeterminacy is limited to riskless initial positions is also apparent from examination of the indifference curves of a Gilboa–Schmeidler utility in the state preference diagram for a static setting (see Simonsen and Werlang (1991), for example). However, one must be cautious in extrapolating to more general state spaces, where (18.45) implies not (∗), but rather that the conditions specified there apply on arg min e^∗.
19 Note the loose parallel with the case of risk (ε = 0 and P a probability kernel), where our model reduces to the consumption-based CAPM, according to which the risk premium for asset i depends on the covariation of d_i^∗ and consumption.
20 Under the stated assumptions, (18.42) implies that any ξ_{t+1}(ω^t, ·) ∈ Q(ω^t) has Radon-Nikodym derivative of the form z_{t+1}(ω^t, ·) = 1 − ε + εg_{t+1}(ω^t, ·), for some g_{t+1} ≥ 0 satisfying ∫ g_{t+1} dπ^∗(ω^t, ·) = 1 and g_{t+1} = 0 on Ω∖Ω_m. These restrictions on z_{t+1} are equivalent to (18.47).
21 Specifically, if Ω = {ω₁, . . . , ω_N}, then

var_m(ε) ≡ max{Σ_i p(ω_i)z_i² − 1 : z ∈ ℝ^N, z_i ≥ 1 − ε ∀i, Σ_i p(ω_i)z_i = 1},

and the maximum is attained at one of the N extreme points {z^j}₁^N of the constraint set, where z_j^j = 1 − ε + ε/p(ω_j) and z_i^j = 1 − ε if i ≠ j. Finally, note that e^∗ constant implies that Ω_m = Ω.
22 Under the additional assumption that the price of equity is constant across time and states (such an equilibrium exists if beliefs are i.i.d.), the maximum equity premium equals, in terms of primitives of the model,

ε(1 − β)β⁻¹ [E_{π^∗(ω_t,·)} d_{t+1} − min d_{t+1}] / [(1 − ε)E_{π^∗(ω_t,·)} d_{t+1} + ε min d_{t+1}].

The latter vanishes if ε = 0. Therefore, this expression represents a premium for the uncertainty associated with holding equity rather than for the bearing of risk. We leave to a separate chapter consideration of the equity premium puzzle (Mehra and Prescott (1985)) unrestricted by the numerous simplifying assumptions of this section.
References Aubin, J. P. (1979). Mathematical Methods of Game and Economic Theory. Amsterdam: North Holland. Barsky, R. B. and J. B. DeLong (1992). “Why Does the Stock Market Fluctuate?” NBER Working Paper 3995. Bertsekas, D. P. and S. E. Shreve (1978). Stochastic Optimal Control. New York: Academic Press. Bewley, T. (1986). “Knightian Decision Theory: Part I,” Cowles Foundation Discussion Paper No. 807, Yale University. Camerer, C. and M. Weber (1992). “Recent Developments in Modeling Preferences: Uncertainty and Ambiguity,” Journal of Risk and Uncertainty, 5, 325–370.
Chew, S. H. and L. G. Epstein (1991). “Recursive Utility under Uncertainty,” in Equilibrium with an Infinite Number of Commodities, ed. A. Khan and N. Yannelis. Heidelberg: Springer-Verlag. Cochrane, J. H. (1991). “Volatility Tests and Efficient Markets,” Journal of Monetary Economics, 27, 463–485. Cochrane, J. H. and L. P. Hansen (1992). “Asset Pricing Explorations for Macroeconomics,” NBER Working Paper 4088. Constantinides, G. M. (1982). “Intertemporal Asset Pricing with Heterogeneous Consumers and without Demand Aggregation,” Journal of Business, 55, 253–267. Cragg, J. and B. Malkiel (1982). Expectations and the Structure of Share Prices. Chicago: University of Chicago Press. Dellacherie, C. and P. A. Meyer (1982). Probabilities and Potential. New York: North-Holland. DeLong, J. B., A. Shleifer, L. H. Summers, and R. J. Waldmann (1990). “Noise Trader Risk in Financial Markets,” Journal of Political Economy, 98, 703–738. Dempster, A. P. (1967). “Upper and Lower Probabilities Induced by a Multivalued Mapping,” Annals of Mathematical Statistics, 38, 325–339. Dow, J. and S. R. Werlang (1992). “Uncertainty Aversion, Risk Aversion and the Optimal Choice of Portfolio,” Econometrica, 60, 197–204. (Reprinted as Chapter 17 in this volume.) Duffie, D. (1992). Dynamic Asset Pricing Theory. Princeton: Princeton University Press. Duffie, D. and L. G. Epstein (1992). “Stochastic Differential Utility,” Econometrica, 60, 353–394. Dugundji, J. (1966). Topology. Boston: Allyn and Bacon. Ellsberg, D. (1961). “Risk, Ambiguity, and the Savage Axioms,” Quarterly Journal of Economics, 75, 643–669. Epstein, L. G. (1993). “Behavior under Risk: Recent Developments in Theory and Applications,” in Advances in Economic Theory, Vol. II, ed. J. J. Laffont. Cambridge: Cambridge University Press. Epstein, L. G. and M. LeBreton (1993). “Dynamically Consistent Beliefs Must Be Bayesian,” Journal of Economic Theory, 61, 1–22. Epstein, L. G. and S. Zin (1989).
“Substitution, Risk Aversion, and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework,” Econometrica, 57, 937–969. Epstein, L. G. and T. Wang (1992). “Intertemporal Asset Pricing under Knightian Uncertainty,” University of Toronto, Working Paper 9211. Frankel, J. A. and K. Froot (1990). “Exchange Rate Forecasting Techniques, Survey Data, and Implications for the Foreign Exchange Market,” NBER Working Paper 3470. Gilboa, I. (1987). “Expected Utility Theory with Purely Subjective Non-Additive Probabilities,” Journal of Mathematical Economics, 16, 65–88. Gilboa, I. and D. Schmeidler (1989). “Maxmin Expected Utility with Non-Unique Prior,” Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.) —— (1993). “Updating Ambiguous Beliefs,” Journal of Economic Theory, 59, 33–49. (Reprinted as Chapter 8 in this volume.) Guesnerie, R. and M. Woodford (1993). “Endogenous Fluctuations,” in Advances in Economic Theory, Vol. II, ed. J. J. Laffont. Cambridge: Cambridge University Press. Hansen, L. P. and R. Jagannathan (1991). “Implications of Security Market Data for Models of Dynamic Economies,” Journal of Political Economy, 99, 225–262.
Hansen, L. P. and S. F. Richard (1987). "The Role of Conditioning Information in Deducing Testable Restrictions Implied by Dynamic Asset Pricing Models," Econometrica, 55, 587–619.
Ito, T. (1990). "Foreign Exchange Rate Expectations: Micro Survey Data," American Economic Review, 80, 434–449.
Jaffray, J. Y. (1992). "Dynamic Decision Making with Belief Functions," mimeo.
Keynes, J. M. (1921). A Treatise on Probability. London: Macmillan.
—— (1936). The General Theory of Employment Interest and Money. London: Macmillan.
Klein, E. and A. Thompson (1984). Theory of Correspondences. New York: Wiley.
Knight, F. H. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Koppel, R. (1991). "Animal Spirits," Journal of Economic Perspectives, 5, 203–210.
Kreps, D. M. and E. L. Porteus (1978). "Temporal Resolution of Uncertainty and Dynamic Choice Theory," Econometrica, 46, 185–200.
Lehmann, B. N. (1992). "Asset Pricing and Intrinsic Values: A Review Essay," Journal of Monetary Economics, 28, 485–500.
LeRoy, S. F. (1989). "Efficient Capital Markets and Martingales," Journal of Economic Literature, 27, 1583–1621.
LeRoy, S. F. and L. D. Singell Jr. (1987). "Knightian Risk and Uncertainty," Journal of Political Economy, 95, 384–406.
Lucas, R. E. Jr. (1978). "Asset Prices in an Exchange Economy," Econometrica, 46, 1429–1445.
Lucas, R. E. Jr. and N. Stokey (1984). "Optimal Growth with Many Consumers," Journal of Economic Theory, 7, 188–209.
Mehra, R. and E. Prescott (1985). "The Equity Premium: A Puzzle," Journal of Monetary Economics, 15, 145–161.
Michael, E. (1956). "Continuous Selections, I," Annals of Mathematics, 63, 361–382.
Papamarcou, A. and T. L. Fine (1991). "Unstable Collectives and Envelopes of Probability Measures," Annals of Probability, 19, 893–906.
Poterba, J. M. and L. H. Summers (1988). "Mean Reversion in Stock Prices: Evidence and Implications," Journal of Financial Economics, 22, 27–59.
Royden, H. L. (1988). Real Analysis, 3rd ed. New York: Macmillan.
Savage, L. (1954). The Foundations of Statistics. New York: John Wiley.
Schmeidler, D. (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Shiller, R. J. (1981). "Do Stock Prices Move Too Much to Be Justified by Subsequent Changes in Dividends?" American Economic Review, 71, 421–436.
—— (1991). Market Volatility. Cambridge: MIT Press.
Simonsen, M. H. and S. R. C. Werlang (1991). "Subadditive Probabilities and Portfolio Inertia," R. de Econometria, 11, 1–19.
Sion, M. (1958). "On General Min-Max Theorems," Pacific Journal of Mathematics, 8, 171–176.
Skiadas, C. (1992). "Advances in the Theory of Choice and Asset Pricing," Ph.D. Dissertation, Stanford University.
Stokey, N. and R. E. Lucas Jr. (1989). Recursive Methods in Economic Dynamics. Cambridge: Harvard University Press.
Walley, P. (1991). Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.
Wasserman, L. A. (1988). "Some Applications of Belief Functions to Statistical Inference," Ph.D. Dissertation, University of Toronto.
Intertemporal asset pricing
—— (1990). "Prior Envelopes Based on Belief Functions," Annals of Statistics, 18, 454–464.
Wasserman, L. A. and J. Kadane (1990). "Bayes' Theorem for Choquet Capacities," Annals of Statistics, 18, 1328–1339.
West, K. D. (1988). "Bubbles, Fads and Stock Price Volatility Tests: A Partial Evaluation," Journal of Finance, 43, 639–660.
Zarnowitz, V. (1984). "Business Cycle Analysis and Expectational Survey Data," in Leading Indicators and Business Cycle Surveys, eds K. H. Openheimer and G. Poser. Aldershot: Gower Publishing.
—— (1992). Business Cycles: Theory, History, Indicators and Forecasting. Chicago: University of Chicago Press.
Zarnowitz, V. and L. A. Lambros (1987). "Consensus and Uncertainty in Economic Prediction," Journal of Political Economy, 95, 591–621.
Zeidler, E. (1986). Nonlinear Functional Analysis and Its Applications, I: Fixed-Point Theorems. New York: Springer-Verlag.
19 Sharing beliefs
Between agreeing and disagreeing
Antoine Billot, Alain Chateauneuf, Itzhak Gilboa, and Jean-Marc Tallon
19.1. Introduction

When is it Pareto optimal for risk averse agents to take bets? Under what conditions do they choose to introduce uncertainty into an otherwise certain economic environment? One obvious case is where they do not share beliefs. As in the classical (theoretical) example of horse lotteries, people who do not agree on probability assessments do find it mutually beneficial to engage in uncertainty-generating trade. If the agents involved are Bayesian expected utility maximizers and strictly risk averse, it is not hard to see that disagreement on probabilities is the only way that betting, understood as trade of an uncertain asset, may be Pareto improving when starting from a full insurance allocation. On the other hand, any such disagreement induces betting. Put differently, Pareto optimality dictates either that there be no betting (in case beliefs are common to all agents) or that there be betting (in case of disagreement).

This is somewhat puzzling, because there is no lack of allocation-neutral, "sunspot" sources of uncertainty in the world around us. If every disagreement on probabilities of states of the world suggests a Pareto improving trade, one might have expected to see much more betting taking place. Rather than believing that people who do not bet necessarily share probabilistic beliefs about anything they do not bet on (or, to be precise, share these beliefs up to some slack allowed by transaction costs), we tend to take the relative rarity of bets as a piece of empirical evidence against the Bayesian model. It seems that often people do not bet because they are uncertainty averse, and they therefore tend to avoid uncertainty that they know little about. It follows that a person's willingness to bet will increase with her subjective confidence in her information and in her likelihood assessments.
It is worth emphasizing that Bewley’s (1986) motivation for his work on Knightian decision theory was partly this absence of observed widespread betting.
Billot, Antoine, Alain Chateauneuf, Itzhak Gilboa, and Jean-Marc Tallon (2000) “Sharing beliefs: between agreeing and disagreeing,” Econometrica, 68, 685–694.
While we do not attempt to argue that the full complexity of betting behavior can be explained by the type of models we study here,1 we are led to ask how much can be explained by these models if we relax some of the more demanding assumptions of the Bayesian model. Specifically, we consider maxmin expected utility with a nonunique prior (Gilboa and Schmeidler, 1989), which captures Knightian uncertainty (Knight, 1921). Assume that such uncertainty averse agents, who are also risk averse, give rise to an economy in which there is no aggregate risk. When does there exist a full insurance (that is, no-bet) allocation that is also Pareto optimal? When is it the case that all Pareto optimal allocations are full insurance? Is any betting due to different beliefs, and, conversely, does a difference in beliefs always trigger some betting?

In the multiple prior model an individual is characterized by a utility function and a nonempty, closed and convex set of probability measures. The individual evaluates every act by its expected utility according to each possible probability measure, and chooses an act whose minimal expected utility is the highest. The family of preference relations described by this model strictly contains the relations described by Choquet expected utility with a convex capacity (Schmeidler, 1989). Consider now a pair of agents conforming to the multiple prior model. It is an easy extension of the expected utility analysis to show that these agents will not bet against one another if they share at least one prior. Moreover, in a general framework with more than two agents and complex bets possibly involving several of them, it is easy to show, following Dow and Werlang's (1992) early intuition, that Pareto optimal allocations are indeed full insurance allocations whenever agents' sets of priors have a nonempty intersection (see, e.g. Dana, 1998; Tallon, 1998).
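The evaluation rule just described, minimal expected utility over the prior set, can be sketched numerically. The two-state world, the square-root utility index, and the particular pair of priors below are illustrative assumptions, not part of the chapter's model:

```python
import numpy as np

def maxmin_eu(acts, priors, u):
    """Gilboa-Schmeidler evaluation: each act gets its minimal expected
    utility over the set of priors; the agent picks the act whose minimal
    expected utility is highest."""
    values = {name: min(float(p @ u(np.asarray(c, dtype=float))) for p in priors)
              for name, c in acts.items()}
    return max(values, key=values.get), values

# Two states; a strictly concave (risk averse) utility index.
priors = [np.array([0.4, 0.6]), np.array([0.6, 0.4])]   # nonsingleton prior set
acts = {"bet": [100.0, 0.0], "safe": [40.0, 40.0]}
best, values = maxmin_eu(acts, priors, np.sqrt)
# The worst-case prior deflates the bet, so the uncertainty averse agent stays safe.
```

Here the bet's worst-case expected utility is min(0.4·10, 0.6·10) = 4, below the sure √40 ≈ 6.32, so the agent declines to bet even though one of her priors would make the bet attractive to a Bayesian.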
The question of whether the converse to this result holds arises naturally: is commonality of beliefs, in the sense of agents sharing a prior in common, exactly what is needed to explain, within the framework of the multiple prior model, the absence of betting on the many possible sources of "extrinsic" uncertainty? Differently put, is the observation of a Pareto optimal allocation that is immune to sunspots enough to tell us something about the intersection of agents' sets of priors? It turns out that we can answer this question affirmatively and that the result in the Bayesian model has a conceptually identical counterpart in the multiple prior model. Under the same nontriviality conditions, there exists a Pareto optimal full insurance allocation if and only if all Pareto optimal allocations provide full insurance, and this holds if and only if all agents share a prior probability on the states of the world. In other words, commonality of beliefs is the necessary and sufficient condition to explain the absence of betting.

Whereas in the Bayesian model "sharing a prior" could only mean "having an identical prior," in the multiple prior model this phrase may be read as "having at least one prior in common." With this convention in place, the result holds verbatim. Bayesian agents either agree on probability assessments, or disagree enough to bet against each other. By contrast, uncertainty averse agents can be in a "grey area" between agreeing and disagreeing: they may not agree in the sense of having the same set of possible priors, yet not disagree in the sense of being willing to bet against each other.
Finally, we emphasize another contribution of this note. In showing that commonality of beliefs is the minimal assumption explaining the absence of bets, we prove a separation theorem for n convex sets that might be of interest on its own. The rest of this chapter is organized as follows. Section 19.2 provides the setup of the model. In Section 19.3 we state the main result and the separation theorem. Proofs are relegated to an Appendix.
19.2. Setup

The economy we consider is a standard two-period pure-exchange economy with uncertainty in the second period; it is standard in all respects but agents' preferences. The state space is S, and Σ is a σ-algebra of subsets of S, so that (S, Σ) is a measurable state space. There are n agents indexed by subscript i. We assume (i) that there is only one good, which can be interpreted as income or money; and (ii) that there is no aggregate uncertainty. Trading an uncertain asset is thus interpreted as betting rather than as hedging.

Let B(S, Σ) be the Banach space of real-valued, bounded and measurable functions on S, endowed with the sup-norm. Let ba(S, Σ) be the space of bounded finitely additive measures on (S, Σ) endowed with the weak∗-topology. Agent i's consumption Ci is a positive element of B(S, Σ), that is, Ci(s) is the consumption of agent i in state s. Denote by w ∈ B(S, Σ) the constant-across-states aggregate endowment, and assume that w > 0. An allocation C = (C1, . . . , Cn) is feasible if ∑i Ci = w. An allocation is interior if Ci(s) > 0 for all i, for all s.

In the multiple prior approach, each agent i is endowed with a utility index Ui : R+ → R and a set Pi of probability distributions over S. Ui is defined up to a positive affine transformation, and is taken to be differentiable, strictly increasing, and strictly concave. Pi is a convex and closed subset of ba(S, Σ). We assume that all priors in Pi are σ-additive.2 Note that Pi is compact in the weak∗-topology, since it is a weak∗-closed subset of the set of finitely additive probability measures on Σ, which is compact in the weak∗-topology (see, e.g. Dunford and Schwartz, 1958). The norm-dual of B(S, Σ), which is isometrically isomorphic to ba(S, Σ), will be denoted B∗(S, Σ). The overall utility function Vi defined over B(S, Σ) then takes the following form:

Vi(Ci) = min_{π ∈ Pi} Eπ Ui(Ci).
We assume throughout that:

∀A ∈ Σ, ∀i, j, ∀πi ∈ Pi, ∀πj ∈ Pj :  πi(A) = 0 ⇐⇒ πj(A) = 0.

This assumption essentially says that all agents agree on "null events." The last definition we need is that of a full insurance allocation. An allocation C is said to be full insurance if it is constant apart from a set A ∈ Σ that has πi(A) = 0 for some (and therefore, by the assumption of mutual absolute continuity, for all) πi ∈ Pi and i.3
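For intuition, this definition can be checked mechanically in a finite-state sketch; the three-state example, the reference prior, and the numerical tolerance below are assumptions of the illustration, not part of the chapter's model:

```python
import numpy as np

def is_full_insurance(C, pi, tol=1e-12):
    """C: (n_agents, n_states) consumption matrix; pi: a reference prior.
    Full insurance requires each agent's consumption to be constant across
    all states of positive probability (null states do not matter)."""
    support = pi > 0
    # np.ptp = max - min; zero spread on the support means constancy there.
    return all(np.ptp(c[support]) <= tol for c in C)

pi = np.array([0.5, 0.5, 0.0])          # third state is null
C = np.array([[3.0, 3.0, 9.0],          # constant except on the null state
              [2.0, 2.0, 2.0]])
# C is full insurance despite the 9.0 entry, since state 3 has probability zero.
```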
19.3. The main result

The following theorem states that the set of Pareto optimal allocations and the set of full insurance allocations are either identical or disjoint. Moreover, they are identical if and only if the agents share at least one prior.

Theorem 19.1. Under the maintained assumptions, the following assertions are equivalent:

(i) There exists an interior full insurance Pareto optimal allocation.
(ii) Any Pareto optimal allocation is a full insurance allocation.
(iii) Every full insurance allocation is Pareto optimal.
(iv) ∩i≤n Pi ≠ Ø.
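In the simplest two-state case each Pi can be identified with an interval of probabilities assigned to, say, state 1, and condition (iv) becomes an overlap condition on intervals. A sketch under that assumption (the numbers are made up for illustration):

```python
def share_a_prior(intervals):
    """Each agent's prior set is an interval [lo, hi] of probabilities for
    state 1; the agents share a prior iff all the intervals overlap."""
    lo = max(l for l, _ in intervals)
    hi = min(h for _, h in intervals)
    return lo <= hi

# Overlapping prior sets: by Theorem 19.1, full insurance is Pareto optimal.
agree = share_a_prior([(0.3, 0.6), (0.4, 0.7), (0.5, 0.55)])   # True
# Disjoint prior sets: some Pareto improving bet must exist.
disagree = share_a_prior([(0.2, 0.4), (0.5, 0.7)])             # False
```

Note that a Bayesian agent corresponds to a degenerate interval, so for Bayesians the overlap condition collapses to "identical priors," exactly as the chapter's discussion anticipates.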
The intuition for the proof (and the role of some assumptions) is as follows. We prove that (iv) ⇒ (ii) ⇒ (iii) ⇒ (i) ⇒ (iv). If there is a common prior (iv), one can use strict concavity to show that a risk bearing allocation is Pareto dominated by the full insurance allocation that equals its expectation at every state, proving (ii).4 This step uses the mutual absolute continuity assumption, as well as the assumption that the probability measures we deal with are σ-additive (rather than only finitely additive). Observe that with finitely additive measures the implication (iv) ⇒ (ii) does not hold, even in a Bayesian set-up. This is so because the integral of a function with respect to a finitely additive measure may be strictly smaller than each of the values the function assumes. Therefore individuals who hold assets that they view as uncertain may not benefit from smoothing them across states.

If every Pareto optimal allocation provides full insurance (ii), the converse (iii) also holds, since no two full insurance allocations can be Pareto ranked,5 and it follows trivially that there is at least one such allocation (i). Finally, the crucial step and the main contribution of the theorem is that the existence of a full insurance Pareto optimal allocation (i) implies that there is a common prior (iv). This step does not require concavity of the utility index.6

In proving this last part we make use of the following theorem, which generalizes the standard separating hyperplane theorem and may be of interest on its own. In the Appendix we also comment on the geometric interpretation of this result, which may be viewed as a separation theorem among n convex sets.

Theorem 19.2. Let X be a locally convex linear topological space and let Pi ⊆ X, 1 ≤ i ≤ n, be convex, nonempty, and compact. Then, the following are equivalent:

(i) ∩i≤n Pi = Ø.
(ii) There exist I ⊆ {1, . . . , n}, I ≠ Ø, a point p ∈ co(∪i∈I Pi), and, for each i ∈ I, a continuous linear functional hi : X → R such that:
(a) ∀i ∈ I, hi(q − p) > 0 for all q ∈ Pi;
(b) ∑i∈I hi = 0.
An immediate corollary of Theorem 19.2 is that, under the same assumptions, if ∩i≤n Pi = Ø, there exist continuous linear functionals hi, i = 1, . . . , n, and a point p such that (a′) hi(q − p) ≥ 0 for all q ∈ Pi, for all i, (b′) ∑i≤n hi = 0, and (c′) there exist i, i′ such that the inequality in (a′) is strict. It is worthy of note that a similar result, developed independently and with a rather different motivation, is to be found in Samet (1998), for subsets of a finite dimensional simplex. Samet's result is weaker in the sense that it guarantees the existence of linear functionals as in our case, but does not guarantee that the separating hyperplanes will intersect at one point p in the convex hull of the sets, and therefore does not lend itself to a straightforward geometric interpretation. Further, Samet's result can be easily derived from the corollary above specialized to subsets of the simplex. It does not appear that Samet's argument could easily be amended to obtain ours.

Theorem 19.1 has two immediate corollaries. First, in the Choquet expected utility model with convex capacities, nonempty core intersection is equivalent to some, or all, Pareto optimal allocations being full insurance. Second, in the expected utility case, where the sets of priors are reduced to one point, some, or all, Pareto optimal allocations are full insurance allocations if and only if agents have the same beliefs (i.e. the same prior).

Note that even though we cast the argument in the multiple prior model, it should be clear from the proof that a similar result holds for the Bewley (1986) approach. In Bewley's approach, agents are also endowed with a set of priors and move away from an (exogenously defined) status quo situation only if the new situation is better than the status quo for all the probability distributions in their set of priors.
While Bewley characterizes a partial order over acts, a proposed bet will be preferred to a certain status quo if and only if this preference holds in the multiple prior model of Gilboa and Schmeidler.7 Our analysis is conducted for an economy with one good. However, the only use we make of this assumption is in arguing that all full insurance allocations are Pareto optimal. Indeed, one can generalize our results to an economy with m goods, with the slight modification that full insurance allocations that are considered for optimality be assumed Pareto optimal in each state.
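Before turning to the proofs, the key smoothing step behind (iv) ⇒ (ii) can be checked numerically: with a common prior π and a strictly concave utility index, replacing a risky consumption by its π-expectation in every state raises maxmin utility. The two states, square-root utility, and particular prior sets are assumptions of this sketch:

```python
import numpy as np

pi = np.array([0.5, 0.5])                       # common prior, in every Pi
priors = [np.array([0.3, 0.7]), pi, np.array([0.7, 0.3])]

def V(c, priors):
    """Maxmin expected utility with a square-root (strictly concave) index."""
    return min(float(p @ np.sqrt(c)) for p in priors)

C = np.array([64.0, 16.0])                      # risky consumption
C_bar = float(pi @ C) * np.ones(2)              # full insurance at E_pi[C] = 40
# Jensen's inequality plus the worst-case minimum make smoothing strictly better.
```

Concretely, V(C) = min(0.3·8 + 0.7·4, 6, 6.8) = 5.2, while V(C_bar) = √40 ≈ 6.32, so the full insurance allocation Pareto dominates, exactly as in the first step of the proof below.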
Appendix

Proof of Theorem 19.1. We first prove (iv) ⇒ (ii). Assume to the contrary that there exists an agent, say agent 1, such that for every π1 ∈ P1 and every c ∈ R+, π1({s | C1(s) < c}) + π1({s | C1(s) > c}) > 0. Let π ∈ ∩i Pi and define C̄i = Eπ Ci for all i. Abusing notation, let C̄i also denote the constant allocation giving C̄i to agent i in all states. C̄ = (C̄i)i is a feasible allocation since ∑i C̄i = ∑i Eπ Ci = Eπ [∑i Ci] = Eπ [w 1S] = w. Now,

Vi(Ci) = min_{φ ∈ Pi} Eφ Ui(Ci) ≤ Eπ Ui(Ci).
Furthermore, Eπ Ui(Ci) ≤ Ui(Eπ Ci) = Ui(C̄i) = Vi(C̄i) for all i, since Ui is concave. Since π belongs to P1, one gets that π({s | C1(s) < C̄1}) + π({s | C1(s) > C̄1}) > 0. Furthermore, π({s | C1(s) < C̄1}) = 0 is impossible, for then π({s | C1(s) > C̄1}) > 0, implying by σ-additivity of π that Eπ C1 > C̄1, a contradiction. Hence, π({s | C1(s) < C̄1}) > 0 and, similarly, π({s | C1(s) > C̄1}) > 0. It follows that V1(C1) < V1(C̄1) since U1 is strictly concave. Therefore the allocation C̄ Pareto dominates C, a contradiction.

To see that (ii) implies (iii), let C be a full insurance allocation. Assume, contrary to (iii), that it is not Pareto optimal, and is dominated by another allocation C′. By the same argument as above, the constant allocation C̄′, obtained by replacing each Ci′ with its expectation under a common prior, is at least as desirable as C′ for every agent. By transitivity of Pareto domination, C̄′ Pareto dominates C. But this is a contradiction, since both provide full insurance and there is only one good in the economy.

That (iii) implies (i) is obvious, and it remains to prove that (i) implies (iv). Suppose to the contrary that ∩i Pi = Ø, and let C be an interior Pareto optimal allocation that is a full insurance allocation (Ci is constant for all i apart from a set of measure zero, the latter notion being defined unambiguously given our mutual absolute continuity assumption). By Theorem 19.2 (where X is B∗(S, Σ) endowed with the weak∗-topology), since ∩i Pi = Ø, there exist a nonempty set I, a point p, and functionals hi ∈ B∗(S, Σ), i ∈ I, such that:

(a) ∀i ∈ I, hi(q − p) > 0 for all q ∈ Pi;
(b) ∑i∈I hi = 0.
Recall that (see, e.g. Kelley and Namioka, 1963: 155) every weak∗-continuous linear functional on the conjugate space of a linear topological space E is the evaluation at some point of E. Hence, for all i ∈ I, there exists Di ∈ B(S, Σ) such that hi(q) = q(Di) for all q ∈ B∗(S, Σ). Construct the allocation (Ĉi)i=1,...,n as follows:

Ĉi = Ci,  i ∉ I,
Ĉi = Ci + ε[Di − p(Di)1S],  i ∈ I,

with ε > 0 small enough so that Ĉ is an allocation. We first check that this allocation is feasible:

ε ∑i∈I [Di − p(Di)1S] = ε [∑i∈I Di − (∑i∈I hi(p)) 1S] = ε ∑i∈I Di,

since ∑i∈I hi = 0.
Now, Di is such that hi(q) = q(Di) for all q ∈ B∗(S, Σ), and hence q(∑i∈I Di) = 0 for all q ∈ B∗(S, Σ). To conclude that ∑i∈I Di = 0, suppose there exists s such that ∑i∈I Di(s) = a, a ≠ 0. The event {s | ∑i∈I Di(s) = a} is measurable because the Di are measurable. Now, let q be the continuous linear functional in B∗(S, Σ) corresponding to the additive probability in ba(S, Σ) with mass 1 on that event. Then q(∑i∈I Di) = 0 implies a = 0, a contradiction. Hence, ∑i∈I Di = 0.

Now, for i ∈ I, one has:

Vi(Ĉi) = Eq̂ε Ui(Ci + ε[Di − p(Di)1S])   for some q̂ε ∈ Pi
  = Vi(Ci) + ε U′i(Ci)[q̂ε(Di) − p(Di)] + o(ε)
  = Vi(Ci) + ε U′i(Ci)[hi(q̂ε − p)] + o(ε)
  ≥ Vi(Ci) + ε U′i(Ci)[inf_{q ∈ Pi} hi(q − p) + α(ε)],

where α(ε) = o(ε)/ε → 0 as ε → 0. Since inf_{q ∈ Pi} hi(q − p) > 0 by continuity of hi and compactness of Pi, and α(ε) → 0, there exists ε small enough so that the term in brackets is strictly positive. Hence, Vi(Ĉi) > Vi(Ci) for i ∈ I, and we have found a Pareto dominating allocation (Ĉi)i=1,...,n, a contradiction.

Proof of Theorem 19.2. We start with the following lemma.

Lemma. Let X be a locally convex linear topological space and let Pi ⊆ X, 1 ≤ i ≤ n, be convex, nonempty, and compact. Assume that ∩i≤n Pi = Ø but that, for all l ≤ n, ∩i≠l Pi ≠ Ø. Then there exist p ∈ co(∪i≤n Pi) and, for each i ≤ n, a continuous linear functional hi : X → R such that:

(a) ∀i ≤ n, hi(q − p) > 0 ∀q ∈ Pi;
(b) ∑i≤n hi = 0.

The geometric interpretation of this lemma is as follows. Assume that n convex and compact sets have an empty intersection, but that every n − 1 of them have a nonempty intersection. Then we can find a point p, lying in the convex hull of the union of the Pi but not included in any of them, that is "in the middle" in the following sense: one can find, for each set Pi, a hyperplane hi passing through p that leaves the entire Pi on one side, such that the normals of these hyperplanes, multiplied by appropriate positive constants, add up to zero. In the case n = 2, our lemma reduces to a standard separation theorem between two disjoint sets. For n > 2, the lemma may be considered an n-way separation among n convex sets. See Figure 19.1 for an illustration of the case n = 3.

[Figure: three convex sets labelled 1, 2, 3, with hyperplanes h1 = h1(p), h2 = h2(p), h3 = h3(p) passing through a common point p.]
Figure 19.1 Separation among three convex sets.

Proof of the Lemma. The proof is by induction on n. For n = 2, we have P1 ∩ P2 = Ø and we use a standard separation theorem (cf. Kelley and Namioka, 1963: 119, theorem on strong separation) to conclude that there is a continuous linear functional h : X → R and a number β ∈ R such that h(q) > β for q ∈ P1 and h(q) < β for q ∈ P2. Choose p such that h(p) = β, and set h1 = h and h2 = −h. By linearity of h it is possible to choose p ∈ co(P1 ∪ P2).

Assume now that the lemma holds for every n′ < n, and let (Pi)i≤n be given. Set A = ∩i<n Pi and B = Pn. Since A ∩ B = Ø, the separation theorem yields a continuous linear functional h̃n : X → R and β ∈ R such that

h̃n(q) > β  ∀q ∈ B   and   h̃n(q) < β  ∀q ∈ A.
Choose q0 ∈ X such that h̃n(q0) = β. We shift the origin to q0. Specifically, define for each i ≤ n, P̂i = {p − q0 | p ∈ Pi} = Pi − q0. Naturally, (P̂i)i≤n and their intersections inherit all relevant properties of (Pi)i. Denote B̂ = B − q0 = P̂n and Â = A − q0 = ∩i<n P̂i, so that h̃n(q) > 0 ∀q ∈ B̂ and h̃n(q) < 0 ∀q ∈ Â. Consider X′ = {q ∈ X | h̃n(q) = 0}. X′ is a locally convex linear topological subspace of X. Focusing on this subspace, define P̂′i = P̂i ∩ X′ for i < n. Obviously, P̂′i is convex and compact for every i < n. We argue that it is also nonempty. Indeed, P̂i contains Â. On the other hand, P̂i has a nonempty intersection with B̂ = P̂n. By convexity of P̂i and linearity of h̃n, P̂′i ≠ Ø. Similarly, for l < n, ∩i≠l,n P̂i contains Â and intersects B̂, and we therefore get

∩i≠l,n P̂′i ≠ Ø  ∀l < n.

Hence ∩i<n P̂′i = Ø (it equals Â ∩ X′, and h̃n < 0 on Â), while every n − 2 of the sets (P̂′i)i<n have a nonempty intersection. By the induction hypothesis applied in X′, there exist p̂ ∈ co(∪i<n P̂′i) and continuous linear functionals h′i : X′ → R,
i < n, such that h′i(q − p̂) > 0 ∀q ∈ P̂′i, i < n, and ∑i<n h′i = 0. By Fact 19.1 below, each h′i can be extended to a continuous linear functional hi : X → R such that hi(q − p̂) > 0 ∀q ∈ P̂i. Define h = ∑i<n hi. For every q ∈ X′,

h(q) = ∑i<n hi(q) = ∑i<n h′i(q) = 0.

Hence h̃n and h are continuous linear functionals on X satisfying

h̃n(q) = 0 ⇒ h(q) = 0  ∀q ∈ X.
By standard arguments (see Fact 19.2 below), there exists α ∈ R such that h(q) = αh̃n(q) ∀q ∈ X. We wish to show that α < 0. Consider q ∈ Â = ∩i<n P̂i. Since hi(q − p̂) > 0 ∀i < n and h(p̂) = 0, we obtain

h(q) = h(q − p̂) = ∑i<n hi(q − p̂) > 0.

On the other hand, h̃n(q) < 0 since q ∈ Â. It follows that α < 0. Define hn = (−α)h̃n. Since (−α) > 0, hn(q − p̂) = hn(q) > 0 ∀q ∈ P̂n. To conclude, set p = p̂ + q0. Observe that p̂ ∈ co(∪i<n P̂i) and hence p ∈ co(∪i≤n Pi). We claim that p and (hi)i≤n satisfy (a) and (b). Indeed, for every i ≤ n and every q ∈ Pi:

hi(q − p) = hi((q − q0) − (p − q0)) = hi((q − q0) − p̂) > 0,

since q − q0 ∈ P̂i. Finally, ∑i≤n hi = 0 by construction of hn.
The following two facts, which are used in the proof above, are straightforward and/or well known.

Fact 19.1. Let X be a locally convex linear topological space. Let ĥ be a continuous linear functional and X′ = {p ∈ X | ĥ(p) = 0}. Assume that C ⊆ X is convex and compact, and that C ∩ X′ ≠ Ø. Further assume that h′ : X′ → R is a continuous linear functional such that h′(p) > 0 ∀p ∈ C ∩ X′. Then h′ can be extended to a continuous linear functional h : X → R such that h(p) > 0 ∀p ∈ C.

Proof of Fact 19.1. Set D = {p ∈ X′ | h′(p) = 0}. Observe that D ≠ Ø since the origin is in D. Thus C and D are disjoint nonempty closed and convex sets in
˜ → R and d ∈ R be X, and C is compact. Let a continuous linear functional h:X such that ˜ h(p)
∀p ∈ D
and
˜ h(p) >d
∀p ∈ C.
We claim that h˜ has to be constant on D. Indeed, assume that for some p, q ∈ D, ˜ ˜ ˆ ˆ h(p) = h(q). Since p, q ∈ D implies h(p) = h(q) = 0 and h (p) = h (q) = 0, ˜ + α(q − p)) | α ∈ we conclude that p + α(q − p) ∈ D for all α ∈ R. Hence {h(p ˜ R} = R, a contradiction to the fact that h(p) < d ∀ p ∈ D. Thus there is a c ∈ R ˜ such that h(p) = c ∀p ∈ D. Since the origin is in D, we obtain c = 0. It follows that d > 0 and therefore ˜ h(p) >d>0
∀ p ∈ C.
We now wish to show that, up to multiplication by a positive constant, h˜ extends h on X. Restrict attention to X . If p ∈ X satisfies h (p) = 0, then p ∈ D ˜ and we know that h(p) = 0. By Fact 19.2 below, there exists α ∈ R such that ˜ h(p) = αh (p) ∀ p ∈ X . However, on C ∩ X , both h˜ and h are positive. Therefore α > 0. Hence h ≡ (1/α)h˜ extends h on X and is positive on all of C. ˜ h: X → R be linear. Assume that Fact 19.2. Let X be a linear space and let h, ˜ h(q) = 0 ⇒ h(q) = 0 ∀ q ∈ X. ˜ Then there exists α ∈ R such that h(q) = α h(q) ∀q ∈ X. We skip the proof of this Fact and now turn to the proof of Theorem 19.2: (i) ⇒ (ii). Assume that ∩i≤n Pi = Ø. Let I be a minimal (with respect to set inclusion) subset of {1, . . . , n} with the property that ∩i∈I Pi = Ø. Since ∩ni=1 Pi = Ø, but Pi = Ø for every i, such a set I exists and for every such set | I | ≥ 2. Apply the Lemma to I . (ii) ⇒ (i). Assume that a point p ∈ X, a set I ⊆ {1, . . . , n}, and functionals (hi )i∈I exist asrequired, and suppose, contrary to (i), that there exists q ∈ ∩i≤n Pi . Then, by (a), i∈I hi (q − p) > 0, contrary to (b).
Acknowledgments We thank participants of the Erasmus conference at Tilburg University and two referees for useful comments.
Notes
1 In particular, we ignore the social aspects of betting as well as the strategic ones (see, e.g., Milgrom and Stokey, 1982).
2 Note that the axiomatization of Gilboa and Schmeidler (1989) delivers only finitely additive probability distributions.
3 It is straightforward to check that C is a full insurance allocation if and only if, ∀i, Ci is constant apart from a set Ai ∈ Σ that has πi(Ai) = 0 for some (and therefore, by the assumption of mutual absolute continuity, for all) πi ∈ Pi.
4 This implication follows the logic of similar results for Choquet expected utility in Chateauneuf et al. (2000).
5 The fact that (iv) implies (ii) and (iii) also appears in Dana (1998), but in a finite set-up.
6 Dana (1998) shows that if there is a full insurance competitive equilibrium in this economy with finitely many states, then agents share a prior in common. Her proof, however, uses the concavity of the utility index and relies on the existence of a competitive equilibrium.
7 Bewley (1989) contains a similar no-trade result for agents whose preferences are given by partial orders as in Bewley (1986). His proof is very similar to Samet's, and his result is weaker than Theorem 19.2 in the same sense that Samet's is.
References
Bewley, T. (1986). "Knightian Decision Theory: Part I," Discussion Paper 807, Cowles Foundation.
—— (1989). "Market Innovation and Entrepreneurship: A Knightian View," Discussion Paper 905, Cowles Foundation.
Chateauneuf, A., R. A. Dana, and J.-M. Tallon (2000). "Optimal Risk-sharing Rules and Equilibria with Choquet Expected Utility," Journal of Mathematical Economics, 34(2), 191–214.
Dana, R. A. (1998). "Pricing Rules when Agents have Non-additive Expected Utility and Homogeneous Expectations," Cahier du Ceremade, Université Paris IX.
Dow, J., and S. Werlang (1992). "Uncertainty Aversion, Risk Aversion, and the Optimal Choice of Portfolio," Econometrica, 60, 197–204. (Reprinted as Chapter 17 in this volume.)
Dunford, N., and J. T. Schwartz (1958). Linear Operators. Part I. New York: Interscience.
Gilboa, I., and D. Schmeidler (1989). "Maxmin Expected Utility with a Non-unique Prior," Journal of Mathematical Economics, 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Kelley, J., and I. Namioka (1963). Linear Topological Spaces. New York: Springer-Verlag.
Knight, F. (1921). Risk, Uncertainty and Profit. Boston: Houghton Mifflin.
Milgrom, P., and N. Stokey (1982). "Information, Trade and Common Knowledge," Journal of Economic Theory, 26, 17–27.
Samet, D. (1998). "Common Priors and Separation of Convex Sets," Games and Economic Behavior, 24, 172–174.
Schmeidler, D. (1989). "Subjective Probability and Expected Utility without Additivity," Econometrica, 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Tallon, J.-M. (1998). "Do Sunspots Matter when Agents are Choquet-Expected-Utility Maximizers?" Journal of Economic Dynamics and Control, 22, 357–368.
20 Equilibrium in beliefs under uncertainty
Kin Chung Lo
20.1. Introduction

Due to its simplicity and tractability, the subjective expected utility model axiomatized by Savage (1954) has been the most important theory in analyzing human decision making under uncertainty. In particular, it is almost universally used in game theory. Using the subjective expected utility model to represent players' preferences, a large number of equilibrium concepts have been developed, the central one being Nash Equilibrium. On the other hand, the descriptive validity of the subjective expected utility model has been questioned, for example, because of Ellsberg's (1961) famous thought experiment, a version of which follows. Suppose there are two urns. Urn 1 contains 50 red balls and 50 black balls. Urn 2 contains 100 balls. Each ball in urn 2 can be either red or black, but the relative proportions are not specified. Consider the four acts listed in Table 20.1. Ellsberg argues that the typical preferences over the acts are f1 ∼ f2 ≻ f3 ∼ f4, where the strict preference f2 ≻ f3 reflects an aversion to the "ambiguity" or "Knightian uncertainty" associated with urn 2. Subsequent experimental studies generally support the finding that people are averse to ambiguity. (A summary can be found in Camerer and Weber (1992).) Such aversion contradicts the subjective expected utility model, as is readily demonstrated for the Ellsberg experiment. In fact, it contradicts any model of preferences in which underlying beliefs are represented by a probability measure. (Machina and Schmeidler (1992) call such preferences "probabilistically sophisticated." In this chapter, I reserve the term "Bayesian" for subjective expected utility maximizers.)

The Ellsberg paradox has motivated generalizations of the subjective expected utility model. In the multiple priors model axiomatized by Gilboa and Schmeidler (1989), the single prior of Savage is replaced by a closed and convex set of probability measures. The decision maker is said to be uncertainty averse if the set is not a singleton. He evaluates an act by computing the minimum expected utility over the probability measures in his set of priors.
Lo, K. C. (1996) Equilibrium in beliefs under uncertainty, J. Econ. Theory, 71: 443–484.
Table 20.1 Acts in Ellsberg's experiment

f1  Win $100 if the ball drawn from urn 1 is black
f2  Win $100 if the ball drawn from urn 1 is red
f3  Win $100 if the ball drawn from urn 2 is black
f4  Win $100 if the ball drawn from urn 2 is red
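The multiple priors model reproduces the typical Ellsberg ranking over the acts in Table 20.1. A sketch assuming u($100) = 1, u($0) = 0, a single 50-50 prior for urn 1, and the set of all possible compositions for urn 2 (these modelling choices are illustrative, not imposed by the chapter):

```python
def min_eu(priors, win_on_black):
    """Worst-case winning probability = minimal expected utility
    when u(win) = 1 and u(lose) = 0. Each prior is P(black)."""
    return min(p_black if win_on_black else 1 - p_black for p_black in priors)

urn1 = [0.5]                                  # known 50-50 composition
urn2 = [i / 100 for i in range(101)]          # any composition is possible

f1, f2 = min_eu(urn1, True), min_eu(urn1, False)   # both 0.5
f3, f4 = min_eu(urn2, True), min_eu(urn2, False)   # both 0.0
# f1 ~ f2 > f3 ~ f4: the ambiguity of urn 2 is penalized by the worst case.
```

A Bayesian with any single prior on urn 2 would have to rank f3 or f4 at least as high as 0.5, which is why the observed pattern contradicts probabilistic sophistication.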
Although the Ellsberg Paradox only involves a single decision maker facing an exogenously specified environment, it is natural to think that ambiguity aversion is also common in decision-making problems where more than one person is involved. Since existing equilibrium notions for games are defined under the assumption that players are subjective expected utility maximizers, deviations from the Savage model to accommodate aversion to uncertainty make it necessary to redefine equilibrium concepts. This chapter generalizes Nash Equilibrium and one of its variations in normal form games to allow the beliefs of each player to be representable by a closed and convex set of probability measures, as in the Gilboa–Schmeidler model. The chapter then employs the generalized equilibrium concepts to study the effects of uncertainty aversion on strategic interaction in normal form games.

Note that in order to carry out a ceteris paribus study of the effects of uncertainty aversion on how a game is played, the solution concept we use for uncertainty averse players should differ from that for Bayesian players only in terms of attitude toward uncertainty. In particular, the solution concepts should share, as far as possible, comparable epistemic conditions. That is, the requirements on what the players should know about each other's beliefs and rationality underlying the new equilibrium concepts should be "similar" to those underlying familiar equilibrium concepts. This point is emphasized throughout the chapter and is used to differentiate the equilibrium concepts proposed here from those proposed by Dow and Werlang (1994) and Klibanoff (1993), who also attempt to generalize Nash Equilibrium in normal form games to accommodate uncertainty aversion.

The chapter is organized as follows. Section 20.2 contains a brief review of the multiple priors model and a discussion of how it is adapted to the context of normal form games.
Section 20.3 defines Nash Equilibrium and one of its variants. Section 20.4 defines and discusses the generalized equilibrium concepts used in this chapter. Section 20.5 makes use of the equilibrium concepts defined in Section 20.4 to investigate how uncertainty aversion affects players’ strategy choices and welfare. Section 20.6 identifies how uncertainty aversion is related to the structure of a game. Section 20.7 discusses the epistemic conditions of the equilibrium concepts for uncertainty averse players used in this chapter and compares them with those underlying the corresponding equilibrium notions for subjective expected utility maximizing players. Section 20.8 provides a comparison with Dow and Werlang (1994) and Klibanoff (1993). The comparison also serves to clarify the implications of adopting different approaches to developing equilibrium notions for games with uncertainty averse players. Section 20.9 argues that the results in previous sections hold even if we drop the particular functional form
Equilibrium in beliefs under uncertainty
485
of the utility function proposed by Gilboa and Schmeidler (1989) but retain some of its basic properties. Some concluding remarks are offered in Section 20.10.
20.2. Preliminaries

20.2.1. Multiple priors model

In this section, I provide a brief review of the multiple priors model and a discussion of some of its properties that will be relevant in later sections. For any topological space Y, adopt the Borel σ-algebra and denote by M(Y) the set of all probability measures over Y.1 Adopt the weak∗ topology on the set of all finitely additive probability measures over Y and the induced topology on subsets. Let X be the space of outcomes and Ω the space of states of nature. Let F be the set of all bounded measurable functions from Ω to M(X).2 That is, F is the set of two-stage, horse-race/roulette-wheel acts, as in Anscombe and Aumann (1963). For f, g ∈ F and α ∈ [0, 1], αf + (1 − α)g ≡ h, where h(ω) = αf(ω) + (1 − α)g(ω) for all ω ∈ Ω. An act f ∈ F is called a constant act if f(ω) = p for all ω ∈ Ω; such an act involves (probabilistic) risk but no uncertainty. For notational simplicity, I also use p ∈ M(X) to denote the constant act that yields p in every state of the world, x ∈ X the degenerate probability measure on x, and ω ∈ Ω the event {ω}. The primitive is a preference ordering ≽ over acts. The relations of strict preference and indifference are denoted by ≻ and ∼, respectively. Gilboa and Schmeidler (1989) impose a set of axioms on ≽ that are necessary and sufficient for ≽ to be represented by a numerical function having the following structure: there exists an affine function u : M(X) → ℝ and a unique, nonempty, closed and convex set Δ of finitely additive probability measures on Ω such that for all f, g ∈ F,

  f ≽ g ⟺ min_{p∈Δ} ∫ u ◦ f dp ≥ min_{p∈Δ} ∫ u ◦ g dp.   (20.1)
It is convenient, but in no way essential, to interpret Δ as “representing the beliefs underlying ≽”; I provide no formal justification for such an interpretation. The difference between the subjective expected utility model and the multiple priors model can be illustrated by a simple example. Suppose Ω = {ω1, ω2} and X = ℝ. Consider an act f ≡ (f(ω1), f(ω2)). If the decision maker is a Bayesian and his beliefs over Ω are represented by a probability measure p, the utility of f is p(ω1)u(f(ω1)) + p(ω2)u(f(ω2)). On the other hand, if the decision maker is uncertainty averse with the set of priors

  Δ = {p ∈ M({ω1, ω2}) | p_l ≤ p(ω1) ≤ p_h, with 0 ≤ p_l < p_h ≤ 1},
486
Kin Chung Lo
then the utility of f is

  p_l u(f(ω1)) + (1 − p_l)u(f(ω2))   if u(f(ω1)) ≥ u(f(ω2)),
  p_h u(f(ω1)) + (1 − p_h)u(f(ω2))   if u(f(ω1)) ≤ u(f(ω2)).
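The case distinction above is just the minimum of a function that is linear in the prior over the interval [p_l, p_h], so the minimum is attained at an endpoint. A minimal numerical sketch (the utility values and interval bounds are invented for illustration):

```python
def multiple_priors_utility(u1, u2, p_low, p_high):
    """Maxmin utility of a two-state act with u1 = u(f(w1)),
    u2 = u(f(w2)) and set of priors {p : p_low <= p(w1) <= p_high}.
    The minimum of the linear map p -> p*u1 + (1-p)*u2 over an
    interval is attained at one of the two endpoints."""
    return min(p * u1 + (1 - p) * u2 for p in (p_low, p_high))

# When u(f(w1)) >= u(f(w2)), the minimizing prior puts the smallest
# weight p_low on state w1, matching the first branch of the formula:
assert abs(multiple_priors_utility(10, 2, 0.3, 0.7) - 4.4) < 1e-9
# When u(f(w1)) <= u(f(w2)), the minimizing prior is p_high:
assert abs(multiple_priors_utility(2, 10, 0.3, 0.7) - 4.4) < 1e-9
```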
Note that given any act f with u(f(ω1)) > u(f(ω2)), (ω1, p_l; ω2, 1 − p_l)3 can be interpreted as local probabilistic beliefs at f in the following sense. There exists an open neighborhood of f such that for any two acts g and h in the neighborhood,

  g ≽ h ⟺ p_l u(g(ω1)) + (1 − p_l)u(g(ω2)) ≥ p_l u(h(ω1)) + (1 − p_l)u(h(ω2)).

That is, the individual behaves like an expected utility maximizer in that neighborhood with beliefs represented by (ω1, p_l; ω2, 1 − p_l). Similarly, (ω1, p_h; ω2, 1 − p_h) represents the local probabilistic beliefs at f if u(f(ω1)) < u(f(ω2)). Therefore, a decision maker who “consumes” different acts may have different local probability measures at those acts.

There are three issues regarding the multiple priors model that will be relevant when the model is applied to normal form games. The first concerns the decision maker’s preference for randomization. According to the multiple priors model, preferences over constant acts, which can be identified with objective lotteries over X, are represented by u(·) and thus conform with the von Neumann–Morgenstern model. The preference ordering ≽ over the set of all acts is quasiconcave. That is, for any two acts f, g ∈ F with f ∼ g, we have αf + (1 − α)g ≽ f for any α ∈ (0, 1). This implies that the decision maker may have a strict incentive to randomize among acts.

The second concerns the notion of null event. Given any preference ordering ≽ over acts, define an event T ⊂ Ω to be ≽-null as in Savage (1954): T is ≽-null if for all f, f′, g ∈ F,

  [f(ω) if ω ∈ T; g(ω) if ω ∉ T] ∼ [f′(ω) if ω ∈ T; g(ω) if ω ∉ T].

In other words, an event T is ≽-null if the decision maker does not care about payoffs in states belonging to T. This can be interpreted as the decision maker knowing (or believing) that T can never happen. If ≽ is an expected utility preference ordering, then T is ≽-null if and only if the decision maker attaches zero probability to T.
If ≽ is represented by the multiple priors model, then T is ≽-null if and only if every probability measure in Δ attaches zero probability to T.

Finally, the notion of stochastic independence will also be relevant when the multiple priors model is applied to games having more than two players. Suppose the set of states Ω is a product space Ω1 × · · · × ΩN. In the case of a subjective expected utility maximizer, where beliefs are represented by a probability measure p ∈ M(Ω), beliefs are said to be stochastically independent if p is a product measure: p = m1 × · · · × mN, where mi ∈ M(Ωi) for all i. In the case of uncertainty aversion,
the decision maker’s beliefs over Ω are represented by a closed and convex set of probability measures Δ. Let marg_{Ωi} Δ be the set of marginal probability measures on Ωi as one varies over all the probability measures in Δ. That is,

  marg_{Ωi} Δ ≡ {mi ∈ M(Ωi) | ∃p ∈ Δ such that mi = marg_{Ωi} p}.

Following Gilboa and Schmeidler (1989: 150–151), say that the decision maker’s beliefs are stochastically independent if

  Δ = closed convex hull of {m1 × · · · × mN | mi ∈ marg_{Ωi} Δ for all i}.

That is, Δ is the smallest closed convex set containing all the product measures formed from marginals in marg_{Ωi} Δ.

20.2.2. Normal form games

This section defines n-person normal form games where players’ preferences are represented by the multiple priors model. Throughout, the indices i, j, and k vary over distinct players in {1, . . . , n}. Unless specified otherwise, the quantifier “for all such i, j, and k” is to be understood. As usual, −i denotes the set of all players other than i. Player i’s finite pure strategy space is Si with typical element si. The set of pure strategy profiles is S ≡ S1 × · · · × Sn. The game specifies an outcome function gi : S → X for player i. Since mixed strategies induce lotteries over X, we specify an affine function ûi : M(X) → ℝ to represent player i’s preference ordering over M(X). A set of strategy profiles, outcome functions, and utility functions determines a normal form game (Si, gi, ûi), i = 1, . . . , n. Let M(Si) be the set of mixed strategies for player i with typical element σi. The set of mixed strategy profiles is therefore M(S1) × · · · × M(Sn). σi(si) denotes the probability of playing si according to the mixed strategy σi, σ−i(s−i) denotes Π_{j≠i} σj(sj), and σ−i is the corresponding probability measure on S−i ≡ ×_{j≠i} Sj. Note that when players are Bayesians, σi is sometimes interpreted as the probabilistic conjecture held by i’s opponents about i’s pure strategy choice.
This chapter adopts the view that uncertainty averse players have a strict incentive to randomize.4 Therefore, σi represents player i’s conscious randomization. For example, suppose a factory employer has two pure strategies, s′ = monitor worker 1 and s″ = monitor worker 2. His decision problem is to choose a (possibly degenerate) random device to determine which worker he is going to monitor. (See Section 20.4.2 for arguments for and against this approach.) Assume that player i is uncertain about the strategy choices of all the other players. Since the objects of choice for player j are the mixed strategies in M(Sj), the relevant state space for player i is ×_{j≠i} M(Sj), endowed with the product topology. Each mixed strategy of player i can be regarded as an act over this state space. If player i plays σi and the other players play σ−i, i receives the lottery that yields outcome gi(si, s−i) with probability σi(si)σ−i(s−i). Note that this lottery has finite support because S, and therefore {gi(s)}s∈S, are finite sets. It is also easy to see that the act corresponding to any mixed strategy is bounded and
measurable in the sense of the preceding subsection. Consistent with the multiple priors model, player i’s beliefs over ×_{j≠i} M(Sj) are represented by a closed and convex set of probability measures B̂i. Therefore, the objective of player i is to choose σi ∈ M(Si) to maximize

  min_{p̂i∈B̂i} ∫_{×_{j≠i} M(Sj)} Σ_{si∈Si} Σ_{s−i∈S−i} ûi(gi(si, s−i)) σi(si) σ−i(s−i) dp̂i(σ−i).

Define the payoff function ui : S → ℝ as follows: ui(s) ≡ ûi(gi(s)) for all s ∈ S. A normal form game can then be denoted alternatively as (Si, ui), i = 1, . . . , n, and the objective function of player i can be restated in the form

  min_{p̂i∈B̂i} ∫_{×_{j≠i} M(Sj)} Σ_{si∈Si} Σ_{s−i∈S−i} ui(si, s−i) σi(si) σ−i(s−i) dp̂i(σ−i).   (20.2)
In order to produce a simpler formulation of player i’s objective function, note that each element in B̂i is a probability measure over a set of probability measures. Therefore, the standard rule for reducing two-stage lotteries leads to the following construction of Bi ⊆ M(S−i):

  Bi ≡ {pi ∈ M(S−i) | ∃p̂i ∈ B̂i such that pi(s−i) = ∫_{×_{j≠i} M(Sj)} σ−i(s−i) dp̂i(σ−i) for all s−i ∈ S−i}.
The objective function of player i can now be rewritten as

  min_{pi∈Bi} ui(σi, pi),   (20.3)

where ui(σi, pi) ≡ Σ_{si∈Si} Σ_{s−i∈S−i} ui(si, s−i) σi(si) pi(s−i). Convexity of B̂i implies that Bi is also convex. Further, from the perspective of the multiple priors model (20.1), (20.3) admits a natural interpretation whereby S−i is the set of states of nature relevant to i and Bi is his set of priors over S−i. Because of the greater simplicity of (20.3), the equilibrium concepts used in this chapter will be expressed in terms of (20.3) and Bi instead of (20.2) and B̂i. The above construction shows that doing this is without loss of generality. However, the reader should always bear in mind that the former is derived from the latter and I will occasionally go back to the primitive level to interpret the equilibrium concepts.
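Because ui(σi, ·) is linear in pi and Bi is convex, the minimum in (20.3) is attained at an extreme point of Bi; when Bi is a polytope it therefore suffices to check finitely many priors. The sketch below illustrates this with an invented example (the payoff matrix and the two extreme priors are hypothetical, purely for illustration):

```python
def expected_utility(u, sigma, p):
    """u[si][sm]: payoffs ui(si, sm); sigma: mixed strategy of player i;
    p: probability measure over the opponents' pure strategy profiles."""
    return sum(sigma[si] * p[sm] * u[si][sm]
               for si in range(len(sigma)) for sm in range(len(p)))

def maxmin_value(u, sigma, extreme_priors):
    """Objective (20.3): the worst-case expected utility of sigma,
    minimizing over the extreme points of the set of priors Bi."""
    return min(expected_utility(u, sigma, p) for p in extreme_priors)

# Two pure strategies for player i, two opposing profiles; Bi is the
# segment between the priors (0.2, 0.8) and (0.6, 0.4).
u = [[4, 0], [1, 3]]
extreme_priors = [(0.2, 0.8), (0.6, 0.4)]
pure = maxmin_value(u, (1.0, 0.0), extreme_priors)
hedge = maxmin_value(u, (0.5, 0.5), extreme_priors)
assert hedge > pure  # mixing raises the worst case above this pure strategy
```

The strict gain from mixing in this toy example reflects the quasiconcavity of the multiple priors criterion noted in Section 20.2.1.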
20.3. Equilibrium concepts for Bayesian players This section defines equilibrium concepts for Bayesian players. The definition of equilibrium proposed by Nash (1951) can be stated as follows:
Definition 20.1. A Nash Equilibrium is a mixed strategy profile {σi∗}, i = 1, . . . , n, such that

  σi∗ ∈ BRi(σ−i∗) ≡ argmax_{σi∈M(Si)} ui(σi, σ−i∗).

Under the assumption that players are expected utility maximizers, Nash proves that any finite matrix game of complete information has a Nash Equilibrium. It is well known that there are two interpretations of Nash Equilibrium. The traditional interpretation is that σi∗ is the actual strategy used by player i. In a Nash Equilibrium, it is best for player i to use σi∗ given that the other players choose σ−i∗. The second interpretation is that σi∗ is not necessarily the actual strategy used by player i. Instead it represents the marginal beliefs of player j about what pure strategy player i is going to pick. Under this interpretation, Nash Equilibrium is usually stated as an n-tuple of probability measures {σi∗} such that

  si ∈ BRi(σ−i∗)   for all si in the support of σi∗.
Its justification is that given that player i’s beliefs are represented by σ−i∗, BRi(σ−i∗) is the set of strategies that maximize the utility of player i. So player j should “think,” if j knows i’s beliefs, that only strategies in BRi(σ−i∗) will be chosen by i. Therefore, the event that player i will choose a strategy which is not in BRi(σ−i∗) should be “null” (in the sense of Section 20.2.1) from the point of view of player j. This is the reason for imposing the requirement that every strategy si in the support of σi∗, which represents the marginal beliefs of player j, must be an element of BRi(σ−i∗).

The “beliefs” interpretation of Nash Equilibrium allows us to see clearly the source of restrictiveness of this solution concept. First, the marginal beliefs of player j and player k about what player i is going to do are represented by the same probability measure σi∗. Second, player i’s beliefs about what his opponents are going to do are required to be stochastically independent in the sense that the probability distribution σ−i∗ on the strategy choices of the other players is a product measure. We are therefore led to consider the following variation.
Definition 20.2. A Bayesian Beliefs Equilibrium is an n-tuple of probability measures {bi}, where bi ∈ M(S−i), such that5

  margSi bj ∈ BRi(bi) ≡ argmax_{σi∈M(Si)} ui(σi, bi).

It is easy to see that if {σi∗} is a Nash Equilibrium, then {σ−i∗} is a Bayesian Beliefs Equilibrium. Conversely, a Bayesian Beliefs Equilibrium {bi} constitutes a Nash Equilibrium {σi∗} if bi = σ−i∗. Note that in games involving only two players, the two equilibrium concepts are equivalent in that a Bayesian Beliefs Equilibrium must constitute a Nash Equilibrium. However, when a game involves more than two players, the definition of Bayesian Beliefs Equilibrium is more general. For instance, in a Bayesian Beliefs
Equilibrium, players i and k can disagree about what player j is going to do. That is, it is allowed that margSj bi ≠ margSj bk.

Example 20.1. Marginal beliefs disagree. Suppose the game involves three players. Player 1 only has one strategy, {X}. Player 2 only has one strategy, {Y}. Player 3 has two pure strategies, {L, R}. The payoff to player 3 is a constant. {b1 = YL, b2 = XR, b3 = XY} is a Bayesian Beliefs Equilibrium. However, it does not constitute a Nash Equilibrium because players 1 and 2 disagree about what player 3 is going to do.

Second, in a Bayesian Beliefs Equilibrium, player i is allowed to believe that the other players are playing in a correlated manner. As argued by Aumann (1987), this does not mean that the other players are actually coordinating with each other. It may simply reflect that i believes that there exist some common factors among the players that affect their behavior; for example, player i knows that all other players are professors of economics.

Example 20.2. Stochastically dependent beliefs. Suppose the game involves three players. Player 1 has two pure strategies, {U, D}. Player 2 has two pure strategies, {L, R}. Player 3 has two pure strategies, {T, B}. The payoffs of players 1 and 2 are constant. The payoff matrix for player 3 is as shown in Table 20.2. (For all n-person games presented in this chapter, the payoff is in terms of utility.) It is easy to see that b1 = (LT, 0.5; RT, 0.5), b2 = (UT, 0.5; DT, 0.5), and b3 = (UR, 0.5; DL, 0.5) constitute a Bayesian Beliefs Equilibrium. Moreover, the marginal beliefs of the players agree. However, it does not constitute a Nash Equilibrium. The reason is that player 3’s beliefs about the strategies of players 1 and 2 are stochastically dependent. If player 3 believed that the strategies of player 1 and player 2 were stochastically independent, his beliefs would be (UL, 0.25; UR, 0.25; DL, 0.25; DR, 0.25) and T would no longer be his best response.
Table 20.2 Payoff matrix for player 3

       UL    UR    DL    DR
T     −10     3     4   −10
B       0     0     0     0
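The role of correlation in Example 20.2 is easy to verify numerically from Table 20.2. The following check (illustrative code, not part of the original text) computes player 3’s expected utility from T under the correlated belief b3 and under the product of its marginals; B yields 0 in either case:

```python
# Player 3's payoffs from T against each profile of players 1 and 2
# (Table 20.2); the strategy B yields 0 against every profile.
payoff_T = {"UL": -10, "UR": 3, "DL": 4, "DR": -10}

def eu_T(belief):
    """Expected utility of T under a probability measure on profiles."""
    return sum(prob * payoff_T[profile] for profile, prob in belief.items())

correlated = {"UR": 0.5, "DL": 0.5}                 # b3 in Example 20.2
product = {profile: 0.25 for profile in payoff_T}   # independent marginals

assert eu_T(correlated) == 3.5    # T is a best response (B gives 0)
assert eu_T(product) == -3.25     # under independence, B beats T
```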
20.4. Equilibrium concepts for uncertainty averse players

20.4.1. Equilibrium concepts

This section defines generalizations of Nash Equilibrium and Bayesian Beliefs Equilibrium that allow players’ preferences to be represented by the multiple priors model. The proposed equilibrium concepts preserve all essential features of their Bayesian counterparts, except that players’ beliefs are not necessarily represented by a probability measure. Further discussion is provided in Section 20.4.2. The generalization of Bayesian Beliefs Equilibrium is presented first.

Definition 20.3. A Beliefs Equilibrium is an n-tuple of sets of probability measures {Bi}, where each Bi ⊆ M(S−i) is a nonempty, closed, and convex set, such that

  margSi Bj ⊆ BRi(Bi) ≡ argmax_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi).

When expressed in terms of B̂i, a Beliefs Equilibrium is an n-tuple of closed and convex sets of probability measures {B̂i} such that

  σi ∈ BRi(B̂i)   for all σi ∈ ∪_{p̂j∈B̂j} support of marg_{M(Si)} p̂j,

where BRi(B̂i) is the set of strategies which maximize (20.2).6 The interpretation of Beliefs Equilibrium parallels that of its Bayesian counterpart. Given that player i’s beliefs are represented by B̂i, BRi(B̂i) is the set of strategies that maximize the utility of player i. So player j should “think,” if j knows i’s beliefs, that only strategies in BRi(B̂i) will be chosen by i. Therefore, the event that player i will choose a strategy that is not in BRi(B̂i) should be “null” (in the sense of Section 20.2.1) from the point of view of player j. This is the reason for imposing the requirement that every strategy σi in the union of the supports of the probability measures in marg_{M(Si)} B̂j, which represents the marginal beliefs of player j about what player i is going to do, must be an element of BRi(B̂i). It is obvious that every Bayesian Beliefs Equilibrium is a Beliefs Equilibrium. Say that a Beliefs Equilibrium {Bi} is proper if not every Bi is a singleton.

Recall that Nash Equilibrium is different from Bayesian Beliefs Equilibrium in two respects: (i) the marginal beliefs of the players agree and (ii) the overall beliefs of each player are stochastically independent. An appropriate generalization of Nash Equilibrium to allow for uncertainty aversion should also possess these two properties. Consider therefore the following definition.

Definition 20.4. A Beliefs Equilibrium {Bi} is called a Beliefs Equilibrium with Agreement if there exists Δ1 × · · · × Δn ⊆ M(S1) × · · · × M(Sn) such that

  Bi = closed convex hull of {σ−i ∈ M(S−i) | margSj σ−i ∈ Δj}.
We can see as follows that this definition delivers the two properties “agreement” and “stochastic independence of beliefs.” As explained in Section 20.2.2, player i’s beliefs are represented by a closed and convex set of probability measures B̂i on ×_{j≠i} M(Sj). I require the marginal beliefs of the players to agree in the sense that marg_{M(Sj)} B̂i = marg_{M(Sj)} B̂k. To capture the idea that the beliefs of each player are stochastically independent, I impose the requirement that B̂i contains all the product measures. That is,

  B̂i = closed convex hull of {×_{j≠i} m̂j | m̂j ∈ marg_{M(Sj)} B̂i}.

Bi is derived from B̂i as in Section 20.2.2. By construction, we have margSj Bi = margSj Bk = convex hull of Δj, and Bi takes the form required in the definition of Beliefs Equilibrium with Agreement. Note that Beliefs Equilibrium and Beliefs Equilibrium with Agreement coincide in two-person games. Further, for n-person games, if {bi} is a Bayesian Beliefs Equilibrium with Agreement, then {bi} constitutes a Nash Equilibrium.

To provide further perspective and motivation, I state two variations of Beliefs Equilibrium and explain why they are not the focus of this chapter. Given that any strategy in BRi(Bi) is equally good for player i, it is reasonable for player j to feel completely ignorant about which strategy i will pick from BRi(Bi). This leads us to consider the following strengthening of Beliefs Equilibrium:

Definition 20.5. A Strict Beliefs Equilibrium is a Beliefs Equilibrium with margSi Bj = BRi(Bi).

A Beliefs Equilibrium may not be a Strict Beliefs Equilibrium, as demonstrated in the following example.7 The example also shows that a Strict Beliefs Equilibrium does not always exist, which is obviously a serious deficiency of this solution concept.

Example 20.3. Nonexistence of Strict Beliefs Equilibrium. The game in Table 20.3 only has one Nash Equilibrium, {U, L}. It is easy to check that it is not a Strict Beliefs Equilibrium.
In fact, there is no Strict Beliefs Equilibrium for this game. An opposite direction is to consider weakening the definition of Beliefs Equilibrium.

Table 20.3 A two-person game

        L          R
U     3, 2      −1, 2
D     0, 4    0, −100
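Both claims in Example 20.3 can be checked directly from Table 20.3 (illustrative code, not part of the original text): {U, L} is a Nash Equilibrium, but since player 2 is indifferent between L and R against U, a Strict Beliefs Equilibrium would force player 1’s beliefs to be all of M({L, R}), against which D, not U, is the maxmin best response:

```python
# Payoffs (player 1, player 2) from Table 20.3.
u1 = {("U", "L"): 3, ("U", "R"): -1, ("D", "L"): 0, ("D", "R"): 0}
u2 = {("U", "L"): 2, ("U", "R"): 2, ("D", "L"): 4, ("D", "R"): -100}

# (U, L) is a Nash Equilibrium: neither player gains by deviating.
assert u1[("U", "L")] >= u1[("D", "L")]
assert u2[("U", "L")] >= u2[("U", "R")]

# Against U, player 2 is indifferent between L and R, so BR2 = M({L, R}).
assert u2[("U", "L")] == u2[("U", "R")]

# If player 1's beliefs were all of M({L, R}), his worst-case payoffs:
worst_U = min(u1[("U", "L")], u1[("U", "R")])   # worst case of U
worst_D = min(u1[("D", "L")], u1[("D", "R")])   # worst case of D
assert worst_D > worst_U  # D beats U, so U cannot be a best response
```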
Definition 20.6. A Weak Beliefs Equilibrium is an n-tuple of beliefs {Bi} such that margSi Bj ∩ BRi(Bi) ≠ Ø.

It is clear that any Beliefs Equilibrium is a Weak Beliefs Equilibrium. The converse is not true. If margSi Bj is not a subset of BRi(Bi), there are some strategies (in j’s beliefs about i) that player i will definitely not choose. However, player j considers those strategies “possible.” On the other hand, margSi Bj ∩ BRi(Bi) ≠ Ø captures the idea that player j cannot be “too wrong.” Weak Beliefs Equilibrium is also not the focus of this chapter because we do not expect much strategic interaction if the players know so little about their opponents. However, it will be discussed further in Section 20.7 (Proposition 20.8) and Section 20.8, where its relation to the equilibrium concepts proposed by Dow and Werlang (1994) and Klibanoff (1993) is examined.

20.4.2. Discussion

20.4.2.1. Mixed strategies as objective randomization vs subjective beliefs

As pointed out earlier, a mixed strategy of a player is traditionally regarded as his conscious randomization. In recent years, a modern view of mixed strategies has emerged according to which players do not randomize. Each player chooses a definite action, but his opponents may not know which one, and the mixture represents their conjecture about his choice. Note that in games with Bayesian players, the two views are “observationally indistinguishable” in the sense that they lead to the same set of Nash Equilibria. However, this is not necessarily the case for games with uncertainty averse players. For example, consider the game in Table 20.4. (For all two-person games presented in this chapter, player 1 is the row player and player 2 is the column player.) If player 1 is Bayesian, D is never optimal no matter whether he randomizes or not. Now suppose that player 1 is uncertainty averse.
If he has a preference ordering represented by (20.3), then whatever his beliefs, the utility of the mixed strategy (U, 0.5; C, 0.5) is equal to 5 and the utility of D is only equal to 1. Therefore, D is again never optimal. On the other hand, if we assume that player 1 does not randomize and therefore has the choice set {U, C, D}, then he will strictly prefer to play D rather than U and C if his beliefs are, say, B1 = M({L, R}).

Table 20.4 A two-person game

        L        R
U     10, 1    0, 1
C      0, 1   10, 1
D      1, 1    1, 1

The above example demonstrates the need to reexamine the two views of mixed strategies when we consider games with uncertainty averse players. Such
a reexamination is provided here. It also serves to justify the adoption of the traditional view in this chapter. One justification of the modern view is that the normal form game under study is repeated over time, where each player’s pure strategy choices are independent and identically distributed random variables. A mixed strategy equilibrium can therefore be interpreted as a stochastic steady state. However, since uncertainty is presumably eliminated asymptotically, this repeated game scenario is of limited relevance for the present study of games with uncertainty averse players. The standard objection to the traditional view also does not necessarily extend to games with uncertainty averse players. The argument against the traditional view is that since expected utility is linear in probabilities, Bayesian players do not have a strict incentive to randomize (see, for instance, Brandenburger (1992: 91)). However, when preferences deviate from the expected utility model, there may exist a strict incentive to randomize. To see this, let us first go back to the context of single-person decision theory. Recall that ≽ is a preference ordering over the set of acts F, where each act f maps
into M(X). The interpretation of f is as follows. First a horse race determines the true state of nature ω ∈ Ω. The decision maker is then given the objective lottery ticket f(ω). He spins the roulette wheel as specified by f(ω) to determine the actual prize he is going to receive. Also recall that for any two acts f, f′ ∈ F and α ∈ [0, 1], αf + (1 − α)f′ refers to the act which yields the lottery ticket αf(ω) + (1 − α)f′(ω) in state ω. Suppose ≽ is strictly quasiconcave, as in the Gilboa–Schmeidler model, and the decision maker has to choose between f and f′. Suppose further that he perceives that nature moves first; that is, a particular state ω∗ ∈ Ω has been realized but the decision maker does not know what ω∗ is. If the decision maker randomizes between choosing f and f′ with probabilities α and 1 − α, respectively, he will receive the lottery αf(ω) + (1 − α)f′(ω) when ω∗ = ω. This is precisely the payoff of the act αf + (1 − α)f′ in state ω. That is, randomization enables him to enlarge the choice set from {f, f′} to {αf + (1 − α)f′ | α ∈ [0, 1]}. Correspondingly, there will “typically” be an α ∈ (0, 1) such that αf + (1 − α)f′ is optimal according to ≽.8 On the other hand, suppose the decision maker moves first and nature moves second. If the decision maker randomizes between choosing f and f′ with probabilities α and 1 − α, respectively, he faces the lottery (f, α; f′, 1 − α) that delivers act f with probability α and f′ with probability 1 − α. Therefore, randomization delivers the set {(f, α; f′, 1 − α) | α ∈ [0, 1]} of objective lotteries over F. Note that {(f, α; f′, 1 − α) | α ∈ [0, 1]} is not contained in F, and so the Gilboa–Schmeidler model is silent on the decision maker’s preference ordering over this set. This discussion translates to the context of normal form games with uncertainty averse players as follows. The key is whether player i perceives himself as moving first or last.
The assumption of strict incentive to randomize can be justified by the assumption that each player perceives himself as moving last. On the other hand, if we assume that each player perceives himself as moving first, and has
an expected utility representation for preferences over objective lotteries on F, then there will be no strict incentive to randomize. Since the perception of each player about the order of strategy choices is not observable, and there does not seem to be a compelling theoretical case for assuming either order, it would seem that either specification of strategy space merits study. (See also Dekel et al., 1991 for another instance where the perception of the players about the order of moves is important.) Another objection to the assumption of a strict incentive to randomize that might be raised in the context of uncertainty is that it contradicts “Ellsberg type” behavior. The argument goes as follows. Suppose a decision maker can choose between f3 and f4 listed in Table 20.1. If the decision maker randomizes between f3 and f4 with equal probability, it will generate the act ½f3 + ½f4, which yields the lottery [$100, ½; $0, ½] in each state. Therefore, ½f3 + ½f4 is as desirable as f1 or f2. This implies that the decision maker will be indifferent between having the opportunity to choose an act from {f1, f2} or from {f3, f4}, and the Ellsberg Paradox disappears! The discussion in previous paragraphs gives us the correct framework to handle this objection. Randomization between f3 and f4 with equal probability will generate the act ½f3 + ½f4 only if either the decision maker is explicitly told or he himself perceives that he can first draw a ball from the urn but not look at its color, then toss a fair coin and choose f3 (f4) when head (tail) comes up (Raiffa, 1961: 693). Also, the preference pattern f1 ∼ f2 ≻ f3 ∼ f4 is already sufficient to constitute one version of the Ellsberg Paradox. In this version, consideration of randomization is irrelevant. Therefore, assuming a strict incentive to randomize does not make every version of the Ellsberg Paradox disappear.
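The hedging value of randomization in the Ellsberg setup can be made concrete. In the sketch below (illustrative only), u($100) = 1 and u($0) = 0, the ambiguous urn’s set of priors is all of M({black, red}), and the coin flip is evaluated as the act ½f3 + ½f4, i.e. under the “nature moves first” perception:

```python
# Acts over the two states of the ambiguous urn: (utility if black,
# utility if red), with u($100) = 1 and u($0) = 0 assumed.
f3 = (1.0, 0.0)   # win $100 if the ball is black
f4 = (0.0, 1.0)   # win $100 if the ball is red

def maxmin(act):
    """Worst-case expected utility when the set of priors is all of
    M({black, red}); since expected utility is linear in the prior,
    the minimum is attained at p(black) = 0 or 1, i.e. at the
    smaller coordinate of the act."""
    return min(act)

# The 50-50 mixture of f3 and f4 is the constant act (1/2, 1/2).
mix = tuple(0.5 * a + 0.5 * b for a, b in zip(f3, f4))

assert maxmin(f3) == 0.0 and maxmin(f4) == 0.0
assert maxmin(mix) == 0.5  # hedging removes the ambiguity entirely
```

As the text notes, this computation applies only if the player perceives the coin flip as generating the act ½f3 + ½f4; if he instead evaluates the objective lottery (f3, ½; f4, ½) over acts, the model is silent.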
Finally, one standard defense of the interpretation of mixed strategies as objective randomization also makes sense in games with uncertainty averse players. That is, one may imagine a hypothetical “guide to playing games.” Such a guide can certainly recommend a mixed strategy or a set of mixed strategies to each player. 20.4.2.2. Knowledge of beliefs In common with the equilibrium concepts for Bayesian players presented in Section 20.3, Beliefs Equilibrium (with Agreement) assumes that each player knows his opponents’ beliefs. Three possible justifications for this assumption are as follows. First, players’ beliefs are derived from statistical information (which is not necessarily precise enough to be characterized by an objective probability measure). For instance, a salesman possesses statistical information about the bargaining behavior of customers. If a customer also knows the information, then he knows the beliefs of the salesman (Aumann and Brandenburger, 1995: 1176). Second, players may learn about their opponents’ beliefs through pre-game communication. For instance, suppose a player has two pure strategies X and Y . He may announce that he will choose X with probability between 0 and 1. In fact, Example 20.5 in Section 20.5 illustrates that it may be strictly better for a player
to make such a “vague” announcement. This point is also discussed by Greenberg (1994). Finally, players’ beliefs may be derived from public recommendation. A “guide” suggests a set of strategies to each player publicly. After receiving the suggestion, each player chooses a strategy which is unknown to his opponents. Admittedly, these justifications may not be entirely convincing. For example, when two players receive the same statistical information which does not take the form of an objective probability measure, it is demanding to assume that their beliefs agree. However, without the assumption of agreement, it would be harder to justify that players know each other’s beliefs, even though beliefs are derived from the same source of information. Nevertheless, note that the agreement assumption is equally strong if we restrict players to be Bayesians. In fact, if we allow players’ beliefs to disagree, it seems even harder for Bayesian beliefs to be mutual knowledge. How can player i know the unique subjective probability measure representing the beliefs of player j ? To conclude, I acknowledge the limitation of the above story and, following Aumann and Brandenburger (1995: 1176), only intend to show that a player may well know another’s conjecture in situations of economic interest. 20.4.2.3. Knowledge of rationality As do their Bayesian counterparts, Beliefs Equilibrium (with Agreement) assumes mutual knowledge of rationality. That is, player j ’s beliefs about player i’s behavior are focused on i’s best response given i’s true beliefs. This can be justified by the assumption that players learn their opponents’ rational behavior from past observations. We can assume that past observations are obtained from previous plays of the same normal form game (which has not been repeated sufficiently often to eliminate all uncertainty about strategy choices).9 Alternatively, we can assume that players’ knowledge of opponents’ rationality is derived from other sources. 
For example, before players i and j play a normal form game, i has observed that j was rational when j played a different game with player k.

20.4.3. Relationship with maximin strategy and rationalizability

Finally, it is useful to clarify the relationship between Beliefs Equilibrium as defined in Section 20.4.1 and some familiar concepts in the received theory of normal form games.

Definition 20.7. The strategy σi∗ is a maximin strategy for player i if

σi∗ ∈ argmax_{σi∈M(Si)} min_{pi∈M(S−i)} ui(σi, pi).
The following result is immediate: Proposition 20.1. If {M(Si )}ni=1 is a Beliefs Equilibrium, then every σi ∈ M(Si ) is a maximin strategy.
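For a two-row game, the maximin strategy of Definition 20.7 can be computed exactly without any optimization library: the worst-case payoff of a mixture (t, 1 − t) is a concave piecewise-linear function of t, so its maximum lies at an endpoint or at an intersection of two column lines. A minimal sketch (the matching-pennies matrix is only an illustration, not taken from the text):

```python
def maximin_2xn(U):
    """Exact maximin mixture (t, 1-t) for the row player of a 2-row game U.

    The worst-case payoff of (t, 1-t) is g(t) = min_c [t*U[0][c] + (1-t)*U[1][c]],
    a concave piecewise-linear function; its maximum is attained at t = 0, t = 1,
    or at an intersection of two of the column lines.
    """
    cols = list(zip(*U))  # cols[c] = (U[0][c], U[1][c])

    def g(t):
        return min(t * a + (1 - t) * b for a, b in cols)

    candidates = {0.0, 1.0}
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            a1, b1 = cols[i]
            a2, b2 = cols[j]
            denom = (a1 - b1) - (a2 - b2)
            if denom != 0:
                t = (b2 - b1) / denom
                if 0 <= t <= 1:
                    candidates.add(t)
    t_star = max(candidates, key=g)
    return (t_star, 1 - t_star), g(t_star)

# Matching pennies: the maximin value is 0, attained by the uniform mixture.
sigma, v = maximin_2xn([[1, -1], [-1, 1]])
```

The same candidate-enumeration idea extends to more rows, where one would switch to a linear program.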
Equilibrium in beliefs under uncertainty
Definition 20.8. Set Δi^0 ≡ M(Si) and recursively define10

Δi^n = {σi ∈ Δi^{n−1} | ∃p ∈ M(×_{j≠i} supp Δj^{n−1}) such that ui(σi, p) ≥ ui(σi′, p) ∀σi′ ∈ Δi^{n−1}}.

For player i, the set of Correlated Rationalizable Strategies is Ri ≡ ∩_{n=0}^{∞} Δi^n. We call RBi ≡ ∩_{n=1}^{∞} M(×_{j≠i} supp Δj^{n−1}) the set of Rationalizable Beliefs. These notions are related to Beliefs Equilibrium by the next proposition.

Proposition 20.2. Suppose {Bi}ni=1 is a Beliefs Equilibrium. Then BRi(Bi) ⊆ Ri and Bi ⊆ RBi.

Proof. Set Δ̂i^0 ≡ M(Si) and recursively define

Δ̂i^n = {σi ∈ Δ̂i^{n−1} | ∃P ⊆ M(×_{j≠i} supp Δ̂j^{n−1}) such that min_{p∈P} ui(σi, p) ≥ min_{p∈P} ui(σi′, p) ∀σi′ ∈ Δ̂i^{n−1}}.

By definition, Δi^0 = Δ̂i^0. It is obvious that Δi^1 ⊆ Δ̂i^1. Any element σi not in Δi^1 does not survive the first round of the iteration in the definition of correlated rationalizability. Since correlated rationalizability and iterated strict dominance coincide (see Fudenberg and Tirole, 1991: 52), there must exist σi∗ ∈ Δi^0 such that ui(σi∗, p) > ui(σi, p) ∀p ∈ M(×_{j≠i} supp Δj^0). This implies min_{p∈P} ui(σi∗, p) > min_{p∈P} ui(σi, p) ∀P ⊆ M(×_{j≠i} supp Δ̂j^0). Therefore, σi ∉ Δ̂i^1 and we have Δi^1 = Δ̂i^1. The argument can be repeated to establish Δi^n = Δ̂i^n ∀n. BRi(Bi) is rationalized by Bi, that is, BRi(Bi) ⊆ Δ̂i^1. According to the definition of Beliefs Equilibrium, margSi Bj ⊆ BRi(Bi) ⊆ Δ̂i^1. This implies ×_{j≠i} margSj Bi ⊆ ×_{j≠i} Δ̂j^1 and therefore Bi ⊆ M(×_{j≠i} supp Δ̂j^1). The argument can be repeated to establish BRi(Bi) ⊆ Δ̂i^n and Bi ⊆ M(×_{j≠i} supp Δ̂j^n) ∀n.
20.5. Does uncertainty aversion matter?

20.5.1. Questions

In Section 20.4, I have set up a framework that enables us to investigate how uncertainty aversion affects strategic interaction in the context of normal form
games. My objective here is to address the following two specific questions:

1. As an outside observer, one only observes the actual strategy choice, but not the beliefs, of each player. Is it possible for an outside observer to distinguish uncertainty averse players from Bayesian players?
2. Does uncertainty aversion make the players worse off (or better off)?

To deepen our understanding, let me first provide the answers to these two questions in the context of single-person decision making and conjecture the possibility of extending them to the context of normal form games.

20.5.2. Single-person decision making

The first question is: as an outside observer, can we distinguish an uncertainty averse decision maker from a Bayesian decision maker? The answer is obviously yes if we have "enough" observations. (Otherwise the Ellsberg Paradox would not exist!) However, it is easy to see that if we only observe an uncertainty averse decision maker who chooses one act from a convex constraint set G ⊆ F, then his choice can always be rationalized (as long as monotonicity is not violated) by a subjective expected utility function. For example, take the simple case where Ω = {ω1, ω2}. The feasible set of utility payoffs C ≡ {(u(f(ω1)), u(f(ω2))) | f ∈ G} generated by G will be a convex set in R². Suppose the decision maker chooses a point c ∈ C. To rationalize his choice by an expected utility function, we can simply draw a linear indifference curve which is tangent to C at c, with its slope describing the probabilistic beliefs of the expected utility maximizer.

This answer is at least partly relevant to the first question posed in Section 20.5.1. That is because in a normal form game, an outside observer only observes that each player i chooses a strategy from the set M(Si). An important difference, though, is that the strategy chosen by i is a best response given his beliefs, and these are part of an equilibrium. Therefore, it is possible that the consistency condition imposed by the equilibrium concept can enable us to break the observational equivalence.

The second question addresses the welfare consequences of uncertainty aversion: does uncertainty aversion make a decision maker worse off (or better off)? There is a sense in which uncertainty aversion makes a decision maker worse off. For simplicity, suppose again that X = R. Suppose that initially, beliefs over the state space are represented by a probability measure p̂, and next that beliefs change from p̂ to a set of priors P with p̂ ∈ P. Given f ∈ F, let CE_P(f) be the certainty equivalent of f, that is, u(CE_P(f)) = min_{p∈P} ∫ u ∘ f dp. Similar meaning is given to CE_p̂(f). Then uncertainty aversion makes the decision maker worse off in the sense that CE_p̂(f) ≥ CE_P(f). That is, the certainty equivalent of any f when beliefs are represented by p̂ is at least as high as when beliefs are represented by P. Note that in this welfare comparison, I am fixing the utility function over lotteries u. This assumption can be clarified by the following restatement: Assume that the
decision maker has a fixed preference ordering ≿∗ over M(X) which satisfies the independence axiom and is represented numerically by u. Denote by ≿p̂ and ≿P the orderings over acts corresponding to the priors p̂ and P, respectively. Then the welfare comparison presumes that both ≿p̂ and ≿P agree with ≿∗ on the set of constant acts; that is, for any f, g ∈ F with f(ω) = p and g(ω) = q for all ω ∈ Ω,

f ≿p̂ g ⇔ f ≿P g ⇔ p ≿∗ q.

At this point, it is not clear that the earlier discussion extends to the context of normal form games. When strategic considerations are present, one might wonder whether it is possible that if player 1 is uncertainty averse and if player 2 knows that player 1 is uncertainty averse, then the behavior of player 2 is affected in a fashion that benefits player 1 relative to a situation where 2 knows that 1 is a Bayesian.11 When both players are uncertainty averse and they know that their opponents are uncertainty averse, can they choose a strategy profile that Pareto dominates equilibria generated when players are Bayesians?

20.5.3. Every Beliefs Equilibrium contains a Bayesian Beliefs Equilibrium

In this section, the two questions posed in Section 20.5.1 are addressed using the equilibrium concepts Bayesian Beliefs Equilibrium and Beliefs Equilibrium. The answers are implied by the following proposition.

Proposition 20.3. If {Bi}ni=1 is a Beliefs Equilibrium, then there exist bi ∈ Bi, i = 1, ..., n, such that {bi}ni=1 is a Bayesian Beliefs Equilibrium and BRi(Bi) ⊆ BRi(bi).

Proof.12 It is sufficient to show that there exists bi ∈ Bi such that BRi(Bi) ⊆ BRi(bi). This and the fact that {Bi}ni=1 is a Beliefs Equilibrium imply margSi bj ∈ margSi Bj ⊆ BRi(Bi) ⊆ BRi(bi). Therefore, {bi}ni=1 is a Bayesian Beliefs Equilibrium. We have that ui(·, pi) is linear on M(Si) for each pi and ui(σi, ·) is linear on Bi for each σi. Therefore, by Fan's Theorem (Fan, 1953),

u ≡ max_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi) = min_{pi∈Bi} max_{σi∈M(Si)} ui(σi, pi).

By definition, σi ∈ BRi(Bi) if and only if min_{pi∈Bi} ui(σi, pi) = u. Therefore,

ui(σi, pi) ≥ u  ∀pi ∈ Bi, ∀σi ∈ BRi(Bi).  (20.4)

Take bi ∈ argmin_{pi∈Bi} max_{σi∈M(Si)} ui(σi, pi). Then conclude that

ui(σi, bi) ≤ u = max_{σi′∈M(Si)} ui(σi′, bi)  ∀σi ∈ M(Si).  (20.5)

Combining (20.4) and (20.5), we have ui(σi, bi) = u ∀σi ∈ BRi(Bi), that is, BRi(Bi) ⊆ BRi(bi).
Table 20.5 A two-person game

      C1      C2      C3      C4
R1    0, 1    0, 1    1, 1    0, 1
R2    0, 1    0, 1    0, 1    1, 1
R3    1, 1    −1, 1   0, 1    0, 1
R4    −1, 1   1, 1    0, 1    0, 1
Example 20.4. Illustrating Proposition 20.3. Consider the game in Table 20.5. The sets of probability measures B1 = M({C1 , C2 , C3 , C4 }) and B2 = {R1 } constitute a Beliefs Equilibrium. It contains the Bayesian Beliefs Equilibrium {b1 = (C1 , 0.5; C2 , 0.5), b2 = R1 }. Also note that BR1 (B1 ) = {p ∈ M({R1 , R2 , R3 , R4 }) | p(R3 ) = p(R4 )} and BR1 (b1 ) = M({R1 , R2 , R3 , R4 }). This shows that the inclusion property BRi (Bi ) ⊆ BRi (bi ) in Proposition 20.3 can be strict. The example also demonstrates that a Proper Beliefs Equilibrium may contain more than one Bayesian Beliefs Equilibrium. For instance, {b1 = C3 , b2 = R1 } is another Bayesian Beliefs Equilibrium. However BR1 (b1 ) = {R1 }. Therefore, not every Bayesian Beliefs Equilibrium {bi }ni=1 contained in a Beliefs Equilibrium {Bi }ni=1 has the property BRi (Bi ) ⊆ BRi (bi ). For games involving more than two players, a Beliefs Equilibrium in general does not contain a Nash Equilibrium. This is already implied by the fact that a Bayesian Beliefs Equilibrium is itself a Beliefs Equilibrium but not a Nash Equilibrium. However, since Bayesian Beliefs Equilibrium and Nash Equilibrium are equivalent in two-person games, Proposition 20.3 has the following corollary. Corollary of Proposition 20.3. In a two-person game, if {B1 , B2 } is a Beliefs Equilibrium, then there exists σj∗ ∈ Bi such that {σ1∗ , σ2∗ } is a Nash Equilibrium and BRi (Bi ) ⊆ BRi (σj∗ ). Proposition 20.3 delivers two messages. The first is regarding the prediction of how the game will be played. Suppose {Bi }ni=1 is a Beliefs Equilibrium. The associated prediction regarding strategies played is that i chooses some σi ∈ BRi (Bi ). 
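The claims of Example 20.4 can be replayed numerically. Since payoffs are linear in the opponent's mixture, the worst case over the full simplex M({C1, ..., C4}) is attained at a pure column. A short sketch using player 1's payoffs from Table 20.5:

```python
# Player 1's payoffs in Table 20.5; rows R1..R4, columns C1..C4.
U1 = [
    [0, 0, 1, 0],   # R1
    [0, 0, 0, 1],   # R2
    [1, -1, 0, 0],  # R3
    [-1, 1, 0, 0],  # R4
]

def worst_case(sigma):
    """Worst-case expected payoff of mixture sigma when beliefs are all of
    M({C1..C4}); the minimum is attained at a pure column."""
    return min(sum(sigma[r] * U1[r][c] for r in range(4)) for c in range(4))

def expected(sigma, b):
    """Expected payoff of sigma against a probability vector b over columns."""
    return sum(sigma[r] * b[c] * U1[r][c] for r in range(4) for c in range(4))

# Mixtures with p(R3) = p(R4) guarantee the maximin value 0 ...
assert worst_case([0.3, 0.3, 0.2, 0.2]) == 0
# ... while p(R3) != p(R4) does strictly worse.
assert worst_case([0, 0, 0.6, 0.4]) < 0
# Against b1 = (C1, 0.5; C2, 0.5), every row (hence every mixture) yields 0,
# so BR1(b1) = M({R1..R4}), consistent with Proposition 20.3.
assert all(expected(e, [0.5, 0.5, 0, 0]) == 0
           for e in ([1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]))
```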
According to Proposition 20.3, it is always possible to find at least one Bayesian Beliefs Equilibrium {bi }ni=1 contained in {Bi }ni=1 such that the observed behavior of the uncertainty averse players (the actual strategies they choose) is consistent with utility maximization given beliefs represented by {bi }ni=1 . This implies that an outsider who can only observe the actual strategy choices in the single game under study will not be able to distinguish uncertainty averse players from Bayesian players. (I will provide reasons to qualify such observational equivalence in the next section.) We can use Proposition 20.3 also to address the welfare consequences of uncertainty aversion, where the nature of our welfare comparisons is spelled out in Section 20.5.2. If {Bi }ni=1 is a Beliefs Equilibrium, it contains a Bayesian Beliefs
Equilibrium {bi}ni=1, and therefore,

max_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi) ≤ max_{σi∈M(Si)} ui(σi, bi).
The left-hand side of the above inequality is the ex ante utility of player i when his beliefs are represented by Bi, and the right-hand side is the ex ante utility when beliefs are represented by bi. The inequality implies that, ex ante, i would prefer to play the Bayesian Beliefs Equilibrium {bi}ni=1 to the Beliefs Equilibrium {Bi}ni=1. In this ex ante sense, uncertainty aversion makes the players worse off.13

20.5.4. Uncertainty aversion can be beneficial when players agree

The earlier comparisons addressed the effects of uncertainty aversion when the equilibrium concepts used, namely Beliefs Equilibrium and Bayesian Beliefs Equilibrium, do not require agreement between agents. Here I reexamine the effects of uncertainty aversion when agreement is imposed, as incorporated in the Beliefs Equilibrium with Agreement and Nash Equilibrium solution concepts. For two-person games, the Corollary of Proposition 20.3 still applies since agreement is not an issue given only two players. However, for games involving more than two players, the following example demonstrates that it is possible to have a Beliefs Equilibrium with Agreement not containing any Nash Equilibrium.

Example 20.5. Uncertainty aversion leads to Pareto improvement. The game presented in this example is a modified version of the prisoners' dilemma. The game involves three players, 1, 2, and N. Player N can be interpreted as "nature." The payoff of player N is a constant and his set of pure strategies is {X, Y}. Players 1 and 2 can be interpreted as two prisoners. The sets of pure strategies available to players 1 and 2 are {C1, D1} and {C2, D2}, respectively, where C stands for "co-operation" and D stands for "defection." The payoff matrix for player 1 is shown in Table 20.6 and that for player 2 is shown in Table 20.7. Assume that the payoffs satisfy

a > c > b,  c > d > e,  and  2c < a + b.  (20.6)
Note that the game is the prisoners' dilemma game if the inequalities a > c > b in (20.6) are replaced by a = b > c. (When a = b > c, the payoffs of players 1 and 2 for all strategy profiles do not depend on nature's move.) This game is different from the standard prisoners' dilemma in one respect.

Table 20.6 Payoff matrix for player 1

      XC2   YC2   XD2   YD2
C1    c     c     e     e
D1    a     b     d     d

Table 20.7 Payoff matrix for player 2

      XC1   YC1   XD1   YD1
C2    c     c     e     e
D2    b     a     d     d

Table 20.8 A two-person game

      C2                       D2
C1    c, c                     e, ph b + (1 − ph)a
D1    pl a + (1 − pl)b, e      d, d

In the standard prisoners' dilemma game, the expression a = b > c says that it is always better for one
player to play D given that his opponent plays C. In this game, the expression a > c > b says that if one player plays D and one plays C, the player who plays D may either gain or lose. The interpretation of the inequalities c > d > e in (20.6) is the same as in the standard prisoners' dilemma. That is, it is better for both players to play C rather than D; however, a player should play D given that his opponent plays D. Note that the last inequality 2c < a + b in (20.6) is implied by a = b > c in the prisoners' dilemma game. The inequality 2c < a + b can be rewritten as (a − c) − (c − b) > 0. For player 1, for example, (a − c) is the utility gain from playing D1 instead of C1 if the true state is XC2, and (c − b) is the corresponding utility loss if the true state is YC2. Therefore, the interpretation of 2c < a + b is that if you know your opponent plays C, the possible gain from playing D instead of C is high and the possible loss is low.

Assume that players 1 and 2 know each other's action but they are uncertain about nature's move. To be precise, suppose that the beliefs of the players are

BN = {C1C2}
B1 = {p ∈ M({XC2, YC2}) | pl ≤ p(XC2) ≤ ph with 0 ≤ pl < ph ≤ 1}
B2 = {p ∈ M({XC1, YC1}) | pl ≤ p(XC1) ≤ ph with 0 ≤ pl < ph ≤ 1}.

The construction of {BN, B1, B2} reflects the fact that the players agree. For example, the marginal beliefs of players 1 and 2 regarding {X, Y} agree with

Δ = {p ∈ M({X, Y}) | pl ≤ p(X) ≤ ph with 0 ≤ pl < ph ≤ 1}.  (20.7)
Given {BN, B1, B2}, the payoffs of each pure strategy profile for players 1 and 2 are shown in Table 20.8. Recall that the payoff of player N is a constant. Therefore, both X and Y are optimal and we only need to consider players 1 and 2. {BN, B1, B2} is a Beliefs Equilibrium with Agreement and

BR1(B1) = {C1} and BR2(B2) = {C2}

if and only if

pl < (c − b)/(a − b) and ph > (a − c)/(a − b).
Table 20.9 A two-person game

      C2                      D2
C1    c, c                    e, λb + (1 − λ)a
D1    λa + (1 − λ)b, e        d, d
Note that our assumptions guarantee that

(c − b)/(a − b) > 0 and (a − c)/(a − b) < 1,

so there exist values for pl and ph consistent with the above inequalities. However, {BN, B1, B2} does not contain a Nash Equilibrium. To see this, suppose that players 1 and 2 are Bayesians who agree that p(X) = λ = 1 − p(Y) for some λ ∈ [0, 1]. Then they are playing the game in Table 20.9. The strategy profile {C1, C2} is a Nash Equilibrium if and only if

c ≥ λa + (1 − λ)b and c ≥ λb + (1 − λ)a.

There exists λ ∈ [0, 1] such that {C1, C2} is a Nash Equilibrium if and only if

c ≥ (a + b)/2,
which contradicts the last inequality in (20.6). Therefore, it is never optimal for both Bayesian players to play C: any Nash Equilibrium requires both players to play D, so that both players receive d with certainty. In the Beliefs Equilibrium with Agreement constructed above, on the other hand, both players play C and receive c > d with certainty.

To better understand why uncertainty aversion leads to a better equilibrium in this game, let us go back to the beliefs {B1, B2} of players 1 and 2. As explained in Section 20.2.1, although the global beliefs of players 1 and 2 on {X, Y} are represented by the same set Δ in (20.7), the local probability measures for different acts may differ. For example, the local probability measure on {X, Y} at the act corresponding to D1 is (X, pl; Y, 1 − pl), while for D2 it is (X, ph; Y, 1 − ph). In the sense of local probability measures, therefore, players 1 and 2 disagree on the relative likelihood of X and Y when they are consuming the acts D1 and D2, respectively. This is what allows playing D to be undesirable for both players simultaneously.

The example delivers three messages. First, it shows that in a game involving more than two players, uncertainty aversion can lead to an equilibrium that Pareto dominates all Nash Equilibria. Second, interpreting player N in the above game as "nature," the game becomes a two-person game where the players are uncertain
about their own payoff functions. Therefore, uncertainty aversion can be "beneficial" even in two-person games. Third, the beliefs profile {BN, B1, B2} continues to be a Beliefs Equilibrium with Agreement if, for instance, the payoff of player N is independent of his own strategy, but is the highest when player 1 plays C1 and player 2 plays C2. In this case, even if players can communicate, player N has a strict incentive not to announce his own strategy.14
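Example 20.5 can be checked with concrete numbers. The parameter values below are my own illustration (not from the text), chosen to satisfy (20.6) and the bounds on pl and ph derived above:

```python
a, b, c, d, e = 4.0, 2.0, 2.9, 2.5, 0.0
assert a > c > b and c > d > e and 2 * c < a + b       # the inequalities in (20.6)
pl, ph = 0.3, 0.7
assert pl < (c - b) / (a - b) and ph > (a - c) / (a - b)

# Player 1 against C2, with maxmin beliefs p(X) in [pl, ph]:
# C1 yields c for sure; D1 yields p*a + (1-p)*b, whose worst case (a > b) is at p = pl.
worst_D1 = pl * a + (1 - pl) * b
assert c > worst_D1        # so BR1(B1) = {C1}; player 2 is symmetric, using ph

# Bayesians who agree on p(X) = lam can never sustain (C1, C2):
# C1 needs c >= lam*a + (1-lam)*b and C2 needs c >= lam*b + (1-lam)*a,
# which together require 2c >= a + b, contradicting (20.6).
grid = [i / 1000 for i in range(1001)]
assert all(c < lam * a + (1 - lam) * b or c < lam * b + (1 - lam) * a
           for lam in grid)
```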
20.6. Why do we have uncertainty aversion?

The next question I want to address is: when should we expect (or not expect) the existence of an equilibrium reflecting uncertainty aversion? In the context of single-person decision theory, we do not have much to say about the origin or precise nature of beliefs on the set of states of nature. However, we should be able to say more in the context of game theory. The beliefs of the players should be "endogenous" in the sense of depending on the structure of the game. For example, it is reasonable to predict that the players will not be uncertainty averse if there is an "obvious way" to play the game. The following two examples identify possible reasons for players to be uncertainty averse.

Example 20.6. Nonunique equilibria. In the game in Table 20.10, any strategy profile is a Nash Equilibrium, and any {B1, B2} is a Beliefs Equilibrium. Uncertainty aversion in this game is due to the fact that the players do not have any idea about how their opponents will play.

Table 20.10 A two-person game

      L       C       R
U     1, 1    1, 1    10, 1
M     1, 1    1, 1    10, 1
D     1, 10   1, 10   10, 10

Example 20.7. Nonunique best responses. In the game in Table 20.11, {U, L} is the only Nash Equilibrium. However, it is equally good for player 1 to play D if he believes that player 2 plays L. Under this circumstance, it may be too demanding to require player 2 to attach probability one to player 1 playing U. At the other extreme, where 2 is totally ignorant of 1's strategy choice, we obtain the Proper Beliefs Equilibrium {B1 = {L}, B2 = M({U, D})}. This example shows that the existence of a unique Nash Equilibrium is not sufficient to rule out an equilibrium with uncertainty aversion. However, I can
prove the following:

Table 20.11 A two-person game

      L       R
U     0, 1    1, 0.5
D     0, 1    0, 2

Table 20.12 A two-person game

      L       R
U     2, 1    1, 2
D     2, 2    0, 2
Proposition 20.4. If the game has a unique Bayesian Beliefs Equilibrium and it is also a strict Nash Equilibrium, then there does not exist a Proper Beliefs Equilibrium.

Proof. Let {bi}ni=1 be the unique Bayesian Beliefs Equilibrium. Since it is also a strict Nash Equilibrium of the game, there exists si∗ ∈ Si such that bi = s−i∗. Let {Bi}ni=1 be a Beliefs Equilibrium. According to Proposition 20.3, s−i∗ ∈ Bi. Using Proposition 20.3 and the definition of Beliefs Equilibrium, we have margSi Bj ⊆ BRi(Bi) ⊆ BRi(s−i∗) = {si∗}. This implies {Bi}ni=1 = {s−i∗}ni=1.

Corollary of Proposition 20.4. In a two-person game, if the game has a unique Nash Equilibrium and it is also a strict Nash Equilibrium, then there does not exist a Proper Beliefs Equilibrium.

A Proper Beliefs Equilibrium can be ruled out also if the game is dominance solvable.

Proposition 20.5. If the game is dominance solvable, then there does not exist a Proper Beliefs Equilibrium.15 (See Table 20.12.)

Proof. Let {Bi}ni=1 be a Beliefs Equilibrium. According to Proposition 20.2, BRi(Bi) ⊆ Ri and Bi ⊆ RBi. Since iterated strict dominance and correlated rationalizability are equivalent, a dominance solvable game has a unique pure strategy profile {si∗}ni=1 such that Ri = {si∗} and RBi = {s−i∗}. Therefore, BRi(Bi) = {si∗} and Bi = {s−i∗}.
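Dominance solvability, the hypothesis of Proposition 20.5, can be tested mechanically. The sketch below uses only pure-strategy strict dominance (weaker than the mixed/correlated notion used in the text) on a small game of my own construction:

```python
def iterated_strict_dominance(U1, U2):
    """Iteratively delete pure strategies strictly dominated by another pure
    strategy.  U1[r][c] and U2[r][c] are the row and column players' payoffs."""
    rows = list(range(len(U1)))
    cols = list(range(len(U1[0])))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:
            if any(all(U1[r2][c] > U1[r][c] for c in cols)
                   for r2 in rows if r2 != r):
                rows.remove(r)
                changed = True
        for c in cols[:]:
            if any(all(U2[r][c2] > U2[r][c] for r in rows)
                   for c2 in cols if c2 != c):
                cols.remove(c)
                changed = True
    return rows, cols

# A dominance-solvable game: for player 1 the top row strictly dominates the
# bottom row; once the bottom row is gone, the left column strictly dominates.
U1 = [[3, 2], [1, 0]]
U2 = [[2, 1], [2, 0]]
rows, cols = iterated_strict_dominance(U1, U2)
assert (rows, cols) == ([0], [0])   # the surviving profile is unique
```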
20.7. Decision theoretic foundation

Recently, decision theoretic foundations for Bayesian solution concepts have been developed. In particular, Aumann and Brandenburger (1995) develop epistemic
conditions for Nash Equilibrium. The purpose of this line of research is to understand the knowledge requirements needed to justify equilibrium concepts. Although research on the generalization of Nash Equilibrium to allow for uncertainty averse preferences has already started (see Section 20.8), serious study of the epistemic conditions for those generalized equilibrium concepts has not yet been carried out. In this section, I provide epistemic conditions for the equilibrium concepts proposed in this chapter. The main finding is that Beliefs Equilibrium (Beliefs Equilibrium with Agreement) and Bayesian Beliefs Equilibrium (Nash Equilibrium) presume similar knowledge requirements. This supports the interpretation of results in previous sections as reflecting solely the effects of uncertainty aversion.

Before I proceed, I acknowledge that although the partitional information structure used below is standard in game theory (see, for instance, Aumann, 1987 and Osborne and Rubinstein, 1994: 76), it is more restrictive than the interactive belief system used by Aumann and Brandenburger (1995). In their framework, "know" means "ascribe probability 1 to," which is more general than the "absolute certainty without possibility of error" that is being used here (Aumann and Brandenburger, 1995: 1175). Apart from this difference, the two approaches share essentially the same spirit. There is a common set of states of the world. A state contains a description of each player's knowledge, beliefs, strategy, and payoff function.16

Formally, the following notation is needed. Let Ω be a common finite set of states of nature for the players, with typical element ω. Each state ω ∈ Ω consists of a specification for each player i of

• Hi(ω) ⊆ Ω, which describes player i's knowledge in state ω (where Hi is a partitional information function);
• Πi(ω), a closed and convex set of probability measures on Hi(ω), the beliefs of player i in state ω;
• fi(ω) ∈ M(Si), the mixed strategy used by player i in state ω;
• ui(ω, ·): S → R, the payoff function of player i in state ω.
To respect the partitional information structure, the payoff function ui: Ω × S → R and the strategy fi: Ω → M(Si) are required to be adapted to Hi. Given fi, fi(ω)(si) denotes the probability that i plays si according to the strategy fi in state ω. For each ω ∈ Ω, player i's beliefs over S−i are represented by a closed and convex set of probability measures Bi(ω) that is induced from Πi(ω) in the following way:

Bi(ω) ≡ {pi ∈ M(S−i) | ∃qi ∈ Πi(ω) such that pi(s−i) = Σ_{ω̂∈Hi(ω)} qi(ω̂) ∏_{j≠i} fj(ω̂)(sj) ∀s−i ∈ S−i}.
Equilibrium in beliefs under uncertainty
507
This specification is common knowledge among the players. Player i is said to know an event E at ω if Hi(ω) ⊆ E. Say that an event is mutual knowledge if everyone knows it. Let H be the meet of the partitions of all the players and H(ω) the element of H which contains ω. An event E is common knowledge at ω if and only if H(ω) ⊆ E. Say that player i is rational at ω if his strategy fi(ω) maximizes utility as stated in (20.3) when beliefs are represented by Bi(ω).

The following proposition describes formally the knowledge requirements for {Bi}ni=1 to be a Beliefs Equilibrium. If each Bi is a singleton, then a parallel result for Bayesian Beliefs Equilibrium is obtained. (The version of Proposition 20.6 for Bayesian Beliefs Equilibrium in two-person games can be found in Theorem A of Aumann and Brandenburger, 1995: 1167.) To focus on the intuition, all the propositions in this section are only informally discussed. Their proofs can be found in the Appendix.

Proposition 20.6. Suppose that at some state ω, the rationality of the players, {ui}ni=1, and {Bi}ni=1 are mutual knowledge. Then {Bi}ni=1 is a Beliefs Equilibrium.

The idea of Proposition 20.6 is not difficult. At ω, player i knows j's beliefs Bj(ω), j's payoff function uj(ω, ·), and that j is rational. Therefore, any strategy fj(ω′) for player j, with ω′ ∈ Hi(ω), that player i thinks is possible must be player j's best response given his beliefs. That is, fj(ω′) ∈ BRj(ω)(Bj(ω)) ∀ω′ ∈ Hi(ω). Since the preference ordering of player j is quasiconcave, any convex combination of strategies in the set {fj(ω′) | ω′ ∈ Hi(ω)} must also be a best response for player j. By construction, margSj Bi(ω) is a subset of the convex hull of {fj(ω′) | ω′ ∈ Hi(ω)}. This implies margSj Bi(ω) ⊆ BRj(ω)(Bj(ω)) and therefore {Bi}ni=1 is a Beliefs Equilibrium.
In a Beliefs Equilibrium with Agreement, the beliefs {Bi}ni=1 of the players over the strategy choices of opponents are required to have the properties of agreement and stochastic independence. Since Bi is derived from Πi, it is to be expected that some restrictions on Πi are needed for {Bi}ni=1 to possess the desired properties. In the case where players are expected utility maximizers, so that Πi(ω) is a singleton for all ω, Theorem B in Aumann and Brandenburger (1995: 1168) shows that by restricting {Πi}ni=1 to come from a common prior, mutual knowledge of rationality and payoff functions and common knowledge of beliefs are sufficient to imply Nash Equilibrium. In the case where players are uncertainty averse, the following proposition says that by restricting each player i to being completely ignorant at each ω about the relative likelihood of states in Hi(ω), exactly the same knowledge requirements that imply Nash Equilibrium also imply Beliefs Equilibrium with Agreement.

Proposition 20.7. Suppose that Πi(ω) = M(Hi(ω)) ∀ω. Suppose that at some state ω, the rationality of the players and {ui}ni=1 are mutual knowledge and that {Bi}ni=1 is common knowledge. Then {Bi}ni=1 is a Beliefs Equilibrium with Agreement.
The specification of Πi(ω) as the set of all probability measures on Hi(ω) reflects the fact that player i is completely ignorant about the relative likelihood of states in Hi(ω). It is useful to explain the role played by this parametric specialization of beliefs. Given any state ω and any event E, the beliefs {p(E) | p ∈ M(Hi(ω))} of player i about E at ω must satisfy one and only one of the following three conditions:

1. p(E) = 1 ∀p ∈ M(Hi(ω)) if and only if Hi(ω) ⊆ E (player i knows E).
2. {p(E) | p ∈ M(Hi(ω))} = [0, 1] if and only if Hi(ω) ⊈ E and Hi(ω) ∩ E ≠ Ø (player i knows neither E nor Ω \ E).
3. p(E) = 0 ∀p ∈ M(Hi(ω)) if and only if Hi(ω) ∩ E = Ø (player i knows Ω \ E).

Therefore, player j knows i's beliefs about E if and only if one and only one of the following is true:

1. j knows that i knows E.
2. j knows that i knows neither E nor Ω \ E.
3. j knows that i knows Ω \ E.
As a result, mutual knowledge of beliefs about E implies agreement of beliefs about E. The common knowledge assumption in Proposition 20.7 is used only to derive the property of stochastically independent beliefs.17

Note that Theorem B in Aumann and Brandenburger (1995) is not a special case of Proposition 20.7. The common prior assumption imposed by their theorem coincides with the restriction on Πi imposed by Proposition 20.7 only in the case where Hi(ω) = {ω} ∀ω. Therefore, the examples they provide to show the sharpness of their theorem do not apply to Proposition 20.7. To serve this purpose, I provide Example 20.8, which shows that Proposition 20.7 is tight in the sense that mutual knowledge of {Bi}ni=1, rather than common knowledge, is not sufficient to guarantee a Beliefs Equilibrium with Agreement.

Example 20.8. Mutual knowledge of beliefs is not sufficient for agreement. The game consists of three players. The set of states of nature is Ω = {ω1, ω2, ω3, ω4}. The players' information structures are H1 = {{ω1, ω2}, {ω3, ω4}}, H2 = {{ω1, ω3}, {ω2, ω4}}, and H3 = {{ω1, ω2, ω3}, {ω4}}. Their strategies are listed in Table 20.13. Suppose Πi(ω) = M(Hi(ω)) ∀ω. At ω1, the beliefs {Bi(ω1)}3i=1 of the players are mutual knowledge. For example, since B1(ω) = M({UT, DT}) ∀ω ∈ Ω,
Table 20.13 Strategies

      ω1   ω2   ω3   ω4
f1    L    L    R    R
f2    U    D    U    D
f3    T    T    T    T
B1(ω1) is common knowledge and therefore mutual knowledge at ω1. According to the proof of Proposition 20.7, the marginal beliefs of the players agree. For example, margS2 B1(ω1) = margS2 B3(ω1) = M({U, D}). However, B3(ω1) = M({LU, LD, RU}) is not common knowledge at ω1: player 3 does not know that player 1 knows player 3's beliefs. At ω1, player 3 cannot exclude the possibility that the true state is ω3. At ω3, player 1 only knows that player 3's beliefs are represented by either B3(ω3) = M({LU, LD, RU}) or B3(ω4) = {RD}. Note that B3(ω1) does not take the form required in the definition of Beliefs Equilibrium with Agreement.

Finally, although the notion of Weak Beliefs Equilibrium (Definition 20.6) is not the main focus of this chapter, it is closely related to the equilibrium concepts proposed by the papers discussed in Section 20.8. According to the following proposition, complete ignorance and rationality at a state ω are sufficient to imply Weak Beliefs Equilibrium.

Proposition 20.8. Suppose that at some state ω, Πi(ω) = M(Hi(ω)) and the players are rational. Then {Bi}ni=1 is a Weak Beliefs Equilibrium.

On closer inspection of the definition of Weak Beliefs Equilibrium, Proposition 20.8 is hardly surprising. For instance, given any n-person normal form game, it is immediate that {M(S−i)}ni=1 is a Weak Beliefs Equilibrium. Note that Proposition 20.8 requires only that the players be rational; they do not need to know that their opponents are rational, and they do not need to know anything about the beliefs of their opponents.
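The induced beliefs Bi(ω) can be replayed for Example 20.8 by enumerating information cells: under complete ignorance (Πi(ω) = M(Hi(ω))), the support of Bi(ω) is just the set of opponents' strategy profiles played at the states the player considers possible. A sketch using the state and strategy data of Table 20.13:

```python
# Example 20.8: states, information partitions, and the strategies of Table 20.13.
H = {  # for each player, the cell of his partition containing each state
    1: {"w1": {"w1", "w2"}, "w2": {"w1", "w2"}, "w3": {"w3", "w4"}, "w4": {"w3", "w4"}},
    2: {"w1": {"w1", "w3"}, "w3": {"w1", "w3"}, "w2": {"w2", "w4"}, "w4": {"w2", "w4"}},
    3: {"w1": {"w1", "w2", "w3"}, "w2": {"w1", "w2", "w3"}, "w3": {"w1", "w2", "w3"},
        "w4": {"w4"}},
}
f = {  # pure strategy profile (players 1, 2, 3) at each state
    "w1": ("L", "U", "T"), "w2": ("L", "D", "T"),
    "w3": ("R", "U", "T"), "w4": ("R", "D", "T"),
}

def support_B(i, w):
    """Support of the induced beliefs Bi(w): the opponents' profiles played at
    the states in Hi(w)."""
    others = [j for j in (1, 2, 3) if j != i]
    return {tuple(f[v][j - 1] for j in others) for v in H[i][w]}

# Player 3 at w1 considers (L,U), (L,D) and (R,U) possible -- i.e. M({LU, LD, RU}).
assert support_B(3, "w1") == {("L", "U"), ("L", "D"), ("R", "U")}
# At w4 player 3 knows the state, so B3(w4) = {RD}.
assert support_B(3, "w4") == {("R", "D")}
```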
20.8. Related literature

In this section, I compare my equilibrium concepts with those proposed by Dow and Werlang (1994) and Klibanoff (1993).18 Since the latter employs the same strategy space as in this chapter, let me first conduct a direct comparison.

20.8.1. Klibanoff (1993)

Klibanoff (1993) also adopts the multiple priors model to represent players' preferences in normal form games with any finite number of players and proposes the following solution concept:19
Definition 20.9. ({σi}ni=1, {Bi}ni=1) is an Equilibrium with Uncertainty Aversion if the following conditions are satisfied:

1. σ−i ∈ Bi.
2. min_{pi∈Bi} ui(σi, pi) ≥ min_{pi∈Bi} ui(σi′, pi) ∀σi′ ∈ M(Si).
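A candidate equilibrium can be checked against Definition 20.9 mechanically. A sketch for the game of Table 20.15 and the profile {D, R, M(S2), M(S1)} discussed in Example 20.9, using the fact that a worst case over a full simplex is attained at a pure strategy:

```python
# Payoffs from Table 20.15: rows U, D for player 1; columns L, C, R for player 2.
U1 = {("U", "L"): 10, ("U", "C"): 1.99, ("U", "R"): 10,
      ("D", "L"): 2, ("D", "C"): 2, ("D", "R"): 2}
U2 = {("U", "L"): 10, ("U", "C"): 10, ("U", "R"): 10,
      ("D", "L"): 4, ("D", "C"): 4, ("D", "R"): 5}

S1, S2 = ["U", "D"], ["L", "C", "R"]

def worst1(s1):  # worst case for player 1 under total ignorance B1 = M(S2)
    return min(U1[(s1, s2)] for s2 in S2)

def worst2(s2):  # worst case for player 2 under total ignorance B2 = M(S1)
    return min(U2[(s1, s2)] for s1 in S1)

# Condition 2 of Definition 20.9: D and R are the unique maximin choices
# among pure strategies.
assert worst1("D") > worst1("U")                    # 2 > 1.99
assert worst2("R") > max(worst2("L"), worst2("C"))  # 5 > 4
# Condition 1 holds trivially: R is in M(S2) and D is in M(S1).
```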
σi is the actual strategy used by player i and Bi represents his beliefs about opponents' strategy choices. Condition 1 says that player i's beliefs cannot be "too wrong": the strategy profile σ−i chosen by the other players should be considered "possible" by player i. Condition 2 says that σi is a best response for i given his beliefs Bi. In addition, the following refinement of Equilibrium with Uncertainty Aversion is offered in his paper: ({σi}ni=1, {Bi}ni=1) is an Equilibrium with Uncertainty Aversion and Rationalizable Beliefs if it is an Equilibrium with Uncertainty Aversion and, in addition, Bi ⊆ RBi (defined in Definition 20.8). That is, player i believes that his opponents' strategy choices are correlated rationalizable.

Although Klibanoff's equilibrium concepts involve both the specification of beliefs and the actual strategies used by the players, while the equilibrium concepts in this chapter involve only the former, the main differences between them can be summarized in terms of beliefs in four aspects, as shown in Table 20.14. It enables us to conclude that Beliefs Equilibrium with Agreement is a refinement of Klibanoff's equilibrium concepts.

Table 20.14 Comparison for two equilibrium concepts

                                       Equilibrium with uncertainty          Beliefs equilibrium
                                       aversion and rationalizable beliefs   with agreement
Knowledge of rationality and beliefs   margSi Bj ∩ BRi(Bi) ≠ Ø               margSi Bj ⊆ BRi(Bi)
Agreement of marginal beliefs          margSj Bi ∩ margSj Bk ≠ Ø             margSj Bi = margSj Bk
Stochastically independent beliefs     Bi contains at least one product      Bi contains all product
                                       measure in ×_{j≠i} margSj Bi          measures in ×_{j≠i} margSj Bi
Rationalizable beliefs                 Bi ⊆ RBi                              Bi ⊆ RBi (Proposition 20.2)

Proposition 20.9. If {Bi}ni=1 is a Beliefs Equilibrium with Agreement, then for any σi ∈ margSi Bj, it is the case that ({σi}ni=1, {Bi}ni=1) is an Equilibrium with Uncertainty Aversion and Rationalizable Beliefs.

Since knowledge of rationality and beliefs is the most essential property underlying the equilibrium concepts, let us focus on two-person games, where agreement and stochastic independence are not at issue. The following example illustrates
511
Table 20.15 A two-person game
U D
L
C
R
10, 10 2, 4
1.99, 10 2, 4
10, 10 2, 5
that an Equilibrium with Uncertainty Aversion and Rationalizable Beliefs may not be a Beliefs Equilibrium. Example 20.9. Refinement of Klibanoff (1993). The game in Table 20.15 is deliberately constructed so that every strategy of every player survives iterated elimination of strictly dominated strategies. Therefore, Klibanoff’s standard equilibrium concept coincides with his own refinement. It is easy to check that {σ1 , σ2 , B1 , B2 } = {D, R, M(S2 ), M(S1 )} is an Equilibrium with Uncertainty Aversion (and Rationalizable Beliefs). This equilibrium predicts that D and R to be the unique best response for players 1 and 2, respectively. As a result, player 1 receives 2 and player 2 receives 5. It is reasonable that player 2 will play R. The reason is that R is as good as L and C if 2 plays U and it is strictly better than L and C if 2 plays D. However, if player 1 realizes this, 1 should play U , and as a result, both players will receive 10. Note that {B1 , B2 } = {M(S2 ), M(S1 )} is not a Beliefs Equilibrium. Moreover, no Beliefs Equilibrium in this game will predict D to be player 1’s unique best response. Finally, for any two-person game, Klibanoff’s standard equilibrium concept is equivalent to the notion of Weak Beliefs Equilibrium (Definition 20.6). Proposition 20.10. {B1 , B2 } is a Weak Beliefs Equilibrium if and only if there exists σi ∈ Bj such that {σ1 , σ2 , B1 , B2 } is an Equilibrium with Uncertainty Aversion. 20.8.2. Dow and Werlang (1994) Dow and Werlang (1994) consider two-person games and assume that players’ preference orderings over acts are represented by the convex capacity model proposed by Schmeidler (1989). Any such preference ordering is a member of the multiple priors model (Gilboa and Schmeidler, 1989). Their equilibrium concept can be restated using the multiple priors model as follows. Definition 20.10. {B1 , B2 } is a Nash Equilibrium Under Uncertainty if the following conditions are satisfied: 1 2
There exists Ei ⊆ Si such that pj (Ei ) = 1 for at least one pj ∈ Bj . minpi ∈Bi ui (si , pi ) minpi ∈Bi ui (si , pi ) ∀si ∈ Ei ∀si ∈ Si .
Dow and Werlang (1994) interpret Condition 1 as saying that player j “knows” that player i will choose a strategy in Ei . Condition 2 says that every si ∈ Ei is
a best response for i, given that Bi represents i's beliefs about the strategy choice of player j. Unlike Klibanoff (1993) and this chapter, Dow and Werlang restrict players to choosing pure rather than mixed strategies. It is therefore important to reiterate the justification for using one strategy space instead of the other. According to the discussion in Section 20.4.2, the use of pure versus mixed strategy spaces depends on the perception of the players about the order of strategy choices. The adoption of a mixed strategy space in Klibanoff (1993) and in this chapter can be justified by the assumption that each player perceives himself as moving last. On the other hand, we can understand the adoption of a pure strategy space in Dow and Werlang (1994) as assuming that each player perceives himself as moving first and has an expected utility representation for preferences over objective lotteries on acts. Further comparison of Dow and Werlang (1994) and this chapter is provided in the next subsection.

20.8.3. Epistemic conditions

I suggested in the introduction that, in order to carry out a ceteris paribus study of the effects of uncertainty aversion on how a game is played, we should ensure that the generalized equilibrium concept differs from Nash Equilibrium in only one dimension: players' attitude toward uncertainty. In particular, the generalized solution concept should share comparable knowledge requirements with Nash Equilibrium. According to this criterion, I argue in Section 20.7 that the solution concepts I propose are appropriate generalizations of their Bayesian counterparts. Dow and Werlang (1994) and Klibanoff (1993) do not provide epistemic foundations for their solution concepts, and a detailed study is beyond the scope of this chapter.
However, I show below that in the context of two-person normal form games, exactly the same epistemic conditions that support Weak Beliefs Equilibrium as stated in Proposition 20.8, namely, complete ignorance and rationality, also support Nash Equilibrium Under Uncertainty and Equilibrium with Uncertainty Aversion. Therefore, the sufficient conditions for players’ beliefs to constitute an equilibrium in these two senses do not require the players to know anything about the beliefs and rationality of their opponents. The weak epistemic foundation for their equilibrium concepts is readily reflected by the fact that given any two-person normal form game, {M(S2 ), M(S1 )} is always a Nash Equilibrium Under Uncertainty and there always exist σ1 and σ2 such that (σ1 , σ2 , M(S2 ), M(S1 )) is an Equilibrium with Uncertainty Aversion. The equilibrium notions in these two papers therefore do not fully exploit the difference between a game, where its payoff structure (e.g. dominance solvability) may limit the set of “reasonable” beliefs, and a single-person decision-making problem, where any set of priors (or single prior in the Bayesian case) is “rational.” In fact, Dow and Werlang (1994: 313) explicitly adopt the view that the degree of uncertainty aversion is subjective, as in the single-agent setting, rather than reasonably tied to the structure of the game. As a result, their equilibrium concept delivers a continuum of equilibria for every normal form game (see their theorem on p. 313).
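To make this concrete, the worst-case (maxmin) calculation behind the claim can be sketched numerically for the game of Table 20.15. This is a minimal illustration, not the chapter's formalism; the payoff encoding and the helper name `worst_case` are mine. Under complete ignorance, B1 = M(S2), and since expected utility is linear in the opponent's mixture, the minimum over M(S2) is attained at one of player 2's pure strategies.

```python
# Player 1's payoffs in the game of Table 20.15 (rows U, D; columns L, C, R).
u1 = {'U': {'L': 10, 'C': 1.99, 'R': 10},
      'D': {'L': 2,  'C': 2,    'R': 2}}

def worst_case(p_up):
    """Worst-case expected payoff of mixing U with probability p_up when
    beliefs are completely ignorant (all of M(S2)): by linearity it suffices
    to minimize over player 2's pure strategies."""
    return min(p_up * u1['U'][c] + (1 - p_up) * u1['D'][c] for c in 'LCR')

# Scan a grid of mixtures: the worst case equals 2 - 0.01 * p_up, which is
# maximized at p_up = 0, so D is player 1's unique maxmin choice (Example 20.9).
best = max(range(101), key=lambda k: worst_case(k / 100)) / 100
print(best, worst_case(best))   # 0.0 2.0
```

By contrast, with beliefs concentrated on R, playing U yields 10 > 2, which is exactly the tension Example 20.9 exploits.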
Let me now proceed to the formal statements. Recall that Klibanoff's standard equilibrium concept can be readily rewritten as a Weak Beliefs Equilibrium (Proposition 20.10). It follows that Proposition 20.8 provides the epistemic conditions underlying Klibanoff's equilibrium concept as simplified here. Since Dow and Werlang (1994) adopt a pure strategy space, I keep all notation from Section 20.7 but redefine

•  fi : Ω → Si
•  Bi(ω) ≡ {pi ∈ M(Sj) | ∃qi ∈ Δi(ω) such that pi(sj) = Σ_{ω̂ ∈ Hi(ω) ∩ {ω′ | fj(ω′) = sj}} qi(ω̂) ∀sj ∈ Sj}
•  BRi(Bi) ≡ argmax_{si∈Si} min_{pi∈Bi} ui(si, pi).
That is, fi(ω) is the pure action used by player i at ω, and Bi(ω) is the set of probability measures on Sj induced from Δi(ω) and fj. It represents the beliefs of player i at ω about j's strategy choice.

Proposition 20.11. Suppose that at some state ω, Δi(ω) = M(Hi(ω)) and the players are rational. Then {B1, B2} is a Nash Equilibrium Under Uncertainty.

When a decision maker's beliefs are represented by a probability measure p, an event E is null if p(E) = 0. It is well recognized that when preferences are not probabilistically sophisticated, there are alternative ways of defining nullity. The equilibrium concepts in Dow and Werlang (1994), Klibanoff (1993), and this chapter can all be regarded as generalizations of Nash Equilibrium if the "right" notion of nullity is adopted. To see this, first assume that each player does not have a strict incentive to randomize. Take S−i to be the state space of player i and Si to be a subset of acts which map S−i to R. Suppose that ≿i represents the preference ordering of player i over Si. Player i is rational if he chooses si such that si ≿i ŝi ∀ŝi ∈ Si. The following is an appropriate restatement of Nash Equilibrium in terms of preferences:

Definition 20.11. {≿i, ≿j} is a Nash Equilibrium if the following conditions are satisfied:

1  There exists Ŝi ⊆ Si such that the complement of Ŝi is ≿j-null.
2  si ≿i ŝi ∀si ∈ Ŝi, ∀ŝi ∈ Si.

In words, {≿i, ≿j} is a Nash Equilibrium if the event that player i is irrational is ≿j-null. Suppose ≿i and ≿j are represented by the multiple priors model. Let Bi and Bj be the sets of probability measures underlying ≿i and ≿j, respectively. Then {Bi, Bj} is a Beliefs Equilibrium (with Agreement) if and only if {≿i, ≿j} satisfies Definition 20.11, with Si replaced by M(Si) and using the definition of nullity as stated in Section 20.2.1. The equilibrium concepts of Dow and Werlang (1994) and Klibanoff (1993) are equivalent to Definition 20.11 if the notion of nullity in Dow and Werlang (1994) is adopted: an event is ≿j-null if it is attached zero probability by at least one probability measure in Bj.20
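The contrast between the two notions of nullity can be sketched directly. Since Section 20.2.1 is not reproduced in this excerpt, the "unanimity" version below (zero probability under every prior) is my assumption about that definition; the Dow–Werlang version (zero probability under at least one prior) follows note 20. All function names are illustrative, not the chapter's notation.

```python
# Two candidate notions of a null event for a set of priors B over a finite
# state space (states indexed 0, 1, ...).
def prob(event, p):
    """Probability of an event (a set of state indices) under prior p."""
    return sum(p[s] for s in event)

def null_unanimous(event, B):
    """Null under every prior (assumed here to match Section 20.2.1)."""
    return all(prob(event, p) == 0 for p in B)

def null_dow_werlang(event, B):
    """Dow-Werlang: null if at least one prior assigns it probability zero."""
    return any(prob(event, p) == 0 for p in B)

B = [(1.0, 0.0), (0.5, 0.5)]                 # two priors over states {0, 1}
print(null_dow_werlang({1}, B))              # True: prior (1, 0) gives state 1 zero weight
print(null_unanimous({1}, B))                # False: prior (0.5, 0.5) does not
```

The gap between the two notions is what makes Dow and Werlang's equilibrium concept weaker: more events count as null, so more belief profiles qualify as equilibria.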
The above discussion may lead the reader to think that the epistemic conditions provided for the equilibrium concepts in Dow and Werlang (1994) and Klibanoff (1993) are biased by using a notion of knowledge that is only appropriate for the equilibrium concepts proposed in this chapter. It is therefore worth reiterating that the conditions stated in Propositions 20.8 and 20.11 do not require the players to know anything about their opponents' beliefs and rationality. Therefore, the notion of knowledge to be adopted is irrelevant. Moreover, Propositions 20.8 and 20.11 do not even exploit the fact that the information structure is represented by partitions. The two propositions and their proofs continue to hold as long as the beliefs of player i at ω are represented by the set of all probability measures over an event Hi(ω) ⊆ Ω with the property that ω ∈ Hi(ω). Therefore, the conclusion that complete ignorance and rationality imply the two equilibrium concepts remains valid even in the absence of partitional information structures.
20.9. More general preferences

The purpose of this section is to show that even if we drop the particular functional form proposed by Gilboa and Schmeidler (1989) but retain some of its basic properties, counterparts of this chapter's equilibrium concepts and results can be formulated and proven. Let us first go back to the context of single-person decision theory and define a class of utility functions that generalizes the multiple priors model. Recall the notation introduced in Section 20.2.1, whereby ≿ is a preference ordering over the set of acts F, where each act maps Ω into M(X). Impose the following restrictions on ≿. Suppose that ≿ restricted to constant acts conforms to expected utility theory and so is represented by an affine u : M(X) → R. Suppose that there exists a nonempty, closed, and convex set Δ of probability measures on Ω such that ≿ is representable by a utility function of the form

    U(f) ≡ V({∫ u ∘ f dp | p ∈ Δ})                                  (20.8)

for some real-valued function V defined on sets of real numbers. Assume that ≿ is monotonic in the sense that for any f, g ∈ F, if ∫ u ∘ f dp > ∫ u ∘ g dp ∀p ∈ Δ, then f ≻ g. Say that ≿ is multiple priors representable if ≿ satisfies all the above properties. Quasiconcavity of ≿ will also be imposed occasionally.

Two examples are provided here to clarify the structure of the utility function U in (20.8). Suppose there exists a probability measure µ over M(Ω) and a concave and increasing function h such that

    U(f) = ∫_{M(Ω)} h(∫ u ∘ f dp) dµ(p).

In this example, the set Δ of probability measures corresponds to the support of µ. The interpretation of this utility function is that the decision maker views an act f as a two-stage lottery. However, the reduction of compound lotteries axiom
may not hold (Segal, 1990). Note that this utility function satisfies quasiconcavity. Another example of U, which is not necessarily quasiconcave, is the Hurwicz (1951) criterion:

    U(f) = α min_{p∈Δ} ∫ u ∘ f dp + (1 − α) max_{p∈Δ} ∫ u ∘ f dp,

where 0 ≤ α ≤ 1. Adapting the model to the context of normal form games as in Section 20.2.2, the objective function of player i is V({ui(σi, bi) | bi ∈ Bi}). All equilibrium notions can be defined precisely as before. I now prove the following extension of Proposition 20.3.

Proposition 20.12. Consider an n-person game. Suppose that the preference ordering of each player is multiple priors representable and quasiconcave. If {Bi}ni=1 is a Beliefs Equilibrium, then there exists bi ∈ Bi such that {bi}ni=1 is a Bayesian Beliefs Equilibrium and BRi(Bi) ⊆ BRi(bi).

Proof. As in the proof of Proposition 20.3, it suffices to show that, given Bi, there exists bi ∈ Bi such that BRi(Bi) ⊆ BRi(bi). I first show that for each σi ∈ BRi(Bi), there exists bi ∈ Bi such that σi ∈ BRi(bi). Suppose that this were not true. Then there exists σi ∈ BRi(Bi) such that for each bi ∈ Bi, we can find σi′ ∈ M(Si) with ui(σi, bi) < ui(σi′, bi). This implies that there exists σi∗ ∈ M(Si) such that ui(σi∗, bi) > ui(σi, bi) ∀bi ∈ Bi. (See Lemma 3 in Pearce, 1984: 1048.) Since the preference of player i is monotonic, player i should strictly prefer σi∗ to σi when his beliefs are represented by Bi. This contradicts the fact that σi ∈ BRi(Bi). Quasiconcavity of preference implies that BRi(Bi) is a convex set. Therefore, there exists an element σi ∈ BRi(Bi) such that the support of σi is equal to the union of the supports of the probability measures in BRi(Bi). Since σi ∈ BRi(bi) for some bi ∈ Bi and ui(·, bi) is linear on M(Si), this implies that si ∈ BRi(bi) for every si in the support of σi. This in turn implies BRi(Bi) ⊆ BRi(bi).

Besides Proposition 20.3, it is not difficult to see that Proposition 20.2 also holds if the preference ordering of each player is multiple priors representable. (The monotonicity of the preference ordering for player i ensures that Sin = Ŝin ∀n in the proof of Proposition 20.2.)
Propositions 20.4 and 20.5 are also valid because their proofs depend only on Propositions 20.2 and 20.3. Finally, all the results in Section 20.6, except Proposition 20.6 which also requires preferences to be quasiconcave, are true under the assumption of multiple priors representable preferences.
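As a numerical companion to this section, the Hurwicz criterion above can be sketched for an act over a finite set of priors; the maxmin model used in the rest of the chapter is the α = 1 special case. This is a minimal sketch under my own encoding: a closed convex Δ is represented by finitely many of its extreme points, which suffices because the expectation is linear in p.

```python
# Hurwicz criterion U(f) = α·min_p ∫u∘f dp + (1 − α)·max_p ∫u∘f dp,
# with Δ given as a finite list of priors (its extreme points).
def hurwicz(act_utils, priors, alpha):
    """act_utils: utilities u(f(ω)) across states; priors: probability vectors."""
    expected = [sum(p * u for p, u in zip(prior, act_utils)) for prior in priors]
    return alpha * min(expected) + (1 - alpha) * max(expected)

priors = [(1.0, 0.0), (0.0, 1.0)]       # complete ignorance over two states
print(hurwicz((10, 2), priors, 1.0))    # α = 1 recovers maxmin: 2.0
print(hurwicz((10, 2), priors, 0.5))    # equal weight on best and worst: 6.0
```

Note that for intermediate α this U is not quasiconcave in general, which is why Proposition 20.6 is singled out above as requiring quasiconcavity.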
20.10. Concluding remarks

Let me first summarize the questions addressed in this chapter:
1  What is a generalization of Nash Equilibrium (and its variants) in normal form games that allows for uncertainty averse preferences?
2  What are the epistemic conditions for those equilibrium concepts?
3  Can an outside observer distinguish uncertainty averse players from Bayesian players?
4  Does uncertainty aversion make the players worse off (better off)?
5  How is uncertainty aversion related to the structure of the game?
Generalizations of Nash Equilibrium have already been proposed by Dow and Werlang (1994) and Klibanoff (1993) to partly answer questions 3, 4, and 5. One important feature of the equilibrium concepts presented in this chapter that differs from Dow and Werlang (1994) but is shared with Klibanoff (1993) is the adoption of a mixed rather than a pure strategy space. Both choices can be justified by different perceptions of the players about the order of strategy choices. On the other hand, I can highlight the following relative merits of the approach pursued here. A distinctive feature of the solution concepts proposed in my chapter is their epistemic foundations, which resemble as closely as possible those underlying the corresponding Bayesian equilibrium concepts. As pointed out by Dow and Werlang (1994: 313), their equilibrium concepts are only "presented intuitively rather than derived axiomatically." In my chapter, some epistemic conditions are also provided for the equilibrium concepts proposed by Dow and Werlang (1994) and Klibanoff (1993). The weakness of their equilibrium concepts is revealed by the fact that these epistemic conditions do not involve any strategic considerations. This point was demonstrated in Section 20.8.3, where I noted that in any normal form game, regardless of its payoff structure, the beliefs profile {M(S2), M(S1)} constitutes an equilibrium in their sense.
20.11. Appendix

Proof of Proposition 20.6. Fix ω at a state where the rationality of the players, {ui}ni=1, and {Bi}ni=1 are mutual knowledge. Player i knows player j's beliefs means

    Hi(ω) ⊆ {ω′ ∈ Ω | Bj(ω′) = Bj(ω)}.

Player i knows player j is rational means

    Hi(ω) ⊆ {ω′ ∈ Ω | fj(ω′) ∈ BRj(ω′)(Bj(ω′))}.

Note that BRj varies with the state because uj does. Player i knows player j's payoff function means

    Hi(ω) ⊆ {ω′ ∈ Ω | BRj(ω′) = BRj(ω)}.
Therefore,

    Hi(ω) ⊆ {ω′ ∈ Ω | Bj(ω′) = Bj(ω)} ∩ {ω′ ∈ Ω | fj(ω′) ∈ BRj(ω′)(Bj(ω′))} ∩ {ω′ ∈ Ω | BRj(ω′) = BRj(ω)}
          ⊆ {ω′ ∈ Ω | fj(ω′) ∈ BRj(ω)(Bj(ω))}.

This implies {fj(ω′) | ω′ ∈ Hi(ω)} ⊆ BRj(ω)(Bj(ω)). The fact that the preference of player j is quasiconcave implies that BRj(ω)(Bj(ω)) is a convex set. Therefore, we have

    convex hull of {fj(ω′) | ω′ ∈ Hi(ω)} ⊆ BRj(ω)(Bj(ω)).

By construction of Bi(ω),

    margSj Bi(ω) ⊆ convex hull of {fj(ω′) | ω′ ∈ Hi(ω)} ⊆ BRj(ω)(Bj(ω)).

This shows that {Bi}ni=1 is a Beliefs Equilibrium.

Proof of Proposition 20.7. The conditions stated in Proposition 20.7 imply those in Proposition 20.6. Therefore, it is immediate that {Bi}ni=1 is a Beliefs Equilibrium. By construction of Bi(ω) and the assumption Δi(ω) = M(Hi(ω)) ∀ω, we have

    margSj Bi(ω) = convex hull of {fj(ω′) | ω′ ∈ Hi(ω)} ∀ω.

Now fix ω at a state where the rationality of the players and {ui}ni=1 are mutual knowledge and {Bi}ni=1 is common knowledge. Player k knows player i's beliefs implies that Bi(ω′) = Bi(ω) ∀ω′ ∈ Hk(ω), and therefore

    convex hull of {fj(ω″) | ω″ ∈ Hi(ω′)} = margSj Bi(ω′) = margSj Bi(ω)
        = convex hull of {fj(ω′) | ω′ ∈ Hi(ω)}    ∀ω′ ∈ Hk(ω).
Let Ψjk ⊆ {fj(ω′) | ω′ ∈ Hk(ω)} be the set of extreme points of margSj Bk(ω). I claim that Ψjk ⊆ margSj Bi(ω) (and therefore margSj Bk(ω) ⊆ margSj Bi(ω)). Suppose that this were not true. Then there exists σj ∈ Ψjk such that σj ∉ margSj Bi(ω′) ∀ω′ ∈ Hk(ω). Therefore,

    σj ∉ ∪_{ω′∈Hk(ω)} {fj(ω″) | ω″ ∈ Hi(ω′)} ⊇ {fj(ω′) | ω′ ∈ Hk(ω)} ⊇ Ψjk ∋ σj,

which is a contradiction. Since i, j, and k are arbitrary, we have margSj Bi(ω) = margSj Bk(ω) and, in particular, Ψji = Ψjk ≡ Ψj. It only remains to show that Bi(ω) = convex hull of {σ−i ∈ M(S−i) | margSj σ−i ∈ Ψj ∀j ≠ i}. Bi(ω) takes the required form if and only if for each σ−i ∈ ×j≠i Ψj, there exists ω′ ∈ Hi(ω) such that fj(ω′) = σj ∀j ≠ i.

Suppose that the condition stated above were not satisfied. Without loss of generality, assume that for player 1 there exists σ−1 ∈ ×j≠1 Ψj such that for each ω′ ∈ H1(ω), there exists j ≠ 1 with fj(ω′) ≠ σj. This implies that σ−1 ∉ B1(ω). That B1(ω) is common knowledge at ω implies that B1(ω′) = B1(ω) ∀ω′ ∈ H(ω). Therefore, σ−1 ∉ B1(ω′) ∀ω′ ∈ H(ω). Therefore, for each ω′ ∈ H(ω), there exists j ≠ 1 with fj(ω′) ≠ σj.

Now consider player 2. The last sentence in the previous paragraph implies that for each ω′ ∈ H(ω) such that f2(ω′) = σ2 and for each ω″ ∈ H2(ω′), there exists j ∈ {3, . . . , n} such that fj(ω″) ≠ σj. Therefore, ⊗nj=3 σj ∉ marg×nj=3 Sj B2(ω′). Again, that B2(ω) is common knowledge at ω implies B2(ω′) = B2(ω) ∀ω′ ∈ H(ω). Therefore, ⊗nj=3 σj ∉ marg×nj=3 Sj B2(ω′) ∀ω′ ∈ H(ω), and we can conclude that for each ω′ ∈ H(ω), there exists j ∈ {3, . . . , n} such that fj(ω′) ≠ σj. Repeat the same argument for players 3, . . . , n to conclude that for each ω′ ∈ H(ω), fn(ω′) ≠ σn. This contradicts the fact that σn ∈ Ψn.

Proof of Proposition 20.8. By construction of Bi(ω) and the assumption Δi(ω) = M(Hi(ω)), it follows that margSj Bi(ω) = convex hull of {fj(ω′) | ω′ ∈ Hi(ω)}. In particular, fj(ω) ∈ margSj Bi(ω). At ω, the fact that player j is rational implies fj(ω) ∈ BRj(ω)(Bj(ω)). Therefore, margSj Bi(ω) ∩ BRj(ω)(Bj(ω)) ≠ ∅.

Proof of Proposition 20.11. Set Ej = {fj(ω)}. By construction of Bi(ω) and the assumption Δi(ω) = M(Hi(ω)), it follows that Bi(ω) = M({fj(ω′) | ω′ ∈ Hi(ω)}). In particular, there exists a probability measure in Bi(ω) which attaches probability one to Ej. Therefore, Ej satisfies condition 1 in Definition 20.10. At ω, the fact that player j is rational implies fj(ω) ∈ BRj(ω)(Bj(ω)). Therefore, condition 2 in Definition 20.10 is also satisfied. This completes the proof that {B1, B2} is a Nash Equilibrium Under Uncertainty.
Acknowledgments

This is a revised version of Chapter 1 of my PhD thesis at the University of Toronto. I especially thank Professor Larry G. Epstein for pointing out this topic, and for
providing supervision and encouragement. I am also grateful to Professors Eddie Dekel, R. M. Neal, Mike Peters, and Shinji Yamashige for valuable discussions and to an associate editor and two referees for helpful comments. Remaining errors are my responsibility.
Notes

1 The only exception is that when Y is the space of outcomes X, M(X) denotes the set of all probability measures over X with finite supports.
2 See Gilboa and Schmeidler (1989: 149–150).
3 Throughout this chapter, I use (y1, p1; . . . ; ym, pm) to denote the probability measure which attaches probability pi to yi.
4 Also note that uncertainty aversion is not the only reason for players to have a strict incentive to randomize. In Crawford (1990) and Dekel et al. (1991), players may also strictly prefer to randomize even though they are probabilistically sophisticated.
5 To avoid confusion, note that this is not Harsanyi's Bayesian Equilibrium for games of incomplete information with Bayesian players.
6 To be even more precise, a Beliefs Equilibrium is an n-tuple of closed and convex sets of probability measures {B̂i}ni=1 such that the complement of BRi(B̂i) is a set of margM(Si) p̂j-measure zero for every p̂j ∈ B̂j.
7 A parallel statement for Bayesian players is that a Nash Equilibrium may not be a Strict Nash Equilibrium.
8 Note that this only explains why the decision maker may strictly prefer to randomize. We also need to rely on the dynamic consistency argument proposed by Machina (1989) to ensure that the decision maker is willing to obey the randomization result after the randomizing device is used. See also Dekel et al. (1991: 241) for discussion of this issue in the context of normal form games.
9 Mukerji (1994) argues that uncertainty about opponents' strategy choices can even persist as a steady state in the repeated game scenario.
10 The notation supp Bjn−1 stands for the union of the supports of the probability measures in Bjn−1.
11 A well-known example where this kind of reasoning applies is the following. An expected utility maximizer who is facing an exogenously specified set of states of nature always prefers to have more information before making a decision.
However, this is not necessarily the case if the decision maker is playing a game against another player. The reason is that if player 1 chooses to have less information and if player 2 "knows" it, the strategic behavior of player 2 may be affected. The end result is that player 1 may obtain a higher utility by throwing away information. (See the discussion of correlated equilibrium in Chapter 2 of Fudenberg and Tirole, 1991.)
12 Though I prove a result later (Proposition 20.12) for more general preferences, I provide a separate proof here because the special structure of the Gilboa–Schmeidler model permits some simplification.
13 For the Bayesian Beliefs Equilibrium {bi}ni=1 constructed in the proof of Proposition 20.3, we actually have

    max_{σi∈M(Si)} min_{pi∈Bi} ui(σi, pi) = max_{σi∈M(Si)} ui(σi, bi).

This, of course, does not exclude the possibility that there may exist other Bayesian Beliefs Equilibria contained in {Bi}ni=1 such that the equality is replaced by a strict inequality.
14 Greenberg (2000) develops an example independently. The intuition of his example is very similar to that of Example 20.5 in this chapter.
15 We may also want to ask the reverse question: Are the conditions stated in Propositions 20.4 and 20.5 necessary for the absence of Proper Beliefs Equilibrium? The game in Table 20.12 has two Nash Equilibria, {U, R} and {D, L} (and therefore it is not dominance solvable). However, there does not exist a Proper Beliefs Equilibrium.
16 Another feature of the interactive belief system (Aumann and Brandenburger, 1995: 1164) that is shared by the model here is that players' prior beliefs are not part of the specification. Note that the common prior assumption in their paper is imposed only for their Theorem B (p. 1168).
17 In particular, unlike Theorem B in Aumann and Brandenburger (1995), the proof of Proposition 20.7 does not rely on the "agreeing to disagree" result of Aumann (1976).
18 A brief review of other related papers is provided below. There are two other papers on generalizations of Nash Equilibrium. Lo (1999a) proposes Cautious Nash Equilibrium which, when specialized to the multiple priors model, refines the equilibrium concept in Dow and Werlang (1994). However, its main focus is on relaxing mutual knowledge of rationality, rather than uncertainty aversion. Mukerji (1994) proposes the equilibrium concept Equilibrium in ε-ambiguous Beliefs. The equilibrium concept only admits players' utility functions having a specific form but otherwise is identical to that in Dow and Werlang (1994). Epstein (1997) and Mukerji (1994) generalize rationalizability. The former requires common knowledge of rationality but the latter does not. For normal form games of incomplete information, Epstein and Wang (1996) establish the general theoretical justification for the Harsanyi style formulation for non-Bayesian players. Lo (1999b) provides a generalization of Nash Equilibrium in extensive form games. All the above papers either adopt the multiple priors model or consider a class of preferences that includes the multiple priors model as a special case.
19 The equilibrium concept presented here is a simplified version. Klibanoff (1993) assumes that players' beliefs are represented by lexicographic sets of probability measures.
20 To see that Klibanoff's equilibrium concept satisfies Definition 20.11 when Dow and Werlang's definition of nullity is adopted, restate Weak Beliefs Equilibrium in terms of B̂i: {B̂i, B̂j} is a Weak Beliefs Equilibrium if there exists b̂j ∈ B̂j such that σi ∈ BRi(B̂i) ∀σi ∈ support of b̂j. Set Ŝi in Definition 20.11 to be the support of b̂j.
References

F. J. Anscombe and R. Aumann, (1963) A definition of subjective probability, Ann. Math. Statist. 34, 199–205.
R. Aumann, (1976) Agreeing to disagree, Ann. Statist. 4, 1236–1239.
R. Aumann, (1987) Correlated equilibrium as an expression of Bayesian rationality, Econometrica 55, 1–18.
R. Aumann and A. Brandenburger, (1995) Epistemic conditions for Nash equilibrium, Econometrica 63, 1161–1180.
A. Brandenburger, (1992) Knowledge and equilibrium in games, J. Econ. Perspect. 6, 83–101.
C. Camerer and M. Weber, (1992) Recent developments in modelling preference: Uncertainty and ambiguity, J. Risk. Uncertainty 5, 325–370.
V. Crawford, (1990) Equilibrium without independence, J. Econ. Theory 50, 127–154.
E. Dekel, Z. Safra, and U. Segal, (1991) Existence and dynamic consistency of Nash equilibrium with non-expected utility preferences, J. Econ. Theory 55, 229–246.
J. Dow and S. Werlang, (1994) Nash equilibrium under Knightian uncertainty: Breaking down backward induction, J. Econ. Theory 64, 305–324.
D. Ellsberg, (1961) Risk, ambiguity, and the savage axioms, Quart. J. Econ. 75, 643–669.
L. G. Epstein, (1997) Preference, rationalizability and equilibrium, J. Econ. Theory 73, 1–29.
L. G. Epstein and T. Wang, (1996) Beliefs about beliefs without probabilities, Econometrica 64, 1343–1373.
K. Fan, (1953) Minimax theorems, Proc. Nat. Acad. Sci. 39, 42–47.
D. Fudenberg and J. Tirole, (1991) "Game Theory," MIT Press, Cambridge, MA.
I. Gilboa and D. Schmeidler, (1989) Maxmin expected utility with non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.)
J. Greenberg, (2000) The right to remain silent, Theory and Decision 48, 193–204.
L. Hurwicz, (1951) Optimality criteria for decision making under ignorance, Cowles Commission Discussion Paper.
P. Klibanoff, (1993) Uncertainty, decision, and normal form games, manuscript, MIT.
K. C. Lo, (1999a) Nash equilibrium without mutual knowledge of rationality, Economic Theory 14, 621–633.
K. C. Lo, (1999b) Extensive form games with uncertainty averse players, Games and Economic Behaviour 28, 256–270.
S. Mukerji, (1994) A theory of play for games in strategic form when rationality is not common knowledge, manuscript, Yale University.
M. Machina, (1989) Dynamic consistency and non-expected utility models of choice under uncertainty, J. Econ. Lit. 27, 1622–1668.
M. Machina and D. Schmeidler, (1992) A more robust definition of subjective probability, Econometrica 60, 745–780.
J. Nash, (1951) Non-cooperative games, Ann. Math. 54, 286–295.
M. Osborne and A. Rubinstein, (1994) "Game Theory," MIT Press, Cambridge, MA.
D. Pearce, (1984) Rationalizable strategic behaviour and the problem of perfection, Econometrica 52, 1029–1050.
H. Raiffa, (1961) Risk, ambiguity and the savage axioms: Comment, Quart. J. Econ. 75, 690–694.
L. Savage, (1954) "The Foundations of Statistics," Wiley, New York.
D. Schmeidler, (1989) Subjective probability and expected utility without additivity, Econometrica 57, 571–581. (Reprinted as Chapter 5 in this volume.)
U. Segal, (1990) Two-stage lotteries without the reduction axiom, Econometrica 58, 349–377.
21 The right to remain silent
Joseph Greenberg

Greenberg, Joseph (2000) "The right to remain silent," Theory and Decision, 48: 193–204.

21.1. Introduction

Over the last two decades economic theorists (in fields that include microeconomics, macroeconomics, industrial organization, and labor economics) have extensively studied dynamic situations in which players move sequentially. Most of the formal analysis is done by representing the model under consideration as a "game tree," and then employing the notion of "equilibrium in strategy profiles," notably Nash equilibrium (or any one of its many refinements). In recent years economists and game theorists have come to recognize many shortcomings of Nash equilibrium. In a narrow sense, the contribution of this short chapter is pointing out another deficiency of Nash equilibrium in dynamic games. Much more ambitiously, I hope to convince (at least some of) the readers that "equilibrium in strategy profiles" is not the appropriate notion that ought to be used in the analysis of dynamic games. This is true both conceptually and empirically: it is very hard to interpret a strategy profile (viewed either as a choice of actions or as beliefs), and neither introspection nor observed behavior suggests that players consider strategy profiles. Moreover, I shall show that the use of "equilibrium in strategy profiles" does not allow players to use ambiguity to their advantage.

In "normal form games" the strategy sets constitute part of the given data. Indeed, such games are now known as "games in strategic form." Thus, in any analysis of a normal form game, strategies must constitute the basic (in fact, the only!) "building block." Such is not the case with dynamic games. It was an ingenious idea to invent the notion of strategy in dynamic games, enabling us to transform them into normal form games.1 This transformation is not trivial; a "strategy" becomes a function that assigns to every information set an action available at this information set. In particular, a strategy profile specifies the precise (perhaps probabilistic) actions to be taken in every possible contingency (information set). Clearly, this notion is both complex and unintuitive.2 It is also very difficult to interpret. But more importantly, I shall argue that rarely
do we observe players employing strategies. A more plausible building block in the analysis of dynamic games is, perhaps, a path or a play, that is, the course of action that is to be followed.3 I also contend that in many social interactions in “real life” players communicate and discuss their choice of actions, (even if no agreement can be signed or trusted).4 Players, typically, negotiate over the “paths” to be taken, not over strategies. The Example in Section 21.3 illustrates that when players are involved in “open negotiations,” it may be disadvantageous for a player to choose a strategy. That is, a player may benefit by not revealing (or not pre-determining) the choice of his action in an information set, he thereby hopes will not be reached.5 He would be better-off to “cross the bridge if and when he gets to it.” A player might benefit from exercising his “right to remain silent” if he believes – as the empirical evidence shows – that players display aversion to “Knightian uncertainty.”6 In that case, a player who behaves strategically, may wish to avoid revealing/choosing his strategy.7 Section 21.4 concludes the chapter with a discussion of some related literature, and with a modification of the Example of Section 21.3 demonstrating that by “remaining silent,” all players can be made better-off, relative to the (unique) Nash equilibrium.
21.2. Strategies in a dynamic game In this section I shall argue that the analysis of a dynamic game should not be based on “equilibrium in strategy profiles.” This is true both conceptually and empirically. On the conceptual level, it is by no means clear what it is that a strategy profile, in dynamic games, represents, because a crucial feature of a dynamic game is that (some of) players’ actions are revealed along the play of the game. Ever since Cournot, a strategy in a normal form game typically represents a choice of action(s). It is in this way that game theorists have, for a long time, interpreted the notion of a strategy also in extensive form games. But then, how are we to interpret, for example, the action player i’s strategy specifies in some information set h, if that information set cannot be reached (because of i’s own previous choice of actions) if i were to follow this strategy?8 To rescue the usefulness of the notion of “equilibrium in strategy profiles” (and hence, of Nash equilibrium or its refinements) in dynamic games, it was then suggested to interpret a strategy of player i as representing the beliefs other players have over the actions i would take. But, again, because in a dynamic game (some of) players’ actions are revealed along the play of the game, the beliefs other players have over the actions i would take should be modified as the game unfolds and i’s past actions are revealed. Beliefs, therefore, ought to depend on the subgame reached.9 In addition to the conceptual difficulties, strategy profiles also fail to be descriptive. Typically, individuals do not consider all possible contingencies. Rather, players often “negotiate openly,” trying to “convince,” “influence,” “coordinate,” and “agree” on a course of action that is to be followed.10 Sometimes, such agreements
include clauses that prescribe the precise consequences (sanctions/punishments) for some deviations. But rarely, if ever, are all possible deviations covered. Almost no contract is “complete.” The same is true for any “social norm” or “legal system.” They specify the “appropriate/legal/acceptable behavior,” but neither the social norm nor any legal system pins down the precise actions (“punishments”) to be taken in all contingencies that might possibly arise when the prescribed behavior is not followed.11 To conclude, any notion that uses “equilibrium in strategy profiles” considerably limits the relevance of the analysis. This is true when strategies are interpreted as the actual choice of actions by the players or as players’ beliefs, or as representing the legal system or players’ “thought processes.”
21.3. An example Consider the following diplomatic “peace-negotiation” scenario, which is represented by the game tree in Figure 21.1.12

[Figure 21.1: the game tree. Country 1 chooses a or b; after b, country 2 chooses c or d. The choices a and (bc) lead to vertices v and w, respectively, which together form country 3’s information set, at which country 3 chooses L or R. Payoffs (country 1, country 2, country 3): at v, L yields (0, 9, 1) and R yields (9, 0, 0); at w, L yields (3, 9, 0) and R yields (6, 0, 1); the path (bd) yields (4, 4, 4).]

Each of the two warring countries, 1 and 2, has to decide whether or not to reach a peace agreement, represented by the path (bd). Failing to reach an agreement, country 3 would “re-evaluate” its policy, a decision that will affect both countries 1 and 2. Assume that country 3 has no way to know which of the two countries caused the breakup of the negotiations (otherwise, it could threaten to retaliate against that country). All it observes is whether or not the negotiations were successful. As the payoffs in Figure 21.1 indicate, it is in the best interest of country 3 that the two warring countries sign the peace agreement.13 Since country 3 cannot know who is responsible for the breakup of the peace negotiations, both policies L and R are “rational.” Both countries 1 and 2 (correctly) anticipate this set of “plausible/rational” re-evaluated policies. Therefore, unless country 3 pre-determines, or reveals in advance, the policy it is going to adopt should the peace treaty not be reached, countries 1 and 2 have no way to know (even probabilistically) which policy would be adopted by country 3. It is, then, conceivable14 that each country
will follow the path (bd), but each because of different reasons: country 1 for believing that policy L is more likely to be adopted than policy R, and country 2 for believing that policy R is more likely to be adopted than policy L. It is important to observe that if both countries held the same beliefs on the precise likelihood of the adoption of policies L and R, at least one of these two countries would find it in its best interest to jeopardize the peace talks. Nevertheless, by remaining silent, player 3 can create some uncertainty in the other players’ minds, thereby accomplishing his goal (that his information set is not reached). However, no Nash equilibrium for this game supports the path (bd). In fact, this game possesses a unique Nash equilibrium, which is given by: player 1 chooses actions a and b with equal probabilities (i.e. he uses the mixed strategy (1/2 a, 1/2 b)), player 2 chooses c (with probability 1), and player 3 chooses actions L and R with equal probabilities (i.e. he uses the mixed strategy (1/2 L, 1/2 R)). The resulting equilibrium payoff vector is (4.5, 4.5, 0.5).15 The success of the peace talks between Israel and Egypt (players 1 and 2) mediated by the USA (player 3) following the 1973 war, may be, at least partially, attributed to such a phenomenon. Egypt and Israel were each afraid that if negotiations broke down, she would be the loser. “And once a negotiation is thus reduced to details, it has a high probability of success – unless one party has consciously decided to make a show of flexibility simply to put itself in a better light for a deliberate breakup of the talks.16 Egypt was precluded from such a course by the plight of the Third Army, Israel by the fear of diplomatic isolation” (Kissinger, 1982: 802). I shall now show how player 3 can implement the path (bd) when players are allowed to openly communicate. Were I player 3, I would suggest that players 1 and 2 follow the path (bd).
I definitely would choose not to disclose the choice of my action if my information set were to be reached. By “remaining silent,” players 1 and 2 would no longer have a single common belief about my choice of action. It is then conceivable that player 1 might fear that I would choose L (with probability greater than 5/9), and that player 2 might fear that I would choose R (with probability greater than 5/9). In this case, each of the two players would be happy with the payoff of 4, thus, they would accept my suggestion to follow the path (bd). And I shall get a payoff of 4 instead of my Nash equilibrium payoff of 1/2. That is, by deferring or concealing the choice of my strategy, I may well deter the players from employing the Nash strategies, thereby considerably increasing my own payoff. The unique Nash equilibrium may not be acceptable even if it is interpreted as a recommendation. Indeed, if either an outside recommender or one of the two players were to suggest that we follow the unique Nash equilibrium rather than the path (bd), I, as player 3, would openly reject this recommendation. Instead, I would tell the other two players that I am not yet sure which probability distribution over my actions L and R I will choose, but in any case, I can assure them that I shall not follow their (Nash) recommendation. Note that this threat of mine is “credible.” For, if players 1 and 2 would follow their Nash strategies, then my (expected) payoff is 1/2 no matter what action I choose. I stand to lose nothing by
adhering to my threat. It is, therefore, likely that players 1 and 2 would reconsider and agree to follow the path (bd) instead.
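The belief thresholds underlying this argument can be checked directly; a minimal sketch using the payoffs of Figure 21.1 (the helper names are mine, not the chapter’s):

```python
# Country 1: action a yields 9*(1 - P(L)) (L gives it 0, R gives it 9),
# while the peace path (bd) yields 4.  So b is optimal iff P(L) > 5/9.
def best_for_country_1(p_L):
    return "b" if 4 > 9 * (1 - p_L) else "a"

# Country 2 (after b): action c yields 9*P(L) (L gives it 9, R gives it 0),
# while d yields 4.  So d is optimal iff P(L) < 4/9, i.e. P(R) > 5/9.
def best_for_country_2(p_L):
    return "d" if 4 > 9 * p_L else "c"

# Heterogeneous beliefs: country 1 fears L, country 2 fears R -> both accept (bd).
assert best_for_country_1(2/3) == "b" and best_for_country_2(1/3) == "d"

# With a single common belief p, (bd) is never jointly optimal, as the text claims:
assert not any(best_for_country_1(p) == "b" and best_for_country_2(p) == "d"
               for p in (i / 100 for i in range(101)))
```

The two threshold conditions are disjoint (P(L) > 5/9 versus P(L) < 4/9), which is exactly why player 3’s silence, and the resulting divergence of beliefs, is what sustains the path (bd).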
21.4. Concluding remarks Remark 21.1. The following simple modification of our example shows that the strategic employment of Knightian uncertainty might yield an outcome that is Pareto superior to the (unique) Nash outcome. Since the game in Figure 21.1 has a unique Nash equilibrium that passes through player 3’s information set, the only Nash payoff in the game depicted in Figure 21.2 is (2, 2, 2, 2). But, if player 3 does not specify his strategy, then the players may well agree to follow the path (Dbd), which yields the Pareto superior payoff of (4, 4, 4, 4). (Note that in this example player 4 need not worry that player 3 might decide to “double cross,” that is, to remain silent in order to induce player 4 to choose D, and then to disclose his choice were his information set reached. Player 3’s interests are best served by remaining silent.)

[Figure 21.2: the game of Figure 21.1 preceded by a move of player 4, who chooses U, ending the game with payoffs (2, 2, 2, 2), or D, leading to the game of Figure 21.1 with a payoff of 0 appended for player 4 at every terminal node except the path (Dbd), which yields (4, 4, 4, 4).]

Remark 21.2. Knight (1921) argued for a distinction between uncertainty (a situation in which players are not informed about the “objective” probabilities) and risk (when the “objective” probabilities are known by the players). There is ample evidence that players behave differently under uncertainty and risk. Specifically, most players exhibit aversion to uncertainty. The best known example of this phenomenon is the Ellsberg (1961) Paradox. As Ellsberg (1961: 656) notes: The important finding is that, after rethinking all their “offending” decisions in the light of [Savage] axioms, a number of people who are not only
sophisticated but reasonable decide that they wish to persist in their choices. This includes many people who previously felt a “first-order commitment” to the axioms, many of them surprised and some dismayed to find that they wished, in these situations, to violate the Sure-thing Principle. Many subsequent studies (see, e.g. Camerer and Weber (1992)) have found ambiguity premiums which are strictly positive. Observe that for the purpose of this chapter, the magnitude of these premiums (which is typically around 10–20 percent in expected value terms) is irrelevant. The existence of these premiums implies that one can construct examples, similar to the one given in Section 21.3, in which it would benefit a player not to reveal/pre-determine his choice of actions in some contingencies. Remark 21.3. As was mentioned in the Introduction, there are many other solution concepts that support the path (bd). Bernheim (1984) and Pearce’s (1984) notion of “rationalizability,” which is appropriate if no communication among players takes place, includes this path. Other concepts that include this path emerge from the recent literature on “learning,” and are motivated by the fact that “off equilibrium choices” are not observed, and hence the requirement of “commonality of beliefs” cannot be justified.17 Finally, (bd) is also included in the solution concepts that modify the notion of Nash equilibrium to incorporate Knightian uncertainty.18 But all of the above are notions of “equilibrium in strategies” (or in “capacities”), and they all extend the notion of Nash equilibrium. The same is true for rationalizable outcomes. Thus, even in our simple example, these notions support other paths as well (including the “Nash path”). In contrast, I am not attempting here to come up with an “equilibrium notion” in the absence of commonality of beliefs or in the presence of Knightian uncertainty. Rather, I suggest that players use these features to their advantage. 
In particular, in our example, I suggest that it is the path (bd) that would result in that game. Remark 21.4. Of course, just as it might pay a player not to reveal his choice of “credible” action in some of his information sets (as is the case with player 3 in our example), there are other situations in which a player may wish to reveal the actions he intends to take in the future, thereby attracting players to his information set. I intend to further study the set of paths that is likely to prevail when players behave strategically, but my purpose here is only to suggest that equilibrium in strategies might be inappropriate for studying strategic behavior in dynamic games.
Appendix: Proof of uniqueness We shall now verify that the game depicted in Figure 21.1 admits a unique Nash equilibrium, given by: player 1 uses the mixed strategy (1/2 a, 1/2 b), player 2 uses the pure strategy c, and player 3 uses the mixed strategy (1/2 L, 1/2 R). It is easy to see that there is no Nash equilibrium in which player 3 employs a pure strategy, since if it is R then player 1 must choose a, in which case, player 3’s
best response is L. If, on the other hand, player 3’s pure strategy is L, then player 1 will choose b, player 2 will choose c, in which case, player 3’s best response is R. Moreover, in every Nash equilibrium player 1 must employ a strictly mixed strategy, since otherwise player 3 would know whether he is in vertex v or in vertex w, and would thus employ a pure strategy, contrary to the earlier argument. As for player 2, he cannot employ the pure strategy d, since then player 3 would know that he is in vertex v and employ the pure strategy L, contradicting our conclusion that in every Nash equilibrium player 3 does not employ a pure strategy. Denote by α, β, and γ, respectively, the probabilities that player 1 chooses a, player 2 chooses c, and player 3 chooses L in a Nash equilibrium for this game. By the earlier discussion, we have that 0 < α, γ < 1, and β > 0. We shall now show that the only values that α, β, and γ can assume are 1/2, 1, and 1/2, respectively. To see that β = 1, assume otherwise. Then, since we have established that β > 0, player 2 employs a strictly mixed strategy and therefore must be indifferent between c and d. That is, 9γ = 4, so γ = 4/9. But since β < 1, player 1’s unique best response is a (guaranteeing himself a payoff of 5), which contradicts our conclusion that player 1 uses a strictly mixed strategy. Thus, β = 1. As 0 < α < 1, player 1 is indifferent between a and b. That is, 9(1 − γ) = 3γ + 6(1 − γ), implying that γ = 1/2. Finally, since 0 < γ < 1, player 3 is indifferent between L and R, that is, α = 1/2. Thus, in the unique Nash equilibrium in this game, α = 1/2, β = 1, and γ = 1/2 – as we wished to show.
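The indifference conditions used in the proof can be verified numerically; a small sketch, with the terminal payoffs of Figure 21.1 as reconstructed from the text (the function name is mine):

```python
# Terminal payoffs (player 1, player 2, player 3) of the game of Figure 21.1:
#   after a:      L -> (0, 9, 1),  R -> (9, 0, 0)
#   after b, c:   L -> (3, 9, 0),  R -> (6, 0, 1)
#   after b, d:   (4, 4, 4)
def payoffs(alpha, beta, gamma):
    """Expected payoffs when player 1 plays a w.p. alpha, player 2 plays c
    w.p. beta, and player 3 plays L w.p. gamma."""
    nodes = {  # payoff vector -> probability of reaching that terminal node
        (0, 9, 1): alpha * gamma,
        (9, 0, 0): alpha * (1 - gamma),
        (3, 9, 0): (1 - alpha) * beta * gamma,
        (6, 0, 1): (1 - alpha) * beta * (1 - gamma),
        (4, 4, 4): (1 - alpha) * (1 - beta),
    }
    return tuple(sum(pr * u[i] for u, pr in nodes.items()) for i in range(3))

a, b, g = 0.5, 1.0, 0.5  # the claimed equilibrium: alpha = 1/2, beta = 1, gamma = 1/2
# Player 1 is indifferent between a and b:
assert payoffs(1, b, g)[0] == payoffs(0, b, g)[0] == 4.5
# Player 2 strictly prefers c to d:
assert payoffs(a, 1, g)[1] > payoffs(a, 0, g)[1]
# Player 3 is indifferent between L and R:
assert payoffs(a, b, 1)[2] == payoffs(a, b, 0)[2] == 0.5
```

The computed equilibrium payoff vector (4.5, 4.5, 0.5) matches the one stated in Section 21.3.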
Acknowledgments I thank Daniel Arce, Geir Asheim, Ariel Assaf, Giacomo Bonanno, Faye Diamantoudi, Benyamin Shitovitz, Xiao Luo, and Licun Xue for their useful comments and advice. I also thank the Editor for his support and encouragement. Financial support from the Research Council of Canada (both the Natural Sciences and Engineering (NSERC), and the Social Sciences and Humanities (SSHRC)) and from Quebec’s Fonds (FCAR) is gratefully acknowledged.
Notes

1 My understanding is that the definition of a strategy in dynamic games is due to Kuhn (1953).
2 This is evidenced, for example, by the difficulty almost every student encounters when first exposed to this notion.
3 See Greenberg (1990, 1996), and Greenberg et al. (1996).
4 This, of course, is in sharp contrast to the social environment envisioned by Nash (1951) where: “each participant acts independently, without collaboration or communication with any of the others.”
5 See Remark 21.4.
6 See Remark 21.2.
7 Ed Green communicated to me that some of his colleagues in the Federal Reserve System in Minnesota use the term “constructive ambiguity” to describe a policy of being deliberately vague about how far they would be willing to go to bail out a large bank if one were to fail.
8 For a more detailed discussion, see, for example, Rubinstein (1991).
9 A similar criticism, regarding the notion of subgame perfect equilibrium, was put forward by Binmore (1987), arguing that players cannot hold to their beliefs if these beliefs have been proved to be wrong in the past; see, also, Osborne and Rubinstein (1994).
10 Only a very limited set of real life situations is captured in Nash’s “complete noncommunicative” realm.
11 As I have argued in Greenberg (1990), the description of a normal form game does not provide any information concerning the way in which the game is being played. For example, it provides no information concerning the availability of legal institutions that allow for binding agreements, self-commitments, or coalition formation. Nash equilibrium, in addition to providing a solution concept, also “completes” the description of the game by assuming that every player takes the actions of the other players as given. For a more detailed analysis and discussion, see Greenberg (1996).
12 The example is reminiscent of the “horse-shaped game” in Fudenberg and Kreps (1995: Example 6.1).
13 Country 3’s payoff in that case is 4, while the most it can obtain if the negotiations break up is a payoff of 1.
14 See Remark 21.2.
15 See proof in Appendix.
16 Thus, the USA was unable to know which of the two players is “really” responsible for the breakup of the talks, as is reflected in Figure 21.1. (My footnote.)
17 See, for example, Fudenberg and Levine (1993), Kalai and Lehrer (1993), and Rubinstein and Wolinsky (1994).
18 See, for example, Dow and Werlang (1994), Goes et al. (1998), Hendon et al. (1994), Klibanoff (1993), and Lo (1996). Goes et al. (1998) consider a game similar to our example, and they, too, single out the path (bd) from among the set of Nash equilibria in lower probabilities.
References

Bernheim, D. (1984). Rationalizable strategic behavior, Econometrica, 52: 1007–1028.
Binmore, K. (1987). Modelling rational players: Part I, Economics and Philosophy, 3: 179–214.
Camerer, C. and Weber, M. (1992). Recent developments in modelling preferences: uncertainty and ambiguity, Journal of Risk and Uncertainty, 5(4): 325–370.
Dow, J. and Werlang, S. (1994). Nash equilibrium under Knightian uncertainty, Journal of Economic Theory, 64(2): 305–324.
Ellsberg, D. (1961). Risk, ambiguity, and the Savage axioms, Quarterly Journal of Economics, 75: 643–669.
Fudenberg, D. and Kreps, D. (1995). Learning in extensive-form games, I: Self-confirming equilibria, Games and Economic Behavior, 8: 20–55.
Fudenberg, D. and Levine, D. (1993). Self-confirming equilibrium, Econometrica, 61(3): 523–545.
Goes, E., Jacobason, H. J., Sloth, B. and Tranaes, T. (1998). Nash equilibrium with lower probabilities, Theory and Decision, 44: 37–66.
Greenberg, J. (1990). The theory of social situations: an alternative game-theoretic approach. Cambridge: Cambridge University Press.
Greenberg, J. (1996). Acceptable course of action in dynamic games, in J. Filar, V. Gaitsgory and F. Imado (eds.), Proceedings of the Seventh International Symposium on Dynamic Games and Applications, 283–298.
Greenberg, J., Monderer, D. and Shitovitz, B. (1996). Multistage situations, Econometrica, 64(6): 1415–1437.
Hendon, E., Jacobason, H. J., Sloth, B. and Tranaes, T. (1994). Game theory with lower probabilities. University of Copenhagen, mimeo.
Kalai, E. and Lehrer, E. (1993). Subjective equilibrium in repeated games, Econometrica, 61(5): 1231–1240.
Kissinger, H. (1982). Years of upheaval. Boston: Little, Brown and Company.
Klibanoff, P. (1993). Uncertainty, decision and normal-form games. Cambridge, MA: MIT, mimeo.
Knight, F. (1921). Risk, uncertainty, and profit. Houghton Mifflin.
Kuhn, H. W. (1953). Extensive games and the problem of information, in Contributions to the Theory of Games, Vol. II (pp. 193–216). Princeton, NJ: Princeton University Press.
Lo, K. C. (1996). Equilibrium in beliefs under uncertainty, Journal of Economic Theory, 71: 443–484. (Reprinted as Chapter 20 in this volume.)
Osborne, M. J. and Rubinstein, A. (1994). A course in game theory. Cambridge, MA: The MIT Press.
Pearce, D. (1984). Rationalizable strategic behavior and the problem of perfection, Econometrica, 52: 1029–1050.
Rubinstein, A. (1991). Comments on the interpretation of game theory, Econometrica, 59: 909–924.
Rubinstein, A. and Wolinsky, A. (1994). Rationalizable conjectural equilibrium: between Nash and rationalizability, Games and Economic Behavior, 6(2): 299–311.
22 On the measurement of inequality under uncertainty Elchanan Ben-Porath, Itzhak Gilboa, and David Schmeidler
22.1. Motivation The bulk of the literature on inequality measurement assumes that the income profiles do not involve uncertainty. It is natural to suppose that if uncertainty (or risk) is present, one may use the theory of decision under uncertainty to reduce the inequality problem to the case of certainty, say, by replacing each individual’s income distribution by its expected value or expected utility. Alternatively, it would appear that one may use the theory of inequality measurement to reduce the problem to a single decision-maker’s choice under uncertainty, say, to the choice among distributions over inequality indices. We claim, however, that neither of these reductions would result in a satisfactory approach to the measurement of inequality under uncertainty. Rather, inequality and uncertainty need to be analyzed in tandem. The following example illustrates. Consider a society consisting of two individuals, a and b. They are facing two possible states of the world, s and t. A social policy determines the income of each individual at each state of the world. We further assume that both individuals, hence also a “social planner,” have identical beliefs represented by a probability over the states. Say, the two states are equally likely. Consider the following possible choices (or “social policies”), where each entry is the income of the individual (column) at the state (row):

f1:  s: a=0, b=0;  t: a=1, b=1
f2:  s: a=1, b=1;  t: a=0, b=0
g1:  s: a=0, b=1;  t: a=1, b=0
g2:  s: a=1, b=0;  t: a=0, b=1
h1:  s: a=0, b=1;  t: a=0, b=1
h2:  s: a=1, b=0;  t: a=1, b=0
“On the Measurement of Inequality under Uncertainty”, by Elchanan Ben-Porath, Itzhak Gilboa, and David Schmeidler (Journal of Economic Theory, 75 (1997): 194–204).
We argue that a reasonable social ordering ≽ would rank these choices from top to bottom: f1 ≈ f2 ≻ g1 ≈ g2 ≻ h1 ≈ h2. Indeed, symmetry between the states and anonymity of individuals imply the equivalence relations. The f choices are preferred to the g choices due to ex-post inequality: in all of these (f and g) alternatives, the expected income of each individual is 0.5, and thus there is no inequality ex-ante. But according to f, both individuals will have the same income at each state of the world, while under g the resulting income profile will have a rich individual and a poor individual.1 One may argue that the f alternatives are riskier than the g ones from a social standpoint, since under f there is a state of the world in which no individual has any income, whereas the g alternatives allow additional transfers in each state of the world, after which both individuals would have a positive income. However, we consider a social planner’s preferences over final allocations. These preferences are the basis on which potential transfers will be made. The comparison between g and h hinges on ex-ante inequality: ex-post, both choices have the same level of inequality at each state of the world. However, the g alternatives promise each individual the same expected income, while the h choices pre-determine which individual will be the rich one and which will be the poor one. Thus g is “more ex-ante egalitarian” than h. Matrices g2 and h2 are identical to those of Diamond (1967).2 We observe that one cannot capture these preferences if one reduces uncertainty to, say, expected utility and measures the inequality of the latter, or vice versa. For instance, suppose that the Gini index is the accepted measure of inequality. In the case of two individuals, and in the absence of uncertainty, the Gini welfare function can be written as G(y1, y2) = (3ỹ1 + ỹ2)/4, where (ỹ1, ỹ2) is a permutation of (y1, y2) such that ỹ1 ≤ ỹ2.
Ranking alternatives by the Gini welfare function of the expected incomes will distinguish between g and h, but not between f and g. On the other hand, selecting the expected Gini index as a choice criterion will serve to distinguish between f and g, but not between g and h. By contrast, a (weighted) average of the expected Gini index and the Gini of the expected income would rank f above g and g above h. We are therefore interested in measures of social welfare under uncertainty that take into account both ex-ante and ex-post inequality, and, in particular, that include the above-mentioned functionals. Furthermore, our goal is to characterize a class of measures that is a natural generalization of those commonly used for the measurement of social welfare under certainty. That is, we seek a set of principles that are equally plausible in the contexts of certainty and of uncertainty, that are satisfied by known measures under certainty, and that, under uncertainty, reflect both ex-ante and ex-post inequality considerations.
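The three criteria just discussed can be computed directly for representatives f1, g1, h1; a small sketch, assuming the two-person Gini welfare function G(y1, y2) = (3·min + max)/4 and equally likely states, as in the text (function names are mine):

```python
def gini_welfare(y1, y2):
    # Two-person Gini welfare function: higher values mean higher social welfare.
    lo, hi = sorted((y1, y2))
    return (3 * lo + hi) / 4

# Rows are the equally likely states s, t; entries are the incomes of (a, b).
policies = {"f1": [(0, 0), (1, 1)], "g1": [(0, 1), (1, 0)], "h1": [(0, 1), (0, 1)]}

def criteria(rows):
    exp_a = sum(a for a, _ in rows) / len(rows)
    exp_b = sum(b for _, b in rows) / len(rows)
    ex_ante = gini_welfare(exp_a, exp_b)                             # Gini of expected incomes
    ex_post = sum(gini_welfare(a, b) for a, b in rows) / len(rows)   # expected Gini
    return ex_ante, ex_post, (ex_ante + ex_post) / 2                 # and their average

for name, rows in policies.items():
    print(name, criteria(rows))
# f1 -> (0.5, 0.5, 0.5); g1 -> (0.5, 0.25, 0.375); h1 -> (0.25, 0.25, 0.25)
```

As claimed: the ex-ante criterion alone cannot separate f1 from g1, the ex-post criterion alone cannot separate g1 from h1, while the weighted average ranks f1 above g1 above h1.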
22.2. Inequality and uncertainty The relationship between the measurement of inequality and the evaluation of decisions under uncertainty has long been recognized. Harsanyi’s utilitarian solution (Harsanyi, 1955) corresponds to maximization of expected utility, while
Rawls’ egalitarian solution (Rawls, 1971) is equivalent to maximization of the minimal payoff under uncertainty.3 Further, the previous decade has witnessed several derivations of a class of functionals that generalizes both expected utility (in the context of choice under uncertainty) and the Gini index of inequality (in the context of inequality measurement). The reader is referred to Weymark, 1981; Quiggin, 1982; Chew, 1983; Yaari, 1987, 1988; Schmeidler, 1989. Chew also pointed out the relationship between the “rank-dependent probabilities” approach to uncertainty and the generalization of the Gini index. The “rank-dependent probabilities” approach suggests that the probability weight assigned to a state of the world in the evaluation of an uncertain act f depends not only on the state, but also on its relative ranking according to f. Yet, if we restrict our attention to acts that are “comonotonic,” that is, that agree on the payoff-ranking of the states, probabilities play their standard role as in expected utility calculations. We therefore refer to these functionals as “comonotonically linear.” For simplicity, consider the symmetric case. A symmetric comonotonically linear functional would be characterized by a probability vector (p1, …, pn), where there are n states of the world. Given an uncertain act f that guarantees a payoff of fj at state of the world j, let f(i) be the i-th lowest payoff in f. Then, f is evaluated by the weighted sum

I(f) = Σ_i p_i f_(i).
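A minimal sketch of this rank-weighted sum (the function name is mine; the three weight vectors used correspond to the mean, a Gini-type functional, and the minimum):

```python
def rank_weighted(f, p):
    # I(f) = sum_i p[i] * f_(i), where f_(i) is the (i+1)-th lowest payoff of f.
    return sum(w * x for w, x in zip(p, sorted(f)))

f = [3, 1, 2]
# Equal weights recover the average payoff:
assert abs(rank_weighted(f, [1/3, 1/3, 1/3]) - 2) < 1e-12
# All weight on the lowest rank recovers the minimal payoff:
assert rank_weighted(f, [1, 0, 0]) == min(f)
# Linearly decreasing weights (a Gini-type functional):
assert abs(rank_weighted(f, [3/6, 2/6, 1/6]) - (3*1 + 2*2 + 1*3) / 6) < 1e-12
```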
In the context of inequality measurement, Weymark (1981), Chew (1983), and Yaari (1988) considered the same type of evaluation functionals, where an element j is interpreted as an individual in a society, rather than as a state. In this context, it is natural to impose the symmetry (or “anonymity”) condition, which implies that an individual’s weight in the comonotonically linear aggregation above depends only on her social ranking. A common assumption is that p1 ≥ p2 ≥ ··· ≥ pn. For such a vector p, the functional above represents social preferences according to which a transfer of income from a richer to a poorer individual that preserves the social ranking can only increase social welfare. Special cases of these functionals are the following: (i) if pi = 1/n, we get the average income function; (ii) if pi−1 − pi = pi − pi+1 > 0 (for 1 < i < n), the resulting functional agrees with the Gini index on subspaces of income profiles defined by a certain level of total income (see Ben-Porath and Gilboa, 1994); (iii) if p1 = 1 (and pi = 0 for i > 1), the functional reduces to the minimal income level. While axiomatizations of these special cases do exist in the literature, we prefer to keep the discussion here on the more general level, dealing with all functionals defined by p1 ≥ p2 ≥ ··· ≥ pn as above, and focusing on these cases as examples and reference points. The strong connection between inequality measurement and decision under uncertainty, and, furthermore, the fact that comonotonically linear functionals were independently developed in both fields, may lead one to believe that the problem of
534
Ben-Porath et al.
inequality measurement under uncertainty is (mathematically) a special case of the known problems of the measurement of inequality aversion or of uncertainty aversion. But this is not the case. The rank-dependent approach of Weymark, Quiggin, Yaari, and Chew cannot satisfactorily deal with the preference patterns described in Section 22.1. Specifically, in each of the six alternatives in the f, g, and h matrices there are two 1’s and two 0’s. If we follow the rank-dependent approach, applied to the state–individual matrix, and impose symmetry between the states and between the individuals, we will have to conclude that all six alternatives are equivalent. Indeed, the pattern of preferences between the f ’s and the g’s, as well as that between the g’s and the h’s, is mathematically equivalent to Ellsberg’s paradox (see Ellsberg, 1961), and the rank-dependent approach is not general enough to deal with this paradox. We suggest measuring inequality under uncertainty by the class of min-of-means functionals. A min-of-means functional is representable by a set of probability vectors (or matrices in the case of uncertainty) in the following sense: for every income profile, the functional assigns the minimal expected income, where the expectation is taken separately with respect to each of the probability vectors (matrices). Observe that a comonotonically linear functional defined by (p1, …, pn) with p1 ≥ p2 ≥ ··· ≥ pn can be viewed as a min-of-means functional, for the set of measures which is the convex hull of all permutations of (p1, …, pn). In Section 22.3 we define this class axiomatically, and contend that the axioms are acceptable in the presence of uncertainty no less than under certainty. After quoting a representation theorem, we prove (in Section 22.4) that this class is closed under iterative application, as well as under averaging. It follows that this class includes linear combinations of, say, the expected Gini index and the Gini index of expected income.
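The impossibility noted above is easy to check directly: a symmetric rank-dependent evaluation of the flattened state–individual matrix depends only on the multiset of entries, and each of the six alternatives contains exactly two 1’s and two 0’s. A sketch (names are mine):

```python
def rank_weighted(entries, p):
    # Symmetric rank-dependent evaluation of the flattened state-individual matrix.
    return sum(w * x for w, x in zip(p, sorted(entries)))

# The six 2x2 matrices of Section 22.1, flattened (order is irrelevant under symmetry).
alternatives = {
    "f1": [0, 0, 1, 1], "f2": [1, 1, 0, 0],
    "g1": [0, 1, 1, 0], "g2": [1, 0, 0, 1],
    "h1": [0, 1, 0, 1], "h2": [1, 0, 1, 0],
}
p = [0.4, 0.3, 0.2, 0.1]  # arbitrary rank weights
values = {name: rank_weighted(m, p) for name, m in alternatives.items()}
assert len(set(values.values())) == 1  # all six alternatives are ranked as equivalent
```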
Section 22.5 briefly comments on the related class of Choquet integrals. Finally, an Appendix contains an explicit calculation of the probability matrices for some examples.
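The closure-under-iteration claim can be illustrated with the functionals of the example in Section 22.1; a sketch assuming the two-person Gini welfare set and equally likely states (the names are mine, not the chapter’s):

```python
def min_of_means(f, C):
    # min over the probability vectors q in C of the mean q . f
    return min(sum(q * x for q, x in zip(vec, f)) for vec in C)

C_gini = [(0.75, 0.25), (0.25, 0.75)]  # two-person Gini welfare as a min-of-means
C_exp = [(0.5, 0.5)]                   # plain expectation over two equally likely states

def iterate(f, C_outer, C_inner):
    # (I1 * I2)(f): apply the inner functional to each row, the outer to the results.
    return min_of_means([min_of_means(row, C_inner) for row in f], C_outer)

g1 = [(0, 1), (1, 0)]            # rows = states s, t; columns = individuals a, b
h1 = [(0, 1), (0, 1)]
expected_gini = iterate(g1, C_exp, C_gini)                     # over states of row Ginis
gini_of_expected = iterate(list(zip(*g1)), C_gini, C_exp)      # Gini of expected incomes
assert (expected_gini, gini_of_expected) == (0.25, 0.5)

# Averaging the two iterated functionals separates g1 from h1, as the text asserts:
mix_g = 0.5 * expected_gini + 0.5 * gini_of_expected
mix_h = 0.5 * iterate(h1, C_exp, C_gini) + 0.5 * iterate(list(zip(*h1)), C_gini, C_exp)
assert (mix_g, mix_h) == (0.375, 0.25)
```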
22.3. Min-of-means functionals We start with some general-purpose notation. For a finite set A, let F = FA be the set of real-valued functions on A, and let P = PA be the space of probability vectors on A. For f ∈ F and p ∈ P we use fi and pi with the obvious meaning for i ∈ A. Elements of F and of P will be identified with real-valued vectors or matrices, as the case may be. We also use p · f to denote the inner product Σ_{i∈A} p_i f_i. Specifically, let S be a finite set of states of the world, and let K be a finite set of individuals. (Both S and K are assumed non-empty.) Let A = S × K, so that F = FA denotes the space of income profiles under uncertainty for the society K where uncertainty is represented by S. Let ≽ denote the binary relation on F, reflecting the preference order of “society” or of a “social planner.” Consider the following axioms on ≽. (Here and in the sequel, ≈ and ≻ stand for the symmetric and antisymmetric parts of ≽, respectively.)
Measurement of inequality under uncertainty

A1 Weak order: For all f, g, h ∈ F: (i) f ≿ g or g ≿ f; (ii) f ≿ g and g ≿ h imply f ≿ h.
A2 Continuity: fᵏ → f and fᵏ ≿ (≾) g imply f ≿ (≾) g.
A3 Monotonicity: f_si ≥ (>) g_si for all (s, i) ∈ A implies f ≿ (≻) g.
A4 Homogeneity: f ≿ g and λ > 0 imply λf ≿ λg.
A5 Shift covariance: f ≿ g implies f + c ≿ g + c for any constant function c ∈ F.
A6 Concavity: f ∼ g and α ∈ (0, 1) imply αf + (1 − α)g ≿ f.
We do not insist that any of these axioms, let alone all of them taken together, is indisputable. However, under certainty (where F is the set of income vectors), they are satisfied by utilitarian preferences, by egalitarian (maxmin) preferences, as well as by any preferences that correspond to a comonotonically linear functional with p₁ ≥ p₂ ≥ · · · ≥ pₙ. Moreover, axioms A1–A6 seem to be as reasonable in the case of uncertainty as they are in the case of certainty. In Gilboa and Schmeidler (1989) it is shown that a preference order satisfies A1–A6 iff it can be numerically represented by a functional I : F → ℝ, defined by a compact and convex set of measures C ⊆ P as

I(f) = min_{p∈C} p · f    for all f ∈ F.
Moreover, in this case the set C is the unique compact and convex set of measures satisfying this equality for all f ∈ F. We refer to such a functional I as a "min-of-means" functional: for every function f, its value is the minimum over a set of values, each of which is a weighted average of the values of f.
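When C is the convex hull of finitely many measures, evaluating a min-of-means functional reduces to a finite minimization: a linear functional attains its minimum over a polytope at an extreme point, so scanning the generating measures suffices. A minimal sketch in Python (the weight vector is illustrative, not taken from the text):

```python
from itertools import permutations

def min_of_means(f, measures):
    """I(f) = min over p in C of p . f, where C is the convex hull of
    `measures`; the minimum of a linear functional over a polytope is
    attained at an extreme point, so scanning the generators suffices."""
    return min(sum(p_i * x_i for p_i, x_i in zip(p, f)) for p in measures)

# A comonotonically linear functional with p1 >= p2 >= p3, viewed as a
# min-of-means functional over all permutations of the weight vector
# (illustrative weights).
weights = (0.5, 0.3, 0.2)
perms = list(permutations(weights))

incomes = [4.0, 1.0, 2.0]
# The minimum pairs the largest weight with the smallest income:
# 0.5*1 + 0.3*2 + 0.2*4 = 1.9
print(min_of_means(incomes, perms))
```

The minimization automatically sorts the weights against the incomes, which is exactly the rank-dependent evaluation described in the text.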
22.4. Iteration and averaging

We now prove that the class of min-of-means functionals is closed under two operations: pointwise averaging over a given space, and iterated application over two spaces. (This, of course, will prove closure under any finite number of iterations.) Let there be given two sets A₁ and A₂, to be interpreted as the sets of states and of individuals, respectively. Consider the product space A = A₁ × A₂. Given two min-of-means functionals I₁ and I₂ (on F₁ = F_{A₁} and on F₂ = F_{A₂}, respectively), we wish to show that applying one of them to the results of the other generates a min-of-means functional on F = F_A. We first define iterative application formally.

Notation 22.1. For a matrix f ∈ F, define f̃₁ ∈ F₁ to be the vector of I₂-values of the rows of f; that is, for i ∈ A₁, (f̃₁)_i = I₂(f_{i·}). Then (I₁ ∗ I₂)(f) = I₁(f̃₁).

Should I₁ ∗ I₂ be a min-of-means functional (on F), it would have a set of probability matrices (on A) corresponding to it. In order to specify this set, we need the following notation. First, let P_r = P_{A_r} for r = 1, 2. We now define
a "product" operation between a probability vector on A₁ and a matrix associating a probability vector on A₂ with each element of A₁.

Notation 22.2. Let m = (m_ij) be a stochastic matrix such that m_{i·} ∈ P₂ for every i ∈ A₁. Then, for p₁ ∈ P₁, let p₁ ∗ m be the probability matrix on A defined by (p₁ ∗ m)_ij = (p₁)_i m_ij.

Notation 22.3. Let C_r ⊆ P_r be given (for r = 1, 2). Let C₁ ∗ C₂ ⊆ P = P_A be defined by

C₁ ∗ C₂ = {p₁ ∗ m | p₁ ∈ C₁; m_{i·} ∈ C₂ for all i}.

That is, C₁ ∗ C₂ denotes the set of all probability matrices for which every conditional probability on A₂, given a row in A₁, is in C₂, and whose marginal on A₁ is in C₁.

Theorem 22.1. Let there be given A₁ and A₂ as above, and let I₁ and I₂ be min-of-means functionals on them, respectively. Let C_r ⊆ P_r be the set of measures corresponding to I_r, r = 1, 2, by the result quoted above. Then I₁ ∗ I₂ is a min-of-means functional on A. Furthermore, the set C ⊆ P corresponding to I₁ ∗ I₂ is C ≡ C₁ ∗ C₂.

Proof. Observe that the set C is compact. We note that it is also convex. Indeed, assume that p₁ ∗ m, p₁′ ∗ m′ ∈ C and let α ∈ [0, 1]. Define p̄₁ = αp₁ + (1 − α)p₁′ ∈ C₁ and

m̄_{i·} = [α(p₁)_i m_{i·} + (1 − α)(p₁′)_i m′_{i·}] / [α(p₁)_i + (1 − α)(p₁′)_i] ∈ C₂

for i ∈ A₁ whenever the denominator does not vanish. (The definition of m̄ is immaterial when it does.) It is easily verified that p̄₁ ∗ m̄ = α(p₁ ∗ m) + (1 − α)(p₁′ ∗ m′). We now turn to show that C is the set of measures corresponding to I₁ ∗ I₂. We need to show that for every f ∈ F,

(I₁ ∗ I₂)(f) = min_{p∈C} p · f.

Let f ∈ F be given. We first show that (I₁ ∗ I₂)(f) ≥ min_{p∈C} p · f. Let m_{i·} ∈ C₂ be a minimizer of Σ_j m_ij f_ij ≡ f̂_i. Let p₁ ∈ C₁ be a minimizer of p₁ · f̂
(where f̂ is defined in the obvious way). Note that p₁ · f̂ = (I₁ ∗ I₂)(f). Since p₁ ∗ m appears in C, the inequality follows. Next we show that (I₁ ∗ I₂)(f) ≤ min_{p∈C} p · f. Assume that the minimum on the right-hand side is attained by the measure p₁ ∗ m. We claim that, unless (p₁)_i = 0, m_{i·} is a minimizer, over all p₂ ∈ C₂, of Σ_j (p₂)_j f_ij. Indeed, were one to minimize p · f by choosing a measure p₁ ∗ m, one could choose m_{i·} independently at different states i; hence, for any choice of p₁, the minimal product will be obtained for m_{i·} that is a pointwise minimizer. Without loss of generality we may therefore assume that m_{i·} is a minimizer of Σ_j m_ij f_ij ≡ f̂_i. Hence p₁ has to be a minimizer of p₁ · f̂, and the equality has been established. Finally, since I₁ ∗ I₂ is representable as min_{p∈C} p · f, it is a min-of-means functional.

Under the above conditions, both I₁ ∗ I₂ and I₂ ∗ I₁ are min-of-means functionals. However, in general they are not equal, as the examples in Section 22.1 show. Specifically, let A₁ = {s, t}, A₂ = {a, b}, and define I₁ by C₁ = {(1/2, 1/2)} and I₂ by C₂ = {(p, 1 − p) | 0 ≤ p ≤ 1}. That is, I₁ is the expectation with respect to a uniform prior, and I₂ is the minimum operator. Consider the matrix g₁ defined in Section 22.1, and observe that (I₁ ∗ I₂)(g₁) = 0 while (I₂ ∗ I₁)(g₁) = 1/2.

The theorem above states that if a certain inequality index, such as the Gini index or the minimal income index, is representable as a min-of-means functional, so will be that index applied to expected income, and so will be the expected value of this index. However, if we consider a sum (or an average) of these two, we need the following result to guarantee that the resulting functional is also a min-of-means functional.

Proposition 22.1. Let there be given two min-of-means functionals I¹ and I² on F = F_A. Let α ∈ [0, 1]. Then I = αI¹ + (1 − α)I² is a min-of-means functional.
Furthermore, if C¹ and C² are the sets of measures corresponding to I¹ and I², respectively, then the set C corresponding to I is given by C = {αp¹ + (1 − α)p² | p¹ ∈ C¹, p² ∈ C²}.
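Both the noncommutativity of iteration and the averaging result of Proposition 22.1 lend themselves to a direct numerical check. A minimal sketch in Python; since Section 22.1 is not reproduced here, g₁ is assumed to be the profile in which individual a receives 1 in state s and individual b receives 1 in state t (the pattern that the values 0 and 1/2 presuppose), and the sets C1 and C2 in the averaging check are illustrative:

```python
from itertools import product

def mom(f, measures):
    """Min-of-means: minimize p . f over the generating measures of C
    (the minimum of a linear functional over a polytope is attained at
    an extreme point)."""
    return min(sum(p_i * x_i for p_i, x_i in zip(p, f)) for p in measures)

def iterate(I_outer, I_inner, f):
    """(I_outer * I_inner)(f) as in Notation 22.1: apply I_inner to each
    row of f, then I_outer to the resulting vector."""
    return I_outer([I_inner(row) for row in f])

I1 = lambda v: 0.5 * v[0] + 0.5 * v[1]   # expectation, uniform prior on {s, t}
I2 = min                                 # minimum operator: C2 = P_{A2}

# g1 (assumed pattern): a is rich in state s, b is rich in state t.
g1 = [[1, 0],
      [0, 1]]
cols = lambda m: [list(c) for c in zip(*m)]   # transpose: iterate over individuals

assert iterate(I1, I2, g1) == 0.0         # (I1 * I2)(g1): expected minimal income
assert iterate(I2, I1, cols(g1)) == 0.5   # (I2 * I1)(g1): minimal expected income

# Proposition 22.1: because the two minimizations are independent,
# min over {a*p1 + (1-a)*p2} splits into a*min_{C1} + (1-a)*min_{C2}.
C1 = [(1/3, 1/3, 1/3)]                    # a single uniform prior
C2 = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]    # vertices of the simplex
a = 0.5
f = [3.0, 1.0, 2.0]
C = [tuple(a * u + (1 - a) * v for u, v in zip(p1, p2))
     for p1, p2 in product(C1, C2)]
assert abs((a * mom(f, C1) + (1 - a) * mom(f, C2)) - mom(f, C)) < 1e-12
print("noncommutativity and averaging checks passed")
```

The first pair of assertions reproduces (I₁ ∗ I₂)(g₁) = 0 and (I₂ ∗ I₁)(g₁) = 1/2; the last one is the separation argument behind Proposition 22.1.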
22.5. A comment on Choquet integration

Schmeidler (1989) suggested using Choquet integration (Choquet, 1953–1954) with respect to non-additive measures for the representation of preferences under uncertainty. For the sake of the present discussion, the reader may think of a Choquet integral as a continuous functional which is linear over each cone of comonotonic income profiles. Specifically, assume that f and g are two matrices, with rows corresponding to states and columns to individuals. Recall that the two are comonotonic if there is an ordering of the state–individual pairs with which both f and g agree, that is, if there are no two pairs (s, a) and (t, b) such that f_sa > f_tb while g_sa < g_tb. In this case, the Choquet integral I with respect to any
non-additive measure has to satisfy I(f + g) = I(f) + I(g). Under certainty, the assumption of symmetry (between individuals) reduces the non-additive integration approach to the rank-dependent one. This is not the case, however, when the relevant space is a product of states and individuals. In a two-dimensional space, symmetry between rows and between columns does not imply that every subset of the matrix is equivalent to any other subset with identical cardinality. Thus, the non-additive approach may explain preferences as in Ellsberg's paradox, and can, correspondingly, account for preferences as in Section 22.1, without violating the symmetry assumptions. This would give one a reason to hope that this approach is the appropriate generalization of comonotonically linear functionals to the case of uncertainty. Yet, we find that this linearity property, even when restricted to comonotonic profiles, is hardly plausible in our context. Consider the following four alternatives.

f₃   a  b        f₄   a  b        g₃   a  b        g₄   a  b
s    1  0        s    0  1        s    2  1        s    1  2
t    0  0        t    0  0        t    1  0        t    1  0
By symmetry between the two individuals, the first two alternatives are equivalent. Assuming a Choquet-integral representation, this would imply that the last two alternatives are also equivalent. To see this, note that g₃ = f₃ + h and g₄ = f₄ + h for h given by

h    a  b
s    1  1
t    1  0
Since f₃ and h are comonotonic, I(g₃) = I(f₃) + I(h). Similarly, I(g₄) = I(f₄) + I(h). From I(f₃) = I(f₄) one therefore obtains I(g₃) = I(g₄). However, g₃ and g₄ are equivalent only as far as ex-post inequality considerations are concerned. Ex ante, g₃ makes one individual always better off than the other, while g₄ guarantees the two individuals identical expected income. The expected minimal income, as well as the expected Gini index, are the same under g₃ and g₄. Yet the Gini index of expected income, as well as the minimal expected income, differ. From a mathematical viewpoint, we note that the average of Choquet integrals is a Choquet integral, and therefore the expected Gini and the expected minimum are representable by a Choquet integral over the states–individuals matrix. By contrast, the minimum of Choquet integrals, or the integral (over individuals) of Choquet integrals (over states), need not be a Choquet integral itself. In other words, the family of Choquet integrals fails to be closed under iterative application.
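The computation behind this argument can be replayed for any symmetric capacity. A minimal sketch in Python, using an illustrative (unnormalized) convex capacity that depends only on the number of state–individual pairs in a coalition:

```python
def choquet(values, nu):
    """Choquet integral of a nonnegative function with respect to a
    symmetric capacity, given as nu[k] = capacity of any k-element set.
    Sort the values in decreasing order x(1) >= ... >= x(n), set
    x(n+1) = 0, and sum (x(i) - x(i+1)) * nu(i)."""
    xs = sorted(values, reverse=True) + [0]
    return sum((xs[i] - xs[i + 1]) * nu[i + 1] for i in range(len(values)))

# Symmetric convex capacity on the four state-individual pairs
# (unnormalized, purely illustrative): increments 1, 2, 3, 4.
nu = [0, 1, 3, 6, 10]

flat = lambda m: [x for row in m for x in row]
I = lambda m: choquet(flat(m), nu)

f3 = [[1, 0], [0, 0]]
f4 = [[0, 1], [0, 0]]
h  = [[1, 1], [1, 0]]
g3 = [[2, 1], [1, 0]]   # g3 = f3 + h, with f3 and h comonotonic
g4 = [[1, 2], [1, 0]]   # g4 = f4 + h, with f4 and h comonotonic

print(I(g3) == I(f3) + I(h))   # True: comonotonic additivity
print(I(g4) == I(f4) + I(h))   # True
print(I(g3) == I(g4))          # True, although only g4 equalizes expected incomes
```

Symmetry forces I(f₃) = I(f₄), and comonotonic additivity then forces I(g₃) = I(g₄), even though the two profiles differ sharply in their ex-ante inequality.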
Appendix

Suppose that an inequality measure (under certainty) is represented by a min-of-means functional I_K on F_K that corresponds to a set of measures C_K ⊆ P_K. Assume that p ∈ P_S is an objective probability measure on the state space S. Then the functional

I(f) = αE_s[I_K(f_{s·})] + (1 − α)I_K(E_s[f_si])    for all f ∈ F_{S×K}

is a min-of-means functional, and the corresponding set of measures is

{q ∈ P_{S×K} | ∃r⁰, r¹, . . . , r^|S| ∈ C_K s.t. q_si = αp_s r^s_i + (1 − α)p_s r⁰_i for all s, i}.

As an example, consider the case of extreme egalitarianism. That is, on the set of individuals K we adopt the set of all probability measures: C_K = P_K. Assume that p ∈ P_S is an objective probability measure as given above. Then the functional

I(f) = (1/2) E_s[min_i f_si] + (1/2) min_i E_s[f_si]    for all f ∈ F_{S×K}

is a min-of-means functional, and its corresponding set of measures (on S × K) is

{q ∈ P_{S×K} | (i) Σ_i q_si = p_s for all s; (ii) ∃r⁰ ∈ P_K s.t. q_si − p_s r⁰_i / 2 ≥ 0 for all s, i}.
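The extreme-egalitarianism example can be verified numerically: with C_K = P_K, it suffices to minimize q · f over the extreme points of the set above, obtained by letting r⁰ and each r^s range over the vertices of P_K. A minimal sketch in Python, assuming two states, two individuals, a uniform objective prior, and an illustrative income matrix:

```python
from itertools import product

def I_direct(f, p):
    """I(f) = 1/2 E_s[min_i f_si] + 1/2 min_i E_s[f_si]."""
    exp_min = sum(p_s * min(row) for p_s, row in zip(p, f))
    min_exp = min(sum(p_s * row[i] for p_s, row in zip(p, f))
                  for i in range(len(f[0])))
    return 0.5 * exp_min + 0.5 * min_exp

def I_min_of_means(f, p):
    """Minimize q . f over the extreme points q_si = p_s*(r^s_i + r^0_i)/2,
    with r^0 and each r^s ranging over the vertices of P_K."""
    n_states, n_ind = len(f), len(f[0])
    vertices = [tuple(1 if j == i else 0 for j in range(n_ind))
                for i in range(n_ind)]
    best = None
    for r0 in vertices:
        for rs in product(vertices, repeat=n_states):
            val = sum(p[s] * (rs[s][i] + r0[i]) / 2 * f[s][i]
                      for s in range(n_states) for i in range(n_ind))
            best = val if best is None else min(best, val)
    return best

f = [[2.0, 1.0], [1.0, 0.0]]   # illustrative income matrix (states x individuals)
p = [0.5, 0.5]                 # uniform objective prior
print(I_direct(f, p), I_min_of_means(f, p))   # 0.5 0.5
```

The two evaluations agree, as the characterization of the corresponding set of measures requires.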
Acknowledgment

We thank two anonymous referees and the associate editor for their comments.
Notes

1 Myerson (1981) also draws the distinction between ex-ante and ex-post inequality, and argues that a social planner's preference for ex-post equality might lead to a choice which is not ex-ante Pareto optimal.
2 One can justify the preference of g over h on grounds of "procedural justice" (Karni and Schmeidler, 1995): it is more just that Nature (i.e., a lottery) would choose which of a and b will be the rich individual, rather than that the choice be made by a person (or persons) acting on behalf of "society."
3 Indeed, Rawls' conceptual derivation of his criterion resorts to the "veil of ignorance," that is, to the reduction of social choice problems to decision under uncertainty. (See also Harsanyi, 1953.)
References

Ben-Porath, E. and I. Gilboa (1994) Linear measures, the Gini index and the income equality tradeoff, J. Econ. Theory 64, 443–467.
Chew, S. H. (1983) A generalization of the quasilinear mean with applications to measurement of income inequality and decision theory resolving the Allais paradox, Econometrica 51, 1065–1092.
Choquet, G. (1953–1954) Theory of capacities, Ann. Inst. Fourier 5, 131–295.
Diamond, P. A. (1967) Cardinal welfare, individualistic ethics, and interpersonal comparison of utility: Comment, J. Polit. Econ. 75, 765–766.
Ellsberg, D. (1961) Risk, ambiguity and the Savage axioms, Quart. J. Econ. 75, 643–669.
Gilboa, I. and D. Schmeidler (1989) Maxmin expected utility with non-unique prior, J. Math. Econ. 18, 141–153. (Reprinted as Chapter 6 in this volume.)
Harsanyi, J. C. (1953) Cardinal utility in welfare economics and in the theory of risk-taking, J. Polit. Econ. 61, 434–435.
—— (1955) Cardinal welfare, individualistic ethics and interpersonal comparisons of utility, J. Polit. Econ. 63, 309–321.
Karni, E. and D. Schmeidler (1995) Justice and common knowledge, mimeo.
Myerson, R. (1981) Utilitarianism, egalitarianism, and the timing effect in social choice problems, Econometrica 49, 883–897.
Quiggin, J. (1982) A theory of anticipated utility, J. Econ. Behav. and Organ. 3, 323–343.
Rawls, J. (1971) "A Theory of Justice," Harvard Univ. Press, Cambridge, MA.
Schmeidler, D. (1989) Subjective probability and expected utility without additivity, Econometrica 57, 571–587. (Reprinted as Chapter 5 in this volume.)
Weymark, J. (1981) Generalized Gini inequality indices, Math. Soc. Sci. 1, 409–430.
Yaari, M. (1987) The dual theory of choice under risk, Econometrica 55, 95–115.
—— (1988) A controversial proposal concerning inequality measurement, J. Econ. Theory 44, 381–397.
Index
acts: comonotonic 29, 111, 307; constant 211; as decision alternative 137; as horse lotteries 127; measurable 139; “objective mixtures” of 214; set of 23; step 139 additive games (charge games) 47 additive (subjective) probabilities: on idiosyncratic states 347; in physics 108 additivity for unambiguous events 138 agents: aversion to Knightian uncertainty 453; beliefs, assumptions about 429; consumption across states 338; with finite databases 365; heterogeneous 452; ranking of states 337 Agnew, C. E. 127, 156 Alaoglu Theorem 49, 264 Σ as σ-algebra 50, 53–4, 73, 474 Aliprantis, C. D. 51–6, 58–9, 76, 79, 91–2, 101 Allais, M. 3, 21; Paradox of 3, 5, 176, 206, 213 ambiguity: affinity for 185; and ambiguity attitude, defining 36–44; and ambiguity aversion, related literature 214–15; in beliefs, updating 155–68; definition of 40–3; in events 138; as fuzzy perception of likelihood 332; made precise 209–41; in probabilities 174 ambiguity aversion 209, 212; absolute, characterization 223–4; behavioral definition of 36; capacities and Choquet integrals 234–5; cardinal symmetry and biseparable preferences 235–8; characterizations comparative and absolute 222–6; comparative and absolute 221–2; comparative, and equality of cardinal risk attitude 230; comparative foundations 37–40; definition for class of preference models
209; definition of unambiguous acts 227; definitions 37, 219–22; equality of utilities 236; filtering cardinal risk attitude 219; with heterogeneous agents, static model 285; probabilistic risk aversion 232–4; proposition on ambiguity averse CEU preference relation 229; “pure” 40; set-up and preliminaries 215–19; theorem on absolute ambiguity aversion 223–4; theorems on biseparable preferences 223, 225; theorem proofs 238–41; theory 309, 354; unambiguous acts, using in comparative ranking 230 Anderlini, L. 303 Anderson, E. W. 286, 364, 382, 411 animal spirits 429 Anscombe, F. J. 17, 26–7, 109, 121, 126, 138–9, 144–6, 168, 194, 210, 214, 218, 223, 520; models of 126; see also Anscombe–Aumann Anscombe–Aumann model 31, 43, 109, 160, 195; framework 210, 214, 218, 223; horse-race/roulette-wheel acts 171; standard frameworks for modeling uncertainty 246; Theorem 113–14; or “two-stage” model 147, 244–5, 255, 258 ARMA model for consumption growth 414 Arrow, K. J. 26, 44, 210, 284, 420 Arrow securities 340 artificial intelligence, theory of 156 asset price: and aggregate consumption data for the United States 450; determination 438 asset pricing 370; models 365 asset pricing under Knightian uncertainty, intertemporal 429–68; aggregation in heterogenous agent economy 464–6;
asset pricing under Knightian – cont. belief function kernels 451; equilibrium asset pricing 441–54; integral equilibrium examples 448; lemma on continuity 462; lemma on existence of ε-optimal continuous policies 461; lemma from Michael 458; Lucas model, nondifferentiable 451; Maximum Theorem 462; proposition on approximation of utility 457; proposition on continuity of utility 457; proposition on existence of utility 456; remarks on empirical content 454–6; theorem on existence of utility proof 456–8; theorem on structure of set of equilibria proofs 458–64; theorems on existence and characterization of equilibria 445, 447; utility 431–4, 436 asset trading 343 asymmetric uncertainty between transacting parties 354 Atkinson–Kolm–Sen, traditional welfare function 297 attitudes toward uncertainty and risk, degree of separation between 179 Aubin, J. P. 468 auctions, analysis of behavior in 295 Aumann, R. J. 17, 26–7, 48, 65, 109, 121, 138–9, 144–6, 168, 194, 210, 214, 218, 223, 263, 495, 505–8, 520; models of 126; theorem of 15; see also Anscombe–Aumann aversion to risk and uncertainty 175–9 axiomatizations: for decision under uncertainty 24; and development of new models or concepts 20–1; of expected utility 21; general purpose of 20–2; and their structural assumptions 26 axioms: bisymmetry 28, 31; P4 25; three different classes 22 Baire category theorem 446 Baker, G. 305 Balcar, B. 267 Banach: lattice 75; space, dual 104; spaces 48, 75, 130, 263, 278, 474 Bansal, R. 403–4, 414 Baratta, P. 13 Barsky, R. B. 430–1 Bartle, R. G. 53 Bassanezi, R. C. 67 Bayes, T. 3 Bayes'/Bayesian: Beliefs Equilibrium 490–1, 501, 507, 519; decision problems
386–7, 414; expected utility maximizers 472; implementation theory 296; law of 3; learning generalized 158; rule 163; statistical techniques 108; tenets of 43; update rules 162–3; updating, violations of 3 Bayesian approach: to decision making under uncertainty 155; for one-shot decision problem 156 Bayesian model 430; of decision-making 429; extensions of the 431; prior in 466 Bayesianism: assumptions 17; first tenet of 4–5, 16 Becker, J. L. 146 behavioral consequences of uncertainty aversion 173 behavioral definition of ambiguity for events and acts 227 belief functions 14–15, 84; kernels 455; and their updating 157 Beliefs Equilibrium 501, 507, 515, 519; with Agreement 506 beliefs, sharing 472–81; lemma on convex and compact sets 478; mutual knowledge of 508; proof of theorem Pareto optimal allocations 476–8; standard two-period economy 474–5; theorem on Pareto optimal allocations 475, 478–8 Bellman equations 383–4, 387 Ben Porath, E. 103, 257, 297–8, 531, 533 Bernardo, J. M. 146 Bernheim, B. D. 295, 321 Bernheim and Pearce's notion of "rationalizability" 527 Bernoulli, D. 419 Bernstein polynomials 85, 94–6 Bertsekas, D. P. 461 Bessanezi, R. C. 103 bets: on event 242; on events, ranking of 210; willingness as complement-additive 227 Bewley, T. F. 15–16, 18, 29, 41, 127, 156, 287, 341, 420, 431, 476; model 158, 421; work on Knightian decision theory 472 Billingsley, P. 177, 271 Billot, A. 287, 472 Binmore, K. 529 biseparable preferences 43, 210, 212, 216, 222, 236; ambiguity averse 231; examples of 216–18; theorems 223–5 biseparability 235 Biswas, A. K. 81, 104
Blackwell, D. 26, 386; sufficient condition for contraction mapping 456 Bondareva, O. 50 bond/s: Chernoff 400; high yield corporate ("junk" bonds) 354; issued in "emerging markets" 354; market, US corporate 361 Border, K. C. 51–6, 58–9, 76, 79, 91–2, 101, 104 Borel: measurable functions 413; measures 275; probability measures 432; sets 143 Borel σ-algebra 143, 261, 268, 432, 436; of compact Hausdorff space 274 Boros, E. 94–5 bounded game 47, 50 bounded rationality 303 Brandenburger, A. 494–5, 505–8, 520 Bray, M. 413 Breeden, D. T. 389, 413 Brock, W. A. 388–9 Brownian motion 372, 375, 378, 383–4, 386, 390; with constant drift 397; risk prices 389; shocks 407–8 Burks, A. W. 24 business: cycle models, analysis of 406; firm 303 Camerer, C. F. 309, 430, 483, 527 Cameron, H. R. 397–8 Campbell, J. Y. 414 canonical utility index 216, 221, 223 capacity: additive 144; notion of 9; see also nonadditive probability cardinal risk attitude 212 cardinal symmetry 212, 220, 223 Carlier, G. 103 Carlton, D. W. 289, 305 Carroll, C. 361 Cartesian product state space 194 Casadesus-Masanell, R. 17, 26, 31, 37, 217, 245, 248, 258 Case-Based Decision Theory of Gilboa and Schmeidler 299 certainty: equivalence 110; independence 127 CEU see Choquet expected utility Chain Rule for eventwise differentiability 203 Chateauneuf, A. 30–1, 84, 96, 157, 214, 287, 337, 341, 472 Chen, Z. 285, 382 Chernoff, H. 393, 395–6, 415; bonds 400; entropy 396–8, 401, 403, 406, 409, 414;
entropy rate 400; large deviation bounds 394; measure 413 Chew, S. H. 14, 26, 28–9, 122, 172, 182, 190, 245, 249, 259, 467, 533–4 Cho, I.-K. 413 Choquet expectation 291, 323; of act 307; operator 308, 311, 332, 361 Choquet expected utility (CEU) 21, 26, 29, 136, 138, 140–1, 344, 482; with convex capacities, cognitive interpretation of 9–10; framework as model of ambiguity aversion 361; maximizers 337; model with convex capacities 473, 476; model of Gilboa and Schmeidler 218; model of Schmeidler 36, 63, 218, 284; one-stage and two-stage approaches, nonequivalence 146–8; orderings 217; properties of utility and capacities under 30; restrictiveness 245; uncertainty aversion, and randomizing devices 253–5 Choquet expected utility (CEU) model, formal details relating to 354; independent product for capacities 353–7; law of large numbers for capacities 357 Choquet expected utility (CEU) preferences 40–1, 43, 222, 228, 245, 290, 336, 361; with convex capacities in product state space model 256; with convex capacity restrictive class in Savage-like setting 255; with a convex capacity, restrictiveness 253; theorem 253 Choquet expected utility (CEU) theory 9, 16, 180, 210; with convex capacities, cognitive interpretation of 9–10; development of theory, related literature 13; maximizer 10; nonadditive probabilities 6–9; Schmeidler 171, 182; utility function 189–90 Choquet functionals 60, 62–7, 73, 76–8, 81, 93, 103–4; representation 69–73; representation lemma 71; representation theorem 69–70 Choquet, G. 8, 73, 84, 103, 117, 125, 153, 155, 174, 217, 235, 261, 273–4, 426, 540; decision rule 309; method of evaluation of act 323; Representation Theory 269 Choquet integral 8, 10, 58–69, 77–8, 90–3, 121, 126, 141, 146, 155, 210, 235, 245, 247–8, 264, 274–5, 277, 298, 534; basic properties 64; definition for general
Choquet integral – cont. functions 9; evaluation 293; of every real-valued function 9; ∫ f dν definition 61; general functions 60–4; Jensen inequality 68; lemma on comonotonic functions 66; Lipschitz continuity 64; monotonicity 64; for positive functions 58–60, 103; positive homogeneity 64; proposition on basic properties 64–5; proposition with comonotonic additivity 67; proposition with Jensen inequality 68; proposition with simple functions 63; proposition with unique invariant extension 60; propositions defining 58–9; representation 538; theorem with comonotonic functions 66; translation invariance 64; variants of 299 Choquet integration 14, 125, 181, 466, 537–8; definition of 206; formula 464; proposition on min-of-means functionals 537 Choquet subjective expected utility (CSEU) 275 Chow, C. K. 400 classification errors in models 396 coalition games, representation of 263; corollary on monotone set function 277; decomposition 265–7, 273; decomposition and representation of 261–78; definition of filter games 267; Dempster–Shafer–Shapley Representation Theorem 263; dual spaces 278; filter games, proposition 267–8; of finitely additive representation of games in Vb 275; integral representation 274–5; isometric isomorphism T 276; Jordan Decomposition Theorem for measures 265–7; locally convex topological vector space on V 264–5; representation of countably additive representation of games in Vb 267–74; theorem on isometric isomorphism 268–9 coalitions in cooperative game 332 Coase, R. H. 303 Cochrane, J. H. 402, 414, 430, 448, 454 cognitive ease 5 cognitive unease 6 coins, example of two 11, 36 collection of unambiguous events 177; see also λ-systems commutativity 161 comonotonic additivity 235 comonotonic independence 112, 128
comonotonicity 68 compact Hausdorff space 271, 274 complete information normal form games 290 complete-markets aggregation theorem 452 completeness 22 composition norm 88 concavity 28; of vNM index 176 conditional relative entropy 379 confidence sets and hypothesis testing, techniques of 157 Constantinides, G. M. 414, 452 consumer behavior under certainty, theory of 420 consumption: model based on 450; processes 436; and savings behavior 438 ε-contamination 448, 452; model of beliefs 455 contingent contracts 320 contingent deliveries: incompleteness of markets for 341; rule 311 contingent payoffs 336; comonotonic 316; noncomonotonic 316 contingent states, relevant details 308 contingent surplus 310 continuation values 365 continuity 29, 112, 127; absolute 378 continuous-time: diffusion specification 414; Markov formulations of model specification, robust decision making, pricing, and statistical model detection 370; Markov models 366, 412 contracts with low powered incentives 289 contractual arrangements, ambiguity aversion 283 contractual relationship, “efficient” 320 convex games (supermodular games) 47, 73–82; and their Choquet integrals 103; convex analysis tools in studying; corollary, conditions in 76; lemma on Choquet functionals 76; lemma on finite game 81; proposition, properties of 73; theorem on bounded game 80; theorem on condition equivalence 73; theorem on condition equivalence in bounded game 78 convex nonadditive probability 158, 312–13, 339; function 306, 308 convexity: as defined in cooperative game theory 9; identification with uncertainty aversion 194; of nonadditive measures 9 Cooperative Game Theory 235 core: of bounded game, weak compact 49;
of convex games 103; weak*-compact 54; corporate borrowing in US 361 countable additivity 51–3; games called measures 47 Cournot, A. A. 523 Cowles foundation 18 Cox, J. C. 416 Cragg, J. 431, 455 Crawford, V. 519 Csiszar, I. 379 cumulative dominance 137, 141–2, 145 cumulative prospect theory (CPT) 14 Dana, R. A. 103, 473 Deaton, A. 361 decision: analysis, early literature 27; in risk situation 110; models with ambiguity averse preferences 210; process, ex post Bayesian justification for 387; under risk 23–5; under uncertainty, general conditions for 23–5 decision maker: ambiguity averse 37, 39, 283; approximating model 378; concern about "model uncertainty" 286; Knightian uncertainty, response to 366; rational expectations model 364 decision maker's beliefs 210; convexity of capacity representing 214; represented by multiple probabilities 210; stochastically independent 487 decision theory: applications of capacities on topological domains 103; models developed by Schmeidler 286; single-person 494; under risk 297; under uncertainty 16, 148 de Finetti, B. 3, 7, 12, 26, 31, 109, 140, 210 Dekel, E. 29, 519 Delbaen, F. 54, 77, 81 Dellacherie, C. 66, 103, 426, 466 DeLong, J. B. 430–1 Dempster, A. P. 14, 84–5, 157, 426, 435 Dempster–Shafer theory 15; for belief functions 163; for probabilities 162; update rule 157–8, 167 Dempster–Shafer–Shapley Representation Theorem 262, 269; for finite algebras 274; for finite games 261 Deneffe, D. 236 Denneberg, D. 66, 91, 103 detection 377; error probability 401; error probability bounds 402 deterministic outcomes 127 De Waegenaere, A. 77, 298
Diamond, P. A. 532 differentiability and uncertainty aversion, examples 189–91 Dini Theorem 53 Dirac: charge 97; measures 274–5 discount-factor model 453 diversification opportunities 289 Dow, J. P. 158, 259, 284–5, 291–2, 294, 309–10, 337–8, 340, 351, 416, 419, 421, 445, 473, 483, 493, 512–14, 516, 520, 529; notion of equilibrium 296 Dow and Werlang model, equilibrium beliefs 293 dual games with "dual" properties 48 dual spaces, proposition 278 Dubois, D. 299 Dubra, J. 29 Duffie, D. 381, 414, 437, 452 Dugundji, J. 464 Dunford, N. 47, 49, 53–5, 58, 75, 79, 104, 124, 128–9, 131–2, 272, 276, 474 dynamic consistency of choices 284 Dynkin, E. B. 413 East Asia 354 Eastern Europe 354 Eberlein–Smulian Theorem 54 economic effects of ambiguity aversion 288 economic significance of nondifferentiability 467 economy: without idiosyncracy 344; limit replica 337; markets, complete and incomplete 337–40; models 286–7; see also n-financial asset economy Edgeworth box 285, 341, 453 Edwards, W. 13, 22 Eichberger, J. 195, 244, 247, 255, 258, 296, 310 Einy, E. 81 Ellsberg, D. 4, 17, 36, 108, 110, 135, 137, 155, 176, 206, 210, 309, 420, 424–5, 430, 534; options 137; experiments 11, 195, 209, 212–14, 228–30, 257, 309, 483; mind experiment, challenging expected utility hypotheses 125; see also Ellsberg Paradox Ellsberg Paradox 4–6, 14, 113, 145, 430, 484, 495, 526, 538; example of urn with 90 balls 137, 157; one-stage formulation 147; "two-color" problem 244; two-stage formulation 147; two-urn experiment 5; "unknown urn" example 242; urn experiments 174, 178, 185, 194, 207
Ellsberg-type behavior 466 entropy 381; penalties 378; penalty problem 380; relative, models with 367 Epstein, L. G. 29, 31, 37, 39, 40–1, 103, 173, 210–11, 214–15, 229, 234, 258, 284–5, 300, 309, 341, 351–2, 361, 366, 381–2, 414, 429, 437–9, 458, 466–7, 520; definition of ambiguity aversion 39 equilibrium: in ε-ambiguous beliefs 520; arbitrage price theory (APT) 352; model with multiple-prior agents 284; as price process 442; prices, characterization by an “Euler inequality” 429; pricing of securities with payoff dates in the future 365; profile 291; risk-sharing 349; in strategy profiles 522–4 equilibrium asset pricing 441–54; belief function kernels 451; economy 441; Euler inequalities 442–4; examples 448–54; heterogenous agents 452–3; nondifferentiable Lucas model 451–2; theorem on existence and characterization of 445–6, 447–8; theorem on structure set of 446–7 equilibrium in beliefs under uncertainty 483–518; Bayesian players concepts for 488–90; belief equilibrium containing Bayesian beliefs equilibrium 499; Beliefs Equilibrium (with Agreement) 496; correlated rationalizable strategies 497; general preferences 514–15; knowledge of rationality 496; marginal beliefs disagreement 490; multiple priors model 485–7; Nash Equilibrium, “beliefs” interpretation of 489; normal form games 487–8; proposition on Bayesian Beliefs Equilibrium 507; proposition on Bayesian Beliefs Equilibrium proof 516; proposition on Beliefs Equilibrium with Agreement 510; proposition on Beliefs Equilibrium with Agreement proof 517; proposition on Nash Equilibrium Under Uncertainty 513; proposition on Nash Equilibrium Under Uncertainty proof 518; proposition on Proper Beliefs Equilibrium 507; proposition on Weak Beliefs Equilibrium 511; proposition Weak Beliefs Equilibrium proof 518; rationalizable beliefs 497; single-person decision making 498; stochastically dependent beliefs 490; uncertainty averse players, concepts for 491–7;
uncertainty aversion, importance of 497–504 equilibrium in beliefs under uncertainty, decision theoretic foundation for Bayesian solution concepts 505–9; proposition on Beliefs Equilibrium 507; proposition on Beliefs Equilibrium with Agreement 507–8, 510; Proposition on Weak Beliefs Equilibrium 511 equilibrium in beliefs under uncertainty, definitions 489; Bayesian Beliefs Equilibrium 489; Beliefs Equilibrium 491–2; Nash Equilibrium 489, 513; Strict Beliefs Equilibrium 492; Weak Beliefs Equilibrium 493 equilibrium in beliefs under uncertainty, related literature 509–14; definition of Nash Equilibrium 489, 513; Dow and Werlang 511–12; epistemic conditions for equilibrium 512–14; proposition on Nash Equilibrium Under Uncertainty 513; proposition on Nash Equilibrium Under Uncertainty proof 518 equilibrium in beliefs under uncertainty, with uncertainty aversion 510; beneficial when players agree 501; importance of 497–504; need for 504–5; and rationalizable beliefs 510 equilibrium concepts for uncertainty averse players 491–7; definition of Beliefs Equilibrium 491–2; definition of Strict Beliefs Equilibrium 492–3; definition of Weak Beliefs Equilibrium 493; knowledge of beliefs 495; knowledge of rationality 496; mixed strategies as objective randomization vs subjective beliefs 493–5; nonunique best responses 504; nonunique equilibria 504; relationship with maximin strategy and rationalizability 496 equity premium puzzle 287, 368, 414, 468 equivalence relation 23 Ethier, S. N. 382, 412 EU see expected utility Euclidean space 87 Euler: equalities 285; equations 442, 444, 450, 454; inequalities 285, 443–4, 453, 458 Evans, G. W. 412 events 139; commutativity 28; evidence as non-negative number attached to 14; fineness of the unambiguous 142; types 136; unambiguous 40 eventwise differentiability 39
Index 547 eventwise differentiability of utility 172; definition 188; technical aspects of 189, 202–5 expectation 122 expected gains 312, 422 expected Gini index 534 expected utility (EU) 22, 172; maximization relative to probabilistic beliefs 3; violations of 4 expected utility model 123; of concave Bergson-Samuelson social welfare function 123 Expected Utility Theory (EUT) 419; by Anscombe and Aumann 9; classical 38; subjective, violations of independence axiom/sure thing principle 256 expected utility under nonadditive probability measure: maximizing 420; model of 419; value computed 422, 426–7 expected value: of contracted payoffs 289; function 175 extended generator 374 factor risk prices 368–9 Fagin, R. 156, 158 fair prizes 26 Fan, K. 51, 521; Theorem 443, 458 Federal Reserve System in Minnesota 528 Feller: process 374, 378; property, strict 444–5, 448; semigroups 370–2, 382 Felli, L. 303 Feynman, R. P. 108 filter games 267; corollary 273; definition 267; proposition 267 financial markets: ambiguity aversion 283; outcomes 284–7 financial uncertainty, idiosyncratic 342 Fine, T.-L. 47, 353, 357, 361, 430 finite algebras 262 finite convex games 100–3; properties of 100; theorem on marginal worth charges 100; theorem on vertex of core 101 finite games 77, 81–103; additive representation 90–3; decomposition 89–90; lemma on decomposition 89; lemma on lattice preserving isomorphism 87; lemma on Owen correspondence 97; lemma on polynomial counterpart of total monotonicity 98; lemma on Riesz space with lattice operations 86; polynomial representation 94–9; proposition on core of 101; space of 82–9; theorem on
decomposition 90; theorem on lattice preserving and isometric isomorphism 88, 91, 99; theorem on monomials 95; theorem on totally monotone games 85; theorem on unanimity games 83 finite time series data record 365 firms: profits of 343; as risk neutral 289, 321 Fishburn, P. C. 22, 29, 31, 40, 109, 113, 121, 128, 138, 140, 145–6, 151, 155, 160, 215, 332, 361 Fisher Body and GM, merger between 320 Fleming, W. H. 383 framing effects, documentation of 3 Franek, F. 267 Frankel, J. A. 431, 455 Fréchet differentiability 172 Friedman, M. 124 Froot, K. 431, 455 Fubini Theorem 59, 247 Fudenberg, D. 295, 412, 497, 519, 529 full-insurance allocation 477 functional analytic tools in study of convex games 77 functions, comonotonic 66–7 Fundamental Theorem of Calculus 204–5 Gajdos, T. 298 game theory: ambiguity aversion 283; interpreting supermodularity in terms of marginal values 73; resolution of dynamic inconsistency 439 game tree 524; model 522 games: balanced 50; dynamic, building blocks in analysis of 522–3; extendable 81; monotone 262; non-atomic 57; representation of locally convex topological vector space 264–5; in strategic form 522; totally balanced 50; two-player zero-sum Markov multiplier 383 Gärdenfors, P. 146 Gardner, R. J. 272 Gâteaux 439, 444; derivatives 439, 443; differentiability of functions 172, 189, 202, 440, 467; nondifferentiable 442 Gaussian control problem 366 Genest, C. 127, 156 Ghirardato and Katz, MEU framework applied to analysis of voting behavior 296 Ghirardato, P. 17, 26, 28, 31, 36–8, 41–2, 103, 209–10, 215–17, 221, 227–8, 231, 234, 244, 247, 255–8, 290, 309, 353, 361
Gilboa, I. 12, 15, 17, 22, 26, 30, 43, 85, 88–9, 91, 121–2, 125–6, 136, 143, 146, 148, 150–1, 155, 160–1, 172, 181, 195, 210, 243, 256, 261–4, 266, 274–5, 291, 299, 340, 353, 416, 419–20, 423, 426, 431, 438–9, 466, 472–3, 476, 481, 485, 487, 514, 519, 531, 533, 535; adaptation of Savage’s P7 to case of CEU 142 Gilboa and Schmeidler 440; axiomatization based on behavioral data 12; multiple-prior model 284; representation 262; Theorem E of 277; unification of 138; utility 468 Gilboa–Schmeidler model 494, 519; closed and convex set of probability measures 483–4 Gini index 532, 537; of expectation 298; of expected income 297, 534, 538; of inequality 533; on subspaces of income profiles 533; welfare function 532 Giovannoni, F. 40, 43, 214; theory of ambiguity aversion 43 Girshick, M. A. 26, 386 Goes, E. 529 government/defense procurement contracts 321 Grabisch, M. 84, 94, 299 Greco, G. H. 67, 103–4 Green, E. 528 Greenberg, J. 294, 519, 522, 528–9 Grodal, B. 27 Grossman, S. J. 303 Grossman–Hart separation of CEU/MEU preferences 290 Guesnerie, R. 467 Gul, F. 26, 28, 246, 248–9 Hahn–Banach Theorem 75, 80–2 Halmos, P. R. 208 Halpern, J. Y. 156, 158 Hammer, P. L. 94–5 Hammond, P. J. 24 Hansen, L. P. 210, 286, 299–300, 364–5, 371–2, 375, 381, 387–8, 392–3, 402–3, 412–14, 448, 450 Hansen, Sargent, and Tallarini 365–7, 370, 392, 406–7, 409, 412–14; discrete-time model 406; equilibrium permanent income model 402; robust permanent income model 406; see also Hansen; Sargent; Tallarini Hansen, Sargent, Turmuhambetova, and Williams 366, 378, 383, 386, 392; see
also Hansen; Sargent; Turmuhambetova; Williams Hansen, Sargent, and Wang 365–7, 370, 401, 412–14; see also Hansen; Sargent; Wang Harsanyi, J. C. 532; Bayesian Equilibrium for games of incomplete information with Bayesian players 519; utilitarian solution 532 Hart, O. D. 303 Hausdorff: compact space 271, 274; topological vector space 264 Hayashi, F. 361 Hazen, G. B. 29 Heaton, J. C. 414 Hellman, M. E. 395, 400 Hendon, E. 256–7, 353, 529 Henry, H. 286 Holmes, R. B. 278 home-bias puzzle 285 Honkapohja, S. 412 horse lotteries 109, 111, 126–7, 472; acts 146, 182, 194; complicated probabilities 121 Huber, P. J. 103, 126 Hurwicz, L. 44, 125, 515; α-criterion 12, 31 hypercube [0, 1]^n 94, 97 hypothesis testing 158 Ichiishi, T. 100 idiosyncratic risk 352–3 idiosyncratic shocks 361 incomplete contracts 210, 314; ambiguous beliefs, investment holdup and 310–20; assumptions 310; literature 320; null 316 incomplete information games 290 incomplete market economy 344; with sub-optimal risk-sharing 349 incompleteness of contractual form: ambiguity aversion and 303–33; condition on informativeness 312, 315; corollaries on first best action 314; corollary on tuple of informed and ambiguous beliefs 320; informed and ambiguous beliefs 320; lemma on informed convex nonadditive probability 313; lemma on informed convex nonadditive probability proof 323; lemma on nonadditive probability 318; lemma on nonadditive probability proof 330; model of decision-making by ambiguity-averse agents 306–10;
propositions on first best action profile 313, 315, 319; proposition on first best action profile proofs 326, 330 incompleteness of financial markets: ambiguity aversion and 336–62; CEU model, formal details relating to 354–7; Choquet expected utility and related literature 339–41; lemma on equilibrium of n-financial assets economy with idiosyncrasy 344; model and main result 342–53; theorems on n-financial assets economy with idiosyncrasy 345–53; theorems on n-financial assets economy with idiosyncrasy, proofs of 357–60 independence 112; axiom (sure-thing principle) 126, 420; concept in case of non-unique prior 126; monotonicity 112; for nonadditive beliefs, notion of 353; nondegeneracy 112; strict monotonicity 112 indexed debt 286 inequality (or inequity): aversion 123; and uncertainty 532–4 inequality measurement 297, 533; evaluation functionals 533; theory of 297; under uncertainty 531 inertia property 300 infinite convex games 103 infinitesimal generator 370 information criteria (IC) 364 integral for capacities, notion of 103 integration: for non-negative functions, definition of 117; of real-valued functions 8 intertemporal asset pricing, model of 285; see also asset pricing intertemporal choice and multi-attribute choice 298 intertemporal utility 431–41; belief function kernels 434; environment and beliefs 432; models of 438; probability kernel correspondences, examples of 434–5; supergradients, lemma 439; theorem on existence of 437–8 interval beliefs 191 investment holdup 320 irrational expectations 431; alternative models of 454 iteration and averaging 535–7 Ito, T. 470 Jaffray, J.-Y. 12, 44, 84, 96, 157–8, 435, 439 Jagannathan, R. 402–3, 448, 450
James, M. R. 381, 383 Jordan Decomposition Theorem: for charges 89; for measures 265, 273 Jorgenson, D. W. 412 Joskow, P. L. 320 jump distortions 370 Kadane, J. 434, 465 Kahneman, D. 3, 13, 14, 22, 26, 29; and Tversky, Prospect Theory of 3; see also Prospect Theory Kalai, E. 295, 529 Kannai, Y. 51, 224 Karni, E. 25–6, 28, 176, 245, 249, 539 Kakutani’s fixed-point theorem 326 Katz, J. N. 242, 296 Kelley, J. L. 81, 477–8 Kelsey, D. 40, 195, 214, 244, 247, 255, 258, 296, 310, 332, 352–3 Keynes, J. M. 136–7, 429, 445, 451; description of consequences of uncertainty 430 Kikuta, K. 81, 104 Klein, B. 303, 320 Klein, E. 447 Klein, P. 289 Klibanoff, P. 12, 257, 291, 294, 483, 493, 509–12, 514, 516, 529; definition of equilibrium in normal form games 292; equilibrium in 292, 513, 520; model, equilibrium belief 293 Knight, F. H. 4, 36, 137, 417, 420, 429–30, 466, 473, 526 Knightian uncertainty 36, 176, 332, 429, 432, 452–3, 473, 483, 523, 526–7; ambiguity 293–4; aversion 242, 454; model of behavior under 341 Kogan, L. 286 Koppel, R. 429 Kopylov, I. 43 Krantz, D. H. 25, 220; conjoint measurement theory of 27 Krein–Milman Theorem 79, 274 Kreps, D. M. 208, 413, 467, 529 Kreps, S. 195 Kuhn, H. W. 528 Kunita, H. 375, 378, 382 Kurtz, T. G. 382, 412 Kurz, M. 412 Lambros, L. A. 456 Landers, D. 208 Laplace’s principle of insufficient reason 4–5
Latin America 354 La Valle, I. H. 29 Laws: of iterated expectations 365; of Iterated Values 387 law of large numbers 353; for ambiguous beliefs 361; of capacities 357 Lebesgue: integration 438; measure 143 LeBreton, M. 31, 439 Legros, P. 313 Lehmann, B. N. 455 Lehrer, E. 85, 295, 529 Lei, C. I. 386 Leonardo, F. 333 LeRoy, S. F. 454, 466 Levine, D. K. 295, 412, 529 Lindley, D. V. 127, 156 linear inequalities in normed spaces, systems of 51 Lions, P. L. 381 Lipman, B. L. 303 Lipschitz continuity 103 Lo, K. C. 259, 281, 295, 310, 483, 529; definition of equilibrium 292; model, equilibrium beliefs 293 local probabilistic beliefs 172 local risk-neutrality result 421 Loomes, G. 29 loss of surplus 320 lotteries 17; evaluating 13; ticket 494; two-stage 514 Luca, R. 301 Lucas, D. J. 414 Lucas, R. 352 Lucas, R. E. Jr. 388–9, 412, 429–30, 458; model 448, 452, 454; pure exchange economy 441; rational expectations model 448 Luce, R. D. 24, 146 Lyapunov equation 377–8, 380 Maccheroni, F. 53, 104 Mace, B. 361 Machina, M. J. 17, 31, 41, 144, 172, 177–9, 189, 204, 206–7, 214, 231, 483, 519; probabilistic sophistication model 39 MacLeod, W. B. 333–4 Maenhout, P. 385 Magill, M. J. P. 359 Malcomson, J. M. 303, 333 Malkiel, B. 431, 455 Mao, M. H. 172 marginal worth 101; charges 100, 102 Marinacci, M. 31, 38, 46, 53, 59, 77, 89,
91, 102–3, 206, 209–10, 216–17, 221, 227–8, 231, 234, 255, 258, 261, 289, 291, 294, 332, 353, 361; definition of equilibrium in two-player normal form games 292; model 293; theorem 357 market prices of risk 365–6, 368–9, 414 Markov: evolution indexed by α 414; models 365; perfect equilibrium of two-player, zero-sum game 390; probability kernel 432–3, 454; structure 378, 467 Markov processes 365, 373; conditional expectations operators 365; diffusion, generator of 372; as Feller semigroups 398; jump, generator for 372; mathematical theory of continuous-time 409; models, discriminating between 398 martingale construction 374 Martin, W. T. 397–8 Masten, S. E. 321 mathematical treatment of nonadditive probabilities 426 mathematics of ambiguity 46–104; Choquet integrals 58–69; convex games 73–82; finite games 82–103; representation 69–73; set functions 46–58 MATLAB control toolbox 404, 406 Matsushima, H. 313 Maurin, E. 298 maximizing utility with nonadditive prior 421 maximum-likelihood-update rule 156–7, 163; generalized version of 168; not commutative 167 Maximum Theorem 443–4, 447, 462, 464, 466 maxmin expected utility (MEU) (MMEU) 157; preferences 37, 40, 224, 228; theory 210 Maxmin Expected Utility (MMEU) model or “multiple prior” model 16; in Anscombe–Aumann framework, axioms of 17; development of theory, related literature 13; preferences 222, 290; of Schmeidler, and Gilboa and Schmeidler 218; two main advantages over CEU 12 Maxmin expected utility (MMEU) model with non-unique prior 125–34, 245; extension of 132–4; lemmata 128–30; with multiple priors 245; preferences with convex capacities in a product state space model 256; proof of 128–31;
propositions 132–3; and randomizing devices theorem 250–3; theorem 127–8 Meeden, G. 127 Mehra, R. 389, 468 MEU see maxmin expected utility Meyer, P.-A. 103, 466 Miao, J. 285 Michael, E. 470 Milgrom, P. 158, 481 Milne, F. 352 minimal income index 537 minimax criteria: loss 126; regret 126 minmax expected utility or multiple priors model of Gilboa and Schmeidler 366; see also maxmin expected utility min-of-means functionals 536; theorem 534–6 Möbius: inverse of capacity 299; theory of 84; transform 98 model misspecification and robust control 365, 377–83; for adding controls to the original state equation 383; alternative entropy constraint 382–3; conditional relative entropy 379–80; enlarging class of perturbations 383; entropy penalties 378–9; Lyapunov equation under Markov approximating model and fixed decision rule 377–8; risk-sensitivity as an alternative interpretation 381; theorem on entropy penalty 380–2; worst-case model, θ-constrained 382 model specification, semigroups for: detection errors 365–6, 369–70; discounting 373; entropy solution 411–12; extending domain to bounded functions 373; extending generator to unbounded function 374; and their generators 370–1; iterated laws and 365; market prices of risk 365–6; mathematics 370–4; related literature 366; robustness versus learning 367; theorem on entropy penalty 380; worst-case models 375, 390; see also semigroups for model specification model specification, semigroups for entropy and the market price uncertainty 401–9; permanent income economy 406, 409; prices of model uncertainty and detection-error probabilities 403, 409; risk-sensitivity and calibration of θ 406 model specification, semigroups for portfolio allocation 383–7; diffusion 384; ex post Bayesian interpretation
386; jumps 386–7; related formulations 385–6 model specification semigroups for pricing risky claims 387–93; ex post Bayesian equilibrium interpretation of robustness 392; marginal rate of substitution pricing 388; model uncertainty prices, diffusion and jump components 392; pricing under approximating model 391; pricing without concern for robustness 389–91; subtleties about decentralization 392; theorem on robust resource allocation 391 model specification semigroups for statistical discrimination 393–401; constant drift 397; continuous time 398; detection and plausibility 401; detection statistics and robustness 400; discrimination Markov models in discrete time 394; formulation of bounds on error probabilities 394; measurement and prior probabilities 393; rates for measuring discrepancies between models locally 396; theorem on generator of positive contraction semigroup 400 model uncertainty premia: for an alternative economy 404; in market prices of risk 365 monotonicity 22, 24, 112, 127, 235, 456; in game 47, 273; strict 112 Montesano, A. 43, 214 Montrucchio, L. 46, 77, 102–3 Moore, J. H. 333 moral hazard 289; models of double-sided 322 Morgenstern, O. 124, 169, 419 Morris, S. 354 Moscarini, G. 400, 414 Moulin, H. 47, 49, 73 Mukerji, S. 210, 285–6, 289, 293, 300, 303, 309–10, 336, 350, 354, 519–20 multi-attribute choice 298 multiple price supports 352 multiple priors (MP) 21, 26, 155–7; theory applied to portfolio selection problems 158; with unanimity ranking 15–16 multiple-priors model 11–12, 30, 155–7, 171, 286, 291, 298, 431, 473, 476, 509, 511, 515; axiomatized by Gilboa and Schmeidler 483; preference order 181; see also Maxmin Expected Utility (MMEU) Myerson, R. B. 83, 275, 539
Nakamura, Y. 26, 138, 245–6, 248–9, 251; multi-symmetry 28 Namioka, I. 477–8 Nandeibam, S. 40, 214, 332 Narens, L. 146 Nash equilibrium 290–1, 294, 483–4, 491, 500–1, 503, 506–7, 519–20, 522; Cautious 520; deficiency in dynamic games 522; generalization of 516; proof of uniqueness 527; strict 519; Under Uncertainty 512; unique 525–6; unique recommendation 525 Nash, J. 488, 528; path 527 Nehring, K. 40–3, 215, 257, 302 Neo-Bayesian decision theory: Anscombe–Aumann Theorem 113; comonotonic independence 112; continuity 112; implication of von Neumann–Morgenstern Theorem; Newman, C. M. 400 n-financial asset economy with idiosyncrasy 342–4; lemma on equilibrium of 344; maximization problem in 358; model and main result 342–53; theorem proofs 357–60; theorems 345–53 Nguyen, H. T. 106 Nikodym Uniform Boundedness Theorem 50 no-arbitrage, principle of 388 noise traders 431 nonadditive (objective) probabilities 108, 115; in physics 108; set of 122 nonadditive beliefs 294, 352; law of large numbers work 348 nonadditive expected utility, simple axiomatization of 136–53; corollary on additivity of capacity 145; definition of Choquet expected utility 140; lemma 141; main result 141; nonequivalence of one- and two-stage approaches 146–8; proposition 145; revealed unambiguous events 144–6; theorem 142–3; theorem proof 148–52; theory 126 nonadditive expected utility, simple axiomatization, definitions 139–41; Choquet expected utility (CEU) 140–1; consequences 139; cumulative consequence sets 140; events 139; states of nature 139; step acts 139 nonadditive measures 157; nonempty cores, convex 156; for representation of preferences under uncertainty 537; uncertainty aversion 156
nonadditive probabilities 6–9, 155, 244; or capacities 210, 466; distribution 420; for events 138; measures 421; monotone set-functions 155; update rule 167; updating 158; see also Choquet Expected Utilities theory nonadditivity 14–15; in probability for ambiguous events 138 non-Bayesian uncertainty 287 noncooperative game theory 290 nondegeneracy 112, 128 nonexpected utility models 29–31; abandoning basic axioms 29; preferences 352; probabilistic sophistication 31; properties of utility and capacities under CEU and PT 30–1; prospect theory 29–30 nonlinear decision weights 32 nonlinear probabilities, alternative models of 22 non-product weights for randomized act 248 non-trade theorem 421 nontriviality 142 nonuniqueness or indeterminacy of equilibrium prices 445 normal form games 522 no-trade: based on “lemons” problem, theory of 354; equilibrium price 347; price interval 285; result 352 null contract 311 null sets, basic properties of 55 objective probabilities 174; distribution 138 “odds” concept 125 off equilibrium choices 527 “one-stage” or Savage model 244, 247 ε-optimal continuous policies, existence of 461 optimal contractual arrangements 287–90; incentive contracts 288; risk sharing 287; role of ambiguity aversion in design of 322 Osborne, J. M. 529 Osborne, M. 521 outcome space 23 Owen, G. 47, 49, 95; correspondence 96; multilinear extension 95; polynomial 96–7 ownership rights 303 Ozdenoren, E. 295 Papamarcou, A. 430
Pareto optimal allocations 473; optimality 472; theorem 475, 478; theorem proof 476–8 path games, analysis of 523 Paxson, C. 361 peace-negotiation scenario example 524–6 Pearce, D. 302, 521 permanent income model, discrete-time, linear-quadratic 366 Pfanzagl, J. 27; bisymmetry 28 Phelps, R. 269 Philippe, F. 91, 103 physical probability measure 413 players: Bayesian 291; preferences, multiple priors framework to model 292; strategic choice 291 Poisson intensity parameter, state-dependent 372 Pólya index 298 polyhedra 101 polytopes 101 Porteus, E. L. 467 portfolio: choice 284; income 343 positive games 47, 77; called probabilities 47 Poterba, J. M. 450 Pratt, J. W. 210, 410 preference axiomatizations for decision under uncertainty 20–32; axioms 21; general conditions for decision under uncertainty 23–5; general purpose of 20–2; nonexpected utility models 29–31; relation of decision 23; subjective expected utility (SEU) 24, 26; subjective expected utility (SEU) conditions to characterize 25–8 preferences: ambiguity averse 38, 42; intervals 140; mixture or randomization as facet of uncertainty aversion 244; randomization 257; relations 117, 215, 220, 222 preferences, definition 249; expected utility (EU) 249; solvability 248; stochastically independent 249; stochastically independent randomizing device (SIRD) 249 Prescott, E. C. 389, 468 Preston, M. G. 13 price: dynamics 158; indeterminacy 451; risk, two types of 287 pricing semigroups see semigroups principal–agent problem under moral hazard 290
priors, set of 155, 157; see also multiple priors prisoners’ dilemma 501 probabilistic risk aversion 39–40, 213, 232–4 probabilistic sophistication: behavior 244; within CEU 190 probabilistically sophisticated (PS) preferences 231, 234; as benchmarks 231 probability 122; measure 190; on λ-system 184; prior 108; update in Gilboa, definition of 162 probability kernel 433, 454; correspondences 432; intertemporal utility examples of 434 procedural justice 539 proper randomizing device (SIRD) 255; see also stochastically independent randomizing device prospect theory (PT) 13, 17, 26; properties of utility and capacities under 30; reference outcome 29 pseudo-Bayesian rules, family of 156 pseudo-Boolean functions 94; grounded 94–6 Puppe, C. 299 pure endowment feature 388 pure exchange economy 352 quadratic capacity 190 quasiconcavity of preference 515 Quiggin, J. 14, 22, 28, 121, 533–4; model of 122 Quinzii, M. 359 Radner, R. 313 Radon–Nikodym Theorem 55; derivative 448, 468 Raiffa, H. 26–7, 244, 257, 495; preference for randomization 253; preferences 247 Ramsey, F. P. 3, 12, 27, 109, 136, 140 random outcomes or (roulette) lotteries 111, 127 randomization: device, modeling 246–8; stochastically independent and preferences 248–56 rank-dependence 22 rank dependent utility 17, 29, 138 rank-dependent expected utility (RDEU) 64, 182; class of functions 190; model 13–14 “rank-dependent probabilities” approach to uncertainty 533
Rao, K. P. 47, 50, 57, 59, 67, 199, 201, 206 Rao, M. B. 47, 50, 57, 59, 67, 199, 201, 206 rational expectations: hypothesis 429; model of beliefs 454; models 367; revolution in macroeconomics 286; risk price 389 “rationalization” of non-Nash strategy profiles 293 Raviv, J. 395, 400 Rawls, J. 424, 533, 539; egalitarian solution 533 real-life decisions 6 real-valued function 7 real-world contracts, incomplete 303 recursive utility 381; specification 403 reference point, notion of 17 “relation-based” approach to modeling ambiguity 42; limitations 43 research and development procurement by US Defense Department 289 reservation prices 421 Revenue Equivalence Theorem 295 Revuz, A. 89, 91, 261, 275; finitely additive representation of 274 Revuz, D. 413 Richard, S. F. 448 Riemann integrals 59, 61, 92, 235, 264 Riesz: Representation Theorem 441; space 87; space with lattice operations 86 Rigotti, S. 287 risk 36; based and ambiguity-based behavioral traits 39; equivalence 110; interest rate free of 376; neutral agent 421; neutral firms, vertically related 304; neutral probabilities 388; neutral probability measure 368; premium 110; sensitivity 381; and uncertainty, distinction between 137, 420 risk aversion 110, 172, 175–6; concepts of 171; neutrality, definition of 176; preferences 178; in SEU model 211; theory of measurement of 210; and uncertainty, distinction between 175 risklessness 176; in state-dependent expected utility model 176 risk-sharing 285; opportunities 338; possibilities on an incomplete market economy 339; problem 287; proceeds 341 robust control: problem 368; theory 366 robustness 381; asset pricing under preference for 369; versus learning 367
Rosenmüller, J. 172, 189, 203–4 Rota, G. C. 84 roulette wheel lotteries (objective) 109, 121, 126–7, 494; acts 146, 182, 194; richness notion valid for 180 Routledge, B. 286 Royden, H. L. 446 Rubinstein, A. 506, 529 Ruckle, W. H. 104, 278 Rudin, W. 59, 104 Runolfsson, T. 381 Ryan, M. J. 291 Sahlin, N.-E. 146 Salinetti, G. 85 Samet, D. 476 Sargent, T. J. 286, 299–300, 364, 366, 378, 381, 383, 386, 392–3, 412–14 Sarin, R. K. 26–7, 29, 136, 146, 148, 181, 195, 244–5, 247, 257 Savage, L. J. 3, 7, 12, 24, 26, 37, 109, 122, 126, 140, 146, 151, 155, 160, 173, 183, 195, 209–10, 213, 215, 217, 234, 258, 419, 429, 486; P2 109; Theorem 121 Savage, L. J.: act 194; acts, EU axiomatizations on finite state space 249; axiomatization of expected utility 181; axioms 4–5, 17, 432, 526; decision theory 244; definition of probability 137; EU 146; framework by Gilboa 245; model 41, 421; model with richness of state space 25; paradigm, objections to 125; preference conditions of 21; rationality test 5; single prior 431, 483; standard frameworks for modeling uncertainty 246; subjective expected utility model axiomatized by 483; subjective expected utility (SEU) theory 136, 146, 148; Savage, postulates of: cumulative dominance 142; fineness of unambiguous events 142; nontriviality 142; P4 (cumulative reduction) 141, 143, 145; sure-thing principle (P2) 5, 24, 40, 136, 141, 144, 431; on unambiguous acts 141; weak ordering 141 Schaefer, H. H. 279 Scheinkman, J. A. 365, 371–2, 375, 387–8, 412–13 Schervish, M. J. 127, 156 Schmeidler, D. J. 4–6, 9, 14, 17, 20, 22–3, 26, 29–32, 37, 41, 43, 46, 49, 53–4, 56, 69, 77, 81, 88–9, 91, 108, 116, 124–6,
128, 136–8, 143–4, 146, 148, 155, 160–1, 165, 171, 173, 177–8, 180, 194, 210, 214, 231, 253, 256, 258, 261–3, 266, 274–5, 291, 296, 299, 306, 336, 339–40, 353, 361, 416, 419–20, 423, 426, 431–2, 438–9, 452, 464, 466, 473, 476, 481, 483, 485, 487, 514, 519, 531, 533, 535, 537, 539; approach to ambiguity 73; CEU model 214; characterization of exact games 53; coin example 6; critique of Bayesianism 5; decision theory 46; definition of uncertainty aversion 172, 174, 195; financial market outcomes 284–7; formulation of CEU 195; inequality measurement 297–8; interest in uncertainty 12; intertemporal choice and multi-attribute choice 298–9; lottery-acts formulation 139; model 146; model of decision making under uncertainty, economic applications of 283–300; models, tractability of 299; optimal contractual arrangements 287–90; probabilistic sophistication model 39; strategic interaction 290–6; theorem 70; two-stage approach 148 Schmeidler–Gilboa model 421 Schmidt, U. 30 Schneider, M. 284, 414 Schroder, M. 381 Schultz, M. H. 107 Schwartz, J. T. 47, 49, 53–5, 58, 75, 79, 104, 124, 128–9, 131–2, 272, 276, 474 Segal, U. 14, 122, 146, 515 semigroups for model specification, robustness, prices of risk, and model detection, four 364–415; approximating model 377, 402; and its associated generator 365; as contraction 413; for detection error probability bounds 401; formulation of Markov processes 412; iterated laws and 365; marginal utility process in approximating model 407; model space, generator of positive contraction 400; parameterizations of generators of 377; pricing 372, 377; for pricing under robustness 401; rational expectations and model misspecification 364–5; see also model specification and semigroups Serlang, S. R. C. 169 set: of consequences 159; of priors 126 set functions 46–58; basic properties 46–9; core 49–58; lemma on basic properties
of null sets 55; lemma on continuous exact games 56; lemma on Σ as a σ-algebra 53; proposition on balanced games 52; proposition on core compactness property 49; proposition on finite variations norm 48; proposition on “non-atomic” cores 57; theorem on balanced game 50; theorem on positive games 54 SEU see subjective expected utility; see also Savage Shafer, G. 15, 84–5, 96, 157, 208, 261, 267, 274, 426 Shalev, J. 298 Shannon, C. 287 Shapiro, L. 25 Shapley, L. S. 19, 48, 50, 65, 81–2, 101, 104, 159, 261, 263, 332 Sharkey, W. W. 104 Shelanski, H. 289 Shiller, R. J. 430–1; model of “fads” 454 Shin, H. S. 293 Shitovitz, B. 81 shocks 366; distributions 386 Shreve, S. E. 461 Simon, H. A. 303 Simonsen, M. H. 158, 428, 445, 468 Sims, C. A. 412 Singell Jr., L. D. 466 Sion, M. 458 Sipos, J. 103 SIRD see stochastically independent randomizing device Skiadas, C. 381, 467 Slutsky matrix 420 Smets, Ph. 157 Smith, A. B. C. 125 Smith, L. 401, 414 social choice 14; problems, reduction to decision under uncertainty 539 Soner, H. M. 383 sources of uncertainty 342 space of events and outcomes 322 specification: error in rational expectations models 412; tests, likelihood-based 364 standard expectations operator 311; additivity of 305 standard incomplete market equilibrium 356 standard two-period pure-exchange economy 474 state-contingent contracts 303 state space 23; outcome space, richness on 22
states: of nature 23, 127, 139; of the world 159, 245 statistical detection operator 412 stochastic discount factors 448, 450 stochastic dominance underlying contingency space 333 stochastically independent randomization and uncertainty aversion 244–58; CEU, uncertainty and randomizing devices 253–5; definition of preferences 248–9; MMEU and randomizing devices 250–3; modeling randomizing device 246–8; and preferences 248–56; preliminaries and notation 245–6; stochastically independent randomizing device (SIRD), condition 255–6; stochastically independent randomizing device (SIRD), definition of 249–50; theorem on CEU preferences 253–4; theorem on MMEU preferences 250–1 stochastically independent randomizing device (SIRD) 249–50; condition 255–6 stock markets: prices fall after initial public offering 158; volatility in 210 Stokey, N. 158, 458, 481; updating NA 158 Stone lattices 69 Strassen, V. 103, 126 Straszewicz, S. 134; theorem of 134 strategies in dynamic game 523–4; choice of action(s) 523; deferring or concealing choice of 525 strategy: in decision making, theory of 290; in equilibrium, defining under ambiguity 290; in interaction 290–6; pure 291 Stuck, B. W. 400 subjective decision attitude toward incomplete information 22 subjective expected utility (SEU) 24, 26; conditions to characterize 25–8; maximization as benchmark representing ambiguity neutrality 231–4; maximizers 305, 337; model 171; model of Anscombe and Aumann 218; orderings 217; preference relation 224; theory 136, 146, 148, 180; theory of decision making under uncertainty 209; world 344; see also Savage subjective probabilities or “capacities” 139; measure 24 subjective probability and expected utility (EU) without additivity 108–34; Anscombe–Aumann Theorem 114–15;
axioms and background 111–14; theorem 115–20 Sugden, R. 29 Summers, L. H. 450 supermodularity, or 2-monotonicity 9 superadditive game 47 Sure-thing Principle 21–2, 24–5, 109, 527; of unambiguous acts 136, 141, 144; variation preserving 298; see also Savage symmetric and complementary uncertain events 108 symmetry: for additive capacity 144; in information 108 λ-system (a class closed with respect to complements and disjoint unions) 40–2; see also Zhang Tallarini, Jr, T. D. 365–7, 370, 392, 406–7, 409, 411–14; risk-sensitivity interpretation Tallon, J.-M. 30, 214, 285–6, 289, 300, 309, 336, 350, 354, 472–3 taxonomy on games 50 technical axioms 22 technical richness conditions 25 Thompson, A. 447 Tirole, J. 303, 497, 519 tools of classical statistics 11 ν-topology 264; properties of the 265 total variation norm 47 trade among agents 158 tradeoff-consistency-like axioms 31 transaction-specific assets 320 transaction-specific investments 321 transactions costs 303, 414; of asymmetric information 421; paradigm 320, 333 transferable utility coalitional games 261 transitivity 22, 29; and completeness of preferences 20 Turmuhambetova, G. A. 366, 378, 383, 386, 392 Tuttle, M. R. 156 Tversky, A. 3, 13–14, 22, 26, 29–30, 40, 242; and Kahneman, Prospect Theory of 3; see also Prospect Theory two-coins flip example 41 two-consequence acts 145 two-period finance economy, model of stylized 342 “two-stage” or Anscombe–Aumann model 147, 244, 247 two-urn and one-urn experiments of Ellsberg 4; see also Ellsberg
Tychonoff Theorem 51 unambiguous acts: “behavioral” notion of 213; and events 226–8 unambiguous events 138; revealed 144–6 unambiguous preference relation 41 unanimity games 62, 82–3, 262, 266–7, 275 uncertainty: averse players 291; aversion axiom 43; and Bayesianism 3–6; kinds of 36; premium 110; premium incorporated in equilibrium security market prices 287; quantified by a probability measure 4; representation by nonadditive measure 15; and vertical integration, link between 305; of war 5 uncertainty aversion 110, 119, 128, 177–8, 181, 183, 187, 201, 242, 246, 423; additive function on σχ 201; bets, beliefs and 184–6; for the CEU model 258; and convexity 183; current definition of 173–5; definition of 171–207; definition and attractive properties of 179–86; under differentiability 191–3, 202–5; differentiable utilities 186–93; eventwise differentiability, definition of 187–9; for general nonbinary acts 193; implications of 185–6; inner measures 183–4; lemma on CEU utility function 182; lemma on CEU utility function proof 196; lemma on inner measure 183; lemma on inner measure proof 198; lemma on probability measure 180; lemma on probability measure proof 196; multiple-priors and CEU utilities 180–3; of preferences 432; theorem on eventwise differentiability 192–3; theorem on eventwise differentiability, proof 198–201; theorem on multiple priors order 181; theorem on probabilistically sophisticated order 180; theorem 190 uncertainty aversion and optimal portfolio: constant 423; definition on amount of probability lost 423; definition on nonadditive probabilities 424; lemma on constant uncertainty aversion 423; lemma on optimal portfolio 426; portfolio choice 425–6; risk aversion and 419–27; theorem on nonadditive probabilities 424; theorem on risk-averse investor 426 uncertainty loving and uncertainty neutrality 178
uncertainty neutral benchmark, Epstein set of preferences 258 Uniform Boundedness Principle 75 unindexed debt 286 unique prior probability 155 unknown urn with randomisation: in consequence space (Anscombe–Aumann) 246; in the state space only (Savage) 247 update rule 161; classical 163 updating ambiguous beliefs 155–68; Bayesian and classical rules 161–3; framework and preliminaries 159–61; proofs and related analysis 163–8; proposition on associated set of measures 161; proposition on commutativity in 162; proposition on convexity in 161; theorem on commutativity 163; theorem on commutativity proof 165–8; theorem on equivalence 162; theorem on equivalence proof 163–5 Uppal, R. 286 U.S. auto manufacturers 321 utility: characterizations of properties of 28; function 24, 155, 159, 437; recursive model of 438; sophistication 44 Vardennan, S. 127 Vind, K. 27, 29 volatility: for prices 454; puzzle of excess 430 Volkmer, H. 208 von Neumann, J. 124, 169, 419 von Neumann–Morgenstern: expected utility theory 155; index, intertemporal 455; model 121, 486; stability of cores 81; Theorem 109, 113–14, 117, 128; utility index 343; utility of money 111 wage contract 287 Wakai, K. 298 Wakker, P. P. 17, 20, 26–30, 40, 77, 121, 126, 136, 138, 141, 148, 155, 160, 181, 195, 217, 220, 236, 242, 244–9, 257, 298; tradeoff consistency technique 27 Wald, A. 126, 424; decision rule 125; minimax criterion 126 Walley, P. 353, 357, 361, 430, 434, 439 Wang, T. 103, 173, 210, 284–6, 309, 341, 351–2, 362, 365–7, 370, 401, 412–14, 416, 429, 466, 520 Wasserman, L. A. 208, 434–5, 451, 465 Weak Beliefs Equilibrium 509, 511, 513
weak order 111, 127, 140–1, 215 weak*-topology 104, 128, 474, 477 Weber, H. 208 Weber, M. 309, 430, 483, 527 Webster, R. 101 Werlang, S. R. C. 158, 259, 284–5, 291–2, 294, 309–10, 337–8, 340, 351, 416, 445, 468–9, 473, 483, 493, 509, 511–14, 516, 520, 529; definition of Nash Equilibrium Under Uncertainty 511; notion of equilibrium 296 Werner, J. 363 West, K. D. 454 Wets, R. 85 Weymark, J. A. 14, 302, 533–4 Whinston, M. D. 321 Widder, D. V. 85 Williams, N. 366, 378, 383, 386, 392 Williams, S. R. 313 Williamson, O. E. 303 Wolfenson, M. 49 Wolinsky, A. 529 Woodford, M. 467
worst-case jump measure 376 worst-case models 366, 369–70, 377, 386, 392–3, 402, 405, 410; specification semigroups, θ-constrained 382 Yaari, M. E. 14, 122, 145, 172, 175, 211, 533–4; axiomatization of rank-dependent utility for risk 24; general definition of risk aversion for non-expected utility preferences 37 Yaron, A. 403–4, 414 Yoo, K. R. 158 Yor, M. 413 Yosida approximation 371 Zarnowitz, V. 430–1, 456 Zeidler, E. 458 Zeldes, S. 361 Zhang, J. 40, 43, 174, 177, 183, 214–15, 234, 242; definition of unambiguous event 41; λ-system 175 Zhou, L. 69 Zin, S. E. 286, 366, 437, 458, 466–7