CONTENTS

A. PAKES: Alternative Models for Moment Inequalities . . . . . . 1783
VERONICA GUERRIERI, ROBERT SHIMER, AND RANDALL WRIGHT: Adverse Selection in Competitive Search Equilibrium . . . . . . 1823
MELISSA DELL: The Persistent Effects of Peru's Mining Mita . . . . . . 1863
ALEJANDRO M. MANELLI AND DANIEL R. VINCENT: Bayesian and Dominant-Strategy Implementation in the Independent Private-Values Model . . . . . . 1905
SIMON GRANT, ATSUSHI KAJII, BEN POLAK, AND ZVI SAFRA: Generalized Utilitarianism and Harsanyi's Impartial Observer Theorem . . . . . . 1939
DAVID DILLENBERGER: Preferences for One-Shot Resolution of Uncertainty and Allais-Type Behavior . . . . . . 1973
DIRK ENGELMANN AND GUILLAUME HOLLARD: Reconsidering the Effect of Market Experience on the "Endowment Effect" . . . . . . 2005

NOTES AND COMMENTS:
SHAKEEB KHAN AND ELIE TAMER: Irregular Identification, Support Conditions, and Inverse Weight Estimation . . . . . . 2021
JASON ABREVAYA, JERRY A. HAUSMAN, AND SHAKEEB KHAN: Testing for Causal Effects in a Generalized Regression Model With Endogenous Regressors . . . . . . 2043
PER KRUSELL, BURHANETTIN KURUŞÇU, AND ANTHONY A. SMITH, JR.: Temptation and Taxation . . . . . . 2063
LARRY G. EPSTEIN: A Paradox for the "Smooth Ambiguity" Model of Preference . . . . . . 2085

ANNOUNCEMENTS . . . . . . 2101
FORTHCOMING PAPERS . . . . . . 2103
(2010 Volume Table of Contents is located on p. iii of this issue.)
VOL. 78, NO. 6 — November, 2010
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org EDITOR STEPHEN MORRIS, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] MANAGING EDITOR GERI MATTSON, 2002 Holly Neck Road, Baltimore, MD 21221, U.S.A.; [email protected] CO-EDITORS DARON ACEMOGLU, Dept. of Economics, MIT, E52-380B, 50 Memorial Drive, Cambridge, MA 02142-1347, U.S.A.;
[email protected] PHILIPPE JEHIEL, Dept. of Economics, Paris School of Economics, 48 Bd Jourdan, 75014 Paris, France; University College London, U.K.;
[email protected] WOLFGANG PESENDORFER, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A.;
[email protected] JEAN-MARC ROBIN, Dept. of Economics, Sciences Po, 28 rue des Saints Pères, 75007 Paris, France and University College London, U.K.;
[email protected] JAMES H. STOCK, Dept. of Economics, Harvard University, Littauer M-26, 1830 Cambridge Street, Cambridge, MA 02138, U.S.A.;
[email protected] ASSOCIATE EDITORS YACINE AÏT-SAHALIA, Princeton University JOSEPH G. ALTONJI, Yale University JAMES ANDREONI, University of California, San Diego JUSHAN BAI, Columbia University MARCO BATTAGLINI, Princeton University PIERPAOLO BATTIGALLI, Università Bocconi DIRK BERGEMANN, Yale University YEON-KOO CHE, Columbia University XIAOHONG CHEN, Yale University VICTOR CHERNOZHUKOV, Massachusetts Institute of Technology J. DARRELL DUFFIE, Stanford University JEFFREY ELY, Northwestern University HALUK ERGIN, Duke University JIANQING FAN, Princeton University MIKHAIL GOLOSOV, Yale University FARUK GUL, Princeton University JINYONG HAHN, University of California, Los Angeles PHILIP A. HAILE, Yale University JOHANNES HORNER, Yale University MICHAEL JANSSON, University of California, Berkeley PER KRUSELL, Stockholm University FELIX KUBLER, University of Zurich OLIVER LINTON, London School of Economics BART LIPMAN, Boston University
THIERRY MAGNAC, Toulouse School of Economics (GREMAQ and IDEI) DAVID MARTIMORT, IDEI-GREMAQ, Université des Sciences Sociales de Toulouse, Paris School of Economics STEVEN A. MATTHEWS, University of Pennsylvania ROSA L. MATZKIN, University of California, Los Angeles SUJOY MUKERJI, University of Oxford LEE OHANIAN, University of California, Los Angeles WOJCIECH OLSZEWSKI, Northwestern University NICOLA PERSICO, New York University JORIS PINKSE, Pennsylvania State University BENJAMIN POLAK, Yale University PHILIP J. RENY, University of Chicago SUSANNE M. SCHENNACH, University of Chicago ANDREW SCHOTTER, New York University NEIL SHEPHARD, University of Oxford MARCIANO SINISCALCHI, Northwestern University JEROEN M. SWINKELS, Northwestern University ELIE TAMER, Northwestern University EDWARD J. VYTLACIL, Yale University IVÁN WERNING, Massachusetts Institute of Technology ASHER WOLINSKY, Northwestern University
EDITORIAL ASSISTANT: MARY BETH BELLANDO, Dept. of Economics, Princeton University, Fisher Hall, Princeton, NJ 08544-1021, U.S.A.;
[email protected] Information on MANUSCRIPT SUBMISSION is provided in the last two pages. Information on MEMBERSHIP, SUBSCRIPTIONS, AND CLAIMS is provided in the inside back cover.
INDEX

ARTICLES

ABBRING, JAAP H. AND JEFFREY R. CAMPBELL: Last-In First-Out Oligopoly Dynamics . . . . . . 1491
AHN, DAVID S. AND HALUK ERGIN: Framing Contingencies . . . . . . 655
ANDREWS, DONALD W. K. AND GUSTAVO SOARES: Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection . . . . . . 119
ARMSTRONG, MARK AND JOHN VICKERS: A Model of Delegated Project Choice . . . . . . 213
BAI, YAN AND JING ZHANG: Solving the Feldstein–Horioka Puzzle With Financial Frictions . . . . . . 603
BAJARI, PATRICK, HAN HONG AND STEPHEN P. RYAN: Identification and Estimation of a Discrete Game of Complete Information . . . . . . 1529
BESANKO, DAVID, ULRICH DORASZELSKI, YAROSLAV KRYUKOV AND MARK SATTERTHWAITE: Learning-by-Doing, Organizational Forgetting, and Industry Dynamics . . . . . . 453
BESLEY, TIMOTHY AND TORSTEN PERSSON: State Capacity, Conflict, and Development . . . . . . 1
BIAIS, BRUNO, THOMAS MARIOTTI, JEAN-CHARLES ROCHET AND STÉPHANE VILLENEUVE: Large Risks, Limited Liability, and Dynamic Moral Hazard . . . . . . 73
BRUHIN, ADRIAN, HELGA FEHR-DUDA AND THOMAS EPPER: Risk and Rationality: Uncovering Heterogeneity in Probability Distortion . . . . . . 1375
CALDENTEY, RENÉ AND ENNIO STACCHETTI: Insider Trading With a Random Deadline . . . . . . 245
CAMPBELL, JEFFREY R.: (See ABBRING)
CHAMBERLAIN, GARY: Binary Response Models for Panel Data: Identification and Information . . . . . . 159
CHASSANG, SYLVAIN: Fear of Miscoordination and the Robustness of Cooperation in Dynamic Global Games With Exit . . . . . . 973
CHE, YEON-KOO AND FUHITO KOJIMA: Asymptotic Equivalence of Probabilistic Serial and Random Priority Mechanisms . . . . . . 1625
CHERNOZHUKOV, VICTOR, IVÁN FERNÁNDEZ-VAL AND ALFRED GALICHON: Quantile and Probability Curves Without Crossing . . . . . . 1093
CHESHER, ANDREW: Instrumental Variable Models for Discrete Outcomes . . . . . . 575
CITANNA, ALESSANDRO AND PAOLO SICONOLFI: Recursive Equilibrium in Stochastic Overlapping-Generations Economies . . . . . . 309
COMPTE, OLIVIER AND PHILIPPE JEHIEL: The Coalitional Nash Bargaining Solution . . . . . . 1593
CUNHA, FLAVIO, JAMES J. HECKMAN AND SUSANNE M. SCHENNACH: Estimating the Technology of Cognitive and Noncognitive Skill Formation . . . . . . 883
DELL, MELISSA: The Persistent Effects of Peru's Mining Mita . . . . . . 1863
DILLENBERGER, DAVID: Preferences for One-Shot Resolution of Uncertainty and Allais-Type Behavior . . . . . . 1973
DORASZELSKI, ULRICH: (See BESANKO)
EECKHOUT, JAN AND PHILIPP KIRCHER: Sorting and Decentralized Price Competition . . . . . . 539
EINAV, LIRAN, AMY FINKELSTEIN AND PAUL SCHRIMPF: Optimal Mandates and the Welfare Cost of Asymmetric Information: Evidence From the U.K. Annuity Market . . . . . . 1031
ENGELMANN, DIRK AND GUILLAUME HOLLARD: Reconsidering the Effect of Market Experience on the "Endowment Effect" . . . . . . 2005
EPPER, THOMAS: (See BRUHIN)
ERGIN, HALUK AND TODD SARVER: A Unique Costly Contemplation Representation . . . . . . 1285
ERGIN, HALUK: (See AHN)
FEHR-DUDA, HELGA: (See BRUHIN)
FERNÁNDEZ-VAL, IVÁN: (See CHERNOZHUKOV)
FINKELSTEIN, AMY: (See EINAV)
GALICHON, ALFRED: (See CHERNOZHUKOV)
GANUZA, JUAN-JOSÉ AND JOSÉ S. PENALVA: Signal Orderings Based on Dispersion and the Supply of Private Information in Auctions . . . . . . 1007
GENTZKOW, MATTHEW AND JESSE M. SHAPIRO: What Drives Media Slant? Evidence From U.S. Daily Newspapers . . . . . . 35
GONZALEZ, FRANCISCO M. AND SHOUYONG SHI: An Equilibrium Theory of Learning, Search, and Wages . . . . . . 509
GRANT, SIMON, ATSUSHI KAJII, BEN POLAK AND ZVI SAFRA: Generalized Utilitarianism and Harsanyi's Impartial Observer Theorem . . . . . . 1939
GUERRIERI, VERONICA, ROBERT SHIMER AND RANDALL WRIGHT: Adverse Selection in Competitive Search Equilibrium . . . . . . 1823
HECKMAN, JAMES J., ROSA L. MATZKIN AND LARS NESHEIM: Nonparametric Identification and Estimation of Nonadditive Hedonic Models . . . . . . 1569
HECKMAN, JAMES J.: (See CUNHA)
HELLWIG, MARTIN F.: Incentive Problems With Unidimensional Hidden Characteristics: A Unified Approach . . . . . . 1201
HELPMAN, ELHANAN, OLEG ITSKHOKI AND STEPHEN REDDING: Inequality and Unemployment in a Global Economy . . . . . . 1239
HOLLARD, GUILLAUME: (See ENGELMANN)
HONG, HAN: (See BAJARI)
ITSKHOKI, OLEG: (See HELPMAN)
JEHIEL, PHILIPPE: (See COMPTE)
KAJII, ATSUSHI: (See GRANT)
KIRCHER, PHILIPP: (See EECKHOUT)
KOJIMA, FUHITO AND MIHAI MANEA: Axioms for Deferred Acceptance . . . . . . 633
KOJIMA, FUHITO: (See CHE)
KRYUKOV, YAROSLAV: (See BESANKO)
LEVITT, STEVEN D., JOHN A. LIST AND DAVID H. REILEY: What Happens in the Field Stays in the Field: Exploring Whether Professionals Play Minimax in Laboratory Experiments . . . . . . 1413
LIST, JOHN A.: (See LEVITT)
MANEA, MIHAI: (See KOJIMA)
MANELLI, ALEJANDRO M. AND DANIEL R. VINCENT: Bayesian and Dominant-Strategy Implementation in the Independent Private-Values Model . . . . . . 1905
MARIOTTI, THOMAS: (See BIAIS)
MARTINS-DA-ROCHA, V. FILIPE AND YIANNIS VAILAKIS: Existence and Uniqueness of a Fixed Point for Local Contractions . . . . . . 1127
MATZKIN, ROSA L.: (See HECKMAN)
NESHEIM, LARS: (See HECKMAN)
OBARA, ICHIRO: (See RAHMAN)
PAKES, A.: Alternative Models for Moment Inequalities . . . . . . 1783
PENALVA, JOSÉ S.: (See GANUZA)
PERSSON, TORSTEN: (See BESLEY)
PETERS, MICHAEL: Noncontractible Heterogeneity in Directed Search . . . . . . 1173
POLAK, BEN: (See GRANT)
RAHMAN, DAVID AND ICHIRO OBARA: Mediated Partnerships . . . . . . 285
REDDING, STEPHEN: (See HELPMAN)
REILEY, DAVID H.: (See LEVITT)
ROCHET, JEAN-CHARLES: (See BIAIS)
ROMANO, JOSEPH P. AND AZEEM M. SHAIKH: Inference for the Identified Set in Partially Identified Econometric Models . . . . . . 169
ROZEN, KAREEN: Foundations of Intrinsic Habit Formation . . . . . . 1341
RYAN, STEPHEN P.: (See BAJARI)
SAFRA, ZVI: (See GRANT)
SANNIKOV, YULIY AND ANDRZEJ SKRZYPACZ: The Role of Information in Repeated Games With Frequent Actions . . . . . . 847
SARVER, TODD: (See ERGIN)
SATTERTHWAITE, MARK: (See BESANKO)
SCHENNACH, SUSANNE M.: (See CUNHA)
SCHRIMPF, PAUL: (See EINAV)
SHAIKH, AZEEM M.: (See ROMANO)
SHAPIRO, JESSE M.: (See GENTZKOW)
SHI, SHOUYONG: (See GONZALEZ)
SHIMER, ROBERT: (See GUERRIERI)
SICONOLFI, PAOLO: (See CITANNA)
SKRZYPACZ, ANDRZEJ: (See SANNIKOV)
SOARES, GUSTAVO: (See ANDREWS)
STACCHETTI, ENNIO: (See CALDENTEY)
STRULOVICI, BRUNO: Learning While Voting: Determinants of Collective Experimentation . . . . . . 933
VAILAKIS, YIANNIS: (See MARTINS-DA-ROCHA)
VICKERS, JOHN: (See ARMSTRONG)
VILLENEUVE, STÉPHANE: (See BIAIS)
VINCENT, DANIEL R.: (See MANELLI)
WRIGHT, RANDALL: (See GUERRIERI)
ZHANG, JING: (See BAI)
NOTES AND COMMENTS

ABREVAYA, JASON, JERRY A. HAUSMAN AND SHAKEEB KHAN: Testing for Causal Effects in a Generalized Regression Model With Endogenous Regressors . . . . . . 2043
ASHLAGI, ITAI, MARK BRAVERMAN, AVINATAN HASSIDIM AND DOV MONDERER: Monotonicity and Implementability . . . . . . 1749
BARTOLUCCI, FRANCESCO AND VALENTINA NIGRO: A Dynamic Model for Binary Panel Data With Unobserved Heterogeneity Admitting a √n-Consistent Conditional Estimator . . . . . . 719
BEARE, BRENDAN K.: Copulas and Temporal Dependence . . . . . . 395
BERGEMANN, DIRK AND JUUSO VÄLIMÄKI: The Dynamic Pivot Mechanism . . . . . . 771
BRAVERMAN, MARK: (See ASHLAGI)
BUGNI, FEDERICO A.: Bootstrap Inference in Partially Identified Models Defined by Moment Inequalities: Coverage of the Identified Set . . . . . . 735
CARNEIRO, PEDRO, JAMES J. HECKMAN AND EDWARD VYTLACIL: Evaluating Marginal Policy Changes and the Average Effect of Treatment for Individuals at the Margin . . . . . . 377
EPSTEIN, LARRY G.: A Paradox for the "Smooth Ambiguity" Model of Preference . . . . . . 2085
FUDENBERG, DREW AND YUICHI YAMAMOTO: Repeated Games Where the Payoffs and Monitoring Structure Are Unknown . . . . . . 1673
GILBOA, ITZHAK, FABIO MACCHERONI, MASSIMO MARINACCI AND DAVID SCHMEIDLER: Objective and Subjective Rationality in a Multiple Prior Model . . . . . . 755
HASHIMOTO, TADASHI: Corrigendum to "Games With Imperfectly Observable Actions in Continuous Time" . . . . . . 1155
HASSIDIM, AVINATAN: (See ASHLAGI)
HAUSMAN, JERRY A.: (See ABREVAYA)
HECKMAN, JAMES J.: (See CARNEIRO)
IVANOV, ASEN, DAN LEVIN AND MURIEL NIEDERLE: Can Relaxation of Beliefs Rationalize the Winner's Curse?: An Experimental Study . . . . . . 1435
KAMADA, YUICHIRO: Strongly Consistent Self-Confirming Equilibrium . . . . . . 823
KHAN, SHAKEEB AND ELIE TAMER: Irregular Identification, Support Conditions, and Inverse Weight Estimation . . . . . . 2021
KHAN, SHAKEEB: (See ABREVAYA)
KRÄTZIG, MARKUS: (See WINSCHEL)
KRUSELL, PER, BURHANETTIN KURUŞÇU AND ANTHONY A. SMITH, JR.: Temptation and Taxation . . . . . . 2063
KUERSTEINER, GUIDO AND RYO OKUI: Constructing Optimal Instruments by First-Stage Prediction Averaging . . . . . . 697
KURUŞÇU, BURHANETTIN: (See KRUSELL)
LEVIN, DAN: (See IVANOV)
MACCHERONI, FABIO: (See GILBOA)
MARINACCI, MASSIMO: (See GILBOA)
MIKUSHEVA, ANNA: Corrigendum to "Uniform Inference in Autoregressive Models" . . . . . . 1773
MONDERER, DOV: (See ASHLAGI)
NIEDERLE, MURIEL: (See IVANOV)
NIGRO, VALENTINA: (See BARTOLUCCI)
OKUI, RYO: (See KUERSTEINER)
PESENDORFER, MARTIN AND PHILIPP SCHMIDT-DENGLER: Sequential Estimation of Dynamic Discrete Games: A Comment . . . . . . 833
SCHMEIDLER, DAVID: (See GILBOA)
SCHMIDT-DENGLER, PHILIPP: (See PESENDORFER)
SMITH, JR., ANTHONY A.: (See KRUSELL)
SPRUMONT, YVES: An Axiomatization of the Serial Cost-Sharing Method . . . . . . 1711
STOVALL, JOHN E.: Multiple Temptations . . . . . . 349
TAMER, ELIE: (See KHAN)
VÄLIMÄKI, JUUSO: (See BERGEMANN)
VYTLACIL, EDWARD: (See CARNEIRO)
WINSCHEL, VIKTOR AND MARKUS KRÄTZIG: Solving, Estimating, and Selecting Nonlinear Dynamic Models Without the Curse of Dimensionality . . . . . . 803
WOODERS, JOHN: Does Experience Teach? Professionals and Minimax Play in the Lab . . . . . . 1143
YAMAMOTO, YUICHI: (See FUDENBERG)
YAMASHITA, TAKURO: Mechanism Games With Multiple Principals and Three or More Agents . . . . . . 791
ANNOUNCEMENTS

2009 ELECTION OF FELLOWS FOR THE ECONOMETRIC SOCIETY . . . . . . 1165
ANNOUNCEMENTS . . . . . . 411, 843, 1161, 1453, 1775, 2101
ECONOMETRICA REFEREES 2008–2009 . . . . . . 437
FELLOWS OF THE ECONOMETRIC SOCIETY . . . . . . 1457
FORTHCOMING PAPERS . . . . . . 413, 845, 1163, 1455, 1777, 2103
REPORT OF THE EDITORS 2008–2009 . . . . . . 433
REPORT OF THE EDITORS OF THE MONOGRAPH SERIES . . . . . . 447
REPORT OF THE PRESIDENT . . . . . . 1779
REPORT OF THE SECRETARY . . . . . . 415
REPORT OF THE TREASURER . . . . . . 425
SUBMISSION OF MANUSCRIPTS TO THE ECONOMETRIC SOCIETY MONOGRAPH SERIES . . . . . . 451, 1489
SUBMISSION OF MANUSCRIPTS TO ECONOMETRICA 1. Members of the Econometric Society may submit papers to Econometrica electronically in pdf format according to the guidelines at the Society’s website: http://www.econometricsociety.org/submissions.asp Only electronic submissions will be accepted. In exceptional cases for those who are unable to submit electronic files in pdf format, one copy of a paper prepared according to the guidelines at the website above can be submitted, with a cover letter, by mail addressed to Professor Stephen Morris, Dept. of Economics, Princeton University, Fisher Hall, Prospect Avenue, Princeton, NJ 08544-1021, U.S.A. 2. There is no charge for submission to Econometrica, but only members of the Econometric Society may submit papers for consideration. In the case of coauthored manuscripts, at least one author must be a member of the Econometric Society. Nonmembers wishing to submit a paper may join the Society immediately via Blackwell Publishing’s website. Note that Econometrica rejects a substantial number of submissions without consulting outside referees. 3. It is a condition of publication in Econometrica that copyright of any published article be transferred to the Econometric Society. Submission of a paper will be taken to imply that the author agrees that copyright of the material will be transferred to the Econometric Society if and when the article is accepted for publication, and that the contents of the paper represent original and unpublished work that has not been submitted for publication elsewhere. If the author has submitted related work elsewhere, or if he does so during the term in which Econometrica is considering the manuscript, then it is the author’s responsibility to provide Econometrica with details. There is no page fee; nor is any payment made to the authors. 4. Econometrica has the policy that all empirical and experimental results as well as simulation experiments must be replicable. For this purpose the Journal editors require that all authors submit datasets, programs, and information on experiments that are needed for replication and some limited sensitivity analysis. (Authors of experimental papers can consult the posted list of what is required.) This material for replication will be made available through the Econometrica supplementary material website. The format is described in the posted information for authors. Submitting this material indicates that you license users to download, copy, and modify it; when doing so such users must acknowledge all authors as the original creators and Econometrica as the original publishers. If you have compelling reason we may post restrictions regarding such usage. At the same time the Journal understands that there may be some practical difficulties, such as in the case of proprietary datasets with limited access as well as public use datasets that require consent forms to be signed before use. In these cases the editors require that detailed data description and the programs used to generate the estimation datasets are deposited, as well as information of the source of the data so that researchers who do obtain access may be able to replicate the results. This exemption is offered on the understanding that the authors made reasonable effort to obtain permission to make available the final data used in estimation, but were not granted permission. 
We also understand that in some particularly complicated cases the estimation programs may have value in themselves and the authors may not make them public. This, together with any other difficulties relating to depositing data or restricting usage should be stated clearly when the paper is first submitted for review. In each case it will be at the editors’ discretion whether the paper can be reviewed. 5. Papers may be rejected, returned for specified revision, or accepted. Approximately 10% of submitted papers are eventually accepted. Currently, a paper will appear approximately six months from the date of acceptance. In 2002, 90% of new submissions were reviewed in six months or less. 6. Submitted manuscripts should be formatted for paper of standard size with margins of at least 1.25 inches on all sides, 1.5 or double spaced with text in 12 point font (i.e., under about 2,000 characters, 380 words, or 30 lines per page). Material should be organized to maximize readability; for instance footnotes, figures, etc., should not be placed at the end of the manuscript. We strongly encourage authors to submit manuscripts that are under 45 pages (17,000 words) including everything (except appendices containing extensive and detailed data and experimental instructions).
While we understand some papers must be longer, if the main body of a manuscript (excluding appendices) is more than the aforementioned length, it will typically be rejected without review. 7. Additional information that may be of use to authors is contained in the “Manual for Econometrica Authors, Revised” written by Drew Fudenberg and Dorothy Hodges, and published in the July, 1997 issue of Econometrica. It explains editorial policy regarding style and standards of craftmanship. One change from the procedures discussed in this document is that authors are not immediately told which coeditor is handling a manuscript. The manual also describes how footnotes, diagrams, tables, etc. need to be formatted once papers are accepted. It is not necessary to follow the formatting guidelines when first submitting a paper. Initial submissions need only be 1.5 or double-spaced and clearly organized. 8. Papers should be accompanied by an abstract of no more than 150 words that is full enough to convey the main results of the paper. On the same sheet as the abstract should appear the title of the paper, the name(s) and full address(es) of the author(s), and a list of keywords. 9. If you plan to submit a comment on an article which has appeared in Econometrica, we recommend corresponding with the author, but require this only if the comment indicates an error in the original paper. When you submit your comment, please include any correspondence with the author. Regarding comments pointing out errors, if an author does not respond to you after a reasonable amount of time, then indicate this when submitting. Authors will be invited to submit for consideration a reply to any accepted comment. 10. Manuscripts on experimental economics should adhere to the “Guidelines for Manuscripts on Experimental Economics” written by Thomas Palfrey and Robert Porter, and published in the July, 1991 issue of Econometrica. Typeset at VTEX, Akademijos Str. 4, 08412 Vilnius, Lithuania. Printed at The Sheridan Press, 450 Fame Avenue, Hanover, PA 17331, USA. Copyright © 2010 by The Econometric Society (ISSN 0012-9682). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation, including the name of the author. Copyrights for components of this work owned by others than the Econometric Society must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works, requires prior specific permission and/or a fee. Posting of an article on the author’s own website is allowed subject to the inclusion of a copyright statement; the text of this statement can be downloaded from the copyright page on the website www.econometricsociety.org/permis.asp. Any other permission requests or questions should be addressed to Claire Sashi, General Manager, The Econometric Society, Dept. of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA. Email:
[email protected]. Econometrica (ISSN 0012-9682) is published bi-monthly by the Econometric Society, Department of Economics, New York University, 19 West 4th Street, New York, NY 10012. Mailing agent: Sheridan Press, 450 Fame Avenue, Hanover, PA 17331. Periodicals postage paid at New York, NY and additional mailing offices. U.S. POSTMASTER: Send all address changes to Econometrica, Blackwell Publishing Inc., Journals Dept., 350 Main St., Malden, MA 02148, USA.
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Membership Joining the Econometric Society, and paying by credit card the corresponding membership rate, can be done online at www.econometricsociety.org. Memberships are accepted on a calendar year basis, but the Society welcomes new members at any time of the year, and in the case of print subscriptions will promptly send all issues published earlier in the same calendar year. Membership Benefits • Possibility to submit papers to Econometrica, Quantitative Economics, and Theoretical Economics • Possibility to submit papers to Econometric Society Regional Meetings and World Congresses • Full text online access to all published issues of Econometrica (Quantitative Economics and Theoretical Economics are open access) • Full text online access to papers forthcoming in Econometrica (Quantitative Economics and Theoretical Economics are open access) • Free online access to Econometric Society Monographs, including the volumes of World Congress invited lectures • Possibility to apply for travel grants for Econometric Society World Congresses • 40% discount on all Econometric Society Monographs • 20% discount on all John Wiley & Sons publications • For print subscribers, hard copies of Econometrica, Quantitative Economics, and Theoretical Economics for the corresponding calendar year Membership Rates Membership rates depend on the type of member (ordinary or student), the class of subscription (print and online or online only) and the country classification (high income or middle and low income). The rates for 2010 are the following:
Ordinary Members                          High Income               Other Countries
  Print and Online, 1 year (2010)         $90 / €65 / £55           $50
  Online only, 1 year (2010)              $50 / €35 / £30           $10
  Print and Online, 3 years (2010–2012)   $216 / €156 / £132        $120
  Online only, 3 years (2010–2012)        $120 / €84 / £72          $24

Student Members                           High Income               Other Countries
  Print and Online, 1 year (2010)         $50 / €35 / £30           $50
  Online only, 1 year (2010)              $10 / €7 / £6             $10
Euro rates are for members in Euro area countries only. Sterling rates are for members in the UK only. All other members pay the US dollar rate. Countries classified as high income by the World Bank are: Andorra, Antigua and Barbuda, Aruba, Australia, Austria, The Bahamas, Bahrain, Barbados, Belgium, Bermuda, Brunei, Canada, Cayman Islands, Channel Islands, Croatia, Cyprus, Czech Republic, Denmark, Equatorial Guinea, Estonia, Faeroe Islands, Finland, France, French Polynesia, Germany, Greece, Greenland, Guam, Hong Kong (China), Hungary, Iceland, Ireland, Isle of Man, Israel, Italy, Japan, Rep. of Korea, Kuwait, Liechtenstein, Luxembourg, Macao (China), Malta, Monaco, Netherlands, Netherlands Antilles, New Caledonia, New Zealand, Northern Mariana Islands, Norway, Oman, Portugal, Puerto Rico, Qatar, San Marino, Saudi Arabia, Singapore, Slovak Republic, Slovenia, Spain, Sweden, Switzerland, Taiwan (China), Trinidad and Tobago, United Arab Emirates, United Kingdom, United States, Virgin Islands (US). Institutional Subscriptions Information on Econometrica subscription rates for libraries and other institutions is available at www.econometricsociety.org. Subscription rates depend on the class of subscription (print and online or online only) and the country classification (high income, middle income, or low income). Back Issues and Claims For back issues and claims contact Wiley Blackwell at
[email protected].
An International Society for the Advancement of Economic Theory in its Relation to Statistics and Mathematics Founded December 29, 1930 Website: www.econometricsociety.org Administrative Office: Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA; Tel. 212-9983820; Fax 212-9954487 General Manager: Claire Sashi (
[email protected]) 2010 OFFICERS JOHN MOORE, University of Edinburgh and London School of Economics, PRESIDENT BENGT HOLMSTRÖM, Massachusetts Institute of Technology, FIRST VICE-PRESIDENT JEAN-CHARLES ROCHET, University of Zurich and Toulouse School of Economics, SECOND VICE-PRESIDENT ROGER B. MYERSON, University of Chicago, PAST PRESIDENT RAFAEL REPULLO, CEMFI, EXECUTIVE VICE-PRESIDENT
2010 COUNCIL DARON ACEMOGLU, Massachusetts Institute of Technology MANUEL ARELLANO, CEMFI SUSAN ATHEY, Harvard University ORAZIO ATTANASIO, University College London DAVID CARD, University of California, Berkeley JACQUES CRÉMER, Toulouse School of Economics (*)EDDIE DEKEL, Tel Aviv University and Northwestern University MATHIAS DEWATRIPONT, Free University of Brussels DARRELL DUFFIE, Stanford University GLENN ELLISON, Massachusetts Institute of Technology HIDEHIKO ICHIMURA, University of Tokyo (*)MATTHEW O. JACKSON, Stanford University MICHAEL P. KEANE, University of Technology Sydney LAWRENCE J. LAU, Chinese University of Hong Kong
CESAR MARTINELLI, ITAM ANDREW MCLENNAN, University of Queensland ANDREU MAS-COLELL, Universitat Pompeu Fabra and Barcelona GSE AKIHIKO MATSUI, University of Tokyo HITOSHI MATSUSHIMA, University of Tokyo MARGARET MEYER, University of Oxford PAUL R. MILGROM, Stanford University STEPHEN MORRIS, Princeton University JUAN PABLO NICOLINI, Universidad Torcuato di Tella CHRISTOPHER A. PISSARIDES, London School of Economics (*)ROBERT PORTER, Northwestern University JEAN-MARC ROBIN, Sciences Po and University College London LARRY SAMUELSON, Yale University ARUNAVA SEN, Indian Statistical Institute JÖRGEN W. WEIBULL, Stockholm School of Economics
The Executive Committee consists of the Officers, the Editors of Econometrica (Stephen Morris), Quantitative Economics (Orazio Attanasio), and Theoretical Economics (Martin J. Osborne), and the starred (*) members of the Council.
REGIONAL STANDING COMMITTEES Australasia: Andrew McLennan, University of Queensland, CHAIR; Maxwell L. King, Monash University, SECRETARY. Europe and Other Areas: John Moore, University of Edinburgh and London School of Economics, CHAIR; Helmut Bester, Free University Berlin, SECRETARY; Enrique Sentana, CEMFI, TREASURER. Far East: Hidehiko Ichimura, University of Tokyo, CHAIR. Latin America: Juan Pablo Nicolini, Universidad Torcuato di Tella, CHAIR; Juan Dubra, University of Montevideo, SECRETARY. North America: Bengt Holmström, Massachusetts Institute of Technology, CHAIR; Claire Sashi, New York University, SECRETARY. South and Southeast Asia: Arunava Sen, Indian Statistical Institute, CHAIR.
Econometrica, Vol. 78, No. 6 (November, 2010), 1783–1822
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES

BY A. PAKES¹

Behavioral choice models generate inequalities which, when combined with additional assumptions, can be used as a basis for estimation. This paper considers two sets of such assumptions and uses them in two empirical examples. The second example examines the structure of payments resulting from the upstream interactions in a vertical market. I then mimic the empirical setting for this example in a numerical analysis which computes actual equilibria, examines how their characteristics vary with the market setting, and compares them to the empirical results. The final section uses the numerical results in a Monte Carlo analysis of the robustness of the two approaches to estimation to their underlying assumptions.

KEYWORDS: Moment inequalities, behavioral models, structural and reduced forms.
¹This paper is a revised version of part of my Fisher–Schultz Lecture presented at the World Congress of the Econometric Society in London, August 2005. The paper draws extensively from past interactions with my students and co-authors, and I would like to take this opportunity to express both my intellectual debt and my thanks to them. I like to think they enjoyed the experience as much as I did, although that might have been harder for the students in the group. For help on this paper, I owe a particular debt to Robin Lee. I also thank three referees and a co-editor for helpful comments.

© 2010 The Econometric Society   DOI: 10.3982/ECTA7944

BEHAVIORAL CHOICE MODELS generate inequalities which, when combined with additional assumptions, can be used as a basis for estimation. This paper considers two sets of assumptions which suffice and uses them in two examples which have been difficult to analyze empirically. In doing so, we distinguish between the assumptions needed to estimate the "structural" parameters defined by the primitives of the choice problem and the "reduced form" coefficients obtained from regressing profits on variables of interest.

I begin with a single agent discrete choice problem: a consumer's decision of which supermarket to shop at. This provides a transparent setting to illustrate the assumptions underlying alternative estimators and motivates the more formal discussion in the rest of the paper. The difficulty in analyzing this example arises from the size of its choice set: all possible bundles of goods at "nearby" locations. Its importance stems from the need to analyze similar problems to understand the implications of alternative local policies (zoning laws, public transportation alternatives, and the like).

Section 2 of the paper formalizes two sets of assumptions that take one from the choice model to an estimation algorithm. This is done in a multiple agent setting (with the single agent simplifications noted). The first approach is labeled the generalized discrete choice approach, as it generalizes familiar discrete choice theory to allow for multiple interacting agents. The ideas behind this approach date to Tamer (2003), and are developed in more detail in papers by Ciliberto and Tamer (2009) and Andrews, Berry, and Jia (2006). It was
first considered in the context of analyzing two-stage entry games, but is easily adapted to other multiple agent problems. The second approach is based on the inequalities generated by the difference between the expected profits from the choice made and those from an alternative feasible choice, so we refer to it as the profit inequality approach. It is preceded by the first order (or Euler) condition estimators for single agent dynamic models provided in Hansen and Singleton (1982) and extended to incorporate transaction costs, and hence inequalities, by Luttmer (1996). The approach considered here is a direct extension of the work in Pakes, Porter, Ho, and Ishii (2006), who provided assumptions that enable us to take "revealed preference" inequalities to data (for related work on revealed preference in demand analysis, see Varian (1982), and in the analysis of auctions, see Haile and Tamer (2003)). The two approaches are not nested and a comparison of their assumptions closes this section.

Section 3 applies the frameworks developed in Section 2 to the analysis of markets in which a small number of sellers interact with a small number of buyers. This is typical of upstream interactions in many vertical markets, that is, markets where the buyers remarket the goods they buy to consumers. A difficulty in analyzing them is that the contracts that establish the buyers' costs are typically proprietary, and these costs determine both the prices the buyer charges to consumers and the sellers' investment incentives. Costs are also often proprietary in consumer goods markets. However, because there are many consumers in those markets, we typically assume a Nash equilibrium in price (or quantity) in them. Then the first order conditions from that equilibrium can be used to back out marginal cost. The analogous procedure for vertical markets leads us to moment inequalities: we observe who contracts with whom and ask what features must the buyer's cost functions have for each agent to be doing better under the observed set of contracts than what they could have expected from changing their contracting behavior.

The section begins by extending the empirical work of Ho (2009), which characterizes cost functions in HMO–hospital networks. It shows that her approach can be extended to allow for disturbances that are known to the agents when they make their decisions but not to the econometrician. It then compares the empirical results she obtains to those obtained once we allow for these disturbances. Next we compute equilibria for markets similar to those used in the empirical analysis. The numerical results allow us both to investigate the consistency of the empirical results with those obtained from an equilibrium computation and to engage in a more general examination of the correlates of equilibrium markups.

Section 4 uses the data underlying the numerical results of Section 3 in a Monte Carlo analysis of the two approaches to estimation introduced in Section 2. It focuses on the behavior of the two estimators when one or more of the assumptions needed to derive their properties is violated. At least in our example, the estimators were robust to all assumptions except that on the form
of the disturbance distribution, an assumption which is required only for the generalized discrete choice model.²

1. A MOTIVATING EXAMPLE

I begin with a single agent example taken from an unpublished thesis by Michael Katz (2007); I thank him for permission to use it. Katz's goal was to estimate the costs shoppers assign to driving to supermarkets. Transportation costs are central to understanding store location decisions, and hence to the analysis of the impact of regulations (e.g., zoning laws) and policy changes (e.g., public transportation projects) on retail trade. They have been difficult to analyze empirically with traditional discrete choice models because of the size and complexity of the choice set facing consumers (all possible bundles of goods at all nearby stores). In contrast, large choice sets facilitate moment inequality estimators, as they give the empirical researcher a greater ability to choose a counterfactual that is likely to isolate the effect of interest (here the cost of travel time).

Assume that the agents' utility functions are additively separable functions of the utility from the basket of goods the agent buys, expenditure on that basket, and drive time to the supermarket. The agent's decision, say $d_i$, consists of buying a basket of goods, say $b_i$, at a particular store, say $s_i$, so $d_i = (b_i, s_i)$. If $z_i$ represents individual characteristics, $U(b_i, z_i)$ and $dt(s_i, z_i)$ provide individual $i$'s utility from $b_i$ and drive time to $s_i$, respectively, and $e(b_i, s_i)$ is the expenditure required to buy $b_i$ at $s_i$, then the agent's utility from $(b_i, s_i)$ is

(1)    $\pi(d_i, z_i, \theta) = U(b_i, z_i) - e(b_i, s_i) - \theta_i\, dt(s_i, z_i),$

where I have normalized the coefficient on expenditure to 1 so $\theta_i$, the disutility of a unit of drive time, is in dollars.

To proceed using moment inequalities, we need to compare the utility from the choice the individual made to the utility from a choice the individual could have made but chose not to. This is a sample design question, as it determines what variance is used to estimate the parameter. For a particular $d_i$, we chose the alternative, say $d'(d_i)$, to be the purchase of (i) the same basket of goods, (ii) at a store which is further away from the consumer's home than the store the consumer shopped at. Note that, given the additive separability assumption, this choice differences out the impact of the basket of goods chosen on utility; that is, it allows us to hold fixed the dimension of the choice that is not of direct interest and to investigate the impact of travel time in isolation.

²Since the theoretical restrictions brought to data are moment inequalities, they typically lead to set valued estimators. Methods of inference for set valued estimators are an active and important area of econometric research that I do not discuss here; see, in particular, Chernozhukov, Hong, and Tamer (2007), Andrews and Soares (2010), and the papers cited above.
Assume the agent makes his choice of store by maximizing his expected utility (equation (1)) conditional on the information at his disposal when he chooses the store to shop at. We denote the agent's expectation operator by $E(\cdot)$ and his information set by $J_i$. Note that when the agent makes this choice, the goods that will be bought at the store are a random variable, say $\mathbf{b}$, as are their prices and hence total expenditure $e(\cdot)$. If the bundle bought at the chosen store could have been bought at the alternate store, then $(s'_i, b'_i)$, where $b'_i$ is what would have been bought had the agent gone to $s'_i$, is preferred over $(s'_i, b_i)$. Since $s_i$ was preferred over $s'_i$, transitivity of preferences insures that if, for any function $f(\cdot)$, $\Delta f(d, d', \cdot) \equiv f(d, \cdot) - f(d', \cdot)$, then

    $E[\Delta\pi(\mathbf{b}, s_i, s'_i, z_i) \mid J_i] = E[-\Delta e(\mathbf{b}, s_i, s'_i) - \theta_i\, \Delta dt(s_i, s'_i, z_i) \mid J_i] \ge 0.$

We let $\nu_{1,iss'}$ be the difference between realized and expected utility, that is, $\nu_{1,iss'} \equiv \Delta\pi(b, s_i, s'_i, z_i) - E[\Delta\pi(\mathbf{b}, s_i, s'_i, z_i) \mid J_i]$, and consider two different assumptions on the distribution of the $\theta_i$.

CASE 1: Assume $\theta_i = \theta_0$ or, more generally, that all determinants of the costs of drive time are observed and incorporated in the econometrician's specification. Letting $\to_P$ denote convergence in probability, then since $\Delta dt(s_i, s'_i, z_i) < 0$,

(2)    $N^{-1}\sum_i \nu_{1,iss'} \to_P 0$ implies $-\dfrac{\sum_i \Delta e(b_i, s_i, s'_i)}{\sum_i \Delta dt(s_i, s'_i, z_i)} \to_p \theta \le \theta_0.$

To obtain an upper bound for $\theta_0$, consider an alternative store ($s'_i$) which was closer to the individual, so that $\Delta dt(s_i, s'_i, z_i) > 0$. An analogous argument shows

(3)    $N^{-1}\sum_i \nu_{1,iss'} \to_P 0$ implies $-\dfrac{\sum_i \Delta e(b_i, s_i, s'_i)}{\sum_i \Delta dt(s_i, s'_i, z_i)} \to_p \theta \ge \theta_0.$

That is, provided the average of the expectational errors converges to zero, equations (2) and (3) give us asymptotic bounds for $\theta_0$.
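To make the Case 1 mechanics concrete, here is a minimal numerical sketch (not from Katz's study; the simulated differences and all variable names are hypothetical) of the ratio-of-averages bounds in equations (2) and (3): one comparison store is farther than the chosen store, the other closer, and the two ratios bracket the drive-time cost.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000
theta0 = 17.0  # hypothetical "true" dollar cost of an hour of drive time

# Chosen-store-minus-FARTHER-alternative differences: Delta dt < 0, and the
# realized expenditure difference equals the expected one plus an expectational
# error, with E[-Delta e - theta0 * Delta dt | J] >= 0 built in via `slack`.
d_dt_far = -rng.uniform(0.05, 0.50, n)                      # hours
slack_far = rng.uniform(0.0, 3.0, n)
d_e_far = -theta0 * d_dt_far - slack_far + rng.normal(0.0, 2.0, n)

# Chosen-store-minus-CLOSER-alternative differences: Delta dt > 0.
d_dt_near = rng.uniform(0.05, 0.50, n)
slack_near = rng.uniform(0.0, 3.0, n)
d_e_near = -theta0 * d_dt_near - slack_near + rng.normal(0.0, 2.0, n)

# Equation (2): ratio of averages using the farther alternative -> lower bound.
lower = -d_e_far.sum() / d_dt_far.sum()
# Equation (3): the same ratio using the closer alternative -> upper bound.
upper = -d_e_near.sum() / d_dt_near.sum()

print(f"Case 1 bounds for theta0: [{lower:.2f}, {upper:.2f}]")
```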
CASE 2: Now assume that there is a determinant of the cost of drive time that the agent knows but is not observed by the econometrician, that is, $\theta_i = (\theta_0 + \nu_{2i})$, where $E[\nu_{2i}] \equiv 0$, so $\theta_0$ is the mean cost of drive time. Then our inequality becomes

    $E[-\Delta e(\mathbf{b}, s_i, s'_i) - (\theta_0 + \nu_{2i})\,\Delta dt(s_i, s'_i, z_i) \mid J_i] \ge 0.$
Now assume the agent knows drive time in deciding where to shop, that is, $dt(\cdot) \in J_i$. Then

(4)    $N^{-1}\sum_i \Delta dt(s_i, s'_i, z_i)^{-1}\,\nu_{1,iss'} \to_P 0$, which implies $-N^{-1}\sum_i \dfrac{\Delta e(b_i, s_i, s'_i)}{\Delta dt(s_i, s'_i, z_i)} \to_P \theta \le \theta_0.$

An analogous upper bound to $\theta_0$ is generated by choosing an alternative that has drive time less than that of the chosen store.
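A companion sketch (again with made-up, simulated data) contrasts the Case 1 ratio of averages with the Case 2 average of ratios in equation (4) when the drive-time coefficient $\theta_i = \theta_0 + \nu_{2i}$ varies across agents and $\nu_{2i}$ is correlated with the drive-time difference; the Case 2 bound remains valid because $\nu_{2i}$ enters additively and averages out.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5_000
theta0 = 17.0                           # mean cost of an hour of drive time (hypothetical)
nu2 = rng.normal(0.0, 5.0, n)           # heterogeneity known to the agent, not to us
theta_i = theta0 + nu2

# Farther alternative, so Delta dt < 0.  Agents with a higher cost of drive time
# end up with smaller |Delta dt|, making nu2 and Delta dt correlated.
d_dt = -rng.uniform(0.05, 0.50, n) * np.exp(-0.05 * nu2)
slack = rng.uniform(0.0, 3.0, n)        # expected surplus of the chosen store
nu1 = rng.normal(0.0, 2.0, n)           # expectational error
d_e = -theta_i * d_dt - slack + nu1

# Case 1 estimator (ratio of averages): its limit now also reflects the
# covariance between theta_i and Delta dt.
case1_lower = -d_e.sum() / d_dt.sum()

# Case 2 estimator (average of ratios), as in equation (4): still a lower bound
# for theta0 because drive time is in J_i and nu2 averages out.
case2_lower = -np.mean(d_e / d_dt)

print(f"Case 1 lower bound: {case1_lower:.2f}")
print(f"Case 2 lower bound: {case2_lower:.2f} (theta0 = {theta0})")
```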
Discussion

Case 1 uses a ratio of averages to bound the parameter of interest, while Case 2 uses the average of a ratio. The following points should be kept in mind.

CASE 1 VERSUS CASE 2: Case 2 allows for unobserved heterogeneity in the coefficient of interest and does not need to specify what the distribution of that unobservable is. In particular, the unobservable can be freely correlated with the right-hand side variable. "Drive time" is a choice variable, so we might expect it to be correlated with the perceived costs of that time (with $\nu_{2i}$). If it is, then Case 1 and Case 2 estimators should be different; otherwise they should be the same. So there is a test for whether any unobserved differences in preferences are correlated with the "independent" variable, and that test does not require us to specify a conditional distribution for $\nu_{2i}$. Similar issues arise in analyzing the choice of a durable good when there is an intensity of use decision that follows the choice of the durable (for a classic example, see Dubin and McFadden (1984)).

BEHAVIORAL CONDITIONS: This is a two-stage model with uncertainty: an initial choice of where to shop is made before knowing what prices are, and a choice of what to buy is made after arriving at the store. Note, however, that we did not have to specify either the information on prices the agent had at his disposal when he made his initial decision or the form of the agent's prior price distribution conditional on that information. These are objects econometricians seldom have access to.

CONDITIONS ON THE CHOICE SET: All we required of the choice set was one feasible alternative. In particular, we did not need to specify and compute returns for the many possible "inside" choices, and we did not need to specify an "outside" alternative.

Finally note that our focus on the drive time coefficient led us to choose an alternative that differenced out any heterogeneity in preferences over bundles
of goods. If instead we were interested in the utility of a particular good, we would compare baskets with and without that good at the same store. If we had multiple observations on the same individual, there are many more (largely unexplored) possibilities.

1.1. Estimates From the Inequality and a Comparison Model

Katz (2007) estimated his model using the Nielsen Homescan Panel, 2004, for household expenditures and data from the Retail Site Database of TradeDimensions for the characteristics of stores. He used the shopping trips of about 1,300 families in Massachusetts and surrounding counties and compared the results that use inequalities to the results that he obtained from estimating a discrete choice comparison model.

The Comparison Model

To obtain the econometric implications of a behavioral model of supermarket choice, we would (i) specify the agent's prior distribution of prices at each store, (ii) compute the bundle of goods the agent would buy for each possible realization of the price vector at the store, and (iii) form the expected utility of going to the store. These are demanding tasks. Similar considerations lead most (though not all; see below) analyses of single agent discrete choice problems to reduced forms. The reduced form can be given an appealing interpretation by constructing it from the regressions of expected returns from each choice on variables of interest. If we then make a sufficiently powerful assumption on the joint distribution of the regression functions' disturbances, the functions themselves can be estimated. It is the fact that an analogous reduced form is not useful in multiple agent problems that leads to the generalized discrete choice model considered below.

Unfortunately this reduced form cannot be used in the supermarket choice problem without first reducing the dimension of the choice set. Katz assumed the number of weekly visits made to supermarkets is distributed as a Poisson random variable. At each visit, the consumer chooses between 10 expenditure bundles at each of the outlets within a given radius of his home. The utility function for a given expenditure bundle and store is allowed to differ with the number of shopping trips per week, but for a fixed number of trips is given by equation (1) augmented with an additive "logit" error. The expenditure bundles are constructed from typical purchase patterns for a given amount of expenditure which are then priced at each outlet (giving us the expenditure level for each choice).³

³For a discussion of alternative ways to build reduced forms for supermarket discrete choice problems and an application, see Beckert, Griffith, and Nesheim (2009).
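The structure of the comparison model just described can be illustrated with a heavily simplified sketch (hypothetical stores, bundles, and parameter values; the actual specification had roughly 40 parameters per visit frequency). For a single shopping trip it evaluates equation (1) for every store-and-expenditure-bundle pair and, with an additive extreme-value error, computes the implied multinomial logit choice probabilities.

```python
import numpy as np

rng = np.random.default_rng(2)

n_stores, n_bundles = 6, 10                 # outlets within the radius, expenditure bundles
theta_dt = 10.0                             # drive-time coefficient in $/hour (hypothetical)
u_bundle = np.linspace(1.0, 30.0, n_bundles)                       # U(b, z) for each bundle
price = u_bundle[None, :] * rng.uniform(0.6, 0.9, (n_stores, 1))   # e(b, s): bundle priced at each outlet
drive_time = rng.uniform(0.05, 0.60, n_stores)                     # hours to each outlet

# Systematic utility of every (store, bundle) pair, as in equation (1).
v = u_bundle[None, :] - price - theta_dt * drive_time[:, None]

# An i.i.d. extreme-value ("logit") error yields closed-form choice probabilities.
expv = np.exp(v - v.max())
prob = expv / expv.sum()                    # probability of each (store, bundle) choice
prob_store = prob.sum(axis=1)               # implied store-choice probabilities

print(np.round(prob_store, 3))
```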
There are a number of reasons to doubt the estimates from this model. I focus on those directly related to the price and drive time variables. First, the prices for the expenditure class need not reflect the prices of the goods the individual actually is interested in, so there is an error in the price variable, and if individuals shop at stores where the goods they are interested in are less costly, that error is negatively correlated with the price itself. Second, the model does not allow for expectational errors, so agents are assumed to know all relevant prices when store choice decisions were made (and there are a lot of them). Finally, the model does not allow for unobserved heterogeneity in the aversion to drive time. One could allow for a random coefficient on drive time and integrate it out, but this would require an assumption on the conditional distribution for that variable, and given that the aversion to drive time is likely to be related to drive time per se, a traditional random coefficient assumption would be suspect.

Results

The specifications used for the models estimated were quite detailed; the comparison model estimated about 40 different parameters for each of three different numbers of visits per week, while the revealed choice model estimated about 15. Both models were estimated with specifications that included outlet characteristics and interactions between expenditure and demographics, so the aversion to drive time varied with observed individual characteristics. As in the original paper, I focus on the estimates of the median (of the mean) aversion to drive time.

The multinomial comparison models were estimated using maximum likelihood. The median aversion to drive time was estimated at $240 per hour. The median wage in the region was $17 per hour, so this estimate is implausibly high. Also several of the other coefficients had the "wrong" sign.

The inequality estimators were obtained from differences between the chosen store and four different counterfactual store choices (chosen to reflect price and distance differences with the chosen store). Each comparison was interacted with positive functions of 26 "instruments" (variables that were assumed to be mean independent of the expectational errors), producing over 100 moment inequalities. As is not unusual for problems with many more inequalities than bounds to estimate, the inequality estimation routine generated point (rather than interval) estimates for the coefficients of interest (there was no value of the parameter vector that satisfied all of the moment inequalities). However, tests indicated that one could accept the null that this result was due to sampling error.⁴

⁴The finding that there is no value of the parameter vector that satisfies all the inequalities is not unusual in moment inequality problems with many inequalities. Consider the one parameter case. When there are many moment inequalities there are many upper and lower bounds for that parameter. The estimation routine forms an interval estimate from the least upper and the greatest lower bound. The approximate normality of finite sample means implies that in finite samples, the least upper bound will have a negative bias and the greatest lower bound will have a positive bias. So the two can easily cross. The test is a test of whether such crossings could have been a result of sampling error.
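The following sketch indicates, in schematic form, how such an over-determined system of inequalities can be taken to data (the data, instruments, and comparisons are invented and this is not Katz's code): each store comparison is interacted with nonnegative instrument functions, and the estimate minimizes a criterion that penalizes only violated sample moments, which is why a point rather than an interval can emerge when no parameter value satisfies every inequality.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(3)
n, n_inst = 2_000, 13
theta0 = 17.0

def make_diffs(sign):
    # Hypothetical chosen-minus-alternative differences; sign = -1 gives a farther
    # alternative (Delta dt < 0), sign = +1 a closer one (Delta dt > 0).
    d_dt = sign * rng.uniform(0.05, 0.50, n)
    d_e = -theta0 * d_dt - rng.uniform(0.0, 0.5, n) + rng.normal(0.0, 2.0, n)
    return d_e, d_dt

g = np.abs(rng.normal(size=(n, n_inst)))        # nonnegative instrument functions
comparisons = [make_diffs(-1.0), make_diffs(+1.0)]

def sample_moments(theta):
    # One moment per (instrument, comparison): mean of g_ij * (-Delta e_i - theta * Delta dt_i) >= 0.
    return np.concatenate([
        (g * (-d_e - theta * d_dt)[:, None]).mean(axis=0)
        for d_e, d_dt in comparisons
    ])

def criterion(theta):
    # Sum of squared violations; zero whenever theta satisfies every sample inequality.
    m = sample_moments(theta)
    return np.sum(np.minimum(m, 0.0) ** 2)

res = minimize_scalar(criterion, bounds=(0.0, 100.0), method="bounded")
print(f"criterion-minimizing theta: {res.x:.2f}, criterion value: {res.fun:.5f}")
```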
1790
A. PAKES
The inequality estimators that corresponded to Case 1 above, that is, those that did not allow for unobserved heterogeneity in the drive time coefficient, produced median aversions to drive time of about $4 per hour. The estimators that corresponded to Case 2 above, the case that did allow for heterogeneity in the drive time coefficient, generated estimates of the median aversion to drive time that varied between $16 and $18 per hour, depending on the specification. The difference between the two estimators is consistent with there being unobserved heterogeneity in the drive time coefficient that is negatively correlated with drive time itself; a result one would expect from a model where drive time itself was a choice variable. Moreover, in the model which allowed for heterogeneity, the other coefficients took on values which accorded with intuition. 2. CONDITIONS FOR MOMENT INEQUALITY ESTIMATORS This section provides two sets of conditions that can be used to justify moment inequality estimators in more general (both multiple and single agent) settings. For each of the two approaches, we consider estimation of both the parameters of the underlying behavioral model and a reduced form constructed by regressing expected profits on variables of interest. Each approach is defined by four assumptions: two that are common across approaches and two that differ. I begin with the two common assumptions. 2.1. Common Assumptions The first condition is that agents expect their choice to lead to higher returns than alternative feasible choices. Let π(·) be the profit function, let di and d−i be the agent’s and his competitors’ choices, let Di be the choice set, let Ji be the agent’s information set, and let E be the expectation operator used by the agent to evaluate the implications of his actions. C1: We assume sup E [π(d d−i yi θ0 )|Ji ] ≤ E π(di = d(Ji ) d−i yi θ0 )|Ji d∈Di
where yi is any variable (other than the decision variables) which affects the agent’s profits and the expectation is calculated using the agent’s beliefs on the likely values of (d−i yi ). Throughout, variables that the decision maker views as random will be boldface, while realizations of those random variables will be represented by standard typeface. positive bias. So the two can easily cross. The test is a test of whether such crossings could have been a result of sampling error.
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1791
Three points about C1 are central to what follows. First, there are no restrictions on either the choice set or the objective function. In particular, the objective function need not be concave in d, D could be discrete (e.g., a choice among bilateral contracts, ordered choice, ) or continuous (e.g., the choice of the location and size of a retail outlet), and when continuous, di can be at a corner of the choice set. Second, C1 is a necessary condition for a best response. As a result, were we to assume a Nash equilibrium, C1 would be satisfied regardless of the equilibrium selection mechanism. Finally, note that C1 is meant to be a rationality assumption in the sense of Savage (1954); that is, the agent’s choice is optimal with respect to the agent’s beliefs. In itself, it does not place any restrictions on the relationship of those beliefs to the data generating process. We will need to restrict beliefs to evaluate estimators, but the restrictions used differ between the two approaches to estimation. Both approaches require a model capable of predicting what expected profits would be were the agent to deviate from his observed choice. This is the sense in which both require a “structural” model. To predict what expected profits would be from a counterfactual choice, we need to model what the agent thinks that yi and d−i would be were he to change his own decision. For example, consider a two period model to determine the number of outlets (di ) a retailer builds. Returns from the choice of di will be a function of postentry prices (which are in yi ) and the number of competing outlets (d−i ). If either of these are likely to change when di is changed to d , we need a model of that change so as to construct the firm’s profits from its counterfactual choice. C2 formalizes this requirement. We say yi and/or d−i are endogenous if they change in response to a change in di ; zi will represent a set of exogenous variables, that is, variables whose distributions do not change in response to changes in di . Then our second condition can be stated: C2: yi = y(zi d d−i θ) and d−i = d −i (d zi θ), and the distribution of zi conditional on (Ji di = d) does not depend on d. In words, C2 states that if either yi or d−i is endogenous, we need a model for its response to changes in di , and the model must produce a value for the endogenous variable which depends only on decisions and exogenous variables. The condition that the distribution of zi does not depend on the agent’s choice is what we mean when we say that zi is exogenous. The restrictiveness of C2 will vary with the problem. In single agent problems, profits do not depend on d−i and the agent’s decision is typically not thought to effect environmental conditions, so yi is exogenous. Then C2 is unobjectionable. In multiple agent simultaneous move games d −i (d = di zi θ) = d−i , so there is no need for an explicit model of reactions by competitors, but yi often contains price and/or quantity variables which are endogenous. If, in a sequen-
1792
A. PAKES
tial move game, we want to consider counterfactuals for agents who move early, we need a model for the responses of the agents that move later (for d−i ).5 Implications of C1 and C2 Let d ∈ Di and define π(di d d−i zi θ0 ) ≡ π(di d−i yi θ0 ) − π(d d −i (d z) y(zi d d−i ) θ0 ) Then together C1 and C2 imply the inequality (5)
E [π(di d d−i zi θ0 )|Ji ] ≥ 0 ∀d ∈ Di
To move from (5) to a moment inequality we can use in estimation requires the following: • We need a measurement model which determines the relationship between the π(· θ) and (zi di d−i ) that appear in the theory and the measures of them we use in estimation. • We need the relationship between the expectation operator underlying the agents decisions (our E (·)) and the sample moments that the data generating process provides. These are the two aspects of the problem which differ across the two approaches. We begin with a measurement model which nests both their assumptions. 2.2. Measurement Model o
Let r(d d−i z θ) be the profit function specified by the econometrician up to an additively separable disturbance (so the z o are observed), and define ν(·) to be the difference between the profit function the agent responds to and this specification, so that (6)
r(d d−i zio θ) ≡ π(d d−i zi θ) + ν(d d−i zio zi θ)
The agent’s decision is based on E [π(·)|Ji ]. We observe r(·) and have constructed ν so that E [r(·)|Ji ] = E [π(·)|Ji ] + E [ν(·)|Ji ]. It follows that (7)
r(d d−i zio θ) ≡ E [π(d d−i zi θ)|Ji ] + ν2id + ν1id
5 Often it is natural to write that model recursively, so that each agent’s decision depends on the decisions of the agents who move prior to it. The fact that we allow for sequential games explains the difference between our C2 and Assumption 2 in Pakes et al. (2006). The buyer–seller network example in the next section is sequential and illustrates the types of assumptions needed to model d−i .
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1793
where ν2id ≡ E [ν(d d−i zio zi θ)|Ji ] and
ν1id ≡ π(d ·) − E [π(d ·)|Ji ] + ν(d ·) − E [ν(d ·)|Ji ]
Equation (7) expresses the difference between the researcher’s specification of profits (i.e., r(·)) and the function the agent bases its decision on (i.e., E [π(·)|Ji ]) as a sum of three components, two of which I have grouped together into ν1 . The grouping was done because, when evaluated at θ = θ0 , they both are “mean independent” of Ji under the agent’s expectation operator (under E ) by construction. ν2 does not share this property and it is this distinction which forces us to keep track of two separate disturbances below. The relative importance of the two disturbances will differ with the application. Sources of ν1 ν1 is a sum of two terms. π(d ·) − E [π(d ·)|Ji ] provides the difference between the agent’s expectation of profits at the time he makes his decision and the realization of profits. In single agent problems, it is solely a result of uncertainty in the exogenous variables whose realizations help determine returns (in the supermarket example, the uncertainty in the prices). In multiple agent problems, there may also be uncertainty in d−i . In either case, to compute the distribution of π(d ·) − E [π(d ·)|Ji ], we would have to specify the probabilities each agent assigns to different outcomes conditional on their information sets (objects we often know little about). To compute the distribution of ν1 in the multiple agent case, we would also have to solve for an equilibrium conditional on all possible realizations of d−i . This would both be computationally burdensome and require an assumption that selects among possible equilibria. The second component of ν1 , ν(d ·) − E [ν(d ·)|Ji ], results either from measurement error in observables or from a specification error in r(·) that is mean independent of Ji . Note that such a “specification” error occurs when r(·) is formed by regressing the true profit function onto variables of interest to obtain a “reduced form” whose coefficients become the focus of investigation. Sources of ν2 ν2 is defined to equal that part of profits that the agent can condition on when he makes his decisions but the econometrician does not include in the specification. So although it is not known to the econometrician, ν2i ∈ Ji , and since di = d(Ji ), di will generally be a function of ν2i . In the supermarket example, ν2i has two components: the utility from the goods bought (the U(bi zi ) in equation (1)), and, in Case 2, the differences between the individual’s and the average drive time coefficients (θi − θ0 ). In multiple agent problems, di might also be a function of ν2−i .
1794
A. PAKES
Selection We can now explain the selection problem in behavioral models. Assume that x is an “instrument” in the sense that E [ν2 |x] = 0 and, in addition, that x ∈ J . Then
E [ν1 |x] = E [ν2 |x] = 0 These expectations do not, however, condition on the decision actually made (our di ), and any moment which depends on the selected choice requires properties of the disturbance conditional on the di the agent selected. Since di is measurable σ(Ji ) and ν1 is mean independent of any function of J ,
E [ν1id |xi di ] = 0
however
E [ν2id |xi di ] = 0
As a result, the covariance of x and the residuals will typically not be zero when θ = θ0 , the condition we generally require of an “instrument.” To see why E [ν2i |xi di ] = 0, consider a single agent binary choice problem (di ∈ {0 1}). Then di = 1 implies
E [π(di = 1 d = 0 ·)|Ji ] = E [r(di = 1 d = 0 ·)|Ji ] + ν2i ≥ 0 where ν2i = ν2id=1 − ν2id=0 . So for every agent with di = 1, ν2i > −E [r(di = 1 d = 0 ·)|Ji ] If xi is correlated with E [r(di = 1 d = 0 ·)|Ji ], and if x is used as an instrument it is likely to be correlated π(·), then the expectation of ν2i given xi and di = 1 will be not be zero, regardless of whether E [ν2id=1 |xi ] = E [ν2id=0 |xi ] = 0. In words, if di was selected, then the difference in the unobservable part of the incremental expected returns to di must have been greater than the (negative of the) difference in the observable part of the incremental returns, and the latter will typically be correlated with our instruments. 2.3. The Generalized Discrete Choice Approach The measurement model in equation (7) provides the notation needed to clarify the conditions needed to move from the profit inequalities in equation (5) to sample moments for our two approaches. Recall that we also need to clarify the relationship between the agent’s perceptions of expected returns and the returns emanating from the data generation process embedded in those approaches. We begin with the generalized discrete choice approach (the approach originally developed to handle entry games in the papers by Tamer (2003), Ciliberto and Tamer (2009), and Andrews, Berry, and Jia (2006)). The multiple agent versions of the generalized discrete choice approach make the following assumption:
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1795
DC3: ∀d ∈ Di π(d d−i zi θ0 ) = E [π(d d−i zi θ0 )|Ji ] or that there is no uncertainty in either the exogenous variables (in zi ) or in the actions of the firm’s competitors (in d−i ). Together C1 and DC3 imply that agents never err.6 It is important to note that there are parts of the single agent discrete choice literature that do allow for uncertainty. These include both the dynamic single agent discrete choice models that explicitly account for randomness in exogenous variables and the literature that uses survey data on expectations in conjunction with choice models to allow for uncertainty (see Keane and Wolpin (2009) and Manski (2004), respectively, and the literature they cite). However, computational difficulties and a lack of information on agents’ perceptions on the likely behavior of their competitors have made it difficult to use analogous techniques in multiple agent problems. DC4 provides the restrictions the generalized discrete choice model places on the measurement model in equation (7). DC4: ∀d ∈ Di , r(d d−i zio θ) = π(d d−i zi θ) + ν1i for a known π(· θ) o ∼ F(·; θ) for a known F(· θ). and zi = ({ν2id }d zio ) with (ν2id ν2−id )|dzio z−i The first line in DC4 states that there are no decision specific errors in our profit measure (ν1i does not have a d subscript). So if we knew (di d−i zi z−i ), we could construct an exact measure of profit differences for each θ. The second line states that zi has both observed (the zio ) and unobserved (the ν2id ) components, and provides their properties. The distribution of the unobserved conditional on the observed components is known up to a parameter vector,7 and there is no measurement error in the observed components. Since DC3 assumes full information, all the ν2id are known to all agents when decisions are made, just not to the econometrician. Note that given DC3, the distribution F(·|·) appearing in DC4 is a distribution of realized values, and hence must be consistent with the data generating process (more on this below). Although the assumptions used in DC3 and DC4 may seem restrictive, they clearly advanced the study of discrete choice models in multiple agent settings. Recall that single agent discrete choice models can always be given an intuitive 6 As stated, DC3 also rules out the analysis of sequential games in which an agent who moves initially believes that the decisions of an agent who moves thereafter depend on his initial decision. However, with only notational costs, we could allow for a deterministic relationship between a component of d−i and (d z). 7 There are papers in the single agent discrete choice literature which have allowed for classification errors in d; see, for example, Hausman, Abrevaya, and Scott-Morton (1998). At least in principle, such errors could be added to any of the models considered here.
1796
A. PAKES
reduced form interpretation. Simply regress expected returns from alternate decisions onto observed variables and the decision itself, then solve for the optimal d conditional on the observables and the disturbances from the regressions, and finally make an assumption on the joint distribution of those disturbances that enables identification. The analogous reduced form for multiple agent problems proved not to be useful. In multiple agent contexts, researchers were interested in the relationship between profits and (d−i z) conditional on unobservable determinants of profits, particularly those that were correlated with d−i . For example, in the entry models that stimulated this literature, there was a focus on the relationship of profitability to the number of entrants. Models which did not allow for unobserved market characteristics that affected the profitability of all potential entrants in a market often estimated coefficients that implied that a firm’s profits increased in the number of competitors (since more profitable markets attracted more entrants). So to provide a reduced form of interest for the relationship between profits and (zi di d−i ), we needed to allow for a disturbance that was correlated with d−i . This is the problem the generalized discrete choice approach sought to solve. Substituting DC3 and DC4 into the model generated by C1 and C2 (equation (5)), and letting ν2i ≡ {ν2id }d , with analogous notation for ν2−i , we obtain the following model: MODEL D: ∀d ∈ Di , (8)
π(di d d−i zio ν2i ; θ0 ) ≥ 0;
o ∼ F(·; θ0 ) (ν2i ν2−i )|zio z−i
To insure that there exists a θ for which the event π(di d d−i zio ν2i ; θ) ≥ 0 has positive probability ∀d ∈ Di and all agents in each market, we need further conditions on F(·) and/or π(·). The additional restriction typically imposed is that the profit function is additively separable in the unobserved determinants of profits: RDas : ∀d ∈ Di , (9)
π(d d−i zio ν2i ) = π as (d d−i zio θ0 ) + ν2id
and the distribution ν2id conditional on ν2−i has full support [∀(i d)].8 Keep in mind that the additive separability in equation (9) cannot be obtained definitionally. If we did observe realized profits and regressed it on (di d−i zio ), we would get a residual, but that residual is not the ν2i in equation (9). The regression residuals are mean independent of (di d−i ), while ν2i 8 Allowing for additional unobservables, for example, unobservable random coefficients on the zio , would increase the notational burden but would not change our ability to obtain any of the results below.
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1797
is not. So for the specification in equation (9) to be correct, π as (·) and ν2i have to be derived from the primitives of the problem. Inequalities for Inference Index markets by j = 1 J. For inference, we need the distribution of nj o dij )}i=1 ≡ (ν2j zjo dj ) across the population of markets, which we {(ν2ij zij denote by P(·). The data consist of random draws on (ν2j zjo dj ) from P(·). The expectations
of any function g(·) of a draw conditional on x will be given by E[g(·)|x] = g(·) dP(·|x), so E(·) is defined by the data generating process (DGP). The distribution of ν2j conditional on zjo in DC4, or F(·; θ), is assumed to be consistent with this DGP. The model’s conditions can be satisfied by multiple vectors of dj for any value of θ (i.e., there can be multiple equilibria). As a result, there is not a one to one map between observables, unobservables, and parameters, on the one hand, and outcomes for the decision variables, on the other; so the model is not detailed enough to deliver a likelihood. However Ciliberto and Tamer (2009) and Andrews, Berry, and Jia (2006) noted that we can check whether the model’s conditions are satisfied at the observed dj for any νj and θ, and this, together with F(·; θ), enable us to calculate conditional probabilities of satisfying those conditions. Since these are necessary conditions for observing the choices made when θ = θ0 , the probability of satisfying them must be greater than the probability of actually observing dj . In addition, if we checked whether the dj are the only values of the decision variables to satisfy the necessary conditions for each ν2j at that θ, we could construct the probability that dj is the unique equilibrium. That probability must be lower than the true probability of observing dj at θ = θ0 . These are inequalities that not all values of θ will satisfy and as a result, they can be used as a basis for inference. More formally define the probability that the model in equation (8) (with a restriction like that in equation (9)) is satisfied at a particular dj given zjo for a given θ to be Pr{dj |zjo θ} ≡ Pr{ν2j : dj satisfy equation (8)|zjo θ} and the analogous lower bound to be Pr{dj |zjo θ} ≡ Pr{ν2j : only dj satisfy equation (8)|zjo θ} Letting I{·} be the indicator function which takes the value 1 if the condition inside the brackets is satisfied and 0 elsewhere, the true probability (determined in part by the equilibrium selection mechanism) is Pr{dj |zjo θ0 } ≡ E[I{d = dj }|zjo ] Since we do not know the selection mechanism, we do not know Pr{dj |zjo θ0 }, but we do know that when θ = θ0 , Pr{dj |zjo θ0 } ≥ Pr{dj |zjo θ0 } ≥ Pr{dj |zjo θ0 }
1798
A. PAKES
Let h(·) be a function which only takes on positive values and let →P denote convergence in probability. Since E(·) provides cross-markets averages, we have E J −1 (10) (Pr{dj |zjo θ} − I{d = dj })h(zjo ) j
→P J −1
(Pr{dj |zjo θ} − Pr{dj |zjo θ0 })h(zjo )
j
which is nonnegative at θ = θ0 . An analogous moment condition can be constructed from Pr{dj |zjo θ0 } − Pr{dj |zjo θ}. The estimation routine constructs unbiased estimates of (Pr(·|θ) Pr(·|θ)), substitutes them for the true values of the probability bounds into these moments, and then accepts values of θ for which the moment inequalities are satisfied.9 Since typically neither the upper nor the lower bound is an analytic function of θ, simulation is used to obtain unbiased estimates of them. The simulation procedure is straightforward, though often computationally burdensome. Take pseudorandom draws from a standardized version of F(·) as defined in DC4 and for each random draw, check the necessary conditions for an equilibrium, that is, the conditions in equation (8), at the observed (di d−i ). Estimate Pr(di d−i |θ) by the fraction of random draws that satisfy those conditions at that θ. Next check if there is another value of (d d−i ) ∈ Di × D−i that satisfies the equilibrium conditions at that θ and estimate Pr(di d−i |θ) by the fraction of the draws for which (di d−i ) is the only such value. If we were analyzing markets with N interactive agents, each of whom had #D possible choices and used ns simulation draws on {ν2i }Ni=1 , then for each market and each θ evaluated in the estimation routine, we need to evaluate up to ns × #D × N inequalities to obtain estimates of Pr{·|θ}, and we need to evaluate up to ns × (#D)N inequalities if we also estimate Pr(·|θ). This can be computationally expensive, particularly in multistage games solved by backward recursions, as then to solve for each π(di d−i ·) we need to compute equilibria to later stages of the game. 2.4. Profit Inequalities An earlier version of this approach appears in Pakes, Porter, Ho, and Ishii (2006). Recall that what we need is an assumption on the relationship between 9 As noted by a referee, this routine ignores information. If we can enumerate all possible equilibria, as is assumed if we use the lower bound, we could use the fact that the equilibrium selection probabilities must sum to 1 (for more detail, see Beresteanu and Molinari (2008)). Also, I have implicitly assumed that there is an equilibrium in pure strategies for each point evaluated. This need not be the case; for a discussion of the implications of existence problems for econometric work on discrete games, see Bajari et al. (2010).
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1799
(i) the agents’ perceptions of expected returns and the returns emanating from the data generating process and (ii) the profit measure the agent uses and the one the econometrician specifies. From equation (5), E [π(· θ0 )|Ji ] ≥ 0. If xi ∈ Ji this implies E [π(· θ0 )| xi ] = 0. PC3 relates these expectations to averages from the data generating process (our E(·) operator). PC3: There is a positive valued h(·) and an xi ∈ Ji for which 1 E (π(di d d−i zi θ0 )|xi ) ≥ 0 N i 1 ⇒ E π(di d d−i zi θ0 )h(xi ) ≥ 0 N i PC3 nests DC3, as it allows for uncertainty and it does so without requiring us to fully specify how the agent forms his expectations. If agents know (i) the other agents’ strategies (i.e., d−i (J−i )), and (ii) the joint distribution of other agents’ information sets and the primitives sources of uncertainty (i.e., of (J−i zi ) conditional on Ji ), then, provided all expectations exist, our optimality condition C1 insures that PC3 is satisfied. These assumptions are, however, stronger than are needed for PC3. Several authors have noted that agents’ expectations can satisfy C1 without them having such detailed information (see, e.g., Dekel, Fudenberg, and Levine (1993)). Furthermore, although correct expectations about profit differences are sufficient for PC3, they are not necessary. A weaker sufficient condition is (11)
1 E π(di d d−i zi θ0 )h(xi ) N i 1 −E π(di d d−i zi θ0 )h(xi ) ≥ 0 N i
where again E(·) is defined by the DGP. If agents have incorrect expectations on π(· θ0 ) but their expectational error is not systematically related to xi (i.e., are mean independent of xi ), then (11) is satisfied with equality. Indeed PC3 is satisfied even if agents are incorrect on average, provided they are overly optimistic about the relative profitability of their choices. The final condition used in this estimation strategy is designed to deal with the selection problem caused by the {ν2ijd } for the i = 1 Nj agents in market j. C1, C2, and the definitions in equation (7) imply that our model generates the restriction that
E [π(·; θ0 )|J ] = E [r(·; θ0 )|J ] − ν2 ≥ 0
1800
A. PAKES
PC3 insures that this implies that if x ∈ J , sample averages of π(·; θ0 )h(x) = r(·; θ0 )h(x) − ν2 h(x) have positive expectation. For consistency we require that the sample average of the observable r(·; θ)h(x) have positive expectation at θ = θ0 . This will be the case if the expectation of the average of ν2 h(x) is nonnegative. Above we stressed that even if x is an instrument in the sense that {ν2d } is mean independent of x in the population at large, the mean of ν2d conditional on x for those who made particular decisions will typically depend on x. Pakes et al. (2006) presented a general condition which insures that the selection resulting from conditioning on agents’ choices does not impact the consistency of our estimators. Here we consider three ways to form moments that satisfy that condition that we have seen used, often in combination, in applied work. They are based on the researcher finding weighted averages of differences between actual and counterfactual choices that either (i) difference out the effect of the ν2 (PC4a), (ii) insure that we average over the ν2 of every agent (so that there is no selection) (PC4b), or (iii) sum to an observable which controls for a weighted average of the ν2 (PC4c). PC4a—Differencing: Let there be G groups of observations indexed by g, ∈ Dig , and positive weights wig ∈ Jig such that i∈g wig × counterfactuals dig = 0; that is, a within-group weighted average of profit differences ν2igdig dig eliminates the ν2 errors. Then G−1
g
provided G−1
wig r(dig dig ·; θ0 ) − E [π(dig dig ·; θ0 )|Jig ] →P 0
i∈g
g
i
wig r(dig dig ·; θ0 ) obeys a law of large numbers.
Our Case 1 supermarket example is a special case of PC4a with ng = wig = 1: There di = (bi si ), π(·) = U(bi zi ) − e(bi si ) − θ0 dt(si zi ), and ν2id ≡ U(bi zi ). If we measure expenditures up to a ν1id error, r(·) = −e(bi si ) − θ0 dt(si zi ) + ν2id + ν1id . We chose a counterfactual with bi = bi , so r(·) = π(·) + ν1· and the utility from the bundle of goods bought is differenced out. “Matching estimators,” that is, estimators based on differences in outcomes of matched observations, implicitly assume PC4a (no differences in unobservable determinants of the choices made by matched observations). For other single agent examples, see Pakes et al. (2006). For a multiple agent example, consider two period entry games with common unobservable determinants of market profitability—the problem that stimulated the literature on using inequality estimators in multiple agent settings. For specificity, consider two retailers, say i = {W T }, deciding whether to enter different markets, so dji ∈ {1 0} and dji = 1 indicates that firm i enters market j = [1 J]. If there are market-specific unobservables
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1801
known to the agents but not to the econometrician, then rj (dij d−ij zi θ) = E [πj (dij d−ij zi θ)|Ji ] + ν2j + ν1ij . The zi include sources of cost differences (like warehouse and central officelocations). Let wij = wj ∈ {0 J −1 } with −1 T W wj = J ⇔ dj = [1 − dj ]; that is, j i wij πij (·) puts weight only on markets where the two agents make opposite decisions. The only possible counterfactual is dji = [1 − dji ]. So if wj = 1 and djW = 1, then djT = 0, rjW (·) = E [πjW (·)|JW ] + ν2j + ν1W j , and rjT (·) = E [πjT (·)|JT ] − ν2j + ν1Tj . Since ν2j enters the two inequalities with opposite signs, it cancels when we sum over i = {W T } and
wj
j
rji (d i d i d−i zi θ)
i=W T
=
j
wj
E [πji (d i d i d−i zi θ)|Ji ] + ν1ij i=W T
which will be nonnegative at θ = θ0 Notice that since we assume wij ∈ Jij , these weights imply d−ij ∈ Jij , as in the generalized discrete choice model, but we do not require assumptions on the distribution of zi or of νi· . Also more moment inequalities can be generated from appropriate instruments and the model can be enriched to explicitly allow for differences in the firm’s responses to the market-specific shock (replace ν2j by θzji ν2j , where zji ≥ 0 and zji ∈ Ji , and then divide each difference by zji ). Similar structures appear in a number of other familiar problems (e.g., social interaction models where the interaction effects are additive and groupspecific). In simultaneous move games where the market allocation mechanism is known, one can often construct counterfactuals which difference out individual-specific (rather than group-specific) ν2 effects. For example, in electricity auctions with known allocation mechanisms, we can compute the difference between the revenues and quantities actually allocated to the agent, and those the agent would have obtained had the agent submitted a different bid (holding the realizations of environmental variables and competitors’ bids constant). Profits are revenues minus fixed and variable costs. The fact that the expectation of the difference in profits from the two bids should be positive allows us to bound the variable cost function without restricting agent-specific fixed costs in any way (as they are differenced out). PC4b—Unconditional Averages and Instrumental Variables: Assume that ∀d ∈ Di there is a d ∈ Di and a wi ∈ Ji such that wi r(di di ·; θ) = wi E [π(di di ·; θ)|Ji ] + ν2i + ν1i·
1802
A. PAKES
Then if xi ∈ Ji , E[ν2i |xi ] = 0, and h(·) > 0, wi r(di di ·; θ0 )h(xi ) N −1 i
→P N −1 provided N
−1
wi E [π(di di ·; θ0 )|Ji ]h(xi ) ≥ 0
i
ν1i· h(xi ) and N −1
ν2i h(xi ) obey laws of large numbers.
PC4b assumes there is a counterfactual which gives us an inequality that is additive in ν2 no matter what decision the agent made. Then we can form averages which do not condition on d, and hence do not have a selection problem. This form of PC4b suffices for the examples in this section but the next section requires a more general form, which allows us to group agents and is given in the footnote below.10 Case 2 of our supermarket example had two ν2 components: a decisionspecific utility from the goods bought, ν2id = U(bi zi ) (as in Case 1), and an agent-specific aversion to drive time, θi = θ0 + ν2i . As in Case 1, taking d = (bi si ) differenced out the U(bi zi ). Then r(·) = −e(· si si ) − (θ0 + ν2i )dt(si si zi ) + ν1· . Divide by dt(si si zi ) ≤ 0. Then C1 and C2 imply that E [e(si si bi )/dt(si si zi )|Ji ] − (θ0 + ν2i ) ≤ 0. This inequality is (i) linSo if E[ν2 ] = 0, PC3 and a law of ear in ν2i and (ii) is available for every agent. large numbers insure N −1 i ν2i →P 0, and i e(si si bi )/dt(si si zi ) →P θ0 ≤ θ0 : whereas if E[ν2 |x] = 0, we can use x to form instruments. Notice that ν2i can be correlated with dt(zi si ), so this procedure enables us to analyze discrete choice models when a random coefficient affecting tastes for a characteristic is correlated with the characteristics chosen. For a multiple agent example of PC4b, we look at within market expansion decisions. Agents chose a number of outlets, a di ∈ Z+ (the integers) to maximize expected profits. Formally the model is a multiple agent two period ordered choice model: a model with many Industrial Organization applications (e.g., Ishii (2008)). In the first period the agents chose a number of outlets and in the second they obtain the variable profits from sales at those outlets. So πi (·) = vp(di d−i ·) − (c0 + ν2i )di , where vp(·) are variable profits and (c0 + ν2i ) represent the costs of building and maintaining the outlets. These costs differ across firms in ways known to the firms but not to the econometrician, and c0 is defined to be their average (so ν2i ≡ 0). We measure variable profits up to a ν1 (·) measurement error, so define ri (·) = πi (·) + ν1di = vpi (·) − co di + ν2i di + ν1di , where vpi (·) and di are observed. 10
The more general version is as follows. Assume G groups of observations with ng members is a counterfactual {dig ∈ Dig }i and positive in group g, and that for each {dig ∈ Dig }i there · θ0 ) = i∈g wig (E [π(dig dig · θ0 )|Jig ] + weights wig ∈ Jig such that i∈g wig r(dig dig −1 ν2g + ν1ig ). Then if xig ∈ Jig , E[ν2g |xig ] = 0, and h(·) > 0, G g i∈g h(xig )(wig r(dig dig ·; θ0 ) →P G−1 g i∈g h(xig )(wig E [π(dig dig ·; θ0 )|Jig ] ≥ 0, provided laws of large numbers hold.
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1803
C1 and C2 imply that the incremental profits from choosing one more machine than was actually chosen (a di = di + 1) are expected to be less than its cost, or E [π(di di + 1 ·)|J i ] ≤ 0. But r(di di + 1 ·) = E [π(di di + 1 ·)|Ji ] + ν2i + ν1i· . So since ν2i ≡ 0, N −1
r(di di + 1 ·) →P N −1
i
π(di di + 1 ·)
i
= N −1
[vp(di di + 1 ·) − c0 ] ≤ 0
i
where we have assumed PC3 and a law of large numbers. That is, N −1 × i [r(di di + 1 ·) →P c 0 ≤ c0 . An upper bound for c0 can be obtained by choosing di ≤ di (see Pakes et al. (2006), for the case where some observations are at di = 0, in which case the counterfactual di < di is infeasible). Additional moments can be obtained by forming covariances with h(xi ) that are (unconditionally) uncorrelated with ν2i . Notice that E[v2 h(x)] = 0 is our only assumption; in particular we do not require d−i ∈ Ji as in our previous multiple agent example. PC4c—Control Functions: Often variables that are not available at a disaggregated level are available as aggregates, and this can be used to develop a control function for ν2 . Two familiar examples are (i) firm level exports to different countries are typically unavailable but aggregate trade flows by product and country of destination are recorded, and (ii) product level input, cost, and sales data of multiproduct firms are not typically available, but both firm level aggregates over products, and product level aggregates over firms, often are. We illustrate with a firm location application adapted from de Loecker et al. (2010) that uses the trade data. An alternative illustration would have been to use the available data on multiproduct firms in conjunction with the product level aggregates over firms to analyze multiproduct cost functions. The output of each firm at each location is known, but where that output is sold is not. The fixed costs of firm i in location d are f (zi d θ), while its marginal costs are m(zi d; θ) (both vary with location of production and firm characteristics). Let qid be the quantity firm i produces and let qie = qe + e be the quantity it exports to market e. Here qe is average exports to e, so ν2i e ν i 2i ≡ 0 (∀e). c(d e) is the transportation costs (which vary with the location of production d and consumption e). We observe qid and zi , and measure the firm’s total cost up to a ν1 error, or r(·), where r(zi d; θ) = f (zi d; θ) + m(zi d; θ)qid e + c(d e θ)qe + c(d e θ)ν2i + ν1id e
e
1804
A. PAKES
Each firm producing in d chose its location to minimize expected costs and each could have produced in counterfactual location d without changing the countries it sold to. Summing the expected cost difference in moving all e firms from d toe d , letting Q be (the observed) total exports to e, and using c(d e)ν = 0, we have 2i i e r(d d · θ) N −1 i
(f (d d · θ) + m(d d · θ)qid )
→P N −1 +
i
c(d d e θ)Qe ≤ 0
e
at θ = θ0 , which allows us to combine the microdata on firms costs and aggregate data on exports to bound the parameters of interest. Notice that we do so without having to either estimate demand functions or make a pricing assumption in each country. 2.5. Comparing the Two Approaches The two approaches differ in their informational (DC3 vs. PC3) and their measurement (DC4 vs. PC4) assumptions. They also differ in their computational properties, but these are discussed in the context of the Monte Carlo example in Section 4. PC3 nests DC3. PC3 allows for uncertainty and does so without having to specify either the agents’ information sets or their subjective probability distributions conditional on those information sets, objects we typically know little about. It also allows for expectational errors provided they are mean independent of the instruments. DC3 assumes agents know the returns from every choice and correctly optimize. Partly as a result, DC3’s primary use in multiple agent settings has been as a characterization of “rest points” of environments which are “stable” over time—a setting often invoked to justify the use of two period games to structure cross sectional empirical work. Notice that even in this setting, the combination of DC3 with DC4 only leads to consistent bounds if the profit (or value) function is constructed from correctly specified primitives. If instead a reduced form from regressing profits on variables of interest is used, the model contain a regression error. This violates the assumptions of the generalized discrete choice model and will lead to inconsistent estimates of the bounds on the reduced form parameters. We show how to adapt the discrete choice model to accommodate reduced forms below. The two measurement assumptions (PC4 and DC4) are not nested. Indeed the simplest of the measurement assumptions used to justify the two models are distinctly different. Sufficient conditions for PC4 are that the {ν2id } do not vary over d and that there are instruments which are uncorrelated with the
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1805
{ν1id }. DC4 requires that the {ν1id } do not vary over d and that there be a known joint distribution for the {ν2id }. When {ν2id } does vary over d, then to use the profit inequalities approach, we need to control for a selection problem. Our ability to do so typically depends on the richness of the set of feasible counterfactual choices and the appropriate form(s) for the heterogeneity. When {ν1id } does vary over d, then the generalized discrete choice model will not deliver consistent bounds. {ν1id } will vary over d if there is uncertainty about outcomes, measurement error, or specification error. For some of these cases, we can modify the generalized discrete choice model and its estimation algorithm to deliver consistent bounds. For a familiar example, consider the case where we are analyzing the profits from entering markets in which the true profit from entering is additively o o ν2ij ) = π as (dij = 1 d−ij zij θ0 ) + ν2ij , separable in ν2 , or π(dij = 1 d−ij zij as in equation (9) and the profits from not entering are normalized to zero. As in most entry models, we are interested in the reduced form obtained by o and the number of competitors, say i {dij = 1}. So regressing π as (·) onto zij o o r(dij = 1 d−ij zij θ) = zij θz +
{dij = 1}θd + ν2ij + ν1ij i
o where ν1ij is the regression error, so E[ν1ij |zij i {di = 1} ν2ij ] = 0 by construction. Recall that to obtain the inequalities used in estimation in this model, we have to check whether the Nash conditions are satisfied at a given θ; that o θz + {dij = 1}θd + ν2ij + ν1ij condiis, to check whether dij maximizes zij tional on d−ij for each i. To do this, we will need an assumption on the joint distribution of the ν1ij , as well as for the {ν2ij }; and the ν1ij must be mean independent of the {ν2ij }, dij , and zij , while the {ν2ij } are determinants of dij . Given the additional distributional assumption the estimation algorithm is analogous to that in Section 2.3, although a more complex simulator must be used. It is not as easy to accommodate measurement error in the generalized diso ∗ = zij + ν1ij and the agent responds to z ∗ (not crete choice framework. If zij o ∗ θz + i {dij = 1}θd + ν2ijd satisz ), estimation requires checking whether zij fies the Nash conditions. To obtain a simulator that would allow us to do so, we would need to draw from the distribution of z ∗ conditional on z 0 . This is typically not known nor is it easy to estimate. In contrast, classical measurement error does not affect the consistency of the profit inequality estimator. Finally DC4 requires explicit distributional assumptions on the {ν2ij }, while the profit inequality model relies only on mean independence assumptions. The need for an explicit distributional assumption is a concern, as the generalized discrete choice corrects for selection by finding the probability of ν2 values which induce the agent to make particular choices. Those probabilities depend on the properties of the tail of the ν2 distributions—properties we often know little about. The Monte Carlo example in Section 4 is designed to investigate
1806
A. PAKES
the robustness of the two models to violations of their assumptions and indicates that it is the distributional assumption of the generalized discrete choice model that is most problematic. 3. A MULTIPLE AGENT EXAMPLE (BUYER–SELLER NETWORKS) Vertical markets typically contain a small number of both sellers and buyers (who resell the products they buy to consumers). Most buyers buy from more than one seller, while most sellers sell to more than one buyer. The terms of the payments the buyer makes to the seller are negotiated and vary with underlying market conditions. These terms determine both the costs buyers factor into the prices they set when they remarket the goods they sell to consumers and the split of the profits between the sellers and the buyers, and hence the sellers’ incentives to invest in cost reductions (or product improvements). Unfortunately those terms are often proprietary; a seller bargaining with many buyers may not want one buyer to know the terms of its other contracts. Costs are also often proprietary in consumer goods markets. However, since these are markets with many purchasers, we typically assume sellers have the power to set prices (or quantities) in them. Then the first order conditions from a Nash equilibrium can be used to back out costs; that is, we can find the marginal costs that insure that no firm has an incentive to deviate from the observed prices. This section uses moment inequalities to unravel features of the payment structure in vertical markets in an analogous way. We observe which sellers establish contracts with which buyers and, were we to know the buyer’s cost function, could compute approximations to both the buyers and the sellers profits from (i) the existing arrangement and from (ii) a counterfactual in which one of the observed relationships is changed. So we proceed by parameterizing the buyer’s cost function and look for values of the parameter vector that, on average, make the profits from the observed contracts larger than those from possible counterfactuals; that is, values of θ that make the observed relationship in the interests of both agents. We analyze an HMO–hospital example. To see how market characteristics effect payments in this example, consider two different situations. In one a hospital with excess capacity in a neighborhood with several other similar noncapacity constrained hospitals is bargaining with an HMO. The HMO has already contracted with other neighborhood hospitals. Since there are similar options for consumers who require a hospital, the HMO’s attractiveness to consumers is relatively insensitive to the inclusion of the given hospital in its network. As a result, were the HMO to include that hospital it would not, in equilibrium, increase the premium it charges to consumers. So for the hospital’s contract offer to be accepted by the HMO, the contract would have to set hospital prices low enough for the HMO to prefer sending patients to that particular hospital rather than to its neighbors. On the other hand, if the hospital was the only hospital in the neighborhood, the HMO would be unlikely to
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1807
attract any customers from that neighborhood without having the hospital in its network. Then, provided it is in the HMO’s interest to operate in the neighborhood, the hospital should be able to extract nearly all the (hospital related) premiums that would be generated in that neighborhood. To use this logic in estimation, we will have to specify what the buyer (seller) would have expected to happen if he had made a counterfactual choice. This requires assumptions, although the assumptions need not specify the form of the contracts. We suffice with a reduced form for the payments generated by the contracts obtained by regressing the HMO’s per patient payments on hospital characteristics. In these respects, this paper follows the assumptions used in Ho’s (2009) work on HMO–hospital markets. Ho’s analysis assumes that there are no structural disturbances in her data (in our notation, ν2id = ν2i ∀d). We begin by showing that, by changing the moment inequalities taken to data, we can develop an estimation algorithm for her model that allows for both ν2 and ν1 errors. We then compare the results from estimators that do and that do not allow for ν2 errors. Next full information equilibria are computed from a structural buyer–seller network game with primitives similar to those in the empirical example. The reduced form implied by the computed equilibria is calculated, compared to the estimates obtained from the inequality estimators, and explored for possible additional correlates of contract characteristics. 3.1. Empirical Analysis Ho (2009) used a two period game to structure the analysis: in the first period, contracts between HMOs and hospitals are established, and in the second period, the HMOs engage in a premium setting game which conditions on those contracts (and is assumed to have a unique Nash equilibria). Once the premiums are set, consumers choose HMO’s and, if the need arises, choose a hospital in their HMO’s network. The premium setting game generates revenues for each HMO conditional on any configuration of hospital networks. Let Hm be a vector of dimension equal to the number of hospitals whose components are either 0 or 1, where a 1 indicates the hospital is in HMO m’s network and a 0 indicates it is not. H−m specifies the networks of the competing HMOs. The revenues the HMO receives from the premium setting game, say Rm (Hm H−m z), and the number of patients HMO m sends to hospital h, say qmh (Hm H−m z), depend on these networks and exogenous variables (our z). The profits of the HMO are the revenues from the second period game minus the transfers the HMO makes to the hospitals in its network in payment for their services, say Tmh or πmM (Hm H−m z) = Rm (Hm H−m z) −
h∈Hm
Tmh (Hm H−m z)
1808
A. PAKES
Analogously if ch is the per patient costs of hospital h and Mh is the hospital’s network of HMOs, the hospital’s profits are Tmh (Hm H−m z) πhH (Mh M−h z) = m∈Mh
− ch
qmh (Hm H−m z)
m∈Mh
We project Tmh onto a set of interactions of qmh (·) with a vector of hospital characteristics, say zh , and look for bounds on the resulting reduced form parameters; that is, if xmh (·) = qmh (·)zh are the interactions, we estimate the θ in Tmh (Hm H−m z) = xmh (Hm H−m z)θ + νmh where νmh are uncorrelated with xmh by construction. Note that if agents know more about the details of the contracts they sign than is captured by xmh (·), νmh has a component which is known to both agents when they make their decisions (a “ν2 ” component). Substituting this form of Tmh (·) into the two profit functions, we obtain πmM (· θ) ≡ RM (12) xmh (Hm H−m z)θ − νmh m (Hm H−m z) − πhH (· θ) ≡
h∈Hm
h∈Hm
xmh (Hm H−m z)θ
m∈Mh
− ch
qmh (Hm H−m z) +
m∈Mh
νmh
m∈Mh
These equations determine actual (in contrast to measured) realized profo its. Our measured variables, (Rom (·) xomh (·) qmh (·) cho ) are obtained either directly from data or from a careful study of hospital demand and the formation of HMO premiums described in Ho (2009). We assume that they are correct up to a mean zero measurement error. That is, our measure of profits for HMO m and hospital h given a value for θ is (13) xomh (·)θ rmM (· θ) ≡ Rom (·) − r (· θ) ≡ H h
h∈Hm
o mh
x
m∈Mh
θ − cho
m∈Mh
and our assumptions imply r (· θ) ≡ E [π (· θ)|Jm ] + E M m
o qmh (·)
M m
νmh |Jm + ν1mMh M−h
h∈Hm
r (· θ) ≡ E [π (· θ)|Jh ] − E H h
H h
m∈Mh
νmh |Jh + ν1hHm H−m
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1809
Counterfactuals Equation (12) provides the profits agents obtain from the observed network. To obtain our moment inequalities, we have to consider the profits, and hence the network, the agents thought would have obtained from counterfactual behavior, and this requires assumptions on the contracting game. Ho’s assumptions are both familiar and computationally convenient (we consider a less convenient alternative below). She assumes that sellers simultaneously make take it or leave it offers to buyers, who then simultaneously accept or reject. As in Hart and Tirole (1990) the contract offers are assumed to be proprietary: each HMO knows the offers made to it but not to its competitors, and each hospital knows the offers it makes but not those of its competitors. We observe which HMOs contracted with which hospitals and can compute our measures of returns from any network. To form our moment inequalities, we need to know the network that would be established were either an HMO or a hospital to change its behavior.11 The HMOs act last. So our assumptions imply that the HMO could reverse any of its decision without changing the behavior of any other agent. Accordingly our HMO counterfactuals are obtained by reversing the HMO’s acceptance–rejection decision with each of the hospitals in the market, leaving all other contracts unchanged, and computing the difference in the HMO’s profits between the actual and the counterfactual networks. To obtain a profit inequality for the hospital, we have to (i) specify an alternative offer the hospital could make and (ii) specify either what the hospital thinks the particular HMO would do were it offered the alternative contract or compute a lower bound to the profits the hospital could make as a result of the actions the HMO might take in response to the alternative contract. We assume that the hospital could always offer a null contract (a contract which is never accepted). What the hospital thinks the HMO would do if offered this contract depends on how the hospital thinks receiving the alternative contract would affect the HMO’s beliefs about the contracts offered to other hospitals, and given those beliefs, whether the hospital thinks the HMO would change its replies to the contracts offered by other hospitals. We assume “passive beliefs,” that is, the hospital believes that the HMO will not change its beliefs about the offers the hospital makes to other HMOs were it to receive the counterfactual offer, and present results which assume that the hospital thinks the HMO would not change its behavior with other hospitals were it to receive the null contract. However, we have also done the analysis assuming the hospital thinks the HMO might add a different hospital with little difference in empirical findings. 11 Since we assume that the premium setting game is a full information game, our assumptions are what McAfee and Schwartz (1992) refer to as “ex poste observability”; the HMOs do not know each other’s offer in the first period, but the costs in each accepted contract are revealed before the second stage of the game. This assumption could be relaxed at a cost of increasing the computational burden of estimation.
1810
A. PAKES
Inequalities Used We began with Ho’s assumption that E [νmh |Ji ] = 0 for i = (m h). Then the only disturbances in equations (13) are ν1 disturbances, so we can form our inequalities by interacting positive functions of variables that were known to the decision maker when the decision was made with the difference between our models’ estimates of the profits actually earned and those that would have been generated by our counterfactuals. Recall that these are the HMO’s profits from reversing its decision with each hospital and the hospital’s profits from offering a null contract to an HMO which had accepted its offer. Next we considered alternative ways to allow νmh to have a ν2 component, that is, we allowed E [νmh |Ji ] ≡ ν2mh = 0 and the same value for i = m h. We first tried ν2mh = ν2m ∀(m h); that is, that the ν2 are HMO-specific fixed effects. As shown in the Appendix, we can then use PC4a to generate a quite detailed set of inequalities. There is no a priori reason to assume a fixed effects structure here: when we did, it accentuated the problems with the ν1 -only model,12 so we used the generalized version of PC4b in footnote 10 to develop an estimator for the buyer–seller network problem that allows for a ν2mh of a general form. Recall that the νmh are a component of transfers, so the same νmh value that goes into a hospital’s revenues is a component of an HMO’s costs. Let χmh be the indicator function for whether a contract is established between m and h with χmh = 1 if it is established and 0 if not. These are the only two outcomes possible. So to satisfy PC4b, we need an inequality which is additively separable in νmh regardless of whether χmh = 0 or 1. Let πhH (Mh Mh /m M−h z) be the difference between the hospital’s profit when the network of the hospital includes HMO m and when it does not. If χmh = 1, this contains νmh . Let πmM (Hm Hm ∪ h H−m z) be the difference between the HMO’s profit were it to reject hospital h’s contract and were it to accept it. If χmh = 0, this includes the savings in νmh from rejecting the contract. Note that if χmh = 1, E [πhH (Mh Mh /m M−h z)|Jh ] ≥ 0, while if χmh = 0, E [πmM (Hm Hm ∪ h H−m z)|Jm ] ≥ 0. Using analogous notation for r(·), equation (13) implies χmh rhH (Mh Mh /m M−h z; θ) + (1 − χmh )rmM (Hm Hm ∪ h H−m z; θ) 12 In the ν1 -only model, about 12% of the inequalities were negative, but under 2% were individually significant at the 5% level. In the model with fixed effects, about a third of the inequalities were negative and 10% were significant at the 5% level. A more complete analysis of effects models in buyer–seller networks would allow for both buyer and seller effects. This is a straightforward, although somewhat tedious, extension of the results in the Appendix. We examine the HMO effects case in detail because all the contract correlates we use in our analysis are hospitalspecific and we wanted to make sure that the absence of HMO characteristics did not bias the analysis of the impacts of these hospital-specific variables.
ALTERNATIVE MODELS FOR MOMENT INEQUALITIES
1811
= χmh E [πhH (Mh Mh /m ·)|Jh ] + ν2mh + ν1m· + (1 − χmh ) E [πmM (Hm Hm ∪ h ·)|Jm ] + ν2mh + ν1h· = χmh E [πhH (Mh Mh /m ·)|Jh ] + ν1m· + (1 − χmh ) E [πmM (Hm Hm ∪ h ·)|Jm ] + ν1h· + ν2mh which is additive in ν2mh regardless of whether χmh is 1 or 0. So there is no selection and provided x ∈ Jm ∩ Jh , E[ν2mh |x] = 0, and h(·) ≥ 0, PC3 insures that at θ = θ0 , (14)
E(χmh rhH (Mh Mh /m M−h z; θ) + (1 − χmh )rmM (Hm Hm ∪ h H−m z; θ))h(x) ≥ 0
The model also delivers an inequality that does not depend on the νmh (as in PC4a). The sum of the increments in profits to the HMO and the hospital when a contract is established does not contain the transfers between them (and hence νmh ), does contain information on θ (since if the contract is not established there is a change in transfers to other agents), and must have positive expectation (at least if contract offers are proprietary). So if Hm / h is the observed network of hospital m minus hospital h, our conditions on x also insure E χmh (rmM (Hm Hm / h ·; θ0 ) + rhH (Mh Mh /m ·; θ0 ))h(x) ≥ 0 (15) Estimates Neither the ν1 -only nor the model which allowed for ν2 could be rejected by our formal tests. This is not surprising given the sample size.13 However, the results did seem to favor the model that allows for ν2 , as only 6 of the inequalities were negative at its estimated parameter value (the ν1 -only model 13 There were 40 markets containing about 450 plans and 630 hospitals. The market characteristics used as instruments were indicators for the quartile of the market’s population size, high (greater than mean) share of population aged 55–64, and hospitals integrated into systems. The plan characteristics were indicators for whether the plan was local, its quartile of the breast screening distribution, the quality of its mental health services, and an interaction between the last two variables. The hospital cost measure was not used as an instrument because we were worried about measurement error in that variable. The results reported here weighted the market averages of the moment inequalities by the square root of the number of plans in the market, as this produced slightly smaller confidence intervals (interestingly weighting by the variance of the moment inequalities did not improve those intervals). Confidence intervals for each dimension are computed using the techniques in Pakes et al. (2006); a Monte Carlo study of their properties is available from the author). Finally, Ho (2009) reported a series of robustness checks on the ν1 only estimates of a model which is similar to the model presented here. Although specifications which add right-hand side variables sometimes increase the confidence intervals quite a bit, the qualitative results in our column 1 in Table I are never reversed.
TABLE I
DETERMINANTS OF HOSPITAL–HMO CONTRACTS^a

Data:                      Real Data^b                                    Simulated Data^c
Estimator:                 Inequality Estimators                          OLS Regression, Actual Markups
                           ν1-Only               ν1 and ν2
Column:                    (1)       (2)         (3)       (4)            (5)      (6)       (7)      (8)
                           θ         95% CI      θ         95% CI         θ        s.e.      θ        s.e.
                                     UB/LB                 UB/LB

Per patient markup (units = $thousand/patient):
Const.                     9.5       15.4/4.8    8.2       15.2/3.3       8.9      0.09      3.7      0.24
CapCon                     3.5       8.6/1.4     13.5      16.1/2.3       1.2      0.10      0.48     0.11
Cost/Adm.                  −0.95     −1.5/−0.57  −0.58     −0.2/−1.1      −0.39    0.01      –        –
Av.Cost                    –         –           –         –              –        –         −0.23    0.01
Cost − AC                  –         –           –         –              –        –         −0.56    0.01
Pop/bed                    –         –           –         –              –        –         0.11     0.01
# Patients                 –         –           –         –              –        –         −0.09    0.01
HMOmarg                    –         –           –         –              –        –         1.4      0.10
R2                         –         –           –         –              0.71               0.80

^a Notation: CI, confidence interval; s.e., standard error; UB, upper bound; LB, lower bound.
^b Real Data: There are 40 markets. CapCon measures whether the hospital would be capacity constrained if all hospitals contracted with all HMOs. Cost/Adm denotes hospital cost per admission. Costs and admissions are not elements of the instrumental variables (IV).
^c Simulated Data: These are least squares regression coefficients from projecting computed markups onto the included variables. See below for the calculation of equilibrium markups. There are 1,385 markets with two HMOs and two hospitals in each. This generates approximately the same number of buyer–seller pairings as in the data set used in the empirical analysis. Additional variables are defined as follows: Cost − AC is the cost per admission of the hospital minus the average of that over the hospitals in the market, Pop/bed is population over the total number of hospital beds in the market, # Patients is the number of patients the HMO sends to the hospital, and HMOmarg is the HMO's average premium minus its average cost.
The first four columns of Table I present the empirical results. We subtracted our estimate of hospital costs from the revenues in all specifications, so the coefficients appearing in the table are the coefficients of the markup implicit in the per patient payment. Despite the fact that none of the test statistics we computed was significant at the 5% level, there was no value of θ which satisfied all the inequality constraints in any specification, a finding that is not unusual when there are many inequalities (all our specifications had 88 or more of them). The algorithm then generates a point estimate equal to the θ value that minimizes a squared metric in the negative part of the sample moments.
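As a concrete illustration of that last step, here is a minimal sketch (not the authors' code) of turning a vector of sample moments into a point estimate by minimizing the squared norm of their negative parts; `sample_moments` stands in for whatever routine computes the inequalities above at a trial θ.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def negative_part_objective(theta, sample_moments):
    """Sum of squared violations of the moment inequalities at theta."""
    m = sample_moments(theta)               # vector of sample moment averages
    return np.sum(np.minimum(m, 0.0) ** 2)  # only negative (violated) moments contribute

# Toy stand-in for the real moment function: inequalities satisfied on [10, 14].
def sample_moments(theta):
    return np.array([theta - 10.0, 14.0 - theta, 0.5 * (theta - 9.0)])

result = minimize_scalar(lambda t: negative_part_objective(t, sample_moments),
                         bounds=(0.0, 30.0), method="bounded")
print(result.x)  # any theta in [10, 14] attains the zero minimum
```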
Sample size limited the right-hand side variables we could use in the investigation. Still, the estimates we do get, though reduced form, are eye-opening. They imply an equilibrium configuration in which the majority of cost savings from low cost hospitals are captured by the HMOs, and markups increase sharply when a hospital is capacity constrained (CapCon measures whether the hospital would be capacity constrained if all hospitals contracted with all HMOs). Although these are not structural estimates, they do lead us to worry about the possibility of significantly lower incentives for hospitals to invest in either cost savings or in capacity expansion than would occur in a price-taking equilibrium. The difference between the ν1-only estimates and those that allow for ν2 is that the former imply that almost all the cost savings from low cost hospitals go to the HMOs, while the latter imply that just over 50% do and a larger fraction of profits go to capacity constrained hospitals. Low cost hospitals tend to be more capacity constrained, so the two variables are negatively correlated.

3.2. Numerical Analysis

Might we expect contracts with these characteristics to emanate from a contracting equilibrium, and should we interpret those coefficients to mean that an increase in the right-hand side variable would, ceteris paribus, generate the markup response we estimate? To shed some light on these issues, we computed equilibria to a structural contracting model in markets with characteristics similar to those in Ho's data, but with population scaled down to a size where we would expect to have two hospitals and two HMOs in each market (this made it possible to compute equilibria for many markets in a reasonable amount of time).14 We compute a full information Nash equilibrium to a game in which hospitals make take-it-or-leave-it offers to HMOs. The algorithm assumes that both hospitals choose among a finite set of couples of markups, one for each HMO, and that these markups are offered simultaneously to the HMOs. The offers are public information, as are the HMO premiums that would result from any set of contracts (these are obtained as the Nash equilibrium to a premium setting game among the HMOs). The HMOs then simultaneously accept or reject the offers. At equilibrium, each hospital is making the best offers it can given the offers of the other hospital and the responses of the HMOs, and each HMO is doing the best it can do given the actions of its competitor and the offers made by the hospitals.15

14 We used a discrete choice model of demand and market characteristics determined by random draws from demand and cost characteristic distributions that mimicked those in Ho's data. The closest exercise I know of is in a paper by Gal-Or (1997). By judicious choice of primitives, she is able to provide analytic results from a full information Nash bargaining game between two HMOs and two hospitals. She focuses on when her assumptions would generate exclusive dealing and its effects on consumers.

15 A more detailed description of the algorithm can be found in Lee and Pakes (2010). An iterative process with an initial condition in which both hospitals contract with both HMOs chooses among the equilibria when there are multiple equilibria. The choice set included 50 possible markups for each of the two hospitals. The algorithm starts with the lowest ones. It then determines whether HMO1 wants to reject one (or both) of the contracts conditional on HMO2 being contracted to both hospitals. This requires solving for equilibrium premiums and profits for HMO1 given each possible choice it can make and the fact that HMO2 is contracted to both hospitals. HMO2 then computes its optimal responses to HMO1's decisions in the same way. This process is repeated until we find a Nash equilibrium for the HMOs' responses. No matter the offers, we always found an equilibrium to this subgame. We then optimize over the first hospital's (say H1) offers, holding H2's offers fixed. For each offer, we repeat the process above until we find a Nash equilibrium for the HMOs' responses. This gives us H1's optimal offers given the initial offers by H2. Next H1's offers are held fixed and H2's offers are optimized against that. We repeat this process until we find a Nash equilibrium in offers. For 3% of the random draws of characteristics, we could not find an equilibrium, and those markets were dropped from the analysis. Note that when a hospital contracts with an HMO in equilibrium, it does not necessarily contract at the lowest offer that is consistent with the HMO accepting. Different offers change the HMO costs per patient. This changes the outcome of the premium setting game that the HMOs engage in and feeds back into hospital profits.
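The iteration in footnote 15 can be summarized in a short sketch. This is schematic code of my own, not the Lee and Pakes (2010) implementation: `hmo_equilibrium_responses` and `hospital_profit` are hypothetical routines that would solve the premium-setting subgame and evaluate a hospital's profit for a given pair of offer vectors.

```python
from itertools import product

def best_offers(offer_grid, rival_offers, hospital_id,
                hmo_equilibrium_responses, hospital_profit):
    """Return the offer pair (one markup per HMO) maximizing this hospital's profit,
    holding the rival hospital's offers fixed."""
    best, best_profit = None, float("-inf")
    for offers in product(offer_grid, repeat=2):          # one markup offered to each HMO
        acceptances = hmo_equilibrium_responses(offers, rival_offers)
        profit = hospital_profit(hospital_id, offers, rival_offers, acceptances)
        if profit > best_profit:
            best, best_profit = offers, profit
    return best

def solve_offer_equilibrium(offer_grid, hmo_equilibrium_responses, hospital_profit,
                            max_iter=100):
    """Alternate best responses in offers until neither hospital wants to change."""
    h1 = h2 = (offer_grid[0], offer_grid[0])              # start from the lowest markups
    for _ in range(max_iter):
        new_h1 = best_offers(offer_grid, h2, 1, hmo_equilibrium_responses, hospital_profit)
        new_h2 = best_offers(offer_grid, new_h1, 2, hmo_equilibrium_responses, hospital_profit)
        if (new_h1, new_h2) == (h1, h2):                  # a Nash equilibrium in offers
            return h1, h2
        h1, h2 = new_h1, new_h2
    return None                                           # mirrors the 3% of draws with no equilibrium found
```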
Note that these assumptions differ from those used in the empirical analysis. In this full information game, the necessary conditions for an equilibrium guarantee an outcome which is renegotiation-proof, while the necessary conditions for the asymmetric information game we took to data do not. The related questions of (i) when the different equilibrium notions are appropriate and (ii) whether the estimation results are sensitive to this choice are ones that research on buyer–seller networks will have to sort out. Although the contents of contracts are often proprietary, typically who contracts with whom is not. So if we were trying to model a set of relationships which have been stable over some time, we might only consider equilibria in which no two agents would find it profitable to recontract given the information on who is contracting with whom. Of course, the market we are studying may be constantly changing and negotiations might be costly. Then we might not expect the data to abide by a renegotiation-proof criterion, at least not one with costless renegotiation. Since all we need for estimation is a way to obtain a lower bound to the expected profits from a counterfactual choice, we could, at least in principle, obtain our inequalities from the difference between the actual profits and the minimum of the profits from a group of counterfactuals chosen to reflect different possible game forms (although the larger the group, the less tight our bounds and the larger the computational burden).

Numerical Results

Columns 5–8 of Table I present ordinary least squares (OLS) estimates from regressing the computed markups onto variables of interest. The first two of these columns show that the three variables that the empirical study focused on have the appropriate signs, are significant, and account for a large fraction, about 70%, of the variation in markups (or about 85% of the variance in transfers). Columns 7 and 8 add variables. The original three variables maintain their signs and remain significant but have noticeably different magnitudes; that is,
although the empirical results do pick up important correlates of the equilibrium payments, the reduced form parameter estimates should not be thought of as causal responses. The coefficients of the additional variables in column 7 are instructive. They imply that when the average hospital cost in the market goes up by 1%, the markups of the hospitals in the market go down by 23%, but if the difference between a hospital's cost and the average hospital cost goes up by 1%, the hospital's markup goes down by 56%. So a hospital's markup over costs depends on the costs of the other hospitals it is competing with. Hospitals earn higher markups in "tighter" markets (markets with lower ratios of population to the number of hospital beds) and, once we account for this, the effect of capacity constraints is greatly reduced (though not eliminated). HMOs seem to get a small quantity discount (the markups they pay are lower when they send more patients to the hospital), and hospitals earn higher markups when the HMOs they are dealing with charge their members higher markups. Finally, note that 20% of the variance in markups, or 8% of the variance in transfers, is not accounted for by our observables. Given the full information assumptions, this is ν2 variance. Even in a world where our equilibrium and functional form assumptions are correct, measurement error in hospital costs would cause ν1 error. So in this (and we suspect in most) empirical example, both types of errors are likely to be present.

4. SPECIFICATION ERRORS AND ALTERNATIVE ESTIMATORS

The generalized discrete choice model ignores ν1 errors and requires an a priori specification of the ν2 distribution, both assumptions which, if incorrect, can generate an inconsistency in its estimators. The profit inequality model which pays inadequate attention to possible sources of ν2 error will generate selection biases. This section asks what the impacts of these specification errors are likely to be in the context of our buyer–seller network example. It presents Monte Carlo results from using each of the two models' estimators both (i) when that model's assumptions are the assumptions generating the data and (ii) when they are not. Where possible, we will also present results from Ho's data.

Details of the Monte Carlo Analysis

The Monte Carlo results are based on a population of 100,000 markets whose equilibria were computed using the algorithm described in the last section. We estimate one parameter: the average per patient markup. To obtain the true value of that parameter, we took the transfers implicit in the equilibrium offers and projected them onto the number of patients and the variables
we used as instruments.16 The function obtained from this projection is treated as the parametric transfer function. The coefficients of the instruments are treated as known and the coefficient of the patient variable is the coefficient to be estimated. The residual from this projection is the ν2 error. This insures that ν2 has zero covariance with our instruments before we condition on the outcome. When all we require is a ν1 error, we treat these ν2 as known and add pseudorandom draws on a normal measurement error to hospital costs and/or population size.

The Monte Carlo results are based on 400 data sets, each obtained as independent draws from our "population" of markets. The sample size was set so that the number of contracts in each sample matched the number of contracts in Ho's (2009) data set. Since each of the computed equilibria had only two hospitals and two HMOs, this gave us a larger number of markets (1,385 markets per sample), but many fewer contracts per market, than in Ho (2009). The ν1 and ν2 draws are taken independently across samples.17

The inequalities used to estimate the profit inequality model are the same as those used in the empirical work: each HMO reverses its equilibrium decision with each hospital and each hospital replaces its equilibrium contract offer to each HMO with a null contract. However, since the Monte Carlo data are generated from a full information Nash equilibrium, when the hospital offers a null contract to an HMO, that hospital considers the profits that would accrue to it were both HMOs to reoptimize.18 For the generalized discrete choice approach, we used the inequalities generated by the necessary conditions for equilibrium.19

16 For accepted offers, these were the actual transfers; for the offers that were rejected, these are the transfers that would have resulted if the last offer had been accepted.

17 Actually we did the analysis in two ways. In the second, we drew a Monte Carlo data set, took 200 draws on vectors of ν1 errors for that data set, tabulated the results for each data set, and then averaged over data sets. This provides confidence intervals that condition on the observables, while the results reported in the text do not. The results from the two procedures were virtually identical.

18 To obtain the π(·) resulting from the null contract offer, let o_{mh} be the contract offered by hospital h to HMO m in equilibrium, and let φ be the null contract. If h = 1 contracted with m = 1, its profits from offering φ are obtained from the HMO equilibrium responses to the tuple (φ, o_{12}, o_{21}, o_{22}).

19 Note that the estimating equations used do not exhaust the information in the data in either approach. At the cost of increasing the computational burden, we (i) could have used the inequalities obtained from simultaneously switching each HMO's behavior with respect to both hospitals in the profit inequality approach (and if more details on the contracts were available yet other inequalities would become available), and (ii) for the generalized discrete choice approach, we could have computed the probability that the observed equilibrium was unique.

Results

Table II, which presents the results, is split into panels. Panel A provides estimates obtained using the ν1-only inequalities, panel B using the ν2-only inequalities (the inequalities from the generalized discrete choice model), and panel C using the inequalities that allow for both ν1 and ν2 disturbances.
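For readers who want to replicate the flavor of the data construction in the Monte Carlo details above, the following is a minimal sketch under my own simplifying assumptions (it is not the authors' code): transfers are projected onto the instruments by least squares, the residual is kept as the ν2 error, and a normal measurement error in hospital costs plays the role of ν1.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_disturbances(transfers, patients, instruments, cost, cost_noise_share=0.25):
    """Project transfers on (patients, instruments); keep the residual as nu2.
    Add normal measurement error to costs as a nu1-type disturbance."""
    X = np.column_stack([patients, instruments])
    coef, *_ = np.linalg.lstsq(X, transfers, rcond=None)
    nu2 = transfers - X @ coef                    # orthogonal to the regressors by construction
    theta_true = coef[0]                          # coefficient on the patient variable
    noisy_cost = cost + rng.normal(0.0, np.sqrt(cost_noise_share * cost.var()), size=cost.shape)
    return theta_true, nu2, noisy_cost

# Tiny synthetic example (all numbers made up).
n = 200
patients = rng.poisson(30, n).astype(float)
instruments = np.column_stack([np.ones(n), rng.normal(size=n)])
cost = rng.normal(5.0, 1.0, n)
transfers = 16.76 * patients + instruments @ np.array([2.0, 0.5]) + rng.normal(size=n)
theta_true, nu2, noisy_cost = build_disturbances(transfers, patients, instruments, cost)
```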
TABLE II
INEQUALITY ESTIMATORS: SIMULATED AND ACTUAL DATA^a

                                                             Average              95% of θ in
Disturbances                              Not in IV         LB        UB          LB        UB

A. Using ν1 inequalities
Simulated data: θ0 = 16.76, identified set [13.47, 18.59]
  Only ν1 disturbances
  1. 25% Cost                             Cost              12.39     18.72       12.12     19.05
  2. 25% Cost, 5% pop                     Cost, Njk, pop    11.43     37.34       11.30     45.88
  ν1 and ν2 disturbances
  3. ν2, Costs                            Cost              12.25     18.42       12.01     18.86
  4. ν2, Costs, pop                       Cost, Njk, pop    11.69     35.91       11.55     43.97
Ho's (2009) data with ν1 inequalities
  5. Actual disturbances                  Cost              8.2       8.2         2.3       16.4

B. Using ν2 inequalities
Simulated data, only constant as IV: θ0 = 16.76, identified set [16.7, 16.95]
  6. ν2 (bootstrap dist.)                                   17.75     17.75       17.25     18.1
  7. ν2 (normal dist.)                                      17.92     17.92       17.2      18.45
Simulated data, all IV: θ0 = 16.76, identified set [16.7, 16.85]
  8. ν2 (bootstrap dist.)                                   17.84     17.84       17.40     18.25
  9. ν2 (normal dist.)                                      18.02     18.02       17.65     18.5
Simulated data, ν1 (in costs) and ν2 disturbances
  10. ν2 ∼ N, IV = only constant                            18.02     18.02       17.35     18.5
  11. ν2 ∼ N, all IV                      Costs             18.11     18.11       17.64     18.65
Ho's (2009) data with ν2 inequalities
  12. Assume ν2 normal                                      Could not compute

C. Using robust inequalities
Simulated data
  13. ν2, Costs                           Cost              11.86     n.b.        11.72     n.b.
  14. ν2, Costs, pop                      Cost, Njk, pop    11.69     n.b.        11.55     n.b.
Ho's (2009) data
  15. Actual disturbances                 Cost              11.7      11.7        3.6       17.9

^a Instruments for panels A and C (unless omitted) are a constant, Njk, hospital cost and capacity measures, market cost, capacity, and population measures, HMO characteristics, and interactions among them. Instruments for ν2 inequalities are market averages of the above variables. The model for line 15 also allowed for a cost coefficient; without it the average markup was negative.
The "true" value of θ0 from the simulated data was 16.76. The "identified set," that is, the θ interval that satisfies the population moment conditions, differs across panels. Since the instruments are orthogonal to the disturbance by construction, the identified set for panel A is the θ interval which generates positive population
profit inequalities when we set all disturbances to zero: [13.47, 18.59]. The true identified set for the generalized discrete choice model depends on the true joint distribution of the ν2's conditional on the market's instruments. This is not known and is too complex to estimate nonparametrically—a problem which is likely to recur in empirical work. To get a sense of the identified set generated by this approach, we set all the disturbances to zero and, for each possible network structure, found the set of θ which leads to positive values for (i) the averages of the differences between the indicator functions for satisfying the Nash conditions and the observed equilibrium outcome, and (ii) the same averages after interacting the difference in indicator functions for each network structure with the variables used as instruments. The interval for (i) was [16.7, 16.95], while for (ii) it was [16.7, 16.85]—both rather amazingly short.

The first two rows of panel A provide results from the ν1-only profit inequality model when there are only ν1 errors (so its estimators are consistent). Row 1 adds measurement error in costs equal to 25% of the true cost variance. The average of the estimated lower bounds is 8% lower than the true lower bound of the identified set, while that of the upper bound is 2.5% higher than its upper bound. Moreover, the bounds are precisely estimated: less than 2.5% of the lower (upper) bounds were more than 10% different from their true values. When we add an expectational error to the population, and hence to the patient flows from the HMOs to the hospitals, the estimated interval gets substantially larger. This is unfair to the model since, although there may be uncertainty in the relevant population size and patient flows variables when contracts are signed, we should be able to construct good instruments for them from current population size and flows, and we did not do that here. We keep this case because it allows us to examine the impact of specification errors in one setting where the bounds define a short interval and one where they do not.

Rows 3 and 4 use a simulated data set that contains both ν1 and ν2 errors but the inequalities from the ν1-only model. The ratio of the variance in ν2 to the variance in the dependent variable is 12.7%. Now the estimated bounds are inconsistent; the lower bound will, in the limit, be too large, while the upper bound will be too low. This makes the bounds move toward θ0, but they may overshoot, leaving us with an estimator which does not cover the true θ0. Adding ν2 also adds variance to the estimators, so in any finite sample, the estimated bounds may be smaller or larger with ν2 errors than without them. In the case with only measurement error in costs, the case in which the interval was tightly estimated, adding specification error in the form of the ν2 has little effect on any of the estimates. When there is also measurement error in population and the estimated intervals are larger, the effect of the specification error is to lessen the loosely estimated upper bound, but only by 5%. At least in this example, estimates from the ν1-only inequalities do not change much when we allowed for ν2 error. Apparently when we add ν2 variance, its biasing effects on the estimates are largely offset by the effect of increased data variance on those estimates.
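As an illustration of the identified-set calculation described above (setting all disturbances to zero and collecting the θ values at which every population moment inequality holds), here is a minimal sketch of my own with a toy moment function standing in for the model's population moments.

```python
import numpy as np

def identified_set(population_moments, theta_grid):
    """Return the grid points at which all population moment inequalities hold."""
    keep = [theta for theta in theta_grid
            if np.all(population_moments(theta) >= 0.0)]
    return (min(keep), max(keep)) if keep else None

# Toy moment function whose inequalities hold exactly on [13.47, 18.59].
def population_moments(theta):
    return np.array([theta - 13.47, 18.59 - theta])

print(identified_set(population_moments, np.linspace(0.0, 30.0, 3001)))
```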
The last row of panel A provides the estimates when we use Ho’s (2009) data with this specification. This generates a point estimate about a third lower than the lower bound estimate from the simulated data, and a confidence interval of length between that of the model with errors in the population and that without those errors. Panel B provides the results when we use the ν2 -only inequalities. To use the ν2 -only algorithm, we need an assumption on the joint distribution of the ν2 . We tried two assumptions: (i) random draws from the empirical distribution of the actual ν2 and (ii) a normal distribution. The first option would not be available to empirical researchers, but might be closer to the true population distribution (it would be if the ν2 were truly independent, not just mean independent, of the instruments and had no within market correlation). Regardless of whether we use just the constant term (rows 6 and 7) or all of our instruments (rows 8 and 9) and regardless of the choice of ν2 -distribution, the ν2 -only model generates point estimates whose values are larger than the true θ0 . The distribution of estimates had little variance, so the interval formed from 95% of the point estimates does not cover the true θ0 either. Apparently the lack of information on the ν2 -distribution leads to an inconsistency. Although this is disturbing, the asymptotic bias is not large; the lower bound of the (normal) confidence interval is about 2.6% larger and the point estimate is 6.9% larger than θ0 . Just as we added ν2 variance to the algorithm which uses the ν1 -only inequalities, rows 10 and 11 add ν1 variance to the algorithm which uses the ν2 -only inequalities. The estimates presented in these rows use the normal distribution of the ν2 (an empirical researcher would not have access to the bootstrap distribution). Adding ν1 errors does tend to increase the parameter estimates further, but by a surprisingly small amount. We could not use the ν2 -only algorithms on Ho’s actual data. To do so we would have had to compute about 100,000 premium setting equilibria and their implied profits for each ν2 draw and each θ evaluated in estimation—a task that will be beyond our computational abilities for some time. Panel C provides the estimates obtained when we used the inequalities that allow for both ν1 and ν2 disturbances—the “robust” inequalities in equations (14) and (15). The fact that there are only two agents on each side of the simulated markets implies that the robust inequalities do not deliver an upper bound. The lower bound is lower than the bound obtained when we used the ν1 -only inequalities on this data, but not by much. When we move to Ho’s (2009) data and use the robust inequalities, we get an estimate which is larger than the estimate which allows for only ν1 errors but a confidence interval of similar length, and both confidence intervals cover both estimates. Interestingly, once we allow for ν2 errors, the estimates from Ho’s data are closer to, and the confidence interval covers, the value of the parameter obtained from the numerical analysis.
The results from the Monte Carlo analysis are quite encouraging. It seems that the most salient problem is the requirement of an assumption on the distribution of ν2 in the generalized discrete choice model. In multistage games, that estimator also carries a large computational burden. However, the worry that a moderate amount of ν1 variance in the generalized discrete choice model or a moderate amount of ν2 variance in the profit inequality model would severely bias the estimates is, at least in this example, not warranted. The addition of the unaccounted-for error adds variance, as well as bias, to the estimates. This variance tends to move the bounds in the opposite direction to the bias, and in our example, the net effect was small. The estimator which uses the robust inequalities is least subject to bias but does generate larger identified sets.

5. SUMMARY

This paper formulates two sets of assumptions that enable one to bring behavioral models—both their structural and their reduced forms—to data and applies them to two empirical problems. Our first example illustrates that the assumptions underlying traditional discrete choice estimators are not always the most sensible choice for discrete choice problems. This motivates an enumeration of assumptions that justify alternative estimators in both multiple and single agent settings. An empirical example illustrates how the multiple agent estimator can be used to analyze a problem which is central to the determinants of prices and investment incentives in vertical markets: the correlates of the profit split between buyers and sellers in those markets. Although the results were reduced form and had to make do with both limited data and the auxiliary assumptions required to obtain counterfactual profits, they were broadly consistent with results obtained from a numerical analysis of equilibrium contracts in markets which were similar to those used in the empirical analysis. A Monte Carlo analysis indicated that the estimators from both models were surprisingly robust to all likely sources of problems but one: the need to assume a distribution for ν2 in the generalized discrete choice model. It seems that moment inequalities open up possibilities for empirically analyzing market interactions in relatively unexplored, yet important, settings.

APPENDIX: INEQUALITIES FOR BUYER–SELLER NETWORK WITH FIXED EFFECTS

We use the notation introduced for the hospital–HMO problem in Section 3.1 and consider the case in which the {ν_{2,mh}} are HMO fixed effects; that is, ν_{2,mh} = ν_{2,m} for all (h, m). These restrictions generate two sets of inequalities. The first is a difference-in-differences inequality. If an HMO accepts at least one hospital's contract and rejects the contract of another, then the sum of
the increment in profits from accepting the contract accepted and rejecting the contract rejected (i) differences out the HMO effect and (ii) has a positive expectation. More formally, for every h̃ ∉ H_m and h ∈ H_m, we have

    π_m^M(H_m, H_m ∪ h̃, ·) + π_m^M(H_m, H_m\h, ·) = r_m^M(H_m, H_m ∪ h̃, ·) + r_m^M(H_m, H_m\h, ·),

which implies that, provided x ∈ J_m ∩ J_h and h(·) is a positive valued function,

    E[r_m^M(H_m, H_m ∪ h̃, ·; θ_0) + r_m^M(H_m, H_m\h, ·; θ_0)]h(x) ≥ 0.

For the second inequality, note that if ν_{2,mh} = ν_{2,m}, we can use the logic leading to equation (14) in the text to show that for any positive valued function h(·),

    0 ≤ E[(1/#H) Σ_h (χ_{mh} π_h^H(M_h, M_h\m, ·) + (1 − χ_{mh}) π_m^M(H_m, H_m ∪ h, ·)) h(x)]
      = E[((1/#H) Σ_h (χ_{mh} r_h^H(M_h, M_h\m, ·; θ) + (1 − χ_{mh}) r_m^M(H_m, H_m ∪ h, ·; θ)) + ν_{2,m}) h(x)]
      ≡ E[(S^r(m, ·; θ) + ν_{2,m}) h(x)].

This implies that E[S^r(m, ·; θ_0) h(x)] ≥ −E[ν_{2,m} h(x)] and, consequently, that for any x ∈ J_m ∩ J_h,

    E[r_m^M(H_m, H_m\h, ·; θ_0) + S^r(m, ·; θ_0)]h(x) ≥ 0.

REFERENCES

ANDREWS, D., AND G. SOARES (2010): "Inference for Parameters Defined by Moment Inequalities Using Generalized Moment Selection Procedures," Econometrica, 78, 119–157. [1785]
ANDREWS, D., S. BERRY, AND P. JIA (2006): "Confidence Regions for Parameters in Discrete Games With Multiple Equilibria, With an Application to Discount Chain Store Location," Mimeo, Yale University. [1783,1794,1797]
BAJARI, P., H. HONG, AND S. RYAN (2010): "Identification and Estimation of a Discrete Game of Complete Information," Econometrica (forthcoming). [1798]
BECKERT, W., R. GRIFFITH, AND L. NESHEIM (2009): "Disaggregate Demand Elasticities at the Extensive and Intensive Margins: An Empirical Analysis of Store and Shopping Basket Choices," Mimeo, Institute for Fiscal Studies, London. [1788]
BERESTEANU, A., AND F. MOLINARI (2008): “Asymptotic Properties for a Class of Partially Identified Models,” Econometrica, 76, 763–814. [1798] CHERNOZHUKOV, V., H. HONG, AND E. TAMER (2007): “Estimation and Confidence Regions for Parameter Sets in Econometric Models,” Econometrica, 75, 1243–1284. [1785] CILIBERTO, F., AND E. TAMER (2009): “Market Structure and Multiple Equilibria in the Airline Markets,” Econometrica, 77, 1791–1828. [1783,1794,1797] DEKEL, E., D. FUDENBERG, AND D. LEVINE (1993): “Payoff Information and Self-Confirming Equilibrium,” Journal of Economic Theory, 89, 165–85. [1799] DE LOECKER, J., M. MELITZ, AND A. PAKES (2010): “Plant Location in the European Chemical Industry,” Mimeo, Princeton University. [1803] DUBIN, J., AND D. MCFADDEN (1984): “An Econometric Analysis of Residential Electric Appliance Holding and Consumption,” Econometrica, 52, 345–362. [1787] GAL -OR, E. (1997): “Exclusionary Equilibria in Health-Care Markets,” Journal of Economics and Management Strategy, 6, 5–43. [1813] HAILE, P., AND E. TAMER (2003): “Inference With an Incomplete Model of English Auctions,” Journal of Political Economy, 111, 1–51. [1784] HANSEN, L. P., AND K. J. SINGLETON (1982): “Generalized Instrumental Variables Estimation of Nonlinear Rational Expectations Models,” Econometrica, 50, 1269–1286. [1784] HART, O., AND J. TIROLE (1990): “Vertical Integration and Market Foreclosure,” Brookings Papers on Economic Activity: Microeconomics, 205–286. [1809] HAUSMAN, J., J. ABREVAYA, AND F. SCOTT-MORTON (1998): “Misclassification of the Dependent Variable in a Discrete Choice Setting,” Journal of Econometrics, 87, 239–269. [1795] HO, K. (2009): “Insurer–Provider Networks in the Medical Care Market,” American Economic Review, 99, 393–430. [1784,1807,1808,1811,1816,1817,1819] ISHII, J. (2008): “Compatability, Competition, and Investment in Network Industries: ATM Networks in the Banking Industry,” Mimeo, GSB, Stanford University. [1802] KATZ, M. (2007): “Supermarkets and Zoning Laws,” Unpublished Ph.D. Dissertation, Harvard University. [1785,1788] KEANE, M., AND K. WOLPIN (2009): “Empirical Applications of Discrete Choice Dynamic Programming Models,” Review of Economic Dynamics, 12, 1–22. [1795] LEE, R., AND A. PAKES (2010): “Supplement to ‘Alternative Models for Moment Inequalities’,” Econometrica Supplemental Material, 79, http://www.econometricsociety.org/ecta/Supmat/ 7944_data data and programs.zip. [1813] LUTTMER, E. (1996): “Asset Pricing in Economies With Frictions,” Econometrica, 64, 1439–1467. [1784] MANSKI, C. (2004): “Measuring Expectations,” Econometrica, 72, 1329–1376. [1795] MCAFEE, R., AND M. SCHWARTZ (1992): “Opportunism in Multilateral Vertical Contracting: Nondiscrimination, Exclusivity, and Uniformity,” American Economic Review, 94, 210–230. [1809] PAKES, A., J. PORTER, K. HO, AND J. ISHII (2006), “Moment Inequalities and Their Application,” Mimeo, Harvard University. [1784,1792,1798,1800,1803,1811] SAVAGE, L. (1954): The Foundations of Statistics. Hoboken, NJ: Wiley. [1791] TAMER, E. (2003): “Incomplete Simultaneous Discrete Response With Multiple Equilibria,” Review of Economic Studies, 70, 147–165. [1783,1794] VARIAN, H. (1982): “The Nonparametric Approach to Demand Analysis,” Econometrica, 50, 945–973. [1784]
Dept. of Economics, Harvard University, Littauer Room 117, Cambridge, MA 02138, U.S.A. and National Bureau of Economic Research;
[email protected]. edu. Manuscript received May, 2008; final revision received July, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 1823–1862
ADVERSE SELECTION IN COMPETITIVE SEARCH EQUILIBRIUM BY VERONICA GUERRIERI, ROBERT SHIMER, AND RANDALL WRIGHT1 We study economies with adverse selection, plus the frictions in competitive search theory. With competitive search, principals post terms of trade (contracts), then agents choose where to apply, and they match bilaterally. Search allows us to analyze the effects of private information on both the intensive and extensive margins (the terms and probability of trade). There always exists a separating equilibrium where each type applies to a different contract. The equilibrium is unique in terms of payoffs. It is not generally efficient. We provide an algorithm for constructing equilibrium. Three applications illustrate the usefulness of the approach, and contrast our results with those in standard contract and search theory. KEYWORDS: Search, matching, information, adverse selection, contracts.
1. INTRODUCTION WE ARE INTERESTED IN EQUILIBRIUM and efficiency in economies with adverse selection, plus the frictions in competitive search theory. In competitive search, principals on one side of the market post terms of trade—here they post contracts—and agents on the other side choose where to direct their search. Then they match bilaterally, although some principals and agents may fail to find a partner. Agents here have private information concerning their type. Although there is not much work that explicitly incorporates competitive search and information frictions simultaneously, with a few exceptions discussed below, we think that such an integration is natural. It is natural because it allows uninformed principals to try, through the posted terms of trade, to attract certain types and screen out others. We develop a general framework and obtain strong results: we prove that equilibrium always exists, which is not always the case with adverse selection, we prove that equilibrium payoffs are unique, we show that equilibrium is not generally efficient, and we provide an algorithm for constructing equilibrium. The model is useful for studying a variety of substantive economic issues. For example, in labor markets, an idea discussed in many papers using contract theory is that incentive problems related to private information can distort allocations in terms of hours per worker (the intensive margin). Another idea is that incentive problems may affect the probability that a worker gets a job in the first place (the extensive margin), and for this, search theory is useful. 1 We are grateful for comments from Daron Acemoglu, Hector Chade, Guido Lorenzoni, Giuseppe Moscarini, Iván Werning, Martin Gervais, Miguel Faig, numerous seminar participants, three anonymous referees, and a co-editor. Shimer and Wright thank the National Science Foundation for research support. Wright is also grateful for support from the Ray Zemon Chair in Liquid Assets. Guerrieri is grateful for the hospitality of the Federal Reserve Bank of Minneapolis.
It seems interesting to study models where both margins are operative to see how incentive problems manifest themselves in terms of distortions in either hours per worker or unemployment, or both. Similarly, in asset markets it is commonly thought that information frictions may be reflected by discounts in prices (again, the intensive margin). It is sometimes also suggested that these frictions may be reflected in liquidity or the time it takes to trade (the extensive margin). Again, it is interesting to allow both margins to operate in the same model. To this end, we think it is a good idea to pursue models that integrate information and search theory.2 In our framework, although agents observe everything that is posted and search wherever they like, matching is bilateral—each principal meets at most one agent and vice versa. Indeed, our results hold both when the only friction is the bilateral matching technology, with the short side of a market assured of matching, and, more generally when there is also a search friction, in the sense that principals and agents may simultaneously be left unmatched. We assume that the number of agents and the distribution of types is fixed, while the number of active principals is determined by free entry. As is standard, principals and agents potentially face a trade-off between the terms of trade and market tightness. For example, a worker might like to apply for a high-wage job, but not if too many others also apply. In our setup, principals also need to form expectations about market composition, that is, which types of agents search for a given contract. Under mild assumptions, including a single-crossing property, there always exists a separating equilibrium where each principal posts a contract that attracts a single type of agent. Because we get separating equilibria, we can assume without loss of generality that principals post contracts, as opposed to an ostensibly more general situation where they post revelation mechanisms. This simplifies the model. We also provide an algorithm, involving the solution to a sequence of optimization problems, which characterizes the equilibrium. This simplifies the analysis a lot. We provide a series of applications and examples to illustrate the usefulness of the framework and to show how some well known results in contract theory and in search theory change when we combine elements of both in the same model. The first application is a classic sorting problem (Akerlof (1976)). Suppose that workers are heterogeneous with respect both to their expected productivity and to their cost of working longer hours: more productive workers find long hours less costly. Contracts specify a combination of wages and hours of work but cannot be directly conditioned on a worker’s type (it is private information). In equilibrium, firms may require that more productive workers 2 There are too many contributions to provide a survey here, but by way of example, a well known paper where information and incentive issues distort hours per worker is Green and Kahn (1983), and one where they distort unemployment is Shapiro and Stiglitz (1984). Papers where informational frictions distort the terms of trade in the asset market include Glosten and Milgrom (1985) and Kyle (1985), and one where they distort the time it takes to trade is Williamson and Wright (1994). We discuss other papers in more detail below.
work longer hours than under full information—a version of the rat race. We discuss cases where, although hours are distorted, the probability that a worker gets a job is not, and other cases where there is also overemployment of high types. We find that the equilibrium can be Pareto inefficient if there are few low-productivity agents or the difference in the cost of working is small. Our second application is a version of the well known Rothschild and Stiglitz (1976) insurance problem. Consider a labor market interpretation, rather than pure insurance, as in the original model (only because in the labor market our assumption of bilateral meetings may seem more natural). Risk-averse workers and risk-neutral firms can combine to produce output, but only some pairs are productive. Workers differ in the probability that they will be productive once they are matched. Firms can observe whether a worker is productive but cannot observe a worker’s type. If a match proves unproductive, the worker is let go. In equilibrium, firms separate workers by only partially insuring them against the probability the match will be unproductive. Indeed, workers are worse off if they find a job and are then let go than they would be had they never found a job in the first place. We interpret this as an explanation for the fact that firms do not fully insure workers against layoff risk: if they did, they would attract low-productivity applicants. This example also illustrates how our approach resolves the nonexistence problem in standard adverse selection models like Rothschild and Stiglitz (1976). When there are relatively few low-productivity workers, equilibrium may not exist in that model for the following reason: given any separating contract, profit for an individual firm can be increased by a deviation to a pooling contract that subsidizes low-productivity workers. Here, such a pooling contract will not increase profit. The key difference is that in our model, firms can match with at most one worker, so a deviation cannot serve the entire population. Suppose a firm posts a contract designed to attract a representative cross section of agents. The more workers that search for this contract, the less likely it is that any one will match. This discourages some from searching. Critically, it is the most productive workers who are the first to go, because their outside option—trying to obtain a separating contract—is more attractive. Hence, only undesirable types are attracted by the deviation, making it unprofitable. Our third application—to asset markets—illustrates among other things how adverse selection can sometimes make it harder to trade without affecting the terms of trade. Principals want to buy and agents want to sell apples, meant to represent assets that could be high or low quality.3 As in Akerlof (1970), some apples are bad: they are lemons. To make the case stark, we can even assume there are no fundamental search frictions, so that everyone on the short side of the market matches. But it is important to understand that here the short side of a market is endogenous. In equilibrium, we show that sellers with good 3 Apples here stand in for claims to long-lived assets (trees) with uncertain dividends (fruit), as in standard asset-pricing theory.
apples trade only probabilistically, which is precisely how buyers screen low quality. This is different from related results in the literature (e.g., Nosal and Wallace (2007)), where lotteries are used to screen. Interestingly, screening through search saves resources, compared to lotteries. Still, equilibrium can be Pareto dominated by a pooling allocation if there are few bad apples. We also show that in some cases the market completely shuts down, an extreme lemons problem perhaps relevant for understanding the recent collapse in some credit markets. In terms of the literature, many papers propose alternative solutions to the Rothschild and Stiglitz (1976) nonexistence problem. As mentioned, a key difference in our paper is that matching is bilateral and that each principal can serve at most one agent. This can create distortions along the extensive margin and implies that principals must form expectations about which agents are most attracted to a contract. Ours is not the first paper to highlight the key role of capacity constraints in breaking the nonexistence result. Gale (1996) used a notion of competitive equilibrium with price-taking principals and agents in an environment with one-sided asymmetric information. He allowed for the possibility of rationing in equilibrium and recognized how this affects the composition of agents seeking particular contracts. We discuss in more detail the relationship between his notion of equilibrium and ours when we present the formal definition in Section 2; for now, we simply note that Gale (1996) did not prove existence or uniqueness, nor did he provide our simple characterization. More recently, research by Inderst and Wambach proved the existence of equilibrium in an adverse selection model with capacity constraints and a finite number of principals and agents. Inderst and Wambach (2001) developed a model that is close to our second example, although they found that the equilibrium is not unique due to the possibility of coordination failures from the independent outcomes of mixed search strategies. This is not an issue in our large economy. Inderst and Wambach (2002) developed a model that is a special case of our first example, and they proved existence and uniqueness of equilibrium.4 Due to the nature of the example, there is no rationing along the equilibrium path, although as they stress, rationing off the equilibrium path is key to sustaining equilibrium. In our third example, there is always rationing in equilibrium, a possibility that appears to be new to this literature. Ours is the first paper to develop a general framework for analyzing competitive search with adverse selection, and to present a variety of applications that stress distortions along both the intensive and extensive margins. Other papers follow Prescott and Townsend (1984) and study adverse selection in competitive economies without rationing. In particular, Gale (1992), Dubey and Geanakoplos (2002), and Dubey, Geanakoplos, and Shubik (2005) established existence and uniqueness of equilibrium. Key to these papers is the assumption that everyone takes as given the price and composition of traders 4 In Inderst and Wambach (2002), there are capacity constraints but no search frictions. A related working paper, Inderst and Müller (1999), also has search frictions.
for all potential contracts, not only the ones traded in equilibrium. A similar feature arises in our environment, although with the economically relevant difference that trading probabilities take the role of prices in clearing the market. In contrast, the Rothschild and Stiglitz (1976) equilibrium concept would allow principals to consider a deviation to a new contract with an arbitrary trading probability (or price). That is, trading probabilities (or prices) are not restricted by the market. This enlarges the set of potential deviations and potentially leads to nonexistence of equilibrium. Of course, by restricting the set of deviations, Gale (1992), Dubey and Geanakoplos (2002), Dubey, Geanakoplos, and Shubik (2005), and our paper must face the possibility that equilibrium may not be unique. This turns out to depend on how beliefs about the composition of traders are determined for contracts that are not traded in equilibrium. The basic problem is that if beliefs about untraded contracts are arbitrary, there may exist many equilibria in which contracts are not traded because of a concern that only bad traders are in those markets. In our paper, beliefs are based on a notion of subgame perfection.5 If a principal offers a contract that is not offered in equilibrium, he anticipates that it would attract only the agents who are willing to accept the lowest principal–agent ratio, and he evaluates the contract accordingly. In Gale (1992), beliefs are pinned down by a strategic stability requirement (Kohlberg and Mertens (1986)), which essentially requires that a small number of principals offer and a small number of agents search for each contract. In Dubey and Geanakoplos (2002) and Dubey, Geanakoplos, and Shubik (2005), beliefs are pinned down by an assumption that individuals are “optimistic.” More precisely, a small number of agents with the highest type search for each contract, which ensures that all agents believe that each contract will serve exclusively the type that finds it most desirable, as in our paper. We view our notion of equilibrium as the simplest and our finding that rationing, rather than distortion of contracts, may occur in equilibrium as substantively important. Still, the relationship between the different approaches and similarities of the conclusions is interesting. An older literature resolves the Rothschild–Stiglitz nonexistence problem by modifying the game. Miyazaki (1977) and Wilson (1977) allowed principals to withdraw contracts after other principals deviate. This makes deviations less profitable and typically leads to the existence of a pooling equilibrium where some agents cross-subsidize others. Riley (1979) let principals add new contracts to the menu after other principals deviate, allowing the least-cost separating equilibrium to survive. We remain closer to Rothschild and Stiglitz 5 Burdett, Shi, and Wright (2001) proved that a competitive search equilibrium is the limit of a two stage game with finite numbers of homogeneous buyers and sellers. In the first stage, sellers post prices. In the second, stage, buyers use identical mixed strategies to select a seller. Each seller who attracts at least one buyer sells a unit to one randomly selected buyer, while the remaining buyers and sellers do not trade. Buyers’ strategies must be an equilibrium following any choice of prices in the first stage, and sellers anticipate those strategies when they set prices.
(1976), allowing principals to commit to contracts, and find that there is no cross-subsidization in equilibrium. Our paper is also related to a large literature on competitive search (Montgomery (1991), Peters (1991), Moen (1997), Shimer (1996), Acemoglu and Shimer (1999), Burdett, Shi, and Wright (2001), Mortensen and Wright (2002)). A few earlier papers proposed extensions of the competitive search framework to an environment with private information (Faig and Jerez (2005), Guerrieri (2008), Moen and Rosén (2006)). However, in all these papers, agents are ex ante homogeneous and their private information is about the quality of the match. The rest of the paper is organized as follows. In Section 2, we develop the general environment, define equilibrium, and discuss some critical assumptions. In Section 3, we show how to find equilibrium by solving a constrained optimization problem, prove that a separating equilibrium always exists, and show that equilibrium payoffs are unique. In Section 4, we define a class of incentive-feasible allocations so that we can discuss whether equilibrium outcomes are efficient within this class. In Sections 5–7, we present the applications discussed above, in each case characterizing equilibria and discussing efficiency. Section 8 concludes. Proofs are relegated to the Appendix. 2. THE MODEL 2.1. Preferences and Technology There is a measure 1 of agents, a fraction πi > 0 of whom are of type i ∈ I ≡ {1 2 I}. Type is an agent’s private information. It could index, for instance, productivity or preferences if agents are workers; if they are sellers of assets or commodities, type could represent the quality of their holdings. There is a large set of ex ante homogeneous principals, each of whom may or may not participate in the market: they can enter, which provides a principal an opportunity to match with an agent, if and only if they pay cost k > 0.6 To keep the analysis focused, in this project, we consider an environment where principals and agents have a single opportunity to match, and matching is bilateral. If a principal and an agent match, the pair enter into a relationship described by a contract. A contract is given by a vector y ∈ Y that may specify actions for the principal, actions for the agent, transfers between them, and other possibilities. As we make explicit in the applications, it can include lotteries. All we need to say for now about the set of feasible contracts Y is that it is compact, nonempty, and contained in a metric space with metric d(y y ) for y y ∈ Y . A principal who matches with a type i agent gets a payoff vi (y) − k from contract y. A principal who does not enter the market gets a payoff normalized to 0, while one 6 In our model, the ex ante investment k is given. See Mailath, Postlewaite, and Samuelson (2010) for a model where asymmetric information may lead to inefficient pre-match investments.
who enters but fails to match gets −k. A type i agent matched with a principal gets a payoff u_i(y), while an unmatched agent gets a payoff also normalized to 0. For all i, u_i : Y → R and v_i : Y → R are assumed to be continuous.

When we say that a principal enters the market, we mean that he posts a mechanism. By posting we mean that the principal announces and commits to a mechanism, and all agents can see what every principal posts. By the revelation principle, without loss of generality, we can assume they post direct revelation mechanisms. A posted mechanism is a vector of contracts, y = {y_1, ..., y_I} ∈ Y^I, specifying that if a principal and an agent match, the latter (truthfully) announces his type i and they implement y_i. The mechanism y is incentive compatible if u_i(y_i) ≥ u_i(y_j) for all i, j.7 Let C ⊂ Y^I denote the set of incentive-compatible mechanisms. Principals only post mechanisms in C.

We now turn to the matching process. As we said, each agent observes what all the principals post and then directs his search to any one he likes (although he can only apply to one, he can use a mixed strategy to decide which one). Matching is bilateral, so at most one agent ever contacts a principal.8 Let Θ^m(y) denote the principal–agent ratio, or market tightness, associated with a mechanism y, defined as the measure of principals posting y divided by the measure of agents applying to y, Θ^m : C → [0, ∞]. Let γ_i^m(y) denote the share of agents applying to y that are type i, with Γ^m(y) ≡ {γ_1^m(y), ..., γ_i^m(y), ..., γ_I^m(y)} ∈ Δ^I, the I-dimensional unit simplex. That is, Γ^m(y) satisfies γ_i^m(y) ≥ 0 for all i and Σ_i γ_i^m(y) = 1, and Γ^m : C → Δ^I. The functions Θ^m and Γ^m are determined endogenously in equilibrium and, as we discuss below, they are defined for all incentive-compatible mechanisms, not only the ones that are posted in equilibrium.

An agent who applies to y matches with a principal with probability μ(Θ^m(y)), independent of type, where the matching function μ : [0, ∞] → [0, 1] is nondecreasing. Otherwise the agent is unmatched. A principal offering y matches with an agent with probability η(Θ^m(y)), where η : [0, ∞] → [0, 1] is nonincreasing, and otherwise is unmatched. Conditional on a match, the probability that the agent's type is i is γ_i^m(y). We impose μ(θ) = θη(θ) for all θ, since the left hand side is the matching probability of an agent and the right hand side is the matching probability of a principal times the principal–agent ratio. Together with the monotonicity of μ and η, this implies both functions are continuous. It is convenient to let η̄ ≡ η(0) > 0 denote the highest probability with which a principal can meet an agent, obtained when the principal–agent ratio is 0. Similarly, let μ̄ ≡ μ(∞) > 0 denote the highest probability with which an agent can meet a principal. Conversely, μ(θ) = θη(θ) ensures that η(∞) = μ(0) = 0.

7 Since we are not concerned with moral hazard, we assume that any y ∈ Y can be implemented by any principal–agent pair.

8 This assumption renders moot the usual notion of capacity constraints, which is the notion that a principal can only match with a limited number of agents. Here they never contact more than one agent.
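To fix ideas, a standard urn-ball specification (an illustration of mine, not one the authors commit to) satisfies all of these restrictions; the short check below verifies μ(θ) = θη(θ), the monotonicity requirements, and the limits η̄ = η(0) and μ̄ = μ(∞) numerically.

```python
import numpy as np

def eta(theta):
    """Principal's matching probability when each agent applies to one principal at random,
    so a principal faces a Poisson(1/theta) number of applicants: eta = 1 - exp(-1/theta)."""
    theta = np.asarray(theta, dtype=float)
    return np.where(theta > 0, 1.0 - np.exp(-1.0 / np.maximum(theta, 1e-300)), 1.0)

def mu(theta):
    """Agent's matching probability, mu(theta) = theta * eta(theta)."""
    theta = np.asarray(theta, dtype=float)
    return theta * eta(theta)

grid = np.linspace(0.01, 100.0, 5000)
assert np.allclose(mu(grid), grid * eta(grid))          # mu(theta) = theta * eta(theta)
assert np.all(np.diff(eta(grid)) <= 1e-12)              # eta is nonincreasing
assert np.all(np.diff(mu(grid)) >= -1e-12)              # mu is nondecreasing
print(eta(1e-9), mu(1e9))                               # approximately eta_bar = 1 and mu_bar = 1
```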
2.2. Assumptions

Throughout our analysis, we make three assumptions on preferences. First, let

    Ȳ_i ≡ {y ∈ Y | η̄ v_i(y) ≥ k and u_i(y) ≥ 0}

be the set of contracts that deliver nonnegative utility to a type i agent while permitting the principal to make nonnegative profits if the principal–agent ratio is 0, and let

    Ȳ ≡ ∪_i Ȳ_i.

In equilibrium, contracts that are not in Ȳ are not traded, because there is no way to simultaneously cover the fixed cost k and attract agents. Our first formal assumption is quite mild and simply says that for any given contract, principals weakly prefer higher types.

ASSUMPTION A1—Monotonicity: For all y ∈ Ȳ, v_1(y) ≤ v_2(y) ≤ ··· ≤ v_I(y).

For the next assumptions, let B_ε(y) ≡ {y′ ∈ Y | d(y, y′) < ε} be a ball of radius ε around y.

ASSUMPTION A2—Local Nonsatiation: For all i ∈ I, y ∈ Ȳ_i, and ε > 0, there exists a y′ ∈ B_ε(y) such that v_i(y′) > v_i(y) and u_j(y′) ≤ u_j(y) for all j < i.

Another mild assumption, A2 is satisfied in any application where contracts allow transfers.9 Our final assumption guarantees that it is possible to design contracts that attract some agents without attracting less desirable agents.

ASSUMPTION A3—Sorting: For all i ∈ I, y ∈ Ȳ_i, and ε > 0, there exists a y′ ∈ B_ε(y) such that

    u_j(y′) > u_j(y) for all j ≥ i   and   u_j(y′) < u_j(y) for all j < i.
9 Moreover, we use A2 only in the proof of Proposition 2 below and nowhere else. We use it to establish that it is possible to make a principal better off while not improving the well-being of agents. If η were strictly decreasing, we could do this by adjusting market tightness, but since for some examples it is interesting to have η only weakly decreasing, we introduce A2.
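As a quick illustration of A3 (using a functional form of my own, in the spirit of the wage–hours application discussed in the Introduction but not taken from it), suppose a contract is a wage–hours pair y = (w, t) and u_i(w, t) = w − c_i t with c_1 > c_2 > ··· > c_I. Raising hours by a small δ while raising the wage by c̄δ, with c̄ strictly between c_i and c_{i−1}, then attracts exactly the types j ≥ i. The snippet below checks this numerically.

```python
# Numeric check of the sorting idea for u_i(w, t) = w - c_i * t (illustrative only).
costs = [3.0, 2.0, 1.0]        # c_1 > c_2 > c_3: higher types find hours less costly
c_bar, delta = 2.5, 0.01       # c_bar lies strictly between c_2 and c_1, so target type is i = 2
w, t = 10.0, 8.0               # an arbitrary starting contract y = (w, t)

def u(j, w, t):
    return w - costs[j] * t

for j in range(len(costs)):
    gain = u(j, w + c_bar * delta, t + delta) - u(j, w, t)   # equals delta * (c_bar - c_j)
    print(j + 1, "gains" if gain > 0 else "loses")           # types >= 2 gain, type 1 loses
```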
A standard single-crossing condition states that if a low type prefers a high contract y to a low contract y, then a high type must as well (Milgrom and Shannon (1994)). While our sorting condition is related to this requirement, it it is weaker in that it is local and does not require that the set of contracts is partially ordered. On the other hand, our condition is stronger than singlecrossing in the requirement that sorting can always be achieved through local perturbations. If this assumption fails, principals may be unable to screen agents and the equilibrium can involve pooling. 2.3. Mechanisms versus Contracts Under A1–A3, we may without loss of generality assume that each principal posts a single contract y, rather than a mechanism y offering a (potentially different) contract to each type i that may show up. In this case, the principal–agent ratio and the share of type i agents are defined for every contract, Θ : Y → [0 ∞] and Γ : Y → ΔI . This reduces the notation considerably. When we say that we can assume without loss of generality that each principal posts a contract, rather than a mechanism, we mean the following: Proposition 5 establishes that any equilibrium under the restriction that each principal posts a contract generates an equilibrium for the model where principals post mechanisms. In this equilibrium, each principal posts a degenerate mechanism, offering the same contract to all agents, y = {y y}. Conversely, any equilibrium for the general model is payoff-equivalent to one for the model where each principal posts a single contract.10 This is not too surprising once one sees that equilibria in both environments have the feature that principals separate agents, in the sense that different types trade different contracts. Given this, we proceed by assuming for now that each principal posts a single contract. To summarize where we are, the expected utility of a principal who posts contract y is η(Θ(y))
Σi γi (y)vi (y) − k.
The expected utility of a type i agent who applies to contract y is μ(Θ(y))ui (y) The functions Θ and Γ are defined over the set of feasible contracts Y and are determined in equilibrium. 10
In fact, under some conditions, principals might be willing to post menus that attract multiple types and offer them different contracts. But even in such a case, there is always a payoffequivalent equilibrium in which all principals offer degenerate mechanisms.
2.4. Equilibrium

In equilibrium, principals post profit-maximizing contracts and earn zero profit; and conditional on the contracts posted and the search behavior of other agents, each agent directs his search to a preferred contract. In practice, many contracts are not posted in equilibrium, but it is still necessary to define beliefs about the principal–agent ratio and the types of agents who would apply for those contracts if they were offered. We propose the following definition of equilibrium, and argue below that the implied beliefs are reasonable.

DEFINITION 1—Equilibrium: A competitive search equilibrium is a vector U¯ = {U¯ i }i∈I ∈ RI+ , a measure λ on Y with support Y P , a function Θ : Y → [0, ∞], and a function Γ : Y → ΔI that satisfy the following conditions:
(i) Principals’ Profit Maximization and Free Entry: For any y ∈ Y ,
η(Θ(y)) Σi γi (y)vi (y) ≤ k,
with equality if y ∈ Y P .
(ii) Agents’ Optimal Search: Let
U¯ i = max{0, max_{y′∈Y P } μ(Θ(y′ ))ui (y′ )},
and U¯ i = 0 if Y P = ∅. Then for any y ∈ Y and i, U¯ i ≥ μ(Θ(y))ui (y), with equality if Θ(y) < ∞ and γi (y) > 0. Moreover, if ui (y) < 0, either Θ(y) = ∞ or γi (y) = 0.
(iii) Market Clearing: ∫_{Y P } (γi (y)/Θ(y)) dλ({y}) ≤ πi for any i, with equality if U¯ i > 0.

To understand the equilibrium concept, first consider contracts that are actually posted in equilibrium, y ∈ Y P . Part (i) of the definition implies that principals earn zero profits from any such contract. Since η(∞) = 0 < k, it must be that Θ(y) < ∞. Part (ii) then implies that if type i agents apply for any such contract, that is, γi (y) > 0, they cannot earn a higher level of utility from any other posted contract. Part (iii) guarantees that all type i agents apply to some contract, unless they are indifferent about participating in the market, which gives them the outside option U¯ i = 0.
Our definition also imposes restrictions on contracts that are not posted in equilibrium. This is important for our uniqueness results. In particular, without any further restrictions, principals may choose not to post a deviating contract y because they anticipate that it will only attract the lowest type; and with that belief, any principal–agent ratio that is high enough to attract agents to the contract is unprofitable to the principal. In our view, such beliefs are unreasonable if higher types find the deviating contract more attractive than do low types, in the sense that they would be willing to apply for the contract at a lower principal–agent ratio. If a principal did post the contract, these beliefs would be refuted. This view is hardwired into our definition of equilibrium. To be concrete, suppose that we start from a situation where the distribution of posted contracts is λ and we force a small measure ε of principals to post an arbitrary deviating contract y. In the resulting “subgame,” agents optimally search for one of the posted contracts. Those types willing to accept the lowest principal–agent ratio at contract y would determine both the composition of agents Γ (y) and the principal–agent ratio Θ(y). If the principals posting y would earn positive profits in the limit as ε converges to zero, λ is not part of an equilibrium. By considering all such deviations, we can pin down the equilibrium functions Γ and Θ. Part (ii) of the definition of equilibrium formalizes this idea. If a principal posts a deviating contract y, he anticipates that the principal–agent ratio Θ(y) will be such that one type of agent is indifferent about applying for the contract and all other types weakly prefer some other posted contracts. Moreover, his belief about the distribution Γ (y) places all its weight on types that are indifferent about applying for the contract. Of course, some contracts may be unattractive to all agents for any principal–agent ratio, in which case we impose Θ(y) = ∞. This uniquely pins down the functions Θ and Γ .11 Equilibrium condition (i) then imposes that, for principals not to post y, they must not earn positive profit from it, given Θ and Γ . One can also think heuristically about Θ and Γ through a hypothetical adjustment process. When a principal considers posting a deviating contract y, he initially imagines an infinite principal–agent ratio. Some contracts will not be able to attract any agents even at that ratio, in which case Θ(y) = ∞ and the choice of Γ (y) is arbitrary and immaterial. Otherwise, some agents would be attracted to the contract, pulling down the principal–agent ratio. This adjustment process stops at the value of Θ(y) such that one type is indifferent about the deviating contract and all other types weakly prefer their equilibrium contract. Moreover, only agents who are indifferent are attracted to the contract, restricting Γ (y). 11 The requirement that if ui (y) < 0, then either Θ(y) = ∞ or γi (y) = 0 rules out the possibility that type i agents earn zero utility, but apply for contract y with the expectation they will not be able to get it, Θ(y) = 0. Other agents with U¯ j > 0 would then not apply for the contract. Such a belief might make a deviation unprofitable, if ηv ¯ i (y) < k, but we find it implausible. In particular, it is inconsistent with the adjustment process described in the next paragraph.
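The following sketch makes the preceding adjustment process concrete for a finite set of types: given candidate equilibrium utilities U¯ and the payoffs ui (y) at a deviating contract y, it returns the implied market tightness Θ(y) and the type attracted to y. The matching function and the numbers in the final line are hypothetical; the construction itself mirrors the one used later in the proof of Proposition 1 in the Appendix.

# Sketch of the off-equilibrium belief construction: the type willing to accept
# the lowest principal-agent ratio at a deviating contract y pins down Theta(y)
# and Gamma(y). The matching function and payoff numbers are hypothetical.
import numpy as np

def mu(theta):                   # illustrative matching function, strictly increasing, sup = 1
    return theta / (1.0 + theta)

def mu_inverse(m):               # largest theta consistent with mu(theta) = m
    return m / (1.0 - m) if m < 1.0 else np.inf

def beliefs(U_bar, u_at_y):
    # U_bar[i] = equilibrium utility of type i; u_at_y[i] = u_i(y) at the deviation
    J = [i for i, u in enumerate(u_at_y) if u > 0.0]     # types that could gain from y
    if not J:
        return np.inf, None                              # nobody applies: Theta(y) = infinity
    ratios = {i: U_bar[i] / u_at_y[i] for i in J}
    m = min(ratios.values())                             # matching probability that leaves the
    if m >= 1.0:                                         #   marginal type just indifferent
        return np.inf, None                              # even mu_bar cannot attract anyone
    applicant = min(i for i in J if ratios[i] == m)      # Gamma(y) puts weight on the indifferent type
    return mu_inverse(m), applicant

print(beliefs(U_bar=[0.20, 0.50], u_at_y=[0.80, 0.60]))  # the low type has the lower U/u and applies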
Our definition of equilibrium is related to the “refined equilibrium” concept in Gale (1996). That model initially allows principals to have arbitrary beliefs about the composition of agents attracted to contracts y ∈ / Y P . In particular, principals may anticipate attracting only undesirable agents, which raises the required principal–agent ratio. At a high principal–agent ratio, the contract may be unattractive to all types of agents, so the pessimistic beliefs are never confirmed. Gale (1996, p. 220) noted that this can create multiple equilibria and then argued for a refinement, where the principal believes that the type of agent attracted to y is the type who is willing to endure the lowest principal– agent ratio, and suggested that this is equivalent to the “universal divinity” concept (Banks and Sobel (1987)). This refinement is similar to our restriction on Γ for contracts that are not posted in equilibrium. 3. CHARACTERIZATION We now show how to construct an equilibrium as the solution to a set of optimization problems. For any type i, consider the problem (P-i)
max_{θ∈[0,∞], y∈Y} μ(θ)ui (y)
s.t. η(θ)vi (y) ≥ k and μ(θ)uj (y) ≤ U¯ j for all j < i.
In terms of economics, (P-i) chooses market tightness θ and a contract y to maximize the expected utility of type i subject to a principal making nonnegative profits when only type i agents apply, and subject to types lower than i not wanting to apply. Now consider the larger problem (P) of solving (P-i) for all i. More precisely, we say that a set I∗ ⊂ I and three vectors {U¯ i }i∈I , {θi }i∈I∗ , and {yi }i∈I∗ solve (P) if the following criteria are satisfied: (a) I∗ denotes the set of i such that the constraint set of (P-i) is nonempty and the maximized value is strictly positive, given (U¯ 1 U¯ i−1 ). (b) For any i ∈ I∗ , the pair (θi yi ) solves problem (P-i) given (U¯ 1 U¯ i−1 ), and U¯ i = μ(θi )ui (yi ). (c) For any i ∈ / I∗ , U¯ i = 0. Lemma 1 below claims that there exists a solution to (P). Then Proposition 1 says that we can find any equilibrium by solving (P); conversely, Proposition 2 says that any solution to (P) generates an equilibrium. Existence and uniqueness of equilibrium (Proposition 3) follows directly. We then find conditions which ensure that U¯ i > 0 for all i, and so I∗ = I, in Proposition 4. Finally, Proposition 5 shows that our results are not sensitive to the restriction that principals post contracts rather than revelation mechanisms. In terms of related approaches in the literature, Moen (1997) first defined competitive search equilibrium and showed that it is equivalent to the solu-
tion to a constrained optimization problem, like our problem (P-1). We extend this to an environment with adverse selection and show that for all types i > 1, this introduces the additional constraints μ(θ)uj (y) ≤ U¯ j for all j < i. Essentially these constraints ensure that lower types are not attracted to the contract designed for type i. We establish in Lemma 1 below that the appropriate constraints for higher types j > i are also satisfied, so only downward incentive constraints bind in equilibrium. The solution to problem (P) is essentially the least-cost separating equilibrium: it maximizes the utility of each type of agent subject to principals earning nonnegative profits and subject to worse types of agents not attempting to get the contract. The explanation for why an equilibrium must be separating is standard, although the application to an environment where the trading probabilities μ and η are endogenous is more novel. If there were pooling in equilibrium, the sorting condition A3 ensures that principals could perturb the contract to attract only the more desirable type of agent, breaking the proposed equilibrium. Similarly, any other separating contract (e.g., one that strictly excludes lower types) is dominated by a less distortionary contract (e.g., one that leaves lower types indifferent about the contract). The question remains as to why a least-cost separating equilibrium always exists. For example, in Rothschild and Stiglitz (1976) such a proposed equilibrium may be broken by a nondistorting pooling contract if there are sufficiently few low types. Such a deviation is never optimal here because of the endogenous composition of the searchers attracted to an off-the-equilibrium-path contract. Part (ii) of the definition of equilibrium ensures that if an agent is attracted to a deviating contract, he earns the same expected payoff from the contract as from his most preferred equilibrium contract. All other types do better sticking with their equilibrium contracts. Since the nondistorting pooling contract cross-subsidizes low types at the expense of high types, low types have more to gain from the deviation and are, therefore, the ones who actually search for the deviating contract. We stress that beliefs about the composition of searchers for contracts that are not posted in equilibrium, Γ , are not arbitrary; if they were, there may be many equilibria supported by the belief that only bad types search for deviating contracts, as in Gale (1996) before he introduced his refinement. When beliefs about Γ are rationally determined by which type of agent has the strongest incentive to apply for a deviating contract, equilibrium payoffs are unique. We now proceed to show all of this formally. As a preliminary step, we prove that (P) has a solution and provide a partial characterization by showing that the zero profit condition binds and that higher types are not attracted by (θ y). LEMMA 1: There exists I∗ , {U¯ i }i∈I , {θi }i∈I∗ , and {yi }i∈I∗ that solve (P). At any solution, η(θi )vi (yi ) = k for all i ∈ I∗ μ(θi )uj (yi ) ≤ U¯ j for all j ∈ I and i ∈ I∗
All formal proofs are given in the Appendix, but the existence proof comes directly from noticing that (P) has a recursive structure. As a first step, (P-1) depends only on exogenous variables and thus determines U¯ 1. In general, at step i, (P-i) depends on the previously determined values of U¯ j for j < i and determines U¯ i. Thus, we can solve (P) in I iterative steps.

We now show that a solution to (P) can be used to construct an equilibrium in which some principals offer a contract to attract type i ∈ I∗, while keeping out other types. The relevant contract is suggested by the solution to (P), but we must also show that no other contract gives positive profit.

PROPOSITION 1: Suppose I∗, {U¯ i }i∈I, {θi }i∈I∗, and {yi }i∈I∗ solve (P). Then there exists a competitive search equilibrium {U¯ , λ, Y P , Θ, Γ } with U¯ = {U¯ i }i∈I, Y P = {yi }i∈I∗, Θ(yi ) = θi, and γi (yi ) = 1.

Note that the type distribution π does not enter problem (P), so this proposition implies that whether {U¯ , Y P , Θ, Γ } is consistent with competitive search equilibrium is independent of that distribution. The type distribution only affects the measure λ over contracts. This is consistent with known results in competitive search models with heterogeneous agents; see, for example, Moen (1997, Proposition 5). The next result establishes that any equilibrium can be characterized using (P). The proof is based on a variational argument, showing that if (θi , yi ) does not solve (P), it cannot be part of an equilibrium.

PROPOSITION 2: Let {U¯ , λ, Y P , Θ, Γ } be a competitive search equilibrium. Let {U¯ i }i∈I = U¯ and I∗ = {i ∈ I | U¯ i > 0}. For each i ∈ I∗, there exists a contract y ∈ Y P with Θ(y) < ∞ and γi (y) > 0. Moreover, take any {yi }i∈I∗ and {θi }i∈I∗ with γi (yi ) > 0 and θi = Θ(yi ) < ∞. Then I∗, {U¯ i }i∈I, {θi }i∈I∗, and {yi }i∈I∗ solve (P).

The above results imply that, in equilibrium, any contract y that attracts type i solves (P-i) in the sense that the solution to the problem has θi = Θ(y) and yi = y. The existence of equilibrium and uniqueness of equilibrium payoffs now follow immediately.

PROPOSITION 3: Competitive search equilibrium exists and the equilibrium U¯ is unique.

Note that, in principle, two pairs (θ, y) and (θ′, y′) may both solve problem (P-i). In such a case, the set of contracts posted in equilibrium Y P is not uniquely determined. The next result shows that when there are strict gains from trade for all types, all agents get strictly positive utility.
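Before turning to that result, note that the recursive structure described above suggests a simple numerical procedure. The sketch below solves a discretized version of (P) type by type with a brute-force grid search; the primitives (payoff functions, matching function, contract grid, and k) are placeholders to be supplied by the user, and the sketch is meant only to illustrate the I-step logic, not the constructions used in the formal proofs.

# Schematic I-step solution of (P) by grid search: solve (P-1), record U_bar_1,
# then solve (P-2) subject to not attracting type 1, and so on. All primitives
# are user-supplied placeholders; this is an illustration, not the formal proof.
def solve_P(u, v, contracts, thetas, mu, eta, k):
    # u[i], v[i]: payoff functions of type i; contracts, thetas: finite grids
    I = len(u)
    U_bar, allocation = [0.0] * I, {}
    for i in range(I):
        best = 0.0
        for y in contracts:
            for theta in thetas:
                if eta(theta) * v[i](y) < k:
                    continue                     # principals could not break even
                if any(mu(theta) * u[j](y) > U_bar[j] for j in range(i)):
                    continue                     # contract would attract a lower type
                value = mu(theta) * u[i](y)
                if value > best:
                    best, allocation[i] = value, (theta, y)
        U_bar[i] = best                          # stays 0 if i is not in I*
    return U_bar, allocation

Because each (P-i) depends on lower types only through (U¯ 1, ..., U¯ i−1), a single pass over the types suffices; in applications one would replace the brute-force search with a continuous optimizer.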
¯ i (y) > k and PROPOSITION 4: Assume that for all i there exists y ∈ Y with ηv ui (y) > 0. Then in any competitive search equilibrium, U¯ i > 0 for all i and, in particular, there exists a contract y ∈ Y with Θ(y) < ∞ and γi (y) > 0 for all i. The proof follows by showing that the maximized value of any (P-i) is positive as long as U¯ j > 0 for j < i. One might imagine a stronger claim that if there are strict gains from trade for any type i, then U¯ i > 0, but an example in Section 7 shows that this may not be the case. In particular, if there are no gains from trade for some type j < i, so U¯ j = 0, it may be that U¯ i = 0, even though there would be gains from trade for type i with full information. Finally, as stated before, we show that our restriction to contract posting is without loss of generality in the following sense: PROPOSITION 5: Any competitive search equilibrium with contract posting is a competitive search equilibrium with revelation mechanisms. Conversely, any competitive search equilibrium with revelation mechanisms is payoff-equivalent to a competitive search equilibrium with contract posting. The proof in the Appendix includes the definition of competitive search equilibrium with revelation mechanisms. 4. FEASIBLE ALLOCATIONS To set the stage for studying efficiency, we define a feasible allocation. We begin by defining an allocation, by which we basically mean a description of the posted contracts together with the implied search behavior and payoffs of agents. DEFINITION 2: An allocation is a vector U¯ of expected utilities for the agents, a measure λ over the set of feasible contracts Y with support Y P , a function Θ˜ : Y P → [0 ∞], and a function Γ˜ : Y P → ΔI . Note that Θ˜ and Γ˜ are different from the Θ and Γ in the definition of equilibrium, because the former are defined only over the set of posted contracts, while the latter are defined for all feasible contracts. An allocation is feasible whenever (i) each posted contract offers the maximal expected utility to agents who direct their search for that contract and no more to those who do not, (ii) each posted contract generates zero profits, and (iii) markets clear. Any feasible allocation could be implemented through legal or other restrictions on the contracts that can be offered; however, some feasible allocations may not correspond to equilibria because principals may want to offer contracts that are not posted. More formally, we have the following definition:
DEFINITION 3: An allocation {U¯ , λ, Y P , Θ˜ , Γ˜ } is feasible if:
(i) For any y ∈ Y P and i such that γ˜ i (y) > 0 and Θ˜ (y) < ∞, U¯ i = μ(Θ˜ (y))ui (y), where
U¯ i ≡ max{0, max_{y′∈Y P } μ(Θ˜ (y′ ))ui (y′ )},
and U¯ i = 0 if Y P = ∅.
(ii) For any y ∈ Y P , η(Θ˜ (y)) Σi γ˜ i (y)vi (y) = k.
(iii) For all i ∈ I, ∫_{Y P } (γ˜ i (y)/Θ˜ (y)) dλ({y}) ≤ πi , with equality if U¯ i > 0.

5. APPLICATION I: THE RAT RACE

We now proceed with the first of our three main applications—a version of the rat race (Akerlof (1976)). For concreteness, think of agents here as workers who are heterogeneous in terms of both their productivity and their preference over consumption and working hours, and think of principals as firms that are willing to pay more for high-productivity (good) workers and can observe hours but not productivity. We prove that if the disutility of hours is lower for good workers, a separating equilibrium may require them to work more hours than in the first-best case. In addition, if longer hours produce more output, then the constrained optimum features overemployment of good workers.12 Finally, under a regularity condition on the matching function, good workers get more consumption when employed than they would get under full information.

5.1. Setup

A contract here is y = (c, h), where c is the worker’s consumption and h ≥ 0 is the amount of work hours. We assume there are I = 2 types. The payoff of a type i worker who applies to (c, h) and is matched is
ui (c, h) = c − φi (h),
where φi is a differentiable, increasing, strictly convex function with φi (0) = φ′i (0) = 0. We assume that φ1 (h) = κφ2 (h) for all h and impose Assumption A3, which here amounts to κ > 1. The payoff of a firm posting (c, h) matched with a type i worker is
vi (c, h) = fi (h) − c,

12 A static version of Inderst and Müller (1999) is a special case of this example, with output independent of hours worked. In this case, asymmetric information does not distort employment.
where fi is a differentiable, nonnegative, nondecreasing, weakly concave production function. Assumption A1 requires f2 (h) ≥ f1 (h) for all h. We assume η¯ (f1 (h) − φ1 (h)) > k for some h, which ensures that there are gains from trade for both types; Proposition 4 implies U¯ 1 > 0 and U¯ 2 > 0. We also assume there is an h¯ > 0 such that f2 (h¯ ) = φ2 (h¯ ). Equilibrium contracts never set h > h¯ . We then restrict the set of contracts to Y = [−ε, f2 (h¯ )] × [0, h¯ ] for some ε > 0. The maximum effort level ensures Y is compact, but is otherwise irrelevant. The possibility of a small negative transfer ensures Assumption A2 holds, while no firm would ever pay more than f2 (h¯ ). Finally, we assume the matching function μ is strictly concave and continuously differentiable. Then we find equilibrium by solving problem (P), where problem (P-i) is
U¯ i = max_{θ∈[0,∞], (c,h)∈Y} μ(θ)(c − φi (h))
s.t. μ(θ)(fi (h) − c) ≥ θk and μ(θ)(c − φj (h)) ≤ U¯ j for all j < i.
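For a sense of magnitudes, the following sketch solves this problem by brute force for one hypothetical parametrization (quadratic disutility with κ = 1.1, linear production with f2 > f1, μ(θ) = θ/(1 + θ), k = 0.1); none of these numbers appears in the paper, and the zero-profit condition is imposed with equality, as the next subsection shows is without loss. For these values the constraint that keeps type 1 out binds, so the printed type 2 contract is distorted relative to the full-information benchmark, in line with Result 1 below.

# Brute-force sketch of problem (P-i) above for one hypothetical parametrization;
# zero profit is imposed with equality, which pins down c = f_i(h) - k*theta/mu(theta).
# All functional forms and numbers are ours, chosen only for illustration.
import numpy as np

k, kappa = 0.1, 1.1
phi = [lambda h: kappa * h**2 / 2.0, lambda h: h**2 / 2.0]   # phi_1 = kappa*phi_2, kappa > 1
f   = [lambda h: 0.5 * h,            lambda h: 1.0 * h]      # f_2 >= f_1, both strictly increasing
mu  = lambda th: th / (1.0 + th)                             # strictly concave matching function

thetas = np.linspace(1e-3, 5.0, 3000)
hours  = np.linspace(0.0, 2.0, 3000)

def solve(i, U_bar_1=None):
    best, arg = -np.inf, None
    for th in thetas:
        c = f[i](hours) - k * th / mu(th)                    # zero profit with only type i applying
        val = mu(th) * (c - phi[i](hours))
        if U_bar_1 is not None:                              # keep type 1 away from type 2's contract
            val = np.where(mu(th) * (c - phi[0](hours)) <= U_bar_1 + 1e-9, val, -np.inf)
        j = int(val.argmax())
        if val[j] > best:
            best, arg = val[j], (th, hours[j])
    return best, arg

U1, (th1, h1) = solve(0)                       # type 1 (bad): unconstrained, as in the text
_,  (th2_fi, h2_fi) = solve(1)                 # type 2 full-information benchmark
U2, (th2, h2) = solve(1, U_bar_1=U1)           # type 2 with the no-poaching constraint
print((h2_fi, th2_fi), (h2, th2))              # hours and tightness both rise: h2 > h2*, theta2 > theta2*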
5.2. Equilibrium

Before characterizing equilibrium, we describe the allocation that would arise under full information. For type i, this is given by the solution to problem (P-i), but ignoring the constraint μ(θ)(c − φj (h)) ≤ U¯ j for all j < i. One can prove that the zero profit condition always binds, η(θ)(fi (h) − c) = k, and use that to eliminate c from the objective function. Then the unconstrained values of hours and recruiting solve
(h∗i , θi∗ ) = arg max_{h,θ} {μ(θ)(fi (h) − φi (h)) − θk}.
The level of consumption can be derived from the zero profit condition, ci∗ = fi (h∗i ) − kθi∗ /μ(θi∗ ). Under asymmetric information, there are no additional constraints in problem (P-1), so equilibrium consumption, hours, and market tightness are unchanged for bad workers, c1 = c1∗ , h1 = h∗1 , and θ1 = θ1∗ . Problem (P-1) also defines U¯ 1 . Now turn to the good workers’ problem. Under some conditions, this problem may also be unconstrained. For example, if the difference in the disutility of work is large relative to the difference in productivity, bad workers may prefer their unconstrained contract to the unconstrained contract for good workers. This is the case if and only if (1)
μ(θ2∗ )(f2 (h∗2 ) − φ1 (h∗2 )) − θ2∗ k ≤ μ(θ1∗ )(f1 (h∗1 ) − φ1 (h∗1 )) − θ1∗ k
We are interested in cases where this constraint is violated. We prove that the contract for good workers specifies that they work too many hours (h2 > h∗2 )
and, as long as f2 (h) is strictly increasing, overemploys good workers (θ2 > θ2∗ ). Under an additional mild restriction on the matching function μ,13 employed good workers are also overpaid (c2 > c2∗ ). In short, the equilibrium is distorted along both the intensive and extensive margins.

RESULT 1: Assume condition (1) is not satisfied. There exists a competitive search equilibrium with h2 > h∗2 . If f2 is constant, θ2 = θ2∗ and c2 = c2∗ . If f2 is strictly increasing, θ2 > θ2∗ . If, in addition, the elasticity of the matching function μ is nonincreasing in θ, c2 > c2∗ .

As usual, the proof is given in the Appendix. One may also be interested in comparing outcomes across individuals in equilibrium. It is straightforward to prove that good workers are employed more often than bad workers, θ2 > θ1 . To see this, recall that first-best hours h∗i maximize fi (h) − φi (h). Revealed preference then implies f2 (h∗2 ) − φ2 (h∗2 ) ≥ f2 (h∗1 ) − φ2 (h∗1 ). Since also f2 (h) ≥ f1 (h) and φ2 (h) < φ1 (h) for all h > 0, then f2 (h∗1 ) − φ2 (h∗1 ) > f1 (h∗1 ) − φ1 (h∗1 ). This proves f2 (h∗2 ) − φ2 (h∗2 ) > f1 (h∗1 ) − φ1 (h∗1 ). In addition, the first-best market tightness θi∗ maximizes μ(θ)(fi (h∗i ) − φi (h∗i )) − θk, proving θ2∗ > θ1∗ . Since θ2 ≥ θ2∗ and θ1 = θ1∗ , the result follows. On the other hand, without an additional regularity condition, f1′ (h) ≤ f2′ (h) for all h, we cannot generally establish whether good workers work more and are paid more than bad workers.

The notion that separating good workers from bad ones may require overworking the good workers dates back to Akerlof (1976). In a model without frictions, zero profit then implies that good workers are compensated with higher wages. Our novel results are that a separating equilibrium always exists and that, under mild conditions, it involves overemployment of good workers, both relative to bad workers and relative to the first best. Competitive search is central to these conclusions.

5.3. Efficiency

Consider an allocation that treats the two types identically. All firms post the contract (c, h) = (c p , hp ) and the market tightness is Θ˜ (c p , hp ) = θp , where hp = h∗2 is the first-best level of hours for type 2 workers, θp = θ2∗ is the first-best principal–agent ratio for type 2, and c p ensures firms earn zero expected profits:
f2′ (hp ) = φ2′ (hp ),
μ′ (θp )(f2 (hp ) − φ2 (hp )) = k,

13 We require that the elasticity of μ is nonincreasing. The elasticity is nonnegative since μ is nondecreasing. Since μ is bounded above, the elasticity converges to zero in the limit as θ converges to infinity.
c p = π1 f1 (hp ) + π2 f2 (hp ) − θp k/μ(θp ).
The share of type i agents searching for this contract satisfies γ˜ i (c p hp ) = πi and there are enough of these contracts to be consistent with market clearing. It is straightforward to verify that this allocation is feasible. We claim that if there are sufficiently few low-productivity workers, it is also a Pareto improvement over the equilibrium: RESULT 2: Assume condition (1) is not satisfied. For fixed values of the other ¯ the equilibrium is Pareto parameters, there exists a π¯ > 0 such that if π1 < π, dominated by the pooling allocation where all firms post (c p hp ) with associated market tightness θp . In equilibrium, firms that want to attract type 2 workers need to screen out type 1 workers. The cost of screening is independent of the share of type 1 workers, while the collective benefit of screening depends on the share of type 1 workers. Type 2 workers may collectively prefer to cross-subsidize type 1 workers to avoid costly screening. However, this is inconsistent with equilibrium, since any individual worker would prefer a contract that screens out the bad types. 6. APPLICATION II: INSURANCE Our next application is based on Rothschild and Stiglitz (1976), where riskneutral principals offer insurance to risk-averse agents who are heterogeneous in their probability of a loss. This illustrates several features. First, we show here that even if a pooling allocation does not Pareto dominate the equilibrium, a partial-pooling allocation may. Second, to illustrate that traditional search frictions are not necessary for existence, in this example we assume that the short side of the market, which is determined endogenously, matches for sure: μ(θ) = min{θ 1}. 6.1. Setup We again specify the model in terms of worker–firm matching.14 Now the productivity of a match is initially unknown by both worker and firm. Some workers are more likely than others to generate productive matches, but firms cannot observe this: type i produce 1 unit of output with probability pi and 0 14 We frame the discussion in terms of labor markets, rather than general insurance, because it seems more reasonable to assume an employer wants to hire only a fraction of the available workforce than to assume an insurance company wants to serve only a fraction of its potential customers.
otherwise, and pi is the agent’s private information. A contract specifies a transfer to the worker conditional on realized productivity. Workers are riskaverse and firms are risk-neutral. In the absence of adverse selection, full insurance equates the marginal utility of agents across states. We show that firms here do not provide full insurance, because incomplete insurance helps keep undesirable workers from applying. A contract consists of a pair of consumption levels, conditional on employment and unemployment, after match productivity has been realized, y = (ce cu ). The payoff of a type i worker who applies to (ce cu ) and is matched is ui (ce cu ) = pi φ(ce ) + (1 − pi )φ(cu ) where p1 < p2 < · · · < pI < 1 and the utility function φ : [c ∞) → R is increasing and strictly concave with limc→c φ(c) = −∞ for some c < 0 and φ(0) = 0. The payoff of a firm posting (ce cu ) matched with type i is vi (ce cu ) = pi (1 − ce ) − (1 − pi )cu To ensure A1 is satisfied, we restrict the set of feasible contracts to Y = {(ce cu ) | cu + 1 ≥ ce ≥ c and cu ≥ c}. The assumption limc→c φ(c) = −∞ ensures that actions of the form (ce c) yield negative utility for all types and so are not in Y¯ . Then, since a reduction in cu raises vi (ce cu ) and lowers uj (ce cu ) for any j < i, and is feasible for all y ∈ Y¯ , A2 is satisfied. To verify A3, consider an incremental increase in ce to ce + dce and an incremental reduction in cu to cu − dcu for some dce > 0 and dcu > 0. For a type i worker, this raises utility by approximately pi φ (ce )dce − (1 − pi )φ (cu )dcu , which is positive if and only if dce 1 − pi φ (cu ) > dcu pi φ (ce ) Since (1 − pi )/pi is decreasing in i, an appropriate choice of dce /dcu yields an increase in utility if and only if j ≥ i, which verifies A3. Finally, assume p1 ≤ k < pI , which ensures that there are no gains from employing the lowest type, even in the absence of asymmetric information, but there are gains from trade for higher types, say by setting ce = cu = pI − k > 0. Let i∗ denote the lowest type without gains from trade, so pi∗ ≤ k < pi∗ +1 .15 6.2. Equilibrium We can again characterize equilibrium using (P), leading to the next result: 15 The results extend to the case k < p1 by defining i∗ = 1. The lowest type obtains full insurance at an actuarially fair price, ce1 = cu1 = p1 − k, and the inductive characterization for i > i∗ is unchanged.
RESULT 3: There exists a competitive search equilibrium, where for all i ≤ i∗ , U¯ i = 0, and for all i > i∗ , θi = 1, U¯ i > 0, and cei > cei−1 and cui < cui−1 are the unique solutions to
pi (1 − cei ) − (1 − pi )cui = k
and
pi−1 φ(cei ) + (1 − pi−1 )φ(cui ) = pi−1 φ(cei−1 ) + (1 − pi−1 )φ(cui−1 ),
where cei∗ = cui∗ = 0.

In this case, adverse selection distorts the intensive margin—firms do not offer full insurance—but does not affect the extensive margin, since all workers and firms match if there are gains from trade.16 One interesting feature of equilibrium here is that cui < 0 for all i > i∗ , and, therefore, a worker is worse off when he matches and turns out to be unproductive than he would be if he did not match in the first place. If one interprets a bad match as a layoff, contracts give laid-off workers lower payoffs than those who never match, because this keeps inferior workers from applying for the job.17

6.3. Efficiency

Again we show equilibrium may not be efficient. First note that a worker with pi close to 1 suffers little from the distortions introduced by the information problem. At the extreme, if pI = 1, setting cuI = c (the lower bound on consumption) excludes other workers without distorting the type I contract at all. Generally, adverse selection has the biggest impact on the utility of workers with an intermediate value of pi . We show that a Pareto improvement may result from partial pooling.

Consider p1 = 1/4, p2 = 1/2, and p3 = 3/4, and suppose there are equal numbers of type 1 and 3, so that half of all matches are productive. Set φ(c) = log(1 + c) and k = 3/8. Then in equilibrium, U¯ 1 = 0; ce2 = 0.344, cu2 = −0.094, and U¯ 2 = φ(0.104); and ce3 = 0.576, cu3 = −0.227, and U¯ 3 = φ(0.319). Pooling all three types, the best incentive-feasible allocation involves ce = cu = 1/8 and U¯ i = φ(1/8). Compared to the equilibrium, this raises the utility of type 1 and type 2 workers, but reduces the utility of type 3 workers. Now consider an allocation that pools types 1 and 2. If there are sufficiently few type 1 workers, it is feasible to set ce = cu > 0.104, delivering greater utility to types 1 and 2. For example, suppose π1 = π3 = 0.01 and π2 = 0.98. Then the utility of types 1 and 2 rises to φ(0.122). By raising the utility of type 2, it is
easier to exclude them from type 3 contracts, reducing the requisite inefficiency of those contracts. This raises the utility of type 3, in this case, to φ(0.325).

6.4. Relationship to Rothschild and Stiglitz

Rothschild and Stiglitz (1976, p. 630) "consider an individual who will have income of size W if he is lucky enough to avoid accident. In the event an accident occurs, his income will be only W − d. The individual can insure against this accident by paying to an insurance company a premium α1 in return for which he will be paid α̂ 2 if an accident occurs. Without insurance his income in the two states, ‘accident,’ ‘no accident,’ was (W, W − d); with insurance it is now (W − α1 , W − d + α2 ) where α2 = α̂ 2 − α1 ." We can normalize the utility of an uninsured individual to zero and express the utility of one who anticipates an accident with probability pi as
ui (α1 , α2 ) = pi φ(W − α1 ) + (1 − pi )φ(W − d + α2 ) − κi ,
where κi ≡ pi φ(W ) + (1 − pi )φ(W − d). Setting W = d = 1 and defining ce = 1 − α1 and cu = α2 , this is equivalent to our example. Our results apply to their setup, with one wrinkle: our fixed cost of posting contracts.

Rothschild and Stiglitz (1976) showed that in any equilibrium, principals who attract type i agents, i > 1, offer incomplete insurance to deter type i − 1 agents, which is of course very similar to our finding. Under some conditions, however, their equilibrium does not exist. Starting from a configuration of separating contracts, suppose one principal deviates by offering a full insurance contract to attract multiple types. In their setup, this is profitable if the least-cost separating contract is Pareto inefficient. But such a deviation is never profitable in our environment. The key difference is that in the original model, a deviating principal can capture all the agents in the economy, or at least a representative cross section, while in our model, a principal cannot serve all the agents who are potentially attracted to a contract, given that matching is bilateral. Instead, agents are rationed through the endogenous movement in market tightness θ. Whether such a deviation is profitable depends on which agents are most willing to accept a decline in θ. What we find is that high types are the first to give up on the full insurance contract. A lower type, with an inferior outside option U¯ i−1 < U¯ i , will accept a lower θ. Hence, a principal who tries to offer a full insurance contract will end up with a long queue of type 1 agents—the worst possible outcome. For this reason, the deviation is not profitable, and equilibrium with separating contracts always exists.

7. APPLICATION III: ASSET MARKETS

A feature of the previous two examples is that the posted contracts are designed to screen agents. We now present a model where instead the market
tightness can be used to screen bad types. In other words, distortions occur only along the extensive margin. Although the results hold more generally, to stress the point, we again abstract from traditional search frictions and assume μ(θ) = min{θ 1}, so that matching is determined by the short side of the market. 7.1. Setup Consider an asset market with lemons, in the sense of Akerlof (1970). Buyers (principals) always value an asset more than sellers (agents) value it, but some assets are better than others and their values are private information to the seller. Market tightness, or probabilistic trading, seems in principle a good way to screen out low quality asset holders, since sellers with more valuable assets are more willing to accept a low probability of trade at any given price. This model shows how an illiquid asset market may have a useful role as a screening device. Each type i seller is endowed with one indivisible asset, which we call an apple, of type i, with value aSi > 0 to the seller and value aBi > 0 to the buyer, both expressed in units of a numeraire good. A contract is a pair (α t), where α is the probability that the seller gives the buyer the apple and t is the transfer in terms of numeraire to the seller.18 The payoff of a seller of type i matched with a buyer posting (α t) is ui (α t) = t − αaSi while the payoff of a buyer posting (α t) matched with a type i seller is vi (α t) = αaBi − t Note that we have normalized the no-trade payoff to 0. We set I = 2 and impose a number of restrictions on payoffs. First, both buyers and sellers prefer type 2 apples and both types of agents like apples: aS2 > aS1 > 0
and aB2 > aB1 > 0
Second, there would be gains from trade, including the cost k of posting, if the buyer were sure to trade: aSi + k < aBi
for i = 1, 2.
18 Given that apples are indivisible, it may be efficient to use lotteries, with α the probability apples change hands, as, for example, in Prescott and Townsend (1984) and Rogerson (1988). Nosal and Wallace (2007) provided a related model of an asset (money) market where probabilistic trade is useful due to private information, but they used random and not directed search, leading to quite different results. It would actually be equivalent here to assume apples are perfectly divisible and preferences are linear, with α reinterpreted as the fraction traded, but we like the indivisibility since it allows us to contrast our results with models that use lotteries.
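Before verifying the assumptions formally, a quick numerical spot-check may help fix ideas; the parameter values below are hypothetical, and the perturbation is the one used in the verification of A3 that follows.

# Numerical spot-check of A1 and of the sorting perturbation used for A3 below,
# for the asset-market payoffs u_i = t - alpha*aS_i and v_i = alpha*aB_i - t.
# All parameter values are hypothetical.
aS = [0.3, 0.6]                            # seller valuations, aS_1 < aS_2
aB = [0.5, 0.9]                            # buyer valuations, aB_1 < aB_2
k  = 0.05                                  # posting cost

u = lambda i, alpha, t: t - alpha * aS[i]
v = lambda i, alpha, t: alpha * aB[i] - t

alpha, t = 0.8, 0.55
assert alpha * aS[1] <= t <= alpha * aB[1] - k   # this contract lies in Y_bar_2 for these numbers
assert v(0, alpha, t) <= v(1, alpha, t)          # A1: buyers weakly prefer type 2 at this contract

a_bar, delta = 0.45, 0.01                        # aS_1 < a_bar < aS_2, small perturbation
alpha2, t2 = alpha - delta, t - a_bar * delta
assert u(1, alpha2, t2) > u(1, alpha, t)         # the holder of the good apple gains...
assert u(0, alpha2, t2) < u(0, alpha, t)         # ...while the holder of the lemon loses: sorting (A3)
print("A1 and A3 perturbation check passed")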
The available contracts are Y = [0, 1] × [0, aB2 ], with Y¯ i = {(α, t) ∈ Y | αaSi ≤ t ≤ αaBi − k}. Using these restrictions, we verify our three assumptions. As a preliminary step, note that (α, t) ∈ Y¯ i implies α ≥ k/(aBi − aSi ) > 0 and t ≥ kaSi /(aBi − aSi ) > 0, so in any equilibrium contract, trades are bounded away from zero. Since α > 0 whenever (α, t) ∈ Y¯ i , the restriction aB1 < aB2 implies A1. Also, A2 holds because for any (α, t) ∈ Y¯ i , a movement to (α, t − ε) with ε > 0 is feasible and raises buyer utility. The important assumption is again A3, which is here guaranteed by aS1 < aS2 . Fix (α, t) ∈ Y¯ and a¯ ∈ (aS1 , aS2 ). For arbitrary δ > 0, consider (α′ , t ′ ) = (α − δ, t − a¯ δ). This is feasible for small δ because (α, t) ∈ Y¯ guarantees that α > 0 and t > 0. Then
u2 (α′ , t ′ ) − u2 (α, t) = δ(aS2 − a¯ ) > 0,
u1 (α′ , t ′ ) − u1 (α, t) = δ(aS1 − a¯ ) < 0.
Now for fixed ε > 0, choose δ ≤ ε/√(1 + a¯ 2 ). This ensures (α′ , t ′ ) ∈ Bε (α, t), so A3 holds.

7.2. Equilibrium

We again use problem (P) to characterize the equilibrium.

RESULT 4: There exists a unique competitive search equilibrium with αi = 1, ti = aBi − k, θ1 = 1, U¯ 1 = aB1 − aS1 − k, θ2 = (aB1 − aS1 − k)/(aB2 − aS1 − k) < 1, and U¯ 2 = θ2 (aB2 − aS2 − k).
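The closed-form expressions in Result 4 are easy to verify numerically. The sketch below uses hypothetical values of (aS1, aS2, aB1, aB2, k), computes θ2 and the equilibrium utilities, and confirms by grid search that no contract satisfying the constraints of (P-2) delivers type 2 more than U¯ 2 (up to grid error).

# Numerical check of Result 4 for hypothetical parameter values: compute the
# closed-form equilibrium and confirm by grid search that (P-2) cannot do better.
import numpy as np

aS1, aS2, aB1, aB2, k = 0.3, 0.6, 0.5, 0.9, 0.05
mu  = lambda th: min(th, 1.0)                    # frictionless matching: short side always matches
eta = lambda th: mu(th) / th

U1     = aB1 - aS1 - k                           # type 1: trade with probability one at t1 = aB1 - k
theta2 = (aB1 - aS1 - k) / (aB2 - aS1 - k)       # rationing just strong enough to keep type 1 away
U2     = theta2 * (aB2 - aS2 - k)

best = 0.0
for th in np.linspace(1e-3, 1.0, 500):           # theta > 1 would lower the transfer without raising mu
    for alpha in np.linspace(0.0, 1.0, 201):
        t = alpha * aB2 - k / eta(th)            # zero profit pins down the transfer
        if t < 0.0 or mu(th) * (t - alpha * aS1) > U1 + 1e-9:
            continue                             # infeasible, or it would attract type 1 sellers
        best = max(best, mu(th) * (t - alpha * aS2))
print(round(U2, 4), round(best, 4))              # the grid search does not beat the closed form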
With full information, θ2 = 1 and U¯ 2 = aB2 − aS2 − k. Relative to this benchmark, buyers post too few contracts designed to attract type 2 sellers, and hence too many of them fail to trade. Since type 2 sellers have better apples than type 1, they are more willing to accept this in return for a better price when they do trade. Agents with inferior assets are less willing to accept a low probability of trade because they do not want to be stuck with their own apple, which is in fact a lemon. Note that the alternative of setting θ2 = 1 but rationing though the probability of trade in a match, α2 < 1, wastes resources, because it involves posting more contracts at cost k. In other words, extensive distortions—reducing the matching rate—are more cost effective at screening than intensive distortions—lotteries. 7.3. Efficiency Consider a pooling allocation, with α1 = α2 = 1 and t1 = t2 = t. That is, all ˜ buyers post (1 t). Moreover, Θ(1 t) = 1, γ˜ i (1 t) = πi , and λ({(1 t)}) = 1. Finally, set t = π1 aB1 + π2 aB2 − k. This allocation is feasible, given that all sellers
apply to the same contract, the choice of t ensures zero profits, and the choice of λ ensures markets clear. The expected payoff for type i sellers is U¯ i = π1 aB1 + π2 aB2 − aSi − k With this pooling allocation, type 1 sellers are always better off than they were in equilibrium since aB1 < aB2 , and type 2 sellers are better off if and only if π1 aB1 + π2 aB2 − aS2 − k >
(aB2 − aS2 − k)(aB1 − aS1 − k)/(aB2 − aS1 − k).
Since π2 = 1 − π1 , this reduces to π1 <
(aB2 − aS2 − k)/(aB2 − aS1 − k) = U¯ 2 /U¯ 1 .
Both the numerator and the denominator are positive, but the numerator is smaller (gains from trade are smaller for type 2 sellers) because aS2 > aS1 . Thus type 2 sellers prefer the pooling allocation when there is not too much crosssubsidization, or π1 is small, so that the cost of subsidizing type 1 sellers is worth the increased efficiency of trade. 7.4. No Trade So far, we have assumed there are gains from trade for both types of sellers. Now suppose there are no gains from trade for type 1 apples, aB1 ≤ aS1 + k. Then not only will type 1 sellers fail to trade, but in equilibrium, the entire market will shut down: RESULT 5: If aB1 ≤ aS1 + k, then, in any equilibrium, U¯ 1 = U¯ 2 = 0. Notice that the market shuts down here even if there are still gains from trade in good apples, aB2 > aS2 + k. Intuitively, it is only possible to keep bad apples out of the market by reducing the probability of trade in good apples. If there is no market in bad apples, however, agents holding them would accept any probability of trade. Hence we cannot screen out bad apples, and this renders the good apple market inoperative. Whether this is related to the recent collapse in asset-backed securities markets seems worth further exploration. 8. CONCLUSION We have developed a tractable general framework to analyze adverse selection in competitive search markets. Under our assumptions, there is a unique equilibrium, where principals post separating contracts. We characterized the equilibrium as the solution to a set of constrained optimization problems and
illustrated the use of the model through a series of examples. We expect that one could extend the framework to dynamic situations, with repeated rounds of posting and search, as seems relevant in many applications—including labor and asset markets. It may also be interesting to study the case opposite to the one analyzed here, where the informed instead of the uninformed parties post contracts. In standard competitive search theory, the outcome does not depend on who posts. With asymmetric information, contract posting by informed parties may introduce multiplicity of equilibrium through the usual signaling mechanism (see Delacroix and Shi (2007)). All of this is left for future work. APPENDIX PROOF OF LEMMA 1: In the first step, we prove that there exists a solution to (P). The second and third steps establish the stated properties of the solution. Step 1. Consider i = 1. If the constraint set in (P-1) is empty, set U¯ 1 = 0. Otherwise, (P-1) is well behaved, as the objective function is continuous and the constraint set is compact. Hence, (P-1) has a solution and a unique maximum m1 . If m1 ≤ 0, set U¯ 1 = 0; otherwise set U¯ 1 = m1 and let (θ1 y1 ) be one of the maximizers. We now proceed by induction. Fix i > 1 and assume that we have found U¯ j for all j < i and (θj yj ) for all j ∈ I∗ , j < i. Consider (P-i). If the constraint set is empty, set U¯ i = 0. Otherwise, (P-i) again has a solution and a unique maximum mi . If mi ≤ 0, set U¯ i = 0; otherwise, let U¯ i = mi and let (θi yi ) be one of the maximizers. Step 2. Suppose by way of contradiction that there exists i ∈ I∗ such that (θi yi ) solves (P-i) but η(θi )vi (yi ) > k. This together with U¯ i = μ(θi )ui (yi ) > 0 implies that yi ∈ Y¯ i and μ(θi ) > 0. Fix ε > 0 such that η(θi )vi (y) ≥ k for all y ∈ Bε (yi ). Then A3 ensures there exists a y ∈ Bε (yi ) such that uj (y ) > uj (yi )
for all j ≥ i, and uj (y ) < uj (yi ) for all j < i.
Then the pair (θi y ) satisfies all the constraints of problem (P-i): (i) η(θi )vi (y ) ≥ k from the choice of ε. (ii) μ(θi )uj (y ) < μ(θi )uj (yi ) ≤ U¯ j for all j < i, where the first inequality is by construction of y and μ(θi ) > 0, while the second holds since (θi yi ) solves (P-i). Now (θi y ) achieves a higher value than (θi yi ) for the objective function in (P-i), given μ(θi )ui (y ) > μ(θi )ui (yi ). Hence, (θi yi ) does not solve (P-i), a contradiction.
Step 3. Fix i ∈ I∗ and suppose by way of contradiction that there exists j > i such that μ(θi )uj (yi ) > U¯ j . Let h be the smallest such j. Since i ∈ I∗ , then μ(θi )ui (yi ) = U¯ i > 0, which implies μ(θi ) > 0 and ui (yi ) > 0. Also, from the previous step, (θi yi ) satisfies η(θi )vi (yi ) = k, which ensures η(θi ) > 0 and vi (yi ) > 0. In particular, this implies that yi ∈ Y¯ i . The pair (θi yi ) satisfies the constraints of (P-h) for the following reasons: (i) η(θi )vh (yi ) ≥ η(θi )vi (yi ) = k, where the inequality holds by A1 given h > i and yi ∈ Y¯ i ⊂ Y¯ , and the equality comes from the previous step. (ii) μ(θi )ul (yi ) ≤ U¯ l for all l < h, which holds for (a) l < i because (θi yi ) satisfy the constraints of (P-i), (b) l = i because U¯ i = μ(θi )ui (yi ) since i ∈ I∗ , and (c) i < l < h by the choice of h as the smallest violation of μ(θi )uj (yi ) > U¯ j . Since μ(θi )uh (yi ) > U¯ h ≥ 0, then (θi yi ) is in the constraint set of (P-h) and delivers a strictly positive value for the objective function; hence h ∈ I∗ . But then the fact that U¯ h is not the maximized value of (P-h) is a contradiction. Q.E.D. PROOF OF PROPOSITION 1: We proceed by construction. • The vector of expected utilities is U¯ = {U¯ i }i∈I . • The set of posted contracts is Y P = {yi }i∈I∗ . • λ is such that λ({yi }) = πi Θ(yi ) for any i ∈ I∗ . • For all i ∈ I∗ , Θ(yi ) = θi . For any other y ∈ Y , let J(y) = {j | uj (y) > 0} denote the types that attain positive utility from y. If J(y) = ∅ and minj∈J(y) {U¯ j /uj (y)} < μ, ¯ then U¯ j j∈J(y) uj (y)
μ(Θ(y)) = min
If this equation is consistent with multiple values of Θ(y), pick the largest one. ¯ then Θ(y) = ∞. Otherwise, if J(y) = ∅ or minj∈J(y) {U¯ j /uj (y)} ≥ μ, • For all i ∈ I∗ , let γi (yi ) = 1 and so γj (yi ) = 0 for j = i. For any other y ∈ Y , define Γ (y) such that γh (y) > 0 only if h ∈ arg minj∈J(y) {U¯ j /uj (y)}. If there are multiple minimizers, let γh (y) = 1 for the smallest such h. If J(y) = ∅, again choose Γ (y) arbitrarily, for example, γ1 (y) = 1. We now verify that all of the equilibrium conditions from Definition 1 hold. Condition (i). For any i ∈ I∗ , (θi yi ) solves (P-i), and Lemma 1 implies η(θi )vi (yi ) = k. Thus, profit maximization and free entry hold for any contract yi , i ∈ I∗ . Now consider an arbitrary contract: we claim that principals’ profit maximization and free-entry condition are satisfied. Suppose, to the contrary, that there exists y ∈ Y with η(Θ(y)) i γi (y)vi (y) > k. This implies η(Θ(y)) > 0, so Θ(y) < ∞, and there exists some type j with γj (y) > 0 and η(Θ(y))vj (y) > k. Since γj (y) > 0 and Θ(y) < ∞, our construction of Θ(y) and Γ (y) implies that j is the smallest solution to minh∈J(y) {U¯ h /uh (y)}, and hence uj (y) > 0 and U¯ j = μ(Θ(y))uj (y). So, by construction, for all h < j with
uh (y) > 0, U¯ h > μ(Θ(y))uh (y). Moreover, if uh (y) ≤ 0, U¯ h ≥ μ(Θ(y))uh (y) since U¯ h ≥ 0. This proves that (Θ(y) y) satisfies the constraints of (P-j). Then, since η(Θ(y))vj (y) > k, Lemma 1 implies that there exists (θ y ) that satisfies the constraints of (P-j) but delivers a higher value of the objective function, μ(θ )uj (y ) > U¯ j ≥ 0. Since U¯ j is at least equal to the maximized value of (P-j), this is a contradiction. Condition (ii). By construction, Θ and Γ ensure that U¯ i ≥ μ(Θ(y))ui (y) for all contracts y ∈ Y , with equality if Θ(y) < ∞ and γi (y) > 0. Moreover, for any i ∈ I∗ , U¯ i = μ(θi )ui (yi ) > 0, where θi = Θ(yi ) and yi is the equilibrium contract offered to i. Finally, for any y ∈ Y such that ui (y) < 0 for some i, it must be that i ∈ / J(y). Then either J(y) = ∅, in which case Θ(y) = ∞, or J(y) = ∅, in which case γh (y) = 1 for some h ∈ J(y) and so γi (y) = 0. Condition (iii). Market clearing obviously holds given the way we construct λ. Since all the equilibrium conditions are satisfied, the proof is complete. Q.E.D. PROOF OF PROPOSITION 2: From equilibrium condition (i), any y ∈ Y P has η(Θ(y)) > 0, hence Θ(y) < ∞. From condition (iii), U¯ i > 0 implies γi (y) > 0 for some y ∈ Y P . This proves that for each i ∈ I∗ , there exists a contract y ∈ Y P with Θ(y) < ∞ and γi (y) > 0. The remainder of the proof proceeds in five steps. The first four steps show that for any i ∈ I∗ and yi ∈ Y P with θi = Θ(yi ) < ∞ and γi (yi ) > 0, (θi yi ) solves (P-i). First, we prove that the constraint η(θi )vi (yi ) ≥ k is satisfied. Second, we prove that the constraint μ(θi )uj (yi ) ≤ U¯ j is satisfied for all j. Third, we prove that the pair (θi yi ) delivers U¯ i to type i. Fourth, we prove that (θi yi ) solves (P-i). The fifth and final step shows that for any i ∈ / I∗ , either the constraint set of (P-i) is empty or the maximized value is nonpositive. Step 1. Take i ∈ I∗ and yi ∈ Y P with θi = Θ(yi ) < ∞ and γi (yi ) > 0. We claim the constraint η(θi )vi (yi ) ≥ k is satisfied in (P-i). Note first that i ∈ I∗ implies U¯ i > 0. By equilibrium condition (ii), U¯ i = μ(θi )ui (yi ) and so μ(θi ) > 0. To derive a contradiction, assume η(θi )vi (yi ) < k. Equilibrium condition (i) implies η(θi ) j γj (yi )vj (yi ) = k, so there is an h with γh (yi ) > 0 and η(θi )vh (yi ) > k. ¯ then ηv ¯ h (yi ) > k. Moreover, because θi = Θ(yi ) < ∞ and Since η(θi ) ≤ η, γh (yi ) > 0, optimal search implies uh (yi ) ≥ 0. This proves yi ∈ Y¯ h . Next fix ε > 0 such that η(θi )vh (y) > k for all y ∈ Bε (yi ). Then A3 together with yi ∈ Y¯ h guarantees that there exists y ∈ Bε (yi ) such that uj (y ) > uj (yi )
for all j ≥ h, and uj (y ) < uj (yi ) for all j < h.
Notice that y ∈ Y¯ h as well, since uh (y ) > uh (yi ) ≥ 0 and ηv ¯ h (y ) ≥ η(θi ) × vh (y ) > k.
Now consider θ ≡ Θ(y ). Note that μ(θ )uh (y ) ≤ U¯ h = μ(θi )uh (yi ) < μ(θi )uh (y ) where the weak inequality follows from optimal search, the equality holds because θi < ∞ and γh (yi ) > 0, and the strict inequality holds by the construction of y , since μ(θi ) > 0. This implies μ(θ ) < μ(θi ) < ∞ and so θ < θi . Next observe that for all j < h, either uj (y ) < 0, in which case γj (y ) = 0 by equilibrium condition (ii), or μ(θ )uj (y ) < μ(θi )uj (yi ) ≤ U¯ j where the first inequality uses μ(θ ) < μ(θi ) and the construction of y , while the second follows from optimal search. Hence, γj (y ) = 0 for all j < h. Finally, profits from posting y are η(θ )
Σ_{j=1}^{I} γj (y )vj (y ) ≥ η(θ )vh (y ) ≥ η(θi )vh (y ) > k.
The first inequality follows because γj (y ) = 0 if j < h and vh (y ) is nondecreasing in h by A1 together with y ∈ Y¯ h ⊂ Y¯ ; the second inequality follows because θ < θi ; and the last inequality uses the construction of ε. A deviation to posting y is therefore strictly profitable. This is a contradiction and completes Step 1. Step 2. Again take i ∈ I∗ and yi ∈ Y P with θi = Θ(yi ) < ∞ and γi (yi ) > 0. Equilibrium condition (ii) implies μ(θi )uj (yi ) ≤ U¯ j for all j so that the second constraint in (P-i) is satisfied for all j. Step 3. Again take i ∈ I∗ and yi ∈ Y P with θi = Θ(yi ) < ∞ and γi (yi ) > 0. Equilibrium condition (ii) implies U¯ i = μ(θi )ui (yi ), since θi < ∞ and γi (yi ) > 0. Hence (θi yi ) delivers U¯ i to type i. Step 4. Again take i ∈ I∗ and yi ∈ Y P with θi = Θ(yi ) < ∞ and γi (yi ) > 0. To find a contradiction, suppose there exists (θ y) that satisfies the constraints of (P-i) but delivers higher utility. That is, η(θ)vi (y) ≥ k, μ(θ)uj (y) ≤ U¯ j for all j < i, and μ(θ)ui (y) > U¯ i . We now use A2. Note that μ(θ)ui (y) > U¯ i > 0 implies μ(θ) > 0 and ui (y) > 0, while η(θ)vi (y) ≥ k implies vi (y) > 0 and so ηv ¯ i (y) ≥ k. In particular, y ∈ Y¯ i . We can therefore fix ε > 0 such that for all y ∈ Bε (y), μ(θ)ui (y ) > U¯ i , and then choose y ∈ Bε (y) such that vi (y ) > vi (y) and uj (y ) ≤ uj (y) for all j < i. This ensures η(θ)vi (y ) > k, μ(θ)uj (y ) ≤ U¯ j for all j < i, and μ(θ)ui (y ) > U¯ i . Note that we still have y ∈ Y¯ i . We now use A3. Fix ε > 0 such that for all y ∈ Bε (y ), η(θ)vi (y) > k and μ(θ)ui (y) > U¯ i . Choose y ∈ Bε (y ) such that uj (y ) > uj (y ) for all j ≥ i uj (y ) < uj (y ) for all j < i
This ensures η(θ)vi (y ) > k, μ(θ)uj (y ) < U¯ j for all j < i, and μ(θ)ui (y ) > U¯ i . Note that we still have y ∈ Y¯ i . Now consider the contract y . From equilibrium condition (ii), μ(θ)ui (y ) > U¯ i implies that μ(θ) > μ(Θ(y )), which guarantees η(Θ(y ))vi (y ) > k. This also implies Θ(y ) < ∞. We next claim γj (y ) = 0 for all j < i. Suppose γj (y ) > 0 for some j < i. Since Θ(y ) < ∞, equilibrium condition (ii) implies uj (y ) ≥ 0. We have already shown that μ(θ) > μ(Θ(y )), and uj (y ) > uj (y ) by construction. Thus μ(θ)uj (y ) > μ(Θ(y ))uj (y ). Equilibrium condition (ii) requires U¯ j ≥ μ(θ)uj (y ) and thus U¯ j > μ(Θ(y ))uj (y ). This together with Θ(y ) < ∞ implies γj (y ) = 0, a contradiction. Now the profit from offering contract y is η(Θ(y ))
Σ_{j=1}^{I} γj (y )vj (y ) ≥ η(Θ(y ))vi (y ) > k,
where the first inequality uses γj (y ) = 0 for j < i and A1, and the second holds by construction. This contradicts the first condition in the definition of equilibrium and proves (θi yi ) solves (P-i). Step 5. Suppose there is an i ∈ / I∗ for which the constraint set of (P-i) is nonempty and the maximized value is positive. That is, suppose there exists (θ y) such that η(θ)vi (y) ≥ k, μ(θ)uj (y) ≤ U¯ j for all j < i, and μ(θ)ui (y) > U¯ i = 0. Replicating Step 4, we can first find y such that μ(θ)ui (y ) > 0, η(θ)vi (y ) > k, and μ(θ)uj (y ) ≤ U¯ j for all j < i. Then we find y such that μ(θ)ui (y ) > 0, η(θ)vi (y ) > k, and μ(θ)uj (y ) < U¯ j for all j < i. Hence y only attracts type i or higher, and hence must be profitable and deliver positive utility, a contradiction. Q.E.D. PROOF OF PROPOSITION 3: By Lemma 1 there is a solution to (P). Proposition 1 shows that if I∗ , {U¯ i }i∈I , {θi }i∈I∗ , and {yi }i∈I∗ solve (P), there is an ¯ λ Y P Θ Γ } with the same U, ¯ Y P = {yi }i∈I∗ , Θ(yi ) = θi , and equilibrium {U γi (yi ) = 1. This proves existence. Proposition 2 shows that in any equilibrium ¯ λ Y P Θ Γ }, U¯ i is the maximum value of (P-i) for all i ∈ I∗ , and U¯ i = 0 {U otherwise. Lemma 1 shows there is a unique maximum value U¯ i for (P-i) for all i ∈ I∗ . This proves that the equilibrium U¯ is unique. Q.E.D. ¯ 1 (y) > k and PROOF OF PROPOSITION 4: Consider i = 1. Fix y satisfying ηv u1 (y) > 0. Then fix θ > 0 satisfying η(θ)v1 (y) = k. These points satisfy the constraints of (P-1) and deliver utility μ(θ)u1 (y) > 0. This proves U¯ 1 > 0. Now suppose U¯ j > 0 for all j < i. We claim U¯ i > 0. Again fix y satisfying ηv ¯ i (y) > k and ui (y) > 0. Then fix θ > 0 satisfying η(θ)v1 (y) ≥ k and μ(θ)uj (y) ≤ U¯ j for all j < i; this is feasible since U¯ j > 0 and μ(0) = 0. These points satisfy the
constraints of (P-i) and deliver utility μ(θ)ui (y) > 0, which proves U¯ i > 0. By induction, the proof is complete. Q.E.D. PROOF OF PROPOSITION 5: First, we define a competitive search equilibrium in the general environment where principals can post revelation mechanisms. Recall that C ⊂ Y I is the set of revelation mechanisms. The proof uses yi to denote the contract that a mechanism y offers to an agent who reports that her type is i, and similarly yi for a mechanism y . In addition, y¯ is a degenerate mechanism that offers everyone a contract y. DEFINITION 4: A competitive search equilibrium with revelation mechanisms is a vector U¯ = {U¯ i }i∈I ∈ RI+ , a measure λm on C with support C P , a function Θm : C → [0 ∞], and a function Γ m : C → ΔI that satisfy the following conditions: (i) Principals’ Profit Maximization and Free Entry: For any y ∈ C, γim (y)vi (yi ) ≤ k η(Θm (y)) i
with equality if y ∈ C P . (ii) Agents’ Optimal Search: Let U¯ i = max 0 max μ(Θm (y ))ui (yi ) y ∈C P
and U¯ i = 0 if C P = ∅. Then for any y ∈ C and i, U¯ i ≥ μ(Θm (y))ui (yi ), with equality if Θm (y) < ∞ and γim (y) > 0. Moreover, if ui (yi ) < 0, either Θm (y) = ∞ or γim (y) = 0. γm (y) (iii) Market Clearing: C P Θim (y) dλm ({y}) ≤ πi for any i, with equality if U¯ i > 0. This is a natural generalization of our definition of competitive search with contract posting. It implicitly assumes that agents always truthfully reveal their type, as a direct revelation mechanism naturally induces them to do. The proof proceeds in two steps. First, we show that any competitive search equilibrium with contract posting is a competitive search equilibrium with revelation mechanisms. Second, we show that any competitive search equilibrium with revelation mechanisms is payoff-equivalent to a competitive search equilibrium with contract posting. ¯ λ Step 1. Take a competitive search equilibrium with contract posting {U Y P Θ Γ }. We construct a competitive search equilibrium with revelation ¯ λm C P Θm Γ m }, where λm ({¯y}) = λ({y}), so y ∈ Y P if and mechanisms {U P only if y¯ ∈ C , and for all y ∈ Y , Θm (¯y) = Θ(y) and Γ m (¯y) = Γ (y). We also need to construct Θm and Γ m for general mechanisms y ∈ C. As before, let
Jm(y′) = {j | uj(y′j) > 0} denote the types that attain positive utility from y′. If Jm(y′) ≠ ∅ and min_{j∈Jm(y′)} {U¯j/uj(y′j)} < μ̄, then

   μ(Θm(y′)) = min_{j∈Jm(y′)} {U¯j/uj(y′j)}.

If this equation is consistent with multiple values of Θm(y′), pick the largest one. Also define Γm(y′) such that γhm(y′) > 0 only if h ∈ arg min_{j∈Jm(y′)} {U¯j/uj(y′j)}. If there are multiple minimizers, γhm(y′) = 1 for the smallest such h. Otherwise, if Jm(y′) = ∅ or min_{j∈Jm(y′)} {U¯j/uj(y′j)} ≥ μ̄, then Θm(y′) = ∞ and Γm(y′) is arbitrary, for example, γ1m(y′) = 1. Conditions (ii) and (iii) are satisfied by construction. Condition (i) is also satisfied for degenerate mechanisms. Now suppose that there exists a general mechanism y′ ∈ C with η(Θm(y′)) Σi γim(y′)vi(y′i) > k. This implies Θm(y′) < ∞. In addition, Γm(y′) puts all its weight on the smallest element of arg min_{j∈Jm(y′)} {U¯j/uj(y′j)}, some type h. That is, η(Θm(y′))vh(y′h) > k. Now consider the degenerate mechanism ȳ′h. Since incentive compatibility implies ui(y′i) ≥ ui(y′h) for all i, Θm(ȳ′h) = Θm(y′) and h is also the smallest element of arg min_{j∈Jm(ȳ′h)} {U¯j/uj(y′h)}. Thus

   η(Θm(ȳ′h)) Σi γim(ȳ′h)vi(y′h) ≥ η(Θm(ȳ′h))vh(y′h) = η(Θm(y′))vh(y′h) > k.

The first inequality uses A1, since h is the worst type attracted to the mechanism, the equality uses Θm(ȳ′h) = Θm(y′), and the final inequality holds by assumption. In other words, the degenerate mechanism ȳ′h yields positive profits. But then so would y′h in the original competitive search equilibrium with contract posting, a contradiction.

Step 2. Take a competitive search equilibrium with revelation mechanisms {U¯, λm, CP, Θm, Γm}. We construct a competitive search equilibrium with contract posting {U¯, λ, YP, Θ, Γ} as follows. First, for each type i with U¯i > 0, select a mechanism y with Θm(y) < ∞ and γim(y) > 0. Let yi be the contract offered to i by this mechanism. Then set Θ(yi) = Θm(y), γi(yi) = 1, and λ({yi}) = πiΘ(yi). If for two (or more) types i and j, this procedure selects mechanisms y for i and y′ for j with yi = y′j, it is straightforward to prove that Θm(y) = Θm(y′); otherwise both types would prefer the mechanism with the higher principal–agent ratio. Then let γi(yi) = πi/(πi + πj) and λ({yi}) = (πi + πj)Θ(yi). Finally, for any other degenerate mechanism ȳ, Θ(y) = Θm(ȳ) and Γ(y) = Γm(ȳ). Conditions (ii) and (iii) in the definition of equilibrium with contract posting hold by construction. Moreover, no contract y ∈ Y yields positive profits, since the associated degenerate mechanism ȳ could have been offered in the equilibrium with revelation mechanisms. It only remains to verify that principals
break even on equilibrium contracts, η(Θ(y)) Σi γi(y)vi(y) = k for y ∈ YP. To prove this, consider any mechanism that is offered in the equilibrium with revelation mechanisms, y ∈ CP. Part (i) of the definition of equilibrium imposes η(Θm(y)) Σi γim(y)vi(yi) = k. Suppose for some i with γim(y) > 0, η(Θm(y))vi(yi) > k. A3 ensures that there is a degenerate mechanism offering a perturbation of contract yi, say ȳ′, which attracts only type i and higher agents, and so, by A1, yields positive profits: η(Θm(ȳ′))vi(y′) > k for some y′ near yi. This is a contradiction with the definition of competitive search equilibrium with revelation mechanisms, proving that for all y ∈ CP and i with γim(y) > 0, η(Θm(y))vi(yi) = k. Then by the construction of Θ, Γ, and YP, we have that for all yi ∈ YP with γi(yi) > 0, η(Θ(yi))vi(yi) = k. This completes the proof. Q.E.D.

PROOF OF RESULT 1: By Lemma 1 the constraint in problem (P-1) is binding, so we can eliminate c and reduce the problem to

   U¯1 = max_{θ, h} μ(θ)(f1(h) − φ1(h)) − θk.

At the solution, h1 is such that f1′(h1) = φ1′(h1) and θ1 solves μ′(θ1)(f1(h1) − φ1(h1)) = k. Substituting this into the objective function delivers U¯1, and the constraint delivers c1. Next, solve (P-2) using the U¯1 derived in the previous step:

   U¯2 = max_{θ∈[0,∞], (h,c)∈Y} μ(θ)(c − φ2(h))
   s.t. μ(θ)(f2(h) − c) ≥ θk and μ(θ)(c − φ1(h)) ≤ U¯1.

Denoting by λ and ν the Lagrangian multipliers attached, respectively, to the first and the second constraint, the first order conditions with respect to h and c can be combined to obtain

   λ = (φ1′(h2) − φ2′(h2)) / (φ1′(h2) − f2′(h2))   and   ν = (φ2′(h2) − f2′(h2)) / (φ1′(h2) − f2′(h2)).

Given that we restricted attention to the interesting case where both constraints are binding, both λ and ν are positive. A3 (φ1′(h2) > φ2′(h2)) implies the numerator of λ is positive. Then the denominator is positive as well, φ1′(h2) > f2′(h2). This is also the denominator of ν, and so its numerator is positive, proving φ2′(h2) > f2′(h2). Since φ2′(h2*) = f2′(h2*), convexity of φ2 and concavity of f2 imply h2 > h2*. Next eliminate the multipliers from the first order condition with respect to θ. Use φ1 = κφ2 to get

(2)   μ′(θ2)[f2(h2) − f2′(h2)φ2(h2)/φ2′(h2)] = k.
Note that with f2(h) = f̄2 for all h, this reduces to μ′(θ2)f̄2 = k. In the economy with symmetric information, we had instead μ′(θ2*)(f2(h2*) − φ2(h2*)) = k and f2′(h2*) = φ2′(h2*) (so if f2(h) = f̄2, h2* = 0 and μ′(θ2*)f̄2 = k). Thus in any case equation (2) holds at (h2, θ2) = (h2*, θ2*). Now if f2(h) = f̄2, equation (2) pins down θ2, independent of h2, so θ2 = θ2*. If f2 is strictly increasing, equation (2) describes an increasing locus of points in (h2, θ2) space, so h2 > h2* implies θ2 > θ2*. Finally, consumption satisfies the zero profit condition. Eliminate k using equation (2) to get

   c2 = f2(h2) − [θ2μ′(θ2)/μ(θ2)]·[f2(h2) − f2′(h2)φ2(h2)/φ2′(h2)].

Again, if h2 = h2* and θ2 = θ2*, then c2 = c2*. If h2 > h2* and θ2 > θ2*, then f2 strictly increasing and concave, φ2 increasing and convex, and θ2μ′(θ2)/μ(θ2) nonincreasing imply c2 > c2*. Q.E.D.

PROOF OF RESULT 2: We first prove that if the pooling contract raises the utility of type 2 workers relative to the equilibrium level, it raises the utility of type 1 as well. Since the constraint to exclude type 1 workers from the type 2 contract binds in equilibrium, the equilibrium utility of a type 1 worker is

   U¯1 = U¯2 − μ(θ2)(φ1(h2) − φ2(h2)).

In the pooling contract, both types work the same hours and have the same market tightness. Therefore,

   U¯1^p = U¯2^p − μ(θp)(φ1(hp) − φ2(hp)),

where U¯i^p is the utility of a type i worker in the pooling contract. It follows that if U¯2^p ≥ U¯2,

   U¯1^p − U¯1 ≥ μ(θ2)(φ1(h2) − φ2(h2)) − μ(θp)(φ1(hp) − φ2(hp)).

Recalling that hp and θp are the first-best levels, Result 1 proved hp ≤ h2 and θp ≤ θ2. Therefore, φ1(hp) − φ2(hp) < φ1(h2) − φ2(h2) and μ(θp) < μ(θ2). This proves U¯1^p > U¯1. Next, because condition (1) is not satisfied, the equilibrium expected utility of type 2 workers, U¯2, is strictly less than the first-best level, say U¯2*. In addition, by construction hp and θp are equal to the first-best level for type 2, (h2*, θ2*), and cp is continuous in π1 and converges to the first-best level c2* as π1 converges to 0. It follows that U¯2^p is continuous in π1 and converges to U¯2* > U¯2 as π1 converges to 0. This proves there is a π̄ such that for all π1 < π̄, U¯2^p > U¯2. Q.E.D.
PROOF OF RESULT 3: For i ≤ i*, consider (P-i) without the constraint of keeping out lower types. This relaxed problem should yield a higher payoff

   U¯i ≤ max_{θ∈[0,∞], (ce,cu)∈Y} min{θ, 1}(piφ(ce) + (1 − pi)φ(cu))
   s.t. min{1, 1/θ}(pi(1 − ce) − (1 − pi)cu) ≥ k.

At the solution, cui = cei = ci, so this reduces to

   U¯i ≤ max_{θ∈[0,∞], c ≥ c̲} min{θ, 1}φ(c)
   s.t. min{1, 1/θ}(pi − c) ≥ k.

Either the constraint set is empty (if pi < k + c̲) or there are no points in the constraint set that give positive utility (given that pi − k ≤ 0). In any case, this gives U¯i = 0. Turn next to a typical problem (P-i), i > i*:

   U¯i = max_{θ∈[0,∞], (ce,cu)∈Y} min{θ, 1}(piφ(ce) + (1 − pi)φ(cu))
   s.t. min{1, 1/θ}(pi(1 − ce) − (1 − pi)cu) ≥ k and
        min{θ, 1}(pjφ(ce) + (1 − pj)φ(cu)) ≤ U¯j for all j < i.

Note first that the solution sets θi ≤ 1: if θi > 1, reducing θi to 1 relaxes the first constraint without otherwise affecting the problem. Hence, we can rewrite the problem as

   U¯i = max_{θ≤1, (ce,cu)∈Y} θ(piφ(ce) + (1 − pi)φ(cu))
   s.t. pi(1 − ce) − (1 − pi)cu ≥ k and
        θ(pjφ(ce) + (1 − pj)φ(cu)) ≤ U¯j for all j < i.
Lemma 1 ensures the first constraint is binding, which proves pi (1 − cei ) − (1 − pi )cui = k. It remains to prove that θi = 1, cei > cei−1 , cui < cui−1 , and the constraint for j = i − 1 binds. We start by establishing these claims for i = i∗ + 1. In this case, cei = cui = 0 satisfies the constraints but leaves the first one slack. Lemma 1 implies that it is possible to do better, which proves U¯ i > 0. On the other hand, consider any (cei cui ) that delivers positive utility and satisfies the last constraint, so pi φ(cei ) + (1 − pi )φ(cui ) > 0 and pj φ(cei ) + (1 − pj )φ(cui ) ≤ 0
for all j < i. Subtracting inequalities gives (pi − pj )(φ(cei ) − φ(cui )) > 0 which proves cei > cui . Now if cei > cui ≥ 0, then pj φ(cei ) + (1 − pj )φ(cui ) > 0, so this is infeasible. If 0 ≥ cei > cui , then pi φ(cei ) + (1 − pi )φ(cui ) < 0, so this is suboptimal. This proves cei > 0 > cui when i = i∗ + 1. Finally, since pj φ(cei ) + (1 − pj )φ(cui ) ≤ 0 for all j < i, setting θi = 1 raises the value of the objective function without affecting the constraints and so is optimal. We now proceed by induction. Fix i > i∗ + 1 and assume that for all j ∈ ∗ {i + 1 i − 1}, we have cej > cej−1 , cuj < cuj−1 , θj = 1, and pj−1 φ(cej ) + (1 − pj−1 )φ(cuj ) = pj−1 φ(cej−1 ) + (1 − pj−1 )φ(cuj−1 ) = U¯ j−1 We establish the result for i. Setting cei = cei−1 , cui = cui−1 , and θi = 1 satisfies the constraints in (P-i). Since it leaves the first constraint slack, Lemma 1 implies it is possible to do better. Thus θi (pi φ(cei ) + (1 − pi )φ(cui )) > pi φ(cei−1 ) + (1 − pi )φ(cui−1 ) On the other hand, the second constraint implies θi (pi−1 φ(cei ) + (1 − pi−1 )φ(cui )) ≤ U¯ i−1 = pi−1 φ(cei−1 ) + (1 − pi−1 )φ(cui−1 ) Subtracting inequalities gives (pi − pi−1 )(θi φ(cei ) − φ(cei−1 ) − θi φ(cui ) + φ(cui−1 )) > 0. Using pi > pi−1 , this implies θi φ(cei ) − φ(cei−1 ) > θi φ(cui ) − φ(cui−1 ). As before, we can rule out the possibility that θi φ(cui ) ≥ φ(cui−1 ), because this is infeasible. We can rule out the possibility that θi φ(cei ) ≤ φ(cei−1 ), because this is suboptimal. Hence θi φ(cei ) − φ(cei−1 ) > 0 > θi φ(cui ) − φ(cui−1 ). Now since φ(cei−1 ) > 0 and θi ∈ [0 1], the first inequality implies cei > cei−1 . Since φ(cui−1 ) < 0, the second inequality implies cui < cui−1 . Next suppose θi < 1 and consider the following variation: raise θi to 1, and increase ce and reduce cu while keeping both θ(pi φ(ce ) + (1 − pi )φ(cu )) and pi ce + (1 − pi )cu unchanged; that is, set ce > cei and cu < cui so that θi (pi φ(cei ) + (1 − pi )φ(cui )) = pi φ(ce ) + (1 − pi )φ(cu ) and pi cei + (1 − pi )cui = pi ce + (1 − pi )cu For all j < i, (pj − pi )(θi φ(cei ) − φ(ce ) + φ(cu ) − θi φ(cui )) > 0 since pj < pi , 0 < cei < ce , cu < cui < 0, and θi < 1. Add this to θi (pi φ(cei ) +
(1 − pi)φ(cui)) = piφ(ce) + (1 − pi)φ(cu) to obtain θi(pjφ(cei) + (1 − pj)φ(cui)) > pjφ(ce) + (1 − pj)φ(cu). This implies the perturbation relaxes the remaining constraints and so is feasible. This proves θi = 1. Finally, we prove that the constraints for j < i − 1 are slack. If not, then pjφ(cei) + (1 − pj)φ(cui) = U¯j ≥ pjφ(cei−1) + (1 − pj)φ(cui−1), where the inequality uses the constraints in problem (P-(i − 1)). On the other hand, pi−1φ(cei) + (1 − pi−1)φ(cui) ≤ U¯i−1 = pi−1φ(cei−1) + (1 − pi−1)φ(cui−1), where the inequality is a constraint in (P-i) and the equality is the definition of U¯i−1. Subtracting these equations yields (pi−1 − pj)(φ(cei−1) − φ(cei) + φ(cui) − φ(cui−1)) ≥ 0. Since pi−1 > pj, cei > cei−1, and cui < cui−1, we have a contradiction. The constraints for all j < i − 1 are slack, while the constraint for i − 1 binds; otherwise the solution to (P-i) would have cei = cui = pi − k. Q.E.D.

PROOF OF RESULT 4: Write (P-1) as

   U¯1 = max_{θ∈[0,∞], (α,t)∈Y} min{θ, 1}(t − αaS1)
   s.t. min{1, 1/θ}(αaB1 − t) ≥ k.

By Lemma 1, we can rewrite the problem as

   U¯1 = max_{θ∈[0,∞], α∈[0,1]} min{θ, 1}α(aB1 − aS1) − θk.
Since aB1 > aS1 + k, it is optimal to set α = θ = 1. It follows that U¯1 = aB1 − aS1 − k. Now consider (P-2):

   U¯2 = max_{θ∈[0,∞], (α,t)∈Y} min{θ, 1}(t − αaS2)
   s.t. min{1, 1/θ}(αaB2 − t) ≥ k and
        min{θ, 1}(t − αaS1) ≤ aB1 − aS1 − k.

Again, we can eliminate t using the first constraint to write

   U¯2 = max_{θ∈[0,∞], α∈[0,1]} min{θ, 1}α(aB2 − aS2) − θk
   s.t. min{θ, 1}α(aB2 − aS1) − θk = aB1 − aS1 − k.
Then use the last constraint to eliminate α and write

   U¯2 = max_{θ∈[0,∞]} [(aB1 − aS1 − (1 − θ)k)/(aB2 − aS1)](aB2 − aS2) − θk
   s.t. (aB1 − aS1 − (1 − θ)k)/(min{θ, 1}(aB2 − aS1)) ∈ [0, 1],

where the constraint here ensures that α is a probability. Since aS1 < aS2 < aB2, the objective function is decreasing in θ, and we set it to the smallest value consistent with the constraints:

   θ2 = (aB1 − aS1 − k)/(aB2 − aS1 − k) < 1.

This implies α2 = 1, so the constraint binds. Then U¯2 is easy to compute. Q.E.D.
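For completeness, a worked version of that last computation; the closed form is not stated explicitly above, so treat the algebra as a sketch that uses α2 = 1 and θ2 < 1:

\[
\bar U_2 \;=\; \theta_2\,\alpha_2\,(a_2^B - a_2^S) - \theta_2 k
       \;=\; \theta_2\,(a_2^B - a_2^S - k)
       \;=\; \frac{(a_1^B - a_1^S - k)\,(a_2^B - a_2^S - k)}{a_2^B - a_1^S - k}.
\]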
PROOF OF RESULT 5: Write problem (P-1) as

   U¯1 = max_{θ∈[0,∞], (α,t)∈Y} min{θ, 1}(t − αaS1)
   s.t. min{1, 1/θ}(αaB1 − t) ≥ k.

Again we can eliminate t and rewrite this as

   U¯1 = max_{θ∈[0,∞], α∈[0,1]} min{θ, 1}α(aB1 − aS1) − θk.
Since aB1 ≤ aS1 + k, then U¯1 = 0, which is attained by θ = 0. Now consider (P-2):

   U¯2 = max_{θ∈[0,∞], (α,t)∈Y} min{θ, 1}(t − αaS2)
   s.t. min{1, 1/θ}(αaB2 − t) ≥ k and
        min{θ, 1}(t − αaS1) ≤ 0.

Eliminating t using the first constraint gives

   U¯2 = max_{θ∈[0,∞], α∈[0,1]} min{θ, 1}α(aB2 − aS2) − θk
   s.t. min{θ, 1}α(aB2 − aS1) − θk = 0.
Eliminating α using the last constraint gives

   U¯2 = max_{θ∈[0,∞]} [(aS1 − aS2)/(aB2 − aS1)]·θk
   s.t. θk/(min{θ, 1}(aB2 − aS1)) ∈ [0, 1].
Since aB2 > aS2 > aS1, the fraction in the objective function is negative. Hence θ = 0 and U¯2 = 0. Q.E.D.
Booth School of Business, University of Chicago, 5807 South Woodlawn Avenue, Chicago, IL 60637, U.S.A.;
[email protected], Dept. of Economics, University of Chicago, 1126 East 59th Street, Chicago, IL 60637, U.S.A.;
[email protected], and Wisconsin School of Business, University of Wisconsin–Madison, 975 University Avenue, Madison, WI 53706, U.S.A. and Research Dept., Federal Reserve Bank of Minneapolis, 90 Hennepin Avenue, Minneapolis, MN 55401, U.S.A.;
[email protected]. Manuscript received April, 2009; final revision received May, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 1863–1903
THE PERSISTENT EFFECTS OF PERU’S MINING MITA

BY MELISSA DELL1

This study utilizes regression discontinuity to examine the long-run impacts of the mita, an extensive forced mining labor system in effect in Peru and Bolivia between 1573 and 1812. Results indicate that a mita effect lowers household consumption by around 25% and increases the prevalence of stunted growth in children by around 6 percentage points in subjected districts today. Using data from the Spanish Empire and Peruvian Republic to trace channels of institutional persistence, I show that the mita’s influence has persisted through its impacts on land tenure and public goods provision. Mita districts historically had fewer large landowners and lower educational attainment. Today, they are less integrated into road networks and their residents are substantially more likely to be subsistence farmers.

KEYWORDS: Forced labor, land tenure, public goods.
1. INTRODUCTION THE ROLE OF HISTORICAL INSTITUTIONS in explaining contemporary underdevelopment has generated significant debate in recent years.2 Studies find quantitative support for an impact of history on current economic outcomes (Nunn (2008), Glaeser and Shleifer (2002), Acemoglu, Johnson, and Robinson (2001, 2002), Hall and Jones (1999)), but have not focused on channels of persistence. Existing empirical evidence offers little guidance in distinguishing a variety of potential mechanisms, such as property rights enforcement, inequality, ethnic fractionalization, barriers to entry, and public goods. This paper uses variation in the assignment of an historical institution in Peru to identify land tenure and public goods as channels through which its effects persist. Specifically, I examine the long-run impacts of the mining mita, a forced labor system instituted by the Spanish government in Peru and Bolivia in 1573 and abolished in 1812. The mita required over 200 indigenous communities to send one-seventh of their adult male population to work in the Potosí silver and Huancavelica mercury mines (Figure 1). The contribution of mita conscripts changed discretely at the boundary of the subjected region: on one side, all communities sent the same percentage of their population, while on the other side, all communities were exempt. 1 I am grateful to Daron Acemoglu, Bob Allen, Josh Angrist, Abhijit Banerjee, John Coatsworth, David Cook, Knick Harley, Austin Huang, Nils Jacobsen, Alan Manning, Ben Olken, James Robinson, Peter Temin, Gary Urton, Heidi Williams, Jeff Williamson, and seminar participants at City University of Hong Kong, Chinese University of Hong Kong, Harvard, MIT, Oxford, Stanford Institute of Theoretical Economics, and Warwick for helpful comments and suggestions. I also thank Javier Escobal and Jennifer Jaw for assistance in accessing data. Research funding was provided by the George Webb Medley Fund (Oxford University). 2 See, for example, Coatsworth (2005), Glaeser et al. (2004), Easterly and Levine (2003), Acemoglu, Johnson, and Robinson (2001, 2002), Sachs (2001), and Engerman and Sokoloff (1997).
© 2010 The Econometric Society
DOI: 10.3982/ECTA8121
FIGURE 1.—The mita boundary is in black and the study boundary in light gray. Districts falling inside the contiguous area formed by the mita boundary contributed to the mita. Elevation is shown in the background.
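As the caption notes, treatment status is purely geographic: a district contributed to the mita if and only if it lies inside the contiguous area traced by the boundary. The paper itself matches a historical list of subjected districts to modern districts; the sketch below is only a toy illustration of that geometric statement, with hypothetical boundary vertices and district-capital coordinates rather than the study's data.

```python
# Sketch: flag districts whose capitals fall inside a boundary polygon.
# Coordinates below are hypothetical placeholders, not the mita boundary.
from shapely.geometry import Point, Polygon

# Hypothetical boundary vertices as (longitude, latitude) pairs.
mita_boundary = Polygon([(-72.5, -13.2), (-71.0, -13.0), (-70.6, -14.5),
                         (-71.8, -15.1), (-72.9, -14.3)])

# Hypothetical district capitals.
district_capitals = {
    "district_A": (-71.5, -13.9),   # lies inside the toy polygon
    "district_B": (-73.4, -13.5),   # lies outside the toy polygon
}

mita = {name: int(mita_boundary.contains(Point(lon, lat)))
        for name, (lon, lat) in district_capitals.items()}
print(mita)  # {'district_A': 1, 'district_B': 0}
```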
This discrete change suggests a regression discontinuity (RD) approach for evaluating the long-term effects of the mita, with the mita boundary forming a multidimensional discontinuity in longitude–latitude space. Because validity of the RD design requires all relevant factors besides treatment to vary smoothly at the mita boundary, I focus exclusively on the portion that transects the Andean range in southern Peru. Much of the boundary tightly follows the steep Andean precipice, and hence has elevation and the ethnic distribution of the population changing discretely at the boundary. In contrast, elevation, the ethnic distribution, and other observables are statistically identical across the segment of the boundary on which this study focuses. Moreover, specification checks using detailed census data on local tribute (tax) rates, the allocation of tribute revenue, and demography—collected just prior to the mita’s institution in 1573—do not find differences across this segment. The multidimensional nature of the discontinuity raises interesting and important questions about how to specify the RD polynomial, which will be explored in detail. Using the RD approach and household survey data, I estimate that a longrun mita effect lowers equivalent household consumption by around 25% in
subjected districts today. Although the household survey provides little power for estimating relatively flexible models, the magnitude of the estimated mita effect is robust to a number of alternative specifications. Moreover, data from a national height census of school children provide robust evidence that the mita’s persistent impact increases childhood stunting by around 6 percentage points in subjected districts today. These baseline results support the well known hypothesis that extractive historical institutions influence long-run economic prosperity (Acemoglu, Johnson, and Robinson (2002)). More generally, they provide microeconomic evidence consistent with studies establishing a relationship between historical institutions and contemporary economic outcomes using aggregate data (Nunn (2008), Banerjee and Iyer (2005), Glaeser and Shleifer (2002)). After examining contemporary living standards, I use data from the Spanish Empire and Peruvian Republic, combined with the RD approach, to investigate channels of persistence. Although a number of channels may be relevant, to provide a parsimonious yet informative picture, I focus on three that the historical literature and fieldwork highlight as important. First, using districtlevel data collected in 1689, I document that haciendas—rural estates with an attached labor force—developed primarily outside the mita catchment. At the time of the mita’s enactment, a landed elite had not yet formed. To minimize the competition the state faced in accessing scarce mita labor, colonial policy restricted the formation of haciendas in mita districts, promoting communal land tenure instead (Garrett (2005), Larson (1988)). The mita’s effect on hacienda concentration remained negative and significant in 1940. Second, econometric evidence indicates that a mita effect lowered education historically, and today mita districts remain less integrated into road networks. Finally, data from the most recent agricultural census provide evidence that a long-run mita impact increases the prevalence of subsistence farming. Based on the quantitative and historical evidence, I hypothesize that the long-term presence of large landowners in non-mita districts provided a stable land tenure system that encouraged public goods provision. The property rights of large landowners remained secure from the 17th century onward. In contrast, the Peruvian government abolished the communal land tenure that had predominated in mita districts soon after the mita ended, but did not replace it with a system of enforceable peasant titling (Jacobsen (1993), Dancuart and Rodriguez (1902, Vol. 2, p. 136)). As a result, extensive confiscation of peasant lands, numerous responding peasant rebellions as well as banditry and livestock rustling were concentrated in mita districts during the late 19th and 20th centuries (Jacobsen (1993), Bustamante Otero (1987, pp. 126– 130), Flores Galindo (1987, p. 240), Ramos Zambrano (1984, pp. 29–34)). Because established landowners in non-mita districts enjoyed more secure title to their property, it is probable that they received higher returns from investing in public goods. Moreover, historical evidence indicates that well established landowners possessed the political connections required to secure public goods
(Stein (1980)). For example, the hacienda elite lobbied successfully for roads, obtaining government funds for engineering expertise and equipment, and organizing labor provided by local citizens and hacienda peons (Stein (1980, p. 59)). These roads remain and allow small-scale agricultural producers to access markets today, although haciendas were subdivided in the 1970s. The positive association between historical haciendas and contemporary economic development contrasts with the well known hypothesis that historically high land inequality is the fundamental cause of Latin America’s poor long-run growth performance (Engerman and Sokoloff (1997)). Engerman and Sokoloff argued that high historical inequality lowered subsequent investments in public goods, leading to worse outcomes in areas of the Americas that developed high land inequality during the colonial period. This theory’s implicit counterfactual to large landowners is secure, enfranchised smallholders of the sort that predominated in some parts of North America. This is not an appropriate counterfactual for Peru or many other places in Latin America, because institutional structures largely in place before the formation of the landed elite did not provide secure property rights, protection from exploitation, or a host of other guarantees to potential smallholders.3 The evidence in this study indicates that large landowners—while they did not aim to promote economic prosperity for the masses—did shield individuals from exploitation by a highly extractive state and ensure public goods. Thus, it is unclear whether the Peruvian masses would have been better off if initial land inequality had been lower, and it is doubtful that initial land inequality is the most useful foundation for a theory of long-run growth. Rather, the Peruvian example suggests that exploring constraints on how the state can be used to shape economic interactions, for example, the extent to which elites can employ state machinery to coerce labor or citizens can use state guarantees to protect their property, could provide a particularly useful starting point for modeling Latin America’s long-run growth trajectory. In the next section, I provide an overview of the mita. Section 3 discusses identification and tests whether the mita affects contemporary living standards. Section 4 examines channels empirically. Finally, Section 5 offers concluding remarks. 2. THE MINING MITA 2.1. Historical Introduction The Potosí mines, discovered in 1545, contained the largest deposits of silver in the Spanish Empire, and the state-owned Huancavelica mines provided the 3
This argument is consistent with evidence on long-run inequality from other Latin American countries, notably Acemoglu et al. (2008) on Cundinamarca and Colombia and Coatsworth (2005) on Mexico.
mercury required to refine silver ore. Beginning in 1573, indigenous villages located within a contiguous region were required to provide one-seventh of their adult male population as rotating mita laborers to Potosí or Huancavelica, and the region subjected remained constant from 1578 onward.4 The mita assigned 14,181 conscripts from southern Peru and Bolivia to Potosí and 3280 conscripts from central and southern Peru to Huancavelica (Bakewell (1984, p. 83)).5 Using population estimates from the early 17th century (Cook (1981)), I calculate that around 3% of adult males living within the current boundaries of Peru were conscripted to the mita at a given point in time. The percentage of males who at some point participated was considerably higher, as men in subjected districts were supposed to serve once every 7 years.6 Local native elites were responsible for collecting conscripts, delivering them to the mines, and ensuring that they reported for mine duties (Cole (1985, p. 15), Bakewell (1984)). If community leaders were unable to provide their allotment of conscripts, they were required to pay in silver the sum needed to hire wage laborers instead. Historical evidence suggests that this rule was strictly enforced (Garrett (2005, p. 126), Cole (1985, p. 44), Zavala (1980), Sanchez-Albornoz (1978)). Some communities did commonly meet mita obligations through payment in silver, particularly those in present-day Bolivia who had relatively easy access to coinage due to their proximity to Potosí (Cole (1985)). Detailed records of mita contributions from the 17th, 18th, and early 19th centuries indicate that communities in the region that this paper examines contributed primarily in people (Tandeter (1993, pp. 56, 66), Zavala (1980, Vol. II, pp. 67–70)). This is corroborated by population data collected in a 1689 parish census (Villanueva Urteaga (1982)), described in the Supplemental Material (Dell (2010)), which shows that the male–female ratio was 22% lower in mita districts (a difference significant at the 1% level).7 4
The term mita was first used by the Incas to describe the system of labor obligations, primarily in local agriculture, that supported the Inca state (D’Altoy (2002, p. 266), Rowe (1946, pp. 267–269)). While the Spanish coopted this phrase, historical evidence strongly supports independent assignment. Centrally, the Inca m’ita required every married adult male in the Inca Empire (besides leaders of large communities), spanning an area far more extensive than the region I examine, to provide several months of labor services for the state each year (D’Altoy (2002, p. 266), Cieza de León (1551)). 5 Individuals could attempt to escape mita service by fleeing their communities, and a number pursued this strategy (Wightman (1990)). Yet fleeing had costs: giving up access to land, community, and family; facing severe punishment if caught; and either paying additional taxes in the destination location as a “foreigner” (forastero) or attaching oneself to a hacienda. 6 Mita districts contain 17% of the Peruvian population today (Instituto Nacional de Estadística e Información de Perú (INEI) (1993)). 7 While colonial observers highlighted the deleterious effects of the mita on demography and well-being in subjected communities, there are some features that could have promoted relatively better outcomes. For example, mita conscripts sold locally produced goods in Potosí, generating trade linkages.
With silver deposits depleted, the mita was abolished in 1812, after nearly 240 years of operation. Sections 3 and 4 discuss historical and empirical evidence showing divergent histories of mita and non-mita districts. 2.2. The Mita’s Assignment Why did Spanish authorities require only a portion of districts in Peru to contribute to the mita and how did they determine which districts to subject? The aim of the Crown was to revive silver production to levels attained using free labor in the 1550s, before epidemic disease had substantially reduced labor supply and increased wages. Yet coercing labor imposed costs: administrative and enforcement costs, compensation to conscripts for traveling as much as 1000 kilometers (km) each way to and from the mines, and the risk of decimating Peru’s indigenous population, as had occurred in earlier Spanish mining ventures in the Caribbean (Tandeter (1993, p. 61), Cole (1985, pp. 3, 31), Cañete (1794), Levillier (1921, Vol. 4, p. 108)). To establish the minimum number of conscripts needed to revive production to 1550s levels, Viceroy Francisco Toledo commissioned a detailed inventory of mines and production processes in Potosí and elsewhere in 1571 (Bakewell (1984, pp. 76–78), Levillier (1921, Vol. 4)). These numbers were used, together with census data collected in the early 1570s, to enumerate the mita assignments. The limit that the mita subject no more than one-seventh of a community’s adult male population at a given time was already an established rule that regulated local labor drafts in Peru (Glave (1989)). Together with estimates of the required number of conscripts, this rule roughly determined what fraction of Andean Peru’s districts would need to be subjected to the mita. Historical documents and scholarship reveal two criteria used to assign the mita: distance to the mines at Potosí and Huancavelica and elevation. Important costs of administering the mita, such as travel wages and enforcement costs, were increasing in distance to the mines (Tandeter (1993, p. 60), Cole (1985, p. 31)). Moreover, Spanish officials believed that only highland peoples could survive intensive physical labor in the mines, located at over 4000 meters (13,000 feet) (Golte (1980)). The geographic extent of the mita is consistent with the application of these two criteria, as can be seen in Figure 1.8 This study focuses on the portion of the mita boundary that transects the Andean range, which this figure highlights in white, and the districts along this portion are termed the study region (see Supplemental Material Figure A1 for a detailed view). Here, exempt districts were those located farthest from 8 An elevation constraint was binding along the eastern and western mita boundaries, which tightly follow the steep Andean precipice. The southern Potosí mita boundary was also constrained, by the border between Peru and the Viceroyalty of Rio de la Plata (Argentina), and by the geographic divide between agricultural lands and an uninhabitable salt flat.
the mining centers given road networks at the time (Hyslop (1984)).9 While historical documents do not mention additional criteria, concerns remain that other underlying characteristics may have influenced mita assignment. This will be examined further in Section 3.2. 3. THE MITA AND LONG-RUN DEVELOPMENT 3.1. Data I examine the mita’s long-run impact on economic development by testing whether it affects living standards today. A list of districts subjected to the mita is obtained from Saignes (1984) and Amat y Junient (1947) and matched to modern districts as detailed in the Supplemental Material, Table A.I. Peruvian districts are in most cases small political units that consist of a population center (the district capital) and its surrounding countryside. Mita assignment varies at the district level. I measure living standards using two independent data sets, both georeferenced to the district. Household consumption data are taken from the 2001 Peruvian National Household Survey (Encuesta Nacional de Hogares (ENAHO)) collected by the National Institute of Statistics (INEI). To construct a measure of household consumption that reflects productive capacity, I subtract the transfers received by the household from total household consumption and normalize to Lima metropolitan prices using the deflation factor provided in ENAHO. I also utilize a microcensus data set, obtained from the Ministry of Education, that records the heights of all 6- to 9-yearold school children in the region. Following international standards, children whose heights are more than 2 standard deviations below their age-specific median are classified as stunted, with the medians and standard deviations calculated by the World Health Organization from an international reference population. Because stunting is related to malnutrition, to the extent that living standards are lower in mita districts, we would also expect stunting to be more common there. The height census has the advantage of providing substantially 9 This discussion suggests that exempt districts were those located relatively far from both Potosí and Huancavelica. The correlation between distance to Potosí and distance to Huancavelica is −0.996, making it impossible to separately identify the effect of distance to each mine on the probability of receiving treatment. Thus, I divide the sample into two groups—municipalities to the east and those to the west of the dividing line between the Potosí and Huancavelica mita catchment areas. When considering districts to the west (Potosí side) of the dividing line, a flexible specification of mita treatment on a cubic in distance to Potosí, a cubic in elevation, and their linear interaction shows that being 100 additional kilometers from Potosí lowers the probability of treatment by 0.873, with a standard error of 0.244. Being 100 meters higher increases the probability of treatment by 0.061, with a standard error of 0.027. When looking at districts to the east (Huancavelica side) of the dividing line and using an analogous specification with a polynomial in distance to Huancavelica, the marginal effect of distance to Huancavelica is negative but not statistically significant.
more observations from about four times more districts than the household consumption sample. While the height census includes only children enrolled in school, 2005 data on primary school enrollment and completion rates do not show statistically significant differences across the mita boundary, with primary school enrollment rates exceeding 95% throughout the region examined (Ministro de Educación del Perú (MINEDU) (2005b)). Finally, to obtain controls for exogenous geographic characteristics, I calculate the mean area weighted elevation of each district by overlaying a map of Peruvian districts on 30 arc second (1 km) resolution elevation data produced by NASA’s Shuttle Radar Topography Mission (SRTM (National Aeronautics and Space Administration and the National Geospatial-Intelligence Agency) (2000)), and I employ a similar procedure to obtain each district’s mean area weighted slope. The Supplemental Material contains more detailed information about these data and the living standards data, as well as the data examined in Section 4. 3.2. Estimation Framework Mita treatment is a deterministic and discontinuous function of known covariates, longitude and latitude, which suggests estimating the mita’s impacts using a regression discontinuity approach. The mita boundary forms a multidimensional discontinuity in longitude–latitude space, which differs from the single-dimensional thresholds typically examined in RD applications. While the identifying assumptions are identical to those in a single-dimensional RD, the multidimensional discontinuity raises interesting and important methodological issues about how to specify the RD polynomial, as discussed below. Before considering this and other identification issues in detail, let us introduce the basic regression form: (1)
cidb = α + γmitad + Xid β + f (geographic locationd ) + φb + εidb
where cidb is the outcome variable of interest for observation i in district d along segment b of the mita boundary, and mitad is an indicator equal to 1 if district d contributed to the mita and equal to 0 otherwise; Xid is a vector of covariates that includes the mean area weighted elevation and slope for district d and (in regressions with equivalent household consumption on the left-hand side) demographic variables giving the number of infants, children, and adults in the household; f (geographic locationd ) is the RD polynomial, which controls for smooth functions of geographic location. Various forms will be explored. Finally, φb is a set of boundary segment fixed effects that denote which of four equal length segments of the boundary is the closest to the observation’s district capital.10 To be conservative, all analysis excludes metropolitan Cusco. Metropolitan Cusco is composed of seven non-mita and two mita 10 Results (available upon request) are robust to allowing the running variable to have heterogeneous effects by including a full set of interactions between the boundary segment fixed effects
districts located along the mita boundary and was the capital of the Inca Empire (Cook (1981, pp. 212–214), Cieza de León (1959, pp. 144–148)). I exclude Cusco because part of its relative prosperity today likely relates to its pre-mita heritage as the Inca capital. When Cusco is included, the impacts of the mita are estimated to be even larger. The RD approach used in this paper requires two identifying assumptions. First, all relevant factors besides treatment must vary smoothly at the mita boundary. That is, letting c1 and c0 denote potential outcomes under treatment and control, x denote longitude, and y denote latitude, identification requires that E[c1 |x y] and E[c0 |x y] are continuous at the discontinuity threshold. This assumption is needed for individuals located just outside the mita catchment to be an appropriate counterfactual for those located just inside it. To assess the plausibility of this assumption, I examine the following potentially important characteristics: elevation, terrain ruggedness, soil fertility, rainfall, ethnicity, preexisting settlement patterns, local 1572 tribute (tax) rates, and allocation of 1572 tribute revenues. To examine elevation—the principal determinant of climate and crop choice in Peru—as well as terrain ruggedness, I divide the study region into 20 × 20 km grid cells, approximately equal to the mean size of the districts in my sample, and calculate the mean elevation and slope within each grid cell using the SRTM data.11 These geographic data are spatially correlated, and hence I report standard errors corrected for spatial correlation in square brackets. Following Conley (1999), I allow for spatial dependence of an unknown form. For comparison, I report robust standard errors in parentheses. The first set of columns of Table I restricts the sample to fall within 100 km of the mita boundary; the second, third, and fourth sets of columns restrict it to fall within 75, 50, and 25 km, respectively. The first row shows that elevation is statistically identical across the mita boundary.12 I next look at terrain ruggedness, using the SRTM data to calculate the mean uphill slope in each grid cell. In contrast to elevation, there are some statistically significant, but relatively small, differences in slope, with mita districts being less rugged.13 and f (geographic locationd ). They are also robust to including soil type indicators, which I do not include in the main specification because they are highly collinear with the longitude–latitude polynomial used for one specification of f (geographic locationd ). 11 All results are similar if the district is used as the unit of observation instead of using grid cells. 12 Elevation remains identical across the mita boundary if I restrict the sample to inhabitable areas (<4800 m) or weight by population, rural population, or urban population data (Center for International Earth Science Information (2004, SEDAC)). 13 I also examined data on district soil quality and rainfall (results available upon request; see the data appendix in the Supplemental Materials for more details). Data from the Peruvian Instituto Nacional de Recursos Naturales (INRENA (1997)) reveal higher soil quality in mita districts. I do not emphasize soil quality because it is endogenous to land usage. While climate is exogenous, high resolution data are not available and interpolated climate estimates are notoriously
TABLE I
SUMMARY STATISTICS^a

Sample falls within:      <100 km of mita boundary            <75 km of mita boundary             <50 km of mita boundary             <25 km of mita boundary
                          Inside   Outside  s.e.              Inside   Outside  s.e.              Inside   Outside  s.e.              Inside   Outside  s.e.
GIS Measures
  Elevation               4042     4018     [188.77] (85.54)  4085     4103     [166.92] (82.75)  4117     4096     [169.45] (89.61)  4135     4060     [146.16] (115.15)
  Slope                   5.54     7.21     [0.88]* (0.49)*** 5.75     7.02     [0.86] (0.52)**   5.87     6.95     [0.95] (0.58)*    5.77     7.21     [0.90] (0.79)*
  Observations            177      95                         144      86                         104      73                         48       52
% Indigenous              63.59    58.84    [11.19] (9.76)    71.00    64.55    [8.04] (8.14)     71.01    64.54    [8.42] (8.43)     74.47    63.35    [10.87] (10.52)
  Observations            1112     366                        831      330                        683      330                        329      251
Log 1572 tribute rate     1.57     1.60     [0.04] (0.03)     1.57     1.60     [0.04] (0.03)     1.58     1.61     [0.05] (0.04)     1.65     1.61     [0.02]* (0.03)

(Continues)
TABLE I (Continued)

Sample falls within:      <100 km of mita boundary              <75 km of mita boundary             <50 km of mita boundary           <25 km of mita boundary
                          Inside   Outside  s.e.                Inside   Outside  s.e.              Inside   Outside  s.e.            Inside   Outside  s.e.
% 1572 tribute to:
  Spanish Nobility        59.80    63.82    [1.39]*** (1.36)*** 59.98    63.69    [1.56]** (1.53)** 62.01    63.07    [1.12] (1.34)   61.01    63.17    [1.58] (2.21)
  Spanish Priests         21.05    19.10    [0.90]** (0.94)**   21.90    19.45    [1.02]** (1.02)** 20.59    19.93    [0.76] (0.92)   21.45    19.98    [1.01] (1.33)
  Spanish Justices        13.36    12.58    [0.53] (0.48)*      13.31    12.46    [0.65] (0.60)     12.81    12.48    [0.43] (0.55)   13.06    12.37    [0.56] (0.79)
  Indigenous Mayors       5.67     4.40     [0.78] (0.85)       4.55     4.29     [0.26] (0.29)     4.42     4.47     [0.34] (0.33)   4.48     4.42     [0.29] (0.39)
Observations              63       41                           47       37                         35       30                       18       24

^a The unit of observation is 20 × 20 km grid cells for the geospatial measures, the household for % indigenous, and the district for the 1572 tribute data. Conley standard errors for the difference in means between mita and non-mita observations are in brackets. Robust standard errors for the difference in means are in parentheses. For % indigenous, the robust standard errors are corrected for clustering at the district level. The geospatial measures are calculated using elevation data at 30 arc second (1 km) resolution (SRTM (2000)). The unit of measure for elevation is 1000 meters and for slope is degrees. A household is indigenous if its members primarily speak an indigenous language in the home (ENAHO (2001)). The tribute data are taken from Miranda (1583). In the first three columns, the sample includes only observations located less than 100 km from the mita boundary, and this threshold is reduced to 75, 50, and finally 25 km in the succeeding columns. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
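The bracketed standard errors in Table I allow for spatial correlation of an unknown form, following Conley (1999). As a rough illustration of the mechanics only (not the author's code), the sketch below computes a spatial-HAC covariance for an OLS difference in means with a uniform kernel; the data, the kernel choice, and the cutoff distance are all placeholders.

```python
# Sketch: Conley-type (spatial HAC) standard error for an OLS coefficient,
# keeping cross products of residuals for pairs within `cutoff_km` of each other.
# All data below are simulated placeholders, not the paper's grid cells.
import numpy as np

rng = np.random.default_rng(0)
n = 300
coords = rng.uniform(0, 100, size=(n, 2))           # km coordinates of grid cells
mita = (coords[:, 0] < 50).astype(float)             # toy treatment indicator
y = 4.0 + 0.02 * mita + rng.normal(0, 0.2, n)        # toy outcome

X = np.column_stack([np.ones(n), mita])               # intercept + mita dummy
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta

cutoff_km = 20.0
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
kernel = (dist <= cutoff_km).astype(float)            # uniform kernel weights

# "Meat" of the sandwich: sum over pairs of x_i x_j' e_i e_j within the cutoff.
Xe = X * e[:, None]
meat = Xe.T @ kernel @ Xe
bread = np.linalg.inv(X.T @ X)
V = bread @ meat @ bread
print("mita coefficient:", beta[1], "Conley-type s.e.:", np.sqrt(V[1, 1]))
```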
The third row examines ethnicity using data from the 2001 Peruvian National Household Survey (ENAHO). A household is defined as indigenous if the primary language spoken in the household is an indigenous language (usually Quechua). Results show no statistically significant differences in ethnic identification across the mita boundary. Spanish authorities could have based mita assignment on settlement patterns, instituting the mita in densely populated areas and claiming land for themselves in sparsely inhabited regions where it was easier to usurp. A detailed review by Bauer and Covey (2002) of all archaeological surveys in the region surrounding the Cusco basin, covering much of the study region, indicates no large differences in settlement density at the date of Spanish Conquest. Moreover, there is no evidence suggesting differential rates of population decline in the 40 years between conquest and enactment of the mita (Cook (1981, pp. 108–114)). Spanish officials blamed demographic collapse on excessive, unregulated rates of tribute extraction by local Hispanic elites (encomenderos), who received the right to collect tribute from the indigenous population in return for their role in Peru’s military conquests. Thus Viceroy Francisco Toledo coordinated an in-depth inspection of Peru, Bolivia, and Ecuador in the early 1570s to evaluate the maximum tribute that could be demanded from local groups without threatening subsistence. Based on their assessment of ability to pay, authorities assigned varying tribute obligations at the level of the district socioeconomic group, with each district containing one or two socioeconomic groups. (See the Supplemental Material for more details on the tribute assessment.) These per capita contributions, preserved for all districts in the study region, provide a measure of Spanish authorities’ best estimates of local prosperity. The fourth row of Table I shows average tribute contributions per adult male (women, children, and those over age 50 were not taxed). Simple means comparisons across the mita boundary do not find statistically significant differences. The fifth through eighth rows examine district-level data on how Spanish authorities allocated these tribute revenues, divided between rents for Spanish nobility (encomenderos, fifth row), salaries for Spanish priests (sixth row), salaries for local Spanish administrators (justicias, seventh row), and salaries for indigenous mayors (caciques, eighth row). The data on tribute revenue allocation are informative about the financing of local government, about the

inaccurate for the mountainous region examined in this study (Hijmans et al. (2005)). Temperature is primarily determined by altitude (Golte (1980), Pulgar-Vidal (1950)), and thus is unlikely to differ substantially across the mita boundary. To examine precipitation, I use station data from the Global Historical Climatology Network, Version 2 (Peterson and Vose (1997)). Using all available data (from stations in 50 districts located within 100 km of the mita boundary), mita districts appear to receive somewhat higher average annual precipitation, and these differences disappear when comparing districts closer to the mita boundary. When using only stations with at least 20 years of data (to ensure a long-run average), which provides observations from 20 different stations (11 outside the mita catchment and 9 inside), the difference declines somewhat in magnitude and is not statistically significant.
extent to which Spain extracted local revenues, and about the relative power of competing local administrators to obtain tribute revenues. Table I reveals some modest differences: when the sample is limited to fall within 100 km or 75 km from the mita boundary, we see that Spanish nobility received a slightly lower share of tribute revenue inside the mita catchment than outside (60% versus 64%), whereas Spanish priests received a slightly higher share (21% versus 19%). All differences disappear as the sample is limited to fall closer to the mita boundary. In the ideal RD setup, the treatment effect is identified using only the variation at the discontinuity. Nonparametric RD techniques can be applied to approximate this setup in contexts with a large number of observations very near the treatment threshold (Imbens and Lemieux (2008)). While nonparametric techniques have the advantage of not relying on functional form assumptions, the data requirements that they pose are particularly high in the geographic RD context, as a convincing nonparametric RD would probably require precise georeferencing: for example, each observation’s longitude–latitude coordinates or address.14 This information is rarely made available due to confidentiality restrictions, and none of the available Peruvian micro data sets contains it. Moreover, many of the data sets required to investigate the mita’s potential long-run effects do not provide sufficiently large sample sizes to employ nonparametric techniques. Thus, I use a semiparametric RD approach that limits the sample to districts within 50 km of the mita boundary. This approach identifies causal effects by using a regression model to distinguish the treatment indicator, which is a nonlinear and discontinuous function of longitude (x) and latitude (y), from the smooth effects of geographic location. It is important for the regression model to approximate these effects well, so that a nonlinearity in the counterfactual conditional mean function E[c0 |x y] is not mistaken for a discontinuity, or vice versa (Angrist and Pischke (2009)). To the best of my knowledge, this is the first study to utilize a multidimensional, semiparametric RD approach. Because approaches to specifying a multidimensional RD polynomial have not been widely explored, I report estimates from three baseline specifications of f (geographic locationd ). The first approach uses a cubic polynomial in latitude and longitude.15 This parametrization is relatively flexible; it is analogous to the standard single-dimensional RD approach; and the RD plots, drawn in “x–y outcome” space, allow a transparent visual assessment of the data. 14 A notable example of a multidimensional nonparametric RD is Black’s (1999) study of the value that parents place on school quality. Black compared housing prices on either side of school attendance district boundaries in Massachusetts. Because she employs a large and precisely georeferenced data set, Black was able to include many boundary segment fixed effects and limit the sample to observations located within 0.15 miles of the boundary, ensuring comparison of observations in extremely close proximity. 15 Letting x denote longitude and y denote latitude, this polynomial is x + y + x2 + y 2 + xy + x3 + y 3 + x2 y + xy 2 .
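For concreteness, here is a sketch of the first baseline specification: equation (1) with the cubic polynomial in longitude and latitude listed in footnote 15, plus the elevation and slope controls and boundary-segment fixed effects. The data frame and variable names are hypothetical, and clustering standard errors by district is my own plausible choice rather than a statement of the paper's exact variance estimator.

```python
# Sketch: equation (1) with a cubic polynomial in longitude (x) and latitude (y).
# `df` is a hypothetical pandas DataFrame with columns: outcome, mita, elev,
# slope, lon, lat, segment (closest boundary segment), district.
import statsmodels.formula.api as smf

def add_xy_cubic(df):
    # Terms from footnote 15: x, y, x^2, y^2, xy, x^3, y^3, x^2 y, x y^2.
    out = df.copy()
    x, y = out["lon"], out["lat"]
    out["x"], out["y"] = x, y
    out["x2"], out["y2"], out["xy"] = x**2, y**2, x * y
    out["x3"], out["y3"] = x**3, y**3
    out["x2y"], out["xy2"] = (x**2) * y, x * (y**2)
    return out

def rd_cubic_xy(df):
    df = add_xy_cubic(df)
    formula = ("outcome ~ mita + elev + slope "
               "+ x + y + x2 + y2 + xy + x3 + y3 + x2y + xy2 + C(segment)")
    # District-level clustering is one common choice for this kind of design.
    return smf.ols(formula, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["district"]})

# Usage (hypothetical): print(rd_cubic_xy(households).summary().tables[1])
```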
For these reasons, this approach appears preferable to projecting the running variable into a lower-dimensional space—as I do in the other two baseline specifications—when power permits its precise estimation. One drawback is that some of the necessary data sets do not provide enough power to precisely estimate this flexible specification. The multidimensional RD polynomial also increases concerns about overfitting at the discontinuity, as a given order of a multidimensional polynomial has more degrees of freedom than the same order one-dimensional polynomial. This point is discussed using a concrete example in Section 4.3. Finally, there is no a priori reason why a polynomial form will do a good job of modeling the interactions between longitude and latitude. I partially address this concern by examining robustness to different orders of RD polynomials.
Given these concerns, I also report two baseline specifications that project geographic location into a single dimension. These single-dimensional specifications can be precisely estimated across the paper’s data sets and provide useful checks on the multidimensional RD. One controls for a cubic polynomial in Euclidean distance to Potosí, a dimension which historical evidence identifies as particularly important. During much of the colonial period, Potosí was the largest city in the Western Hemisphere and one of the largest in the world, with a population exceeding 200,000. Historical studies document distance to Potosí as an important determinant of local production and trading activities, and access to coinage (Tandeter (1993, p. 56), Glave (1989), Cole (1985)).16 Thus, a polynomial in distance to Potosí is likely to capture variation in relevant unobservables. However, this approach does not map well into the traditional RD setup, although it is similar in controlling for smooth variation and requiring all factors to change smoothly at the boundary. Thus I also examine a specification that controls for a cubic polynomial in distance to the mita boundary. I report this specification because it is similar to traditional one-dimensional RD designs, but to the best of my knowledge neither historical nor qualitative evidence suggests that distance to the mita boundary is economically important. Thus, this specification is most informative when examined in conjunction with the other two.
In addition to the two identifying assumptions already discussed, an additional assumption often employed in RD is no selective sorting across the treatment threshold. This would be violated if a direct mita effect provoked substantial out-migration of relatively productive individuals, leading to a larger indirect effect. Because this assumption may not be fully reasonable, I do not emphasize it. Rather I explore the possibility of migration as an interesting channel of persistence, to the extent that the data permit. During the past 130 years, migration appears to have been low.
16 Potosí traded extensively with the surrounding region, given that it was located in a desert 14,000 feet above sea level and that it supported one of the world’s largest urban populations during the colonial period.
Data from the 1876, 1940, and 1993 population censuses show a district-level population correlation of 0.87 between 1940 and 1993 for both mita and non-mita districts.17 Similarly, the population correlation between 1876 and 1940 is 0.80 in mita districts and 0.85 in non-mita districts. While a constant aggregate population distribution does not preclude extensive sorting, this is unlikely given the relatively closed nature of indigenous communities and the stable linkages between haciendas and their attached peasantry (Morner (1978)). Moreover, the 1993 Population Census (INEI (1993)) does not show statistically significant differences in rates of out-migration between mita and non-mita districts, although the rate of in-migration is 4.8% higher outside the mita catchment. In considering why individuals do not arbitrage income differences between mita and non-mita districts, it is useful to note that over half of the population in the region I examine lives in formally recognized indigenous communities. It tends to be difficult to gain membership and land in a different indigenous community, making large cities—which have various disamenities—the primary feasible destination for most migrants (INEI (1993)).
In contrast, out-migration from mita districts during the period that the mita was in force may have been substantial. Both Spanish authorities and indigenous leaders of mita communities had incentives to prevent such migration, since it made it harder for local leaders to meet mita quotas that were fixed in the medium run and threatened the mita’s feasibility in the longer run. To receive citizenship and access to agricultural land, individuals were required by Spanish authorities to reside in the communities to which the colonial state had assigned their ancestors soon after Peru’s conquest. Indigenous community leaders attempted to forcibly restrict migration. Despite these efforts, the state’s capacity to restrict migration was limited, and 17th century population data—available for 15 mita and 14 non-mita districts—provide evidence consistent with the hypothesis that individuals migrated disproportionately from mita to non-mita districts.18 To the extent that flight was selective and certain cognitive skills, physical strength, or other relevant characteristics are highly heritable, so that initial differences could persist over several hundred years, historical migration could contribute to the estimated mita effect. The paucity of data and the complex patterns of heritability that would link historically selective migration to the present unfortunately place further investigation substantially beyond the scope of the current paper.
I begin by estimating the mita’s impact on living standards today; see Table II. First, I test for a mita effect on household consumption, using the log of equivalent household consumption, net transfers, in 2001 as the dependent variable. Following Deaton (1997), I assume that children aged 0 to 4 are equal to 0.4 adults and children aged 5 to 14 are equal to 0.5 adults.
17 The 2005 Population Census was methodologically flawed and thus I use 1993.
18 According to data from the 1689 Cusco parish reports (see the Supplemental Material), in the 14 non-mita districts, 52.5% of individuals had ancestors who had not been assigned to their current district of residence, as compared to 35% in the 15 mita districts.
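As an illustration of this adult-equivalence adjustment, the following sketch computes log equivalent household consumption for a small, made-up household file; the column names (consumption, n_age0_4, n_age5_14, n_adults) are hypothetical and are not ENAHO's actual variable names.

import numpy as np
import pandas as pd

hh = pd.DataFrame({
    "consumption": [5200.0, 3100.0],  # annual household consumption, net transfers
    "n_age0_4":  [1, 0],
    "n_age5_14": [2, 1],
    "n_adults":  [2, 3],
})

# Children aged 0-4 count as 0.4 adults and children aged 5-14 as 0.5 adults.
hh["adult_equiv"] = 0.4 * hh.n_age0_4 + 0.5 * hh.n_age5_14 + hh.n_adults
hh["log_equiv_consumption"] = np.log(hh.consumption / hh.adult_equiv)
print(hh[["adult_equiv", "log_equiv_consumption"]])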
TABLE II
LIVING STANDARDS^a

Dependent variable:   Log equiv. household consumption (2001)        Stunted growth, children 6–9 (2005)
Sample within:        <100 km     <75 km      <50 km      <100 km    <75 km     <50 km     Border district
                      (1)         (2)         (3)         (4)        (5)        (6)        (7)

Panel A. Cubic polynomial in latitude and longitude
Mita                  −0.284      −0.216      −0.331      0.070      0.084*     0.087*     0.114**
                      (0.198)     (0.207)     (0.219)     (0.043)    (0.046)    (0.048)    (0.049)

Panel B. Cubic polynomial in distance to Potosí
Mita                  −0.337***   −0.307***   −0.329***   0.080***   0.078***   0.078***   0.063*
                      (0.087)     (0.101)     (0.096)     (0.021)    (0.022)    (0.024)    (0.032)

Panel C. Cubic polynomial in distance to mita boundary
Mita                  −0.277***   −0.230**    −0.224**    0.073***   0.061***   0.064***   0.055*
                      (0.078)     (0.089)     (0.092)     (0.023)    (0.022)    (0.023)    (0.030)

Geo. controls         yes         yes         yes         yes        yes        yes        yes
Boundary F.E.s        yes         yes         yes         yes        yes        yes        yes
Clusters              71          60          52          289        239        185        63
Observations          1478        1161        1013        158,848    115,761    100,446    37,421

^a The unit of observation is the household in columns 1–3 and the individual in columns 4–7. Robust standard errors, adjusted for clustering by district, are in parentheses. The dependent variable is log equivalent household consumption (ENAHO (2001)) in columns 1–3, and a dummy equal to 1 if the child has stunted growth and equal to 0 otherwise in columns 4–7 (Ministro de Educación (2005a)). Mita is an indicator equal to 1 if the household’s district contributed to the mita and equal to 0 otherwise (Saignes (1984), Amat y Juniet (1947, pp. 249, 284)). Panel A includes a cubic polynomial in the latitude and longitude of the observation’s district capital, panel B includes a cubic polynomial in Euclidean distance from the observation’s district capital to Potosí, and panel C includes a cubic polynomial in Euclidean distance to the nearest point on the mita boundary. All regressions include controls for elevation and slope, as well as boundary segment fixed effects (F.E.s). Columns 1–3 include demographic controls for the number of infants, children, and adults in the household. In columns 1 and 4, the sample includes observations whose district capitals are located within 100 km of the mita boundary, and this threshold is reduced to 75 and 50 km in the succeeding columns. Column 7 includes only observations whose districts border the mita boundary. 78% of the observations are in mita districts in column 1, 71% in column 2, 68% in column 3, 78% in column 4, 71% in column 5, 68% in column 6, and 58% in column 7. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
Panel A reports the specification that includes a cubic polynomial in latitude and longitude, panel B reports the specification that uses a cubic polynomial in distance to Potosí, and panel C reports the specification that includes a cubic polynomial in distance to the mita boundary. Column 1 of Table II limits the sample to districts within 100 km of the mita boundary, and columns 2 and 3 restrict it to fall within 75 and 50 km, respectively.19 Columns 4–7 repeat this exercise, using as the dependent variable a dummy equal to 1 if the child’s growth is stunted and equal to 0 otherwise. Column 4 limits the sample to districts within 100 km of the mita boundary, and columns 5 and 6 restrict it to fall within 75 and 50 km, respectively. Column 7 limits the sample to only those districts bordering the mita boundary. In combination with the inclusion of boundary segment fixed effects, this ensures that I am comparing observations in close geographic proximity.

3.3. Estimation Results

Columns 1–3 of Table II estimate that a long-run mita effect lowers household consumption in 2001 by around 25% in subjected districts. The point estimates remain fairly stable as the sample is restricted to fall within narrower bands of the mita boundary. Moreover, the mita coefficients are economically similar across the three specifications of the RD polynomial, and I am unable to reject that they are statistically identical. All of the mita coefficients in panels B and C, which report the single-dimensional RD estimates, are statistically significant at the 1% or 5% level. In contrast, the point estimates using a cubic polynomial in latitude and longitude (panel A) are not statistically significant. This imprecision likely results from the relative flexibility of the specification, the small number of observations and clusters (the household survey samples only around one-quarter of districts), and measurement error in the dependent variable (Deaton (1997)).
Columns 4–7 of Table II examine census data on stunting in children, an alternative measure of living standards which offers a substantially larger sample. When using only observations in districts that border the mita boundary, point estimates of the mita effect on stunting range from 0.055 (s.e. = 0.030) to 0.114 (s.e. = 0.049), that is, from 5.5 to 11.4 percentage points. This compares to a mean prevalence of stunting of 40% throughout the region examined.20 Of the 12 stunting point estimates reported in Table II, 11 are statistically significant, and I cannot reject at the 10% level that the estimates are the same across specifications.
19 The single-dimensional specifications produce similar estimates when the sample is limited to fall within 25 km of the mita boundary. The multidimensional specification produces a very large and imprecisely estimated mita coefficient because of the small sample size.
20 A similar picture emerges when I use height in centimeters as the dependent variable and include quarter × year of birth dummies, a gender dummy, and their interactions on the right-hand side.
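The roughly 25% figure is the standard transformation of a log-point estimate into a proportional difference. Taking, for example, the panel A estimate in column 1,

\exp(\hat{\gamma}_{\text{mita}}) - 1 = \exp(-0.284) - 1 \approx -0.247,

that is, approximately 25 percent lower equivalent household consumption inside the mita catchment; the other consumption coefficients in Table II imply proportional differences of a similar order.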
FIGURE 2.—Plots of various outcomes against longitude and latitude. See the text for a detailed description.
The results can be seen graphically in Figure 2. Each subfigure shows a district-level scatter plot for one of the paper’s main outcome variables. These plots are the three-dimensional analogues to standard two-dimensional RD plots, with each district capital’s longitude on the x axis, its latitude on the y axis, and the data value for that district shown using an evenly spaced monochromatic color scale, as described in the legends. When the underlying data are at the microlevel, I take district-level averages, and the size of the dot indicates the number of observations in each district. Importantly, the scaling on these dots, which is specified in the legend, is nonlinear, as otherwise some would be microscopic and others too large to display. The background in each plot shows predicted values, for a finely spaced grid of longitude–latitude coordinates,
from a regression of the outcome variable under consideration on a cubic polynomial in longitude–latitude and the mita dummy. In the typical RD context, the predicted value plot is a two-dimensional curve, whereas here it is a three-dimensional surface, with the third dimension indicated by the color gradient.21 The shades of the data points can be compared to the shades of the predicted values behind them to judge whether the RD has done an adequate job of averaging the data across space. The majority of the population in the region is clustered along the upper segment of the mita boundary, giving these districts substantially more weight in figures showing predicted values from microlevel regressions.
21 Three-dimensional surface plots of the predicted values are shown in Figure A2 in the Supplemental Material, and contour plots are available upon request.
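A prediction surface of this kind can be built by fitting the outcome on the mita dummy and the cubic longitude–latitude polynomial and evaluating the fit on a fine grid. The sketch below is illustrative rather than the code behind the published figures: the file and variable names are hypothetical, and the mita dummy is fixed at zero on the grid because assigning it correctly at each grid point would require the boundary polygon.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
import matplotlib.pyplot as plt

df = pd.read_csv("district_outcomes.csv")  # hypothetical district-level file

formula = ("outcome ~ mita + x + y + I(x**2) + I(y**2) + I(x*y)"
           " + I(x**3) + I(y**3) + I(x**2*y) + I(x*y**2)")
fit = smf.ols(formula, data=df).fit()

gx, gy = np.meshgrid(np.linspace(df.x.min(), df.x.max(), 200),
                     np.linspace(df.y.min(), df.y.max(), 200))
grid = pd.DataFrame({"x": gx.ravel(), "y": gy.ravel(), "mita": 0})
surface = fit.predict(grid).values.reshape(gx.shape)

plt.pcolormesh(gx, gy, surface, shading="auto")          # predicted-value background
plt.scatter(df.x, df.y, c=df.outcome, edgecolor="k")     # district-level data points
plt.xlabel("longitude"); plt.ylabel("latitude")
plt.colorbar(label="predicted outcome")
plt.show()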
TABLE III
SPECIFICATION TESTS^a

Dependent variable:   Log equiv. household consumption (2001)        Stunted growth, children 6–9 (2005)
Sample within:        <100 km     <75 km      <50 km      <100 km    <75 km     <50 km     Border district
                      (1)         (2)         (3)         (4)        (5)        (6)        (7)

Alternative functional forms for RD polynomial: Baseline I
Linear polynomial in latitude and longitude
Mita                  −0.294***   −0.199      −0.143      0.064***   0.054**    0.062**    0.068**
                      (0.092)     (0.126)     (0.128)     (0.021)    (0.022)    (0.026)    (0.031)
Quadratic polynomial in latitude and longitude
Mita                  −0.151      −0.247      −0.361      0.073*     0.091**    0.106**    0.087**
                      (0.189)     (0.209)     (0.216)     (0.040)    (0.043)    (0.047)    (0.041)
Quartic polynomial in latitude and longitude
Mita                  −0.392*     −0.324      −0.342      0.073      0.072      0.057      0.104**
                      (0.225)     (0.231)     (0.260)     (0.056)    (0.050)    (0.048)    (0.042)

Alternative functional forms for RD polynomial: Baseline II
Linear polynomial in distance to Potosí
Mita                  −0.297***   −0.273***   −0.220**    0.050**    0.048**    0.049**    0.071**
                      (0.079)     (0.093)     (0.092)     (0.022)    (0.022)    (0.024)    (0.031)
Quadratic polynomial in distance to Potosí
Mita                  −0.345***   −0.262***   −0.309***   0.072***   0.064***   0.072***   0.060*
                      (0.086)     (0.095)     (0.100)     (0.023)    (0.022)    (0.023)    (0.032)
Quartic polynomial in distance to Potosí
Mita                  −0.331***   −0.310***   −0.330***   0.078***   0.075***   0.071***   0.053*
                      (0.086)     (0.100)     (0.097)     (0.021)    (0.020)    (0.021)    (0.031)
Interacted linear polynomial in distance to Potosí
Mita                  −0.307***   −0.280***   −0.227**    0.051**    0.048**    0.043*     0.076***
                      (0.092)     (0.094)     (0.095)     (0.022)    (0.021)    (0.022)    (0.029)
Interacted quadratic polynomial in distance to Potosí
Mita                  −0.264***   −0.177*     −0.285**    0.033      0.027      0.039*     0.036
                      (0.087)     (0.096)     (0.111)     (0.024)    (0.023)    (0.023)    (0.024)

(Continues)
Table III examines robustness to 14 different specifications of the RD polynomial, documenting mita effects on household consumption and stunting that are generally similar across specifications. The first three rows report results from alternative specifications of the RD polynomial in longitude–latitude: linear, quadratic, and quartic. The next five rows report alternative specifications using distance to Potosí: linear, quadratic, quartic, and the mita dummy interacted with a linear or quadratic polynomial in distance to Potosí.22
TABLE III—Continued

Dependent variable:   Log equiv. household consumption (2001)        Stunted growth, children 6–9 (2005)
Sample within:        <100 km     <75 km      <50 km      <100 km    <75 km     <50 km     Border district
                      (1)         (2)         (3)         (4)        (5)        (6)        (7)

Alternative functional forms for RD polynomial: Baseline III
Linear polynomial in distance to mita boundary
Mita                  −0.299***   −0.227**    −0.223**    0.072***   0.060***   0.058**    0.056*
                      (0.082)     (0.089)     (0.091)     (0.024)    (0.022)    (0.023)    (0.032)
Quadratic polynomial in distance to mita boundary
Mita                  −0.277***   −0.227**    −0.224**    0.072***   0.060***   0.061***   0.056*
                      (0.078)     (0.089)     (0.092)     (0.023)    (0.022)    (0.023)    (0.030)
Quartic polynomial in distance to mita boundary
Mita                  −0.251***   −0.229**    −0.246***   0.073***   0.064***   0.063***   0.055*
                      (0.078)     (0.089)     (0.088)     (0.023)    (0.022)    (0.023)    (0.030)
Interacted linear polynomial in distance to mita boundary
Mita                  −0.301*     −0.277      −0.385*     0.082      0.087      0.095      0.132**
                      (0.174)     (0.190)     (0.210)     (0.054)    (0.055)    (0.065)    (0.053)
Interacted quadratic polynomial in distance to mita boundary
Mita                  −0.351      −0.505      −0.295      0.140*     0.132      0.136      0.121*
                      (0.260)     (0.319)     (0.366)     (0.082)    (0.084)    (0.086)    (0.064)

Ordinary least squares
Mita                  −0.294***   −0.288***   −0.227**    0.057**    0.048*     0.049*     0.055*
                      (0.083)     (0.089)     (0.090)     (0.025)    (0.024)    (0.026)    (0.031)

Geo. controls         yes         yes         yes         yes        yes        yes        yes
Boundary F.E.s        yes         yes         yes         yes        yes        yes        yes
Clusters              71          60          52          289        239        185        63
Observations          1478        1161        1013        158,848    115,761    100,446    37,421

^a Robust standard errors, adjusted for clustering by district, are in parentheses. All regressions include geographic controls and boundary segment fixed effects (F.E.s). Columns 1–3 include demographic controls for the number of infants, children, and adults in the household. Coefficients significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
Next, the ninth through thirteenth rows examine robustness to the same set of specifications, using distance to the mita boundary as the running variable. Finally, the fourteenth row reports estimates from a specification using ordinary least squares. The mita effect on consumption is always statistically significant in the relatively parsimonious specifications: those that use noninteracted, single-dimensional RD polynomials and ordinary least squares.
22 The mita effect is evaluated at the mean distance to Potosí for observations very near (<10 km from) the mita boundary. Results are broadly robust to evaluating the mita effect at different average distances to Potosí, that is, for districts <25 km from the boundary, for bordering districts, or for all districts.
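To fix ideas about how the interacted specifications summarized in footnote 22 are reported, suppose the estimating equation takes the assumed form

y_{id} = \alpha + \textit{mita}_{d}\,(\gamma_{0} + \gamma_{1}\,\text{dist}_{d}) + f(\text{dist}_{d}) + X_{id}'\delta + \phi_{b(d)} + \varepsilon_{id}.

The mita effect is then not a single coefficient but \hat{\gamma}_{0} + \hat{\gamma}_{1}\,\bar{d}, evaluated at \bar{d}, the mean distance to Potosí (or to the boundary) among observations within 10 km of the mita boundary. The notation here is a sketch and is not taken from the paper's own equations.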
In the more flexible specifications—the longitude–latitude regressions and those that interact the RD polynomial with the mita dummy—the mita coefficients in the consumption regression tend to be imprecisely estimated. As in Table II, the household survey does not provide enough power to precisely estimate relatively flexible specifications, but the coefficients are similar in magnitude to those estimated using a more parsimonious approach. Estimates of the mita’s impact on stunting are statistically significant across most specifications and samples.23
Given broad robustness to functional form assumptions, Table IV reports a number of additional robustness checks using the three baseline specifications of the RD polynomial. To conserve space, I report estimates only from the sample that contains districts within 50 km of the mita boundary. Columns 1–7 examine the household consumption data and columns 8–12 examine the stunting data. For comparison purposes, columns 1 and 8 present the baseline estimates from Table II. Column 2 adds a control for ethnicity, equal to 1 if an indigenous language is spoken in the household and 0 otherwise. Next, columns 3 and 9 include metropolitan Cusco. In response to the potential endogeneity of the mita to Inca landholding patterns, columns 4 and 10 exclude districts that contained Inca royal estates, which served sacred as opposed to productive purposes (Niles (1987, p. 13)). Similarly, columns 5 and 11 exclude districts falling along portions of the mita boundary formed by rivers, to account for one way in which the boundary could be endogenous to geography. Column 6 estimates consumption equivalence flexibly, using log household consumption as the dependent variable and controlling for the ratio of children to adults and the log of household size. In all cases, point estimates and significance levels tend to be similar to those in Table II. As expected, the point estimates are somewhat larger when metropolitan Cusco is included.
Table IV also investigates whether differential rates of migration today may be responsible for living standards differences between mita and non-mita districts. Given that in-migration in non-mita districts is about 4.8% higher than in mita districts (whereas rates of out-migration are statistically and economically similar), I omit the 4.8% of the non-mita sample with the highest equivalent household consumption and the least stunting, respectively. Estimates in columns 7 and 12 remain of similar magnitude and statistical significance, documenting that migration today is not the primary force responsible for the mita effect.
If the RD specification is estimating the mita’s long-run effect as opposed to some other underlying difference, being inside the mita catchment should not affect economic prosperity, institutions, or demographics prior to the mita’s enactment. In a series of specification checks, I first regress the log of the mean district 1572 tribute contribution per adult male on the variables used in the stunting regressions in Table II.
23 Results (not shown) are also robust to including higher order polynomials in elevation and slope.
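The migration-trimming exercise behind columns 7 and 12 can be sketched as follows; the file and column names are hypothetical, and the 4.8% cutoff is the in-migration differential reported above.

import pandas as pd

df = pd.read_csv("households.csv")  # hypothetical file with mita and log_equiv_consumption

non_mita = df[df.mita == 0]
cutoff = non_mita.log_equiv_consumption.quantile(1 - 0.048)
trimmed = pd.concat([
    df[df.mita == 1],
    non_mita[non_mita.log_equiv_consumption <= cutoff],  # drop top 4.8% of non-mita sample
])
# `trimmed` is then passed to the same baseline RD regressions as the full sample.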
TABLE IV
ADDITIONAL SPECIFICATION TESTS^a

Log equivalent household consumption (2001): (1) baseline; (2) control for ethnicity; (3) includes Cusco; (4) excludes districts with Inca estates; (5) excludes portions of boundary formed by rivers; (6) flexible estimation of consumption equivalence; (7) excludes migration.
Stunted growth, children 6–9 (2005): (8) baseline; (9) includes Cusco; (10) excludes districts with Inca estates; (11) excludes portions of boundary formed by rivers; (12) excludes migration.

Panel A. Cubic polynomial in latitude and longitude
Mita   −0.331    −0.202    −0.465**  −0.281    −0.322    −0.326    −0.223    0.087*    0.147***  0.093*    0.090*    0.069
       (0.219)   (0.157)   (0.207)   (0.265)   (0.215)   (0.230)   (0.198)   (0.048)   (0.048)   (0.048)   (0.048)   (0.049)

Panel B. Cubic polynomial in distance to Potosí
Mita   −0.329*** −0.282*** −0.450*** −0.354*** −0.376*** −0.328*** −0.263*** 0.078***  0.146***  0.077***  0.081***  0.060**
       (0.096)   (0.073)   (0.096)   (0.101)   (0.114)   (0.099)   (0.095)   (0.024)   (0.030)   (0.026)   (0.024)   (0.025)

Panel C. Cubic polynomial in distance to mita boundary
Mita   −0.224**  −0.195*** −0.333*** −0.255**  −0.217**  −0.224**  −0.161*   0.064***  0.132***  0.066***  0.065***  0.046*
       (0.092)   (0.070)   (0.087)   (0.110)   (0.098)   (0.095)   (0.088)   (0.023)   (0.027)   (0.025)   (0.023)   (0.024)

Geo. controls and boundary F.E.s: yes in all columns.
Clusters       52      52      57      47      51      52      52      185       195       180      183      185
Observations   1013    1013    1173    930     992     1013    997     100,446   127,259   96,440   99,940   98,922

^a Robust standard errors, adjusted for clustering by district, are in parentheses. All regressions include soil type indicators and boundary segment fixed effects (F.E.s). Columns 1–5 and 7 include demographic controls for the number of infants, children, and adults in the household. Column 6 includes controls for the log of household size and the ratio of children to household members, using the log of household consumption as the dependent variable. The samples include observations whose district capitals are less than 50 km from the mita boundary. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
I then examine the shares of 1572 tribute revenues allocated to rents for Spanish nobility, salaries for Spanish priests, salaries for local Spanish administrators, and salaries for indigenous mayors. Finally, also using data from the 1572 census, I investigate demographics, with the population shares of tribute-paying males (those aged 18–50), boys, and women as the dependent variables. These regressions, reported in Table V, do not show statistically significant differences across the mita boundary, and the estimated mita coefficients are small.

TABLE V
1572 TRIBUTE AND POPULATION^a

Columns: (1) log mean tribute; shares of tribute revenues to (2) Spanish nobility, (3) Spanish priests, (4) Spanish justices, (5) indigenous mayors; shares of 1572 population that are (6) men, (7) boys, (8) females.

Panel A. Cubic polynomial in latitude and longitude
Mita    0.020     −0.010    0.004     0.004     0.003     −0.006    0.011     −0.009
        (0.031)   (0.030)   (0.019)   (0.010)   (0.005)   (0.009)   (0.012)   (0.016)

Panel B. Cubic polynomial in distance to Potosí
Mita    0.019     −0.013    0.008     0.006     −0.001    −0.012    0.005     −0.011
        (0.029)   (0.025)   (0.015)   (0.009)   (0.004)   (0.008)   (0.010)   (0.012)

Panel C. Cubic polynomial in distance to mita boundary
Mita    0.040     −0.009    0.005     0.003     −0.001    −0.011    0.001     −0.008
        (0.030)   (0.018)   (0.012)   (0.006)   (0.004)   (0.007)   (0.008)   (0.010)

Geo. controls and boundary F.E.s: yes in all columns.
Mean dep. var.   1.591    0.625    0.203    0.127    0.044    0.193    0.204    0.544
Observations     65       65       65       65       65       65       65       65

^a The dependent variable in column 1 is the log of the district’s mean 1572 tribute rate (Miranda (1583)). In columns 2–5, it is the share of tribute revenue allocated to Spanish nobility (encomenderos), Spanish priests, Spanish justices, and indigenous mayors (caciques), respectively. In columns 6–8, it is the share of 1572 district population composed of males (aged 18–50), boys, and females (of all ages), respectively. Panel A includes a cubic polynomial in longitude and latitude, panel B includes a cubic polynomial in Euclidean distance from the observation’s district capital to Potosí, and panel C includes a cubic polynomial in Euclidean distance to the nearest point on the mita boundary. All regressions include geographic controls and boundary segment fixed effects. The samples include districts whose capitals are less than 50 km from the mita boundary. Column 1 weights by the square root of the district’s tributary population and columns 6–8 weight by the square root of the district’s total population. 66% of the observations are from mita districts. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
To achieve credible identification, I exploit variation across observations located near the mita boundary. If the boundary is an unusual place, these estimates may have little external validity. To examine this issue further, I use ordinary least squares to estimate the correlation between the mita and the main outcome variables (including those that will be examined in Section 4), limiting the sample to districts located between 25 and 100 km from the mita boundary. The estimates are quite similar to those obtained from the RD specifications (results available upon request). Moreover, correlations between the mita and living standards (measured by both consumption and stunting) calculated along the entire mita boundary within Peru are consistent in magnitude with the effects documented above.24 In summary, the RD evidence appears informative about the mita’s overall impacts.
Why would the mita affect economic prosperity nearly 200 years after its abolition? To open this black box, I turn to an investigation of channels of persistence.

4. CHANNELS OF PERSISTENCE

This section uses data from the Spanish Empire and Peruvian Republic to test channels of persistence. There exist many potential channels, but to provide a picture that is both parsimonious and informative, I focus on three that the historical literature and fieldwork suggest are important: land tenure, public goods, and market participation. The results document that the mita limited the establishment of large landowners inside the mita catchment and, combined with historical evidence, suggest that land tenure has in turn affected public goods provision and smallholder participation in agricultural markets.
The tables in the main text report three specifications, which use a cubic polynomial in latitude and longitude, a cubic polynomial in distance to Potosí, or a cubic polynomial in distance to the mita boundary. Table A.III in the Supplemental Material reports results from the 14 additional specifications examined in Table III. In most cases, the point estimates across these specifications are similar. When they are not, I note it explicitly.25

4.1. Land Tenure and Labor Systems

This section examines the impact of the mita on the formation of haciendas—rural estates with an attached labor force permanently settled on the estate (Keith (1971, p. 437)). Critically, when authorities instituted the mita in 1573 (40 years after the Spanish conquest of Peru), a landed elite had not yet formed.
24 When considering observations in Peru within 50 km of any point on the mita boundary, being inside the mita catchment is associated with 28.4 percent lower equivalent household consumption and an increase of 16.4 percentage points in the prevalence of stunting.
25 As in Table III, the more flexible specifications in Table A.III are less likely than the parsimonious ones to estimate statistically significant effects.
At the time, Peru was parceled into encomiendas, pieces of territory in which appointed Spaniards exercised the right to collect tribute and labor services from the indigenous population but did not hold title to land (Keith (1971, p. 433)). Rivalries between encomenderos provoked civil wars in the years following Peru’s conquest, and thus the Crown began to dismantle the encomienda system during the 1570s. This opened the possibility for manipulating land tenure to promote other policy goals, in particular, the mita.26 Specifically, Spanish land tenure policy aimed to minimize the establishment of landed elites in mita districts, as large landowners—who unsurprisingly opposed yielding their attached labor for a year of mita service—formed the state’s principal labor market competition (Larson (1988), Sanchez-Albornoz (1978)).27 Centrally, as Bolivian historian Larson (1988, p. 171) concisely articulated, “Haciendas secluded peasants from the extractive institutions of colonial society.” Moreover, by protecting native access to agricultural lands, the state promoted the ability of the indigenous community to subsidize mita conscripts, who were paid substantially below subsistence wages (Garrett (2005, p. 120), Tandeter (1993, pp. 58–60), Cole (1985, p. 31)). Similarly, authorities believed that protecting access to land could be an effective means of staving off demographic collapse (Larson (1982, p. 11), Cook (1981, pp. 108–114, 250), Morner (1978)). Finally, in return for ensuring the delivery of conscripts, local authorities were permitted to extract surplus that would have otherwise been claimed by large landowners (Garrett (2005, p. 115)).
I now examine the concentration of haciendas in 1689, 1845, and 1940. The 1689 data are contained in parish reports commissioned by Bishop Manuel de Mollinedo and submitted by all parishes in the bishopric of Cusco, which encompassed most of the study region. The reports list the number of haciendas and the population within each subdivision of the parish, and were compiled by Horacio Villanueva Urteaga (1982). For haciendas in 1845, I employ data collected by the Cusco regional government, which had jurisdiction over a substantial fraction of the study region, on the percentage of the rural tributary population residing in haciendas (Peralta Ruiz (1991)). Data from 1845, 1846, and 1850 are combined to form the circa 1845 data set.28 Finally, data from the 1940 Peruvian Population Census are aggregated to the district level to calculate the percentage of the rural population residing in haciendas.
26 Throughout the colonial period, royal policy aimed to minimize the power of the (potentially revolutionary) landed class: landowners did not acquire the same political clout as mine owners, the most powerful colonial interest group (Tandeter (1993), Cole (1985)).
27 For example, land sales under Philip IV between 1634 and 1648 and by royal charter in 1654 played a central role in hacienda formation and were almost exclusively concentrated in non-mita districts (Brisseau (1981, p. 146), Glave and Remy (1978, p. 1)).
28 When data are available for more than one year, figures change little, and I use the earliest observation.
TABLE VI
LAND TENURE AND LABOR SYSTEMS^a

Dependent variable:   Haciendas per    Haciendas per 1000    Percent of rural tributary   Percent of rural population   Land Gini
                      district, 1689   district residents,   population in haciendas,     in haciendas, 1940            in 1994
                                       1689                  ca. 1845
                      (1)              (2)                   (3)                          (4)                           (5)

Panel A. Cubic polynomial in latitude and longitude
Mita                  −12.683***       −6.453**              −0.127*                      −0.066                        0.078
                      (3.221)          (2.490)               (0.067)                      (0.086)                       (0.053)

Panel B. Cubic polynomial in distance to Potosí
Mita                  −10.316***       −7.570***             −0.204**                     −0.143***                     0.107***
                      (2.057)          (1.478)               (0.082)                      (0.051)                       (0.036)

Panel C. Cubic polynomial in distance to mita boundary
Mita                  −11.336***       −8.516***             −0.212***                    −0.120***                     0.124***
                      (2.074)          (1.665)               (0.060)                      (0.045)                       (0.033)

Geo. controls         yes              yes                   yes                          yes                           yes
Boundary F.E.s        yes              yes                   yes                          yes                           yes
Mean dep. var.        6.500            5.336                 0.135                        0.263                         0.783
Observations          74               74                    81                           119                           181

^a The unit of observation is the district. Robust standard errors are in parentheses. The dependent variable in column 1 is haciendas per district in 1689 and in column 2 is haciendas per 1000 district residents in 1689 (Villanueva Urteaga (1982)). In column 3 it is the percentage of the district’s tributary population residing in haciendas ca. 1845 (Peralta Ruiz (1991)), in column 4 it is the percentage of the district’s rural population residing in haciendas in 1940 (Dirección de Estadística del Perú (1944)), and in column 5 it is the district land gini (INEI (1994)). Panel A includes a cubic polynomial in the latitude and longitude of the observation’s district capital, panel B includes a cubic polynomial in Euclidean distance from the observation’s district capital to Potosí, and panel C includes a cubic polynomial in Euclidean distance to the nearest point on the mita boundary. All regressions include geographic controls and boundary segment fixed effects. The samples include districts whose capitals are less than 50 km from the mita boundary. Column 3 is weighted by the square root of the district’s rural tributary population and column 4 is weighted by the square root of the district’s rural population. 58% of the observations are in mita districts in columns 1 and 2, 59% in column 3, 62% in column 4, and 66% in column 5. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
In Table VI, column 1 (number of haciendas per district) and column 2 (number of haciendas per 1000 district residents) show a very large mita effect on the concentration of haciendas in the 17th century, of similar magnitude and highly significant across specifications.29 The median coefficient from column 1, contained in panel C, estimates that the mita lowered the number of haciendas in subjected districts by 11.3 (s.e. = 2.1), a sizeable effect given that on average mita districts contained only one hacienda.
29 Given the mita’s role in provoking population collapse (Wightman (1990, p. 72)), the latter measure is likely endogenous, but nevertheless provides a useful robustness check.
Figure 2, panel (c) clearly demonstrates the discontinuity. Moreover, Table VI provides reasonably robust support for a persistent impact. Column 3 estimates that the mita lowered the percentage of the rural tributary population in haciendas in 1845 by around 20 percentage points (with estimates ranging from 0.13 to 0.21), an effect that is statistically significant across specifications. Column 4 suggests that disparities persisted into the 20th century, with an estimated effect on the percentage of the rural labor force in haciendas that is somewhat smaller for 1940 than for 1845—as can be seen by comparing panels (d) and (e) of Figure 2—and not quite as robust. The median point estimate is −0.12 (s.e. = 0.045) in panel C; the point estimates are statistically significant at the 1% level in panels B and C, but the longitude–latitude specification estimates an effect that is smaller, at −0.07, and imprecise.
Table VI also documents that the percentage of the rural population in haciendas nearly doubled between 1845 and 1940, paralleling historical evidence for a rapid expansion of haciendas in the late 19th and early 20th centuries. This expansion was spurred by a large increase in land values due to globalization and seems to have been particularly coercive inside the mita catchment (Jacobsen (1993, pp. 226–237), Favre (1967, p. 243), Nuñez (1913, p. 11)). No longer needing to ensure mita conscripts, Peru abolished the communal land tenure predominant in mita districts in 1821, but did not replace it with enforceable peasant titling (Jacobsen (1993), Dancuart and Rodriguez (1902, Vol. 2, p. 136)). This opened the door to tactics such as the interdicto de adquirir, a judicial procedure which allowed aspiring landowners to legally claim “abandoned” lands that in reality belonged to peasants. Hacienda expansion also occurred through violence, with cattle rustling, grazing estate cattle on peasant lands, looting, and physical abuse used as strategies to intimidate peasants into signing bills of sale (Avila (1952, p. 22), Roca-Sanchez (1935, pp. 242–243)). Numerous peasant rebellions engulfed mita districts during the 1910s and 1920s, and indiscriminate banditry and livestock rustling remained prevalent in some mita districts for decades (Jacobsen (1993), Ramos Zambrano (1984), Tamayo Herrera (1982), Hazen (1974, pp. 170–178)). In contrast, large landowners had been established since the early 17th century in non-mita districts, which remained relatively stable (Flores Galindo (1987, p. 240)).
In 1969, the Peruvian government enacted an agrarian reform bill mandating the complete dissolution of haciendas. As a result, the hacienda elite were deposed and lands formerly belonging to haciendas were divided into Agricultural Societies of Social Interest (SAIS) during the early 1970s (Flores Galindo (1987)). In SAIS, neighboring indigenous communities and the producers acted as collective owners. By the late 1970s, attempts to impose collective ownership through SAIS had failed, and many SAIS were divided and allocated to individuals (Mar and Mejia (1980)).
The 1994 Agricultural Census (INEI (1994)) documents that when considering districts within 50 km of the mita boundary, 20% of household heads outside the mita catchment received their land in the 1970s through the agrarian reform, versus only 9% inside the mita catchment. Column 5 of Table VI, using data from the 1994 Agricultural Census, documents somewhat lower land inequality in non-mita districts. This finding is consistent with those in columns 1–4, given that non-mita districts had more large properties that could be distributed to smallholders during the agrarian reform.30

4.2. Public Goods

Table VII examines the mita’s impact on education in 1876, 1940, and 2001, providing two sets of interesting results.31 First, there is some evidence that the mita lowered access to education historically, although point estimates are imprecisely estimated by the longitude–latitude RD polynomial. In column 1, the dependent variable is the district’s mean literacy rate, obtained from the 1876 Population Census (Dirección de Estadística del Perú (1878)). Individuals are defined as literate if they could read, write, or both. Panels B and C show a highly significant mita effect of around 2 percentage points, as compared to an average literacy rate of 3.6% in the region I examine. The estimated effect is smaller, at around one percentage point, and not statistically significant, when estimated using the more flexible longitude–latitude specification.32 In column 2, the dependent variable is mean years of schooling by district, from the 1940 Population Census (Dirección de Estadística del Perú (1944)). The specifications reported in panels A–C suggest a long-run negative mita effect of around 0.2 years, as compared to a mean schooling attainment of 0.47 years throughout the study region, which again is statistically significant in panels B and C.
While this provides support for a mita effect on education historically, the evidence for an effect today is weak. In column 3, the dependent variable is individual years of schooling, obtained from ENAHO (2001). The mita coefficient is negative in all panels, but is of substantial magnitude and marginally significant only in panel A.33 It is also statistically insignificant in most specifications in Table A.III. This evidence is consistent with studies of the Peruvian educational sector, which emphasize near-universal access (Saavedra and Suárez (2002), Portocarrero and Oliart (1989)).
30 The 1994 Agricultural Census also documents that a similar percentage of households across the mita boundary held formal titles to their land.
31 Education, roads, and irrigation are the three public goods traditionally provided in Peru (Portocarrero, Beltran, and Zimmerman (1988)). Irrigation has been almost exclusively concentrated along the coast.
32 In some of the specifications in Table A.III in the Supplemental Material that interact the RD polynomial with the mita dummy, the estimated mita effect is near 0. This discrepancy is explained by two mita districts with relatively high literacy located near the mita boundary, to which these specifications are sensitive. When these two observations are dropped, the magnitude of the effect is similar across specifications.
33 Data from the 1981 Population Census (INEI (1981)) likewise do not show a mita effect on years of schooling. Moreover, data collected by the Ministro de Educación in 2005 reveal no systematic differences in primary or secondary school enrollment or completion rates. Examination of data from a 2006 census of schools likewise showed little evidence for a causal impact of the mita on school infrastructure or the student-to-teacher ratio.
TABLE VII
EDUCATION^a

Dependent variable:   Literacy, 1876   Mean years of       Mean years of
                                       schooling, 1940     schooling, 2001
                      (1)              (2)                 (3)

Panel A. Cubic polynomial in latitude and longitude
Mita                  −0.015           −0.265              −1.479*
                      (0.012)          (0.177)             (0.872)

Panel B. Cubic polynomial in distance to Potosí
Mita                  −0.020***        −0.181**            −0.341
                      (0.007)          (0.078)             (0.451)

Panel C. Cubic polynomial in distance to mita boundary
Mita                  −0.022***        −0.209***           −0.111
                      (0.006)          (0.076)             (0.429)

Geo. controls         yes              yes                 yes
Boundary F.E.s        yes              yes                 yes
Mean dep. var.        0.036            0.470               4.457
Clusters              95               118                 52
Observations          95               118                 4038

^a The unit of observation is the district in columns 1 and 2 and the individual in column 3. Robust standard errors, adjusted for clustering by district, are in parentheses. The dependent variable is mean literacy in 1876 in column 1 (Dirección de Estadística del Perú (1878)), mean years of schooling in 1940 in column 2 (Dirección de Estadística del Perú (1944)), and individual years of schooling in 2001 in column 3 (ENAHO (2001)). Panel A includes a cubic polynomial in the latitude and longitude of the observation’s district capital, panel B includes a cubic polynomial in Euclidean distance from the observation’s district capital to Potosí, and panel C includes a cubic polynomial in Euclidean distance to the nearest point on the mita boundary. All regressions include geographic controls and boundary segment fixed effects. The samples include districts whose capitals are less than 50 km from the mita boundary. Columns 1 and 2 are weighted by the square root of the district’s population. 64% of the observations are in mita districts in column 1, 63% in column 2, and 67% in column 3. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
What about roads, the other principal public good in Peru? I estimate the mita’s impact using a GIS road map of Peru produced by the Ministro de Transporte (2006). The map classifies roads as paved, gravel, nongravel, and trocha carrozable, which translates as “narrow path, often through wild vegetation that a vehicle can be driven on with great difficulty” (Real Academia Española (2006)).
TABLE VIII
ROADS^a

Dependent variable:   Density of local    Density of regional   Density of paved/gravel
                      road networks       road networks         regional roads
                      (1)                 (2)                   (3)

Panel A. Cubic polynomial in latitude and longitude
Mita                  0.464               −29.276*              −22.426*
                      (18.575)            (16.038)              (12.178)

Panel B. Cubic polynomial in distance to Potosí
Mita                  −1.522              −32.644***            −30.698***
                      (12.101)            (8.988)               (8.155)

Panel C. Cubic polynomial in distance to mita boundary
Mita                  0.535               −35.831***            −32.458***
                      (12.227)            (9.386)               (8.638)

Geo. controls         yes                 yes                   yes
Boundary F.E.s        yes                 yes                   yes
Mean dep. var.        85.34               33.55                 22.51
Observations          185                 185                   185

^a The unit of observation is the district. Robust standard errors are in parentheses. The road densities are defined as total length in meters of the respective road type in each district divided by the district’s surface area, in kilometers squared. They are calculated using a GIS map of Peru’s road networks (Ministro de Transporte (2006)). Panel A includes a cubic polynomial in the latitude and longitude of the observation’s district capital, panel B includes a cubic polynomial in Euclidean distance from the observation’s district capital to Potosí, and panel C includes a cubic polynomial in Euclidean distance to the nearest point on the mita boundary. All regressions include geographic controls and boundary segment fixed effects. The samples include districts whose capitals are less than 50 km from the mita boundary. 66% of the observations are in mita districts. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
The total length (in meters) of district roads is divided by the district surface area (in kilometers squared) to obtain a road network density. Column 1 of Table VIII suggests that the mita does not impact local road networks, which consist primarily of nongravel and trocha roads. Care is required in interpreting this result, as the World Bank’s Rural Roads program, operating since 1997, has worked to reduce disparities in local road networks in marginalized areas of Peru. In contrast, there are significant disparities in regional road networks, which connect population centers to each other. Column 2 in panel A estimates that the mita lowers the density of regional roads by a statistically significant 29.3 meters of roadway for every square kilometer of district surface area (s.e. = 16.0).
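The road-density measure just described can be computed from a GIS road map as in the following sketch; the shapefile names, the metric projection, and the column names are illustrative assumptions, not the procedure used to build the published measure.

import geopandas as gpd

districts = gpd.read_file("districts.shp").to_crs(epsg=32718)  # UTM zone 18S, units in meters
roads = gpd.read_file("roads.shp").to_crs(epsg=32718)

road_lines = roads.unary_union  # merge all road segments into one geometry

# Road length (meters) inside each district, divided by district area (km^2).
districts["road_m"] = districts.geometry.apply(lambda poly: road_lines.intersection(poly).length)
districts["area_km2"] = districts.geometry.area / 1e6
districts["road_density"] = districts["road_m"] / districts["area_km2"]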
In panels B and C, the coefficients are similar, at −32.6 and −35.8, respectively, and are significant at the 1% level. This large effect compares to an average road density in mita districts of 20. Column 3 breaks down the result by looking only at the two highest quality road types—paved and gravel—and a similar picture emerges.34
If substantial population and economic activity had endogenously clustered along roads, the relative poverty of mita districts would not be that surprising. While many of Peru’s roads were built or paved in the interlude between 1940 and 1990, aggregate population responses appear minimal. The correlation between 1940 district population density and the density of paved and gravel roads, measured in 2006, is 0.58; when looking at this correlation using 1993 population density, it remains at 0.58.
In summary, while I find little evidence that a mita effect persists through access to schooling, there are pronounced disparities in road networks across the mita boundary. Consistent with this evidence, I hypothesize that the long-term presence of large landowners provided a stable land tenure system that encouraged public goods provision.35 Because established landowners in non-mita districts controlled a large percentage of the productive factors and because their property rights were secure, it is probable that they received higher returns to investing in public goods than those inside the mita catchment. Moreover, historical evidence indicates that these landowners were better able to secure roads, through lobbying for government resources and organizing local labor, and these roads remain today (Stein (1980, p. 59)).36

4.3. Proximate Determinants of Household Consumption

This section examines the mita’s long-run effects on the proximate determinants of consumption. The limited available evidence does not suggest differences in investment, so I focus on the labor force and market participation.37 Agriculture is an important economic activity, providing primary employment for around 70% of the population in the region examined. Thus, Table IX begins by looking at the percentage of the district labor force whose primary occupation is agriculture, taken from the 1993 Population Census.
34 18% of mita districts can be accessed by paved roads versus 40% of non-mita districts (INEI (2004)).
35 The elasticity of equivalent consumption in 2001 with respect to haciendas per capita in 1689, in non-mita districts, is 0.036 (s.e. = 0.022).
36 The first modern road building campaigns occurred in the 1920s and many of the region’s roads were constructed in the 1950s (Stein (1980), Capuñay (1951, pp. 197–199)).
37 Data from the 1994 Agricultural Census on utilization of 15 types of capital goods and 12 types of infrastructure for agricultural production do not show differences across the mita boundary, nor is the length of fallowing different. I am not aware of data on private investment outside of agriculture.
TABLE IX
CONSUMPTION CHANNELS^a

Dependent variable:   Percent of district      Agricultural household     Household member employed
                      labor force in           sells part of produce      outside the agricultural
                      agriculture, 1993        in markets, 1994           unit, 1994
                      (1)                      (2)                        (3)

Panel A. Cubic polynomial in latitude and longitude
Mita                  0.211                    −0.074**                   −0.013
                      (0.140)                  (0.036)                    (0.032)

Panel B. Cubic polynomial in distance to Potosí
Mita                  0.101                    −0.208***                  −0.033
                      (0.061)                  (0.030)                    (0.020)

Panel C. Cubic polynomial in distance to mita boundary
Mita                  0.092*                   −0.225***                  −0.038**
                      (0.054)                  (0.032)                    (0.018)

Geo. controls         yes                      yes                        yes
Boundary F.E.s        yes                      yes                        yes
Mean dep. var.        0.697                    0.173                      0.245
Clusters              179                      178                        182
Observations          179                      160,990                    183,596

^a Robust standard errors, adjusted for clustering by district in columns 2 and 3, are in parentheses. The dependent variable in column 1 is the percentage of the district’s labor force engaged in agriculture as a primary occupation (INEI (1993)), in column 2 it is an indicator equal to 1 if the agricultural unit sells at least part of its produce in markets, and in column 3 it is an indicator equal to 1 if at least one member of the household pursues secondary employment outside the agricultural unit (INEI (1994)). Panel A includes a cubic polynomial in the latitude and longitude of the observation’s district capital, panel B includes a cubic polynomial in Euclidean distance from the observation’s district capital to Potosí, and panel C includes a cubic polynomial in Euclidean distance to the nearest point on the mita boundary. All regressions include geographic controls and boundary segment fixed effects. Column 1 is weighted by the square root of the district’s population. 66% of the observations in column 1 are in mita districts, 68% in column 2, and 69% in column 3. Coefficients that are significantly different from zero are denoted by the following system: *10%, **5%, and ***1%.
The median point estimate on mita is equal to 0.10 and is marginally significant only in panel C, providing some weak evidence for a mita effect on employment in agriculture. Further results (not shown) do not find an effect on male and female labor force participation or on hours worked.
The dependent variable in column 2, from the 1994 Agricultural Census, is a dummy equal to 1 if the agricultural household sells at least part of its produce in markets. Taken together, the evidence indicates that the mita’s effects persist in part through an economically meaningful impact on agricultural market participation, although the precise magnitude of this effect is difficult to establish convincingly given the properties of the data and the mechanics of RD.
The cubic longitude–latitude regression estimates a long-run mita effect of −0.074 (s.e. = 0.036), which is significant at the 5% level and compares to a mean market participation rate in the study region of 0.17. The magnitude of this estimate differs substantially from estimates that use a cubic polynomial in distance to Potosí (panel B, −0.208, s.e. = 0.030) and a cubic polynomial in distance to the mita boundary (panel C, −0.225, s.e. = 0.032). It also contrasts with the estimate from ordinary least squares limiting the sample to districts bordering the boundary (−0.178, s.e. = 0.050).
The surface plots in Figure 3 shed some light on why the cubic longitude–latitude point estimate is smaller. They show predicted values in “longitude–latitude–market participation rate” space from regressing the market participation dummy on the mita dummy (upper left), the mita dummy and a linear polynomial in longitude–latitude (upper right), the mita dummy and a quadratic polynomial in longitude–latitude (lower left), or the mita dummy and a cubic polynomial in longitude–latitude (lower right).38 The mita region is seen from the side, appearing as a “canyon” with lower market participation values. In the surface plot with the cubic polynomial, which is analogous to the regression in panel A, the function increases smoothly and steeply, by orders of magnitude, near the mita boundary. In contrast, the other plots model less of the steep variation near the boundary as smooth and thus estimate a larger discontinuity. The single-dimensional RDs likewise have fewer degrees of freedom to model the variation near the boundary as smooth.
It is not obvious which specification produces the most accurate results, as a more flexible specification will not necessarily yield a more reliable estimate. For example, consider the stylized case of an equation that includes the mita dummy and a polynomial with as many terms as observations. This has a solution that perfectly fits the data with a discontinuity term of zero, regardless of how large the true mita effect is. On the other hand, flexibility is important if parsimonious specifications do not have enough degrees of freedom to accurately model smoothly changing unobservables. While there is not, for example, a large urban area at the peak of the cubic polynomial causing market participation to increase steeply in this region, it is difficult to conclusively argue that the variation is attributable to the discontinuity and not to unobservables, or vice versa.39 The estimates in Tables IX and A.III are most useful for determining a range of possible mita effects consistent with the data, and this range supports an economically meaningful mita effect on market participation.40
38 I show three-dimensional surface plots, instead of shaded plots as in Figure 2, because the predicted values can be seen more clearly and it is not necessary to plot the data points.
39 Note, however, that the relatively large (mita) urban area of Ayacucho, while outside the study region, is near the cluster of mita districts with high market participation in the upper left corner of the mita area.
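The stylized overfitting point above can be illustrated with a toy example: once the polynomial has as many coefficients as there are observations, it interpolates the data on its own, so a specification that also includes the treatment dummy admits a perfect-fit solution in which the estimated discontinuity is zero no matter how large the true jump. The numbers below are invented purely for illustration.

import numpy as np

rng = np.random.default_rng(0)
n = 10
x = np.sort(rng.uniform(-1, 1, n))
y = 0.3 * x + 0.5 * (x > 0) + rng.normal(0, 0.05, n)  # true discontinuity of 0.5 at x = 0

# A degree n-1 polynomial has n coefficients and interpolates the n points exactly,
# so adding a treatment dummy with coefficient 0 leaves this perfect fit unchanged.
coef = np.polyfit(x, y, deg=n - 1)
print(np.max(np.abs(y - np.polyval(coef, x))))  # residuals are effectively zero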
FIGURE 3.—Plots of predicted values from regressing a market participation dummy on the mita dummy and various degrees of polynomials in longitude and latitude. See the text for a detailed description.
possible mita effects consistent with the data, and this range supports an economically meaningful mita effect on market participation.40 A mita effect on market participation is consistent with the findings on road networks, particularly given that recent studies on Andean Peru empirically connect poor road infrastructure to higher transaction costs, lower market participation, and reduced household income (Escobal and Ponce (2002), Escobal (2001), Agreda and Escobal (1998)).41 An alternative hypothesis is that agricultural producers in mita districts supplement their income by working as wage laborers rather than by producing for markets. In column 3, the dependent variable is an indicator equal to 1 if a member of the agricultural household participates in secondary employment outside the agricultural unit, also taken from the 1994 Agricultural Census. Estimates suggest that, if anything, the mita effect on participation in secondary employment is negative. Could residents in mita districts have less desire to participate in the market economy, rather than being constrained by poor road infrastructure? While Shining Path, a Maoist guerilla movement, gained a strong foothold in the region during the 1980s, this hypothesis seems unlikely.42 Shining Path’s rise to power occurred against a backdrop of limited support for Maoist ideology, and the movement’s attempts to reduce participation in markets were unpopular and unsuccessful where attempted (McClintock (1998), Palmer (1994)). Recent qualitative evidence also underscores roads and market access. The citizens I spoke with while visiting eight primarily mita and six primarily nonmita provinces were acutely aware that some areas are more prosperous than 40 The specifications interacting the mita dummy with a linear or quadratic polynomial in distance to the mita boundary, reported in Table A.III, do not estimate a significant mita effect. Graphical evidence suggests that these specifications are sensitive to outliers near the boundary. 41 In my sample, 33% of agricultural households in districts with paved road density above the median participate in markets, as compared to 13% in districts with paved road density below the median. Of course, there may also exist other channels through which a mita effect lowers market participation. Data from the 1994 Agricultural Census reveal that the median size of household landholdings is somewhat lower inside the mita catchment (at 1.2 hectares) than outside (at 1.4 hectares). If marketing agricultural produce involves fixed costs, a broader group of small farmers in non-mita districts may find it profitable. 42 Many of the factors linked to the mita (poor infrastructure, limited access to markets, poorly defined property rights, and poverty) are heavily emphasized as the leading factors promoting Shining Path (Comisión de la Verdad y Reconciliación (2003, Vol. 1, p. 94), McClintock (1998), Palmer (1994)). Thus, I tested whether there was a mita effect on Shining Path (results available upon request). To measure the intensity of Shining Path, I exploit a loophole in the Peruvian constitution that stipulates that when more than two-thirds of votes cast are blank or null, authorities cannot be renewed (Pareja and Gatti (1990)). In an attempt to sabotage the 1989 municipal elections, Shining Path operatives encouraged citizens to cast blank or null (secret) ballots (McClintock (1998, p. 79)). I find that a mita effect increased blank/null votes by 10.7 percentage points (s.e. 
= 0.031), suggesting greater support for and intimidation by Shining Path in mita districts. Moreover, estimates show that a mita effect increased the probability that authorities were not renewed by a highly significant 43.5 percentage points. I also look at blank/null votes in 2002, 10 years after Shining Path’s defeat, and there is no longer an effect.
others. When discussing the factors leading to the observed income differences, a common theme was that it is difficult to transport crops to markets. Thus, most residents in mita districts are engaged in subsistence farming. Agrarian scientist Gonzales Castro (2006) argued, “Some provinces have been favored, with the government—particularly during the large road building campaign in the early 1950s—choosing to construct roads in some provinces and completely ignore others.” At the forefront of the local government’s mission in the (primarily mita) province of Espinar is “to advocate effectively for a system of modern roads to regional markets” Espinar Municipal Government (2008). Popular demands have also centered on roads and markets. In 2004, (the mita district) Ilave made international headlines when demonstrations involving over 10,000 protestors culminated with the lynching of Ilave’s mayor, whom protestors accused of failing to deliver on promises to pave the town’s access road and build a local market (Shifter (2004)). 5. CONCLUDING REMARKS This paper documents and exploits plausible exogenous variation in the assignment of the mita to identify channels through which it influences contemporary economic development. I estimate that its long-run effects lower household consumption by around 25% and increase stunting in children by around 6 percentage points. I then document land tenure, public goods, and market participation as channels through which its impacts persist. In existing theories about land inequality and long-run growth, the implicit counterfactual to large landowners in Latin America is secure, enfranchised smallholders (Engerman and Sokoloff (1997)). This is not an appropriate counterfactual for Peru, or many other places in Latin America, because institutional structures largely in place before the formation of the landed elite did not provide secure property rights, protection from exploitation, or a host of other guarantees to potential smallholders. Large landowners—while they did not aim to promote economic prosperity for the masses—did shield individuals from exploitation by a highly extractive state and did ensure public goods. This evidence suggests that exploring constraints on how the state can be used to shape economic interactions—for example, the extent to which elites can employ state machinery to coerce labor or citizens can use state guarantees to protect their property—is a more useful starting point than land inequality for modeling Latin America’s long-run growth trajectory. The development of general models of institutional evolution and empirical investigation of how these constraints are influenced by forces promoting change are particularly central areas for future research. REFERENCES ACEMOGLU, D., M. A. BAUTISTA, P. QEURUBIN, AND J. A. ROBINSON (2008): “Economic and Political Inequality in Development: The Case of Cundinamarca, Colombia,” in Institutions
and Economic Performance, ed. by E. Helpman. Cambridge, MA: Harvard University Press. [1866] ACEMOGLU, D., S. JOHNSON, AND J. A. ROBINSON (2001): “The Colonial Origins of Comparative Development: An Empirical Investigation,” American Economic Review, 91, 1369–1401. [1863] (2002): “Reversal of Fortune: Geography and Institutions in the Making of the Modern World Income Distribution,” Quarterly Journal of Economics, 117, 1231–1294. [1863,1865] AGREDA, V., AND J. ESCOBAL (1998): “Analasís de la comercialización agrícola en el Perú,” Boletín de opinión, 33. [1898] AMAT Y JUNIENT, M. (1947): Memoria de Gobierno. Sevilla: Escuela de Estudios HispanoAmericanos. [1869,1878] ANGRIST, J., AND J. PISCHKE (2009): Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton, NJ: Princeton University Press. [1875] AVILA, J. (1952): “Exposición e causas que justifican la necesidad de la reforma agraria en el distrito de Azángaro de la provincia del mismo nombre, Puno,” Ph.D. Thesis, Universidad Nacional del Cusco. [1890] BAKEWELL, P. (1984): Miners of the Red Mountain. Indian Labor in Potosi, 1545–1650. Albuquerque: University of New Mexico Press. [1867,1868] BANERJEE, A., AND L. IYER (2005): “History, Institutions, and Economic Performance: The Legacy of Colonial Land Tenure Systems in India,” American Economic Review, 95, 1190–1213. [1865] BAUER, B., AND A. COVEY (2002): “Processes of State Formation in the Inca Heartland (Cusco, Peru),” American Anthropologist, 104, 846–864. [1874] BLACK, S. (1999): “Do Better Schools Matter? Parental Valuation of Elementary Education,” Quarterly Journal of Economics, 114, 577–599. [1875] BRISSEAU, J. (1981): Le Cuzco dans sa région: Étude de l’aire d’influence d’une ville andine. Lima: Institut français d’études andines. [1888] BUSTAMANTE OTERO, L. H. (1987): “Mita y realidad: Teodomerio Gutierrez Cuevas o Rumi Maqui en el marco de la sublevación campesina de Azángaro: 1915–1916,” Ph.D. Thesis, Pontificia Universidad Católica del Perú. [1865] CAÑETE, P. V. (1794): El codigo carolino de ordenanzas reales de las minas de Potos y demas provincias del Río de la Plata. Buenos Aires: E Martiré. Reprinted in 1973. [1868] CAPUÑAY, M. (1951): Leguia, vida y obra del constructor del gran Perú. Lima: Enrique Bustamante y Bellivian. [1894] CENTER FOR INTERNATIONAL EARTH SCIENCE INFORMATION (2004): Global Rural-Urban Mapping Project, Alpha Version: Population Grids. Palisades, NY: Socioeconomic Data and Applications Center (SEDAC), Columbia University. Available at http://sedac.ciesin.columbia.edu/ gpw (May 10th, 2007). [1871] CIEZA DE LEÓN, P. (1551): El Senorio de los Incas; 2a. Parte de la Crónica del Perú. Lima: Instituto de Estudios Peruanos. Reprinted in 1967. [1867] (1959): The Incas. Norman: University of Oklahoma Press. [1871] COATSWORTH, J. (2005): “Structures, Endowments, and Institutions in the Economic History of Latin America,” Latin American Research Review, 40, 126–144. [1863,1866] COLE, J. (1985): The Potosí Mita 1573–1700. Compulsory Indian Labor in the Andes. Stanford: Stanford University Press. [1867,1868,1876,1888] COMISIÓN DE LA VERDAD Y RECONCILIACIÓN (2003): Informe Final. Lima: Comision de la Verdad y Reconciliacion. [1898] CONLEY, T. (1999): “GMM Estimation With Cross Sectional Dependence,” Journal of Econometrics, 92, 1–45. [1871] COOK, N. D. (1981): Demographic Collapse: Colonial Peru 1520–1620. Cambridge: Cambridge University Press. [1867,1871,1874,1888] D’ALTOY, T. (2002): The Incas. Oxford: Blackwell. [1867] DANCUART, E., AND J. 
RODRIGUEZ (1902): Anales de la hacienda pública del Perú, 24 Vols. Lima: Imprenta de La Revista y Imprenta de Guillermo Stolte. [1865,1890]
DEATON, A. (1997): The Analysis of Household Surveys: A Microeconomic Approach to Development Policy. Baltimore: Johns Hopkins University Press. [1877,1879] DELL, M. (2010): “Supplement to ‘The Persistent Effects of Peru’s Mining Mita’,” Econometrica Supplementary Material, 78, http://www.econometricsociety.org/ecta/Supmat/8121_data description.pdf; http://www.econometricsociety.org/ecta/Supmat/8121_data_and_programs. zip. [1867] DIRECCIÓN DE ESTADÍSTICA DEL PERÚ (1878): Censo general de la república del Perú, formatdo en 1876. Lima: Imp. del Teatro. [1891,1892] (1944): Censo Nacional de Población y Ocupación, 1940. Lima: Imprenta Torres Aguirree. [1889,1891,1892] EASTERLY, W., AND R. LEVINE (2003): “Tropics, Germs, and Crops: How Endowments Influence Economic Development,” Journal of Monetary Economics, 50, 3–39. [1863] ENGERMAN, S., AND K. SOKOLOFF (1997): “Factors Endowments, Institutions, and Differential Paths of Growth Among New World Economies,” in How Latin American Fell Behind, ed. by S. Haber. Stanford: Stanford University Press, 260–304. [1863,1866,1899] ESCOBAL, J. (2001): “The Benefits of Roads in Rural Peru: A Transaction Costs Approach,” Working Paper, Grupo de Analisis para el Desarrollo, Lima. [1898] ESCOBAL, J., AND C. PONCE (2002): “The Benefits of Rural Roads: Enhancing Income Opportunities for the Rural Poor,” Working Paper 40, Grupo de Analisis para el Desarrollo, Lima. [1898] ESPINAR MUNICIPAL GOVERNMENT (2008): “Vision,” available at www.muniespinar.gob.pe/ vision.php. [1899] FAVRE, H. (1967): “Evolución y situación de las haciendas en la region de Huancavelica, Perú,” in La Hacienda en el Perú, ed. by H. Favre, C. Collin-Delavaud, and J. M. Mar. Lima: Instituto de Estudios Peruanos, 237–257. [1890] FLORES GALINDO, A. (1987): Buscando un inca: Identidad y utopia en los Andes. Lima: Instituto de Apoyo Agrario. [1865,1890] GARRETT, D. (2005): Shadows of Empire: The Indian Nobility of Cusco, 1750–1825. Cambridge: Cambridge University Press. [1865,1867,1888] GLAESER, E., AND A. SHLEIFER (2002): “Legal Origins,” Quarterly Journal of Economics, 117, 1193–1230. [1863,1865] GLAESER, E., R. LAPORTA, F. L. DE SILANES, AND A. SHLEIFER (2004): “Do Institutions Cause Growth?” Journal of Economic Growth, 9, 271–303. [1863] GLAVE, L. M. (1989): Trajinantes: Caminos indígenas de la sociedad colonial. Lima: Instituto de Apoyo Agraria. [1868,1876] GLAVE, L. M., AND M. I. REMY (1978): Estructura Agraria e historia rural cuzqueña: A proposito de las haciendas del Valle Sagrado: 1780–1930. Cuzco: Archivo Historico del Cuzco. [1888] GOLTE, J. (1980): La racionalidad de la organización andina. Lima: Insituto de Estudios Peruanos. [1868,1874] GONZALES CASTRO, E. (2006): Personal Interview. 14 December. [1899] HALL, R., AND C. JONES (1999): “Why do Some Countries Produce so Much More Output per Worker Than Others?” Quarterly Journal of Economics, 114, 83–116. [1863] HAZEN, D. (1974): “The Awakening of Puno: Government Policy and the Indian Problem in Southern Peru, 1900–1955,” Ph.D. Thesis, Yale University, New Haven. [1890] HIJMANS, R. ET AL. (2005): “Very High Resolution Interpolated Climate Surfaces for Global Land Area,” International Journal of Climatology, 25, 1965–1978. [1874] HYSLOP, J. (1984): The Inka Road System. New York: Academic Press. [1869] IMBENS, G., AND T. LEMIEUX (2008): “Regression Discontinuity Designs: A Guide to Practice,” Journal of Econometrics, 142, 615–635. [1875] INSTITUTO NACIONAL DE ESTADÍSTICA E INFORMACIÓN DE PERÚ (INEI) (1981): VIII Censo de Población. 
Lima, Peru: Instituto Nacional de Estadistica e Informacion, Direccion Nacional de Censos y Encuestas. [1891]
(1993): IX Censo de Población. Lima, Peru: Instituto Nacional de Estadistica e Informacion, Direccion Nacional de Censos y Encuestas. [1867,1877,1895] (1994): Tercer Censo Nacional Agropecuario. Lima, Peru: Instituto Nacional de Estadistica e Informacion, Direccion Nacional de Censos y Encuestas. [1889,1891,1895] (2001): Encuesta Nacional de Hogares (ENAHO). Lima, Peru: Instituto Nacional de Estadistica e Informacion, Direccion Nacional de Censos y Encuestas. [1873,1878,1891,1892] (2004): Registro Nacional de Municipalidades (REMANU). Lima, Peru: Instituto Nacional de Estadistica e Informacion, Direccion Nacional de Censos y Encuestas. [1894] INSTITUTO NACIONAL DE RECURSOS NATURALES (INRENA) (1997): Potencial Bioclimatica. Lima: INRENA. [1871] JACOBSEN, N. (1993): Mirages of Transition: The Peruvian Altiplano, 1780–1930. Berkeley: University of California Press. [1865,1890] KEITH, R. G. (1971): “Encomienda, Hacienda and Corregimiento in Spanish America: A Structural Analysis,” Hispanic American Historical Review, 51, 431–446. [1887,1888] LARSON, B. (1982): Exploitación Agraria y Resistencia Campesina en Cochabamba. La Paz: Centro de Estudios de la Realidad Economico y Social. [1888] (1988): Colonialism and Agrarian Transformation in Bolivia: Cochabamba, 1550–1900. Princeton: Princeton University Press. [1865,1888] LEVILLIER, R. (1921): Gobernantes del Perú, cartas y papeles, siglo XVI: Documentos del Archivo de Indias. Madrid: Sucesores de Rivadeneyra. [1868] MAR, M., AND M. MEJIA (1980): La reforma agraria en el Perú. Lima: Instituto de Estudios Peruanos. [1890] MCCLINTOCK, C. (1998): Revolutionary Movements in Latin America: El Salvador’s FMLN and Peru’s Shining Path. Washington, DC: United States Institute of Peace Press. [1898] MINISTRO DE EDUCACIÓN DEL PERÚ (MINEDU) (2005a): Censo de Talla. Lima: Ministro de Educacion del Peru. [1878] (2005b): Indicadores de cobertura y culminación de la educación básica. Lima: Ministro de Educacion del Peru. Available at www.minedu.gob.pe. [1870] MINISTRO DE TRANSPORTE (2006): “Red Vial en GIS,” unpublished data compiled by Ministro de Transporte, Peru. [1892,1893] MIRANDA, C. (1583): Tasa de la visita general de Francisco Toledo. Lima, Peru: Universidad Nacional de San Marcos, Dirección Universitaria de Biblioteca y Publicaciones. [1873,1886] MORNER, M. (1978): Perfil de la sociedad rural del Cuzco a fines de la colonia. Lima: Universidad del Pacifico. [1877,1888] NATIONAL AERONAUTICS AND SPACE ADMINISTRATION AND THE NATIONAL GEOSPATIAL INTELLIGENCE AGENCY (2000): Shuttle Radar Topography Mission 30 Arc Second Finished Data. Available at http://webmap.ornl.gov/wcsdown/wcsdown.jsp?dg_id=10008_1. [1870,1873] NILES, S. (1987): Callachaca: Style and Status in an Inca Community. Iowa City: University of Iowa Press. [1884] NUÑEZ, J. T. (1913): Memoria leida de la ceremonía de apertura del año judicial de 1913 por el Presidente de la Ilustrisima Corte Superior de los departamentos de Puno y Madre de Dios, Dr. J. Teofilo Nuñez. Puno: Imprenta del Seminario. [1890] NUNN, N. (2008): “The Long-Term Effects of Africa’s Slave Trades,” Quarterly Journal of Economics, 123, 139–176. [1863,1865] PALMER, D. S. (1994): The Shining Path of Peru. New York: St. Martin’s Press. [1898] PAREJA PFLUCKER, P., AND A. G. GATTI MURRIEL (1990): Evaluación de las elecciones municipales de 1989: Impacto politico de la violencia terrorista. Lima: Instituto Nacional de Planificación. [1898] PERALTA RUÍZ, V. 
(1991): En pos del tributo: Burocracia estatal, elite regional y comunidades indígenas en el Cusco rural (1826–1854). Cusco: Casa Bartolome de las Casas. [1888,1889] PETERSON, T. C., AND R. S. VOSE (1997): “An Overview of the Global Historical Climatology Network Temperature Data Base,” Bulletin of the American Meteorological Society, 78, 2837–2849. [1874]
PORTOCARRERO, F., A. BELTRAN, AND A. ZIMMERMAN (1988): Inversiones públicas en el Perú (1900–1968). Una aproximación cuantitativa. Lima: CIUP. [1891] PORTOCARRERO, G., AND P. OLIART (1989): El Perú desde la escuela. Lima: Instituto de Apoyo Agrario. [1892] PULGAR VIDAL, J. (1950): Geografía del Perú: Las ocho regiones naturales del Perú. Lima: Editorial Universo. [1874] RAMOS ZAMBRANO, A. (1984): La Rebelión de Huancane: 1923–1924. Puno: Editorial Samuel Frisancho Pineda. [1865,1890] REAL ACADEMIA ESPAÑOLA (2006): Diccionario de la lengua Española. Madrid: Editorial Espasa Calpe. [1893] ROCA SANCHEZ, P. E. (1935): Por la clase indígena. Lima: Pedro Barrantes Castro. [1890] ROWE, J. (1946): “Inca Culture at the Time of the Spanish Conquest,” in Handbook of South American Indians, Vol. 2, ed. by J. Steward. Washington, DC: Bureau of American Ethnology, 183–330. [1867] SAAVEDRA, J., AND P. SUÁREZ (2002): El financiamiento de la educación pública en el Perú: El rol de las familias. Lima, Peru: GRADE. [1892] SACHS, J. (2001): “Tropical Underdevelopment,” Working Paper 8119, NBER. [1863] SAIGNES, T. (1984): “Las etnias de Charcas frente al sistema colonial (Siglo XVII): Ausentismo y fugas en el debate sobre la mano de obra indígena, 1595–1665,” Jahrbuchfr Geschichte (JAS), 21, 27–75. [1869,1878] SANCHEZ-ALBORNOZ, N. (1978): Indios y tributos en el Alto Perú. Lima: Instituo de Estudios Peruanos. [1867,1888] SHIFTER, M. (2004): “Breakdown in the Andes,” Foreign Affairs, September/October, 126–138. [1899] STEIN, S. (1980): Populism in Peru: The Emergence of the Masses and the Politics of Social Control. Madison: University of Wisconsin Press. [1866,1894] TAMAYO HERRERA, J. (1982): Historia social e indigenismo en el Altiplano. Lima: Ediciones Treintaitres. [1890] TANDETER, E. (1993): Coercion and Market: Silver Mining in Colonial Potosí, 1692–1826. Albuquerque: University of New Mexico Press. [1867,1868,1876,1888] VILLANUEVA URTEAGA, H. (1982): Cuzco 1689: Informes de los párrocos al obispo Mollinedo. Cusco: Casa Bartolomé de las Casas. [1867,1888,1889] WIGHTMAN, A. (1990): Indigenous Migration and Social Change: The Forasteros of Cuzco, 1570– 1720. Durham: Duke University Press. [1867,1889] ZAVALA, S. (1980): El servicio personal de los indios en el Perú. Mexico City: El Colegio de México. [1867]
Dept. of Economics, Massachusetts Institute of Technology, 50 Memorial Drive, E52, Cambridge, MA 02142, U.S.A.;
[email protected]. Manuscript received September, 2008; final revision received January, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 1905–1938
BAYESIAN AND DOMINANT-STRATEGY IMPLEMENTATION IN THE INDEPENDENT PRIVATE-VALUES MODEL BY ALEJANDRO M. MANELLI AND DANIEL R. VINCENT1 We prove—in the standard independent private-values model—that the outcome, in terms of interim expected probabilities of trade and interim expected transfers, of any Bayesian mechanism can also be obtained with a dominant-strategy mechanism. KEYWORDS: Independent private values, incentive compatibility, Bayesian implementations, dominant-strategy implementation, auctions, adverse selection, bilateral trade, mechanism design.
1. INTRODUCTION WE PROVE THAT in the independent private-values model with linear utility, the outcome—in terms of interim expected probabilities of trade and interim expected transfers—of any Bayesian incentive-compatible mechanism can also be obtained with a dominant-strategy mechanism. In other words, a mechanism is Bayesian incentive compatible if and only if there is a dominantstrategy incentive-compatible mechanism that generates the same interim expected probability of trade for every agent. This equivalence result is valuable. Dominant-strategy mechanisms have advantages over Bayesian mechanisms. For instance, one may be more confident that a rational agent will play a dominant strategy (if one is available) than that the same agent will play a Nash equilibrium strategy.2 The equivalence result holds, in particular, in many commonly studied auction models. There is a single indivisible object and finitely many agents. Every agent has private information, customarily interpreted as the agent’s valuation for the object. Payoffs are linear in valuation and transfer. From each agent’s viewpoint, other agents’ valuations are random variables independently distributed according to known distribution functions. The setup is sufficiently flexible to include the case of a privately informed seller and heterogeneous buyers. A direct mechanism consists of two maps per agent: a probability-of-trade function and a transfer function. Every agent, after observing her valuation, sends a report to the mechanism designer. Given the profile of reported valuations, the probability-of-trade function specifies the probability that the agent receives the object, and the transfer function specifies the amounts that the agent must pay. Thus, a direct mechanism defines a game where an agent’s strategy is her report given her private information, and an agent’s payoff is determined by the two functions. 1
We are very grateful to Martín Besfamille, Roberto Burguet, Hector Chade, Estelle Cantillon, Giuseppe Lopomo, Jeroen Swinkels, and Lin Zhou for comments on an earlier draft. 2 See Mas-Colell, Whinston, and Green (1995, p. 870), for a brief discussion of this point. © 2010 The Econometric Society
DOI: 10.3982/ECTA8025
A direct mechanism is Bayesian incentive compatible if reporting truthfully is a Bayesian Nash equilibrium, and is dominant-strategy incentive compatible if reporting truthfully is an equilibrium in weakly dominant strategies. In a Bayesian Nash equilibrium, an agent reports truthfully when doing so maximizes the agent’s interim utility, that is, the agent’s expected payoff given the agent’s true valuation (and assuming by way of equilibrium analysis that opponents report their valuations truthfully). The interim utility is determined by the agent’s expected probability of trade and expected transfer. The actual probability of trade and transfer depend on the realization of opponents’ reports. Our result is of the form “for every Bayesian incentive-compatible mechanism, there is an equivalent dominant-strategy mechanism.” We consider that two mechanisms are equivalent if both yield the same interim utility to each agent. With independent private values and linear utilities, this is so if and only if every agent is assigned the same expected probability of trade and expected transfer by both mechanisms. Since expected transfers are determined up to a constant by the expected probability-of-trade functions (Myerson (1981)), it suffices to consider the latter. Hence, we use the term “outcome” to refer to the expected probability-of-trade functions. Our equivalence applies to every Bayesian incentive-compatible outcome. Hence, solely moving from dominant-strategy to Bayesian incentive compatibility yields no gain; any such gain must come from variations in other constraints such as ex ante or ex post budget balance. As an illustration, consider a well known example: d’Aspremont and Gerard-Varet (1979a) and Arrow (1979) showed that a particular Bayesian mechanism, the expected externality mechanism, satisfies ex post efficiency, ex post budget balance, and ex ante individual rationality. Green and Laffont (1977) proved that, in general, no dominant-strategy mechanism can satisfy all three properties. Our equivalence implies that there is a dominant-strategy mechanism that achieves the same interim (and therefore ex ante) allocation, transfer, and payoffs as the Bayesian mechanism; it may not, however, satisfy ex post budget balance. Our result is specific to the independent private-values model with linear utilities. Without linearity, for instance, under risk aversion, the expected probability of trade does not suffice to determine interim utility. Thus, even if there is a dominant-strategy mechanism that generates the same expected probability-of-trade as the target Bayesian mechanism, agents need not be indifferent between both mechanisms. With nonindependent valuations, Crémer and McLean (1988, Appendix A) provided an example where the seller obtains the full surplus with a Bayesian mechanism, but cannot do so with a dominant-strategy mechanism. Once again there is no equivalent dominantstrategy mechanism. We depart from the existing literature in the notion of equivalence that we employ, and, consequently, in the technique of proof and the scope of the result. We comment first on the technique of proof. Our proofs have a geometric quality. First, we define the set of expected probability-of-trade functions derived from Bayesian incentive-compatible
mechanisms. Second, we identify the step functions that are extreme points of that set. Third, we demonstrate, by construction, that the identified extreme points can be obtained with dominant-strategy mechanisms. Fourth, we show that other elements of the set—as convex combinations of extreme points and limits of those convex combinations—inherit the desired property, that is, they also have a dominant-strategy equivalent. We borrow several ideas from Border (1991) and Matthews (1983). Border characterized the functions that are the expected probability of trade for some mechanism; he proved, elegantly and in great generality, a conjecture in Matthews (1984). Maskin and Riley (1984) proved, constructively, a variation of the characterization for a particular case. Border (2007) extended his own result to nonsymmetric environments. The relationship to the cited works will be indicated throughout this essay. Formally, we only use a simple observation from the mentioned literature, that is, the trivial part of Border’s characterization (Lemma 4.1). Our debt, however, is larger in that our technique of proof stems from Border’s geometric approach. We turn to our equivalence notion: Two mechanisms are equivalent if both give each agent the same interim utility (or expected probability of trade). For example, the first- and second-price, sealed-bid auctions are equivalent in a stronger sense than the one we use: Both auctions provide the same interim utility to all agents including the seller, but they also implement the same allocation, the same probability-of-trade function. (See, for instance, Myerson (1981) for details.) Previous literature on the relationship between Bayesian and dominantstrategy mechanisms has used the stronger equivalence notion. Mookherjee and Reichelstein (1992) identified “ mechanism design problems for which there is no loss in replacing Bayesian incentive compatibility by the stronger requirement of dominant strategy.” For them, two “equivalent” mechanisms must not only provide the same interim utility to participants, but also implement the same allocation, that is, the same ex post probability-oftrade functions. They consider an independent private-values model where utilities are quasilinear and identify a monotonicity condition that is sufficient for dominant-strategy implementation. In the case of linear utility, their monotonicity condition is, simply, that the ex post probability-of-trade function be monotone.3 Their contribution is to find conditions on utilities such as a single crossing property and their one-dimensional condensation property, so that particular allocations can be implemented in dominant strategy. An extensive literature, including d’Aspremont and Gérard-Varet (1979b), Laffont and Maskin (1979), Makowski and Mezzetti (1994), and Williams (1999), shows in various cases that if an ex post efficient allocation is implemented by a Bayesian incentive-compatible mechanism, then it can also be 3 This follows from the standard characterization of incentive compatibility applied to a single agent (Myerson (1981)).
implemented by a dominant-strategy mechanism. Williams (1999) obtained an equivalence for quasilinear utilities and provided interesting applications, a lucid discussion, and a summary of the literature. We prove that the interim utilities of any Bayesian incentive-compatible mechanism can be obtained with a dominant-strategy mechanism. Our result is not specific to particular allocations and holds with heterogeneous agents and nonsymmetric mechanisms.
The formal results are presented in three sections. Section 4 deals with ex ante identical bidders and symmetric mechanisms. This case allows us to present the main ideas in the proofs. Sections 5 and 6 use similar arguments to those introduced in Section 4. Section 5 treats heterogeneous agents and nonsymmetric mechanisms. Section 6 introduces a privately informed seller and ex ante identical buyers.

2. NOTATION

Vectors are represented in boldface. If b is a vector in R^K, then b_k is its kth coordinate, b_{-k} = (b_1, ..., b_{k-1}, b_{k+1}, ..., b_K) ∈ R^{K-1}, (a, b_{-k}) = (b_1, ..., b_{k-1}, a, b_{k+1}, ..., b_K) ∈ R^K, and b^k = (0, ..., 0, b_k, b_{k+1}, ..., b_K) ∈ R^K. The vector whose kth coordinate is 1 and all others are 0 is denoted by e_k. For b, b′ ∈ R^K, b ∨ b′ = (max{b_1, b′_1}, ..., max{b_K, b′_K}) and b ∧ b′ = (min{b_1, b′_1}, ..., min{b_K, b′_K}). A sum with no terms is defined to be zero; for instance, \sum_{j=3}^{2} b_j = 0.
Given a set B, B^c denotes its complement, χ_B denotes its characteristic or indicator function, int B denotes its interior, and |B| denotes the number of elements it contains.
Let I be a positive integer and let \mathcal{I} = {1, 2, ..., I}. For i ∈ \mathcal{I}, let X_i ⊆ R, and let λ_i be a probability distribution on X_i. Then λ = \prod_{i∈\mathcal{I}} λ_i is the product distribution and λ_{-j} = \prod_{i∈\mathcal{I}\setminus\{j\}} λ_i. All functions are assumed to be measurable with respect to the corresponding Borel σ-algebras; product spaces are endowed with the product σ-algebras. We use the following convention for expectations. Given a function q : \prod_{i=1}^{I} X_i → R,

E_{x_{-j}} q(x_j) = \int_{\prod_{i∈\mathcal{I}\setminus\{j\}} X_i} q(x_1, ..., x_I) \, dλ_1 \cdots dλ_{j-1} \, dλ_{j+1} \cdots dλ_I.
Thus the expectation is taken over x−j . Analogous notation is applied to other objects. Step functions are used prominently in our analysis. Any increasing step function can be represented by a collection of pairs {(bk βk )}Kk=1 , where k represents the kth step, βk is the value of Q on that step, and bk is the size of the
step (in the domain) according to the underlying probability distribution. The following definition makes this observation precise.

DEFINITION 2.1: Let X ⊆ R, let λ be a probability distribution on X, and let Q : X → [0, 1]. We say that Q is a step function with K steps if Q(X) has K elements and [β ∈ Q(X) ⇒ λ(Q^{-1}(β)) > 0].

REMARK 2.1: If a function Q with K steps is nondecreasing, it can be represented by K pairs {(b_k, β_k)}_{k=1}^{K} that satisfy the following conditions:
(a) Q(X) = {β_k}_{k=1}^{K}, with 1 ≥ β_K > β_{K-1} > · · · > β_k > β_{k-1} > · · · > β_1 ≥ 0.
(b) For k = 1, ..., K, b_k = λ(Q^{-1}(β_k)).
We denote such Q by {(b_k, β_k)}_{k=1}^{K} or by (b, β), where b = (b_1, ..., b_K) and β = (β_1, ..., β_K). Abusing notation, we may write Q = {(b_k, β_k)}_{k=1}^{K} = (b, β).

REMARK 2.2: Nondecreasing step functions that differ only on λ-measure zero sets may have the same representation by K pairs. If the function Q has K steps and is nondecreasing, then β_k > β_{k-1} for all k with 1 < k ≤ K; otherwise it would have fewer than K steps.

3. MODEL

We use a standard independent private-values model. There is a single indivisible object and a finite set \mathcal{I} = {1, 2, ..., I} of agents. Agent i's type is an element x_i ∈ X_i = [\underline{x}_i, \overline{x}_i] ⊆ R, distributed according to a nonatomic probability distribution λ_i. Agents are risk neutral. Preferences are linear in type and money: If t_i is the amount paid by agent i and q_i is the probability that i obtains the object, i's utility is x_i q_i − t_i; hence the interpretation of an agent's type as her valuation for the object.
A direct mechanism consists of two functions per agent, q_i(x) and t_i(x), where q_i(x) is the probability that i is assigned the object and t_i(x) is the amount i pays when the profile of reports is x. The sum over i of the probabilities q_i(x) must be less than or equal to 1. (Strict inequality is allowed.) Henceforth, all mechanisms are direct unless specified otherwise.
Fix a mechanism {(q_i, t_i)}_{i∈\mathcal{I}}. If i reports her type truthfully (and other players report x_{-i}), then i's payoff is u_i(x_i, x_{-i}) = q_i(x_i, x_{-i}) x_i − t_i(x_i, x_{-i}). Assuming other players also report truthfully, i's expected payoff is E_{x_{-i}} u_i(x_i) = E_{x_{-i}} q_i(x_i) x_i − E_{x_{-i}} t_i(x_i). A mechanism is incentive compatible if truthful reporting is an equilibrium. The following well-known results follow from Myerson (1981).
(a) A mechanism is dominant-strategy incentive compatible if and only if, for all x_i, i, and x_{-i}, q_i(x_i, x_{-i}) is nondecreasing in x_i and

t_i(x_i, x_{-i}) = q_i(x_i, x_{-i}) x_i − \int_{\underline{x}_i}^{x_i} q_i(z, x_{-i}) \, dz − u_i(\underline{x}_i, x_{-i}).
(b) A mechanism is Bayesian incentive compatible if and only if, for all x_i and i, E_{x_{-i}} q_i(x_i) is nondecreasing in x_i and

E_{x_{-i}} t_i(x_i) = E_{x_{-i}} q_i(x_i) x_i − \int_{\underline{x}_i}^{x_i} E_{x_{-i}} q_i(z) \, dz − E_{x_{-i}} u_i(\underline{x}_i).^4

This characterization justifies the usage summarized in the following definition; the omitted transfer functions in the definition are recovered, up to a constant, using the corresponding incentive compatibility characterizations.

DEFINITION 3.1: Let {q_i}_{i∈\mathcal{I}} be a collection of I functions q_i : \prod_{i=1}^{I} X_i → [0, 1] such that for every x ∈ \prod_{i=1}^{I} X_i, \sum_{i∈\mathcal{I}} q_i(x) ≤ 1. If for every i and x_{-i}, q_i(x_i, x_{-i}) is nondecreasing in x_i, then {q_i}_{i∈\mathcal{I}} is a dominant-strategy incentive-compatible mechanism. If for every i, E_{x_{-i}} q_i(x_i) is nondecreasing in x_i, then {q_i}_{i∈\mathcal{I}} is a Bayesian incentive-compatible mechanism.

The framework presented is sufficiently flexible to include, among other things, a seller with private information. These and other features are discussed further in the following sections.

4. EX ANTE IDENTICAL BIDDERS

In this section, we assume that the I agents or bidders are ex ante identical: Types are identically and independently distributed according to the nonatomic probability distribution λ_b on X = [\underline{x}, \overline{x}]. The product distribution, λ_b^I, is the product of I factors λ_b. We require that mechanisms be symmetric, that is, that ex ante identical bidders be treated ex ante identically. (We introduce a privately informed seller, heterogeneous bidders, and nonsymmetric mechanisms in the following sections.)
Symmetric mechanisms are interesting in their own right. An alleged advantage of competitive bidding is that it tends to reduce agency problems. Favoring a particular bidder when they are all ex ante identical may diminish this advantage. For instance, in several countries, government agencies must use competitive bidding for their purchases and are often not permitted to favor a particular bidder when all bidders are ex ante identical.
Every symmetric mechanism can be represented by a single probability-of-trade function satisfying a permutation inequality. This representation is introduced in the definition below; its relationship to Definition 3.1 is made clear afterward.

DEFINITION 4.1: Let q : X^I → [0, 1] be such that for every x ∈ X^I, \sum_{i=1}^{I} q(σ_i(x)) ≤ 1, where σ_i(x_1, ..., x_I) = (x_i, x_2, ..., x_{i-1}, x_1, x_{i+1}, ..., x_I), that is, σ_i(x) interchanges the first and ith coordinates of the vector x.

^4 Myerson (1981) assumed that distributions have densities and that the densities are strictly positive on their supports. Monteiro and Svaiter (2007) extended the characterization to arbitrary measures.
(a) q is a symmetric, dominant-strategy incentive-compatible mechanism with I bidders if q(x_1, x_{-1}) is nondecreasing in x_1.
(b) q is a symmetric, Bayesian incentive-compatible mechanism with I bidders if E_{x_{-1}} q(x_1) is nondecreasing in x_1.

REMARK 4.1: Each bidder's probability-of-trade function q_i is derived from the single function q by setting q_i(x) = q(σ_i(x)). The omitted transfer functions are recovered, up to a constant, using the corresponding incentive compatibility characterizations.

Theorem 1 is this section's main result: Every Bayesian incentive-compatible mechanism has a dominant-strategy equivalent. Both mechanisms yield the same interim utilities for all agents.

THEOREM 1: If q′ is a symmetric, Bayesian incentive-compatible mechanism with I bidders, then there is a symmetric, dominant-strategy incentive-compatible mechanism q with I bidders that generates the same expected probability of trade, that is, E_{x_{-1}} q = E_{x_{-1}} q′ almost everywhere (a.e.).

REMARK 4.2: We prove a stronger result: Even if a Bayesian incentive-compatible mechanism is not symmetric, provided the interim expected utility is the same for all bidders, there is a symmetric dominant-strategy incentive-compatible mechanism that yields the same interim expected utility.

Theorem 1 demonstrates that, ceteris paribus, going from dominant-strategy to Bayesian implementation does not increase the set of obtainable outcomes, when outcomes are defined in terms of expected probabilities of trade and expected transfers.
Figure 1 depicts two mechanisms in an environment with two bidders whose valuations are uniformly distributed in [0, 1]. Types are divided into five intervals of equal probability, and types in the same interval are treated equally. The left diagram in the figure represents the mechanism q′(x_1, x_2). Every type profile (x_1, x_2) belongs to a cell and the number in that cell is the value of q′(x_1, x_2). (Cells without values indicate q′(x_1, x_2) = 0.) The numbers below the horizontal axis are the expected probability of trade E_{x_{-1}} q′(x_1) (the integral of the function q′ for fixed x_1 along the vertical axis). Since E_{x_{-1}} q′(x_1) is nondecreasing, q′ (with its implicit expected transfer) satisfies Bayesian incentive compatibility. (This is incentive compatibility's classic characterization.) It is clear, however, that q′ does not satisfy dominant-strategy incentive compatibility because q′(x_1, x_2) is not nondecreasing in x_1 for some x_2, say x_2 ∈ [1/5, 2/5]. The right diagram in the figure represents the mechanism q(x_1, x_2) that is equivalent to q′ in that it yields the same expected probability of trade, E_{x_{-1}} q = E_{x_{-1}} q′, but is also dominant-strategy incentive compatible.
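The two monotonicity conditions in Definition 4.1 can be checked mechanically on a discretized example. The sketch below is an illustration only, not code from the paper: the 4 × 4 matrix is a hypothetical symmetric mechanism (it is not the mechanism drawn in Figure 1), with equiprobable type cells and q′[i, j] read as the probability that bidder 1 wins when her type lies in cell i and bidder 2's type lies in cell j.

```python
import numpy as np

# Hypothetical discretized symmetric mechanism q' (rows: bidder 1's type cell,
# columns: bidder 2's type cell; cells are equiprobable).  It is chosen so that
# it is feasible and Bayesian incentive compatible, but not dominant-strategy
# incentive compatible (the situation of the left diagram of Figure 1).
q_prime = np.array([
    [0.0, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.5, 0.0, 0.0, 1.0],
    [0.5, 1.0, 0.0, 0.5],
])

def feasible(q):
    # symmetry constraint: q'(x1, x2) + q'(x2, x1) <= 1 cell by cell
    return bool(np.all(q + q.T <= 1.0 + 1e-12))

def bayesian_ic(q):
    # interim expected probability of trade must be nondecreasing in own type
    interim = q.mean(axis=1)
    return bool(np.all(np.diff(interim) >= -1e-12)), interim

def dominant_strategy_ic(q):
    # q(x1, x2) must be nondecreasing in x1 for every fixed x2 (every column)
    return bool(np.all(np.diff(q, axis=0) >= -1e-12))

print("feasible:", feasible(q_prime))
ok, interim = bayesian_ic(q_prime)
print("Bayesian IC:", ok, "interim probabilities:", interim)
print("dominant-strategy IC:", dominant_strategy_ic(q_prime))
```

Finding a feasible rearrangement of the cells that makes every column nondecreasing while preserving the row averages is the discrete analogue of the construction whose existence Theorem 1 guarantees.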
FIGURE 1.—In cells with no values, q′ and q are 0.
We go from mechanism q′ on the left to mechanism q on the right by rearranging the cells in the diagram so that q(x_1, x_2) is nondecreasing in x_1 for fixed x_2. Care must be exercised so that the rearrangement of cells satisfies the symmetry of the mechanism: Given a type profile (x_1, x_2), if q′(x_1, x_2) is the probability that agent 1 gets the object, the probability that bidder 2 gets the object is q′_2(x_1, x_2) = q′(x_2, x_1), and thus q′(x_1, x_2) + q′(x_2, x_1) ≤ 1. Thus, in both diagrams, the numbers in cells that are symmetric with respect to the diagonal must sum up to no more than 1. While focusing on symmetric mechanisms allows us to use a single function q′, it also requires us to use a single function q. In the example, the required rearrangement of cells is straightforward. That the required rearrangement can be carried out for any arbitrary mechanism q′ is the content of the theorem.
Figure 2 provides a less transparent example. There are two bidders, and x_1 and x_2 are uniformly distributed in [0, 1]. Types are divided into four intervals of equal probability. Empty cells in the diagram represent q(x_1, x_2) = 0 for every (x_1, x_2) in the cell. The left-hand diagram is Bayesian incentive compatible; the right-hand diagram is its dominant-strategy equivalent.
The proof of Theorem 1 follows from three lemmas of independent interest plus a convergence argument. We will conclude the section with an example.
The inequality in Lemma 4.1 is a feasibility constraint that must be satisfied by any mechanism, not only Bayesian incentive-compatible ones: The probability that a buyer with type in B wins is I \int_B E_{x_{-1}} q(x_1) \, dλ_b; the probability that there is a buyer with type in B is 1 − λ_b(B^c)^I; the former cannot exceed the latter.
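To make this feasibility constraint concrete, the following minimal sketch, which is illustrative and not from the paper, checks the inequality for the symmetric mechanism that always awards the object to the highest of I uniformly distributed types, for which E_{x_{-1}} q(x_1) = x_1^{I-1}; the number of bidders and the upper interval B = [a, 1] are arbitrary choices made for the example.

```python
import numpy as np

# Check of the feasibility constraint of Lemma 4.1,
#   I * \int_B E_{x_{-1}} q(x_1) d(lambda_b)  <=  1 - [lambda_b(B^c)]^I,
# for the "highest type wins" rule with I bidders and types uniform on [0, 1],
# so that E_{x_{-1}} q(x) = x**(I - 1).  B = [a, 1] is an arbitrary choice.
rng = np.random.default_rng(0)
I, a = 3, 0.6                              # lambda_b(B^c) = a

x = rng.uniform(size=1_000_000)
interim_q = x ** (I - 1)                   # interim probability of winning
lhs = I * np.mean(np.where(x >= a, interim_q, 0.0))
rhs = 1.0 - a ** I

print(f"LHS = {lhs:.4f}, RHS = {rhs:.4f}, constraint holds: {lhs <= rhs + 1e-3}")
# For this allocation the constraint binds on every upper interval
# (analytically LHS = 1 - a**I = RHS), which is the sense in which the
# mechanisms of Lemma 4.3 below attain the bound in Lemma 4.1.
```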
FIGURE 2.—In cells with no values, q′ and q are 0.
LEMMA 4.1: If q is a Bayesian incentive-compatible, symmetric mechanism with I bidders, then E_{x_{-1}} q ∈ W, where

W = { Q : Q : X → [0, 1] is nondecreasing and

(1)    B ⊆ X  ⇒  I \int_B Q(x_1) \, dλ_b ≤ 1 − [λ_b(B^c)]^I }.
PROOF: Bayesian incentive compatibility implies that Ex−1 q is nondecreasing. Lemma 5.1 in Border (1991) establishes the inequality. Q.E.D. The feasibility constraint first appeared in Matthews (1983) and in Maskin and Riley (1984). It plays a key role in our proof. Matthews (1984) conjectured that for any function Q : X → [0 1], not necessarily nondecreasing, satisfying the feasibility constraint, there is a symmetric mechanism q with Ex−1 q = Q, that is, Q is the expected probability of trade of some symmetric mechanism q. Border (1991) proved Matthews’ conjecture for general type spaces. He was not concerned with incentive compatibility; he was interested in determining when the expected probability of trade can be used as the primitive in the analysis. (Lemma 4.1 is a corollary to Lemma 5.1 in Border (1991).) Maskin and Riley (1984, Theorem 7 in their Appendix) proved, constructively, a version of Matthews’ conjecture for nondecreasing step functions Q. Matthews (1984) extended Maskin and Riley’s result to arbitrary nondecreasing functions Q. All these authors restricted attention to symmetric mechanisms and ex ante identical bidders. The rest of the proof proceeds as follows. Lemma 4.2 characterizes the step functions that are extreme points of W . Lemma 4.3 constructs a symmetric, dominant-strategy mechanism for each extreme point identified in Lemma 4.2.
Finally, a convergence argument, provided after the proofs of the lemmas, establishes the theorem: Every function in W is the limit of convex combinations of step functions.
Lemma 4.2 is based on the following observations. The domain of any step function can be partitioned into finitely many sets where the function is constant; the elements of the partition are the function's level sets. Lemma 4.2 arbitrarily fixes one such partition and identifies the step functions, relative to the fixed partition, that are extreme points of the feasible set W. Verifying that a step function is an extreme point is a finite-dimensional matter: When applied to a step function, the inequality in (1) becomes a system of finitely many linear inequalities, determined by the fixed partition. To be an extreme point, the step function must make sufficiently many inequalities bind. (To visualize this point, imagine a set in R^2 defined by finitely many linear inequalities, say a rectangle. The extreme points of the rectangle are its vertices. Vertices are defined by the intersection of sufficiently many lines; more precisely, two lines per vertex, because the rectangle is a subset of R^2.) The proof of Theorem 1 uses only one direction in Lemma 4.2: If a step function is an extreme point, it must be one of those identified by the lemma.

LEMMA 4.2: Let Q = (b, \bar{β}) = {(b_k, \bar{β}_k)}_{k=1}^{K} be a step function in W. Then {(b_k, \bar{β}_k)}_{k=1}^{K} is an extreme point of W if and only if one of the following statements holds:
(a) \bar{β}_k = [ (\sum_{j=1}^{k} b_j)^I − (\sum_{j=1}^{k-1} b_j)^I ] / (I b_k) for k = 1, ..., K.
(b) \bar{β}_1 = 0 and \bar{β}_k is as in (a) for k = 2, ..., K.

PROOF: Letting B = \bigcup_{j=k}^{K} Q^{-1}(\bar{β}_j), the inequality in the definition of W becomes

(2)    \sum_{j=k}^{K} I b_j β_j ≤ 1 − \Big(\sum_{j=1}^{k-1} b_j\Big)^I,

or, in vector notation, I b^k · β ≤ 1 − r_k, where b^k is the vector (0, ..., 0, b_k, b_{k+1}, ..., b_K) and r_k = (\sum_{j=1}^{k-1} b_j)^I. Taking k = 1, ..., K, (2) becomes a system of K inequalities. (Note that \bar{β} (with a bar) indicates the point defined in the statement of the lemma and β (without a bar) represents a possible solution to the system of equations.) Recall that e_k denotes the vector whose kth coordinate is 1 and all others are zero. Define

(3)    P(b) = {β ∈ R^K : for k = 1, ..., K,  I b^k · β ≤ 1 − r_k and e_k · β ≥ 0}.
The set P(b) ⊆ RK is defined by 2K inequalities; it is the set of all nonnegative vectors β ∈ RK (K inequalities), such that (b β) satisfies the inequalities (2) (another K inequalities).
A step function (b β) ∈ W is an extreme point of W if and only if β is an extreme point of P(b) (Lemma A.2). A vector β ∈ P(b) is an extreme point of P(b) if and only if the set (4)
R(β) = {b^k : k ∈ {1, ..., K}, I b^k · β = 1 − r_k} ∪ {e_k : k ∈ {1, ..., K}, e_k · β = 0}
has K linearly independent elements (Lemma A.1); that is, if the inequalities defining P(b) are evaluated at β, they must include K linearly independent equations. We conclude that (b, \bar{β}) is an extreme point of W if and only if R(\bar{β}) has K linearly independent vectors.
The set R(\bar{β}) has K linearly independent vectors if and only if either {b^k}_{k=1}^{K} ⊆ R(\bar{β}) or {e_1, b^2, ..., b^K} ⊆ R(\bar{β}): Simple inspection reveals that both alternatives have K linearly independent vectors. To see that no other alternative is possible, suppose that e_k ∈ R(\bar{β}). Then \bar{β}_k = 0. This implies k = 1; otherwise \bar{β} has less than K steps.
Finally, {b^k}_{k=1}^{K} ⊆ R(\tilde{β}) if and only if \tilde{β} is as defined in Lemma 4.2(a): The system of equations I b^k · \tilde{β} = 1 − r_k, k = 1, ..., K, has a unique solution (because the vectors {b^k}_{k=1}^{K} are linearly independent). The solution to I b^K · \tilde{β} = 1 − r_K is \tilde{β}_K = \bar{β}_K. Pick any k < K. Subtracting I b^{k+1} · \tilde{β} = 1 − r_{k+1} from I b^k · \tilde{β} = 1 − r_k yields \tilde{β}_k = \bar{β}_k.
Similarly, {e_1, b^2, ..., b^K} ⊆ R(\bar{β}) if and only if \bar{β} is as in Lemma 4.2(b). Q.E.D.

Lemma 4.3 constructs a dominant-strategy incentive-compatible mechanism that implements the extreme points identified in Lemma 4.2. Let the step function Q be an extreme point of W. The mechanism is constructed as follows. Given a type profile (x_1, ..., x_I), bidders are ranked using Q(x_i). Those bidders with maximum rank, that is, max_i Q(x_i), share the object with equal probability in the new mechanism; those bidders with less than maximum rank are assigned the object with probability 0. Thus, the new mechanism q takes values in {1, 1/2, ..., 1/I, 0}. It depends only on the partition {Q^{-1}(β_k)}_{k=1}^{K} defined by Q and not on the actual values taken by Q.

LEMMA 4.3: Let the step function Q = {(b_k, \bar{β}_k)}_{k=1}^{K} be an extreme point of W. Then the symmetric mechanism

q(x_1, x_2, ..., x_I) =
    1 / |{i : Q(x_1) = Q(x_i)}|   if Q(x_1) > 0 and Q(x_1) ≥ Q(x_i) ∀ i,
    0                             otherwise,

is dominant-strategy incentive compatible and E_{x_{-1}} q = Q.
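Before turning to the proof, here is a minimal numerical sketch of the construction; it is an illustration, not code from the paper. It computes the extreme-point values \bar{β}_k of Lemma 4.2(a) for an equal-probability partition, implements the rank-and-split mechanism of Lemma 4.3, and checks by simulation that the interim probability of trade reproduces \bar{β}_k. Two bidders, uniform types, and four equal steps are assumptions chosen to match Example 1 at the end of this section.

```python
import numpy as np

rng = np.random.default_rng(1)
I, K = 2, 4                        # two bidders, four equal-probability steps
b = np.full(K, 1.0 / K)            # b_k = lambda_b(Q^{-1}(beta_k)) = 1/4

# Extreme-point values from Lemma 4.2(a):
#   beta_k = [ (sum_{j<=k} b_j)^I - (sum_{j<k} b_j)^I ] / (I * b_k)
cum = np.concatenate(([0.0], np.cumsum(b)))
beta_bar = (cum[1:] ** I - cum[:-1] ** I) / (I * b)
print("beta_bar =", beta_bar)      # [0.125 0.375 0.625 0.875], as in Example 1

# Mechanism of Lemma 4.3: rank bidders by Q(x_i); the bidders with maximal rank
# share the object equally, all others receive the object with probability 0.
def q_lemma43(x):
    steps = np.minimum((x * K).astype(int), K - 1)   # step index of each type
    top = steps == steps.max()
    return np.where(top, 1.0 / top.sum(), 0.0)

# Simulation check that E_{x_{-1}} q(x_1) = beta_bar_k on every step.
draws = rng.uniform(size=(100_000, I))
step1 = np.minimum((draws[:, 0] * K).astype(int), K - 1)
q1 = np.array([q_lemma43(x)[0] for x in draws])
for k in range(K):
    print(f"step {k + 1}: simulated {q1[step1 == k].mean():.3f}"
          f"  vs  beta_bar {beta_bar[k]:.3f}")
```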
PROOF: Since q is nondecreasing in x_1 for any given x_{-1}, q satisfies dominant-strategy incentive compatibility. We must prove that E_{x_{-1}} q = Q.
Pick an arbitrary x_1. If Q(x_1) = 0, then E_{x_{-1}} q(x_1) = 0. Suppose then that Q(x_1) = \bar{β}_k > 0. By direct calculation,

E_{x_{-1}} q(x_1) = \sum_{n=1}^{I} \frac{1}{n} \binom{I−1}{n−1} \Big(\sum_{j=1}^{k−1} b_j\Big)^{I−1−(n−1)} b_k^{n−1}.

To see this, note that E_{x_{-1}} q(x_1) is the integral of q(x_1, ..., x_I) over all x_i with i ≠ 1. Since q takes finitely many values, its integral is a summation. Each term in the expression above corresponds to a value of q(x_1, x_{-1}) as x_{-1} varies. The first factor in a typical term, 1/n, is the value of q. The second factor is the number of ways in which q may take the value 1/n: there are I − 1 variables x_i and exactly n − 1 of them must be in Q^{-1}(β_k). The last two factors represent the probabilities: \sum_{j=1}^{k−1} b_j is the probability that a given x_i is in \bigcup_{j=1}^{k−1} Q^{-1}(β_j) and, therefore, (\sum_{j=1}^{k−1} b_j)^{I−1−(n−1)} is the probability that I − 1 − (n − 1) of them will be in \bigcup_{j=1}^{k−1} Q^{-1}(β_j). Similarly, b_k^{n−1} is the probability that n − 1 variables x_i will be in Q^{-1}(β_k).
To show that x_1 ∈ Q^{-1}(\bar{β}_k) ⇒ E_{x_{-1}} q(x_1) = \bar{β}_k, we must prove that

\sum_{n=1}^{I} \frac{1}{n} \binom{I−1}{n−1} \Big(\sum_{j=1}^{k−1} b_j\Big)^{I−n} b_k^{n−1} = \frac{\big(\sum_{j=1}^{k−1} b_j + b_k\big)^I − \big(\sum_{j=1}^{k−1} b_j\big)^I}{I b_k}.

Multiply both sides by I b_k, note that \frac{I}{n}\binom{I−1}{n−1} = \binom{I}{n}, and add (\sum_{j=1}^{k−1} b_j)^I to both sides to obtain

\sum_{n=1}^{I} \binom{I}{n} \Big(\sum_{j=1}^{k−1} b_j\Big)^{I−n} b_k^{n} + \Big(\sum_{j=1}^{k−1} b_j\Big)^I = \Big(\sum_{j=1}^{k−1} b_j + b_k\Big)^I.

This is the binomial formula, since (\sum_{j=1}^{k−1} b_j)^I corresponds to the term, missing in the summation, for n = 0. Q.E.D.
That the type of mechanism employed in Lemma 4.3 can achieve the bounds in Lemma 4.1 was recognized by Border (1991, Lemma 5.2, p. 1180). The proof of Theorem 1 now follows from Lemmas 4.1, 4.2, and 4.3 plus a convergence argument. (See Border (1991, Lemma 5.4) for related material.) To develop the convergence argument, we define the class of monotone
dominant-strategy mechanisms (Definition 4.2) and prove Lemmas 4.4 and 4.5. Lemma 4.4 identifies the set of monotone mechanisms as a subset of L^∞. Lemma 4.5 demonstrates that the set of monotone mechanisms is compact, thus contributing a convergent subsequence of dominant-strategy mechanisms to the proof of Theorem 1. After proving both lemmas, the convergence argument is presented in the proof of Theorem 1.

DEFINITION 4.2: A symmetric mechanism q : X^I → [0, 1] is monotone if q(x′) ≤ q(x) for every x′, x ∈ X^I with x′_1 ≤ x_1 and x′_i ≥ x_i for every i > 1.

REMARK 4.3: The mechanism q in Lemma 4.3 is monotone. Monotone mechanisms are nondecreasing in x_1 (hence dominant-strategy incentive compatible) and nonincreasing in x_i for i > 1.

DEFINITION 4.3: For x′, x ∈ R^I, let x′ ⪯ x if x′_1 ≤ x_1 and x′_i ≥ x_i for i > 1; and let x′ ⋏ x = (x_1 ∧ x′_1, x_2 ∨ x′_2, x_3 ∨ x′_3, ..., x_I ∨ x′_I).

The lemma below defines a subset D and demonstrates that, as a subset of L^∞, D is the set of monotone mechanisms: The elements of D are equivalence classes; two functions that differ only on a set of λ_b^I-measure zero belong to the same equivalence class. Monotone mechanisms belong to D because they satisfy its defining conditions everywhere. Lemma 4.4 proves that every equivalence class in D contains a monotone mechanism. This is necessary for the intended interpretation of D. (For instance, the set of continuous functions as a subset of L^∞ does not contain all functions that are continuous a.e.)

LEMMA 4.4: Let

D = { q ∈ L^∞(λ_b^I) : ∃ B ⊆ [\underline{x}, \overline{x}]^I with λ_b^I(B) = 1 and ∀ x′, x ∈ B, ∀ i ∈ \mathcal{I},
      q(σ_i(x)) ∈ [0, 1],  \sum_{i∈\mathcal{I}} q(σ_i(x)) ≤ 1,  and
      [σ_i(x′) ⪯ σ_i(x)] ⇒ q(σ_i(x′)) ≤ q(σ_i(x)) }.

Suppose supp[λ_b] = X. Then q̃ ∈ D if and only if there is a monotone symmetric mechanism q such that q = q̃ a.e.

PROOF: One direction is trivial: Monotone mechanisms satisfy the defining conditions of D everywhere. We prove the converse, that is, that every function in D belongs to the equivalence class of a monotone mechanism.
Let q̃ ∈ D and let B be its corresponding set of full measure as specified in the definition of D. The restriction of q̃ to B is monotone and, therefore, the
set of its discontinuity points has Lebesgue measure zero (Lavrič (1993)). Let B′ ⊆ B be the set of its continuity points. Since λ_b is absolutely continuous with respect to the Lebesgue measure, λ_b^I(B′) = λ_b^I(B) = 1.
Let B′′ = B′ \ {x : ∃ i ∈ \mathcal{I}, σ_i(x) ∉ B′}. Then λ_b^I(B′′) = λ_b^I(B′) = 1. Since supp[λ_b^I] = X^I, B′′ is dense in X^I. We will construct a monotone mechanism q such that q = q̃ a.e. Let

ϕ(x) = { r ∈ [0, 1] : ∃ {x_n} in B′′, (x_n, q̃(x_n)) → (x, r) }.

For every x ∈ X^I, ϕ(x) is nonempty (because B′′ is dense in X^I) and closed (because it is the set of limit points). For x ∈ X^I, define

(5)    q(x) = 0 if int({x′ : x′ ⪯ x}) = ∅, and q(x) = min{r : r ∈ ϕ(x)} otherwise,

where for any set A, int(A) is the interior of A.
By construction, q takes values in [0, 1] and q = q̃ a.e. (because every x ∈ B′′ is a continuity point of the restriction of q̃ to B and, therefore, ϕ(x) is a singleton and q(x) = q̃(x)).
We prove that ∀ x ∈ X^I, \sum_i q(σ_i(x)) ≤ 1. Pick any x ∈ X^I. Let {x_n} be a sequence in B′′ such that {x_n} → x. (Such a sequence exists because B′′ is dense.) The sequence {(x_n, q̃(σ_1(x_n)), ..., q̃(σ_I(x_n)))} takes values in X^I × [0, 1]^I and thus has a convergent subsequence, indexed by n_k, whose limit point we denote by (x, q̄) = (x, q̄_1, ..., q̄_I). Since \sum_{i∈\mathcal{I}} q̃(σ_i(x_{n_k})) ≤ 1 for all n_k, \sum_{i∈\mathcal{I}} q̄_i ≤ 1. By construction, q̄_i ∈ ϕ(σ_i(x)). By (5), q(σ_i(x)) ≤ q̄_i. Therefore, \sum_{i∈\mathcal{I}} q(σ_i(x)) ≤ \sum_{i∈\mathcal{I}} q̄_i ≤ 1.
It remains to show that q is monotone. Let x ⪯ y, x ≠ y. If int({x′ : x′ ⪯ x}) = ∅, then q(x) = 0 and hence q(x) ≤ q(y). If int({x′ : x′ ⪯ x}) ≠ ∅, then int({y′ : y′ ⪯ y}) ≠ ∅ and, for sufficiently large n, int({y′ : y′ ⪯ y_n}) ≠ ∅. To prove that q(x) ≤ q(y), it suffices to establish that for any r ∈ ϕ(y), there is r′ ∈ ϕ(x) with r′ ≤ r. Since r ∈ ϕ(y), there is {y_n} in B′′ such that {(y_n, q̃(y_n))} → (y, r). For each y_n, pick x_n ∈ {x′′ ∈ B′′ : x′′ ⪯ (y_n ⋏ x)} with ‖x_n − (y_n ⋏ x)‖ ≤ 1/n. Then {x_n} converges to x and x_n ⪯ y_n for every n. Let (x, r′) be any accumulation point of (x_n, q̃(x_n)). Then r′ ≤ r. Q.E.D.

LEMMA 4.5: The set D defined in Lemma 4.4 is weak∗ compact as a subset of L^∞(λ_b^I).

PROOF: If q ∈ D, q takes values in [0, 1] a.e. and thus D is L^∞-bounded. If, in addition, D is weak∗ closed, D is weak∗ compact (Banach–Alaoglu Theorem). Let {q^n} be a sequence in D such that {q^n} → q̄ in the weak∗ topology. (Since in this case L^1 is separable, the weak∗ topology on the unit ball in L^∞ is metrizable and hence we may restrict attention to sequences.) We must prove that q̄ belongs to (an equivalence class in) D.
Let A_1 = {x : \sum_{i∈\mathcal{I}} q̄(σ_i(x)) ≤ 1} and suppose λ_b^I(A_1) < 1. For any q^n ∈ D,

\int χ_{A_1^c} \sum_{i∈\mathcal{I}} q^n(σ_i(x)) \, dλ_b^I ≤ \int χ_{A_1^c} \, dλ_b^I = λ_b^I(A_1^c).

Taking limits, \int χ_{A_1^c} \sum_{i∈\mathcal{I}} q̄(σ_i(x)) \, dλ_b^I ≤ λ_b^I(A_1^c), a contradiction. Hence, λ_b^I(A_1^c) = 0.
Let B′ and B be subsets of X^I, λ_b^I(B′) > 0 and λ_b^I(B) > 0, such that [x′ ∈ B′, x ∈ B] ⇒ x′ ⪯ x. By Lemma A.3,

q^n ∈ D  ⇒  \frac{1}{λ_b^I(B′)} \int χ_{B′} q^n \, dλ_b^I ≤ \frac{1}{λ_b^I(B)} \int χ_B q^n \, dλ_b^I.

Taking limits, \frac{1}{λ_b^I(B′)} \int χ_{B′} q̄ \, dλ_b^I ≤ \frac{1}{λ_b^I(B)} \int χ_B q̄ \, dλ_b^I. Lemma A.3 implies

∃ A_2, λ_b^I(A_2) = 1 : [∀ x′, x ∈ A_2, x′ ⪯ x] ⇒ q̄(x′) ≤ q̄(x).
1920
A. M. MANELLI AND D. R. VINCENT L∞
and, therefore, {Qn } → Q. Also Qn is a nondecreasing step function in the sense of Definition 2.1 and Remark 2.1. Finally, since Qn ≤ Q, it satisfies the inequality in (1). Thus, Qn ∈ W . (b) For n = 1 2 the step function Qn is a convex combination of step functions that are extreme points of W . PROOF: Suppose, without loss of generality, that Qn has K steps. By Definition 2.1 and Remark 2.1, Qn = (b β), b β ∈ RK . Let W (b) = {β ∈ [0 1]K : (b β ) ∈ W }. The set W (b) is a convex, compact, subset of RK . Then W (b) equals the convex hull of its extreme points (see, for instance, Grünbaum (2003, p. 18)). Every extreme point of W (b) is an extreme point of W (Corollary A.2.1). (c) For n = 1 2 there exists a monotone mechanism qn such that Ex−1 qn = Qn . PROOF: Apply Lemmas 4.2 and 4.3, Remark 4.3, and the fact that the convex combination of monotone symmetric mechanisms is a monotone symmetric mechanism. Using (b), the claim is established. w∗
(d) The sequence {qn } has a subsequence {qnk } such that {qnk } → q and (a member of the equivalence class of) q is a monotone mechanism with Ex−1 q = Q a.e. PROOF: By Lemma 4.5, the sequence {qn } in D has a convergent subsew∗
quence, {qnk } → q , q ∈ D . (Since L1 is separable, the weak∗ topology on the unit ball in L∞ is metrizable and we may use sequences without loss of generality.) By Lemma 4.4, q can be considered a monotone mechanism. The map qnk → Ex−1 qnk is continuous when domain and range are endowed w∗ with their weak∗ topologies; therefore, {Ex−1 qnk } → Ex−1 q . Since {Qnk } is a L∞ subsequence of {Qn } in (a), {Qnk } → Q. By (c), Ex−1 qnk = Qnk ∀nk ; hence Ex−1 q = Q a.e. Q.E.D. The example below illustrates Lemmas 4.2 and 4.3. EXAMPLE 1: There are two bidders, i = 1 2, K1 = K2 = 4, and for every i, xi is uniformly distributed in X = [0 1]. Lemma 4.2 states that there are at most two extreme points for any given partition. Fix a partition of [0 1], say [0 1/4], (1/4 2/4], (2/4 3/4], (3/4 1]. The step function Q = {(1/4 1/8) (1/4 3/8) (1/4 5/8) (1/4 7/8)}
FIGURE 3.—An extreme point and its dominant-strategy mechanism.
is one extreme point of $W$ for the proposed partition. The level sets of $Q$ are the elements of the partition. The second extreme point of $W$ (for the same partition), say $Q'$, is obtained from $Q$ by setting $Q'(x_1) = 0$ for all $x_1 \in [0, 1/4]$, and $Q'(x_1) = Q(x_1)$ elsewhere.
Figure 3 illustrates Lemma 4.3 as it applies to $Q$. The numbers in the cells of the figure indicate the values of $q(x_1, x_2)$; empty cells indicate $q(x_1, x_2) = 0$ for $(x_1, x_2)$ in the cell. Note that $E_{x_{-1}} q = Q$. The mechanism that implements the second extreme point is $q'(x_1, x_2) = 0$ for $(x_1, x_2) \in [0, 1/4] \times [0, 1/4]$ and $q' = q$ elsewhere; then $E_{x_{-1}} q' = Q'$.
5. HETEROGENEOUS BIDDERS
In this section, agents are potentially heterogeneous and, therefore, mechanisms are not required to be symmetric.
THEOREM 2: If $\{q_i\}_{i\in I}$ is a Bayesian incentive-compatible mechanism, then there exists a dominant-strategy, incentive-compatible mechanism $\{q'_i\}_{i\in I}$ that generates the same expected probability of trade, that is, $E_{x_{-i}} q'_i = E_{x_{-i}} q_i$ a.e. for $i \in I$.
REMARK 5.1: Theorem 1 is not implied by Theorem 2. Theorem 2 establishes the existence of an equivalent dominant-strategy mechanism, but does not ensure that it is symmetric, even if the original Bayesian mechanism is symmetric. Note that nonsymmetric mechanisms may yield symmetric expected probabilities of trade. See also Remark 4.2 following Theorem 1.
The proof of Theorem 2 follows closely the proof of Theorem 1. It proceeds in three lemmas, the analogues of Lemmas 4.1, 4.2, and 4.3.
LEMMA 5.1: If $\{q_i\}_{i\in I}$ is Bayesian incentive compatible, then $\{E_{x_{-i}} q_i\}_{i\in I}$ is in
$$(6)\qquad W = \Big\{ \{Q_i\}_{i\in I} : \forall i,\ Q_i : X_i \to [0,1] \text{ is nondecreasing and } B_i \subseteq X_i,\ i = 1, \ldots, I \;\Rightarrow\; \sum_{i\in I}\int_{B_i} Q_i\,d\lambda_i \le 1 - \prod_{i\in I}\lambda_i(B_i^c) \Big\}.$$
Since the proof is analogous to that of Lemma 4.1, we provide only a sketch. The Because of Bayesian incentive compatibility Ex−i qi must be nondecreasing. probability that bidder i has type in Bi and wins the object is Bi Ex−i qi dλi . Hence the left-hand side of the inequality is the probability that one buyer’s type is in her specified set and the buyer wins the object. The right-hand side is the probability that at least one buyer has type in her specified set. Therefore, the inequality in (6) must hold. Lemma 5.2 below serves the same purpose as Lemma 4.2 did in the symmetric environment of Section 4. Given a partition of the type space, the lemma identifies all the step functions (relative to that partition) that are extreme points of W . Similar arguments are used in the proofs of both lemmas: Identifying the extreme points of W is equivalent to finding the solution to a system of equations obtained from the feasibility condition in (6). This is what the proof of Lemma 4.2 accomplished. The differences in details between Lemmas 4.2 and 5.2 arise from the selection of the system of equations and the number of unknowns to be determined. Generally there are more inequalities than necessary to determine an extreme point. To see this, imagine that W is a rectangle in R2 . Four inequalities suffice to define the rectangle, but each of its extreme points, that is, each vertex, is determined by only two inequalities; each vertex is a point where two inequalities become binding. To identify a vertex, the inequalities must be chosen judiciously: if two inequalities represented by parallel lines are chosen, an extreme point will not be identified. In Lemma 4.2, because the I ex ante identical bidders must be treated symmetrically, the selection of equations to determine the unknowns is trivial: For a fixed partition of the type space, Lemma 4.2 identifies a single “family” of extreme points containing one main extreme point and another extreme point obtained through a small variation (i.e., β¯ 1 = 0). In Lemma 5.2, the situation is more involved. Since bidders need not be treated symmetrically, even for a fixed partition of the type spaces, the feasible set has many extreme points. Each one of them is identified by a different system of equations. Modulus the selection of the system of equations, however, the arguments used to prove Lemma 5.2 are the same as those used to prove Lemma 4.2. To characterize the different extreme points without listing them individually, we use a labeling system. The labeling system identifies the
equations that determine the extreme points of W . This allows us to prove Lemma 5.2 for a canonical extreme point. integers. A laDEFINITION 5.1: Let {Ki }Ii=1 be a collection of I nonnegative beling relative to {Ki }i∈I is a function g : {0 1 i∈I Ki } → i∈I {1 Ki + 1} such that the following statements hold: (a) g(0) = (K1 + 1 KI + 1). (b) For n ≥ 1, g(n) − g(n − 1) = −ei for some i ∈ {1 I}. For k ∈ {0 i∈I Ki }, define gi−1 (k) = min{n : gi (n) = k}.5 Let I = 2. Intuitively, a labeling is a collection of multi-indices ordered from highest (K1 +1 K2 +1) to lowest (1 1) with the restriction that each element in the collection is obtained from the previous one by decreasing a single index by one unit. (We use the term multi-index because, as we will see after Example 2, each g(n) is used in Lemma 5.2 to index a feasibility constraint.) For instance, if K1 = K2 = 3, one possible labeling is g(0) = (4 4) g(1) = (3 4) g(2) = (3 3) g(3) = (2 3) g(4) = (2 2) g(5) = (1 2) g(6) = (1 1). The collection g(0) = (4 4) g(1) = (3 4) g(2) = (3 2) g(3) = (2 2) g(4) = (1 2) g(5) = (1 1) g(6) = (1 1) is not a labeling. More generally, consider the set i∈I {1 Ki + 1} with the partial order k if ki ≤ ki for every defined by the standard vector inequality, that is, k < i with strict inequality for some i. A labeling g selects i∈I Ki vectors in decreasing order (i.e., g(n) < g(n − 1)) and at each step n, only one component decreases by the minimum possible (i.e., g(n) − g(n − 1) = −ei for some i). Thus, a labeling determines a finite, ordered sequence of multi-indices. Example 2 contains three labelings that we also use in Example 3. EXAMPLE 2: There are two bidders, i = 1 2, and K1 = K2 = 3. Three labeling systems are described in Table I. Figures 4 and 5 may be used to visualize the labelings in Example 2. Let the lattice formed by the intersections of the solid lines represent the set i∈I {1 Ki + 1} = {1 2 3 4} × {1 2 3 4}: Then g(0) = (4 4) is the northeast corner of the diagram; if g(1) = (3 4), then g(1) − g(0) = (−1 0) is the horizontal arrow with origin at g(0). Thus, the arrows in the diagram indicate the finite sequence of multi-indices and their order. The set of all step functions relative to a fixed partition in W is determined by a system of finitely many inequalities derived from the feasibility inequality in (6). The role of the labeling is to select sufficiently many, linear-independent inequalities so that when binding, they determine an extreme point. Note that gi−1 (k) is well defined: gi (0) = Ki +1 and g( i∈I Ki ) = 1. Therefore, gi ( i∈I Ki ) = 1 and there must be an n such that gi (n ) = k. 5
TABLE I
LABELING SYSTEMS

         Labeling (a)    Labeling (b)    Labeling (c)
g(0)     (4, 4)          (4, 4)          (4, 4)
g(1)     (3, 4)          (4, 3)          (4, 3)
g(2)     (3, 3)          (3, 3)          (4, 2)
g(3)     (2, 3)          (3, 2)          (4, 1)
g(4)     (2, 2)          (2, 2)          (3, 1)
g(5)     (1, 2)          (2, 1)          (2, 1)
g(6)     (1, 1)          (1, 1)          (1, 1)
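As a concrete check of Definition 5.1, the following sketch (our own Python code; the helper names are hypothetical, not the paper's) tests whether a candidate sequence of multi-indices is a labeling relative to $(K_1, \ldots, K_I)$ and computes $g_i^{-1}(k)$. It accepts labeling (a) of Table I and rejects the second candidate sequence displayed after Definition 5.1.

```python
# A minimal sketch of Definition 5.1: a labeling relative to (K_1, ..., K_I) is a
# sequence g(0), ..., g(sum K_i) of multi-indices with g(0) = (K_1+1, ..., K_I+1)
# and each step lowering exactly one coordinate by one unit.

def is_labeling(g, K):
    """Check conditions (a) and (b) of Definition 5.1 for a candidate sequence g."""
    if len(g) != sum(K) + 1:
        return False
    if tuple(g[0]) != tuple(k + 1 for k in K):          # condition (a)
        return False
    for n in range(1, len(g)):
        diffs = [g[n][i] - g[n - 1][i] for i in range(len(K))]
        if sorted(diffs) != [-1] + [0] * (len(K) - 1):  # condition (b): one coordinate drops by 1
            return False
    return True

def g_inverse(g, i, k):
    """g_i^{-1}(k) = min{n : g_i(n) = k}."""
    return min(n for n, gn in enumerate(g) if gn[i] == k)

K = (3, 3)
labeling_a = [(4, 4), (3, 4), (3, 3), (2, 3), (2, 2), (1, 2), (1, 1)]
not_a_labeling = [(4, 4), (3, 4), (3, 2), (2, 2), (1, 2), (1, 1), (1, 1)]

print(is_labeling(labeling_a, K))      # True
print(is_labeling(not_a_labeling, K))  # False: the step (3,4) -> (3,2) drops a coordinate by 2
print(g_inverse(labeling_a, 0, 2))     # 3, since g(3) = (2, 3)
```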
Before stating Lemma 5.2, we offer an example. Example 3 shows that, given a partition of the type space, labelings (a) and (b) in Example 2 correspond to extreme points. EXAMPLE 3: There are two bidders, i = 1 2, K1 = K2 = 3, and for every i, xi is uniformly distributed in Xi = [0 1]. Although bidders are ex ante identical, mechanisms are not required to be symmetric. Fix a partition of [0 1], say [0 1/3], (1/3 2/3], (2/3 1] and index its elements by ki = 1 2 3, respectively, when referring to Xi . Each diagram in Figure 4 corresponds to a pair of step functions (Q1 Q2 ) in W . Each pair is an extreme point of W . The left-hand diagram is associated with Q1 = {(b1k β1k )}3k=1 = {(1/3 1/3) (1/3 2/3) (1/3 3/3)} Q2 = {(b2k β2k )}3k=1 = {(1/3 0) (1/3 1/3) (1/3 2/3)}
FIGURE 4.—Labelings (a) and (b), two extreme points.
FIGURE 5.—Labeling (c).
The right-hand diagram illustrates the reciprocal extreme point. (See Lemma 5.2.) Numbers in the cells of the diagrams indicate the values of the mechanism $q_1(x_1, x_2)$; empty cells signify $q_1 = 0$. (See Lemma 5.3.) Note that in both diagrams $E_{x_{-i}} q_i = Q_i$.
Labelings (a) and (b) in Example 2 correspond to the extreme points in Example 3. Figure 4 is helpful to see this. We associate each index $k_i$ with a set as follows: If $k_i = 3$, then let $B_i = (2/3, 3/3]$; if $k_i = 2$, then let $B_i = (1/3, 2/3] \cup (2/3, 3/3]$; if $k_i = 1$, then let $B_i = [0, 1/3] \cup (1/3, 2/3] \cup (2/3, 3/3]$. Therefore, to any multi-index $(k_1, k_2)$ corresponds a set $B_1 \times B_2$. Applying the feasibility inequality in (6) to this set yields a linear constraint
$$\sum_{i=1}^{2}\sum_{k=k_i}^{K_i} b^i_k \beta^i_k \;\le\; 1 - \prod_{i=1}^{2}\sum_{k=1}^{k_i-1} b^i_k.$$
Every labeling thus determines $\sum_{i\in I} K_i$ equations (i.e., binding inequalities), six in the example. If the equations are linearly independent, the solution to the equations is an extreme point. Similarly, for each extreme point there is an implicit labeling. Lemma 5.2 makes this precise.
Consider, as an illustration, labeling (a). Since $g(1) = (3, 4)$, the inequality in (6) becomes $(1/3)\beta^1_3 \le 1 - [(1/3 + 1/3) \times 1]$. (When binding, the solution is $\beta^1_3 = 1$.) From $g(2) = (3, 3)$, the inequality in (6) becomes $(1/3)\beta^1_3 + (1/3)\beta^2_3 \le 1 - [(1/3 + 1/3) \times (1/3 + 1/3)]$. When binding and using $\beta^1_3 = 1$, the solution is $\beta^2_3 = 2/3$. The process continues until all $\beta^i_{k_i}$ have been identified. It is not difficult to verify that the six equations selected by labeling (a) are linearly independent. Thus labeling (a) corresponds to an extreme point.
A similar argument will show that labeling (b) also corresponds to an extreme point. Applying the same argument to labeling (c), however, yields as
a solution, a function with fewer than three steps. Thus, labeling (c) does not correspond to an extreme point with three steps (Figure 5).
In a first reading of Lemma 5.2, it may be useful to assume that $K_1 = K_2 = \cdots = K_I$ and that, for $n \ge 1$, the difference $g(n) - g(n-1) = -e_{(n \bmod I)}$. Under this assumption, the labeling is the natural one, that is, $g(1) = (K_1, K_2+1, \ldots, K_I+1)$, $g(2) = (K_1, K_2, K_3+1, \ldots, K_I+1)$, and so forth.
The proof of Theorem 2 only uses one direction in Lemma 5.2: If a step function is an extreme point, it must be one of those identified by the lemma.
LEMMA 5.2: Let $\{\{(b^i_k, \bar\beta^i_k)\}_{k=1}^{K_i}\}_{i\in I}$ be a collection of step functions in $W$. The collection $\{\{(b^i_k, \bar\beta^i_k)\}_{k=1}^{K_i}\}_{i\in I}$ is an extreme point of $W$ if and only if there exists a labeling $g$ relative to $\{K_i\}_{i\in I}$ such that one of the following statements holds:
(a) $\forall i \in I$ and $k \in \{1, \ldots, K_i\}$, $\bar\beta^i_k = \prod_{j \in I\setminus\{i\}} \sum_{\ell=1}^{g_j(g_i^{-1}(k))-1} b^j_\ell$.
(b) $\bar\beta^i_k$ is defined as above for all $i$ and $k$, with the following exceptions: there is $i'$ such that $\bar\beta^{i'}_1 = 0$ and, for every $i$ and $k$ with $g_i^{-1}(k) > g_{i'}^{-1}(1)$, $\bar\beta^i_k = 0$.
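As an illustration of alternative (a), here is a minimal sketch (our own Python code, with hypothetical helper names, using the product-of-sums formula in condition (a) as reconstructed above) that computes $\bar\beta^i_k$ from a labeling under Example 3's uniform-type assumptions and recovers the two extreme points shown in Figure 4 from labelings (a) and (b) of Example 2.

```python
from math import prod

# Sketch of Lemma 5.2(a): given cell sizes b[i][k-1] and a labeling g,
# recover the step-function values beta[i][k-1] of the associated extreme point.

def g_inverse(g, i, k):
    return min(n for n, gn in enumerate(g) if gn[i] == k)

def extreme_point(b, g):
    I = len(b)
    beta = []
    for i in range(I):
        row = []
        for k in range(1, len(b[i]) + 1):
            n = g_inverse(g, i, k)
            # beta^i_k = prod over j != i of sum_{l=1}^{g_j(n)-1} b^j_l
            row.append(prod(sum(b[j][: g[n][j] - 1]) for j in range(I) if j != i))
        beta.append(row)
    return beta

b = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]   # Example 3: two bidders, uniform types, K1 = K2 = 3
labeling_a = [(4, 4), (3, 4), (3, 3), (2, 3), (2, 2), (1, 2), (1, 1)]
labeling_b = [(4, 4), (4, 3), (3, 3), (3, 2), (2, 2), (2, 1), (1, 1)]

print(extreme_point(b, labeling_a))  # approx. [[0.33, 0.67, 1.0], [0.0, 0.33, 0.67]]  (left panel of Figure 4)
print(extreme_point(b, labeling_b))  # approx. [[0.0, 0.33, 0.67], [0.33, 0.67, 1.0]]  (the reciprocal extreme point)
```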
PROOF: $\Rightarrow$ We prove that if a step function is an extreme point of $W$, then it is one of those identified by the lemma. Its converse, not used in the proof of Theorem 2, is proven in Appendix A.
Fix $\{\{b^i_k\}_{k=1}^{K_i}\}_{i\in I}$ as in the lemma's statement, define
$$\mathcal{K} = \prod_{i\in I}\{1, \ldots, K_i+1\} \qquad\text{and}\qquad \bar K = \Big\{1, 2, \ldots, \sum_{i\in I} K_i\Big\},$$
and note that $g$ maps $\bar K \cup \{0\}$ into $\mathcal{K}$.
Any family of step functions $\{\{(b^i_k, \beta^i_k)\}_{k=1}^{K_i}\}_{i\in I} \in W$ must satisfy the inequality in (6) and, therefore,
$$(7)\qquad \forall k = (k_1, \ldots, k_I) \in \mathcal{K}, \qquad \sum_{i\in I}\sum_{k=k_i}^{K_i} b^i_k \beta^i_k \;\le\; 1 - \prod_{i\in I}\sum_{k=1}^{k_i-1} b^i_k,$$
where sums with no terms are defined to be zero.
We find it convenient to use vector notation. To that end, define, for $i = 1, \ldots, I$,
$$b^i = (b^i_1, \ldots, b^i_{K_i}) \quad\text{and}\quad b = (b^1, \ldots, b^I),$$
$$b_k = (b^1_{k_1}, \ldots, b^I_{k_I}), \qquad b^i_{k_i} = (0, \ldots, 0, b^i_{k_i}, b^i_{k_i+1}, \ldots, b^i_{K_i}),$$
$$\beta^i = (\beta^i_1, \ldots, \beta^i_{K_i}) \quad\text{and}\quad \beta = (\beta^1, \ldots, \beta^I),$$
and let $b^i_{K_i+1} = 0$, the null vector in $\mathbb{R}^{K_i}$. Also for every $k \in \mathcal{K}$, define
$$r(k) = \prod_{i\in I}\sum_{\ell=1}^{k_i-1} b^i_\ell.$$
In vector notation, inequality (7) becomes
$$(8)\qquad \forall k \in \mathcal{K}, \qquad b_k \cdot \beta \le 1 - r(k).$$
Note that $\beta$ is a vector in $\mathbb{R}^{\sum_{i\in I} K_i}$. To express nonnegativity constraints in vector form, for any $i \in I$ and $k \in \{1, \ldots, K_i\}$, let $e^i_k \in \mathbb{R}^{\sum_{i\in I} K_i}$ be such that $e^i_k = (0, \ldots, 0, 1, 0, \ldots, 0)$, where the 1 corresponds to the element $k$ of bidder $i$. Thus writing $e^i_k \cdot \beta \ge 0$ is equivalent to writing $\beta^i_k \ge 0$. Define the set containing all nonnegative vectors $\beta$ such that $(b, \beta)$ satisfies (8):
$$P = \big\{\beta : [k \in \mathcal{K} \Rightarrow b_k \cdot \beta \le 1 - r(k)] \text{ and } [i \in I,\ k \in \{1, \ldots, K_i\} \Rightarrow e^i_k \cdot \beta \ge 0]\big\}.$$
An element $(b, \beta)$ in $W$ is an extreme point of $W$ if and only if $\beta$ is an extreme point of $P$ (Lemma A.2). In turn, $\beta$ is an extreme point of $P$ if and only if
$$(9)\qquad R(\beta) = \{b_k : b_k \cdot \beta = 1 - r(k),\ k \in \mathcal{K}\} \cup \{e^i_k : e^i_k \cdot \beta = 0,\ i \in I,\ k \in \{1, \ldots, K_i\}\}$$
contains $\sum_{i=1}^{I} K_i$ linearly independent vectors (Lemma A.1).
In summary, fix $b$ and let $(b, \bar\beta)$ be an extreme point of $W$. Then $\bar\beta$ is an extreme point of $P$. Therefore, the set $R(\bar\beta)$ must have $\sum_{i\in I} K_i$ linearly independent vectors.
First, suppose that $\bar\beta$ is strictly positive, that is, $\bar\beta^i_k > 0$ for every $i$ and $k$. Let $\{b_k\}$ be the collection, with $\sum_{i\in I} K_i$ elements, of linearly independent vectors in $R(\bar\beta)$. Each vector $b_k$ in the collection satisfies
$$(10)\qquad b_k \cdot \bar\beta = 1 - r(k).$$
Order the $\sum_{i\in I} K_i$ indices of these vectors from largest to smallest so that $k > k' > \cdots$. We will show below that it is always possible to order indices strictly as described. Then define the labeling $g$ as follows: $g(0) = (K_1+1, \ldots, K_I+1)$ and $g(n)$ is the $n$th element in the ordered sequence.
We now prove that the strict ordering of indices is possible. Arguing by contradiction, suppose it is not. Then $\exists k, k' \in \mathcal{K}$, $k \ne (k \wedge k') \ne k'$, such that $b_k \cdot \bar\beta = 1 - r(k)$, $b_{k'} \cdot \bar\beta = 1 - r(k')$, and $b_{k\wedge k'} \cdot \bar\beta = 1 - r(k \wedge k')$. Note that $k \vee k' = k + k' - k \wedge k'$ and thus $b_k + b_{k'} - b_{k\wedge k'} = b_{k\vee k'}$. Therefore,
$$1 - r(k) + 1 - r(k') - 1 + r(k \wedge k') = b_{k\vee k'} \cdot \bar\beta \le 1 - r(k \vee k').$$
This implies that $r(k \vee k') + r(k \wedge k') \le r(k) + r(k')$. This is a contradiction because $r(k \vee k') + r(k \wedge k') \ge r(k) + r(k')$ and the inequality is strict except when $k \wedge k' \in \{k', k\}$. This establishes that ordering the indices strictly is possible.
We now demonstrate that $\bar\beta$ is item (a) in the lemma's statement relative to the labeling $g$ defined above.
Using the labeling, (10) can be rewritten as $b_{g(n)} \cdot \bar\beta = 1 - r(g(n))$ $\forall n \in \bar K$. Pick any $i \in I$ and $k \in \{1, \ldots, K_i\}$, and let $n' = g_i^{-1}(k)$. Subtracting $b_{g(n'-1)} \cdot \bar\beta = 1 - r(g(n'-1))$ from $b_{g(n')} \cdot \bar\beta = 1 - r(g(n'))$ yields
$$\big[b_{g(n')} - b_{g(n'-1)}\big] \cdot \bar\beta = r(g(n'-1)) - r(g(n')).$$
By definition of $n'$, $g_i(n') = k$, $g_i(n'-1) = k + 1$ and, for all $j \ne i$, $g_j(n') = g_j(n'-1)$. Therefore, $[b_{g(n')} - b_{g(n'-1)}] \cdot \bar\beta = b^i_k \bar\beta^i_k$ and the expression above becomes
$$b^i_k \bar\beta^i_k = r(g(n'-1)) - r(g(n')) = \prod_{j\in I}\sum_{\ell=1}^{g_j(n'-1)-1} b^j_\ell - \prod_{j\in I}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell\;\Big[\sum_{\ell=1}^{g_i(n'-1)-1} b^i_\ell - \sum_{\ell=1}^{g_i(n')-1} b^i_\ell\Big] = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell\;\Big[\sum_{\ell=1}^{k+1-1} b^i_\ell - \sum_{\ell=1}^{k-1} b^i_\ell\Big] = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell\; b^i_k.$$
Therefore, $\bar\beta^i_k = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell$. This establishes that $\bar\beta$ is item (a) in the lemma's statement.
Second, suppose that $\bar\beta^i_k = 0$ for some $i$ and $k$, that is, $e^i_k \cdot \bar\beta = 0$ and thus $e^i_k \in R(\bar\beta)$ is one of the linearly independent vectors. Note that $\bar\beta^i_k = 0$ implies $\bar\beta^i_{k-1} = 0$. Therefore, unless $k = 1$, the function will not have the required
number of steps. A similar argument to the one we used above (when assuming $\bar\beta^i_k > 0$) yields that $\bar\beta$ is item (b) in the lemma's statement. Q.E.D.
LEMMA 5.3: Let $\{\{(b^i_k, \bar\beta^i_k)\}_{k=1}^{K_i}\}_{i\in I}$ be an extreme point of $W$ and let $g$ be its labeling. The two mechanisms $\{q_i\}_{i=1}^{I}$ defined below satisfy dominant-strategy incentive compatibility and, for every $i$, $E_{x_{-i}} q_i = \{(b^i_k, \bar\beta^i_k)\}_{k=1}^{K_i}$.
For $i = 1, \ldots, I$, let $\iota_i(x_i) = k : x_i \in Q_i^{-1}(\bar\beta^i_k)$. For alternative (a) in Lemma 5.2, the implementing mechanism is
$$q_i(x_1, \ldots, x_I) = \begin{cases} 1 & \text{if } \iota_j(x_j) \le g_j\big(g_i^{-1}(\iota_i(x_i))\big) - 1\ \forall j \ne i, \\ 0 & \text{otherwise.} \end{cases}$$
For alternative (b) in Lemma 5.2, let $i'$ be the individual with $\bar\beta^{i'}_1 = 0$; the implementing mechanism is
$$q_{i'}(x_1, \ldots, x_I) = \begin{cases} 1 & \text{if } \iota_j(x_j) \le g_j\big(g_{i'}^{-1}(\iota_{i'}(x_{i'}))\big) - 1\ \forall j \ne i' \text{ and } \iota_{i'}(x_{i'}) \ne 1, \\ 0 & \text{otherwise,} \end{cases}$$
and, for $i \ne i'$,
$$q_i(x_1, \ldots, x_I) = \begin{cases} 1 & \text{if } \iota_j(x_j) \le g_j\big(g_i^{-1}(\iota_i(x_i))\big) - 1\ \forall j \ne i \text{ and } \iota_i(x_i) = g_i(n) \text{ for some } n \le g_{i'}^{-1}(1), \\ 0 & \text{otherwise.} \end{cases}$$
PROOF: The proof is by direct calculation. For alternative (a) in Lemma 5.2, pick $i$ and $x_i$. Let $\iota_i(x_i) = k$ and $g_i^{-1}(\iota_i(x_i)) = n'$. We must show that $E_{x_{-i}} q_i(x_i) = \bar\beta^i_k$, that is, that $E_{x_{-i}} q_i(x_i) = \prod_{j\in I\setminus\{i\}}\sum_{k=1}^{g_j(n')-1} b^j_k$. Using definitions,
$$E_{x_{-i}} q_i(x_i) = \int_{X_{-i}} q_i(x_i, x_{-i})\,d\lambda_{-i} = \prod_{j\in I\setminus\{i\}}\int_{\{x_j : \iota_j(x_j) \le g_j(n')-1\}} d\lambda_j = \prod_{j\in I\setminus\{i\}}\sum_{k=1}^{g_j(n')-1} b^j_k.$$
For alternative (b), apply the same argument.
Q.E.D.
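For concreteness, the following sketch (our own code; the function names are hypothetical) builds the alternative-(a) mechanism of Lemma 5.3 for Example 3 with labeling (a), checks cell-by-cell feasibility, and verifies that the reduced forms match the extreme point in the left panel of Figure 4.

```python
from itertools import product as cartesian

# Sketch of Lemma 5.3, alternative (a), specialized to Example 3:
# two bidders, uniform types on [0,1], three equal cells per bidder, labeling (a).

def g_inverse(g, i, k):
    return min(n for n, gn in enumerate(g) if gn[i] == k)

def q(i, cells, g):
    """q_i = 1 iff every other bidder's cell index is below the labeling threshold."""
    n = g_inverse(g, i, cells[i])
    return 1 if all(cells[j] <= g[n][j] - 1 for j in range(len(cells)) if j != i) else 0

b = [[1/3, 1/3, 1/3], [1/3, 1/3, 1/3]]
labeling_a = [(4, 4), (3, 4), (3, 3), (2, 3), (2, 2), (1, 2), (1, 1)]

# Feasibility: on every cell profile the q_i sum to at most one.
assert all(sum(q(i, c, labeling_a) for i in range(2)) <= 1
           for c in cartesian([1, 2, 3], repeat=2))

# Reduced form: E_{x_{-i}} q_i equals the step function of the extreme point.
for i in range(2):
    Q_i = [sum(b[1 - i][c2 - 1] * q(i, (c1, c2) if i == 0 else (c2, c1), labeling_a)
               for c2 in [1, 2, 3])
           for c1 in [1, 2, 3]]
    print(Q_i)   # bidder 1: approx. [0.33, 0.67, 1.0]; bidder 2: approx. [0.0, 0.33, 0.67]
```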
6. BILATERAL DIFFERENTIAL INFORMATION The environment in this section has I ex ante identical buyers plus a distinct agent that we call the seller. (We discuss the interpretation of the distinct agent as a seller after Definition 6.1.) All agents, including the seller, have private information. To emphasize the difference between the seller and the bidders, the seller’s private information is denoted by y ∈ Y , distributed according to a probability distribution λs . (Every
buyer’s private information x ∈ X is independently distributed according to the same distribution λb .) We require that mechanisms be symmetric with respect to the ex ante identical buyers as in Section 4. After stating the definition, we explain its content. To avoid cumbersome notation, we will use Eq or Eq(x1 ), that is, the expectation of q(x y) taken over x−1 and y, instead of Eyx−1 q. DEFINITION 6.1: Let q : Y × X I → [0 1] and qs : Y × X I → [0 1] be such I that for every (y x) ∈ Y × X I , i=1 q(y σi (x)) + qs (y x) ≤ 1. If q(y x) and qs (y x) are nondecreasing in x1 and y, respectively, then (q qs ) is a symmetric, dominant-strategy incentive-compatible mechanism with I bidders and a seller. If Eq(x1 ) and Ex qs (y) are nondecreasing, then (q qs ) is a symmetric, Bayesian incentive-compatible mechanism with I bidders and a seller. The omitted transfer functions are recovered, up to a constant, using the corresponding incentive compatibility characterizations. (See Section 3 and the paragraph following this definition.) To interpret the distinct agent as a seller, it suffices to set Y = [y y] ⊆ R− . A nondecreasing Ex qs (y) becomes nonincreasing as a function of |y|.6 Fix a profile (y x). While qs (y x) is the probability that the seller ends up with the object, it is not the probability that the object is not given to some buyer. If the probability sum (for the given type profile) is strictly less than 1, then the object might not be assigned to either buyers or the seller. This flexibility in the definition of a mechanism increases the set of mechanisms for which the equivalence (between dominant strategy and Bayesian implementation) is obtained. Since we show the equivalence of any mechanism, not just a revenue maximizing one, the additional generality is valuable. THEOREM 3: If (q qs ) is a symmetric, Bayesian incentive-compatible mechanism with I bidders and a seller, then there is a symmetric dominant-strategy, incentive-compatible mechanism (q qs ) with I bidders and a seller that generates the same expected probability of trade, that is, Eq (x1 ) = Eq(x1 ) a.e. and Ex qs = Ex qs a.e. If there is a single buyer, Theorem 3 is a particular case of Theorem 2 (with two agents—a buyer and a seller). If there are at least two ex ante identical buyers who must be treated symmetrically, Theorem 3 does not follow from Theorem 2. The proof, however, is similar to the proofs of Theorems 1 and 2. The nontrivial direction also proceeds in three lemmas. Given the similarities, we state them without proof in Appendix B. 6 The seller’s preferences, defined as us (y x) = qs (y x)y − ts (y x) (where ts (y x) ≥ 0 represents transfers from the agent to the mechanism designer) can be written as us (y x) = −ts (y x) − qs (y x)|y|.
7. CONCLUDING COMMENTS An outcome in a game is customarily defined as a distribution on the terminal nodes that results from a strategy profile and nature’s moves. For conciseness, suppose there are only two agents in our model and consider the implicit game in a direct revelation mechanism. An outcome is a distribution μ on X1 × X2 × X1 × X2 × [0 1] × [0 1] × R × R where from left to right we have the type spaces, the action spaces, the probabilities of trade, and the transfers. The linearity of preferences implies that the marginal distribution of μ on the first, fifth, and seventh space (μX1 ×[01]×R ) suffices to determine player 1’s payoff. (This is the marginal distribution on player 1’s own type, own probability of trade, and own transfer.) If two outcomes μ and ν generate the same relevant marginal distributions (i.e., μXi ×[01]×R = νXi ×[01]×R ) for all players, they are equivalent in the sense that players are indifferent between them. We have proved that for any Bayesian Nash equilibrium outcome μ, there is a dominant-strategy equilibrium outcome ν such that the relevant marginal distributions of μ and ν are the same. The actual outcomes μ and ν will generally be different. A mechanism design problem is often cast in terms of the maximization of an objective function subject to constraints. If the objective function and constraints depend only on the expected probabilities of trade, as is often the case, then, per our equivalence theorems, there is no loss in requiring dominant strategy over Bayesian incentive compatibility. Our work is related to Border (1991, 2007). Border’s (1991) objective was to identify the functions Q : X → [0 1] for which there is a symmetric mechanism q : X I → [0 1] (with I identical bidders) and Ex−1 q = Q. He demonstrates that a necessary and sufficient condition for this is that Q satisfy the feasibility inequality in Lemma 4.1. (From that characterization, we use only the simple part (Lemma 4.1).) Border (1991) assumed ex ante identical bidders and considered only symmetric mechanisms. He did not require incentive compatibility and, therefore, his expected probabilities of trade Ex−1 q need not be nondecreasing. Border (2007) extended his own characterization to general nonsymmetric environments, but assumed finite types. As a by-product, Theorem 2 extends Border’s result in that it applies to heterogeneous agents (and, thus, to bilateral trade) and to nonsymmetric mechanisms with a continuum of types (but assuming nondecreasing Q). APPENDIX A: ADDITIONAL LEMMAS AND PROOFS We complete the proof of Lemma 5.2. ¯ defined in Lemma 5.2(a), PROOF OF LEMMA 5.2: ⇐ We now prove that for β ¯ is an extreme point of W . By hypothesis, (b β) ¯ belongs to W . There(b β)
fore, it suffices to demonstrate that $R(\bar\beta)$ has $\sum_{i=1}^{I} K_i$ linearly independent vectors. We do so in two steps.
First, simple inspection shows that the $\sum_{i\in I} K_i$ vectors $\{b_{g(n)}\}_{n\in\bar K}$ are linearly independent, where $g$ is the labeling used to define $\bar\beta$.
Second, we demonstrate that $\{b_{g(n)}\}_{n\in\bar K} \subseteq R(\bar\beta)$. We must show that
$$(11)\qquad b_{g(n)} \cdot \bar\beta = 1 - r(g(n)) \qquad \forall n \in \bar K.$$
Let $\tilde\beta$ be a solution to the system of equations $b_{g(n)} \cdot \tilde\beta = 1 - r(g(n))$, $n \in \bar K$. (Such a solution always exists because the vectors $\{b_{g(n)}\}_{n\in\bar K}$ are linearly independent.) We will show that $\tilde\beta = \bar\beta$. Pick any $i \in I$ and $k \in \{1, \ldots, K_i\}$, and let $n' = g_i^{-1}(k)$.
Subtracting $b_{g(n'-1)} \cdot \tilde\beta = 1 - r(g(n'-1))$ from $b_{g(n')} \cdot \tilde\beta = 1 - r(g(n'))$ yields
$$\big[b_{g(n')} - b_{g(n'-1)}\big] \cdot \tilde\beta = r(g(n'-1)) - r(g(n')).$$
By definition of $n'$, $g_i(n') = k$, $g_i(n'-1) = k + 1$, and, for all $j \ne i$, $g_j(n') = g_j(n'-1)$. Therefore, $[b_{g(n')} - b_{g(n'-1)}] \cdot \tilde\beta = b^i_k \tilde\beta^i_k$ and the expression above becomes
$$b^i_k \tilde\beta^i_k = r(g(n'-1)) - r(g(n')) = \prod_{j\in I}\sum_{\ell=1}^{g_j(n'-1)-1} b^j_\ell - \prod_{j\in I}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell\;\Big[\sum_{\ell=1}^{g_i(n'-1)-1} b^i_\ell - \sum_{\ell=1}^{g_i(n')-1} b^i_\ell\Big] = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell\;\Big[\sum_{\ell=1}^{k+1-1} b^i_\ell - \sum_{\ell=1}^{k-1} b^i_\ell\Big] = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell\; b^i_k.$$
Therefore, $\tilde\beta^i_k = \prod_{j\in I\setminus\{i\}}\sum_{\ell=1}^{g_j(n')-1} b^j_\ell = \bar\beta^i_k$. This establishes (11).
We have proved that $R(\bar\beta)$ has $\sum_{i\in I} K_i$ linearly independent vectors and, therefore, $(b, \bar\beta)$ is an extreme point of $W$. We now prove that for $\bar\beta$ defined in Lemma 5.2(b), $(b, \bar\beta)$ is an extreme point of $W$. Once again, we need to show that $R(\bar\beta)$ has $\sum_{i\in I} K_i$ linearly independent vectors.
Let $\bar\beta^{i'}_1 = 0 < \prod_{j\in I\setminus\{i'\}}\sum_{\ell=1}^{g_j(g_{i'}^{-1}(1))-1} b^j_\ell$. (If there is no $i'$ for which this holds, then $\bar\beta$ is as in Lemma 5.2(a) and we are done.) Define $n' = g_{i'}^{-1}(1)$.
For $n < n'$, $\bar\beta^i_{g_i(n)}$ is as defined in Lemma 5.2(a) and, therefore, by (11),
$$b_{g(n)} \cdot \bar\beta = \sum_{i\in I}\sum_{k=g_i(n)}^{K_i} b^i_k \bar\beta^i_k = 1 - r(g(n)).$$
Therefore, for every $n < n'$, $b_{g(n)} \in R(\bar\beta)$. For $n \ge n'$,
$$b_{g(n)} \cdot \bar\beta = \sum_{i\in I}\sum_{k=g_i(n)}^{K_i} b^i_k \bar\beta^i_k < 1 - r(g(n)).$$
This is so because $\bar\beta^{i'}_{g_{i'}(n')} = \bar\beta^{i'}_1 = 0$ and this variable was strictly positive when (11) applied. Therefore, $b_{g(n')}$ does not belong to $R(\bar\beta)$, but $e^{i'}_1$ does. For every $n > n'$, the same argument applies: $\bar\beta^i_k = 0$ for the pair $(i, k)$ with $k = g_i(n)$ reached at step $n$, $b_{g(n)} \notin R(\bar\beta)$, and $e^i_k \in R(\bar\beta)$.
It is immediate that all vectors in $R(\bar\beta)$ are linearly independent. Hence $\bar\beta$ is an extreme point of $P$. Since $(b, \bar\beta) \in W$, $(b, \bar\beta)$ is an extreme point of $W$. Q.E.D.
The following well known property is included here for the reader's convenience.
LEMMA A.1: For $j = 1, \ldots, J$, let $a_j \in \mathbb{R}^K$ and let $r_j \in \mathbb{R}$. Let $P = \{\beta \in \mathbb{R}^K : a_j \cdot \beta \le r_j,\ j = 1, \ldots, J\}$. Then a vector $\beta \in P$ is an extreme point of $P$ if and only if the set $A_\beta = \{a_j : a_j \cdot \beta = r_j,\ j \in \{1, \ldots, J\}\}$ contains $K$ linearly independent vectors.
For the proof, see, for instance, Bertsekas (2003, Proposition 3.3.3, p. 184).
The following lemma adds detail to the proof of Lemmas 4.2 and 5.2. We state it and prove it using $W$ and $P$ as used in Lemma 4.2. The result and its proof remain valid for the $W$ and $P$ as used in Lemma 5.2.7
LEMMA A.2: Let $W$ be defined as in Lemma 4.1. Let $(b, \beta) \in W$ be a step function with $K$ steps and let $P$ be as defined in (3). Then $(b, \beta)$ is an extreme point of $W$ if and only if $\beta$ is an extreme point of $P$.
7 Manelli and Vincent (2007, Theorems 17 and 19) observed that the domain partition defining a step function determines a face of $W$. Lemma A.2 and its corollary are based on this observation.
PROOF: First, if (b β) is not an extreme point of W , then β is not an extreme point of P. If Q = (b β) is not an extreme point of W , then there are Q1 Q2 ∈ W such that Q = 12 Q1 + 12 Q2 . Let βk be the kth component of β and pick any x x ∈ Q−1 (βk ) with x > x. For i = 1 2, Qi is nondecreasing (because Qi ∈ W ) and, therefore, Qi (x ) ≥ Qi (x). Suppose this inequality is strict for some i. Then βk = 12 Q1 (x ) + 12 Q2 (x ) > 12 Q1 (x) + 12 Q2 (x) = βk , a contradiction. We conclude that for i = 1 2, Qi is constant in Q−1 (βk ). Since k was chosen arbitrarily, Qi is constant in every interval in which Q is constant. Therefore, we may write Qi = (b βi ). Then β = 12 β1 + 12 β2 and β is not an extreme point of P. Second, if β is not an extreme point of P, then β = β /2 + β /2 for some β β ∈ P. If β and β are both nondecreasing, then (b β ) (b β ) ∈ W and the proof is complete. For every α ∈ (0 1), β = [αβ + (1 − α)β]/2 + [αβ + (1 − α)β]/2. For every α sufficiently small, αβ + (1 − α)β and αβ + (1 − α)β are nondecreasing members of P. Hence, β is not an extreme point of W . Q.E.D. COROLLARY A.2.1: Let (b β) ∈ W be a step function with K steps and let W (b) = {β ∈ [0 1]K : (b β ) ∈ W }. Then (b β) is an extreme point of W if and only if β is an extreme point of W (b). PROOF: Since β1 and β2 in the proof of the lemma are nondecreasing, if (b β) is not an extreme point of W , β is not an extreme point of W (b). The converse—if (b β) is an extreme point of W , then it is an extreme point of W (b)—is trivial. Q.E.D. The following lemma, used in the proof of Theorem 1, provides a convenient characterization of functions that are monotone almost everywhere. LEMMA A.3: Let X = [x x] ⊆ R and let λb be a probability measure on the Borel σ-algebra of X. Define an order on X I by x x if x1 ≤ x1 and xi ≥ xi for i > 1. Let f : X I → [0 1] be a measurable function. The following statements are equivalent: (i) ∃A ⊆ X λIb (A) = 1 : [x x ∈ A x x] ⇒ f (x ) ≤ f (x). I I (ii) ∀B := i=1 Bi and B := i=1 Bi [Bi Bi ⊆ X], λb (Bi ) > 0, λb (Bi ) > 0, such that [x ∈ B x ∈ B ⇒ x x], 1 1 I f dλ ≤ f dλIb b λIb (B ) B λIb (B) B PROOF: (i) ⇒ (ii). We have [x ∈ (B ∩ A) x ∈ B ∩ A]
$\Rightarrow\quad f(x') \le f(x).$
Therefore, $E[f \mid B' \cap A] \le E[f \mid B \cap A]$. Since $\lambda_b^I(B \cap A) = \lambda_b^I(B)$ and similarly for $B'$, $E[f \mid B'] \le E[f \mid B]$.
(ii) $\Rightarrow$ (i). Without loss of generality, let $X = [0, 1]$. For $n = 1, 2, \ldots$, let
$$G_n = \big\{[0, 1/2^n),\ [1/2^n, 2/2^n),\ \ldots,\ [k/2^n, (k+1)/2^n),\ \ldots,\ [(2^n-1)/2^n, 2^n/2^n]\big\}.$$
Then $\{G_n\}$ is a sequence of increasingly finer, nested partitions of $[0, 1]$; each partition is a collection of disjoint intervals of $[0, 1]$. Define
$$f_n(x) = \begin{cases} \dfrac{1}{\lambda_b^I(B)}\displaystyle\int_B f\,d\lambda_b^I & \text{if } x \in B \in G_n^I \text{ and } \lambda_b^I(B) > 0, \\ 0 & \text{otherwise.} \end{cases}$$
By construction $f_n$ satisfies (ii), and it is simple to verify that there is $A_n \subseteq X^I$ with $\lambda_b^I(A_n) = 1$ such that $f_n$ satisfies (i) in $A_n$. Also by construction, $f_n = E[f \mid \sigma(G_n^I)]$. Since $\sigma(\bigcup_{n=1}^{\infty}\sigma(G_n^I))$ is the Borel $\sigma$-field, $\exists A' \subseteq X^I$, $\lambda_b^I(A') = 1$, such that $\forall x \in A'$, $\{f_n(x)\} \to f(x)$ (see, for instance, Shiryaev (1991, Theorem 3, p. 510)).
Finally, let $A = \bigcap_{n=1}^{\infty} A_n \cap A'$. Since $A$ is the countable intersection of sets of measure 1, $\lambda_b^I(A) = 1$. Suppose $\exists x', x \in A$, $x' \succeq x$ and $f(x') > f(x)$. Then, for sufficiently large $n$, $f_n(x') > f_n(x)$, a contradiction since $f_n$ satisfies (i) with respect to $A_n$ and $A \subseteq A_n$. Therefore, $f$ satisfies (i). Q.E.D.
APPENDIX B: MAIN STEPS IN THEOREM 3'S PROOF
We discuss here the three lemmas leading to the proof of Theorem 3. Lemma B.1 follows as a corollary to Lemma 5.1.
LEMMA B.1: If $(q, q_s)$ is a Bayesian incentive-compatible, symmetric mechanism with $I$ bidders and a seller, then $(Eq, E_x q_s)$ is in $W$, where
$$(12)\qquad W = \Big\{(Q_b, Q_s) : Q_b : X \to [0,1] \text{ and } Q_s : Y \to [0,1] \text{ are nondecreasing and } [B \subseteq X,\ S \subseteq Y] \Rightarrow I\int_B Q_b\,d\lambda_b + \int_S Q_s\,d\lambda_s \le 1 - \lambda_b(B^c)^I\lambda_s(S^c)\Big\}.$$
Ex ante identical buyers must be treated symmetrically by the mechanism.
We proceed to identify the extreme points of W . Lemma B.2 is the analogue of Lemmas 4.2 and 5.2.8 Once the domain’s partition (for the step function) is determined, these lemmas identify all the step functions (defined by the partition) that are extreme points of the feasible set. Kb s LEMMA B.2: Let ({bk β¯ k }k=1 {bsk β¯ sk }Kk=1 ) be a pair of step functions in W . K K b s The pair ({bk β¯ k }k=1 {bsk β¯ sk }k=1 ) is an extreme point of W if there exists a labeling g relative to (Kb Ks ), such that for every (kb ks ) ∈ {1 Kb } × {1 Ks } one of the following statements holds: kb kb −1 gs (g−1 (k ))−1 s bk )I − ( k=1 bk )I ] k=1b b bk and β¯ sks = (i) β¯ bkb = Ib1 [( k=1 kb gb (gs−1 (ks ))−1 bk )I . ( k=1 (ii) One or both of β¯ 1 and β¯ s1 are zero, and all other (β¯ kb β¯ sks ) are as in (i).
A direct proof of Lemma B.2 can be obtained following the same steps employed in Lemmas 4.2 and 5.2. Instead of repeating those steps, we make a few heuristic observations that lead to the result. Suppose that there are only two heterogeneous agents—a buyer and a seller, i ∈ {b s}. From Lemma 5.2(a), the extreme point mechanism for the buyer is
(13)
β¯ = b kb
gs (gb−1 (kb ))−1
bsk
k=1
for some labeling g. Suppose instead that there is no seller, but that there are I ex ante identical bidders. From Lemma 4.2(a), the symmetric extreme point mechanism is I k −1 I k b b
bj − bj (14)
β¯ kb =
k=1
k=1
Ibkb
This mechanism assigns the object to one of the I bidders. Finally, suppose that there are I ex ante identical bidders plus a seller, a heterogeneous agent. The probability that one of the I bidders gets the object gs (g−1 (k ))−1 s is given by (13), k=1b b bk . This probability must be distributed among the I symmetric bidders; this is done according to (14). The result is precisely Lemma B.2(i). Similar arguments lead to Lemma B.2(ii). 8 In reading the lemma’s statement, recall that summations with no terms are assumed to be zero.
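A quick numerical check (our own sketch, relying on formula (14) as reconstructed above) that the symmetric extreme-point formula reproduces the step function of Example 1: with $I = 2$ and four equal cells of size 1/4, it returns $(1/8, 3/8, 5/8, 7/8)$.

```python
# Sketch: the symmetric extreme-point formula (14) evaluated for Example 1.

def symmetric_extreme_point(b, I):
    """beta_k = [ (sum_{j<=k} b_j)^I - (sum_{j<k} b_j)^I ] / (I * b_k)."""
    cum = [sum(b[:k]) for k in range(len(b) + 1)]   # partial sums: cum[k] = b_1 + ... + b_k
    return [(cum[k] ** I - cum[k - 1] ** I) / (I * b[k - 1]) for k in range(1, len(b) + 1)]

print(symmetric_extreme_point([1/4] * 4, I=2))   # [0.125, 0.375, 0.625, 0.875] = (1/8, 3/8, 5/8, 7/8)
```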
LEMMA B.3: Let the pair of step functions $(Q_b, Q_s) = (\{b_k, \bar\beta_k\}_{k=1}^{K_b}, \{b^s_k, \bar\beta^s_k\}_{k=1}^{K_s})$ be an extreme point of $W$ and let $g$ be its labeling. The symmetric mechanism $(q, q_s)$ defined below satisfies dominant-strategy incentive compatibility and $(Eq, E_x q_s) = (Q_b, Q_s)$:
$$q(y, x) = \begin{cases} \dfrac{1}{|\{i : \iota_b(x_i) = \iota_b(x_1)\}|} & \text{if } Q_b(x_1) = \max\{Q_b(x_i)\}_{i=1}^{I} > 0 \text{ and } \iota_s(y) \le g_s\big(g_b^{-1}(\iota_b(x_1))\big) - 1, \\ 0 & \text{otherwise,} \end{cases}$$
$$q_s(y, x) = \begin{cases} 1 - \displaystyle\sum_{i\in I} q(y, \sigma_i(x)) & \text{if } Q_s(y) > 0, \\ 0 & \text{otherwise,} \end{cases}$$
where $\iota_b(x_i) = k : x_i \in Q_b^{-1}(\bar\beta_k)$ and $\iota_s(y) = k : y \in Q_s^{-1}(\bar\beta^s_k)$.
PROOF: We first show that $Eq = Q_b$. If $\iota_b(x_1) = 1$ and $\bar\beta_1 = 0$, then $Eq(x_1) = 0$ trivially and we are done. Suppose then $\iota_b(x_1) = k_b$ and $\bar\beta_{k_b} \ne 0$. By direct calculation, the expected probability of trade is
$$(15)\qquad Eq(x_1) = \sum_{i=1}^{I}\frac{1}{i}\binom{I-1}{i-1}(b_{k_b})^{i-1}\Big(\sum_{k=1}^{k_b-1} b_k\Big)^{I-1-(i-1)}\sum_{k=1}^{g_s(g_b^{-1}(k_b))-1} b^s_k.$$
The argument in Lemma 4.3 yields the desired result.
We now show that $E_x q_s = Q_s$. If $\iota_s(y) = 1$ and $\bar\beta^s_1 = 0$, then $E_x q_s(y) = 0$ trivially. Suppose then $\iota_s(y) = k_s$ and $\bar\beta^s_{k_s} \ne 0$. Note then that $q_s(y, x) \ne 0 \iff \sum_{i\in I} q(y, \sigma_i(x)) \ne 1 \iff \forall i,\ q(y, \sigma_i(x)) = 0$. It follows from the definition of $q(y, x)$ in the lemma's statement that $q(y, \sigma_i(x)) = 0$ $\forall i \iff k_s > g_s(g_b^{-1}(\iota_b(x_i))) - 1$ or, equivalently, $k_s \ge g_s(g_b^{-1}(\iota_b(x_i)))$. In turn, this is so if and only if $g_s^{-1}(k_s) < g_b^{-1}(\iota_b(x_i)) \iff g_b(g_s^{-1}(k_s)) > \iota_b(x_i) \iff g_b(g_s^{-1}(k_s)) - 1 \ge \iota_b(x_i)$. The probability that $x_i$ is such that $\iota_b(x_i) \le g_b(g_s^{-1}(k_s)) - 1$ is $\sum_{k=1}^{g_b(g_s^{-1}(k_s))-1} b_k$. This occurs for the $I$ bidders with probability $\big(\sum_{k=1}^{g_b(g_s^{-1}(k_s))-1} b_k\big)^I$. Q.E.D.
REFERENCES
ARROW, K. (1979): "The Property Rights Doctrine and Demand Revelation Under Incomplete Information," in Economics and Human Welfare, ed. by M. Boskin. New York: Academic Press. [1906]
BERTSEKAS, D., WITH A. NEDIĆ AND A. OZDAGLAR (2003): Convex Analysis and Optimization. Belmont, MA: Athena Scientific. [1933]
BORDER, K. (1991): "Implementation of Reduced Form Auctions: A Geometric Approach," Econometrica, 59, 1175–1187. [1907,1913,1916,1931]
——— (2007): "Reduced Form Auctions Revisited," Economic Theory, 31, 167–181. [1907,
1931] CRÉMER, J., AND R. MCLEAN (1988): “Full Extraction of the Surplus in Bayesian and Dominant Strategy Auctions,” Econometrica, 56, 1247–1257. [1906] D’ASPREMONT, C., AND L. A. GERARD -VARET (1979a): “Incentives and Incomplete Information,” Journal of Public Economics, 11, 25–45. [1906] (1979b): “On Bayesian Incentive Compatible Mechanisms,” in Aggregation and Revelation of Preferences, ed. by J.-J. Laffont. Amsterdam: North-Holland. [1907] GREEN, J., AND J.-J. LAFFONT (1977): “Characterization of Satisfactory Mechanisms for the Revelation of Preferences for Public Goods,” Econometrica, 45, 427–438. [1906] GRÜNBAUM, B. (2003): Convex Polytopes. New York: Springer-Verlag. [1920] LAFFONT, J.-J., AND E. MASKIN (1979): “A Differentiable Approach to Expected Utility Maximizing Mechanisms,” in Aggregation and Revelation of Preferences, ed. by J.-J. Laffont. Amsterdam: North-Holland. [1907] LAVRICˇ , B. (1993): “Continuity of Monotone Functions,” Archivum Mathematicum, 29, 1–4. [1918] MAKOWSKI, L., AND C. MEZZETTI (1994): “Bayesian and Weakly Robust First-Best Mechanisms: Characterization,” Journal of Economic Theory, 64, 500–519. [1907] MANELLI, A., AND D. VINCENT (2007): “Multidimensional Mechanism Design: Revenue Maximization and the Multiple Good Monopoly,” Journal of Economic Theory, 137, 153–185. [1933] MAS-COLELL, A., M. WHINSTON, AND J. GREEN (1995): Microeconomic Theory. New York, Oxford: Oxford University Press. [1905] MASKIN, E., AND J. RILEY (1984): “Optimal Auction With Risk Averse Buyers,” Econometrica, 52, 1473–1518. [1907,1913] MATTHEWS, S. (1983): “Selling to Risk Averse Buyers With Unobservable Tastes,” Journal of Economic Theory, 30, 370–400. [1907,1913] (1984): “On the Implementability of Reduced Form Auctions,” Econometrica, 52, 1519–1522. [1907,1913] MONTEIRO, P., AND B. FUX SVAITER (2007): “Optimal Auction With a General Distribution: Virtual Valuations Without a Density,” Working Paper, FGV-EPGE, Brazil. [1910] MOOKHERJEE, D., AND S. REICHELSTEIN (1992): “Dominant Strategy Incentive Compatible Allocation Rules,” Journal of Economic Theory, 56, 378–399. [1907] MYERSON, R. (1981): “Optimal Auction Design,” Mathematics of Operations Research, 6, 58–73. [1906,1907,1909,1910] SHIRYAEV, A. (1991): Probability (Second Ed.). Graduate Texts in Mathematics. New York: Springer-Verlag. [1935] WILLIAMS, S. (1999): “A Characterization of Efficient, Bayesian Incentive Compatible Mechanisms,” Economic Theory, 14, 155–180. [1907,1908]
Dept. of Economics, Arizona State University, Tempe, AZ 85287-3806, U.S.A.;
[email protected] and Dept. of Economics, University of Maryland, College Park, MD 20742, U.S.A.;
[email protected]. Manuscript received July, 2008; final revision received May, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 1939–1971
GENERALIZED UTILITARIANISM AND HARSANYI’S IMPARTIAL OBSERVER THEOREM1 BY SIMON GRANT, ATSUSHI KAJII, BEN POLAK, AND ZVI SAFRA Harsanyi’s impartial observer must consider two types of lotteries: imaginary identity lotteries (“accidents of birth”) that she faces as herself and the real outcome lotteries (“life chances”) to be faced by the individuals she imagines becoming. If we maintain a distinction between identity and outcome lotteries, then Harsanyi-like axioms yield generalized utilitarianism, and allow us to accommodate concerns about different individuals’ risk attitudes and concerns about fairness. Requiring an impartial observer to be indifferent as to which individual should face similar risks restricts her social welfare function, but still allows her to accommodate fairness. Requiring an impartial observer to be indifferent between identity and outcome lotteries, however, forces her to ignore both fairness and different risk attitudes, and yields a new axiomatization of Harsanyi’s utilitarianism. KEYWORDS: Generalized utilitarianism, impartial observer, social welfare function, fairness, ex ante egalitarianism.
1. INTRODUCTION THIS PAPER REVISITS HARSANYI’S (1953, 1955, 1977) utilitarian impartial observer theorem. Consider a society of individuals I The society has to choose among different social policies, each of which induces a probability distribution or lottery over a set of social outcomes X . Each individual i has preferences i over these lotteries. These preferences are known and they differ. To help choose among social policies, Harsanyi proposed that each individual should imagine herself as an “impartial observer” who does not know which person she will be. That is, the impartial observer faces not only the real lottery over the social outcomes in X , but also a hypothetical lottery z over which identity in I she will assume. In forming preferences over all such extended lotteries, an impartial observer is forced to make interpersonal comparisons: for example, she is forced to compare being person i in social state x with being person j in social state x . Harsanyi assumed the so-called acceptance principle; that is, when an impartial observer imagines herself being person i, she adopts person i’s preferences over the outcome lotteries. He also assumed that all individuals are expected utility maximizers and that they continue to be so in the role of the impartial observer. Harsanyi argued that these “Bayesian rationality” axioms force 1 We thank John Broome, Jurgen Eichberger, Marc Fleurbaey, Edi Karni, Bart Lipman, Philippe Mongin, Stephen Morris, Heve Moulin, Klaus Nehring, David Pearce, John Quiggin, John Roemer, John Weymark, three referees, and a co-editor for many helpful comments. Atsushi Kajii thanks Grant-in-Aid for Scientific Research S (Grant 90152298) and the Inamori Foundation for support. Zvi Safra thanks the Israel Science Foundation (Grant 1299/05) and the Henry Crown Institute of Business Research for support.
the impartial observer to be a (weighted) utilitarian. More formally, over all extended lotteries $(z, \ell)$ in which the identity lottery and the outcome lotteries are independently distributed, the impartial observer's preferences admit a representation of the form
$$(1)\qquad V(z, \ell) = \sum_i z_i U_i(\ell),$$
where $z_i$ is the probability of assuming person $i$'s identity and $U_i(\ell) := \int_X u_i(x)\,\ell(dx)$ is person $i$'s von Neumann–Morgenstern expected utility for the outcome lottery $\ell$. Where no confusion arises, we will omit the "weighted" and refer to the representation in (1) simply as utilitarianism.2
Harsanyi's utilitarianism has attracted many criticisms. We confront just two: one concerning fairness and one concerning different attitudes toward risk. To illustrate both criticisms, consider two individuals $i$ and $j$, and two social outcomes $x_i$ and $x_j$. Person $i$ strictly prefers outcome $x_i$ to outcome $x_j$, but person $j$ strictly prefers $x_j$ to $x_i$. Perhaps there is some (possibly indivisible) good and $x_i$ is the state in which person $i$ gets the good, while $x_j$ is the state in which person $j$ gets it. Suppose that an impartial observer would be indifferent between being person $i$ in state $x_i$ and being person $j$ in state $x_j$; hence $u_i(x_i) = u_j(x_j) =: u_H$. She is also indifferent between being $i$ in $x_j$ and being $j$ in $x_i$; hence $u_i(x_j) = u_j(x_i) =: u_L$. Additionally, she strictly prefers the first pair (having the good) to the second (not having the good); hence $u_H > u_L$.
The concern about fairness is similar to Diamond's (1967) critique of Harsanyi's aggregation theorem. Consider the two extended lotteries illustrated in tables (a) and (b), in which rows are the people and columns are the outcomes:

        x_i   x_j                 x_i   x_j
   i    1/2    0             i    1/4   1/4
   j    1/2    0             j    1/4   1/4
            (a)                        (b)
In each, the impartial observer has a half chance of being person i or person j. But in table (a), the good is simply given outright to person i: outcome 2 Some writers (e.g., Sen (1970, 1977), Weymark (1991), Mongin (2001, 2002)) reserve the term utilitarianism for social welfare functions in which all the zi ’s are equal and the Ui ’s are welfares, not just von Neumann–Morgenstern utilities. Harsanyi claimed that impartial observers should assess social policies using equal zi weights, and that von Neumann–Morgenstern utilities should be identified with welfares. Harsanyi (1977, pp. 57–60) conceded that his axioms do not force all potential impartial observers to agree in their extended preferences. Nevertheless, he claimed that, given enough information about “the individuals’ psychological, biological and cultural characteristics,” all impartial observers would agree. These extra claims are not the focus of this paper, but we will return to the issues of agreement and welfare in Section 7.
$x_i$ has probability 1. In table (b), the good is allocated by tossing a coin: the outcomes $x_i$ and $x_j$ each have probability 1/2. Diamond argued that a fair-minded person might prefer the second allocation policy since it gives each person a "fair shake."3 But Harsanyi's utilitarian impartial observer is indifferent to such considerations of fairness. Each policy (or its associated extended lottery) involves a half chance of getting the good and hence yields the impartial observer $\frac{1}{2}u_H + \frac{1}{2}u_L$. The impartial observer cares only about her total chance of getting the good, not how this chance is distributed between person $i$ and person $j$.
The concern about different risk attitudes is less familiar.4 Consider the two extended lotteries illustrated in tables (c) and (d):

        x_i   x_j                 x_i   x_j
   i    1/2   1/2            i     0     0
   j     0     0             j    1/2   1/2
            (c)                        (d)
In each, the impartial observer has a half chance of being in state xi or state xj , and hence a half chance of getting the good. But in (c), the impartial observer faces this risk as person i, while in (d), she faces the risk as person j. Suppose that person i is more comfortable facing such a risk than is person j.5 Harsanyi’s utilitarian impartial observer is indifferent to such considerations of risk attitude. Each of the extended lotteries (c) and (d) again yields 12 uH + 12 uL . Thus, Harsanyi’s impartial observer does not care who faces this risk. In his own response to the concern about fairness, Harsanyi (1975) argued that even if randomizations were of value for promoting fairness (which he doubted), any explicit randomization is superfluous since “the great lottery of (pre-)life” may be viewed as having already given each child an equal chance of being each individual. That is, for Harsanyi, it does not matter whether a good is allocated by a (possibly imaginary) lottery over identities as in table (a), or 3 Societies often use both simple lotteries and weighted lotteries to allocate goods (and bads), presumably for fairness considerations. Examples include the draft, kidney machines, oversubscribed events, schools, and public housing, and even whom should be thrown out of a lifeboat! For a long list and an enlightening discussion, see Elster (1989). 4 Pattanaik (1968) remarked that in reducing an identity-outcome lottery to a one-stage lottery, “what we are actually doing is to combine attitudes to risk of more than one person” (pp. 1165– 1166). 5 To make this notion of greater “comfort” concrete, suppose that both people have certainty equivalents for the risk of a half chance of being in states xi or xj —call these certainty equivalents yi and yj , respectively—and suppose that, according to the interpersonal comparisons of the impartial observer, person j is prepared to give up more than person i to remove this risk: that is, the impartial observer would prefer to be person i with yi than person j with yj . In this case, by the definition of a certainty equivalent, the acceptance principle, and transitivity, the impartial observer would prefer to face the risk of a half chance of being in states xi or xj as person i than as person j.
by a (real) lottery over outcomes as in table (c), or by some combination of the two as in table (b). The dispute about fairness thus seems to rest on whether we are indeed indifferent between identity and outcome lotteries; that is, between “accidents of birth” and real “life chances.” For Harsanyi, they are equivalent, but for those concerned about fairness, genuine life chances might be preferred to mere accidents of birth.6 If we regard outcome and identity lotteries as equivalent, there is little scope left to accommodate different risk attitudes of different individuals. For example, the outcome lottery in table (c) would be indifferent to the identity lottery in table (a) even though the risk in the first is faced by person i and the risk in the second is faced by the impartial observer. Similarly, the outcome lottery in (d) would be indifferent to the identity lottery in (a). Hence the two outcome lotteries (c) and (d) must be indifferent even though one is faced by person i and the other by person j. In effect, indifference between outcome and identity lotteries treats all risks as if they were faced by one agent, the impartial observer: it forces us to conflate the risk attitudes of individuals with those of the impartial observer herself. But Harsanyi’s own acceptance principle states that when the impartial observer imagines herself as person i, she should adopt person i’s preferences over the outcome lotteries faced by person i. This suggests that different lotteries perhaps should not be treated as equivalent if they are faced by different people with possibly different risk attitudes. We want to make explicit the possibility that an impartial observer might distinguish between the identity lotteries (I ) she faces and the outcome lotteries (X ) faced by the indviduals. Harsanyi’s impartial observer is assumed to form preferences over the entire set of joint distributions (I × X ) over identities and outcomes. In such a setup, it is hard to distinguish outcome from identity lotteries since the resolution of identity can partially or fully resolve the outcome. For example, the impartial observer could face a joint distribution in which, if she becomes person i, then society holds the outcome lottery , but if she becomes person j, then social outcome x obtains for sure. To keep this distinction clean, we restrict attention to product lotteries (I ) × (X ). That is, the impartial observer only forms preferences over extended lotteries in which the outcome lottery she faces is the same regardless of which identity she assumes. That said, our restriction to product lotteries is for conceptual clarity only and is not essential for the main results.7 Harsanyi’s assumption that identity and outcome lotteries are equivalent is implicit. Suppose that, without imposing such an equivalence, we impose each of Harsanyi’s three main assumptions: that if the impartial observer imagines being individual i, she accepts the preferences of that individual; that each individual satisfies independence over the lotteries he faces (which are outcome 6 This could be seen as an example of what Ergin and Gul (2009) called issue or source preference. 7 See Section 6.
lotteries); and that the impartial observer satisfies independence over the lotteries she faces (which are identity lotteries). Notice that, by acceptance, the impartial observer inherits independence over outcome lotteries. But this is not enough to force us to the (weighted) utilitarianism of expression (1). Instead (Theorem 1), we obtain a generalized (weighted) utilitarian representation:
$$(2)\qquad V(z, \ell) = \sum_i z_i \phi_i\big(U_i(\ell)\big),$$
where zi is again the probability of assuming person i’s identity and Ui () is again person i’s expected utility from the outcome lottery , but each φi (·) is a (possibly nonlinear) transformation of person i’s expected utility. Generalized utilitarianism is well known to welfare economists, but has not before been given foundations in the impartial-observer framework.8 Generalized utilitarianism can accommodate concerns about fairness if the φi functions are concave.9 Harsanyi’s utilitarianism can be thought of as the special case where each φi is affine. The discussion above suggests that these differences about fairness involve preferences between identity and outcome lotteries. The framework allows us to formalize this intuition: we show that a generalized utilitarian impartial observer has concave φi functions if and only if she has a preference for outcome lotteries over identity lotteries (i.e., a preference for life chances), and she is a utilitarian if and only if she is indifferent between outcome and identity lotteries (i.e., indifferent between life chances and accidents of birth).10 Generalized utilitarianism can accommodate concerns about different risk attitudes simply by allowing the φi functions to differ in their degree of concavity or convexity.11 In the example above, the impartial observer first assessed equal welfares to being person i in state xi or person j in state xj , and equal welfares to being i in xj or j in xi . The issue of different risk attitudes seemed to rest on whether such equal welfares implies equal von Neumann–Morgenstern utilities. We show that a generalized utilitarian impartial observer uses the 8 For example, see Blackorby, Bossert, and Donaldson (2005, Chap. 4) and Blackorby, Donaldson, and Mongin (2004). Both obtained similar representations for aggregating utility vectors; the former from Gorman-like separability assumptions; the latter by assuming consistency between evaluations based on the ex post social welfares and those based on ex ante utilities. See also Blackorby, Donaldson, and Weymark (1999). 9 In our story, we have φi (ui (xi )) = φj (uj (xj )) > φi (ui (xj )) = φj (uj (xi )). Thus, if the φ functions are strictly concave, the impartial observer evaluation of allocation policy (c) is φi ( 12 ui (xi ) + 12 ui (xj )) > 12 φi (ui (xi )) + 12 φi (ui (xj )) = 12 φi (ui (xi )) + 12 φj (uj (xi )), her evaluation of policy (a). The argument comparing (b) and (a) is similar. 10 This provides a new axiomatization of Harsanyi’s utilitarianism that is distinct from, for example, Karni and Weymark (1998) or Safra and Weissengrin (2003). 11 For example, if φi is strictly concave but φj is linear, then the impartial observer’s evaluation of policy (c) φi ( 12 ui (xi ) + 12 ui (xj )) > 12 φi (ui (xi )) + 12 φi (ui (xj )) = 12 φj (uj (xj )) + 12 φj (uj (xi )) = φj ( 12 uj (xj ) + 12 uj (xi )), her evaluation of policy (d).
same φ function for all people (implying the same mapping from their von Neumann–Morgenstern utilities to her welfare assessments) if and only if she would be indifferent as to which person to be when facing such similar risks. Where does Harsanyi implicitly assume both indifference between life chances and accidents of births, and indifference between individuals facing similar risks? Harsanyi’s independence axiom goes further than ours in two ways. First, in our case, the impartial observer inherits independence over outcome lotteries indirectly (via acceptance) from individuals’ preferences. In contrast, Harsanyi’s axiom imposes independence over outcome lotteries directly on the impartial observer. We will see that this direct imposition forces the impartial observer to be indifferent as to which individual faces similar risks. Second, Harsanyi’s independence axiom extends to randomizations that simultaneously mix outcome and identity lotteries. We will see that this assumption forces the impartial observer to be indifferent between these two types of randomization, and this in turn precludes concern for fairness. Earlier attempts to accommodate fairness considerations focussed on dropping independence. For example, Karni and Safra (2002) relaxed independence for the individual preferences, while Epstein and Segal (1992) relaxed independence for the impartial observer.12 Our approach maintains independence for each agent but restricts its domain to the lotteries faced by that agent. Section 2 sets up the framework. Section 3 axiomatizes generalized utilitarianism. Section 4 deals with concerns about fairness. We show that the impartial observer ignoring these concerns is equivalent to her being indifferent between identity and outcome lotteries. This yields a new axiomatization of Harsanyi’s utilitarianism. Section 5 deals with concerns about different risk attitudes. Section 6 first shows how to extend our analysis to the entire set of joint distributions (I × X ) over identities and outcomes. We then show how Harsanyi’s independence axiom restricted to our domain of product lotteries, (I ) × (X ), implies our independence axiom and both of our indifference conditions: indifference between outcome and identity lotteries, and indifference as to who faces similar risks. Section 7 considers four possible views (including the one taken in this paper) for the role of the impartial observer. For each view we ask “What are the knowledge requirements for the impartial observer” and “Must all potential impartial observers agree in their preferences over extended lotteries?” Then we relate these to the issues of fairness and different risk attitudes. Proofs are given in the Appendix A. Appendix B contains additional examples and discussion. 12 Strictly speaking, Epstein and Segal’s paper is in the context of Harsanyi’s (1955) aggregation theorem. In addition, Broome (1991) addressed fairness concerns by expanding the outcome space to include the means of allocation (e.g., the use of a physical randomization device) as part of the description of the final outcome.
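To see numerically how representation (2) separates the two concerns, here is a small illustrative sketch. The utility levels $u_H > u_L$ and the particular $\phi$ functions below are our own assumptions, chosen only to make the comparisons of tables (a)–(d) visible; they are not taken from the paper.

```python
from math import sqrt

# Illustrative sketch of representation (2), V(z, l) = sum_i z_i * phi_i(U_i(l)),
# evaluated on the four extended lotteries (a)-(d) from the introduction.
# u_H = 4, u_L = 0 and the phi's are assumptions made for the example.

u_H, u_L = 4.0, 0.0

def V(z, outcome_mix, phis):
    """z = identity lottery (z_i, z_j); outcome_mix = probability of outcome x_i (vs x_j)."""
    U_i = outcome_mix * u_H + (1 - outcome_mix) * u_L   # person i prefers x_i
    U_j = outcome_mix * u_L + (1 - outcome_mix) * u_H   # person j prefers x_j
    return z[0] * phis[0](U_i) + z[1] * phis[1](U_j)

concave = (sqrt, sqrt)                       # same strictly concave phi for both people
print(V((0.5, 0.5), 1.0, concave))           # (a) give the good to i outright: 1.0
print(V((0.5, 0.5), 0.5, concave))           # (b) toss a coin: ~1.41 > 1.0, the fair policy is preferred

mixed = (sqrt, lambda u: u)                  # concave phi for person i, linear phi for person j
print(V((1.0, 0.0), 0.5, mixed))             # (c) person i faces the outcome risk: ~1.41
print(V((0.0, 1.0), 0.5, mixed))             # (d) person j faces the same risk: 2.0, not indifferent
```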
2. SETUP AND NOTATION

Let society consist of a finite set of individuals I = {1, ..., I}, I ≥ 2, with generic elements i and j. The set of social outcomes is denoted by X with generic element x. The set X is assumed to have more than one element and to be a compact metrizable space that has associated with it the set of events E, which is taken to be the Borel sigma algebra of X. Let Δ(X) (with generic element ℓ) denote the set of outcome lotteries; that is, the set of probability measures on (X, E) endowed with the weak convergence topology. These lotteries represent the risks actually faced by each individual in his or her life. With slight abuse of notation, we will let x or sometimes [x] denote the degenerate outcome lottery that assigns probability weight 1 to social state x.

Each individual i in I is endowed with a preference relation ≽i defined over the set of life chances Δ(X). We assume throughout that for each i in I, the preference relation ≽i is a complete, transitive binary relation on Δ(X) and that its asymmetric part ≻i is nonempty. We assume these preferences are continuous in that weak upper and weak lower contour sets are closed. Hence for each i there exists a nonconstant function Vi : Δ(X) → R that satisfies, for any ℓ and ℓ′ in Δ(X), Vi(ℓ) ≥ Vi(ℓ′) if and only if ℓ ≽i ℓ′. In summary, a society may be characterized by the tuple ⟨X, I, {≽i}i∈I⟩.

In Harsanyi's story, the impartial observer imagines herself behind a veil of ignorance, uncertain about which identity she will assume in the given society. Let Δ(I) denote the set of identity lotteries on I. Let z denote the typical element of Δ(I) and let zi denote the probability assigned by the identity lottery z to individual i. These lotteries represent the imaginary risks in the mind of the impartial observer of being born as someone else. With slight abuse of notation, we will let i or sometimes [i] denote the degenerate identity lottery that assigns probability weight 1 to the impartial observer assuming the identity of individual i.

As discussed above, we assume that the outcome and identity lotteries faced by the impartial observer are independently distributed; that is, she faces a product lottery (z, ℓ) ∈ Δ(I) × Δ(X). We sometimes refer to this as a product identity-outcome lottery or, where no confusion arises, simply as a product lottery.

Fix an impartial observer endowed with a preference relation ≽ defined over Δ(I) × Δ(X). We assume throughout that ≽ is complete, transitive, and continuous (in that weak upper and weak lower contour sets are closed in the product topology), and that its asymmetric part ≻ is nonempty, so it admits a (nontrivial) continuous representation V : Δ(I) × Δ(X) → R. That is, for any pair of product lotteries (z, ℓ) and (z′, ℓ′), (z, ℓ) ≽ (z′, ℓ′) if and only if V(z, ℓ) ≥ V(z′, ℓ′).

DEFINITION 1—Utilitarianism: We say that the impartial observer is a (weighted) utilitarian if her preferences admit a representation {Ui}i∈I of
the form

    V(z, ℓ) = Σ_{i=1}^{I} zi Ui(ℓ),
where, for each individual i in I, Ui : Δ(X) → R is a von Neumann–Morgenstern expected-utility representation of ≽i; that is, Ui(ℓ) := ∫X ui(x) ℓ(dx).

DEFINITION 2—Generalized Utilitarianism: We say that the impartial observer is a generalized (weighted) utilitarian if her preferences admit a representation {Ui, φi}i∈I of the form

    V(z, ℓ) = Σ_{i=1}^{I} zi φi[Ui(ℓ)],
where, for each individual i in I, φi : R → R is a continuous, increasing function and Ui : Δ(X) → R is a von Neumann–Morgenstern expected-utility representation of ≽i.

3. GENERALIZED UTILITARIANISM

In this section, we axiomatize generalized utilitarianism. The first axiom is Harsanyi's acceptance principle. In degenerate product lotteries of the form (i, ℓ) or (i, ℓ′), the impartial observer knows she will assume identity i for sure. The acceptance principle requires that, in this case, the impartial observer's preferences must coincide with that individual's preferences ≽i over outcome lotteries.

AXIOM 1—Acceptance Principle: For all i in I and all ℓ, ℓ′ ∈ Δ(X), ℓ ≽i ℓ′ if and only if (i, ℓ) ≽ (i, ℓ′).

Second, we assume that each individual i's preferences satisfy the independence axiom for the lotteries he faces; that is, outcome lotteries.

AXIOM 2—Independence Over Outcome Lotteries (for Individual i): Suppose ℓ, ℓ′ ∈ Δ(X) are such that ℓ ∼i ℓ′. Then, for all ℓ̃, ℓ̃′ ∈ Δ(X), ℓ̃ ≽i ℓ̃′ if and only if αℓ̃ + (1 − α)ℓ ≽i αℓ̃′ + (1 − α)ℓ′ for all α in (0, 1].

Third, we assume that the impartial observer's preferences satisfy independence for the lotteries she faces; that is, identity lotteries. Here, however, we need to be careful. The set of product lotteries Δ(I) × Δ(X) is not a convex
subset of Δ(I × X) and hence not all probability mixtures of product lotteries are well defined. Thus, we adopt the following notion of independence.13

AXIOM 3—Independence Over Identity Lotteries (for the Impartial Observer): Suppose (z, ℓ), (z′, ℓ′) ∈ Δ(I) × Δ(X) are such that (z, ℓ) ∼ (z′, ℓ′). Then, for all z̃, z̃′ ∈ Δ(I), (z̃, ℓ) ≽ (z̃′, ℓ′) if and only if (αz̃ + (1 − α)z, ℓ) ≽ (αz̃′ + (1 − α)z′, ℓ′) for all α in (0, 1].

To understand this axiom, first notice that the two mixtures on the right side of the implication are identical to α(z̃, ℓ) + (1 − α)(z, ℓ) and α(z̃′, ℓ′) + (1 − α)(z′, ℓ′), respectively. These two mixtures of product lotteries are well defined: they mix identity lotteries that hold the outcome lottery fixed. Second, notice that the two product lotteries, (z, ℓ) and (z′, ℓ′), that are mixed in with weight (1 − α) are themselves indifferent. The axiom states that mixing in two indifferent lotteries (with equal weight) preserves the original preference between (z̃, ℓ) and (z̃′, ℓ′) prior to mixing. Finally, notice that this axiom only applies to mixtures of identity lotteries that hold the outcome lotteries fixed, not to the opposite case: mixtures of outcome lotteries that hold the identity lotteries fixed.

To obtain our representation results, we work with a richness condition on the domain of individual preferences: we assume that none of the outcome lotteries under consideration is Pareto dominated.

CONDITION 1—Absence of Unanimity: For all ℓ, ℓ′ ∈ Δ(X), if ℓ ≻i ℓ′ for some i in I, then there exists j in I such that ℓ′ ≻j ℓ.

This condition is perhaps a natural restriction in the context of Harsanyi's thought experiment. That exercise is motivated by the need to make social choices when agents disagree. We do not need to imagine ourselves as impartial observers facing an identity lottery to rule out social alternatives that are Pareto dominated.14

These axioms are enough to yield a generalized utilitarian representation.

THEOREM 1—Generalized Utilitarianism: Suppose that absence of unanimity applies. Then the impartial observer's preferences admit a generalized utilitarian representation {Ui, φi}i∈I if and only if the impartial observer satisfies the

13 This axiom is based on Fishburn's (1982, p. 88) and Safra and Weissengrin's (2003) substitution axioms for product lottery spaces. Their axioms, however, apply wherever probability mixtures are well defined in this space. We only allow mixtures of identity lotteries. In this respect, our axiom is similar to Karni and Safra's (2000) "constrained independence" axiom, but their axiom applies to all joint distributions over identities and outcomes, not just to product lotteries.
14 In Harsanyi's thought experiment, Pareto dominated lotteries would never be chosen by the impartial observer since the combination of the acceptance principle and Harsanyi's stronger independence axioms implies the Pareto criterion. We are grateful to a referee for making this point.
acceptance principle and independence over identity lotteries, and each individual satisfies independence over outcome lotteries. Moreover the functions Ui are unique up to positive affine transformations and the composite functions φi ◦ Ui are unique up to a common positive affine transformation. Grant, Kajii, Polak, and Safra (2006, Theorem 8) showed that without absence of unanimity, we still obtain a generalized utilitarian representation, but we lose the uniqueness of the composite functions φi ◦ Ui . Notice that although the representation of each individual’s preferences Ui is affine in outcome lotteries, in general, the representation of the impartial observer’s preferences V is not. 4. FAIRNESS OR EX ANTE EGALITARIANISM So far we have placed no restriction on the shape of the φi functions except that they are increasing. In a standard utilitarian social welfare function, each ui function maps individual i’s income to an individual utility. These incomes differ across people, and concavity of the ui functions is associated with egalitarianism over incomes. In a generalized utilitarian social welfare function, each φi function maps individual i’s expected utility Ui () to a utility of the impartial observer. These expected utilities differ across people, and concavity of the φi functions is associated with egalitarianism over expected utilities, which is often called ex ante egalitarianism.15 We will show that concavity of the φi functions is equivalent to an axiom that generalizes the example in the Introduction. The example involved two indifference sets of the impartial observer: one that contains (i xi ) and (j xj ) and one that contains (i xj ) and (j xi ). We argued that a preference for fairness corresponds to preferring a randomization between these indifference sets in outcome lotteries to a randomization in identity lotteries. To generalize, suppose the impartial observer is indifferent between (z ) and (z ), and consider the product lottery (z ) that (in general) lies in a different indifference set. There are two ways to randomize between these indifference sets while remaining in the set of product lotteries. The product lottery (z α + (1 − α) ) randomizes between these indifference sets in outcome lotteries (i.e., real life chances), while the product lottery (αz + (1 − α)z ) randomizes between these indifference sets in identity lotteries (i.e., imaginary accidents of birth). 15
See, for example, Broome (1984), Myerson (1981), Hammond (1981, 1982), and Meyer (1991). In our context, it is perhaps better to call this interim egalitarianism since it refers to distributions after the resolution of the identity lottery but before the resolution of the outcome lottery. We can contrast this with a concern for ex post inequality of individuals’ welfare; see, for example, Fleurbaey (2007).
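Before the formal statement, a minimal numerical sketch (in Python; not part of the original argument) may help fix ideas. It evaluates the two randomizations just described under a generalized utilitarian criterion with a concave φ; the square-root form of φ and the expected-utility numbers are purely illustrative assumptions.

```python
# Illustrative sketch: two ways of randomizing between indifference sets.
# The square-root phi and all utility numbers are hypothetical choices.
import math

def phi(u):
    # A concave transformation of expected utility (illustrative).
    return math.sqrt(u)

# Expected utilities U_i(lottery) for two individuals at two outcome lotteries,
# chosen so that (z, l) ~ (z', l2) below, where z = [person 1], z' = [person 2].
U = {(1, "l"): 9.0, (1, "l2"): 1.0,
     (2, "l"): 1.0, (2, "l2"): 9.0}

def V(z2, lottery):
    """Generalized utilitarian value of a product lottery; z2 = prob. of person 2."""
    return (1 - z2) * phi(U[(1, lottery)]) + z2 * phi(U[(2, lottery)])

def V_outcome_mix(z2, a):
    # (z, a*l + (1-a)*l2): mix outcome lotteries, holding the identity lottery fixed.
    # Each U_i is affine in outcome lotteries, so expected utilities mix linearly.
    return ((1 - z2) * phi(a * U[(1, "l")] + (1 - a) * U[(1, "l2")])
            + z2 * phi(a * U[(2, "l")] + (1 - a) * U[(2, "l2")]))

def V_identity_mix(z2, z2_prime, a, lottery):
    # (a*z + (1-a)*z', lottery): mix identity lotteries, holding the outcome lottery fixed.
    return V(a * z2 + (1 - a) * z2_prime, lottery)

assert abs(V(0.0, "l") - V(1.0, "l2")) < 1e-12   # (z, l) ~ (z', l2): both equal phi(9) = 3
a = 0.5
print(V_outcome_mix(0.0, a))                      # randomization in life chances
print(V_identity_mix(0.0, 1.0, a, "l2"))          # randomization in accidents of birth
```

With this concave φ the first value is strictly larger, the pattern that Axiom 4 below requires; with an affine φ the two values would coincide, as in Axiom 4∗.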
AXIOM 4—Preference for Life Chances: For any pair of identity lotteries z and z′ in Δ(I), and any pair of outcome lotteries ℓ and ℓ′ in Δ(X), if (z, ℓ) ∼ (z′, ℓ′), then (z, αℓ + (1 − α)ℓ′) ≽ (αz + (1 − α)z′, ℓ′) for all α in (0, 1).

If we add this axiom to the conditions of Theorem 1, then we obtain concave generalized utilitarianism.

PROPOSITION 2—Concavity: Suppose that absence of unanimity applies. A generalized utilitarian impartial observer with representation {Ui, φi}i∈I exhibits preference for life chances if and only if each of the φi functions is concave.

This result does rely on there being some richness in the underlying preferences so that preference for life chances has bite. In particular, Example 2 in Appendix B shows that if all agents agree in their ranking of all outcome lotteries, then the φi's need not be concave. This is ruled out in the proposition by absence of unanimity.

As discussed, Harsanyi treated identity and outcome lotteries as equivalent. Hence he implicitly imposed the following indifference.

AXIOM 4∗—Indifference Between Life Chances and Accidents of Birth: For any pair of identity lotteries z and z′ in Δ(I), and any pair of outcome lotteries ℓ and ℓ′ in Δ(X), if (z, ℓ) ∼ (z′, ℓ′), then (z, αℓ + (1 − α)ℓ′) ∼ (αz + (1 − α)z′, ℓ′) for all α in (0, 1).

This is a very strong assumption. If we impose this indifference as an explicit axiom, then as a corollary of Proposition 2, we obtain that each φi function must be affine. In this case, if we let Ûi := φi ◦ Ui, then Ûi is itself a von Neumann–Morgenstern expected-utility representation of ≽i. Thus, we immediately obtain Harsanyi's utilitarian representation. In fact, we obtain a stronger result. This indifference over the type of randomization allows us to dispense with the independence axiom over outcome lotteries for the individuals.

THEOREM 3—Utilitarianism: Suppose that absence of unanimity applies. The impartial observer's preferences admit a utilitarian representation {Ui}i∈I if and only if the impartial observer satisfies the acceptance principle and independence over identity lotteries, and is indifferent between life chances and accidents of birth. Moreover, the functions Ui are unique up to common positive affine transformations.

Standard proofs of Harsanyi's utilitarianism directly impose stronger notions of independence16; for example, the following axiom is usually imposed.
16
See Section 6 for details.
AXIOM 5—Independence Over Outcome Lotteries (for the Impartial Observer): Suppose (z, ℓ), (z′, ℓ′) ∈ Δ(I) × Δ(X) are such that (z, ℓ) ∼ (z′, ℓ′). Then, for all ℓ̃, ℓ̃′ ∈ Δ(X), (z, ℓ̃) ≽ (z′, ℓ̃′) if and only if (z, αℓ̃ + (1 − α)ℓ) ≽ (z′, αℓ̃′ + (1 − α)ℓ′) for all α in (0, 1].

This axiom is the symmetric analog of Axiom 3, identity independence for the impartial observer, with the roles of identity and outcome lotteries reversed. Clearly, if the impartial observer satisfies this independence, then it would be redundant for her to inherit independence over outcome lotteries from individual preferences. Moreover, given acceptance, this independence for the impartial observer imposes independence on the individuals. We do not directly impose independence over outcome lotteries on the impartial observer, but our axioms imply it.

COROLLARY 4: Suppose that absence of unanimity applies. Then the impartial observer satisfies independence over outcome lotteries if she satisfies acceptance and independence over identity lotteries, and is indifferent between life chances and accidents of birth.

In summary, what separates Harsanyi from those generalized utilitarian impartial observers who are ex ante egalitarians is their preferences between outcome and identity lotteries. If the impartial observer prefers outcome lotteries, she is an ex ante egalitarian. If she is indifferent (like Harsanyi), then she is a utilitarian. Moreover, indifference between outcome and identity lotteries forces the generalized utilitarian to accept stronger notions of independence.

5. DIFFERENT RISK ATTITUDES

Recall that an impartial observer's interpersonal welfare comparisons might rank (i, xi) ∼ (j, xj) and (i, xj) ∼ (j, xi), but if person i is more comfortable facing risk than person j, she might rank (i, ½[xi] + ½[xj]) ≻ (j, ½[xi] + ½[xj]). Harsanyi's utilitarianism rules this out.

An analogy might be useful. In the standard representative-agent model of consumption over time, each time period is assigned one utility function. This utility function must reflect both risk aversion in that period and substitutions between periods. Once utilities are scaled for intertemporal welfare comparisons, there is limited scope to accommodate different risk attitudes across periods. Harsanyi's utilitarian impartial observer assigns one utility function per person. This utility function must reflect both the risk aversion of that person and substitutions between people. Once utilities are scaled for interpersonal welfare comparisons, there is limited scope to accommodate different risk attitudes across people.
Given this analogy, it is not surprising that generalized utilitarianism can accommodate different risk attitudes. Each person is now assigned two functions, φi and ui, so we can separate interpersonal welfare comparisons from risk aversion. To be more precise, we first generalize the example in the Introduction.

DEFINITION 3—Similar Risks: Suppose the impartial observer assesses (i, ℓ) ∼ (j, ℓ′) and (i, ℓ̃) ∼ (j, ℓ̃′). Then, for all α in (0, 1), the two outcome lotteries αℓ̃ + (1 − α)ℓ and αℓ̃′ + (1 − α)ℓ′ are similar risks for individuals i and j, respectively.

These risks are similar for i and j in that they are across outcome lotteries that the impartial observer has assessed to have equal welfare for individuals i and j, respectively. If individual j is more risk averse than individual i, we might expect the impartial observer to prefer to face these similar risks as person i.

DEFINITION 4—Preference to Face Similar Risks as i Rather Than j: Fix a pair of individuals i and j in I. The impartial observer is said to prefer to face similar risks as individual i rather than as individual j if, for any four outcome lotteries ℓ, ℓ′, ℓ̃, and ℓ̃′ in Δ(X), if (i, ℓ) ∼ (j, ℓ′) and (i, ℓ̃) ∼ (j, ℓ̃′), then (i, αℓ̃ + (1 − α)ℓ) ≽ (j, αℓ̃′ + (1 − α)ℓ′) for all α in [0, 1].

Recall that agent j is more income risk averse than agent i if the function uj that maps income to agent j's von Neumann–Morgenstern utility is a concave transformation of that function ui for agent i; that is, ui ◦ uj⁻¹ is convex. For each i, the function φi⁻¹ maps the utilities of the impartial observer (used in her interpersonal welfare comparisons) to agent i's von Neumann–Morgenstern utility. Thus, if agent j is more (welfare) risk averse than agent i, then φj⁻¹ is a concave transformation of φi⁻¹; that is, φi⁻¹ ◦ φj is convex everywhere they are comparable. The next proposition makes this precise.

PROPOSITION 5—Different Risk Attitudes: Suppose that absence of unanimity applies. A generalized utilitarian impartial observer with representation {Ui, φi}i∈I always prefers to face similar risks as i rather than j if and only if the composite function φi⁻¹ ◦ φj is convex on the domain Uji := {u ∈ R : there exist ℓ, ℓ′ ∈ Δ(X) with (i, ℓ) ∼ (j, ℓ′) and Uj(ℓ′) = u}.

Next consider indifference as to which individual should face similar risks.

AXIOM 6—Indifference Between Individuals Facing Similar Risks: For any pair of individuals i and j in I and any four outcome lotteries ℓ, ℓ′, ℓ̃, and ℓ̃′ in Δ(X), if (i, ℓ) ∼ (j, ℓ′) and (i, ℓ̃) ∼ (j, ℓ̃′), then, for all α in [0, 1], the impartial observer is indifferent between facing the similar risks αℓ̃ + (1 − α)ℓ and αℓ̃′ + (1 − α)ℓ′ as individual i or j, respectively.
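As an illustration of Definition 4 and Proposition 5 (a sketch that is not part of the original text), the following Python snippet takes φi to be the identity and φj(u) = u² on positive utilities, so that φi⁻¹ ◦ φj is convex and person j is the more welfare-risk-averse of the two; the welfare-equivalent expected-utility pairs are assumed numbers.

```python
# Hypothetical transformation functions: phi_i^{-1} o phi_j (u) = u**2 is convex,
# so person j is more (welfare) risk averse than person i.
def phi_i(u): return u
def phi_j(u): return u ** 2

# Two pairs of outcome lotteries the observer assesses as welfare-equivalent,
# recorded by their expected utilities: phi_i(4) = phi_j(2) and phi_i(16) = phi_j(4).
U_i_l, U_j_lp = 4.0, 2.0
U_i_lt, U_j_ltp = 16.0, 4.0

for a in (0.1, 0.25, 0.5, 0.75, 0.9):
    # Each U is affine in outcome lotteries, so a mixture's expected utility
    # is the corresponding mixture of expected utilities.
    as_i = phi_i(a * U_i_lt + (1 - a) * U_i_l)    # facing the similar risk as person i
    as_j = phi_j(a * U_j_ltp + (1 - a) * U_j_lp)  # facing the similar risk as person j
    assert as_i >= as_j
    print(f"alpha={a}: as i -> {as_i:.2f}, as j -> {as_j:.2f}")
```

In every mixture the observer weakly prefers to face the similar risk as person i, as Proposition 5 predicts; Axiom 6 above describes the case in which all such comparisons hold with indifference.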
Harsanyi’s utilitarian impartial observer satisfies this indifference: it is an immediate consequence of independence over outcome lotteries for the impartial observer. But we can imagine an impartial observer who, without necessarily satisfying all of Harsanyi’s axioms, is nevertheless indifferent as to which individual should face similar risks. For example, consider an impartial observer in the analog of a representative-agent model. In the standard representativeagent model, all individuals have the same preferences over private consumption and the same attitude to risk. In our setting, we must allow individuals to have different preferences over public outcomes.17 But, as in the standard representative-agent model, we could assume that each individual had the same risk attitude across outcome lotteries that had been assessed to have equal welfare. This is precisely the indifference property in Axiom 6. Given Proposition 5, for any two individuals i and j, indifference between individuals facing similar risks forces the φi and φj functions to be identical up to positive affine transformations provided Uji has a nonempty interior. Hence we can make the following statement. PROPOSITION 6—Common φ Function: Suppose that absence of unanimity applies and consider a generalized utilitarian impartial observer. There exists a generalized utilitarian representation {Ui φi }i∈I with φi = φ for all i in I if and only if the impartial observer is indifferent between individuals facing similar risks. Moreover if for any pair of individuals i and j in I , there exists a sequence of individuals j1 jN with j1 = i and jN = j such that Ujn jn−1 has nonempty interior, then the functions Ui are unique up to a common positive affine transformation and the composite functions φ ◦ Ui are unique up to a common positive affine transformation. To compare results, a generalized utilitarian impartial observer who is not concerned about the issue of different individual risk attitudes (and hence satisfies indifference between individuals facing similar risks) need not be a utilitarian. She needs only to translate individuals’ von Neumann–Morgenstern utilities using a common φ function when making welfare comparisons across those individuals. Hence such an impartial observer can accommodate issues of fairness: in particular, the common φ function might be concave. In contrast, a generalized utilitarian impartial observer who is not concerned about issues of fairness (and hence satisfies indifference between life chances and accidents of birth) must be a utilitarian. Hence such an impartial observer cannot accommodate the issue of different individual risk attitudes. To see this directly, recall that independence over outcome lotteries for the impartial observer immediately implies indifference between individuals facing similar risks. In addition, by Corollary 4, for a generalized utilitarian impartial 17 For example, public outcome xi might allocate an indivisible good to person i, while xj might allocate it to person j.
observer, indifference between life chances and accidents of birth implies independence over outcome lotteries for the impartial observer. Consideration of different risk aversions and consideration of fairness are distinct issues, and they may lead an impartial observer in opposite directions. For example, suppose that all individuals are extremely risk averse over outcome lotteries, but that the impartial observer is almost risk neutral over identity lotteries. This impartial observer, anticipating the real discomfort that outcome lotteries would cause people, might prefer to absorb the risk into the imaginary identity lottery of her thought experiment. That is, she might prefer a society in which most uncertainty has been resolved—and hence people would “know their fates”—by the time they were born. Such an impartial observer would prefer accidents of birth to life chances: she would be an ex ante antiegalitarian. 6. CONTRASTING INDEPENDENCES AND DOMAINS Recall that Harsanyi worked with the full set of joint distributions (I × X ), not just the product lotteries (I )×(X ). He imposed independence directly on the impartial observer for all mixtures defined on that domain. In this section, we first consider the natural extensions of our axioms for the impartial observer in the larger domain (I × X ). Second, we consider restricting Harsanyi’s original independence axiom defined on (I × X ) to the set of product lotteries (I )×(X ). Third, we discuss whether imposing identity and outcome independence directly on the impartial observer is enough to induce utilitarianism. 6.1. The Full Set of Joint Distributions Suppose that the impartial observer has preferences over the full space of joint distributions over identities and outcomes, (I × X ). With slight abuse of notation, let continue to denote these larger preferences. For purposes of comparison, it is convenient to denote each element of (I × X ) in the form (z (i )i∈I ), where z ∈ (I ) is the marginal on the identities and each i ∈ (X ) is the outcome lottery conditional on identity i obtaining. Thus (i )i∈I is a vector of conditional outcome lotteries. Notice that, in this larger setting, the impartial observer imagines each individual having his own personal outcome lottery. In this setting, the analog of our independence over identity lotteries axiom, Axiom 3, for the impartial observer is as follows: AXIOM 3∗ —Constrained Independence Over Identity Lotteries (for the Impartial Observer): Suppose (z (i )i∈I ), (z (i )i∈I ) ∈ (I × X ) are such that ˜ (i )i∈I ) (z˜ (i )i∈I ) if (z (i )i∈I ) ∼ (z (i )i∈I ). Then, for all z˜ , z˜ ∈ (I ), (z and only if (αz˜ + (1 − α)z (i )i∈I ) (α˜z + (1 − α)z (i )i∈I ) for all α in (0 1].
This is the independence axiom suggested by Karni and Safra (2000). Constrained independence over identity lotteries is weaker than Harsanyi's independence axiom in that it only applies to mixtures of identity lotteries. That is, like our independence axiom for the impartial observer, constrained independence over identity lotteries is independence for the impartial observer over the lotteries that she faces directly—namely, identity lotteries—holding the vector of conditional outcome lotteries fixed. Notice, however, that each resolution of the identity lottery yields not just a different identity, but also a different outcome lottery. This extends the bite of the axiom to the larger space Δ(I × X). When restricted to the set of product lotteries, Δ(I) × Δ(X), constrained independence reduces to our independence axiom over identity lotteries.

The following axiom (also from Karni and Safra (2000)) is a slight strengthening of Harsanyi's acceptance axiom.

AXIOM 1∗—Acceptance Principle: For all i in I, all (ℓ1, ..., ℓi, ..., ℓI) in Δ(X)^I, and all ℓ′i in Δ(X), ℓi ≽i ℓ′i if and only if (i, (ℓ1, ..., ℓi, ..., ℓI)) ≽ (i, (ℓ1, ..., ℓ′i, ..., ℓI)).

The motivation for this axiom is the same as that for Harsanyi's axiom. The slight additional restriction is that if the impartial observer knows that she will assume individual i's identity, she does not care about the (possibly different) conditional outcome lottery that she would have faced had she assumed some other identity.

If we replace our independence and acceptance axioms with these axioms, then our generalized utilitarian representation theorem holds exactly as stated in Theorem 1 except that the representation becomes

(3)    V(z, (ℓi)i∈I) = Σ_i zi φi(Ui(ℓi)).

That is, each individual has a personal conditional outcome lottery ℓi in place of the common outcome lottery ℓ. The proof is essentially the same as that of Theorem 1.18 Moreover, Proposition 2, Theorem 3, Proposition 5, and their corollaries all continue to hold (with the same modification about personal outcome lotteries) by the same proofs.19 Thus, if we extend the analogs of our axioms to Harsanyi's setting Δ(I × X), we get essentially the same results.

18 See Appendix B. Alternatively, this generalized utilitarian representation could be obtained as a corollary of Theorem 1 in Karni and Safra (2000).
19 Corollary 4 also holds without this modification, and we can also obtain stronger versions of outcome independence.
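For concreteness, here is a small Python sketch of representation (3) (the φi and Ui below are assumed for illustration, not derived from the axioms): it evaluates a joint distribution through its vector of conditional outcome lotteries, with the product lottery appearing as the special case in which every identity faces the same outcome lottery.

```python
# Sketch of V(z, (l_i)) = sum_i z_i * phi_i(U_i(l_i)) for two individuals.
# Outcome lotteries are parameterized by a single probability l in [0, 1];
# the phi_i and (affine) U_i below are illustrative assumptions.
import math

phi = {1: lambda u: math.sqrt(u), 2: lambda u: u}
U = {1: lambda l: 10 * l, 2: lambda l: 10 * (1 - l)}

def V_joint(z2, conditional):
    """Value of (z, (l_1, l_2)); conditional[i] is person i's conditional outcome lottery."""
    return (1 - z2) * phi[1](U[1](conditional[1])) + z2 * phi[2](U[2](conditional[2]))

def V_product(z2, l):
    # Product lottery: both identities face the same outcome lottery l.
    return V_joint(z2, {1: l, 2: l})

print(V_joint(0.5, {1: 0.9, 2: 0.2}))  # personal conditional lotteries differ by identity
print(V_product(0.5, 0.6))             # the product-lottery special case
```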
6.2. Harsanyi’s Independence Axiom Restricted to Product Lotteries Conversely, now consider the restriction of Harsanyi’s independence axiom to our setting, (I ) × (X ). In this setting, the analog of Harsanyi’s axiom is to apply independence to all mixtures that are well defined in the set of product lotteries.20 To understand how Harsanyi’s independence relates to the axioms in this paper—and hence to see how Harsanyi implicitly imposes each of those axioms—it helps to unpack Harsanyi’s independence axiom into three axioms, each associated with the type of mixture to which it applies. First, Harsanyi’s independence axiom restricted to product lotteries implies our independence over identity lotteries for the impartial observer. This independence axiom is also satisfied by our generalized utilitarian impartial observer. Second, it implies independence over outcome lotteries, imposed directly on the impartial observer not just derived via acceptance from the preferences of the individuals. This independence axiom immediately implies indifference between individuals facing similar risks. Third, the restriction of Harsanyi’s axiom also forces the impartial observer to apply independence to hybrid mixtures. AXIOM 7—Independence Over Hybrid Lotteries (for the Impartial Observer): Suppose (z ), (z ) ∈ (I ) × (X ) are such that (z ) ∼ (z ). Then, for all z˜ ∈ (I ) and all ˜ ∈ (X ), (˜z ) (resp. ) (z ˜ ) if and only if (αz˜ + (1 − α)z ) (resp. ) (z α˜ + (1 − a) ) for all α in (0 1]. In this axiom, the lotteries being mixed on the left are identity lotteries (holding outcome lotteries fixed), while the lotteries being mixed on the right are outcome lotteries (holding identity lotteries fixed). This independence axiom immediately implies indifference between life chances and accidents of birth. It follows from Theorem 3 that, given absence of unanimity and acceptance, the first and third implication of Harsanyi’s independence axiom when restricted to our setting (I ) × (X )—that is, identity and hybrid independence—are enough to yield Harsanyi’s conclusion—utilitarianism.21 6.3. Independence Along Both Margins A natural question is whether we can replace hybrid independence with outcome independence in the statement above; that is, whether acceptance and both identity and outcome independence are enough to induce utilitarianism. We have argued in this paper that outcome independence is a strong assumption in the context of the impartial observer: it directly imposes independence 20 This is the approach of Safra and Weissengrin (2003), who adapted Fishburn’s (1982, Chap. 7) work on product spaces of mixture sets. 21 Given all three implications of Harsanyi’s independence axiom (i.e., including outcome independence), we can dispense with absence of unanimity; see Safra and Weissengrin (2003).
over lotteries that she does not face directly, and by so doing implies much more than simply imposing independence on the individuals and acceptance on the impartial observer. Nevertheless, one might prefer such an axiomatization to using hybrid independence. First, hybrid independence might seem the least natural of the three implications of Harsanyi’s independence axiom for product lotteries. Both outcome and identity independence only involve mixing one margin at a time. Second, an impartial observer might satisfy identity and outcome independence because she views the two types of randomization symmetrically—if independence applies to one margin, then perhaps it should apply to the other—without taking a direct position on whether the two types of randomization are equivalent. It turns out, however, that identity independence, outcome independence, and acceptance are not enough to induce utilitarianism. In fact, we can see this using the example in the Introduction. Once again, suppose that there are two individuals, i and j, and two states, xi and xj , denoting which agent is given a (possibly indivisible) good. As before, suppose that the impartial observer’s preferences satisfy (i xi ) ∼ (j xj ) and (i xj ) ∼ (j xi ). Suppose that both individuals satisfy independence. Specifically, for any outcome lottery , player i’s expected utility is given by Ui () = (xi ) − (xj ) and player j’s expected utility is given by Uj () = (xj ) − (xi ). Let the impartial observer’s preferences be given by the generalized utilitarian representation V (z ) := zi φ[Ui ()] + zj φ[Uj ()], where the (common) φ function is given by φ[u] =
    u^k        for u ≥ 0,
    −(−u)^k    for u < 0,

for some k > 0.
Since these preferences are generalized utilitarian (by Theorem 1), they satisfy acceptance and identity independence, and since the φ function is common (by Proposition 6), they satisfy indifference between individuals facing similar risks. It is less obvious that they satisfy outcome independence, but this is shown in Appendix B. These preferences even have the property (similar to utilitarianism) that if the impartial observer thinks she is equally likely to be either person, she is indifferent as to who gets the good. But these preferences do not satisfy utilitarianism unless k = 1. To see this, notice that these preferences fail indifference between life chances and accidents of birth. For example, we have (i xi ) ∼ (j xj ), but (i αxi + (1 − α)xj ) (α[i] + (1 − α)[j] xi ) except in the special case when α = 12 . Nevertheless, the conjecture that independence along both margins implies utilitarianism is close to correct. Grant et al. (2006, Theorem 7) showed that if there are three or more agents, under some richness conditions on the preferences, the combination of identity independence, outcome independence, and acceptance does imply utilitarianism.
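A short numerical check of this example may be helpful (an illustrative sketch; the parameterization of outcome lotteries by a single probability is only a convenience of the sketch). For k ≠ 1 the two product lotteries compared above receive different values whenever α ≠ 1/2, so indifference between life chances and accidents of birth indeed fails.

```python
def phi(u, k):
    # Common transformation: u**k for u >= 0 and -(-u)**k for u < 0.
    return u ** k if u >= 0 else -((-u) ** k)

def V(z_i, p_i, k):
    """Value of the product lottery (z, l): z_i is the probability of being person i
    and p_i = l(x_i), so U_i(l) = 2*p_i - 1 and U_j(l) = 1 - 2*p_i."""
    return z_i * phi(2 * p_i - 1, k) + (1 - z_i) * phi(1 - 2 * p_i, k)

alpha = 0.75
for k in (0.5, 1.0, 2.0):
    assert V(1.0, 1.0, k) == V(0.0, 0.0, k) == 1.0   # (i, x_i) ~ (j, x_j)
    life_chances = V(1.0, alpha, k)        # (i, alpha*x_i + (1 - alpha)*x_j)
    accident_of_birth = V(alpha, 1.0, k)   # (alpha*[i] + (1 - alpha)*[j], x_i)
    print(f"k={k}: {life_chances:.3f} vs {accident_of_birth:.3f}")
```

The two values agree only at k = 1 (or α = 1/2), matching the failure of indifference noted above.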
7. KNOWLEDGE, AGREEMENT, AND WELFARE Two questions figure prominently in the debates on the impartial observer theorem: (i) What is it that an individual imagines and knows when she imagines herself in the role of the impartial observer? (ii) must all potential impartial observers agree in their preferences over extended lotteries? In this section, we consider four (of many) possible views on these questions and show how they relate to the issues in this paper: concern about different risk attitudes (loosely, does the impartial observer use a common φ function?) and concern about fairness (loosely, is her common φ function affine?).22 In one view of the impartial observer, she simply imagines being in the physical circumstances of person i or j facing the outcome lottery or .23 In this view, often associated with Vickrey (1945), the impartial observer does not attempt to imagine having person i’s or j’s preferences. In the context of our example, the impartial observer simply imagines herself having some chance of getting the indivisible good and applies her own preferences about such outcome lotteries. Compared to other views, this approach does not require as much imagination or knowledge on behalf of the impartial observer. In particular, she need not know i’s or j’s preferences. If the impartial observer adopts this approach, loosely speaking, we get a common φ function for free: the utilities in its domain are all utilities of the same agent—the impartial observer. The φ function need not be affine, however, since the impartial observer might still, for example, prefer outcome to identity lotteries. In this approach, there is no reason to expect all impartial observers to agree. For example, different potential impartial observers will generally have different preferences over physical outcome lotteries. This approach does not attempt to follow the acceptance principle. Individuals’ preferences over outcome lotteries (other than those of the impartial observer) play no role. In a second view (the view taken in this paper), the impartial observer imagines not only being in the physical circumstances of person i or j, but also adopting what Pattanaik (1968, p. 1155) called “the subjective features of the respective individuals.” Arrow (1963, p. 114, 1997) called this “extended sympathy,” but it is perhaps better to use Harsanyi’s own term, “imaginative empathy” (Harsanyi (1977, p. 52); notation changed to ours, but emphasis is in the original24 ): 22 The following discussion builds especially on Weymark (1991) and Mongin (2001). For other views, see, for example, Mongin and d’Aspremont (1998). 23 Pattanaik (1968, p. 1155) and Harsanyi (1977, p. 52) referred to these as “objective positions.” 24 Rawls (1951, p. 179) also appealed to such imaginative empathy: “A competent judge must not consider his own de facto preferences as the necessarily valid measure of the actual worth of those interests which come before him, but be both able and anxious to determine, by imaginative appreciation, what those interests mean to persons who share them, and to consider them accordingly” (quoted in Pattanaik (1968, p. 1157–1158)). See also Sen’s (1979) behavioral and introspective bases for interpersonal comparisons of welfare.
This must obviously involve [her] imagining [her]self to be placed in individual i’s objective position, i.e., to be placed in the objective positions (e.g., income, wealth, consumption level, state of health, social position) that i would face in social situation x. But it must also involve assessing these objective conditions in terms of i’s own subjective attitudes and personal preferences . . . rather than assessing them in terms of [her] own subjective attitudes and personal preferences.
This approach requires more imagination and knowledge by the impartial observer; in particular, she is assumed to know the preferences of each individual over outcome lotteries and, by acceptance, to adopt these preferences when facing outcome lotteries as that individual. Knowledge and acceptance of individual preferences implies agreement across all potential impartial observers in ranking pairs of the form (i ) and (i ), but as Broome (1993) and Mongin (2001) pointed out (and as Harsanyi (1977, p. 57) himself concedes), it does not imply agreement in ranking pairs of the form (i ) and (j ), where i = j. For example, each impartial observer can have her own rankings across others’ subjective and objective positions. Moreover, unlike in the Vickrey view above, a generalized utilitarian impartial observer in this setting need not use a common φ function across all individuals. To see this, let us extend the example from the Introduction by allowing the good being allocated to be divisible. Suppose that an impartial observer’s own interpersonal assessments are such that she is indifferent between being person i with share s of the good and being person j with the same share s of the good. Suppose that for person i, the outome lottery 12 xi + 12 xj in which he has a half chance of getting the whole good is indifferent to getting half the good for sure, but for person j this same lottery is indifferent to getting one-third of the good for sure. Combining acceptance with her interpersonal assessments, the impartial observer must prefer facing this outcome lottery as person i, but (by Proposition 6) this contradicts using a common φ function (and in particular, not all the φi functions can be affine). A third, more welfarist view goes beyond the assumptions of this paper. Suppose that when an impartial observer imagines being person i facing outcome lottery , she knows the (ex ante) “welfare” that i attains from this lottery. That is, suppose that each person i has a commonly known welfare function wi : (X ) → R. If we assume what Weymark (1991) called congruence between welfare and preference—that is, i if and only if wi () ≥ wi ( )— then this implies, as before, that the impartial observer knows person i’s preferences. Now suppose further that these welfares are at least ordinally measurable and fully comparable, and that the impartial observer satisfies the rule (i ) (j ) if and only if wi () ≥ wj ( ). This extra assumption implies acceptance, but it is stronger. It implies that all potential impartial observers must agree in ranking pairs of the form (i ) and (j ). Nevertheless, a generalized utilitarian impartial observer in this setting still need not use a common φ function across all individuals. The example above still applies. The wi (·) functions can encode the impartial observer’s assessment about being indifferent between being i or j with the same share s of the
good, and they can encode i and j’s different certainty equivalents. Again, this forces φi and φj to differ (and at least one to be nonaffine). Moreover, these welfarist assumptions still do not imply full agreement across potential impartial observers. All impartial observers must agree in the ranking of extended lotteries in which they know for sure which identity they will assume, but they can still differ in their ranking of general extended lotteries of the form (z ) and (z ). For example, different impartial observers might have different preferences between outcome and identity lotteries, and/or each impartial observer can have her own risk attitude in facing identity lotteries, reflected in her own set of φi functions. That is, even with these extreme assumptions, different impartial observers with different risk attitudes will make different social choices. To get beyond this conclusion, a fourth view simply assumes that each potential impartial observer’s von Neumann–Morgenstern utility V (i ) from the extended lottery (i ) is equal to the commonly known (fully comparable) welfare wi (), which in turn is equal to individual i’s von Neumann–Morgenstern utility Ui ().25 In this case, all attitudes toward similar risks are the same; in particular, the preferences of the impartial observer and the individuals i and j in the example above can no longer apply. With this strong identification assumption, we finally get both an affine common φ function (i.e., utilitarianism) and agreement among all potential impartial observers, but this approach seems a few assumptions beyond Harsanyi’s claim to have derived utilitarianism from Bayesian rationality alone. APPENDIX A: PROOFS We first establish some lemmas that will be useful in the proofs that follow. The first lemma shows that, given absence of unanimity, we need at most two outcome lotteries, 1 and 2 , to cover the entire range of the impartial observer’s preferences in the following sense: for all product lotteries (z ), either (z ) ∼ (z 1 ) for some z , or (z ) ∼ (z 2 ) for some z , or both. Moreover the set of product lotteries for which both applies are not all indifferent. To state this more formally, let the outcome lotteries 1 2 (not necessarily distinct) and the identity lotteries z 1 z2 (not necessarily distinct) be such that (z 1 1 ) (z2 2 ) and such that (z 1 1 ) (z ) (z2 2 ) for all product lotteries (z ). That is, the product lottery (z 1 1 ) is weakly better than all other product lotteries and the product lottery (z2 2 ) is weakly worse than all other product lotteries. Additionally, let the identity lotteries z1 and z 2 (not necessarily distinct) be such that (z 1 1 ) (z 1 ) (z1 1 ) for all product lotteries (z 1 ) and (z 2 2 ) (z 2 ) (z2 2 ) for all product lotteries (z 2 ). That 25 This identification is at the heart of the debate between Harsanyi and Sen. See Weymark (1991).
is, given outcome lottery 1 , the identity lottery z1 is (weakly) worse than all other identity lotteries, and given outcome lottery 2 , the identity lottery z 2 is (weakly) better than all other identity lotteries. The existence of these special lotteries follows from continuity of , nonemptiness of and compactness of (I ) × (X ). Moreover, by independence over identity lotteries, we can take z 1 z1 z 2 and z2 each to be a degenerate identity lottery. Let these be i1 i1 i2 and i2 , respectively. LEMMA 7—Spanning: Assume absence of unanimity applies and that the impartial observer satisfies acceptance and independence over identity lotteries. Let i1 i1 i2 i2 1 and 2 be defined as above. Then (a) either (i1 1 ) ∼ (i2 2 ), or (i2 2 ) ∼ (i1 1 ), or (i2 2 ) (i1 1 ) and (b) for any product lottery (z ), either (i1 1 ) (z ) (i1 1 ) or (i2 2 ) (z ) (i2 2 ) or both. PROOF: (a) If 1 = 2 , then the first two cases both hold. Otherwise, suppose that the first two cases do not hold; that is, (i1 1 ) (i2 2 ) and (i1 1 ) (i2 2 ). By the definition of i1 , we know that (i2 1 ) (i1 1 ) and hence (i2 1 ) (i2 2 ). Using absence of unanimity and acceptance, there must exist another individual ıˆ = i2 such that (ˆı 2 ) (ˆı 1 ). Again by the definition of i1 , we know that (ˆı 1 ) (i1 1 ) and hence (ˆı 2 ) (i1 1 ). By the definition of i2 , we know that (i2 2 ) (ˆı 2 ) and hence (i2 2 ) (i1 1 ), as desired. Part (b) follows immediately from (a). Q.E.D. The next lemma does not yet impose independence over outcome lotteries on individuals and hence yields a more general representation than that in Theorem 1. The idea for this lemma comes from Karni and Safra (2000), but they worked with the full set of joint distributions (I × X ), whereas we are restricted to the set of product lotteries (I ) × (X ). LEMMA 8—Affine Representation: Suppose absence of unanimity applies. Then the impartial observer satisfies the acceptance principle and independence over identity lotteries if and only if there exist a continuous function V : (I ) × (X ) → R that represents and, for each individual i in I , a function Vi : (X ) → R that represents i such that for all (z ) in (I )× (X ), (4)
V (z ) =
I
zi Vi ()
i=1
Moreover the functions Vi are unique up to common positive affine transformations. PROOF: Since the representation is affine in identity lotteries, it is immediate that the represented preferences satisfy the axioms. We will show that the axioms imply the representation.
Let i1 i1 i2 i2 1 and 2 be defined as in Lemma 7. Given continuity, an immediate consequence of Lemma 7 is that, for any product lottery (z ), either (z ) ∼ (z 1 ) for some z or (z ) ∼ (z 2 ) for some z or both. Moreover, we can choose the z such that its support only contains individuals i1 and i1 , and similarly for z with respect to i2 and i2 . The proof of Lemma 8 now proceeds with two cases. Case 1. The easiest case to consider is where 1 = 2 . In this case, (i1 1 ) (i1 1 ) and (i1 1 ) (z ) (i1 1 ) for all (z ). Then, for each (z ), let V (z ) be defined by
V (z )[i1 ] + (1 − V (z ))[i1 ] 1 ∼ (z )
By continuity and independence over identity lotteries, such a V (z ) exists and is unique. To show that this representation is affine, notice that if (V (z )[i1 ] + (1 − V (z ))[i1 ] 1 ) ∼ (z ) and (V (z )[i1 ] + (1 − V (z ))[i1 ] 1 ) ∼ (z ), then independence over identity lotteries implies ([αV (z ) + (1 − α)V (z )][i1 ] + [1 − αV (z ) − (1 − α)V (z )][i1 ] 1 ) ∼ (αz + (1 − α)z ). Hence αV (z ) + (1 − α)V (z ) = V (αz + (1 − α)z ). Since any identity lottery z in (I ) can be written as z = i zi [i], proceeding sequentially on I , affinity implies V (z ) = i zi V (i ). Finally, by acceptance, V (i ·) agrees with i on (X ). Hence, if we define Vi : (X ) → R by Vi () = V (i ), then Vi represents individual i’s preferences. The uniqueness argument is standard; see, for example, Karni and Safra (2000, p. 321). Case 2. If (i1 1 ) ∼ (i2 2 ), then (i1 1 ) (z ) (i1 1 ) for all (z ) and hence Case 1 applies. Similarly, if (i2 2 ) ∼ (i1 1 ), then (i2 2 ) (z ) (i2 2 ) for all (z ), and again Case 1 applies (with 2 in place of 1 ). Hence suppose that (i1 1 ) (i2 2 ) and that (i1 1 ) (i2 2 ). Then, by Lemma 7, (i1 1 ) (i2 2 ) (i1 1 ) (i2 2 ); that is, we have two overlapping intervals that span the entire range of the impartial observer’s preferences. Then, just as in Case 1, we can construct an affine function V 1 (· ·) to represent the impartial observer’s preferences restricted to those (z ) such that (i1 1 ) (z ) (i1 1 ), and we can construct an affine function V 2 (· ·) to represent restricted to those (z ) such that (i2 2 ) (z ) (i2 2 ). We can then apply an affine renormalization of either V1 or V2 such that the (renormalized) representations agree on the overlap (i2 2 ) (z ) (i1 1 ). Since V1 (· ·) and V2 (· ·) are affine, the renormalized representation is affine, and induction on I (plus acceptance) gives us V (z ) = i zi Vi () as before. Again, uniqueness follows from standard arguments. Q.E.D. REMARK: The argument in Case 1 above is similar to that in Safra and Weisengrin (2003, p. 184) and Karni and Safra (2000, p. 320) except that in
the latter case, the analog of 1 is a vector of outcome lotteries with a different outcome lottery for each agent. Both these papers use stronger axioms to obtain a unique representation when Case 1 does not apply. Our argument for these cases applies Lemma 7, which in turn uses the richness condition, absence of unanimity, in place of any stronger axiom on the preferences of the impartial observer. PROOF OF THEOREM 1—Generalized Utilitarianism: It is immediate that the represented preferences satisfy the axioms. We will show that the axioms imply the representation. If we add to Lemma 8 (the affine representation lemma) the assumption that each individual satisfies independence over outcome lotteries, then it follows immediately that each Vi function in representation (4) must be a strictly increasing transformation φi of a von Neumann– Morgenstern expected-utility representation Ui . Thus, we obtain a generalized utilitarian representation. Q.E.D. PROOF OF PROPOSITION 2—Concavity: For each i in I , set Vi () := V (i ) = φi [Ui ()] for all . That is, these are the Vi ’s from the affine representation in Lemma 8. Since each Ui is affine in outcome lotteries, each V (i ·) is concave in outcome lotteries if and only if the corresponding φi is concave. To show that concavity is sufficient, suppose (z ) ∼ (z ). Using the representation in Lemma 8 and imposing concavity, we obtain V (z α+(1−α) ) = I I I i=1 zi Vi (α+(1−α) ) = i=1 zi V (i α+(1−α) ) ≥ i=1 zi [αV (i )+(1− α)V (i )] = αV (z ) + (1 − α)V (z ). Using the fact that (z ) ∼ (z ), the last expression is equal to αV (z ) + (1 − α)V (z ) = V (αz + (1 − α)z ). Hence the impartial observer exhibits a preference for life chances. For necessity, we need to show that for all i and all , ∈ (X ), V (i α + (1 − α) ) ≥ αV (i ) + (1 − α)V (i ) for all α in [0 1]. So let exhibit preference for life chances, fix i, and consider , ∈ (X ). Assume first that ∼i . By acceptance, V (i ) = V (i ). Hence, by preference for life chances, V (i α + (1 − α) ) ≥ V (α[i] + (1 − α)[i] ) (by preference for life chances) = V (i ) = αV (i ) + (1 − α)V (i )
(since V (i ) = V (i ))
as desired. Assume henceforth that i (and, by acceptance, V (i ) > V (i )). By absence of unanimity, there must exist a j such that V (j ) < V (j ). There are three cases to consider. (a) If V (i ) ≥ V (j ), then by the representation in Lemma 8, there exists z (of the form β[i] + (1 − β)[j]) such that V (z ) = V (i ). Thus, for all α
in (0 1), V (i α + (1 − α) ) ≥ V (α[i] + (1 − α)z ) (by preference for life chances) = αV (i ) + (1 − α)V (z ) = αV (i ) + (1 − α)V (i ) (since V (z ) = V (i )) as desired. Assume henceforth that V (j ) > V (i ) (which implies V (j ) > V (i )). (b) If V (j ) ≥ V (i ), then by the representation in Lemma 8, there exists z (of the form β[i] + (1 − β)[j]) such that V (z ) = V (i ). Thus, for all α in (0 1), V (i α + (1 − α)) ≥ V (α[i] + (1 − α)z ) (by preference for life chances) = αV (i ) + (1 − α)V (z ) = αV (i ) + (1 − α)V (i ) (since V (z ) = V (i )) as desired. (c) Finally, let V (i ) > V (j ) > V (j ) > V (i ). By the continuity of V , there exist β0 and β0 in (0 1) such that β0 > β0 , and such that V (i β0 + (1 − β0 ) ) = V (j ) and V (i β0 + (1 − β0 ) ) = V (j ). Denote 0 = β0 + (1 − β0 ) . Then similarly to part (a), Vi (γ + (1 − γ)0 ) ≥ γVi () + (1 − γ)Vi (0 ) for all γ ∈ (0 1). Next, denote 0 = β0 + (1 − β0 ) . Then similarly to part (b), Vi (γ + (1 − γ)0 ) ≥ γVi ( ) + (1 − γ)Vi (0 ) for all γ ∈ (0 1). Therefore, restricted to the line segment [ ], the graph of Vi lies weakly above the line connecting ( Vi ( )) and (0 Vi (0 )) (as does the point (0 Vi (0 ))) and weakly above the line connecting (0 Vi (0 )) and ( Vi ()) (as does the point (0 Vi (0 ))). Hence, Vi (α + (1 − α) ) ≥ αVi () + Q.E.D. (1 − α)Vi ( ) for all α ∈ (0 1). PROOF OF THEOREM 3—Utilitarianism: It is immediate that the represented preferences satisfy the axioms. We will show that the axioms imply the representation. Given acceptance, the proof of Proposition 2 (concavity) shows that the impartial observer satisfies preference for life chances if and only if, each Vi in the representation in Lemma 8 is concave in outcome lotteries. Notice, in particular, that this argument never uses the fact that each individual
satisfies independence over outcome lotteries. By a similar argument, the impartial observer is indifferent between life chances and accidents of birth if and only if each Vi is affine in outcome lotteries. To complete the representation, for each i, set Ui (·) ≡ Vi (·) to obtain the required von Neumann–Morgenstern Q.E.D. expected-utility representation of individual i’s preferences i . PROOF OF COROLLARY 4—Outcome Independence: This result can be obtained as a corollary of Theorem 3 (utilitarianism). Alternatively, the proof of Proposition 2 (concavity) shows that the impartial observer is indifferent between life chances and accidents of birth if and only if, for all i in I , V (i ·) is affine in outcome lotteries. Using the representation in Lemma 8, we obI I tain V (z α + (1 − α) ) = i=1 zi V (i α + (1 − α) ) = i=1 zi [αV (i ) + (1 − α)V (i )] = αV (z ) + (1 − α)V (z ). That is, the impartial observer is affine in outcome lotteries. Hence it follows that the impartial observer satisfies independence over outcome lotteries. Q.E.D. PROOF OF PROPOSITION 5—Different Risk Attitudes: First, notice that if Uji is not empty, then it is a closed interval. If Uji has an empty interior, then the proposition holds trivially true. Therefore, assume that Uji = [uji u¯ ji ], where uji < u¯ ji . ˜ and ˜ such that To prove that φ−1 ◦ φj convex is sufficient, fix , , , i ˜ = V (j ˜ ). We want to show that V (i α˜ + (1 − V (i ) = V (j ) and V (i ) ˜ α)) ≥ V (j α + (1 − α) ). By construction, both Uj ( ) and Uj (˜ ) lie in Uji . −1 ˜ ˜ Moreover, we have Ui () = φ−1 i ◦ φj [Uj ( )] and Ui () = φi ◦ φj [Uj ( )]. Applying the representation, we obtain V (i α˜ + (1 − α)) = φi Ui (α˜ + (1 − α))
(by the representation)
˜ + (1 − α)Ui ()] (by affinity of Ui ) = φi [αUi () −1 = φi αφi ◦ φj [Uj (˜ )] (by the representation) + (1 − α)φ−1 i ◦ φj [Uj ( )] −1 ≥ φi φi ◦ φj [αUj (˜ ) + (1 − α)Uj ( )] (by convexity of φ−1 i ◦ φj ) = φj Uj (α˜ + (1 − α) ) (by affinity of Uj ) = V (j α˜ + (1 − α) ) (by the representation) To prove that φ−1 i ◦ φj convex is necessary, fix v w in Uji . By the definition of Uji , there exist outcome lotteries ∈ (X ) such that Uj ( ) = v and Ui () =
˜ ˜ ˜ φ−1 i ◦ φj (v), and there exist outcome lotteries ∈ (X ) such that Uj ( ) = −1 ˜ = φi ◦ φj (w). By construction, we have V (i ) = V (j ) and w and Ui () ˜ V (i ) = V (j ˜ ). Therefore, for all α in (0 1), φi Ui (α˜ + (1 − α)) ≥ φj Uj (α˜ + (1 − α) ) ⇒
˜ + (1 − α)Ui () ≥ φ−1 ˜ αUi () i ◦ φj [αUj ( ) + (1 − α)Uj ( )]
⇒
−1 αφ−1 i ◦ φj (w) + (1 − α)φi ◦ φj (v)
≥ φ−1 i ◦ φj (αw + (1 − α)v) Since v and w were arbitrary, the last inequality corresponds to the convexity of φ−1 Q.E.D. i ◦ φj on Uji . PROOF OF PROPOSITION 6—Common φ Function: Necessity follows immediately from Proposition 5. For the sufficiency argument, first fix a representation {Ui φi }i∈I of the preferences of the generalized utilitarian impartial observer. Recall that by Theorem 1, the composite functions φi ◦ Ui are unique up to a common positive affine transformation. The argument proceeds by a series of steps to construct a new representation {Uˆ i φˆ i }i∈I with φˆ i ≡ φ for all i in I . The construction leaves the composite functions unchanged; that is, φi ◦ Ui ≡ φ ◦ Uˆ i for all i. To start, let the outcome lottery 1 and the individual i1 be such that (i1 1 ) (j ) for all individuals j ∈ I and outcome lotteries in (X ). Step 1. Suppose there exists a second individual j such that the interval Uji1 has a nonempty interior. By Proposition 5, if the impartial observer is indif◦ φj is affine on Uji1 . ferent between facing similar risks as i1 or j, then φ−1 i1 −1 Since Uji1 has a nonempty interior, φi1 ◦ φj has a unique affine extension on R. Define a new von Neumann–Morgenstern utility function Uˆ j for agent j by the affine transformation Uˆ j () := φ−1 ◦ φj [Uj ()] for all in (X ). Define a i1 ˆ new transformation function φj for agent j by setting φˆ j (Uˆ j ()) := φj (Uj ()). Thus, in particular, if (i1 ) ∼ (j ) (and hence φj [Uj ( )] = φi1 [Ui1 ()]), then by construction we have Uˆ j ( ) = Ui1 (). Moreover, by construction we have φˆ j (u) = φi1 (u) for all u in the intersection of the ranges Ui ((X )) ∩ Uˆ j ((X )). Hence, with slight abuse of notation we can write φ := φˆ j = φi1 , even if this extends the domain of φi1 . Thus, we can construct a new generalized utilitarian representation of the same preferences with Uj replaced by Uˆ j and φj replaced by φ in which the two individuals i1 and j share a common φ. Uniqueness of the Ui up to common positive affine transformations holds because, by construction, (i1 ) ∼ (j ) implies Ui1 () = Uˆ j ( ). Step 2. By repeating Step 1 for any individual j in I such that there exists a sequence of individuals j1 jN with j1 = i1 and jN = j such that Ujn jn−1 has
nonempty interior, we can construct a new generalized utilitarian representation in which the two individuals i1 and j share a common φ. Let I 1 be the set of individuals who can be connected to i1 in this manner. If I 1 = I , then we are done. Step 3. Suppose then that I \ I 1 is nonempty. By construction, (j ) (j ) for all in (X ) and all j ∈ I1 and j ∈ I \ I1 . Let i ∈ I \ I 1 and ˆ ∈ (X ) ˆ (j ) for all individuals j ∈ I \ I 1 and outcome lotteries be such that (i ) in (X ). If (j ) ∼ (i ) for some in (X ) and j ∈ I1 , let Uˆ i be a positive affine transformation of Ui such that Uˆ i () = Uj ( ) and let φˆ i be such that φˆ i ◦ Uˆ i ≡ φi ◦ Ui . Then simply extend φ on the range of Uˆ i by setting φ := φˆ i . Conversely, if (j ) (i ) for all in (X ) and j ∈ I1 , let Uˆ i be a positive affine transformation of Ui such that Uˆ i () < Uj ( ) for all in (X ) and j ∈ I1 , and let φˆ i be such that φˆ i ◦ Uˆ i ≡ φi ◦ Ui . Again, extend φ on the range of Uˆ i by setting φ := φˆ i . Step 4. Repeat Steps 1 and 2 using i in place of i1 and φ in place of φi1 . Let 2 I be the set of individuals who can be connected to i when Step 2 is repeated. Notice that by construction, I 1 ∩ I 2 is empty. If I 1 ∪ I 2 = I , then we are done. If I 1 ∪ I 2 = I , then repeat Step 3. Let i be the individual in I \ (I 1 ∪ I 2 ) that corresponds to i in this step. Then repeat Steps 1 and 2 using i place of i. From the finiteness of I , this process can be repeated only a finite number of Q.E.D. times before we exhaust I . APPENDIX B This appendix contains two counterexamples mentioned in the text and also the key step to show that the proof of Theorem 1 extends to obtain the form of generalized utilitarian representation given in expression (3) for preferences defined on (I × X ) and the corresponding axioms as given in Section 6. Examples For each of the following examples, let I ={1 2} and X ={x1 x2 }. To simplify notation, for each z ∈ (I ), let q = z2 , and for each ∈ (X ), let p := (x2 ). Then, with slight abuse of notation, we write (q p) (q p ) for (z ) (z ) and write V (q p) for V (z ). Example 1 simply translates the example discussed in Section 6 to show that the impartial observer might satisfy acceptance, and both identity and outcome independence but not be utilitarian. EXAMPLE 1: Let agent 1’s preferences be given by U1 (p) = (1 − 2p) and let agent 2’s preferences be given by U2 (p) = (2p − 1). Let the impartial observer’s preferences be given by V (q p) := (1 − q)φ[U1 (p)] + qφ[U2 (p)],
where the (common) φ function is given by

φ[u] = u^k for u ≥ 0, and φ[u] = −(−u)^k for u < 0,

for some k > 0.
Acceptance and identity independence were discussed in the text. To show that this example satisfies outcome independence, consider the inverse function φ^{-1}(u) = u^{1/k} for u ≥ 0 and φ^{-1}(u) = −(−u)^{1/k} for u < 0. This is a strictly increasing function. Therefore, the function φ^{-1}[V(·, ·)] represents the same preferences as V(·, ·). It is enough to show that we can write φ^{-1}[V(q, p)] = (1 − p)φ^{-1}[(1 − 2q)] + pφ^{-1}[(2q − 1)]. This alternative representation is symmetric to the original representation V(·, ·) with the p's and q's reversed, and φ^{-1} replacing φ. Since the alternative representation is affine in p, preferences must satisfy independence over outcome lotteries. To confirm that φ^{-1}[V(·, ·)] takes this form, it is instructive to rewrite V(q, p) as
V(q, p) = (1 − 2q)(1 − 2p)^k for p < 1/2, and V(q, p) = (2q − 1)(2p − 1)^k for p > 1/2. Splitting by the sign of V, this equals

(1 − 2q)(1 − 2p)^k for q < 1/2, p < 1/2 (and V(q, p) > 0),
−(2q − 1)(1 − 2p)^k for q > 1/2, p < 1/2 (and V(q, p) < 0),
0 for (2q − 1)(2p − 1) = 0,
−(1 − 2q)(2p − 1)^k for q < 1/2, p > 1/2 (and V(q, p) < 0),
(2q − 1)(2p − 1)^k for q > 1/2, p > 1/2 (and V(q, p) > 0).

Hence, φ^{-1} ∘ V(q, p) equals

(1 − 2q)^{1/k}(1 − 2p) for q < 1/2, p < 1/2,
−(2q − 1)^{1/k}(1 − 2p) for q > 1/2, p < 1/2,
0 for (2q − 1)(2p − 1) = 0,
−(1 − 2q)^{1/k}(2p − 1) for q < 1/2, p > 1/2,
(2q − 1)^{1/k}(2p − 1) for q > 1/2, p > 1/2,

which can be written as

(1 − p)(1 − 2q)^{1/k} + p[−[−(2q − 1)]^{1/k}] for q < 1/2,
0 for q = 1/2,
(1 − p)[−[−(1 − 2q)]^{1/k}] + p(2q − 1)^{1/k} for q > 1/2,

which equals (1 − p)φ^{-1}[(1 − 2q)] + pφ^{-1}[(2q − 1)], as desired.

Example 2 shows that the impartial observer's preferences can satisfy all the conditions of Proposition 2 (the concavity result) except absence of unanimity and yet the functions φ_i need not be concave. That is, absence of unanimity is essential.

EXAMPLE 2: Let the individuals' preferences be given by U₁(p) = U₂(p) = p and let the impartial observer's preferences be given by V(q, p) := (1 − q)φ₁[U₁(p)] + qφ₂[U₂(p)], where

φ₁(u) := 1/4 + u/2 for u ≤ 1/2, and φ₁(u) := u for u > 1/2;
φ₂(u) := u for u ≤ 1/2, and φ₂(u) := 2u − 1/2 for u > 1/2.

Since U₁ = U₂, both individuals have the same ranking over outcome lotteries and so the impartial observer's preferences violate absence of unanimity. Clearly, the functions φ₁(·) and φ₂(·) are not concave. To see that the impartial observer satisfies preference for life chances, without loss of generality let p ≤ p′ and notice that (q, p′) ∼ (q′, p) implies either p ≤ p′ ≤ 1/2 or p′ ≥ p ≥ 1/2. But in either case, the functions φ₁ and φ₂ are concave (in fact, affine) on the domain [p, p′] and hence V(αq + (1 − α)q′, p) ≤ (in fact, =) V(q, αp + (1 − α)p′), as desired.
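Example 1 is easy to check numerically. The following short script is our own addition (Python, with k = 3 an arbitrary choice, not part of the original appendix); it verifies the key step above, namely that φ^{-1}[V(q, ·)] is affine in p, which is what outcome independence requires.

```python
# Numerical check of Example 1 (a sketch; k = 3 is an arbitrary choice).
import numpy as np

k = 3.0

def phi(u):
    return u**k if u >= 0 else -((-u)**k)

def phi_inv(u):
    return u**(1.0/k) if u >= 0 else -((-u)**(1.0/k))

def V(q, p):
    # V(q,p) = (1-q)phi[U1(p)] + q phi[U2(p)], with U1(p) = 1-2p and U2(p) = 2p-1.
    return (1 - q) * phi(1 - 2*p) + q * phi(2*p - 1)

# phi^{-1}(V(q, .)) should be affine in p:
# phi_inv(V(q,p)) = (1-p) phi_inv(1-2q) + p phi_inv(2q-1).
for q in (0.1, 0.3, 0.7, 0.9):
    for p in np.linspace(0, 1, 11):
        lhs = phi_inv(V(q, p))
        rhs = (1 - p) * phi_inv(1 - 2*q) + p * phi_inv(2*q - 1)
        assert abs(lhs - rhs) < 1e-9
print("phi^{-1}(V(q, .)) is affine in p, as claimed in Example 1")
```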
The Generalized Utilitarian Representation for Δ(I × X)

We next show that we can use essentially the same proof as for Theorem 1 to obtain the form of generalized utilitarian representation given in expression (3) for an impartial observer's preferences defined on Δ(I × X) that satisfy the axioms given in Section 6. The key step is to show that the analog of Lemma 7 (spanning) part (b) still applies: there exist two outcome lotteries ℓ¹ and ℓ², and four individuals i₁, i₁′, i₂, and i₂′ such that for any joint distribution (z, (ℓ_i)_{i∈I}), either (i₁, ℓ¹) ≽ (z, (ℓ_i)_{i∈I}) ≽ (i₁′, ℓ¹) or (i₂, ℓ²) ≽ (z, (ℓ_i)_{i∈I}) ≽ (i₂′, ℓ²) or both. That is, we can still use two sets of product lotteries, one associated with ℓ¹ and one with ℓ², to span the entire range of the impartial observer's preferences even though these are now defined over the full set of joint distributions Δ(I × X).
To see this, let (ẑ, (ℓ̂_i)_{i∈I}) be an element of Δ(I × X) with the property that (ẑ, (ℓ̂_i)_{i∈I}) ≽ (z, (ℓ_i)_{i∈I}) for all (z, (ℓ_i)_{i∈I}) ∈ Δ(I × X). By constrained independence, there must exist an individual i₁ in the support of ẑ such that (i₁, (ℓ̂_i)_{i∈I}) ∼ (ẑ, (ℓ̂_i)_{i∈I}). Let ℓ¹ := ℓ̂_{i₁} and let (i₁, ℓ¹) denote the (product) lottery (i₁, (ℓ_i)_{i∈I}), where ℓ_i = ℓ¹ for all i ∈ I. By the acceptance∗ principle (Axiom 1∗), (i₁, ℓ¹) ∼ (i₁, (ℓ̂_i)_{i∈I}). Therefore, there exists an outcome lottery ℓ¹ and an individual i₁ such that the product lottery (i₁, ℓ¹) has the property that (i₁, ℓ¹) ≽ (z, (ℓ_i)_{i∈I}) for all (z, (ℓ_i)_{i∈I}) ∈ Δ(I × X). Similarly, there exists an outcome lottery ℓ² and an individual i₂′ such that the product lottery (i₂′, ℓ²) has the property that (z, (ℓ_i)_{i∈I}) ≽ (i₂′, ℓ²) for all (z, (ℓ_i)_{i∈I}) ∈ Δ(I × X). Define i₁′ and i₂ exactly as in Lemma 7. The proof of part (a) of Lemma 7 (spanning) then follows with no change in the proof, and the analog of part (b) of the lemma (as stated above) follows immediately from part (a).
Thereafter, the proof of the representation result is almost unchanged. The analog of Lemma 8 obtains an affine representation of the form V(z, ℓ) = Σ_{i=1}^{I} z_i V_i(ℓ_i). The proof is the same as that for Lemma 8 except that constrained independence is used wherever independence over identity lotteries was used before. This extends the representation from product lotteries Δ(I) × Δ(X) to the full space of joint distributions Δ(I × X). The fact that V_i(ℓ_i) takes the form φ_i(U_i(ℓ_i)) follows (as before) from acceptance and outcome independence for individuals.
Dept. of Economics, Rice University, 6100 Main Street, Houston, TX 77005, U.S.A. and School of Economics, University of Queensland, Level 6, Colin Clark Building (39), St. Lucia, Brisbane QLD 4072, Australia; [email protected], Institute of Economic Research, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan;
[email protected], Dept. of Economics, Yale University, P.O. Box 208268, New Haven, CT 065208268, U.S.A.;
[email protected], and Business School, University of Exeter, Exeter EX4 4ST, U.K., The College of Management, and Tel Aviv University, Israel;
[email protected]. Manuscript received September, 2006; final revision received April, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 1973–2004
PREFERENCES FOR ONE-SHOT RESOLUTION OF UNCERTAINTY AND ALLAIS-TYPE BEHAVIOR BY DAVID DILLENBERGER1 Experimental evidence suggests that individuals are more risk averse when they perceive risk that is gradually resolved over time. We address these findings by studying a decision maker who has recursive, nonexpected utility preferences over compound lotteries. The decision maker has preferences for one-shot resolution of uncertainty if he always prefers any compound lottery to be resolved in a single stage. We establish an equivalence between dynamic preferences for one-shot resolution of uncertainty and static preferences that are identified with commonly observed behavior in Allais-type experiments. The implications of this equivalence on preferences over information systems are examined. We define the gradual resolution premium and demonstrate its magnifying effect when combined with the usual risk premium. KEYWORDS: Recursive preferences over compound lotteries, resolution of uncertainty, Allais paradox, negative certainty independence.
1. INTRODUCTION EXPERIMENTAL EVIDENCE SUGGESTS that individuals are more risk averse when they perceive risk that is gradually resolved over time. In an experiment with college students, Gneezy and Potters (1997) found that subjects invest less in risky assets if they evaluate financial outcomes more frequently. Haigh and List (2005) replicated the study of Gneezy and Potters with professional traders and found an even stronger effect. These two studies allow for flexibility in adjusting investment according to how often the subjects evaluate the returns. Bellemare, Krause, Kröger, and Zhang (2005) found that even when all subjects have the same investment flexibility, variations in the frequency of information feedback alone affect investment behavior systematically. All their subjects had to commit in advance to a fixed equal amount of investment for three subsequent periods. Group A was told that they would get periodic statements (i.e., would be informed about the outcome of the gamble after every draw), whereas group B knew that they would hear only the final yields of their investment. The average investment in group A was significantly lower than in group B. The authors concluded that “information feedback should be the variable of interest for researchers and actors in financial markets alike.” 1 I am grateful to Faruk Gul and Wolfgang Pesendorfer for their invaluable advice during the development of the paper. I thank Roland Benabou, Eric Maskin, Stephen Morris, Klaus Nehring, and Uzi Segal for their helpful discussions and comments. The co-editor and four anonymous referees provided valuable comments that improved the paper significantly. I have also benefited from suggestions made by Shiri Artstein-Avidan, Amir Bennatan, Bo’az Klartag, Ehud Lehrer, George Mailath, Charles Roddie, and Kareen Rozen. Special thanks to Anne-Marie Alexander for all her help. This paper is based on the first chapter of my doctoral dissertation at Princeton University.
Such interdependence between the way individuals observe the resolution of uncertainty and the amount of risk they are willing to take is not compatible with the standard model of decision making under risk, which is a theory of choice among probability distributions over final outcomes.2 In this paper, we assume that the value of a lottery depends directly on the way the uncertainty is resolved over time. Using this assumption, we provide a choice theoretic framework that can address the experimental evidence above while pinpointing the required deviations from the standard model. We exploit the structure of the model to identify the links between the temporal aspect of risk aversion, a static attitude toward risk, and intrinsic preferences for information. To facilitate exposition, we mainly consider a decision maker (DM) whose preferences are defined over the set of two-stage lotteries, namely lotteries over lotteries over outcomes. Following Segal (1990), we replace the reduction of compound lotteries axiom with the following two assumptions: time neutrality and recursivity. Time neutrality says that the DM does not care about the time in which uncertainty is resolved as long as resolution happens in a single stage. Recursivity says that if the DM prefers a single-stage lottery p to a single-stage lottery q, then he also prefers to substitute q with p in any two-stage lottery containing q as an outcome. Under these assumptions, any two-stage lottery is subjectively transformed into a simpler, one-stage lottery. In particular, there exists a single preference relation over the set of one-stage lotteries that fully determines the DM’s preferences over the domain of twostage lotteries. To link behavior in both domains, we introduce the following two properties: the first is dynamic while the second is static. • Preferences for one-shot resolution of uncertainty (PORU). The DM has PORU if he always prefers any two-stage lottery to be resolved in a single stage. PORU implies an intrinsic aversion to receiving partial information. This notion formalizes an idea first raised by Palacious-Huerta (1999). • Negative certainty independence (NCI). NCI states that if the DM prefers lottery p to the (degenerate) lottery that yields the prize x for certain, then this ranking is not reversed when we mix both options with any common, third lottery q. This axiom is similar to Kahneman and Tversky’s (1979) “certainty effect” hypothesis, though it does not imply that people weigh probabilities nonlinearly. The restrictions NCI imposes on preferences are just enough to explain commonly observed behavior in the common-ratio version of the Allais paradox with positive outcomes. In particular, NCI allows the von Neumann– Morgenstern (vNM) independence axiom to fail when the certainty effect is present. 2 All lotteries discussed in this paper are objective, that is, the probabilities are known. Knight (1921) proposed distinguishing between risk and uncertainty according to whether the probabilities are given to us objectively or not. Despite this distinction, we will use both notions interchangeably.
Proposition 1 establishes that NCI and PORU are equivalent. On the one hand, numerous replications of the Allais paradox prove NCI to be one of the most prominently observed preference patterns. On the other hand, empirical and experimental studies involving dynamic choices and experimental studies on preference for uncertainty resolution are still rather rare. The disproportional amount of evidence in favor of each property strengthens the importance of Proposition 1, since it provides new theoretical predictions for dynamic behavior, based on robust (static) empirical evidence. In an extended model, we allow the DM to take intermediate actions that might affect his ultimate payoff. The primitive in such a model is a preference relation over information systems, which is induced from preferences over compound lotteries. Safra and Sulganik (1995) left open the question of whether there are nonexpected utility preferences for which, when applied recursively, perfect information is always the most valuable information system. Proposition 2 shows that this property, which we term preferences for perfect information, is equivalent to PORU. As a corollary, NCI is both a necessary and sufficient condition to have preferences for perfect information. We extend our results to preferences over arbitrary n-stage lotteries and show that PORU can be quantified. The gradual resolution premium of any compound lottery is the amount that the DM would pay to replace that lottery with its single-stage counterpart. We demonstrate that, for a broad class of preferences, the gradual resolution premium can be quantitatively important; for any one-stage lottery, there exists a multistage lottery (with the same probability distribution over terminal prizes) whose value is arbitrarily close to that of getting the worst prize for sure. 1.1. Related Literature Confining his attention to binary single-stage lotteries and to preferences from the rank-dependent utility class (Quiggin (1982)), Segal (1987, 1990) discussed sufficient conditions under which the desirability of a two-stage lottery decreases as the two stages become less degenerate. Proposition 3 shows that these conditions cannot be extended to the general case, that is, the combination of rank-dependent utility and PORU implies expected utility. PalaciousHuerta (1999) was the first to raise the idea that the form of the timing of resolution of uncertainty might be an important economic variable. By working out an example, he demonstrated that a DM with Gul’s (1991) disappointment aversion preferences will be averse to the sequential resolution of uncertainty or, in the language of this paper, will be displaying PORU. He also discussed numerous applications. The general theory we suggest provides a way to understand which attribute of Gul’s preferences accounts for the resulting behavior. It also spells out the extent to which the analysis can be extended beyond Gul’s preferences. Schmidt (1998) developed a static model of expected utility with certainty preferences. His notion of certainty preferences is very close to Axiom NCI. In
his model, the value of any nondegenerate lottery is the expectation of a utility index over prizes, u, whereas the value of the degenerate lottery that yields the prize x for sure is v(x). The certainty effect is captured by requiring v(x) > u(x) for all x. Schmidt’s model violates both continuity and monotonicity with respect to first-order stochastic dominance, while in this paper, we confine our attention to preferences that satisfy both properties.3 Loss aversion with narrow framing (also known as myopic loss aversion) is a combination of two motives: loss aversion (Kahneman and Tversky (1979)), that is, people’s tendency to be more sensitive to losses than to gains, and narrow framing, that is, a dynamic aggregation rule that argues that when making a series of choices, individuals “bracket” them by making each choice in isolation.4 Benartzi and Thaler (1995) were the first to use this approach to suggest explanations for several economic “anomalies,” such as the equity premium puzzle (Mehra and Prescott (1985)). Barberis, Huang, and Thaler (2006) generalized Benartzi and Thaler’s work by assuming that the DM derives utility directly from the outcome of a gamble over and above its contribution to total wealth. Our model can be used to address similar phenomena. The combination of recursivity and a specific form of atemporal preferences implies that individuals behave as if they intertemporally perform narrow framing. The gradual resolution premium quantifies this effect. The two approaches are conceptually different: loss aversion with narrow framing brings to the forefront the idea that individuals evaluate any new gamble separately from its cumulative contribution to total wealth, while we maintain the assumption that terminal wealth matters, and we identify narrow framing as a temporal effect. In addition, we set aside the question of why individuals are sensitive to the way uncertainty is resolved (i.e., why they narrow frame) and construct a model that reveals the (context-independent) behavioral implications of such considerations. Köszegi and Rabin (2009) studied a model in which utility additively depends both on current consumption and on recent changes in (rational) beliefs about present and future consumption, where the latter component displays loss aversion. In their setting, they identified narrow framing with preference over such fluctuations in beliefs. They also showed that people prefer to get information clumped together (similar to PORU) rather than apart. Aside from the same conceptual differences between the two approaches, their set of results concerning information preferences is confined to the case where consumption happens only in the last period and is binary. This corresponds in 3 Continuity and monotonicity ensure that the certainty equivalent of each lottery is well defined. This fact is used when applying the recursive structure of Segal’s model. 4 Narrow framing is an example of people’s tendency to evaluate risky decisions separately. This tendency is illustrated in Tversky and Kahneman (1981), and further studied in Kahneman and Lovallo (1993) and Read, Loewenstein, and Rabin (1999), among others. Barberis and Huang (2007) presented an extensive survey of this approach.
our setup to lotteries over only two monetary prizes. Our results are valid for lotteries with arbitrary (finite) support.
In this paper, we study time's effect on preferences by distinguishing between one-shot and gradual resolution of uncertainty. A different, but complementary, approach is to study intrinsic preferences for early or late resolution of uncertainty. This research agenda was initiated by Kreps and Porteus (1978), and later extended by Epstein and Zin (1989) and Chew and Epstein (1989), among others. Grant, Kajii, and Polak (1998, 2000) connected preferences for the timing of resolution of uncertainty to intrinsic preferences for information. We believe that both aspects of intrinsic time preferences play a role in most real life situations. For example, an anxious student might prefer to know as soon as possible his final grade in an exam, but still prefers to wait rather than to get the grade of each question separately. The motivation to impose time neutrality is to demonstrate the role of the one-shot versus gradual effect, which has been neglected in the literature to date.
The remainder of the paper is organized as follows: we start Section 2 by establishing our basic framework, after which we introduce the main behavioral properties of the paper and state our main characterization result. Section 3 comments on the implications of our model on preferences over information systems. In Section 4, we elaborate on the static implications of our model and provide examples. Section 5 first extends our results to preferences over compound lotteries with an arbitrarily finite number of stages. We then define the gradual resolution premium and illustrate its magnifying effect. Most proofs are relegated to the Appendix.

2. THE MODEL

2.1. Groundwork

Consider an interval [w, b] = X ⊂ R of monetary prizes. Let L¹ be the set of all simple lotteries (probability measures with finite support) over X. That is, each p ∈ L¹ is a function p : X → [0, 1] satisfying Σ_{x∈X} p(x) = 1, and we restrict our analysis to the case where in any given lottery, the number of prizes with nonzero probability is finite. Let S(p) = {x | p(x) > 0}. For each p, q ∈ L¹ and α ∈ (0, 1), the mixture αp + (1 − α)q ∈ L¹ is the simple lottery that yields each prize x with probability αp(x) + (1 − α)q(x). We denote by δ_x ∈ L¹ the degenerate lottery that gives the prize x with certainty, that is, δ_x(x) = 1. Note that for any lottery p ∈ L¹, we have p = Σ_{x∈X} p(x)δ_x.
Correspondingly, let L² be the set of all simple lotteries over L¹. That is, each Q ∈ L² is a function Q : L¹ → [0, 1] satisfying Σ_{p∈L¹} Q(p) = 1. For each P, Q ∈ L² and λ ∈ (0, 1), the mixture R = λP + (1 − λ)Q ∈ L² is the two-stage lottery for which R(p) = λP(p) + (1 − λ)Q(p). We denote by D_p ∈ L² the degenerate, in the first stage, compound lottery that gives lottery p in the second stage with certainty, that is, D_p(p) = 1. Note that for any lottery Q ∈ L², we have Q = Σ_{q∈L¹} Q(q)D_q.
We think of each Q ∈ L² as a dynamic two-stage process where, in the first stage, a lottery q is realized with probability Q(q), and, in the second stage, a prize is obtained according to q. Two special subsets of L² are Γ = {D_p | p ∈ L¹}, the set of degenerate lotteries in L², and Λ = {Q ∈ L² | Q(p) > 0 ⇒ p = δ_x for some x ∈ X}, the set of lotteries in L², the outcomes of which are degenerate in L¹. Note that both Γ and Λ are isomorphic to L¹. Let ≽ be a continuous (in the topology of weak convergence) preference relation over L². Let ≽_Γ and ≽_Λ be the restrictions of ≽ to Γ and Λ, respectively. On ≽ we impose the following axioms:
AXIOM A0—More Is Better: For all x, y ∈ X, x ≥ y ⇔ D_{δ_x} ≽ D_{δ_y}.

AXIOM A1—Time Neutrality: For all p ∈ L¹, D_p ∼ Σ_{x∈X} p(x)D_{δ_x}.
AXIOM A2—Recursivity: For all q, p ∈ L¹, all Q ∈ L², and λ ∈ (0, 1),

D_p ≽ D_q  ⇐⇒  λD_p + (1 − λ)Q ≽ λD_q + (1 − λ)Q.
Axiom A0 is a weak monotonicity assumption. By postulating Axiom A1, we assume that the DM does not care about the time in which the uncertainty is resolved as long as it happens in a single stage. Axiom A2 assumes that preferences are recursive. It states that preferences over two-stage lotteries respect the preference relation over degenerate two-stage lotteries (that is, over single-stage lotteries) in the sense that two compound lotteries that differ only in the outcome of a single branch are compared exactly as these different outcomes would be compared separately.

LEMMA 1: If ≽ satisfies Axioms A0, A1, and A2, then both ≽_Γ and ≽_Λ are monotone (with respect to the relation of first-order stochastic dominance).5

PROPOSITION—Segal (1990): The relation ≽ satisfies Axioms A0, A1, and A2 if and only if there exists a continuous function V : L¹ → R, such that the certainty equivalent function c : L¹ → X is given by V(δ_{c(p)}) = V(p) for all p ∈ L¹, and for all P, Q ∈ L²,

P ≽ Q ⇔ V(Σ_{p∈L¹} P(p)δ_{c(p)}) ≥ V(Σ_{p∈L¹} Q(p)δ_{c(p)}).

5 Axioms A0, A1, and A2 imply that both ≽_Γ and ≽_Λ satisfy the axiom of degenerate independence (ADI; Grant, Kajii, and Polak (1992)). Simple induction arguments show that ADI is equivalent to monotonicity with respect to the relation of first-order stochastic dominance.
Note that under Axioms A0, A1, and A2, the preference relation ≽_Γ = ≽_Λ fully determines ≽. The decision maker evaluates two-stage lotteries by first calculating the certainty equivalent of every second-stage lottery using the preferences represented by V and then calculating (using V again) the first-stage value by treating the certainty equivalents of the former stage as the relevant prizes. Since only the function V appears in the formula above, we slightly abuse notation by writing V(Q) for the value of the two-stage lottery Q. Last, since under the above assumptions V(p) = V(D_p) = V(Σ_{x∈X} p(x)D_{δ_x}) for all p ∈ L¹, we simply write V(p) for this common value.

2.2. Main Properties

We now introduce and motivate our two main behavioral assumptions. The first is dynamic, whereas the second is static. Our static properties are imposed on preference relations over sets that are isomorphic to L¹ (such as ≽_Γ and ≽_Λ). We denote by ≽¹ such a generic preference relation and assume throughout that it is continuous and monotone.

2.2.1. Preference for One-Shot Resolution of Uncertainty

We model a DM whose concept of uncertainty is multistage and who cares about the way uncertainty is resolved over time. In this section, we define consistent preferences to have all uncertainty resolved in one shot rather than gradually or vice versa. Define ρ : L² → L¹ to be the reduction operator that maps a compound lottery to its reduced single-stage counterpart, that is, ρ(Q) = Σ_{q∈L¹} Q(q)q. Note that by Axiom A1, D_{ρ(Q)} ∼ Σ_{x∈X} [Σ_{q∈L¹} Q(q)q(x)]D_{δ_x}.

DEFINITION 1: The preference relation ≽ displays preference for one-shot resolution of uncertainty (PORU) if ∀Q ∈ L², D_{ρ(Q)} ≽ Q. If ∀Q ∈ L², Q ≽ D_{ρ(Q)}, then ≽ displays preference for gradual resolution of uncertainty (PGRU).

PORU implies an aversion to receiving partial information. If uncertainty is not fully resolved in the first stage, the DM prefers to remain fully unaware until the final resolution is available. PGRU implies the opposite. As we will argue in later sections, these notions render "the frequency at which the outcomes of a random process are evaluated" a relevant economic variable.6

6 Halevy (2007) provided some evidence in favor of PORU. In his paper, subjects were asked to state their reservation prices for four different compound lotteries. The behavior of approximately 60% of his subjects was consistent with Axioms A0–A2. Furthermore, approximately 40% of his subjects were classified as having preferences that are consistent with the recursive, nonexpected utility model. His results (which are discussed in Section 4.2.1 of his paper) show that within the latter group, the reservation prices of the two degenerate two-stage lotteries (V1 and V4, members of Λ and Γ, respectively) were approximately the same and larger than the reservation price of the gradually resolved lottery (V3).
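The folding-back evaluation and the reduction operator ρ are easy to express in code. The sketch below is our own illustration (the function and variable names are not from the paper); it represents a one-stage lottery as a dictionary from prizes to probabilities and a two-stage lottery as a list of (probability, lottery) pairs, and it uses risk-neutral expected utility only as a placeholder functional, for which PORU holds with equality. Any continuous, monotone V can be plugged in instead.

```python
# A minimal sketch of the recursive evaluation and the reduction operator rho.
# Lottery representation and functional are illustrative, not from the paper.

def reduce_compound(Q):
    """rho(Q): collapse a two-stage lottery [(prob, {prize: prob})] into one stage."""
    r = {}
    for P, lottery in Q:
        for x, px in lottery.items():
            r[x] = r.get(x, 0.0) + P * px
    return r

def certainty_equivalent(lottery, V):
    # Invert V on degenerate lotteries by bisection (V assumed continuous and monotone).
    lo, hi = min(lottery), max(lottery)
    target = V(lottery)
    for _ in range(80):
        mid = (lo + hi) / 2.0
        if V({mid: 1.0}) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def value_two_stage(Q, V):
    """Fold back: replace each second-stage lottery by its certainty equivalent."""
    first_stage = {}
    for P, lottery in Q:
        ce = certainty_equivalent(lottery, V)
        first_stage[ce] = first_stage.get(ce, 0.0) + P
    return V(first_stage)

# Expected utility as a placeholder functional: PORU holds with equality.
V_eu = lambda lottery: sum(px * x for x, px in lottery.items())

Q = [(0.5, {0.0: 0.5, 100.0: 0.5}), (0.5, {100.0: 1.0})]
print(value_two_stage(Q, V_eu), V_eu(reduce_compound(Q)))  # equal under EU
```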
2.2.2. The Ratio Allais Paradox and Axiom NCI

In a generic Allais-type questionnaire (also known as common-ratio effect with a certain prize), subjects choose between A and B, where A = δ_3000 and B = 0.8δ_4000 + 0.2δ_0. They also choose between C and D, where C = 0.25δ_3000 + 0.75δ_0 and D = 0.2δ_4000 + 0.8δ_0. The majority of subjects tend to systematically violate expected utility by choosing the pair A and D.7 Since Allais's (1953) original work, numerous versions of his questionnaire have appeared, many of which contain one lottery that does not involve any risk.8
Kahneman and Tversky (1979) used the term "certainty effect" to explain the commonly observed behavior. Their idea is that individuals tend to put more weight on certain events in comparison with very likely, yet uncertain, events. This reasoning is behaviorally translated into a nonlinear probability-weighting function, π : [0, 1] → [0, 1], that individuals are assumed to use when evaluating risky prospects. In particular, this function has a steep slope near—or even a discontinuity point at—0 and 1. As we remark below, this implication has its own limitations. We suggest a property that is motivated by similar insights and captures the certainty effect without implying that people weigh probabilities nonlinearly. Consider the following axiom on ≽¹:

AXIOM NCI—Negative Certainty Independence: For all p, q, δ_x ∈ L¹ and λ ∈ [0, 1], p ≽¹ δ_x implies λp + (1 − λ)q ≽¹ λδ_x + (1 − λ)q.

The axiom states that if the sure outcome x is not enough to compensate the DM for the risky prospect p, then mixing it with any other lottery, thus eliminating its certainty appeal, will not result in the mixture of x being more attractive than the corresponding mixture of p. If we define c(p|λ, q), the conditional certainty equivalent of a lottery p, as the solution to λp + (1 − λ)q ∼¹ λδ_{c(p|λ,q)} + (1 − λ)q, then the axiom implies that c(p|λ, q) ≥ c(p) for all p, q ∈ L¹ and λ ∈ (0, 1). The implication of this axiom on responses to the Allais questionnaire above is as follows: if you choose the nondegenerate lottery B, then you must also choose D. This prediction is empirically rarely violated in versions of the Allais questionnaire that involved positive outcomes.9,10

7 This example is taken from Kahneman and Tversky (1979). Of 95 subjects, 80% choose A over B, 65% choose D over C, and more than half choose the pair A and D.
8 Camerer (1995) gave an extensive survey of the experimental evidence against expected utility, including the "common consequence effect" and "common ratio effect" that are related to the Allais paradox.
9 Conlisk (1989), for example, replicated the two basic Allais questions. About half of his subjects (119 out of 236) violate expected utility. The fraction of violations that are of the B and C type is 16/119 ≈ 0.13.
10 It is worth mentioning that there is also some empirical and experimental evidence that conflicts with NCI. For example, NCI will be inconsistent with the "reflection effect," that is, a common ratio effect with negative numbers (Kahneman and Tversky (1979), Machina (1987)). I thank a referee for pointing this out.
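The common-ratio structure of the four prospects can be made explicit with a few lines of code. The snippet below is our own illustration (the utility index u is an arbitrary example); it shows that C and D are the 0.25-mixtures of A and B with δ_0, so an expected utility maximizer must rank the two pairs the same way, whereas NCI only rules out the pair B and C.

```python
# Illustration of the common-ratio structure behind the Allais questions
# (a sketch; the utility index u is an arbitrary example, not from the paper).
def mix(lmbda, p, q):
    """lmbda*p + (1-lmbda)*q for lotteries given as {prize: prob} dicts."""
    out = {}
    for x, px in p.items():
        out[x] = out.get(x, 0.0) + lmbda * px
    for x, qx in q.items():
        out[x] = out.get(x, 0.0) + (1 - lmbda) * qx
    return out

A = {3000: 1.0}
B = {4000: 0.8, 0: 0.2}
delta0 = {0: 1.0}
C = mix(0.25, A, delta0)   # {3000: 0.25, 0: 0.75}
D = mix(0.25, B, delta0)   # {4000: 0.2, 0: 0.8}

def eu(p, u):
    return sum(prob * u(x) for x, prob in p.items())

u = lambda x: x ** 0.5     # an example concave index
# Under expected utility the two rankings must agree, so the modal
# pattern "A over B together with D over C" is ruled out:
print(eu(A, u) > eu(B, u), eu(C, u) > eu(D, u))  # both True or both False
# NCI only rules out the opposite pattern: choosing B over A (= delta_3000)
# forces D to be chosen over C.
```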
As we mentioned before, NCI does not imply any probabilistic distortion. This observation becomes relevant in experiments similar to the one reported by Conlisk (1989, p. 398), who studied the robustness of Allais-type behavior to boundary effects. Conlisk considered a slight perturbation of prospects similar to A, B, C, and D above, so that (i) each of the new prospects, A′, B′, C′, and D′, yields all three prizes with strictly positive probability, and (ii) in the resulting "displaced Allais question" (namely choosing between A′ and B′, and then choosing between C′ and D′), the only pattern of choice that is consistent with expected utility is either the pair A′ and C′ or the pair B′ and D′. Although violations of expected utility become significantly less frequent and are no longer systematic (a result that supports the claim that violations can be explained by the certainty effect), a nonlinear probability function predicts that this increase in consistency would be the result of fewer subjects choosing A′ over B′, and not because more subjects choose C′ over D′. In fact, the latter occurred, which is consistent with NCI.

PROPOSITION 1: Under Axioms A0, A1, and A2, ≽¹ satisfies NCI if and only if ≽ displays PORU.

PROOF: Only If. Suppose ≽¹ satisfies NCI. We need to show that an arbitrary two-stage lottery Q is never preferred to its single-stage counterpart D_{ρ(Q)}. Without loss of generality, assume that there are l outcomes in the support of Q. Using Axioms A0–A2, we have

Q = Σ_{i=1}^{l} Q(q^i)D_{q^i} ∼ Σ_{i=1}^{l} Q(q^i)D_{δ_{c(q^i)}}  (Axioms A0, A2)
  ∼ D_{Σ_{i=1}^{l} Q(q^i)δ_{c(q^i)}}  (Axiom A1),

and by repeatedly applying NCI,

Σ_{i=1}^{l} Q(q^i)δ_{c(q^i)} = Q(q¹)δ_{c(q¹)} + (1 − Q(q¹)) Σ_{i≠1} [Q(q^i)/(1 − Q(q¹))] δ_{c(q^i)}
  ≼¹ Q(q¹)q¹ + (1 − Q(q¹)) Σ_{i≠1} [Q(q^i)/(1 − Q(q¹))] δ_{c(q^i)}  (NCI)
  = Q(q²)δ_{c(q²)} + (1 − Q(q²)) [ (Q(q¹)/(1 − Q(q²))) q¹ + Σ_{i≠1,2} (Q(q^i)/(1 − Q(q²))) δ_{c(q^i)} ]
  ≼¹ Q(q¹)q¹ + Q(q²)q² + Σ_{i≠1,2} Q(q^i)δ_{c(q^i)}  (NCI)
  = · · ·
  = Q(q^l)δ_{c(q^l)} + (1 − Q(q^l)) Σ_{i≠l} [Q(q^i)/(1 − Q(q^l))] q^i
  ≼¹ Σ_{i=1}^{l} Q(q^i)q^i = ρ(Q).

Therefore,

Q = Σ_{i=1}^{l} Q(q^i)D_{q^i} ∼ D_{Σ_{i=1}^{l} Q(q^i)δ_{c(q^i)}} ≼ D_{ρ(Q)}.

If. Suppose ≽¹ does not satisfy NCI. Then there exist p, q = Σ_x q(x)δ_x, δ_y ∈ L¹ and λ ∈ (0, 1) such that p ≽¹ δ_y and λδ_y + (1 − λ)q ≻¹ λp + (1 − λ)q. By monotonicity, λδ_{c(p)} + (1 − λ)q ≻¹ λp + (1 − λ)q. Let Q := λD_p + (1 − λ) Σ_x q(x)D_{δ_x} and note that

Q ∼ λD_{δ_{c(p)}} + (1 − λ) Σ_x q(x)D_{δ_x} ≻ Σ_x [λp(x) + (1 − λ)q(x)]D_{δ_x} ∼ D_{ρ(Q)},

which violates PORU. Q.E.D.
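A numerical spot check of Proposition 1 can be run with Gul's (1991) disappointment aversion functional, which is discussed in Section 4.2.1 below. The sketch is our own and the lotteries are arbitrary examples; with β > 0 the one-shot value weakly exceeds the folded-back value (PORU), with β = 0 (expected utility) they coincide, and with β < 0 the ranking reverses (PGRU).

```python
# Spot check of Proposition 1 with Gul's disappointment aversion, phi(x) = x.
# The lotteries and parameter values are arbitrary examples.
def gul_value(lottery, beta):
    """Solve sum_x p(x)u(x,v) = v, with u(x,v) = (x + beta*v)/(1+beta) if x > v else x."""
    lo, hi = min(lottery), max(lottery)
    for _ in range(100):
        v = (lo + hi) / 2.0
        g = sum(px * ((x + beta * v) / (1 + beta) if x > v else x)
                for x, px in lottery.items()) - v
        if g > 0:
            lo = v
        else:
            hi = v
    return (lo + hi) / 2.0

def reduce_compound(Q):
    r = {}
    for P, lot in Q:
        for x, px in lot.items():
            r[x] = r.get(x, 0.0) + P * px
    return r

def folded_value(Q, beta):
    first = {}
    for P, lot in Q:
        ce = gul_value(lot, beta)   # certainty equivalent, since phi(x) = x
        first[ce] = first.get(ce, 0.0) + P
    return gul_value(first, beta)

Q = [(0.5, {0.0: 0.5, 100.0: 0.5}), (0.5, {50.0: 0.5, 150.0: 0.5})]
for beta in (1.0, 0.0, -0.5):
    one_shot = gul_value(reduce_compound(Q), beta)
    gradual = folded_value(Q, beta)
    print(beta, round(one_shot, 3), round(gradual, 3), one_shot >= gradual)
# beta > 0: one-shot >= gradual (PORU); beta = 0: equal; beta < 0: reversed (PGRU).
```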
The idea behind Proposition 1 is simple: the second step of the foldingback procedure involves mixing all certainty equivalents of the corresponding second-stage lotteries. Applying NCI repeatedly implies that each certainty equivalent loses relatively more (or gains relatively less) from the mixture than the original lottery that it replaces would. Proposition 1 ties together two notions that are defined on different domains. The equivalence of PORU and NCI suggests that being averse to the gradual resolution of uncertainty and being prone to Allais-type behavior are synonymous. This assertion justifies the proposed division of the space of twostage lotteries into the one-shot and gradually resolved lotteries. On the one hand, numerous replications of the Allais paradox in the last 50 years prove that the availability of a certain prize in the choice set affects behavior in a systematic way. On the other hand, empirical and experimental studies involving dynamic choices and experimental studies on preferences for uncertainty resolution are still rather rare. Proposition 1 thus provides new theoretical predictions for dynamic behavior, based on robust (static) empirical evidence. 3. PORU AND THE VALUE OF INFORMATION Suppose now that before the second-stage lottery is played, but after the realization of the first-stage lottery, the decision maker can take some action
that might affect his ultimate payoff. The primitive in such a model is a preference relation over information systems (as we formally define below), which is induced from preferences over compound lotteries. Assume throughout this section that preferences over compound lotteries satisfy Axioms A0–A2.
An immediate consequence of Blackwell's (1953) seminal result is that in the standard expected utility class, the DM always prefers to have perfect information before making the decision, which allows him to choose the optimal action corresponding to the resulting state. Schlee (1990) showed that if ≽¹ is of the rank-dependent utility class (Quiggin (1982)), then the value of perfect information will always be nonnegative. This value is computed relative to the value of having no information at all and, therefore, Schlee's result has no implications for the comparison between getting complete and partial information. Safra and Sulganik (1995) left open the question of whether there are static preference relations, other than expected utility, for which, when applied recursively, perfect information is always the most valuable. We show below that this property is equivalent to PORU. As a corollary, such preferences for perfect information are fully characterized by NCI.
Formally, fix an interval of monetary prizes X ⊂ R. Let S = {s₁, ..., s_N} be a finite set of possible states of nature. Each state s ∈ S occurs with probability p_s. Let J = {j₁, ..., j_M} be a finite set of signals and let A = {a₁, ..., a_H} be a finite set of actions. Let u : A × S → X be a function that gives the deterministic outcome u(a, s) (an element of X) if action a ∈ A is taken and the realized state is s ∈ S. The collection Ω = {S, J, A, (p_s)_{s∈S}, u} is called an information environment. Let π : S × J → [0, 1] be a function such that π(s, j) is the conditional probability of getting the signal j ∈ J when the prevailing state is s ∈ S. We naturally require that for all s ∈ S, Σ_{j∈J} π(s, j) = 1 (so that when the prevailing state is s, there is some probability distribution on the signals the DM might get). The function π is called an information system. For any s ∈ S, denote the updated probability of s after the signal j ∈ J is obtained by p(s|j) = π(s, j)p_s / Σ_{s′∈S} π(s′, j)p_{s′}. A full information system, I, is a function such that for all s ∈ S there exists j(s) ∈ J with p(s|j(s)) = 1. The null information system, φ, is a function such that p(s|j) = p_s for all s ∈ S and j ∈ J. Let p^j(a) ∈ L¹ be the second-stage lottery if signal j is obtained and action a ∈ A is taken, that is, p^j(a) = Σ_{s∈S} p(s|j)δ_{u(a,s)}. For a_j ∈ arg max_{a∈A} V(p^j(a)), let p^{j∗} := p^j(a_j). Let V(π) := V(Σ_{j∈J} (Σ_{s∈S} π(s, j)p_s)D_{p^{j∗}}) be the value of the optimal compound lottery, that is, the compound lottery assigning probability α_j(π) = Σ_{s∈S} π(s, j)p_s to p^{j∗}. Note that V(φ) = max_{a∈A} V(Σ_{s∈S} p_s δ_{u(a,s)}) and that V(I) = V(Σ_{s∈S} p_s δ_{u(a(s),s)}), where a(s) is an optimal action if you know that the prevailing state is s, that is, a(s) ∈ arg max_{a∈A} u(a, s).

DEFINITION 2: The relation ≽ displays preferences for perfect information if for every information environment Ω and any information system π, V(I) ≥ V(π).
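The value of an information system is straightforward to compute by Bayes updating and folding back. The following sketch is our own illustration with a made-up environment Ω; any monotone certainty-equivalent functional V with V(δ_x) = x can be plugged in (risk-neutral expected utility is used here only as a placeholder).

```python
# A sketch of V(pi): Bayes updating, an optimal action per signal, then folding back.
# The environment below is a made-up illustration, not an example from the paper.
def V(lottery):
    # Placeholder functional: risk-neutral expected value, so V({x: 1.0}) = x.
    return sum(p * x for x, p in lottery.items())

states = {"s1": 0.5, "s2": 0.5}                  # prior p_s
actions = ["a1", "a2"]
u = {("a1", "s1"): 10, ("a1", "s2"): 0,          # deterministic payoffs u(a, s)
     ("a2", "s1"): 4,  ("a2", "s2"): 6}

def value_of_information(pi, signals):
    """pi[(s, j)] = prob. of signal j in state s; returns the folded-back value V(pi)."""
    first_stage = {}
    for j in signals:
        prob_j = sum(states[s] * pi[(s, j)] for s in states)   # alpha_j(pi)
        if prob_j == 0:
            continue
        posterior = {s: states[s] * pi[(s, j)] / prob_j for s in states}
        best = None
        for a in actions:                        # best action given the signal
            lot = {}
            for s, q in posterior.items():
                lot[u[(a, s)]] = lot.get(u[(a, s)], 0.0) + q
            best = V(lot) if best is None else max(best, V(lot))
        first_stage[best] = first_stage.get(best, 0.0) + prob_j
    return V(first_stage)

full = {("s1", "j1"): 1.0, ("s1", "j2"): 0.0, ("s2", "j1"): 0.0, ("s2", "j2"): 1.0}
null = {("s1", "j1"): 0.5, ("s1", "j2"): 0.5, ("s2", "j1"): 0.5, ("s2", "j2"): 0.5}
print(value_of_information(full, ["j1", "j2"]))  # 8.0: learn the state, act optimally
print(value_of_information(null, ["j1", "j2"]))  # 5.0: best uninformed action
```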
PROPOSITION 2: If satisfies Axioms A0–A2, then the two statements below are equivalent: (i) The relation displays PORU. (ii) The relation displays preferences for perfect information. Analogously, PGRU holds if and only if for every information environment Ω and any information system π, V (π) ≥ V (φ). Since any temporal lottery corresponds to an information environment in which for all a ∈ A, u(a s) = v(s) ∈ X, showing that (i) is necessary for (ii) is immediate. For the other direction, we note that two forces reinforce each other. First, getting full information means that the underlying lottery is of the one-shot resolution type, since uncertainty is completely resolved by observing the signal. Second, better information enables better planning; using it, a decision maker with monotonic preferences is sure to take the optimal action in any state. The proof distinguishes between the two motives for getting full information: the former, which is captured by PORU, is intrinsic, whereas the latter, which is reflected via the monotonicity of preferences with respect to outcomes, is instrumental. The result for PGRU is similarly proven. The null information system is of the one-shot resolution type and it has no instrumental value. By combining Propositions 1 and 2 we get the following corollary: COROLLARY 1: If satisfies Axioms A0–A2, then displays preferences for perfect information if and only if 1 satisfies NCI. 4. STATIC IMPLICATIONS 4.1. NCI in the Probability Triangle Fix three prizes x3 > x2 > x1 . All lotteries over these prizes can be represented as points in a two-dimensional space, Δ{δx1 δx2 δx3 } := {p = (p1 p3 ) | p1 p3 ≥ 0 p1 + p3 ≤ 1} as in Figure 1. The origin (0 0) represents the lottery δx2 . The probability of the high prize, p(x3 ) = p3 , is measured on the vertical axis, and the probability of the low prize, p(x1 ) = p1 , is measured on the horizontal axis. The probability of obtaining the middle prize is p(x2 ) = p2 = 1 − p1 − p3 . Given these conventions, monotonicity implies that preferences increase in the northwest direction. The properties below are geometric restrictions that NCI (hence PORU) imposes on the map of indifference curves in any probability triangle Δ, which corresponds to some triple x3 > x2 > x1 . LEMMA 2—Quasiconcavity: If 1 satisfies NCI, then V is quasiconcave, that is, V (αp + (1 − α)q) ≥ min{V (p) V (q)}. COROLLARY 2: If 1 satisfies NCI, then all indifference curves in Δ are convex.
FIGURE 1.—The probability triangle (showing linear indifference curves). The bold indifference curve through the origin demonstrates the steepest middle slope property (Lemma 3).
Let μ(p) be the slope, relative to the (p1 p3 ) coordinates, of the indifference curve at lottery p. Slope μ(p) is the marginal rate of substitution between a probability shift from x2 to x3 and a probability shift from x2 to x1 . As explained by Machina (1982), changes in the slope express local changes in attitude toward risk: the greater the slope, the more (local) risk averse the DM is. Denote by μ+ (p) the right derivative of the indifference curve at p and denote by int(Δ) the interior of Δ. Let I(p) := {q ∈ Δ | q ∼ p}. DEFINITION 3: The function V satisfies the steepest middle slope property if the following statements hold: (i) The indifference curve through the origin is linear, that is, q ∈ I((00)) implies μ(q) = μ+ ((0 0)) := μ(I((00)) ). (ii) The indifference curve through the origin is the steepest, that is, μ(I((00)) ) ≥ μ+ (q) for all q ∈ int(Δ)11 LEMMA 3—Steepest Middle Slope: If 1 satisfies NCI, then V satisfies the steepest middle slope property. The applicability of the steepest middle slope property stems from its simplicity. To detect violation of PORU, one need not construct the (potentially complicated) exact choice problem. Rather, it is often sufficient to “examine” the slopes of one-dimensional indifference curves. This, in turn, is a relatively simple task, at least once a utility function is given. Proposition 3 below is based on this observation. The linearity of the indifference curve through the origin is implied by applying NCI twice: p 1 δx ⇒ p = αp + (1 − α)p 1 αδx + (1 − α)p 1 αδx + (1 − α)δx = δx . Therefore, p ∼1 δx ⇒ αp + (1 − α)δx ∼1 δx . 11
By Corollary 2, all the right derivatives exist (see Rockafellar (1970, p. 214)).
Examples of preferences that satisfy NCI will be given in Section 4.2.1. For now, we use both lemmas to argue that two broad and widely used classes of preferences—rank-dependent utility (Quiggin (1982)) and quadratic utility (Chew, Epstein, and Segal (1991))—do not satisfy NCI unless they coincide with expected utility.
Order the prizes x₁ < x₂ < · · · < x_n. The functional form for rank-dependent utility is

V(Σ_{i=1}^{n} p(x_i)δ_{x_i}) = g(p(x₁))u(x₁) + Σ_{i=2}^{n} u(x_i)[ g(Σ_{j=1}^{i} p(x_j)) − g(Σ_{j=1}^{i−1} p(x_j)) ],

where g : [0, 1] → [0, 1] is increasing, g(0) = 0, and g(1) = 1. If g(p) = p, then rank-dependent utility reduces to expected utility. The functional form for quadratic utility is

V(Σ_{i=1}^{n} p(x_i)δ_{x_i}) = Σ_{i=1}^{n} Σ_{j=1}^{n} ϕ(x_i, x_j)p(x_i)p(x_j),

where ϕ : X × X → R is some symmetric function. If ϕ(x_i, x_j) = (u(x_i) + u(x_j))/2, then quadratic utility reduces to expected utility.

PROPOSITION 3: If ≽¹ satisfies NCI and is a member of either the rank-dependent utility class or the quadratic utility class, then V is an expected utility functional.12

Confining his attention to smooth preferences, in the sense that the function V is Fréchet differentiable, Machina (1982) suggested the following fanning-out property: for all p, q ∈ Δ, if p first-order stochastically dominates q, then μ(p) ≥ μ(q). If for all such p ≠ q, we have μ(p) > μ(q), then we say that ≽¹ satisfies the proper fanning-out property. Lemma 3 immediately implies that if ≽¹ satisfies NCI, then ≽¹ does not satisfy the proper fanning-out property.

12 Segal (1990, Section 5) used a different, but equivalent, way to write the functional form for rank-dependent utility, using the transformation f(p) = 1 − g(1 − p). He showed that within this model, if f is convex and its elasticity is nondecreasing, then the desirability of a two-stage lottery of the form αD_{δ_y} + (1 − α)D_{βδ_y+(1−β)δ_x} decreases as the two stages become less degenerate. Similar results are stated in Segal (1987, Theorem 4.2). This condition is not sufficient to imply global PORU. For example, let f(p) = p², which satisfies Segal's conditions, and u(x) = x. Take three prizes, 0, 1, and 2, and note that V(½D_{δ₁} + ½D_{((√2−1)/√2)δ₀+(1/√2)δ₂}) = 1 > 0.853 ≈ V(((√2−1)/(2√2))δ₀ + ½δ₁ + (1/(2√2))δ₂).
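The computation in footnote 12 can be reproduced directly. The script below is our own check, written with Segal's f-formulation of rank-dependent utility (f(p) = p², u(x) = x); the helper names are ours.

```python
# Verification of the computation in footnote 12 (a sketch; helper names are ours).
import math

def rdu(lottery, f=lambda p: p * p, u=lambda x: x):
    """V(p) = sum_i u(x_i)[ f(P(X >= x_i)) - f(P(X > x_i)) ], prizes sorted increasingly."""
    value, upper_tail = 0.0, 1.0
    for x in sorted(lottery):
        above = upper_tail - lottery[x]          # P(X > x)
        value += u(x) * (f(upper_tail) - f(above))
        upper_tail = above
    return value

def certainty_equivalent(lottery):
    # With u(x) = x and f increasing, V(delta_c) = c, so the CE equals the RDU value.
    return rdu(lottery)

s = math.sqrt(2)
second_stage = {0.0: (s - 1) / s, 2.0: 1 / s}
# Two-stage lottery: 1/2 chance of delta_1, 1/2 chance of second_stage.
ce = certainty_equivalent(second_stage)
first_stage = {}
for prize, prob in ((1.0, 0.5), (ce, 0.5)):
    first_stage[prize] = first_stage.get(prize, 0.0) + prob
folded = rdu(first_stage)
reduced = rdu({0.0: (s - 1) / (2 * s), 1.0: 0.5, 2.0: 1 / (2 * s)})
print(round(folded, 3), round(reduced, 3))   # about 1.0 and 0.854, as in footnote 12
```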
This observation does not contradict the usual explanation of fanning out as a resolution to the Allais paradox. Typical Allais experiments with positive outcomes (as the one described in Section 2) provide evidence of behavior in the lower right subtriangle. In this region, NCI is consistent with fanning out.13
The following example demonstrates that the steepest middle slope property is weaker than NCI.14 For a fixed n ≥ 4, let p̲ = (1/n) Σ_{j=1}^{n} δ_{(j/n)w+((n−j)/n)b} and let p̄ = (1/n) Σ_{j=0}^{n−1} δ_{(j/n)w+((n−j)/n)b}. Note that p̄ first-order stochastically dominates p̲ (denoted p̄ >₁ p̲). Let L¹|j ⊂ L¹ be the set of lotteries with j possible outcomes, that is, L¹|j = {p ∈ L¹ : |S(p)| = j}. Define L∗ := {p ∈ L¹ : p̄ >₁ p >₁ p̲}. Observe that for j ≤ 3, q ∈ L¹|j ⇒ q ∉ L∗. For any p ∈ L¹, denote by p∗ its cumulative distribution function. Let d : L∗ → [0, 1] be defined as d(q) = (‖q∗ − p̄∗‖ / ‖p̲∗ − p̄∗‖)², where ‖·‖ is the L₁ norm. Define f : L∗ → L¹ by f(q) = d(q)p̲ + (1 − d(q))p̄. Note that f(p̄) = p̄ and that f(p̲) = p̲. Furthermore, if q, r ∈ L∗ and q >₁ r, then d(q) < d(r) and f(q) >₁ f(r). Denote by e(p) the expectation of a lottery p ∈ L¹, that is, e(p) = Σ_x x p(x). Define

V(p) = e(p) if p ∈ L¹ \ L∗, and V(p) = e(f(p)) if p ∈ L∗.

The function V is continuous, is monotone, and satisfies the steepest middle slope property, but does not satisfy NCI.15

4.2. Betweenness

For the rest of the section, assume that ≽¹ is quasiconvex, that is, ∀p, q ∈ L¹, V(αp + (1 − α)q) ≤ max{V(p), V(q)}. The conjunction of quasiconvexity with quasiconcavity (Lemma 2) yields the following axiom:

AXIOM A3—Single-Stage Betweenness: For all p, q ∈ L¹ and α ∈ [0, 1], p ≽¹ q implies p ≽¹ αp + (1 − α)q ≽¹ q.

Axiom A3 is a weakened form of the vNM independence axiom. It implies neutrality toward randomization among equally good lotteries. It yields the following representation:

13 The behavioral evidence that supports fanning out is generally weaker in the upper left subtriangle than in the lower right subtriangle (see Camerer (1995)).
14 I thank Danielle Catambay for her help constructing this example.
15 Suppose that X = [0, 1]. Let p′ = 0.5p̄ + 0.5p̲ ∈ L∗ and note that V(p′) = (2n + 1)/(4n). But for γ sufficiently close to 1, γp′ + (1 − γ)δ_{(2n+1)/(4n)} ∈ L∗ and V(γp′ + (1 − γ)δ_{(2n+1)/(4n)}) > (2n + 1)/(4n), which violates NCI.
PROPOSITION—Chew (1989), Dekel (1986): The relation ≽¹ satisfies Axiom A3 if and only if there exists a local utility function u : X × [0, 1] → [0, 1], which is continuous in both arguments, is strictly increasing in the first argument, and satisfies u(w, v) = 0 and u(b, v) = 1 for all v ∈ [0, 1], such that p ≽¹ q ⇔ V(p) ≥ V(q), where V(p) is defined implicitly as the unique v ∈ [0, 1] that solves

Σ_x p(x)u(x, v) = v.
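Since V(p) is defined only implicitly, it is worth noting that it can be computed by a simple bisection on v. The sketch below is our own illustration; the local utility u(x, v) = x^{1+v} on X = [0, 1] is an arbitrary example satisfying the boundary conditions of the proposition, not one used in the paper.

```python
# A sketch of the implicit betweenness representation: V(p) is the unique v in [0, 1]
# solving sum_x p(x) u(x, v) = v. The local utility below is an arbitrary example.
def betweenness_value(lottery, u, iters=100):
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        v = (lo + hi) / 2.0
        if sum(px * u(x, v) for x, px in lottery.items()) > v:
            lo = v        # the fixed point lies above v
        else:
            hi = v
    return (lo + hi) / 2.0

u = lambda x, v: x ** (1.0 + v)      # u(0, v) = 0 and u(1, v) = 1 for all v
p = {0.0: 0.25, 0.5: 0.5, 1.0: 0.25}
v = betweenness_value(p, u)
print(round(v, 4))
# Consistency check: at the solution, sum_x p(x) u(x, v) equals v.
print(round(sum(px * u(x, v) for x, px in p.items()), 4))
```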
The next result gives the utility characterization of NCI within the betweenness class of preferences. Let W(p, v) := Σ_x p(x)u(x, v) and denote by L¹|2 the set of all binary lotteries, that is, L¹|2 = {p ∈ L¹ : |S(p)| = 2}.

PROPOSITION 4: If ≽¹ satisfies Axiom A3, then the following three statements are equivalent: (i) The relation ≽¹ satisfies NCI. (ii) For all p ∈ L¹ and ∀v ∈ [0, 1], W(p, v) − W(δ_{c(p)}, v) ≥ 0. (iii) For all p ∈ L¹|2 and ∀v ∈ [0, 1], W(p, v) − W(δ_{c(p)}, v) ≥ 0.

Dekel (1986) provided the following observation: If W(p, v) = v and W(q, v) = v′, then V(p) ≥ V(q) ⇔ v ≥ v′. That is, to compare two lotteries p and q, it is enough to evaluate them at the same value v, which is between V(p) and V(q). The proof of Proposition 4 is based on Dekel's observation. The term W(p, v) can be interpreted as the value (expected utility) of p relative to a reference utility level v. Roughly speaking, condition (ii) then implies that risk aversion is maximized at the true lottery value: by definition, W(p, V(p)) = W(δ_{c(p)}, V(p)) = V(p), whereas the value assigned to p relative to any other v is (weakly) greater than that of δ_{c(p)}. Put differently, condition (ii) is the utility equivalent of the requirement that the conditional certainty equivalent of p (when p is a part of a mixture) is never less than its unconditional certainty equivalent (see Section 2). Condition (iii) is condition (ii) restricted to binary lotteries. It is equivalent to the following weaker version of NCI: ∀q, δ_x ∈ L¹, p ∈ L¹|2, and λ ∈ [0, 1], p ≽¹ δ_x implies λp + (1 − λ)q ≽¹ λδ_x + (1 − λ)q. We use the convexity of betweenness indifference sets to show that condition (iii) is also sufficient for condition (ii).16

4.2.1. Examples

In a dynamic context, expected utility preferences trivially satisfy PORU: a DM with such preferences is just indifferent to the way uncertainty is resolved.

16 In terms of preferences, the steepest middle slope property is equivalent to NCI with the restrictions that p ∈ L¹|2 and S(q) ⊆ {x} ∪ S(p). Its analogous utility characterization is: ∀p ∈ L¹|2 with two outcomes x̄_p > x̲_p, W(p, v) − W(δ_{c(p)}, v) ≥ 0 ∀v ∈ (V(δ_{x̲_p}), V(δ_{x̄_p})). Note that this condition is weaker than condition (iii) in Proposition 4.
Gul (1991) proposed a theory of disappointment aversion. He derived the local utility function

u(x, v) = (φ(x) + βv)/(1 + β) if φ(x) > v, and u(x, v) = φ(x) if φ(x) ≤ v,

with β ∈ (−1, ∞) and φ : X → R increasing. For Gul's preferences, the sign of β, the coefficient of disappointment aversion, unambiguously determines whether preferences satisfy PORU (if β ≥ 0) or PGRU (if −1 < β ≤ 0). (See Artstein-Avidan and Dillenberger (2010).)17

4.2.2. NCI and Differentiability

In most economic applications, it is assumed that individuals' preferences are "smooth." We confine our attention to the betweenness class and suppose that the local utility function u : X × [0, 1] → [0, 1] is sufficiently differentiable with respect to both arguments. In this case, the function V is (continuously) Fréchet differentiable (Wang (1993)).18 The following result demonstrates that coupling this smoothness assumption with NCI leads us back to expected utility.

PROPOSITION 5: Suppose u(x, v) is at least twice differentiable with respect to both arguments, and that all derivatives are continuous and bounded. Then preferences satisfy NCI if and only if they are expected utility.

To prove Proposition 5, we use the fact that betweenness (Axiom A3), along with monotonicity, implies that indifference curves in any unit probability triangle are positively sloped straight lines. In particular, for any lottery p ∈ Δ such that V(p) = v,

μ(p) = μ(v|x₃, x₂, x₁) = [u(x₂, v) − u(x₁, v)] / [u(x₃, v) − u(x₂, v)].
Expected utility preferences are characterized by the independence axiom that implies NCI. To show the other direction, we fix v and denote by x(v) the unique x satisfying v = u(x v). Combining Lemma 3 with differentiability implies that for any x > x(v) > w, the derivative with respect to v of 17 The question of whether there is a continuous and monotone function V : L1 → R, which represents preferences that satisfy NCI but not betweenness, remains open. 18 The notion of smoothness we consider here is the one assumed in Neilson (1992). For a formal definition of Fréchet differentiability, see Machina (1982). Roughly speaking, Fréchet differentiability means that V (p) changes continuously with p and that V can be locally approximated by a linear functional. The economic meaning of Fréchet differentiability is discussed in Safra and Segal (2002).
μ(v|x x(v) w) must vanish at v. We use the fact that this statement is true for any x > x(v) and that v is arbitrary to get a differential equation with a solution on {(x v) | v < u(x v)} given by u(x v) = h1 (v)g1 (x) + f 1 (v) and h1 (v) > 0. We perform a similar exercise for x < x(v) < b to uncover that on the other region, {(x v) | v > u(x v)}, u(x v) = h2 (v)g2 (x) + f 2 (v) and h2 (v) > 0. Continuity and differentiability then imply that the functional form is equal in both regions, and, therefore, for all x, u(x v) = h(v)g(x) + f (v) and h(v) > 0. The uniqueness theorem for betweenness representations establishes the result.19 5. GRADUAL RESOLUTION PREMIUM We now extend our results to finite-stage lotteries. 5.1. Extension to n-Stage Lotteries Fix n ∈ N and denote the space of finite n-stage lotteries by Ln . The extension of our setting to Ln is as follows: equipped with a continuous and increasing function V : L1 → R, the DM evaluates any n-stage lottery by folding back the probability tree and applying the same V in each stage. Preferences for one-shot resolution of uncertainty imply that the DM prefers to replace each compound sublottery with its single-stage counterpart. The equivalence between PORU and NCI remains intact. In what follows, we will continue simplifying notation by writing V (Q) for the value of any multistage lottery Q. We sometimes write Qn to emphasize that we consider an n-stage lottery. 5.2. Definitions As before, for any p ∈ L1 we denote by e(p) the expectation of p. We say that p second-order stochastically dominates q if for every nondecreas ing concave function u, x u(x)p(x) ≥ x u(x)q(x). The DM is risk averse if ∀p q ∈ L1 with e(p) = e(q), p second-order stochastically dominates q implies p 1 q. For any p ∈ L1 , the risk premium of p, denoted by rp(p), is the number that satisfies δe(p)−rp(p) ∼1 p. The risk premium rp(p) is the amount that the DM would pay to replace p with its expected value. By definition, rp(p) ≥ 0 whenever the DM is risk averse.20 19 Neilson (1992) provided sufficient conditions for smooth (in the sense of Proposition 5) betweenness preferences to satisfy the mixed-fan hypothesis (that is, indifference curves fan out in the lower right subtriangle and fan in in the upper left subtriangle). The additional requirement that the switch between fanning out and fanning in always occurs at the indifference curve that passes through the origin (the lottery that yields the middle prize for certain) renders those conditions empty, as is evident from Proposition 5. 20 Weak risk aversion is defined as for all p, δe(p) p. This definition is not appropriate once we consider preferences that are not expected utility. The definition of the risk premium, on the other hand, is independent of the preferences considered.
DEFINITION 4: Fix p ∈ L1 and let P(p) := {Q | ρ(Q) = p}. For any Q ∈ P(p), the gradual resolution premium of Q, denoted by grp(Q), is the number that satisfies δ_{c(p)−grp(Q)} ∼₁ Q.

The gradual resolution premium grp(Q) is the amount that the DM would pay to replace Q with its single-stage counterpart. By definition, PORU implies grp(Q) ≥ 0. Since c(p) = e(p) − rp(p), we can, equivalently, define grp(Q) as the number that satisfies δ_{e(p)−rp(p)−grp(Q)} ∼₁ Q.

Observe that the signs of rp(p) and grp(Q) need not agree. In other words, (global) risk aversion does not imply and is not implied by PORU. Indeed, Gul's disappointment aversion preferences (see Section 4.2.1) are risk averse if and only if β ≥ 0 and φ : X → R is concave (Gul (1991, Theorem 3)). However, for sufficiently small β ≥ 0 and sufficiently convex φ, one can find a lottery p with rp(p) < 0, whereas β ≥ 0 is sufficient for grp(Q) ≥ 0 for all Q ∈ P(p). On the other hand, if λ′(v) > 0 and λ(v) > 1 for all v,21 then the local utility function
u(x, v) = x if x > v, and u(x, v) = v − λ(v)(v − x) if x ≤ v,

has the property that u(·, v) is concave for all v. Therefore, the DM is globally risk averse (Dekel (1986, Property 2)), and hence rp(p) ≥ 0 ∀p ∈ L1. However, these preferences do not satisfy NCI,22 meaning that there exists Q ∈ P(p) with grp(Q) < 0.

5.3. The Magnifying Effect

In the case where the DM is both risk averse and displays PORU, these two forces magnify each other. By varying the parameter n, we change the frequency at which the DM updates information. Our next result demonstrates that high frequency of information updates (sufficiently large value of n) alone might inflict an extreme cost on the DM; a particular splitting of a lottery drives down its value to the value of the worst prize in its support. Although the same result holds for more general preferences, for purposes of clarity, we state Proposition 6 below in terms of biseparable preferences.
21 The condition that λ(v) is nondecreasing is both necessary and sufficient for u to be a local utility function. See Nehring (2005).
22 Look at the slope of an indifference curve for values x₃ > v > x₂ > x₁. We have μ(v|x₃, x₂, x₁) = λ(v)(x₂ − x₁)/(x₃ − v + λ(v)(v − x₂)). In this region, the slope is increasing in v if x₃ > λ(v)(λ(v) − 1)/λ′(v) + v. For a given v, we can always choose arbitrarily large x₃ that satisfies the condition, and construct, by varying the probabilities, a lottery whose value is equal to v. Apply this argument in the limit where v = x₂ to violate the steepest middle slope property.
DEFINITION 5: The relation ≿₁ satisfies biseparability if there exist an increasing and continuous function π from [0, 1] onto [0, 1] and a mapping φ : X → R (unique up to positive affine transformations), such that the restriction of ≿₁ to {αδ_x + (1 − α)δ_y : α ∈ [0, 1], x, y ∈ X and x > y} can be represented by the function

V(αδ_x + (1 − α)δ_y) = π(α)φ(x) + (1 − π(α))φ(y).

Examples of biseparable preferences include any rank-dependent utility (Section 4.1), as well as betweenness preferences that are represented by a local utility of the form
u(x, v) = v + (φ(x) − φ(v))^γ if x > v, and u(x, v) = v − β(φ(v) − φ(x))^γ if x ≤ v,

with β, γ > 0 (Nehring (2005)). We consider biseparable preferences with π(α) < α.23,24

PROPOSITION 6: Suppose ≿₁ satisfies biseparability and that π(α) < α. Then for any ε > 0 and for any lottery p = Σ_{j=1}^m p(x_j)δ_{x_j}, there exist T < ∞ and a multistage lottery Q^T ∈ P(p) such that V(Q^T) < min_{x_j ∈ S(p)} φ(x_j) + ε.

Let p be a binary lottery that yields 0 and 1 with equal probabilities. Consider n tosses of an unbiased coin. Define a series of random variables {z_i}_{i=1}^n with z_i = 1 if the ith toss is heads and z_i = 0 if it is tails. Let the terminal nodes of the n-stage lottery be

δ_1               if Σ_{i=1}^n z_i > n/2,
0.5δ_1 + 0.5δ_0   if Σ_{i=1}^n z_i = n/2,
δ_0               if Σ_{i=1}^n z_i < n/2.

23 Note that these preferences need not satisfy NCI. For example, in rank-dependent utility, π(p) = 1 − g(1 − p) < p if g is concave.
24 In the context of decision making under subjective uncertainty (with unknown probabilities), Ghirardato and Maccheroni (2001) argued that the biseparable preferences model is the most general model that achieves a separation between cardinal utility and a unique representation of beliefs.

Note that the value of this n-stage lottery, calculated using recursive biseparable preferences as in the premise of Proposition 6, is identical to the value
calculated using recursive expected utility and probability π(0.5) < 0.5 for heads in each period. Applying the weak law of large numbers yields

Pr(Σ_{i=1}^n z_i < n/2) → 1

and, therefore, for n large enough, the value approaches φ(0). We use a similar construction to establish that this result holds true for any lottery.

If most actual risks that individuals face are resolved gradually over time, then these risks cannot be compounded into a single lottery and, therefore, the gradual resolution premium should not be disregarded. The combination of risk aversion and PORU can help explain why people often buy periodic insurance for moderately priced objects, such as electrical appliances and cellular phones, at much more than the actuarially fair rates.25 A formal analysis of this phenomenon will be developed in future work.

APPENDIX

PROOF OF PROPOSITION 2: Since any temporal lottery corresponds to some information environment in which u(a, s) = v(s) ∈ X for all a ∈ A, showing that (i) is necessary for (ii) is immediate. To show sufficiency, fix an information environment Ω = {S, J, A, (p_s)_{s∈S}, u}. Let Q and p_j be two intermediate lotteries, where p_j assigns probability p(s|j) to the outcome u(a(s), s), and the compound lottery Q assigns probability α_j(π) to p_j, that is, Q = Σ_{j∈J} α_j(π)D_{p_j}. Clearly, since for each state s and for any action a we have u(a, s) ≤ u(a(s), s), by monotonicity of the value of a lottery with respect to the relation of first-order stochastic dominance, V(p_j^*) ≤ V(p_j) and, hence, by the same reason, also V(π) ≤ V(Q). However, now Q is simply the folding back of the two-stage lottery, which when played in one shot is the lottery that corresponds to the full information system I. Thus by (i) we have that V(I) ≥ V(Q). Combining the two inequalities establishes the result.

Similarly, it is obvious that PGRU is necessary for φ to be the least valuable information system. To show sufficiency, let a = arg max_a V(Σ_{s∈S} p_s δ_{u(a,s)}). Let Q and p_j be two intermediate lotteries, where p_j assigns probability p(s|j) to the outcome u(a, s), and the compound lottery Q assigns probability α_j(π) to p_j, that is, Q = Σ_{j∈J} α_j(π)D_{p_j}. By definition, V(p_j) ≤ V(p_j^*) for all j and, therefore, by monotonicity, V(Q) ≤ V(π).
25 An example was given by Tim Harford ("The Undercover Economist," Financial Times, May 13, 2006): "There is plenty of overpriced insurance around. A popular cell phone retailer will insure your $90 phone for $1.70 a week—nearly $90 a year. The fair price of the insurance is probably closer to $9 a year than $90."
However, now Q is simply the folding back of the two-stage lottery, which when played in one shot is the lottery corresponding to φ. Thus by (i), we have that V(φ) ≤ V(Q). Combining the two inequalities establishes the result. Q.E.D.

PROOF OF LEMMA 2: Suppose not. Then there exist p, q ∈ L1 and α ∈ (0, 1) such that

V(αD_p + (1 − α)D_q) = V(αδ_{c(p)} + (1 − α)δ_{c(q)}) ≥ min{V(δ_{c(p)}), V(δ_{c(q)})} > V(δ_{c(αp+(1−α)q)}) = V(αp + (1 − α)q),

where the weak inequality is implied by monotonicity, contradicting PORU. Q.E.D.

PROOF OF LEMMA 3: (i) By monotonicity and continuity, there exists q = (q, 1 − q) ∈ I_{(0,0)}. By applying NCI twice, q = βq + (1 − β)q ≿₁ βq + (1 − β)(0, 0) ≿₁ β(0, 0) + (1 − β)(0, 0) = (0, 0) for all β ∈ [0, 1]. Since q ∈ I_{(0,0)}, the result follows.

(ii) Suppose not. Let q′ be a lottery such that μ(I_{(0,0)}) < μ⁺(q′). Take p ∈ I_{(0,0)} and look at the triangle with vertices (0, 0), p, q′. Using the triangle proportional sides theorem, for α sufficiently close to 1, we have αq′ + (1 − α)(0, 0) ≻₁ αq′ + (1 − α)p, a contradiction. Q.E.D.

PROOF OF PROPOSITION 3: (i) Suppose that ≿₁ is of the rank-dependent utility class. Let L1|2 be the set of all binary lotteries, that is, L1|2 = {p ∈ L1 : |S(p)| = 2}. Consider the following axiom:

AXIOM A*: For all q ∈ L1|2, x ∈ X and α ∈ (0, 1), q ∼₁ δ_x implies αq + (1 − α)δ_x ∼₁ δ_x.

By Lemma 3, NCI implies Axiom A*. Bell and Fishburn (2003, Theorem 1) showed that if ≿₁ is of the rank-dependent utility class and satisfies Axiom A*, then ≿₁ is expected utility.

(ii) Suppose that ≿₁ is of the quadratic utility class. Fix x₃ > x₂ > x₁. By the quadratic utility formula, μ(p) equals

{p₁[ϕ(x₁, x₂) − ϕ(x₁, x₁)] + p₃[ϕ(x₂, x₃) − ϕ(x₁, x₃)] + (1 − p₁ − p₃)[ϕ(x₂, x₂) − ϕ(x₁, x₂)]} / {p₁[ϕ(x₁, x₃) − ϕ(x₁, x₂)] + p₃[ϕ(x₃, x₃) − ϕ(x₂, x₃)] + (1 − p₁ − p₃)[ϕ(x₂, x₃) − ϕ(x₂, x₂)]}.
Note that if μ(m, 1 − m) = μ(x, 1 − x) = k, then for all α ∈ [0, 1], μ(αm + (1 − α)x, α(1 − m) + (1 − α)(1 − x)) = k. Lotteries p and q lie on the same expansion path if there is a common subgradient to the indifference curves at p and q. Chew, Epstein, and Segal (1991) showed that for any quadratic utility, all expansion paths are straight lines and perspective, that is, they have a common point of intersection, which could be infinity if they are parallel lines. An implication of this projective property is that for all m ∈ (0, 1) there exists either (i) x ∈ (0, 1) such that μ⁺(m, 0) = μ⁺(0, x) or (ii) y ∈ (0, 1) such that μ⁺(m, 0) = μ(y, 1 − y). For case (i), let α*_{mx} ∈ (0, 1) solve α(m, 0) + (1 − α)(0, x) ∈ I_{(0,0)}. By Lemmas 2 and 3,

μ⁺(0, 0) ≤ μ(α*_{mx}m, (1 − α*_{mx})x) = μ⁺(0, x) ≤ μ⁺(0, 0),

and similarly for case (ii). Therefore, all indifference curves are linear and parallel, hence preferences are expected utility. Q.E.D.

PROOF OF PROPOSITION 4: Let W(q, v) := Σ_x q(x)u(x, v).

(i) ⇒ (ii) Suppose not. Then there exists a lottery p such that W(p, v) − W(δ_{c(p)}, v) < 0 for some v. Pick y ∈ X and α ∈ (0, 1) such that V(αp + (1 − α)δ_y) = v. We have v < αu(c(p), v) + (1 − α)u(y, v) = W(αδ_{c(p)} + (1 − α)δ_y, v), or αδ_{c(p)} + (1 − α)δ_y ≻₁ αp + (1 − α)δ_y, contradicting NCI.

(ii) ⇒ (i) Assume p ≿₁ δ_x. Then W(p, V(p)) ≥ W(δ_x, V(p)). By (ii) and monotonicity, W(p, v) ≥ W(δ_x, v) for all v and, in particular, for v = V(λp + (1 − λ)q).26 Therefore, W(λp + (1 − λ)q, V(λp + (1 − λ)q)) ≥ W(λδ_x + (1 − λ)q, V(λp + (1 − λ)q)), which is equivalent to λp + (1 − λ)q ≿₁ λδ_x + (1 − λ)q.

(iii) ⇒ (ii) Take a lottery p with |S(p)| = n − 1 that belongs to an indifference set I_v := {p : W(p, v) = v} in an (n − 1)-dimensional unit simplex. Assume further that for some x_v ∈ (w, b) with x_v ∉ S(p), δ_{x_v} ∈ I_v.27 By monotonicity and continuity,28 p can be written as a convex combination αr + (1 − α)w for some α ∈ (0, 1) and r, w ∈ I_v with |S(r)| = |S(w)| = n − 2. By the same argument, both r and w can be written, respectively, as convex combinations of two other lotteries with size of support equal to n − 3 and that belong to I_v. Continue in the same fashion to get an index set J and a collection of lotteries, {q_j}_{j∈J}, such that for all j ∈ J, |S(q_j)| = 2 and q_j ∈ I_v. Note that by monotonicity, if y, z ∈ S(q_j) then either z > x_v > y or y > x_v > z.
26 If p ∼₁ δ_x, the assertion is evident. Otherwise, we need to find p* that is both first-order stochastically dominated by p and satisfies p* ∼₁ δ_x, and to use the monotonicity of u(·, v) with respect to its first argument. By continuity, such p* exists.
27 The analysis would be the same, although with messier notation, even if |S(p)| = n, that is, if x_v ∈ S(p).
28 These two assumptions guarantee that no indifference set terminates in the relative interior of any k ≤ n − 1-dimensional unit simplex.
By construction, for some α₁, ..., α_J with α_j > 0 and Σ_j α_j = 1, Σ_j α_j q_j = p. By hypothesis, W(q_j, v′) ≥ u(x_v, v′) for all j ∈ J and for all v′, and, therefore, also

W(p, v′) = Σ_j α_j W(q_j, v′) = Σ_j α_j Σ_x q_j(x)u(x, v′) ≥ Σ_j α_j u(x_v, v′) = u(x_v, v′) = u(c(p), v′).
(ii) ⇒ (iii) Obvious.
Q.E.D.
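Before turning to the next proof, here is a minimal computational sketch, not from the paper, of how a betweenness value V(p) is pinned down by its local utility through the implicit relation W(p, V(p)) = V(p) used above. The piecewise local utility is the one discussed after Definition 4, with a hypothetical λ(v) = 1 + v (which satisfies λ(v) > 1 and λ′(v) > 0 as required there).

```python
def local_u(x, v, lam=lambda v: 1.0 + v):
    # Piecewise local utility from the text; lam(v) = 1 + v is a hypothetical choice.
    return x if x > v else v - lam(v) * (v - x)

def betweenness_value(prizes, probs, tol=1e-10):
    """Solve W(p, v) = sum_x p(x) u(x, v) = v for v by bisection
    (monotone in v for this specification)."""
    lo, hi = min(prizes), max(prizes)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        w = sum(q * local_u(x, mid) for x, q in zip(prizes, probs))
        if w > mid:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# A 50/50 lottery over 0 and 1: the value (about 0.414) lies below the mean 0.5,
# because prizes falling short of v are penalized by the factor lam(v).
print(betweenness_value([0.0, 1.0], [0.5, 0.5]))
```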
PROOF OF PROPOSITION 5: Since for expected utility preferences NCI is always satisfied, it is enough to demonstrate the result for lotteries with at most three prizes in their support. For x ∈ [w, b], denote by V(δ_x) the unique solution of v = u(x, v). Without loss of generality, set u(w, v) = 0 and u(b, v) = 1 for all v ∈ [0, 1].

Fix v̄ ∈ (0, 1). By monotonicity and continuity there exists x(v̄) ∈ (w, b) such that v̄ = u(x(v̄), v̄) = V(δ_{x(v̄)}). Take any x > x(v̄) and note that μ(v|x, x(v̄), w) = u(x(v̄), v)/[u(x, v) − u(x(v̄), v)] is continuous and differentiable as a function of v on [0, V(δ_x)]. Since v̄ ∈ (0, V(δ_x)), Lemma 3 implies that μ(v|x, x(v̄), w) is maximized at v = v̄. A necessary condition is

∂/∂v [ u(x(v̄), v) / (u(x, v) − u(x(v̄), v)) ] = 0.

Alternatively,29 using v̄ = u(x(v̄), v̄) and denoting by u_i the partial derivative of u with respect to its ith argument yields

(1)  u₂(x(v̄), v̄)[u(x, v̄) − v̄] = [u₂(x, v̄) − u₂(x(v̄), v̄)]v̄.

Note that by continuity and monotonicity of u(x, v) in its first argument, for all x ∈ (x(v̄), b) there exists p ∈ (0, 1) such that pδ_w + (1 − p)δ_x ∼₁ δ_{x(v̄)}, or u(x, v̄)(1 − p) = u(x(v̄), v̄) = v̄. Therefore, and again using Lemma 3, (1) is an identity for x ∈ (x(v̄), b), so we can take the partial derivative of both sides with respect to x and maintain equality. We get

u₂(x(v̄), v̄)u₁(x, v̄) = u₂₁(x, v̄)v̄.

Since u is strictly increasing in its first argument, u₁(x, v̄) > 0 and v̄ > 0. Thus u₂₁(x, v̄)/u₁(x, v̄) = u₂(x(v̄), v̄)/v̄ = l(v̄), independent of x, or, by changing the order of differentiation, ∂/∂v [ln u₁(x, v)] is independent of x.
29 Second-order conditions would be u₂₂(x(v̄), v̄)/u₂₂(x, v̄) < v̄/u(x, v̄) (< 1).
Since v̄ was arbitrary, we have the following differential equation on {(x, v) | v < u(x, v)}:

∂/∂v [ln u₁(x, v)] = l(v).

By the fundamental theorem of calculus, the solution of this equation is

∂/∂v [ln u₁(x, v)] = l(v)
⇒ ln u₁(x, v) = ln u₁(x, 0) + ∫_{s=0}^{v} l(s) ds
⇒ u₁(x, v) = u₁(x, 0) exp(∫_{s=0}^{v} l(s) ds)
⇒ u(x, v) − u(x(v), v) = exp(∫_{s=0}^{v} l(s) ds) ∫_{x(v)}^{x} u₁(t, 0) dt
⇒ u(x, v) − v = exp(∫_{s=0}^{v} l(s) ds) [u(x, 0) − u(x(v), 0)].

Note that the term

exp(∫_{s=0}^{v} l(s) ds) = exp(∫_{s=0}^{v} u₂(x(s), s)/s ds)

is well defined, since by the assumption that all derivatives are continuous and bounded, and that u₁ > 0, we use l'Hôpital's rule and implicit differentiation to show that the term

lim_{s→0} u₂(x(s), s)/s = lim_{s→0} [u₂₁(x(s), s)x′(s) + u₂₂(x(s), s)] = lim_{s→0} [u₂₁(x(s), s)(1 − u₂(x(s), s))/u₁(x(s), s) + u₂₂(x(s), s)]

is finite and hence ∫_{s=0}^{v} u₂(x(s), s)/s ds is finite as well.

To uncover u(x, v) on the region {(x, v) | v > u(x, v)}, again fix some v̄ ∈ (0, 1) and the corresponding x(v̄) ∈ (w, b) (with v̄ = u(x(v̄), v̄)). Take any x < x(v̄) and note that μ(v|b, x(v̄), x) = [u(x(v̄), v) − u(x, v)]/[1 − u(x(v̄), v)] is continuous and differentiable as a function of v on [V(δ_x), 1]. Since v̄ ∈ (V(δ_x), 1), by using Lemma 3 we have

∂/∂v [ (u(x(v̄), v) − u(x, v)) / (1 − u(x(v̄), v)) ] = 0
or

(2)  [u₂(x(v̄), v̄) − u₂(x, v̄)][1 − v̄] = −u₂(x(v̄), v̄)[v̄ − u(x, v̄)].

Using the same argumentation as in the former case, (2) holds for all x ∈ (w, x(v̄)), so we can take the partial derivative of both sides with respect to x and maintain equality. We get

−u₂₁(x, v̄)[1 − v̄] = u₁(x, v̄)u₂(x(v̄), v̄).

Since u is strictly increasing in its first argument, u₁(x, v̄) > 0 and 1 − v̄ > 0. Thus u₂₁(x, v̄)/u₁(x, v̄) = −u₂(x(v̄), v̄)/[1 − v̄] = k(v̄), independent of x, or, by changing the order of differentiation, ∂/∂v [ln u₁(x, v)] is independent of x. Since v̄ was arbitrary, we have the following differential equation on {(x, v) | v > u(x, v)}:

∂/∂v [ln u₁(x, v)] = k(v).

Its solution is given by

∂/∂v [ln u₁(x, v)] = k(v)
⇒ ln u₁(x, 1) − ln u₁(x, v) = ∫_{s=v}^{1} k(s) ds
⇒ u₁(x, v) = u₁(x, 1) [exp(∫_{s=v}^{1} k(s) ds)]⁻¹
⇒ u(x, v) − u(x(v), v) = −[exp(∫_{s=v}^{1} k(s) ds)]⁻¹ ∫_{x}^{x(v)} u₁(t, 1) dt
⇒ u(x, v) − v = −[u(x(v), 1) − u(x, 1)] [exp(∫_{s=v}^{1} k(s) ds)]⁻¹,

which is again well defined since

exp(∫_{s=v}^{1} k(s) ds) = exp(−∫_{s=v}^{1} u₂(x(s), s)/[1 − s] ds)

and

lim_{s→1} −u₂(x(s), s)/[1 − s] = lim_{s→1} [u₂₁(x(s), s)x′(s) + u₂₂(x(s), s)] = lim_{s→1} [u₂₁(x(s), s)(1 − u₂(x(s), s))/u₁(x(s), s) + u₂₂(x(s), s)]
is finite, and hence the whole integral is finite. So far we have

(3)  u(x, v) − v = [u(x, 0) − u(x(v), 0)] × exp(∫_{s=0}^{v} u₂(x(s), s)/s ds)                          if x > x(v),
     u(x, v) − v = −[u(x(v), 1) − u(x, 1)] × [exp(−∫_{s=v}^{1} u₂(x(s), s)/[1 − s] ds)]⁻¹              if x < x(v).

We add the following restrictions:

(i) For all v ∈ [0, 1], u(b, v) = 1, which implies

[1 − u(x(v), 0)] exp(∫_{s=0}^{v} u₂(x(s), s)/s ds) = 1 − v.

(ii) For all v ∈ [0, 1], u(w, v) = 0, which implies

u(x(v), 1) [exp(−∫_{s=v}^{1} u₂(x(s), s)/[1 − s] ds)]⁻¹ = v.

Substituting into (3), we get

(4)  u(x, v) − v = [u(x, 0) − u(x(v), 0)] (1 − v)/[1 − u(x(v), 0)]    if x > x(v),
     u(x, v) − v = −[u(x(v), 1) − u(x, 1)] v/u(x(v), 1)              if x < x(v).

We add two further requirements:

(iii) Continuity at x = x(v), which is immediate since

lim_{x→x(v)⁻} (u(x, v) − v) = lim_{x→x(v)⁺} (u(x, v) − v) = 0.

(iv) Differentiability at x(v) for all v:

u₁(x(v), 0) (1 − v)/[1 − u(x(v), 0)] = u₁(x(v), 1) v/u(x(v), 1),

or

(5)  u₁(x(v), 1)/u₁(x(v), 0) = [1 − u(x(v), v)] u(x(v), 1) / ([1 − u(x(v), 0)] u(x(v), v)).
Let r(x, v) := −u₁₁(x, v)/u₁(x, v). Given v ∈ (0, 1), note that

r(x, v) = −u₁₁(x, 0)/u₁(x, 0)   if x > x(v),
r(x, v) = −u₁₁(x, 1)/u₁(x, 1)   if x < x(v).

But since u is continuous and r(x, v) is well defined, r(x, v) must be continuous as well. Therefore, we require

−u₁₁(x(v), 0)/u₁(x(v), 0) = −u₁₁(x(v), 1)/u₁(x(v), 1),

and since this is true for any v and the function x(v) is onto, we have for all x ∈ (w, b),

−u₁₁(x, 0)/u₁(x, 0) = −u₁₁(x, 1)/u₁(x, 1),

which implies that for some a and b, u(x, 1) = au(x, 0) + b. But u(0, 1) = u(0, 0) = 0 and u(1, 1) = u(1, 0) = 1, hence, by continuity, b = 0 and a = 1, or u(x, 1) = u(x, 0) := z(x) for all x ∈ [w, b]. Plug into (4) to get

(6)  u(x, v) − v = [z(x) − z(x(v))] (1 − v)/[1 − z(x(v))]   if x > x(v),
     u(x, v) − v = −[z(x(v)) − z(x)] v/z(x(v))              if x < x(v),

and plug into (5) to get

u₁(z(x))/u₁(z(x)) = 1 = [1 − v] z(x(v)) / ([1 − z(x(v))] v),

or

(7)  v/z(x(v)) = [1 − v]/[1 − z(x(v))] := m(v).

Substituting (7) into (6), we have

(8)  u(x, v) − v = [z(x) − z(x(v))] m(v),

and using the boundary conditions (i) and (ii), again we find that

u(w, v) − v = 0 − v = [0 − z(x(v))] m(v)
or

(9)  v − z(x(v))m(v) = 0,

and

u(b, v) − v = 1 − v = [1 − z(x(v))] m(v),

or

(10)  1 = m(v) + v − z(x(v))m(v) = m(v),

where the second equality is implied by (9). Therefore, m(v) = 1, and using (7) and (8), we have u(x, v) = z(x), which implies that the local utility function is independent of v; hence preferences are expected utility. Q.E.D.

PROOF OF PROPOSITION 6: We first show that the result holds for lotteries of the form αδ_x + (1 − α)δ_y, with x > y. There are three cases to consider:

CASE 1—α = 0.5: Construct the compound lottery Q^n ∈ P(0.5δ_x + 0.5δ_y) as follows: In each period Pr(success) = Pr(failure) = 0.5. Define z_i = 1 if success and z_i = 0 if failure, i = 1, 2, 3, .... The terminal nodes are

δ_x               if Σ_{i=1}^n z_i > n/2,
0.5δ_x + 0.5δ_y   if Σ_{i=1}^n z_i = n/2,
δ_y               if Σ_{i=1}^n z_i < n/2.

We claim that lim_{n→∞} V(Q^n) = V(δ_y) = φ(y). To prove this claim, we use the fact that the value of the lottery using recursive biseparable preferences (with π(0.5) < 0.5) and probability 0.5 for success in each period is equal to the value of the lottery using recursive expected utility and probability π(0.5) for success in each period. Since the z_i's are independent and identically distributed
(i.i.d.) random variables, the weak law of large numbers implies

(Σ_{i=1}^n z_i)/n →^p π(0.5) < 0.5,

or

Pr(Σ_{i=1}^n z_i < n/2) → 1.

Therefore,

V(Q^n) = φ(x) Pr(Σ_{i=1}^n z_i > n/2) + [π(0.5)φ(x) + (1 − π(0.5))φ(y)] Pr(Σ_{i=1}^n z_i = n/2) + φ(y) Pr(Σ_{i=1}^n z_i < n/2) → φ(y).
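As a numerical illustration of this limit (a sketch under assumed parameter values, not part of the proof), the following Python snippet evaluates the displayed expression for V(Q^n) with a hypothetical weight π(0.5) = 0.4 and prizes normalized so that φ(x) = 1 and φ(y) = 0.

```python
from math import comb

def split_value(n, q=0.4, phi_x=1.0, phi_y=0.0):
    """Value of the n-stage split of the 50/50 lottery over {x, y},
    evaluated via the displayed formula with success weight q = pi(0.5)."""
    pmf = [comb(n, k) * q**k * (1 - q)**(n - k) for k in range(n + 1)]
    p_more = sum(pmf[k] for k in range(n + 1) if 2 * k > n)
    p_tie  = sum(pmf[k] for k in range(n + 1) if 2 * k == n)
    p_less = sum(pmf[k] for k in range(n + 1) if 2 * k < n)
    return phi_x * p_more + (q * phi_x + (1 - q) * phi_y) * p_tie + phi_y * p_less

for n in (1, 10, 100, 500):
    print(n, round(split_value(n), 4))
# The printed values fall toward phi_y = 0 as n grows: finer splitting drives
# the recursive value of the lottery down to the worst prize, as in Case 1.
```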
CASE 2—α < 0.5: Take Q^{n+1} = (2α, Q^n; 1 − 2α, δ_y), the two-stage lottery that continues with Q^n with probability 2α and yields δ_y with probability 1 − 2α, with Q^n as defined above.

CASE 3—α > 0.5: Fix ε > 0. Using the construction in Case 1, obtain Q^{T₁} with V(Q^{T₁}) ∈ (φ(y), φ(y) + ε/2). Reconstruct a lottery as above, but replace δ_y with Q^{T₁} in the terminal node. By the same argument, there exist T₂ and Q^{T₁+T₂} with V(Q^{T₁+T₂}) ∈ (φ(y), φ(y) + ε). Note that the underlying probability of y in Q^{T₁+T₂} is 0.25. Therefore, by monotonicity, the construction works for any α < 0.75. Repeat in the same fashion to show that the assertion is true for α_k < (3 + 4k)/(4 + 4k), k = 1, 2, ..., and note that α_k → 1.

Now take any finite lottery Σ_{j=1}^m α_j δ_{x_j} and order its prizes as x₁ < x₂ < · · · < x_m. Repeat the construction above for the binary lottery over x_{m−1}, x_m to make its value arbitrarily close to φ(x_{m−1}). Then mix it appropriately with x_{m−2} and repeat the argument above. Continue in this fashion to get a multistage lottery over x₂, ..., x_m with a value arbitrarily close to φ(x₂). Conclude by mixing it with x₁ and repeating the construction above. Q.E.D.

REFERENCES

ALLAIS, M. (1953): "Le Comportement de l'Homme Rationnel Devant le Risque: Critique des Postulats et Axiomes de l'École Américaine," Econometrica, 21, 503–546. [1980]
ARTSTEIN-AVIDAN, S., AND D. DILLENBERGER (2010): “Dynamic Disappointment Aversion: Don’t Tell Me Anything Until You Know for Sure,” Unpublished Manuscript, University of Pennsylvania; Working Paper 10-025, PIER. [1989] BARBERIS, N., AND M. HUANG (2007): “The Loss Aversion/Narrow Framing Approach to the Equity Premium Puzzle,” in Handbook of the Equity Risk Premium, Vol. 1, ed. by R. Mehra. Amsterdam, The Netherlands: Elsevier. [1976] BARBERIS, N., M. HUANG, AND R. H. THALER (2006): “Individual Preferences, Monetary Gambles, and Stock Market Participation: A Case for Narrow Framing,” American Economic Review, 96, 1069–1090. [1976] BELL, D. E., AND P. C. FISHBURN (2003): “Probability Weights in Rank-Dependent Utility With Binary Even-Chance Independence,” Journal of Mathematical Psychology, 47, 244–258. [1994] BELLEMARE, C., M. KRAUSE, S. KRÖGER, AND C. ZHANG (2005): “Myopic Loss Aversion: Information Feedback vs. Investment Flexibility,” Economics Letters, 87, 319–324. [1973] BENARTZI, S., AND R. THALER (1995): “Myopic Loss Aversion and the Equity Premium Puzzle,” Quarterly Journal of Economics, 110, 73–92. [1976] BLACKWELL, D. (1953): “Equivalent Comparison of Experiments,” Annals of Mathematics and Statistics, 24, 265–272. [1983] CAMERER, C. (1995): “Individual Decision Making,” in The Handbook of Experimental Economics, ed. by J. H. Kagel and A. E. Roth. Princeton, NJ: Princeton University Press. [1980,1987] CHEW, S. H. (1989): “Axiomatic Utility Theories With the Betweenness Property,” Annals of Operations Research, 19, 273–298. [1988] CHEW, S. H., AND L. G. EPSTEIN (1989): “The Structure of Preferences and Attitudes Towards the Timing of the Resolution of Uncertainty,” International Economic Review, 30, 103–117. [1977] CHEW, S. H., L. G. EPSTEIN, AND U. SEGAL (1991): “Mixture Symmetry and Quadratic Utility,” Econometrica, 59, 139–163. [1986,1995] CONLISK, J. (1989): “Three Variations on the Allais Example,” American Economic Review, 79, 392–407. [1980,1981] DEKEL, E. (1986): “An Axiomatic Characterization of Preferences Under Uncertainty: Weakening the Independence Axiom,” Journal of Economic Theory, 40, 304–318. [1988,1991] EPSTEIN, L. G., AND S. E. ZIN (1989): “Substitution, Risk Aversion and the Temporal Behavior of Consumption and Asset Returns: A Theoretical Framework,” Econometrica, 57, 937–969. [1977] GHIRARDATO, P., AND F. MACCHERONI (2001): “Risk, Ambiguity and the Separation of Utility and Beliefs,” Mathematics of Operations Research, 26, 864–890. [1992] GNEEZY, U., AND J. POTTERS (1997): “An Experiment on Risk Taking and Evaluation Periods,” Quarterly Journal of Economics, 112, 632–645. [1973] GRANT, S., A. KAJII, AND B. POLAK (1992): “Many Good Choice Axioms: When Can Many-Good Lotteries Be Treated as Money Lotteries?” Journal of Economic Theory, 56, 313–337. [1978] (1998): “Intrinsic Preference for Information,” Journal of Economic Theory, 83, 233–259. [1977] (2000): “Temporal Resolution of Uncertainty and Recursive Non-Expected Utility Models,” Econometrica, 68, 425–434. [1977] GUL, F. (1991): “A Theory of Disappointment Aversion,” Econometrica, 59, 667–686. [1975,1988, 1989,1991] HALEVY, Y. (2007): “Ellsberg Revisited: An Experimental Study,” Econometrica, 75, 503–536. [1979] HAIGH, M. S., AND J. A. LIST (2005): “Do Professional Traders Exhibit Myopic Loss Aversion? An Experimental Analysis,” Journal of Finance, 60, 523–534. [1973] KAHNEMAN, D., AND D. 
LOVALLO (1993): “Timid Choices and Bold Forecasts: A Cognitive Perspective on Risk-Taking,” Management Science, 39, 17–31. [1976] KAHNEMAN, D., AND A. TVERSKY (1979): “Prospect Theory: An Analysis of Decision Under Risk,” Econometrica, 47, 263–291. [1974,1976,1980]
KNIGHT, F. (1921): Risk, Uncertainty and Profit. Boston, MA: Houghton Mifflin. [1974] KÖSZEGI, B., AND M. RABIN (2009): “Reference-Dependent Consumption Plans,” American Economic Review, 99, 909–936. [1976] KREPS, D. M., AND E. L. PORTEUS (1978): “Temporal Resolution of Uncertainty and Dynamic Choice Theory,” Econometrica, 46, 185–200. [1977] MACHINA, M. J. (1982): “’Expected Utility’ Analysis Without the Independence Axiom,” Econometrica, 50, 277–323. [1985,1986,1989] (1987): “Choice Under Uncertainty: Problems Solved and Unsolved,” Journal of Economic Perspectives, 1, 121–154. [1980] MEHRA, R., AND E. C. PRESCOTT (1985): “The Equity Premium: A Puzzle,” Journal of Monetary Economics, 15, 145–161. [1976] NEHRING, K. (2005): “Notes on Expected Utility With Psychological Consequences: The Logic of Betweenness-Based Risk Preferences,” Paper presented at Risk, Uncertainty and Decision 2005 Conference, Heidelberger Akademie der Wissenschaften, Heidelberg. [1991,1992] NEILSON, W. S. (1992): “A Mixed Fan Hypothesis and Its Implications for Behavior Towards Risk,” Journal of Economic Behavior and Organization, 19, 197–211. [1989,1990] PALACIOUS-HUERTA, I. (1999): “The Aversion to the Sequential Resolution of Uncertainty,” Journal of Risk and Uncertainty, 18, 249–269. [1974,1975] QUIGGIN, J. (1982): “A Theory of Anticipated Utility,” Journal of Economic Behavior and Organization, 3, 323–343. [1975,1983,1986] READ, D., G. LOEWENSTEIN, AND M. RABIN (1999): “Choice Bracketing,” Journal of Risk and Uncertainty, 19, 171–197. [1976] ROCKAFELLAR, T. R. (1970): Convex Analysis. Princeton, NJ: Princeton University Press. [1985] SAFRA, Z., AND U. SEGAL (2002): “On the Economic Meaning of Machina’s Fréchet Differentiability Assumption,” Journal of Economic Theory, 104, 450–461. [1989] SAFRA, Z., AND E. SULGANIK (1995): “On the Nonexistence of Blackwell’s Theorem-Type Results With General Preference Relations,” Journal of Risk and Uncertainty, 10, 187–201. [1975,1983] SCHLEE, E. (1990): “The Value of Information in Anticipated Utility Theory,” Journal of Risk and Uncertainty, 3, 83–92. [1983] SCHMIDT, U. (1998): “A Measurement of the Certainty Effect,” Journal of Mathematical Psychology, 42, 32–47. [1975] SEGAL, U. (1987): “The Ellsberg Paradox and Risk Aversion: An Anticipated Utility Approach,” International Economic Review, 28, 175–202. [1975,1986] (1990): “Two-Stage Lotteries Without the Reduction Axiom,” Econometrica, 58, 349–377. [1974,1975,1978,1986] TVERSKY, A., AND D. KAHNEMAN (1981): “The Framing of Decisions and the Psychology of Choice,” Science, 211, 453–458. [1976] WANG, T. (1993): “LP -Fréchet Differentiable Preference and ‘Local Utility’ Analysis,” Journal of Economic Theory, 61, 139–159. [1989]
Dept. of Economics, University of Pennsylvania, 160 McNeil Building, 3718 Locust Walk, Philadelphia, PA 19104-6297, U.S.A.;
[email protected]. Manuscript received October, 2008; final revision received April, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 2005–2019
RECONSIDERING THE EFFECT OF MARKET EXPERIENCE ON THE “ENDOWMENT EFFECT”

BY DIRK ENGELMANN AND GUILLAUME HOLLARD1

Simple exchange experiments have revealed that participants trade their endowment less frequently than standard demand theory would predict. List (2003a) found that the most experienced dealers acting in a well functioning market are not subject to this exchange asymmetry, suggesting that a significant amount of market experience is required to overcome it. To understand this market-experience effect, we introduce a distinction between two types of uncertainty—choice uncertainty and trade uncertainty—both of which could lead to exchange asymmetry. We conjecture that trade uncertainty is most important for exchange asymmetry. To test this conjecture, we design an experiment where the two treatments impact differently on trade uncertainty, while controlling for choice uncertainty. Supporting our conjecture, we find that “forcing” subjects to give away their endowment in a series of exchanges eliminates exchange asymmetry in a subsequent test. We discuss why markets might not provide sufficient incentives for learning to overcome exchange asymmetry.

KEYWORDS: Endowment effect, exchange asymmetry, market experience.
1. INTRODUCTION SIMPLE EXCHANGE EXPERIMENTS, starting with Knetsch (1989), have shown that participants trade their endowments less frequently than standard demand theory would predict. This could suggest that individuals value objects differently according to whether they possess them or not. Taken at face value, this exchange asymmetry or “endowment effect”2 implies that subjects may well 1 We thank Fred Celimene, Kinvi Logossah, and Nicolas Sanz for very substantial help in running Experiment 1, Hela Maafi, Marie-Pierre Dargnies and Omar Sene for their help with Experiment 2, Olivier Armantier, Michèle Cohen, Guillaume Frechette, Ori Heffetz, John List, Andreas Ortmann, Paul Pezanis, Hilke Plassmann, Charles Plott, Drazen Prelec, Andy Schotter, Jason Shogren, Jean-Marc Tallon, Jean-Christophe Vergnaud, Leeat Yariv, four anonymous referees, a co-editor, and seminar participants at Westminster Business School, La Sorbonne, University of Rennes, University of Strasbourg, New York University, Royal Holloway, The Paris School of Economics, and the LSE, as well as the ESEM in Milan, the ESA meetings in Lyon and at Caltech, and the third Nordic Conference in Experimental and Behavioral Economics in Copenhagen for helpful comments. We are also grateful to Jean-Robert Tyran for his valuable input at an early stage of the project. This work was done while Dirk Engelmann was a member of the Department of Economics at Royal Holloway, University of London. Engelmann thanks Royal Holloway for supporting this research. Engelmann also acknowledges financial support from the institutional research grant AV0Z70850503 of the Economics Institute of the Academy of Sciences of the Czech Republic, v.v.i. 2 We agree with Plott and Zeiler (2007) that the term “endowment effect” is problematic, as it already entails an interpretation of the phenomenon observed, namely that too little trade in simple exchange experiments is driven by the fact that players are endowed with one of the goods and experience loss aversion with respect to their endowment. We follow Plott and Zeiler in using the term “exchange asymmetry.”
© 2010 The Econometric Society
DOI: 10.3982/ECTA8424
miss out on beneficial trades. To evaluate the impact of such anomalies in actual markets, it is important to understand whether this exchange asymmetry disappears with market experience.3 List (2003a) ran an experiment where the subjects were dealers acting in a well functioning market. He showed that the most experienced dealers are indeed not subject to any exchange asymmetry. List’s experiment has played a prominent part in the debate about the robustness of laboratory results in the field. This raises the question of what it is that the market actually does that makes people rational. The answer is generally twofold: the market selects rational individuals—the market acts as a filter for irrational behavior—and provides incentives to correct any mistakes—the market acts as a teacher (see List and Millimet (2008), and the numerous references therein). Selection by markets is easy to understand. Those who make too many mistakes perform poorly on the market and either choose to withdraw or go bankrupt. But little is known about how market experience succeeds in teaching participants to avoid anomalies, such as exchange asymmetry. We make the following four observations that should be taken into account to understand this learning process. First, in List’s (2003a) experiments, only traders with intense market experience overcome exchange asymmetry. Specifically, the experienced traders for whom no significant exchange asymmetry is detected are those who make six or more trades a month and have typically had this experience for several years. Thus, learning seems to be fairly slow. Second, List also reported experiments in which subjects take part in four trading sessions, each separated by a week. He noted a decline in, although not the complete elimination of, exchange asymmetry, concluding that these results “reinforce the notion that useful cognitive capital builds up slowly, over days or years, rather than in the short run of an experiment” (List (2003a, p. 67)), as noted previously by Camerer and Hogarth (1999). Third, the experiments that test for exchange asymmetry are very simple and do not require any computational skills or inferences about others’ behavior. Subjects are simply asked whether they want to exchange their object for another one. It is thus surprising that a great deal of experience is required to perform such a simple task adequately. Finally, while List (2003a) used unique sports collectors’ items, List (2004) replicated a classic experiment in which choices involve mugs and chocolate bars, again using participants who have experience in a sports-card market. As in List (2003a), only the participants with very intensive market experience in the sports-card market (here 12 or more trades a month) exhibit no exchange asymmetry in the simple choice experiment with mugs and chocolate 3 A natural question is how individuals can lack market experience, since most of us are active in the market nearly every day. This is certainly true with respect to buying. However, most individuals, including typical laboratory subjects, have little experience as sellers. As pointed out by Kahneman, Knetsch, and Thaler (1990), among others, the pathologies in question are most likely to occur on the selling side of the market.
bars. These results also suggest that trading behavior learned in one market can be carried over to other markets in substantially different goods. Taken together, these four observations imply that the market is not easily able to eliminate exchange asymmetry and that learning is slow. Alternatively, it could be the case that no learning occurs at all, but that there is selection (those who are not subject to exchange asymmetry simply trade more both in the market and in the laboratory). Given the slow speed at which participants overcome exchange asymmetry (if they learn at all), we may well wonder whether the market is a poor teacher of this subject4 or, rather, whether the lessons to be learned to overcome exchange asymmetry are simply very difficult. One plausible hypothesis is that market experience helps to overcome exchange asymmetry by reducing the uncertainty that participants face. On the one hand, experience with trading certain types of goods should reduce traders’ uncertainty about their preferences regarding these goods. On the other hand, the learning spillovers mentioned in the final point above suggest that market experience may also reduce the perceived uncertainty in the trading process. Note that in typical exchange experiments, the objects are carefully chosen to have roughly equal market value. Any small amount of uncertainty may then affect subjects’ behavior, even if it appears to be negligible in other experiments. Overall, there are many types of uncertainty that subjects may perceive in the face of a trade opportunity, but we argue that these fall into one of the following two distinct categories: choice uncertainty or trade uncertainty. Choice uncertainty covers all of the potential sources of uncertainty that matter when an individual has to choose between two or more objects. The relative value of the objects at stake could be uncertain, individuals may have incomplete or fuzzy preferences, and so on. Choice uncertainty thus subsumes what we might call object or product uncertainty as well as preference uncertainty. This only addresses uncertainty about which of the consumption bundles is preferred, but does not include phenomena such as loss aversion. Trade uncertainty concerns market procedures. Individuals sometimes overestimate the cost or risk associated with market transactions. They may thus be reluctant to trade if the benefits of doing so are too small, judging that these benefits will not cover the transaction costs (including any risks from trade). In general, trade uncertainty concerns any uncertainty regarding the trading procedure itself. At the most basic level, trading, in contrast to choosing, involves a (human) partner. This implies that issues like other-regarding preferences might come into play. For example, as Plott and Zeiler (2007) argued, the typical designs found in exchange experiments entail the potential risk of offending the experimenter by rejecting an initial endowment perceived as a gift. Anything that could be interpreted as uncertainty regarding transaction cost falls 4 We stress that learning in the marketplace regarding other issues may be much faster. For example, market participants may learn quickly how prices are formed on markets.
into the category of trade uncertainty. If individuals are biased against trading because of such risks or due to a general dislike of controversy or bargaining, or even thinking and deciding, they will exhibit exchange asymmetry that may suggest that they are loss-averse. This distinction5 helps us to make sense of the four observations listed above. If subjects perceive trade uncertainty, then to realize that trading is not as risky as they feared, they need to experience trade in precisely those situations where they were reluctant to trade. If they are free to choose when to trade, however, they will only very rarely make such trades, for example, only if the good to be obtained holds the promise of a substantial gain. The market would thus be a poor teacher because traders will avoid those trades that would teach them the crucial lessons. Hence, if trade uncertainty is largely responsible for exchange asymmetry, this is consistent with learning to overcome it being slow, if it occurs at all. On the other hand, if people learn new trading strategies, they can also apply these to different types of goods, so that the spillover effects observed in List (2004) are plausible, whereas they cannot be explained by a reduction in choice uncertainty. Furthermore, trade uncertainty makes it plausible that exchange asymmetries appear for everyday consumable goods as those used in List (2004), which are unlikely to cause any choice uncertainty. To test our hypothesis that trade uncertainty is a major factor in explaining exchange asymmetry, we design an experiment that (i) controls for choice uncertainty and (ii) impacts on trade uncertainty by providing incentives to consider new trading strategies. Our design incorporates two distinct stages. The first consists of a simple (experimental) market in which subjects can trade with each other without any restrictions on how they interact, bargain, move, and so on. After this training stage, we test for the existence of exchange asymmetry in the second stage, which is carried out in isolation, where subjects can trade only with the experimenter. The second stage is identical in all of our treatments. Our two treatments differ only in one aspect. In one treatment subjects are free to trade at the market stage, while in the other they are forced to trade, that is, if they do not exchange their initial endowment, they lose it. This “forced” trade encourages participants to trade even in situations where they perceive considerable trade uncertainty and would hence normally avoid trade. As a result, relatively little experience can be sufficient to learn new trading strategies. 5
A similar distinction between what we interpret as trade and choice uncertainty has been suggested by, for example, Plott and Zeiler (2007) and Braga and Starmer (2005), who distinguished between “institutional learning” and “value learning.” This distinction parallels, to a certain extent, our own between trade and choice uncertainty, but Braga and Starmer’s institutional learning is more related to subjects’ understanding of the technical functioning of the mechanism (similar to Plott and Zeiler (2005)), whereas trade uncertainty captures the (social) risk associated with the mechanism.
We find that when forced to overcome their reluctance to trade during the market stage, subjects no longer subsequently exhibit exchange asymmetry. In contrast, when trade in the market stage is voluntary, we detect clear exchange asymmetry in the second stage. These results support the hypothesis that the exchange asymmetry in our experiment is largely driven by trade uncertainty and probably to a greater extent than by choice uncertainty. In Section 2, we explain our experimental design and procedures in detail. This is followed by the results in Section 3 and a discussion in Section 4. 2. EXPERIMENTAL DESIGN AND PROCEDURES Our first experiment was run in April 2007 at the University of Antille– Guyane in Martinique with a total of 74 subjects. A replication, with a much larger number of subjects, 246, was run in September and October 2009 in Paris. Our subjects are economics students in the first experiment and students in various disciplines in the second. Participants were exogenously sorted into the different treatments. The laboratory consisted of a circle of 20 small tables. On entering the room, participants drew cards assigning them to one of these tables. There were two stages in the experiment. The first consisted of three interactive trading rounds that provided subjects with the opportunity to gain trading experience. The second stage was performed in isolation and is a standard test of exchange asymmetry. In each trading round, the participants were randomly endowed with one of two different goods. After being given the opportunity to freely inspect the goods, they were assigned one of the goods by drawing a card that was then exchanged for the respective good. All the goods had nontrivial value for the participants: a package of coffee and a package of rice (round 1), a packet of crisps and a can of cola (round 2), and a note pad and a ball pen (round 3). In Experiment 2, the objects were the same, except in round 1, where the rice was replaced by a set of toothbrushes. The first stage of the experiment (i.e., the trading rounds) consisted of either “free-trade” or “forced-trade” rounds of exchange. Interaction, movement, and communication were not restricted in any way and participants could see all of the other participants in the session at any time. In the free-trade sessions, participants were free to trade with any of the participants who were endowed with the other good. Participants could keep the good they possessed at the end of each round, whether it be that with which they were endowed at the beginning of the round or the other. The duration of each trading round was restricted to a total of 5 minutes. The forced-trade sessions differed from the free-trade sessions in only one respect. Participants were only allowed to keep the good they possessed at the end of the round if it was not the type of good with which they were originally endowed. If they were still in possession of their endowment good, they had to
return it to the experimenter. They were thus “forced” to trade, in the sense that they had to trade with a participant who was endowed with a different good so as not to forfeit their goods. This procedure was intended as a shock therapy for participants who are generally reluctant to trade. In all sessions, we gradually introduced an imbalance in the endowments over the three rounds. In the first round, the two goods at stake were given out in equal numbers. So exactly half of the participants received one good and the other half received the other good. In the second round, there were two more items of one of the goods than of the other good, and four more in the third. This increases the number of players who are unable to trade. The aim here was to create pressure on the participants with the good in excess supply to trade fast, in particular in the forced-trade sessions. After the third trading round, participants were given an additional good as compensation while filling out a survey. In any one session, all participants were given the same good. They were informed that they could do whatever they wanted with this good. They were then asked one by one to proceed to an adjacent room with their goods (the ones they kept from the trading rounds and their additional endowed good). Once in isolation, a short exit interview was conducted (which, as expected for such a simple experiment, did not reveal any particular misconceptions by subjects regarding the experimental procedures). The experimenter then offered the opportunity to exchange their additional endowed good for another one. This stage was identical in both the forced-trade and the free-trade sessions. The extent of trade at this last stage then serves to test for any exchange asymmetry. The procedure in this stage closely follows that in List (2003a) so as to make our results comparable. Plott and Zeiler (2007) have demonstrated that exchange asymmetries can be influenced by experimental procedures. To control for any aspects that might influence the results, we applied exactly the same procedure in each session.6 We hence have a 2 × 2 design, with one dimension being the type of market experience (free vs. forced) in the first stage of the experiment and the other being the type of good with which subjects were endowed in the second stage. In Experiment 1, the second stage involves either a rewritable DVD (D) or a package of copy paper (P). In Experiment 2, we returned to the classic mugs and chocolates. Pre-tests with other students revealed that all of these goods were of nonnegligible value for participants. Table I summarizes the treatments. The number of participants in each treatment is shown in parentheses. We carried out four independent sessions in Experiment 1 and 13 sessions in Experiment 2, with 16–20 participants in each session. 6
Specifically, in all sessions, the object received at the end of the trading rounds was put in front of the subjects. It was made clear to them that the object was theirs and that they were free to use it as they wanted, with the precise wording fixed in advance.
TABLE I
SUMMARY OF THE EXPERIMENTAL TREATMENTSa

Experiment 1        Endowment DVD       Endowment Paper     Total
  Free trade        Free-D (18)         Free-P (20)         38
  Forced trade      Forced-D (16)       Forced-P (20)       36
  Total             34                  40

Experiment 2        Endowment Choc.     Endowment Mug       Total
  Free trade        Free-C (60; 3)      Free-M (76; 4)      136
  Forced trade      Forced-C (56; 3)    Forced-M (54; 3)    110
  Total             116                 130

a Endowment is the type of good given as compensation for participation after the end of the first stage. The number of participants in each treatment is given in parentheses. For Experiment 2, the number of independent observations is also given in parentheses. Totals refer to the total numbers across categories.
3. EXPERIMENTAL RESULTS

In the absence of any exchange asymmetry, the average trade rate across endowments in the final stage should be close to 50%. Exchange asymmetry implies a smaller average rate of trade. If, as we hypothesize, trade uncertainty is largely responsible for exchange asymmetry, the latter should be substantially reduced after forced trade. It is also reasonable to expect that the participants in the free-trade sessions who trade more frequently would exhibit less exchange asymmetry than participants who trade less frequently. This would, however, not allow us to infer that the former participants learn to trade, as this could just reflect a selection effect with those who are generally more willing to trade trading more, both in the three rounds of free trade and in the second stage of the experiment.

We find clear support for our main hypotheses. Table II shows the relationship between the initial endowment and the good the subjects leave with and the trade rates. The average trade rate is substantially below the rational level in the free-trade treatments.7 There is thus considerable exchange asymmetry in the free-trade treatments of both experiments. Fisher's exact test indeed rejects for the free-trade treatments the H0 that the good subjects leave with is independent of their endowment (two-sided, p = 0.047 in Experiment 1 and p = 0.001 in Experiment 2).8
7 Overall, we observed a general preference for chocolate in Experiment 2. This general preference is not a problem, as the mean exchange rate controls for any bias in general preferences.
8 For a χ² test, we get p = 0.024 in Experiment 1 and p = 0.001 in Experiment 2. A caveat regarding these tests is that they treat the data as independent even though the participants interacted before we employed our measure of exchange asymmetry. We note, however, that the last stage was conducted independently for each participant and involved trade with the experimenter with a one-shot option, in contrast to open haggling in a large group.
TABLE II
GOODS SUBJECTS LEAVE WITH CONDITIONAL ON GOODS ENDOWED IN THE FREE-TRADE AND FORCED-TRADE TREATMENTSa

               Treatment       Endowment       Leaves With A   Leaves With B   Trade Rate
Experiment 1   Free trade      A (paper)       15              5               25%
                               B (DVD)         7               11              38.9%
                               Average                                         31.9%
               Forced trade    A (paper)       9               11              55%
                               B (DVD)         6               10              37.5%
                               Average                                         46.3%

Experiment 2   Free trade      A (chocolate)   47              13              21.7%
                               B (mug)         38              38              50%
                               Average                                         35.9%
               Forced trade    A (chocolate)   38              18              32.1%
                               B (mug)         39              15              72.2%
                               Average                                         52.2%

a In the absence of exchange asymmetry the good the subjects leave with would have to be independent of their endowment. For the detailed data see the supplementary material (Engelmann and Hollard (2010)).
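For reference, the free-trade tests reported in the text can be reproduced from the counts in Table II with a few lines of Python (a sketch, not the authors' code; the exact two-sided and continuity-correction conventions of scipy may differ slightly from the software used in the paper):

```python
from scipy.stats import fisher_exact, chi2_contingency

# Rows: endowment; columns: good the subject leaves with (free-trade counts from Table II).
exp1_free = [[15, 5],    # endowed with paper: leaves with paper / DVD
             [7, 11]]    # endowed with DVD
exp2_free = [[47, 13],   # endowed with chocolate: leaves with chocolate / mug
             [38, 38]]   # endowed with mug

for name, table in [("Experiment 1", exp1_free), ("Experiment 2", exp2_free)]:
    _, p_fisher = fisher_exact(table, alternative="two-sided")
    chi2, p_chi2, _, _ = chi2_contingency(table, correction=False)
    print(name, round(p_fisher, 3), round(p_chi2, 3))
```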
In contrast, the forced-trade treatments in both experiments yield average trade rates much closer to 50%. As expected, for the forced-trade treatments, Fisher's exact test cannot reject the H0 that the good subjects leave with is independent of their endowment at any conventional level of significance (p = 0.741 in Experiment 1 and p = 0.68 in Experiment 2, and similarly for a χ² test, p = 0.65 in Experiment 1 and p = 0.617 in Experiment 2).

The greater number of independent observations in Experiment 2 allows for more in-depth analysis. Using these data only, we carried out two additional tests. We restrict this analysis to Experiment 2, because we used different goods in Experiment 1, which was also run in a different location. Pooling both experiments would thus require adding a number of controls for only slightly more observations.

First, to compare the trade rates between treatments while taking any possible dependence of the data within sessions into account, we ran Mann–Whitney tests comparing the trade rates per session between the free-trade and forced-trade treatments. The trade rates for the sessions endowed with mugs are generally higher (59.2% overall) than for those endowed with chocolates (26.7%). Therefore, to enable comparison of the impact of free and forced trade using all sessions, while controlling for preferences regarding the endowed goods, we normalize the trade rate in each session by subtracting the average trade rate for this good (note that this is the average across all sessions endowed with the good, both those in free trade and forced trade).
TABLE III
TRADE RATES (TOP) AND NORMALIZED TRADE RATES (BOTTOM) IN THE INDIVIDUAL SESSIONS IN EXPERIMENT 2a

                    Free Trade                        Forced Trade
Endowment choc      0.2; 0.2; 0.25                    0.35; 0.22; 0.39
                    −0.07; −0.07; −0.02               0.08; −0.05; 0.12
Endowment mug       0.6; 0.44; 0.4; 0.56              0.75; 0.69; 0.72
                    0.01; −0.15; −0.19; −0.04         0.16; 0.1; 0.13

a The normalized trade rate is calculated by subtracting the average trade rate for the endowed good.
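The session-level comparison described in the text can likewise be sketched directly from the bottom rows of Table III (again a reconstruction, not the authors' code; the exact p-value depends on how ties and the normal approximation are handled):

```python
from scipy.stats import mannwhitneyu

# Normalized session trade rates from Table III (bottom rows).
free   = [-0.07, -0.07, -0.02, 0.01, -0.15, -0.19, -0.04]
forced = [0.08, -0.05, 0.12, 0.16, 0.10, 0.13]

u_stat, p_value = mannwhitneyu(forced, free, alternative="two-sided")
print(u_stat, p_value)   # p should be roughly 0.01, in line with the value reported in the text
```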
See Table III for the trade rates and normalized trade rates in the individual sessions of Experiment 2. The normalized trade rates are significantly higher in forced trade than in free trade (z = −2.575, p = 0.01).

Second, we ran a probit regression with the dependent variable being whether the participant leaves with the mug or not, and the explanatory variable being a dummy for whether she was endowed with the mug, MugEndow (with robust standard errors clustered at the session level), as shown in Table IV. We can see that the endowment matters in the free-trade treatments (see column 1) but not in the forced-trade treatment (see column 2). Moreover, if we run this regression pooled over both the free- and forced-trade treatments of Experiment 2 and include a dummy for the forced-trade treatment Forced and an interaction effect, MugEndowXForced (see column 3), we find a significant effect of both MugEndow, representing the exchange asymmetry

TABLE IV
PROBIT REGRESSIONS FOR THE EVENT THAT THE PARTICIPANT LEAVES WITH THE MUG IN EXPERIMENT 2a

                   1 Free Trade Only            2 Forced Trade Only          3 Both Treatments
                   Coeff.             p         Coeff.             p         Coeff.             p
MugEndow           0.7835 (0.1233)    <0.001    −0.1257 (0.1309)   0.337     0.7835 (0.1188)    <0.001
Forced                                                                       0.3198 (0.1255)    0.011
MugEndowXForced                                                              −0.9092 (0.1720)   <0.001
Constant           −0.7835 (0.0501)   <0.001    −0.4637 (0.1220)   <0.001    −0.7835 (0.0483)   <0.001

a Robust standard errors (clustered at the session level) in parentheses.
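The specification in column 3 can be written down in a few lines. The sketch below uses statsmodels with simulated placeholder data; the variable names, the data-generating step, and the session-clustered covariance option are assumptions for illustration, not taken from the paper's code.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_sessions, n_per = 13, 19
df = pd.DataFrame({
    "session":   np.repeat(np.arange(n_sessions), n_per),
    "mug_endow": rng.integers(0, 2, n_sessions * n_per),
    "forced":    np.repeat(rng.integers(0, 2, n_sessions), n_per),  # treatment varies by session
})
# Placeholder outcome: leaves with the mug (in the real data this comes from stage 2).
df["leaves_mug"] = rng.integers(0, 2, len(df))

model = smf.probit("leaves_mug ~ mug_endow + forced + mug_endow:forced", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["session"]}, disp=0)
print(result.summary())
```

As a consistency check on the reconstruction of column 1, the implied fitted probabilities match Table II: Φ(−0.7835) ≈ 0.217 for chocolate-endowed subjects and Φ(−0.7835 + 0.7835) = 0.5 for mug-endowed subjects.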
TABLE V
THE GOODS THAT THE SUBJECTS LEAVE WITHa

Experience                Endowment    Leaves With Chocolate   Leaves With Mug   Trade Rate
0 Trades in stage 1       Chocolate    15                      5                 25%
                          Mug          17                      23                42.5%
                          Average                                                33.8%
1 Trade in stage 1        Chocolate    23                      4                 14.8%
                          Mug          17                      14                54.8%
                          Average                                                34.8%
2–3 Trades in stage 1     Chocolate    9                       4                 30.8%
                          Mug          4                       1                 80%
                          Average                                                55.4%

a Conditional on their endowment goods in the free-trade treatment in Experiment 2, separated by the number of trades made in the free-trade training rounds ("Experience"). In the absence of exchange asymmetry, the good the participants leave with should be independent of their endowment.
in the free-trade treatment, and Forced, indicating that participants endowed with the chocolate are more likely to leave with the mug after forced trade than after free trade. This is consistent with forced-trade training making the endowment less relevant. Most importantly, the interaction effect is highly significant and negative with a slightly larger absolute value than that of the coefficient on MugEndow. Under forced trade, the exchange asymmetry is not only significantly smaller than under free trade, but it disappears completely.

To address the additional question whether exchange asymmetry is stronger for those participants in the free-trade treatment who trade only little in the three rounds of market trade, we split the sample according to the number of trades. The results are given in Table V. Most subjects either never trade or do so only once (60 or 58, respectively); only 10 trade twice and 8 trade three times. The average trade rates in the second stage are almost identical for the subjects who do not trade in the first stage of the free-trade treatment and those who did so only once. In both cases there is significant exchange asymmetry according to Fisher's exact test (p = 0.027 and p = 0.022, respectively, and p = 0.017 and p = 0.013 for a χ² test) and probit regressions (see Table VI, columns 1 and 2, respectively). For the subjects who traded two or three times in the training rounds of the free-trade treatment, however, being endowed with the mug has no significant effect on the likelihood of leaving with the mug, according to Fisher's exact test (p = 1 and, for a χ² test, p = 0.648) and the probit regression (column 3 in Table VI). Moreover, in a pooled probit over all subjects in the free-trade treatment, with a dummy for the subject trading at least twice, and an interaction of this dummy with the mug-endowment dummy, the latter is negative and significant at the 10% level (see column 4 in Table VI), showing that exchange asymmetry is significantly smaller (to the extent that it disappears) for
TABLE VI
PROBIT REGRESSIONS FOR THE EVENT THAT THE PARTICIPANT LEAVES WITH THE MUG IN THE FREE-TRADE TREATMENT OF EXPERIMENT 2^a

                             (1) 0 Trades             (2) 1 Trade              (3) 2–3 Trades           (4) All
                             Coeff.            p      Coeff.            p      Coeff.            p      Coeff.             p
MugEndow                     0.8636 (0.2912)   0.003  0.9228 (0.2195)   <0.001 −0.3392 (0.6796)  0.618  0.9254 (0.1881)    <0.001
Minimum 2 Trades                                                                                        0.3700 (0.5002)    0.459
Minimum 2 TradesXMugEndow                                                                               −1.2646 (0.7345)   0.085
Constant                     −0.6745 (0.2619)  0.010  −1.0444 (0.0719)  <0.001 −0.5024 (0.3449)  0.145  −0.8724 (0.1602)   <0.001

^a Robust standard errors (clustered at the session level) in parentheses.
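The Fisher's exact tests reported above can be computed directly from the counts in Table V. The sketch below does this for the subjects with zero training-round trades; it is only an illustration of the test applied to the published cell counts, and the resulting p-value may differ slightly from the reported one depending on the test variant used.

```python
# Fisher's exact test for exchange asymmetry among subjects with 0 training-round
# trades, using the cell counts from Table V: rows are the endowment (chocolate/mug),
# columns are the good the subject leaves with (chocolate/mug).
from scipy.stats import fisher_exact

table_zero_trades = [[15, 5],   # endowed with chocolate: 15 keep it, 5 leave with the mug
                     [17, 23]]  # endowed with mug: 17 leave with chocolate, 23 keep the mug
odds_ratio, p_value = fisher_exact(table_zero_trades, alternative="two-sided")
print(f"odds ratio = {odds_ratio:.2f}, two-sided p = {p_value:.3f}")
```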
Since we have relatively few subjects who trade at least twice in the training rounds and there is no difference between those who never trade and those who trade only once (and those who traded three times in the training rounds actually traded less frequently in the second stage than those who traded twice), the impact of the number of trades is not entirely robust.9 More importantly, in contrast to our comparison between free and forced trade, when we look at the effect of the number of trades carried out in the free-trade training rounds, we cannot distinguish learning from selection. Some subjects may be more prone to trade, and would thus trade more in both the first and second stages of the experiment.

4. DISCUSSION

We have shown that a simple design feature—forced trade—eliminates exchange asymmetry in an environment where the same amount of market experience—free trade—yields significant and substantial exchange asymmetry. Our results strengthen those of List (2003a, 2004), and further call into question the importance of anomalies such as exchange asymmetry in settings where experienced agents populate the market. In particular, we find that even very limited experience can be effective in eliminating exchange asymmetry. We conjecture that this is so because subjects particularly learn if they are forced to make trades that they would otherwise not carry out.

9 This can also be seen in a regression where we replace the dummy for having traded at least twice with the number of actual trades carried out in the training stage. The coefficient on the relevant interaction term just misses significance at the 10% level.
In what follows, we address a number of important questions regarding the causes of exchange asymmetry, what makes it disappear, and whether it is a real phenomenon or an artefact of biased experimental procedures, considering the related literature.

Obviously, something changes because of forced trade. The model of reference-dependent preferences of Kőszegi and Rabin (2006) provides a useful framework to address the question of what may have changed. Their model contains three key elements of individual behavior: "consumption" utility, which is derived from consuming goods; "gain/loss" utility, which is derived from the difference between a reference point and actual consumption; and expectations about future consumption that determine this reference point. Each of these elements may be affected by our forced-trade treatment and thus drive our results.

We can safely rule out the possibility that forced trade affects consumption utility in a different way from free trade. The goods we used in our experiments are simple objects that subjects encounter often in their daily lives and that they had time to freely inspect in both treatments. Furthermore, in the second stage of the experiment, the goods traded are different from those in the first stage, and it would seem implausible that experience with the latter changes the consumption utility derived from the former. Note that what we call choice uncertainty is exactly captured by uncertainty in consumption utility. Hence, to explain our results in the Kőszegi–Rabin framework, some change must have occurred in the gain/loss utility function or in expectations and, thus, the reference point.

One potential change in the gain/loss utility function would come from lower loss aversion, as suggested by List (2003b, p. 23):

    The main effect of endowment is not to enhance the appeal of the good one owns, but rather the pain of giving it up (Loewenstein and Kahneman (1991)). Thus, via market interaction and numerous arbitrage opportunities, practiced agents may have learned to overcome this "pain" and treat the good leaving their endowment as an opportunity cost rather than a loss.
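As a compact reference for the framework just described, one common way to write the Kőszegi–Rabin utility over a single consumption dimension is sketched below in our own notation; this is a standard rendering of their model, not a display taken from the present paper.

```latex
% Sketch (our notation) of reference-dependent utility in the spirit of
% Koszegi and Rabin (2006): consumption utility plus gain/loss utility
% evaluated relative to a reference point r determined by expectations.
u(c \mid r) \;=\; \underbrace{m(c)}_{\text{consumption utility}}
\;+\; \underbrace{\mu\bigl(m(c) - m(r)\bigr)}_{\text{gain/loss utility}},
\qquad \text{with losses looming larger than gains in } \mu .
```

In this notation, the loss-aversion channel suggested by List corresponds to the shape of μ on the loss side, while the expectations channel discussed next corresponds to a shift in the reference point r.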
According to this reading, subjects may shy away from trade when they are particularly reluctant to give up a good. Our results would then suggest that being forced to give up a good teaches subjects that their disutility from losses is actually smaller than they had expected. The results would also suggest that one voluntary trade does not lead subjects to learn sufficiently about their true loss aversion.

The third alternative within the Kőszegi–Rabin framework is then that forced trade has shifted expectations about future trading and hence the reference point. The question is then why these expectations are more receptive to forced than to free trade. In both treatments, and in sharp contrast to most previous experiments, the subjects are fully aware that the experiment in which they are taking part concerns trading (e.g., this was written explicitly on the instruction sheets). Their expectations should then be shifted toward trading in
both treatments. Certainly, subjects who traded once or more in the training rounds of the free-trade treatment should expect to be likely to trade also in the second stage. However, the exchange asymmetry is as strong for those who trade once as for those who do not trade at all (see Table V), and the difference between treatments remains significant when we exclude the latter subjects. Expectations about the reference point thus do not appear to account for the observed difference between treatments. It may also seem surprising that forced trade would shift the expectation exactly enough that we observe exchange close to its rational rate of 50%.10

The explanation we propose is also based on changes in expectations, but does not rely on loss aversion. Assume that subjects form beliefs not only about their future reference points, but also about the possibility that trading will lead to some costly mistakes, which is what we called trade uncertainty. In this interpretation, subjects perceive trading as risky and are thus willing to trade only if they have a strong enough preference for the alternative good. Why then do subjects not appear to learn sufficiently about trade uncertainty when they trade once in the free-trade treatment? Our conjecture is that in the cases where they do trade (because they have strong preferences over the goods), they do not think much about any risk (since their thoughts are focused on the much more attractive alternative good) and hence learn little about trade risks. This is similar to the behavior often observed in experiments where subjects have to price lotteries. When the stakes are high, they tend to focus on the gain and forget about the risk. This type of behavior is a major cause of observed preference reversal (see Seidl (2002) for a survey).11 Forced trade, on the other hand, allows traders to learn that trade is not as risky as they expected, since they are forced to make trades they would otherwise shy away from exactly because they focus on this risk.12

Finally, we address the question of whether exchange asymmetry is a real phenomenon or an artefact of experimental procedures, as Plott and Zeiler

10 A related issue is the possibility that our results are confounded by experimenter demand effects. Subjects may perceive the forced-trade treatment as a signal that the experimenter would like them to trade. This could bias the results in the opposite direction to any naturally occurring exchange asymmetry, without eliminating the forces that drive the latter. If this is the case, there is again no particular reason for the exchange rate to be almost exactly 50%.

11 Alternatively, individual subjects may not perceive the same degree of trade uncertainty in all trades. Perceived trade uncertainty may depend on a number of factors, and subjects may perceive relatively little uncertainty in the market stage and greater uncertainty in the exchange with the experimenter, for example, because they feel more comfortable interacting with their peers. Trading in the face of little uncertainty may well not provide any insight into trade under greater uncertainty.

12 We can also interpret these different explanations from the perspective of Plott's (1996) discovered-preference hypothesis. The explanation based on consumption utility would be a direct application of this hypothesis. List's explanation would correspond to traders discovering their preferences regarding losses rather than their preferences regarding the goods. Our explanation could be interpreted as subjects discovering their preferences over trade itself.
(2007) argued, based on their demonstration that exchange asymmetry is sensitive to various experimental features. Their results are broadly consistent with ours in that the variables they identify as crucial are those closely related to trade uncertainty (such as the method of endowing subjects with a good), whereas those addressing only aspects of choice uncertainty (such as public revelation of choice) do not eliminate exchange asymmetry (although we note that these variables cannot always be unambiguously assigned to one of our two categories).

Nevertheless, our research leaves open the possibility that exchange asymmetries occur outside the laboratory. Trade uncertainty may well be perceived in markets. For example, this may explain (without requiring loss aversion) phenomena such as the fact that people tend to stick with the default for pension plans. As our results suggest, markets may not provide sufficient incentives to explore new strategies that help to overcome exchange asymmetry, hence the asymmetry persists. Forced trade may well be difficult to implement outside of the laboratory, but exogenous shocks may produce comparable effects and hence enable us to study it in the field. For instance, the subprime crisis forced an unusually large number of homeowners to sell their houses. An intriguing question is whether this will have an effect on the future willingness of the affected individuals to trade on the housing or other markets. (The result here could work in the direction of less trade, as the experience of having to trade may be worse than expected and thus reinforce any reluctance to trade.)

To summarize, our forced-trade treatment appears to remove any reluctance to trade, and observed trading rates are thus as if subjects simply traded according to their preferences between the goods, without the endowment mattering. Our experiments certainly have not provided a definitive answer to the question of whether preferences or beliefs have changed. As a more general contribution, our experiments also represent one step in crossing the bridge between the laboratory and the field in both directions. It has been suggested that market experience is a powerful tool to induce rationality. The laboratory offers the kinds of controls that are required to distinguish between different channels via which markets promote rationality. In the other direction, the laboratory results suggest experiments in which the effective learning devices in the laboratory can be implemented in the field. For example, we could consider whether the forced trade experienced in the laboratory spills over to trading behavior in the field, and for how long any such effects last (i.e., have we "cured" our subjects for good?).

REFERENCES

BRAGA, J., AND C. STARMER (2005): "Preference Anomalies, Preference Elicitation and the Discovered Preference Hypothesis," Environmental and Resource Economics, 32, 55–89. [2008]
CAMERER, C. F., AND R. M. HOGARTH (1999): "The Effects of Financial Incentives in Experiments: A Review and Capital-Labor-Production Framework," Journal of Risk and Uncertainty, 19, 7–42. [2006]
ENGELMANN, D., AND G. HOLLARD (2010): "Supplement to 'Reconsidering the Effect of Market Experience on the "Endowment Effect"'," Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/8424_data.zip. [2012]
KAHNEMAN, D., J. KNETSCH, AND R. THALER (1990): "Experimental Tests of the Endowment Effect and the Coase Theorem," Journal of Political Economy, 98, 1325–1348. [2006]
KNETSCH, J. L. (1989): "The Endowment Effect and Evidence of Nonreversible Indifference Curves," American Economic Review, 79, 1277–1284. [2005]
KŐSZEGI, B., AND M. RABIN (2006): "A Model of Reference-Dependent Preferences," Quarterly Journal of Economics, 121, 1133–1165. [2016]
LIST, J. A. (2003a): "Does Market Experience Eliminate Market Anomalies?" Quarterly Journal of Economics, 118, 41–71. [2005,2006,2010,2015]
——— (2003b): "Neoclassical Theory versus Prospect Theory: Evidence From the Marketplace," Working Paper 9736, NBER. Available at http://www.nber.org/papers/w9736. [2016]
——— (2004): "Neoclassical Theory versus Prospect Theory: Evidence From the Marketplace," Econometrica, 72, 615–625. [2006,2008,2015]
LIST, J. A., AND D. MILLIMET (2008): "The Market: Catalyst for Rationality and Filter of Irrationality," The B.E. Journal of Economic Analysis & Policy, 8 (Frontiers), Article 47. [2006]
LOEWENSTEIN, G., AND D. KAHNEMAN (1991): "Explaining the Endowment Effect," Working Paper, Carnegie Mellon University. [2016]
PLOTT, C. R. (1996): "Rational Individual Behaviour in Markets and Social Choice Processes: The Discovered Preference Hypothesis," in The Rational Foundations of Economic Behavior, ed. by K. Arrow, E. Colombatto, M. Perlman, and C. Schmidt. London: MacMillan and New York: St. Martin's Press, 225–250. [2017]
PLOTT, C. R., AND K. ZEILER (2005): "The Willingness to Pay/Willingness to Accept Gap, the 'Endowment Effect,' Subject Misconceptions and Experimental Procedures for Eliciting Valuations," American Economic Review, 95, 530–545. [2008]
——— (2007): "Exchange Asymmetries Incorrectly Interpreted as Evidence of Endowment Effect Theory and Prospect Theory?" American Economic Review, 97, 1449–1466. [2005,2007,2008,2010,2017,2018]
SEIDL, C. (2002): "Preference Reversal," Journal of Economic Surveys, 16, 621–655. [2017]
Dept. of Economics, University of Mannheim, L7, 3-5, D-68131 Mannheim, Germany and Centre for Experimental Economics at the University of Copenhagen and Economics Institute of the Academy of Sciences of the Czech Republic, v.v.i.;
[email protected] and Paris School of Economics and CNRS, 106/112 Boulevard de l’Hôpital, 75647 Paris Cedex 13, France;
[email protected]. Manuscript received February, 2009; final revision received June, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 2021–2042
NOTES AND COMMENTS

IRREGULAR IDENTIFICATION, SUPPORT CONDITIONS, AND INVERSE WEIGHT ESTIMATION

BY SHAKEEB KHAN AND ELIE TAMER1

In weighted moment condition models, we show a subtle link between identification and estimability that limits the practical usefulness of estimators based on these models. In particular, if it is necessary for (point) identification that the weights take arbitrarily large values, then the parameter of interest, though point identified, cannot be estimated at the regular (parametric) rate and is said to be irregularly identified. This rate depends on relative tail conditions and can be as slow in some examples as n^{-1/4}. This nonstandard rate of convergence can lead to numerical instability and/or large standard errors. We examine two weighted model examples: (i) the binary response model under mean restriction introduced by Lewbel (1997) and further generalized to cover endogeneity and selection, where the estimator in this class of models is weighted by the density of a special regressor, and (ii) the treatment effect model under exogenous selection (Rosenbaum and Rubin (1983)), where the resulting estimator of the average treatment effect is one that is weighted by a variant of the propensity score. Without strong relative support conditions, these models, similar to well known "identified at infinity" models, lead to estimators that converge at slower than the parametric rate since, essentially, to ensure point identification one requires some variables to take values on sets with arbitrarily small probabilities, or thin sets. For the two models above, we derive some rates of convergence and propose that one conduct inference using rate adaptive procedures that are analogous to Andrews and Schafgans (1998) for the sample selection model.

KEYWORDS: Irregular identification, inverse weighting, rates of convergence, rate adaptive inference.
1. INTRODUCTION

THERE IS A CLASS OF MODELS in econometrics, mainly arising in limited dependent variable models, that attain identification by requiring that covariates take support in regions with arbitrarily small probability mass. These identification strategies sometimes lead to estimators that are weighted by a density, a conditional probability, or weights that take arbitrarily small values on these regions of small mass. For example, an estimator that is weighted by the density of a continuous random variable effectively only uses observations for which that density is small. Taking values on these "thin sets" is essential (necessary) for point identification in these models. We explore the effect of this weighting on the rates of convergence of the resulting estimators and show

1 We thank a co-editor and three referees for comments that improved the content and exposition in the paper. We also thank J. Heckman, M. Ponomareva, J. Powell, and seminar participants at many universities, as well as conference participants at the 2007 NASM at Duke University and at the 2008 ES Winter Meeting in New Orleans, for helpful comments. Support from the National Science Foundation is gratefully acknowledged by Tamer.
© 2010 The Econometric Society
DOI: 10.3982/ECTA7372
that under general conditions, it is often not possible to attain the regular parametric rate (square root of the sample size) with these models. We label these identification strategies as "irregular" in the sense that estimators based on them do not converge at the parametric (square root of the sample size) rate. Consequently, we argue that these strategies belong to the class of "identified at infinity" approaches (see Chamberlain (1986) and Heckman (1990)). Note that it is part of econometrics folklore to equate the parameter being identified at infinity with slow rates of convergence, as was done in Chamberlain (1986) for some particular models. See also Andrews and Schafgans (1998), who showed how rates can vary with relative tail behavior in a sample selection model. Furthermore, these rates of convergence depend on (unknown) explicit functions of the relative tail behavior of observed and unobserved random variables, making inference more complicated than in standard problems.

The results in this paper are connected to two models that we examine in detail. First, we consider the binary choice model under an exclusion restriction and mean restrictions. This model was introduced to the literature by Lewbel (1997, 1998, 2000). Lewbel demonstrated that this binary model is point identified with only a mean restriction on the unobservables by requiring the presence of a special regressor that is conditionally independent of the error. Under these conditions, Lewbel provided a density weighted estimator for the finite dimensional parameter (including the constant term). This estimator contains a random denominator that can take arbitrarily small values in regions that are necessary for point identification. We show that the parameters in a simple version of that model are irregularly identified unless special relative tail behavior conditions are imposed. Two such conditions are (i) a support condition such that the propensity score hits limiting values with positive measure, in which case root-n estimation can be attained, and (ii) infinite moment restrictions on the regressors. In general, the rate of convergence of the estimator depends on the tail behavior of the special regressor relative to that of the error distribution.

Second, we consider the treatment effects model under exogenous selection. This is an important model that is widely used in economics and statistics to estimate the average treatment effect (ATE) or the treatment on the treated (ATT) parameters in program evaluations. For an exhaustive review of this literature, see Imbens (2004). Hahn (1998), in important work, derived the efficiency bound for this model and provided an estimator that reaches that bound (see also Hirano, Imbens, and Ridder (2003)). In the case where the covariates have a relatively large support, the propensity score is arbitrarily close to 0 or 1.2 We show that the ATE in this case can be irregularly identified

2 The identification strategy for these models is based on writing the estimand as a weighted sum, where the weights (in the denominator) can get arbitrarily small. This intuition applies equally to a matching strategy, since a matching estimator can be written as a weighted estimator where the weights can get small.
(the semiparametric efficiency bound can be infinite), resulting in nonregular rates of convergence. This is related to having the propensity score take values arbitrarily close to 0 and 1 (as opposed to bounded away from 0 and 1). Once again the rates of an inverse weighting procedure will depend on the relative tail thickness of the error in the treatment equation and the covariates. For empirical researchers, this results in instability of the estimator in cases where there is "limited overlap," and care should be taken in making inference in these situations. Busso, DiNardo, and McCrary (2008), in an important recent paper, highlighted this instability in extensive Monte Carlo simulations where the ATE and ATT parameters appear to exhibit bias at moderate sample sizes. See also Frolich (2004).

The next section introduces our class of models. Section 3 considers the binary choice model under mean restrictions, and Section 4 considers the treatment effect models. In each of these sections, we also derive rates of convergence for estimators of parameters of interest that are sample analogs to the moment conditions used to identify these parameters. Section 5 concludes by summarizing and suggesting areas for future research.

2. IRREGULAR IDENTIFICATION AND INVERSE WEIGHTING

The notion of regularity of statistical models is linked to efficient estimation of parameters; see, for example, Stein (1956). The concept of irregular identification has been related to consistent estimation of the parameters of interest at the parametric rate. An important early reference to irregular identification in the literature appears in Chamberlain (1986), which deals with efficiency bounds for sample selection models.3 Chamberlain showed that even though a slope parameter in the selection equation is identified, it is not possible to estimate it at the parametric root-n rate (under the assumptions he maintains in the paper). Chamberlain added that point identification in this case relies "on the behavior of the regression function at infinity." In essence, since the disturbance terms are allowed to take support on the real line, one requires that the regression function takes "large" values so that, at the limit, the propensity score is equal to 1 (or is arbitrarily close to 1). The identified set in the model shrinks to a point when the propensity score hits 1 (or no selection). Heckman (1990) highlighted the importance of identified at infinity parameters in various selection models and linked this type of identification essentially to the propensity score. Andrews and Schafgans (1998) used a smoothed variation of an estimator proposed by Heckman (1990) and confirmed slow rates of convergence of the estimator for the constant, and also showed that this rate depends on the thickness of the tails of the regression function distribution relative to

3 See also Newey (1990) for other impossibility theorems and notions of regular and irregular estimators.
that of the error term. This rate can be as slow as cube root and can be arbitrarily close to root n.

The class of models we consider are those that can be written as a weighted moment condition. The weight in these models can take arbitrarily small values, and this causes point identification to be fragile. This means that the rate of convergence is generally slower than the regular rate. The exact rate is determined by relative tail conditions on observed and unobserved variables. These weighted models arise in various settings, especially in cases where the moment condition is a ratio and where the denominator takes arbitrarily small values in a region of its support. Typical examples are weighted estimators and estimators with "random denominators." An important recent example that fits into this framework is Graham and Powell (2009). We next apply this framework to specific models, beginning with the binary choice model. Then we examine the average treatment effect model under conditional independence (unconfoundedness) assumptions.

2.1. Overview of Approach

We consider models where the parameter of interest can be written as an expectation of some function of observed variables,

(2.1)    θ0 = E[g(yi, xi)],
where θ0 is a finite dimensional parameter of interest and (yi, xi) is a finite dimensional vector of observed random variables. Of particular interest in this paper is the class of models above in which E[‖g(yi, xi)‖²] = ∞, where ‖·‖ denotes the Euclidean norm. The key problem with an empirical analog estimator of θ0,

(2.2)    θ̂ = (1/n) Σ_{i=1}^{n} g(yi, xi),

is that although it sometimes is consistent, it has the unattractive property that, since E[‖g(yi, xi)‖²] = ∞, it can no longer be concluded that θ̂ is root-n consistent and asymptotically normal. One approach to exploring the properties of an analog estimator is to introduce a trimming sequence τni = τ(yi, xi, n) > 0 with τni → 1, such that

(2.3)    E[τni² ‖g(yi, xi)‖²] ≡ γn < ∞.

But note that since τni is converging to 1, γn will generally diverge to infinity as the sample size increases. Nonetheless, we can explore the asymptotic properties of the trimmed estimator

(2.4)    θ̂n = (1/n) Σ_{i=1}^{n} τni g(yi, xi).
This trimming, which disregards observations in the sample where the value of the variance is large, will introduce a bias which has to be accounted for. So, ultimately, for the models we consider here (binary response and ATE), we address the following questions.
• What is the fastest rate of convergence for θ̂n?
• What is the limiting distribution for θ̂n?
• How does one conduct inference for θ0 given that the rate of convergence of its estimator will generally be unknown?
We shed light on these questions by showing first that, typically, parameters in these models cannot be regularly estimated. To derive limiting distributions for these parameters, we propose an approach analogous to Andrews and Schafgans (1998) that studentizes the estimator.

3. BINARY CHOICE WITH MEAN RESTRICTIONS

In the standard binary response model, y = 1[x′β + ε ≥ 0], Manski (1988) showed that a conditional mean restriction does not identify the parameter β. In fact, he showed that the mean independence assumption is not strong enough in binary response models to even bound β. Hence, we modify this model by adding more assumptions to ensure point identification. To simplify the discussion here, consider the special case where x consists only of a constant and a single regressor v with a coefficient that is normalized to equal 1, as introduced by Lewbel (1997); that is,

(3.1)    yi = 1[α + vi − εi ≥ 0],
where vi is a scalar random variable that is independent of εi, both εi and vi have support on the real line, and E[εi] = 0. We observe both y and v, and the object of interest is the parameter α. The location restriction on ε point identifies α. We start with the equality

(3.2)    P(yi = 1 | vi) = Fε(α + vi),

where Fε(·) is the cumulative distribution function (c.d.f.) of ε, and we assume that this c.d.f. is a strictly increasing function. Lewbel (1997) derived the relation

(3.3)    α = E[(yi − I[vi > 0]) f(vi)^{-1}] ≡ E[(yi − I[vi > 0]) w(vi)],

with the weight function w(vi) = f(vi)^{-1}, where f(·) here denotes the density of vi. This type of identification is sensitive to conditions imposed on the support of v and ε.
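Before turning to what happens when the support of v is restricted, it may help to record why (3.3) recovers α. The short derivation below is our own reconstruction from the definitions above (it uses only E[εi] = 0 and an integration by parts), not a display taken from the paper.

```latex
% Sketch (our notation): why the Lewbel moment (3.3) equals alpha when E[eps_i] = 0.
% Condition on v_i and use P(y_i = 1 | v_i) = F_eps(alpha + v_i) from (3.2):
E\!\left[\frac{y_i - 1[v_i > 0]}{f(v_i)}\right]
   \;=\; \int_{-\infty}^{\infty} \bigl(F_\varepsilon(\alpha + v) - 1[v \ge 0]\bigr)\, dv
   \;\overset{\text{int. by parts}}{=}\; -\int_{-\infty}^{\infty} v\, f_\varepsilon(\alpha + v)\, dv
   \;=\; -\,E[\varepsilon_i - \alpha] \;=\; \alpha .
```

The same calculation also shows why the weight f(vi)^{-1} must blow up: covering the full support of ε requires the integral over v to run over the whole real line.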
For example, point identification of α is lost when one excludes sets of v of arbitrarily small probability when v is in the tails. The model in this case will not contain any information about α (in the sense of trivial bounds). This is a case of nonrobust or thin-set identification, which can be shown to be a defining characteristic of this kind of identification. To see this, note that if we restrict the support of v to lie on the set [−K, K] for any K > 0, α will not be point identified:

    α = − ∫_{−∞}^{−K} v fε(α + v) dv − ∫_{−K}^{K} v p′(v) dv − ∫_{K}^{∞} v fε(α + v) dv,
               (i)                        (ii)                      (iii)

where p′(·) denotes the derivative of the probability function P(yi = 1 | vi). We see that only (ii) is identified, while (i) and (iii) are not since, in (3.2), we can only learn Fε from α − K to α + K. So, no matter how large K is, the model is set identified. In addition, we see that it is possible to choose the unidentified portion of fε(·) (parts (i) and (iii) above) in such a way that the model provides no information about α. This is similar to results in Magnac and Maurin (2007) (we view this property as similar to the nonrobustness of the sample mean).

To guarantee point identification when ε takes support on the real line ((−∞, +∞)), v is also required to take arbitrarily large values. This will mean that the density of vi vanishes on a set of arbitrarily small measure, in this case |v| > M, for an arbitrarily large constant M. Here, the weight function in (3.3) becomes unbounded. This suggests that identification of α based on (3.3) is irregular in the sense that an analog estimator will not generally converge at the parametric rate.4 The next theorem confirms this by showing that the efficiency bound is infinite for the model with or without additional regressors.

THEOREM 3.1: In the binary choice model y = 1[α + xβ + v − ε ≥ 0], with exclusion restriction and unbounded support on the error terms, and where ε | x, v =d ε | x and E[ε | x] = 0, if the second moments of (x, v) are finite, then the semiparametric efficiency bound for α and β is not finite.

The proof of this theorem is provided in the Appendix and basically shows that the binary choice model cannot be estimated at the regular rate. We confirm this later by showing how rates of an analog estimator can vary widely with tail behavior conditions on observed and unobserved variables. This nonuniform behavior of the analog estimator makes conducting inference difficult.
4 In cases where ε has bounded support, it is possible that α is identified regularly if the support of v is large relative to that of ε. One can check that easily since, for those large values of v, the choice probability hits 0 or 1.
3.1. Relative Tail Behavior and Rates of Convergence in Special Cases

In this section, we derive the rate of convergence for the estimator of α in some examples. This will shed light on the rates of convergence in simple and generic cases. This rate of convergence depends on the relative tail behavior of the density of v versus the "tails" of the propensity score, or the rate at which the propensity score approaches 0 and 1. This rate for the estimator of the intercept term in (3.1) is slower than the parametric rate. The result holds for any model where the tail of the special regressor v is as thin as or thinner than the tail of the error term. We also present examples where the rate of convergence reaches the regular parametric rate. For example, as we mention below, when vi is Cauchy, then the estimator of α converges at the root-n rate.5 Here, we focus on the rate of convergence for the estimator using the true regressor density. This rate of convergence is of the same order as that of the estimator using a kernel density estimator (see Khan and Tamer (2010) for details). Consider the estimator of α that was proposed by Lewbel (1997), but with known density function f(·):

(3.4)    α̂ = (1/n) Σ_{i=1}^{n} [(yi − I[vi > 0]) / f(vi)] · I[|vi| ≤ γ2n].
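As a concrete illustration of (3.4), the short simulation below computes this infeasible trimmed estimator (true density f known) in the logistic design discussed later in this section. The value of α, the sample size, and the trimming constant are our own illustrative choices; this is a sketch of the estimator's mechanics, not the authors' code.

```python
# Infeasible trimmed Lewbel-type estimator (3.4) for the intercept alpha in
# y = 1[alpha + v - eps >= 0], with logistic v and eps and the true density f known.
import numpy as np

def trimmed_alpha_hat(y, v, f_v, gamma_2n):
    """Sample analog of (3.4): average of (y - 1[v > 0]) / f(v) over |v| <= gamma_2n."""
    trim = np.abs(v) <= gamma_2n
    return np.mean((y - (v > 0)) / f_v * trim)

rng = np.random.default_rng(0)
alpha, n = 0.5, 50_000
v = rng.logistic(size=n)
eps = rng.logistic(size=n)
y = (alpha + v - eps >= 0).astype(float)
f_v = np.exp(-v) / (1.0 + np.exp(-v)) ** 2   # standard logistic density of v

gamma_2n = 0.5 * np.log(n)                   # slowly growing trimming constant (illustrative)
print("alpha_hat =", trimmed_alpha_hat(y, v, f_v, gamma_2n))
```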
Next let ᾱn = E[α̂]. In what follows we will establish a rate of convergence in some special cases. To do so, we first define the sequence of constants v(γ2n) = Var( (yi − I[vi > 0])/f(vi) · I[|vi| ≤ γ2n] ) and let hn = v(γ2n)^{-1}. As we will show below, hn → 0 in some generic examples, resulting in a slower rate of convergence, as is to be expected given our efficiency bound calculations in the Appendix. We now derive the rate of convergence of the bias term in √(n hn)(ᾱn − α), where α = E[(yi − I[vi > 0])/f(vi)]. We have

(3.5)    bn ≡ ᾱn − α = − ∫_{γ2n}^{∞} (p(v) − 1) dv − ∫_{−∞}^{−γ2n} p(v) dv,
where p(v) = P(yi = 1|vi = v) is the “propensity score.” Clearly we have limn→∞ bn = 0 if we maintain that limn→∞ γ2n = ∞. First we calculate the form of the variance term v(γ2n ) as a function of γ2n . Note that we can express v(γ2n ) 5 The notion of attaining a faster rate of convergence when moments are infinite is not new. For example, it is well known that when regressors have a Cauchy distribution in the basic linear model, ordinary least squares (OLS) is superconsistent. We also point out that this case is ruled out by the conditions in our impossibility theorem.
2028
S. KHAN AND E. TAMER
as the integral (3.6)
v(γ2n ) =
p(v)(1 − p(v)) dv f (v) −γ2n
yi − I[vi > 0] I[|vi | ≤ γ2n ] vi − α¯ n + Var E f (vi ) γ2n
We focus on the first term because we now show that the second term is negligible when compared to the first. The second term in the variance is of the form
yi − I[vi > 0] (3.7) I[|vi | ≤ γ2n ] vi − α¯ n Var E f (vi ) The variance (3.7) is of the form E[(α¯ n (vi ) − α¯ n )2 ], where α¯ n (vi ) denotes the conditional expectation in (3.7). Note that since α¯ n converges to α0 and E[α¯ n (vi )] = α¯ n , the only term in (3.7) that may diverge is γ2n (p(v) − I[vi > 0])2 2 (3.8) dv E[α¯n (vi ) ] = f (v) −γ2n 0 γ2n (1 − p(v))2 (p(v))2 dv + dv = f (v) f (v) −γ2n 0 γ which is of equal or smaller order6 than −γ2n2n p(v)(1−p(v)) dv, the first piece of the f (v) variance term in (3.6). So, as far as deriving the fastest rate of convergence for the estimator, we can ignore the second term in (3.6) and focus on the first term. As for the bias term bn , we see that γ2n (3.9) {Fε (v + α) − 1[v ≥ 0]} dv − α bn = E[α¯ n ] − α = −γ2n
where Fε (·) denotes the c.d.f. of εi . This bias term behaves asymptotically like (3.10)
bn ≈ γ2n (1 − p(γ2n )) + γ2n p(−γ2n )
Note that the bias term does not directly depend on the density of vi . With these general results, we now calculate the rate of convergence for some special cases that correspond to various tail behavior conditions on the regressors and the error terms. 6 This can be shown most easily by applying l’Hôpital’s rule to the ratio of γ γ2n (1[v≥0]−p(v))2 dv/ −γ2n2n p(v)(1−p(v)) dv when the denominator diverges. When the denomi−γ2n f (v) f (v) nator converges to a constant, it can also be shown that the numerator converges to a constant.
INVERSE WEIGHT ESTIMATION
2029
• Logit Errors/Logit Regressors. Here we assume that the latent error term and the regressors all have a standard logistic distribution. We have (3.11)
v(γ2n ) = 2γ2n
and
bn = ln
1 + e−γ2n −α 1 + e−γ2n +α
and by our approximation7,8 (3.12)
bn ≈ −2 exp(−γ2n )
2 exp(−2γ2n ) to get γ2n = So to minimize mean squared error, we set γn2n = γ2n 1/2 O(log n ), resulting in the rate of convergence of the estimator: n (3.13) (αˆ − α0 ) = Op (1) log n1/2
Furthermore, from the results in the previous section, the estimator will be asymptotically normally distributed, and have a limiting bias. γ•2n Logit 2 Errors/Normal Regressors. The variance is of the form 2v(γ2n )−1= exp(v /2) dv, which can be approximated as v(γ2n ) = O(exp(γn /2)γ2n ). The bias is of the form bn ≈ exp(−γ2n ). So the mean squared error (MSE) minimizing sequence is of the form γ2n = O( log n). The rate of convergence n−1/4 is Op ( √ ). 4 log n
• Other Rates. Similarly, one can easily show that in the case of Normal er-
n) rors and Normal regressors, we get that the estimator is Op ( log(log ). With n logit errors and Cauchy regressors, we get the standard root-n rate with a similar regular rate with Probit errors and Cauchy regressors. • Differing Support Conditions: Regular Rate. Let the support of the latent error term be strictly smaller than the support of the regression function. For γ2n sufficiently large, p(γ2n ) takes the value 1, so in this case we need not use the above approximation and limγ2n →∞ v(γ2n ) is finite. This implies we can let γ2n increase as quickly as possible to remove the bias, so the estimator is root-n consistent and asymptotically normal with no limiting bias. This is a case where, for example, α + v in (3.1) has strictly larger support than ε’s.
7 This approximation is based on the case where α = 0; when α = 0, the estimator is unbiased and converges at a rate arbitrarily close to root n. Actually, the approximation in (3.10) is not sharp in this case, as the limit of the actual bias over the approximate bias converges to 0. Hence for this example, we worked with the approximation bn ≈ p(γ2n ) − 1 − p(−γ2n ). 8 The way these approximations are derived here and in all the examples is (i) we guess at an approximating function (say, in this case, −2 exp(−γ2n ) for the bias in this example) and then (ii) we take the ratio of bn to its proposed approximation (here bn in (3.9) divided by its approximating function in (3.12)) and show via l’Hôpital’s rule that the limit is a nonzero constant that is finite.
2030
S. KHAN AND E. TAMER
As the results in this section illustrate, the rates of convergence for the analog inverse weight estimators vary with tail behavior conditions on both observed and unobserved random variables, and the rate rarely coincides with the parametric rate.9 Hence, semiparametric efficiency bounds might not be as useful for this class of models. Overall, this confirms the results in Lewbel (1997) about the relationship between the thickness of the tails and the rates of convergence. An important question is on how to conduct inference with varying rates. One approach would be along the lines of that proposed in Andrews and Schafgans (1998) for the sample selection model. To illustrate the approach for the problem at hand,10 we let αˆ n denote the trimmed variant of the estimator proposed in Lewbel (1997) for the intercept term, 1 yi − I[vi > 0] I[|vi | ≤ γ2n ] n i=1 f (vi ) n
(3.14)
αˆ n =
where f (vi ) is the density of v. The estimator includes the additional trimming term I[|vi | ≤ γ2n ], where γ2n is a deterministic sequence of numbers that satisfies limn→∞ γ2n = ∞ and limn→∞ γ2n /n = 0. Let Sˆ n denote a trimmed estimator of its asymptotic variance if conditions were such that the asymptotic variance were finite: 1 p(vi )(1 − p(vi )) Sˆ n = I[|vi | ≤ γ2n ] n i=1 f (vi )2 n
(3.15)
where p(·) denotes the choice probability function (or the propensity score). Define p(vi )(1 − p(vi )) (3.16) v(γ2n ) = E I[|vi | ≤ γ2n ] f (vi )2 yi − I[vi > 0] I[|vi | ≤ γ2n ] (3.17) b(γ2n ) = α0 − E f (vi ) and (3.18)
Xni =
yi − p(vi ) I[|vi | ≤ γ2n ] f (vi )
9 There are cases when the parametric rate can be attained, such as when the regressor has a Cauchy distribution or the error term has bounded support. Both cases are ruled out in the conditions of our impossibility theorem. 10 Here we show results for an infeasible estimator where both the regressor density and the propensity score function are assumed to be known. For results for the feasible estimator, see Khan and Tamer (2010).
INVERSE WEIGHT ESTIMATION
2031
We have the following theorem, whose proof can be found in Khan and Tamer (2010). THEOREM 3.2: Suppose (i) γ2n → ∞, (ii) ∀ε > 0, (3.19)
lim
n→∞
and (iii) (3.20)
1 E Xni2 I[|Xni | > ε nv(γ2n )] = 0 v(γ2n )
√ nv(γ2n )b(γ2n ) → 0. Then √ −1/2 nSˆn (αˆ − α0 ) ⇒ N(0 1)
4. TREATMENT EFFECTS MODEL UNDER EXOGENOUS SELECTION This section studies another example of a parameter that is written as a weighted moment condition and where regular rates of convergence require that support conditions be met, essentially guaranteeing that the denominator be bounded away from zero. We show that the average treatment effect estimator under exogenous selection cannot generally be estimated at the regular rate unless the propensity score is bounded away from 0 and 1, and so estimation of this parameter can lead to unstable estimates if these support (or overlap) conditions are not met. A central problem in evaluation studies is that potential outcomes that program participants would have received in the absence of the program are not observed. Letting di denote a binary variable that takes the value 1 if treatment was given to agent i and 0 otherwise, and letting y0i , y1i denote potential outcome variables, we refer to y1i − y0i as the treatment effect for the ith individual. A parameter of interest for identification and estimation is the average treatment effect, defined as (4.1)
β0 = E[y1i − y0i ]
One identification strategy for β0 was proposed by Rosenbaum and Rubin (1983) under the following assumption: ASSUMPTION 1—ATE Under Conditional Independence: Let the following statements hold: (i) There exists an observed variable xi such that di ⊥ (y0i y1i )|xi (ii) 0 < P(di = 1|xi ) < 1 ∀xi . See also Hirano, Imbens, and Ridder (2003). The above assumption can be used to identify β0 as (4.2)
β0 = E[E[yi |di = 1 xi ] − E[yi |di = 0 xi ]]
2032
S. KHAN AND E. TAMER
or (4.3)
β0 = EX [E[yi |di = 1 p0 (xi )] − E[yi |di = 0 p0 (xi )]]
where p0 (xi ) = P(di = 1|xi ) denotes the propensity score. The above parameter can be written as yi (di − p0 (xi )) β0 = E (4.4) p0 (xi )(1 − p0 (xi )) This parameter is a weighted moment condition where the denominator gets small if the propensity score approaches 0 or 1. Also, identification is lost when we remove any region in the support of xi (so fixed trimming will not identify β0 above). Without any further restrictions, there can be regions in the sup1 port of x where p (x)(1−p becomes arbitrarily large, which can happen under 0 0 (x)) Assumption 1(ii), suggesting an analog estimator may not converge at the parametric rate. Whether it can is a delicate question, and depends on the support conditions and on the propensity score. Hahn (1998), in very important work, derived efficiency bounds for the model based on Assumption 1. These bounds can be infinite and, in fact, there will be many situations where this can happen. For example, when the treatment equation corresponds to a propensity score function of p0 (x) = F(x), where F(·) denotes the c.d.f. of the error term in the treatment equation, and there is one continuous regressor xi , where the distribution of the xi is the same as the treatment equation error term so that (ii) is satisfied, then the variance bound in Hahn (1998) is infinity. Thus we see that, as was the case in the binary choice model, rates of convergence will depend on relative tail behaviors of regressors and error terms. In fact, one can even establish an impossibility theorem, analogous to Theorem 3.1, which we state here for the ATE: THEOREM 4.1: In the treatment effects model, under the conditions in Assumption 1 and adding the assumption (iii) the support of p0 (xi ) is (0 1), then the information bound for β0 is 0. The proof is left to the Appendix. Consequently, inference can be more complicated and one might want to supplement standard efficiency bound calculations. Next, we provide rate calculations in simple examples. These should provide a flavor of the role that relative tail conditions play in this model. 4.1. Average Treatment Effect Estimation Rates We conduct the same rate of convergence exercises for estimating the average treatment effect as the ones we computed for the binary response model in the previous section. As a reminder, we do not compute the exact variance
2033
INVERSE WEIGHT ESTIMATION
or bias values: rather, we are interested in the rate at which the need to behave as sample size increases. We will explore the asymptotic properties of the estimator n (1 − di )yi 1 di yi ˆ − I[|xi | ≤ γ2n ] (4.5) β= n i=1 p0 (xi ) 1 − p0 (xi ) We note that this estimator is infeasible, this time because we are assuming the propensity score is known. As in the binary choice model, this will not effect our main conclusions (its asymptotic variance will be different from the feasible estimator, but will have the same order; see Khan and Tamer (2010)). Also, as in the binary choice setting, we assume that it suffices to trim on regressor values and that the denominator only vanishes in the tails of the regressor. Furthermore, to clarify our arguments, we will assume that the counterfactual outcomes are homoskedastic with a variance of 1. Define hn = v(γ2n )−1 and ˆ β¯ n = E[β] 1 v(γ2n ) = E I[|xi | ≤ γ2n ] p0 (xi )(1 − p0 (xi ))
bn = β¯ n − β0
Carrying the same arguments as before, we explore the asymptotic properties of v(γ2n ) and bn as γ2n → ∞. Generally speaking, anytime v(γ2n ) → ∞, the fastest rate for the estimator will be slower than root n. We can express v(γ2n ) as the integral γ2n f (x) (4.6) dx v(γ2n ) = −γ2n p0 (x)(1 − p0 (x)) For the problem at hand, we can write bn approximately as the integral
∞ −γ2n (4.7) m(x)f (x) dx + m(x)f (x) dx bn = γ2n γ2n
−∞
where m(x) is the conditional ATE (CATE). We now consider some particular examples that correspond to tail behavior on the error term in the treatment equation and the regressor. Logit Errors/Regressors/Bounded CATE Here we assume that the latent error term and the regressors both have a standard logistic distribution. To simplify calculations, we also assume the conditional average treatment effect is bounded on the regressor support. We consider this to be a main example, as the results that arise here (a slower than root-n rate) will generally also arise anytime we have distributions whose tails
2034
S. KHAN AND E. TAMER
decline exponentially, such as the normal, logistic, and Laplace distributions. From our calculations in the previous section, we can see that (4.8)
v(γ2n ) = γ2n
bn ≈ γ2n
exp(−γ2n ) (1 + exp(−γ2n ))2
Clearly v(γ2n ) → ∞, resulting in a slower rate of convergence. To get the MSE minimizing rate, we solve for γ2n that set v(γ2n )/n = b2n . We see this matches up when γ2n = 12 log n, resulting in the rate of convergence of the estimator: n (4.9) (βˆ − β0 ) = Op (1) log n1/2 Furthermore, from the results in the previous section, the estimator will be asymptotically normally distributed and have a limiting bias. Normal Errors/Logistic Regressors/Bounded CATE Here we assume the latent error term has a standard normal distribution and the regressors both have a standard logistic distribution. To simplify calculations, we also assume the conditional average treatment effect is bounded γ exp(−x) dx. By multiplying on the regressor support. We have v(γ2n ) ≈ −γ2n2n (x)(1− (x)) and dividing the above fraction inside the integral by φ(x), taking the standard normal probability density function (p.d.f.), and using the asymptotic approximation to the inverse Mills ratio, we get
γ2n 1 2 exp(x2 /2)x dx = O exp γ2n v(γ2n ) ≈ 2 −γ2n ∞ −γ2n f (x) dx + f (x) dx ≈ exp(−γ2n ) bn = γ2n
−∞
so the MSE minimizing value of γ2n is γ2n = O( log n). This leads to a rate of convergence that is Op (n−1/4 ). Thus we see that rates of convergence can vary significantly as in the binary choice model and, consequently, inference procedures would have to be rate-adaptive. Also, as with the binary choice model, there are cases where the parametric rate can be attained, such as Cauchy errors in the treatment equation or support conditions that bound the propensity score away from 0 or 1, but those cases are ruled out by our impossibility theorem. 5. CONCLUSION This paper points out a subtle property of estimators based on weighted moment conditions. In some of these models, there is a direct link between identification and estimability, in that if it is necessary for (point) identification that
INVERSE WEIGHT ESTIMATION
2035
the weights take arbitrary large values, then the parameter of interest, though point identified, cannot be estimated at the regular rate. This rate depends on relative tail conditions and, in some examples, can be as slow as O(n−1/4 ). Practically speaking, this nonstandard rate of convergence can lead to numerical instability and/or large standard errors. See, for example the recent work of Busso, DiNardo, and McCrary (2008) for some numerical evidence based on the ATE parameter. We also propose in this paper a studentized approach to inference in which there is no need to know explicitly the rate of convergence of the estimator. While we illustrate our points in the context of two specific models (binary choice and treatment effects), the ideas here can be extended to other models involving inverse densities, inverse propensity scores, and, more generally, inverse probabilities. For example, in panel data models, recent work by Honore and Lewbel (2002) proposed a procedure involving inverse density weighting, and in randomly censored regression models, inverse weighting (by the censoring probability) procedures are often employed (see, e.g., Koul, Susarla, and Ryzin (1981)). How rates of convergence vary with relative tail conditions and development of rate-adaptive procedures for those models as well, would be worth exploring. There are a few remaining issues that need to be addressed in future work. One is the choice of the trimming parameter, γ2n , in small samples. We provide bounds on the rate of divergence to infinity of this parameter in this paper. One approach to “pick” a rate is similar to what is typically done for kernel estimators for densities: mainly pick the parameter that minimizes the first term expansion of the mean squared error. For example, in the ATE case, γ2n can be chosen to minimize the sum of an empirical version of (4.6) and the square of the empirical version of (4.7). Even though identification has usually been a problem that is conceptually distinct from estimation, our results show that in certain models, identification and estimation are linked, and that although certain parameters are point identified, one cannot estimate them precisely in general. A deeper understanding of this link between identification and estimation would be helpful. APPENDIX: INFINITE BOUNDS As alluded to earlier in the paper, the slower than parametric rates of the inverse weight estimators do not necessarily imply a form of inefficiency of the procedure. The parametric rate is unattainable by any procedure. We show that efficiency bounds are infinite for a variant of model (3.1) above. Now, introducing regressors xi , we alter the assumption so that ε|x v =d ε|x and that E[ε|x] = 0 (here, =d means “has the same distribution as”). We also impose other conditions such that ε|x and v|x have support on the real line for all x. The model we consider now is (A.1)
y = 1[α + xβ + v − ε ≥ 0]
2036
S. KHAN AND E. TAMER
We modify and restate Theorem 3.1 first; its proof follows. THEOREM A.1: In the binary choice model (A.1) with exclusion restriction and unbounded support on the error terms, and where ε|x v =d ε|x and E[ε|x] = 0, if the second moments of x v are finite, then the semiparametric efficiency bound for α and β is not finite. REMARK A.1: The proof of the above results are based on the assumption of second moments of the regressors being finite. This type of assumption was also made in Chamberlain (1986) in establishing his impossibility result for the maximum score model. PROOF OF THEOREMS 3.1 AND A.1: We follow the approach taken by Chamberlain (1986). Specifically, look for a subpath around which the variance of the parametric submodel is unbounded. We first define the function g0 (t x) = P(εi ≤ t|xi = x) Note that it does not depend on the v because of our assumption of conditional independence. In what follows, t will generally correspond to the index x β + v. The likelihood function we will be working with is based on the density function f (y x z β g) = g(xβ + v x)y (1 − g(xβ + v x))1−y First we require some definitions: DEFINITION A.1: The family of conditional distributions Γ consists of all functions g : Rk → R such that for all (t x) ∈ R × Rk−1 , the following statements hold: (i) g is continuous. (ii) g (t x), the partial derivative of g(t x) with respect to its first argument, is continuous and positive. (iii) lim s→−∞ g(s x) = 0 and lims→+∞ g(s x) = 1 (iv) sg (s x) ds = 0. DEFINITION A.2: The set of subpaths Λ consists of the paths
h λ(δ) = g0 C δ − δ0 where g0 is the “true” distribution function, which is assumed to be an element of Γ , C (·) denotes the c.d.f. of random variables with a Cauchy distribution, location parameter 0, and scale parameter 1, and h : Rk+1 → R is a positive, continuously differentiable function.
INVERSE WEIGHT ESTIMATION
2037
We note that (A.2)
d −g0 λ(δ) = dδ πh δ=δ0
g0 So for any h that satisfies s πh ds = 0 λ(δ) ∈ Γ for δ in a neighborhood of δ0 , as all four conditions will be satisfied. We also note that the scores of the root likelihood function are ψj (y x v) =
1 −1/2 yg0 (xβ0 + v x) − (1 − y)(1 − g0 (xβ0 + v x))−1/2 2 × g0 (xβ0 + v x)x(j)
where x(j) is the jth component of x and ψλ (y x v) =
1 −1/2 yg0 (xβ0 + v x) − (1 − y)(1 − g0 (xβ0 + v x))−1/2 2 −g0 (xβ0 + v x) × πh(xβ0 + v x)
We will show that the following theorem holds: THEOREM A.2: Let Iλj denote the partial information for the jth component of β0 as defined in Chamberlain (1986). If P(xi β0 + vi = 0) = 0 and if the second moment of the vector (xi vi ) is finite, then inf Iλj = 0
λ∈Λ
1 Note that, heuristically, we will get the desired result if we define πh(txv) = a(x v)c(t), where a(x v) is arbitrarily close to fε|X (xβ0 +v|x)/g0 (xβ0 +v)·x(j) and c(t) is the function that takes value 1 on its support. To fill in these details, we define Π and Qλ as Π(A) = g0 (xβ0 + v x)(1 − g0 (xβ0 + v))−1 fV (v) dFX (x) A
and
Qλ =
2 b(xβ0 + v x)x(j) − h(xβ0 + v x) dΠ(v x)
where b(· ·) = g0 (· ·)/g0 (· ·). We can then add and subtract a(x v) to the above integrand, inside the square.
2038
S. KHAN AND E. TAMER
Note that we can make the term 2 b(xβ0 + v x)x(j) − a(x v) dΠ(v x) arbitrarily small by the denseness of the space of continuously differentiable functions. This result, that the above integral can be made arbitrarily small, follows from Lemma A.2 in Chamberlain (1986) if we can show that b(xβ0 + v) ∈ L2 (Π), where dΠ(v x) = g0 (xβ0 + v x)(1 − g0 (xβ0 + v x))−1 fX (x)fV (v) dx dv In other words, all we need to show is the finiteness of the integral g0 (xβ0 + v x)2 x2 fX (x)fV (v) dx dv (A.3) g0 (xβ0 + v x)(1 − g0 (xβ0 + v x)) (j) Finiteness follows for all distributions satisfying the assumptions in the definition of Γ , which implies the uniform (in v x) boundedness of the term g0 (xβ0 + v x)2 g0 (xβ0 + v x)(1 − g0 (xβ0 + v x)) g (tx)
0 This is true because under our assumptions, for any finite t, g (tx)(1−g is fi0 0 (tx)) nite, so the only possibility of the fraction becoming unbounded is as t → ±∞. However, by considering t → ∞ and applying l’Hôpital’s rule, this would be equivalent to the unboundedness of limt→+∞ g0 (t x) (with g0 (t x) the second partial derivative of g0 (t x)). But that would contradict g0 (t x) being a density function limiting to 0 as t → ±∞. The boundedness of the above ratio implies finiteness of the integral in (A.3) by the finite second moment assumption of x, and the fact that fV (v) is a density function and integrates to 1. Finally, the term [a(x v) − a(x v)c(t)]2 dΠ(v x)
can be made arbitrarily small by setting c(t) to 1 in most of its support. Furthermore, with this same definition of c(t), we can satisfy c (t)t dt = 0. Q.E.D. A.1. Proof of Theorem 4.1 Our proof works with a more restrictive model: this does not change the results of the theorem because we prove an infinite bound despite adding additional restrictions; the purpose of the restrictions is to simplify the arguments
INVERSE WEIGHT ESTIMATION
2039
made here. Thus in addition to assumptions (i)–(iii), we add the following conditions: (iv) Conditional on xi , y1i and y0i have a known distribution. (v) E[(β1 (xi )(1 − p0 ) + β0 (xi )p0 )−2 ] is finite, where βj (xi ), j = 0 1, denotes the mean of yji conditional on xi , assumed to be positive functions. For notational convenience, we have suppressed expressing the propensity score as a function of x. We first note that the density of the observables (y d x) can be expressed as (see, e.g., Hahn (1998, p. 325)) (φ1 (y|x)p)d (φ0 (y|x)(1 − p))1−d fX (x) where φ1 (·|x) = φ(y0 ·|x) dy0 and φ1 (·|x) = φ(y0 ·|x) dy0 . We define the set of paths Λ as follows: (A.4)
DEFINITION A.3: Λ consists of the paths λ(δ) = p0 C (h/(δ − δ0 )) where p0 is the “true” propensity score, assumed to take values on (0 1) and h : Rk → R is a positive continuously differentiable function; C : R → (0 1) denotes the c.d.f. of random variable with a Cauchy distribution, location parameter 0, and scale parameter 1. d λ(δ)|δ=δ0 = We note that with this definition, we have λ(δ0 ) = p0 and dδ −p0 /πh ≡ p0 . Then we get ψλ , which is the derivative of the log density (here, we assumed that both φ1 and φ0 are known)
d (1 − d) d − p0 p0 p0 (x) = − ψλ (d y x) = p0 (1 − p0 ) p0 (1 − p0 )
(A.5)
The information for the ATE is (ψβ − ψλ )2 dμ (A.6) where again we let μ denote the dominating measure. We can write the integral (A.6) as 2 (A.7) ψ2λ 1 − (dβ(δ0 )/dδ)−1 dμ =
2 1 − (dβ(δ0 )/dδ)−1 E[ψ2λ |x] dFX (x)
2040
S. KHAN AND E. TAMER
As explained below, we can find an ∗ (x) to make (1 − (dβ(δ0 )/dδ)−1 )2 arbitrarily small uniformly in x. Here, we simply need to verify that E[ψ2λ |x] dFX (x) (A.8) is finite when = ∗ , where ∗ is established below. Recall that ψλ (· · ·) depends on , as does E[ψ2λ |x], for which we evaluate the expression 2 d − p0 p0 (A.9) E[E[ψ2λ |x]] = E p0 (1 − p0 ) p0 l 2 p20 l2 2 =E = E p0 l + 1 − p0 1 − p0 This is finite when l is replaced with l∗ in (A.12). Specifically, the expectation (A.9) is of the form p0 (1 − p0 ) (A.10) E (β1 (x)(1 − p0 ) + β0 (x)p0 )2 and a sufficient condition for finiteness of the term (A.10) is our condition (v). Thus all that remains is to show how to obtain the form of ∗ . From Hirano, Imbens, and Ridder (2003), we have y(d − p(δ)) (A.11) β(δ) = E p(δ)(1 − p(δ)) d where β(δ0 ) denotes the ATE. We wish to evaluate dδ β(δ)|δ=δ0 . Letting p d denote dδ λ(δ)|δ=δ0 , this derivative is d −yp p0 (1 − p0 ) − y(d − p0 )(p (1 − p0 ) − p0 p ) β(δ)|δ=δ0 = E dδ p20 (1 − p0 )2 2 yp (p0 + d(1 − 2p0 )) = −E p20 (1 − p0 )2 β1 (x)p (p20 + (1 − 2p0 )) = −E p0 p20 (1 − p0 )2 β0 (x)p p20 + 2 (1 − p0 ) p0 (1 − p0 )2 β1 (x) β0 (x) + = −E p p0 (1 − p0 ) β1 (x) β0 (x) = −E p0 l + p0 (1 − p0 )
INVERSE WEIGHT ESTIMATION
Therefore, to make
|
dβ(δ) dδ δ=δ0
(A.12)
∗
(x) = −p
−1 0
2041
arbitrarily close to 1, we can set
β1 (x) β0 (x) + p0 1 − p0
−1
It is easy to see that the path evaluated at ∗ = 1/ h∗ verifies the conditions we put on the propensity score for some δ in a close neighborhood of δ0 As we can see, at this value of l∗ , the path in Definition A.3 can be made arbitrarily close to 0 or 1, since the only restrictions on p0 (x) is that it belongs to (0 1) It is interesting to note that this path will not work if one restricts p0 to be bounded away from 0 and 1, since the path we defined, p0 C (h/(δ − δ0 )), will push λ(δ) to be arbitrarily close to (0, 1) for δ in a neighborhood of δ0 ; hence our argument will not work if the propensity score is bounded away from 0 and there the bound is generally finite. REFERENCES ANDREWS, D. W. K., AND M. SCHAFGANS (1998): “Semiparametric Estimation of the Intercept of a Sample Selection Model,” Review of Economic Studies, 65, 487–517. [2021-2023,2025,2030] BUSSO, M., J. DINARDO, AND J. MCCRARY (2008): “Finite Sample Properties of Semiparametric Estimators of Average Treatment Effects,” Working Paper, UC–Berkeley. [2023,2035] CHAMBERLAIN, G. (1986): “Asymptotic Efficiency in Semi-Parametric Models With Censoring,” Journal of Econometrics, 32, 189–218. [2022,2023,2036-2038] FROLICH, M. (2004): “Finite-Sample Properties of Propensity-Score Matching and Weighting Estimators,” Review of Economics and Statistics, 86, 77–90. [2023] GRAHAM, B., AND J. POWELL (2009): “Identification and Estimation of ‘Irregular’ Correlated Random Coefficient Models,” Working Paper, UC–Berkeley. [2024] HAHN, J. (1998): “On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average Treatment Effects,” Econometrica, 66, 315–331. [2022,2032,2039] HECKMAN, J. (1990): “Varieties of Selection Bias,” American Economic Review, 80, 313–318. [2022,2023] HIRANO, K., G. W. IMBENS, AND G. RIDDER (2003): “Efficient Estimation of Average Treatment Effects Using the Estimated Propensity Score,” Econometrica, 71, 1161–1189. [2022,2031,2040] HONORE, B. E., AND A. LEWBEL (2002): “Semiparametric Binary Choice Panel Data Models Without Strictly Exogeneous Regressors,” Econometrica, 70, 2053–2063. [2035] IMBENS, G. W. (2004): “Nonparametric Estimation of Average Treatment Effects Under Exogeneity: A Review,” Review of Economics and Statistics, 86, 4–29. [2022] KHAN, S., AND E. TAMER (2010): “Feasible Rate Adaptive Inference in Some Inverse Weighted Models,” Working Paper, Duke University. [2027,2030,2031,2033] KOUL, H., V. SUSARLA, AND J. V. RYZIN (1981): “Regression Analysis With Randomly Right Censored Data,” The Annals of Statistics, 9, 1276–1288. [2035] LEWBEL, A. (1997): “Semiparametric Estimation of Location and Other Discrete Choice Moments,” Econometric Theory, 13, 32–51. [2021,2022,2025,2027,2030] (1998): “Semiparametric Latent Variable Model Estimation With Endogenous or Mismeasured Regressors,” Econometrica, 66, 105–122. [2022] (2000): “Semiparametric Qualitative Response Model Estimation With Unknown Heteroscedasticity or Instrumental Variables,” Journal of Econometrics, 97, 145–177. [2022] MAGNAC, T., AND E. MAURIN (2007): “Identification and Information in Monotone Binary Models,” Journal of Econometrics, 139, 76–104. [2026]
2042
S. KHAN AND E. TAMER
MANSKI, C. F. (1988): “Identification of Binary Response Models,” Journal of the American Statistical Association, 83, 729–738. [2025] NEWEY, W. (1990): “Semiparametric Efficiency Bounds,” Journal of Applied Econometrics, 5, 99–135. [2023] ROSENBAUM, P., AND D. RUBIN (1983): “The Central Role of the Propensity Score in Observational Studies for Causal Effects,” Biometrika, 70, 41–55. [2021,2031] STEIN, C. (1956): “Efficient Nonparametric Testing and Estimation,” in Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, Vol. 1. Berkeley, CA: University of California Press, 187–195. [2023]
Dept. of Economics, Duke University, 213 Social Sciences Building, Durham, NC 27708, U.S.A.;
[email protected] and Dept. of Economics, Northwestern University, 2001 Sheridan Road, Evanston, IL 60208, U.S.A.;
[email protected]. Manuscript received August, 2007; final revision received May, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 2043–2061
TESTING FOR CAUSAL EFFECTS IN A GENERALIZED REGRESSION MODEL WITH ENDOGENOUS REGRESSORS BY JASON ABREVAYA, JERRY A. HAUSMAN, AND SHAKEEB KHAN1 A unifying framework to test for causal effects in nonlinear models is proposed. We consider a generalized linear-index regression model with endogenous regressors and no parametric assumptions on the error disturbances. To test the significance of the effect of an endogenous regressor, we propose a statistic that is a kernel-weighted version of the rank correlation statistic (tau) of Kendall (1938). The semiparametric model encompasses previous cases considered in the literature (continuous endogenous regressors (Blundell and Powell (2003)) and a single binary endogenous regressor (Vytlacil and Yildiz (2007))), but the testing approach is the first to allow for (i) multiple discrete endogenous regressors, (ii) endogenous regressors that are neither discrete nor continuous (e.g., a censored variable), and (iii) an arbitrary “mix” of endogenous regressors (e.g., one binary regressor and one continuous regressor). KEYWORDS: Endogeneity, causal effects, semiparametric estimation.
1. INTRODUCTION ENDOGENOUS REGRESSORS are frequently encountered in econometric models, and failure to correct for endogeneity can result in incorrect inference. With the availability of appropriate instruments, two-stage least squares (2SLS) yields consistent estimates in linear models without the need for making parametric assumptions on the error disturbances. Unfortunately, it is not theoretically appropriate to apply 2SLS to nonlinear models, as the consistency of 2SLS depends critically upon the orthogonality conditions that arise in the linear-regression context. Until recently, the standard approach for handling endogeneity in nonlinear models has required parametric specification of the error disturbances (see, e.g., Heckman (1978), Smith and Blundell (1986), and Rivers and Vuong (1988)). A more recent literature in econometrics has developed methods that do not require parametric distributional assumptions, which is more in line with the 2SLS approach in linear models. In the context of the model considered in this paper, existing approaches depend critically on the form of the endogenous regressor(s).2 For continuous endogenous regressors, a “control-function approach” has been proposed by Blundell and Powell (2003, 2004) for many nonlinear models (see also Aradillas-Lopez, Honoré, and Powell (2007) and, without linear1 This paper has benefited from comments by a co-editor, two anonymous referees, Andrew Chesher, Han Hong, Jim Powell, and seminar participants at Berkeley, Duke, Harvard/MIT, Purdue, Vanderbilt, and the Midwest Econometrics conference. The first and third authors acknowledge financial support from the National Science Foundation. 2 Several papers have considered estimation in the presence of endogeneity under additional assumptions. These include Lewbel (2000), Hong and Tamer (2003), and Kan and Kao (2005).
© 2010 The Econometric Society
DOI: 10.3982/ECTA7133
2044
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
index and separability restrictions, Imbens and Newey (2009)). With these approaches, often a linear model specifies a relationship between the continuous endogenous regressors and the full set of exogenous covariates (including the instruments). The first-stage estimation yields estimates of the residuals from this model, which are then plugged into a second-stage estimation procedure to appropriately “control” for the endogenous regressors.3 The control-function approach, however, requires the endogenous regressors to be continuously distributed. For the endogenous regressors, this restriction is necessary to identify the average structural function and its derivatives (i.e., the structural effects).4 For a single binary endogenous regressor, Vytlacil and Yildiz (2007) established conditions under which it is possible to identify the average treatment effect in nonlinear models. Identification requires variation in exogenous regressors (including the instruments for the binary endogenous regressor) that has the same effect on the outcome variable as a change in the binary endogenous regressor. Yildiz (2006) implemented this identification strategy in the context of a linear-index binary-choice model, where the outcome equation is y1 = 1(z1 β0 + α0 y2 + ε > 0) for exogenous regressors z1 , a binary endogenous regressor y2 , and independent and identically distributed (i.i.d.) error disturbance ε. The reduced-form equation for y2 is y2 = 1(z δ0 + η > 0) for exogenous regressors z (which now include instruments for y2 ) and i.i.d. error disturbance η. Identification assumes an extra support condition,5 specifically that for some z δ0 values (i.e., a positive-probability region), the conditional distribution of z1 β0 has support wider than the parameter value α0 .6 In this paper, we consider the problem of testing the statistical significance of causal (or treatment) effects in a general nonlinear setting. That is, rather than attempting to estimate the magnitude of causal effects, we seek to estimate the direction (or sign) of these effects. The focus on the sign(s) of causal effects rather than the magnitude(s) turns out to have important implications for the generality of our proposed testing procedure. First, the testing procedure can handle endogenous regressors of arbitrary form, including continuous regressors as in Blundell and Powell (2003, 2004), a binary regressor as in Vytlacil and 3
See also, for example, in linear models, Telser (1964) and Dhrymes (1970). On the other hand, following their analysis, in certain cases one could recover the sign of the structural effect(s) without the support condition. However, there are relevant cases, such as a binary endogenous variable, for which their method is unable to recover even the sign of the effect. 5 See Appendix B in Vytlacil and Yildiz (2007). This support condition pertains only to index and parameter values, and not the unobserved error terms. Thus, this result is distinct from previous “identification at infinity” results. 6 Alternatively, one can view this as a parameter restriction rather than a support condition. This restriction is substantive in the sense that identification of β0 does not require unbounded support of z1 β0 . Our Assumption RD in the Appendix imposed an unbounded support condition, but this is only for convenience. See Khan (2001) on how to relax this condition in the context of rank estimation, and note that such a condition is not required when alternative estimators are used for binary-choice models (e.g., Ahn, Ichimura, and Powell (1994)). 4
TESTING FOR CAUSAL EFFECTS
2045
Yildiz (2007), or other types of regressors (e.g., a censored variable). Second, the approach extends easily to the case of multiple endogenous regressors; importantly, the set of endogenous regressors can include a “mix” of discrete and continuous variables. Third, the procedure can test the statistical significance of a causal effect even in cases in which the magnitude of the causal effect is not identified. For example, the extra support condition in Vytlacil and Yildiz (2007) and Yildiz (2006) is not required to identify the sign of the treatment effect and, therefore, is not needed for our testing procedure. The work proposed here is also related to recent literature on bounding causal effects in models with a binary endogenous variable. See, for example, Bhattacharya, Shaikh, and Vytlacil (2005), Shaikh and Vytlacil (2005), and Chiburis (2010). These studies focus on partial identification of causal-effect parameters (such as the average treatment effect). Chesher (2005) concluded that point identification of these parameters is generally not possible without additional assumptions. Therefore, it is natural to focus directly on the sign (rather than the magnitude) of the causal effect. In addition, while the aforementioned studies imply identification of the sign in specific settings, this paper will focus on such identification in a more general model that allows for additional covariates, potential nonlinearities, and nondiscrete endogenous regressors. The generality of our model is also in contrast to related studies in the biostatistics literature, where tests for the significance of a treatment effect with time-to-event data have been proposed (e.g., Mantel (1966), Peto and Peto (1972), Prentice (1978), Yang and Zhao (2007), and Yang and Prentice (2005)). The outline of the paper is as follows. Section 2 introduces the generalized regression model, a model similar to Han (1987) but with the inclusion of an endogenous regressor. To complete the specification of the (triangular) model, a reduced-form model is utilized for the endogenous regressor. Focusing on the case of a binary endogenous regressor, Section 3 introduces a three-step procedure for testing significance of the causal (or treatment) effect of the endogenous regressor. The third stage of this procedure computes the statistic of interest, which turns out to be a kernel-weighted √ version of the tau statistic of Kendall (1938). Since the (scalar) statistic is n-consistent and asymptotically normal, the proposed test for statistical significance of the causal effect is simply a z-test. Section 4 describes the causal-effect testing approach for the general version of the model, which allows for multiple endogenous regressors (with an arbitrary mix of discrete, continuous, and possibly censored endogenous regressors). Section 5 provides a brief empirical illustration of the methodology, based on Angrist and Evans (1998), in which we test for a causal effect of fertility (specifically, having a third child) on mothers’ labor supply. In the interest of space, Monte Carlo simulations, additional details on the empirical application, and some of the asymptotic proofs have been provided as Supplemental Material (Abrevaya, Hausman, and Khan (2010)).
2046
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
2. THE MODEL Let y1 denote the dependent variable of interest, which is assumed to depend on a vector of covariates z1 and a single endogenous variable y2 . (The general treatment of multiple endogenous regressors with a mix of continuous and discrete covariates is considered in Section 4.) We consider the (latent-variable) generalized regression model for y1 : (2.1)
y1∗ = F(z1 β0 y2 ε)
y1 = D(y1∗ )
The model for the latent dependent variable y1∗ has a general linear-index form, where ε is the error disturbance, and F is a possibly unknown function that is assumed to be strictly monotonic (without loss of generality, strictly increasing) in its first and third arguments and weakly monotonic in its second argument.7 To be more precise, v > v
⇒
F(v y e) > F(v y e) for all y, e
e > e
⇒
F(v y e ) > F(v y e )
for all v, y
Moreover, the direction of the monotonicity with respect to the second argument is assumed to be invariant to the values of the first and third arguments. That is, either y2 > y2
⇒
F(v y2 e) ≥ F(v y2 e)
for all v, e
y2 > y2
⇒
F(v y2 e) ≤ F(v y2 e)
for all v, e
or
The observed dependent variable is y1 , where the function D is weakly increasing and nondegenerate (i.e., strictly increasing on some region of its argument). The model in (2.1) is similar to the generalized regression model of Han (1987),8 7 The assumption that F is strictly increasing in its first argument can be viewed as a normalization given the presence of the coefficient vector β0 . In turn, the signs of the β0 components are identified relative to this assumption. 8 Hence we impose the same monotonicity conditions as in that paper, noting the strict monotonicity conditions in the third arguments ensure that the support of y1 does not depend on β0 . We also note that under further restrictions on the support of exogenous regressors, such as assuming one of its components has positive density on the real line, the strict monotonicity condition on F(· · ·) with respect to its first argument may be relaxed to weakly monotonic but nondegenerate, as long as the nondegeneracy of D(·) is preserved. Also, an alternative specification, which separates the strict and weakly and strongly monotonic relationships is
y1∗ = F(z1 β0 ε) y1 = D(y1∗ y2 ) where F(· ·) is strictly monotonic in each of its arguments for all values of the other and D(· ·) is weakly monotonic but nondegenerate in each of its argument for all values of the other.
TESTING FOR CAUSAL EFFECTS
2047
except for the inclusion of the endogenous variable y2 . This model encompasses many nonlinear microeconometric models of interest, including binarychoice models, ordered-choice models, censored-regression models, transformation (e.g., Box–Cox) models, and proportional hazards duration models. Note that the endogenous variable y2 enters separably in the model for y1∗ . This formulation includes the traditional additively separable case (i.e., z1 β0 + α0 y2 ) considered in Blundell and Powell (2004) and Yildiz (2006), but allows for other forms of separability.9 In addition to consistently estimating β in the presence of y2 , researchers are also interested in determining whether the endogenous variable y2 has an effect on y1 and, if so, the direction of this effect. More formally, in the context of the generalized regression model, the null hypothesis of no effect of y2 on y1 is (2.2)
H0 : F(v y2 ε) = F(v y2 ε) for all y2 , y2 , v, ε
In contrast, a positive effect of y2 on y1 is equivalent to (2.3)
F(v y2 ε) ≥ F(v y2 ε) for all y2 > y2 , v, ε
with strict inequality on some region of the support of y2 . Similarly, a negative effect of y2 on y1 is equivalent to (2.4)
F(v y2 ε) ≤ F(v y2 ε) for all y2 > y2 , v, ε
with strict inequality on some region of the support of y2 . As is common in econometric practice, the three alternatives (2.2)–(2.4) rule out the case that y2 may have a positive effect for some z1 β0 values and a negative effect for other z1 β0 values.10 For instance, in the traditional linear-index approach, where z1 and y2 enter through the linear combination z1 β0 + α0 y2 , the value of α0 determines which of the above three cases is relevant (α0 = 0, no effect; α0 > 0, positive effect; and, α0 < 0, negative effect). In the presence of possibly nonmonotonic effects of y2 on y1 , it is straightforward to apply the testing component of this paper (i.e., testing H0 above) to different regions of the covariate space. Turning to the model for the endogenous regressor, we first focus on the case of a single binary endogenous regressor so as to simplify exposition. The binary endogenous variable y2 is assumed to be determined by the reducedform model (2.5)
y2 = 1[z δ0 + η > 0]
9 Vytlacil and Yildiz (2007) also considered a weakly separable model with the added generality that z1 enters nonparametrically (rather than through a linear index). 10 There is also a tradition in the biostatistics literature to focus on testing the null of no treatment effect; see, for example, Rosenbaum (2002) and the references cited in the Introduction.
2048
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
where z ≡ (z1 z2 ) is the vector of “instruments” and η is an error disturbance. The z2 subcomponent of z provides the exclusion restrictions in the model; z2 is only required to be nondegenerate conditional on z1 β0 . We assume that (ε η) is independent of z. Endogeneity of y2 in (2.1) arises when ε and η are not independent of each other. Estimation of the model in (2.5) is standard. When dealing with a binary endogenous regressor, we use the common terminology “treatment effect” rather then referring to the “causal effect of y2 on y1 .” Thus, for example, a positive treatment effect would correspond to the case of equation (2.3), where y2 can take on only two values: F(v 1 ε) > F(v 0 ε) for all v, ε. The binary-choice model with a binary endogenous regressor is a special case of the model in (2.1). The linear-index form of this model, with an additively separable endogenous variable, is given by (2.6)
y1 = 1[z1 β0 + α0 y2 + ε > 0]
Parametric assumptions on the error disturbances (e.g., bivariate normality of (ε η)) naturally lead to maximum likelihood estimation of (β0 α0 δ0 ) (Heckman (1978)).11 The semiparametric version of this model (i.e., the distribution of (ε η) being left unspecified) was considered by Yildiz (2006), whose estimation approach requires that all components of z be continuous. 3. ESTIMATION AND TESTING FOR A TREATMENT EFFECT The testing approach consists of three stages. In the first stage, the reducedform parameters δ0 are estimated. In the second stage, the coefficients of the exogenous variables (β0 ) in the structural model are estimated. Then, in the third stage, the treatment-effect statistic is calculated. Each of the three stages is described in turn below. Stage 1: Estimation of δ0 When no distribution is √ assumed for η, several semiparametric binarychoice estimators exist for n-consistent estimation of δ0 up to scale (see Powell (1994) for a comprehensive review).12 Since the second stage of our estimation procedure utilizes rank-based procedures, we also focus our theoretical treatment of the first-stage estimator on the use of a rank-based estimator (specifically, the maximum rank correlation (MRC) estimator of Han √ (1987)). We note, however, that any other n-consistent estimator (parametric or semiparametric) of δ0 could be used in the first stage. 11 Another common estimation approach is simply to ignore the nonlinearity in (2.6) and apply 2SLS to the system given by (2.6) and (2.5). 12 With a parametric assumption on η, standard binary-choice maximum likelihood estimation (e.g., probit) would apply.
TESTING FOR CAUSAL EFFECTS
2049
Stage 2: Estimation of β0 The estimator of β0 is based on pairwise comparisons of the y1 values. If (ε η) is independent of z, note that the conditional distribution ε|y2 z is given by (3.1)
Pr(ε ≤ e|y2 z) =
Pr(ε ≤ e|η ≤ −z δ0 ) Pr(ε ≤ e|η > −z δ0 )
if y2 = 0, if y2 = 1.
If two observations (indexed i and j) have y2i = y2j and zi δ0 = zj δ0 , equation (3.1) implies that the conditional distributions εi |yi2 zi and εj |yj2 zj are identical. For such a pair of observations, the strict monotonicity of F with respect to its first and third arguments implies that (3.2)
β0 z1i β0 ≥ z1j
⇐⇒
Pr(y1i > y1j |z1i z1j y2i = y2j zi δ0 = zj δ0 ) ≥ Pr(y1i < y1j |z1i z1j y2i = y2j zi δ0 = zj δ0 )
Equation (3.2) forms the basis for the proposed estimator of β0 . Unfortunately, equation (3.2) cannot be used directly for estimation since (i) δ0 is unknown and (ii) having zi δ0 = zj δ0 might be a zero-probability event. Using the firststage estimator δˆ of δ0 ,13 note that equation (3.2) will be “approximately true” ˆ This in large samples for a pair of observations with y2i = y2j and zi δˆ ≈ zj δ. suggests the kernel-weighted rank-based estimator of β0 , (3.3)
βˆ ≡ arg max β∈B
1 ˆ 1[y2i = y2j ]kh (zi δˆ − zj δ) n(n − 1) i =j
β] × 1[y1i > y1j ]1[z1i β > z1j
where kh (u) ≡ h−1 k(u/ h) for a kernel function k(·) and a bandwidth h that shrinks to zero as n → ∞. The kernel weighting serves to place more weight ˆ 14 Under appropriate regon pairs of observations for which zi δˆ is close to zj δ. √ ularity assumptions, it can be shown that βˆ is a n-consistent estimator of β0 (see the Appendix). 13 Our method is not subject to the problems of the “forbidden regression” (in which fitted values are plugged in to a nonlinear function prior to mimicking 2SLS). The first-stage plugin estimator (of the reduced-form index) is used not as a regressor, but rather as a matching mechanism. Matching also on the value of the endogenous regressor ensures that there is no relationship between the structural error and the plug-in index. 14 For a binary endogenous regressor, the weighting is analogous to propensity-score matching.
2050
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
Stage 3: Testing for a Treatment Effect To test for a treatment effect, we propose a kernel-weighted version of Kendall’s tau (or rank correlation) statistic (Kendall (1938)). To motivate this statistic, we first substitute the reduced-form model (2.5) for the endogenous regressor into the structural model (2.1), which yields (3.4)
y1 = D F(z1 β0 1(z δ0 + η > 0) ε)
For fixed z1 β0 , note that the sign of the rank correlation between y1 and z δ0 depends on whether there is a positive treatment effect, a negative treatment effect, or no treatment effect. More precisely, for a pair of observations (in β0 , (3.4) implies dexed i and j) with z1i β0 = z1j (3.5)
zi δ0 ≥ zj δ0
⇐⇒
Pr(y1i > y1j |zi zj z1i β0 = z1j β0 ) ≥ Pr(y1i < y1j |zi zj z1i β0 = z1j β0 )
if there is a positive treatment effect (as in (2.3)) and (3.6)
zi δ0 ≥ zj δ0
⇐⇒
Pr(y1i > y1j |zi zj z1i β0 = z1j β0 ) ≤ Pr(y1i < y1j |zi zj z1i β0 = z1j β0 )
if there is a negative treatment effect (as in (2.4)). In the case of no treatment effect (as in (2.2)), it is trivially the case that (3.7)
β0 ) = Pr(y1i < y1j |zi zj z1i β0 = z1j β0 ) Pr(y1i > y1j |zi zj z1i β0 = z1j
since y1i∗ and y1j∗ are identically distributed if z1i β0 = z1j β0 . The ability to find statistical evidence against the null of no treatment effect (equation (3.7)) requires that the inequality in (3.5) (or (3.6)) is strict in some region. Unlike equation (3.2), these probability statements do not condition on y2 . In fact, the proposed statistic below does not directly use the y2 values. This feature is somewhat analogous to the second stage of 2SLS, where endogenous regressors are not directly used in the regression, but rather their “fitted values” (projections onto the exogenous regressors) are included. In our context, the y2 values play a role in estimation of δ0 and β0 . Unlike 2SLS, however, fitted values of y2 are not used, since linear projections are not appropriate in our general nonlinear model. To operationalize the empirical implications of the probability statements above, it is necessary to plug in the estimators δˆ and βˆ of δ0 and β0 , respectively, and to place greater weight on pairs of observations for which
TESTING FOR CAUSAL EFFECTS
2051
ˆ z1i βˆ ≈ z1j β. This leads to the proposed treatment-effect statistic, which is a kernel-weighted version of Kendall’s tau,15 ˆ ω ˆ ij sgn(y1i − y1j ) sgn(z δˆ − z δ) i
(3.8)
τˆ ≡
i =j
ω ˆ ij
j
i =j
where sgn(v) = 1(v > 0) − 1(v < 0) and the (estimated) weights ω ˆ ij are defined as (3.9)
ˆ ω ˆ ij ≡ kh (z1i βˆ − z1j β)
√ ˆ it is shown Given asymptotically normal √n-consistent estimators δˆ and β, in the Appendix that τˆ is also n-consistent and asymptotically normal. The probability limit of τˆ is (3.10)
β0 ] τ0 ≡ E[sgn(y1i − y1j ) sgn(zi δ0 − zj δ0 )|z1i β0 = z1j
Based on (3.5)–(3.7), it is easy to show that τ0 > 0 for a positive treatment effect, τ0 < 0 for a negative treatment effect, and τ0 = 0 for no treatment effect. Therefore, it is straightforward to conduct a one-sided or two-sided z-test of H0 : τ0 = 0 based on τˆ and its asymptotic standard error se(τ). ˆ This test for a treatment effect is consistent against the alternatives of a positive or negative treatment effect. REMARK 3.1: The testing approach does not require the index structure of equations (2.1) and (2.5). A nonparametric version of the model in (2.1) would be of the form y1∗ = F(z1 y2 ε). Then, the statistic described in this section could match on all components of the z1 vector, which would be attractive when some (or all) of its components are discrete. Moreover, the linear-index restriction in (2.5) is unnecessary; a nonparametric specification (e.g., a nonparametric propensity score in the binary endogenous-variable case) could be used instead. REMARK 3.2: A comparison of our proposed test procedure with the standard 2SLS approach is warranted. As a referee correctly pointed out, while the 2SLS coefficient on the endogenous variable might not get the right magnitude of any particular parameter of interest, it could be getting the right sign given the monotonicity conditions of the model. For the case of binary y1 and y2 and no exogenous regressors (empty z1 ), Shaikh and Vytlacil (2005) and 15 ˆ is simply In the context of a binary endogenous regressor, note that the sign of (zi δˆ − zj δ) the sign of the difference in propensity scores for the pair of observations.
2052
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
Bhattacharya, Shaikh, and Vytlacil (2005) showed that the probability limit of 2SLS identifies the correct sign of the average treatment effect when the outcome is assumed to be weakly monotonic in the treatment.16 Unfortunately, there are reasons to question the validity of a 2SLS-based test in more general settings. Specifically, in a nonlinear model (say, a probit model) with nonempty z1 and no causal effect of y2 on y1 , the probability limit of the 2SLS endogenous-variable coefficient will generally be nonzero; when this is the case, simply looking at the sign of the 2SLS coefficient introduces Type-I error whose probability converges to 1. REMARK 3.3: Interestingly, even in the case when the sign of the treatment effect can vary across individuals (contrary to our maintained assumptions), τ0 can represent an interesting parameter. For example, in the case without z1 , τ0 is the rank correlation between y1 and the treatment probability (propensity score).17 This correlation has the usual advantages when compared to other measures, for example, linear correlation, such as being more robust to outliers, which is especially important for the generalization to nonbinary variables. REMARK 3.4: If the treatment effect is positive for some z1 β0 and negative for some z1 β0 , it would be necessary to use local versions of τˆ to construct a consistent test. See, for example, Ghosal, Sen, and van der Vaart (2000) and Abrevaya and Jiang (2005), who developed consistent tests in similar Ustatistic frameworks. REMARK 3.5: No consideration has been given to the efficiency of the various estimators discussed above. As a referee pointed out, in principle, more efficient estimates of (δˆ0 βˆ0 τˆ0 ) could be obtained through the use of some joint estimation technique (e.g., a version of joint generalized method of moments). √ The (scalar) statistic τˆ is n-consistent and asymptotically normal, which implies that testing the null hypothesis H0 : τ0 = 0 is a simple z-test. The theoretical assumptions required for this result are provided in the Appendix, as is 16
Shaikh and Vytlacil (2005) generalized this result to the case with covariates (nonempty z1 ) by conditioning on the covariates. This approach can be interpreted as a conditional version of 2SLS, but is distinct from a standard 2SLS regression in which z1 and y2 are explicitly included as right-hand-side variables. It is an open question whether their results extend to situations with nonbinary outcomes, nonbinary endogenous regressors, and/or multiple endogenous regressors. 17 More generally, τ0 has an interpretation as a conditional rank correlation. In the case of a binary (nonbinary) endogenous variable, τ0 is the rank correlation between the outcome y1 and the propensity score (reduced-form index z δ0 ) conditional on having an identical structural index z1 β0 .
TESTING FOR CAUSAL EFFECTS
2053
the formal statement of the asymptotic properties (Theorem 1); proofs are provided in the Supplemental Material. Given τˆ and an estimated asymptotic standard error vˆ τˆ , one-sided or two-sided versions of this test can be implemented based on the ratio τ/ ˆ vˆ τˆ . To compute the standard error vˆ τˆ , we recommend the use of the bootstrap, since the form of the asymptotic variances in Theorem 1 is somewhat complicated.18 Furthermore, estimating the components of the analytical asymptotic variance matrix requires the choice of additional smoothing parameters. 4. THE GENERAL CASE This section presents the general version of the model for which the rankbased testing procedure can be applied. The model allows for multiple endogenous regressors. A given endogenous regressor may be discrete, continuous, or even censored in some way. The endogenous regressors are denoted y21 y22 y2Q , where Q is the number of endogenous regressors. Let QC ≤ Q denote the number of nondiscrete endogenous regressors. The Q × 1 vector y2 is defined as y2 = (y21 y22 y2Q ) . Each endogenous regressor y2q (for q = 1 Q) has a reduced-form generalized regression model (4.1)
∗ y2q = F2q (z δ0q ηq )
∗ y2q = D2q (y2q )
The error disturbances (ε η1 ηQ ) are assumed to be independent of z. The functions F2q and D2q may differ over q, allowing for an arbitrary mix of discrete and continuous endogenous regressors. Similar to the model for the generalized regression for y1 , we assume that (for q = 1 Q) F2q is strictly increasing in each of its two arguments and D2q is weakly increasing and nondegenerate (i.e., strictly increasing on some region of its argument). To simplify notation, define Δ0 ≡ (δ01 δ0Q ) to be the Q × matrix containing all of the reduced-form coefficients (where is √ the dimension of z). Each of the δ0q coefficient vectors can be estimated n-consistently in a first stage using equation-by-equation semiparametric estimation (e.g., maximum rank correlation or some other linear-index estimator). The estimate is defined as of δ0q (for q = 1 Q) is denoted δˆ q and the Q × matrix Δ ≡ (δˆ 1 δˆ Q ) . Δ ˆ we generalize the approach from Section 3 For the second-stage estimator β, i is and focus on observations pairs (i j) for which y2i is close to y2j and Δz 18 Although the bootstrap has not formally been shown to be consistent in the specific context considered, there √ is no reason to expect failure of the bootstrap given that each stage of the testing procedure is n-consistent. Recently, Subbotin (2006) showed consistency of the bootstrap for the maximum rank correlation estimator (our first-stage estimator). It is worthy of future research to investigate whether the approach of Subbotin (2006) could be extended to kernel-weighted rank estimators (like βˆ and τ). ˆ
2054
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
j . Specifically, the second-stage estimator βˆ maximizes the objective close to Δz 19 function (4.2)
1 i − Δz j )1[y1i > y1j ]1[z β > z β] Kh (y2i − y2j )Kh (Δz 1i 1j n(n − 1) i =j
dim(v) where Kh (·) is a multivariate kernel function defined as Kh (v) ≡ q=1 kh (vq ) for a vector v. To test the effect of y2q on y1 (for any q = 1 Q), we want to fix z1 β and z δ0p for all p = q and examine the significance of the relationship between −q denote the matrix Δ with the qth row (i.e., δˆ ) removed, y1 and zδ0q . Let Δ q −q has dimension (Q − 1) × . The statistic associated with the qth so that Δ endogenous regressor is thus given by ω ˆ ijq sgn(y1i − y1j ) sgn(zi δˆ q − zj δˆ q ) (4.3)
τˆ q ≡
i =j
ω ˆ ijq
i =j
where the (estimated) weights ω ˆ ijq are defined as (4.4)
ˆ −q zi − Δ −q zj ) ω ˆ ijq ≡ kh (z1i βˆ − z1j β)Kh (Δ
The asymptotic theory for the general case is completely analogous to the results developed previously. The regularity conditions which change for the general case are conditions on the bandwidth sequence used in matching variables and the order of smoothness assumed on certain density and conditional expectation functions. An Example: Two Binary Endogenous Regressors Consider the following model with two binary endogenous regressors y21 and y22 : (4.5)
y1∗ = F(z1 β0 y21 y22 ε)
(4.6)
y21 = 1[z δ01 + η1 > 0]
y1 = D(y1∗ ) y22 = 1[z δ02 + η2 > 0]
Given estimators for δ01 , δ02 , and β0 , one tests the effect of y21 (second argument) on y1 by fixing z1 β and z δ02 , and examining the significance of the 19 Note that if the variables y2i and y2j are in fact discrete, we can replace the kernel function Kh (y2i − y2j ) with I[y2i = y2j ].
TESTING FOR CAUSAL EFFECTS
2055
relationship between y1 and z δ01 . This idea can be operationalized with the kernel-weighted rank-based statistic ω ˆ ij1 sgn(y1i − y1j ) sgn(zi δˆ 1 − zj δˆ 1 ) (4.7)
τˆ 1 ≡
i =j
ω ˆ ij1
i =j
where the (estimated) weights ω ˆ ij1 are defined as (4.8)
ˆ ω ˆ ij1 ≡ kh (z1i βˆ − z1j β)kh (zi δˆ 2 − zj δˆ 2 )
An analogous statistic (for testing the effect of y22 on y1 ) can easily be constructed. 5. EMPIRICAL ILLUSTRATION In this section, we apply our testing methodology to an empirical application concerning the effects of fertility on female labor supply. In particular, we adopt the approach of Angrist and Evans (1998), who used the gender mix of a woman’s first two children to instrument for the decision to have a third child. This instrumental-variable strategy allows one to identify the effect of having a third child on the woman’s labor-supply decision. The rationale for this strategy is that child gender is arguably randomly assigned and that, in the United States, families whose first two children are the same gender are significantly more likely to have a third child. The sample for the current study is drawn from the 2000 Census data (5% public-use microdata sample (PUMS)). In the analysis, the outcome of interest (y1 ) is whether the mother worked in 1999, the binary endogenous explanatory variable (y2 ) is the presence of a third child, and the instrument is whether the mother’s first two children were of the same gender. We consider specifications in which education, mother’s age at first birth, and age of first child enter as exogenous covariates (z1 ). More complete details on the empirical application, including construction of the sample, descriptive statistics, and first-stage results, are reported in the Supplemental Material. The primary results of interest relate to the conclusions from the causaleffect significance tests, which are reported in Table I. The table compares results obtained from the semiparametric τˆ statistic with those obtained from the z-statistic based on 2SLS estimates. The bootstrap was used to compute standard errors for τ. ˆ To examine the effect of additional covariates, testing results are reported starting from a model with no exogenous covariates and then adding covariates one by one until the full set of three exogenous covariates are included. In the model with no exogenous covariates, the z-statistics associated with τˆ and the 2SLS coefficient are extremely similar. The 2SLS z-statistic for
2056
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN TABLE I
TESTING SIGNIFICANCE OF THE BINARY ENDOGENOUS REGRESSOR (HAVING A THIRD CHILD) IN WOMEN’S LABOR-FORCE PARTICIPATIONa Exogenous Covariates in the Model
Semiparametric τˆ
s.e.
2SLS z -stat
αˆ
s.e.
z -stat
None
−0.00316
0.00085
−3.72
−0.1118
0.0298
−3.75
Education
−0.00299
0.00102
−2.94
−0.1103
0.0296
−3.72
Education, Mother’s age at first birth
−0.00655
0.00229
−2.86
−0.1111
0.0296
−3.75
Education, Mother’s age at first birth, Age of first child
−0.00695
0.00258
−2.69
−0.1124
0.0295
−3.80
a The z -statistics for the semiparametric and 2SLS estimation approaches are reported for several different model specifications. The 2SLS standard errors are heteroskedasticity-robust.
the larger models is basically unchanged from the no-covariate model, which is not too surprising given that the same-sex instrument is uncorrelated with the other exogenous covariates in the model. In contrast, the magnitude of the z-statistic for the semiparametric τˆ method does decline. The addition of covariates to the model forces the semiparametric method to make comparisons based on observation pairs with similar first-stage (estimated) index values associated with these exogenous covariates. It is encouraging, however, that the z-statistic magnitude does not decline by much as the second and third covariates are added to the model. Table I highlights the inherent robustnesspower trade-off between the semiparametric and parametric methodologies. Although one might have worried that the trade-off would be so drastic as to render the semiparametric method useless in practice, the results indicate that this is not the case. Even in the model with three covariates, the τˆ estimate provides strong statistical evidence (z = −269) that the endogenous third-child indicator variable has a causal effect on mothers’ labor supply. Importantly, this finding is not subject to the inherent misspecification of the linear probability model or any type of parametric assumption on the error disturbances. In addition, this illustration highlights the feasibility of the semiparametric approach even for a very large sample (n close to 300,000 here). 6. CONCLUDING REMARKS This paper proposes a new method for testing for the causal effects of endogenous variables in a generalized regression model. The model considered here allows for multiple continuously and/or discretely distributed endogenous variables, thereby offering a test for cases not previously considered in the literature. The proposed statistic converges at the parametric rate to a limiting
TESTING FOR CAUSAL EFFECTS
2057
normal distribution under the null hypothesis of no causal effect. A useful extension would be a localized version of the proposed procedure that would allow the sign of the causal effect(s) to vary over the support of the random variables in question. In addition, it would be of interest to improve the efficiency of the τˆ estimator. APPENDIX In this appendix, we outline the asymptotic theory for the three-stage testing procedure. We state the main asymptotic-normality results and also explicitly state sufficient regularity conditions for these results. The proofs, which are somewhat standard given previous results in this literature, are provided in the Supplemental Material. The assumed linear representation of the first-step estimator is 1 δˆ − δ0 = ψδi + op n−1/2 n i=1 n
(A.1)
where ψδi is an influence-function term with √ zero mean and finite variance. This representation exists for the available n-consistent semiparametric estimators. We do not specify a particular form for the influence-function term ψδi , since it depends on the particular estimator chosen. The first result concerns the asymptotic distribution for the second-stage estimator of β0 . Since β0 is only identified up to scale, we normalize its last component to 1, and denote its other components by θ0 and denote the correˆ where sponding estimator by θ, (A.2)
θˆ = arg max θ∈Θ
1 ˆ 1[y2i = y2j ]kh (zi δˆ − zj δ) n(n − 1) i =j
β(θ)] × 1[y1i > y1j ]1[z1i β(θ) > z1j
We impose the following regularity conditions: ASSUMPTION CPS—Parameter Space: θ0 lies in the interior of Θ, a compact subset of Rk−1 . ASSUMPTION FS: The first-stage estimator used to estimate δ0 is the maximum rank correlation estimator of Han (1987). Consequently, the same regularity conditions in that paper and Sherman (1993) are assumed, so we will have a linear representation as discussed above. We normalize one of the coefficients of δ0 to 1 and assume the corresponding regressor is continuously distributed on its support.
2058
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
ASSUMPTION K—Matching Stages Kernel Function: The kernel function k(·) used in the second stage is assumed to have the following properties: K.1 k(·) is twice continuously differentiable, has compact support, and integrates to 1. K.2 k(·) is symmetric about 0. K.3 k(·) is a pth order kernel, where p is an even integer: ul k(u) du = 0 for l = 1 2 p − 1 up k(u) du = 0 ASSUMPTION H—Matching Stages Bandwidth Sequence: √ √ The bandwidth sequence hn used in the second stage satisfies nhpn → 0 and nh3n → ∞. ASSUMPTION RD—Last Regressor and Index Properties: z1i(k) is continuously distributed with positive density on the real line conditional on zi δ0 and all other elements of z1i . Moreover, zi δ0 is nondegenerate conditional on z1i β0 . ASSUMPTION ED—Error Distribution: (εi ηi ) is distributed independently of zi and is continuously distributed with positive density on R2 . ASSUMPTION FR—Full Rank Condition: Conditional on (zi δ0 y2i ), the support of z1i does not lie in a proper linear subspace of Rk . The following lemma establishes the asymptotic properties of the secondstage estimator of θ0 . Some additional notation is used in the statement of the lemma. The reduced-form linear index is denoted ζδi = zi δ0 and fζδ (·) denotes its density function. FZ1 denotes the distribution function of z1i . Also, ∇θθ denotes the second-derivative operator. LEMMA 1: If Assumptions CPS, FS, K, H, RD, ED, and FR hold, then √ (A.3) n(θˆ − θ0 ) ⇒ N(0 V −1 ΩV −1 ) or, alternatively, θˆ − θ0 has the linear representation 1 θˆ − θ0 = ψβi + op n−1/2 n i=1 n
(A.4)
with V = ∇θθ N (θ)|θ=θ0 and Ω = E[δ1i δ1i ], and ψβi = V −1 δ1i , where (A.5) N (θ) = 1[z1i β(θ) > z1j β(θ)]H(ζj ζj ) × F (z1i β0 z1j β0 ζj ζj ) dFZ1 ζ (z1i ζj ) dFZ1 ζ (z1j ζj )
TESTING FOR CAUSAL EFFECTS
2059
with ζi = zi δ0 , whose density function is denoted by fζ , and where (A.6)
F (z1i β0 z1j β0 ζi ζj ) = P(y1i > y1j |y2i = y2j z1i z1j ζi ζj )
(A.7)
H(ζi ζj ) = P(y2i = y2j |ζi ζj )
and the mean-zero vector δ1i is given by
δ1i = (A.8) fζ (ζi )μ31 (ζi ζi β0 ) dζi ψδi where (A.9)
μ(t ζ β) = H(t ζ)M(t ζ β)fζ (t)
with (A.10)
M(t ζ β) = E F (z1i β0 z1j β0 ζi ζj )1[z1i β > z1j β]zi |ζi = t ζj = ζ
μ1 (· · ·) denotes the partial derivative of μ(· · ·) with respect to its first argument, and μ31 (· · ·) denotes the partial derivative of μ1 (· · ·) with respect to its third argument. Although the particular expressions for V and Ω are quite involved, note that V represents the second derivative of the limit of the expectation of the maximand and Ω represents the variance of the limit of its projection. The asymptotic theory for the third-stage statistic is based on the above conditions, now also assuming Assumptions K and H are valid for the third-stage matching kernel, and assuming the following additional smoothness condition: ASSUMPTION S—Order of Smoothness of Density and Conditional Expectation Functions: S.1 Letting ζβi denote z1i β0 and fζβ (·) denote its density function, we assume fζβ (·) is p times continuously differentiable with derivatives that are bounded on the support of ζβi . S.2 The functions G11 (·) and Gx (·), defined as (A.11) G11 (·) = E sgn(y1i − y1j )fZk |Z−k z−kij δ(−k) z−kij |ζβi = · ζβj = · 0 (A.12) Gx (·) = E (sgn(y1i − y1j ) sgn(zi δ0 − zj δ0 ) − τ0 )(z1i − z1j ) | z1i − z1j = · where fZk |Z−k (·) in (A.11) denotes the density function of the last component of zi − zj , conditional on its other components, and z−kij denotes the difference for all the components of zi except the last one, are all assumed to be all p times continuously differentiable with derivatives that are bounded on the support of ζβi .
2060
J. ABREVAYA, J. A. HAUSMAN, AND S. KHAN
The main theorem establishes the asymptotic distribution of the statistic τ: ˆ THEOREM 1: If Assumptions CPS, FS, K, H, RD, ED, FR, and S hold, then √ (A.13) n(τˆ − τ0 ) ⇒ N(0 V2−2 Ω2 ) with V2 = E[fζβ (ζβi )] and Ω2 = E[δ22i ]. The mean-zero random variable δ2i is (A.14)
δ2i = 2fζβ (ζβi )G(y1i zi ζβi ) + E[Gx (ζβi )fζβ (ζβi )]ψβi + E[G11 (ζβi )fζβ (ζβ i)]ψδi
where Gx (·) denotes the derivative of Gx and G(· · ·) is given by (A.15)
G(y1 z ζ) = E[sgn(y1i − y1 ) sgn(zi δ0 − z δ0 )|ζβi = ζ] REFERENCES
ABREVAYA, J., AND W. JIANG (2005): “A Nonparametric Approach to Measuring and Testing Curvature,” Journal of Business & Economic Statistics, 23, 1–19. [2052] ABREVAYA, J., J. A. HAUSMAN, AND S. KHAN (2010): “Supplement to ‘Testing for Causal Effects in a Generalized Regression Model With Endogenous Regressors’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/7133-3_proofs.pdf; http://www.econometricsociety.org/ecta/Supmat/7133_Data and programs.zip. [2045] AHN, H., H. ICHIMURA, AND J. POWELL (1994): “Simple Estimators for Monotone Index Models,” Working Paper, Princeton University. [2044] ANGRIST, J. D., AND W. N. EVANS (1998): “Children and Their Parents’ Labor Supply: Evidence From Exogenous Variation in Family Size,” American Economic Review, 88, 450–477. [2045, 2055] ARADILLAS-LOPEZ, A., B. E. HONORÉ, AND J. L. POWELL (2007): “Pairwise Difference Estimation With Nonparametric Control Variables,” International Economic Review, 48, 1119–1158. [2043] BHATTACHARYA, J., A. SHAIKH, AND E. VYTLACIL (2005): “Treatment Effect Bounds: An Application to Swan–Ganz Catheterization,” Working Paper 11263, NBER. [2045,2052] BLUNDELL, R. W., AND J. L. POWELL (2003): “Endogeneity in Nonparametric and Semiparametric Regression Models,” in Advances in Economics and Econometrics: Theory and Applications, Eighth World Congress, Vol. II, ed. by M. Dewatripont, L. P. Hansen, and S. J. Turnovsky. Cambridge, U.K.: Cambridge University Press. [2043,2044] (2004): “Endogeneity in Semiparametric Binary Response Models,” Review of Economic Studies, 71, 655–679. [2043,2044,2047] CHESHER, A. (2005): “Nonparametric Identification Under Discrete Variation,” Econometrica, 73, 1525–1550. [2045] CHIBURIS, R. C. (2010): “Semiparametric Bounds on Treatment Effects,” Journal of Econometrics (forthcoming). [2045] DHRYMES, P. J. (1970): Econometrics: Statistical Foundations and Applications. New York: Springer-Verlag. [2044] GHOSAL, S., A. SEN, AND A. W. VAN DER VAART (2000): “Testing Monotonicity of Regression,” The Annals of Statistics, 28, 1054–1082. [2052] HAN, A. K. (1987): “Non-Parametric Analysis of a Generalized Regression Model: The Maximum Rank Correlation Estimator,” Journal of Econometrics, 35, 303–316. [2045,2046,2048, 2057]
TESTING FOR CAUSAL EFFECTS
2061
HECKMAN, J. (1978): “Dummy Endogenous Variables in a Simultaneous Equation System,” Econometrica, 46, 931–959. [2043,2048] HONG, H., AND E. TAMER (2003): “Inference in Censored Models With Endogenous Regressors,” Econometrica, 71, 905–932. [2043] IMBENS, G. W., AND W. K. NEWEY (2009): “Identification and Estimation of Triangular Simultaneous Equations Models Without Additivity,” Econometrica, 77, 1481–1512. [2044] KAN, K. AND C. KAO (2005): “Simulation-Based Two-Step Estimation With Endogenous Regressors,” Mimeo. [2043] KENDALL, M. G. (1938): “A New Measure of Rank Correlation,” Biometrika, 30, 81–93. [2043, 2045,2050] KHAN, S. (2001): “Two Stage Rank Estimation of Quantile Index Models,” Journal of Econometrics, 100, 319–355. [2044] LEWBEL, A. (2000): “Semiparametric Qualitative Response Model Estimation With Unknown Heteroscedasticity and Instrumental Variables,” Journal of Econometrics, 97, 145–177. [2043] MANTEL, N. (1966): “Evaluations of Survival Data and Two New Rank Order Statistics Arising in Its Consideration,” Cancer Chemotherapy Reports, 50, 163–170. [2045] PETO, R., AND J. PETO (1972): “Asymptotically Efficient Rank Invariant Test Procedures” (with discussion), Journal of the Royal Statistical Society, Ser. A, 135, 185–206. [2045] POWELL, J. L. (1994): “Estimation of Semiparametric Models,” in Handbook of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden. Amsterdam: North-Holland, 2443–2521. [2048] PRENTICE, R. L. (1978): “Linear Rank Tests With Right Censored Data,” Biometrika, 65, 167–179. [2045] RIVERS, D., AND Q. VUONG (1988): “Limited Information Estimators and Exogeneity Tests for Simultaneous Probit Models,” Journal of Econometrics, 39, 347–366. [2043] ROSENBAUM, P. R. (2002): Observational Studies. Springer Series in Statistics. Springer. [2047] SHAIKH, A. M., AND E. VYTLACIL (2005): “Threshold Crossing Models and Bounds on Treatment Effects: A Nonparametric Analysis,” Working Paper T307, NBER. [2045,2051,2052] SHERMAN, R. P. (1993): “The Limiting Distribution of the Maximum Rank Correlation Estimator,” Econometrica, 61, 123–137. [2057] SMITH, R. J., AND R. W. BLUNDELL (1986): “An Exogeneity Test for a Simultaneous Equation Tobit Model With an Application to Labor Supply,” Econometrica, 54, 679–685. [2043] SUBBOTIN, V. (2006): “On the Bootstrap of the Maximum Rank Correlation Estimator,” Mimeo, Northwestern University. [2053] TELSER, L. G. (1964): “Iterative Estimation of a Set of Linear Regression Equations,” Journal of the American Statistical Association, 59, 845–862. [2044] VYTLACIL, E., AND N. YILDIZ (2007): “Dummy Endogenous Variables in Weakly Separable Models,” Econometrica, 75, 757–779. [2043-2045,2047] YANG, S., AND R. L. PRENTICE (2005): “Semiparametric Analysis of Short Term and Long Term Hazard Ratios With Two Sample Survival Data,” Biometrika, 92, 1–17. [2045] YANG, S., AND Y. ZHAO (2007): “Testing Treatment Effect by Combining Weighted Log-Rank Tests and Using Empirical Likelihood,” Statistics & Probability Letters, 77, 1385–1393. [2045] YILDIZ, N. (2006): “Estimation of Binary Choice Models With Linear Index and Dummy Endogenous Variables,” Mimeo. [2044,2045,2047,2048]
Dept. of Economics, The University of Texas at Austin, Austin, TX 78712, U.S.A.;
[email protected], Dept. of Economics, Massachusetts Institute of Technology, Cambridge, MA 02142, U.S.A.;
[email protected], and Dept. of Economics, Duke University, Durham, NC 27708, U.S.A.; shakeebk@ econ.duke.edu. Manuscript received April, 2007; final revision received November, 2009.
Econometrica, Vol. 78, No. 6 (November, 2010), 2063–2084
TEMPTATION AND TAXATION BY PER KRUSELL, BURHANETTIN KURU¸SÇU, AND ANTHONY A. SMITH, JR.1 We study optimal taxation when consumers have temptation and self-control problems. Embedding the class of preferences developed by Gul and Pesendorfer into a standard macroeconomic setting, we first prove, in a two-period model, that the optimal policy is to subsidize savings when consumers are tempted by “excessive” impatience. The savings subsidy improves welfare because it makes succumbing to temptation less attractive. We then study an economy with a long but finite horizon which nests, as a special case, the Phelps–Pollak–Laibson multiple-selves model (thereby providing guidance on how to evaluate welfare in this model). We prove that when period utility is logarithmic, the optimal savings subsidies increase over time for any finite horizon. Moreover, as the horizon grows large, the optimal policy prescribes a constant subsidy, in contrast to the well known Chamley–Judd result. KEYWORDS: Temptation, self-control, consumption-savings, optimal taxation.
1. INTRODUCTION EXPERIMENTAL AND INTROSPECTIVE EVIDENCE suggest that consumers exhibit preference reversals as time passes. Such evidence has led to the development of models in which consumers have time-inconsistent preferences (see Laibson (1997), who built on earlier work by Strotz (1956) and Phelps and Pollak (1968)). In models with time-inconsistent preferences, a sequence of the consumer’s different “selves,” each valuing consumption streams in a unique way, plays a dynamic game. In this game of conflict across selves, one can define Pareto frontiers among selves and discuss non-cooperative equilibria of the dynamic game relative to this frontier. Consequently, policy proposals by an outside authority, such as the government, do not, in general, lead to unambiguous recommendations without deciding how to assign welfare weights to the different selves. In contrast, Gul and Pesendorfer (2001, 2004, 2005) developed an alternative, axiomatic, approach to modeling preference reversals. This approach does not necessitate splitting up the consumer into multiple selves. To address reversals, Gul and Pesendorfer formalized the ideas of temptation and self-control: they defined preferences over consumption sets rather than over consumption sequences, and then discussed temptation and self-control in terms of preferences over these sets. The axiomatization delivers a representation theorem with utility over consumption sets expressed in terms of two utility functions: one describes commitment utility (u), which gives the ranking that the consumer uses to compare consumption bundles, as opposed to 1 We thank a co-editor and four anonymous referees for important suggestions and comments. This paper is a shortened and substantially revised version of an earlier paper with the same title; the quantitative analysis in the earlier paper is now contained in a new paper titled “How Much Can Taxation Alleviate Temptation and Self-Control Problems?”
consumption sets; the other describes temptation utility (v), which plays a key role in determining how actual consumption choices depart from what commitment utility would dictate. In this framework, the consumer's welfare of a given set B, where B, for example, could be a normal budget set, is given by max_{x∈B}[u(x) + v(x)] − max_{x̃∈B} v(x̃), the sum of the commitment and temptation utilities less the temptation utility evaluated at the most tempting choice (i.e., the maximal level of temptation). The consumer's actual choice maximizes the sum of the commitment and temptation utilities. In addition, the consumer experiences a (utility) cost of self-control, max_{x̃∈B} v(x̃) − v(x), which increases to the extent that the consumer's actual choice deviates from succumbing completely to the temptation. The consumer's actual choice then represents a compromise between commitment utility and the cost of self-control. Using the Gul–Pesendorfer model, it is straightforward to ask normative questions. The purpose of this paper is to examine optimal tax policy with their model. In particular, we look at Ramsey taxation, that is, we consider whether and how linear tax-transfer schemes can be used to improve consumer welfare. Linearity here is a restriction in our analysis: if any nonlinear taxation scheme is allowed, one could (trivially) circumvent the self-control/temptation problems by implementing a command policy in which the consumer's choice set is reduced to a singleton. There are obvious reasons why such schemes are not attractive to use in practice, but more importantly we show that even with the rather weak instrument we offer the government, it is possible to improve utility in most cases, and that in some cases a linear tax can actually achieve the full optimum. We discuss and motivate linearity in more detail in Section 2.3. Overall, the question here is how different (linear) distortion rates affect the welfare of consumers who suffer from temptation and self-control problems. To begin our analysis, we look at a two-period model with general preferences, except that we specialize temptation utility to reflect impatience, since this is the object of our study. In the two-period model, the consumption set faced by an agent is the usual triangle, and taxes alter the precise nature of the triangle. We first analyze a partial equilibrium economy (i.e., prices are exogenous) in which we let the government use a tax-transfer scheme that—for the consumer's actual choice—uses up no net resources and thus is self-financing. For example, the government can make consumption in period 1 more expensive relative to consumption in period 2 by subsidizing period-2 consumption, and to the extent the consumer responds by buying more consumption in period 2 than his endowment, the government must use a lump-sum tax in period 1 to balance its budget. We show that, in general, taxation can improve welfare and that a temptation toward impatience calls for subsidizing consumption in period 2. We also examine the size of these subsidies. Subsidizing period-2 consumption improves welfare because it makes temptation less attractive. To see this, remember that the consumer's actual choice maximizes the sum of the commitment and temptation utilities. Because of the envelope theorem, a small increase (from zero) in the subsidy for period-2
consumption has no effect on the sum of the commitment and temptation utilities. But this increase reduces the maximal level of temptation: the consumer receives a smaller subsidy if he gives in to temptation (because in that case he consumes more today and less tomorrow), but the (lump-sum) tax that the consumer pays to finance the subsidy remains unchanged (because it depends on the consumer’s actual choice). We then consider general-equilibrium effects, which are important for two reasons. First, in an endowment economy, tax policy is not useful at all. In this case, the consumption allocation cannot be altered and for it to be supported in equilibrium by a triangle budget set, the slope of this set, net of taxes, must be unaffected by policy. Thus, pre-tax prices adjust to undo fully the tax wedge. Second, we show that if the production technology takes the standard neoclassical form (constant returns to scale and diminishing marginal products), government policy again has a role to play by altering equilibrium investment. In particular, using a representative-agent equilibrium model, we show that the partial-equilibrium result remains intact: it is optimal to subsidize investment. The intuition underlying this result is the same as in the partial-equilibrium model, although general-equilibrium effects on prices reduce the size of the optimal subsidy (but do not, as in the endowment economy, eliminate any role for a subsidy). The contrast between the general-equilibrium economies with and without intertemporal production makes it clear that the government must be able to influence the slope of the budget constraint, and thereby aggregate savings, if it is to improve welfare outcomes in the presence of temptation and self-control problems. Are our results special to the two-period model? In a setting with standard preferences and a choice between distorting either investment or labor supply, for example, Chamley (1986) and Judd (1985) showed that, in the long run, the government should not distort investment. In our model, labor supply is inelastic, but, nonetheless, is it possible that in the long run, investment should not be distorted/subsidized? We show, with a simple example, that the answer is, in general, no. We extend the two-period model to a T -period model in a way guided by the applied macroeconomics literature, which uses time-additive, stationary utility. Thus, we let commitment utility take the standard form and allow temptation utility simply to have a different current (or short-run) discount factor than commitment utility, reflecting impatience. This means that temptation utility reflects “quasigeometric” discounting of the future, which amounts to the assumption that nothing can be tempting other than changing current consumption relative to future consumption. We also introduce a parameter, γ, which regulates how strongly temptation utility influences consumption choices; γ = 0 delivers standard utility, where consumers act without self-control problems. Quasigeometric temptation nests two cases of special interest: the first is the time-inconsistent preferences considered by Laibson (1997) (the case γ → ∞, where the consumer succumbs completely to temptation) and the second is
the special case in Gul and Pesendorfer (2004) in which temptation utility puts zero weight on the future (so that the consumer is tempted to consume his entire wealth today). For the Laibson case, in particular, we show that the consumer discounts utility using the "long-run" discount rate, thereby resolving the problem of which self to use when evaluating welfare in the multiple-selves model.2 Specializing further to logarithmic period utility, we solve fully for the laissez-faire and optimal outcomes. We find that optimal investment subsidy rates increase over time in any finite-horizon model. More importantly, as T → ∞, the optimum calls for a constant subsidy rate on investment, in contrast to the Chamley–Judd result. Finally, in the Laibson case in which consumers succumb completely to temptation, we show, for any period utility function featuring constant relative risk aversion, that linear taxes are, in fact, not restrictive, but instead deliver first-best welfare outcomes (i.e., the welfare outcome associated with the command outcome) and that, as in the logarithmic case, the optimum calls for a subsidy to savings as T → ∞. Section 2 looks at the two-period model and Section 3 looks at the T-period model. Section 4 concludes. Proofs of Propositions 6 and 8 are gathered in the Appendix. All other proofs are provided in the Supplemental material (Krusell, Kuruşçu, and Smith (2010)). 2. THE TWO-PERIOD MODEL For illustrating temptation and self-control problems in the savings context, a two-period model captures much of the essence, and in this section we provide some general results for this setting. In Section 3 we then examine some aspects of the further dynamics that appear in models with more periods. 2.1. Preferences A typical consumer in the economy values consumption today (c1) and tomorrow (c2). Specifically, the consumer has Gul–Pesendorfer preferences represented by two functions u(c1, c2) and v(c1, c2), where u is commitment utility and v is temptation utility. The decision problem of a typical consumer then is

max_{c1, c2} {u(c1, c2) + v(c1, c2)} − max_{c̃1, c̃2} v(c̃1, c̃2)
subject to a budget constraint that we specify below. The consumer’s actual choice maximizes the sum u(c1 c2 ) + v(c1 c2 ) of the commitment and temptation utilities, and for any choice bundle (c1 c2 ), the cost of self-control is max v(c˜1 c˜2 ) − v(c1 c2 ). We make three assumptions. 2
See Proposition 6.
ASSUMPTION 1: u(c1, c2) and v(c1, c2) are twice continuously differentiable.

ASSUMPTION 2: u1(c1, c2)/u2(c1, c2) < v1(c1, c2)/v2(c1, c2) for all c1 and c2.
ASSUMPTION 3: u1, u2, v1, v2 > 0; u11, u22, v11, v22 < 0; and u12, v12 ≥ 0.

Assumption 2 specializes to the case where temptation utility is tilted toward current consumption more than is commitment utility. 2.2. Budget Constraints Each consumer is endowed with k1 units of capital at the beginning of the first period and with one unit of labor in each period. Consumers rent these factors at given prices. Let r1 (r2) and w1 (w2) be the gross return on savings and wage rate in the first (second) period, respectively, and let P be the price vector defined as P = (r1, r2, w1, w2). We will specify the determination of prices in the following subsections. Given these prices, the consumer's budget set is B(k1, P) ≡ {(c1, c2) : ∃k2 : c1 = r1 k1 + w1 − k2 and c2 = r2 k2 + w2}, where k2 is the consumer's asset holding at the beginning of period 2 (i.e., his savings in period 1). Inserting the definitions of the functions u and v into the consumer's objective function and combining terms, a typical consumer's decision problem is

max_{(c1, c2)∈B(k1, P)} {u(c1, c2) + v(c1, c2)} − max_{(c̃1, c̃2)∈B(k1, P)} v(c̃1, c̃2)
In this two-period problem, the temptation part of the problem (i.e., the second maximization problem in the objective function) plays no role in determining the consumer's actions in period 1. The temptation part of the problem does, however, affect the consumer's welfare, as we discuss below in Section 2.4. Letting ū = u + v, the consumer's intertemporal first-order condition is

ū1(c1, c2)/ū2(c1, c2) = [u1(c1, c2) + v1(c1, c2)]/[u2(c1, c2) + v2(c1, c2)] = r2.

It is straightforward to see that the intertemporal consumption allocation (which, in effect, maximizes u + v) represents a compromise between maximizing u and maximizing v. In contrast, the allocation that maximizes the temptation utility satisfies v1(c̃1, c̃2)/v2(c̃1, c̃2) = r2. Assumptions 2 and 3 imply that c̃1 > c1 and c̃2 < c2.
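As a purely illustrative numerical sketch of the objects just defined (the logarithmic functional forms and all parameter values below are assumptions chosen for the example, not taken from the paper), the following Python snippet builds a budget set B(k1, P), computes the actual choice (which maximizes u + v), the most tempting choice (which maximizes v), and the self-control cost, and confirms the pattern c̃1 > c1 and c̃2 < c2 implied by Assumptions 2 and 3.

import numpy as np

# Assumed primitives: log commitment utility and a temptation utility tilted
# toward period-1 consumption, in the spirit of Assumption 2.
delta, beta, gamma = 0.96, 0.7, 1.0        # assumed parameter values
r1, r2, w1, w2, k1 = 1.04, 1.04, 1.0, 1.0, 0.5

u = lambda c1, c2: np.log(c1) + delta * np.log(c2)
v = lambda c1, c2: gamma * (np.log(c1) + delta * beta * np.log(c2))

def choices_on_budget(n=20000):
    # Budget set B(k1, P): c1 = r1*k1 + w1 - k2 and c2 = r2*k2 + w2
    k2 = np.linspace(1e-6, r1 * k1 + w1 - 1e-6, n)
    c1, c2 = r1 * k1 + w1 - k2, r2 * k2 + w2
    actual = np.argmax(u(c1, c2) + v(c1, c2))   # actual choice maximizes u + v
    tempted = np.argmax(v(c1, c2))              # most tempting point maximizes v
    return (c1[actual], c2[actual]), (c1[tempted], c2[tempted])

(c1a, c2a), (c1t, c2t) = choices_on_budget()
self_control_cost = v(c1t, c2t) - v(c1a, c2a)
welfare = u(c1a, c2a) + v(c1a, c2a) - v(c1t, c2t)
print(f"actual choice    c1={c1a:.3f}, c2={c2a:.3f}")
print(f"tempting choice  c1={c1t:.3f}, c2={c2t:.3f}   (c1~ > c1 and c2~ < c2)")
print(f"self-control cost = {self_control_cost:.4f}, welfare = {welfare:.4f}")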
2.3. Government Policy We examine the effects of proportional taxes and subsidies. Thus, let there be a lump-sum transfer s and a proportional tax τi on investment in the first period. The consumer’s budget set then is Bτ (k1 P) ≡ {(c1 c2 ) : ∃k2 : c1 = r1 k1 + w1 + s − (1 + τi )k2 and c2 = r2 k2 + w2 } where k2 is the consumer’s asset holding at the beginning of period 2 (i.e., his savings in period 1). We assume that the government balances its budget in each period. Since the government has no exogenous expenditures to finance, its budget constraint reads s = τi k¯ 2 , where k¯ 2 is the representative agent’s savings in period 1. The restriction to linear schemes is motivated primarily by practical concerns: most real-world tax schemes for savings are linear or close to linear and it is interesting to know whether, without using a more sophisticated instrument, it is possible to improve welfare if consumers suffer from temptation and self-control costs. We thus simply answer the broad question, “Can a subsidy to savings improve welfare?,” a question which presumes linearity. In our environment, it is entirely nontrivial what the answer is: short of “forcing the consumer to consume what she likes,” is it possible for the government to improve matters? We show below that the answer is yes in all cases except one (see Section 2.5.1), and also that in one case of particular interest (Section 3.4), linearity is actually not restrictive at all: it can achieve the full optimum. These results and the reasons for them are, we think, quite illuminating. Of course, it is an open question why linear or near-linear tax schedules are used so often. Linear schemes are simple and likely easier to use than many nonlinear ones, although it would be quite challenging to make this point formally.3 It would indeed be very interesting to extend the tax setting used here to allow limited forms of nonlinearity, for example, cases with savings floors, particularly in a version of the model where nonlinearity is particularly “costly.” For example, if consumers have different discount rates, a savings floor which is uniform—say, because discount rates are private information—may be quite costly if some consumers have much stronger discounting than others; for an analysis along these lines, see Amador, Werning, and Angeletos (2006). However interesting, it is beyond the scope of the present paper to pursue these ideas further. In what follows, we will use the command policy—where the government is fully unrestricted and thus can give the consumer a singleton 3
For example, when there is consumer heterogeneity, implementing a nonlinear taxation scheme—such as the command policy—might impose large transactions costs on the government, requiring it to know each consumer’s preferences and resources in every state of the world.
choice set—as a natural benchmark against which to evaluate the efficacy of linear taxation.4 The government’s objective, then, is to choose the tax rate and transfer so that an individual’s welfare is maximized (subject to the government’s budget constraint). With a change in taxes, individuals are induced to behave differently; in addition, temptation changes because taxation changes the shape of the budget sets. It is thus not a priori clear how taxes influence equilibrium utility. 2.4. Partial Equilibrium In this section, we examine the effects of proportional taxes and subsidies for a fixed price vector P = (r1 r2 w1 w2 ).5 Proposition 1 states that it is optimal to subsidize savings in this case. PROPOSITION 1: In the partial-equilibrium two-period model, the optimal investment tax is negative. As becomes clear in the proof of the proposition, the optimal investment subsidy is positive if the representative consumer’s actual saving is greater than what would have been chosen had he succumbed to temptation. This can be explained intuitively. Consider increasing the investment subsidy from τi = 0. The marginal effect of this increase on u + v, evaluated at the consumption bundle actually chosen by the consumer, (c1 c2 ), is zero in the two-period model since the consumer is choosing his saving optimally. However, the marginal effect of this increase on the maximal level of temptation, v(c˜1 c˜2 ), is negative. The government sets tax rates so as to balance the budget based on equilibrium behavior: in equilibrium, the consumer pays a lump-sum tax equal to the amount of the investment subsidy received. When the investment subsidy is positive, a consumer who deviates to save less would thus not receive as large an investment subsidy, while paying the same tax. Therefore, succumbing to temptation is now less attractive: increasing the investment subsidy reduces the maximal level of temptation. With optimal taxation, thus, the consumer is induced to save more, so that his intertemporal consumption allocation is tilted more toward the future than in the absence of taxation. At the same time, the change in the slope of the consumer’s budget constraint reduces (other things equal) the temptation faced by the consumer. The net result is to increase the consumer’s welfare. 4
When consumers are tempted to consume “too much,” the command policy is equivalent to the imposition of a savings floor, with the floor chosen to replicate the command allocation. 5 The partial-equilibrium economy could also be viewed as one in which there are two linear production technologies, one using capital (with fixed marginal returns r1 and r2 ) and one using labor (with fixed marginal returns w1 and w2 ).
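To see the logic of Proposition 1 at work numerically, the sketch below (logarithmic utility and all parameter values are assumptions made only for this illustration) imposes the balanced-budget condition s = τi k2 at the consumer's actual choice, computes equilibrium welfare over a grid of linear investment taxes in the partial-equilibrium two-period setting, and reports the welfare-maximizing rate; it comes out negative, i.e., a subsidy, and with log utility it matches the closed form γ(β − 1)/(1 + γ) that appears later in Proposition 4.

import numpy as np

delta, beta, gamma = 0.96, 0.7, 1.0              # assumed preference parameters
r1, r2, w1, w2, k1 = 1.04, 1.04, 1.0, 1.0, 0.5   # assumed (fixed) prices and endowment

def equilibrium(tau, iters=200):
    """Actual and tempting choices when the transfer s balances the government
    budget at the consumer's actual choice: s = tau * k2."""
    s = 0.0
    for _ in range(iters):
        # present-value wealth, with period-2 consumption priced at (1+tau)/r2
        W = r1 * k1 + w1 + s + (1 + tau) * w2 / r2
        D = (1 + gamma) + delta * (1 + beta * gamma)
        c1 = (1 + gamma) / D * W                         # actual choice (max of u+v)
        c2 = delta * (1 + beta * gamma) / D * W * r2 / (1 + tau)
        k2 = (r1 * k1 + w1 + s - c1) / (1 + tau)
        s = tau * k2                                     # balanced budget
    c1t = W / (1 + delta * beta)                         # most tempting choice (max of v)
    c2t = delta * beta / (1 + delta * beta) * W * r2 / (1 + tau)
    return c1, c2, c1t, c2t

def welfare(tau):
    c1, c2, c1t, c2t = equilibrium(tau)
    u  = np.log(c1) + delta * np.log(c2)
    v  = gamma * (np.log(c1) + delta * beta * np.log(c2))
    vt = gamma * (np.log(c1t) + delta * beta * np.log(c2t))
    return u + v - vt

grid = np.linspace(-0.4, 0.2, 1201)
best = grid[np.argmax([welfare(t) for t in grid])]
print(f"welfare-maximizing investment tax  ~ {best:.4f}")
print(f"closed-form gamma*(beta-1)/(1+gamma) = {gamma*(beta-1)/(1+gamma):.4f}")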
2.5. General Equilibrium In this section, we study economies in which prices are no longer exogenous but instead adjust to clear perfectly competitive markets. There are important differences between the cases with and without intertemporal production. In an endowment economy, tax policy cannot influence aggregate savings and, consequently, has no influence on equilibrium welfare. In an economy with production, by contrast, tax policy can influence aggregate savings and welfare, though to a smaller extent than in partial equilibrium because of equilibrium effects on prices. 2.5.1. An Endowment Economy This section considers the role of taxation in an endowment economy. Proposition 2 states that a (linear) investment tax cannot improve welfare in an endowment economy. PROPOSITION 2: In an endowment economy, an investment tax has no effect on equilibrium welfare. In an endowment economy, prices adjust so that consumers choose to hold the endowment at all points in time. The proportional tax τi , therefore, cannot influence (realized) consumption. Furthermore, taxes are not useful for decreasing the disutility of self-control either. In equilibrium, the slope of the budget line at the endowment point is given from preferences (where commitment utility and temptation utility both matter). Because equilibrium consumption cannot change in response to taxes, this slope does not change: the slope is determined by the net-of-tax return on savings, and any change in tax rates simply changes the before-tax return. Thus, taxes do not influence the choice set of consumers: whatever temptations consumers face, they cannot be influenced by a proportional tax.6 2.5.2. An Economy With Intertemporal Production In this section, we examine the effects of proportional taxes in a generalequilibrium economy with production. Let f be a standard neoclassical aggregate production function and let there be standard geometric depreciation at rate d. The main difference between the endowment economy and the economy with intertemporal production is that prices (wages and interest rates) are 6 Nonlinear taxes, of course, would change the consumer’s equilibrium choice set and would, therefore, affect the disutility of self-control. In addition, we study an economy with a representative consumer; in an economy with heterogeneous consumers who differ in, say, their short-run discount rates, it is conceivable that linear taxation could affect individual allocations, and hence welfare, even if the aggregate allocation is fixed. We leave the examination of this possibility to future research.
determined by the aggregate savings behavior of consumers according to the usual marginal-product conditions. Specifically,

r1 = r(k̄1) = 1 + f′(k̄1) − d  and  w1 = w(k̄1) = f(k̄1) − f′(k̄1)k̄1,  and
r2 = r(k̄2) = 1 + f′(k̄2) − d  and  w2 = w(k̄2) = f(k̄2) − f′(k̄2)k̄2,
so that tax policy can influence prices through an impact on investment. Proposition 3 states that the government can improve an individual's welfare by imposing a negative tax (i.e., a subsidy) on investment. PROPOSITION 3: In the two-period production economy, the optimal investment tax is negative. Propositions 1 and 2 establish that investment subsidies can improve welfare when the government can influence aggregate saving outcomes, but they are silent on the sizes of both the optimal subsidies and the welfare improvements that accompany them. A proper quantitative analysis, however, requires both extending the model to a long (infinite) time horizon and finding a reasonable way to calibrate its key parameters (especially the parameters governing preferences). Krusell, Kuruşçu, and Smith (2010) tackled these tasks. Nonetheless, there is a qualitative question of theoretical interest: how does optimal policy and, in particular, the implied saving behavior, compare to that dictated by commitment utility (that is, that which would be chosen if the consumer—or the government, via general nonlinear taxation—had access to commitment)? Actual choices in this model are informed by both commitment and temptation utility, that is, they end up in between what commitment utility and temptation utility would dictate. Here we specialize utility functions to show that there is no presumption that, in general, optimal saving (i.e., saving under the optimal tax rate) lies in between also. In particular, optimal saving can actually prescribe less consumption today than would commitment utility. We specialize by making functional-form assumptions that are typical in the applied macroeconomics literature. Specifically, we assume that preferences are additively separable across time and that the period utility function features constant relative risk aversion,

u(c1, c2) = c1^{1−σ}/(1 − σ) + δ c2^{1−σ}/(1 − σ)  and  v(c1, c2) = γ[c1^{1−σ}/(1 − σ) + δβ c2^{1−σ}/(1 − σ)],
so that β < 1 regulates temptation impatience relative to commitment impatience. In this case, the consumer's problem can be written as

max_{(c1, c2)∈Bτ(k1, P)} (1 + γ) c1^{1−σ}/(1 − σ) + δ(1 + βγ) c2^{1−σ}/(1 − σ) − γ max_{(c̃1, c̃2)∈Bτ(k1, P)} [c̃1^{1−σ}/(1 − σ) + δβ c̃2^{1−σ}/(1 − σ)].

In the expression above, γ{c̃1^{1−σ}/(1 − σ) + δβ c̃2^{1−σ}/(1 − σ) − [c1^{1−σ}/(1 − σ) + δβ c2^{1−σ}/(1 − σ)]} is the cost of self-control. As we show in Section 3, where we extend the model to T periods, under these functional-form assumptions, the Gul–Pesendorfer model of temptation and self-control nests the multiple-selves model. In addition, the tractability provided by these assumptions allows us to obtain some additional analytical results. We consider first the case of logarithmic (period) utility. The following proposition gives explicit solutions for both laissez-faire and optimal saving.
PROPOSITION 4: In the two-period model with logarithmic utility, laissez-faire savings are a fraction δ(1 + βγ)/(1 + γ + δ(1 + βγ)) of present-value wealth. The optimal investment tax is τi* = γ(β − 1)/(1 + γ) < 0 and the associated savings fraction is δ/(1 + δ). The optimal subsidy depends only on preference parameters and not on the specification of technology. Specifically, it increases as temptation grows larger: it decreases as β increases (thereby reducing the gap between the "long-run" discount rate δ and the "short-run" discount rate βδ) and it increases in the strength, γ, of the temptation. To obtain the savings rate under commitment, set γ = 0 (or β = 1) in the expression for the laissez-faire savings rate; to obtain the savings rate when succumbing to temptation, let γ → ∞. The laissez-faire savings rate then lies in between these two extremes. Moreover, the optimal savings rate is identical to the laissez-faire savings rate under commitment. This result holds because, in the special case of logarithmic utility, the ratio of temptation consumption to actual consumption is a constant that depends only on preference parameters. This fact implies, in turn, that the cost of self-control depends only on preference parameters and, in particular, does not depend either on prices or taxes. Changes in the subsidy, consequently, leave the cost of self-control unchanged when utility is logarithmic, and the government in effect chooses the optimal subsidy rate simply to maximize commitment utility. With logarithmic utility, the competitive equilibrium allocation with optimal taxation coincides with the command (or commitment) outcome, that is, the allocation that obtains when the government chooses for the consumer by restricting his consumption set to a singleton (or when the consumer does
not suffer from self-control problems). Specifically, the command (or commitment) allocation maximizes welfare using the commitment utility function with β = 1. But welfare is higher under the command outcome than under the competitive equilibrium allocation with optimal taxation because the consumer does not incur a self-control cost when his choice set is a singleton. Proposition 4 shows that optimal policy under logarithmic utility dictates more than a marginal distortion: the prescription is to distort so that the equilibrium allocation is the same one that would obtain under commitment. Is this case a bound on the size of the distortion or does optimal policy sometimes prescribe a distortion that is strong enough to go beyond the commitment allocation? The following proposition answers this question affirmatively when the elasticity of intertemporal substitution, σ−1, is close to 1 (i.e., when utility is close to logarithmic). PROPOSITION 5: Given σ, let τi(σ) be the optimal investment subsidy, and let c1(τi(σ)) and c2(τi(σ)) be the associated equilibrium consumption allocation. Then dτi(σ)/dσ |σ=1 < 0 and d/dσ [c2(τi(σ))/c1(τi(σ))] |σ=1 > 0.
Thus, near σ = 1, the optimal subsidy is larger (smaller) than the optimal subsidy under logarithmic utility when σ > (<) 1. Moreover, when σ > (<) 1, the competitive equilibrium allocation under optimal taxation is tilted more (less) toward future consumption than is the commitment allocation. 3. THE T-PERIOD MODEL Does the prescription that investment should be subsidized extend to a longer horizon model? To answer this question, we extend the general-equilibrium model with production analyzed above to have T periods. This extension requires us to specialize preferences, again along the lines of what seems useful for applied macroeconomic modeling and for comparisons with the well known Chamley–Judd result that the optimal tax on investment is zero in the long run. In particular, we use "quasigeometric temptation," which we show nests the Laibson model (the case γ → ∞) for constant relative risk aversion (CRRA) preferences. Our demonstration that investment should be subsidized in the long run uses a further specialization of preferences, first to logarithmic utility and then to CRRA utility when γ → ∞ (i.e., the Laibson case), since these assumptions permit an explicit solution for both laissez-faire outcomes and optimal outcomes. 3.1. Quasigeometric Temptation Consider a T-period (periods 0, 1, . . . , T) production economy where taxes and transfers are allowed to be different across periods. The agent makes his decision by taking as given the aggregate prices as functions of the aggregate
capital k̄, the law of motion for aggregate capital k̄′ = Gt(k̄), and the sequence of transfers and taxes. The problem of the price-taking agent in period t, using recursive notation (a prime symbolizes next-period values), is given by

Ut(k, k̄) = max_{c, k′} u(c) + δUt+1(k′, Gt(k̄)) + Vt(k′, Gt(k̄)) − max_{c̃, k̃′} Vt(k̃′, Gt(k̄)),

where the temptation function is quasigeometric,

Vt(k′, k̄′) = γ[u(c) + βδUt+1(k′, k̄′)],

with a budget constraint (which applies for both actual and temptation choices) given by

c + (1 + τit)k′ = r(k̄)k + w(k̄) + st.

The investment subsidy τit is allowed to depend on time, and the lump-sum transfer st varies with τit and k̄ so as to ensure that the government's budget balances. The consumer's actual savings are determined by a "realized" decision rule k′ = gt(k, k̄); similarly, savings when succumbing to temptation are determined by a "temptation" decision rule k̃′ = g̃t(k, k̄). DEFINITION 1: A time-t recursive competitive equilibrium for this economy consists of a pair of decision rules gt(k, k̄) and g̃t(k, k̄), a pair of value functions Ut(k, k̄) and Vt(k, k̄), pricing functions r(k̄) and w(k̄), and a law of motion for aggregate capital Gt(k̄), such that (i) given Ut(k, k̄) and Vt(k, k̄), gt(k, k̄) solves the maximization problem above and g̃t(k, k̄) maximizes Vt(k, k̄), (ii) prices are given by r(k̄) = 1 − d + f′(k̄) and w(k̄) = f(k̄) − f′(k̄)k̄, (iii) the law of motion for aggregate capital is consistent with the individual decision rule, that is, gt(k̄, k̄) = Gt(k̄), and (iv) the government budget balances in each period: st = τit Gt(k̄). We require the government to run a balanced budget in this definition, but this requirement is not restrictive, because a Ricardian-equivalence result obtains straightforwardly in this environment (i.e., given the sequence of investment subsidies and accompanying balanced-budget lump-sum taxes, government deficits and/or surpluses financed by incremental lump-sum taxes/subsidies would have no effect on equilibrium allocations).7 7 With borrowing constraints, Ricardian equivalence might fail to hold in the model of temptation and self-control even if the timing of taxes does not influence actual consumption choices or equilibrium interest rates. In particular, borrowing constraints could still affect welfare if they restrict the temptation choice but not the actual choice.
3.2. Generalized Euler Equations Solving for equilibrium requires finding two decision rules: one for actual savings decisions and one for temptation savings decisions. It is straightforward to derive a pair of generalized Euler equations (GEEs) that determine these two decision rules. These GEEs will prove useful for interpreting the policy results in the T-period model. The GEE for the actual choice is

u′(ct) = δ [(1 + βγ)/(1 + γ)] [r(k̄t+1)/(1 + τit)] {(1 + γ)u′(ct+1) − γu′(c̃t+1)},

where ct and ct+1 are the actual consumption levels in periods t and t + 1, and c̃t+1 is temptation consumption in period t + 1. The GEE for the temptation choice is

u′(c̃t) = δβ [r(k̄t+1)/(1 + τit)] {(1 + γ)u′(c^s_{t+1}) − γu′(c̃^s_{t+1})},

where c̃t is the consumption level in period t in the hypothetical case that the consumer succumbs to temptation today and c^s_{t+1} and c̃^s_{t+1} are the actual and temptation consumption levels in period t + 1 given that the consumer succumbs today. The GEEs differ from standard Euler equations in two ways. First, the discount factors are smaller than the discount factor for commitment utility, δ (the discount factor in the GEE for actual consumption is between δ and the discount factor for temptation utility, βδ). Second, there is an additional term γ[u′(ct+1) − u′(c̃t+1)] on the right-hand side of the GEEs. This term is positive because utility is strictly concave and temptation consumption exceeds actual consumption (assuming impatience). Thus, relative to the standard consumption–savings model, there is an additional benefit to saving here. 3.3. Characterization In this section we specialize preferences to cases that are of particular interest from the perspective of the macroeconomics literature. These will then be used in the subsequent section, where we study optimal policy in the T-period model. We look first at (period) utility functions with a constant elasticity of intertemporal substitution, that is, u(c) = c^{1−σ}/(1 − σ) for σ > 0 (or logarithmic utility if σ = 1). For this case, our model nests the Laibson formulation. In particular, Proposition 6 shows that (given prices) as γ → ∞, the consumer's value function converges to the function under commitment utility but evaluated at temptation consumption. PROPOSITION 6: Given a law of motion for aggregate capital, k̄′ = Gt(k̄), and a sequence of taxes and transfers, as γ → ∞, the Gul–Pesendorfer (GP) model
converges to the Laibson model, that is, the value functions and consumption choices of the consumer in the GP setting are given by

Ut(k, k̄) = c^{1−σ}/(1 − σ) + δUt+1(k′, Gt(k̄)),

where

(c, k′) = arg max c^{1−σ}/(1 − σ) + δβUt+1(k′, Gt(k̄))  s.t.  c + (1 + τit)k′ = r(k̄)k + w(k̄) + st.

This limit offers a resolution to the problem of which of the consumer's selves to use when assessing welfare in the multiple-selves model. Specifically, in this limit the consumer succumbs completely to temptation, but he evaluates welfare by discounting using the discount factors in commitment utility. For the case of logarithmic utility, we obtain a similar result regardless of the extent to which the consumer succumbs to temptation (i.e., for any value of γ). PROPOSITION 7: Given a law of motion for aggregate capital, k̄′ = Gt(k̄), and a sequence of taxes and transfers, when u(c) = log(c), the value function and consumption choices of the agent are given by

Ut(k, k̄) = log(c) + δUt+1(k′, Gt(k̄)) + Ω,

where

(c, k′) = arg max (1 + γ) log(c) + δ(1 + βγ)Ut+1(k′, Gt(k̄))  s.t.  c + (1 + τit)k′ = r(k̄)k + w(k̄) + st,

where Ω is a constant that depends only on preference parameters. This result holds because, as in the two-period model with logarithmic utility, both actual and temptation consumption are proportional (at any point in time) to lifetime income, with the constant of proportionality depending only on preference parameters (and not on prices or taxes). Thus, the ratio of actual to temptation consumption depends only on preference parameters at any point in time. As in the two-period model, the self-control cost (the constant Ω in Proposition 7) depends only on preference parameters and does not vary either with prices or with policy.
3.4. Optimal Policy In this section, we study optimal policy in the T-period model under the assumption that the government can commit to a sequence of tax and/or subsidy rates. In Proposition 8, we analyze the case of logarithmic preferences for any values of β and γ. In Proposition 9, we analyze the case of CRRA preferences when γ → ∞ (for any value of β). We therefore nest the Laibson multiple-selves formulation (which appears in the limit as γ → ∞). As in the previous sections, the government's objective is to maximize time-0 lifetime utility of the representative agent. Proposition 7 shows that for logarithmic utility, the welfare of the representative agent at time 0 is

U0(k̄0, k̄0) = a constant + Σ_{t=0}^{T} δ^t u(ct).

The government's goal is to maximize this welfare function subject to the aggregate resource constraint ct + k̄t+1 − (1 − d)k̄t = f(k̄t). The welfare-maximizing consumption allocation, therefore, must satisfy the first-order condition u′(ct)/u′(ct+1) = δr(k̄t+1) at every point in time. As in the two-period model with logarithmic utility, the government's optimal policy replicates the commitment allocation. To find the tax policy that generates the commitment allocation as a competitive equilibrium outcome, it is straightforward to use the optimality conditions of a typical (competitive) consumer to find the sequence of tax rates that induces him to choose it (see the proof of Proposition 8 for details). Proposition 8 gives the optimal sequence of subsidies to investment.8 PROPOSITION 8: Under logarithmic utility, the optimal tax at time t is given by

τit = γ(β − 1)/(1 + γ)  for t = T − 1,
τit = γ(β − 1)/[1 + γ + δ(1 + δ + · · · + δ^{T−2−t})(1 + βγ)]  for t < T − 1.

The optimal investment subsidies are all positive, because when the self-control cost is independent of prices and policies (as it is under logarithmic utility), the government's objective reduces to maximizing the commitment utility function. Thus, the optimal government policy is to replicate the commitment savings rate, and since the savings rate is lower in competitive equilibrium than under commitment, the optimal policy is to subsidize savings. Under logarithmic utility, the optimal subsidies depend only on preference parameters. This result follows from the fact that the income and substitution effects of changes in interest rates exactly offset each other when utility is logarithmic, so that the ratio of consumption to income at any point in time is a constant that depends only on preference parameters. Furthermore, at each time t, the optimal subsidies are decreasing in β (provided β < 1) and increasing in γ: that is, they increase as temptation grows stronger. Finally, the optimal subsidies increase as an individual comes closer to period T: τiT−2 < · · · < τi1 < τi0 < 0. To see why, examine the GEE at time t: 8 In a multiple-selves consumption–savings model, Laibson (1996) also argued that optimal policy requires subsidizing savings.
Under logarithmic utility, the optimal subsidies depend only on preference parameters. This result follows from the fact that the income and substitution effects of changes in interest rates exactly offset each other when utility is logarithmic, so that the ratio of consumption to income at any point in time is a constant that depends only on preference parameters. Furthermore, at each time t, the optimal subsidies are decreasing in β (provided β < 1) and increasing in γ: that is, they increase as temptation grows stronger. Finally, the optimal subsidies increase as an individual comes closer to period T : τiT −2 < · · · < τi1 < τi0 < 0. To see why, examine the GEE at time t:
ct+1 ct+1 δ(1 + βγ) r(k¯ +1 ) 1+γ 1− = ct 1 + γ 1 + τit c˜t+1 The ratio of actual consumption to temptation consumption grows larger as t increases, so the term in square brackets on the right-hand side of the (rearranged) GEE is larger at earlier than at later dates.9 As a result, the righthand side of the Euler equation is closer to the right-hand side of the commitment Euler equation at early dates. Replicating the right-hand side of the commitment Euler equation, therefore, requires a smaller subsidy at earlier dates. An immediate implication of Proposition 8 follows. COROLLARY 1: Under logarithmic utility, as T → ∞, the optimal tax at any fixed t converges to τi =
γ(β − 1) δ (1 + βγ) 1+γ+ 1−δ
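As a quick numerical reading of Proposition 8 and Corollary 1 (the parameter values below are assumptions made only for illustration), the snippet evaluates the optimal tax formula along a finite horizon, checks that the rates become more negative toward T, and compares the early-date rates with the infinite-horizon limit.

# Illustrative evaluation of the optimal investment taxes in Proposition 8
# and their limit in Corollary 1; parameter values are assumptions.
delta, beta, gamma = 0.96, 0.7, 1.0
T = 40

def tau_opt(t, T):
    if t == T - 1:
        return gamma * (beta - 1) / (1 + gamma)
    m = sum(delta ** j for j in range(T - 1 - t))   # 1 + delta + ... + delta^(T-2-t)
    return gamma * (beta - 1) / (1 + gamma + delta * m * (1 + beta * gamma))

path = [tau_opt(t, T) for t in range(T)]
tau_limit = gamma * (beta - 1) / (1 + gamma + delta / (1 - delta) * (1 + beta * gamma))

print("tau at t = 0:      ", round(path[0], 4))
print("tau at t = T - 1:  ", round(path[-1], 4))
print("more negative toward T:", all(path[t] >= path[t + 1] for t in range(T - 1)))
print("infinite-horizon limit:", round(tau_limit, 4))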
Thus, the celebrated Chamley–Judd result that investment (alternatively, capital income) should be undistorted in the long run does not apply in this model.10 9 To see why the ratio of actual to temptation consumption increases with age, note that consumption at time t is given by ct = Yt/[1 + (δ(1 + βγ)/(1 + γ))mt] and temptation consumption is given by c̃t = Yt/(1 + δβmt), where Yt is lifetime income at time t and mt = 1 + δ + · · · + δ^{T−t−1}. The mt's decrease over time, so ct/c̃t increases over time.
For any finite horizon, the optimal subsidy rate will in fact increase over time, and for the infinite-horizon case, the optimal subsidy rate is time-invariant. We turn now to the determination of optimal policy under CRRA utility for the limiting case in which γ → ∞. Proposition 6 shows that the objective of the government is the same as in the logarithmic case, except that the constant term in the objective function (which captures the cost of self-control in the logarithmic case) is equal to zero. The optimal policy, therefore, is to replicate the commitment allocation, as stated formally in Proposition 9. PROPOSITION 9: Under CRRA utility, in the limiting case γ → ∞, the optimal sequence of investment taxes implements the commitment allocation and generates the first-best welfare outcome for the consumer. When the consumer succumbs completely to temptation, therefore, restricting the set of tax instruments to a linear class does not prevent the government from achieving the first-best welfare outcome. Moreover, we can again show that as T → ∞, the optimum calls for a subsidy to savings, in contrast to the Chamley–Judd result. 4. CONCLUSION AND REMARKS This paper makes clear that when consumers suffer from temptation and self-control problems, linear tax schedules can improve consumers' welfare, even though such schedules are not very powerful tools for restricting consumers' choice sets. The direction of the change is the expected one: when temptation is characterized by "excessive" impatience, optimal policy is to subsidize savings. Moreover, in the special case in which consumers succumb completely to temptation (i.e., the multiple-selves model), linear taxes deliver first-best welfare outcomes. As discussed in Section 2.3, it would be very interesting to extend the present analysis to nonlinear taxation, especially when there is consumer heterogeneity and private information about types makes it costly to use nonlinear (and linear) schemes. It would also be interesting to consider political-economy constraints on taxes; in practice, we observe a range of tax policy outcomes that do not appear to line up with theoretical prescriptions. For example, we suggest (Section 3.4) that the Chamley–Judd prescription that "optimal taxes on capi-
The usual setting for the Chamley–Judd result is an infinite-horizon economy. Here, we obtain results for the infinite-horizon economy by studying the limit of a sequence of finite-horizon economies as the horizon grows long.
tal income should be zero in the long run" could be sharpened to ". . . should be negative . . . ," but in reality these taxes are positive, and in some places large, and it is highly likely that these outcomes have political-economy underpinnings.11 Integrating such constraints is an important topic that we hope to address in future work. APPENDIX PROOF OF PROPOSITION 6: Let Yt be the lifetime income from period t on, which is given by rt kt + wt + wt+1/rt+1 + · · · for a given price sequence {(rt, wt)}_{t=0}^{T}. (For simplicity, we assume that taxes and transfers are zero in the proof of the proposition, but it is straightforward to adapt the proof to allow for nonzero taxes.) The budget constraint of the agent at time t in terms of consumption in that period and the lifetime income from next period on is given by

ct + Yt+1/rt+1 = Yt.
To prove this proposition, we show that the optimization problem of the consumer in period t takes the form
Ut(Yt) = max_{Yt+1} (1 + γ)/(1 − σ) (Yt − Yt+1/rt+1)^{1−σ} + δ(1 + βγ)Ut+1(Yt+1) − γ max_{Ỹt+1} [1/(1 − σ) (Yt − Ỹt+1/rt+1)^{1−σ} + δβUt+1(Ỹt+1)],

where Ut(Yt) is given by Ut(Yt) = bt Yt^{1−σ}/(1 − σ);
bt is a constant that depends on utility parameters, prices, and time period t. In addition, starting from the last period, we show that, as γ → ∞, Yt+1 → Ỹt+1 and that the value function of the consumer for each t is given by

Ut(Yt) = 1/(1 − σ) (Yt − Ỹt+1/rt+1)^{1−σ} + δUt+1(Ỹt+1).
See, for example, Acemoglu, Golosov, and Tsyvinski (2008).
The optimal decision rules for the actual and temptation solutions, respectively, are given by

ct = Yt / [1 + (δ(1 + βγ)bt+1/(1 + γ))^{1/σ} rt+1^{(1−σ)/σ}]  and
Yt+1 = (δ(1 + βγ)bt+1/(1 + γ))^{1/σ} rt+1^{1/σ} Yt / [1 + (δ(1 + βγ)bt+1/(1 + γ))^{1/σ} rt+1^{(1−σ)/σ}],

and

c̃t = Yt / [1 + (δβbt+1)^{1/σ} rt+1^{(1−σ)/σ}]  and
Ỹt+1 = (δβbt+1)^{1/σ} rt+1^{1/σ} Yt / [1 + (δβbt+1)^{1/σ} rt+1^{(1−σ)/σ}].
We start with period T − 1 and continue backward. Note that bT = 1. From the expressions above, it should be clear that, as γ → ∞, cT−1 → c̃T−1 and YT → ỸT. Next we show that

UT−1(YT−1) = c̃T−1^{1−σ}/(1 − σ) + δ ỸT^{1−σ}/(1 − σ).

To show this, we need to show that lim_{γ→∞} γ(cT−1^{1−σ} + δβYT^{1−σ} − c̃T−1^{1−σ} − δβỸT^{1−σ}) = 0. Inserting the decision rules into γ(cT−1^{1−σ} + δβYT^{1−σ} − c̃T−1^{1−σ} − δβỸT^{1−σ}), we obtain

(1)  lim_{γ→∞} γ ( [1 + δβ(δ(1 + βγ)/(1 + γ))^{(1−σ)/σ} rT^{(1−σ)/σ}] / [1 + (δ(1 + βγ)/(1 + γ))^{1/σ} rT^{(1−σ)/σ}]^{1−σ} − [1 + δβ(δβ)^{(1−σ)/σ} rT^{(1−σ)/σ}] / [1 + (δβ)^{1/σ} rT^{(1−σ)/σ}]^{1−σ} ).

Applying l'Hôpital's rule by letting γ̃ = 1/γ and γ̃ → 0, it is easy to show that the limit above converges to zero. Thus,

lim_{γ→∞} UT−1(YT−1) = c̃T−1^{1−σ}/(1 − σ) + δ ỸT^{1−σ}/(1 − σ).

In period t, expression (1) contains bt+1 as

(2)  lim_{γ→∞} γ ( [1 + δβbt+1(δ(1 + βγ)bt+1/(1 + γ))^{(1−σ)/σ} rt+1^{(1−σ)/σ}] / [1 + (δ(1 + βγ)bt+1/(1 + γ))^{1/σ} rt+1^{(1−σ)/σ}]^{1−σ} − [1 + δβbt+1(δβbt+1)^{(1−σ)/σ} rt+1^{(1−σ)/σ}] / [1 + (δβbt+1)^{1/σ} rt+1^{(1−σ)/σ}]^{1−σ} ),

where bT = 1 and bt is given recursively as

bt = (1 + γ)[1 + (δ(1 + βγ)bt+1/(1 + γ))^{1/σ} rt+1^{(1−σ)/σ}]^{σ} − γ[1 + (δβbt+1)^{1/σ} rt+1^{(1−σ)/σ}]^{σ}.

Using this equation and the fact that bT = 1, we can show that lim_{γ̃→0} dbt+1/dγ̃ = 0 for all t, which implies that the expression in (2) converges to zero as γ̃ → 0. Thus,

Ut(Yt) = c̃t^{1−σ}/(1 − σ) + δbt+1 Ỹt+1^{1−σ}/(1 − σ) = c̃t^{1−σ}/(1 − σ) + δUt+1(Ỹt+1).
Q.E.D.
PROOF OF PROPOSITION 8: Proposition 7 provides the value function for the consumer evaluated at the competitive equilibrium allocation, which is also the objective function for the government. The government maximizes the objective function by choosing consumption allocations subject to the economy's resource constraint at each point in time. Thus, setting kt = k̄t, the government's problem reduces to

Ut(k̄t, k̄t, τ) = max_{ct, k̄t+1} log(ct) + δUt+1(k̄t+1, k̄t+1, τ)

subject to the economy's resource constraint

ct + k̄t+1 = (1 − d)k̄t + f(k̄t).

The optimal allocation must satisfy the Euler equation

1/ct = δr(k̄t+1) (1/ct+1).

The government implements this allocation by choosing tax rates such that the Euler equation for the consumer is equivalent to the government's Euler equation above. The proof of Proposition 7 shows that the competitive equilibrium allocation satisfies the Euler equation

1/ct = Mt+1 [r(k̄t+1)/(1 + τit)] (1/ct+1),

where MT = δ(1 + βγ)/(1 + γ), MT−1 = δ(1 + δ)(1 + βγ)/[1 + γ + δ(1 + βγ)], and, in general,

Mt+1 = δ(1 + δ + · · · + δ^{T−t−1})(1 + βγ)/[1 + γ + δ(1 + δ + · · · + δ^{T−t−2})(1 + βγ)].

Thus, the government chooses τit such that Mt+1/(1 + τit) = δ, which delivers τiT−1 = γ(β − 1)/(1 + γ), τiT−2 = γ(β − 1)/[1 + γ + δ(1 + βγ)], and, in general,

τit = γ(β − 1)/[1 + γ + δ(1 + δ + · · · + δ^{T−2−t})(1 + βγ)].  Q.E.D.
REFERENCES ACEMOGLU, D., M. GOLOSOV, AND A. TSYVINSKI (2008): “Political Economy of Mechanisms,” Econometrica, 76, 619–641. [2080] AMADOR, M., I. WERNING, AND G.-M. ANGELETOS (2006): “Commitment vs Flexibility,” Econometrica, 74, 365–396. [2068] CHAMLEY, C. (1986): “Optimal Taxation of Capital Income in General Equilibrium With Infinite Lives,” Econometrica, 54, 607–622. [2065] GUL, F., AND W. PESENDORFER (2001): “Temptation and Self-Control,” Econometrica, 69, 1403–1435. [2063] (2004): “Self-Control and the Theory of Consumption,” Econometrica, 72, 119–158. [2063,2066] (2005): “The Revealed Preference Theory of Changing Tastes,” Review of Economic Studies, 72, 429–448. [2063] JUDD, K. L. (1985): “Redistributive Taxation in a Simple Perfect Foresight Model,” Journal of Public Economics, 28, 59–83. [2065] KRUSELL, P., B. KURU¸SÇU, AND A. A. SMITH, JR. (2009): “How Much Can Taxation Alleviate Temptation and Self-Control Problems?” Manuscript. [2071] (2010): “Supplement to ‘Temptation and Taxation’,” Econometrica Supplemental Material, 78, http://www.econometricsociety.org/ecta/Supmat/8611_Proofs.pdf. [2066] LAIBSON, D. (1996): “Hyperbolic Discount Functions, Undersaving, and Savings Policy,” Working Paper 5635, National Bureau of Economic Research. [2077] (1997): “Golden Eggs and Hyperbolic Discounting,” Quarterly Journal of Economics, 62, 443–477. [2063,2065] PHELPS, E. S., AND R. A. POLLAK (1968): “On Second-Best National Saving and GameEquilibrium Growth,” Review of Economic Studies, 35, 185–199. [2063] STROTZ, R. H. (1956): “Myopia and Inconsistency in Dynamic Utility Maximization,” Review of Economic Studies, 23, 165–180. [2063]
Institute for International Studies, Stockholm University, 106 91 Stockholm, Sweden;
[email protected],
Economics Dept., University of Toronto, 150 St. George St., Toronto, Ontario M5S 3G7, Canada;
[email protected], and Economics Dept., Yale University, 28 Hillhouse Avenue, New Haven, CT 06520, U.S.A.;
[email protected]. Manuscript received June, 2009; final revision received April, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 2085–2099
A PARADOX FOR THE "SMOOTH AMBIGUITY" MODEL OF PREFERENCE
BY LARRY G. EPSTEIN1
Two Ellsberg-style thought experiments are described that reflect on the smooth ambiguity decision model developed by Klibanoff, Marinacci, and Mukerji (2005). The first experiment poses difficulties for the model's axiomatic foundations and, as a result, also for its interpretation, particularly for the claim that the model achieves a separation between ambiguity and the attitude toward ambiguity. Given the problematic nature of its foundations, the behavioral content of the model and how it differs from multiple priors, for example, are not clear. The second thought experiment casts some light on these questions.
KEYWORDS: Ambiguity, calibrating ambiguity aversion, multiple priors, smooth ambiguity model of preference, separation of ambiguity from ambiguity aversion.
1. INTRODUCTION TWO ELLSBERG-STYLE THOUGHT EXPERIMENTS, or examples, are described that reflect on the smooth ambiguity decision model developed by Klibanoff, Marinacci, and Mukerji (KMM) (2005). It is argued that the first experiment poses difficulties for KMM’s axiomatic foundations for their model and, as a result, also for its interpretation, particularly for the claim that the model achieves a “separation” between ambiguity and the attitude toward ambiguity. It is shown that in an important sense separation is not afforded by the model. KMM presented their model as a general model, for example, as an alternative to multiple priors (Gilboa and Schmeidler (1989)), and describe it (p. 1875) as “offering flexibility in modeling ambiguity” and as permitting “a wide variety of patterns of ambiguity.” However, because of its problematic foundations, the behavioral content of the model and how it differs from multiple priors, for example, are not clear. The second thought experiment casts light on these questions by demonstrating important differences from multiple priors as to when randomization between acts is valuable. We begin with an outline of the model that we refer to here as the KMM model. Let Ω be a set of states, let C be the set of consequences or prizes, taken here, for simplicity, to be a compact interval in the real line, and denote by Δ(C) and Δ(Ω) the sets of probability measures on C and Ω, respectively. (Technical details are standard and are suppressed.) An act is a mapping f : Ω → Δ(C), that is, by an act we mean an Anscombe–Aumann act over the 1
This research was supported by a grant from the National Science Foundation (Award SES0917740). I am grateful to Andrew Ellis, Yoram Halevy, Peter Klibanoff, Asen Kochov, Bart Lipman, Mark Machina, Wolfgang Pesendorfer, Martin Schneider, Uzi Segal, Kyoungwon Seo, Peter Wakker, and especially Bob Nau for helpful comments and discussions. This paper was previously titled "Three Paradoxes for the 'Smooth Ambiguity' Model of Preference."
state space Ω.2 The set of all acts is F. KMM also employed second-order acts, which are maps F : Δ(Ω) → C; if F is binary (has only two possible outcomes), refer to it as a second-order bet. The set of all second-order acts is F2. KMM posited a preference order ≿ on F and another preference order ≿2 on F2. The corresponding utility functions, U and U2, have the form

(1.1)  U(f) = ∫_{Δ(Ω)} φ( ∫_Ω u(f(ω)) dp(ω) ) dμ(p),  f ∈ F,

and

(1.2)  U2(F) = ∫_{Δ(Ω)} φ(u(F(p))) dμ(p),  F ∈ F2.
Here μ is a (countably additive) probability measure on Δ(Ω), u : Δ(C) → R is mixture linear, and φ is continuous and strictly increasing on u(C) ⊂ R, where C is identified with a subset of Δ(C) in the familiar way and we denote by u also its restriction to C. Finally, it is assumed that u is continuous and strictly increasing on C. Identify a KMM agent with a triple (u, φ, μ) satisfying the above conditions. These functional forms suggest appealing interpretations. The utility of an Anscombe–Aumann act f in F would simply be its expected utility if the probability law p on Ω were known. However, it is uncertain, in general, with prior beliefs represented by μ, and this uncertainty about the true law matters if φ is nonlinear; in particular, if φ is concave, then

U(f) ≤ φ( ∫_{Δ(Ω)} ∫_Ω u(f(ω)) dp(ω) dμ(p) )
     = φ( ∫_{Δ(Ω)} u( ∫_Ω f(ω) dp(ω) ) dμ(p) )
     = φ( u( ∫_{Δ(Ω)} ∫_Ω f(ω) dp(ω) dμ(p) ) )
     = U(Lf(μ)),

where Lf(μ) is a lottery over outcomes, viewed also as a constant act,

Lf(μ) = ∫_{Δ(Ω)} ∫_Ω f(ω) dp(ω) dμ(p) ∈ Δ(C).
2 KMM used Savage acts over Ω × [0 1] rather than Anscombe–Aumann acts. However, this difference is not important for our purposes. Below by the “KMM model” we mean the Anscombe-Aumann version outlined here, and the corresponding translation of their axioms and arguments.
It is the lottery derived from f if one uses μ to weight probability measures over states and then reduces the resulting three-stage compound lottery in the usual way. In that sense, Lf (μ) and f embody similar uncertainty, but only for f do eventual payoffs depend on states in Ω where there is uncertainty about the true law. Thus the inequality U(f ) ≤ U(Lf (μ))
for all f ∈ F
is essentially KMM’s behavioral definition of ambiguity aversion. As noted, the latter is modeled by a concave φ, while ambiguity (as opposed to the attitude toward it) seems naturally to be captured by μ—hence, it is claimed, a separation is provided between ambiguity and aversion to ambiguity. This separation is highlighted by KMM as a major advantage of their model over all others in the literature (see also their discussion in the paper KMM (2009a, pp. 931–932) dealing with a dynamic model) and often has been cited by researchers as motivation for their adoption of the KMM model. (See Hansen (2007), Chen, Ju, and Miao (2009), Ju and Miao (2009), Collard, Mukerji, Sheppard, and Talon (2008), for example.) Seo (2009) provided alternative foundations for the utility function (1.1) on F (see Section 2.5 below), and Nau (2006) and Ergin and Gul (2009) proposed related models. None made comparably strong claims for their models. Finally, other critical perspectives on the smooth ambiguity model may be found in Baillon, Driesen, and Wakker (2009) and Halevy and Ozdenoren (2008).
2. THOUGHT EXPERIMENT 1 2.1. Second-Order Bets In Ellsberg’s classic three-color experiment, you are told the following. An urn contains three balls, of which one is red (R) and the others are either blue (B) or green (G).3 Then you are offered some bets on the color of the ball to be drawn at random from the urn. Specifically, you are asked to choose
3 Ellsberg postulated 30 red balls and 60 balls that are either blue or green, but the message is clearly the same.
between f1 and f2, and also between f3 and f4, where these acts are defined by

Bets on the Color
        R     B     G
f1    100     0     0
f2      0   100     0
f3    100     0   100
f4      0   100   100
The choices pointed to by Ellsberg (and by many subsequent experimental studies) are

(2.1)  f1 ≻ f2  and  f3 ≺ f4.
The well known intuition for these choices is uncertainty about the true composition of the urn combined with aversion to that uncertainty. Refer to the pair of choices (2.1) as the “Ellsbergian choices.” Consider a simple extension of Ellsberg’s experiment that adds bets on the true composition of the urn. First you are told more about how the urn (subsequently referred to as the normal urn) is constructed. There exists another urn, that we call a second-order urn, containing three balls. One has the label r and the others are labeled either b or g. A ball will be drawn from this urn and the ball’s label will determine the color composition of the normal urn. If the label i ∈ {r b g} is drawn from the second-order urn, then the normal urn will have composition pi , where (2.2)
pr = (1/3, 1/3, 1/3),  pb = (1/3, 2/3, 0),  and  pg = (1/3, 0, 2/3)
are three probability measures on {R B G}. Thus it is certain that there will be one red ball, but there could be either zero or two blue (and hence also green). You are offered some bets, and after making your choices, a ball will be drawn from the second-order urn, and from the normal urn constructed as described according to the outcome of the first draw. Finally, the two balls drawn and the bets chosen determine payoffs. In one pair of choice problems, you choose between f1 and f2 , and between f3 and f4 , the bets on the color drawn from the normal urn as in Ellsberg’s experiment. In addition, you choose between bets on the true composition of the normal urn or, equivalently, on the label of the ball drawn from the second-order urn. Specifically, you choose between F1 and F2 , and between F3
and F4, where they are given by

Bets on the Composition
        r     b     g
F1    100     0     0
F2      0   100     0
F3    100     0   100
F4      0   100   100
The Ellsbergian choices here are

(2.3)  F1 ≻ F2  and  F3 ≺ F4.
Each urn defines a setting that is qualitatively similar to that in Ellsberg's three-color experiment. (The second-order urn is identical to Ellsberg's urn, apart from the rescaling of the total number of balls, while the information given for the normal urn is different but qualitatively similar in that the proportion of one color is unambiguous and only partial information is given about the proportions of the other two.) Thus the two urns are also qualitatively similar to one another and ambiguity aversion suggests Ellsbergian choices both when betting on the color and when betting on the composition. To apply the smooth ambiguity model, we take Ω = {R B G}. Then bets on the color drawn from the normal urn are acts in F, and bets on the true composition of the normal urn are second-order acts, elements in F2. By KMM's (2005) Assumption 2, preference on second-order acts has the subjective expected utility representation (1.2). Thus, the model cannot produce Ellsbergian behavior when betting on the composition. Although there is nothing mysterious in this contradiction, we elaborate shortly on why we feel that nevertheless it has significant implications for the KMM model and its interpretation. To elaborate, note that although the two urns are qualitatively similar, the KMM model treats bets on the urns differently, imposing ambiguity neutrality on one only, because that urn is used to determine the composition of the other. But why should it matter for an individual deciding how to bet whether the urn is a second-order urn or a normal urn? Moreover, if one were to argue for a difference in behavior, then would it not make more sense to argue that ambiguity averse behavior is more pronounced in the case of the second-order urn? After all, for it there is no information at all given about the number of b versus g balls, while the details given about the construction of the normal urn give some information about the number of B versus G balls; in fact, it implies, via the usual probability calculus, that there is an objective probability of at least 1/9 of drawing B and similarly for G. That information leaves much uncertainty, but surely it implies (weakly) less ambiguity than when nothing at all is known as in the second-order urn. Thus even if one grants that an
asymmetry in treatment of the urns is warranted, the asymmetry in the KMM model seems to be in the wrong direction.

Finally, note that our expanded Ellsberg example has nothing to say about any of the other models of ambiguity averse preferences in the literature. The smooth ambiguity model is, to our knowledge, unique in making assumptions about the ranking of second-order acts.

2.2. Why Is It Important?

KMM's (2005, p. 1851) focus is the functional form (1.1) for U on the domain F. They expand the domain to include second-order acts only to provide foundations for preference on F. The focus on F is understandable since economically relevant objects of choice correspond to acts in F, while choices between bets on the true probability law are not readily observed in the field. For example, the purchase of a financial asset is a bet on a favorable realization of the stochastic process generating returns and not directly on which probability law describes that process.4 Thus, it might seem, the domain F² is of secondary importance and counterfactual or counterintuitive predictions there are not critical. The subjective expected utility (SEU) assumption on F², one might think, is merely a simplifying assumption that facilitates focusing on the important behavior.

However, the model's predictions on F² are not a side issue: how the individual treats uncertainty about the true probability law is central to explaining ambiguity averse behavior in the choice between acts in F. Dekel and Lipman (2010, Section 2) argued similarly when considering the more general question of when the refutation of a model's "simplifying assumptions" is important. In their view, this is the case when those simplifications are crucial to the model's explanation of the central observations (here, ambiguity averse behavior in F).

Another way in which the domain F² is a critical ingredient concerns uniqueness and interpretation. It is well known that the μ and φ appearing in (1.1) are not pinned down uniquely by preference on F alone. (For example, if φ is linear, then any two measures μ and μ′ with the same mean represent the same preference on F.) Moreover, interpreting μ and φ as capturing (and separating) ambiguity and ambiguity attitude obviously presupposes that these components are unique. KMM achieved uniqueness by expanding their domain to include second-order acts. Thus, if one is to retain the appealing interpretations that they offer, one cannot simply ignore that component of their model. (See below for further discussion of the model's interpretation.)

4 One might argue that this is reason alone to be dissatisfied with KMM's axioms. That is not our criticism, however. It suffices for our purposes that the ranking of second-order acts is, in principle, observable in the laboratory.
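The nonuniqueness observation above (with linear φ, any two priors with the same mean are observationally equivalent on F) is easy to confirm numerically. The sketch below is a minimal check under assumed primitives of our own: u is the identity on [0, 1], the two priors are a uniform measure and a degenerate one with the same mean over the compositions in (2.2), and the grid of test acts is arbitrary.

```python
import itertools
import math

# Compositions from (2.2), as (P(R), P(B), P(G)), and two priors over them
# that have the same mean (reduced) probability (1/3, 1/3, 1/3) on {R, B, G}.
compositions = {
    "r": (1/3, 1/3, 1/3),
    "b": (1/3, 2/3, 0.0),
    "g": (1/3, 0.0, 2/3),
}
mu1 = {"r": 1/3, "b": 1/3, "g": 1/3}
mu2 = {"r": 1.0, "b": 0.0, "g": 0.0}    # degenerate, but with the same mean

def smooth_utility(act, mu, phi):
    """U(act) = sum_i mu(i) * phi( sum_s p_i(s) * u(act(s)) ), with u the identity on [0, 1]."""
    return sum(mu[i] * phi(sum(p * x for p, x in zip(compositions[i], act)))
               for i in mu)

identity = lambda t: t   # linear phi

# With linear phi, the two priors give the same utility to every act on a grid
# of utility payoffs in [0, 1] for the states (R, B, G): mu is not identified by
# preference on F alone.
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
assert all(abs(smooth_utility(a, mu1, identity) - smooth_utility(a, mu2, identity)) < 1e-12
           for a in itertools.product(grid, repeat=3))
print("With linear phi, mu1 and mu2 are observationally equivalent on F.")

# With a strictly concave phi the equivalence breaks (here, for the bet on B):
bet_on_B = (0.0, 1.0, 0.0)
print(smooth_utility(bet_on_B, mu1, math.sqrt), smooth_utility(bet_on_B, mu2, math.sqrt))
```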
Finally, note that interpretation matters. This is true for many reasons, but the specific reason we wish to emphasize is that it matters for empirical applications of the model. In a quantitative empirical exercise, one needs to judge not only whether the model matches data (moments of asset returns, for example), but also whether parameter values make sense, and this requires that they have an interpretation.5

2.3. Are We Using the Wrong State Space?

The preceding critique hinges on our adoption of {R B G} as the state space Ω. In defense of the smooth ambiguity model, one might argue that the "correct" state space is Ω1 = {R B G} × Δ({R B G}) (or {R B G} × {pr pb pg}), since the payoffs to the bets being considered are determined by the eventual realization of a pair (color, composition). Then all bets correspond to normal acts and the expected utility assumption for second-order acts does not matter.

However, this response does not seem satisfactory, since, taken to its logical conclusion, it renders KMM's Assumption 2 (the expected utility form in (1.2)) unfalsifiable. To make the argument most clearly, suppose that, in fact, there is no physical second-order urn: it exists, but only in the mind of the decision-maker. All bets concern the normal urn, both the color of the ball drawn and the urn's composition. The second-order urn represents the decision-maker's theory of the construction of the normal urn. Naturally, it is unobservable to the modeler, but intuitively it leads to the same choices described above, including the Ellsbergian choices (2.3) when betting on the composition. How would we model this situation using the KMM model? One could adopt the large state space and argue accordingly that bets on the composition are normal acts and that observed choices are consistent with the model. However, if the SEU assumption is falsifiable, then there exist settings and behavior that would lead to its rejection, and there is no obvious reason why the behavior described here should lead to a different conclusion. We are led, therefore, to reject the expanded state space as an acceptable way to apply the model if its axioms are viewed as in principle falsifiable. Finally, there is no reason that we can see to proceed differently if the second-order urn is concrete and observable. This concludes the argument.

An alternative to the last step is to dispense throughout with the concrete second-order urn and to adopt instead the subjective story sketched here: suitably translated, the discussion to follow is largely unaffected.

5 The following quote from Lucas (2003), addressing the equity premium puzzle, expresses clearly the typical view that fitting moments is not enough: "No one has found risk aversion parameters of 50 or 100 in the diversification of individual portfolios, in the level of insurance deductibles, in the wage premiums associated with occupations with high earnings risk, or in the revenues raised by state-operated lotteries. It would be good to have the equity premium resolved, but I think we need to look beyond high levels of risk aversion to do it." This quotation was used by Barillas, Hansen, and Sargent (2009) to motivate their attempt to reinterpret the risk aversion parameter as capturing in part an aversion to ambiguity or model uncertainty.
2.4. Separation

In Section 2.2, we pointed to "nonuniqueness" as one source of difficulty for interpreting the components μ and φ of the model. Here we comment further on interpretation. We describe a variation of our thought experiment that illustrates a sense in which KMM's foundations do not support identifying μ and φ separately with ambiguity and attitude toward ambiguity.

You are faced in turn with two scenarios, I and II. Scenario I is similar to that in our thought experiment. In particular, it features a second-order urn and a normal urn, related as described in (2.2). The only difference here is that the second-order urn contains 90 balls, with 30 labeled r and the other 60 labeled b or g. Scenario II is similar except that you are told more about the second-order urn, namely that b, g ≥ 20, where b and g denote the numbers of balls carrying those labels. Consider bets on both urns in each scenario. The following rankings seem intuitive: Bets on b and g are indifferent to one another for each second-order urn, and bets on r have the same certainty equivalent across scenarios. For each normal urn, the bet on R is strictly preferable to the bet on B, and the certainty equivalent for a bet on B is strictly larger in scenario II than in I, because the latter is intuitively more ambiguous.

How could we model these choices using the smooth ambiguity model? Assume that the KMM axioms are satisfied for each scenario, so that preferences are represented by two triples (ui, φi, μi), i = I, II. The basic model (1.1)–(1.2) does not impose any connection across scenarios. However, since the scenarios differ in ambiguity only and it is the same decision-maker involved in both, one is led naturally to consider the restrictions
(2.4)    uI = uII    and    φI = φII.
These equalities are motivated by the hypothesis that risk and ambiguity attitudes describe the individual, and, therefore, travel with him across settings. In addition, the postulated behavior implies that μI and μII are both uniform measures on {r b g} and hence coincide. Thus the indicated behavior cannot be rationalized. On the other hand, it can be rationalized if we assume that the priors μi are fixed (and uniform) across scenarios, but allow φI and φII to differ. The preceding defies the common interpretation of the smooth ambiguity model whereby μ captures ambiguity and φ represents ambiguity aversion.

The meaning of "separation" is particularly important for applied work. If φ describes the individual's attitude alone, and thus moves with her from one setting to another, then it serves to connect the individual's behavior across different settings. Thus, in principle, one could calibrate ambiguity aversion in the application under study by examining choices in other situations. Such quantitative discipline is crucial for credible empirical applications; the equity premium puzzle is a classic illustration (recall the quotation from Lucas given in Section 2.2). Thus, in the context of finance applications based on the smooth ambiguity model, Collard et al. (2008), Chen, Ju, and Miao (2009), and
Ju and Miao (2009) assumed that φ can be calibrated. Specifically, they employed the functional form φ(t) = t^(1−α)/(1 − α), where α ≥ 0 is viewed as an ambiguity aversion parameter, and they used the choices implied in hypothetical or experimental Ellsberg-style choice problems to determine what values of α are reasonable to adopt for their asset market applications. KMM (2009a, p. 957) explicitly support such calibration when they write, in the context of an asset pricing example, that "we may assess a plausible range for α [the ambiguity aversion parameter] by . . . looking at the experimental data on ambiguity premiums in Ellsberg-like experiments."

We see no justification for such an exercise. Our thought experiment in this section dealt with a fixed individual who moves across settings. Alternatively, one might wish to compare the behavior of two individuals who face identical environments, but who differ in ambiguity attitude. KMM (2005, Theorem 2) argued that such a comparative statics exercise can be conducted within their model by keeping (u, μ) fixed across individuals, while allowing φ to vary. We do not dispute this feature of their model. However, we emphasize that separation in this sense does not make possible the calibration of ambiguity aversion, which inherently concerns the comparative statics exercise with a single individual and two settings.6

6 KMM (2005, pp. 1864–1869) contribute to confusion about the meaning of "separation" in their model. Their discussion sometimes correctly focused on the second comparative statics exercise involving two individuals and one setting. But elsewhere (p. 1852) they send the conflicting message that their model affords the separation needed to conduct a comparative statics exercise in which one "hold[s] ambiguity attitudes fixed and ask[s] how the equilibrium is affected if the perceived ambiguity is varied." A similar claim is repeated on page 1877 and also in their second paper (KMM (2009a, p. 931)).
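The calibration point of this subsection can be made concrete with a small numerical sketch. It uses the calibration form φ(t) = t^(1−α)/(1 − α) quoted above, a linear u, and a uniform μ over the three compositions in (2.2); the particular α values are hypothetical and serve only to illustrate that, with (u, φ, μ) held fixed across scenarios I and II, the model forces the same certainty equivalent for the bet on B in both scenarios, so the intuitive strict ranking can only be matched by letting α itself vary with the setting.

```python
# Minimal sketch of the separation point in Section 2.4, under illustrative assumptions:
# linear u (payoffs 0/100 scaled to [0, 1]), uniform mu over {p_r, p_b, p_g}, and the
# calibration form phi(t) = t**(1 - alpha) / (1 - alpha). The alpha values are hypothetical.

def phi(t, alpha):
    # restricted here to 0 <= alpha < 1 so that phi(0) is finite
    return t ** (1 - alpha) / (1 - alpha)

def phi_inv(v, alpha):
    return ((1 - alpha) * v) ** (1 / (1 - alpha))

def ce_bet_on_B(alpha, mu=(1/3, 1/3, 1/3)):
    """Certainty equivalent (out of 100) of the bet on B drawn from the normal urn."""
    exp_utils = (1/3, 2/3, 0.0)            # E[u(bet on B)] under p_r, p_b, p_g from (2.2)
    U = sum(m * phi(eu, alpha) for m, eu in zip(mu, exp_utils))
    return 100 * phi_inv(U, alpha)

# Same (u, phi, mu) in scenarios I and II  =>  same certainty equivalent in both,
# contradicting the intuitive strict ranking described in the text.
print(ce_bet_on_B(alpha=0.5), ce_bet_on_B(alpha=0.5))

# The ranking CAN be matched by letting the "ambiguity aversion" parameter vary with the
# setting (more aversion in the intuitively more ambiguous scenario I):
print(ce_bet_on_B(alpha=0.7), "<", ce_bet_on_B(alpha=0.3))
```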
2.5. Alternative Foundations—Nonreduction

Since we have been criticizing KMM's foundations for (1.1), rather than the latter itself, one may wonder about alternative foundations. Seo (2009) provided an alternative axiomatic foundation for the same model of preference on normal acts. In his model, an individual can be ambiguity averse only if she fails to reduce objective (and timeless) two-stage lotteries to their one-stage equivalents (much as in Segal's (1987) seminal paper). This connection has some experimental support (Halevy (2007)). Nevertheless, nonreduction of (timeless) compound lotteries is arguably a mistake, while ambiguity aversion is normatively at least plausible. Thus the noted connection severely limits the scope of Seo's model of ambiguity aversion. Though KMM did not include two-stage lotteries in their domain and thus did not explicitly take a stand on whether these are properly reduced, there is a sense in which nonreduction is implicit also in their model, as we now describe.

The argument can be made very generally, but for concreteness, we consider again two scenarios with Ellsberg-style urns as described above. Scenario I is unchanged. Let (uI, φI, μI) describe the individual in that setting; symmetry calls for μI(b) = μI(g). In scenario II, you are told that the composition of the second-order urn is given by μI, that is, the subjective prior is announced as being true.7 We would expect the announcement not to change risk preferences or preferences over acts defined within the second-order urn, nor to cause the individual to change his beliefs about that urn. (Think of the corresponding exercise for a subjective expected utility agent in an abstract state space setting.) Thus (uII, φII, μII) = (uI, φI, μI). But then preferences on all acts, over both urns, must be unchanged across scenarios.

In I, there is ambiguity aversion in betting on the normal urn. In particular, the bet on R is strictly preferable to a bet on B, or (normalizing so that uI(100) = 1 and uI(0) = 0)

    φI(1/3) > μI(r)φI(1/3) + ((1 − μI(r))/2)φI(2/3) + ((1 − μI(r))/2)φI(0).

Therefore, the corresponding inequality is satisfied also in scenario II, though there is no ambiguity there. In II, the individual faces an objective two-stage lottery and the displayed inequality reflects a failure to reduce two-stage lotteries. Thus, as in Seo's model, KMM's foundations imply that ambiguity aversion is tied to mistakes in processing objective probabilities.

3. THOUGHT EXPERIMENT 2

The example presented here does not involve second-order acts. It concerns only the properties of the KMM model on F, the declared domain of interest. Before describing the example, we present the simple analytical observation that underlies it.

As mentioned, KMM interpreted concavity of φ as modeling ambiguity aversion. If φ is strictly concave, as it is in all applications of the smooth ambiguity model that we have seen, then the preference order on F represented by (1.1) satisfies the following condition8: For all Anscombe–Aumann acts f1 and f2,

(3.1)    f1 ∼ f2 ∼ ½f1 + ½f2    ⇒    ½f1 + ½h ∼ ½f2 + ½h    for all h ∈ F.
Thus indifference to randomization between the pair of indifferent acts f1 and f2 implies indifference between mixtures with any third act h.

7 The announcer can, in principle, infer the prior from sufficiently rich data on the individual's choices between second-order acts.

8 The (elementary) proof will be apparent after reading the proof of the next proposition.
Of course, the implication would be required by the Independence axiom, but ambiguity aversion calls for relaxing Independence. Note that while strict concavity of φ is used to derive the sharp result in (3.1), only weak concavity is assumed henceforth.

To see the force of (3.1), consider a concrete case. You are given two urns, numbered 1 and 2, each containing 50 balls that are either red or blue. Thus,

    Ω = {R1 B1} × {R2 B2}    and    R1 + B1 = 50 = R2 + B2.

You are told also that the two urns are generated independently, for example, they are set up by administrators from opposite sides of the planet who have never been in contact with one another. One ball will be drawn from each urn. Consider the following bets, where c* > c are outcomes in C, and (c*, ½; c, ½) denotes the equal-probability lottery over these outcomes:

Bets for Experiment 2

                  R1R2              R1B2              B1R2              B1B2
f1                c*                c*                c                 c
f2                c*                c                 c*                c
½f1 + ½f2         c*                (c*, ½; c, ½)     (c*, ½; c, ½)     c
g1                (c*, ½; c, ½)     (c*, ½; c, ½)     (c*, ½; c, ½)     (c*, ½; c, ½)
g2                (c*, ½; c, ½)     c                 c*                (c*, ½; c, ½)
Symmetry suggests indifference between f1 and f2. If it is believed that the compositions of the two urns are unrelated, then f1 and f2 do not hedge one another. If, as in the multiple-priors model, hedging ambiguity is the only motivation for randomizing, then we are led to the rankings

(3.2)    f1 ∼ f2 ∼ ½f1 + ½f2.

Ambiguity aversion suggests

(3.3)    g1 ≻ g2.

(Note that

(3.4)    g1 = ½f1 + ½h    and    g2 = ½f2 + ½h,

where h = (c, c, c*, c*).) The rankings (3.2)–(3.3), for all c* > c, are easily accommodated by the multiple-priors model. However, as we show next, they are inconsistent with KMM if the natural state space Ω = {R1 B1} × {R2 B2} is adopted and if φ is taken to be concave. (See the Appendix for a proof.)
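Before stating the formal result, a numerical sketch may help. It evaluates the five acts in the table with criterion (1.1) under two illustrative priors over product laws; the square-root φ, the normalization u(c*) = 1 and u(c) = 0, and independence within each law are our own assumptions. When μ puts weight on compositions with p(R1) ≠ p(R2), the hypothesis (3.2) already fails because the mixture is strictly preferred to f1; when μ is concentrated on compositions with p(R1) = p(R2), (3.2) holds but then g1 ∼ g2, so (3.3) fails.

```python
import math

# States of the natural state space, ordered as in the table.
STATES = ["R1R2", "R1B2", "B1R2", "B1B2"]

# Acts from the table, written as expected-utility payoffs in [0, 1] per state,
# with the illustrative normalization u(c*) = 1, u(c) = 0 (so the 50-50 lottery has utility 1/2).
f1 = {"R1R2": 1.0, "R1B2": 1.0, "B1R2": 0.0, "B1B2": 0.0}
f2 = {"R1R2": 1.0, "R1B2": 0.0, "B1R2": 1.0, "B1B2": 0.0}
h  = {"R1R2": 0.0, "R1B2": 0.0, "B1R2": 1.0, "B1B2": 1.0}
mix = {s: 0.5 * f1[s] + 0.5 * f2[s] for s in STATES}     # (1/2)f1 + (1/2)f2
g1  = {s: 0.5 * f1[s] + 0.5 * h[s] for s in STATES}      # constant at 1/2
g2  = {s: 0.5 * f2[s] + 0.5 * h[s] for s in STATES}

def law(q1, q2):
    """Product law on the four states when P(R1) = q1 and P(R2) = q2 (independence assumed)."""
    return {"R1R2": q1 * q2, "R1B2": q1 * (1 - q2),
            "B1R2": (1 - q1) * q2, "B1B2": (1 - q1) * (1 - q2)}

def U(act, mu, phi=math.sqrt):
    """Smooth ambiguity utility (1.1); mu is a list of (weight, law) pairs."""
    return sum(w * phi(sum(p[s] * act[s] for s in STATES)) for w, p in mu)

# Prior A: the urns' compositions may differ (symmetric across urns).
muA = [(0.5, law(0.9, 0.1)), (0.5, law(0.1, 0.9))]
# Prior B: the compositions agree with mu-probability 1.
muB = [(0.5, law(0.9, 0.9)), (0.5, law(0.1, 0.1))]

for name, mu in [("A", muA), ("B", muB)]:
    print(name, round(U(f1, mu), 4), round(U(f2, mu), 4),
          round(U(mix, mu), 4), round(U(g1, mu), 4), round(U(g2, mu), 4))
# Under A: U(mix) > U(f1) = U(f2), so (3.2) fails (and indeed U(g1) > U(g2)).
# Under B: U(f1) = U(f2) = U(mix), so (3.2) holds, but U(g1) = U(g2), so (3.3) fails.
```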
PROPOSITION 3.1: If preference over the set of Anscombe–Aumann acts F is represented by the utility function U in (1.1), where φ is concave, then

(*)    f1 ∼ f2 ∼ ½f1 + ½f2  for all c* > c
         ⇒  ½f1 + ½h ∼ ½f2 + ½h  for all c* > c.
In particular, in light of (3.4), the rankings (3.2)–(3.3) are impossible.

To our knowledge, there is no relevant experimental evidence on the hypothesis in (*) that randomizing between bets on "independent" urns is of no value. In their reply, KMM (2009b) offered the contrary intuition whereby the mixture ½f1 + ½f2 is strictly preferable to f1 because it reduces the variation in expected utilities across possible probability laws. The intuition does not rely on ambiguity about the true probability law; in particular, it would presumably apply also when the prior μ is based on a given objective distribution, as in the example in Section 2.5. Thus this argument for the value of randomization would seem to reflect nonreduction of compound lotteries rather than ambiguity aversion. Nevertheless, the descriptive validity of (*) is an empirical question.

4. CONCLUDING REMARKS

The smooth ambiguity model is less parsimonious than multiple priors: both require specifying a set of probability laws (the support of μ in the case of the smooth model), but only the smooth model requires the modeler to specify also a distribution μ over this set and a function φ. Typically, less parsimonious models are motivated by the desire to accommodate behavior that is deemed descriptively or normatively important, and yet is inconsistent with the existing tighter model. KMM did not offer descriptive evidence as motivation. One might see their axioms as providing normative motivation for their model. However, our first thought experiment has shown that these axiomatic foundations are problematic normatively.

KMM offered two other motivating arguments. The major one is conceptual: the added degrees of freedom permit the separation of ambiguity from ambiguity aversion. The discussion surrounding the first thought experiment clarifies the limited sense in which such "separation" is achieved; calibration of ambiguity aversion is not justified thereby.

The other motivation offered is tractability: because utility is (under standard assumptions) differentiable, calculus techniques can be applied to characterize solutions to optimization problems, unlike the case for multiple priors. Although our thought experiments do not touch directly on this rationale, we offer two comments. First, a growing literature (surveyed in Epstein
and Schneider (2010)) has fruitfully applied the multiple-priors model in finance, thus showing that differentiability is not necessary for tractability. Second, as first pointed out by Dow and Werlang (1992), differentiability, or the lack thereof, has economic significance. The cited survey describes several ways in which the "first-order uncertainty aversion" generated by nondifferentiability helps to account for asset market behavior that is qualitatively puzzling in light of smooth models such as subjective expected utility and the KMM model. There is also experimental evidence (see Bossaerts, Ghirardato, Guarnaschelli, and Zame (2010) and Ahn, Choi, Gale, and Kariv (2009)) that first-order effects are important in portfolio choice.

It remains unclear what the smooth ambiguity model adds to the arsenal of ambiguity averse preference models in terms of explanatory power. Our second thought experiment demonstrates some of the behavioral differences between the smooth and multiple-priors models, but obviously the picture is still incomplete.

APPENDIX

PROOF OF PROPOSITION 3.1: It is without loss of generality (since C was taken to be a compact interval) to assume that u has range equal to [0, 1] and that φ : [0, 1] → R. Also without loss of generality, suppose there exists 0 < κ < 1 such that, for all t < κ < t′,

    φ(½t + ½t′) > ½φ(t) + ½φ(t′).

Otherwise, φ is linear and (*) is obvious. The following cases are essentially exhaustive. Abbreviate p(R1 × {R2 B2}) by p(R1) and so on.

Case 1. p(R1) = p(R2) with μ-probability equal to 1. Then

    ∫_Ω u(f1) dp = ∫_Ω u(f2) dp    μ-a.s.

    ⇒  ∫_Ω u(½f1 + ½h) dp = ∫_Ω u(½f2 + ½h) dp    μ-a.s.    (since u is linear)

    ⇒  ∫ φ(∫_Ω u(½f1 + ½h) dp) dμ = ∫ φ(∫_Ω u(½f2 + ½h) dp) dμ

    ⇒  U(½f1 + ½h) = U(½f2 + ½h).

Case 2. There exists P ⊂ Δ(Ω), with μ(P) > 0, such that

(A.1)    p(R1) > p(R2) ≥ 0    for all p ∈ P.

Take the special case P = {p*}. Pick c* and c so that 1 ≥ u(c*) > u(c) ≥ 0 and

    p*(R2) < (κ − u(c)) / (u(c*) − u(c)) < p*(R1).

Then

    ∫_Ω u(f2) dp* < κ < ∫_Ω u(f1) dp*,

which, by the definition of κ, implies that

    φ(∫_Ω u(½f1 + ½f2) dp*) = φ(½ ∫_Ω u(f1) dp* + ½ ∫_Ω u(f2) dp*)
                            > ½ φ(∫_Ω u(f1) dp*) + ½ φ(∫_Ω u(f2) dp*).

Since φ is concave, it follows that

    U(½f1 + ½f2) = ∫ φ(∫_Ω u(½f1 + ½f2) dp) dμ(p)
                 > ∫ [½ φ(∫_Ω u(f1) dp) + ½ φ(∫_Ω u(f2) dp)] dμ(p)
                 = ½ U(f1) + ½ U(f2) = U(f1),

contrary to the hypothesis in (*).

Turn to the general case of (A.1), where P need not be a singleton. Then there exists a subset Q ⊂ P, μ(Q) > 0, where, for some a > 0,

    q(R1) > a > q(R2) ≥ 0    for all q ∈ Q.

Adapt the above argument.    Q.E.D.
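As a quick sanity check on Case 2, the sketch below instantiates the construction with concrete numbers of our own choosing (a square-root φ, κ = 1/2, a particular p*, and the normalization u(c*) = 1, u(c) = 0) and confirms the displayed bound and the strict Jensen step.

```python
import math

phi = math.sqrt                 # strictly concave, so any kappa in (0, 1) works in the wlog step
kappa = 0.5
u_cstar, u_c = 1.0, 0.0         # normalization u(c*) = 1, u(c) = 0

# A law p* satisfying (A.1): p*(R1) > p*(R2), with the threshold
# (kappa - u(c)) / (u(c*) - u(c)) strictly between them.
p_R1, p_R2 = 0.9, 0.1
threshold = (kappa - u_c) / (u_cstar - u_c)
assert p_R2 < threshold < p_R1

# Expected utilities under p*: f1 pays c* on R1, f2 pays c* on R2.
Ef1 = u_cstar * p_R1 + u_c * (1 - p_R1)     # = 0.9
Ef2 = u_cstar * p_R2 + u_c * (1 - p_R2)     # = 0.1
assert Ef2 < kappa < Ef1

# Key step: phi at the mixture's expected utility strictly exceeds the average of phi
# at the two expected utilities (the strict inequality at kappa used in the proof).
lhs = phi(0.5 * Ef1 + 0.5 * Ef2)
rhs = 0.5 * phi(Ef1) + 0.5 * phi(Ef2)
print(lhs, ">", rhs, lhs > rhs)             # 0.7071... > 0.6324... True
```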
REFERENCES

AHN, D., S. CHOI, D. GALE, AND S. KARIV (2009): "Estimating Ambiguity Aversion in a Portfolio Choice Experiment." [2097]
BAILLON, A., B. DRIESEN, AND P. P. WAKKER (2009): "Relative Concave Utility for Risk and Ambiguity." [2087]
BARILLAS, F., L. P. HANSEN, AND T. J. SARGENT (2009): "Doubts, or Variability?" Journal of Economic Theory, 144, 2388–2418. [2091]
BOSSAERTS, P., P. GHIRARDATO, S. GUARNASCHELLI, AND W. R. ZAME (2010): "Ambiguity in Asset Markets: Theory and Experiment," Review of Financial Studies, 23, 1325–1359. [2097]
CHEN, H., N. JU, AND J. MIAO (2009): "Dynamic Asset Allocation With Ambiguous Return Predictability." [2087,2092]
COLLARD, F., S. MUKERJI, K. SHEPPARD, AND J.-M. TALLON (2008): "Ambiguity and the Historical Equity Premium." [2087,2092]
DEKEL, E., AND B. LIPMAN (2010): "How (Not) to Do Decision Theory," Annual Review of Economics (forthcoming). [2090]
DOW, J., AND S. R. WERLANG (1992): "Uncertainty Aversion, Risk Aversion and the Optimal Choice of Portfolio," Econometrica, 60, 197–204. [2097]
EPSTEIN, L. G., AND M. SCHNEIDER (2010): "Ambiguity and Asset Markets," Annual Review of Financial Economics (forthcoming). [2096]
ERGIN, H., AND F. GUL (2009): "A Theory of Subjective Compound Lotteries," Journal of Economic Theory, 144, 899–929. [2087]
GILBOA, I., AND D. SCHMEIDLER (1989): "Maxmin Expected Utility With Non-Unique Priors," Journal of Mathematical Economics, 18, 141–153. [2085]
HALEVY, Y. (2007): "Ellsberg Revisited: An Experimental Study," Econometrica, 75, 503–536. [2093]
HALEVY, Y., AND E. OZDENOREN (2008): "Ambiguity and Compound Lotteries: Calibration." [2087]
HANSEN, L. P. (2007): "Beliefs, Doubts and Learning: Valuing Macroeconomic Risk," American Economic Review, 97, 1–30. [2087]
JU, N., AND J. MIAO (2009): "Ambiguity, Learning and Asset Returns." [2087,2093]
KLIBANOFF, P., M. MARINACCI, AND S. MUKERJI (2005): "A Smooth Model of Decision Making Under Ambiguity," Econometrica, 73, 1849–1892. [2085,2089,2090,2093]
KLIBANOFF, P., M. MARINACCI, AND S. MUKERJI (2009a): "Recursive Smooth Ambiguity Preferences," Journal of Economic Theory, 144, 930–976. [2087,2093]
KLIBANOFF, P., M. MARINACCI, AND S. MUKERJI (2009b): "On the Smooth Ambiguity Model: A Reply." [2096]
LUCAS, R. E., JR. (2003): "Macroeconomic Priorities," American Economic Review, 93, 1–14. [2091]
NAU, R. F. (2006): "Uncertainty Aversion and Second-Order Utilities and Probabilities," Management Science, 52, 136–145. [2087]
SEGAL, U. (1987): "The Ellsberg Paradox and Risk Aversion: An Anticipated Utility Approach," International Economic Review, 28, 175–202. [2093]
SEO, K. (2009): "Ambiguity and Second-Order Belief," Econometrica, 77, 1575–1605. [2087,2093]
Dept. of Economics, Boston University, 270 Bay State Road, Boston, MA 02215, U.S.A.;
[email protected]. Manuscript received July, 2009; final revision received July, 2010.
Econometrica, Vol. 78, No. 6 (November, 2010), 2101
ANNOUNCEMENTS

THE 2011 NORTH AMERICAN WINTER MEETING
THE 2011 NORTH AMERICAN WINTER MEETING of the Econometric Society will be held in Denver, CO, January 7–9, 2011, as part of the annual meeting of the Allied Social Science Associations. The program will consist of contributed and invited papers. More information on program details and registration will be sent by email to all members of the Econometric Society and posted on the website at http://www.econometricsociety.org.

Program Committee Chair: Markus K. Brunnermeier

2011 NORTH AMERICAN SUMMER MEETING
THE 2011 NORTH AMERICAN SUMMER MEETING of the Econometric Society will be held June 9–12, 2011, hosted by Washington University in Saint Louis, MO. The program committee will be chaired by Marcus Berliant of Washington University in Saint Louis. The program will include plenary, invited and contributed sessions in all fields of economics.

2011 AUSTRALASIA MEETING
THE 2011 AUSTRALASIA MEETING of the Econometric Society (ESAM11) will be held in Adelaide, Australia, from July 5 to July 8, 2011. ESAM11 will be hosted by the School of Economics at the University of Adelaide. The program committee will be co-chaired by Christopher Findlay and Jiti Gao. The program will include plenary, invited and contributed sessions in all fields of economics.

2011 EUROPEAN MEETING
THE 2011 EUROPEAN MEETING of the Econometric Society (ESEM) will take place in Oslo, Norway, from August 25 to 29, 2011. The Meeting is organized by the University of Oslo, and it will run in parallel with the Congress of the European Economic Association (EEA). Participants will be able to attend all sessions of both events. The Program Committee Chairs are Professor John van Reenen, London School of Economics, for Econometrics and Empirical Economics, and Professor Ernst-Ludwig von Thadden, University of Mannheim, for Theoretical and Applied Economics. The Local Arrangements Chair is Professor Asbjørn Rødseth, University of Oslo.
© 2010 The Econometric Society
DOI: 10.3982/ECTA786ANN
Econometrica, Vol. 78, No. 6 (November, 2010), 2103
FORTHCOMING PAPERS

THE FOLLOWING MANUSCRIPTS, in addition to those listed in previous issues, have been accepted for publication in forthcoming issues of Econometrica.

ACEMOGLU, DARON, AND ALEXANDER WOLITZKY: "The Economics of Labor Coercion."
BRÜCKNER, MARKUS, AND ANTONIO CICCONE: "Rain and the Democratic Window of Opportunity."
FRENCH, ERIC, AND JOHN BAILEY JONES: "The Effects of Health Insurance and Self-Insurance on Retirement Behavior."
GOEREE, JACOB K., AND LEEAT YARIV: "An Experimental Study of Collective Deliberation."
GRAHAM, BRYAN: "Efficiency Bounds for Missing Data Models With Semiparametric Restrictions."
HANSEN, PETER R., ASGER LUNDE, AND JAMES M. NASON: "The Model Confidence Set."
HOROWITZ, JOEL L.: "Applied Nonparametric Instrumental Variables Estimation."
KABOSKI, JOSEPH P., AND ROBERT M. TOWNSEND: "A Structural Evaluation of a Large-Scale Quasi-Experimental Microfinance Initiative."
KLEVEN, HENRIK JACOBSEN, MARTIN KNUDSEN, CLAUS THUSTRUP KREINER, SØREN PEDERSEN, AND EMMANUEL SAEZ: "Unwilling or Unable to Cheat? Evidence From a Tax Audit Experiment in Denmark."
MÜLLER, ULRICH K.: "Efficient Tests Under a Weak Convergence Assumption."
RENY, PHILIP J.: "On the Existence of Monotone Pure Strategy Equilibria in Bayesian Games."
© 2010 The Econometric Society
DOI: 10.3982/ECTA786FORTH